Matthew Honnibal
c0799430a7
Make small changes to Doc.to_array
...
* Change type-check logic to 'hasattr' (Python type-checking is brittle)
* Small 'house style' edits, mostly making code more terse.
2017-10-20 11:17:00 +02:00
Ramanan Balakrishnan
fbccc8c87d
Update documentation on doc.to_array
2017-10-20 14:23:48 +05:30
Ramanan Balakrishnan
5941aa96a1
Support strings for attribute list in doc.to_array
2017-10-20 11:59:34 +05:30
Ramanan Balakrishnan
b47b4e2654
Support single value for attribute list in doc.to_scalar conversion
2017-10-18 14:43:47 +05:30
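The `Doc.to_array` commits above let callers pass a single attribute, or attribute names as strings, and switch the type check to `hasattr`. The sketch below shows that normalization step outside spaCy. `ATTR_IDS`, `normalize_attr_ids` and the dict-based `to_array` are hypothetical stand-ins, not the code from these commits; spaCy's real attribute IDs live in `spacy.attrs`, and a plain 2-D list replaces the numpy array here.

```python
from typing import Dict, Iterable, List, Sequence, Union

# Hypothetical attribute-name -> ID table (real IDs live in spacy.attrs).
ATTR_IDS: Dict[str, int] = {"ORTH": 1, "LOWER": 2, "LEMMA": 3}

AttrSpec = Union[int, str, Iterable[Union[int, str]]]


def normalize_attr_ids(attrs: AttrSpec) -> List[int]:
    """Accept a single ID, a single name, or a list mixing both."""
    # A Python 3 str has __iter__, so test for it explicitly before the
    # duck-typed check; otherwise "LEMMA" would be split into characters.
    if isinstance(attrs, str) or not hasattr(attrs, "__iter__"):
        attrs = [attrs]
    # Duck typing via hasattr: anything with .upper() is treated as a name,
    # so "lemma" and "LEMMA" resolve to the same ID.
    return [ATTR_IDS[a.upper()] if hasattr(a, "upper") else int(a) for a in attrs]


def to_array(tokens: Sequence[Dict[int, int]], attrs: AttrSpec) -> List[List[int]]:
    """Return one row per token and one column per requested attribute."""
    ids = normalize_attr_ids(attrs)
    return [[token[attr_id] for attr_id in ids] for token in tokens]
```

One design caveat: because a `str` is iterable in Python 3, a bare `hasattr(x, "__iter__")` test alone would treat `"LEMMA"` as a sequence of characters, so the sketch checks for `str` first.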
Matthew Honnibal
cd9378c8f1
Merge pull request #1423 from yuukos/master
...
Fixed Russian tokenizer
2017-10-16 11:45:53 +02:00
Matthew Honnibal
6b0121091c
Merge pull request #1420 from polm/master
...
[ja] Stash tokenizer output for speed
2017-10-16 10:28:22 +02:00
yuukos
34e9c6ddc0
Merge remote-tracking branch 'origin/master'
2017-10-16 13:48:10 +07:00
yuukos
92931a2efd
Merge branch 'russian_language'
2017-10-16 13:46:28 +07:00
yuukos
241d19a3e6
fixed Russian Tokenizer
...
- added trailing space flags for tokens
2017-10-16 13:37:05 +07:00
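Trailing space flags matter because spaCy's `Doc(vocab, words=words, spaces=spaces)` takes one boolean per token saying whether it is followed by a space; without them, text produced by an external tokenizer cannot be reconstructed. A minimal sketch of deriving those flags, assuming the external tokenizer returns its tokens in order and unchanged from the input. The helper name `trailing_space_flags` is hypothetical, not the code from this commit.

```python
from typing import List


def trailing_space_flags(text: str, words: List[str]) -> List[bool]:
    """Return one flag per word: True if the word is followed by a space."""
    flags: List[bool] = []
    offset = 0
    for word in words:
        # Search from the end of the previous word; raises ValueError if the
        # external tokenizer altered the surface text (assumed not to happen).
        start = text.index(word, offset)
        end = start + len(word)
        flags.append(end < len(text) and text[end] == " ")
        offset = end
    return flags
```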
Paul O'Leary McCann
71ae8013ec
[ja] Use user_details instead of a wrapper class
...
Instead of using a JapaneseDoc wrapper class to store Mecab output,
stash it in `user_data`. -POLM
2017-10-16 00:24:34 +09:00
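The `user_data` approach means the tokenizer runs the morphological analyzer once and later components read its stashed output from the `Doc`. The sketch below illustrates the pattern with stand-ins: `StubDoc` mimics only the `user_data` dict that spaCy's `Doc` exposes, and `JapanesePipelineSketch`, its `STASH_KEY`, and the analyzer callable are hypothetical, not the classes from this commit.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Analysis = List[Tuple[str, str]]  # (surface form, part-of-speech) pairs


@dataclass
class StubDoc:
    """Stand-in for spacy.tokens.Doc, which also carries a user_data dict."""
    words: List[str]
    user_data: Dict[str, Any] = field(default_factory=dict)


class JapanesePipelineSketch:
    """Hypothetical pipeline: one analyzer call shared by tokenizer and tagger."""

    STASH_KEY = "morph_analysis"  # hypothetical key name

    def __init__(self, analyzer: Callable[[str], Analysis]) -> None:
        self.analyzer = analyzer  # e.g. a wrapper around MeCab (assumed)

    def tokenize(self, text: str) -> StubDoc:
        analysis = self.analyzer(text)  # the only expensive call
        doc = StubDoc(words=[surface for surface, _ in analysis])
        doc.user_data[self.STASH_KEY] = analysis  # stash for later components
        return doc

    def tag(self, doc: StubDoc) -> List[str]:
        # Reuse the stashed output instead of re-running the analyzer.
        return [pos for _, pos in doc.user_data[self.STASH_KEY]]
```

Compared with a wrapper class, a plain dict on the existing `Doc` avoids introducing a new type that every downstream component would have to know about.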
Paul O'Leary McCann
43eedf73f2
[ja] Stash tokenizer output for speed
...
Before this commit, the Mecab tokenizer had to be called twice when
creating a Doc: once during tokenization and once during tagging. This
creates a JapaneseDoc wrapper class for Doc that stashes the parsed
tokenizer output to remove redundant processing. -POLM
2017-10-15 23:33:25 +09:00
Ines Montani
e00a6c08cf
Merge pull request #1418 from polm/master
...
Contributor agreement
2017-10-14 15:10:58 +02:00
Paul O'Leary McCann
a31d33be06
Contributor agreement
2017-10-14 19:28:04 +09:00
Ines Montani
4b5af8bd17
Merge pull request #1414 from yuukos/master
...
Adding Russian language support
2017-10-13 17:03:52 +02:00
Alex
95836abee1
Update CONTRIBUTORS.md
2017-10-13 21:02:19 +07:00
Alex
ce00405afc
Create yuukos.md
2017-10-13 21:00:15 +07:00
yuukos
6fb9d75bd2
fixed the test that creates the tokenizer
2017-10-13 15:51:03 +07:00
yuukos
a229b6e0de
added tests for Russian language
...
added tests for creating a Russian Language instance and the Russian tokenizer
2017-10-13 14:04:37 +07:00
yuukos
622b6d6270
updated Russian tokenizer
...
moved the attempt to import pymorph into __init__
2017-10-13 13:57:29 +07:00
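Reading `__init__` in this commit as the tokenizer's constructor, the point of moving the import there is that an optional dependency only fails when the Russian tokenizer is actually built, not when the package is imported. A minimal sketch of that pattern; the class name is hypothetical and the module name is passed in as a parameter rather than guessed.

```python
import importlib
from types import ModuleType


class RussianTokenizerSketch:
    """Hypothetical class showing an optional dependency imported in __init__."""

    def __init__(self, morph_module: str) -> None:
        # Importing here, not at module level, means importing the package
        # that defines this class keeps working without the optional
        # dependency; the error surfaces only when the tokenizer is built.
        try:
            self.morph: ModuleType = importlib.import_module(morph_module)
        except ImportError as err:
            raise ImportError(
                f"The Russian tokenizer needs the optional package '{morph_module}'."
            ) from err
```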
yuukos
f81dd284eb
updated spacy/__init__.py
...
registered the Russian language via set_lang_class
2017-10-12 22:28:34 +07:00
yuukos
7b9491679f
added Russian language support
2017-10-12 22:24:20 +07:00
yuukos
2a78f4d634
updated .gitignore file
...
excluded PyCharm's .idea directory
2017-10-12 22:23:19 +07:00
Ines Montani
a06b84e7cc
Merge pull request #1407 from hscspring/patch-6
...
Update training.jade
2017-10-11 14:25:38 +02:00
Ines Montani
ffc2fef13c
Merge pull request #1411 from raphael0202/issue_1078
...
Resolve issue #1078 by simplifying URL pattern
2017-10-11 11:54:57 +02:00
Raphaël Bournhonesque
3452d6ce52
Resolve issue #1078 by simplifying URL pattern
...
- avoid catastrophic backtracking
- reduce character range of host name, domain name and TLD identifier
2017-10-11 11:24:00 +02:00
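Catastrophic backtracking typically comes from nested, overlapping quantifiers (the classic `(a+)+`), where a failed match forces the engine to try exponentially many ways of splitting the input. The sketch below illustrates the remedy this commit describes, narrower character ranges for host, domain and TLD. It is a hypothetical simplified pattern, not spaCy's actual URL regex.

```python
import re

# Hypothetical simplified URL pattern. Every domain label must end in a dot
# and labels cannot contain dots, so there is only one way to split a host
# name into labels and a failed match gives up quickly.
URL_PATTERN = re.compile(
    r"^(?:https?://)?"       # optional scheme
    r"(?:[a-z0-9-]+\.)+"     # host and domain labels, each ending in a dot
    r"[a-z]{2,63}"           # TLD: letters only
    r"(?::\d{2,5})?"         # optional port
    r"(?:/\S*)?$",           # optional path
    re.IGNORECASE,
)


def looks_like_url(text: str) -> bool:
    return URL_PATTERN.match(text) is not None
```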
Yam
efe0800f91
Update training.jade
...
several fixes
2017-10-09 21:39:15 -05:00
Matthew Honnibal
331d338b8b
Merge pull request #1246 from polm/ja-pos-tagger
...
[wip] Sample implementation of Japanese Tagger (ref #1214)
2017-10-09 04:00:53 +02:00
Ines Montani
d33899b60b
Merge pull request #1393 from yuukos/patch-1
...
Update adding-languages.jade
2017-10-06 18:03:31 +02:00
Ines Montani
e89689a31d
Update CONTRIBUTORS.md
2017-10-06 18:02:40 +02:00
Alex
763b54cbc3
Update adding-languages.jade
...
Fixed misspellings
2017-10-06 16:30:44 +07:00
Matthew Honnibal
0e1adacaff
Merge pull request #1390 from mdcclv/contributor-mdcclv
...
Contributor agreement for Orion Montoya @mdcclv
2017-10-06 02:39:08 +02:00
Orion Montoya
e04e11070f
Contributor agreement for Orion Montoya @mdcclv
2017-10-05 17:45:45 -04:00
Ines Montani
e77d8886f7
Update CONTRIBUTORS.md
2017-10-05 22:22:04 +02:00
Matthew Honnibal
dea81f113d
Merge pull request #1389 from mdcclv/lemmatizer_obey_exceptions
...
Lemmatizer obey exceptions
2017-10-05 22:11:21 +02:00
Orion Montoya
b0d271809d
Unit test for lemmatizer exceptions -- copied from regression test for #1387
2017-10-05 10:49:28 -04:00
Orion Montoya
ffb50d21a0
Lemmatizer honors exceptions: Fix #1387
2017-10-05 10:49:02 -04:00
Orion Montoya
e81a608173
Regression test for lemmatizer exceptions -- demonstrate issue #1387
2017-10-05 10:47:48 -04:00
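One plausible reading of "Lemmatizer honors exceptions" is that an entry in the exception table should be returned as-is rather than mixed with, or overridden by, suffix-rule output. The sketch below illustrates that precedence. The function signature, the `(old_suffix, new_suffix)` rule format and the word index are assumptions for illustration, not the code from the fix for #1387.

```python
from typing import Dict, List, Sequence, Set, Tuple


def lemmatize(
    string: str,
    index: Set[str],
    exceptions: Dict[str, List[str]],
    rules: Sequence[Tuple[str, str]],
) -> List[str]:
    """Return candidate lemmas; exceptions take precedence over suffix rules."""
    # 1. An exception entry is authoritative: return it and skip the rules.
    if string in exceptions:
        return sorted(set(exceptions[string]))
    known: List[str] = []
    unknown: List[str] = []
    # 2. Otherwise apply suffix-rewrite rules such as ("es", "") or ("ies", "y").
    for old, new in rules:
        if string.endswith(old):
            form = string[: len(string) - len(old)] + new
            if not form:
                continue
            (known if form in index else unknown).append(form)
    # 3. Prefer forms in the word index, then any rule output, then the string.
    forms = known or unknown or [string]
    return sorted(set(forms))
```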
Ines Montani
678651ca98
Merge pull request #1386 from kokes/patch-1
...
Fixing links to SyntaxNet
2017-10-04 13:35:01 +02:00
Ondrej Kokes
a9362f1c73
Fixing links to SyntaxNet
2017-10-04 12:55:07 +02:00
Matthew Honnibal
eb72eae258
Merge pull request #1364 from Destygo/master
...
Fixed NER model loading bug
2017-09-29 12:29:43 +02:00
Ines Montani
58bfe30a12
Merge pull request #1362 from IamJeffG/docs/custom-tokenizer
...
Document Tokenizer(token_match) and clarify tokenizer_pseudo_code
2017-09-26 15:51:15 +02:00
Vincent Genty
259ed027af
Fixed NER model loading bug
2017-09-26 15:46:04 +02:00
Ines Montani
361211fe26
Merge pull request #1342 from wannaphongcom/master
...
Add Thai language
2017-09-26 15:40:55 +02:00
Jeffrey Gerard
b6ebedd09c
Document Tokenizer(token_match) and clarify tokenizer_pseudo_code
...
Closes #835
In the `tokenizer_pseudo_code` I put the `special_cases` kwarg
before `find_prefix` because this now matches the order the args
are used in the pseudocode, and it also matches spaCy's actual code.
2017-09-25 13:13:25 -07:00
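The `tokenizer_pseudo_code` documented in this commit splits on whitespace, then repeatedly checks special cases, prefixes, suffixes and infixes. Below is a runnable sketch in that spirit with a `token_match` hook added. Placing `token_match` right after the special-case check, and the small prefix, suffix, infix and URL regexes, are assumptions for illustration, not spaCy's exact loop or punctuation rules.

```python
import re
from typing import Callable, Dict, List, Optional

PREFIX_RE = re.compile(r"""^[\("']""")
SUFFIX_RE = re.compile(r"""[\)"'.,!?]$""")
INFIX_RE = re.compile(r"[-~]")
URL_RE = re.compile(r"^https?://\S+$")


def find_prefix(s: str) -> Optional[int]:
    m = PREFIX_RE.search(s)
    return m.end() if m else None  # length of the prefix


def find_suffix(s: str) -> Optional[int]:
    m = SUFFIX_RE.search(s)
    return m.end() - m.start() if m else None  # length of the suffix


def find_infixes(s: str) -> List[re.Match]:
    return list(INFIX_RE.finditer(s))


def tokenize(
    text: str,
    special_cases: Dict[str, List[str]],
    find_prefix: Callable[[str], Optional[int]],
    find_suffix: Callable[[str], Optional[int]],
    find_infixes: Callable[[str], List[re.Match]],
    token_match: Callable[[str], bool] = lambda s: False,
) -> List[str]:
    tokens: List[str] = []
    for substring in text.split():
        suffixes: List[str] = []
        while substring:
            if substring in special_cases:
                tokens.extend(special_cases[substring])
                substring = ""
            elif token_match(substring):
                # Keep the whole substring (e.g. a URL) as one token.
                tokens.append(substring)
                substring = ""
            elif find_prefix(substring) is not None:
                split = find_prefix(substring)
                tokens.append(substring[:split])
                substring = substring[split:]
            elif find_suffix(substring) is not None:
                split = find_suffix(substring)
                suffixes.append(substring[-split:])
                substring = substring[:-split]
            elif find_infixes(substring):
                offset = 0
                for match in find_infixes(substring):
                    if match.start() > offset:  # skip empty pieces
                        tokens.append(substring[offset:match.start()])
                    tokens.append(match.group())
                    offset = match.end()
                substring = substring[offset:]
            else:
                tokens.append(substring)
                substring = ""
        # Suffixes were peeled off right to left, so restore their order.
        tokens.extend(reversed(suffixes))
    return tokens
```

With `token_match=lambda s: URL_RE.match(s) is not None`, a URL such as `https://my-site.example.com` stays whole; with the default hook it is split at the hyphen infix.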
Matthew Honnibal
2f8d535f65
Merge pull request #1351 from hscspring/patch-4
...
Update punctuation.py
2017-09-24 12:16:39 +02:00
Matthew Honnibal
9177313063
Merge pull request #1352 from hscspring/patch-5
...
Update customizing-tokenizer.jade
2017-09-22 16:11:49 +02:00
Matthew Honnibal
1dbc2285b8
Merge pull request #1350 from hscspring/patch-3
...
Update word-vectors-similarities.jade
2017-09-22 16:11:05 +02:00
Yam
54855f0eee
Update customizing-tokenizer.jade
2017-09-22 12:15:48 +08:00
Yam
6f450306c3
Update customizing-tokenizer.jade
...
updated some code:
- `me` -> `-PRON-`
- `TAG` -> `POS`
- `create_tokenizer` function
2017-09-22 10:53:22 +08:00
Yam
923c4c2fb2
Update punctuation.py
...
add `……`
2017-09-22 09:50:46 +08:00
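Adding a multi-character symbol such as the Chinese ellipsis `……` (assumed here to be two U+2026 characters) to a punctuation list raises an ordering question: Python's `re` tries alternatives left to right and keeps the first that matches, so `…` listed before `……` would split the ellipsis in two. A minimal sketch with a hypothetical `build_punct_pattern` helper that sorts symbols longest first; this is not the code in punctuation.py.

```python
import re
from typing import Iterable


def build_punct_pattern(symbols: Iterable[str]) -> re.Pattern:
    # Longest first, so "……" is tried before its prefix "…".
    ordered = sorted(symbols, key=len, reverse=True)
    return re.compile("|".join(re.escape(s) for s in ordered))
```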