Matthew Honnibal
4174477161
Fix equality check in test
2017-10-16 19:50:35 +02:00
Matthew Honnibal
2bc06e4b22
Bump rolling buffer size to 10k
2017-10-16 19:38:29 +02:00
Matthew Honnibal
66e2eb8f39
Clean up remnant of frozen in StringStore
2017-10-16 19:34:41 +02:00
Matthew Honnibal
a002264fec
Remove caching of Token in Doc, as it caused a cycle.
2017-10-16 19:34:21 +02:00
Matthew Honnibal
3e037054c8
Remove obsolete is_frozen functionality from StringStore
2017-10-16 19:23:10 +02:00
Matthew Honnibal
5c14f3f033
Create a rolling buffer for the StringStore in Language.pipe()
2017-10-16 19:22:40 +02:00
Matthew Honnibal
59c216196c
Allow weakrefs on Doc objects
2017-10-16 19:22:11 +02:00
ines
d5418553eb
Fix whitespace
2017-10-16 18:30:04 +02:00
ines
6ceadcdb5c
Make sure from_disk passes string to numpy (see #1421)
...
If path is a WindowsPath, numpy does not recognise it as a path and,
as a result, doesn't open the file.
https://github.com/numpy/numpy/blob/master/numpy/lib/npyio.py#L369
2017-10-16 18:29:56 +02:00
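A minimal sketch of the idea behind the commit above, assuming the fix amounts to converting a pathlib object to a plain string before it reaches numpy. The helper name `ensure_str_path` and the example file path are hypothetical and not taken from the commit.

```python
from pathlib import PurePath, PureWindowsPath


def ensure_str_path(path):
    """Return `path` as a plain string if it is a pathlib object.

    Hypothetical helper for illustration; strings pass through unchanged.
    """
    if isinstance(path, PurePath):
        return str(path)
    return path


# PureWindowsPath can be instantiated on any OS, so this runs everywhere.
win_path = PureWindowsPath("C:/models/vectors.npy")  # made-up example path
as_string = ensure_str_path(win_path)
print(type(as_string).__name__, as_string)

# The resulting string is what would be handed to numpy's loader,
# e.g. numpy.load(as_string), instead of the Path object itself.
```

Converting at the boundary keeps pathlib usable everywhere else in the code while giving the loader the type it recognises.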
Matthew Honnibal
010a7309ff
Merge pull request #1402 from explosion/feature/fix-matcher-operators
...
💫 Fix Matcher variable-length operators
2017-10-16 17:53:19 +02:00
Matthew Honnibal
c29927d2e7
Fix matcher test
2017-10-16 17:22:18 +02:00
Vishnu Kumar Nekkanti
d3c54cf39a
fixed SyntaxError while checking for jieba
2017-10-16 18:51:33 +05:30
Vishnu Kumar Nekkanti
18ec6610dd
Merge pull request #1 from explosion/develop
...
Develop
2017-10-16 18:34:13 +05:30
ines
63393b4e0d
Update matcher docs to reflect operator changes
2017-10-16 13:44:12 +02:00
Matthew Honnibal
a928ae2f35
Merge branch 'develop' into feature/fix-matcher-operators
2017-10-16 13:38:36 +02:00
Matthew Honnibal
56aa42cc5d
Fix and document matcher operator 'shadowing' behaviour
2017-10-16 13:38:20 +02:00
Matthew Honnibal
748d525801
Add more matcher operator tests
2017-10-16 13:38:01 +02:00
Matthew Honnibal
0433181658
Document operator semantics in Matcher docstring
2017-10-16 12:06:33 +02:00
Matthew Honnibal
cd9378c8f1
Merge pull request #1423 from yuukos/master
...
Fixed Russian tokenizer
2017-10-16 11:45:53 +02:00
Matthew Honnibal
6b0121091c
Merge pull request #1420 from polm/master
...
[ja] Stash tokenizer output for speed
2017-10-16 10:28:22 +02:00
yuukos
34e9c6ddc0
Merge remote-tracking branch 'origin/master'
2017-10-16 13:48:10 +07:00
yuukos
92931a2efd
Merge branch 'russian_language'
2017-10-16 13:46:28 +07:00
yuukos
241d19a3e6
fixed Russian Tokenizer
...
- added trailing space flags for tokens
2017-10-16 13:37:05 +07:00
Paul O'Leary McCann
71ae8013ec
[ja] Use user_data instead of a wrapper class
...
Instead of using a JapaneseDoc wrapper class to store Mecab output,
stash it in `user_data`. -POLM
2017-10-16 00:24:34 +09:00
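A rough sketch of the stashing pattern described in the commit above, using stand-ins rather than the real classes. `FakeDoc`, `expensive_parse`, and the `"parsed_tokens"` key are hypothetical; only the idea of keeping tokenizer output in a `user_data` dict so the tagger can reuse it comes from the commit message.

```python
class FakeDoc:
    """Stand-in for a Doc: holds words plus a free-form user_data dict."""

    def __init__(self, words):
        self.words = words
        self.user_data = {}


call_count = 0  # counts how often the expensive parser runs


def expensive_parse(text):
    """Hypothetical stand-in for the Mecab call; returns (word, tag) pairs."""
    global call_count
    call_count += 1
    return [(word, "NOUN") for word in text.split()]


def tokenize(text):
    """Parse once, build the doc, and stash the full parse on it."""
    parsed = expensive_parse(text)
    doc = FakeDoc([word for word, _ in parsed])
    doc.user_data["parsed_tokens"] = parsed  # hypothetical key name
    return doc


def tag(doc):
    """Reuse the stashed parse instead of calling the parser again."""
    return [pos for _, pos in doc.user_data["parsed_tokens"]]


doc = tokenize("a b c")
tags = tag(doc)
print(doc.words, tags, "parser calls:", call_count)
```

Because the parse travels with the document, no wrapper subclass is needed and the parser runs once per text.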
Paul O'Leary McCann
43eedf73f2
[ja] Stash tokenizer output for speed
...
Before this commit, the Mecab tokenizer had to be called twice when
creating a Doc: once during tokenization and once during tagging. This
creates a JapaneseDoc wrapper class for Doc that stashes the parsed
tokenizer output to remove redundant processing. -POLM
2017-10-15 23:33:25 +09:00
ines
15514dc333
Add section on upgrading
2017-10-14 22:14:47 +02:00
ines
c0aceb9fbe
Add Hindi to supported languages
2017-10-14 15:16:41 +02:00
Ines Montani
e00a6c08cf
Merge pull request #1418 from polm/master
...
Contributor agreement
2017-10-14 15:10:58 +02:00
ines
266e7180a7
Add Language class, stop words and basic stemmer that sets NORM
2017-10-14 14:59:52 +02:00
ines
e85e1d571b
Update base punctuation
2017-10-14 14:59:23 +02:00
ines
9d6c8eaa49
Update base norm exceptions with more unicode characters
...
e.g. unicode variations of punctuation used in Chinese
2017-10-14 14:58:52 +02:00
ines
3516aa0cea
Port over changes from #1389
2017-10-14 13:32:55 +02:00
ines
cd6a29dce7
Port over changes from #1294
2017-10-14 13:28:46 +02:00
ines
38c756fd85
Port over changes from #1287
2017-10-14 13:16:21 +02:00
ines
612224c10d
Port over changes from #1157
2017-10-14 13:11:39 +02:00
ines
9b3f8f9ec3
Fix formatting and add comment on languages
2017-10-14 13:11:18 +02:00
ines
a4d974d97b
Port over URL pattern changes from #1411
2017-10-14 12:58:07 +02:00
ines
09aed58140
Port over changes from #1333 and add comments
2017-10-14 12:52:59 +02:00
ines
a5da683578
Add Russian to alpha docs and update tokenizer dependencies
2017-10-14 12:52:41 +02:00
ines
a69f4e56e5
Remove outdated aside
2017-10-14 12:52:07 +02:00
ines
bb6ecb82e5
Ensure long file paths in code examples break if needed
2017-10-14 12:51:52 +02:00
Paul O'Leary McCann
a31d33be06
Contributor agreement
2017-10-14 19:28:04 +09:00
Ines Montani
4b5af8bd17
Merge pull request #1414 from yuukos/master
...
Adding Russian language support
2017-10-13 17:03:52 +02:00
Alex
95836abee1
Update CONTRIBUTORS.md
2017-10-13 21:02:19 +07:00
Alex
ce00405afc
Create yuukos.md
2017-10-13 21:00:15 +07:00
yuukos
6fb9d75bd2
fixed test for creating tokenizer
2017-10-13 15:51:03 +07:00
yuukos
a229b6e0de
added tests for Russian language
...
added tests for creating a Russian Language instance and a Russian tokenizer
2017-10-13 14:04:37 +07:00
yuukos
622b6d6270
updated Russian tokenizer
...
moved the attempt to import pymorph into __init__
2017-10-13 13:57:29 +07:00
ines
bfd9506f1d
Update extensions docs and add resources
2017-10-13 00:18:13 +02:00
ines
5f5d6897e8
Increment version
2017-10-13 00:18:02 +02:00