Commit Graph

7669 Commits

Author SHA1 Message Date
yuukos
92931a2efd Merge branch 'russian_language' 2017-10-16 13:46:28 +07:00
yuukos
241d19a3e6 fixed Russian Tokenizer
- added trailing space flags for tokens
2017-10-16 13:37:05 +07:00
Paul O'Leary McCann
71ae8013ec [ja] Use user_data instead of a wrapper class
Instead of using a JapaneseDoc wrapper class to store Mecab output,
stash it in `user_data`. -POLM
2017-10-16 00:24:34 +09:00
Paul O'Leary McCann
43eedf73f2 [ja] Stash tokenizer output for speed
Before this commit, the Mecab tokenizer had to be called twice when
creating a Doc: once during tokenization and once during tagging. This
creates a JapaneseDoc wrapper class for Doc that stashes the parsed
tokenizer output to remove redundant processing. -POLM
2017-10-15 23:33:25 +09:00
ines
15514dc333 Add section on upgrading 2017-10-14 22:14:47 +02:00
ines
c0aceb9fbe Add Hindi to supported languages 2017-10-14 15:16:41 +02:00
Ines Montani
e00a6c08cf Merge pull request #1418 from polm/master
Contributor agreement
2017-10-14 15:10:58 +02:00
ines
266e7180a7 Add Language class, stop words and basic stemmer that sets NORM 2017-10-14 14:59:52 +02:00
ines
e85e1d571b Update base punctuation 2017-10-14 14:59:23 +02:00
ines
9d6c8eaa49 Update base norm exceptions with more unicode characters
e.g. unicode variations of punctuation used in Chinese
2017-10-14 14:58:52 +02:00
ines
3516aa0cea Port over changes from #1389 2017-10-14 13:32:55 +02:00
ines
cd6a29dce7 Port over changes from #1294 2017-10-14 13:28:46 +02:00
ines
38c756fd85 Port over changes from #1287 2017-10-14 13:16:21 +02:00
ines
612224c10d Port over changes from #1157 2017-10-14 13:11:39 +02:00
ines
9b3f8f9ec3 Fix formatting and add comment on languages 2017-10-14 13:11:18 +02:00
ines
a4d974d97b Port over URL pattern changes from #1411 2017-10-14 12:58:07 +02:00
ines
09aed58140 Port over changes from #1333 and add comments 2017-10-14 12:52:59 +02:00
ines
a5da683578 Add Russian to alpha docs and update tokenizer dependencies 2017-10-14 12:52:41 +02:00
ines
a69f4e56e5 Remove outdated aside 2017-10-14 12:52:07 +02:00
ines
bb6ecb82e5 Ensure long file paths in code examples break if needed 2017-10-14 12:51:52 +02:00
Paul O'Leary McCann
a31d33be06 Contributor agreement 2017-10-14 19:28:04 +09:00
Ines Montani
4b5af8bd17 Merge pull request #1414 from yuukos/master
Adding Russian language support
2017-10-13 17:03:52 +02:00
Alex
95836abee1 Update CONTRIBUTORS.md 2017-10-13 21:02:19 +07:00
Alex
ce00405afc Create yuukos.md 2017-10-13 21:00:15 +07:00
yuukos
6fb9d75bd2 fixed test with creating tokenizer 2017-10-13 15:51:03 +07:00
yuukos
a229b6e0de added tests for Russian language
added tests for creating a Russian Language instance and a Russian tokenizer
2017-10-13 14:04:37 +07:00
yuukos
622b6d6270 updated Russian tokenizer
moved the attempted pymorph import into __init__
2017-10-13 13:57:29 +07:00
ines
bfd9506f1d Update extensions docs and add resources 2017-10-13 00:18:13 +02:00
ines
5f5d6897e8 Increment version 2017-10-13 00:18:02 +02:00
ines
9fd68334ab Add validate command docs 2017-10-12 23:36:48 +02:00
Matthew Honnibal
cf6da9301a Update lemmatizer test 2017-10-12 22:50:52 +02:00
Matthew Honnibal
9b90d235d1 Fix tag check in lemmatizer 2017-10-12 22:50:43 +02:00
Matthew Honnibal
dc01acd821 Escape encoding in validate function 2017-10-12 22:23:21 +02:00
Matthew Honnibal
27b927259a Add locale_escape compat function 2017-10-12 22:22:04 +02:00
Matthew Honnibal
e72603f39f Merge pull request #1416 from explosion/feature/cli-validate
💫 Add "validate" command to CLI
2017-10-12 21:45:20 +02:00
Matthew Honnibal
cb0e727c54 Merge pull request #1415 from IamJeffG/fix-alpha-example-train-ner-standalone
Bugfix example script train_ner_standalone.py, fails after training
2017-10-12 21:44:28 +02:00
ines
9c6de3dcfa Merge branch 'develop' into feature/cli-validate 2017-10-12 21:44:28 +02:00
Jeffrey Gerard
5ba970b495 minor cleanup 2017-10-12 12:34:46 -07:00
Matthew Honnibal
462caf835a Fix SBD test 2017-10-12 21:18:22 +02:00
Jeffrey Gerard
39d3cbfdba Bugfix example script train_ner_standalone.py, fails after training 2017-10-12 11:39:12 -07:00
ines
fff1028391 Add validate CLI command 2017-10-12 20:05:06 +02:00
yuukos
f81dd284eb updated spacy/__init__.py
registered the Russian language via set_lang_class
2017-10-12 22:28:34 +07:00
yuukos
7b9491679f added Russian language support 2017-10-12 22:24:20 +07:00
yuukos
2a78f4d634 updated .gitignore file
excluded PyCharm's idea directory
2017-10-12 22:23:19 +07:00
Matthew Honnibal
908f44c3fe Disable history features by default 2017-10-12 14:56:11 +02:00
Matthew Honnibal
a955843684 Increase default number of epochs 2017-10-12 13:13:01 +02:00
Matthew Honnibal
cecfcc7711 Set default hyper params back to 'slow' settings 2017-10-12 13:12:26 +02:00
Ines Montani
37aa523a8e Merge pull request #1408 from explosion/feature/dot-underscore
💫 Custom attributes via Doc._, Token._ and Span._
2017-10-11 18:35:56 +02:00
Matthew Honnibal
40dbc85ffa Merge pull request #1413 from explosion/feature/lemmatizer
💫 Integrate lookup lemmatization (9+ languages)
2017-10-11 17:54:36 +02:00
ines
8ce6f96180 Don't make copies of language data components 2017-10-11 15:34:55 +02:00