Commit Graph

11857 Commits

Author SHA1 Message Date
ines
bb6ecb82e5 Ensure long file paths in code examples break if needed 2017-10-14 12:51:52 +02:00
Paul O'Leary McCann
a31d33be06 Contributor agreement 2017-10-14 19:28:04 +09:00
Ines Montani
4b5af8bd17 Merge pull request #1414 from yuukos/master
Adding Russian language support
2017-10-13 17:03:52 +02:00
Alex
95836abee1 Update CONTRIBUTORS.md 2017-10-13 21:02:19 +07:00
Alex
ce00405afc Create yuukos.md 2017-10-13 21:00:15 +07:00
yuukos
6fb9d75bd2 fixed test with creating tokenizer 2017-10-13 15:51:03 +07:00
yuukos
a229b6e0de added tests for Russian language
added tests of creating Russian Language instance and Russian tokenizer
2017-10-13 14:04:37 +07:00
yuukos
622b6d6270 updated Russian tokenizer
moved the trying to import pymorph into __init__
2017-10-13 13:57:29 +07:00
ines
bfd9506f1d Update extensions docs and add resources 2017-10-13 00:18:13 +02:00
ines
5f5d6897e8 Increment version 2017-10-13 00:18:02 +02:00
ines
9fd68334ab Add validate command docs 2017-10-12 23:36:48 +02:00
Matthew Honnibal
cf6da9301a Update lemmatizer test 2017-10-12 22:50:52 +02:00
Matthew Honnibal
9b90d235d1 Fix tag check in lemmatizer 2017-10-12 22:50:43 +02:00
Matthew Honnibal
dc01acd821 Escape encoding in validate function 2017-10-12 22:23:21 +02:00
Matthew Honnibal
27b927259a Add locale_escape compat function 2017-10-12 22:22:04 +02:00
Matthew Honnibal
e72603f39f Merge pull request #1416 from explosion/feature/cli-validate
💫 Add "validate" command to CLI
2017-10-12 21:45:20 +02:00
Matthew Honnibal
cb0e727c54 Merge pull request #1415 from IamJeffG/fix-alpha-example-train-ner-standalone
Bugfix example script train_ner_standalone.py, fails after training
2017-10-12 21:44:28 +02:00
ines
9c6de3dcfa Merge branch 'develop' into feature/cli-validate 2017-10-12 21:44:28 +02:00
Jeffrey Gerard
5ba970b495 minor cleanup 2017-10-12 12:34:46 -07:00
Matthew Honnibal
462caf835a Fix SBD test 2017-10-12 21:18:22 +02:00
Jeffrey Gerard
39d3cbfdba Bugfix example script train_ner_standalone.py, fails after training 2017-10-12 11:39:12 -07:00
ines
fff1028391 Add validate CLI command 2017-10-12 20:05:06 +02:00
yuukos
f81dd284eb updated spacy/__init__.py
registered russian language via set_lang_class
2017-10-12 22:28:34 +07:00
yuukos
7b9491679f added russian language support 2017-10-12 22:24:20 +07:00
yuukos
2a78f4d634 updated .gitignore file
added excluding PyCharm's idea directory
2017-10-12 22:23:19 +07:00
Matthew Honnibal
908f44c3fe Disable history features by default 2017-10-12 14:56:11 +02:00
Matthew Honnibal
a955843684 Increase default number of epochs 2017-10-12 13:13:01 +02:00
Matthew Honnibal
cecfcc7711 Set default hyper params back to 'slow' settings 2017-10-12 13:12:26 +02:00
Ines Montani
37aa523a8e Merge pull request #1408 from explosion/feature/dot-underscore
💫 Custom attributes via Doc._, Token._ and Span._
2017-10-11 18:35:56 +02:00
Matthew Honnibal
40dbc85ffa Merge pull request #1413 from explosion/feature/lemmatizer
💫 Integrate lookup lemmatization (9+ languages)
2017-10-11 17:54:36 +02:00
ines
8ce6f96180 Don't make copies of language data components 2017-10-11 15:34:55 +02:00
Ines Montani
a06b84e7cc Merge pull request #1407 from hscspring/patch-6
Update training.jade
2017-10-11 14:25:38 +02:00
ines
eac9e99086 Update docs on adding lemmatization to languages 2017-10-11 14:21:15 +02:00
ines
51519251c2 Fix underscore method test 2017-10-11 13:34:19 +02:00
ines
c6ae49e8bf Fix formatting 2017-10-11 13:34:11 +02:00
ines
453c47ca24 Add German lemmatizer tests 2017-10-11 13:27:26 +02:00
ines
15fe0fd82d Fix tests 2017-10-11 13:27:18 +02:00
ines
6dd14dc342 Add lookup lemmas to tokens without POS tags 2017-10-11 13:27:10 +02:00
ines
9620c1a640 Add lemma_lookup to Language defaults 2017-10-11 13:26:05 +02:00
ines
9fd471372a Add lookup lemmatizer to lemmatizer as lookup() method 2017-10-11 13:25:51 +02:00
ines
e0ff145a8b Merge branch 'develop' into feature/dot-underscore 2017-10-11 11:57:05 +02:00
ines
c1d6d43c83 Merge branch 'develop' into feature/lemmatizer 2017-10-11 11:56:35 +02:00
Ines Montani
ffc2fef13c Merge pull request #1411 from raphael0202/issue_1078
Resolve issue #1078 by simplifying URL pattern
2017-10-11 11:54:57 +02:00
Raphaël Bournhonesque
3452d6ce52 Resolve issue #1078 by simplifying URL pattern
- avoid catastrophic backtracking
- reduce character range of host name, domain name and TLD identifier
2017-10-11 11:24:00 +02:00
Matthew Honnibal
17c467e0ab Avoid clobbering existing lemmas 2017-10-11 03:33:06 -05:00
Matthew Honnibal
807e109f2b Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-11 02:47:59 -05:00
Matthew Honnibal
6e552c9d83 Prune number of non-projective labels more aggressively 2017-10-11 02:46:44 -05:00
Matthew Honnibal
76fe24f44d Improve embedding defaults 2017-10-11 09:44:17 +02:00
Matthew Honnibal
188f620046 Improve parser defaults 2017-10-11 09:43:48 +02:00
Matthew Honnibal
acba2e1051 Fix metadata in training 2017-10-11 08:55:52 +02:00