Commit Graph

8849 Commits

Author SHA1 Message Date
ines
bfd9506f1d Update extensions docs and add resources 2017-10-13 00:18:13 +02:00
ines
5f5d6897e8 Increment version 2017-10-13 00:18:02 +02:00
ines
9fd68334ab Add validate command docs 2017-10-12 23:36:48 +02:00
Matthew Honnibal
cf6da9301a Update lemmatizer test 2017-10-12 22:50:52 +02:00
Matthew Honnibal
9b90d235d1 Fix tag check in lemmatizer 2017-10-12 22:50:43 +02:00
Matthew Honnibal
dc01acd821 Escape encoding in validate function 2017-10-12 22:23:21 +02:00
Matthew Honnibal
27b927259a Add locale_escape compat function 2017-10-12 22:22:04 +02:00
Matthew Honnibal
e72603f39f Merge pull request #1416 from explosion/feature/cli-validate
💫 Add "validate" command to CLI
2017-10-12 21:45:20 +02:00
Matthew Honnibal
cb0e727c54 Merge pull request #1415 from IamJeffG/fix-alpha-example-train-ner-standalone
Bugfix example script train_ner_standalone.py, fails after training
2017-10-12 21:44:28 +02:00
ines
9c6de3dcfa Merge branch 'develop' into feature/cli-validate 2017-10-12 21:44:28 +02:00
Jeffrey Gerard
5ba970b495 minor cleanup 2017-10-12 12:34:46 -07:00
Matthew Honnibal
462caf835a Fix SBD test 2017-10-12 21:18:22 +02:00
Jeffrey Gerard
39d3cbfdba Bugfix example script train_ner_standalone.py, fails after training 2017-10-12 11:39:12 -07:00
ines
fff1028391 Add validate CLI command 2017-10-12 20:05:06 +02:00
yuukos
f81dd284eb updated spacy/__init__.py
registered russian language via set_lang_class
2017-10-12 22:28:34 +07:00
yuukos
7b9491679f added russian language support 2017-10-12 22:24:20 +07:00
yuukos
2a78f4d634 updated .gitignore file
added excluding PyCharm's idea directory
2017-10-12 22:23:19 +07:00
Matthew Honnibal
908f44c3fe Disable history features by default 2017-10-12 14:56:11 +02:00
Matthew Honnibal
a955843684 Increase default number of epochs 2017-10-12 13:13:01 +02:00
Matthew Honnibal
cecfcc7711 Set default hyper params back to 'slow' settings 2017-10-12 13:12:26 +02:00
Ines Montani
37aa523a8e Merge pull request #1408 from explosion/feature/dot-underscore
💫 Custom attributes via Doc._, Token._ and Span._
2017-10-11 18:35:56 +02:00
Matthew Honnibal
40dbc85ffa Merge pull request #1413 from explosion/feature/lemmatizer
💫  Integrate lookup lemmatization (9+ languages)
2017-10-11 17:54:36 +02:00
ines
8ce6f96180 Don't make copies of language data components 2017-10-11 15:34:55 +02:00
Ines Montani
a06b84e7cc Merge pull request #1407 from hscspring/patch-6
Update training.jade
2017-10-11 14:25:38 +02:00
ines
eac9e99086 Update docs on adding lemmatization to languages 2017-10-11 14:21:15 +02:00
ines
51519251c2 Fix underscore method test 2017-10-11 13:34:19 +02:00
ines
c6ae49e8bf Fix formatting 2017-10-11 13:34:11 +02:00
ines
453c47ca24 Add German lemmatizer tests 2017-10-11 13:27:26 +02:00
ines
15fe0fd82d Fix tests 2017-10-11 13:27:18 +02:00
ines
6dd14dc342 Add lookup lemmas to tokens without POS tags 2017-10-11 13:27:10 +02:00
ines
9620c1a640 Add lemma_lookup to Language defaults 2017-10-11 13:26:05 +02:00
ines
9fd471372a Add lookup lemmatizer to lemmatizer as lookup() method 2017-10-11 13:25:51 +02:00
ines
e0ff145a8b Merge branch 'develop' into feature/dot-underscore 2017-10-11 11:57:05 +02:00
ines
c1d6d43c83 Merge branch 'develop' into feature/lemmatizer 2017-10-11 11:56:35 +02:00
Ines Montani
ffc2fef13c Merge pull request #1411 from raphael0202/issue_1078
Resolve issue #1078 by simplifying URL pattern
2017-10-11 11:54:57 +02:00
Raphaël Bournhonesque
3452d6ce52 Resolve issue #1078 by simplifying URL pattern
- avoid catastrophic backtracking
- reduce character range of host name, domain name and TLD identifier
2017-10-11 11:24:00 +02:00
Matthew Honnibal
17c467e0ab Avoid clobbering existing lemmas 2017-10-11 03:33:06 -05:00
Matthew Honnibal
807e109f2b Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-11 02:47:59 -05:00
Matthew Honnibal
6e552c9d83 Prune number of non-projective labels more aggressiely 2017-10-11 02:46:44 -05:00
Matthew Honnibal
76fe24f44d Improve embedding defaults 2017-10-11 09:44:17 +02:00
Matthew Honnibal
188f620046 Improve parser defaults 2017-10-11 09:43:48 +02:00
Matthew Honnibal
acba2e1051 Fix metadata in training 2017-10-11 08:55:52 +02:00
Matthew Honnibal
74c2c6a58c Add default name and lang to meta 2017-10-11 08:49:12 +02:00
Matthew Honnibal
3814a161e6 Avoid clobbering preset lemmas 2017-10-11 08:41:03 +02:00
Matthew Honnibal
fd47f8e89f Fix failing test 2017-10-11 08:38:34 +02:00
Matthew Honnibal
462b2e26b4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-11 08:23:04 +02:00
Matthew Honnibal
a6ac4699eb Allow Morphology class to setup tokens
Add Morphology.assign_untagged() C-method, and call it from
Doc.push_back() when a token is created. This gives a place
to allow the Morphology class to initialize token data.
2017-10-11 03:24:14 +02:00
Matthew Honnibal
3b527fa52b Call morphology.assign_untagged when pushing token to Doc 2017-10-11 03:23:57 +02:00
Matthew Honnibal
c15d8278cb Avoid lemmatizing inappropriate tags in English lemmatizer 2017-10-11 03:23:23 +02:00
Matthew Honnibal
d528b6e36d Add assign_untagged method in Morphology 2017-10-11 03:22:49 +02:00