Ines Montani
ffc2fef13c
Merge pull request #1411 from raphael0202/issue_1078
Resolve issue #1078 by simplifying URL pattern
2017-10-11 11:54:57 +02:00
Raphaël Bournhonesque
3452d6ce52
Resolve issue #1078 by simplifying URL pattern
- avoid catastrophic backtracking
- reduce character range of host name, domain name and TLD identifier
2017-10-11 11:24:00 +02:00
Matthew Honnibal
17c467e0ab
Avoid clobbering existing lemmas
2017-10-11 03:33:06 -05:00
Matthew Honnibal
807e109f2b
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-11 02:47:59 -05:00
Matthew Honnibal
6e552c9d83
Prune number of non-projective labels more aggressively
2017-10-11 02:46:44 -05:00
Matthew Honnibal
76fe24f44d
Improve embedding defaults
2017-10-11 09:44:17 +02:00
Matthew Honnibal
188f620046
Improve parser defaults
2017-10-11 09:43:48 +02:00
Matthew Honnibal
acba2e1051
Fix metadata in training
2017-10-11 08:55:52 +02:00
Matthew Honnibal
74c2c6a58c
Add default name and lang to meta
2017-10-11 08:49:12 +02:00
Matthew Honnibal
3814a161e6
Avoid clobbering preset lemmas
2017-10-11 08:41:03 +02:00
Matthew Honnibal
fd47f8e89f
Fix failing test
2017-10-11 08:38:34 +02:00
Matthew Honnibal
462b2e26b4
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-11 08:23:04 +02:00
Matthew Honnibal
a6ac4699eb
Allow Morphology class to setup tokens
Add Morphology.assign_untagged() C-method, and call it from
Doc.push_back() when a token is created. This gives a place
to allow the Morphology class to initialize token data.
2017-10-11 03:24:14 +02:00
Matthew Honnibal
3b527fa52b
Call morphology.assign_untagged when pushing token to Doc
2017-10-11 03:23:57 +02:00
Matthew Honnibal
c15d8278cb
Avoid lemmatizing inappropriate tags in English lemmatizer
2017-10-11 03:23:23 +02:00
Matthew Honnibal
d528b6e36d
Add assign_untagged method in Morphology
2017-10-11 03:22:49 +02:00
Matthew Honnibal
2c118ab3a6
Add tests for Doc creation
2017-10-11 03:21:23 +02:00
ines
f4ae6763b9
Fix consistency of imports from spacy.tokens in examples
2017-10-11 02:30:40 +02:00
ines
820bf85075
Move LookupLemmatizer to spacy.lemmatizer
2017-10-11 02:25:13 +02:00
ines
417d45f5d0
Add lemmatizer data as variable on language data
Don't create lookup lemmatizer within Language class and just pass in
the data so it can be set on Token creation
2017-10-11 02:24:58 +02:00
ines
0c2343d73a
Tidy up language data
2017-10-11 02:22:49 +02:00
Matthew Honnibal
d84136b4a9
Update add label test
2017-10-10 22:57:41 +02:00
Matthew Honnibal
3065f12ef2
Make add parser label work for hidden_depth=0
2017-10-10 22:57:31 +02:00
ines
bfd58dd0fc
Merge branch 'develop' into feature/dot-underscore
2017-10-10 22:03:51 +02:00
Matthew Honnibal
73bca3d382
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-10 12:51:37 -05:00
Matthew Honnibal
5156074df1
Make loading code more consistent in train command
2017-10-10 12:51:20 -05:00
Matthew Honnibal
d70fba6807
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-10 19:33:10 +02:00
Matthew Honnibal
8143618497
Set prefix length back to 1
2017-10-10 19:32:54 +02:00
Matthew Honnibal
97c9b5db8b
Patch spacy.train for new pipeline management
2017-10-09 23:41:16 -05:00
ines
19598ebfee
Update migration guide
2017-10-10 06:38:11 +02:00
ines
9c96a6e131
Update pipelines section in v2 overview
2017-10-10 06:33:53 +02:00
Matthew Honnibal
a635240398
Add conll_ner2json converter
2017-10-09 22:03:26 -05:00
Matthew Honnibal
e0a9b02b67
Merge Span._ and Span.as_doc methods
2017-10-09 22:00:15 -05:00
Matthew Honnibal
dce8afb9cf
Set prefix length to 3
2017-10-09 21:55:55 -05:00
Matthew Honnibal
8265b90c83
Update parser defaults
2017-10-09 21:55:20 -05:00
Yam
efe0800f91
Update training.jade
fix several changes
2017-10-09 21:39:15 -05:00
Matthew Honnibal
dd2b0601d1
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-09 21:30:46 -05:00
Matthew Honnibal
09d61ada5e
Merge pull request #1396 from explosion/feature/pipeline-management
💫 Improve pipeline and factory management
2017-10-10 04:29:54 +02:00
ines
6679117000
Add pipeline component examples
2017-10-10 04:26:06 +02:00
ines
7a592d01dc
Update pipeline component usage docs
2017-10-10 04:24:39 +02:00
ines
3d5154811a
Fix typo
2017-10-10 04:24:22 +02:00
ines
43b70651fb
Document extension methods on Doc, Token and Span
set_extension, get_extension, has_extension
2017-10-10 04:23:37 +02:00
ines
67350fa496
Use better logic for auto-generating component name
Instances don't have __name__, so we try __class__.__name__ as well,
before giving up and defaulting to repr(component).
2017-10-10 04:23:05 +02:00
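The fallback order described in the commit above (try __name__, then __class__.__name__, then repr(component)) can be sketched in Python as follows. This is a minimal illustration under that description only, not spaCy's actual implementation; the function name get_component_name and the example components tagger and EntityMatcher are hypothetical.

```python
def get_component_name(component):
    """Hypothetical sketch of the naming fallback described in the commit."""
    # Functions and classes expose __name__ directly.
    name = getattr(component, "__name__", None)
    if name is not None:
        return name
    # Instances don't see the type's __name__, so fall back to the class name.
    cls_name = getattr(getattr(component, "__class__", None), "__name__", None)
    if cls_name is not None:
        return cls_name
    # Last resort, kept to mirror the described order: the object's repr.
    return repr(component)


# Hypothetical pipeline components used only for illustration.
def tagger(doc):
    return doc


class EntityMatcher:
    def __call__(self, doc):
        return doc


print(get_component_name(tagger))           # tagger
print(get_component_name(EntityMatcher()))  # EntityMatcher
```

For ordinary Python objects the class-name branch always succeeds, so the repr fallback is effectively a safeguard rather than a common path.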
ines
b4fc6b203c
Rename mixin
2017-10-10 04:22:23 +02:00
ines
3fc4fe61d2
Fix typo
2017-10-10 04:15:14 +02:00
ines
59c4f27499
Add get, set and has methods to Underscore
2017-10-10 04:14:35 +02:00
Matthew Honnibal
19136fd155
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-10 03:58:30 +02:00
Matthew Honnibal
8978212ee5
Patch serialization bug raised in #1105
2017-10-10 03:58:12 +02:00
Matthew Honnibal
f0f2739ae3
Add test for serialization issue raised in #1105
2017-10-10 03:57:58 +02:00
Matthew Honnibal
735d18654d
Add NER converter for CoNLL 2003 data
2017-10-09 20:06:28 -05:00