Matthew Honnibal
2c118ab3a6
Add tests for Doc creation
2017-10-11 03:21:23 +02:00
ines
820bf85075
Move LookupLemmatizer to spacy.lemmatizer
2017-10-11 02:25:13 +02:00
ines
417d45f5d0
Add lemmatizer data as variable on language data
...
Don't create lookup lemmatizer within Language class and just pass in
the data so it can be set on Token creation
2017-10-11 02:24:58 +02:00
ines
0c2343d73a
Tidy up language data
2017-10-11 02:22:49 +02:00
Matthew Honnibal
d84136b4a9
Update add label test
2017-10-10 22:57:41 +02:00
Matthew Honnibal
3065f12ef2
Make add parser label work for hidden_depth=0
2017-10-10 22:57:31 +02:00
ines
bfd58dd0fc
Merge branch 'develop' into feature/dot-underscore
2017-10-10 22:03:51 +02:00
Matthew Honnibal
73bca3d382
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-10 12:51:37 -05:00
Matthew Honnibal
5156074df1
Make loading code more consistent in train command
2017-10-10 12:51:20 -05:00
Matthew Honnibal
d70fba6807
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-10 19:33:10 +02:00
Matthew Honnibal
8143618497
Set prefix length back to 1
2017-10-10 19:32:54 +02:00
Matthew Honnibal
97c9b5db8b
Patch spacy.train for new pipeline management
2017-10-09 23:41:16 -05:00
Matthew Honnibal
a635240398
Add conll_ner2json converter
2017-10-09 22:03:26 -05:00
Matthew Honnibal
e0a9b02b67
Merge Span._ and Span.as_doc methods
2017-10-09 22:00:15 -05:00
Matthew Honnibal
dce8afb9cf
Set prefix length to 3
2017-10-09 21:55:55 -05:00
Matthew Honnibal
8265b90c83
Update parser defaults
2017-10-09 21:55:20 -05:00
Matthew Honnibal
dd2b0601d1
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-09 21:30:46 -05:00
Matthew Honnibal
09d61ada5e
Merge pull request #1396 from explosion/feature/pipeline-management
...
💫 Improve pipeline and factory management
2017-10-10 04:29:54 +02:00
ines
67350fa496
Use better logic for auto-generating component name
...
Instances don't have __name__, so we try __class__.__name__ as well,
before giving up and defaulting to repr(component).
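A minimal sketch of the fallback order described above. The function name `get_component_name` is hypothetical and is not taken from the spaCy source; only the lookup order (`__name__`, then `__class__.__name__`, then `repr(component)`) comes from the commit message.

```python
def get_component_name(component):
    """Guess a display name for a pipeline component (illustrative helper)."""
    # Functions and classes expose __name__ directly.
    if hasattr(component, "__name__"):
        return component.__name__
    # Instances don't have __name__, so fall back to the class name.
    if hasattr(component, "__class__") and hasattr(component.__class__, "__name__"):
        return component.__class__.__name__
    # Last resort: the repr of the object.
    return repr(component)


def my_component(doc):
    return doc


class CustomTagger(object):
    def __call__(self, doc):
        return doc


function_name = get_component_name(my_component)
instance_name = get_component_name(CustomTagger())
```

Here a plain function is named after itself, while an instance of a callable class picks up the name of its class.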
2017-10-10 04:23:05 +02:00
ines
3fc4fe61d2
Fix typo
2017-10-10 04:15:14 +02:00
ines
59c4f27499
Add get, set and has methods to Underscore
2017-10-10 04:14:35 +02:00
Matthew Honnibal
19136fd155
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-10 03:58:30 +02:00
Matthew Honnibal
8978212ee5
Patch serialization bug raised in #1105
2017-10-10 03:58:12 +02:00
Matthew Honnibal
f0f2739ae3
Add test for serialization issue raised in #1105
2017-10-10 03:57:58 +02:00
Matthew Honnibal
735d18654d
Add NER converter for CoNLL 2003 data
2017-10-09 20:06:28 -05:00
Matthew Honnibal
51d18937af
Partially apply doc/span/token into method
...
We want methods to act like they're "bound" to the object, so that you can make your method conditional on the `doc`, `span` or `token` instance (like, well, a method). We therefore partially apply the function, which works like this:
```
def partial(unbound_method, constant_arg):
    # Return a callable that always passes constant_arg as the first argument.
    def bound_method(*args, **kwargs):
        return unbound_method(constant_arg, *args, **kwargs)
    return bound_method
```
2017-10-10 02:21:28 +02:00
Matthew Honnibal
808d8740d6
Remove print statement
2017-10-09 08:45:20 -05:00
Matthew Honnibal
0f41b25f60
Add speed benchmarks to metadata
2017-10-09 08:05:37 -05:00
ines
de374dc72a
Merge branch 'feature/pipeline-management' into feature/dot-underscore
2017-10-09 14:37:51 +02:00
Matthew Honnibal
2534cd57d7
Add bandaid solution to the 'shadowing' problem in #864
2017-10-09 08:59:35 +02:00
Matthew Honnibal
d8a2506023
Merge pull request #1401 from explosion/feature/add-parser-action
...
💫 Allow labels to be added to pre-trained parser and NER models
2017-10-09 04:57:51 +02:00
Matthew Honnibal
689349e32f
Merge pull request #1400 from explosion/feature/sentence-parsing
...
💫 Force parser to respect preset sentence boundaries
2017-10-09 04:31:43 +02:00
Matthew Honnibal
e79fc41ff8
Merge pull request #1391 from explosion/feature/multilabel-textcat
...
💫 Fix multi-label support for text classification
2017-10-09 04:22:31 +02:00
Matthew Honnibal
fad2b8315f
Merge branch 'develop' into feature/add-parser-action
2017-10-09 04:13:04 +02:00
Matthew Honnibal
6c79841c0d
Fix tests for history features
2017-10-09 04:12:24 +02:00
Matthew Honnibal
dde87e6b0d
Add tests for adding parser actions
2017-10-09 03:42:35 +02:00
Matthew Honnibal
b2b8506f2c
Remove whitespace
2017-10-09 03:35:57 +02:00
Matthew Honnibal
d43a83e37a
Allow parser.add_label for pretrained models
2017-10-09 03:35:40 +02:00
Matthew Honnibal
81a64119db
Fix string-to-unicode problem
2017-10-09 00:59:49 +02:00
Matthew Honnibal
02c2af7119
Fix test
2017-10-09 00:29:37 +02:00
Matthew Honnibal
4cc84b0234
Prohibit Break when sent_start < 0
2017-10-09 00:02:45 +02:00
Matthew Honnibal
5a67efeccc
Add tests for sentence segmentation presetting
2017-10-09 00:02:23 +02:00
Matthew Honnibal
e938bce320
Adjust parsing transition system to allow preset sentence segments.
2017-10-08 23:53:34 +02:00
Matthew Honnibal
080afd4924
Add ternary value setting to Token.sent_start
2017-10-08 23:51:58 +02:00
Matthew Honnibal
7ae67ec6a1
Add Span.as_doc method
2017-10-08 23:50:20 +02:00
Matthew Honnibal
20309fb9db
Make history features default to zero
2017-10-08 20:32:14 +02:00
Matthew Honnibal
e74c8d2fad
Merge remote-tracking branch 'origin/develop' into feature/sentence-parsing
2017-10-08 20:20:41 +02:00
Matthew Honnibal
18063803de
Make TokenC.sent_start an int, to allow ternary value
2017-10-08 19:58:54 +02:00
Matthew Honnibal
be4f0b6460
Update defaults
2017-10-08 02:08:12 -05:00
Matthew Honnibal
42b401d08b
Change default hidden depth to 1
2017-10-07 21:05:21 -05:00
Matthew Honnibal
9d66a915da
Update training defaults
2017-10-07 21:02:38 -05:00
Matthew Honnibal
d163115e91
Add non-linearity after history features
2017-10-07 21:00:43 -05:00
Matthew Honnibal
92c5d78b42
Unhack NER.add_action
2017-10-07 19:02:40 +02:00
Matthew Honnibal
f2b590f672
Increment version
2017-10-07 19:01:01 +02:00
Matthew Honnibal
9bd8191739
Add tests for Underscore
2017-10-07 18:56:19 +02:00
Matthew Honnibal
668a0ea640
Pass extensions into Underscore class
2017-10-07 18:56:01 +02:00
Matthew Honnibal
1289129fd9
Add Underscore class
2017-10-07 18:00:14 +02:00
Matthew Honnibal
eb0595bea9
Merge pull request #1392 from explosion/feature/parser-history-model
...
💫 Parser history features
2017-10-07 15:07:02 +02:00
Matthew Honnibal
3d22ccf495
Update default hyper-parameters
2017-10-07 07:16:41 -05:00
Matthew Honnibal
09442d25ec
Merge remote-tracking branch 'origin/develop' into feature/parser-history-model
2017-10-07 07:05:04 -05:00
Matthew Honnibal
3b67eabfea
Allow empty dictionaries to match any token in Matcher
...
Often patterns need to match "any token". A clean way to denote this
is with the empty dict {}: this sets no constraints on the token,
so should always match.
The problem was that having attributes length==0 was used as an
end-of-array signal, so the matcher didn't handle this case correctly.
This patch compiles empty token spec dicts into a constraint
NULL_ATTR==0. The NULL_ATTR attribute, 0, is always set to 0 on the
lexeme -- so this always matches.
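The idea can be sketched in plain Python, without the real Matcher internals. Everything here except the name `NULL_ATTR` and its value 0 is an assumption made for illustration: the helper names, the `ORTH` ID and the dict-based token representation are hypothetical.

```python
NULL_ATTR = 0  # attribute ID 0; its value is always 0 on every lexeme
ORTH = 1       # hypothetical attribute ID, for illustration only


def compile_token_spec(spec):
    """Compile {attr_id: value} into a list of (attr_id, value) constraints."""
    if not spec:
        # An empty constraint list would look like the end-of-array signal,
        # so emit one constraint that every token satisfies instead.
        return [(NULL_ATTR, 0)]
    return sorted(spec.items())


def token_matches(token_attrs, constraints):
    """Check a token (a dict of attr_id -> value) against the constraints."""
    return all(token_attrs.get(attr, 0) == value for attr, value in constraints)


any_token = compile_token_spec({})
specific = compile_token_spec({ORTH: 42})
```

With this encoding the compiled form of `{}` is never empty, so it cannot be confused with a terminator, and it still places no real constraint on the token.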
2017-10-07 03:36:15 +02:00
ines
0adadcb3f0
Fix beam parse model test
2017-10-07 02:15:15 +02:00
ines
b38a8f4a94
Fix and update pipe methods tests
2017-10-07 02:06:23 +02:00
Matthew Honnibal
0384f08218
Trigger nonproj.deprojectivize as a postprocess
2017-10-07 02:00:47 +02:00
Matthew Honnibal
3a65a0c970
Start adding tests for new pipeline management
2017-10-07 01:48:23 +02:00
ines
e43530269c
Update docstrings
2017-10-07 01:04:50 +02:00
ines
61a503a611
Fix parser test
2017-10-07 00:38:51 +02:00
ines
b39409173e
Add disable option and True/False/None values for pipeline
2017-10-07 00:29:08 +02:00
ines
2586b61b15
Fix formatting, tidy up and remove unused imports
2017-10-07 00:26:05 +02:00
ines
212c8f0711
Implement new Language methods and pipeline API
2017-10-07 00:25:54 +02:00
Matthew Honnibal
8be46d766e
Remove print statement
2017-10-06 16:19:02 -05:00
Matthew Honnibal
8e731009fe
Fix parser config serialization
2017-10-06 13:50:52 -05:00
Matthew Honnibal
f4c9a98166
Fix spacy evaluate command on non-GPU
2017-10-06 13:17:47 -05:00
Matthew Honnibal
16ba6aa8a6
Fix parser config serialization
2017-10-06 13:17:31 -05:00
Matthew Honnibal
c66399d8ae
Fix depth definition with history features
2017-10-06 06:20:05 -05:00
Matthew Honnibal
5c750a9c2f
Reserve 0 for 'missing' in history features
2017-10-06 06:10:13 -05:00
Matthew Honnibal
fbba7c517e
Pass dropout through to embed tables
2017-10-06 06:09:18 -05:00
Matthew Honnibal
21d11936fe
Fix significant train/test skew error in history feats
2017-10-06 06:08:50 -05:00
Matthew Honnibal
555d8c8bff
Fix beam history features
2017-10-05 22:21:50 -05:00
Matthew Honnibal
3db0a32fd6
Fix dropout for history features
2017-10-05 22:21:30 -05:00
Matthew Honnibal
b0618def8d
Add support for 2-token state option
2017-10-05 21:54:12 -05:00
Matthew Honnibal
363aa47b40
Clean up dead parsing code
2017-10-05 21:53:49 -05:00
Matthew Honnibal
ca12764772
Enable history features for beam parser
2017-10-05 21:53:29 -05:00
Matthew Honnibal
fc06b0a333
Fix training when hist_size==0
2017-10-05 21:52:28 -05:00
Matthew Honnibal
e25ffcb11f
Move history size under feature flags
2017-10-05 19:38:13 -05:00
Matthew Honnibal
563f46f026
Fix multi-label support for text classification
...
The TextCategorizer class is supposed to support multi-label
text classification, and allow training data to contain missing
values.
For this to work, the gradient of the loss should be 0 when labels
are missing. Instead, there was no way to actually denote "missing"
in the GoldParse class, and so the TextCategorizer class treated
the label set within gold.cats as complete.
To fix this, we change GoldParse.cats to be a dict instead of a list.
The GoldParse.cats dict should map to floats, with 1. denoting
'present' and 0. denoting 'absent'. Gradients are zeroed for categories
absent from the gold.cats dict. A nice bonus is that you can also set
values between 0 and 1 for partial membership. You can also set numeric
values, if you're using a text classification model that uses an
appropriate loss function.
Unfortunately this is a breaking change; although the functionality
was only recently introduced and hasn't been properly documented
yet. I've updated the example script accordingly.
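A rough sketch of how missing labels can be given a zero gradient. The squared-error style gradient, the function name `textcat_gradient` and the label names are assumptions for illustration; the commit message does not say which loss the TextCategorizer uses.

```python
def textcat_gradient(scores, cats, labels):
    """Gradient of a squared-error loss with respect to the scores.

    Labels absent from `cats` are treated as missing and get a zero gradient.
    """
    gradient = []
    for label, score in zip(labels, scores):
        if label in cats:
            # 1.0 means 'present', 0.0 means 'absent', values in between
            # express partial membership.
            gradient.append(score - cats[label])
        else:
            # Missing annotation: this label contributes nothing to the update.
            gradient.append(0.0)
    return gradient


labels = ["SPORTS", "POLITICS", "TECH"]
scores = [0.75, 0.25, 0.5]
cats = {"SPORTS": 1.0, "POLITICS": 0.0}  # TECH is missing, not negative
gradient = textcat_gradient(scores, cats, labels)
```

The distinction that matters is between `"TECH": 0.0`, which pushes the score down, and leaving `"TECH"` out of the dict, which leaves the score untouched.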
2017-10-05 18:43:02 -05:00
Matthew Honnibal
c6cd81f192
Wrap try/except around model saving
2017-10-05 08:14:24 -05:00
Matthew Honnibal
5743b06e36
Wrap model saving in try/except
2017-10-05 08:12:50 -05:00
Matthew Honnibal
fd4baff475
Update tests
2017-10-05 08:12:27 -05:00
Matthew Honnibal
dcdfa071aa
Disable LayerNorm hack
2017-10-04 20:06:52 -05:00
Matthew Honnibal
943af4423a
Make depth setting in parser work again
2017-10-04 20:06:05 -05:00
Matthew Honnibal
bfabc333be
Merge remote-tracking branch 'origin/develop' into feature/parser-history-model
2017-10-04 20:00:36 -05:00
Matthew Honnibal
92066b04d6
Fix Embed and HistoryFeatures
2017-10-04 19:55:34 -05:00
Matthew Honnibal
d903986439
Increment version
2017-10-04 17:14:26 +02:00
Matthew Honnibal
40edb65ee7
Make test work for Python 2.7
2017-10-04 16:36:50 +02:00
Matthew Honnibal
bd8e84998a
Add nO attribute to TextCategorizer model
2017-10-04 16:07:30 +02:00
Matthew Honnibal
f8a0614527
Improve textcat model slightly
2017-10-04 15:15:53 +02:00
Matthew Honnibal
39798b0172
Uncomment layernorm adjustment hack
2017-10-04 15:12:09 +02:00
Matthew Honnibal
b3a7082bf8
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-04 14:56:46 +02:00
Matthew Honnibal
db05d4d582
Add test for #1380. Passes without fix?
2017-10-04 14:56:31 +02:00
Matthew Honnibal
774f5732bd
Fix dimensionality of textcat when no vectors available
2017-10-04 14:55:15 +02:00
Ines Montani
28ba0b9b51
Merge pull request #1385 from explosion/feature/new-website
...
💫 New spaCy website
2017-10-04 14:35:52 +02:00
Matthew Honnibal
af75b74208
Unset LayerNorm backwards compat hack
2017-10-03 20:47:10 -05:00
ines
73ac0aa0b5
Update spacy evaluate and add displaCy option
2017-10-04 00:03:15 +02:00
Matthew Honnibal
246612cb53
Merge remote-tracking branch 'origin/develop' into feature/parser-history-model
2017-10-03 16:56:42 -05:00
Matthew Honnibal
f24c2e3a8a
Fix evaluate for non-GPU
2017-10-03 22:47:31 +02:00
Matthew Honnibal
5cbefcba17
Set backwards compatibility flag
2017-10-03 20:29:58 +02:00
Matthew Honnibal
5454b20cd7
Update thinc imports for 6.9
2017-10-03 20:07:17 +02:00
Matthew Honnibal
4a59f6358c
Fix thinc imports
2017-10-03 19:21:26 +02:00
Matthew Honnibal
e514d6aa0a
Import thinc modules more explicitly, to avoid cycles
2017-10-03 18:49:25 +02:00
Matthew Honnibal
338e1fda0e
Unbreak merge artefact
2017-10-03 09:41:05 -05:00
Matthew Honnibal
1289187279
Fix circular import
2017-10-03 09:33:21 -05:00
Matthew Honnibal
a44c4c3a5b
Add timer to evaluate
2017-10-03 09:15:35 -05:00
Matthew Honnibal
96da86b3e5
Add support for verbose flag to Language
2017-10-03 09:14:57 -05:00
Matthew Honnibal
02586a5243
Add timing to spacy evaluate command
2017-10-03 09:14:34 -05:00
ines
e49cd7aeaf
Move import into load to avoid circular imports
2017-10-03 15:22:19 +02:00
ines
b0dfa059db
Update docs link in about.py
2017-10-03 15:19:55 +02:00
Matthew Honnibal
dc3c791947
Fix history size option
2017-10-03 13:41:23 +02:00
Matthew Honnibal
278a4c17c6
Fix history features
2017-10-03 13:27:10 +02:00
Matthew Honnibal
b770f4e108
Fix embed class in history features
2017-10-03 13:26:55 +02:00
Matthew Honnibal
b50a359e11
Add support for history features in parsing models
2017-10-03 12:44:01 +02:00
Matthew Honnibal
ee41e4fea7
Support history features in stateclass
2017-10-03 12:43:48 +02:00
Matthew Honnibal
6aa6a5bc25
Add a layer type for history features
2017-10-03 12:43:09 +02:00
Matthew Honnibal
8902df44de
Fix component disabling during training
2017-10-02 21:07:23 +02:00
Matthew Honnibal
c617d288d8
Update pipeline component names in spaCy train
2017-10-02 17:20:19 +02:00
Matthew Honnibal
f942903429
Improve sentence merging in iob2json
2017-10-02 17:02:10 +02:00
Matthew Honnibal
31681d20e0
Fix concatenation in iob2json converter
2017-10-02 16:50:26 +02:00
Matthew Honnibal
4896ce3320
Remove misleading comment
2017-10-02 00:09:14 +02:00
Matthew Honnibal
d90cc917fa
Merge vectors.pyx doc strings
2017-10-01 17:05:54 -05:00
Matthew Honnibal
b2a8b9be77
Fix inconsistency of Vectors class API
2017-10-01 17:00:34 -05:00
Matthew Honnibal
e38089d598
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-01 22:10:54 +02:00
Matthew Honnibal
97c409b602
Add docstrings for spacy.vectors
2017-10-01 22:10:33 +02:00
ines
b776f48e58
Fix typo
2017-10-01 21:58:45 +02:00
Matthew Honnibal
94df115a81
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-01 14:06:23 -05:00
Matthew Honnibal
2cf0f4622f
Fix loading of models with pre-trained vectors
2017-10-01 14:05:32 -05:00
Matthew Honnibal
69c7c642c2
Add spacy evaluate
2017-10-01 14:05:04 -05:00
ines
8dbe49ecb8
Always compare lowercase package names
...
Otherwise, is_package will return False if model name contains
uppercase characters. See this issue:
https://support.prodi.gy/t/saving-a-trained-ner-model-as-a-loadable-module/46/6
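A small sketch of the comparison described above. The helper `is_installed_name` is hypothetical and takes the list of installed names as an argument; the real `is_package` presumably inspects the environment itself, which is not reproduced here.

```python
def is_installed_name(name, installed_names):
    """Case-insensitive membership test for a package name."""
    name = name.lower()
    # Lowercase both sides so 'My_Model' and 'my_model' compare equal.
    return any(name == candidate.lower() for candidate in installed_names)


found = is_installed_name("My_Model", ["my_model", "numpy"])
missing = is_installed_name("other_model", ["my_model", "numpy"])
```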
2017-09-29 20:55:17 +02:00
ines
153c2589d4
Revert "Always compare lowercase package names"
...
This reverts commit 7d77dc490f.
2017-09-29 20:53:36 +02:00
ines
fd1a9225d8
Handle conversion of pipeline components correctly
...
Allow both comma and comma + whitespace as separators
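One way to accept both separators, as a sketch only. The function name `parse_pipeline` and the example component names are assumptions, not taken from the spaCy source.

```python
def parse_pipeline(value):
    """Split a pipeline string on commas, ignoring whitespace around names."""
    # 'tagger,parser' and 'tagger, parser' should give the same result.
    return [name.strip() for name in value.split(",") if name.strip()]


compact = parse_pipeline("tagger,parser,ner")
spaced = parse_pipeline("tagger, parser, ner")
```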
2017-09-29 20:52:56 +02:00
ines
7d77dc490f
Always compare lowercase package names
...
Otherwise, is_package will return False if model name contains
uppercase characters. See this issue:
https://support.prodi.gy/t/saving-a-trained-ner-model-as-a-loadable-module/46/6
2017-09-29 20:52:28 +02:00
Matthew Honnibal
cdb2d83e16
Pass dropout in parser
2017-09-28 18:47:13 -05:00
Matthew Honnibal
158e177cae
Fix default embed size
2017-09-28 08:25:23 -05:00
Matthew Honnibal
f6330d69e6
Default embed size to 7000
2017-09-28 08:07:41 -05:00
Matthew Honnibal
ac8481a7b0
Print NER loss
2017-09-28 08:05:31 -05:00
Matthew Honnibal
542ebfa498
Improve defaults
2017-09-27 18:54:37 -05:00
Matthew Honnibal
dcb86bdc43
Default batch size to 32
2017-09-27 11:48:19 -05:00
Matthew Honnibal
1a37a2c0a0
Update training defaults
2017-09-27 11:48:07 -05:00
Matthew Honnibal
13d7a97f3a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-27 11:44:37 -05:00
Matthew Honnibal
66c388ee01
Remove unhelpful multitask objectives
2017-09-27 11:44:16 -05:00
Matthew Honnibal
983201a83a
Fix hard-coded vector width
2017-09-27 11:43:58 -05:00