ines
612224c10d
Port over changes from #1157
2017-10-14 13:11:39 +02:00
ines
9b3f8f9ec3
Fix formatting and add comment on languages
2017-10-14 13:11:18 +02:00
ines
a4d974d97b
Port over URL pattern changes from #1411
2017-10-14 12:58:07 +02:00
ines
09aed58140
Port over changes from #1333 and add comments
2017-10-14 12:52:59 +02:00
Matthew Honnibal
cf6da9301a
Update lemmatizer test
2017-10-12 22:50:52 +02:00
Matthew Honnibal
9b90d235d1
Fix tag check in lemmatizer
2017-10-12 22:50:43 +02:00
Matthew Honnibal
dc01acd821
Escape encoding in validate function
2017-10-12 22:23:21 +02:00
Matthew Honnibal
27b927259a
Add locale_escape compat function
2017-10-12 22:22:04 +02:00
ines
9c6de3dcfa
Merge branch 'develop' into feature/cli-validate
2017-10-12 21:44:28 +02:00
Matthew Honnibal
462caf835a
Fix SBD test
2017-10-12 21:18:22 +02:00
ines
fff1028391
Add validate CLI command
2017-10-12 20:05:06 +02:00
Matthew Honnibal
908f44c3fe
Disable history features by default
2017-10-12 14:56:11 +02:00
Matthew Honnibal
a955843684
Increase default number of epochs
2017-10-12 13:13:01 +02:00
Matthew Honnibal
cecfcc7711
Set default hyper params back to 'slow' settings
2017-10-12 13:12:26 +02:00
Ines Montani
37aa523a8e
Merge pull request #1408 from explosion/feature/dot-underscore
...
💫 Custom attributes via Doc._, Token._ and Span._
2017-10-11 18:35:56 +02:00
ines
8ce6f96180
Don't make copies of language data components
2017-10-11 15:34:55 +02:00
ines
51519251c2
Fix underscore method test
2017-10-11 13:34:19 +02:00
ines
c6ae49e8bf
Fix formatting
2017-10-11 13:34:11 +02:00
ines
453c47ca24
Add German lemmatizer tests
2017-10-11 13:27:26 +02:00
ines
15fe0fd82d
Fix tests
2017-10-11 13:27:18 +02:00
ines
6dd14dc342
Add lookup lemmas to tokens without POS tags
2017-10-11 13:27:10 +02:00
ines
9620c1a640
Add lemma_lookup to Language defaults
2017-10-11 13:26:05 +02:00
ines
9fd471372a
Add lookup lemmatizer to lemmatizer as lookup() method
2017-10-11 13:25:51 +02:00
ines
e0ff145a8b
Merge branch 'develop' into feature/dot-underscore
2017-10-11 11:57:05 +02:00
ines
c1d6d43c83
Merge branch 'develop' into feature/lemmatizer
2017-10-11 11:56:35 +02:00
Matthew Honnibal
17c467e0ab
Avoid clobbering existing lemmas
2017-10-11 03:33:06 -05:00
Matthew Honnibal
807e109f2b
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-11 02:47:59 -05:00
Matthew Honnibal
6e552c9d83
Prune number of non-projective labels more aggressively
2017-10-11 02:46:44 -05:00
Matthew Honnibal
76fe24f44d
Improve embedding defaults
2017-10-11 09:44:17 +02:00
Matthew Honnibal
188f620046
Improve parser defaults
2017-10-11 09:43:48 +02:00
Matthew Honnibal
acba2e1051
Fix metadata in training
2017-10-11 08:55:52 +02:00
Matthew Honnibal
74c2c6a58c
Add default name and lang to meta
2017-10-11 08:49:12 +02:00
Matthew Honnibal
3814a161e6
Avoid clobbering preset lemmas
2017-10-11 08:41:03 +02:00
Matthew Honnibal
fd47f8e89f
Fix failing test
2017-10-11 08:38:34 +02:00
Matthew Honnibal
462b2e26b4
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-11 08:23:04 +02:00
Matthew Honnibal
a6ac4699eb
Allow Morphology class to setup tokens
...
Add Morphology.assign_untagged() C-method, and call it from
Doc.push_back() when a token is created. This gives a place
to allow the Morphology class to initialize token data.
2017-10-11 03:24:14 +02:00
Matthew Honnibal
3b527fa52b
Call morphology.assign_untagged when pushing token to Doc
2017-10-11 03:23:57 +02:00
Matthew Honnibal
c15d8278cb
Avoid lemmatizing inappropriate tags in English lemmatizer
2017-10-11 03:23:23 +02:00
Matthew Honnibal
d528b6e36d
Add assign_untagged method in Morphology
2017-10-11 03:22:49 +02:00
Matthew Honnibal
2c118ab3a6
Add tests for Doc creation
2017-10-11 03:21:23 +02:00
ines
820bf85075
Move LookupLemmatizer to spacy.lemmatizer
2017-10-11 02:25:13 +02:00
ines
417d45f5d0
Add lemmatizer data as variable on language data
...
Don't create lookup lemmatizer within Language class and just pass in
the data so it can be set on Token creation
2017-10-11 02:24:58 +02:00
ines
0c2343d73a
Tidy up language data
2017-10-11 02:22:49 +02:00
Matthew Honnibal
d84136b4a9
Update add label test
2017-10-10 22:57:41 +02:00
Matthew Honnibal
3065f12ef2
Make add parser label work for hidden_depth=0
2017-10-10 22:57:31 +02:00
ines
bfd58dd0fc
Merge branch 'develop' into feature/dot-underscore
2017-10-10 22:03:51 +02:00
Matthew Honnibal
73bca3d382
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-10 12:51:37 -05:00
Matthew Honnibal
5156074df1
Make loading code more consistent in train command
2017-10-10 12:51:20 -05:00
Matthew Honnibal
d70fba6807
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-10 19:33:10 +02:00
Matthew Honnibal
8143618497
Set prefix length back to 1
2017-10-10 19:32:54 +02:00
Matthew Honnibal
97c9b5db8b
Patch spacy.train for new pipeline management
2017-10-09 23:41:16 -05:00
Matthew Honnibal
a635240398
Add conll_ner2json converter
2017-10-09 22:03:26 -05:00
Matthew Honnibal
e0a9b02b67
Merge Span._ and Span.as_doc methods
2017-10-09 22:00:15 -05:00
Matthew Honnibal
dce8afb9cf
Set prefix length to 3
2017-10-09 21:55:55 -05:00
Matthew Honnibal
8265b90c83
Update parser defaults
2017-10-09 21:55:20 -05:00
Matthew Honnibal
dd2b0601d1
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-09 21:30:46 -05:00
Matthew Honnibal
09d61ada5e
Merge pull request #1396 from explosion/feature/pipeline-management
...
💫 Improve pipeline and factory management
2017-10-10 04:29:54 +02:00
ines
67350fa496
Use better logic for auto-generating component name
...
Instances don't have __name__, so we try __class__.__name__ as well,
before giving up and defaulting to repr(component).
2017-10-10 04:23:05 +02:00
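The fallback order described in commit 67350fa496 can be sketched roughly as below. This is an illustration only, not the code from the commit: the helper name `component_name` and the two example components are hypothetical, and only the three steps named in the commit message are modelled.

```python
def component_name(component):
    """Hypothetical helper: derive a display name for a pipeline component."""
    # Functions and classes carry __name__ directly.
    name = getattr(component, "__name__", None)
    if name is not None:
        return name
    # Instances don't have __name__, so fall back to the name of their class.
    cls = getattr(component, "__class__", None)
    cls_name = getattr(cls, "__name__", None)
    if cls_name is not None:
        return cls_name
    # Last resort when neither attribute is available.
    return repr(component)


def my_component(doc):
    """Example function component (hypothetical)."""
    return doc


class EntityMarker(object):
    """Example class-based component (hypothetical)."""

    def __call__(self, doc):
        return doc


function_name = component_name(my_component)
instance_name = component_name(EntityMarker())
```

Here `function_name` comes from the function's own `__name__`, while `instance_name` comes from the class of the instance.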
ines
3fc4fe61d2
Fix typo
2017-10-10 04:15:14 +02:00
ines
59c4f27499
Add get, set and has methods to Underscore
2017-10-10 04:14:35 +02:00
Matthew Honnibal
19136fd155
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-10 03:58:30 +02:00
Matthew Honnibal
8978212ee5
Patch serialization bug raised in #1105
2017-10-10 03:58:12 +02:00
Matthew Honnibal
f0f2739ae3
Add test for serialization issue raised in #1105
2017-10-10 03:57:58 +02:00
Matthew Honnibal
735d18654d
Add NER converter for CoNLL 2003 data
2017-10-09 20:06:28 -05:00
Matthew Honnibal
51d18937af
Partially apply doc/span/token into method
...
We want methods to act like they're "bound" to the object, so that you can make your method conditional on the `doc`, `span` or `token` instance --- like, well, a method. We therefore partially apply the function, which works like this:
```
def partial(unbound_method, constant_arg):
    # Return a function that always passes constant_arg as the first argument
    def bound_method(*args, **kwargs):
        return unbound_method(constant_arg, *args, **kwargs)
    return bound_method
```
2017-10-10 02:21:28 +02:00
Matthew Honnibal
808d8740d6
Remove print statement
2017-10-09 08:45:20 -05:00
Matthew Honnibal
0f41b25f60
Add speed benchmarks to metadata
2017-10-09 08:05:37 -05:00
ines
de374dc72a
Merge branch 'feature/pipeline-management' into feature/dot-underscore
2017-10-09 14:37:51 +02:00
Matthew Honnibal
2534cd57d7
Add bandaid solution to the 'shadowing' problem in #864
2017-10-09 08:59:35 +02:00
Matthew Honnibal
d8a2506023
Merge pull request #1401 from explosion/feature/add-parser-action
...
💫 Allow labels to be added to pre-trained parser and NER modes
2017-10-09 04:57:51 +02:00
Matthew Honnibal
689349e32f
Merge pull request #1400 from explosion/feature/sentence-parsing
...
💫 Force parser to respect preset sentence boundaries
2017-10-09 04:31:43 +02:00
Matthew Honnibal
e79fc41ff8
Merge pull request #1391 from explosion/feature/multilabel-textcat
...
💫 Fix multi-label support for text classification
2017-10-09 04:22:31 +02:00
Matthew Honnibal
fad2b8315f
Merge branch 'develop' into feature/add-parser-action
2017-10-09 04:13:04 +02:00
Matthew Honnibal
6c79841c0d
Fix tests for history features
2017-10-09 04:12:24 +02:00
Matthew Honnibal
dde87e6b0d
Add tests for adding parser actions
2017-10-09 03:42:35 +02:00
Matthew Honnibal
b2b8506f2c
Remove whitespace
2017-10-09 03:35:57 +02:00
Matthew Honnibal
d43a83e37a
Allow parser.add_label for pretrained models
2017-10-09 03:35:40 +02:00
Matthew Honnibal
81a64119db
Fix string-to-unicode problem
2017-10-09 00:59:49 +02:00
Matthew Honnibal
02c2af7119
Fix test
2017-10-09 00:29:37 +02:00
Matthew Honnibal
4cc84b0234
Prohibit Break when sent_start < 0
2017-10-09 00:02:45 +02:00
Matthew Honnibal
5a67efeccc
Add tests for sentence segmentation presetting
2017-10-09 00:02:23 +02:00
Matthew Honnibal
e938bce320
Adjust parsing transition system to allow preset sentence segments.
2017-10-08 23:53:34 +02:00
Matthew Honnibal
080afd4924
Add ternary value setting to Token.sent_start
2017-10-08 23:51:58 +02:00
Matthew Honnibal
7ae67ec6a1
Add Span.as_doc method
2017-10-08 23:50:20 +02:00
Matthew Honnibal
20309fb9db
Make history features default to zero
2017-10-08 20:32:14 +02:00
Matthew Honnibal
e74c8d2fad
Merge remote-tracking branch 'origin/develop' into feature/sentence-parsing
2017-10-08 20:20:41 +02:00
Matthew Honnibal
18063803de
Make TokenC.sent_start an int, to allow ternary value
2017-10-08 19:58:54 +02:00
Matthew Honnibal
be4f0b6460
Update defaults
2017-10-08 02:08:12 -05:00
Matthew Honnibal
42b401d08b
Change default hidden depth to 1
2017-10-07 21:05:21 -05:00
Matthew Honnibal
9d66a915da
Update training defaults
2017-10-07 21:02:38 -05:00
Matthew Honnibal
d163115e91
Add non-linearity after history features
2017-10-07 21:00:43 -05:00
Matthew Honnibal
92c5d78b42
Unhack NER.add_action
2017-10-07 19:02:40 +02:00
Matthew Honnibal
f2b590f672
Increment version
2017-10-07 19:01:01 +02:00
Matthew Honnibal
9bd8191739
Add tests for Underscore
2017-10-07 18:56:19 +02:00
Matthew Honnibal
668a0ea640
Pass extensions into Underscore class
2017-10-07 18:56:01 +02:00
Matthew Honnibal
1289129fd9
Add Underscore class
2017-10-07 18:00:14 +02:00
Matthew Honnibal
eb0595bea9
Merge pull request #1392 from explosion/feature/parser-history-model
...
💫 Parser history features
2017-10-07 15:07:02 +02:00
Matthew Honnibal
3d22ccf495
Update default hyper-parameters
2017-10-07 07:16:41 -05:00
Matthew Honnibal
09442d25ec
Merge remote-tracking branch 'origin/develop' into feature/parser-history-model
2017-10-07 07:05:04 -05:00
Matthew Honnibal
3b67eabfea
Allow empty dictionaries to match any token in Matcher
...
Often patterns need to match "any token". A clean way to denote this
is with the empty dict {}: this sets no constraints on the token,
so should always match.
The problem was that having attributes length==0 was used as an
end-of-array signal, so the matcher didn't handle this case correctly.
This patch compiles empty token spec dicts into a constraint
NULL_ATTR==0. The NULL_ATTR attribute, 0, is always set to 0 on the
lexeme -- so this always matches.
2017-10-07 03:36:15 +02:00
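The idea in commit 3b67eabfea can be sketched in plain Python as below. This is a simplified model, not the Cython matcher: the function names, the integer ID chosen for `LOWER`, and the representation of tokens as dicts are all assumptions made for illustration. Only `NULL_ATTR` being attribute 0 with value 0 comes from the commit message.

```python
NULL_ATTR = 0  # per the commit: attribute 0, always set to 0
LOWER = 1      # hypothetical attribute ID, for illustration only


def compile_spec(spec):
    """Turn one token spec dict into a non-empty list of (attr, value) pairs."""
    constraints = list(spec.items())
    if not constraints:
        # A zero-length list would be read as the end-of-array signal,
        # so emit a constraint that is true for every token instead.
        constraints = [(NULL_ATTR, 0)]
    return constraints


def token_matches(token, constraints):
    """Tokens are modelled as dicts; attributes that are not set read as 0."""
    return all(token.get(attr, 0) == value for attr, value in constraints)


pattern = [{LOWER: "hello"}, {}, {LOWER: "world"}]
compiled = [compile_spec(spec) for spec in pattern]
tokens = [{LOWER: "hello"}, {LOWER: ","}, {LOWER: "world"}]
is_match = all(token_matches(t, c) for t, c in zip(tokens, compiled))
```

The middle spec `{}` compiles to a single `NULL_ATTR == 0` constraint, so it accepts the comma token without being mistaken for the end of the pattern.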
ines
0adadcb3f0
Fix beam parse model test
2017-10-07 02:15:15 +02:00
ines
b38a8f4a94
Fix and update pipe methods tests
2017-10-07 02:06:23 +02:00
Matthew Honnibal
0384f08218
Trigger nonproj.deprojectivize as a postprocess
2017-10-07 02:00:47 +02:00
Matthew Honnibal
3a65a0c970
Start adding tests for new pipeline management
2017-10-07 01:48:23 +02:00
ines
e43530269c
Update docstrings
2017-10-07 01:04:50 +02:00
ines
61a503a611
Fix parser test
2017-10-07 00:38:51 +02:00
ines
b39409173e
Add disable option and True/False/None values for pipeline
2017-10-07 00:29:08 +02:00
ines
2586b61b15
Fix formatting, tidy up and remove unused imports
2017-10-07 00:26:05 +02:00
ines
212c8f0711
Implement new Language methods and pipeline API
2017-10-07 00:25:54 +02:00
Matthew Honnibal
8be46d766e
Remove print statement
2017-10-06 16:19:02 -05:00
Matthew Honnibal
8e731009fe
Fix parser config serialization
2017-10-06 13:50:52 -05:00
Matthew Honnibal
f4c9a98166
Fix spacy evaluate command on non-GPU
2017-10-06 13:17:47 -05:00
Matthew Honnibal
16ba6aa8a6
Fix parser config serialization
2017-10-06 13:17:31 -05:00
Matthew Honnibal
c66399d8ae
Fix depth definition with history features
2017-10-06 06:20:05 -05:00
Matthew Honnibal
5c750a9c2f
Reserve 0 for 'missing' in history features
2017-10-06 06:10:13 -05:00
Matthew Honnibal
fbba7c517e
Pass dropout through to embed tables
2017-10-06 06:09:18 -05:00
Matthew Honnibal
21d11936fe
Fix significant train/test skew error in history feats
2017-10-06 06:08:50 -05:00
Matthew Honnibal
555d8c8bff
Fix beam history features
2017-10-05 22:21:50 -05:00
Matthew Honnibal
3db0a32fd6
Fix dropout for history features
2017-10-05 22:21:30 -05:00
Matthew Honnibal
b0618def8d
Add support for 2-token state option
2017-10-05 21:54:12 -05:00
Matthew Honnibal
363aa47b40
Clean up dead parsing code
2017-10-05 21:53:49 -05:00
Matthew Honnibal
ca12764772
Enable history features for beam parser
2017-10-05 21:53:29 -05:00
Matthew Honnibal
fc06b0a333
Fix training when hist_size==0
2017-10-05 21:52:28 -05:00
Matthew Honnibal
e25ffcb11f
Move history size under feature flags
2017-10-05 19:38:13 -05:00
Matthew Honnibal
563f46f026
Fix multi-label support for text classification
...
The TextCategorizer class is supposed to support multi-label
text classification, and allow training data to contain missing
values.
For this to work, the gradient of the loss should be 0 when labels
are missing. Instead, there was no way to actually denote "missing"
in the GoldParse class, and so the TextCategorizer class treated
the label set within gold.cats as complete.
To fix this, we change GoldParse.cats to be a dict instead of a list.
The GoldParse.cats dict should map to floats, with 1. denoting
'present' and 0. denoting 'absent'. Gradients are zeroed for categories
absent from the gold.cats dict. A nice bonus is that you can also set
values between 0 and 1 for partial membership. You can also set numeric
values, if you're using a text classification model that uses an
appropriate loss function.
Unfortunately this is a breaking change; although the functionality
was only recently introduced and hasn't been properly documented
yet. I've updated the example script accordingly.
2017-10-05 18:43:02 -05:00
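The behaviour described in commit 563f46f026 can be illustrated with a small sketch. It assumes a squared-error loss, which the commit message does not specify, and the function name `textcat_gradient` and the label names are hypothetical. The point it shows is that labels missing from the `cats` dict contribute a zero gradient.

```python
def textcat_gradient(scores, gold_cats, labels):
    """Gradient of 0.5 * (score - target) ** 2 with respect to each score.

    scores: predicted values, one per label.
    gold_cats: dict mapping label -> float target (1.0 present, 0.0 absent).
    Labels missing from gold_cats are treated as unknown, so their gradient is 0.
    """
    grads = []
    for label, score in zip(labels, scores):
        if label in gold_cats:
            grads.append(score - gold_cats[label])
        else:
            grads.append(0.0)  # missing value: no learning signal
    return grads


labels = ["POSITIVE", "NEGATIVE", "SPAM"]
scores = [0.75, 0.25, 0.5]
gold_cats = {"POSITIVE": 1.0, "NEGATIVE": 0.0}  # "SPAM" is not annotated
grads = textcat_gradient(scores, gold_cats, labels)
```

With a list of labels there would be no way to tell "SPAM is absent" from "SPAM was never annotated"; the dict makes that distinction explicit.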
Matthew Honnibal
c6cd81f192
Wrap try/except around model saving
2017-10-05 08:14:24 -05:00
Matthew Honnibal
5743b06e36
Wrap model saving in try/except
2017-10-05 08:12:50 -05:00
Matthew Honnibal
fd4baff475
Update tests
2017-10-05 08:12:27 -05:00
Matthew Honnibal
dcdfa071aa
Disable LayerNorm hack
2017-10-04 20:06:52 -05:00
Matthew Honnibal
943af4423a
Make depth setting in parser work again
2017-10-04 20:06:05 -05:00
Matthew Honnibal
bfabc333be
Merge remote-tracking branch 'origin/develop' into feature/parser-history-model
2017-10-04 20:00:36 -05:00
Matthew Honnibal
92066b04d6
Fix Embed and HistoryFeatures
2017-10-04 19:55:34 -05:00
Matthew Honnibal
d903986439
Increment version
2017-10-04 17:14:26 +02:00
Matthew Honnibal
40edb65ee7
Make test work for Python 2.7
2017-10-04 16:36:50 +02:00
Matthew Honnibal
bd8e84998a
Add nO attribute to TextCategorizer model
2017-10-04 16:07:30 +02:00
Matthew Honnibal
f8a0614527
Improve textcat model slightly
2017-10-04 15:15:53 +02:00
Matthew Honnibal
39798b0172
Uncomment layernorm adjustment hack
2017-10-04 15:12:09 +02:00
Matthew Honnibal
b3a7082bf8
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-04 14:56:46 +02:00
Matthew Honnibal
db05d4d582
Add test for #1380. Passes without fix?
2017-10-04 14:56:31 +02:00
Matthew Honnibal
774f5732bd
Fix dimensionality of textcat when no vectors available
2017-10-04 14:55:15 +02:00
Ines Montani
28ba0b9b51
Merge pull request #1385 from explosion/feature/new-website
...
💫 New spaCy website
2017-10-04 14:35:52 +02:00
Matthew Honnibal
af75b74208
Unset LayerNorm backwards compat hack
2017-10-03 20:47:10 -05:00
ines
73ac0aa0b5
Update spacy evaluate and add displaCy option
2017-10-04 00:03:15 +02:00
Matthew Honnibal
246612cb53
Merge remote-tracking branch 'origin/develop' into feature/parser-history-model
2017-10-03 16:56:42 -05:00
Matthew Honnibal
f24c2e3a8a
Fix evaluate for non-GPU
2017-10-03 22:47:31 +02:00
Matthew Honnibal
5cbefcba17
Set backwards compatibility flag
2017-10-03 20:29:58 +02:00
Matthew Honnibal
5454b20cd7
Update thinc imports for 6.9
2017-10-03 20:07:17 +02:00
Matthew Honnibal
4a59f6358c
Fix thinc imports
2017-10-03 19:21:26 +02:00
Matthew Honnibal
e514d6aa0a
Import thinc modules more explicitly, to avoid cycles
2017-10-03 18:49:25 +02:00
Matthew Honnibal
338e1fda0e
Unbreak merge artefact
2017-10-03 09:41:05 -05:00
Matthew Honnibal
1289187279
Fix circular import
2017-10-03 09:33:21 -05:00
Matthew Honnibal
a44c4c3a5b
Add timer to evaluate
2017-10-03 09:15:35 -05:00
Matthew Honnibal
96da86b3e5
Add support for verbose flag to Language
2017-10-03 09:14:57 -05:00
Matthew Honnibal
02586a5243
Add timing to spacy evaluate command
2017-10-03 09:14:34 -05:00
ines
e49cd7aeaf
Move import into load to avoid circular imports
2017-10-03 15:22:19 +02:00
ines
b0dfa059db
Update docs link in about.py
2017-10-03 15:19:55 +02:00
Matthew Honnibal
dc3c791947
Fix history size option
2017-10-03 13:41:23 +02:00
Matthew Honnibal
278a4c17c6
Fix history features
2017-10-03 13:27:10 +02:00
Matthew Honnibal
b770f4e108
Fix embed class in history features
2017-10-03 13:26:55 +02:00
Matthew Honnibal
b50a359e11
Add support for history features in parsing models
2017-10-03 12:44:01 +02:00
Matthew Honnibal
ee41e4fea7
Support history features in stateclass
2017-10-03 12:43:48 +02:00
Matthew Honnibal
6aa6a5bc25
Add a layer type for history features
2017-10-03 12:43:09 +02:00
Matthew Honnibal
8902df44de
Fix component disabling during training
2017-10-02 21:07:23 +02:00
Matthew Honnibal
c617d288d8
Update pipeline component names in spaCy train
2017-10-02 17:20:19 +02:00
Matthew Honnibal
f942903429
Improve sentence merging in iob2json
2017-10-02 17:02:10 +02:00
Matthew Honnibal
31681d20e0
Fix concatenation in iob2json converter
2017-10-02 16:50:26 +02:00
Matthew Honnibal
4896ce3320
Remove misleading comment
2017-10-02 00:09:14 +02:00
Matthew Honnibal
d90cc917fa
Merge vectors.pyx doc strings
2017-10-01 17:05:54 -05:00
Matthew Honnibal
b2a8b9be77
Fix inconsistency of Vectors class API
2017-10-01 17:00:34 -05:00
Matthew Honnibal
e38089d598
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-01 22:10:54 +02:00
Matthew Honnibal
97c409b602
Add docstrings for spacy.vectors
2017-10-01 22:10:33 +02:00
ines
b776f48e58
Fix typo
2017-10-01 21:58:45 +02:00
Matthew Honnibal
94df115a81
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-01 14:06:23 -05:00
Matthew Honnibal
2cf0f4622f
Fix loading of models with pre-trained vectors
2017-10-01 14:05:32 -05:00
Matthew Honnibal
69c7c642c2
Add spacy evaluate
2017-10-01 14:05:04 -05:00
ines
8dbe49ecb8
Always compare lowercase package names
...
Otherwise, is_package will return False if model name contains
uppercase characters. See this issue:
https://support.prodi.gy/t/saving-a-trained-ner-model-as-a-loadable-module/46/6
2017-09-29 20:55:17 +02:00
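The problem fixed in commit 8dbe49ecb8 can be sketched as follows. This is not the real `is_package` implementation: the helper name `is_installed` and the explicit list of installed names are assumptions, used here so the example stays self-contained.

```python
def is_installed(name, installed_names):
    """Hypothetical check: compare package names case-insensitively."""
    name = name.lower()
    return any(name == other.lower() for other in installed_names)


installed = ["en-core-web-sm", "my-custom-ner"]
found = is_installed("My-Custom-NER", installed)
```

Without lowercasing both sides, a model name containing uppercase characters would not be found even though the package is installed.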
ines
153c2589d4
Revert "Always compare lowercase package names"
...
This reverts commit 7d77dc490f.
2017-09-29 20:53:36 +02:00
ines
fd1a9225d8
Handle conversion of pipeline components correctly
...
Allow both comma and comma + whitespace as separators
2017-09-29 20:52:56 +02:00
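The separator handling mentioned in commit fd1a9225d8 could look something like the sketch below. The helper name `parse_pipeline` is hypothetical and the commit's actual code may differ; the sketch only shows how both `,` and `, ` can be accepted.

```python
def parse_pipeline(value):
    """Hypothetical helper: split 'tagger,parser, ner' into component names."""
    # Split on commas, then strip whitespace so ", " works as a separator too.
    return [name.strip() for name in value.split(",") if name.strip()]


components = parse_pipeline("tagger,parser, ner")
```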
ines
7d77dc490f
Always compare lowercase package names
...
Otherwise, is_package will return False if model name contains
uppercase characters. See this issue:
https://support.prodi.gy/t/saving-a-trained-ner-model-as-a-loadable-module/46/6
2017-09-29 20:52:28 +02:00
Matthew Honnibal
cdb2d83e16
Pass dropout in parser
2017-09-28 18:47:13 -05:00
Matthew Honnibal
158e177cae
Fix default embed size
2017-09-28 08:25:23 -05:00
Matthew Honnibal
f6330d69e6
Default embed size to 7000
2017-09-28 08:07:41 -05:00
Matthew Honnibal
ac8481a7b0
Print NER loss
2017-09-28 08:05:31 -05:00
Matthew Honnibal
542ebfa498
Improve defaults
2017-09-27 18:54:37 -05:00
Matthew Honnibal
dcb86bdc43
Default batch size to 32
2017-09-27 11:48:19 -05:00
Matthew Honnibal
1a37a2c0a0
Update training defaults
2017-09-27 11:48:07 -05:00
Matthew Honnibal
13d7a97f3a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-27 11:44:37 -05:00
Matthew Honnibal
66c388ee01
Remove unhelpful multitask objectives
2017-09-27 11:44:16 -05:00
Matthew Honnibal
983201a83a
Fix hard-coded vector width
2017-09-27 11:43:58 -05:00
Ines Montani
959c46eabe
Merge pull request #1365 from wannaphongcom/develop
...
Add Thai language for spaCy v2
2017-09-26 23:43:05 +02:00
Matthew Honnibal
1ef4236f8e
Merge pull request #1343 from explosion/feature/phrasematcher
...
Update PhraseMatcher for spaCy 2
2017-09-26 20:44:23 +02:00
Wannaphong Phatthiyaphaibun
7b5263ffa4
fix thai test
2017-09-26 23:54:15 +07:00
ines
1ff62eaee7
Fix option shortcut to avoid conflict
2017-09-26 17:59:34 +02:00
Wannaphong Phatthiyaphaibun
3d5046c499
fix import in th
2017-09-26 22:41:20 +07:00
ines
7fdfb78141
Add version option to cli.train
2017-09-26 17:34:52 +02:00
Wannaphong Phatthiyaphaibun
a63f790b8c
fix thai tag_map
2017-09-26 22:28:57 +07:00
Wannaphong Phatthiyaphaibun
2ea27d07f4
fix tokenizer_exceptions in thai
2017-09-26 22:14:47 +07:00
Matthew Honnibal
41cc5c4c17
Merge branch 'develop' into feature/phrasematcher
2017-09-26 09:59:17 -05:00
Matthew Honnibal
c2e2f81773
Merge pull request #1355 from explosion/feature/noshare
...
Make pipeline components independent
2017-09-26 16:58:09 +02:00
Wannaphong Phatthiyaphaibun
a2bf4cc7bf
fix newline in file
2017-09-26 21:49:43 +07:00
ines
bb5c631402
Implement like_num getter for French (via #1161)
2017-09-26 16:47:45 +02:00
ines
15479b3bae
Add comment to like_num re: future work
2017-09-26 16:43:28 +02:00
ines
adda08fe14
Implement like_num getter for Dutch (via #1177)
2017-09-26 16:39:15 +02:00
ines
5ee10379db
Port over changes from #1340
2017-09-26 16:38:08 +02:00
Wannaphong Phatthiyaphaibun
5cba67146c
add thai in spacy2
2017-09-26 21:36:27 +07:00
ines
10d291f129
Port over change from #1351
2017-09-26 16:11:41 +02:00
Matthew Honnibal
3274b46a0d
Try to fix compile error on Windows
2017-09-26 09:05:53 -05:00
Matthew Honnibal
19c7c09bf7
Fix PhraseMatcher.__contains__
2017-09-26 08:35:53 -05:00
Matthew Honnibal
d02a41a8c9
Merge remote-tracking branch 'origin/develop' into feature/phrasematcher
2017-09-26 08:32:55 -05:00
Matthew Honnibal
698fc0d016
Remove merge artefact
2017-09-26 08:31:37 -05:00
Matthew Honnibal
defb68e94f
Update feature/noshare with recent develop changes
2017-09-26 08:15:14 -05:00
Matthew Honnibal
ca28590ddd
Use dep and ent multi-task objectives for parser
2017-09-26 08:13:52 -05:00
Matthew Honnibal
9bfd585a11
Fix parameter name in .pxd file
2017-09-26 07:28:50 -05:00
Matthew Honnibal
74f08e1ad5
Update test
2017-09-26 06:45:56 -05:00
Matthew Honnibal
5aaef3e7b8
Don't link vectors in vocab deserialize
2017-09-26 06:45:47 -05:00
Matthew Honnibal
18a27c7579
Fix typo in tensorizer serialization
2017-09-26 06:45:14 -05:00
Matthew Honnibal
5056743ad5
Fix parser serialization
2017-09-26 06:44:56 -05:00
Ines Montani
7123139b2b
Add __contains__ to PhraseMatcher
2017-09-26 13:13:27 +02:00
Ines Montani
50ad50f96a
Update matcher.pyx
2017-09-26 13:11:17 +02:00
Matthew Honnibal
e34e70673f
Allow tagger models to be built with pre-defined tok2vec layer
2017-09-26 05:51:52 -05:00
Matthew Honnibal
bf917225ab
Allow multi-task objectives during training
2017-09-26 05:42:52 -05:00
Matthew Honnibal
4ae9ea7684
Remove unused argument in Language
2017-09-26 05:41:35 -05:00
ines
edf7e4881d
Add meta.json option to cli.train and add relevant properties
...
Add accuracy scores to meta.json instead of accuracy.json and replace
all relevant properties like lang, pipeline, spacy_version in existing
meta.json. If not present, also add name and version placeholders to
make it packagable.
2017-09-25 19:00:47 +02:00
ines
d2d35b63b7
Fix formatting
2017-09-25 18:37:13 +02:00
Matthew Honnibal
8eb0b7b779
Add docstrings for Pipe API
2017-09-25 16:22:07 +02:00
Matthew Honnibal
39f390dba7
Add docstrings for Pipe API
2017-09-25 16:20:49 +02:00
Matthew Honnibal
8716ffe57d
Serialize vocab last
2017-09-24 05:01:45 -05:00
Matthew Honnibal
72bbcc0871
Handle lemmatization for unknown string IDs
2017-09-24 05:01:31 -05:00
Matthew Honnibal
204b58c864
Fix evaluation during training
2017-09-24 05:01:03 -05:00
Matthew Honnibal
dc3a623d00
Remove unused update_shared argument
2017-09-24 05:00:37 -05:00
Matthew Honnibal
63bd87508d
Don't use iterated convolutions
2017-09-23 04:39:17 -05:00
Matthew Honnibal
5a7fd0fd36
Fix vector linkage
2017-09-22 20:11:52 -05:00
Matthew Honnibal
4348c479fc
Merge pre-trained vectors and noshare patches
2017-09-22 20:07:28 -05:00
Matthew Honnibal
7dc61b3f43
Whitespace
2017-09-22 20:00:50 -05:00
Matthew Honnibal
e93d43a43a
Fix training with preset vectors
2017-09-22 20:00:40 -05:00
Matthew Honnibal
0795857dcb
Fix beam parsing
2017-09-23 02:59:53 +02:00
Matthew Honnibal
4bd6a12b1f
Fix Tok2Vec
2017-09-23 02:58:54 +02:00
Matthew Honnibal
386c1a5bd8
Fix tagger training
2017-09-23 02:58:06 +02:00
Matthew Honnibal
a2357cce3f
Set random seed in train script
2017-09-23 02:57:31 +02:00
Matthew Honnibal
05596159bf
Fix serialization when pre-trained vectors
2017-09-22 15:33:27 -05:00
Matthew Honnibal
980fb6e854
Refactor Tok2Vec
2017-09-22 09:38:36 -05:00
Matthew Honnibal
d9124f1aa3
Add link_vectors_to_models function
2017-09-22 09:38:22 -05:00
Matthew Honnibal
a186596307
Add 'reapply' combinator, for iterated CNN
2017-09-22 09:37:03 -05:00
Matthew Honnibal
40a4873b70
Fix serialization of model options
2017-09-21 13:07:26 -05:00
Matthew Honnibal
0a9016cade
Fix serialization during training
2017-09-21 13:06:45 -05:00
Matthew Honnibal
20193371f5
Don't share CNN, to reduce complexities
2017-09-21 14:59:48 +02:00
Matthew Honnibal
1d73dec8b1
Refactor train script
2017-09-20 19:17:10 -05:00
Matthew Honnibal
ffda38356a
Add util function to enable GPU
2017-09-20 19:16:35 -05:00
Matthew Honnibal
24e85c2048
Pass values for CNN maxout pieces option
2017-09-20 19:16:12 -05:00
Matthew Honnibal
b832f89ff8
Add resume_training function
2017-09-20 19:15:20 -05:00