Commit Graph

4184 Commits

Author SHA1 Message Date
ines
e43530269c Update docstrings 2017-10-07 01:04:50 +02:00
ines
61a503a611 Fix parser test 2017-10-07 00:38:51 +02:00
ines
b39409173e Add disable option and True/False/None values for pipeline 2017-10-07 00:29:08 +02:00
ines
2586b61b15 Fix formatting, tidy up and remove unused imports 2017-10-07 00:26:05 +02:00
ines
212c8f0711 Implement new Language methods and pipeline API 2017-10-07 00:25:54 +02:00
Matthew Honnibal
8be46d766e Remove print statement 2017-10-06 16:19:02 -05:00
Matthew Honnibal
8e731009fe Fix parser config serialization 2017-10-06 13:50:52 -05:00
Matthew Honnibal
f4c9a98166 Fix spacy evaluate command on non-GPU 2017-10-06 13:17:47 -05:00
Matthew Honnibal
16ba6aa8a6 Fix parser config serialization 2017-10-06 13:17:31 -05:00
Matthew Honnibal
c66399d8ae Fix depth definition with history features 2017-10-06 06:20:05 -05:00
Matthew Honnibal
5c750a9c2f Reserve 0 for 'missing' in history features 2017-10-06 06:10:13 -05:00
Matthew Honnibal
fbba7c517e Pass dropout through to embed tables 2017-10-06 06:09:18 -05:00
Matthew Honnibal
21d11936fe Fix significant train/test skew error in history feats 2017-10-06 06:08:50 -05:00
Matthew Honnibal
555d8c8bff Fix beam history features 2017-10-05 22:21:50 -05:00
Matthew Honnibal
3db0a32fd6 Fix dropout for history features 2017-10-05 22:21:30 -05:00
Matthew Honnibal
b0618def8d Add support for 2-token state option 2017-10-05 21:54:12 -05:00
Matthew Honnibal
363aa47b40 Clean up dead parsing code 2017-10-05 21:53:49 -05:00
Matthew Honnibal
ca12764772 Enable history features for beam parser 2017-10-05 21:53:29 -05:00
Matthew Honnibal
fc06b0a333 Fix training when hist_size==0 2017-10-05 21:52:28 -05:00
Matthew Honnibal
e25ffcb11f Move history size under feature flags 2017-10-05 19:38:13 -05:00
Matthew Honnibal
563f46f026 Fix multi-label support for text classification
The TextCategorizer class is supposed to support multi-label
text classification, and allow training data to contain missing
values.

For this to work, the gradient of the loss should be 0 when labels
are missing. Instead, there was no way to actually denote "missing"
in the GoldParse class, and so the TextCategorizer class treated
the label set within gold.cats as complete.

To fix this, we change GoldParse.cats to be a dict instead of a list.
The GoldParse.cats dict should map to floats, with 1. denoting
'present' and 0. denoting 'absent'. Gradients are zeroed for categories
absent from the gold.cats dict. A nice bonus is that you can also set
values between 0 and 1 for partial membership. You can also set numeric
values, if you're using a text classification model that uses an
appropriate loss function.

Unfortunately this is a breaking change; although the functionality
was only recently introduced and hasn't been properly documented
yet. I've updated the example script accordingly.
2017-10-05 18:43:02 -05:00
Matthew Honnibal
c6cd81f192 Wrap try/except around model saving 2017-10-05 08:14:24 -05:00
Matthew Honnibal
5743b06e36 Wrap model saving in try/except 2017-10-05 08:12:50 -05:00
Matthew Honnibal
fd4baff475 Update tests 2017-10-05 08:12:27 -05:00
Matthew Honnibal
dcdfa071aa Disable LayerNorm hack 2017-10-04 20:06:52 -05:00
Matthew Honnibal
943af4423a Make depth setting in parser work again 2017-10-04 20:06:05 -05:00
Matthew Honnibal
bfabc333be Merge remote-tracking branch 'origin/develop' into feature/parser-history-model 2017-10-04 20:00:36 -05:00
Matthew Honnibal
92066b04d6 Fix Embed and HistoryFeatures 2017-10-04 19:55:34 -05:00
Matthew Honnibal
d903986439 Increment version 2017-10-04 17:14:26 +02:00
Matthew Honnibal
40edb65ee7 Make test work for Python 2.7 2017-10-04 16:36:50 +02:00
Matthew Honnibal
bd8e84998a Add nO attribute to TextCategorizer model 2017-10-04 16:07:30 +02:00
Matthew Honnibal
f8a0614527 Improve textcat model slightly 2017-10-04 15:15:53 +02:00
Matthew Honnibal
39798b0172 Uncomment layernorm adjustment hack 2017-10-04 15:12:09 +02:00
Matthew Honnibal
b3a7082bf8 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-04 14:56:46 +02:00
Matthew Honnibal
db05d4d582 Add test for #1380. Passes without fix? 2017-10-04 14:56:31 +02:00
Matthew Honnibal
774f5732bd Fix dimensionality of textcat when no vectors available 2017-10-04 14:55:15 +02:00
Ines Montani
28ba0b9b51 Merge pull request #1385 from explosion/feature/new-website
💫 New spaCy website
2017-10-04 14:35:52 +02:00
Matthew Honnibal
af75b74208 Unset LayerNorm backwards compat hack 2017-10-03 20:47:10 -05:00
ines
73ac0aa0b5 Update spacy evaluate and add displaCy option 2017-10-04 00:03:15 +02:00
Matthew Honnibal
246612cb53 Merge remote-tracking branch 'origin/develop' into feature/parser-history-model 2017-10-03 16:56:42 -05:00
Matthew Honnibal
f24c2e3a8a Fix evaluate for non-GPU 2017-10-03 22:47:31 +02:00
Matthew Honnibal
5cbefcba17 Set backwards compatibility flag 2017-10-03 20:29:58 +02:00
Matthew Honnibal
5454b20cd7 Update thinc imports for 6.9 2017-10-03 20:07:17 +02:00
Matthew Honnibal
4a59f6358c Fix thinc imports 2017-10-03 19:21:26 +02:00
Matthew Honnibal
e514d6aa0a Import thinc modules more explicitly, to avoid cycles 2017-10-03 18:49:25 +02:00
Matthew Honnibal
338e1fda0e Unbreak merge artefact 2017-10-03 09:41:05 -05:00
Matthew Honnibal
1289187279 Fix circular import 2017-10-03 09:33:21 -05:00
Matthew Honnibal
a44c4c3a5b Add timer to evaluate 2017-10-03 09:15:35 -05:00
Matthew Honnibal
96da86b3e5 Add support for verbose flag to Language 2017-10-03 09:14:57 -05:00
Matthew Honnibal
02586a5243 Add timing to spacy evaluate command 2017-10-03 09:14:34 -05:00
ines
e49cd7aeaf Move import into load to avoid circular imports 2017-10-03 15:22:19 +02:00
ines
b0dfa059db Update docs link in about.py 2017-10-03 15:19:55 +02:00
Matthew Honnibal
dc3c791947 Fix history size option 2017-10-03 13:41:23 +02:00
Matthew Honnibal
278a4c17c6 Fix history features 2017-10-03 13:27:10 +02:00
Matthew Honnibal
b770f4e108 Fix embed class in history features 2017-10-03 13:26:55 +02:00
Matthew Honnibal
b50a359e11 Add support for history features in parsing models 2017-10-03 12:44:01 +02:00
Matthew Honnibal
ee41e4fea7 Support history features in stateclass 2017-10-03 12:43:48 +02:00
Matthew Honnibal
6aa6a5bc25 Add a layer type for history features 2017-10-03 12:43:09 +02:00
Matthew Honnibal
8902df44de Fix component disabling during training 2017-10-02 21:07:23 +02:00
Matthew Honnibal
c617d288d8 Update pipeline component names in spaCy train 2017-10-02 17:20:19 +02:00
Matthew Honnibal
f942903429 Improve sentence merging in iob2json 2017-10-02 17:02:10 +02:00
Matthew Honnibal
31681d20e0 Fix concatenation in iob2json converter 2017-10-02 16:50:26 +02:00
Matthew Honnibal
4896ce3320 Remove misleading comment 2017-10-02 00:09:14 +02:00
Matthew Honnibal
d90cc917fa Merge vectors.pyx doc strings 2017-10-01 17:05:54 -05:00
Matthew Honnibal
b2a8b9be77 Fix inconsistency of Vectors class API 2017-10-01 17:00:34 -05:00
Matthew Honnibal
e38089d598 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-01 22:10:54 +02:00
Matthew Honnibal
97c409b602 Add docstrings for spacy.vectors 2017-10-01 22:10:33 +02:00
ines
b776f48e58 Fix typo 2017-10-01 21:58:45 +02:00
Matthew Honnibal
94df115a81 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-01 14:06:23 -05:00
Matthew Honnibal
2cf0f4622f Fix loading of models with pre-trained vectors 2017-10-01 14:05:32 -05:00
Matthew Honnibal
69c7c642c2 Add spacy evaluate 2017-10-01 14:05:04 -05:00
ines
8dbe49ecb8 Always compare lowercase package names
Otherwise, is_package will return False if model name contains
uppercase characters. See this issue:
https://support.prodi.gy/t/saving-a-trained-ner-model-as-a-loadable-modu
le/46/6
2017-09-29 20:55:17 +02:00
ines
153c2589d4 Revert "Always compare lowercase package names"
This reverts commit 7d77dc490f.
2017-09-29 20:53:36 +02:00
ines
fd1a9225d8 Handle conversion of pipeline components correctly
Allow both comma and comma + whitespace as separators
2017-09-29 20:52:56 +02:00
ines
7d77dc490f Always compare lowercase package names
Otherwise, is_package will return False if model name contains
uppercase characters. See this issue:
https://support.prodi.gy/t/saving-a-trained-ner-model-as-a-loadable-modu
le/46/6
2017-09-29 20:52:28 +02:00
Matthew Honnibal
cdb2d83e16 Pass dropout in parser 2017-09-28 18:47:13 -05:00
Matthew Honnibal
158e177cae Fix default embed size 2017-09-28 08:25:23 -05:00
Matthew Honnibal
f6330d69e6 Default embed size to 7000 2017-09-28 08:07:41 -05:00
Matthew Honnibal
ac8481a7b0 Print NER loss 2017-09-28 08:05:31 -05:00
Matthew Honnibal
542ebfa498 Improve defaults 2017-09-27 18:54:37 -05:00
Matthew Honnibal
dcb86bdc43 Default batch size to 32 2017-09-27 11:48:19 -05:00
Matthew Honnibal
1a37a2c0a0 Update training defaults 2017-09-27 11:48:07 -05:00
Matthew Honnibal
13d7a97f3a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-09-27 11:44:37 -05:00
Matthew Honnibal
66c388ee01 Remove unhelpful multitask objectives 2017-09-27 11:44:16 -05:00
Matthew Honnibal
983201a83a Fix hard-coded vector width 2017-09-27 11:43:58 -05:00
Ines Montani
959c46eabe Merge pull request #1365 from wannaphongcom/develop
Add Thai language for spaCy v2
2017-09-26 23:43:05 +02:00
Matthew Honnibal
1ef4236f8e Merge pull request #1343 from explosion/feature/phrasematcher
Update PhraseMatcher for spaCy 2
2017-09-26 20:44:23 +02:00
Wannaphong Phatthiyaphaibun
7b5263ffa4 fix thai test 2017-09-26 23:54:15 +07:00
ines
1ff62eaee7 Fix option shortcut to avoid conflict 2017-09-26 17:59:34 +02:00
Wannaphong Phatthiyaphaibun
3d5046c499 fix import in th 2017-09-26 22:41:20 +07:00
ines
7fdfb78141 Add version option to cli.train 2017-09-26 17:34:52 +02:00
Wannaphong Phatthiyaphaibun
a63f790b8c fix thai tag_map 2017-09-26 22:28:57 +07:00
Wannaphong Phatthiyaphaibun
2ea27d07f4 fix tokenizer_exceptions in thai 2017-09-26 22:14:47 +07:00
Matthew Honnibal
41cc5c4c17 Merge branch 'develop' into feature/phrasematcher 2017-09-26 09:59:17 -05:00
Matthew Honnibal
c2e2f81773 Merge pull request #1355 from explosion/feature/noshare
Make pipeline components independent
2017-09-26 16:58:09 +02:00
Wannaphong Phatthiyaphaibun
a2bf4cc7bf fix newline in file 2017-09-26 21:49:43 +07:00
ines
bb5c631402 Implement like_num getter for French (via #1161) 2017-09-26 16:47:45 +02:00
ines
15479b3bae Add comment to like_num re: future work 2017-09-26 16:43:28 +02:00
ines
adda08fe14 Implement like_num getter for Dutch (via #1177) 2017-09-26 16:39:15 +02:00
ines
5ee10379db Port over changes from #1340 2017-09-26 16:38:08 +02:00
Wannaphong Phatthiyaphaibun
5cba67146c add thai in spacy2 2017-09-26 21:36:27 +07:00
ines
10d291f129 Port over change from #1351 2017-09-26 16:11:41 +02:00
Matthew Honnibal
3274b46a0d Try to fix compile error on Windows 2017-09-26 09:05:53 -05:00
Matthew Honnibal
19c7c09bf7 Fix PhraseMatcher.__contains__ 2017-09-26 08:35:53 -05:00
Matthew Honnibal
d02a41a8c9 Merge remote-tracking branch 'origin/develop' into feature/phrasematcher 2017-09-26 08:32:55 -05:00
Matthew Honnibal
698fc0d016 Remove merge artefact 2017-09-26 08:31:37 -05:00
Matthew Honnibal
defb68e94f Update feature/noshare with recent develop changes 2017-09-26 08:15:14 -05:00
Matthew Honnibal
ca28590ddd Use dep and ent multi-task objectives for parser' 2017-09-26 08:13:52 -05:00
Matthew Honnibal
9bfd585a11 Fix parameter name in .pxd file 2017-09-26 07:28:50 -05:00
Matthew Honnibal
74f08e1ad5 Update test 2017-09-26 06:45:56 -05:00
Matthew Honnibal
5aaef3e7b8 Dont link vectors in vocab deserialize 2017-09-26 06:45:47 -05:00
Matthew Honnibal
18a27c7579 Fix typo in tensorizer serialization 2017-09-26 06:45:14 -05:00
Matthew Honnibal
5056743ad5 Fix parser serialization 2017-09-26 06:44:56 -05:00
Ines Montani
7123139b2b Add __contains__ to PhraseMatcher 2017-09-26 13:13:27 +02:00
Ines Montani
50ad50f96a Update matcher.pyx 2017-09-26 13:11:17 +02:00
Matthew Honnibal
e34e70673f Allow tagger models to be built with pre-defined tok2vec layer 2017-09-26 05:51:52 -05:00
Matthew Honnibal
bf917225ab Allow multi-task objectives during training 2017-09-26 05:42:52 -05:00
Matthew Honnibal
4ae9ea7684 Remove unused argument in Language 2017-09-26 05:41:35 -05:00
ines
edf7e4881d Add meta.json option to cli.train and add relevant properties
Add accuracy scores to meta.json instead of accuracy.json and replace
all relevant properties like lang, pipeline, spacy_version in existing
meta.json. If not present, also add name and version placeholders to
make it packagable.
2017-09-25 19:00:47 +02:00
ines
d2d35b63b7 Fix formatting 2017-09-25 18:37:13 +02:00
Matthew Honnibal
8eb0b7b779 Add docstrings for Pipe API 2017-09-25 16:22:07 +02:00
Matthew Honnibal
39f390dba7 Add docstrings for Pipe API 2017-09-25 16:20:49 +02:00
Matthew Honnibal
8716ffe57d Serialize vocab last 2017-09-24 05:01:45 -05:00
Matthew Honnibal
72bbcc0871 Handle lemmatization for unknown string IDs 2017-09-24 05:01:31 -05:00
Matthew Honnibal
204b58c864 Fix evaluation during training 2017-09-24 05:01:03 -05:00
Matthew Honnibal
dc3a623d00 Remove unused update_shared argument 2017-09-24 05:00:37 -05:00
Matthew Honnibal
63bd87508d Don't use iterated convolutions 2017-09-23 04:39:17 -05:00
Matthew Honnibal
5a7fd0fd36 Fix vector linkage 2017-09-22 20:11:52 -05:00
Matthew Honnibal
4348c479fc Merge pre-trained vectors and noshare patches 2017-09-22 20:07:28 -05:00
Matthew Honnibal
7dc61b3f43 Whitespace 2017-09-22 20:00:50 -05:00
Matthew Honnibal
e93d43a43a Fix training with preset vectors 2017-09-22 20:00:40 -05:00
Matthew Honnibal
0795857dcb Fix beam parsing 2017-09-23 02:59:53 +02:00
Matthew Honnibal
4bd6a12b1f Fix Tok2Vec 2017-09-23 02:58:54 +02:00
Matthew Honnibal
386c1a5bd8 Fix tagger training 2017-09-23 02:58:06 +02:00
Matthew Honnibal
a2357cce3f Set random seed in train script 2017-09-23 02:57:31 +02:00
Matthew Honnibal
05596159bf Fix serialization when pre-trained vectors 2017-09-22 15:33:27 -05:00
Matthew Honnibal
980fb6e854 Refactor Tok2Vec 2017-09-22 09:38:36 -05:00
Matthew Honnibal
d9124f1aa3 Add link_vectors_to_models function 2017-09-22 09:38:22 -05:00
Matthew Honnibal
a186596307 Add 'reapply' combinator, for iterated CNN 2017-09-22 09:37:03 -05:00
Matthew Honnibal
40a4873b70 Fix serialization of model options 2017-09-21 13:07:26 -05:00
Matthew Honnibal
0a9016cade Fix serialization during training 2017-09-21 13:06:45 -05:00
Matthew Honnibal
20193371f5 Don't share CNN, to reduce complexities 2017-09-21 14:59:48 +02:00
Matthew Honnibal
1d73dec8b1 Refactor train script 2017-09-20 19:17:10 -05:00
Matthew Honnibal
ffda38356a Add util function to enable GPU 2017-09-20 19:16:35 -05:00
Matthew Honnibal
24e85c2048 Pass values for CNN maxout pieces option 2017-09-20 19:16:12 -05:00
Matthew Honnibal
b832f89ff8 Add resume_training function 2017-09-20 19:15:20 -05:00
Matthew Honnibal
f5144f04be Add argument for CNN maxout pieces 2017-09-20 19:14:41 -05:00
Matthew Honnibal
842e21de9f Fix int type error for Python 2 2017-09-20 23:55:30 +02:00
Matthew Honnibal
0c93c73e49 Add __reduce__ method for PhraseMatcher 2017-09-20 22:26:40 +02:00
Matthew Honnibal
cc408fc189 Make PhraseMatcher API like Matcher API 2017-09-20 22:20:35 +02:00
Matthew Honnibal
43ad250dd5 Update matcher tests 2017-09-20 21:54:49 +02:00
Matthew Honnibal
828cc91545 Fix PhraseMatcher for spaCy 2 2017-09-20 21:54:31 +02:00
Matthew Honnibal
78301b2d29 Avoid comparison to None in Tok2Vec 2017-09-20 00:19:34 +02:00
Matthew Honnibal
b36a38f63d Fix serialization of pretrained_dims property 2017-09-19 23:42:27 +02:00
Matthew Honnibal
2489dcaccf Fix serialization of parser 2017-09-19 23:42:12 +02:00
Matthew Honnibal
40837b275d Fix tensorizer with pretrained vectors 2017-09-18 18:05:38 -05:00
Matthew Honnibal
a0c4b33d03 Support resuming a model during spacy train 2017-09-18 18:04:47 -05:00
Matthew Honnibal
c858927271 Copy vectors to GPU on begin training 2017-09-18 18:04:16 -05:00
Matthew Honnibal
3fa76c17d1 Refactor Tok2Vec 2017-09-18 15:00:05 -05:00
Matthew Honnibal
217e7891cd Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-09-18 11:36:21 -05:00
Matthew Honnibal
7b3f391f80 Try dropping the Affine layer, conditionally 2017-09-18 11:35:59 -05:00
ines
2480f8f521 Add missing return in Doc.from_disk() (closes #1330) 2017-09-18 15:32:00 +02:00
Matthew Honnibal
2148ae605b Dont use iterated convolutions 2017-09-17 17:36:04 -05:00
Matthew Honnibal
c013e5996f Fix parser test 2017-09-17 13:13:20 -05:00
Matthew Honnibal
8f42f8d305 Remove unused 'preprocess' argument in Tok2Vec' 2017-09-17 12:30:16 -05:00
Matthew Honnibal
039d609362 Remove hard-coded default vectors width 2017-09-17 12:29:39 -05:00
Matthew Honnibal
4f38a67a89 Make width default to 0 in vectors.pyx 2017-09-17 12:29:14 -05:00
Matthew Honnibal
16122f566e Fix cpdef enum in attrs.pyx 2017-09-17 12:28:53 -05:00
Matthew Honnibal
b159e0eb50 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-09-17 05:47:50 -05:00
Matthew Honnibal
2b0efc77ae Fix wiring of pre-trained vectors in parser loading 2017-09-17 05:47:34 -05:00
Matthew Honnibal
31c2e91c35 Fix wiring of pre-trained vectors in parser loading 2017-09-17 05:46:55 -05:00
Matthew Honnibal
8f913a74ca Fix defaults and args to build_tagger_model 2017-09-17 05:46:36 -05:00
Matthew Honnibal
c003c561c3 Revert NER action loading change, for model compatibility 2017-09-17 05:46:03 -05:00
Matthew Honnibal
43210abacc Resolve fine-tuning conflict 2017-09-17 05:30:04 -05:00
ines
ece30c28a8 Don't split hyphenated words in German
This way, the tokenizer matches the tokenization in German treebanks
2017-09-16 20:40:15 +02:00
ines
68f66aebf8 Use pkg_resources instead of pip for is_package (resolves #1293) 2017-09-16 20:27:59 +02:00
Matthew Honnibal
5ff2491f24 Pass option for pre-trained vectors in parser 2017-09-16 12:47:21 -05:00
Matthew Honnibal
8665a77f48 Fix feature error in NER 2017-09-16 12:46:57 -05:00
Matthew Honnibal
e37a50a436 Pass documents to tensorizer, not 'features' 2017-09-16 12:46:36 -05:00
Matthew Honnibal
84e637e2e6 Pass option for pretrained vectors in pipeline 2017-09-16 12:46:02 -05:00
Matthew Honnibal
2a93404da6 Support optional pre-trained vectors in tensorizer model 2017-09-16 12:45:37 -05:00
Matthew Honnibal
e0a2aa9289 Support having word vectors data on GPU 2017-09-16 12:45:09 -05:00
Matthew Honnibal
ebf8942564 Fix test for Python3 2017-09-16 16:22:38 +02:00
Matthew Honnibal
8c945310fb Excuse emoji failure on narrow unicode builds 2017-09-16 16:21:13 +02:00
Matthew Honnibal
11f2a05ede Fix code explosion from long enum in Python 3, Cython 0.24+ 2017-09-16 12:20:04 +02:00
Matthew Honnibal
3fa5b40b5c Add test for hash consistency 2017-09-16 11:21:35 +02:00
Matthew Honnibal
f730d07e4e Fix prange error for Windows 2017-09-16 00:25:33 +02:00
Matthew Honnibal
4b2065430e Merge branch 'feature/parser-history' into develop 2017-09-15 10:42:20 +02:00
Matthew Honnibal
2f08489694 Remove AddHistory layer -- didnt work as planned 2017-09-15 10:41:40 +02:00
Matthew Honnibal
8b481e0465 Remove redundant brackets 2017-09-15 10:38:08 +02:00
Matthew Honnibal
d84607f6bb Vectorize update in AddHistory 2017-09-14 20:34:40 +02:00
Ines Montani
bd3da3d6fb Port over change from #1323 and tidy up 2017-09-14 19:23:13 +02:00
Matthew Honnibal
18347ab69c Implement AddHistory layer wrapper 2017-09-14 19:07:35 +02:00
Matthew Honnibal
d4ca6cef9e Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-09-14 17:00:07 +02:00
Matthew Honnibal
8c503487af Fix lookup of missing NER actions 2017-09-14 16:59:45 +02:00
Matthew Honnibal
664c5af745 Revert padding in parser 2017-09-14 16:59:25 +02:00
Matthew Honnibal
8496d76224 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-09-14 09:21:20 -05:00
Matthew Honnibal
d1518027a9 Increment version 2017-09-14 16:18:46 +02:00
Matthew Honnibal
70da88a3a7 Update comment on Language.begin_training 2017-09-14 16:18:30 +02:00
Matthew Honnibal
c6395b057a Improve parser feature extraction, for missing values 2017-09-14 16:18:02 +02:00
Matthew Honnibal
daf869ab3b Fix add_action for NER, so labelled 'O' actions aren't added 2017-09-14 16:16:41 +02:00
Matthew Honnibal
9cb2aef587 Remove print statement 2017-09-14 13:38:28 +02:00
Matthew Honnibal
ba23d63c35 Fix minibatch function, for fixed batch size 2017-09-14 13:37:41 +02:00
Matthew Honnibal
456bb8a74c Unxfail and close #1305 2017-09-06 19:14:17 +02:00
Matthew Honnibal
99e44fbdbb Update regression test 2017-09-06 19:13:51 +02:00
Matthew Honnibal
5c3ff06924 Fix lemmatizer rules 2017-09-06 19:13:24 +02:00
Matthew Honnibal
dd9cab0faf Fix type-check for int/long 2017-09-06 19:03:05 +02:00
Matthew Honnibal
497a9308a8 Xfail new lemmatizer test 2017-09-06 18:41:22 +02:00
Matthew Honnibal
dcbf866970 Merge parser changes 2017-09-06 18:41:05 +02:00
Matthew Honnibal
5384fff5ce Add test for 1305: Incorrect lemmatization of VBZ for English 2017-09-06 18:40:18 +02:00
Matthew Honnibal
24ff6b0ad9 Fix parsing and tok2vec models 2017-09-06 05:50:58 -05:00
Matthew Honnibal
1b65115bc2 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-09-04 20:02:53 -05:00
Matthew Honnibal
33fa91feb7 Restore correctness of parser model 2017-09-04 21:19:30 +02:00
Matthew Honnibal
e88a42e460 Increment version 2017-09-04 21:14:39 +02:00
Matthew Honnibal
9d65d67985 Preserve model compatibility in parser, for now 2017-09-04 16:46:22 +02:00
Matthew Honnibal
d5fbf27335 Fix test 2017-09-04 16:45:11 +02:00
Matthew Honnibal
7fdafcc4c4 Fix config loading in tagger 2017-09-04 16:38:49 +02:00
Matthew Honnibal
058372d120 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-09-04 16:27:53 +02:00
Matthew Honnibal
16e25ce3b5 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-09-04 09:26:53 -05:00
Matthew Honnibal
9f512e657a Fix drop_layer calculation 2017-09-04 09:26:38 -05:00
Matthew Honnibal
cb4839033c Fix loader for EN tests 2017-09-04 15:19:18 +02:00
Matthew Honnibal
382ce566eb Fix deserialization bug 2017-09-04 15:19:01 +02:00
Matthew Honnibal
bfddf50081 Fix #1296: Incorrect lemmatization of base form verbs 2017-09-04 15:18:41 +02:00
Matthew Honnibal
b29e6bff46 Improve lemmatization rule for am|VBP 2017-09-04 15:18:10 +02:00
Matthew Honnibal
644d6c9e1a Improve lemmatization tests, re #1296 2017-09-04 15:17:44 +02:00
Matthew Honnibal
3cf3fa1704 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-09-02 12:46:11 -05:00
Matthew Honnibal
e920885676 Fix pickle during train 2017-09-02 12:46:01 -05:00
Matthew Honnibal
c0eaba8b28 Fix low-data textcat 2017-09-02 15:17:32 +02:00
Matthew Honnibal
9e378bdac5 Fix textcat serialization 2017-09-02 15:17:20 +02:00
Matthew Honnibal
e3ea6ee02b Increment version 2017-09-02 15:17:01 +02:00
Matthew Honnibal
a3b69bcb3d Add low_data mode in textcat 2017-09-02 14:56:30 +02:00
Matthew Honnibal
ead78c7b9b Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-09-02 12:55:25 +02:00
Matthew Honnibal
5e6a9e7dcc Add rule-based SBD 2017-09-02 12:53:38 +02:00
Matthew Honnibal
a824cf8f9a Adjust text classification model 2017-09-02 11:41:00 +02:00
Matthew Honnibal
ac040b99bb Add support for pre-trained vectors in text classifier 2017-09-01 16:39:55 +02:00
Matthew Honnibal
7742a6d559 Add GloVe vectors reader 2017-09-01 16:39:22 +02:00
Matthew Honnibal
789e1a3980 Use 13 parser features, not 8 2017-08-31 14:13:00 -05:00
Matthew Honnibal
30e35d9666 Fix syntax error 2017-08-30 17:35:39 -05:00
Matthew Honnibal
4ceebde523 Fix gradient bug in parser 2017-08-30 17:32:56 -05:00
ines
173089a45a Add more validation for model meta 2017-08-29 11:21:46 +02:00
Matthew Honnibal
2e28982e28 Merge pull request #1288 from geovedi/indonesian
Indonesian language support
2017-08-26 21:31:13 +02:00
ines
7e04b7f89c Fix info text on pipeline in package cli 2017-08-26 18:30:59 +02:00
ines
40afa13a8a Increment version 2017-08-26 18:30:49 +02:00
Matthew Honnibal
876f38c548 Merge pull request #1279 from oroszgy/model_cli_v2
Added vector loading to model cli
2017-08-26 15:57:50 +02:00
Matthew Honnibal
cfc055734e Split % in units, for compatibility with corpus 2017-08-25 20:03:37 -05:00
Matthew Honnibal
4bb6bc3f9e Add support for sent_start to GoldParse 2017-08-25 20:03:14 -05:00
Matthew Honnibal
44589fb38c Fix Break oracle 2017-08-25 19:50:55 -05:00
Matthew Honnibal
6d4e8e14ca Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-25 12:37:16 -05:00
Matthew Honnibal
4ce5531389 Use layer norm instead of batch norm 2017-08-25 12:37:10 -05:00
Matthew Honnibal
20dd66ddc2 Constrain sentence boundaries to IS_PUNCT and IS_SPACE tokens 2017-08-25 19:35:47 +02:00
Jim Geovedi
58d8078971 Merge remote-tracking branch 'upstream/develop' into indonesian 2017-08-25 09:21:49 +08:00
Matthew Honnibal
6ceb0f0518 Allow Lexeme.rank to be set 2017-08-24 21:43:00 +02:00
Matthew Honnibal
44a1fa80d3 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-23 13:02:16 +02:00
ines
bb1abbeba5 Only link model if download was successfull 2017-08-23 12:36:31 +02:00
Matthew Honnibal
bb2541ffd3 Fix PROB attr for OOV words 2017-08-23 12:11:52 +02:00
Matthew Honnibal
1c5c256e58 Fix fine_tune when optimizer is None 2017-08-23 10:51:33 +02:00
Matthew Honnibal
9c580ad28a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-22 17:02:04 -05:00
Matthew Honnibal
a4633fff6f Restore use of batch norm in model 2017-08-22 17:01:58 -05:00
Matthew Honnibal
03b5b9727a Fix Doc.vector for empty doc objects 2017-08-22 19:52:19 +02:00
Matthew Honnibal
0551b7b03a Fix doc.vector 2017-08-22 19:46:52 +02:00
Matthew Honnibal
83f8e98450 Fix retrieval of OOV vectors 2017-08-22 19:46:35 +02:00
Matthew Honnibal
df2745eb08 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-22 19:00:43 +02:00
Matthew Honnibal
5b329acbf2 Fix vectors_length property in vocab 2017-08-22 19:00:27 +02:00
Matthew Honnibal
1fe605dfe5 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-21 19:18:31 -05:00
Matthew Honnibal
18b64e79ec Fix fine tuning 2017-08-21 19:18:26 -05:00
Matthew Honnibal
682346dd66 Restore optimized hidden_depth=0 for parser 2017-08-21 19:18:04 -05:00
Matthew Honnibal
a21d8f3f0b Add predict paths to _ml models 2017-08-21 23:23:45 +02:00
Matthew Honnibal
cec76801dc Add profile command to CLI 2017-08-21 23:23:05 +02:00
Matthew Honnibal
7be5f30f17 Add profile function 2017-08-21 23:22:49 +02:00
ines
a68dc891ea Port over changes from #1281 2017-08-21 23:19:18 +02:00
Matthew Honnibal
5e50a65252 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-21 14:15:46 -05:00
Matthew Honnibal
80acbc5f1f Fix fine-tune weight mixture 2017-08-21 14:15:29 -05:00
ines
d15775c3ad Fix typos and commands in alpha docs 2017-08-21 13:40:11 +02:00
Gyorgy Orosz
b3576bfc86 Added vector leading to model cli 2017-08-20 23:16:12 +02:00
Matthew Honnibal
c10f63bf10 Initialize fine tuning to 0.5 2017-08-20 15:59:48 -05:00
Matthew Honnibal
62878e50db Fix misalignment caued by filtering inputs at wrong point in parser 2017-08-20 15:59:28 -05:00
Matthew Honnibal
78a5f842e9 Fix update when update_shared=False 2017-08-20 15:58:34 -05:00
Matthew Honnibal
7a6edeea68 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-20 12:55:39 -05:00
Matthew Honnibal
f2f9229964 Fix name of update_shared flag 2017-08-20 18:19:06 +02:00
Matthew Honnibal
8a59718fd6 Fix fine-tuning 2017-08-20 18:17:35 +02:00
Matthew Honnibal
80a5146ec2 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-20 11:07:08 -05:00
Matthew Honnibal
84bb543e4d Add gold_preproc flag to cli/train 2017-08-20 11:07:00 -05:00
Matthew Honnibal
3fe0d76e6d Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-20 14:50:01 +02:00
Matthew Honnibal
c1d3ff517a Track loss in tagger 2017-08-20 14:42:23 +02:00
Matthew Honnibal
8875590081 Add optimizer in Language.update if sgd=None 2017-08-20 14:42:07 +02:00
Matthew Honnibal
84b7ed49e4 Ensure updates aren't made if no gold available 2017-08-20 14:41:38 +02:00
Ines Montani
c2bbd393af Merge pull request #1276 from oroszgy/model_cli_v2
Ported model cli from v1
2017-08-20 11:52:59 +02:00
Jim Geovedi
f77443ab68 reworked 2017-08-20 13:43:21 +07:00
Jim Geovedi
fbc62a09c7 added {pre,suf,in}fix tests 2017-08-20 13:43:00 +07:00
Jim Geovedi
713d7c0aa0 added indonesian lang test 2017-08-20 12:17:14 +07:00
Jim Geovedi
b7d83f37c8 indonesian abbr. 2017-08-20 12:16:50 +07:00
Jim Geovedi
7193c47f0b direct lookup 2017-08-20 11:57:52 +07:00
Jim Geovedi
fdf802d505 added examples 2017-08-20 11:57:10 +07:00
Jim Geovedi
fa544e6c9a Merge remote-tracking branch 'upstream/develop' into indonesian 2017-08-20 11:49:40 +07:00
Matthew Honnibal
42fa84075f Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-08-19 22:42:50 +02:00
Matthew Honnibal
aefef6fd28 Prevent strings from being lost during from_disk and from_bytes 2017-08-19 22:42:17 +02:00
ines
281e7e58b3 Don't escape forward slashes on ujson.dumps 2017-08-19 22:32:16 +02:00
ines
2d126a00ae Fix typo 2017-08-19 22:32:07 +02:00
Matthew Honnibal
41c2218c53 Fix test for vectors 2017-08-19 22:09:12 +02:00
Matthew Honnibal
b8e1603cc4 Fix load fail for missing vectors 2017-08-19 22:07:00 +02:00