Commit Graph

4056 Commits

Author SHA1 Message Date
Matthew Honnibal
c6cd81f192 Wrap try/except around model saving 2017-10-05 08:14:24 -05:00
Matthew Honnibal
5743b06e36 Wrap model saving in try/except 2017-10-05 08:12:50 -05:00
Matthew Honnibal
fd4baff475 Update tests 2017-10-05 08:12:27 -05:00
Matthew Honnibal
dcdfa071aa Disable LayerNorm hack 2017-10-04 20:06:52 -05:00
Matthew Honnibal
943af4423a Make depth setting in parser work again 2017-10-04 20:06:05 -05:00
Matthew Honnibal
bfabc333be Merge remote-tracking branch 'origin/develop' into feature/parser-history-model 2017-10-04 20:00:36 -05:00
Matthew Honnibal
92066b04d6 Fix Embed and HistoryFeatures 2017-10-04 19:55:34 -05:00
Matthew Honnibal
d903986439 Increment version 2017-10-04 17:14:26 +02:00
Matthew Honnibal
40edb65ee7 Make test work for Python 2.7 2017-10-04 16:36:50 +02:00
Matthew Honnibal
bd8e84998a Add nO attribute to TextCategorizer model 2017-10-04 16:07:30 +02:00
Matthew Honnibal
f8a0614527 Improve textcat model slightly 2017-10-04 15:15:53 +02:00
Matthew Honnibal
39798b0172 Uncomment layernorm adjustment hack 2017-10-04 15:12:09 +02:00
Matthew Honnibal
b3a7082bf8 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-04 14:56:46 +02:00
Matthew Honnibal
db05d4d582 Add test for #1380. Passes without fix? 2017-10-04 14:56:31 +02:00
Matthew Honnibal
774f5732bd Fix dimensionality of textcat when no vectors available 2017-10-04 14:55:15 +02:00
Ines Montani
28ba0b9b51 Merge pull request #1385 from explosion/feature/new-website
💫 New spaCy website
2017-10-04 14:35:52 +02:00
Matthew Honnibal
af75b74208 Unset LayerNorm backwards compat hack 2017-10-03 20:47:10 -05:00
ines
73ac0aa0b5 Update spacy evaluate and add displaCy option 2017-10-04 00:03:15 +02:00
Matthew Honnibal
246612cb53 Merge remote-tracking branch 'origin/develop' into feature/parser-history-model 2017-10-03 16:56:42 -05:00
Matthew Honnibal
f24c2e3a8a Fix evaluate for non-GPU 2017-10-03 22:47:31 +02:00
Matthew Honnibal
5cbefcba17 Set backwards compatibility flag 2017-10-03 20:29:58 +02:00
Matthew Honnibal
5454b20cd7 Update thinc imports for 6.9 2017-10-03 20:07:17 +02:00
Matthew Honnibal
4a59f6358c Fix thinc imports 2017-10-03 19:21:26 +02:00
Matthew Honnibal
e514d6aa0a Import thinc modules more explicitly, to avoid cycles 2017-10-03 18:49:25 +02:00
Matthew Honnibal
338e1fda0e Unbreak merge artefact 2017-10-03 09:41:05 -05:00
Matthew Honnibal
1289187279 Fix circular import 2017-10-03 09:33:21 -05:00
Matthew Honnibal
a44c4c3a5b Add timer to evaluate 2017-10-03 09:15:35 -05:00
Matthew Honnibal
96da86b3e5 Add support for verbose flag to Language 2017-10-03 09:14:57 -05:00
Matthew Honnibal
02586a5243 Add timing to spacy evaluate command 2017-10-03 09:14:34 -05:00
ines
e49cd7aeaf Move import into load to avoid circular imports 2017-10-03 15:22:19 +02:00
ines
b0dfa059db Update docs link in about.py 2017-10-03 15:19:55 +02:00
Matthew Honnibal
dc3c791947 Fix history size option 2017-10-03 13:41:23 +02:00
Matthew Honnibal
278a4c17c6 Fix history features 2017-10-03 13:27:10 +02:00
Matthew Honnibal
b770f4e108 Fix embed class in history features 2017-10-03 13:26:55 +02:00
Matthew Honnibal
b50a359e11 Add support for history features in parsing models 2017-10-03 12:44:01 +02:00
Matthew Honnibal
ee41e4fea7 Support history features in stateclass 2017-10-03 12:43:48 +02:00
Matthew Honnibal
6aa6a5bc25 Add a layer type for history features 2017-10-03 12:43:09 +02:00
Matthew Honnibal
8902df44de Fix component disabling during training 2017-10-02 21:07:23 +02:00
Matthew Honnibal
c617d288d8 Update pipeline component names in spaCy train 2017-10-02 17:20:19 +02:00
Matthew Honnibal
f942903429 Improve sentence merging in iob2json 2017-10-02 17:02:10 +02:00
Matthew Honnibal
31681d20e0 Fix concatenation in iob2json converter 2017-10-02 16:50:26 +02:00
Matthew Honnibal
4896ce3320 Remove misleading comment 2017-10-02 00:09:14 +02:00
Matthew Honnibal
d90cc917fa Merge vectors.pyx doc strings 2017-10-01 17:05:54 -05:00
Matthew Honnibal
b2a8b9be77 Fix inconsistency of Vectors class API 2017-10-01 17:00:34 -05:00
Matthew Honnibal
e38089d598 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-01 22:10:54 +02:00
Matthew Honnibal
97c409b602 Add docstrings for spacy.vectors 2017-10-01 22:10:33 +02:00
ines
b776f48e58 Fix typo 2017-10-01 21:58:45 +02:00
Matthew Honnibal
94df115a81 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-01 14:06:23 -05:00
Matthew Honnibal
2cf0f4622f Fix loading of models with pre-trained vectors 2017-10-01 14:05:32 -05:00
Matthew Honnibal
69c7c642c2 Add spacy evaluate 2017-10-01 14:05:04 -05:00
ines
8dbe49ecb8 Always compare lowercase package names
Otherwise, is_package will return False if model name contains
uppercase characters. See this issue:
https://support.prodi.gy/t/saving-a-trained-ner-model-as-a-loadable-modu
le/46/6
2017-09-29 20:55:17 +02:00
ines
153c2589d4 Revert "Always compare lowercase package names"
This reverts commit 7d77dc490f.
2017-09-29 20:53:36 +02:00
ines
fd1a9225d8 Handle conversion of pipeline components correctly
Allow both comma and comma + whitespace as separators
2017-09-29 20:52:56 +02:00
ines
7d77dc490f Always compare lowercase package names
Otherwise, is_package will return False if model name contains
uppercase characters. See this issue:
https://support.prodi.gy/t/saving-a-trained-ner-model-as-a-loadable-modu
le/46/6
2017-09-29 20:52:28 +02:00
Matthew Honnibal
cdb2d83e16 Pass dropout in parser 2017-09-28 18:47:13 -05:00
Matthew Honnibal
158e177cae Fix default embed size 2017-09-28 08:25:23 -05:00
Matthew Honnibal
f6330d69e6 Default embed size to 7000 2017-09-28 08:07:41 -05:00
Matthew Honnibal
ac8481a7b0 Print NER loss 2017-09-28 08:05:31 -05:00
Matthew Honnibal
542ebfa498 Improve defaults 2017-09-27 18:54:37 -05:00
Matthew Honnibal
dcb86bdc43 Default batch size to 32 2017-09-27 11:48:19 -05:00
Matthew Honnibal
1a37a2c0a0 Update training defaults 2017-09-27 11:48:07 -05:00
Matthew Honnibal
13d7a97f3a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-09-27 11:44:37 -05:00
Matthew Honnibal
66c388ee01 Remove unhelpful multitask objectives 2017-09-27 11:44:16 -05:00
Matthew Honnibal
983201a83a Fix hard-coded vector width 2017-09-27 11:43:58 -05:00
Ines Montani
959c46eabe Merge pull request #1365 from wannaphongcom/develop
Add Thai language for spaCy v2
2017-09-26 23:43:05 +02:00
Matthew Honnibal
1ef4236f8e Merge pull request #1343 from explosion/feature/phrasematcher
Update PhraseMatcher for spaCy 2
2017-09-26 20:44:23 +02:00
Wannaphong Phatthiyaphaibun
7b5263ffa4 fix thai test 2017-09-26 23:54:15 +07:00
ines
1ff62eaee7 Fix option shortcut to avoid conflict 2017-09-26 17:59:34 +02:00
Wannaphong Phatthiyaphaibun
3d5046c499 fix import in th 2017-09-26 22:41:20 +07:00
ines
7fdfb78141 Add version option to cli.train 2017-09-26 17:34:52 +02:00
Wannaphong Phatthiyaphaibun
a63f790b8c fix thai tag_map 2017-09-26 22:28:57 +07:00
Wannaphong Phatthiyaphaibun
2ea27d07f4 fix tokenizer_exceptions in thai 2017-09-26 22:14:47 +07:00
Matthew Honnibal
41cc5c4c17 Merge branch 'develop' into feature/phrasematcher 2017-09-26 09:59:17 -05:00
Matthew Honnibal
c2e2f81773 Merge pull request #1355 from explosion/feature/noshare
Make pipeline components independent
2017-09-26 16:58:09 +02:00
Wannaphong Phatthiyaphaibun
a2bf4cc7bf fix newline in file 2017-09-26 21:49:43 +07:00
ines
bb5c631402 Implement like_num getter for French (via #1161) 2017-09-26 16:47:45 +02:00
ines
15479b3bae Add comment to like_num re: future work 2017-09-26 16:43:28 +02:00
ines
adda08fe14 Implement like_num getter for Dutch (via #1177) 2017-09-26 16:39:15 +02:00
ines
5ee10379db Port over changes from #1340 2017-09-26 16:38:08 +02:00
Wannaphong Phatthiyaphaibun
5cba67146c add thai in spacy2 2017-09-26 21:36:27 +07:00
ines
10d291f129 Port over change from #1351 2017-09-26 16:11:41 +02:00
Matthew Honnibal
3274b46a0d Try to fix compile error on Windows 2017-09-26 09:05:53 -05:00
Matthew Honnibal
19c7c09bf7 Fix PhraseMatcher.__contains__ 2017-09-26 08:35:53 -05:00
Matthew Honnibal
d02a41a8c9 Merge remote-tracking branch 'origin/develop' into feature/phrasematcher 2017-09-26 08:32:55 -05:00
Matthew Honnibal
698fc0d016 Remove merge artefact 2017-09-26 08:31:37 -05:00
Matthew Honnibal
defb68e94f Update feature/noshare with recent develop changes 2017-09-26 08:15:14 -05:00
Matthew Honnibal
ca28590ddd Use dep and ent multi-task objectives for parser' 2017-09-26 08:13:52 -05:00
Matthew Honnibal
9bfd585a11 Fix parameter name in .pxd file 2017-09-26 07:28:50 -05:00
Matthew Honnibal
74f08e1ad5 Update test 2017-09-26 06:45:56 -05:00
Matthew Honnibal
5aaef3e7b8 Dont link vectors in vocab deserialize 2017-09-26 06:45:47 -05:00
Matthew Honnibal
18a27c7579 Fix typo in tensorizer serialization 2017-09-26 06:45:14 -05:00
Matthew Honnibal
5056743ad5 Fix parser serialization 2017-09-26 06:44:56 -05:00
Ines Montani
7123139b2b Add __contains__ to PhraseMatcher 2017-09-26 13:13:27 +02:00
Ines Montani
50ad50f96a Update matcher.pyx 2017-09-26 13:11:17 +02:00
Matthew Honnibal
e34e70673f Allow tagger models to be built with pre-defined tok2vec layer 2017-09-26 05:51:52 -05:00
Matthew Honnibal
bf917225ab Allow multi-task objectives during training 2017-09-26 05:42:52 -05:00
Matthew Honnibal
4ae9ea7684 Remove unused argument in Language 2017-09-26 05:41:35 -05:00
ines
edf7e4881d Add meta.json option to cli.train and add relevant properties
Add accuracy scores to meta.json instead of accuracy.json and replace
all relevant properties like lang, pipeline, spacy_version in existing
meta.json. If not present, also add name and version placeholders to
make it packagable.
2017-09-25 19:00:47 +02:00
ines
d2d35b63b7 Fix formatting 2017-09-25 18:37:13 +02:00
Matthew Honnibal
8eb0b7b779 Add docstrings for Pipe API 2017-09-25 16:22:07 +02:00
Matthew Honnibal
39f390dba7 Add docstrings for Pipe API 2017-09-25 16:20:49 +02:00
Matthew Honnibal
8716ffe57d Serialize vocab last 2017-09-24 05:01:45 -05:00
Matthew Honnibal
72bbcc0871 Handle lemmatization for unknown string IDs 2017-09-24 05:01:31 -05:00
Matthew Honnibal
204b58c864 Fix evaluation during training 2017-09-24 05:01:03 -05:00
Matthew Honnibal
dc3a623d00 Remove unused update_shared argument 2017-09-24 05:00:37 -05:00
Matthew Honnibal
63bd87508d Don't use iterated convolutions 2017-09-23 04:39:17 -05:00
Matthew Honnibal
5a7fd0fd36 Fix vector linkage 2017-09-22 20:11:52 -05:00
Matthew Honnibal
4348c479fc Merge pre-trained vectors and noshare patches 2017-09-22 20:07:28 -05:00
Matthew Honnibal
7dc61b3f43 Whitespace 2017-09-22 20:00:50 -05:00
Matthew Honnibal
e93d43a43a Fix training with preset vectors 2017-09-22 20:00:40 -05:00
Matthew Honnibal
0795857dcb Fix beam parsing 2017-09-23 02:59:53 +02:00
Matthew Honnibal
4bd6a12b1f Fix Tok2Vec 2017-09-23 02:58:54 +02:00
Matthew Honnibal
386c1a5bd8 Fix tagger training 2017-09-23 02:58:06 +02:00
Matthew Honnibal
a2357cce3f Set random seed in train script 2017-09-23 02:57:31 +02:00
Matthew Honnibal
05596159bf Fix serialization when pre-trained vectors 2017-09-22 15:33:27 -05:00
Matthew Honnibal
980fb6e854 Refactor Tok2Vec 2017-09-22 09:38:36 -05:00
Matthew Honnibal
d9124f1aa3 Add link_vectors_to_models function 2017-09-22 09:38:22 -05:00
Matthew Honnibal
a186596307 Add 'reapply' combinator, for iterated CNN 2017-09-22 09:37:03 -05:00
Matthew Honnibal
40a4873b70 Fix serialization of model options 2017-09-21 13:07:26 -05:00
Matthew Honnibal
0a9016cade Fix serialization during training 2017-09-21 13:06:45 -05:00
Matthew Honnibal
20193371f5 Don't share CNN, to reduce complexities 2017-09-21 14:59:48 +02:00
Matthew Honnibal
1d73dec8b1 Refactor train script 2017-09-20 19:17:10 -05:00
Matthew Honnibal
ffda38356a Add util function to enable GPU 2017-09-20 19:16:35 -05:00
Matthew Honnibal
24e85c2048 Pass values for CNN maxout pieces option 2017-09-20 19:16:12 -05:00
Matthew Honnibal
b832f89ff8 Add resume_training function 2017-09-20 19:15:20 -05:00
Matthew Honnibal
f5144f04be Add argument for CNN maxout pieces 2017-09-20 19:14:41 -05:00
Matthew Honnibal
842e21de9f Fix int type error for Python 2 2017-09-20 23:55:30 +02:00
Matthew Honnibal
0c93c73e49 Add __reduce__ method for PhraseMatcher 2017-09-20 22:26:40 +02:00
Matthew Honnibal
cc408fc189 Make PhraseMatcher API like Matcher API 2017-09-20 22:20:35 +02:00
Matthew Honnibal
43ad250dd5 Update matcher tests 2017-09-20 21:54:49 +02:00
Matthew Honnibal
828cc91545 Fix PhraseMatcher for spaCy 2 2017-09-20 21:54:31 +02:00
Matthew Honnibal
78301b2d29 Avoid comparison to None in Tok2Vec 2017-09-20 00:19:34 +02:00
Matthew Honnibal
b36a38f63d Fix serialization of pretrained_dims property 2017-09-19 23:42:27 +02:00
Matthew Honnibal
2489dcaccf Fix serialization of parser 2017-09-19 23:42:12 +02:00
Matthew Honnibal
40837b275d Fix tensorizer with pretrained vectors 2017-09-18 18:05:38 -05:00
Matthew Honnibal
a0c4b33d03 Support resuming a model during spacy train 2017-09-18 18:04:47 -05:00
Matthew Honnibal
c858927271 Copy vectors to GPU on begin training 2017-09-18 18:04:16 -05:00
Matthew Honnibal
3fa76c17d1 Refactor Tok2Vec 2017-09-18 15:00:05 -05:00
Matthew Honnibal
217e7891cd Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-09-18 11:36:21 -05:00
Matthew Honnibal
7b3f391f80 Try dropping the Affine layer, conditionally 2017-09-18 11:35:59 -05:00
ines
2480f8f521 Add missing return in Doc.from_disk() (closes #1330) 2017-09-18 15:32:00 +02:00
Matthew Honnibal
2148ae605b Dont use iterated convolutions 2017-09-17 17:36:04 -05:00
Matthew Honnibal
c013e5996f Fix parser test 2017-09-17 13:13:20 -05:00
Matthew Honnibal
8f42f8d305 Remove unused 'preprocess' argument in Tok2Vec' 2017-09-17 12:30:16 -05:00
Matthew Honnibal
039d609362 Remove hard-coded default vectors width 2017-09-17 12:29:39 -05:00
Matthew Honnibal
4f38a67a89 Make width default to 0 in vectors.pyx 2017-09-17 12:29:14 -05:00
Matthew Honnibal
16122f566e Fix cpdef enum in attrs.pyx 2017-09-17 12:28:53 -05:00
Matthew Honnibal
b159e0eb50 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-09-17 05:47:50 -05:00
Matthew Honnibal
2b0efc77ae Fix wiring of pre-trained vectors in parser loading 2017-09-17 05:47:34 -05:00
Matthew Honnibal
31c2e91c35 Fix wiring of pre-trained vectors in parser loading 2017-09-17 05:46:55 -05:00
Matthew Honnibal
8f913a74ca Fix defaults and args to build_tagger_model 2017-09-17 05:46:36 -05:00
Matthew Honnibal
c003c561c3 Revert NER action loading change, for model compatibility 2017-09-17 05:46:03 -05:00
Matthew Honnibal
43210abacc Resolve fine-tuning conflict 2017-09-17 05:30:04 -05:00
ines
ece30c28a8 Don't split hyphenated words in German
This way, the tokenizer matches the tokenization in German treebanks
2017-09-16 20:40:15 +02:00
ines
68f66aebf8 Use pkg_resources instead of pip for is_package (resolves #1293) 2017-09-16 20:27:59 +02:00
Matthew Honnibal
5ff2491f24 Pass option for pre-trained vectors in parser 2017-09-16 12:47:21 -05:00
Matthew Honnibal
8665a77f48 Fix feature error in NER 2017-09-16 12:46:57 -05:00
Matthew Honnibal
e37a50a436 Pass documents to tensorizer, not 'features' 2017-09-16 12:46:36 -05:00
Matthew Honnibal
84e637e2e6 Pass option for pretrained vectors in pipeline 2017-09-16 12:46:02 -05:00
Matthew Honnibal
2a93404da6 Support optional pre-trained vectors in tensorizer model 2017-09-16 12:45:37 -05:00
Matthew Honnibal
e0a2aa9289 Support having word vectors data on GPU 2017-09-16 12:45:09 -05:00
Matthew Honnibal
ebf8942564 Fix test for Python3 2017-09-16 16:22:38 +02:00
Matthew Honnibal
8c945310fb Excuse emoji failure on narrow unicode builds 2017-09-16 16:21:13 +02:00
Matthew Honnibal
11f2a05ede Fix code explosion from long enum in Python 3, Cython 0.24+ 2017-09-16 12:20:04 +02:00
Matthew Honnibal
3fa5b40b5c Add test for hash consistency 2017-09-16 11:21:35 +02:00
Matthew Honnibal
f730d07e4e Fix prange error for Windows 2017-09-16 00:25:33 +02:00
Matthew Honnibal
4b2065430e Merge branch 'feature/parser-history' into develop 2017-09-15 10:42:20 +02:00
Matthew Honnibal
2f08489694 Remove AddHistory layer -- didnt work as planned 2017-09-15 10:41:40 +02:00
Matthew Honnibal
8b481e0465 Remove redundant brackets 2017-09-15 10:38:08 +02:00
Matthew Honnibal
d84607f6bb Vectorize update in AddHistory 2017-09-14 20:34:40 +02:00
Ines Montani
bd3da3d6fb Port over change from #1323 and tidy up 2017-09-14 19:23:13 +02:00
Matthew Honnibal
18347ab69c Implement AddHistory layer wrapper 2017-09-14 19:07:35 +02:00
Matthew Honnibal
d4ca6cef9e Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-09-14 17:00:07 +02:00
Matthew Honnibal
8c503487af Fix lookup of missing NER actions 2017-09-14 16:59:45 +02:00
Matthew Honnibal
664c5af745 Revert padding in parser 2017-09-14 16:59:25 +02:00
Matthew Honnibal
8496d76224 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-09-14 09:21:20 -05:00
Matthew Honnibal
d1518027a9 Increment version 2017-09-14 16:18:46 +02:00
Matthew Honnibal
70da88a3a7 Update comment on Language.begin_training 2017-09-14 16:18:30 +02:00
Matthew Honnibal
c6395b057a Improve parser feature extraction, for missing values 2017-09-14 16:18:02 +02:00
Matthew Honnibal
daf869ab3b Fix add_action for NER, so labelled 'O' actions aren't added 2017-09-14 16:16:41 +02:00
Matthew Honnibal
9cb2aef587 Remove print statement 2017-09-14 13:38:28 +02:00
Matthew Honnibal
ba23d63c35 Fix minibatch function, for fixed batch size 2017-09-14 13:37:41 +02:00
Matthew Honnibal
456bb8a74c Unxfail and close #1305 2017-09-06 19:14:17 +02:00
Matthew Honnibal
99e44fbdbb Update regression test 2017-09-06 19:13:51 +02:00
Matthew Honnibal
5c3ff06924 Fix lemmatizer rules 2017-09-06 19:13:24 +02:00
Matthew Honnibal
dd9cab0faf Fix type-check for int/long 2017-09-06 19:03:05 +02:00
Matthew Honnibal
497a9308a8 Xfail new lemmatizer test 2017-09-06 18:41:22 +02:00
Matthew Honnibal
dcbf866970 Merge parser changes 2017-09-06 18:41:05 +02:00
Matthew Honnibal
5384fff5ce Add test for 1305: Incorrect lemmatization of VBZ for English 2017-09-06 18:40:18 +02:00
Matthew Honnibal
24ff6b0ad9 Fix parsing and tok2vec models 2017-09-06 05:50:58 -05:00
Matthew Honnibal
1b65115bc2 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-09-04 20:02:53 -05:00
Matthew Honnibal
33fa91feb7 Restore correctness of parser model 2017-09-04 21:19:30 +02:00
Matthew Honnibal
e88a42e460 Increment version 2017-09-04 21:14:39 +02:00
Matthew Honnibal
9d65d67985 Preserve model compatibility in parser, for now 2017-09-04 16:46:22 +02:00
Matthew Honnibal
d5fbf27335 Fix test 2017-09-04 16:45:11 +02:00
Matthew Honnibal
7fdafcc4c4 Fix config loading in tagger 2017-09-04 16:38:49 +02:00
Matthew Honnibal
058372d120 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-09-04 16:27:53 +02:00
Matthew Honnibal
16e25ce3b5 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-09-04 09:26:53 -05:00
Matthew Honnibal
9f512e657a Fix drop_layer calculation 2017-09-04 09:26:38 -05:00
Matthew Honnibal
cb4839033c Fix loader for EN tests 2017-09-04 15:19:18 +02:00