Matthew Honnibal
92c5d78b42
Unhack NER.add_action
2017-10-07 19:02:40 +02:00
Matthew Honnibal
f2b590f672
Increment version
2017-10-07 19:01:01 +02:00
Matthew Honnibal
9bd8191739
Add tests for Underscore
2017-10-07 18:56:19 +02:00
Matthew Honnibal
668a0ea640
Pass extensions into Underscore class
2017-10-07 18:56:01 +02:00
Matthew Honnibal
1289129fd9
Add Underscore class
2017-10-07 18:00:14 +02:00
Matthew Honnibal
eb0595bea9
Merge pull request #1392 from explosion/feature/parser-history-model
...
💫 Parser history features
2017-10-07 15:07:02 +02:00
Matthew Honnibal
3d22ccf495
Update default hyper-parameters
2017-10-07 07:16:41 -05:00
Matthew Honnibal
09442d25ec
Merge remote-tracking branch 'origin/develop' into feature/parser-history-model
2017-10-07 07:05:04 -05:00
Matthew Honnibal
3b67eabfea
Allow empty dictionaries to match any token in Matcher
...
Often patterns need to match "any token". A clean way to denote this
is with the empty dict {}: this sets no constraints on the token,
so should always match.
The problem was that having attributes length==0 was used as an
end-of-array signal, so the matcher didn't handle this case correctly.
This patch compiles empty token spec dicts into a constraint
NULL_ATTR==0. The NULL_ATTR attribute, 0, is always set to 0 on the
lexeme -- so this always matches.
2017-10-07 03:36:15 +02:00
ines
0adadcb3f0
Fix beam parse model test
2017-10-07 02:15:15 +02:00
ines
b38a8f4a94
Fix and update pipe methods tests
2017-10-07 02:06:23 +02:00
Matthew Honnibal
0384f08218
Trigger nonproj.deprojectivize as a postprocess
2017-10-07 02:00:47 +02:00
Matthew Honnibal
3a65a0c970
Start adding tests for new pipeline management
2017-10-07 01:48:23 +02:00
ines
e43530269c
Update docstrings
2017-10-07 01:04:50 +02:00
ines
61a503a611
Fix parser test
2017-10-07 00:38:51 +02:00
ines
b39409173e
Add disable option and True/False/None values for pipeline
2017-10-07 00:29:08 +02:00
ines
2586b61b15
Fix formatting, tidy up and remove unused imports
2017-10-07 00:26:05 +02:00
ines
212c8f0711
Implement new Language methods and pipeline API
2017-10-07 00:25:54 +02:00
Matthew Honnibal
8be46d766e
Remove print statement
2017-10-06 16:19:02 -05:00
Matthew Honnibal
8e731009fe
Fix parser config serialization
2017-10-06 13:50:52 -05:00
Matthew Honnibal
f4c9a98166
Fix spacy evaluate command on non-GPU
2017-10-06 13:17:47 -05:00
Matthew Honnibal
16ba6aa8a6
Fix parser config serialization
2017-10-06 13:17:31 -05:00
Matthew Honnibal
c66399d8ae
Fix depth definition with history features
2017-10-06 06:20:05 -05:00
Matthew Honnibal
5c750a9c2f
Reserve 0 for 'missing' in history features
2017-10-06 06:10:13 -05:00
Matthew Honnibal
fbba7c517e
Pass dropout through to embed tables
2017-10-06 06:09:18 -05:00
Matthew Honnibal
21d11936fe
Fix significant train/test skew error in history feats
2017-10-06 06:08:50 -05:00
Matthew Honnibal
555d8c8bff
Fix beam history features
2017-10-05 22:21:50 -05:00
Matthew Honnibal
3db0a32fd6
Fix dropout for history features
2017-10-05 22:21:30 -05:00
Matthew Honnibal
b0618def8d
Add support for 2-token state option
2017-10-05 21:54:12 -05:00
Matthew Honnibal
363aa47b40
Clean up dead parsing code
2017-10-05 21:53:49 -05:00
Matthew Honnibal
ca12764772
Enable history features for beam parser
2017-10-05 21:53:29 -05:00
Matthew Honnibal
fc06b0a333
Fix training when hist_size==0
2017-10-05 21:52:28 -05:00
Matthew Honnibal
e25ffcb11f
Move history size under feature flags
2017-10-05 19:38:13 -05:00
Matthew Honnibal
563f46f026
Fix multi-label support for text classification
...
The TextCategorizer class is supposed to support multi-label
text classification, and allow training data to contain missing
values.
For this to work, the gradient of the loss should be 0 when labels
are missing. Instead, there was no way to actually denote "missing"
in the GoldParse class, and so the TextCategorizer class treated
the label set within gold.cats as complete.
To fix this, we change GoldParse.cats to be a dict instead of a list.
The GoldParse.cats dict should map to floats, with 1. denoting
'present' and 0. denoting 'absent'. Gradients are zeroed for categories
absent from the gold.cats dict. A nice bonus is that you can also set
values between 0 and 1 for partial membership. You can also set numeric
values, if you're using a text classification model that uses an
appropriate loss function.
Unfortunately this is a breaking change; although the functionality
was only recently introduced and hasn't been properly documented
yet. I've updated the example script accordingly.
2017-10-05 18:43:02 -05:00
Matthew Honnibal
c6cd81f192
Wrap try/except around model saving
2017-10-05 08:14:24 -05:00
Matthew Honnibal
5743b06e36
Wrap model saving in try/except
2017-10-05 08:12:50 -05:00
Matthew Honnibal
fd4baff475
Update tests
2017-10-05 08:12:27 -05:00
Matthew Honnibal
dcdfa071aa
Disable LayerNorm hack
2017-10-04 20:06:52 -05:00
Matthew Honnibal
943af4423a
Make depth setting in parser work again
2017-10-04 20:06:05 -05:00
Matthew Honnibal
bfabc333be
Merge remote-tracking branch 'origin/develop' into feature/parser-history-model
2017-10-04 20:00:36 -05:00
Matthew Honnibal
92066b04d6
Fix Embed and HistoryFeatures
2017-10-04 19:55:34 -05:00
Matthew Honnibal
d903986439
Increment version
2017-10-04 17:14:26 +02:00
Matthew Honnibal
40edb65ee7
Make test work for Python 2.7
2017-10-04 16:36:50 +02:00
Matthew Honnibal
bd8e84998a
Add nO attribute to TextCategorizer model
2017-10-04 16:07:30 +02:00
Matthew Honnibal
f8a0614527
Improve textcat model slightly
2017-10-04 15:15:53 +02:00
Matthew Honnibal
39798b0172
Uncomment layernorm adjustment hack
2017-10-04 15:12:09 +02:00
Matthew Honnibal
b3a7082bf8
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-04 14:56:46 +02:00
Matthew Honnibal
db05d4d582
Add test for #1380 . Passes without fix?
2017-10-04 14:56:31 +02:00
Matthew Honnibal
774f5732bd
Fix dimensionality of textcat when no vectors available
2017-10-04 14:55:15 +02:00
Ines Montani
28ba0b9b51
Merge pull request #1385 from explosion/feature/new-website
...
💫 New spaCy website
2017-10-04 14:35:52 +02:00
Matthew Honnibal
af75b74208
Unset LayerNorm backwards compat hack
2017-10-03 20:47:10 -05:00
ines
73ac0aa0b5
Update spacy evaluate and add displaCy option
2017-10-04 00:03:15 +02:00
Matthew Honnibal
246612cb53
Merge remote-tracking branch 'origin/develop' into feature/parser-history-model
2017-10-03 16:56:42 -05:00
Matthew Honnibal
f24c2e3a8a
Fix evaluate for non-GPU
2017-10-03 22:47:31 +02:00
Matthew Honnibal
5cbefcba17
Set backwards compatibility flag
2017-10-03 20:29:58 +02:00
Matthew Honnibal
5454b20cd7
Update thinc imports for 6.9
2017-10-03 20:07:17 +02:00
Matthew Honnibal
4a59f6358c
Fix thinc imports
2017-10-03 19:21:26 +02:00
Matthew Honnibal
e514d6aa0a
Import thinc modules more explicitly, to avoid cycles
2017-10-03 18:49:25 +02:00
Matthew Honnibal
338e1fda0e
Unbreak merge artefact
2017-10-03 09:41:05 -05:00
Matthew Honnibal
1289187279
Fix circular import
2017-10-03 09:33:21 -05:00
Matthew Honnibal
a44c4c3a5b
Add timer to evaluate
2017-10-03 09:15:35 -05:00
Matthew Honnibal
96da86b3e5
Add support for verbose flag to Language
2017-10-03 09:14:57 -05:00
Matthew Honnibal
02586a5243
Add timing to spacy evaluate command
2017-10-03 09:14:34 -05:00
ines
e49cd7aeaf
Move import into load to avoid circular imports
2017-10-03 15:22:19 +02:00
ines
b0dfa059db
Update docs link in about.py
2017-10-03 15:19:55 +02:00
Matthew Honnibal
dc3c791947
Fix history size option
2017-10-03 13:41:23 +02:00
Matthew Honnibal
278a4c17c6
Fix history features
2017-10-03 13:27:10 +02:00
Matthew Honnibal
b770f4e108
Fix embed class in history features
2017-10-03 13:26:55 +02:00
Matthew Honnibal
b50a359e11
Add support for history features in parsing models
2017-10-03 12:44:01 +02:00
Matthew Honnibal
ee41e4fea7
Support history features in stateclass
2017-10-03 12:43:48 +02:00
Matthew Honnibal
6aa6a5bc25
Add a layer type for history features
2017-10-03 12:43:09 +02:00
Matthew Honnibal
8902df44de
Fix component disabling during training
2017-10-02 21:07:23 +02:00
Matthew Honnibal
c617d288d8
Update pipeline component names in spaCy train
2017-10-02 17:20:19 +02:00
Matthew Honnibal
f942903429
Improve sentence merging in iob2json
2017-10-02 17:02:10 +02:00
Matthew Honnibal
31681d20e0
Fix concatenation in iob2json converter
2017-10-02 16:50:26 +02:00
Matthew Honnibal
4896ce3320
Remove misleading comment
2017-10-02 00:09:14 +02:00
Matthew Honnibal
d90cc917fa
Merge vectors.pyx doc strings
2017-10-01 17:05:54 -05:00
Matthew Honnibal
b2a8b9be77
Fix inconsistency of Vectors class API
2017-10-01 17:00:34 -05:00
Matthew Honnibal
e38089d598
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-01 22:10:54 +02:00
Matthew Honnibal
97c409b602
Add docstrings for spacy.vectors
2017-10-01 22:10:33 +02:00
ines
b776f48e58
Fix typo
2017-10-01 21:58:45 +02:00
Matthew Honnibal
94df115a81
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-10-01 14:06:23 -05:00
Matthew Honnibal
2cf0f4622f
Fix loading of models with pre-trained vectors
2017-10-01 14:05:32 -05:00
Matthew Honnibal
69c7c642c2
Add spacy evaluate
2017-10-01 14:05:04 -05:00
ines
8dbe49ecb8
Always compare lowercase package names
...
Otherwise, is_package will return False if model name contains
uppercase characters. See this issue:
https://support.prodi.gy/t/saving-a-trained-ner-model-as-a-loadable-modu
le/46/6
2017-09-29 20:55:17 +02:00
ines
153c2589d4
Revert "Always compare lowercase package names"
...
This reverts commit 7d77dc490f
.
2017-09-29 20:53:36 +02:00
ines
fd1a9225d8
Handle conversion of pipeline components correctly
...
Allow both comma and comma + whitespace as separators
2017-09-29 20:52:56 +02:00
ines
7d77dc490f
Always compare lowercase package names
...
Otherwise, is_package will return False if model name contains
uppercase characters. See this issue:
https://support.prodi.gy/t/saving-a-trained-ner-model-as-a-loadable-modu
le/46/6
2017-09-29 20:52:28 +02:00
Matthew Honnibal
cdb2d83e16
Pass dropout in parser
2017-09-28 18:47:13 -05:00
Matthew Honnibal
158e177cae
Fix default embed size
2017-09-28 08:25:23 -05:00
Matthew Honnibal
f6330d69e6
Default embed size to 7000
2017-09-28 08:07:41 -05:00
Matthew Honnibal
ac8481a7b0
Print NER loss
2017-09-28 08:05:31 -05:00
Matthew Honnibal
542ebfa498
Improve defaults
2017-09-27 18:54:37 -05:00
Matthew Honnibal
dcb86bdc43
Default batch size to 32
2017-09-27 11:48:19 -05:00
Matthew Honnibal
1a37a2c0a0
Update training defaults
2017-09-27 11:48:07 -05:00
Matthew Honnibal
13d7a97f3a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-27 11:44:37 -05:00
Matthew Honnibal
66c388ee01
Remove unhelpful multitask objectives
2017-09-27 11:44:16 -05:00
Matthew Honnibal
983201a83a
Fix hard-coded vector width
2017-09-27 11:43:58 -05:00
Ines Montani
959c46eabe
Merge pull request #1365 from wannaphongcom/develop
...
Add Thai language for spaCy v2
2017-09-26 23:43:05 +02:00
Matthew Honnibal
1ef4236f8e
Merge pull request #1343 from explosion/feature/phrasematcher
...
Update PhraseMatcher for spaCy 2
2017-09-26 20:44:23 +02:00
Wannaphong Phatthiyaphaibun
7b5263ffa4
fix thai test
2017-09-26 23:54:15 +07:00
ines
1ff62eaee7
Fix option shortcut to avoid conflict
2017-09-26 17:59:34 +02:00
Wannaphong Phatthiyaphaibun
3d5046c499
fix import in th
2017-09-26 22:41:20 +07:00
ines
7fdfb78141
Add version option to cli.train
2017-09-26 17:34:52 +02:00
Wannaphong Phatthiyaphaibun
a63f790b8c
fix thai tag_map
2017-09-26 22:28:57 +07:00
Wannaphong Phatthiyaphaibun
2ea27d07f4
fix tokenizer_exceptions in thai
2017-09-26 22:14:47 +07:00
Matthew Honnibal
41cc5c4c17
Merge branch 'develop' into feature/phrasematcher
2017-09-26 09:59:17 -05:00
Matthew Honnibal
c2e2f81773
Merge pull request #1355 from explosion/feature/noshare
...
Make pipeline components independent
2017-09-26 16:58:09 +02:00
Wannaphong Phatthiyaphaibun
a2bf4cc7bf
fix newline in file
2017-09-26 21:49:43 +07:00
ines
bb5c631402
Implement like_num getter for French (via #1161 )
2017-09-26 16:47:45 +02:00
ines
15479b3bae
Add comment to like_num re: future work
2017-09-26 16:43:28 +02:00
ines
adda08fe14
Implement like_num getter for Dutch (via #1177 )
2017-09-26 16:39:15 +02:00
ines
5ee10379db
Port over changes from #1340
2017-09-26 16:38:08 +02:00
Wannaphong Phatthiyaphaibun
5cba67146c
add thai in spacy2
2017-09-26 21:36:27 +07:00
ines
10d291f129
Port over change from #1351
2017-09-26 16:11:41 +02:00
Matthew Honnibal
3274b46a0d
Try to fix compile error on Windows
2017-09-26 09:05:53 -05:00
Matthew Honnibal
19c7c09bf7
Fix PhraseMatcher.__contains__
2017-09-26 08:35:53 -05:00
Matthew Honnibal
d02a41a8c9
Merge remote-tracking branch 'origin/develop' into feature/phrasematcher
2017-09-26 08:32:55 -05:00
Matthew Honnibal
698fc0d016
Remove merge artefact
2017-09-26 08:31:37 -05:00
Matthew Honnibal
defb68e94f
Update feature/noshare with recent develop changes
2017-09-26 08:15:14 -05:00
Matthew Honnibal
ca28590ddd
Use dep and ent multi-task objectives for parser'
2017-09-26 08:13:52 -05:00
Matthew Honnibal
9bfd585a11
Fix parameter name in .pxd file
2017-09-26 07:28:50 -05:00
Matthew Honnibal
74f08e1ad5
Update test
2017-09-26 06:45:56 -05:00
Matthew Honnibal
5aaef3e7b8
Dont link vectors in vocab deserialize
2017-09-26 06:45:47 -05:00
Matthew Honnibal
18a27c7579
Fix typo in tensorizer serialization
2017-09-26 06:45:14 -05:00
Matthew Honnibal
5056743ad5
Fix parser serialization
2017-09-26 06:44:56 -05:00
Ines Montani
7123139b2b
Add __contains__ to PhraseMatcher
2017-09-26 13:13:27 +02:00
Ines Montani
50ad50f96a
Update matcher.pyx
2017-09-26 13:11:17 +02:00
Matthew Honnibal
e34e70673f
Allow tagger models to be built with pre-defined tok2vec layer
2017-09-26 05:51:52 -05:00
Matthew Honnibal
bf917225ab
Allow multi-task objectives during training
2017-09-26 05:42:52 -05:00
Matthew Honnibal
4ae9ea7684
Remove unused argument in Language
2017-09-26 05:41:35 -05:00
ines
edf7e4881d
Add meta.json option to cli.train and add relevant properties
...
Add accuracy scores to meta.json instead of accuracy.json and replace
all relevant properties like lang, pipeline, spacy_version in existing
meta.json. If not present, also add name and version placeholders to
make it packagable.
2017-09-25 19:00:47 +02:00
ines
d2d35b63b7
Fix formatting
2017-09-25 18:37:13 +02:00
Matthew Honnibal
8eb0b7b779
Add docstrings for Pipe API
2017-09-25 16:22:07 +02:00
Matthew Honnibal
39f390dba7
Add docstrings for Pipe API
2017-09-25 16:20:49 +02:00
Matthew Honnibal
8716ffe57d
Serialize vocab last
2017-09-24 05:01:45 -05:00
Matthew Honnibal
72bbcc0871
Handle lemmatization for unknown string IDs
2017-09-24 05:01:31 -05:00
Matthew Honnibal
204b58c864
Fix evaluation during training
2017-09-24 05:01:03 -05:00
Matthew Honnibal
dc3a623d00
Remove unused update_shared argument
2017-09-24 05:00:37 -05:00
Matthew Honnibal
63bd87508d
Don't use iterated convolutions
2017-09-23 04:39:17 -05:00
Matthew Honnibal
5a7fd0fd36
Fix vector linkage
2017-09-22 20:11:52 -05:00
Matthew Honnibal
4348c479fc
Merge pre-trained vectors and noshare patches
2017-09-22 20:07:28 -05:00
Matthew Honnibal
7dc61b3f43
Whitespace
2017-09-22 20:00:50 -05:00
Matthew Honnibal
e93d43a43a
Fix training with preset vectors
2017-09-22 20:00:40 -05:00
Matthew Honnibal
0795857dcb
Fix beam parsing
2017-09-23 02:59:53 +02:00
Matthew Honnibal
4bd6a12b1f
Fix Tok2Vec
2017-09-23 02:58:54 +02:00
Matthew Honnibal
386c1a5bd8
Fix tagger training
2017-09-23 02:58:06 +02:00
Matthew Honnibal
a2357cce3f
Set random seed in train script
2017-09-23 02:57:31 +02:00
Matthew Honnibal
05596159bf
Fix serialization when pre-trained vectors
2017-09-22 15:33:27 -05:00
Matthew Honnibal
980fb6e854
Refactor Tok2Vec
2017-09-22 09:38:36 -05:00
Matthew Honnibal
d9124f1aa3
Add link_vectors_to_models function
2017-09-22 09:38:22 -05:00
Matthew Honnibal
a186596307
Add 'reapply' combinator, for iterated CNN
2017-09-22 09:37:03 -05:00
Matthew Honnibal
40a4873b70
Fix serialization of model options
2017-09-21 13:07:26 -05:00
Matthew Honnibal
0a9016cade
Fix serialization during training
2017-09-21 13:06:45 -05:00
Matthew Honnibal
20193371f5
Don't share CNN, to reduce complexities
2017-09-21 14:59:48 +02:00
Matthew Honnibal
1d73dec8b1
Refactor train script
2017-09-20 19:17:10 -05:00
Matthew Honnibal
ffda38356a
Add util function to enable GPU
2017-09-20 19:16:35 -05:00
Matthew Honnibal
24e85c2048
Pass values for CNN maxout pieces option
2017-09-20 19:16:12 -05:00
Matthew Honnibal
b832f89ff8
Add resume_training function
2017-09-20 19:15:20 -05:00
Matthew Honnibal
f5144f04be
Add argument for CNN maxout pieces
2017-09-20 19:14:41 -05:00
Matthew Honnibal
842e21de9f
Fix int type error for Python 2
2017-09-20 23:55:30 +02:00
Matthew Honnibal
0c93c73e49
Add __reduce__ method for PhraseMatcher
2017-09-20 22:26:40 +02:00
Matthew Honnibal
cc408fc189
Make PhraseMatcher API like Matcher API
2017-09-20 22:20:35 +02:00
Matthew Honnibal
43ad250dd5
Update matcher tests
2017-09-20 21:54:49 +02:00
Matthew Honnibal
828cc91545
Fix PhraseMatcher for spaCy 2
2017-09-20 21:54:31 +02:00
Matthew Honnibal
78301b2d29
Avoid comparison to None in Tok2Vec
2017-09-20 00:19:34 +02:00
Matthew Honnibal
b36a38f63d
Fix serialization of pretrained_dims property
2017-09-19 23:42:27 +02:00
Matthew Honnibal
2489dcaccf
Fix serialization of parser
2017-09-19 23:42:12 +02:00
Matthew Honnibal
40837b275d
Fix tensorizer with pretrained vectors
2017-09-18 18:05:38 -05:00
Matthew Honnibal
a0c4b33d03
Support resuming a model during spacy train
2017-09-18 18:04:47 -05:00
Matthew Honnibal
c858927271
Copy vectors to GPU on begin training
2017-09-18 18:04:16 -05:00
Matthew Honnibal
3fa76c17d1
Refactor Tok2Vec
2017-09-18 15:00:05 -05:00
Matthew Honnibal
217e7891cd
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-18 11:36:21 -05:00
Matthew Honnibal
7b3f391f80
Try dropping the Affine layer, conditionally
2017-09-18 11:35:59 -05:00
ines
2480f8f521
Add missing return in Doc.from_disk() ( closes #1330 )
2017-09-18 15:32:00 +02:00
Matthew Honnibal
2148ae605b
Dont use iterated convolutions
2017-09-17 17:36:04 -05:00
Matthew Honnibal
c013e5996f
Fix parser test
2017-09-17 13:13:20 -05:00
Matthew Honnibal
8f42f8d305
Remove unused 'preprocess' argument in Tok2Vec'
2017-09-17 12:30:16 -05:00
Matthew Honnibal
039d609362
Remove hard-coded default vectors width
2017-09-17 12:29:39 -05:00
Matthew Honnibal
4f38a67a89
Make width default to 0 in vectors.pyx
2017-09-17 12:29:14 -05:00
Matthew Honnibal
16122f566e
Fix cpdef enum in attrs.pyx
2017-09-17 12:28:53 -05:00
Matthew Honnibal
b159e0eb50
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-17 05:47:50 -05:00
Matthew Honnibal
2b0efc77ae
Fix wiring of pre-trained vectors in parser loading
2017-09-17 05:47:34 -05:00
Matthew Honnibal
31c2e91c35
Fix wiring of pre-trained vectors in parser loading
2017-09-17 05:46:55 -05:00
Matthew Honnibal
8f913a74ca
Fix defaults and args to build_tagger_model
2017-09-17 05:46:36 -05:00
Matthew Honnibal
c003c561c3
Revert NER action loading change, for model compatibility
2017-09-17 05:46:03 -05:00
Matthew Honnibal
43210abacc
Resolve fine-tuning conflict
2017-09-17 05:30:04 -05:00
ines
ece30c28a8
Don't split hyphenated words in German
...
This way, the tokenizer matches the tokenization in German treebanks
2017-09-16 20:40:15 +02:00
ines
68f66aebf8
Use pkg_resources instead of pip for is_package ( resolves #1293 )
2017-09-16 20:27:59 +02:00
Matthew Honnibal
5ff2491f24
Pass option for pre-trained vectors in parser
2017-09-16 12:47:21 -05:00
Matthew Honnibal
8665a77f48
Fix feature error in NER
2017-09-16 12:46:57 -05:00
Matthew Honnibal
e37a50a436
Pass documents to tensorizer, not 'features'
2017-09-16 12:46:36 -05:00
Matthew Honnibal
84e637e2e6
Pass option for pretrained vectors in pipeline
2017-09-16 12:46:02 -05:00
Matthew Honnibal
2a93404da6
Support optional pre-trained vectors in tensorizer model
2017-09-16 12:45:37 -05:00
Matthew Honnibal
e0a2aa9289
Support having word vectors data on GPU
2017-09-16 12:45:09 -05:00
Matthew Honnibal
ebf8942564
Fix test for Python3
2017-09-16 16:22:38 +02:00
Matthew Honnibal
8c945310fb
Excuse emoji failure on narrow unicode builds
2017-09-16 16:21:13 +02:00
Matthew Honnibal
11f2a05ede
Fix code explosion from long enum in Python 3, Cython 0.24+
2017-09-16 12:20:04 +02:00
Matthew Honnibal
3fa5b40b5c
Add test for hash consistency
2017-09-16 11:21:35 +02:00
Matthew Honnibal
f730d07e4e
Fix prange error for Windows
2017-09-16 00:25:33 +02:00
Matthew Honnibal
4b2065430e
Merge branch 'feature/parser-history' into develop
2017-09-15 10:42:20 +02:00
Matthew Honnibal
2f08489694
Remove AddHistory layer -- didnt work as planned
2017-09-15 10:41:40 +02:00
Matthew Honnibal
8b481e0465
Remove redundant brackets
2017-09-15 10:38:08 +02:00
Matthew Honnibal
d84607f6bb
Vectorize update in AddHistory
2017-09-14 20:34:40 +02:00
Ines Montani
bd3da3d6fb
Port over change from #1323 and tidy up
2017-09-14 19:23:13 +02:00
Matthew Honnibal
18347ab69c
Implement AddHistory layer wrapper
2017-09-14 19:07:35 +02:00
Matthew Honnibal
d4ca6cef9e
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-14 17:00:07 +02:00
Matthew Honnibal
8c503487af
Fix lookup of missing NER actions
2017-09-14 16:59:45 +02:00
Matthew Honnibal
664c5af745
Revert padding in parser
2017-09-14 16:59:25 +02:00
Matthew Honnibal
8496d76224
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-14 09:21:20 -05:00
Matthew Honnibal
d1518027a9
Increment version
2017-09-14 16:18:46 +02:00
Matthew Honnibal
70da88a3a7
Update comment on Language.begin_training
2017-09-14 16:18:30 +02:00
Matthew Honnibal
c6395b057a
Improve parser feature extraction, for missing values
2017-09-14 16:18:02 +02:00
Matthew Honnibal
daf869ab3b
Fix add_action for NER, so labelled 'O' actions aren't added
2017-09-14 16:16:41 +02:00
Matthew Honnibal
9cb2aef587
Remove print statement
2017-09-14 13:38:28 +02:00
Matthew Honnibal
ba23d63c35
Fix minibatch function, for fixed batch size
2017-09-14 13:37:41 +02:00
Matthew Honnibal
456bb8a74c
Unxfail and close #1305
2017-09-06 19:14:17 +02:00
Matthew Honnibal
99e44fbdbb
Update regression test
2017-09-06 19:13:51 +02:00
Matthew Honnibal
5c3ff06924
Fix lemmatizer rules
2017-09-06 19:13:24 +02:00
Matthew Honnibal
dd9cab0faf
Fix type-check for int/long
2017-09-06 19:03:05 +02:00
Matthew Honnibal
497a9308a8
Xfail new lemmatizer test
2017-09-06 18:41:22 +02:00
Matthew Honnibal
dcbf866970
Merge parser changes
2017-09-06 18:41:05 +02:00
Matthew Honnibal
5384fff5ce
Add test for 1305: Incorrect lemmatization of VBZ for English
2017-09-06 18:40:18 +02:00
Matthew Honnibal
24ff6b0ad9
Fix parsing and tok2vec models
2017-09-06 05:50:58 -05:00
Matthew Honnibal
1b65115bc2
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-04 20:02:53 -05:00
Matthew Honnibal
33fa91feb7
Restore correctness of parser model
2017-09-04 21:19:30 +02:00
Matthew Honnibal
e88a42e460
Increment version
2017-09-04 21:14:39 +02:00
Matthew Honnibal
9d65d67985
Preserve model compatibility in parser, for now
2017-09-04 16:46:22 +02:00
Matthew Honnibal
d5fbf27335
Fix test
2017-09-04 16:45:11 +02:00
Matthew Honnibal
7fdafcc4c4
Fix config loading in tagger
2017-09-04 16:38:49 +02:00
Matthew Honnibal
058372d120
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-04 16:27:53 +02:00
Matthew Honnibal
16e25ce3b5
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-04 09:26:53 -05:00
Matthew Honnibal
9f512e657a
Fix drop_layer calculation
2017-09-04 09:26:38 -05:00
Matthew Honnibal
cb4839033c
Fix loader for EN tests
2017-09-04 15:19:18 +02:00
Matthew Honnibal
382ce566eb
Fix deserialization bug
2017-09-04 15:19:01 +02:00
Matthew Honnibal
bfddf50081
Fix #1296 : Incorrect lemmatization of base form verbs
2017-09-04 15:18:41 +02:00
Matthew Honnibal
b29e6bff46
Improve lemmatization rule for am|VBP
2017-09-04 15:18:10 +02:00
Matthew Honnibal
644d6c9e1a
Improve lemmatization tests, re #1296
2017-09-04 15:17:44 +02:00
Matthew Honnibal
3cf3fa1704
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-02 12:46:11 -05:00
Matthew Honnibal
e920885676
Fix pickle during train
2017-09-02 12:46:01 -05:00
Matthew Honnibal
c0eaba8b28
Fix low-data textcat
2017-09-02 15:17:32 +02:00
Matthew Honnibal
9e378bdac5
Fix textcat serialization
2017-09-02 15:17:20 +02:00
Matthew Honnibal
e3ea6ee02b
Increment version
2017-09-02 15:17:01 +02:00
Matthew Honnibal
a3b69bcb3d
Add low_data mode in textcat
2017-09-02 14:56:30 +02:00
Matthew Honnibal
ead78c7b9b
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-09-02 12:55:25 +02:00
Matthew Honnibal
5e6a9e7dcc
Add rule-based SBD
2017-09-02 12:53:38 +02:00
Matthew Honnibal
a824cf8f9a
Adjust text classification model
2017-09-02 11:41:00 +02:00
Matthew Honnibal
ac040b99bb
Add support for pre-trained vectors in text classifier
2017-09-01 16:39:55 +02:00
Matthew Honnibal
7742a6d559
Add GloVe vectors reader
2017-09-01 16:39:22 +02:00
Matthew Honnibal
789e1a3980
Use 13 parser features, not 8
2017-08-31 14:13:00 -05:00
Matthew Honnibal
30e35d9666
Fix syntax error
2017-08-30 17:35:39 -05:00
Matthew Honnibal
4ceebde523
Fix gradient bug in parser
2017-08-30 17:32:56 -05:00
ines
173089a45a
Add more validation for model meta
2017-08-29 11:21:46 +02:00
Matthew Honnibal
2e28982e28
Merge pull request #1288 from geovedi/indonesian
...
Indonesian language support
2017-08-26 21:31:13 +02:00
ines
7e04b7f89c
Fix info text on pipeline in package cli
2017-08-26 18:30:59 +02:00
ines
40afa13a8a
Increment version
2017-08-26 18:30:49 +02:00
Matthew Honnibal
876f38c548
Merge pull request #1279 from oroszgy/model_cli_v2
...
Added vector loading to model cli
2017-08-26 15:57:50 +02:00
Matthew Honnibal
cfc055734e
Split % in units, for compatibility with corpus
2017-08-25 20:03:37 -05:00
Matthew Honnibal
4bb6bc3f9e
Add support for sent_start to GoldParse
2017-08-25 20:03:14 -05:00
Matthew Honnibal
44589fb38c
Fix Break oracle
2017-08-25 19:50:55 -05:00
Matthew Honnibal
6d4e8e14ca
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-25 12:37:16 -05:00
Matthew Honnibal
4ce5531389
Use layer norm instead of batch norm
2017-08-25 12:37:10 -05:00
Matthew Honnibal
20dd66ddc2
Constrain sentence boundaries to IS_PUNCT and IS_SPACE tokens
2017-08-25 19:35:47 +02:00
Jim Geovedi
58d8078971
Merge remote-tracking branch 'upstream/develop' into indonesian
2017-08-25 09:21:49 +08:00
Matthew Honnibal
6ceb0f0518
Allow Lexeme.rank to be set
2017-08-24 21:43:00 +02:00
Matthew Honnibal
44a1fa80d3
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-23 13:02:16 +02:00
ines
bb1abbeba5
Only link model if download was successfull
2017-08-23 12:36:31 +02:00
Matthew Honnibal
bb2541ffd3
Fix PROB attr for OOV words
2017-08-23 12:11:52 +02:00
Matthew Honnibal
1c5c256e58
Fix fine_tune when optimizer is None
2017-08-23 10:51:33 +02:00
Matthew Honnibal
9c580ad28a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-22 17:02:04 -05:00
Matthew Honnibal
a4633fff6f
Restore use of batch norm in model
2017-08-22 17:01:58 -05:00
Matthew Honnibal
03b5b9727a
Fix Doc.vector for empty doc objects
2017-08-22 19:52:19 +02:00
Matthew Honnibal
0551b7b03a
Fix doc.vector
2017-08-22 19:46:52 +02:00
Matthew Honnibal
83f8e98450
Fix retrieval of OOV vectors
2017-08-22 19:46:35 +02:00
Matthew Honnibal
df2745eb08
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-22 19:00:43 +02:00
Matthew Honnibal
5b329acbf2
Fix vectors_length property in vocab
2017-08-22 19:00:27 +02:00
Matthew Honnibal
1fe605dfe5
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-21 19:18:31 -05:00
Matthew Honnibal
18b64e79ec
Fix fine tuning
2017-08-21 19:18:26 -05:00
Matthew Honnibal
682346dd66
Restore optimized hidden_depth=0 for parser
2017-08-21 19:18:04 -05:00
Matthew Honnibal
a21d8f3f0b
Add predict paths to _ml models
2017-08-21 23:23:45 +02:00
Matthew Honnibal
cec76801dc
Add profile command to CLI
2017-08-21 23:23:05 +02:00
Matthew Honnibal
7be5f30f17
Add profile function
2017-08-21 23:22:49 +02:00
ines
a68dc891ea
Port over changes from #1281
2017-08-21 23:19:18 +02:00
Matthew Honnibal
5e50a65252
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-21 14:15:46 -05:00
Matthew Honnibal
80acbc5f1f
Fix fine-tune weight mixture
2017-08-21 14:15:29 -05:00
ines
d15775c3ad
Fix typos and commands in alpha docs
2017-08-21 13:40:11 +02:00
Gyorgy Orosz
b3576bfc86
Added vector leading to model cli
2017-08-20 23:16:12 +02:00
Matthew Honnibal
c10f63bf10
Initialize fine tuning to 0.5
2017-08-20 15:59:48 -05:00
Matthew Honnibal
62878e50db
Fix misalignment caued by filtering inputs at wrong point in parser
2017-08-20 15:59:28 -05:00
Matthew Honnibal
78a5f842e9
Fix update when update_shared=False
2017-08-20 15:58:34 -05:00
Matthew Honnibal
7a6edeea68
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-20 12:55:39 -05:00
Matthew Honnibal
f2f9229964
Fix name of update_shared flag
2017-08-20 18:19:06 +02:00
Matthew Honnibal
8a59718fd6
Fix fine-tuning
2017-08-20 18:17:35 +02:00
Matthew Honnibal
80a5146ec2
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-20 11:07:08 -05:00
Matthew Honnibal
84bb543e4d
Add gold_preproc flag to cli/train
2017-08-20 11:07:00 -05:00
Matthew Honnibal
3fe0d76e6d
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-20 14:50:01 +02:00
Matthew Honnibal
c1d3ff517a
Track loss in tagger
2017-08-20 14:42:23 +02:00
Matthew Honnibal
8875590081
Add optimizer in Language.update if sgd=None
2017-08-20 14:42:07 +02:00
Matthew Honnibal
84b7ed49e4
Ensure updates aren't made if no gold available
2017-08-20 14:41:38 +02:00
Ines Montani
c2bbd393af
Merge pull request #1276 from oroszgy/model_cli_v2
...
Ported model cli from v1
2017-08-20 11:52:59 +02:00
Jim Geovedi
f77443ab68
reworked
2017-08-20 13:43:21 +07:00
Jim Geovedi
fbc62a09c7
added {pre,suf,in}fix tests
2017-08-20 13:43:00 +07:00
Jim Geovedi
713d7c0aa0
added indonesian lang test
2017-08-20 12:17:14 +07:00
Jim Geovedi
b7d83f37c8
indonesian abbr.
2017-08-20 12:16:50 +07:00
Jim Geovedi
7193c47f0b
direct lookup
2017-08-20 11:57:52 +07:00
Jim Geovedi
fdf802d505
added examples
2017-08-20 11:57:10 +07:00
Jim Geovedi
fa544e6c9a
Merge remote-tracking branch 'upstream/develop' into indonesian
2017-08-20 11:49:40 +07:00
Matthew Honnibal
42fa84075f
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-19 22:42:50 +02:00
Matthew Honnibal
aefef6fd28
Prevent strings from being lost during from_disk and from_bytes
2017-08-19 22:42:17 +02:00
ines
281e7e58b3
Don't escape forward slashes on ujson.dumps
2017-08-19 22:32:16 +02:00
ines
2d126a00ae
Fix typo
2017-08-19 22:32:07 +02:00
Matthew Honnibal
41c2218c53
Fix test for vectors
2017-08-19 22:09:12 +02:00
Matthew Honnibal
b8e1603cc4
Fix load fail for missing vectors
2017-08-19 22:07:00 +02:00
Matthew Honnibal
a3c51a0355
Fix creation of pipeline
2017-08-19 21:58:57 +02:00
Gyorgy Orosz
e5344b83a3
Ported model cli from v1
2017-08-19 21:45:23 +02:00
Matthew Honnibal
6a94648373
Fix serialization
2017-08-19 21:27:35 +02:00
Matthew Honnibal
1157294434
Improve vector handling
2017-08-19 20:35:33 +02:00
Matthew Honnibal
ef87562741
Restore vectors test utils
2017-08-19 20:35:16 +02:00
Matthew Honnibal
1391f9da37
Restore vectors tests
2017-08-19 20:34:58 +02:00
Matthew Honnibal
8cfeeb4884
Increment version
2017-08-19 19:52:58 +02:00
Matthew Honnibal
93fb8b64e9
Fix vector loading
2017-08-19 19:52:25 +02:00
Matthew Honnibal
49a615e7d9
Create Vectors object in Vocab
2017-08-19 18:50:16 +02:00
Matthew Honnibal
3d049af563
Improve vectors to/from disk
2017-08-19 18:42:11 +02:00
Matthew Honnibal
d55d6e1cfa
Fix comparison of Token from different docs. Closes #1257
2017-08-19 16:39:32 +02:00
Matthew Honnibal
9b6a5df15e
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-19 16:24:57 +02:00
Matthew Honnibal
4fda02c7e6
Add test for new Span.to_array method
2017-08-19 16:24:38 +02:00
Matthew Honnibal
dea229c634
Fix Span.to_array method
2017-08-19 16:24:28 +02:00
Matthew Honnibal
c606b4a42c
Add test for Doc.char_span
2017-08-19 16:18:23 +02:00
Matthew Honnibal
8b7ac77c23
Allow span label to be string in Doc.char_span
2017-08-19 16:18:09 +02:00
Matthew Honnibal
7c47e38c12
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-19 09:03:15 -05:00
Matthew Honnibal
ab28f911b4
Fix parser learning rates
2017-08-19 09:02:57 -05:00
ines
1fe5e1a4d1
Add language example sentences (see #1107 )
...
da, de, en, es, fr, he, it, nb, pl, pt, sv
2017-08-19 12:22:29 +02:00
Matthew Honnibal
97aabafb5f
Document as_tuples keyword arg of Language.pipe
2017-08-19 12:21:33 +02:00
Matthew Honnibal
80236116a6
Add Doc.char_span method, to get a span by character offset
2017-08-19 12:21:09 +02:00
Matthew Honnibal
482bba1722
Add Span.to_array method
2017-08-19 12:20:45 +02:00
Matthew Honnibal
19c495f451
Fix vectors deserialization
2017-08-19 04:33:03 +02:00
Matthew Honnibal
42d47c1e5c
Fix tagger serialization
2017-08-19 04:16:32 +02:00
Matthew Honnibal
2da96a0ec7
Fix beam test
2017-08-19 04:15:46 +02:00
Matthew Honnibal
a7309a217d
Update tagger serialization
2017-08-18 23:12:05 +02:00
Matthew Honnibal
bae59bf92f
Remove BiLSTM import
2017-08-18 22:46:59 +02:00
Matthew Honnibal
c307a0ffb8
Restore patches from nn-beam-parser to spacy/syntax
2017-08-18 22:38:59 +02:00
Matthew Honnibal
fe90dfc390
Restore changes from nn-beam-parser to spacy/_ml
2017-08-18 22:38:28 +02:00
Matthew Honnibal
de7e8703e3
Restore tests for beam parser
2017-08-18 22:27:42 +02:00
Matthew Honnibal
11c31d285c
Restore changes from nn-beam-parser
2017-08-18 22:26:12 +02:00
Matthew Honnibal
ce321b0322
Restore changes from nn-beam-parser to spacy/_ml
2017-08-18 22:24:46 +02:00
Matthew Honnibal
5f81d700ff
Restore patches from nn-beam-parser to spacy/syntax
2017-08-18 22:23:03 +02:00
Matthew Honnibal
ec482580b5
Restore changes to pipeline.pyx from nn-beam-parser branch
2017-08-18 22:02:35 +02:00
Matthew Honnibal
931509d96a
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2017-08-18 21:57:15 +02:00
Matthew Honnibal
ed95009b5c
Fix data loading on Python 2
2017-08-18 21:57:06 +02:00
Matthew Honnibal
baf36d0588
Add compat function for importlib.util
2017-08-18 21:56:47 +02:00