Matthew Honnibal
8f2a6367e9
Fix usage of PyTorch BiLSTM in ud_train
2018-09-13 22:54:59 +00:00
Matthew Honnibal
445b81ce3f
Support bilstm_depth argument in ud-train
2018-09-13 19:30:22 +02:00
Matthew Honnibal
3eb9f3e2b8
Fix defaults for ud-train
2018-09-13 18:05:48 +02:00
Matthew Honnibal
59cf533879
Improve ud-train script. Make config optional
2018-09-13 14:24:08 +02:00
Matthew Honnibal
da7650e84b
Fix maximum doc length in ud_train script
2018-09-13 14:10:25 +02:00
Matthew Honnibal
4d2d7d5866
Fix new feature flags
2018-08-27 02:12:39 +02:00
Matthew Honnibal
9c33d4d1df
Add more hyper-parameters to spacy ud-train
* subword_features: Controls whether subword features are used in the
word embeddings. True by default (specifically, prefix, suffix and word
shape). Should be set to False for languages like Chinese and Japanese.
* conv_depth: Depth of the convolutional layers. Defaults to 4.
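A rough sketch of the three subword features named above. It assumes definitions similar to spaCy's lexical attributes: prefix is the first character, suffix is the last three characters, and shape maps character classes and collapses long runs. The function names are hypothetical, and this is not the ud-train code itself.

```python
def prefix(word: str) -> str:
    # First character of the word (empty string for an empty word).
    return word[:1]


def suffix(word: str) -> str:
    # Last (up to) three characters of the word.
    return word[-3:]


def shape(word: str) -> str:
    # Map uppercase -> "X", lowercase -> "x", digits -> "d" and keep any
    # other character as-is; runs of the same symbol are capped at 4 so
    # that long words with the same pattern share one shape.
    out = []
    last, run = "", 0
    for ch in word:
        if ch.isupper():
            sym = "X"
        elif ch.islower():
            sym = "x"
        elif ch.isdigit():
            sym = "d"
        else:
            sym = ch
        run = run + 1 if sym == last else 1
        last = sym
        if run <= 4:
            out.append(sym)
    return "".join(out)
```

For scripts without case or spaces between words, such as Chinese and Japanese, these features carry little signal, which is consistent with the advice to disable them there.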
2018-08-27 01:48:46 +02:00
Matthew Honnibal
595c893791
Expose noise_level option in train CLI
2018-08-16 00:41:44 +02:00
Matthew Honnibal
6ea981c839
Add converter for jsonl NER data
2018-08-14 14:04:32 +02:00
Matthew Honnibal
02c5c114d0
Fix usage of deprecated freqs.txt in init-model
2018-08-14 13:19:15 +02:00
Matthew Honnibal
4336397ecb
Update develop from master
2018-08-14 03:04:28 +02:00
Xiaoquan Kong
f0c9652ed1
New Feature: display more detail when Error E067 is raised (#2639)
* Fix off-by-one error
* Add verbose option
* Update verbose option
* Update documents for verbose option
2018-08-07 10:45:29 +02:00
Kaisa (Katarzyna) Korsak
e531a827db
Changed conllu2json to be able to extract NER tags (#2594)
* extract ner tags from conllu file if available
* fixed a bug in regex
2018-07-25 22:21:31 +02:00
ines
d84b13e02c
Merge branch 'master' into develop
2018-07-18 18:57:00 +02:00
Ole Henrik Skogstrøm
6e2930a4a2
Conll(u)-bio converter (#2525)
* Started simple conllxbiluo converter
* Fix missing BIO to BILUO conversion
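For reference, a minimal sketch of a BIO-to-BILUO conversion like the one the second bullet refers to. It assumes well-formed BIO input in which every entity starts with `B-`. The function name is hypothetical, and this is not the converter's actual code. In BILUO, the last token of a multi-token entity is tagged `L-` and a single-token entity is tagged `U-`.

```python
from typing import List


def bio_to_biluo(tags: List[str]) -> List[str]:
    """Convert well-formed BIO tags ("B-X", "I-X", "O") to BILUO tags."""
    biluo = []
    for i, tag in enumerate(tags):
        if tag == "O":
            biluo.append("O")
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            # Entity start: B- if it continues, otherwise a single-token U-.
            biluo.append(f"B-{label}" if continues else f"U-{label}")
        else:
            # Inside an entity: I- if it continues, otherwise the last token L-.
            biluo.append(f"I-{label}" if continues else f"L-{label}")
    return biluo
```

Input that opens an entity with `I-` (IOB1 style) would need normalising to `B-` first. This sketch does not handle that case.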
2018-07-18 18:55:42 +02:00
Matthew Honnibal
8ae1bec8bf
Fix init_model
2018-07-05 14:02:06 +02:00
Matthew Honnibal
dee8bdb900
Fix init-model for npz vectors
2018-07-04 02:29:48 +02:00
Matthew Honnibal
59d655e8d0
Fix model init from jsonl
2018-07-04 01:30:40 +02:00
Matthew Honnibal
1e38bea6e9
Save vectors init
2018-07-03 23:55:04 +02:00
Matthew Honnibal
6692833887
Fix init_model
2018-07-03 23:24:11 +02:00
Matthew Honnibal
4a38a26cb5
Fix init_model
2018-07-03 22:57:11 +02:00
Matthew Honnibal
019d09e3c3
Fix init model
2018-07-03 22:16:44 +02:00
Matthew Honnibal
2543f8c93a
Support .npz vectors in init-model command
2018-07-03 21:42:16 +02:00
Matthew Honnibal
86aad11939
Fix init_model arg
2018-07-03 17:00:42 +02:00
Matthew Honnibal
eff42d36e3
Fix init model command
2018-07-03 16:32:23 +02:00
Matthew Honnibal
6a89faf12e
Add support for jsonl-formatted lexical attributes to init-model command.
2018-07-03 12:22:56 +02:00
Matthew Honnibal
c83fccfe2a
Fix output of best model
2018-06-25 23:05:56 +02:00
Matthew Honnibal
69c900f003
Fix init-model if no vectors provided
2018-06-25 18:26:02 +02:00
Matthew Honnibal
664f89327a
Fix init-model if no vectors provided
2018-06-25 17:58:45 +02:00
Matthew Honnibal
c4698f5712
Don't collate model unless training succeeds
2018-06-25 16:36:42 +02:00
Matthew Honnibal
24dfbb8a28
Fix model collation
2018-06-25 14:35:24 +02:00
Matthew Honnibal
62237755a4
Import shutil
2018-06-25 13:40:17 +02:00
Matthew Honnibal
a040fca99e
Import json into cli.train
2018-06-25 11:50:37 +02:00
Matthew Honnibal
2c703d99c2
Fix collation of best models
2018-06-25 01:21:34 +02:00
Matthew Honnibal
2c80b7c013
Collate best model after training
2018-06-24 23:39:52 +02:00
ines
330c039106
Merge branch 'master' into develop
2018-05-26 18:30:52 +02:00
James Messinger
4515e96e90
Better formatting for `spacy train` CLI (#2357)
* Better formatting for `spacy train` CLI
Changed to use fixed-spaces rather than tabs to align table headers and data.
### Before:
```
Itn. P.Loss N.Loss UAS NER P. NER R. NER F. Tag % Token %
0 4618.857 2910.004 76.172 79.645 67.987 88.732 88.261 100.000 4436.9 6376.4
1 4671.972 3764.812 74.481 78.046 62.374 82.680 88.377 100.000 4672.2 6227.1
2 4742.756 3673.473 71.994 77.380 63.966 84.494 90.620 100.000 4298.0 5983.9
```
### After:
```
Itn.  Dep Loss  NER Loss  UAS     NER P.  NER R.  NER F.  Tag %   Token %  CPU WPS  GPU WPS
0     4618.857  2910.004  76.172  79.645  67.987  88.732  88.261  100.000  4436.9   6376.4
1     4671.972  3764.812  74.481  78.046  62.374  82.680  88.377  100.000  4672.2   6227.1
2     4742.756  3673.473  71.994  77.380  63.966  84.494  90.620  100.000  4298.0   5983.9
```
* Added contributor file
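A generic sketch of the fixed-space alignment described above. Each column is padded to the width of its widest cell, so the layout no longer depends on the terminal's tab stops. The helper names are hypothetical. This is not the code used in `spacy train`, which may hard-code its column widths.

```python
from typing import List, Sequence


def format_row(values: Sequence[object], widths: Sequence[int]) -> str:
    # Left-justify each cell to its column width, separated by two spaces.
    return "  ".join(str(v).ljust(w) for v, w in zip(values, widths)).rstrip()


def format_table(header: Sequence[str], rows: List[Sequence[object]]) -> str:
    # A column is as wide as its widest cell, header included.
    widths = [max(len(str(cell)) for cell in column) for column in zip(header, *rows)]
    lines = [format_row(header, widths)]
    lines.extend(format_row(row, widths) for row in rows)
    return "\n".join(lines)


print(format_table(["Itn.", "Dep Loss", "UAS"], [[0, "4618.857", "76.172"]]))
```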
2018-05-25 13:08:45 +02:00
Matthew Honnibal
ce458c2428
Fix spacy requirement constraint in package template
2018-05-22 20:50:46 +02:00
Matthew Honnibal
f3b4f6a4ec
Merge setup.py
2018-05-20 23:21:00 +02:00
Ines Montani
d4cc736b7c
💫 Improve model downloads: check for existing install, customise pip and use requests library again (#2346)
* Go back to using `requests` instead of `urllib` (closes #2320)
Fewer dependencies are good, but this one was simply causing too many other problems around SSL verification and Python 2/3 compatibility. `requests` is a popular enough package that it's okay for spaCy to depend on it – and this will hopefully make model downloads less flakey.
* Only download model if not installed (see #1456)
Use `#egg=model==version` to allow pip to check for existing installations. The download is only started if no installation matching the package/version is found. Fixes a long-standing inconvenience.
* Pass additional options to pip when installing model (resolves #1456)
Treat all additional arguments passed to the download command as pip options to allow the user to customise the command. For example:
`python -m spacy download en --user`
* Add CLI option to enable installing model package dependencies
* Revert "Add CLI option to enable installing model package dependencies"
This reverts commit 9336ffe695.
* Update documentation
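A minimal sketch of the install command the first two bullets describe. It appends a `#egg=name==version` fragment to the download URL and passes any extra CLI arguments straight through to pip. The function name, URL, package name and version below are hypothetical placeholders. The actual download command in spaCy may build its command differently.

```python
import sys
from typing import List


def build_download_cmd(url: str, name: str, version: str, pip_args: List[str]) -> List[str]:
    # The #egg=name==version fragment tells pip which package and version
    # the URL provides; per the commit message, this lets pip detect an
    # existing matching installation.
    requirement = f"{url}#egg={name}=={version}"
    # Extra arguments (e.g. --user) are forwarded to pip unchanged.
    return [sys.executable, "-m", "pip", "install", requirement, *pip_args]


cmd = build_download_cmd(
    "https://example.com/en_core_web_sm-2.0.0.tar.gz",  # hypothetical URL
    "en_core_web_sm",
    "2.0.0",
    ["--user"],
)
# In real use this list would be passed to subprocess.call(cmd).
print(cmd)
```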
2018-05-20 20:26:56 +02:00
Matthew Honnibal
74d5c625b3
Use rising beam update prob
2018-05-16 20:11:59 +02:00
Matthew Honnibal
dc1a479fbd
Merge branch 'develop' into feature/refactor-parser
2018-05-15 18:39:21 +02:00
Matthew Honnibal
546dd99cdf
Merge master into develop -- mostly Arabic and website
2018-05-15 18:14:28 +02:00
Matthew Honnibal
a6ae1ee6f7
Don't modify Token in global scope
2018-05-09 00:43:00 +02:00
Matthew Honnibal
f94f721f40
Avoid importing fused token symbol in ud-run-test, until that's added
2018-05-09 00:28:03 +02:00
Matthew Honnibal
659ec5b975
Avoid importing fused token symbol in ud-run-test, until that's added
2018-05-08 19:40:33 +02:00
Matthew Honnibal
fc4dd49b77
Support oracle segmentation in ud-train CLI command
2018-05-08 13:47:45 +02:00
ines
7a3599c21a
Fix formatting and consistency
2018-05-07 23:02:11 +02:00
Matthew Honnibal
eddc0e0c74
Set gold.sent_starts in ud_train
2018-05-07 15:52:47 +02:00
G.Pruvost
cc8e804648
#2211 - Support for SSL certs config on download command (#2212)
* Add support for SSL/Certs customization on download CLI
* Add a note on SSL options for the 'download' CLI in the README
* Add contributor agreement
2018-05-03 18:37:02 +02:00