spaCy/spacy
Sofie Van Landeghem c0f4a1e43b
train is from-config by default (#5575)
* verbose and tag_map options

* adding init_tok2vec option and only changing the tok2vec that is specified

* adding omit_extra_lookups and verifying textcat config

* wip

* pretrain bugfix

* add replace and resume options

* train_textcat fix

* raw text functionality

* improve UX when KeyError or when input data can't be parsed

* avoid unnecessary access to goldparse in TextCat pipe

* save performance information in nlp.meta

* add noise_level to config

* move nn_parser's defaults to config file

* multitask in config - doesn't work yet

* scorer offering both F and AUC options, need to be specified in config

* add textcat verification code from old train script

* small fixes to config files

* clean up

* set default config for ner/parser to allow create_pipe to work as before

* two more test fixes

* small fixes

* cleanup

* fix NER pickling + additional unit test

* create_pipe as before
2020-06-12 02:02:07 +02:00
cli train is from-config by default (#5575) 2020-06-12 02:02:07 +02:00
displacy unicode -> str consistency [ci skip] 2020-05-24 18:51:10 +02:00
lang Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
matcher Merge branch 'develop' into master-tmp 2020-06-03 14:36:59 +02:00
ml train is from-config by default (#5575) 2020-06-12 02:02:07 +02:00
pipeline train is from-config by default (#5575) 2020-06-12 02:02:07 +02:00
syntax train is from-config by default (#5575) 2020-06-12 02:02:07 +02:00
tests train is from-config by default (#5575) 2020-06-12 02:02:07 +02:00
tokens Merge branch 'develop' into master-tmp 2020-06-03 14:36:59 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Simplify warnings 2020-04-28 13:37:37 +02:00
__main__.py add discard_oversize parameter, move optimizer to training subsection 2020-06-03 10:04:16 +02:00
about.py Set version to v3.0.0.dev9 2020-05-21 20:47:52 +02:00
attrs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
attrs.pyx Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
compat.py Merge branch 'develop' into refactor/remove-symlinks 2020-02-18 17:22:20 +01:00
errors.py train is from-config by default (#5575) 2020-06-12 02:02:07 +02:00
glossary.py unicode -> str consistency 2020-05-24 17:20:58 +02:00
gold.pxd Fix accidentally quadratic runtime in Example.split_sents (#5464) 2020-05-20 18:48:18 +02:00
gold.pyx train is from-config by default (#5575) 2020-06-12 02:02:07 +02:00
kb.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
kb.pyx unicode -> str consistency 2020-05-24 17:20:58 +02:00
language.py train is from-config by default (#5575) 2020-06-12 02:02:07 +02:00
lemmatizer.py Merge branch 'develop' into master-tmp 2020-06-03 14:36:59 +02:00
lexeme.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
lexeme.pyx Merge branch 'develop' into master-tmp 2020-06-03 14:36:59 +02:00
lookups.py Merge branch 'develop' into master-tmp 2020-06-03 14:36:59 +02:00
morphology.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
morphology.pyx Fix typo 2020-06-03 14:42:39 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
pipe_analysis.py unicode -> str consistency 2020-05-24 17:20:58 +02:00
schemas.py Fix test and schemas 2020-05-21 19:01:02 +02:00
scorer.py train is from-config by default (#5575) 2020-06-12 02:02:07 +02:00
strings.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
strings.pyx unicode -> str consistency [ci skip] 2020-05-24 18:51:10 +02:00
structs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
symbols.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
symbols.pyx Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
tokenizer.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
tokenizer.pyx unicode -> str consistency [ci skip] 2020-05-24 18:51:10 +02:00
typedefs.pxd Update spaCy for thinc 8.0.0 (#4920) 2020-01-29 17:06:46 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py train is from-config by default (#5575) 2020-06-12 02:02:07 +02:00
vectors.pyx Merge branch 'develop' into master-tmp 2020-06-03 14:36:59 +02:00
vocab.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
vocab.pyx unicode -> str consistency [ci skip] 2020-05-24 18:51:10 +02:00