spaCy/spacy
Sofie Van Landeghem 311133e579
Train textcat with config (#5143)
* bring back default build_text_classifier method

* remove _set_dims_ hack in favor of proper dim inference

* add tok2vec initialize to unit test

* small fixes

* add unit test for various textcat config settings

* logistic output layer does not have nO

* fix window_size setting

* proper fix

* fix W initialization

* Update textcat training example

* Use ml_datasets
* Convert training data to `Example` format
* Use `n_texts` to set proportionate dev size

* fix _init renaming on latest thinc

* avoid setting a non-existing dim

* update to thinc==8.0.0a2

* add BOW and CNN defaults for easy testing

* various experiments with train_textcat script, fix softmax activation in textcat bow

* allow textcat train script to work on other datasets as well

* have dataset as a parameter

* train textcat from config, with example config

* add config for training textcat

* formatting

* fix exclusive_classes

* fixing BOW for GPU

* bump thinc to 8.0.0a3 (not published yet so CI will fail)

* add in link_vectors_to_models which got deleted

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-03-29 19:40:36 +02:00
cli Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
displacy Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
lang Remove unicode declarations 2020-03-26 15:18:32 +01:00
matcher Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
ml Train textcat with config (#5143) 2020-03-29 19:40:36 +02:00
pipeline Train textcat with config (#5143) 2020-03-29 19:40:36 +02:00
syntax Fix parser @ GPU (#5210) 2020-03-28 23:09:35 +01:00
tests Train textcat with config (#5143) 2020-03-29 19:40:36 +02:00
tokens bugfix in span similarity (#5155) 2020-03-29 13:56:07 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Simplify warnings 2020-02-28 12:20:23 +01:00
__main__.py Update spaCy for thinc 8.0.0 (#4920) 2020-01-29 17:06:46 +01:00
_ml.py take care of global vectors in multiprocessing (#5081) 2020-03-03 13:58:22 +01:00
about.py Bugfix linking vectors (#5196) 2020-03-25 10:20:11 +01:00
analysis.py Simplify warnings 2020-02-28 12:20:23 +01:00
attrs.pxd Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
attrs.pyx Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
compat.py Merge branch 'develop' into refactor/remove-symlinks 2020-02-18 17:22:20 +01:00
errors.py Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
glossary.py Tidy up and auto-format 2020-02-18 15:38:18 +01:00
gold.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
gold.pyx Check whether doc is instantiated in Example.get_gold_parses() (#5167) 2020-03-29 13:57:00 +02:00
kb.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
kb.pyx Merge branch 'develop' into refactor/simplify-warnings 2020-03-04 16:38:55 +01:00
language.py Fix argument 2020-03-26 14:09:02 +01:00
lemmatizer.py Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
lexeme.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
lexeme.pyx Simplify warnings 2020-02-28 12:20:23 +01:00
lookups.py Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
morphology.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
morphology.pyx Fix small errors 2020-03-26 13:47:31 +01:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
schemas.py Add sent_start to pattern schema 2020-03-26 14:05:40 +01:00
scorer.py Fix GoldParse init when token count differs (#5191) 2020-03-26 10:46:23 +01:00
strings.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
strings.pyx Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
structs.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
symbols.pxd Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
symbols.pyx Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
tokenizer.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
tokenizer.pyx Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
typedefs.pxd Update spaCy for thinc 8.0.0 (#4920) 2020-01-29 17:06:46 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Tok2Vec: extract-embed-encode (#5102) 2020-03-08 13:23:18 +01:00
vectors.pyx Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
vocab.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
vocab.pyx Tidy up and auto-format 2020-02-18 15:38:18 +01:00