spaCy/spacy
Matthew Honnibal 375f0dc529
💫 Make TextCategorizer default to a simpler, GPU-friendly model (#3038)
Currently the TextCategorizer defaults to a fairly complicated model, designed partly around the active learning requirements of Prodigy. The model is a bit slow and not very GPU-friendly.

This patch implements a straightforward CNN model that still performs pretty well. The replacement model also makes it easy to use LMAO pretraining, since most of its parameters are in the CNN.
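A minimal pure-Python sketch of the general shape of a CNN text classifier follows. It assumes a single window-of-3 convolution with a ReLU nonlinearity, mean pooling over tokens, and a linear output layer that produces one raw score per label. The function names and layer choices are hypothetical illustrations, not the code in `_ml.py`.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # rows = output units, columns = input features


def matvec(weights: Matrix, vec: Vector) -> Vector:
    """Multiply a weight matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def conv_layer(tokens: List[Vector], weights: Matrix) -> List[Vector]:
    """Window-of-3 convolution: each token sees itself and both neighbours.

    Assumes at least one token. Missing neighbours at the edges are padded
    with zero vectors; ReLU is used as the nonlinearity for simplicity.
    `weights` must have 3 * token_width columns.
    """
    if not tokens:
        raise ValueError("expected at least one token")
    zeros = [0.0] * len(tokens[0])
    padded = [zeros] + tokens + [zeros]
    out = []
    for i in range(1, len(padded) - 1):
        # Concatenate left neighbour, current token and right neighbour.
        window = padded[i - 1] + padded[i] + padded[i + 1]
        out.append([max(0.0, v) for v in matvec(weights, window)])
    return out


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average the per-token vectors into one document vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def doc_scores(tokens: List[Vector], conv_weights: Matrix,
               output_weights: Matrix) -> Vector:
    """Encode tokens with the CNN, pool them, and score each label."""
    hidden = conv_layer(tokens, conv_weights)
    return matvec(output_weights, mean_pool(hidden))
```

In this layout the convolution holds hidden × 3·width weights while the output layer holds only n_labels × hidden, so with a handful of labels most of the parameters sit in the encoder.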

The replacement model has a flag, which defaults to True, to specify whether labels are mutually exclusive. Handling of mutually exclusive labels has been a common problem with the text classifier. We'll also now be able to support adding labels to pretrained models again.
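A minimal sketch of what a mutual-exclusivity flag typically changes at the output layer follows, assuming it switches between a softmax over all labels and an independent sigmoid per label. The function names and the `mutually_exclusive` parameter are hypothetical, not spaCy's API; only the default of True follows the description above.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Mutually exclusive labels: probabilities compete and sum to 1."""
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(scores: List[float]) -> List[float]:
    """Non-exclusive labels: each label gets an independent 0-1 probability."""
    probs = []
    for s in scores:
        if s >= 0:
            probs.append(1.0 / (1.0 + math.exp(-s)))
        else:
            # Rewritten form avoids overflow in exp(-s) for large negative s.
            e = math.exp(s)
            probs.append(e / (1.0 + e))
    return probs


def predict_cats(labels: List[str], scores: List[float],
                 mutually_exclusive: bool = True) -> Dict[str, float]:
    """Map raw per-label scores to a {label: probability} dict."""
    probs = softmax(scores) if mutually_exclusive else sigmoid(scores)
    return dict(zip(labels, probs))
```

With the softmax, raising one label's score necessarily lowers the others, so a document gets a single winning category; with independent sigmoids, a document can score highly on several labels at once, which is what multi-label tasks need.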

Resolves #2934, #2756, #1798, #1748.
2018-12-10 14:37:39 +01:00
cli Make spacy train respect LOG_FRIENDLY 2018-12-10 09:46:53 +01:00
data Make spacy/data a package 2017-03-18 20:04:22 +01:00
displacy 💫 New JSON helpers, training data internals & CLI rewrite (#2932) 2018-11-30 20:16:14 +01:00
lang 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
syntax Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-12-10 09:44:07 +01:00
tests 💫 Allow Span to take text label (#3031) 2018-12-08 13:08:41 +01:00
tokens 💫 Allow Span to take text label (#3031) 2018-12-08 13:08:41 +01:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Tidy up and format remaining files 2018-11-30 17:43:08 +01:00
__main__.py 💫 New JSON helpers, training data internals & CLI rewrite (#2932) 2018-11-30 20:16:14 +01:00
_align.pyx Improve alignment around quotes 2018-08-16 01:04:34 +02:00
_ml.py 💫 Make TextCategorizer default to a simpler, GPU-friendly model (#3038) 2018-12-10 14:37:39 +01:00
about.py Set version back to 2.1.0a4 2018-12-03 02:03:26 +01:00
attrs.pxd Fix LANG symbol 2018-02-17 18:10:50 +01:00
attrs.pyx Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
compat.py 💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003) 2018-12-03 01:28:22 +01:00
errors.py replace user-facing references to "sbd" with "sentencizer" (#2985) 2018-11-30 21:22:40 +01:00
glossary.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
gold.pxd Add support for sent_start to GoldParse 2017-08-25 20:03:14 -05:00
gold.pyx Fix JSON segmentation bug that affected French 2018-12-08 10:41:24 +01:00
language.py 💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003) 2018-12-03 01:28:22 +01:00
lemmatizer.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
lexeme.pxd WIP on stringstore change. 27 failures 2017-05-28 14:06:40 +02:00
lexeme.pyx 💫 Add .similarity warnings for no vectors and option to exclude warnings (#2197) 2018-05-21 01:22:38 +02:00
matcher.pyx 💫 Port master changes over to develop (#2979) 2018-11-29 16:30:29 +01:00
morphology.pxd Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
morphology.pyx Fix lemmatization 2018-07-05 13:56:02 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
pipeline.pxd Fix names of pipeline components 2017-10-26 12:38:23 +02:00
pipeline.pyx 💫 Make TextCategorizer default to a simpler, GPU-friendly model (#3038) 2018-12-10 14:37:39 +01:00
scorer.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
strings.pxd Try to fix StringStore clean up (see #1506) 2017-11-11 03:11:27 +03:00
strings.pyx 💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003) 2018-12-03 01:28:22 +01:00
structs.pxd Make NORM a token attribute (#3029) 2018-12-08 10:49:10 +01:00
symbols.pxd Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
symbols.pyx Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
tokenizer.pxd Disable tokenizer cache for special-cases. Fixes #1250 2017-10-24 16:08:05 +02:00
tokenizer.pyx 💫 Small efficiency fixes to tokenizer (#2587) 2018-07-24 23:35:54 +02:00
typedefs.pxd Work on changing StringStore to return hashes. 2017-05-28 12:36:27 +02:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Set cupy.random seed in fix_random_seed helper 2018-12-08 12:37:38 +01:00
vectors.pyx 💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003) 2018-12-03 01:28:22 +01:00
vocab.pxd 💫 Small efficiency fixes to tokenizer (#2587) 2018-07-24 23:35:54 +02:00
vocab.pyx Make NORM a token attribute (#3029) 2018-12-08 10:49:10 +01:00