spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-02-02 13:36:18 +03:00

Matthew Honnibal 6c783f8045 Bug fixes and options for TextCategorizer (#3472)	2019-03-23 16:44:44 +01:00

* Fix code for bag-of-words feature extraction
  The _ml.py module had a redundant copy of a function to extract unigram bag-of-words features, except one had a bug that set values to 0. Another function allowed extraction of bigram features. Replace all three with a new function that supports arbitrary ngram sizes and also allows control of which attribute is used (e.g. ORTH, LOWER, etc).
* Support 'bow' architecture for TextCategorizer
  This allows efficient ngram bag-of-words models, which are better when the classifier needs to run quickly, especially when the texts are long. Pass architecture="bow" to use it. The extra arguments ngram_size and attr are also available, e.g. ngram_size=2 means unigram and bigram features will be extracted.
* Fix size limits in train_textcat example
* Explain architectures better in docs
conllu.py	Remove unused cytoolz / itertools imports	2018-12-03 02:12:07 +01:00
ner_multitask_objective.py	Auto-format examples	2018-12-02 04:26:26 +01:00
pretrain_textcat.py	Auto-format examples	2018-12-02 04:26:26 +01:00
rehearsal.py	Update rehearsal example	2019-02-24 16:17:41 +01:00
train_intent_parser.py	Auto-format examples	2018-12-02 04:26:26 +01:00
train_ner.py	Test and update examples [ci skip]	2019-03-16 14:15:49 +01:00
train_new_entity_type.py	Test and update examples [ci skip]	2019-03-16 14:15:49 +01:00
train_parser.py	Test and update examples [ci skip]	2019-03-16 14:15:49 +01:00
train_tagger.py	Test and update examples [ci skip]	2019-03-16 14:15:49 +01:00
train_textcat.py	Bug fixes and options for TextCategorizer (#3472)	2019-03-23 16:44:44 +01:00
training-data.json	Update Example input JSON file to adhere to specification. (#3243)	2019-02-07 16:18:01 +01:00
vocab-data.jsonl	Use even smaller examle size	2017-10-30 19:46:45 +01:00
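
The latest commit message says that ngram_size=2 extracts both unigram and bigram features, and that the attribute used (e.g. ORTH or LOWER) can be chosen. The following is a minimal plain-Python sketch of that idea, not the code from _ml.py: the function name ngram_bow_features and the lowercase flag (a stand-in for choosing LOWER over ORTH) are hypothetical, and tokens are plain strings rather than spaCy tokens.

```python
from collections import Counter


def ngram_bow_features(tokens, ngram_size=1, lowercase=True):
    """Count every n-gram of length 1..ngram_size in a list of string tokens.

    Hypothetical helper for illustration only; it is not part of spaCy.
    lowercase=True roughly mimics using the LOWER attribute, while
    lowercase=False roughly mimics ORTH (the verbatim token text).
    """
    if ngram_size < 1:
        raise ValueError("ngram_size must be at least 1")
    if lowercase:
        tokens = [token.lower() for token in tokens]
    counts = Counter()
    # Sizes are cumulative: ngram_size=2 yields unigrams and bigrams.
    for n in range(1, ngram_size + 1):
        for start in range(len(tokens) - n + 1):
            counts[tuple(tokens[start:start + n])] += 1
    return counts


features = ngram_bow_features("The cat saw the dog".split(), ngram_size=2)
```

Here features holds counts for one-word and two-word sequences alike, keyed by tuples of lowercased strings. Order is ignored beyond the n-gram window, which is what keeps this kind of model cheap on long texts.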