Mirror of https://github.com/explosion/spaCy.git, synced 2025-10-24 12:41:23 +03:00

* Fix code for bag-of-words feature extraction

  The _ml.py module had a redundant copy of a function to extract unigram bag-of-words features, except one had a bug that set values to 0. Another function allowed extraction of bigram features. Replace all three with a new function that supports arbitrary ngram sizes and also allows control of which attribute is used (e.g. ORTH, LOWER, etc).

* Support 'bow' architecture for TextCategorizer

  This allows efficient ngram bag-of-words models, which are better when the classifier needs to run quickly, especially when the texts are long. Pass architecture="bow" to use it. The extra arguments ngram_size and attr are also available, e.g. ngram_size=2 means unigram and bigram features will be extracted.

* Fix size limits in train_textcat example

* Explain architectures better in docs


| File |
|---|
| conllu.py |
| ner_multitask_objective.py |
| pretrain_textcat.py |
| rehearsal.py |
| train_intent_parser.py |
| train_ner.py |
| train_new_entity_type.py |
| train_parser.py |
| train_tagger.py |
| train_textcat.py |
| training-data.json |
| vocab-data.jsonl |
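
The commit message describes ngram bag-of-words features where `ngram_size=2` means both unigrams and bigrams are extracted, and where `attr` selects which token attribute is counted (e.g. ORTH or LOWER). The following is a minimal pure-Python sketch of that idea, not the code from `_ml.py`: the function name `extract_ngram_counts`, the use of plain strings as tokens, and the string values `"orth"` and `"lower"` are all assumptions made for illustration.

```python
from collections import Counter


def extract_ngram_counts(tokens, ngram_size=1, attr="orth"):
    """Count all ngrams of length 1..ngram_size over a list of token strings.

    Hypothetical helper for illustration only; it is not spaCy's API.
    attr="orth" keeps the token text verbatim, attr="lower" lowercases it.
    """
    if ngram_size < 1:
        raise ValueError("ngram_size must be at least 1")
    if attr == "orth":
        keys = list(tokens)
    elif attr == "lower":
        keys = [token.lower() for token in tokens]
    else:
        raise ValueError("this sketch only supports attr='orth' or attr='lower'")

    counts = Counter()
    # ngram_size=2 yields unigrams and bigrams, matching the commit message.
    for n in range(1, ngram_size + 1):
        for start in range(len(keys) - n + 1):
            counts[tuple(keys[start:start + n])] += 1
    return counts


tokens = ["The", "cat", "saw", "the", "cat"]
features = extract_ngram_counts(tokens, ngram_size=2, attr="lower")
```

With `attr="lower"`, "The" and "the" fall into the same feature, so the unigram `("the",)` and the bigram `("the", "cat")` are each counted twice. With `attr="orth"` they stay separate. Each count is incremented rather than overwritten, so a repeated ngram never ends up with a value of 0.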