spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-03-05 04:11:26 +03:00

Matthew Honnibal 563f46f026 Fix multi-label support for text classification	2017-10-05 18:43:02 -05:00

The TextCategorizer class is supposed to support multi-label text classification, and allow training data to contain missing values.

For this to work, the gradient of the loss should be 0 when labels are missing. Instead, there was no way to actually denote "missing" in the GoldParse class, and so the TextCategorizer class treated the label set within gold.cats as complete.

To fix this, we change GoldParse.cats to be a dict instead of a list. The GoldParse.cats dict should map to floats, with 1. denoting 'present' and 0. denoting 'absent'. Gradients are zeroed for categories absent from the gold.cats dict. A nice bonus is that you can also set values between 0 and 1 for partial membership. You can also set numeric values, if you're using a text classification model that uses an appropriate loss function.

Unfortunately this is a breaking change; although the functionality was only recently introduced and hasn't been properly documented yet. I've updated the example script accordingly.

..
load_ner.py	use model_dir inside of load_model	2016-12-12 20:23:24 +00:00
train_ner_standalone.py	Update train_ner_standalone example	2017-10-03 18:49:38 +02:00
train_ner.py	Fix formatting and remove unused imports	2017-06-01 12:47:18 +02:00
train_new_entity_type.py	Update example for adding entity type	2017-09-14 16:15:59 +02:00
train_parser.py	updated training examples to v1.1.2	2016-10-24 11:53:33 +10:00
train_tagger.py	train_ner should save vocab; add load_ner example	2016-12-12 20:09:49 +00:00
train_textcat.py	Fix multi-label support for text classification	2017-10-05 18:43:02 -05:00
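
The commit message attached to train_textcat.py describes gradients being zeroed for categories that are missing from the gold.cats dict. The sketch below illustrates that idea in plain Python, without importing spaCy. It is not the TextCategorizer implementation: the function name masked_textcat_gradient, the label names, and the choice of a squared-error loss are all assumptions made for illustration.

```python
# Illustrative sketch only; not spaCy's actual code.
# Assumption: a squared-error loss, summed over the annotated labels.

def masked_textcat_gradient(scores, cats, labels):
    """Return (gradient, loss) for one document.

    scores: predicted score per label, in the same order as `labels`.
    cats:   dict mapping label -> float (1.0 present, 0.0 absent,
            values in between for partial membership).
            Labels missing from the dict are treated as unannotated.
    """
    gradient = [0.0] * len(labels)
    loss = 0.0
    for i, label in enumerate(labels):
        if label not in cats:
            # Missing annotation: leave the gradient at 0 so this
            # label does not influence the update.
            continue
        diff = scores[i] - float(cats[label])
        gradient[i] = diff
        loss += diff ** 2
    return gradient, loss


# Hypothetical label set and scores, chosen only for the example.
labels = ["POSITIVE", "NEGATIVE", "SPAM"]
scores = [0.75, 0.5, 0.25]
cats = {"POSITIVE": 1.0, "NEGATIVE": 0.0}  # "SPAM" is not annotated

grad, loss = masked_textcat_gradient(scores, cats, labels)
print(grad)  # [-0.25, 0.5, 0.0]
print(loss)  # 0.3125
```

With a list of labels, as before the change, there would be no way to tell "SPAM is absent" apart from "SPAM was never annotated". The dict form makes that distinction explicit: an absent label is written as 0.0, while an unannotated label is simply left out.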