129 Commits

Author SHA1 Message Date
Matthew Honnibal
563f46f026 Fix multi-label support for text classification
The TextCategorizer class is supposed to support multi-label
text classification, and allow training data to contain missing
values.

For this to work, the gradient of the loss should be 0 when labels
are missing. Instead, there was no way to actually denote "missing"
in the GoldParse class, and so the TextCategorizer class treated
the label set within gold.cats as complete.

To fix this, we change GoldParse.cats to be a dict instead of a list.
The GoldParse.cats dict should map to floats, with 1. denoting
'present' and 0. denoting 'absent'. Gradients are zeroed for categories
absent from the gold.cats dict. A nice bonus is that you can also set
values between 0 and 1 for partial membership. You can also set numeric
values, if you're using a text classification model that uses an
appropriate loss function.

Unfortunately this is a breaking change, although the functionality
was only recently introduced and hasn't been properly documented
yet. I've updated the example script accordingly.
2017-10-05 18:43:02 -05:00
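
The missing-label behaviour described in commit 563f46f026 can be illustrated with a minimal sketch. This is not the spaCy implementation: the function name `textcat_gradient`, the `labels` list, and the squared-error style gradient (`score - target`) are assumptions made for illustration. It only shows the idea that a label absent from the `cats` dict contributes a zero gradient, while labels present in the dict (including values between 0 and 1) contribute normally.

```python
def textcat_gradient(scores, labels, cats):
    """Return one gradient value per label (hypothetical helper).

    scores: predicted score per label, in the same order as `labels`
    labels: every label the model knows about
    cats:   dict mapping label -> float target; a label that is not
            a key of the dict is treated as "missing"
    """
    gradient = []
    for label, score in zip(labels, scores):
        if label in cats:
            # Label is annotated: gradient of a squared-error style loss.
            gradient.append(score - cats[label])
        else:
            # Label is missing from the annotation: no learning signal.
            gradient.append(0.0)
    return gradient


labels = ["POSITIVE", "NEGATIVE", "SPAM"]
scores = [0.75, 0.5, 0.25]

# "SPAM" is not annotated for this document, so its gradient is zeroed.
cats = {"POSITIVE": 1.0, "NEGATIVE": 0.0}
grad = textcat_gradient(scores, labels, cats)

# Partial membership: only "SPAM" is annotated, with a target of 0.5.
partial = textcat_gradient(scores, labels, {"SPAM": 0.5})
```

With a list of labels instead of a dict, there would be no way to tell "absent" (target 0.) apart from "not annotated", which is the problem the commit describes.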
Matthew Honnibal
f1b86dff8c Update textcat example 2017-10-04 15:12:28 +02:00
Matthew Honnibal
79a94bc166 Update textcat example 2017-10-04 14:55:30 +02:00
Matthew Honnibal
cbb1fbef80 Update train_ner_standalone example 2017-10-03 18:49:38 +02:00
Matthew Honnibal
38286b6f07 Add example loading Fast Text vectors 2017-10-01 23:40:02 +02:00
Matthew Honnibal
f92ab03dc8 Rename phrase matcher example 2017-09-20 22:51:58 +02:00
Matthew Honnibal
01858e9b59 Fix PhraseMatcher example 2017-09-20 22:51:41 +02:00
Matthew Honnibal
027a5d8b75 Update train_ner_standalone example 2017-09-15 10:36:46 +02:00
Matthew Honnibal
683d81bb49 Update example for adding entity type 2017-09-14 16:15:59 +02:00
Matthew Honnibal
c16ef0a85c Clarify train textcat example 2017-07-29 21:59:27 +02:00
Matthew Honnibal
54a539a113 Finish text classifier example 2017-07-23 00:34:12 +02:00
Matthew Honnibal
2bc7d87c70 Add example for training text classifier 2017-07-22 20:15:32 +02:00
ines
992559bf9a Fix formatting and remove unused imports 2017-06-01 12:47:18 +02:00
Matthew Honnibal
5c30466c95 Update NER training example 2017-05-31 13:42:12 +02:00
akYoung
c158cdb1da Corrections for model test example
The test-data sentences in the sentence entailment example should be generated with integers limited to vocab_size.
2017-05-03 22:41:23 +08:00
Matthew Honnibal
2da16adcc2 Add dropout option for parser and NER
Dropout can now be specified in the `Parser.update()` method via
the `drop` keyword argument, e.g.

    nlp.entity.update(doc, gold, drop=0.4)

This will randomly drop 40% of features, and multiply the value of the
others by 1. / 0.4. This may be useful for generalising from small data
sets.

This commit also patches the examples/training/train_new_entity_type.py
example, to use dropout and fix the output (previously it did not output
the learned entity).
2017-04-27 13:18:39 +02:00
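
The feature dropout described in commit 2da16adcc2 can be sketched roughly as follows. This is an illustration only, not the parser's actual code: `dropout_features` is a hypothetical helper, and the rescaling factor is an assumption. The commit message states that surviving features are multiplied by 1. / 0.4; the sketch instead uses the conventional "inverted dropout" factor 1 / (1 - drop), which is 1 / 0.6 for drop=0.4 and keeps the expected feature value unchanged.

```python
import random


def dropout_features(values, drop, rng=random):
    """Zero each feature with probability `drop`, rescale the survivors.

    Hypothetical helper. Uses the conventional inverted-dropout factor
    1 / (1 - drop), which is an assumption about the intended behaviour.
    """
    if not 0.0 <= drop < 1.0:
        raise ValueError("drop must be in the range [0, 1)")
    scale = 1.0 / (1.0 - drop)
    return [0.0 if rng.random() < drop else value * scale for value in values]


# Seeded generator so repeated runs drop the same features.
rng = random.Random(0)
features = [1.0] * 10
dropped = dropout_features(features, 0.4, rng)

# With drop=0.0 nothing is dropped and nothing is rescaled.
unchanged = dropout_features([1.0, 2.0], 0.0)
```

Each entry of `dropped` is either 0.0 or the original value scaled up, so the features that survive carry more weight during that update.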
Matthew Honnibal
0605b95f2e Merge branch 'master' of https://github.com/explosion/spaCy 2017-04-18 13:48:00 +02:00
Matthew Honnibal
2f84626417 Fix train_new_entity_type example 2017-04-18 13:47:36 +02:00
Ines Montani
e7ae3b7cc2 Fix formatting and typo (closes #967) 2017-04-16 23:56:12 +02:00
Ines Montani
734b0a4e4a Update train_new_entity_type.py 2017-04-16 23:42:16 +02:00
ines
264af6cd17 Add documentation 2017-04-16 20:37:46 +02:00
ines
c7adca58a9 Tidy up example and only save/test if output_directory is not None 2017-04-16 16:55:01 +02:00
Matthew Honnibal
40e3024241 Move standalone NER training script into examples directory 2017-04-15 16:13:42 +02:00
Matthew Honnibal
b9c26aae11 Remove neptune refs from new train example 2017-04-15 16:13:17 +02:00
Matthew Honnibal
c729d72fc6 Add new example for training new entity types 2017-04-15 16:11:06 +02:00
Matthew Honnibal
a7626bd7fd Tmp commit to example 2017-04-15 15:43:14 +02:00
Matthew Honnibal
97b83c74dc WIP on training example 2017-04-14 23:54:27 +02:00
Kumaran Rajendhiran
3f55d6afae Update README 2017-04-05 16:59:52 +05:30
Kumaran Rajendhiran
47d7137c83 Set max_length to 100 for demo and evaluate 2017-04-05 16:48:35 +05:30
Kumaran Rajendhiran
10e8dcdfdb Remove not needed parameters from function 2017-04-05 16:20:47 +05:30
Matthew Honnibal
07726cf0a6 Add example of standalone NER training 2017-03-19 15:01:38 +01:00
Matthew Honnibal
f028f8ad28 Remove unfinished examples 2017-02-18 11:04:41 +01:00
Matthew Honnibal
c031c677cc Remove unused model_dir option
As noted in #845, the `model_dir` argument was not being used. I've removed it for now, although it would be good to have this option restored and working.
2017-02-18 10:38:22 +01:00
Matthew Honnibal
16ce7409e4 Merge branch 'master' of https://github.com/explosion/spaCy 2017-01-31 13:27:34 -06:00
Matthew Honnibal
80aa4e114b Fix x keras deep learning example 2017-01-31 13:27:13 -06:00
Matthew Honnibal
ab70f6e18d Update NER training example 2017-01-27 12:27:10 +01:00
Ines Montani
853130bcf8 Update installation instructions (see #727) 2017-01-14 22:12:42 +01:00
Matthew Honnibal
5a319060b9 Merge branch 'master' of https://github.com/explosion/spaCy 2016-12-20 16:26:57 -06:00
Matthew Honnibal
7793e2ad82 Fix use of dropout in sentiment analysis LSTM example 2016-12-20 16:26:38 -06:00
Christos Savvopoulos
c19b83f6ae use model_dir inside of load_model 2016-12-12 20:23:24 +00:00
Christos Savvopoulos
93cf4af701 actually commit load_ner.py 2016-12-12 20:13:33 +00:00
Christos Savvopoulos
ad54a929f8 train_ner should save vocab; add load_ner example 2016-12-12 20:09:49 +00:00
Matthew Honnibal
d0c999e0ad Add config.py for paddle example 2016-11-20 23:24:51 +01:00
Matthew Honnibal
d75fe7c19a Update paddle example 2016-11-20 21:45:08 +01:00
Matthew Honnibal
1ef541ddff Add train.sh for paddle 2016-11-20 21:44:33 +01:00
Matthew Honnibal
001abe2b9d Update config.py 2016-11-20 03:45:51 +01:00
Matthew Honnibal
409a18bd42 Add paddle sentiment example 2016-11-20 03:35:23 +01:00
Matthew Honnibal
e7eac08819 Work on paddle example 2016-11-20 03:29:36 +01:00
Matthew Honnibal
1ed40682a3 Set vectors in chainer example 2016-11-19 18:42:58 -06:00
Matthew Honnibal
b701a08249 Fix embedding in chainer sentiment example 2016-11-19 19:05:37 +01:00