spaCy/examples
Matthew Honnibal 563f46f026 Fix multi-label support for text classification
The TextCategorizer class is supposed to support multi-label
text classification, and allow training data to contain missing
values.

For this to work, the gradient of the loss should be 0 when labels
are missing. Instead, there was no way to actually denote "missing"
in the GoldParse class, and so the TextCategorizer class treated
the label set within gold.cats as complete.

To fix this, we change GoldParse.cats to be a dict instead of a list.
The GoldParse.cats dict should map labels to floats, with 1. denoting
'present' and 0. denoting 'absent'. Gradients are zeroed for categories
missing from the gold.cats dict. A nice bonus is that you can also set
values between 0 and 1 for partial membership. You can also set other
numeric values, if you're using a text classification model that uses an
appropriate loss function.
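
The masking behaviour described above can be sketched in plain Python. This is only an illustration, not the TextCategorizer implementation: the function name `masked_gradient`, the label names, and the choice of a squared-error loss are all assumptions made for the example.

```python
def masked_gradient(scores, cats, labels):
    """Return d(loss)/d(score) for each label, with 0.0 for missing labels.

    Assumes a squared-error loss, 0.5 * (score - truth) ** 2, purely for
    illustration; the loss actually used by the model may differ.
    """
    gradient = []
    for label, score in zip(labels, scores):
        if label in cats:
            # Label is annotated: 1.0 = present, 0.0 = absent,
            # values in between = partial membership.
            gradient.append(score - float(cats[label]))
        else:
            # Label is missing from the dict: no training signal.
            gradient.append(0.0)
    return gradient


# Hypothetical labels and scores; "TECH" is left unannotated on purpose.
labels = ["SPORTS", "POLITICS", "TECH"]
scores = [0.9, 0.2, 0.6]
cats = {"SPORTS": 1.0, "POLITICS": 0.0}
grad = masked_gradient(scores, cats, labels)
```

Here the third entry of `grad` is zero because "TECH" does not appear in `cats`, whereas "POLITICS" is explicitly marked absent with 0.0 and still contributes a gradient.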

Unfortunately this is a breaking change, although the functionality
was only recently introduced and hasn't been properly documented
yet. I've updated the example script accordingly.
2017-10-05 18:43:02 -05:00
inventory_count Rename inventory count example 2016-11-01 02:30:22 +01:00
keras_parikh_entailment Corretions for model test example 2017-05-03 22:41:23 +08:00
training Fix multi-label support for text classification 2017-10-05 18:43:02 -05:00
_handler.py * Add _handler to resolve Issue #123 2015-10-15 02:44:23 +11:00
deep_learning_keras.py Fix use of dropout in sentiment analysis LSTM example 2016-12-20 16:26:38 -06:00
get_parse_subregions.py move displacy to its own subdomain 2016-02-19 14:03:52 +01:00
information_extraction.py * Tweak information extraction example 2015-10-06 10:35:49 +11:00
matcher_example.py * Add clarifying comment 2015-09-27 18:17:41 +10:00
nn_text_class.py Add setup directions for data dir 2016-11-13 10:08:16 -08:00
parallel_parse.py added batch_size as keyword argument 2016-03-10 14:16:34 -08:00
phrase_matcher.py Rename phrase matcher example 2017-09-20 22:51:58 +02:00
pos_tag.py Fix formatting and typo (closes #967) 2017-04-16 23:56:12 +02:00
README.md Add README.md to examples 2016-11-01 01:14:04 +01:00
twitter_filter.py * Begin rewriting twitter_filter examples 2015-08-22 22:12:26 +02:00
vectors_fast_text.py Add example loadig Fast Text vectors 2017-10-01 23:40:02 +02:00

spaCy examples

The examples are Python scripts with well-behaved command line interfaces. For a full list of spaCy tutorials and code snippets, see the documentation.

How to run an example

For example, running the nn_text_class.py script without its required data_dir argument prints a usage summary:

```
$ python examples/nn_text_class.py
usage: nn_text_class.py [-h] [-d 3] [-H 300] [-i 5] [-w 40000] [-b 24]
                        [-r 0.3] [-p 1e-05] [-e 0.005]
                        data_dir
nn_text_class.py: error: too few arguments
```

You can print detailed help with the -h argument.
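
As a rough sketch of how a script ends up with this kind of interface, the snippet below uses the standard-library `argparse` module to declare one required positional argument and one optional flag. The program name `example_script.py`, the `--n-iter` long option, and its help text are hypothetical; the actual example scripts may define their command line differently.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the shape of the usage message above:
    # one required positional argument plus an optional flag with a default.
    parser = argparse.ArgumentParser(prog="example_script.py")
    parser.add_argument("data_dir", help="Directory containing the data")
    parser.add_argument(
        "-i", "--n-iter", type=int, default=5,
        help="Number of training iterations (assumed meaning)",
    )
    return parser


# Supplying the positional argument parses cleanly; omitting it makes
# argparse print the usage summary and exit with status 2.
args = build_parser().parse_args(["/tmp/data"])
```

Calling `build_parser().parse_args(["-h"])` prints the detailed help text and exits, which is the behaviour the `-h` argument refers to.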

While we try to keep the examples up to date, they are not currently exercised by the test suite, as some of them require significant data downloads or take time to train. If you find that an example no longer runs, please tell us! We know there's nothing worse than trying to figure out what you're doing wrong, only to find that your code was never the problem.