spaCy/examples/training/textcat_example_data
Sofie Van Landeghem 0d94737857
Feature toggle_pipes (#5378)
* make disable_pipes deprecated in favour of the new toggle_pipes

* rewrite disable_pipes statements

* update documentation

* remove bin/wiki_entity_linking folder

* one more fix

* remove deprecated link to documentation

* few more doc fixes

* add note about name change to the docs

* restore original disable_pipes

* small fixes

* fix typo

* fix error number to W096

* rename to select_pipes

* also make changes to the documentation

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-05-18 22:27:10 +02:00
CC_BY-SA-3.0.txt Add textcat to train CLI (#4226) 2019-09-15 22:31:31 +02:00
CC_BY-SA-4.0.txt Add textcat to train CLI (#4226) 2019-09-15 22:31:31 +02:00
CC0.txt Add textcat to train CLI (#4226) 2019-09-15 22:31:31 +02:00
cooking.json Add textcat to train CLI (#4226) 2019-09-15 22:31:31 +02:00
cooking.jsonl Add textcat to train CLI (#4226) 2019-09-15 22:31:31 +02:00
jigsaw-toxic-comment.json Add textcat to train CLI (#4226) 2019-09-15 22:31:31 +02:00
jigsaw-toxic-comment.jsonl Add textcat to train CLI (#4226) 2019-09-15 22:31:31 +02:00
README.md Add textcat to train CLI (#4226) 2019-09-15 22:31:31 +02:00
textcatjsonl_to_trainjson.py Feature toggle_pipes (#5378) 2020-05-18 22:27:10 +02:00

Examples of textcat training data

The spaCy JSON training files were generated from the JSONL files with:

python textcatjsonl_to_trainjson.py -m en file.jsonl .
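For orientation, the input to this conversion is a JSONL file: one JSON object per line. The sketch below reads such a file using only the standard library. The `text` and `cats` field names, the `read_jsonl` helper, and the sample records are assumptions made for illustration; they are not copied from the files in this directory.

```python
import json
from io import StringIO


def read_jsonl(stream):
    """Yield one parsed JSON object per non-empty line of the stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical sample records; the field names "text" and "cats" are assumed.
sample = StringIO(
    '{"text": "How long should I proof the dough?", '
    '"cats": {"baking": 1.0, "not_baking": 0.0}}\n'
    '{"text": "What oil is best for frying?", '
    '"cats": {"baking": 0.0, "not_baking": 1.0}}\n'
)

records = list(read_jsonl(sample))
for record in records:
    print(record["text"], record["cats"])
```

To read a real file, pass an open file handle instead of the `StringIO` object.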

cooking.json is an example of mutually exclusive classes, with two labels:

  • baking
  • not_baking
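With mutually exclusive classes, each instance carries exactly one positive label. A minimal sketch of that constraint follows, assuming the labels are stored as a dictionary that maps each label to 0.0 or 1.0; the dictionary layout and the `is_mutually_exclusive` helper are hypothetical, not part of the conversion script.

```python
def is_mutually_exclusive(cats):
    """Return True if exactly one label is 1.0 and every other label is 0.0."""
    values = list(cats.values())
    only_binary = all(value in (0.0, 1.0) for value in values)
    return only_binary and values.count(1.0) == 1


# Hypothetical annotations for the two labels listed above.
valid = {"baking": 1.0, "not_baking": 0.0}
both_on = {"baking": 1.0, "not_baking": 1.0}
none_on = {"baking": 0.0, "not_baking": 0.0}

print(is_mutually_exclusive(valid))
print(is_mutually_exclusive(both_on))
print(is_mutually_exclusive(none_on))
```

A check like this can be run over every record before training to catch instances that are labeled with both classes or with neither.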

jigsaw-toxic-comment.json is an example with multiple labels per instance:

  • insult
  • obscene
  • severe_toxic
  • toxic
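In the multilabel case, any number of labels, including none, can apply to a single instance. The sketch below illustrates this, again assuming a label-to-score dictionary; the `active_labels` helper, the 0.5 threshold, and the example values are assumptions made for illustration.

```python
def active_labels(cats, threshold=0.5):
    """Return the sorted labels whose score is at or above the threshold."""
    return sorted(label for label, score in cats.items() if score >= threshold)


# Hypothetical annotations for the four labels listed above.
several = {"insult": 1.0, "obscene": 1.0, "severe_toxic": 0.0, "toxic": 1.0}
clean = {"insult": 0.0, "obscene": 0.0, "severe_toxic": 0.0, "toxic": 0.0}

print(active_labels(several))
print(active_labels(clean))
```

Unlike the mutually exclusive case, an instance with several positive labels or with no positive label is valid here.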

Data Sources

Data Licenses