spaCy/extra/example_data/textcat_example_data
2020-08-25 13:28:42 +02:00

Examples of textcat training data

spaCy JSON training files were generated from the JSONL files with:

python textcatjsonl_to_trainjson.py -m en file.jsonl .
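To inspect a JSONL file before converting it, a short sketch like the following may help. It uses only the standard library and is not the conversion script itself. The record layout (a "text" string plus a "cats" mapping from label to score) and the sample texts are assumptions made for illustration, not taken from the data files.

```python
import json


def read_jsonl(lines):
    """Parse an iterable of JSONL lines into a list of dicts, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


# Hypothetical records: the "text" and "cats" keys are an assumed layout.
sample = [
    '{"text": "How long should I proof bread dough?", "cats": {"baking": 1.0, "not_baking": 0.0}}',
    "",
    '{"text": "How do I sharpen a knife?", "cats": {"baking": 0.0, "not_baking": 1.0}}',
]

records = read_jsonl(sample)
for record in records:
    print(record["cats"], record["text"])
```

For a real file, the same function can be given an open file handle, for example `read_jsonl(open("file.jsonl", encoding="utf8"))`.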

cooking.json is an example of mutually exclusive classes, with two labels:

  • baking
  • not_baking
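As a rough illustration of what "mutually exclusive" means here, each instance should carry exactly one positive label. The sketch below checks that property on a label dictionary. The function name, the dictionary layout, and the 0.5 threshold are assumptions for illustration, not part of the data files or of spaCy.

```python
def is_exclusive(cats, labels=("baking", "not_baking"), threshold=0.5):
    """Return True if cats covers exactly the given labels and exactly one is positive."""
    if set(cats) != set(labels):
        return False
    positives = [label for label in labels if cats[label] >= threshold]
    return len(positives) == 1


# Hypothetical label dictionaries for single instances.
print(is_exclusive({"baking": 1.0, "not_baking": 0.0}))
print(is_exclusive({"baking": 1.0, "not_baking": 1.0}))
```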

jigsaw-toxic-comment.json is an example with multiple labels per instance:

  • insult
  • obscene
  • severe_toxic
  • toxic
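In the multi-label case, by contrast, an instance may carry any number of positive labels, including none. A minimal sketch, assuming the same hypothetical label-to-score dictionary layout and an arbitrary 0.5 threshold:

```python
LABELS = ("insult", "obscene", "severe_toxic", "toxic")


def active_labels(cats, threshold=0.5):
    """Return the sorted list of labels whose score meets the threshold."""
    return sorted(label for label in LABELS if cats.get(label, 0.0) >= threshold)


# Hypothetical instances: one with two labels, one with none.
print(active_labels({"insult": 1.0, "obscene": 0.0, "severe_toxic": 0.0, "toxic": 1.0}))
print(active_labels({"insult": 0.0, "obscene": 0.0, "severe_toxic": 0.0, "toxic": 0.0}))
```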

Data Sources

Data Licenses