spaCy/extra/example_data/textcat_example_data

Examples of textcat training data

spaCy JSON training files were generated from JSONL with:

python textcatjsonl_to_trainjson.py -m en file.jsonl .
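To see what the JSONL input to the command above might look like, here is a minimal sketch that parses such lines with the standard library only. The record layout (a "text" string plus a "cats" mapping from label to score) and the sample sentences are assumptions for illustration, not taken from the data files, and read_jsonl is a hypothetical helper rather than part of the conversion script.

```python
import json


def read_jsonl(lines):
    """Parse an iterable of JSONL lines into a list of dicts, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


# Hypothetical sample lines; the real files may use different keys or values.
sample = [
    '{"text": "Preheat the oven before mixing the dough.", "cats": {"baking": 1.0, "not_baking": 0.0}}',
    "",
    '{"text": "How long should I sear a steak?", "cats": {"baking": 0.0, "not_baking": 1.0}}',
]

records = read_jsonl(sample)
for record in records:
    print(record["text"], record["cats"])
```

To read an actual file, pass an open file handle instead of the list, for example read_jsonl(open("cooking.jsonl", encoding="utf8")).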

cooking.json is an example of mutually exclusive classes, with two labels:

  • baking
  • not_baking

jigsaw-toxic-comment.json is an example with multiple labels per instance:

  • insult
  • obscene
  • severe_toxic
  • toxic
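The difference between the two examples can be checked programmatically. The sketch below assumes each record carries a "cats" mapping from label to a score between 0 and 1 (an assumed layout, with made-up sample records) and uses a hypothetical helper, is_mutually_exclusive, that tests whether every record has exactly one positive label.

```python
def is_mutually_exclusive(records, threshold=0.5):
    """Return True if every record has exactly one label scored at or above threshold."""
    return all(
        sum(1 for score in record["cats"].values() if score >= threshold) == 1
        for record in records
    )


# Hypothetical records using the label names listed above.
exclusive_records = [
    {"cats": {"baking": 1.0, "not_baking": 0.0}},
    {"cats": {"baking": 0.0, "not_baking": 1.0}},
]
multilabel_records = [
    {"cats": {"insult": 1.0, "obscene": 1.0, "severe_toxic": 0.0, "toxic": 1.0}},
    {"cats": {"insult": 0.0, "obscene": 0.0, "severe_toxic": 0.0, "toxic": 0.0}},
]

print(is_mutually_exclusive(exclusive_records))
print(is_mutually_exclusive(multilabel_records))
```

In the multi-label case a record may have several positive labels or none at all, so the check fails for it by design.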

Data Sources

Data Licenses