mirror of
https://github.com/explosion/spaCy.git
synced 2024-11-12 12:47:15 +03:00
5453821a9f
Add note on training data sources and include coarse-grained Wikipedia scheme
110 lines
2.7 KiB
Plaintext
//- 💫 DOCS > API > ANNOTATION > NAMED ENTITIES

p
    | Models trained on the
    | #[+a("https://catalog.ldc.upenn.edu/ldc2013t19") OntoNotes 5] corpus
    | support the following entity types:

+table(["Type", "Description"])
    +row
        +cell #[code PERSON]
        +cell People, including fictional.

    +row
        +cell #[code NORP]
        +cell Nationalities or religious or political groups.

    +row
        +cell #[code FACILITY]
        +cell Buildings, airports, highways, bridges, etc.

    +row
        +cell #[code ORG]
        +cell Companies, agencies, institutions, etc.

    +row
        +cell #[code GPE]
        +cell Countries, cities, states.

    +row
        +cell #[code LOC]
        +cell Non-GPE locations, mountain ranges, bodies of water.

    +row
        +cell #[code PRODUCT]
        +cell Objects, vehicles, foods, etc. (Not services.)

    +row
        +cell #[code EVENT]
        +cell Named hurricanes, battles, wars, sports events, etc.

    +row
        +cell #[code WORK_OF_ART]
        +cell Titles of books, songs, etc.

    +row
        +cell #[code LAW]
        +cell Named documents made into laws.

    +row
        +cell #[code LANGUAGE]
        +cell Any named language.

    +row
        +cell #[code DATE]
        +cell Absolute or relative dates or periods.

    +row
        +cell #[code TIME]
        +cell Times smaller than a day.

    +row
        +cell #[code PERCENT]
        +cell Percentage, including "%".

    +row
        +cell #[code MONEY]
        +cell Monetary values, including unit.

    +row
        +cell #[code QUANTITY]
        +cell Measurements, as of weight or distance.

    +row
        +cell #[code ORDINAL]
        +cell "first", "second", etc.

    +row
        +cell #[code CARDINAL]
        +cell Numerals that do not fall under another type.

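p
    | As a rough sketch of how these labels might be consumed downstream, the
    | snippet below groups entities by type. It assumes the entities have
    | already been extracted as (text, label) pairs, for example from
    | #[code ent.text] and #[code ent.label_] on #[code doc.ents]. The
    | #[code NAMED_TYPES] / #[code VALUE_TYPES] split, the
    | #[code group_by_label] helper and the hand-written sample pairs are
    | hypothetical and not part of spaCy.

```python
from collections import defaultdict

# Labels from the table above, split into two hypothetical groups
# (this grouping is an assumption made for illustration only).
NAMED_TYPES = {
    "PERSON", "NORP", "FACILITY", "ORG", "GPE", "LOC",
    "PRODUCT", "EVENT", "WORK_OF_ART", "LAW", "LANGUAGE",
}
VALUE_TYPES = {
    "DATE", "TIME", "PERCENT", "MONEY", "QUANTITY", "ORDINAL", "CARDINAL",
}


def group_by_label(entities, keep=NAMED_TYPES):
    """Group (text, label) pairs by label, keeping only labels in `keep`."""
    grouped = defaultdict(list)
    for text, label in entities:
        if label in keep:
            grouped[label].append(text)
    return dict(grouped)


# Hand-written sample pairs standing in for model output; no model is loaded.
sample = [
    ("Angela Merkel", "PERSON"),
    ("Berlin", "GPE"),
    ("last Tuesday", "DATE"),
    ("Siemens", "ORG"),
    ("12%", "PERCENT"),
]

named = group_by_label(sample)
values = group_by_label(sample, keep=VALUE_TYPES)
print(named)
print(values)
```

p
    | Passing a different #[code keep] set selects other types, so the same
    | helper can separate numeric and temporal expressions from named
    | entities.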
+h(4, "ner-wikipedia-scheme") Wikipedia scheme

p
    | Models trained on the Wikipedia corpus
    | (#[+a("http://www.sciencedirect.com/science/article/pii/S0004370212000276") Nothman et al., 2013])
    | use a less fine-grained NER annotation scheme and recognise the
    | following entities:

+table(["Type", "Description"])
    +row
        +cell #[code PER]
        +cell Named person or family.

    +row
        +cell #[code LOC]
        +cell
            | Name of politically or geographically defined location (cities,
            | provinces, countries, international regions, bodies of water,
            | mountains).

    +row
        +cell #[code ORG]
        +cell Named corporate, governmental, or other organizational entity.

    +row
        +cell #[code MISC]
        +cell
            | Miscellaneous entities, e.g. events, nationalities, products or
            | works of art.
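
p
    | To compare output across the two schemes, one option is to collapse the
    | fine-grained OntoNotes types into the four coarse types. The mapping
    | below is only a sketch based on the descriptions in the two tables; it
    | is an assumption, not an official correspondence. In particular,
    | placing #[code FACILITY] under #[code LOC] and #[code LAW] and
    | #[code LANGUAGE] under #[code MISC] is a judgment call. The names
    | #[code FINE_TO_COARSE] and #[code to_coarse] are hypothetical.

```python
# Assumed mapping from OntoNotes 5 types to the coarse Wikipedia scheme.
# Numeric and temporal types (DATE, TIME, PERCENT, MONEY, QUANTITY,
# ORDINAL, CARDINAL) are left out because the coarse scheme lists no
# counterpart for them.
FINE_TO_COARSE = {
    "PERSON": "PER",
    "GPE": "LOC",
    "LOC": "LOC",
    "FACILITY": "LOC",  # assumption: buildings and airports treated as locations
    "ORG": "ORG",
    "NORP": "MISC",
    "PRODUCT": "MISC",
    "EVENT": "MISC",
    "WORK_OF_ART": "MISC",
    "LAW": "MISC",       # assumption
    "LANGUAGE": "MISC",  # assumption
}


def to_coarse(label):
    """Return the coarse label for a fine-grained one, or None if unmapped."""
    return FINE_TO_COARSE.get(label)


for fine in ("PERSON", "GPE", "NORP", "MONEY"):
    print(fine, "->", to_coarse(fine))
```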