Update TIGER link and tag description (#6344)

Adriane Boyd 2020-11-05 09:33:00 +01:00
parent 58a7461cff
commit e4c3d6748c

@@ -51,11 +51,11 @@ rely on simple lookup tables.
<Infobox title="About spaCy's custom pronoun lemma for English" variant="warning"> <Infobox title="About spaCy's custom pronoun lemma for English" variant="warning">
spaCy adds a **special case for English pronouns**: all English pronouns are spaCy adds a **special case for English pronouns**: all English pronouns are
lemmatized to the special token `-PRON-`. Unlike verbs and common nouns, lemmatized to the special token `-PRON-`. Unlike verbs and common nouns, there's
there's no clear base form of a personal pronoun. Should the lemma of "me" be no clear base form of a personal pronoun. Should the lemma of "me" be "I", or
"I", or should we normalize person as well, giving "it" — or maybe "he"? should we normalize person as well, giving "it" — or maybe "he"? spaCy's
spaCy's solution is to introduce a novel symbol, `-PRON-`, which is used as the solution is to introduce a novel symbol, `-PRON-`, which is used as the lemma
lemma for all personal pronouns. for all personal pronouns.
</Infobox> </Infobox>
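To make the special case in the infobox concrete, here is a minimal sketch of a lookup-based lemmatizer that returns `-PRON-` for personal pronouns. It is not spaCy's implementation: the function name `lemmatize`, the toy `LOOKUP` table, and the short `PERSONAL_PRONOUNS` list are all hypothetical and exist only for illustration.

```python
# Illustrative sketch only; names and table contents are assumptions,
# not spaCy's actual lemmatizer or data.
PRON_LEMMA = "-PRON-"

# Toy lookup table standing in for a real lemma lookup table.
LOOKUP = {"ducks": "duck", "was": "be", "mice": "mouse"}

# A small, incomplete sample of English personal pronouns.
PERSONAL_PRONOUNS = {
    "i", "me", "you", "he", "him", "she", "her",
    "it", "we", "us", "they", "them",
}


def lemmatize(word: str) -> str:
    """Return the lemma for `word`, using -PRON- for personal pronouns."""
    lower = word.lower()
    if lower in PERSONAL_PRONOUNS:
        # No clear base form exists, so every pronoun shares one symbol.
        return PRON_LEMMA
    # Fall back to the lookup table, then to the word itself.
    return LOOKUP.get(lower, word)
```

With this sketch, "me", "I", and "he" all receive the same lemma, which sidesteps the question of whether case or person should be normalized.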
@@ -120,69 +120,70 @@ The English part-of-speech tagger uses the
Treebank tag set. We also map the tags to the simpler Universal Dependencies v2 Treebank tag set. We also map the tags to the simpler Universal Dependencies v2
POS tag set. POS tag set.
| Tag |  POS | Morphology | Description | | Tag |  POS | Morphology | Description |
| ------------------------------------- | ------- | --------------------------------------- | ----------------------------------------- | | ----------------------------------- | ------- | -------------------------------------------------- | ----------------------------------------- |
| `$` | `SYM` | | symbol, currency | | `$` | `SYM` | | symbol, currency |
| <InlineCode>&#96;&#96;</InlineCode> | `PUNCT` | `PunctType=quot PunctSide=ini` | opening quotation mark | | <InlineCode>&#96;&#96;</InlineCode> | `PUNCT` | `PunctType=quot PunctSide=ini` | opening quotation mark |
| `''` | `PUNCT` | `PunctType=quot PunctSide=fin` | closing quotation mark | | `''` | `PUNCT` | `PunctType=quot PunctSide=fin` | closing quotation mark |
| `,` | `PUNCT` | `PunctType=comm` | punctuation mark, comma | | `,` | `PUNCT` | `PunctType=comm` | punctuation mark, comma |
| `-LRB-` | `PUNCT` | `PunctType=brck PunctSide=ini` | left round bracket | | `-LRB-` | `PUNCT` | `PunctType=brck PunctSide=ini` | left round bracket |
| `-RRB-` | `PUNCT` | `PunctType=brck PunctSide=fin` | right round bracket | | `-RRB-` | `PUNCT` | `PunctType=brck PunctSide=fin` | right round bracket |
| `.` | `PUNCT` | `PunctType=peri` | punctuation mark, sentence closer | | `.` | `PUNCT` | `PunctType=peri` | punctuation mark, sentence closer |
| `:` | `PUNCT` | | punctuation mark, colon or ellipsis | | `:` | `PUNCT` | | punctuation mark, colon or ellipsis |
| `ADD` | `X` | | email | | `ADD` | `X` | | email |
| `AFX` | `ADJ` | `Hyph=yes` | affix | | `AFX` | `ADJ` | `Hyph=yes` | affix |
| `CC` | `CCONJ` | `ConjType=comp` | conjunction, coordinating | | `CC` | `CCONJ` | `ConjType=comp` | conjunction, coordinating |
| `CD` | `NUM` | `NumType=card` | cardinal number | | `CD` | `NUM` | `NumType=card` | cardinal number |
| `DT` | `DET` | | determiner | | `DT` | `DET` | | determiner |
| `EX` | `PRON` | `AdvType=ex` | existential there | | `EX` | `PRON` | `AdvType=ex` | existential there |
| `FW` | `X` | `Foreign=yes` | foreign word | | `FW` | `X` | `Foreign=yes` | foreign word |
| `GW` | `X` | | additional word in multi-word expression | | `GW` | `X` | | additional word in multi-word expression |
| `HYPH` | `PUNCT` | `PunctType=dash` | punctuation mark, hyphen | | `HYPH` | `PUNCT` | `PunctType=dash` | punctuation mark, hyphen |
| `IN` | `ADP` | | conjunction, subordinating or preposition | | `IN` | `ADP` | | conjunction, subordinating or preposition |
| `JJ` | `ADJ` | `Degree=pos` | adjective | | `JJ` | `ADJ` | `Degree=pos` | adjective |
| `JJR` | `ADJ` | `Degree=comp` | adjective, comparative | | `JJR` | `ADJ` | `Degree=comp` | adjective, comparative |
| `JJS` | `ADJ` | `Degree=sup` | adjective, superlative | | `JJS` | `ADJ` | `Degree=sup` | adjective, superlative |
| `LS` | `X` | `NumType=ord` | list item marker | | `LS` | `X` | `NumType=ord` | list item marker |
| `MD` | `VERB` | `VerbType=mod` | verb, modal auxiliary | | `MD` | `VERB` | `VerbType=mod` | verb, modal auxiliary |
| `NFP` | `PUNCT` | | superfluous punctuation | | `NFP` | `PUNCT` | | superfluous punctuation |
| `NIL` | `X` | | missing tag | | `NIL` | `X` | | missing tag |
| `NN` | `NOUN` | `Number=sing` | noun, singular or mass | | `NN` | `NOUN` | `Number=sing` | noun, singular or mass |
| `NNP` | `PROPN` | `NounType=prop Number=sing` | noun, proper singular | | `NNP` | `PROPN` | `NounType=prop Number=sing` | noun, proper singular |
| `NNPS` | `PROPN` | `NounType=prop Number=plur` | noun, proper plural | | `NNPS` | `PROPN` | `NounType=prop Number=plur` | noun, proper plural |
| `NNS` | `NOUN` | `Number=plur` | noun, plural | | `NNS` | `NOUN` | `Number=plur` | noun, plural |
| `PDT` | `DET` | | predeterminer | | `PDT` | `DET` | | predeterminer |
| `POS` | `PART` | `Poss=yes` | possessive ending | | `POS` | `PART` | `Poss=yes` | possessive ending |
| `PRP` | `PRON` | `PronType=prs` | pronoun, personal | | `PRP` | `PRON` | `PronType=prs` | pronoun, personal |
| `PRP$` | `DET` | `PronType=prs Poss=yes` | pronoun, possessive | | `PRP$` | `DET` | `PronType=prs Poss=yes` | pronoun, possessive |
| `RB` | `ADV` | `Degree=pos` | adverb | | `RB` | `ADV` | `Degree=pos` | adverb |
| `RBR` | `ADV` | `Degree=comp` | adverb, comparative | | `RBR` | `ADV` | `Degree=comp` | adverb, comparative |
| `RBS` | `ADV` | `Degree=sup` | adverb, superlative | | `RBS` | `ADV` | `Degree=sup` | adverb, superlative |
| `RP` | `ADP` | | adverb, particle | | `RP` | `ADP` | | adverb, particle |
| `SP` | `SPACE` | | space | | `SP` | `SPACE` | | space |
| `SYM` | `SYM` | | symbol | | `SYM` | `SYM` | | symbol |
| `TO` | `PART` | `PartType=inf VerbForm=inf` | infinitival "to" | | `TO` | `PART` | `PartType=inf VerbForm=inf` | infinitival "to" |
| `UH` | `INTJ` | | interjection | | `UH` | `INTJ` | | interjection |
| `VB` | `VERB` | `VerbForm=inf` | verb, base form | | `VB` | `VERB` | `VerbForm=inf` | verb, base form |
| `VBD` | `VERB` | `VerbForm=fin Tense=past` | verb, past tense | | `VBD` | `VERB` | `VerbForm=fin Tense=past` | verb, past tense |
| `VBG` | `VERB` | `VerbForm=part Tense=pres Aspect=prog` | verb, gerund or present participle | | `VBG` | `VERB` | `VerbForm=part Tense=pres Aspect=prog` | verb, gerund or present participle |
| `VBN` | `VERB` | `VerbForm=part Tense=past Aspect=perf` | verb, past participle | | `VBN` | `VERB` | `VerbForm=part Tense=past Aspect=perf` | verb, past participle |
| `VBP` | `VERB` | `VerbForm=fin Tense=pres` | verb, non-3rd person singular present | | `VBP` | `VERB` | `VerbForm=fin Tense=pres` | verb, non-3rd person singular present |
| `VBZ` | `VERB` | `VerbForm=fin Tense=pres Number=sing Person=three` | verb, 3rd person singular present | | `VBZ` | `VERB` | `VerbForm=fin Tense=pres Number=sing Person=three` | verb, 3rd person singular present |
| `WDT` | `DET` | | wh-determiner | | `WDT` | `DET` | | wh-determiner |
| `WP` | `PRON` | | wh-pronoun, personal | | `WP` | `PRON` | | wh-pronoun, personal |
| `WP$` | `DET` | `Poss=yes` | wh-pronoun, possessive | | `WP$` | `DET` | `Poss=yes` | wh-pronoun, possessive |
| `WRB` | `ADV` | | wh-adverb | | `WRB` | `ADV` | | wh-adverb |
| `XX` | `X` | | unknown | | `XX` | `X` | | unknown |
| `_SP` | `SPACE` | | | | `_SP` | `SPACE` | | |
</Accordion> </Accordion>
<Accordion title="German" id="pos-de"> <Accordion title="German" id="pos-de">
The German part-of-speech tagger uses the The German part-of-speech tagger uses the
[TIGER Treebank](http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/index.html) [TIGER Treebank](https://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger/)
annotation scheme. We also map the tags to the simpler Universal Dependencies annotation scheme. We also map the tags to the simpler Universal Dependencies v2
v2 POS tag set. POS tag set.
| Tag |  POS | Morphology | Description | | Tag |  POS | Morphology | Description |
| --------- | ------- | ---------------------------------------- | ------------------------------------------------- | | --------- | ------- | ---------------------------------------- | ------------------------------------------------- |
@@ -211,7 +212,7 @@ v2 POS tag set.
| `PDS` | `PRON` | `PronType=dem` | substituting demonstrative pronoun | | `PDS` | `PRON` | `PronType=dem` | substituting demonstrative pronoun |
| `PIAT` | `DET` | `PronType=ind|neg|tot` | attributive indefinite pronoun without determiner | | `PIAT` | `DET` | `PronType=ind|neg|tot` | attributive indefinite pronoun without determiner |
| `PIS` | `PRON` | `PronType=ind|neg|tot` | substituting indefinite pronoun | | `PIS` | `PRON` | `PronType=ind|neg|tot` | substituting indefinite pronoun |
| `PPER` | `PRON` | `PronType=prs` | non-reflexive personal pronoun | | `PPER` | `PRON` | `PronType=prs` | replaceable personal pronoun |
| `PPOSAT` | `DET` | `Poss=yes PronType=prs` | attributive possessive pronoun | | `PPOSAT` | `DET` | `Poss=yes PronType=prs` | attributive possessive pronoun |
| `PPOSS` | `PRON` | `Poss=yes PronType=prs` | substituting possessive pronoun | | `PPOSS` | `PRON` | `Poss=yes PronType=prs` | substituting possessive pronoun |
| `PRELAT` | `DET` | `PronType=rel` | attributive relative pronoun | | `PRELAT` | `DET` | `PronType=rel` | attributive relative pronoun |
@@ -241,6 +242,7 @@ v2 POS tag set.
| `VVPP` | `VERB` | `Aspect=perf VerbForm=part` | perfect participle, full | | `VVPP` | `VERB` | `Aspect=perf VerbForm=part` | perfect participle, full |
| `XY` | `X` | | non-word containing non-letter | | `XY` | `X` | | non-word containing non-letter |
| `_SP` | `SPACE` | | | | `_SP` | `SPACE` | | |
</Accordion> </Accordion>
--- ---
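The tag tables in this diff pair each fine-grained tag with a coarse Universal Dependencies POS tag and a morphology string. As a rough illustration of how such a table could be consumed, the sketch below copies a handful of rows into a plain dictionary. The names `TAG_MAPS`, `coarse_pos`, and `parse_morphology` are hypothetical, and the fallback to `X` for unlisted tags is an assumption of this sketch, not documented spaCy behavior.

```python
from typing import Dict, Tuple

# A few rows copied from the tables: tag -> (POS, morphology string).
# This is a sample for illustration, not the full tag map.
TAG_MAPS: Dict[str, Dict[str, Tuple[str, str]]] = {
    "en": {
        "NN": ("NOUN", "Number=sing"),
        "NNS": ("NOUN", "Number=plur"),
        "PRP": ("PRON", "PronType=prs"),
        "MD": ("VERB", "VerbType=mod"),
        "VBZ": ("VERB", "VerbForm=fin Tense=pres Number=sing Person=three"),
        "DT": ("DET", ""),
    },
    "de": {
        "PPER": ("PRON", "PronType=prs"),
        "PDS": ("PRON", "PronType=dem"),
        "XY": ("X", ""),
    },
}


def coarse_pos(lang: str, tag: str) -> str:
    """Look up the coarse POS for a fine-grained tag.

    Unlisted tags fall back to "X" (an assumption made for this sketch).
    """
    return TAG_MAPS.get(lang, {}).get(tag, ("X", ""))[0]


def parse_morphology(features: str) -> Dict[str, str]:
    """Split a string like 'Number=sing Person=three' into a dict."""
    return dict(pair.split("=", 1) for pair in features.split())
```

For example, `coarse_pos("de", "PPER")` looks up the row whose description this commit changes, and `parse_morphology` turns the space-separated `Feature=value` pairs in the Morphology column into a dictionary.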