Update POS/TAG tables in docs

Update POS/TAG tables for English and German docs using current
information generated from the tag_maps and GLOSSARY.
This commit is contained in:
Adriane Boyd 2019-10-22 10:32:31 +02:00
parent ea1283c1b5
commit baf512bbff

View File

@ -117,76 +117,72 @@ type. They're available as the [`Token.pos`](/api/token#attributes) and
The English part-of-speech tagger uses the
[OntoNotes 5](https://catalog.ldc.upenn.edu/LDC2013T19) version of the Penn
Treebank tag set. We also map the tags to the simpler Google Universal POS tag
set.
| Tag |  POS | Morphology | Description |
| ----------------------------------- | ------- | ---------------------------------------------- | ----------------------------------------- |
| `-LRB-` | `PUNCT` | `PunctType=brck PunctSide=ini` | left round bracket |
| `-RRB-` | `PUNCT` | `PunctType=brck PunctSide=fin` | right round bracket |
| `,` | `PUNCT` | `PunctType=comm` | punctuation mark, comma |
| `:` | `PUNCT` | | punctuation mark, colon or ellipsis |
| `.` | `PUNCT` | `PunctType=peri` | punctuation mark, sentence closer |
| `''` | `PUNCT` | `PunctType=quot PunctSide=fin` | closing quotation mark |
| `""` | `PUNCT` | `PunctType=quot PunctSide=fin` | closing quotation mark |
| <InlineCode>&#96;&#96;</InlineCode> | `PUNCT` | `PunctType=quot PunctSide=ini` | opening quotation mark |
| `#` | `SYM` | `SymType=numbersign` | symbol, number sign |
| `$` | `SYM` | `SymType=currency` | symbol, currency |
| `ADD` | `X` | | email |
| `AFX` | `ADJ` | `Hyph=yes` | affix |
| `BES` | `VERB` | | auxiliary "be" |
| `CC` | `CONJ` | `ConjType=coor` | conjunction, coordinating |
| `CD` | `NUM` | `NumType=card` | cardinal number |
| `DT` | `DET` | | determiner |
| `EX` | `ADV` | `AdvType=ex` | existential there |
| `FW` | `X` | `Foreign=yes` | foreign word |
| `GW` | `X` | | additional word in multi-word expression |
| `HVS` | `VERB` | | forms of "have" |
| `HYPH` | `PUNCT` | `PunctType=dash` | punctuation mark, hyphen |
| `IN` | `ADP` | | conjunction, subordinating or preposition |
| `JJ` | `ADJ` | `Degree=pos` | adjective |
| `JJR` | `ADJ` | `Degree=comp` | adjective, comparative |
| `JJS` | `ADJ` | `Degree=sup` | adjective, superlative |
| `LS` | `PUNCT` | `NumType=ord` | list item marker |
| `MD` | `VERB` | `VerbType=mod` | verb, modal auxiliary |
| `NFP` | `PUNCT` | | superfluous punctuation |
| `NIL` | | | missing tag |
| `NN` | `NOUN` | `Number=sing` | noun, singular or mass |
| `NNP` | `PROPN` | `NounType=prop Number=sign` | noun, proper singular |
| `NNPS` | `PROPN` | `NounType=prop Number=plur` | noun, proper plural |
| `NNS` | `NOUN` | `Number=plur` | noun, plural |
| `PDT` | `ADJ` | `AdjType=pdt PronType=prn` | predeterminer |
| `POS` | `PART` | `Poss=yes` | possessive ending |
| `PRP` | `PRON` | `PronType=prs` | pronoun, personal |
| `PRP$` | `ADJ` | `PronType=prs Poss=yes` | pronoun, possessive |
| `RB` | `ADV` | `Degree=pos` | adverb |
| `RBR` | `ADV` | `Degree=comp` | adverb, comparative |
| `RBS` | `ADV` | `Degree=sup` | adverb, superlative |
| `RP` | `PART` | | adverb, particle |
| `_SP` | `SPACE` | | space |
| `SYM` | `SYM` | | symbol |
| `TO` | `PART` | `PartType=inf VerbForm=inf` | infinitival "to" |
| `UH` | `INTJ` | | interjection |
| `VB` | `VERB` | `VerbForm=inf` | verb, base form |
| `VBD` | `VERB` | `VerbForm=fin Tense=past` | verb, past tense |
| `VBG` | `VERB` | `VerbForm=part Tense=pres Aspect=prog` | verb, gerund or present participle |
| `VBN` | `VERB` | `VerbForm=part Tense=past Aspect=perf` | verb, past participle |
| `VBP` | `VERB` | `VerbForm=fin Tense=pres` | verb, non-3rd person singular present |
| `VBZ` | `VERB` | `VerbForm=fin Tense=pres Number=sing Person=3` | verb, 3rd person singular present |
| `WDT` | `ADJ` | `PronType=int|rel` | wh-determiner |
| `WP` | `NOUN` | `PronType=int|rel` | wh-pronoun, personal |
| `WP$` | `ADJ` | `Poss=yes PronType=int|rel` | wh-pronoun, possessive |
| `WRB` | `ADV` | `PronType=int|rel` | wh-adverb |
| `XX` | `X` | | unknown |
Treebank tag set. We also map the tags to the simpler Universal Dependencies v2
POS tag set.
| Tag |  POS | Morphology | Description |
| ------------------------------------- | ------- | --------------------------------------- | ----------------------------------------- |
| `$` | `SYM` | | symbol, currency |
| <InlineCode>&#96;&#96;</InlineCode> | `PUNCT` | `PunctType=quot PunctSide=ini` | opening quotation mark |
| `''` | `PUNCT` | `PunctType=quot PunctSide=fin` | closing quotation mark |
| `,` | `PUNCT` | `PunctType=comm` | punctuation mark, comma |
| `-LRB-` | `PUNCT` | `PunctType=brck PunctSide=ini` | left round bracket |
| `-RRB-` | `PUNCT` | `PunctType=brck PunctSide=fin` | right round bracket |
| `.` | `PUNCT` | `PunctType=peri` | punctuation mark, sentence closer |
| `:` | `PUNCT` | | punctuation mark, colon or ellipsis |
| `ADD` | `X` | | email |
| `AFX` | `ADJ` | `Hyph=yes` | affix |
| `CC` | `CCONJ` | `ConjType=comp` | conjunction, coordinating |
| `CD` | `NUM` | `NumType=card` | cardinal number |
| `DT` | `DET` | | determiner |
| `EX` | `PRON` | `AdvType=ex` | existential there |
| `FW` | `X` | `Foreign=yes` | foreign word |
| `GW` | `X` | | additional word in multi-word expression |
| `HYPH` | `PUNCT` | `PunctType=dash` | punctuation mark, hyphen |
| `IN` | `ADP` | | conjunction, subordinating or preposition |
| `JJ` | `ADJ` | `Degree=pos` | adjective |
| `JJR` | `ADJ` | `Degree=comp` | adjective, comparative |
| `JJS` | `ADJ` | `Degree=sup` | adjective, superlative |
| `LS` | `X` | `NumType=ord` | list item marker |
| `MD` | `VERB` | `VerbType=mod` | verb, modal auxiliary |
| `NFP` | `PUNCT` | | superfluous punctuation |
| `NIL` | `X` | | missing tag |
| `NN` | `NOUN` | `Number=sing` | noun, singular or mass |
| `NNP` | `PROPN` | `NounType=prop Number=sing` | noun, proper singular |
| `NNPS` | `PROPN` | `NounType=prop Number=plur` | noun, proper plural |
| `NNS` | `NOUN` | `Number=plur` | noun, plural |
| `PDT` | `DET` | | predeterminer |
| `POS` | `PART` | `Poss=yes` | possessive ending |
| `PRP` | `PRON` | `PronType=prs` | pronoun, personal |
| `PRP$` | `DET` | `PronType=prs Poss=yes` | pronoun, possessive |
| `RB` | `ADV` | `Degree=pos` | adverb |
| `RBR` | `ADV` | `Degree=comp` | adverb, comparative |
| `RBS` | `ADV` | `Degree=sup` | adverb, superlative |
| `RP` | `ADP` | | adverb, particle |
| `SP` | `SPACE` | | space |
| `SYM` | `SYM` | | symbol |
| `TO` | `PART` | `PartType=inf VerbForm=inf` | infinitival "to" |
| `UH` | `INTJ` | | interjection |
| `VB` | `VERB` | `VerbForm=inf` | verb, base form |
| `VBD` | `VERB` | `VerbForm=fin Tense=past` | verb, past tense |
| `VBG` | `VERB` | `VerbForm=part Tense=pres Aspect=prog` | verb, gerund or present participle |
| `VBN` | `VERB` | `VerbForm=part Tense=past Aspect=perf` | verb, past participle |
| `VBP` | `VERB` | `VerbForm=fin Tense=pres` | verb, non-3rd person singular present |
| `VBZ` | `VERB` | `VerbForm=fin Tense=pres Number=sing Person=three` | verb, 3rd person singular present |
| `WDT` | `DET` | | wh-determiner |
| `WP` | `PRON` | | wh-pronoun, personal |
| `WP$` | `DET` | `Poss=yes` | wh-pronoun, possessive |
| `WRB` | `ADV` | | wh-adverb |
| `XX` | `X` | | unknown |
| `_SP` | `SPACE` | | |
</Accordion>
<Accordion title="German" id="pos-de">
The German part-of-speech tagger uses the
[TIGER Treebank](http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/TIGERCorpus/annotation/index.html)
annotation scheme. We also map the tags to the simpler Google Universal POS tag
set.
annotation scheme. We also map the tags to the simpler Universal Dependencies
v2 POS tag set.
| Tag |  POS | Morphology | Description |
| --------- | ------- | ---------------------------------------- | ------------------------------------------------- |
@ -194,7 +190,7 @@ set.
| `$,` | `PUNCT` | `PunctType=comm` | comma |
| `$.` | `PUNCT` | `PunctType=peri` | sentence-final punctuation mark |
| `ADJA` | `ADJ` | | adjective, attributive |
| `ADJD` | `ADJ` | `Variant=short` | adjective, adverbial or predicative |
| `ADJD` | `ADJ` | | adjective, adverbial or predicative |
| `ADV` | `ADV` | | adverb |
| `APPO` | `ADP` | `AdpType=post` | postposition |
| `APPR` | `ADP` | `AdpType=prep` | preposition; circumposition left |
@ -204,28 +200,28 @@ set.
| `CARD` | `NUM` | `NumType=card` | cardinal number |
| `FM` | `X` | `Foreign=yes` | foreign language material |
| `ITJ` | `INTJ` | | interjection |
| `KOKOM` | `CONJ` | `ConjType=comp` | comparative conjunction |
| `KON` | `CONJ` | | coordinate conjunction |
| `KOKOM` | `CCONJ` | `ConjType=comp` | comparative conjunction |
| `KON` | `CCONJ` | | coordinate conjunction |
| `KOUI` | `SCONJ` | | subordinate conjunction with "zu" and infinitive |
| `KOUS` | `SCONJ` | | subordinate conjunction with sentence |
| `NE` | `PROPN` | | proper noun |
| `NNE` | `PROPN` | | proper noun |
| `NN` | `NOUN` | | noun, singular or mass |
| `PROAV` | `ADV` | `PronType=dem` | pronominal adverb |
| `NNE` | `PROPN` | | proper noun |
| `PDAT` | `DET` | `PronType=dem` | attributive demonstrative pronoun |
| `PDS` | `PRON` | `PronType=dem` | substituting demonstrative pronoun |
| `PIAT` | `DET` | `PronType=ind\|neg\|tot` | attributive indefinite pronoun without determiner |
| `PIS` | `PRON` | `PronType=ind\|neg\|tot` | substituting indefinite pronoun |
| `PIAT` | `DET` | `PronType=ind|neg|tot` | attributive indefinite pronoun without determiner |
| `PIS` | `PRON` | `PronType=ind|neg|tot` | substituting indefinite pronoun |
| `PPER` | `PRON` | `PronType=prs` | non-reflexive personal pronoun |
| `PPOSAT` | `DET` | `Poss=yes PronType=prs` | attributive possessive pronoun |
| `PPOSS` | `PRON` | `PronType=rel` | substituting possessive pronoun |
| `PPOSS` | `PRON` | `Poss=yes PronType=prs` | substituting possessive pronoun |
| `PRELAT` | `DET` | `PronType=rel` | attributive relative pronoun |
| `PRELS` | `PRON` | `PronType=rel` | substituting relative pronoun |
| `PRF` | `PRON` | `PronType=prs Reflex=yes` | reflexive personal pronoun |
| `PROAV` | `ADV` | `PronType=dem` | pronominal adverb |
| `PTKA` | `PART` | | particle with adjective or adverb |
| `PTKANT` | `PART` | `PartType=res` | answer particle |
| `PTKNEG` | `PART` | `Negative=yes` | negative particle |
| `PTKVZ` | `PART` | `PartType=vbp` | separable verbal particle |
| `PTKNEG` | `PART` | `Polarity=neg` | negative particle |
| `PTKVZ` | `ADP` | `PartType=vbp` | separable verbal particle |
| `PTKZU` | `PART` | `PartType=inf` | "zu" before infinitive |
| `PWAT` | `DET` | `PronType=int` | attributive interrogative pronoun |
| `PWAV` | `ADV` | `PronType=int` | adverbial interrogative or relative pronoun |
@ -234,9 +230,9 @@ set.
| `VAFIN` | `AUX` | `Mood=ind VerbForm=fin` | finite verb, auxiliary |
| `VAIMP` | `AUX` | `Mood=imp VerbForm=fin` | imperative, auxiliary |
| `VAINF` | `AUX` | `VerbForm=inf` | infinitive, auxiliary |
| `VAPP` | `AUX` | `Aspect=perf VerbForm=fin` | perfect participle, auxiliary |
| `VAPP` | `AUX` | `Aspect=perf VerbForm=part` | perfect participle, auxiliary |
| `VMFIN` | `VERB` | `Mood=ind VerbForm=fin VerbType=mod` | finite verb, modal |
| `VMINF` | `VERB` | `VerbForm=fin VerbType=mod` | infinitive, modal |
| `VMINF` | `VERB` | `VerbForm=inf VerbType=mod` | infinitive, modal |
| `VMPP` | `VERB` | `Aspect=perf VerbForm=part VerbType=mod` | perfect participle, modal |
| `VVFIN` | `VERB` | `Mood=ind VerbForm=fin` | finite verb, full |
| `VVIMP` | `VERB` | `Mood=imp VerbForm=fin` | imperative, full |
@ -244,8 +240,7 @@ set.
| `VVIZU` | `VERB` | `VerbForm=inf` | infinitive with "zu", full |
| `VVPP` | `VERB` | `Aspect=perf VerbForm=part` | perfect participle, full |
| `XY` | `X` | | non-word containing non-letter |
| `SP` | `SPACE` | | space |
| `_SP` | `SPACE` | | |
</Accordion>
---