mirror of
https://github.com/explosion/spaCy.git
synced 2024-12-25 17:36:30 +03:00
Update POS tagging workflow
parent 43258d6b0a
commit b6209e2427
@@ -7,22 +7,12 @@ p
     | assigned to each token in the document. They're useful in rule-based
     | processes. They can also be useful features in some statistical models.
 
-p
-    | To use spaCy's tagger, you need to have a data pack installed that
-    | includes a tagging model. Tagging models are included in the data
-    | downloads for English and German. After you load the model, the tagger
-    | is applied automatically, as part of the default pipeline. You can then
-    | access the tags using the #[+api("token") #[code Token.tag]] and
-    | #[+api("token") #[code token.pos]] attributes. For English, the tagger
-    | also triggers some simple rule-based morphological processing, which
-    | gives you the lemma as well.
+h(2, "101") Part-of-speech tagging 101
+    +tag-model("dependency parse")
 
-+code("Usage").
-    import spacy
-    nlp = spacy.load('en')
-    doc = nlp(u'They told us to duck.')
-    for word in doc:
-        print(word.text, word.lemma, word.lemma_, word.tag, word.tag_, word.pos, word.pos_)
+include _spacy-101/_pos-deps
+
++aside("Help – spaCy's output is wrong!")
 
 +h(2, "rule-based-morphology") Rule-based morphology
 
@@ -63,7 +53,8 @@ p
 
 +list("numbers")
     +item
-        | The tokenizer consults a #[strong mapping table]
+        | The tokenizer consults a
+        | #[+a("/docs/usage/adding-languages#tokenizer-exceptions") mapping table]
         | #[code TOKENIZER_EXCEPTIONS], which allows sequences of characters
         | to be mapped to multiple tokens. Each token may be assigned a part
         | of speech and one or more morphological features.
@@ -77,8 +68,9 @@ p
 
     +item
         | For words whose POS is not set by a prior process, a
-        | #[strong mapping table] #[code TAG_MAP] maps the tags to a
-        | part-of-speech and a set of morphological features.
+        | #[+a("/docs/usage/adding-languages#tag-map") mapping table]
+        | #[code TAG_MAP] maps the tags to a part-of-speech and a set of
+        | morphological features.
 
     +item
         | Finally, a #[strong rule-based deterministic lemmatizer] maps the
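
The tag-map step described in the last hunk can be illustrated without loading spaCy. The following is a minimal sketch under stated assumptions, not spaCy's actual implementation: the dictionary contents, the lowercase `"pos"` key, the fallback tag `"X"` and the helper `apply_tag_map` are all hypothetical and only mirror the idea that a fine-grained tag is mapped to a coarse part-of-speech plus a set of morphological features, unless a prior process has already set the POS.

```python
# Hypothetical, simplified tag map: fine-grained tag -> coarse POS + features.
# The real TAG_MAP in spaCy uses its own symbols and a much larger tag set.
TAG_MAP = {
    "PRP": {"pos": "PRON", "PronType": "prs"},
    "VBD": {"pos": "VERB", "VerbForm": "fin", "Tense": "past"},
    "TO": {"pos": "PART", "PartType": "inf"},
    "VB": {"pos": "VERB", "VerbForm": "inf"},
    ".": {"pos": "PUNCT", "PunctType": "peri"},
}


def apply_tag_map(tagged_tokens, tag_map, preset=None):
    """Map (text, tag) pairs to (text, tag, pos, features) tuples.

    `preset` is a hypothetical dict of token index -> entry for tokens whose
    POS was already set by a prior process (e.g. a tokenizer exception).
    Those tokens skip the tag map lookup.
    """
    preset = preset or {}
    analysed = []
    for i, (text, tag) in enumerate(tagged_tokens):
        # Prefer an analysis set earlier; otherwise consult the tag map,
        # falling back to an "unknown" POS for tags the map does not cover.
        entry = preset.get(i) or tag_map.get(tag, {"pos": "X"})
        features = {k: v for k, v in entry.items() if k != "pos"}
        analysed.append((text, tag, entry["pos"], features))
    return analysed


tokens = [("They", "PRP"), ("told", "VBD"), ("us", "PRP"),
          ("to", "TO"), ("duck", "VB"), (".", ".")]
result = apply_tag_map(tokens, TAG_MAP)
for text, tag, pos, features in result:
    print(text, tag, pos, features)
```

Keeping the mapping in a plain table, as the documentation describes, means the coarse tags and features can be changed per language without retraining the tagger.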