Mirror of https://github.com/explosion/spaCy.git (synced 2025-05-02 23:03:41 +03:00)
Update POS tagging workflow

parent 43258d6b0a
commit b6209e2427
@@ -7,22 +7,12 @@ p
     | assigned to each token in the document. They're useful in rule-based
     | processes. They can also be useful features in some statistical models.
 
-p
-    | To use spaCy's tagger, you need to have a data pack installed that
-    | includes a tagging model. Tagging models are included in the data
-    | downloads for English and German. After you load the model, the tagger
-    | is applied automatically, as part of the default pipeline. You can then
-    | access the tags using the #[+api("token") #[code Token.tag]] and
-    | #[+api("token") #[code token.pos]] attributes. For English, the tagger
-    | also triggers some simple rule-based morphological processing, which
-    | gives you the lemma as well.
++h(2, "101") Part-of-speech tagging 101
 
-+code("Usage").
-    import spacy
++tag-model("dependency parse")
 
-    nlp = spacy.load('en')
-    doc = nlp(u'They told us to duck.')
-    for word in doc:
-        print(word.text, word.lemma, word.lemma_, word.tag, word.tag_, word.pos, word.pos_)
+include _spacy-101/_pos-deps
+
++aside("Help – spaCy's output is wrong!")
 
 +h(2, "rule-based-morphology") Rule-based morphology
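The removed usage snippet prints both the integer and the string form of each attribute (`word.tag` vs. `word.tag_`, `word.pos` vs. `word.pos_`). A minimal pure-Python sketch of that naming convention, using a hypothetical `StringStore` stand-in rather than spaCy's real string table:

```python
# Toy illustration of spaCy's paired attribute convention: `tag` is an
# integer ID, `tag_` is the readable string.  spaCy interns each string
# once and passes integers around; `StringStore` here is a hypothetical
# stand-in for that table, not spaCy's real class.

class StringStore:
    """Maps strings to integer IDs and back."""
    def __init__(self):
        self._to_id = {}
        self._to_str = []

    def add(self, string):
        if string not in self._to_id:
            self._to_id[string] = len(self._to_str)
            self._to_str.append(string)
        return self._to_id[string]

    def __getitem__(self, i):
        return self._to_str[i]

store = StringStore()

class Token:
    def __init__(self, text, tag, pos):
        self.text = text
        self.tag = store.add(tag)   # integer ID, cheap to store and compare
        self.pos = store.add(pos)

    @property
    def tag_(self):                 # string view, resolved on demand
        return store[self.tag]

    @property
    def pos_(self):
        return store[self.pos]

duck = Token('duck', 'VB', 'VERB')
print(duck.text, duck.tag, duck.tag_, duck.pos, duck.pos_)
# -> duck 0 VB 1 VERB
```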
@@ -63,7 +53,8 @@ p
 
 +list("numbers")
     +item
-        | The tokenizer consults a #[strong mapping table]
+        | The tokenizer consults a
+        | #[+a("/docs/usage/adding-languages#tokenizer-exceptions") mapping table]
         | #[code TOKENIZER_EXCEPTIONS], which allows sequences of characters
         | to be mapped to multiple tokens. Each token may be assigned a part
         | of speech and one or more morphological features.
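The list item above describes the `TOKENIZER_EXCEPTIONS` mapping table. A toy sketch of the idea, assuming a whitespace pre-split and plain string keys (spaCy's real table keys on attribute ID constants):

```python
# Toy version of a tokenizer exceptions table: one character sequence
# maps to a list of token dicts, each of which may carry a part of
# speech and other attributes.  Data here is illustrative only.

TOKENIZER_EXCEPTIONS = {
    "don't": [
        {"orth": "do", "pos": "VERB"},
        {"orth": "n't", "pos": "ADV", "lemma": "not"},
    ],
}

def tokenize(text):
    """Whitespace pre-split, then expand any exception matches."""
    tokens = []
    for chunk in text.split():
        if chunk in TOKENIZER_EXCEPTIONS:
            # One character sequence becomes several tokens.
            tokens.extend(TOKENIZER_EXCEPTIONS[chunk])
        else:
            tokens.append({"orth": chunk})
    return tokens

print([t["orth"] for t in tokenize("I don't")])  # -> ['I', 'do', "n't"]
```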
@@ -77,8 +68,9 @@ p
 
     +item
         | For words whose POS is not set by a prior process, a
-        | #[strong mapping table] #[code TAG_MAP] maps the tags to a
-        | part-of-speech and a set of morphological features.
+        | #[+a("/docs/usage/adding-languages#tag-map") mapping table]
+        | #[code TAG_MAP] maps the tags to a part-of-speech and a set of
+        | morphological features.
 
     +item
         | Finally, a #[strong rule-based deterministic lemmatizer] maps the
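The `TAG_MAP` lookup and the rule-based deterministic lemmatizer described in these items can be sketched together; the table contents and suffix rules below are illustrative, not spaCy's actual data:

```python
# Toy TAG_MAP: fine-grained tag -> coarse part of speech plus
# morphological features.  Contents are illustrative only.
TAG_MAP = {
    "NNS": {"pos": "NOUN", "Number": "plur"},
    "VBZ": {"pos": "VERB", "Tense": "pres", "Person": 3},
}

# Toy rule-based, deterministic lemmatizer: an exception table is
# consulted first, then ordered suffix-rewrite rules.
LEMMA_EXCEPTIONS = {"was": "be"}
LEMMA_RULES = [("ies", "y"), ("s", "")]

def lemmatize(word):
    if word in LEMMA_EXCEPTIONS:
        return LEMMA_EXCEPTIONS[word]
    for suffix, replacement in LEMMA_RULES:
        if word.endswith(suffix):
            return word[:len(word) - len(suffix)] + replacement
    return word                     # no rule applied: word is its own lemma

print(TAG_MAP["NNS"]["pos"], lemmatize("ponies"), lemmatize("was"))
# -> NOUN pony be
```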