Update POS tagging workflow

2025-12-23 10:03:15 +03:00 · 2017-05-23 23:18:08 +02:00 · 2017-05-23 23:18:08 +02:00 · b6209e2427
commit b6209e2427
parent 43258d6b0a
1 changed file with 10 additions and 18 deletions
--- a/website/docs/usage/pos-tagging.jade
+++ b/website/docs/usage/pos-tagging.jade
@@ -7,22 +7,12 @@ p
    |  assigned to each token in the document. They're useful in rule-based
    |  processes. They can also be useful features in some statistical models.

-p
-    |  To use spaCy's tagger, you need to have a data pack installed that
-    |  includes a tagging model. Tagging models are included in the data
-    |  downloads for English and German. After you load the model, the tagger
-    |  is applied automatically, as part of the default pipeline. You can then
-    |  access the tags using the #[+api("token") #[code Token.tag]] and
-    |  #[+api("token") #[code token.pos]] attributes. For English, the tagger
-    |  also triggers some simple rule-based morphological processing, which
-    |  gives you the lemma as well.
++h(2, "101") Part-of-speech tagging 101
+    +tag-model("dependency parse")

-+code("Usage").
-    import spacy
-    nlp = spacy.load('en')
-    doc = nlp(u'They told us to duck.')
-    for word in doc:
-        print(word.text, word.lemma, word.lemma_, word.tag, word.tag_, word.pos, word.pos_)
+include _spacy-101/_pos-deps
+
++aside("Help – spaCy's output is wrong!")

 +h(2, "rule-based-morphology") Rule-based morphology

@@ -63,7 +53,8 @@ p

 +list("numbers")
    +item
-        |  The tokenizer consults a #[strong mapping table]
+        |  The tokenizer consults a
+        |  #[+a("/docs/usage/adding-languages#tokenizer-exceptions") mapping table]
        |  #[code TOKENIZER_EXCEPTIONS], which allows sequences of characters
        |  to be mapped to multiple tokens. Each token may be assigned a part
        |  of speech and one or more morphological features.
@@ -77,8 +68,9 @@ p

    +item
        |  For words whose POS is not set by a prior process, a
-        |  #[strong mapping table] #[code TAG_MAP] maps the tags to a
-        |  part-of-speech and a set of morphological features.
+        |  #[+a("/docs/usage/adding-languages#tag-map") mapping table]
+        |  #[code TAG_MAP] maps the tags to a part-of-speech and a set of
+        |  morphological features.

    +item
        |  Finally, a #[strong rule-based deterministic lemmatizer] maps the