From e39d1bd4ab04694bf30bf1d978989e7026ac37b9 Mon Sep 17 00:00:00 2001
From: Adriane Boyd
Date: Mon, 21 Jun 2021 09:33:50 +0200
Subject: [PATCH] Various docs updates for v3.1 (#8406)

* Update for Catalan/Italian lemmatizer changes

* Add warning about relevance of section
---
 website/docs/api/lemmatizer.md |  2 ++
 website/docs/models/index.md   | 19 +++++++++++++------
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/website/docs/api/lemmatizer.md b/website/docs/api/lemmatizer.md
index a19c1185e..995f890cd 100644
--- a/website/docs/api/lemmatizer.md
+++ b/website/docs/api/lemmatizer.md
@@ -64,11 +64,13 @@ libraries (`pymorphy2`).
 | Language | Default Mode |
 | -------- | ------------ |
 | `bn`     | `rule`       |
+| `ca`     | `pos_lookup` |
 | `el`     | `rule`       |
 | `en`     | `rule`       |
 | `es`     | `rule`       |
 | `fa`     | `rule`       |
 | `fr`     | `rule`       |
+| `it`     | `pos_lookup` |
 | `mk`     | `rule`       |
 | `nb`     | `rule`       |
 | `nl`     | `rule`       |
diff --git a/website/docs/models/index.md b/website/docs/models/index.md
index 9bffa5b21..92d1b0172 100644
--- a/website/docs/models/index.md
+++ b/website/docs/models/index.md
@@ -97,9 +97,10 @@ In the `sm`/`md`/`lg` models:
   tagger. For English, the attribute ruler can improve its mapping from
   `token.tag` to `token.pos` if dependency parses from a `parser` are present,
   but the parser is not required.
-- The `lemmatizer` component for many languages (Dutch, English, French, Greek,
-  Macedonian, Norwegian, Polish and Spanish) requires `token.pos` annotation
-  from either `tagger`+`attribute_ruler` or `morphologizer`.
+- The `lemmatizer` component for many languages (Catalan, Dutch, English,
+  French, Greek, Italian, Macedonian, Norwegian, Polish and Spanish) requires
+  `token.pos` annotation from either `tagger`+`attribute_ruler` or
+  `morphologizer`.
 - The `ner` component is independent with its own internal tok2vec layer.
 
 ### Transformer pipeline design {#design-trf}
@@ -133,9 +134,9 @@ nlp = spacy.load("en_core_web_trf", disable=["tagger", "attribute_ruler", "lemma
 Token.pos">
 
 The lemmatizer depends on `tagger`+`attribute_ruler` or `morphologizer` for
-Dutch, English, French, Greek, Macedonian, Norwegian, Polish and Spanish. If you
-disable any of these components, you'll see lemmatizer warnings unless the
-lemmatizer is also disabled.
+Catalan, Dutch, English, French, Greek, Italian, Macedonian, Norwegian, Polish
+and Spanish. If you disable any of these components, you'll see lemmatizer
+warnings unless the lemmatizer is also disabled.
 

 
@@ -184,6 +185,12 @@ nlp = spacy.load("en_core_web_trf", disable=["tagger", "parser", "attribute_rule
 
 #### Move NER to the end of the pipeline
 
+
+
+As of v3.1, the NER component is at the end of the pipeline by default.
+
+
+
 For access to `POS` and `LEMMA` features in an `entity_ruler`, move `ner` to the
 end of the pipeline after `attribute_ruler` and `lemmatizer`: