mirror of
https://github.com/explosion/spaCy.git
synced 2024-12-25 17:36:30 +03:00
Various docs updates for v3.1 (#8406)
* Update for Catalan/Italian lemmatizer changes
* Add warning about relevance of section
parent 7abfa25035
commit e39d1bd4ab
@@ -64,11 +64,13 @@ libraries (`pymorphy2`).
 | Language | Default Mode |
 | -------- | ------------ |
 | `bn` | `rule` |
+| `ca` | `pos_lookup` |
 | `el` | `rule` |
 | `en` | `rule` |
 | `es` | `rule` |
 | `fa` | `rule` |
 | `fr` | `rule` |
+| `it` | `pos_lookup` |
 | `mk` | `rule` |
 | `nb` | `rule` |
 | `nl` | `rule` |
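
The rows in this hunk can be read as a lookup from language code to default lemmatizer mode. The sketch below is plain Python, not spaCy API: `DEFAULT_MODES`, `POS_DEPENDENT_MODES` and `needs_pos` are hypothetical names, the mapping covers only the rows visible in this hunk, and it assumes that both the `rule` and `pos_lookup` modes read `token.pos` and that unlisted languages fall back to `lookup`.

```python
# Default lemmatizer modes copied from the table rows visible in this hunk.
DEFAULT_MODES = {
    "bn": "rule", "ca": "pos_lookup", "el": "rule", "en": "rule",
    "es": "rule", "fa": "rule", "fr": "rule", "it": "pos_lookup",
    "mk": "rule", "nb": "rule", "nl": "rule",
}

# Assumption: both of these modes rely on coarse-grained POS tags (token.pos).
POS_DEPENDENT_MODES = {"rule", "pos_lookup"}


def needs_pos(lang: str) -> bool:
    """Return True if the language's default lemmatizer mode relies on token.pos."""
    # Assumption: languages missing from the table default to plain "lookup".
    mode = DEFAULT_MODES.get(lang, "lookup")
    return mode in POS_DEPENDENT_MODES


# List the language codes from this hunk whose default mode needs POS tags.
print(sorted(lang for lang in DEFAULT_MODES if needs_pos(lang)))
```

Under these assumptions, the two rows added by this commit (`ca` and `it`) join the set of languages whose lemmatizer needs POS annotation upstream.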
@@ -97,9 +97,10 @@ In the `sm`/`md`/`lg` models:
 tagger. For English, the attribute ruler can improve its mapping from
 `token.tag` to `token.pos` if dependency parses from a `parser` are present,
 but the parser is not required.
-- The `lemmatizer` component for many languages (Dutch, English, French, Greek,
-Macedonian, Norwegian, Polish and Spanish) requires `token.pos` annotation
-from either `tagger`+`attribute_ruler` or `morphologizer`.
+- The `lemmatizer` component for many languages (Catalan, Dutch, English,
+French, Greek, Italian, Macedonian, Norwegian, Polish and Spanish) requires
+`token.pos` annotation from either `tagger`+`attribute_ruler` or
+`morphologizer`.
 - The `ner` component is independent with its own internal tok2vec layer.
 
 ### Transformer pipeline design {#design-trf}
@@ -133,9 +134,9 @@ nlp = spacy.load("en_core_web_trf", disable=["tagger", "attribute_ruler", "lemma
 Token.pos">
 
 The lemmatizer depends on `tagger`+`attribute_ruler` or `morphologizer` for
-Dutch, English, French, Greek, Macedonian, Norwegian, Polish and Spanish. If you
-disable any of these components, you'll see lemmatizer warnings unless the
-lemmatizer is also disabled.
+Catalan, Dutch, English, French, Greek, Italian, Macedonian, Norwegian, Polish
+and Spanish. If you disable any of these components, you'll see lemmatizer
+warnings unless the lemmatizer is also disabled.
 
 </Infobox>
 
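
The rule stated in this infobox (disable the lemmatizer whenever one of its POS providers is disabled) can be sketched as a small helper that builds the `disable` list. `POS_PROVIDERS` and `with_lemmatizer_disabled` are hypothetical names, not part of spaCy, and the helper is deliberately conservative: it drops the lemmatizer if any one of the three components is disabled.

```python
# Components that can supply token.pos to the lemmatizer, per the infobox text.
POS_PROVIDERS = {"tagger", "attribute_ruler", "morphologizer"}


def with_lemmatizer_disabled(disable):
    """Return a copy of `disable` that also lists the lemmatizer
    whenever one of the POS-providing components is disabled."""
    result = list(disable)
    if POS_PROVIDERS & set(result) and "lemmatizer" not in result:
        result.append("lemmatizer")
    return result


disable = with_lemmatizer_disabled(["tagger", "parser"])
print(disable)  # ['tagger', 'parser', 'lemmatizer']
```

With spaCy and a pipeline package installed, the resulting list can be passed as `spacy.load("en_core_web_sm", disable=disable)`, which should avoid the lemmatizer warnings described above.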
@@ -184,6 +185,12 @@ nlp = spacy.load("en_core_web_trf", disable=["tagger", "parser", "attribute_rule
 
 #### Move NER to the end of the pipeline
 
+<Infobox title="For v3.0.x models only" variant="warning">
+
+As of v3.1, the NER component is at the end of the pipeline by default.
+
+</Infobox>
+
 For access to `POS` and `LEMMA` features in an `entity_ruler`, move `ner` to the
 end of the pipeline after `attribute_ruler` and `lemmatizer`:
 
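
The diff context ends before the code that follows this paragraph in the docs, so here is a minimal sketch of one way to do the move for a v3.0.x pipeline. The wrapper `load_with_ner_last` and its default package name `en_core_web_sm` are assumptions made for illustration. The calls inside it (`spacy.load` with `exclude`, `nlp.add_pipe` with `source` and `before`) are standard spaCy v3 API, and running it requires spaCy plus the named pipeline package.

```python
def load_with_ner_last(name: str = "en_core_web_sm"):
    """Load a pipeline with `ner` re-added as the last component and an
    `entity_ruler` inserted directly before it."""
    import spacy  # imported lazily so defining the helper does not need spaCy

    # Load the pipeline without its NER component.
    nlp = spacy.load(name, exclude=["ner"])
    # Source the trained NER from the same package; add_pipe appends it last.
    nlp.add_pipe("ner", source=spacy.load(name))
    # Insert the entity ruler before NER, after attribute_ruler and lemmatizer.
    nlp.add_pipe("entity_ruler", before="ner")
    return nlp
```

Calling `load_with_ner_last()` and inspecting `nlp.pipe_names` should show `entity_ruler` and `ner` as the final two components. For v3.1 and later pipelines this step should not be needed, since NER is already last by default.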