Notes on source with vectors

Adriane Boyd 2021-06-24 10:34:07 +02:00
parent 35425d7e26
commit 92dc6b409e

+ "spacy_version": ">=3.0.0,<3.2.0",
```
<!-- TODO: vectors initialization and anything else we want to mention -->
### Sourcing pipeline components with vectors {#source-vectors}
If you're sourcing a pipeline component that requires static vectors (for
example, a tagger or parser from an `md` or `lg` pretrained pipeline), be sure
to include the source pipeline's vectors in the setting `[initialize.vectors]`.
In spaCy v3.0, a bug allowed vectors to be loaded implicitly through `source`;
however, in v3.1 this setting must be provided explicitly:
```ini
### config.cfg (excerpt)
[components.ner]
source = "en_core_web_md"

[initialize]
vectors = "en_core_web_md"
```
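The requirement above can be checked before training. What follows is a minimal sketch and is not part of spaCy. The helpers `find_unmatched_sources` and `_parse_value` are hypothetical. The config is read with Python's standard `configparser`, so spaCy's richer config features such as `${...}` interpolation are not resolved. The sketch also assumes that every sourced component depends on its source pipeline's vectors.

```python
import configparser
import json
from typing import Any, List


def _parse_value(raw: str) -> Any:
    """Decode a JSON-style config value, falling back to the raw string."""
    try:
        return json.loads(raw)
    except ValueError:
        # e.g. an unresolved reference such as ${paths.vectors}
        return raw


def find_unmatched_sources(config_text: str) -> List[str]:
    """Hypothetical check (not part of spaCy): list sourced components whose
    source pipeline is not the one named in [initialize] vectors."""
    # Disable interpolation so "${...}" references are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    # A missing [initialize] vectors setting is treated like vectors = null.
    init_vectors = None
    if parser.has_option("initialize", "vectors"):
        init_vectors = _parse_value(parser.get("initialize", "vectors"))

    unmatched = []
    for section in parser.sections():
        if not section.startswith("components."):
            continue
        name = section[len("components."):]
        # Skip nested blocks such as [components.ner.model] and
        # components that are not sourced.
        if "." in name or not parser.has_option(section, "source"):
            continue
        # Assumption: every sourced component needs its source's vectors.
        if _parse_value(parser.get(section, "source")) != init_vectors:
            unmatched.append(name)
    return unmatched


config = """
[components.ner]
source = "en_core_web_md"

[initialize]
vectors = "en_core_web_md"
"""
print(find_unmatched_sources(config))  # []
```

Dropping the `[initialize]` block from the example config would make the sketch report `["ner"]`. This is the case that, per the text above, loaded vectors implicitly in v3.0 but no longer does in v3.1.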
<Infobox title="Important note" variant="warning">
Each pipeline can only store one set of static vectors, so it's not possible to
assemble a pipeline with components that were trained on different static
vectors.
</Infobox>
[`spacy train`](/api/cli#train) and [`spacy assemble`](/api/cli#assemble) will
warn if the source and target pipelines don't contain the same vectors. If you
are sourcing a rule-based component such as an entity ruler or lemmatizer,
which does not use vectors as a model feature, you can safely ignore this
warning.
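The comparison behind this warning can be illustrated with a small sketch. It is not spaCy's implementation. Here, vectors tables are modeled as plain dicts mapping words to vectors, and each table is reduced to a SHA-256 fingerprint of a deterministic JSON serialization. The set of rule-based factory names (for example `entity_ruler` and `lemmatizer`) is an assumption the caller supplies. `vectors_fingerprint` and `sourced_vector_warnings` are hypothetical names, and the messages are made up for illustration. `spacy train` and `spacy assemble` still warn for rule-based components, as described above. This sketch differs by filtering those components out.

```python
import hashlib
import json
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical model of a static vectors table: word -> vector.
VectorTable = Dict[str, Sequence[float]]


def vectors_fingerprint(table: VectorTable) -> str:
    """Hash a vectors table so two tables can be compared cheaply."""
    # sort_keys makes the serialization independent of insertion order.
    payload = json.dumps(
        {word: list(vec) for word, vec in table.items()}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def sourced_vector_warnings(
    target_vectors: VectorTable,
    sourced: List[Tuple[str, str, VectorTable]],
    rule_based_factories: Set[str],
) -> List[str]:
    """Return messages for sourced components whose vectors differ.

    `sourced` holds (component name, factory name, source vectors) triples.
    """
    target_hash = vectors_fingerprint(target_vectors)
    messages = []
    for name, factory, source_vectors in sourced:
        # Rule-based components don't use vectors as features: skip them.
        if factory in rule_based_factories:
            continue
        if vectors_fingerprint(source_vectors) != target_hash:
            messages.append(
                f"component '{name}': source vectors differ from target vectors"
            )
    return messages
```

Hashing keeps a single short fingerprint per table. Comparing the serialized payloads directly would give the same result.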