Merge branch 'develop' of https://github.com/explosion/spaCy into develop

This commit is contained in:
Matthew Honnibal 2020-09-17 13:59:25 +02:00
commit b57ce9a875

View File

@ -710,6 +710,48 @@ nlp = spacy.blank("en")
+ nlp.add_pipe("ner", source=source_nlp)
```
#### Configuring pipeline components with settings {#migrating-configure-pipe}
Because pipeline components are now added using their string names, you won't
have to instantiate the [component classes](/api/#architecture-pipeline)
directly anynore. To configure the component, you can now use the `config`
argument on [`nlp.add_pipe`](/api/language#add_pipe).
> #### config.cfg (excerpt)
>
> ```ini
> [components.sentencizer]
> factory = "sentencizer"
> punct_chars = ["!", ".", "?"]
> ```
```diff
punct_chars = ["!", ".", "?"]
- sentencizer = Sentencizer(punct_chars=punct_chars)
+ sentencizer = nlp.add_pipe("sentencizer", config={"punct_chars": punct_chars})
```
The `config` corresponds to the component settings in the
[`config.cfg`](/usage/training#config-components) and will overwrite the default
config defined by the components.
<Infobox variant="warning" title="Important note on config values">
Config values you pass to components **need to be JSON-serializable** and can't
be arbitrary Python objects. Otherwise, the settings you provide can't be
represented in the `config.cfg` and spaCy has no way of knowing how to re-create
your component with the same settings when you load the pipeline back in. If you
need to pass arbitrary objects to a component, use a
[registered function](/usage/processing-pipelines#example-stateful-components):
```diff
- config = {"model": MyTaggerModel()}
+ config= {"model": {"@architectures": "MyTaggerModel"}}
tagger = nlp.add_pipe("tagger", config=config)
```
</Infobox>
### Adding match patterns {#migrating-matcher}
The [`Matcher.add`](/api/matcher#add),