Update docs [ci skip]

Ines Montani 2020-07-27 00:29:45 +02:00
parent 3d56a3f286
commit 7adbaf9a5b
32 changed files with 1298 additions and 806 deletions

View File

@ -384,7 +384,7 @@ original file is shown at the top of the widget.
> ```
```python
https://github.com/explosion/spaCy/tree/master/examples/pipeline/custom_component_countries_api.py
https://github.com/explosion/spaCy/tree/master/spacy/language.py
```
### Infobox

View File

@ -1,23 +1,22 @@
---
title: DependencyParser
tag: class
source: spacy/pipeline/pipes.pyx
source: spacy/pipeline/dep_parser.pyx
---
This class is a subclass of `Pipe` and follows the same API. The pipeline
component is available in the [processing pipeline](/usage/processing-pipelines)
via the ID `"parser"`.
## Default config {#config}
## Implementation and defaults {#implementation}
This is the default configuration used to initialize the model powering the
pipeline component. See the [model architectures](/api/architectures)
documentation for details on the architectures and their arguments and
hyperparameters. To learn more about how to customize the config and train
custom models, check out the [training config](/usage/training#config) docs.
See the [model architectures](/api/architectures) documentation for details on
the architectures and their arguments and hyperparameters. To learn more about
how to customize the config and train custom models, check out the
[training config](/usage/training#config) docs.
```python
https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/parser_defaults.cfg
https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/dep_parser.pyx
```
## DependencyParser.\_\_init\_\_ {#init tag="method"}
@ -25,22 +24,17 @@ https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/parser_d
> #### Example
>
> ```python
> # Construction via create_pipe with default model
> parser = nlp.create_pipe("parser")
> # Construction via add_pipe with default model
> parser = nlp.add_pipe("parser")
>
> # Construction via create_pipe with custom model
> # Construction via add_pipe with custom model
> config = {"model": {"@architectures": "my_parser"}}
> parser = nlp.create_pipe("parser", config)
>
> # Construction from class with custom model from file
> from spacy.pipeline import DependencyParser
> model = util.load_config("model.cfg", create_objects=True)["model"]
> parser = DependencyParser(nlp.vocab, model)
> parser = nlp.add_pipe("parser", config=config)
> ```
Create a new pipeline instance. In your application, you would normally use a
shortcut for this and instantiate the component using its string name and
[`nlp.create_pipe`](/api/language#create_pipe).
[`nlp.add_pipe`](/api/language#add_pipe).
| Name | Type | Description |
| ----------- | ------------------ | ------------------------------------------------------------------------------- |

View File

@ -4,7 +4,7 @@ teaser:
Functionality to disambiguate a named entity in text to a unique knowledge
base identifier.
tag: class
source: spacy/pipeline/pipes.pyx
source: spacy/pipeline/entity_linker.py
new: 2.2
---
@ -12,16 +12,15 @@ This class is a subclass of `Pipe` and follows the same API. The pipeline
component is available in the [processing pipeline](/usage/processing-pipelines)
via the ID `"entity_linker"`.
## Default config {#config}
## Implementation and defaults {#implementation}
This is the default configuration used to initialize the model powering the
pipeline component. See the [model architectures](/api/architectures)
documentation for details on the architectures and their arguments and
hyperparameters. To learn more about how to customize the config and train
custom models, check out the [training config](/usage/training#config) docs.
See the [model architectures](/api/architectures) documentation for details on
the architectures and their arguments and hyperparameters. To learn more about
how to customize the config and train custom models, check out the
[training config](/usage/training#config) docs.
```python
https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/entity_linker_defaults.cfg
https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/entity_linker.py
```
## EntityLinker.\_\_init\_\_ {#init tag="method"}
@ -29,22 +28,17 @@ https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/entity_l
> #### Example
>
> ```python
> # Construction via create_pipe with default model
> entity_linker = nlp.create_pipe("entity_linker")
> # Construction via add_pipe with default model
> entity_linker = nlp.add_pipe("entity_linker")
>
> # Construction via create_pipe with custom model
> # Construction via add_pipe with custom model
> config = {"model": {"@architectures": "my_el"}}
> entity_linker = nlp.create_pipe("entity_linker", config)
>
> # Construction from class with custom model from file
> from spacy.pipeline import EntityLinker
> model = util.load_config("model.cfg", create_objects=True)["model"]
> entity_linker = EntityLinker(nlp.vocab, model)
> entity_linker = nlp.add_pipe("entity_linker", config=config)
> ```
Create a new pipeline instance. In your application, you would normally use a
shortcut for this and instantiate the component using its string name and
[`nlp.create_pipe`](/api/language#create_pipe).
[`nlp.add_pipe`](/api/language#add_pipe).
| Name | Type | Description |
| ------- | ------- | ------------------------------------------------------------------------------- |
@ -185,9 +179,8 @@ method, a knowledge base should have been defined with
> #### Example
>
> ```python
> entity_linker = EntityLinker(nlp.vocab)
> entity_linker = nlp.add_pipe("entity_linker", last=True)
> entity_linker.set_kb(kb)
> nlp.add_pipe(entity_linker, last=True)
> optimizer = entity_linker.begin_training(pipeline=nlp.pipeline)
> ```

View File

@ -1,23 +1,22 @@
---
title: EntityRecognizer
tag: class
source: spacy/pipeline/pipes.pyx
source: spacy/pipeline/ner.pyx
---
This class is a subclass of `Pipe` and follows the same API. The pipeline
component is available in the [processing pipeline](/usage/processing-pipelines)
via the ID `"ner"`.
## Default config {#config}
## Implementation and defaults {#implementation}
This is the default configuration used to initialize the model powering the
pipeline component. See the [model architectures](/api/architectures)
documentation for details on the architectures and their arguments and
hyperparameters. To learn more about how to customize the config and train
custom models, check out the [training config](/usage/training#config) docs.
See the [model architectures](/api/architectures) documentation for details on
the architectures and their arguments and hyperparameters. To learn more about
how to customize the config and train custom models, check out the
[training config](/usage/training#config) docs.
```python
https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/ner_defaults.cfg
https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/ner.pyx
```
## EntityRecognizer.\_\_init\_\_ {#init tag="method"}
@ -25,22 +24,17 @@ https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/ner_defa
> #### Example
>
> ```python
> # Construction via create_pipe
> ner = nlp.create_pipe("ner")
> # Construction via add_pipe with default model
> ner = nlp.add_pipe("ner")
>
> # Construction via create_pipe with custom model
> # Construction via add_pipe with custom model
> config = {"model": {"@architectures": "my_ner"}}
> parser = nlp.create_pipe("ner", config)
>
> # Construction from class with custom model from file
> from spacy.pipeline import EntityRecognizer
> model = util.load_config("model.cfg", create_objects=True)["model"]
> ner = EntityRecognizer(nlp.vocab, model)
> ner = nlp.add_pipe("ner", config=config)
> ```
Create a new pipeline instance. In your application, you would normally use a
shortcut for this and instantiate the component using its string name and
[`nlp.create_pipe`](/api/language#create_pipe).
[`nlp.add_pipe`](/api/language#add_pipe).
| Name | Type | Description |
| ----------- | ------------------ | ------------------------------------------------------------------------------- |

View File

@ -8,10 +8,10 @@ new: 2.1
The EntityRuler lets you add spans to the [`Doc.ents`](/api/doc#ents) using
token-based rules or exact phrase matches. It can be combined with the
statistical [`EntityRecognizer`](/api/entityrecognizer) to boost accuracy, or
used on its own to implement a purely rule-based entity recognition system.
After initialization, the component is typically added to the processing
pipeline using [`nlp.add_pipe`](/api/language#add_pipe). For usage examples, see
the docs on
used on its own to implement a purely rule-based entity recognition system. The
pipeline component is available in the
[processing pipeline](/usage/processing-pipelines) via the ID `"entity_ruler"`.
For usage examples, see the docs on
[rule-based entity recognition](/usage/rule-based-matching#entityruler).
## EntityRuler.\_\_init\_\_ {#init tag="method"}
@ -19,13 +19,13 @@ the docs on
Initialize the entity ruler. If patterns are supplied here, they need to be a
list of dictionaries with a `"label"` and `"pattern"` key. A pattern can either
be a token pattern (list) or a phrase pattern (string). For example:
`{'label': 'ORG', 'pattern': 'Apple'}`.
`{"label": "ORG", "pattern": "Apple"}`.
> #### Example
>
> ```python
> # Construction via create_pipe
> ruler = nlp.create_pipe("entity_ruler")
> # Construction via add_pipe
> ruler = nlp.add_pipe("entity_ruler")
>
> # Construction from class
> from spacy.pipeline import EntityRuler
@ -90,9 +90,8 @@ is chosen.
> #### Example
>
> ```python
> ruler = EntityRuler(nlp)
> ruler = nlp.add_pipe("entity_ruler")
> ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}])
> nlp.add_pipe(ruler)
>
> doc = nlp("A text about Apple.")
> ents = [(ent.text, ent.label_) for ent in doc.ents]

View File

@ -223,7 +223,7 @@ in `example.predicted`.
> #### Example
>
> ```python
> nlp.add_pipe(my_ner)
> nlp.add_pipe("my_ner")
> doc = nlp("Mr and Mrs Smith flew to New York")
> tokens_ref = ["Mr and Mrs", "Smith", "flew", "to", "New York"]
> example = Example.from_dict(doc, {"words": tokens_ref})

View File

@ -15,6 +15,88 @@ the tagger or parser that are called on a document in order. You can also add
your own processing pipeline components that take a `Doc` object, modify it and
return it.
## Language.component {#component tag="classmethod" new="3"}
Register a custom pipeline component under a given name. This allows
initializing the component by name using
[`Language.add_pipe`](/api/language#add_pipe) and referring to it in
[config files](/usage/training#config). This classmethod and decorator is
intended for **simple stateless functions** that take a `Doc` and return it. For
more complex stateful components that allow settings and need access to the
shared `nlp` object, use the [`Language.factory`](/api/language#factory)
decorator. For more details and examples, see the
[usage documentation](/usage/processing-pipelines#custom-components).
> #### Example
>
> ```python
> from spacy.language import Language
>
> # Usage as a decorator
> @Language.component("my_component")
> def my_component(doc):
>     # Do something to the doc
>     return doc
>
> # Usage as a function
> Language.component("my_component2", func=my_component)
> ```
| Name | Type | Description |
| -------------- | -------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
| `name` | str | The name of the component factory. |
| _keyword-only_ | | |
| `assigns` | `Iterable[str]` | `Doc` or `Token` attributes assigned by this component, e.g. `["token.ent_id"]`. Used for pipeline analysis. <!-- TODO: link to something --> |
| `requires` | `Iterable[str]` | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for pipeline analysis. <!-- TODO: link to something --> |
| `retokenizes` | bool | Whether the component changes tokenization. Used for pipeline analysis. <!-- TODO: link to something --> |
| `func` | `Optional[Callable]` | Optional function if not used as a decorator. |
## Language.factory {#factory tag="classmethod"}
Register a custom pipeline component factory under a given name. This allows
initializing the component by name using
[`Language.add_pipe`](/api/language#add_pipe) and referring to it in
[config files](/usage/training#config). The registered factory function needs to
take at least two **named arguments** which spaCy fills in automatically: `nlp`
for the current `nlp` object and `name` for the component instance name. This
can be useful to distinguish multiple instances of the same component and allows
trainable components to add custom losses using the component instance name. The
`default_config` defines the default values of the remaining factory arguments.
It's merged into the [`nlp.config`](/api/language#config). For more details and
examples, see the
[usage documentation](/usage/processing-pipelines#custom-components).
> #### Example
>
> ```python
> from spacy.language import Language
>
> # Usage as a decorator
> @Language.factory(
>     "my_component",
>     default_config={"some_setting": True},
> )
> def create_my_component(nlp, name, some_setting):
>     return MyComponent(some_setting)
>
> # Usage as function
> Language.factory(
>     "my_component",
>     default_config={"some_setting": True},
>     func=create_my_component
> )
> ```
| Name | Type | Description |
| ---------------- | -------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
| `name` | str | The name of the component factory. |
| _keyword-only_ | | |
| `default_config` | `Dict[str, Any]` | The default config, describing the default values of the factory arguments. |
| `assigns` | `Iterable[str]` | `Doc` or `Token` attributes assigned by this component, e.g. `["token.ent_id"]`. Used for pipeline analysis. <!-- TODO: link to something --> |
| `requires` | `Iterable[str]` | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for pipeline analysis. <!-- TODO: link to something --> |
| `retokenizes` | bool | Whether the component changes tokenization. Used for pipeline analysis. <!-- TODO: link to something --> |
| `func` | `Optional[Callable]` | Optional function if not used as a decorator. |
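
The registration flow described above (a factory stored under a name, called with `nlp` and `name`, with the remaining arguments filled from `default_config`) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not spaCy's implementation: `FACTORIES`, `register_factory` and `create_component` are hypothetical names, and the "component" is just a dict.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical name -> factory registry; a stand-in, not spaCy's internals.
FACTORIES: Dict[str, Dict[str, Any]] = {}


def register_factory(
    name: str,
    *,
    default_config: Optional[Dict[str, Any]] = None,
    func: Optional[Callable] = None,
):
    """Register a factory under `name`, as a decorator or as a plain call."""

    def wrapper(factory_func: Callable) -> Callable:
        FACTORIES[name] = {
            "func": factory_func,
            "default_config": dict(default_config or {}),
        }
        return factory_func

    # With `func` given, register immediately; otherwise act as a decorator.
    return wrapper(func) if func is not None else wrapper


def create_component(
    nlp: Any,
    factory_name: str,
    name: Optional[str] = None,
    config: Optional[Dict[str, Any]] = None,
):
    """Merge the user config over the defaults, then call the factory."""
    entry = FACTORIES[factory_name]
    merged = {**entry["default_config"], **(config or {})}
    # `nlp` and `name` are always passed as named arguments.
    return entry["func"](nlp=nlp, name=name or factory_name, **merged)


@register_factory("my_component", default_config={"some_setting": True})
def create_my_component(nlp, name, some_setting):
    # A dict stands in for a real component object in this sketch.
    return {"name": name, "some_setting": some_setting}


default = create_component(None, "my_component")
custom = create_component(
    None, "my_component", name="second", config={"some_setting": False}
)
```

The instance name falls back to the factory name, which is what lets two instances of the same factory coexist under different names.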
## Language.\_\_init\_\_ {#init tag="method"}
Initialize a `Language` object.
@ -30,12 +112,41 @@ Initialize a `Language` object.
> nlp = English()
> ```
| Name | Type | Description |
| ----------- | ---------- | ------------------------------------------------------------------------------------------ |
| `vocab` | `Vocab` | A `Vocab` object. If `True`, a vocab is created via `Language.Defaults.create_vocab`. |
| `make_doc` | callable | A function that takes text and returns a `Doc` object. Usually a `Tokenizer`. |
| `meta` | dict | Custom meta data for the `Language` class. Is written to by models to add model meta data. |
| **RETURNS** | `Language` | The newly constructed object. |
| Name | Type | Description |
| ------------------ | ----------- | ------------------------------------------------------------------------------------------ |
| `vocab` | `Vocab` | A `Vocab` object. If `True`, a vocab is created using the default language data settings. |
| _keyword-only_ | | |
| `max_length` | int | Maximum number of characters allowed in a single text. Defaults to `10 ** 6`. |
| `meta` | dict | Custom meta data for the `Language` class. Is written to by models to add model meta data. |
| `create_tokenizer` | `Callable` | Optional function that receives the `nlp` object and returns a tokenizer. |
| **RETURNS** | `Language` | The newly constructed object. |
## Language.from_config {#from_config tag="classmethod"}
Create a `Language` object from a loaded config. Will set up the tokenizer and
language data, add pipeline components based on the pipeline and components
defined in the config and validate the results. If no config is provided, the
default config of the given language is used. This is also how spaCy loads a
model under the hood based on its [`config.cfg`](/api/data-formats#config).
> #### Example
>
> ```python
> from thinc.api import Config
> from spacy.language import Language
>
> config = Config().from_disk("./config.cfg")
> nlp = Language.from_config(config)
> ```
| Name | Type | Description |
| -------------- | ---------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| `config` | `Dict[str, Any]` / [`Config`](https://thinc.ai/docs/api-config#config) | The loaded config. |
| _keyword-only_ | | |
| `disable` | `Iterable[str]` | List of pipeline component names to disable. |
| `auto_fill` | bool | Whether to automatically fill in missing values in the config, based on defaults and function argument annotations. Defaults to `True`. |
| `validate` | bool | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. |
| **RETURNS** | `Language` | The initialized object. |
## Language.\_\_call\_\_ {#call tag="method"}
@ -162,43 +273,99 @@ their original weights after the block.
Create a pipeline component from a factory.
<Infobox title="Changed in v3.0" variant="warning">
As of v3.0, the [`Language.add_pipe`](/api/language#add_pipe) method also takes
the string name of the factory, creates the component, adds it to the pipeline
and returns it. The `Language.create_pipe` method is now mostly used internally.
To create a component and add it to the pipeline, you should always use
`Language.add_pipe`.
</Infobox>
> #### Example
>
> ```python
> parser = nlp.create_pipe("parser")
> nlp.add_pipe(parser)
> ```
| Name | Type | Description |
| ----------- | -------- | ---------------------------------------------------------------------------------- |
| `name` | str | Factory name to look up in [`Language.factories`](/api/language#class-attributes). |
| `config` | dict | Configuration parameters to initialize component. |
| **RETURNS** | callable | The pipeline component. |
| Name | Type | Description |
| ------------------------------------- | ---------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `factory_name` | str | Name of the registered component factory. |
| `name` | str | Optional unique name of pipeline component instance. If not set, the factory name is used. An error is raised if the name already exists in the pipeline. |
| `config` <Tag variant="new">3</Tag> | `Dict[str, Any]` | Optional config parameters to use for this component. Will be merged with the `default_config` specified by the component factory. |
| `validate` <Tag variant="new">3</Tag> | bool | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. |
| **RETURNS** | callable | The pipeline component. |
## Language.add_pipe {#add_pipe tag="method" new="2"}
Add a component to the processing pipeline. Valid components are callables that
take a `Doc` object, modify it and return it. Only one of `before`, `after`,
`first` or `last` can be set. Default behavior is `last=True`.
Add a component to the processing pipeline. Expects a name that maps to a
component factory registered using
[`@Language.component`](/api/language#component) or
[`@Language.factory`](/api/language#factory). Components should be callables
that take a `Doc` object, modify it and return it. Only one of `before`,
`after`, `first` or `last` can be set. Default behavior is `last=True`.
<Infobox title="Changed in v3.0" variant="warning">
As of v3.0, the [`Language.add_pipe`](/api/language#add_pipe) method doesn't
take callables anymore and instead expects the name of a component factory
registered using [`@Language.component`](/api/language#component) or
[`@Language.factory`](/api/language#factory). It now takes care of creating the
component, adds it to the pipeline and returns it.
</Infobox>
> #### Example
>
> ```python
> def component(doc):
> @Language.component("component")
> def component_func(doc):
>     # modify Doc and return it
>     return doc
>
> nlp.add_pipe(component, before="ner")
> nlp.add_pipe(component, name="custom_name", last=True)
> nlp.add_pipe("component", before="ner")
> component = nlp.add_pipe("component", name="custom_name", last=True)
> ```
| Name | Type | Description |
| ----------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `component` | callable | The pipeline component. |
| `name` | str | Name of pipeline component. Overwrites existing `component.name` attribute if available. If no `name` is set and the component exposes no name attribute, `component.__name__` is used. An error is raised if the name already exists in the pipeline. |
| `before` | str | Component name to insert component directly before. |
| `after` | str | Component name to insert component directly after: |
| `first` | bool | Insert component first / not first in the pipeline. |
| `last` | bool | Insert component last / not last in the pipeline. |
| Name | Type | Description |
| -------------------------------------- | ---------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `factory_name` | str | Name of the registered component factory. |
| `name` | str | Optional unique name of pipeline component instance. If not set, the factory name is used. An error is raised if the name already exists in the pipeline. |
| _keyword-only_ | | |
| `before` | str / int | Component name or index to insert component directly before. |
| `after` | str / int | Component name or index to insert component directly after. |
| `first` | bool | Insert component first / not first in the pipeline. |
| `last` | bool | Insert component last / not last in the pipeline. |
| `config` <Tag variant="new">3</Tag> | `Dict[str, Any]` | Optional config parameters to use for this component. Will be merged with the `default_config` specified by the component factory. |
| `validate` <Tag variant="new">3</Tag> | bool | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. |
| **RETURNS** <Tag variant="new">3</Tag> | callable | The pipeline component. |
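
The positioning rule (only one of `before`, `after`, `first` or `last` may be set, with `last=True` as the default) can be sketched as a small helper. This is an illustrative sketch, not spaCy's code: `resolve_index` is a hypothetical name, and it simplifies by treating `first=False` or `last=False` on its own the same as the default.

```python
from typing import List, Optional, Union


def resolve_index(
    pipe_names: List[str],
    *,
    before: Optional[Union[str, int]] = None,
    after: Optional[Union[str, int]] = None,
    first: Optional[bool] = None,
    last: Optional[bool] = None,
) -> int:
    """Return the list index at which a new component would be inserted."""
    given = [arg for arg in (before, after, first, last) if arg is not None]
    if len(given) > 1:
        raise ValueError("Only one of before, after, first or last can be set")
    if first:
        return 0
    if before is not None:
        # Accept either a component name or an integer index.
        return before if isinstance(before, int) else pipe_names.index(before)
    if after is not None:
        idx = after if isinstance(after, int) else pipe_names.index(after)
        return idx + 1
    # Default behavior: append at the end of the pipeline.
    return len(pipe_names)


pipe_names = ["tagger", "parser", "ner"]
pipe_names.insert(resolve_index(pipe_names, before="ner"), "component")
```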
## Language.has_factory {#has_factory tag="classmethod" new="3"}
Check whether a factory name is registered on the `Language` class or subclass.
Will check for
[language-specific factories](/usage/processing-pipelines#factories-language)
registered on the subclass, as well as general-purpose factories registered on
the `Language` base class, available to all subclasses.
> #### Example
>
> ```python
> from spacy.language import Language
> from spacy.lang.en import English
>
> @English.component("component")
> def component(doc):
>     return doc
>
> assert English.has_factory("component")
> assert not Language.has_factory("component")
> ```
| Name | Type | Description |
| ----------- | ---- | ---------------------------------------------------------- |
| `name` | str | Name of the pipeline factory to check. |
| **RETURNS** | bool | Whether a factory of that name is registered on the class. |
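
One way to picture the lookup order is a shared registry in which language-specific factories are stored under a prefixed key. The sketch below is a toy model under that assumption. `ToyLanguage`, `ToyEnglish` and the `"lang.name"` key scheme are hypothetical and are not taken from spaCy's source.

```python
class ToyLanguage:
    """Toy base class with a registry shared by all subclasses."""

    lang = None
    _factories = {}  # hypothetical registry: internal name -> function

    @classmethod
    def _internal_name(cls, name):
        # Subclasses with a language code get a prefixed key, e.g. "en.component".
        return name if cls.lang is None else f"{cls.lang}.{name}"

    @classmethod
    def component(cls, name):
        def decorator(func):
            ToyLanguage._factories[cls._internal_name(name)] = func
            return func

        return decorator

    @classmethod
    def has_factory(cls, name):
        # Check the language-specific key first, then the general-purpose one.
        registry = ToyLanguage._factories
        return cls._internal_name(name) in registry or name in registry


class ToyEnglish(ToyLanguage):
    lang = "en"


@ToyEnglish.component("component")
def component(doc):
    return doc


@ToyLanguage.component("shared")
def shared(doc):
    return doc
```

In this model, a factory registered on the subclass is invisible to the base class, while a factory registered on the base class is visible to every subclass.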
## Language.has_pipe {#has_pipe tag="method" new="2"}
@ -208,9 +375,13 @@ Check whether a component is present in the pipeline. Equivalent to
> #### Example
>
> ```python
> nlp.add_pipe(lambda doc: doc, name="component")
> assert "component" in nlp.pipe_names
> assert nlp.has_pipe("component")
> @Language.component("component")
> def component(doc):
>     return doc
>
> nlp.add_pipe("component", name="my_component")
> assert "my_component" in nlp.pipe_names
> assert nlp.has_pipe("my_component")
> ```
| Name | Type | Description |
@ -324,6 +495,43 @@ As of spaCy v3.0, the `disable_pipes` method has been renamed to `select_pipes`:
| `enable` | str / list | Name(s) of pipeline components that will not be disabled. |
| **RETURNS** | `DisabledPipes` | The disabled pipes that can be restored by calling the object's `.restore()` method. |
## Language.meta {#meta tag="property"}
Custom meta data for the Language class. If a model is loaded, contains meta
data of the model. The `Language.meta` is also what's serialized as the
`meta.json` when you save an `nlp` object to disk.
> #### Example
>
> ```python
> print(nlp.meta)
> ```
| Name | Type | Description |
| ----------- | ---- | -------------- |
| **RETURNS** | dict | The meta data. |
## Language.config {#config tag="property" new="3"}
Export a trainable [`config.cfg`](/api/data-formats#config) for the current
`nlp` object. Includes the current pipeline, all configs used to create the
currently active pipeline components, as well as the default training config
that can be used with [`spacy train`](/api/cli#train). `Language.config` returns
a [Thinc `Config` object](https://thinc.ai/docs/api-config#config), which is a
subclass of the built-in `dict`. It supports the additional methods `to_disk`
(serialize the config to a file) and `to_str` (output the config as a string).
> #### Example
>
> ```python
> nlp.config.to_disk("./config.cfg")
> print(nlp.config.to_str())
> ```
| Name | Type | Description |
| ----------- | --------------------------------------------------- | ----------- |
| **RETURNS** | [`Config`](https://thinc.ai/docs/api-config#config) | The config. |
## Language.to_disk {#to_disk tag="method" new="2"}
Save the current state to a directory. If a model is loaded, this will **include
@ -405,23 +613,25 @@ available to the loaded object.
## Attributes {#attributes}
| Name | Type | Description |
| ------------------------------------------ | ----------- | ----------------------------------------------------------------------------------------------- |
| `vocab` | `Vocab` | A container for the lexical types. |
| `tokenizer` | `Tokenizer` | The tokenizer. |
| `make_doc` | `callable` | Callable that takes a string and returns a `Doc`. |
| `pipeline` | list | List of `(name, component)` tuples describing the current processing pipeline, in order. |
| `pipe_names` <Tag variant="new">2</Tag> | list | List of pipeline component names, in order. |
| `pipe_labels` <Tag variant="new">2.2</Tag> | dict | List of labels set by the pipeline components, if available, keyed by component name. |
| `meta` | dict | Custom meta data for the Language class. If a model is loaded, contains meta data of the model. |
| `path` <Tag variant="new">2</Tag> | `Path` | Path to the model data directory, if a model is loaded. Otherwise `None`. |
| Name | Type | Description |
| --------------------------------------------- | ---------------------- | ---------------------------------------------------------------------------------------- |
| `vocab` | `Vocab` | A container for the lexical types. |
| `tokenizer` | `Tokenizer` | The tokenizer. |
| `make_doc` | `Callable` | Callable that takes a string and returns a `Doc`. |
| `pipeline` | `List[Tuple[str, Callable]]` | List of `(name, component)` tuples describing the current processing pipeline, in order. |
| `pipe_names` <Tag variant="new">2</Tag> | `List[str]` | List of pipeline component names, in order. |
| `pipe_labels` <Tag variant="new">2.2</Tag> | `Dict[str, List[str]]` | List of labels set by the pipeline components, if available, keyed by component name. |
| `pipe_factories` <Tag variant="new">2.2</Tag> | `Dict[str, str]` | Dictionary of pipeline component names, mapped to their factory names. |
| `factory_names` <Tag variant="new">3</Tag> | `List[str]` | List of all available factory names. |
| `path` <Tag variant="new">2</Tag> | `Path` | Path to the model data directory, if a model is loaded. Otherwise `None`. |
## Class attributes {#class-attributes}
| Name | Type | Description |
| ---------- | ----- | ----------------------------------------------------------------------------------------------- |
| `Defaults` | class | Settings, data and factory methods for creating the `nlp` object and processing pipeline. |
| `lang` | str | Two-letter language ID, i.e. [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes). |
| Name | Type | Description |
| ---------------- | ----- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `Defaults` | class | Settings, data and factory methods for creating the `nlp` object and processing pipeline. |
| `lang` | str | Two-letter language ID, i.e. [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes). |
| `default_config` | dict | Base [config](/usage/training#config) to use for [Language.config](/api/language#config). Defaults to [`default_config.cfg`](https://github.com/explosion/spaCy/tree/develop/spacy/default_config.cfg). |
## Defaults {#defaults}

View File

@ -10,20 +10,18 @@ coarse-grained POS tags following the Universal Dependencies
[UPOS](https://universaldependencies.org/u/pos/index.html) and
[FEATS](https://universaldependencies.org/format.html#morphological-annotation)
annotation guidelines. This class is a subclass of `Pipe` and follows the same
API. The component is also available via the string name `"morphologizer"`.
After initialization, it is typically added to the processing pipeline using
[`nlp.add_pipe`](/api/language#add_pipe).
API. The pipeline component is available in the
[processing pipeline](/usage/processing-pipelines) via the ID `"morphologizer"`.
## Default config {#config}
## Implementation and defaults {#implementation}
This is the default configuration used to initialize the model powering the
pipeline component. See the [model architectures](/api/architectures)
documentation for details on the architectures and their arguments and
hyperparameters. To learn more about how to customize the config and train
custom models, check out the [training config](/usage/training#config) docs.
See the [model architectures](/api/architectures) documentation for details on
the architectures and their arguments and hyperparameters. To learn more about
how to customize the config and train custom models, check out the
[training config](/usage/training#config) docs.
```python
https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/morphologizer_defaults.cfg
https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/morphologizer.pyx
```
## Morphologizer.\_\_init\_\_ {#init tag="method"}
@ -33,24 +31,19 @@ Initialize the morphologizer.
> #### Example
>
> ```python
> # Construction via create_pipe
> morphologizer = nlp.create_pipe("morphologizer")
>
> # Construction from class
> from spacy.pipeline import Morphologizer
> morphologizer = Morphologizer()
> # Construction via add_pipe
> morphologizer = nlp.add_pipe("morphologizer")
> ```
Create a new pipeline instance. In your application, you would normally use a
shortcut for this and instantiate the component using its string name and
[`nlp.create_pipe`](/api/language#create_pipe).
[`nlp.add_pipe`](/api/language#add_pipe).
| Name | Type | Description |
| ----------- | -------- | ------------------------------------------------------------------------------- |
| `vocab` | `Vocab` | The shared vocabulary. |
| `model` | `Model` | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. |
| `**cfg` | - | Configuration parameters. |
| Name | Type | Description |
| ----------- | --------------- | ------------------------------------------------------------------------------- |
| `vocab` | `Vocab` | The shared vocabulary. |
| `model` | `Model` | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. |
| `**cfg` | - | Configuration parameters. |
| **RETURNS** | `Morphologizer` | The newly constructed object. |
## Morphologizer.\_\_call\_\_ {#call tag="method"}
@ -58,8 +51,8 @@ shortcut for this and instantiate the component using its string name and
Apply the pipe to one document. The document is modified in place, and returned.
This usually happens under the hood when the `nlp` object is called on a text
and all pipeline components are applied to the `Doc` in order. Both
[`__call__`](/api/morphologizer#call) and [`pipe`](/api/morphologizer#pipe) delegate to the
[`predict`](/api/morphologizer#predict) and
[`__call__`](/api/morphologizer#call) and [`pipe`](/api/morphologizer#pipe)
delegate to the [`predict`](/api/morphologizer#predict) and
[`set_annotations`](/api/morphologizer#set_annotations) methods.
> #### Example
@ -81,7 +74,8 @@ and all pipeline components are applied to the `Doc` in order. Both
Apply the pipe to a stream of documents. This usually happens under the hood
when the `nlp` object is called on a text and all pipeline components are
applied to the `Doc` in order. Both [`__call__`](/api/morphologizer#call) and
[`pipe`](/api/morphologizer#pipe) delegate to the [`predict`](/api/morphologizer#predict) and
[`pipe`](/api/morphologizer#pipe) delegate to the
[`predict`](/api/morphologizer#predict) and
[`set_annotations`](/api/morphologizer#set_annotations) methods.
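
The delegation described above can be sketched in plain Python. This is a minimal illustration rather than spaCy's implementation: `ToyDoc`, `ToyPipe`, the `batch_size` default and the uppercase "scoring" rule are all hypothetical stand-ins chosen so the example runs without a trained model.

```python
from typing import Iterable, Iterator, List


class ToyDoc:
    """Hypothetical stand-in for a Doc: words plus one label slot per word."""

    def __init__(self, words: List[str]) -> None:
        self.words = words
        self.labels: List[str] = [""] * len(words)


class ToyPipe:
    """Hypothetical component that mirrors the predict / set_annotations split."""

    def predict(self, docs: List[ToyDoc]) -> List[List[str]]:
        # Compute "scores" for a batch without modifying the documents.
        return [["UPPER" if w.isupper() else "OTHER" for w in d.words] for d in docs]

    def set_annotations(self, docs: List[ToyDoc], scores: List[List[str]]) -> None:
        # Write the pre-computed scores onto the documents in place.
        for doc, doc_scores in zip(docs, scores):
            doc.labels = list(doc_scores)

    def __call__(self, doc: ToyDoc) -> ToyDoc:
        # One document: predict, annotate in place, return the same object.
        self.set_annotations([doc], self.predict([doc]))
        return doc

    def pipe(self, stream: Iterable[ToyDoc], batch_size: int = 2) -> Iterator[ToyDoc]:
        # A stream of documents: same two steps, applied batch by batch.
        batch: List[ToyDoc] = []
        for doc in stream:
            batch.append(doc)
            if len(batch) == batch_size:
                self.set_annotations(batch, self.predict(batch))
                yield from batch
                batch = []
        if batch:
            self.set_annotations(batch, self.predict(batch))
            yield from batch


toy = ToyPipe()
single = toy(ToyDoc(["NASA", "launched"]))
streamed = list(toy.pipe(ToyDoc([w]) for w in ["a", "B", "c"]))
```

Keeping `predict` free of side effects is what lets both entry points share it: `__call__` wraps a single document in a one-element batch, while `pipe` groups the stream into batches before annotating.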
> #### Example
@ -126,9 +120,9 @@ Modify a batch of documents, using pre-computed scores.
> morphologizer.set_annotations([doc1, doc2], scores)
> ```
| Name | Type | Description |
| -------- | --------------- | ------------------------------------------------ |
| `docs` | `Iterable[Doc]` | The documents to modify. |
| Name | Type | Description |
| -------- | --------------- | ------------------------------------------------------- |
| `docs` | `Iterable[Doc]` | The documents to modify. |
| `scores` | - | The scores to set, produced by `Morphologizer.predict`. |
## Morphologizer.update {#update tag="method"}
@ -145,15 +139,15 @@ pipe's model. Delegates to [`predict`](/api/morphologizer#predict) and
> losses = morphologizer.update(examples, sgd=optimizer)
> ```
| Name | Type | Description |
| ----------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| `examples` | `Iterable[Example]` | A batch of [`Example`](/api/example) objects to learn from. |
| _keyword-only_ | | |
| `drop` | float | The dropout rate. |
| Name | Type | Description |
| ----------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| `examples` | `Iterable[Example]` | A batch of [`Example`](/api/example) objects to learn from. |
| _keyword-only_ | | |
| `drop` | float | The dropout rate. |
| `set_annotations` | bool | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](/api/morphologizer#set_annotations). |
| `sgd` | `Optimizer` | The [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
| `losses` | `Dict[str, float]` | Optional record of the loss during training. The value keyed by the model's name is updated. |
| **RETURNS** | `Dict[str, float]` | The updated `losses` dictionary. |
| `sgd` | `Optimizer` | The [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
| `losses` | `Dict[str, float]` | Optional record of the loss during training. The value keyed by the model's name is updated. |
| **RETURNS** | `Dict[str, float]` | The updated `losses` dictionary. |
## Morphologizer.get_loss {#get_loss tag="method"}
@ -187,12 +181,12 @@ Initialize the pipe for training, using data examples if available. Return an
> optimizer = morphologizer.begin_training(pipeline=nlp.pipeline)
> ```
| Name | Type | Description |
| -------------- | ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `get_examples` | `Iterable[Example]` | Optional gold-standard annotations in the form of [`Example`](/api/example) objects. |
| `pipeline` | `List[(str, callable)]` | Optional list of pipeline components that this component is part of. |
| Name | Type | Description |
| -------------- | ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `get_examples` | `Iterable[Example]` | Optional gold-standard annotations in the form of [`Example`](/api/example) objects. |
| `pipeline` | `List[(str, callable)]` | Optional list of pipeline components that this component is part of. |
| `sgd` | `Optimizer` | An optional [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. Will be created via [`create_optimizer`](/api/morphologizer#create_optimizer) if not set. |
| **RETURNS** | `Optimizer` | An optimizer. |
| **RETURNS** | `Optimizer` | An optimizer. |
## Morphologizer.create_optimizer {#create_optimizer tag="method"}
@ -237,9 +231,9 @@ both `pos` and `morph`, the label should include the UPOS as the feature `POS`.
> morphologizer.add_label("Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin")
> ```
| Name | Type | Description |
| -------- | ---- | --------------------------------------------------------------- |
| `label` | str | The label to add. |
| Name | Type | Description |
| ------- | ---- | ----------------- |
| `label` | str | The label to add. |
## Morphologizer.to_disk {#to_disk tag="method"}
@ -268,11 +262,11 @@ Load the pipe from disk. Modifies the object in place and returns it.
> morphologizer.from_disk("/path/to/morphologizer")
> ```
| Name | Type | Description |
| ----------- | ------------ | -------------------------------------------------------------------------- |
| `path` | str / `Path` | A path to a directory. Paths may be either strings or `Path`-like objects. |
| `exclude` | list | String names of [serialization fields](#serialization-fields) to exclude. |
| **RETURNS** | `Morphologizer` | The modified `Morphologizer` object. |
| Name | Type | Description |
| ----------- | --------------- | -------------------------------------------------------------------------- |
| `path` | str / `Path` | A path to a directory. Paths may be either strings or `Path`-like objects. |
| `exclude` | list | String names of [serialization fields](#serialization-fields) to exclude. |
| **RETURNS** | `Morphologizer` | The modified `Morphologizer` object. |
## Morphologizer.to_bytes {#to_bytes tag="method"}
@ -288,7 +282,7 @@ Serialize the pipe to a bytestring.
| Name | Type | Description |
| ----------- | ----- | ------------------------------------------------------------------------- |
| `exclude` | list | String names of [serialization fields](#serialization-fields) to exclude. |
| **RETURNS** | bytes | The serialized form of the `Morphologizer` object. |
| **RETURNS** | bytes | The serialized form of the `Morphologizer` object. |
## Morphologizer.from_bytes {#from_bytes tag="method"}
@ -302,16 +296,16 @@ Load the pipe from a bytestring. Modifies the object in place and returns it.
> morphologizer.from_bytes(morphologizer_bytes)
> ```
| Name | Type | Description |
| ------------ | -------- | ------------------------------------------------------------------------- |
| `bytes_data` | bytes | The data to load from. |
| `exclude` | list | String names of [serialization fields](#serialization-fields) to exclude. |
| **RETURNS** | `Morphologizer` | The `Morphologizer` object. |
| Name | Type | Description |
| ------------ | --------------- | ------------------------------------------------------------------------- |
| `bytes_data` | bytes | The data to load from. |
| `exclude` | list | String names of [serialization fields](#serialization-fields) to exclude. |
| **RETURNS** | `Morphologizer` | The `Morphologizer` object. |
## Morphologizer.labels {#labels tag="property"}
The labels currently added to the component in Universal Dependencies [FEATS
format](https://universaldependencies.org/format.html#morphological-annotation).
The labels currently added to the component in Universal Dependencies
[FEATS format](https://universaldependencies.org/format.html#morphological-annotation).
Note that even for a blank component, this will always include the internal
empty label `_`. If POS features are used, the labels will include the
coarse-grained POS as the feature `POS`.
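
To make the label format concrete, here is a small sketch that splits a FEATS-style label into its features and separates out the `POS` feature. The helper names `parse_feats` and `split_pos` are hypothetical and not part of spaCy; the sketch only assumes the `Key=Value|Key=Value` layout and the `_` empty label described above.

```python
from typing import Dict, Tuple

EMPTY_LABEL = "_"  # the internal empty label mentioned above


def parse_feats(label: str) -> Dict[str, str]:
    """Split a label such as 'Mood=Ind|POS=VERB' into a feature dict."""
    if label == EMPTY_LABEL:
        return {}
    return dict(pair.split("=", 1) for pair in label.split("|"))


def split_pos(label: str) -> Tuple[str, str]:
    """Return (coarse-grained POS, remaining morphological features)."""
    feats = parse_feats(label)
    pos = feats.pop("POS", "")
    # Rebuild the remaining features in sorted key order.
    morph = "|".join(f"{k}={v}" for k, v in sorted(feats.items())) or EMPTY_LABEL
    return pos, morph


label = "Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin"
pos, morph = split_pos(label)
```

With the label from the `add_label` example, `pos` is `"VERB"` and `morph` holds the three remaining features.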
@ -339,8 +333,8 @@ serialization by passing in the string names via the `exclude` argument.
> morphologizer.to_disk("/path", exclude=["vocab"])
> ```
| Name | Description |
| --------- | ------------------------------------------------------------------------------------------ |
| `vocab` | The shared [`Vocab`](/api/vocab). |
| `cfg` | The config file. You usually don't want to exclude this. |
| `model` | The binary model data. You usually don't want to exclude this. |
| Name | Description |
| ------- | -------------------------------------------------------------- |
| `vocab` | The shared [`Vocab`](/api/vocab). |
| `cfg` | The config file. You usually don't want to exclude this. |
| `model` | The binary model data. You usually don't want to exclude this. |
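
The effect of `exclude` can be illustrated with a toy component. This is a sketch under stated assumptions, not how spaCy serializes its pipes: `ToyComponent`, its JSON encoding and its attribute values are hypothetical, and only the three field names are taken from the table above.

```python
import json
from typing import Iterable


class ToyComponent:
    """Hypothetical component with the three fields listed in the table."""

    def __init__(self) -> None:
        self.vocab = ["a", "b"]
        self.cfg = {"labels": ["_"]}
        self.model = [0.1, 0.2]

    def to_bytes(self, exclude: Iterable[str] = ()) -> bytes:
        # One getter per serialization field; excluded names are skipped.
        getters = {
            "vocab": lambda: self.vocab,
            "cfg": lambda: self.cfg,
            "model": lambda: self.model,
        }
        skip = set(exclude)
        data = {name: get() for name, get in getters.items() if name not in skip}
        return json.dumps(data).encode("utf8")

    def from_bytes(self, bytes_data: bytes, exclude: Iterable[str] = ()) -> "ToyComponent":
        # Excluded fields keep whatever value the object already has.
        skip = set(exclude)
        for name, value in json.loads(bytes_data.decode("utf8")).items():
            if name not in skip:
                setattr(self, name, value)
        return self


payload = ToyComponent().to_bytes(exclude=["vocab"])
target = ToyComponent()
target.vocab = ["kept"]
target.from_bytes(ToyComponent().to_bytes(), exclude=["vocab"])
```

Excluding `vocab` on save leaves it out of the payload. Excluding it on load leaves the existing shared vocabulary untouched.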


@ -11,8 +11,7 @@ menu:
## merge_noun_chunks {#merge_noun_chunks tag="function"}
Merge noun chunks into a single token. Also available via the string name
`"merge_noun_chunks"`. After initialization, the component is typically added to
the processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
`"merge_noun_chunks"`.
> #### Example
>
@ -20,9 +19,7 @@ the processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
> texts = [t.text for t in nlp("I have a blue car")]
> assert texts == ["I", "have", "a", "blue", "car"]
>
> merge_nps = nlp.create_pipe("merge_noun_chunks")
> nlp.add_pipe(merge_nps)
>
> nlp.add_pipe("merge_noun_chunks")
> texts = [t.text for t in nlp("I have a blue car")]
> assert texts == ["I", "have", "a blue car"]
> ```
@ -44,8 +41,7 @@ all other components.
## merge_entities {#merge_entities tag="function"}
Merge named entities into a single token. Also available via the string name
`"merge_entities"`. After initialization, the component is typically added to
the processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
`"merge_entities"`.
> #### Example
>
@ -53,8 +49,7 @@ the processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
> texts = [t.text for t in nlp("I like David Bowie")]
> assert texts == ["I", "like", "David", "Bowie"]
>
> merge_ents = nlp.create_pipe("merge_entities")
> nlp.add_pipe(merge_ents)
> nlp.add_pipe("merge_entities")
>
> texts = [t.text for t in nlp("I like David Bowie")]
> assert texts == ["I", "like", "David Bowie"]
@ -76,12 +71,9 @@ components to the end of the pipeline and after all other components.
## merge_subtokens {#merge_subtokens tag="function" new="2.1"}
Merge subtokens into a single token. Also available via the string name
`"merge_subtokens"`. After initialization, the component is typically added to
the processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
As of v2.1, the parser is able to predict "subtokens" that should be merged into
one single token later on. This is especially relevant for languages like
Chinese, Japanese or Korean, where a "word" isn't defined as a
`"merge_subtokens"`. As of v2.1, the parser is able to predict "subtokens" that
should be merged into one single token later on. This is especially relevant for
languages like Chinese, Japanese or Korean, where a "word" isn't defined as a
whitespace-delimited sequence of characters. Under the hood, this component uses
the [`Matcher`](/api/matcher) to find sequences of tokens with the dependency
label `"subtok"` and then merges them into a single token.
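
The merging step can be sketched without spaCy. This is a simplified illustration, not the component's actual code: it uses `itertools.groupby` instead of the `Matcher`, and it represents tokens as hypothetical `(text, dep)` pairs.

```python
from itertools import groupby
from typing import List, Tuple

Token = Tuple[str, str]  # hypothetical (text, dependency label) pair


def merge_subtok_runs(tokens: List[Token], label: str = "subtok") -> List[str]:
    """Join each run of consecutive `label` tokens into a single text."""
    merged: List[str] = []
    for is_subtok, run in groupby(tokens, key=lambda tok: tok[1] == label):
        texts = [text for text, _ in run]
        if is_subtok:
            merged.append("".join(texts))  # collapse the whole run into one token
        else:
            merged.extend(texts)  # other tokens pass through unchanged
    return merged


only_subtokens = merge_subtok_runs([("拜", "subtok"), ("托", "subtok")])
mixed = merge_subtok_runs(
    [("I", "nsubj"), ("拜", "subtok"), ("托", "subtok"), ("you", "dobj")]
)
```

The sketch joins texts without a separator, which suits scripts written without whitespace. How the real component treats the token that follows a `"subtok"` run is not covered here.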
@ -96,9 +88,7 @@ label `"subtok"` and then merges them into a single token.
> print([(token.text, token.dep_) for token in doc])
> # [('拜', 'subtok'), ('托', 'subtok')]
>
> merge_subtok = nlp.create_pipe("merge_subtokens")
> nlp.add_pipe(merge_subtok)
>
> nlp.add_pipe("merge_subtokens")
> doc = nlp("拜托")
> print([token.text for token in doc])
> # ['拜托']


@ -1,26 +1,24 @@
---
title: SentenceRecognizer
tag: class
source: spacy/pipeline/pipes.pyx
source: spacy/pipeline/senter.pyx
new: 3
---
A trainable pipeline component for sentence segmentation. For a simpler,
rule-based strategy, see the [`Sentencizer`](/api/sentencizer). This class is a
subclass of `Pipe` and follows the same API. The component is also available via
the string name `"senter"`. After initialization, it is typically added to the
processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
the string name `"senter"`.
## Default config {#config}
## Implementation and defaults {#implementation}
This is the default configuration used to initialize the model powering the
pipeline component. See the [model architectures](/api/architectures)
documentation for details on the architectures and their arguments and
hyperparameters. To learn more about how to customize the config and train
custom models, check out the [training config](/usage/training#config) docs.
See the [model architectures](/api/architectures) documentation for details on
the architectures and their arguments and hyperparameters. To learn more about
how to customize the config and train custom models, check out the
[training config](/usage/training#config) docs.
```python
https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/senter_defaults.cfg
https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/senter.pyx
```
## SentenceRecognizer.\_\_init\_\_ {#init tag="method"}
@ -30,12 +28,8 @@ Initialize the sentence recognizer.
> #### Example
>
> ```python
> # Construction via create_pipe
> senter = nlp.create_pipe("senter")
>
> # Construction from class
> from spacy.pipeline import SentenceRecognizer
> senter = SentenceRecognizer()
> # Construction via add_pipe
> senter = nlp.add_pipe("senter")
> ```
<!-- TODO: document, similar to other trainable pipeline components -->


@ -9,8 +9,7 @@ that doesn't require the dependency parse. By default, sentence segmentation is
performed by the [`DependencyParser`](/api/dependencyparser), so the
`Sentencizer` lets you implement a simpler, rule-based strategy that doesn't
require a statistical model to be loaded. The component is also available via
the string name `"sentencizer"`. After initialization, it is typically added to
the processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
the string name `"sentencizer"`.
## Sentencizer.\_\_init\_\_ {#init tag="method"}
@ -19,12 +18,8 @@ Initialize the sentencizer.
> #### Example
>
> ```python
> # Construction via create_pipe
> sentencizer = nlp.create_pipe("sentencizer")
>
> # Construction from class
> from spacy.pipeline import Sentencizer
> sentencizer = Sentencizer()
> # Construction via add_pipe
> sentencizer = nlp.add_pipe("sentencizer")
> ```
| Name | Type | Description |
@ -58,8 +53,7 @@ the component has been added to the pipeline using
> from spacy.lang.en import English
>
> nlp = English()
> sentencizer = nlp.create_pipe("sentencizer")
> nlp.add_pipe(sentencizer)
> nlp.add_pipe("sentencizer")
> doc = nlp("This is a sentence. This is another sentence.")
> assert len(list(doc.sents)) == 2
> ```


@ -1,7 +1,7 @@
---
title: Tagger
tag: class
source: spacy/pipeline/pipes.pyx
source: spacy/pipeline/tagger.pyx
---
This class is a subclass of `Pipe` and follows the same API. The pipeline
@ -13,22 +13,17 @@ via the ID `"tagger"`.
> #### Example
>
> ```python
> # Construction via create_pipe
> tagger = nlp.create_pipe("tagger")
> # Construction via add_pipe with default model
> tagger = nlp.add_pipe("tagger")
>
> # Construction via add_pipe with custom model
> config = {"model": {"@architectures": "my_tagger"}}
> parser = nlp.create_pipe("tagger", config)
>
> # Construction from class with custom model from file
> from spacy.pipeline import Tagger
> model = util.load_config("model.cfg", create_objects=True)["model"]
> tagger = Tagger(nlp.vocab, model)
> tagger = nlp.add_pipe("tagger", config=config)
> ```
Create a new pipeline instance. In your application, you would normally use a
shortcut for this and instantiate the component using its string name and
[`nlp.create_pipe`](/api/language#create_pipe).
[`nlp.add_pipe`](/api/language#add_pipe).
| Name | Type | Description |
| ----------- | -------- | ------------------------------------------------------------------------------- |


@ -1,7 +1,7 @@
---
title: TextCategorizer
tag: class
source: spacy/pipeline/pipes.pyx
source: spacy/pipeline/textcat.py
new: 2
---
@ -9,41 +9,33 @@ This class is a subclass of `Pipe` and follows the same API. The pipeline
component is available in the [processing pipeline](/usage/processing-pipelines)
via the ID `"textcat"`.
## Default config {#config}
## Implementation and defaults {#implementation}
This is the default configuration used to initialize the model powering the
pipeline component. See the [model architectures](/api/architectures)
documentation for details on the architectures and their arguments and
hyperparameters. To learn more about how to customize the config and train
custom models, check out the [training config](/usage/training#config) docs.
See the [model architectures](/api/architectures) documentation for details on
the architectures and their arguments and hyperparameters. To learn more about
how to customize the config and train custom models, check out the
[training config](/usage/training#config) docs.
```python
https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/textcat_defaults.cfg
https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/textcat.py
```
<!-- TODO: do we also need to document the other defaults here? -->
## TextCategorizer.\_\_init\_\_ {#init tag="method"}
> #### Example
>
> ```python
> # Construction via create_pipe
> textcat = nlp.create_pipe("textcat")
> # Construction via add_pipe with default model
> textcat = nlp.add_pipe("textcat")
>
> # Construction via create_pipe with custom model
> # Construction via add_pipe with custom model
> config = {"model": {"@architectures": "my_textcat"}}
> parser = nlp.create_pipe("textcat", config)
>
> # Construction from class with custom model from file
> from spacy.pipeline import TextCategorizer
> model = util.load_config("model.cfg", create_objects=True)["model"]
> textcat = TextCategorizer(nlp.vocab, model)
> textcat = nlp.add_pipe("textcat", config=config)
> ```
Create a new pipeline instance. In your application, you would normally use a
shortcut for this and instantiate the component using its string name and
[`nlp.create_pipe`](/api/language#create_pipe).
[`nlp.add_pipe`](/api/language#add_pipe).
| Name | Type | Description |
| ----------- | ----------------- | ------------------------------------------------------------------------------- |


@ -4,16 +4,15 @@ source: spacy/pipeline/tok2vec.py
new: 3
---
TODO: document
<!-- TODO: document -->
## Default config {#config}
## Implementation and defaults {#implementation}
This is the default configuration used to initialize the model powering the
pipeline component. See the [model architectures](/api/architectures)
documentation for details on the architectures and their arguments and
hyperparameters. To learn more about how to customize the config and train
custom models, check out the [training config](/usage/training#config) docs.
See the [model architectures](/api/architectures) documentation for details on
the architectures and their arguments and hyperparameters. To learn more about
how to customize the config and train custom models, check out the
[training config](/usage/training#config) docs.
```python
https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/tok2vec_defaults.cfg
https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/tok2vec.py
```


@ -31,7 +31,7 @@ the
> nlp = English()
> # Create a Tokenizer with the default settings for English
> # including punctuation rules and exceptions
> tokenizer = nlp.Defaults.create_tokenizer(nlp)
> tokenizer = nlp.tokenizer
> ```
| Name | Type | Description |


@ -45,7 +45,8 @@ class, loads in the model data and returns it.
### Abstract example
cls = util.get_lang_class(lang) # get language for ID, e.g. 'en'
nlp = cls() # initialise the language
for name in pipeline: component = nlp.create_pipe(name) # create each pipeline component nlp.add_pipe(component) # add component to pipeline
for name in pipeline:
    nlp.add_pipe(name) # add component to pipeline
nlp.from_disk(model_data_path) # load in model data
```
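
The idea of adding components by string name can be sketched with a small registry. Everything in this example is hypothetical (`FACTORIES`, `factory`, `ToyLanguage` and the two toy components). It only illustrates the pattern of looking up a factory by name and appending the result to the pipeline, not spaCy's actual loading code.

```python
from typing import Callable, Dict, List, Tuple

Component = Callable[[List[str]], List[str]]
FACTORIES: Dict[str, Callable[[], Component]] = {}  # hypothetical registry


def factory(name: str):
    """Register a function that creates a component under a string name."""
    def register(func: Callable[[], Component]) -> Callable[[], Component]:
        FACTORIES[name] = func
        return func
    return register


@factory("lowercase")
def make_lowercase() -> Component:
    return lambda tokens: [t.lower() for t in tokens]


@factory("drop_empty")
def make_drop_empty() -> Component:
    return lambda tokens: [t for t in tokens if t]


class ToyLanguage:
    """Hypothetical stand-in for a language class with a pipeline."""

    def __init__(self) -> None:
        self.pipeline: List[Tuple[str, Component]] = []

    def add_pipe(self, name: str) -> Component:
        # Look up the factory by name, create the component, append it.
        component = FACTORIES[name]()
        self.pipeline.append((name, component))
        return component

    def __call__(self, tokens: List[str]) -> List[str]:
        # Apply every component in the order it was added.
        for _, component in self.pipeline:
            tokens = component(tokens)
        return tokens


nlp = ToyLanguage()
for name in ["lowercase", "drop_empty"]:
    nlp.add_pipe(name)  # add component to pipeline
result = nlp(["Hello", "", "World"])
```

Because the loader only needs the names, the pipeline can be described as a plain list of strings and rebuilt in the same order before the data is loaded.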
@ -479,7 +480,6 @@ you can use the [`set_lang_class`](/api/top-level#util.set_lang_class) helper.
> for lang_id in ["en", "de"]:
> lang_class = util.get_lang_class(lang_id)
> lang = lang_class()
> tokenizer = lang.Defaults.create_tokenizer()
> ```
| Name | Type | Description |


@ -1,30 +1,33 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 923 200" width="923" height="200">
<style>
.svg__pipeline__text { fill: #1a1e23; font: 20px Arial, sans-serif }
.svg__pipeline__text-small { fill: #1a1e23; font: bold 18px Arial, sans-serif }
.svg__pipeline__text-code { fill: #1a1e23; font: 600 16px Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace }
</style>
<rect width="601" height="127" x="159" y="21" fill="none" stroke="#09a3d5" stroke-width="3" rx="19.1" stroke-dasharray="3 6" ry="19.1"/>
<path fill="#e1d5e7" stroke="#9673a6" stroke-width="2" d="M801 55h120v60H801z"/>
<text class="svg__pipeline__text" dy="0.75em" width="28" height="19" transform="translate(846.5 75.5)">Doc</text>
<path fill="none" stroke="#999" stroke-width="2" stroke-miterlimit="10" d="M121.2 84.7h29.4"/>
<path fill="#999" stroke="#999" stroke-width="2" stroke-miterlimit="10" d="M156.6 84.7l-8 4 2-4-2-4z"/>
<path fill="#f5f5f5" stroke="#999" stroke-width="2" d="M1 55h120v60H1z"/>
<text class="svg__pipeline__text" dy="0.85em" width="34" height="22" transform="translate(43.5 73.5)">Text</text>
<path fill="none" stroke="#999" stroke-width="2" stroke-miterlimit="10" d="M760 84.7h33"/>
<path fill="#999" stroke="#999" stroke-width="2" stroke-miterlimit="10" d="M799 84.7l-8 4 2-4-2-4z"/>
<rect width="75" height="39" x="422" y="1" fill="#dae8fc" stroke="#09a3d5" stroke-width="2" rx="5.8" ry="5.8"/>
<text class="svg__pipeline__text-code" dy="0.8em" dx="0.1em" width="29" height="17" transform="translate(444.5 11.5)">nlp</text>
<path fill="#f8cecc" stroke="#b85450" stroke-width="2" stroke-miterlimit="10" d="M176 58h103.3L296 88l-16.8 30H176l16.8-30z"/>
<text class="svg__pipeline__text-small" dy="0.75em" dx="-0.25em" width="58" height="14" transform="translate(206.5 80.5)">tokenizer</text>
<path fill="#ffe6cc" stroke="#d79b00" stroke-width="2" stroke-miterlimit="10" d="M314 58h103.3L434 88l-16.8 30H314l16.8-30z"/>
<text class="svg__pipeline__text-small" dy="0.75em" dx="8" width="62" height="14" transform="translate(342.5 80.5)">tagger</text>
<path fill="none" stroke="#999" stroke-width="2" stroke-miterlimit="10" d="M296.5 88.2h24.7"/>
<path fill="#999" stroke="#999" stroke-width="2" stroke-miterlimit="10" d="M327.2 88.2l-8 4 2-4-2-4z"/>
<path fill="#ffe6cc" stroke="#d79b00" stroke-width="2" stroke-miterlimit="10" d="M416 58h103.3L536 88l-16.8 30H416l16.8-30z"/>
<text class="svg__pipeline__text-small" dy="0.75em" dx="-0.25em" width="40" height="14" transform="translate(455.5 80.5)">parser</text>
<path fill="#ffe6cc" stroke="#d79b00" stroke-width="2" stroke-miterlimit="10" d="M519 58h103.3L639 88l-16.8 30H519l16.8-30z"/>
<text class="svg__pipeline__text-small" dy="0.75em" dx="8" width="40" height="14" transform="translate(558.5 80.5)">ner</text>
<path fill="#ffe6cc" stroke="#d79b00" stroke-width="2" stroke-miterlimit="10" d="M622 58h103.3L742 88l-16.8 30H622l16.8-30z"/>
<text class="svg__pipeline__text-small" dy="0.75em" dx="8" width="20" height="14" transform="translate(671.5 80.5)">...</text>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="1155" height="221" viewBox="0 0 1155 221">
<defs>
<rect id="a" width="735" height="170" x="210" y="25" rx="30"/>
<mask id="b" width="735" height="170" x="0" y="0" fill="#fff" maskContentUnits="userSpaceOnUse" maskUnits="objectBoundingBox">
<use xlink:href="#a"/>
</mask>
</defs>
<g fill="none" fill-rule="evenodd" transform="translate(0 26)">
<rect width="145" height="80" x="2.5" y="2.5" fill="#D8D8D8" stroke="#6A6A6A" stroke-width="5" rx="10" transform="translate(0 70)"/>
<path fill="#3D4251" fill-rule="nonzero" d="M55.4 99.7v3.9h-7.6V125H43v-21.4h-7.7v-3.9h20zm10.2 7c1 0 2.1.2 3 .6a6.8 6.8 0 014.1 4.1 9.6 9.6 0 01.6 4.3l-.2.5-.3.3H61.3c0 2 .6 3.3 1.4 4.1.9.9 2 1.3 3.5 1.3a6 6 0 001.8-.2l1.3-.6 1-.5.8-.3c.2 0 .3 0 .5.2l.3.2 1.3 1.6c-.5.6-1 1-1.6 1.4a9 9 0 01-3.9 1.4l-2 .2c-1.2 0-2.3-.2-3.4-.7-1-.4-2-1-2.8-1.8a8.6 8.6 0 01-1.9-3 11.6 11.6 0 010-7.6c.3-1.1.9-2 1.6-2.8a8 8 0 012.7-2 9 9 0 013.7-.6zm0 3.2a4 4 0 00-3 1c-.6.7-1 1.8-1.3 3h8.1c0-.5 0-1-.2-1.5-.1-.5-.4-1-.7-1.3-.3-.4-.7-.7-1.2-1a4 4 0 00-1.7-.2zm15.5 5.8l-5.9-8.7h4.2c.3 0 .5 0 .7.2l.4.4 3.7 6a4.9 4.9 0 01.6-1.2l3-4.7.4-.5.6-.2h4l-6 8.5L93 125h-4.2c-.3 0-.5 0-.7-.2l-.5-.6-3.8-6.3-.4 1.1-3.4 5.2-.5.5a1 1 0 01-.7.3H75l6-9.3zm20.5 9.6c-1.5 0-2.7-.5-3.5-1.3a5 5 0 01-1.3-3.7v-10H95c-.3 0-.5 0-.6-.2-.2-.2-.3-.4-.3-.7v-1.7l2.9-.5 1-5c0-.1 0-.3.2-.5l.7-.2h2.2v5.7h4.7v3h-4.7v9.8c0 .6.2 1 .4 1.3.3.3.7.5 1.2.5l.6-.1a3.7 3.7 0 00.9-.4l.3-.1.3.1.3.3 1.2 2c-.6.6-1.3 1-2.1 1.3a8 8 0 01-2.6.4z"/>
<rect width="145" height="80" x="2.5" y="2.5" fill="#D7CCF4" stroke="#8978B5" stroke-width="5" rx="10" transform="translate(1005 70)"/>
<path fill="#3D4251" fill-rule="nonzero" d="M1050.3 101.5a58.8 58.8 0 016.8-.4c2.2 0 4 .4 5.4 1 1.4.6 2.5 1.5 3.4 2.6a10 10 0 011.7 4 23.2 23.2 0 010 9.6c-.3 1.5-1 2.9-1.8 4-.8 1.3-2 2.2-3.5 3-1.5.7-3.4 1-5.8 1a37.3 37.3 0 01-5-.1l-1.2-.2v-24.5zm7 4a15.6 15.6 0 00-2.3 0V122h.5a158 158 0 001.6.1 6 6 0 003.2-.7c.8-.5 1.4-1.2 1.8-2 .4-.8.7-1.8.8-2.8a27.3 27.3 0 000-5.8 8 8 0 00-.7-2.6c-.4-.8-1-1.5-1.8-2-.7-.5-1.8-.8-3.1-.8zm13.4 11.8c0-1.5.2-2.8.7-4a8 8 0 014.8-4.7c1.1-.4 2.4-.6 3.8-.6 1.5 0 2.8.2 4 .7 1 .4 2 1 2.9 1.8.8.9 1.4 1.8 1.8 3 .4 1.1.6 2.4.6 3.7 0 1.5-.2 2.8-.7 4a8 8 0 01-4.8 4.7c-1.1.4-2.4.6-3.8.6a11 11 0 01-4-.7c-1-.4-2-1-2.9-1.8a7.9 7.9 0 01-1.8-3c-.4-1.1-.6-2.4-.6-3.8zm4.7 0c0 .7.1 1.4.3 2 .2.7.5 1.3 1 1.8a4.1 4.1 0 003.3 1.5c1.4 0 2.5-.4 3.3-1.3.9-.8 1.3-2.2 1.3-4a6 6 0 00-1.2-4c-.8-1-2-1.4-3.4-1.4-.7 0-1.3 0-1.8.3-.6.2-1 .5-1.5 1-.4.4-.7 1-1 1.6-.2.7-.3 1.5-.3 2.4zm34.2 7c-1 .7-2 1.3-3.3 1.6-1.3.4-2.7.6-4 .6-1.6 0-3-.2-4.1-.7-1.2-.4-2.2-1-3-1.8a8 8 0 01-1.8-3 10.9 10.9 0 010-7.7 8.2 8.2 0 015.2-4.7 14.3 14.3 0 017.6-.2l2.6 1v6.1h-3.8v-3.2l-2.2-.3c-.7 0-1.3.1-2 .3a4.8 4.8 0 00-2.9 2.6c-.3.7-.5 1.4-.5 2.3 0 .8.2 1.5.4 2.1a5 5 0 002.8 2.8 8.2 8.2 0 005.6-.2l1.9-1 1.5 3.4z"/>
<use stroke="#3AC" stroke-dasharray="5 10" stroke-width="10" mask="url(#b)" xlink:href="#a"/>
<g transform="translate(540)">
<rect width="95" height="50" x="2.5" y="2.5" fill="#C3E7F1" stroke="#3AC" stroke-width="5" rx="10"/>
<path fill="#3D4251" fill-rule="nonzero" d="M27.8 24.5h4.4l.3 1.6h.1a5.2 5.2 0 014.2-2c.7 0 1.3.1 1.8.3.6.2 1 .4 1.4.8.4.4.7 1 1 1.6.1.6.3 1.5.3 2.4V37H38v-7.1c0-1-.2-1.8-.7-2.2-.4-.5-1-.7-1.7-.7-.6 0-1.2.2-1.7.6-.5.3-.9.8-1 1.3V37h-3.3v-9.8h-1.8v-2.7zm16.9-5H50v11.6c0 1.2.2 2.1.5 2.6s.8.8 1.5.8c.5 0 1 0 1.3-.2l1-.4 1.2 2.2a15.3 15.3 0 01-1.8 1 6.1 6.1 0 01-2.3.3c-1.5 0-2.7-.4-3.5-1.3-.8-.8-1.1-1.9-1.1-3.4V22.3h-2.1v-2.7zm12.8 5h4.3L62 26h.1c.9-1.2 2.3-1.9 4.2-1.9a6 6 0 012.1.4c.7.3 1.2.6 1.7 1.1.4.6.8 1.2 1 2 .3.8.4 1.7.4 2.8 0 1-.1 2-.4 3-.3.8-.7 1.5-1.2 2.1-.6.6-1.2 1-2 1.4-.7.3-1.6.5-2.6.5-.5 0-1 0-1.5-.2-.5 0-1-.2-1.3-.3V42h-3.2V27.2h-1.9v-2.7zm8 2.4c-.7 0-1.3.2-1.8.5s-.9.8-1 1.4V34c.2.2.5.3 1 .4l1.3.2c.4 0 .9 0 1.3-.2s.7-.4 1-.8c.3-.4.6-.8.7-1.3.2-.6.3-1.2.3-2 0-1-.3-1.9-.8-2.5-.6-.6-1.2-.9-2-.9z"/>
</g>
<path fill="#3AC" d="M205 112.5L180 125v-25z"/>
<path stroke="#3AC" stroke-linecap="square" stroke-width="5" d="M180 112.5h-23.1"/>
<path fill="#3AC" d="M1000 112.5L975 125v-25z"/>
<path stroke="#3AC" stroke-linecap="square" stroke-width="5" d="M975 112.5h-23.1"/>
<path fill="#EAC1CC" stroke="#F03969" stroke-linejoin="round" stroke-width="3.8" d="M230 75h135l23.5 43.4L365 160H230l23.5-41.5z"/>
<path fill="#F2D7B2" stroke="#F0A439" stroke-linejoin="round" stroke-width="3.8" d="M395 75h135l23.5 43.4L530 160H395l23.5-41.5z"/>
<path fill="#F2E7A6" stroke="#CDB217" stroke-linejoin="round" stroke-width="3.8" d="M515 75h135l23.5 43.4L650 160H515l23.5-41.5z"/>
<path fill="#D7E99A" stroke="#B2D73A" stroke-linejoin="round" stroke-width="3.8" d="M640 75h135l23.5 43.4L775 160H640l23.5-41.5z"/>
<path fill="#B5F3D4" stroke="#3AD787" stroke-linejoin="round" stroke-width="3.8" d="M765 75h135l23.5 43.4L900 160H765l23.5-41.5z"/>
<path fill="#3D4251" fill-rule="nonzero" d="M265.9 125.2c-1.1 0-2-.3-2.6-1-.6-.6-.9-1.4-.9-2.5v-7.2h-1.3c-.2 0-.3 0-.4-.2-.2 0-.2-.2-.2-.5v-1.2l2-.3.7-3.5.2-.4.5-.2h1.6v4h3.4v2.3h-3.4v7c0 .3 0 .6.3.9.2.2.5.3.8.3h.5a2.6 2.6 0 00.6-.3l.2-.1h.2l.2.3 1 1.5-1.6.8-1.8.3zm10.9-13.2c1 0 1.8.1 2.6.4a5.6 5.6 0 013.3 3.4c.3.8.4 1.8.4 2.8 0 1-.1 1.9-.4 2.7a5.5 5.5 0 01-3.3 3.4 7 7 0 01-2.6.5 7 7 0 01-2.6-.5 5.6 5.6 0 01-3.3-3.4 7.8 7.8 0 010-5.5c.3-.8.7-1.5 1.3-2 .5-.6 1.2-1 2-1.4a7 7 0 012.6-.4zm0 10.8c1 0 1.9-.3 2.4-1 .5-.8.7-1.8.7-3.2 0-1.4-.2-2.4-.7-3.2-.5-.7-1.3-1-2.4-1-1 0-1.9.3-2.4 1-.5.8-.8 1.8-.8 3.2 0 1.4.3 2.4.8 3.1.5.8 1.3 1.1 2.4 1.1zm11.9-16.4v10.7h.5l.5-.1.4-.3 3.2-4 .4-.4.7-.1h2.8l-4 4.7-.4.5-.5.4.4.4.4.6 4.3 6.2h-2.8l-.6-.1c-.2-.1-.3-.2-.4-.5l-3.3-4.8a1 1 0 00-.4-.4h-1.2v5.8h-3.1v-18.6h3zm16 5.6c.7 0 1.5.1 2.2.4a4.9 4.9 0 012.9 3 6.9 6.9 0 01.3 3v.3l-.3.2h-8.3c.1 1.4.5 2.4 1.1 3 .6.6 1.4.9 2.4.9.6 0 1 0 1.3-.2a22 22 0 001.7-.8l.6-.1h.3l.3.3.9 1c-.4.5-.8.8-1.2 1a6.4 6.4 0 01-2.7 1c-.5.2-1 .2-1.4.2-1 0-1.7-.2-2.5-.5s-1.4-.7-2-1.3c-.6-.5-1-1.3-1.4-2.1a8.3 8.3 0 010-5.5 5.7 5.7 0 013.2-3.4c.7-.3 1.6-.4 2.5-.4zm0 2.2c-1 0-1.6.2-2.1.8-.5.5-.9 1.2-1 2.1h5.8c0-.4 0-.8-.2-1.1 0-.4-.2-.7-.5-1l-.8-.6-1.2-.2zm8 10.8v-12.8h1.9c.4 0 .6.2.8.5l.2 1a7 7 0 011.7-1.2 4.6 4.6 0 012.2-.5c.7 0 1.4 0 1.9.3l1.4 1 .8 1.6c.2.6.3 1.2.3 2v8.1h-3.1v-8.2c0-.7-.2-1.4-.6-1.8-.3-.4-.9-.6-1.6-.6l-1.5.3c-.5.3-1 .6-1.3 1v9.3h-3.1zm17.5-12.8V125H327v-12.8h3zm.4-3.8l-.1.8a2 2 0 01-1 1 2 2 0 01-2.2-.4 2 2 0 01-.4-.6l-.2-.8a2 2 0 01.6-1.4 2 2 0 011.3-.5l.8.1a2 2 0 011 1l.3.8zm12.3 5v.7l-.3.5-6.2 8h6.4v2.4h-10v-1.3l.2-.5c0-.2.1-.4.3-.5l6.1-8.2h-6.2v-2.3h9.8v1.3zm7.8-1.4c.8 0 1.6.1 2.2.4a4.9 4.9 0 013 3 6.9 6.9 0 01.3 3v.3l-.3.2h-8.3c.1 1.4.5 2.4 1 3 .7.6 1.5.9 2.5.9.5 0 1 0 1.3-.2a22 22 0 001.7-.8l.6-.1h.3l.3.3.8 1c-.3.5-.7.8-1.1 1a6.4 6.4 0 01-2.7 1c-.5.2-1 .2-1.4.2-1 0-1.8-.2-2.5-.5-.8-.3-1.5-.7-2-1.3-.6-.5-1-1.3-1.4-2.1a8.3 8.3 0 010-5.5 5.7 5.7 0 013.2-3.4c.7-.3 1.6-.4 2.5-.4zm0 2.2c-.8 0-1.5.2-2 
.8-.5.5-.9 1.2-1 2.1h5.8c0-.4 0-.8-.2-1.1 0-.4-.2-.7-.5-1l-.8-.6-1.2-.2zm8 10.8v-12.8h1.9l.6.1c.2.2.3.4.3.7l.2 1.5a6 6 0 011.6-1.9c.6-.4 1.3-.7 2-.7s1.2.2 1.6.5l-.3 2.3-.2.3-.3.1h-.6l-.8-.2c-.7 0-1.2.2-1.7.6a4 4 0 00-1.1 1.5v8h-3.1z"/>
<path fill="#3D4251" fill-rule="nonzero" d="M440.9 125.2c-1.1 0-2-.3-2.6-1-.6-.6-.9-1.4-.9-2.5v-7.2h-1.3c-.2 0-.3 0-.4-.2-.2 0-.2-.2-.2-.5v-1.2l2-.3.7-3.5.2-.4.5-.2h1.6v4h3.4v2.3h-3.4v7c0 .3 0 .6.3.9.2.2.5.3.8.3h.5a2.6 2.6 0 00.6-.3l.2-.1h.2l.2.3 1 1.5-1.6.8-1.8.3zm15.5-.2H455l-.7-.1c-.2-.1-.3-.3-.4-.6l-.3-.9a10.6 10.6 0 01-1.9 1.3 5 5 0 01-1 .4 6.4 6.4 0 01-2.8-.1l-1.2-.7a3 3 0 01-.7-1c-.2-.5-.3-1-.3-1.6 0-.5.1-1 .4-1.4.2-.5.6-.9 1.2-1.3s1.4-.7 2.4-1c1-.2 2.2-.3 3.7-.3v-.8c0-.9-.2-1.5-.6-2-.3-.3-.9-.5-1.6-.5a3.8 3.8 0 00-2 .5l-.8.4c-.2.2-.4.2-.6.2-.3 0-.4 0-.6-.2l-.3-.3-.6-1c1.5-1.4 3.2-2 5.3-2 .8 0 1.4 0 2 .3a4.3 4.3 0 012.5 2.6c.2.6.3 1.3.3 2v8.1zm-6-2h.9a3.3 3.3 0 001.4-.7l.7-.6v-2.2c-1 0-1.7.1-2.3.3a6 6 0 00-1.5.4l-.7.6c-.2.2-.3.5-.3.8 0 .5.2.9.5 1.1.3.3.8.4 1.3.4zm13.5-11l1.5.1 1.3.5h3.7v1.2l-.1.4-.6.2-1.1.3a4 4 0 01.3 1.4 3.8 3.8 0 01-1.5 3c-.4.4-1 .7-1.6.9a6.5 6.5 0 01-3.4.1c-.4.3-.6.5-.6.8 0 .3.2.5.4.6l1 .3h1.3a27.5 27.5 0 013 .3l1.3.5 1 1c.2.3.3.8.3 1.5 0 .5-.2 1-.4 1.6-.3.5-.7 1-1.2 1.4-.6.4-1.2.8-2 1a10.1 10.1 0 01-5.2.1 6 6 0 01-1.7-.7c-.5-.3-.9-.7-1-1.1-.3-.4-.4-.8-.4-1.3 0-.6.1-1 .5-1.5.4-.4.9-.7 1.5-1-.3-.1-.5-.4-.7-.7a2 2 0 01-.3-1.1v-.6l.4-.6.5-.6.8-.5a3.7 3.7 0 01-2-3.5 3.8 3.8 0 011.3-3l1.6-.8c.6-.2 1.3-.3 2-.3zm3.3 13.6c0-.3 0-.5-.2-.6-.1-.2-.3-.3-.6-.4l-1-.2a16.7 16.7 0 00-2.2-.2H462c-.4.1-.6.4-.8.6-.3.3-.4.6-.4 1 0 .2 0 .4.2.6l.5.5 1 .3 1.4.1 1.5-.1c.4-.1.8-.2 1-.4l.7-.5.1-.7zm-3.3-7.3c.3 0 .7 0 1-.2l.7-.4.4-.7.1-.8a2 2 0 00-.5-1.5c-.4-.4-1-.6-1.8-.6-.7 0-1.3.2-1.7.6a2 2 0 00-.5 1.5l.1.8a1.8 1.8 0 001.2 1.1l1 .2zm12.9-6.3l1.5.1 1.4.5h3.7v1.2l-.2.4-.5.2-1.2.3a4 4 0 01.3 1.4 3.8 3.8 0 01-1.4 3c-.5.4-1 .7-1.6.9a6.5 6.5 0 01-3.4.1c-.4.3-.6.5-.6.8 0 .3 0 .5.3.6l1 .3h1.3a27.5 27.5 0 013 .3l1.3.5 1 1c.2.3.3.8.3 1.5 0 .5-.1 1-.4 1.6-.3.5-.7 1-1.2 1.4-.5.4-1.2.8-2 1a10.1 10.1 0 01-5.2.1 6 6 0 01-1.7-.7c-.5-.3-.8-.7-1-1.1-.3-.4-.4-.8-.4-1.3 0-.6.2-1 .5-1.5.4-.4 1-.7 1.6-1-.3-.1-.6-.4-.8-.7a2 2 0 01-.3-1.1l.1-.6.3-.6.6-.6.7-.5a3.7 3.7 0 01-2-3.5 3.8 3.8 
0 011.3-3c.5-.3 1-.6 1.7-.8.6-.2 1.3-.3 2-.3zm3.4 13.6c0-.3-.1-.5-.3-.6-.1-.2-.3-.3-.6-.4l-.9-.2a16.7 16.7 0 00-2.3-.2H475l-.8.6c-.2.3-.3.6-.3 1l.1.6.6.5 1 .3 1.4.1 1.5-.1 1-.4c.3-.1.5-.3.6-.5l.2-.7zm-3.4-7.3c.4 0 .7 0 1-.2.3 0 .5-.2.7-.4l.4-.7.2-.8a2 2 0 00-.6-1.5c-.4-.4-1-.6-1.7-.6-.8 0-1.3.2-1.7.6a2 2 0 00-.6 1.5c0 .3 0 .6.2.8a1.8 1.8 0 001 1.1l1 .2zm13.8-6.3c.8 0 1.5.1 2.2.4a4.9 4.9 0 013 3 6.9 6.9 0 01.3 3l-.1.3-.2.2h-8.3c0 1.4.4 2.4 1 3 .7.6 1.5.9 2.5.9.5 0 1 0 1.3-.2a22 22 0 001.7-.8l.6-.1h.3l.2.3 1 1c-.4.5-.8.8-1.2 1a6.4 6.4 0 01-2.8 1c-.4.2-1 .2-1.4.2-.9 0-1.7-.2-2.4-.5-.8-.3-1.5-.7-2-1.3-.6-.5-1-1.3-1.4-2.1a8.3 8.3 0 010-5.5 5.7 5.7 0 013.2-3.4c.7-.3 1.5-.4 2.5-.4zm0 2.2c-.9 0-1.6.2-2 .8-.6.5-.9 1.2-1 2.1h5.8c0-.4 0-.8-.2-1.1l-.5-1-.8-.6-1.3-.2zm8 10.8v-12.8h1.9l.6.1c.2.2.2.4.3.7l.2 1.5a6 6 0 011.6-1.9c.6-.4 1.3-.7 2-.7s1.2.2 1.6.5l-.4 2.3-.1.3-.4.1h-.5l-.8-.2c-.7 0-1.2.2-1.7.6a4 4 0 00-1.2 1.5v8h-3z"/>
<path fill="#3D4251" fill-rule="nonzero" d="M556.6 129.2v-17h2l.4.1c.2.1.3.2.3.4l.3 1.2c.5-.6 1-1 1.8-1.4a4.8 4.8 0 014.2-.1c.6.3 1.1.7 1.5 1.2a6 6 0 011 2 10.3 10.3 0 010 5.6c-.3.8-.7 1.5-1.1 2a5.1 5.1 0 01-6 1.7l-1.3-1v5.3h-3zm6-14.8c-.6 0-1.1.1-1.6.4-.4.3-.9.6-1.3 1.1v5.8a3 3 0 002.5 1.1c.5 0 .9 0 1.3-.2l1-.8c.2-.4.4-.8.5-1.4a8.6 8.6 0 000-3.8c0-.5-.2-1-.4-1.3a2 2 0 00-.9-.7c-.3-.2-.6-.2-1-.2zm18.2 10.6h-1.3l-.7-.1c-.2-.1-.3-.3-.4-.6l-.3-.9a10.6 10.6 0 01-2 1.3 5 5 0 01-1 .4 6.4 6.4 0 01-2.7-.1c-.5-.2-.9-.4-1.2-.7a3 3 0 01-.8-1c-.2-.5-.3-1-.3-1.6 0-.5.2-1 .4-1.4.3-.5.7-.9 1.3-1.3.6-.4 1.4-.7 2.4-1 1-.2 2.2-.3 3.6-.3v-.8c0-.9-.2-1.5-.5-2-.4-.3-1-.5-1.6-.5a3.8 3.8 0 00-2.1.5l-.7.4c-.2.2-.4.2-.7.2-.2 0-.4 0-.5-.2-.2 0-.3-.2-.4-.3l-.5-1c1.4-1.4 3.2-2 5.3-2a4.3 4.3 0 014.4 3c.2.5.3 1.2.3 1.9v8.1zm-6-2h1a3.3 3.3 0 001.4-.7l.6-.6v-2.2c-.9 0-1.6.1-2.2.3a6 6 0 00-1.5.4l-.8.6-.2.8c0 .5.2.9.5 1.1.3.3.7.4 1.2.4zm9 2v-12.8h1.9l.6.1c.2.2.3.4.3.7l.2 1.5a6 6 0 011.6-1.9c.6-.4 1.3-.7 2-.7s1.2.2 1.6.5l-.4 2.3-.1.3-.4.1h-.5l-.8-.2c-.7 0-1.2.2-1.7.6a4 4 0 00-1.1 1.5v8h-3.1zm17.9-10.3l-.3.3h-.8a32.9 32.9 0 00-1.4-.7h-1c-.6 0-1 0-1.4.3-.4.3-.5.6-.5 1 0 .3 0 .5.2.7l.7.5 1 .4a33 33 0 012.3.8c.4.2.8.4 1 .7.4.2.6.5.8 1l.2 1.2c0 .7 0 1.2-.3 1.7-.2.6-.6 1-1 1.4-.4.4-1 .7-1.6.9a7 7 0 01-3.5.2 7.6 7.6 0 01-2.3-.8l-.8-.7.7-1.1c0-.2.2-.3.3-.4h1a12 12 0 001.4.8l1.2.1h1l.6-.4c.1-.2.3-.3.3-.5l.1-.6c0-.3 0-.6-.2-.8l-.7-.5-1-.3a33.5 33.5 0 01-2.4-.9 4 4 0 01-1-.7 3 3 0 01-.7-1 3.7 3.7 0 011-4.2c.4-.3.9-.6 1.5-.8.6-.2 1.3-.3 2-.3 1 0 1.8.1 2.5.4.7.3 1.3.7 1.8 1.2l-.7 1zm8.6-2.7c.8 0 1.6.1 2.2.4a4.9 4.9 0 013 3 6.9 6.9 0 01.3 3v.3l-.3.2h-8.3c.1 1.4.5 2.4 1 3 .7.6 1.5.9 2.5.9.5 0 1 0 1.3-.2a22 22 0 001.7-.8l.6-.1h.3l.3.3.9 1c-.4.5-.8.8-1.2 1a6.4 6.4 0 01-2.7 1c-.5.2-1 .2-1.4.2-1 0-1.8-.2-2.5-.5-.8-.3-1.5-.7-2-1.3-.6-.5-1-1.3-1.4-2.1a8.3 8.3 0 010-5.5 5.7 5.7 0 013.2-3.4c.7-.3 1.6-.4 2.5-.4zm0 2.2c-.8 0-1.5.2-2 .8-.5.5-.9 1.2-1 2.1h5.8c0-.4 0-.8-.2-1.1 0-.4-.2-.7-.5-1l-.8-.6-1.2-.2zm8 
10.8v-12.8h1.9l.6.1c.2.2.3.4.3.7l.2 1.5a6 6 0 011.6-1.9c.6-.4 1.3-.7 2-.7s1.2.2 1.6.5l-.4 2.3-.1.3-.4.1h-.5l-.8-.2c-.7 0-1.2.2-1.7.6a4 4 0 00-1.1 1.5v8h-3.1z"/>
<path fill="#3D4251" fill-rule="nonzero" d="M701.6 125v-12.8h2c.3 0 .6.2.7.5l.2 1a7 7 0 011.8-1.2 4.6 4.6 0 012.2-.5c.7 0 1.3 0 1.9.3.5.3 1 .6 1.3 1 .4.5.7 1 .8 1.6.2.6.3 1.2.3 2v8.1h-3v-8.2c0-.7-.2-1.4-.6-1.8-.4-.4-1-.6-1.6-.6-.6 0-1 .1-1.5.3l-1.4 1v9.3h-3zm19.6-13c.8 0 1.5.1 2.2.4a4.9 4.9 0 012.9 3 6.9 6.9 0 01.4 3l-.1.3-.3.2H718c.2 1.4.5 2.4 1.1 3 .7.6 1.5.9 2.5.9.5 0 1 0 1.3-.2a22 22 0 001.7-.8l.5-.1h.4l.2.3.9 1c-.3.5-.7.8-1.1 1a6.4 6.4 0 01-2.8 1c-.5.2-1 .2-1.4.2-.9 0-1.7-.2-2.5-.5-.7-.3-1.4-.7-2-1.3-.5-.5-1-1.3-1.3-2.1a8.3 8.3 0 010-5.5 5.7 5.7 0 013.2-3.4c.6-.3 1.5-.4 2.5-.4zm0 2.2c-.9 0-1.6.2-2 .8-.6.5-1 1.2-1 2.1h5.7l-.1-1.1-.5-1-.9-.6-1.2-.2zm8 10.8v-12.8h1.8l.7.1.3.7.1 1.5a6 6 0 011.6-1.9c.7-.4 1.4-.7 2.1-.7.7 0 1.2.2 1.6.5l-.4 2.3c0 .1 0 .2-.2.3l-.3.1h-.5l-.9-.2c-.6 0-1.2.2-1.6.6a4 4 0 00-1.2 1.5v8h-3z"/>
<path fill="#3D4251" fill-rule="nonzero" d="M831 123.3a2 2 0 01.5-1.3 2 2 0 011.3-.6 1.9 1.9 0 011.4.6 1.9 1.9 0 01.3 2 1.8 1.8 0 01-1 1 2 2 0 01-2-.4c-.2-.1-.3-.3-.4-.6a2 2 0 01-.2-.7zm5.5 0a2 2 0 01.6-1.3 2 2 0 011.3-.6 1.9 1.9 0 011.4.6 1.9 1.9 0 01.4 2 1.8 1.8 0 01-1 1 2 2 0 01-2-.4c-.3-.1-.4-.3-.5-.6a2 2 0 01-.2-.7zm5.7 0a2 2 0 01.5-1.3 2 2 0 011.4-.6 1.9 1.9 0 011.3.6 1.9 1.9 0 01.4 2 1.8 1.8 0 01-1 1 2 2 0 01-2-.4c-.3-.1-.4-.3-.5-.6a2 2 0 01-.1-.7z"/>
</g>
</svg>


View File

@ -1,47 +1,60 @@
<svg class="o-svg" xmlns="http://www.w3.org/2000/svg" width="827" height="168" viewBox="-10 -10 837 178">
<style>
.svg__training__text { fill: #1a1e23; font: 18px Arial, sans-serif }
.svg__training__text-code { fill: #1a1e23; font: bold 16px Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace }
</style>
<defs>
<linearGradient id="a" x1="0%" x2="0%" y1="100%" y2="0%">
<stop offset="0%" stop-color="#F99"/>
<stop offset="100%" stop-color="#B3FF66"/>
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="827" height="168" viewBox="0 0 827 168">
<defs>
<linearGradient id="c" x1="0%" x2="100%" y1="0%" y2="100%">
<stop offset="0%" stop-color="#B4FE67"/>
<stop offset="100%" stop-color="#FE9A98"/>
</linearGradient>
</defs>
<path fill="none" stroke="#999" stroke-width="2" stroke-miterlimit="10" d="M561 103h-6v46H251v-35.8"/>
<path fill="#999" stroke="#999" stroke-width="2" stroke-miterlimit="10" d="M251 107.2l4 8-4-2-4 2z"/>
<rect fill="#f6f6f6" transform="translate(372 138.5)" width="80" height="20"/>
<text class="svg__training__text-code" dy="1em" transform="translate(378.5 138.5)" width="65" height="16">PREDICT</text>
<path fill="none" stroke="#999" stroke-width="2" stroke-miterlimit="10" d="M621 73v6h76.8"/>
<path fill="#999" stroke="#999" stroke-width="2" stroke-miterlimit="10" d="M703.8 79l-8 4 2-4-2-4z"/>
<rect fill="#f6f6f6" transform="translate(630.5 68.5)" width="50" height="20"/>
<text class="svg__training__text-code" dy="1em" transform="translate(634.5 68.5)" width="43" height="18">SAVE</text>
<rect width="120" height="60" x="501" y="43" fill="#f5f5f5" stroke="#666" stroke-width="2" rx="9" ry="9"/>
<text class="svg__training__text" dy="0.9em" transform="translate(538.5 63.5)" width="43" height="18">Model</text>
<path fill="none" stroke="#09a3d5" stroke-width="2" stroke-miterlimit="10" d="M121 54h61.8"/>
<path fill="#09a3d5" stroke="#09a3d5" stroke-width="2" stroke-miterlimit="10" d="M188.8 54l-8 4 2-4-2-4z"/>
<path fill="none" stroke="#09a3d5" stroke-width="2" stroke-miterlimit="10" d="M121 19h61.8"/>
<path fill="#09a3d5" stroke="#09a3d5" stroke-width="2" stroke-miterlimit="10" d="M188.8 19l-8 4 2-4-2-4z"/>
<rect width="120" height="71" x="1" y="1" fill="#dae8fc" stroke="#09a3d5" stroke-width="2" rx="10.7" ry="10.7"/>
<text class="svg__training__text" dy="0.9em" transform="translate(10 26.5)" width="93" height="18">Training data</text>
<path fill="none" stroke="#87e02d" stroke-width="2" stroke-miterlimit="10" d="M311 54h51.8"/>
<path fill="#87e02d" stroke="#87e02d" stroke-width="2" stroke-miterlimit="10" d="M368.8 54l-8 4 2-4-2-4z"/>
<path fill="#dae8fc" stroke="#09a3d5" stroke-width="2" d="M191 39h120v30H191z"/>
<text class="svg__training__text" dy="0.9em" transform="translate(232.5 44.5)" width="35" height="18">label</text>
<path fill="none" stroke="#f33" stroke-width="2" stroke-miterlimit="10" d="M311 90h51.8"/>
<path fill="#f33" stroke="#f33" stroke-width="2" stroke-miterlimit="10" d="M368.8 90l-8 4 2-4-2-4z"/>
<path fill="#f5f5f5" stroke="#09a3d5" stroke-width="2" d="M191 75h120v30H191z" stroke-dasharray="2 2"/>
<text class="svg__training__text" dy="0.9em" transform="translate(232.5 80.5)" width="35" height="18">label</text>
<rect width="120" height="60" x="706" y="49" fill="#f5f5f5" stroke="#666" stroke-width="2" rx="9" ry="9"/>
<text class="svg__training__text" dy="0.9em" transform="translate(734.5 59.5)" width="61" height="38">Updated
<tspan dy="1.25em" dx="-3.25em">Model</tspan>
</text>
<path fill="#dae8fc" stroke="#09a3d5" stroke-width="2" d="M191 4h120v30H191z"/>
<text class="svg__training__text" dy="0.9em" transform="translate(236.5 9.5)" width="27" height="18">text</text>
<path fill="none" stroke="#999" stroke-width="2" stroke-miterlimit="10" d="M461 73h31.8"/>
<path fill="#999" stroke="#999" stroke-width="2" stroke-miterlimit="10" d="M498.8 73l-8 4 2-4-2-4z"/>
<path fill="url(#a)" d="M409.5 21L461 72.5 409.5 124 358 72.5z"/>
<text class="svg__training__text-code" dy="0.9em" transform="translate(371.5 64.5)" width="67" height="16">GRADIENT</text>
<rect id="a" width="116" height="29" x="0" y="0" rx="6"/>
<mask id="b" width="116" height="29" x="0" y="0" fill="#fff" maskContentUnits="userSpaceOnUse" maskUnits="objectBoundingBox">
<use xlink:href="#a"/>
</mask>
</defs>
<g fill="none" fill-rule="evenodd">
<path stroke="#979797" stroke-linecap="square" stroke-width="2.2" d="M562.8 118v36.2h-99.9"/>
<path stroke="#979797" stroke-linecap="square" stroke-width="2.2" d="M375 154.6l-110 .1v-27.8"/>
<path fill="#979797" d="M265 117l5 10h-10z"/>
<path fill="#79E000" d="M378 60l-10 5V55z"/>
<path stroke="#79E000" stroke-linecap="square" stroke-width="2.2" d="M367 60.2h-41"/>
<path fill="#979797" d="M502 78l-10 5V73z"/>
<path stroke="#979797" stroke-linecap="square" stroke-width="2.2" d="M491.2 78H475"/>
<path fill="#979797" d="M703 78l-10 5V73z"/>
<path stroke="#979797" stroke-linecap="square" stroke-width="2.2" d="M692.2 78H687"/>
<path stroke="#979797" stroke-linecap="square" stroke-width="2.2" d="M629 77.3h-4.8"/>
<path fill="#FF5D59" d="M378 95l-10 5V90z"/>
<path stroke="#FF5D59" stroke-linecap="square" stroke-width="2.2" d="M367 95.2h-41"/>
<path fill="#3AC" d="M203 27l-10 5V22z"/>
<path stroke="#3AC" stroke-linecap="square" stroke-width="2.2" d="M192 27.2h-41"/>
<path fill="#3AC" d="M203 61l-10 5V56z"/>
<path stroke="#3AC" stroke-linecap="square" stroke-width="2.2" d="M192 61.2h-41"/>
<rect width="117.5" height="73.5" x="25.8" y="8.8" fill="#C3E7F1" stroke="#3AC" stroke-width="3.5" rx="12"/>
<g transform="translate(505 46)">
<rect width="113" height="60" x="1.5" y="1.5" fill="#FFF" stroke="#B7B7B7" stroke-width="3" rx="12"/>
<path fill="#3D4251" fill-rule="nonzero" d="M40 31.6a7.3 7.3 0 01.5 1.2 20.3 20.3 0 01.6-1.2l3.8-7.2.2-.2.1-.2h2.4v13h-2.2v-8.4a10.7 10.7 0 010-1l-3.9 7.3c0 .2-.2.3-.3.4a1 1 0 01-.5.1h-.3a1 1 0 01-.5-.1 1 1 0 01-.4-.4l-4-7.4a8 8 0 010 1V37h-2V24H35.7l.1.2.2.2 3.9 7.2zm14-4a5 5 0 011.9.4 4 4 0 012.3 2.4c.3.6.4 1.2.4 2 0 .7-.1 1.4-.4 2-.2.5-.5 1-.9 1.4a4 4 0 01-1.4 1 5 5 0 01-1.9.3c-.7 0-1.3 0-1.9-.3a4 4 0 01-2.3-2.5c-.3-.5-.4-1.2-.4-2 0-.7.1-1.3.4-2a4 4 0 012.4-2.4c.5-.2 1.1-.3 1.8-.3zm0 7.8c.8 0 1.3-.2 1.7-.7.4-.6.6-1.3.6-2.3a4 4 0 00-.6-2.3c-.4-.5-1-.8-1.7-.8-.8 0-1.4.3-1.7.8-.4.5-.6 1.3-.6 2.3 0 1 .2 1.7.6 2.2.3.6 1 .8 1.7.8zM66.8 37c-.3 0-.5-.1-.6-.4l-.2-.9-.6.6a3.8 3.8 0 01-1.4.7l-1 .1a3 3 0 01-2.4-1.2c-.3-.4-.5-.9-.7-1.5a7.5 7.5 0 010-3.9c.2-.6.5-1.1.8-1.5.4-.5.8-.8 1.3-1 .5-.3 1-.4 1.6-.4a3.2 3.2 0 012.3.9v-4.9h2.3V37h-1.4zm-3-1.6c.5 0 .9-.1 1.2-.3l1-.8V30a2.2 2.2 0 00-1.9-.8c-.3 0-.6 0-.9.2l-.7.5-.4 1-.1 1.4v1.4l.5.9.5.5.8.2zm10.6-7.8c.6 0 1 .1 1.6.3a3.5 3.5 0 012 2.1 5 5 0 01.3 2.2v.3l-.2.1h-6c.1 1 .4 1.7.8 2.1.5.5 1 .7 1.8.7l1-.1.6-.3c.2 0 .4-.2.5-.3h.7l.1.1.7.8c-.3.3-.5.6-.8.7a4.6 4.6 0 01-2 .8h-1a5 5 0 01-1.8-.3 4 4 0 01-1.5-1c-.4-.3-.7-.9-1-1.5a6 6 0 010-3.9c.2-.5.5-1 1-1.4.3-.5.8-.8 1.3-1 .6-.3 1.2-.4 1.9-.4zm0 1.6a2 2 0 00-1.5.6c-.4.3-.6.8-.7 1.5h4.2l-.1-.8-.4-.7-.6-.4a2 2 0 00-.9-.2zm8.1-5.6V37h-2.2V23.6h2.2z"/>
</g>
<g transform="translate(704 46)">
<rect width="113" height="60" x="1.5" y="1.5" fill="#FFF" stroke="#B7B7B7" stroke-width="3" rx="12"/>
<path fill="#3D4251" fill-rule="nonzero" d="M29.6 28c.5 0 1 0 1.3-.2a2.6 2.6 0 001.5-1.7c.2-.4.2-.8.2-1.3V17H35v7.8a6 6 0 01-.3 2.1 4.9 4.9 0 01-2.8 2.8 6 6 0 01-2.3.4 6 6 0 01-2.2-.4 4.9 4.9 0 01-2.8-2.8 6 6 0 01-.3-2.1V17h2.4v7.8c0 .5 0 1 .2 1.3.1.4.3.8.6 1 .2.3.5.6.9.7.4.2.8.2 1.2.2zm7.8 5V20.8H39l.2.4.2.8 1.3-1a3.5 3.5 0 013 0c.5.1.8.4 1.1.8.3.4.6 1 .7 1.5a7.4 7.4 0 010 4c-.2.5-.4 1-.8 1.5a3.7 3.7 0 01-4.2 1.1l-1-.7V33h-2.2zm4.3-10.6c-.5 0-.9 0-1.2.2l-.9.9v4.1a2.1 2.1 0 001.8.8l.9-.1.7-.6.4-1 .2-1.4-.1-1.4-.4-.9-.6-.5-.8-.1zM54.1 30c-.3 0-.5-.1-.6-.4l-.2-.9-.6.6a3.8 3.8 0 01-1.5.7l-.9.1A3 3 0 0148 29c-.3-.4-.6-.9-.7-1.5a7.5 7.5 0 010-3.9c.2-.6.5-1.1.8-1.5.4-.5.8-.8 1.3-1 .4-.3 1-.4 1.6-.4a3.2 3.2 0 012.3.9v-4.9h2.2V30h-1.3zm-3-1.6c.5 0 .9-.1 1.2-.3l.9-.8V23a2.2 2.2 0 00-1.8-.8c-.3 0-.6 0-.9.2l-.7.5-.4 1-.2 1.4c0 .6 0 1 .2 1.4 0 .4.2.7.3.9l.6.5.8.2zm14 1.6h-1a1 1 0 01-.5 0l-.3-.5-.2-.6a7.6 7.6 0 01-1.4 1l-.8.2a4.6 4.6 0 01-2-.1l-.8-.5-.5-.8c-.2-.3-.2-.6-.2-1s0-.7.2-1c.2-.4.5-.7 1-1 .4-.3 1-.5 1.7-.7a12 12 0 012.6-.3v-.5c0-.7-.1-1.1-.4-1.4-.3-.3-.6-.4-1.1-.4a2.8 2.8 0 00-1.6.4l-.5.2a1 1 0 01-.4.2l-.4-.1a1 1 0 01-.3-.3l-.4-.7c1-1 2.3-1.5 3.8-1.5.6 0 1 0 1.5.3a3 3 0 011.7 1.8l.3 1.5V30zm-4.4-1.4h.7a2.4 2.4 0 001-.5l.5-.5v-1.5l-1.6.1-1 .3-.6.5a1 1 0 00-.2.5c0 .4.1.7.3.9l1 .2zm9.6 1.5c-.8 0-1.4-.2-1.9-.6-.4-.5-.6-1.1-.6-2v-5h-1l-.3-.2V21l1.4-.2.5-2.5.1-.3H70v2.8h2.4v1.6H70v5c0 .3 0 .5.2.7l.6.3.3-.1a2 2 0 00.5-.2H71.9l.1.1.7 1.1c-.3.3-.7.5-1.1.6-.4.2-.9.2-1.3.2zm7.7-9.5c.6 0 1.1.1 1.6.3a3.5 3.5 0 012.1 2.1 5 5 0 01.3 2.2v.3l-.2.1h-6c0 1 .3 1.7.8 2.1.4.5 1 .7 1.7.7l1-.1.7-.3.5-.3H81l.2.1.6.8c-.2.3-.5.6-.8.7a4.6 4.6 0 01-2 .8h-1a5 5 0 01-1.8-.3 4 4 0 01-1.4-1c-.4-.3-.7-.9-1-1.5a6 6 0 010-3.9c.2-.5.5-1 .9-1.4.4-.5.8-.8 1.4-1 .5-.3 1.1-.4 1.8-.4zm0 1.6a2 2 0 00-1.4.6c-.4.3-.6.8-.7 1.5H80v-.8l-.4-.7-.7-.4a2 2 0 00-.8-.2zM90.4 30c-.3 0-.4-.1-.5-.4l-.2-.9-.6.6a3.8 3.8 0 01-1.5.7l-.9.1a3 3 0 01-2.5-1.2c-.3-.4-.5-.9-.7-1.5a7.5 7.5 0 010-3.9c.3-.6.5-1.1.9-1.5.3-.5.7-.8 1.2-1 
.5-.3 1-.4 1.7-.4a3.2 3.2 0 012.3.9v-4.9h2.2V30h-1.4zm-3-1.6c.5 0 1-.1 1.3-.3l.9-.8V23a2.2 2.2 0 00-1.8-.8c-.3 0-.7 0-1 .2l-.6.5-.5 1-.1 1.4.1 1.4c.1.4.2.7.4.9l.6.5.8.2zM40 42.6a7.3 7.3 0 01.5 1.2 20.3 20.3 0 01.6-1.2l3.8-7.2.2-.2.1-.2h2.4v13h-2.2v-8.4a10.7 10.7 0 010-1l-3.9 7.3c0 .2-.2.3-.3.4a1 1 0 01-.5.1h-.3a1 1 0 01-.5-.1 1 1 0 01-.4-.4l-4-7.4a8 8 0 010 1V48h-2V35H35.7l.1.2.2.2 3.9 7.2zm14-4a5 5 0 011.9.4 4 4 0 012.3 2.4c.3.6.4 1.2.4 2 0 .7-.1 1.4-.4 2-.2.5-.5 1-.9 1.4a4 4 0 01-1.4 1 5 5 0 01-1.9.3c-.7 0-1.3 0-1.9-.3a4 4 0 01-2.3-2.5c-.3-.5-.4-1.2-.4-2 0-.7.1-1.3.4-2a4 4 0 012.4-2.4c.5-.2 1.1-.3 1.8-.3zm0 7.8c.8 0 1.3-.2 1.7-.7.4-.6.6-1.3.6-2.3a4 4 0 00-.6-2.3c-.4-.5-1-.8-1.7-.8-.8 0-1.4.3-1.7.8-.4.5-.6 1.3-.6 2.3 0 1 .2 1.7.6 2.2.3.6 1 .8 1.7.8zM66.8 48c-.3 0-.5-.1-.6-.4l-.2-.9-.6.6a3.8 3.8 0 01-1.4.7l-1 .1a3 3 0 01-2.4-1.2c-.3-.4-.5-.9-.7-1.5a7.5 7.5 0 010-3.9c.2-.6.5-1.1.8-1.5.4-.5.8-.8 1.3-1 .5-.3 1-.4 1.6-.4a3.2 3.2 0 012.3.9v-4.9h2.3V48h-1.4zm-3-1.6c.5 0 .9-.1 1.2-.3l1-.8V41a2.2 2.2 0 00-1.9-.8c-.3 0-.6 0-.9.2l-.7.5-.4 1-.1 1.4v1.4l.5.9.5.5.8.2zm10.6-7.8c.6 0 1 .1 1.6.3a3.5 3.5 0 012 2.1 5 5 0 01.3 2.2v.3l-.2.1h-6c.1 1 .4 1.7.8 2.1.5.5 1 .7 1.8.7l1-.1.6-.3c.2 0 .4-.2.5-.3h.7l.1.1.7.8c-.3.3-.5.6-.8.7a4.6 4.6 0 01-2 .8h-1a5 5 0 01-1.8-.3 4 4 0 01-1.5-1c-.4-.3-.7-.9-1-1.5a6 6 0 010-3.9c.2-.5.5-1 1-1.4.3-.5.8-.8 1.3-1 .6-.3 1.2-.4 1.9-.4zm0 1.6a2 2 0 00-1.5.6c-.4.3-.6.8-.7 1.5h4.2l-.1-.8-.4-.7-.6-.4a2 2 0 00-.9-.2zm8.1-5.6V48h-2.2V34.6h2.2z"/>
</g>
<g transform="translate(207 12)">
<rect width="113.5" height="26.5" x="1.3" y="1.3" fill="#C3E7F1" stroke="#3AC" stroke-width="2.5" rx="6"/>
<path fill="#3D4251" fill-rule="nonzero" d="M50 8v2h-4v11h-2.4V10h-3.9V8H50zm5.2 3.6c.6 0 1.1.1 1.6.3A3.5 3.5 0 0159 14a5 5 0 01.3 2.2l-.1.3-.2.1H53c0 1 .3 1.7.7 2.1.5.5 1 .7 1.8.7l1-.1.6-.3c.2 0 .4-.2.5-.3h.7l.2.1.6.8c-.2.3-.5.6-.8.7a4.6 4.6 0 01-2 .8h-1a5 5 0 01-1.8-.3 4 4 0 01-1.4-1c-.4-.3-.7-.9-1-1.5a6 6 0 010-3.9c.2-.5.5-1 .9-1.4.3-.5.8-.8 1.3-1 .6-.3 1.2-.4 2-.4zm0 1.6a2 2 0 00-1.5.6c-.3.3-.6.8-.7 1.5h4.2l-.1-.8-.4-.7c-.1-.2-.3-.3-.6-.4a2 2 0 00-.8-.2zm8 3l-3-4.4h2.5l.2.3 2 3a2.5 2.5 0 01.2-.6l1.5-2.4.3-.3h2.3l-3 4.3 3.1 4.9h-2.1l-.4-.1-.3-.3-2-3.2-.1.5-1.8 2.7-.2.3-.4.1h-2l3.2-4.8zm10.6 5c-.8 0-1.4-.3-1.9-.7-.4-.5-.6-1.1-.6-2v-5h-1l-.3-.2-.1-.3v-1l1.5-.2.4-2.5.2-.3h1.5v2.8h2.4v1.6h-2.4v5c0 .3 0 .5.2.7.2.2.3.3.6.3l.3-.1a2 2 0 00.5-.2H75.4l.1.1.7 1.1c-.3.3-.7.5-1.1.6-.5.2-.9.2-1.3.2z"/>
</g>
<g transform="translate(207 46)">
<rect width="113.5" height="26.5" x="1.3" y="1.3" fill="#C3E7F1" stroke="#3AC" stroke-width="2.5" rx="6"/>
<path fill="#3D4251" fill-rule="nonzero" d="M40 19h5.2v2h-7.6V8H40v11zm14 2h-1a1 1 0 01-.5 0l-.3-.5-.2-.6a7.6 7.6 0 01-1.4 1l-.7.2a4.6 4.6 0 01-2-.1l-.9-.5-.5-.8-.2-1c0-.4 0-.7.3-1 .1-.4.5-.7.9-1 .4-.3 1-.5 1.7-.7a12 12 0 012.6-.3v-.5c0-.7-.1-1.1-.4-1.4-.2-.3-.6-.5-1.1-.5a2.8 2.8 0 00-1.5.5l-.5.2a1 1 0 01-.5.2l-.4-.1a1 1 0 01-.3-.3l-.4-.7c1-1 2.4-1.5 3.9-1.5.5 0 1 0 1.4.3a3 3 0 011.8 1.8l.2 1.5V21zm-4.3-1.4h.6a2.4 2.4 0 001-.5l.5-.5v-1.5l-1.6.1-1 .3-.6.5a1 1 0 00-.2.5c0 .4.1.7.4.9l.9.2zm6.5 1.4V7.6h2.2V13l1.3-1a3.6 3.6 0 013 0c.4.2.7.5 1 1 .3.3.6.8.7 1.4a7.4 7.4 0 010 4c-.2.5-.4 1-.8 1.5a3.7 3.7 0 01-2.9 1.3 3.3 3.3 0 01-1.4-.3c-.2 0-.4-.2-.5-.4l-.5-.5v.7l-.3.3-.3.1h-1.5zm4.3-7.7l-1.2.3-.9.9v4.1a2.1 2.1 0 001.8.8l1-.1.6-.6.4-1c.2-.4.2-.9.2-1.4l-.1-1.4-.4-.9-.6-.5-.8-.2zm9.7-1.7c.6 0 1.2.1 1.7.3a3.5 3.5 0 012 2.1 5 5 0 01.3 2.2v.3l-.2.1h-6c0 1 .3 1.7.8 2.1.4.5 1 .7 1.8.7l.9-.1.7-.3c.2 0 .3-.2.5-.3H73.3l.2.1.7.8c-.3.3-.6.6-.9.7a4.6 4.6 0 01-2 .8h-1a5 5 0 01-1.7-.3 4 4 0 01-1.5-1c-.4-.3-.7-.9-1-1.5a6 6 0 010-3.9c.2-.5.5-1 .9-1.4.4-.5.8-.8 1.4-1 .5-.3 1.2-.4 1.8-.4zm0 1.6a2 2 0 00-1.4.6c-.4.3-.6.8-.7 1.5h4.1v-.8l-.4-.7-.6-.4a2 2 0 00-1-.2zm8.2-5.6V21h-2.2V7.6h2.2z"/>
</g>
<g transform="translate(207 80)">
<use stroke="#3AC" stroke-dasharray="3 3" stroke-width="5" mask="url(#b)" xlink:href="#a"/>
<path fill="#3D4251" fill-rule="nonzero" d="M40 19h5.2v2h-7.6V8H40v11zm14 2h-1a1 1 0 01-.5 0l-.3-.5-.2-.6a7.6 7.6 0 01-1.4 1l-.7.2a4.6 4.6 0 01-2-.1l-.9-.5-.5-.8-.2-1c0-.4 0-.7.3-1 .1-.4.5-.7.9-1 .4-.3 1-.5 1.7-.7a12 12 0 012.6-.3v-.5c0-.7-.1-1.1-.4-1.4-.2-.3-.6-.5-1.1-.5a2.8 2.8 0 00-1.5.5l-.5.2a1 1 0 01-.5.2l-.4-.1a1 1 0 01-.3-.3l-.4-.7c1-1 2.4-1.5 3.9-1.5.5 0 1 0 1.4.3a3 3 0 011.8 1.8l.2 1.5V21zm-4.3-1.4h.6a2.4 2.4 0 001-.5l.5-.5v-1.5l-1.6.1-1 .3-.6.5a1 1 0 00-.2.5c0 .4.1.7.4.9l.9.2zm6.5 1.4V7.6h2.2V13l1.3-1a3.6 3.6 0 013 0c.4.2.7.5 1 1 .3.3.6.8.7 1.4a7.4 7.4 0 010 4c-.2.5-.4 1-.8 1.5a3.7 3.7 0 01-2.9 1.3 3.3 3.3 0 01-1.4-.3c-.2 0-.4-.2-.5-.4l-.5-.5v.7l-.3.3-.3.1h-1.5zm4.3-7.7l-1.2.3-.9.9v4.1a2.1 2.1 0 001.8.8l1-.1.6-.6.4-1c.2-.4.2-.9.2-1.4l-.1-1.4-.4-.9-.6-.5-.8-.2zm9.7-1.7c.6 0 1.2.1 1.7.3a3.5 3.5 0 012 2.1 5 5 0 01.3 2.2v.3l-.2.1h-6c0 1 .3 1.7.8 2.1.4.5 1 .7 1.8.7l.9-.1.7-.3c.2 0 .3-.2.5-.3H73.3l.2.1.7.8c-.3.3-.6.6-.9.7a4.6 4.6 0 01-2 .8h-1a5 5 0 01-1.7-.3 4 4 0 01-1.5-1c-.4-.3-.7-.9-1-1.5a6 6 0 010-3.9c.2-.5.5-1 .9-1.4.4-.5.8-.8 1.4-1 .5-.3 1.2-.4 1.8-.4zm0 1.6a2 2 0 00-1.4.6c-.4.3-.6.8-.7 1.5h4.1v-.8l-.4-.7-.6-.4a2 2 0 00-1-.2zm8.2-5.6V21h-2.2V7.6h2.2z"/>
</g>
<path fill="#3D4251" fill-rule="nonzero" d="M61.5 31v2h-4v11h-2.4V33h-3.9v-2h10.3zm1.4 13v-9.2h1.8l.1.5.2 1.1c.3-.5.7-1 1.1-1.3.5-.3 1-.5 1.5-.5s.9.1 1.2.3l-.3 1.7-.1.2H68h-1c-.4 0-.8 0-1.2.3a3 3 0 00-.8 1.1V44H63zm14.6 0h-1a1 1 0 01-.5 0l-.3-.5-.2-.6a7.6 7.6 0 01-1.4 1l-.7.2a4.6 4.6 0 01-2-.1l-.9-.5-.5-.8c-.2-.3-.2-.6-.2-1s0-.7.3-1c.1-.4.4-.7.9-1 .4-.3 1-.5 1.7-.7a12 12 0 012.6-.3v-.5c0-.7-.1-1.1-.4-1.4-.2-.3-.6-.4-1.1-.4a2.8 2.8 0 00-1.5.4l-.5.2a1 1 0 01-.5.2l-.4-.1a1 1 0 01-.3-.3l-.4-.7c1-1 2.4-1.5 3.9-1.5.5 0 1 0 1.4.3a3 3 0 011.7 1.8l.3 1.5V44zm-4.3-1.4h.6a2.4 2.4 0 001-.5l.5-.5v-1.5l-1.6.1-1 .3-.6.5a1 1 0 00-.2.5c0 .4.1.7.4.9l.9.2zm8.8-7.8V44h-2.2v-9.2H82zm.4-2.7c0 .2 0 .3-.2.5a1.5 1.5 0 01-.7.8 1.4 1.4 0 01-1.6-.3c0-.2-.2-.3-.3-.5v-.5a1.4 1.4 0 01.3-1 1.4 1.4 0 011-.4h.6a1.5 1.5 0 01.7.8l.2.6zM84.5 44v-9.2H86c.3 0 .5.1.6.4l.1.7a5 5 0 011.3-1 3.3 3.3 0 011.6-.3c.5 0 1 .1 1.3.3.4.1.7.4 1 .7.3.3.4.7.6 1.1l.2 1.4V44h-2.2v-5.9c0-.5-.2-1-.4-1.3-.3-.3-.7-.4-1.2-.4-.4 0-.7 0-1 .2l-1 .7V44h-2.3zm12.6-9.2V44H95v-9.2h2.2zm.4-2.7l-.1.5a1.5 1.5 0 01-.8.8A1.4 1.4 0 0195 33c0-.2-.2-.3-.3-.5v-.5a1.4 1.4 0 01.3-1 1.4 1.4 0 011-.4h.6a1.5 1.5 0 01.8.8v.6zM99.7 44v-9.2h1.3c.3 0 .5.1.6.4l.1.7a5 5 0 011.3-1 3.3 3.3 0 011.6-.3c.5 0 1 .1 1.3.3.4.1.7.4 1 .7.3.3.5.7.6 1.1l.2 1.4V44h-2.2v-5.9c0-.5-.2-1-.4-1.3-.3-.3-.7-.4-1.2-.4-.4 0-.7 0-1 .2l-1 .7V44h-2.2zm13.4-9.4l1.1.1 1 .4h2.6v.8l-.1.3-.4.2-.8.1a2.9 2.9 0 01.2 1 2.7 2.7 0 01-1 2.3l-1.2.6a4.7 4.7 0 01-2.4 0c-.3.3-.5.4-.5.7 0 .2.1.3.3.4l.7.2h1a19.8 19.8 0 012 .2c.4.1.8.2 1 .4l.7.7c.2.2.3.6.3 1l-.3 1.2c-.2.4-.5.7-.9 1-.4.4-.9.6-1.4.8a7.3 7.3 0 01-3.7 0c-.5 0-1-.3-1.3-.5l-.8-.8-.2-.9c0-.4.1-.8.4-1 .2-.4.6-.6 1-.8l-.5-.5-.2-.8v-.4l.3-.5.4-.4.5-.3a2.7 2.7 0 01-1.5-2.5 2.7 2.7 0 011-2.2l1.2-.6a5 5 0 011.5-.2zm2.4 9.8l-.1-.4a1 1 0 00-.5-.3l-.6-.1a12 12 0 00-1.7-.1l-.9-.1a2 2 0 00-.6.5 1 1 0 00-.2.6l.1.5.4.3.7.3h2.1c.3 0 .6-.2.8-.3l.4-.4.1-.5zm-2.4-5.2c.3 0 .5 0 .7-.2.2 0 .4-.1.5-.3l.3-.4.1-.7c0-.4-.1-.8-.4-1-.3-.3-.7-.4-1.2-.4-.6 0-1 
0-1.2.4-.3.2-.5.6-.5 1l.1.6a1.3 1.3 0 00.9.8l.7.2zM74 62c-.2 0-.4-.1-.5-.4l-.2-.9-.6.6a3.8 3.8 0 01-1.5.7l-.9.1A3 3 0 0168 61c-.3-.4-.6-.9-.7-1.5a7.5 7.5 0 010-3.9c.2-.6.4-1.1.8-1.5.3-.5.8-.8 1.2-1 .5-.3 1-.4 1.7-.4a3.2 3.2 0 012.3.9v-4.9h2.2V62h-1.3zm-3-1.6c.6 0 1-.1 1.3-.3l.9-.8V55a2.2 2.2 0 00-1.8-.8c-.3 0-.6 0-1 .2l-.6.5-.4 1-.2 1.4.1 1.4c.1.4.2.7.4.9l.6.5.8.2zm14 1.6h-1a1 1 0 01-.4 0c-.2-.2-.3-.3-.3-.5l-.2-.6a7.6 7.6 0 01-1.4 1l-.8.2a4.6 4.6 0 01-2-.1l-.8-.5-.5-.8c-.2-.3-.2-.6-.2-1s0-.7.2-1c.2-.4.5-.7 1-1 .4-.3 1-.5 1.6-.7a12 12 0 012.7-.3v-.5c0-.7-.2-1.1-.4-1.4-.3-.3-.7-.4-1.2-.4a2.8 2.8 0 00-1.5.4l-.5.2a1 1 0 01-.5.2l-.3-.1a1 1 0 01-.3-.3l-.4-.7c1-1 2.3-1.5 3.8-1.5.6 0 1 0 1.5.3a3 3 0 011.7 1.8c.2.5.2 1 .2 1.5V62zm-4.3-1.4h.7a2.4 2.4 0 001-.5l.5-.5v-1.5l-1.6.1-1 .3-.7.5a1 1 0 00-.1.5c0 .4 0 .7.3.9l1 .2zm9.6 1.5c-.8 0-1.4-.2-1.9-.6-.4-.5-.6-1.1-.6-2v-5h-1l-.3-.2-.1-.3v-1l1.5-.2.5-2.5.1-.3h1.5v2.8h2.4v1.6h-2.4v5c0 .3 0 .5.2.7.2.2.3.3.6.3l.3-.1a2 2 0 00.5-.2H92l.1.1.7 1.1c-.3.3-.7.5-1.1.6-.4.2-.9.2-1.3.2zm11.1-.1h-1a1 1 0 01-.5 0l-.2-.5-.2-.6a7.6 7.6 0 01-1.4 1l-.8.2a4.6 4.6 0 01-2-.1l-.8-.5-.6-.8-.2-1c0-.4.1-.7.3-1 .2-.4.5-.7 1-1 .4-.3 1-.5 1.6-.7a12 12 0 012.7-.3v-.5c0-.7-.2-1.1-.4-1.4-.3-.3-.7-.4-1.2-.4a2.8 2.8 0 00-1.5.4l-.5.2a1 1 0 01-.5.2L95 55a1 1 0 01-.2-.3l-.4-.7c1-1 2.3-1.5 3.8-1.5.6 0 1 0 1.5.3a3 3 0 011.7 1.8c.2.5.2 1 .2 1.5V62zm-4.3-1.4h.6a2.4 2.4 0 001-.5l.6-.5v-1.5l-1.7.1-1 .3-.6.5a1 1 0 00-.1.5c0 .4 0 .7.3.9l1 .2z"/>
<path fill="url(#c)" d="M384.1 42.1h73v73h-73z" transform="rotate(45 420.6 78.6)"/>
<path fill="#3D4251" fill-rule="nonzero" d="M393.4 80.2a6 6 0 002.6-.5v-2.4h-1.6l-.4-.1-.1-.4v-1.3h4.3v5.2a7.2 7.2 0 01-3.5 1.4h-1.5c-1 0-1.8-.1-2.6-.5a6.2 6.2 0 01-3.4-3.4c-.4-.9-.5-1.7-.5-2.7 0-1 .1-1.9.5-2.7a6 6 0 013.4-3.5c.9-.3 1.8-.5 2.9-.5 1 0 2 .2 2.7.5.8.3 1.4.7 2 1.2l-.7 1.1c-.2.3-.3.4-.6.4-.1 0-.3 0-.4-.2a34.3 34.3 0 00-1.3-.6 5.4 5.4 0 00-3.6 0c-.5.2-1 .6-1.3 1-.4.4-.6.8-.8 1.4-.2.6-.3 1.2-.3 1.9s0 1.4.3 2c.2.6.5 1 .9 1.5.3.4.8.7 1.3.9.5.2 1.1.3 1.7.3zm7 1.8v-9.2h1.7l.2.5.1 1.1c.4-.5.7-1 1.2-1.3.4-.3 1-.5 1.5-.5.4 0 .8.1 1.1.3l-.3 1.7v.2H404.5c-.5 0-.9 0-1.2.3a3 3 0 00-.8 1.1V82h-2.3zm14.5 0h-1a1 1 0 01-.5 0l-.2-.5-.2-.6a7.6 7.6 0 01-1.4 1l-.8.2a4.6 4.6 0 01-2-.1l-.8-.5c-.3-.2-.4-.5-.6-.8l-.2-1c0-.4.1-.7.3-1 .2-.4.5-.7 1-1 .3-.3.9-.5 1.6-.7a12 12 0 012.6-.3v-.5c0-.7 0-1.1-.3-1.4-.3-.3-.7-.5-1.2-.5a2.8 2.8 0 00-1.5.5l-.5.2a1 1 0 01-.5.2l-.4-.1a1 1 0 01-.2-.3l-.4-.7c1-1 2.3-1.5 3.8-1.5.5 0 1 0 1.4.3a3 3 0 011.8 1.8l.2 1.5V82zm-4.3-1.4h.6a2.4 2.4 0 001-.5l.5-.5v-1.5l-1.6.1-1 .3-.6.5a1 1 0 00-.2.5c0 .4.2.7.4.9l.9.2zm13 1.4c-.3 0-.5-.1-.6-.4l-.1-.9-.6.6a3.8 3.8 0 01-1.5.7l-1 .1a3 3 0 01-2.4-1.2c-.3-.4-.5-.9-.7-1.5a7.5 7.5 0 010-3.9c.2-.6.5-1.1.8-1.5.4-.5.8-.8 1.3-1 .5-.3 1-.4 1.6-.4a3.2 3.2 0 012.3.9v-4.9h2.3V82h-1.4zm-3-1.6c.5 0 .9-.1 1.2-.3l1-.8V75a2.2 2.2 0 00-1.8-.8c-.4 0-.7 0-1 .2l-.7.5-.4 1-.1 1.4v1.4l.5.9c.1.2.3.4.6.5l.7.2zm9.1-7.6V82h-2.2v-9.2h2.2zm.4-2.7c0 .2 0 .3-.2.5a1.5 1.5 0 01-.7.8 1.4 1.4 0 01-1.6-.3c0-.2-.2-.3-.3-.5v-.5a1.4 1.4 0 01.3-1 1.4 1.4 0 011-.4h.6a1.5 1.5 0 01.7.8l.2.6zm6 2.5l1.6.3a3.5 3.5 0 012 2.1 5 5 0 01.3 2.2v.3l-.2.1h-6c.1 1 .4 1.7.8 2.2.4.4 1 .6 1.8.6l.9-.1c.3 0 .5-.2.7-.3.2 0 .3-.2.5-.3h.7l.1.1.7.8c-.3.3-.5.6-.9.7a4.6 4.6 0 01-2 .8h-1a5 5 0 01-1.7-.3 4 4 0 01-1.5-1c-.4-.3-.7-.9-1-1.5a6 6 0 010-3.9c.2-.5.5-1 .9-1.4.4-.5.9-.8 1.4-1 .5-.3 1.2-.4 1.9-.4zm0 1.6a2 2 0 00-1.5.6c-.4.3-.6.8-.7 1.5h4.2c0-.3 0-.5-.2-.8l-.3-.7-.6-.4a2 2 0 00-.9-.2zm5.8 7.8v-9.2h1.3c.3 0 .5.1.6.4l.1.7a5 5 0 011.3-1 3.3 3.3 0 011.6-.3c.5 0 1 .1 
1.3.3.4.1.8.4 1 .7.3.3.5.7.6 1.1l.2 1.4V82h-2.2v-5.9c0-.5-.1-1-.4-1.3-.3-.3-.7-.5-1.2-.5-.4 0-.7.1-1 .3l-1 .7V82h-2.2zm13.2.1c-.8 0-1.4-.2-1.8-.6-.4-.5-.7-1.1-.7-2v-5h-.9l-.3-.2-.1-.3v-1l1.4-.2.5-2.5c0-.1 0-.2.2-.3h1.5v2.8h2.4v1.6h-2.4v5c0 .3 0 .5.2.7.1.2.3.3.6.3l.3-.1a2 2 0 00.4-.2H456.7l.2.1.7 1.1c-.4.3-.7.5-1.2.6-.4.2-.8.2-1.3.2z"/>
<rect width="80" height="18" x="378" y="145" fill="#37BBAB" rx="9"/>
<g transform="translate(631 69)">
<rect width="52" height="18" x="1" fill="#37BBAB" rx="9"/>
<path fill="#FFF" fill-rule="nonzero" d="M13.6 5.5c0 .2 0 .2-.2.3H12.8a12.2 12.2 0 00-1.1-.6l-.9-.1H10l-.5.4c-.2 0-.3.2-.4.4v.6c0 .3 0 .5.2.7l.6.5.8.3a41.9 41.9 0 012 .7l.8.6a2.6 2.6 0 01.9 2c0 .6-.1 1-.3 1.5a3.4 3.4 0 01-2 2c-.5.2-1.1.3-1.7.3a5.5 5.5 0 01-3.8-1.5l.6-1 .2-.2H8a13 13 0 001.3.8l1 .2c.6 0 1.1-.2 1.4-.5.4-.3.5-.7.5-1.2 0-.3 0-.6-.2-.8-.1-.2-.3-.3-.6-.4l-.8-.4a28.4 28.4 0 01-2-.7L8 9l-.7-1A3.4 3.4 0 018 4.4a4.3 4.3 0 012.8-1c.7 0 1.3.1 1.9.3.6.2 1 .5 1.5 1l-.6 1zM26.2 15h-1.7a.7.7 0 01-.7-.5l-.9-2.3h-4.8l-.8 2.3a.8.8 0 01-.7.5h-1.7l4.5-11.6h2.2L26.2 15zm-7.5-4.4h3.7L21 6.8a17.6 17.6 0 01-.5-1.4 26.7 26.7 0 01-.4 1.4l-1.4 3.8zm7.5-7.2H28a.7.7 0 01.7.5l2.7 7a9.5 9.5 0 01.5 1.7l.5-1.6L35 4l.2-.4.5-.2h1.7L33 15h-2L26.2 3.4zm19.8 0v1.7h-5v3.3h4V10h-4v3.3h5V15h-7.3V3.4H46z"/>
</g>
<path fill="#FFF" fill-rule="nonzero" d="M389 156v4h-2.2v-11.6h3.8c.7 0 1.4.1 2 .3.5.2 1 .4 1.4.8l.8 1.1.3 1.5c0 .6-.1 1-.3 1.6l-.9 1.2a4 4 0 01-1.4.7c-.5.2-1.2.3-2 .3H389zm0-1.8h1.6l1-.1.7-.4.5-.7a2.6 2.6 0 000-1.7l-.5-.7a2 2 0 00-.7-.4l-1-.1H389v4.1zm10 1.3v4.5h-2.2v-11.6h3.5c.8 0 1.5.1 2 .3.6.1 1 .4 1.4.7.4.3.7.6.8 1a3.5 3.5 0 01-.4 3.4l-.8.8-1 .5.6.6 3 4.3h-2a1 1 0 01-.5-.1 1 1 0 01-.3-.3l-2.4-3.7-.3-.3a1 1 0 00-.5-.1h-1zm0-1.6h1.3l1-.1.8-.4.4-.7.2-.8c0-.6-.2-1-.6-1.3-.4-.3-1-.5-1.8-.5H399v3.8zm15.5-5.5v1.7h-5.1v3.3h4v1.6h-4v3.3h5.1v1.7h-7.3v-11.6h7.3zm12.1 5.8c0 .9-.1 1.6-.4 2.4a5.4 5.4 0 01-3 3c-.7.3-1.5.4-2.4.4h-4.4v-11.6h4.4c.9 0 1.7.2 2.4.5a5.4 5.4 0 013 3c.3.7.4 1.5.4 2.3zm-2.2 0c0-.6 0-1.2-.2-1.7s-.4-1-.7-1.3c-.4-.3-.7-.6-1.2-.8a4 4 0 00-1.5-.3h-2.3v8.2h2.3c.6 0 1-.1 1.5-.3.5-.2.8-.4 1.2-.8.3-.3.5-.8.7-1.3.2-.5.2-1 .2-1.7zm6.4 5.8h-2.2v-11.6h2.2V160zm10.5-2.7l.3.1.9 1c-.5.5-1 1-1.8 1.3a6 6 0 01-2.4.4 5.1 5.1 0 01-5.2-3.5 7 7 0 010-4.8 5.5 5.5 0 013.1-3 6.4 6.4 0 014.7 0c.6.2 1.2.6 1.6 1l-.7 1-.2.2h-.2l-.4-.1a4.7 4.7 0 00-1.2-.6l-1.2-.2-1.5.3c-.5.2-.8.5-1.2.8-.3.4-.6.8-.7 1.3a5 5 0 00-.3 1.7c0 .7 0 1.2.3 1.8.1.5.4.9.7 1.2.3.4.7.6 1.1.8.4.2 1 .3 1.4.3h.8a3.4 3.4 0 001.2-.5c.2 0 .4-.2.5-.4h.2l.2-.1zm11-8.9v1.8h-3.5v9.8h-2.2v-9.8h-3.5v-1.8h9.1z"/>
</g>
</svg>


View File

@ -18,13 +18,13 @@ an **annotated document**. It also orchestrates training and serialization.
### Container objects {#architecture-containers}
| Name | Description |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [`Doc`](/api/doc) | A container for accessing linguistic annotations. |
| [`Span`](/api/span) | A slice from a `Doc` object. |
| [`Token`](/api/token) | An individual token — i.e. a word, punctuation symbol, whitespace, etc. |
| [`Lexeme`](/api/lexeme) | An entry in the vocabulary. It's a word type with no context, as opposed to a word token. It therefore has no part-of-speech tag, dependency parse etc. |
| [`MorphAnalysis`](/api/morphanalysis) | A morphological analysis. |
| Name | Description |
| ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [`Doc`](/api/doc) | A container for accessing linguistic annotations. |
| [`Span`](/api/span) | A slice from a `Doc` object. |
| [`Token`](/api/token) | An individual token — i.e. a word, punctuation symbol, whitespace, etc. |
| [`Lexeme`](/api/lexeme) | An entry in the vocabulary. It's a word type with no context, as opposed to a word token. It therefore has no part-of-speech tag, dependency parse etc. |
| [`MorphAnalysis`](/api/morphanalysis) | A morphological analysis. |
### Processing pipeline {#architecture-pipeline}
@ -52,5 +52,3 @@ an **annotated document**. It also orchestrates training and serialization.
| [`StringStore`](/api/stringstore) | Map strings to and from hash values. |
| [`Vectors`](/api/vectors) | Container class for vector data keyed by string. |
| [`Example`](/api/example) | Collection for training annotations. |
|

View File

@ -12,29 +12,32 @@ passed on to the next component.
> - **Creates:** Objects, attributes and properties modified and set by the
> component.
| Name | Component | Creates | Description |
| ----------------- | ------------------------------------------------------------------ | ----------------------------------------------------------- | ------------------------------------------------ |
| **tokenizer** | [`Tokenizer`](/api/tokenizer) | `Doc` | Segment text into tokens. |
| **tagger** | [`Tagger`](/api/tagger) | `Doc[i].tag` | Assign part-of-speech tags. |
| **parser** | [`DependencyParser`](/api/dependencyparser) | `Doc[i].head`, `Doc[i].dep`, `Doc.sents`, `Doc.noun_chunks` | Assign dependency labels. |
| **ner** | [`EntityRecognizer`](/api/entityrecognizer) | `Doc.ents`, `Doc[i].ent_iob`, `Doc[i].ent_type` | Detect and label named entities. |
| **textcat** | [`TextCategorizer`](/api/textcategorizer) | `Doc.cats` | Assign document labels. |
| ... | [custom components](/usage/processing-pipelines#custom-components) | `Doc._.xxx`, `Token._.xxx`, `Span._.xxx` | Assign custom attributes, methods or properties. |
| Name | Component | Creates | Description |
| ------------- | ------------------------------------------------------------------ | ----------------------------------------------------------- | ------------------------------------------------ |
| **tokenizer** | [`Tokenizer`](/api/tokenizer) | `Doc` | Segment text into tokens. |
| **tagger** | [`Tagger`](/api/tagger) | `Doc[i].tag` | Assign part-of-speech tags. |
| **parser** | [`DependencyParser`](/api/dependencyparser) | `Doc[i].head`, `Doc[i].dep`, `Doc.sents`, `Doc.noun_chunks` | Assign dependency labels. |
| **ner** | [`EntityRecognizer`](/api/entityrecognizer) | `Doc.ents`, `Doc[i].ent_iob`, `Doc[i].ent_type` | Detect and label named entities. |
| **textcat** | [`TextCategorizer`](/api/textcategorizer) | `Doc.cats` | Assign document labels. |
| ... | [custom components](/usage/processing-pipelines#custom-components) | `Doc._.xxx`, `Token._.xxx`, `Span._.xxx` | Assign custom attributes, methods or properties. |
The processing pipeline always **depends on the statistical model** and its
capabilities. For example, a pipeline can only include an entity recognizer
component if the model includes data to make predictions of entity labels. This
is why each model will specify the pipeline to use in its meta data, as a simple
list containing the component names:
is why each model will specify the pipeline to use in its meta data and
[config](/usage/training#config), as a simple list containing the component
names:
```json
"pipeline": ["tagger", "parser", "ner"]
```ini
pipeline = ["tagger", "parser", "ner"]
```
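As a minimal sketch of how such a component list could be read back out of a config, the snippet below parses an ini-style excerpt using only the standard library. The `[nlp]` section name and the `CONFIG_TEXT` contents are assumptions made for illustration, and this is not spaCy's own config loader.

```python
import configparser
import json

# Hypothetical config excerpt; the [nlp] section name is an assumption.
CONFIG_TEXT = """
[nlp]
lang = "en"
pipeline = ["tagger", "parser", "ner"]
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# The values are written in JSON syntax, so each one can be decoded on its own.
pipeline = json.loads(parser["nlp"]["pipeline"])
print(pipeline)
```

Because the value is a plain list of names, the order in which the components appear is preserved when it is decoded.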
import Accordion from 'components/accordion.js'
<Accordion title="Does the order of pipeline components matter?" id="pipeline-components-order">
<!-- TODO: note on v3 tok2vec own model vs. upstream listeners -->
In spaCy v2.x, the statistical components like the tagger or parser are
independent and don't share any data between themselves. For example, the named
entity recognizer doesn't use any features set by the tagger and parser, and so
@ -48,11 +51,10 @@ pre-defined sentence boundaries, so if a previous component in the pipeline sets
them, its dependency predictions may be different. Similarly, it matters if you
add the [`EntityRuler`](/api/entityruler) before or after the statistical entity
recognizer: if it's added before, the entity recognizer will take the existing
entities into account when making predictions.
The [`EntityLinker`](/api/entitylinker), which resolves named entities to
knowledge base IDs, should be preceded by
a pipeline component that recognizes entities such as the
[`EntityRecognizer`](/api/entityrecognizer).
entities into account when making predictions. The
[`EntityLinker`](/api/entitylinker), which resolves named entities to knowledge
base IDs, should be preceded by a pipeline component that recognizes entities
such as the [`EntityRecognizer`](/api/entityrecognizer).
</Accordion>


@ -909,9 +909,8 @@ If you're using a statistical model, writing to the `nlp.Defaults` or
`English.Defaults` directly won't work, since the regular expressions are read
from the model and will be compiled when you load it. If you modify
`nlp.Defaults`, you'll only see the effect if you call
[`spacy.blank`](/api/top-level#spacy.blank) or `Defaults.create_tokenizer()`. If
you want to modify the tokenizer loaded from a statistical model, you should
modify `nlp.tokenizer` directly.
[`spacy.blank`](/api/top-level#spacy.blank). If you want to modify the tokenizer
loaded from a statistical model, you should modify `nlp.tokenizer` directly.
</Infobox>
@ -1386,8 +1385,7 @@ import spacy
from spacy.lang.en import English
nlp = English() # just the language with no model
sentencizer = nlp.create_pipe("sentencizer")
nlp.add_pipe(sentencizer)
nlp.add_pipe("sentencizer")
doc = nlp("This is a sentence. This is another sentence.")
for sent in doc.sents:
    print(sent.text)
@ -1422,6 +1420,7 @@ take advantage of dependency-based sentence segmentation.
```python
### {executable="true"}
from spacy.language import Language
import spacy
text = "this is a sentence...hello...and another sentence."
@ -1430,13 +1429,14 @@ nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
print("Before:", [sent.text for sent in doc.sents])
@Language.component("set_custom_boundaries")
def set_custom_boundaries(doc):
    for token in doc[:-1]:
        if token.text == "...":
            doc[token.i+1].is_sent_start = True
            doc[token.i + 1].is_sent_start = True
    return doc
nlp.add_pipe(set_custom_boundaries, before="parser")
nlp.add_pipe("set_custom_boundaries", before="parser")
doc = nlp(text)
print("After:", [sent.text for sent in doc.sents])
```


@ -97,32 +97,40 @@ but also your own custom processing functions. A pipeline component can be added
to an already existing `nlp` object, specified when initializing a `Language`
class, or defined within a [model package](/usage/saving-loading#models).
When you load a model, spaCy first consults the model's
[`meta.json`](/usage/saving-loading#models). The meta typically includes the
model details, the ID of a language class, and an optional list of pipeline
components. spaCy then does the following:
> #### meta.json (excerpt)
> #### config.cfg (excerpt)
>
> ```json
> {
> "lang": "en",
> "name": "core_web_sm",
> "description": "Example model for spaCy",
> "pipeline": ["tagger", "parser", "ner"]
> }
> ```ini
> [nlp]
> lang = "en"
> pipeline = ["tagger", "parser"]
>
> [components]
>
> [components.tagger]
> factory = "tagger"
> # settings for the tagger component
>
> [components.parser]
> factory = "parser"
> # settings for the parser component
> ```
When you load a model, spaCy first consults the model's
[`meta.json`](/usage/saving-loading#models) and
[`config.cfg`](/usage/training#config). The config tells spaCy what language
class to use, which components are in the pipeline, and how those components
should be created. spaCy will then do the following:
1. Load the **language class and data** for the given ID via
[`get_lang_class`](/api/top-level#util.get_lang_class) and initialize it. The
`Language` class contains the shared vocabulary, tokenization rules and the
language-specific annotation scheme.
2. Iterate over the **pipeline names** and create each component using
[`create_pipe`](/api/language#create_pipe), which looks them up in
`Language.factories`.
3. Add each pipeline component to the pipeline in order, using
[`add_pipe`](/api/language#add_pipe).
4. Make the **model data** available to the `Language` class by calling
language-specific settings.
2. Iterate over the **pipeline names** and look up each component name in the
`[components]` block. The `factory` tells spaCy which
[component factory](#custom-components-factories) to use for adding the
component with [`add_pipe`](/api/language#add_pipe). The settings are
passed into the factory.
3. Make the **model data** available to the `Language` class by calling
[`from_disk`](/api/language#from_disk) with the path to the model data
directory.
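
The loading steps above can be pictured with a small, self-contained sketch. It
is a toy illustration in plain Python, not spaCy's implementation: `FACTORIES`,
`build_pipeline`, the dict-based `config` and the `greedy` setting are all
hypothetical stand-ins for the `[nlp]` and `[components]` blocks and for the
registered component factories.

```python
# Toy sketch only: none of these names are spaCy APIs.

def make_tagger(nlp, name, greedy=False):
    # Factory: receives settings and returns the component function
    def tagger(doc):
        doc.append((name, greedy))
        return doc
    return tagger

def make_parser(nlp, name):
    def parser(doc):
        doc.append((name, None))
        return doc
    return parser

# Hypothetical registry mapping factory names to factory callables
FACTORIES = {"tagger": make_tagger, "parser": make_parser}

# Dict version of a config with [nlp] and [components] blocks
config = {
    "nlp": {"lang": "en", "pipeline": ["tagger", "parser"]},
    "components": {
        "tagger": {"factory": "tagger", "greedy": True},
        "parser": {"factory": "parser"},
    },
}

def build_pipeline(config, nlp=None):
    pipeline = []
    for name in config["nlp"]["pipeline"]:
        # Copy the block so removing "factory" doesn't mutate the config
        settings = dict(config["components"][name])
        factory = FACTORIES[settings.pop("factory")]
        # The remaining settings are passed into the factory
        pipeline.append((name, factory(nlp, name, **settings)))
    return pipeline

pipeline = build_pipeline(config)
doc = []  # a plain list stands in for the Doc
for name, component in pipeline:
    doc = component(doc)
print(doc)
```

The point of the sketch is the indirection: the pipeline only lists names, and
each name is resolved to a factory plus settings before anything is called on
the document.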
@ -132,17 +140,25 @@ So when you call this...
nlp = spacy.load("en_core_web_sm")
```
... the model's `meta.json` tells spaCy to use the language `"en"` and the
... the model's `config.cfg` tells spaCy to use the language `"en"` and the
pipeline `["tagger", "parser", "ner"]`. spaCy will then initialize
`spacy.lang.en.English`, and create each pipeline component and add it to the
processing pipeline. It'll then load in the model's data from its data directory
and return the modified `Language` class for you to use as the `nlp` object.
<Infobox title="Changed in v3.0" variant="warning">
spaCy v3.0 introduces a `config.cfg`, which includes more detailed settings for
the model pipeline, its components and the
[training process](/usage/training#config). You can export the config of your
current `nlp` object by calling [`nlp.config.to_disk`](/api/language#config).
</Infobox>
Fundamentally, a [spaCy model](/models) consists of three components: **the
weights**, i.e. binary data loaded in from a directory, a **pipeline** of
functions called in order, and **language data** like the tokenization rules and
annotation scheme. All of this is specific to each model, and defined in the
model's `meta.json`. For example, a Spanish NER model requires different
language-specific settings. For example, a Spanish NER model requires different
weights, language data and pipeline components than an English parsing and
tagging model. This is also why the pipeline state is always held by the
`Language` class. [`spacy.load`](/api/top-level#spacy.load) puts this all
@ -158,9 +174,8 @@ data_path = "path/to/en_core_web_sm/en_core_web_sm-2.0.0"
cls = spacy.util.get_lang_class(lang)  # 1. Get Language class, e.g. English
nlp = cls()  # 2. Initialize it
for name in pipeline:
    component = nlp.create_pipe(name)  # 3. Create the pipeline components
    nlp.add_pipe(component)  # 4. Add the component to the pipeline
nlp.from_disk(data_path)  # 5. Load in the binary data
    nlp.add_pipe(name)  # 3. Add the component to the pipeline
nlp.from_disk(data_path)  # 4. Load in the binary data
```
When you call `nlp` on a text, spaCy will **tokenize** it and then **call each
@ -190,36 +205,34 @@ print(nlp.pipe_names)
### Built-in pipeline components {#built-in}
spaCy ships with several built-in pipeline components that are also available in
the `Language.factories`. This means that you can initialize them by calling
[`nlp.create_pipe`](/api/language#create_pipe) with their string names and
require them in the pipeline settings in your model's `meta.json`.
spaCy ships with several built-in pipeline components that are registered with
string names. This means that you can initialize them by calling
[`nlp.add_pipe`](/api/language#add_pipe) with their names and spaCy will know
how to create them. See the [API documentation](/api) for a full list of
available pipeline components and component functions.
> #### Usage
>
> ```python
> # Option 1: Import and initialize
> from spacy.pipeline import EntityRuler
> ruler = EntityRuler(nlp)
> nlp.add_pipe(ruler)
>
> # Option 2: Using nlp.create_pipe
> sentencizer = nlp.create_pipe("sentencizer")
> nlp.add_pipe(sentencizer)
> nlp = spacy.blank("en")
> nlp.add_pipe("sentencizer")
> # add_pipe returns the added component
> ruler = nlp.add_pipe("entity_ruler")
> ```
| String name | Component | Description |
| ------------------- | ---------------------------------------------------------------- | --------------------------------------------------------------------------------------------- |
| `tagger` | [`Tagger`](/api/tagger) | Assign part-of-speech tags. |
| `parser` | [`DependencyParser`](/api/dependencyparser) | Assign dependency labels. |
| `ner` | [`EntityRecognizer`](/api/entityrecognizer) | Assign named entities. |
| `entity_linker` | [`EntityLinker`](/api/entitylinker) | Assign knowledge base IDs to named entities. Should be added after the entity recognizer. |
| `textcat` | [`TextCategorizer`](/api/textcategorizer) | Assign text categories. |
| `entity_ruler` | [`EntityRuler`](/api/entityruler) | Assign named entities based on pattern rules. |
| `sentencizer` | [`Sentencizer`](/api/sentencizer) | Add rule-based sentence segmentation without the dependency parse. |
| `merge_noun_chunks` | [`merge_noun_chunks`](/api/pipeline-functions#merge_noun_chunks) | Merge all noun chunks into a single token. Should be added after the tagger and parser. |
| `merge_entities` | [`merge_entities`](/api/pipeline-functions#merge_entities) | Merge all entities into a single token. Should be added after the entity recognizer. |
| `merge_subtokens` | [`merge_subtokens`](/api/pipeline-functions#merge_subtokens) | Merge subtokens predicted by the parser into single tokens. Should be added after the parser. |
| String name | Component | Description |
| --------------- | ------------------------------------------- | ----------------------------------------------------------------------------------------- |
| `tagger` | [`Tagger`](/api/tagger) | Assign part-of-speech tags. |
| `parser` | [`DependencyParser`](/api/dependencyparser) | Assign dependency labels. |
| `ner` | [`EntityRecognizer`](/api/entityrecognizer) | Assign named entities. |
| `entity_linker` | [`EntityLinker`](/api/entitylinker) | Assign knowledge base IDs to named entities. Should be added after the entity recognizer. |
| `textcat` | [`TextCategorizer`](/api/textcategorizer) | Assign text categories. |
| `entity_ruler` | [`EntityRuler`](/api/entityruler) | Assign named entities based on pattern rules. |
| `sentencizer` | [`Sentencizer`](/api/sentencizer) | Add rule-based sentence segmentation without the dependency parse. |
<!-- TODO: update with more components -->
<!-- TODO: explain default config and factories -->
### Disabling and modifying pipeline components {#disabling}
@ -233,7 +246,6 @@ list:
```python
### Disable loading
nlp = spacy.load("en_core_web_sm", disable=["tagger", "parser"])
nlp = English().from_disk("/model", disable=["ner"])
```
In some cases, you do want to load all pipeline components and their weights,
@ -297,15 +309,18 @@ nlp.replace_pipe("tagger", my_custom_tagger)
## Creating custom pipeline components {#custom-components}
A component receives a `Doc` object and can modify it, for example by using
the current weights to make a prediction and set some annotation on the
document. By adding a component to the pipeline, you'll get access to the `Doc`
at any point **during processing** instead of only being able to modify it
afterwards.
A pipeline component is a function that receives a `Doc` object, modifies it and
returns it, for example by using the current weights to make a prediction
and set some annotation on the document. By adding a component to the pipeline,
you'll get access to the `Doc` at any point **during processing** instead of
only being able to modify it afterwards.
> #### Example
>
> ```python
> from spacy.language import Language
>
> @Language.component("my_component")
> def my_component(doc):
>     # do something to the doc here
>     return doc
@ -316,6 +331,12 @@ afterwards.
| `doc` | `Doc` | The `Doc` object processed by the previous component. |
| **RETURNS** | `Doc` | The `Doc` object processed by this pipeline component. |
The [`@Language.component`](/api/language#component) decorator lets you turn a
simple function into a pipeline component. It takes at least one argument, the
**name** of the component factory. You can use this name to add an instance of
your component to the pipeline. It can also be listed in your model config, so
you can save, load and train models using your component.
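
To make the idea of registering a function under a name more concrete, here is
a minimal sketch of a name-based registry in plain Python. It is not how spaCy
implements `@Language.component`; `FACTORIES`, `component` and `ToyPipeline` are
hypothetical names, and a list of strings stands in for the `Doc`.

```python
# Toy sketch only: a name-based component registry, not spaCy's code.
FACTORIES = {}

def component(name):
    """Register a plain doc -> doc function under a string name."""
    def decorator(func):
        # Wrap the function in a factory that simply returns it
        FACTORIES[name] = lambda nlp, pipe_name: func
        return func
    return decorator

@component("lowercase")
def lowercase(doc):
    return [token.lower() for token in doc]

class ToyPipeline:
    def __init__(self):
        self.components = []

    def add_pipe(self, factory_name, name=None):
        if factory_name not in FACTORIES:
            raise ValueError(f"Unknown component factory: {factory_name!r}")
        pipe_name = name or factory_name
        created = FACTORIES[factory_name](self, pipe_name)
        self.components.append((pipe_name, created))
        return created

    def __call__(self, doc):
        # Call each component on the doc, in order
        for _, created in self.components:
            doc = created(doc)
        return doc

nlp = ToyPipeline()
nlp.add_pipe("lowercase", name="lower")
print(nlp(["Hello", "World"]))
```

Because the pipeline only stores names that resolve through the registry, an
unknown name can be rejected up front, and the same names can be written to and
read back from a config.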
Custom components can be added to the pipeline using the
[`add_pipe`](/api/language#add_pipe) method. Optionally, you can either specify
a component to add it **before or after**, tell spaCy to add it **first or
@ -325,23 +346,43 @@ last** in the pipeline, or define a **custom name**. If no name is set and no
> #### Example
>
> ```python
> nlp.add_pipe(my_component)
> nlp.add_pipe(my_component, first=True)
> nlp.add_pipe(my_component, before="parser")
> nlp.add_pipe("my_component")
> nlp.add_pipe("my_component", first=True)
> nlp.add_pipe("my_component", before="parser")
> ```
| Argument | Type | Description |
| -------- | ---- | ------------------------------------------------------------------------ |
| `last` | bool | If set to `True`, component is added **last** in the pipeline (default). |
| `first` | bool | If set to `True`, component is added **first** in the pipeline. |
| `before` | str | String name of component to add the new component **before**. |
| `after` | str | String name of component to add the new component **after**. |
| Argument | Type | Description |
| -------- | --------- | ------------------------------------------------------------------------ |
| `last` | bool | If set to `True`, component is added **last** in the pipeline (default). |
| `first` | bool | If set to `True`, component is added **first** in the pipeline. |
| `before` | str / int | String name or index to add the new component **before**. |
| `after` | str / int | String name or index to add the new component **after**. |
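
The positioning arguments in the table can be read as different ways of
computing one insertion index. The helper below is a hypothetical sketch of that
logic in plain Python (`resolve_index` is not a spaCy function), and it assumes
that at most one of the four arguments is used at a time.

```python
def resolve_index(names, first=False, last=False, before=None, after=None):
    """Toy helper: return the index at which a new component is inserted."""
    used = [arg for arg in (first or None, last or None, before, after)
            if arg is not None]
    if len(used) > 1:
        # Assumption of this sketch: conflicting positions are an error
        raise ValueError("Use only one of first, last, before or after")
    if first:
        return 0
    if before is not None:
        # Accept either an index or the name of an existing component
        return before if isinstance(before, int) else names.index(before)
    if after is not None:
        index = after if isinstance(after, int) else names.index(after)
        return index + 1
    return len(names)  # default: add the component last

names = ["tagger", "parser", "ner"]
names.insert(resolve_index(names, before="parser"), "my_component")
print(names)
```

Passing an unknown component name raises a `ValueError` from `list.index` in
this sketch, which mirrors the idea that a position can only be given relative
to a component that is already in the pipeline.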
### Example: A simple pipeline component {#custom-components-simple}
<Infobox title="Changed in v3.0" variant="warning">
As of v3.0, components need to be registered using the
[`@Language.component`](/api/language#component) or
[`@Language.factory`](/api/language#factory) decorator so spaCy knows that a
function is a component. [`nlp.add_pipe`](/api/language#add_pipe) now takes the
**string name** of the component factory instead of the component function. This
doesn't only save you lines of code, it also allows spaCy to validate and track
your custom components, and make sure they can be saved and loaded.
```diff
- ruler = nlp.create_pipe("entity_ruler")
- nlp.add_pipe(ruler)
+ ruler = nlp.add_pipe("entity_ruler")
```
</Infobox>
### Examples: Simple stateless pipeline components {#custom-components-simple}
The following component receives the `Doc` in the pipeline and prints some
information about it: the number of tokens, the part-of-speech tags of the
tokens and a conditional message based on the document length.
tokens and a conditional message based on the document length. The
[`@Language.component`](/api/language#component) decorator lets you register the
component under the name `"info_component"`.
> #### ✏️ Things to try
>
@ -352,11 +393,16 @@ tokens and a conditional message based on the document length.
> this change reflected in `nlp.pipe_names`.
> 3. Print `nlp.pipeline`. You'll see a list of tuples describing the component
> name and the function that's called on the `Doc` object in the pipeline.
> 4. Change the first argument to `@Language.component`, the name, to something
> else. spaCy should now complain that it doesn't know a component of the
> name `"info_component"`.
```python
### {executable="true"}
import spacy
from spacy.language import Language
@Language.component("info_component")
def my_component(doc):
    print(f"After tokenization, this doc has {len(doc)} tokens.")
    print("The part-of-speech tags are:", [token.pos_ for token in doc])
@ -365,76 +411,16 @@ def my_component(doc):
    return doc
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(my_component, name="print_info", last=True)
nlp.add_pipe("info_component", name="print_info", last=True)
print(nlp.pipe_names) # ['tagger', 'parser', 'ner', 'print_info']
doc = nlp("This is a sentence.")
```
Of course, you can also wrap your component as a class to allow initializing it
with custom settings and hold state within the component. This is useful for
**stateful components**, especially ones which **depend on shared data**. In the
following example, the custom component `EntityMatcher` can be initialized with
an `nlp` object, a terminology list and an entity label. Using the
[`PhraseMatcher`](/api/phrasematcher), it then matches the terms in the `Doc`
and adds them to the existing entities.
<Infobox title="Important note" variant="warning">
As of v2.1.0, spaCy ships with the [`EntityRuler`](/api/entityruler), a pipeline
component for easy, rule-based named entity recognition. Its implementation is
similar to the `EntityMatcher` code shown below, but it includes some additional
features like support for phrase patterns and token patterns, handling overlaps
with existing entities and pattern export as JSONL.
We'll still keep the pipeline component example below, as it works well to
illustrate complex components. But if you're planning on using this type of
component in your application, you might find the `EntityRuler` more convenient.
[See here](/usage/rule-based-matching#entityruler) for more details and
examples.
</Infobox>
```python
### {executable="true"}
import spacy
from spacy.matcher import PhraseMatcher
from spacy.tokens import Span
class EntityMatcher:
    name = "entity_matcher"

    def __init__(self, nlp, terms, label):
        patterns = [nlp.make_doc(text) for text in terms]
        self.matcher = PhraseMatcher(nlp.vocab)
        self.matcher.add(label, patterns)

    def __call__(self, doc):
        matches = self.matcher(doc)
        for match_id, start, end in matches:
            span = Span(doc, start, end, label=match_id)
            doc.ents = list(doc.ents) + [span]
        return doc

nlp = spacy.load("en_core_web_sm")
terms = ("cat", "dog", "tree kangaroo", "giant sea spider")
entity_matcher = EntityMatcher(nlp, terms, "ANIMAL")
nlp.add_pipe(entity_matcher, after="ner")
print(nlp.pipe_names) # The components in the pipeline
doc = nlp("This is a text about Barack Obama and a tree kangaroo")
print([(ent.text, ent.label_) for ent in doc.ents])
```
### Example: Custom sentence segmentation logic {#component-example1}
Let's say you want to implement custom logic to improve spaCy's sentence
boundary detection. Currently, sentence segmentation is based on the dependency
parse, which doesn't always produce ideal results. The custom logic should
therefore be applied **after** tokenization, but _before_ the dependency parsing.
This way, the parser can also take advantage of the sentence boundaries.
Here's another example of a pipeline component that implements custom logic to
improve the sentence boundaries set by the dependency parser. The custom logic
should therefore be applied **after** tokenization, but _before_ the dependency
parsing. This way, the parser can also take advantage of the sentence
boundaries.
> #### ✏️ Things to try
>
@ -448,90 +434,318 @@ therefore be applied **after** tokenization, but _before_ the dependency parsing
```python
### {executable="true"}
import spacy
from spacy.language import Language
@Language.component("custom_sentencizer")
def custom_sentencizer(doc):
    for i, token in enumerate(doc[:-2]):
        # Define sentence start if pipe + titlecase token
        if token.text == "|" and doc[i+1].is_title:
            doc[i+1].is_sent_start = True
        if token.text == "|" and doc[i + 1].is_title:
            doc[i + 1].is_sent_start = True
        else:
            # Explicitly set sentence start to False otherwise, to tell
            # the parser to leave those tokens alone
            doc[i+1].is_sent_start = False
            doc[i + 1].is_sent_start = False
    return doc
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(custom_sentencizer, before="parser") # Insert before the parser
nlp.add_pipe("custom_sentencizer", before="parser") # Insert before the parser
doc = nlp("This is. A sentence. | This is. Another sentence.")
for sent in doc.sents:
    print(sent.text)
```
### Example: Pipeline component for entity matching and tagging with custom attributes {#component-example2}
### Component factories and stateful components {#custom-components-factories}
This example shows how to create a spaCy extension that takes a terminology list
(in this case, single- and multi-word company names), matches the occurrences in
a document, labels them as `ORG` entities, merges the tokens and sets custom
`is_tech_org` and `has_tech_org` attributes. For efficient matching, the example
uses the [`PhraseMatcher`](/api/phrasematcher) which accepts `Doc` objects as
match patterns and works well for large terminology lists. It also ensures your
patterns will always match, even when you customize spaCy's tokenization rules.
When you call `nlp` on a text, the custom pipeline component is applied to the
`Doc`.
```python
https://github.com/explosion/spaCy/tree/master/examples/pipeline/custom_component_entities.py
```
Wrapping this functionality in a pipeline component allows you to reuse the
module with different settings, and have all pre-processing taken care of when
you call `nlp` on your text and receive a `Doc` object.
### Adding factories {#custom-components-factories}
When spaCy loads a model via its `meta.json`, it will iterate over the
`"pipeline"` setting, look up every component name in the internal factories and
call [`nlp.create_pipe`](/api/language#create_pipe) to initialize the individual
components, like the tagger, parser or entity recognizer. If your model uses
custom components, this won't work, so you'll have to tell spaCy **where to
find your component**. You can do this by writing to the `Language.factories`:
Component factories are callables that take settings and return a **pipeline
component function**. This is useful if your component is stateful, if you
need to customize its creation, or if you need access to the current `nlp`
object or the shared vocab. Component factories can be registered using the
[`@Language.factory`](/api/language#factory) decorator and they need at least
**two named arguments** that are filled in automatically when the component is
added to the pipeline:
> #### Example
>
> ```python
> from spacy.language import Language
>
> @Language.factory("my_component")
> def my_component(nlp, name):
>     return MyComponent()
> ```
| Argument | Type | Description |
| -------- | --------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| `nlp` | [`Language`](/api/language) | The current `nlp` object. Can be used to access the |
| `name` | str | The **instance name** of the component in the pipeline. This lets you identify different instances of the same component. |
All other settings can be passed in by the user via the `config` argument on
[`nlp.add_pipe`](/api/language). The
[`@Language.factory`](/api/language#factory) decorator also lets you define a
`default_config` that's used as a fallback.
```python
### With config {highlight="4,9"}
import spacy
from spacy.language import Language
Language.factories["entity_matcher"] = lambda nlp, **cfg: EntityMatcher(nlp, **cfg)
@Language.factory("my_component", default_config={"some_setting": True})
def my_component(nlp, name, some_setting: bool):
    return MyComponent(some_setting=some_setting)
nlp = spacy.blank("en")
nlp.add_pipe("my_component", config={"some_setting": False})
```
You can also ship the above code and your custom component in your packaged
model's `__init__.py`, so it's executed when you load your model. The `**cfg`
config parameters are passed all the way down from
[`spacy.load`](/api/top-level#spacy.load), so you can load the model and its
components with custom settings:
<Accordion title="How is @Language.factory different from @Language.component?" id="factories-decorator-component">
The [`@Language.component`](/api/language#component) decorator is essentially a
**shortcut** for stateless pipeline components that don't need any settings. This
means you don't always have to write a function that returns your function if
there's no state to be passed through; spaCy can just take care of this for
you. The following two code examples are equivalent:
```python
nlp = spacy.load("your_custom_model", terms=["tree kangaroo"], label="ANIMAL")
# Stateless component with @Language.factory
@Language.factory("my_component")
def create_my_component(nlp, name):
    def my_component(doc):
        # Do something to the doc
        return doc

    return my_component

# Stateless component with @Language.component
@Language.component("my_component")
def my_component(doc):
    # Do something to the doc
    return doc
```
<Infobox title="Important note" variant="warning">
</Accordion>
When you load a model via its package name, like `en_core_web_sm`, spaCy will
import the package and then call its `load()` method. This means that custom
code in the model's `__init__.py` will be executed, too. This is **not the
case** if you're loading a model from a path containing the model data. Here,
spaCy will only read in the `meta.json`. If you want to use custom factories
with a model loaded from a path, you need to add them to `Language.factories`
_before_ you load the model.
<Accordion title="Can I add the @Language.factory decorator to a class?" id="factories-class-decorator" spaced>
Yes, the [`@Language.factory`](/api/language#factory) decorator can be added to
a function or a class. If it's added to a class, it expects the `__init__`
method to take the arguments `nlp` and `name`, and will populate all other
arguments from the config. That said, it's often cleaner and more intuitive to
make your factory a separate function. That's also how spaCy does it internally.
</Accordion>
### Example: Stateful component with settings
This example shows a **stateful** pipeline component for handling acronyms:
based on a dictionary, it will detect acronyms and their expanded forms in both
directions and add them to a list as the custom `doc._.acronyms`
[extension attribute](#custom-components-attributes). Under the hood, it uses
the [`PhraseMatcher`](/api/phrasematcher) to find instances of the phrases.
The factory function takes three arguments: the shared `nlp` object and
component instance `name`, which are passed in automatically by spaCy, and a
`case_sensitive` config setting that makes the matching and acronym detection
case-sensitive.
> #### ✏️ Things to try
>
> 1. Change the `config` passed to `nlp.add_pipe` and set `"case_sensitive"` to
> `True`. You should see that the expanded acronym for "LOL" isn't detected
> anymore.
> 2. Add some more terms to the `DICTIONARY` and update the processed text so
> they're detected.
> 3. Add a `name` argument to `nlp.add_pipe` to change the component name. Print
> `nlp.pipe_names` to see the change reflected in the pipeline.
> 4. Print the config of the current `nlp` object with
> `print(nlp.config.to_str())` and inspect the `[components]` block. You
> should see an entry for the acronyms component, referencing the factory
> `acronyms` and the config settings.
```python
### {executable="true"}
from spacy.language import Language
from spacy.tokens import Doc
from spacy.matcher import PhraseMatcher
import spacy
DICTIONARY = {"lol": "laughing out loud", "brb": "be right back"}
DICTIONARY.update({value: key for key, value in DICTIONARY.items()})
@Language.factory("acronyms", default_config={"case_sensitive": False})
def create_acronym_component(nlp: Language, name: str, case_sensitive: bool):
    return AcronymComponent(nlp, case_sensitive)

class AcronymComponent:
    def __init__(self, nlp: Language, case_sensitive: bool):
        # Create the matcher and match on Token.lower if case-insensitive
        matcher_attr = "TEXT" if case_sensitive else "LOWER"
        self.matcher = PhraseMatcher(nlp.vocab, attr=matcher_attr)
        self.matcher.add("ACRONYMS", [nlp.make_doc(term) for term in DICTIONARY])
        self.case_sensitive = case_sensitive
        # Register custom extension on the Doc
        if not Doc.has_extension("acronyms"):
            Doc.set_extension("acronyms", default=[])

    def __call__(self, doc: Doc) -> Doc:
        # Add the matched spans when doc is processed
        for _, start, end in self.matcher(doc):
            span = doc[start:end]
            acronym = DICTIONARY.get(span.text if self.case_sensitive else span.text.lower())
            doc._.acronyms.append((span, acronym))
        return doc

# Add the component to the pipeline and configure it
nlp = spacy.blank("en")
nlp.add_pipe("acronyms", config={"case_sensitive": False})
# Process a doc and see the results
doc = nlp("LOL, be right back")
print(doc._.acronyms)
```
### Python type hints and pydantic validation {#type-hints new="3"}
spaCy's configs are powered by our machine learning library Thinc's
[configuration system](https://thinc.ai/docs/usage-config), which supports
[type hints](https://docs.python.org/3/library/typing.html) and even
[advanced type annotations](https://thinc.ai/docs/usage-config#advanced-types)
using [`pydantic`](https://github.com/samuelcolvin/pydantic). If your component
factory provides type hints, the values that are passed in will be **checked
against the expected types**. For example, if an argument is annotated as `int`
and the value can't be cast to an integer, spaCy will raise an error. `pydantic`
also provides strict types like `StrictInt`, which will force the value to be an
integer and raise an error if it's not, for instance if your config defines a
float.
<Infobox variant="warning">
If you're not using
[strict types](https://pydantic-docs.helpmanual.io/usage/types/#strict-types),
values that can be **cast to** the given type will still be accepted. For
example, `1` can be cast to a `float` or a `bool` type, but not to a
`List[str]`. However, if the type is
[`StrictFloat`](https://pydantic-docs.helpmanual.io/usage/types/#strict-types),
only a float will be accepted.
</Infobox>
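
The difference between a regular and a strict type can be sketched without any
libraries. The two helpers below are hypothetical and only mimic the idea of
"cast if possible" versus "accept the exact type only"; `pydantic`'s actual
coercion rules are more detailed than this.

```python
def lenient_float(value):
    """Toy non-strict check: accept anything that float() can convert."""
    return float(value)

def strict_float(value):
    """Toy strict check: accept only values that already are floats."""
    if not isinstance(value, float):
        raise TypeError(f"Expected a float, got {type(value).__name__}")
    return value

print(lenient_float(1))  # the integer 1 is cast to a float
print(strict_float(0.5))  # already a float, returned unchanged

try:
    strict_float(1)  # an integer is rejected by the strict check
except TypeError as err:
    print("Rejected:", err)
```

In a component factory, the same trade-off applies to config values: a lenient
type hint is convenient for users, while a strict one surfaces mistakes in the
config earlier.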
The following example shows a custom pipeline component for debugging. It can be
added anywhere in the pipeline and logs information about the `nlp` object and
the `Doc` that passes through. The `log_level` config setting lets the user
customize what log statements are shown. For instance, `"INFO"` will show info
logs and more critical logging statements, whereas `"DEBUG"` will show
everything. The value is annotated as a `StrictStr`, so it will only accept a
string value.
> #### ✏️ Things to try
>
> 1. Change the `config` passed to `nlp.add_pipe` to use the log level `"INFO"`.
> You should see that only the statement logged with `logger.info` is shown.
> 2. Change the `config` passed to `nlp.add_pipe` so that it contains unexpected
> values, for example a boolean instead of a string: `"log_level": False`.
> You should see a validation error.
> 3. Check out the docs on `pydantic`'s
> [constrained types](https://pydantic-docs.helpmanual.io/usage/types/#constrained-types)
> and write a type hint for `log_level` that only accepts the exact string
> values `"DEBUG"`, `"INFO"` or `"CRITICAL"`.
```python
### {executable="true"}
import spacy
from spacy.language import Language
from spacy.tokens import Doc
from pydantic import StrictStr
import logging
@Language.factory("debug", default_config={"log_level": "DEBUG"})
class DebugComponent:
    def __init__(self, nlp: Language, name: str, log_level: StrictStr):
        self.logger = logging.getLogger(f"spacy.{name}")
        self.logger.setLevel(log_level)
        self.logger.info(f"Pipeline: {nlp.pipe_names}")

    def __call__(self, doc: Doc) -> Doc:
        self.logger.debug(f"Doc: {len(doc)} tokens, is_tagged: {doc.is_tagged}")
        return doc

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("debug", config={"log_level": "DEBUG"})
doc = nlp("This is a text...")
```
### Language-specific factories {#factories-language new="3"}
There are many use cases where you might want your pipeline components to be
language-specific. Sometimes this requires an entirely different implementation per
language, sometimes the only difference is in the settings or data. spaCy allows
you to register factories of the **same name** on both the `Language` base
class, as well as its **subclasses** like `English` or `German`. Factories are
resolved starting with the specific subclass. If the subclass doesn't define a
component of that name, spaCy will check the `Language` base class.
Here's an example of a pipeline component that overwrites the normalized form of
a token, the `Token.norm_`, with an entry from a language-specific lookup table.
It's registered twice under the name `"token_normalizer"`: once using
`@English.factory` and once using `@German.factory`:
```python
### {executable="true"}
from spacy.lang.en import English
from spacy.lang.de import German

class TokenNormalizer:
    def __init__(self, norm_table):
        self.norm_table = norm_table

    def __call__(self, doc):
        for token in doc:
            # Overwrite the token.norm_ if there's an entry in the data
            token.norm_ = self.norm_table.get(token.text, token.norm_)
        return doc

@English.factory("token_normalizer")
def create_en_normalizer(nlp, name):
    return TokenNormalizer({"realise": "realize", "colour": "color"})

@German.factory("token_normalizer")
def create_de_normalizer(nlp, name):
    return TokenNormalizer({"daß": "dass", "wußte": "wusste"})

nlp_en = English()
nlp_en.add_pipe("token_normalizer")  # uses the English factory
print([token.norm_ for token in nlp_en("realise colour daß wußte")])

nlp_de = German()
nlp_de.add_pipe("token_normalizer")  # uses the German factory
print([token.norm_ for token in nlp_de("realise colour daß wußte")])
```
<Infobox title="Implementation details">
Under the hood, language-specific factories are added to the
[`factories` registry](/api/top-level#registry) prefixed with the language code,
e.g. `"en.token_normalizer"`. When resolving the factory in
[`nlp.add_pipe`](/api/language#add_pipe), spaCy first checks for a
language-specific version of the factory using `nlp.lang` and if none is
available, falls back to looking up the regular factory name.
</Infobox>
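The lookup order described in the infobox can be sketched with a plain dictionary. This is only an illustration of the fallback logic under stated assumptions: `FACTORIES` and `resolve_factory` are hypothetical names, not spaCy's actual registry or internals.

```python
from typing import Callable, Dict

# Hypothetical stand-in for the factories registry: plain names map to
# generic factories, "{lang}.{name}" keys map to language-specific ones.
FACTORIES: Dict[str, Callable[[], str]] = {
    "token_normalizer": lambda: "generic normalizer",
    "en.token_normalizer": lambda: "English normalizer",
}


def resolve_factory(lang: str, name: str) -> Callable[[], str]:
    """Look up the language-specific factory first, then the regular name."""
    lang_specific = f"{lang}.{name}"
    if lang_specific in FACTORIES:
        return FACTORIES[lang_specific]
    if name in FACTORIES:
        return FACTORIES[name]
    raise KeyError(f"Can't find factory for '{name}' (language '{lang}')")


print(resolve_factory("en", "token_normalizer")())  # language-specific entry
print(resolve_factory("de", "token_normalizer")())  # falls back to the generic entry
```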
<!-- TODO:
### Trainable components {#trainable new="3"}
-->
## Extension attributes {#custom-components-attributes new="2"}
As of v2.0, spaCy allows you to set any custom attributes and methods on the
`Doc`, `Span` and `Token`, which become available as `Doc._`, `Span._` and
`Token._`, for example, `Token._.my_attr`. This lets you store additional
information relevant to your application, add new features and functionality to
spaCy, and implement your own models trained with other machine learning
libraries. It also lets you take advantage of spaCy's data structures and the
`Doc` object as the "single source of truth".
spaCy allows you to set any custom attributes and methods on the `Doc`, `Span`
and `Token`, which become available as `Doc._`, `Span._` and `Token._`, for
example, `Token._.my_attr`. This lets you store additional information relevant
to your application, add new features and functionality to spaCy, and implement
your own models trained with other machine learning libraries. It also lets you
take advantage of spaCy's data structures and the `Doc` object as the "single
source of truth".
<Accordion title="Why ._ and not just a top-level attribute?" id="why-dot-underscore">
@ -641,7 +855,73 @@ attributes on the `Doc`, `Span` and `Token` for example, the capital,
latitude/longitude coordinates and even the country flag.
```python
https://github.com/explosion/spaCy/tree/master/examples/pipeline/custom_component_countries_api.py
### {executable="true"}
import requests
from spacy.lang.en import English
from spacy.language import Language
from spacy.matcher import PhraseMatcher
from spacy.tokens import Doc, Span, Token

@Language.factory("rest_countries")
class RESTCountriesComponent:
    def __init__(self, nlp, name, label="GPE"):
        r = requests.get("https://restcountries.eu/rest/v2/all")
        r.raise_for_status()  # make sure requests raises an error if it fails
        countries = r.json()
        # Convert API response to dict keyed by country name for easy lookup
        self.countries = {c["name"]: c for c in countries}
        self.label = label
        # Set up the PhraseMatcher with Doc patterns for each country name
        self.matcher = PhraseMatcher(nlp.vocab)
        self.matcher.add("COUNTRIES", [nlp.make_doc(c) for c in self.countries.keys()])
        # Register attribute on the Token. We'll be overwriting this based on
        # the matches, so we're only setting a default value, not a getter.
        Token.set_extension("is_country", default=False)
        Token.set_extension("country_capital", default=False)
        Token.set_extension("country_latlng", default=False)
        Token.set_extension("country_flag", default=False)
        # Register attributes on Doc and Span via a getter that checks if one of
        # the contained tokens is set to is_country == True.
        Doc.set_extension("has_country", getter=self.has_country)
        Span.set_extension("has_country", getter=self.has_country)

    def __call__(self, doc):
        spans = []  # keep the spans for later so we can merge them afterwards
        for _, start, end in self.matcher(doc):
            # Generate Span representing the entity & set label
            entity = Span(doc, start, end, label=self.label)
            spans.append(entity)
            # Set custom attribute on each token of the entity
            # Can be extended with other data returned by the API, like
            # currencies, country code, flag, calling code etc.
            for token in entity:
                token._.set("is_country", True)
                token._.set("country_capital", self.countries[entity.text]["capital"])
                token._.set("country_latlng", self.countries[entity.text]["latlng"])
                token._.set("country_flag", self.countries[entity.text]["flag"])
        # Iterate over all spans and merge them into one token
        with doc.retokenize() as retokenizer:
            for span in spans:
                retokenizer.merge(span)
        # Overwrite doc.ents and add entity; be careful not to replace!
        doc.ents = list(doc.ents) + spans
        return doc  # don't forget to return the Doc!

    def has_country(self, tokens):
        """Getter for Doc and Span attributes. Since the getter is only called
        when we access the attribute, we can refer to the Token's 'is_country'
        attribute here, which is already set in the processing step."""
        return any([t._.get("is_country") for t in tokens])

nlp = English()
nlp.add_pipe("rest_countries", config={"label": "GPE"})
doc = nlp("Some text about Colombia and the Czech Republic")
print("Pipeline", nlp.pipe_names)  # pipeline contains component name
print("Doc has countries", doc._.has_country)  # Doc contains countries
for token in doc:
    if token._.is_country:
        print(token.text, token._.country_capital, token._.country_latlng, token._.country_flag)
print("Entities", [(e.text, e.label_) for e in doc.ents])
```
In this case, all data can be fetched on initialization in one request. However,
@ -800,11 +1080,6 @@ function that takes a `Doc`, modifies it and returns it.
[`load_model_from_path`](/api/top-level#util.load_model_from_path) utility
functions.
```diff
+ nlp.add_pipe(my_custom_component)
+ return nlp.from_disk(model_path)
```
- Once you're ready to share your extension with others, make sure to **add docs
and installation instructions** (you can always link to this page for more
info). Make it easy for others to install and use your extension, for example
@ -838,10 +1113,12 @@ wrapper has to do is compute the entity spans and overwrite the `doc.ents`.
> overlapping entity spans are not allowed.
```python
### {highlight="1,6-7"}
### {highlight="1,8-9"}
import your_custom_entity_recognizer
from spacy.gold import offsets_from_biluo_tags
from spacy.language import Language
@Language.component("custom_ner_wrapper")
def custom_ner_wrapper(doc):
    words = [token.text for token in doc]
    custom_entities = your_custom_entity_recognizer(words)
@ -865,22 +1142,24 @@ because it returns the integer ID of the string _and_ makes sure it's added to
the vocab. This is especially important if the custom model uses a different
label scheme than spaCy's default models.
> #### Example: spacy-stanfordnlp
> #### Example: spacy-stanza
>
> For an example of an end-to-end wrapper for statistical tokenization, tagging
> and parsing, check out
> [`spacy-stanfordnlp`](https://github.com/explosion/spacy-stanfordnlp). It uses
> a very similar approach to the example in this section; the only difference
> is that it fully replaces the `nlp` object instead of providing a pipeline
> component, since it also needs to handle tokenization.
> [`spacy-stanza`](https://github.com/explosion/spacy-stanza). It uses a very
> similar approach to the example in this section; the only difference is that
> it fully replaces the `nlp` object instead of providing a pipeline component,
> since it also needs to handle tokenization.
```python
### {highlight="1,9,15-17"}
### {highlight="1,11,17-19"}
import your_custom_model
from spacy.language import Language
from spacy.symbols import POS, TAG, DEP, HEAD
from spacy.tokens import Doc
import numpy
@Language.component("custom_model_wrapper")
def custom_model_wrapper(doc):
    words = [token.text for token in doc]
    spaces = [bool(token.whitespace_) for token in doc]

View File

@ -450,6 +450,14 @@ git init # Initialize a Git repo
dvc init # Initialize a DVC project
```
<Infobox title="Important note on privacy" variant="warning">
DVC enables usage analytics by default, so if you're working in a
privacy-sensitive environment, make sure to
[**opt out manually**](https://dvc.org/doc/user-guide/analytics#opting-out).
</Infobox>
The [`spacy project dvc`](/api/cli#project-dvc) command creates a `dvc.yaml`
config file based on a workflow defined in your `project.yml`. Whenever you
update your project, you can re-run the command to update your DVC config. You

View File

@ -506,11 +506,16 @@ attribute `bad_html` on the token.
```python
### {executable="true"}
import spacy
from spacy.language import Language
from spacy.matcher import Matcher
from spacy.tokens import Token
# We're using a class because the component needs to be initialized with
# the shared vocab via the nlp object
# We're using a component factory because the component needs to be initialized
# with the shared vocab via the nlp object
@Language.factory("html_merger")
def create_bad_html_merger(nlp, name):
    return BadHTMLMerger(nlp)
class BadHTMLMerger:
    def __init__(self, nlp):
        patterns = [
@ -536,8 +541,7 @@ class BadHTMLMerger:
        return doc
nlp = spacy.load("en_core_web_sm")
html_merger = BadHTMLMerger(nlp)
nlp.add_pipe(html_merger, last=True) # Add component to the pipeline
nlp.add_pipe("html_merger", last=True) # Add component to the pipeline
doc = nlp("Hello<br>world! <br/> This is a test.")
for token in doc:
    print(token.text, token._.bad_html)
@ -546,10 +550,16 @@ for token in doc:
Instead of hard-coding the patterns into the component, you could also make it
take a path to a JSON file containing the patterns. This lets you reuse the
component with different patterns, depending on your application:
component with different patterns, depending on your application. When adding
the component to the pipeline with [`nlp.add_pipe`](/api/language#add_pipe), you
can pass in the argument via the `config`:
```python
html_merger = BadHTMLMerger(nlp, path="/path/to/patterns.json")
@Language.factory("html_merger", default_config={"path": None})
def create_bad_html_merger(nlp, name, path):
    return BadHTMLMerger(nlp, path=path)
nlp.add_pipe("html_merger", config={"path": "/path/to/patterns.json"})
```
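How the component reads the file is left open here. One possible approach, shown as a sketch with only the standard library, is a small loader that falls back to built-in patterns when no path is configured. `load_patterns` and `DEFAULT_PATTERNS` are hypothetical names, and the JSON file is assumed to contain a list of token patterns.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional

# Hypothetical fallback patterns, used when no path is configured
DEFAULT_PATTERNS = [[{"ORTH": "<"}, {"LOWER": "br"}, {"ORTH": ">"}]]


def load_patterns(path: Optional[str] = None) -> List[list]:
    """Return the patterns stored at `path`, or the defaults if path is None."""
    if path is None:
        return DEFAULT_PATTERNS
    with Path(path).open("r", encoding="utf8") as f:
        return json.load(f)


# Demo: write a patterns file to a temporary directory and load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    patterns_path = Path(tmp_dir) / "patterns.json"
    patterns_path.write_text(json.dumps([[{"LOWER": "hello"}]]), encoding="utf8")
    loaded = load_patterns(str(patterns_path))
print(loaded)
```

Inside `BadHTMLMerger.__init__`, the returned list could then be handed to the matcher in place of the hard-coded patterns.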
<Infobox title="Processing pipelines" emoji="📖">
@ -835,7 +845,7 @@ patterns can contain single or multiple tokens.
import spacy
from spacy.matcher import PhraseMatcher
nlp = spacy.load('en_core_web_sm')
nlp = spacy.load("en_core_web_sm")
matcher = PhraseMatcher(nlp.vocab)
terms = ["Barack Obama", "Angela Merkel", "Washington, D.C."]
# Only run nlp.make_doc to speed things up
@ -975,14 +985,12 @@ chosen.
```python
### {executable="true"}
from spacy.lang.en import English
from spacy.pipeline import EntityRuler
nlp = English()
ruler = EntityRuler(nlp)
ruler = nlp.add_pipe("entity_ruler")
patterns = [{"label": "ORG", "pattern": "Apple"},
{"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]}]
ruler.add_patterns(patterns)
nlp.add_pipe(ruler)
doc = nlp("Apple is opening its first big office in San Francisco.")
print([(ent.text, ent.label_) for ent in doc.ents])
@ -1000,13 +1008,11 @@ can set `overwrite_ents=True` on initialization.
```python
### {executable="true"}
import spacy
from spacy.pipeline import EntityRuler
nlp = spacy.load("en_core_web_sm")
ruler = EntityRuler(nlp)
ruler = nlp.add_pipe("entity_ruler")
patterns = [{"label": "ORG", "pattern": "MyCorp Inc."}]
ruler.add_patterns(patterns)
nlp.add_pipe(ruler)
doc = nlp("MyCorp Inc. is a company in the U.S.")
print([(ent.text, ent.label_) for ent in doc.ents])
@ -1014,12 +1020,12 @@ print([(ent.text, ent.label_) for ent in doc.ents])
#### Validating and debugging EntityRuler patterns {#entityruler-pattern-validation new="2.1.8"}
The `EntityRuler` can validate patterns against a JSON schema with the option
`validate=True`. See details under
The entity ruler can validate patterns against a JSON schema with the config
setting `"validate"`. See details under
[Validating and debugging patterns](#pattern-validation).
```python
ruler = EntityRuler(nlp, validate=True)
ruler = nlp.add_pipe("entity_ruler", config={"validate": True})
```
### Adding IDs to patterns {#entityruler-ent-ids new="2.2.2"}
@ -1031,15 +1037,13 @@ the same entity.
```python
### {executable="true"}
from spacy.lang.en import English
from spacy.pipeline import EntityRuler
nlp = English()
ruler = EntityRuler(nlp)
ruler = nlp.add_pipe("entity_ruler")
patterns = [{"label": "ORG", "pattern": "Apple", "id": "apple"},
{"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}], "id": "san-francisco"},
{"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "fran"}], "id": "san-francisco"}]
ruler.add_patterns(patterns)
nlp.add_pipe(ruler)
doc1 = nlp("Apple is opening its first big office in San Francisco.")
print([(ent.text, ent.label_, ent.ent_id_) for ent in doc1.ents])
@ -1068,7 +1072,7 @@ line.
```python
ruler.to_disk("./patterns.jsonl")
new_ruler = EntityRuler(nlp).from_disk("./patterns.jsonl")
new_ruler = nlp.add_pipe("entity_ruler").from_disk("./patterns.jsonl")
```
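For orientation, here is a standard-library sketch of what a newline-delimited JSON patterns file looks like when written and read by hand. The helpers `write_jsonl` and `read_jsonl` are hypothetical and only illustrate the assumed one-object-per-line layout; they are not the entity ruler's own serialization code.

```python
import json
import tempfile
from pathlib import Path

patterns = [
    {"label": "ORG", "pattern": "Apple"},
    {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]},
]


def write_jsonl(path, lines):
    # One JSON object per line
    data = "\n".join(json.dumps(line) for line in lines)
    Path(path).write_text(data + "\n", encoding="utf8")


def read_jsonl(path):
    with Path(path).open("r", encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = Path(tmp_dir) / "patterns.jsonl"
    write_jsonl(file_path, patterns)
    restored = read_jsonl(file_path)
    n_lines = len(file_path.read_text(encoding="utf8").splitlines())
print(n_lines, restored)
```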
<Infobox title="Integration with Prodigy">
@ -1086,9 +1090,8 @@ pipeline, its patterns are automatically exported to the model directory:
```python
nlp = spacy.load("en_core_web_sm")
ruler = EntityRuler(nlp)
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}])
nlp.add_pipe(ruler)
nlp.to_disk("/path/to/model")
```
@ -1100,35 +1103,30 @@ powerful model packages with binary weights _and_ rules included!
### Using a large number of phrase patterns {#entityruler-large-phrase-patterns new="2.2.4"}
<!-- TODO: double-check that this still works if the ruler is added to the pipeline on creation, and include suggestion if needed -->
When using a large number of **phrase patterns** (roughly > 10000) it's useful
to understand how the `add_patterns` function of the EntityRuler works. For each
**phrase pattern**, the EntityRuler calls the nlp object to construct a doc
to understand how the `add_patterns` function of the entity ruler works. For
each **phrase pattern**, the EntityRuler calls the nlp object to construct a doc
object. This happens in case you try to add the EntityRuler at the end of an
existing pipeline with, for example, a POS tagger and want to extract matches
based on the pattern's POS signature.
In this case you would pass a config value of `phrase_matcher_attr="POS"` for
the EntityRuler.
based on the pattern's POS signature. In this case you would pass a config value
of `"phrase_matcher_attr": "POS"` for the entity ruler.
Running the full language pipeline across every pattern in a large list scales
linearly and can therefore take a long time on large amounts of phrase patterns.
As of spaCy 2.2.4 the `add_patterns` function has been refactored to use
`nlp.pipe` on all phrase patterns, resulting in about a 10x-20x speedup with
5,000-100,000 phrase patterns respectively.
Even with this speedup (but especially if you're using an older version) the
`add_patterns` function can still take a long time.
An easy workaround to make this function run faster is disabling the other
language pipes while adding the phrase patterns.
5,000-100,000 phrase patterns respectively. Even with this speedup (but
especially if you're using an older version) the `add_patterns` function can
still take a long time. An easy workaround to make this function run faster is
disabling the other language pipes while adding the phrase patterns.
```python
entityruler = EntityRuler(nlp)
ruler = nlp.add_pipe("entity_ruler")
patterns = [{"label": "TEST", "pattern": str(i)} for i in range(100000)]
with nlp.select_pipes(enable="tagger"):
    entityruler.add_patterns(patterns)
    ruler.add_patterns(patterns)
```
## Combining models and rules {#models-rules}
@ -1189,9 +1187,11 @@ have in common is that _if_ they occur, they occur in the **previous token**
right before the person entity.
```python
### {highlight="7-11"}
### {highlight="9-13"}
from spacy.language import Language
from spacy.tokens import Span
@Language.component("expand_person_entities")
def expand_person_entities(doc):
    new_ents = []
    for ent in doc.ents:
@ -1210,18 +1210,20 @@ def expand_person_entities(doc):
```
The above function takes a `Doc` object, modifies its `doc.ents` and returns it.
This is exactly what a [pipeline component](/usage/processing-pipelines) does,
so in order to let it run automatically when processing a text with the `nlp`
object, we can use [`nlp.add_pipe`](/api/language#add_pipe) to add it to the
current pipeline.
Using the [`@Language.component`](/api/language#component) decorator, we can
register it as a [pipeline component](/usage/processing-pipelines) so it can run
automatically when processing a text. We can use
[`nlp.add_pipe`](/api/language#add_pipe) to add it to the current pipeline.
```python
### {executable="true"}
import spacy
from spacy.language import Language
from spacy.tokens import Span
nlp = spacy.load("en_core_web_sm")
@Language.component("expand_person_entities")
def expand_person_entities(doc):
    new_ents = []
    for ent in doc.ents:
@ -1236,7 +1238,7 @@ def expand_person_entities(doc):
    return doc
# Add the component after the named entity recognizer
nlp.add_pipe(expand_person_entities, after='ner')
nlp.add_pipe("expand_person_entities", after="ner")
doc = nlp("Dr. Alex Smith chaired first board meeting of Acme Corp Inc.")
print([(ent.text, ent.label_) for ent in doc.ents])
@ -1347,7 +1349,7 @@ for ent in person_entities:
# children, e.g. at -> Acme Corp Inc.
orgs = [token for token in prep.children if token.ent_type_ == "ORG"]
# If the verb is in past tense, the company was a previous company
print({'person': ent, 'orgs': orgs, 'past': head.tag_ == "VBD"})
print({"person": ent, "orgs": orgs, "past": head.tag_ == "VBD"})
```
To apply this logic automatically when we process a text, we can add it to the
@ -1374,11 +1376,12 @@ the entity `Span` for example `._.orgs` or `._.prev_orgs` and
```python
### {executable="true"}
import spacy
from spacy.pipeline import merge_entities
from spacy.language import Language
from spacy import displacy
nlp = spacy.load("en_core_web_sm")
@Language.component("extract_person_orgs")
def extract_person_orgs(doc):
    person_entities = [ent for ent in doc.ents if ent.label_ == "PERSON"]
    for ent in person_entities:
@ -1391,12 +1394,12 @@ def extract_person_orgs(doc):
    return doc
# To make the entities easier to work with, we'll merge them into single tokens
nlp.add_pipe(merge_entities)
nlp.add_pipe(extract_person_orgs)
nlp.add_pipe("merge_entities")
nlp.add_pipe("extract_person_orgs")
doc = nlp("Alex Smith worked at Acme Corp Inc.")
# If you're not in a Jupyter / IPython environment, use displacy.serve
displacy.render(doc, options={'fine_grained': True})
displacy.render(doc, options={"fine_grained": True})
```
If you change the sentence structure above, for example to "was working", you'll
@ -1409,7 +1412,8 @@ information is in the attached auxiliary "was":
To solve this, we can adjust the rules to also check for the above construction:
```python
### {highlight="9-11"}
### {highlight="10-12"}
@Language.component("extract_person_orgs")
def extract_person_orgs(doc):
    person_entities = [ent for ent in doc.ents if ent.label_ == "PERSON"]
    for ent in person_entities:

View File

@ -15,6 +15,8 @@ import Serialization101 from 'usage/101/\_serialization.md'
### Serializing the pipeline {#pipeline}
<!-- TODO: update this -->
When serializing the pipeline, keep in mind that this will only save out the
**binary data for the individual components** to allow spaCy to restore them,
not the entire objects. This is a good thing, because it makes serialization
@ -22,32 +24,35 @@ safe. But it also means that you have to take care of storing the language name
and pipeline component names as well, and restoring them separately before you
can load in the data.
> #### Saving the model meta
> #### Saving the meta and config
>
> The `nlp.meta` attribute is a JSON-serializable dictionary and contains all
> model meta information, like the language and pipeline, but also author and
> license information.
> The [`nlp.meta`](/api/language#meta) attribute is a JSON-serializable
> dictionary and contains all model meta information like the author and license
> information. The [`nlp.config`](/api/language#config) attribute is a
> dictionary containing the training configuration, pipeline component factories
> and other settings. It is saved out with a model as the `config.cfg`.
```python
### Serialize
bytes_data = nlp.to_bytes()
lang = nlp.meta["lang"] # "en"
pipeline = nlp.meta["pipeline"] # ["tagger", "parser", "ner"]
lang = nlp.config["nlp"]["lang"] # "en"
pipeline = nlp.config["nlp"]["pipeline"] # ["tagger", "parser", "ner"]
```
```python
### Deserialize
nlp = spacy.blank(lang)
for pipe_name in pipeline:
    pipe = nlp.create_pipe(pipe_name)
    nlp.add_pipe(pipe)
    nlp.add_pipe(pipe_name)
nlp.from_bytes(bytes_data)
```
This is also how spaCy does it under the hood when loading a model: it loads the
model's `meta.json` containing the language and pipeline information,
initializes the language class, creates and adds the pipeline components and
_then_ loads in the binary data. You can read more about this process
model's `config.cfg` containing the language and pipeline information,
initializes the language class, creates and adds the pipeline components based
on the defined
[factories](/usage/processing-pipelines#custom-components-factories) and _then_
loads in the binary data. You can read more about this process
[here](/usage/processing-pipelines#pipelines).
### Serializing Doc objects efficiently {#docs new="2.2"}
@ -192,10 +197,9 @@ add to that data and saves and loads the data to and from a JSON file.
> recognizer and including all rules _with_ the model data.
```python
### {highlight="15-19,21-26"}
### {highlight="14-18,20-25"}
@Language.factory("my_component")
class CustomComponent:
    name = "my_component"
    def __init__(self):
        self.data = []
@ -228,9 +232,8 @@ component's `to_disk` method.
```python
### {highlight="2-4"}
nlp = spacy.load("en_core_web_sm")
my_component = CustomComponent()
my_component = nlp.add_pipe("my_component")
my_component.add({"hello": "world"})
nlp.add_pipe(my_component)
nlp.to_disk("/path/to/model")
```
@ -247,7 +250,8 @@ file `data.json` in its subdirectory:
├── parser # data for "parser" component
├── tagger # data for "tagger" component
├── vocab # model vocabulary
├── meta.json # model meta.json with name, language and pipeline
├── meta.json # model meta.json
├── config.cfg # model config
└── tokenizer # tokenization rules
```
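The `data.json` file in the component's subdirectory can be produced by a pair of `to_disk` and `from_disk` methods. The following is a standalone sketch that runs without spaCy; the `exclude` argument and the assumption that `path` already points at the component's own subdirectory are illustrative and may differ from what spaCy actually passes in.

```python
import json
import tempfile
from pathlib import Path


class CustomComponent:
    def __init__(self):
        self.data = []

    def add(self, data):
        self.data.append(data)

    def to_disk(self, path, exclude=tuple()):
        # `path` is assumed to be the component's own subdirectory,
        # e.g. /path/to/model/my_component
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)
        with (path / "data.json").open("w", encoding="utf8") as f:
            json.dump(self.data, f)

    def from_disk(self, path, exclude=tuple()):
        with (Path(path) / "data.json").open("r", encoding="utf8") as f:
            self.data = json.load(f)
        return self


# Round trip: save the component's data and restore it into a new instance
with tempfile.TemporaryDirectory() as tmp_dir:
    component_dir = Path(tmp_dir) / "my_component"
    component = CustomComponent()
    component.add({"hello": "world"})
    component.to_disk(component_dir)
    restored = CustomComponent().from_disk(component_dir)
print(restored.data)
```

Returning `self` from `from_disk` keeps the method chainable, in the same way as the `from_disk` calls shown elsewhere in these docs.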
@ -260,19 +264,14 @@ instance, you could add a
trained with a different library like TensorFlow or PyTorch and make spaCy load
its weights automatically when you load the model package.
<Infobox title="Important note on loading components" variant="warning">
<Infobox title="Important note on loading custom components" variant="warning">
When you load a model from disk, spaCy will check the `"pipeline"` in the
model's `meta.json` and look up the component name in the internal factories. To
make sure spaCy knows how to initialize `"my_component"`, you'll need to add it
to the factories:
```python
from spacy.language import Language
Language.factories["my_component"] = lambda nlp, **cfg: CustomComponent()
```
For more details, see the documentation on
When you load back a model with custom components, make sure that the components
are **available** and that the [`@Language.component`](/api/language#component)
or [`@Language.factory`](/api/language#factory) decorators are executed _before_
your model is loaded back. Otherwise, spaCy won't know how to resolve the string
name of a component factory like `"my_component"` back to a function. For more
details, see the documentation on
[adding factories](/usage/processing-pipelines#custom-components-factories) or
use [entry points](#entry-points) to make your extension package expose your
custom components to spaCy automatically.
@ -293,40 +292,31 @@ installed in the same environment that's it.
| Entry point | Description |
| ------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [`spacy_factories`](#entry-points-components) | Group of entry points for pipeline component factories to add to [`Language.factories`](/usage/processing-pipelines#custom-components-factories), keyed by component name. |
| [`spacy_factories`](#entry-points-components) | Group of entry points for pipeline component factories, keyed by component name. Can be used to expose custom components defined by another package. |
| [`spacy_languages`](#entry-points-languages) | Group of entry points for custom [`Language` subclasses](/usage/adding-languages), keyed by language shortcut. |
| `spacy_lookups` <Tag variant="new">2.2</Tag> | Group of entry points for custom [`Lookups`](/api/lookups), including lemmatizer data. Used by spaCy's [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) package. |
| [`spacy_displacy_colors`](#entry-points-displacy) <Tag variant="new">2.2</Tag> | Group of entry points of custom label colors for the [displaCy visualizer](/usage/visualizers#ent). The key name doesn't matter, but it should point to a dict of labels and color values. Useful for custom models that predict different entity types. |
### Custom components via entry points {#entry-points-components}
When you load a model, spaCy will generally use the model's `meta.json` to set
When you load a model, spaCy will generally use the model's `config.cfg` to set
up the language class and construct the pipeline. The pipeline is specified as a
list of strings, e.g. `"pipeline": ["tagger", "parser", "ner"]`. For each of
those strings, spaCy will call `nlp.create_pipe` and look up the name in the
[built-in factories](/usage/processing-pipelines#custom-components-factories).
If your model wanted to specify its own custom components, you usually have to
write to `Language.factories` _before_ loading the model.
list of strings, e.g. `pipeline = ["tagger", "parser", "ner"]`. For each of those
strings, spaCy will call `nlp.add_pipe` and look up the name in all factories
defined by the decorators [`@Language.component`](/api/language#component) and
[`@Language.factory`](/api/language#factory). This means that you have to import
your custom components _before_ loading the model.
```python
pipe = nlp.create_pipe("custom_component") # fails 👎
Language.factories["custom_component"] = CustomComponentFactory
pipe = nlp.create_pipe("custom_component") # works 👍
```
This is inconvenient and usually required shipping a bunch of component
initialization code with the model. Using entry points, model packages and
extension packages can now define their own `"spacy_factories"`, which will be
added to the built-in factories when the `Language` class is initialized. If a
package in the same environment exposes spaCy entry points, all of this happens
automatically and no further user action is required.
Using entry points, model packages and extension packages can define their own
`"spacy_factories"`, which will be loaded automatically in the background when
the `Language` class is initialized. So if a user has your package installed,
they'll be able to use your components even if they **don't import them**!
To stick with the theme of
[this entry points blog post](https://amir.rachum.com/blog/2017/07/28/python-entry-points/),
consider the following custom spaCy extension which is initialized with the
shared `nlp` object and will print a snake when it's called as a pipeline
component.
consider the following custom spaCy
[pipeline component](/usage/processing-pipelines#custom-components) that prints a
snake when it's called:
> #### Package directory structure
>
@ -337,32 +327,38 @@ component.
```python
### snek.py
from spacy.language import Language
snek = """
--..,_ _,.--.
`'.'. .'`__ o `;__.
`'.'. .'`__ o `;__. {text}
'.'. .'.'` '---'` `
'.`'--....--'`.'
`'--....--'`
"""
class SnekFactory:
    def __init__(self, nlp, **cfg):
        self.nlp = nlp
    def __call__(self, doc):
        print(snek)
        return doc
@Language.component("snek")
def snek_component(doc):
    print(snek.format(text=doc.text))
    return doc
```
Since it's a very complex and sophisticated module, you want to split it off
into its own package so you can version it and upload it to PyPI. You also want
your custom model to be able to define `"pipeline": ["snek"]` in its
`meta.json`. For that, you need to be able to tell spaCy where to find the
factory for `"snek"`. If you don't do this, spaCy will raise an error when you
try to load the model because there's no built-in `"snek"` factory. To add an
your custom model to be able to define `pipeline = ["snek"]` in its
`config.cfg`. For that, you need to be able to tell spaCy where to find the
component `"snek"`. If you don't do this, spaCy will raise an error when you try
to load the model because there's no built-in `"snek"` component. To add an
entry to the factories, you can now expose it in your `setup.py` via the
`entry_points` dictionary:
> #### Entry point syntax
>
> Python entry points for a group are formatted as a **list of strings**, with
> each string following the syntax of `name = module:object`. In this example,
> the created entry point is named `snek` and points to the function
> `snek_component` in the module `snek`, i.e. `snek.py`.
```python
### setup.py {highlight="5-7"}
from setuptools import setup
@ -370,73 +366,74 @@ from setuptools import setup
setup(
    name="snek",
    entry_points={
        "spacy_factories": ["snek = snek:SnekFactory"]
        "spacy_factories": ["snek = snek:snek_component"]
    }
)
```
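To make the `name = module:object` syntax concrete, here is a small sketch that splits an entry point string into its three parts. `parse_entry_point` and `ParsedEntryPoint` are hypothetical helpers for illustration only; in practice, the packaging tools do this parsing for you.

```python
from typing import NamedTuple


class ParsedEntryPoint(NamedTuple):
    name: str
    module: str
    attr: str


def parse_entry_point(spec: str) -> ParsedEntryPoint:
    """Split a 'name = module:object' string into its three parts."""
    name, _, target = spec.partition("=")
    module, _, attr = target.partition(":")
    return ParsedEntryPoint(name.strip(), module.strip(), attr.strip())


parsed = parse_entry_point("snek = snek:snek_component")
print(parsed.name, parsed.module, parsed.attr)
```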
The entry point definition tells spaCy that the name `snek` can be found in the
module `snek` (i.e. `snek.py`) as `SnekFactory`. The same package can expose
multiple entry points. To make them available to spaCy, all you need to do is
install the package:
The same package can expose multiple entry points, by the way. To make them
available to spaCy, all you need to do is install the package in your
environment:
```bash
$ python setup.py develop
```
spaCy is now able to create the pipeline component `'snek'`:
spaCy is now able to create the pipeline component `"snek"` even though you
never imported `snek_component`. When you save the
[`nlp.config`](/api/language#config) to disk, it includes an entry for your
`"snek"` component and any model you train with this config will include the
component and know how to load it if your `snek` package is installed.
> #### config.cfg (excerpt)
>
> ```diff
> [nlp]
> lang = "en"
> + pipeline = ["snek"]
>
> [components]
>
> + [components.snek]
> + factory = "snek"
> ```
```
>>> from spacy.lang.en import English
>>> nlp = English()
>>> snek = nlp.create_pipe("snek") # this now works! 🐍🎉
>>> nlp.add_pipe(snek)
>>> nlp.add_pipe("snek") # this now works! 🐍🎉
>>> doc = nlp("I am snek")
--..,_ _,.--.
`'.'. .'`__ o `;__.
`'.'. .'`__ o `;__. I am snek
'.'. .'.'` '---'` `
'.`'--....--'`.'
`'--....--'`
```
Arguably, this gets even more exciting when you train your `en_core_snek_sm`
model. To make sure `snek` is installed with the model, you can add it to the
model's `setup.py`. You can then tell spaCy to construct the model pipeline with
the `snek` component by setting `"pipeline": ["snek"]` in the `meta.json`.
Instead of making your snek component a simple
[stateless component](/usage/processing-pipelines#custom-components-simple), you
could also make it a
[factory](/usage/processing-pipelines#custom-components-factories) that takes
settings. Your users can then pass in an optional `config` when they add your
component to the pipeline and customize its appearance, for example, the
`snek_style`.
> #### meta.json
> #### config.cfg (excerpt)
>
> ```diff
> {
> "lang": "en",
> "name": "core_snek_sm",
> "version": "1.0.0",
> + "pipeline": ["snek"]
> }
> [components.snek]
> factory = "snek"
> + snek_style = "basic"
> ```
In theory, the entry point mechanism also lets you overwrite built-in factories
including the tokenizer. By default, spaCy will output a warning in these
cases, to prevent accidental overwrites and unintended results.
#### Advanced components with settings {#advanced-cfg}
The `**cfg` keyword arguments that the factory receives are passed down all the
way from `spacy.load`. This means that the factory can respond to custom
settings defined when loading the model, for example, the style of the snake to
load:
```python
nlp = spacy.load("en_core_snek_sm", snek_style="cute")
```
```python
SNEKS = {"basic": snek, "cute": cute_snek} # collection of sneks
@Language.factory("snek", default_config={"snek_style": "basic"})
class SnekFactory:
def __init__(self, nlp, **cfg):
def __init__(self, nlp: Language, name: str, snek_style: str):
self.nlp = nlp
self.snek_style = cfg.get("snek_style", "basic")
self.snek_style = snek_style
self.snek = SNEKS[self.snek_style]
def __call__(self, doc):
@ -444,6 +441,14 @@ class SnekFactory:
return doc
```
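As a rough sketch of how `default_config` and a user-supplied `config` could combine before the values reach `__init__`, consider the following spaCy-free toy. The helper `resolve_config` and the placeholder snek strings are assumptions made for illustration; spaCy's actual config resolution is more involved.

```python
from typing import Any, Dict, Optional

DEFAULT_CONFIG = {"snek_style": "basic"}
SNEKS = {"basic": "~~~:>", "cute": "~~~:3"}  # placeholder ASCII sneks


def resolve_config(
    defaults: Dict[str, Any], overrides: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Start from the defaults and let user-supplied settings win."""
    return {**defaults, **(overrides or {})}


class SnekFactory:
    def __init__(self, nlp, name: str, snek_style: str):
        self.name = name
        self.snek = SNEKS[snek_style]


# No config given: falls back to the default style
default_snek = SnekFactory(None, "snek", **resolve_config(DEFAULT_CONFIG))
# User config overrides the default
cute_snek = SnekFactory(
    None, "snek", **resolve_config(DEFAULT_CONFIG, {"snek_style": "cute"})
)
print(default_snek.snek, cute_snek.snek)
```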
```diff
### setup.py
entry_points={
- "spacy_factories": ["snek = snek:snek_component"]
+ "spacy_factories": ["snek = snek:SnekFactory"]
}
```
The factory can also implement other pipeline component methods like `to_disk` and
`from_disk` for serialization, or even `update` to make the component trainable.
If a component exposes a `from_disk` method and is included in a model's
@ -452,12 +457,12 @@ model. When you save out a model using `nlp.to_disk` and the component exposes a
`to_disk` method, it will be called with the disk path.
```python
def to_disk(self, path, **kwargs):
def to_disk(self, path, exclude=tuple()):
snek_path = path / "snek.txt"
with snek_path.open("w", encoding="utf8") as snek_file:
snek_file.write(self.snek)
def from_disk(self, path, **cfg):
def from_disk(self, path, exclude=tuple()):
snek_path = path / "snek.txt"
with snek_path.open("r", encoding="utf8") as snek_file:
self.snek = snek_file.read()
@ -473,24 +478,20 @@ the `snek.txt` and make it available to the component.
To stay with the theme of the previous example and
[this blog post on entry points](https://amir.rachum.com/blog/2017/07/28/python-entry-points/),
let's imagine you wanted to implement your own `SnekLanguage` class for your
custom model, but you don't necessarily want to modify spaCy's code to
[add a language](/usage/adding-languages). In your package, you could then
implement the following:
custom model, but you don't necessarily want to modify spaCy's code to add a
language. In your package, you could then implement the following
[custom language subclass](/usage/linguistic-features#language-subclass):
```python
### snek.py
from spacy.language import Language
from spacy.attrs import LANG
class SnekDefaults(Language.Defaults):
lex_attr_getters = dict(Language.Defaults.lex_attr_getters)
lex_attr_getters[LANG] = lambda text: "snk"
stop_words = set(["sss", "hiss"])
class SnekLanguage(Language):
lang = "snk"
Defaults = SnekDefaults
# Some custom snek language stuff here
```
Alongside the `spacy_factories`, there's also an entry point option for
@ -510,31 +511,12 @@ setup(
)
```
In spaCy, you can then load the custom `sk` language and it will be resolved to
In spaCy, you can then load the custom `snk` language and it will be resolved to
`SnekLanguage` via the custom entry point. This is especially relevant for model
packages, which could then specify `"lang": "snk"` in their `meta.json` without
spaCy raising an error because the language is not available in the core
packages you train, which could then specify `lang = "snk"` in their `config.cfg`
without spaCy raising an error because the language is not available in the core
library.
> #### meta.json
>
> ```diff
> {
> - "lang": "en",
> + "lang": "snk",
> "name": "core_snek_sm",
> "version": "1.0.0",
> "pipeline": ["snek"]
> }
> ```
```python
from spacy.util import get_lang_class
SnekLanguage = get_lang_class("snk")
nlp = SnekLanguage()
```
### Custom displaCy colors via entry points {#entry-points-displacy new="2.2"}
If you're training a named entity recognition model for a custom domain, you may
@ -611,7 +593,7 @@ manually and place it in the model data directory, or supply a path to it using
the `--meta` flag. For more info on this, see the [`package`](/api/cli#package)
docs.
> #### meta.json
> #### meta.json (example)
>
> ```json
> {
@ -622,8 +604,7 @@ docs.
> "description": "Example model for spaCy",
> "author": "You",
> "email": "you@example.com",
> "license": "CC BY-SA 3.0",
> "pipeline": ["tagger", "parser", "ner"]
> "license": "CC BY-SA 3.0"
> }
> ```
@ -631,66 +612,39 @@ docs.
$ python -m spacy package /home/me/data/en_example_model /home/me/my_models
```
This command will create a model package directory that should look like this:
This command will create a model package directory and will run
`python setup.py sdist` in that directory to create a `.tar.gz` archive of your
model package that can be installed using `pip install`.
```yaml
### Directory structure
└── /
├── MANIFEST.in # to include meta.json
├── meta.json # model meta data
├── setup.py # setup file for pip installation
└── en_example_model # model directory
├── __init__.py # init for pip installation
└── en_example_model-1.0.0 # model data
├── MANIFEST.in # to include meta.json
├── meta.json # model meta data
├── setup.py # setup file for pip installation
├── en_example_model # model directory
│ ├── __init__.py # init for pip installation
│ └── en_example_model-1.0.0 # model data
└── dist
└── en_example_model-1.0.0.tar.gz # installable package
```
You can also find templates for all files on
[GitHub](https://github.com/explosion/spacy-models/tree/master/template). If
you're creating the package manually, keep in mind that the directories need to
be named according to the naming conventions of `lang_name` and
You can also find templates for all files in the
[`cli/package.py` source](https://github.com/explosion/spacy/tree/master/spacy/cli/package.py).
If you're creating the package manually, keep in mind that the directories need
to be named according to the naming conventions of `lang_name` and
`lang_name-version`.
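For a manually created package, the two directory names can be derived from the meta data. This is a minimal sketch with a hypothetical helper `package_dir_names`; the meta values are assumed so that they match the example directory listing above.

```python
from typing import Dict, Tuple


def package_dir_names(meta: Dict[str, str]) -> Tuple[str, str]:
    """Build the `lang_name` and `lang_name-version` directory names."""
    model_name = f"{meta['lang']}_{meta['name']}"
    return model_name, f"{model_name}-{meta['version']}"


# Assumed meta values, chosen to match the example directory structure
meta = {"lang": "en", "name": "example_model", "version": "1.0.0"}
print(package_dir_names(meta))
```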
### Customizing the model setup {#models-custom}
The meta.json includes the model details, like name, requirements and license,
and lets you customize how the model should be initialized and loaded. You can
define the language data to be loaded and the
[processing pipeline](/usage/processing-pipelines) to execute.
| Setting | Type | Description |
| ---------- | ---- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `lang` | str | ID of the language class to initialize. |
| `pipeline` | list | A list of strings mapping to the IDs of pipeline factories to apply in that order. If not set, spaCy's [default pipeline](/usage/processing-pipelines) will be used. |
The `load()` method that comes with our model package templates will take care
of putting all this together and returning a `Language` object with the loaded
pipeline and data. If your model requires custom
[pipeline components](/usage/processing-pipelines) or a custom language class,
you can also **ship the code with your model**. For examples of this, check out
the implementations of spaCy's
[`load_model_from_init_py`](/api/top-level#util.load_model_from_init_py) and
[`load_model_from_path`](/api/top-level#util.load_model_from_path) utility
functions.
### Building the model package {#models-building}
To build the package, run the following command from within the directory. For
more information on building Python packages, see the docs on Python's
[setuptools](https://setuptools.readthedocs.io/en/latest/).
```bash
$ python setup.py sdist
```
This will create a `.tar.gz` archive in a directory `/dist`. The model can be
installed by pointing pip to the path of the archive:
```bash
$ pip install /path/to/en_example_model-1.0.0.tar.gz
```
You can then load the model via its name, `en_example_model`, or import it
directly as a module and then call its `load()` method.
you can also **ship the code with your model** and include it in the
`__init__.py`, for example, to register custom
[pipeline components](/usage/processing-pipelines#custom-components) before the
`nlp` object is created.
### Loading a custom model package {#loading}

View File

@ -328,18 +328,15 @@ spaCy's configs are powered by our machine learning library Thinc's
[type hints](https://docs.python.org/3/library/typing.html) and even
[advanced type annotations](https://thinc.ai/docs/usage-config#advanced-types)
using [`pydantic`](https://github.com/samuelcolvin/pydantic). If your registered
function provides For example, `start: int` in the example above will ensure
that the value received as the argument `start` is an integer. If the value
can't be cast to an integer, spaCy will raise an error.
function provides type hints, the values that are passed in will be checked
against the expected types. For example, `start: int` in the example above will
ensure that the value received as the argument `start` is an integer. If the
value can't be cast to an integer, spaCy will raise an error.
`start: pydantic.StrictInt` will force the value to be an integer and raise an
error if it's not, for instance, if your config defines a float.
</Infobox>
### Defining custom architectures {#custom-architectures}
<!-- TODO: this could maybe be a more general example of using Thinc to compose some layers? We don't want to go too deep here and probably want to focus on a simple architecture example to show how it works -->
### Wrapping PyTorch and TensorFlow {#custom-frameworks}
<!-- TODO: -->
@ -352,6 +349,10 @@ mattis pretium.
</Project>
### Defining custom architectures {#custom-architectures}
<!-- TODO: this could maybe be a more general example of using Thinc to compose some layers? We don't want to go too deep here and probably want to focus on a simple architecture example to show how it works -->
## Parallel Training with Ray {#parallel-training}
<!-- TODO: document Ray integration -->

View File

@ -0,0 +1,6 @@
---
title: Transformers
teaser: Using transformer models like BERT in spaCy
---
TODO: ...

View File

@ -6,6 +6,7 @@ menu:
- ['New Features', 'features']
- ['Backwards Incompatibilities', 'incompat']
- ['Migrating from v2.x', 'migrating']
- ['Migrating plugins', 'plugins']
---
## Summary {#summary}
@ -31,3 +32,74 @@ relied on them.
<!-- TODO: complete (see release notes Dropbox Paper doc) -->
## Migrating from v2.x {#migrating}
## Migration notes for plugin maintainers {#plugins}
Thanks to everyone who's been contributing to the spaCy ecosystem by developing
and maintaining one of the many awesome [plugins and extensions](/universe).
We've tried to keep breaking changes to a minimum and make it as easy as
possible for you to upgrade your packages for spaCy v3.
### Custom pipeline components
The most common use case for plugins is providing pipeline components and
extension attributes.
- Use the [`@Language.factory`](/api/language#factory) decorator to register
your component and assign it a name. This allows users to refer to your
components by name and serialize pipelines referencing them. Remove all manual
entries to the `Language.factories`.
- Make sure your component factories take at least two **named arguments**:
`nlp` (the current `nlp` object) and `name` (the instance name of the added
component so you can identify multiple instances of the same component).
- Update all references to [`nlp.add_pipe`](/api/language#add_pipe) in your docs
to use **string names** instead of the component functions.
```python
### {highlight="1-5"}
from spacy.language import Language
@Language.factory("my_component", default_config={"some_setting": False})
def create_component(nlp: Language, name: str, some_setting: bool):
return MyCoolComponent(some_setting=some_setting)
class MyCoolComponent:
def __init__(self, some_setting):
self.some_setting = some_setting
def __call__(self, doc):
# Do something to the doc
return doc
```
> #### Result in config.cfg
>
> ```ini
> [components.my_component]
> factory = "my_component"
> some_setting = true
> ```
```diff
import spacy
from your_plugin import MyCoolComponent
nlp = spacy.load("en_core_web_sm")
- component = MyCoolComponent(some_setting=True)
- nlp.add_pipe(component)
+ nlp.add_pipe("my_component", config={"some_setting": True})
```
<Infobox title="Important note on registering factories" variant="warning">
The [`@Language.factory`](/api/language#factory) decorator takes care of letting
spaCy know that a component of that name is available. This means that your
users can add it to the pipeline using its **string name**. However, this
requires the decorator to be executed, so users will still have to **import
your plugin**. Alternatively, your plugin could expose an
[entry point](/usage/saving-loading#entry-points), which spaCy can read from.
This means that spaCy knows how to initialize `my_component`, even if your
package isn't imported.
</Infobox>
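To see why the decorator has to run before the name can be resolved, here is a toy registry that mimics the idea without spaCy. The names `FACTORIES`, `factory` and `create` are hypothetical and much simpler than spaCy's real registration logic.

```python
from typing import Callable, Dict

FACTORIES: Dict[str, Callable] = {}  # toy stand-in for a factory registry


def factory(name: str) -> Callable:
    """Decorator that registers a function under a string name."""
    def register(func: Callable) -> Callable:
        FACTORIES[name] = func
        return func
    return register


def create(name: str, **config):
    """Look up a factory by its string name and call it with the config."""
    if name not in FACTORIES:
        raise KeyError(f"No factory registered for '{name}'. Was the plugin imported?")
    return FACTORIES[name](**config)


# Nothing is registered until the decorated code has been executed
assert "my_component" not in FACTORIES


@factory("my_component")
def create_component(some_setting: bool = False):
    return {"some_setting": some_setting}


print(create("my_component", some_setting=True))
```

In this toy, looking up a name before the decorated function has been defined raises an error, which mirrors why users need to import a plugin unless it exposes an entry point.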

View File

@ -229,3 +229,5 @@ vectors.data = torch.Tensor(vectors.data).cuda(0)
## Other embeddings {#embeddings}
<!-- TODO: explain spacy-transformers, doc.tensor, tok2vec? -->
<!-- TODO: mention sense2vec somewhere? -->

View File

@ -19,6 +19,7 @@
{ "text": "Rule-based Matching", "url": "/usage/rule-based-matching" },
{ "text": "Processing Pipelines", "url": "/usage/processing-pipelines" },
{ "text": "Vectors & Embeddings", "url": "/usage/vectors-embeddings" },
{ "text": "Transformers", "url": "/usage/transformers", "tag": "new" },
{ "text": "Training Models", "url": "/usage/training", "tag": "new" },
{ "text": "spaCy Projects", "url": "/usage/projects", "tag": "new" },
{ "text": "Saving & Loading", "url": "/usage/saving-loading" },

View File

@ -414,7 +414,7 @@ body [id]:target
.cm-number
color: var(--syntax-number)
.cm-def
.cm-def, .cm-meta
color: var(--syntax-function)
// Jupyter

View File

@ -17,7 +17,8 @@
background: var(--color-subtle-opaque)
.footer
background: var(--color-theme-light)
--color-inline-code-bg: var(--color-theme-opaque)
background: var(--color-theme-light) !important
border-top: 2px solid var(--color-theme)
& > td:first-child