mirror of https://github.com/explosion/spaCy.git
synced 2025-01-12 18:26:30 +03:00

Update docs [ci skip]

parent 3d56a3f286
commit 7adbaf9a5b
@@ -384,7 +384,7 @@ original file is shown at the top of the widget.
 > ```
 
 ```python
-https://github.com/explosion/spaCy/tree/master/examples/pipeline/custom_component_countries_api.py
+https://github.com/explosion/spaCy/tree/master/spacy/language.py
 ```
 
 ### Infobox
@@ -1,23 +1,22 @@
 ---
 title: DependencyParser
 tag: class
-source: spacy/pipeline/pipes.pyx
+source: spacy/pipeline/dep_parser.pyx
 ---
 
 This class is a subclass of `Pipe` and follows the same API. The pipeline
 component is available in the [processing pipeline](/usage/processing-pipelines)
 via the ID `"parser"`.
 
-## Default config {#config}
+## Implementation and defaults {#implementation}
 
-This is the default configuration used to initialize the model powering the
-pipeline component. See the [model architectures](/api/architectures)
-documentation for details on the architectures and their arguments and
-hyperparameters. To learn more about how to customize the config and train
-custom models, check out the [training config](/usage/training#config) docs.
+See the [model architectures](/api/architectures) documentation for details on
+the architectures and their arguments and hyperparameters. To learn more about
+how to customize the config and train custom models, check out the
+[training config](/usage/training#config) docs.
 
 ```python
-https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/parser_defaults.cfg
+https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/dep_parser.pyx
 ```
 
 ## DependencyParser.\_\_init\_\_ {#init tag="method"}
@@ -25,22 +24,17 @@ https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/parser_defaults.cfg
 > #### Example
 >
 > ```python
-> # Construction via create_pipe with default model
-> parser = nlp.create_pipe("parser")
+> # Construction via add_pipe with default model
+> parser = nlp.add_pipe("parser")
 >
-> # Construction via create_pipe with custom model
+> # Construction via add_pipe with custom model
 > config = {"model": {"@architectures": "my_parser"}}
-> parser = nlp.create_pipe("parser", config)
->
-> # Construction from class with custom model from file
-> from spacy.pipeline import DependencyParser
-> model = util.load_config("model.cfg", create_objects=True)["model"]
-> parser = DependencyParser(nlp.vocab, model)
+> parser = nlp.add_pipe("parser", config=config)
 > ```
 
 Create a new pipeline instance. In your application, you would normally use a
 shortcut for this and instantiate the component using its string name and
-[`nlp.create_pipe`](/api/language#create_pipe).
+[`nlp.add_pipe`](/api/language#add_pipe).
 
 | Name | Type | Description |
 | ----------- | ------------------ | ------------------------------------------------------------------------------- |
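This create_pipe-to-add_pipe change repeats across every component page in this commit. As a minimal before/after sketch of the migration (the blank English pipeline is an assumption for illustration):

```python
import spacy

nlp = spacy.blank("en")

# v2 style (removed above): create the component, then add it in a second step
# parser = nlp.create_pipe("parser")
# nlp.add_pipe(parser)

# v3 style (added above): one call creates the component, adds it and returns it
parser = nlp.add_pipe("parser")
print(nlp.pipe_names)  # ['parser']
```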
@@ -4,7 +4,7 @@ teaser:
   Functionality to disambiguate a named entity in text to a unique knowledge
   base identifier.
 tag: class
-source: spacy/pipeline/pipes.pyx
+source: spacy/pipeline/entity_linker.py
 new: 2.2
 ---
 
@@ -12,16 +12,15 @@ This class is a subclass of `Pipe` and follows the same API. The pipeline
 component is available in the [processing pipeline](/usage/processing-pipelines)
 via the ID `"entity_linker"`.
 
-## Default config {#config}
+## Implementation and defaults {#implementation}
 
-This is the default configuration used to initialize the model powering the
-pipeline component. See the [model architectures](/api/architectures)
-documentation for details on the architectures and their arguments and
-hyperparameters. To learn more about how to customize the config and train
-custom models, check out the [training config](/usage/training#config) docs.
+See the [model architectures](/api/architectures) documentation for details on
+the architectures and their arguments and hyperparameters. To learn more about
+how to customize the config and train custom models, check out the
+[training config](/usage/training#config) docs.
 
 ```python
-https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/entity_linker_defaults.cfg
+https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/entity_linker.py
 ```
 
 ## EntityLinker.\_\_init\_\_ {#init tag="method"}
@@ -29,22 +28,17 @@ https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/entity_linker_defaults.cfg
 > #### Example
 >
 > ```python
-> # Construction via create_pipe with default model
-> entity_linker = nlp.create_pipe("entity_linker")
+> # Construction via add_pipe with default model
+> entity_linker = nlp.add_pipe("entity_linker")
 >
-> # Construction via create_pipe with custom model
+> # Construction via add_pipe with custom model
 > config = {"model": {"@architectures": "my_el"}}
-> entity_linker = nlp.create_pipe("entity_linker", config)
->
-> # Construction from class with custom model from file
-> from spacy.pipeline import EntityLinker
-> model = util.load_config("model.cfg", create_objects=True)["model"]
-> entity_linker = EntityLinker(nlp.vocab, model)
+> entity_linker = nlp.add_pipe("entity_linker", config=config)
 > ```
 
 Create a new pipeline instance. In your application, you would normally use a
 shortcut for this and instantiate the component using its string name and
-[`nlp.create_pipe`](/api/language#create_pipe).
+[`nlp.add_pipe`](/api/language#add_pipe).
 
 | Name | Type | Description |
 | ------- | ------- | ------------------------------------------------------------------------------- |
@@ -185,9 +179,8 @@ method, a knowledge base should have been defined with
 > #### Example
 >
 > ```python
-> entity_linker = EntityLinker(nlp.vocab)
+> entity_linker = nlp.add_pipe("entity_linker", last=True)
 > entity_linker.set_kb(kb)
-> nlp.add_pipe(entity_linker, last=True)
 > optimizer = entity_linker.begin_training(pipeline=nlp.pipeline)
 > ```
 
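The `kb` passed to `set_kb` in the hunk above is assumed to already exist. A sketch of how such a knowledge base might be built, assuming the `KnowledgeBase` API of this development line; the entity ID, frequency and vector values are illustrative:

```python
from spacy.kb import KnowledgeBase

kb = KnowledgeBase(vocab=nlp.vocab, entity_vector_length=64)
# Register an entity under a unique identifier with a prior frequency
kb.add_entity(entity="Q42", freq=12, entity_vector=[0.0] * 64)
# Map a surface form to candidate entities with prior probabilities
kb.add_alias(alias="Douglas Adams", entities=["Q42"], probabilities=[0.9])

entity_linker = nlp.add_pipe("entity_linker", last=True)
entity_linker.set_kb(kb)
```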
@@ -1,23 +1,22 @@
 ---
 title: EntityRecognizer
 tag: class
-source: spacy/pipeline/pipes.pyx
+source: spacy/pipeline/ner.pyx
 ---
 
 This class is a subclass of `Pipe` and follows the same API. The pipeline
 component is available in the [processing pipeline](/usage/processing-pipelines)
 via the ID `"ner"`.
 
-## Default config {#config}
+## Implementation and defaults {#implementation}
 
-This is the default configuration used to initialize the model powering the
-pipeline component. See the [model architectures](/api/architectures)
-documentation for details on the architectures and their arguments and
-hyperparameters. To learn more about how to customize the config and train
-custom models, check out the [training config](/usage/training#config) docs.
+See the [model architectures](/api/architectures) documentation for details on
+the architectures and their arguments and hyperparameters. To learn more about
+how to customize the config and train custom models, check out the
+[training config](/usage/training#config) docs.
 
 ```python
-https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/ner_defaults.cfg
+https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/ner.pyx
 ```
 
 ## EntityRecognizer.\_\_init\_\_ {#init tag="method"}
@@ -25,22 +24,17 @@ https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/ner_defaults.cfg
 > #### Example
 >
 > ```python
-> # Construction via create_pipe
-> ner = nlp.create_pipe("ner")
+> # Construction via add_pipe with default model
+> ner = nlp.add_pipe("ner")
 >
-> # Construction via create_pipe with custom model
+> # Construction via add_pipe with custom model
 > config = {"model": {"@architectures": "my_ner"}}
-> parser = nlp.create_pipe("ner", config)
->
-> # Construction from class with custom model from file
-> from spacy.pipeline import EntityRecognizer
-> model = util.load_config("model.cfg", create_objects=True)["model"]
-> ner = EntityRecognizer(nlp.vocab, model)
+> ner = nlp.add_pipe("ner", config=config)
 > ```
 
 Create a new pipeline instance. In your application, you would normally use a
 shortcut for this and instantiate the component using its string name and
-[`nlp.create_pipe`](/api/language#create_pipe).
+[`nlp.add_pipe`](/api/language#add_pipe).
 
 | Name | Type | Description |
 | ----------- | ------------------ | ------------------------------------------------------------------------------- |
@@ -8,10 +8,10 @@ new: 2.1
 The EntityRuler lets you add spans to the [`Doc.ents`](/api/doc#ents) using
 token-based rules or exact phrase matches. It can be combined with the
 statistical [`EntityRecognizer`](/api/entityrecognizer) to boost accuracy, or
-used on its own to implement a purely rule-based entity recognition system.
-After initialization, the component is typically added to the processing
-pipeline using [`nlp.add_pipe`](/api/language#add_pipe). For usage examples, see
-the docs on
+used on its own to implement a purely rule-based entity recognition system. The
+pipeline component is available in the
+[processing pipeline](/usage/processing-pipelines) via the ID `"entity_ruler"`.
+For usage examples, see the docs on
 [rule-based entity recognition](/usage/rule-based-matching#entityruler).
 
 ## EntityRuler.\_\_init\_\_ {#init tag="method"}
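Since the ruler can boost a statistical `EntityRecognizer`, a common arrangement is to insert it before the `"ner"` component so its matches are assigned first. A sketch using the v3 `add_pipe` API from this commit; the model name is a placeholder and assumes a loaded pipeline that contains `"ner"`:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # placeholder pipeline with an "ner" component
ruler = nlp.add_pipe("entity_ruler", before="ner")
ruler.add_patterns([{"label": "ORG", "pattern": "Explosion"}])

doc = nlp("Explosion builds spaCy")
print([(ent.text, ent.label_) for ent in doc.ents])
```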
@@ -19,13 +19,13 @@ the docs on
 Initialize the entity ruler. If patterns are supplied here, they need to be a
 list of dictionaries with a `"label"` and `"pattern"` key. A pattern can either
 be a token pattern (list) or a phrase pattern (string). For example:
-`{'label': 'ORG', 'pattern': 'Apple'}`.
+`{"label": "ORG", "pattern": "Apple"}`.
 
 > #### Example
 >
 > ```python
-> # Construction via create_pipe
-> ruler = nlp.create_pipe("entity_ruler")
+> # Construction via add_pipe
+> ruler = nlp.add_pipe("entity_ruler")
 >
 > # Construction from class
 > from spacy.pipeline import EntityRuler
@@ -90,9 +90,8 @@ is chosen.
 > #### Example
 >
 > ```python
-> ruler = EntityRuler(nlp)
+> ruler = nlp.add_pipe("entity_ruler")
 > ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}])
-> nlp.add_pipe(ruler)
 >
 > doc = nlp("A text about Apple.")
 > ents = [(ent.text, ent.label_) for ent in doc.ents]
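The `__init__` docs above note that a pattern can be a token pattern (list) or a phrase pattern (string), but the examples only show the phrase form. A sketch of both, with illustrative labels:

```python
patterns = [
    # Phrase pattern: matches the exact token sequence "Apple"
    {"label": "ORG", "pattern": "Apple"},
    # Token pattern: one attribute dict per token, matched case-insensitively
    {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]},
]
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(patterns)
```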
@@ -223,7 +223,7 @@ in `example.predicted`.
 > #### Example
 >
 > ```python
-> nlp.add_pipe(my_ner)
+> nlp.add_pipe("my_ner")
 > doc = nlp("Mr and Mrs Smith flew to New York")
 > tokens_ref = ["Mr and Mrs", "Smith", "flew", "to", "New York"]
 > example = Example.from_dict(doc, {"words": tokens_ref})
@@ -15,6 +15,88 @@ the tagger or parser that are called on a document in order. You can also add
 your own processing pipeline components that take a `Doc` object, modify it and
 return it.
 
+## Language.component {#component tag="classmethod" new="3"}
+
+Register a custom pipeline component under a given name. This allows
+initializing the component by name using
+[`Language.add_pipe`](/api/language#add_pipe) and referring to it in
+[config files](/usage/training#config). This classmethod and decorator is
+intended for **simple stateless functions** that take a `Doc` and return it. For
+more complex stateful components that allow settings and need access to the
+shared `nlp` object, use the [`Language.factory`](/api/language#factory)
+decorator. For more details and examples, see the
+[usage documentation](/usage/processing-pipelines#custom-components).
+
+> #### Example
+>
+> ```python
+> from spacy.language import Language
+>
+> # Usage as a decorator
+> @Language.component("my_component")
+> def my_component(doc):
+>     # Do something to the doc
+>     return doc
+>
+> # Usage as a function
+> Language.component("my_component2", func=my_component)
+> ```
+
+| Name | Type | Description |
+| -------------- | -------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
+| `name` | str | The name of the component factory. |
+| _keyword-only_ | | |
+| `assigns` | `Iterable[str]` | `Doc` or `Token` attributes assigned by this component, e.g. `["token.ent_id"]`. Used for pipeline analysis. <!-- TODO: link to something --> |
+| `requires` | `Iterable[str]` | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for pipeline analysis. <!-- TODO: link to something --> |
+| `retokenizes` | bool | Whether the component changes tokenization. Used for pipeline analysis. <!-- TODO: link to something --> |
+| `func` | `Optional[Callable]` | Optional function if not used as a decorator. |
+
+## Language.factory {#factory tag="classmethod"}
+
+Register a custom pipeline component factory under a given name. This allows
+initializing the component by name using
+[`Language.add_pipe`](/api/language#add_pipe) and referring to it in
+[config files](/usage/training#config). The registered factory function needs to
+take at least two **named arguments** which spaCy fills in automatically: `nlp`
+for the current `nlp` object and `name` for the component instance name. This
+can be useful to distinguish multiple instances of the same component and allows
+trainable components to add custom losses using the component instance name. The
+`default_config` defines the default values of the remaining factory arguments.
+It's merged into the [`nlp.config`](/api/language#config). For more details and
+examples, see the
+[usage documentation](/usage/processing-pipelines#custom-components).
+
+> #### Example
+>
+> ```python
+> from spacy.language import Language
+>
+> # Usage as a decorator
+> @Language.factory(
+>     "my_component",
+>     default_config={"some_setting": True},
+> )
+> def create_my_component(nlp, name, some_setting):
+>     return MyComponent(some_setting)
+>
+> # Usage as a function
+> Language.factory(
+>     "my_component",
+>     default_config={"some_setting": True},
+>     func=create_my_component
+> )
+> ```
+
+| Name | Type | Description |
+| ---------------- | -------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
+| `name` | str | The name of the component factory. |
+| _keyword-only_ | | |
+| `default_config` | `Dict[str, Any]` | The default config, describing the default values of the factory arguments. |
+| `assigns` | `Iterable[str]` | `Doc` or `Token` attributes assigned by this component, e.g. `["token.ent_id"]`. Used for pipeline analysis. <!-- TODO: link to something --> |
+| `requires` | `Iterable[str]` | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for pipeline analysis. <!-- TODO: link to something --> |
+| `retokenizes` | bool | Whether the component changes tokenization. Used for pipeline analysis. <!-- TODO: link to something --> |
+| `func` | `Optional[Callable]` | Optional function if not used as a decorator. |
+
 ## Language.\_\_init\_\_ {#init tag="method"}
 
 Initialize a `Language` object.
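To make the stateless/stateful split in the factory docs above concrete, here is a sketch of a stateful component behind `@Language.factory`. `MyComponent` and `some_setting` are illustrative names, and the final line shows the `default_config` value being overridden per instance via `add_pipe`:

```python
import spacy
from spacy.language import Language

class MyComponent:
    def __init__(self, name, some_setting):
        self.name = name                  # instance name, e.g. for custom losses
        self.some_setting = some_setting  # filled from default_config or overrides

    def __call__(self, doc):
        # A real component would modify the doc here
        return doc

@Language.factory("my_component", default_config={"some_setting": True})
def create_my_component(nlp, name, some_setting):
    return MyComponent(name, some_setting)

nlp = spacy.blank("en")
# Override the factory's default_config for this instance
nlp.add_pipe("my_component", config={"some_setting": False})
```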
@@ -30,12 +112,41 @@ Initialize a `Language` object.
 > nlp = English()
 > ```
 
-| Name | Type | Description |
-| ----------- | ---------- | ------------------------------------------------------------------------------------------ |
-| `vocab` | `Vocab` | A `Vocab` object. If `True`, a vocab is created via `Language.Defaults.create_vocab`. |
-| `make_doc` | callable | A function that takes text and returns a `Doc` object. Usually a `Tokenizer`. |
-| `meta` | dict | Custom meta data for the `Language` class. Is written to by models to add model meta data. |
-| **RETURNS** | `Language` | The newly constructed object. |
+| Name | Type | Description |
+| ------------------ | ----------- | ------------------------------------------------------------------------------------------ |
+| `vocab` | `Vocab` | A `Vocab` object. If `True`, a vocab is created using the default language data settings. |
+| _keyword-only_ | | |
+| `max_length` | int | Maximum number of characters allowed in a single text. Defaults to `10 ** 6`. |
+| `meta` | dict | Custom meta data for the `Language` class. Is written to by models to add model meta data. |
+| `create_tokenizer` | `Callable` | Optional function that receives the `nlp` object and returns a tokenizer. |
+| **RETURNS** | `Language` | The newly constructed object. |
+
+## Language.from_config {#from_config tag="classmethod"}
+
+Create a `Language` object from a loaded config. Will set up the tokenizer and
+language data, add pipeline components based on the pipeline and components
+defined in the config and validate the results. If no config is provided, the
+default config of the given language is used. This is also how spaCy loads a
+model under the hood based on its [`config.cfg`](/api/data-formats#config).
+
+> #### Example
+>
+> ```python
+> from thinc.api import Config
+> from spacy.language import Language
+>
+> config = Config().from_disk("./config.cfg")
+> nlp = Language.from_config(config)
+> ```
+
+| Name | Type | Description |
+| -------------- | ---------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
+| `config` | `Dict[str, Any]` / [`Config`](https://thinc.ai/docs/api-config#config) | The loaded config. |
+| _keyword-only_ | | |
+| `disable` | `Iterable[str]` | List of pipeline component names to disable. |
+| `auto_fill` | bool | Whether to automatically fill in missing values in the config, based on defaults and function argument annotations. Defaults to `True`. |
+| `validate` | bool | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. |
+| **RETURNS** | `Language` | The initialized object. |
 
 ## Language.\_\_call\_\_ {#call tag="method"}
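Regarding the `from_config` addition above: its keyword arguments combine naturally with the example shown there. A short sketch that loads the config while leaving one component out (the `"parser"` name assumes the config defines such a component):

```python
from thinc.api import Config
from spacy.language import Language

config = Config().from_disk("./config.cfg")
# Fill missing values from defaults, validate types, and skip the parser
nlp = Language.from_config(config, disable=["parser"], auto_fill=True, validate=True)
```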
@@ -162,43 +273,99 @@ their original weights after the block.
 
 Create a pipeline component from a factory.
 
+<Infobox title="Changed in v3.0" variant="warning">
+
+As of v3.0, the [`Language.add_pipe`](/api/language#add_pipe) method also takes
+the string name of the factory, creates the component, adds it to the pipeline
+and returns it. The `Language.create_pipe` method is now mostly used internally.
+To create a component and add it to the pipeline, you should always use
+`Language.add_pipe`.
+
+</Infobox>
+
 > #### Example
 >
 > ```python
 > parser = nlp.create_pipe("parser")
 > nlp.add_pipe(parser)
 > ```
 
-| Name | Type | Description |
-| ----------- | -------- | ---------------------------------------------------------------------------------- |
-| `name` | str | Factory name to look up in [`Language.factories`](/api/language#class-attributes). |
-| `config` | dict | Configuration parameters to initialize component. |
-| **RETURNS** | callable | The pipeline component. |
+| Name | Type | Description |
+| ------------------------------------- | ---------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `factory_name` | str | Name of the registered component factory. |
+| `name` | str | Optional unique name of pipeline component instance. If not set, the factory name is used. An error is raised if the name already exists in the pipeline. |
+| `config` <Tag variant="new">3</Tag> | `Dict[str, Any]` | Optional config parameters to use for this component. Will be merged with the `default_config` specified by the component factory. |
+| `validate` <Tag variant="new">3</Tag> | bool | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. |
+| **RETURNS** | callable | The pipeline component. |
 
 ## Language.add_pipe {#add_pipe tag="method" new="2"}
 
-Add a component to the processing pipeline. Valid components are callables that
-take a `Doc` object, modify it and return it. Only one of `before`, `after`,
-`first` or `last` can be set. Default behavior is `last=True`.
+Add a component to the processing pipeline. Expects a name that maps to a
+component factory registered using
+[`@Language.component`](/api/language#component) or
+[`@Language.factory`](/api/language#factory). Components should be callables
+that take a `Doc` object, modify it and return it. Only one of `before`,
+`after`, `first` or `last` can be set. Default behavior is `last=True`.
+
+<Infobox title="Changed in v3.0" variant="warning">
+
+As of v3.0, the [`Language.add_pipe`](/api/language#add_pipe) method doesn't
+take callables anymore and instead expects the name of a component factory
+registered using [`@Language.component`](/api/language#component) or
+[`@Language.factory`](/api/language#factory). It now takes care of creating the
+component, adds it to the pipeline and returns it.
+
+</Infobox>
 
 > #### Example
 >
 > ```python
-> def component(doc):
+> @Language.component("component")
+> def component_func(doc):
 >     # modify Doc and return it
 >     return doc
 >
-> nlp.add_pipe(component, before="ner")
-> nlp.add_pipe(component, name="custom_name", last=True)
+> nlp.add_pipe("component", before="ner")
+> component = nlp.add_pipe("component", name="custom_name", last=True)
 > ```
 
-| Name | Type | Description |
-| ----------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| `component` | callable | The pipeline component. |
-| `name` | str | Name of pipeline component. Overwrites existing `component.name` attribute if available. If no `name` is set and the component exposes no name attribute, `component.__name__` is used. An error is raised if the name already exists in the pipeline. |
-| `before` | str | Component name to insert component directly before. |
-| `after` | str | Component name to insert component directly after: |
-| `first` | bool | Insert component first / not first in the pipeline. |
-| `last` | bool | Insert component last / not last in the pipeline. |
+| Name | Type | Description |
+| -------------------------------------- | ---------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `factory_name` | str | Name of the registered component factory. |
+| `name` | str | Optional unique name of pipeline component instance. If not set, the factory name is used. An error is raised if the name already exists in the pipeline. |
+| _keyword-only_ | | |
+| `before` | str / int | Component name or index to insert component directly before. |
+| `after` | str / int | Component name or index to insert component directly after. |
+| `first` | bool | Insert component first / not first in the pipeline. |
+| `last` | bool | Insert component last / not last in the pipeline. |
+| `config` <Tag variant="new">3</Tag> | `Dict[str, Any]` | Optional config parameters to use for this component. Will be merged with the `default_config` specified by the component factory. |
+| `validate` <Tag variant="new">3</Tag> | bool | Whether to validate the component config and arguments against the types expected by the factory. Defaults to `True`. |
+| **RETURNS** <Tag variant="new">3</Tag> | callable | The pipeline component. |
+
+## Language.has_factory {#has_factory tag="classmethod" new="3"}
+
+Check whether a factory name is registered on the `Language` class or subclass.
+Will check for
+[language-specific factories](/usage/processing-pipelines#factories-language)
+registered on the subclass, as well as general-purpose factories registered on
+the `Language` base class, available to all subclasses.
+
+> #### Example
+>
+> ```python
+> from spacy.language import Language
+> from spacy.lang.en import English
+>
+> @English.component("component")
+> def component(doc):
+>     return doc
+>
+> assert English.has_factory("component")
+> assert not Language.has_factory("component")
+> ```
+
+| Name | Type | Description |
+| ----------- | ---- | ---------------------------------------------------------- |
+| `name` | str | Name of the pipeline factory to check. |
+| **RETURNS** | bool | Whether a factory of that name is registered on the class. |
 
 ## Language.has_pipe {#has_pipe tag="method" new="2"}
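The placement arguments in the new `add_pipe` table above accept either names or, in v3, integer indices. A sketch of the main options, assuming `"component"` and `"ner"` factories are registered on the pipeline:

```python
nlp.add_pipe("component", first=True)               # start of the pipeline
nlp.add_pipe("component", name="c2", last=True)     # end of the pipeline (default)
nlp.add_pipe("component", name="c3", before="ner")  # before a named component
nlp.add_pipe("component", name="c4", after=0)       # after a positional index
```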
@@ -208,9 +375,13 @@ Check whether a component is present in the pipeline. Equivalent to
 > #### Example
 >
 > ```python
-> nlp.add_pipe(lambda doc: doc, name="component")
-> assert "component" in nlp.pipe_names
-> assert nlp.has_pipe("component")
+> @Language.component("component")
+> def component(doc):
+>     return doc
+>
+> nlp.add_pipe("component", name="my_component")
+> assert "my_component" in nlp.pipe_names
+> assert nlp.has_pipe("my_component")
 > ```
 
 | Name | Type | Description |
@@ -324,6 +495,43 @@ As of spaCy v3.0, the `disable_pipes` method has been renamed to `select_pipes`:
 | `enable` | str / list | Names(s) of pipeline components that will not be disabled. |
 | **RETURNS** | `DisabledPipes` | The disabled pipes that can be restored by calling the object's `.restore()` method. |
 
+## Language.meta {#meta tag="property"}
+
+Custom meta data for the Language class. If a model is loaded, contains meta
+data of the model. The `Language.meta` is also what's serialized as the
+`meta.json` when you save an `nlp` object to disk.
+
+> #### Example
+>
+> ```python
+> print(nlp.meta)
+> ```
+
+| Name | Type | Description |
+| ----------- | ---- | -------------- |
+| **RETURNS** | dict | The meta data. |
+
+## Language.config {#config tag="property" new="3"}
+
+Export a trainable [`config.cfg`](/api/data-formats#config) for the current
+`nlp` object. Includes the current pipeline, all configs used to create the
+currently active pipeline components, as well as the default training config
+that can be used with [`spacy train`](/api/cli#train). `Language.config` returns
+a [Thinc `Config` object](https://thinc.ai/docs/api-config#config), which is a
+subclass of the built-in `dict`. It supports the additional methods `to_disk`
+(serialize the config to a file) and `to_str` (output the config as a string).
+
+> #### Example
+>
+> ```python
+> nlp.config.to_disk("./config.cfg")
+> print(nlp.config.to_str())
+> ```
+
+| Name | Type | Description |
+| ----------- | --------------------------------------------------- | ----------- |
+| **RETURNS** | [`Config`](https://thinc.ai/docs/api-config#config) | The config. |
+
 ## Language.to_disk {#to_disk tag="method" new="2"}
 
 Save the current state to a directory. If a model is loaded, this will **include
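Because the `Language.config` property added above is the same recipe that `Language.from_config` consumes, a pipeline can be round-tripped through disk. A minimal sketch:

```python
from thinc.api import Config
from spacy.language import Language

nlp.config.to_disk("./config.cfg")           # export the current recipe
config = Config().from_disk("./config.cfg")  # ...later, load it back
nlp2 = Language.from_config(config)          # rebuild an equivalent pipeline
```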
@@ -405,23 +613,25 @@ available to the loaded object.
 
 ## Attributes {#attributes}
 
-| Name | Type | Description |
-| ------------------------------------------ | ----------- | ----------------------------------------------------------------------------------------------- |
-| `vocab` | `Vocab` | A container for the lexical types. |
-| `tokenizer` | `Tokenizer` | The tokenizer. |
-| `make_doc` | `callable` | Callable that takes a string and returns a `Doc`. |
-| `pipeline` | list | List of `(name, component)` tuples describing the current processing pipeline, in order. |
-| `pipe_names` <Tag variant="new">2</Tag> | list | List of pipeline component names, in order. |
-| `pipe_labels` <Tag variant="new">2.2</Tag> | dict | List of labels set by the pipeline components, if available, keyed by component name. |
-| `meta` | dict | Custom meta data for the Language class. If a model is loaded, contains meta data of the model. |
-| `path` <Tag variant="new">2</Tag> | `Path` | Path to the model data directory, if a model is loaded. Otherwise `None`. |
+| Name | Type | Description |
+| --------------------------------------------- | ----------------------------- | ---------------------------------------------------------------------------------------- |
+| `vocab` | `Vocab` | A container for the lexical types. |
+| `tokenizer` | `Tokenizer` | The tokenizer. |
+| `make_doc` | `Callable` | Callable that takes a string and returns a `Doc`. |
+| `pipeline` | `List[Tuple[str, Callable]]` | List of `(name, component)` tuples describing the current processing pipeline, in order. |
+| `pipe_names` <Tag variant="new">2</Tag> | `List[str]` | List of pipeline component names, in order. |
+| `pipe_labels` <Tag variant="new">2.2</Tag> | `Dict[str, List[str]]` | List of labels set by the pipeline components, if available, keyed by component name. |
+| `pipe_factories` <Tag variant="new">2.2</Tag> | `Dict[str, str]` | Dictionary of pipeline component names, mapped to their factory names. |
+| `factory_names` <Tag variant="new">3</Tag> | `List[str]` | List of all available factory names. |
+| `path` <Tag variant="new">2</Tag> | `Path` | Path to the model data directory, if a model is loaded. Otherwise `None`. |
 
 ## Class attributes {#class-attributes}
 
-| Name | Type | Description |
-| ---------- | ----- | ----------------------------------------------------------------------------------------------- |
-| `Defaults` | class | Settings, data and factory methods for creating the `nlp` object and processing pipeline. |
-| `lang` | str | Two-letter language ID, i.e. [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes). |
+| Name | Type | Description |
+| ---------------- | ----- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `Defaults` | class | Settings, data and factory methods for creating the `nlp` object and processing pipeline. |
+| `lang` | str | Two-letter language ID, i.e. [ISO code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes). |
+| `default_config` | dict | Base [config](/usage/training#config) to use for [Language.config](/api/language#config). Defaults to [`default_config.cfg`](https://github.com/explosion/spaCy/tree/develop/spacy/default_config.cfg). |
 
 ## Defaults {#defaults}
@@ -10,20 +10,18 @@ coarse-grained POS tags following the Universal Dependencies
 [UPOS](https://universaldependencies.org/u/pos/index.html) and
 [FEATS](https://universaldependencies.org/format.html#morphological-annotation)
 annotation guidelines. This class is a subclass of `Pipe` and follows the same
-API. The component is also available via the string name `"morphologizer"`.
-After initialization, it is typically added to the processing pipeline using
-[`nlp.add_pipe`](/api/language#add_pipe).
+API. The pipeline component is available in the
+[processing pipeline](/usage/processing-pipelines) via the ID `"morphologizer"`.
 
-## Default config {#config}
+## Implementation and defaults {#implementation}
 
-This is the default configuration used to initialize the model powering the
-pipeline component. See the [model architectures](/api/architectures)
-documentation for details on the architectures and their arguments and
-hyperparameters. To learn more about how to customize the config and train
-custom models, check out the [training config](/usage/training#config) docs.
+See the [model architectures](/api/architectures) documentation for details on
+the architectures and their arguments and hyperparameters. To learn more about
+how to customize the config and train custom models, check out the
+[training config](/usage/training#config) docs.
 
 ```python
-https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/morphologizer_defaults.cfg
+https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/morphologizer.pyx
 ```
 
 ## Morphologizer.\_\_init\_\_ {#init tag="method"}
@@ -33,24 +31,19 @@ Initialize the morphologizer.
 > #### Example
 >
 > ```python
-> # Construction via create_pipe
-> morphologizer = nlp.create_pipe("morphologizer")
->
-> # Construction from class
-> from spacy.pipeline import Morphologizer
-> morphologizer = Morphologizer()
+> # Construction via add_pipe
+> morphologizer = nlp.add_pipe("morphologizer")
 > ```
 
 Create a new pipeline instance. In your application, you would normally use a
 shortcut for this and instantiate the component using its string name and
-[`nlp.create_pipe`](/api/language#create_pipe).
+[`nlp.add_pipe`](/api/language#add_pipe).
 
-| Name | Type | Description |
-| ----------- | -------- | ------------------------------------------------------------------------------- |
-| `vocab` | `Vocab` | The shared vocabulary. |
-| `model` | `Model` | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. |
-| `**cfg` | - | Configuration parameters. |
+| Name | Type | Description |
+| ----------- | --------------- | ------------------------------------------------------------------------------- |
+| `vocab` | `Vocab` | The shared vocabulary. |
+| `model` | `Model` | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. |
+| `**cfg` | - | Configuration parameters. |
+| **RETURNS** | `Morphologizer` | The newly constructed object. |
 
 ## Morphologizer.\_\_call\_\_ {#call tag="method"}
@@ -58,8 +51,8 @@ shortcut for this and instantiate the component using its string name and
 Apply the pipe to one document. The document is modified in place, and returned.
 This usually happens under the hood when the `nlp` object is called on a text
 and all pipeline components are applied to the `Doc` in order. Both
-[`__call__`](/api/morphologizer#call) and [`pipe`](/api/morphologizer#pipe) delegate to the
-[`predict`](/api/morphologizer#predict) and
+[`__call__`](/api/morphologizer#call) and [`pipe`](/api/morphologizer#pipe)
+delegate to the [`predict`](/api/morphologizer#predict) and
 [`set_annotations`](/api/morphologizer#set_annotations) methods.
 
 > #### Example
@@ -81,7 +74,8 @@ and all pipeline components are applied to the `Doc` in order. Both
 Apply the pipe to a stream of documents. This usually happens under the hood
 when the `nlp` object is called on a text and all pipeline components are
 applied to the `Doc` in order. Both [`__call__`](/api/morphologizer#call) and
-[`pipe`](/api/morphologizer#pipe) delegate to the [`predict`](/api/morphologizer#predict) and
+[`pipe`](/api/morphologizer#pipe) delegate to the
+[`predict`](/api/morphologizer#predict) and
 [`set_annotations`](/api/morphologizer#set_annotations) methods.
 
 > #### Example
@@ -126,9 +120,9 @@ Modify a batch of documents, using pre-computed scores.
 > morphologizer.set_annotations([doc1, doc2], scores)
 > ```
 
-| Name | Type | Description |
-| -------- | --------------- | ------------------------------------------------ |
-| `docs` | `Iterable[Doc]` | The documents to modify. |
+| Name | Type | Description |
+| -------- | --------------- | ------------------------------------------------------- |
+| `docs` | `Iterable[Doc]` | The documents to modify. |
 | `scores` | - | The scores to set, produced by `Morphologizer.predict`. |
 
 ## Morphologizer.update {#update tag="method"}
@@ -145,15 +139,15 @@ pipe's model. Delegates to [`predict`](/api/morphologizer#predict) and
 > losses = morphologizer.update(examples, sgd=optimizer)
 > ```
 
-| Name | Type | Description |
-| ----------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
-| `examples` | `Iterable[Example]` | A batch of [`Example`](/api/example) objects to learn from. |
-| _keyword-only_ | | |
-| `drop` | float | The dropout rate. |
-| `sgd` | `Optimizer` | The [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
-| `losses` | `Dict[str, float]` | Optional record of the loss during training. The value keyed by the model's name is updated. |
-| **RETURNS** | `Dict[str, float]` | The updated `losses` dictionary. |
+| Name | Type | Description |
+| ----------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
+| `examples` | `Iterable[Example]` | A batch of [`Example`](/api/example) objects to learn from. |
+| _keyword-only_ | | |
+| `drop` | float | The dropout rate. |
+| `set_annotations` | bool | Whether or not to update the `Example` objects with the predictions, delegating to [`set_annotations`](/api/morphologizer#set_annotations). |
+| `sgd` | `Optimizer` | The [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. |
+| `losses` | `Dict[str, float]` | Optional record of the loss during training. The value keyed by the model's name is updated. |
+| **RETURNS** | `Dict[str, float]` | The updated `losses` dictionary. |
 
 ## Morphologizer.get_loss {#get_loss tag="method"}
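Putting the `update` hunk above together with `begin_training`, `sgd` and `losses`, here is a minimal training-loop sketch under the API in this commit; `train_examples` is assumed to be a list of `Example` objects, and the `"morphologizer"` loss key assumes the default component name:

```python
import random

optimizer = morphologizer.begin_training(pipeline=nlp.pipeline)
losses = {}
for epoch in range(10):
    random.shuffle(train_examples)
    # Accumulates the loss in losses["morphologizer"] while updating the model
    morphologizer.update(train_examples, drop=0.2, sgd=optimizer, losses=losses)
    print(epoch, losses["morphologizer"])
```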
@@ -187,12 +181,12 @@ Initialize the pipe for training, using data examples if available. Return an
 > optimizer = morphologizer.begin_training(pipeline=nlp.pipeline)
 > ```
 
-| Name | Type | Description |
-| -------------- | ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `get_examples` | `Iterable[Example]` | Optional gold-standard annotations in the form of [`Example`](/api/example) objects. |
-| `pipeline` | `List[(str, callable)]` | Optional list of pipeline components that this component is part of. |
+| Name | Type | Description |
+| -------------- | ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `get_examples` | `Iterable[Example]` | Optional gold-standard annotations in the form of [`Example`](/api/example) objects. |
+| `pipeline` | `List[(str, callable)]` | Optional list of pipeline components that this component is part of. |
 | `sgd` | `Optimizer` | An optional [`Optimizer`](https://thinc.ai/docs/api-optimizers) object. Will be created via [`create_optimizer`](/api/morphologizer#create_optimizer) if not set. |
-| **RETURNS** | `Optimizer` | An optimizer. |
+| **RETURNS** | `Optimizer` | An optimizer. |
 
 ## Morphologizer.create_optimizer {#create_optimizer tag="method"}
@@ -237,9 +231,9 @@ both `pos` and `morph`, the label should include the UPOS as the feature `POS`.
 > morphologizer.add_label("Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin")
 > ```
 
-| Name | Type | Description |
-| -------- | ---- | --------------------------------------------------------------- |
-| `label` | str | The label to add. |
+| Name | Type | Description |
+| ------- | ---- | ----------------- |
+| `label` | str | The label to add. |
 
 ## Morphologizer.to_disk {#to_disk tag="method"}
@@ -268,11 +262,11 @@ Load the pipe from disk. Modifies the object in place and returns it.
 > morphologizer.from_disk("/path/to/morphologizer")
 > ```
 
-| Name | Type | Description |
-| ----------- | ------------ | -------------------------------------------------------------------------- |
-| `path` | str / `Path` | A path to a directory. Paths may be either strings or `Path`-like objects. |
-| `exclude` | list | String names of [serialization fields](#serialization-fields) to exclude. |
-| **RETURNS** | `Morphologizer` | The modified `Morphologizer` object. |
+| Name | Type | Description |
+| ----------- | --------------- | -------------------------------------------------------------------------- |
+| `path` | str / `Path` | A path to a directory. Paths may be either strings or `Path`-like objects. |
+| `exclude` | list | String names of [serialization fields](#serialization-fields) to exclude. |
+| **RETURNS** | `Morphologizer` | The modified `Morphologizer` object. |
 
 ## Morphologizer.to_bytes {#to_bytes tag="method"}
@@ -288,7 +282,7 @@ Serialize the pipe to a bytestring.
 | Name | Type | Description |
 | ----------- | ----- | ------------------------------------------------------------------------- |
 | `exclude` | list | String names of [serialization fields](#serialization-fields) to exclude. |
-| **RETURNS** | bytes | The serialized form of the `Morphologizer` object. |
+| **RETURNS** | bytes | The serialized form of the `Morphologizer` object. |
 
 ## Morphologizer.from_bytes {#from_bytes tag="method"}
@@ -302,16 +296,16 @@ Load the pipe from a bytestring. Modifies the object in place and returns it.
 > morphologizer.from_bytes(morphologizer_bytes)
 > ```
 
-| Name | Type | Description |
-| ------------ | -------- | ------------------------------------------------------------------------- |
-| `bytes_data` | bytes | The data to load from. |
-| `exclude` | list | String names of [serialization fields](#serialization-fields) to exclude. |
-| **RETURNS** | `Morphologizer` | The `Morphologizer` object. |
+| Name | Type | Description |
+| ------------ | --------------- | ------------------------------------------------------------------------- |
+| `bytes_data` | bytes | The data to load from. |
+| `exclude` | list | String names of [serialization fields](#serialization-fields) to exclude. |
+| **RETURNS** | `Morphologizer` | The `Morphologizer` object. |
 
 ## Morphologizer.labels {#labels tag="property"}
 
-The labels currently added to the component in Universal Dependencies [FEATS
-format](https://universaldependencies.org/format.html#morphological-annotation).
+The labels currently added to the component in Universal Dependencies
+[FEATS format](https://universaldependencies.org/format.html#morphological-annotation).
 Note that even for a blank component, this will always include the internal
 empty label `_`. If POS features are used, the labels will include the
 coarse-grained POS as the feature `POS`.
@@ -339,8 +333,8 @@ serialization by passing in the string names via the `exclude` argument.
 > data = morphologizer.to_disk("/path", exclude=["vocab"])
 > ```
 
-| Name | Description |
-| --------- | ------------------------------------------------------------------------------------------ |
-| `vocab` | The shared [`Vocab`](/api/vocab). |
-| `cfg` | The config file. You usually don't want to exclude this. |
-| `model` | The binary model data. You usually don't want to exclude this. |
+| Name | Description |
+| ------- | -------------------------------------------------------------- |
+| `vocab` | The shared [`Vocab`](/api/vocab). |
+| `cfg` | The config file. You usually don't want to exclude this. |
+| `model` | The binary model data. You usually don't want to exclude this. |
@@ -11,8 +11,7 @@ menu:
 ## merge_noun_chunks {#merge_noun_chunks tag="function"}
 
 Merge noun chunks into a single token. Also available via the string name
-`"merge_noun_chunks"`. After initialization, the component is typically added to
-the processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
+`"merge_noun_chunks"`.
 
 > #### Example
 >
@@ -20,9 +19,7 @@ the processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
 > texts = [t.text for t in nlp("I have a blue car")]
 > assert texts == ["I", "have", "a", "blue", "car"]
 >
-> merge_nps = nlp.create_pipe("merge_noun_chunks")
-> nlp.add_pipe(merge_nps)
->
+> nlp.add_pipe("merge_noun_chunks")
 > texts = [t.text for t in nlp("I have a blue car")]
 > assert texts == ["I", "have", "a blue car"]
 > ```
@@ -44,8 +41,7 @@ all other components.
 ## merge_entities {#merge_entities tag="function"}
 
 Merge named entities into a single token. Also available via the string name
-`"merge_entities"`. After initialization, the component is typically added to
-the processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
+`"merge_entities"`.
 
 > #### Example
 >
@@ -53,8 +49,7 @@ the processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
 > texts = [t.text for t in nlp("I like David Bowie")]
 > assert texts == ["I", "like", "David", "Bowie"]
 >
-> merge_ents = nlp.create_pipe("merge_entities")
-> nlp.add_pipe(merge_ents)
+> nlp.add_pipe("merge_entities")
 >
 > texts = [t.text for t in nlp("I like David Bowie")]
 > assert texts == ["I", "like", "David Bowie"]
@@ -76,12 +71,9 @@ components to the end of the pipeline and after all other components.
 ## merge_subtokens {#merge_subtokens tag="function" new="2.1"}
 
 Merge subtokens into a single token. Also available via the string name
-`"merge_subtokens"`. After initialization, the component is typically added to
-the processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
-
-As of v2.1, the parser is able to predict "subtokens" that should be merged into
-one single token later on. This is especially relevant for languages like
-Chinese, Japanese or Korean, where a "word" isn't defined as a
+`"merge_subtokens"`. As of v2.1, the parser is able to predict "subtokens" that
+should be merged into one single token later on. This is especially relevant for
+languages like Chinese, Japanese or Korean, where a "word" isn't defined as a
 whitespace-delimited sequence of characters. Under the hood, this component uses
 the [`Matcher`](/api/matcher) to find sequences of tokens with the dependency
 label `"subtok"` and then merges them into a single token.
@@ -96,9 +88,7 @@ label `"subtok"` and then merges them into a single token.
 > print([(token.text, token.dep_) for token in doc])
 > # [('拜', 'subtok'), ('托', 'subtok')]
 >
-> merge_subtok = nlp.create_pipe("merge_subtokens")
-> nlp.add_pipe(merge_subtok)
->
+> nlp.add_pipe("merge_subtokens")
 > doc = nlp("拜托")
 > print([token.text for token in doc])
 > # ['拜托']
@@ -1,26 +1,24 @@
 ---
 title: SentenceRecognizer
 tag: class
-source: spacy/pipeline/pipes.pyx
+source: spacy/pipeline/senter.pyx
 new: 3
 ---
 
 A trainable pipeline component for sentence segmentation. For a simpler,
 rule-based strategy, see the [`Sentencizer`](/api/sentencizer). This class is a
 subclass of `Pipe` and follows the same API. The component is also available via
-the string name `"senter"`. After initialization, it is typically added to the
-processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
+the string name `"senter"`.
 
-## Default config {#config}
+## Implementation and defaults {#implementation}
 
-This is the default configuration used to initialize the model powering the
-pipeline component. See the [model architectures](/api/architectures)
-documentation for details on the architectures and their arguments and
-hyperparameters. To learn more about how to customize the config and train
-custom models, check out the [training config](/usage/training#config) docs.
+See the [model architectures](/api/architectures) documentation for details on
+the architectures and their arguments and hyperparameters. To learn more about
+how to customize the config and train custom models, check out the
+[training config](/usage/training#config) docs.
 
 ```python
-https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/senter_defaults.cfg
+https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/senter.pyx
 ```
 
 ## SentenceRecognizer.\_\_init\_\_ {#init tag="method"}
@@ -30,12 +28,8 @@ Initialize the sentence recognizer.
 > #### Example
 >
 > ```python
-> # Construction via create_pipe
-> senter = nlp.create_pipe("senter")
->
-> # Construction from class
-> from spacy.pipeline import SentenceRecognizer
-> senter = SentenceRecognizer()
+> # Construction via add_pipe
+> senter = nlp.add_pipe("senter")
 > ```
 
 <!-- TODO: document, similar to other trainable pipeline components -->
@@ -9,8 +9,7 @@ that doesn't require the dependency parse. By default, sentence segmentation is
 performed by the [`DependencyParser`](/api/dependencyparser), so the
 `Sentencizer` lets you implement a simpler, rule-based strategy that doesn't
 require a statistical model to be loaded. The component is also available via
-the string name `"sentencizer"`. After initialization, it is typically added to
-the processing pipeline using [`nlp.add_pipe`](/api/language#add_pipe).
+the string name `"sentencizer"`.
 
 ## Sentencizer.\_\_init\_\_ {#init tag="method"}
 
@@ -19,12 +18,8 @@ Initialize the sentencizer.
 > #### Example
 >
 > ```python
-> # Construction via create_pipe
-> sentencizer = nlp.create_pipe("sentencizer")
->
-> # Construction from class
-> from spacy.pipeline import Sentencizer
-> sentencizer = Sentencizer()
+> # Construction via add_pipe
+> sentencizer = nlp.add_pipe("sentencizer")
 > ```
 
 | Name | Type | Description |
@@ -58,8 +53,7 @@ the component has been added to the pipeline using
 > from spacy.lang.en import English
 >
 > nlp = English()
-> sentencizer = nlp.create_pipe("sentencizer")
-> nlp.add_pipe(sentencizer)
+> nlp.add_pipe("sentencizer")
 > doc = nlp("This is a sentence. This is another sentence.")
 > assert len(list(doc.sents)) == 2
 > ```
@@ -1,7 +1,7 @@
 ---
 title: Tagger
 tag: class
-source: spacy/pipeline/pipes.pyx
+source: spacy/pipeline/tagger.pyx
 ---
 
 This class is a subclass of `Pipe` and follows the same API. The pipeline
@@ -13,22 +13,17 @@ via the ID `"tagger"`.
 > #### Example
 >
 > ```python
-> # Construction via create_pipe
-> tagger = nlp.create_pipe("tagger")
+> # Construction via add_pipe with default model
+> tagger = nlp.add_pipe("tagger")
 >
 > # Construction via create_pipe with custom model
 > config = {"model": {"@architectures": "my_tagger"}}
-> parser = nlp.create_pipe("tagger", config)
->
-> # Construction from class with custom model from file
-> from spacy.pipeline import Tagger
-> model = util.load_config("model.cfg", create_objects=True)["model"]
-> tagger = Tagger(nlp.vocab, model)
+> tagger = nlp.add_pipe("tagger", config=config)
 > ```
 
 Create a new pipeline instance. In your application, you would normally use a
 shortcut for this and instantiate the component using its string name and
-[`nlp.create_pipe`](/api/language#create_pipe).
+[`nlp.add_pipe`](/api/language#add_pipe).
 
 | Name | Type | Description |
 | ----------- | -------- | ------------------------------------------------------------------------------- |
@@ -1,7 +1,7 @@
 ---
 title: TextCategorizer
 tag: class
-source: spacy/pipeline/pipes.pyx
+source: spacy/pipeline/textcat.py
 new: 2
 ---
 
@@ -9,41 +9,33 @@ This class is a subclass of `Pipe` and follows the same API. The pipeline

component is available in the [processing pipeline](/usage/processing-pipelines)
via the ID `"textcat"`.

## Default config {#config}
## Implementation and defaults {#implementation}

This is the default configuration used to initialize the model powering the
pipeline component. See the [model architectures](/api/architectures)
documentation for details on the architectures and their arguments and
hyperparameters. To learn more about how to customize the config and train
custom models, check out the [training config](/usage/training#config) docs.
See the [model architectures](/api/architectures) documentation for details on
the architectures and their arguments and hyperparameters. To learn more about
how to customize the config and train custom models, check out the
[training config](/usage/training#config) docs.

```python
https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/textcat_defaults.cfg
https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/textcat.py
```

<!-- TODO: do we also need to document the other defaults here? -->
## TextCategorizer.\_\_init\_\_ {#init tag="method"}

> #### Example
>
> ```python
> # Construction via create_pipe
> textcat = nlp.create_pipe("textcat")
> # Construction via add_pipe with default model
> textcat = nlp.add_pipe("textcat")
>
> # Construction via create_pipe with custom model
> # Construction via add_pipe with custom model
> config = {"model": {"@architectures": "my_textcat"}}
> textcat = nlp.create_pipe("textcat", config)
>
> # Construction from class with custom model from file
> from spacy.pipeline import TextCategorizer
> model = util.load_config("model.cfg", create_objects=True)["model"]
> textcat = TextCategorizer(nlp.vocab, model)
> textcat = nlp.add_pipe("textcat", config=config)
> ```

Create a new pipeline instance. In your application, you would normally use a
shortcut for this and instantiate the component using its string name and
[`nlp.create_pipe`](/api/language#create_pipe).
[`nlp.add_pipe`](/api/language#add_pipe).

| Name | Type | Description |
| ----------- | ----------------- | ------------------------------------------------------------------------------- |
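A short sketch of typical v3-style setup for this component: `add_label` registers categories before training, and predictions later land in `doc.cats` (scores are only meaningful once the component is trained):

```python
import spacy

nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat")
# Register the categories the model should predict (trained later)
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")
```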
@@ -4,16 +4,15 @@ source: spacy/pipeline/tok2vec.py

new: 3
---

TODO: document
<!-- TODO: document -->

## Default config {#config}
## Implementation and defaults {#implementation}

This is the default configuration used to initialize the model powering the
pipeline component. See the [model architectures](/api/architectures)
documentation for details on the architectures and their arguments and
hyperparameters. To learn more about how to customize the config and train
custom models, check out the [training config](/usage/training#config) docs.
See the [model architectures](/api/architectures) documentation for details on
the architectures and their arguments and hyperparameters. To learn more about
how to customize the config and train custom models, check out the
[training config](/usage/training#config) docs.

```python
https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/defaults/tok2vec_defaults.cfg
https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/tok2vec.py
```
@@ -31,7 +31,7 @@ the

> nlp = English()
> # Create a Tokenizer with the default settings for English
> # including punctuation rules and exceptions
> tokenizer = nlp.Defaults.create_tokenizer(nlp)
> tokenizer = nlp.tokenizer
> ```

| Name | Type | Description |
@@ -45,7 +45,8 @@ class, loads in the model data and returns it.

### Abstract example
cls = util.get_lang_class(lang) # get language for ID, e.g. 'en'
nlp = cls() # initialise the language
for name in pipeline:
    component = nlp.create_pipe(name) # create each pipeline component
    nlp.add_pipe(component) # add component to pipeline
for name in pipeline:
    nlp.add_pipe(name) # add component to pipeline
nlp.from_disk(model_data_path) # load in model data
```
@@ -479,7 +480,6 @@ you can use the [`set_lang_class`](/api/top-level#util.set_lang_class) helper.

> for lang_id in ["en", "de"]:
>     lang_class = util.get_lang_class(lang_id)
>     lang = lang_class()
>     tokenizer = lang.Defaults.create_tokenizer()
> ```

| Name | Type | Description |
@@ -1,30 +1,33 @@

<!-- Diff of an SVG image: the processing-pipeline diagram (Text → nlp: tokenizer → tagger → parser → ner → … → Doc) was redrawn; the raw SVG source is omitted here -->

Before Size: 3.1 KiB | After Size: 13 KiB
@@ -1,47 +1,60 @@

<!-- Diff of an SVG image: the training-loop diagram (labels include "Training data", "text", "label", "Model", "PREDICT", "GRADIENT", "SAVE" and "Updated Model") was redrawn; the raw SVG source is omitted here -->

Before Size: 3.9 KiB | After Size: 18 KiB
@@ -18,13 +18,13 @@ an **annotated document**. It also orchestrates training and serialization.

### Container objects {#architecture-containers}

| Name | Description |
| ------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [`Doc`](/api/doc) | A container for accessing linguistic annotations. |
| [`Span`](/api/span) | A slice from a `Doc` object. |
| [`Token`](/api/token) | An individual token — i.e. a word, punctuation symbol, whitespace, etc. |
| [`Lexeme`](/api/lexeme) | An entry in the vocabulary. It's a word type with no context, as opposed to a word token. It therefore has no part-of-speech tag, dependency parse etc. |
| [`MorphAnalysis`](/api/morphanalysis) | A morphological analysis. |

### Processing pipeline {#architecture-pipeline}
@@ -52,5 +52,3 @@ an **annotated document**. It also orchestrates training and serialization.

| [`StringStore`](/api/stringstore) | Map strings to and from hash values. |
| [`Vectors`](/api/vectors) | Container class for vector data keyed by string. |
| [`Example`](/api/example) | Collection for training annotations. |
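For illustration, a minimal sketch of the `StringStore` mapping described in the table above:

```python
from spacy.strings import StringStore

stringstore = StringStore(["apple"])
apple_hash = stringstore["apple"]          # string -> 64-bit hash
assert stringstore[apple_hash] == "apple"  # hash -> string
```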
@@ -12,29 +12,32 @@ passed on to the next component.

> - **Creates:** Objects, attributes and properties modified and set by the
>   component.

| Name | Component | Creates | Description |
| ------------- | ------------------------------------------------------------------ | ----------------------------------------------------------- | ------------------------------------------------ |
| **tokenizer** | [`Tokenizer`](/api/tokenizer) | `Doc` | Segment text into tokens. |
| **tagger** | [`Tagger`](/api/tagger) | `Doc[i].tag` | Assign part-of-speech tags. |
| **parser** | [`DependencyParser`](/api/dependencyparser) | `Doc[i].head`, `Doc[i].dep`, `Doc.sents`, `Doc.noun_chunks` | Assign dependency labels. |
| **ner** | [`EntityRecognizer`](/api/entityrecognizer) | `Doc.ents`, `Doc[i].ent_iob`, `Doc[i].ent_type` | Detect and label named entities. |
| **textcat** | [`TextCategorizer`](/api/textcategorizer) | `Doc.cats` | Assign document labels. |
| ... | [custom components](/usage/processing-pipelines#custom-components) | `Doc._.xxx`, `Token._.xxx`, `Span._.xxx` | Assign custom attributes, methods or properties. |

The processing pipeline always **depends on the statistical model** and its
capabilities. For example, a pipeline can only include an entity recognizer
component if the model includes data to make predictions of entity labels. This
is why each model will specify the pipeline to use in its meta data, as a simple
list containing the component names:
is why each model will specify the pipeline to use in its meta data and
[config](/usage/training#config), as a simple list containing the component
names:
```json
"pipeline": ["tagger", "parser", "ner"]
```
```ini
pipeline = ["tagger", "parser", "ner"]
```
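At runtime, this list is reflected on the `nlp` object. A quick sketch, assuming `en_core_web_sm` is installed:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
print(nlp.pipe_names)  # e.g. ['tagger', 'parser', 'ner']
```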
import Accordion from 'components/accordion.js'

<Accordion title="Does the order of pipeline components matter?" id="pipeline-components-order">

<!-- TODO: note on v3 tok2vec own model vs. upstream listeners -->

In spaCy v2.x, the statistical components like the tagger or parser are
independent and don't share any data between themselves. For example, the named
entity recognizer doesn't use any features set by the tagger and parser, and so
@@ -48,11 +51,10 @@ pre-defined sentence boundaries, so if a previous component in the pipeline sets

them, its dependency predictions may be different. Similarly, it matters if you
add the [`EntityRuler`](/api/entityruler) before or after the statistical entity
recognizer: if it's added before, the entity recognizer will take the existing
entities into account when making predictions.
The [`EntityLinker`](/api/entitylinker), which resolves named entities to
knowledge base IDs, should be preceded by
a pipeline component that recognizes entities such as the
[`EntityRecognizer`](/api/entityrecognizer).
entities into account when making predictions. The
[`EntityLinker`](/api/entitylinker), which resolves named entities to knowledge
base IDs, should be preceded by a pipeline component that recognizes entities
such as the [`EntityRecognizer`](/api/entityrecognizer).

</Accordion>
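A sketch of the ordering described above, using the v3 string-name API; `before="ner"` assumes the loaded pipeline actually contains an `"ner"` component:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
# Added before the statistical NER, so its matches are taken into account
ruler = nlp.add_pipe("entity_ruler", before="ner")
ruler.add_patterns([{"label": "ORG", "pattern": "Explosion"}])
```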
@@ -909,9 +909,8 @@ If you're using a statistical model, writing to the `nlp.Defaults` or

`English.Defaults` directly won't work, since the regular expressions are read
from the model and will be compiled when you load it. If you modify
`nlp.Defaults`, you'll only see the effect if you call
[`spacy.blank`](/api/top-level#spacy.blank) or `Defaults.create_tokenizer()`. If
you want to modify the tokenizer loaded from a statistical model, you should
modify `nlp.tokenizer` directly.
[`spacy.blank`](/api/top-level#spacy.blank). If you want to modify the tokenizer
loaded from a statistical model, you should modify `nlp.tokenizer` directly.

</Infobox>
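As a sketch of modifying `nlp.tokenizer` directly; the special-case rule below is purely illustrative:

```python
import spacy
from spacy.attrs import ORTH

nlp = spacy.load("en_core_web_sm")
# Add a tokenizer exception so "gimme" is split into two tokens
nlp.tokenizer.add_special_case("gimme", [{ORTH: "gim"}, {ORTH: "me"}])
print([t.text for t in nlp("gimme that")])  # ['gim', 'me', 'that']
```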
@@ -1386,8 +1385,7 @@ import spacy

from spacy.lang.en import English

nlp = English()  # just the language with no model
sentencizer = nlp.create_pipe("sentencizer")
nlp.add_pipe(sentencizer)
nlp.add_pipe("sentencizer")
doc = nlp("This is a sentence. This is another sentence.")
for sent in doc.sents:
    print(sent.text)
@@ -1422,6 +1420,7 @@ take advantage of dependency-based sentence segmentation.

```python
### {executable="true"}
from spacy.language import Language
import spacy

text = "this is a sentence...hello...and another sentence."

@@ -1430,13 +1429,14 @@ nlp = spacy.load("en_core_web_sm")

doc = nlp(text)
print("Before:", [sent.text for sent in doc.sents])

@Language.component("set_custom_boundaries")
def set_custom_boundaries(doc):
    for token in doc[:-1]:
        if token.text == "...":
            doc[token.i+1].is_sent_start = True
            doc[token.i + 1].is_sent_start = True
    return doc

nlp.add_pipe(set_custom_boundaries, before="parser")
nlp.add_pipe("set_custom_boundaries", before="parser")
doc = nlp(text)
print("After:", [sent.text for sent in doc.sents])
```
@@ -97,32 +97,40 @@ but also your own custom processing functions. A pipeline component can be added

to an already existing `nlp` object, specified when initializing a `Language`
class, or defined within a [model package](/usage/saving-loading#models).

When you load a model, spaCy first consults the model's
[`meta.json`](/usage/saving-loading#models). The meta typically includes the
model details, the ID of a language class, and an optional list of pipeline
components. spaCy then does the following:

> #### meta.json (excerpt)
> #### config.cfg (excerpt)
>
> ```json
> {
>   "lang": "en",
>   "name": "core_web_sm",
>   "description": "Example model for spaCy",
>   "pipeline": ["tagger", "parser", "ner"]
> }
> ```
> ```ini
> [nlp]
> lang = "en"
> pipeline = ["tagger", "parser"]
>
> [components]
>
> [components.tagger]
> factory = "tagger"
> # settings for the tagger component
>
> [components.parser]
> factory = "parser"
> # settings for the parser component
> ```

When you load a model, spaCy first consults the model's
[`meta.json`](/usage/saving-loading#models) and
[`config.cfg`](/usage/training#config). The config tells spaCy what language
class to use, which components are in the pipeline, and how those components
should be created. spaCy will then do the following:

1. Load the **language class and data** for the given ID via
   [`get_lang_class`](/api/top-level#util.get_lang_class) and initialize it. The
   `Language` class contains the shared vocabulary, tokenization rules and the
   language-specific annotation scheme.
2. Iterate over the **pipeline names** and create each component using
   [`create_pipe`](/api/language#create_pipe), which looks them up in
   `Language.factories`.
3. Add each pipeline component to the pipeline in order, using
   [`add_pipe`](/api/language#add_pipe).
4. Make the **model data** available to the `Language` class by calling
   language-specific settings.
2. Iterate over the **pipeline names** and look up each component name in the
   `[components]` block. The `factory` tells spaCy which
   [component factory](#custom-components-factories) to use for adding the
   component with [`add_pipe`](/api/language#add_pipe). The settings are
   passed into the factory.
3. Make the **model data** available to the `Language` class by calling
   [`from_disk`](/api/language#from_disk) with the path to the model data
   directory.
@ -132,17 +140,25 @@ So when you call this...
nlp = spacy.load("en_core_web_sm")
```

... the model's `meta.json` tells spaCy to use the language `"en"` and the
... the model's `config.cfg` tells spaCy to use the language `"en"` and the
pipeline `["tagger", "parser", "ner"]`. spaCy will then initialize
`spacy.lang.en.English`, and create each pipeline component and add it to the
processing pipeline. It'll then load in the model's data from its data directory
and return the modified `Language` class for you to use as the `nlp` object.

<Infobox title="Changed in v3.0" variant="warning">

spaCy v3.0 introduces a `config.cfg`, which includes more detailed settings for
the model pipeline, its components and the
[training process](/usage/training#config). You can export the config of your
current `nlp` object by calling [`nlp.config.to_disk`](/api/language#config).
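
For example, a minimal sketch (the output path is just an example):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
# Export the full config of the current pipeline to a file
nlp.config.to_disk("./config.cfg")
```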

</Infobox>

Fundamentally, a [spaCy model](/models) consists of three components: **the
weights**, i.e. binary data loaded in from a directory, a **pipeline** of
functions called in order, and **language data** like the tokenization rules and
annotation scheme. All of this is specific to each model, and defined in the
model's `meta.json` – for example, a Spanish NER model requires different
language-specific settings. For example, a Spanish NER model requires different
weights, language data and pipeline components than an English parsing and
tagging model. This is also why the pipeline state is always held by the
`Language` class. [`spacy.load`](/api/top-level#spacy.load) puts this all
@ -158,9 +174,8 @@ data_path = "path/to/en_core_web_sm/en_core_web_sm-2.0.0"
cls = spacy.util.get_lang_class(lang)  # 1. Get Language instance, e.g. English()
nlp = cls()                            # 2. Initialize it
for name in pipeline:
    component = nlp.create_pipe(name)  # 3. Create the pipeline components
    nlp.add_pipe(component)            # 4. Add the component to the pipeline
nlp.from_disk(model_data_path)         # 5. Load in the binary data
    nlp.add_pipe(name)                 # 3. Add the component to the pipeline
nlp.from_disk(model_data_path)         # 4. Load in the binary data
```

When you call `nlp` on a text, spaCy will **tokenize** it and then **call each
@ -190,36 +205,34 @@ print(nlp.pipe_names)
### Built-in pipeline components {#built-in}

spaCy ships with several built-in pipeline components that are also available in
the `Language.factories`. This means that you can initialize them by calling
[`nlp.create_pipe`](/api/language#create_pipe) with their string names and
require them in the pipeline settings in your model's `meta.json`.
spaCy ships with several built-in pipeline components that are registered with
string names. This means that you can initialize them by calling
[`nlp.add_pipe`](/api/language#add_pipe) with their names and spaCy will know
how to create them. See the [API documentation](/api) for a full list of
available pipeline components and component functions.

> #### Usage
>
> ```python
> # Option 1: Import and initialize
> from spacy.pipeline import EntityRuler
> ruler = EntityRuler(nlp)
> nlp.add_pipe(ruler)
>
> # Option 2: Using nlp.create_pipe
> sentencizer = nlp.create_pipe("sentencizer")
> nlp.add_pipe(sentencizer)
> nlp = spacy.blank("en")
> nlp.add_pipe("sentencizer")
> # add_pipe returns the added component
> ruler = nlp.add_pipe("entity_ruler")
> ```

| String name | Component | Description |
| ------------------- | ----------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- |
| `tagger` | [`Tagger`](/api/tagger) | Assign part-of-speech tags. |
| `parser` | [`DependencyParser`](/api/dependencyparser) | Assign dependency labels. |
| `ner` | [`EntityRecognizer`](/api/entityrecognizer) | Assign named entities. |
| `entity_linker` | [`EntityLinker`](/api/entitylinker) | Assign knowledge base IDs to named entities. Should be added after the entity recognizer. |
| `textcat` | [`TextCategorizer`](/api/textcategorizer) | Assign text categories. |
| `entity_ruler` | [`EntityRuler`](/api/entityruler) | Assign named entities based on pattern rules. |
| `sentencizer` | [`Sentencizer`](/api/sentencizer) | Add rule-based sentence segmentation without the dependency parse. |
| `merge_noun_chunks` | [`merge_noun_chunks`](/api/pipeline-functions#merge_noun_chunks) | Merge all noun chunks into a single token. Should be added after the tagger and parser. |
| `merge_entities` | [`merge_entities`](/api/pipeline-functions#merge_entities) | Merge all entities into a single token. Should be added after the entity recognizer. |
| `merge_subtokens` | [`merge_subtokens`](/api/pipeline-functions#merge_subtokens) | Merge subtokens predicted by the parser into single tokens. Should be added after the parser. |
| String name | Component | Description |
| --------------- | ------------------------------------------- | ------------------------------------------------------------------------------------------ |
| `tagger` | [`Tagger`](/api/tagger) | Assign part-of-speech tags. |
| `parser` | [`DependencyParser`](/api/dependencyparser) | Assign dependency labels. |
| `ner` | [`EntityRecognizer`](/api/entityrecognizer) | Assign named entities. |
| `entity_linker` | [`EntityLinker`](/api/entitylinker) | Assign knowledge base IDs to named entities. Should be added after the entity recognizer. |
| `textcat` | [`TextCategorizer`](/api/textcategorizer) | Assign text categories. |
| `entity_ruler` | [`EntityRuler`](/api/entityruler) | Assign named entities based on pattern rules. |
| `sentencizer` | [`Sentencizer`](/api/sentencizer) | Add rule-based sentence segmentation without the dependency parse. |

<!-- TODO: update with more components -->

<!-- TODO: explain default config and factories -->

### Disabling and modifying pipeline components {#disabling}
@ -233,7 +246,6 @@ list:
```python
### Disable loading
nlp = spacy.load("en_core_web_sm", disable=["tagger", "parser"])
nlp = English().from_disk("/model", disable=["ner"])
```
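
If you only need components disabled for a limited block of code,
[`nlp.select_pipes`](/api/language#select_pipes) can also be used as a context
manager – a minimal sketch:

```python
### Disable for block
nlp = spacy.load("en_core_web_sm")
with nlp.select_pipes(disable=["tagger", "parser"]):
    doc = nlp("I won't be tagged and parsed")
doc = nlp("I will be tagged and parsed")
```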

In some cases, you do want to load all pipeline components and their weights,
@ -297,15 +309,18 @@ nlp.replace_pipe("tagger", my_custom_tagger)
## Creating custom pipeline components {#custom-components}

A component receives a `Doc` object and can modify it – for example, by using
the current weights to make a prediction and set some annotation on the
document. By adding a component to the pipeline, you'll get access to the `Doc`
at any point **during processing** – instead of only being able to modify it
afterwards.
A pipeline component is a function that receives a `Doc` object, modifies it and
returns it – for example, by using the current weights to make a prediction
and set some annotation on the document. By adding a component to the pipeline,
you'll get access to the `Doc` at any point **during processing** – instead of
only being able to modify it afterwards.

> #### Example
>
> ```python
> from spacy.language import Language
>
> @Language.component("my_component")
> def my_component(doc):
>     # do something to the doc here
>     return doc
@ -316,6 +331,12 @@ afterwards.
| `doc`       | `Doc` | The `Doc` object processed by the previous component.  |
| **RETURNS** | `Doc` | The `Doc` object processed by this pipeline component. |

The [`@Language.component`](/api/language#component) decorator lets you turn a
simple function into a pipeline component. It takes at least one argument, the
**name** of the component factory. You can use this name to add an instance of
your component to the pipeline. It can also be listed in your model config, so
you can save, load and train models using your component.

Custom components can be added to the pipeline using the
[`add_pipe`](/api/language#add_pipe) method. Optionally, you can either specify
a component to add it **before or after**, tell spaCy to add it **first or
@ -325,23 +346,43 @@ last** in the pipeline, or define a **custom name**. If no name is set and no
> #### Example
>
> ```python
> nlp.add_pipe(my_component)
> nlp.add_pipe(my_component, first=True)
> nlp.add_pipe(my_component, before="parser")
> nlp.add_pipe("my_component")
> nlp.add_pipe("my_component", first=True)
> nlp.add_pipe("my_component", before="parser")
> ```

| Argument | Type | Description                                                               |
| -------- | ---- | ------------------------------------------------------------------------- |
| `last`   | bool | If set to `True`, component is added **last** in the pipeline (default).  |
| `first`  | bool | If set to `True`, component is added **first** in the pipeline.           |
| `before` | str  | String name of component to add the new component **before**.             |
| `after`  | str  | String name of component to add the new component **after**.              |
| Argument | Type      | Description                                                               |
| -------- | --------- | ------------------------------------------------------------------------- |
| `last`   | bool      | If set to `True`, component is added **last** in the pipeline (default).  |
| `first`  | bool      | If set to `True`, component is added **first** in the pipeline.           |
| `before` | str / int | String name or index to add the new component **before**.                 |
| `after`  | str / int | String name or index to add the new component **after**.                  |
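
Since `before` and `after` can also be an index in v3, you can position a
component numerically as well – a small sketch (the component name is just an
example):

```python
nlp.add_pipe("my_component", before=0)  # insert before the first component
```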

### Example: A simple pipeline component {#custom-components-simple}
<Infobox title="Changed in v3.0" variant="warning">

As of v3.0, components need to be registered using the
[`@Language.component`](/api/language#component) or
[`@Language.factory`](/api/language#factory) decorator so spaCy knows that a
function is a component. [`nlp.add_pipe`](/api/language#add_pipe) now takes the
**string name** of the component factory instead of the component function. This
not only saves you lines of code, it also allows spaCy to validate and track
your custom components, and make sure they can be saved and loaded.

```diff
- ruler = nlp.create_pipe("entity_ruler")
- nlp.add_pipe(ruler)
+ ruler = nlp.add_pipe("entity_ruler")
```

</Infobox>

### Examples: Simple stateless pipeline components {#custom-components-simple}

The following component receives the `Doc` in the pipeline and prints some
information about it: the number of tokens, the part-of-speech tags of the
tokens and a conditional message based on the document length.
tokens and a conditional message based on the document length. The
[`@Language.component`](/api/language#component) decorator lets you register the
component under the name `"info_component"`.

> #### ✏️ Things to try
>
@ -352,11 +393,16 @@ tokens and a conditional message based on the document length.
>    this change reflected in `nlp.pipe_names`.
> 3. Print `nlp.pipeline`. You'll see a list of tuples describing the component
>    name and the function that's called on the `Doc` object in the pipeline.
> 4. Change the first argument to `@Language.component`, the name, to something
>    else. spaCy should now complain that it doesn't know a component of the
>    name `"info_component"`.

```python
### {executable="true"}
import spacy
from spacy.language import Language

@Language.component("info_component")
def my_component(doc):
    print(f"After tokenization, this doc has {len(doc)} tokens.")
    print("The part-of-speech tags are:", [token.pos_ for token in doc])
@ -365,76 +411,16 @@ def my_component(doc):
    return doc

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(my_component, name="print_info", last=True)
nlp.add_pipe("info_component", name="print_info", last=True)
print(nlp.pipe_names)  # ['tagger', 'parser', 'ner', 'print_info']
doc = nlp("This is a sentence.")

```

Of course, you can also wrap your component as a class to allow initializing it
with custom settings and hold state within the component. This is useful for
**stateful components**, especially ones which **depend on shared data**. In the
following example, the custom component `EntityMatcher` can be initialized with
the `nlp` object, a terminology list and an entity label. Using the
[`PhraseMatcher`](/api/phrasematcher), it then matches the terms in the `Doc`
and adds them to the existing entities.

<Infobox title="Important note" variant="warning">

As of v2.1.0, spaCy ships with the [`EntityRuler`](/api/entityruler), a pipeline
component for easy, rule-based named entity recognition. Its implementation is
similar to the `EntityMatcher` code shown below, but it includes some additional
features like support for phrase patterns and token patterns, handling overlaps
with existing entities and pattern export as JSONL.

We'll still keep the pipeline component example below, as it works well to
illustrate complex components. But if you're planning on using this type of
component in your application, you might find the `EntityRuler` more convenient.
[See here](/usage/rule-based-matching#entityruler) for more details and
examples.

</Infobox>

```python
### {executable="true"}
import spacy
from spacy.matcher import PhraseMatcher
from spacy.tokens import Span

class EntityMatcher:
    name = "entity_matcher"

    def __init__(self, nlp, terms, label):
        patterns = [nlp.make_doc(text) for text in terms]
        self.matcher = PhraseMatcher(nlp.vocab)
        self.matcher.add(label, patterns)

    def __call__(self, doc):
        matches = self.matcher(doc)
        for match_id, start, end in matches:
            span = Span(doc, start, end, label=match_id)
            doc.ents = list(doc.ents) + [span]
        return doc

nlp = spacy.load("en_core_web_sm")
terms = ("cat", "dog", "tree kangaroo", "giant sea spider")
entity_matcher = EntityMatcher(nlp, terms, "ANIMAL")

nlp.add_pipe(entity_matcher, after="ner")

print(nlp.pipe_names)  # The components in the pipeline

doc = nlp("This is a text about Barack Obama and a tree kangaroo")
print([(ent.text, ent.label_) for ent in doc.ents])
```

### Example: Custom sentence segmentation logic {#component-example1}

Let's say you want to implement custom logic to improve spaCy's sentence
boundary detection. Currently, sentence segmentation is based on the dependency
parse, which doesn't always produce ideal results. The custom logic should
therefore be applied **after** tokenization, but _before_ the dependency parsing
– this way, the parser can also take advantage of the sentence boundaries.
Here's another example of a pipeline component that implements custom logic to
improve the sentence boundaries set by the dependency parser. The custom logic
should therefore be applied **after** tokenization, but _before_ the dependency
parsing – this way, the parser can also take advantage of the sentence
boundaries.

> #### ✏️ Things to try
>
@ -448,90 +434,318 @@ therefore be applied **after** tokenization, but _before_ the dependency parsing
```python
### {executable="true"}
import spacy
from spacy.language import Language

@Language.component("custom_sentencizer")
def custom_sentencizer(doc):
    for i, token in enumerate(doc[:-2]):
        # Define sentence start if pipe + titlecase token
        if token.text == "|" and doc[i+1].is_title:
            doc[i+1].is_sent_start = True
        if token.text == "|" and doc[i + 1].is_title:
            doc[i + 1].is_sent_start = True
        else:
            # Explicitly set sentence start to False otherwise, to tell
            # the parser to leave those tokens alone
            doc[i+1].is_sent_start = False
            doc[i + 1].is_sent_start = False
    return doc

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(custom_sentencizer, before="parser")  # Insert before the parser
nlp.add_pipe("custom_sentencizer", before="parser")  # Insert before the parser
doc = nlp("This is. A sentence. | This is. Another sentence.")
for sent in doc.sents:
    print(sent.text)
```

### Example: Pipeline component for entity matching and tagging with custom attributes {#component-example2}
### Component factories and stateful components {#custom-components-factories}

This example shows how to create a spaCy extension that takes a terminology list
(in this case, single- and multi-word company names), matches the occurrences in
a document, labels them as `ORG` entities, merges the tokens and sets custom
`is_tech_org` and `has_tech_org` attributes. For efficient matching, the example
uses the [`PhraseMatcher`](/api/phrasematcher) which accepts `Doc` objects as
match patterns and works well for large terminology lists. It also ensures your
patterns will always match, even when you customize spaCy's tokenization rules.
When you call `nlp` on a text, the custom pipeline component is applied to the
`Doc`.

```python
https://github.com/explosion/spaCy/tree/master/examples/pipeline/custom_component_entities.py
```

Wrapping this functionality in a pipeline component allows you to reuse the
module with different settings, and have all pre-processing taken care of when
you call `nlp` on your text and receive a `Doc` object.

### Adding factories {#custom-components-factories}

When spaCy loads a model via its `meta.json`, it will iterate over the
`"pipeline"` setting, look up every component name in the internal factories and
call [`nlp.create_pipe`](/api/language#create_pipe) to initialize the individual
components, like the tagger, parser or entity recognizer. If your model uses
custom components, this won't work – so you'll have to tell spaCy **where to
find your component**. You can do this by writing to the `Language.factories`:
Component factories are callables that take settings and return a **pipeline
component function**. This is useful if your component is stateful and if you
need to customize its creation, or if you need access to the current `nlp`
object or the shared vocab. Component factories can be registered using the
[`@Language.factory`](/api/language#factory) decorator and they need at least
**two named arguments** that are filled in automatically when the component is
added to the pipeline:

> #### Example
>
> ```python
> from spacy.language import Language
>
> @Language.factory("my_component")
> def my_component(nlp, name):
>     return MyComponent()
> ```

| Argument | Type                        | Description                                                                                                                |
| -------- | --------------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| `nlp`    | [`Language`](/api/language) | The current `nlp` object. Can be used to access the shared vocab.                                                           |
| `name`   | str                         | The **instance name** of the component in the pipeline. This lets you identify different instances of the same component.   |

All other settings can be passed in by the user via the `config` argument on
[`nlp.add_pipe`](/api/language). The
[`@Language.factory`](/api/language#factory) decorator also lets you define a
`default_config` that's used as a fallback.

```python
### With config {highlight="4,9"}
import spacy
from spacy.language import Language
Language.factories["entity_matcher"] = lambda nlp, **cfg: EntityMatcher(nlp, **cfg)

@Language.factory("my_component", default_config={"some_setting": True})
def my_component(nlp, name, some_setting: bool):
    return MyComponent(some_setting=some_setting)

nlp = spacy.blank("en")
nlp.add_pipe("my_component", config={"some_setting": False})
```
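
Because the component was added with a factory and a config, its settings become
part of the pipeline's config and are saved out with the model. A rough sketch
of the resulting `[components]` entry (assuming the example above):

```ini
[components.my_component]
factory = "my_component"
some_setting = false
```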

You can also ship the above code and your custom component in your packaged
model's `__init__.py`, so it's executed when you load your model. The `**cfg`
config parameters are passed all the way down from
[`spacy.load`](/api/top-level#spacy.load), so you can load the model and its
components with custom settings:
<Accordion title="How is @Language.factory different from @Language.component?" id="factories-decorator-component">

The [`@Language.component`](/api/language#component) decorator is essentially a
**shortcut** for stateless pipeline components that don't need any settings.
This means you don't have to always write a function that returns your function
if there's no state to be passed through – spaCy can just take care of this for
you. The following two code examples are equivalent:

```python
nlp = spacy.load("your_custom_model", terms=["tree kangaroo"], label="ANIMAL")
# Stateless component with @Language.factory
@Language.factory("my_component")
def create_my_component(nlp, name):
    def my_component(doc):
        # Do something to the doc
        return doc

    return my_component

# Stateless component with @Language.component
@Language.component("my_component")
def my_component(doc):
    # Do something to the doc
    return doc
```

<Infobox title="Important note" variant="warning">
</Accordion>

When you load a model via its package name, like `en_core_web_sm`, spaCy will
import the package and then call its `load()` method. This means that custom
code in the model's `__init__.py` will be executed, too. This is **not the
case** if you're loading a model from a path containing the model data. Here,
spaCy will only read in the `meta.json`. If you want to use custom factories
with a model loaded from a path, you need to add them to `Language.factories`
_before_ you load the model.
<Accordion title="Can I add the @Language.factory decorator to a class?" id="factories-class-decorator" spaced>

Yes, the [`@Language.factory`](/api/language#factory) decorator can be added to
a function or a class. If it's added to a class, it expects the `__init__`
method to take the arguments `nlp` and `name`, and will populate all other
arguments from the config. That said, it's often cleaner and more intuitive to
make your factory a separate function. That's also how spaCy does it internally.

</Accordion>
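
For illustration, a minimal sketch of the class-based variant described above
(all names are hypothetical):

```python
from spacy.language import Language
from spacy.tokens import Doc

@Language.factory("my_class_component", default_config={"some_setting": True})
class MyClassComponent:
    def __init__(self, nlp: Language, name: str, some_setting: bool):
        # nlp and name are filled in automatically, some_setting comes from the config
        self.some_setting = some_setting

    def __call__(self, doc: Doc) -> Doc:
        # Modify or inspect the doc here
        return doc
```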

### Example: Stateful component with settings

This example shows a **stateful** pipeline component for handling acronyms:
based on a dictionary, it will detect acronyms and their expanded forms in both
directions and add them to a list as the custom `doc._.acronyms`
[extension attribute](#custom-components-attributes). Under the hood, it uses
the [`PhraseMatcher`](/api/phrasematcher) to find instances of the phrases.

The factory function takes three arguments: the shared `nlp` object and
component instance `name`, which are passed in automatically by spaCy, and a
`case_sensitive` config setting that makes the matching and acronym detection
case-sensitive.

> #### ✏️ Things to try
>
> 1. Change the `config` passed to `nlp.add_pipe` and set `"case_sensitive"` to
>    `True`. You should see that the expanded acronym for "LOL" isn't detected
>    anymore.
> 2. Add some more terms to the `DICTIONARY` and update the processed text so
>    they're detected.
> 3. Add a `name` argument to `nlp.add_pipe` to change the component name. Print
>    `nlp.pipe_names` to see the change reflected in the pipeline.
> 4. Print the config of the current `nlp` object with
>    `print(nlp.config.to_str())` and inspect the `[components]` block. You
>    should see an entry for the acronyms component, referencing the factory
>    `acronyms` and the config settings.

```python
### {executable="true"}
from spacy.language import Language
from spacy.tokens import Doc
from spacy.matcher import PhraseMatcher
import spacy

DICTIONARY = {"lol": "laughing out loud", "brb": "be right back"}
DICTIONARY.update({value: key for key, value in DICTIONARY.items()})

@Language.factory("acronyms", default_config={"case_sensitive": False})
def create_acronym_component(nlp: Language, name: str, case_sensitive: bool):
    return AcronymComponent(nlp, case_sensitive)

class AcronymComponent:
    def __init__(self, nlp: Language, case_sensitive: bool):
        # Create the matcher and match on Token.lower if case-insensitive
        matcher_attr = "TEXT" if case_sensitive else "LOWER"
        self.matcher = PhraseMatcher(nlp.vocab, attr=matcher_attr)
        self.matcher.add("ACRONYMS", [nlp.make_doc(term) for term in DICTIONARY])
        self.case_sensitive = case_sensitive
        # Register custom extension on the Doc
        if not Doc.has_extension("acronyms"):
            Doc.set_extension("acronyms", default=[])

    def __call__(self, doc: Doc) -> Doc:
        # Add the matched spans when doc is processed
        for _, start, end in self.matcher(doc):
            span = doc[start:end]
            acronym = DICTIONARY.get(span.text if self.case_sensitive else span.text.lower())
            doc._.acronyms.append((span, acronym))
        return doc

# Add the component to the pipeline and configure it
nlp = spacy.blank("en")
nlp.add_pipe("acronyms", config={"case_sensitive": False})

# Process a doc and see the results
doc = nlp("LOL, be right back")
print(doc._.acronyms)
```

### Python type hints and pydantic validation {#type-hints new="3"}

spaCy's configs are powered by our machine learning library Thinc's
[configuration system](https://thinc.ai/docs/usage-config), which supports
[type hints](https://docs.python.org/3/library/typing.html) and even
[advanced type annotations](https://thinc.ai/docs/usage-config#advanced-types)
using [`pydantic`](https://github.com/samuelcolvin/pydantic). If your component
factory provides type hints, the values that are passed in will be **checked
against the expected types**. If a value can't be cast to the annotated type,
spaCy will raise an error. `pydantic` also provides strict types like
`StrictFloat`, which will force the value to be a float and raise an error if
it's not – for instance, if your config defines an integer.

<Infobox variant="warning">

If you're not using
[strict types](https://pydantic-docs.helpmanual.io/usage/types/#strict-types),
values that can be **cast to** the given type will still be accepted. For
example, `1` can be cast to a `float` or a `bool` type, but not to a
`List[str]`. However, if the type is
[`StrictFloat`](https://pydantic-docs.helpmanual.io/usage/types/#strict-types),
only a float will be accepted.

</Infobox>

The following example shows a custom pipeline component for debugging. It can be
added anywhere in the pipeline and logs information about the `nlp` object and
the `Doc` that passes through. The `log_level` config setting lets the user
customize what log statements are shown – for instance, `"INFO"` will show info
logs and more critical logging statements, whereas `"DEBUG"` will show
everything. The value is annotated as a `StrictStr`, so it will only accept a
string value.

> #### ✏️ Things to try
>
> 1. Change the `config` passed to `nlp.add_pipe` to use the log level `"INFO"`.
>    You should see that only the statement logged with `logger.info` is shown.
> 2. Change the `config` passed to `nlp.add_pipe` so that it contains unexpected
>    values – for example, a boolean instead of a string: `"log_level": False`.
>    You should see a validation error.
> 3. Check out the docs on `pydantic`'s
>    [constrained types](https://pydantic-docs.helpmanual.io/usage/types/#constrained-types)
>    and write a type hint for `log_level` that only accepts the exact string
>    values `"DEBUG"`, `"INFO"` or `"CRITICAL"`.

```python
### {executable="true"}
import spacy
from spacy.language import Language
from spacy.tokens import Doc
from pydantic import StrictStr
import logging

@Language.factory("debug", default_config={"log_level": "DEBUG"})
class DebugComponent:
    def __init__(self, nlp: Language, name: str, log_level: StrictStr):
        self.logger = logging.getLogger(f"spacy.{name}")
        self.logger.setLevel(log_level)
        self.logger.info(f"Pipeline: {nlp.pipe_names}")

    def __call__(self, doc: Doc) -> Doc:
        self.logger.debug(f"Doc: {len(doc)} tokens, is_tagged: {doc.is_tagged}")
        return doc

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("debug", config={"log_level": "DEBUG"})
doc = nlp("This is a text...")
```
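
One possible answer to the last point under "Things to try" – a sketch assuming
a Python/pydantic version that supports `typing.Literal`:

```python
from typing import Literal
from spacy.language import Language
from spacy.tokens import Doc

@Language.factory("debug_strict", default_config={"log_level": "DEBUG"})
class StrictDebugComponent:
    def __init__(self, nlp: Language, name: str,
                 log_level: Literal["DEBUG", "INFO", "CRITICAL"]):
        self.log_level = log_level  # any other string now fails validation

    def __call__(self, doc: Doc) -> Doc:
        return doc
```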

### Language-specific factories {#factories-language new="3"}

There are many use cases where you might want your pipeline components to be
language-specific. Sometimes this requires entirely different implementations
per language, sometimes the only difference is in the settings or data. spaCy
allows you to register factories of the **same name** on both the `Language`
base class, as well as its **subclasses** like `English` or `German`. Factories
are resolved starting with the specific subclass. If the subclass doesn't define
a component of that name, spaCy will check the `Language` base class.

Here's an example of a pipeline component that overwrites the normalized form of
a token, the `Token.norm_`, with an entry from a language-specific lookup table.
It's registered twice under the name `"token_normalizer"` – once using
`@English.factory` and once using `@German.factory`:

```python
### {executable="true"}
from spacy.lang.en import English
from spacy.lang.de import German

class TokenNormalizer:
    def __init__(self, norm_table):
        self.norm_table = norm_table

    def __call__(self, doc):
        for token in doc:
            # Overwrite the token.norm_ if there's an entry in the data
            token.norm_ = self.norm_table.get(token.text, token.norm_)
        return doc

@English.factory("token_normalizer")
def create_en_normalizer(nlp, name):
    return TokenNormalizer({"realise": "realize", "colour": "color"})

@German.factory("token_normalizer")
def create_de_normalizer(nlp, name):
    return TokenNormalizer({"daß": "dass", "wußte": "wusste"})

nlp_en = English()
nlp_en.add_pipe("token_normalizer")  # uses the English factory
print([token.norm_ for token in nlp_en("realise colour daß wußte")])

nlp_de = German()
nlp_de.add_pipe("token_normalizer")  # uses the German factory
print([token.norm_ for token in nlp_de("realise colour daß wußte")])
```

<Infobox title="Implementation details">

Under the hood, language-specific factories are added to the
[`factories` registry](/api/top-level#registry) prefixed with the language code,
e.g. `"en.token_normalizer"`. When resolving the factory in
[`nlp.add_pipe`](/api/language#add_pipe), spaCy first checks for a
language-specific version of the factory using `nlp.lang` and if none is
available, falls back to looking up the regular factory name.

</Infobox>
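
A factory registered on the `Language` base class under the same name would
accordingly act as the fallback for all other languages – a minimal sketch,
reusing the `TokenNormalizer` from the example above:

```python
from spacy.language import Language

@Language.factory("token_normalizer")
def create_default_normalizer(nlp, name):
    # Used by any language that doesn't register its own "token_normalizer"
    return TokenNormalizer({})
```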

<!-- TODO:

### Trainable components {#trainable new="3"}

-->

## Extension attributes {#custom-components-attributes new="2"}

As of v2.0, spaCy allows you to set any custom attributes and methods on the
`Doc`, `Span` and `Token`, which become available as `Doc._`, `Span._` and
`Token._` – for example, `Token._.my_attr`. This lets you store additional
information relevant to your application, add new features and functionality to
spaCy, and implement your own models trained with other machine learning
libraries. It also lets you take advantage of spaCy's data structures and the
`Doc` object as the "single source of truth".
spaCy allows you to set any custom attributes and methods on the `Doc`, `Span`
and `Token`, which become available as `Doc._`, `Span._` and `Token._` – for
example, `Token._.my_attr`. This lets you store additional information relevant
to your application, add new features and functionality to spaCy, and implement
your own models trained with other machine learning libraries. It also lets you
take advantage of spaCy's data structures and the `Doc` object as the "single
source of truth".
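
A minimal sketch of registering and reading such an attribute (the attribute
name is just an example):

```python
import spacy
from spacy.tokens import Token

# Register the attribute once, with a default value
Token.set_extension("my_attr", default=None)

nlp = spacy.blank("en")
doc = nlp("Hello world")
doc[0]._.my_attr = "some value"  # write via the ._ namespace
print(doc[0]._.my_attr)
```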

<Accordion title="Why ._ and not just a top-level attribute?" id="why-dot-underscore">
@ -641,7 +855,73 @@ attributes on the `Doc`, `Span` and `Token` – for example, the capital,
latitude/longitude coordinates and even the country flag.

```python
https://github.com/explosion/spaCy/tree/master/examples/pipeline/custom_component_countries_api.py
### {executable="true"}
import requests
from spacy.lang.en import English
from spacy.language import Language
from spacy.matcher import PhraseMatcher
from spacy.tokens import Doc, Span, Token

@Language.factory("rest_countries")
class RESTCountriesComponent:
    def __init__(self, nlp, name, label="GPE"):
        r = requests.get("https://restcountries.eu/rest/v2/all")
        r.raise_for_status()  # make sure requests raises an error if it fails
        countries = r.json()
        # Convert API response to dict keyed by country name for easy lookup
        self.countries = {c["name"]: c for c in countries}
        self.label = label
        # Set up the PhraseMatcher with Doc patterns for each country name
        self.matcher = PhraseMatcher(nlp.vocab)
        self.matcher.add("COUNTRIES", [nlp.make_doc(c) for c in self.countries.keys()])
        # Register attribute on the Token. We'll be overwriting this based on
        # the matches, so we're only setting a default value, not a getter.
        Token.set_extension("is_country", default=False)
        Token.set_extension("country_capital", default=False)
        Token.set_extension("country_latlng", default=False)
        Token.set_extension("country_flag", default=False)
        # Register attributes on Doc and Span via a getter that checks if one of
        # the contained tokens is set to is_country == True.
        Doc.set_extension("has_country", getter=self.has_country)
        Span.set_extension("has_country", getter=self.has_country)

    def __call__(self, doc):
        spans = []  # keep the spans for later so we can merge them afterwards
        for _, start, end in self.matcher(doc):
            # Generate Span representing the entity & set label
            entity = Span(doc, start, end, label=self.label)
            spans.append(entity)
            # Set custom attribute on each token of the entity
            # Can be extended with other data returned by the API, like
            # currencies, country code, flag, calling code etc.
            for token in entity:
                token._.set("is_country", True)
                token._.set("country_capital", self.countries[entity.text]["capital"])
                token._.set("country_latlng", self.countries[entity.text]["latlng"])
                token._.set("country_flag", self.countries[entity.text]["flag"])
        # Iterate over all spans and merge them into one token
        with doc.retokenize() as retokenizer:
            for span in spans:
                retokenizer.merge(span)
        # Overwrite doc.ents and add entity – be careful not to replace!
        doc.ents = list(doc.ents) + spans
        return doc  # don't forget to return the Doc!

    def has_country(self, tokens):
        """Getter for Doc and Span attributes. Since the getter is only called
        when we access the attribute, we can refer to the Token's 'is_country'
        attribute here, which is already set in the processing step."""
        return any([t._.get("is_country") for t in tokens])

nlp = English()
nlp.add_pipe("rest_countries", config={"label": "GPE"})
doc = nlp("Some text about Colombia and the Czech Republic")
print("Pipeline", nlp.pipe_names)  # pipeline contains component name
print("Doc has countries", doc._.has_country)  # Doc contains countries
for token in doc:
    if token._.is_country:
        print(token.text, token._.country_capital, token._.country_latlng, token._.country_flag)
print("Entities", [(e.text, e.label_) for e in doc.ents])
```

In this case, all data can be fetched on initialization in one request. However,
@ -800,11 +1080,6 @@ function that takes a `Doc`, modifies it and returns it.
[`load_model_from_path`](/api/top-level#util.load_model_from_path) utility
functions.

```diff
+ nlp.add_pipe(my_custom_component)
+ return nlp.from_disk(model_path)
```

- Once you're ready to share your extension with others, make sure to **add docs
  and installation instructions** (you can always link to this page for more
  info). Make it easy for others to install and use your extension, for example
@ -838,10 +1113,12 @@ wrapper has to do is compute the entity spans and overwrite the `doc.ents`.
> overlapping entity spans are not allowed.

```python
### {highlight="1,6-7"}
### {highlight="1,8-9"}
import your_custom_entity_recognizer
from spacy.gold import offsets_from_biluo_tags
from spacy.language import Language

@Language.component("custom_ner_wrapper")
def custom_ner_wrapper(doc):
    words = [token.text for token in doc]
    custom_entities = your_custom_entity_recognizer(words)
@ -865,22 +1142,24 @@ because it returns the integer ID of the string _and_ makes sure it's added to
the vocab. This is especially important if the custom model uses a different
label scheme than spaCy's default models.

> #### Example: spacy-stanfordnlp
> #### Example: spacy-stanza
>
> For an example of an end-to-end wrapper for statistical tokenization, tagging
> and parsing, check out
> [`spacy-stanfordnlp`](https://github.com/explosion/spacy-stanfordnlp). It uses
> a very similar approach to the example in this section – the only difference
> is that it fully replaces the `nlp` object instead of providing a pipeline
> component, since it also needs to handle tokenization.
> [`spacy-stanza`](https://github.com/explosion/spacy-stanza). It uses a very
> similar approach to the example in this section – the only difference is that
> it fully replaces the `nlp` object instead of providing a pipeline component,
> since it also needs to handle tokenization.

```python
### {highlight="1,9,15-17"}
### {highlight="1,11,17-19"}
import your_custom_model
from spacy.language import Language
from spacy.symbols import POS, TAG, DEP, HEAD
from spacy.tokens import Doc
import numpy

@Language.component("custom_model_wrapper")
def custom_model_wrapper(doc):
    words = [token.text for token in doc]
    spaces = [token.whitespace_ for token in doc]
@ -450,6 +450,14 @@ git init # Initialize a Git repo
dvc init  # Initialize a DVC project
```

<Infobox title="Important note on privacy" variant="warning">

DVC enables usage analytics by default, so if you're working in a
privacy-sensitive environment, make sure to
[**opt-out manually**](https://dvc.org/doc/user-guide/analytics#opting-out).

</Infobox>
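
For example, the opt-out can be set via DVC's own config (see the DVC docs for
details):

```bash
dvc config core.analytics false
```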

The [`spacy project dvc`](/api/cli#project-dvc) command creates a `dvc.yaml`
config file based on a workflow defined in your `project.yml`. Whenever you
update your project, you can re-run the command to update your DVC config. You
@ -506,11 +506,16 @@ attribute `bad_html` on the token.
```python
### {executable="true"}
import spacy
from spacy.language import Language
from spacy.matcher import Matcher
from spacy.tokens import Token

# We're using a class because the component needs to be initialized with
# the shared vocab via the nlp object
# We're using a component factory because the component needs to be initialized
# with the shared vocab via the nlp object
@Language.factory("html_merger")
def create_bad_html_merger(nlp, name):
    return BadHTMLMerger(nlp)

class BadHTMLMerger:
    def __init__(self, nlp):
        patterns = [
@ -536,8 +541,7 @@ class BadHTMLMerger:
        return doc

nlp = spacy.load("en_core_web_sm")
html_merger = BadHTMLMerger(nlp)
nlp.add_pipe(html_merger, last=True)  # Add component to the pipeline
nlp.add_pipe("html_merger", last=True)  # Add component to the pipeline
doc = nlp("Hello<br>world! <br/> This is a test.")
for token in doc:
    print(token.text, token._.bad_html)
@ -546,10 +550,16 @@ for token in doc:

Instead of hard-coding the patterns into the component, you could also make it
take a path to a JSON file containing the patterns. This lets you reuse the
component with different patterns, depending on your application:
component with different patterns, depending on your application. When adding
the component to the pipeline with [`nlp.add_pipe`](/api/language#add_pipe), you
can pass in the argument via the `config`:

```python
html_merger = BadHTMLMerger(nlp, path="/path/to/patterns.json")
@Language.factory("html_merger", default_config={"path": None})
def create_bad_html_merger(nlp, name, path):
    return BadHTMLMerger(nlp, path=path)

nlp.add_pipe("html_merger", config={"path": "/path/to/patterns.json"})
```

<Infobox title="Processing pipelines" emoji="📖">
@ -835,7 +845,7 @@ patterns can contain single or multiple tokens.
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load('en_core_web_sm')
nlp = spacy.load("en_core_web_sm")
matcher = PhraseMatcher(nlp.vocab)
terms = ["Barack Obama", "Angela Merkel", "Washington, D.C."]
# Only run nlp.make_doc to speed things up
@ -975,14 +985,12 @@ chosen.
```python
### {executable="true"}
from spacy.lang.en import English
from spacy.pipeline import EntityRuler

nlp = English()
ruler = EntityRuler(nlp)
ruler = nlp.add_pipe("entity_ruler")
patterns = [{"label": "ORG", "pattern": "Apple"},
            {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]}]
ruler.add_patterns(patterns)
nlp.add_pipe(ruler)

doc = nlp("Apple is opening its first big office in San Francisco.")
print([(ent.text, ent.label_) for ent in doc.ents])
@ -1000,13 +1008,11 @@ can set `overwrite_ents=True` on initialization.
```python
### {executable="true"}
import spacy
from spacy.pipeline import EntityRuler

nlp = spacy.load("en_core_web_sm")
ruler = EntityRuler(nlp)
ruler = nlp.add_pipe("entity_ruler")
patterns = [{"label": "ORG", "pattern": "MyCorp Inc."}]
ruler.add_patterns(patterns)
nlp.add_pipe(ruler)

doc = nlp("MyCorp Inc. is a company in the U.S.")
print([(ent.text, ent.label_) for ent in doc.ents])
@ -1014,12 +1020,12 @@ print([(ent.text, ent.label_) for ent in doc.ents])

#### Validating and debugging EntityRuler patterns {#entityruler-pattern-validation new="2.1.8"}

The `EntityRuler` can validate patterns against a JSON schema with the option
`validate=True`. See details under
The entity ruler can validate patterns against a JSON schema with the config
setting `"validate"`. See details under
[Validating and debugging patterns](#pattern-validation).

```python
ruler = EntityRuler(nlp, validate=True)
ruler = nlp.add_pipe("entity_ruler", config={"validate": True})
```

### Adding IDs to patterns {#entityruler-ent-ids new="2.2.2"}
@ -1031,15 +1037,13 @@ the same entity.
```python
### {executable="true"}
from spacy.lang.en import English
from spacy.pipeline import EntityRuler

nlp = English()
ruler = EntityRuler(nlp)
ruler = nlp.add_pipe("entity_ruler")
patterns = [{"label": "ORG", "pattern": "Apple", "id": "apple"},
            {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}], "id": "san-francisco"},
            {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "fran"}], "id": "san-francisco"}]
ruler.add_patterns(patterns)
nlp.add_pipe(ruler)

doc1 = nlp("Apple is opening its first big office in San Francisco.")
print([(ent.text, ent.label_, ent.ent_id_) for ent in doc1.ents])
@ -1068,7 +1072,7 @@ line.

```python
ruler.to_disk("./patterns.jsonl")
new_ruler = EntityRuler(nlp).from_disk("./patterns.jsonl")
new_ruler = nlp.add_pipe("entity_ruler").from_disk("./patterns.jsonl")
```

<Infobox title="Integration with Prodigy">
@ -1086,9 +1090,8 @@ pipeline, its patterns are automatically exported to the model directory:

```python
nlp = spacy.load("en_core_web_sm")
ruler = EntityRuler(nlp)
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}])
nlp.add_pipe(ruler)
nlp.to_disk("/path/to/model")
```
@ -1100,35 +1103,30 @@ powerful model packages with binary weights _and_ rules included!

### Using a large number of phrase patterns {#entityruler-large-phrase-patterns new="2.2.4"}

<!-- TODO: double-check that this still works if the ruler is added to the pipeline on creation, and include suggestion if needed -->

When using a large number of **phrase patterns** (roughly > 10000) it's useful
to understand how the `add_patterns` function of the EntityRuler works. For each
**phrase pattern**, the EntityRuler calls the nlp object to construct a doc
to understand how the `add_patterns` function of the entity ruler works. For
each **phrase pattern**, the EntityRuler calls the nlp object to construct a doc
object. This happens in case you try to add the EntityRuler at the end of an
existing pipeline with, for example, a POS tagger and want to extract matches
based on the pattern's POS signature.

In this case you would pass a config value of `phrase_matcher_attr="POS"` for
the EntityRuler.
based on the pattern's POS signature. In this case you would pass a config value
of `"phrase_matcher_attr": "POS"` for the entity ruler.
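
Concretely, a short sketch of adding the ruler with that setting:

```python
ruler = nlp.add_pipe("entity_ruler", config={"phrase_matcher_attr": "POS"})
```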

Running the full language pipeline across every pattern in a large list scales
linearly and can therefore take a long time on large amounts of phrase patterns.

As of spaCy 2.2.4 the `add_patterns` function has been refactored to use
`nlp.pipe` on all phrase patterns resulting in about a 10x-20x speed up with
5,000-100,000 phrase patterns respectively.

Even with this speedup (but especially if you're using an older version) the
`add_patterns` function can still take a long time.

An easy workaround to make this function run faster is disabling the other
language pipes while adding the phrase patterns.
5,000-100,000 phrase patterns respectively. Even with this speedup (but
especially if you're using an older version) the `add_patterns` function can
still take a long time. An easy workaround to make this function run faster is
disabling the other language pipes while adding the phrase patterns.

```python
entityruler = EntityRuler(nlp)
ruler = nlp.add_pipe("entity_ruler")
patterns = [{"label": "TEST", "pattern": str(i)} for i in range(100000)]

with nlp.select_pipes(enable="tagger"):
    entityruler.add_patterns(patterns)
    ruler.add_patterns(patterns)
```

## Combining models and rules {#models-rules}
@ -1189,9 +1187,11 @@ have in common is that _if_ they occur, they occur in the **previous token**
right before the person entity.

```python
### {highlight="7-11"}
### {highlight="9-13"}
from spacy.language import Language
from spacy.tokens import Span

@Language.component("expand_person_entities")
def expand_person_entities(doc):
    new_ents = []
    for ent in doc.ents:
@ -1210,18 +1210,20 @@ def expand_person_entities(doc):
```

The above function takes a `Doc` object, modifies its `doc.ents` and returns it.
This is exactly what a [pipeline component](/usage/processing-pipelines) does,
so in order to let it run automatically when processing a text with the `nlp`
object, we can use [`nlp.add_pipe`](/api/language#add_pipe) to add it to the
current pipeline.
Using the [`@Language.component`](/api/language#component) decorator, we can
register it as a [pipeline component](/usage/processing-pipelines) so it can run
automatically when processing a text. We can use
[`nlp.add_pipe`](/api/language#add_pipe) to add it to the current pipeline.

```python
### {executable="true"}
import spacy
from spacy.language import Language
from spacy.tokens import Span

nlp = spacy.load("en_core_web_sm")

@Language.component("expand_person_entities")
def expand_person_entities(doc):
    new_ents = []
    for ent in doc.ents:
@ -1236,7 +1238,7 @@ def expand_person_entities(doc):
    return doc

# Add the component after the named entity recognizer
nlp.add_pipe(expand_person_entities, after='ner')
nlp.add_pipe("expand_person_entities", after="ner")

doc = nlp("Dr. Alex Smith chaired first board meeting of Acme Corp Inc.")
print([(ent.text, ent.label_) for ent in doc.ents])
@ -1347,7 +1349,7 @@ for ent in person_entities:
    # children, e.g. at -> Acme Corp Inc.
    orgs = [token for token in prep.children if token.ent_type_ == "ORG"]
    # If the verb is in past tense, the company was a previous company
    print({'person': ent, 'orgs': orgs, 'past': head.tag_ == "VBD"})
    print({"person": ent, "orgs": orgs, "past": head.tag_ == "VBD"})
```

To apply this logic automatically when we process a text, we can add it to the
@ -1374,11 +1376,12 @@ the entity `Span` – for example `._.orgs` or `._.prev_orgs` and
```python
### {executable="true"}
import spacy
from spacy.pipeline import merge_entities
from spacy.language import Language
from spacy import displacy

nlp = spacy.load("en_core_web_sm")

@Language.component("extract_person_orgs")
def extract_person_orgs(doc):
    person_entities = [ent for ent in doc.ents if ent.label_ == "PERSON"]
    for ent in person_entities:
@ -1391,12 +1394,12 @@ def extract_person_orgs(doc):
    return doc

# To make the entities easier to work with, we'll merge them into single tokens
nlp.add_pipe(merge_entities)
nlp.add_pipe(extract_person_orgs)
nlp.add_pipe("merge_entities")
nlp.add_pipe("extract_person_orgs")

doc = nlp("Alex Smith worked at Acme Corp Inc.")
# If you're not in a Jupyter / IPython environment, use displacy.serve
displacy.render(doc, options={'fine_grained': True})
displacy.render(doc, options={"fine_grained": True})
```

If you change the sentence structure above, for example to "was working", you'll
@ -1409,7 +1412,8 @@ information is in the attached auxiliary "was":
To solve this, we can adjust the rules to also check for the above construction:

```python
### {highlight="9-11"}
### {highlight="10-12"}
@Language.component("extract_person_orgs")
def extract_person_orgs(doc):
    person_entities = [ent for ent in doc.ents if ent.label_ == "PERSON"]
    for ent in person_entities:
@ -15,6 +15,8 @@ import Serialization101 from 'usage/101/\_serialization.md'

### Serializing the pipeline {#pipeline}

<!-- TODO: update this -->

When serializing the pipeline, keep in mind that this will only save out the
**binary data for the individual components** to allow spaCy to restore them –
not the entire objects. This is a good thing, because it makes serialization
@ -22,32 +24,35 @@ safe. But it also means that you have to take care of storing the language name
|
|||
and pipeline component names as well, and restoring them separately before you
|
||||
can load in the data.
|
||||
|
||||
> #### Saving the model meta
|
||||
> #### Saving the meta and config
|
||||
>
|
||||
> The `nlp.meta` attribute is a JSON-serializable dictionary and contains all
|
||||
> model meta information, like the language and pipeline, but also author and
|
||||
> license information.
|
||||
> The [`nlp.meta`](/api/language#meta) attribute is a JSON-serializable
|
||||
> dictionary and contains all model meta information like the author and license
|
||||
> information. The [`nlp.config`](/api/language#config) attribute is a
|
||||
> dictionary containing the training configuration, pipeline component factories
|
||||
> and other settings. It is saved out with a model as the `config.cfg`.
|
||||
|
||||
```python
|
||||
### Serialize
|
||||
bytes_data = nlp.to_bytes()
|
||||
lang = nlp.meta["lang"] # "en"
|
||||
pipeline = nlp.meta["pipeline"] # ["tagger", "parser", "ner"]
|
||||
lang = nlp.config["nlp"]["lang"] # "en"
|
||||
pipeline = nlp.config["nlp"]["pipeline"] # ["tagger", "parser", "ner"]
|
||||
```
|
||||
|
||||
```python
|
||||
### Deserialize
|
||||
nlp = spacy.blank(lang)
|
||||
for pipe_name in pipeline:
|
||||
pipe = nlp.create_pipe(pipe_name)
|
||||
nlp.add_pipe(pipe)
|
||||
nlp.add_pipe(pipe_name)
|
||||
nlp.from_bytes(bytes_data)
|
||||
```
|
||||
|
||||

This is also how spaCy does it under the hood when loading a model: it loads the
model's `config.cfg` containing the language and pipeline information,
initializes the language class, creates and adds the pipeline components based
on the defined
[factories](/usage/processing-pipelines#custom-components-factories) and _then_
loads in the binary data. You can read more about this process
[here](/usage/processing-pipelines#pipelines).
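
Putting the two snippets together, here's a minimal end-to-end sketch of that
round trip. It assumes only built-in components like the `sentencizer`, so no
custom factories need to be registered first:

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # any built-in component works here

# Serialize the binary data plus the info needed to restore it
bytes_data = nlp.to_bytes()
lang = nlp.config["nlp"]["lang"]
pipeline = nlp.config["nlp"]["pipeline"]

# Restore: recreate the language class and components, then load the data
nlp2 = spacy.blank(lang)
for pipe_name in pipeline:
    nlp2.add_pipe(pipe_name)
nlp2.from_bytes(bytes_data)
```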

### Serializing Doc objects efficiently {#docs new="2.2"}

The example below shows a custom component that keeps a dictionary of data, lets
users add to that data and saves and loads the data to and from a JSON file.

```python
### {highlight="14-18,20-25"}
@Language.factory("my_component")
class CustomComponent:
    def __init__(self):
        self.data = []

    ...  # the add, to_disk and from_disk methods are elided in this excerpt
```

When you save out the model, spaCy will also call the custom component's
`to_disk` method:

```python
### {highlight="2-4"}
nlp = spacy.load("en_core_web_sm")
my_component = nlp.add_pipe("my_component")
my_component.add({"hello": "world"})
nlp.to_disk("/path/to/model")
```

The serialized model directory will then contain the custom component's data in
a file `data.json` in its subdirectory:

```yaml
### Directory structure
├── parser           # data for "parser" component
├── tagger           # data for "tagger" component
├── vocab            # model vocabulary
├── meta.json        # model meta.json
├── config.cfg       # model config
└── tokenizer        # tokenization rules
```

For instance, you could add a wrapper for a model trained with a different
library like TensorFlow or PyTorch and make spaCy load its weights automatically
when you load the model package.
<Infobox title="Important note on loading components" variant="warning">
|
||||
<Infobox title="Important note on loading custom components" variant="warning">
|
||||
|
||||
When you load a model from disk, spaCy will check the `"pipeline"` in the
|
||||
model's `meta.json` and look up the component name in the internal factories. To
|
||||
make sure spaCy knows how to initialize `"my_component"`, you'll need to add it
|
||||
to the factories:
|
||||
|
||||
```python
|
||||
from spacy.language import Language
|
||||
Language.factories["my_component"] = lambda nlp, **cfg: CustomComponent()
|
||||
```
|
||||
|
||||
For more details, see the documentation on
|
||||
When you load back a model with custom components, make sure that the components
|
||||
are **available** and that the [`@Language.component`](/api/language#component)
|
||||
or [`@Language.factory`](/api/language#factory) decorators are executed _before_
|
||||
your model is loaded back. Otherwise, spaCy won't know how to resolve the string
|
||||
name of a component factory like `"my_component"` back to a function. For more
|
||||
details, see the documentation on
|
||||
[adding factories](/usage/processing-pipelines#custom-components-factories) or
|
||||
use [entry points](#entry-points) to make your extension package expose your
|
||||
custom components to spaCy automatically.
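
For example, a minimal sketch of this pattern – the module and path names here
are hypothetical:

```python
import spacy

# Importing the module executes the @Language.component / @Language.factory
# decorators and registers "my_component" with spaCy (module name is made up)
import my_package.components  # noqa: F401

nlp = spacy.load("/path/to/model")  # "my_component" can now be resolved
```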

To make spaCy use your entry points, your package needs to expose them and it
needs to be installed in the same environment – that's it.

| Entry point | Description |
| --- | --- |
| [`spacy_factories`](#entry-points-components) | Group of entry points for pipeline component factories, keyed by component name. Can be used to expose custom components defined by another package. |
| [`spacy_languages`](#entry-points-languages) | Group of entry points for custom [`Language` subclasses](/usage/adding-languages), keyed by language shortcut. |
| `spacy_lookups` <Tag variant="new">2.2</Tag> | Group of entry points for custom [`Lookups`](/api/lookups), including lemmatizer data. Used by spaCy's [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) package. |
| [`spacy_displacy_colors`](#entry-points-displacy) <Tag variant="new">2.2</Tag> | Group of entry points of custom label colors for the [displaCy visualizer](/usage/visualizers#ent). The key name doesn't matter, but it should point to a dict of labels and color values. Useful for custom models that predict different entity types. |

### Custom components via entry points {#entry-points-components}

When you load a model, spaCy will generally use the model's `config.cfg` to set
up the language class and construct the pipeline. The pipeline is specified as a
list of strings, e.g. `pipeline = ["tagger", "parser", "ner"]`. For each of
those strings, spaCy will call `nlp.add_pipe` and look up the name in all
factories defined by the decorators
[`@Language.component`](/api/language#component) and
[`@Language.factory`](/api/language#factory). This means that you have to import
your custom components _before_ loading the model.
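
A hedged sketch of what this looks like in practice – the component and module
names are made up:

```python
import spacy

nlp = spacy.blank("en")
# nlp.add_pipe("custom_component")  # fails: the name can't be resolved yet 👎

import my_plugin  # executes @Language.component("custom_component")
nlp.add_pipe("custom_component")    # works 👍
```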

Using entry points, model packages and extension packages can define their own
`"spacy_factories"`, which will be loaded automatically in the background when
the `Language` class is initialized. So if a user has your package installed,
they'll be able to use your components – even if they **don't import them**!

To stick with the theme of
[this entry points blog post](https://amir.rachum.com/blog/2017/07/28/python-entry-points/),
consider the following custom spaCy
[pipeline component](/usage/processing-pipelines#custom-components) that prints
a snake when it's called:

> #### Package directory structure
>
> ```yaml
> ├── snek.py    # the extension code
> └── setup.py   # setup file for pip installation
> ```

```python
### snek.py
from spacy.language import Language

snek = """
    --..,_     _,.--.
   `'.'.    .'`__ o  `;__. {text}
      '.'.  .'.'`  '---'`  `
        '.`'--....--'`.'
          `'--....--'`
"""

@Language.component("snek")
def snek_component(doc):
    print(snek.format(text=doc.text))
    return doc
```

Since it's a very complex and sophisticated module, you want to split it off
into its own package so you can version it and upload it to PyPI. You also want
your custom model to be able to define `pipeline = ["snek"]` in its
`config.cfg`. For that, you need to be able to tell spaCy where to find the
component `"snek"`. If you don't do this, spaCy will raise an error when you try
to load the model because there's no built-in `"snek"` component. To add an
entry to the factories, you can now expose it in your `setup.py` via the
`entry_points` dictionary:

> #### Entry point syntax
>
> Python entry points for a group are formatted as a **list of strings**, with
> each string following the syntax of `name = module:object`. In this example,
> the created entry point is named `snek` and points to the function
> `snek_component` in the module `snek`, i.e. `snek.py`.

```python
### setup.py {highlight="5-7"}
from setuptools import setup

setup(
    name="snek",
    entry_points={
        "spacy_factories": ["snek = snek:snek_component"]
    }
)
```
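
The same package can expose multiple entry points, by the way – a sketch, with a
hypothetical second component and the `SnekLanguage` class that's defined later
in this section:

```python
entry_points={
    "spacy_factories": [
        "snek = snek:snek_component",
        "scorpion = snek:scorpion_component",  # hypothetical second component
    ],
    "spacy_languages": ["snk = snek:SnekLanguage"],
}
```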

To make them available to spaCy, all you need to do is install the package in
your environment:

```bash
$ python setup.py develop
```

spaCy is now able to create the pipeline component `"snek"` – even though you
never imported `snek_component`. When you save the
[`nlp.config`](/api/language#config) to disk, it includes an entry for your
`"snek"` component and any model you train with this config will include the
component and know how to load it – if your `snek` package is installed.

> #### config.cfg (excerpt)
>
> ```diff
> [nlp]
> lang = "en"
> + pipeline = ["snek"]
>
> [components]
>
> + [components.snek]
> + factory = "snek"
> ```

```
>>> from spacy.lang.en import English
>>> nlp = English()
>>> nlp.add_pipe("snek")  # this now works! 🐍🎉
>>> doc = nlp("I am snek")
    --..,_     _,.--.
   `'.'.    .'`__ o  `;__. I am snek
      '.'.  .'.'`  '---'`  `
        '.`'--....--'`.'
          `'--....--'`
```

Instead of making your snek component a simple
[stateless component](/usage/processing-pipelines#custom-components-simple), you
could also make it a
[factory](/usage/processing-pipelines#custom-components-factories) that takes
settings. Your users can then pass in an optional `config` when they add your
component to the pipeline and customize its appearance – for example, the
`snek_style`.

> #### config.cfg (excerpt)
>
> ```diff
> [components.snek]
> factory = "snek"
> + snek_style = "basic"
> ```

In theory, the entry point mechanism also lets you overwrite built-in factories
– including the tokenizer. By default, spaCy will output a warning in these
cases, to prevent accidental overwrites and unintended results.

```python
SNEKS = {"basic": snek, "cute": cute_snek}  # collection of sneks

@Language.factory("snek", default_config={"snek_style": "basic"})
class SnekFactory:
    def __init__(self, nlp: Language, name: str, snek_style: str):
        self.nlp = nlp
        self.snek_style = snek_style
        self.snek = SNEKS[self.snek_style]

    def __call__(self, doc):
        print(self.snek)
        return doc
```

```diff
### setup.py
entry_points={
-   "spacy_factories": ["snek = snek:snek_component"]
+   "spacy_factories": ["snek = snek:SnekFactory"]
}
```
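
A quick usage sketch once the factory version is installed – it assumes the
`cute_snek` art is defined alongside `snek`:

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("snek", config={"snek_style": "cute"})
doc = nlp("I am a cute snek")  # prints the cute snek
```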

The factory can also implement other pipeline component methods like `to_disk`
and `from_disk` for serialization, or even `update` to make the component
trainable. If a component exposes a `from_disk` method and is included in a
model's pipeline, spaCy will call it when you load the model. When you save out
a model using `nlp.to_disk` and the component exposes a `to_disk` method, it
will be called with the disk path.

```python
def to_disk(self, path, exclude=tuple()):
    snek_path = path / "snek.txt"
    with snek_path.open("w", encoding="utf8") as snek_file:
        snek_file.write(self.snek)

def from_disk(self, path, exclude=tuple()):
    snek_path = path / "snek.txt"
    with snek_path.open("r", encoding="utf8") as snek_file:
        self.snek = snek_file.read()
```

When you load the model back in, spaCy will call the component's `from_disk`
method, read the `snek.txt` and make it available to the component.
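
A sketch of the effect, with illustrative paths – saving triggers `to_disk`,
loading triggers `from_disk` with the same component path:

```python
nlp.to_disk("/path/to/snek_model")        # writes snek.txt via to_disk
nlp2 = spacy.load("/path/to/snek_model")  # reads snek.txt via from_disk
```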

### Custom language classes via entry points {#entry-points-languages}

To stay with the theme of the previous example and
[this blog post on entry points](https://amir.rachum.com/blog/2017/07/28/python-entry-points/),
let's imagine you wanted to implement your own `SnekLanguage` class for your
custom model – but you don't necessarily want to modify spaCy's code to add a
language. In your package, you could then implement the following
[custom language subclass](/usage/linguistic-features#language-subclass):

```python
### snek.py
from spacy.language import Language

class SnekDefaults(Language.Defaults):
    stop_words = set(["sss", "hiss"])


class SnekLanguage(Language):
    lang = "snk"
    Defaults = SnekDefaults
```

Alongside the `spacy_factories`, there's also an entry point option for
`spacy_languages`, which expects the key to be the language shortcut and the
value to point to your `Language` subclass:
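
The full `setup.py` for this was elided above; here's a sketch that mirrors the
factories example, so treat the exact lines as an approximation:

```python
### setup.py
from setuptools import setup

setup(
    name="snek",
    entry_points={
        "spacy_languages": ["snk = snek:SnekLanguage"]
    }
)
```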

In spaCy, you can then load the custom `snk` language and it will be resolved to
`SnekLanguage` via the custom entry point. This is especially relevant for model
packages you train, which could then specify `lang = "snk"` in their
`config.cfg` without spaCy raising an error because the language is not
available in the core library.

```python
from spacy.util import get_lang_class

SnekLanguage = get_lang_class("snk")
nlp = SnekLanguage()
```
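
Since `spacy.blank` goes through the same language lookup, this sketch should
work too once the package is installed:

```python
import spacy

nlp = spacy.blank("snk")  # resolved via the spacy_languages entry point
```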

### Custom displaCy colors via entry points {#entry-points-displacy new="2.2"}

If you're training a named entity recognition model for a custom domain, you may
end up with labels that don't have pre-defined colors in the
[displaCy visualizer](/usage/visualizers#ent).

### Generating a model package {#models-generating}

You can also create the `meta.json` manually and place it in the model data
directory, or supply a path to it using the `--meta` flag. For more info on
this, see the [`package`](/api/cli#package) docs.

> #### meta.json (example)
>
> ```json
> {
>   "name": "example_model",
>   "lang": "en",
>   "version": "1.0.0",
>   "description": "Example model for spaCy",
>   "author": "You",
>   "email": "you@example.com",
>   "license": "CC BY-SA 3.0"
> }
> ```

```bash
$ python -m spacy package /home/me/data/en_example_model /home/me/my_models
```

This command will create a model package directory and will run
`python setup.py sdist` in that directory to create a `.tar.gz` archive of your
model package that can be installed using `pip install`.

```yaml
### Directory structure
└── /
    ├── MANIFEST.in                   # to include meta.json
    ├── meta.json                     # model meta data
    ├── setup.py                      # setup file for pip installation
    ├── en_example_model              # model directory
    │   ├── __init__.py               # init for pip installation
    │   └── en_example_model-1.0.0    # model data
    └── dist
        └── en_example_model-1.0.0.tar.gz  # installable package
```

You can also find templates for all files in the
[`cli/package.py` source](https://github.com/explosion/spacy/tree/master/spacy/cli/package.py).
If you're creating the package manually, keep in mind that the directories need
to be named according to the naming conventions of `lang_name` and
`lang_name-version`.
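
For example, after installing the archive built above (the path follows the
directory layout shown), the package can be loaded by its name – a sketch:

```python
# First: pip install dist/en_example_model-1.0.0.tar.gz (see layout above)
import spacy

nlp = spacy.load("en_example_model")
```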

The `load()` method that comes with our model package templates will take care
of putting all this together and returning a `Language` object with the loaded
pipeline and data. If your model requires custom
[pipeline components](/usage/processing-pipelines) or a custom language class,
you can also **ship the code with your model** and include it in the
`__init__.py` – for example, to register custom
[pipeline components](/usage/processing-pipelines#custom-components) before the
`nlp` object is created.

### Loading a custom model package {#loading}

spaCy's configs are powered by our machine learning library Thinc's
[config system](https://thinc.ai/docs/usage-config), which supports
[type hints](https://docs.python.org/3/library/typing.html) and even
[advanced type annotations](https://thinc.ai/docs/usage-config#advanced-types)
using [`pydantic`](https://github.com/samuelcolvin/pydantic). If your registered
function provides type hints, the values that are passed in will be checked
against the expected types. For example, `start: int` in the example above will
ensure that the value received as the argument `start` is an integer. If the
value can't be cast to an integer, spaCy will raise an error.
`start: pydantic.StrictInt` will force the value to be an integer and raise an
error if it's not – for instance, if your config defines a float.
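
A quick illustration of that strictness, using `pydantic` directly (the field
names are made up):

```python
from pydantic import BaseModel, StrictInt

class Settings(BaseModel):
    start: StrictInt

Settings(start=1)    # ok
Settings(start=1.0)  # raises a ValidationError: not a strict integer
```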

</Infobox>

### Wrapping PyTorch and TensorFlow {#custom-frameworks}

<!-- TODO: -->

</Project>

### Defining custom architectures {#custom-architectures}

<!-- TODO: this could maybe be a more general example of using Thinc to compose some layers? We don't want to go too deep here and probably want to focus on a simple architecture example to show how it works -->

## Parallel Training with Ray {#parallel-training}

<!-- TODO: document Ray integration -->

---
title: Transformers
teaser: Using transformer models like BERT in spaCy
---

TODO: ...

menu:
  - ['New Features', 'features']
  - ['Backwards Incompatibilities', 'incompat']
  - ['Migrating from v2.x', 'migrating']
  - ['Migrating plugins', 'plugins']
---

## Summary {#summary}

<!-- TODO: complete (see release notes Dropbox Paper doc) -->

## Migrating from v2.x {#migrating}

## Migration notes for plugin maintainers {#plugins}

Thanks to everyone who's been contributing to the spaCy ecosystem by developing
and maintaining one of the many awesome [plugins and extensions](/universe).
We've tried to keep breaking changes to a minimum and make it as easy as
possible for you to upgrade your packages for spaCy v3.

### Custom pipeline components

The most common use case for plugins is providing pipeline components and
extension attributes.

- Use the [`@Language.factory`](/api/language#factory) decorator to register
  your component and assign it a name. This allows users to refer to your
  components by name and serialize pipelines referencing them. Remove all manual
  additions to `Language.factories`.
- Make sure your component factories take at least two **named arguments**:
  `nlp` (the current `nlp` object) and `name` (the instance name of the added
  component so you can identify multiple instances of the same component).
- Update all references to [`nlp.add_pipe`](/api/language#add_pipe) in your docs
  to use **string names** instead of the component functions.

```python
### {highlight="1-5"}
from spacy.language import Language

@Language.factory("my_component", default_config={"some_setting": False})
def create_component(nlp: Language, name: str, some_setting: bool):
    return MyCoolComponent(some_setting=some_setting)


class MyCoolComponent:
    def __init__(self, some_setting):
        self.some_setting = some_setting

    def __call__(self, doc):
        # Do something to the doc
        return doc
```

> #### Result in config.cfg
>
> ```ini
> [components.my_component]
> factory = "my_component"
> some_setting = true
> ```

```diff
import spacy
from your_plugin import MyCoolComponent

nlp = spacy.load("en_core_web_sm")
- component = MyCoolComponent(some_setting=True)
- nlp.add_pipe(component)
+ nlp.add_pipe("my_component", config={"some_setting": True})
```

<Infobox title="Important note on registering factories" variant="warning">

The [`@Language.factory`](/api/language#factory) decorator takes care of letting
spaCy know that a component of that name is available. This means that your
users can add it to the pipeline using its **string name**. However, this
requires the decorator to be executed – so users will still have to **import
your plugin**. Alternatively, your plugin could expose an
[entry point](/usage/saving-loading#entry-points), which spaCy can read from.
This means that spaCy knows how to initialize `my_component`, even if your
package isn't imported.

</Infobox>
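
A sketch of what that entry point could look like in your plugin's `setup.py` –
the package and module names here are placeholders:

```python
from setuptools import setup

setup(
    name="your-plugin",
    entry_points={
        "spacy_factories": ["my_component = your_plugin:create_component"]
    },
)
```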

## Other embeddings {#embeddings}

<!-- TODO: explain spacy-transformers, doc.tensor, tok2vec? -->

<!-- TODO: mention sense2vec somewhere? -->
{ "text": "Rule-based Matching", "url": "/usage/rule-based-matching" },
|
||||
{ "text": "Processing Pipelines", "url": "/usage/processing-pipelines" },
|
||||
{ "text": "Vectors & Embeddings", "url": "/usage/vectors-embeddings" },
|
||||
{ "text": "Transformers", "url": "/usage/transformers", "tag": "new" },
|
||||
{ "text": "Training Models", "url": "/usage/training", "tag": "new" },
|
||||
{ "text": "spaCy Projects", "url": "/usage/projects", "tag": "new" },
|
||||
{ "text": "Saving & Loading", "url": "/usage/saving-loading" },
|
||||
|
|
|
@ -414,7 +414,7 @@ body [id]:target
|
|||
.cm-number
|
||||
color: var(--syntax-number)
|
||||
|
||||
.cm-def
|
||||
.cm-def, .cm-meta
|
||||
color: var(--syntax-function)
|
||||
|
||||
// Jupyter
|
||||
|
|
|
@ -17,7 +17,8 @@
|
|||
background: var(--color-subtle-opaque)
|
||||
|
||||
.footer
|
||||
background: var(--color-theme-light)
|
||||
--color-inline-code-bg: var(--color-theme-opaque)
|
||||
background: var(--color-theme-light) !important
|
||||
border-top: 2px solid var(--color-theme)
|
||||
|
||||
& > td:first-child
|
||||
|
|