mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-27 09:44:36 +03:00
Update docs [ci skip]
parent
30c76dbd67
commit
d7469283c5
|
@ -190,8 +190,6 @@ process that are used when you run [`spacy train`](/api/cli#train).
|
||||||
| `eval_frequency` | How often to evaluate during training (steps). Defaults to `200`. ~~int~~ |
|
| `eval_frequency` | How often to evaluate during training (steps). Defaults to `200`. ~~int~~ |
|
||||||
| `frozen_components` | Pipeline component names that are "frozen" and shouldn't be updated during training. See [here](/usage/training#config-components) for details. Defaults to `[]`. ~~List[str]~~ |
|
| `frozen_components` | Pipeline component names that are "frozen" and shouldn't be updated during training. See [here](/usage/training#config-components) for details. Defaults to `[]`. ~~List[str]~~ |
|
||||||
| `gpu_allocator` | Library for cupy to route GPU memory allocation to. Can be `"pytorch"` or `"tensorflow"`. Defaults to variable `${system.gpu_allocator}`. ~~str~~ |
|
| `gpu_allocator` | Library for cupy to route GPU memory allocation to. Can be `"pytorch"` or `"tensorflow"`. Defaults to variable `${system.gpu_allocator}`. ~~str~~ |
|
||||||
| `init_tok2vec` | Optional path to pretrained tok2vec weights created with [`spacy pretrain`](/api/cli#pretrain). Defaults to variable `${paths.init_tok2vec}`. ~~Optional[str]~~ |
|
|
||||||
| `lookups` | Additional lexeme and vocab data from [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data). Defaults to `null`. ~~Optional[Lookups]~~ |
|
|
||||||
| `max_epochs` | Maximum number of epochs to train for. Defaults to `0`. ~~int~~ |
|
| `max_epochs` | Maximum number of epochs to train for. Defaults to `0`. ~~int~~ |
|
||||||
| `max_steps` | Maximum number of update steps to train for. Defaults to `20000`. ~~int~~ |
|
| `max_steps` | Maximum number of update steps to train for. Defaults to `20000`. ~~int~~ |
|
||||||
| `optimizer` | The optimizer. The learning rate schedule and other settings can be configured as part of the optimizer. Defaults to [`Adam`](https://thinc.ai/docs/api-optimizers#adam). ~~Optimizer~~ |
|
| `optimizer` | The optimizer. The learning rate schedule and other settings can be configured as part of the optimizer. Defaults to [`Adam`](https://thinc.ai/docs/api-optimizers#adam). ~~Optimizer~~ |
|
||||||
|
@ -200,7 +198,6 @@ process that are used when you run [`spacy train`](/api/cli#train).
|
||||||
| `score_weights` | Score names shown in metrics mapped to their weight towards the final weighted score. See [here](/usage/training#metrics) for details. Defaults to `{}`. ~~Dict[str, float]~~ |
|
| `score_weights` | Score names shown in metrics mapped to their weight towards the final weighted score. See [here](/usage/training#metrics) for details. Defaults to `{}`. ~~Dict[str, float]~~ |
|
||||||
| `seed` | The random seed. Defaults to variable `${system.seed}`. ~~int~~ |
|
| `seed` | The random seed. Defaults to variable `${system.seed}`. ~~int~~ |
|
||||||
| `train_corpus` | Dot notation of the config location defining the train corpus. Defaults to `corpora.train`. ~~str~~ |
|
| `train_corpus` | Dot notation of the config location defining the train corpus. Defaults to `corpora.train`. ~~str~~ |
|
||||||
| `vectors` | Name or path of pipeline containing pretrained word vectors to use, e.g. created with [`init vocab`](/api/cli#init-vocab). Defaults to `null`. ~~Optional[str]~~ |
|
|
||||||
|
|
||||||
### pretraining {#config-pretraining tag="section,optional"}
|
### pretraining {#config-pretraining tag="section,optional"}
|
||||||
|
|
||||||
|
@ -220,6 +217,38 @@ used when you run [`spacy pretrain`](/api/cli#pretrain).
|
||||||
| `component` | Component to find the layer to pretrain. Defaults to `"tok2vec"`. ~~str~~ |
|
| `component` | Component to find the layer to pretrain. Defaults to `"tok2vec"`. ~~str~~ |
|
||||||
| `layer` | The layer to pretrain. If empty, the whole component model will be used. ~~str~~ |
|
| `layer` | The layer to pretrain. If empty, the whole component model will be used. ~~str~~ |
|
||||||
|
|
||||||
|
### initialize {#config-initialize tag="section"}
|
||||||
|
|
||||||
|
This config block lets you define resources for **initializing the pipeline**.
|
||||||
|
It's used by [`Language.initialize`](/api/language#initialize) and typically
|
||||||
|
called right before training (but not at runtime). The section allows you to
|
||||||
|
specify local file paths or custom functions to load data resources from,
|
||||||
|
without requiring them at runtime when you load the trained pipeline back in.
|
||||||
|
|
||||||
|
> #### Example
|
||||||
|
>
|
||||||
|
> ```ini
|
||||||
|
> [initialize]
|
||||||
|
> vectors = "/path/to/vectors_nlp"
|
||||||
|
> init_tok2vec = "/path/to/pretrain.bin"
|
||||||
|
>
|
||||||
|
> [initialize.components]
|
||||||
|
>
|
||||||
|
> [initialize.components.my_component]
|
||||||
|
> data_path = "/path/to/component_data"
|
||||||
|
> ```
|
||||||
|
|
||||||
|
<!-- TODO: -->
|
||||||
|
|
||||||
|
| Name | Description |
|
||||||
|
| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
|
| `components` | Additional arguments passed to the `initialize` method of a pipeline component, keyed by component name. If type annotations are available on the method, the config will be validated against them. The `initialize` methods will always receive the `get_examples` callback and the current `nlp` object. ~~Dict[str, Dict[str, Any]]~~ |
|
||||||
|
| `init_tok2vec` | Optional path to pretrained tok2vec weights created with [`spacy pretrain`](/api/cli#pretrain). Defaults to variable `${paths.init_tok2vec}`. ~~Optional[str]~~ |
|
||||||
|
| `lookups` | Additional lexeme and vocab data from [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data). Defaults to `null`. ~~Optional[Lookups]~~ |
|
||||||
|
| `tokenizer` | Additional arguments passed to the `initialize` method of the specified tokenizer. Can be used for languages like Chinese that depend on dictionaries or trained models for tokenization. If type annotations are available on the method, the config will be validated against them. The `initialize` method will always receive the `get_examples` callback and the current `nlp` object. ~~Dict[str, Any]~~ |
|
||||||
|
| `vectors` | Name or path of pipeline containing pretrained word vectors to use, e.g. created with [`init vocab`](/api/cli#init-vocab). Defaults to `null`. ~~Optional[str]~~ |
|
||||||
|
| `vocab_data` | Path to JSONL-formatted [vocabulary file](/api/data-formats#vocab-jsonl) to initialize vocabulary. ~~Optional[str]~~ |
|
||||||
|
|
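To make the layout of the `[initialize]` block concrete, the sketch below reads a config like the example above with Python's standard `configparser` and decodes each value as JSON. This only illustrates how the settings are keyed; spaCy loads configs through its own config system, and the `read_initialize_block` helper is a hypothetical name introduced here.

```python
import configparser
import json

CONFIG_TEXT = """
[initialize]
vectors = "/path/to/vectors_nlp"
init_tok2vec = "/path/to/pretrain.bin"

[initialize.components.my_component]
data_path = "/path/to/component_data"
"""


def read_initialize_block(text):
    """Hypothetical helper: collect top-level settings and per-component args."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Top-level settings such as vectors and init_tok2vec.
    settings = {key: json.loads(value) for key, value in parser["initialize"].items()}
    # Sections named initialize.components.<name> hold the extra arguments
    # for the component called <name>.
    prefix = "initialize.components."
    components = {}
    for section in parser.sections():
        if section.startswith(prefix):
            name = section[len(prefix):]
            components[name] = {
                key: json.loads(value) for key, value in parser[section].items()
            }
    settings["components"] = components
    return settings


init_settings = read_initialize_block(CONFIG_TEXT)
print(init_settings["components"])
```

The resulting `components` dictionary has the same shape as the `Dict[str, Dict[str, Any]]` type given for `components` in the table: component name first, then argument name.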
||||||
## Training data {#training}
|
## Training data {#training}
|
||||||
|
|
||||||
### Binary training format {#binary-training new="3"}
|
### Binary training format {#binary-training new="3"}
|
||||||
|
|
|
@ -142,14 +142,14 @@ applied to the `Doc` in order. Both [`__call__`](/api/dependencyparser#call) and
|
||||||
|
|
||||||
## DependencyParser.initialize {#initialize tag="method"}
|
## DependencyParser.initialize {#initialize tag="method"}
|
||||||
|
|
||||||
Initialize the component for training and return an
|
Initialize the component for training. `get_examples` should be a function that
|
||||||
[`Optimizer`](https://thinc.ai/docs/api-optimizers). `get_examples` should be a
|
returns an iterable of [`Example`](/api/example) objects. The data examples are
|
||||||
function that returns an iterable of [`Example`](/api/example) objects. The data
|
used to **initialize the model** of the component and can either be the full
|
||||||
examples are used to **initialize the model** of the component and can either be
|
training data or a representative sample. Initialization includes validating the
|
||||||
the full training data or a representative sample. Initialization includes
|
network,
|
||||||
validating the network,
|
|
||||||
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
|
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
|
||||||
setting up the label scheme based on the data.
|
setting up the label scheme based on the data. This method is typically called
|
||||||
|
by [`Language.initialize`](/api/language#initialize).
|
||||||
|
|
||||||
<Infobox variant="warning" title="Changed in v3.0" id="begin_training">
|
<Infobox variant="warning" title="Changed in v3.0" id="begin_training">
|
||||||
|
|
||||||
|
@ -161,16 +161,14 @@ This method was previously called `begin_training`.
|
||||||
>
|
>
|
||||||
> ```python
|
> ```python
|
||||||
> parser = nlp.add_pipe("parser")
|
> parser = nlp.add_pipe("parser")
|
||||||
> optimizer = parser.initialize(lambda: [], pipeline=nlp.pipeline)
|
> parser.initialize(lambda: [], nlp=nlp)
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
|
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
||||||
| _keyword-only_ | |
|
| _keyword-only_ | |
|
||||||
| `pipeline` | Optional list of pipeline components that this component is part of. ~~Optional[List[Tuple[str, Callable[[Doc], Doc]]]]~~ |
|
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
||||||
| `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
|
|
||||||
| **RETURNS** | The optimizer. ~~Optimizer~~ |
|
|
||||||
|
|
||||||
## DependencyParser.predict {#predict tag="method"}
|
## DependencyParser.predict {#predict tag="method"}
|
||||||
|
|
||||||
|
|
|
@ -141,14 +141,14 @@ applied to the `Doc` in order. Both [`__call__`](/api/entitylinker#call) and
|
||||||
|
|
||||||
## EntityLinker.initialize {#initialize tag="method"}
|
## EntityLinker.initialize {#initialize tag="method"}
|
||||||
|
|
||||||
Initialize the component for training and return an
|
Initialize the component for training. `get_examples` should be a function that
|
||||||
[`Optimizer`](https://thinc.ai/docs/api-optimizers). `get_examples` should be a
|
returns an iterable of [`Example`](/api/example) objects. The data examples are
|
||||||
function that returns an iterable of [`Example`](/api/example) objects. The data
|
used to **initialize the model** of the component and can either be the full
|
||||||
examples are used to **initialize the model** of the component and can either be
|
training data or a representative sample. Initialization includes validating the
|
||||||
the full training data or a representative sample. Initialization includes
|
network,
|
||||||
validating the network,
|
|
||||||
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
|
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
|
||||||
setting up the label scheme based on the data.
|
setting up the label scheme based on the data. This method is typically called
|
||||||
|
by [`Language.initialize`](/api/language#initialize).
|
||||||
|
|
||||||
<Infobox variant="warning" title="Changed in v3.0" id="begin_training">
|
<Infobox variant="warning" title="Changed in v3.0" id="begin_training">
|
||||||
|
|
||||||
|
@ -159,17 +159,15 @@ This method was previously called `begin_training`.
|
||||||
> #### Example
|
> #### Example
|
||||||
>
|
>
|
||||||
> ```python
|
> ```python
|
||||||
> entity_linker = nlp.add_pipe("entity_linker", last=True)
|
> entity_linker = nlp.add_pipe("entity_linker")
|
||||||
> optimizer = entity_linker.initialize(lambda: [], pipeline=nlp.pipeline)
|
> entity_linker.initialize(lambda: [], nlp=nlp)
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
|
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
||||||
| _keyword-only_ | |
|
| _keyword-only_ | |
|
||||||
| `pipeline` | Optional list of pipeline components that this component is part of. ~~Optional[List[Tuple[str, Callable[[Doc], Doc]]]]~~ |
|
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
||||||
| `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
|
|
||||||
| **RETURNS** | The optimizer. ~~Optimizer~~ |
|
|
||||||
|
|
||||||
## EntityLinker.predict {#predict tag="method"}
|
## EntityLinker.predict {#predict tag="method"}
|
||||||
|
|
||||||
|
|
|
@ -131,14 +131,14 @@ applied to the `Doc` in order. Both [`__call__`](/api/entityrecognizer#call) and
|
||||||
|
|
||||||
## EntityRecognizer.initialize {#initialize tag="method"}
|
## EntityRecognizer.initialize {#initialize tag="method"}
|
||||||
|
|
||||||
Initialize the component for training and return an
|
Initialize the component for training. `get_examples` should be a function that
|
||||||
[`Optimizer`](https://thinc.ai/docs/api-optimizers). `get_examples` should be a
|
returns an iterable of [`Example`](/api/example) objects. The data examples are
|
||||||
function that returns an iterable of [`Example`](/api/example) objects. The data
|
used to **initialize the model** of the component and can either be the full
|
||||||
examples are used to **initialize the model** of the component and can either be
|
training data or a representative sample. Initialization includes validating the
|
||||||
the full training data or a representative sample. Initialization includes
|
network,
|
||||||
validating the network,
|
|
||||||
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
|
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
|
||||||
setting up the label scheme based on the data.
|
setting up the label scheme based on the data. This method is typically called
|
||||||
|
by [`Language.initialize`](/api/language#initialize).
|
||||||
|
|
||||||
<Infobox variant="warning" title="Changed in v3.0" id="begin_training">
|
<Infobox variant="warning" title="Changed in v3.0" id="begin_training">
|
||||||
|
|
||||||
|
@ -150,16 +150,14 @@ This method was previously called `begin_training`.
|
||||||
>
|
>
|
||||||
> ```python
|
> ```python
|
||||||
> ner = nlp.add_pipe("ner")
|
> ner = nlp.add_pipe("ner")
|
||||||
> optimizer = ner.initialize(lambda: [], pipeline=nlp.pipeline)
|
> ner.initialize(lambda: [], nlp=nlp)
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
|
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
||||||
| _keyword-only_ | |
|
| _keyword-only_ | |
|
||||||
| `pipeline` | Optional list of pipeline components that this component is part of. ~~Optional[List[Tuple[str, Callable[[Doc], Doc]]]]~~ |
|
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
||||||
| `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
|
|
||||||
| **RETURNS** | The optimizer. ~~Optimizer~~ |
|
|
||||||
|
|
||||||
## EntityRecognizer.predict {#predict tag="method"}
|
## EntityRecognizer.predict {#predict tag="method"}
|
||||||
|
|
||||||
|
|
|
@ -204,12 +204,19 @@ more efficient than processing texts one-by-one.
|
||||||
## Language.initialize {#initialize tag="method"}
|
## Language.initialize {#initialize tag="method"}
|
||||||
|
|
||||||
Initialize the pipeline for training and return an
|
Initialize the pipeline for training and return an
|
||||||
[`Optimizer`](https://thinc.ai/docs/api-optimizers). `get_examples` should be a
|
[`Optimizer`](https://thinc.ai/docs/api-optimizers). Under the hood, it uses the
|
||||||
function that returns an iterable of [`Example`](/api/example) objects. The data
|
settings defined in the [`[initialize]`](/api/data-formats#config-initialize)
|
||||||
examples can either be the full training data or a representative sample. They
|
config block to set up the vocabulary, load in vectors and tok2vec weights and
|
||||||
are used to **initialize the models** of trainable pipeline components and are
|
pass optional arguments to the `initialize` methods implemented by pipeline
|
||||||
passed each component's [`initialize`](/api/pipe#initialize) method, if
|
components or the tokenizer. This method is typically called automatically when
|
||||||
available. Initialization includes validating the network,
|
you run [`spacy train`](/api/cli#train).
|
||||||
|
|
||||||
|
`get_examples` should be a function that returns an iterable of
|
||||||
|
[`Example`](/api/example) objects. The data examples can either be the full
|
||||||
|
training data or a representative sample. They are used to **initialize the
|
||||||
|
models** of trainable pipeline components and are passed to each component's
|
||||||
|
[`initialize`](/api/pipe#initialize) method, if available. Initialization
|
||||||
|
includes validating the network,
|
||||||
[inferring missing shapes](/usage/layers-architectures#thinc-shape-inference)
|
[inferring missing shapes](/usage/layers-architectures#thinc-shape-inference)
|
||||||
and setting up the label scheme based on the data.
|
and setting up the label scheme based on the data.
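
The dispatch described above can be pictured with a small stand-alone sketch. It is not spaCy's implementation: `ToyPipeline`, `ToyComponent` and `stateless_component` are hypothetical stand-ins, plain dicts take the place of `Example` objects, and the toy `initialize` does not create an optimizer. It only shows the pattern of calling each component's `initialize` method, if available, with the `get_examples` callback, the pipeline object and any extra arguments keyed by component name.

```python
class ToyComponent:
    """Stand-in for a trainable component (not a spaCy class)."""

    def __init__(self):
        self.labels = set()
        self.data_path = None

    def initialize(self, get_examples, *, nlp=None, data_path=None):
        # Extra keyword arguments come from the per-component settings.
        self.data_path = data_path
        for example in get_examples():
            self.labels.update(example["labels"])


def stateless_component(doc):
    """A component without an initialize method; it is skipped below."""
    return doc


class ToyPipeline:
    """Stand-in for the pipeline object (not spaCy's Language class)."""

    def __init__(self, components, init_config):
        self.components = components    # list of (name, component) pairs
        self.init_config = init_config  # mimics the [initialize] block
        self.initialized = []

    def initialize(self, get_examples):
        component_args = self.init_config.get("components", {})
        for name, component in self.components:
            init = getattr(component, "initialize", None)
            if init is None:
                continue  # nothing to initialize for this component
            init(get_examples, nlp=self, **component_args.get(name, {}))
            self.initialized.append(name)


my_component = ToyComponent()
pipeline = ToyPipeline(
    [("my_component", my_component), ("stateless", stateless_component)],
    {"components": {"my_component": {"data_path": "/path/to/component_data"}}},
)
pipeline.initialize(lambda: [{"labels": ["A", "B"]}, {"labels": ["B"]}])
print(my_component.data_path)
print(sorted(my_component.labels))
```

Keeping the extra arguments in the config rather than in the component's constructor means they are only needed at initialization time, not when the trained pipeline is loaded back in.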
|
||||||
|
|
||||||
|
|
|
@ -119,30 +119,27 @@ applied to the `Doc` in order. Both [`__call__`](/api/morphologizer#call) and
|
||||||
|
|
||||||
## Morphologizer.initialize {#initialize tag="method"}
|
## Morphologizer.initialize {#initialize tag="method"}
|
||||||
|
|
||||||
Initialize the component for training and return an
|
Initialize the component for training. `get_examples` should be a function that
|
||||||
[`Optimizer`](https://thinc.ai/docs/api-optimizers). `get_examples` should be a
|
returns an iterable of [`Example`](/api/example) objects. The data examples are
|
||||||
function that returns an iterable of [`Example`](/api/example) objects. The data
|
used to **initialize the model** of the component and can either be the full
|
||||||
examples are used to **initialize the model** of the component and can either be
|
training data or a representative sample. Initialization includes validating the
|
||||||
the full training data or a representative sample. Initialization includes
|
network,
|
||||||
validating the network,
|
|
||||||
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
|
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
|
||||||
setting up the label scheme based on the data.
|
setting up the label scheme based on the data. This method is typically called
|
||||||
|
by [`Language.initialize`](/api/language#initialize).
|
||||||
|
|
||||||
> #### Example
|
> #### Example
|
||||||
>
|
>
|
||||||
> ```python
|
> ```python
|
||||||
> morphologizer = nlp.add_pipe("morphologizer")
|
> morphologizer = nlp.add_pipe("morphologizer")
|
||||||
> nlp.pipeline.append(morphologizer)
|
> morphologizer.initialize(lambda: [], nlp=nlp)
|
||||||
> optimizer = morphologizer.initialize(lambda: [], pipeline=nlp.pipeline)
|
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
|
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
||||||
| _keyword-only_ | |
|
| _keyword-only_ | |
|
||||||
| `pipeline` | Optional list of pipeline components that this component is part of. ~~Optional[List[Tuple[str, Callable[[Doc], Doc]]]]~~ |
|
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
||||||
| `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
|
|
||||||
| **RETURNS** | The optimizer. ~~Optimizer~~ |
|
|
||||||
|
|
||||||
## Morphologizer.predict {#predict tag="method"}
|
## Morphologizer.predict {#predict tag="method"}
|
||||||
|
|
||||||
|
|
|
@ -100,14 +100,14 @@ applied to the `Doc` in order. Both [`__call__`](/api/pipe#call) and
|
||||||
|
|
||||||
## Pipe.initialize {#initialize tag="method"}
|
## Pipe.initialize {#initialize tag="method"}
|
||||||
|
|
||||||
Initialize the component for training and return an
|
Initialize the component for training. `get_examples` should be a function that
|
||||||
[`Optimizer`](https://thinc.ai/docs/api-optimizers). `get_examples` should be a
|
returns an iterable of [`Example`](/api/example) objects. The data examples are
|
||||||
function that returns an iterable of [`Example`](/api/example) objects. The data
|
used to **initialize the model** of the component and can either be the full
|
||||||
examples are used to **initialize the model** of the component and can either be
|
training data or a representative sample. Initialization includes validating the
|
||||||
the full training data or a representative sample. Initialization includes
|
network,
|
||||||
validating the network,
|
|
||||||
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
|
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
|
||||||
setting up the label scheme based on the data.
|
setting up the label scheme based on the data. This method is typically called
|
||||||
|
by [`Language.initialize`](/api/language#initialize).
|
||||||
|
|
||||||
<Infobox variant="warning" title="Changed in v3.0" id="begin_training">
|
<Infobox variant="warning" title="Changed in v3.0" id="begin_training">
|
||||||
|
|
||||||
|
@ -119,16 +119,14 @@ This method was previously called `begin_training`.
|
||||||
>
|
>
|
||||||
> ```python
|
> ```python
|
||||||
> pipe = nlp.add_pipe("your_custom_pipe")
|
> pipe = nlp.add_pipe("your_custom_pipe")
|
||||||
> optimizer = pipe.initialize(lambda: [], pipeline=nlp.pipeline)
|
> pipe.initialize(lambda: [], nlp=nlp)
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
|
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
||||||
| _keyword-only_ | |
|
| _keyword-only_ | |
|
||||||
| `pipeline` | Optional list of pipeline components that this component is part of. ~~Optional[List[Tuple[str, Callable[[Doc], Doc]]]]~~ |
|
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
||||||
| `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
|
|
||||||
| **RETURNS** | The optimizer. ~~Optimizer~~ |
|
|
||||||
|
|
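For a custom component, the signature in the table can be mirrored roughly as in the sketch below. It assumes a hypothetical `ToyTagger` class that does not subclass anything from spaCy, and it uses plain dicts in place of `Example` objects. The point is the shape of the method: `get_examples` is a zero-argument callable, `nlp` is keyword-only, and nothing is returned.

```python
class ToyTagger:
    """Hypothetical stand-in for a trainable component (not a spaCy class)."""

    def __init__(self):
        self.labels = []
        self.nlp = None

    def initialize(self, get_examples, *, nlp=None):
        # get_examples must be a zero-argument callable, not the data itself,
        # so the component can request a fresh iterable whenever it needs one.
        if not callable(get_examples):
            raise TypeError("get_examples should be a function returning examples")
        self.nlp = nlp
        seen = set()
        for example in get_examples():
            seen.update(example["tags"])
        # Set up the label scheme based on the data.
        self.labels = sorted(seen)
        # No optimizer is created or returned here.


def get_examples():
    # Plain dicts stand in for Example objects in this sketch.
    yield {"words": ["I", "run"], "tags": ["PRON", "VERB"]}
    yield {"words": ["dogs", "run"], "tags": ["NOUN", "VERB"]}


tagger = ToyTagger()
result = tagger.initialize(get_examples, nlp=None)
print(tagger.labels)  # ['NOUN', 'PRON', 'VERB']
print(result)         # None
```

Passing a function rather than a list also works with generators, since each call to `get_examples()` starts a new pass over the data.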
||||||
## Pipe.predict {#predict tag="method"}
|
## Pipe.predict {#predict tag="method"}
|
||||||
|
|
||||||
|
|
|
@ -116,29 +116,27 @@ and [`pipe`](/api/sentencerecognizer#pipe) delegate to the
|
||||||
|
|
||||||
## SentenceRecognizer.initialize {#initialize tag="method"}
|
## SentenceRecognizer.initialize {#initialize tag="method"}
|
||||||
|
|
||||||
Initialize the component for training and return an
|
Initialize the component for training. `get_examples` should be a function that
|
||||||
[`Optimizer`](https://thinc.ai/docs/api-optimizers). `get_examples` should be a
|
returns an iterable of [`Example`](/api/example) objects. The data examples are
|
||||||
function that returns an iterable of [`Example`](/api/example) objects. The data
|
used to **initialize the model** of the component and can either be the full
|
||||||
examples are used to **initialize the model** of the component and can either be
|
training data or a representative sample. Initialization includes validating the
|
||||||
the full training data or a representative sample. Initialization includes
|
network,
|
||||||
validating the network,
|
|
||||||
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
|
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
|
||||||
setting up the label scheme based on the data.
|
setting up the label scheme based on the data. This method is typically called
|
||||||
|
by [`Language.initialize`](/api/language#initialize).
|
||||||
|
|
||||||
> #### Example
|
> #### Example
|
||||||
>
|
>
|
||||||
> ```python
|
> ```python
|
||||||
> senter = nlp.add_pipe("senter")
|
> senter = nlp.add_pipe("senter")
|
||||||
> optimizer = senter.initialize(lambda: [], pipeline=nlp.pipeline)
|
> senter.initialize(lambda: [], nlp=nlp)
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
|
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
||||||
| _keyword-only_ | |
|
| _keyword-only_ | |
|
||||||
| `pipeline` | Optional list of pipeline components that this component is part of. ~~Optional[List[Tuple[str, Callable[[Doc], Doc]]]]~~ |
|
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
||||||
| `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
|
|
||||||
| **RETURNS** | The optimizer. ~~Optimizer~~ |
|
|
||||||
|
|
||||||
## SentenceRecognizer.predict {#predict tag="method"}
|
## SentenceRecognizer.predict {#predict tag="method"}
|
||||||
|
|
||||||
|
|
|
@ -114,14 +114,14 @@ applied to the `Doc` in order. Both [`__call__`](/api/tagger#call) and
|
||||||
|
|
||||||
## Tagger.initialize {#initialize tag="method"}
|
## Tagger.initialize {#initialize tag="method"}
|
||||||
|
|
||||||
Initialize the component for training and return an
|
Initialize the component for training. `get_examples` should be a function that
|
||||||
[`Optimizer`](https://thinc.ai/docs/api-optimizers). `get_examples` should be a
|
returns an iterable of [`Example`](/api/example) objects. The data examples are
|
||||||
function that returns an iterable of [`Example`](/api/example) objects. The data
|
used to **initialize the model** of the component and can either be the full
|
||||||
examples are used to **initialize the model** of the component and can either be
|
training data or a representative sample. Initialization includes validating the
|
||||||
the full training data or a representative sample. Initialization includes
|
network,
|
||||||
validating the network,
|
|
||||||
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
|
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
|
||||||
setting up the label scheme based on the data.
|
setting up the label scheme based on the data. This method is typically called
|
||||||
|
by [`Language.initialize`](/api/language#initialize).
|
||||||
|
|
||||||
<Infobox variant="warning" title="Changed in v3.0" id="begin_training">
|
<Infobox variant="warning" title="Changed in v3.0" id="begin_training">
|
||||||
|
|
||||||
|
@ -133,16 +133,14 @@ This method was previously called `begin_training`.
|
||||||
>
|
>
|
||||||
> ```python
|
> ```python
|
||||||
> tagger = nlp.add_pipe("tagger")
|
> tagger = nlp.add_pipe("tagger")
|
||||||
> optimizer = tagger.initialize(lambda: [], pipeline=nlp.pipeline)
|
> tagger.initialize(lambda: [], nlp=nlp)
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
|
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
||||||
| _keyword-only_ | |
|
| _keyword-only_ | |
|
||||||
| `pipeline` | Optional list of pipeline components that this component is part of. ~~Optional[List[Tuple[str, Callable[[Doc], Doc]]]]~~ |
|
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
||||||
| `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
|
|
||||||
| **RETURNS** | The optimizer. ~~Optimizer~~ |
|
|
||||||
|
|
||||||
## Tagger.predict {#predict tag="method"}
|
## Tagger.predict {#predict tag="method"}
|
||||||
|
|
||||||
|
|
|
@ -127,14 +127,14 @@ applied to the `Doc` in order. Both [`__call__`](/api/textcategorizer#call) and
|
||||||
|
|
||||||
## TextCategorizer.initialize {#initialize tag="method"}
|
## TextCategorizer.initialize {#initialize tag="method"}
|
||||||
|
|
||||||
Initialize the component for training and return an
|
Initialize the component for training. `get_examples` should be a function that
|
||||||
[`Optimizer`](https://thinc.ai/docs/api-optimizers). `get_examples` should be a
|
returns an iterable of [`Example`](/api/example) objects. The data examples are
|
||||||
function that returns an iterable of [`Example`](/api/example) objects. The data
|
used to **initialize the model** of the component and can either be the full
|
||||||
examples are used to **initialize the model** of the component and can either be
|
training data or a representative sample. Initialization includes validating the
|
||||||
the full training data or a representative sample. Initialization includes
|
network,
|
||||||
validating the network,
|
|
||||||
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
|
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
|
||||||
setting up the label scheme based on the data.
|
setting up the label scheme based on the data. This method is typically called
|
||||||
|
by [`Language.initialize`](/api/language#initialize).
|
||||||
|
|
||||||
<Infobox variant="warning" title="Changed in v3.0" id="begin_training">
|
<Infobox variant="warning" title="Changed in v3.0" id="begin_training">
|
||||||
|
|
||||||
|
@ -146,16 +146,14 @@ This method was previously called `begin_training`.
|
||||||
>
|
>
|
||||||
> ```python
|
> ```python
|
||||||
> textcat = nlp.add_pipe("textcat")
|
> textcat = nlp.add_pipe("textcat")
|
||||||
> optimizer = textcat.initialize(lambda: [], pipeline=nlp.pipeline)
|
> textcat.initialize(lambda: [], nlp=nlp)
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
|
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
||||||
| _keyword-only_ | |
|
| _keyword-only_ | |
|
||||||
| `pipeline` | Optional list of pipeline components that this component is part of. ~~Optional[List[Tuple[str, Callable[[Doc], Doc]]]]~~ |
|
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
||||||
| `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
|
|
||||||
| **RETURNS** | The optimizer. ~~Optimizer~~ |
|
|
||||||
|
|
||||||
## TextCategorizer.predict {#predict tag="method"}
|
## TextCategorizer.predict {#predict tag="method"}
|
||||||
|
|
||||||
|
|
|
@ -132,22 +132,21 @@ examples are used to **initialize the model** of the component and can either be
|
||||||
the full training data or a representative sample. Initialization includes
|
the full training data or a representative sample. Initialization includes
|
||||||
validating the network,
|
validating the network,
|
||||||
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
|
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
|
||||||
setting up the label scheme based on the data.
|
setting up the label scheme based on the data. This method is typically called
|
||||||
|
by [`Language.initialize`](/api/language#initialize).
|
||||||
|
|
||||||
> #### Example
|
> #### Example
|
||||||
>
|
>
|
||||||
> ```python
|
> ```python
|
||||||
> tok2vec = nlp.add_pipe("tok2vec")
|
> tok2vec = nlp.add_pipe("tok2vec")
|
||||||
> optimizer = tok2vec.initialize(lambda: [], pipeline=nlp.pipeline)
|
> tok2vec.initialize(lambda: [], nlp=nlp)
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
|
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
||||||
| _keyword-only_ | |
|
| _keyword-only_ | |
|
||||||
| `pipeline` | Optional list of pipeline components that this component is part of. ~~Optional[List[Tuple[str, Callable[[Doc], Doc]]]]~~ |
|
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
||||||
| `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
|
|
||||||
| **RETURNS** | The optimizer. ~~Optimizer~~ |
|
|
||||||
|
|
||||||
## Tok2Vec.predict {#predict tag="method"}
|
## Tok2Vec.predict {#predict tag="method"}
|
||||||
|
|
||||||
|
|
|
@ -167,22 +167,21 @@ examples are used to **initialize the model** of the component and can either be
|
||||||
the full training data or a representative sample. Initialization includes
|
the full training data or a representative sample. Initialization includes
|
||||||
validating the network,
|
validating the network,
|
||||||
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
|
[inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
|
||||||
setting up the label scheme based on the data.
|
setting up the label scheme based on the data. This method is typically called
|
||||||
|
by [`Language.initialize`](/api/language#initialize).
|
||||||
|
|
||||||
> #### Example
|
> #### Example
|
||||||
>
|
>
|
||||||
> ```python
|
> ```python
|
||||||
> trf = nlp.add_pipe("transformer")
|
> trf = nlp.add_pipe("transformer")
|
||||||
> optimizer = trf.initialize(lambda: [], pipeline=nlp.pipeline)
|
> trf.initialize(lambda: [], nlp=nlp)
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
|
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
||||||
| _keyword-only_ | |
|
| _keyword-only_ | |
|
||||||
| `pipeline` | Optional list of pipeline components that this component is part of. ~~Optional[List[Tuple[str, Callable[[Doc], Doc]]]]~~ |
|
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
||||||
| `sgd` | An optimizer. Will be created via [`create_optimizer`](#create_optimizer) if not set. ~~Optional[Optimizer]~~ |
|
|
||||||
| **RETURNS** | The optimizer. ~~Optimizer~~ |
|
|
||||||
|
|
||||||
## Transformer.predict {#predict tag="method"}
|
## Transformer.predict {#predict tag="method"}
|
||||||
|
|
||||||
|
|
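For readers migrating code, the signature change repeated across the hunks above can be illustrated with a minimal sketch. `SketchComponent` below is a hypothetical stand-in, not a spaCy class, and its attributes are invented for illustration. It only mirrors the documented shape of the new `initialize`: `get_examples` is positional, `nlp` is keyword-only and defaults to `None`, and nothing is returned, so the old `optimizer = component.initialize(...)` pattern no longer yields an optimizer.

```python
from typing import Any, Callable, Iterable, List, Optional


class SketchComponent:
    """Hypothetical stand-in that mirrors the new `initialize` signature."""

    def __init__(self) -> None:
        self.examples: List[Any] = []
        self.nlp: Optional[Any] = None

    def initialize(
        self,
        get_examples: Callable[[], Iterable[Any]],
        *,
        nlp: Optional[Any] = None,
    ) -> None:
        # Call the function to obtain the examples, as the docs describe.
        self.examples = list(get_examples())
        # Keep a reference to the (optional) nlp object.
        self.nlp = nlp
        # Unlike the old API, no optimizer is created or returned here.


component = SketchComponent()
result = component.initialize(lambda: [], nlp=None)
print(result)  # prints None
```

In this sketch, passing `nlp` positionally raises a `TypeError` because the argument is keyword-only, which matches the `_keyword-only_` marker in the parameter tables.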