Mirror of https://github.com/explosion/spaCy.git (synced 2024-12-24 17:06:29 +03:00)
Update docs for pipeline initialize() methods (#11221)
* Update documentation for dependency parser
* Update documentation for trainable_lemmatizer
* Update documentation for entity_linker
* Update documentation for ner
* Update documentation for morphologizer
* Update documentation for senter
* Update documentation for spancat
* Update documentation for tagger
* Update documentation for textcat
* Update documentation for tok2vec
* Run prettier on edited files
* Apply similar changes in transformer docs
* Remove need to say annotated example explicitly

  I removed the need to say "Must contain at least one annotated Example"
  because it's often a given that Examples will contain some gold-standard
  annotation.

* Run prettier on transformer docs
parent d0578c2ede
commit d993df41e5

```diff
@@ -158,10 +158,10 @@ applied to the `Doc` in order. Both [`__call__`](/api/dependencyparser#call) and
 ## DependencyParser.initialize {#initialize tag="method" new="3"}
 
 Initialize the component for training. `get_examples` should be a function that
-returns an iterable of [`Example`](/api/example) objects. The data examples are
-used to **initialize the model** of the component and can either be the full
-training data or a representative sample. Initialization includes validating the
-network,
+returns an iterable of [`Example`](/api/example) objects. **At least one example
+should be supplied.** The data examples are used to **initialize the model** of
+the component and can either be the full training data or a representative
+sample. Initialization includes validating the network,
 [inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
 setting up the label scheme based on the data. This method is typically called
 by [`Language.initialize`](/api/language#initialize) and lets you customize
```

```diff
@@ -179,7 +179,7 @@ This method was previously called `begin_training`.
 >
 > ```python
 > parser = nlp.add_pipe("parser")
-> parser.initialize(lambda: [], nlp=nlp)
+> parser.initialize(lambda: examples, nlp=nlp)
 > ```
 >
 > ```ini
```
@ -193,7 +193,7 @@ This method was previously called `begin_training`.
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. Must contain at least one `Example`. ~~Callable[[], Iterable[Example]]~~ |
|
||||||
| _keyword-only_ | |
|
| _keyword-only_ | |
|
||||||
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
||||||
| `labels` | The label information to add to the component, as provided by the [`label_data`](#label_data) property after initialization. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[Dict[str, Dict[str, int]]]~~ |
|
| `labels` | The label information to add to the component, as provided by the [`label_data`](#label_data) property after initialization. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[Dict[str, Dict[str, int]]]~~ |
|
||||||
|
|
|
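
The updated parser snippet passes `lambda: examples` without defining `examples`. A minimal sketch of one way to build it, assuming spaCy v3 is installed; the sentence, heads and dependency labels are invented for illustration, and `Example.from_dict` attaches them as the gold-standard side of a single `Example`:

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
parser = nlp.add_pipe("parser")

# Invented gold annotation: "heads" holds absolute token indices and the
# root token ("ate") points to itself.
doc = nlp.make_doc("She ate pizza")
examples = [
    Example.from_dict(doc, {"heads": [1, 1, 1], "deps": ["nsubj", "ROOT", "dobj"]})
]

# get_examples must be a zero-argument callable, so the list is wrapped in a lambda.
parser.initialize(lambda: examples, nlp=nlp)

# The initialized (but untrained) parser can now be run on new text.
parsed = nlp("She ate pizza")
print([(token.text, token.dep_) for token in parsed])
```

One example is enough to initialize the model but not to establish a useful label scheme: with the parser's default `min_action_freq` of 30, labels seen this rarely are backed off to `dep`. For real training, supply representative data or precomputed `labels`.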

```diff
@@ -141,10 +141,10 @@ and [`pipe`](/api/edittreelemmatizer#pipe) delegate to the
 ## EditTreeLemmatizer.initialize {#initialize tag="method" new="3"}
 
 Initialize the component for training. `get_examples` should be a function that
-returns an iterable of [`Example`](/api/example) objects. The data examples are
-used to **initialize the model** of the component and can either be the full
-training data or a representative sample. Initialization includes validating the
-network,
+returns an iterable of [`Example`](/api/example) objects. **At least one example
+should be supplied.** The data examples are used to **initialize the model** of
+the component and can either be the full training data or a representative
+sample. Initialization includes validating the network,
 [inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
 setting up the label scheme based on the data. This method is typically called
 by [`Language.initialize`](/api/language#initialize) and lets you customize
```

```diff
@@ -156,7 +156,7 @@ config.
 >
 > ```python
 > lemmatizer = nlp.add_pipe("trainable_lemmatizer", name="lemmatizer")
-> lemmatizer.initialize(lambda: [], nlp=nlp)
+> lemmatizer.initialize(lambda: examples, nlp=nlp)
 > ```
 >
 > ```ini
```
@ -170,7 +170,7 @@ config.
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. Must contain at least one `Example`. ~~Callable[[], Iterable[Example]]~~ |
|
||||||
| _keyword-only_ | |
|
| _keyword-only_ | |
|
||||||
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
||||||
| `labels` | The label information to add to the component, as provided by the [`label_data`](#label_data) property after initialization. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[Iterable[str]]~~ |
|
| `labels` | The label information to add to the component, as provided by the [`label_data`](#label_data) property after initialization. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[Iterable[str]]~~ |
|
||||||
|
|
|
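
For `trainable_lemmatizer`, a single supplied example may still be too little: the component keeps only edit trees that occur at least `min_tree_freq` times (3 by default in its factory settings), and with no surviving trees there are no labels to initialize the model with. A sketch assuming spaCy v3.3 or later, where the three invented tokens share the same "drop a final `s`" edit tree:

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
lemmatizer = nlp.add_pipe("trainable_lemmatizer", name="lemmatizer")

# Invented gold lemmas: all three pairs use the same edit tree (strip "s"),
# so that tree is seen three times and meets the default min_tree_freq.
doc = nlp.make_doc("cats dogs birds")
examples = [Example.from_dict(doc, {"lemmas": ["cat", "dog", "bird"]})]

# Labels (edit trees) are extracted from the examples because `labels` is not given.
lemmatizer.initialize(lambda: examples, nlp=nlp)
print(len(lemmatizer.labels))
```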

```diff
@@ -185,10 +185,10 @@ with the current vocab.
 ## EntityLinker.initialize {#initialize tag="method" new="3"}
 
 Initialize the component for training. `get_examples` should be a function that
-returns an iterable of [`Example`](/api/example) objects. The data examples are
-used to **initialize the model** of the component and can either be the full
-training data or a representative sample. Initialization includes validating the
-network,
+returns an iterable of [`Example`](/api/example) objects. **At least one example
+should be supplied.** The data examples are used to **initialize the model** of
+the component and can either be the full training data or a representative
+sample. Initialization includes validating the network,
 [inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
 setting up the label scheme based on the data. This method is typically called
 by [`Language.initialize`](/api/language#initialize).
```

```diff
@@ -208,15 +208,15 @@ This method was previously called `begin_training`.
 >
 > ```python
 > entity_linker = nlp.add_pipe("entity_linker")
-> entity_linker.initialize(lambda: [], nlp=nlp, kb_loader=my_kb)
+> entity_linker.initialize(lambda: examples, nlp=nlp, kb_loader=my_kb)
 > ```
 
-| Name | Description |
-| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
-| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
-| _keyword-only_ | |
-| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
-| `kb_loader` | Function that creates a [`KnowledgeBase`](/api/kb) from a `Vocab` instance. ~~Callable[[Vocab], KnowledgeBase]~~ |
+| Name | Description |
+| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. Must contain at least one `Example`. ~~Callable[[], Iterable[Example]]~~ |
+| _keyword-only_ | |
+| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
+| `kb_loader` | Function that creates a [`KnowledgeBase`](/api/kb) from a `Vocab` instance. ~~Callable[[Vocab], KnowledgeBase]~~ |
 
 ## EntityLinker.predict {#predict tag="method"}
 
```

```diff
@@ -154,10 +154,10 @@ applied to the `Doc` in order. Both [`__call__`](/api/entityrecognizer#call) and
 ## EntityRecognizer.initialize {#initialize tag="method" new="3"}
 
 Initialize the component for training. `get_examples` should be a function that
-returns an iterable of [`Example`](/api/example) objects. The data examples are
-used to **initialize the model** of the component and can either be the full
-training data or a representative sample. Initialization includes validating the
-network,
+returns an iterable of [`Example`](/api/example) objects. **At least one example
+should be supplied.** The data examples are used to **initialize the model** of
+the component and can either be the full training data or a representative
+sample. Initialization includes validating the network,
 [inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
 setting up the label scheme based on the data. This method is typically called
 by [`Language.initialize`](/api/language#initialize) and lets you customize
```

```diff
@@ -175,7 +175,7 @@ This method was previously called `begin_training`.
 >
 > ```python
 > ner = nlp.add_pipe("ner")
-> ner.initialize(lambda: [], nlp=nlp)
+> ner.initialize(lambda: examples, nlp=nlp)
 > ```
 >
 > ```ini
```
@ -189,7 +189,7 @@ This method was previously called `begin_training`.
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. Must contain at least one `Example`. ~~Callable[[], Iterable[Example]]~~ |
|
||||||
| _keyword-only_ | |
|
| _keyword-only_ | |
|
||||||
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
||||||
| `labels` | The label information to add to the component, as provided by the [`label_data`](#label_data) property after initialization. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[Dict[str, Dict[str, int]]]~~ |
|
| `labels` | The label information to add to the component, as provided by the [`label_data`](#label_data) property after initialization. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[Dict[str, Dict[str, int]]]~~ |
|
||||||
|
|
|
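
A minimal sketch of building `examples` for the `ner` snippet, assuming spaCy v3 is installed. Entities are given as character-offset `(start, end, label)` tuples; the text and labels are invented. Because no `labels` argument is passed, the label scheme is read from the examples, as the `labels` row describes:

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")

# Invented gold entities as (start_char, end_char, label) tuples.
doc = nlp.make_doc("Apple is opening an office in Berlin")
examples = [Example.from_dict(doc, {"entities": [(0, 5, "ORG"), (30, 36, "GPE")]})]

# Without `labels=...`, the entity labels are collected from the examples.
ner.initialize(lambda: examples, nlp=nlp)
print(ner.labels)
```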

```diff
@@ -147,10 +147,10 @@ applied to the `Doc` in order. Both [`__call__`](/api/morphologizer#call) and
 ## Morphologizer.initialize {#initialize tag="method"}
 
 Initialize the component for training. `get_examples` should be a function that
-returns an iterable of [`Example`](/api/example) objects. The data examples are
-used to **initialize the model** of the component and can either be the full
-training data or a representative sample. Initialization includes validating the
-network,
+returns an iterable of [`Example`](/api/example) objects. **At least one example
+should be supplied.** The data examples are used to **initialize the model** of
+the component and can either be the full training data or a representative
+sample. Initialization includes validating the network,
 [inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
 setting up the label scheme based on the data. This method is typically called
 by [`Language.initialize`](/api/language#initialize) and lets you customize
```

```diff
@@ -162,7 +162,7 @@ config.
 >
 > ```python
 > morphologizer = nlp.add_pipe("morphologizer")
-> morphologizer.initialize(lambda: [], nlp=nlp)
+> morphologizer.initialize(lambda: examples, nlp=nlp)
 > ```
 >
 > ```ini
```
@ -176,7 +176,7 @@ config.
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. Must contain at least one `Example`. ~~Callable[[], Iterable[Example]]~~ |
|
||||||
| _keyword-only_ | |
|
| _keyword-only_ | |
|
||||||
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
||||||
| `labels` | The label information to add to the component, as provided by the [`label_data`](#label_data) property after initialization. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[dict]~~ |
|
| `labels` | The label information to add to the component, as provided by the [`label_data`](#label_data) property after initialization. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[dict]~~ |
|
||||||
|
|
|
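
For the `morphologizer`, gold annotation can be supplied as coarse POS tags plus Universal Dependencies feature strings. A sketch assuming spaCy v3 is installed, with invented tokens and features; the component combines each token's features and POS into a single label:

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
morphologizer = nlp.add_pipe("morphologizer")

# Invented gold annotation: one UPOS tag and one feature string per token.
doc = nlp.make_doc("I saw dogs")
examples = [
    Example.from_dict(
        doc,
        {
            "pos": ["PRON", "VERB", "NOUN"],
            "morphs": ["Case=Nom|Number=Sing|Person=1", "Tense=Past", "Number=Plur"],
        },
    )
]

# The morph+POS label scheme is derived from the examples.
morphologizer.initialize(lambda: examples, nlp=nlp)
print(morphologizer.labels)
```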

```diff
@@ -132,10 +132,10 @@ and [`pipe`](/api/sentencerecognizer#pipe) delegate to the
 ## SentenceRecognizer.initialize {#initialize tag="method"}
 
 Initialize the component for training. `get_examples` should be a function that
-returns an iterable of [`Example`](/api/example) objects. The data examples are
-used to **initialize the model** of the component and can either be the full
-training data or a representative sample. Initialization includes validating the
-network,
+returns an iterable of [`Example`](/api/example) objects. **At least one example
+should be supplied.** The data examples are used to **initialize the model** of
+the component and can either be the full training data or a representative
+sample. Initialization includes validating the network,
 [inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
 setting up the label scheme based on the data. This method is typically called
 by [`Language.initialize`](/api/language#initialize).
```

```diff
@@ -144,14 +144,14 @@ by [`Language.initialize`](/api/language#initialize).
 >
 > ```python
 > senter = nlp.add_pipe("senter")
-> senter.initialize(lambda: [], nlp=nlp)
+> senter.initialize(lambda: examples, nlp=nlp)
 > ```
 
-| Name | Description |
-| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
-| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
-| _keyword-only_ | |
-| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
+| Name | Description |
+| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. Must contain at least one `Example`. ~~Callable[[], Iterable[Example]]~~ |
+| _keyword-only_ | |
+| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
 
 ## SentenceRecognizer.predict {#predict tag="method"}
 
```
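
A sketch of building `examples` for the `senter` snippet, assuming spaCy v3 is installed. `sent_starts` is given as booleans (`True` for the first token of a sentence, `False` otherwise), and its length must match the tokenizer's output for the text, seven tokens for the invented sentence pair below:

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
senter = nlp.add_pipe("senter")

# Tokens: Hello, there, ., How, are, you, ?
# True marks the first token of a sentence; False marks every other token.
doc = nlp.make_doc("Hello there. How are you?")
examples = [
    Example.from_dict(
        doc, {"sent_starts": [True, False, False, True, False, False, False]}
    )
]

senter.initialize(lambda: examples, nlp=nlp)

# The untrained senter now assigns sentence boundaries, so doc.sents is usable.
processed = nlp("Hello there. How are you?")
print([sent.text for sent in processed.sents])
```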

```diff
@@ -56,7 +56,7 @@ architectures and their arguments and hyperparameters.
 | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `suggester` | A function that [suggests spans](#suggesters). Spans are returned as a ragged array with two integer columns, for the start and end positions. Defaults to [`ngram_suggester`](#ngram_suggester). ~~Callable[[Iterable[Doc], Optional[Ops]], Ragged]~~ |
 | `model` | A model instance that is given a a list of documents and `(start, end)` indices representing candidate span offsets. The model predicts a probability for each category for each span. Defaults to [SpanCategorizer](/api/architectures#SpanCategorizer). ~~Model[Tuple[List[Doc], Ragged], Floats2d]~~ |
-| `spans_key` | Key of the [`Doc.spans`](/api/doc#spans) dict to save the spans under. During initialization and training, the component will look for spans on the reference document under the same key. Defaults to `"sc"`. ~~str~~ |
+| `spans_key` | Key of the [`Doc.spans`](/api/doc#spans) dict to save the spans under. During initialization and training, the component will look for spans on the reference document under the same key. Defaults to `"sc"`. ~~str~~ |
 | `threshold` | Minimum probability to consider a prediction positive. Spans with a positive prediction will be saved on the Doc. Defaults to `0.5`. ~~float~~ |
 | `max_positive` | Maximum number of labels to consider positive per span. Defaults to `None`, indicating no limit. ~~Optional[int]~~ |
 | `scorer` | The scoring method. Defaults to [`Scorer.score_spans`](/api/scorer#score_spans) for `Doc.spans[spans_key]` with overlapping spans allowed. ~~Optional[Callable]~~ |
```

```diff
@@ -93,7 +93,7 @@ shortcut for this and instantiate the component using its string name and
 | `suggester` | A function that [suggests spans](#suggesters). Spans are returned as a ragged array with two integer columns, for the start and end positions. ~~Callable[[Iterable[Doc], Optional[Ops]], Ragged]~~ |
 | `name` | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~ |
 | _keyword-only_ | |
-| `spans_key` | Key of the [`Doc.spans`](/api/doc#sans) dict to save the spans under. During initialization and training, the component will look for spans on the reference document under the same key. Defaults to `"sc"`. ~~str~~ |
+| `spans_key` | Key of the [`Doc.spans`](/api/doc#sans) dict to save the spans under. During initialization and training, the component will look for spans on the reference document under the same key. Defaults to `"sc"`. ~~str~~ |
 | `threshold` | Minimum probability to consider a prediction positive. Spans with a positive prediction will be saved on the Doc. Defaults to `0.5`. ~~float~~ |
 | `max_positive` | Maximum number of labels to consider positive per span. Defaults to `None`, indicating no limit. ~~Optional[int]~~ |
 
```

```diff
@@ -147,10 +147,10 @@ applied to the `Doc` in order. Both [`__call__`](/api/spancategorizer#call) and
 ## SpanCategorizer.initialize {#initialize tag="method"}
 
 Initialize the component for training. `get_examples` should be a function that
-returns an iterable of [`Example`](/api/example) objects. The data examples are
-used to **initialize the model** of the component and can either be the full
-training data or a representative sample. Initialization includes validating the
-network,
+returns an iterable of [`Example`](/api/example) objects. **At least one example
+should be supplied.** The data examples are used to **initialize the model** of
+the component and can either be the full training data or a representative
+sample. Initialization includes validating the network,
 [inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
 setting up the label scheme based on the data. This method is typically called
 by [`Language.initialize`](/api/language#initialize) and lets you customize
```

```diff
@@ -162,7 +162,7 @@ config.
 >
 > ```python
 > spancat = nlp.add_pipe("spancat")
-> spancat.initialize(lambda: [], nlp=nlp)
+> spancat.initialize(lambda: examples, nlp=nlp)
 > ```
 >
 > ```ini
```
@ -176,7 +176,7 @@ config.
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. Must contain at least one `Example`. ~~Callable[[], Iterable[Example]]~~ |
|
||||||
| _keyword-only_ | |
|
| _keyword-only_ | |
|
||||||
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
||||||
| `labels` | The label information to add to the component, as provided by the [`label_data`](#label_data) property after initialization. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[Iterable[str]]~~ |
|
| `labels` | The label information to add to the component, as provided by the [`label_data`](#label_data) property after initialization. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[Iterable[str]]~~ |
|
||||||
|
|
|
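The `labels` row describes a fast path (precomputed labels, e.g. from `init labels`) and a slower fallback that scans the data returned by `get_examples`. A minimal sketch of that branching, assuming each example is reduced to a plain sequence of gold tag strings; `resolve_labels` is a hypothetical name, not a spaCy function.

```python
from typing import Callable, Iterable, Optional, Sequence, Set, Tuple


def resolve_labels(
    get_examples: Callable[[], Iterable[Sequence[str]]],
    labels: Optional[Iterable[str]] = None,
) -> Tuple[str, ...]:
    """Hypothetical sketch: prefer precomputed labels, else scan the examples."""
    if labels is not None:
        # Fast path: labels were supplied, so the data is never scanned.
        return tuple(labels)
    # Slow path: iterate over every example to discover the label set.
    found: Set[str] = set()
    for gold_tags in get_examples():
        found.update(tag for tag in gold_tags if tag)
    return tuple(sorted(found))


print(resolve_labels(lambda: [["PRON", "VERB"], ["NOUN"]]))  # ('NOUN', 'PRON', 'VERB')
```

The cost of the fallback grows with the size of the data, which is the slowdown the table warns about.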
@@ -130,10 +130,10 @@ applied to the `Doc` in order. Both [`__call__`](/api/tagger#call) and
 ## Tagger.initialize {#initialize tag="method" new="3"}

 Initialize the component for training. `get_examples` should be a function that
-returns an iterable of [`Example`](/api/example) objects. The data examples are
-used to **initialize the model** of the component and can either be the full
-training data or a representative sample. Initialization includes validating the
-network,
+returns an iterable of [`Example`](/api/example) objects. **At least one example
+should be supplied.** The data examples are used to **initialize the model** of
+the component and can either be the full training data or a representative
+sample. Initialization includes validating the network,
 [inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
 setting up the label scheme based on the data. This method is typically called
 by [`Language.initialize`](/api/language#initialize) and lets you customize
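Since the examples may be either the full training data or a representative sample, one option is to wrap a fixed random sample in a callback. This is a sketch under assumptions: `make_sample_getter` is a hypothetical helper, and the sample size is a placeholder rather than a spaCy recommendation.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def make_sample_getter(examples: Sequence[T], n: int, seed: int = 0) -> Callable[[], List[T]]:
    """Hypothetical helper: a get_examples callback over a fixed random sample."""
    if not examples or n < 1:
        # An empty sample would violate the "at least one example" requirement.
        raise ValueError("need at least one example to sample from")
    rng = random.Random(seed)
    # Draw the sample once so that every call returns the same examples.
    sample = rng.sample(list(examples), min(n, len(examples)))
    # Return a fresh copy per call so callers cannot mutate the stored sample.
    return lambda: list(sample)


get_examples = make_sample_getter(list(range(1000)), n=100, seed=42)
print(len(get_examples()))  # 100
```

Drawing the sample up front keeps initialization deterministic for a given seed, even if the callback is invoked several times.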
@@ -151,7 +151,7 @@ This method was previously called `begin_training`.
 >
 > ```python
 > tagger = nlp.add_pipe("tagger")
-> tagger.initialize(lambda: [], nlp=nlp)
+> tagger.initialize(lambda: examples, nlp=nlp)
 > ```
 >
 > ```ini
@ -165,7 +165,7 @@ This method was previously called `begin_training`.
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. Must contain at least one `Example`. ~~Callable[[], Iterable[Example]]~~ |
|
||||||
| _keyword-only_ | |
|
| _keyword-only_ | |
|
||||||
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
||||||
| `labels` | The label information to add to the component, as provided by the [`label_data`](#label_data) property after initialization. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[Iterable[str]]~~ |
|
| `labels` | The label information to add to the component, as provided by the [`label_data`](#label_data) property after initialization. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[Iterable[str]]~~ |
|
||||||
|
|
|
@@ -176,10 +176,10 @@ applied to the `Doc` in order. Both [`__call__`](/api/textcategorizer#call) and
 ## TextCategorizer.initialize {#initialize tag="method" new="3"}

 Initialize the component for training. `get_examples` should be a function that
-returns an iterable of [`Example`](/api/example) objects. The data examples are
-used to **initialize the model** of the component and can either be the full
-training data or a representative sample. Initialization includes validating the
-network,
+returns an iterable of [`Example`](/api/example) objects. **At least one example
+should be supplied.** The data examples are used to **initialize the model** of
+the component and can either be the full training data or a representative
+sample. Initialization includes validating the network,
 [inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
 setting up the label scheme based on the data. This method is typically called
 by [`Language.initialize`](/api/language#initialize) and lets you customize
@@ -197,7 +197,7 @@ This method was previously called `begin_training`.
 >
 > ```python
 > textcat = nlp.add_pipe("textcat")
-> textcat.initialize(lambda: [], nlp=nlp)
+> textcat.initialize(lambda: examples, nlp=nlp)
 > ```
 >
 > ```ini
@ -212,7 +212,7 @@ This method was previously called `begin_training`.
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
|
| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. Must contain at least one `Example`. ~~Callable[[], Iterable[Example]]~~ |
|
||||||
| _keyword-only_ | |
|
| _keyword-only_ | |
|
||||||
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
| `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |
|
||||||
| `labels` | The label information to add to the component, as provided by the [`label_data`](#label_data) property after initialization. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[Iterable[str]]~~ |
|
| `labels` | The label information to add to the component, as provided by the [`label_data`](#label_data) property after initialization. To generate a reusable JSON file from your data, you should run the [`init labels`](/api/cli#init-labels) command. If no labels are provided, the `get_examples` callback is used to extract the labels from the data, which may be a lot slower. ~~Optional[Iterable[str]]~~ |
|
||||||
|
|
|
@@ -127,10 +127,10 @@ and [`set_annotations`](/api/tok2vec#set_annotations) methods.

 Initialize the component for training and return an
 [`Optimizer`](https://thinc.ai/docs/api-optimizers). `get_examples` should be a
-function that returns an iterable of [`Example`](/api/example) objects. The data
-examples are used to **initialize the model** of the component and can either be
-the full training data or a representative sample. Initialization includes
-validating the network,
+function that returns an iterable of [`Example`](/api/example) objects. **At
+least one example should be supplied.** The data examples are used to
+**initialize the model** of the component and can either be the full training
+data or a representative sample. Initialization includes validating the network,
 [inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
 setting up the label scheme based on the data. This method is typically called
 by [`Language.initialize`](/api/language#initialize).
@@ -139,14 +139,14 @@ by [`Language.initialize`](/api/language#initialize).
 >
 > ```python
 > tok2vec = nlp.add_pipe("tok2vec")
-> tok2vec.initialize(lambda: [], nlp=nlp)
+> tok2vec.initialize(lambda: examples, nlp=nlp)
 > ```

 | Name | Description |
-| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
-| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
+| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. Must contain at least one `Example`. ~~Callable[[], Iterable[Example]]~~ |
 | _keyword-only_ | |
 | `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |

 ## Tok2Vec.predict {#predict tag="method"}

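The updated snippets pass `lambda: examples`, where `examples` is presumably a list. Because `get_examples` is a callable, a component is free to invoke it more than once (for instance, once to validate and once to collect labels). A list can be re-iterated on every call, whereas a single generator captured by the lambda is exhausted after the first pass. The toy initializer below is hypothetical and only illustrates that difference; it is not how `Tok2Vec.initialize` is implemented.

```python
from typing import Callable, Iterable, Tuple


def toy_initialize(get_examples: Callable[[], Iterable[str]]) -> Tuple[int, Tuple[str, ...]]:
    """Hypothetical initializer that calls get_examples() twice."""
    # First pass: count the examples (stands in for validation).
    n_seen = sum(1 for _ in get_examples())
    # Second pass: collect the sorted label set (stands in for label extraction).
    labels = tuple(sorted(set(get_examples())))
    return n_seen, labels


examples = ["A", "B", "A"]
print(toy_initialize(lambda: examples))  # (3, ('A', 'B'))

one_shot = (x for x in examples)
print(toy_initialize(lambda: one_shot))  # (3, ()) because the second pass sees an exhausted generator
```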
@@ -175,10 +175,10 @@ applied to the `Doc` in order. Both [`__call__`](/api/transformer#call) and

 Initialize the component for training and return an
 [`Optimizer`](https://thinc.ai/docs/api-optimizers). `get_examples` should be a
-function that returns an iterable of [`Example`](/api/example) objects. The data
-examples are used to **initialize the model** of the component and can either be
-the full training data or a representative sample. Initialization includes
-validating the network,
+function that returns an iterable of [`Example`](/api/example) objects. **At
+least one example should be supplied.** The data examples are used to
+**initialize the model** of the component and can either be the full training
+data or a representative sample. Initialization includes validating the network,
 [inferring missing shapes](https://thinc.ai/docs/usage-models#validation) and
 setting up the label scheme based on the data. This method is typically called
 by [`Language.initialize`](/api/language#initialize).
@@ -187,14 +187,14 @@ by [`Language.initialize`](/api/language#initialize).
 >
 > ```python
 > trf = nlp.add_pipe("transformer")
-> trf.initialize(lambda: iter([]), nlp=nlp)
+> trf.initialize(lambda: examples, nlp=nlp)
 > ```

 | Name | Description |
-| -------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
-| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. ~~Callable[[], Iterable[Example]]~~ |
+| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `get_examples` | Function that returns gold-standard annotations in the form of [`Example`](/api/example) objects. Must contain at least one `Example`. ~~Callable[[], Iterable[Example]]~~ |
 | _keyword-only_ | |
 | `nlp` | The current `nlp` object. Defaults to `None`. ~~Optional[Language]~~ |

 ## Transformer.predict {#predict tag="method"}
