mirror of https://github.com/explosion/spaCy.git
synced 2025-01-13 10:46:29 +03:00

Document Assigned Attributes of Pipeline Components (#9041)

* Add textcat docs
* Add NER docs
* Add Entity Linker docs
* Add assigned fields docs for the tagger

  This also adds a preamble, since there wasn't one.

* Add morphologizer docs
* Add dependency parser docs
* Update entityrecognizer docs

  This is a little weird because `Doc.ents` is the only thing assigned to, but
  it's actually a bidirectional property.

* Add token fields for entityrecognizer
* Fix section name
* Add entity ruler docs
* Add lemmatizer docs
* Add sentencizer/recognizer docs
* Update website/docs/api/entityrecognizer.md

  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/entityruler.md

  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/tagger.md

  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/entityruler.md

  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update type for Doc.ents

  This was `Tuple[Span, ...]` everywhere but `Tuple[Span]` seems to be correct.

* Run prettier
* Apply suggestions from code review

  Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Run prettier
* Add transformers section

  This basically just moves and renames the "custom attributes" section from
  the bottom of the page to be consistent with "assigned attributes" on other
  pages.

  I looked at moving the paragraph just above the section into the section, but
  it includes the unrelated registry additions, so it seemed better to leave it
  unchanged.

* Make table header consistent

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

parent f803a84571
commit ba6a37d358
```diff
@@ -555,8 +555,8 @@ consists of either two or three subnetworks:
 
 <Accordion title="spacy.TransitionBasedParser.v1 definition" spaced>
 
-[TransitionBasedParser.v1](/api/legacy#TransitionBasedParser_v1) had the exact same signature,
-but the `use_upper` argument was `True` by default.
+[TransitionBasedParser.v1](/api/legacy#TransitionBasedParser_v1) had the exact
+same signature, but the `use_upper` argument was `True` by default.
 
 </Accordion>
 
```
```diff
@@ -25,6 +25,20 @@ current state. The weights are updated such that the scores assigned to the set
 of optimal actions is increased, while scores assigned to other actions are
 decreased. Note that more than one action may be optimal for a given state.
 
+## Assigned Attributes {#assigned-attributes}
+
+Dependency predictions are assigned to the `Token.dep` and `Token.head` fields.
+Beside the dependencies themselves, the parser decides sentence boundaries,
+which are saved in `Token.is_sent_start` and accessible via `Doc.sents`.
+
+| Location | Value |
+| --------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
+| `Token.dep` | The type of dependency relation (hash). ~~int~~ |
+| `Token.dep_` | The type of dependency relation. ~~str~~ |
+| `Token.head` | The syntactic parent, or "governor", of this token. ~~Token~~ |
+| `Token.is_sent_start` | A boolean value indicating whether the token starts a sentence. After the parser runs this will be `True` or `False` for all tokens. ~~bool~~ |
+| `Doc.sents` | An iterator over sentences in the `Doc`, determined by `Token.is_sent_start` values. ~~Iterator[Span]~~ |
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes
```
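The `Token.head` and `Token.dep_` pair documented in this hunk can be shown without loading a pipeline. The sketch below uses a hypothetical list of `(text, head_index, dep_label)` tuples in place of real `Token` objects. It relies on the spaCy convention that the root token is its own head.

```python
from typing import List, Tuple

# Hypothetical stand-in for parsed tokens: (text, index of head token, dep label).
# Mirrors Token.head / Token.dep_; the root token is its own head.
Parse = List[Tuple[str, int, str]]

def find_root(parse: Parse) -> int:
    """Return the index of the token whose head is itself."""
    for i, (_, head, _) in enumerate(parse):
        if head == i:
            return i
    raise ValueError("no root token found")

def children(parse: Parse, i: int) -> List[int]:
    """Return indices of tokens whose head is token i (excluding i itself)."""
    return [j for j, (_, head, _) in enumerate(parse) if head == i and j != i]

parse: Parse = [
    ("She", 1, "nsubj"),
    ("ate", 1, "ROOT"),
    ("pizza", 1, "dobj"),
]
root = find_root(parse)
print(parse[root][0], [parse[j][0] for j in children(parse, root)])
```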
```diff
@@ -571,9 +571,9 @@ objects, if the entity recognizer has been applied.
 > assert ents[0].text == "Mr. Best"
 > ```
 
 | Name | Description |
-| ----------- | --------------------------------------------------------------------- |
-| **RETURNS** | Entities in the document, one `Span` per entity. ~~Tuple[Span, ...]~~ |
+| ----------- | ---------------------------------------------------------------- |
+| **RETURNS** | Entities in the document, one `Span` per entity. ~~Tuple[Span]~~ |
 
 ## Doc.spans {#spans tag="property"}
 
```
```diff
@@ -16,6 +16,16 @@ plausible candidates from that `KnowledgeBase` given a certain textual mention,
 and a machine learning model to pick the right candidate, given the local
 context of the mention.
 
+## Assigned Attributes {#assigned-attributes}
+
+Predictions, in the form of knowledge base IDs, will be assigned to
+`Token.ent_kb_id_`.
+
+| Location | Value |
+| ------------------ | --------------------------------- |
+| `Token.ent_kb_id` | Knowledge base ID (hash). ~~int~~ |
+| `Token.ent_kb_id_` | Knowledge base ID. ~~str~~ |
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes
```
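Several tables in this commit pair an integer attribute, documented as a hash, with a string attribute whose name ends in an underscore. `Token.ent_kb_id` and `Token.ent_kb_id_` are one example. The toy store below only illustrates that two-way mapping. The `ToyStringStore` class, its use of `hashlib.blake2b`, and the ID `"Q42"` are hypothetical; this is not spaCy's `StringStore` or its hash function.

```python
import hashlib
from typing import Dict

class ToyStringStore:
    """Hypothetical two-way map between strings and 64-bit integer hashes.

    Illustration only: spaCy's StringStore uses its own hash function.
    """

    def __init__(self) -> None:
        self._by_hash: Dict[int, str] = {}

    def add(self, text: str) -> int:
        # Derive a stable 64-bit key from the UTF-8 bytes of the string.
        digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
        key = int.from_bytes(digest, "little")
        self._by_hash[key] = text
        return key

    def __getitem__(self, key: int) -> str:
        return self._by_hash[key]

store = ToyStringStore()
kb_id = store.add("Q42")    # plays the role of Token.ent_kb_id (int)
print(kb_id, store[kb_id])  # store[kb_id] plays the role of Token.ent_kb_id_ (str)
```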
```diff
@@ -20,6 +20,24 @@ your entities will be close to their initial tokens. If your entities are long
 and characterized by tokens in their middle, the component will likely not be a
 good fit for your task.
 
+## Assigned Attributes {#assigned-attributes}
+
+Predictions will be saved to `Doc.ents` as a tuple. Each label will also be
+reflected to each underlying token, where it is saved in the `Token.ent_type`
+and `Token.ent_iob` fields. Note that by definition each token can only have one
+label.
+
+When setting `Doc.ents` to create training data, all the spans must be valid and
+non-overlapping, or an error will be thrown.
+
+| Location | Value |
+| ----------------- | ----------------------------------------------------------------- |
+| `Doc.ents` | The annotated spans. ~~Tuple[Span]~~ |
+| `Token.ent_iob` | An enum encoding of the IOB part of the named entity tag. ~~int~~ |
+| `Token.ent_iob_` | The IOB part of the named entity tag. ~~str~~ |
+| `Token.ent_type` | The label part of the named entity tag (hash). ~~int~~ |
+| `Token.ent_type_` | The label part of the named entity tag. ~~str~~ |
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes
```
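The sketch below makes the `Token.ent_iob` and `Token.ent_type` columns concrete by spreading entity spans over tokens, one label per token. It assumes entities arrive as hypothetical `(start, end, label)` token offsets with an exclusive `end`. The integer codes follow the description of `Token.ent_iob` in spaCy's `Token` API docs (3 = begins an entity, 1 = inside, 2 = outside, 0 = no tag set); check them against the spaCy version you use.

```python
from typing import List, Tuple

# Integer codes as described for Token.ent_iob (verify for your spaCy version).
IOB_CODES = {"": 0, "I": 1, "O": 2, "B": 3}

def spans_to_iob(n_tokens: int, ents: List[Tuple[int, int, str]]):
    """Return per-token (ent_iob_, ent_type_) pairs for non-overlapping spans."""
    iob = ["O"] * n_tokens
    types = [""] * n_tokens
    for start, end, label in ents:
        for i in range(start, end):
            iob[i] = "B" if i == start else "I"
            types[i] = label
    return list(zip(iob, types))

tags = spans_to_iob(4, [(0, 2, "PERSON")])
print(tags)                                 # string view: (ent_iob_, ent_type_)
print([IOB_CODES[iob] for iob, _ in tags])  # integer view: ent_iob
```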
```diff
@@ -15,6 +15,27 @@ used on its own to implement a purely rule-based entity recognition system. For
 usage examples, see the docs on
 [rule-based entity recognition](/usage/rule-based-matching#entityruler).
 
+## Assigned Attributes {#assigned-attributes}
+
+This component assigns predictions basically the same way as the
+[`EntityRecognizer`](/api/entityrecognizer).
+
+Predictions can be accessed under `Doc.ents` as a tuple. Each label will also be
+reflected in each underlying token, where it is saved in the `Token.ent_type`
+and `Token.ent_iob` fields. Note that by definition each token can only have one
+label.
+
+When setting `Doc.ents` to create training data, all the spans must be valid and
+non-overlapping, or an error will be thrown.
+
+| Location | Value |
+| ----------------- | ----------------------------------------------------------------- |
+| `Doc.ents` | The annotated spans. ~~Tuple[Span]~~ |
+| `Token.ent_iob` | An enum encoding of the IOB part of the named entity tag. ~~int~~ |
+| `Token.ent_iob_` | The IOB part of the named entity tag. ~~str~~ |
+| `Token.ent_type` | The label part of the named entity tag (hash). ~~int~~ |
+| `Token.ent_type_` | The label part of the named entity tag. ~~str~~ |
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes
```
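Spans assigned to `Doc.ents` as training data must be valid and non-overlapping. That rule can be checked before the spans reach spaCy, which raises its own error when they do not comply. The sketch below works on hypothetical `(start, end, label)` token offsets with an exclusive `end`. `check_ents` is an illustrative helper, not part of spaCy.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # hypothetical (start, end, label), end exclusive

def check_ents(ents: List[Span], n_tokens: int) -> None:
    """Raise ValueError if any span is empty, out of range, or overlaps another."""
    prev_end = 0
    for start, end, label in sorted(ents):
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"invalid span {(start, end, label)}")
        if start < prev_end:
            raise ValueError(f"span {(start, end, label)} overlaps a previous span")
        prev_end = end

check_ents([(0, 2, "PERSON"), (3, 4, "GPE")], n_tokens=5)  # passes silently
try:
    check_ents([(0, 2, "PERSON"), (1, 3, "ORG")], n_tokens=5)
except ValueError as err:
    print(err)
```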
```diff
@@ -105,7 +105,8 @@ and residual connections.
 
 ### spacy.TransitionBasedParser.v1 {#TransitionBasedParser_v1}
 
-Identical to [`spacy.TransitionBasedParser.v2`](/api/architectures#TransitionBasedParser)
+Identical to
+[`spacy.TransitionBasedParser.v2`](/api/architectures#TransitionBasedParser)
 except the `use_upper` was set to `True` by default.
 
 ### spacy.TextCatEnsemble.v1 {#TextCatEnsemble_v1}
```
```diff
@@ -31,6 +31,15 @@ available in the pipeline and runs _before_ the lemmatizer.
 
 </Infobox>
 
+## Assigned Attributes {#assigned-attributes}
+
+Lemmas generated by rules or predicted will be saved to `Token.lemma`.
+
+| Location | Value |
+| -------------- | ------------------------- |
+| `Token.lemma` | The lemma (hash). ~~int~~ |
+| `Token.lemma_` | The lemma. ~~str~~ |
+
 ## Config and implementation
 
 The default config is defined by the pipeline component factory and describes
```
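For rule-generated lemmas, the simplest strategy is a lookup table from word forms to lemmas. The sketch below assumes a hypothetical in-memory table and falls back to the lowercased form for unknown words. spaCy's `Lemmatizer` has its own modes and lookup tables, so treat this only as an illustration of the kind of value that ends up in `Token.lemma_`.

```python
from typing import Dict, List

# Hypothetical lookup table; real tables ship with language data.
LOOKUP: Dict[str, str] = {"was": "be", "mice": "mouse", "running": "run"}

def lookup_lemmas(words: List[str]) -> List[str]:
    """Return one lemma per word, falling back to the lowercased form."""
    return [LOOKUP.get(w.lower(), w.lower()) for w in words]

print(lookup_lemmas(["The", "mice", "were", "running"]))
```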
```diff
@@ -15,6 +15,16 @@ coarse-grained POS tags following the Universal Dependencies
 [FEATS](https://universaldependencies.org/format.html#morphological-annotation)
 annotation guidelines.
 
+## Assigned Attributes {#assigned-attributes}
+
+Predictions are saved to `Token.morph` and `Token.pos`.
+
+| Location | Value |
+| ------------- | ----------------------------------------- |
+| `Token.pos` | The UPOS part of speech (hash). ~~int~~ |
+| `Token.pos_` | The UPOS part of speech. ~~str~~ |
+| `Token.morph` | Morphological features. ~~MorphAnalysis~~ |
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes
```
```diff
@@ -105,11 +105,11 @@ representation.
 
 ## Attributes {#attributes}
 
 | Name | Description |
-| ------------- | ---------------------------------------------------------------------------------------------------------------------------- | ---------- |
-| `FEATURE_SEP` | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) feature separator. Default is ` | `. ~~str~~ |
+| ------------- | ------------------------------------------------------------------------------------------------------------------------------ |
+| `FEATURE_SEP` | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) feature separator. Default is `|`. ~~str~~ |
 | `FIELD_SEP` | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) field separator. Default is `=`. ~~str~~ |
 | `VALUE_SEP` | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) value separator. Default is `,`. ~~str~~ |
 
 ## MorphAnalysis {#morphanalysis tag="class" source="spacy/tokens/morphanalysis.pyx"}
 
```
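The three separators in this table combine as in the UD FEATS format: `|` between features, `=` between a field and its value, and `,` between multiple values. The sketch below uses a hypothetical `parse_feats` helper; spaCy's `MorphAnalysis` does its own parsing.

```python
from typing import Dict, List

FEATURE_SEP = "|"
FIELD_SEP = "="
VALUE_SEP = ","

def parse_feats(feats: str) -> Dict[str, List[str]]:
    """Parse a FEATS string such as 'Case=Nom|Number=Sing' into a dict."""
    parsed: Dict[str, List[str]] = {}
    if not feats or feats == "_":  # "_" marks an empty FEATS column in CoNLL-U
        return parsed
    for feature in feats.split(FEATURE_SEP):
        field, values = feature.split(FIELD_SEP, 1)
        parsed[field] = values.split(VALUE_SEP)
    return parsed

print(parse_feats("Case=Nom|Number=Sing|PronType=Int,Rel"))
```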
```diff
@@ -149,8 +149,8 @@ patterns = [nlp("health care reform"), nlp("healthcare reform")]
 </Infobox>
 
 | Name | Description |
-| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | --- |
+| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `match_id` | An ID for the thing you're matching. ~~str~~ | |
 | `docs` | `Doc` objects of the phrases to match. ~~List[Doc]~~ |
 | _keyword-only_ | |
 | `on_match` | Callback function to act on matches. Takes the arguments `matcher`, `doc`, `i` and `matches`. ~~Optional[Callable[[Matcher, Doc, int, List[tuple], Any]]~~ |
```
```diff
@@ -80,7 +80,7 @@ Docs with `has_unknown_spaces` are skipped during scoring.
 > ```
 
 | Name | Description |
-| ----------- | ------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
+| ----------- | ------------------------------------------------------------------------------------------------------------------- |
 | `examples` | The `Example` objects holding both the predictions and the correct gold-standard annotations. ~~Iterable[Example]~~ |
 | **RETURNS** | `Dict` | A dictionary containing the scores `token_acc`, `token_p`, `token_r`, `token_f`. ~~Dict[str, float]]~~ |
 
```
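The `token_p`, `token_r` and `token_f` scores named in this table are precision, recall and F-score over tokens. The sketch below assumes tokens are compared as `(start_char, end_char)` offset pairs. It does not reproduce `token_acc` or spaCy's handling of docs with `has_unknown_spaces`.

```python
from typing import Set, Tuple

Offsets = Set[Tuple[int, int]]  # (start_char, end_char) per token

def prf(pred: Offsets, gold: Offsets) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted token offsets against gold ones."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Toy text "The cat sat": gold has three tokens; the prediction merges "cat sat".
gold = {(0, 3), (4, 7), (8, 11)}
pred = {(0, 3), (4, 11)}
print(prf(pred, gold))
```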
```diff
@@ -12,6 +12,16 @@ api_trainable: true
 A trainable pipeline component for sentence segmentation. For a simpler,
 rule-based strategy, see the [`Sentencizer`](/api/sentencizer).
 
+## Assigned Attributes {#assigned-attributes}
+
+Predicted values will be assigned to `Token.is_sent_start`. The resulting
+sentences can be accessed using `Doc.sents`.
+
+| Location | Value |
+| --------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
+| `Token.is_sent_start` | A boolean value indicating whether the token starts a sentence. This will be either `True` or `False` for all tokens. ~~bool~~ |
+| `Doc.sents` | An iterator over sentences in the `Doc`, determined by `Token.is_sent_start` values. ~~Iterator[Span]~~ |
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes
```
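`Doc.sents` is described as being determined by the `Token.is_sent_start` values, so the grouping step can be sketched in plain Python. The sketch assumes a hypothetical list of words plus a parallel list of booleans standing in for `Token.is_sent_start`. The first token always opens a sentence, whatever its flag says.

```python
from typing import List

def group_sentences(words: List[str], is_sent_start: List[bool]) -> List[List[str]]:
    """Split words into sentences, starting a new one wherever the flag is True."""
    sents: List[List[str]] = []
    for i, (word, start) in enumerate(zip(words, is_sent_start)):
        if i == 0 or start:
            sents.append([])
        sents[-1].append(word)
    return sents

words = ["Hi", ".", "How", "are", "you", "?"]
flags = [True, False, True, False, False, False]
print(group_sentences(words, flags))
```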
```diff
@@ -13,6 +13,16 @@ performed by the [`DependencyParser`](/api/dependencyparser), so the
 `Sentencizer` lets you implement a simpler, rule-based strategy that doesn't
 require a statistical model to be loaded.
 
+## Assigned Attributes {#assigned-attributes}
+
+Calculated values will be assigned to `Token.is_sent_start`. The resulting
+sentences can be accessed using `Doc.sents`.
+
+| Location | Value |
+| --------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
+| `Token.is_sent_start` | A boolean value indicating whether the token starts a sentence. This will be either `True` or `False` for all tokens. ~~bool~~ |
+| `Doc.sents` | An iterator over sentences in the `Doc`, determined by `Token.is_sent_start` values. ~~Iterator[Span]~~ |
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes
```
```diff
@@ -28,7 +38,7 @@ how the component should be configured. You can override its settings via the
 > ```
 
 | Setting | Description |
-| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | ------ |
+| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
 | `punct_chars` | Optional custom list of punctuation characters that mark sentence ends. See below for defaults if not set. Defaults to `None`. ~~Optional[List[str]]~~ | `None` |
 
 ```python
```
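The `punct_chars` setting points to the basic rule: a token that follows sentence-final punctuation starts a new sentence. The sketch below is simplified and assumes a hypothetical three-character default set rather than the component's much longer default list. A run of consecutive punctuation marks such as `!!` stays in the same sentence. The real `Sentencizer` may treat other cases differently, so this only approximates its output.

```python
from typing import List, Optional, Sequence

DEFAULT_PUNCT = (".", "!", "?")  # hypothetical subset, for illustration only

def sentence_starts(words: Sequence[str],
                    punct_chars: Optional[Sequence[str]] = None) -> List[bool]:
    """Return is_sent_start flags: the first non-punctuation token after a
    run of punctuation starts a new sentence."""
    punct = set(punct_chars) if punct_chars is not None else set(DEFAULT_PUNCT)
    flags: List[bool] = []
    seen_punct = False
    for i, word in enumerate(words):
        if i == 0:
            flags.append(True)
        elif seen_punct and word not in punct:
            flags.append(True)
            seen_punct = False
        else:
            flags.append(False)
        if word in punct:
            seen_punct = True
    return flags

print(sentence_starts(["Hi", "!", "!", "Bye", "."]))
print(sentence_starts(["a", ";", "b"], punct_chars=[";"]))
```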
```diff
@@ -8,6 +8,21 @@ api_string_name: tagger
 api_trainable: true
 ---
 
+A trainable pipeline component to predict part-of-speech tags for any
+part-of-speech tag set.
+
+In the pre-trained pipelines, the tag schemas vary by language; see the
+[individual model pages](/models) for details.
+
+## Assigned Attributes {#assigned-attributes}
+
+Predictions are assigned to `Token.tag`.
+
+| Location | Value |
+| ------------ | ---------------------------------- |
+| `Token.tag` | The part of speech (hash). ~~int~~ |
+| `Token.tag_` | The part of speech. ~~str~~ |
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes
```
```diff
@@ -29,6 +29,22 @@ only.
 
 </Infobox>
 
+## Assigned Attributes {#assigned-attributes}
+
+Predictions will be saved to `doc.cats` as a dictionary, where the key is the
+name of the category and the value is a score between 0 and 1 (inclusive). For
+`textcat` (exclusive categories), the scores will sum to 1, while for
+`textcat_multilabel` there is no particular guarantee about their sum.
+
+Note that when assigning values to create training data, the score of each
+category must be 0 or 1. Using other values, for example to create a document
+that is a little bit in category A and a little bit in category B, is not
+supported.
+
+| Location | Value |
+| ---------- | ------------------------------------- |
+| `Doc.cats` | Category scores. ~~Dict[str, float]~~ |
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes
```
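The two `doc.cats` guarantees can be illustrated with the usual output functions. A softmax over exclusive categories sums to 1, while independent sigmoids for multilabel categories need not. The choice of these functions and the logits are assumptions for illustration, not a description of spaCy's models. The `check_training_cats` helper is also hypothetical; it enforces the 0-or-1 rule for training data.

```python
import math
from typing import Dict

def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Exclusive categories: scores are normalised to sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

def sigmoid(logits: Dict[str, float]) -> Dict[str, float]:
    """Multilabel categories: each score is independent, in (0, 1)."""
    return {k: 1.0 / (1.0 + math.exp(-v)) for k, v in logits.items()}

def check_training_cats(cats: Dict[str, float]) -> None:
    """Reject gold-standard category scores other than 0 or 1."""
    for label, score in cats.items():
        if score not in (0, 1):
            raise ValueError(f"{label}: training scores must be 0 or 1, got {score}")

logits = {"POSITIVE": 2.0, "NEGATIVE": 0.5}  # hypothetical model outputs
print(sum(softmax(logits).values()))  # sums to 1 (up to float rounding)
print(sum(sigmoid(logits).values()))  # independent scores, not constrained to 1
check_training_cats({"POSITIVE": 1.0, "NEGATIVE": 0.0})
```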
```diff
@@ -38,12 +38,21 @@ attributes. We also calculate an alignment between the word-piece tokens and the
 spaCy tokenization, so that we can use the last hidden states to set the
 `Doc.tensor` attribute. When multiple word-piece tokens align to the same spaCy
 token, the spaCy token receives the sum of their values. To access the values,
-you can use the custom [`Doc._.trf_data`](#custom-attributes) attribute. The
+you can use the custom [`Doc._.trf_data`](#assigned-attributes) attribute. The
 package also adds the function registries [`@span_getters`](#span_getters) and
 [`@annotation_setters`](#annotation_setters) with several built-in registered
 functions. For more details, see the
 [usage documentation](/usage/embeddings-transformers).
 
+## Assigned Attributes {#assigned-attributes}
+
+The component sets the following
+[custom extension attribute](/usage/processing-pipeline#custom-components-attributes):
+
+| Location | Value |
+| ---------------- | ------------------------------------------------------------------------ |
+| `Doc._.trf_data` | Transformer tokens and outputs for the `Doc` object. ~~TransformerData~~ |
+
 ## Config and implementation {#config}
 
 The default config is defined by the pipeline component factory and describes
```
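The alignment rule described here gives each spaCy token the sum of the values of all word-piece tokens aligned to it. It can be sketched with plain lists. The alignment format (one list of word-piece indices per spaCy token) and the toy vectors are assumptions for illustration; spacy-transformers represents alignments with its own data structures. A token with no aligned pieces gets a zero vector in this sketch.

```python
from typing import List

Vector = List[float]

def sum_aligned(wordpiece_vecs: List[Vector], alignment: List[List[int]]) -> List[Vector]:
    """For each spaCy token, sum the vectors of the word pieces aligned to it."""
    width = len(wordpiece_vecs[0])
    token_vecs: List[Vector] = []
    for piece_ids in alignment:
        total = [0.0] * width
        for pid in piece_ids:
            total = [a + b for a, b in zip(total, wordpiece_vecs[pid])]
        token_vecs.append(total)
    return token_vecs

# Hypothetical: "playing" split into "play" + "##ing"; "chess" kept whole.
pieces = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
alignment = [[0, 1], [2]]
print(sum_aligned(pieces, alignment))
```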
```diff
@@ -98,7 +107,7 @@ https://github.com/explosion/spacy-transformers/blob/master/spacy_transformers/p
 Construct a `Transformer` component. One or more subsequent spaCy components can
 use the transformer outputs as features in its model, with gradients
 backpropagated to the single shared weights. The activations from the
-transformer are saved in the [`Doc._.trf_data`](#custom-attributes) extension
+transformer are saved in the [`Doc._.trf_data`](#assigned-attributes) extension
 attribute. You can also provide a callback to set additional annotations. In
 your application, you would normally use a shortcut for this and instantiate the
 component using its string name and [`nlp.add_pipe`](/api/language#create_pipe).
```
```diff
@@ -205,7 +214,7 @@ modifying them.
 
 Assign the extracted features to the `Doc` objects. By default, the
 [`TransformerData`](/api/transformer#transformerdata) object is written to the
-[`Doc._.trf_data`](#custom-attributes) attribute. Your `set_extra_annotations`
+[`Doc._.trf_data`](#assigned-attributes) attribute. Your `set_extra_annotations`
 callback is then called, if provided.
 
 > #### Example
```
@ -383,7 +392,7 @@ are wrapped into the
|
||||||
[FullTransformerBatch](/api/transformer#fulltransformerbatch) object. The
|
[FullTransformerBatch](/api/transformer#fulltransformerbatch) object. The
|
||||||
`FullTransformerBatch` then splits out the per-document data, which is handled
|
`FullTransformerBatch` then splits out the per-document data, which is handled
|
||||||
by this class. Instances of this class are typically assigned to the
|
by this class. Instances of this class are typically assigned to the
|
||||||
[`Doc._.trf_data`](/api/transformer#custom-attributes) extension attribute.
|
[`Doc._.trf_data`](/api/transformer#assigned-attributes) extension attribute.
|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| --------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
|
```diff
@@ -549,12 +558,3 @@ The following built-in functions are available:
 | Name | Description |
 | ---------------------------------------------- | ------------------------------------- |
 | `spacy-transformers.null_annotation_setter.v1` | Don't set any additional annotations. |
-
-## Custom attributes {#custom-attributes}
-
-The component sets the following
-[custom extension attributes](/usage/processing-pipeline#custom-components-attributes):
-
-| Name | Description |
-| ---------------- | ------------------------------------------------------------------------ |
-| `Doc._.trf_data` | Transformer tokens and outputs for the `Doc` object. ~~TransformerData~~ |
```
```diff
@@ -321,7 +321,7 @@ performed in chunks to avoid consuming too much memory. You can set the
 > ```
 
 | Name | Description |
-| -------------- | --------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
+| -------------- | --------------------------------------------------------------------------- |
 | `queries` | An array with one or more vectors. ~~numpy.ndarray~~ |
 | _keyword-only_ | |
 | `batch_size` | The batch size to use. Default to `1024`. ~~int~~ |
```
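The `batch_size` parameter exists because the similarity search runs in chunks to limit memory use. The sketch below shows that idea in pure Python, assuming cosine similarity and returning only the single best row per query. The function name `most_similar_batched` and its return shape are hypothetical; spaCy's `Vectors.most_similar` operates on arrays and has different inputs and outputs.

```python
import math
from typing import List, Tuple

Vector = List[float]

def _normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def most_similar_batched(queries: List[Vector], table: List[Vector],
                         batch_size: int = 1024) -> List[Tuple[int, float]]:
    """Return (best_row, cosine_score) per query, scoring queries in chunks."""
    rows = [_normalize(r) for r in table]
    results: List[Tuple[int, float]] = []
    for start in range(0, len(queries), batch_size):
        # Only one chunk of queries is scored at a time.
        for q in queries[start:start + batch_size]:
            qn = _normalize(q)
            scores = [sum(a * b for a, b in zip(qn, r)) for r in rows]
            best = max(range(len(scores)), key=scores.__getitem__)
            results.append((best, scores[best]))
    return results

table = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(most_similar_batched([[0.9, 0.1], [0.1, 0.9]], table, batch_size=1))
```

In this pure-Python version, the chunking only shows the control flow. The memory saving matters when each chunk is scored as a single matrix product.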
```diff
@@ -21,14 +21,14 @@ Create the vocabulary.
 > vocab = Vocab(strings=["hello", "world"])
 > ```
 
 | Name | Description |
-| ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `lex_attr_getters` | A dictionary mapping attribute IDs to functions to compute them. Defaults to `None`. ~~Optional[Dict[str, Callable[[str], Any]]]~~ |
 | `strings` | A [`StringStore`](/api/stringstore) that maps strings to hash values, and vice versa, or a list of strings. ~~Union[List[str], StringStore]~~ |
 | `lookups` | A [`Lookups`](/api/lookups) that stores the `lexeme_norm` and other large lookup tables. Defaults to `None`. ~~Optional[Lookups]~~ |
 | `oov_prob` | The default OOV probability. Defaults to `-20.0`. ~~float~~ |
 | `vectors_name` <Tag variant="new">2.2</Tag> | A name to identify the vectors table. ~~str~~ |
 | `writing_system` | A dictionary describing the language's writing system. Typically provided by [`Language.Defaults`](/api/language#defaults). ~~Dict[str, Any]~~ |
 | `get_noun_chunks` | A function that yields base noun phrases used for [`Doc.noun_chunks`](/api/doc#noun_chunks). ~~Optional[Callable[[Union[Doc, Span], Iterator[Span]]]]~~ |
 
 ## Vocab.\_\_len\_\_ {#len tag="method"}
```