Tidy up docs

This commit is contained in:
Adriane Boyd 2021-06-28 11:48:11 +02:00
parent 5eeb25f043
commit 4d1ef8f695
18 changed files with 185 additions and 180 deletions

View File

@ -285,8 +285,8 @@ Encode context using bidirectional LSTM layers. Requires
Embed [`Doc`](/api/doc) objects with their vocab's vectors table, applying a
learned linear projection to control the dimensionality. Unknown tokens are
mapped to a zero vector. See the documentation on [static
vectors](/usage/embeddings-transformers#static-vectors) for details.
mapped to a zero vector. See the documentation on
[static vectors](/usage/embeddings-transformers#static-vectors) for details.
| Name |  Description |
| ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
@ -649,8 +649,8 @@ from the linear model, where it is stored in `model.attrs["multi_label"]`.
<Accordion title="spacy.TextCatEnsemble.v1 definition" spaced>
[TextCatEnsemble.v1](/api/legacy#TextCatEnsemble_v1) was functionally similar, but used an internal `tok2vec` instead of
taking it as argument:
[TextCatEnsemble.v1](/api/legacy#TextCatEnsemble_v1) was functionally similar,
but used an internal `tok2vec` instead of taking it as argument:
| Name | Description |
| -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
@ -701,8 +701,9 @@ architecture is usually less accurate than the ensemble, but runs faster.
<Accordion title="spacy.TextCatCNN.v1 definition" spaced>
[TextCatCNN.v1](/api/legacy#TextCatCNN_v1) had the exact same signature, but was not yet resizable.
Since v2, new labels can be added to this component, even after training.
[TextCatCNN.v1](/api/legacy#TextCatCNN_v1) had the exact same signature, but was
not yet resizable. Since v2, new labels can be added to this component, even
after training.
</Accordion>
@ -732,8 +733,9 @@ the others, but may not be as accurate, especially if texts are short.
<Accordion title="spacy.TextCatBOW.v1 definition" spaced>
[TextCatBOW.v1](/api/legacy#TextCatBOW_v1) had the exact same signature, but was not yet resizable.
Since v2, new labels can be added to this component, even after training.
[TextCatBOW.v1](/api/legacy#TextCatBOW_v1) had the exact same signature, but was
not yet resizable. Since v2, new labels can be added to this component, even
after training.
</Accordion>

View File

@ -232,7 +232,7 @@ model. Delegates to [`predict`](/api/dependencyparser#predict) and
> ```
| Name | Description |
| ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| -------------- | ------------------------------------------------------------------------------------------------------------------------ |
| `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
| _keyword-only_ | |
| `drop` | The dropout rate. ~~float~~ |

View File

@ -35,11 +35,11 @@ how the component should be configured. You can override its settings via the
> ```
| Setting | Description |
| --------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| --------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --- | ----------- |
| `phrase_matcher_attr` | Optional attribute name match on for the internal [`PhraseMatcher`](/api/phrasematcher), e.g. `LOWER` to match on the lowercase token text. Defaults to `None`. ~~Optional[Union[int, str]]~~ |
| `validate` | Whether patterns should be validated (passed to the `Matcher` and `PhraseMatcher`). Defaults to `False`. ~~bool~~ |
| `overwrite_ents` | If existing entities are present, e.g. entities added by the model, overwrite them by matches if necessary. Defaults to `False`. ~~bool~~ |
| `ent_id_sep` | Separator used internally for entity IDs. Defaults to `"||"`. ~~str~~ |
| `ent_id_sep` | Separator used internally for entity IDs. Defaults to `" | | "`. ~~str~~ |
```python
%%GITHUB_SPACY/spacy/pipeline/entityruler.py
@ -64,14 +64,14 @@ be a token pattern (list) or a phrase pattern (string). For example:
> ```
| Name | Description |
| --------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| --------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --- | ----------- |
| `nlp` | The shared nlp object to pass the vocab to the matchers and process phrase patterns. ~~Language~~ |
| `name` <Tag variant="new">3</Tag> | Instance name of the current pipeline component. Typically passed in automatically from the factory when the component is added. Used to disable the current entity ruler while creating phrase patterns with the nlp object. ~~str~~ |
| _keyword-only_ | |
| `phrase_matcher_attr` | Optional attribute name match on for the internal [`PhraseMatcher`](/api/phrasematcher), e.g. `LOWER` to match on the lowercase token text. Defaults to `None`. ~~Optional[Union[int, str]]~~ |
| `validate` | Whether patterns should be validated, passed to Matcher and PhraseMatcher as `validate`. Defaults to `False`. ~~bool~~ |
| `overwrite_ents` | If existing entities are present, e.g. entities added by the model, overwrite them by matches if necessary. Defaults to `False`. ~~bool~~ |
| `ent_id_sep` | Separator used internally for entity IDs. Defaults to `"||"`. ~~str~~ |
| `ent_id_sep` | Separator used internally for entity IDs. Defaults to `" | | "`. ~~str~~ |
| `patterns` | Optional patterns to load in on initialization. ~~Optional[List[Dict[str, Union[str, List[dict]]]]]~~ |
## EntityRuler.initialize {#initialize tag="method" new="3"}

View File

@ -245,8 +245,8 @@ certain prior probability.
### Candidate.\_\_init\_\_ {#candidate-init tag="method"}
Construct a `Candidate` object. Usually this constructor is not called directly,
but instead these objects are returned by the
`get_candidates` method of the [`entity_linker`](/api/entitylinker) pipe.
but instead these objects are returned by the `get_candidates` method of the
[`entity_linker`](/api/entitylinker) pipe.
> #### Example
>

View File

@ -178,8 +178,9 @@ added to an existing vectors table. See more details in
### spacy.TextCatCNN.v1 {#TextCatCNN_v1}
Since `spacy.TextCatCNN.v2`, this architecture has become resizable, which means that you can add
labels to a previously trained textcat. `TextCatCNN` v1 did not yet support that.
Since `spacy.TextCatCNN.v2`, this architecture has become resizable, which means
that you can add labels to a previously trained textcat. `TextCatCNN` v1 did not
yet support that.
> #### Example Config
>
@ -213,8 +214,9 @@ architecture is usually less accurate than the ensemble, but runs faster.
### spacy.TextCatBOW.v1 {#TextCatBOW_v1}
Since `spacy.TextCatBOW.v2`, this architecture has become resizable, which means that you can add
labels to a previously trained textcat. `TextCatBOW` v1 did not yet support that.
Since `spacy.TextCatBOW.v2`, this architecture has become resizable, which means
that you can add labels to a previously trained textcat. `TextCatBOW` v1 did not
yet support that.
> #### Example Config
>

View File

@ -121,7 +121,7 @@ Find all token sequences matching the supplied patterns on the `Doc` or `Span`.
> ```
| Name | Description |
| ---------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `doclike` | The `Doc` or `Span` to match over. ~~Union[Doc, Span]~~ |
| _keyword-only_ | |
| `as_spans` <Tag variant="new">3</Tag> | Instead of tuples, return a list of [`Span`](/api/span) objects of the matches, with the `match_id` assigned as the span label. Defaults to `False`. ~~bool~~ |

View File

@ -62,7 +62,7 @@ shortcut for this and instantiate the component using its string name and
> ```
| Name | Description |
| -------------- | -------------------------------------------------------------------------------------------------------------------- |
| ------- | -------------------------------------------------------------------------------------------------------------------- |
| `vocab` | The shared vocabulary. ~~Vocab~~ |
| `model` | The [`Model`](https://thinc.ai/docs/api-model) powering the pipeline component. ~~Model[List[Doc], List[Floats2d]]~~ |
| `name` | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~ |
@ -201,7 +201,7 @@ Delegates to [`predict`](/api/morphologizer#predict) and
> ```
| Name | Description |
| ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| -------------- | ------------------------------------------------------------------------------------------------------------------------ |
| `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
| _keyword-only_ | |
| `drop` | The dropout rate. ~~float~~ |

View File

@ -99,15 +99,15 @@ representation.
> ```
| Name | Description |
| ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| `feats_dict` | The morphological features as a dictionary. ~~Dict[str, str]~~ |
| **RETURNS** | The morphological features in Universal Dependencies [FEATS](https://universaldependencies.org/format.html#morphological-annotation) format. ~~str~~ |
## Attributes {#attributes}
| Name | Description |
| ------------- | ------------------------------------------------------------------------------------------------------------------------------ |
| `FEATURE_SEP` | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) feature separator. Default is `|`. ~~str~~ |
| ------------- | ---------------------------------------------------------------------------------------------------------------------------- | ---------- |
| `FEATURE_SEP` | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) feature separator. Default is ` | `. ~~str~~ |
| `FIELD_SEP` | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) field separator. Default is `=`. ~~str~~ |
| `VALUE_SEP` | The [FEATS](https://universaldependencies.org/format.html#morphological-annotation) value separator. Default is `,`. ~~str~~ |

View File

@ -149,7 +149,7 @@ patterns = [nlp("health care reform"), nlp("healthcare reform")]
</Infobox>
| Name | Description |
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | --- |
| `match_id` | An ID for the thing you're matching. ~~str~~ | |
| `docs` | `Doc` objects of the phrases to match. ~~List[Doc]~~ |
| _keyword-only_ | |

View File

@ -188,7 +188,7 @@ Delegates to [`predict`](/api/sentencerecognizer#predict) and
> ```
| Name | Description |
| ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| -------------- | ------------------------------------------------------------------------------------------------------------------------ |
| `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
| _keyword-only_ | |
| `drop` | The dropout rate. ~~float~~ |

View File

@ -28,7 +28,7 @@ how the component should be configured. You can override its settings via the
> ```
| Setting | Description |
| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | ------ |
| `punct_chars` | Optional custom list of punctuation characters that mark sentence ends. See below for defaults if not set. Defaults to `None`. ~~Optional[List[str]]~~ | `None` |
```python

View File

@ -491,8 +491,8 @@ document by the `parser`, `senter`, `sentencizer` or some custom function. It
will raise an error otherwise.
If the span happens to cross sentence boundaries, only the first sentence will
be returned. If it is required that the sentence always includes the
full span, the result can be adjusted as such:
be returned. If it is required that the sentence always includes the full span,
the result can be adjusted as such:
```python
sent = span.sent

View File

@ -214,7 +214,7 @@ Delegates to [`predict`](/api/spancategorizer#predict) and
> ```
| Name | Description |
| ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| -------------- | ------------------------------------------------------------------------------------------------------------------------ |
| `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
| _keyword-only_ | |
| `drop` | The dropout rate. ~~float~~ |

View File

@ -26,7 +26,7 @@ architectures and their arguments and hyperparameters.
> ```
| Setting | Description |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `model` | A model instance that predicts the tag probabilities. The output vectors should match the number of tags in size, and be normalized as probabilities (all scores between 0 and 1, with the rows summing to `1`). Defaults to [Tagger](/api/architectures#Tagger). ~~Model[List[Doc], List[Floats2d]]~~ |
```python
@ -55,7 +55,7 @@ shortcut for this and instantiate the component using its string name and
[`nlp.add_pipe`](/api/language#add_pipe).
| Name | Description |
| ---------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `vocab` | The shared vocabulary. ~~Vocab~~ |
| `model` | A model instance that predicts the tag probabilities. The output vectors should match the number of tags in size, and be normalized as probabilities (all scores between 0 and 1, with the rows summing to `1`). ~~Model[List[Doc], List[Floats2d]]~~ |
| `name` | String name of the component instance. Used to add entries to the `losses` during training. ~~str~~ |
@ -199,7 +199,7 @@ Delegates to [`predict`](/api/tagger#predict) and
> ```
| Name | Description |
| ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| -------------- | ------------------------------------------------------------------------------------------------------------------------ |
| `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
| _keyword-only_ | |
| `drop` | The dropout rate. ~~float~~ |

View File

@ -197,7 +197,7 @@ Delegates to [`predict`](/api/tok2vec#predict).
> ```
| Name | Description |
| ----------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| -------------- | ------------------------------------------------------------------------------------------------------------------------ |
| `examples` | A batch of [`Example`](/api/example) objects to learn from. ~~Iterable[Example]~~ |
| _keyword-only_ | |
| `drop` | The dropout rate. ~~float~~ |

View File

@ -363,7 +363,7 @@ unknown. Defaults to `True` for the first token in the `Doc`.
> ```
| Name | Description |
| ----------- | --------------------------------------------- |
| ----------- | ------------------------------------------------------- |
| **RETURNS** | Whether the token starts a sentence. ~~Optional[bool]~~ |
## Token.has_vector {#has_vector tag="property" model="vectors"}
@ -421,7 +421,7 @@ The L2 norm of the token's vector representation.
## Attributes {#attributes}
| Name | Description |
| -------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| -------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `doc` | The parent document. ~~Doc~~ |
| `lex` <Tag variant="new">3</Tag> | The underlying lexeme. ~~Lexeme~~ |
| `sent` <Tag variant="new">2.0.12</Tag> | The sentence span that this token is a part of. ~~Span~~ |

View File

@ -239,6 +239,7 @@ it.
| `infix_finditer` | A function to find internal segment separators, e.g. hyphens. Returns a (possibly empty) sequence of `re.MatchObject` objects. ~~Optional[Callable[[str], Iterator[Match]]]~~ |
| `token_match` | A function matching the signature of `re.compile(string).match` to find token matches. Returns an `re.MatchObject` or `None`. ~~Optional[Callable[[str], Optional[Match]]]~~ |
| `rules` | A dictionary of tokenizer exceptions and special cases. ~~Optional[Dict[str, List[Dict[int, str]]]]~~ |
## Serialization fields {#serialization-fields}
During serialization, spaCy will export several data fields used to restore

View File

@ -290,8 +290,8 @@ If a table is full, it can be resized using
## Vectors.n_keys {#n_keys tag="property"}
Get the number of keys in the table. Note that this is the number of _all_ keys,
not just unique vectors. If several keys are mapped to the same
vectors, they will be counted individually.
not just unique vectors. If several keys are mapped to the same vectors, they
will be counted individually.
> #### Example
>
@ -321,7 +321,7 @@ performed in chunks to avoid consuming too much memory. You can set the
> ```
| Name | Description |
| -------------- | --------------------------------------------------------------------------- |
| -------------- | --------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
| `queries` | An array with one or more vectors. ~~numpy.ndarray~~ |
| _keyword-only_ | |
| `batch_size` | The batch size to use. Default to `1024`. ~~int~~ |