diff --git a/website/docs/api/cli.md b/website/docs/api/cli.md
index dd396d0b3..cbd1f794a 100644
--- a/website/docs/api/cli.md
+++ b/website/docs/api/cli.md
@@ -1335,7 +1335,7 @@ $ python -m spacy project run [subcommand] [project_dir] [--force] [--dry]
| `subcommand` | Name of the command or workflow to run. ~~str (positional)~~ |
| `project_dir` | Path to project directory. Defaults to current working directory. ~~Path (positional)~~ |
| `--force`, `-F` | Force re-running steps, even if nothing changed. ~~bool (flag)~~ |
-| `--dry`, `-D` | Perform a dry run and don't execute scripts. ~~bool (flag)~~ |
+| `--dry`, `-D` | Perform a dry run and don't execute scripts. ~~bool (flag)~~ |
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **EXECUTES** | The command defined in the `project.yml`. |
@@ -1453,12 +1453,12 @@ For more examples, see the templates in our
-| Name | Description |
-| -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `project_dir` | Path to project directory. Defaults to current working directory. ~~Path (positional)~~ |
-| `--output`, `-o` | Path to output file or `-` for stdout (default). If a file is specified and it already exists and contains auto-generated docs, only the auto-generated docs section is replaced. ~~Path (positional)~~ |
-| `--no-emoji`, `-NE` | Don't use emoji in the titles. ~~bool (flag)~~ |
-| **CREATES** | The Markdown-formatted project documentation. |
+| Name | Description |
+| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `project_dir` | Path to project directory. Defaults to current working directory. ~~Path (positional)~~ |
+| `--output`, `-o` | Path to output file or `-` for stdout (default). If a file is specified and it already exists and contains auto-generated docs, only the auto-generated docs section is replaced. ~~Path (positional)~~ |
+| `--no-emoji`, `-NE` | Don't use emoji in the titles. ~~bool (flag)~~ |
+| **CREATES** | The Markdown-formatted project documentation. |
### project dvc {#project-dvc tag="command"}
@@ -1497,7 +1497,7 @@ $ python -m spacy project dvc [project_dir] [workflow] [--force] [--verbose]
| `project_dir` | Path to project directory. Defaults to current working directory. ~~Path (positional)~~ |
| `workflow` | Name of workflow defined in `project.yml`. Defaults to first workflow if not set. ~~Optional[str] \(option)~~ |
| `--force`, `-F` | Force-updating config file. ~~bool (flag)~~ |
-| `--verbose`, `-V` | Print more output generated by DVC. ~~bool (flag)~~ |
+| `--verbose`, `-V` | Print more output generated by DVC. ~~bool (flag)~~ |
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES** | A `dvc.yaml` file in the project directory, based on the steps defined in the given workflow. |
@@ -1588,5 +1588,5 @@ $ python -m spacy huggingface-hub push [whl_path] [--org] [--msg] [--local-repo]
| `--org`, `-o` | Optional name of organization to which the pipeline should be uploaded. ~~str (option)~~ |
| `--msg`, `-m` | Commit message to use for update. Defaults to `"Update spaCy pipeline"`. ~~str (option)~~ |
| `--local-repo`, `-l` | Local path to the model repository (will be created if it doesn't exist). Defaults to `hub` in the current working directory. ~~Path (option)~~ |
-| `--verbose`, `-V` | Output additional info for debugging, e.g. the full generated hub metadata. ~~bool (flag)~~ |
+| `--verbose`, `-V` | Output additional info for debugging, e.g. the full generated hub metadata. ~~bool (flag)~~ |
| **UPLOADS** | The pipeline to the hub. |
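+
+The same upload can also be run from Python, since the
+[`spacy-huggingface-hub`](https://github.com/explosion/spacy-huggingface-hub)
+package exposes a `push` helper. A minimal sketch, assuming a packaged pipeline
+wheel already exists at the given (hypothetical) path:
+
+```python
+from spacy_huggingface_hub import push
+
+# Wheel built beforehand, e.g. with `spacy package ... --build wheel`
+# (the path below is hypothetical)
+result = push("./en_pipeline-0.0.0-py3-none-any.whl")
+print(result["url"])
+```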
diff --git a/website/docs/api/corpus.md b/website/docs/api/corpus.md
index 35afc8fea..88c4befd7 100644
--- a/website/docs/api/corpus.md
+++ b/website/docs/api/corpus.md
@@ -37,13 +37,13 @@ streaming.
> augmenter = null
> ```
-| Name | Description |
-| --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `path` | The directory or filename to read from. Expects data in spaCy's binary [`.spacy` format](/api/data-formats#binary-training). ~~Path~~ |
-| `gold_preproc` | Whether to set up the Example object with gold-standard sentences and tokens for the predictions. See [`Corpus`](/api/corpus#init) for details. ~~bool~~ |
-| `max_length` | Maximum document length. Longer documents will be split into sentences, if sentence boundaries are available. Defaults to `0` for no limit. ~~int~~ |
-| `limit` | Limit corpus to a subset of examples, e.g. for debugging. Defaults to `0` for no limit. ~~int~~ |
-| `augmenter` | Apply some simply data augmentation, where we replace tokens with variations. This is especially useful for punctuation and case replacement, to help generalize beyond corpora that don't have smart-quotes, or only have smart quotes, etc. Defaults to `None`. ~~Optional[Callable]~~ |
+| Name | Description |
+| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `path` | The directory or filename to read from. Expects data in spaCy's binary [`.spacy` format](/api/data-formats#binary-training). ~~Path~~ |
+| `gold_preproc` | Whether to set up the Example object with gold-standard sentences and tokens for the predictions. See [`Corpus`](/api/corpus#init) for details. ~~bool~~ |
+| `max_length` | Maximum document length. Longer documents will be split into sentences, if sentence boundaries are available. Defaults to `0` for no limit. ~~int~~ |
+| `limit` | Limit corpus to a subset of examples, e.g. for debugging. Defaults to `0` for no limit. ~~int~~ |
+| `augmenter`    | Apply some simple data augmentation, where we replace tokens with variations. This is especially useful for punctuation and case replacement, to help generalize beyond corpora that don't have smart quotes, or only have smart quotes, etc. Defaults to `None`. ~~Optional[Callable]~~                   |
```python
%%GITHUB_SPACY/spacy/training/corpus.py
@@ -71,15 +71,15 @@ train/test skew.
> corpus = Corpus("./data", limit=10)
> ```
-| Name | Description |
-| --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `path` | The directory or filename to read from. ~~Union[str, Path]~~ |
-| _keyword-only_ | |
-| `gold_preproc` | Whether to set up the Example object with gold-standard sentences and tokens for the predictions. Defaults to `False`. ~~bool~~ |
-| `max_length` | Maximum document length. Longer documents will be split into sentences, if sentence boundaries are available. Defaults to `0` for no limit. ~~int~~ |
-| `limit` | Limit corpus to a subset of examples, e.g. for debugging. Defaults to `0` for no limit. ~~int~~ |
-| `augmenter` | Optional data augmentation callback. ~~Callable[[Language, Example], Iterable[Example]]~~ |
-| `shuffle` | Whether to shuffle the examples. Defaults to `False`. ~~bool~~ |
+| Name | Description |
+| -------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `path` | The directory or filename to read from. ~~Union[str, Path]~~ |
+| _keyword-only_ | |
+| `gold_preproc` | Whether to set up the Example object with gold-standard sentences and tokens for the predictions. Defaults to `False`. ~~bool~~ |
+| `max_length` | Maximum document length. Longer documents will be split into sentences, if sentence boundaries are available. Defaults to `0` for no limit. ~~int~~ |
+| `limit` | Limit corpus to a subset of examples, e.g. for debugging. Defaults to `0` for no limit. ~~int~~ |
+| `augmenter` | Optional data augmentation callback. ~~Callable[[Language, Example], Iterable[Example]]~~ |
+| `shuffle` | Whether to shuffle the examples. Defaults to `False`. ~~bool~~ |
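+
+As a minimal usage sketch (the path is hypothetical), the reader is created
+once and then called with an `nlp` object to yield a stream of
+[`Example`](/api/example) objects:
+
+```python
+import spacy
+from spacy.training import Corpus
+
+nlp = spacy.blank("en")
+# Data in spaCy's binary .spacy format
+corpus = Corpus("./train.spacy", limit=10, shuffle=True)
+for example in corpus(nlp):
+    print(example.reference)
+```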
## Corpus.\_\_call\_\_ {#call tag="method"}
diff --git a/website/docs/api/language.md b/website/docs/api/language.md
index 8d7686243..9a413efaf 100644
--- a/website/docs/api/language.md
+++ b/website/docs/api/language.md
@@ -1123,7 +1123,7 @@ instance and factory instance.
| `factory` | The name of the registered component factory. ~~str~~ |
| `default_config` | The default config, describing the default values of the factory arguments. ~~Dict[str, Any]~~ |
| `assigns` | `Doc` or `Token` attributes assigned by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~ |
-| `requires` | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~ |
-| `retokenizes` | Whether the component changes tokenization. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~bool~~ |
+| `requires` | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~ |
+| `retokenizes` | Whether the component changes tokenization. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~bool~~ |
| `default_score_weights` | The scores to report during training, and their default weight towards the final score used to select the best model. Weights should sum to `1.0` per component and will be combined and normalized for the whole pipeline. If a weight is set to `None`, the score will not be logged or weighted. ~~Dict[str, Optional[float]]~~ |
| `scores` | All scores set by the components if it's trainable, e.g. `["ents_f", "ents_r", "ents_p"]`. Based on the `default_score_weights` and used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~ |
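+
+These values are typically set through the
+[`@Language.factory`](/api/language#factory) decorator. A minimal sketch, using
+a hypothetical component name and config:
+
+```python
+from spacy.language import Language
+
+@Language.factory(
+    "my_component",  # hypothetical name
+    default_config={"mode": "fast"},
+    assigns=["token.ent_id"],
+    retokenizes=False,
+    default_score_weights={"my_score": 1.0},
+)
+def create_my_component(nlp, name, mode: str):
+    def my_component(doc):
+        return doc  # no-op, for illustration only
+    return my_component
+```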
diff --git a/website/docs/api/matcher.md b/website/docs/api/matcher.md
index 273c202ca..6c8cae211 100644
--- a/website/docs/api/matcher.md
+++ b/website/docs/api/matcher.md
@@ -30,26 +30,26 @@ pattern keys correspond to a number of
[`Token` attributes](/api/token#attributes). The supported attributes for
rule-based matching are:
-| Attribute | Description |
-| ----------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
-| `ORTH` | The exact verbatim text of a token. ~~str~~ |
-| `TEXT` 2.1 | The exact verbatim text of a token. ~~str~~ |
-| `NORM` | The normalized form of the token text. ~~str~~ |
-| `LOWER` | The lowercase form of the token text. ~~str~~ |
-| `LENGTH` | The length of the token text. ~~int~~ |
-| `IS_ALPHA`, `IS_ASCII`, `IS_DIGIT` | Token text consists of alphabetic characters, ASCII characters, digits. ~~bool~~ |
-| `IS_LOWER`, `IS_UPPER`, `IS_TITLE` | Token text is in lowercase, uppercase, titlecase. ~~bool~~ |
-| `IS_PUNCT`, `IS_SPACE`, `IS_STOP` | Token is punctuation, whitespace, stop word. ~~bool~~ |
-| `IS_SENT_START` | Token is start of sentence. ~~bool~~ |
-| `LIKE_NUM`, `LIKE_URL`, `LIKE_EMAIL` | Token text resembles a number, URL, email. ~~bool~~ |
-| `SPACY` | Token has a trailing space. ~~bool~~ |
-| `POS`, `TAG`, `MORPH`, `DEP`, `LEMMA`, `SHAPE` | The token's simple and extended part-of-speech tag, morphological analysis, dependency label, lemma, shape. ~~str~~ |
-| `ENT_TYPE` | The token's entity label. ~~str~~ |
-| `ENT_IOB` | The IOB part of the token's entity tag. ~~str~~ |
-| `ENT_ID` | The token's entity ID (`ent_id`). ~~str~~ |
-| `ENT_KB_ID` | The token's entity knowledge base ID (`ent_kb_id`). ~~str~~ |
-| `_` 2.1 | Properties in [custom extension attributes](/usage/processing-pipelines#custom-components-attributes). ~~Dict[str, Any]~~ |
-| `OP` | Operator or quantifier to determine how often to match a token pattern. ~~str~~ |
+| Attribute | Description |
+| ---------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
+| `ORTH` | The exact verbatim text of a token. ~~str~~ |
+| `TEXT` 2.1 | The exact verbatim text of a token. ~~str~~ |
+| `NORM` | The normalized form of the token text. ~~str~~ |
+| `LOWER` | The lowercase form of the token text. ~~str~~ |
+| `LENGTH` | The length of the token text. ~~int~~ |
+| `IS_ALPHA`, `IS_ASCII`, `IS_DIGIT` | Token text consists of alphabetic characters, ASCII characters, digits. ~~bool~~ |
+| `IS_LOWER`, `IS_UPPER`, `IS_TITLE` | Token text is in lowercase, uppercase, titlecase. ~~bool~~ |
+| `IS_PUNCT`, `IS_SPACE`, `IS_STOP` | Token is punctuation, whitespace, stop word. ~~bool~~ |
+| `IS_SENT_START` | Token is start of sentence. ~~bool~~ |
+| `LIKE_NUM`, `LIKE_URL`, `LIKE_EMAIL` | Token text resembles a number, URL, email. ~~bool~~ |
+| `SPACY` | Token has a trailing space. ~~bool~~ |
+| `POS`, `TAG`, `MORPH`, `DEP`, `LEMMA`, `SHAPE` | The token's simple and extended part-of-speech tag, morphological analysis, dependency label, lemma, shape. ~~str~~ |
+| `ENT_TYPE` | The token's entity label. ~~str~~ |
+| `ENT_IOB` | The IOB part of the token's entity tag. ~~str~~ |
+| `ENT_ID` | The token's entity ID (`ent_id`). ~~str~~ |
+| `ENT_KB_ID` | The token's entity knowledge base ID (`ent_kb_id`). ~~str~~ |
+| `_` 2.1 | Properties in [custom extension attributes](/usage/processing-pipelines#custom-components-attributes). ~~Dict[str, Any]~~ |
+| `OP` | Operator or quantifier to determine how often to match a token pattern. ~~str~~ |
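+
+As a brief sketch, a single pattern can combine several of these attributes:
+
+```python
+import spacy
+from spacy.matcher import Matcher
+
+nlp = spacy.blank("en")
+matcher = Matcher(nlp.vocab)
+# "hello" in any case, followed by a punctuation token
+pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True}]
+matcher.add("Greeting", [pattern])
+doc = nlp("Hello, world! Hello world!")
+matches = matcher(doc)
+```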
Operators and quantifiers define **how often** a token pattern should be
matched:
diff --git a/website/docs/api/top-level.md b/website/docs/api/top-level.md
index f2fd1415f..904a91ea9 100644
--- a/website/docs/api/top-level.md
+++ b/website/docs/api/top-level.md
@@ -320,7 +320,6 @@ If a setting is not present in the options, the default value will be used.
| `template` 2.2 | Optional template to overwrite the HTML used to render entity spans. Should be a format string and can use `{bg}`, `{text}` and `{label}`. See [`templates.py`](%%GITHUB_SPACY/spacy/displacy/templates.py) for examples. ~~Optional[str]~~ |
| `kb_url_template` 3.2.1 | Optional template to construct the KB url for the entity to link to. Expects a python f-string format with single field to fill in. ~~Optional[str]~~ |
-
#### Span Visualizer options {#displacy_options-span}
> #### Example
@@ -330,21 +329,19 @@ If a setting is not present in the options, the default value will be used.
> displacy.serve(doc, style="span", options=options)
> ```
-| Name | Description |
-|-----------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `spans_key` | Which spans key to render spans from. Default is `"sc"`. ~~str~~ |
+| Name | Description |
+| ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `spans_key` | Which spans key to render spans from. Default is `"sc"`. ~~str~~ |
| `templates`       | Dictionary containing the keys `"span"`, `"slice"`, and `"start"`. These dictate how the overall span, a span slice, and the starting token will be rendered. ~~Optional[Dict[str, str]]~~ |
-| `kb_url_template` | Optional template to construct the KB url for the entity to link to. Expects a python f-string format with single field to fill in ~~Optional[str]~~ |
-| `colors` | Color overrides. Entity types should be mapped to color names or values. ~~Dict[str, str]~~ |
+| `kb_url_template` | Optional template to construct the KB url for the entity to link to. Expects a Python f-string format with a single field to fill in. ~~Optional[str]~~                                  |
+| `colors` | Color overrides. Entity types should be mapped to color names or values. ~~Dict[str, str]~~ |
-
-By default, displaCy comes with colors for all entity types used by [spaCy's
-trained pipelines](/models) for both entity and span visualizer. If you're
-using custom entity types, you can use the `colors` setting to add your own
-colors for them. Your application or pipeline package can also expose a
-[`spacy_displacy_colors` entry
-point](/usage/saving-loading#entry-points-displacy) to add custom labels and
-their colors automatically.
+By default, displaCy comes with colors for all entity types used by
+[spaCy's trained pipelines](/models) for both the entity and span
+visualizers. If you're using custom entity types, you can use the `colors`
+setting to add your own colors for them. Your application or pipeline package
+can also expose a
+[`spacy_displacy_colors` entry point](/usage/saving-loading#entry-points-displacy)
+to add custom labels and their colors automatically.
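+
+A minimal sketch of the `colors` setting, using a hypothetical custom label:
+
+```python
+import spacy
+from spacy import displacy
+from spacy.tokens import Span
+
+nlp = spacy.blank("en")
+doc = nlp("Welcome to the Bank of China.")
+# Set a custom entity manually, just for illustration
+doc.ents = [Span(doc, 3, 6, label="MY_LABEL")]
+colors = {"MY_LABEL": "linear-gradient(90deg, #aa9cfc, #fc9ce7)"}
+options = {"ents": ["MY_LABEL"], "colors": colors}
+displacy.serve(doc, style="ent", options=options)
+```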
By default, displaCy links to `#` for entities without a `kb_id` set on their
span. If you wish to link an entity to their URL then consider using the
@@ -354,7 +351,6 @@ span. If you wish to link an entity to their URL then consider using the
should redirect you to their Wikidata page, in this case
`https://www.wikidata.org/wiki/Q95`.
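+
+A minimal sketch of that setup, assuming `doc` contains entities whose
+`kb_id_` is a Wikidata QID (e.g. set by an entity linker):
+
+```python
+options = {"ents": ["PERSON"], "kb_url_template": "https://www.wikidata.org/wiki/{}"}
+displacy.serve(doc, style="ent", options=options)
+```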
-
## registry {#registry source="spacy/util.py" new="3"}
spaCy's function registry extends
@@ -443,8 +439,8 @@ and the accuracy scores on the development set.
The built-in, default logger is the ConsoleLogger, which prints results to the
console in tabular format. The
[spacy-loggers](https://github.com/explosion/spacy-loggers) package, included as
-a dependency of spaCy, enables other loggers, such as one that
-sends results to a [Weights & Biases](https://www.wandb.com/) dashboard.
+a dependency of spaCy, enables other loggers, such as one that sends results to
+a [Weights & Biases](https://www.wandb.com/) dashboard.
Instead of using one of the built-in loggers, you can
[implement your own](/usage/training#custom-logging).
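+
+As a rough sketch of the custom route (the registered name is hypothetical), a
+logger is a registered function that returns a setup callback, which in turn
+returns a `log_step` and a `finalize` function:
+
+```python
+import sys
+from typing import IO
+
+import spacy
+
+@spacy.registry.loggers("my_custom_logger.v1")
+def custom_logger():
+    def setup_logger(nlp, stdout: IO = sys.stdout, stderr: IO = sys.stderr):
+        def log_step(info):
+            if info is not None:
+                stdout.write(f"Step {info['step']}: score {info['score']}\n")
+
+        def finalize():
+            stdout.write("Training finished\n")
+
+        return log_step, finalize
+
+    return setup_logger
+```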
@@ -583,14 +579,14 @@ the [`Corpus`](/api/corpus) class.
> limit = 0
> ```
-| Name | Description |
-| --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `path` | The directory or filename to read from. Expects data in spaCy's binary [`.spacy` format](/api/data-formats#binary-training). ~~Union[str, Path]~~ |
-| `gold_preproc` | Whether to set up the Example object with gold-standard sentences and tokens for the predictions. See [`Corpus`](/api/corpus#init) for details. ~~bool~~ |
-| `max_length` | Maximum document length. Longer documents will be split into sentences, if sentence boundaries are available. Defaults to `0` for no limit. ~~int~~ |
-| `limit` | Limit corpus to a subset of examples, e.g. for debugging. Defaults to `0` for no limit. ~~int~~ |
-| `augmenter` | Apply some simply data augmentation, where we replace tokens with variations. This is especially useful for punctuation and case replacement, to help generalize beyond corpora that don't have smart-quotes, or only have smart quotes, etc. Defaults to `None`. ~~Optional[Callable]~~ |
-| **CREATES** | The corpus reader. ~~Corpus~~ |
+| Name | Description |
+| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `path` | The directory or filename to read from. Expects data in spaCy's binary [`.spacy` format](/api/data-formats#binary-training). ~~Union[str, Path]~~ |
+| `gold_preproc` | Whether to set up the Example object with gold-standard sentences and tokens for the predictions. See [`Corpus`](/api/corpus#init) for details. ~~bool~~ |
+| `max_length` | Maximum document length. Longer documents will be split into sentences, if sentence boundaries are available. Defaults to `0` for no limit. ~~int~~ |
+| `limit` | Limit corpus to a subset of examples, e.g. for debugging. Defaults to `0` for no limit. ~~int~~ |
+| `augmenter`    | Apply some simple data augmentation, where we replace tokens with variations. This is especially useful for punctuation and case replacement, to help generalize beyond corpora that don't have smart quotes, or only have smart quotes, etc. Defaults to `None`. ~~Optional[Callable]~~                   |
+| **CREATES** | The corpus reader. ~~Corpus~~ |
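+
+A hedged sketch of the `augmenter` hook in Python, using the built-in
+lowercasing augmenter (the path is hypothetical):
+
+```python
+from spacy.training import Corpus
+from spacy.training.augment import create_lower_casing_augmenter
+
+# Lowercase 30% of examples to make the model less case-sensitive
+augmenter = create_lower_casing_augmenter(level=0.3)
+corpus = Corpus("./train.spacy", augmenter=augmenter)
+```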
#### spacy.JsonlCorpus.v1 {#jsonlcorpus tag="registered function"}
diff --git a/website/docs/usage/linguistic-features.md b/website/docs/usage/linguistic-features.md
index b3b896a54..c547ec0bc 100644
--- a/website/docs/usage/linguistic-features.md
+++ b/website/docs/usage/linguistic-features.md
@@ -48,7 +48,7 @@ but do not change its part-of-speech. We say that a **lemma** (root form) is
**inflected** (modified/combined) with one or more **morphological features** to
create a surface form. Here are some examples:
-| Context | Surface | Lemma | POS | Morphological Features |
+| Context | Surface | Lemma | POS | Morphological Features |
| ---------------------------------------- | ------- | ----- | ------ | ---------------------------------------- |
| I was reading the paper | reading | read | `VERB` | `VerbForm=Ger` |
| I don't watch the news, I read the paper | read | read | `VERB` | `VerbForm=Fin`, `Mood=Ind`, `Tense=Pres` |
@@ -430,7 +430,7 @@ for token in doc:
print(token.text, token.pos_, token.dep_, token.head.text)
```
-| Text | POS | Dep | Head text |
+| Text | POS | Dep | Head text |
| ----------------------------------- | ------ | ------- | --------- |
| Credit and mortgage account holders | `NOUN` | `nsubj` | submit |
| must | `VERB` | `aux` | submit |
diff --git a/website/docs/usage/rule-based-matching.md b/website/docs/usage/rule-based-matching.md
index be9a56dc8..bf654c14f 100644
--- a/website/docs/usage/rule-based-matching.md
+++ b/website/docs/usage/rule-based-matching.md
@@ -158,23 +158,23 @@ The available token pattern keys correspond to a number of
[`Token` attributes](/api/token#attributes). The supported attributes for
rule-based matching are:
-| Attribute | Description |
-| ----------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `ORTH` | The exact verbatim text of a token. ~~str~~ |
-| `TEXT` 2.1 | The exact verbatim text of a token. ~~str~~ |
-| `NORM` | The normalized form of the token text. ~~str~~ |
-| `LOWER` | The lowercase form of the token text. ~~str~~ |
-| `LENGTH` | The length of the token text. ~~int~~ |
-| `IS_ALPHA`, `IS_ASCII`, `IS_DIGIT` | Token text consists of alphabetic characters, ASCII characters, digits. ~~bool~~ |
-| `IS_LOWER`, `IS_UPPER`, `IS_TITLE` | Token text is in lowercase, uppercase, titlecase. ~~bool~~ |
-| `IS_PUNCT`, `IS_SPACE`, `IS_STOP` | Token is punctuation, whitespace, stop word. ~~bool~~ |
-| `IS_SENT_START` | Token is start of sentence. ~~bool~~ |
-| `LIKE_NUM`, `LIKE_URL`, `LIKE_EMAIL` | Token text resembles a number, URL, email. ~~bool~~ |
-| `SPACY` | Token has a trailing space. ~~bool~~ |
-| `POS`, `TAG`, `MORPH`, `DEP`, `LEMMA`, `SHAPE` | The token's simple and extended part-of-speech tag, morphological analysis, dependency label, lemma, shape. Note that the values of these attributes are case-sensitive. For a list of available part-of-speech tags and dependency labels, see the [Annotation Specifications](/api/annotation). ~~str~~ |
-| `ENT_TYPE` | The token's entity label. ~~str~~ |
-| `_` 2.1 | Properties in [custom extension attributes](/usage/processing-pipelines#custom-components-attributes). ~~Dict[str, Any]~~ |
-| `OP` | [Operator or quantifier](#quantifiers) to determine how often to match a token pattern. ~~str~~ |
+| Attribute | Description |
+| ---------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `ORTH` | The exact verbatim text of a token. ~~str~~ |
+| `TEXT` 2.1 | The exact verbatim text of a token. ~~str~~ |
+| `NORM` | The normalized form of the token text. ~~str~~ |
+| `LOWER` | The lowercase form of the token text. ~~str~~ |
+| `LENGTH` | The length of the token text. ~~int~~ |
+| `IS_ALPHA`, `IS_ASCII`, `IS_DIGIT` | Token text consists of alphabetic characters, ASCII characters, digits. ~~bool~~ |
+| `IS_LOWER`, `IS_UPPER`, `IS_TITLE` | Token text is in lowercase, uppercase, titlecase. ~~bool~~ |
+| `IS_PUNCT`, `IS_SPACE`, `IS_STOP` | Token is punctuation, whitespace, stop word. ~~bool~~ |
+| `IS_SENT_START` | Token is start of sentence. ~~bool~~ |
+| `LIKE_NUM`, `LIKE_URL`, `LIKE_EMAIL` | Token text resembles a number, URL, email. ~~bool~~ |
+| `SPACY` | Token has a trailing space. ~~bool~~ |
+| `POS`, `TAG`, `MORPH`, `DEP`, `LEMMA`, `SHAPE` | The token's simple and extended part-of-speech tag, morphological analysis, dependency label, lemma, shape. Note that the values of these attributes are case-sensitive. For a list of available part-of-speech tags and dependency labels, see the [Annotation Specifications](/api/annotation). ~~str~~ |
+| `ENT_TYPE` | The token's entity label. ~~str~~ |
+| `_` 2.1 | Properties in [custom extension attributes](/usage/processing-pipelines#custom-components-attributes). ~~Dict[str, Any]~~ |
+| `OP` | [Operator or quantifier](#quantifiers) to determine how often to match a token pattern. ~~str~~ |
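+
+As a quick sketch, one pattern can mix several of these attributes, with `OP`
+controlling repetition:
+
+```python
+import spacy
+from spacy.matcher import Matcher
+
+nlp = spacy.load("en_core_web_sm")  # assumes this pipeline is installed
+matcher = Matcher(nlp.vocab)
+# The lemma "visit" followed by one or more proper nouns
+pattern = [{"LEMMA": "visit"}, {"POS": "PROPN", "OP": "+"}]
+matcher.add("VisitPlace", [pattern])
+doc = nlp("We visited New York last spring.")
+for match_id, start, end in matcher(doc):
+    print(doc[start:end].text)
+```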
diff --git a/website/docs/usage/v3-1.md b/website/docs/usage/v3-1.md
index 1bac8fd81..2725cacb9 100644
--- a/website/docs/usage/v3-1.md
+++ b/website/docs/usage/v3-1.md
@@ -132,13 +132,13 @@ your own.
> contributions for Catalan and to Kenneth Enevoldsen for Danish. For additional
> Danish pipelines, check out [DaCy](https://github.com/KennethEnevoldsen/DaCy).
-| Package | Language | UPOS | Parser LAS | NER F |
-| ------------------------------------------------- | -------- | ---: | ---------: | -----: |
-| [`ca_core_news_sm`](/models/ca#ca_core_news_sm) | Catalan | 98.2 | 87.4 | 79.8 |
-| [`ca_core_news_md`](/models/ca#ca_core_news_md) | Catalan | 98.3 | 88.2 | 84.0 |
-| [`ca_core_news_lg`](/models/ca#ca_core_news_lg) | Catalan | 98.5 | 88.4 | 84.2 |
-| [`ca_core_news_trf`](/models/ca#ca_core_news_trf) | Catalan | 98.9 | 93.0 | 91.2 |
-| [`da_core_news_trf`](/models/da#da_core_news_trf) | Danish | 98.0 | 85.0 | 82.9 |
+| Package | Language | UPOS | Parser LAS | NER F |
+| ------------------------------------------------- | -------- | ---: | ---------: | ----: |
+| [`ca_core_news_sm`](/models/ca#ca_core_news_sm) | Catalan | 98.2 | 87.4 | 79.8 |
+| [`ca_core_news_md`](/models/ca#ca_core_news_md) | Catalan | 98.3 | 88.2 | 84.0 |
+| [`ca_core_news_lg`](/models/ca#ca_core_news_lg) | Catalan | 98.5 | 88.4 | 84.2 |
+| [`ca_core_news_trf`](/models/ca#ca_core_news_trf) | Catalan | 98.9 | 93.0 | 91.2 |
+| [`da_core_news_trf`](/models/da#da_core_news_trf) | Danish | 98.0 | 85.0 | 82.9 |
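+
+Once installed (e.g. via `python -m spacy download ca_core_news_sm`), the new
+packages load like any other trained pipeline:
+
+```python
+import spacy
+
+nlp = spacy.load("ca_core_news_sm")
+doc = nlp("Barcelona és una ciutat de Catalunya.")
+print([(ent.text, ent.label_) for ent in doc.ents])
+```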
### Resizable text classification architectures {#resizable-textcat}
diff --git a/website/docs/usage/v3.md b/website/docs/usage/v3.md
index 980f06172..971779ed3 100644
--- a/website/docs/usage/v3.md
+++ b/website/docs/usage/v3.md
@@ -116,7 +116,7 @@ import Benchmarks from 'usage/\_benchmarks-models.md'
> corpus that had both syntactic and entity annotations, so the transformer
> models for those languages do not include NER.
-| Package | Language | Transformer | Tagger | Parser | NER |
+| Package | Language | Transformer | Tagger | Parser | NER |
| ------------------------------------------------ | -------- | --------------------------------------------------------------------------------------------- | -----: | -----: | ---: |
| [`en_core_web_trf`](/models/en#en_core_web_trf) | English | [`roberta-base`](https://huggingface.co/roberta-base) | 97.8 | 95.2 | 89.9 |
| [`de_dep_news_trf`](/models/de#de_dep_news_trf) | German | [`bert-base-german-cased`](https://huggingface.co/bert-base-german-cased) | 99.0 | 95.8 | - |
@@ -856,9 +856,9 @@ attribute ruler before training using the `[initialize]` block of your config.
### Using Lexeme Tables
-To use tables like `lexeme_prob` when training a model from scratch, you need
-to add an entry to the `initialize` block in your config. Here's what that
-looks like for the existing trained pipelines:
+To use tables like `lexeme_prob` when training a model from scratch, you need to
+add an entry to the `initialize` block in your config. Here's what that looks
+like for the existing trained pipelines:
```ini
[initialize.lookups]