Remove NBSPs across tables in the docs (#10842)

Sofie Van Landeghem 2022-05-25 09:48:39 +02:00 committed by Adriane Boyd
parent 6be09bbd07
commit 4619a99185
9 changed files with 98 additions and 102 deletions
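
The change itself is mechanical: non-breaking spaces (U+00A0) had crept into the markdown tables and broke their column alignment. A minimal sketch of this kind of cleanup, assuming the docs live under `website/docs` (the path and glob are illustrative, and the actual commit also re-pads the table columns, which this sketch leaves to a formatter such as Prettier):

```python
from pathlib import Path

NBSP = "\u00a0"  # non-breaking space

# Swap NBSPs for regular spaces in every markdown file under the docs tree.
for path in Path("website/docs").rglob("*.md"):
    text = path.read_text(encoding="utf8")
    if NBSP in text:
        path.write_text(text.replace(NBSP, " "), encoding="utf8")
        print(f"cleaned {path}")
```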


@@ -1335,7 +1335,7 @@ $ python -m spacy project run [subcommand] [project_dir] [--force] [--dry]
| `subcommand` | Name of the command or workflow to run. ~~str (positional)~~ |
| `project_dir` | Path to project directory. Defaults to current working directory. ~~Path (positional)~~ |
| `--force`, `-F` | Force re-running steps, even if nothing changed. ~~bool (flag)~~ |
| `--dry`, `-D` |  Perform a dry run and don't execute scripts. ~~bool (flag)~~ |
| `--dry`, `-D` | Perform a dry run and don't execute scripts. ~~bool (flag)~~ |
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **EXECUTES** | The command defined in the `project.yml`. |
@@ -1453,12 +1453,12 @@ For more examples, see the templates in our
</Accordion>
| Name | Description |
| -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `project_dir` | Path to project directory. Defaults to current working directory. ~~Path (positional)~~ |
| `--output`, `-o` | Path to output file or `-` for stdout (default). If a file is specified and it already exists and contains auto-generated docs, only the auto-generated docs section is replaced. ~~Path (positional)~~ |
|  `--no-emoji`, `-NE` | Don't use emoji in the titles. ~~bool (flag)~~ |
| **CREATES** | The Markdown-formatted project documentation. |
| Name | Description |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `project_dir` | Path to project directory. Defaults to current working directory. ~~Path (positional)~~ |
| `--output`, `-o` | Path to output file or `-` for stdout (default). If a file is specified and it already exists and contains auto-generated docs, only the auto-generated docs section is replaced. ~~Path (positional)~~ |
| `--no-emoji`, `-NE` | Don't use emoji in the titles. ~~bool (flag)~~ |
| **CREATES** | The Markdown-formatted project documentation. |
### project dvc {#project-dvc tag="command"}
@@ -1497,7 +1497,7 @@ $ python -m spacy project dvc [project_dir] [workflow] [--force] [--verbose]
| `project_dir` | Path to project directory. Defaults to current working directory. ~~Path (positional)~~ |
| `workflow` | Name of workflow defined in `project.yml`. Defaults to first workflow if not set. ~~Optional[str] \(option)~~ |
| `--force`, `-F` | Force-updating config file. ~~bool (flag)~~ |
| `--verbose`, `-V` |  Print more output generated by DVC. ~~bool (flag)~~ |
| `--verbose`, `-V` | Print more output generated by DVC. ~~bool (flag)~~ |
| `--help`, `-h` | Show help message and available arguments. ~~bool (flag)~~ |
| **CREATES** | A `dvc.yaml` file in the project directory, based on the steps defined in the given workflow. |
@@ -1588,5 +1588,5 @@ $ python -m spacy huggingface-hub push [whl_path] [--org] [--msg] [--local-repo]
| `--org`, `-o` | Optional name of organization to which the pipeline should be uploaded. ~~str (option)~~ |
| `--msg`, `-m` | Commit message to use for update. Defaults to `"Update spaCy pipeline"`. ~~str (option)~~ |
| `--local-repo`, `-l` | Local path to the model repository (will be created if it doesn't exist). Defaults to `hub` in the current working directory. ~~Path (option)~~ |
| `--verbose`, `-V` | Output additional info for debugging, e.g. the full generated hub metadata. ~~bool (flag)~~  |
| `--verbose`, `-V` | Output additional info for debugging, e.g. the full generated hub metadata. ~~bool (flag)~~ |
| **UPLOADS** | The pipeline to the hub. |


@@ -37,13 +37,13 @@ streaming.
> augmenter = null
> ```
| Name | Description |
| --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `path` | The directory or filename to read from. Expects data in spaCy's binary [`.spacy` format](/api/data-formats#binary-training). ~~Path~~ |
|  `gold_preproc` | Whether to set up the Example object with gold-standard sentences and tokens for the predictions. See [`Corpus`](/api/corpus#init) for details. ~~bool~~ |
| `max_length` | Maximum document length. Longer documents will be split into sentences, if sentence boundaries are available. Defaults to `0` for no limit. ~~int~~ |
| `limit` | Limit corpus to a subset of examples, e.g. for debugging. Defaults to `0` for no limit. ~~int~~ |
| `augmenter` | Apply some simple data augmentation, where we replace tokens with variations. This is especially useful for punctuation and case replacement, to help generalize beyond corpora that don't have smart-quotes, or only have smart quotes, etc. Defaults to `None`. ~~Optional[Callable]~~ |
| Name | Description |
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `path` | The directory or filename to read from. Expects data in spaCy's binary [`.spacy` format](/api/data-formats#binary-training). ~~Path~~ |
| `gold_preproc` | Whether to set up the Example object with gold-standard sentences and tokens for the predictions. See [`Corpus`](/api/corpus#init) for details. ~~bool~~ |
| `max_length` | Maximum document length. Longer documents will be split into sentences, if sentence boundaries are available. Defaults to `0` for no limit. ~~int~~ |
| `limit` | Limit corpus to a subset of examples, e.g. for debugging. Defaults to `0` for no limit. ~~int~~ |
| `augmenter` | Apply some simple data augmentation, where we replace tokens with variations. This is especially useful for punctuation and case replacement, to help generalize beyond corpora that don't have smart-quotes, or only have smart quotes, etc. Defaults to `None`. ~~Optional[Callable]~~ |
```python
%%GITHUB_SPACY/spacy/training/corpus.py
@@ -71,15 +71,15 @@ train/test skew.
> corpus = Corpus("./data", limit=10)
> ```
| Name | Description |
| --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| `path` | The directory or filename to read from. ~~Union[str, Path]~~ |
| _keyword-only_ | |
|  `gold_preproc` | Whether to set up the Example object with gold-standard sentences and tokens for the predictions. Defaults to `False`. ~~bool~~ |
| `max_length` | Maximum document length. Longer documents will be split into sentences, if sentence boundaries are available. Defaults to `0` for no limit. ~~int~~ |
| `limit` | Limit corpus to a subset of examples, e.g. for debugging. Defaults to `0` for no limit. ~~int~~ |
| `augmenter` | Optional data augmentation callback. ~~Callable[[Language, Example], Iterable[Example]]~~ |
| `shuffle` | Whether to shuffle the examples. Defaults to `False`. ~~bool~~ |
| Name | Description |
| -------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| `path` | The directory or filename to read from. ~~Union[str, Path]~~ |
| _keyword-only_ | |
| `gold_preproc` | Whether to set up the Example object with gold-standard sentences and tokens for the predictions. Defaults to `False`. ~~bool~~ |
| `max_length` | Maximum document length. Longer documents will be split into sentences, if sentence boundaries are available. Defaults to `0` for no limit. ~~int~~ |
| `limit` | Limit corpus to a subset of examples, e.g. for debugging. Defaults to `0` for no limit. ~~int~~ |
| `augmenter` | Optional data augmentation callback. ~~Callable[[Language, Example], Iterable[Example]]~~ |
| `shuffle` | Whether to shuffle the examples. Defaults to `False`. ~~bool~~ |
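
To make the arguments above concrete, a short usage sketch (the data path is illustrative): construct a `Corpus` and call it with a pipeline to stream `Example` objects, as covered in the `Corpus.__call__` section below.

```python
import spacy
from spacy.training import Corpus

nlp = spacy.blank("en")
# Stream gold-standard examples from a .spacy file, shuffled each epoch
corpus = Corpus("./corpus/train.spacy", gold_preproc=False, shuffle=True)
for example in corpus(nlp):  # Corpus.__call__ yields Example objects
    print(example.reference)
```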
## Corpus.\_\_call\_\_ {#call tag="method"}


@@ -1123,7 +1123,7 @@ instance and factory instance.
| `factory` | The name of the registered component factory. ~~str~~ |
| `default_config` | The default config, describing the default values of the factory arguments. ~~Dict[str, Any]~~ |
| `assigns` | `Doc` or `Token` attributes assigned by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~ |
| `requires` | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~  |
| `retokenizes` | Whether the component changes tokenization. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~bool~~  |
| `requires` | `Doc` or `Token` attributes required by this component, e.g. `["token.ent_id"]`. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~ |
| `retokenizes` | Whether the component changes tokenization. Used for [pipe analysis](/usage/processing-pipelines#analysis). ~~bool~~ |
| `default_score_weights` | The scores to report during training, and their default weight towards the final score used to select the best model. Weights should sum to `1.0` per component and will be combined and normalized for the whole pipeline. If a weight is set to `None`, the score will not be logged or weighted. ~~Dict[str, Optional[float]]~~ |
| `scores` | All scores set by the components if it's trainable, e.g. `["ents_f", "ents_r", "ents_p"]`. Based on the `default_score_weights` and used for [pipe analysis](/usage/processing-pipelines#analysis). ~~Iterable[str]~~ |
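
A hedged sketch of how these arguments come together in a factory registration (the component name, config and no-op body are hypothetical):

```python
from spacy.language import Language
from spacy.tokens import Doc

@Language.factory(
    "my_component",                      # hypothetical component name
    default_config={"label": "CUSTOM"},  # default factory arguments
    assigns=["token.ent_id"],            # attributes set by this component
    requires=["token.ent_type"],         # attributes it expects upstream
    retokenizes=False,                   # tokenization is left untouched
)
def create_my_component(nlp: Language, name: str, label: str):
    def my_component(doc: Doc) -> Doc:
        return doc  # placeholder: real logic would use the label here
    return my_component
```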


@@ -30,26 +30,26 @@ pattern keys correspond to a number of
[`Token` attributes](/api/token#attributes). The supported attributes for
rule-based matching are:
| Attribute |  Description |
| ----------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| `ORTH` | The exact verbatim text of a token. ~~str~~ |
| `TEXT` <Tag variant="new">2.1</Tag> | The exact verbatim text of a token. ~~str~~ |
| `NORM` | The normalized form of the token text. ~~str~~ |
| `LOWER` | The lowercase form of the token text. ~~str~~ |
|  `LENGTH` | The length of the token text. ~~int~~ |
|  `IS_ALPHA`, `IS_ASCII`, `IS_DIGIT` | Token text consists of alphabetic characters, ASCII characters, digits. ~~bool~~ |
|  `IS_LOWER`, `IS_UPPER`, `IS_TITLE` | Token text is in lowercase, uppercase, titlecase. ~~bool~~ |
|  `IS_PUNCT`, `IS_SPACE`, `IS_STOP` | Token is punctuation, whitespace, stop word. ~~bool~~ |
|  `IS_SENT_START` | Token is start of sentence. ~~bool~~ |
|  `LIKE_NUM`, `LIKE_URL`, `LIKE_EMAIL` | Token text resembles a number, URL, email. ~~bool~~ |
| `SPACY` | Token has a trailing space. ~~bool~~ |
|  `POS`, `TAG`, `MORPH`, `DEP`, `LEMMA`, `SHAPE` | The token's simple and extended part-of-speech tag, morphological analysis, dependency label, lemma, shape. ~~str~~ |
| `ENT_TYPE` | The token's entity label. ~~str~~ |
| `ENT_IOB` | The IOB part of the token's entity tag. ~~str~~ |
| `ENT_ID` | The token's entity ID (`ent_id`). ~~str~~ |
| `ENT_KB_ID` | The token's entity knowledge base ID (`ent_kb_id`). ~~str~~ |
| `_` <Tag variant="new">2.1</Tag> | Properties in [custom extension attributes](/usage/processing-pipelines#custom-components-attributes). ~~Dict[str, Any]~~ |
| `OP` | Operator or quantifier to determine how often to match a token pattern. ~~str~~ |
| Attribute | Description |
| ---------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| `ORTH` | The exact verbatim text of a token. ~~str~~ |
| `TEXT` <Tag variant="new">2.1</Tag> | The exact verbatim text of a token. ~~str~~ |
| `NORM` | The normalized form of the token text. ~~str~~ |
| `LOWER` | The lowercase form of the token text. ~~str~~ |
| `LENGTH` | The length of the token text. ~~int~~ |
| `IS_ALPHA`, `IS_ASCII`, `IS_DIGIT` | Token text consists of alphabetic characters, ASCII characters, digits. ~~bool~~ |
| `IS_LOWER`, `IS_UPPER`, `IS_TITLE` | Token text is in lowercase, uppercase, titlecase. ~~bool~~ |
| `IS_PUNCT`, `IS_SPACE`, `IS_STOP` | Token is punctuation, whitespace, stop word. ~~bool~~ |
| `IS_SENT_START` | Token is start of sentence. ~~bool~~ |
| `LIKE_NUM`, `LIKE_URL`, `LIKE_EMAIL` | Token text resembles a number, URL, email. ~~bool~~ |
| `SPACY` | Token has a trailing space. ~~bool~~ |
| `POS`, `TAG`, `MORPH`, `DEP`, `LEMMA`, `SHAPE` | The token's simple and extended part-of-speech tag, morphological analysis, dependency label, lemma, shape. ~~str~~ |
| `ENT_TYPE` | The token's entity label. ~~str~~ |
| `ENT_IOB` | The IOB part of the token's entity tag. ~~str~~ |
| `ENT_ID` | The token's entity ID (`ent_id`). ~~str~~ |
| `ENT_KB_ID` | The token's entity knowledge base ID (`ent_kb_id`). ~~str~~ |
| `_` <Tag variant="new">2.1</Tag> | Properties in [custom extension attributes](/usage/processing-pipelines#custom-components-attributes). ~~Dict[str, Any]~~ |
| `OP` | Operator or quantifier to determine how often to match a token pattern. ~~str~~ |
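
A brief sketch showing a few of these attributes in a pattern (the text and match label are illustrative):

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
# "hello" in any casing, optional punctuation, then "world"
pattern = [
    {"LOWER": "hello"},
    {"IS_PUNCT": True, "OP": "?"},
    {"LOWER": "world"},
]
matcher.add("HelloWorld", [pattern])
doc = nlp("Hello, world! Hello world!")
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)  # "Hello, world" and "Hello world"
```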
Operators and quantifiers define **how often** a token pattern should be
matched:


@@ -320,7 +320,6 @@ If a setting is not present in the options, the default value will be used.
| `template` <Tag variant="new">2.2</Tag> | Optional template to overwrite the HTML used to render entity spans. Should be a format string and can use `{bg}`, `{text}` and `{label}`. See [`templates.py`](%%GITHUB_SPACY/spacy/displacy/templates.py) for examples. ~~Optional[str]~~ |
| `kb_url_template` <Tag variant="new">3.2.1</Tag> | Optional template to construct the KB url for the entity to link to. Expects a python f-string format with a single field to fill in. ~~Optional[str]~~ |
#### Span Visualizer options {#displacy_options-span}
> #### Example
@@ -330,21 +329,19 @@ If a setting is not present in the options, the default value will be used.
> displacy.serve(doc, style="span", options=options)
> ```
| Name | Description |
|-----------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| `spans_key` | Which spans key to render spans from. Default is `"sc"`. ~~str~~ |
| Name | Description |
| ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `spans_key` | Which spans key to render spans from. Default is `"sc"`. ~~str~~ |
| `templates` | Dictionary containing the keys `"span"`, `"slice"`, and `"start"`. These dictate how the overall span, a span slice, and the starting token will be rendered. ~~Optional[Dict[str, str]]~~ |
| `kb_url_template` | Optional template to construct the KB url for the entity to link to. Expects a python f-string format with a single field to fill in. ~~Optional[str]~~ |
| `colors` | Color overrides. Entity types should be mapped to color names or values. ~~Dict[str, str]~~ |
| `kb_url_template` | Optional template to construct the KB url for the entity to link to. Expects a python f-string format with a single field to fill in. ~~Optional[str]~~ |
| `colors` | Color overrides. Entity types should be mapped to color names or values. ~~Dict[str, str]~~ |
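
For illustration, rendering a span under the default `"sc"` key with a color override (the doc text and label are made up):

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Welcome to the Bank of China.")
doc.spans["sc"] = [Span(doc, 3, 6, label="ORG")]  # "Bank of China"
options = {"spans_key": "sc", "colors": {"ORG": "#7aecec"}}
html = displacy.render(doc, style="span", options=options)
```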
By default, displaCy comes with colors for all entity types used by [spaCy's
trained pipelines](/models) for both entity and span visualizer. If you're
using custom entity types, you can use the `colors` setting to add your own
colors for them. Your application or pipeline package can also expose a
[`spacy_displacy_colors` entry
point](/usage/saving-loading#entry-points-displacy) to add custom labels and
their colors automatically.
By default, displaCy comes with colors for all entity types used by
[spaCy's trained pipelines](/models) for both entity and span visualizer. If
you're using custom entity types, you can use the `colors` setting to add your
own colors for them. Your application or pipeline package can also expose a
[`spacy_displacy_colors` entry point](/usage/saving-loading#entry-points-displacy)
to add custom labels and their colors automatically.
By default, displaCy links to `#` for entities without a `kb_id` set on their
span. If you wish to link an entity to their URL then consider using the
@@ -354,7 +351,6 @@ span. If you wish to link an entity to their URL then consider using the
should redirect you to their Wikidata page, in this case
`https://www.wikidata.org/wiki/Q95`.
## registry {#registry source="spacy/util.py" new="3"}
spaCy's function registry extends
@@ -443,8 +439,8 @@ and the accuracy scores on the development set.
The built-in, default logger is the ConsoleLogger, which prints results to the
console in tabular format. The
[spacy-loggers](https://github.com/explosion/spacy-loggers) package, included as
a dependency of spaCy, enables other loggers, such as one that
sends results to a [Weights & Biases](https://www.wandb.com/) dashboard.
a dependency of spaCy, enables other loggers, such as one that sends results to
a [Weights & Biases](https://www.wandb.com/) dashboard.
Instead of using one of the built-in loggers, you can
[implement your own](/usage/training#custom-logging).
@@ -583,14 +579,14 @@ the [`Corpus`](/api/corpus) class.
> limit = 0
> ```
| Name | Description |
| --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `path` | The directory or filename to read from. Expects data in spaCy's binary [`.spacy` format](/api/data-formats#binary-training). ~~Union[str, Path]~~ |
|  `gold_preproc` | Whether to set up the Example object with gold-standard sentences and tokens for the predictions. See [`Corpus`](/api/corpus#init) for details. ~~bool~~ |
| `max_length` | Maximum document length. Longer documents will be split into sentences, if sentence boundaries are available. Defaults to `0` for no limit. ~~int~~ |
| `limit` | Limit corpus to a subset of examples, e.g. for debugging. Defaults to `0` for no limit. ~~int~~ |
| `augmenter` | Apply some simple data augmentation, where we replace tokens with variations. This is especially useful for punctuation and case replacement, to help generalize beyond corpora that don't have smart-quotes, or only have smart quotes, etc. Defaults to `None`. ~~Optional[Callable]~~ |
| **CREATES** | The corpus reader. ~~Corpus~~ |
| Name | Description |
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `path` | The directory or filename to read from. Expects data in spaCy's binary [`.spacy` format](/api/data-formats#binary-training). ~~Union[str, Path]~~ |
| `gold_preproc` | Whether to set up the Example object with gold-standard sentences and tokens for the predictions. See [`Corpus`](/api/corpus#init) for details. ~~bool~~ |
| `max_length` | Maximum document length. Longer documents will be split into sentences, if sentence boundaries are available. Defaults to `0` for no limit. ~~int~~ |
| `limit` | Limit corpus to a subset of examples, e.g. for debugging. Defaults to `0` for no limit. ~~int~~ |
| `augmenter` | Apply some simple data augmentation, where we replace tokens with variations. This is especially useful for punctuation and case replacement, to help generalize beyond corpora that don't have smart-quotes, or only have smart quotes, etc. Defaults to `None`. ~~Optional[Callable]~~ |
| **CREATES** | The corpus reader. ~~Corpus~~ |
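
This registered function resolves to the same `Corpus` reader documented above; a sketch of pulling it out of the registry directly (the path is illustrative):

```python
from spacy.util import registry

make_corpus = registry.readers.get("spacy.Corpus.v1")
corpus = make_corpus(
    path="./corpus/train.spacy",
    gold_preproc=False,
    max_length=0,
    limit=0,
    augmenter=None,
)
```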
#### spacy.JsonlCorpus.v1 {#jsonlcorpus tag="registered function"}


@@ -48,7 +48,7 @@ but do not change its part-of-speech. We say that a **lemma** (root form) is
**inflected** (modified/combined) with one or more **morphological features** to
create a surface form. Here are some examples:
| Context | Surface | Lemma | POS |  Morphological Features |
| Context | Surface | Lemma | POS | Morphological Features |
| ---------------------------------------- | ------- | ----- | ------ | ---------------------------------------- |
| I was reading the paper | reading | read | `VERB` | `VerbForm=Ger` |
| I don't watch the news, I read the paper | read | read | `VERB` | `VerbForm=Fin`, `Mood=Ind`, `Tense=Pres` |
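
The first row of this table, reproduced in code (assuming the `en_core_web_sm` pipeline is installed; the exact feature set depends on the model):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I was reading the paper")
token = doc[2]  # surface form "reading"
print(token.lemma_, token.pos_, token.morph)
# lemma "read", POS VERB, plus the model's morphological features
```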
@@ -430,7 +430,7 @@ for token in doc:
print(token.text, token.pos_, token.dep_, token.head.text)
```
| Text |  POS | Dep | Head text |
| Text | POS | Dep | Head text |
| ----------------------------------- | ------ | ------- | --------- |
| Credit and mortgage account holders | `NOUN` | `nsubj` | submit |
| must | `VERB` | `aux` | submit |


@@ -158,23 +158,23 @@ The available token pattern keys correspond to a number of
[`Token` attributes](/api/token#attributes). The supported attributes for
rule-based matching are:
| Attribute |  Description |
| ----------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `ORTH` | The exact verbatim text of a token. ~~str~~ |
| `TEXT` <Tag variant="new">2.1</Tag> | The exact verbatim text of a token. ~~str~~ |
| `NORM` | The normalized form of the token text. ~~str~~ |
| `LOWER` | The lowercase form of the token text. ~~str~~ |
|  `LENGTH` | The length of the token text. ~~int~~ |
|  `IS_ALPHA`, `IS_ASCII`, `IS_DIGIT` | Token text consists of alphabetic characters, ASCII characters, digits. ~~bool~~ |
|  `IS_LOWER`, `IS_UPPER`, `IS_TITLE` | Token text is in lowercase, uppercase, titlecase. ~~bool~~ |
|  `IS_PUNCT`, `IS_SPACE`, `IS_STOP` | Token is punctuation, whitespace, stop word. ~~bool~~ |
|  `IS_SENT_START` | Token is start of sentence. ~~bool~~ |
|  `LIKE_NUM`, `LIKE_URL`, `LIKE_EMAIL` | Token text resembles a number, URL, email. ~~bool~~ |
| `SPACY` | Token has a trailing space. ~~bool~~ |
|  `POS`, `TAG`, `MORPH`, `DEP`, `LEMMA`, `SHAPE` | The token's simple and extended part-of-speech tag, morphological analysis, dependency label, lemma, shape. Note that the values of these attributes are case-sensitive. For a list of available part-of-speech tags and dependency labels, see the [Annotation Specifications](/api/annotation). ~~str~~ |
| `ENT_TYPE` | The token's entity label. ~~str~~ |
| `_` <Tag variant="new">2.1</Tag> | Properties in [custom extension attributes](/usage/processing-pipelines#custom-components-attributes). ~~Dict[str, Any]~~ |
| `OP` | [Operator or quantifier](#quantifiers) to determine how often to match a token pattern. ~~str~~ |
| Attribute | Description |
| ---------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `ORTH` | The exact verbatim text of a token. ~~str~~ |
| `TEXT` <Tag variant="new">2.1</Tag> | The exact verbatim text of a token. ~~str~~ |
| `NORM` | The normalized form of the token text. ~~str~~ |
| `LOWER` | The lowercase form of the token text. ~~str~~ |
| `LENGTH` | The length of the token text. ~~int~~ |
| `IS_ALPHA`, `IS_ASCII`, `IS_DIGIT` | Token text consists of alphabetic characters, ASCII characters, digits. ~~bool~~ |
| `IS_LOWER`, `IS_UPPER`, `IS_TITLE` | Token text is in lowercase, uppercase, titlecase. ~~bool~~ |
| `IS_PUNCT`, `IS_SPACE`, `IS_STOP` | Token is punctuation, whitespace, stop word. ~~bool~~ |
| `IS_SENT_START` | Token is start of sentence. ~~bool~~ |
| `LIKE_NUM`, `LIKE_URL`, `LIKE_EMAIL` | Token text resembles a number, URL, email. ~~bool~~ |
| `SPACY` | Token has a trailing space. ~~bool~~ |
| `POS`, `TAG`, `MORPH`, `DEP`, `LEMMA`, `SHAPE` | The token's simple and extended part-of-speech tag, morphological analysis, dependency label, lemma, shape. Note that the values of these attributes are case-sensitive. For a list of available part-of-speech tags and dependency labels, see the [Annotation Specifications](/api/annotation). ~~str~~ |
| `ENT_TYPE` | The token's entity label. ~~str~~ |
| `_` <Tag variant="new">2.1</Tag> | Properties in [custom extension attributes](/usage/processing-pipelines#custom-components-attributes). ~~Dict[str, Any]~~ |
| `OP` | [Operator or quantifier](#quantifiers) to determine how often to match a token pattern. ~~str~~ |
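
Another short sketch, this time using the lexical flags (the pattern and text are illustrative):

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
# A number-like token ("ten", "10", ...) followed by the word "percent"
matcher.add("PERCENTAGE", [[{"LIKE_NUM": True}, {"LOWER": "percent"}]])
doc = nlp("Turnout rose by ten percent this year.")
for _, start, end in matcher(doc):
    print(doc[start:end].text)  # "ten percent"
```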
<Accordion title="Does it matter if the attribute names are uppercase or lowercase?">


@@ -132,13 +132,13 @@ your own.
> contributions for Catalan and to Kenneth Enevoldsen for Danish. For additional
> Danish pipelines, check out [DaCy](https://github.com/KennethEnevoldsen/DaCy).
| Package | Language | UPOS | Parser LAS |  NER F |
| ------------------------------------------------- | -------- | ---: | ---------: | -----: |
| [`ca_core_news_sm`](/models/ca#ca_core_news_sm) | Catalan | 98.2 | 87.4 | 79.8 |
| [`ca_core_news_md`](/models/ca#ca_core_news_md) | Catalan | 98.3 | 88.2 | 84.0 |
| [`ca_core_news_lg`](/models/ca#ca_core_news_lg) | Catalan | 98.5 | 88.4 | 84.2 |
| [`ca_core_news_trf`](/models/ca#ca_core_news_trf) | Catalan | 98.9 | 93.0 | 91.2 |
| [`da_core_news_trf`](/models/da#da_core_news_trf) | Danish | 98.0 | 85.0 | 82.9 |
| Package | Language | UPOS | Parser LAS | NER F |
| ------------------------------------------------- | -------- | ---: | ---------: | ----: |
| [`ca_core_news_sm`](/models/ca#ca_core_news_sm) | Catalan | 98.2 | 87.4 | 79.8 |
| [`ca_core_news_md`](/models/ca#ca_core_news_md) | Catalan | 98.3 | 88.2 | 84.0 |
| [`ca_core_news_lg`](/models/ca#ca_core_news_lg) | Catalan | 98.5 | 88.4 | 84.2 |
| [`ca_core_news_trf`](/models/ca#ca_core_news_trf) | Catalan | 98.9 | 93.0 | 91.2 |
| [`da_core_news_trf`](/models/da#da_core_news_trf) | Danish | 98.0 | 85.0 | 82.9 |
### Resizable text classification architectures {#resizable-textcat}


@@ -116,7 +116,7 @@ import Benchmarks from 'usage/\_benchmarks-models.md'
> corpus that had both syntactic and entity annotations, so the transformer
> models for those languages do not include NER.
| Package | Language | Transformer | Tagger | Parser |  NER |
| Package | Language | Transformer | Tagger | Parser | NER |
| ------------------------------------------------ | -------- | --------------------------------------------------------------------------------------------- | -----: | -----: | ---: |
| [`en_core_web_trf`](/models/en#en_core_web_trf) | English | [`roberta-base`](https://huggingface.co/roberta-base) | 97.8 | 95.2 | 89.9 |
| [`de_dep_news_trf`](/models/de#de_dep_news_trf) | German | [`bert-base-german-cased`](https://huggingface.co/bert-base-german-cased) | 99.0 | 95.8 | - |
@@ -856,9 +856,9 @@ attribute ruler before training using the `[initialize]` block of your config.
### Using Lexeme Tables
To use tables like `lexeme_prob` when training a model from scratch, you need
to add an entry to the `initialize` block in your config. Here's what that
looks like for the existing trained pipelines:
To use tables like `lexeme_prob` when training a model from scratch, you need to
add an entry to the `initialize` block in your config. Here's what that looks
like for the existing trained pipelines:
```ini
[initialize.lookups]