Update docs [ci skip]

Commit 9b86312bab (parent d73f7229c0) in https://github.com/explosion/spaCy.git
@@ -12,7 +12,8 @@ The attribute ruler lets you set token attributes for tokens identified by
 [`Matcher` patterns](/usage/rule-based-matching#matcher). The attribute ruler is
 typically used to handle exceptions for token attributes and to map values
 between attributes such as mapping fine-grained POS tags to coarse-grained POS
-tags.
+tags. See the [usage guide](/usage/linguistic-features/#mappings-exceptions) for
+examples.
 
 ## Config and implementation {#config}
 
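
For context, the mapping described above can be tried directly in spaCy v3. A minimal sketch, assuming a trained pipeline such as `en_core_web_sm` that already contains an `attribute_ruler` component; the pattern and attributes are illustrative:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
ruler = nlp.get_pipe("attribute_ruler")
# Illustrative mapping: enforce the coarse-grained POS "NOUN" for any
# token that the tagger assigned the fine-grained tag "NNS"
ruler.add(patterns=[[{"TAG": "NNS"}]], attrs={"POS": "NOUN"})
doc = nlp("Two papers were published.")
print(doc[1].tag_, doc[1].pos_)  # NNS NOUN
```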
@@ -12,19 +12,16 @@ is then passed on to the next component.
 > - **Creates:** Objects, attributes and properties modified and set by the
 >   component.
 
-| Name           | Component                                   | Creates                                                   | Description                      |
-| -------------- | ------------------------------------------- | --------------------------------------------------------- | -------------------------------- |
-| **tokenizer**  | [`Tokenizer`](/api/tokenizer)               | `Doc`                                                     | Segment text into tokens.        |
-| **tagger**     | [`Tagger`](/api/tagger)                     | `Token.tag`                                               | Assign part-of-speech tags.      |
-| **parser**     | [`DependencyParser`](/api/dependencyparser) | `Token.head`, `Token.dep`, `Doc.sents`, `Doc.noun_chunks` | Assign dependency labels.        |
-| **ner**        | [`EntityRecognizer`](/api/entityrecognizer) | `Doc.ents`, `Token.ent_iob`, `Token.ent_type`             | Detect and label named entities. |
-| **lemmatizer** | [`Lemmatizer`](/api/lemmatizer)             | `Token.lemma`                                             | Assign base forms.               |
-| **textcat**    | [`TextCategorizer`](/api/textcategorizer)   | `Doc.cats`                                                | Assign document labels.          |
-
-| **custom** |
-[custom components](/usage/processing-pipelines#custom-components) |
-`Doc._.xxx`, `Token._.xxx`, `Span._.xxx` | Assign custom attributes, methods or
-properties. |
+| Name                  | Component                                                           | Creates                                                   | Description                                      |
+| --------------------- | ------------------------------------------------------------------- | --------------------------------------------------------- | ------------------------------------------------ |
+| **tokenizer**         | [`Tokenizer`](/api/tokenizer)                                        | `Doc`                                                     | Segment text into tokens.                        |
+| _processing pipeline_ |                                                                      |                                                           |                                                   |
+| **tagger**            | [`Tagger`](/api/tagger)                                              | `Token.tag`                                               | Assign part-of-speech tags.                      |
+| **parser**            | [`DependencyParser`](/api/dependencyparser)                          | `Token.head`, `Token.dep`, `Doc.sents`, `Doc.noun_chunks` | Assign dependency labels.                        |
+| **ner**               | [`EntityRecognizer`](/api/entityrecognizer)                          | `Doc.ents`, `Token.ent_iob`, `Token.ent_type`             | Detect and label named entities.                 |
+| **lemmatizer**        | [`Lemmatizer`](/api/lemmatizer)                                      | `Token.lemma`                                             | Assign base forms.                               |
+| **textcat**           | [`TextCategorizer`](/api/textcategorizer)                            | `Doc.cats`                                                | Assign document labels.                          |
+| **custom**            | [custom components](/usage/processing-pipelines#custom-components)   | `Doc._.xxx`, `Token._.xxx`, `Span._.xxx`                  | Assign custom attributes, methods or properties. |
 
 The processing pipeline always **depends on the statistical model** and its
 capabilities. For example, a pipeline can only include an entity recognizer
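
The **custom** row above refers to user-defined components. As a rough sketch of how such a component assigns `Doc._.xxx` attributes (the component and attribute names here are made up for illustration):

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Register a custom attribute on the Doc (the name is illustrative)
Doc.set_extension("token_count", default=0)

@Language.component("token_counter")
def token_counter(doc):
    # Assign the custom attribute, as in the "custom" row of the table
    doc._.token_count = len(doc)
    return doc

nlp = spacy.blank("en")
nlp.add_pipe("token_counter")
doc = nlp("One two three")
print(doc._.token_count)  # 3
```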
@@ -57,41 +57,50 @@ create a surface form. Here are some examples:
 Morphological features are stored in the [`MorphAnalysis`](/api/morphanalysis)
 under `Token.morph`, which allows you to access individual morphological
 features. The attribute `Token.morph_` provides the morphological analysis in
-the Universal Dependencies FEATS format.
+the Universal Dependencies
+[FEATS](https://universaldependencies.org/format.html#morphological-annotation)
+format.
 
+> #### 📝 Things to try
+>
+> 1. Change "I" to "She". You should see that the morphological features change
+>    and express that it's a pronoun in the third person.
+> 2. Inspect `token.morph_` for the other tokens.
+
 ```python
 ### {executable="true"}
 import spacy
 
 nlp = spacy.load("en_core_web_sm")
+print("Pipeline:", nlp.pipe_names)
 doc = nlp("I was reading the paper.")
-
-token = doc[0]  # "I"
-assert token.morph_ == "Case=Nom|Number=Sing|Person=1|PronType=Prs"
-assert token.morph.get("PronType") == ["Prs"]
+token = doc[0]  # 'I'
+print(token.morph_)  # 'Case=Nom|Number=Sing|Person=1|PronType=Prs'
+print(token.morph.get("PronType"))  # ['Prs']
 ```
 
 ### Statistical morphology {#morphologizer new="3" model="morphologizer"}
 
-spaCy v3 includes a statistical morphologizer component that assigns the
-morphological features and POS as `Token.morph` and `Token.pos`.
+spaCy's statistical [`Morphologizer`](/api/morphologizer) component assigns the
+morphological features and coarse-grained part-of-speech tags as `Token.morph`
+and `Token.pos`.
 
 ```python
 ### {executable="true"}
 import spacy
 
 nlp = spacy.load("de_core_news_sm")
-doc = nlp("Wo bist du?")  # 'Where are you?'
-assert doc[2].morph_ == "Case=Nom|Number=Sing|Person=2|PronType=Prs"
-assert doc[2].pos_ == "PRON"
+doc = nlp("Wo bist du?")  # English: 'Where are you?'
+print(doc[2].morph_)  # 'Case=Nom|Number=Sing|Person=2|PronType=Prs'
+print(doc[2].pos_)  # 'PRON'
 ```
 
 ### Rule-based morphology {#rule-based-morphology}
 
 For languages with relatively simple morphological systems like English, spaCy
 can assign morphological features through a rule-based approach, which uses the
-token text and fine-grained part-of-speech tags to produce coarse-grained
-part-of-speech tags and morphological features.
+**token text** and **fine-grained part-of-speech tags** to produce
+coarse-grained part-of-speech tags and morphological features.
 
 1. The part-of-speech tagger assigns each token a **fine-grained part-of-speech
    tag**. In the API, these tags are known as `Token.tag`. They express the
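
To make the FEATS format above concrete: a small sketch that iterates over a sentence and prints each token's morphological analysis. It assumes `en_core_web_sm` is installed and that `MorphAnalysis.to_dict` returns the features as a plain dict:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("She was reading the paper.")
for token in doc:
    # token.morph holds the UD FEATS analysis, e.g. Number=Sing|Person=3
    print(token.text, token.morph.to_dict())
```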
@@ -108,16 +117,16 @@ import spacy
 
 nlp = spacy.load("en_core_web_sm")
 doc = nlp("Where are you?")
-assert doc[2].morph_ == "Case=Nom|Person=2|PronType=Prs"
-assert doc[2].pos_ == "PRON"
+print(doc[2].morph_)  # 'Case=Nom|Person=2|PronType=Prs'
+print(doc[2].pos_)  # 'PRON'
 ```
 
 ## Lemmatization {#lemmatization model="lemmatizer" new="3"}
 
 The [`Lemmatizer`](/api/lemmatizer) is a pipeline component that provides lookup
 and rule-based lemmatization methods in a configurable component. An individual
-language can extend the `Lemmatizer` as part of its [language
-data](#language-data).
+language can extend the `Lemmatizer` as part of its
+[language data](#language-data).
 
 ```python
 ### {executable="true"}
@@ -126,36 +135,38 @@ import spacy
 # English models include a rule-based lemmatizer
 nlp = spacy.load("en_core_web_sm")
 lemmatizer = nlp.get_pipe("lemmatizer")
-assert lemmatizer.mode == "rule"
+print(lemmatizer.mode)  # 'rule'
 
 doc = nlp("I was reading the paper.")
-assert doc[1].lemma_ == "be"
-assert doc[2].lemma_ == "read"
+print([token.lemma_ for token in doc])
+# ['I', 'be', 'read', 'the', 'paper', '.']
 ```
 
-<Infobox title="Important note" variant="warning">
+<Infobox title="Changed in v3.0" variant="warning">
 
-Unlike spaCy v2, spaCy v3 models do not provide lemmas by default or switch
-automatically between lookup and rule-based lemmas depending on whether a
-tagger is in the pipeline. To have lemmas in a `Doc`, the pipeline needs to
-include a `lemmatizer` component. A `lemmatizer` is configured to use a single
-mode such as `"lookup"` or `"rule"` on initialization. The `"rule"` mode
-requires `Token.pos` to be set by a previous component.
+Unlike spaCy v2, spaCy v3 models do _not_ provide lemmas by default or switch
+automatically between lookup and rule-based lemmas depending on whether a tagger
+is in the pipeline. To have lemmas in a `Doc`, the pipeline needs to include a
+[`Lemmatizer`](/api/lemmatizer) component. The lemmatizer component is
+configured to use a single mode such as `"lookup"` or `"rule"` on
+initialization. The `"rule"` mode requires `Token.pos` to be set by a previous
+component.
 
 </Infobox>
 
 The data for spaCy's lemmatizers is distributed in the package
 [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data). The
-provided models already include all the required tables, but if you are
-creating new models, you'll probably want to install `spacy-lookups-data` to
-provide the data when the lemmatizer is initialized.
+provided models already include all the required tables, but if you are creating
+new models, you'll probably want to install `spacy-lookups-data` to provide the
+data when the lemmatizer is initialized.
 
 ### Lookup lemmatizer {#lemmatizer-lookup}
 
 For models without a tagger or morphologizer, a lookup lemmatizer can be added
 to the pipeline as long as a lookup table is provided, typically through
-`spacy-lookups-data`. The lookup lemmatizer looks up the token surface form in
-the lookup table without reference to the token's part-of-speech or context.
+[`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data). The
+lookup lemmatizer looks up the token surface form in the lookup table without
+reference to the token's part-of-speech or context.
 
 ```python
 # pip install spacy-lookups-data
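
As a sketch of the lookup mode described above, assuming `spacy-lookups-data` is installed so the tables are available when the component is initialized (in the v3.0 release, initialization happens via `nlp.initialize()`):

```python
# pip install spacy spacy-lookups-data
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("lemmatizer", config={"mode": "lookup"})
# Initialization loads the lookup tables provided by spacy-lookups-data
nlp.initialize()
doc = nlp("I was reading the paper.")
print([token.lemma_ for token in doc])
```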
@@ -168,19 +179,18 @@ nlp.add_pipe("lemmatizer", config={"mode": "lookup"})
 ### Rule-based lemmatizer {#lemmatizer-rule}
 
 When training models that include a component that assigns POS (a morphologizer
-or a tagger with a [POS mapping](#mappings-exceptions)), a rule-based
-lemmatizer can be added using rule tables from `spacy-lookups-data`:
+or a tagger with a [POS mapping](#mappings-exceptions)), a rule-based lemmatizer
+can be added using rule tables from
+[`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data):
 
 ```python
 # pip install spacy-lookups-data
 import spacy
 
 nlp = spacy.blank("de")
 
-# morphologizer (note: model is not yet trained!)
+# Morphologizer (note: model is not yet trained!)
 nlp.add_pipe("morphologizer")
 
-# rule-based lemmatizer
+# Rule-based lemmatizer
 nlp.add_pipe("lemmatizer", config={"mode": "rule"})
 ```
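
Since the `"rule"` mode depends on `Token.pos`, the pipeline analysis helper can be used to check that this requirement is met before training. A sketch; the exact report format is indicative only:

```python
import spacy

nlp = spacy.blank("de")
nlp.add_pipe("morphologizer")
nlp.add_pipe("lemmatizer", config={"mode": "rule"})
# Prints which attributes each component assigns and requires, and
# warns if a requirement (such as token.pos) would go unmet
nlp.analyze_pipes(pretty=True)
```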
@@ -1734,25 +1744,26 @@ print("After:", [sent.text for sent in doc.sents])
 
 ## Mappings & Exceptions {#mappings-exceptions new="3"}
 
-The [`AttributeRuler`](/api/attributeruler) manages rule-based mappings and
-exceptions for all token-level attributes. As the number of pipeline components
-has grown from spaCy v2 to v3, handling rules and exceptions in each component
-individually has become impractical, so the `AttributeRuler` provides a single
-component with a unified pattern format for all token attribute mappings and
-exceptions.
+The [`AttributeRuler`](/api/attributeruler) manages **rule-based mappings and
+exceptions** for all token-level attributes. As the number of
+[pipeline components](/api/#architecture-pipeline) has grown from spaCy v2 to
+v3, handling rules and exceptions in each component individually has become
+impractical, so the `AttributeRuler` provides a single component with a unified
+pattern format for all token attribute mappings and exceptions.
 
-The `AttributeRuler` uses [`Matcher`
-patterns](/usage/rule-based-matching#adding-patterns) to identify tokens and
-then assigns them the provided attributes. If needed, the `Matcher` patterns
-can include context around the target token. For example, the `AttributeRuler`
-can:
+The `AttributeRuler` uses
+[`Matcher` patterns](/usage/rule-based-matching#adding-patterns) to identify
+tokens and then assigns them the provided attributes. If needed, the
+[`Matcher`](/api/matcher) patterns can include context around the target token.
+For example, the attribute ruler can:
 
-- provide exceptions for any token attributes
-- map fine-grained tags to coarse-grained tags for languages without statistical
-  morphologizers (replacing the v2 tag map in the language data)
-- map token surface form + fine-grained tags to morphological features
-  (replacing the v2 morph rules in the language data)
-- specify the tags for space tokens (replacing hard-coded behavior in the
+- provide exceptions for any **token attributes**
+- map **fine-grained tags** to **coarse-grained tags** for languages without
+  statistical morphologizers (replacing the v2.x `tag_map` in the
+  [language data](#language-data))
+- map token **surface form + fine-grained tags** to **morphological features**
+  (replacing the v2.x `morph_rules` in the [language data](#language-data))
+- specify the **tags for space tokens** (replacing hard-coded behavior in the
   tagger)
 
 The following example shows how the tag and POS `NNP`/`PROPN` can be specified
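
For instance, the space-token case in the last bullet could be handled like this. A sketch, assuming a pipeline with an `attribute_ruler`; `_SP` is the tag spaCy's English models use for whitespace tokens:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
ruler = nlp.get_pipe("attribute_ruler")
# Exception for whitespace tokens: always tag them as "_SP"
ruler.add(patterns=[[{"IS_SPACE": True}]], attrs={"TAG": "_SP"})
```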
@@ -1765,41 +1776,42 @@ import spacy
 
 nlp = spacy.load("en_core_web_sm")
 text = "I saw The Who perform. Who did you see?"
 
 doc1 = nlp(text)
-assert doc1[2].tag_ == "DT"
-assert doc1[2].pos_ == "DET"
-assert doc1[3].tag_ == "WP"
-assert doc1[3].pos_ == "PRON"
+print(doc1[2].tag_, doc1[2].pos_)  # DT DET
+print(doc1[3].tag_, doc1[3].pos_)  # WP PRON
 
-# add a new exception for "The Who" as NNP/PROPN NNP/PROPN
+# Add attribute ruler with exception for "The Who" as NNP/PROPN NNP/PROPN
 ruler = nlp.get_pipe("attribute_ruler")
 
-# pattern to match "The Who"
+# Pattern to match "The Who"
 patterns = [[{"LOWER": "the"}, {"TEXT": "Who"}]]
-# the attributes to assign to the matched token
+# The attributes to assign to the matched token
 attrs = {"TAG": "NNP", "POS": "PROPN"}
 
-# add rule for "The" in "The Who"
-ruler.add(patterns=patterns, attrs=attrs, index=0)
-# add rule for "Who" in "The Who"
-ruler.add(patterns=patterns, attrs=attrs, index=1)
+# Add rules to the attribute ruler
+ruler.add(patterns=patterns, attrs=attrs, index=0)  # "The" in "The Who"
+ruler.add(patterns=patterns, attrs=attrs, index=1)  # "Who" in "The Who"
 
 doc2 = nlp(text)
-assert doc2[2].tag_ == "NNP"
-assert doc2[3].tag_ == "NNP"
-assert doc2[2].pos_ == "PROPN"
-assert doc2[3].pos_ == "PROPN"
-
-# the second "Who" remains unmodified
-assert doc2[5].tag_ == "WP"
-assert doc2[5].pos_ == "PRON"
+print(doc2[2].tag_, doc2[2].pos_)  # NNP PROPN
+print(doc2[3].tag_, doc2[3].pos_)  # NNP PROPN
+# The second "Who" remains unmodified
+print(doc2[5].tag_, doc2[5].pos_)  # WP PRON
 ```
 
-For easy migration from spaCy v2 to v3, the `AttributeRuler` can import v2
-`TAG_MAP` and `MORPH_RULES` data with the methods
-[`AttributeRuler.load_from_tag_map`](/api/attributeruler#load_from_tag_map) and
-[`AttributeRuler.load_from_morph_rules`](/api/attributeruler#load_from_morph_rules).
+<Infobox variant="warning" title="Migrating from spaCy v2.x">
+
+For easy migration from spaCy v2 to v3, the
+[`AttributeRuler`](/api/attributeruler) can import a **tag map and morph rules**
+in the v2 format with the methods
+[`load_from_tag_map`](/api/attributeruler#load_from_tag_map) and
+[`load_from_morph_rules`](/api/attributeruler#load_from_morph_rules).
+
+```diff
+nlp = spacy.blank("en")
++ ruler = nlp.add_pipe("attribute_ruler")
++ ruler.load_from_tag_map(YOUR_TAG_MAP)
+```
+
+</Infobox>
 
 ## Word vectors and semantic similarity {#vectors-similarity}
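
The morph-rules counterpart works the same way. A hedged sketch with a made-up v2-style rules dict; the exact keys depend on your own v2 language data:

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
# Hypothetical v2-style morph rules: surface form + fine-grained tag
# mapped to attributes, here lemmatizing "am" under the tag "VBP"
morph_rules = {"VBP": {"am": {"LEMMA": "be"}}}
ruler.load_from_morph_rules(morph_rules)
```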
@@ -250,26 +250,26 @@ in your config and see validation errors if the argument values don't match.
 
 The following methods, attributes and commands are new in spaCy v3.0.
 
-| Name | Description |
-| ---- | ----------- |
-| [`Token.lex`](/api/token#attributes) | Access a token's [`Lexeme`](/api/lexeme). |
-| [`Token.morph`](/api/token#attributes) [`Token.morph_`](/api/token#attributes) | Access a token's morphological analysis. |
-| [`Language.select_pipes`](/api/language#select_pipes) | Context manager for enabling or disabling specific pipeline components for a block. |
-| [`Language.disable_pipe`](/api/language#disable_pipe) [`Language.enable_pipe`](/api/language#enable_pipe) | Disable or enable a loaded pipeline component (but don't remove it). |
-| [`Language.analyze_pipes`](/api/language#analyze_pipes) | [Analyze](/usage/processing-pipelines#analysis) components and their interdependencies. |
-| [`Language.resume_training`](/api/language#resume_training) | Experimental: continue training a pretrained model and initialize "rehearsal" for components that implement a `rehearse` method to prevent catastrophic forgetting. |
-| [`@Language.factory`](/api/language#factory) [`@Language.component`](/api/language#component) | Decorators for [registering](/usage/processing-pipelines#custom-components) pipeline component factories and simple stateless component functions. |
-| [`Language.has_factory`](/api/language#has_factory) | Check whether a component factory is registered on a language class. |
-| [`Language.get_factory_meta`](/api/language#get_factory_meta) [`Language.get_pipe_meta`](/api/language#get_factory_meta) | Get the [`FactoryMeta`](/api/language#factorymeta) with component metadata for a factory or instance name. |
-| [`Language.config`](/api/language#config) | The [config](/usage/training#config) used to create the current `nlp` object. An instance of [`Config`](https://thinc.ai/docs/api-config#config) that can be saved to disk and used for training. |
-| [`Language.components`](/api/language#attributes) [`Language.component_names`](/api/language#attributes) | All available components and component names, including disabled components that are not run as part of the pipeline. |
-| [`Language.disabled`](/api/language#attributes) | Names of disabled components that are not run as part of the pipeline. |
-| [`Pipe.score`](/api/pipe#score) | Method on pipeline components that returns a dictionary of evaluation scores. |
-| [`registry`](/api/top-level#registry) | Function registry to map functions to string names that can be referenced in [configs](/usage/training#config). |
-| [`util.load_meta`](/api/top-level#util.load_meta) [`util.load_config`](/api/top-level#util.load_config) | Updated helpers for loading a model's [`meta.json`](/api/data-formats#meta) and [`config.cfg`](/api/data-formats#config). |
-| [`util.get_installed_models`](/api/top-level#util.get_installed_models) | Names of all models installed in the environment. |
-| [`init config`](/api/cli#init-config) [`init fill-config`](/api/cli#init-fill-config) [`debug config`](/api/cli#debug-config) | CLI commands for initializing, auto-filling and debugging [training configs](/usage/training). |
-| [`project`](/api/cli#project) | Suite of CLI commands for cloning, running and managing [spaCy projects](/usage/projects). |
+| Name | Description |
+| ---- | ----------- |
+| [`Token.lex`](/api/token#attributes) | Access a token's [`Lexeme`](/api/lexeme). |
+| [`Token.morph`](/api/token#attributes), [`Token.morph_`](/api/token#attributes) | Access a token's morphological analysis. |
+| [`Language.select_pipes`](/api/language#select_pipes) | Context manager for enabling or disabling specific pipeline components for a block. |
+| [`Language.disable_pipe`](/api/language#disable_pipe), [`Language.enable_pipe`](/api/language#enable_pipe) | Disable or enable a loaded pipeline component (but don't remove it). |
+| [`Language.analyze_pipes`](/api/language#analyze_pipes) | [Analyze](/usage/processing-pipelines#analysis) components and their interdependencies. |
+| [`Language.resume_training`](/api/language#resume_training) | Experimental: continue training a pretrained model and initialize "rehearsal" for components that implement a `rehearse` method to prevent catastrophic forgetting. |
+| [`@Language.factory`](/api/language#factory), [`@Language.component`](/api/language#component) | Decorators for [registering](/usage/processing-pipelines#custom-components) pipeline component factories and simple stateless component functions. |
+| [`Language.has_factory`](/api/language#has_factory) | Check whether a component factory is registered on a language class. |
+| [`Language.get_factory_meta`](/api/language#get_factory_meta), [`Language.get_pipe_meta`](/api/language#get_factory_meta) | Get the [`FactoryMeta`](/api/language#factorymeta) with component metadata for a factory or instance name. |
+| [`Language.config`](/api/language#config) | The [config](/usage/training#config) used to create the current `nlp` object. An instance of [`Config`](https://thinc.ai/docs/api-config#config) that can be saved to disk and used for training. |
+| [`Language.components`](/api/language#attributes), [`Language.component_names`](/api/language#attributes) | All available components and component names, including disabled components that are not run as part of the pipeline. |
+| [`Language.disabled`](/api/language#attributes) | Names of disabled components that are not run as part of the pipeline. |
+| [`Pipe.score`](/api/pipe#score) | Method on pipeline components that returns a dictionary of evaluation scores. |
+| [`registry`](/api/top-level#registry) | Function registry to map functions to string names that can be referenced in [configs](/usage/training#config). |
+| [`util.load_meta`](/api/top-level#util.load_meta), [`util.load_config`](/api/top-level#util.load_config) | Updated helpers for loading a model's [`meta.json`](/api/data-formats#meta) and [`config.cfg`](/api/data-formats#config). |
+| [`util.get_installed_models`](/api/top-level#util.get_installed_models) | Names of all models installed in the environment. |
+| [`init config`](/api/cli#init-config), [`init fill-config`](/api/cli#init-fill-config), [`debug config`](/api/cli#debug-config) | CLI commands for initializing, auto-filling and debugging [training configs](/usage/training). |
+| [`project`](/api/cli#project) | Suite of CLI commands for cloning, running and managing [spaCy projects](/usage/projects). |
 
 ### New and updated documentation {#new-docs}
 
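
A few of the `Language` additions in the table can be sketched together, assuming a trained pipeline such as `en_core_web_sm`:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Context manager: only the selected components run inside the block
with nlp.select_pipes(disable=["parser", "ner"]):
    doc = nlp("Only the remaining components run here.")

# Disable a component without removing it from the pipeline
nlp.disable_pipe("ner")
print(nlp.disabled)          # ['ner']
print(nlp.component_names)   # still includes 'ner', even though it's disabled
nlp.enable_pipe("ner")
```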
@@ -304,7 +304,10 @@ format for documenting argument and return types.
   [Layers & Architectures](/usage/layers-architectures),
   [Projects](/usage/projects),
   [Custom pipeline components](/usage/processing-pipelines#custom-components),
-  [Custom tokenizers](/usage/linguistic-features#custom-tokenizer)
+  [Custom tokenizers](/usage/linguistic-features#custom-tokenizer),
+  [Morphology](/usage/linguistic-features#morphology),
+  [Lemmatization](/usage/linguistic-features#lemmatization),
+  [Mapping & Exceptions](/usage/linguistic-features#mappings-exceptions)
 - **API Reference:** [Library architecture](/api),
   [Model architectures](/api/architectures), [Data formats](/api/data-formats)
 - **New Classes:** [`Example`](/api/example), [`Tok2Vec`](/api/tok2vec),
@@ -371,19 +374,25 @@ Note that spaCy v3.0 now requires **Python 3.6+**.
   arguments). The `on_match` callback becomes an optional keyword argument.
 - The `PRON_LEMMA` symbol and `-PRON-` as an indicator for pronoun lemmas have
   been removed.
+- The `TAG_MAP` and `MORPH_RULES` in the language data have been replaced by the
+  more flexible [`AttributeRuler`](/api/attributeruler).
+- The [`Lemmatizer`](/api/lemmatizer) is now a standalone pipeline component and
+  doesn't provide lemmas by default or switch automatically between lookup and
+  rule-based lemmas. You can now add it to your pipeline explicitly and set its
+  mode on initialization.
 
 ### Removed or renamed API {#incompat-removed}
 
-| Removed | Replacement |
-| ------- | ----------- |
-| `Language.disable_pipes` | [`Language.select_pipes`](/api/language#select_pipes) |
-| `GoldParse` | [`Example`](/api/example) |
-| `GoldCorpus` | [`Corpus`](/api/corpus) |
-| `KnowledgeBase.load_bulk`, `KnowledgeBase.dump` | [`KnowledgeBase.from_disk`](/api/kb#from_disk), [`KnowledgeBase.to_disk`](/api/kb#to_disk) |
-| `spacy init-model` | [`spacy init model`](/api/cli#init-model) |
-| `spacy debug-data` | [`spacy debug data`](/api/cli#debug-data) |
-| `spacy profile` | [`spacy debug profile`](/api/cli#debug-profile) |
-| `spacy link`, `util.set_data_path`, `util.get_data_path` | not needed, model symlinks are deprecated |
+| Removed | Replacement |
+| ------- | ----------- |
+| `Language.disable_pipes` | [`Language.select_pipes`](/api/language#select_pipes), [`Language.disable_pipe`](/api/language#disable_pipe) |
+| `GoldParse` | [`Example`](/api/example) |
+| `GoldCorpus` | [`Corpus`](/api/corpus) |
+| `KnowledgeBase.load_bulk`, `KnowledgeBase.dump` | [`KnowledgeBase.from_disk`](/api/kb#from_disk), [`KnowledgeBase.to_disk`](/api/kb#to_disk) |
+| `spacy init-model` | [`spacy init model`](/api/cli#init-model) |
+| `spacy debug-data` | [`spacy debug data`](/api/cli#debug-data) |
+| `spacy profile` | [`spacy debug profile`](/api/cli#debug-profile) |
+| `spacy link`, `util.set_data_path`, `util.get_data_path` | not needed, model symlinks are deprecated |
 
 The following deprecated methods, attributes and arguments were removed in v3.0.
 Most of them have been **deprecated for a while** and many would previously
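
For example, code that built `GoldParse` objects in v2 moves to [`Example`](/api/example) in v3. A minimal sketch; the entity annotations are illustrative, and in the v3.0 release the import lives in `spacy.training`:

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
doc = nlp.make_doc("I like London.")
# Example replaces the removed GoldParse; annotations come from a dict,
# here BILUO entity tags for the four tokens
example = Example.from_dict(doc, {"entities": ["O", "O", "U-LOC", "O"]})
print([ent.text for ent in example.reference.ents])  # ['London']
```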
@@ -557,6 +566,24 @@ patterns = [nlp("health care reform"), nlp("healthcare reform")]
 + matcher.add("HEALTH", patterns, on_match=on_match)
 ```
 
+### Migrating tag maps and morph rules {#migrating-training-mappings-exceptions}
+
+Instead of defining a `tag_map` and `morph_rules` in the language data, spaCy
+v3.0 now manages mappings and exceptions with a separate and more flexible
+pipeline component, the [`AttributeRuler`](/api/attributeruler). See the
+[usage guide](/usage/linguistic-features#mappings-exceptions) for examples. The
+`AttributeRuler` provides two handy helper methods
+[`load_from_tag_map`](/api/attributeruler#load_from_tag_map) and
+[`load_from_morph_rules`](/api/attributeruler#load_from_morph_rules) that let
+you load in your existing tag map or morph rules:
+
+```diff
+nlp = spacy.blank("en")
+- nlp.vocab.morphology.load_tag_map(YOUR_TAG_MAP)
++ ruler = nlp.add_pipe("attribute_ruler")
++ ruler.load_from_tag_map(YOUR_TAG_MAP)
+```
+
 ### Training models {#migrating-training}
 
 To train your models, you should now pretty much always use the
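
To make the diff above concrete, a sketch with a minimal hypothetical tag map; `YOUR_TAG_MAP` stands in for the dict you previously kept in the language data:

```python
import spacy

# Hypothetical v2-style tag map: fine-grained tag -> coarse-grained POS
YOUR_TAG_MAP = {"NN": {"POS": "NOUN"}, "NNS": {"POS": "NOUN"}}

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
ruler.load_from_tag_map(YOUR_TAG_MAP)
```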
@@ -602,8 +629,8 @@ If you've exported a starter config from our
 values. You can then use the auto-generated `config.cfg` for training:
 
 ```diff
-- python -m spacy train en ./output ./train.json ./dev.json
---pipeline tagger,parser --cnn-window 1 --bilstm-depth 0
+### {wrap="true"}
+- python -m spacy train en ./output ./train.json ./dev.json --pipeline tagger,parser --cnn-window 1 --bilstm-depth 0
 + python -m spacy train ./config.cfg --output ./output
 ```
@@ -169,7 +169,13 @@ function formatCode(html, lang, prompt) {
     }
     const result = html
         .split('\n')
-        .map((line, i) => (prompt ? replacePrompt(line, prompt, i === 0) : line))
+        .map((line, i) => {
+            let newLine = prompt ? replacePrompt(line, prompt, i === 0) : line
+            if (lang === 'diff' && !line.startsWith('<')) {
+                newLine = highlightCode('python', line)
+            }
+            return newLine
+        })
         .join('\n')
     return htmlToReact(result)
 }
@@ -28,7 +28,6 @@ export default class Juniper extends React.Component {
             mode: this.props.lang,
             theme: this.props.theme,
         })
-
        const runCode = () => this.execute(outputArea, cm.getValue())
        cm.setOption('extraKeys', { 'Shift-Enter': runCode })
        Widget.attach(outputArea, this.outputRef)
@@ -65,12 +65,12 @@
 --color-subtle-dark: hsl(162, 5%, 60%)
 
 --color-green-medium: hsl(108, 66%, 63%)
---color-green-transparent: hsla(108, 66%, 63%, 0.11)
+--color-green-transparent: hsla(108, 66%, 63%, 0.12)
 --color-red-light: hsl(355, 100%, 96%)
 --color-red-medium: hsl(346, 84%, 61%)
 --color-red-dark: hsl(332, 64%, 34%)
 --color-red-opaque: hsl(346, 96%, 89%)
---color-red-transparent: hsla(346, 84%, 61%, 0.11)
+--color-red-transparent: hsla(346, 84%, 61%, 0.12)
 --color-yellow-light: hsl(46, 100%, 95%)
 --color-yellow-medium: hsl(45, 90%, 55%)
 --color-yellow-dark: hsl(44, 94%, 27%)
@@ -79,11 +79,11 @@
 // Syntax Highlighting
 --syntax-comment: hsl(162, 5%, 60%)
 --syntax-tag: hsl(266, 72%, 72%)
---syntax-number: hsl(266, 72%, 72%)
+--syntax-number: var(--syntax-tag)
 --syntax-selector: hsl(31, 100%, 71%)
---syntax-operator: hsl(342, 100%, 59%)
 --syntax-function: hsl(195, 70%, 54%)
---syntax-keyword: hsl(342, 100%, 59%)
+--syntax-keyword: hsl(343, 100%, 68%)
+--syntax-operator: var(--syntax-keyword)
 --syntax-regex: hsl(45, 90%, 55%)
 
 // Other
@@ -354,6 +354,7 @@ body [id]:target
     &.inserted, &.deleted
         padding: 2px 0
         border-radius: 2px
+        opacity: 0.9
 
     &.inserted
         color: var(--color-green-medium)
@@ -388,7 +389,6 @@ body [id]:target
     .token
         color: var(--color-subtle)
 
-
 .gatsby-highlight-code-line
     background-color: var(--color-dark-secondary)
     border-left: 0.35em solid var(--color-theme)
@@ -409,6 +409,7 @@ body [id]:target
         color: var(--color-subtle)
 
     .CodeMirror-line
+        color: var(--syntax-comment)
         padding: 0
 
     .CodeMirror-selected
@@ -418,26 +419,25 @@ body [id]:target
     .CodeMirror-cursor
         border-left-color: currentColor
 
-    .cm-variable-2
-        color: inherit
-        font-style: italic
+    .cm-property, .cm-variable, .cm-variable-2, .cm-meta // decorators
+        color: var(--color-subtle)
 
     .cm-comment
        color: var(--syntax-comment)
 
-    .cm-keyword
+    .cm-keyword, .cm-builtin
        color: var(--syntax-keyword)
 
     .cm-operator
        color: var(--syntax-operator)
 
-    .cm-string, .cm-builtin
+    .cm-string
        color: var(--syntax-selector)
 
     .cm-number
        color: var(--syntax-number)
 
-    .cm-def, .cm-meta
+    .cm-def
        color: var(--syntax-function)
 
     // Jupyter