diff --git a/website/docs/api/tokenizer.md b/website/docs/api/tokenizer.md
index 63c1e87ea..d6ab73f14 100644
--- a/website/docs/api/tokenizer.md
+++ b/website/docs/api/tokenizer.md
@@ -118,8 +118,8 @@ and examples.
 > #### Example
 >
 > ```python
-> from spacy.attrs import ORTH, LEMMA
-> case = [{ORTH: "do"}, {ORTH: "n't", LEMMA: "not"}]
+> from spacy.attrs import ORTH, NORM
+> case = [{ORTH: "do"}, {ORTH: "n't", NORM: "not"}]
 > tokenizer.add_special_case("don't", case)
 > ```

diff --git a/website/docs/api/top-level.md b/website/docs/api/top-level.md
index 0a8f638b2..50ba0e3d9 100644
--- a/website/docs/api/top-level.md
+++ b/website/docs/api/top-level.md
@@ -514,9 +514,9 @@ an error if key doesn't match `ORTH` values.
 >
 > ```python
 > BASE = {"a.": [{ORTH: "a."}], ":)": [{ORTH: ":)"}]}
-> NEW = {"a.": [{ORTH: "a.", LEMMA: "all"}]}
+> NEW = {"a.": [{ORTH: "a.", NORM: "all"}]}
 > exceptions = util.update_exc(BASE, NEW)
-> # {"a.": [{ORTH: "a.", LEMMA: "all"}], ":)": [{ORTH: ":)"}]}
+> # {"a.": [{ORTH: "a.", NORM: "all"}], ":)": [{ORTH: ":)"}]}
 > ```

 | Name | Type | Description |
diff --git a/website/docs/usage/linguistic-features.md b/website/docs/usage/linguistic-features.md
index a91135d70..7549a3985 100644
--- a/website/docs/usage/linguistic-features.md
+++ b/website/docs/usage/linguistic-features.md
@@ -649,7 +649,7 @@ import Tokenization101 from 'usage/101/\_tokenization.md'
 data in
 [`spacy/lang`](https://github.com/explosion/spaCy/tree/master/spacy/lang). The
 tokenizer exceptions define special cases like "don't" in English, which needs
-to be split into two tokens: `{ORTH: "do"}` and `{ORTH: "n't", LEMMA: "not"}`.
+to be split into two tokens: `{ORTH: "do"}` and `{ORTH: "n't", NORM: "not"}`.
 The prefixes, suffixes and infixes mostly define punctuation rules – for
 example, when to split off periods (at the end of a sentence), and when to
 leave tokens containing periods intact (abbreviations like "U.S.").
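
All three hunks make the same substitution: spaCy v3 tokenizer exceptions can only set `ORTH` and `NORM`, so the docs examples replace `LEMMA: "not"` with `NORM: "not"`. A minimal sketch of the updated usage, assuming a blank English pipeline (not part of the patch itself):

```python
# Sketch of the NORM-based special case shown in the updated docs:
# the "n't" piece keeps its surface form but carries the norm "not".
import spacy
from spacy.attrs import ORTH, NORM

nlp = spacy.blank("en")  # a blank pipeline is enough; only the tokenizer is used
case = [{ORTH: "do"}, {ORTH: "n't", NORM: "not"}]
nlp.tokenizer.add_special_case("don't", case)

doc = nlp("I don't know")
print([t.text for t in doc])  # ['I', 'do', "n't", 'know']
print(doc[2].norm_)           # 'not'
```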