Remove LEMMA from exception examples [ci skip]

Ines Montani 2019-09-12 16:26:27 +02:00
parent 82c16b7943
commit 25b2b3ff45
3 changed files with 5 additions and 5 deletions


@@ -118,8 +118,8 @@ and examples.
 > #### Example
 >
 > ```python
-> from spacy.attrs import ORTH, LEMMA
-> case = [{ORTH: "do"}, {ORTH: "n't", LEMMA: "not"}]
+> from spacy.attrs import ORTH, NORM
+> case = [{ORTH: "do"}, {ORTH: "n't", NORM: "not"}]
 > tokenizer.add_special_case("don't", case)
 > ```


@@ -514,9 +514,9 @@ an error if key doesn't match `ORTH` values.
 >
 > ```python
 > BASE = {"a.": [{ORTH: "a."}], ":)": [{ORTH: ":)"}]}
-> NEW = {"a.": [{ORTH: "a.", LEMMA: "all"}]}
+> NEW = {"a.": [{ORTH: "a.", NORM: "all"}]}
 > exceptions = util.update_exc(BASE, NEW)
-> # {"a.": [{ORTH: "a.", LEMMA: "all"}], ":)": [{ORTH: ":)"}]}
+> # {"a.": [{ORTH: "a.", NORM: "all"}], ":)": [{ORTH: ":)"}]}
 > ```
 | Name | Type | Description |
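The merge behavior shown in this hunk can be sketched in plain Python. This is a simplified stand-in for `spacy.util.update_exc`, not spaCy's actual implementation; the `ORTH`/`NORM` string constants below are hypothetical placeholders for the real `spacy.attrs` attribute IDs:

```python
# Stand-ins for the spacy.attrs constants used in the docs example.
ORTH, NORM = "orth", "norm"

def update_exc(base_exceptions, *addition_dicts):
    """Simplified sketch of spacy.util.update_exc: later dicts
    override earlier entries for the same key, and each key must
    equal the concatenation of its tokens' ORTH values."""
    exc = dict(base_exceptions)
    for additions in addition_dicts:
        for orth, tokens in additions.items():
            # The key must match what the tokens spell out.
            if "".join(t[ORTH] for t in tokens) != orth:
                raise ValueError(f"Invalid tokenizer exception: {orth!r}")
        exc.update(additions)
    return exc

BASE = {"a.": [{ORTH: "a."}], ":)": [{ORTH: ":)"}]}
NEW = {"a.": [{ORTH: "a.", NORM: "all"}]}
merged = update_exc(BASE, NEW)
# merged keeps ":)" from BASE and takes the overridden "a." from NEW
```

The override-by-key semantics are why language subclasses can ship a small exceptions dict that refines a shared base without copying it.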


@@ -649,7 +649,7 @@ import Tokenization101 from 'usage/101/\_tokenization.md'
 data in
 [`spacy/lang`](https://github.com/explosion/spaCy/tree/master/spacy/lang). The
 tokenizer exceptions define special cases like "don't" in English, which needs
-to be split into two tokens: `{ORTH: "do"}` and `{ORTH: "n't", LEMMA: "not"}`.
+to be split into two tokens: `{ORTH: "do"}` and `{ORTH: "n't", NORM: "not"}`.
 The prefixes, suffixes and infixes mostly define punctuation rules, for
 example, when to split off periods (at the end of a sentence), and when to leave
 tokens containing periods intact (abbreviations like "U.S.").
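The exceptions-first lookup described above can be sketched in a few lines of plain Python. This is a deliberately minimal illustration, not spaCy's tokenizer: it only checks whitespace-split chunks against an exceptions table and skips the prefix/suffix/infix rules entirely, and the `"orth"`/`"norm"` dict keys are hypothetical stand-ins for the real attribute constants:

```python
# Hypothetical exceptions table: maps a surface form to the token
# attributes it should be split into, as in the docs example above.
SPECIAL = {"don't": [{"orth": "do"}, {"orth": "n't", "norm": "not"}]}

def tokenize(text, special_cases):
    """Sketch of special-case handling: each whitespace-split chunk
    is looked up in the exceptions table before any other splitting."""
    tokens = []
    for chunk in text.split():
        if chunk in special_cases:
            # Exception hit: emit the predefined token sequence.
            tokens.extend(t["orth"] for t in special_cases[chunk])
        else:
            tokens.append(chunk)
    return tokens

print(tokenize("I don't know", SPECIAL))  # -> ['I', 'do', "n't", 'know']
```

Checking exceptions before the punctuation rules is what lets "don't" come out as `do` + `n't` while ordinary apostrophes elsewhere are still handled by the general rules.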