Update v2-2.md [ci skip]

2026-02-15 11:40:37 +03:00 · 2019-10-01 21:37:06 +02:00 · 2019-10-01 21:37:06 +02:00 · 0dd127bb00
commit 0dd127bb00
parent cf65a80f36
1 changed files with 28 additions and 0 deletions
--- a/website/docs/usage/v2-2.md
+++ b/website/docs/usage/v2-2.md
@@ -334,6 +334,11 @@ check if all of your models are up to date, you can run the
  the `Vocab` and serialized with it. This means that serialized objects (`nlp`,
  pipeline components, vocab) will now include additional data, and models
  written to disk will include additional files.
+- The [`Lemmatizer`](/api/lemmatizer) class is now initialized with an instance
+  of [`Lookups`](/api/lookups) containing the rules and tables, instead of dicts
+  passed as separate arguments. This makes it easier to share data tables and
+  modify them at runtime. The change mostly affects internals, but if you've
+  been implementing a custom `Lemmatizer`, you'll need to update your code.
 - The [Dutch model](/models/nl) has been trained on a new NER corpus (custom
  labelled UD instead of WikiNER), so its predictions may be very different
  compared to the previous version. The results should be significantly better
@@ -399,6 +404,29 @@ don't explicitly install the lookups data, that `nlp` object won't have any
 lemmatization rules available. spaCy will now show you a warning when you train
 a new part-of-speech tagger and the vocab has no lookups available.

+#### Lemmatizer initialization
+
+This change mainly affects internals and should not affect most code.
+However, if you've been creating custom [`Lemmatizer`](/api/lemmatizer)
+instances, you'll need to update how they're initialized and pass in an
+instance of [`Lookups`](/api/lookups) with the (optional) tables `lemma_index`,
+`lemma_exc`, `lemma_rules` and `lemma_lookup`.
+
+```diff
+from spacy.lemmatizer import Lemmatizer
++ from spacy.lookups import Lookups
+
+lemma_index = {"verb": ("cope", "cop")}
+lemma_exc = {"verb": {"coping": ("cope",)}}
+lemma_rules = {"verb": [["ing", ""]]}
+- lemmatizer = Lemmatizer(lemma_index, lemma_exc, lemma_rules)
++ lookups = Lookups()
++ lookups.add_table("lemma_index", lemma_index)
++ lookups.add_table("lemma_exc", lemma_exc)
++ lookups.add_table("lemma_rules", lemma_rules)
++ lemmatizer = Lemmatizer(lookups)
+```
+
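To make the role of the three tables more concrete, here is a rough, self-contained sketch of how an index, an exceptions table and a rules table might interact when they live in one shared container. `TableStore` and `SketchLemmatizer` are hypothetical stand-ins written for this illustration. They are not spaCy's `Lookups` or `Lemmatizer`, and the lookup order shown is a simplifying assumption rather than the library's actual algorithm.

```python
# Hypothetical stand-ins for illustration only. This is NOT spaCy's
# Lookups or Lemmatizer implementation.
class TableStore:
    """Minimal container of named tables, loosely modelled on the idea of Lookups."""

    def __init__(self):
        self._tables = {}

    def add_table(self, name, data=None):
        if name in self._tables:
            raise ValueError(f"Table '{name}' already exists")
        # Shallow copy: nested dicts are shared with the caller's data.
        self._tables[name] = dict(data or {})
        return self._tables[name]

    def get_table(self, name):
        # Missing tables are treated as empty, since all tables are optional.
        return self._tables.get(name, {})


class SketchLemmatizer:
    """Simplified lemmatizer that reads its data from a shared table store."""

    def __init__(self, lookups):
        self.lookups = lookups

    def lemmatize(self, string, pos):
        index = self.lookups.get_table("lemma_index").get(pos, ())
        exceptions = self.lookups.get_table("lemma_exc").get(pos, {})
        rules = self.lookups.get_table("lemma_rules").get(pos, [])
        # Assumption: exceptions come first, then suffix rules whose
        # result is a known form in the index.
        forms = list(exceptions.get(string, ()))
        for old, new in rules:
            if old and string.endswith(old):
                form = string[: len(string) - len(old)] + new
                if form and form in index and form not in forms:
                    forms.append(form)
        # Fall back to the original string if nothing matched.
        return forms or [string]


lookups = TableStore()
lookups.add_table("lemma_index", {"verb": ("cope", "cop")})
lookups.add_table("lemma_exc", {"verb": {"coping": ("cope",)}})
lookups.add_table("lemma_rules", {"verb": [["ing", ""]]})
lemmatizer = SketchLemmatizer(lookups)

print(lemmatizer.lemmatize("coping", "verb"))

# Because the tables are shared, they can be modified after the
# lemmatizer has been created.
lookups.get_table("lemma_exc")["verb"]["running"] = ("run",)
print(lemmatizer.lemmatize("running", "verb"))
```

The point of the sketch is the design choice described above: the lemmatizer holds a reference to one container instead of three separate dicts, so a table changed at runtime is visible to every component that shares the container.
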
 #### Converting entity offsets to BILUO tags

 If you've been using the