Update v2-2.md [ci skip]

Ines Montani 2019-10-01 21:37:06 +02:00
parent cf65a80f36
commit 0dd127bb00


@ -334,6 +334,11 @@ check if all of your models are up to date, you can run the
the `Vocab` and serialized with it. This means that serialized objects (`nlp`,
pipeline components, vocab) will now include additional data, and models
written to disk will include additional files.
- The [`Lemmatizer`](/api/lemmatizer) class is now initialized with an instance
of [`Lookups`](/api/lookups) containing the rules and tables, instead of dicts
as separate arguments. This makes it easier to share data tables and modify
them at runtime. This change mostly affects internals, but if you've been
implementing a custom `Lemmatizer`, you'll need to update your code.
- The [Dutch model](/models/nl) has been trained on a new NER corpus (custom
labelled UD instead of WikiNER), so its predictions may be very different
compared to the previous version. The results should be significantly better
@ -399,6 +404,29 @@ don't explicitly install the lookups data, that `nlp` object won't have any
lemmatization rules available. spaCy will now show you a warning when you train
a new part-of-speech tagger and the vocab has no lookups available.
#### Lemmatizer initialization
This change mainly affects internals and shouldn't affect most user code. But if
you've been creating custom [`Lemmatizers`](/api/lemmatizer), you'll need to
update how they're initialized and pass in an instance of
[`Lookups`](/api/lookups) containing the (optional) tables `lemma_index`,
`lemma_exc`, `lemma_rules` and `lemma_lookup`.
```diff
from spacy.lemmatizer import Lemmatizer
+ from spacy.lookups import Lookups
lemma_index = {"verb": ("cope", "cop")}
lemma_exc = {"verb": {"coping": ("cope",)}}
lemma_rules = {"verb": [["ing", ""]]}
- lemmatizer = Lemmatizer(lemma_index, lemma_exc, lemma_rules)
+ lookups = Lookups()
+ lookups.add_table("lemma_index", lemma_index)
+ lookups.add_table("lemma_exc", lemma_exc)
+ lookups.add_table("lemma_rules", lemma_rules)
+ lemmatizer = Lemmatizer(lookups)
```
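To make the role of the three tables more concrete, here is a minimal, dependency-free sketch of how a rule-based lemmatizer might combine them. It is an illustration only, not spaCy's implementation: the `lemmatize` function and the plain `tables` dict (standing in for a `Lookups` object) are hypothetical names, and the real logic may differ in its details.

```python
def lemmatize(string, pos, tables):
    """Toy rule-based lemmatizer (hypothetical, not spaCy's actual code)."""
    # Each table is optional, so fall back to empty data if it's missing.
    index = tables.get("lemma_index", {}).get(pos, ())
    exceptions = tables.get("lemma_exc", {}).get(pos, {})
    rules = tables.get("lemma_rules", {}).get(pos, [])
    # Exceptions take priority over suffix rules.
    forms = list(exceptions.get(string, ()))
    oov_forms = []
    for old, new in rules:
        if string.endswith(old):
            form = string[: len(string) - len(old)] + new
            # Prefer candidates that are known lemmas in the index.
            if form in index:
                forms.append(form)
            else:
                oov_forms.append(form)
    if not forms:
        # Fall back to unverified candidates, then to the string itself.
        forms = oov_forms or [string]
    # Remove duplicates while preserving order.
    return list(dict.fromkeys(forms))


# A plain dict standing in for the tables held by a Lookups object.
tables = {
    "lemma_index": {"verb": ("cope", "cop")},
    "lemma_exc": {"verb": {"coping": ("cope",)}},
    "lemma_rules": {"verb": [["ing", ""]]},
}

coping = lemmatize("coping", "verb", tables)
hoping_before = lemmatize("hoping", "verb", tables)

# Because the tables live in one shared container, they can be modified
# at runtime and the change is picked up on the next call.
tables["lemma_exc"]["verb"]["hoping"] = ("hope",)
hoping_after = lemmatize("hoping", "verb", tables)
```

In this sketch, adding an exception for `"hoping"` changes the result without re-creating anything, which is the kind of runtime modification that keeping the data in one shared container is meant to simplify.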
#### Converting entity offsets to BILUO tags
If you've been using the