mirror of https://github.com/explosion/spaCy.git
synced 2024-11-10 19:57:17 +03:00
Update v2-2.md [ci skip]
parent cf65a80f36
commit 0dd127bb00

@@ -334,6 +334,11 @@ check if all of your models are up to date, you can run the
the `Vocab` and serialized with it. This means that serialized objects (`nlp`,
pipeline components, vocab) will now include additional data, and models
written to disk will include additional files.
- The [`Lemmatizer`](/api/lemmatizer) class is now initialized with an instance
of [`Lookups`](/api/lookups) containing the rules and tables, instead of dicts
as separate arguments. This makes it easier to share data tables and modify
them at runtime. This mostly affects internals, but if you've been implementing
a custom `Lemmatizer`, you'll need to update your code.
- The [Dutch model](/models/nl) has been trained on a new NER corpus
(custom-labelled UD instead of WikiNER), so its predictions may be very
different from those of the previous version. The results should be significantly better

@@ -399,6 +404,29 @@ don't explicitly install the lookups data, that `nlp` object won't have any
lemmatization rules available. spaCy will now show you a warning when you train
a new part-of-speech tagger and the vocab has no lookups available.

#### Lemmatizer initialization

This mainly concerns internals and hopefully won't affect your code. But if
you've been creating your own [`Lemmatizer`](/api/lemmatizer) instances, you'll
need to update how they're initialized and pass in an instance of
[`Lookups`](/api/lookups) with the (optional) tables `lemma_index`, `lemma_exc`,
`lemma_rules` and `lemma_lookup`.

```diff
from spacy.lemmatizer import Lemmatizer
+ from spacy.lookups import Lookups

lemma_index = {"verb": ("cope", "cop")}
lemma_exc = {"verb": {"coping": ("cope",)}}
lemma_rules = {"verb": [["ing", ""]]}
- lemmatizer = Lemmatizer(lemma_index, lemma_exc, lemma_rules)
+ lookups = Lookups()
+ lookups.add_table("lemma_index", lemma_index)
+ lookups.add_table("lemma_exc", lemma_exc)
+ lookups.add_table("lemma_rules", lemma_rules)
+ lemmatizer = Lemmatizer(lookups)
```
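
To see how the four tables fit together, here is a simplified, standalone sketch in plain Python that doesn't require spaCy. `Tables` and `RuleLemmatizer` are hypothetical names, not spaCy classes. The lookup order is an assumption that loosely approximates a rule-based lemmatizer like spaCy v2's, not its actual code: exceptions come first, then suffix-rule output that appears in `lemma_index`, then unindexed rule output, then the word itself. `lemma_lookup` is consulted only when there are no rules.

```python
from typing import Dict, List


class Tables:
    """Hypothetical stand-in for a container of named lookup tables.

    It only mimics the idea behind spaCy's `Lookups` (named tables that can be
    shared and edited at runtime); it is not spaCy's implementation.
    """

    def __init__(self) -> None:
        self._tables: Dict[str, dict] = {}

    def add_table(self, name: str, data: dict) -> dict:
        """Store a shallow copy of `data` under `name` and return the stored dict."""
        if name in self._tables:
            raise ValueError(f"Table {name!r} already exists")
        self._tables[name] = dict(data)
        return self._tables[name]

    def get_table(self, name: str) -> dict:
        """Return the table called `name`, or an empty dict if it's missing."""
        return self._tables.get(name, {})

    def __contains__(self, name: str) -> bool:
        return name in self._tables


class RuleLemmatizer:
    """Hypothetical, simplified rule-based lemmatizer reading all data from Tables."""

    def __init__(self, tables: Tables) -> None:
        # Keep a reference (not a copy), so later edits to the tables are picked up.
        self.tables = tables

    def __call__(self, string: str, pos: str) -> List[str]:
        if "lemma_rules" not in self.tables:
            # No rules: use a plain word -> lemma lookup table, else the word itself.
            return [self.tables.get_table("lemma_lookup").get(string, string)]
        pos = pos.lower()
        word = string.lower()
        index = self.tables.get_table("lemma_index").get(pos, ())
        exceptions = self.tables.get_table("lemma_exc").get(pos, {})
        rules = self.tables.get_table("lemma_rules").get(pos, [])
        forms: List[str] = []
        unknown: List[str] = []
        for old, new in rules:
            if word.endswith(old):
                # Swap the suffix, e.g. "coping" with ["ing", ""] -> "cop".
                form = word[: len(word) - len(old)] + new
                if not form:
                    continue
                # Prefer candidates listed as known base forms in the index.
                target = forms if form in index else unknown
                if form not in target:
                    target.append(form)
        # Exceptions take priority and go first, in their original order.
        extra = [f for f in exceptions.get(word, ()) if f not in forms]
        forms = extra + forms
        # Fall back to unindexed rule output, then to the unchanged word.
        return forms or unknown or [string]


# Same example data as in the diff above.
tables = Tables()
tables.add_table("lemma_index", {"verb": ("cope", "cop")})
tables.add_table("lemma_exc", {"verb": {"coping": ("cope",)}})
rules = tables.add_table("lemma_rules", {"verb": [["ing", ""]]})
lemmatizer = RuleLemmatizer(tables)

print(lemmatizer("coping", "VERB"))  # ['cope', 'cop']
print(lemmatizer("coped", "VERB"))  # ['coped'] (no matching rule yet)

# Edit a shared table at runtime; the lemmatizer sees the change immediately.
rules["verb"] = rules["verb"] + [["ed", "e"]]
print(lemmatizer("coped", "VERB"))  # ['cope']
```

Because the lemmatizer holds a reference to the shared tables instead of receiving separate dicts, editing a table changes its output without re-initializing it. This is the kind of sharing and runtime modification that passing a single `Lookups` object is meant to make easier.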

#### Converting entity offsets to BILUO tags

If you've been using the