Mirror of https://github.com/explosion/spaCy.git, synced 2025-01-27 17:54:39 +03:00
Add note on merging speed in v2.1 (see #3300) [ci skip]
Parent: 236aa94ded
Commit: 0fc908d7a5
@@ -215,6 +215,22 @@ if all of your models are up to date, you can run the

  means that the `Matcher` in v2.1.x may produce different results compared to
  the `Matcher` in v2.0.x.

- The deprecated [`Doc.merge`](/api/doc#merge) and
  [`Span.merge`](/api/span#merge) methods still work, but you may notice that
  they now run slower when merging many objects in a row. That's because the
  merging engine was rewritten to be more reliable and to support more efficient
  merging **in bulk**. To take advantage of this, you should rewrite your logic
  to use the [`Doc.retokenize`](/api/doc#retokenize) context manager and perform
  as many merges as possible together in the `with` block.

  ```diff
  - doc[1:5].merge()
  - doc[6:8].merge()
  + with doc.retokenize() as retokenizer:
  +     retokenizer.merge(doc[1:5])
  +     retokenizer.merge(doc[6:8])
  ```

- For better compatibility with the Universal Dependencies data, the lemmatizer
  now preserves capitalization, e.g. for proper nouns. See
  [this issue](https://github.com/explosion/spaCy/issues/3256) for details.
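
As a rough illustration of merging in bulk, the following sketch merges two multi-word names inside a single `retokenize()` block. It assumes a spaCy version that provides `Doc.retokenize` (v2.1 or later) and uses `spacy.blank("en")` so that only the tokenizer is needed and no trained model has to be downloaded. The example sentence and span indices are made up for this illustration.

```python
import spacy

# A blank English pipeline contains only the tokenizer, which is all
# this example needs (no trained model is required).
nlp = spacy.blank("en")
doc = nlp("I live in New York City and work in San Francisco.")

# Queue several merges in one retokenize() block. The span indices refer
# to the original, unmerged Doc; all merges are applied together when the
# with block exits.
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[3:6])   # "New York City"
    retokenizer.merge(doc[9:11])  # "San Francisco"

print([token.text for token in doc])
# ['I', 'live', 'in', 'New York City', 'and', 'work', 'in', 'San Francisco', '.']
```

Because the retokenizer stores the changes and only applies them when the context manager exits, you don't have to shift later indices to account for earlier merges, as you would when calling `Span.merge` repeatedly on a `Doc` whose tokenization changes after every call.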