Add note on merging speed in v2.1 (see #3300) [ci skip]

Ines Montani 2019-02-21 12:34:18 +01:00
parent 236aa94ded
commit 0fc908d7a5


@@ -215,6 +215,22 @@ if all of your models are up to date, you can run the
means that the `Matcher` in v2.1.x may produce different results compared to
the `Matcher` in v2.0.x.
- The deprecated [`Doc.merge`](/api/doc#merge) and
[`Span.merge`](/api/span#merge) methods still work, but you may notice that
they now run slower when merging many objects in a row. That's because the
merging engine was rewritten to be more reliable and to support more efficient
merging **in bulk**. To take advantage of this, you should rewrite your logic
to use the [`Doc.retokenize`](/api/doc#retokenize) context manager and perform
as many merges as possible together in the `with` block.
```diff
- doc[1:5].merge()
- doc[6:8].merge()
+ with doc.retokenize() as retokenizer:
+     retokenizer.merge(doc[1:5])
+     retokenizer.merge(doc[6:8])
```
- For better compatibility with the Universal Dependencies data, the lemmatizer
now preserves capitalization, e.g. for proper nouns. See
[this issue](https://github.com/explosion/spaCy/issues/3256) for details.
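
For reference, the bulk-merging pattern described in the `Doc.retokenize` note can be sketched as a self-contained script. This is a minimal illustration, not part of the committed docs: it assumes spaCy v2.1 or later is installed, and the example sentence, the `spans` list and the use of a blank English pipeline are assumptions chosen so that no trained model is needed.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) is enough to
# demonstrate merging, so no model download is required.
nlp = spacy.blank("en")
doc = nlp("I live in New York City and work in San Francisco")

# Collect every span to merge up front. The spans must not overlap.
# doc[3:6] covers "New York City", doc[9:11] covers "San Francisco".
spans = [doc[3:6], doc[9:11]]

# Queue all merges inside one `with` block. They are applied together
# when the block exits, rather than one at a time.
with doc.retokenize() as retokenizer:
    for span in spans:
        retokenizer.merge(span)

tokens = [token.text for token in doc]
print(tokens)
```

Building the spans before entering the `with` block keeps their indices tied to the original tokenization, which is what the retokenizer expects when several merges are queued at once.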