diff --git a/website/docs/usage/v2-1.md b/website/docs/usage/v2-1.md
index 988531e00..bdf0cfa1f 100644
--- a/website/docs/usage/v2-1.md
+++ b/website/docs/usage/v2-1.md
@@ -215,6 +215,22 @@ if all of your models are up to date, you can run the
   means that the `Matcher` in v2.1.x may produce different results compared to
   the `Matcher` in v2.0.x.
 
+- The deprecated [`Doc.merge`](/api/doc#merge) and
+  [`Span.merge`](/api/span#merge) methods still work, but you may notice that
+  they now run slower when merging many objects in a row. That's because the
+  merging engine was rewritten to be more reliable and to support more efficient
+  merging **in bulk**. To take advantage of this, you should rewrite your logic
+  to use the [`Doc.retokenize`](/api/doc#retokenize) context manager and perform
+  as many merges as possible together in the `with` block.
+
+  ```diff
+  - doc[1:5].merge()
+  - doc[6:8].merge()
+  + with doc.retokenize() as retokenizer:
+  +     retokenizer.merge(doc[1:5])
+  +     retokenizer.merge(doc[6:8])
+  ```
+
 - For better compatibility with the Universal Dependencies data, the lemmatizer
   now preserves capitalization, e.g. for proper nouns. See
   [this issue](https://github.com/explosion/spaCy/issues/3256) for details.
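
Note for reviewers: the bulk-merge pattern this hunk documents can be tried end to end with the minimal sketch below. It is an illustration under assumptions, not text from the patch: it assumes spaCy v2.1 or later is installed, uses `spacy.blank("en")` so that no trained model is required, and the sentence and the names `tokens_before` and `tokens_after` are made up for the example. Both spans are created against the original tokenization, and the merges are applied together when the `with` block exits.

```python
import spacy

# Blank English pipeline: tokenizer only, so no model download is needed.
# (Assumption for this sketch; any loaded pipeline would work the same way.)
nlp = spacy.blank("en")

# Illustrative sentence, not taken from the patch.
doc = nlp("I like New York City and San Francisco a lot")
tokens_before = len(doc)

# Queue both merges in one block. The spans use the token indices of the
# still-unmodified doc; the merges are applied in bulk on exit.
with doc.retokenize() as retokenizer:
    retokenizer.merge(doc[2:5])  # "New York City"
    retokenizer.merge(doc[6:8])  # "San Francisco"

tokens_after = [token.text for token in doc]
print(tokens_before, tokens_after)
```

Because the merges are applied together, the second span does not need to account for any index shift caused by the first merge, which is worth keeping in mind when porting code that called `Span.merge` several times in a row.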