From 9a8f169e5c940fd1d2a50d99fc41d11726b9bf65 Mon Sep 17 00:00:00 2001
From: Ines Montani
Date: Sun, 10 Mar 2019 18:58:51 +0100
Subject: [PATCH] Update v2-1.md

---
 website/docs/usage/v2-1.md | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/website/docs/usage/v2-1.md b/website/docs/usage/v2-1.md
index f97d9d283..42adcb657 100644
--- a/website/docs/usage/v2-1.md
+++ b/website/docs/usage/v2-1.md
@@ -237,6 +237,19 @@ if all of your models are up to date, you can run the
 +     retokenizer.merge(doc[6:8])
   ```
 
+- The serialization methods `to_disk`, `from_disk`, `to_bytes` and `from_bytes`
+  now support a single `exclude` argument to provide a list of string names to
+  exclude. The docs have been updated to list the available serialization fields
+  for each class. The `disable` argument on the [`Language`](/api/language)
+  serialization methods has been renamed to `exclude` for consistency.
+
+  ```diff
+  - nlp.to_disk("/path", disable=["parser", "ner"])
+  + nlp.to_disk("/path", exclude=["parser", "ner"])
+  - data = nlp.tokenizer.to_bytes(vocab=False)
+  + data = nlp.tokenizer.to_bytes(exclude=["vocab"])
+  ```
+
 - For better compatibility with the Universal Dependencies data, the lemmatizer
   now preserves capitalization, e.g. for proper nouns. See
   [this issue](https://github.com/explosion/spaCy/issues/3256) for details.
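The `exclude` pattern this patch documents can be sketched as a simple name filter over a class's serialization fields. This is a hypothetical illustration of the idea, not spaCy's actual implementation; the `to_dict` helper and `fields` mapping are invented for the sketch:

```python
def to_dict(fields, exclude=tuple()):
    """Serialize every named field whose name is not listed in `exclude`.

    `fields` maps a field name to a zero-argument getter that produces
    that field's serialized form (a hypothetical stand-in for the real
    per-class serialization tables).
    """
    return {
        name: getter()
        for name, getter in fields.items()
        if name not in exclude
    }


# Hypothetical serialization fields for a pipeline-like object.
fields = {
    "vocab": lambda: b"vocab-data",
    "parser": lambda: b"parser-data",
}

full = to_dict(fields)                      # all fields included
partial = to_dict(fields, exclude=["parser"])  # "parser" filtered out
```

Here `partial` contains only the `"vocab"` entry, mirroring how `nlp.to_disk("/path", exclude=["parser", "ner"])` skips the named components while serializing everything else.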