Update v2-2.md [ci skip]

Ines Montani 2019-09-19 00:58:30 +02:00
parent f52b857953
commit ddc09b08ed

@@ -341,6 +341,11 @@ check if all of your models are up to date, you can run the
them). If your data contains invalid entity annotations, make sure to clean it
and resolve conflicts. You can now also use the new `debug-data` command to
find problems in your data.
- Pipeline components can now overwrite IOB tags of tokens that are not yet part
of an entity. Once a token has an `ent_iob` value set, it won't be reset to an
"unset" state and will always have at least `O` assigned. `list(doc.ents)` now
actually keeps the annotations on the token level consistent, instead of
resetting `O` to an empty string.
- The default punctuation in the `sentencizer` has been extended and now
includes more characters common in various languages. This also means that the
results it produces may change, depending on your text. If you want the