mirror of https://github.com/explosion/spaCy.git
synced 2024-12-25 17:36:30 +03:00
Update v2-2.md [ci skip]
parent f52b857953
commit ddc09b08ed
@@ -341,6 +341,11 @@ check if all of your models are up to date, you can run the
   them). If your data contains invalid entity annotations, make sure to clean it
   and resolve conflicts. You can now also use the new `debug-data` command to
   find problems in your data.
+- Pipeline components can now overwrite IOB tags of tokens that are not yet part
+  of an entity. Once a token has an `ent_iob` value set, it won't be reset to an
+  "unset" state and will always have at least `O` assigned. `list(doc.ents)` now
+  actually keeps the annotations on the token level consistent, instead of
+  resetting `O` to an empty string.
 - The default punctuation in the `sentencizer` has been extended and now
   includes more characters common in various languages. This also means that the
   results it produces may change, depending on your text. If you want the