mirror of https://github.com/explosion/spaCy.git (synced 2025-01-26 09:14:32 +03:00)
Add BILUO scheme to annotation docs
parent 99b631617d
commit 465a1dd710
@@ -71,6 +71,44 @@ include _annotation/_dep-labels

include _annotation/_named-entities

+h(3, "biluo") BILUO Scheme

p
    | spaCy translates character offsets into the BILUO scheme, in order to
    | decide the cost of each action given the current state of the entity
    | recognizer. The costs are then used to calculate the gradient of the
    | loss, to train the model.

+aside("Why BILUO, not IOB?")
    | There are several coding schemes for encoding entity annotations as
    | token tags. These coding schemes are equally expressive, but not
    | necessarily equally learnable.
    | #[+a("http://www.aclweb.org/anthology/W09-1119") Ratinov and Roth]
    | showed that the minimal #[strong Begin], #[strong In], #[strong Out]
    | scheme was more difficult to learn than the #[strong BILUO] scheme that
    | we use, which explicitly marks boundary tokens.

+table([ "Tag", "Description" ])
    +row
        +cell #[code #[span.u-color-theme B] EGIN]
        +cell The first token of a multi-token entity.

    +row
        +cell #[code #[span.u-color-theme I] N]
        +cell An inner token of a multi-token entity.

    +row
        +cell #[code #[span.u-color-theme L] AST]
        +cell The final token of a multi-token entity.

    +row
        +cell #[code #[span.u-color-theme U] NIT]
        +cell A single-token entity.

    +row
        +cell #[code #[span.u-color-theme O] UT]
        +cell A non-entity token.

+h(2, "json-input") JSON input format for training

p
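
Note on the BILUO paragraph added in this commit: the translation from character offsets to BILUO tags can be sketched in plain Python. This is a minimal illustration under stated assumptions, not spaCy's implementation. The helpers `offsets_to_biluo` and `whitespace_tokens` are hypothetical names, tokens are assumed to be split on whitespace, and every entity span is assumed to start and end exactly on a token boundary (the sketch raises an error otherwise).

```python
def whitespace_tokens(text):
    """Hypothetical helper: return (start_char, end_char) for each whitespace-separated token."""
    spans = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        spans.append((start, end))
        pos = end
    return spans


def offsets_to_biluo(token_spans, entities):
    """Hypothetical helper: map character-offset entities to one BILUO tag per token.

    token_spans: list of (start_char, end_char), one per token.
    entities:    list of (start_char, end_char, label).
    """
    tags = ["O"] * len(token_spans)  # every token is Out unless an entity covers it
    index_by_start = {start: i for i, (start, _) in enumerate(token_spans)}
    index_by_end = {end: i for i, (_, end) in enumerate(token_spans)}
    for start_char, end_char, label in entities:
        if start_char not in index_by_start or end_char not in index_by_end:
            raise ValueError(
                "entity (%d, %d) does not align with token boundaries"
                % (start_char, end_char)
            )
        first = index_by_start[start_char]
        last = index_by_end[end_char]
        if first == last:
            tags[first] = "U-" + label  # Unit: single-token entity
        else:
            tags[first] = "B-" + label  # Begin: first token
            for i in range(first + 1, last):
                tags[i] = "I-" + label  # In: inner tokens
            tags[last] = "L-" + label  # Last: final token
    return tags


text = "Bank of New York opened in London"
entities = [(0, 16, "ORG"), (27, 33, "GPE")]
tags = offsets_to_biluo(whitespace_tokens(text), entities)
print(tags)
# ['B-ORG', 'I-ORG', 'I-ORG', 'L-ORG', 'O', 'O', 'U-GPE']
```

The `L-` and `U-` tags are what distinguish this scheme from the minimal Begin/In/Out scheme mentioned in the aside: the final token of each entity and single-token entities are marked explicitly, so the boundary information is carried by the tags themselves.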