mirror of https://github.com/explosion/spaCy.git
synced 2024-11-14 13:47:13 +03:00
Merge branch 'master' into spacy.io
commit 463b093c27
@@ -970,9 +970,10 @@ optimized for compatibility with treebank annotations. Other tools and resources
 can sometimes tokenize things differently – for example, `"I'm"` →
 `["I", "'", "m"]` instead of `["I", "'m"]`.
 
-In cases like that, you often want to align the tokenization so that you can
-merge annotations from different sources together, or take vectors predicted by
-a [pre-trained BERT model](https://github.com/huggingface/pytorch-transformers)
+In situations like that, you often want to align the tokenization so that you
+can merge annotations from different sources together, or take vectors predicted
+by a
+[pre-trained BERT model](https://github.com/huggingface/pytorch-transformers)
 and apply them to spaCy tokens. spaCy's [`gold.align`](/api/goldparse#align)
 helper returns a `(cost, a2b, b2a, a2b_multi, b2a_multi)` tuple describing the
 number of misaligned tokens, the one-to-one mappings of token indices in both
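The tuple described in the changed paragraph can be illustrated with a small pure-Python sketch. This is not spaCy's implementation of `gold.align`: the helper names `char_spans` and `align_by_offsets` are hypothetical, the sketch assumes both tokenizations concatenate to exactly the same text, and its `cost` (tokens on either side without a one-to-one partner) is this sketch's own definition and may differ from the value spaCy reports.

```python
def char_spans(tokens):
    """Return the (start, end) character offsets of each token in the joined text."""
    spans, pos = [], 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok)
    return spans


def align_by_offsets(tokens_a, tokens_b):
    """Hypothetical helper: align two tokenizations of the same text by offsets."""
    if "".join(tokens_a) != "".join(tokens_b):
        raise ValueError("both tokenizations must cover the same text")
    spans_a, spans_b = char_spans(tokens_a), char_spans(tokens_b)

    def one_direction(src, dst):
        exact = {span: j for j, span in enumerate(dst)}
        one_to_one, multi = [], {}
        for i, (start, end) in enumerate(src):
            j = exact.get((start, end), -1)  # -1 marks "no one-to-one partner"
            one_to_one.append(j)
            if j == -1:
                # Several source tokens may sit inside one larger target token.
                for k, (d_start, d_end) in enumerate(dst):
                    if d_start <= start and end <= d_end:
                        multi[i] = k
                        break
        return one_to_one, multi

    a2b, a2b_multi = one_direction(spans_a, spans_b)
    b2a, b2a_multi = one_direction(spans_b, spans_a)
    # Assumption: count unmatched tokens on both sides as the cost.
    cost = a2b.count(-1) + b2a.count(-1)
    return cost, a2b, b2a, a2b_multi, b2a_multi


# The "I'm" example from the hunk: another tool's tokens vs. spaCy-style tokens.
other_tokens = ["I", "'", "m"]
spacy_tokens = ["I", "'m"]
cost, a2b, b2a, a2b_multi, b2a_multi = align_by_offsets(other_tokens, spacy_tokens)
print(cost, a2b, b2a, a2b_multi, b2a_multi)
```

In this sketch, `-1` in `a2b` or `b2a` marks a token with no exact counterpart, and the `*_multi` dictionaries record which larger token on the other side contains it.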