Merge branch 'master' into spacy.io

Ines Montani 2019-07-17 16:06:36 +02:00
commit 463b093c27

@@ -970,9 +970,10 @@ optimized for compatibility with treebank annotations. Other tools and resources
can sometimes tokenize things differently, for example splitting `"I'm"` into
`["I", "'", "m"]` instead of `["I", "'m"]`.

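The following is a minimal pure-Python sketch of the mismatch. It assumes both tokenizations cover exactly the same characters. It maps each token to character offsets and then compares the two lists. `char_spans` and `align_tokens` are hypothetical helpers written for illustration. They are not spaCy's `gold.align`, and the many-to-one dictionary only records tokens that fall entirely inside a larger token on the other side.

```python
def char_spans(tokens):
    """Return (start, end) offsets of each token in the concatenated tokens.

    Assumption of this sketch: whitespace is ignored, so offsets are computed
    over "".join(tokens) rather than over the original text.
    """
    spans = []
    offset = 0
    for token in tokens:
        spans.append((offset, offset + len(token)))
        offset += len(token)
    return spans


def align_tokens(a_tokens, b_tokens):
    """Hypothetical helper (not part of spaCy): return (a2b, a2b_multi).

    a2b[i] is the index of the token in b with exactly the same span as a[i],
    or -1 if none exists. a2b_multi maps an index in a to the index of the
    larger b token that fully contains it, for tokens with no one-to-one match.
    """
    if "".join(a_tokens) != "".join(b_tokens):
        raise ValueError("both tokenizations must cover the same characters")
    b_spans = char_spans(b_tokens)
    b_index = {span: j for j, span in enumerate(b_spans)}
    a2b = []
    a2b_multi = {}
    for i, (start, end) in enumerate(char_spans(a_tokens)):
        j = b_index.get((start, end), -1)
        a2b.append(j)
        if j == -1:
            # Linear scan for a containing token; fine for a short sketch.
            for k, (b_start, b_end) in enumerate(b_spans):
                if b_start <= start and end <= b_end:
                    a2b_multi[i] = k
                    break
    return a2b, a2b_multi


a2b, a2b_multi = align_tokens(["I", "'", "m"], ["I", "'m"])
print(a2b)        # [0, -1, -1]
print(a2b_multi)  # {1: 1, 2: 1}
```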
In situations like that, you often want to align the tokenization so that you
can merge annotations from different sources together, or take vectors predicted
by a
[pre-trained BERT model](https://github.com/huggingface/pytorch-transformers)
and apply them to spaCy tokens. spaCy's [`gold.align`](/api/goldparse#align)
helper returns a `(cost, a2b, b2a, a2b_multi, b2a_multi)` tuple describing the
number of misaligned tokens, the one-to-one mappings of token indices in both