Merge branch 'master' into spacy.io

This commit is contained in:
Ines Montani 2019-07-17 16:06:36 +02:00
commit 463b093c27

@@ -970,9 +970,10 @@ optimized for compatibility with treebank annotations. Other tools and resources
can sometimes tokenize things differently; for example, `"I'm"` →
`["I", "'", "m"]` instead of `["I", "'m"]`.
-In cases like that, you often want to align the tokenization so that you can
-merge annotations from different sources together, or take vectors predicted by
-a [pre-trained BERT model](https://github.com/huggingface/pytorch-transformers)
+In situations like that, you often want to align the tokenization so that you
+can merge annotations from different sources together, or take vectors predicted
+by a
+[pre-trained BERT model](https://github.com/huggingface/pytorch-transformers)
and apply them to spaCy tokens. spaCy's [`gold.align`](/api/goldparse#align)
helper returns a `(cost, a2b, b2a, a2b_multi, b2a_multi)` tuple describing the
number of misaligned tokens, the one-to-one mappings of token indices in both