From c3ead02ea55cbf548b095b994654563acb8f00f9 Mon Sep 17 00:00:00 2001
From: Ines Montani
Date: Wed, 17 Jul 2019 16:06:25 +0200
Subject: [PATCH] Adjust wording [ci skip]

---
 website/docs/usage/linguistic-features.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/website/docs/usage/linguistic-features.md b/website/docs/usage/linguistic-features.md
index 2ef30576e..55ad646c1 100644
--- a/website/docs/usage/linguistic-features.md
+++ b/website/docs/usage/linguistic-features.md
@@ -970,9 +970,10 @@ optimized for compatibility with treebank annotations. Other tools and resources
 can sometimes tokenize things differently – for example, `"I'm"` →
 `["I", "'", "m"]` instead of `["I", "'m"]`.
 
-In cases like that, you often want to align the tokenization so that you can
-merge annotations from different sources together, or take vectors predicted by
-a [pre-trained BERT model](https://github.com/huggingface/pytorch-transformers)
+In situations like that, you often want to align the tokenization so that you
+can merge annotations from different sources together, or take vectors predicted
+by a
+[pre-trained BERT model](https://github.com/huggingface/pytorch-transformers)
 and apply them to spaCy tokens. spaCy's [`gold.align`](/api/goldparse#align)
 helper returns a `(cost, a2b, b2a, a2b_multi, b2a_multi)` tuple describing the
 number of misaligned tokens, the one-to-one mappings of token indices in both
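The hunk above edits the paragraph that describes spaCy's `gold.align` helper and the `(cost, a2b, b2a, a2b_multi, b2a_multi)` tuple it returns. To illustrate what those fields mean, here is a minimal pure-Python sketch that aligns the two tokenizations of `"I'm"` by character offsets. It is not spaCy's implementation. `align_tokens`, `char_spans` and `containing` are hypothetical helpers. The sketch assumes that both token lists cover exactly the same characters once whitespace is removed. Its `cost` is simply the number of tokens in `a` that have no one-to-one partner, so it may not match the value spaCy reports.

```python
from typing import Dict, List, Tuple


def char_spans(tokens: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of each token in the concatenated token text."""
    spans, pos = [], 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok)
    return spans


def containing(span: Tuple[int, int], spans: List[Tuple[int, int]]) -> int:
    """Index of the first span in `spans` that fully covers `span`, or -1."""
    start, end = span
    for idx, (s, e) in enumerate(spans):
        if s <= start and end <= e:
            return idx
    return -1


def align_tokens(tokens_a: List[str], tokens_b: List[str]):
    """Hypothetical helper returning (cost, a2b, b2a, a2b_multi, b2a_multi).

    Assumption: both tokenizations cover exactly the same characters once
    whitespace is dropped. `cost` here counts tokens in `tokens_a` that have
    no one-to-one partner; spaCy's own helper may compute it differently.
    """
    if "".join(tokens_a) != "".join(tokens_b):
        raise ValueError("tokenizations must cover the same characters")
    spans_a, spans_b = char_spans(tokens_a), char_spans(tokens_b)
    a2b = [-1] * len(tokens_a)
    b2a = [-1] * len(tokens_b)
    a2b_multi: Dict[int, int] = {}
    b2a_multi: Dict[int, int] = {}
    for i, span in enumerate(spans_a):
        j = containing(span, spans_b)
        if j == -1:
            continue  # token i crosses a token boundary in tokens_b: unaligned
        if spans_b[j] == span:
            a2b[i], b2a[j] = j, i  # identical character span: one-to-one
        else:
            a2b_multi[i] = j  # token i is only part of token j in tokens_b
    for j, span in enumerate(spans_b):
        i = containing(span, spans_a)
        if i != -1 and spans_a[i] != span:
            b2a_multi[j] = i  # token j is only part of token i in tokens_a
    cost = sum(1 for j in a2b if j == -1)
    return cost, a2b, b2a, a2b_multi, b2a_multi


other_tokens = ["I", "'", "m"]
spacy_tokens = ["I", "'m"]
print(align_tokens(other_tokens, spacy_tokens))
# (2, [0, -1, -1], [0, -1], {1: 1, 2: 1}, {})
```

Here `"'"` and `"m"` both fall inside spaCy's single `"'m"` token. They therefore end up in `a2b_multi` rather than in the one-to-one `a2b` list. Matching by character offsets only works when both tokenizers keep the original characters. BERT-style wordpieces, which may be lowercased and `##`-prefixed, would need to be normalized first.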