diff --git a/website/docs/usage/linguistic-features.md b/website/docs/usage/linguistic-features.md
index 09f81c7c0..2ef30576e 100644
--- a/website/docs/usage/linguistic-features.md
+++ b/website/docs/usage/linguistic-features.md
@@ -1019,6 +1019,15 @@ above:
 - The dictionary `b2a_multi` shows that there are no tokens in `spacy_tokens`
   that map to multiple tokens in `other_tokens`.
 
+<Infobox title="Important note" variant="warning">
+
+The current implementation of the alignment algorithm assumes that both
+tokenizations add up to the same string. For example, you'll be able to align
+`["I", "'", "m"]` and `["I", "'m"]`, which both add up to `"I'm"`, but not
+`["I", "'m"]` and `["I", "am"]`.
+
+</Infobox>
+
 ## Merging and splitting {#retokenization new="2.1"}
 
 The [`Doc.retokenize`](/api/doc#retokenize) context manager lets you merge and
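
To make the constraint in the added note concrete, here is a minimal sketch using `spacy.gold.align`, the v2 helper the surrounding section documents. The values shown in the comments follow the documented semantics of `a2b_multi`/`b2a_multi` but are illustrative, and the exact failure mode for mismatched strings is an assumption that may vary by spaCy version.

```python
from spacy.gold import align

# Both tokenizations add up to the same string, "I'm", so they can be aligned.
other_tokens = ["I", "'", "m"]
spacy_tokens = ["I", "'m"]
cost, a2b, b2a, a2b_multi, b2a_multi = align(other_tokens, spacy_tokens)
print(a2b_multi)  # e.g. {1: 1, 2: 1} - "'" and "m" both map into "'m"
print(b2a_multi)  # e.g. {} - no token in spacy_tokens spans multiple tokens

# These tokenizations add up to different strings ("I'm" vs. "Iam"), so there
# is no character-level correspondence to recover. Depending on the spaCy
# version, this may raise an error or return a degenerate, high-cost alignment.
try:
    align(["I", "'m"], ["I", "am"])
except Exception:
    print("Tokenizations don't add up to the same string")
```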