spaCy/website/docs
Adriane Boyd c62fd878a3
Allow Doc.char_span to snap to token boundaries (#5849)
* Allow Doc.char_span to snap to token boundaries

Add a `mode` option to allow `Doc.char_span` to snap to token
boundaries. The `mode` options:

* `strict`: character offsets must match token boundaries (default, same as
before)
* `inside`: all tokens completely within the character span
* `outside`: all tokens at least partially covered by the character span

Add a new helper function `token_by_char` that returns the token
corresponding to a character position in the text. Update
`token_by_start` and `token_by_end` to use `token_by_char` for more
efficient searching.
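
The lookup described above can be sketched in plain Python as a binary search over sorted token offsets. This is only an illustration, not the code from this change: `token_index_at_char`, `token_starts` and `token_ends` are hypothetical names, and treating the whitespace between tokens as belonging to no token is an assumption of the sketch.

```python
from bisect import bisect_right


def token_index_at_char(token_starts, token_ends, char_idx):
    """Return the index of the token covering char_idx, or -1 if none does.

    token_starts / token_ends are parallel, sorted lists of character
    offsets (end offsets are exclusive). Hypothetical helper for illustration.
    """
    # Find the last token whose start offset is <= char_idx.
    i = bisect_right(token_starts, char_idx) - 1
    if i < 0:
        return -1
    # Assumption: a position past the token's end (e.g. whitespace) matches nothing.
    if char_idx < token_ends[i]:
        return i
    return -1


# Offsets for the text "Hello world": "Hello" is [0, 5), "world" is [6, 11).
starts = [0, 6]
ends = [5, 11]
first = token_index_at_char(starts, ends, 4)
second = token_index_at_char(starts, ends, 6)
gap = token_index_at_char(starts, ends, 5)
```

Because the offsets are sorted, each lookup takes a logarithmic number of comparisons instead of a linear scan over all tokens.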

* Remove unused import

* Rename mode to alignment_mode

Rename `mode` to `alignment_mode` and rename its options to
`strict`/`contract`/`expand` (previously `strict`/`inside`/`outside`).
Any unrecognized mode is silently converted to `strict`.
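
To make the three alignment modes concrete, here is a minimal pure-Python sketch of the snapping behaviour described in this message. It is not the code from this change: `snap_char_span` and its `(start, end)` token-offset representation are hypothetical, and it uses a simple linear scan for readability.

```python
def snap_char_span(tokens, start_char, end_char, alignment_mode="strict"):
    """Map a character span onto token indices.

    tokens: sorted list of (start, end) character offsets, end exclusive.
    Returns (first_token, last_token_exclusive) or None if nothing matches.
    Hypothetical helper for illustration only.
    """
    # As described in the commit message, unknown modes fall back to "strict".
    if alignment_mode not in ("strict", "contract", "expand"):
        alignment_mode = "strict"

    if alignment_mode == "strict":
        # Both offsets must sit exactly on token boundaries.
        starts = {s: i for i, (s, _) in enumerate(tokens)}
        ends = {e: i for i, (_, e) in enumerate(tokens)}
        if start_char in starts and end_char in ends:
            first, last = starts[start_char], ends[end_char]
            if first <= last:
                return first, last + 1
        return None

    if alignment_mode == "contract":
        # Keep only tokens that lie completely within the character span.
        hits = [i for i, (s, e) in enumerate(tokens)
                if s >= start_char and e <= end_char]
    else:
        # "expand": keep tokens that are at least partially covered.
        hits = [i for i, (s, e) in enumerate(tokens)
                if s < end_char and e > start_char]

    if not hits:
        return None
    return hits[0], hits[-1] + 1


# Offsets for "New York City": New [0, 3), York [4, 8), City [9, 13).
toks = [(0, 3), (4, 8), (9, 13)]
strict_result = snap_char_span(toks, 2, 8, "strict")
contract_result = snap_char_span(toks, 2, 8, "contract")
expand_result = snap_char_span(toks, 2, 8, "expand")
```

With the character span `[2, 8)`, which starts in the middle of "New", the strict mode finds no match, `contract` keeps only "York", and `expand` widens the span to "New York".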
2020-08-04 13:36:32 +02:00
api Allow Doc.char_span to snap to token boundaries (#5849) 2020-08-04 13:36:32 +02:00
images Remove box-decoration-break from entities in displacy (#4564) 2019-10-31 15:09:43 +01:00
models Divide models into core and starters [ci skip] 2019-12-21 14:10:22 +01:00
usage fix the wrong hash url in adding-languages.md file (#5810) 2020-07-25 13:13:38 +02:00
index.md
styleguide.md