spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-08 22:03:24 +03:00


Adriane Boyd c62fd878a3 Allow Doc.char_span to snap to token boundaries (#5849)	2020-08-04 13:36:32 +02:00

* Allow Doc.char_span to snap to token boundaries

  Add a `mode` option to allow `Doc.char_span` to snap to token boundaries. The `mode` options:

  * `strict`: character offsets must match token boundaries (default, same as before)
  * `inside`: all tokens completely within the character span
  * `outside`: all tokens at least partially covered by the character span

  Add a new helper function `token_by_char` that returns the token corresponding to a character position in the text. Update `token_by_start` and `token_by_end` to use `token_by_char` for more efficient searching.

* Remove unused import

* Rename mode to alignment_mode

  Rename `mode` to `alignment_mode` with the options `strict`/`contract`/`expand`. Any unrecognized modes are silently converted to `strict`.
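The three alignment modes in the commit above can be illustrated with a small standalone sketch. This is not spaCy's implementation. It assumes a simplified token model in which each token is a `(start_char, end_char)` pair with an exclusive end, sorted and non-overlapping. The functions `char_span` and `token_by_char` below are hypothetical stand-ins named after the concepts in the commit. The real `Doc.char_span` returns a `Span` (or `None`), not an index pair. Binary search over the token offsets stands in for the "more efficient searching" the commit mentions.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple

# Hypothetical token model for illustration: each token is a (start_char, end_char)
# pair with an exclusive end, sorted and non-overlapping. This is not spaCy's Doc.
Token = Tuple[int, int]


def token_by_char(tokens: List[Token], char_idx: int) -> int:
    """Index of the token containing char_idx, or -1 for whitespace/out of range."""
    starts = [s for s, _ in tokens]
    i = bisect_right(starts, char_idx) - 1  # last token starting at or before char_idx
    if i >= 0 and char_idx < tokens[i][1]:
        return i
    return -1


def char_span(
    tokens: List[Token], start: int, end: int, alignment_mode: str = "strict"
) -> Optional[Tuple[int, int]]:
    """Map a character range [start, end) to a token range [i, j), or None."""
    starts = [s for s, _ in tokens]
    ends = [e for _, e in tokens]
    if alignment_mode not in ("strict", "contract", "expand"):
        alignment_mode = "strict"  # unrecognized modes fall back to strict
    if alignment_mode == "strict":
        # Both offsets must coincide exactly with token boundaries.
        i = bisect_left(starts, start)
        j = bisect_left(ends, end)
        if i == len(tokens) or starts[i] != start:
            return None
        if j == len(tokens) or ends[j] != end:
            return None
    elif alignment_mode == "contract":
        # Keep only tokens lying completely inside [start, end).
        i = bisect_left(starts, start)    # first token with start_char >= start
        j = bisect_right(ends, end) - 1   # last token with end_char <= end
    else:  # "expand"
        # Keep every token that overlaps [start, end) at all.
        i = bisect_right(ends, start)     # first token with end_char > start
        j = bisect_left(starts, end) - 1  # last token with start_char < end
    if i > j:
        return None
    return (i, j + 1)
```

For the text `"New York City"` (tokens `(0, 3)`, `(4, 8)`, `(9, 13)`), the character range `1..8` (`"ew York"`) is rejected in `strict` mode. In `contract` mode it shrinks to the token range `(1, 2)` ("York"), and in `expand` mode it grows to `(0, 2)` ("New York").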
api	Allow Doc.char_span to snap to token boundaries (#5849)	2020-08-04 13:36:32 +02:00
images	Remove box-decoration-break from entities in displacy (#4564)	2019-10-31 15:09:43 +01:00
models	Divide models into core and starters [ci skip]	2019-12-21 14:10:22 +01:00
usage	fix the wrong hash url in adding-languages.md file (#5810)	2020-07-25 13:13:38 +02:00
index.md	💫 Update website (#3285)	2019-02-17 19:31:19 +01:00
styleguide.md	💫 v2.1.0 launch updates (only merge on launch!) (#3414)	2019-03-18 16:07:26 +01:00