Mirror of https://github.com/explosion/spaCy.git, synced 2025-10-25 13:11:03 +03:00
* Allow `Doc.char_span` to snap to token boundaries

  Add a `mode` option to allow `Doc.char_span` to snap to token boundaries. The `mode` options are:

  * `strict`: character offsets must match token boundaries (default, same as before)
  * `inside`: all tokens completely within the character span
  * `outside`: all tokens at least partially covered by the character span

  Add a new helper function `token_by_char` that returns the token corresponding to a character position in the text. Update `token_by_start` and `token_by_end` to use `token_by_char` for more efficient searching.

* Remove unused import

* Rename `mode` to `alignment_mode`

  Rename `mode` to `alignment_mode` with the options `strict`/`contract`/`expand`. Any unrecognized modes are silently converted to `strict`.
| File |
|---|
| `__init__.py` |
| `test_add_entities.py` |
| `test_array.py` |
| `test_creation.py` |
| `test_doc_api.py` |
| `test_morphanalysis.py` |
| `test_pickle_doc.py` |
| `test_retokenize_merge.py` |
| `test_retokenize_split.py` |
| `test_span.py` |
| `test_to_json.py` |
| `test_token_api.py` |
| `test_underscore.py` |
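The alignment modes described in the commit message above can be illustrated with a small, spaCy-free sketch. In spaCy itself the behavior is reached through `doc.char_span(start, end, alignment_mode=...)`, which returns a `Span` (or `None`). The code below is not spaCy's implementation: it assumes tokens are represented as sorted, non-overlapping `(start_char, end_char)` pairs with exclusive ends, and the function `char_span_tokens` and this Python `token_by_char` are hypothetical stand-ins that return token indices rather than `Span` objects. `token_by_char` uses a binary search, in the spirit of the "more efficient searching" mentioned in the commit.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Assumed representation (hypothetical): one (start_char, end_char) pair per
# token, end exclusive, sorted by position and non-overlapping.
Tokens = List[Tuple[int, int]]


def token_by_char(tokens: Tokens, char_idx: int) -> int:
    """Return the index of the token covering char_idx, or -1 if none does
    (for example, when char_idx points at whitespace between tokens)."""
    starts = [start for start, _ in tokens]
    # Binary search for the last token that starts at or before char_idx.
    i = bisect_right(starts, char_idx) - 1
    if i >= 0 and char_idx < tokens[i][1]:
        return i
    return -1


def char_span_tokens(
    tokens: Tokens, start_char: int, end_char: int, alignment_mode: str = "strict"
) -> Optional[Tuple[int, int]]:
    """Map the character range [start_char, end_char) to a token range
    [first, last), or return None if no valid span exists."""
    if alignment_mode not in ("strict", "contract", "expand"):
        alignment_mode = "strict"  # unrecognized modes fall back to strict
    if alignment_mode == "expand":
        # Every token that is at least partially covered by the range.
        hits = [i for i, (s, e) in enumerate(tokens) if s < end_char and e > start_char]
    else:
        # "contract" and "strict": only tokens lying completely inside the range.
        hits = [i for i, (s, e) in enumerate(tokens) if s >= start_char and e <= end_char]
    if not hits:
        return None
    first, last = hits[0], hits[-1] + 1
    # Strict mode additionally requires both offsets to sit on token boundaries.
    if alignment_mode == "strict" and (
        tokens[first][0] != start_char or tokens[last - 1][1] != end_char
    ):
        return None
    return first, last


if __name__ == "__main__":
    toks = [(0, 1), (2, 6), (7, 10), (11, 15)]  # "I like New York"
    print(char_span_tokens(toks, 7, 15))              # (2, 4): "New York"
    print(char_span_tokens(toks, 8, 15))              # None: 8 is not a token start
    print(char_span_tokens(toks, 8, 15, "contract"))  # (3, 4): "York"
    print(char_span_tokens(toks, 8, 15, "expand"))    # (2, 4): "New York"
    print(token_by_char(toks, 6))                     # -1: whitespace
```

Folding `contract` and `strict` into one filter keeps the sketch short: a strict span is simply a contracted span whose first token starts exactly at `start_char` and whose last token ends exactly at `end_char`.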