Mirror of https://github.com/explosion/spaCy.git (synced 2024-11-13 13:17:06 +03:00)
Commit c62fd878a3:

* Allow Doc.char_span to snap to token boundaries

  Add a `mode` option to allow `Doc.char_span` to snap to token boundaries. The `mode` options:

  * `strict`: character offsets must match token boundaries (default, same as before)
  * `inside`: all tokens completely within the character span
  * `outside`: all tokens at least partially covered by the character span

  Add a new helper function `token_by_char` that returns the token corresponding to a character position in the text. Update `token_by_start` and `token_by_end` to use `token_by_char` for more efficient searching.

* Remove unused import

* Rename mode to alignment_mode

  Rename `mode` to `alignment_mode` with the options `strict`/`contract`/`expand`. Any unrecognized modes are silently converted to `strict`.
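The alignment modes described in this commit can be illustrated with a small, library-independent Python sketch. It does not use spaCy: tokens are modelled as hypothetical `(start_char, end_char)` offset pairs, `whitespace_offsets` is a toy tokenizer, and `token_by_char` and `char_span_tokens` are illustrative stand-ins rather than spaCy's Cython helpers. In spaCy itself the behaviour is selected through the `alignment_mode` argument of `Doc.char_span`.

```python
import re
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical token model: each token is a (start_char, end_char) pair,
# end exclusive, sorted by start offset. This is NOT spaCy's TokenC struct.
Offsets = List[Tuple[int, int]]


def whitespace_offsets(text: str) -> Offsets:
    """Toy tokenizer (illustration only): one token per run of non-space characters."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def token_by_char(offsets: Offsets, char_idx: int) -> int:
    """Return the index of the token covering char_idx, or -1 if none does
    (e.g. the character is whitespace). Binary search over token starts."""
    starts = [start for start, _ in offsets]
    i = bisect_right(starts, char_idx) - 1
    if i >= 0 and offsets[i][0] <= char_idx < offsets[i][1]:
        return i
    return -1


def char_span_tokens(
    offsets: Offsets, start_char: int, end_char: int, alignment_mode: str = "strict"
) -> Optional[Tuple[int, int]]:
    """Map a character span to a token range (first, last_exclusive), or None."""
    if alignment_mode not in ("strict", "contract", "expand"):
        # Unrecognized modes fall back to "strict", as the commit message describes.
        alignment_mode = "strict"
    if alignment_mode == "strict":
        # Both character offsets must land exactly on token boundaries.
        first = token_by_char(offsets, start_char)
        last = token_by_char(offsets, end_char - 1)
        if first < 0 or last < first:
            return None
        if offsets[first][0] != start_char or offsets[last][1] != end_char:
            return None
        return first, last + 1
    if alignment_mode == "contract":
        # Keep only tokens lying completely inside the character span.
        hits = [i for i, (s, e) in enumerate(offsets) if s >= start_char and e <= end_char]
    else:
        # "expand": keep every token at least partially covered by the span.
        hits = [i for i, (s, e) in enumerate(offsets) if s < end_char and e > start_char]
    if not hits:
        return None
    return hits[0], hits[-1] + 1


text = "Hello wonderful world"
offsets = whitespace_offsets(text)  # [(0, 5), (6, 15), (16, 21)]
for mode in ("strict", "contract", "expand"):
    print(mode, char_span_tokens(offsets, 4, 21, mode))
# strict None
# contract (1, 3)
# expand (0, 3)
```

The character span `4..21` starts inside "Hello", so `strict` rejects it, `contract` drops the partially covered token, and `expand` pulls it in. The sketch uses binary search in `token_by_char`, in the spirit of the commit's "more efficient searching", while `contract` and `expand` use plain linear scans to keep the example short.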

* annotation.md
* cli.md
* cython-classes.md
* cython-structs.md
* cython.md
* dependencyparser.md
* doc.md
* docbin.md
* entitylinker.md
* entityrecognizer.md
* entityruler.md
* goldcorpus.md
* goldparse.md
* index.md
* kb.md
* language.md
* lemmatizer.md
* lexeme.md
* lookups.md
* matcher.md
* phrasematcher.md
* pipeline-functions.md
* scorer.md
* sentencizer.md
* span.md
* stringstore.md
* tagger.md
* textcategorizer.md
* token.md
* tokenizer.md
* top-level.md
* vectors.md
* vocab.md