spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-03-03 19:31:35 +03:00

Adriane Boyd c62fd878a3 Allow Doc.char_span to snap to token boundaries (#5849)	2020-08-04 13:36:32 +02:00

* Allow Doc.char_span to snap to token boundaries

  Add a `mode` option to allow `Doc.char_span` to snap to token boundaries. The `mode` options:

  * `strict`: character offsets must match token boundaries (default, same as before)
  * `inside`: all tokens completely within the character span
  * `outside`: all tokens at least partially covered by the character span

  Add a new helper function `token_by_char` that returns the token corresponding to a character position in the text. Update `token_by_start` and `token_by_end` to use `token_by_char` for more efficient searching.

* Remove unused import

* Rename mode to alignment_mode

  Rename `mode` to `alignment_mode` with the options `strict`/`contract`/`expand`. Any unrecognized modes are silently converted to `strict`.
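
The three alignment modes named in the commit message above can be sketched in plain Python, without importing spaCy. This is a minimal illustration, not spaCy's implementation: the function name `snap_char_span`, the `TOKEN_OFFSETS` list, and the representation of tokens as `(start_char, end_char)` tuples are all assumptions made for the example. It also assumes that `contract` corresponds to the earlier `inside` option and `expand` to the earlier `outside` option, as the rename in the message suggests.

```python
from typing import List, Optional, Tuple

# Hypothetical token offsets for the text "I like New York":
# I = (0, 1), like = (2, 6), New = (7, 10), York = (11, 15)
TOKEN_OFFSETS: List[Tuple[int, int]] = [(0, 1), (2, 6), (7, 10), (11, 15)]


def snap_char_span(
    token_offsets: List[Tuple[int, int]],
    start_char: int,
    end_char: int,
    alignment_mode: str = "strict",
) -> Optional[Tuple[int, int]]:
    """Map a character span to a (start_token, end_token) pair, end exclusive.

    Illustrative sketch only; returns None when no token span qualifies.
    """
    # The commit message says unrecognized modes are silently treated as strict.
    if alignment_mode not in ("strict", "contract", "expand"):
        alignment_mode = "strict"

    if alignment_mode == "expand":
        # Tokens at least partially covered by the character span.
        hits = [
            i for i, (start, end) in enumerate(token_offsets)
            if start < end_char and end > start_char
        ]
    else:
        # Tokens completely within the character span.
        hits = [
            i for i, (start, end) in enumerate(token_offsets)
            if start >= start_char and end <= end_char
        ]

    if not hits:
        return None

    first, last = hits[0], hits[-1]
    if alignment_mode == "strict":
        # Strict additionally requires the offsets to sit on token boundaries.
        if token_offsets[first][0] != start_char or token_offsets[last][1] != end_char:
            return None
    return first, last + 1


if __name__ == "__main__":
    # Characters 8..13 cover "ew Yo", which cuts through two tokens.
    for mode in ("strict", "contract", "expand"):
        print(mode, snap_char_span(TOKEN_OFFSETS, 8, 13, mode))
```

With the sample offsets, a span that cuts through "New" and "York" yields no result under `strict` or `contract`, while `expand` widens it to cover both tokens. The linear scan keeps the sketch short; the commit message indicates that the real code uses a `token_by_char` helper for more efficient searching.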
annotation.md	Update tag maps and docs for English and German (#4501)	2019-10-24 12:56:05 +02:00
cli.md	Experimental character-based pretraining (#5700)	2020-07-05 15:48:39 +02:00
cython-classes.md	Remove u-strings and fix formatting [ci skip]	2019-09-12 16:11:15 +02:00
cython-structs.md	Documentation updates for v2.3.0 (#5593)	2020-06-16 15:37:35 +02:00
cython.md	💫 Update website (#3285)	2019-02-17 19:31:19 +01:00
dependencyparser.md	Remove u-strings and fix formatting [ci skip]	2019-09-12 16:11:15 +02:00
doc.md	Allow Doc.char_span to snap to token boundaries (#5849)	2020-08-04 13:36:32 +02:00
docbin.md	Fix DocBin.merge() example (#4599)	2019-11-07 11:26:48 +01:00
entitylinker.md	Fix typos and formatting [ci skip]	2019-10-01 12:30:04 +02:00
entityrecognizer.md	update docs for EntityRecognizer.predict	2020-03-28 18:13:02 +01:00
entityruler.md	Describing priority rules for overlapping matches (#5197)	2020-03-26 13:13:22 +01:00
goldcorpus.md	💫 Update website (#3285)	2019-02-17 19:31:19 +01:00
goldparse.md	Fix typos and auto-format [ci skip]	2020-06-16 16:38:45 +02:00
index.md	💫 Update website (#3285)	2019-02-17 19:31:19 +01:00
kb.md	Use consistent spelling	2019-10-02 10:37:39 +02:00
language.md	Amend documentation to Language.evaluate (#5319)	2020-04-16 20:00:18 +02:00
lemmatizer.md	Misspelling on Lemmatizer Example #4406 (#4449)	2019-10-16 23:23:15 +02:00
lexeme.md	Documentation updates for v2.3.0 (#5593)	2020-06-16 15:37:35 +02:00
lookups.md	Fix typos and formatting [ci skip]	2019-10-01 12:30:04 +02:00
matcher.md	Fix typos and auto-format [ci skip]	2020-06-16 16:38:45 +02:00
phrasematcher.md	Fix in docs: pipe(docs) instead of pipe(texts) (#5680)	2020-06-30 20:00:50 +02:00
pipeline-functions.md	Remove u-strings and fix formatting [ci skip]	2019-09-12 16:11:15 +02:00
scorer.md	Update scorer.md [ci skip]	2019-11-21 17:02:43 +01:00
sentencizer.md	Documentation updates for v2.3.0 (#5593)	2020-06-16 15:37:35 +02:00
span.md	Fix formatting and update docs for v2.2.4	2020-03-09 11:17:20 +01:00
stringstore.md	Remove u-strings and fix formatting [ci skip]	2019-09-12 16:11:15 +02:00
tagger.md	Remove u-strings and fix formatting [ci skip]	2019-09-12 16:11:15 +02:00
textcategorizer.md	Remove u-strings and fix formatting [ci skip]	2019-09-12 16:11:15 +02:00
token.md	Documentation updates for v2.3.0 (#5593)	2020-06-16 15:37:35 +02:00
tokenizer.md	Rename to url_match	2020-05-22 12:41:03 +02:00
top-level.md	Fix formatting and update docs for v2.2.4	2020-03-09 11:17:20 +01:00
vectors.md	Update nlp.vectors to nlp.vocab.vectors (#5357)	2020-04-27 10:53:05 +02:00
vocab.md	Documentation updates for v2.3.0 (#5593)	2020-06-16 15:37:35 +02:00