mirror of
https://github.com/explosion/spaCy.git
synced 2025-10-28 22:47:52 +03:00
In most cases, the PhraseMatcher matches on the verbatim token text or, as of v2.1, sometimes on the lowercase text. This means that we only need a tokenized `Doc`, without any other attributes. If phrase patterns are created by processing large terminology lists with the full `nlp` object, this can easily make things a lot slower, because all components are applied even if we don't actually need the attributes they set (like part-of-speech tags or dependency labels). The warning message also includes a suggestion to use `nlp.make_doc` or `nlp.tokenizer.pipe` for even faster processing. For now, the validation has to be enabled explicitly by setting `validate=True`.
||
|---|---|---|
| __init__.py | ||
| test_matcher_api.py | ||
| test_matcher_logic.py | ||
| test_phrase_matcher.py | ||
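
The pattern-creation advice in the commit message above can be sketched roughly as follows. This is a minimal illustration, not code from the repository: the term list, the match keys `"TERMS"` and `"TERMS_LOWER"`, and the sample sentences are made up, and the `matcher.add(key, patterns)` call assumes the spaCy v3 signature (v2.x used `matcher.add(key, None, *patterns)`). A blank English pipeline is used so no trained model is needed.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Blank pipeline: tokenizer only, no tagger or parser (assumption for this sketch).
nlp = spacy.blank("en")

# Hypothetical terminology list.
terms = ["Barack Obama", "Angela Merkel"]

# validate=True opts in to the pattern validation described above.
matcher = PhraseMatcher(nlp.vocab, validate=True)

# Only tokenize the terms instead of running the full `nlp` object on them.
patterns = list(nlp.tokenizer.pipe(terms))
matcher.add("TERMS", patterns)  # spaCy v3 signature (assumption)

doc = nlp.make_doc("Angela Merkel met Barack Obama in Berlin")
matched = [doc[start:end].text for _, start, end in matcher(doc)]

# Matching on the lowercase text instead of the verbatim text.
lower_matcher = PhraseMatcher(nlp.vocab, attr="LOWER", validate=True)
lower_matcher.add("TERMS_LOWER", [nlp.make_doc(term) for term in terms])

shouting = nlp.make_doc("BARACK OBAMA gave a speech")
lower_matched = [shouting[start:end].text for _, start, end in lower_matcher(shouting)]
```

Both `nlp.make_doc` and `nlp.tokenizer.pipe` only tokenize, which is all the matcher needs when it compares token text or lowercase text.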