Commit Graph

55 Commits

Author SHA1 Message Date
Ines Montani
ad2a514cdf Show warning if phrase pattern Doc was overprocessed (#3255)
In most cases, the PhraseMatcher will match on the verbatim token text or as of v2.1, sometimes the lowercase text. This means that we only need a tokenized Doc, without any other attributes.

If phrase patterns are created by processing large terminology lists with the full `nlp` object, this easily can make things a lot slower, because all components will be applied, even if we don't actually need the attributes they set (like part-of-speech tags, dependency labels).

The warning message also includes a suggestion to use nlp.make_doc or nlp.tokenizer.pipe for even faster processing. For now, the validation has to be enabled explicitly by setting validate=True.
2019-02-13 01:45:31 +11:00
Ines Montani
694139aad3 Fix formatting [ci skip] 2019-02-08 16:32:36 +01:00
Ines Montani
2898768757 Remove unused attribute [ci skip] 2019-02-08 16:31:30 +01:00
Ines Montani
25602c794c Tidy up and fix small bugs and typos 2019-02-08 14:14:49 +01:00
Ines Montani
1ea4df459d 💫 Break up large matcher.pyx (#3236)
* Break up large matcher.pyx

* Remove unused function
2019-02-07 19:42:25 +11:00