Mirror of https://github.com/explosion/spaCy.git, synced 2025-10-26 05:31:15 +03:00
The `Matcher` in `merge_subtokens()` returns all possible subsequences of `subtok`, so for sequences of two or more subtoks it's necessary to filter the matches so that the retokenizer is only merging the longest matches with no overlapping spans.

| | | |
|---|---|---|
| .. | | |
| __init__.py | | |
| entityruler.py | | |
| functions.py | | |
| hooks.py | | |
| morphologizer.pyx | | |
| pipes.pyx | | |
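
The filtering step described in the commit message above can be illustrated with a minimal sketch in plain Python. This is not the code from `functions.py`: the function name `filter_longest_spans` is hypothetical, and matches are assumed to be `(start, end)` token offsets with an exclusive `end`. spaCy itself offers a `filter_spans` helper in `spacy.util` for this kind of filtering; the sketch only shows the idea of preferring the longest matches and dropping anything that overlaps them.

```python
def filter_longest_spans(spans):
    """Keep the longest spans and drop any span that overlaps a kept one.

    `spans` is an iterable of (start, end) token offsets, `end` exclusive.
    Hypothetical helper for illustration only.
    """
    # Longest spans first; among equal lengths, the earliest start wins.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept = []
    seen_tokens = set()
    for start, end in ordered:
        # Skip the span if any of its tokens already belongs to a kept span.
        if any(i in seen_tokens for i in range(start, end)):
            continue
        kept.append((start, end))
        seen_tokens.update(range(start, end))
    # Return the surviving spans in document order.
    return sorted(kept)


# A "one or more subtok" pattern over a run of three subtoks (tokens 2-4)
# and a run of two subtoks (tokens 7-8) yields every subsequence:
matches = [
    (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5),
    (7, 8), (7, 9), (8, 9),
]
print(filter_longest_spans(matches))  # [(2, 5), (7, 9)]
```

Only the two maximal runs survive, so each run would be merged once and no token would be claimed by two merges.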