Mirror of https://github.com/explosion/spaCy.git (synced 2025-11-02 08:57:48 +03:00)
The `Matcher` in `merge_subtokens()` returns all possible subsequences of `subtok`, so for sequences of two or more subtoks it's necessary to filter the matches so that the retokenizer is only merging the longest matches with no overlapping spans.

| Name | | |
|---|---|---|
| __init__.py | ||
| test_analysis.py | ||
| test_entity_linker.py | ||
| test_entity_ruler.py | ||
| test_factories.py | ||
| test_functions.py | ||
| test_pipe_methods.py | ||
| test_sentencizer.py | ||
| test_textcat.py | ||
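
The filtering step described in the commit message can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from the commit: spans are modelled as `(start, end)` token offsets with an exclusive end, and `filter_longest_spans` is a hypothetical helper name. spaCy itself provides `spacy.util.filter_spans`, which applies the same "longest first, no overlaps" rule to `Span` objects.

```python
def filter_longest_spans(spans):
    """Keep the longest spans and drop any span overlapping one already kept.

    `spans` is an iterable of (start, end) token offsets, end exclusive.
    Hypothetical helper for illustration; not part of spaCy.
    """
    # Longest spans first; ties are broken by the earlier start offset.
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept = []
    seen_tokens = set()
    for start, end in ordered:
        # Accept a span only if none of its tokens is already covered.
        if all(i not in seen_tokens for i in range(start, end)):
            kept.append((start, end))
            seen_tokens.update(range(start, end))
    # Return the surviving spans in document order.
    return sorted(kept)


# Assumed input: three consecutive `subtok` tokens at positions 2-4 produce
# every sub-run as a match, plus one isolated match at position 8.
matches = [(2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5), (8, 9)]
filtered = filter_longest_spans(matches)
print(filtered)
```

Only the spans in `filtered` would then be passed to the retokenizer, so each run of subtokens is merged exactly once.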