Mirror of https://github.com/explosion/spaCy.git (synced 2025-10-25 13:11:03 +03:00)
* Speed up the StateC::L feature function

  This function gets the n-th most-recent left-arc with a particular head. Before this change, StateC::L would construct a vector of all left-arcs with the given head and then pick the n-th most recent from that vector. Since the number of left-arcs strongly correlates with the doc length and the feature is constructed for every transition, this can make transition-parsing quadratic.

  With this change StateC::L:

  - Searches left-arcs backwards.
  - Stops early when the n-th matching transition is found.
  - Does not construct a vector (reducing memory pressure).

  This change doesn't avoid the linear search when the transition that is queried does not occur in the left-arcs. Regardless, performance is improved quite a bit with very long docs:

  | N | Time (before) | Time (after) |
  |---|---|---|
  | 400 | 3.3 | 3.2 |
  | 800 | 5.4 | 5.0 |
  | 1600 | 11.6 | 9.5 |
  | 3200 | 30.7 | 23.2 |

  We can probably do better with more tailored data structures, but I first wanted to make a low-impact PR.

  Found while investigating #9858.

* StateC::L: simplify loop

| | | |
|---|---|---|
| _parser_internals | ||
| __init__.py | ||
| attributeruler.py | ||
| dep_parser.pyx | ||
| entity_linker.py | ||
| entityruler.py | ||
| functions.py | ||
| lemmatizer.py | ||
| morphologizer.pyx | ||
| multitask.pyx | ||
| ner.pyx | ||
| pipe.pxd | ||
| pipe.pyi | ||
| pipe.pyx | ||
| sentencizer.pyx | ||
| senter.pyx | ||
| spancat.py | ||
| tagger.pyx | ||
| textcat_multilabel.py | ||
| textcat.py | ||
| tok2vec.py | ||
| trainable_pipe.pxd | ||
| trainable_pipe.pyx | ||
| transition_parser.pxd | ||
| transition_parser.pyx | ||
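
The backward search described in the commit message above can be illustrated with a small Python sketch. This is not the actual `StateC::L` code, which lives in the Cython/C++ parser internals. The `Arc` type, both function names, and the `-1` "not found" return value are assumptions made for illustration only.

```python
from typing import List, NamedTuple


class Arc(NamedTuple):
    """Hypothetical stand-in for a stored left-arc (head and child token indices)."""
    head: int
    child: int


def nth_left_arc_collect(left_arcs: List[Arc], head: int, n: int) -> int:
    """Old strategy as described: collect all matching arcs, then pick one.

    Returns the child of the n-th most recent left-arc with the given head,
    or -1 (an assumed sentinel) if there is no such arc.
    """
    # Builds a full list on every call, even when only one element is needed.
    matches = [arc.child for arc in left_arcs if arc.head == head]
    if n < 1 or n > len(matches):
        return -1
    return matches[-n]


def nth_left_arc_backward(left_arcs: List[Arc], head: int, n: int) -> int:
    """New strategy as described: scan backwards and stop at the n-th match."""
    if n < 1:
        return -1
    seen = 0
    # Walk from the most recent arc to the oldest; no temporary list is built.
    for arc in reversed(left_arcs):
        if arc.head == head:
            seen += 1
            if seen == n:
                return arc.child  # early exit
    # Fewer than n matches: the whole list was scanned (still linear).
    return -1


arcs = [Arc(5, 1), Arc(5, 2), Arc(7, 6), Arc(5, 4)]
most_recent = nth_left_arc_backward(arcs, head=5, n=1)
second_most_recent = nth_left_arc_backward(arcs, head=5, n=2)
missing = nth_left_arc_backward(arcs, head=9, n=1)
```

Both functions return the same answers. The backward version usually touches only the tail of the arc list and allocates nothing, but, as the commit message notes, a query for a head with no left-arcs still scans every stored arc.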