spaCy/spacy/pipeline/_parser_internals
Daniël de Kok 28299644fc
Speed up the StateC::L feature function (#10019)
* Speed up the StateC::L feature function

This function gets the n-th most-recent left-arc with a particular head.
Before this change, StateC::L would construct a vector of all left-arcs
with the given head and then pick the n-th most recent from that vector.
Since the number of left-arcs grows with the doc length and the
feature is computed for every transition, this can make
transition-based parsing quadratic in the doc length.

With this change StateC::L:

- Searches left-arcs backwards.
- Stops early when the n-th matching transition is found.
- Does not construct a vector (reducing memory pressure).
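
The difference between the two strategies can be sketched in plain
Python. This is only an illustration, not the Cython code from the PR:
the Arc tuple and the function names nth_left_child_old and
nth_left_child_new are hypothetical, and the sketch assumes left-arcs
are stored in the order they were created, so the most recent arc is
last.

```python
from typing import List, NamedTuple


class Arc(NamedTuple):
    # Hypothetical stand-in for a stored left-arc.
    head: int
    child: int


def nth_left_child_old(left_arcs: List[Arc], head: int, idx: int) -> int:
    """Old strategy: collect every matching arc, then index from the end."""
    if idx < 1:
        return -1
    # Always scans all arcs and allocates a list, even when idx == 1.
    matches = [arc.child for arc in left_arcs if arc.head == head]
    if idx > len(matches):
        return -1
    return matches[-idx]


def nth_left_child_new(left_arcs: List[Arc], head: int, idx: int) -> int:
    """New strategy: walk backwards and stop at the idx-th match."""
    if idx < 1:
        return -1
    seen = 0
    for arc in reversed(left_arcs):
        if arc.head == head:
            seen += 1
            if seen == idx:
                return arc.child  # early exit, no intermediate list
    # Still a full linear scan when there are fewer than idx matches.
    return -1


arcs = [Arc(5, 1), Arc(5, 2), Arc(7, 6), Arc(5, 4)]
print(nth_left_child_new(arcs, head=5, idx=1))
print(nth_left_child_new(arcs, head=5, idx=4))
```

Both functions return the same result for every query; only the amount
of work per call differs.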

This change doesn't avoid the linear search when the arc that is
queried does not occur in the left-arcs. Regardless, performance
improves noticeably on very long docs, and the gain grows with N:

Before:

   N  Time

 400   3.3
 800   5.4
1600  11.6
3200  30.7

After:

   N  Time

 400   3.2
 800   5.0
1600   9.5
3200  23.2
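
To read the two tables side by side, the relative reduction per doc
length and the growth in time per doubling of N can be computed
directly from the numbers above. This is a small sketch; the helper
names reduction_percent and doubling_factors are made up for
illustration, and the timings are copied from the tables as-is (their
unit is not stated here).

```python
# Timings copied from the tables above (unit not stated in the source).
before = {400: 3.3, 800: 5.4, 1600: 11.6, 3200: 30.7}
after = {400: 3.2, 800: 5.0, 1600: 9.5, 3200: 23.2}


def reduction_percent(old: float, new: float) -> float:
    """Relative reduction in time, as a percentage of the old time."""
    return 100.0 * (old - new) / old


def doubling_factors(times: dict) -> list:
    """Ratio of each timing to the previous one as N doubles."""
    sizes = sorted(times)
    return [times[b] / times[a] for a, b in zip(sizes, sizes[1:])]


for n in sorted(before):
    print(n, round(reduction_percent(before[n], after[n]), 1))

# A factor above 2 per doubling of N indicates super-linear growth.
print([round(f, 2) for f in doubling_factors(before)])
print([round(f, 2) for f in doubling_factors(after)])
```

A doubling factor that stays above 2 at the largest N, both before and
after, is consistent with the remaining linear search mentioned above.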

We can probably do better with more tailored data structures, but I
first wanted to make a low-impact PR.

Found while investigating #9858.

* StateC::L: simplify loop
2022-01-13 09:03:55 +01:00
__init__.pxd Add beam_parser and beam_ner components for v3 (#6369) 2020-12-13 09:08:32 +08:00
__init__.py The Parser is now a Pipe (2) (#5844) 2020-07-30 23:30:54 +02:00
_beam_utils.pxd Add beam_parser and beam_ner components for v3 (#6369) 2020-12-13 09:08:32 +08:00
_beam_utils.pyx Getting scores out of beam_ner (#6575) 2021-01-06 12:02:32 +01:00
_state.pxd Speed up the StateC::L feature function (#10019) 2022-01-13 09:03:55 +01:00
_state.pyx The Parser is now a Pipe (2) (#5844) 2020-07-30 23:30:54 +02:00
arc_eager.pxd Getting scores out of beam_parser (#6684) 2021-01-07 16:28:27 +11:00
arc_eager.pyx Use dict.copy().items() instead of list(.items()) (#9868) 2021-12-16 09:17:33 +01:00
ner.pxd The Parser is now a Pipe (2) (#5844) 2020-07-30 23:30:54 +02:00
ner.pyx Support negative examples in partial NER annotations (#8106) 2021-06-17 17:33:00 +10:00
nonproj.pxd The Parser is now a Pipe (2) (#5844) 2020-07-30 23:30:54 +02:00
nonproj.pyx Clean up spacy.tokens (#6046) 2020-09-16 20:32:38 +02:00
stateclass.pxd Add beam_parser and beam_ner components for v3 (#6369) 2020-12-13 09:08:32 +08:00
stateclass.pyx Add beam_parser and beam_ner components for v3 (#6369) 2020-12-13 09:08:32 +08:00
transition_system.pxd Support negative examples in partial NER annotations (#8106) 2021-06-17 17:33:00 +10:00
transition_system.pyx Support negative examples in partial NER annotations (#8106) 2021-06-17 17:33:00 +10:00