spaCy/spacy/pipeline/_parser_internals
Matthew Honnibal c1bf3a5602
Fix significant performance bug in parser training (#6010)
The parser training makes use of a trick for long documents, where we
use the oracle to cut up the document into sections, so that we can have
batch items in the middle of a document. For instance, if we have one
document of 600 words, we might make 6 states, starting at words 0, 100,
200, 300, 400 and 500.

The problem is that in v3, I screwed this up and didn't stop parsing at
the end of each section! So instead of a batch of
[100, 100, 100, 100, 100, 100], we'd have a batch of
[600, 500, 400, 300, 200, 100]. Oops.
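
To make the arithmetic concrete, here is a minimal Python sketch of the two
cases. It is not spaCy's actual Cython code: the helper names
`segment_starts` and `words_parsed` and the `stop_at_boundary` flag are
hypothetical, and it only counts words per batch item rather than running a
parser.

```python
def segment_starts(doc_length, max_length):
    """Word offsets at which each state begins, e.g. 0, 100, 200, ..."""
    return list(range(0, doc_length, max_length))


def words_parsed(doc_length, max_length, stop_at_boundary):
    """Number of words each state covers before it stops.

    stop_at_boundary=True  -> each state stops at the end of its section.
    stop_at_boundary=False -> each state runs on to the end of the document
                              (the behaviour described as the bug).
    """
    starts = segment_starts(doc_length, max_length)
    if stop_at_boundary:
        return [min(start + max_length, doc_length) - start for start in starts]
    return [doc_length - start for start in starts]


fixed = words_parsed(600, 100, stop_at_boundary=True)
buggy = words_parsed(600, 100, stop_at_boundary=False)

print(fixed, sum(fixed))  # [100, 100, 100, 100, 100, 100] 600
print(buggy, sum(buggy))  # [600, 500, 400, 300, 200, 100] 2100
```

In this toy count the unbounded version touches 2100 words instead of 600.
The gap widens as documents get longer relative to the section length.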

The implementation here could probably be improved; it's annoying to
have this extra variable in the state. But this'll do.

This makes the v3 parser training 5-10 times faster, depending on document
lengths. This problem wasn't in v2.
2020-09-02 12:57:13 +02:00
__init__.py The Parser is now a Pipe (2) (#5844) 2020-07-30 23:30:54 +02:00
_state.pxd Fix significant performance bug in parser training (#6010) 2020-09-02 12:57:13 +02:00
_state.pyx The Parser is now a Pipe (2) (#5844) 2020-07-30 23:30:54 +02:00
arc_eager.pxd The Parser is now a Pipe (2) (#5844) 2020-07-30 23:30:54 +02:00
arc_eager.pyx The Parser is now a Pipe (2) (#5844) 2020-07-30 23:30:54 +02:00
ner.pxd The Parser is now a Pipe (2) (#5844) 2020-07-30 23:30:54 +02:00
ner.pyx The Parser is now a Pipe (2) (#5844) 2020-07-30 23:30:54 +02:00
nonproj.pxd The Parser is now a Pipe (2) (#5844) 2020-07-30 23:30:54 +02:00
nonproj.pyx The Parser is now a Pipe (2) (#5844) 2020-07-30 23:30:54 +02:00
stateclass.pxd The Parser is now a Pipe (2) (#5844) 2020-07-30 23:30:54 +02:00
stateclass.pyx Fix significant performance bug in parser training (#6010) 2020-09-02 12:57:13 +02:00
transition_system.pxd The Parser is now a Pipe (2) (#5844) 2020-07-30 23:30:54 +02:00
transition_system.pyx The Parser is now a Pipe (2) (#5844) 2020-07-30 23:30:54 +02:00