spaCy/spacy/pipeline
Matthew Honnibal c1bf3a5602
Fix significant performance bug in parser training (#6010)
The parser training makes use of a trick for long documents, where we
use the oracle to cut up the document into sections, so that we can have
batch items in the middle of a document. For instance, if we have one
document of 600 words, we might make 6 states, starting at words 0, 100,
200, 300, 400 and 500.

The problem is that for v3, I screwed this up and didn't stop parsing at the end of each section! So
instead of a batch of [100, 100, 100, 100, 100, 100], we'd have a batch
of [600, 500, 400, 300, 200, 100]. Oops.
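To make the arithmetic above concrete, here is a minimal sketch that counts how many words each of the six states processes, with and without stopping at the end of its section. It is a toy model, not spaCy's code. `segment_starts` and `words_parsed` are hypothetical helpers, and the sketch assumes one step per word, whereas the real parser counts transitions.

```python
from typing import List


def segment_starts(n_words: int, max_length: int) -> List[int]:
    """Start offsets of the sub-document states (hypothetical helper)."""
    return list(range(0, n_words, max_length))


def words_parsed(n_words: int, max_length: int, stop_at_segment_end: bool) -> List[int]:
    """Number of words each state processes before it is finished (toy model)."""
    lengths = []
    for start in segment_starts(n_words, max_length):
        if stop_at_segment_end:
            # Intended behaviour: each state stops at the end of its own section.
            end = min(start + max_length, n_words)
        else:
            # The bug: each state keeps parsing to the end of the document.
            end = n_words
        lengths.append(end - start)
    return lengths


fixed = words_parsed(600, 100, stop_at_segment_end=True)
buggy = words_parsed(600, 100, stop_at_segment_end=False)
print(fixed)                   # [100, 100, 100, 100, 100, 100]
print(buggy)                   # [600, 500, 400, 300, 200, 100]
print(sum(fixed), sum(buggy))  # 600 2100
```

In this toy model the buggy batch does 2100 words of work instead of 600 for the same document.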

The implementation here could probably be improved; it's annoying to
have this extra variable in the state. But this'll do.
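One way to picture the fix is a per-state step budget, standing in for the "extra variable in the state". The sketch below is hypothetical: `ToyState`, `steps_left` and `run_batch` are illustrative names, not the attributes or functions in `transition_parser.pyx` or `_parser_internals`, and each toy transition consumes exactly one word.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyState:
    """Simplified stand-in for a parser state (hypothetical, not spaCy's StateC)."""
    position: int    # index of the next word to process
    n_words: int     # length of the full document
    steps_left: int  # the extra variable: how many steps this section may take

    def is_final(self) -> bool:
        # Finished when the document ends OR this section's budget is used up.
        return self.position >= self.n_words or self.steps_left <= 0

    def advance(self) -> None:
        # One toy "transition": consume one word and spend one step of the budget.
        self.position += 1
        self.steps_left -= 1


def run_batch(states: List[ToyState]) -> int:
    """Advance all unfinished states in lockstep; return the total steps taken."""
    total = 0
    active = [s for s in states if not s.is_final()]
    while active:
        for state in active:
            state.advance()
            total += 1
        active = [s for s in active if not s.is_final()]
    return total


states = [ToyState(position=start, n_words=600, steps_left=100) for start in range(0, 600, 100)]
print(run_batch(states))  # 600
```

Setting `steps_left` to a very large value reproduces the bug: the same six states then take 600 + 500 + 400 + 300 + 200 + 100 = 2100 steps.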

This makes the v3 parser training 5-10 times faster, depending on document
lengths. This problem wasn't in v2.
2020-09-02 12:57:13 +02:00
_parser_internals Fix significant performance bug in parser training (#6010) 2020-09-02 12:57:13 +02:00
__init__.py Add Lemmatizer and simplify related components (#5848) 2020-08-07 15:27:13 +02:00
attributeruler.py Merge pull request #5993 from explosion/feature/disabled-components 2020-08-29 15:58:41 +02:00
dep_parser.pyx Tidy up pipes (#5906) 2020-08-11 23:29:31 +02:00
entity_linker.py Merge pull request #5993 from explosion/feature/disabled-components 2020-08-29 15:58:41 +02:00
entityruler.py Use frozen list with custom errors 2020-08-29 15:20:11 +02:00
functions.py Option for returning only greedy matches (#5771) 2020-07-29 11:04:43 +02:00
lemmatizer.py Use init parameter (#5909) 2020-08-11 23:41:58 +02:00
morphologizer.pyx Tidy up pipes (#5906) 2020-08-11 23:29:31 +02:00
multitask.pyx Tidy up pipes (#5906) 2020-08-11 23:29:31 +02:00
ner.pyx Tidy up pipes (#5906) 2020-08-11 23:29:31 +02:00
pipe.pxd Tidy up pipes (#5906) 2020-08-11 23:29:31 +02:00
pipe.pyx Rename Transformer listener (#6001) 2020-08-31 12:41:39 +02:00
sentencizer.pyx Tidy up pipes (#5906) 2020-08-11 23:29:31 +02:00
senter.pyx Tidy up pipes (#5906) 2020-08-11 23:29:31 +02:00
simple_ner.py Tidy up pipes (#5906) 2020-08-11 23:29:31 +02:00
tagger.pyx Fix tagger initialization 2020-09-01 16:38:34 +02:00
textcat.py Tidy up pipes (#5906) 2020-08-11 23:29:31 +02:00
tok2vec.py Rename Transformer listener (#6001) 2020-08-31 12:41:39 +02:00
transition_parser.pxd Tidy up pipes (#5906) 2020-08-11 23:29:31 +02:00
transition_parser.pyx Fix significant performance bug in parser training (#6010) 2020-09-02 12:57:13 +02:00