spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-15 18:52:29 +03:00

History

Matthew Honnibal c1bf3a5602 Fix significant performance bug in parser training (#6010 ) The parser training makes use of a trick for long documents, where we use the oracle to cut up the document into sections, so that we can have batch items in the middle of a document. For instance, if we have one document of 600 words, we might make 6 states, starting at words 0, 100, 200, 300, 400 and 500. The problem is for v3, I screwed this up and didn't stop parsing! So instead of a batch of [100, 100, 100, 100, 100, 100], we'd have a batch of [600, 500, 400, 300, 200, 100]. Oops. The implementation here could probably be improved, it's annoying to have this extra variable in the state. But this'll do. This makes the v3 parser training 5-10 times faster, depending on document lengths. This problem wasn't in v2.		2020-09-02 12:57:13 +02:00
..
_parser_internals	Fix significant performance bug in parser training (#6010 )	2020-09-02 12:57:13 +02:00
__init__.py	Add Lemmatizer and simplify related components (#5848 )	2020-08-07 15:27:13 +02:00
attributeruler.py	Merge pull request #5993 from explosion/feature/disabled-components	2020-08-29 15:58:41 +02:00
dep_parser.pyx	Tidy up pipes (#5906 )	2020-08-11 23:29:31 +02:00
entity_linker.py	Merge pull request #5993 from explosion/feature/disabled-components	2020-08-29 15:58:41 +02:00
entityruler.py	Use frozen list with custom errors	2020-08-29 15:20:11 +02:00
functions.py	Option for returning only greedy matches (#5771 )	2020-07-29 11:04:43 +02:00
lemmatizer.py	Use init parameter (#5909 )	2020-08-11 23:41:58 +02:00
morphologizer.pyx	Tidy up pipes (#5906 )	2020-08-11 23:29:31 +02:00
multitask.pyx	Tidy up pipes (#5906 )	2020-08-11 23:29:31 +02:00
ner.pyx	Tidy up pipes (#5906 )	2020-08-11 23:29:31 +02:00
pipe.pxd	Tidy up pipes (#5906 )	2020-08-11 23:29:31 +02:00
pipe.pyx	Rename Transformer listener (#6001 )	2020-08-31 12:41:39 +02:00
sentencizer.pyx	Tidy up pipes (#5906 )	2020-08-11 23:29:31 +02:00
senter.pyx	Tidy up pipes (#5906 )	2020-08-11 23:29:31 +02:00
simple_ner.py	Tidy up pipes (#5906 )	2020-08-11 23:29:31 +02:00
tagger.pyx	Fix tagger initialization	2020-09-01 16:38:34 +02:00
textcat.py	Tidy up pipes (#5906 )	2020-08-11 23:29:31 +02:00
tok2vec.py	Rename Transformer listener (#6001 )	2020-08-31 12:41:39 +02:00
transition_parser.pxd	Tidy up pipes (#5906 )	2020-08-11 23:29:31 +02:00
transition_parser.pyx	Fix significant performance bug in parser training (#6010 )	2020-09-02 12:57:13 +02:00