spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-24 15:09:46 +03:00

Daniël de Kok 28299644fc	2022-01-13 09:03:55 +01:00

Speed up the StateC::L feature function (#10019)

* Speed up the StateC::L feature function

This function gets the n-th most-recent left-arc with a particular head. Before this change, StateC::L would construct a vector of all left-arcs with the given head and then pick the n-th most recent from that vector. Since the number of left-arcs strongly correlates with the doc length and the feature is constructed for every transition, this can make transition-parsing quadratic.

With this change StateC::L:

- Searches left-arcs backwards.
- Stops early when the n-th matching transition is found.
- Does not construct a vector (reducing memory pressure).

This change doesn't avoid the linear search when the transition that is queried does not occur in the left-arcs. Regardless, performance is improved quite a bit with very long docs:

Before:

N     Time
400    3.3
800    5.4
1600  11.6
3200  30.7

After:

N     Time
400    3.2
800    5.0
1600   9.5
3200  23.2

We can probably do better with more tailored data structures, but I first wanted to make a low-impact PR.

Found while investigating #9858.

* StateC::L: simplify loop
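The difference between the two lookup strategies described in the commit message can be sketched in plain Python. This is only an illustration under stated assumptions, not the actual Cython code in StateC: the `Arc` tuple, both function names, the 1-based `n`, and the `-1` "not found" value are hypothetical choices made for this example.

```python
from typing import List, NamedTuple


class Arc(NamedTuple):
    """Hypothetical stand-in for a left-arc: a head token and its child."""
    head: int
    child: int


def nth_left_child_naive(arcs: List[Arc], head: int, n: int) -> int:
    """Old strategy: collect every arc with this head, then index from the end."""
    matches = [arc.child for arc in arcs if arc.head == head]
    if n < 1 or n > len(matches):
        return -1
    return matches[-n]


def nth_left_child_backward(arcs: List[Arc], head: int, n: int) -> int:
    """New strategy: scan from the most recent arc and stop at the n-th match."""
    if n < 1:
        return -1
    seen = 0
    # reversed() iterates the list in place, so no temporary list is built.
    for arc in reversed(arcs):
        if arc.head == head:
            seen += 1
            if seen == n:
                return arc.child  # early exit: older arcs are never visited
    # Still a full linear scan when fewer than n arcs have this head.
    return -1


# Arcs in the order they were added (oldest first).
arcs = [Arc(3, 0), Arc(3, 1), Arc(5, 4), Arc(3, 2)]
```

Both functions return the same result for every query. The backward scan only does less work when the requested arc is near the end of the list, which matches the caveat in the commit message about queries that do not occur in the left-arcs.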
_parser_internals	Speed up the StateC::L feature function (#10019)	2022-01-13 09:03:55 +01:00
__init__.py	Add SpanCategorizer component (#6747)	2021-06-24 12:35:27 +02:00
attributeruler.py	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1	2021-10-26 11:53:50 +02:00
dep_parser.pyx	Document scorers in registry and components from #8766 (#8929)	2021-08-12 12:50:03 +02:00
entity_linker.py	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1	2021-10-26 11:53:50 +02:00
entityruler.py	Entity ruler remove pattern (#9685)	2021-12-06 15:32:49 +01:00
functions.py	Add doc_cleaner component (#9659)	2021-11-23 15:33:33 +01:00
lemmatizer.py	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1	2021-10-26 11:53:50 +02:00
morphologizer.pyx	morphologizer: avoid recreating label tuple for each token (#9764)	2021-11-30 11:58:59 +01:00
multitask.pyx	Replace negative rows with 0 in StaticVectors (#7674)	2021-04-22 18:04:15 +10:00
ner.pyx	Document scorers in registry and components from #8766 (#8929)	2021-08-12 12:50:03 +02:00
pipe.pxd	TrainablePipe (#6213)	2020-10-08 21:33:49 +02:00
pipe.pyi	Auto-format code with black (#9474)	2021-10-15 11:36:49 +02:00
pipe.pyx	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1	2021-10-26 11:53:50 +02:00
sentencizer.pyx	Add overwrite settings for more components (#9050)	2021-09-30 15:35:55 +02:00
senter.pyx	Add overwrite settings for more components (#9050)	2021-09-30 15:35:55 +02:00
spancat.py	Fix spancat for empty docs and zero suggestions (#9654)	2021-11-15 12:40:55 +01:00
tagger.pyx	Make the Tagger neg_prefix configurable (#9802)	2021-12-06 18:04:44 +01:00
textcat_multilabel.py	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1	2021-10-26 11:53:50 +02:00
textcat.py	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1	2021-10-26 11:53:50 +02:00
tok2vec.py	🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167)	2021-10-14 15:21:40 +02:00
trainable_pipe.pxd	Refactor scoring methods to use registered functions (#8766)	2021-08-10 15:13:39 +02:00
trainable_pipe.pyx	Pass excludes when serializing vocab (#8824)	2021-08-03 14:42:44 +02:00
transition_parser.pxd	TrainablePipe (#6213)	2020-10-08 21:33:49 +02:00
transition_parser.pyx	Document scorers in registry and components from #8766 (#8929)	2021-08-12 12:50:03 +02:00