spaCy/spacy/pipeline
Daniël de Kok 63fa55089d Use constant-time head lookups in StateC::{L,R}
This commit changes the type of the left/right-arc collections from
vector[ArcC] to unordered_map[int, vector[ArcC]], so that the arcs are
keyed by their head. This allows us to find all the left/right arcs for a
particular head in constant time in StateC::{L,R}.
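
The idea can be sketched in plain Python. This is only an illustration of
keying arcs by their head, not the Cython/C++ code in StateC; the Arc tuple
and the helper names below are hypothetical stand-ins.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Arc(NamedTuple):
    """Hypothetical stand-in for the parser's arc struct."""
    head: int
    child: int
    label: int


def arcs_for_head_flat(arcs: List[Arc], head: int) -> List[Arc]:
    # Flat collection: every lookup scans all arcs, so the cost of one
    # lookup grows with the total number of arcs in the document.
    return [arc for arc in arcs if arc.head == head]


def index_by_head(arcs: List[Arc]) -> Dict[int, List[Arc]]:
    # Keyed collection: group the arcs by head once, up front.
    by_head: Dict[int, List[Arc]] = defaultdict(list)
    for arc in arcs:
        by_head[arc.head].append(arc)
    return by_head


def arcs_for_head_keyed(by_head: Dict[int, List[Arc]], head: int) -> List[Arc]:
    # Average constant-time hash lookup, independent of the number of
    # arcs that belong to other heads.
    return by_head.get(head, [])


arcs = [Arc(2, 0, 7), Arc(2, 1, 3), Arc(5, 4, 7)]
by_head = index_by_head(arcs)

# Both representations return the same arcs for a given head.
assert arcs_for_head_keyed(by_head, 2) == arcs_for_head_flat(arcs, 2)
assert arcs_for_head_keyed(by_head, 9) == []
```

With the flat list, a lookup per token costs time proportional to the number
of arcs, which adds up to quadratic behaviour on long documents; with the
keyed map, each lookup only touches the arcs of the requested head.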

Benchmarks with long docs (N is the number of text repetitions):

Before (using #10019):

     N  Time (s)
   400       3.2
   800       5.0
  1600       9.5
  3200      23.2
  6400      66.8
 12800     220.0

After (this commit):

     N  Time (s)
   400       3.1
   800       4.3
  1600       6.7
  3200      12.0
  6400      22.0
 12800      42.0
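
To read the two tables side by side, the short script below computes the
speedup at each N and how much the runtime grows each time N doubles. It
uses only the timings listed above; the variable and function names are
illustrative.

```python
# Timings copied from the two benchmark tables (seconds).
sizes = [400, 800, 1600, 3200, 6400, 12800]
before = [3.2, 5.0, 9.5, 23.2, 66.8, 220.0]
after = [3.1, 4.3, 6.7, 12.0, 22.0, 42.0]


def growth_ratios(times):
    # Ratio between consecutive timings, i.e. the slowdown per doubling of N.
    return [later / earlier for earlier, later in zip(times, times[1:])]


speedups = [b / a for b, a in zip(before, after)]

print("     N  speedup")
for n, s in zip(sizes, speedups):
    print(f"{n:>6}  {s:6.1f}x")

print("growth per doubling, before:",
      [round(r, 1) for r in growth_ratios(before)])
print("growth per doubling, after: ",
      [round(r, 1) for r in growth_ratios(after)])
```

At N = 12800 the speedup is about 5.2x (220.0 / 42.0). For the last doubling
the runtime grows by about 3.3x before the change and about 1.9x after it,
which is close to linear scaling.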

Related to #9858 and #10019.
2022-01-13 12:08:46 +01:00
_parser_internals Use constant-time head lookups in StateC::{L,R} 2022-01-13 12:08:46 +01:00
__init__.py Add SpanCategorizer component (#6747) 2021-06-24 12:35:27 +02:00
attributeruler.py Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
dep_parser.pyx Document scorers in registry and components from #8766 (#8929) 2021-08-12 12:50:03 +02:00
entity_linker.py Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
entityruler.py Entity ruler remove pattern (#9685) 2021-12-06 15:32:49 +01:00
functions.py Add doc_cleaner component (#9659) 2021-11-23 15:33:33 +01:00
lemmatizer.py Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
morphologizer.pyx morphologizer: avoid recreating label tuple for each token (#9764) 2021-11-30 11:58:59 +01:00
multitask.pyx Replace negative rows with 0 in StaticVectors (#7674) 2021-04-22 18:04:15 +10:00
ner.pyx Document scorers in registry and components from #8766 (#8929) 2021-08-12 12:50:03 +02:00
pipe.pxd TrainablePipe (#6213) 2020-10-08 21:33:49 +02:00
pipe.pyi Auto-format code with black (#9474) 2021-10-15 11:36:49 +02:00
pipe.pyx Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
sentencizer.pyx Add overwrite settings for more components (#9050) 2021-09-30 15:35:55 +02:00
senter.pyx Fix Scorer.score_cats for missing labels (#9443) 2021-12-29 11:04:39 +01:00
spancat.py Fix Scorer.score_cats for missing labels (#9443) 2021-12-29 11:04:39 +01:00
tagger.pyx Make the Tagger neg_prefix configurable (#9802) 2021-12-06 18:04:44 +01:00
textcat_multilabel.py Fix Scorer.score_cats for missing labels (#9443) 2021-12-29 11:04:39 +01:00
textcat.py Fix texcat loss scaling (#9904) (#10002) 2022-01-13 09:03:23 +01:00
tok2vec.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
trainable_pipe.pxd Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
trainable_pipe.pyx Pass excludes when serializing vocab (#8824) 2021-08-03 14:42:44 +02:00
transition_parser.pxd TrainablePipe (#6213) 2020-10-08 21:33:49 +02:00
transition_parser.pyx Document scorers in registry and components from #8766 (#8929) 2021-08-12 12:50:03 +02:00