spaCy/spacy
Matthew Honnibal 609c0ba557
Fix accidentally quadratic runtime in Example.split_sents (#5464)
* Tidy up train-from-config a bit

* Fix accidentally quadratic perf in TokenAnnotation.brackets

When reading in the gold data, we had a nested loop: for each
token, we looped over all the brackets, looking for brackets
that start on that word. This is accidentally quadratic, because
we have one bracket per word (for the POS tags), so the loop was
O(N**2) and ended up being pretty slow.

To solve this I'm indexing the brackets by their starting word
on the TokenAnnotation object, and adding a property that provides
the previous view.

* Fixes
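
The indexing idea described in the commit message can be sketched roughly as follows. This is an illustrative sketch only, not the code in gold.pyx: the `BracketIndex` class, the `brackets_by_start` attribute, the `starting_at` method and the `(start, end, label)` tuple layout are all assumptions made for the example.

```python
from collections import defaultdict


def brackets_starting_at_naive(n_tokens, brackets):
    """Quadratic version: rescan every bracket for every token."""
    return [[b for b in brackets if b[0] == i] for i in range(n_tokens)]


class BracketIndex:
    """Hypothetical stand-in for an annotation object that stores
    brackets keyed by the index of the word they start on."""

    def __init__(self, brackets=()):
        # Maps start index -> list of (end, label) pairs.
        self.brackets_by_start = defaultdict(list)
        for start, end, label in brackets:
            self.brackets_by_start[start].append((end, label))

    @property
    def brackets(self):
        """Rebuild the flat (start, end, label) list for callers that
        still expect the old view."""
        flat = []
        for start, ends in sorted(self.brackets_by_start.items()):
            for end, label in ends:
                flat.append((start, end, label))
        return flat

    def starting_at(self, i):
        """Dictionary lookup for the brackets that start on token i."""
        return [(i, end, label) for end, label in self.brackets_by_start.get(i, [])]


brackets = [(0, 0, "DT"), (0, 1, "NP"), (1, 1, "NN"), (2, 2, "VBZ")]
index = BracketIndex(brackets)
per_token = [index.starting_at(i) for i in range(3)]
```

With one bracket per word, the naive function does on the order of N**2 comparisons, while the indexed version builds the dictionary once and then does one lookup per token.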
2020-05-20 18:48:18 +02:00
..
cli Fix accidentally quadratic runtime in Example.split_sents (#5464) 2020-05-20 18:48:18 +02:00
displacy Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
lang Remove "pala" tokenizer exception for Spanish (#5265) 2020-04-09 10:21:20 +02:00
matcher Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
ml Various fixes to NEL functionality, Example class etc (#5460) 2020-05-20 11:41:12 +02:00
pipeline Various fixes to NEL functionality, Example class etc (#5460) 2020-05-20 11:41:12 +02:00
syntax Merge from develop 2020-05-20 12:27:31 +02:00
tests Various fixes to NEL functionality, Example class etc (#5460) 2020-05-20 11:41:12 +02:00
tokens Update morphologizer (#5108) 2020-04-02 14:46:32 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Simplify warnings 2020-02-28 12:20:23 +01:00
__main__.py Update spaCy for thinc 8.0.0 (#4920) 2020-01-29 17:06:46 +01:00
_ml.py take care of global vectors in multiprocessing (#5081) 2020-03-03 13:58:22 +01:00
about.py Set version to v3.0.0.dev8 2020-05-19 17:15:39 +02:00
analysis.py Simplify warnings 2020-02-28 12:20:23 +01:00
attrs.pxd Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
attrs.pyx Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
compat.py Merge branch 'develop' into refactor/remove-symlinks 2020-02-18 17:22:20 +01:00
errors.py Various fixes to NEL functionality, Example class etc (#5460) 2020-05-20 11:41:12 +02:00
glossary.py Tidy up and auto-format 2020-02-18 15:38:18 +01:00
gold.pxd Fix accidentally quadratic runtime in Example.split_sents (#5464) 2020-05-20 18:48:18 +02:00
gold.pyx Fix accidentally quadratic runtime in Example.split_sents (#5464) 2020-05-20 18:48:18 +02:00
kb.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
kb.pyx Merge branch 'develop' into refactor/simplify-warnings 2020-03-04 16:38:55 +01:00
language.py Various fixes to NEL functionality, Example class etc (#5460) 2020-05-20 11:41:12 +02:00
lemmatizer.py Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
lexeme.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
lexeme.pyx Simplify warnings 2020-02-28 12:20:23 +01:00
lookups.py Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
morphology.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
morphology.pyx Fix small errors 2020-03-26 13:47:31 +01:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
schemas.py Add sent_start to pattern schema 2020-03-26 14:05:40 +01:00
scorer.py Update morphologizer (#5108) 2020-04-02 14:46:32 +02:00
strings.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
strings.pyx Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
structs.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
symbols.pxd Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
symbols.pyx Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
tokenizer.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
tokenizer.pyx Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
typedefs.pxd Update spaCy for thinc 8.0.0 (#4920) 2020-01-29 17:06:46 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Merge from develop 2020-05-20 12:27:31 +02:00
vectors.pyx Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
vocab.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
vocab.pyx Tidy up and auto-format 2020-02-18 15:38:18 +01:00