spaCy/spacy/cli
Matthew Honnibal 609c0ba557
Fix accidentally quadratic runtime in Example.split_sents (#5464)
* Tidy up train-from-config a bit

* Fix accidentally quadratic perf in TokenAnnotation.brackets

When we're reading in the gold data, we had a nested loop where
we looped over the brackets for each token, looking for brackets
that start on that word. This is accidentally quadratic, because
we have one bracket per word (for the POS tags). So we had
an O(N**2) behaviour here that ended up being pretty slow.

To solve this I'm indexing the brackets by their starting word
on the TokenAnnotations object, and adding a property that provides
the previous flat-list view.

* Fixes
2020-05-20 18:48:18 +02:00
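The bracket-indexing fix described in the commit message above can be illustrated with a small sketch. It assumes brackets are plain `(start, end, label)` tuples over token offsets. `BracketIndex`, `brackets_by_start`, `starting_at` and `brackets_by_token_quadratic` are hypothetical names for illustration only, not spaCy's actual `TokenAnnotation` internals. The quadratic version scans every bracket for every token. The indexed version groups brackets by start token once, so each lookup only touches the brackets that begin at that token. A property then rebuilds the old flat-list view.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical bracket representation: (start_token, end_token, label).
Bracket = Tuple[int, int, str]


def brackets_by_token_quadratic(brackets: List[Bracket], n_tokens: int) -> List[List[Bracket]]:
    """Old pattern: for each token, scan every bracket (O(n_tokens * n_brackets))."""
    return [[b for b in brackets if b[0] == i] for i in range(n_tokens)]


class BracketIndex:
    """Hypothetical helper: brackets indexed by their starting token."""

    def __init__(self, brackets: List[Bracket]) -> None:
        # Built in a single O(n_brackets) pass.
        self.brackets_by_start: Dict[int, List[Tuple[int, str]]] = defaultdict(list)
        for start, end, label in brackets:
            self.brackets_by_start[start].append((end, label))

    def starting_at(self, i: int) -> List[Bracket]:
        # .get() avoids inserting empty lists into the defaultdict.
        return [(i, end, label) for end, label in self.brackets_by_start.get(i, [])]

    @property
    def brackets(self) -> List[Bracket]:
        # Rebuild the previous flat-list view, ordered by start token.
        return [
            (start, end, label)
            for start, items in sorted(self.brackets_by_start.items())
            for end, label in items
        ]
```

With one bracket per word (as with the POS-tag brackets), the per-token scan grows quadratically with document length. The indexed lookups stay linear overall.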
converters Tidy up and fix issues 2020-02-18 15:17:03 +01:00
__init__.py Remove symlinks, data dir and related stuff 2020-02-18 17:20:17 +01:00
convert.py Add convert CLI option to merge CoNLL-U subtokens (#4722) 2020-01-29 17:44:25 +01:00
debug_data.py Fix formatting and update docs for v2.2.4 2020-03-09 11:17:20 +01:00
download.py Remove symlinks, data dir and related stuff 2020-02-18 17:20:17 +01:00
evaluate.py Update morphologizer (#5108) 2020-04-02 14:46:32 +02:00
info.py Remove symlinks, data dir and related stuff 2020-02-18 17:20:17 +01:00
init_model.py Simplify warnings 2020-02-28 12:20:23 +01:00
package.py Modernize plac commands for Python 3 (#4836) 2020-01-01 13:15:46 +01:00
pretrain.py Tidy up and auto-format 2020-02-28 11:57:41 +01:00
profile.py Update spaCy for thinc 8.0.0 (#4920) 2020-01-29 17:06:46 +01:00
train_from_config.py Fix accidentally quadratic runtime in Example.split_sents (#5464) 2020-05-20 18:48:18 +02:00
train.py Feature toggle_pipes (#5378) 2020-05-18 22:27:10 +02:00
validate.py Remove symlinks, data dir and related stuff 2020-02-18 17:20:17 +01:00