spaCy/spacy
Matthew Honnibal 78d79d94ce Guess set_annotations=True in nlp.update
During `nlp.update`, components can be passed a boolean `set_annotations`
to indicate whether they should assign annotations to the `Doc`. This
needs to be True if downstream components expect to use the
annotations during training, e.g. if we want to use tagger features in
the parser.

Components can specify their assignments and requirements, so we can
figure out which components have these inter-dependencies. After
figuring this out, we can guess whether to pass set_annotations=True.
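
A minimal sketch of how such a guess could work, assuming each
component declares the attributes it assigns and requires. The
`Component` tuple, the `guess_set_annotations` helper and the attribute
strings are hypothetical names for illustration, not the code in
`language.py`, and the per-component result is likewise an assumption.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for a component's metadata: (name, assigns, requires).
Component = Tuple[str, Sequence[str], Sequence[str]]


def guess_set_annotations(pipeline: List[Component]) -> Dict[str, bool]:
    """Guess True for a component if any later component in the pipeline
    requires an attribute that this component assigns."""
    guesses = {}
    for i, (name, assigns, _requires) in enumerate(pipeline):
        # Collect everything the downstream components say they require.
        needed_later = set()
        for _later_name, _later_assigns, later_requires in pipeline[i + 1:]:
            needed_later.update(later_requires)
        guesses[name] = bool(set(assigns) & needed_later)
    return guesses


# Illustrative pipeline: the parser declares that it needs tagger output.
pipeline = [
    ("tagger", ["token.tag"], []),
    ("parser", ["token.dep", "token.head"], ["token.tag"]),
    ("ner", ["doc.ents"], []),
]
guesses = guess_set_annotations(pipeline)
```

With this input, only the tagger would be asked to set annotations,
since the parser is the only component that declares a requirement.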

We could also always pass set_annotations=True, or even make this
the only behaviour. The downside is that it would require the
`Doc` objects to be created afresh to avoid problematic modifications.
One approach would be to make a fresh copy of the `Doc` objects within
`nlp.update()`, so that we can write to the objects without any
problems. If we do that, we can drop this logic and also drop the
`set_annotations` mechanism. I would be fine with that approach,
although it runs the risk of introducing some performance overhead, and
we'll have to take care to copy all extension attributes etc.
2020-05-22 15:55:45 +02:00
cli Tweak memory management in train_from_config 2020-05-21 19:32:04 +02:00
displacy Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
lang Remove "pala" tokenizer exception for Spanish (#5265) 2020-04-09 10:21:20 +02:00
matcher Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
ml Fix shape inference 2020-05-21 20:46:10 +02:00
pipeline Fix begin_training 2020-05-21 20:46:21 +02:00
syntax Fix begin_training 2020-05-21 20:46:21 +02:00
tests Guess set_annotations=True in nlp.update 2020-05-22 15:55:45 +02:00
tokens Update morphologizer (#5108) 2020-04-02 14:46:32 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Simplify warnings 2020-02-28 12:20:23 +01:00
__main__.py Update spaCy for thinc 8.0.0 (#4920) 2020-01-29 17:06:46 +01:00
_ml.py take care of global vectors in multiprocessing (#5081) 2020-03-03 13:58:22 +01:00
about.py Set version to v3.0.0.dev9 2020-05-21 20:47:52 +02:00
analysis.py Simplify warnings 2020-02-28 12:20:23 +01:00
attrs.pxd Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
attrs.pyx Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
compat.py Merge branch 'develop' into refactor/remove-symlinks 2020-02-18 17:22:20 +01:00
errors.py Various fixes to NEL functionality, Example class etc (#5460) 2020-05-20 11:41:12 +02:00
glossary.py Tidy up and auto-format 2020-02-18 15:38:18 +01:00
gold.pxd Fix accidentally quadratic runtime in Example.split_sents (#5464) 2020-05-20 18:48:18 +02:00
gold.pyx Fix accidentally quadratic runtime in Example.split_sents (#5464) 2020-05-20 18:48:18 +02:00
kb.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
kb.pyx Merge branch 'develop' into refactor/simplify-warnings 2020-03-04 16:38:55 +01:00
language.py Guess set_annotations=True in nlp.update 2020-05-22 15:55:45 +02:00
lemmatizer.py Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
lexeme.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
lexeme.pyx Simplify warnings 2020-02-28 12:20:23 +01:00
lookups.py Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
morphology.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
morphology.pyx Fix small errors 2020-03-26 13:47:31 +01:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
schemas.py Add sent_start to pattern schema 2020-03-26 14:05:40 +01:00
scorer.py Update morphologizer (#5108) 2020-04-02 14:46:32 +02:00
strings.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
strings.pyx Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
structs.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
symbols.pxd Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
symbols.pyx Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
tokenizer.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
tokenizer.pyx Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
typedefs.pxd Update spaCy for thinc 8.0.0 (#4920) 2020-01-29 17:06:46 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Merge from develop 2020-05-20 12:27:31 +02:00
vectors.pyx Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00
vocab.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
vocab.pyx Tidy up and auto-format 2020-02-18 15:38:18 +01:00