* remove duplicate unit test
* unit test (currently failing) for issue 4267
* bugfix: ensure doc.ents preserves kb_id annotations
* fix in setting doc.ents with empty label
* rename
* test for presetting an entity to a certain type
* allow overwriting Outside + blocking presets
* fix actions when previous label needs to be kept
* fix default ent_iob in set entities
* cleaner solution with U- action
* remove debugging print statements
* unit tests with explicit transitions and is_valid testing
* remove U- from move_names explicitly
* remove unit tests with pre-trained models that don't work
* remove (working) unit tests with pre-trained models
* clean up unit tests
* move unit tests
* small fixes
* remove two TODO's from doc.ents comments
* Improve handling of missing NER tags
GoldParse can accept missing NER tags, if entities is provided
in BILUO format (rather than as spans). Missing tags can be provided
as None values.
Fix bug that occurred when first tag was a None value. Closes#2603.
* Document specification of missing NER tags.
The helper method state.B(1) gets the index of the first token of the
buffer, or -1 if no such token exists. Normally this is safe because we
pass this to functions like state.safe_get(), which returns an empty
token. Here we used it directly as an array index, which is not okay!
This error may have been the cause of out-of-bounds access errors during
training. Similar errors may still be around, so much be hunted down.
Hunting this one down took a long time...I printed out values across
training runs and diffed, looking for points of divergence between
runs, when no randomness should be allowed.
* Add spacy.errors module
* Update deprecation and user warnings
* Replace errors and asserts with new error message system
* Remove redundant asserts
* Fix whitespace
* Add messages for print/util.prints statements
* Fix typo
* Fix typos
* Move CLI messages to spacy.cli._messages
* Add decorator to display error code with message
An implementation like this is nice because it only modifies the string when it's retrieved from the containing class – so we don't have to worry about manipulating tracebacks etc.
* Remove unused link in spacy.about
* Update errors for invalid pipeline components
* Improve error for unknown factories
* Add displaCy warnings
* Update formatting consistency
* Move error message to spacy.errors
* Update errors and check if doc returned by component is None
Currently when a new label is introduced to NER during training,
it causes the labels to be read in in an unexpected order. This
invalidates the model.