spaCy/spacy/tokens
Matthew Honnibal 80b94313b6 💫 Fix interaction of lemmatizer and tokenizer exceptions (#3388)
Closes #2203. Closes #3268.

Lemmas set from outside the `Morphology` class were being overwritten. The result was especially confusing when deserialising, as it meant some lemmas could change when storing and retrieving a `Doc` object.

This PR applies two fixes:

1) Before setting a lemma in the `Morphology` class, check whether one is already set. If so, don't overwrite it.
2) When loading with `doc.from_array()`, apply the `TAG` field first. This lets explicitly provided fields (e.g. `LEMMA`) overwrite the properties implied by `TAG`.
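
To illustrate how the two fixes fit together, here is a minimal toy sketch in plain Python. It is not spaCy's actual implementation: the dict-based tokens, the `TAG_LEMMAS` table, and the helpers `assign_tag` and `from_array` are all hypothetical names made up for this example.

```python
# Toy model of the two fixes. None of these names are spaCy internals.

# Hypothetical lookup of lemmas implied by a (text, tag) pair.
TAG_LEMMAS = {("was", "VBD"): "be", ("is", "VBZ"): "be"}


def assign_tag(token, tag):
    """Set the tag; derive a lemma only if none is already set (fix 1)."""
    token["tag"] = tag
    if not token.get("lemma"):
        token["lemma"] = TAG_LEMMAS.get((token["text"], tag), token["text"])


def from_array(tokens, attrs, rows):
    """Restore attributes from rows of values, applying TAG first (fix 2)."""
    # Stable sort: the TAG column (key False) moves ahead of all others.
    order = sorted(range(len(attrs)), key=lambda i: attrs[i] != "TAG")
    for token, row in zip(tokens, rows):
        for i in order:
            if attrs[i] == "TAG":
                assign_tag(token, row[i])
            else:
                # Explicit fields overwrite anything the tag implied.
                token[attrs[i].lower()] = row[i]
    return tokens


# A lemma set from outside survives tagging.
custom = {"text": "was", "lemma": "custom"}
assign_tag(custom, "VBD")

# An explicit LEMMA column wins, even though it is listed before TAG.
restored = from_array([{"text": "was"}], ["LEMMA", "TAG"], [["exist", "VBD"]])
```

In this sketch either fix alone would keep the explicit lemma in the round trip. Applying `TAG` first additionally ensures that any other explicitly stored field, not just the lemma, takes precedence over what the tag implies.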

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-11 01:31:21 +01:00
__init__.pxd * Break up tokens.pyx into tokens/doc.pyx, tokens/token.pyx, tokens/spans.pyx 2015-07-13 20:20:58 +02:00
__init__.py Tidy up and improve docs and docstrings (#3370) 2019-03-08 11:42:26 +01:00
_retokenize.pyx Tidy up and improve docs and docstrings (#3370) 2019-03-08 11:42:26 +01:00
_serialize.py Tidy up and improve docs and docstrings (#3370) 2019-03-08 11:42:26 +01:00
doc.pxd Fix issue 2396 (#3089) 2018-12-29 18:05:52 +01:00
doc.pyx 💫 Fix interaction of lemmatizer and tokenizer exceptions (#3388) 2019-03-11 01:31:21 +01:00
span.pxd Add Span.to_array method 2017-08-19 12:20:45 +02:00
span.pyx Tidy up and improve docs and docstrings (#3370) 2019-03-08 11:42:26 +01:00
token.pxd Make NORM a token attribute (#3029) 2018-12-08 10:49:10 +01:00
token.pyx Tidy up and improve docs and docstrings (#3370) 2019-03-08 11:42:26 +01:00
underscore.py 💫 Allow setting of custom attributes during retokenization (closes #3314) (#3324) 2019-02-24 18:38:47 +01:00