spaCy/spacy/tests/regression
Matthew Honnibal 80b94313b6 💫 Fix interaction of lemmatizer and tokenizer exceptions (#3388)
Closes #2203. Closes #3268.

Lemmas set from outside the `Morphology` class were being overwritten. The result was especially confusing when deserialising, as it meant some lemmas could change when storing and retrieving a `Doc` object.

This PR applies two fixes:

1) Before setting the lemma in the `Morphology` class, check whether a lemma is already set. If so, don't overwrite it.
2) When loading with `doc.from_array()`, apply the `TAG` field first. Other fields that are provided explicitly (e.g. `LEMMA`) can then overwrite the properties implied by `TAG`.
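
The two fixes can be illustrated with a small, self-contained toy model. This is only a sketch and not spaCy's actual implementation: the `Token` class, the `TAG_LEMMAS` table, and the `assign_tag` and `from_fields` helpers are hypothetical names invented for this example.

```python
# Toy model of the two fixes. All names are hypothetical, not spaCy's API.

# Hypothetical table of lemmas implied by a (text, tag) pair.
TAG_LEMMAS = {("was", "VBD"): "be"}


class Token:
    def __init__(self, text):
        self.text = text
        self.tag = None
        self.lemma = None


def assign_tag(token, tag):
    """Fix 1: derive a lemma from the tag only if no lemma is set yet."""
    token.tag = tag
    if token.lemma is None:
        token.lemma = TAG_LEMMAS.get((token.text, tag), token.text)


def from_fields(token, fields):
    """Fix 2: apply TAG first so explicitly stored fields can override it."""
    # Sort so that the "TAG" entry comes first; the rest keep their order.
    ordered = sorted(fields.items(), key=lambda item: item[0] != "TAG")
    for name, value in ordered:
        if name == "TAG":
            assign_tag(token, value)
        elif name == "LEMMA":
            token.lemma = value
    return token


# A lemma set from outside (e.g. by a tokenizer exception) survives tagging.
preset = Token("cannot")
preset.lemma = "can"
assign_tag(preset, "MD")
assert preset.lemma == "can"

# An explicitly stored LEMMA wins over the lemma implied by TAG,
# regardless of the order in which the fields were stored.
loaded = from_fields(Token("was"), {"LEMMA": "custom", "TAG": "VBD"})
assert loaded.lemma == "custom"
```

In this toy model either fix alone would preserve the stored lemma. The sketch shows both because the PR applies both.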

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-11 01:31:21 +01:00
__init__.py Add __init__.py file for regression tests 2016-11-01 13:45:06 +01:00
test_issue1-1000.py Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293) 2019-02-20 22:10:13 +01:00
test_issue1001-1500.py Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293) 2019-02-20 22:10:13 +01:00
test_issue1501-2000.py Merge regression tests 2019-02-24 20:31:38 +01:00
test_issue2001-2500.py 💫 Fix interaction of lemmatizer and tokenizer exceptions (#3388) 2019-03-11 01:31:21 +01:00
test_issue2501-3000.py Merge regression tests 2019-02-24 21:03:39 +01:00
test_issue3002.py Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293) 2019-02-20 22:10:13 +01:00
test_issue3009.py Tidy up and fix small bugs and typos 2019-02-08 14:14:49 +01:00
test_issue3012.py Tidy up and fix small bugs and typos 2019-02-08 14:14:49 +01:00
test_issue3199.py Only run noun chunks iterator in Span if available (closes #3199) 2019-02-08 18:33:16 +01:00
test_issue3209.py Un-x-fail passing test 2019-02-24 20:24:15 +01:00
test_issue3248.py Tidy up and auto-format 2019-02-13 15:29:08 +01:00
test_issue3277.py 💫 Add en/em dash to prefixes and suffixes (#3281) 2019-02-15 10:29:59 +01:00
test_issue3288.py Tidy up tests 2019-02-24 14:11:23 +01:00
test_issue3289.py Tidy up tests 2019-02-24 14:11:23 +01:00
test_issue3328.py Auto-format 2019-02-27 11:56:45 +01:00
test_issue3331.py Add xfailing test for #3331 2019-02-25 22:33:30 +01:00
test_issue3345.py Un-xfail passing tests and tidy up 2019-03-10 18:42:16 +01:00