spaCy/spacy/tests/doc
Adriane Boyd 53c0fb7431
Only set NORM on Token in retokenizer (#6464)
* Only set NORM on Token in retokenizer

Instead of setting `NORM` on both the token and lexeme, set `NORM` only
on the token.

The retokenizer tries to set all possible attributes with
`Token/Lexeme.set_struct_attr` so that it doesn't have to enumerate
which attributes are available for each. `NORM` is the only attribute
that's stored on both, and in most cases it doesn't make sense to set
the global norms based on an individual retokenization. For lexeme-only
attributes like `IS_STOP` there's no way to avoid the global side
effects, but I think that `NORM` would be better only on the token.

* Fix test
2020-11-30 09:35:42 +08:00
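
The token/lexeme distinction described in the commit message above can be illustrated with a small toy model. This is a sketch in plain Python, not spaCy's implementation: the `Lexeme` and `Token` classes, the `make_token` and `set_attrs` helpers, and the `norm_override` field are all hypothetical names invented for illustration. It only shows why a token-level `NORM` leaves other tokens untouched while a lexeme-only attribute such as `IS_STOP` is necessarily shared.

```python
# Toy model (hypothetical, NOT spaCy code): one shared Lexeme per string,
# plus per-token storage for attributes that can live on the token.

class Lexeme:
    """Vocabulary-wide entry shared by every token with the same text."""

    def __init__(self, text):
        self.text = text
        self.norm = text.lower()  # global default norm
        self.is_stop = False      # lexeme-only attribute


class Token:
    """A single occurrence in a document, backed by a shared Lexeme."""

    def __init__(self, lexeme):
        self.lexeme = lexeme
        self.norm_override = None  # token-level NORM, unset by default

    @property
    def norm(self):
        # Prefer the token-level value, fall back to the shared lexeme.
        if self.norm_override is not None:
            return self.norm_override
        return self.lexeme.norm

    @property
    def is_stop(self):
        # No token-level storage exists, so this always reads the lexeme.
        return self.lexeme.is_stop


vocab = {}


def make_token(text):
    """Create a token, reusing the shared lexeme for this text."""
    if text not in vocab:
        vocab[text] = Lexeme(text)
    return Token(vocab[text])


def set_attrs(token, attrs):
    """Apply attributes the way the commit message describes the new behavior."""
    for name, value in attrs.items():
        if name == "NORM":
            token.norm_override = value    # token only, no global side effect
        elif name == "IS_STOP":
            token.lexeme.is_stop = value   # lexeme only, unavoidably global
        else:
            raise ValueError(f"unsupported attribute in this sketch: {name}")


first = make_token("eins")
second = make_token("eins")
set_attrs(first, {"NORM": "1", "IS_STOP": True})
```

In this model, `first.norm` reflects the new value while `second.norm` and `vocab["eins"].norm` keep the default, whereas `second.is_stop` changes along with `first.is_stop` because both read the same lexeme.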
__init__.py Revert #4334 2019-09-29 17:32:12 +02:00
test_add_entities.py Fix test imports 2019-09-29 17:34:56 +02:00
test_array.py Tidy up and auto-format 2020-03-25 12:28:12 +01:00
test_creation.py Tidy up and auto-format 2020-05-21 14:14:01 +02:00
test_doc_api.py Add ent_id_ to strings serialized with Doc (#6353) 2020-11-10 20:16:07 +08:00
test_morphanalysis.py Revert #4334 2019-09-29 17:32:12 +02:00
test_pickle_doc.py Revert #4334 2019-09-29 17:32:12 +02:00
test_retokenize_merge.py Only set NORM on Token in retokenizer (#6464) 2020-11-30 09:35:42 +08:00
test_retokenize_split.py Fix norm in retokenizer split (#6111) 2020-09-22 21:53:33 +02:00
test_span.py Fix/span.sent (#6083) 2020-10-01 14:01:52 +02:00
test_to_json.py Revert #4334 2019-09-29 17:32:12 +02:00
test_token_api.py Tidy up and auto-format 2020-05-21 14:14:01 +02:00
test_underscore.py use clean_underscore fixture 2020-02-23 15:49:20 +01:00