mirror of
https://github.com/explosion/spaCy.git
synced 2024-11-11 20:28:20 +03:00
53c0fb7431
* Only set NORM on Token in retokenizer

  Instead of setting `NORM` on both the token and lexeme, set `NORM` only on the token.

  The retokenizer tries to set all possible attributes with `Token/Lexeme.set_struct_attr` so that it doesn't have to enumerate which attributes are available for each. `NORM` is the only attribute that's stored on both, and in most cases it doesn't make sense to set the global norms based on an individual retokenization.

  For lexeme-only attributes like `IS_STOP` there's no way to avoid the global side effects, but I think that `NORM` would be better only on the token.

* Fix test

Name | ||
---|---|---|
__init__.pxd | ||
__init__.py | ||
_retokenize.pyx | ||
_serialize.py | ||
doc.pxd | ||
doc.pyx | ||
morphanalysis.pxd | ||
morphanalysis.pyx | ||
span.pxd | ||
span.pyx | ||
token.pxd | ||
token.pyx | ||
underscore.py |
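
The commit message above distinguishes between a norm stored on an individual token and a norm stored on the shared lexeme. The following is a simplified pure-Python sketch of that distinction, not spaCy's actual Cython implementation: the `Lexeme` and `Token` classes, the lowercase default norm, and both `set_norm_*` helpers are hypothetical names introduced only for illustration.

```python
# Toy model (assumption): a lexeme is shared by all tokens with the same text,
# while a token may carry its own norm that overrides the shared one.


class Lexeme:
    """Vocabulary entry shared by every token with the same text."""

    def __init__(self, text):
        self.text = text
        self.norm = text.lower()  # assumed default norm, visible to all tokens


class Token:
    """One occurrence in a document; may override the shared norm."""

    def __init__(self, lexeme):
        self.lexeme = lexeme
        self._norm = None  # no token-level override yet

    @property
    def norm(self):
        # Fall back to the shared lexeme norm when no override is set.
        return self._norm if self._norm is not None else self.lexeme.norm


def set_norm_token_only(token, value):
    """Behaviour described by the commit: only this token changes."""
    token._norm = value


def set_norm_token_and_lexeme(token, value):
    """Earlier behaviour described by the commit: also changes the lexeme."""
    token._norm = value
    token.lexeme.norm = value  # global side effect on every sharing token


vocab = {"NYC": Lexeme("NYC")}
first = Token(vocab["NYC"])
second = Token(vocab["NYC"])

set_norm_token_only(first, "new york city")
print(first.norm)   # new york city
print(second.norm)  # nyc
```

In this model, calling `set_norm_token_and_lexeme` instead would also change `second.norm`, which is the kind of global side effect the commit aims to avoid for `NORM`.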