spaCy/spacy/tokens
Matthew Honnibal 8aa7882762
Make NORM a token attribute (#3029)
See #3028. The solution in this patch is pretty debatable.

What we do is give the TokenC struct a .norm field, by repurposing the previously idle .sense attribute. It's nice to repurpose an existing field because it means the TokenC struct doesn't change size, so even if someone's using the internals very deeply, nothing will break.
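
The size argument can be checked with a rough sketch using ctypes. The layouts below are hypothetical and heavily simplified (the real TokenC has many more fields), and the sketch assumes the old .sense slot and the new .norm slot have the same width:

```python
import ctypes


# Hypothetical, simplified stand-in for the struct before the patch.
class TokenBefore(ctypes.Structure):
    _fields_ = [
        ("lex", ctypes.c_void_p),     # pointer to the shared lexeme
        ("tag", ctypes.c_uint64),
        ("sense", ctypes.c_uint64),   # previously idle slot
    ]


# Same layout after the patch: the idle slot is renamed, nothing is added.
class TokenAfter(ctypes.Structure):
    _fields_ = [
        ("lex", ctypes.c_void_p),
        ("tag", ctypes.c_uint64),
        ("norm", ctypes.c_uint64),    # same slot, new meaning
    ]


# Renaming a field of the same type changes neither the struct size
# nor the byte offset of the slot.
same_size = ctypes.sizeof(TokenBefore) == ctypes.sizeof(TokenAfter)
same_offset = TokenBefore.sense.offset == TokenAfter.norm.offset
print(same_size, same_offset)
```

Code compiled against the old layout would still read every other field at the same offset, which is the compatibility property described above.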

The weird thing here is that the TokenC and the LexemeC both have an attribute named NORM. This arguably assists in backwards compatibility. On the other hand, maybe it's really bad! We're changing the semantics of the attribute subtly, so maybe it's better if someone calling lex.norm gets a breakage, and instead is told to write lex.default_norm?

Overall I believe this patch makes the NORM feature work the way we sort of expected it to work. Certainly it's much more like how the docs describe it, and more in line with how we've been directing people to use the norm attribute. We'll also be able to use token.norm to do stuff like spelling correction, which is pretty cool.
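
A minimal pure-Python sketch of the intended semantics follows. The Lexeme and Token classes here are illustrative stand-ins, not spaCy's own types, and the fallback rule (use the lexeme's norm when no token-level norm has been set) is an assumption made for the example:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lexeme:
    """Context-independent vocabulary entry (hypothetical stand-in)."""
    text: str
    norm: str  # the default norm shared by every occurrence


@dataclass
class Token:
    """One occurrence of a lexeme in a document (hypothetical stand-in)."""
    lex: Lexeme
    _norm: Optional[str] = None  # token-level override, unset by default

    @property
    def norm(self) -> str:
        # Assumed rule: fall back to the lexeme's default when no override is set.
        return self._norm if self._norm is not None else self.lex.norm

    @norm.setter
    def norm(self, value: str) -> None:
        self._norm = value


teh = Lexeme(text="teh", norm="teh")
first, second = Token(teh), Token(teh)

# A spelling-correction step can fix one occurrence in context
# without touching the shared lexeme or other tokens.
first.norm = "the"
print(first.norm, second.norm, teh.norm)
```

The point of the sketch is that the override lives on the token, so two tokens backed by the same lexeme can carry different norms.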
2018-12-08 10:49:10 +01:00
__init__.pxd * Break up tokens.pyx into tokens/doc.pyx, tokens/token.pyx, tokens/spans.pyx 2015-07-13 20:20:58 +02:00
__init__.py Tidy up and document Doc, Token and Span 2017-10-27 15:41:45 +02:00
_retokenize.pyx 💫 Port master changes over to develop (#2979) 2018-11-29 16:30:29 +01:00
_serialize.py 💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003) 2018-12-03 01:28:22 +01:00
doc.pxd Merge master into develop. Big merge, many conflicts -- need to review 2018-04-29 14:49:26 +02:00
doc.pyx Fix removal of dill (for srsly) 2018-12-06 18:46:09 +01:00
span.pxd Add Span.to_array method 2017-08-19 12:20:45 +02:00
span.pyx Update develop from master 2018-08-14 03:04:28 +02:00
token.pxd Make NORM a token attribute (#3029) 2018-12-08 10:49:10 +01:00
token.pyx Make NORM a token attribute (#3029) 2018-12-08 10:49:10 +01:00
underscore.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00