spaCy/spacy/tokens
Adriane Boyd 63f5951f8b Add AttributeRuler for token attribute exceptions
Add the `AttributeRuler` to handle exceptions for token-level
attributes. The `AttributeRuler` uses `Matcher` patterns to identify
target spans and applies the specified attributes to the token at the
provided index in the matched span. A negative index can be used to
index from the end of the matched span. The retokenizer is used to
"merge" the individual tokens and assign them the provided attributes.
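
The index semantics described above can be illustrated with a minimal, library-free sketch. It does not use spaCy's API: tokens are modeled as plain dicts, the match is assumed to have already been found as a `(start, end)` span, and `apply_rule` is a hypothetical name chosen for this example.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for a token: a plain dict of attribute names to values.
Token = Dict[str, Any]


def apply_rule(
    tokens: List[Token], start: int, end: int, attrs: Dict[str, Any], index: int = 0
) -> Token:
    """Set `attrs` on the token at `index` within the matched span tokens[start:end].

    A negative `index` counts from the end of the matched span, mirroring
    ordinary Python sequence indexing.
    """
    span = tokens[start:end]
    if not -len(span) <= index < len(span):
        raise ValueError(f"index {index} is outside a span of length {len(span)}")
    target = span[index]  # same dict object as in `tokens`, so the update is visible there
    target.update(attrs)
    return target


# Assume a pattern matched the two tokens "do" + "n't" at positions 1..3.
tokens = [{"ORTH": "I"}, {"ORTH": "do"}, {"ORTH": "n't"}]
apply_rule(tokens, 1, 3, {"NORM": "not"}, index=-1)  # -1 selects the last token of the match
```

After the call, only the last token of the matched span carries the new `NORM` value; the other tokens are unchanged.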

Helper functions can convert existing tag maps and morph rules into the
corresponding `Matcher` patterns.
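
As a rough sketch of what such a conversion might look like, the functions below turn a tag map and a set of morph rules into pattern/attribute pairs. The function names and the `{"patterns", "attrs", "index"}` rule shape are assumptions made for illustration, not the helpers added in this commit; the sketch also assumes a tag map keyed by fine-grained tag and morph rules keyed by tag and then by word form.

```python
from typing import Any, Dict, List

Rule = Dict[str, Any]


def tag_map_to_rules(tag_map: Dict[str, Dict[str, Any]]) -> List[Rule]:
    """Build one single-token pattern per fine-grained tag (hypothetical helper)."""
    rules = []
    for tag, attrs in tag_map.items():
        # Match any token carrying this tag and assign the mapped attributes.
        rules.append({"patterns": [[{"TAG": tag}]], "attrs": dict(attrs), "index": 0})
    return rules


def morph_rules_to_rules(morph_rules: Dict[str, Dict[str, Dict[str, Any]]]) -> List[Rule]:
    """Build one single-token pattern per (tag, word form) pair (hypothetical helper)."""
    rules = []
    for tag, words in morph_rules.items():
        for word, attrs in words.items():
            # Match the exact word form only when it also carries this tag.
            pattern = [{"ORTH": word, "TAG": tag}]
            rules.append({"patterns": [pattern], "attrs": dict(attrs), "index": 0})
    return rules


tag_rules = tag_map_to_rules({"NN": {"POS": "NOUN"}})
morph_rule_list = morph_rules_to_rules({"PRP": {"I": {"LEMMA": "I", "POS": "PRON"}}})
```

Each resulting rule pairs a one-token pattern with the attributes to assign, so the same matching machinery can apply both kinds of exceptions.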

This commit also includes a minor bug fix for `MORPH` attributes in the
retokenizer: values are now normalized correctly, and `MORPH` can be set
alongside `_` in an attrs dict.
2020-07-30 09:10:59 +02:00
__init__.pxd * Break up tokens.pyx into tokens/doc.pyx, tokens/token.pyx, tokens/spans.pyx 2015-07-13 20:20:58 +02:00
__init__.py Modify morphology to support arbitrary features (#4932) 2020-01-23 22:01:54 +01:00
_retokenize.pyx Add AttributeRuler for token attribute exceptions 2020-07-30 09:10:59 +02:00
_serialize.py Remove object subclassing 2020-07-12 14:03:23 +02:00
doc.pxd Record whether Doc objects are built from known spacing (#5697) 2020-07-03 12:58:16 +02:00
doc.pyx Tidy up and move noun_chunks, token_match, url_match 2020-07-22 22:18:46 +02:00
morphanalysis.pxd Modify morphology to support arbitrary features (#4932) 2020-01-23 22:01:54 +01:00
morphanalysis.pyx Minor refactor for Morphology and MorphAnalysis (#5804) 2020-07-24 09:28:06 +02:00
span.pxd annotate kb_id through ents in doc 2019-03-22 11:36:44 +01:00
span.pyx Remove dead and/or deprecated code (#5710) 2020-07-06 13:06:25 +02:00
token.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
token.pyx Merge branch 'develop' into master-tmp 2020-07-20 14:58:04 +02:00
underscore.py Remove object subclassing 2020-07-12 14:03:23 +02:00