spaCy/spacy
Adriane Boyd 63f5951f8b Add AttributeRuler for token attribute exceptions
Add the `AttributeRuler` to handle exceptions for token-level
attributes. The `AttributeRuler` uses `Matcher` patterns to identify
target spans and applies the specified attributes to the token at the
provided index in the matched span. A negative index can be used to
index from the end of the matched span. The retokenizer is used to
"merge" the individual tokens and assign them the provided attributes.

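The index semantics described above can be sketched in plain Python. This is only an illustration, not the `AttributeRuler` implementation: `apply_attrs`, the `(start, end)` span tuple, and the dict-based tokens are hypothetical stand-ins, and the `Matcher` and retokenizer steps are left out.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical stand-in for a token: a plain dict of attribute names to values.
Token = Dict[str, Any]


def apply_attrs(
    tokens: List[Token],
    span: Tuple[int, int],
    attrs: Dict[str, Any],
    index: int = 0,
) -> int:
    """Apply attrs to one token inside a matched span.

    span is (start, end) with end exclusive, as a matcher would report it.
    A non-negative index counts from the start of the span; a negative
    index counts from the end. Returns the position of the updated token.
    """
    start, end = span
    length = end - start
    if not -length <= index < length:
        raise ValueError(f"index {index} is outside a span of length {length}")
    target = start + index if index >= 0 else end + index
    tokens[target].update(attrs)
    return target


tokens = [{"ORTH": "I"}, {"ORTH": "'m"}, {"ORTH": "here"}]
# Assume a pattern matched the first two tokens; -1 selects the last of them.
position = apply_attrs(tokens, (0, 2), {"LEMMA": "be"}, index=-1)
```

Here `position` refers to the second token, the last one inside the matched span, and only that token receives the new attribute.
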
Helper functions can convert existing tag maps and morph rules into the
corresponding `Matcher` patterns.
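
As a rough sketch of what such a conversion could look like, the functions below turn a tag map and a set of morph rules into (pattern, attrs) pairs. The function names and the assumed input shapes (a tag map keyed by tag, morph rules keyed by tag and then by word form) are assumptions made for illustration, not the helpers added in this commit.

```python
from typing import Any, Dict, List, Tuple

Pattern = List[Dict[str, str]]
Attrs = Dict[str, Any]
Rule = Tuple[Pattern, Attrs]


def tag_map_to_rules(tag_map: Dict[str, Attrs]) -> List[Rule]:
    """One single-token pattern per tag, matching on the TAG attribute."""
    return [([{"TAG": tag}], dict(attrs)) for tag, attrs in tag_map.items()]


def morph_rules_to_rules(morph_rules: Dict[str, Dict[str, Attrs]]) -> List[Rule]:
    """One single-token pattern per (tag, word) pair, matching on both."""
    rules: List[Rule] = []
    for tag, words in morph_rules.items():
        for word, attrs in words.items():
            rules.append(([{"ORTH": word, "TAG": tag}], dict(attrs)))
    return rules


# Hypothetical inputs in the assumed shapes.
tag_rules = tag_map_to_rules({"NN": {"POS": "NOUN"}})
word_rules = morph_rules_to_rules({"PRP": {"I": {"LEMMA": "-PRON-"}}})
```

Each resulting pair holds a token pattern and the attributes to assign when it matches, which is the form a pattern-based ruler needs.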

There is an additional minor bug fix for `MORPH` attributes in the
retokenizer to correctly normalize the values and to handle `MORPH`
alongside `_` in an attrs dict.
2020-07-30 09:10:59 +02:00
cli Merge pull request #5824 from svlandeg/fix/textcat-v3 2020-07-28 15:04:25 +02:00
displacy Tidy up, autoformat, add types 2020-07-25 15:01:15 +02:00
gold Tidy up, autoformat, add types 2020-07-25 15:01:15 +02:00
lang Tidy up [ci skip] 2020-07-25 13:00:49 +02:00
matcher Remove object subclassing 2020-07-12 14:03:23 +02:00
ml Rename default textcat arch to TextCatEnsemble 2020-07-26 15:11:43 +02:00
pipeline Add AttributeRuler for token attribute exceptions 2020-07-30 09:10:59 +02:00
syntax Remove unused methods 2020-07-28 16:50:02 +02:00
tests Add AttributeRuler for token attribute exceptions 2020-07-30 09:10:59 +02:00
tokens Add AttributeRuler for token attribute exceptions 2020-07-30 09:10:59 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Tidy up __init__.py 2020-07-25 12:14:37 +02:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Set version to v3.0.0a5 2020-07-25 14:06:01 +02:00
attrs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
attrs.pyx Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
compat.py Tidy up, autoformat, add types 2020-07-25 15:01:15 +02:00
default_config.cfg Remove scores list from config and document 2020-07-28 11:22:24 +02:00
errors.py Add AttributeRuler for token attribute exceptions 2020-07-30 09:10:59 +02:00
glossary.py unicode -> str consistency 2020-05-24 17:20:58 +02:00
gold.pyx Improve spacy.gold (no GoldParse, no json format!) (#5555) 2020-06-26 19:34:12 +02:00
kb.pxd Tidy up and avoid absolute spacy imports in core 2020-05-21 20:05:03 +02:00
kb.pyx Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
language.py Re-add meta["pipeline"] for now 2020-07-28 16:14:23 +02:00
lemmatizer.py Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
lexeme.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
lexeme.pyx WIP: move more language data to config 2020-07-22 15:59:37 +02:00
lookups.py Tidy up 2020-07-25 12:21:37 +02:00
morphology.pxd Update Morphology to load exceptions as MORPH_RULES 2020-07-16 21:16:49 +02:00
morphology.pyx Minor refactor for Morphology and MorphAnalysis (#5804) 2020-07-24 09:28:06 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
pipe_analysis.py Refactor pipeline components, config and language data (#5759) 2020-07-22 13:42:59 +02:00
schemas.py Remove scores list from config and document 2020-07-28 11:22:24 +02:00
scorer.py Update cats scoring to provide overall score 2020-07-27 12:26:10 +02:00
strings.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
strings.pyx unicode -> str consistency [ci skip] 2020-05-24 18:51:10 +02:00
structs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
symbols.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
symbols.pyx Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
tokenizer.pxd Remove dead and/or deprecated code (#5710) 2020-07-06 13:06:25 +02:00
tokenizer.pyx Merge pull request #5798 from explosion/feature/language-data-config 2020-07-25 13:34:49 +02:00
typedefs.pxd Update spaCy for thinc 8.0.0 (#4920) 2020-01-29 17:06:46 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Move error to Errors 2020-07-28 16:24:14 +02:00
vectors.pyx Remove object subclassing 2020-07-12 14:03:23 +02:00
vocab.pxd Tidy up and move noun_chunks, token_match, url_match 2020-07-22 22:18:46 +02:00
vocab.pyx Re-add setting for vocab data and tidy up 2020-07-25 12:14:28 +02:00