spaCy/spacy
Adriane Boyd ca33e891e2 Extend AttributeRuler functionality
* Add option to initialize with a dict of AttributeRuler patterns

* Instead of silently discarding overlapping matches (the default
behavior for the retokenizer if only the attrs differ), split the
matches into disjoint sets and retokenize each set separately. This
allows, for instance, one pattern to set the POS and another pattern to
set the lemma. (If two matches modify the same attribute, the attrs
appear to be applied in the order the patterns were added, but this may
not be deterministic.)

* Improve types
2020-07-30 11:17:33 +02:00
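
The splitting step described in the commit message can be sketched in plain Python. This is only an illustration under assumptions, not the code from this commit: the names `Match`, `overlaps` and `split_into_disjoint_sets` are hypothetical, matches are modelled as `(start, end, attrs)` tuples with an exclusive `end`, and the greedy "first set that fits" strategy is one possible way to build the disjoint sets.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of a match: token start, token end (exclusive),
# and the attributes the pattern wants to set on that span.
Match = Tuple[int, int, Dict[str, str]]


def overlaps(a: Match, b: Match) -> bool:
    """Return True if the two token ranges share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def split_into_disjoint_sets(matches: List[Match]) -> List[List[Match]]:
    """Greedily place each match in the first set where it overlaps nothing.

    Input order is preserved inside each set and across sets, so a match
    added earlier never lands in a later set than an overlapping match
    added after it.
    """
    sets: List[List[Match]] = []
    for match in matches:
        for group in sets:
            if not any(overlaps(match, other) for other in group):
                group.append(match)
                break
        else:
            # The match overlaps something in every existing set.
            sets.append([match])
    return sets


# One pattern sets the POS and another sets the lemma on the same token.
matches = [
    (0, 1, {"POS": "VERB"}),
    (0, 1, {"LEMMA": "be"}),
    (2, 4, {"POS": "NOUN"}),
    (3, 5, {"LEMMA": "thing"}),
]
groups = split_into_disjoint_sets(matches)
```

Each resulting set contains no overlapping spans, so it could be handed to a separate retokenization pass without any match being discarded. How attributes from different passes interact when they touch the same attribute is exactly the open question raised in the commit message, and this sketch does not settle it.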
cli Merge pull request #5824 from svlandeg/fix/textcat-v3 2020-07-28 15:04:25 +02:00
displacy Tidy up, autoformat, add types 2020-07-25 15:01:15 +02:00
gold Tidy up, autoformat, add types 2020-07-25 15:01:15 +02:00
lang Tidy up [ci skip] 2020-07-25 13:00:49 +02:00
matcher Remove object subclassing 2020-07-12 14:03:23 +02:00
ml Rename default textcat arch to TextCatEnsemble 2020-07-26 15:11:43 +02:00
pipeline Extend AttributeRuler functionality 2020-07-30 11:17:33 +02:00
syntax Remove unused methods 2020-07-28 16:50:02 +02:00
tests Extend AttributeRuler functionality 2020-07-30 11:17:33 +02:00
tokens Add AttributeRuler for token attribute exceptions 2020-07-30 09:10:59 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Tidy up __init__.py 2020-07-25 12:14:37 +02:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Set version to v3.0.0a5 2020-07-25 14:06:01 +02:00
attrs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
attrs.pyx Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
compat.py Tidy up, autoformat, add types 2020-07-25 15:01:15 +02:00
default_config.cfg Remove scores list from config and document 2020-07-28 11:22:24 +02:00
errors.py Update name in error message 2020-07-30 10:23:23 +02:00
glossary.py unicode -> str consistency 2020-05-24 17:20:58 +02:00
gold.pyx Improve spacy.gold (no GoldParse, no json format!) (#5555) 2020-06-26 19:34:12 +02:00
kb.pxd Tidy up and avoid absolute spacy imports in core 2020-05-21 20:05:03 +02:00
kb.pyx Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
language.py Re-add meta["pipeline"] for now 2020-07-28 16:14:23 +02:00
lemmatizer.py Simplify language data and revert detailed configs 2020-07-24 14:50:26 +02:00
lexeme.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
lexeme.pyx WIP: move more language data to config 2020-07-22 15:59:37 +02:00
lookups.py Tidy up 2020-07-25 12:21:37 +02:00
morphology.pxd Update Morphology to load exceptions as MORPH_RULES 2020-07-16 21:16:49 +02:00
morphology.pyx Minor refactor for Morphology and MorphAnalysis (#5804) 2020-07-24 09:28:06 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
pipe_analysis.py Refactor pipeline components, config and language data (#5759) 2020-07-22 13:42:59 +02:00
schemas.py Remove scores list from config and document 2020-07-28 11:22:24 +02:00
scorer.py Update cats scoring to provide overall score 2020-07-27 12:26:10 +02:00
strings.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
strings.pyx unicode -> str consistency [ci skip] 2020-05-24 18:51:10 +02:00
structs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
symbols.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
symbols.pyx Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
tokenizer.pxd Remove dead and/or deprecated code (#5710) 2020-07-06 13:06:25 +02:00
tokenizer.pyx Merge pull request #5798 from explosion/feature/language-data-config 2020-07-25 13:34:49 +02:00
typedefs.pxd Update spaCy for thinc 8.0.0 (#4920) 2020-01-29 17:06:46 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Move error to Errors 2020-07-28 16:24:14 +02:00
vectors.pyx Remove object subclassing 2020-07-12 14:03:23 +02:00
vocab.pxd Tidy up and move noun_chunks, token_match, url_match 2020-07-22 22:18:46 +02:00
vocab.pyx Re-add setting for vocab data and tidy up 2020-07-25 12:14:28 +02:00