spaCy/spacy
Matthew Honnibal 77ddcf7381
💫 Update matcher engine for regex and extensions (#3173)
* Update matcher engine for regex and extensions

Add support for matching over arbitrary Python predicate functions, and
arbitrary Python attribute getters. This will allow matching over regex
patterns, and allow supporting extension attributes.

The results of the Python predicate functions are cached, so that we don't
call the same predicate function twice for the same token. The extension
attributes are fetched into an array for each token in the doc. This
should minimise the performance impact of the new features.

We still need to wire up these features to the patterns, and test it
all.

* Work on wiring up extra attributes in matcher

* Work on tests for extra matcher attrs

* Add support for extension attrs to matcher

* Test extension attribute matching

* Work on implementing predicate-based match patterns

* Get predicates working for set membership

* Add test for set membership

* Make extensions+predicates work

* Test matcher extensions

* Cache predicate results better in Matcher

* Remove print statement in matcher test

* Use srsly to get key for predicates
2019-01-21 13:23:15 +01:00
..
cli spacy.cli.evaluate: fix TypeError (#3101) 2018-12-28 11:14:28 +01:00
data Make spacy/data a package 2017-03-18 20:04:22 +01:00
displacy Small fixes to displaCy (#3076) 2018-12-20 17:32:04 +01:00
lang Prevent exceptions from setting POS but not TAG. Closes #1773 2018-12-30 13:16:05 +01:00
syntax 💫 Prevent parser from predicting unseen classes (#3075) 2018-12-20 16:12:22 +01:00
tests 💫 Update matcher engine for regex and extensions (#3173) 2019-01-21 13:23:15 +01:00
tokens Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2018-12-30 15:49:57 +01:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Tidy up and format remaining files 2018-11-30 17:43:08 +01:00
__main__.py 💫 New JSON helpers, training data internals & CLI rewrite (#2932) 2018-11-30 20:16:14 +01:00
_align.pyx Improve alignment around quotes 2018-08-16 01:04:34 +02:00
_ml.py 💫 Better support for semi-supervised learning (#3035) 2018-12-10 16:25:33 +01:00
about.py Set version to v2.1.0a6.dev0 2019-01-05 14:44:42 +01:00
attrs.pxd Fix LANG symbol 2018-02-17 18:10:50 +01:00
attrs.pyx Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
compat.py 💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003) 2018-12-03 01:28:22 +01:00
errors.py Small fixes to displaCy (#3076) 2018-12-20 17:32:04 +01:00
glossary.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
gold.pxd Add support for sent_start to GoldParse 2017-08-25 20:03:14 -05:00
gold.pyx Fix JSON segmentation bug that affected French 2018-12-08 10:41:24 +01:00
language.py Improve entry points and allow custom language classes via entry points (#3080) 2018-12-20 23:58:43 +01:00
lemmatizer.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
lexeme.pxd WIP on stringstore change. 27 failures 2017-05-28 14:06:40 +02:00
lexeme.pyx 💫 Add .similarity warnings for no vectors and option to exclude warnings (#2197) 2018-05-21 01:22:38 +02:00
matcher.pyx 💫 Update matcher engine for regex and extensions (#3173) 2019-01-21 13:23:15 +01:00
morphology.pxd Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
morphology.pyx Fix lemmatization 2018-07-05 13:56:02 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
pipeline.pxd Fix names of pipeline components 2017-10-26 12:38:23 +02:00
pipeline.pyx 💫 Raise better error when using uninitialized pipeline component (#3074) 2018-12-20 15:54:53 +01:00
scorer.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
strings.pxd Try to fix StringStore clean up (see #1506) 2017-11-11 03:11:27 +03:00
strings.pyx Add get_string_id helper to spacy.strings 2018-12-10 16:09:26 +01:00
structs.pxd Make NORM a token attribute (#3029) 2018-12-08 10:49:10 +01:00
symbols.pxd Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
symbols.pyx Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
tokenizer.pxd Disable tokenizer cache for special-cases. Fixes #1250 2017-10-24 16:08:05 +02:00
tokenizer.pyx 💫 Small efficiency fixes to tokenizer (#2587) 2018-07-24 23:35:54 +02:00
typedefs.pxd Work on changing StringStore to return hashes. 2017-05-28 12:36:27 +02:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Improve entry points and allow custom language classes via entry points (#3080) 2018-12-20 23:58:43 +01:00
vectors.pyx Fix KeyError in Vectors.most_similar. Fixes #2648 2018-12-10 16:19:18 +01:00
vocab.pxd 💫 Small efficiency fixes to tokenizer (#2587) 2018-07-24 23:35:54 +02:00
vocab.pyx Prevent exceptions from setting POS but not TAG. Closes #1773 2018-12-30 13:16:05 +01:00