spaCy/examples/training
adrianeboyd 8fe7bdd0fa Improve token pattern checking without validation (#4105)
* Fix typo in rule-based matching docs

* Improve token pattern checking without validation

Add more detailed token pattern checks without full JSON pattern validation and
provide more detailed error messages.

Addresses #4070 (also related: #4063, #4100).

* Check whether top-level attributes in patterns and attr for PhraseMatcher are
  in token pattern schema

* Check whether attribute value types are supported in general (as opposed to
  per attribute with full validation)

* Report various internal error types (OverflowError, AttributeError, KeyError)
  as ValueError with standard error messages

* Check for tagger/parser in PhraseMatcher pipeline for attributes TAG, POS,
  LEMMA, and DEP

* Add error messages with relevant details on how to use validate=True or nlp()
  instead of nlp.make_doc()

* Support attr=TEXT for PhraseMatcher

* Add NORM to schema

* Expand tests for pattern validation, Matcher, PhraseMatcher, and EntityRuler

* Remove unnecessary .keys()

* Rephrase error messages

* Add another type check to Matcher

Add another type check to Matcher for more understandable error messages
in some rare cases.

* Support phrase_matcher_attr=TEXT for EntityRuler

* Don't use spacy.errors in examples and bin scripts

* Fix error code

* Auto-format

Also try to get Azure Pipelines to finally start a build :(

* Update errors.py


Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2019-08-21 14:00:37 +02:00
conllu.py Remove unused cytoolz / itertools imports 2018-12-03 02:12:07 +01:00
ner_multitask_objective.py Auto-format examples 2018-12-02 04:26:26 +01:00
pretrain_kb.py Improve token pattern checking without validation (#4105) 2019-08-21 14:00:37 +02:00
pretrain_textcat.py Auto-format examples 2018-12-02 04:26:26 +01:00
rehearsal.py Update rehearsal example 2019-02-24 16:17:41 +01:00
train_entity_linker.py Improve token pattern checking without validation (#4105) 2019-08-21 14:00:37 +02:00
train_intent_parser.py Auto-format examples 2018-12-02 04:26:26 +01:00
train_ner.py Test and update examples [ci skip] 2019-03-16 14:15:49 +01:00
train_new_entity_type.py Update compatibility [ci skip] 2019-04-01 16:25:16 +02:00
train_parser.py Test and update examples [ci skip] 2019-03-16 14:15:49 +01:00
train_tagger.py Test and update examples [ci skip] 2019-03-16 14:15:49 +01:00
train_textcat.py Bug fixes and options for TextCategorizer (#3472) 2019-03-23 16:44:44 +01:00
training-data.json Update Example input JSON file to adhere to specification. (#3243) 2019-02-07 16:18:01 +01:00
vocab-data.jsonl Use even smaller example size 2017-10-30 19:46:45 +01:00