spaCy/spacy
Adriane Boyd a4b32b9552
Handle missing reference values in scorer (#6286)
* Handle missing reference values in scorer

Handle missing values in the reference doc during scoring where it is
possible to detect an unset state for the attribute. If no reference
docs contain annotation, `None` is returned instead of a score.
`spacy evaluate` displays `-` for missing scores, and the missing scores
are saved as `None`/`null` in the metrics.
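
The behaviour described above can be illustrated with a minimal, library-free sketch. It is not the code from `scorer.py`: the function names `score_token_attr` and `format_score`, the use of `None` as the unset marker, and the metric keys are all assumptions made for illustration. It shows tokens with an unset reference value being skipped, `None` being returned when nothing could be scored, and how such a score might be displayed as `-` and serialized as `null`.

```python
import json
from typing import Optional, Sequence, Tuple

# Hypothetical marker for "reference value is unset" (an assumption).
MISSING = None

# One example = (reference values, predicted values), aligned per token.
Example = Tuple[Sequence[Optional[str]], Sequence[str]]


def score_token_attr(examples: Sequence[Example]) -> Optional[float]:
    """Accuracy over tokens whose reference value is set.

    Returns None when no reference token carries annotation at all.
    """
    correct = 0
    total = 0
    for gold_values, pred_values in examples:
        for gold, pred in zip(gold_values, pred_values):
            if gold is MISSING:
                continue  # unset reference value: neither right nor wrong
            total += 1
            correct += int(gold == pred)
    if total == 0:
        return None
    return correct / total


def format_score(score: Optional[float]) -> str:
    """Render a score for display, using '-' for a missing score."""
    return "-" if score is None else f"{score * 100:.2f}"


# Partially annotated reference: the middle token is unset and is skipped.
annotated = [(["NOUN", None, "VERB"], ["NOUN", "ADJ", "NOUN"])]
# Reference without any annotation for this attribute.
unannotated = [([None, None], ["NOUN", "VERB"])]

metrics = {
    "tag_acc": score_token_attr(annotated),
    "pos_acc": score_token_attr(unannotated),
}
print({name: format_score(value) for name, value in metrics.items()})
print(json.dumps(metrics))  # None is written as null
```

Returning `None` rather than `0.0` keeps "nothing to evaluate" distinguishable from "everything was wrong", which is why the missing score survives as `null` in the saved metrics.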

Attributes without unset states:

* `token.head`: relies on `token.dep` to recognize unset values
* `doc.cats`: unable to handle missing annotation

Additional changes:

* add optional `has_annotation` check to `score_spans` to replace
`doc.sents` hack
* update `score_token_attr_per_feat` to handle missing and empty morph
representations
* fix bug in `Doc.has_annotation` for normalization of `IS_SENT_START`
vs. `SENT_START`

* Fix import

* Update return types
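
As a rough illustration of the optional `has_annotation` check mentioned in the additional changes, the sketch below scores labelled spans and skips any reference document that the callback reports as unannotated. It is a simplified stand-in, not the library's implementation: documents are modelled as plain dicts, and the signature, the `"ents"` key, and the returned keys `p`/`r`/`f` are assumptions.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Span = Tuple[int, int, str]   # (start, end, label)
Doc = Dict[str, List[Span]]   # hypothetical stand-in for a document


def score_spans(
    examples: Iterable[Tuple[Doc, Doc]],
    getter: Callable[[Doc], List[Span]],
    has_annotation: Optional[Callable[[Doc], bool]] = None,
) -> Dict[str, Optional[float]]:
    """Precision/recall/F over spans, skipping unannotated references."""
    tp = fp = fn = 0
    scored_any = False
    for reference, predicted in examples:
        # Skip references that carry no annotation for this attribute,
        # so their predictions are not counted as false positives.
        if has_annotation is not None and not has_annotation(reference):
            continue
        scored_any = True
        gold = set(getter(reference))
        pred = set(getter(predicted))
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    if not scored_any:
        return {"p": None, "r": None, "f": None}
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"p": p, "r": r, "f": f}


def get_ents(doc: Doc) -> List[Span]:
    return doc.get("ents", [])


def ents_annotated(doc: Doc) -> bool:
    # Assumption: a missing "ents" key means "not annotated",
    # while an empty list means "annotated, no entities".
    return "ents" in doc


examples = [
    ({"ents": [(0, 2, "PERSON"), (5, 6, "ORG")]},
     {"ents": [(0, 2, "PERSON"), (3, 4, "GPE")]}),
    ({}, {"ents": [(0, 1, "ORG")]}),  # reference has no annotation
]

with_check = score_spans(examples, get_ents, has_annotation=ents_annotated)
without_check = score_spans(examples, get_ents)
only_missing = score_spans(examples[1:], get_ents, has_annotation=ents_annotated)
print(with_check, without_check, only_missing)
```

Without the check, the prediction on the unannotated document is counted as a false positive and lowers precision; with it, that document is ignored, and a dataset with no annotated references yields `None` for every metric.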
2020-11-03 15:47:18 +01:00
cli Handle missing reference values in scorer (#6286) 2020-11-03 15:47:18 +01:00
displacy Refactor Docs.is_ flags (#6044) 2020-09-17 00:14:01 +02:00
lang Auto-format [ci skip] 2020-10-15 10:08:53 +02:00
matcher Fix on_match callback for DependencyMatcher (#6313) 2020-10-31 12:20:27 +01:00
ml TextCat updates and fixes (#6263) 2020-10-18 14:50:41 +02:00
pipeline Handle missing reference values in scorer (#6286) 2020-11-03 15:47:18 +01:00
tests Handle missing reference values in scorer (#6286) 2020-11-03 15:47:18 +01:00
tokens Handle missing reference values in scorer (#6286) 2020-11-03 15:47:18 +01:00
training fix resolving of dot notation (#6326) 2020-10-31 12:17:06 +01:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Support vocab arg in spacy.blank 2020-09-15 11:39:36 +02:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py TextCat updates and fixes (#6263) 2020-10-18 14:50:41 +02:00
attrs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
attrs.pyx Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
compat.py Use Literal type for nr_feature_tokens 2020-09-23 16:00:03 +02:00
default_config_pretraining.cfg Integrate file readers 2020-10-02 01:36:06 +02:00
default_config.cfg Update default config [ci skip] 2020-10-01 22:27:37 +02:00
errors.py TextCat updates and fixes (#6263) 2020-10-18 14:50:41 +02:00
glossary.py unicode -> str consistency 2020-05-24 17:20:58 +02:00
kb.pxd Revert added_strings change (#6236) 2020-10-10 18:55:07 +02:00
kb.pyx Revert added_strings change (#6236) 2020-10-10 18:55:07 +02:00
language.py Tidy up and auto-format 2020-10-10 19:14:48 +02:00
lexeme.pxd Fix Lexeme.from_ptr 2020-08-10 16:43:37 +02:00
lexeme.pyx Update docs links in codebase 2020-09-04 12:58:50 +02:00
lookups.py Always serialize lookups and vectors to disk 2020-10-05 09:40:20 +02:00
morphology.pxd Add Lemmatizer and simplify related components (#5848) 2020-08-07 15:27:13 +02:00
morphology.pyx Add _ as a symbol (#6153) 2020-09-27 22:20:14 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
pipe_analysis.py Tidy up and auto-format 2020-09-29 21:39:28 +02:00
schemas.py Fix TokenPatternSchema pattern field validation 2020-10-16 00:41:21 +02:00
scorer.py Handle missing reference values in scorer (#6286) 2020-11-03 15:47:18 +01:00
strings.pxd Remove 'cleanup' of strings (#6007) 2020-09-01 16:12:15 +02:00
strings.pyx Update docs links in codebase 2020-09-04 12:58:50 +02:00
structs.pxd Clean up MorphAnalysisC struct (#6146) 2020-09-25 15:56:48 +02:00
symbols.pxd Add _ as a symbol (#6153) 2020-09-27 22:20:14 +02:00
symbols.pyx Add _ as a symbol (#6153) 2020-09-27 22:20:14 +02:00
tokenizer.pxd Simplify specials and cache checks (#6012) 2020-09-03 09:42:49 +02:00
tokenizer.pyx Fix token.idx for special cases with affixes (#6035) 2020-09-13 14:05:36 +02:00
typedefs.pxd Update spaCy for thinc 8.0.0 (#4920) 2020-01-29 17:06:46 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py if-else 2020-10-13 09:27:19 +02:00
vectors.pyx Update docs links in codebase 2020-09-04 12:58:50 +02:00
vocab.pxd Minor renaming / refactoring 2020-09-18 19:43:19 +02:00
vocab.pyx Always serialize lookups and vectors to disk 2020-10-05 09:40:20 +02:00