spaCy/spacy/cli
Lj Miranda 42072f4468
Add spancat pipeline in spacy debug data (#10070)
* Set up debug data for spancat

* Add check for missing labels

* Add low-level data warning error

* Improve logic when compiling the gold train data

* Implement check for negative examples

* Remove breakpoint

* Remove ws_ents and missing entity checks

* Fix mypy errors

* Make variable name spans_key consistent

* Rename pipeline -> component for consistency

* Account for missing labels per spans_key

* Clean up variable names for consistency

* Improve brevity of conditional statements

* Remove unused variables

* Include spans_key as an argument for _get_examples

* Add a conditional check for spans_key

* Update spancat debug data based on new API

- Instead of using _get_labels_from_model(), I'm now using
_get_labels_from_spancat() (cf. https://github.com/explosion/spaCy/pull/10079)
- The way information is displayed was also changed (text -> table)

* Rename model_labels to ensure mypy works

* Update wording on warning messages

Use "span type" instead of "entity type" in the warning messages,
because Spans aren't necessarily entities.

* Change component type to a Literal

This makes it clear that the component parameter accepts only
'spancat' or 'ner'.
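
To illustrate the Literal change described above, here is a minimal sketch. The alias `ComponentName` and the helper `describe_component` are hypothetical names made up for this example and are not taken from debug_data.py; only the two accepted values, 'spancat' and 'ner', come from the commit message.

```python
from typing import Literal

# Hypothetical alias: restricts the parameter to the two values named above.
ComponentName = Literal["spancat", "ner"]


def describe_component(component: ComponentName = "ner") -> str:
    """Return the wording a warning message might use for this component.

    Illustrative only: a static checker such as mypy would flag a call like
    describe_component("tagger"), although nothing is enforced at runtime.
    """
    if component == "spancat":
        return "span type"
    return "entity type"


print(describe_component("spancat"))
print(describe_component())
```

The restriction is purely static, so an invalid value would only be caught if a type checker runs over the calling code.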

* Update checks to include actual model span_keys

Instead of looking at everything in the data, we only check the
span_keys used by the actual spancat component. Rather than filtering
inside the for-loop, a separate dictionary, data_labels_in_component,
holds the filtered labels.
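
The filtering step described above could look roughly like the sketch below. It assumes the data labels are stored as a mapping from spans key to a Counter of label frequencies. The function name `filter_labels_by_component`, the sample keys, and the sample labels are all hypothetical; only the name `data_labels_in_component` comes from the commit message.

```python
from collections import Counter
from typing import Dict, Iterable


def filter_labels_by_component(
    data_labels: Dict[str, Counter], component_keys: Iterable[str]
) -> Dict[str, Counter]:
    """Keep only the spans keys that the component actually uses."""
    keys = set(component_keys)
    return {key: counts for key, counts in data_labels.items() if key in keys}


# Hypothetical label counts gathered from the training data, per spans key.
data_labels = {
    "sc": Counter({"PERSON": 3, "ORG": 1}),
    "other": Counter({"MISC": 2}),
}

# Build the filtered dictionary once, outside any loop over the data.
data_labels_in_component = filter_labels_by_component(data_labels, ["sc"])
print(data_labels_in_component)
```

Building the dictionary up front keeps the later checks free of per-iteration filtering, which is the motivation the commit message gives.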

* Update spacy/cli/debug_data.py

* Show label counts only when verbose is True

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-02-07 15:03:36 +01:00
project Check for assets with size of 0 bytes (#10026) 2022-01-12 10:34:23 +01:00
templates Use paths.vectors for vectors in init config (#10146) 2022-02-04 21:09:48 +01:00
__init__.py assemble CLI command (#7783) 2021-04-19 18:39:11 +10:00
_util.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
assemble.py Tidy up with flake8: imports, comparisons, etc. 2021-06-28 12:08:15 +02:00
convert.py Minor fixes to convert CLI (#9465) 2021-10-14 18:37:34 +02:00
debug_config.py Fix references to config file in the docs & UX (#9961) 2022-01-04 14:31:26 +01:00
debug_data.py Add spancat pipeline in spacy debug data (#10070) 2022-02-07 15:03:36 +01:00
debug_model.py Tidy up with flake8: imports, comparisons, etc. 2021-06-28 12:08:15 +02:00
download.py Use minor version for compatibility check (#8403) 2021-06-21 09:39:22 +02:00
evaluate.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
info.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
init_config.py Fix references to config file in the docs & UX (#9961) 2022-01-04 14:31:26 +01:00
init_pipeline.py Add support for floret vectors (#8909) 2021-10-27 14:08:31 +02:00
package.py Fix Language-specific factory handling in package command (#9674) 2021-11-29 08:31:02 +01:00
pretrain.py Check if the resume path points to a directory (#7919) 2021-04-28 09:17:15 +02:00
profile.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
train.py Add docs section for spacy.cli.train.train (#9545) 2021-10-29 10:36:34 +02:00
validate.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00