spaCy/spacy/cli
Lj Miranda 1d34aa2b3d
Add spacy-span-analyzer to debug data (#10668)
* Rename to spans_key for consistency

* Implement spans length in debug data

* Implement how span bounds and spans are obtained

In this commit, I implemented how spans and their boundaries (the tokens
around a given span) are obtained. I've put this logic in the
compile_gold() function so that it's accessible later on. I will do the
actual computation of the span and boundary distinctiveness in the main
function above.

* Compute p_spans and p_bounds

* Add computation for SD and BD

* Fix mypy issues

* Add weighted average computation

* Fix compile_gold conditional logic

* Add test for frequency distribution computation

* Add tests for kl-divergence computation

* Fix weighted average computation

* Make tables more compact by rounding them

* Add more descriptive checks for spans

* Modularize span computation methods

In this commit, I added the _get_span_characteristics and
_print_span_characteristics functions so that they can be reused
anywhere.

* Remove unnecessary arguments and make functions more compact

* Update a few parameter arguments

* Add tests for print_span and get_span methods

* Update API to talk about span characteristics in brief

* Add better reporting of spans_length

* Add test for span length reporting

* Update formatting of span length report

Removed the quotes ('') to indicate that it's not a string, and
sorted the n-grams by their length, not by their frequency.

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Show the full frequency distribution when -V is passed

In this commit, I displayed the full frequency distribution of the
span lengths when --verbose is passed. To make things simpler, I
rewrote some of the formatter functions so that I can call them
whenever.

Another notable change is that instead of showing percentages as
integers, I showed them as floats (at most two decimal places). I did
this because it looks odd when a small percentage is displayed as 0%.

* Update logic on how total is computed

The way the 90% thresholding is computed now is that we keep
adding the percentages until we reach >= 90%. I also updated the wording
and used the term "At least" to denote that >= 90% of your spans have
these distributions.

* Fix display when showing the threshold percentage

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Add better phrasing for span information

* Update spacy/cli/debug_data.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Add minor edits for whitespaces etc.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-05-23 19:06:38 +02:00
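
The 90% thresholding described in the commit message above (keep adding percentages until the running total reaches >= 90%) can be sketched roughly as follows. This is a minimal illustration, not the code in debug_data.py: the function name `lengths_covering_threshold`, its parameters, and the sample data are hypothetical, and accumulating the most frequent lengths first is an assumption.

```python
from collections import Counter
from typing import Dict, List


def lengths_covering_threshold(
    span_lengths: List[int], threshold: float = 90.0
) -> Dict[int, float]:
    """Return span lengths (with their percentage share) whose cumulative
    share is at least `threshold` percent.

    Hypothetical helper: lengths are accumulated from most to least
    frequent (an assumption), then reported sorted by length.
    """
    if not span_lengths:
        return {}
    counts = Counter(span_lengths)
    total = sum(counts.values())
    kept: Dict[int, float] = {}
    running = 0.0
    for length, count in counts.most_common():
        pct = 100 * count / total
        # Report percentages as floats with at most two decimal places.
        kept[length] = round(pct, 2)
        running += pct
        if running >= threshold:
            break
    # Sort the report by span length, not by frequency.
    return dict(sorted(kept.items()))


# Made-up sample: 50 spans of length 1, 30 of length 2, 15 of length 3, 5 of length 4.
sample = [1] * 50 + [2] * 30 + [3] * 15 + [4] * 5
kept = lengths_covering_threshold(sample)
print(kept)  # {1: 50.0, 2: 30.0, 3: 15.0}
print(f"At least {sum(kept.values()):.2f}% of spans have these lengths")
```

Because the loop stops at the first point where the total is >= 90%, the reported share is usually above the threshold, which is why the "At least" wording fits.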
project Allow assets to be optional in spacy project (#10714) 2022-05-10 10:40:11 +02:00
templates Add spancat, trainable_lemmatizer to quickstart (#10524) 2022-04-01 09:01:04 +02:00
__init__.py Add debug diff command in spaCy CLI (#10502) 2022-04-07 10:48:45 +02:00
_util.py Stream large assets on download (#10521) 2022-03-24 11:47:05 +01:00
assemble.py Tidy up with flake8: imports, comparisons, etc. 2021-06-28 12:08:15 +02:00
convert.py Minor fixes to convert CLI (#9465) 2021-10-14 18:37:34 +02:00
debug_config.py Fix references to config file in the docs & UX (#9961) 2022-01-04 14:31:26 +01:00
debug_data.py Add spacy-span-analyzer to debug data (#10668) 2022-05-23 19:06:38 +02:00
debug_diff.py Add debug diff command in spaCy CLI (#10502) 2022-04-07 10:48:45 +02:00
debug_model.py Tidy up with flake8: imports, comparisons, etc. 2021-06-28 12:08:15 +02:00
download.py Use minor version for compatibility check (#8403) 2021-06-21 09:39:22 +02:00
evaluate.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
info.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
init_config.py Fix references to config file in the docs & UX (#9961) 2022-01-04 14:31:26 +01:00
init_pipeline.py Add support for floret vectors (#8909) 2021-10-27 14:08:31 +02:00
package.py Raise error in spacy package when model name is not a valid python identifier (#10192) 2022-02-10 08:15:23 +01:00
pretrain.py Check if the resume path points to a directory (#7919) 2021-04-28 09:17:15 +02:00
profile.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
train.py Add docs section for spacy.cli.train.train (#9545) 2021-10-29 10:36:34 +02:00
validate.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
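
The debug_data.py commit listed above mentions a frequency distribution and a KL-divergence computation for span distinctiveness (SD) and boundary distinctiveness (BD). A minimal sketch of that idea follows, assuming SD is the KL divergence between the token distribution inside spans and the corpus-wide token distribution. The commit message does not spell out the formula, so this definition, the function names `unigram_distribution` and `kl_divergence`, and the toy data are all assumptions rather than spaCy's actual implementation.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def unigram_distribution(token_lists: Iterable[List[str]]) -> Dict[str, float]:
    """Normalized frequency distribution over lowercased tokens (hypothetical helper)."""
    counts = Counter(tok.lower() for toks in token_lists for tok in toks)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) in nats. Assumes every word with mass in p also has mass in q,
    which holds when p is computed from a subset of the tokens behind q."""
    return sum(p_w * math.log(p_w / q[word]) for word, p_w in p.items())


# Toy corpus and the tokens covered by its (made-up) gold spans.
corpus = [["the", "cat", "sat"], ["the", "dog", "ran"]]
span_tokens = [["cat"], ["dog"]]

p_corpus = unigram_distribution(corpus)
p_spans = unigram_distribution(span_tokens)

# Higher values mean the span tokens look less like the corpus as a whole.
span_distinctiveness = kl_divergence(p_spans, p_corpus)
print(round(span_distinctiveness, 2))
```

Under the same assumption, boundary distinctiveness would use the tokens adjacent to each span in place of `span_tokens`.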