Mirror of https://github.com/explosion/spaCy.git, synced 2025-10-28 06:31:12 +03:00.
* Rename to `spans_key` for consistency
* Implement span length in `debug data`
* Implement how span bounds and spans are obtained

  In this commit, I implemented how spans and span boundaries (the tokens around a given span) are obtained. I put them in the `compile_gold()` function so that they are accessible later on. I will do the actual computation of span and boundary distinctiveness in the main function above.

* Compute `p_spans` and `p_bounds`
* Add computation for span distinctiveness (SD) and boundary distinctiveness (BD)
* Fix mypy issues
* Add weighted average computation
* Fix `compile_gold` conditional logic
* Add test for frequency distribution computation
* Add tests for KL-divergence computation
* Fix weighted average computation
* Make tables more compact by rounding values
* Add more descriptive checks for spans
* Modularize span computation methods

  In this commit, I added the `_get_span_characteristics` and `_print_span_characteristics` functions so that they can be reused anywhere.

* Remove unnecessary arguments and make functions more compact
* Update a few parameter arguments
* Add tests for `print_span` and `get_span` methods
* Update the API docs to briefly describe span characteristics
* Add better reporting of `spans_length`
* Add test for span length reporting
* Update formatting of span length report

  Removed the quotes ('') to indicate that it's not a string, then sorted the n-grams by their length instead of their frequency.

* Apply suggestions from code review

  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Show the full frequency distribution with `-V`

  In this commit, I displayed the full frequency distribution of span lengths when `--verbose` is passed. To keep things simple, I rewrote some of the formatter functions so that they can be called from anywhere. Another notable change is that percentages are now shown as floats (at most 2 decimal places) instead of integers, because small values looked odd when displayed as (0%).

* Update logic on how the total is computed

  The 90% threshold is now computed by adding up the percentages until the running total reaches >= 90%. I also updated the wording to use the term "At least" to denote that >= 90% of your spans have these distributions.

* Fix display when showing the threshold percentage
* Apply suggestions from code review

  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Add better phrasing for span information
* Update spacy/cli/debug_data.py

  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Add minor edits for whitespace etc.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
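
The span distinctiveness (SD) and boundary distinctiveness (BD) computations described in this commit can be sketched on plain token lists. This is a minimal sketch, assuming SD and BD are the KL divergence of the token distribution inside spans (`p_spans`) or directly around them (`p_bounds`) from the token distribution of the whole corpus, and that the weighted average is taken over span labels weighted by span count. The helpers `frequency_distribution`, `kl_divergence`, `boundary_tokens` and `weighted_average` are hypothetical names for illustration, not the actual functions in `spacy/cli/debug_data.py`, and the toy corpus works on strings rather than spaCy `Doc` objects.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Mapping


def frequency_distribution(tokens: Iterable[str]) -> Dict[str, float]:
    """Normalized frequency distribution: token -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def kl_divergence(p: Mapping[str, float], q: Mapping[str, float]) -> float:
    """KL(p || q) in nats; assumes q[tok] > 0 wherever p[tok] > 0."""
    return sum(p_tok * math.log(p_tok / q[tok]) for tok, p_tok in p.items() if p_tok > 0)


def boundary_tokens(tokens: List[str], start: int, end: int) -> List[str]:
    """The tokens just before and just after tokens[start:end], if they exist."""
    bounds = []
    if start > 0:
        bounds.append(tokens[start - 1])
    if end < len(tokens):
        bounds.append(tokens[end])
    return bounds


def weighted_average(values: Mapping[str, float], weights: Mapping[str, int]) -> float:
    """Average of per-label values, weighted by the number of spans per label."""
    total = sum(weights[label] for label in values)
    return sum(values[label] * weights[label] for label in values) / total


# Hypothetical toy corpus with (label, start, end) spans; `end` is exclusive.
corpus = "the cat sat on the mat near the new york office".split()
spans = [("LOC", 8, 10), ("ANIMAL", 1, 2)]

p_corpus = frequency_distribution(corpus)
span_counts = Counter(label for label, _, _ in spans)
sd: Dict[str, float] = {}
bd: Dict[str, float] = {}
for label in span_counts:
    # p_spans: distribution of tokens inside spans with this label
    span_toks = [tok for lab, s, e in spans if lab == label for tok in corpus[s:e]]
    # p_bounds: distribution of tokens directly around those spans
    bound_toks = [tok for lab, s, e in spans if lab == label for tok in boundary_tokens(corpus, s, e)]
    # Every span/boundary token also occurs in the corpus, so q > 0 holds here.
    sd[label] = kl_divergence(frequency_distribution(span_toks), p_corpus)
    bd[label] = kl_divergence(frequency_distribution(bound_toks), p_corpus)

print("SD per label:", {k: round(v, 2) for k, v in sd.items()})
print("BD per label:", {k: round(v, 2) for k, v in bd.items()})
print("Weighted SD:", round(weighted_average(sd, span_counts), 2))
```

KL divergence is asymmetric, which is why the direction is fixed as KL(span distribution || corpus distribution) in this sketch: it stays finite because every span token also occurs in the corpus, whereas the reverse direction would be infinite for any corpus token that never appears inside a span.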
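
The span-length report and its 90% threshold can be sketched in the same way. This is a minimal sketch, assuming spans are summarized by their token lengths, that lengths are listed in ascending order (as in the "sort by length, not frequency" change), and that percentages are accumulated in that same order until the running total reaches the threshold. `span_length_percentages` and `lengths_covering` are hypothetical names, not spaCy functions.

```python
from collections import Counter
from typing import Dict, List, Tuple

THRESHOLD = 90.0  # percent of spans the report should cover


def span_length_percentages(lengths: List[int]) -> Dict[int, float]:
    """Map each span length to the percentage of spans with that length,
    ordered by length rather than by frequency."""
    counts = Counter(lengths)
    return {n: 100 * counts[n] / len(lengths) for n in sorted(counts)}


def lengths_covering(percentages: Dict[int, float], threshold: float = THRESHOLD) -> Tuple[List[int], float]:
    """Add up percentages (in length order) until the running total is >= threshold."""
    covered: List[int] = []
    running = 0.0
    for length, pct in percentages.items():
        covered.append(length)
        running += pct
        if running >= threshold:
            break
    return covered, running


# Hypothetical toy data: token lengths of ten spans.
lengths = [1, 1, 1, 1, 2, 2, 2, 3, 3, 5]
percentages = span_length_percentages(lengths)
for length, pct in percentages.items():
    # Two decimal places, so small shares are not displayed as "0%".
    print(f"{length}: {pct:.2f}%")
covered, total = lengths_covering(percentages)
print(
    f"At least {THRESHOLD:.0f}% of spans have a length between "
    f"{covered[0]} and {covered[-1]} ({total:.2f}% in total)"
)
```

The percentages are kept unrounded while accumulating and are only rounded for display, so rounding cannot push the running total across the threshold.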

| File |
|---|
| architectures.md |
| attributeruler.md |
| cli.md |
| corpus.md |
| cython-classes.md |
| cython-structs.md |
| cython.md |
| data-formats.md |
| dependencymatcher.md |
| dependencyparser.md |
| doc.md |
| docbin.md |
| edittreelemmatizer.md |
| entitylinker.md |
| entityrecognizer.md |
| entityruler.md |
| example.md |
| index.md |
| kb.md |
| language.md |
| legacy.md |
| lemmatizer.md |
| lexeme.md |
| lookups.md |
| matcher.md |
| morphologizer.md |
| morphology.md |
| phrasematcher.md |
| pipe.md |
| pipeline-functions.md |
| scorer.md |
| sentencerecognizer.md |
| sentencizer.md |
| span.md |
| spancategorizer.md |
| spangroup.md |
| stringstore.md |
| tagger.md |
| textcategorizer.md |
| tok2vec.md |
| token.md |
| tokenizer.md |
| top-level.md |
| transformer.md |
| vectors.md |
| vocab.md |