Mirror of https://github.com/explosion/spaCy.git, synced 2024-12-26 18:06:29 +03:00
1d34aa2b3d
* Rename to spans_key for consistency
* Implement spans length in debug data
* Implement how span bounds and spans are obtained

  In this commit, I implemented how span boundaries (the tokens) around a given span and spans are obtained. I've put them in the compile_gold() function so that it's accessible later on. I will do the actual computation of the span and boundary distinctiveness in the main function above.
* Compute for p_spans and p_bounds
* Add computation for SD and BD
* Fix mypy issues
* Add weighted average computation
* Fix compile_gold conditional logic
* Add test for frequency distribution computation
* Add tests for kl-divergence computation
* Fix weighted average computation
* Make tables more compact by rounding them
* Add more descriptive checks for spans
* Modularize span computation methods

  In this commit, I added the _get_span_characteristics and _print_span_characteristics functions so that they can be reusable anywhere.
* Remove unnecessary arguments and make fxs more compact
* Update a few parameter arguments
* Add tests for print_span and get_span methods
* Update API to talk about span characteristics in brief
* Add better reporting of spans_length
* Add test for span length reporting
* Update formatting of span length report

  Removed '' to indicate that it's not a string, then sort the n-grams by their length, not by their frequency.
* Apply suggestions from code review

  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Show all frequency distribution when -V

  In this commit, I displayed the full frequency distribution of the span lengths when --verbose is passed. To make things simpler, I rewrote some of the formatter functions so that I can call them whenever. Another notable change is that instead of showing percentages as Integers, I showed them as floats (max 2-decimal places). I did this because it looks weird when it displays (0%).
* Update logic on how total is computed

  The way the 90% thresholding is computed now is that we keep adding the percentages until we reach >= 90%. I also updated the wording and used the term "At least" to denote that >= 90% of your spans have these distributions.
* Fix display when showing the threshold percentage
* Apply suggestions from code review

  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Add better phrasing for span information
* Update spacy/cli/debug_data.py

  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Add minor edits for whitespaces etc.

  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
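The 90% thresholding described in the commit message above (keep adding percentages until the running total reaches >= 90%) can be sketched roughly as follows. This is a minimal illustration, not the code in spacy/cli/debug_data.py: the function names `span_length_distribution` and `lengths_covering` are hypothetical, and accumulating from the most frequent length downward is an assumption, since the commit message does not state the order.

```python
from collections import Counter


def span_length_distribution(lengths):
    """Map each span length to its share of all spans, in percent.

    Hypothetical helper: `lengths` is a plain list of span lengths (in tokens).
    """
    counts = Counter(lengths)
    total = sum(counts.values())
    return {length: count * 100 / total for length, count in counts.items()}


def lengths_covering(distribution, threshold=90.0):
    """Add up percentages until the running total is >= threshold.

    Assumption: lengths are visited from most to least frequent.
    Returns the lengths that were needed and the percentage they cover.
    """
    covered = 0.0
    kept = {}
    by_frequency = sorted(distribution.items(), key=lambda kv: kv[1], reverse=True)
    for length, pct in by_frequency:
        kept[length] = pct
        covered += pct
        if covered >= threshold:
            break
    return kept, covered


if __name__ == "__main__":
    dist = span_length_distribution([1] * 5 + [2] * 3 + [3] + [4])
    kept, covered = lengths_covering(dist)
    # Report by span length rather than by frequency, with 2-decimal floats.
    report = ", ".join(f"{n} ({kept[n]:.2f}%)" for n in sorted(kept))
    print(f"At least {covered:.2f}% of spans have lengths: {report}")
```

Because the loop stops at the first total that meets the threshold, the reported figure can exceed 90%, which is consistent with the "At least" wording mentioned in the commit message.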
||
---|---|---|
architectures.md | ||
attributeruler.md | ||
cli.md | ||
corpus.md | ||
cython-classes.md | ||
cython-structs.md | ||
cython.md | ||
data-formats.md | ||
dependencymatcher.md | ||
dependencyparser.md | ||
doc.md | ||
docbin.md | ||
edittreelemmatizer.md | ||
entitylinker.md | ||
entityrecognizer.md | ||
entityruler.md | ||
example.md | ||
index.md | ||
kb.md | ||
language.md | ||
legacy.md | ||
lemmatizer.md | ||
lexeme.md | ||
lookups.md | ||
matcher.md | ||
morphologizer.md | ||
morphology.md | ||
phrasematcher.md | ||
pipe.md | ||
pipeline-functions.md | ||
scorer.md | ||
sentencerecognizer.md | ||
sentencizer.md | ||
span.md | ||
spancategorizer.md | ||
spangroup.md | ||
stringstore.md | ||
tagger.md | ||
textcategorizer.md | ||
tok2vec.md | ||
token.md | ||
tokenizer.md | ||
top-level.md | ||
transformer.md | ||
vectors.md | ||
vocab.md |