spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-03-06 12:51:26 +03:00

Lj Miranda 1d34aa2b3d Add spacy-span-analyzer to debug data (#10668)

* Rename to spans_key for consistency
* Implement spans length in debug data
* Implement how span bounds and spans are obtained
  In this commit, I implemented how span boundaries (the tokens) around a given span and spans are obtained. I've put them in the compile_gold() function so that it's accessible later on. I will do the actual computation of the span and boundary distinctiveness in the main function above.
* Compute for p_spans and p_bounds
* Add computation for SD and BD
* Fix mypy issues
* Add weighted average computation
* Fix compile_gold conditional logic
* Add test for frequency distribution computation
* Add tests for kl-divergence computation
* Fix weighted average computation
* Make tables more compact by rounding them
* Add more descriptive checks for spans
* Modularize span computation methods
  In this commit, I added the _get_span_characteristics and _print_span_characteristics functions so that they can be reusable anywhere.
* Remove unnecessary arguments and make fxs more compact
* Update a few parameter arguments
* Add tests for print_span and get_span methods
* Update API to talk about span characteristics in brief
* Add better reporting of spans_length
* Add test for span length reporting
* Update formatting of span length report
  Removed '' to indicate that it's not a string, then sort the n-grams by their length, not by their frequency.
* Apply suggestions from code review
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Show all frequency distribution when -V
  In this commit, I displayed the full frequency distribution of the span lengths when --verbose is passed. To make things simpler, I rewrote some of the formatter functions so that I can call them whenever. Another notable change is that instead of showing percentages as Integers, I showed them as floats (max 2-decimal places). I did this because it looks weird when it displays (0%).
* Update logic on how total is computed
  The way the 90% thresholding is computed now is that we keep adding the percentages until we reach >= 90%. I also updated the wording and used the term "At least" to denote that >= 90% of your spans have these distributions.
* Fix display when showing the threshold percentage
* Apply suggestions from code review
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Add better phrasing for span information
* Update spacy/cli/debug_data.py
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Add minor edits for whitespaces etc.
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

2022-05-23 19:06:38 +02:00
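The 90% thresholding described in the commit message (keep adding percentages until the running total reaches >= 90%) could be sketched roughly as below. This is an illustration only, not the code in spacy/cli/debug_data.py: the function name span_length_distribution, its parameters, and the choice to accumulate the most frequent lengths first are all assumptions. The result is ordered by span length, and percentages are rounded to two decimal places, as the commit message describes for the report.

```python
from collections import Counter
from typing import Dict, Iterable


def span_length_distribution(
    lengths: Iterable[int], threshold: float = 90.0
) -> Dict[int, float]:
    """Return {span_length: percentage} covering at least `threshold` percent.

    Hypothetical helper: the name, the signature and the frequency-first
    accumulation order are assumptions made for this sketch.
    """
    counts = Counter(lengths)
    total = sum(counts.values())
    if total == 0:
        return {}

    kept: Dict[int, float] = {}
    running_total = 0.0
    # Assumption: walk the lengths from most to least frequent.
    for length, count in counts.most_common():
        percentage = round(count / total * 100, 2)
        kept[length] = percentage
        running_total += percentage
        # Stop as soon as the accumulated share reaches the threshold.
        if running_total >= threshold:
            break

    # Report ordered by span length rather than by frequency.
    return dict(sorted(kept.items()))


if __name__ == "__main__":
    # Six spans of length 1, three of length 2, one of length 3.
    example = [1] * 6 + [2] * 3 + [3]
    print(span_length_distribution(example))
```

With the example input, lengths 1 and 2 together already account for 90% of the spans, so length 3 is left out of the result.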
cli	Add spacy-span-analyzer to debug data (#10668)	2022-05-23 19:06:38 +02:00
displacy	#10672: fixes displacy output for manual unsorted entities (#10673)	2022-04-27 09:51:58 +02:00
lang	Auto-format code with black (#10687)	2022-04-22 11:24:53 +02:00
matcher	Fix PhraseMatcher remove overlapping terms (#10734)	2022-05-12 12:23:52 +02:00
ml	Refactor error messages to remove hardcoded strings (#10729)	2022-05-02 13:38:46 +02:00
pipeline	Refactor error messages to remove hardcoded strings (#10729)	2022-05-02 13:38:46 +02:00
tests	Add spacy-span-analyzer to debug data (#10668)	2022-05-23 19:06:38 +02:00
tokens	Override SpanGroups.setdefault to provide default SpanGroup (#10772)	2022-05-12 10:06:25 +02:00
training	Alignment: use a simplified ragged type for performance (#10319)	2022-04-01 09:02:06 +02:00
__init__.pxd	* Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags.	2014-10-24 02:23:42 +11:00
__init__.py	Tidy up and auto-format	2021-07-18 15:44:56 +10:00
__main__.py	Tidy up	2020-06-22 00:45:40 +02:00
about.py	Set version to v3.3.0 (#10614)	2022-04-28 13:07:49 +02:00
attrs.pxd	Merge branch 'develop' into master-tmp	2020-05-21 18:39:06 +02:00
attrs.pyx	Intify IOB (#9738)	2022-01-20 13:19:38 +01:00
compat.py	Custom component types in spacy.ty (#9469)	2021-10-21 15:31:06 +02:00
default_config_pretraining.cfg	Add new parameter for saving every n epoch in pretraining (#8912)	2021-08-12 11:14:48 +02:00
default_config.cfg	Add a few docs to the default_config.cfg (#9981)	2022-01-05 09:16:40 +01:00
errors.py	Ignore overrides for pipe names in config argument (#10779)	2022-05-12 11:46:08 +02:00
glossary.py	Add glossary entry for root (#10821)	2022-05-20 09:56:32 +02:00
kb.pxd	Replace cpdef variables with cdef (#7834)	2021-04-26 16:54:02 +02:00
kb.pyx	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1	2021-10-26 11:53:50 +02:00
language.py	Ignore overrides for pipe names in config argument (#10779)	2022-05-12 11:46:08 +02:00
lexeme.pxd	Fix Lexeme.from_ptr	2020-08-10 16:43:37 +02:00
lexeme.pyi	fix type of lexeme.rank (#9979)	2022-01-04 13:15:25 +01:00
lexeme.pyx	Bugfix for similarity return types (#10051)	2022-01-20 11:40:46 +01:00
lookups.py	🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167)	2021-10-14 15:21:40 +02:00
morphology.pxd	Clean up Morphology imports and definitions (#7441)	2021-04-26 16:54:23 +02:00
morphology.pyx	Clean up Morphology imports and definitions (#7441)	2021-04-26 16:54:23 +02:00
parts_of_speech.pxd	Add support for Universal Dependencies v2.0	2017-03-03 13:17:34 +01:00
parts_of_speech.pyx	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
pipe_analysis.py	🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167)	2021-10-14 15:21:40 +02:00
py.typed	Add py.typed	2021-03-16 09:48:31 +01:00
schemas.py	Add ENT_IOB key to Matcher (#9649)	2022-01-20 13:18:39 +01:00
scorer.py	Alignment: use a simplified ragged type for performance (#10319)	2022-04-01 09:02:06 +02:00
strings.pxd	Update Cython string types (#9143)	2021-09-13 17:02:17 +02:00
strings.pyi	Fix StringStore.__getitem__ return type depending on parameter types (#10741)	2022-05-03 17:57:07 +02:00
strings.pyx	Update Cython string types (#9143)	2021-09-13 17:02:17 +02:00
structs.pxd	Add SpanGroup and Graph container types to represent arbitrary annotations (#6696)	2021-01-14 17:30:41 +11:00
symbols.pxd	introduce token.has_head and refer to MISSING_DEP_ (WIP)	2021-01-12 17:17:06 +01:00
symbols.pyx	introduce token.has_head and refer to MISSING_DEP_ (WIP)	2021-01-12 17:17:06 +01:00
tokenizer.pxd	Add tokenizer option to allow Matcher handling for all rules (#10452)	2022-03-24 13:21:32 +01:00
tokenizer.pyx	Add tokenizer option to allow Matcher handling for all rules (#10452)	2022-03-24 13:21:32 +01:00
ty.py	Custom component types in spacy.ty (#9469)	2021-10-21 15:31:06 +02:00
typedefs.pxd	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master	2020-11-25 11:49:34 +01:00
typedefs.pyx	Tidy up rest	2017-10-27 21:07:59 +02:00
util.py	hook up meta in load_model_from_config (#10400)	2022-03-04 11:07:45 +01:00
vectors.pyx	Save vectors as little endian, load with Ops.asarray (#10201)	2022-03-21 14:24:46 +01:00
vocab.pxd	Add support for floret vectors (#8909)	2021-10-27 14:08:31 +02:00
vocab.pyi	Add vector deduplication (#10551)	2022-03-30 08:54:23 +02:00
vocab.pyx	Add vector deduplication (#10551)	2022-03-30 08:54:23 +02:00