spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-10-04 10:56:45 +03:00

Lj Miranda 913d74f509 Add spancat_singlelabel pipeline for multiclass and non-overlapping span labelling tasks (#11365)	2023-03-09 10:30:59 +01:00

* [wip] Update
* [wip] Update
* Add initial port
* [wip] Update
* Fix all imports
* Add spancat_exclusive to pipeline
* [WIP] Update
* [ci skip] Add breakpoint for debugging
* Use spacy.SpanCategorizer.v1 as default archi
* Update spacy/pipeline/spancat_exclusive.py
  Co-authored-by: kadarakos <kadar.akos@gmail.com>
* [ci skip] Small updates
* Use Softmax v2 directly from thinc
* Cache the label map
* Fix mypy errors
  However, I ignored line 370 because it opened up a bunch of type errors that might be trickier to solve and might lead to a more complicated codebase.
* avoid multiplication with 1.0
  Co-authored-by: kadarakos <kadar.akos@gmail.com>
* Update spacy/pipeline/spancat_exclusive.py
  Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update component versions to v2
* Add scorer to docstring
* Add _n_labels property to SpanCategorizer
  Instead of using len(self.labels) in initialize() I am using a private property self._n_labels. This achieves implementation parity and allows me to delete the whole initialize() method for spancat_exclusive (since it's now the same with spancat).
* Inherit from SpanCat instead of TrainablePipe
  This commit changes the inheritance structure of Exclusive_Spancat, now it's inheriting from SpanCategorizer than TrainablePipe. This allows me to remove duplicate methods that are already present in the parent function.
* Revert documentation link to spancat
* Fix init call for exclusive spancat
* Update spacy/pipeline/spancat_exclusive.py
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Import Suggester from spancat
* Include zero_init.v1 for spancat
* Implement _allow_extra_label to use _n_labels
  To ensure that spancat / spancat_exclusive cannot be resized after initialization, I inherited the _allow_extra_label() method from spacy/pipeline/trainable_pipe.pyx and used self._n_labels instead of len(self.labels) for checking. I think that changing it locally is a better solution rather than forcing each class that inherits TrainablePipe to use the self._n_labels attribute. Also note that I turned-off black formatting in this block of code because it reads better without the overhang.
* Extend existing tests to spancat_exclusive
  In this commit, I extended the existing tests for spancat to include spancat_exclusive. I parametrized the test functions with 'name' (similar var name with textcat and textcat_multilabel) for each applicable test.
  TODO: Add overfitting tests for spancat_exclusive
* Update documentation for spancat
* Turn on formatting for allow_extra_label
* Remove initializers in default config
* Use DEFAULT_EXCL_SPANCAT_MODEL
  I also renamed spancat_exclusive_default_config into spancat_excl_default_config because black does some not pretty formatting changes.
* Update documentation
  Update grammar and usage
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Clarify docstring for Exclusive_SpanCategorizer
* Remove mypy ignore and typecast labels to list
* Fix documentation API
* Use a single variable for tests
* Update defaults for number of rows
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Put back initializers in spancat config
  Whenever I remove model.scorer.init_w and model.scorer.init_b, I encounter an error in the test: SystemError: <method '__getitem__' of 'dict' objects> returned a result with an error set. My Thinc version is 8.1.5, but I can't seem to check what's causing the error.
* Update spancat_exclusive docstring
* Remove init_W and init_B parameters
  This commit is expected to fail until the new Thinc release.
* Require thinc>=8.1.6 for serializable Softmax defaults
* Handle zero suggestions to make tests pass
  I'm not sure if this is the most elegant solution. But what should happen is that the _make_span_group function MUST return an empty SpanGroup if there are no suggestions. The error happens when the 'scores' variable is empty. We cannot get the 'predicted' and other downstream vars.
* Better approach for handling zero suggestions
* Update website/docs/api/spancategorizer.md
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update spancategorizer headers
* Apply suggestions from code review
  Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Add default value in negative_weight in docs
* Add default value in allow_overlap in docs
* Update how spancat_exclusive is constructed
  In this commit, I added the following:
  - Put the default values of negative_weight and allow_overlap in the default_config dictionary.
  - Rename make_spancat -> make_exclusive_spancat
* Run prettier on spancategorizer.mdx
* Change exactly one -> at most one
* Add suggester documentation in Exclusive_SpanCategorizer
* Add suggester to spancat docstrings
* merge multilabel and singlelabel spancat
* rename spancat_exclusive to singlelable
* wire up different make_spangroups for single and multilabel
* black
* black
* add docstrings
* more docstring and fix negative_label
* don't rely on default arguments
* black
* remove spancat exclusive
* replace single_label with add_negative_label and adjust inference
* mypy
* logical bug in configuration check
* add spans.attrs[scores]
* single label make_spangroup test
* bugfix
* black
* tests for make_span_group with negative labels
* refactor make_span_group
* black
* Update spacy/tests/pipeline/test_spancat.py
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* remove duplicate declaration
* Update spacy/pipeline/spancat.py
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* raise error instead of just print
* make label mapper private
* update docs
* run prettier
* Update website/docs/api/spancategorizer.mdx
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update website/docs/api/spancategorizer.mdx
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update spacy/pipeline/spancat.py
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update spacy/pipeline/spancat.py
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update spacy/pipeline/spancat.py
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update spacy/pipeline/spancat.py
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* don't keep recomputing self._label_map for each span
* typo in docs
* Intervals to private and document 'name' param
* Update spacy/pipeline/spancat.py
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update spacy/pipeline/spancat.py
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* add Tag to new features
* replace tags
* revert
* revert
* revert
* revert
* Update website/docs/api/spancategorizer.mdx
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update website/docs/api/spancategorizer.mdx
  Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* prettier
* Fix merge
* Update website/docs/api/spancategorizer.mdx
* remove references to 'single_label'
* remove old paragraph
* Add spancat_singlelabel to config template
* Format
* Extend init config tests

---------

Co-authored-by: kadarakos <kadar.akos@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

__init__.py	Revert #4334	2019-09-29 17:32:12 +02:00
test_analysis.py	Simplify pipe analysis	2020-08-01 13:40:06 +02:00
test_annotates_on_update.py	Tidy up and auto-format	2021-07-18 15:44:56 +10:00
test_attributeruler.py	Refactor scoring methods to use registered functions (#8766)	2021-08-10 15:13:39 +02:00
test_edit_tree_lemmatizer.py	Fix speed problem with `top_k>1` on CPU in edit tree lemmatizer (#12017)	2023-01-20 19:34:11 +01:00
test_entity_linker.py	rely on is_empty property instead of __len__ (#12347)	2023-03-01 12:06:07 +01:00
test_entity_ruler.py	Enable fuzzy text matching in Matcher (#11359)	2023-01-10 10:36:17 +01:00
test_functions.py	Add doc_cleaner component (#9659)	2021-11-23 15:33:33 +01:00
test_initialize.py	Test with default value	2020-09-29 17:00:40 +02:00
test_lemmatizer.py	Tidy up and auto-format	2021-07-18 15:44:56 +10:00
test_models.py	Tidy up code	2021-06-28 12:08:15 +02:00
test_morphologizer.py	removing print statements from the test suite (#10712)	2022-04-27 09:14:25 +02:00
test_pipe_factories.py	Auto-format code with black (#10795)	2022-05-13 19:02:08 +02:00
test_pipe_methods.py	Revert disable/disabled merging behavior (#11745)	2022-11-08 14:58:10 +01:00
test_sentencizer.py	Refactor Docs.is_ flags (#6044)	2020-09-17 00:14:01 +02:00
test_senter.py	Add Pipe.hide_labels to omit labels from pipeline meta (#10175)	2022-02-05 17:59:24 +01:00
test_span_ruler.py	Add SpanRuler component (#9880)	2022-06-02 13:12:53 +02:00
test_spancat.py	Add spancat_singlelabel pipeline for multiclass and non-overlapping span labelling tasks (#11365)	2023-03-09 10:30:59 +01:00
test_tagger.py	Migrate regression tests into the main test suite (#9655)	2021-12-04 20:34:48 +01:00
test_textcat.py	Improve score_cats for use with multiple textcat components (#11820)	2023-01-09 11:43:48 +01:00
test_tok2vec.py	Auto-format code with black (#11687)	2022-10-21 11:54:17 +02:00