When the default `max_length` is not set and there are longer training
documents, it can be difficult to train and evaluate the span finder due
to memory limits and the time it takes to evaluate a huge number of
predicted spans.
* Use isort with Black profile
* isort all the things
* Fix import cycles as a result of import sorting
* Add DOCBIN_ALL_ATTRS type definition
* Add isort to requirements
* Remove isort from build dependencies check
* Typo
* span finder integrated into spacy from experimental
* black
* isort
* black
* default spankey constant
* black
* Update spacy/pipeline/spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* rename
* rename
* max_length and min_length as Optional[int] and strict checking
* black
* mypy fix for integer type infinity
* revert line order
* implement all comparison operators for inf int
* avoid two for loops over all docs by not precomputing
* interleave thresholding with span creation
* black
* revert to not interleaving (relized its faster)
* black
* Update spacy/errors.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* update dosctring
* enforce that the gold and predicted documents have the same text
* new error for ensuring reference and predicted texts are the same
* remove todo
* adjust test
* black
* handle misaligned tokenization
* return correct variable
* failing overfit test
* only use a single spans_key like in spancat
* black
* remove debug lines
* typo
* remove comment
* remove near duplicate reduntant method
* use the 'spans_key' variable name everywhere
* Update spacy/pipeline/span_finder.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* flaky test fix suggestion, hand set bias terms
* only test suggester and test result exhaustively
* make it clear that the span_finder_suggester is more general (not specific to span_finder)
* Update spacy/tests/pipeline/test_span_finder.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Apply suggestions from code review
* remove question comment
* move preset_spans_suggester test to spancat tests
* Add docs and unify default configs for spancat and span finder
* Add `allow_overlap=True` to span finder scorer
* Fix offset bug in set_annotations
* Ignore labels in span finder scorer
* Format
* Add span_finder to quickstart template
* Move settings to self.cfg, store min/max unset as None
* Remove debugging
* Update docstrings and docs
* Update spacy/pipeline/span_finder.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fix imports
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>