Commit Graph

641 Commits

Author SHA1 Message Date
Raphael Mitsch
78c72d3ab7
Merge branch 'main' into feature/docwise-generator-batching 2024-01-30 21:00:22 +01:00
Daniël de Kok
ce4ea5ffa7 Py_UNICODE is not compatible with 3.12 2024-01-24 13:08:56 +01:00
Daniël de Kok
82ef6783a8 Merge remote-tracking branch 'upstream/master' into maintenance/v4-merge-master-20240119 2024-01-24 09:09:01 +01:00
Daniël de Kok
81beaea70e Merge remote-tracking branch 'upstream/master' into maintenance/v4-merge-master-20240119 2024-01-19 12:34:29 +01:00
maurice
c608baeecc
Fix typo in method name 2024-01-16 21:54:54 +01:00
Daniël de Kok
7351f6bbeb Update thinc dependency to 9.0.0.dev4 2024-01-16 15:56:09 +01:00
Daniël de Kok
7ebba86402
Add TextCatReduce.v1 (#13181)
* Add TextCatReduce.v1

This is a textcat classifier that pools the vectors generated by a
tok2vec implementation and then applies a classifier to the pooled
representation. Three reductions are supported for pooling: first, max,
and mean. When multiple reductions are enabled, the reductions are
concatenated before providing them to the classification layer.

This model is a generalization of the TextCatCNN model, which only
supports mean reductions and is a bit of a misnomer, because it can also
be used with transformers. This change also reimplements TextCatCNN.v2
using the new TextCatReduce.v1 layer.

* Doc fixes

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fully specify `TextCatCNN` <-> `TextCatReduce` equivalence

* Move TextCatCNN docs to legacy, in prep for moving to spacy-legacy

* Add back a test for TextCatCNN.v2

* Replace TextCatCNN in pipe configurations and templates

* Add an infobox to the `TextCatReduce` section with an `TextCatCNN` anchor

* Add last reduction (`use_reduce_last`)

* Remove non-working TextCatCNN Netlify redirect

* Revert layer changes for the quickstart

* Revert one more quickstart change

* Remove unused import

* Fix docstring

* Fix setting name in error message

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-12-21 11:00:06 +01:00
Daniël de Kok
9b36729cbd Fix Cython lints 2023-12-18 20:02:15 +01:00
Daniël de Kok
42fe4edfd7 Add distillation tests with max cut size
And fix endless loop when the max cut size is 0 or 1.
2023-12-08 20:38:01 +01:00
Daniël de Kok
e2591cda36 isort 2023-12-08 20:24:09 +01:00
Daniël de Kok
e5ec45cb7e Revert "Merge the parser refactor into v4 (#10940)"
This reverts commit a183db3cef.
2023-12-08 20:23:08 +01:00
Daniël de Kok
05803cfe76 Revert "Reimplement distillation with oracle cut size (#12214)"
This reverts commit e27c60a702.
2023-12-08 14:38:05 +01:00
Daniël de Kok
da7ad97519
Update TextCatBOW to use the fixed SparseLinear layer (#13149)
* Update `TextCatBOW` to use the fixed `SparseLinear` layer

A while ago, we fixed the `SparseLinear` layer to use all available
parameters: https://github.com/explosion/thinc/pull/754

This change updates `TextCatBOW` to `v3` which uses the new
`SparseLinear_v2` layer. This results in a sizeable improvement on a
text categorization task that was tested.

While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent`
option to make it possible to change the hidden size. Ideally, we'd just
have an option called `length`. But the way that `TextCatBOW` uses
hashes results in a non-uniform distribution of parameters when the
length is not a power of two.

* Replace TexCatBOW `length_exponent` parameter by `length`

We now round up the length to the next power of two if it isn't
a power of two.

* Remove some tests for TextCatBOW.v2

* Fix missing import
2023-11-29 09:11:54 +01:00
Sofie Van Landeghem
699dd8b3b7
Update __all__ fields (#13063)
* update all for pipeline.init

* add all in training.init

* add all in kb.init

* alphabetically
2023-10-16 10:17:47 +02:00
Adriane Boyd
538304948e Remove profile=True from currently profiled cython 2023-09-28 17:09:41 +02:00
Adriane Boyd
55614d6799 Add profile=False to currently unprofiled cython 2023-09-28 17:09:41 +02:00
Adriane Boyd
245e2ddc25
Allow pydantic v2 using transitional v1 support (#12888) 2023-08-08 11:27:28 +02:00
Adriane Boyd
2702db9fef
Recommend lookups tables from URLs or other loaders (#12283)
* Recommend lookups tables from URLs or other loaders

Shift away from the `lookups` extra (which isn't removed, just no longer
mentioned) and recommend loading data from the `spacy-lookups-data` repo
or other sources rather than the `spacy-lookups-data` package.

If the tables can't be loaded from the `lookups` registry in the
lemmatizer, show how to specify the tables in `[initialize]` rather than
recommending the `spacy-lookups-data` package.

* Add tests for some rule-based lemmatizers

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-07-31 15:54:35 +02:00
Raphael Mitsch
25bce73461 Format. 2023-07-28 12:56:45 +02:00
Raphael Mitsch
645b525a08 Fix merge error & tests. 2023-07-28 12:55:09 +02:00
Raphael Mitsch
61b2215b0e Format. 2023-07-27 16:29:14 +02:00
Raphael Mitsch
a2585333a9 Fix merge errors. 2023-07-27 16:27:59 +02:00
Raphael Mitsch
8aa59c4f65 Merge branch 'v4' into feature/docwise-generator-batching
# Conflicts:
#	spacy/kb/kb.pyx
#	spacy/kb/kb_in_memory.pyx
#	spacy/ml/models/entity_linker.py
#	spacy/pipeline/entity_linker.py
#	spacy/tests/pipeline/test_entity_linker.py
#	website/docs/api/entitylinker.mdx
2023-07-27 14:28:06 +02:00
svlandeg
96f2e30c4b cython fixes and cleanup 2023-07-19 17:41:29 +02:00
svlandeg
47a82c6164 merge fixes 2023-07-19 16:38:29 +02:00
svlandeg
0e3b6a87d6 Merge branch 'upstream_master' into sync_v4 2023-07-19 16:37:31 +02:00
Basile Dura
b0228d8ea6
ci: add cython linter (#12694)
* chore: add cython-linter dev dependency

* fix: lexeme.pyx

* fix: morphology.pxd

* fix: tokenizer.pxd

* fix: vocab.pxd

* fix: morphology.pxd (line length)

* ci: add cython-lint

* ci: fix cython-lint call

* Fix kb/candidate.pyx.

* Fix kb/kb.pyx.

* Fix kb/kb_in_memory.pyx.

* Fix kb.

* Fix training/ partially.

* Fix training/. Ignore trailing whitespaces and too long lines.

* Fix ml/.

* Fix matcher/.

* Fix pipeline/.

* Fix tokens/.

* Fix build errors. Fix vocab.pyx.

* Fix cython-lint install and run.

* Fix lexeme.pyx, parts_of_speech.pxd, vectors.pyx. Temporarily disable cython-lint execution.

* Fix attrs.pyx, lexeme.pyx, symbols.pxd, isort issues.

* Make cython-lint install conditional. Fix tokenizer.pyx.

* Fix remaining files. Reenable cython-lint check.

* Readded parentheses.

* Fix test_build_dependencies().

* Add explanatory comment to cython-lint execution.

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-07-19 12:03:31 +02:00
Adriane Boyd
830dcca367
SpanFinder: set default max_length to 25 (#12791)
When the default `max_length` is not set and there are longer training
documents, it can be difficult to train and evaluate the span finder due
to memory limits and the time it takes to evaluate a huge number of
predicted spans.
2023-07-06 09:55:34 +02:00
Adriane Boyd
337a360cc7
Use spans_ prefix for default span finder scores (#12753) 2023-06-27 19:32:17 +02:00
Daniël de Kok
2468742cb8 isort all the things 2023-06-26 11:41:03 +02:00
Daniël de Kok
e2b70df012
Configure isort to use the Black profile, recursively isort the spacy module (#12721)
* Use isort with Black profile

* isort all the things

* Fix import cycles as a result of import sorting

* Add DOCBIN_ALL_ATTRS type definition

* Add isort to requirements

* Remove isort from build dependencies check

* Typo
2023-06-14 17:48:41 +02:00
Daniël de Kok
4990cfefb4 spancat type fixes 2023-06-12 16:43:11 +02:00
Daniël de Kok
50c5e9a2dd Merge remote-tracking branch 'upstream/master' into sync-v4-master-20230612 2023-06-12 15:57:10 +02:00
kadarakos
c003aac29a
SpanFinder into spaCy from experimental (#12507)
* span finder integrated into spacy from experimental

* black

* isort

* black

* default spankey constant

* black

* Update spacy/pipeline/spancat.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* rename

* rename

* max_length and min_length as Optional[int] and strict checking

* black

* mypy fix for integer type infinity

* revert line order

* implement all comparison operators for inf int

* avoid two for loops over all docs by not precomputing

* interleave thresholding with span creation

* black

* revert to not interleaving (relized its faster)

* black

* Update spacy/errors.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* update dosctring

* enforce that the gold and predicted documents have the same text

* new error for ensuring reference and predicted texts are the same

* remove todo

* adjust test

* black

* handle misaligned tokenization

* return correct variable

* failing overfit test

* only use a single spans_key like in spancat

* black

* remove debug lines

* typo

* remove comment

* remove near duplicate reduntant method

* use the 'spans_key' variable name everywhere

* Update spacy/pipeline/span_finder.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* flaky test fix suggestion, hand set bias terms

* only test suggester and test result exhaustively

* make it clear that the span_finder_suggester is more general (not specific to span_finder)

* Update spacy/tests/pipeline/test_span_finder.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Apply suggestions from code review

* remove question comment

* move preset_spans_suggester test to spancat tests

* Add docs and unify default configs for spancat and span finder

* Add `allow_overlap=True` to span finder scorer

* Fix offset bug in set_annotations

* Ignore labels in span finder scorer

* Format

* Add span_finder to quickstart template

* Move settings to self.cfg, store min/max unset as None

* Remove debugging

* Update docstrings and docs

* Update spacy/pipeline/span_finder.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix imports

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-06-07 15:52:28 +02:00
Raphael Mitsch
d1371d1043 Simplify doc loop in predict(). 2023-04-24 22:13:05 +02:00
Raphael Mitsch
2c80db9371 Format. 2023-04-24 21:25:48 +02:00
Raphael Mitsch
ee5d7f4a33 Drop valid_ent_idx_per_doc. 2023-04-24 21:15:30 +02:00
Raphael Mitsch
7aa3758af6 Reformat imports in entity_linker.py. 2023-04-24 21:05:37 +02:00
Raphael Mitsch
49747697a2 Merge branch 'v4' into feature/docwise-generator-batching
# Conflicts:
#	spacy/kb/kb.pyx
#	spacy/ml/models/entity_linker.py
#	spacy/pipeline/entity_linker.py
#	website/docs/api/inmemorylookupkb.mdx
#	website/docs/api/kb.mdx
2023-04-17 16:28:09 +02:00
Adriane Boyd
69e20ce03d
Fix pickle for ngram suggester (#12486) 2023-03-31 13:43:51 +02:00
kadarakos
372a90885e
Fix spancat-singlelabel score (#12469)
* debug argmax sort and add span scores

* add missing tests for spanscores
2023-03-29 08:38:11 +02:00
Vinit Ravishankar
28de85737f
Tagger label smoothing (#12293)
* add label smoothing

* use True/False instead of floats

* add entropy to debug data

* formatting

* docs

* change test to check difference in distributions

* Update website/docs/api/tagger.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/pipeline/tagger.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* bool -> float

* update docs

* fix seed

* black

* update tests to use label_smoothing = 0.0

* set default to 0.0, update quickstart

* Update spacy/pipeline/tagger.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* update morphologizer, tagger test

* fix morph docs

* add url to docs

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-22 12:17:56 +01:00
Raphael Mitsch
3102e2e27a
Entity linking: use SpanGroup instead of Iterable[Span] for mentions (#12344)
* Convert Candidate from Cython to Python class.

* Format.

* Fix .entity_ typo in _add_activations() usage.

* Change type for mentions to look up entity candidates for to SpanGroup from Iterable[Span].

* Update docs.

* Update spacy/kb/candidate.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update doc string of BaseCandidate.__init__().

* Update spacy/kb/candidate.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Rename Candidate to InMemoryCandidate, BaseCandidate to Candidate.

* Adjust Candidate to support and mandate numerical entity IDs.

* Format.

* Fix docstring and docs.

* Update website/docs/api/kb.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Rename alias -> mention.

* Refactor Candidate attribute names. Update docs and tests accordingly.

* Refacor Candidate attributes and their usage.

* Format.

* Fix mypy error.

* Update error code in line with v4 convention.

* Reverse erroneous changes during merge.

* Update return type in EL tests.

* Re-add Candidate to setup.py.

* Format updated docs.

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-20 12:25:18 +01:00
Raphael Mitsch
e5be5d6092 Merge branch 'v4' into feature/docwise-generator-batching
# Conflicts:
#	spacy/kb/kb.pyx
#	spacy/kb/kb_in_memory.pyx
#	spacy/ml/models/entity_linker.py
#	spacy/pipeline/entity_linker.py
#	spacy/tests/pipeline/test_entity_linker.py
#	website/docs/api/inmemorylookupkb.mdx
#	website/docs/api/kb.mdx
2023-03-20 10:50:54 +01:00
Raphael Mitsch
cb79af3a10 Fix merge leftovers. 2023-03-20 10:31:11 +01:00
Raphael Mitsch
73bdeb01e4 Merge branch 'refactor/el-candidates' into feature/docwise-generator-batching
# Conflicts:
#	spacy/kb/candidate.py
#	spacy/kb/kb.pyx
#	spacy/kb/kb_in_memory.pyx
#	spacy/ml/models/entity_linker.py
#	spacy/pipeline/entity_linker.py
#	spacy/tests/pipeline/test_entity_linker.py
#	website/docs/api/inmemorylookupkb.mdx
#	website/docs/api/kb.mdx
2023-03-20 10:24:17 +01:00
Raphael Mitsch
9340eb8ad2
Introduce hierarchy for EL Candidate objects (#12341)
* Convert Candidate from Cython to Python class.

* Format.

* Fix .entity_ typo in _add_activations() usage.

* Update spacy/kb/candidate.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update doc string of BaseCandidate.__init__().

* Update spacy/kb/candidate.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Rename Candidate to InMemoryCandidate, BaseCandidate to Candidate.

* Adjust Candidate to support and mandate numerical entity IDs.

* Format.

* Fix docstring and docs.

* Update website/docs/api/kb.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Rename alias -> mention.

* Refactor Candidate attribute names. Update docs and tests accordingly.

* Refacor Candidate attributes and their usage.

* Format.

* Fix mypy error.

* Update error code in line with v4 convention.

* Update spacy/kb/candidate.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Updated error code.

* Simplify interface for int/str representations.

* Update website/docs/api/kb.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Rename 'alias' to 'mention'.

* Port Candidate and InMemoryCandidate to Cython.

* Remove redundant entry in setup.py.

* Add abstract class check.

* Drop storing mention.

* Update spacy/kb/candidate.pxd

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix entity_id refactoring problems in docstrings.

* Drop unused InMemoryCandidate._entity_hash.

* Update docstrings.

* Move attributes out of Candidate.

* Partially fix alias/mention terminology usage. Convert Candidate to interface.

* Remove prior_prob from supported properties in Candidate. Introduce KnowledgeBase.supports_prior_probs().

* Update docstrings related to prior_prob.

* Update alias/mention usage in doc(strings).

* Update spacy/ml/models/entity_linker.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/ml/models/entity_linker.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Mention -> alias renaming. Drop Candidate.mentions(). Drop InMemoryLookupKB.get_alias_candidates() from docs.

* Update docstrings.

* Fix InMemoryCandidate attribute names.

* Update spacy/kb/kb.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/ml/models/entity_linker.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update W401 test.

* Update spacy/errors.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/kb/kb.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Use Candidate output type for toy generators in the test suite to mimick best practices

* fix docs

* fix import

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-20 00:34:35 +01:00
Raphael Mitsch
96b61d0671
Fix EL failure with sentence-crossing entities (#12398)
* Add test reproducing EL failure in sentence-crossing entities.

* Format.

* Draft fix.

* Format.

* Fix case for len(ent.sents) == 1.

* Format.

* Format.

* Format.

* Fix mypy error.

* Merge EL sentence crossing tests.

* Remove unneeded sentencizer component.

* Fix or ignore mypy issues in test.

* Simplify ent.sents handling.

* Format. Update assert in ent.sents handling.

* Small rewrite

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-14 22:02:49 +01:00
Raphael Mitsch
4a921766f1 Remove prior_prob from supported properties in Candidate. Introduce KnowledgeBase.supports_prior_probs(). 2023-03-13 16:54:38 +01:00
Lj Miranda
913d74f509
Add spancat_singlelabel pipeline for multiclass and non-overlapping span labelling tasks (#11365)
* [wip] Update

* [wip] Update

* Add initial port

* [wip] Update

* Fix all imports

* Add spancat_exclusive to pipeline

* [WIP] Update

* [ci skip] Add breakpoint for debugging

* Use spacy.SpanCategorizer.v1 as default archi

* Update spacy/pipeline/spancat_exclusive.py

Co-authored-by: kadarakos <kadar.akos@gmail.com>

* [ci skip] Small updates

* Use Softmax v2 directly from thinc

* Cache the label map

* Fix mypy errors

However, I ignored line 370 because it opened up a bunch of type errors
that might be trickier to solve and might lead to a more complicated
codebase.

* avoid multiplication with 1.0

Co-authored-by: kadarakos <kadar.akos@gmail.com>

* Update spacy/pipeline/spancat_exclusive.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update component versions to v2

* Add scorer to docstring

* Add _n_labels property to SpanCategorizer

Instead of using len(self.labels) in initialize() I am using a private
property self._n_labels. This achieves implementation parity and allows
me to delete the whole initialize() method for spancat_exclusive (since
it's now the same with spancat).

* Inherit from SpanCat instead of TrainablePipe

This commit changes the inheritance structure of Exclusive_Spancat,
now it's inheriting from SpanCategorizer than TrainablePipe. This
allows me to remove duplicate methods that are already present in
the parent function.

* Revert documentation link to spancat

* Fix init call for exclusive spancat

* Update spacy/pipeline/spancat_exclusive.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Import Suggester from spancat

* Include zero_init.v1 for spancat

* Implement _allow_extra_label to use _n_labels

To ensure that spancat / spancat_exclusive cannot be resized after
initialization, I inherited the _allow_extra_label() method from
spacy/pipeline/trainable_pipe.pyx and used self._n_labels instead
of len(self.labels) for checking.

I think that changing it locally is a better solution rather than
forcing each class that inherits TrainablePipe to use the self._n_labels
attribute.

Also note that I turned-off black formatting in this block of code
because it reads better without the overhang.

* Extend existing tests to spancat_exclusive

In this commit, I extended the existing tests for spancat to include
spancat_exclusive. I parametrized the test functions with 'name'
(similar var name with textcat and textcat_multilabel) for each
applicable test.

TODO: Add overfitting tests for spancat_exclusive

* Update documentation for spancat

* Turn on formatting for allow_extra_label

* Remove initializers in default config

* Use DEFAULT_EXCL_SPANCAT_MODEL

I also renamed spancat_exclusive_default_config into
spancat_excl_default_config because black does some not pretty
formatting changes.

* Update documentation

Update grammar and usage

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Clarify docstring for Exclusive_SpanCategorizer

* Remove mypy ignore and typecast labels to list

* Fix documentation API

* Use a single variable for tests

* Update defaults for number of rows

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Put back initializers in spancat config

Whenever I remove model.scorer.init_w and model.scorer.init_b,
I encounter an error in the test:

    SystemError: <method '__getitem__' of 'dict' objects> returned a result
    with an error set.

My Thinc version is 8.1.5, but I can't seem to check what's causing the
error.

* Update spancat_exclusive docstring

* Remove init_W and init_B parameters

This commit is expected to fail until the new Thinc release.

* Require thinc>=8.1.6 for serializable Softmax defaults

* Handle zero suggestions to make tests pass

I'm not sure if this is the most elegant solution. But what should
happen is that the _make_span_group function MUST return an empty
SpanGroup if there are no suggestions.

The error happens when the 'scores' variable is empty. We cannot
get the 'predicted' and other downstream vars.

* Better approach for handling zero suggestions

* Update website/docs/api/spancategorizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spancategorizer headers

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Add default value in negative_weight in docs

* Add default value in allow_overlap in docs

* Update how spancat_exclusive is constructed

In this commit, I added the following:
- Put the default values of negative_weight and allow_overlap
    in the default_config dictionary.
- Rename make_spancat -> make_exclusive_spancat

* Run prettier on spancategorizer.mdx

* Change exactly one -> at most one

* Add suggester documentation in Exclusive_SpanCategorizer

* Add suggester to spancat docstrings

* merge multilabel and singlelabel spancat

* rename spancat_exclusive to singlelable

* wire up different make_spangroups for single and multilabel

* black

* black

* add docstrings

* more docstring and fix negative_label

* don't rely on default arguments

* black

* remove spancat exclusive

* replace single_label with add_negative_label and adjust inference

* mypy

* logical bug in configuration check

* add spans.attrs[scores]

* single label make_spangroup test

* bugfix

* black

* tests for make_span_group with negative labels

* refactor make_span_group

* black

* Update spacy/tests/pipeline/test_spancat.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* remove duplicate declaration

* Update spacy/pipeline/spancat.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* raise error instead of just print

* make label mapper private

* update docs

* run prettier

* Update website/docs/api/spancategorizer.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/spancategorizer.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/pipeline/spancat.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/pipeline/spancat.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/pipeline/spancat.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/pipeline/spancat.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* don't keep recomputing self._label_map for each span

* typo in docs

* Intervals to private and document 'name' param

* Update spacy/pipeline/spancat.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/pipeline/spancat.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* add Tag to new features

* replace tags

* revert

* revert

* revert

* revert

* Update website/docs/api/spancategorizer.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/spancategorizer.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* prettier

* Fix merge

* Update website/docs/api/spancategorizer.mdx

* remove references to 'single_label'

* remove old paragraph

* Add spancat_singlelabel to config template

* Format

* Extend init config tests

---------

Co-authored-by: kadarakos <kadar.akos@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-09 10:30:59 +01:00