spaCy/website/docs/api
Raphael Mitsch 304b9331e6
Modify EL batching to doc-wise streaming approach (#12367)
* Convert Candidate from Cython to Python class.

* Format.

* Fix .entity_ typo in _add_activations() usage.

* Change the type of the mentions used to look up entity candidates from Iterable[Span] to SpanGroup.

* Update docs.

* Update spacy/kb/candidate.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update doc string of BaseCandidate.__init__().

* Update spacy/kb/candidate.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Rename Candidate to InMemoryCandidate, BaseCandidate to Candidate.

* Adjust Candidate to support and mandate numerical entity IDs.

* Format.

* Fix docstring and docs.

* Update website/docs/api/kb.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Rename alias -> mention.

* Refactor Candidate attribute names. Update docs and tests accordingly.

* Refactor Candidate attributes and their usage.

* Format.

* Fix mypy error.

* Update error code in line with v4 convention.

* Modify EL batching system.

* Update leftover get_candidates() mention in docs.

* Format docs.

* Format.

* Update spacy/kb/candidate.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Updated error code.

* Simplify interface for int/str representations.

* Update website/docs/api/kb.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Rename 'alias' to 'mention'.

* Port Candidate and InMemoryCandidate to Cython.

* Remove redundant entry in setup.py.

* Add abstract class check.

* Drop storing mention.

* Update spacy/kb/candidate.pxd

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix entity_id refactoring problems in docstrings.

* Drop unused InMemoryCandidate._entity_hash.

* Update docstrings.

* Move attributes out of Candidate.

* Partially fix alias/mention terminology usage. Convert Candidate to interface.

* Remove prior_prob from supported properties in Candidate. Introduce KnowledgeBase.supports_prior_probs().

* Update docstrings related to prior_prob.

* Update alias/mention usage in doc(strings).

* Update spacy/ml/models/entity_linker.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/ml/models/entity_linker.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Mention -> alias renaming. Drop Candidate.mentions(). Drop InMemoryLookupKB.get_alias_candidates() from docs.

* Update docstrings.

* Fix InMemoryCandidate attribute names.

* Update spacy/kb/kb.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/ml/models/entity_linker.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update W401 test.

* Update spacy/errors.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/kb/kb.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Use Candidate output type for toy generators in the test suite to mimic best practices.

* fix docs

* fix import

* Fix merge leftovers.

* Update spacy/kb/kb.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/kb/kb.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/api/kb.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/api/entitylinker.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/kb/kb_in_memory.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/api/inmemorylookupkb.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update get_candidates() docstring.

* Reformat imports in entity_linker.py.

* Drop valid_ent_idx_per_doc.

* Update docs.

* Format.

* Simplify doc loop in predict().

* Remove E1044 comment.

* Fix merge errors.

* Format.

* Format.

* Format.

* Fix merge error & tests.

* Format.

* Apply suggestions from code review

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Use type alias.

* isort.

* isort.

* Lint.

* Add typedefs.pyx.

* Fix typedef import.

* Fix type aliases.

* Format.

* Update docstring and type usage.

* Add info on get_candidates(), get_candidates_batched().

* Re-add get_candidates info to v3 changelog.

* Update website/docs/api/entitylinker.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update factory functions for backwards compatibility.

* Format.

* Ignore mypy error.

* Fix mypy error.

* Format.

* Add test for multiple docs with multiple entities.

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2024-04-09 11:39:18 +02:00
architectures.mdx Modify EL batching to doc-wise streaming approach (#12367) 2024-04-09 11:39:18 +02:00
attributeruler.mdx Merge branch 'copy_master' into copy_v4 2023-01-11 18:40:55 +01:00
attributes.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
basevectors.mdx Support registered vectors (#12492) 2023-08-01 15:46:08 +02:00
cli.mdx Merge pull request #13299 from danieldk/copy/master 2024-02-04 15:40:55 +01:00
coref.mdx corrected example code (#12466) 2023-03-27 11:32:49 +02:00
corpus.mdx Add spacy.PlainTextCorpusReader.v1 (#12122) 2023-01-26 11:33:22 +01:00
curatedtransformer.mdx Docs: update trf_data examples and pipeline design info (#13164) 2023-12-04 15:15:54 +01:00
cython-classes.mdx Refactor lexeme mem passing (#12125) 2023-01-25 12:50:21 +09:00
cython-structs.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
cython.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
data-formats.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
dependencymatcher.mdx Merge remote-tracking branch 'upstream/master' into sync-v4-master-20230612 2023-06-12 15:57:10 +02:00
dependencyparser.mdx Add Language.distill (#12116) 2023-01-30 12:44:11 +01:00
doc.mdx Return Tuple[Span] for all Doc/Span attrs that provide spans (#12288) 2023-03-01 16:00:02 +01:00
docbin.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
edittreelemmatizer.mdx Add Language.distill (#12116) 2023-01-30 12:44:11 +01:00
entitylinker.mdx Modify EL batching to doc-wise streaming approach (#12367) 2024-04-09 11:39:18 +02:00
entityrecognizer.mdx Add Language.distill (#12116) 2023-01-30 12:44:11 +01:00
entityruler.mdx Merge branch 'copy_master' into copy_v4 2023-01-11 18:40:55 +01:00
example.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
index.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
inmemorylookupkb.mdx Modify EL batching to doc-wise streaming approach (#12367) 2024-04-09 11:39:18 +02:00
kb.mdx Modify EL batching to doc-wise streaming approach (#12367) 2024-04-09 11:39:18 +02:00
language.mdx Merge branch 'upstream_master' into sync_v4 2023-07-19 16:37:31 +02:00
large-language-models.mdx fix typo (#13254) 2024-01-24 09:29:57 +01:00
legacy.mdx Add TextCatReduce.v1 (#13181) 2023-12-21 11:00:06 +01:00
lemmatizer.mdx Recommend lookups tables from URLs or other loaders (#12283) 2023-07-31 15:54:35 +02:00
lexeme.mdx Merge branch 'copy_master' into copy_v4 2023-01-11 18:40:55 +01:00
lookups.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
matcher.mdx Merge branch 'copy_master' into copy_v4 2023-01-11 18:40:55 +01:00
morphologizer.mdx Merge remote-tracking branch 'upstream/master' into sync-v4-master-20230612 2023-06-12 15:57:10 +02:00
morphology.mdx Fix new tags in docs for v3.5.x (#12629) 2023-05-15 12:06:58 +02:00
phrasematcher.mdx Merge branch 'copy_master' into copy_v4 2023-01-11 18:40:55 +01:00
pipe.mdx Add Language.distill (#12116) 2023-01-30 12:44:11 +01:00
pipeline-functions.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
scorer.mdx Merge remote-tracking branch 'upstream/master' into sync-v4-master-20230612 2023-06-12 15:57:10 +02:00
sentencerecognizer.mdx Add Language.distill (#12116) 2023-01-30 12:44:11 +01:00
sentencizer.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
span-resolver.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
span.mdx Return Tuple[Span] for all Doc/Span attrs that provide spans (#12288) 2023-03-01 16:00:02 +01:00
spancategorizer.mdx Merge remote-tracking branch 'upstream/master' into maintenance/v4-merge-master-20240119 2024-01-19 12:34:29 +01:00
spanfinder.mdx Update max_length default in span finder docs (#12803) 2023-07-07 10:17:41 +02:00
spangroup.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
spanruler.mdx Merge remote-tracking branch 'upstream/master' into maintenance/v4-merge-master-20240119 2024-01-19 12:34:29 +01:00
stringstore.mdx Add info to stringstore and vocab (#12471) 2023-03-27 13:15:14 +02:00
tagger.mdx Merge remote-tracking branch 'upstream/master' into sync-v4-master-20230612 2023-06-12 15:57:10 +02:00
textcategorizer.mdx Merge branch 'copy_master' into copy_v4 2023-01-11 18:40:55 +01:00
tok2vec.mdx Tok2Vec: Add distill method (#12108) 2023-03-09 09:37:19 +01:00
token.mdx Merge branch 'copy_master' into copy_v4 2023-01-11 18:40:55 +01:00
tokenizer.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
top-level.mdx Merge remote-tracking branch 'upstream/master' into maintenance/v4-merge-master-20240119 2024-01-19 12:34:29 +01:00
transformer.mdx Docs: update trf_data examples and pipeline design info (#13164) 2023-12-04 15:15:54 +01:00
vectors.mdx Merge remote-tracking branch 'upstream/master' into maintenance/v4-merge-master-20240119 2024-01-19 12:34:29 +01:00
vocab.mdx Merge pull request #13299 from danieldk/copy/master 2024-02-04 15:40:55 +01:00