Raphael Mitsch
304b9331e6
Modify EL batching to doc-wise streaming approach ( #12367 )
...
* Convert Candidate from Cython to Python class.
* Format.
* Fix .entity_ typo in _add_activations() usage.
* Change type for mentions to look up entity candidates for to SpanGroup from Iterable[Span].
* Update docs.
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update doc string of BaseCandidate.__init__().
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename Candidate to InMemoryCandidate, BaseCandidate to Candidate.
* Adjust Candidate to support and mandate numerical entity IDs.
* Format.
* Fix docstring and docs.
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename alias -> mention.
* Refactor Candidate attribute names. Update docs and tests accordingly.
* Refacor Candidate attributes and their usage.
* Format.
* Fix mypy error.
* Update error code in line with v4 convention.
* Modify EL batching system.
* Update leftover get_candidates() mention in docs.
* Format docs.
* Format.
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Updated error code.
* Simplify interface for int/str representations.
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename 'alias' to 'mention'.
* Port Candidate and InMemoryCandidate to Cython.
* Remove redundant entry in setup.py.
* Add abstract class check.
* Drop storing mention.
* Update spacy/kb/candidate.pxd
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fix entity_id refactoring problems in docstrings.
* Drop unused InMemoryCandidate._entity_hash.
* Update docstrings.
* Move attributes out of Candidate.
* Partially fix alias/mention terminology usage. Convert Candidate to interface.
* Remove prior_prob from supported properties in Candidate. Introduce KnowledgeBase.supports_prior_probs().
* Update docstrings related to prior_prob.
* Update alias/mention usage in doc(strings).
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Mention -> alias renaming. Drop Candidate.mentions(). Drop InMemoryLookupKB.get_alias_candidates() from docs.
* Update docstrings.
* Fix InMemoryCandidate attribute names.
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update W401 test.
* Update spacy/errors.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Use Candidate output type for toy generators in the test suite to mimick best practices
* fix docs
* fix import
* Fix merge leftovers.
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/docs/api/entitylinker.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/kb/kb_in_memory.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/docs/api/inmemorylookupkb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update get_candidates() docstring.
* Reformat imports in entity_linker.py.
* Drop valid_ent_idx_per_doc.
* Update docs.
* Format.
* Simplify doc loop in predict().
* Remove E1044 comment.
* Fix merge errors.
* Format.
* Format.
* Format.
* Fix merge error & tests.
* Format.
* Apply suggestions from code review
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* Use type alias.
* isort.
* isort.
* Lint.
* Add typedefs.pyx.
* Fix typedef import.
* Fix type aliases.
* Format.
* Update docstring and type usage.
* Add info on get_candidates(), get_candidates_batched().
* Readd get_candidates info to v3 changelog.
* Update website/docs/api/entitylinker.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update factory functions for backwards compatibility.
* Format.
* Ignore mypy error.
* Fix mypy error.
* Format.
* Add test for multiple docs with multiple entities.
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2024-04-09 11:39:18 +02:00
Daniël de Kok
81beaea70e
Merge remote-tracking branch 'upstream/master' into maintenance/v4-merge-master-20240119
2024-01-19 12:34:29 +01:00
Sofie Van Landeghem
699dd8b3b7
Update __all__ fields ( #13063 )
...
* update all for pipeline.init
* add all in training.init
* add all in kb.init
* alphabetically
2023-10-16 10:17:47 +02:00
Adriane Boyd
538304948e
Remove profile=True from currently profiled cython
2023-09-28 17:09:41 +02:00
svlandeg
0e3b6a87d6
Merge branch 'upstream_master' into sync_v4
2023-07-19 16:37:31 +02:00
Basile Dura
b0228d8ea6
ci: add cython linter ( #12694 )
...
* chore: add cython-linter dev dependency
* fix: lexeme.pyx
* fix: morphology.pxd
* fix: tokenizer.pxd
* fix: vocab.pxd
* fix: morphology.pxd (line length)
* ci: add cython-lint
* ci: fix cython-lint call
* Fix kb/candidate.pyx.
* Fix kb/kb.pyx.
* Fix kb/kb_in_memory.pyx.
* Fix kb.
* Fix training/ partially.
* Fix training/. Ignore trailing whitespaces and too long lines.
* Fix ml/.
* Fix matcher/.
* Fix pipeline/.
* Fix tokens/.
* Fix build errors. Fix vocab.pyx.
* Fix cython-lint install and run.
* Fix lexeme.pyx, parts_of_speech.pxd, vectors.pyx. Temporarily disable cython-lint execution.
* Fix attrs.pyx, lexeme.pyx, symbols.pxd, isort issues.
* Make cython-lint install conditional. Fix tokenizer.pyx.
* Fix remaining files. Reenable cython-lint check.
* Readded parentheses.
* Fix test_build_dependencies().
* Add explanatory comment to cython-lint execution.
---------
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-07-19 12:03:31 +02:00
Daniël de Kok
bf92ca4f10
Merge remote-tracking branch 'upstream/master' into v4-isort
2023-06-26 12:43:00 +02:00
Daniël de Kok
2468742cb8
isort all the things
2023-06-26 11:41:03 +02:00
Daniël de Kok
e2b70df012
Configure isort to use the Black profile, recursively isort the spacy
module ( #12721 )
...
* Use isort with Black profile
* isort all the things
* Fix import cycles as a result of import sorting
* Add DOCBIN_ALL_ATTRS type definition
* Add isort to requirements
* Remove isort from build dependencies check
* Typo
2023-06-14 17:48:41 +02:00
Raphael Mitsch
3102e2e27a
Entity linking: use SpanGroup
instead of Iterable[Span]
for mentions ( #12344 )
...
* Convert Candidate from Cython to Python class.
* Format.
* Fix .entity_ typo in _add_activations() usage.
* Change type for mentions to look up entity candidates for to SpanGroup from Iterable[Span].
* Update docs.
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update doc string of BaseCandidate.__init__().
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename Candidate to InMemoryCandidate, BaseCandidate to Candidate.
* Adjust Candidate to support and mandate numerical entity IDs.
* Format.
* Fix docstring and docs.
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename alias -> mention.
* Refactor Candidate attribute names. Update docs and tests accordingly.
* Refacor Candidate attributes and their usage.
* Format.
* Fix mypy error.
* Update error code in line with v4 convention.
* Reverse erroneous changes during merge.
* Update return type in EL tests.
* Re-add Candidate to setup.py.
* Format updated docs.
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-20 12:25:18 +01:00
Raphael Mitsch
9340eb8ad2
Introduce hierarchy for EL Candidate
objects ( #12341 )
...
* Convert Candidate from Cython to Python class.
* Format.
* Fix .entity_ typo in _add_activations() usage.
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update doc string of BaseCandidate.__init__().
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename Candidate to InMemoryCandidate, BaseCandidate to Candidate.
* Adjust Candidate to support and mandate numerical entity IDs.
* Format.
* Fix docstring and docs.
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename alias -> mention.
* Refactor Candidate attribute names. Update docs and tests accordingly.
* Refacor Candidate attributes and their usage.
* Format.
* Fix mypy error.
* Update error code in line with v4 convention.
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Updated error code.
* Simplify interface for int/str representations.
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename 'alias' to 'mention'.
* Port Candidate and InMemoryCandidate to Cython.
* Remove redundant entry in setup.py.
* Add abstract class check.
* Drop storing mention.
* Update spacy/kb/candidate.pxd
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fix entity_id refactoring problems in docstrings.
* Drop unused InMemoryCandidate._entity_hash.
* Update docstrings.
* Move attributes out of Candidate.
* Partially fix alias/mention terminology usage. Convert Candidate to interface.
* Remove prior_prob from supported properties in Candidate. Introduce KnowledgeBase.supports_prior_probs().
* Update docstrings related to prior_prob.
* Update alias/mention usage in doc(strings).
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Mention -> alias renaming. Drop Candidate.mentions(). Drop InMemoryLookupKB.get_alias_candidates() from docs.
* Update docstrings.
* Fix InMemoryCandidate attribute names.
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update W401 test.
* Update spacy/errors.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Use Candidate output type for toy generators in the test suite to mimick best practices
* fix docs
* fix import
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-20 00:34:35 +01:00
Sofie Van Landeghem
74cae47bf6
rely on is_empty property instead of __len__ ( #12347 )
2023-03-01 12:06:07 +01:00
Adriane Boyd
3b8918e166
API docs: Rename kb_in_memory to inmemorylookupkb, add to sidebar ( #12128 )
...
* API docs: Rename kb_in_memory to inmemorylookupkb, add to sidebar
* adjust to mdx
* linkout to InMemoryLookupKB at first occurrence in kb.mdx
* fix links to docs
* revert Azure trigger setting (I'll make a separate PR)
Co-authored-by: svlandeg <svlandeg@github.com>
2023-01-19 13:29:17 +01:00
Raphael Mitsch
1f23c615d7
Refactor KB for easier customization ( #11268 )
...
* Add implementation of batching + backwards compatibility fixes. Tests indicate issue with batch disambiguation for custom singular entity lookups.
* Fix tests. Add distinction w.r.t. batch size.
* Remove redundant and add new comments.
* Adjust comments. Fix variable naming in EL prediction.
* Fix mypy errors.
* Remove KB entity type config option. Change return types of candidate retrieval functions to Iterable from Iterator. Fix various other issues.
* Update spacy/pipeline/entity_linker.py
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
* Update spacy/pipeline/entity_linker.py
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
* Update spacy/kb_base.pyx
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
* Update spacy/kb_base.pyx
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
* Update spacy/pipeline/entity_linker.py
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
* Add error messages to NotImplementedErrors. Remove redundant comment.
* Fix imports.
* Remove redundant comments.
* Rename KnowledgeBase to InMemoryLookupKB and BaseKnowledgeBase to KnowledgeBase.
* Fix tests.
* Update spacy/errors.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Move KB into subdirectory.
* Adjust imports after KB move to dedicated subdirectory.
* Fix config imports.
* Move Candidate + retrieval functions to separate module. Fix other, small issues.
* Fix docstrings and error message w.r.t. class names. Fix typing for candidate retrieval functions.
* Update spacy/kb/kb_in_memory.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fix typing.
* Change typing of mentions to be Span instead of Union[Span, str].
* Update docs.
* Update EntityLinker and _architecture docs.
* Update website/docs/api/entitylinker.md
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
* Adjust message for E1046.
* Re-add section for Candidate in kb.md, add reference to dedicated page.
* Update docs and docstrings.
* Re-add section + reference for KnowledgeBase.get_alias_candidates() in docs.
* Update spacy/kb/candidate.pyx
* Update spacy/kb/kb_in_memory.pyx
* Update spacy/pipeline/legacy/entity_linker.py
* Remove canididate.md. Remove mistakenly added config snippet in entity_linker.py.
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-09-08 10:38:07 +02:00