spaCy/spacy
Paul O'Leary McCann 6be09bbd07
Fix Entity Linker with tokenization mismatches (fix #9575) (#10457)
* Add failing test

* Partial fix for issue

This kind of works. The issue with token length mismatches is gone. The
problem is that when you get empty lists of encodings to compare, it
fails because the sizes are not the same, even though they're both zero:
(0, 3) vs (0,). Not sure why that happens...

* Short circuit on empties

* Remove spurious check

The check here isn't needed now the the short circuit is fixed.

* Update spacy/tests/pipeline/test_entity_linker.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Use "eg", not "example"

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-05-23 20:42:26 +02:00
..
cli Add spacy-span-analyzer to debug data (#10668) 2022-05-23 19:06:38 +02:00
displacy #10672: fixes displacy output for manual unsorted entities (#10673) 2022-04-27 09:51:58 +02:00
lang Auto-format code with black (#10687) 2022-04-22 11:24:53 +02:00
matcher Fix PhraseMatcher remove overlapping terms (#10734) 2022-05-12 12:23:52 +02:00
ml Refactor error messages to remove hardcoded strings (#10729) 2022-05-02 13:38:46 +02:00
pipeline Fix Entity Linker with tokenization mismatches (fix #9575) (#10457) 2022-05-23 20:42:26 +02:00
tests Fix Entity Linker with tokenization mismatches (fix #9575) (#10457) 2022-05-23 20:42:26 +02:00
tokens Override SpanGroups.setdefault to provide default SpanGroup (#10772) 2022-05-12 10:06:25 +02:00
training Alignment: use a simplified ragged type for performance (#10319) 2022-04-01 09:02:06 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Tidy up and auto-format 2021-07-18 15:44:56 +10:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Set version to v3.3.0 (#10614) 2022-04-28 13:07:49 +02:00
attrs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
attrs.pyx Intify IOB (#9738) 2022-01-20 13:19:38 +01:00
compat.py Custom component types in spacy.ty (#9469) 2021-10-21 15:31:06 +02:00
default_config_pretraining.cfg Add new parameter for saving every n epoch in pretraining (#8912) 2021-08-12 11:14:48 +02:00
default_config.cfg Add a few docs to the default_config.cfg (#9981) 2022-01-05 09:16:40 +01:00
errors.py Ignore overrides for pipe names in config argument (#10779) 2022-05-12 11:46:08 +02:00
glossary.py Add glossary entry for root (#10821) 2022-05-20 09:56:32 +02:00
kb.pxd Replace cpdef variables with cdef (#7834) 2021-04-26 16:54:02 +02:00
kb.pyx Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1 2021-10-26 11:53:50 +02:00
language.py Ignore overrides for pipe names in config argument (#10779) 2022-05-12 11:46:08 +02:00
lexeme.pxd Fix Lexeme.from_ptr 2020-08-10 16:43:37 +02:00
lexeme.pyi fix type of lexeme.rank (#9979) 2022-01-04 13:15:25 +01:00
lexeme.pyx Bugfix for similarity return types (#10051) 2022-01-20 11:40:46 +01:00
lookups.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
morphology.pxd Clean up Morphology imports and definitions (#7441) 2021-04-26 16:54:23 +02:00
morphology.pyx Clean up Morphology imports and definitions (#7441) 2021-04-26 16:54:23 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
pipe_analysis.py 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
py.typed Add py.typed 2021-03-16 09:48:31 +01:00
schemas.py Add ENT_IOB key to Matcher (#9649) 2022-01-20 13:18:39 +01:00
scorer.py Alignment: use a simplified ragged type for performance (#10319) 2022-04-01 09:02:06 +02:00
strings.pxd Update Cython string types (#9143) 2021-09-13 17:02:17 +02:00
strings.pyi Fix StringStore.__getitem__ return type depending on parameter types (#10741) 2022-05-03 17:57:07 +02:00
strings.pyx Update Cython string types (#9143) 2021-09-13 17:02:17 +02:00
structs.pxd Add SpanGroup and Graph container types to represent arbitrary annotations (#6696) 2021-01-14 17:30:41 +11:00
symbols.pxd introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
symbols.pyx introduce token.has_head and refer to MISSING_DEP_ (WIP) 2021-01-12 17:17:06 +01:00
tokenizer.pxd Add tokenizer option to allow Matcher handling for all rules (#10452) 2022-03-24 13:21:32 +01:00
tokenizer.pyx Add tokenizer option to allow Matcher handling for all rules (#10452) 2022-03-24 13:21:32 +01:00
ty.py Custom component types in spacy.ty (#9469) 2021-10-21 15:31:06 +02:00
typedefs.pxd Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master 2020-11-25 11:49:34 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py hook up meta in load_model_from_config (#10400) 2022-03-04 11:07:45 +01:00
vectors.pyx Save vectors as little endian, load with Ops.asarray (#10201) 2022-03-21 14:24:46 +01:00
vocab.pxd Add support for floret vectors (#8909) 2021-10-27 14:08:31 +02:00
vocab.pyi Add vector deduplication (#10551) 2022-03-30 08:54:23 +02:00
vocab.pyx Add vector deduplication (#10551) 2022-03-30 08:54:23 +02:00