spaCy/spacy/tests/pipeline
Paul O'Leary McCann d61e742960
Handle Docs with no entities in EntityLinker (#11640)
* Handle docs with no entities

If a whole batch contains no entities it won't make it to the model, but
it's possible for individual Docs to have no entities. Before this
commit, those Docs would cause an error when attempting to concatenate
arrays because the dimensions didn't match.

It turns out that the way the Ragged was prepared at the end of the span
maker's forward function differed slightly from list2ragged, which just
uses the flatten function directly. Letting list2ragged do the conversion
avoids the dimension issue.
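
The dimension mismatch described above can be illustrated with a minimal sketch. It uses plain NumPy rather than spaCy or Thinc, and the helper name `to_ragged` and the sample spans are hypothetical, not taken from the actual patch. It assumes each Doc contributes a list of (start, end) token pairs, and that a Doc with no entities contributes an empty list.

```python
import numpy as np

def to_ragged(per_doc_spans):
    """Flatten per-Doc (start, end) spans into one (N, 2) array plus lengths.

    Hypothetical helper for illustration only; not the spaCy implementation.
    """
    # Force every per-Doc array to be 2-D, so an empty Doc becomes shape (0, 2).
    arrays = [
        np.asarray(spans, dtype="int32").reshape(-1, 2) for spans in per_doc_spans
    ]
    lengths = np.asarray([len(a) for a in arrays], dtype="int32")
    if arrays:
        data = np.concatenate(arrays, axis=0)
    else:
        data = np.zeros((0, 2), dtype="int32")
    return data, lengths

# Three Docs; the middle one has no entities.
per_doc_spans = [[(0, 2), (5, 6)], [], [(1, 3)]]

# Naive conversion: the empty list becomes a 1-D array of shape (0,),
# which cannot be concatenated with the 2-D arrays from the other Docs.
naive = [np.asarray(spans, dtype="int32") for spans in per_doc_spans]
try:
    np.concatenate(naive, axis=0)
    naive_failed = False
except ValueError:
    naive_failed = True

data, lengths = to_ragged(per_doc_spans)
print(naive_failed, data.shape, lengths.tolist())
```

Keeping every per-Doc array two-dimensional means an empty Doc simply adds a length of 0 and no rows, which is the behaviour the commit message attributes to letting list2ragged handle the conversion.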

This did not come up before because in NEL demo projects it's typical
for data with no entities to be discarded before it reaches the NEL
component.

This includes a simple direct test that shows the issue and checks it's
resolved. It doesn't check if there are any downstream changes, so a
more complete test could be added. A full run was tested by adding an
example with no entities to the Emerson sample project.

* Add a blank instance to default training data in tests

Rather than adding a specific test, since not failing on instances with
no entities is basic functionality, it makes sense to add it to the
default set.

* Fix without modifying architecture

If the architecture is modified this would have to be a new version, but
this change isn't big enough to merit that.
2022-10-28 10:25:34 +02:00
__init__.py Revert #4334 2019-09-29 17:32:12 +02:00
test_analysis.py Simplify pipe analysis 2020-08-01 13:40:06 +02:00
test_annotates_on_update.py Tidy up and auto-format 2021-07-18 15:44:56 +10:00
test_attributeruler.py Refactor scoring methods to use registered functions (#8766) 2021-08-10 15:13:39 +02:00
test_edit_tree_lemmatizer.py Add edit tree lemmatizer (#10231) 2022-03-28 11:13:50 +02:00
test_entity_linker.py Handle Docs with no entities in EntityLinker (#11640) 2022-10-28 10:25:34 +02:00
test_entity_ruler.py Auto-format code with black (#10908) 2022-06-03 11:01:55 +02:00
test_functions.py Add doc_cleaner component (#9659) 2021-11-23 15:33:33 +01:00
test_initialize.py Test with default value 2020-09-29 17:00:40 +02:00
test_lemmatizer.py Tidy up and auto-format 2021-07-18 15:44:56 +10:00
test_models.py Tidy up code 2021-06-28 12:08:15 +02:00
test_morphologizer.py removing print statements from the test suite (#10712) 2022-04-27 09:14:25 +02:00
test_pipe_factories.py Auto-format code with black (#10795) 2022-05-13 19:02:08 +02:00
test_pipe_methods.py Simplify and clarify enable/disable behavior of spacy.load() (#11459) 2022-09-27 14:22:36 +02:00
test_sentencizer.py Refactor Docs.is_ flags (#6044) 2020-09-17 00:14:01 +02:00
test_senter.py Add Pipe.hide_labels to omit labels from pipeline meta (#10175) 2022-02-05 17:59:24 +01:00
test_span_ruler.py Add SpanRuler component (#9880) 2022-06-02 13:12:53 +02:00
test_spancat.py Save span candidates produced by spancat suggesters (#10413) 2022-03-14 16:46:58 +01:00
test_tagger.py Migrate regression tests into the main test suite (#9655) 2021-12-04 20:34:48 +01:00
test_textcat.py Add test for old architectures (#10751) 2022-05-10 08:24:42 +02:00
test_tok2vec.py Auto-format code with black (#11687) 2022-10-21 11:54:17 +02:00