# spaCy/spacy/tests/regression/test_issue3972.py

from spacy.matcher import PhraseMatcher
from spacy.tokens import Doc


def test_issue3972(en_vocab):
    """Test that the PhraseMatcher returns one match per match ID when the
    same pattern is registered under two different IDs.
    """
    matcher = PhraseMatcher(en_vocab)
    matcher.add("A", [Doc(en_vocab, words=["New", "York"])])
    matcher.add("B", [Doc(en_vocab, words=["New", "York"])])
    doc = Doc(en_vocab, words=["I", "live", "in", "New", "York"])
    matches = matcher(doc)

    assert len(matches) == 2

    # We should have a match for each of the two rules
    found_ids = [en_vocab.strings[ent_id] for (ent_id, _, _) in matches]
    assert "A" in found_ids
    assert "B" in found_ids
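

def test_issue3972_same_span(en_vocab):
    # Hedged companion sketch, not part of the original regression test.
    # The name `test_issue3972_same_span` is hypothetical. It assumes each
    # match is a (match_id, start, end) tuple, so both rules should report
    # the same token span for "New York" (tokens 3 and 4, end-exclusive 5).
    matcher = PhraseMatcher(en_vocab)
    matcher.add("A", [Doc(en_vocab, words=["New", "York"])])
    matcher.add("B", [Doc(en_vocab, words=["New", "York"])])
    doc = Doc(en_vocab, words=["I", "live", "in", "New", "York"])
    # Collapse the matches to their spans; duplicates by ID share one span.
    spans = {(start, end) for (_, start, end) in matcher(doc)}
    assert spans == {(3, 5)}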


# Commit history for this file:
# 2019-07-16  Add regression test for #3972
# 2019-08-22  Matcher ID fixes (#4179): allow phrasematcher to link one match
#             to multiple original patterns; small fix for defining ent_id
#             in the matcher (anti-ghost prevention); cleanup; formatting
# 2019-10-25  Implement new API for {Phrase}Matcher.add (backwards-compatible)
#             (#4522): update docs; also update DependencyMatcher.add; update
#             internals; rewrite tests to use new API; add basic check for a
#             common mistake (raise an error with a suggestion if the user
#             likely passed in a pattern instead of a list of patterns)