From 02a44c5be2dbbd8a6a0a1d40ecb04bc887ce8fb1 Mon Sep 17 00:00:00 2001
From: "Martin A. Kayser" <9056896+maknotavailable@users.noreply.github.com>
Date: Mon, 3 Feb 2020 03:58:59 -0800
Subject: [PATCH] Adding a note on retrieving the string rep of the match_id (#4904)

Stolen from here: https://stackoverflow.com/questions/47638877/using-phrasematcher-in-spacy-to-find-multiple-match-types
---
 website/docs/api/phrasematcher.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/website/docs/api/phrasematcher.md b/website/docs/api/phrasematcher.md
index 90ecd3416..4119c8fc0 100644
--- a/website/docs/api/phrasematcher.md
+++ b/website/docs/api/phrasematcher.md
@@ -70,6 +70,17 @@ Find all token sequences matching the supplied patterns on the `Doc`.
 | `doc` | `Doc` | The document to match over. |
 | **RETURNS** | list | A list of `(match_id, start, end)` tuples, describing the matches. A match tuple describes a span `doc[start:end]`. The `match_id` is the ID of the added match pattern. |
+
+
+Because spaCy stores all strings as integers, the `match_id` you get back will be an integer, too. You can always get the string representation by looking it up in the vocabulary's `StringStore`, i.e. `nlp.vocab.strings`:
+
+```
+match_id_string = nlp.vocab.strings[match_id]
+```
+
+
+
+
 
 ## PhraseMatcher.pipe {#pipe tag="method"}
 
 Match a stream of documents, yielding them in turn.
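
The lookup described in the note added by this patch could be exercised end to end roughly as follows. This is a minimal sketch, not part of the patch: the pattern names `PERSON_NAME` and `CITY` and the sample sentence are made up for illustration, and it assumes a spaCy version in which `PhraseMatcher.add` accepts a list of pattern `Doc` objects (older releases take `matcher.add(key, None, *docs)` instead).

```python
import spacy
from spacy.matcher import PhraseMatcher

# A blank English pipeline gives us a tokenizer without downloading a model.
nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab)

# Hypothetical pattern names, chosen only for this example.
matcher.add("PERSON_NAME", [nlp.make_doc("Barack Obama")])
matcher.add("CITY", [nlp.make_doc("Washington")])

doc = nlp("Barack Obama lived in Washington for eight years.")

results = []
for match_id, start, end in matcher(doc):
    # match_id is an integer hash; the StringStore maps it back to the name.
    label = nlp.vocab.strings[match_id]
    results.append((label, doc[start:end].text))

for label, text in results:
    print(label, text)
```

Resolving the name inside the loop is what lets a single `PhraseMatcher` serve several match types at once, which is the use case in the Stack Overflow question referenced in the commit message.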