mirror of https://github.com/explosion/spaCy.git (synced 2024-11-13 05:07:03 +03:00)
Adding a note on retrieving the string rep of the match_id (#4904)
Stolen from here: https://stackoverflow.com/questions/47638877/using-phrasematcher-in-spacy-to-find-multiple-match-types
parent 6ff947e1f9
commit 02a44c5be2
@@ -70,6 +70,17 @@ Find all token sequences matching the supplied patterns on the `Doc`.
| `doc` | `Doc` | The document to match over. |
| **RETURNS** | list | A list of `(match_id, start, end)` tuples, describing the matches. A match tuple describes a span `doc[start:end]`. The `match_id` is the ID of the added match pattern. |
<Infobox title="Note on retrieving the string representation of the match_id" variant="warning">
Because spaCy stores all strings as integers, the `match_id` you get back is an integer, too. You can always get its string representation by looking it up in the vocabulary's `StringStore`, i.e. `nlp.vocab.strings`:
```
match_id_string = nlp.vocab.strings[match_id]
```
</Infobox>
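
For context, the lookup described in the note can be sketched end to end. This is a minimal illustration rather than part of the commit: it assumes spaCy v3, where `PhraseMatcher.add` takes a key and a list of pattern `Doc` objects, and the pattern names `OBAMA` and `HEALTH` and the sample sentence are made up for the example.

```python
import spacy
from spacy.matcher import PhraseMatcher

# A blank English pipeline is enough: the matcher only needs a tokenizer
# and the shared vocab.
nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab)

# Hypothetical pattern names, chosen only for this example.
# Assumes the spaCy v3 signature: add(key, list_of_pattern_docs).
matcher.add("OBAMA", [nlp("Barack Obama")])
matcher.add("HEALTH", [nlp("health care reform")])

doc = nlp("Barack Obama urges Congress to find courage to defend his health care reform")

labelled = []
for match_id, start, end in matcher(doc):
    # match_id is an integer hash; the StringStore maps it back to the
    # name the pattern was added under.
    label = nlp.vocab.strings[match_id]
    labelled.append((label, doc[start:end].text))
```

Each entry in `labelled` pairs the readable pattern name with the matched span text, which is usually what you want when several match types share one matcher.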
## PhraseMatcher.pipe {#pipe tag="method"}
Match a stream of documents, yielding them in turn.