mirror of https://github.com/explosion/spaCy.git (synced 2024-11-10 19:57:17 +03:00)
Adding a note on retrieving the string rep of the match_id (#4904)
Stolen from here: https://stackoverflow.com/questions/47638877/using-phrasematcher-in-spacy-to-find-multiple-match-types
parent 6ff947e1f9
commit 02a44c5be2
@@ -70,6 +70,17 @@ Find all token sequences matching the supplied patterns on the `Doc`.

| `doc`       | `Doc` | The document to match over. |
| **RETURNS** | list  | A list of `(match_id, start, end)` tuples, describing the matches. A match tuple describes a span `doc[start:end]`. The `match_id` is the ID of the added match pattern. |

<Infobox title="Note on retrieving the string representation of the match_id" variant="warning">

Because spaCy stores all strings as integer hashes, the `match_id` you get back is an integer, too. You can always get its string representation by looking it up in the vocabulary's `StringStore`, i.e. `nlp.vocab.strings`:

```python
# match_id is the integer hash of the key the pattern was added under
match_id_string = nlp.vocab.strings[match_id]
```

</Infobox>
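
The following is a minimal end-to-end sketch of consuming the return value, assuming spaCy v2.2.2 or later (including v3), where `PhraseMatcher.add(key, patterns)` accepts a list of pattern `Doc`s, and a blank English pipeline so no trained model is needed. The keys `COUNTRY` and `CITY`, the example terms and the helper `labelled_matches` are hypothetical, chosen only for illustration. It slices `doc[start:end]` for each match tuple and maps the integer `match_id` back to its key through `nlp.vocab.strings`.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Assumes spaCy v2.2.2+ (or v3), where add(key, patterns) takes a list of Docs.
nlp = spacy.blank("en")  # tokenizer-only pipeline; no trained model required
matcher = PhraseMatcher(nlp.vocab)

# Hypothetical keys and terms, chosen only for illustration.
matcher.add("COUNTRY", [nlp.make_doc("Germany"), nlp.make_doc("France")])
matcher.add("CITY", [nlp.make_doc("Berlin")])


def labelled_matches(doc):
    """Hypothetical helper: return (key, span text) pairs for each match."""
    results = []
    for match_id, start, end in matcher(doc):
        key = nlp.vocab.strings[match_id]  # integer hash -> original key string
        span = doc[start:end]              # the matched tokens
        results.append((key, span.text))
    return results


doc = nlp.make_doc("Berlin is the capital of Germany.")
print(labelled_matches(doc))
```

Keeping the lookup in one small helper lets the rest of the code work with readable labels, while the matcher itself only deals in integer IDs.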

## PhraseMatcher.pipe {#pipe tag="method"}

Match a stream of documents, yielding them in turn.