mirror of
https://github.com/explosion/spaCy.git
synced 2025-05-31 19:23:05 +03:00
Update first matcher example and match_id (resolves #1989)
parent 7d5c720fc3
commit 612c79a4f5
@@ -54,10 +54,21 @@ p
 
 p
     | The matcher returns a list of #[code (match_id, start, end)] tuples – in
-    | this case, #[code [('HelloWorld', 0, 2)]], which maps to the span
-    | #[code doc[0:2]] of our original document. Optionally, we could also
-    | choose to add more than one pattern, for example to also match sequences
-    | without punctuation between "hello" and "world":
+    | this case, #[code [('15578876784678163569', 0, 2)]], which maps to the
+    | span #[code doc[0:2]] of our original document. The #[code match_id]
+    | is the #[+a("/usage/spacy-101#vocab") hash value] of the string ID
+    | "HelloWorld". To get the string value, you can look up the ID
+    | in the #[+api("stringstore") #[code StringStore]].
+
++code.
+    for match_id, start, end in matches:
+        string_id = nlp.vocab.strings[match_id]  # 'HelloWorld'
+        span = doc[start:end]                    # the matched span
+
+p
+    | Optionally, we could also choose to add more than one pattern, for
+    | example to also match sequences without punctuation between "hello" and
+    | "world":
 
 +code.
     matcher.add('HelloWorld', None,
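The docs change above says that `match_id` is a hash of the string ID and that the string can be recovered from the `StringStore`. The sketch below illustrates that round trip with the standard library only. It is a toy stand-in, not spaCy's implementation: `ToyStringStore`, `toy_match`, and the SHA-256-based 64-bit hash are hypothetical names and choices made for illustration, and spaCy's real hash function and matcher differ.

```python
import hashlib


class ToyStringStore:
    """Toy stand-in for a string store: maps strings to 64-bit IDs and back."""

    def __init__(self):
        self._by_hash = {}

    @staticmethod
    def _hash(string):
        # Assumption: any stable 64-bit hash is good enough for illustration.
        digest = hashlib.sha256(string.encode("utf8")).digest()
        return int.from_bytes(digest[:8], "little")

    def add(self, string):
        # Remember the string so the integer ID can be resolved later.
        key = self._hash(string)
        self._by_hash[key] = string
        return key

    def __getitem__(self, key):
        # Integer in -> string out; string in -> integer out.
        if isinstance(key, int):
            return self._by_hash[key]
        return self._hash(key)


def toy_match(strings, name, pattern, tokens):
    """Return (match_id, start, end) tuples for exact lowercase token runs."""
    match_id = strings.add(name)
    lowered = [token.lower() for token in tokens]
    size = len(pattern)
    return [
        (match_id, i, i + size)
        for i in range(len(lowered) - size + 1)
        if lowered[i:i + size] == pattern
    ]


strings = ToyStringStore()
tokens = ["Hello", ",", "world", "!"]
matches = toy_match(strings, "HelloWorld", ["hello", ",", "world"], tokens)

for match_id, start, end in matches:
    string_id = strings[match_id]  # the name registered with the pattern
    span = tokens[start:end]       # the matched slice of tokens
    print(match_id, string_id, span)
```

The point of the design is that callers only ever carry the integer around, and the store is the single place that can turn it back into a readable name, which is why the updated docs show the lookup explicitly.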