Update first matcher example and match_id (resolves #1989)

2025-10-29 15:07:54 +03:00 · 2018-02-17 11:57:38 +01:00
commit 612c79a4f5
parent 7d5c720fc3
1 changed file with 15 additions and 4 deletions
--- a/website/usage/_linguistic-features/_rule-based-matching.jade
+++ b/website/usage/_linguistic-features/_rule-based-matching.jade
@@ -54,10 +54,21 @@ p
 p
    |  The matcher returns a list of #[code (match_id, start, end)] tuples – in
-    |  this case, #[code [('HelloWorld', 0, 2)]], which maps to the span
+    |  this case, #[code [('15578876784678163569', 0, 2)]], which maps to the
-    |  #[code doc[0:2]] of our original document. Optionally, we could also
+    |  span #[code doc[0:2]] of our original document. The #[code match_id]
-    |  choose to add more than one pattern, for example to also match sequences
+    |  is the #[+a("/usage/spacy-101#vocab") hash value] of the string ID
-    |  without punctuation between "hello" and "world":
+    |  "HelloWorld". To get the string value, you can look up the ID
+    |  in the #[+api("stringstore") #[code StringStore]].
++code.
+    for match_id, start, end in matches:
+        string_id = nlp.vocab.strings[match_id]  # 'HelloWorld'
+        span = doc[start:end]                    # the matched span
+p
+    |  Optionally, we could also choose to add more than one pattern, for
+    |  example to also match sequences without punctuation between "hello" and
+    |  "world":
 +code.
    matcher.add('HelloWorld', None,