From 612c79a4f55e2b3e077203d92c7e185a0a5bc8dd Mon Sep 17 00:00:00 2001
From: ines
Date: Sat, 17 Feb 2018 11:57:38 +0100
Subject: [PATCH] Update first matcher example and match_id (resolves #1989)

---
 .../_rule-based-matching.jade | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/website/usage/_linguistic-features/_rule-based-matching.jade b/website/usage/_linguistic-features/_rule-based-matching.jade
index 4b13bd581..e1a7c8a81 100644
--- a/website/usage/_linguistic-features/_rule-based-matching.jade
+++ b/website/usage/_linguistic-features/_rule-based-matching.jade
@@ -54,10 +54,21 @@ p

 p
     | The matcher returns a list of #[code (match_id, start, end)] tuples – in
-    | this case, #[code [('HelloWorld', 0, 2)]], which maps to the span
-    | #[code doc[0:2]] of our original document. Optionally, we could also
-    | choose to add more than one pattern, for example to also match sequences
-    | without punctuation between "hello" and "world":
+    | this case, #[code [('15578876784678163569', 0, 2)]], which maps to the
+    | span #[code doc[0:2]] of our original document. The #[code match_id]
+    | is the #[+a("/usage/spacy-101#vocab") hash value] of the string ID
+    | "HelloWorld". To get the string value, you can look up the ID
+    | in the #[+api("stringstore") #[code StringStore]].
+
++code.
+    for match_id, start, end in matches:
+        string_id = nlp.vocab.strings[match_id]  # 'HelloWorld'
+        span = doc[start:end]  # the matched span
+
+p
+    | Optionally, we could also choose to add more than one pattern, for
+    | example to also match sequences without punctuation between "hello" and
+    | "world":

 +code.
     matcher.add('HelloWorld', None,