matcher doc corrections (#9115)

* update error message to current UX
* clarify uppercase effect
* fix docstring
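
As a rough illustration of the "clarify uppercase effect" item: matching on exact token text is case-sensitive, so the lowercase spelling is simply not matched. The sketch below is plain Python and does not use spaCy's `Matcher`; the helper name `find_google_io` and the hand-written token lists are hypothetical, chosen only to mirror the tokenization quoted in the docs change.

```python
# Hypothetical stand-in for an exact-text token pattern; this is NOT spaCy's Matcher.
TARGET = ["Google", "I", "/", "O"]


def find_google_io(tokens):
    """Return the start index of every exact, case-sensitive occurrence of TARGET."""
    width = len(TARGET)
    return [
        i
        for i in range(len(tokens) - width + 1)
        if tokens[i:i + width] == TARGET
    ]


# Hand-written token lists (assumed tokenization, for illustration only).
upper = ["I", "love", "Google", "I", "/", "O"]
lower = ["I", "love", "Google", "i", "/", "o"]

upper_hits = find_google_io(upper)  # the uppercase spelling is found
lower_hits = find_google_io(lower)  # the lowercase spelling is skipped
```

Because the comparison is on exact strings, "Google i/o" yields no hits, which is the behavior the reworded sentence describes.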
2026-02-08 08:19:45 +03:00 · 2021-09-02 09:26:33 +02:00 · 2021-09-02 09:26:33 +02:00 · 8895e3c9ad
commit 8895e3c9ad
parent d60b748e3c
2 changed files with 4 additions and 3 deletions
--- a/spacy/matcher/matcher.pyx
+++ b/spacy/matcher/matcher.pyx
@@ -340,7 +340,7 @@ cdef find_matches(TokenPatternC** patterns, int n, object doclike, int length, e
    The "predicates" list contains functions that take a Python list and return a
    boolean value. It's mostly used for regular expressions.

-    The "extra_getters" list contains functions that take a Python list and return
+    The "extensions" list contains functions that take a Python list and return
    an attr ID. It's mostly used for extension attributes.
    """
    cdef vector[PatternStateC] states
--- a/website/docs/usage/rule-based-matching.md
+++ b/website/docs/usage/rule-based-matching.md
@@ -429,7 +429,7 @@ matcher.add("HelloWorld", [pattern])
 # 🚨 Raises an error:
 # MatchPatternError: Invalid token patterns for matcher rule 'HelloWorld'
 # Pattern 0:
-# - Additional properties are not allowed ('CASEINSENSITIVE' was unexpected) [2]
+# - [pattern -> 2 -> CASEINSENSITIVE] extra fields not permitted

 ```

@@ -438,7 +438,8 @@ matcher.add("HelloWorld", [pattern])
 To move on to a more realistic example, let's say you're working with a large
 corpus of blog articles, and you want to match all mentions of "Google I/O"
 (which spaCy tokenizes as `['Google', 'I', '/', 'O'`]). To be safe, you only
-match on the uppercase versions, in case someone has written it as "Google i/o".
+match on the uppercase versions, avoiding matches with phrases such as "Google
+i/o".

 ```python
 ### {executable="true"}