matcher doc corrections (#9115)

* update error message to current UX

* clarify uppercase effect

* fix docstring
Sofie Van Landeghem 2021-09-02 09:26:33 +02:00 committed by svlandeg
parent 752696f134
commit 721f4554c8
2 changed files with 4 additions and 3 deletions

@@ -340,7 +340,7 @@ cdef find_matches(TokenPatternC** patterns, int n, object doclike, int length, e
The "predicates" list contains functions that take a Python list and return a
boolean value. It's mostly used for regular expressions.
-The "extra_getters" list contains functions that take a Python list and return
+The "extensions" list contains functions that take a Python list and return
an attr ID. It's mostly used for extension attributes.
"""
cdef vector[PatternStateC] states

@@ -429,7 +429,7 @@ matcher.add("HelloWorld", [pattern])
# 🚨 Raises an error:
# MatchPatternError: Invalid token patterns for matcher rule 'HelloWorld'
# Pattern 0:
-# - Additional properties are not allowed ('CASEINSENSITIVE' was unexpected) [2]
+# - [pattern -> 2 -> CASEINSENSITIVE] extra fields not permitted
```
@@ -438,7 +438,8 @@ matcher.add("HelloWorld", [pattern])
To move on to a more realistic example, let's say you're working with a large
corpus of blog articles, and you want to match all mentions of "Google I/O"
(which spaCy tokenizes as `['Google', 'I', '/', 'O']`). To be safe, you only
-match on the uppercase versions, in case someone has written it as "Google i/o".
+match on the uppercase versions, avoiding matches with phrases such as "Google
+i/o".
```python
### {executable="true"}