Mirror of https://github.com/explosion/spaCy.git (synced 2025-01-13 10:46:29 +03:00)
matcher doc corrections (#9115)
* update error message to current UX
* clarify uppercase effect
* fix docstring
parent d60b748e3c
commit 8895e3c9ad
@@ -340,7 +340,7 @@ cdef find_matches(TokenPatternC** patterns, int n, object doclike, int length, e
 The "predicates" list contains functions that take a Python list and return a
 boolean value. It's mostly used for regular expressions.
 
-The "extra_getters" list contains functions that take a Python list and return
+The "extensions" list contains functions that take a Python list and return
 an attr ID. It's mostly used for extension attributes.
 """
 cdef vector[PatternStateC] states
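To make the "predicates" description in the docstring concrete, here is a minimal, library-free sketch of the shape such a function has, assuming a predicate receives a token's text and returns a boolean. This is not spaCy's Cython implementation; the helper `make_regex_predicate` and the sample regex are hypothetical. (User-facing patterns request this kind of check with the `REGEX` operator, e.g. `{"TEXT": {"REGEX": ...}}`.)

```python
import re
from typing import Callable


def make_regex_predicate(pattern: str) -> Callable[[str], bool]:
    """Build a hypothetical predicate that tests a token's text against a regex.

    Illustration only: it shows the "value in, boolean out" shape the
    docstring describes, not how spaCy constructs its predicate objects.
    """
    compiled = re.compile(pattern)  # compile once, reuse for every token

    def predicate(text: str) -> bool:
        # re.search returns a Match object or None; convert to a plain bool
        return compiled.search(text) is not None

    return predicate


# A predicate that accepts tokens made entirely of digits
is_number = make_regex_predicate(r"^\d+$")
print(is_number("2021"), is_number("I/O"))  # True False
```

Compiling the regex once and closing over it keeps the per-token check cheap, which matters when a predicate runs against every token of a large corpus.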
@@ -429,7 +429,7 @@ matcher.add("HelloWorld", [pattern])
 # 🚨 Raises an error:
 # MatchPatternError: Invalid token patterns for matcher rule 'HelloWorld'
 # Pattern 0:
-# - Additional properties are not allowed ('CASEINSENSITIVE' was unexpected) [2]
+# - [pattern -> 2 -> CASEINSENSITIVE] extra fields not permitted
 
 ```
 
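The updated message pinpoints the offending key by path: token index 2 of the pattern, key `CASEINSENSITIVE`. Below is a rough sketch of how a validator could emit lines in that format, assuming a small hypothetical whitelist of keys. spaCy's own validation is schema-based and accepts far more attributes, so `ALLOWED_KEYS`, `find_extra_fields`, and the sample pattern are illustrative only.

```python
from typing import Any, Dict, List

# Hypothetical subset of accepted token attributes, for illustration only;
# the real pattern schema accepts many more keys.
ALLOWED_KEYS = {"ORTH", "TEXT", "LOWER", "IS_PUNCT", "OP"}


def find_extra_fields(pattern: List[Dict[str, Any]]) -> List[str]:
    """Report unknown keys using the path format of the updated error line."""
    errors = []
    for index, token_spec in enumerate(pattern):
        for key in token_spec:
            if key not in ALLOWED_KEYS:
                # Path format: [pattern -> <token index> -> <key>]
                errors.append(f"[pattern -> {index} -> {key}] extra fields not permitted")
    return errors


# Hypothetical pattern with an unsupported attribute at token index 2
pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"CASEINSENSITIVE": "world"}]
for message in find_extra_fields(pattern):
    print(f"- {message}")
# - [pattern -> 2 -> CASEINSENSITIVE] extra fields not permitted
```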
@@ -438,7 +438,8 @@ matcher.add("HelloWorld", [pattern])
 To move on to a more realistic example, let's say you're working with a large
 corpus of blog articles, and you want to match all mentions of "Google I/O"
 (which spaCy tokenizes as `['Google', 'I', '/', 'O'`]). To be safe, you only
-match on the uppercase versions, in case someone has written it as "Google i/o".
+match on the uppercase versions, avoiding matches with phrases such as "Google
+i/o".
 
 ```python
 ### {executable="true"}
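The "clarify uppercase effect" change hinges on case-sensitive comparison. Here is a minimal sketch, assuming the tokenization quoted in the docs, that finds the four-token sequence by comparing verbatim text. It approximates what a pattern on spaCy's exact-text `ORTH` attribute does; `exact_case_matches` and `TARGET` are hypothetical names, not spaCy API.

```python
from typing import List

# Token sequence for "Google I/O", assuming the tokenization stated in the docs
TARGET = ["Google", "I", "/", "O"]


def exact_case_matches(tokens: List[str], target: List[str] = TARGET) -> List[int]:
    """Return start indices where `target` appears with identical casing.

    Compares verbatim token text, so the token "i" never matches "I".
    """
    width = len(target)
    return [
        start
        for start in range(len(tokens) - width + 1)
        if tokens[start:start + width] == target
    ]


print(exact_case_matches(["about", "Google", "I", "/", "O"]))  # [1]
print(exact_case_matches(["about", "Google", "i", "/", "o"]))  # []
```

Lowercasing both sides before comparing (the behaviour of a `LOWER`-based pattern) would also accept "Google i/o", which is exactly the match the uppercase-only approach avoids.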