mirror of https://github.com/explosion/spaCy.git
synced 2025-07-05 20:33:10 +03:00
Document operator semantics in Matcher docstring
parent 2534cd57d7
commit 0433181658
@@ -230,14 +230,27 @@ cdef class Matcher:
     def add(self, key, on_match, *patterns):
         """Add a match-rule to the matcher.
         A match-rule consists of: an ID key, an on_match callback, and one or
-        more patterns. If the key exists, the patterns are appended to the
-        previous ones, and the previous on_match callback is replaced. The
-        `on_match` callback will receive the arguments `(matcher, doc, i,
-        matches)`. You can also set `on_match` to `None` to not perform any
-        actions. A pattern consists of one or more `token_specs`, where a
-        `token_spec` is a dictionary mapping attribute IDs to values. Token
-        descriptors can also include quantifiers. There are currently important
-        known problems with the quantifiers – see the docs.
+        more patterns.
+
+        If the key exists, the patterns are appended to the previous ones, and
+        the previous on_match callback is replaced. The `on_match` callback will
+        receive the arguments `(matcher, doc, i, matches)`. You can also set
+        `on_match` to `None` to not perform any actions.
+
+        A pattern consists of one or more `token_specs`, where a `token_spec`
+        is a dictionary mapping attribute IDs to values, and optionally a
+        quantifier operator under the key "op". The available quantifiers are:
+
+        '!': Negate the pattern, by requiring it to match exactly 0 times.
+        '?': Make the pattern optional, by allowing it to match 0 or 1 times.
+        '+': Require the pattern to match 1 or more times.
+        '*': Allow the pattern to match 0 or more times.
+
+        The + and * operators are usually interpreted "greedily", i.e. longer
+        matches are returned where possible. However, if you specify two '+'
+        or '*' patterns in a row and their matches overlap, the first
+        operator will behave non-greedily. This quirk in the semantics
+        makes the matcher more efficient, by avoiding the need for back-tracking.
         """
         for pattern in patterns:
             if len(pattern) == 0:
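The pattern shape described in the new docstring can be sketched in plain Python, without importing spaCy. This is only an illustration under stated assumptions: the string keys "LOWER" and "IS_PUNCT" stand in for attribute IDs, and `check_pattern` is a hypothetical helper that is not part of spaCy or of this commit. Only the "op" key and the four operator symbols come from the docstring.

```python
# Quantifier operators as listed in the docstring, with their stated meaning.
QUANTIFIERS = {
    "!": "match exactly 0 times",
    "?": "match 0 or 1 times",
    "+": "match 1 or more times",
    "*": "match 0 or more times",
}


def check_pattern(pattern):
    """Hypothetical helper (not part of spaCy): validate one pattern.

    Returns one entry per token_spec: the quantifier stored under "op",
    or None when the token_spec carries no quantifier.
    """
    # Assumption of this sketch: an empty pattern is rejected outright.
    if len(pattern) == 0:
        raise ValueError("a pattern needs at least one token_spec")
    ops = []
    for token_spec in pattern:
        op = token_spec.get("op")
        if op is not None and op not in QUANTIFIERS:
            raise ValueError("unknown quantifier: %r" % (op,))
        ops.append(op)
    return ops


# Assumption: plain strings stand in for attribute IDs in this sketch.
pattern = [
    {"LOWER": "hello"},              # no quantifier
    {"IS_PUNCT": True, "op": "?"},   # optional punctuation token
    {"LOWER": "world", "op": "+"},   # one or more "world" tokens
]
ops = check_pattern(pattern)
```

A pattern built this way would be one of the `*patterns` arguments to `add`, alongside the key and the `on_match` callback (or `None`).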