mirror of
https://github.com/explosion/spaCy.git
synced 2025-08-05 21:00:19 +03:00
Add initial docs
parent 27a4925f8d
commit 45675e1cbb
@@ -86,14 +86,20 @@ it compares to another value.
 > ]
 > ```
 
-| Attribute | Description |
-| -------------------------- | -------------------------------------------------------------------------------------------------------- |
-| `IN` | Attribute value is member of a list. ~~Any~~ |
-| `NOT_IN` | Attribute value is _not_ member of a list. ~~Any~~ |
-| `IS_SUBSET` | Attribute value (for `MORPH` or custom list attributes) is a subset of a list. ~~Any~~ |
-| `IS_SUPERSET` | Attribute value (for `MORPH` or custom list attributes) is a superset of a list. ~~Any~~ |
-| `INTERSECTS` | Attribute value (for `MORPH` or custom list attribute) has a non-empty intersection with a list. ~~Any~~ |
-| `==`, `>=`, `<=`, `>`, `<` | Attribute value is equal, greater or equal, smaller or equal, greater or smaller. ~~Union[int, float]~~ |
+| Attribute | Description |
+| -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `REGEX` | Attribute value matches the regular expression at any position in the string. ~~Any~~ |
+| `FUZZY` | Attribute value matches if the `fuzzy_compare` method matches for `(value, pattern, -1)`. The default method allows a Levenshtein edit distance of at least 2 and up to 20% of the pattern string length. ~~Any~~ |
+| `FUZZY1`, `FUZZY2`, ... `FUZZY9` | Attribute value matches if the `fuzzy_compare` method matches for `(value, pattern, N)`. The default method allows a Levenshtein edit distance of at most N (1-9). ~~Any~~ |
+| `IN` | Attribute value is member of a list. ~~Any~~ |
+| `NOT_IN` | Attribute value is _not_ member of a list. ~~Any~~ |
+| `IS_SUBSET` | Attribute value (for `MORPH` or custom list attributes) is a subset of a list. ~~Any~~ |
+| `IS_SUPERSET` | Attribute value (for `MORPH` or custom list attributes) is a superset of a list. ~~Any~~ |
+| `INTERSECTS` | Attribute value (for `MORPH` or custom list attribute) has a non-empty intersection with a list. ~~Any~~ |
+| `==`, `>=`, `<=`, `>`, `<` | Attribute value is equal, greater or equal, smaller or equal, greater or smaller. ~~Union[int, float]~~ |
+
+As of spaCy v3.5, `REGEX` and `FUZZY` can be used in combination with `IN` and
+`NOT_IN`.
 
 ## Matcher.\_\_init\_\_ {#init tag="method"}
 
@@ -109,10 +115,11 @@ string where an integer is expected) or unexpected property names.
 > matcher = Matcher(nlp.vocab)
 > ```
 
-| Name | Description |
-| ---------- | ----------------------------------------------------------------------------------------------------- |
-| `vocab` | The vocabulary object, which must be shared with the documents the matcher will operate on. ~~Vocab~~ |
-| `validate` | Validate all patterns added to this matcher. ~~bool~~ |
+| Name | Description |
+| --------------- | ----------------------------------------------------------------------------------------------------- |
+| `vocab` | The vocabulary object, which must be shared with the documents the matcher will operate on. ~~Vocab~~ |
+| `validate` | Validate all patterns added to this matcher. ~~bool~~ |
+| `fuzzy_compare` | The comparison method used for the `FUZZY` operators. ~~Callable[[str, str, int], bool]~~ |
 
 ## Matcher.\_\_call\_\_ {#call tag="method"}
 
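The `FUZZY` rows and the `fuzzy_compare` argument in the hunks above describe a callable of the shape `(value, pattern, N) -> bool`. As a rough illustration only, the pure-Python sketch below shows one way such a callable could behave. The names `levenshtein` and `sketch_fuzzy_compare` are hypothetical. The default threshold reads "at least 2 and up to 20% of the pattern string length" as `max(2, round(0.2 * len(pattern)))`, which is an assumption about the wording and not a confirmed description of spaCy's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming Levenshtein distance.

    Counts single-character insertions, deletions and substitutions.
    Hypothetical helper for illustration; not spaCy's own routine.
    """
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute (or keep)
                )
            )
        previous = current
    return previous[-1]


def sketch_fuzzy_compare(value: str, pattern: str, max_edits: int = -1) -> bool:
    """Return True if `value` is within the allowed edit distance of `pattern`.

    A negative `max_edits` stands for the plain `FUZZY` case; 1-9 stand for
    `FUZZY1`..`FUZZY9`. The default limit below is an ASSUMED reading of
    "at least 2 and up to 20% of the pattern string length".
    """
    if max_edits < 0:
        max_edits = max(2, round(0.2 * len(pattern)))
    return levenshtein(value, pattern) <= max_edits
```

Under these assumptions, `sketch_fuzzy_compare("theatre", "theater")` is accepted with the default limit (two substitutions) but rejected when the limit is set to 1, which mirrors the difference between `FUZZY` and `FUZZY1` in the table.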
@@ -364,6 +364,46 @@ else:
 
 </Accordion>
 
+#### Fuzzy matching {#fuzzy new="3.5"}
+
+Fuzzy matching allows you to match tokens with alternate spellings, typos, etc.
+without specifying every possible variant.
+
+```python
+# Matches "favourite", "favorites", "gavorite", "theatre", "theatr", ...
+pattern = [{"TEXT": {"FUZZY": "favorite"}},
+           {"TEXT": {"FUZZY": "theater"}}]
+```
+
+The `FUZZY` attribute allows fuzzy matches for any attribute string value,
+including custom attributes. Just like `REGEX`, it always needs to be applied to
+an attribute like `TEXT` or `LOWER`. By default `FUZZY` allows a Levenshtein
+edit distance of at least 2 and up to 20% of the pattern string length. Using
+the more specific attributes `FUZZY1`..`FUZZY9` you can specify the maximum
+allowed edit distance directly.
+
+```python
+# Match lowercase with fuzzy matching (allows 2 edits)
+pattern = [{"LOWER": {"FUZZY": "definitely"}}]
+
+# Match custom attribute values with fuzzy matching (allows 2 edits)
+pattern = [{"_": {"country": {"FUZZY": "Kyrgyzstan"}}}]
+
+# Match with exact Levenshtein edit distance limits (allows 3 edits)
+pattern = [{"_": {"country": {"FUZZY3": "Kyrgyzstan"}}}]
+```
+
+#### Regex and fuzzy matching with lists {#regex-fuzzy-lists new="3.5"}
+
+Starting in spaCy v3.5, both `REGEX` and `FUZZY` can be combined with the
+attributes `IN` and `NOT_IN`:
+
+```python
+pattern = [{"TEXT": {"FUZZY": {"IN": ["awesome", "cool", "wonderful"]}}}]
+
+pattern = [{"TEXT": {"REGEX": {"NOT_IN": ["^awe(some)?$", "^wonder(ful)?"]}}}]
+```
+
 ---
 
 #### Operators and quantifiers {#quantifiers}
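To make the `REGEX` plus `NOT_IN` combination from the hunk above more concrete, here is a small standard-library sketch of one natural reading: the value passes only if none of the listed expressions matches at any position. The helper names `regex_in` and `regex_not_in` are hypothetical, and the evaluation order inside spaCy's matcher is not confirmed by this commit.

```python
import re
from typing import Iterable


def regex_in(value: str, patterns: Iterable[str]) -> bool:
    """True if at least one pattern matches somewhere in `value`.

    Uses re.search, following the "matches at any position" wording of
    the `REGEX` table row. Hypothetical helper for illustration.
    """
    return any(re.search(p, value) is not None for p in patterns)


def regex_not_in(value: str, patterns: Iterable[str]) -> bool:
    """True if no pattern matches anywhere in `value` (assumed NOT_IN reading)."""
    return not regex_in(value, patterns)


# The expression list from the docs example above.
PATTERNS = ["^awe(some)?$", "^wonder(ful)?"]
```

Note that the second expression has no trailing `$`, so under this reading it also excludes longer tokens such as "wonderland", while "awesomeness" is not excluded because the first expression is anchored at both ends.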