spaCy/website/docs/usage
Kevin Humphreys 19650ebb52
Enable fuzzy text matching in Matcher (#11359)
* enable fuzzy matching

* add fuzzy param to EntityMatcher

* include rapidfuzz_capi

not yet used

* fix type

* add FUZZY predicate

* add fuzzy attribute list

* fix type properly

* tidying

* remove unnecessary dependency

* handle fuzzy sets

* simplify fuzzy sets

* case fix

* switch to FUZZYn predicates

use Levenshtein distance.
remove fuzzy param.
remove rapidfuzz_capi.

* revert changes added for fuzzy param

* switch to polyleven

(Python package)

* fuzzy match only on oov tokens

* remove polyleven

* exclude whitespace tokens

* don't allow more edits than characters

* fix min distance

* reinstate FUZZY operator

with length-based distance function

* handle sets inside regex operator

* remove is_oov check

* attempt build fix

no mypy failure locally

* re-attempt build fix

* don't overwrite fuzzy param value

* move fuzzy_match

to its own Python module to allow patching

* move fuzzy_match back inside Matcher

simplify logic and add tests

* Format tests

* Parametrize fuzzyn tests

* Parametrize and merge fuzzy+set tests

* Format

* Move fuzzy_match to a standalone method

* Change regex kwarg type to bool

* Add types for fuzzy_match

- Refactor variable names
- Add test for symmetrical behavior

* Parametrize fuzzyn+set tests

* Minor refactoring for fuzz/fuzzy

* Make fuzzy_match a Matcher kwarg

* Update type for _default_fuzzy_match

* don't overwrite function param

* Rename to fuzzy_compare

* Update fuzzy_compare default argument declarations

* allow fuzzy_compare override from EntityRuler

* define new Matcher keyword arg

* fix type definition

* Implement fuzzy_compare config option for EntityRuler and SpanRuler

* Rename _default_fuzzy_compare to fuzzy_compare, remove from reexported objects

* Use simpler fuzzy_compare algorithm

* Update types

* Increase minimum to 2 in fuzzy_compare to allow one transposition

* Fix predicate keys and matching for SetPredicate with FUZZY and REGEX

* Add FUZZY6..9

* Add initial docs

* Increase default fuzzy to rounded 30% of pattern length

* Update docs for fuzzy_compare in components

* Update EntityRuler and SpanRuler API docs

* Rename EntityRuler and SpanRuler setting to matcher_fuzzy_compare

To keep naming similar to `phrase_matcher_attr`, rename the
`fuzzy_compare` setting for `EntityRuler` and `SpanRuler` to
`matcher_fuzzy_compare`. Organize next to `phrase_matcher_attr` in docs.

* Fix schema aliases

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix typo

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Add FUZZY6-9 operators and update tests

* Parameterize test over greedy

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix type for fuzzy_compare to remove Optional

* Rename to spacy.levenshtein_compare.v1, move to spacy.matcher.levenshtein

* Update docs following levenshtein_compare renaming

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
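
For readers skimming the log, the comparison rule described in the bullets above (Levenshtein distance, a default allowance of a rounded 30% of the pattern length, and a minimum of 2 so that one transposition fits) can be sketched in plain Python. This is an illustration only, not the code from this commit: the names `edit_distance` and `sketch_fuzzy_compare`, and the convention that `fuzzy=-1` means "use the default allowance", are assumptions made for this sketch.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute), two-row DP."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def sketch_fuzzy_compare(input_text: str, pattern_text: str, fuzzy: int = -1) -> bool:
    """Hypothetical helper: True if input_text is within the allowed edits."""
    if fuzzy >= 0:
        # Explicit edit limit supplied by the caller (assumed convention).
        max_edits = fuzzy
    else:
        # Default from the log: rounded 30% of the pattern length, at least 2.
        # A swap of two adjacent characters costs 2 in plain Levenshtein.
        max_edits = max(2, round(0.3 * len(pattern_text)))
    return edit_distance(input_text, pattern_text) <= max_edits


if __name__ == "__main__":
    print(sketch_fuzzy_compare("definately", "definitely"))
    print(sketch_fuzzy_compare("definately", "definitely", fuzzy=0))
```

In this sketch, passing an explicit `fuzzy=n` stands in for a fixed edit limit of the kind the `FUZZYn` predicates in the log suggest; how the commit itself wires that limit through the `Matcher` is not shown here.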
2023-01-10 10:36:17 +01:00
101 Refactor KB for easier customization (#11268) 2022-09-08 10:38:07 +02:00
_benchmarks-models.md final 3.0 benchmark numbers 2021-02-09 21:28:33 +01:00
embeddings-transformers.md add floret to static vectors docs (#10833) 2022-05-23 09:16:31 +02:00
facts-figures.md final 3.0 benchmark numbers 2021-02-09 21:28:33 +01:00
index.md Remove spacy-ray from docs (#11781) 2022-11-14 19:58:38 +09:00
layers-architectures.md 🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167) 2021-10-14 15:21:40 +02:00
linguistic-features.md Correct alignment example and documentation (#11491) 2022-09-14 09:36:55 +02:00
models.md Add a way to get the URL to download a pipeline to the CLI (#11175) 2022-09-02 11:58:21 +02:00
processing-pipelines.md Revert disable/disabled merging behavior (#11745) 2022-11-08 14:58:10 +01:00
projects.md Support local filesystem remotes for projects (#11762) 2022-11-29 11:40:58 +01:00
rule-based-matching.md Enable fuzzy text matching in Matcher (#11359) 2023-01-10 10:36:17 +01:00
saving-loading.md remove new v2 tags (#11780) 2022-11-14 17:41:01 +09:00
spacy-101.md Merge branch 'master' into develop 2020-12-11 13:44:41 +11:00
training.md Remove spacy-ray from docs (#11781) 2022-11-14 19:58:38 +09:00
v2-1.md Merge branch 'spacy.io' [ci skip] 2021-03-06 17:38:54 +11:00
v2-2.md Update v3 docs [ci skip] 2020-07-05 16:11:16 +02:00
v2-3.md Merge branch 'spacy.io' [ci skip] 2021-03-06 17:38:54 +11:00
v2.md Merge branch 'spacy.io' [ci skip] 2021-03-06 17:38:54 +11:00
v3-1.md Remove NBSP's across tables in the docs (#10842) 2022-05-25 09:48:39 +02:00
v3-2.md Update Catalan acknowledgements for v3.2 (#9763) 2021-11-29 14:14:21 +01:00
v3-3.md Docs for v3.3 (#10628) 2022-04-28 14:09:35 +02:00
v3-4.md fix links (#11927) 2022-12-05 16:29:13 +09:00
v3.md remove migration support form (#11802) 2022-11-14 16:53:14 +01:00
visualizers.md Docs: displaCy documentation - data types, parse_{deps,ents,spans}, spans example (#10950) 2022-08-16 11:23:34 -04:00