diff --git a/website/docs/usage/rule-based-matching.md b/website/docs/usage/rule-based-matching.md
index 710c52dfd..be9a56dc8 100644
--- a/website/docs/usage/rule-based-matching.md
+++ b/website/docs/usage/rule-based-matching.md
@@ -949,7 +949,7 @@ for match_id, start, end in matcher(doc):

 The examples here use [`nlp.make_doc`](/api/language#make_doc) to create `Doc`
 object patterns as efficiently as possible and without running any of the other
-pipeline components. If the token attribute you want to match on are set by a
+pipeline components. If the token attribute you want to match on is set by a
 pipeline component, **make sure that the pipeline component runs** when you
 create the pattern. For example, to match on `POS` or `LEMMA`, the pattern `Doc`
 objects need to have part-of-speech tags set by the `tagger` or `morphologizer`.
@@ -960,9 +960,9 @@ disable components selectively.
 Another possible use case is matching number tokens like IP addresses based on
-their shape. This means that you won't have to worry about how those string will
-be tokenized and you'll be able to find tokens and combinations of tokens based
-on a few examples. Here, we're matching on the shapes `ddd.d.d.d` and
+their shape. This means that you won't have to worry about how those strings
+will be tokenized and you'll be able to find tokens and combinations of tokens
+based on a few examples. Here, we're matching on the shapes `ddd.d.d.d` and
 `ddd.ddd.d.d`:

 ```python
@@ -1433,7 +1433,7 @@ of `"phrase_matcher_attr": "POS"` for the entity ruler.
 Running the full language pipeline across every pattern in a large list scales
 linearly and can therefore take a long time on large amounts of phrase patterns.
 As of spaCy v2.2.4 the `add_patterns` function has been refactored to use
-nlp.pipe on all phrase patterns resulting in about a 10x-20x speed up with
+`nlp.pipe` on all phrase patterns resulting in about a 10x-20x speed up with
 5,000-100,000 phrase patterns respectively.
Even with this speedup (but especially if you're using an older version) the `add_patterns` function can still take a long time. An easy workaround to make this function run faster is