Small doc typos (#10750)

* fix typos

* formatting
Sofie Van Landeghem 2022-05-03 13:55:27 +02:00 committed by GitHub
parent f5390e278a
commit e03b9f8095
GPG Key ID: 4AEE18F83AFDEB23

@@ -949,7 +949,7 @@ for match_id, start, end in matcher(doc):
The examples here use [`nlp.make_doc`](/api/language#make_doc) to create `Doc`
object patterns as efficiently as possible and without running any of the other
-pipeline components. If the token attribute you want to match on are set by a
+pipeline components. If the token attribute you want to match on is set by a
pipeline component, **make sure that the pipeline component runs** when you
create the pattern. For example, to match on `POS` or `LEMMA`, the pattern `Doc`
objects need to have part-of-speech tags set by the `tagger` or `morphologizer`.
@@ -960,9 +960,9 @@ disable components selectively.
</Infobox>
Another possible use case is matching number tokens like IP addresses based on
-their shape. This means that you won't have to worry about how those string will
-be tokenized and you'll be able to find tokens and combinations of tokens based
-on a few examples. Here, we're matching on the shapes `ddd.d.d.d` and
+their shape. This means that you won't have to worry about how those strings
+will be tokenized and you'll be able to find tokens and combinations of tokens
+based on a few examples. Here, we're matching on the shapes `ddd.d.d.d` and
`ddd.ddd.d.d`:
```python
@@ -1433,7 +1433,7 @@ of `"phrase_matcher_attr": "POS"` for the entity ruler.
Running the full language pipeline across every pattern in a large list scales
linearly and can therefore take a long time on large amounts of phrase patterns.
As of spaCy v2.2.4 the `add_patterns` function has been refactored to use
-nlp.pipe on all phrase patterns resulting in about a 10x-20x speed up with
+`nlp.pipe` on all phrase patterns resulting in about a 10x-20x speed up with
5,000-100,000 phrase patterns respectively. Even with this speedup (but
especially if you're using an older version) the `add_patterns` function can
still take a long time. An easy workaround to make this function run faster is