mirror of https://github.com/explosion/spaCy.git, synced 2025-01-26 09:14:32 +03:00
parent f5390e278a, commit e03b9f8095

@@ -949,7 +949,7 @@ for match_id, start, end in matcher(doc):
 
 The examples here use [`nlp.make_doc`](/api/language#make_doc) to create `Doc`
 object patterns as efficiently as possible and without running any of the other
-pipeline components. If the token attribute you want to match on are set by a
+pipeline components. If the token attribute you want to match on is set by a
 pipeline component, **make sure that the pipeline component runs** when you
 create the pattern. For example, to match on `POS` or `LEMMA`, the pattern `Doc`
 objects need to have part-of-speech tags set by the `tagger` or `morphologizer`.
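
To illustrate the passage in the hunk above, here is a minimal sketch, assuming spaCy v3 and a blank English pipeline (`spacy.blank("en")`), of building `PhraseMatcher` patterns with `nlp.make_doc`. The terms, the `TECH` label and the sample sentence are made up for illustration. `LOWER` is a lexical attribute that is available right after tokenization, so no pipeline component has to run; for `POS` or `LEMMA`, the pattern `Doc` objects would instead have to come from a pipeline that actually sets those attributes.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Blank pipeline: tokenizer only, no tagger or other components (an assumption
# made so that the sketch runs without downloading a trained model).
nlp = spacy.blank("en")

# Match on the lowercase form, a lexical attribute available after tokenization.
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")

# Hypothetical terms; make_doc only tokenizes, which is enough for LOWER.
terms = ["machine learning", "natural language processing"]
matcher.add("TECH", [nlp.make_doc(term) for term in terms])

doc = nlp("She works on Machine Learning and natural language processing.")
matched = [doc[start:end].text for match_id, start, end in matcher(doc)]
print(matched)
```

Because the patterns and the text are compared on `LOWER`, the capitalized "Machine Learning" in the sample sentence still matches the lowercase pattern.
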
@@ -960,9 +960,9 @@ disable components selectively.
 </Infobox>
 
 Another possible use case is matching number tokens like IP addresses based on
-their shape. This means that you won't have to worry about how those string will
-be tokenized and you'll be able to find tokens and combinations of tokens based
-on a few examples. Here, we're matching on the shapes `ddd.d.d.d` and
+their shape. This means that you won't have to worry about how those strings
+will be tokenized and you'll be able to find tokens and combinations of tokens
+based on a few examples. Here, we're matching on the shapes `ddd.d.d.d` and
 `ddd.ddd.d.d`:
 
 ```python
@@ -1433,7 +1433,7 @@ of `"phrase_matcher_attr": "POS"` for the entity ruler.
 Running the full language pipeline across every pattern in a large list scales
 linearly and can therefore take a long time on large amounts of phrase patterns.
 As of spaCy v2.2.4 the `add_patterns` function has been refactored to use
-nlp.pipe on all phrase patterns resulting in about a 10x-20x speed up with
+`nlp.pipe` on all phrase patterns resulting in about a 10x-20x speed up with
 5,000-100,000 phrase patterns respectively. Even with this speedup (but
 especially if you're using an older version) the `add_patterns` function can
 still take a long time. An easy workaround to make this function run faster is
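
The hunk above says that `add_patterns` was refactored to use `nlp.pipe`. The following is a rough sketch of what that kind of batching looks like from the outside; it is not the entity ruler's internal code. It assumes spaCy v3 and a blank English pipeline, and the phrase list is invented for illustration. `nlp.pipe` takes an iterable of texts and yields one `Doc` per text, in order.

```python
import spacy

# Assumption: a tokenizer-only pipeline, so no trained model is required.
nlp = spacy.blank("en")

# Hypothetical phrase patterns; a real list might hold thousands of entries.
phrases = ["Apple Inc.", "San Francisco", "New York City"]

# One call per phrase: simple, but each text is processed separately.
docs_one_by_one = [nlp(text) for text in phrases]

# Batched: nlp.pipe streams the texts through the pipeline in batches.
docs_batched = list(nlp.pipe(phrases))

# Both approaches yield the same tokens; only the processing strategy differs.
same_tokens = [
    [token.text for token in a] == [token.text for token in b]
    for a, b in zip(docs_one_by_one, docs_batched)
]
print(same_tokens)
```

With a tokenizer-only pipeline the difference is small; the speedup quoted in the hunk refers to running a full pipeline over thousands of phrase patterns.
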