mirror of https://github.com/explosion/spaCy.git
synced 2025-01-12 10:16:27 +03:00

parent f5390e278a
commit e03b9f8095

```diff
@@ -949,7 +949,7 @@ for match_id, start, end in matcher(doc):
 The examples here use [`nlp.make_doc`](/api/language#make_doc) to create `Doc`
 object patterns as efficiently as possible and without running any of the other
-pipeline components. If the token attribute you want to match on are set by a
+pipeline components. If the token attribute you want to match on is set by a
 pipeline component, **make sure that the pipeline component runs** when you
 create the pattern. For example, to match on `POS` or `LEMMA`, the pattern `Doc`
 objects need to have part-of-speech tags set by the `tagger` or `morphologizer`.
 
```
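The context lines above recommend `nlp.make_doc` for building `Doc` patterns. A minimal sketch of that approach, assuming spaCy v3 is installed: a blank English pipeline (tokenizer only) is enough for exact-text matching with `PhraseMatcher`, so `make_doc` skips every other component. The term list, the example sentence and the `"NAMES"` label are illustrative, not taken from the docs.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Assumes spaCy v3. spacy.blank("en") has a tokenizer but no trained components.
nlp = spacy.blank("en")
# The default attr is "ORTH" (exact token text), so no tagger is needed.
matcher = PhraseMatcher(nlp.vocab)

terms = ["Angela Merkel", "Barack Obama"]  # illustrative pattern list
# make_doc only tokenizes; no other pipeline component runs on the patterns.
patterns = [nlp.make_doc(text) for text in terms]
matcher.add("NAMES", patterns)

doc = nlp("German Chancellor Angela Merkel met US President Barack Obama")
found = [doc[start:end].text for _match_id, start, end in matcher(doc)]
print(found)

# To match on attr="POS" or attr="LEMMA" instead, the pattern Docs would need to
# come from nlp(text) or nlp.pipe(terms) on a pipeline that includes a tagger or
# morphologizer, so that those attributes are actually set.
```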
|
```diff
@@ -960,9 +960,9 @@ disable components selectively.
 </Infobox>
 
 Another possible use case is matching number tokens like IP addresses based on
-their shape. This means that you won't have to worry about how those string will
-be tokenized and you'll be able to find tokens and combinations of tokens based
-on a few examples. Here, we're matching on the shapes `ddd.d.d.d` and
+their shape. This means that you won't have to worry about how those strings
+will be tokenized and you'll be able to find tokens and combinations of tokens
+based on a few examples. Here, we're matching on the shapes `ddd.d.d.d` and
 `ddd.ddd.d.d`:
 
 ```python
```
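The shape strings in this hunk follow the token-shape rule documented for `Token.shape_`: letters become `x` or `X`, digits become `d`, other characters are kept, and runs of the same shape character are truncated after length 4. The sketch below is a dependency-free re-implementation of that rule for illustration only; `word_shape` and `find_by_shape` are hypothetical helpers, not spaCy functions, and whitespace splitting stands in for spaCy's tokenizer. It shows why `127.0.0.1` and `192.168.1.1` produce `ddd.d.d.d` and `ddd.ddd.d.d`, while `10.0.0.1` produces neither.

```python
def word_shape(text: str) -> str:
    """Hypothetical re-implementation of the documented token-shape rule."""
    shape = []
    last = ""
    run = 0
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char  # punctuation such as "." is kept as-is
        # Count how many times in a row the same shape character has appeared.
        if shape_char == last:
            run += 1
        else:
            run = 0
            last = shape_char
        # Keep at most 4 consecutive copies of a shape character.
        if run < 4:
            shape.append(shape_char)
    return "".join(shape)


def find_by_shape(tokens: list[str], shapes: set[str]) -> list[str]:
    """Return the tokens whose shape is one of the target shapes."""
    return [tok for tok in tokens if word_shape(tok) in shapes]


ip_shapes = {"ddd.d.d.d", "ddd.ddd.d.d"}
# Whitespace splitting stands in for spaCy's tokenizer in this sketch.
tokens = "Ping 127.0.0.1 or 192.168.1.1 but not 10.0.0.1".split()
print(find_by_shape(tokens, ip_shapes))  # ['127.0.0.1', '192.168.1.1']
```

In spaCy itself, the equivalent `Matcher` patterns would be token patterns such as `[{"SHAPE": "ddd.d.d.d"}]`; spaCy's own shape implementation may handle edge cases this sketch does not.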
|
```diff
@@ -1433,7 +1433,7 @@ of `"phrase_matcher_attr": "POS"` for the entity ruler.
 Running the full language pipeline across every pattern in a large list scales
 linearly and can therefore take a long time on large amounts of phrase patterns.
 As of spaCy v2.2.4 the `add_patterns` function has been refactored to use
-nlp.pipe on all phrase patterns resulting in about a 10x-20x speed up with
+`nlp.pipe` on all phrase patterns resulting in about a 10x-20x speed up with
 5,000-100,000 phrase patterns respectively. Even with this speedup (but
 especially if you're using an older version) the `add_patterns` function can
 still take a long time. An easy workaround to make this function run faster is
```
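This hunk concerns how `add_patterns` builds phrase patterns for the entity ruler. A minimal usage sketch, assuming spaCy v3 and a blank English pipeline: string values under `"pattern"` are phrase patterns that the ruler converts into `Doc` objects itself when `add_patterns` is called (per the text above, via `nlp.pipe` since v2.2.4), while list values are token patterns that need no such conversion. The labels and the example sentence are illustrative.

```python
import spacy

# Assumes spaCy v3: add_pipe takes the registered factory name and returns the component.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

patterns = [
    # String patterns are phrase patterns; the ruler converts them to Doc objects.
    {"label": "ORG", "pattern": "Apple"},
    {"label": "GPE", "pattern": "San Francisco"},
    # List patterns are token patterns and need no Doc conversion.
    {"label": "GPE", "pattern": [{"LOWER": "new"}, {"LOWER": "york"}]},
]
ruler.add_patterns(patterns)

doc = nlp("Apple opened offices in San Francisco and New York.")
ents = [(ent.text, ent.label_) for ent in doc.ents]
print(ents)
```

With a large list of string patterns, that internal `Doc` conversion is the per-pattern cost the paragraph above describes.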
|