Reformat processing pipelines

Repository: https://github.com/explosion/spaCy.git
Commit: 0fb1881f36 (parent: acc58719da)
@@ -54,8 +54,8 @@ texts = ["This is a text", "These are lots of texts", "..."]
 In this example, we're using [`nlp.pipe`](/api/language#pipe) to process a
 (potentially very large) iterable of texts as a stream. Because we're only
 accessing the named entities in `doc.ents` (set by the `ner` component), we'll
-disable all other components during processing. `nlp.pipe` yields `Doc`
-objects, so we can iterate over them and access the named entity predictions:
+disable all other components during processing. `nlp.pipe` yields `Doc` objects,
+so we can iterate over them and access the named entity predictions:
 
 > #### ✏️ Things to try
 >
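Rendered as runnable code, the streaming pattern rewrapped in this hunk looks roughly like the sketch below. It assumes the `en_core_web_sm` pipeline is installed (`python -m spacy download en_core_web_sm`); the exact component names to disable depend on which pipeline you use:

```python
import spacy

# Assumption: en_core_web_sm is installed; any pipeline with an `ner`
# component works the same way.
nlp = spacy.load("en_core_web_sm")

texts = ["This is a text", "These are lots of texts", "..."]

# We only read doc.ents, so switch off the components we don't need.
# nlp.pipe yields Doc objects lazily, one per input text.
for doc in nlp.pipe(texts, disable=["tagger", "parser", "lemmatizer"]):
    print([(ent.text, ent.label_) for ent in doc.ents])
```

Passing `disable` to `nlp.pipe` turns the components off only for that call, so the pipeline on `nlp` itself stays intact for later use.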
@@ -104,12 +104,11 @@ docs = nlp.pipe(texts, n_process=4)
 docs = nlp.pipe(texts, n_process=-1)
 ```
 
-Depending on your platform, starting many processes with multiprocessing can
-add a lot of overhead. In particular, the default start method `spawn` used in
+Depending on your platform, starting many processes with multiprocessing can add
+a lot of overhead. In particular, the default start method `spawn` used in
 macOS/OS X (as of Python 3.8) and in Windows can be slow for larger models
 because the model data is copied in memory for each new process. See the
-[Python docs on
-multiprocessing](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods)
+[Python docs on multiprocessing](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods)
 for further details.
 
 For shorter tasks and in particular with `spawn`, it can be faster to use a
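As a minimal sketch of the multiprocessing advice rewrapped above: `n_process` and `batch_size` are real `nlp.pipe` parameters, while the pipeline name and the batch size value here are placeholder choices. The `__main__` guard matters under `spawn`, which re-imports the module in every worker process:

```python
import spacy

def process(texts):
    # Assumption: en_core_web_sm is installed. Under `spawn` (the default
    # start method on macOS and Windows), each worker copies the model
    # into memory, so multiprocessing pays off mainly for long-running
    # jobs over many texts.
    nlp = spacy.load("en_core_web_sm")
    for doc in nlp.pipe(texts, n_process=4, batch_size=50):
        yield [ent.text for ent in doc.ents]

if __name__ == "__main__":
    texts = ["This is a text", "These are lots of texts", "..."] * 1000
    for ents in process(texts):
        pass  # replace with real per-doc work
```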
@@ -134,8 +133,8 @@ to limitations in CUDA.
 
 In Linux, transformer models may hang or deadlock with multiprocessing due to an
 [issue in PyTorch](https://github.com/pytorch/pytorch/issues/17199). One
-suggested workaround is to use `spawn` instead of `fork` and another is to
-limit the number of threads before loading any models using
+suggested workaround is to use `spawn` instead of `fork` and another is to limit
+the number of threads before loading any models using
 `torch.set_num_threads(1)`.
 
 </Infobox>
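One possible way to combine the two workarounds mentioned in this hunk is sketched below. It assumes a transformer pipeline such as `en_core_web_trf` is installed; this is an illustration of the hints above, not an official recipe:

```python
import multiprocessing

import spacy
import torch

def main():
    # Workaround 2: cap PyTorch's thread count *before* loading any model.
    torch.set_num_threads(1)
    nlp = spacy.load("en_core_web_trf")  # assumed transformer pipeline
    texts = ["This is a text", "These are lots of texts"]
    for doc in nlp.pipe(texts, n_process=2):
        print(doc.ents)

if __name__ == "__main__":
    # Workaround 1: switch from Linux's default `fork` to `spawn`.
    multiprocessing.set_start_method("spawn")
    main()
```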