Mirror of https://github.com/explosion/spaCy.git, synced 2025-01-27 09:44:36 +03:00
format
parent f9fe5eb323
commit 6ed67d495a
@@ -665,18 +665,18 @@ can create and register a custom function that generates
 using this dataset for training, stopping criteria such as maximum number of
 steps, or stopping when the loss does not decrease further, can be used.
 
-In this example we assume a custom function `read_custom_data()`
-which loads or generates texts with relevant textcat annotations. Then, small
-lexical variations of the input text are created before generating the final
-`Example` objects.
+In this example we assume a custom function `read_custom_data()` which loads or
+generates texts with relevant textcat annotations. Then, small lexical
+variations of the input text are created before generating the final `Example`
+objects.
 
 We can also customize the batching strategy by registering a new "batcher" which
 turns a stream of items into a stream of batches. spaCy has several useful
 built-in batching strategies with customizable sizes<!-- TODO: link -->, but
 it's also easy to implement your own. For instance, the following function takes
-the stream of generated `Example` objects, and removes those which have the exact
-same underlying raw text, to avoid duplicates in the final training data. Note
-that in a more realistic implementation, you'd also want to check whether the
+the stream of generated `Example` objects, and removes those which have the
+exact same underlying raw text, to avoid duplicates within each batch. Note that
+in a more realistic implementation, you'd also want to check whether the
 annotations are exactly the same.
 
 > ```ini
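
The wording change from "in the final training data" to "within each batch" matters because a batcher of this kind only sees the batch it is currently building. The following is a minimal, library-free sketch of that behaviour, not the function from the documentation being edited: plain strings stand in for the raw text of `Example` objects, and the name `filter_batch` and the choice to emit a final partial batch are assumptions made for illustration. In spaCy, such a function would additionally be registered as a batcher so that the training config can refer to it.

```python
from typing import Callable, Iterable, Iterator, List


def filter_batch(size: int) -> Callable[[Iterable[str]], Iterator[List[str]]]:
    """Create a batcher that skips texts already present in the current batch.

    Hypothetical stand-in: items are plain strings rather than Example objects.
    """

    def create_filtered_batches(texts: Iterable[str]) -> Iterator[List[str]]:
        batch: List[str] = []
        for text in texts:
            # Only the batch under construction is checked, so the same text
            # can still appear again in a later batch.
            if text not in batch:
                batch.append(text)
            if len(batch) == size:
                yield batch
                batch = []
        # Emit the last, possibly shorter batch (a choice made for this sketch).
        if batch:
            yield batch

    return create_filtered_batches


stream = ["a", "a", "b", "c", "a", "c"]
batches = list(filter_batch(size=2)(stream))
print(batches)  # [['a', 'b'], ['c', 'a'], ['c']]
```

The second "a" is dropped because it falls into the same batch as the first, while the later "a" and "c" survive in other batches. Removing duplicates from the whole training set would instead need state that outlives a single batch, such as a set of texts already seen.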