svlandeg 2020-08-18 19:43:20 +02:00
parent f9fe5eb323
commit 6ed67d495a

@@ -665,18 +665,18 @@ can create and register a custom function that generates
 using this dataset for training, stopping criteria such as maximum number of
 steps, or stopping when the loss does not decrease further, can be used.

-In this example we assume a custom function `read_custom_data()`
-which loads or generates texts with relevant textcat annotations. Then, small
-lexical variations of the input text are created before generating the final
-`Example` objects.
+In this example we assume a custom function `read_custom_data()` which loads or
+generates texts with relevant textcat annotations. Then, small lexical
+variations of the input text are created before generating the final `Example`
+objects.

 We can also customize the batching strategy by registering a new "batcher" which
 turns a stream of items into a stream of batches. spaCy has several useful
 built-in batching strategies with customizable sizes<!-- TODO: link -->, but
 it's also easy to implement your own. For instance, the following function takes
-the stream of generated `Example` objects, and removes those which have the exact
-same underlying raw text, to avoid duplicates in the final training data. Note
-that in a more realistic implementation, you'd also want to check whether the
+the stream of generated `Example` objects, and removes those which have the
+exact same underlying raw text, to avoid duplicates within each batch. Note that
+in a more realistic implementation, you'd also want to check whether the
 annotations are exactly the same.

 > ```ini