Fix broken images
|
@ -167,7 +167,7 @@ validation error with more details.
|
||||||
>
|
>
|
||||||
> #### Example diff
|
> #### Example diff
|
||||||
>
|
>
|
||||||
> 
|
> 
|
||||||
|
|
||||||
```cli
|
```cli
|
||||||
$ python -m spacy init fill-config [base_path] [output_file] [--diff]
|
$ python -m spacy init fill-config [base_path] [output_file] [--diff]
|
||||||
|
@ -1490,7 +1490,7 @@ $ python -m spacy project document [project_dir] [--output] [--no-emoji]
|
||||||
For more examples, see the templates in our
|
For more examples, see the templates in our
|
||||||
[`projects`](https://github.com/explosion/projects) repo.
|
[`projects`](https://github.com/explosion/projects) repo.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
</Accordion>
|
</Accordion>
|
||||||
|
|
||||||
|
|
|
@ -91,7 +91,7 @@ Main changes from spaCy v2 models:
|
||||||
|
|
||||||
### CNN/CPU pipeline design {#design-cnn}
|
### CNN/CPU pipeline design {#design-cnn}
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
In the `sm`/`md`/`lg` models:
|
In the `sm`/`md`/`lg` models:
|
||||||
|
|
||||||
|
|
|
@ -14,7 +14,7 @@ of the pipeline. The `Language` object coordinates these components. It takes
|
||||||
raw text and sends it through the pipeline, returning an **annotated document**.
|
raw text and sends it through the pipeline, returning an **annotated document**.
|
||||||
It also orchestrates training and serialization.
|
It also orchestrates training and serialization.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
### Container objects {#architecture-containers}
|
### Container objects {#architecture-containers}
|
||||||
|
|
||||||
|
@ -39,7 +39,7 @@ rule-based modifications to the `Doc`. spaCy provides a range of built-in
|
||||||
components for different language processing tasks and also allows adding
|
components for different language processing tasks and also allows adding
|
||||||
[custom components](/usage/processing-pipelines#custom-components).
|
[custom components](/usage/processing-pipelines#custom-components).
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
| Name | Description |
|
| Name | Description |
|
||||||
| ----------------------------------------------- | ------------------------------------------------------------------------------------------- |
|
| ----------------------------------------------- | ------------------------------------------------------------------------------------------- |
|
||||||
|
|
|
@ -5,7 +5,7 @@ referred to as the **processing pipeline**. The pipeline used by the
|
||||||
and an entity recognizer. Each pipeline component returns the processed `Doc`,
|
and an entity recognizer. Each pipeline component returns the processed `Doc`,
|
||||||
which is then passed on to the next component.
|
which is then passed on to the next component.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
> - **Name**: ID of the pipeline component.
|
> - **Name**: ID of the pipeline component.
|
||||||
> - **Component:** spaCy's implementation of the component.
|
> - **Component:** spaCy's implementation of the component.
|
||||||
|
|
|
@ -41,7 +41,7 @@ marks.
|
||||||
> - **Suffix:** Character(s) at the end, e.g. `km`, `)`, `”`, `!`.
|
> - **Suffix:** Character(s) at the end, e.g. `km`, `)`, `”`, `!`.
|
||||||
> - **Infix:** Character(s) in between, e.g. `-`, `--`, `/`, `…`.
|
> - **Infix:** Character(s) in between, e.g. `-`, `--`, `/`, `…`.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
While punctuation rules are usually pretty general, tokenizer exceptions
|
While punctuation rules are usually pretty general, tokenizer exceptions
|
||||||
strongly depend on the specifics of the individual language. This is why each
|
strongly depend on the specifics of the individual language. This is why each
|
||||||
|
|
|
@ -21,7 +21,7 @@ predictions become more similar to the reference labels over time.
|
||||||
> Minimising the gradient of the weights should result in predictions that are
|
> Minimising the gradient of the weights should result in predictions that are
|
||||||
> closer to the reference labels on the training data.
|
> closer to the reference labels on the training data.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
When training a model, we don't just want it to memorize our examples – we want
|
When training a model, we don't just want it to memorize our examples – we want
|
||||||
it to come up with a theory that can be **generalized across unseen data**.
|
it to come up with a theory that can be **generalized across unseen data**.
|
||||||
|
|
|
@ -136,7 +136,7 @@ useful for your purpose. Here are some important considerations to keep in mind:
|
||||||
|
|
||||||
<Infobox title="Tip: Check out sense2vec" emoji="💡">
|
<Infobox title="Tip: Check out sense2vec" emoji="💡">
|
||||||
|
|
||||||
[](https://github.com/explosion/sense2vec)
|
[](https://github.com/explosion/sense2vec)
|
||||||
|
|
||||||
[`sense2vec`](https://github.com/explosion/sense2vec) is a library developed by
|
[`sense2vec`](https://github.com/explosion/sense2vec) is a library developed by
|
||||||
us that builds on top of spaCy and lets you train and query more interesting and
|
us that builds on top of spaCy and lets you train and query more interesting and
|
||||||
|
|
|
@ -85,7 +85,7 @@ difficult to swap components or retrain parts of the pipeline. Multi-task
|
||||||
learning can affect your accuracy (either positively or negatively), and may
|
learning can affect your accuracy (either positively or negatively), and may
|
||||||
require some retuning of your hyper-parameters.
|
require some retuning of your hyper-parameters.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
| Shared | Independent |
|
| Shared | Independent |
|
||||||
| ------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------- |
|
| ------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------- |
|
||||||
|
@ -99,7 +99,7 @@ components by adding a [`Transformer`](/api/transformer) or
|
||||||
later in the pipeline can "connect" to it by including a **listener layer** like
|
later in the pipeline can "connect" to it by including a **listener layer** like
|
||||||
[Tok2VecListener](/api/architectures#Tok2VecListener) within their model.
|
[Tok2VecListener](/api/architectures#Tok2VecListener) within their model.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
At the beginning of training, the [`Tok2Vec`](/api/tok2vec) component will grab
|
At the beginning of training, the [`Tok2Vec`](/api/tok2vec) component will grab
|
||||||
a reference to the relevant listener layers in the rest of your pipeline. When
|
a reference to the relevant listener layers in the rest of your pipeline. When
|
||||||
|
@ -249,7 +249,7 @@ the standard way, like any other spaCy pipeline. Instead of using the
|
||||||
transformers as subnetworks directly, you can also use them via the
|
transformers as subnetworks directly, you can also use them via the
|
||||||
[`Transformer`](/api/transformer) pipeline component.
|
[`Transformer`](/api/transformer) pipeline component.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
The `Transformer` component sets the
|
The `Transformer` component sets the
|
||||||
[`Doc._.trf_data`](/api/transformer#custom_attributes) extension attribute,
|
[`Doc._.trf_data`](/api/transformer#custom_attributes) extension attribute,
|
||||||
|
|
|
@ -111,7 +111,7 @@ If you're using a modern editor like Visual Studio Code, you can
|
||||||
custom Thinc plugin and get live feedback about mismatched types as you write
|
custom Thinc plugin and get live feedback about mismatched types as you write
|
||||||
code.
|
code.
|
||||||
|
|
||||||
[](https://thinc.ai/docs/usage-type-checking#linting)
|
[](https://thinc.ai/docs/usage-type-checking#linting)
|
||||||
|
|
||||||
</Accordion>
|
</Accordion>
|
||||||
|
|
||||||
|
@ -785,7 +785,7 @@ To use our new relation extraction model as part of a custom
|
||||||
[trainable component](/usage/processing-pipelines#trainable-components), we
|
[trainable component](/usage/processing-pipelines#trainable-components), we
|
||||||
create a subclass of [`TrainablePipe`](/api/pipe) that holds the model.
|
create a subclass of [`TrainablePipe`](/api/pipe) that holds the model.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
```python
|
```python
|
||||||
### Pipeline component skeleton
|
### Pipeline component skeleton
|
||||||
|
|
|
@ -1154,7 +1154,7 @@ different signature from all the other components: it takes a text and returns a
|
||||||
[`Doc`](/api/doc), whereas all other components expect to already receive a
|
[`Doc`](/api/doc), whereas all other components expect to already receive a
|
||||||
tokenized `Doc`.
|
tokenized `Doc`.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
To overwrite the existing tokenizer, you need to replace `nlp.tokenizer` with a
|
To overwrite the existing tokenizer, you need to replace `nlp.tokenizer` with a
|
||||||
custom function that takes a text and returns a [`Doc`](/api/doc).
|
custom function that takes a text and returns a [`Doc`](/api/doc).
|
||||||
|
|
|
@ -1158,7 +1158,7 @@ pipeline is loaded. For more background on this, see the usage guides on the
|
||||||
[config lifecycle](/usage/training#config-lifecycle) and
|
[config lifecycle](/usage/training#config-lifecycle) and
|
||||||
[custom initialization](/usage/training#initialization).
|
[custom initialization](/usage/training#initialization).
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
A component's `initialize` method needs to take at least **two named
|
A component's `initialize` method needs to take at least **two named
|
||||||
arguments**: a `get_examples` callback that gives it access to the training
|
arguments**: a `get_examples` callback that gives it access to the training
|
||||||
|
@ -1274,7 +1274,7 @@ trainable components that have their own model instance, make predictions over
|
||||||
`Doc` objects and can be updated using [`spacy train`](/api/cli#train). This
|
`Doc` objects and can be updated using [`spacy train`](/api/cli#train). This
|
||||||
lets you plug fully custom machine learning components into your pipeline.
|
lets you plug fully custom machine learning components into your pipeline.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
You'll need the following:
|
You'll need the following:
|
||||||
|
|
||||||
|
|
|
@ -27,7 +27,7 @@ and share your results with your team. spaCy projects can be used via the new
|
||||||
[`spacy project`](/api/cli#project) command and we provide templates in our
|
[`spacy project`](/api/cli#project) command and we provide templates in our
|
||||||
[`projects`](https://github.com/explosion/projects) repo.
|
[`projects`](https://github.com/explosion/projects) repo.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
<Project id="pipelines/tagger_parser_ud">
|
<Project id="pipelines/tagger_parser_ud">
|
||||||
|
|
||||||
|
@ -594,7 +594,7 @@ commands:
|
||||||
> For more examples, see the [`projects`](https://github.com/explosion/projects)
|
> For more examples, see the [`projects`](https://github.com/explosion/projects)
|
||||||
> repo.
|
> repo.
|
||||||
>
|
>
|
||||||
> 
|
> 
|
||||||
|
|
||||||
When your custom project is ready and you want to share it with others, you can
|
When your custom project is ready and you want to share it with others, you can
|
||||||
use the [`spacy project document`](/api/cli#project-document) command to
|
use the [`spacy project document`](/api/cli#project-document) command to
|
||||||
|
@ -887,7 +887,7 @@ commands:
|
||||||
|
|
||||||
> #### Example train curve output
|
> #### Example train curve output
|
||||||
>
|
>
|
||||||
> [](https://prodi.gy/docs/recipes#train-curve)
|
> [](https://prodi.gy/docs/recipes#train-curve)
|
||||||
|
|
||||||
The [`train-curve`](https://prodi.gy/docs/recipes#train-curve) recipe is another
|
The [`train-curve`](https://prodi.gy/docs/recipes#train-curve) recipe is another
|
||||||
cool workflow you can include in your project. It will run the training with
|
cool workflow you can include in your project. It will run the training with
|
||||||
|
@ -942,7 +942,7 @@ full embedded visualizer, as well as individual components.
|
||||||
> $ pip install spacy-streamlit --pre
|
> $ pip install spacy-streamlit --pre
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
Using [`spacy-streamlit`](https://github.com/explosion/spacy-streamlit), your
|
Using [`spacy-streamlit`](https://github.com/explosion/spacy-streamlit), your
|
||||||
projects can easily define their own scripts that spin up an interactive
|
projects can easily define their own scripts that spin up an interactive
|
||||||
|
@ -1054,9 +1054,9 @@ and you'll be able to see the impact it has on your results.
|
||||||
> model_log_interval = 1000
|
> model_log_interval = 1000
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
<Project id="integrations/wandb">
|
<Project id="integrations/wandb">
|
||||||
|
|
||||||
|
@ -1107,7 +1107,7 @@ After uploading, you will see the live URL of your pipeline packages, as well as
|
||||||
the direct URL to the model wheel you can install via `pip install`. You'll also
|
the direct URL to the model wheel you can install via `pip install`. You'll also
|
||||||
be able to test your pipeline interactively from your browser:
|
be able to test your pipeline interactively from your browser:
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
In your `project.yml`, you can add a command that uploads your trained and
|
In your `project.yml`, you can add a command that uploads your trained and
|
||||||
packaged pipeline to the hub. You can either run this as a manual step, or
|
packaged pipeline to the hub. You can either run this as a manual step, or
|
||||||
|
|
|
@ -208,7 +208,7 @@ you need to describe fields like this.
|
||||||
|
|
||||||
<Infobox title="Tip: Try the interactive matcher explorer">
|
<Infobox title="Tip: Try the interactive matcher explorer">
|
||||||
|
|
||||||
[](https://explosion.ai/demos/matcher)
|
[](https://explosion.ai/demos/matcher)
|
||||||
|
|
||||||
The [Matcher Explorer](https://explosion.ai/demos/matcher) lets you test the
|
The [Matcher Explorer](https://explosion.ai/demos/matcher) lets you test the
|
||||||
rule-based `Matcher` by creating token patterns interactively and running them
|
rule-based `Matcher` by creating token patterns interactively and running them
|
||||||
|
@ -1211,7 +1211,7 @@ each new token needs to be linked to an existing token on its left. As for
|
||||||
`founded` in this example, a token may be linked to more than one token on its
|
`founded` in this example, a token may be linked to more than one token on its
|
||||||
right:
|
right:
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
The full pattern comes together as shown in the example below:
|
The full pattern comes together as shown in the example below:
|
||||||
|
|
||||||
|
@ -1752,7 +1752,7 @@ print([(ent.text, ent.label_) for ent in doc.ents])
|
||||||
> - `VBD`: Verb, past tense.
|
> - `VBD`: Verb, past tense.
|
||||||
> - `IN`: Conjunction, subordinating or preposition.
|
> - `IN`: Conjunction, subordinating or preposition.
|
||||||
|
|
||||||
 visualization with `options={'fine_grained': True}` to output the fine-grained part-of-speech tags, i.e. `Token.tag_`")
|
 visualization with `options={'fine_grained': True}` to output the fine-grained part-of-speech tags, i.e. `Token.tag_`")
|
||||||
|
|
||||||
In this example, "worked" is the root of the sentence and is a past tense verb.
|
In this example, "worked" is the root of the sentence and is a past tense verb.
|
||||||
Its subject is "Alex Smith", the person who worked. "at Acme Corp Inc." is a
|
Its subject is "Alex Smith", the person who worked. "at Acme Corp Inc." is a
|
||||||
|
@ -1835,7 +1835,7 @@ notice that our current logic fails and doesn't correctly detect the company as
|
||||||
a past organization. That's because the root is a participle and the tense
|
a past organization. That's because the root is a participle and the tense
|
||||||
information is in the attached auxiliary "was":
|
information is in the attached auxiliary "was":
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
To solve this, we can adjust the rules to also check for the above construction:
|
To solve this, we can adjust the rules to also check for the above construction:
|
||||||
|
|
||||||
|
|
|
@ -30,7 +30,7 @@ quick introduction.
|
||||||
|
|
||||||
<Infobox title="Take the free interactive course">
|
<Infobox title="Take the free interactive course">
|
||||||
|
|
||||||
[](https://course.spacy.io)
|
[](https://course.spacy.io)
|
||||||
|
|
||||||
In this course you'll learn how to use spaCy to build advanced natural language
|
In this course you'll learn how to use spaCy to build advanced natural language
|
||||||
understanding systems, using both rule-based and machine learning approaches. It
|
understanding systems, using both rule-based and machine learning approaches. It
|
||||||
|
@ -292,7 +292,7 @@ and part-of-speech tags like "VERB" are also encoded. Internally, spaCy only
|
||||||
> - **StringStore**: The dictionary mapping hash values to strings, for example
|
> - **StringStore**: The dictionary mapping hash values to strings, for example
|
||||||
> `3197928453018144401` → "coffee".
|
> `3197928453018144401` → "coffee".
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
If you process lots of documents containing the word "coffee" in all kinds of
|
If you process lots of documents containing the word "coffee" in all kinds of
|
||||||
different contexts, storing the exact string "coffee" every time would take up
|
different contexts, storing the exact string "coffee" every time would take up
|
||||||
|
@ -437,7 +437,7 @@ source of truth", both at **training** and **runtime**.
|
||||||
> initial_rate = 0.01
|
> initial_rate = 0.01
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
<Infobox title="Training configuration system" emoji="📖">
|
<Infobox title="Training configuration system" emoji="📖">
|
||||||
|
|
||||||
|
@ -466,7 +466,7 @@ configured via a single training config.
|
||||||
> width = 128
|
> width = 128
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
<Infobox title="Custom trainable components" emoji="📖">
|
<Infobox title="Custom trainable components" emoji="📖">
|
||||||
|
|
||||||
|
|
|
@ -23,7 +23,7 @@ import Training101 from 'usage/101/_training.mdx'
|
||||||
|
|
||||||
<Infobox title="Tip: Try the Prodigy annotation tool">
|
<Infobox title="Tip: Try the Prodigy annotation tool">
|
||||||
|
|
||||||
[](https://prodi.gy)
|
[](https://prodi.gy)
|
||||||
|
|
||||||
If you need to label a lot of data, check out [Prodigy](https://prodi.gy), a
|
If you need to label a lot of data, check out [Prodigy](https://prodi.gy), a
|
||||||
new, active learning-powered annotation tool we've developed. Prodigy is fast
|
new, active learning-powered annotation tool we've developed. Prodigy is fast
|
||||||
|
@ -222,7 +222,7 @@ config is available as [`nlp.config`](/api/language#config) and it includes all
|
||||||
information about the pipeline, as well as the settings used to train and
|
information about the pipeline, as well as the settings used to train and
|
||||||
initialize it.
|
initialize it.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
At runtime spaCy will only use the `[nlp]` and `[components]` blocks of the
|
At runtime spaCy will only use the `[nlp]` and `[components]` blocks of the
|
||||||
config and load all data, including tokenization rules, model weights and other
|
config and load all data, including tokenization rules, model weights and other
|
||||||
|
@ -1120,7 +1120,7 @@ because the component settings required for training (load data from an external
|
||||||
file) wouldn't match the component settings required at runtime (load what's
|
file) wouldn't match the component settings required at runtime (load what's
|
||||||
included with the saved `nlp` object and don't depend on external file).
|
included with the saved `nlp` object and don't depend on external file).
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
<Infobox title="How components save and load data" emoji="📖">
|
<Infobox title="How components save and load data" emoji="📖">
|
||||||
|
|
||||||
|
@ -1572,6 +1572,77 @@ token-based annotations like the dependency parse or entity labels, you'll need
|
||||||
to take care to adjust the `Example` object so its annotations match and remain
|
to take care to adjust the `Example` object so its annotations match and remain
|
||||||
valid.
|
valid.
|
||||||
|
|
||||||
|
## Parallel & distributed training with Ray {#parallel-training}
|
||||||
|
|
||||||
|
> #### Installation
|
||||||
|
>
|
||||||
|
> ```cli
|
||||||
|
> $ pip install -U %%SPACY_PKG_NAME[ray]%%SPACY_PKG_FLAGS
|
||||||
|
> # Check that the CLI is registered
|
||||||
|
> $ python -m spacy ray --help
|
||||||
|
> ```
|
||||||
|
|
||||||
|
[Ray](https://ray.io/) is a fast and simple framework for building and running
|
||||||
|
**distributed applications**. You can use Ray to train spaCy on one or more
|
||||||
|
remote machines, potentially speeding up your training process. Parallel
|
||||||
|
training won't always be faster though – it depends on your batch size, models,
|
||||||
|
and hardware.
|
||||||
|
|
||||||
|
<Infobox variant="warning">
|
||||||
|
|
||||||
|
To use Ray with spaCy, you need the
|
||||||
|
[`spacy-ray`](https://github.com/explosion/spacy-ray) package installed.
|
||||||
|
Installing the package will automatically add the `ray` command to the spaCy
|
||||||
|
CLI.
|
||||||
|
|
||||||
|
</Infobox>
|
||||||
|
|
||||||
|
The [`spacy ray train`](/api/cli#ray-train) command follows the same API as
|
||||||
|
[`spacy train`](/api/cli#train), with a few extra options to configure the Ray
|
||||||
|
setup. You can optionally set the `--address` option to point to your Ray
|
||||||
|
cluster. If it's not set, Ray will run locally.
|
||||||
|
|
||||||
|
```cli
|
||||||
|
python -m spacy ray train config.cfg --n-workers 2
|
||||||
|
```
|
||||||
|
|
||||||
|
<Project id="integrations/ray">
|
||||||
|
|
||||||
|
Get started with parallel training using our project template. It trains a
|
||||||
|
simple model on a Universal Dependencies Treebank and lets you parallelize the
|
||||||
|
training with Ray.
|
||||||
|
|
||||||
|
</Project>
|
||||||
|
|
||||||
|
### How parallel training works {#parallel-training-details}
|
||||||
|
|
||||||
|
Each worker receives a shard of the **data** and builds a copy of the **model
|
||||||
|
and optimizer** from the [`config.cfg`](#config). It also has a communication
|
||||||
|
channel to **pass gradients and parameters** to the other workers. Additionally,
|
||||||
|
each worker is given ownership of a subset of the parameter arrays. Every
|
||||||
|
parameter array is owned by exactly one worker, and the workers are given a
|
||||||
|
mapping so they know which worker owns which parameter.
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
As training proceeds, every worker will be computing gradients for **all** of
|
||||||
|
the model parameters. When they compute gradients for parameters they don't own,
|
||||||
|
they'll **send them to the worker** that does own that parameter, along with a
|
||||||
|
version identifier so that the owner can decide whether to discard the gradient.
|
||||||
|
Workers use the gradients they receive and the ones they compute locally to
|
||||||
|
update the parameters they own, and then broadcast the updated array and a new
|
||||||
|
version ID to the other workers.
|
||||||
|
|
||||||
|
This training procedure is **asynchronous** and **non-blocking**. Workers always
|
||||||
|
push their gradient increments and parameter updates, they do not have to pull
|
||||||
|
them and block on the result, so the transfers can happen in the background,
|
||||||
|
overlapped with the actual training work. The workers also do not have to stop
|
||||||
|
and wait for each other ("synchronize") at the start of each batch. This is very
|
||||||
|
useful for spaCy, because spaCy is often trained on long documents, which means
|
||||||
|
**batches can vary in size** significantly. Uneven workloads make synchronous
|
||||||
|
gradient descent inefficient, because if one batch is slow, all of the other
|
||||||
|
workers are stuck waiting for it to complete before they can continue.
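
The ownership and versioning scheme described above can be sketched in a few lines of plain Python. This is a simplified, single-process illustration and not the actual `spacy-ray` implementation: the `Worker` class, `send_grad`, `demo`, the scalar parameters `W` and `b`, and the learning rate are all hypothetical names chosen for the example, and the background transfers are replaced by direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical stand-in for one training worker (not a spacy-ray class)."""

    rank: int
    owners: dict  # parameter name -> rank of the worker that owns it
    params: dict = field(default_factory=dict)    # local copy of every parameter
    versions: dict = field(default_factory=dict)  # version ID per parameter
    inbox: list = field(default_factory=list)     # gradients queued for owned params

    def receive_grad(self, name, version, grad):
        # The owner discards gradients computed against an outdated version.
        if version == self.versions[name]:
            self.inbox.append((name, grad))

    def apply_updates(self, workers, lr=0.1):
        # Sum the queued gradients for each owned parameter.
        totals = {}
        for name, grad in self.inbox:
            totals[name] = totals.get(name, 0.0) + grad
        self.inbox.clear()
        # Update each owned parameter, then broadcast value and new version ID.
        for name, total in totals.items():
            new_value = self.params[name] - lr * total
            new_version = self.versions[name] + 1
            for worker in workers:
                worker.params[name] = new_value
                worker.versions[name] = new_version


def send_grad(sender, workers, name, grad):
    # Push a gradient to whichever worker owns the parameter.
    owner = workers[sender.owners[name]]
    owner.receive_grad(name, sender.versions[name], grad)


def demo():
    owners = {"W": 0, "b": 1}  # every parameter is owned by exactly one worker
    workers = [
        Worker(rank=r, owners=owners,
               params={"W": 1.0, "b": 0.0}, versions={"W": 0, "b": 0})
        for r in range(2)
    ]
    # Every worker computes gradients for all parameters and pushes them.
    for worker in workers:
        send_grad(worker, workers, "W", 0.5)
        send_grad(worker, workers, "b", 1.0)
    # Owners apply what they received and broadcast the result.
    for worker in workers:
        worker.apply_updates(workers)
    return workers
```

Calling `demo()` returns two workers that agree on the updated values and version IDs. A gradient pushed afterwards with the old version ID is dropped by the owner instead of being applied.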
|
||||||
|
|
||||||
## Internal training API {#api}
|
## Internal training API {#api}
|
||||||
|
|
||||||
<Infobox variant="danger">
|
<Infobox variant="danger">
|
||||||
|
|
|
@ -130,7 +130,7 @@ write any **attributes, properties and methods** to the `Doc`, `Token` and
|
||||||
`Span`. You can add data, implement new features, integrate other libraries with
|
`Span`. You can add data, implement new features, integrate other libraries with
|
||||||
spaCy or plug in your own machine learning models.
|
spaCy or plug in your own machine learning models.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
<Infobox>
|
<Infobox>
|
||||||
|
|
||||||
|
|
|
@ -76,7 +76,7 @@ This project trains a span categorizer for Indonesian NER.
|
||||||
|
|
||||||
<Infobox title="Tip: Create data with Prodigy's new span annotation UI">
|
<Infobox title="Tip: Create data with Prodigy's new span annotation UI">
|
||||||
|
|
||||||
[](https://support.prodi.gy/t/3861)
|
[](https://support.prodi.gy/t/3861)
|
||||||
|
|
||||||
The upcoming version of our annotation tool [Prodigy](https://prodi.gy)
|
The upcoming version of our annotation tool [Prodigy](https://prodi.gy)
|
||||||
(currently available as a [pre-release](https://support.prodi.gy/t/3861) for all
|
(currently available as a [pre-release](https://support.prodi.gy/t/3861) for all
|
||||||
|
|
|
@ -86,7 +86,7 @@ transformer support interoperates with [PyTorch](https://pytorch.org) and the
|
||||||
[HuggingFace `transformers`](https://huggingface.co/transformers/) library,
|
[HuggingFace `transformers`](https://huggingface.co/transformers/) library,
|
||||||
giving you access to thousands of pretrained models for your pipelines.
|
giving you access to thousands of pretrained models for your pipelines.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
import Benchmarks from 'usage/_benchmarks-models.mdx'
|
import Benchmarks from 'usage/_benchmarks-models.mdx'
|
||||||
|
|
||||||
|
@ -158,7 +158,7 @@ your pipeline. Some settings can also be registered **functions** that you can
|
||||||
swap out and customize, making it easy to implement your own custom models and
|
swap out and customize, making it easy to implement your own custom models and
|
||||||
architectures.
|
architectures.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
<Infobox title="Details & Documentation" emoji="📖" list>
|
<Infobox title="Details & Documentation" emoji="📖" list>
|
||||||
|
|
||||||
|
@ -198,7 +198,7 @@ follow the same unified [`Model`](https://thinc.ai/docs/api-model) API and each
|
||||||
`Model` can also be used as a sublayer of a larger network, allowing you to
|
`Model` can also be used as a sublayer of a larger network, allowing you to
|
||||||
freely combine implementations from different frameworks into a single model.
|
freely combine implementations from different frameworks into a single model.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
<Infobox title="Details & Documentation" emoji="📖" list>
|
<Infobox title="Details & Documentation" emoji="📖" list>
|
||||||
|
|
||||||
|
@ -234,7 +234,7 @@ project template, adjust it to fit your needs, load in your data, train a
|
||||||
pipeline, export it as a Python package, upload your outputs to a remote storage
|
pipeline, export it as a Python package, upload your outputs to a remote storage
|
||||||
and share your results with your team.
|
and share your results with your team.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
spaCy projects also make it easy to **integrate with other tools** in the data
|
spaCy projects also make it easy to **integrate with other tools** in the data
|
||||||
science and machine learning ecosystem, including [DVC](/usage/projects#dvc) for
|
science and machine learning ecosystem, including [DVC](/usage/projects#dvc) for
|
||||||
|
@ -283,7 +283,7 @@ the [`ray`](/api/cli#ray) command to your spaCy CLI if it's installed in the
|
||||||
same environment. You can then run [`spacy ray train`](/api/cli#ray-train) for
|
same environment. You can then run [`spacy ray train`](/api/cli#ray-train) for
|
||||||
parallel training.
|
parallel training.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
<Infobox title="Details & Documentation" emoji="📖" list>
|
<Infobox title="Details & Documentation" emoji="📖" list>
|
||||||
|
|
||||||
|
@ -386,7 +386,7 @@ A pattern added to the dependency matcher consists of a **list of
|
||||||
dictionaries**, with each dictionary describing a **token to match** and its
|
dictionaries**, with each dictionary describing a **token to match** and its
|
||||||
**relation to an existing token** in the pattern.
|
**relation to an existing token** in the pattern.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
<Infobox title="Details & Documentation" emoji="📖" list>
|
<Infobox title="Details & Documentation" emoji="📖" list>
|
||||||
|
|
||||||
|
@ -494,7 +494,7 @@ format for documenting argument and return types.
|
||||||
|
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
[](/api)
|
[](/api)
|
||||||
|
|
||||||
</Grid>
|
</Grid>
|
||||||
|
|
||||||
|
|
|
@ -44,7 +44,7 @@ doc = nlp("This is a sentence.")
|
||||||
displacy.serve(doc, style="dep")
|
displacy.serve(doc, style="dep")
|
||||||
```
|
```
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
The argument `options` lets you specify a dictionary of settings to customize
|
The argument `options` lets you specify a dictionary of settings to customize
|
||||||
the layout, for example:
|
the layout, for example:
|
||||||
|
@ -77,7 +77,7 @@ For a list of all available options, see the
|
||||||
> displacy.serve(doc, style="dep", options=options)
|
> displacy.serve(doc, style="dep", options=options)
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
### Visualizing long texts {#dep-long-text new="2.0.12"}
|
### Visualizing long texts {#dep-long-text new="2.0.12"}
|
||||||
|
|
||||||
|
@ -267,7 +267,7 @@ rendering if auto-detection fails.
|
||||||
|
|
||||||
</Infobox>
|
</Infobox>
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
Internally, displaCy imports `display` and `HTML` from `IPython.core.display`
|
Internally, displaCy imports `display` and `HTML` from `IPython.core.display`
|
||||||
and returns a Jupyter HTML object. If you were doing it manually, it'd look like
|
and returns a Jupyter HTML object. If you were doing it manually, it'd look like
|
||||||
|
@ -455,6 +455,6 @@ Alternatively, if you're using [Streamlit](https://streamlit.io), check out the
|
||||||
helps you integrate spaCy visualizations into your apps. It includes a full
|
helps you integrate spaCy visualizations into your apps. It includes a full
|
||||||
embedded visualizer, as well as individual components.
|
embedded visualizer, as well as individual components.
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
</Grid>
|
</Grid>
|
||||||
|
|
|
@ -22,10 +22,10 @@ import QuickstartTraining from '../widgets/quickstart-training'
|
||||||
import Project from '../widgets/project'
|
import Project from '../widgets/project'
|
||||||
import Features from '../widgets/features'
|
import Features from '../widgets/features'
|
||||||
import Layout from '../components/layout'
|
import Layout from '../components/layout'
|
||||||
import courseImage from '../../docs/images/course.jpg'
|
import courseImage from '../../public/images/course.jpg'
|
||||||
import prodigyImage from '../../docs/images/prodigy_overview.jpg'
|
import prodigyImage from '../../public/images/prodigy_overview.jpg'
|
||||||
import projectsImage from '../../docs/images/projects.png'
|
import projectsImage from '../../public/images/projects.png'
|
||||||
import tailoredPipelinesImage from '../../docs/images/spacy-tailored-pipelines_wide.png'
|
import tailoredPipelinesImage from '../../public/images/spacy-tailored-pipelines_wide.png'
|
||||||
import { nightly, legacy } from '../../meta/dynamicMeta'
|
import { nightly, legacy } from '../../meta/dynamicMeta'
|
||||||
|
|
||||||
import Benchmarks from '../../docs/usage/_benchmarks-models.mdx'
|
import Benchmarks from '../../docs/usage/_benchmarks-models.mdx'
|
||||||
|
|