---
title: Examples
teaser: Full code examples you can modify and run
menu:
  - ['Information Extraction', 'information-extraction']
  - ['Pipeline', 'pipeline']
  - ['Training', 'training']
  - ['Vectors & Similarity', 'vectors']
  - ['Deep Learning', 'deep-learning']
---

## Information Extraction {#information-extraction hidden="true"}

### Using spaCy's phrase matcher {#phrase-matcher new="2"}

This example shows how to use the new [`PhraseMatcher`](/api/phrasematcher) to
efficiently find entities from a large terminology list.

```python
https://github.com/explosion/spacy/tree/v2.x/examples/information_extraction/phrase_matcher.py
```
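As a rough illustration of the idea, the sketch below matches a tiny, made-up terminology list against a text. It assumes spaCy v2.2.2 or later, where `PhraseMatcher.add` accepts a list of patterns; on older v2 releases the patterns are passed as `matcher.add("TERMINOLOGY", None, *patterns)` instead. The term list and the label `TERMINOLOGY` are placeholders, not taken from the linked script.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Hypothetical terminology list; a real one could hold many thousands of terms.
TERMS = ["Barack Obama", "Angela Merkel", "Washington, D.C."]

nlp = spacy.blank("en")  # tokenizer only, no trained model required
matcher = PhraseMatcher(nlp.vocab)

# make_doc only runs the tokenizer, which keeps pattern creation cheap.
patterns = [nlp.make_doc(term) for term in TERMS]
matcher.add("TERMINOLOGY", patterns)

doc = nlp("Angela Merkel met Barack Obama in Berlin.")
# Each match is a (match_id, start, end) triple of token offsets.
found = [doc[start:end].text for _match_id, start, end in matcher(doc)]
print(found)
```

Because the patterns are `Doc` objects rather than token-attribute dictionaries, the list can be built directly from raw strings.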

### Extracting entity relations {#entity-relations}

A simple example of extracting relations between phrases and entities using
spaCy's named entity recognizer and the dependency parse. Here, we extract money
and currency values (entities labelled as `MONEY`) and then check the dependency
tree to find the noun phrase they are referring to – for example:
`"$9.4 million"` → `"Net income"`.

```python
https://github.com/explosion/spacy/tree/v2.x/examples/information_extraction/entity_relations.py
```

### Navigating the parse tree and subtrees {#subtrees}

This example shows how to navigate the parse tree, including the subtrees
attached to a word.

```python
https://github.com/explosion/spacy/tree/v2.x/examples/information_extraction/parse_subtrees.py
```
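To make the notion of a subtree concrete without needing a trained parser, here is a library-free sketch. It assumes a parse is given as a list of head indices in which the root points at itself, mirroring how a root token's `head` is the token itself in spaCy. The sentence, its heads and the helper `subtree` are invented for illustration; in spaCy itself the same information is available through `Token.head`, `Token.children` and `Token.subtree`.

```python
def subtree(heads, index):
    """Return the indices of `index` and all its descendants, in sentence order."""
    children = {}
    for child, head in enumerate(heads):
        if child != head:  # the root is its own head
            children.setdefault(head, []).append(child)
    result, stack = [], [index]
    while stack:
        node = stack.pop()
        result.append(node)
        stack.extend(children.get(node, []))
    return sorted(result)


# Hand-written parse (an assumption, not parser output).
words = ["She", "bought", "a", "very", "old", "book"]
heads = [1, 1, 5, 4, 5, 1]

book_subtree = " ".join(words[i] for i in subtree(heads, 5))
old_subtree = " ".join(words[i] for i in subtree(heads, 4))
print(book_subtree)
print(old_subtree)
```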

## Pipeline {#pipeline hidden="true"}

### Custom pipeline components and attribute extensions {#custom-components-entities new="2"}

This example shows the implementation of a pipeline component that sets entity
annotations based on a list of single or multiple-word company names, merges
entities into one token and sets custom attributes on the `Doc`, `Span` and
`Token`.

```python
https://github.com/explosion/spacy/tree/v2.x/examples/pipeline/custom_component_entities.py
```
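The following is a minimal sketch of such a component, not the linked script itself. It assumes spaCy v2.2.2 or later (list-style `PhraseMatcher.add`, `Doc.retokenize`), and the company names, the function name `tech_companies` and the extension names are hypothetical. The component is called directly on a `Doc` here; in spaCy v2.x it would normally be registered with `nlp.add_pipe(tech_companies)`.

```python
import spacy
from spacy.matcher import PhraseMatcher
from spacy.tokens import Doc, Span, Token

# Hypothetical company names, used only for illustration.
COMPANIES = ["Globex Corporation", "Initech"]

nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab)
matcher.add("TECH_ORG", [nlp.make_doc(name) for name in COMPANIES])

# force=True lets the snippet be re-run without "already exists" errors.
Token.set_extension("is_tech_org", default=False, force=True)
Doc.set_extension(
    "has_tech_org",
    getter=lambda doc: any(token._.is_tech_org for token in doc),
    force=True,
)


def tech_companies(doc):
    """Label matched company names as ORG, flag their tokens and merge them."""
    spans = [Span(doc, start, end, label="ORG") for _id, start, end in matcher(doc)]
    for span in spans:
        for token in span:
            token._.is_tech_org = True
    # Extend the existing entities rather than replacing them.
    doc.ents = list(doc.ents) + spans
    # Merge after setting the entities so the match offsets stay valid.
    with doc.retokenize() as retokenizer:
        for span in spans:
            if len(span) > 1:  # single-token matches need no merging
                retokenizer.merge(span)
    return doc


doc = tech_companies(nlp.make_doc("Initech is negotiating with Globex Corporation."))
print([token.text for token in doc])
print([(ent.text, ent.label_) for ent in doc.ents])
print(doc._.has_tech_org)
```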

### Custom pipeline components and attribute extensions via a REST API {#custom-components-api new="2"}

This example shows the implementation of a pipeline component that fetches
country metadata via the [REST Countries API](https://restcountries.eu), sets
entity annotations for countries, merges entities into one token and sets custom
attributes on the `Doc`, `Span` and `Token` – for example, the capital,
latitude/longitude coordinates and the country flag.

```python
https://github.com/explosion/spacy/tree/v2.x/examples/pipeline/custom_component_countries_api.py
```

### Custom method extensions {#custom-components-attr-methods new="2"}

A collection of snippets showing examples of extensions adding custom methods to
the `Doc`, `Token` and `Span`.

```python
https://github.com/explosion/spacy/tree/v2.x/examples/pipeline/custom_attr_methods.py
```
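A method extension is registered with `set_extension(name, method=...)` and receives the object it is called on as its first argument. The two methods below, `shared_tokens` and `shout`, are hypothetical examples written for this sketch and are not the snippets from the linked file.

```python
import spacy
from spacy.tokens import Doc, Span

nlp = spacy.blank("en")


def shared_tokens(doc, other):
    """Return the tokens of `doc` whose text also occurs in `other`."""
    other_texts = {token.text for token in other}
    return [token for token in doc if token.text in other_texts]


def shout(span, mark="!"):
    """Return the span text in upper case, followed by `mark`."""
    return span.text.upper() + mark


# force=True lets the snippet be re-run without "already exists" errors.
Doc.set_extension("shared_tokens", method=shared_tokens, force=True)
Span.set_extension("shout", method=shout, force=True)

doc1 = nlp("The quick brown fox jumps.")
doc2 = nlp("A brown fox sleeps.")

overlap = [token.text for token in doc1._.shared_tokens(doc2)]
loud = doc1[1:3]._.shout(mark="?")
print(overlap)
print(loud)
```

Extra positional and keyword arguments, such as `mark` above, are passed through to the registered function.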

### Multi-processing with Joblib {#multi-processing}

This example shows how to use multiple cores to process text using spaCy and
[Joblib](https://joblib.readthedocs.io/en/latest/). We're exporting
part-of-speech-tagged, true-cased, (very roughly) sentence-separated text, with
each "sentence" on a newline, and spaces between tokens. The data comes from the
IMDB movie reviews dataset, which is loaded automatically via Thinc's built-in
dataset loader.

```python
https://github.com/explosion/spacy/tree/v2.x/examples/pipeline/multi_processing.py
```

## Training {#training hidden="true"}

### Training spaCy's Named Entity Recognizer {#training-ner}

This example shows how to update spaCy's entity recognizer with your own
examples, starting off with an existing, pretrained model, or from scratch using
a blank `Language` class.

```python
https://github.com/explosion/spacy/tree/v2.x/examples/training/train_ner.py
```
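Entity annotations for this kind of training are commonly written as character offsets into the text. Before training, it can help to check that every offset pair lines up with token boundaries, since misaligned spans cannot be mapped onto tokens. The sketch below does this with `Doc.char_span`, which returns `None` for a misaligned span. The example sentences and the helper `misaligned_entities` are assumptions made for illustration.

```python
import spacy

# Hypothetical examples in (text, {"entities": [(start, end, label)]}) form.
TRAIN_DATA = [
    ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
    ("I like London and Berlin.", {"entities": [(7, 13, "LOC"), (18, 24, "LOC")]}),
]

nlp = spacy.blank("en")  # only the tokenizer is needed for this check


def misaligned_entities(nlp, data):
    """Return (text, start, end, label) for offsets that split a token."""
    problems = []
    for text, annotations in data:
        doc = nlp.make_doc(text)
        for start, end, label in annotations["entities"]:
            # char_span returns None if the offsets do not match token boundaries.
            if doc.char_span(start, end, label=label) is None:
                problems.append((text, start, end, label))
    return problems


print(misaligned_entities(nlp, TRAIN_DATA))

# (7, 12) cuts "London" short, so it is reported as misaligned.
bad = [("I like London.", {"entities": [(7, 12, "LOC")]})]
print(misaligned_entities(nlp, bad))
```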

### Training an additional entity type {#new-entity-type}

This script shows how to add a new entity type to an existing pretrained NER
model. To keep the example short and simple, only four sentences are provided as
examples. In practice, you'll need many more — a few hundred would be a good
start.

```python
https://github.com/explosion/spacy/tree/v2.x/examples/training/train_new_entity_type.py
```

### Creating a Knowledge Base for Named Entity Linking {#kb}

This example shows how to create a knowledge base in spaCy, which is needed to
implement entity linking functionality. It requires as input a spaCy model with
pretrained word vectors, and it stores the KB to file (if an `output_dir` is
provided).

```python
https://github.com/explosion/spacy/tree/v2.x/examples/training/create_kb.py
```

### Training spaCy's Named Entity Linker {#nel}

This example shows how to train spaCy's entity linker with your own custom
examples, starting off with a predefined knowledge base and its vocab, and using
a blank `English` class.

```python
https://github.com/explosion/spacy/tree/v2.x/examples/training/train_entity_linker.py
```

### Training spaCy's Dependency Parser {#parser}

This example shows how to update spaCy's dependency parser, starting off with an
existing, pretrained model, or from scratch using a blank `Language` class.

```python
https://github.com/explosion/spacy/tree/v2.x/examples/training/train_parser.py
```

### Training spaCy's Part-of-speech Tagger {#tagger}

In this example, we're training spaCy's part-of-speech tagger with a custom tag
map that maps our own tags to the
[Universal Dependencies scheme](http://universaldependencies.github.io/docs/u/pos/index.html).

```python
https://github.com/explosion/spacy/tree/v2.x/examples/training/train_tagger.py
```
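To show what a tag map looks like, here is a small plain-Python sketch. It assumes the v2.x convention of a dictionary keyed by custom tag, where each value names the Universal POS tag under `"pos"`. The three-tag scheme, the sentences and the helper `to_universal` are made up for illustration.

```python
# Hypothetical custom tag set: N (noun), V (verb), J (adjective).
TAG_MAP = {"N": {"pos": "NOUN"}, "V": {"pos": "VERB"}, "J": {"pos": "ADJ"}}

# Training examples pair a text with one custom tag per token.
TRAIN_DATA = [
    ("I like green eggs", {"tags": ["N", "V", "J", "N"]}),
    ("Eat blue ham", {"tags": ["V", "J", "N"]}),
]


def to_universal(tags, tag_map):
    """Translate custom tags into Universal POS tags via the tag map."""
    return [tag_map[tag]["pos"] for tag in tags]


for text, annotations in TRAIN_DATA:
    words = text.split()  # whitespace split is enough for these toy sentences
    print(list(zip(words, to_universal(annotations["tags"], TAG_MAP))))
```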

### Training a custom parser for chat intent semantics {#intent-parser}

spaCy's parser component can be trained to predict any type of tree
structure over your input text. You can also predict trees over whole documents
or chat logs, with connections between the sentence-roots used to annotate
discourse structure. In this example, we'll build a message parser for a common
"chat intent": finding local businesses. Our message semantics will have the
following types of relations: `ROOT`, `PLACE`, `QUALITY`, `ATTRIBUTE`, `TIME`
and `LOCATION`.

```python
https://github.com/explosion/spacy/tree/v2.x/examples/training/train_intent_parser.py
```
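One way to picture these message semantics is as a list of head indices and relation labels, one per token. The plain-Python sketch below assumes that representation, with `"-"` marking tokens that carry no semantic relation. The annotated message and the helper `relations` are illustrative assumptions rather than the linked script's code.

```python
def relations(words, heads, deps, ignore="-"):
    """Return (word, relation, head word) triples, skipping unlabelled tokens."""
    return [
        (words[i], dep, words[head])
        for i, (head, dep) in enumerate(zip(heads, deps))
        if dep != ignore
    ]


# Hypothetical annotation: each token points at the index of its head.
words = ["find", "a", "cafe", "with", "great", "wifi"]
heads = [0, 2, 0, 5, 5, 2]
deps = ["ROOT", "-", "PLACE", "-", "QUALITY", "ATTRIBUTE"]

triples = relations(words, heads, deps)
for triple in triples:
    print(triple)
```

Read this way, "cafe" is the `PLACE` being searched for, "wifi" is an `ATTRIBUTE` of the cafe, and "great" is the `QUALITY` of that attribute.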

### Training spaCy's text classifier {#textcat new="2"}

This example shows how to train a multi-label convolutional neural network text
classifier on IMDB movie reviews, using spaCy's new
[`TextCategorizer`](/api/textcategorizer) component. The dataset will be loaded
automatically via Thinc's built-in dataset loader. Predictions are available via
[`Doc.cats`](/api/doc#attributes).

```python
https://github.com/explosion/spacy/tree/v2.x/examples/training/train_textcat.py
```

## Vectors {#vectors hidden="true"}

### Visualizing spaCy vectors in TensorBoard {#tensorboard}

This script lets you load any spaCy model containing word vectors into
[TensorBoard](https://projector.tensorflow.org/) to create an
[embedding visualization](https://github.com/tensorflow/tensorboard/blob/master/docs/tensorboard_projector_plugin.ipynb).

```python
https://github.com/explosion/spacy/tree/v2.x/examples/vectors_tensorboard.py
```

## Deep Learning {#deep-learning hidden="true"}

### Text classification with Keras {#keras}

This example shows how to use a [Keras](https://keras.io) LSTM sentiment
classification model in spaCy. spaCy splits the document into sentences, and
each sentence is classified using the LSTM. The scores for the sentences are
then aggregated to give the document score. This kind of hierarchical model is
quite difficult to build in "pure" Keras or TensorFlow, but it's very effective. The
Keras example on this dataset performs quite poorly, because it cuts off the
documents so that they're a fixed size. This hurts review accuracy a lot,
because people often summarize their rating in the final sentence.

```python
https://github.com/explosion/spacy/tree/v2.x/examples/deep_learning_keras.py
```
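The aggregation step and the cost of truncation can be sketched without Keras. The snippet assumes each sentence gets a probability of being positive and that the document score is the sum of those probabilities centred on 0.5; the linked script may aggregate differently. Truncation is modelled here by dropping trailing sentences, which simplifies cutting a document to a fixed number of tokens. The scores are invented.

```python
def document_score(sentence_scores):
    """Sum sentence probabilities centred on 0.5; above 0 leans positive."""
    return sum(score - 0.5 for score in sentence_scores)


def truncated_score(sentence_scores, max_sentences):
    """Score only the first `max_sentences` sentences (simplified truncation)."""
    return document_score(sentence_scores[:max_sentences])


# Hypothetical review: lukewarm body, strongly positive final sentence.
scores = [0.4, 0.45, 0.5, 0.95]

full = document_score(scores)
cut = truncated_score(scores, 3)
print(round(full, 2))
print(round(cut, 2))
```

With the final sentence included the review leans positive; without it, the score flips to negative.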
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								---
 								title: Examples
 								teaser: Full code examples you can modify and run
 								menu:
 								  - ['Information Extraction', 'information-extraction']
 								  - ['Pipeline', 'pipeline']
 								  - ['Training', 'training']
 								  - ['Vectors & Similarity', 'vectors']
 								  - ['Deep Learning', 'deep-learning']
 								---
 								## Information Extraction {#information-extraction hidden="true"}
 								### Using spaCy's phrase matcher {#phrase-matcher new="2"}
 								This example shows how to use the new [`PhraseMatcher`](/api/phrasematcher) to
 								efficiently find entities from a large terminology list.
 								```python
-												Fix code branch for v2.x site [ci skip]

											
										
										
											2021-02-01 03:48:35 +03:00
+								https://github.com/explosion/spacy/tree/v2.x/examples/information_extraction/phrase_matcher.py
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								```
 								### Extracting entity relations {#entity-relations}
 								A simple example of extracting relations between phrases and entities using
 								spaCy's named entity recognizer and the dependency parse. Here, we extract money
 								and currency values (entities labelled as `MONEY`) and then check the dependency
 								tree to find the noun phrase they are referring to – for example:
 								`"$9.4 million"` → `"Net income"`.
 								```python
-												Fix code branch for v2.x site [ci skip]

											
										
										
											2021-02-01 03:48:35 +03:00
+								https://github.com/explosion/spacy/tree/v2.x/examples/information_extraction/entity_relations.py
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								```
 								### Navigating the parse tree and subtrees {#subtrees}
 								This example shows how to navigate the parse tree including subtrees attached to
 								a word.
 								```python
-												Fix code branch for v2.x site [ci skip]

											
										
										
											2021-02-01 03:48:35 +03:00
+								https://github.com/explosion/spacy/tree/v2.x/examples/information_extraction/parse_subtrees.py
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								```
 								## Pipeline {#pipeline hidden="true"}
 								### Custom pipeline components and attribute extensions {#custom-components-entities new="2"}
 								This example shows the implementation of a pipeline component that sets entity
 								annotations based on a list of single or multiple-word company names, merges
 								entities into one token and sets custom attributes on the `Doc`, `Span` and
 								`Token`.
 								```python
-												Fix code branch for v2.x site [ci skip]

											
										
										
											2021-02-01 03:48:35 +03:00
+								https://github.com/explosion/spacy/tree/v2.x/examples/pipeline/custom_component_entities.py
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								```
 								### Custom pipeline components and attribute extensions via a REST API {#custom-components-api new="2"}
 								This example shows the implementation of a pipeline component that fetches
 								country meta data via the [REST Countries API](https://restcountries.eu) sets
 								entity annotations for countries, merges entities into one token and sets custom
 								attributes on the `Doc`, `Span` and `Token` – for example, the capital,
 								latitude/longitude coordinates and the country flag.
 								```python
-												Fix code branch for v2.x site [ci skip]

											
										
										
											2021-02-01 03:48:35 +03:00
+								https://github.com/explosion/spacy/tree/v2.x/examples/pipeline/custom_component_countries_api.py
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								```
 								### Custom method extensions {#custom-components-attr-methods new="2"}
 								A collection of snippets showing examples of extensions adding custom methods to
 								the `Doc`, `Token` and `Span`.
 								```python
-												Fix code branch for v2.x site [ci skip]

											
										
										
											2021-02-01 03:48:35 +03:00
+								https://github.com/explosion/spacy/tree/v2.x/examples/pipeline/custom_attr_methods.py
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								```
 								### Multi-processing with Joblib {#multi-processing}
 								This example shows how to use multiple cores to process text using spaCy and
 								[Joblib](https://joblib.readthedocs.io/en/latest/). We're exporting
 								part-of-speech-tagged, true-cased, (very roughly) sentence-separated text, with
 								each "sentence" on a newline, and spaces between tokens. Data is loaded from the
 								IMDB movie reviews dataset and will be loaded automatically via Thinc's built-in
 								dataset loader.
 								```python
-												Fix code branch for v2.x site [ci skip]

											
										
										
											2021-02-01 03:48:35 +03:00
+								https://github.com/explosion/spacy/tree/v2.x/examples/pipeline/multi_processing.py
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								```
 								## Training {#training hidden="true"}
 								### Training spaCy's Named Entity Recognizer {#training-ner}
 								This example shows how to update spaCy's entity recognizer with your own
-												Fix code branch for v2.x site [ci skip]

											
										
										
											2021-02-01 03:48:35 +03:00
+								examples, starting off with an existing, pretrained model, or from scratch using
 								a blank `Language` class.
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
 								```python
-												Fix code branch for v2.x site [ci skip]

											
										
										
											2021-02-01 03:48:35 +03:00
+								https://github.com/explosion/spacy/tree/v2.x/examples/training/train_ner.py
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
+								```
 								### Training an additional entity type {#new-entity-type}
-												Use consistent spelling

											
										
										
											2019-10-02 11:37:39 +03:00
+								This script shows how to add a new entity type to an existing pretrained NER
-												💫 Update website (#3285)

<!--- Provide a general summary of your changes in the title. -->

## Description

The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.

This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.


### Types of change
enhancement

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

											
										
										
											2019-02-17 21:31:19 +03:00
model. To keep the example short and simple, only four sentences are provided as
examples. In practice, you'll need many more: a few hundred would be a good
start.

```python
https://github.com/explosion/spacy/tree/v2.x/examples/training/train_new_entity_type.py
```
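
As a rough illustration of what such training examples can look like, the sketch
below assumes the spaCy v2 convention of pairing each text with character-offset
entity spans. The `ANIMAL` label, the sentences and the `check_offsets` helper
are made up for illustration and are not taken from the linked script. The
helper only confirms that each span slices out the intended text.

```python
# Hypothetical training examples in a (text, annotations) shape, where
# entities are (start_char, end_char, label) tuples.
LABEL = "ANIMAL"  # illustrative label name

TRAIN_DATA = [
    ("Horses are too tall", {"entities": [(0, 6, LABEL)]}),
    ("Do they bite?", {"entities": []}),
    ("I saw two horses", {"entities": [(10, 16, LABEL)]}),
]


def check_offsets(train_data):
    """Return (span text, label) for every annotation, failing on bad offsets."""
    spans = []
    for text, annotations in train_data:
        for start, end, label in annotations["entities"]:
            if not 0 <= start < end <= len(text):
                raise ValueError(f"Bad offsets {(start, end)} for {text!r}")
            spans.append((text[start:end], label))
    return spans


SPANS = check_offsets(TRAIN_DATA)
```

Running a check like this before training is cheap and can catch off-by-one
offsets early.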

### Creating a Knowledge Base for Named Entity Linking {#kb}

This example shows how to create a knowledge base in spaCy, which is needed to
implement entity linking functionality. It requires as input a spaCy model with
pretrained word vectors, and it stores the KB to file (if an `output_dir` is
provided).

```python
https://github.com/explosion/spacy/tree/v2.x/examples/training/create_kb.py
```
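
To make the idea of a knowledge base more concrete, here is a plain-Python
stand-in. It is a conceptual sketch only and does not use spaCy's own
`KnowledgeBase` API: the `ToyKB` class, its method names, the entity IDs, the
vectors and the prior probabilities are all hypothetical. It assumes a KB stores
one vector per entity and maps each alias to candidate entities with prior
probabilities.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKB:
    """Plain-Python stand-in for a knowledge base (not spaCy's KnowledgeBase)."""

    entities: dict = field(default_factory=dict)  # entity ID -> vector
    aliases: dict = field(default_factory=dict)  # alias -> [(entity ID, prior)]

    def add_entity(self, entity_id, vector):
        self.entities[entity_id] = list(vector)

    def add_alias(self, alias, candidates):
        # Every candidate must be a known entity, and the priors of one alias
        # may sum to at most 1.
        unknown = [eid for eid, _ in candidates if eid not in self.entities]
        if unknown:
            raise KeyError(f"Unknown entities: {unknown}")
        if sum(prior for _, prior in candidates) > 1.0 + 1e-9:
            raise ValueError("Prior probabilities must not sum to more than 1")
        self.aliases[alias] = list(candidates)

    def get_candidates(self, alias):
        # Most probable candidate first; unknown aliases yield no candidates.
        return sorted(self.aliases.get(alias, []), key=lambda c: -c[1])


kb = ToyKB()
kb.add_entity("ENT_PUBLISHER", [0.1, 0.9])  # hypothetical IDs and vectors
kb.add_entity("ENT_GOLFER", [0.8, 0.2])
kb.add_alias("Russ Cochran", [("ENT_GOLFER", 0.24), ("ENT_PUBLISHER", 0.7)])
```

Calling `kb.get_candidates("Russ Cochran")` returns the candidates ordered by
prior, which is the kind of lookup an entity linker would then disambiguate
using context.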

### Training spaCy's Named Entity Linker {#nel}

This example shows how to train spaCy's entity linker with your own custom
examples, starting off with a predefined knowledge base and its vocab, and using
a blank `English` class.

```python
https://github.com/explosion/spacy/tree/v2.x/examples/training/train_entity_linker.py
```
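
The custom examples for an entity linker need to tie a mention in the text to
the correct knowledge base entry. The sketch below assumes an annotation shape
in which each character span maps candidate entity IDs to `1.0` (correct) or
`0.0` (incorrect). The entity IDs and the `gold_links` helper are hypothetical
and only meant to show how such data could be sanity-checked.

```python
# Hypothetical entity-linking annotations: each (start_char, end_char) span
# maps candidate entity IDs to 1.0 (correct) or 0.0 (incorrect).
TRAIN_DOCS = [
    (
        "Russ Cochran his reprints include EC Comics.",
        {"links": {(0, 12): {"ENT_PUBLISHER": 1.0, "ENT_GOLFER": 0.0}}},
    ),
    (
        "Russ Cochran was a member of a golf team.",
        {"links": {(0, 12): {"ENT_PUBLISHER": 0.0, "ENT_GOLFER": 1.0}}},
    ),
]


def gold_links(train_docs):
    """Return (mention text, gold entity ID) pairs, one per annotated span."""
    pairs = []
    for text, annotations in train_docs:
        for (start, end), candidates in annotations["links"].items():
            gold = [eid for eid, score in candidates.items() if score == 1.0]
            if len(gold) != 1:
                raise ValueError(
                    f"Expected exactly one gold entity for {text[start:end]!r}"
                )
            pairs.append((text[start:end], gold[0]))
    return pairs


GOLD = gold_links(TRAIN_DOCS)
```

The same mention resolves to different entities depending on the sentence,
which is exactly the ambiguity the linker is trained to handle.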

### Training spaCy's Dependency Parser {#parser}

This example shows how to update spaCy's dependency parser, starting off with an
existing, pretrained model, or from scratch using a blank `Language` class.

```python
https://github.com/explosion/spacy/tree/v2.x/examples/training/train_parser.py
```
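
Parser updates rely on tree annotations, so it can help to check them before
training. The sketch below assumes an annotation shape with one head index and
one dependency label per token, where the root token points at itself. The
sentence, the labels and the `validate_tree` helper are illustrative assumptions
rather than part of the linked script.

```python
# Hypothetical parser annotation: "heads" holds the index of each token's head
# (the root points at itself) and "deps" holds the matching dependency labels.
TOKENS = ["They", "trade", "mortgage", "-", "backed", "securities", "."]
ANNOTATION = {
    "heads": [1, 1, 4, 4, 5, 1, 1],
    "deps": ["nsubj", "ROOT", "compound", "punct", "nmod", "dobj", "punct"],
}


def validate_tree(tokens, annotation):
    """Check that the annotation forms a single-rooted tree; return the root index."""
    heads, deps = annotation["heads"], annotation["deps"]
    if not len(tokens) == len(heads) == len(deps):
        raise ValueError("tokens, heads and deps must have the same length")
    if any(not 0 <= head < len(tokens) for head in heads):
        raise ValueError("head index out of range")
    roots = [i for i, head in enumerate(heads) if head == i]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    # Walking up from any token must reach the root without looping.
    for start in range(len(tokens)):
        seen, i = set(), start
        while heads[i] != i:
            if i in seen:
                raise ValueError(f"cycle detected starting at token {start}")
            seen.add(i)
            i = heads[i]
    return roots[0]


ROOT_INDEX = validate_tree(TOKENS, ANNOTATION)
```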

### Training spaCy's Part-of-speech Tagger {#tagger}

In this example, we're training spaCy's part-of-speech tagger with a custom tag
map, mapping our own tags to the
[Universal Dependencies scheme](http://universaldependencies.github.io/docs/u/pos/index.html).

```python
https://github.com/explosion/spacy/tree/v2.x/examples/training/train_tagger.py
```
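
The idea of a custom tag map can be sketched without spaCy. The snippet below
assumes a small made-up tag set (`N`, `V`, `J`) and maps each tag to a Universal
Dependencies part-of-speech value. The sentences, the whitespace tokenization
and the `to_universal` helper are illustrative assumptions.

```python
# Hypothetical custom tag set mapped to Universal Dependencies POS values.
TAG_MAP = {"N": {"pos": "NOUN"}, "V": {"pos": "VERB"}, "J": {"pos": "ADJ"}}

# One custom tag per whitespace-separated token.
TRAIN_DATA = [
    ("I like green eggs", {"tags": ["N", "V", "J", "N"]}),
    ("Eat blue ham", {"tags": ["V", "J", "N"]}),
]


def to_universal(train_data, tag_map):
    """Return (word, custom tag, universal POS) triples for each sentence."""
    result = []
    for text, annotations in train_data:
        words, tags = text.split(), annotations["tags"]
        if len(words) != len(tags):
            raise ValueError(f"{len(words)} words but {len(tags)} tags in {text!r}")
        # A KeyError here means a tag is missing from the tag map.
        result.append([(w, t, tag_map[t]["pos"]) for w, t in zip(words, tags)])
    return result


MAPPED = to_universal(TRAIN_DATA, TAG_MAP)
```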

### Training a custom parser for chat intent semantics {#intent-parser}

spaCy's parser component can be trained to predict any type of tree
structure over your input text. You can also predict trees over whole documents
or chat logs, with connections between the sentence-roots used to annotate
discourse structure. In this example, we'll build a message parser for a common
"chat intent": finding local businesses. Our message semantics will have the
following types of relations: `ROOT`, `PLACE`, `QUALITY`, `ATTRIBUTE`, `TIME`
and `LOCATION`.

```python
https://github.com/explosion/spacy/tree/v2.x/examples/training/train_intent_parser.py
```
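
To show how such relations can be read off a tree, here is a small sketch that
does not need spaCy. It assumes one head index and one relation label per token,
with `-` marking tokens that carry no relation of interest. The message and the
`extract_relations` helper are illustrative assumptions.

```python
# Hypothetical annotation: "-" marks tokens without a relation of interest,
# and the root token points at itself.
TOKENS = ["find", "a", "cafe", "with", "great", "wifi"]
HEADS = [0, 2, 0, 5, 5, 2]
DEPS = ["ROOT", "-", "PLACE", "-", "QUALITY", "ATTRIBUTE"]


def extract_relations(tokens, heads, deps):
    """Return (token, relation, head token) triples, skipping "-" labels."""
    return [
        (token, dep, tokens[head])
        for token, head, dep in zip(tokens, heads, deps)
        if dep != "-"
    ]


RELATIONS = extract_relations(TOKENS, HEADS, DEPS)
for relation in RELATIONS:
    print(relation)
```

Here `wifi` is an `ATTRIBUTE` of `cafe`, and `great` is a `QUALITY` of `wifi`,
so the message asks for a place with a particular attribute of a particular
quality.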

### Training spaCy's text classifier {#textcat new="2"}

This example shows how to train a multi-label convolutional neural network text
classifier on IMDB movie reviews, using spaCy's new
[`TextCategorizer`](/api/textcategorizer) component. The dataset will be loaded
automatically via Thinc's built-in dataset loader. Predictions are available via
[`Doc.cats`](/api/doc#attributes).

```python
https://github.com/explosion/spacy/tree/v2.x/examples/training/train_textcat.py
```
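
As a sketch of the data involved, the snippet below converts binary review
labels into per-category dictionaries and picks the highest-scoring category
from a prediction. The category names `POSITIVE` and `NEGATIVE`, the example
scores and both helper functions are assumptions made for illustration. The
prediction dictionary simply mimics the label-to-score shape of `Doc.cats`.

```python
def labels_to_cats(labels):
    """Turn binary labels (1 = positive review) into per-category booleans."""
    return [{"POSITIVE": bool(y), "NEGATIVE": not bool(y)} for y in labels]


def top_category(cats):
    """Return the category with the highest score."""
    return max(cats, key=cats.get)


GOLD = labels_to_cats([1, 0])

# Hypothetical scores in the label -> score shape of a prediction.
PREDICTED = {"POSITIVE": 0.83, "NEGATIVE": 0.17}
BEST = top_category(PREDICTED)
```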

## Vectors {#vectors hidden="true"}

### Visualizing spaCy vectors in TensorBoard {#tensorboard}

This script lets you load any spaCy model containing word vectors into
[TensorBoard](https://projector.tensorflow.org/) to create an
[embedding visualization](https://github.com/tensorflow/tensorboard/blob/master/docs/tensorboard_projector_plugin.ipynb).

```python
https://github.com/explosion/spacy/tree/v2.x/examples/vectors_tensorboard.py
```
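
For small experiments, the standalone projector also accepts uploaded
tab-separated files: one file with a vector per row and one metadata file with
the matching labels. The sketch below builds both as strings from a toy
dictionary of vectors. It is not necessarily how the linked script works, and
the `to_projector_tsv` helper and the toy vectors are hypothetical.

```python
def to_projector_tsv(vectors):
    """Build (vectors_tsv, metadata_tsv) strings from a word -> vector mapping."""
    words = sorted(vectors)
    if len({len(vectors[word]) for word in words}) > 1:
        raise ValueError("All vectors must have the same number of dimensions")
    # One tab-separated vector per row, in the same order as the labels.
    vector_rows = ["\t".join(str(value) for value in vectors[word]) for word in words]
    # A single-column metadata file lists one label per row without a header.
    return "\n".join(vector_rows) + "\n", "\n".join(words) + "\n"


# Toy vectors standing in for real word vectors.
TOY_VECTORS = {"cat": [0.1, 0.2, 0.3], "dog": [0.2, 0.1, 0.4]}
VECTORS_TSV, METADATA_TSV = to_projector_tsv(TOY_VECTORS)
```

Writing the two strings to files such as `vectors.tsv` and `metadata.tsv`
(names chosen here for illustration) gives a pair that can be uploaded together.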

## Deep Learning {#deep-learning hidden="true"}

### Text classification with Keras {#keras}

This example shows how to use a [Keras](https://keras.io) LSTM sentiment
classification model in spaCy. spaCy splits the document into sentences, and
each sentence is classified using the LSTM. The scores for the sentences are
then aggregated to give the document score. This kind of hierarchical model is
quite difficult to build in "pure" Keras or TensorFlow, but it's very effective.
The Keras example on this dataset performs quite poorly, because it cuts off the
documents so that they're a fixed size. This hurts review accuracy a lot,
because people often summarize their rating in the final sentence.

```python
https://github.com/explosion/spacy/tree/v2.x/examples/deep_learning_keras.py
```
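
The aggregation step can be sketched in plain Python. The snippet below uses a
toy keyword scorer as a stand-in for the LSTM and averages the sentence scores
into a document score. Both the scorer and the choice of a simple mean are
assumptions for illustration, and the linked script may aggregate differently.

```python
def toy_sentence_score(sentence):
    """Stand-in for the LSTM: 1.0 = positive, 0.0 = negative, 0.5 = neutral."""
    words = sentence.lower().split()
    if "loved" in words:
        return 1.0
    if "slow" in words:
        return 0.0
    return 0.5


def document_score(sentences, score_sentence):
    """Aggregate per-sentence scores into one document score (simple mean)."""
    if not sentences:
        raise ValueError("Need at least one sentence")
    scores = [score_sentence(sentence) for sentence in sentences]
    return sum(scores) / len(scores)


REVIEW = ["The plot was slow", "The acting was fine", "Overall I loved it"]

FULL_SCORE = document_score(REVIEW, toy_sentence_score)
# Cutting the review off after the first sentence loses the final verdict.
TRUNCATED_SCORE = document_score(REVIEW[:1], toy_sentence_score)
```

Comparing `FULL_SCORE` with `TRUNCATED_SCORE` shows why truncating a review
can hide the rating that reviewers often leave for the last sentence.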