---
title: Examples
teaser: Full code examples you can modify and run
menu:
- ['Information Extraction', 'information-extraction']
- ['Pipeline', 'pipeline']
- ['Training', 'training']
- ['Vectors & Similarity', 'vectors']
- ['Deep Learning', 'deep-learning']
---

## Information Extraction {#information-extraction hidden="true"}

### Using spaCy's phrase matcher {#phrase-matcher new="2"}

This example shows how to use the new [`PhraseMatcher`](/api/phrasematcher) to
efficiently find entities from a large terminology list.

```python
https://github.com/explosion/spaCy/tree/master/examples/information_extraction/phrase_matcher.py
```

### Extracting entity relations {#entity-relations}

A simple example of extracting relations between phrases and entities using
spaCy's named entity recognizer and the dependency parse. Here, we extract money
and currency values (entities labelled as `MONEY`) and then check the dependency
tree to find the noun phrase they are referring to – for example:
`"$9.4 million"` → `"Net income"`.

```python
https://github.com/explosion/spaCy/tree/master/examples/information_extraction/entity_relations.py
```

### Navigating the parse tree and subtrees {#subtrees}

This example shows how to navigate the parse tree, including the subtrees
attached to a word.

```python
https://github.com/explosion/spaCy/tree/master/examples/information_extraction/parse_subtrees.py
```
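
To make the idea of a subtree concrete, here is a minimal, library-independent
sketch. It assumes a parse stored as a plain list of head indices (the root
points to itself); the sentence, the `heads` list and the helper names are
hypothetical. In spaCy itself, the same information is exposed through
`Token.head`, `Token.children` and `Token.subtree`.

```python
from typing import List


def children(heads: List[int], i: int) -> List[int]:
    """Return the indices of tokens whose head is token i."""
    return [j for j, head in enumerate(heads) if head == i and j != i]


def subtree(heads: List[int], i: int) -> List[int]:
    """Return token i plus all of its descendants, in document order."""
    found = [i]
    stack = [i]
    while stack:
        current = stack.pop()
        for child in children(heads, current):
            found.append(child)
            stack.append(child)
    return sorted(found)


# Hypothetical parse: heads[i] is the index of the head of token i.
words = ["She", "ate", "the", "green", "apple"]
heads = [1, 1, 4, 4, 1]

apple = words.index("apple")
print([words[j] for j in subtree(heads, apple)])
```

Calling `subtree` on the root index returns every token in the sentence, which
is a quick sanity check for a hand-written `heads` list.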

## Pipeline {#pipeline hidden="true"}

### Custom pipeline components and attribute extensions {#custom-components-entities new="2"}

This example shows the implementation of a pipeline component that sets entity
annotations based on a list of single or multiple-word company names, merges
entities into one token and sets custom attributes on the `Doc`, `Span` and
`Token`.

```python
https://github.com/explosion/spaCy/tree/master/examples/pipeline/custom_component_entities.py
```

### Custom pipeline components and attribute extensions via a REST API {#custom-components-api new="2"}

This example shows the implementation of a pipeline component that fetches
country metadata via the [REST Countries API](https://restcountries.eu), sets
entity annotations for countries, merges entities into one token and sets custom
attributes on the `Doc`, `Span` and `Token` – for example, the capital,
latitude/longitude coordinates and the country flag.

```python
https://github.com/explosion/spaCy/tree/master/examples/pipeline/custom_component_countries_api.py
```

### Custom method extensions {#custom-components-attr-methods new="2"}

A collection of snippets showing examples of extensions adding custom methods to
the `Doc`, `Token` and `Span`.

```python
https://github.com/explosion/spaCy/tree/master/examples/pipeline/custom_attr_methods.py
```

### Multi-processing with Joblib {#multi-processing}

This example shows how to use multiple cores to process text using spaCy and
[Joblib](https://joblib.readthedocs.io/en/latest/). We're exporting
part-of-speech-tagged, true-cased, (very roughly) sentence-separated text, with
each "sentence" on a newline, and spaces between tokens. The data comes from the
IMDB movie reviews dataset and is loaded automatically via Thinc's built-in
dataset loader.

```python
https://github.com/explosion/spaCy/tree/master/examples/pipeline/multi_processing.py
```
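
The batching pattern behind this kind of script can be sketched without spaCy
or Joblib. The block below is an illustration under stated assumptions, not the
linked script: it uses the standard library's `ThreadPoolExecutor` as a
stand-in for Joblib's workers, so it shows the structure (partition, fan out,
collect in order) rather than any multi-core speed-up. The helper names, the
pre-tagged input and the `text|TAG` output format are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def format_sentence(tagged_tokens):
    """Join (text, tag) pairs as "text|TAG", separated by spaces."""
    return " ".join(f"{text}|{tag}" for text, tag in tagged_tokens)


def process_batch(batch):
    """Format every sentence in one batch; each result is one output line."""
    return [format_sentence(sentence) for sentence in batch]


def run(sentences, batch_size, n_workers):
    """Process the batches in a worker pool and return the lines in order."""
    batches = partition(sentences, batch_size)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map yields results in the order the batches were submitted.
        results = list(pool.map(process_batch, batches))
    return [line for lines in results for line in lines]


# Hypothetical, already-tagged input standing in for spaCy's output.
sentences = [
    [("I", "PRP"), ("like", "VBP"), ("films", "NNS")],
    [("Great", "JJ"), ("cast", "NN")],
    [("Loved", "VBD"), ("it", "PRP")],
]

for line in run(sentences, batch_size=2, n_workers=2):
    print(line)
```

Keeping the per-batch function free of shared state is what makes it easy to
swap the thread pool for process-based workers.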

## Training {#training hidden="true"}

### Training spaCy's Named Entity Recognizer {#training-ner}

This example shows how to update spaCy's entity recognizer with your own
examples, starting off with an existing, pretrained model, or from scratch
using a blank `Language` class.

```python
https://github.com/explosion/spaCy/tree/master/examples/training/train_ner.py
```
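
Before training, it can help to check that your own examples are well-formed.
The sketch below assumes the `(text, {"entities": [(start, end, label)]})`
layout with character offsets that spaCy v2-style training scripts commonly
use; the example sentences and the helper names are illustrative, not taken
from the linked script.

```python
# Illustrative examples: each entity is (start_char, end_char, label).
TRAIN_DATA = [
    ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
    ("I like London and Berlin.", {"entities": [(7, 13, "LOC"), (18, 24, "LOC")]}),
]


def entity_texts(text, annotation):
    """Return the (span text, label) pairs described by the annotation."""
    return [(text[start:end], label) for start, end, label in annotation["entities"]]


def check_offsets(text, annotation):
    """Raise ValueError for spans that are out of range or padded with whitespace."""
    for start, end, label in annotation["entities"]:
        if not 0 <= start < end <= len(text):
            raise ValueError(f"span ({start}, {end}) is outside the text")
        span = text[start:end]
        if span != span.strip():
            raise ValueError(f"span {span!r} has leading or trailing whitespace")


for text, annotation in TRAIN_DATA:
    check_offsets(text, annotation)
    print(entity_texts(text, annotation))
```

Off-by-one offsets are a common source of silently bad training data, so a
check like this is cheap insurance.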

### Training an additional entity type {#new-entity-type}

This script shows how to add a new entity type to an existing pretrained NER
model. To keep the example short and simple, only four sentences are provided as
examples. In practice, you'll need many more: a few hundred would be a good
start.

```python
https://github.com/explosion/spaCy/tree/master/examples/training/train_new_entity_type.py
```

### Creating a Knowledge Base for Named Entity Linking {#kb}

This example shows how to create a knowledge base in spaCy, which is needed to
implement entity linking functionality. It takes a spaCy model with pretrained
word vectors as input and saves the KB to disk if an `output_dir` is provided.

```python
https://github.com/explosion/spaCy/tree/master/examples/training/create_kb.py
```

### Training spaCy's Named Entity Linker {#nel}

This example shows how to train spaCy's entity linker with your own custom
examples, starting off with a predefined knowledge base and its vocab,
and using a blank `English` class.

```python
https://github.com/explosion/spaCy/tree/master/examples/training/train_entity_linker.py
```

### Training spaCy's Dependency Parser {#parser}

This example shows how to update spaCy's dependency parser, starting off with an
existing, pretrained model, or from scratch using a blank `Language` class.

```python
https://github.com/explosion/spaCy/tree/master/examples/training/train_parser.py
```

### Training spaCy's Part-of-speech Tagger {#tagger}

In this example, we're training spaCy's part-of-speech tagger with a custom tag
map, mapping our own tags to the
[Universal Dependencies scheme](http://universaldependencies.github.io/docs/u/pos/index.html).

```python
https://github.com/explosion/spaCy/tree/master/examples/training/train_tagger.py
```
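
A custom tag map is just a mapping from your own tag names to Universal
Dependencies part-of-speech tags. The sketch below shows one possible shape for
it, together with matching training examples; the three tags, the sentences and
the helper names are assumptions made for illustration and may differ from the
linked script.

```python
# Hypothetical custom tags mapped to Universal Dependencies POS tags.
TAG_MAP = {"N": {"pos": "NOUN"}, "V": {"pos": "VERB"}, "J": {"pos": "ADJ"}}

# One custom tag per whitespace-separated token.
TRAIN_DATA = [
    ("I like green eggs", {"tags": ["N", "V", "J", "N"]}),
    ("Eat blue ham", {"tags": ["V", "J", "N"]}),
]


def to_universal(tags):
    """Translate custom tags into their Universal Dependencies POS tags."""
    return [TAG_MAP[tag]["pos"] for tag in tags]


def check_example(text, annotation):
    """Raise ValueError if tags and tokens do not line up or a tag is unmapped."""
    words = text.split()
    tags = annotation["tags"]
    if len(words) != len(tags):
        raise ValueError(f"{len(words)} tokens but {len(tags)} tags in {text!r}")
    unknown = sorted(set(tags) - set(TAG_MAP))
    if unknown:
        raise ValueError(f"tags missing from TAG_MAP: {unknown}")


for text, annotation in TRAIN_DATA:
    check_example(text, annotation)
    print(list(zip(text.split(), to_universal(annotation["tags"]))))
```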

### Training a custom parser for chat intent semantics {#intent-parser}

spaCy's parser component can be trained to predict any type of tree
structure over your input text. You can also predict trees over whole documents
or chat logs, with connections between the sentence-roots used to annotate
discourse structure. In this example, we'll build a message parser for a common
"chat intent": finding local businesses. Our message semantics will have the
following types of relations: `ROOT`, `PLACE`, `QUALITY`, `ATTRIBUTE`, `TIME`
and `LOCATION`.

```python
https://github.com/explosion/spaCy/tree/master/examples/training/train_intent_parser.py
```
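
To see what such a semantic tree looks like as data, here is a small sketch
that stores one message as parallel lists of head indices and relation labels
and reads the relations back out. The message, the annotation and the `"-"`
placeholder for tokens without a semantic role are assumptions for
illustration; only the relation names come from the list above.

```python
def relations(text, heads, deps, ignore=("-", "ROOT")):
    """Return (child, label, head) triples for every labelled relation."""
    words = text.split()
    if not (len(words) == len(heads) == len(deps)):
        raise ValueError("expected one head and one label per token")
    return [
        (words[i], deps[i], words[heads[i]])
        for i in range(len(words))
        if deps[i] not in ignore
    ]


# Hypothetical annotation: heads[i] is the index of the head of token i,
# and the root token points to itself.
text = "find a cafe with great wifi"
heads = [0, 2, 0, 5, 5, 2]
deps = ["ROOT", "-", "PLACE", "-", "QUALITY", "ATTRIBUTE"]

for child, label, head in relations(text, heads, deps):
    print(f"{child} --{label}--> {head}")
```

Read this way, the tree says that the user wants a `PLACE` ("cafe") whose
`ATTRIBUTE` ("wifi") has the `QUALITY` "great".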

### Training spaCy's text classifier {#textcat new="2"}

This example shows how to train a multi-label convolutional neural network text
classifier on IMDB movie reviews, using spaCy's new
[`TextCategorizer`](/api/textcategorizer) component. The dataset will be loaded
automatically via Thinc's built-in dataset loader. Predictions are available via
[`Doc.cats`](/api/doc#attributes).

```python
https://github.com/explosion/spaCy/tree/master/examples/training/train_textcat.py
```
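
`Doc.cats` is a dictionary that maps each label to a score. Because the
classifier is multi-label, the scores are independent of each other, so any
number of labels (including none) can pass a cut-off. The sketch below works on
a plain dictionary of that shape; the scores, the label names and the 0.5
threshold are hypothetical.

```python
def predicted_labels(cats, threshold=0.5):
    """Return the labels whose score reaches the threshold, sorted by name."""
    return sorted(label for label, score in cats.items() if score >= threshold)


# Hypothetical scores in the shape of Doc.cats.
cats = {"POSITIVE": 0.91, "NEGATIVE": 0.12}

print(predicted_labels(cats))
print(predicted_labels(cats, threshold=0.95))
```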

## Vectors {#vectors hidden="true"}

### Visualizing spaCy vectors in TensorBoard {#tensorboard}

This script lets you load any spaCy model containing word vectors into
[TensorBoard](https://projector.tensorflow.org/) to create an
[embedding visualization](https://github.com/tensorflow/tensorboard/blob/master/docs/tensorboard_projector_plugin.ipynb).

```python
https://github.com/explosion/spaCy/tree/master/examples/vectors_tensorboard.py
```

## Deep Learning {#deep-learning hidden="true"}

### Text classification with Keras {#keras}

This example shows how to use a [Keras](https://keras.io) LSTM sentiment
classification model in spaCy. spaCy splits the document into sentences, and
each sentence is classified using the LSTM. The scores for the sentences are
then aggregated to give the document score. This kind of hierarchical model is
quite difficult to build in "pure" Keras or TensorFlow, but it's very effective.
The Keras example on this dataset performs quite poorly, because it cuts off the
documents so that they're a fixed size. This hurts review accuracy a lot,
because people often summarize their rating in the final sentence.

```python
https://github.com/explosion/spaCy/tree/master/examples/deep_learning_keras.py
```
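
The effect of truncation described above can be shown with a toy version of the
hierarchical approach. This sketch is not the linked script: a tiny hand-made
lexicon stands in for the LSTM, a regular expression stands in for spaCy's
sentence segmentation, and averaging is assumed as the aggregation step. All
names and the review text are hypothetical.

```python
import re
from statistics import mean

# Hypothetical stand-in for the LSTM: word-level sentiment scores.
LEXICON = {"great": 1.0, "loved": 1.0, "boring": -1.0, "awful": -1.0}


def split_sentences(text):
    """Very rough sentence splitter: break after ., ! or ? plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence):
    """Average the lexicon scores of the known words; 0.0 if none are known."""
    words = re.findall(r"[a-z']+", sentence.lower())
    hits = [LEXICON[word] for word in words if word in LEXICON]
    return mean(hits) if hits else 0.0


def score_document(text):
    """Score each sentence separately, then average the sentence scores."""
    scores = [score_sentence(sentence) for sentence in split_sentences(text)]
    return mean(scores) if scores else 0.0


def score_truncated(text, max_chars):
    """Score only the first max_chars characters, as a fixed-size cut would."""
    return score_document(text[:max_chars])


review = "The first act was boring. The plot twists were great. In the end I loved it!"

print(score_document(review))       # uses all three sentences
print(score_truncated(review, 25))  # sees only the first sentence
```

With the full review the positive final sentences outweigh the negative
opening, while the truncated version only ever sees the complaint.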