mirror of
				https://github.com/explosion/spaCy.git
				synced 2025-11-04 01:48:04 +03:00 
			
		
		
		
	* simplify creation of KB by skipping dim reduction * small fixes to train EL example script * add KB creation and NEL training example scripts to example section * update descriptions of example scripts in the documentation * moving wiki_entity_linking folder from bin to projects * remove test for wiki NEL functionality that is being moved
		
			
				
	
	
		
		
	
	
	
	
	
			
		
		
	
	
		
	
	
	
	
	
---
title: Examples
teaser: Full code examples you can modify and run
menu:
  - ['Information Extraction', 'information-extraction']
  - ['Pipeline', 'pipeline']
  - ['Training', 'training']
  - ['Vectors & Similarity', 'vectors']
  - ['Deep Learning', 'deep-learning']
---

## Information Extraction {#information-extraction hidden="true"}

### Using spaCy's phrase matcher {#phrase-matcher new="2"}

This example shows how to use the new [`PhraseMatcher`](/api/phrasematcher) to
efficiently find entities from a large terminology list.

```python
https://github.com/explosion/spaCy/tree/master/examples/information_extraction/phrase_matcher.py
```

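
As a rough sketch of the idea (not the linked script itself), the snippet below
builds a `PhraseMatcher` from a small terminology list and runs it over a text.
It assumes spaCy v2.2.2 or later, where `matcher.add` accepts a list of
patterns. The terms, the `PERSON_TERMS` key and the sample sentence are made up
for illustration.

```python
import spacy
from spacy.matcher import PhraseMatcher

# A blank English pipeline is enough: phrase matching only needs the tokenizer.
nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab)

# Hypothetical terminology list; a real one could hold many thousands of terms.
terms = ["Barack Obama", "Angela Merkel"]
# nlp.make_doc only tokenizes, which keeps pattern creation cheap.
patterns = [nlp.make_doc(term) for term in terms]
matcher.add("PERSON_TERMS", patterns)

doc = nlp("Barack Obama met Angela Merkel in Berlin.")
# Each match is a (match_id, start_token, end_token) triple.
matches = sorted(matcher(doc), key=lambda match: match[1])
found = [doc[start:end].text for _, start, end in matches]
print(found)
```

Creating the patterns with `nlp.make_doc` instead of `nlp` avoids running any
other pipeline components, which matters when the terminology list is large.
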
### Extracting entity relations {#entity-relations}

A simple example of extracting relations between phrases and entities using
spaCy's named entity recognizer and the dependency parse. Here, we extract money
and currency values (entities labelled as `MONEY`) and then check the dependency
tree to find the noun phrase they are referring to – for example:
`"$9.4 million"` → `"Net income"`.

```python
https://github.com/explosion/spaCy/tree/master/examples/information_extraction/entity_relations.py
```

### Navigating the parse tree and subtrees {#subtrees}

This example shows how to navigate the parse tree including subtrees attached to
a word.

```python
https://github.com/explosion/spaCy/tree/master/examples/information_extraction/parse_subtrees.py
```

## Pipeline {#pipeline hidden="true"}

### Custom pipeline components and attribute extensions {#custom-components-entities new="2"}

This example shows the implementation of a pipeline component that sets entity
annotations based on a list of single or multiple-word company names, merges
entities into one token and sets custom attributes on the `Doc`, `Span` and
`Token`.

```python
https://github.com/explosion/spaCy/tree/master/examples/pipeline/custom_component_entities.py
```

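
The following is a minimal sketch of such a component, not the linked script.
The class name `CompanyRecognizer`, the extension names `is_company` and
`has_company`, and the company names are all hypothetical. It assumes spaCy
v2.2.2 or later and calls the component directly on a `Doc`, because the way
components are registered with `nlp.add_pipe` differs between v2 and v3.

```python
import spacy
from spacy.matcher import PhraseMatcher
from spacy.tokens import Doc, Span, Token
from spacy.util import filter_spans


def has_company(tokens):
    """Getter shared by Doc and Span: is any token flagged as a company?"""
    return any(token._.is_company for token in tokens)


class CompanyRecognizer:
    """Hypothetical component that labels and merges known company names."""

    def __init__(self, nlp, companies, label="ORG"):
        self.label = label
        self.matcher = PhraseMatcher(nlp.vocab)
        self.matcher.add("COMPANIES", [nlp.make_doc(name) for name in companies])
        # force=True allows re-running the snippet in the same session.
        Token.set_extension("is_company", default=False, force=True)
        Doc.set_extension("has_company", getter=has_company, force=True)
        Span.set_extension("has_company", getter=has_company, force=True)

    def __call__(self, doc):
        # Drop overlapping matches, since overlapping spans cannot be merged.
        spans = filter_spans([doc[s:e] for _, s, e in self.matcher(doc)])
        # Remember character offsets, because token indices change on merge.
        offsets = [(span.start_char, span.end_char) for span in spans]
        with doc.retokenize() as retokenizer:
            for span in spans:
                if len(span) > 1:
                    retokenizer.merge(span)
        ents = []
        for start_char, end_char in offsets:
            ent = doc.char_span(start_char, end_char, label=self.label)
            ent[0]._.is_company = True  # each name is now a single token
            ents.append(ent)
        doc.ents = list(doc.ents) + ents
        return doc


nlp = spacy.blank("en")
component = CompanyRecognizer(nlp, ["Acme Corp", "Globex"])
doc = component(nlp.make_doc("Acme Corp hired two engineers from Globex."))
companies = [(ent.text, ent.label_) for ent in doc.ents]
print(companies, doc._.has_company, len(doc))
```
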
### Custom pipeline components and attribute extensions via a REST API {#custom-components-api new="2"}

This example shows the implementation of a pipeline component that fetches
country metadata via the [REST Countries API](https://restcountries.eu), sets
entity annotations for countries, merges entities into one token and sets custom
attributes on the `Doc`, `Span` and `Token` – for example, the capital,
latitude/longitude coordinates and the country flag.

```python
https://github.com/explosion/spaCy/tree/master/examples/pipeline/custom_component_countries_api.py
```

### Custom method extensions {#custom-components-attr-methods new="2"}

A collection of snippets showing examples of extensions adding custom methods to
the `Doc`, `Token` and `Span`.

```python
https://github.com/explosion/spaCy/tree/master/examples/pipeline/custom_attr_methods.py
```

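
To illustrate what a method extension looks like, here is a small sketch that
is independent of the linked snippets. The extension names `shared_words` and
`count_char` and the sample sentences are invented for this example. Only
`set_extension(..., method=...)` is spaCy's API.

```python
import spacy
from spacy.tokens import Doc, Span, Token


def shared_words(obj, other):
    """Return the lowercased alphabetic words that obj and other share."""
    own = {token.lower_ for token in obj if token.is_alpha}
    theirs = {token.lower_ for token in other if token.is_alpha}
    return sorted(own & theirs)


def count_char(token, char):
    """Count how often char occurs in the token text."""
    return token.text.count(char)


# Method extensions receive the object itself as the first argument.
# force=True allows re-running the snippet in the same session.
Doc.set_extension("shared_words", method=shared_words, force=True)
Span.set_extension("shared_words", method=shared_words, force=True)
Token.set_extension("count_char", method=count_char, force=True)

nlp = spacy.blank("en")
doc1 = nlp("Peach emoji is where it has always been.")
doc2 = nlp("Peach is the superior emoji.")

doc_overlap = doc1._.shared_words(doc2)
span_overlap = doc1[0:2]._.shared_words(doc2)
e_count = doc1[0]._.count_char("e")
print(doc_overlap, span_overlap, e_count)
```

Custom methods are called through the `._` namespace, which keeps them clearly
separate from spaCy's built-in attributes.
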
### Multi-processing with Joblib {#multi-processing}

This example shows how to use multiple cores to process text using spaCy and
[Joblib](https://joblib.readthedocs.io/en/latest/). We're exporting
part-of-speech-tagged, true-cased, (very roughly) sentence-separated text, with
each "sentence" on a newline, and spaces between tokens. The data comes from the
IMDB movie reviews dataset, which is loaded automatically via Thinc's built-in
dataset loader.

```python
https://github.com/explosion/spaCy/tree/master/examples/pipeline/multi_processing.py
```

## Training {#training hidden="true"}

### Training spaCy's Named Entity Recognizer {#training-ner}

This example shows how to update spaCy's entity recognizer with your own
examples, starting off with an existing, pretrained model, or from scratch
using a blank `Language` class.

```python
https://github.com/explosion/spaCy/tree/master/examples/training/train_ner.py
```

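
Before training, it can help to check that your own examples line up with
spaCy's tokenization. The sketch below assumes training data given as
`(text, {"entities": [(start, end, label)]})` tuples with character offsets.
The helper `check_alignment` is hypothetical and not part of the linked script.

```python
import spacy

# Entity annotations as character offsets: (start, end, label).
TRAIN_DATA = [
    ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
    ("I like London and Berlin.", {"entities": [(7, 13, "LOC"), (18, 24, "LOC")]}),
]

nlp = spacy.blank("en")


def check_alignment(text, annotations):
    """Return (entity text, label, aligned) for every annotated entity."""
    doc = nlp.make_doc(text)
    checked = []
    for start, end, label in annotations["entities"]:
        # char_span returns None if the offsets do not match token boundaries.
        span = doc.char_span(start, end, label=label)
        checked.append((text[start:end], label, span is not None))
    return checked


report = [check_alignment(text, annotations) for text, annotations in TRAIN_DATA]
print(report)

# An off-by-one offset is flagged as misaligned.
bad = check_alignment("I like London.", {"entities": [(7, 12, "LOC")]})
print(bad)
```
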
### Training an additional entity type {#new-entity-type}

This script shows how to add a new entity type to an existing pretrained NER
model. To keep the example short and simple, only four sentences are provided as
examples. In practice, you'll need many more: a few hundred would be a good
start.

```python
https://github.com/explosion/spaCy/tree/master/examples/training/train_new_entity_type.py
```

### Creating a Knowledge Base for Named Entity Linking {#kb}

This example shows how to create a knowledge base (KB) in spaCy, which is needed
to implement entity linking functionality. As input, it requires a spaCy model
with pretrained word vectors, and it saves the KB to disk if an `output_dir` is
provided.

```python
https://github.com/explosion/spaCy/tree/master/examples/training/create_kb.py
```

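
The core KB operations can be sketched as follows. This is an illustration
under several assumptions rather than the linked script: the entity IDs,
frequencies, alias and prior probabilities are invented, and the entity vectors
are written by hand instead of being derived from pretrained word vectors. The
import fallback is there because newer spaCy releases (v3.5 and later) provide
the in-memory implementation as `InMemoryLookupKB`.

```python
import spacy

try:  # spaCy v3.5+ ships the in-memory implementation under a new name
    from spacy.kb import InMemoryLookupKB as KnowledgeBase
except ImportError:  # earlier versions
    from spacy.kb import KnowledgeBase

nlp = spacy.blank("en")
kb = KnowledgeBase(vocab=nlp.vocab, entity_vector_length=3)

# Hypothetical entities: an ID, a corpus frequency and a fixed-length vector.
kb.add_entity(entity="Q1", freq=50, entity_vector=[1.0, 0.0, 0.0])
kb.add_entity(entity="Q2", freq=10, entity_vector=[0.0, 1.0, 0.0])

# An alias maps a surface form to candidate entities with prior probabilities.
# The priors for one alias must not sum to more than 1.
kb.add_alias(alias="Springfield", entities=["Q1", "Q2"], probabilities=[0.6, 0.3])

n_entities = kb.get_size_entities()
n_aliases = kb.get_size_aliases()
entity_ids = sorted(kb.get_entity_strings())
print(n_entities, n_aliases, entity_ids)
```
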
### Training spaCy's Named Entity Linker {#nel}

This example shows how to train spaCy's entity linker with your own custom
examples, starting off with a predefined knowledge base and its vocab,
and using a blank `English` class.

```python
https://github.com/explosion/spaCy/tree/master/examples/training/train_entity_linker.py
```

### Training spaCy's Dependency Parser {#parser}

This example shows how to update spaCy's dependency parser, starting off with an
existing, pretrained model, or from scratch using a blank `Language` class.

```python
https://github.com/explosion/spaCy/tree/master/examples/training/train_parser.py
```

### Training spaCy's Part-of-speech Tagger {#tagger}

In this example, we're training spaCy's part-of-speech tagger with a custom tag
map, mapping our own tags to the
[Universal Dependencies scheme](http://universaldependencies.github.io/docs/u/pos/index.html).

```python
https://github.com/explosion/spaCy/tree/master/examples/training/train_tagger.py
```

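
A custom tag map can be pictured as a plain dictionary from your own tags to
Universal Dependencies part-of-speech values. The sketch below is pure Python
and does not train anything. The tags, sentences and the helper `to_universal`
are assumptions for illustration, and it splits on whitespace instead of using
spaCy's tokenizer.

```python
# Hypothetical custom tags, each mapped to a Universal Dependencies POS value.
TAG_MAP = {"N": {"pos": "NOUN"}, "V": {"pos": "VERB"}, "J": {"pos": "ADJ"}}

# One custom tag per whitespace-separated word.
TRAIN_DATA = [
    ("I like green eggs", {"tags": ["N", "V", "J", "N"]}),
    ("Eat blue ham", {"tags": ["V", "J", "N"]}),
]


def to_universal(text, annotations, tag_map=TAG_MAP):
    """Pair every word with its custom tag and the mapped universal POS."""
    words = text.split()
    tags = annotations["tags"]
    if len(words) != len(tags):
        raise ValueError("Expected exactly one tag per word")
    return [(word, tag, tag_map[tag]["pos"]) for word, tag in zip(words, tags)]


mapped = [to_universal(text, annotations) for text, annotations in TRAIN_DATA]
print(mapped[1])
```
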
### Training a custom parser for chat intent semantics {#intent-parser}

spaCy's parser component can be trained to predict any type of tree
structure over your input text. You can also predict trees over whole documents
or chat logs, with connections between the sentence-roots used to annotate
discourse structure. In this example, we'll build a message parser for a common
"chat intent": finding local businesses. Our message semantics will have the
following types of relations: `ROOT`, `PLACE`, `QUALITY`, `ATTRIBUTE`, `TIME`
and `LOCATION`.

```python
https://github.com/explosion/spaCy/tree/master/examples/training/train_intent_parser.py
```

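
To make the tree annotation concrete, here is a pure-Python sketch of one
message annotated with head indices and relation labels, and a helper that
reads the semantic relations back out. It assumes whitespace tokenization and
uses `"-"` for tokens without a semantic relation. The helper `relations` and
the sample message are illustrative and not taken from spaCy's API.

```python
# One annotated message: heads[i] is the index of token i's head, and
# deps[i] is the relation between token i and that head.
text = "find a cafe with great wifi"
annotation = {
    "heads": [0, 2, 0, 5, 5, 2],
    "deps": ["ROOT", "-", "PLACE", "-", "QUALITY", "ATTRIBUTE"],
}


def relations(text, annotation):
    """Return (word, relation, head word) triples, skipping '-' tokens."""
    words = text.split()
    triples = []
    for word, head, dep in zip(words, annotation["heads"], annotation["deps"]):
        if dep != "-":
            triples.append((word, dep, words[head]))
    return triples


triples = relations(text, annotation)
for triple in triples:
    print(triple)
```

The root token points to itself, which is why `find` appears as its own head.
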
### Training spaCy's text classifier {#textcat new="2"}

This example shows how to train a multi-label convolutional neural network text
classifier on IMDB movie reviews, using spaCy's new
[`TextCategorizer`](/api/textcategorizer) component. The dataset will be loaded
automatically via Thinc's built-in dataset loader. Predictions are available via
[`Doc.cats`](/api/doc#attributes).

```python
https://github.com/explosion/spaCy/tree/master/examples/training/train_textcat.py
```

## Vectors {#vectors hidden="true"}

### Visualizing spaCy vectors in TensorBoard {#tensorboard}

This script lets you load any spaCy model containing word vectors into
[TensorBoard](https://projector.tensorflow.org/) to create an
[embedding visualization](https://github.com/tensorflow/tensorboard/blob/master/docs/tensorboard_projector_plugin.ipynb).

```python
https://github.com/explosion/spaCy/tree/master/examples/vectors_tensorboard.py
```

## Deep Learning {#deep-learning hidden="true"}

### Text classification with Keras {#keras}

This example shows how to use a [Keras](https://keras.io) LSTM sentiment
classification model in spaCy. spaCy splits the document into sentences, and
each sentence is classified using the LSTM. The scores for the sentences are
then aggregated to give the document score. This kind of hierarchical model is
quite difficult to implement in "pure" Keras or TensorFlow, but it's very
effective. The Keras example on this dataset performs quite poorly, because it
cuts off the documents so that they're a fixed size. This hurts review accuracy
a lot, because people often summarize their rating in the final sentence.

```python
https://github.com/explosion/spaCy/tree/master/examples/deep_learning_keras.py
```