2017-10-03 15:26:20 +03:00
|
|
|
|
//- 💫 DOCS > USAGE > EXAMPLES
|
|
|
|
|
|
|
|
|
|
include ../_includes/_mixins
|
|
|
|
|
|
|
|
|
|
+section("information-extraction")
|
|
|
|
|
+h(3, "phrase-matcher") Using spaCy's phrase matcher
|
|
|
|
|
+tag-new(2)
|
|
|
|
|
|
|
|
|
|
p
|
|
|
|
|
| This example shows how to use the new
|
|
|
|
|
| #[+api("phrasematcher") #[code PhraseMatcher]] to efficiently find
|
|
|
|
|
| entities from a large terminology list.
|
|
|
|
|
|
|
|
|
|
+github("spacy", "examples/information_extraction/phrase_matcher.py")
|
|
|
|
|
|
|
|
|
|
+h(3, "entity-relations") Extracting entity relations
|
|
|
|
|
|
|
|
|
|
p
|
|
|
|
|
| A simple example of extracting relations between phrases and
|
|
|
|
|
| entities using spaCy's named entity recognizer and the dependency
|
|
|
|
|
| parse. Here, we extract money and currency values (entities labelled
|
|
|
|
|
| as #[code MONEY]) and then check the dependency tree to find the
|
|
|
|
|
| noun phrase they are referring to – for example: "$9.4 million"
|
|
|
|
|
| → "Net income".
|
|
|
|
|
|
|
|
|
|
+github("spacy", "examples/information_extraction/entity_relations.py")
|
|
|
|
|
|
|
|
|
|
+h(3, "subtrees") Navigating the parse tree and subtrees
|
|
|
|
|
|
|
|
|
|
p
|
|
|
|
|
| This example shows how to navigate the parse tree including subtrees
|
|
|
|
|
| attached to a word.
|
|
|
|
|
|
|
|
|
|
+github("spacy", "examples/information_extraction/parse_subtrees.py")
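p
| As a rough illustration of what a subtree is, the plain-Python sketch
| below collects a word and all of its descendants from a list of head
| indices, similar in spirit to what #[code Token.subtree] yields. It does
| not use spaCy, and the sentence, the #[code heads] list and the
| #[code subtree] helper are all hypothetical.

```python
# Hypothetical parse of "She ate the green apple": heads[i] is the index of
# the head of word i, and the root points at itself.
WORDS = ["She", "ate", "the", "green", "apple"]
HEADS = [1, 1, 4, 4, 1]


def subtree(heads, index):
    """Return the indices of word `index` and all its descendants, in order."""
    # Build a head -> children lookup once, skipping the root's self-loop.
    children = {}
    for i, head in enumerate(heads):
        if head != i:
            children.setdefault(head, []).append(i)
    # Walk down from the starting word and collect everything reachable.
    collected, stack = [], [index]
    while stack:
        node = stack.pop()
        collected.append(node)
        stack.extend(children.get(node, []))
    return sorted(collected)


apple_subtree = [WORDS[i] for i in subtree(HEADS, 4)]
print(apple_subtree)
```

p
| Calling the helper on the root (index 1) returns every word in the
| sentence, because all words are descendants of the root.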
|
|
|
|
|
|
|
|
|
|
+section("pipeline")
|
|
|
|
|
+h(3, "custom-components-entities") Custom pipeline components and attribute extensions
|
|
|
|
|
+tag-new(2)
|
|
|
|
|
|
|
|
|
|
p
|
|
|
|
|
| This example shows the implementation of a pipeline component
|
|
|
|
|
| that sets entity annotations based on a list of single or
|
|
|
|
|
| multiple-word company names, merges entities into one token and
|
|
|
|
|
| sets custom attributes on the #[code Doc], #[code Span] and
|
|
|
|
|
| #[code Token].
|
|
|
|
|
|
|
|
|
|
+github("spacy", "examples/pipeline/custom_component_entities.py")
|
|
|
|
|
|
|
|
|
|
+h(3, "custom-components-api")
|
|
|
|
|
| Custom pipeline components and attribute extensions via a REST API
|
|
|
|
|
+tag-new(2)
|
|
|
|
|
|
|
|
|
|
p
|
|
|
|
|
| This example shows the implementation of a pipeline component
|
|
|
|
|
| that fetches country metadata via the
|
|
|
|
|
| #[+a("https://restcountries.eu") REST Countries API], sets entity
|
|
|
|
|
| annotations for countries, merges entities into one token and
|
|
|
|
|
| sets custom attributes on the #[code Doc], #[code Span] and
|
|
|
|
|
| #[code Token] – for example, the capital, latitude/longitude
|
|
|
|
|
| coordinates and the country flag.
|
|
|
|
|
|
|
|
|
|
+github("spacy", "examples/pipeline/custom_component_countries_api.py")
|
|
|
|
|
|
|
|
|
|
+h(3, "custom-components-attr-methods") Custom method extensions
|
|
|
|
|
+tag-new(2)
|
|
|
|
|
|
|
|
|
|
p
|
|
|
|
|
| A collection of snippets showing examples of extensions adding
|
|
|
|
|
| custom methods to the #[code Doc], #[code Token] and
|
|
|
|
|
| #[code Span].
|
|
|
|
|
|
|
|
|
|
+github("spacy", "examples/pipeline/custom_attr_methods.py")
|
|
|
|
|
|
|
|
|
|
+h(3, "multi-processing") Multi-processing with Joblib
|
|
|
|
|
|
|
|
|
|
p
|
|
|
|
|
| This example shows how to use multiple cores to process text using
|
|
|
|
|
| spaCy and #[+a("https://pythonhosted.org/joblib/") Joblib]. We're
|
|
|
|
|
| exporting part-of-speech-tagged, true-cased, (very roughly)
|
|
|
|
|
| sentence-separated text, with each "sentence" on a newline, and
|
|
|
|
|
| spaces between tokens. The data comes from the IMDB movie reviews
| dataset, which is loaded automatically via Thinc's built-in dataset
| loader.
|
|
|
|
|
|
|
|
|
|
+github("spacy", "examples/pipeline/multi_processing.py")
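p
| One common way to spread work over several cores is to split the texts
| into batches and hand each batch to a worker. The sketch below shows only
| the batching step in plain Python. The #[code partition_all] helper is
| hypothetical, and the linked script may batch its input differently.

```python
from itertools import islice


def partition_all(size, items):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical stand-ins for movie reviews.
texts = ["review one", "review two", "review three", "review four", "review five"]
batches = list(partition_all(2, texts))
print(len(batches))
```

p
| Each batch could then be passed to a separate worker process, for
| example via Joblib, so that every worker processes its own share of the
| texts.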
|
|
|
|
|
|
|
|
|
|
+section("training")
|
|
|
|
|
+h(3, "training-ner") Training spaCy's Named Entity Recognizer
|
|
|
|
|
|
|
|
|
|
p
|
|
|
|
|
| This example shows how to update spaCy's entity recognizer
|
|
|
|
|
| with your own examples, starting off with an existing, pre-trained
|
|
|
|
|
| model, or from scratch using a blank #[code Language] class.
|
|
|
|
|
|
|
|
|
|
+github("spacy", "examples/training/train_ner.py")
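p
| A minimal sketch of what "your own examples" might look like, assuming
| each example is a text paired with entity annotations given as character
| offsets. The sentences, labels and the #[code entity_texts] helper are
| hypothetical, and the linked script may lay out its data differently.

```python
# Hypothetical training examples: (text, annotations) pairs in which each
# entity is a (start_char, end_char, label) triple of character offsets.
TRAIN_DATA = [
    ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
    ("I like London and Berlin.", {"entities": [(7, 13, "LOC"), (18, 24, "LOC")]}),
]


def entity_texts(text, annotations):
    """Return (substring, label) pairs so the offsets can be checked by eye."""
    return [(text[start:end], label) for start, end, label in annotations["entities"]]


for text, annotations in TRAIN_DATA:
    print(entity_texts(text, annotations))
```

p
| Slicing the text with each pair of offsets is a cheap way to catch
| off-by-one errors in the annotations before training.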
|
|
|
|
|
|
|
|
|
|
+h(3, "new-entity-type") Training an additional entity type
|
|
|
|
|
|
|
|
|
|
p
|
|
|
|
|
| This script shows how to add a new entity type to an existing
|
|
|
|
|
| pre-trained NER model. To keep the example short and simple, only
|
|
|
|
|
| four sentences are provided as examples. In practice, you'll need
|
|
|
|
|
| many more – a few hundred would be a good start.
|
|
|
|
|
|
|
|
|
|
+github("spacy", "examples/training/train_new_entity_type.py")
|
|
|
|
|
|
|
|
|
|
+h(3, "parser") Training spaCy's Dependency Parser
|
|
|
|
|
|
|
|
|
|
p
|
|
|
|
|
| This example shows how to update spaCy's dependency parser,
|
|
|
|
|
| starting off with an existing, pre-trained model, or from scratch
|
|
|
|
|
| using a blank #[code Language] class.
|
|
|
|
|
|
|
|
|
|
+github("spacy", "examples/training/train_parser.py")
|
|
|
|
|
|
|
|
|
|
+h(3, "tagger") Training spaCy's Part-of-speech Tagger
|
|
|
|
|
|
|
|
|
|
p
|
|
|
|
|
| In this example, we're training spaCy's part-of-speech tagger with a
|
|
|
|
|
| custom tag map, mapping our own tags to the
|
|
|
|
|
| #[+a("http://universaldependencies.github.io/docs/u/pos/index.html") Universal Dependencies scheme].
|
|
|
|
|
|
|
|
|
|
+github("spacy", "examples/training/train_tagger.py")
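p
| To make the idea of a tag map concrete, here is a plain-Python sketch,
| assuming a tiny custom tag set of #[code N], #[code V] and #[code J].
| The tag names, sentences and helper functions are hypothetical, and
| whitespace splitting stands in for real tokenization.

```python
# Hypothetical custom tags mapped to Universal Dependencies POS tags.
TAG_MAP = {
    "N": {"pos": "NOUN"},
    "V": {"pos": "VERB"},
    "J": {"pos": "ADJ"},
}

# Hypothetical training examples: one custom tag per whitespace-separated word.
TRAIN_DATA = [
    ("I like green eggs", {"tags": ["N", "V", "J", "N"]}),
    ("Eat blue ham", {"tags": ["V", "J", "N"]}),
]


def to_universal(tags, tag_map=TAG_MAP):
    """Translate custom tags into their Universal Dependencies POS tags."""
    return [tag_map[tag]["pos"] for tag in tags]


def check_example(text, annotations):
    """Pair each word with its universal tag, failing on a length mismatch."""
    words = text.split()  # assumption: whitespace tokenization is good enough
    tags = annotations["tags"]
    if len(words) != len(tags):
        raise ValueError("expected one tag per word")
    return list(zip(words, to_universal(tags)))


for text, annotations in TRAIN_DATA:
    print(check_example(text, annotations))
```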
|
|
|
|
|
|
|
|
|
|
+h(3, "intent-parser") Training a custom parser for chat intent semantics
|
|
|
|
|
|
|
|
|
|
p
|
|
|
|
|
| spaCy's parser component can be trained to predict any type
|
|
|
|
|
| of tree structure over your input text. You can also predict trees
|
|
|
|
|
| over whole documents or chat logs, with connections between the
|
|
|
|
|
| sentence-roots used to annotate discourse structure. In this example,
|
|
|
|
|
| we'll build a message parser for a common "chat intent": finding
|
|
|
|
|
| local businesses. Our message semantics will have the following types
|
|
|
|
|
| of relations: #[code ROOT], #[code PLACE], #[code QUALITY],
|
|
|
|
|
| #[code ATTRIBUTE], #[code TIME] and #[code LOCATION].
|
|
|
|
|
|
|
|
|
|
+github("spacy", "examples/training/train_intent_parser.py")
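p
| As a sketch of how such message semantics could be annotated, the
| example below encodes one message as a list of head indices and relation
| labels, then reads the relations back out. The sentence, the use of
| #[code "-"] for words without a relation and the #[code relations]
| helper are assumptions made for illustration.

```python
# Hypothetical annotation: heads[i] is the index of the word that word i
# attaches to, and deps[i] is the relation label ("-" means no relation).
TRAIN_DATA = [
    ("find a cafe with great wifi", {
        "heads": [0, 2, 0, 5, 5, 2],
        "deps": ["ROOT", "-", "PLACE", "-", "QUALITY", "ATTRIBUTE"],
    }),
]


def relations(text, annotations):
    """Return (word, label, head word) triples for the labelled relations."""
    words = text.split()  # assumption: whitespace tokenization
    triples = []
    for i, (head, dep) in enumerate(zip(annotations["heads"], annotations["deps"])):
        if dep not in ("ROOT", "-"):
            triples.append((words[i], dep, words[head]))
    return triples


for text, annotations in TRAIN_DATA:
    print(relations(text, annotations))
```

p
| Here "cafe" is the #[code PLACE] being looked for, "wifi" is an
| #[code ATTRIBUTE] of the cafe and "great" is the #[code QUALITY] of the
| wifi.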
|
|
|
|
|
|
|
|
|
|
+h(3, "textcat") Training spaCy's text classifier
|
|
|
|
|
+tag-new(2)
|
|
|
|
|
|
|
|
|
|
p
|
|
|
|
|
| This example shows how to train a multi-label convolutional neural
|
|
|
|
|
| network text classifier on IMDB movie reviews, using spaCy's new
|
|
|
|
|
| #[+api("textcategorizer") #[code TextCategorizer]] component. The
|
|
|
|
|
| dataset will be loaded automatically via Thinc's built-in dataset
|
|
|
|
|
| loader. Predictions are available via
|
|
|
|
|
| #[+api("doc#attributes") #[code Doc.cats]].
|
|
|
|
|
|
|
|
|
|
+github("spacy", "examples/training/train_textcat.py")
|
|
|
|
|
|
|
|
|
|
+section("vectors")
|
|
|
|
|
+h(3, "fasttext") Loading pre-trained fastText vectors
|
|
|
|
|
|
|
|
|
|
p
|
|
|
|
|
| This simple snippet is all you need to be able to use Facebook's
|
|
|
|
|
| #[+a("https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md") fastText vectors]
|
|
|
|
|
| (294 languages, pre-trained on Wikipedia) with spaCy. Once they're
|
|
|
|
|
| loaded, the vectors will be available via spaCy's built-in
|
|
|
|
|
| #[code similarity()] methods.
|
|
|
|
|
|
|
|
|
|
+github("spacy", "examples/vectors_fast_text.py")
|
|
|
|
|
|
|
|
|
|
+section("deep-learning")
|
|
|
|
|
+h(3, "keras") Text classification with Keras
|
|
|
|
|
|
|
|
|
|
p
|
|
|
|
|
| This example shows how to use a #[+a("https://keras.io") Keras]
|
|
|
|
|
| LSTM sentiment classification model in spaCy. spaCy splits
|
|
|
|
|
| the document into sentences, and each sentence is classified using
|
|
|
|
|
| the LSTM. The scores for the sentences are then aggregated to give
|
|
|
|
|
| the document score. This kind of hierarchical model is quite
|
|
|
|
|
| difficult in "pure" Keras or TensorFlow, but it's very effective.
|
|
|
|
|
| The Keras example on this dataset performs quite poorly, because it
|
|
|
|
|
| cuts off the documents so that they're a fixed size. This hurts
|
|
|
|
|
| review accuracy a lot, because people often summarise their rating
|
|
|
|
|
| in the final sentence.
|
|
|
|
|
|
|
|
|
|
+github("spacy", "examples/deep_learning_keras.py")
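p
| The aggregation step can be sketched in plain Python. This version
| assumes a length-weighted mean of the per-sentence scores, which is only
| one possible choice and may differ from what the linked script does. The
| #[code document_score] helper and the numbers are hypothetical.

```python
def document_score(sentence_scores, sentence_lengths):
    """Combine per-sentence scores into one score, weighted by sentence length."""
    if len(sentence_scores) != len(sentence_lengths):
        raise ValueError("expected one length per sentence score")
    total_length = sum(sentence_lengths)
    if total_length == 0:
        return 0.0
    weighted = sum(score * length
                   for score, length in zip(sentence_scores, sentence_lengths))
    return weighted / total_length


# Hypothetical review: a short lukewarm opening and a longer, positive
# final sentence that summarises the rating.
score = document_score([0.2, 0.9], [10, 30])
print(round(score, 3))
```

p
| Because every sentence contributes to the result, a verdict stated in
| the final sentence still affects the document score, however long the
| review is.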
|