//- 💫 DOCS > USAGE > EXAMPLES

include ../_includes/_mixins

+section("information-extraction")
    +h(3, "phrase-matcher") Using spaCy's phrase matcher
        +tag-new(2)

    p
        | This example shows how to use the new
        | #[+api("phrasematcher") #[code PhraseMatcher]] to efficiently find
        | entities from a large terminology list.

    +github("spacy", "examples/information_extraction/phrase_matcher.py")

    +h(3, "entity-relations") Extracting entity relations

    p
        | A simple example of extracting relations between phrases and
        | entities using spaCy's named entity recognizer and the dependency
        | parse. Here, we extract money and currency values (entities labelled
        | as #[code MONEY]) and then check the dependency tree to find the
        | noun phrase they are referring to – for example: "$9.4 million"
        | → "Net income".

    +github("spacy", "examples/information_extraction/entity_relations.py")

    +h(3, "subtrees") Navigating the parse tree and subtrees

    p
        | This example shows how to navigate the parse tree, including subtrees
        | attached to a word.

    +github("spacy", "examples/information_extraction/parse_subtrees.py")

+section("pipeline")
    +h(3, "custom-components-entities") Custom pipeline components and attribute extensions
        +tag-new(2)

    p
        | This example shows the implementation of a pipeline component
        | that sets entity annotations based on a list of single or
        | multiple-word company names, merges entities into one token and
        | sets custom attributes on the #[code Doc], #[code Span] and
        | #[code Token].
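
    p
        | The matching and merging steps can be illustrated without spaCy.
        | The following is a minimal, library-free Python sketch and not the
        | code from the example script. It assumes whitespace-split tokens
        | and uses a hypothetical #[code merge_companies()] helper. This helper
        | finds the longest company name starting at each token and merges it
        | into a single token, similar to how an entity span is merged.

```python
from typing import List, Sequence, Tuple


def merge_companies(
    tokens: Sequence[str], companies: Sequence[str]
) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Hypothetical helper: merge single- or multi-word company names into one token.

    Returns the merged token list plus (index in merged list, name) pairs,
    loosely mimicking entity spans that were merged into single tokens.
    """
    # Split each non-empty name into words; try longer names first so that
    # "Alphabet Inc." wins over a shorter name starting with the same word.
    patterns = sorted(
        (tuple(name.split()) for name in companies if name.split()),
        key=len,
        reverse=True,
    )
    merged: List[str] = []
    found: List[Tuple[int, str]] = []
    i = 0
    while i < len(tokens):
        for pattern in patterns:
            if tuple(tokens[i:i + len(pattern)]) == pattern:
                name = " ".join(pattern)
                found.append((len(merged), name))
                merged.append(name)  # the whole match becomes one token
                i += len(pattern)
                break
        else:
            # No company name starts at this token: keep it unchanged.
            merged.append(tokens[i])
            i += 1
    return merged, found


tokens = "Alphabet Inc. and Apple beat Microsoft".split()
merged, found = merge_companies(tokens, ["Alphabet Inc.", "Apple", "Microsoft"])
print(merged)  # ['Alphabet Inc.', 'and', 'Apple', 'beat', 'Microsoft']
print(found)   # [(0, 'Alphabet Inc.'), (2, 'Apple'), (4, 'Microsoft')]
```

    p
        | Trying the longest names first ensures that a multi-word name is
        | not split up by a shorter name that starts with the same word.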

    +github("spacy", "examples/pipeline/custom_component_entities.py")

    +h(3, "custom-components-api")
        | Custom pipeline components and attribute extensions via a REST API
        +tag-new(2)

    p
        | This example shows the implementation of a pipeline component
        | that fetches country metadata via the
        | #[+a("https://restcountries.eu") REST Countries API], sets entity
        | annotations for countries, merges entities into one token and
        | sets custom attributes on the #[code Doc], #[code Span] and
        | #[code Token] – for example, the capital, latitude/longitude
        | coordinates and the country flag.

    +github("spacy", "examples/pipeline/custom_component_countries_api.py")

    +h(3, "custom-components-attr-methods") Custom method extensions
        +tag-new(2)

    p
        | A collection of snippets showing examples of extensions adding
        | custom methods to the #[code Doc], #[code Token] and
        | #[code Span].

    +github("spacy", "examples/pipeline/custom_attr_methods.py")

    +h(3, "multi-processing") Multi-processing with Joblib

    p
        | This example shows how to use multiple cores to process text using
        | spaCy and #[+a("https://pythonhosted.org/joblib/") Joblib]. We're
        | exporting part-of-speech-tagged, true-cased, (very roughly)
        | sentence-separated text, with each "sentence" on a new line, and
        | spaces between tokens. The data comes from the IMDB movie reviews
        | dataset, which is loaded automatically via Thinc's built-in dataset
        | loader.

    +github("spacy", "examples/pipeline/multi_processing.py")

+section("training")
    +h(3, "training-ner") Training spaCy's Named Entity Recognizer

    p
        | This example shows how to update spaCy's entity recognizer
        | with your own examples, starting off with an existing, pre-trained
        | model, or from scratch using a blank #[code Language] class.

    +github("spacy", "examples/training/train_ner.py")

    +h(3, "new-entity-type") Training an additional entity type

    p
        | This script shows how to add a new entity type to an existing
        | pre-trained NER model.
        | To keep the example short and simple, only
        | four sentences are provided as examples. In practice, you'll need
        | many more; a few hundred would be a good start.

    +github("spacy", "examples/training/train_new_entity_type.py")

    +h(3, "parser") Training spaCy's Dependency Parser

    p
        | This example shows how to update spaCy's dependency parser,
        | starting off with an existing, pre-trained model, or from scratch
        | using a blank #[code Language] class.

    +github("spacy", "examples/training/train_parser.py")

    +h(3, "tagger") Training spaCy's Part-of-speech Tagger

    p
        | In this example, we're training spaCy's part-of-speech tagger with a
        | custom tag map, mapping our own tags to the
        | #[+a("http://universaldependencies.github.io/docs/u/pos/index.html") Universal Dependencies scheme].

    +github("spacy", "examples/training/train_tagger.py")

    +h(3, "intent-parser") Training a custom parser for chat intent semantics

    p
        | spaCy's parser component can be trained to predict any type
        | of tree structure over your input text. You can also predict trees
        | over whole documents or chat logs, with connections between the
        | sentence roots used to annotate discourse structure. In this example,
        | we'll build a message parser for a common "chat intent": finding
        | local businesses. Our message semantics will have the following types
        | of relations: #[code ROOT], #[code PLACE], #[code QUALITY],
        | #[code ATTRIBUTE], #[code TIME] and #[code LOCATION].

    +github("spacy", "examples/training/train_intent_parser.py")

    +h(3, "textcat") Training spaCy's text classifier
        +tag-new(2)

    p
        | This example shows how to train a multi-label convolutional neural
        | network text classifier on IMDB movie reviews, using spaCy's new
        | #[+api("textcategorizer") #[code TextCategorizer]] component. The
        | dataset will be loaded automatically via Thinc's built-in dataset
        | loader. Predictions are available via
        | #[+api("doc#attributes") #[code Doc.cats]].
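
    p
        | In a multi-label classifier, each label gets its own independent
        | score. The scores don't have to sum to 1, and a text can receive
        | several labels or none. The plain Python sketch below illustrates
        | this under a few assumptions. It applies a sigmoid to each label,
        | which is a common choice for multi-label output but not necessarily
        | what #[code TextCategorizer] does internally. It uses the hypothetical
        | helpers #[code to_cats()] and #[code predicted_labels()], and the
        | labels and raw scores are made up. The result is a dict mapping each
        | label to a score, which is the same shape as #[code Doc.cats].

```python
import math
from typing import Dict, List


def to_cats(logits: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical helper: turn raw per-label scores into independent
    probabilities with a logistic sigmoid (one per label, no softmax)."""
    return {label: 1.0 / (1.0 + math.exp(-z)) for label, z in logits.items()}


def predicted_labels(cats: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Hypothetical helper: every label whose score reaches the threshold."""
    return sorted(label for label, score in cats.items() if score >= threshold)


# Made-up labels and raw scores for a single review.
cats = to_cats({"POSITIVE": 2.0, "FUNNY": 0.4, "SPOILER": -1.5})
print({label: round(score, 2) for label, score in cats.items()})
# {'POSITIVE': 0.88, 'FUNNY': 0.6, 'SPOILER': 0.18}
print(predicted_labels(cats))  # ['FUNNY', 'POSITIVE']
```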

    +github("spacy", "examples/training/train_textcat.py")

+section("vectors")
    +h(3, "fasttext") Loading pre-trained fastText vectors

    p
        | This simple snippet is all you need to be able to use Facebook's
        | #[+a("https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md") fastText vectors]
        | (294 languages, pre-trained on Wikipedia) with spaCy. Once they're
        | loaded, the vectors will be available via spaCy's built-in
        | #[code similarity()] methods.

    +github("spacy", "examples/vectors_fast_text.py")

    +h(3, "tensorboard") Visualizing spaCy vectors in TensorBoard

    p
        | These two scripts let you load any spaCy model containing word vectors
        | into #[+a("https://projector.tensorflow.org/") TensorBoard] to create
        | an #[+a("https://www.tensorflow.org/versions/r1.1/get_started/embedding_viz") embedding visualization].
        | The first script uses TensorBoard; the second uses TensorBoard's
        | standalone embedding projector.

    +github("spacy", "examples/vectors_tensorboard.py")

    +github("spacy", "examples/vectors_tensorboard_standalone.py")

+section("deep-learning")
    +h(3, "keras") Text classification with Keras

    p
        | This example shows how to use a #[+a("https://keras.io") Keras]
        | LSTM sentiment classification model in spaCy. spaCy splits
        | the document into sentences, and each sentence is classified using
        | the LSTM. The scores for the sentences are then aggregated to give
        | the document score. This kind of hierarchical model is quite
        | difficult in "pure" Keras or TensorFlow, but it's very effective.
        | The Keras example on this dataset performs quite poorly, because it
        | cuts off the documents so that they're a fixed size. This hurts
        | review accuracy a lot, because people often summarise their rating
        | in the final sentence.

    +github("spacy", "examples/deep_learning_keras.py")
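
    p
        | The sentence-level aggregation described above can be sketched in
        | plain Python. This is an illustration only and not the code in
        | #[code deep_learning_keras.py]. It makes three assumptions: a naive
        | regular-expression split stands in for spaCy's sentence
        | segmentation, a hypothetical #[code toy_score()] lexicon stands in
        | for the LSTM, and the document score is the mean of the sentence
        | scores. The hypothetical #[code truncated_score()] only sees the
        | first few words. Comparing the two shows how a fixed-size cut-off
        | can miss a rating summarised in the final sentence.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    # Naive stand-in for spaCy's sentence segmentation: split after . ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def toy_score(sentence: str) -> float:
    # Hypothetical stand-in for the LSTM: share of lexicon hits that are
    # positive, or 0.5 (neutral) if no lexicon word occurs.
    positive = {"great", "loved", "brilliant"}
    negative = {"boring", "slow", "bad"}
    words = re.findall(r"[a-z]+", sentence.lower())
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    return 0.5 if pos + neg == 0 else pos / (pos + neg)


def document_score(text: str, score: Callable[[str], float] = toy_score) -> float:
    # Hierarchical approach: score every sentence, then aggregate (mean here).
    sentences = split_sentences(text)
    if not sentences:
        return 0.5
    return sum(score(s) for s in sentences) / len(sentences)


def truncated_score(text: str, max_words: int,
                    score: Callable[[str], float] = toy_score) -> float:
    # Fixed-size input: every word after the first max_words is ignored.
    return score(" ".join(text.split()[:max_words]))


review = "It starts slow. But the acting is great. I loved it."
print(round(document_score(review), 2))  # 0.67
print(truncated_score(review, 3))        # 0.0
```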