//- 💫 DOCS > USAGE > EXAMPLES

include ../_includes/_mixins

+section("information-extraction")
    +h(3, "phrase-matcher") Using spaCy's phrase matcher
        +tag-new(2)

    p
        |  This example shows how to use the new
        |  #[+api("phrasematcher") #[code PhraseMatcher]] to efficiently find
        |  entities from a large terminology list.

    +github("spacy", "examples/information_extraction/phrase_matcher.py")
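
    p
        |  For orientation, here's a minimal, hedged sketch of the
        |  #[code PhraseMatcher] workflow. The terminology list and the model
        |  name are placeholders, not taken from the example script:

    +code.
        import spacy
        from spacy.matcher import PhraseMatcher

        nlp = spacy.load("en_core_web_sm")  # placeholder model name
        matcher = PhraseMatcher(nlp.vocab)

        # Each terminology entry is compiled into a Doc pattern
        terms = [u"machine learning", u"natural language processing"]
        patterns = [nlp.make_doc(term) for term in terms]
        matcher.add("TERMS", None, *patterns)

        doc = nlp(u"I work on natural language processing.")
        for match_id, start, end in matcher(doc):
            print(doc[start:end].text)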

    +h(3, "entity-relations") Extracting entity relations

    p
        |  A simple example of extracting relations between phrases and
        |  entities using spaCy's named entity recognizer and the dependency
        |  parse. Here, we extract money and currency values (entities labelled
        |  as #[code MONEY]) and then check the dependency tree to find the
        |  noun phrase they are referring to – for example: "$9.4 million"
        |  → "Net income".

    +github("spacy", "examples/information_extraction/entity_relations.py")
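
    p
        |  As a rough sketch of the idea (not necessarily the exact logic of
        |  the example script), you can walk from a #[code MONEY] entity up
        |  the dependency tree to a candidate subject noun phrase:

    +code.
        import spacy

        nlp = spacy.load("en_core_web_sm")  # placeholder model name
        doc = nlp(u"Net income was $9.4 million compared to the prior year.")

        for ent in doc.ents:
            if ent.label_ == "MONEY":
                root = ent.root
                # If the money value is an attribute ("was $9.4 million"), the
                # subject of its head is a good candidate for the relation
                if root.dep_ == "attr":
                    subjects = [w for w in root.head.lefts if w.dep_ == "nsubj"]
                    for subj in subjects:
                        print(subj.text, "-->", ent.text)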

    +h(3, "subtrees") Navigating the parse tree and subtrees

    p
        |  This example shows how to navigate the parse tree, including subtrees
        |  attached to a word.

    +github("spacy", "examples/information_extraction/parse_subtrees.py")
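
    p
        |  The relevant API is small: #[code Token.subtree],
        |  #[code Token.children] and #[code Token.head]. A minimal sketch
        |  (sentence and model name are illustrative only):

    +code.
        import spacy

        nlp = spacy.load("en_core_web_sm")  # placeholder model name
        doc = nlp(u"Credit and mortgage account holders must submit their requests.")

        for token in doc:
            if token.dep_ == "nsubj":
                # token.subtree yields the token and all of its syntactic descendants
                print([t.text for t in token.subtree])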

+section("pipeline")
    +h(3, "custom-components-entities") Custom pipeline components and attribute extensions
        +tag-new(2)

    p
        |  This example shows the implementation of a pipeline component
        |  that sets entity annotations based on a list of single- or
        |  multiple-word company names, merges entities into one token and
        |  sets custom attributes on the #[code Doc], #[code Span] and
        |  #[code Token].

    +github("spacy", "examples/pipeline/custom_component_entities.py")
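
    p
        |  The building blocks are a custom pipeline component and extension
        |  attributes. A stripped-down sketch of that combination (the company
        |  list and attribute name are illustrative, not the full example):

    +code.
        import spacy
        from spacy.matcher import PhraseMatcher
        from spacy.tokens import Span, Token

        nlp = spacy.load("en_core_web_sm")  # placeholder model name
        companies = [u"Alphabet Inc.", u"Google"]  # toy terminology list
        matcher = PhraseMatcher(nlp.vocab)
        matcher.add("COMPANIES", None, *[nlp.make_doc(c) for c in companies])
        Token.set_extension("is_company", default=False)

        def company_component(doc):
            # Label matches as ORG entities and flag their tokens
            spans = [Span(doc, start, end, label=doc.vocab.strings[u"ORG"])
                     for _, start, end in matcher(doc)]
            doc.ents = spans
            for span in spans:
                for token in span:
                    token._.is_company = True
            return doc

        nlp.add_pipe(company_component, last=True)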

    +h(3, "custom-components-api")
        |  Custom pipeline components and attribute extensions via a REST API
        +tag-new(2)

    p
        |  This example shows the implementation of a pipeline component
        |  that fetches country meta data via the
        |  #[+a("https://restcountries.eu") REST Countries API], sets entity
        |  annotations for countries, merges entities into one token and
        |  sets custom attributes on the #[code Doc], #[code Span] and
        |  #[code Token] – for example, the capital, latitude/longitude
        |  coordinates and the country flag.

    +github("spacy", "examples/pipeline/custom_component_countries_api.py")
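
    p
        |  The pattern is the same as above, except the component fetches its
        |  data once when it is initialised. A hedged sketch of just that part,
        |  using the #[code requests] library and an illustrative endpoint:

    +code.
        import requests
        from spacy.tokens import Span

        # Fetch the country metadata once, when the component is created
        r = requests.get("https://restcountries.eu/rest/v2/all")
        r.raise_for_status()
        countries = {c["name"]: c for c in r.json()}

        # Expose the metadata via a getter extension on entity spans
        Span.set_extension("capital",
                           getter=lambda span: countries.get(span.text, {}).get("capital"))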

    +h(3, "custom-components-attr-methods") Custom method extensions
        +tag-new(2)

    p
        |  A collection of snippets showing examples of extensions adding
        |  custom methods to the #[code Doc], #[code Token] and
        |  #[code Span].

    +github("spacy", "examples/pipeline/custom_attr_methods.py")
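
    p
        |  Method extensions work like attribute extensions, but expose a
        |  callable that receives the object as its first argument. A small
        |  illustration (the helper and its name are hypothetical):

    +code.
        from spacy.tokens import Doc

        def overlap_tokens(doc, other_doc):
            # Hypothetical helper: tokens whose text also occurs in the other doc
            other_words = set(token.text for token in other_doc)
            return [token for token in doc if token.text in other_words]

        Doc.set_extension("overlap", method=overlap_tokens)
        # doc._.overlap(other_doc) now calls overlap_tokens(doc, other_doc)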

    +h(3, "multi-processing") Multi-processing with Joblib

    p
        |  This example shows how to use multiple cores to process text using
        |  spaCy and #[+a("https://pythonhosted.org/joblib/") Joblib]. We're
        |  exporting part-of-speech-tagged, true-cased, (very roughly)
        |  sentence-separated text, with each "sentence" on a newline, and
        |  spaces between tokens. The data comes from the IMDB movie reviews
        |  dataset and is loaded automatically via Thinc's built-in dataset
        |  loader.

    +github("spacy", "examples/pipeline/multi_processing.py")
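
    p
        |  The core pattern is to partition the texts and hand each batch to a
        |  separate worker. A minimal sketch of that pattern (batch size,
        |  worker count and model name are arbitrary):

    +code.
        import spacy
        from joblib import Parallel, delayed
        from toolz import partition_all

        def process_batch(model_name, batch):
            # Each worker loads its own pipeline, so nothing large is pickled
            nlp = spacy.load(model_name)
            return [[token.tag_ for token in doc] for doc in nlp.pipe(batch)]

        texts = [u"This is a text."] * 10000  # stand-in for the IMDB reviews
        partitions = partition_all(1000, texts)
        executor = Parallel(n_jobs=4)
        tasks = (delayed(process_batch)("en_core_web_sm", list(batch)) for batch in partitions)
        results = executor(tasks)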

+section("training")
    +h(3, "training-ner") Training spaCy's Named Entity Recognizer

    p
        |  This example shows how to update spaCy's entity recognizer
        |  with your own examples, starting off with an existing, pre-trained
        |  model, or from scratch using a blank #[code Language] class.

    +github("spacy", "examples/training/train_ner.py")
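
    p
        |  The v2-style training loop itself is only a few lines. A condensed
        |  sketch with toy training data (the real example uses more):

    +code.
        import random
        import spacy

        TRAIN_DATA = [
            (u"Uber blew through $1 million a week", {"entities": [(0, 4, "ORG")]}),
            (u"Google rebrands its business apps", {"entities": [(0, 6, "ORG")]}),
        ]

        nlp = spacy.blank("en")       # or spacy.load(...) to update an existing model
        ner = nlp.create_pipe("ner")
        nlp.add_pipe(ner)
        ner.add_label("ORG")

        optimizer = nlp.begin_training()
        for i in range(10):
            random.shuffle(TRAIN_DATA)
            losses = {}
            for text, annotations in TRAIN_DATA:
                nlp.update([text], [annotations], sgd=optimizer, losses=losses)
            print(losses)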

    +h(3, "new-entity-type") Training an additional entity type

    p
        |  This script shows how to add a new entity type to an existing
        |  pre-trained NER model. To keep the example short and simple, only
        |  four sentences are provided as examples. In practice, you'll need
        |  many more — a few hundred would be a good start.

    +github("spacy", "examples/training/train_new_entity_type.py")
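
    p
        |  The crucial difference from the previous example is that you add the
        |  new label to an existing, pre-trained pipeline and keep its weights
        |  rather than starting from scratch. Roughly (the label and model name
        |  are placeholders):

    +code.
        import spacy

        nlp = spacy.load("en_core_web_sm")   # placeholder pre-trained model
        ner = nlp.get_pipe("ner")
        ner.add_label("ANIMAL")              # the new entity type
        optimizer = nlp.resume_training()    # recent v2.x; keeps the existing weights
        # ...then run the same nlp.update() loop as above, mixing in examples of
        # the original entity types to avoid "catastrophic forgetting"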

    +h(3, "parser") Training spaCy's Dependency Parser

    p
        |  This example shows how to update spaCy's dependency parser,
        |  starting off with an existing, pre-trained model, or from scratch
        |  using a blank #[code Language] class.

    +github("spacy", "examples/training/train_parser.py")
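
    p
        |  The loop is the same as for the entity recognizer; what changes is
        |  the annotation format, which uses per-token head indices and
        |  dependency labels. A toy illustration of the data shape:

    +code.
        # One training example: the text plus, for each token, the index of its
        # head and its dependency label
        TRAIN_DATA = [
            (u"I like London", {
                "heads": [1, 1, 1],
                "deps": ["nsubj", "ROOT", "dobj"],
            }),
        ]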

    +h(3, "tagger") Training spaCy's Part-of-speech Tagger

    p
        |  In this example, we're training spaCy's part-of-speech tagger with a
        |  custom tag map, mapping our own tags to the
        |  #[+a("http://universaldependencies.github.io/docs/u/pos/index.html") Universal Dependencies scheme].

    +github("spacy", "examples/training/train_tagger.py")
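
    p
        |  A custom tag map simply maps each of your tag strings to properties
        |  in the Universal Dependencies scheme. A toy sketch of the setup
        |  (tags and sentence are illustrative):

    +code.
        import spacy

        # Map custom, coarse tags to Universal Dependencies part-of-speech values
        TAG_MAP = {
            "N": {"pos": "NOUN"},
            "V": {"pos": "VERB"},
            "J": {"pos": "ADJ"},
        }
        TRAIN_DATA = [
            (u"I like green eggs", {"tags": ["N", "V", "J", "N"]}),
        ]

        nlp = spacy.blank("en")
        tagger = nlp.create_pipe("tagger")
        for tag, values in TAG_MAP.items():
            tagger.add_label(tag, values)
        nlp.add_pipe(tagger)
        # ...then train with the same nlp.update() loop shown above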

    +h(3, "intent-parser") Training a custom parser for chat intent semantics

    p
        |  spaCy's parser component can be trained to predict any type
        |  of tree structure over your input text. You can also predict trees
        |  over whole documents or chat logs, with connections between the
        |  sentence-roots used to annotate discourse structure. In this example,
        |  we'll build a message parser for a common "chat intent": finding
        |  local businesses. Our message semantics will have the following types
        |  of relations: #[code ROOT], #[code PLACE], #[code QUALITY],
        |  #[code ATTRIBUTE], #[code TIME] and #[code LOCATION].

    +github("spacy", "examples/training/train_intent_parser.py")
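
    p
        |  In other words, the "dependency labels" are simply your own semantic
        |  relation types. A toy training example in that shape (heads and
        |  labels are illustrative):

    +code.
        # "find a cafe with great wifi": every token points at the word it
        # modifies, using the custom relation types instead of syntactic labels
        TRAIN_DATA = [
            (u"find a cafe with great wifi", {
                "heads": [0, 2, 0, 5, 5, 2],
                "deps": ["ROOT", "-", "PLACE", "-", "QUALITY", "ATTRIBUTE"],
            }),
        ]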

    +h(3, "textcat") Training spaCy's text classifier
        +tag-new(2)

    p
        |  This example shows how to train a multi-label convolutional neural
        |  network text classifier on IMDB movie reviews, using spaCy's new
        |  #[+api("textcategorizer") #[code TextCategorizer]] component. The
        |  dataset will be loaded automatically via Thinc's built-in dataset
        |  loader. Predictions are available via
        |  #[+api("doc#attributes") #[code Doc.cats]].

    +github("spacy", "examples/training/train_textcat.py")
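
    p
        |  The component is added and trained like any other pipe, and the
        |  annotations are dictionaries of category scores. A condensed,
        |  illustrative sketch with toy data:

    +code.
        import spacy

        nlp = spacy.blank("en")
        textcat = nlp.create_pipe("textcat")
        nlp.add_pipe(textcat, last=True)
        textcat.add_label("POSITIVE")

        train_data = [
            (u"This movie was great!", {"cats": {"POSITIVE": 1.0}}),
            (u"Total waste of time.", {"cats": {"POSITIVE": 0.0}}),
        ]
        optimizer = nlp.begin_training()
        for text, annotations in train_data:
            nlp.update([text], [annotations], sgd=optimizer)

        doc = nlp(u"The film was surprisingly good")
        print(doc.cats)  # e.g. {'POSITIVE': 0.73}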

+section("vectors")
    +h(3, "tensorboard") Visualizing spaCy vectors in TensorBoard

    p
        |  These two scripts let you load any spaCy model containing word vectors
        |  into #[+a("https://projector.tensorflow.org/") TensorBoard] to create
        |  an #[+a("https://www.tensorflow.org/versions/r1.1/get_started/embedding_viz") embedding visualization].
        |  The first example uses TensorBoard, the second uses TensorBoard's
        |  standalone embedding projector.

    +github("spacy", "examples/vectors_tensorboard.py")

    +github("spacy", "examples/vectors_tensorboard_standalone.py")
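
    p
        |  The standalone projector only needs two tab-separated files: one with
        |  the vectors and one with the corresponding words. A rough sketch of
        |  exporting them from a model's vocab (model and file names are
        |  arbitrary):

    +code.
        import spacy

        nlp = spacy.load("en_core_web_md")  # placeholder model with word vectors
        with open("vectors.tsv", "w") as vec_file, open("metadata.tsv", "w") as meta_file:
            for word in nlp.vocab:
                if word.has_vector and word.is_lower:
                    vec_file.write("\t".join("{:.6f}".format(v) for v in word.vector) + "\n")
                    meta_file.write(word.text + "\n")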

+section("deep-learning")
    +h(3, "keras") Text classification with Keras

    p
        |  This example shows how to use a #[+a("https://keras.io") Keras]
        |  LSTM sentiment classification model in spaCy. spaCy splits
        |  the document into sentences, and each sentence is classified using
        |  the LSTM. The scores for the sentences are then aggregated to give
        |  the document score. This kind of hierarchical model is quite
        |  difficult in "pure" Keras or TensorFlow, but it's very effective.
        |  The Keras example on this dataset performs quite poorly, because it
        |  cuts off the documents so that they're a fixed size. This hurts
        |  review accuracy a lot, because people often summarise their rating
        |  in the final sentence.

    +github("spacy", "examples/deep_learning_keras.py")
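
    p
        |  The central idea is the per-sentence prediction plus aggregation
        |  step. A hedged sketch of just that step: #[code sentence_model] and
        |  #[code encode] stand in for the trained Keras LSTM and its input
        |  encoding, which the full example builds.

    +code.
        import numpy
        import spacy

        nlp = spacy.load("en_core_web_sm")  # placeholder model that sets sentence boundaries

        def predict_document(text, sentence_model, encode):
            # Score each sentence separately, then average to get the document score
            doc = nlp(text)
            scores = [float(sentence_model.predict(encode(sent))[0]) for sent in doc.sents]
            return numpy.mean(scores)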