import React from 'react'
import PropTypes from 'prop-types'
import { StaticQuery, graphql } from 'gatsby'

import {
    LandingHeader,
    LandingTitle,
    LandingSubtitle,
    LandingGrid,
    LandingCard,
    LandingCol,
    LandingDemo,
    LandingBannerGrid,
    LandingBanner,
} from '../components/landing'
import { H2 } from '../components/typography'
import { Ul, Li } from '../components/list'
import { InlineCode } from '../components/code'
import Button from '../components/button'
import Link from '../components/link'

import QuickstartTraining from './quickstart-training'
import Project from './project'
import courseImage from '../../docs/images/course.jpg'
import prodigyImage from '../../docs/images/prodigy_overview.jpg'
import projectsImage from '../../docs/images/projects.png'
import irlBackground from '../images/spacy-irl.jpg'

import Benchmarks from 'usage/_benchmarks-models.md'

const CODE_EXAMPLE = `# pip install spacy
# python -m spacy download en_core_web_sm
import spacy

# Load English tokenizer, tagger, parser and NER
nlp = spacy.load("en_core_web_sm")

# Process whole documents
text = ("When Sebastian Thrun started working on self-driving cars at "
        "Google in 2007, few people outside of the company took him "
        "seriously. “I can tell you very senior CEOs of major American "
        "car companies would shake my hand and turn away because I wasn’t "
        "worth talking to,” said Thrun, in an interview with Recode earlier "
        "this week.")
doc = nlp(text)

# Analyze syntax
print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])
print("Verbs:", [token.lemma_ for token in doc if token.pos_ == "VERB"])

# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.text, entity.label_)
`

const Landing = ({ data }) => {
    const { counts } = data
    return (
        <>
Industrial-Strength
Natural Language
Processing in Python

spaCy is designed to help you do real work: to build real products, or gather real insights. The library respects your time and tries to avoid wasting it. It's easy to install, and its API is simple and productive.

spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. If your application needs to process entire web dumps, spaCy is the library you want to be using.

In the five years since its release, spaCy has become an industry standard with a huge ecosystem. Choose from a variety of plugins, integrate with your machine learning stack and build custom components and workflows.

{CODE_EXAMPLE}

Features

✅ Support for {counts.langs}+ languages
✅ {counts.models} trained pipelines for {counts.modelLangs} languages
✅ Multi-task learning with pretrained transformers like BERT
✅ Pretrained word vectors
✅ State-of-the-art speed
✅ Production-ready training system
✅ Linguistically-motivated tokenization
✅ Components for named entity recognition, part-of-speech tagging, dependency parsing, sentence segmentation, text classification, lemmatization, morphological analysis, entity linking and more
✅ Easily extensible with custom components and attributes
✅ Support for custom models in PyTorch, TensorFlow and other frameworks
✅ Built-in visualizers for syntax and NER
✅ Easy model packaging, deployment and workflow management
✅ Robust, rigorously evaluated accuracy

spaCy v3.0 features all-new transformer-based pipelines that bring spaCy's accuracy right up to the current state of the art. You can use any pretrained transformer to train your own pipelines, and even share one transformer between multiple components with multi-task learning. Training is now fully configurable and extensible, and you can define your own custom models using PyTorch, TensorFlow and other frameworks. The new spaCy projects system lets you describe whole end-to-end workflows in a single file, giving you an easy path from prototype to production, and making it easy to clone and adapt best-practice projects for your own use cases. {/** Update image */}

Prodigy: Radically efficient machine teaching

Prodigy is an annotation tool so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration. Whether you're working on entity recognition, intent detection or image classification, Prodigy can help you train and evaluate your models faster.

Reproducible training for custom pipelines

spaCy v3.0 introduces a comprehensive and extensible system for configuring your training runs. Your configuration file will describe every detail of your training run, with no hidden defaults, making it easy to rerun your experiments and track changes. You can use the quickstart widget or the init config command to get started, or clone a project template for an end-to-end workflow.

The easiest way to get started is to clone a project template and run it – for example, this template for training a part-of-speech tagger and dependency parser on a Universal Dependencies treebank.

End-to-end workflows from prototype to production

spaCy's new project system gives you a smooth path from prototype to production. It lets you keep track of all those data transformation, preprocessing and training steps, so you can make sure your project is always ready to hand over for automation. It features source asset download, command execution, checksum verification, and caching with a variety of backends and integrations.

Advanced NLP with spaCy: A free online course

In this free and interactive online course you’ll learn how to use spaCy to build advanced natural language understanding systems, using both rule-based and machine learning approaches. It includes 55 exercises featuring videos, slide decks, multiple-choice questions and interactive coding practice in the browser.

We were pleased to invite the spaCy community and other folks working on NLP to Berlin for a small and intimate event. We booked a beautiful venue, hand-picked an awesome lineup of speakers and scheduled plenty of social time to get to know each other. The YouTube playlist includes 12 talks about NLP research, development and applications, with keynotes by Sebastian Ruder (DeepMind) and Yoav Goldberg (Allen AI).

Benchmarks

spaCy v3.0 introduces transformer-based pipelines that bring spaCy's accuracy right up to the current state of the art. You can also use a CPU-optimized pipeline, which is less accurate but much cheaper to run.

        </>
    )
}

Landing.propTypes = {
    data: PropTypes.shape({
        repo: PropTypes.string,
        languages: PropTypes.arrayOf(
            PropTypes.shape({
                models: PropTypes.arrayOf(PropTypes.string),
            })
        ),
    }),
}

export default () => (
    <StaticQuery query={landingQuery} render={({ site }) => <Landing data={site.siteMetadata} />} />
)

const landingQuery = graphql`
    query LandingQuery {
        site {
            siteMetadata {
                nightly
                repo
                counts {
                    langs
                    modelLangs
                    starterLangs
                    models
                    starters
                }
            }
        }
    }
`