//- 💫 LANDING PAGE
include _includes/_mixins
+landing-header
h1.c-landing__title.u-heading-0
| Industrial-Strength#[br]
| Natural Language#[br]
| Processing
h2.c-landing__title.o-block.u-heading-3
span.u-text-label.u-text-label--light in Python
+grid.o-content.c-landing__blocks
+grid-col("third").c-landing__card.o-card.o-grid.o-grid--space
+h(3) Fastest in the world
p
| spaCy excels at large-scale information extraction tasks.
| It's written from the ground up in carefully memory-managed
| Cython. Independent research has confirmed that spaCy has the
| fastest syntactic parser in the world. If your application needs to
| process entire web dumps, spaCy is the library you want to
| be using.
+button("/usage/facts-figures", true, "primary")
| Facts & figures
+grid-col("third").c-landing__card.o-card.o-grid.o-grid--space
+h(3) Get things done
p
| spaCy is designed to help you do real work — to build real
| products, or gather real insights. The library respects
| your time, and tries to avoid wasting it. It's easy to
| install, and its API is simple and productive. We like to
| think of spaCy as the Ruby on Rails of Natural Language
| Processing.
+button("/usage", true, "primary")
| Get started
+grid-col("third").c-landing__card.o-card.o-grid.o-grid--space
+h(3) Deep learning
p
| spaCy is the best way to prepare text for deep learning.
| It interoperates seamlessly with TensorFlow, PyTorch,
| scikit-learn, Gensim and the rest of Python's awesome AI
| ecosystem. With spaCy, you can easily construct linguistically
| sophisticated statistical models for a variety of NLP problems.
+button("/usage/training", true, "primary")
| Read more
.o-content
+grid
+grid-col("two-thirds")
+terminal("lightning_tour.py", "More examples", "/usage/spacy-101#lightning-tour").
# Install: pip install spacy && python -m spacy download en
import spacy
# Load English tokenizer, tagger, parser, NER and word vectors
nlp = spacy.load('en')
# Process a document, of any size
text = open('war_and_peace.txt').read()
doc = nlp(text)
# Find named entities, phrases and concepts
for entity in doc.ents:
print(entity.text, entity.label_)
# Determine semantic similarities
doc1 = nlp(u'the fries were gross')
doc2 = nlp(u'worst fries ever')
doc1.similarity(doc2)
# Hook in your own deep learning models
nlp.add_pipe(load_my_model(), before='parser')
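# (load_my_model() is a placeholder — any callable that takes and returns a Doc can be added to the pipeline)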
+grid-col("third")
+h(2) Features
+list
+item Non-destructive #[strong tokenization]
+item #[strong Named entity] recognition
+item Support for #[strong #{LANG_COUNT}+ languages]
+item #[strong #{MODEL_COUNT} statistical models] for #{MODEL_LANG_COUNT} languages
+item Pre-trained #[strong word vectors]
+item Easy #[strong deep learning] integration
+item Part-of-speech tagging
+item Labelled dependency parsing
+item Syntax-driven sentence segmentation
+item Built-in #[strong visualizers] for syntax and NER
+item Convenient string-to-hash mapping
+item Export to numpy data arrays
+item Efficient binary serialization
+item Easy #[strong model packaging] and deployment
+item State-of-the-art speed
+item Robust, rigorously evaluated accuracy
+landing-banner("Convolutional neural network models", "New in v2.0")
p
| spaCy v2.0 features new neural models for #[strong tagging],
| #[strong parsing] and #[strong entity recognition]. The models have
| been designed and implemented from scratch specifically for spaCy, to
| give you an unmatched balance of speed, size and accuracy. A novel
| bloom embedding strategy with subword features is used to support
| huge vocabularies in tiny tables. Convolutional layers with residual
| connections, layer normalization and maxout non-linearity are used,
| giving much better efficiency than the standard BiLSTM solution.
| Finally, the parser and NER use an imitation learning objective to
| deliver accuracy in line with the latest research systems,
| even when evaluated from raw text. With these innovations, spaCy
| v2.0's models are #[strong 10× smaller],
| #[strong 20% more accurate], and #[strong even cheaper to run] than
| the previous generation.
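p
| A minimal sketch of the bloom embedding idea follows — it illustrates the
| hashing trick only and is not spaCy's internal implementation. Each word is
| hashed into a small table several times and the rows are summed, so a huge
| vocabulary can share a tiny embedding matrix. The #[code mmh3] hash, the
| table size and the number of seeds are arbitrary choices for the example.

+code("Bloom embedding idea (illustrative sketch)").
    import numpy
    import mmh3  # any stable string hash works; mmh3 is just an example

    # Tiny table: 20,000 rows x 128 dims, instead of one row per vocabulary item
    table = numpy.random.uniform(-0.1, 0.1, (20000, 128))

    def embed(word, seeds=(0, 1, 2, 3)):
        # Hash the word several times; the combination of rows is near-unique,
        # so collisions in any single row are tolerated
        rows = [mmh3.hash(word, seed) % table.shape[0] for seed in seeds]
        return table[rows].sum(axis=0)

    vector = embed(u'linguistics')  # 128-dimensional vector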
.o-block-small.u-text-right
+button("/models", true, "secondary-light") Download models
+landing-logos("spaCy is trusted by", logos)
+button(gh("spacy") + "/stargazers", false, "secondary", "small")
| and many more
+landing-logos("Featured on", features).o-block-small
+landing-banner("Prodigy: Radically efficient machine teaching", "From the makers of spaCy")
p
| Prodigy is an #[strong annotation tool] so efficient that data scientists can
| do the annotation themselves, enabling a new level of rapid
| iteration. Whether you're working on entity recognition, intent
| detection or image classification, Prodigy can help you
| #[strong train and evaluate] your models faster. Stream in your own examples or
| real-world data from live APIs, update your model in real-time and
| chain models together to build more complex systems.
.o-block-small.u-text-right
+button("https://prodi.gy", true, "secondary-light") Try it out
.o-content
+grid
+grid-col("half")
+h(2) Benchmarks
p
| In 2015, independent researchers from Emory University and
| Yahoo! Labs showed that spaCy offered the
| #[strong fastest syntactic parser in the world] and that its
| accuracy was #[strong within 1% of the best] available
| (#[+a("https://aclweb.org/anthology/P/P15/P15-1038.pdf") Choi et al., 2015]).
| spaCy v2.0, released in 2017, is more accurate than any of
| the systems Choi et al. evaluated.
.o-inline-list
+button("/usage/facts-figures#benchmarks", true, "secondary") See details
+grid-col("half")
include usage/_facts-figures/_benchmarks-choi-2015