spaCy/website/index.jade
2017-11-26 18:04:04 +01:00

157 lines
6.8 KiB
Plaintext

//- 💫 LANDING PAGE
include _includes/_mixins
+landing-header
h1.c-landing__title.u-heading-0
| Industrial-Strength#[br]
| Natural Language#[br]
| Processing
h2.c-landing__title.o-block.u-heading-3
span.u-text-label.u-text-label--light in Python
+grid.o-content.c-landing__blocks
+grid-col("third").c-landing__card.o-card.o-grid.o-grid--space
+h(3) Fastest in the world
p
| spaCy excels at large-scale information extraction tasks.
| It's written from the ground up in carefully memory-managed
| Cython. Independent research has confirmed that spaCy is
| the fastest in the world. If your application needs to
| process entire web dumps, spaCy is the library you want to
| be using.
+button("/usage/facts-figures", true, "primary")
| Facts & figures
+grid-col("third").c-landing__card.o-card.o-grid.o-grid--space
+h(3) Get things done
p
| spaCy is designed to help you do real work — to build real
| products, or gather real insights. The library respects
| your time, and tries to avoid wasting it. It's easy to
| install, and its API is simple and productive. We like to
| think of spaCy as the Ruby on Rails of Natural Language
| Processing.
+button("/usage", true, "primary")
| Get started
+grid-col("third").c-landing__card.o-card.o-grid.o-grid--space
+h(3) Deep learning
p
| spaCy is the best way to prepare text for deep learning.
| It interoperates seamlessly with TensorFlow, PyTorch,
| scikit-learn, Gensim and the rest of Python's awesome AI
| ecosystem. With spaCy, you can easily construct linguistically
| sophisticated statistical models for a variety of NLP problems.
+button("/usage/training", true, "primary")
| Read more
.o-content
+grid
+grid-col("two-thirds")
+terminal("lightning_tour.py", "More examples", "/usage/spacy-101#lightning-tour").
# Install: pip install spacy && python -m spacy download en
import spacy
# Load English tokenizer, tagger, parser, NER and word vectors
nlp = spacy.load('en')
# Process whole documents
text = open('customer_feedback_627.txt').read()
doc = nlp(text)
# Find named entities, phrases and concepts
for entity in doc.ents:
print(entity.text, entity.label_)
# Determine semantic similarities
doc1 = nlp(u'the fries were gross')
doc2 = nlp(u'worst fries ever')
doc1.similarity(doc2)
# Hook in your own deep learning models
nlp.add_pipe(load_my_model(), before='parser')
+grid-col("third")
+h(2) Features
+list
+item Non-destructive #[strong tokenization]
+item #[strong Named entity] recognition
+item Support for #[strong #{LANG_COUNT}+ languages]
+item #[strong #{MODEL_COUNT} statistical models] for #{MODEL_LANG_COUNT} languages
+item Pre-trained #[strong word vectors]
+item Easy #[strong deep learning] integration
+item Part-of-speech tagging
+item Labelled dependency parsing
+item Syntax-driven sentence segmentation
+item Built in #[strong visualizers] for syntax and NER
+item Convenient string-to-hash mapping
+item Export to numpy data arrays
+item Efficient binary serialization
+item Easy #[strong model packaging] and deployment
+item State-of-the-art speed
+item Robust, rigorously evaluated accuracy
+landing-banner("Convolutional neural network models", "New in v2.0")
p
| spaCy v2.0 features new neural models for #[strong tagging],
| #[strong parsing] and #[strong entity recognition]. The models have
| been designed and implemented from scratch specifically for spaCy, to
| give you an unmatched balance of speed, size and accuracy. A novel
| bloom embedding strategy with subword features is used to support
| huge vocabularies in tiny tables. Convolutional layers with residual
| connections, layer normalization and maxout non-linearity are used,
| giving much better efficiency than the standard BiLSTM solution.
| Finally, the parser and NER use an imitation learning objective to
| deliver accuracy in-line with the latest research systems,
| even when evaluated from raw text. With these innovations, spaCy
| v2.0's models are #[strong 10× smaller],
| #[strong 20% more accurate], and #[strong even cheaper to run] than
| the previous generation.
.o-block-small.u-text-right
+button("/models", true, "secondary-light") Download models
+landing-logos("spaCy is trusted by", logos)
+button(gh("spacy") + "/stargazers", false, "secondary", "small")
| and many more
+landing-logos("Featured on", features).o-block-small
+landing-banner("Prodigy: Radically efficient machine teaching", "From the makers of spaCy")
p
| Prodigy is an #[strong annotation tool] so efficient that data scientists can
| do the annotation themselves, enabling a new level of rapid
| iteration. Whether you're working on entity recognition, intent
| detection or image classification, Prodigy can help you
| #[strong train and evaluate] your models faster. Stream in your own examples or
| real-world data from live APIs, update your model in real-time and
| chain models together to build more complex systems.
.o-block-small.u-text-right
+button("https://prodi.gy", true, "secondary-light") Try it out
.o-content
+grid
+grid-col("half")
+h(2) Benchmarks
p
| In 2015, independent researchers from Emory University and
| Yahoo! Labs showed that spaCy offered the
| #[strong fastest syntactic parser in the world] and that its
| accuracy was #[strong within 1% of the best] available
| (#[+a("https://aclweb.org/anthology/P/P15/P15-1038.pdf") Choi et al., 2015]).
| spaCy v2.0, released in 2017, is more accurate than any of
| the systems Choi et al. evaluated.
.o-inline-list
+button("/usage/facts-figures#benchmarks", true, "secondary") See details
+grid-col("half")
include usage/_facts-figures/_benchmarks-choi-2015