spaCy/website/index.jade
Ines Montani 49cee4af92
💫 Interactive code examples, spaCy Universe and various docs improvements (#2274)
* Integrate Python kernel via Binder

* Add live model test for languages with examples

* Update docs and code examples

* Adjust margin (if not bootstrapped)

* Add binder version to global config

* Update terminal and executable code mixins

* Pass attributes through infobox and section

* Hide v-cloak

* Fix example

* Take out model comparison for now

* Add meta text for compat

* Remove chart.js dependency

* Tidy up and simplify JS and port big components over to Vue

* Remove chartjs example

* Add Twitter icon

* Add purple stylesheet option

* Add utility for hand cursor (special cases only)

* Add transition classes

* Add small option for section

* Add thumb object for small round thumbnail images

* Allow unset code block language via "none" value

(workaround to still allow unset language to default to DEFAULT_SYNTAX)

* Pass through attributes

* Add syntax highlighting definitions for Julia, R and Docker

* Add website icon

* Remove user survey from navigation

* Don't hide GitHub icon on small screens

* Make top navigation scrollable on small screens

* Remove old resources page and references to it

* Add Universe

* Add helper functions for better page URL and title

* Update site description

* Increment versions

* Update preview images

* Update mentions of resources

* Fix image

* Fix social images

* Fix problem with cover sizing and floats

* Add divider and move badges into heading

* Add docstrings

* Reference converting section

* Add section on converting word vectors

* Move converting section to custom section and fix formatting

* Remove old fastText example

* Move extensions content to own section

Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary)

* Use better component example and add factories section

* Add note on larger model

* Use better example for non-vector

* Remove similarity in context section

Only works via small models with tensors so has always been kind of confusing

* Add note on init-model command

* Fix lightning tour examples and make excutable if possible

* Add spacy train CLI section to train

* Fix formatting and add video

* Fix formatting

* Fix textcat example description (resolves #2246)

* Add dummy file to try resolve conflict

* Delete dummy file

* Tidy up [ci skip]

* Ensure sufficient height of loading container

* Add loading animation to universe

* Update Thebelab build and use better startup message

* Fix asset versioning

* Fix typo [ci skip]

* Add note on project idea label
2018-04-29 02:06:46 +02:00

163 lines
7.3 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

//- 💫 LANDING PAGE
include _includes/_mixins
+landing-header
h1.c-landing__title.u-heading-0
| Industrial-Strength#[br]
| Natural Language#[br]
| Processing
h2.c-landing__title.o-block.u-heading-3
span.u-text-label.u-text-label--light in Python
+grid.o-content.c-landing__blocks
+grid-col("third").c-landing__card.o-card.o-grid.o-grid--space
+h(3) Fastest in the world
p
| spaCy excels at large-scale information extraction tasks.
| It's written from the ground up in carefully memory-managed
| Cython. Independent research has confirmed that spaCy is
| the fastest in the world. If your application needs to
| process entire web dumps, spaCy is the library you want to
| be using.
+button("/usage/facts-figures", true, "primary")
| Facts & figures
+grid-col("third").c-landing__card.o-card.o-grid.o-grid--space
+h(3) Get things done
p
| spaCy is designed to help you do real work — to build real
| products, or gather real insights. The library respects
| your time, and tries to avoid wasting it. It's easy to
| install, and its API is simple and productive. We like to
| think of spaCy as the Ruby on Rails of Natural Language
| Processing.
+button("/usage", true, "primary")
| Get started
+grid-col("third").c-landing__card.o-card.o-grid.o-grid--space
+h(3) Deep learning
p
| spaCy is the best way to prepare text for deep learning.
| It interoperates seamlessly with TensorFlow, PyTorch,
| scikit-learn, Gensim and the rest of Python's awesome AI
| ecosystem. With spaCy, you can easily construct linguistically
| sophisticated statistical models for a variety of NLP problems.
+button("/usage/training", true, "primary")
| Read more
.o-content
+grid
+grid-col("two-thirds")
+code-exec("Edit the code & try spaCy", true).
# pip install spacy
# python -m spacy download en_core_web_sm
import spacy
# Load English tokenizer, tagger, parser, NER and word vectors
nlp = spacy.load('en_core_web_sm')
# Process whole documents
text = (u"When Sebastian Thrun started working on self-driving cars at "
u"Google in 2007, few people outside of the company took him "
u"seriously. “I can tell you very senior CEOs of major American "
u"car companies would shake my hand and turn away because I wasnt "
u"worth talking to,” said Thrun, now the co-founder and CEO of "
u"online higher education startup Udacity, in an interview with "
u"Recode earlier this week.")
doc = nlp(text)
# Find named entities, phrases and concepts
for entity in doc.ents:
print(entity.text, entity.label_)
# Determine semantic similarities
doc1 = nlp(u"my fries were super gross")
doc2 = nlp(u"such disgusting fries")
similarity = doc1.similarity(doc2)
print(doc1.text, doc2.text, similarity)
+grid-col("third")
+h(2) Features
+list
+item Non-destructive #[strong tokenization]
+item #[strong Named entity] recognition
+item Support for #[strong #{LANG_COUNT}+ languages]
+item #[strong #{MODEL_COUNT} statistical models] for #{MODEL_LANG_COUNT} languages
+item Pre-trained #[strong word vectors]
+item Easy #[strong deep learning] integration
+item Part-of-speech tagging
+item Labelled dependency parsing
+item Syntax-driven sentence segmentation
+item Built in #[strong visualizers] for syntax and NER
+item Convenient string-to-hash mapping
+item Export to numpy data arrays
+item Efficient binary serialization
+item Easy #[strong model packaging] and deployment
+item State-of-the-art speed
+item Robust, rigorously evaluated accuracy
+landing-banner("Convolutional neural network models", "New in v2.0")
p
| spaCy v2.0 features new neural models for #[strong tagging],
| #[strong parsing] and #[strong entity recognition]. The models have
| been designed and implemented from scratch specifically for spaCy, to
| give you an unmatched balance of speed, size and accuracy. A novel
| bloom embedding strategy with subword features is used to support
| huge vocabularies in tiny tables. Convolutional layers with residual
| connections, layer normalization and maxout non-linearity are used,
| giving much better efficiency than the standard BiLSTM solution.
| Finally, the parser and NER use an imitation learning objective to
| deliver accuracy in-line with the latest research systems,
| even when evaluated from raw text. With these innovations, spaCy
| v2.0's models are #[strong 10× smaller],
| #[strong 20% more accurate], and #[strong even cheaper to run] than
| the previous generation.
.o-block-small.u-text-right
+button("/models", true, "secondary-light") Download models
+landing-logos("spaCy is trusted by", logos)
+button(gh("spacy") + "/stargazers", false, "secondary", "small")
| and many more
+landing-logos("Featured on", features).o-block-small
+landing-banner("Prodigy: Radically efficient machine teaching", "From the makers of spaCy")
p
| Prodigy is an #[strong annotation tool] so efficient that data scientists can
| do the annotation themselves, enabling a new level of rapid
| iteration. Whether you're working on entity recognition, intent
| detection or image classification, Prodigy can help you
| #[strong train and evaluate] your models faster. Stream in your own examples or
| real-world data from live APIs, update your model in real-time and
| chain models together to build more complex systems.
.o-block-small.u-text-right
+button("https://prodi.gy", true, "secondary-light") Try it out
.o-content
+grid
+grid-col("half")
+h(2) Benchmarks
p
| In 2015, independent researchers from Emory University and
| Yahoo! Labs showed that spaCy offered the
| #[strong fastest syntactic parser in the world] and that its
| accuracy was #[strong within 1% of the best] available
| (#[+a("https://aclweb.org/anthology/P/P15/P15-1038.pdf") Choi et al., 2015]).
| spaCy v2.0, released in 2017, is more accurate than any of
| the systems Choi et al. evaluated.
.o-inline-list
+button("/usage/facts-figures#benchmarks", true, "secondary") See details
+grid-col("half")
include usage/_facts-figures/_benchmarks-choi-2015