spaCy/website/index.jade

163 lines
7.3 KiB
Plaintext
Raw Normal View History

2016-10-03 21:19:13 +03:00
//- 💫 LANDING PAGE
2016-03-31 17:24:48 +03:00
2016-10-03 21:19:13 +03:00
include _includes/_mixins
2016-03-31 17:24:48 +03:00
2016-10-31 21:04:15 +03:00
+landing-header
h1.c-landing__title.u-heading-0
| Industrial-Strength#[br]
| Natural Language#[br]
| Processing
2017-10-03 15:28:18 +03:00
h2.c-landing__title.o-block.u-heading-3
span.u-text-label.u-text-label--light in Python
+grid.o-content.c-landing__blocks
+grid-col("third").c-landing__card.o-card.o-grid.o-grid--space
+h(3) Fastest in the world
p
| spaCy excels at large-scale information extraction tasks.
| It's written from the ground up in carefully memory-managed
| Cython. Independent research has confirmed that spaCy is
| the fastest in the world. If your application needs to
| process entire web dumps, spaCy is the library you want to
| be using.
+button("/usage/facts-figures", true, "primary")
| Facts & figures
+grid-col("third").c-landing__card.o-card.o-grid.o-grid--space
+h(3) Get things done
p
| spaCy is designed to help you do real work — to build real
| products, or gather real insights. The library respects
| your time, and tries to avoid wasting it. It's easy to
| install, and its API is simple and productive. We like to
| think of spaCy as the Ruby on Rails of Natural Language
| Processing.
+button("/usage", true, "primary")
| Get started
+grid-col("third").c-landing__card.o-card.o-grid.o-grid--space
+h(3) Deep learning
p
| spaCy is the best way to prepare text for deep learning.
| It interoperates seamlessly with TensorFlow, PyTorch,
2017-11-07 16:48:17 +03:00
| scikit-learn, Gensim and the rest of Python's awesome AI
| ecosystem. With spaCy, you can easily construct linguistically
| sophisticated statistical models for a variety of NLP problems.
2017-10-03 15:28:18 +03:00
2017-11-07 16:48:17 +03:00
+button("/usage/training", true, "primary")
2017-10-03 15:28:18 +03:00
| Read more
2016-10-31 21:04:15 +03:00
.o-content
2016-03-31 17:24:48 +03:00
+grid
2016-10-31 21:04:15 +03:00
+grid-col("two-thirds")
💫 Interactive code examples, spaCy Universe and various docs improvements (#2274) * Integrate Python kernel via Binder * Add live model test for languages with examples * Update docs and code examples * Adjust margin (if not bootstrapped) * Add binder version to global config * Update terminal and executable code mixins * Pass attributes through infobox and section * Hide v-cloak * Fix example * Take out model comparison for now * Add meta text for compat * Remove chart.js dependency * Tidy up and simplify JS and port big components over to Vue * Remove chartjs example * Add Twitter icon * Add purple stylesheet option * Add utility for hand cursor (special cases only) * Add transition classes * Add small option for section * Add thumb object for small round thumbnail images * Allow unset code block language via "none" value (workaround to still allow unset language to default to DEFAULT_SYNTAX) * Pass through attributes * Add syntax highlighting definitions for Julia, R and Docker * Add website icon * Remove user survey from navigation * Don't hide GitHub icon on small screens * Make top navigation scrollable on small screens * Remove old resources page and references to it * Add Universe * Add helper functions for better page URL and title * Update site description * Increment versions * Update preview images * Update mentions of resources * Fix image * Fix social images * Fix problem with cover sizing and floats * Add divider and move badges into heading * Add docstrings * Reference converting section * Add section on converting word vectors * Move converting section to custom section and fix formatting * Remove old fastText example * Move extensions content to own section Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary) * Use better component example and add factories section * Add note on larger model * Use better example for non-vector * Remove similarity in context section Only works via small models with tensors so has always been kind of confusing * Add note on init-model command * Fix lightning tour examples and make excutable if possible * Add spacy train CLI section to train * Fix formatting and add video * Fix formatting * Fix textcat example description (resolves #2246) * Add dummy file to try resolve conflict * Delete dummy file * Tidy up [ci skip] * Ensure sufficient height of loading container * Add loading animation to universe * Update Thebelab build and use better startup message * Fix asset versioning * Fix typo [ci skip] * Add note on project idea label
2018-04-29 03:06:46 +03:00
+code-exec("Edit the code & try spaCy", true).
# pip install spacy
# python -m spacy download en_core_web_sm
2016-10-31 21:04:15 +03:00
import spacy
# Load English tokenizer, tagger, parser, NER and word vectors
💫 Interactive code examples, spaCy Universe and various docs improvements (#2274) * Integrate Python kernel via Binder * Add live model test for languages with examples * Update docs and code examples * Adjust margin (if not bootstrapped) * Add binder version to global config * Update terminal and executable code mixins * Pass attributes through infobox and section * Hide v-cloak * Fix example * Take out model comparison for now * Add meta text for compat * Remove chart.js dependency * Tidy up and simplify JS and port big components over to Vue * Remove chartjs example * Add Twitter icon * Add purple stylesheet option * Add utility for hand cursor (special cases only) * Add transition classes * Add small option for section * Add thumb object for small round thumbnail images * Allow unset code block language via "none" value (workaround to still allow unset language to default to DEFAULT_SYNTAX) * Pass through attributes * Add syntax highlighting definitions for Julia, R and Docker * Add website icon * Remove user survey from navigation * Don't hide GitHub icon on small screens * Make top navigation scrollable on small screens * Remove old resources page and references to it * Add Universe * Add helper functions for better page URL and title * Update site description * Increment versions * Update preview images * Update mentions of resources * Fix image * Fix social images * Fix problem with cover sizing and floats * Add divider and move badges into heading * Add docstrings * Reference converting section * Add section on converting word vectors * Move converting section to custom section and fix formatting * Remove old fastText example * Move extensions content to own section Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary) * Use better component example and add factories section * Add note on larger model * Use better example for non-vector * Remove similarity in context section Only works via small models with tensors so has always been kind of confusing * Add note on init-model command * Fix lightning tour examples and make excutable if possible * Add spacy train CLI section to train * Fix formatting and add video * Fix formatting * Fix textcat example description (resolves #2246) * Add dummy file to try resolve conflict * Delete dummy file * Tidy up [ci skip] * Ensure sufficient height of loading container * Add loading animation to universe * Update Thebelab build and use better startup message * Fix asset versioning * Fix typo [ci skip] * Add note on project idea label
2018-04-29 03:06:46 +03:00
nlp = spacy.load('en_core_web_sm')
2016-10-31 21:04:15 +03:00
2017-11-26 20:04:04 +03:00
# Process whole documents
💫 Interactive code examples, spaCy Universe and various docs improvements (#2274) * Integrate Python kernel via Binder * Add live model test for languages with examples * Update docs and code examples * Adjust margin (if not bootstrapped) * Add binder version to global config * Update terminal and executable code mixins * Pass attributes through infobox and section * Hide v-cloak * Fix example * Take out model comparison for now * Add meta text for compat * Remove chart.js dependency * Tidy up and simplify JS and port big components over to Vue * Remove chartjs example * Add Twitter icon * Add purple stylesheet option * Add utility for hand cursor (special cases only) * Add transition classes * Add small option for section * Add thumb object for small round thumbnail images * Allow unset code block language via "none" value (workaround to still allow unset language to default to DEFAULT_SYNTAX) * Pass through attributes * Add syntax highlighting definitions for Julia, R and Docker * Add website icon * Remove user survey from navigation * Don't hide GitHub icon on small screens * Make top navigation scrollable on small screens * Remove old resources page and references to it * Add Universe * Add helper functions for better page URL and title * Update site description * Increment versions * Update preview images * Update mentions of resources * Fix image * Fix social images * Fix problem with cover sizing and floats * Add divider and move badges into heading * Add docstrings * Reference converting section * Add section on converting word vectors * Move converting section to custom section and fix formatting * Remove old fastText example * Move extensions content to own section Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary) * Use better component example and add factories section * Add note on larger model * Use better example for non-vector * Remove similarity in context section Only works via small models with tensors so has always been kind of confusing * Add note on init-model command * Fix lightning tour examples and make excutable if possible * Add spacy train CLI section to train * Fix formatting and add video * Fix formatting * Fix textcat example description (resolves #2246) * Add dummy file to try resolve conflict * Delete dummy file * Tidy up [ci skip] * Ensure sufficient height of loading container * Add loading animation to universe * Update Thebelab build and use better startup message * Fix asset versioning * Fix typo [ci skip] * Add note on project idea label
2018-04-29 03:06:46 +03:00
text = (u"When Sebastian Thrun started working on self-driving cars at "
u"Google in 2007, few people outside of the company took him "
u"seriously. “I can tell you very senior CEOs of major American "
u"car companies would shake my hand and turn away because I wasnt "
u"worth talking to,” said Thrun, now the co-founder and CEO of "
u"online higher education startup Udacity, in an interview with "
u"Recode earlier this week.")
2016-10-31 21:04:15 +03:00
doc = nlp(text)
2017-11-01 21:49:36 +03:00
# Find named entities, phrases and concepts
for entity in doc.ents:
print(entity.text, entity.label_)
2016-10-31 21:04:15 +03:00
2017-11-01 21:49:36 +03:00
# Determine semantic similarities
💫 Interactive code examples, spaCy Universe and various docs improvements (#2274) * Integrate Python kernel via Binder * Add live model test for languages with examples * Update docs and code examples * Adjust margin (if not bootstrapped) * Add binder version to global config * Update terminal and executable code mixins * Pass attributes through infobox and section * Hide v-cloak * Fix example * Take out model comparison for now * Add meta text for compat * Remove chart.js dependency * Tidy up and simplify JS and port big components over to Vue * Remove chartjs example * Add Twitter icon * Add purple stylesheet option * Add utility for hand cursor (special cases only) * Add transition classes * Add small option for section * Add thumb object for small round thumbnail images * Allow unset code block language via "none" value (workaround to still allow unset language to default to DEFAULT_SYNTAX) * Pass through attributes * Add syntax highlighting definitions for Julia, R and Docker * Add website icon * Remove user survey from navigation * Don't hide GitHub icon on small screens * Make top navigation scrollable on small screens * Remove old resources page and references to it * Add Universe * Add helper functions for better page URL and title * Update site description * Increment versions * Update preview images * Update mentions of resources * Fix image * Fix social images * Fix problem with cover sizing and floats * Add divider and move badges into heading * Add docstrings * Reference converting section * Add section on converting word vectors * Move converting section to custom section and fix formatting * Remove old fastText example * Move extensions content to own section Keep weird ID to not break permalinks for now (we don't want to rewrite URLs if not absolutely necessary) * Use better component example and add factories section * Add note on larger model * Use better example for non-vector * Remove similarity in context section Only works via small models with tensors so has always been kind of confusing * Add note on init-model command * Fix lightning tour examples and make excutable if possible * Add spacy train CLI section to train * Fix formatting and add video * Fix formatting * Fix textcat example description (resolves #2246) * Add dummy file to try resolve conflict * Delete dummy file * Tidy up [ci skip] * Ensure sufficient height of loading container * Add loading animation to universe * Update Thebelab build and use better startup message * Fix asset versioning * Fix typo [ci skip] * Add note on project idea label
2018-04-29 03:06:46 +03:00
doc1 = nlp(u"my fries were super gross")
doc2 = nlp(u"such disgusting fries")
similarity = doc1.similarity(doc2)
print(doc1.text, doc2.text, similarity)
2017-11-01 21:49:36 +03:00
2016-10-31 21:04:15 +03:00
+grid-col("third")
+h(2) Features
+list
+item Non-destructive #[strong tokenization]
2017-10-24 22:38:53 +03:00
+item #[strong Named entity] recognition
2017-10-03 15:28:18 +03:00
+item Support for #[strong #{LANG_COUNT}+ languages]
+item #[strong #{MODEL_COUNT} statistical models] for #{MODEL_LANG_COUNT} languages
2016-10-31 21:04:15 +03:00
+item Pre-trained #[strong word vectors]
2017-10-03 15:28:18 +03:00
+item Easy #[strong deep learning] integration
2016-10-31 21:04:15 +03:00
+item Part-of-speech tagging
+item Labelled dependency parsing
2017-10-03 15:28:18 +03:00
+item Syntax-driven sentence segmentation
+item Built in #[strong visualizers] for syntax and NER
+item Convenient string-to-hash mapping
2016-10-31 21:04:15 +03:00
+item Export to numpy data arrays
+item Efficient binary serialization
2017-10-03 15:28:18 +03:00
+item Easy #[strong model packaging] and deployment
2016-10-31 21:04:15 +03:00
+item State-of-the-art speed
+item Robust, rigorously evaluated accuracy
2017-10-03 15:28:18 +03:00
+landing-banner("Convolutional neural network models", "New in v2.0")
p
| spaCy v2.0 features new neural models for #[strong tagging],
| #[strong parsing] and #[strong entity recognition]. The models have
| been designed and implemented from scratch specifically for spaCy, to
| give you an unmatched balance of speed, size and accuracy. A novel
| bloom embedding strategy with subword features is used to support
| huge vocabularies in tiny tables. Convolutional layers with residual
| connections, layer normalization and maxout non-linearity are used,
| giving much better efficiency than the standard BiLSTM solution.
| Finally, the parser and NER use an imitation learning objective to
| deliver accuracy in-line with the latest research systems,
| even when evaluated from raw text. With these innovations, spaCy
| v2.0's models are #[strong 10× smaller],
2017-11-06 23:15:36 +03:00
| #[strong 20% more accurate], and #[strong even cheaper to run] than
| the previous generation.
2017-10-03 15:28:18 +03:00
.o-block-small.u-text-right
+button("/models", true, "secondary-light") Download models
+landing-logos("spaCy is trusted by", logos)
+button(gh("spacy") + "/stargazers", false, "secondary", "small")
| and many more
+landing-logos("Featured on", features).o-block-small
+landing-banner("Prodigy: Radically efficient machine teaching", "From the makers of spaCy")
p
| Prodigy is an #[strong annotation tool] so efficient that data scientists can
| do the annotation themselves, enabling a new level of rapid
| iteration. Whether you're working on entity recognition, intent
| detection or image classification, Prodigy can help you
| #[strong train and evaluate] your models faster. Stream in your own examples or
| real-world data from live APIs, update your model in real-time and
| chain models together to build more complex systems.
.o-block-small.u-text-right
+button("https://prodi.gy", true, "secondary-light") Try it out
2016-10-31 21:04:15 +03:00
2017-10-03 15:28:18 +03:00
.o-content
+grid
+grid-col("half")
+h(2) Benchmarks
2016-10-31 21:04:15 +03:00
p
2017-10-03 15:28:18 +03:00
| In 2015, independent researchers from Emory University and
| Yahoo! Labs showed that spaCy offered the
| #[strong fastest syntactic parser in the world] and that its
| accuracy was #[strong within 1% of the best] available
| (#[+a("https://aclweb.org/anthology/P/P15/P15-1038.pdf") Choi et al., 2015]).
| spaCy v2.0, released in 2017, is more accurate than any of
| the systems Choi et al. evaluated.
2016-10-31 21:04:15 +03:00
2017-10-03 15:28:18 +03:00
.o-inline-list
+button("/usage/facts-figures#benchmarks", true, "secondary") See details
+grid-col("half")
include usage/_facts-figures/_benchmarks-choi-2015