Merge branch 'master' of ssh://github.com/explosion/spaCy

Matthew Honnibal 2016-11-02 12:31:49 +01:00
commit 7555aa5e63
9 changed files with 54 additions and 18 deletions

@@ -60,15 +60,22 @@ open-source software, released under the MIT license.
Features Features
======== ========
* Labelled dependency parsing (91.8% accuracy on OntoNotes 5) * Non-destructive **tokenization**
* Named entity recognition (82.6% accuracy on OntoNotes 5) * Syntax-driven sentence segmentation
* Part-of-speech tagging (97.1% accuracy on OntoNotes 5) * Pre-trained **word vectors**
* Easy to use word vectors * Part-of-speech tagging
* All strings mapped to integer IDs * **Named entity** recognition
* Labelled dependency parsing
* Convenient string-to-int mapping
* Export to numpy data arrays * Export to numpy data arrays
* Alignment maintained to original string, ensuring easy mark up calculation * GIL-free **multi-threading**
* Range of easy-to-use orthographic features. * Efficient binary serialization
* No pre-processing required. spaCy takes raw text as input, warts and newlines and all. * Easy **deep learning** integration
* Statistical models for **English** and **German**
* State-of-the-art speed
* Robust, rigorously evaluated accuracy
See `facts, figures and benchmarks <https://spacy.io/docs/api/>`_.
Top Peformance Top Peformance
============== ==============
@@ -239,9 +246,9 @@ Changelog
**✨ Major features and improvements** **✨ Major features and improvements**
* **NEW:** `custom processing pipelines <https://spacy.io/docs/tutorials/custom-pipelines>`_, to support deep learning workflows * **NEW:** `custom processing pipelines <https://spacy.io/docs/usage/customizing-pipeline>`_, to support deep learning workflows
* **NEW:** `Rule matcher <https://spacy.io/docs/tutorials/rule-based-matcher>`_ now supports entity IDs and attributes * **NEW:** `Rule matcher <https://spacy.io/docs/usage/rule-based-matching>`_ now supports entity IDs and attributes
* **NEW:** Official/documented `training APIs <https://spacy.io/docs/tutorials/training>`_ and `GoldParse` class * **NEW:** Official/documented `training APIs <https://github.com/explosion/spaCy/tree/master/examples/training>`_ and `GoldParse` class
* Download and use GloVe vectors by default * Download and use GloVe vectors by default
* Make it easier to load and unload word vectors * Make it easier to load and unload word vectors
* Improved rule matching functionality * Improved rule matching functionality
@@ -425,7 +432,7 @@ include a small fix to the tokenizer.
* Fix bugs in ``Span`` * Fix bugs in ``Span``
* Add tokenizer rule to fix numeric range tokenization * Add tokenizer rule to fix numeric range tokenization
* Add specific string-length cap in Tokenizer * Add specific string-length cap in Tokenizer
* Fix ``token.conjuncts``` * Fix ``token.conjuncts``
2015-10-09 `v0.94 <https://github.com/explosion/spaCy/releases/tag/0.94>`_ 2015-10-09 `v0.94 <https://github.com/explosion/spaCy/releases/tag/0.94>`_
-------------------------------------------------------------------------- --------------------------------------------------------------------------

@@ -80,7 +80,7 @@ For example:
+h(2, "link-id") Headline 2 with link to #link-id +h(2, "link-id") Headline 2 with link to #link-id
``` ```
Code blocks are implemented using the `+code` or `+aside-code` (to display them in the sidebar). A `.` is added after the mixin call to preserve whitespace: Code blocks are implemented using `+code` or `+aside-code` (to display them in the right sidebar). A `.` is added after the mixin call to preserve whitespace:
```pug ```pug
+code("This is a label"). +code("This is a label").

@@ -47,7 +47,7 @@
} }
}, },
"V_CSS": "1.4", "V_CSS": "1.5",
"V_JS": "1.0", "V_JS": "1.0",
"DEFAULT_SYNTAX" : "python", "DEFAULT_SYNTAX" : "python",
"ANALYTICS": "UA-58931649-1", "ANALYTICS": "UA-58931649-1",

@@ -10,7 +10,7 @@
.c-table__row .c-table__row
&:nth-child(odd) &:nth-child(odd)
background: lighten($color-subtle-light, 2) background: rgba($color-subtle-light, 0.35)
&.c-table__row--foot &.c-table__row--foot
background: $color-subtle-light background: $color-subtle-light

@@ -2,6 +2,7 @@
"sidebar": { "sidebar": {
"Introduction": { "Introduction": {
"Facts & Figures": "./", "Facts & Figures": "./",
"Language models": "language-models",
"Philosophy": "philosophy" "Philosophy": "philosophy"
}, },
"Classes": { "Classes": {
@@ -25,6 +26,11 @@
"index": { "index": {
"title": "Facts & Figures", "title": "Facts & Figures",
"next": "language-models"
},
"language-models": {
"title": "Language models",
"next": "philosophy" "next": "philosophy"
}, },

@@ -0,0 +1,22 @@
//- 💫 DOCS > API > LANGUAGE MODELS
include ../../_includes/_mixins
p You can download data packs that add the following capabilities to spaCy.
+aside-code("Download language models", "bash").
python -m spacy.en.download all
python -m spacy.de.download all
+table([ "Language", "Token", "SBD", "Lemma", "POS", "NER", "Dep", "Vector", "Sentiment"])
+row
+cell English #[code en]
each icon in [ "pro", "pro", "pro", "pro", "pro", "pro", "pro", "con" ]
+cell.u-text-center #[+procon(icon)]
+row
+cell German #[code de]
each icon in [ "pro", "pro", "con", "pro", "pro", "pro", "pro", "con" ]
+cell.u-text-center #[+procon(icon)]
p We're working hard to extend support for more languages and more capabilities. The next language packs we're planning to add are Spanish, Chinese, French and Portuguese.
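
The capability table above can also be read as a plain lookup. The following is a minimal Python sketch, not part of spaCy or of this commit: the names `CAPABILITIES`, `ICONS`, `SUPPORT`, `supports` and `missing` are hypothetical, and the values are copied from the `+table` header and the two `pro`/`con` icon lists in the page.

```python
# Column order taken from the +table header (the leading "Language" column is the row label).
CAPABILITIES = ["Token", "SBD", "Lemma", "POS", "NER", "Dep", "Vector", "Sentiment"]

# "pro"/"con" flags per language code, copied from the two +row blocks.
ICONS = {
    "en": ["pro", "pro", "pro", "pro", "pro", "pro", "pro", "con"],
    "de": ["pro", "pro", "con", "pro", "pro", "pro", "pro", "con"],
}

# Hypothetical lookup: language code -> {capability name: bool}.
SUPPORT = {
    lang: dict(zip(CAPABILITIES, (icon == "pro" for icon in icons)))
    for lang, icons in ICONS.items()
}


def supports(lang, capability):
    """Return True if the table marks `capability` as available for `lang`."""
    return SUPPORT[lang][capability]


def missing(lang):
    """Return the capabilities the table marks as unavailable for `lang`, in column order."""
    return [name for name, ok in SUPPORT[lang].items() if not ok]
```

Under these assumptions, `missing("de")` lists the Lemma and Sentiment columns, which matches the two `con` entries in the German row.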

@@ -208,7 +208,7 @@
} }
}, },
"features": { "deep_dives": {
"Deep Learning with custom pipelines and Keras": { "Deep Learning with custom pipelines and Keras": {
"url": "https://explosion.ai/blog/spacy-deep-learning-keras", "url": "https://explosion.ai/blog/spacy-deep-learning-keras",
"author": "Matthew Honnibal", "author": "Matthew Honnibal",

@@ -24,7 +24,7 @@ p
| to help introduce you to new concepts. | to help introduce you to new concepts.
+grid +grid
each details, title in features each details, title in deep_dives
+card(title, details) +card(title, details)
+h(2, "code") Programs and scripts +h(2, "code") Programs and scripts

@@ -94,9 +94,10 @@ include _includes/_mixins
+item Labelled dependency parsing +item Labelled dependency parsing
+item Convenient string-to-int mapping +item Convenient string-to-int mapping
+item Export to numpy data arrays +item Export to numpy data arrays
+item Efficient #[strong multi-threading] +item GIL-free #[strong multi-threading]
+item Efficient binary serialization +item Efficient binary serialization
+item Easy #[strong deep learning] integration +item Easy #[strong deep learning] integration
+item Statistical models for #[strong English] and #[strong German]
+item State-of-the-art speed +item State-of-the-art speed
+item Robust, rigorously evaluated accuracy +item Robust, rigorously evaluated accuracy