Merge branch 'master' of ssh://github.com/explosion/spaCy

Matthew Honnibal 2016-11-02 12:31:49 +01:00
commit 7555aa5e63
9 changed files with 54 additions and 18 deletions

@@ -60,15 +60,22 @@ open-source software, released under the MIT license.
Features
========
* Labelled dependency parsing (91.8% accuracy on OntoNotes 5)
* Named entity recognition (82.6% accuracy on OntoNotes 5)
* Part-of-speech tagging (97.1% accuracy on OntoNotes 5)
* Easy to use word vectors
* All strings mapped to integer IDs
* Non-destructive **tokenization**
* Syntax-driven sentence segmentation
* Pre-trained **word vectors**
* Part-of-speech tagging
* **Named entity** recognition
* Labelled dependency parsing
* Convenient string-to-int mapping
* Export to numpy data arrays
* Alignment maintained to original string, ensuring easy mark up calculation
* Range of easy-to-use orthographic features.
* No pre-processing required. spaCy takes raw text as input, warts and newlines and all.
* GIL-free **multi-threading**
* Efficient binary serialization
* Easy **deep learning** integration
* Statistical models for **English** and **German**
* State-of-the-art speed
* Robust, rigorously evaluated accuracy
See `facts, figures and benchmarks <https://spacy.io/docs/api/>`_.
Top Performance
===============
@@ -239,9 +246,9 @@ Changelog
**✨ Major features and improvements**
* **NEW:** `custom processing pipelines <https://spacy.io/docs/tutorials/custom-pipelines>`_, to support deep learning workflows
* **NEW:** `Rule matcher <https://spacy.io/docs/tutorials/rule-based-matcher>`_ now supports entity IDs and attributes
* **NEW:** Official/documented `training APIs <https://spacy.io/docs/tutorials/training>`_ and `GoldParse` class
* **NEW:** `custom processing pipelines <https://spacy.io/docs/usage/customizing-pipeline>`_, to support deep learning workflows
* **NEW:** `Rule matcher <https://spacy.io/docs/usage/rule-based-matching>`_ now supports entity IDs and attributes
* **NEW:** Official/documented `training APIs <https://github.com/explosion/spaCy/tree/master/examples/training>`_ and `GoldParse` class
* Download and use GloVe vectors by default
* Make it easier to load and unload word vectors
* Improved rule matching functionality
@@ -425,7 +432,7 @@ include a small fix to the tokenizer.
* Fix bugs in ``Span``
* Add tokenizer rule to fix numeric range tokenization
* Add specific string-length cap in Tokenizer
* Fix ``token.conjuncts```
* Fix ``token.conjuncts``
2015-10-09 `v0.94 <https://github.com/explosion/spaCy/releases/tag/0.94>`_
--------------------------------------------------------------------------

@@ -80,7 +80,7 @@ For example:
+h(2, "link-id") Headline 2 with link to #link-id
```
Code blocks are implemented using the `+code` or `+aside-code` (to display them in the sidebar). A `.` is added after the mixin call to preserve whitespace:
Code blocks are implemented using `+code` or `+aside-code` (to display them in the right sidebar). A `.` is added after the mixin call to preserve whitespace:
```pug
+code("This is a label").

@@ -47,7 +47,7 @@
}
},
"V_CSS": "1.4",
"V_CSS": "1.5",
"V_JS": "1.0",
"DEFAULT_SYNTAX" : "python",
"ANALYTICS": "UA-58931649-1",

@@ -10,7 +10,7 @@
.c-table__row
&:nth-child(odd)
background: lighten($color-subtle-light, 2)
background: rgba($color-subtle-light, 0.35)
&.c-table__row--foot
background: $color-subtle-light

@@ -2,6 +2,7 @@
"sidebar": {
"Introduction": {
"Facts & Figures": "./",
"Language models": "language-models",
"Philosophy": "philosophy"
},
"Classes": {
@@ -25,6 +26,11 @@
"index": {
"title": "Facts & Figures",
"next": "language-models"
},
"language-models": {
"title": "Language models",
"next": "philosophy"
},

@@ -0,0 +1,22 @@
//- 💫 DOCS > API > LANGUAGE MODELS
include ../../_includes/_mixins
p You can download data packs that add the following capabilities to spaCy.
+aside-code("Download language models", "bash").
python -m spacy.en.download all
python -m spacy.de.download all
+table([ "Language", "Token", "SBD", "Lemma", "POS", "NER", "Dep", "Vector", "Sentiment"])
+row
+cell English #[code en]
each icon in [ "pro", "pro", "pro", "pro", "pro", "pro", "pro", "con" ]
+cell.u-text-center #[+procon(icon)]
+row
+cell German #[code de]
each icon in [ "pro", "pro", "con", "pro", "pro", "pro", "pro", "con" ]
+cell.u-text-center #[+procon(icon)]
p We're working hard to extend support for more languages and more capabilities. The next language packs we're planning to add are Spanish, Chinese, French and Portuguese.

@@ -208,7 +208,7 @@
}
},
"features": {
"deep_dives": {
"Deep Learning with custom pipelines and Keras": {
"url": "https://explosion.ai/blog/spacy-deep-learning-keras",
"author": "Matthew Honnibal",

@@ -24,7 +24,7 @@ p
| to help introduce you to new concepts.
+grid
each details, title in features
each details, title in deep_dives
+card(title, details)
+h(2, "code") Programs and scripts

@@ -94,9 +94,10 @@ include _includes/_mixins
+item Labelled dependency parsing
+item Convenient string-to-int mapping
+item Export to numpy data arrays
+item Efficient #[strong multi-threading]
+item GIL-free #[strong multi-threading]
+item Efficient binary serialization
+item Easy #[strong deep learning] integration
+item Statistical models for #[strong English] and #[strong German]
+item State-of-the-art speed
+item Robust, rigorously evaluated accuracy