Merge branch 'master' of ssh://github.com/explosion/spaCy

2025-12-07 10:14:22 +03:00 · 2016-11-02 12:31:49 +01:00 · 2016-11-02 12:31:49 +01:00 · 7555aa5e63
commit 7555aa5e63
parent 9efe568177 adf04a6ad3
9 changed files with 54 additions and 18 deletions
--- a/README.rst
+++ b/README.rst
@@ -60,15 +60,22 @@ open-source software, released under the MIT license.
 Features
 ========

-* Labelled dependency parsing (91.8% accuracy on OntoNotes 5)
-* Named entity recognition (82.6% accuracy on OntoNotes 5)
-* Part-of-speech tagging (97.1% accuracy on OntoNotes 5)
-* Easy to use word vectors
-* All strings mapped to integer IDs
+* Non-destructive **tokenization**
+* Syntax-driven sentence segmentation
+* Pre-trained **word vectors**
+* Part-of-speech tagging
+* **Named entity** recognition
+* Labelled dependency parsing
+* Convenient string-to-int mapping
 * Export to numpy data arrays
-* Alignment maintained to original string, ensuring easy mark up calculation
-* Range of easy-to-use orthographic features.
-* No pre-processing required. spaCy takes raw text as input, warts and newlines and all.
+* GIL-free **multi-threading**
+* Efficient binary serialization
+* Easy **deep learning** integration
+* Statistical models for **English** and **German**
+* State-of-the-art speed
+* Robust, rigorously evaluated accuracy
+
+See `facts, figures and benchmarks <https://spacy.io/docs/api/>`_.

 Top Peformance
 ==============
@@ -239,9 +246,9 @@ Changelog

 **✨ Major features and improvements**

-* **NEW:** `custom processing pipelines <https://spacy.io/docs/tutorials/custom-pipelines>`_, to support deep learning workflows
-* **NEW:** `Rule matcher <https://spacy.io/docs/tutorials/rule-based-matcher>`_ now supports entity IDs and attributes
-* **NEW:** Official/documented `training APIs <https://spacy.io/docs/tutorials/training>`_ and `GoldParse` class
+* **NEW:** `custom processing pipelines <https://spacy.io/docs/usage/customizing-pipeline>`_, to support deep learning workflows
+* **NEW:** `Rule matcher <https://spacy.io/docs/usage/rule-based-matching>`_ now supports entity IDs and attributes
+* **NEW:** Official/documented `training APIs <https://github.com/explosion/spaCy/tree/master/examples/training>`_ and `GoldParse` class
 * Download and use GloVe vectors by default
 * Make it easier to load and unload word vectors
 * Improved rule matching functionality
@@ -425,7 +432,7 @@ include a small fix to the tokenizer.
 * Fix bugs in ``Span``
 * Add tokenizer rule to fix numeric range tokenization
 * Add specific string-length cap in Tokenizer
-* Fix ``token.conjuncts```
+* Fix ``token.conjuncts``

 2015-10-09 `v0.94 <https://github.com/explosion/spaCy/releases/tag/0.94>`_
 --------------------------------------------------------------------------
--- a/website/README.md
+++ b/website/README.md
@@ -80,7 +80,7 @@ For example:
 +h(2, "link-id") Headline 2 with link to #link-id
 ```

-Code blocks are implemented using the `+code` or `+aside-code` (to display them in the sidebar). A `.` is added after the mixin call to preserve whitespace:
+Code blocks are implemented using `+code` or `+aside-code` (to display them in the right sidebar). A `.` is added after the mixin call to preserve whitespace:

 ```pug
 +code("This is a label").
--- a/website/_harp.json
+++ b/website/_harp.json
@@ -47,7 +47,7 @@
            }
        },

-        "V_CSS": "1.4",
+        "V_CSS": "1.5",
        "V_JS": "1.0",
        "DEFAULT_SYNTAX" : "python",
        "ANALYTICS": "UA-58931649-1",
--- a/website/assets/css/_components/_tables.sass
+++ b/website/assets/css/_components/_tables.sass
@@ -10,7 +10,7 @@

 .c-table__row
    &:nth-child(odd)
-        background: lighten($color-subtle-light, 2)
+        background: rgba($color-subtle-light, 0.35)

    &.c-table__row--foot
        background: $color-subtle-light
--- a/website/docs/api/_data.json
+++ b/website/docs/api/_data.json
@@ -2,6 +2,7 @@
    "sidebar": {
        "Introduction": {
            "Facts & Figures": "./",
+            "Language models": "language-models",
            "Philosophy": "philosophy"
        },
        "Classes": {
@@ -25,6 +26,11 @@

    "index": {
        "title": "Facts & Figures",
+        "next": "language-models"
+    },
+
+    "language-models": {
+        "title": "Language models",
        "next": "philosophy"
    },

--- a/website/docs/api/language-models.jade
+++ b/website/docs/api/language-models.jade
@@ -0,0 +1,22 @@
+//- 💫 DOCS > API > LANGUAGE MODELS
+
+include ../../_includes/_mixins
+
+p You can download data packs that add the following capabilities to spaCy.
+
++aside-code("Download language models", "bash").
+    python -m spacy.en.download all
+    python -m spacy.de.download all
+
++table([ "Language", "Token", "SBD", "Lemma", "POS", "NER", "Dep", "Vector", "Sentiment"])
+    +row
+        +cell English #[code en]
+        each icon in [ "pro", "pro", "pro", "pro", "pro", "pro", "pro", "con" ]
+            +cell.u-text-center #[+procon(icon)]
+
+    +row
+        +cell German #[code de]
+        each icon in [ "pro", "pro", "con", "pro", "pro", "pro", "pro", "con" ]
+            +cell.u-text-center #[+procon(icon)]
+
+p We're working hard to extend support for more languages and more capabilities. The next language packs we're planning to add are Spanish, Chinese, French and Portuguese.
--- a/website/docs/usage/_data.json
+++ b/website/docs/usage/_data.json
@@ -208,7 +208,7 @@
            }
        },

-        "features": {
+        "deep_dives": {
            "Deep Learning with custom pipelines and Keras": {
                "url": "https://explosion.ai/blog/spacy-deep-learning-keras",
                "author": "Matthew Honnibal",
--- a/website/docs/usage/tutorials.jade
+++ b/website/docs/usage/tutorials.jade
@@ -24,7 +24,7 @@ p
    |  to help introduce you to new concepts.

 +grid
-    each details, title in features
+    each details, title in deep_dives
        +card(title, details)

 +h(2, "code") Programs and scripts
--- a/website/index.jade
+++ b/website/index.jade
@@ -94,9 +94,10 @@ include _includes/_mixins
                +item Labelled dependency parsing
                +item Convenient string-to-int mapping
                +item Export to numpy data arrays
-                +item Efficient #[strong multi-threading]
+                +item GIL-free #[strong multi-threading]
                +item Efficient binary serialization
                +item Easy #[strong deep learning] integration
+                +item Statistical models for #[strong English] and #[strong German]
                +item State-of-the-art speed
                +item Robust, rigorously evaluated accuracy