mirror of https://github.com/explosion/spaCy.git (synced 2024-12-26 01:46:28 +03:00)
Merge branch 'master' of ssh://github.com/explosion/spaCy

commit 7555aa5e63

31 README.rst
@@ -60,15 +60,22 @@ open-source software, released under the MIT license.
Features
========

* Labelled dependency parsing (91.8% accuracy on OntoNotes 5)
* Named entity recognition (82.6% accuracy on OntoNotes 5)
* Part-of-speech tagging (97.1% accuracy on OntoNotes 5)
* Easy to use word vectors
* All strings mapped to integer IDs
* Non-destructive **tokenization**
* Syntax-driven sentence segmentation
* Pre-trained **word vectors**
* Part-of-speech tagging
* **Named entity** recognition
* Labelled dependency parsing
* Convenient string-to-int mapping
* Export to numpy data arrays
* Alignment maintained to original string, ensuring easy mark up calculation
* Range of easy-to-use orthographic features.
* No pre-processing required. spaCy takes raw text as input, warts and newlines and all.
* GIL-free **multi-threading**
* Efficient binary serialization
* Easy **deep learning** integration
* Statistical models for **English** and **German**
* State-of-the-art speed
* Robust, rigorously evaluated accuracy

See `facts, figures and benchmarks <https://spacy.io/docs/api/>`_.

Top Peformance
==============
@@ -239,9 +246,9 @@ Changelog

**✨ Major features and improvements**

* **NEW:** `custom processing pipelines <https://spacy.io/docs/tutorials/custom-pipelines>`_, to support deep learning workflows
* **NEW:** `Rule matcher <https://spacy.io/docs/tutorials/rule-based-matcher>`_ now supports entity IDs and attributes
* **NEW:** Official/documented `training APIs <https://spacy.io/docs/tutorials/training>`_ and `GoldParse` class
* **NEW:** `custom processing pipelines <https://spacy.io/docs/usage/customizing-pipeline>`_, to support deep learning workflows
* **NEW:** `Rule matcher <https://spacy.io/docs/usage/rule-based-matching>`_ now supports entity IDs and attributes
* **NEW:** Official/documented `training APIs <https://github.com/explosion/spaCy/tree/master/examples/training>`_ and `GoldParse` class
* Download and use GloVe vectors by default
* Make it easier to load and unload word vectors
* Improved rule matching functionality
@@ -425,7 +432,7 @@ include a small fix to the tokenizer.
* Fix bugs in ``Span``
* Add tokenizer rule to fix numeric range tokenization
* Add specific string-length cap in Tokenizer
* Fix ``token.conjuncts```
* Fix ``token.conjuncts``

2015-10-09 `v0.94 <https://github.com/explosion/spaCy/releases/tag/0.94>`_
--------------------------------------------------------------------------

@@ -80,7 +80,7 @@ For example:
+h(2, "link-id") Headline 2 with link to #link-id
```

Code blocks are implemented using the `+code` or `+aside-code` (to display them in the sidebar). A `.` is added after the mixin call to preserve whitespace:
Code blocks are implemented using `+code` or `+aside-code` (to display them in the right sidebar). A `.` is added after the mixin call to preserve whitespace:

```pug
+code("This is a label").

@@ -47,7 +47,7 @@
}
},

"V_CSS": "1.4",
"V_CSS": "1.5",
"V_JS": "1.0",
"DEFAULT_SYNTAX" : "python",
"ANALYTICS": "UA-58931649-1",

@@ -10,7 +10,7 @@

.c-table__row
&:nth-child(odd)
background: lighten($color-subtle-light, 2)
background: rgba($color-subtle-light, 0.35)

&.c-table__row--foot
background: $color-subtle-light

@@ -2,6 +2,7 @@
"sidebar": {
"Introduction": {
"Facts & Figures": "./",
"Language models": "language-models",
"Philosophy": "philosophy"
},
"Classes": {
@@ -25,6 +26,11 @@

"index": {
"title": "Facts & Figures",
"next": "language-models"
},

"language-models": {
"title": "Language models",
"next": "philosophy"
},

22 website/docs/api/language-models.jade
Normal file
@@ -0,0 +1,22 @@
//- 💫 DOCS > API > LANGUAGE MODELS

include ../../_includes/_mixins

p You can download data packs that add the following capabilities to spaCy.

+aside-code("Download language models", "bash").
python -m spacy.en.download all
python -m spacy.de.download all

+table([ "Language", "Token", "SBD", "Lemma", "POS", "NER", "Dep", "Vector", "Sentiment"])
+row
+cell English #[code en]
each icon in [ "pro", "pro", "pro", "pro", "pro", "pro", "pro", "con" ]
+cell.u-text-center #[+procon(icon)]

+row
+cell German #[code de]
each icon in [ "pro", "pro", "con", "pro", "pro", "pro", "pro", "con" ]
+cell.u-text-center #[+procon(icon)]

p We're working hard to extend support for more languages and more capabilities. The next language packs we're planning to add are Spanish, Chinese, French and Portuguese.

@@ -208,7 +208,7 @@
}
},

"features": {
"deep_dives": {
"Deep Learning with custom pipelines and Keras": {
"url": "https://explosion.ai/blog/spacy-deep-learning-keras",
"author": "Matthew Honnibal",

@@ -24,7 +24,7 @@ p
| to help introduce you to new concepts.

+grid
each details, title in features
each details, title in deep_dives
+card(title, details)

+h(2, "code") Programs and scripts

@@ -94,9 +94,10 @@ include _includes/_mixins
+item Labelled dependency parsing
+item Convenient string-to-int mapping
+item Export to numpy data arrays
+item Efficient #[strong multi-threading]
+item GIL-free #[strong multi-threading]
+item Efficient binary serialization
+item Easy #[strong deep learning] integration
+item Statistical models for #[strong English] and #[strong German]
+item State-of-the-art speed
+item Robust, rigorously evaluated accuracy