mirror of
https://github.com/explosion/spaCy.git
synced 2024-12-24 17:06:29 +03:00
💫 Industrial-strength Natural Language Processing (NLP) in Python
Topics: ai, artificial-intelligence, cython, data-science, deep-learning, entity-linking, machine-learning, named-entity-recognition, natural-language-processing, neural-network, neural-networks, nlp, nlp-library, python, spacy, starred-explosion-repo, starred-repo, text-classification, tokenization
Commit 5e2e8e951a: Add classes for English and German noun chunks. The respective iterators are set for the document when it is created by the parser, as they depend on the annotation scheme of the parsing model.
- bin
- contributors
- corpora/en
- examples
- include
- lang_data
- spacy
- website
- .gitignore
- .travis.yml
- bootstrap_python_env.sh
- buildbot.json
- fabfile.py
- LICENSE.txt
- MANIFEST.in
- package.json
- README-MSVC.txt
- README.md
- requirements.txt
- setup.py
- tox.ini
- wordnet_license.txt
spaCy: Industrial-strength NLP
spaCy is a library for advanced natural language processing in Python and Cython.
Documentation and details: https://spacy.io/
spaCy is built on the very latest research, but it isn't researchware. It was designed from day one to be used in real products. It's commercial open-source software, released under the MIT license.
Features
- Labelled dependency parsing (91.8% accuracy on OntoNotes 5)
- Named entity recognition (82.6% accuracy on OntoNotes 5)
- Part-of-speech tagging (97.1% accuracy on OntoNotes 5)
- Easy to use word vectors
- All strings mapped to integer IDs
- Export to numpy data arrays
- Alignment to the original string is maintained, so markup offsets are easy to calculate
- Range of easy-to-use orthographic features
- No pre-processing required. spaCy takes raw text as input, warts and newlines and all.
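Two of the features above, strings mapped to integer IDs and alignment maintained to the original string, can be illustrated with a small standard-library sketch. This is not spaCy's implementation or API: `ToyStringTable`, `tokenize` and `detokenize` are hypothetical names invented for this example, and the regular-expression tokenizer is a deliberately naive stand-in.

```python
import re


class ToyStringTable:
    """Hypothetical interning table: each distinct string gets one integer ID."""

    def __init__(self):
        self._ids = {}
        self._strings = []

    def add(self, string):
        # Reuse the existing ID if the string has been seen before.
        if string not in self._ids:
            self._ids[string] = len(self._strings)
            self._strings.append(string)
        return self._ids[string]

    def __getitem__(self, string_id):
        return self._strings[string_id]


def tokenize(text, table):
    """Return (string_id, start_offset, trailing_whitespace) triples.

    Every character of `text` ends up in exactly one triple, so the
    original string can always be rebuilt from the tokens.
    """
    tokens = []
    # A token is a word run, a single punctuation mark, or (only at the
    # very start of the text) a whitespace run; any whitespace that
    # follows the token is stored alongside it.
    for match in re.finditer(r"(\s+|\w+|[^\w\s])(\s*)", text):
        string_id = table.add(match.group(1))
        tokens.append((string_id, match.start(1), match.group(2)))
    return tokens


def detokenize(tokens, table):
    """Rebuild the exact input text from the token triples."""
    return "".join(table[string_id] + space for string_id, _, space in tokens)


table = ToyStringTable()
text = "Hello, world!\nHello again."
tokens = tokenize(text, table)

# Both occurrences of "Hello" share one integer ID.
assert tokens[0][0] == tokens[4][0]
# Offsets point back into the raw text, newlines included.
assert all(text[start:start + len(table[i])] == table[i] for i, start, _ in tokens)
# Nothing is lost: the raw input can be reconstructed.
assert detokenize(tokens, table) == text
```

Because each token keeps its character offset, an annotation on a token can be written back as markup on the untouched input text without re-aligning anything.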
Top Performance
- Fastest in the world: <50ms per document. No faster system has ever been announced.
- Accuracy within 1% of the current state of the art on all tasks performed (parsing, named entity recognition, part-of-speech tagging). The only more accurate systems are an order of magnitude slower or more.
Supports
- CPython 2.6, 2.7, 3.3, 3.4, 3.5 (64-bit only)
- OS X
- Linux
- Windows (Cygwin, MinGW, Visual Studio)
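To see how a given interpreter compares with the list above, a check along the following lines may help. It is a minimal sketch using only the standard library; `describe_interpreter` and `matches_supported_list` are hypothetical helper names, not part of spaCy, and the mapping from platform names to the list above is an assumption.

```python
import platform
import struct
import sys


def describe_interpreter():
    """Collect the properties that the list above refers to."""
    return {
        "implementation": platform.python_implementation(),  # e.g. "CPython", "PyPy"
        "version": "%d.%d" % sys.version_info[:2],
        "bits": struct.calcsize("P") * 8,  # pointer size in bits
        "system": platform.system(),  # e.g. "Linux", "Darwin", "Windows"
    }


def matches_supported_list(info):
    """Return True if `info` matches the versions and platforms listed above."""
    supported_versions = ("2.6", "2.7", "3.3", "3.4", "3.5")
    # Assumption: OS X reports "Darwin", and Cygwin reports a name
    # starting with "CYGWIN".
    system_ok = (
        info["system"] in ("Darwin", "Linux", "Windows")
        or info["system"].startswith("CYGWIN")
    )
    return (
        info["implementation"] == "CPython"
        and info["bits"] == 64
        and info["version"] in supported_versions
        and system_ok
    )


info = describe_interpreter()
print(info, matches_supported_list(info))
```

A `False` result only means the interpreter falls outside the list as written here. It says nothing about whether a build would actually fail.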
Difficult to support:
- PyPy