mirror of https://github.com/explosion/spaCy.git synced 2026-03-02 19:01:29 +03:00

💫 Industrial-strength Natural Language Processing (NLP) in Python

Wolfgang Seeker 5e2e8e951a add baseclass DocIterator for iterators over documents; add classes for English and German noun chunks; the respective iterators are set for the document when created by the parser, as they depend on the annotation scheme of the parsing model		2016-03-16 15:53:35 +01:00
bin	introduce lang field for LexemeC to hold language id	2016-03-10 13:01:34 +01:00
contributors	Add contributor.	2015-10-07 17:55:46 -07:00
corpora/en	* Add wordnet	2015-09-21 19:06:48 +10:00
examples	move displacy to its own subdomain	2016-02-19 14:03:52 +01:00
include	* Add header files to repo, to prevent cross-compilation problems	2016-02-06 22:57:11 +01:00
lang_data	add tokenizer files for German, add/change code to train German pos tagger	2016-02-18 13:24:20 +01:00
spacy	add baseclass DocIterator for iterators over documents	2016-03-16 15:53:35 +01:00
website	move displacy to its own subdomain	2016-02-19 14:03:52 +01:00
.gitignore	Added Windows file to .gitignore	2015-10-13 10:58:30 +03:00
.travis.yml	Update .travis.yml	2016-02-09 19:34:24 +01:00
bootstrap_python_env.sh	* Add bootstrap script	2015-03-16 14:01:36 -04:00
buildbot.json	add run section to buildbot.json	2016-02-26 23:04:33 +01:00
fabfile.py	Merge branch 'master' of https://github.com/honnibal/spaCy	2015-12-28 18:03:06 +01:00
LICENSE.txt	* Change from AGPL to MIT	2015-09-28 07:37:12 +10:00
MANIFEST.in	fix windows readme	2015-12-21 21:58:53 +01:00
package.json	Update package.json	2016-02-14 20:19:26 +01:00
README-MSVC.txt	fix windows readme	2015-12-21 21:58:53 +01:00
README.md	Update README.md	2016-02-19 19:36:47 +01:00
requirements.txt	upgrade to latest sputnik	2016-03-08 15:30:17 +01:00
setup.py	add baseclass DocIterator for iterators over documents	2016-03-16 15:53:35 +01:00
tox.ini	refactor setup.py	2015-12-13 23:32:23 +01:00
wordnet_license.txt	* Add WordNet license file	2015-02-01 16:11:53 +11:00

README.md

spaCy: Industrial-strength NLP

spaCy is a library for advanced natural language processing in Python and Cython.

Documentation and details: https://spacy.io/

spaCy is built on the very latest research, but it isn't researchware. It was designed from day 1 to be used in real products. It's commercial open-source software, released under the MIT license.

Features

Labelled dependency parsing (91.8% accuracy on OntoNotes 5)
Named entity recognition (82.6% accuracy on OntoNotes 5)
Part-of-speech tagging (97.1% accuracy on OntoNotes 5)
Easy to use word vectors
All strings mapped to integer IDs
Export to numpy data arrays
Alignment to the original string is maintained, so mark-up offsets are easy to calculate
Range of easy-to-use orthographic features
No pre-processing required. spaCy takes raw text as input, warts and newlines and all.
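
Two of the features above, integer string IDs and alignment to the original string, can be illustrated with a small toy sketch. This is plain Python and does not use spaCy or reproduce its implementation; `ToyStringStore` and `toy_tokenize` are hypothetical names invented for this example, and whitespace splitting stands in for a real tokenizer.

```python
import re


class ToyStringStore:
    """Hypothetical string-to-integer table (an illustration, not spaCy's class)."""

    def __init__(self):
        self._ids = {}
        self._strings = []

    def add(self, string):
        # Return the existing ID, or assign the next free one.
        if string not in self._ids:
            self._ids[string] = len(self._strings)
            self._strings.append(string)
        return self._ids[string]

    def lookup(self, string_id):
        return self._strings[string_id]


def toy_tokenize(text, store):
    """Split on whitespace; each token is (string ID, start offset, length)."""
    return [
        (store.add(match.group()), match.start(), len(match.group()))
        for match in re.finditer(r"\S+", text)
    ]


# Raw input with a double space and a newline, left exactly as it is.
text = "the cat  saw\nthe dog"
store = ToyStringStore()
tokens = toy_tokenize(text, store)

# Offsets index into the untouched input, so slicing recovers every token.
for string_id, start, length in tokens:
    assert text[start:start + length] == store.lookup(string_id)
```

Because both occurrences of "the" share one ID, later comparisons work on integers rather than strings, and because the offsets refer to the raw text, annotations can be written back into the original document without re-aligning.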

Top Performance

Fastest in the world: <50ms per document. No faster system has ever been announced.
Accuracy within 1% of the current state of the art on all tasks performed (parsing, named entity recognition, part-of-speech tagging). The only more accurate systems are an order of magnitude slower or more.

Supports

CPython 2.6, 2.7, 3.3, 3.4, 3.5 (64-bit only)
OS X
Linux
Windows (Cygwin, MinGW, Visual Studio)

Difficult to support:

PyPy