
.. image:: https://travis-ci.org/spacy-io/spaCy.svg?branch=master
    :target: https://travis-ci.org/spacy-io/spaCy

==============================
spaCy: Industrial-strength NLP
==============================

spaCy is a library for advanced natural language processing in Python and Cython.

Documentation and details: https://spacy.io/

spaCy is built on the very latest research, but it isn't researchware.  It was
designed from day 1 to be used in real products. It's commercial open-source
software, released under the MIT license.


Features
--------

* Labelled dependency parsing (91.8% accuracy on OntoNotes 5)
* Named entity recognition (82.6% accuracy on OntoNotes 5)
* Part-of-speech tagging (97.1% accuracy on OntoNotes 5)
* Easy to use word vectors
* All strings mapped to integer IDs
* Export to numpy data arrays
* Alignment to the original string is maintained, which makes markup calculation easy
* Range of easy-to-use orthographic features
* No pre-processing required. spaCy takes raw text as input, warts and newlines and all.
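
Two of the features above, integer string IDs and alignment to the original
string, can be illustrated with a small standalone sketch. This is not spaCy's
implementation or API: ``StringStore``, ``intern``, ``lookup`` and ``tokenize``
are hypothetical names chosen for this example, and the regular-expression
tokenizer is a deliberately naive stand-in. The sketch only shows the idea that
every distinct string maps to one integer ID while each token keeps its
character offset into the raw text.

```python
import re


class StringStore:
    """Toy interning table (hypothetical, not spaCy's class)."""

    def __init__(self):
        self._ids = {}      # string -> integer ID
        self._strings = []  # integer ID -> string

    def intern(self, text):
        # Reuse the existing ID if this string has been seen before.
        if text not in self._ids:
            self._ids[text] = len(self._strings)
            self._strings.append(text)
        return self._ids[text]

    def lookup(self, string_id):
        return self._strings[string_id]


def tokenize(text, store):
    """Split raw text into (string_id, start_offset) pairs.

    Whitespace runs are kept as tokens, so no character of the input
    is dropped and the original string can always be rebuilt.
    """
    pattern = r"\s+|\w+|[^\w\s]"
    return [(store.intern(m.group()), m.start())
            for m in re.finditer(pattern, text)]


store = StringStore()
text = "Hello, world!\nHello again."
tokens = tokenize(text, store)

# Both occurrences of "Hello" share one integer ID.
hello_id = store.intern("Hello")
hello_offsets = [start for string_id, start in tokens if string_id == hello_id]

# Joining the token strings reproduces the raw input exactly.
rebuilt = "".join(store.lookup(string_id) for string_id, _ in tokens)
```

Because each token carries its start offset, a span such as
``text[start:start + len(store.lookup(string_id))]`` always recovers the token
from the raw input, which is what makes inserting markup at the right position
straightforward.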

Top Performance
---------------

* Fastest in the world: <50ms per document.  No faster system has ever been
  announced.
* Accuracy within 1% of the current state of the art on all tasks performed
  (parsing, named entity recognition, part-of-speech tagging).  The only more
  accurate systems are an order of magnitude slower or more.

Supports
--------

* CPython 2.6, 2.7, 3.3, 3.4, 3.5 (64-bit only)
* OS X
* Linux
* Windows (Cygwin, MinGW, Visual Studio)