.. image:: https://travis-ci.org/spacy-io/spaCy.svg?branch=master
    :target: https://travis-ci.org/spacy-io/spaCy

==============================
spaCy: Industrial-strength NLP
==============================

spaCy is a library for advanced natural language processing in Python and Cython.

Documentation and details: https://spacy.io/

spaCy is built on the very latest research, but it isn't researchware. It was
designed from day one for use in real products. It's commercial open-source
software, released under the MIT license.


Features
--------

* Labelled dependency parsing (91.8% accuracy on OntoNotes 5)
* Named entity recognition (82.6% accuracy on OntoNotes 5)
* Part-of-speech tagging (97.1% accuracy on OntoNotes 5)
* Easy-to-use word vectors
* All strings mapped to integer IDs
* Export to numpy data arrays
* Alignment to the original string is maintained, so mark-up offsets are easy to calculate
* Range of easy-to-use orthographic features
* No pre-processing required. spaCy takes raw text as input, warts and newlines and all.
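
Two of the features above, integer string IDs and alignment to the original
string, can be illustrated with a small standard-library sketch. This is not
spaCy's implementation or API: the ``StringStore`` class and ``tokenize``
function below are hypothetical names used only to show the idea, and the
regular expression is a deliberately naive tokenizer.

```python
import re


class StringStore:
    """Toy string-to-integer table (hypothetical; not spaCy's implementation)."""

    def __init__(self):
        self._ids = {}
        self._strings = []

    def add(self, string):
        # Return the existing ID, or assign the next free one.
        if string not in self._ids:
            self._ids[string] = len(self._strings)
            self._strings.append(string)
        return self._ids[string]

    def lookup(self, string_id):
        return self._strings[string_id]


def tokenize(text, store):
    """Split raw text into (string_id, start_offset) pairs.

    Words and single punctuation characters become tokens. Whitespace
    (including newlines) is skipped but never removed from `text`, so each
    offset still points into the original string.
    """
    return [(store.add(m.group()), m.start())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


text = "Hello, world!\nHello again."
store = StringStore()
tokens = tokenize(text, store)

for string_id, start in tokens:
    word = store.lookup(string_id)
    # The offset recovers the exact span from the raw input.
    assert text[start:start + len(word)] == word
```

Because every token keeps its character offset into the untouched input,
annotations can be written back onto the original text without re-aligning,
and repeated words share a single integer ID.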

Top Performance
---------------

* Fastest in the world: <50ms per document.  No faster system has ever been
  announced.
* Accuracy within 1% of the current state of the art on all tasks performed
  (parsing, named entity recognition, part-of-speech tagging).  The only more
  accurate systems are an order of magnitude slower or more.

Supports
--------

* CPython 2.6, 2.7, 3.3, 3.4, 3.5 (64-bit only)
* OS X
* Linux
* Windows (Cygwin, MinGW, Visual Studio)