mirror of https://github.com/explosion/spaCy.git synced 2026-02-02 21:46:24 +03:00

💫 Industrial-strength Natural Language Processing (NLP) in Python


Matthew Honnibal d8ef2d6b61 * Upd README.md		2015-07-01 15:27:37 +02:00

spaCy

spaCy is a library for industrial-strength NLP in Python and Cython.

Documentation and details: http://spacy.io/

spaCy is built on the latest research, but it isn't researchware. It was designed from day one to be used in real products.

I left academia to make spaCy my full-time job. You can buy a commercial license, or you can use it under the AGPL.

Features

Labelled dependency parsing (91.8% accuracy on OntoNotes 5)
Named entity recognition (82.6% accuracy on OntoNotes 5)
Part-of-speech tagging (97.1% accuracy on OntoNotes 5)
Easy to use word vectors
All strings mapped to integer IDs
Export to numpy data arrays
Alignment to the original string is maintained, so markup offsets are easy to calculate
Range of easy-to-use orthographic features
No pre-processing required. spaCy takes raw text as input, warts and newlines and all.
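
To make two of the features above concrete (strings mapped to integer IDs, and alignment kept to the original string), here is a minimal pure-Python sketch. It does not use spaCy and is not how spaCy implements these features; `StringStore`, `tokenize` and the regular expression are hypothetical names and choices made only for illustration.

```python
import re


class StringStore:
    """Toy string-to-integer table (hypothetical, not spaCy's own class)."""

    def __init__(self):
        self._ids = {}       # string -> integer ID
        self._strings = []   # integer ID -> string

    def add(self, string):
        # Return the existing ID, or assign the next free one.
        if string not in self._ids:
            self._ids[string] = len(self._strings)
            self._strings.append(string)
        return self._ids[string]

    def lookup(self, string_id):
        return self._strings[string_id]


def tokenize(text, store):
    """Return (start_offset, string_id) pairs without altering `text`."""
    tokens = []
    # Assumption: a token is a run of word characters or a single
    # punctuation mark. Whitespace and newlines are skipped, not removed,
    # so every offset still points into the raw input.
    for match in re.finditer(r"\w+|[^\w\s]", text):
        tokens.append((match.start(), store.add(match.group())))
    return tokens


store = StringStore()
text = "The cat sat.\nThe cat ran."
tokens = tokenize(text, store)

# Each token can be sliced back out of the raw input using its offset.
for start, string_id in tokens:
    word = store.lookup(string_id)
    assert text[start:start + len(word)] == word
```

Because repeated words share one ID, later processing can compare integers instead of strings, and the stored offsets let annotations be written back onto the untouched input.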

Fastest in the world: <50ms per document. No faster system has ever been announced.
Accuracy within 1% of the current state of the art on all tasks performed (parsing, named entity recognition, part-of-speech tagging). The only more accurate systems are an order of magnitude slower or more.

Want to support:

Difficult to support: