mirror of
				https://github.com/explosion/spaCy.git
				synced 2025-11-04 09:57:26 +03:00 
			
		
		
		
	Update README.rst
This commit is contained in:
		
							parent
							
								
									d3dc5718b2
								
							
						
					
					
						commit
						1359cfe32f
					
				
							
								
								
									
										374
									
								
								README.rst
									
									
									
									
									
								
							
							
						
						
									
									
									
									
									
									
								
							| 
						 | 
					@ -1,77 +1,244 @@
 | 
				
			||||||
 | 
					spaCy: Industrial-strength NLP
 | 
				
			||||||
 | 
					******************************
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					spaCy is a library for advanced natural language processing in Python and 
 | 
				
			||||||
 | 
					Cython. `See here <https://spacy.io>`_ for documentation and details. spaCy is built on 
 | 
				
			||||||
 | 
					the very latest research, but it isn't researchware.  It was designed from day 1 
 | 
				
			||||||
 | 
					to be used in real products. It's commercial open-source software, released under 
 | 
				
			||||||
 | 
					the MIT license.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. image:: http://i.imgur.com/wFvLZyJ.png
 | 
				
			||||||
 | 
					    :target: https://travis-ci.org/spacy-io/spaCy
 | 
				
			||||||
 | 
					
 | 
				
			||||||
.. image:: https://travis-ci.org/spacy-io/spaCy.svg?branch=master
 | 
					.. image:: https://travis-ci.org/spacy-io/spaCy.svg?branch=master
 | 
				
			||||||
    :target: https://travis-ci.org/spacy-io/spaCy
 | 
					    :target: https://travis-ci.org/spacy-io/spaCy
 | 
				
			||||||
 | 
					
 | 
				
			||||||
==============================
 | 
					 | 
				
			||||||
spaCy: Industrial-strength NLP
 | 
					 | 
				
			||||||
==============================
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
spaCy is a library for advanced natural language processing in Python and Cython.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
Documentation and details: https://spacy.io/
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
spaCy is built on the very latest research, but it isn't researchware.  It was
 | 
					 | 
				
			||||||
designed from day 1 to be used in real products. It's commercial open-source
 | 
					 | 
				
			||||||
software, released under the MIT license.
 | 
					 | 
				
			||||||
 | 
					 | 
				
			||||||
Features
 | 
					Features
 | 
				
			||||||
--------
 | 
					========
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* Labelled dependency parsing (91.8% accuracy on OntoNotes 5)
 | 
					* Labelled dependency parsing (91.8% accuracy on OntoNotes 5)
 | 
				
			||||||
 | 
					 | 
				
			||||||
* Named entity recognition (82.6% accuracy on OntoNotes 5)
 | 
					* Named entity recognition (82.6% accuracy on OntoNotes 5)
 | 
				
			||||||
 | 
					 | 
				
			||||||
* Part-of-speech tagging (97.1% accuracy on OntoNotes 5)
 | 
					* Part-of-speech tagging (97.1% accuracy on OntoNotes 5)
 | 
				
			||||||
 | 
					 | 
				
			||||||
* Easy to use word vectors
 | 
					* Easy to use word vectors
 | 
				
			||||||
 | 
					 | 
				
			||||||
* All strings mapped to integer IDs
 | 
					* All strings mapped to integer IDs
 | 
				
			||||||
 | 
					 | 
				
			||||||
* Export to numpy data arrays
 | 
					* Export to numpy data arrays
 | 
				
			||||||
 | 
					 | 
				
			||||||
* Alignment maintained to original string, ensuring easy mark up calculation
 | 
					* Alignment maintained to original string, ensuring easy mark up calculation
 | 
				
			||||||
 | 
					 | 
				
			||||||
* Range of easy-to-use orthographic features.
 | 
					* Range of easy-to-use orthographic features.
 | 
				
			||||||
 | 
					 | 
				
			||||||
* No pre-processing required. spaCy takes raw text as input, warts and newlines and all.
 | 
					* No pre-processing required. spaCy takes raw text as input, warts and newlines and all.
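The "all strings mapped to integer IDs" feature can be pictured with a small interning table. This is a minimal sketch of the general idea only, not spaCy's actual implementation; the class name ``ToyStringStore`` and its methods are hypothetical.

```python
class ToyStringStore(object):
    """Toy string-to-ID table (hypothetical; not spaCy's real StringStore)."""

    def __init__(self):
        self._ids = {}      # string -> integer ID
        self._strings = []  # integer ID -> string

    def intern(self, string):
        """Return the ID for `string`, assigning the next free ID if it is new."""
        if string not in self._ids:
            self._ids[string] = len(self._strings)
            self._strings.append(string)
        return self._ids[string]

    def lookup(self, string_id):
        """Map an integer ID back to the original string."""
        return self._strings[string_id]


store = ToyStringStore()
# Repeated words receive the same ID, so comparisons become integer comparisons.
ids = [store.intern(word) for word in "the cat saw the dog".split()]
```

Storing each distinct string once and passing integers around is what makes features such as export to numpy arrays straightforward.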
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Top Performance
 | 
					Top Performance
 | 
				
			||||||
---------------
 | 
					===============
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* Fastest in the world: <50ms per document.  No faster system has ever been
 | 
					* Fastest in the world: <50ms per document.  No faster system has ever been
 | 
				
			||||||
  announced.
 | 
					  announced.
 | 
				
			||||||
 | 
					 | 
				
			||||||
* Accuracy within 1% of the current state of the art on all tasks performed
 | 
					* Accuracy within 1% of the current state of the art on all tasks performed
 | 
				
			||||||
  (parsing, named entity recognition, part-of-speech tagging).  The only more
 | 
					  (parsing, named entity recognition, part-of-speech tagging).  The only more
 | 
				
			||||||
  accurate systems are an order of magnitude slower or more.
 | 
					  accurate systems are an order of magnitude slower or more.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Supports
 | 
					Supports
 | 
				
			||||||
--------
 | 
					========
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* CPython 2.6, 2.7, 3.3, 3.4, 3.5 (only 64 bit)
 | 
					* CPython 2.6, 2.7, 3.3, 3.4, 3.5 (only 64 bit)
 | 
				
			||||||
* OSX
 | 
					* OSX
 | 
				
			||||||
* Linux
 | 
					* Linux
 | 
				
			||||||
* Windows (Cygwin, MinGW, Visual Studio)
 | 
					* Windows (Cygwin, MinGW, Visual Studio)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Install spaCy
 | 
				
			||||||
 | 
					=============
 | 
				
			||||||
 | 
					
 | 
				
			||||||
2016-05-0 0.101.0: Fixed German model
 | 
					spaCy is compatible with 64-bit CPython 2.6+/3.3+ and runs on Unix/Linux, OS X 
 | 
				
			||||||
-------------------------------------
 | 
					and Windows. Source and binary packages are available via 
 | 
				
			||||||
 | 
					`pip <https://pypi.python.org/pypi/spacy>`_ and `conda <https://anaconda.org/spacy/spacy>`_. 
 | 
				
			||||||
 | 
					If no binary packages are available for your platform, please make sure that
 | 
				
			||||||
 | 
					you have a working build environment set up. See notes on Ubuntu, OS X and Windows
 | 
				
			||||||
 | 
					for details.
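To check whether the running interpreter matches the stated requirements (64-bit CPython 2.6+/3.3+), something like the following can help. It is a sketch under the assumption that "2.6+/3.3+" means 2.6 or later on Python 2 and 3.3 or later on Python 3; the function name ``is_supported_interpreter`` is hypothetical.

```python
import platform
import struct
import sys


def is_supported_interpreter(implementation=None, version=None, pointer_bits=None):
    """Return True for 64-bit CPython 2.6+ or 3.3+ (hypothetical helper)."""
    if implementation is None:
        implementation = platform.python_implementation()
    if version is None:
        version = sys.version_info[:2]
    if pointer_bits is None:
        # Size of a C pointer in bits: 64 on a 64-bit interpreter build.
        pointer_bits = struct.calcsize("P") * 8
    major, minor = version[0], version[1]
    version_ok = (major == 2 and minor >= 6) or (major == 3 and minor >= 3)
    return implementation == "CPython" and pointer_bits == 64 and version_ok


print(is_supported_interpreter())
```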
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					conda
 | 
				
			||||||
 | 
					-----
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. code:: bash
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    conda config --add channels spacy  # only needed once
 | 
				
			||||||
 | 
					    conda install spacy
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					pip
 | 
				
			||||||
 | 
					---
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					When using pip it is generally recommended to install packages in a virtualenv to
 | 
				
			||||||
 | 
					avoid modifying system state:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. code:: bash
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    # make sure you are using a recent pip/virtualenv version
 | 
				
			||||||
 | 
					    python -m pip install -U pip virtualenv
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    virtualenv .env
 | 
				
			||||||
 | 
					    source .env/bin/activate
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    pip install spacy
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Python packaging is awkward at the best of times, and it's particularly tricky with
 | 
				
			||||||
 | 
					C extensions, built via Cython, requiring large data files. So, please report issues
 | 
				
			||||||
 | 
					as you encounter them.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Install model
 | 
				
			||||||
 | 
					=============
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					After installation you need to download a language model. Currently only models for 
 | 
				
			||||||
 | 
					English and German, named ``en`` and ``de``, are available.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. code:: bash
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    python -m spacy.en.download
 | 
				
			||||||
 | 
					    python -m spacy.de.download
 | 
				
			||||||
 | 
					    sputnik --name spacy install en_glove_cc_300_1m_vectors # For better word vectors
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Then check whether the model was successfully installed:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. code:: bash
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    python -c "import spacy; spacy.load('en'); print('OK')"
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The download command fetches about 500 MB of data, which it installs 
 | 
				
			||||||
 | 
					within the ``spacy`` package directory.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Upgrading spaCy
 | 
				
			||||||
 | 
					===============
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					To upgrade spaCy to the latest release:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					conda
 | 
				
			||||||
 | 
					-----
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. code:: bash
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    conda update spacy
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					pip
 | 
				
			||||||
 | 
					---
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. code:: bash
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    pip install -U spacy
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Sometimes new releases require a new language model. Then you will have to upgrade to 
 | 
				
			||||||
 | 
					a new model, too. You can also force re-downloading and installing a new language model:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. code:: bash
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    python -m spacy.en.download --force
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Compile from source
 | 
				
			||||||
 | 
					===================
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The other way to install spaCy is to clone its GitHub repository and build it from 
 | 
				
			||||||
 | 
					source. That is the common way if you want to make changes to the code base.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					You'll need to make sure that you have a development environment consisting of a 
 | 
				
			||||||
 | 
					Python distribution including header files, a compiler, pip, virtualenv and git 
 | 
				
			||||||
 | 
					installed. The compiler part is the trickiest. How to do that depends on your 
 | 
				
			||||||
 | 
					system. See notes on Ubuntu, OS X and Windows for details.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. code:: bash
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    # make sure you are using recent pip/virtualenv versions
 | 
				
			||||||
 | 
					    python -m pip install -U pip virtualenv
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    #  find git install instructions at https://git-scm.com/downloads
 | 
				
			||||||
 | 
					    git clone https://github.com/spacy-io/spaCy.git
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    cd spaCy
 | 
				
			||||||
 | 
					    virtualenv .env && source .env/bin/activate
 | 
				
			||||||
 | 
					    pip install -r requirements.txt
 | 
				
			||||||
 | 
					    pip install -e .
 | 
				
			||||||
 | 
					    
 | 
				
			||||||
 | 
					Compared to a regular install via pip or conda, `requirements.txt <requirements.txt>`_ 
 | 
				
			||||||
 | 
					additionally installs developer dependencies such as cython.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Ubuntu
 | 
				
			||||||
 | 
					------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Install system-level dependencies via ``apt-get``:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. code:: bash
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    sudo apt-get install build-essential python-dev git
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					OS X
 | 
				
			||||||
 | 
					----
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Install a recent version of Xcode, including the so-called "Command Line Tools". 
 | 
				
			||||||
 | 
					OS X ships with Python and git preinstalled.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Windows
 | 
				
			||||||
 | 
					-------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Install a version of Visual Studio Express or higher that matches the version 
 | 
				
			||||||
 | 
					that was used to compile your Python interpreter. For official distributions 
 | 
				
			||||||
 | 
					these are VS 2008 (Python 2.7), VS 2010 (Python 3.4) and VS 2015 (Python 3.5).
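The version pairing above can be written as a small lookup. This is only a sketch of the mapping stated in the paragraph; the names ``COMPILER_FOR_PYTHON`` and ``required_compiler`` are hypothetical, and Python versions not listed there return ``None`` instead of a guess.

```python
import sys

# Pairs taken from the paragraph above (official CPython distributions only).
COMPILER_FOR_PYTHON = {
    (2, 7): "VS 2008",
    (3, 4): "VS 2010",
    (3, 5): "VS 2015",
}


def required_compiler(version=None):
    """Return the matching Visual Studio release, or None if it is not listed."""
    if version is None:
        version = sys.version_info[:2]
    return COMPILER_FOR_PYTHON.get(tuple(version[:2]))


print(required_compiler((3, 5)))
```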
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Workaround for obsolete system Python
 | 
				
			||||||
 | 
					=====================================
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					If you're stuck using a system with an old version of Python, and you don't 
 | 
				
			||||||
 | 
					have root access, we've prepared a bootstrap script to help you compile a local 
 | 
				
			||||||
 | 
					Python install. Run:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. code:: bash
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    curl https://raw.githubusercontent.com/spacy-io/gist/master/bootstrap_python_env.sh | bash && source .env/bin/activate
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Run tests
 | 
				
			||||||
 | 
					=========
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					spaCy comes with an extensive test suite. First, find out where spaCy is 
 | 
				
			||||||
 | 
					installed:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. code:: bash
 | 
				
			||||||
 | 
					    
 | 
				
			||||||
 | 
					    python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))"
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Then run ``pytest`` on that directory. The flags ``--vectors``, ``--slow`` 
 | 
				
			||||||
 | 
					and ``--model`` are optional and enable additional tests:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. code:: bash
 | 
				
			||||||
 | 
					    
 | 
				
			||||||
 | 
					    # make sure you are using a recent pytest version
 | 
				
			||||||
 | 
					    python -m pip install -U pytest
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    python -m pytest <spacy-directory> --vectors --model --slow
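The two steps (locate the installed package, then run ``pytest`` on it) can also be combined from Python. The following is a sketch, not part of spaCy: ``package_directory`` and ``build_pytest_command`` are hypothetical helpers, and the optional flags are the ones named above.

```python
import importlib
import os
import sys


def package_directory(package_name):
    """Return the directory an importable package is installed in."""
    module = importlib.import_module(package_name)
    return os.path.dirname(os.path.abspath(module.__file__))


def build_pytest_command(package_name, vectors=False, model=False, slow=False):
    """Build the argument list for running pytest on an installed package."""
    command = [sys.executable, "-m", "pytest", package_directory(package_name)]
    if vectors:
        command.append("--vectors")
    if model:
        command.append("--model")
    if slow:
        command.append("--slow")
    return command
```

With spaCy installed, the resulting list could be passed to ``subprocess.call``, for example ``subprocess.call(build_pytest_command("spacy", slow=True))``.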
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					API Documentation and Usage Examples
 | 
				
			||||||
 | 
					====================================
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					For the detailed documentation, check out the `spaCy website <https://spacy.io/docs/>`_.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* `Usage Examples <https://spacy.io/docs/#examples>`_
 | 
				
			||||||
 | 
					* `API <https://spacy.io/docs/#api>`_
 | 
				
			||||||
 | 
					* `Annotation Specification <https://spacy.io/docs/#annotation>`_
 | 
				
			||||||
 | 
					* `Tutorials <https://spacy.io/docs/#tutorials>`_
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Changelog
 | 
				
			||||||
 | 
					=========
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2016-05-10 `v0.101.0 <../../releases/tag/0.101.0>`_: *Fixed German model*
 | 
				
			||||||
 | 
					-------------------------------------------------------------------------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* Fixed bug that prevented German parses from being deprojectivised.
 | 
					* Fixed bug that prevented German parses from being deprojectivised.
 | 
				
			||||||
 | 
					 | 
				
			||||||
* Bug fixes to sentence boundary detection.
 | 
					* Bug fixes to sentence boundary detection.
 | 
				
			||||||
 | 
					 | 
				
			||||||
* Add rich comparison methods to the Lexeme class.
 | 
					* Add rich comparison methods to the Lexeme class.
 | 
				
			||||||
 | 
					* Add missing ``Doc.has_vector`` and ``Span.has_vector`` properties.
 | 
				
			||||||
 | 
					* Add missing ``Span.sent`` property.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* Add missing Doc.has_vector and Span.has_vector properties.
 | 
					2016-05-05 `v0.100.7 <../../releases/tag/0.100.7>`_: *German!*
 | 
				
			||||||
 | 
					--------------------------------------------------------------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
* Add missing Span.sent property.
 | 
					spaCy finally supports another language, in addition to English. We're lucky 
 | 
				
			||||||
 | 
					to have Wolfgang Seeker on the team, and the new German model is just the 
 | 
				
			||||||
 | 
					beginning. Now that there are multiple languages, you should consider loading 
 | 
				
			||||||
2016-05-05 v0.100.7: German!
 | 
					spaCy via the ``load()`` function. This function also makes it easier to load extra 
 | 
				
			||||||
----------------------------
 | 
					word vector data for English:
 | 
				
			||||||
 | 
					 | 
				
			||||||
spaCy finally supports another language, in addition to English. We're lucky to have Wolfgang Seeker on the team, and the new German model is just the beginning.
 | 
					 | 
				
			||||||
Now that there are multiple languages, you should consider loading spaCy via the load() function. This function also makes it easier to load extra word vector data for English:
 | 
					 | 
				
			||||||
 | 
					
 | 
				
			||||||
.. code:: python
 | 
					.. code:: python
 | 
				
			||||||
 | 
					
 | 
				
			||||||
| 
						 | 
					@ -79,8 +246,9 @@ Now that there are multiple languages, you should consider loading spaCy via the
 | 
				
			||||||
    en_nlp = spacy.load('en', vectors='en_glove_cc_300_1m_vectors')
 | 
					    en_nlp = spacy.load('en', vectors='en_glove_cc_300_1m_vectors')
 | 
				
			||||||
    de_nlp = spacy.load('de')
 | 
					    de_nlp = spacy.load('de')
 | 
				
			||||||
    
 | 
					    
 | 
				
			||||||
To support use of the load function, there are also two new helper functions: spacy.get_lang_class and spacy.set_lang_class.
 | 
					To support use of the load function, there are also two new helper functions: 
 | 
				
			||||||
Once the German model is loaded, you can use it just like the English model:
 | 
					``spacy.get_lang_class`` and ``spacy.set_lang_class``. Once the German model is 
 | 
				
			||||||
 | 
					loaded, you can use it just like the English model:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
.. code:: python
 | 
					.. code:: python
 | 
				
			||||||
 | 
					
 | 
				
			||||||
| 
						 | 
					@ -92,20 +260,130 @@ Once the German model is loaded, you can use it just like the English model:
 | 
				
			||||||
    # (u'ist', 1, 2)
 | 
					    # (u'ist', 1, 2)
 | 
				
			||||||
    # (u'sind', 1, 3)
 | 
					    # (u'sind', 1, 3)
 | 
				
			||||||
    
 | 
					    
 | 
				
			||||||
The German model provides tokenization, POS tagging, sentence boundary detection, syntactic dependency parsing, recognition of organisation, location and person entities, and word vector representations trained on a mix of open subtitles and Wikipedia data. It doesn't yet provide lemmatisation or morphological analysis, and it doesn't yet recognise numeric entities such as numbers and dates.
 | 
					The German model provides tokenization, POS tagging, sentence boundary detection, 
 | 
				
			||||||
 | 
					syntactic dependency parsing, recognition of organisation, location and person 
 | 
				
			||||||
 | 
					entities, and word vector representations trained on a mix of open subtitles and 
 | 
				
			||||||
 | 
					Wikipedia data. It doesn't yet provide lemmatisation or morphological analysis, 
 | 
				
			||||||
 | 
					and it doesn't yet recognise numeric entities such as numbers and dates.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Bugfixes
 | 
					**Bugfixes**
 | 
				
			||||||
--------
 | 
					 | 
				
			||||||
 | 
					
 | 
				
			||||||
* spaCy < 0.100.7 had a bug in the semantics of the Token.__str__ and Token.__unicode__ built-ins: they included a trailing space.
 | 
					* spaCy < 0.100.7 had a bug in the semantics of the ``Token.__str__`` and ``Token.__unicode__`` built-ins: they included a trailing space.
 | 
				
			||||||
* Improve handling of "infixed" hyphens. Previously the tokenizer struggled with multiple hyphens, such as "well-to-do".
 | 
					* Improve handling of "infixed" hyphens. Previously the tokenizer struggled with multiple hyphens, such as "well-to-do".
 | 
				
			||||||
 | 
					 | 
				
			||||||
* Improve handling of periods after mixed-case tokens
 | 
					* Improve handling of periods after mixed-case tokens
 | 
				
			||||||
 | 
					 | 
				
			||||||
* Improve lemmatization for English special-case tokens
 | 
					* Improve lemmatization for English special-case tokens
 | 
				
			||||||
 | 
					 | 
				
			||||||
* Fix bug that allowed spaces to be treated as heads in the syntactic parse
 | 
					* Fix bug that allowed spaces to be treated as heads in the syntactic parse
 | 
				
			||||||
 | 
					 | 
				
			||||||
* Fix bug that led to inconsistent sentence boundaries before and after serialisation.
 | 
					* Fix bug that led to inconsistent sentence boundaries before and after serialisation.
 | 
				
			||||||
 | 
					 | 
				
			||||||
* Fix bug from deserialising untagged documents.
 | 
					* Fix bug from deserialising untagged documents.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2016-03-08 `v0.100.6 <../../releases/tag/0.100.6>`_: *Add support for GloVe vectors*
 | 
				
			||||||
 | 
					------------------------------------------------------------------------------------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					This release offers improved support for replacing the word vectors used by spaCy. 
 | 
				
			||||||
 | 
					To install Stanford's GloVe vectors, trained on the Common Crawl, just run:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. code:: bash
 | 
				
			||||||
 | 
					    sputnik --name spacy install en_glove_cc_300_1m_vectors
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					To reduce memory usage and loading time, we've trimmed the vocabulary down to 1m entries.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					This release also integrates all the code necessary for German parsing. A German model 
 | 
				
			||||||
 | 
					will be released shortly. To assist in multi-lingual processing, we've added a ``load()`` 
 | 
				
			||||||
 | 
					function. To load the English model with the GloVe vectors:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					.. code:: python
 | 
				
			||||||
 | 
					    spacy.load('en', vectors='en_glove_cc_300_1m_vectors')
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2016-02-07 `v0.100.5 <../../releases/tag/0.100.5>`_
 | 
				
			||||||
 | 
					---------------------------------------------------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Fix incorrect use of header file, caused by a problem with thinc
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2016-02-07 `v0.100.4 <../../releases/tag/0.100.4>`_: *Fix OSX problem introduced in 0.100.3*
 | 
				
			||||||
 | 
					--------------------------------------------------------------------------------------------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Small correction to right_edge calculation
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2016-02-06 `v0.100.3 <../../releases/tag/0.100.3>`_
 | 
				
			||||||
 | 
					---------------------------------------------------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Support multi-threading, via the ``.pipe`` method. spaCy now releases the GIL around the
 | 
				
			||||||
 | 
					parser and entity recognizer, so systems that support OpenMP should be able to do
 | 
				
			||||||
 | 
					shared memory parallelism at close to full efficiency.
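A usage sketch for streaming texts through ``.pipe`` follows. The keyword names ``batch_size`` and ``n_threads`` are an assumption about the method's signature in this release, so check them against the API documentation. ``WhitespacePipeline`` is a hypothetical stand-in so the sketch runs without spaCy or a model; with spaCy installed, the object returned by ``spacy.load('en')`` would take its place.

```python
def count_tokens(nlp, texts, batch_size=1000, n_threads=2):
    """Stream texts through nlp.pipe() and count the tokens produced."""
    total = 0
    # .pipe yields processed documents one at a time instead of building a list.
    for doc in nlp.pipe(texts, batch_size=batch_size, n_threads=n_threads):
        total += len(doc)
    return total


class WhitespacePipeline(object):
    """Hypothetical stand-in that mimics the assumed .pipe interface."""

    def pipe(self, texts, batch_size=1000, n_threads=2):
        for text in texts:
            # Each "document" is just a list of whitespace-separated tokens.
            yield text.split()


print(count_tokens(WhitespacePipeline(), [u"Ich bin ein Berliner.", u"Hello world"]))
```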
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					We've also greatly reduced loading time, and fixed a number of bugs.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2016-01-21 `v0.100.2 <../../releases/tag/0.100.2>`_
 | 
				
			||||||
 | 
					---------------------------------------------------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Fix data version lock that affected v0.100.1
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2016-01-21 `v0.100.1 <../../releases/tag/0.100.1>`_: *Fix install for OSX*
 | 
				
			||||||
 | 
					--------------------------------------------------------------------------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					v0.100 included header files built on Linux that caused installation to fail on OSX.
 | 
				
			||||||
 | 
					This should now be corrected. We also update the default data distribution, to
 | 
				
			||||||
 | 
					include a small fix to the tokenizer.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2016-01-19 `v0.100 <../../releases/tag/0.100>`_: *Revise setup.py, better model downloads, bug fixes*
 | 
				
			||||||
 | 
					-----------------------------------------------------------------------------------------------------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Redo setup.py, and remove ugly headers_workaround hack. Should result in fewer install problems.
 | 
				
			||||||
 | 
					* Update data downloading and installation functionality, by migrating to the Sputnik data-package manager. This will allow us to offer finer grained control of data installation in future.
 | 
				
			||||||
 | 
					* Fix bug when using custom entity types in ``Matcher``. This should work by default when using the
 | 
				
			||||||
 | 
					  ``English.__call__`` method of running the pipeline. If invoking ``Parser.__call__`` directly to do NER,
 | 
				
			||||||
 | 
					  you should call the ``Parser.add_label()`` method to register your entity type.
 | 
				
			||||||
 | 
					* Fix head-finding rules in ``Span``.
 | 
				
			||||||
 | 
					* Fix problem that caused ``doc.merge()`` to sometimes hang
 | 
				
			||||||
 | 
					* Fix problems in handling of whitespace
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2015-11-08 `v0.99 <../../releases/tag/0.99>`_: *Improve span merging, internal refactoring*
 | 
				
			||||||
 | 
					-------------------------------------------------------------------------------------------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Merging multi-word tokens into one, via the ``doc.merge()`` and ``span.merge()`` methods, no longer invalidates existing ``Span`` objects. This makes it much easier to merge multiple spans, e.g. to merge all named entities, or all base noun phrases. Thanks to @andreasgrv for help on this patch.
 | 
				
			||||||
 | 
					* Lots of internal refactoring, especially around the machine learning module, thinc. The thinc API has now been improved, and the spacy._ml wrapper module is no longer necessary.
 | 
				
			||||||
 | 
					* The lemmatizer now lower-cases non-noun, non-verb and non-adjective words.
 | 
				
			||||||
 | 
					* A new attribute, ``.rank``, is added to Token and Lexeme objects, giving the frequency rank of the word.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2015-11-03 `v0.98 <../../releases/tag/0.98>`_: *Smaller package, bug fixes*
 | 
				
			||||||
 | 
					---------------------------------------------------------------------------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Remove binary data from PyPi package.
 | 
				
			||||||
 | 
					* Delete archive after downloading data
 | 
				
			||||||
 | 
					* Use updated cymem, preshed and thinc packages
 | 
				
			||||||
 | 
					* Fix information loss in deserialize
 | 
				
			||||||
 | 
					* Fix ``__str__`` methods for Python2
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2015-10-23 `v0.97 <../../releases/tag/0.97>`_: *Load the StringStore from a json list, instead of a text file*
 | 
				
			||||||
 | 
					--------------------------------------------------------------------------------------------------------------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Fix bugs in download.py
 | 
				
			||||||
 | 
					* Require ``--force`` to over-write the data directory in download.py
 | 
				
			||||||
 | 
					* Fix bugs in ``Matcher`` and ``doc.merge()``
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2015-10-19 `v0.96 <../../releases/tag/0.96>`_: *Hotfix to .merge method*
 | 
				
			||||||
 | 
					------------------------------------------------------------------------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Fix bug that caused text to be lost after ``.merge``
 | 
				
			||||||
 | 
					* Fix bug in Matcher when matched entities overlapped
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2015-10-18 `v0.95 <../../releases/tag/0.95>`_: *Bugfixes*
 | 
				
			||||||
 | 
					---------------------------------------------------------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Reform encoding of symbols
 | 
				
			||||||
 | 
					* Fix bugs in ``Matcher``
 | 
				
			||||||
 | 
					* Fix bugs in ``Span``
 | 
				
			||||||
 | 
					* Add tokenizer rule to fix numeric range tokenization
 | 
				
			||||||
 | 
					* Add specific string-length cap in Tokenizer
 | 
				
			||||||
 | 
					* Fix ``token.conjuncts``
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2015-10-09 `v0.94 <../../releases/tag/0.94>`_
 | 
				
			||||||
 | 
					---------------------------------------------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Fix memory error that caused crashes on 32bit platforms
 | 
				
			||||||
 | 
					* Fix parse errors caused by smart quotes and em-dashes
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					2015-09-22 `v0.93 <../../releases/tag/0.93>`_
 | 
				
			||||||
 | 
					---------------------------------------------
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Bug fixes to word vectors
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||