mirror of
https://github.com/explosion/spaCy.git
synced 2025-03-29 14:24:12 +03:00
213 lines
8.1 KiB
Plaintext
213 lines
8.1 KiB
Plaintext
mixin Option(name, open)
|
|
details(open=open)
|
|
summary
|
|
h4= name
|
|
block
|
|
|
|
+Option("Updating your installation")
|
|
| To update your installation:
|
|
|
|
pre.language-bash
|
|
code
|
|
$ pip install --upgrade spacy
|
|
$ python -m spacy.en.download --force all
|
|
p Most updates ship a new model, so you will usually have to redownload the data.
|
|
|
|
|
|
|
|
+Option("conda", true)
|
|
pre.language-bash: code
|
|
| $ conda config --add channels spacy
|
|
| $ conda install spacy
|
|
| $ python -m spacy.en.download all
|
|
|
|
+Option("pip and virtualenv", true)
|
|
p With Python 2.7 or Python 3, using Linux or OSX, ensure that you have the following packages installed:
|
|
|
|
pre.language-bash: code
|
|
| build-essential python-dev
|
|
|
|
p Then run:
|
|
|
|
pre.language-bash: code
|
|
| $ pip install spacy
|
|
| $ python -m spacy.en.download all
|
|
|
|
p
|
|
| The download command fetches and installs about 400mb of data, for
|
|
| the parser model and word vectors, which it installs within the spacy.en
|
|
| package directory.
|
|
|
|
|
|
+Option("Workaround for obsolete system Python", false)
|
|
p
|
|
| If you're stuck using a server with an old version of Python, and you
|
|
| don't have root access, I've prepared a bootstrap script to help you
|
|
| compile a local Python install. Run:
|
|
|
|
pre.language-bash: code
|
|
| $ curl https://raw.githubusercontent.com/honnibal/spaCy/master/bootstrap_python_env.sh | bash && source .env/bin/activate
|
|
|
|
|
|
+Option("Compile from source", false)
|
|
p
|
|
| The other way to install the package is to clone the github repository,
|
|
| and build it from source. This installs an additional dependency,
|
|
| Cython. If you're using Python 2, I also recommend installing fabric
|
|
| and fabtools – this is how I build the project.
|
|
|
|
|
| Ensure that you have the following packages installed:
|
|
|
|
pre.language-bash: code
|
|
| build-essential python-dev git python-virtualenv
|
|
|
|
pre.language-bash: code
|
|
| $ git clone https://github.com/honnibal/spaCy.git
|
|
| $ cd spaCy
|
|
| $ virtualenv .env && source .env/bin/activate
|
|
| $ export PYTHONPATH=`pwd`
|
|
| $ pip install -r requirements.txt
|
|
| $ python setup.py build_ext --inplace
|
|
| $ python -m spacy.en.download
|
|
| $ pip install pytest
|
|
| $ py.test tests/
|
|
|
|
p
|
|
| Python packaging is awkward at the best of times, and it's particularly tricky
|
|
| with C extensions, built via Cython, requiring large data files. So,
|
|
| please report issues as you encounter them.
|
|
|
|
+Option("pypy (Unsupported)")
|
|
| If PyPy support is a priority for you, please get in touch. We could likely
|
|
| fix the remaining issues, if necessary. However, the library is likely to
|
|
| be much slower on PyPy, as it's written in Cython, which produces code tuned
|
|
| for the performance of CPython.
|
|
|
|
+Option("Windows (Unsupported)")
|
|
| Unfortunately we don't currently support Windows.
|
|
|
|
h4 What's New?
|
|
|
|
|
|
details
|
|
summary
|
|
h4 2015-10-24 v0.97: Reduce load time, bug fixes
|
|
|
|
ul
|
|
li Load the StringStore from a json list, instead of a text file. Accept a file-like object in the API instead of a path, for better flexibility.
|
|
li * Load from file, rather than path, in StringStore
|
|
li Fix bugs in download.py
|
|
li Require #[code --force] to over-write the data directory in download.py
|
|
li Fix bugs in #[code Matcher] and #[code doc.merge()]
|
|
|
|
details
|
|
summary
|
|
h4 2015-09-21 v0.93: Bug fixes to word vectors. Rename .repvec to .vector. Rename .string attribute.
|
|
|
|
ul
|
|
li Bug fixes for word vectors.
|
|
li The attribute to access word vectors was formerly named #[code token.repvec]. This has been renamed #[code token.vector]. The #[code .repvec] name is now deprecated. It will remain available until the next version.
|
|
li Add #[code .vector] attributes to #[code Doc] and #[code Span] objects, which gives the average of their word vectors.
|
|
li Add a #[code .similarity] method to #[code Token], #[code Doc], #[code Span] and #[code Lexeme] objects, that performs cosine similarity over word vectors.
|
|
li The attribute #[code .string], which gave a whitespace-padded string representation, is now renamed #[code .text_with_ws]. A new #[code .text] attribute has been added. The #[code .string] attribute is now deprecated. It will remain available until the next version.
|
|
|
|
|
|
details
|
|
summary
|
|
h4 2015-09-15 v0.90: Refactor to allow multi-lingual, better customization.
|
|
|
|
ul
|
|
li Move code out of #[code spacy.en] module, to allow new languages to be added more easily.
|
|
li New #[code Matcher] class, to support token-based expressions, for custom named entity rules.
|
|
li Users can now write to #[code Lexeme] objects, to set their own flags and string attributes. Values set on the vocabulary are inherited by tokens. This is especially effective in combination with the rule logic.
|
|
|
|
details
|
|
summary
|
|
h4 2015-07-29 v0.89: Fix Spans, efficiency
|
|
|
|
ul
|
|
li Fix regression in parse times on very long texts. Recent versions were calculating parse features in a way that was polynomial in input length.
|
|
li Add tag SP (coarse tag SPACE) for whitespace tokens. Ensure entity recogniser does not assign entities to whitespace.
|
|
li Rename #[code Span.head] to #[code Span.root], fix its documentation, and make it more efficient.
|
|
|
|
details
|
|
summary
|
|
h4 2015-07-08 v0.88: Refactoring release.
|
|
|
|
ul
|
|
li If you have the data for v0.87, you don't need to redownload the data for this release.
|
|
li You can now set #[code tag=False], #[code parse=False] or #[code entity=False] when creating the pipleine, to disable some of the models. See the documentation for details.
|
|
li Models no longer lazy-loaded.
|
|
li Warning emitted when parse=True or entity=True but model not loaded.
|
|
li Rename the tokens.Tokens class to tokens.Doc. An alias has been made to assist backwards compatibility, but you should update your code to refer to the new class name.
|
|
li Various bits of internal refactoring
|
|
|
|
details
|
|
summary
|
|
h4 2015-07-01 v0.87: Memory use
|
|
|
|
ul
|
|
li Changed weights data structure. Memory use should be reduced 30-40%. Fixed speed regressions introduced in the last few versions.
|
|
li Models should now be slightly more robust to noise in the input text, as I'm now training on data with a small amount of noise added, e.g. I randomly corrupt capitalization, swap spaces for newlines, etc. This is bringing a small benefit on out-of-domain data. I think this strategy could yield better results with a better noise-generation function. If you think you have a good way to make clean text resemble the kind of noisy input you're seeing in your domain, get in touch.
|
|
|
|
details
|
|
summary
|
|
h4 2015-06-24 v0.86: Parser accuracy
|
|
|
|
p Parser now more accurate, using novel non-monotonic transition system that's currently under review.
|
|
|
|
details
|
|
summary
|
|
h4 2015-05-12 v0.85: More diverse training data
|
|
|
|
ul
|
|
li Parser produces richer dependency labels following the `ClearNLP scheme`_
|
|
li Training data now includes text from a variety of genres.
|
|
li Parser now uses more memory and the data is slightly larger, due to the additional labels. Impact on efficiency is minimal: entire process still takes <10ms per document.
|
|
|
|
p Most users should see a substantial increase in accuracy from the new model.
|
|
|
|
|
|
details
|
|
summary
|
|
h4 2015-05-12 v0.84: Bug fixes
|
|
|
|
ul
|
|
li Bug fixes for parsing
|
|
li Bug fixes for named entity recognition
|
|
|
|
details
|
|
summary
|
|
h4 2015-04-13 v0.80
|
|
|
|
p Preliminary support for named-entity recognition. Its accuracy is substantially behind the state-of-the-art. I'm working on improvements.
|
|
|
|
ul
|
|
li Better sentence boundary detection, drawn from the syntactic structure.
|
|
li Lots of bug fixes.
|
|
|
|
details
|
|
summary
|
|
h4 2015-03-05: v0.70
|
|
|
|
ul
|
|
li Improved parse navigation API
|
|
li Bug fixes to labelled parsing
|
|
|
|
|
|
details
|
|
summary
|
|
h4 2015-01-30: v0.4
|
|
ul
|
|
li Train statistical models on running text running text
|
|
|
|
details
|
|
summary
|
|
h4 2015-01-25: v0.33
|
|
ul
|
|
li Alpha release
|
|
|
|
|
|
|
|
|