Mirror of https://github.com/explosion/spaCy.git (synced 2024-12-26 01:46:28 +03:00)
Update README and docs [ci skip]

parent 45c551037d
commit a8a1231ccd

README.md (119 lines changed)
@@ -7,11 +7,14 @@ Cython. It's built on the very latest research, and was designed from day one to
 be used in real products.

 spaCy comes with
-[pretrained pipelines](https://spacy.io/models), and
+[pretrained pipelines](https://spacy.io/models) and
 currently supports tokenization and training for **60+ languages**. It features
 state-of-the-art speed and **neural network models** for tagging,
-parsing, **named entity recognition**, **text classification** and more, multi-task learning with pretrained **transformers** like BERT, as well as a production-ready [training system](https://spacy.io/usage/training) and easy model packaging, deployment and workflow management.
-spaCy is commercial open-source software, released under the MIT license.
+parsing, **named entity recognition**, **text classification** and more,
+multi-task learning with pretrained **transformers** like BERT, as well as a
+production-ready [**training system**](https://spacy.io/usage/training) and easy
+model packaging, deployment and workflow management. spaCy is commercial
+open-source software, released under the MIT license.

 💫 **Version 3.0 out now!**
 [Check out the release notes here.](https://github.com/explosion/spaCy/releases)
@@ -25,7 +28,6 @@ spaCy is commercial open-source software, released under the MIT license.
 <br />
 [![PyPi downloads](https://img.shields.io/pypi/dm/spacy?style=flat-square&logo=pypi&logoColor=white)](https://pypi.org/project/spacy/)
 [![Conda downloads](https://img.shields.io/conda/dn/conda-forge/spacy?style=flat-square&logo=conda-forge&logoColor=white)](https://anaconda.org/conda-forge/spacy)
-[![Model downloads](https://img.shields.io/github/downloads/explosion/spacy-models/total?style=flat-square&label=model+downloads)](https://github.com/explosion/spacy-models/releases)
 [![spaCy on Twitter](https://img.shields.io/twitter/follow/spacy_io.svg?style=social&label=Follow)](https://twitter.com/spacy_io)

 ## 📖 Documentation
@@ -81,7 +83,7 @@ it.
 - Support for **60+ languages**
 - **Trained pipelines** for different languages and tasks
 - Multi-task learning with pretrained **transformers** like BERT
-- Support for pretrained **word vectors** and embedings
+- Support for pretrained **word vectors** and embeddings
 - State-of-the-art speed
 - Production-ready **training system**
 - Linguistically-motivated **tokenization**
@@ -95,7 +97,7 @@ it.
 📖 **For more details, see the
 [facts, figures and benchmarks](https://spacy.io/usage/facts-figures).**

-## Install spaCy
+## ⏳ Install spaCy

 For detailed installation instructions, see the
 [documentation](https://spacy.io/usage).
@@ -110,8 +112,8 @@ For detailed installation instructions, see the

 ### pip

-Using pip, spaCy releases are available as source packages and binary wheels (as
-of `v2.0.13`). Before you install spaCy and its dependencies, make sure that
+Using pip, spaCy releases are available as source packages and binary wheels.
+Before you install spaCy and its dependencies, make sure that
 your `pip`, `setuptools` and `wheel` are up to date.

 ```bash
@@ -119,13 +121,12 @@ pip install -U pip setuptools wheel
 pip install spacy
 ```

-To install additional data tables for lemmatization and normalization in
-**spaCy v2.2+** you can run `pip install spacy[lookups]` or install
+To install additional data tables for lemmatization and normalization you can
+run `pip install spacy[lookups]` or install
 [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data)
 separately. The lookups package is needed to create blank models with
-lemmatization data for v2.2+ plus normalization data for v2.3+, and to
-lemmatize in languages that don't yet come with pretrained models and aren't
-powered by third-party libraries.
+lemmatization data, and to lemmatize in languages that don't yet come with
+pretrained models and aren't powered by third-party libraries.

 When using pip it is generally recommended to install packages in a virtual
 environment to avoid modifying system state:
@@ -139,17 +140,14 @@ pip install spacy

 ### conda

-Thanks to our great community, we've finally re-added conda support. You can now
-install spaCy via `conda-forge`:
+You can also install spaCy from `conda` via the `conda-forge` channel. For the
+feedstock including the build recipe and configuration, check out
+[this repository](https://github.com/conda-forge/spacy-feedstock).

 ```bash
 conda install -c conda-forge spacy
 ```

-For the feedstock including the build recipe and configuration, check out
-[this repository](https://github.com/conda-forge/spacy-feedstock). Improvements
-and pull requests to the recipe and setup are always appreciated.
-
 ### Updating spaCy

 Some updates to spaCy may require downloading new statistical models. If you're
@@ -169,34 +167,36 @@ with the new version.
 📖 **For details on upgrading from spaCy 2.x to spaCy 3.x, see the
 [migration guide](https://spacy.io/usage/v3#migrating).**

-## Download models
+## 📦 Download model packages

 Trained pipelines for spaCy can be installed as **Python packages**. This
 means that they're a component of your application, just like any other module.
-Models can be installed using spaCy's `download` command, or manually by
-pointing pip to a path or URL.
+Models can be installed using spaCy's [`download`](https://spacy.io/api/cli#download)
+command, or manually by pointing pip to a path or URL.

-| Documentation | |
-| ---------------------- | ---------------------------------------------------------------- |
-| [Available Pipelines] | Detailed pipeline descriptions, accuracy figures and benchmarks. |
-| [Models Documentation] | Detailed usage instructions. |
+| Documentation | |
+| -------------------------- | ---------------------------------------------------------------- |
+| **[Available Pipelines]** | Detailed pipeline descriptions, accuracy figures and benchmarks. |
+| **[Models Documentation]** | Detailed usage and installation instructions. |
+| **[Training]** | How to train your own pipelines on your data. |

 [available pipelines]: https://spacy.io/models
-[models documentation]: https://spacy.io/docs/usage/models
+[models documentation]: https://spacy.io/usage/models
+[training]: https://spacy.io/usage/training

 ```bash
 # Download best-matching version of specific model for your spaCy installation
 python -m spacy download en_core_web_sm

 # pip install .tar.gz archive from path or URL
-pip install /Users/you/en_core_web_sm-2.2.0.tar.gz
-pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.0/en_core_web_sm-2.2.0.tar.gz
+pip install /Users/you/en_core_web_sm-3.0.0.tar.gz
+pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz
 ```

 ### Loading and using models

-To load a model, use `spacy.load()` with the model name or a
-path to the model data directory.
+To load a model, use [`spacy.load()`](https://spacy.io/api/top-level#spacy.load)
+with the model name or a path to the model data directory.

 ```python
 import spacy
@@ -218,7 +218,7 @@ doc = nlp("This is a sentence.")
 📖 **For more info and examples, check out the
 [models documentation](https://spacy.io/docs/usage/models).**

-## Compile from source
+## ⚒ Compile from source

 The other way to install spaCy is to clone its
 [GitHub repository](https://github.com/explosion/spaCy) and build it from
@@ -228,8 +228,19 @@ Python distribution including header files, a compiler,
 [pip](https://pip.pypa.io/en/latest/installing/),
 [virtualenv](https://virtualenv.pypa.io/en/latest/) and
 [git](https://git-scm.com) installed. The compiler part is the trickiest. How to
-do that depends on your system. See notes on Ubuntu, OS X and Windows for
-details.
+do that depends on your system.
+
+| Platform | |
+| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| 🐧 **Ubuntu** | Install system-level dependencies via `apt-get`: `sudo apt-get install build-essential python-dev git` . |
+| 🍎 **Mac** | Install a recent version of [XCode](https://developer.apple.com/xcode/), including the so-called "Command Line Tools". macOS and OS X ship with Python and git preinstalled. |
+| 🖼 **Windows** | Install a version of the [Visual C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/) or [Visual Studio Express](https://visualstudio.microsoft.com/vs/express/) that matches the version that was used to compile your Python interpreter. |
+
+For more details
+and instructions, see the documentation on
+[compiling spaCy from source](https://spacy.io/usage#source) and the
+[quickstart widget](https://spacy.io/usage#section-quickstart) to get the right
+commands for your platform and Python version.

 ```bash
 git clone https://github.com/explosion/spaCy
@@ -250,55 +261,25 @@ To install with extras:
 pip install .[lookups,cuda102]
 ```

-To install all dependencies required for development:
+To install all dependencies required for development, use the [`requirements.txt`](requirements.txt). Compared to regular install via pip, it
+additionally installs developer dependencies such as Cython.

 ```bash
 pip install -r requirements.txt
 ```

-Compared to regular install via pip, [requirements.txt](requirements.txt)
-additionally installs developer dependencies such as Cython. For more details
-and instructions, see the documentation on
-[compiling spaCy from source](https://spacy.io/usage#source) and the
-[quickstart widget](https://spacy.io/usage#section-quickstart) to get the right
-commands for your platform and Python version.
-
-### Ubuntu
-
-Install system-level dependencies via `apt-get`:
-
-```bash
-sudo apt-get install build-essential python-dev git
-```
-
-### macOS / OS X
-
-Install a recent version of [XCode](https://developer.apple.com/xcode/),
-including the so-called "Command Line Tools". macOS and OS X ship with Python
-and git preinstalled.
-
-### Windows
-
-Install a version of the
-[Visual C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/)
-or [Visual Studio Express](https://visualstudio.microsoft.com/vs/express/) that
-matches the version that was used to compile your Python interpreter.
-
-## Run tests
+## 🚦 Run tests

 spaCy comes with an [extensive test suite](spacy/tests). In order to run the
 tests, you'll usually want to clone the repository and build spaCy from source.
 This will also install the required development dependencies and test utilities
-defined in the `requirements.txt`.
+defined in the [`requirements.txt`](requirements.txt).

 Alternatively, you can run `pytest` on the tests from within the installed
 `spacy` package. Don't forget to also install the test utilities via spaCy's
-`requirements.txt`:
+[`requirements.txt`](requirements.txt):

 ```bash
 pip install -r requirements.txt
 python -m pytest --pyargs spacy
 ```
-
-See [the documentation](https://spacy.io/usage#tests) for more details and
-examples.
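The model-package hunk above switches the pip examples to the `en_core_web_sm-3.0.0` release archive. Here is a minimal sketch of how such an archive URL is put together. It assumes every package follows the `{name}-{version}` naming seen in that single example. `model_archive_url` is a hypothetical helper, not part of spaCy.

```python
# Hypothetical helper (not part of spaCy). It assumes the release-asset naming
# shown in the README diff: .../download/{name}-{version}/{name}-{version}.tar.gz
RELEASES = "https://github.com/explosion/spacy-models/releases/download"


def model_archive_url(name: str, version: str) -> str:
    """Return the assumed .tar.gz URL for a pipeline package name and version."""
    archive = f"{name}-{version}"
    return f"{RELEASES}/{archive}/{archive}.tar.gz"


if __name__ == "__main__":
    # The printed URL could then be passed to `pip install`
    print(model_archive_url("en_core_web_sm", "3.0.0"))
```

`python -m spacy download` already picks the best-matching version for your spaCy installation. A hand-built URL is therefore mostly useful for pinning one exact archive.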
@@ -21,12 +21,12 @@ values are defined in the [`Language.Defaults`](/api/language#defaults).
 > nlp_de = German() # Includes German data
 > ```

-| Name | Description |
-| ---------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| **Stop words**<br />[`stop_words.py`](%%GITHUB_SPACY/spacy/lang/en/stop_words.py) | List of most common words of a language that are often useful to filter out, for example "and" or "I". Matching tokens will return `True` for `is_stop`. |
-| **Tokenizer exceptions**<br />[`tokenizer_exceptions.py`](%%GITHUB_SPACY/spacy/lang/de/tokenizer_exceptions.py) | Special-case rules for the tokenizer, for example, contractions like "can't" and abbreviations with punctuation, like "U.K.". |
-| **Punctuation rules**<br />[`punctuation.py`](%%GITHUB_SPACY/spacy/lang/punctuation.py) | Regular expressions for splitting tokens, e.g. on punctuation or special characters like emoji. Includes rules for prefixes, suffixes and infixes. |
-| **Character classes**<br />[`char_classes.py`](%%GITHUB_SPACY/spacy/lang/char_classes.py) | Character classes to be used in regular expressions, for example, Latin characters, quotes, hyphens or icons. |
-| **Lexical attributes**<br />[`lex_attrs.py`](%%GITHUB_SPACY/spacy/lang/en/lex_attrs.py) | Custom functions for setting lexical attributes on tokens, e.g. `like_num`, which includes language-specific words like "ten" or "hundred". |
-| **Syntax iterators**<br />[`syntax_iterators.py`](%%GITHUB_SPACY/spacy/lang/en/syntax_iterators.py) | Functions that compute views of a `Doc` object based on its syntax. At the moment, only used for [noun chunks](/usage/linguistic-features#noun-chunks). |
-| **Lemmatizer**<br />[`lemmatizer.py`](%%GITHUB_SPACY/master/spacy/lang/fr/lemmatizer.py) [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) | Custom lemmatizer implementation and lemmatization tables. |
+| Name | Description |
+| --------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **Stop words**<br />[`stop_words.py`](%%GITHUB_SPACY/spacy/lang/en/stop_words.py) | List of most common words of a language that are often useful to filter out, for example "and" or "I". Matching tokens will return `True` for `is_stop`. |
+| **Tokenizer exceptions**<br />[`tokenizer_exceptions.py`](%%GITHUB_SPACY/spacy/lang/de/tokenizer_exceptions.py) | Special-case rules for the tokenizer, for example, contractions like "can't" and abbreviations with punctuation, like "U.K.". |
+| **Punctuation rules**<br />[`punctuation.py`](%%GITHUB_SPACY/spacy/lang/punctuation.py) | Regular expressions for splitting tokens, e.g. on punctuation or special characters like emoji. Includes rules for prefixes, suffixes and infixes. |
+| **Character classes**<br />[`char_classes.py`](%%GITHUB_SPACY/spacy/lang/char_classes.py) | Character classes to be used in regular expressions, for example, Latin characters, quotes, hyphens or icons. |
+| **Lexical attributes**<br />[`lex_attrs.py`](%%GITHUB_SPACY/spacy/lang/en/lex_attrs.py) | Custom functions for setting lexical attributes on tokens, e.g. `like_num`, which includes language-specific words like "ten" or "hundred". |
+| **Syntax iterators**<br />[`syntax_iterators.py`](%%GITHUB_SPACY/spacy/lang/en/syntax_iterators.py) | Functions that compute views of a `Doc` object based on its syntax. At the moment, only used for [noun chunks](/usage/linguistic-features#noun-chunks). |
+| **Lemmatizer**<br />[`lemmatizer.py`](%%GITHUB_SPACY/spacy/lang/fr/lemmatizer.py) [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) | Custom lemmatizer implementation and lemmatization tables. |
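The tokenizer-exception and punctuation-rule rows in the table above describe data that the tokenizer applies together. The sketch below is a heavily simplified illustration of that interplay under a toy rule set. Text is first split on whitespace. Special cases are then checked before each prefix or suffix is peeled off, and whatever remains is split on infixes. `SPECIAL_CASES`, the three regexes and `tokenize` are hypothetical names and values, not spaCy's data or its `Tokenizer` implementation.

```python
import re
from typing import Dict, List

# Hypothetical toy rules, loosely modelled on the table rows; spaCy's real
# tokenizer exceptions and punctuation rules are far larger.
SPECIAL_CASES: Dict[str, List[str]] = {
    "can't": ["ca", "n't"],  # contraction split into two tokens
    "U.K.": ["U.K."],  # abbreviation kept whole despite the trailing "."
}
PREFIX_RE = re.compile(r"""^[("']""")  # opening bracket or quote
SUFFIX_RE = re.compile(r"""[)"'.,!?]$""")  # closing punctuation
INFIX_RE = re.compile(r"(?<=[a-z])-(?=[a-z])")  # hyphen between two letters


def split_infixes(text: str) -> List[str]:
    """Split on infix matches, keeping each infix as its own token."""
    tokens: List[str] = []
    start = 0
    for match in INFIX_RE.finditer(text):
        if match.start() > start:
            tokens.append(text[start:match.start()])
        tokens.append(match.group())
        start = match.end()
    if start < len(text):
        tokens.append(text[start:])
    return tokens


def tokenize(text: str) -> List[str]:
    """Toy tokenizer: whitespace split, then special cases, prefixes, suffixes, infixes."""
    tokens: List[str] = []
    for substring in text.split():
        prefixes: List[str] = []
        suffixes: List[str] = []
        while True:
            # Special cases are checked again before every prefix/suffix split
            if substring in SPECIAL_CASES:
                middle = list(SPECIAL_CASES[substring])
                break
            prefix = PREFIX_RE.search(substring)
            if prefix:
                prefixes.append(prefix.group())
                substring = substring[prefix.end():]
                continue
            suffix = SUFFIX_RE.search(substring)
            if suffix:
                suffixes.insert(0, suffix.group())
                substring = substring[:suffix.start()]
                continue
            middle = split_infixes(substring)
            break
        tokens.extend(prefixes + middle + suffixes)
    return tokens


print(tokenize('"I can\'t go to the U.K. mid-week!"'))
# → ['"', 'I', 'ca', "n't", 'go', 'to', 'the', 'U.K.', 'mid', '-', 'week', '!', '"']
```

Re-checking special cases before every split is what lets "U.K." survive intact, even though "." is also listed as a suffix.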
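The stop-word and lexical-attribute rows describe plain per-string functions. Below is a minimal sketch of two such functions, using tiny hypothetical word lists. `is_stop` is a set lookup (lowercasing first is an assumption of this sketch). `like_num` accepts digit strings, simple fractions and listed number words such as "ten" or "hundred". Both were written for this page as illustrations and are not copies of spaCy's `stop_words.py` or `lex_attrs.py`.

```python
from typing import FrozenSet

# Hypothetical, heavily truncated word lists; the real ones live in each
# language's stop_words.py and lex_attrs.py.
STOP_WORDS: FrozenSet[str] = frozenset({"a", "and", "i", "of", "the", "to"})
NUM_WORDS: FrozenSet[str] = frozenset(
    {"zero", "one", "two", "three", "ten", "hundred", "thousand"}
)


def is_stop(text: str) -> bool:
    """Stop-word check; lowercasing before lookup is an assumption of this sketch."""
    return text.lower() in STOP_WORDS


def like_num(text: str) -> bool:
    """True for digit strings, simple fractions and listed number words."""
    # Ignore one leading sign character and drop thousands/decimal separators
    if text.startswith(("+", "-", "±", "~")):
        text = text[1:]
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    # Simple fractions such as "3/4"
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    # Language-specific number words, e.g. "ten" or "hundred"
    return text.lower() in NUM_WORDS


print([w for w in "I saw ten and 1,000 more".split() if like_num(w)])
# → ['ten', '1,000']
```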