mirror of https://github.com/explosion/spaCy.git, synced 2025-10-31 07:57:35 +03:00
	Update README and docs [ci skip]
parent 45c551037d
commit a8a1231ccd
							
								
								
									
117 README.md
							|  | @ -7,11 +7,14 @@ Cython. It's built on the very latest research, and was designed from day one to | ||||||
| be used in real products. | be used in real products. | ||||||
| 
 | 
 | ||||||
| spaCy comes with | spaCy comes with | ||||||
| [pretrained pipelines](https://spacy.io/models), and | [pretrained pipelines](https://spacy.io/models) and | ||||||
| currently supports tokenization and training for **60+ languages**. It features | currently supports tokenization and training for **60+ languages**. It features | ||||||
| state-of-the-art speed and **neural network models** for tagging, | state-of-the-art speed and **neural network models** for tagging, | ||||||
| parsing, **named entity recognition**, **text classification** and more, multi-task learning with pretrained **transformers** like BERT, as well as a production-ready [training system](https://spacy.io/usage/training) and easy model packaging, deployment and workflow management. | parsing, **named entity recognition**, **text classification** and more, | ||||||
| spaCy is commercial open-source software, released under the MIT license. | multi-task learning with pretrained **transformers** like BERT, as well as a | ||||||
|  | production-ready [**training system**](https://spacy.io/usage/training) and easy | ||||||
|  | model packaging, deployment and workflow management. spaCy is commercial | ||||||
|  | open-source software, released under the MIT license. | ||||||
| 
 | 
 | ||||||
| 💫 **Version 3.0 out now!** | 💫 **Version 3.0 out now!** | ||||||
| [Check out the release notes here.](https://github.com/explosion/spaCy/releases) | [Check out the release notes here.](https://github.com/explosion/spaCy/releases) | ||||||
|  | @ -25,7 +28,6 @@ spaCy is commercial open-source software, released under the MIT license. | ||||||
| <br /> | <br /> | ||||||
| [](https://pypi.org/project/spacy/) | [](https://pypi.org/project/spacy/) | ||||||
| [](https://anaconda.org/conda-forge/spacy) | [](https://anaconda.org/conda-forge/spacy) | ||||||
| [](https://github.com/explosion/spacy-models/releases) |  | ||||||
| [](https://twitter.com/spacy_io) | [](https://twitter.com/spacy_io) | ||||||
| 
 | 
 | ||||||
| ## 📖 Documentation | ## 📖 Documentation | ||||||
|  | @ -81,7 +83,7 @@ it. | ||||||
| - Support for **60+ languages** | - Support for **60+ languages** | ||||||
| - **Trained pipelines** for different languages and tasks | - **Trained pipelines** for different languages and tasks | ||||||
| - Multi-task learning with pretrained **transformers** like BERT | - Multi-task learning with pretrained **transformers** like BERT | ||||||
| - Support for pretrained **word vectors** and embedings | - Support for pretrained **word vectors** and embeddings | ||||||
| - State-of-the-art speed | - State-of-the-art speed | ||||||
| - Production-ready **training system** | - Production-ready **training system** | ||||||
| - Linguistically-motivated **tokenization** | - Linguistically-motivated **tokenization** | ||||||
|  | @ -95,7 +97,7 @@ it. | ||||||
| 📖 **For more details, see the | 📖 **For more details, see the | ||||||
| [facts, figures and benchmarks](https://spacy.io/usage/facts-figures).** | [facts, figures and benchmarks](https://spacy.io/usage/facts-figures).** | ||||||
| 
 | 
 | ||||||
| ## Install spaCy | ## ⏳ Install spaCy | ||||||
| 
 | 
 | ||||||
| For detailed installation instructions, see the | For detailed installation instructions, see the | ||||||
| [documentation](https://spacy.io/usage). | [documentation](https://spacy.io/usage). | ||||||
|  | @ -110,8 +112,8 @@ For detailed installation instructions, see the | ||||||
| 
 | 
 | ||||||
| ### pip | ### pip | ||||||
| 
 | 
 | ||||||
| Using pip, spaCy releases are available as source packages and binary wheels (as | Using pip, spaCy releases are available as source packages and binary wheels. | ||||||
| of `v2.0.13`). Before you install spaCy and its dependencies, make sure that | Before you install spaCy and its dependencies, make sure that | ||||||
| your `pip`, `setuptools` and `wheel` are up to date. | your `pip`, `setuptools` and `wheel` are up to date. | ||||||
| 
 | 
 | ||||||
| ```bash | ```bash | ||||||
|  | @ -119,13 +121,12 @@ pip install -U pip setuptools wheel | ||||||
| pip install spacy | pip install spacy | ||||||
| ``` | ``` | ||||||
| 
 | 
 | ||||||
| To install additional data tables for lemmatization and normalization in | To install additional data tables for lemmatization and normalization you can | ||||||
| **spaCy v2.2+** you can run `pip install spacy[lookups]` or install | run `pip install spacy[lookups]` or install | ||||||
| [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) | [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) | ||||||
| separately. The lookups package is needed to create blank models with | separately. The lookups package is needed to create blank models with | ||||||
| lemmatization data for v2.2+ plus normalization data for v2.3+, and to | lemmatization data, and to lemmatize in languages that don't yet come with | ||||||
| lemmatize in languages that don't yet come with pretrained models and aren't | pretrained models and aren't powered by third-party libraries. | ||||||
| powered by third-party libraries. |  | ||||||
| 
 | 
 | ||||||
| When using pip it is generally recommended to install packages in a virtual | When using pip it is generally recommended to install packages in a virtual | ||||||
| environment to avoid modifying system state: | environment to avoid modifying system state: | ||||||
|  | @ -139,17 +140,14 @@ pip install spacy | ||||||
| 
 | 
 | ||||||
| ### conda | ### conda | ||||||
| 
 | 
 | ||||||
| Thanks to our great community, we've finally re-added conda support. You can now | You can also install spaCy from `conda` via the `conda-forge` channel. For the | ||||||
| install spaCy via `conda-forge`: | feedstock including the build recipe and configuration, check out | ||||||
|  | [this repository](https://github.com/conda-forge/spacy-feedstock). | ||||||
| 
 | 
 | ||||||
| ```bash | ```bash | ||||||
| conda install -c conda-forge spacy | conda install -c conda-forge spacy | ||||||
| ``` | ``` | ||||||
| 
 | 
 | ||||||
| For the feedstock including the build recipe and configuration, check out |  | ||||||
| [this repository](https://github.com/conda-forge/spacy-feedstock). Improvements |  | ||||||
| and pull requests to the recipe and setup are always appreciated. |  | ||||||
| 
 |  | ||||||
| ### Updating spaCy | ### Updating spaCy | ||||||
| 
 | 
 | ||||||
| Some updates to spaCy may require downloading new statistical models. If you're | Some updates to spaCy may require downloading new statistical models. If you're | ||||||
|  | @ -169,34 +167,36 @@ with the new version. | ||||||
| 📖 **For details on upgrading from spaCy 2.x to spaCy 3.x, see the | 📖 **For details on upgrading from spaCy 2.x to spaCy 3.x, see the | ||||||
| [migration guide](https://spacy.io/usage/v3#migrating).** | [migration guide](https://spacy.io/usage/v3#migrating).** | ||||||
| 
 | 
 | ||||||
| ## Download models | ## 📦 Download model packages | ||||||
| 
 | 
 | ||||||
| Trained pipelines for spaCy can be installed as **Python packages**. This | Trained pipelines for spaCy can be installed as **Python packages**. This | ||||||
| means that they're a component of your application, just like any other module. | means that they're a component of your application, just like any other module. | ||||||
| Models can be installed using spaCy's `download` command, or manually by | Models can be installed using spaCy's [`download`](https://spacy.io/api/cli#download) | ||||||
| pointing pip to a path or URL. | command, or manually by pointing pip to a path or URL. | ||||||
| 
 | 
 | ||||||
| | Documentation              |                                                                  | | | Documentation              |                                                                  | | ||||||
| | ---------------------- | ---------------------------------------------------------------- | | | -------------------------- | ---------------------------------------------------------------- | | ||||||
| | [Available Pipelines]  | Detailed pipeline descriptions, accuracy figures and benchmarks. | | | **[Available Pipelines]**  | Detailed pipeline descriptions, accuracy figures and benchmarks. | | ||||||
| | [Models Documentation] | Detailed usage instructions.                                     | | | **[Models Documentation]** | Detailed usage and installation instructions.                    | | ||||||
|  | | **[Training]**             | How to train your own pipelines on your data.                    | | ||||||
| 
 | 
 | ||||||
| [available pipelines]: https://spacy.io/models | [available pipelines]: https://spacy.io/models | ||||||
| [models documentation]: https://spacy.io/docs/usage/models | [models documentation]: https://spacy.io/usage/models | ||||||
|  | [training]: https://spacy.io/usage/training | ||||||
| 
 | 
 | ||||||
| ```bash | ```bash | ||||||
| # Download best-matching version of specific model for your spaCy installation | # Download best-matching version of specific model for your spaCy installation | ||||||
| python -m spacy download en_core_web_sm | python -m spacy download en_core_web_sm | ||||||
| 
 | 
 | ||||||
| # pip install .tar.gz archive from path or URL | # pip install .tar.gz archive from path or URL | ||||||
| pip install /Users/you/en_core_web_sm-2.2.0.tar.gz | pip install /Users/you/en_core_web_sm-3.0.0.tar.gz | ||||||
| pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.0/en_core_web_sm-2.2.0.tar.gz | pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz | ||||||
| ``` | ``` | ||||||
| 
 | 
 | ||||||
| ### Loading and using models | ### Loading and using models | ||||||
| 
 | 
 | ||||||
| To load a model, use `spacy.load()` with the model name or a | To load a model, use [`spacy.load()`](https://spacy.io/api/top-level#spacy.load) | ||||||
| path to the model data directory. | with the model name or a path to the model data directory. | ||||||
| 
 | 
 | ||||||
| ```python | ```python | ||||||
| import spacy | import spacy | ||||||
|  | @ -218,7 +218,7 @@ doc = nlp("This is a sentence.") | ||||||
| 📖 **For more info and examples, check out the | 📖 **For more info and examples, check out the | ||||||
| [models documentation](https://spacy.io/docs/usage/models).** | [models documentation](https://spacy.io/docs/usage/models).** | ||||||
| 
 | 
 | ||||||
| ## Compile from source | ## ⚒ Compile from source | ||||||
| 
 | 
 | ||||||
| The other way to install spaCy is to clone its | The other way to install spaCy is to clone its | ||||||
| [GitHub repository](https://github.com/explosion/spaCy) and build it from | [GitHub repository](https://github.com/explosion/spaCy) and build it from | ||||||
|  | @ -228,8 +228,19 @@ Python distribution including header files, a compiler, | ||||||
| [pip](https://pip.pypa.io/en/latest/installing/), | [pip](https://pip.pypa.io/en/latest/installing/), | ||||||
| [virtualenv](https://virtualenv.pypa.io/en/latest/) and | [virtualenv](https://virtualenv.pypa.io/en/latest/) and | ||||||
| [git](https://git-scm.com) installed. The compiler part is the trickiest. How to | [git](https://git-scm.com) installed. The compiler part is the trickiest. How to | ||||||
| do that depends on your system. See notes on Ubuntu, OS X and Windows for | do that depends on your system. | ||||||
| details. | 
 | ||||||
|  | | Platform      |                                                                                                                                                                                                                                                                     | | ||||||
|  | | ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||||||
|  | | 🐧 **Ubuntu** | Install system-level dependencies via `apt-get`: `sudo apt-get install build-essential python-dev git` .                                                                                                                                                            | | ||||||
|  | | 🍎 **Mac**    | Install a recent version of [XCode](https://developer.apple.com/xcode/), including the so-called "Command Line Tools". macOS and OS X ship with Python and git preinstalled.                                                                                        | | ||||||
|  | | 🖼 **Windows** | Install a version of the [Visual C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/) or [Visual Studio Express](https://visualstudio.microsoft.com/vs/express/) that matches the version that was used to compile your Python interpreter. | | ||||||
|  | 
 | ||||||
|  | For more details | ||||||
|  | and instructions, see the documentation on | ||||||
|  | [compiling spaCy from source](https://spacy.io/usage#source) and the | ||||||
|  | [quickstart widget](https://spacy.io/usage#section-quickstart) to get the right | ||||||
|  | commands for your platform and Python version. | ||||||
| 
 | 
 | ||||||
| ```bash | ```bash | ||||||
| git clone https://github.com/explosion/spaCy | git clone https://github.com/explosion/spaCy | ||||||
|  | @ -250,55 +261,25 @@ To install with extras: | ||||||
| pip install .[lookups,cuda102] | pip install .[lookups,cuda102] | ||||||
| ``` | ``` | ||||||
| 
 | 
 | ||||||
| To install all dependencies required for development: | To install all dependencies required for development, use the [`requirements.txt`](requirements.txt). Compared to regular install via pip, it | ||||||
|  | additionally installs developer dependencies such as Cython. | ||||||
| 
 | 
 | ||||||
| ```bash | ```bash | ||||||
| pip install -r requirements.txt | pip install -r requirements.txt | ||||||
| ``` | ``` | ||||||
| 
 | 
 | ||||||
| Compared to regular install via pip, [requirements.txt](requirements.txt) | ## 🚦 Run tests | ||||||
| additionally installs developer dependencies such as Cython. For more details |  | ||||||
| and instructions, see the documentation on |  | ||||||
| [compiling spaCy from source](https://spacy.io/usage#source) and the |  | ||||||
| [quickstart widget](https://spacy.io/usage#section-quickstart) to get the right |  | ||||||
| commands for your platform and Python version. |  | ||||||
| 
 |  | ||||||
| ### Ubuntu |  | ||||||
| 
 |  | ||||||
| Install system-level dependencies via `apt-get`: |  | ||||||
| 
 |  | ||||||
| ```bash |  | ||||||
| sudo apt-get install build-essential python-dev git |  | ||||||
| ``` |  | ||||||
| 
 |  | ||||||
| ### macOS / OS X |  | ||||||
| 
 |  | ||||||
| Install a recent version of [XCode](https://developer.apple.com/xcode/), |  | ||||||
| including the so-called "Command Line Tools". macOS and OS X ship with Python |  | ||||||
| and git preinstalled. |  | ||||||
| 
 |  | ||||||
| ### Windows |  | ||||||
| 
 |  | ||||||
| Install a version of the |  | ||||||
| [Visual C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/) |  | ||||||
| or [Visual Studio Express](https://visualstudio.microsoft.com/vs/express/) that |  | ||||||
| matches the version that was used to compile your Python interpreter. |  | ||||||
| 
 |  | ||||||
| ## Run tests |  | ||||||
| 
 | 
 | ||||||
| spaCy comes with an [extensive test suite](spacy/tests). In order to run the | spaCy comes with an [extensive test suite](spacy/tests). In order to run the | ||||||
| tests, you'll usually want to clone the repository and build spaCy from source. | tests, you'll usually want to clone the repository and build spaCy from source. | ||||||
| This will also install the required development dependencies and test utilities | This will also install the required development dependencies and test utilities | ||||||
| defined in the `requirements.txt`. | defined in the [`requirements.txt`](requirements.txt). | ||||||
| 
 | 
 | ||||||
| Alternatively, you can run `pytest` on the tests from within the installed | Alternatively, you can run `pytest` on the tests from within the installed | ||||||
| `spacy` package. Don't forget to also install the test utilities via spaCy's | `spacy` package. Don't forget to also install the test utilities via spaCy's | ||||||
| `requirements.txt`: | [`requirements.txt`](requirements.txt): | ||||||
| 
 | 
 | ||||||
| ```bash | ```bash | ||||||
| pip install -r requirements.txt | pip install -r requirements.txt | ||||||
| python -m pytest --pyargs spacy | python -m pytest --pyargs spacy | ||||||
| ``` | ``` | ||||||
| 
 |  | ||||||
| See [the documentation](https://spacy.io/usage#tests) for more details and |  | ||||||
| examples. |  | ||||||
|  |  | ||||||
|  | @ -22,11 +22,11 @@ values are defined in the [`Language.Defaults`](/api/language#defaults). | ||||||
| > ``` | > ``` | ||||||
| 
 | 
 | ||||||
| | Name                                                                                                                                                      | Description                                                                                                                                              | | | Name                                                                                                                                                      | Description                                                                                                                                              | | ||||||
| | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- | | | --------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||||||
| | **Stop words**<br />[`stop_words.py`](%%GITHUB_SPACY/spacy/lang/en/stop_words.py)                                                                         | List of most common words of a language that are often useful to filter out, for example "and" or "I". Matching tokens will return `True` for `is_stop`. | | | **Stop words**<br />[`stop_words.py`](%%GITHUB_SPACY/spacy/lang/en/stop_words.py)                                                                         | List of most common words of a language that are often useful to filter out, for example "and" or "I". Matching tokens will return `True` for `is_stop`. | | ||||||
| | **Tokenizer exceptions**<br />[`tokenizer_exceptions.py`](%%GITHUB_SPACY/spacy/lang/de/tokenizer_exceptions.py)                                           | Special-case rules for the tokenizer, for example, contractions like "can't" and abbreviations with punctuation, like "U.K.".                            | | | **Tokenizer exceptions**<br />[`tokenizer_exceptions.py`](%%GITHUB_SPACY/spacy/lang/de/tokenizer_exceptions.py)                                           | Special-case rules for the tokenizer, for example, contractions like "can't" and abbreviations with punctuation, like "U.K.".                            | | ||||||
| | **Punctuation rules**<br />[`punctuation.py`](%%GITHUB_SPACY/spacy/lang/punctuation.py)                                                                   | Regular expressions for splitting tokens, e.g. on punctuation or special characters like emoji. Includes rules for prefixes, suffixes and infixes.       | | | **Punctuation rules**<br />[`punctuation.py`](%%GITHUB_SPACY/spacy/lang/punctuation.py)                                                                   | Regular expressions for splitting tokens, e.g. on punctuation or special characters like emoji. Includes rules for prefixes, suffixes and infixes.       | | ||||||
| | **Character classes**<br />[`char_classes.py`](%%GITHUB_SPACY/spacy/lang/char_classes.py)                                                                 | Character classes to be used in regular expressions, for example, Latin characters, quotes, hyphens or icons.                                            | | | **Character classes**<br />[`char_classes.py`](%%GITHUB_SPACY/spacy/lang/char_classes.py)                                                                 | Character classes to be used in regular expressions, for example, Latin characters, quotes, hyphens or icons.                                            | | ||||||
| | **Lexical attributes**<br />[`lex_attrs.py`](%%GITHUB_SPACY/spacy/lang/en/lex_attrs.py)                                                                   | Custom functions for setting lexical attributes on tokens, e.g. `like_num`, which includes language-specific words like "ten" or "hundred".              | | | **Lexical attributes**<br />[`lex_attrs.py`](%%GITHUB_SPACY/spacy/lang/en/lex_attrs.py)                                                                   | Custom functions for setting lexical attributes on tokens, e.g. `like_num`, which includes language-specific words like "ten" or "hundred".              | | ||||||
| | **Syntax iterators**<br />[`syntax_iterators.py`](%%GITHUB_SPACY/spacy/lang/en/syntax_iterators.py)                                                       | Functions that compute views of a `Doc` object based on its syntax. At the moment, only used for [noun chunks](/usage/linguistic-features#noun-chunks).  | | | **Syntax iterators**<br />[`syntax_iterators.py`](%%GITHUB_SPACY/spacy/lang/en/syntax_iterators.py)                                                       | Functions that compute views of a `Doc` object based on its syntax. At the moment, only used for [noun chunks](/usage/linguistic-features#noun-chunks).  | | ||||||
| | **Lemmatizer**<br />[`lemmatizer.py`](%%GITHUB_SPACY/master/spacy/lang/fr/lemmatizer.py) [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) | Custom lemmatizer implementation and lemmatization tables.                                                                                               | | | **Lemmatizer**<br />[`lemmatizer.py`](%%GITHUB_SPACY/spacy/lang/fr/lemmatizer.py) [`spacy-lookups-data`](https://github.com/explosion/spacy-lookups-data) | Custom lemmatizer implementation and lemmatization tables.                                                                                               | | ||||||
|  |  | ||||||
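The updated `pip install` example in the README hunk points pip at a release archive whose URL repeats the package name and version twice. As a minimal sketch of that pattern, assuming the layout seen in the example holds for other packages, the URL can be assembled as follows. The helper names `model_archive_url` and `pip_install_command` are hypothetical and not part of spaCy.

```python
# Hypothetical helpers, not part of spaCy: they only mirror the URL layout
# used by the `pip install` example in the README diff.
RELEASE_BASE = "https://github.com/explosion/spacy-models/releases/download"


def model_archive_url(name: str, version: str) -> str:
    """Build the .tar.gz release URL for a model package name and version."""
    package = f"{name}-{version}"
    # The release tag and the archive file name share the "<name>-<version>" stem.
    return f"{RELEASE_BASE}/{package}/{package}.tar.gz"


def pip_install_command(name: str, version: str) -> list:
    """Return the pip command, as an argument list, that installs the archive."""
    return ["pip", "install", model_archive_url(name, version)]


if __name__ == "__main__":
    print(" ".join(pip_install_command("en_core_web_sm", "3.0.0")))
```

In most cases `python -m spacy download en_core_web_sm` remains the simpler route, since it picks the best-matching version for the installed spaCy. A pinned URL like this is mainly useful in a requirements file or a reproducible build.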