Update docs [ci skip]

This commit is contained in:
Ines Montani 2020-08-21 13:22:59 +02:00
parent 3826cfb8fe
commit 52bd3a8b48
5 changed files with 190 additions and 19 deletions

View File

@ -7,7 +7,7 @@ source: spacy/morphology.pyx
Store the possible morphological analyses for a language, and index them by
hash. To save space on each token, tokens only know the hash of their
morphological analysis, so queries of morphological attributes are delegated to
this class. See [`MorphAnalysis`](/api/morphology#morphansalysis) for the
this class. See [`MorphAnalysis`](/api/morphology#morphanalysis) for the
container storing a single morphological analysis.
## Morphology.\_\_init\_\_ {#init tag="method"}

View File

@ -450,8 +450,8 @@ The L2 norm of the token's vector representation.
| `pos_` | Coarse-grained part-of-speech from the [Universal POS tag set](https://universaldependencies.org/docs/u/pos/). ~~str~~ |
| `tag` | Fine-grained part-of-speech. ~~int~~ |
| `tag_` | Fine-grained part-of-speech. ~~str~~ |
| `morph` | Morphological analysis. ~~MorphAnalysis~~ |
| `morph_` | Morphological analysis in the Universal Dependencies [FEATS]https://universaldependencies.org/format.html#morphological-annotation format. ~~str~~ |
| `morph` <Tag variant="new">3</Tag> | Morphological analysis. ~~MorphAnalysis~~ |
| `morph_` <Tag variant="new">3</Tag> | Morphological analysis in the Universal Dependencies [FEATS]https://universaldependencies.org/format.html#morphological-annotation format. ~~str~~ |
| `dep` | Syntactic dependency relation. ~~int~~ |
| `dep_` | Syntactic dependency relation. ~~str~~ |
| `lang` | Language of the parent document's vocabulary. ~~int~~ |

View File

@ -632,6 +632,23 @@ validate its contents.
| `path` | Path to the model's `meta.json`. ~~Union[str, Path]~~ |
| **RETURNS** | The model's meta data. ~~Dict[str, Any]~~ |
### util.get_installed_models {#util.get_installed_models tag="function" new="3"}
List all model packages installed in the current environment. This will include
any spaCy model that was packaged with [`spacy package`](/api/cli#package).
Under the hood, model packages expose a Python entry point that spaCy can check,
without having to load the model.
> #### Example
>
> ```python
> model_names = util.get_installed_models()
> ```
| Name | Description |
| ----------- | ---------------------------------------------------------------------------------- |
| **RETURNS** | The string names of the models installed in the current environment. ~~List[str]~~ |
### util.is_package {#util.is_package tag="function"}
Check if string maps to a package installed via pip. Mainly used to validate

View File

@ -11,6 +11,10 @@ next: /usage/training
<!-- TODO: intro, short explanation of embeddings/transformers, Tok2Vec and Transformer components, point user to processing pipelines docs for more general info that user should know first -->
If you're looking for details on using word vectors and semantic similarity,
check out the
[linguistic features docs](/usage/linguistic-features#vectors-similarity).
<Accordion title="Whats the difference between word vectors and language models?" id="vectors-vs-language-models">
The key difference between [word vectors](#word-vectors) and contextual language

View File

@ -10,6 +10,32 @@ menu:
## Summary {#summary}
<Grid cols={2}>
<div>
</div>
<Infobox title="Table of Contents" id="toc">
- [Summary](#summary)
- [New features](#features)
- [Training & config system](#features-training)
- [Transformer-based pipelines](#features-transformers)
- [Custom models](#features-custom-models)
- [End-to-end project workflows](#features-projects)
- [New built-in components](#features-pipeline-components)
- [New custom component API](#features-components)
- [Python type hints](#features-types)
- [New methods & attributes](#new-methods)
- [New & updated documentation](#new-docs)
- [Backwards incompatibilities](#incompat)
- [Migrating from spaCy v2.x](#migrating)
</Infobox>
</Grid>
## New Features {#features}
### New training workflow and config system {#features-training}
@ -28,6 +54,8 @@ menu:
### Transformer-based pipelines {#features-transformers}
![Pipeline components listening to shared embedding component](../images/tok2vec-listener.svg)
<Infobox title="Details & Documentation" emoji="📖" list>
- **Usage:** [Embeddings & Transformers](/usage/embeddings-transformers),
@ -46,8 +74,53 @@ menu:
### Custom models using any framework {#features-custom-models}
<Infobox title="Details & Documentation" emoji="📖" list>
<!-- TODO: link to new custom models page -->
- **Thinc: **
[Wrapping PyTorch, TensorFlow & MXNet](https://thinc.ai/docs/usage-frameworks)
- **API:** [Model architectures](/api/architectures), [`Pipe`](/api/pipe)
</Infobox>
### Manage end-to-end workflows with projects {#features-projects}
<!-- TODO: update example -->
> #### Example
>
> ```cli
> # Clone a project template
> $ python -m spacy project clone example
> $ cd example
> # Download data assets
> $ python -m spacy project assets
> # Run a workflow
> $ python -m spacy project run train
> ```
spaCy projects let you manage and share **end-to-end spaCy workflows** for
different **use cases and domains**, and orchestrate training, packaging and
serving your custom models. You can start off by cloning a pre-defined project
template, adjust it to fit your needs, load in your data, train a model, export
it as a Python package and share the project templates with your team. spaCy
projects also make it easy to **integrate with other tools** in the data science
and machine learning ecosystem, including [DVC](/usage/projects#dvc) for data
version control, [Prodigy](/usage/projects#prodigy) for creating labelled data,
[Streamlit](/usage/projects#streamlit) for building interactive apps,
[FastAPI](/usage/projects#fastapi) for serving models in production,
[Ray](/usage/projects#ray) for parallel training,
[Weights & Biases](/usage/projects#wandb) for experiment tracking, and more!
<!-- <Project id="some_example_project">
The easiest way to get started with an end-to-end training process is to clone a
[project](/usage/projects) template. Projects let you manage multi-step
workflows, from data preprocessing to training and packaging your model.
</Project>-->
<Infobox title="Details & Documentation" emoji="📖" list>
- **Usage:** [spaCy projects](/usage/projects),
@ -59,6 +132,16 @@ menu:
### New built-in pipeline components {#features-pipeline-components}
spaCy v3.0 includes several new trainable and rule-based components that you can
add to your pipeline and customize for your use case:
> #### Example
>
> ```python
> nlp = spacy.blank("en")
> nlp.add_pipe("lemmatizer")
> ```
| Name | Description |
| ----------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [`SentenceRecognizer`](/api/sentencerecognizer) | Trainable component for sentence segmentation. |
@ -78,15 +161,37 @@ menu:
### New and improved pipeline component APIs {#features-components}
- `Language.factory`, `Language.component`
- `Language.analyze_pipes`
- Adding components from other models
> #### Example
>
> ```python
> @Language.component("my_component")
> def my_component(doc):
> return doc
>
> nlp.add_pipe("my_component")
> nlp.add_pipe("ner", source=other_nlp)
> nlp.analyze_pipes(pretty=True)
> ```
Defining, configuring, reusing, training and analyzing pipeline components is
now easier and more convenient. The `@Language.component` and
`@Language.factory` decorators let you register your component, define its
default configuration and meta data, like the attribute values it assigns and
requires. Any custom component can be included during training, and sourcing
components from existing pretrained models lets you **mix and match custom
pipelines**. The `nlp.analyze_pipes` method outputs structured information about
the current pipeline and its components, including the attributes they assign,
the scores they compute during training and whether any required attributes
aren't set.
<Infobox title="Details & Documentation" emoji="📖" list>
- **Usage:** [Custom components](/usage/processing-pipelines#custom_components),
[Defining components during training](/usage/training#config-components)
- **API:** [`Language`](/api/language)
[Defining components for training](/usage/training#config-components)
- **API:** [`@Language.component`](/api/language#component),
[`@Language.factory`](/api/language#factory),
[`Language.add_pipe`](/api/language#add_pipe),
[`Language.analyze_pipes`](/api/language#analyze_pipes)
- **Implementation:**
[`spacy/language.py`](https://github.com/explosion/spaCy/tree/develop/spacy/language.py)
@ -136,13 +241,14 @@ in your config and see validation errors if the argument values don't match.
</Infobox>
### New methods, attributes and commands
### New methods, attributes and commands {#new-methods}
The following methods, attributes and commands are new in spaCy v3.0.
| Name | Description |
| ----------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| [`Token.lex`](/api/token#attributes) | Access a token's [`Lexeme`](/api/lexeme). |
| [`Token.morph`](/api/token#attributes) [`Token.morph_`](/api/token#attributes) | Access a token's morphological analysis. |
| [`Language.select_pipes`](/api/language#select_pipes) | Contextmanager for enabling or disabling specific pipeline components for a block. |
| [`Language.analyze_pipes`](/api/language#analyze_pipes) | [Analyze](/usage/processing-pipelines#analysis) components and their interdependencies. |
| [`Language.resume_training`](/api/language#resume_training) | Experimental: continue training a pretrained model and initialize "rehearsal" for components that implement a `rehearse` method to prevent catastrophic forgetting. |
@ -153,9 +259,52 @@ The following methods, attributes and commands are new in spaCy v3.0.
| [`Pipe.score`](/api/pipe#score) | Method on trainable pipeline components that returns a dictionary of evaluation scores. |
| [`registry`](/api/top-level#registry) | Function registry to map functions to string names that can be referenced in [configs](/usage/training#config). |
| [`util.load_meta`](/api/top-level#util.load_meta) [`util.load_config`](/api/top-level#util.load_config) | Updated helpers for loading a model's [`meta.json`](/api/data-formats#meta) and [`config.cfg`](/api/data-formats#config). |
| [`util.get_installed_models`](/api/top-level#util.get_installed_models) | Names of all models installed in the environment. |
| [`init config`](/api/cli#init-config) [`init fill-config`](/api/cli#init-fill-config) [`debug config`](/api/cli#debug-config) | CLI commands for initializing, auto-filling and debugging [training configs](/usage/training). |
| [`project`](/api/cli#project) | Suite of CLI commands for cloning, running and managing [spaCy projects](/usage/projects). |
### New and updated documentation {#new-docs}
<Grid cols={2} gutterBottom={false}>
<div>
To help you get started with spaCy v3.0 and the new features, we've added
several new or rewritten documentation pages, including a new usage guide on
[embeddings, transformers and transfer learning](/usage/embeddings-transformers),
a guide on [training models](/usage/training) rewritten from scratch, a page
explaining the new [spaCy projects](/usage/projects) and updated usage
documentation on
[custom pipeline components](/usage/processing-pipelines#custom-components).
We've also added a bunch of new illustrations and new API reference pages
documenting spaCy's machine learning [model architectures](/api/architectures)
and the expected [data formats](/api/data-formats). API pages about
[pipeline components](/api/#architecture-pipeline) now include more information,
like the default config and implementation, and we've adopted a more detailed
format for documenting argument and return types.
</div>
[![Library architecture](../images/architecture.svg)](/api)
</Grid>
<Infobox title="New or reworked documentation" emoji="📖" list>
- **Usage: ** [Embeddings & Transformers](/usage/embeddings-transformers),
[Training models](/usage/training), [Projects](/usage/projects),
[Custom pipeline components](/usage/processing-pipelines#custom-components)
- **API Reference: ** [Library architecture](/api),
[Model architectures](/api/architectures), [Data formats](/api/data-formats)
- **New Classes: ** [`Example`](/api/example), [`Tok2Vec`](/api/tok2vec),
[`Transformer`](/api/transformer), [`Lemmatizer`](/api/lemmatizer),
[`Morphologizer`](/api/morphologizer),
[`AttributeRuler`](/api/attributeruler),
[`SentenceRecognizer`](/api/sentencerecognizer), [`Pipe`](/api/pipe),
[`Corpus`](/api/corpus)
</Infobox>
## Backwards Incompatibilities {#incompat}
As always, we've tried to keep the breaking changes to a minimum and focus on
@ -213,14 +362,15 @@ Note that spaCy v3.0 now requires **Python 3.6+**.
### Removed or renamed API {#incompat-removed}
| Removed | Replacement |
| ------------------------------------------------------ | ----------------------------------------------------------------------------------------- |
| -------------------------------------------------------- | ----------------------------------------------------------------------------------------- |
| `Language.disable_pipes` | [`Language.select_pipes`](/api/language#select_pipes) |
| `GoldParse` | [`Example`](/api/example) |
| `GoldCorpus` | [`Corpus`](/api/corpus) |
| `KnowledgeBase.load_bulk` `KnowledgeBase.dump` | [`KnowledgeBase.from_disk`](/api/kb#from_disk) [`KnowledgeBase.to_disk`](/api/kb#to_disk) |
| `spacy init-model` | [`spacy init model`](/api/cli#init-model) |
| `spacy debug-data` | [`spacy debug data`](/api/cli#debug-data) |
| `spacy profile` | [`spacy debug profile`](/api/cli#debug-profile) |
| `spacy link` `util.set_data_path` `util.get_data_path` | not needed, model symlinks are deprecated |
| `spacy link`, `util.set_data_path`, `util.get_data_path` | not needed, model symlinks are deprecated |
The following deprecated methods, attributes and arguments were removed in v3.0.
Most of them have been **deprecated for a while** and many would previously
@ -236,7 +386,7 @@ on them.
| `Language.tagger`, `Language.parser`, `Language.entity` | [`Language.get_pipe`](/api/language#get_pipe) |
| keyword-arguments like `vocab=False` on `to_disk`, `from_disk`, `to_bytes`, `from_bytes` | `exclude=["vocab"]` |
| `n_threads` argument on [`Tokenizer`](/api/tokenizer), [`Matcher`](/api/matcher), [`PhraseMatcher`](/api/phrasematcher) | `n_process` |
| `verbose` argument on [`Language.evaluate`] | logging |
| `verbose` argument on [`Language.evaluate`](/api/language#evaluate) | logging (`DEBUG`) |
| `SentenceSegmenter` hook, `SimilarityHook` | [user hooks](/usage/processing-pipelines#custom-components-user-hooks), [`Sentencizer`](/api/sentencizer), [`SentenceRecognizer`](/api/sentenceregognizer) |
## Migrating from v2.x {#migrating}