Update docs [ci skip]
This commit is contained in: parent 3826cfb8fe, commit 52bd3a8b48
@ -7,7 +7,7 @@ source: spacy/morphology.pyx
Store the possible morphological analyses for a language, and index them by
hash. To save space on each token, tokens only know the hash of their
morphological analysis, so queries of morphological attributes are delegated to
this class. See [`MorphAnalysis`](/api/morphology#morphansalysis) for the
this class. See [`MorphAnalysis`](/api/morphology#morphanalysis) for the
container storing a single morphological analysis.
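In practice, the indirection is invisible to the user. Here is a minimal sketch (assuming a trained pipeline such as `en_core_web_sm` is installed; the example sentence and printed values are illustrative):

```python
import spacy

# Assumes a trained pipeline is installed, e.g.:
# python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("She was reading")
token = doc[2]

# The token itself only stores a hash; token.morph resolves it through the
# shared Morphology object and returns a MorphAnalysis container
print(token.morph)               # e.g. Aspect=Prog|Tense=Pres|VerbForm=Part
print(token.morph.get("Tense"))  # e.g. ['Pres']
```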

## Morphology.\_\_init\_\_ {#init tag="method"}
@ -450,8 +450,8 @@ The L2 norm of the token's vector representation.
| `pos_` | Coarse-grained part-of-speech from the [Universal POS tag set](https://universaldependencies.org/docs/u/pos/). ~~str~~ |
| `tag` | Fine-grained part-of-speech. ~~int~~ |
| `tag_` | Fine-grained part-of-speech. ~~str~~ |
| `morph` | Morphological analysis. ~~MorphAnalysis~~ |
| `morph_` | Morphological analysis in the Universal Dependencies [FEATS](https://universaldependencies.org/format.html#morphological-annotation) format. ~~str~~ |
| `morph` <Tag variant="new">3</Tag> | Morphological analysis. ~~MorphAnalysis~~ |
| `morph_` <Tag variant="new">3</Tag> | Morphological analysis in the Universal Dependencies [FEATS](https://universaldependencies.org/format.html#morphological-annotation) format. ~~str~~ |
| `dep` | Syntactic dependency relation. ~~int~~ |
| `dep_` | Syntactic dependency relation. ~~str~~ |
| `lang` | Language of the parent document's vocabulary. ~~int~~ |
@ -632,6 +632,23 @@ validate its contents.
| `path` | Path to the model's `meta.json`. ~~Union[str, Path]~~ |
| **RETURNS** | The model's meta data. ~~Dict[str, Any]~~ |

### util.get_installed_models {#util.get_installed_models tag="function" new="3"}

List all model packages installed in the current environment. This will include
any spaCy model that was packaged with [`spacy package`](/api/cli#package).
Under the hood, model packages expose a Python entry point that spaCy can check,
without having to load the model.

> #### Example
>
> ```python
> model_names = util.get_installed_models()
> ```

| Name | Description |
| ----------- | ---------------------------------------------------------------------------------- |
| **RETURNS** | The string names of the models installed in the current environment. ~~List[str]~~ |
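As a rough sketch of how this fits together with `util.is_package` (documented just below), you can cross-check the reported names without loading any pipelines:

```python
from spacy import util

# Entry points list the installed model packages without loading them
installed = util.get_installed_models()

# util.is_package checks that a name maps to a package installed via pip
for name in installed:
    print(name, util.is_package(name))
```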

### util.is_package {#util.is_package tag="function"}

Check if a string maps to a package installed via pip. Mainly used to validate
@ -11,6 +11,10 @@ next: /usage/training

<!-- TODO: intro, short explanation of embeddings/transformers, Tok2Vec and Transformer components, point user to processing pipelines docs for more general info that user should know first -->

If you're looking for details on using word vectors and semantic similarity,
check out the
[linguistic features docs](/usage/linguistic-features#vectors-similarity).
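As a quick taste, document similarity with a vectors-enabled pipeline looks like this (a minimal sketch, assuming `en_core_web_md` is installed; the sentences are arbitrary):

```python
import spacy

# Requires a pipeline that ships with word vectors, e.g.:
# python -m spacy download en_core_web_md
nlp = spacy.load("en_core_web_md")
doc1 = nlp("I like salty fries and hamburgers.")
doc2 = nlp("Fast food tastes very good.")

# Similarity based on the averaged word vectors of each document
print(doc1.similarity(doc2))
```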

<Accordion title="What’s the difference between word vectors and language models?" id="vectors-vs-language-models">

The key difference between [word vectors](#word-vectors) and contextual language
@ -10,6 +10,32 @@ menu:

## Summary {#summary}

<Grid cols={2}>

<div>

</div>

<Infobox title="Table of Contents" id="toc">

- [Summary](#summary)
- [New features](#features)
- [Training & config system](#features-training)
- [Transformer-based pipelines](#features-transformers)
- [Custom models](#features-custom-models)
- [End-to-end project workflows](#features-projects)
- [New built-in components](#features-pipeline-components)
- [New custom component API](#features-components)
- [Python type hints](#features-types)
- [New methods & attributes](#new-methods)
- [New & updated documentation](#new-docs)
- [Backwards incompatibilities](#incompat)
- [Migrating from spaCy v2.x](#migrating)

</Infobox>

</Grid>

## New Features {#features}

### New training workflow and config system {#features-training}
@ -28,6 +54,8 @@ menu:

### Transformer-based pipelines {#features-transformers}

![Pipeline components listening to shared embedding component](../images/tok2vec-listener.svg)

<Infobox title="Details & Documentation" emoji="📖" list>

- **Usage:** [Embeddings & Transformers](/usage/embeddings-transformers),
@ -46,8 +74,53 @@ menu:

### Custom models using any framework {#features-custom-models}

<Infobox title="Details & Documentation" emoji="📖" list>

<!-- TODO: link to new custom models page -->

- **Thinc:**
  [Wrapping PyTorch, TensorFlow & MXNet](https://thinc.ai/docs/usage-frameworks)
- **API:** [Model architectures](/api/architectures), [`Pipe`](/api/pipe)

</Infobox>
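As a rough sketch of what wrapping a model from another framework looks like (assuming `torch` and `thinc` are installed; the layer sizes are made up for illustration):

```python
from thinc.api import PyTorchWrapper, Linear, chain
import torch.nn

# Wrap an arbitrary PyTorch module as a Thinc model, then compose it
# with native Thinc layers so it can slot into a spaCy architecture
torch_model = torch.nn.Sequential(
    torch.nn.Linear(128, 128),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.2),
)
model = chain(PyTorchWrapper(torch_model), Linear(nO=2, nI=128))
```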

### Manage end-to-end workflows with projects {#features-projects}

<!-- TODO: update example -->

> #### Example
>
> ```cli
> # Clone a project template
> $ python -m spacy project clone example
> $ cd example
> # Download data assets
> $ python -m spacy project assets
> # Run a workflow
> $ python -m spacy project run train
> ```

spaCy projects let you manage and share **end-to-end spaCy workflows** for
different **use cases and domains**, and orchestrate training, packaging and
serving your custom models. You can start off by cloning a pre-defined project
template, adjust it to fit your needs, load in your data, train a model, export
it as a Python package and share the project templates with your team. spaCy
projects also make it easy to **integrate with other tools** in the data science
and machine learning ecosystem, including [DVC](/usage/projects#dvc) for data
version control, [Prodigy](/usage/projects#prodigy) for creating labelled data,
[Streamlit](/usage/projects#streamlit) for building interactive apps,
[FastAPI](/usage/projects#fastapi) for serving models in production,
[Ray](/usage/projects#ray) for parallel training,
[Weights & Biases](/usage/projects#wandb) for experiment tracking, and more!

<!-- <Project id="some_example_project">

The easiest way to get started with an end-to-end training process is to clone a
[project](/usage/projects) template. Projects let you manage multi-step
workflows, from data preprocessing to training and packaging your model.

</Project>-->

<Infobox title="Details & Documentation" emoji="📖" list>

- **Usage:** [spaCy projects](/usage/projects),
@ -59,6 +132,16 @@ menu:

### New built-in pipeline components {#features-pipeline-components}

spaCy v3.0 includes several new trainable and rule-based components that you can
add to your pipeline and customize for your use case:

> #### Example
>
> ```python
> nlp = spacy.blank("en")
> nlp.add_pipe("lemmatizer")
> ```

| Name | Description |
| ----------------------------------------------- | ---------------------------------------------- |
| [`SentenceRecognizer`](/api/sentencerecognizer) | Trainable component for sentence segmentation. |
@ -78,15 +161,37 @@ menu:

### New and improved pipeline component APIs {#features-components}

- `Language.factory`, `Language.component`
- `Language.analyze_pipes`
- Adding components from other models

> #### Example
>
> ```python
> @Language.component("my_component")
> def my_component(doc):
>     return doc
>
> nlp.add_pipe("my_component")
> nlp.add_pipe("ner", source=other_nlp)
> nlp.analyze_pipes(pretty=True)
> ```

Defining, configuring, reusing, training and analyzing pipeline components is
now easier and more convenient. The `@Language.component` and
`@Language.factory` decorators let you register your component, define its
default configuration and meta data, like the attribute values it assigns and
requires. Any custom component can be included during training, and sourcing
components from existing pretrained models lets you **mix and match custom
pipelines**. The `nlp.analyze_pipes` method outputs structured information about
the current pipeline and its components, including the attributes they assign,
the scores they compute during training and whether any required attributes
aren't set.
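For example, a factory with a default config could look roughly like this (a minimal sketch; the component name `my_cleaner` and its `lowercase` setting are made up for illustration):

```python
from spacy.language import Language

@Language.factory("my_cleaner", default_config={"lowercase": True})
def create_cleaner(nlp, name, lowercase: bool):
    # Config values are validated against default_config and passed in
    # when the component is created via nlp.add_pipe("my_cleaner")
    def cleaner(doc):
        # A real component would modify the Doc here
        return doc

    return cleaner
```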

<Infobox title="Details & Documentation" emoji="📖" list>

- **Usage:** [Custom components](/usage/processing-pipelines#custom_components),
  [Defining components during training](/usage/training#config-components)
- **API:** [`Language`](/api/language)
  [Defining components for training](/usage/training#config-components)
- **API:** [`@Language.component`](/api/language#component),
  [`@Language.factory`](/api/language#factory),
  [`Language.add_pipe`](/api/language#add_pipe),
  [`Language.analyze_pipes`](/api/language#analyze_pipes)
- **Implementation:**
  [`spacy/language.py`](https://github.com/explosion/spaCy/tree/develop/spacy/language.py)
@ -136,13 +241,14 @@ in your config and see validation errors if the argument values don't match.

</Infobox>

### New methods, attributes and commands
### New methods, attributes and commands {#new-methods}

The following methods, attributes and commands are new in spaCy v3.0.

| Name | Description |
| ------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [`Token.lex`](/api/token#attributes) | Access a token's [`Lexeme`](/api/lexeme). |
| [`Token.morph`](/api/token#attributes) [`Token.morph_`](/api/token#attributes) | Access a token's morphological analysis. |
| [`Language.select_pipes`](/api/language#select_pipes) | Contextmanager for enabling or disabling specific pipeline components for a block. |
| [`Language.analyze_pipes`](/api/language#analyze_pipes) | [Analyze](/usage/processing-pipelines#analysis) components and their interdependencies. |
| [`Language.resume_training`](/api/language#resume_training) | Experimental: continue training a pretrained model and initialize "rehearsal" for components that implement a `rehearse` method to prevent catastrophic forgetting. |
@ -153,9 +259,52 @@ The following methods, attributes and commands are new in spaCy v3.0.
| [`Pipe.score`](/api/pipe#score) | Method on trainable pipeline components that returns a dictionary of evaluation scores. |
| [`registry`](/api/top-level#registry) | Function registry to map functions to string names that can be referenced in [configs](/usage/training#config). |
| [`util.load_meta`](/api/top-level#util.load_meta) [`util.load_config`](/api/top-level#util.load_config) | Updated helpers for loading a model's [`meta.json`](/api/data-formats#meta) and [`config.cfg`](/api/data-formats#config). |
| [`util.get_installed_models`](/api/top-level#util.get_installed_models) | Names of all models installed in the environment. |
| [`init config`](/api/cli#init-config) [`init fill-config`](/api/cli#init-fill-config) [`debug config`](/api/cli#debug-config) | CLI commands for initializing, auto-filling and debugging [training configs](/usage/training). |
| [`project`](/api/cli#project) | Suite of CLI commands for cloning, running and managing [spaCy projects](/usage/projects). |
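To give one concrete example from the table, the new `nlp.select_pipes` context manager temporarily restricts which components run (a minimal sketch, assuming a trained pipeline such as `en_core_web_sm` with its default component names):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Only run the listed components inside this block; the rest are
# re-enabled automatically when the context manager exits
with nlp.select_pipes(enable=["tok2vec", "tagger"]):
    doc = nlp("Only the enabled components run here.")
    print([token.tag_ for token in doc])
```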

### New and updated documentation {#new-docs}

<Grid cols={2} gutterBottom={false}>

<div>

To help you get started with spaCy v3.0 and the new features, we've added
several new or rewritten documentation pages, including a new usage guide on
[embeddings, transformers and transfer learning](/usage/embeddings-transformers),
a guide on [training models](/usage/training) rewritten from scratch, a page
explaining the new [spaCy projects](/usage/projects) and updated usage
documentation on
[custom pipeline components](/usage/processing-pipelines#custom-components).
We've also added a bunch of new illustrations and new API reference pages
documenting spaCy's machine learning [model architectures](/api/architectures)
and the expected [data formats](/api/data-formats). API pages about
[pipeline components](/api/#architecture-pipeline) now include more information,
like the default config and implementation, and we've adopted a more detailed
format for documenting argument and return types.

</div>

[![Library architecture](../images/architecture.svg)](/api)

</Grid>

<Infobox title="New or reworked documentation" emoji="📖" list>

- **Usage:** [Embeddings & Transformers](/usage/embeddings-transformers),
  [Training models](/usage/training), [Projects](/usage/projects),
  [Custom pipeline components](/usage/processing-pipelines#custom-components)
- **API Reference:** [Library architecture](/api),
  [Model architectures](/api/architectures), [Data formats](/api/data-formats)
- **New Classes:** [`Example`](/api/example), [`Tok2Vec`](/api/tok2vec),
  [`Transformer`](/api/transformer), [`Lemmatizer`](/api/lemmatizer),
  [`Morphologizer`](/api/morphologizer),
  [`AttributeRuler`](/api/attributeruler),
  [`SentenceRecognizer`](/api/sentencerecognizer), [`Pipe`](/api/pipe),
  [`Corpus`](/api/corpus)

</Infobox>

## Backwards Incompatibilities {#incompat}

As always, we've tried to keep the breaking changes to a minimum and focus on
@ -212,15 +361,16 @@ Note that spaCy v3.0 now requires **Python 3.6+**.

### Removed or renamed API {#incompat-removed}

| Removed | Replacement |
| ------------------------------------------------------ | ------------------------------------------------------------------------------------------ |
| `Language.disable_pipes` | [`Language.select_pipes`](/api/language#select_pipes) |
| `GoldParse` | [`Example`](/api/example) |
| `GoldCorpus` | [`Corpus`](/api/corpus) |
| `KnowledgeBase.load_bulk` `KnowledgeBase.dump` | [`KnowledgeBase.from_disk`](/api/kb#from_disk) [`KnowledgeBase.to_disk`](/api/kb#to_disk) |
| `spacy debug-data` | [`spacy debug data`](/api/cli#debug-data) |
| `spacy profile` | [`spacy debug profile`](/api/cli#debug-profile) |
| `spacy link` `util.set_data_path` `util.get_data_path` | not needed, model symlinks are deprecated |
| Removed | Replacement |
| -------------------------------------------------------- | ------------------------------------------------------------------------------------------ |
| `Language.disable_pipes` | [`Language.select_pipes`](/api/language#select_pipes) |
| `GoldParse` | [`Example`](/api/example) |
| `GoldCorpus` | [`Corpus`](/api/corpus) |
| `KnowledgeBase.load_bulk` `KnowledgeBase.dump` | [`KnowledgeBase.from_disk`](/api/kb#from_disk) [`KnowledgeBase.to_disk`](/api/kb#to_disk) |
| `spacy init-model` | [`spacy init model`](/api/cli#init-model) |
| `spacy debug-data` | [`spacy debug data`](/api/cli#debug-data) |
| `spacy profile` | [`spacy debug profile`](/api/cli#debug-profile) |
| `spacy link`, `util.set_data_path`, `util.get_data_path` | not needed, model symlinks are deprecated |
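To illustrate one of these replacements: where v2.x used `GoldParse`, annotations in v3.0 are held by `Example` objects (a minimal sketch; the text and the entity span are made up for illustration):

```python
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
doc = nlp.make_doc("Apple is looking at buying a U.K. startup")

# Gold annotations are passed as a dict and stored on a reference Doc
example = Example.from_dict(doc, {"entities": [(0, 5, "ORG")]})
print(example.reference.ents)  # (Apple,)
```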

The following deprecated methods, attributes and arguments were removed in v3.0.
Most of them have been **deprecated for a while** and many would previously
@ -236,7 +386,7 @@ on them.
| `Language.tagger`, `Language.parser`, `Language.entity` | [`Language.get_pipe`](/api/language#get_pipe) |
| keyword-arguments like `vocab=False` on `to_disk`, `from_disk`, `to_bytes`, `from_bytes` | `exclude=["vocab"]` |
| `n_threads` argument on [`Tokenizer`](/api/tokenizer), [`Matcher`](/api/matcher), [`PhraseMatcher`](/api/phrasematcher) | `n_process` |
| `verbose` argument on [`Language.evaluate`] | logging |
| `verbose` argument on [`Language.evaluate`](/api/language#evaluate) | logging (`DEBUG`) |
| `SentenceSegmenter` hook, `SimilarityHook` | [user hooks](/usage/processing-pipelines#custom-components-user-hooks), [`Sentencizer`](/api/sentencizer), [`SentenceRecognizer`](/api/sentencerecognizer) |
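As a quick sketch of the replacement in the last row, the rule-based [`Sentencizer`](/api/sentencizer) now covers what the `SentenceSegmenter` hook used to do (the example text is arbitrary):

```python
import spacy

nlp = spacy.blank("en")
# Rule-based sentence segmentation without a parser
nlp.add_pipe("sentencizer")
doc = nlp("This is a sentence. This is another one.")
print([sent.text for sent in doc.sents])
```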

## Migrating from v2.x {#migrating}