mirror of
https://github.com/explosion/spaCy.git
synced 2024-12-26 01:46:28 +03:00
Update docs [ci skip]
parent
3826cfb8fe
commit
52bd3a8b48
|
@@ -7,7 +7,7 @@ source: spacy/morphology.pyx
|
|||
Store the possible morphological analyses for a language, and index them by
|
||||
hash. To save space on each token, tokens only know the hash of their
|
||||
morphological analysis, so queries of morphological attributes are delegated to
|
||||
this class. See [`MorphAnalysis`](/api/morphology#morphansalysis) for the
|
||||
this class. See [`MorphAnalysis`](/api/morphology#morphanalysis) for the
|
||||
container storing a single morphological analysis.
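The hash-indexing idea described above can be sketched with a toy store. This is only an illustration under stated assumptions, not spaCy's implementation: `ToyMorphology`, `ToyToken` and `feats_key` are hypothetical names, and the SHA-1-based key is a stand-in for whatever hashing the library actually uses.

```python
import hashlib
from dataclasses import dataclass


def feats_key(feats: str) -> int:
    """Hypothetical helper: derive a stable 64-bit key from a FEATS string."""
    digest = hashlib.sha1(feats.encode("utf8")).digest()
    return int.from_bytes(digest[:8], "big")


class ToyMorphology:
    """Toy store: keeps each distinct analysis once, indexed by its hash."""

    def __init__(self):
        self._analyses = {}

    def add(self, feats: str) -> int:
        # Sort the features so that their order does not create duplicates.
        normalized = "|".join(sorted(feats.split("|"))) if feats else ""
        key = feats_key(normalized)
        self._analyses[key] = normalized
        return key

    def get(self, key: int) -> dict:
        # Attribute queries are answered by the store, not by the token.
        feats = self._analyses[key]
        if not feats:
            return {}
        return dict(pair.split("=", 1) for pair in feats.split("|"))


@dataclass
class ToyToken:
    text: str
    morph_key: int  # the token only stores the hash of its analysis


morphology = ToyMorphology()
tokens = [
    ToyToken("cats", morphology.add("Number=Plur")),
    ToyToken("dogs", morphology.add("Number=Plur")),
    ToyToken("she", morphology.add("Number=Sing|Case=Nom")),
]
```

Here `tokens[0]` and `tokens[1]` share a single stored analysis, and a call like `morphology.get(tokens[2].morph_key)` resolves the hash back to its features.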
|
||||
|
||||
## Morphology.\_\_init\_\_ {#init tag="method"}
|
||||
|
|
|
@@ -450,8 +450,8 @@ The L2 norm of the token's vector representation.
|
|||
| `pos_` | Coarse-grained part-of-speech from the [Universal POS tag set](https://universaldependencies.org/docs/u/pos/). ~~str~~ |
|
||||
| `tag` | Fine-grained part-of-speech. ~~int~~ |
|
||||
| `tag_` | Fine-grained part-of-speech. ~~str~~ |
|
||||
| `morph` | Morphological analysis. ~~MorphAnalysis~~ |
|
||||
| `morph_` | Morphological analysis in the Universal Dependencies [FEATS](https://universaldependencies.org/format.html#morphological-annotation) format. ~~str~~ |
|
||||
| `morph` <Tag variant="new">3</Tag> | Morphological analysis. ~~MorphAnalysis~~ |
|
||||
| `morph_` <Tag variant="new">3</Tag> | Morphological analysis in the Universal Dependencies [FEATS](https://universaldependencies.org/format.html#morphological-annotation) format. ~~str~~ |
|
||||
| `dep` | Syntactic dependency relation. ~~int~~ |
|
||||
| `dep_` | Syntactic dependency relation. ~~str~~ |
|
||||
| `lang` | Language of the parent document's vocabulary. ~~int~~ |
|
||||
|
|
|
@@ -632,6 +632,23 @@ validate its contents.
|
|||
| `path` | Path to the model's `meta.json`. ~~Union[str, Path]~~ |
|
||||
| **RETURNS** | The model's meta data. ~~Dict[str, Any]~~ |
|
||||
|
||||
### util.get_installed_models {#util.get_installed_models tag="function" new="3"}
|
||||
|
||||
List all model packages installed in the current environment. This will include
|
||||
any spaCy model that was packaged with [`spacy package`](/api/cli#package).
|
||||
Under the hood, model packages expose a Python entry point that spaCy can check,
|
||||
without having to load the model.
|
||||
|
||||
> #### Example
|
||||
>
|
||||
> ```python
|
||||
> model_names = util.get_installed_models()
|
||||
> ```
|
||||
|
||||
| Name | Description |
|
||||
| ----------- | ---------------------------------------------------------------------------------- |
|
||||
| **RETURNS** | The string names of the models installed in the current environment. ~~List[str]~~ |
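As a rough illustration of the entry-point lookup mentioned above, the following sketch lists entry point names using only the standard library, without importing any package. The group name `spacy_models` and the helper `list_entry_point_names` are assumptions made for this example; this is not spaCy's own implementation.

```python
from importlib import metadata


def list_entry_point_names(group):
    """Return the sorted names of all entry points registered under a group."""
    entry_points = metadata.entry_points()
    if hasattr(entry_points, "select"):
        # Newer Python versions expose a selectable collection.
        selected = entry_points.select(group=group)
    else:
        # Older versions return a dict mapping group names to entry points.
        selected = entry_points.get(group, [])
    return sorted({entry_point.name for entry_point in selected})


# Assumed group name; the list is empty if no matching package is installed.
model_names = list_entry_point_names("spacy_models")
```

Because only package metadata is read, nothing is loaded or executed, which matches the "without having to load the model" behavior described above.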
|
||||
|
||||
### util.is_package {#util.is_package tag="function"}
|
||||
|
||||
Check if a string maps to a package installed via pip. Mainly used to validate
|
||||
|
|
|
@@ -11,6 +11,10 @@ next: /usage/training
|
|||
|
||||
<!-- TODO: intro, short explanation of embeddings/transformers, Tok2Vec and Transformer components, point user to processing pipelines docs for more general info that user should know first -->
|
||||
|
||||
If you're looking for details on using word vectors and semantic similarity,
|
||||
check out the
|
||||
[linguistic features docs](/usage/linguistic-features#vectors-similarity).
|
||||
|
||||
<Accordion title="What’s the difference between word vectors and language models?" id="vectors-vs-language-models">
|
||||
|
||||
The key difference between [word vectors](#word-vectors) and contextual language
|
||||
|
|
|
@@ -10,6 +10,32 @@ menu:
|
|||
|
||||
## Summary {#summary}
|
||||
|
||||
<Grid cols={2}>
|
||||
|
||||
<div>
|
||||
|
||||
</div>
|
||||
|
||||
<Infobox title="Table of Contents" id="toc">
|
||||
|
||||
- [Summary](#summary)
|
||||
- [New features](#features)
|
||||
- [Training & config system](#features-training)
|
||||
- [Transformer-based pipelines](#features-transformers)
|
||||
- [Custom models](#features-custom-models)
|
||||
- [End-to-end project workflows](#features-projects)
|
||||
- [New built-in components](#features-pipeline-components)
|
||||
- [New custom component API](#features-components)
|
||||
- [Python type hints](#features-types)
|
||||
- [New methods & attributes](#new-methods)
|
||||
- [New & updated documentation](#new-docs)
|
||||
- [Backwards incompatibilities](#incompat)
|
||||
- [Migrating from spaCy v2.x](#migrating)
|
||||
|
||||
</Infobox>
|
||||
|
||||
</Grid>
|
||||
|
||||
## New Features {#features}
|
||||
|
||||
### New training workflow and config system {#features-training}
|
||||
|
@@ -28,6 +54,8 @@ menu:
|
|||
|
||||
### Transformer-based pipelines {#features-transformers}
|
||||
|
||||
![Pipeline components listening to shared embedding component](../images/tok2vec-listener.svg)
|
||||
|
||||
<Infobox title="Details & Documentation" emoji="📖" list>
|
||||
|
||||
- **Usage:** [Embeddings & Transformers](/usage/embeddings-transformers),
|
||||
|
@@ -46,8 +74,53 @@ menu:
|
|||
|
||||
### Custom models using any framework {#features-custom-models}
|
||||
|
||||
<Infobox title="Details & Documentation" emoji="📖" list>
|
||||
|
||||
<!-- TODO: link to new custom models page -->
|
||||
|
||||
- **Thinc:**
|
||||
[Wrapping PyTorch, TensorFlow & MXNet](https://thinc.ai/docs/usage-frameworks)
|
||||
- **API:** [Model architectures](/api/architectures), [`Pipe`](/api/pipe)
|
||||
|
||||
</Infobox>
|
||||
|
||||
### Manage end-to-end workflows with projects {#features-projects}
|
||||
|
||||
<!-- TODO: update example -->
|
||||
|
||||
> #### Example
|
||||
>
|
||||
> ```cli
|
||||
> # Clone a project template
|
||||
> $ python -m spacy project clone example
|
||||
> $ cd example
|
||||
> # Download data assets
|
||||
> $ python -m spacy project assets
|
||||
> # Run a workflow
|
||||
> $ python -m spacy project run train
|
||||
> ```
|
||||
|
||||
spaCy projects let you manage and share **end-to-end spaCy workflows** for
|
||||
different **use cases and domains**, and orchestrate training, packaging and
|
||||
serving your custom models. You can start off by cloning a pre-defined project
|
||||
template, adjust it to fit your needs, load in your data, train a model, export
|
||||
it as a Python package and share the project templates with your team. spaCy
|
||||
projects also make it easy to **integrate with other tools** in the data science
|
||||
and machine learning ecosystem, including [DVC](/usage/projects#dvc) for data
|
||||
version control, [Prodigy](/usage/projects#prodigy) for creating labelled data,
|
||||
[Streamlit](/usage/projects#streamlit) for building interactive apps,
|
||||
[FastAPI](/usage/projects#fastapi) for serving models in production,
|
||||
[Ray](/usage/projects#ray) for parallel training,
|
||||
[Weights & Biases](/usage/projects#wandb) for experiment tracking, and more!
|
||||
|
||||
<!-- <Project id="some_example_project">
|
||||
|
||||
The easiest way to get started with an end-to-end training process is to clone a
|
||||
[project](/usage/projects) template. Projects let you manage multi-step
|
||||
workflows, from data preprocessing to training and packaging your model.
|
||||
|
||||
</Project>-->
|
||||
|
||||
<Infobox title="Details & Documentation" emoji="📖" list>
|
||||
|
||||
- **Usage:** [spaCy projects](/usage/projects),
|
||||
|
@@ -59,6 +132,16 @@ menu:
|
|||
|
||||
### New built-in pipeline components {#features-pipeline-components}
|
||||
|
||||
spaCy v3.0 includes several new trainable and rule-based components that you can
|
||||
add to your pipeline and customize for your use case:
|
||||
|
||||
> #### Example
|
||||
>
|
||||
> ```python
|
||||
> nlp = spacy.blank("en")
|
||||
> nlp.add_pipe("lemmatizer")
|
||||
> ```
|
||||
|
||||
| Name | Description |
|
||||
| ----------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| [`SentenceRecognizer`](/api/sentencerecognizer) | Trainable component for sentence segmentation. |
|
||||
|
@@ -78,15 +161,37 @@ menu:
|
|||
|
||||
### New and improved pipeline component APIs {#features-components}
|
||||
|
||||
- `Language.factory`, `Language.component`
|
||||
- `Language.analyze_pipes`
|
||||
- Adding components from other models
|
||||
> #### Example
|
||||
>
|
||||
> ```python
|
||||
> @Language.component("my_component")
|
||||
> def my_component(doc):
|
||||
> return doc
|
||||
>
|
||||
> nlp.add_pipe("my_component")
|
||||
> nlp.add_pipe("ner", source=other_nlp)
|
||||
> nlp.analyze_pipes(pretty=True)
|
||||
> ```
|
||||
|
||||
Defining, configuring, reusing, training and analyzing pipeline components is
|
||||
now easier and more convenient. The `@Language.component` and
|
||||
`@Language.factory` decorators let you register your component, define its
|
||||
default configuration and meta data, like the attribute values it assigns and
|
||||
requires. Any custom component can be included during training, and sourcing
|
||||
components from existing pretrained models lets you **mix and match custom
|
||||
pipelines**. The `nlp.analyze_pipes` method outputs structured information about
|
||||
the current pipeline and its components, including the attributes they assign,
|
||||
the scores they compute during training and whether any required attributes
|
||||
aren't set.
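The registration and analysis pattern described above can be sketched in plain Python. This is a toy illustration of decorator-based registration, not spaCy's implementation: the `component` decorator, the `COMPONENTS` and `META` dictionaries, and the `analyze` and `run` helpers are all hypothetical names, and a dictionary stands in for a `Doc`.

```python
from typing import Callable, Dict, List

# Hypothetical registries: component functions and their metadata by name.
COMPONENTS: Dict[str, Callable] = {}
META: Dict[str, dict] = {}


def component(name, assigns=(), requires=()):
    """Register a function under a string name, with the attributes it sets and needs."""
    def decorator(func):
        COMPONENTS[name] = func
        META[name] = {"assigns": list(assigns), "requires": list(requires)}
        return func
    return decorator


@component("tokenize", assigns=["tokens"])
def tokenize(doc):
    doc["tokens"] = doc["text"].split()
    return doc


@component("count", assigns=["n_tokens"], requires=["tokens"])
def count(doc):
    doc["n_tokens"] = len(doc["tokens"])
    return doc


def analyze(pipeline: List[str]) -> Dict[str, List[str]]:
    """Map each component to the required attributes no earlier component assigns."""
    available = set()
    problems = {}
    for name in pipeline:
        missing = [attr for attr in META[name]["requires"] if attr not in available]
        if missing:
            problems[name] = missing
        available.update(META[name]["assigns"])
    return problems


def run(pipeline: List[str], text: str) -> dict:
    """Apply the named components in order to a toy document."""
    doc = {"text": text}
    for name in pipeline:
        doc = COMPONENTS[name](doc)
    return doc
```

With these definitions, `analyze(["count"])` reports that `tokens` is required but never assigned, while `analyze(["tokenize", "count"])` reports no problems.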
|
||||
|
||||
<Infobox title="Details & Documentation" emoji="📖" list>
|
||||
|
||||
- **Usage:** [Custom components](/usage/processing-pipelines#custom-components),
|
||||
[Defining components during training](/usage/training#config-components)
|
||||
- **API:** [`Language`](/api/language)
|
||||
[Defining components for training](/usage/training#config-components)
|
||||
- **API:** [`@Language.component`](/api/language#component),
|
||||
[`@Language.factory`](/api/language#factory),
|
||||
[`Language.add_pipe`](/api/language#add_pipe),
|
||||
[`Language.analyze_pipes`](/api/language#analyze_pipes)
|
||||
- **Implementation:**
|
||||
[`spacy/language.py`](https://github.com/explosion/spaCy/tree/develop/spacy/language.py)
|
||||
|
||||
|
@@ -136,13 +241,14 @@ in your config and see validation errors if the argument values don't match.
|
|||
|
||||
</Infobox>
|
||||
|
||||
### New methods, attributes and commands
|
||||
### New methods, attributes and commands {#new-methods}
|
||||
|
||||
The following methods, attributes and commands are new in spaCy v3.0.
|
||||
|
||||
| Name | Description |
|
||||
| ----------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
||||
| [`Token.lex`](/api/token#attributes) | Access a token's [`Lexeme`](/api/lexeme). |
|
||||
| [`Token.morph`](/api/token#attributes) [`Token.morph_`](/api/token#attributes) | Access a token's morphological analysis. |
|
||||
| [`Language.select_pipes`](/api/language#select_pipes) | Context manager for enabling or disabling specific pipeline components for a block. |
|
||||
| [`Language.analyze_pipes`](/api/language#analyze_pipes) | [Analyze](/usage/processing-pipelines#analysis) components and their interdependencies. |
|
||||
| [`Language.resume_training`](/api/language#resume_training) | Experimental: continue training a pretrained model and initialize "rehearsal" for components that implement a `rehearse` method to prevent catastrophic forgetting. |
|
||||
|
@@ -153,9 +259,52 @@ The following methods, attributes and commands are new in spaCy v3.0.
|
|||
| [`Pipe.score`](/api/pipe#score) | Method on trainable pipeline components that returns a dictionary of evaluation scores. |
|
||||
| [`registry`](/api/top-level#registry) | Function registry to map functions to string names that can be referenced in [configs](/usage/training#config). |
|
||||
| [`util.load_meta`](/api/top-level#util.load_meta) [`util.load_config`](/api/top-level#util.load_config) | Updated helpers for loading a model's [`meta.json`](/api/data-formats#meta) and [`config.cfg`](/api/data-formats#config). |
|
||||
| [`util.get_installed_models`](/api/top-level#util.get_installed_models) | Names of all models installed in the environment. |
|
||||
| [`init config`](/api/cli#init-config) [`init fill-config`](/api/cli#init-fill-config) [`debug config`](/api/cli#debug-config) | CLI commands for initializing, auto-filling and debugging [training configs](/usage/training). |
|
||||
| [`project`](/api/cli#project) | Suite of CLI commands for cloning, running and managing [spaCy projects](/usage/projects). |
|
||||
|
||||
### New and updated documentation {#new-docs}
|
||||
|
||||
<Grid cols={2} gutterBottom={false}>
|
||||
|
||||
<div>
|
||||
|
||||
To help you get started with spaCy v3.0 and the new features, we've added
|
||||
several new or rewritten documentation pages, including a new usage guide on
|
||||
[embeddings, transformers and transfer learning](/usage/embeddings-transformers),
|
||||
a guide on [training models](/usage/training) rewritten from scratch, a page
|
||||
explaining the new [spaCy projects](/usage/projects) and updated usage
|
||||
documentation on
|
||||
[custom pipeline components](/usage/processing-pipelines#custom-components).
|
||||
We've also added a bunch of new illustrations and new API reference pages
|
||||
documenting spaCy's machine learning [model architectures](/api/architectures)
|
||||
and the expected [data formats](/api/data-formats). API pages about
|
||||
[pipeline components](/api/#architecture-pipeline) now include more information,
|
||||
like the default config and implementation, and we've adopted a more detailed
|
||||
format for documenting argument and return types.
|
||||
|
||||
</div>
|
||||
|
||||
[![Library architecture](../images/architecture.svg)](/api)
|
||||
|
||||
</Grid>
|
||||
|
||||
<Infobox title="New or reworked documentation" emoji="📖" list>
|
||||
|
||||
- **Usage:** [Embeddings & Transformers](/usage/embeddings-transformers),
|
||||
[Training models](/usage/training), [Projects](/usage/projects),
|
||||
[Custom pipeline components](/usage/processing-pipelines#custom-components)
|
||||
- **API Reference:** [Library architecture](/api),
|
||||
[Model architectures](/api/architectures), [Data formats](/api/data-formats)
|
||||
- **New Classes:** [`Example`](/api/example), [`Tok2Vec`](/api/tok2vec),
|
||||
[`Transformer`](/api/transformer), [`Lemmatizer`](/api/lemmatizer),
|
||||
[`Morphologizer`](/api/morphologizer),
|
||||
[`AttributeRuler`](/api/attributeruler),
|
||||
[`SentenceRecognizer`](/api/sentencerecognizer), [`Pipe`](/api/pipe),
|
||||
[`Corpus`](/api/corpus)
|
||||
|
||||
</Infobox>
|
||||
|
||||
## Backwards Incompatibilities {#incompat}
|
||||
|
||||
As always, we've tried to keep the breaking changes to a minimum and focus on
|
||||
|
@@ -213,14 +362,15 @@ Note that spaCy v3.0 now requires **Python 3.6+**.
|
|||
### Removed or renamed API {#incompat-removed}
|
||||
|
||||
| Removed | Replacement |
|
||||
| ------------------------------------------------------ | ----------------------------------------------------------------------------------------- |
|
||||
| -------------------------------------------------------- | ----------------------------------------------------------------------------------------- |
|
||||
| `Language.disable_pipes` | [`Language.select_pipes`](/api/language#select_pipes) |
|
||||
| `GoldParse` | [`Example`](/api/example) |
|
||||
| `GoldCorpus` | [`Corpus`](/api/corpus) |
|
||||
| `KnowledgeBase.load_bulk` `KnowledgeBase.dump` | [`KnowledgeBase.from_disk`](/api/kb#from_disk) [`KnowledgeBase.to_disk`](/api/kb#to_disk) |
|
||||
| `spacy init-model` | [`spacy init model`](/api/cli#init-model) |
|
||||
| `spacy debug-data` | [`spacy debug data`](/api/cli#debug-data) |
|
||||
| `spacy profile` | [`spacy debug profile`](/api/cli#debug-profile) |
|
||||
| `spacy link` `util.set_data_path` `util.get_data_path` | not needed, model symlinks are deprecated |
|
||||
| `spacy link`, `util.set_data_path`, `util.get_data_path` | not needed, model symlinks are deprecated |
|
||||
|
||||
The following deprecated methods, attributes and arguments were removed in v3.0.
|
||||
Most of them have been **deprecated for a while** and many would previously
|
||||
|
@@ -236,7 +386,7 @@ on them.
|
|||
| `Language.tagger`, `Language.parser`, `Language.entity` | [`Language.get_pipe`](/api/language#get_pipe) |
|
||||
| keyword arguments like `vocab=False` on `to_disk`, `from_disk`, `to_bytes`, `from_bytes` | `exclude=["vocab"]` |
|
||||
| `n_threads` argument on [`Tokenizer`](/api/tokenizer), [`Matcher`](/api/matcher), [`PhraseMatcher`](/api/phrasematcher) | `n_process` |
|
||||
| `verbose` argument on [`Language.evaluate`] | logging |
|
||||
| `verbose` argument on [`Language.evaluate`](/api/language#evaluate) | logging (`DEBUG`) |
|
||||
| `SentenceSegmenter` hook, `SimilarityHook` | [user hooks](/usage/processing-pipelines#custom-components-user-hooks), [`Sentencizer`](/api/sentencizer), [`SentenceRecognizer`](/api/sentencerecognizer) |
|
||||
|
||||
## Migrating from v2.x {#migrating}
|
||||
|
|