mirror of
https://github.com/explosion/spaCy.git
synced 2024-12-24 17:06:29 +03:00
Docs for v3.4 (#11057)
* Add draft of v3.4 usage
* Add Croatian models
* Add Matcher min/max
* Update release notes
* Minor edits
* Add updates, tables
* Update pydantic/mypy versions
* Update version in README
* Fix sidebar
parent d583626a82
commit 11f859c132
@@ -16,7 +16,7 @@ production-ready [**training system**](https://spacy.io/usage/training) and easy
  model packaging, deployment and workflow management. spaCy is commercial
  open-source software, released under the MIT license.

- 💫 **Version 3.3.1 out now!**
+ 💫 **Version 3.4.0 out now!**
  [Check out the release notes here.](https://github.com/explosion/spaCy/releases)

  [![Azure Pipelines](https://img.shields.io/azure-devops/build/explosion-ai/public/8/master.svg?logo=azure-pipelines&style=flat-square&label=build)](https://dev.azure.com/explosion-ai/public/_build?definitionId=8)
143
website/docs/usage/v3-4.md
Normal file
@@ -0,0 +1,143 @@
---
title: What's New in v3.4
teaser: New features and how to upgrade
menu:
  - ['New Features', 'features']
  - ['Upgrading Notes', 'upgrading']
---

## New features {#features hidden="true"}

spaCy v3.4 brings typing and speed improvements along with new vectors for
English CNN pipelines and new trained pipelines for Croatian. This release also
includes prebuilt Linux aarch64 wheels for all spaCy dependencies distributed by
Explosion.

### Typing improvements {#typing}

spaCy v3.4 supports pydantic v1.9 and mypy 0.950+ through extensive updates to
types in Thinc v8.1.

### Speed improvements {#speed}

- The parser now uses the C `saxpy`/`sgemm` routines provided by the `Ops`
  implementation, which lets it use Accelerate through `thinc-apple-ops`.
- Improved speed of vector lookups.
- Improved speed for `Example.get_aligned_parse` and `Example.get_aligned`.

## Additional features and improvements

- Min/max `{n,m}` operator for `Matcher` patterns.
- Language updates:
  - Improve tokenization for Cyrillic combining diacritics.
  - Improve English tokenizer exceptions for contractions with
    this/that/these/those.
- Updated `spacy project clone` to try both `main` and `master` branches by
  default.
- Added confidence threshold for named entity linker.
- Improved handling of Typer optional default values for `init_config_cli`.
- Added cycle detection in parser projectivization methods.
- Added counts for NER labels in `debug data`.
- Support for adding NVTX ranges to `TrainablePipe` components.
- Support env variable `SPACY_NUM_BUILD_JOBS` to specify the number of build
  jobs to run in parallel with `pip`.
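
As a rough illustration of the min/max operator from the list above, the sketch
below matches two or three repetitions of one token followed by another token.
It assumes spaCy v3.4 or later is installed; the pattern name `VERY_GOOD`, the
sample texts and the helper `matched_texts` are made up for this example.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline is enough: the pattern only uses lexical attributes.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# "{2,3}" requires the token to occur at least 2 and at most 3 times in a row.
pattern = [{"LOWER": "very", "OP": "{2,3}"}, {"LOWER": "good"}]
matcher.add("VERY_GOOD", [pattern])  # hypothetical pattern name


def matched_texts(text):
    """Return the distinct matched span texts for a string (illustrative helper)."""
    doc = nlp(text)
    return sorted({doc[start:end].text for _, start, end in matcher(doc)})


print(matched_texts("This is very good"))
print(matched_texts("This is very very very good"))
```

A single `very` falls below the minimum, so the first call should find nothing,
while the second text contains runs of both two and three repetitions.
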

## Trained pipelines {#pipelines}

### New trained pipelines {#new-pipelines}

v3.4 introduces new CPU/CNN pipelines for Croatian, which use the trainable
lemmatizer and [floret vectors](https://github.com/explosion/floret). Due to the
use of [Bloom embeddings](https://explosion.ai/blog/bloom-embeddings) and
subwords, the pipelines have compact vectors with no out-of-vocabulary words.

| Package                                         | UPOS | Parser LAS | NER F |
| ----------------------------------------------- | ---: | ---------: | ----: |
| [`hr_core_news_sm`](/models/hr#hr_core_news_sm) | 96.6 |       77.5 |  76.1 |
| [`hr_core_news_md`](/models/hr#hr_core_news_md) | 97.3 |       80.1 |  81.8 |
| [`hr_core_news_lg`](/models/hr#hr_core_news_lg) | 97.5 |       80.4 |  83.0 |

### Pipeline updates {#pipeline-updates}

All CNN pipelines have been extended with whitespace augmentation.

The English CNN pipelines have new word vectors:

| Package                                       | Model Version |  TAG | Parser LAS | NER F |
| --------------------------------------------- | ------------- | ---: | ---------: | ----: |
| [`en_core_web_md`](/models/en#en_core_web_md) | v3.3.0        | 97.3 |       90.1 |  84.6 |
| [`en_core_web_md`](/models/en#en_core_web_md) | v3.4.0        | 97.2 |       90.3 |  85.5 |
| [`en_core_web_lg`](/models/en#en_core_web_lg) | v3.3.0        | 97.4 |       90.1 |  85.3 |
| [`en_core_web_lg`](/models/en#en_core_web_lg) | v3.4.0        | 97.3 |       90.2 |  85.6 |

## Notes about upgrading from v3.3 {#upgrading}

### Doc.has_vector

`Doc.has_vector` now matches `Token.has_vector` and `Span.has_vector`: it
returns `True` if at least one token in the doc has a vector rather than
checking only whether the vocab contains vectors.
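
The change can be checked with a small experiment. This is only a sketch that
assumes spaCy v3.4 or later and NumPy are installed; the two words and the
4-dimensional vector are arbitrary placeholders.

```python
import numpy
import spacy

nlp = spacy.blank("en")
# Give the vocab exactly one vector so that it is no longer empty.
nlp.vocab.set_vector("apple", numpy.ones((4,), dtype="float32"))

doc_with_vector = nlp("apple")
doc_without_vector = nlp("pear")

# At least one token in this doc has a vector.
print(doc_with_vector.has_vector)
# No token in this doc has a vector, even though the vocab contains vectors.
# Under v3.3 the check only looked at the vocab.
print(doc_without_vector.has_vector)
```
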

### Using trained pipelines with floret vectors

If you're using a trained pipeline for Croatian, Finnish, Korean or Swedish with
new texts and working with `Doc` objects, you shouldn't notice any difference
between floret vectors and default vectors.

If you use vectors for similarity comparisons, there are a few differences,
mainly because a floret pipeline doesn't include any kind of frequency-based
word list similar to the list of in-vocabulary vector keys with default vectors.

- If your workflow iterates over the vector keys, you should use an external
  word list instead:

  ```diff
  - lexemes = [nlp.vocab[orth] for orth in nlp.vocab.vectors]
  + lexemes = [nlp.vocab[word] for word in external_word_list]
  ```

- `Vectors.most_similar` is not supported because there's no fixed list of
  vectors to compare your vectors to.

### Pipeline package version compatibility {#version-compat}

> #### Using legacy implementations
>
> In spaCy v3, you'll still be able to load and reference legacy implementations
> via [`spacy-legacy`](https://github.com/explosion/spacy-legacy), even if the
> components or architectures change and newer versions are available in the
> core library.

When you're loading a pipeline package trained with an earlier version of spaCy
v3, you will see a warning telling you that the pipeline may be incompatible.
This doesn't necessarily have to be true, but we recommend running your
pipelines against your test suite or evaluation data to make sure there are no
unexpected results.

If you're using one of the [trained pipelines](/models) we provide, you should
run [`spacy download`](/api/cli#download) to update to the latest version. To
see an overview of all installed packages and their compatibility, you can run
[`spacy validate`](/api/cli#validate).

If you've trained your own custom pipeline and you've confirmed that it's still
working as expected, you can update the spaCy version requirements in the
[`meta.json`](/api/data-formats#meta):

```diff
- "spacy_version": ">=3.3.0,<3.4.0",
+ "spacy_version": ">=3.3.0,<3.5.0",
```
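
If you maintain several custom pipelines, the same edit can be scripted. The
following is a minimal sketch using only the standard library; the helper name
`widen_spacy_version` and the sample metadata are hypothetical, and it assumes
the requirement has the simple `>=lower,<upper` form shown above.

```python
import json


def widen_spacy_version(meta: dict, upper_bound: str) -> dict:
    """Return a copy of the metadata that keeps the lower bound of
    `spacy_version` and replaces the upper bound (hypothetical helper)."""
    lower = meta["spacy_version"].split(",")[0]
    updated = dict(meta)
    updated["spacy_version"] = f"{lower},<{upper_bound}"
    return updated


# Stand-in for the contents of a pipeline's meta.json (made-up values).
meta = json.loads(
    '{"lang": "en", "name": "custom_pipeline", "spacy_version": ">=3.3.0,<3.4.0"}'
)
updated = widen_spacy_version(meta, "3.5.0")
print(json.dumps(updated, indent=2))
```

Only widen the range after you have confirmed that the pipeline still behaves
as expected under the new version.
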

### Updating v3.3 configs

To update a config from spaCy v3.3 with the new v3.4 settings, run
[`init fill-config`](/api/cli#init-fill-config):

```cli
$ python -m spacy init fill-config config-v3.3.cfg config-v3.4.cfg
```

In many cases ([`spacy train`](/api/cli#train),
[`spacy.load`](/api/top-level#spacy.load)), the new defaults will be filled in
automatically, but you'll need to fill in the new settings to run
[`debug config`](/api/cli#debug) and [`debug data`](/api/cli#debug-data).
@@ -162,7 +162,12 @@
  {
  "code": "hr",
  "name": "Croatian",
- "has_examples": true
+ "has_examples": true,
+ "models": [
+ "hr_core_news_sm",
+ "hr_core_news_md",
+ "hr_core_news_lg"
+ ]
  },
  {
  "code": "hsb",
@@ -12,7 +12,9 @@
  { "text": "New in v3.0", "url": "/usage/v3" },
  { "text": "New in v3.1", "url": "/usage/v3-1" },
  { "text": "New in v3.2", "url": "/usage/v3-2" },
- { "text": "New in v3.3", "url": "/usage/v3-3" }
+ { "text": "New in v3.2", "url": "/usage/v3-2" },
+ { "text": "New in v3.3", "url": "/usage/v3-3" },
+ { "text": "New in v3.4", "url": "/usage/v3-4" }
  ]
  },
  {
@@ -120,8 +120,8 @@ const AlertSpace = ({ nightly, legacy }) => {
  }

  const navAlert = (
- <Link to="/usage/v3-3" hidden>
- <strong>💥 Out now:</strong> spaCy v3.3
+ <Link to="/usage/v3-4" hidden>
+ <strong>💥 Out now:</strong> spaCy v3.4
  </Link>
  )
