mirror of https://github.com/explosion/spaCy.git, synced 2025-11-04 09:57:26 +03:00

Docs for v3.4 (#11057)

* Add draft of v3.4 usage
* Add Croatian models
* Add Matcher min/max
* Update release notes
* Minor edits
* Add updates, tables
* Update pydantic/mypy versions
* Update version in README
* Fix sidebar

parent d583626a82
commit 11f859c132
				
			
@@ -16,7 +16,7 @@ production-ready [**training system**](https://spacy.io/usage/training) and easy
 model packaging, deployment and workflow management. spaCy is commercial
 open-source software, released under the MIT license.
 
-💫 **Version 3.3.1 out now!**
+💫 **Version 3.4.0 out now!**
 [Check out the release notes here.](https://github.com/explosion/spaCy/releases)
 
 [](https://dev.azure.com/explosion-ai/public/_build?definitionId=8)
				
			
			
website/docs/usage/v3-4.md (Normal file, 143 lines)
@@ -0,0 +1,143 @@
---
title: What's New in v3.4
teaser: New features and how to upgrade
menu:
  - ['New Features', 'features']
  - ['Upgrading Notes', 'upgrading']
---
			
## New features {#features hidden="true"}

spaCy v3.4 brings typing and speed improvements along with new vectors for
English CNN pipelines and new trained pipelines for Croatian. This release also
includes prebuilt Linux aarch64 wheels for all spaCy dependencies distributed by
Explosion.
			
### Typing improvements {#typing}

spaCy v3.4 supports pydantic v1.9 and mypy 0.950+ through extensive updates to
types in Thinc v8.1.
			
### Speed improvements {#speed}

- For the parser, use C `saxpy`/`sgemm` provided by the `Ops` implementation in
  order to use Accelerate through `thinc-apple-ops`.
- Improved speed of vector lookups.
- Improved speed for `Example.get_aligned_parse` and `Example.get_aligned`.
			
## Additional features and improvements

- Min/max `{n,m}` operator for `Matcher` patterns.
- Language updates:
  - Improve tokenization for Cyrillic combining diacritics.
  - Improve English tokenizer exceptions for contractions with
    this/that/these/those.
- Updated `spacy project clone` to try both `main` and `master` branches by
  default.
- Added confidence threshold for named entity linker.
- Improved handling of Typer optional default values for `init_config_cli`.
- Added cycle detection in parser projectivization methods.
- Added counts for NER labels in `debug data`.
- Support for adding NVTX ranges to `TrainablePipe` components.
- Support env variable `SPACY_NUM_BUILD_JOBS` to specify the number of build
  jobs to run in parallel with `pip`.
			
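As an illustration of the min/max operator listed above, here is a minimal sketch. It assumes spaCy v3.4 or later is installed; the pattern name `VERY_GOOD` and the example sentence are made up for this example.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline is enough: the pattern only looks at token text.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# "{2,3}" asks for at least 2 and at most 3 consecutive matching tokens.
# "VERY_GOOD" is an arbitrary, illustrative pattern name.
pattern = [{"LOWER": "very", "OP": "{2,3}"}, {"LOWER": "good"}]
matcher.add("VERY_GOOD", [pattern])

doc = nlp("This is very very very very good")
# Collect the matched text for each (match_id, start, end) triple.
spans = [doc[start:end].text for _, start, end in matcher(doc)]
print(spans)
```

The matcher reports every span that satisfies the pattern, so both the two-token and the three-token runs of "very" before "good" should show up in `spans`.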
			
## Trained pipelines {#pipelines}

### New trained pipelines {#new-pipelines}

v3.4 introduces new CPU/CNN pipelines for Croatian, which use the trainable
lemmatizer and [floret vectors](https://github.com/explosion/floret). Due to the
use of [Bloom embeddings](https://explosion.ai/blog/bloom-embeddings) and
subwords, the pipelines have compact vectors with no out-of-vocabulary words.
			
| Package                                         | UPOS | Parser LAS | NER F |
| ----------------------------------------------- | ---: | ---------: | ----: |
| [`hr_core_news_sm`](/models/hr#hr_core_news_sm) | 96.6 |       77.5 |  76.1 |
| [`hr_core_news_md`](/models/hr#hr_core_news_md) | 97.3 |       80.1 |  81.8 |
| [`hr_core_news_lg`](/models/hr#hr_core_news_lg) | 97.5 |       80.4 |  83.0 |
			
### Pipeline updates {#pipeline-updates}

All CNN pipelines have been extended with whitespace augmentation.

The English CNN pipelines have new word vectors:

| Package                                       | Model Version |  TAG | Parser LAS | NER F |
| --------------------------------------------- | ------------- | ---: | ---------: | ----: |
| [`en_core_web_md`](/models/en#en_core_web_md) | v3.3.0        | 97.3 |       90.1 |  84.6 |
| [`en_core_web_md`](/models/en#en_core_web_md) | v3.4.0        | 97.2 |       90.3 |  85.5 |
| [`en_core_web_lg`](/models/en#en_core_web_lg) | v3.3.0        | 97.4 |       90.1 |  85.3 |
| [`en_core_web_lg`](/models/en#en_core_web_lg) | v3.4.0        | 97.3 |       90.2 |  85.6 |
			
## Notes about upgrading from v3.3 {#upgrading}

### Doc.has_vector

`Doc.has_vector` now matches `Token.has_vector` and `Span.has_vector`: it
returns `True` if at least one token in the doc has a vector rather than
checking only whether the vocab contains vectors.
			
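The change can be sketched with a blank pipeline and a single hand-made vector. This assumes spaCy v3.4 or later; the words and the vector values are arbitrary and only serve the illustration.

```python
import numpy
import spacy

nlp = spacy.blank("en")
# Give exactly one word a vector; the values themselves are arbitrary.
nlp.vocab.set_vector("apple", numpy.ones((4,), dtype="float32"))

with_vector = nlp("apple pie")       # "apple" has a vector
without_vector = nlp("cherry pie")   # neither token has a vector

print(with_vector.has_vector, without_vector.has_vector)
```

Under the v3.4 behavior described above, only the first doc should report `has_vector`, since the second contains no token with a vector even though the vocab itself holds vectors.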
			
### Using trained pipelines with floret vectors

If you're using a trained pipeline for Croatian, Finnish, Korean or Swedish with
new texts and working with `Doc` objects, you shouldn't notice any difference
between floret vectors and default vectors.

If you use vectors for similarity comparisons, there are a few differences,
mainly because a floret pipeline doesn't include any kind of frequency-based
word list similar to the list of in-vocabulary vector keys with default vectors.

- If your workflow iterates over the vector keys, you should use an external
  word list instead:

  ```diff
  - lexemes = [nlp.vocab[orth] for orth in nlp.vocab.vectors]
  + lexemes = [nlp.vocab[word] for word in external_word_list]
  ```

- `Vectors.most_similar` is not supported because there's no fixed list of
  vectors to compare your vectors to.
			
### Pipeline package version compatibility {#version-compat}

> #### Using legacy implementations
>
> In spaCy v3, you'll still be able to load and reference legacy implementations
> via [`spacy-legacy`](https://github.com/explosion/spacy-legacy), even if the
> components or architectures change and newer versions are available in the
> core library.

When you're loading a pipeline package trained with an earlier version of spaCy
v3, you will see a warning telling you that the pipeline may be incompatible.
This doesn't necessarily have to be true, but we recommend running your
pipelines against your test suite or evaluation data to make sure there are no
unexpected results.

If you're using one of the [trained pipelines](/models) we provide, you should
run [`spacy download`](/api/cli#download) to update to the latest version. To
see an overview of all installed packages and their compatibility, you can run
[`spacy validate`](/api/cli#validate).
			
If you've trained your own custom pipeline and you've confirmed that it's still
working as expected, you can update the spaCy version requirements in the
[`meta.json`](/api/data-formats#meta):

```diff
- "spacy_version": ">=3.3.0,<3.4.0",
+ "spacy_version": ">=3.3.0,<3.5.0",
```
			
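If you prefer to script the `meta.json` edit, a minimal sketch using only the Python standard library could look like this. The helper name `widen_spacy_version` and the example path are hypothetical and not part of spaCy.

```python
import json
from pathlib import Path


def widen_spacy_version(meta_path, new_range):
    """Rewrite the "spacy_version" field of a pipeline's meta.json.

    Hypothetical helper for illustration; it is not part of spaCy.
    """
    path = Path(meta_path)
    meta = json.loads(path.read_text(encoding="utf8"))
    # Only the version requirement changes; all other fields are kept.
    meta["spacy_version"] = new_range
    path.write_text(json.dumps(meta, indent=2), encoding="utf8")
    return meta
```

A call such as `widen_spacy_version("my_pipeline/meta.json", ">=3.3.0,<3.5.0")` would then apply the change shown in the diff, assuming the file exists at that (made-up) path.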
			
### Updating v3.3 configs

To update a config from spaCy v3.3 with the new v3.4 settings, run
[`init fill-config`](/api/cli#init-fill-config):

```cli
$ python -m spacy init fill-config config-v3.3.cfg config-v3.4.cfg
```

In many cases ([`spacy train`](/api/cli#train),
[`spacy.load`](/api/top-level#spacy.load)), the new defaults will be filled in
automatically, but you'll need to fill in the new settings to run
[`debug config`](/api/cli#debug) and [`debug data`](/api/cli#debug-data).
				
			
@@ -162,7 +162,12 @@
         {
             "code": "hr",
             "name": "Croatian",
-            "has_examples": true
+            "has_examples": true,
+            "models": [
+                "hr_core_news_sm",
+                "hr_core_news_md",
+                "hr_core_news_lg"
+            ]
         },
         {
             "code": "hsb",
				
			
@@ -12,7 +12,9 @@
                     { "text": "New in v3.0", "url": "/usage/v3" },
                     { "text": "New in v3.1", "url": "/usage/v3-1" },
                     { "text": "New in v3.2", "url": "/usage/v3-2" },
-                    { "text": "New in v3.3", "url": "/usage/v3-3" }
+                    { "text": "New in v3.2", "url": "/usage/v3-2" },
+                    { "text": "New in v3.3", "url": "/usage/v3-3" },
+                    { "text": "New in v3.4", "url": "/usage/v3-4" }
                 ]
             },
             {
				
			
@@ -120,8 +120,8 @@ const AlertSpace = ({ nightly, legacy }) => {
 }
 
 const navAlert = (
-    <Link to="/usage/v3-3" hidden>
-        <strong>💥 Out now:</strong> spaCy v3.3
+    <Link to="/usage/v3-4" hidden>
+        <strong>💥 Out now:</strong> spaCy v3.4
     </Link>
 )
				
			
			