## Additional features and improvements
- Config comparisons with [`spacy debug diff-config`](/api/cli#debug-diff).
- Span suggester debugging with
[`SpanCategorizer.set_candidates`](/api/spancategorizer#set_candidates).
- Big endian support with
[`thinc-bigendian-ops`](https://github.com/andrewsi-z/thinc-bigendian-ops) and
updates to make `floret`, `murmurhash`, Thinc and spaCy endian neutral.
- Initial support for Lower Sorbian and Upper Sorbian.
- Language updates for English, French, Italian, Japanese, Korean, Norwegian,
Russian, Slovenian, Spanish, Turkish, Ukrainian and Vietnamese.
- New noun chunks for Finnish.
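To illustrate what "endian neutral" means in the list above, here is a minimal sketch using only the Python standard library. It is not code from spaCy, Thinc, `floret` or `murmurhash`; the helper names `to_little_endian` and `from_little_endian` are hypothetical. The idea is that serialized numbers are written with an explicit byte order, so the same bytes decode to the same values on little endian and big endian hosts.

```python
import struct
import sys


def to_little_endian(values):
    """Serialize floats as 32-bit little endian, regardless of host byte order."""
    # "<" forces little endian; without it, struct uses the host's native order.
    return struct.pack(f"<{len(values)}f", *values)


def from_little_endian(data):
    """Decode 32-bit little endian floats, regardless of host byte order."""
    count = len(data) // 4
    return list(struct.unpack(f"<{count}f", data))


weights = [1.5, -2.0, 0.25]  # values chosen to be exactly representable as float32
payload = to_little_endian(weights)
restored = from_little_endian(payload)

# Reading the same bytes with the wrong byte order silently yields other numbers.
misread = list(struct.unpack(f">{len(weights)}f", payload))

print("host byte order:", sys.byteorder)
print("restored:", restored)
print("misread as big endian:", misread)
```

Code that relies on the native byte order works on one architecture and produces wrong values on the other, which is the kind of assumption that endian neutral code avoids.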
## Trained pipelines {id="pipelines"}
### New trained pipelines {id="new-pipelines"}
v3.3 introduces new CPU/CNN pipelines for Finnish, Korean and Swedish, which use
the new trainable lemmatizer and
[floret vectors](https://github.com/explosion/floret). Due to the use of
[Bloom embeddings](https://explosion.ai/blog/bloom-embeddings) and subwords, the
pipelines have compact vectors with no out-of-vocabulary words.
| Package | Language | UPOS | Parser LAS | NER F |
| ----------------------------------------------- | -------- | ---: | ---------: | ----: |
| [`fi_core_news_sm`](/models/fi#fi_core_news_sm) | Finnish | 92.5 | 71.9 | 75.9 |
| [`fi_core_news_md`](/models/fi#fi_core_news_md) | Finnish | 95.9 | 78.6 | 80.6 |
| [`fi_core_news_lg`](/models/fi#fi_core_news_lg) | Finnish | 96.2 | 79.4 | 82.4 |
| [`ko_core_news_sm`](/models/ko#ko_core_news_sm) | Korean | 86.1 | 65.6 | 71.3 |
| [`ko_core_news_md`](/models/ko#ko_core_news_md) | Korean | 94.7 | 80.9 | 83.1 |
| [`ko_core_news_lg`](/models/ko#ko_core_news_lg) | Korean | 94.7 | 81.3 | 85.3 |
| [`sv_core_news_sm`](/models/sv#sv_core_news_sm) | Swedish | 95.0 | 75.9 | 74.7 |
| [`sv_core_news_md`](/models/sv#sv_core_news_md) | Swedish | 96.3 | 78.5 | 79.3 |
| [`sv_core_news_lg`](/models/sv#sv_core_news_lg) | Swedish | 96.3 | 79.1 | 81.1 |
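The claim that these pipelines have compact vectors with no out-of-vocabulary words can be illustrated with a simplified sketch of subword hashing. This is not floret's implementation: the function names, the n-gram range, the table size and the use of `hashlib.blake2b` are all assumptions made for illustration. Each word is split into character n-grams, every n-gram is hashed into a few rows of a small fixed-size table, and the word vector is the average of those rows, so any string receives a vector.

```python
import hashlib
import random


def char_ngrams(word, min_n=3, max_n=5):
    """Return the padded word plus its character n-grams (hypothetical range)."""
    padded = f"<{word}>"
    grams = [padded]
    for n in range(min_n, max_n + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i : i + n])
    return grams


def bucket_ids(piece, n_buckets, n_hashes=2):
    """Hash one subword into several table rows, as in a Bloom embedding."""
    ids = []
    for seed in range(n_hashes):
        digest = hashlib.blake2b(
            piece.encode("utf8"), digest_size=8, salt=bytes([seed])
        ).digest()
        ids.append(int.from_bytes(digest, "little") % n_buckets)
    return ids


def embed(word, table, n_hashes=2):
    """Average the table rows selected by all subwords of the word."""
    dim = len(table[0])
    total = [0.0] * dim
    pieces = char_ngrams(word)
    for piece in pieces:
        for idx in bucket_ids(piece, len(table), n_hashes):
            for j in range(dim):
                total[j] += table[idx][j]
    count = len(pieces) * n_hashes
    return [value / count for value in total]


# A tiny random table stands in for trained vectors (assumption for the demo).
rng = random.Random(0)
N_BUCKETS, DIM = 50, 4
table = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(N_BUCKETS)]

# Even an invented word gets a vector, because only its subwords are looked up.
vector = embed("kissanpentu-xyz", table)
print(len(table), "rows,", len(vector), "dimensions")
```

The table size is fixed in advance and independent of the vocabulary, which is why the vectors stay compact; the trade-off is that unrelated subwords can collide in the same rows, which using several hashes per subword helps to offset.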
### Pipeline updates {id="pipeline-updates"}
The following languages switch from lookup or rule-based lemmatizers to the new
trainable lemmatizer: Danish, Dutch, German, Greek, Italian, Lithuanian,
Norwegian, Polish, Portuguese and Romanian. The overall lemmatizer accuracy
improves for all of these pipelines, but be aware that the types of errors may
look quite different from the lookup-based lemmatizers. If you'd prefer to
continue using the previous lemmatizer, you can
[switch from the trainable lemmatizer to a non-trainable lemmatizer](/models#design-modify).