Docs for v3.3 (#10628)

* Temporarily disable CI tests
* Start v3.3 website updates
* Add trainable lemmatizer to pipeline design
* Fix Vectors.most_similar
* Add floret vector info to pipeline design
* Add Lower and Upper Sorbian
* Add span to sidebar
* Work on release notes
* Copy from release notes
* Update pipeline design graphic
* Upgrading note about Doc.from_docs
* Add tables and details
* Update website/docs/models/index.md

  Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fix da lemma acc
* Add minimal intro, various updates
* Round lemma acc
* Add section on floret / word lists
* Add new pipelines table, minor edits
* Fix displacy spans example title
* Clarify adding non-trainable lemmatizer
* Update adding-languages URLs
* Revert "Temporarily disable CI tests"

  This reverts commit 1dee505920.
* Spell out words/sec

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2025-11-27 05:15:43 +03:00 · 2022-04-28 14:09:35 +02:00 · 2022-04-28 14:09:35 +02:00 · 497a708c71
commit 497a708c71
parent 10377fb945
10 changed files with 407 additions and 82 deletions
--- a/website/docs/api/doc.md
+++ b/website/docs/api/doc.md
@ -621,7 +621,7 @@ relative clauses.

 To customize the noun chunk iterator in a loaded pipeline, modify
 [`nlp.vocab.get_noun_chunks`](/api/vocab#attributes). If the `noun_chunk`
-[syntax iterator](/usage/adding-languages#language-data) has not been
+[syntax iterator](/usage/linguistic-features#language-data) has not been
 implemented for the given language, a `NotImplementedError` is raised.

 > #### Example
--- a/website/docs/api/span.md
+++ b/website/docs/api/span.md
@ -283,8 +283,9 @@ objects, if the document has been syntactically parsed. A base noun phrase, or
 it – so no NP-level coordination, no prepositional phrases, and no relative
 clauses.

-If the `noun_chunk` [syntax iterator](/usage/adding-languages#language-data) has
-not been implemeted for the given language, a `NotImplementedError` is raised.
+If the `noun_chunk` [syntax iterator](/usage/linguistic-features#language-data)
+has not been implemented for the given language, a `NotImplementedError` is
+raised.

 > #### Example
 >
@ -520,12 +521,13 @@ sent = doc[sent.start : max(sent.end, span.end)]

 ## Span.sents {#sents tag="property" model="sentences" new="3.2.1"}

-Returns a generator over the sentences the span belongs to. This property is only available
-when [sentence boundaries](/usage/linguistic-features#sbd) have been set on the
-document by the `parser`, `senter`, `sentencizer` or some custom function. It
-will raise an error otherwise.
+Returns a generator over the sentences the span belongs to. This property is
+only available when [sentence boundaries](/usage/linguistic-features#sbd) have
+been set on the document by the `parser`, `senter`, `sentencizer` or some custom
+function. It will raise an error otherwise.

-If the span happens to cross sentence boundaries, all sentences the span overlaps with will be returned.
+If the span happens to cross sentence boundaries, all sentences the span
+overlaps with will be returned.

 > #### Example
 >
--- a/website/docs/api/vectors.md
+++ b/website/docs/api/vectors.md
@ -347,14 +347,14 @@ supported for `floret` mode.
 > most_similar = nlp.vocab.vectors.most_similar(queries, n=10)
 > ```

-| Name           | Description                                                                 |
-| -------------- | --------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
-| `queries`      | An array with one or more vectors. ~~numpy.ndarray~~                        |
-| _keyword-only_ |                                                                             |
-| `batch_size`   | The batch size to use. Default to `1024`. ~~int~~                           |
-| `n`            | The number of entries to return for each query. Defaults to `1`. ~~int~~    |
-| `sort`         | Whether to sort the entries returned by score. Defaults to `True`. ~~bool~~ |
-| **RETURNS**    | tuple                                                                       | The most similar entries as a `(keys, best_rows, scores)` tuple. ~~Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]~~ |
+| Name           | Description                                                                                                             |
+| -------------- | ----------------------------------------------------------------------------------------------------------------------- |
+| `queries`      | An array with one or more vectors. ~~numpy.ndarray~~                                                                    |
+| _keyword-only_ |                                                                                                                         |
+| `batch_size`   | The batch size to use. Defaults to `1024`. ~~int~~                                                                      |
+| `n`            | The number of entries to return for each query. Defaults to `1`. ~~int~~                                                |
+| `sort`         | Whether to sort the entries returned by score. Defaults to `True`. ~~bool~~                                             |
+| **RETURNS**    | The most similar entries as a `(keys, best_rows, scores)` tuple. ~~Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]~~ |
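
To make the `(keys, best_rows, scores)` return shape in the table above concrete, here is a minimal pure-Python sketch of a top-`n` cosine-similarity lookup. It is not spaCy's implementation, which works on arrays and in batches; `top_n_similar` and `toy_table` are hypothetical names used only for illustration.

```python
import math


def top_n_similar(table, query, n=1):
    """Return (keys, best_rows, scores) for the n rows most similar to query.

    table is a list of (key, vector) pairs; similarity is cosine similarity.
    Hypothetical helper, not part of spaCy.
    """

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    scored = [(cosine(vec, query), row, key) for row, (key, vec) in enumerate(table)]
    # Highest score first, analogous to sort=True in the table above
    scored.sort(key=lambda item: item[0], reverse=True)
    best = scored[:n]
    keys = [key for _, _, key in best]
    best_rows = [row for _, row, _ in best]
    scores = [score for score, _, _ in best]
    return keys, best_rows, scores


# Toy vector table: the row index is the position in the list
toy_table = [("cat", [1.0, 0.0]), ("dog", [0.0, 1.0]), ("pet", [1.0, 1.0])]
keys, best_rows, scores = top_n_similar(toy_table, [1.0, 0.1], n=2)
print(keys, best_rows)
```

The three returned lists line up index by index: entry `i` of `keys`, `best_rows` and `scores` all describe the same match.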

 ## Vectors.get_batch {#get_batch tag="method" new="3.2"}

--- a/website/docs/images/pipeline-design.svg
+++ b/website/docs/images/pipeline-design.svg
--- a/website/docs/models/index.md
+++ b/website/docs/models/index.md
@ -30,10 +30,16 @@ into three components:
   tagging, parsing, lemmatization and named entity recognition, or `dep` for
   only tagging, parsing and lemmatization).
 2. **Genre:** Type of text the pipeline is trained on, e.g. `web` or `news`.
-3. **Size:** Package size indicator, `sm`, `md`, `lg` or `trf` (`sm`: no word
-   vectors, `md`: reduced word vector table with 20k unique vectors for ~500k
-   words, `lg`: large word vector table with ~500k entries, `trf`: transformer
-   pipeline without static word vectors)
+3. **Size:** Package size indicator, `sm`, `md`, `lg` or `trf`.
+
+   `sm` and `trf` pipelines have no static word vectors.
+
+   For pipelines with default vectors, `md` has a reduced word vector table with
+   20k unique vectors for ~500k words and `lg` has a large word vector table
+   with ~500k entries.
+
+   For pipelines with floret vectors, `md` vector tables have 50k entries and
+   `lg` vector tables have 200k entries.
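
As a rough illustration of the naming scheme described in the list above, the sketch below splits a package name into its parts and reports whether the size implies static word vectors. It assumes names always have four underscore-separated fields (language code, type, genre, size); `parse_package_name` and `has_static_vectors` are hypothetical helpers, not spaCy functions.

```python
def parse_package_name(name):
    """Split a pipeline package name into language, type, genre and size.

    Assumes exactly four underscore-separated fields, e.g. en_core_web_sm.
    """
    lang, pipeline_type, genre, size = name.split("_")
    return {"lang": lang, "type": pipeline_type, "genre": genre, "size": size}


def has_static_vectors(size):
    # Per the description above, sm and trf pipelines have no static vectors
    return size in ("md", "lg")


parts = parse_package_name("en_core_web_sm")
print(parts, has_static_vectors(parts["size"]))
```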

 For example, [`en_core_web_sm`](/models/en#en_core_web_sm) is a small English
 pipeline trained on written web text (blogs, news, comments), that includes
@ -90,19 +96,42 @@ Main changes from spaCy v2 models:
 In the `sm`/`md`/`lg` models:

 - The `tagger`, `morphologizer` and `parser` components listen to the `tok2vec`
-  component.
+  component. If the lemmatizer is trainable (v3.3+), `lemmatizer` also listens
+  to `tok2vec`.
 - The `attribute_ruler` maps `token.tag` to `token.pos` if there is no
  `morphologizer`. The `attribute_ruler` additionally makes sure whitespace is
  tagged consistently and copies `token.pos` to `token.tag` if there is no
  tagger. For English, the attribute ruler can improve its mapping from
  `token.tag` to `token.pos` if dependency parses from a `parser` are present,
  but the parser is not required.
- The `lemmatizer` component for many languages (Catalan, Dutch, English,
-  French, Greek, Italian Macedonian, Norwegian, Polish and Spanish) requires
-  `token.pos` annotation from either `tagger`+`attribute_ruler` or
-  `morphologizer`.
+- The `lemmatizer` component for many languages requires `token.pos` annotation
+  from either `tagger`+`attribute_ruler` or `morphologizer`.
 - The `ner` component is independent with its own internal tok2vec layer.

+#### CNN/CPU pipelines with floret vectors
+
+The Finnish, Korean and Swedish `md` and `lg` pipelines use
+[floret vectors](/usage/v3-2#vectors) instead of default vectors. If you're
+running a trained pipeline on texts and working with [`Doc`](/api/doc) objects,
+you shouldn't notice any difference with floret vectors. With floret vectors no
+tokens are out-of-vocabulary, so [`Token.is_oov`](/api/token#attributes) will
+return `False` for all tokens.
+
+If you access vectors directly for similarity comparisons, there are a few
+differences because floret vectors don't include a fixed word list like the
+vector keys for default vectors.
+
+- If your workflow iterates over the vector keys, you need to use an external
+  word list instead:
+
+  ```diff
+  - lexemes = [nlp.vocab[orth] for orth in nlp.vocab.vectors]
+  + lexemes = [nlp.vocab[word] for word in external_word_list]
+  ```
+
+- [`Vectors.most_similar`](/api/vectors#most_similar) is not supported because
+  there's no fixed list of vectors to compare your vectors to.
+
 ### Transformer pipeline design {#design-trf}

 In the transformer (`trf`) models, the `tagger`, `parser` and `ner` (if present)
@ -133,10 +162,14 @@ nlp = spacy.load("en_core_web_trf", disable=["tagger", "attribute_ruler", "lemma
 <Infobox variant="warning" title="Rule-based and POS-lookup lemmatizers require
 Token.pos">

-The lemmatizer depends on `tagger`+`attribute_ruler` or `morphologizer` for
-Catalan, Dutch, English, French, Greek, Italian, Macedonian, Norwegian, Polish
-and Spanish. If you disable any of these components, you'll see lemmatizer
-warnings unless the lemmatizer is also disabled.
+The lemmatizer depends on `tagger`+`attribute_ruler` or `morphologizer` for a
+number of languages. If you disable any of these components, you'll see
+lemmatizer warnings unless the lemmatizer is also disabled.
+
+**v3.3**: Catalan, English, French, Russian and Spanish
+
+**v3.0-v3.2**: Catalan, Dutch, English, French, Greek, Italian, Macedonian,
+Norwegian, Polish, Russian and Spanish

 </Infobox>

@ -154,10 +187,34 @@ nlp.enable_pipe("senter")
 The `senter` component is ~10&times; faster than the parser and more accurate
 than the rule-based `sentencizer`.

+#### Switch from trainable lemmatizer to default lemmatizer
+
+Since v3.3, a number of pipelines use a trainable lemmatizer. You can check whether
+the lemmatizer is trainable:
+
+```python
+import spacy
+
+nlp = spacy.load("de_core_news_sm")
+assert nlp.get_pipe("lemmatizer").is_trainable
+```
+
+If you'd like to switch to a non-trainable lemmatizer that's similar to v3.2 or
+earlier, you can replace the trainable lemmatizer with the default non-trainable
+lemmatizer:
+
+```python
+# Requirements: pip install spacy-lookups-data
+import spacy
+
+nlp = spacy.load("de_core_news_sm")
+# Remove existing lemmatizer
+nlp.remove_pipe("lemmatizer")
+# Add non-trainable lemmatizer from language defaults
+# and load lemmatizer tables from spacy-lookups-data
+nlp.add_pipe("lemmatizer").initialize()
+```
+
 #### Switch from rule-based to lookup lemmatization

 For the Dutch, English, French, Greek, Macedonian, Norwegian and Spanish
-pipelines, you can switch from the default rule-based lemmatizer to a lookup
+pipelines, you can swap out a trainable or rule-based lemmatizer for a lookup
 lemmatizer:

 ```python
--- a/website/docs/usage/v3-3.md
+++ b/website/docs/usage/v3-3.md
@ -0,0 +1,247 @@
+---
+title: What's New in v3.3
+teaser: New features and how to upgrade
+menu:
+  - ['New Features', 'features']
+  - ['Upgrading Notes', 'upgrading']
+---
+
+## New features {#features hidden="true"}
+
+spaCy v3.3 improves the speed of core pipeline components, adds a new trainable
+lemmatizer, and introduces trained pipelines for Finnish, Korean and Swedish.
+
+### Speed improvements {#speed}
+
+v3.3 includes a slew of speed improvements:
+
+- Speed up parser and NER by using constant-time head lookups.
+- Support unnormalized softmax probabilities in `spacy.Tagger.v2` to speed up
+  inference for tagger, morphologizer, senter and trainable lemmatizer.
+- Speed up parser projectivization functions.
+- Replace `Ragged` with faster `AlignmentArray` in `Example` for training.
+- Improve `Matcher` speed.
+- Improve serialization speed for empty `Doc.spans`.
+
+For longer texts, the trained pipeline speeds improve **15%** or more in
+prediction. We benchmarked `en_core_web_md` (same components as in v3.2) and
+`de_core_news_md` (with the new trainable lemmatizer) across a range of text
+sizes on Linux (Intel Xeon W-2265) and macOS (M1) to compare spaCy v3.2 vs. v3.3:
+
+**Intel Xeon W-2265**
+
+| Model                                            | Avg. Words/Doc | v3.2 Words/Sec | v3.3 Words/Sec |   Diff |
+| :----------------------------------------------- | -------------: | -------------: | -------------: | -----: |
+| [`en_core_web_md`](/models/en#en_core_web_md)    |            100 |          17292 |          17441 |  0.86% |
+| (=same components)                               |           1000 |          15408 |          16024 |  4.00% |
+|                                                  |          10000 |          12798 |          15346 | 19.91% |
+| [`de_core_news_md`](/models/de#de_core_news_md)  |            100 |          20221 |          19321 | -4.45% |
+| (+v3.3 trainable lemmatizer)                     |           1000 |          17480 |          17345 | -0.77% |
+|                                                  |          10000 |          14513 |          17036 | 17.38% |
+
+**Apple M1**
+
+| Model                                            | Avg. Words/Doc | v3.2 Words/Sec | v3.3 Words/Sec |   Diff |
+| ------------------------------------------------ | -------------: | -------------: | -------------: | -----: |
+| [`en_core_web_md`](/models/en#en_core_web_md)    |            100 |          18272 |          18408 |  0.74% |
+| (=same components)                               |           1000 |          18794 |          19248 |  2.42% |
+|                                                  |          10000 |          15144 |          17513 | 15.64% |
+| [`de_core_news_md`](/models/de#de_core_news_md)  |            100 |          19227 |          19591 |  1.89% |
+| (+v3.3 trainable lemmatizer)                     |           1000 |          20047 |          20628 |  2.90% |
+|                                                  |          10000 |          15921 |          18546 | 16.49% |
+
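
The `Diff` column in the two tables above appears to be the relative change in words per second from v3.2 to v3.3. That reading is inferred from the numbers and is not stated in the source. A short sketch that reproduces two of the Intel Xeon rows under that assumption (`relative_diff` is a hypothetical helper name):

```python
def relative_diff(old_wps, new_wps):
    """Percentage change in words/sec from the old to the new version."""
    return (new_wps - old_wps) / old_wps * 100


# Values copied from the Intel Xeon W-2265 table:
# (model, avg. words/doc, v3.2 words/sec, v3.3 words/sec)
rows = [
    ("en_core_web_md", 10000, 12798, 15346),
    ("de_core_news_md", 100, 20221, 19321),
]
for model, words_per_doc, v32, v33 in rows:
    print(f"{model} @ {words_per_doc} words/doc: {relative_diff(v32, v33):.2f}%")
```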
+### Trainable lemmatizer {#trainable-lemmatizer}
+
+The new [trainable lemmatizer](/api/edittreelemmatizer) component uses
+[edit trees](https://explosion.ai/blog/edit-tree-lemmatizer) to transform tokens
+into lemmas. Try out the trainable lemmatizer with the
+[training quickstart](/usage/training#quickstart)!
+
+### displaCy support for overlapping spans and arcs {#displacy}
+
+displaCy now supports overlapping spans with a new
+[`span`](/usage/visualizers#span) style and multiple arcs with different labels
+between the same tokens for [`dep`](/usage/visualizers#dep) visualizations.
+
+Overlapping spans can be visualized for any spans key in `doc.spans`:
+
+```python
+import spacy
+from spacy import displacy
+from spacy.tokens import Span
+
+nlp = spacy.blank("en")
+text = "Welcome to the Bank of China."
+doc = nlp(text)
+doc.spans["custom"] = [Span(doc, 3, 6, "ORG"), Span(doc, 5, 6, "GPE")]
+displacy.serve(doc, style="span", options={"spans_key": "custom"})
+```
+
+import DisplacySpanHtml from 'images/displacy-span.html'
+
+<Iframe title="displaCy visualizer for overlapping spans" html={DisplacySpanHtml} height={180} />
+
+## Additional features and improvements
+
+- Config comparisons with [`spacy debug diff-config`](/api/cli#debug-diff).
+- Span suggester debugging with
+  [`SpanCategorizer.set_candidates`](/api/spancategorizer#set_candidates).
+- Big endian support with
+  [`thinc-bigendian-ops`](https://github.com/andrewsi-z/thinc-bigendian-ops) and
+  updates to make `floret`, `murmurhash`, Thinc and spaCy endian neutral.
+- Initial support for Lower Sorbian and Upper Sorbian.
+- Language updates for English, French, Italian, Japanese, Korean, Norwegian,
+  Russian, Slovenian, Spanish, Turkish, Ukrainian and Vietnamese.
+- New noun chunks for Finnish.
+
+## Trained pipelines {#pipelines}
+
+### New trained pipelines {#new-pipelines}
+
+v3.3 introduces new CPU/CNN pipelines for Finnish, Korean and Swedish, which use
+the new trainable lemmatizer and
+[floret vectors](https://github.com/explosion/floret). Due to the use of
+[Bloom embeddings](https://explosion.ai/blog/bloom-embeddings) and subwords, the
+pipelines have compact vectors with no out-of-vocabulary words.
+
+| Package                                         | Language | UPOS | Parser LAS | NER F |
+| ----------------------------------------------- | -------- | ---: | ---------: | ----: |
+| [`fi_core_news_sm`](/models/fi#fi_core_news_sm) | Finnish  | 92.5 |       71.9 |  75.9 |
+| [`fi_core_news_md`](/models/fi#fi_core_news_md) | Finnish  | 95.9 |       78.6 |  80.6 |
+| [`fi_core_news_lg`](/models/fi#fi_core_news_lg) | Finnish  | 96.2 |       79.4 |  82.4 |
+| [`ko_core_news_sm`](/models/ko#ko_core_news_sm) | Korean   | 86.1 |       65.6 |  71.3 |
+| [`ko_core_news_md`](/models/ko#ko_core_news_md) | Korean   | 94.7 |       80.9 |  83.1 |
+| [`ko_core_news_lg`](/models/ko#ko_core_news_lg) | Korean   | 94.7 |       81.3 |  85.3 |
+| [`sv_core_news_sm`](/models/sv#sv_core_news_sm) | Swedish  | 95.0 |       75.9 |  74.7 |
+| [`sv_core_news_md`](/models/sv#sv_core_news_md) | Swedish  | 96.3 |       78.5 |  79.3 |
+| [`sv_core_news_lg`](/models/sv#sv_core_news_lg) | Swedish  | 96.3 |       79.1 |  81.1 |
+
+### Pipeline updates {#pipeline-updates}
+
+The following languages switch from lookup or rule-based lemmatizers to the new
+trainable lemmatizer: Danish, Dutch, German, Greek, Italian, Lithuanian,
+Norwegian, Polish, Portuguese and Romanian. The overall lemmatizer accuracy
+improves for all of these pipelines, but be aware that the types of errors may
+look quite different from the lookup-based lemmatizers. If you'd prefer to
+continue using the previous lemmatizer, you can
+[switch from the trainable lemmatizer to a non-trainable lemmatizer](/models#design-modify).
+
+<figure>
+
+| Model                                           | v3.2 Lemma Acc | v3.3 Lemma Acc |
+| ----------------------------------------------- | -------------: | -------------: |
+| [`da_core_news_md`](/models/da#da_core_news_md) |           84.9 |           94.8 |
+| [`de_core_news_md`](/models/de#de_core_news_md) |           73.4 |           97.7 |
+| [`el_core_news_md`](/models/el#el_core_news_md) |           56.5 |           88.9 |
+| [`fi_core_news_md`](/models/fi#fi_core_news_md) |              - |           86.2 |
+| [`it_core_news_md`](/models/it#it_core_news_md) |           86.6 |           97.2 |
+| [`ko_core_news_md`](/models/ko#ko_core_news_md) |              - |           90.0 |
+| [`lt_core_news_md`](/models/lt#lt_core_news_md) |           71.1 |           84.8 |
+| [`nb_core_news_md`](/models/nb#nb_core_news_md) |           76.7 |           97.1 |
+| [`nl_core_news_md`](/models/nl#nl_core_news_md) |           81.5 |           94.0 |
+| [`pl_core_news_md`](/models/pl#pl_core_news_md) |           87.1 |           93.7 |
+| [`pt_core_news_md`](/models/pt#pt_core_news_md) |           76.7 |           96.9 |
+| [`ro_core_news_md`](/models/ro#ro_core_news_md) |           81.8 |           95.5 |
+| [`sv_core_news_md`](/models/sv#sv_core_news_md) |              - |           95.5 |
+
+</figure>
+
+In addition, the vectors in the English pipelines are deduplicated to improve
+the pruned vectors in the `md` models and reduce the `lg` model size.
+
+## Notes about upgrading from v3.2 {#upgrading}
+
+### Span comparisons
+
+Span comparisons involving ordering (`<`, `<=`, `>`, `>=`) now take all span
+attributes into account (start, end, label, and KB ID) so spans may be sorted in
+a slightly different order.
+
+### Whitespace annotation
+
+During training, annotation on whitespace tokens is handled in the same way as
+annotation on non-whitespace tokens in order to allow custom whitespace
+annotation.
+
+### Doc.from_docs
+
+[`Doc.from_docs`](/api/doc#from_docs) now includes `Doc.tensor` by default and
+supports excludes with an `exclude` argument in the same format as
+`Doc.to_bytes`. The supported exclude fields are `spans`, `tensor` and
+`user_data`.
+
+Docs including `Doc.tensor` may be quite a bit larger in RAM. To exclude
+`Doc.tensor` as in v3.2:
+
+```diff
+-merged_doc = Doc.from_docs(docs)
++merged_doc = Doc.from_docs(docs, exclude=["tensor"])
+```
+
+### Using trained pipelines with floret vectors
+
+If you're running a new trained pipeline for Finnish, Korean or Swedish on new
+texts and working with `Doc` objects, you shouldn't notice any difference with
+floret vectors vs. default vectors.
+
+If you use vectors for similarity comparisons, there are a few differences,
+mainly because a floret pipeline doesn't include any kind of frequency-based
+word list similar to the list of in-vocabulary vector keys with default vectors.
+
+- If your workflow iterates over the vector keys, you should use an external
+  word list instead:
+
+  ```diff
+  - lexemes = [nlp.vocab[orth] for orth in nlp.vocab.vectors]
+  + lexemes = [nlp.vocab[word] for word in external_word_list]
+  ```
+
+- `Vectors.most_similar` is not supported because there's no fixed list of
+  vectors to compare your vectors to.
+
+### Pipeline package version compatibility {#version-compat}
+
+> #### Using legacy implementations
+>
+> In spaCy v3, you'll still be able to load and reference legacy implementations
+> via [`spacy-legacy`](https://github.com/explosion/spacy-legacy), even if the
+> components or architectures change and newer versions are available in the
+> core library.
+
+When you're loading a pipeline package trained with an earlier version of spaCy
+v3, you will see a warning telling you that the pipeline may be incompatible.
+This doesn't necessarily have to be true, but we recommend running your
+pipelines against your test suite or evaluation data to make sure there are no
+unexpected results.
+
+If you're using one of the [trained pipelines](/models) we provide, you should
+run [`spacy download`](/api/cli#download) to update to the latest version. To
+see an overview of all installed packages and their compatibility, you can run
+[`spacy validate`](/api/cli#validate).
+
+If you've trained your own custom pipeline and you've confirmed that it's still
+working as expected, you can update the spaCy version requirements in the
+[`meta.json`](/api/data-formats#meta):
+
+```diff
+- "spacy_version": ">=3.2.0,<3.3.0",
++ "spacy_version": ">=3.2.0,<3.4.0",
+```
+
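
To see what widening the `spacy_version` range in the snippet above changes, here is a deliberately simplified range check. It only understands `>=` and `<` clauses with purely numeric versions, and it is not how spaCy itself validates compatibility; `parse_version` and `satisfies` are hypothetical helpers. A real project would more likely rely on a dedicated version-specifier library.

```python
def parse_version(text):
    """Turn '3.3.0' into a comparable tuple (3, 3, 0)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Check a version against a spec made only of '>=' and '<' clauses."""
    current = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            if not current >= parse_version(clause[2:]):
                return False
        elif clause.startswith("<") and not clause.startswith("<="):
            if not current < parse_version(clause[1:]):
                return False
        else:
            raise ValueError(f"Unsupported clause: {clause}")
    return True


print(satisfies("3.3.0", ">=3.2.0,<3.3.0"))  # original range
print(satisfies("3.3.0", ">=3.2.0,<3.4.0"))  # updated range
```

Under these assumptions, v3.3.0 falls outside the original range and inside the updated one, which is why the upper bound is the only part that changes.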
+### Updating v3.2 configs
+
+To update a config from spaCy v3.2 with the new v3.3 settings, run
+[`init fill-config`](/api/cli#init-fill-config):
+
+```cli
+$ python -m spacy init fill-config config-v3.2.cfg config-v3.3.cfg
+```
+
+In many cases ([`spacy train`](/api/cli#train),
+[`spacy.load`](/api/top-level#spacy.load)), the new defaults will be filled in
+automatically, but you'll need to fill in the new settings to run
+[`debug config`](/api/cli#debug) and [`debug data`](/api/cli#debug-data).
+
+To see the speed improvements for the
+[`Tagger` architecture](/api/architectures#Tagger), edit your config to switch
+from `spacy.Tagger.v1` to `spacy.Tagger.v2` and then run `init fill-config`.
--- a/website/docs/usage/visualizers.md
+++ b/website/docs/usage/visualizers.md
@ -5,6 +5,7 @@ new: 2
 menu:
  - ['Dependencies', 'dep']
  - ['Named Entities', 'ent']
+  - ['Spans', 'span']
  - ['Jupyter Notebooks', 'jupyter']
  - ['Rendering HTML', 'html']
  - ['Web app usage', 'webapp']
@ -192,7 +193,7 @@ displacy.serve(doc, style="span")

 import DisplacySpanHtml from 'images/displacy-span.html'

-<Iframe title="displaCy visualizer for entities" html={DisplacySpanHtml} height={180} />
+<Iframe title="displaCy visualizer for overlapping spans" html={DisplacySpanHtml} height={180} />


 The span visualizer lets you customize the following `options`:
--- a/website/meta/languages.json
+++ b/website/meta/languages.json
@ -62,6 +62,11 @@
            "example": "Dies ist ein Satz.",
            "has_examples": true
        },
+        {
+            "code": "dsb",
+            "name": "Lower Sorbian",
+            "has_examples": true
+        },
        {
            "code": "el",
            "name": "Greek",
@ -159,6 +164,11 @@
            "name": "Croatian",
            "has_examples": true
        },
+        {
+            "code": "hsb",
+            "name": "Upper Sorbian",
+            "has_examples": true
+        },
        {
            "code": "hu",
            "name": "Hungarian",
--- a/website/meta/sidebars.json
+++ b/website/meta/sidebars.json
@ -11,7 +11,8 @@
                    { "text": "spaCy 101", "url": "/usage/spacy-101" },
                    { "text": "New in v3.0", "url": "/usage/v3" },
                    { "text": "New in v3.1", "url": "/usage/v3-1" },
-                    { "text": "New in v3.2", "url": "/usage/v3-2" }
+                    { "text": "New in v3.2", "url": "/usage/v3-2" },
+                    { "text": "New in v3.3", "url": "/usage/v3-3" }
                ]
            },
            {
--- a/website/src/templates/index.js
+++ b/website/src/templates/index.js
@ -120,8 +120,8 @@ const AlertSpace = ({ nightly, legacy }) => {
 }

 const navAlert = (
-    <Link to="/usage/v3-2" hidden>
-        <strong>💥 Out now:</strong> spaCy v3.2
+    <Link to="/usage/v3-3" hidden>
+        <strong>💥 Out now:</strong> spaCy v3.3
    </Link>
 )