Fill in non-CLI details from release notes draft

2025-08-04 04:10:20 +03:00 · 2023-01-16 13:12:17 +01:00 · b02ed00814
commit b02ed00814
parent 7aabc12d5c
1 changed file with 68 additions and 12 deletions
--- a/website/docs/usage/v3-5.md
+++ b/website/docs/usage/v3-5.md
@@ -8,34 +8,90 @@ menu:
 ## New features {#features hidden="true"}
-spaCy v3.5 introduces two new CLI commands, `find-threshold`
-and `apply`, provides improvements and extensions to our entity linking
+spaCy v3.5 introduces three new CLI commands, `apply`, `benchmark` and
+`find-threshold`, provides improvements and extensions to our entity linking
 functionality, XXX
 ### New CLI commands {#cli}
 TODO `find-threshold`
 TODO `apply`
-### Entity Linking generalization {#el}
+TODO `benchmark`
-XXX
+TODO `find-threshold`
-### Trained pipelines {#models}
+### Entity linking generalization {#el}
-XXX
+The knowledge base used for entity linking is now easier to customize and has a
 new default implementation [`InMemoryLookupKB`](/api/kb_in_memory).
-### Pipeline updates {#pipelines}
+### Additional features and improvements {#additional-features-and-improvements}
-XXX
+- Language updates:
  - Extended support for Slovenian.
  - Fixed lookup fallback for French and Catalan lemmatizers.
  - Switch Russian and Ukrainian lemmatizers to `pymorphy3`.
  - Support for editorial punctuation in Ancient Greek.
  - Update to Russian tokenizer exceptions.
  - Small fix for Dutch stop words.
 - Allow up to `typer` v0.7.x, `mypy` 0.990 and `typing_extensions` v4.4.x.
 - New `spacy.ConsoleLogger.v3` with expanded progress
  [tracking](/api/top-level#ConsoleLogger).
 - Improved scoring behavior for `textcat` with `spacy.textcat_scorer.v2` and
  `spacy.textcat_multilabel_scorer.v2`.
 - Updates so that downstream components can train properly on a frozen `tok2vec`
  or `transformer` layer.
 - Allow interpolation of variables in directory names in projects.
 - Support for local file system [remotes](/usage/projects#remote) for projects.
 - Improve UX around `displacy.serve` when the default port is in use.
 - Optional `before_update` callback that is invoked at the start of each
  [training step](/api/data-formats#config-training).
 - Improve performance of `SpanGroup` and fix typing issues for `SpanGroup` and
  `Span` objects.
 - Patch a
  [security vulnerability](https://github.com/advisories/GHSA-gw9q-c7gh-j9vm) in
  extracting tar files.
 - Add equality definition for `Vectors`.
 - Ensure `Vocab.to_disk` respects the exclude setting for `lookups` and
  `vectors`.
 - Correctly handle missing annotations in the edit tree lemmatizer.
 ### Trained pipeline updates {#pipelines}
 - The CNN pipelines add `IS_SPACE` as a `tok2vec` feature for `tagger` and
  `morphologizer` components to improve tagging of non-whitespace vs. whitespace
  tokens.
 - The transformer pipelines require `spacy-transformers` v1.2, which uses the
  exact alignment from `tokenizers` for fast tokenizers instead of the heuristic
  alignment from `spacy-alignments`. For all trained pipelines except
  `ja_core_news_trf`, the alignments between spaCy tokens and transformer tokens
  may be slightly different. More details about the `spacy-transformers` changes
  in the
  [v1.2.0 release notes](https://github.com/explosion/spacy-transformers/releases/tag/v1.2.0).
 ## Notes about upgrading from v3.4 {#upgrading}
-### XXX
+### Validation of textcat values {#textcat-validation}
-XXX
+An error is now raised when unsupported values are given as input to train a
 `textcat` or `textcat_multilabel` model - ensure that values are `0.0` or `1.0`
 as explained in the [docs](/api/textcategorizer#assigned-attributes).
 ### Updated default scores for tokenization and textcat {#scores}
 We fixed a bug that inflated the `token_acc` scores in v3.0-v3.4. The reported
 `token_acc` will drop from v3.4 to v3.5, but if `token_p/r/f` stay the same,
 your tokenization performance has not changed from v3.4.
 For new `textcat` or `textcat_multilabel` configs, the new default `v2` scorers:
 - ignore `threshold` for `textcat`, so the reported `cats_p/r/f` may increase
  slightly in v3.5 even though underlying performance is unchanged
 - report the performance of only the **final** `textcat` or `textcat_multilabel`
  component in the pipeline by default
 - custom scorers can be used to score multiple `textcat` and
  `textcat_multilabel` components with the built-in `Scorer.score_cats` scorer
 ### Pipeline package version compatibility {#version-compat}