Extend what's new in v2.3 with vocab / is_oov (#5635)

2025-12-23 01:53:17 +03:00 · 2020-06-23 16:48:59 +02:00 · 4f73ced914
commit 4f73ced914
parent fcdecefacf
1 changed files with 45 additions and 0 deletions
--- a/website/docs/usage/v2-3.md
+++ b/website/docs/usage/v2-3.md
@@ -182,6 +182,51 @@ If you're adding data for a new language, the normalization table should be
 added to `spacy-lookups-data`. See
 [adding norm exceptions](/usage/adding-languages#norm-exceptions).
 #### No preloaded lexemes/vocab for models with vectors
 To reduce the initial loading time, the lexemes in `nlp.vocab` are no longer
 loaded on initialization for models with vectors. As you process texts, the
 lexemes will be added to the vocab automatically, just as in models without
 vectors.
 To see the number of unique vectors and the number of words with vectors,
 check `nlp.meta['vectors']`. For example, `en_core_web_md` has `20000` unique
 vectors and `684830` words with vectors:
 ```python
 {
    'width': 300,
    'vectors': 20000,
    'keys': 684830,
    'name': 'en_core_web_md.vectors'
 }
 ```
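As a rough illustration of how to read these numbers, the sketch below works on a plain dict copied from the output above rather than on a loaded model. `words_per_vector` is a hypothetical helper introduced here, not part of spaCy.

```python
# Plain dict copied from the example output above. In practice it would come
# from nlp.meta['vectors'] on a loaded pipeline (assumption: a model with vectors).
vectors_meta = {
    'width': 300,
    'vectors': 20000,
    'keys': 684830,
    'name': 'en_core_web_md.vectors',
}


def words_per_vector(meta):
    """Hypothetical helper: average number of words that share one vector row."""
    return meta['keys'] / meta['vectors']


ratio = words_per_vector(vectors_meta)
print(f"{ratio:.2f} words map to each unique vector on average")
```

A ratio well above 1 indicates that many words share the same vector row, so `'keys'` rather than `'vectors'` is the figure to look at when estimating vocabulary coverage.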
 If required, for instance if you are working directly with word vectors rather
 than processing texts, you can load all lexemes for words with vectors at once:
 ```python
 for orth in nlp.vocab.vectors:
    _ = nlp.vocab[orth]
 ```
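The loop above can also be wrapped in a small function that reports how many lexemes it added. This is only a sketch: `preload_lexemes` is a hypothetical name, and it assumes the vocab object supports `len()`, indexing by key, and iteration over `vocab.vectors`, as in the snippet above.

```python
def preload_lexemes(vocab):
    """Hypothetical helper: create a lexeme for every key in vocab.vectors.

    Returns the number of lexemes that were newly added to the vocab.
    """
    before = len(vocab)
    for orth in vocab.vectors:
        # Indexing the vocab creates the lexeme if it does not exist yet.
        _ = vocab[orth]
    return len(vocab) - before
```

With a loaded pipeline this would be called as `preload_lexemes(nlp.vocab)`. Calling it a second time should return `0`, since every lexeme already exists by then.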
 #### Lexeme.is_oov and Token.is_oov
 <Infobox title="Important note" variant="warning">
 Due to a bug, the values for `is_oov` are reversed in v2.3.0, but this will be
 fixed in the next patch release v2.3.1.
 </Infobox>
 In v2.3, `Lexeme.is_oov` and `Token.is_oov` are `True` if the lexeme does not
 have a word vector. This is equivalent to `token.orth not in
 nlp.vocab.vectors`.
 Previously in v2.2, `is_oov` corresponded to whether a lexeme had stored
 probability and cluster features. The probability and cluster features are no
 longer included in the provided medium and large models (see the next section).
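Because the `is_oov` values are reversed in v2.3.0, code that has to behave the same on v2.3.0 and later patch releases could apply the equivalent membership check directly. The following is a minimal sketch: `is_oov_by_vectors` is a hypothetical helper, and the stand-in objects at the bottom only mimic a token with an `orth` attribute and a vectors table keyed by orth ID.

```python
from types import SimpleNamespace


def is_oov_by_vectors(token, vectors):
    """Hypothetical helper: True if the token's orth ID has no vector entry.

    Mirrors the documented v2.3 meaning, `token.orth not in nlp.vocab.vectors`,
    without reading the is_oov attribute itself.
    """
    return token.orth not in vectors


# Stand-ins for illustration only: a set of orth IDs instead of
# nlp.vocab.vectors, and simple objects instead of real tokens.
demo_vectors = {101, 202}
known = SimpleNamespace(orth=101)
unknown = SimpleNamespace(orth=999)

print(is_oov_by_vectors(known, demo_vectors))
print(is_oov_by_vectors(unknown, demo_vectors))
```

With a real pipeline the call would be `is_oov_by_vectors(token, nlp.vocab.vectors)` for each token in a processed `Doc`.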
 #### Probability and cluster features
 > #### Load and save extra prob lookups table