Mirror of https://github.com/explosion/spaCy.git (synced 2025-12-09 19:24:22 +03:00)
* Fix most_similar for vectors with unused rows

  Address issues related to the unused rows in the vector table and `most_similar`:

  * Update `most_similar()` to search only through rows that are in use according to `key2row`.
  * Raise an error when `most_similar(n=n)` is larger than the number of vectors in the table.
  * Set and restore `_unset` correctly when vectors are added or deserialized so that new vectors are added in the correct row.
  * Set data and keys to the same length in `Vocab.prune_vectors()` to avoid spurious entries in `key2row`.

* Fix regression test using `most_similar`

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>

| | | |
|---|---|---|
| .. | | |
| __init__.py | | |
| test_lexeme.py | | |
| test_lookups.py | | |
| test_similarity.py | | |
| test_stringstore.py | | |
| test_vectors.py | | |
| test_vocab_api.py | | |
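
The first two points of the commit message above (search only rows that `key2row` marks as in use, and reject an `n` larger than the number of vectors) can be illustrated with a small standalone sketch. This is not spaCy's implementation: the function name `most_similar_in_use`, the helper `cosine`, and the plain-list vector table are all hypothetical and chosen only to show the idea.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar_in_use(data, key2row, query, n=1):
    """Return the n (key, score) pairs most similar to `query`.

    Hypothetical helper: only rows referenced by `key2row` are searched,
    so unused rows in `data` can never appear in the result.
    """
    # Map each used row to the first key that points at it.
    used = {}
    for key, row in key2row.items():
        used.setdefault(row, key)

    # Mirror the commit's second point: refuse an n larger than the
    # number of vectors that are actually in use.
    if n > len(used):
        raise ValueError(
            f"n={n} is larger than the number of vectors in use ({len(used)})"
        )

    scored = [(key, cosine(data[row], query)) for row, key in used.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# A table with four rows, of which only the first two are in use.
data = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
key2row = {"a": 0, "b": 1}

print([key for key, _ in most_similar_in_use(data, key2row, [0.9, 0.1], n=2)])
```

In this sketch, asking for `n=3` raises a `ValueError` because only two rows are in use, even though the table has four rows allocated.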