spaCy/spacy/tests/vocab_vectors
adrianeboyd 40e65d6f63
Fix most_similar for vectors with unused rows (#5348)
* Fix most_similar for vectors with unused rows

Address issues related to the unused rows in the vector table and
`most_similar`:

* Update `most_similar()` to search only through rows that are in use
according to `key2row`.

* Raise an error when `n` in `most_similar(n=n)` is larger than the number of
vectors in the table.

* Set and restore `_unset` correctly when vectors are added or
deserialized so that new vectors are added in the correct row.

* Set data and keys to the same length in `Vocab.prune_vectors()` to
avoid spurious entries in `key2row`.

* Fix regression test using `most_similar`

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-05-19 16:41:26 +02:00
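
The first two points of the commit message above (search only the rows referenced by `key2row`, and reject an `n` larger than the number of vectors) can be illustrated with a small toy model. This is a minimal sketch in plain Python, not spaCy's implementation: the function name `most_similar_used_rows`, the list-of-lists table, the `ValueError` type and the error wording are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar_used_rows(data, key2row, query, n=1):
    """Hypothetical helper: rank only the rows that key2row points to.

    data    -- list of row vectors; it may contain unused (unassigned) rows
    key2row -- dict mapping a key to the index of its row in data
    """
    used_rows = sorted(set(key2row.values()))
    if n > len(used_rows):
        # Assumed error type and wording, chosen for this sketch only.
        raise ValueError(
            f"n={n} is larger than the number of vectors in use ({len(used_rows)})"
        )
    # Several keys may share one row; keep the first key seen for each row.
    row2key = {}
    for key, row in key2row.items():
        row2key.setdefault(row, key)
    scored = [(cosine(query, data[row]), row) for row in used_rows]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(row2key[row], score) for score, row in scored[:n]]


# A table with four rows, of which only the first two are in use.
data = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
key2row = {"cat": 0, "dog": 1}
result = most_similar_used_rows(data, key2row, [0.9, 0.1], n=1)
```

In this sketch the two trailing rows never enter the ranking, so they cannot appear in the result, and a request for `n=3` fails early instead of returning entries for rows that no key refers to.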
__init__.py Revert #4334 2019-09-29 17:32:12 +02:00
test_lexeme.py Reduce stored lexemes data, move feats to lookups (#5238) 2020-05-19 15:59:14 +02:00
test_lookups.py Reduce stored lexemes data, move feats to lookups (#5238) 2020-05-19 15:59:14 +02:00
test_similarity.py Revert #4334 2019-09-29 17:32:12 +02:00
test_stringstore.py Revert #4334 2019-09-29 17:32:12 +02:00
test_vectors.py Fix most_similar for vectors with unused rows (#5348) 2020-05-19 16:41:26 +02:00
test_vocab_api.py Revert #4334 2019-09-29 17:32:12 +02:00