* `strings`: Remove unused `hash32_utf8` function
* `strings`: Make `hash_utf8` and `decode_Utf8Str` private
* `strings`: Reorganize private functions
* `strings`: Raise error when non-string/-int types are passed to functions that don't accept them
* `strings`: Add `items()` method, add type hints, remove unused methods, restrict inputs to specific types, reorganize methods
* `Morphology`: Use `StringStore.items()` to enumerate features when pickling
* `test_stringstore`: Update pre-Python 3 tests
* Update `StringStore` docs
* Fix `get_string_id` imports
* Replace redundant test with tests for type checking
* Rename `_retrieve_interned_str`, remove `.get` default arg
* Add `get_string_id` to `strings.pyi`
Remove `mypy` ignore directives from imports of the above
* `strings.pyi`: Replace functions that consume `Union`-typed params with overloads
* `strings.pyi`: Revert some function signatures
* Update `SYMBOLS_BY_INT` lookups and error codes post-merge
* Revert clobbered change introduced in a previous merge
* Remove unnecessary type hint
* Invert tuple order in `StringStore.items()`
* Add test for `StringStore.items()`
* Revert "`Morphology`: Use `StringStore.items()` to enumerate features when pickling"
This reverts commit 1af9510ceb.
* Rename `keys` and `key_map`
* Add `keys()` and `values()`
* Add comment about the inverted key-value semantics in the API
* Fix type hints
* Implement `keys()`, `values()`, `items()` without generators
* Fix type hints, remove unnecessary boxing
* Update docs
* Simplify `keys/values/items()` impl
* `mypy` fix
* Fix error message, doc fixes
title | tag | source |
---|---|---|
StringStore | class | spacy/strings.pyx |
Look up strings by 64-bit hashes. As of v2.0, spaCy uses hash values instead of
integer IDs. This ensures that strings always map to the same ID, even from
different StringStores.
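To see why hashing gives stable IDs, here is a minimal sketch in plain Python. It does not use spaCy. `toy_hash` is a hypothetical stand-in built on BLAKE2b from `hashlib`, not spaCy's hash function, so its values differ from the ones on this page. The sketch compares sequential integer IDs with hash-derived IDs for two stores that received the same strings in a different order.

```python
import hashlib
from itertools import count
from typing import Dict, Iterable


def toy_hash(string: str) -> int:
    """Hypothetical 64-bit string hash (BLAKE2b); not spaCy's hash function."""
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def ids_by_counter(strings: Iterable[str]) -> Dict[str, int]:
    """Sequential integer IDs: the ID depends on insertion order."""
    counter = count()
    ids: Dict[str, int] = {}
    for string in strings:
        if string not in ids:
            ids[string] = next(counter)
    return ids


def ids_by_hash(strings: Iterable[str]) -> Dict[str, int]:
    """Hash-derived IDs: the ID depends only on the string itself."""
    return {string: toy_hash(string) for string in strings}


store_a = ["apple", "orange"]
store_b = ["orange", "apple"]

# The counter gives "apple" a different ID in each store.
print(ids_by_counter(store_a)["apple"], ids_by_counter(store_b)["apple"])  # 0 1
# The hash gives "apple" the same ID in both.
print(ids_by_hash(store_a)["apple"] == ids_by_hash(store_b)["apple"])  # True
```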
StringStore.__init__
Create the StringStore.
Example
```python
from spacy.strings import StringStore

stringstore = StringStore(["apple", "orange"])
```
Name | Description |
---|---|
strings | A sequence of strings to add to the store. |
StringStore.__len__
Get the number of strings in the store.
Example
```python
stringstore = StringStore(["apple", "orange"])
assert len(stringstore) == 2
```
Name | Description |
---|---|
RETURNS | The number of strings in the store. |
StringStore.__getitem__
Retrieve a string from a given hash. If a string is passed as the input, add it to the store and return its hash.
Example
```python
stringstore = StringStore(["apple", "orange"])
apple_hash = stringstore["apple"]
assert apple_hash == 8566208034543834098
assert stringstore[apple_hash] == "apple"
```
Name | Description |
---|---|
string_or_hash | The hash value to look up or the string to store. |
RETURNS | The stored string or the hash of the newly added string. |
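The two-way behavior of `__getitem__` can be modeled with a dictionary that maps hashes to strings. The sketch below is hypothetical. `ToyStringStore` and `toy_hash` are illustrative names, and `toy_hash` uses BLAKE2b, not spaCy's hash. Raising `TypeError` for unsupported input types is our own assumption about how such a store could reject them. It is not a statement of the exact error spaCy raises.

```python
import hashlib
from typing import Dict, Union


def toy_hash(string: str) -> int:
    """Hypothetical 64-bit string hash (BLAKE2b); not spaCy's hash function."""
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyStringStore:
    """Hypothetical model of the lookup semantics, not spaCy's implementation."""

    def __init__(self) -> None:
        self._by_hash: Dict[int, str] = {}

    def __getitem__(self, string_or_hash: Union[str, int]) -> Union[str, int]:
        if isinstance(string_or_hash, str):
            # A string goes in: store it if it is new and return its hash.
            key = toy_hash(string_or_hash)
            self._by_hash.setdefault(key, string_or_hash)
            return key
        if isinstance(string_or_hash, int):
            # A hash goes in: return its string; unknown hashes raise KeyError.
            return self._by_hash[string_or_hash]
        # Assumption: any other type is rejected with a TypeError.
        raise TypeError(f"expected str or int, got {type(string_or_hash).__name__}")


store = ToyStringStore()
apple_hash = store["apple"]  # adds "apple" and returns its hash
assert store[apple_hash] == "apple"  # looks the string back up by its hash
```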
StringStore.__contains__
Check whether a string or a hash is in the store.
Example
```python
stringstore = StringStore(["apple", "orange"])
assert "apple" in stringstore
assert "cherry" not in stringstore
```
Name | Description |
---|---|
string_or_hash | The string or hash to check. |
RETURNS | Whether the store contains the string or hash. |
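Unlike `__getitem__`, a membership test only reads from the store. The following is a minimal hypothetical sketch. Like the other sketches, it uses the illustrative names `ToyStringStore` and `toy_hash`, with BLAKE2b standing in for spaCy's hash. It shows how a string and its hash can resolve to the same lookup without adding anything.

```python
import hashlib
from typing import Dict, Iterable, Union


def toy_hash(string: str) -> int:
    """Hypothetical 64-bit string hash (BLAKE2b); not spaCy's hash function."""
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyStringStore:
    """Hypothetical model backed by a hash-to-string dict."""

    def __init__(self, strings: Iterable[str] = ()) -> None:
        self._by_hash: Dict[int, str] = {toy_hash(s): s for s in strings}

    def __contains__(self, string_or_hash: Union[str, int]) -> bool:
        # Hash strings first, so both input forms become the same dict lookup.
        if isinstance(string_or_hash, str):
            string_or_hash = toy_hash(string_or_hash)
        return string_or_hash in self._by_hash

    def __len__(self) -> int:
        return len(self._by_hash)


store = ToyStringStore(["apple", "orange"])
assert "apple" in store and toy_hash("apple") in store
assert "cherry" not in store
assert len(store) == 2  # the membership tests above added nothing
```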
StringStore.__iter__
Iterate over the stored strings in insertion order.
Example
```python
stringstore = StringStore(["apple", "orange"])
all_strings = [s for s in stringstore]
assert all_strings == ["apple", "orange"]
```
Name | Description |
---|---|
YIELDS | A string in the store. |
StringStore.items
Iterate over the stored string-hash pairs in insertion order.
Example
```python
stringstore = StringStore(["apple", "orange"])
all_strings_and_hashes = stringstore.items()
assert all_strings_and_hashes == [("apple", 8566208034543834098), ("orange", 2208928596161743350)]
```
Name | Description |
---|---|
RETURNS | A list of string-hash pairs. |
StringStore.keys
Iterate over the stored strings in insertion order.
Example
```python
stringstore = StringStore(["apple", "orange"])
all_strings = stringstore.keys()
assert all_strings == ["apple", "orange"]
```
Name | Description |
---|---|
RETURNS | A list of strings. |
StringStore.values
Iterate over the stored string hashes in insertion order.
Example
```python
stringstore = StringStore(["apple", "orange"])
all_hashes = stringstore.values()
assert all_hashes == [8566208034543834098, 2208928596161743350]
```
Name | Description |
---|---|
RETURNS | A list of string hashes. |
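`keys()` returns the strings and `values()` returns the hashes, even though lookups by hash go from hash to string. In other words, the public API inverts the key-value direction of a hash-to-string map. The sketch below assumes an insertion-ordered hash-to-string dictionary underneath. That is our assumption, not a description of spaCy's Cython internals. `ToyStringStore` and the BLAKE2b-based `toy_hash` are hypothetical names. Because the methods build lists rather than generators, each call returns a snapshot.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def toy_hash(string: str) -> int:
    """Hypothetical 64-bit string hash (BLAKE2b); not spaCy's hash function."""
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyStringStore:
    """Hypothetical model: an insertion-ordered dict mapping hash -> string."""

    def __init__(self, strings: Iterable[str] = ()) -> None:
        self._by_hash: Dict[int, str] = {}
        for string in strings:
            self.add(string)

    def add(self, string: str) -> int:
        key = toy_hash(string)
        self._by_hash.setdefault(key, string)  # re-adding keeps the original position
        return key

    # The public API inverts the internal mapping: "keys" are strings,
    # "values" are hashes, and items() returns (string, hash) pairs.
    def keys(self) -> List[str]:
        return list(self._by_hash.values())

    def values(self) -> List[int]:
        return list(self._by_hash.keys())

    def items(self) -> List[Tuple[str, int]]:
        return [(string, key) for key, string in self._by_hash.items()]


store = ToyStringStore(["apple", "orange"])
snapshot = store.keys()
store.add("banana")
print(snapshot)  # ['apple', 'orange']: a list, so it does not see "banana"
print(store.keys())  # ['apple', 'orange', 'banana']
```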
StringStore.add
Add a string to the StringStore.
Example
```python
stringstore = StringStore(["apple", "orange"])
banana_hash = stringstore.add("banana")
assert len(stringstore) == 3
assert banana_hash == 2525716904149915114
assert stringstore[banana_hash] == "banana"
assert stringstore["banana"] == banana_hash
```
Name | Description |
---|---|
string | The string to add. |
RETURNS | The string's hash value. |
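A string's hash is derived from the string itself. So adding the same string twice can return the same hash without growing the store. The following is a minimal hypothetical sketch of that idempotence. `ToyStringStore` and the BLAKE2b-based `toy_hash` are illustrative names, and rejecting non-string input with `TypeError` is our assumption.

```python
import hashlib
from typing import Dict


def toy_hash(string: str) -> int:
    """Hypothetical 64-bit string hash (BLAKE2b); not spaCy's hash function."""
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyStringStore:
    """Hypothetical model of add(), not spaCy's implementation."""

    def __init__(self) -> None:
        self._by_hash: Dict[int, str] = {}

    def add(self, string: str) -> int:
        if not isinstance(string, str):
            # Assumption: only strings may be added.
            raise TypeError(f"expected str, got {type(string).__name__}")
        key = toy_hash(string)
        if key not in self._by_hash:  # a repeat add is a no-op
            self._by_hash[key] = string
        return key

    def __len__(self) -> int:
        return len(self._by_hash)


store = ToyStringStore()
first = store.add("banana")
second = store.add("banana")
assert first == second and len(store) == 1
```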
StringStore.to_disk
Save the current state to a directory.
Example
```python
stringstore.to_disk("/path/to/strings")
```
Name | Description |
---|---|
path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or `Path`-like objects. |
StringStore.from_disk
Load state from a directory. Modifies the object in place and returns it.
Example
```python
from spacy.strings import StringStore

stringstore = StringStore().from_disk("/path/to/strings")
```
Name | Description |
---|---|
path | A path to a directory. Paths may be either strings or `Path`-like objects. |
RETURNS | The modified StringStore object. |
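The path handling described for `to_disk` and `from_disk` can be sketched with `pathlib`. The helpers `save_strings` and `load_strings` are hypothetical, not spaCy functions. They accept either a `str` or a path-like object, create the directory if it is missing, and write a JSON list to a file named `strings.json`. That file name and format were chosen purely for illustration and are not spaCy's on-disk layout.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import List, Union

FILENAME = "strings.json"  # hypothetical file name, chosen for illustration


def save_strings(strings: List[str], path: Union[str, os.PathLike]) -> None:
    """Hypothetical helper: write the strings as a JSON list inside a directory."""
    directory = Path(path)  # str and path-like inputs are handled alike
    directory.mkdir(parents=True, exist_ok=True)  # create it if it doesn't exist
    with open(directory / FILENAME, "w", encoding="utf8") as f:
        json.dump(strings, f)


def load_strings(path: Union[str, os.PathLike]) -> List[str]:
    """Hypothetical helper: read the JSON list written by save_strings()."""
    with open(Path(path) / FILENAME, encoding="utf8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "nested" / "strings"  # does not exist yet
    save_strings(["apple", "orange"], str(target))  # a plain str works too
    assert load_strings(target) == ["apple", "orange"]
```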
StringStore.to_bytes
Serialize the current state to a binary string.
Example
```python
store_bytes = stringstore.to_bytes()
```
Name | Description |
---|---|
RETURNS | The serialized form of the StringStore object. |
StringStore.from_bytes
Load state from a binary string.
Example
```python
from spacy.strings import StringStore

store_bytes = stringstore.to_bytes()
new_store = StringStore().from_bytes(store_bytes)
```
Name | Description |
---|---|
bytes_data | The data to load from. |
RETURNS | The StringStore object. |
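Hashes are derived from the strings, so a serialized store in principle only needs the strings themselves. The hashes can be recomputed when the bytes are loaded. The sketch below illustrates that design choice with JSON. `toy_to_bytes`, `toy_from_bytes`, and the BLAKE2b-based `toy_hash` are hypothetical, and we make no claim about the exact byte format spaCy produces.

```python
import hashlib
import json
from typing import Dict, List


def toy_hash(string: str) -> int:
    """Hypothetical 64-bit string hash (BLAKE2b); not spaCy's hash function."""
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def toy_to_bytes(strings: List[str]) -> bytes:
    # Only the strings are written; the hashes are derived data.
    return json.dumps(strings).encode("utf8")


def toy_from_bytes(data: bytes) -> Dict[int, str]:
    # Rebuild the hash -> string map by rehashing each loaded string.
    return {toy_hash(string): string for string in json.loads(data.decode("utf8"))}


data = toy_to_bytes(["apple", "orange"])
restored = toy_from_bytes(data)
assert restored[toy_hash("apple")] == "apple"
```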
Utilities
strings.hash_string
Get a 64-bit hash for a given string.
Example
```python
from spacy.strings import hash_string

assert hash_string("apple") == 8566208034543834098
```
Name | Description |
---|---|
string | The string to hash. |
RETURNS | The hash. |
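To our understanding, `hash_string` computes a 64-bit MurmurHash (MurmurHash64A, via the `murmurhash` package, seeded with 1) of the string's UTF-8 bytes. Treat that as an assumption. The pure-Python sketch below implements MurmurHash64A on that basis. We have not checked that it reproduces the exact value in the example above, so read it as an illustration of the technique, not a drop-in replacement. `murmurhash64a` and `sketch_hash_string` are our own names.

```python
MASK64 = (1 << 64) - 1
M = 0xC6A4A7935BD1E995  # MurmurHash64A multiplier
R = 47  # MurmurHash64A shift


def murmurhash64a(data: bytes, seed: int = 1) -> int:
    """Pure-Python MurmurHash64A over little-endian 8-byte blocks."""
    length = len(data)
    h = (seed ^ (length * M)) & MASK64
    n_blocks = length // 8
    # Mix in each full 8-byte block.
    for i in range(n_blocks):
        k = int.from_bytes(data[8 * i : 8 * i + 8], "little")
        k = (k * M) & MASK64
        k ^= k >> R
        k = (k * M) & MASK64
        h ^= k
        h = (h * M) & MASK64
    # Mix in the remaining 0-7 bytes, if any.
    tail = data[8 * n_blocks :]
    if tail:
        h ^= int.from_bytes(tail, "little")
        h = (h * M) & MASK64
    # Final avalanche.
    h ^= h >> R
    h = (h * M) & MASK64
    h ^= h >> R
    return h


def sketch_hash_string(string: str) -> int:
    """Hypothetical analogue of hash_string: hash the UTF-8 encoding."""
    return murmurhash64a(string.encode("utf8"), seed=1)
```

If spaCy is installed, comparing `sketch_hash_string("apple")` with `hash_string("apple")` is a quick way to test the assumption.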