Mirror of https://github.com/explosion/spaCy.git (synced 2025-10-25 13:11:03 +03:00)
| title | tag | source | 
|---|---|---|
| StringStore | class | spacy/strings.pyx | 
Look up strings by 64-bit hashes. As of v2.0, spaCy uses hash values instead of
integer IDs. This ensures that strings always map to the same ID, even from
different StringStores.
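
To make the benefit of hash-based IDs concrete, here is a minimal sketch in plain Python. It does not use spaCy: `toy_hash`, `sequential_ids` and `hashed_ids` are hypothetical helpers, and `hashlib.blake2b` is only a stand-in for the hash function spaCy uses internally, so the values will not match spaCy's hashes. The sketch shows that position-based integer IDs depend on insertion order, while hash-based IDs depend only on the string itself.

```python
import hashlib


def toy_hash(string):
    """Deterministic 64-bit hash of a string (stand-in, not spaCy's hash)."""
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def sequential_ids(strings):
    """Integer IDs assigned by insertion position, so they depend on order."""
    return {s: i for i, s in enumerate(strings)}


def hashed_ids(strings):
    """Hash IDs, which depend only on the string itself."""
    return {s: toy_hash(s) for s in strings}


store_a = ["apple", "orange"]
store_b = ["orange", "banana", "apple"]

# Position-based IDs disagree between the two collections...
assert sequential_ids(store_a)["apple"] != sequential_ids(store_b)["apple"]
# ...while hash-based IDs agree, whatever else was added and in what order.
assert hashed_ids(store_a)["apple"] == hashed_ids(store_b)["apple"]
```
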
StringStore.__init__
Create the StringStore.
Example

    from spacy.strings import StringStore

    stringstore = StringStore([u"apple", u"orange"])
| Name | Type | Description | 
|---|---|---|
| strings | iterable | A sequence of unicode strings to add to the store. | 
| RETURNS | StringStore | The newly constructed object. | 
StringStore.__len__
Get the number of strings in the store.
Example

    from spacy.strings import StringStore

    stringstore = StringStore([u"apple", u"orange"])
    assert len(stringstore) == 2
| Name | Type | Description | 
|---|---|---|
| RETURNS | int | The number of strings in the store. | 
StringStore.__getitem__
Retrieve a string from a given hash, or vice versa.
Example

    from spacy.strings import StringStore

    stringstore = StringStore([u"apple", u"orange"])
    apple_hash = stringstore[u"apple"]  # string -> hash
    assert apple_hash == 8566208034543834098
    assert stringstore[apple_hash] == u"apple"  # hash -> string
| Name | Type | Description | 
|---|---|---|
| string_or_id | bytes, unicode or uint64 | The value to encode. | 
| RETURNS | unicode or int | The value to be retrieved. | 
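
The two-way lookup described above can be sketched with a small dictionary-backed class. `ToyStringStore` is a hypothetical illustration, not spaCy's implementation: it uses `hashlib.blake2b` as a stand-in hash, and it assumes that looking up a string simply computes its hash, while looking up an integer requires the string to have been added first.

```python
import hashlib


class ToyStringStore:
    """Hypothetical pure-Python stand-in for a hash-keyed string table."""

    def __init__(self, strings=()):
        self._by_hash = {}  # hash -> string
        for string in strings:
            self.add(string)

    @staticmethod
    def _hash(string):
        # Stand-in 64-bit hash; values will not match spaCy's hashes.
        digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
        return int.from_bytes(digest, "little")

    def add(self, string):
        key = self._hash(string)
        self._by_hash[key] = string
        return key

    def __getitem__(self, string_or_id):
        if isinstance(string_or_id, int):
            # Hash -> string; raises KeyError if the hash is unknown.
            return self._by_hash[string_or_id]
        if isinstance(string_or_id, bytes):
            string_or_id = string_or_id.decode("utf8")
        # String -> hash; computed directly, no table lookup needed.
        return self._hash(string_or_id)


store = ToyStringStore(["apple", "orange"])
apple_hash = store["apple"]
assert store[apple_hash] == "apple"
assert store[b"apple"] == apple_hash
```

Dispatching on the argument type keeps a single entry point for both directions, which mirrors the `string_or_id` parameter in the table above.
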
StringStore.__contains__
Check whether a string is in the store.
Example

    from spacy.strings import StringStore

    stringstore = StringStore([u"apple", u"orange"])
    assert u"apple" in stringstore
    assert u"cherry" not in stringstore
| Name | Type | Description | 
|---|---|---|
| string | unicode | The string to check. | 
| RETURNS | bool | Whether the store contains the string. | 
StringStore.__iter__
Iterate over the strings in the store, in order. Note that a newly initialized
store will always include an empty string '' at position 0.
Example

    from spacy.strings import StringStore

    stringstore = StringStore([u"apple", u"orange"])
    all_strings = [s for s in stringstore]
    assert all_strings == [u"apple", u"orange"]
| Name | Type | Description | 
|---|---|---|
| YIELDS | unicode | A string in the store. | 
StringStore.add
Add a string to the StringStore.
Example

    from spacy.strings import StringStore

    stringstore = StringStore([u"apple", u"orange"])
    banana_hash = stringstore.add(u"banana")
    assert len(stringstore) == 3
    assert banana_hash == 2525716904149915114
    assert stringstore[banana_hash] == u"banana"
    assert stringstore[u"banana"] == banana_hash
| Name | Type | Description | 
|---|---|---|
| string | unicode | The string to add. | 
| RETURNS | uint64 | The string's hash value. | 
StringStore.to_disk
Save the current state to a directory.
Example

    stringstore.to_disk("/path/to/strings")
| Name | Type | Description | 
|---|---|---|
| path | unicode / Path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. | 
StringStore.from_disk
Load state from a directory. Modifies the object in place and returns it.
Example

    from spacy.strings import StringStore

    stringstore = StringStore().from_disk("/path/to/strings")
| Name | Type | Description | 
|---|---|---|
| path | unicode / Path | A path to a directory. Paths may be either strings or Path-like objects. | 
| RETURNS | StringStore | The modified StringStore object. | 
StringStore.to_bytes
Serialize the current state to a binary string.
Example

    store_bytes = stringstore.to_bytes()
| Name | Type | Description | 
|---|---|---|
| RETURNS | bytes | The serialized form of the StringStore object. | 
StringStore.from_bytes
Load state from a binary string.
Example

    from spacy.strings import StringStore

    stringstore = StringStore([u"apple", u"orange"])
    store_bytes = stringstore.to_bytes()
    new_store = StringStore().from_bytes(store_bytes)
| Name | Type | Description | 
|---|---|---|
| bytes_data | bytes | The data to load from. | 
| RETURNS | StringStore | The StringStore object. | 
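
A round trip through bytes can be sketched without spaCy as follows. The payload format used here (a UTF-8 encoded JSON list of strings) and the helper names `toy_hash`, `to_bytes` and `from_bytes` are assumptions made for illustration; this page does not describe how `StringStore.to_bytes` lays out its data. The sketch relies on one idea only: because hashes are derived from the strings, storing the strings alone is enough to rebuild the hash-to-string table.

```python
import hashlib
import json


def toy_hash(string):
    """Stand-in 64-bit hash; values will not match spaCy's hashes."""
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def to_bytes(table):
    """Serialize a hash -> string table as a JSON list of its strings."""
    return json.dumps(sorted(table.values())).encode("utf8")


def from_bytes(bytes_data):
    """Rebuild the table by re-hashing every stored string."""
    strings = json.loads(bytes_data.decode("utf8"))
    return {toy_hash(s): s for s in strings}


table = {toy_hash(s): s for s in ["apple", "orange"]}
restored = from_bytes(to_bytes(table))
assert restored == table
```
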
Utilities
strings.hash_string
Get a 64-bit hash for a given string.
Example

    from spacy.strings import hash_string

    assert hash_string(u"apple") == 8566208034543834098
| Name | Type | Description | 
|---|---|---|
| string | unicode | The string to hash. | 
| RETURNS | uint64 | The hash. |