---
title: StringStore
tag: class
source: spacy/strings.pyx
---

Look up strings by 64-bit hashes. As of v2.0, spaCy uses hash values instead of
integer IDs. This ensures that strings always map to the same ID, even from
different `StringStores`.

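The point about consistent IDs can be illustrated with a small standard-library
sketch. It assumes a stand-in hash, `toy_hash`, built on `hashlib.blake2b`; this
is not the hash function spaCy uses, so its values will not match spaCy's.
`sequential_ids` and `hashed_ids` are hypothetical helpers that exist only for
this comparison.

```python
import hashlib


def toy_hash(string: str) -> int:
    """Stand-in 64-bit hash (an assumption, NOT spaCy's hash function)."""
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def sequential_ids(strings):
    """Assign IDs by insertion order, as an order-dependent scheme would."""
    return {s: i for i, s in enumerate(strings)}


def hashed_ids(strings):
    """Assign IDs by hashing the string content."""
    return {s: toy_hash(s) for s in strings}


store_a = ["apple", "orange"]
store_b = ["orange", "banana", "apple"]

# Insertion-order IDs depend on what else was added, and when.
assert sequential_ids(store_a)["apple"] != sequential_ids(store_b)["apple"]
# Hash-based IDs depend only on the string itself.
assert hashed_ids(store_a)["apple"] == hashed_ids(store_b)["apple"]
# The stand-in hash always fits in an unsigned 64-bit integer.
assert 0 <= toy_hash("apple") < 2**64
```
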
## StringStore.\_\_init\_\_ {#init tag="method"}

Create the `StringStore`.

> #### Example
>
> ```python
> from spacy.strings import StringStore
> stringstore = StringStore(["apple", "orange"])
> ```

| Name        | Type          | Description                                        |
| ----------- | ------------- | -------------------------------------------------- |
| `strings`   | iterable      | A sequence of unicode strings to add to the store. |
| **RETURNS** | `StringStore` | The newly constructed object.                      |

## StringStore.\_\_len\_\_ {#len tag="method"}

Get the number of strings in the store.

> #### Example
>
> ```python
> stringstore = StringStore(["apple", "orange"])
> assert len(stringstore) == 2
> ```

| Name        | Type | Description                         |
| ----------- | ---- | ----------------------------------- |
| **RETURNS** | int  | The number of strings in the store. |

## StringStore.\_\_getitem\_\_ {#getitem tag="method"}

Retrieve a string from a given hash, or vice versa.

> #### Example
>
> ```python
> stringstore = StringStore(["apple", "orange"])
> apple_hash = stringstore["apple"]
> assert apple_hash == 8566208034543834098
> assert stringstore[apple_hash] == "apple"
> ```

| Name           | Type                     | Description                |
| -------------- | ------------------------ | -------------------------- |
| `string_or_id` | bytes, unicode or uint64 | The value to encode.       |
| **RETURNS**    | unicode or int           | The value to be retrieved. |

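To make the two lookup directions concrete, here is a minimal pure-Python
sketch of a two-way table. `ToyStringStore` and `toy_hash` are hypothetical
names, and the hash is a `hashlib.blake2b` stand-in rather than spaCy's own, so
this is an illustration of the idea and not a description of how the real class
is implemented.

```python
import hashlib


def toy_hash(string: str) -> int:
    # Stand-in 64-bit hash (assumption); spaCy's real hash values differ.
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyStringStore:
    """Hypothetical two-way string/hash table, for illustration only."""

    def __init__(self, strings=()):
        self._by_hash = {}
        for string in strings:
            self.add(string)

    def add(self, string: str) -> int:
        # Remember the string under its hash and return the hash.
        key = toy_hash(string)
        self._by_hash[key] = string
        return key

    def __getitem__(self, string_or_id):
        if isinstance(string_or_id, str):
            # String -> hash: computed directly from the string.
            return toy_hash(string_or_id)
        # Hash -> string: only possible for strings that were added.
        return self._by_hash[string_or_id]


store = ToyStringStore(["apple", "orange"])
apple_hash = store["apple"]
assert store[apple_hash] == "apple"
```

In this sketch, the string-to-hash direction needs no stored state, while the
hash-to-string direction depends on the string having been added first.
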
## StringStore.\_\_contains\_\_ {#contains tag="method"}

Check whether a string is in the store.

> #### Example
>
> ```python
> stringstore = StringStore(["apple", "orange"])
> assert "apple" in stringstore
> assert "cherry" not in stringstore
> ```

| Name        | Type    | Description                            |
| ----------- | ------- | -------------------------------------- |
| `string`    | unicode | The string to check.                   |
| **RETURNS** | bool    | Whether the store contains the string. |

## StringStore.\_\_iter\_\_ {#iter tag="method"}

Iterate over the strings in the store, in order. Note that a newly initialized
store will always include an empty string `''` at position `0`.

> #### Example
>
> ```python
> stringstore = StringStore(["apple", "orange"])
> all_strings = [s for s in stringstore]
> assert all_strings == ["apple", "orange"]
> ```

| Name       | Type    | Description            |
| ---------- | ------- | ---------------------- |
| **YIELDS** | unicode | A string in the store. |

## StringStore.add {#add tag="method" new="2"}

Add a string to the `StringStore`.

> #### Example
>
> ```python
> stringstore = StringStore(["apple", "orange"])
> banana_hash = stringstore.add("banana")
> assert len(stringstore) == 3
> assert banana_hash == 2525716904149915114
> assert stringstore[banana_hash] == "banana"
> assert stringstore["banana"] == banana_hash
> ```

| Name        | Type    | Description              |
| ----------- | ------- | ------------------------ |
| `string`    | unicode | The string to add.       |
| **RETURNS** | uint64  | The string's hash value. |

## StringStore.to_disk {#to_disk tag="method" new="2"}

Save the current state to a directory.

> #### Example
>
> ```python
> stringstore.to_disk("/path/to/strings")
> ```

| Name   | Type             | Description                                                                                                           |
| ------ | ---------------- | --------------------------------------------------------------------------------------------------------------------- |
| `path` | unicode / `Path` | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or `Path`-like objects. |

## StringStore.from_disk {#from_disk tag="method" new="2"}

Load state from a directory. Modifies the object in place and returns it.

> #### Example
>
> ```python
> from spacy.strings import StringStore
> stringstore = StringStore().from_disk("/path/to/strings")
> ```

| Name        | Type             | Description                                                                |
| ----------- | ---------------- | -------------------------------------------------------------------------- |
| `path`      | unicode / `Path` | A path to a directory. Paths may be either strings or `Path`-like objects. |
| **RETURNS** | `StringStore`    | The modified `StringStore` object.                                         |

## StringStore.to_bytes {#to_bytes tag="method"}

Serialize the current state to a binary string.

> #### Example
>
> ```python
> store_bytes = stringstore.to_bytes()
> ```

| Name        | Type  | Description                                      |
| ----------- | ----- | ------------------------------------------------ |
| **RETURNS** | bytes | The serialized form of the `StringStore` object. |

## StringStore.from_bytes {#from_bytes tag="method"}

Load state from a binary string.

> #### Example
>
> ```python
> from spacy.strings import StringStore
> store_bytes = stringstore.to_bytes()
> new_store = StringStore().from_bytes(store_bytes)
> ```

| Name         | Type          | Description               |
| ------------ | ------------- | ------------------------- |
| `bytes_data` | bytes         | The data to load from.    |
| **RETURNS**  | `StringStore` | The `StringStore` object. |

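The round trip between `to_bytes` and `from_bytes` can be sketched in plain
Python. This section does not describe the binary format spaCy writes, so the
sketch below simply assumes a JSON list of strings. `toy_to_bytes`,
`toy_from_bytes` and `toy_hash` are hypothetical helpers, and the hash is a
`hashlib.blake2b` stand-in rather than spaCy's own.

```python
import hashlib
import json


def toy_hash(string: str) -> int:
    # Stand-in 64-bit hash (assumption); spaCy's real hash values differ.
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def toy_to_bytes(table: dict) -> bytes:
    """Serialize only the strings; the hashes can be recomputed on load."""
    return json.dumps(sorted(table.values())).encode("utf8")


def toy_from_bytes(data: bytes) -> dict:
    """Rebuild the hash -> string table from the serialized strings."""
    strings = json.loads(data.decode("utf8"))
    return {toy_hash(s): s for s in strings}


original = {toy_hash(s): s for s in ["apple", "orange"]}
store_bytes = toy_to_bytes(original)
restored = toy_from_bytes(store_bytes)

assert isinstance(store_bytes, bytes)
assert restored == original
```

Because a hash is a pure function of its string, storing the strings alone is
enough to restore the full table in this sketch.
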
## Utilities {#util}

### strings.hash_string {#hash_string tag="function"}

Get a 64-bit hash for a given string.

> #### Example
>
> ```python
> from spacy.strings import hash_string
> assert hash_string("apple") == 8566208034543834098
> ```

| Name        | Type    | Description         |
| ----------- | ------- | ------------------- |
| `string`    | unicode | The string to hash. |
| **RETURNS** | uint64  | The hash.           |