mirror of
https://github.com/explosion/spaCy.git
synced 2024-12-29 11:26:28 +03:00
7ba3a5d95c
* Make serialization methods consistent exclude keyword argument instead of random named keyword arguments and deprecation handling * Update docs and add section on serialization fields
192 lines
6.0 KiB
Markdown
---
title: StringStore
tag: class
source: spacy/strings.pyx
---

Look up strings by 64-bit hashes. As of v2.0, spaCy uses hash values instead of
integer IDs. This ensures that strings always map to the same ID, even from
different `StringStores`.

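The difference between integer IDs and hash values can be illustrated with a minimal, stdlib-only sketch. It is not spaCy's implementation: the hypothetical `hash_id` helper uses an 8-byte BLAKE2b digest as a stand-in for the 64-bit MurmurHash that spaCy uses, so its values will not match spaCy's, and `sequential_ids` is a hypothetical insertion-order ID scheme included only for comparison.

```python
import hashlib


def hash_id(string):
    """Hypothetical stand-in hash: an 8-byte BLAKE2b digest read as an unsigned 64-bit int."""
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def sequential_ids(strings):
    """Hypothetical integer-ID scheme: IDs are assigned in insertion order."""
    ids = {}
    for string in strings:
        if string not in ids:
            ids[string] = len(ids)
    return ids


# Two "stores" that saw the same strings in a different order.
store_a = ["apple", "orange"]
store_b = ["orange", "apple"]

# Sequential IDs depend on insertion order, so the two stores disagree.
seq_a = sequential_ids(store_a)
seq_b = sequential_ids(store_b)
print(seq_a["apple"], seq_b["apple"])  # 0 1

# Hash-based IDs depend only on the string itself, so the two stores agree.
hash_a = {s: hash_id(s) for s in store_a}
hash_b = {s: hash_id(s) for s in store_b}
print(hash_a["apple"] == hash_b["apple"])  # True
```

Because the ID is a pure function of the string, no coordination between stores is needed for them to agree.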
## StringStore.\_\_init\_\_ {#init tag="method"}

Create the `StringStore`.

> #### Example
>
> ```python
> from spacy.strings import StringStore
> stringstore = StringStore([u"apple", u"orange"])
> ```

| Name        | Type          | Description                                        |
| ----------- | ------------- | -------------------------------------------------- |
| `strings`   | iterable      | A sequence of unicode strings to add to the store. |
| **RETURNS** | `StringStore` | The newly constructed object.                      |

## StringStore.\_\_len\_\_ {#len tag="method"}

Get the number of strings in the store.

> #### Example
>
> ```python
> stringstore = StringStore([u"apple", u"orange"])
> assert len(stringstore) == 2
> ```

| Name        | Type | Description                         |
| ----------- | ---- | ----------------------------------- |
| **RETURNS** | int  | The number of strings in the store. |

## StringStore.\_\_getitem\_\_ {#getitem tag="method"}

Retrieve a string from a given hash, or vice versa.

> #### Example
>
> ```python
> stringstore = StringStore([u"apple", u"orange"])
> apple_hash = stringstore[u"apple"]
> assert apple_hash == 8566208034543834098
> assert stringstore[apple_hash] == u"apple"
> ```

| Name           | Type                     | Description                                                 |
| -------------- | ------------------------ | ----------------------------------------------------------- |
| `string_or_id` | bytes, unicode or uint64 | The string or hash to look up.                              |
| **RETURNS**    | unicode or int           | The hash of a given string, or the string for a given hash. |

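Lookup in both directions can be sketched with two pieces of state: a hash function and a table from hash back to string. The minimal sketch below assumes a hypothetical `ToyStringStore` class and a BLAKE2b-based stand-in hash; spaCy's real store is written in Cython and uses MurmurHash, so the hash values differ from the example above.

```python
import hashlib


def hash_id(string):
    """Hypothetical 64-bit stand-in hash (spaCy uses MurmurHash instead)."""
    return int.from_bytes(hashlib.blake2b(string.encode("utf8"), digest_size=8).digest(), "little")


class ToyStringStore:
    """Hypothetical illustration of a string <-> hash lookup table."""

    def __init__(self, strings=()):
        self._by_hash = {}  # maps hash -> string
        for string in strings:
            self.add(string)

    def add(self, string):
        key = hash_id(string)
        self._by_hash[key] = string
        return key

    def __getitem__(self, string_or_id):
        if isinstance(string_or_id, str):
            # String -> hash: computed from the string alone, no table needed.
            return hash_id(string_or_id)
        # Hash -> string: needs the table, so an unknown hash raises KeyError.
        return self._by_hash[string_or_id]


store = ToyStringStore(["apple", "orange"])
apple_hash = store["apple"]
assert store[apple_hash] == "apple"
```

In this sketch the two directions are asymmetric: any string can be hashed, but a hash can only be turned back into text if the store has seen the string.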
## StringStore.\_\_contains\_\_ {#contains tag="method"}

Check whether a string is in the store.

> #### Example
>
> ```python
> stringstore = StringStore([u"apple", u"orange"])
> assert u"apple" in stringstore
> assert u"cherry" not in stringstore
> ```

| Name        | Type    | Description                            |
| ----------- | ------- | -------------------------------------- |
| `string`    | unicode | The string to check.                   |
| **RETURNS** | bool    | Whether the store contains the string. |

## StringStore.\_\_iter\_\_ {#iter tag="method"}

Iterate over the strings in the store, in the order they were added. A newly
initialized store does not include the empty string, so a store created from
two strings yields exactly those two.

> #### Example
>
> ```python
> stringstore = StringStore([u"apple", u"orange"])
> all_strings = [s for s in stringstore]
> assert all_strings == [u"apple", u"orange"]
> ```

| Name       | Type    | Description            |
| ---------- | ------- | ---------------------- |
| **YIELDS** | unicode | A string in the store. |

## StringStore.add {#add tag="method" new="2"}

Add a string to the `StringStore`.

> #### Example
>
> ```python
> stringstore = StringStore([u"apple", u"orange"])
> banana_hash = stringstore.add(u"banana")
> assert len(stringstore) == 3
> assert banana_hash == 2525716904149915114
> assert stringstore[banana_hash] == u"banana"
> assert stringstore[u"banana"] == banana_hash
> ```

| Name        | Type    | Description              |
| ----------- | ------- | ------------------------ |
| `string`    | unicode | The string to add.       |
| **RETURNS** | uint64  | The string's hash value. |

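Because the ID is derived from the string, adding can be made idempotent: adding the same string twice returns the same hash and does not grow the store. The minimal sketch below illustrates this with a hypothetical dict-backed `ToyStringStore` and a stand-in hash (not spaCy's MurmurHash, so the values differ from the example above); it also keeps strings in insertion order for iteration.

```python
import hashlib


def hash_id(string):
    """Hypothetical 64-bit stand-in hash (spaCy uses MurmurHash instead)."""
    return int.from_bytes(hashlib.blake2b(string.encode("utf8"), digest_size=8).digest(), "little")


class ToyStringStore:
    """Hypothetical store keeping strings in insertion order, keyed by hash."""

    def __init__(self, strings=()):
        self._by_hash = {}  # dicts preserve insertion order in Python 3.7+
        for string in strings:
            self.add(string)

    def add(self, string):
        key = hash_id(string)
        # setdefault only stores the string the first time its hash is seen.
        self._by_hash.setdefault(key, string)
        return key

    def __len__(self):
        return len(self._by_hash)

    def __iter__(self):
        return iter(self._by_hash.values())


store = ToyStringStore(["apple", "orange"])
first = store.add("banana")
second = store.add("banana")  # adding again is a no-op
assert first == second
assert len(store) == 3
```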
## StringStore.to_disk {#to_disk tag="method" new="2"}

Save the current state to a directory.

> #### Example
>
> ```python
> stringstore.to_disk("/path/to/strings")
> ```

| Name   | Type             | Description                                                                                                           |
| ------ | ---------------- | --------------------------------------------------------------------------------------------------------------------- |
| `path` | unicode / `Path` | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or `Path`-like objects. |

## StringStore.from_disk {#from_disk tag="method" new="2"}

Load state from a directory. Modifies the object in place and returns it.

> #### Example
>
> ```python
> from spacy.strings import StringStore
> stringstore = StringStore().from_disk("/path/to/strings")
> ```

| Name        | Type             | Description                                                                |
| ----------- | ---------------- | -------------------------------------------------------------------------- |
| `path`      | unicode / `Path` | A path to a directory. Paths may be either strings or `Path`-like objects. |
| **RETURNS** | `StringStore`    | The modified `StringStore` object.                                         |

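Since every hash can be recomputed from its string, a save/load round trip only needs to persist the strings themselves. The sketch below illustrates that idea with a hypothetical `ToyStringStore` that writes a JSON list to a hypothetical `strings.json` file inside the target directory; this file layout and the BLAKE2b stand-in hash are assumptions for illustration, not necessarily what spaCy writes.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def hash_id(string):
    """Hypothetical 64-bit stand-in hash (spaCy uses MurmurHash instead)."""
    return int.from_bytes(hashlib.blake2b(string.encode("utf8"), digest_size=8).digest(), "little")


class ToyStringStore:
    """Hypothetical store illustrating a save/load round trip."""

    def __init__(self, strings=()):
        self._by_hash = {}
        for string in strings:
            self._by_hash.setdefault(hash_id(string), string)

    def __iter__(self):
        return iter(self._by_hash.values())

    def to_disk(self, path):
        # Accept str or Path, and create the directory if it doesn't exist.
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)
        # Only the strings are saved: the hashes can be recomputed from them.
        (path / "strings.json").write_text(json.dumps(list(self)), encoding="utf8")

    def from_disk(self, path):
        # Modify the object in place and return it, as the documented method does.
        strings = json.loads((Path(path) / "strings.json").read_text(encoding="utf8"))
        for string in strings:
            self._by_hash.setdefault(hash_id(string), string)
        return self


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "strings"
    ToyStringStore(["apple", "orange"]).to_disk(target)
    loaded = ToyStringStore().from_disk(target)

assert list(loaded) == ["apple", "orange"]
```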
## StringStore.to_bytes {#to_bytes tag="method"}

Serialize the current state to a binary string.

> #### Example
>
> ```python
> store_bytes = stringstore.to_bytes()
> ```

| Name        | Type  | Description                                      |
| ----------- | ----- | ------------------------------------------------ |
| **RETURNS** | bytes | The serialized form of the `StringStore` object. |

## StringStore.from_bytes {#from_bytes tag="method"}

Load state from a binary string.

> #### Example
>
> ```python
> from spacy.strings import StringStore
> store_bytes = stringstore.to_bytes()
> new_store = StringStore().from_bytes(store_bytes)
> ```

| Name         | Type          | Description               |
| ------------ | ------------- | ------------------------- |
| `bytes_data` | bytes         | The data to load from.    |
| **RETURNS**  | `StringStore` | The `StringStore` object. |

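The same principle applies to the binary form: the payload only has to carry the strings, and the hash table is rebuilt on load. The sketch below assumes a UTF-8 encoded JSON list as the byte format, with hypothetical `strings_to_bytes` and `strings_from_bytes` helpers and a BLAKE2b stand-in hash; it illustrates the idea rather than spaCy's exact serialization format.

```python
import hashlib
import json


def hash_id(string):
    """Hypothetical 64-bit stand-in hash (spaCy uses MurmurHash instead)."""
    return int.from_bytes(hashlib.blake2b(string.encode("utf8"), digest_size=8).digest(), "little")


def strings_to_bytes(strings):
    """Hypothetical serializer: a UTF-8 encoded JSON list of strings, no hashes."""
    return json.dumps(list(strings)).encode("utf8")


def strings_from_bytes(bytes_data):
    """Hypothetical loader: rebuild the hash -> string table from the string list."""
    return {hash_id(s): s for s in json.loads(bytes_data.decode("utf8"))}


store_bytes = strings_to_bytes(["apple", "orange"])
table = strings_from_bytes(store_bytes)
assert table[hash_id("apple")] == "apple"
```

Non-ASCII strings survive the round trip as well, since `json.dumps` escapes them and `json.loads` restores them.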
## Utilities {#util}

### strings.hash_string {#hash_string tag="function"}

Get a 64-bit hash for a given string.

> #### Example
>
> ```python
> from spacy.strings import hash_string
> assert hash_string(u"apple") == 8566208034543834098
> ```

| Name        | Type    | Description         |
| ----------- | ------- | ------------------- |
| `string`    | unicode | The string to hash. |
| **RETURNS** | uint64  | The hash.           |

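One property a function like `hash_string` needs is that it returns the same value in every process. Python's built-in `hash()` for `str` does not have this property, because it is salted per interpreter via `PYTHONHASHSEED`. The sketch below demonstrates that contrast; `stable_hash64` is a hypothetical BLAKE2b-based stand-in, not spaCy's MurmurHash, and `builtin_hash_in_subprocess` is a hypothetical helper that runs `hash()` in a fresh interpreter.

```python
import hashlib
import os
import subprocess
import sys


def stable_hash64(string):
    """Hypothetical stand-in: unsigned 64-bit int from a BLAKE2b digest of the UTF-8 bytes."""
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def builtin_hash_in_subprocess(string, seed):
    """Run Python's built-in hash() in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({string!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# The built-in hash changes with the per-process seed, so it is unsuitable as a persistent ID.
print(builtin_hash_in_subprocess("apple", 1), builtin_hash_in_subprocess("apple", 2))
# A content-based 64-bit hash depends only on the string's bytes.
print(stable_hash64("apple"))
```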