| title | teaser | tag | source | new |
|---|---|---|---|---|
| Lookups | A container for large lookup tables and dictionaries | class | spacy/lookups.py | 2.2 |
This class allows convenient access to large lookup tables and dictionaries,
e.g. lemmatization data or tokenizer exception lists. Its tables use Bloom
filters to speed up lookups of keys that are not present.
Lookups are available via the Vocab as vocab.lookups, so they
can be accessed before the pipeline components are applied (e.g. in the
tokenizer and lemmatizer), as well as within the pipeline components via
doc.vocab.lookups.
Lookups.__init__
Create a Lookups object.
Example
from spacy.lookups import Lookups
lookups = Lookups()
| Name | Type | Description |
|---|---|---|
| RETURNS | Lookups | The newly constructed object. |
Lookups.__len__
Get the current number of tables in the lookups.
Example
lookups = Lookups()
assert len(lookups) == 0
| Name | Type | Description |
|---|---|---|
| RETURNS | int | The number of tables in the lookups. |
Lookups.__contains__
Check if the lookups contain a table of a given name. Delegates to
Lookups.has_table.
Example
lookups = Lookups()
lookups.add_table("some_table")
assert "some_table" in lookups
| Name | Type | Description |
|---|---|---|
| name | str | Name of the table. |
| RETURNS | bool | Whether a table of that name is in the lookups. |
Lookups.tables
Get the names of all tables in the lookups.
Example
lookups = Lookups()
lookups.add_table("some_table")
assert lookups.tables == ["some_table"]
| Name | Type | Description |
|---|---|---|
| RETURNS | list | Names of the tables in the lookups. |
Lookups.add_table
Add a new table with optional data to the lookups. Raises an error if a table with that name already exists.
Example
lookups = Lookups()
lookups.add_table("some_table", {"foo": "bar"})
| Name | Type | Description |
|---|---|---|
| name | str | Unique name of the table. |
| data | dict | Optional data to add to the table. |
| RETURNS | Table | The newly added table. |
Lookups.get_table
Get a table from the lookups. Raises an error if the table doesn't exist.
Example
lookups = Lookups()
lookups.add_table("some_table", {"foo": "bar"})
table = lookups.get_table("some_table")
assert table["foo"] == "bar"
| Name | Type | Description |
|---|---|---|
| name | str | Name of the table. |
| RETURNS | Table | The table. |
Lookups.remove_table
Remove a table from the lookups. Raises an error if the table doesn't exist.
Example
lookups = Lookups()
lookups.add_table("some_table")
removed_table = lookups.remove_table("some_table")
assert "some_table" not in lookups
| Name | Type | Description |
|---|---|---|
| name | str | Name of the table to remove. |
| RETURNS | Table | The removed table. |
Lookups.has_table
Check if the lookups contain a table of a given name. Equivalent to
Lookups.__contains__.
Example
lookups = Lookups()
lookups.add_table("some_table")
assert lookups.has_table("some_table")
| Name | Type | Description |
|---|---|---|
| name | str | Name of the table. |
| RETURNS | bool | Whether a table of that name is in the lookups. |
Lookups.to_bytes
Serialize the lookups to a bytestring.
Example
lookup_bytes = lookups.to_bytes()
| Name | Type | Description |
|---|---|---|
| RETURNS | bytes | The serialized lookups. |
Lookups.from_bytes
Load the lookups from a bytestring.
Example
lookup_bytes = lookups.to_bytes()
lookups = Lookups()
lookups.from_bytes(lookup_bytes)
| Name | Type | Description |
|---|---|---|
| bytes_data | bytes | The data to load from. |
| RETURNS | Lookups | The loaded lookups. |
Lookups.to_disk
Save the lookups to a directory as lookups.bin. Expects a path to a directory,
which will be created if it doesn't exist.
Example
lookups.to_disk("/path/to/lookups")
| Name | Type | Description |
|---|---|---|
| path | str / Path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. |
Lookups.from_disk
Load lookups from a directory containing a lookups.bin. Will skip loading if
the file doesn't exist.
Example
from spacy.lookups import Lookups
lookups = Lookups()
lookups.from_disk("/path/to/lookups")
| Name | Type | Description |
|---|---|---|
| path | str / Path | A path to a directory. Paths may be either strings or Path-like objects. |
| RETURNS | Lookups | The loaded lookups. |
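The directory convention described for to_disk and from_disk can be sketched with the standard library alone. This is only an illustration under stated assumptions: the helper names `save_tables` and `load_tables` are hypothetical, and JSON stands in for whatever binary format spaCy actually writes to lookups.bin.

```python
import json
import tempfile
from pathlib import Path

FILENAME = "lookups.bin"  # file name taken from the description above


def save_tables(tables, path):
    """Write all tables to <path>/lookups.bin, creating the directory if needed."""
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)
    # JSON is an assumption made for this sketch, not spaCy's real format.
    (path / FILENAME).write_bytes(json.dumps(tables).encode("utf8"))


def load_tables(path):
    """Return the stored tables, or an empty dict if lookups.bin is missing."""
    filepath = Path(path) / FILENAME
    if not filepath.exists():
        # Mirrors the "skip loading if the file doesn't exist" behavior.
        return {}
    return json.loads(filepath.read_bytes().decode("utf8"))


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "new_dir"  # does not exist yet
    save_tables({"some_table": {"foo": "bar"}}, target)
    restored = load_tables(target)
    empty = load_tables(Path(tmp) / "does_not_exist")
```

Here `restored` holds the saved table, while `empty` stays empty because the second directory contains no lookups.bin.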
Table
A table in the lookups. Subclass of OrderedDict that implements a slightly
more consistent and unified API and includes a Bloom filter to speed up missed
lookups. Supports all other methods and attributes of OrderedDict /
dict, and the customized methods listed here. Methods that get or set keys
accept both integers and strings (which will be hashed before being added to the
table).
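To make the two ideas above concrete (string keys hashed to integers, and a Bloom filter consulted before the dictionary), here is a minimal pure-Python sketch. It is not spaCy's implementation: the names `hash_key`, `SketchBloom` and `SketchTable` are hypothetical, and the SHA-256 based hashing is an assumption chosen only to keep the example self-contained. The real Table relies on preshed.bloom.BloomFilter and spaCy's own string hashing.

```python
import hashlib
from collections import OrderedDict


def hash_key(key):
    """Map a str key to an int; ints pass through unchanged (illustrative hash)."""
    if isinstance(key, int):
        return key
    digest = hashlib.sha256(key.encode("utf8")).digest()
    return int.from_bytes(digest[:8], "little")


class SketchBloom:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size=1024, n_hashes=3):
        self.size = size
        self.n_hashes = n_hashes
        self.bits = bytearray(size)

    def _positions(self, key):
        # Derive n_hashes bit positions by salting the key with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf8")).digest()
            yield int.from_bytes(digest[:8], "little") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class SketchTable(OrderedDict):
    """Dict keyed by hashed ints, with a Bloom filter checked before the dict."""

    def __init__(self, data=None):
        super().__init__()
        self.bloom = SketchBloom()
        for key, value in (data or {}).items():
            self[key] = value

    def __setitem__(self, key, value):
        key = hash_key(key)
        self.bloom.add(key)
        super().__setitem__(key, value)

    def __getitem__(self, key):
        return super().__getitem__(hash_key(key))

    def __contains__(self, key):
        key = hash_key(key)
        # A negative Bloom answer is definitive, so the dict lookup is skipped.
        if key not in self.bloom:
            return False
        return super().__contains__(key)


table = SketchTable({"foo": "bar", "baz": 100})
assert "foo" in table and table["foo"] == "bar"
assert "missing" not in table
```

Only `__setitem__`, `__getitem__` and `__contains__` are overridden in this sketch, so other dict methods such as `get` would still need the same key hashing to accept strings.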
Table.__init__
Initialize a new table.
Example
from spacy.lookups import Table
data = {"foo": "bar", "baz": 100}
table = Table(name="some_table", data=data)
assert "foo" in table
assert table["foo"] == "bar"
| Name | Type | Description |
|---|---|---|
| name | str | Optional table name for reference. |
| data | dict | Optional initial data for the table. |
| RETURNS | Table | The newly constructed object. |
Table.from_dict
Initialize a new table from a dict.
Example
from spacy.lookups import Table
data = {"foo": "bar", "baz": 100}
table = Table.from_dict(data, name="some_table")
| Name | Type | Description |
|---|---|---|
| data | dict | The dictionary. |
| name | str | Optional table name for reference. |
| RETURNS | Table | The newly constructed object. |
Table.set
Set a new key / value pair. String keys will be hashed. Same as
table[key] = value.
Example
from spacy.lookups import Table
table = Table()
table.set("foo", "bar")
assert table["foo"] == "bar"
| Name | Type | Description |
|---|---|---|
| key | str / int | The key. |
| value | - | The value. |
Table.to_bytes
Serialize the table to a bytestring.
Example
table_bytes = table.to_bytes()
| Name | Type | Description |
|---|---|---|
| RETURNS | bytes | The serialized table. |
Table.from_bytes
Load a table from a bytestring.
Example
table_bytes = table.to_bytes()
table = Table()
table.from_bytes(table_bytes)
| Name | Type | Description |
|---|---|---|
| bytes_data | bytes | The data to load. |
| RETURNS | Table | The loaded table. |
Attributes
| Name | Type | Description |
|---|---|---|
| name | str | Table name. |
| default_size | int | Default size of bloom filters if no data is provided. |
| bloom | preshed.bloom.BloomFilter | The bloom filters. |