title | teaser | tag | source | new |
---|---|---|---|---|
Lookups | A container for large lookup tables and dictionaries | class | spacy/lookups.py | 2.2 |
This class allows convenient access to large lookup tables and dictionaries,
e.g. lemmatization data or tokenizer exception lists. Its tables use Bloom
filters to speed up lookups of missing keys. Lookups are available via the
Vocab as vocab.lookups, so they can be accessed before the pipeline components
are applied (e.g. in the tokenizer and lemmatizer), as well as within the
pipeline components via doc.vocab.lookups.
Lookups.__init__
Create a Lookups object.
Example
from spacy.lookups import Lookups
lookups = Lookups()
Name | Type | Description |
---|---|---|
RETURNS | Lookups | The newly constructed object. |
Lookups.__len__
Get the current number of tables in the lookups.
Example
from spacy.lookups import Lookups
lookups = Lookups()
assert len(lookups) == 0
Name | Type | Description |
---|---|---|
RETURNS | int | The number of tables in the lookups. |
Lookups.__contains__
Check if the lookups contain a table of a given name. Delegates to
Lookups.has_table.
Example
from spacy.lookups import Lookups
lookups = Lookups()
lookups.add_table("some_table")
assert "some_table" in lookups
Name | Type | Description |
---|---|---|
name | unicode | Name of the table. |
RETURNS | bool | Whether a table of that name is in the lookups. |
Lookups.tables
Get the names of all tables in the lookups.
Example
from spacy.lookups import Lookups
lookups = Lookups()
lookups.add_table("some_table")
assert lookups.tables == ["some_table"]
Name | Type | Description |
---|---|---|
RETURNS | list | Names of the tables in the lookups. |
Lookups.add_table
Add a new table with optional data to the lookups. Raises an error if the table exists.
Example
from spacy.lookups import Lookups
lookups = Lookups()
lookups.add_table("some_table", {"foo": "bar"})
Name | Type | Description |
---|---|---|
name | unicode | Unique name of the table. |
data | dict | Optional data to add to the table. |
RETURNS | Table | The newly added table. |
Lookups.get_table
Get a table from the lookups. Raises an error if the table doesn't exist.
Example
from spacy.lookups import Lookups
lookups = Lookups()
lookups.add_table("some_table", {"foo": "bar"})
table = lookups.get_table("some_table")
assert table["foo"] == "bar"
Name | Type | Description |
---|---|---|
name | unicode | Name of the table. |
RETURNS | Table | The table. |
Lookups.remove_table
Remove a table from the lookups. Raises an error if the table doesn't exist.
Example
from spacy.lookups import Lookups
lookups = Lookups()
lookups.add_table("some_table")
removed_table = lookups.remove_table("some_table")
assert "some_table" not in lookups
Name | Type | Description |
---|---|---|
name | unicode | Name of the table to remove. |
RETURNS | Table | The removed table. |
Lookups.has_table
Check if the lookups contain a table of a given name. Equivalent to
Lookups.__contains__.
Example
from spacy.lookups import Lookups
lookups = Lookups()
lookups.add_table("some_table")
assert lookups.has_table("some_table")
Name | Type | Description |
---|---|---|
name | unicode | Name of the table. |
RETURNS | bool | Whether a table of that name is in the lookups. |
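Because add_table raises an error for an existing table and get_table raises one for a missing table, callers often want a "get or create" step. The following is a minimal sketch of such a helper, not part of spaCy: get_or_add_table is a hypothetical name, and StubLookups is a stand-in written only so the snippet runs without spaCy installed. The helper itself relies only on the has_table, get_table and add_table methods documented here.

```python
def get_or_add_table(lookups, name, data=None):
    """Return the table called `name`, creating it first if it is missing.

    Hypothetical helper: works with any object that exposes has_table,
    get_table and add_table as described in this reference.
    """
    if lookups.has_table(name):
        return lookups.get_table(name)
    # Pass an empty dict rather than None so no assumption is made about
    # how the real add_table treats a None value.
    return lookups.add_table(name, data if data is not None else {})


class StubLookups:
    """Stand-in for spacy.lookups.Lookups (assumption, for illustration)."""

    def __init__(self):
        self._tables = {}

    def has_table(self, name):
        return name in self._tables

    def add_table(self, name, data=None):
        if name in self._tables:
            raise ValueError("Table already exists: " + name)
        self._tables[name] = dict(data or {})
        return self._tables[name]

    def get_table(self, name):
        if name not in self._tables:
            raise KeyError("No such table: " + name)
        return self._tables[name]


lookups = StubLookups()
table = get_or_add_table(lookups, "some_table", {"foo": "bar"})
same_table = get_or_add_table(lookups, "some_table")  # no error on second call
```

With a real Lookups object, the same helper call should apply unchanged, since it only uses the three documented methods.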
Lookups.to_bytes
Serialize the lookups to a bytestring.
Example
lookup_bytes = lookups.to_bytes()
Name | Type | Description |
---|---|---|
RETURNS | bytes | The serialized lookups. |
Lookups.from_bytes
Load the lookups from a bytestring.
Example
lookup_bytes = lookups.to_bytes()
lookups = Lookups()
lookups.from_bytes(lookup_bytes)
Name | Type | Description |
---|---|---|
bytes_data | bytes | The data to load from. |
RETURNS | Lookups | The loaded lookups. |
Lookups.to_disk
Save the lookups to a directory as lookups.bin. Expects a path to a directory,
which will be created if it doesn't exist.
Example
lookups.to_disk("/path/to/lookups")
Name | Type | Description |
---|---|---|
path | unicode / Path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. |
Lookups.from_disk
Load lookups from a directory containing a lookups.bin. Will skip loading if
the file doesn't exist.
Example
from spacy.lookups import Lookups
lookups = Lookups()
lookups.from_disk("/path/to/lookups")
Name | Type | Description |
---|---|---|
path | unicode / Path | A path to a directory. Paths may be either strings or Path-like objects. |
RETURNS | Lookups | The loaded lookups. |
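Since from_disk skips loading when lookups.bin is missing, an empty Lookups object can come back without any error. If that distinction matters, the file can be checked up front. This is a small sketch using only the standard library; lookups_file_exists is a hypothetical helper name, and the only detail taken from this reference is the lookups.bin file name.

```python
import tempfile
from pathlib import Path


def lookups_file_exists(path):
    """Return True if the directory `path` contains a lookups.bin file."""
    return (Path(path) / "lookups.bin").is_file()


# Demonstration with a temporary directory instead of a real model path.
with tempfile.TemporaryDirectory() as tmp_dir:
    before = lookups_file_exists(tmp_dir)          # nothing saved yet
    (Path(tmp_dir) / "lookups.bin").write_bytes(b"")
    after = lookups_file_exists(tmp_dir)           # file is now present
```

A caller could run this check before from_disk and decide whether a missing file should be treated as an error.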
Table
A table in the lookups. Subclass of OrderedDict that implements a slightly
more consistent and unified API and includes a Bloom filter to speed up missed
lookups. Supports all other methods and attributes of OrderedDict / dict, and
the customized methods listed here. Methods that get or set keys accept both
integers and strings (which will be hashed before being added to the table).
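To make the two ideas above concrete (string keys hashed to integers, and a Bloom filter that answers "definitely not present" before the dictionary is consulted), here is a simplified pure-Python sketch. It is not spaCy's implementation: SketchTable, TinyBloom and hash_key are hypothetical names, and the SHA-256 based hashing and filter sizes are assumptions chosen for readability. The real Table uses preshed.bloom.BloomFilter, as listed under Attributes.

```python
import hashlib
from collections import OrderedDict


def hash_key(key):
    """Map a string key to a 64-bit integer; integer keys pass through."""
    if isinstance(key, int):
        return key
    digest = hashlib.sha256(key.encode("utf8")).digest()
    return int.from_bytes(digest[:8], "little")


class TinyBloom:
    """A deliberately small Bloom filter over integer keys (illustration)."""

    def __init__(self, size=1024, n_hashes=3):
        self.size = size
        self.n_hashes = n_hashes
        self.bits = bytearray(size)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive n_hashes positions by salting the key with a seed.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf8")).digest()
            yield int.from_bytes(digest[:8], "little") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


class SketchTable(OrderedDict):
    """Dict with hashed string keys and a Bloom filter for fast misses."""

    def __init__(self, data=None):
        super().__init__()
        self.bloom = TinyBloom()
        for key, value in (data or {}).items():
            self[key] = value

    def __setitem__(self, key, value):
        key = hash_key(key)
        self.bloom.add(key)
        super().__setitem__(key, value)

    def __getitem__(self, key):
        return super().__getitem__(hash_key(key))

    def __contains__(self, key):
        key = hash_key(key)
        if key not in self.bloom:
            return False  # the filter rules the key out without a dict lookup
        return super().__contains__(key)

    def get(self, key, default=None):
        return self[key] if key in self else default


table = SketchTable({"foo": "bar", "baz": 100})
```

A Bloom filter can report false positives but never false negatives, which is why the sketch still checks the underlying dictionary whenever the filter says a key might be present.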
Table.__init__
Initialize a new table.
Example
from spacy.lookups import Table
data = {"foo": "bar", "baz": 100}
table = Table(name="some_table", data=data)
assert "foo" in table
assert table["foo"] == "bar"
Name | Type | Description |
---|---|---|
name | unicode | Optional table name for reference. |
data | dict | Optional data to initialize the table with. |
RETURNS | Table | The newly constructed object. |
Table.from_dict
Initialize a new table from a dict.
Example
from spacy.lookups import Table
data = {"foo": "bar", "baz": 100}
table = Table.from_dict(data, name="some_table")
Name | Type | Description |
---|---|---|
data | dict | The dictionary. |
name | unicode | Optional table name for reference. |
RETURNS | Table | The newly constructed object. |
Table.set
Set a new key / value pair. String keys will be hashed. Same as
table[key] = value.
Example
from spacy.lookups import Table
table = Table()
table.set("foo", "bar")
assert table["foo"] == "bar"
Name | Type | Description |
---|---|---|
key | unicode / int | The key. |
value | - | The value. |
Table.to_bytes
Serialize the table to a bytestring.
Example
table_bytes = table.to_bytes()
Name | Type | Description |
---|---|---|
RETURNS | bytes | The serialized table. |
Table.from_bytes
Load a table from a bytestring.
Example
table_bytes = table.to_bytes()
table = Table()
table.from_bytes(table_bytes)
Name | Type | Description |
---|---|---|
bytes_data | bytes | The data to load. |
RETURNS | Table | The loaded table. |
Attributes
Name | Type | Description |
---|---|---|
name | unicode | Table name. |
default_size | int | Default size of bloom filters if no data is provided. |
bloom | preshed.bloom.BloomFilter | The bloom filters. |