---
title: Lookups
teaser: A container for large lookup tables and dictionaries
tag: class
source: spacy/lookups.py
new: 2.2
---

This class allows convenient access to large lookup tables and dictionaries,
e.g. lemmatization data or tokenizer exception lists. Tables use Bloom filters
to speed up lookups of missing keys. Lookups are available via the
[`Vocab`](/api/vocab) as `vocab.lookups`, so they can be accessed before the
pipeline components are applied (e.g. in the tokenizer and lemmatizer), as well
as within the pipeline components via `doc.vocab.lookups`.

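As a rough illustration of the access path described above, the following
sketch adds a table through `vocab.lookups` and reads it back through
`doc.vocab.lookups`. The table name `"demo_table"` and its contents are made up
for the example, and the sketch assumes that a bare `Vocab()` creates its own
`Lookups` container.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# A bare Vocab; assumed here to come with its own Lookups container.
vocab = Vocab()

# "demo_table" and its contents are hypothetical example data.
vocab.lookups.add_table("demo_table", {"going": "go"})

# Code that only has a Doc can reach the same container via doc.vocab.
doc = Doc(vocab, words=["going"])
table = doc.vocab.lookups.get_table("demo_table")

assert doc.vocab.lookups is vocab.lookups
assert table["going"] == "go"
```

Because the container hangs off the shared vocab, a table added once is visible
to every component that holds a reference to that vocab.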
## Lookups.\_\_init\_\_ {#init tag="method"}

Create a `Lookups` object.

> #### Example
>
> ```python
> from spacy.lookups import Lookups
> lookups = Lookups()
> ```

| Name        | Type      | Description                   |
| ----------- | --------- | ----------------------------- |
| **RETURNS** | `Lookups` | The newly constructed object. |

## Lookups.\_\_len\_\_ {#len tag="method"}

Get the current number of tables in the lookups.

> #### Example
>
> ```python
> lookups = Lookups()
> assert len(lookups) == 0
> ```

| Name        | Type | Description                          |
| ----------- | ---- | ------------------------------------ |
| **RETURNS** | int  | The number of tables in the lookups. |

## Lookups.\_\_contains\_\_ {#contains tag="method"}

Check if the lookups contain a table of a given name. Delegates to
[`Lookups.has_table`](/api/lookups#has_table).

> #### Example
>
> ```python
> lookups = Lookups()
> lookups.add_table("some_table")
> assert "some_table" in lookups
> ```

| Name        | Type    | Description                                     |
| ----------- | ------- | ----------------------------------------------- |
| `name`      | unicode | Name of the table.                              |
| **RETURNS** | bool    | Whether a table of that name is in the lookups. |

## Lookups.tables {#tables tag="property"}

Get the names of all tables in the lookups.

> #### Example
>
> ```python
> lookups = Lookups()
> lookups.add_table("some_table")
> assert lookups.tables == ["some_table"]
> ```

| Name        | Type | Description                         |
| ----------- | ---- | ----------------------------------- |
| **RETURNS** | list | Names of the tables in the lookups. |

## Lookups.add_table {#add_table tag="method"}

Add a new table with optional data to the lookups. Raises an error if a table
of that name already exists.

> #### Example
>
> ```python
> lookups = Lookups()
> lookups.add_table("some_table", {"foo": "bar"})
> ```

| Name        | Type                          | Description                        |
| ----------- | ----------------------------- | ---------------------------------- |
| `name`      | unicode                       | Unique name of the table.          |
| `data`      | dict                          | Optional data to add to the table. |
| **RETURNS** | [`Table`](/api/lookups#table) | The newly added table.             |

## Lookups.get_table {#get_table tag="method"}

Get a table from the lookups. Raises an error if the table doesn't exist.

> #### Example
>
> ```python
> lookups = Lookups()
> lookups.add_table("some_table", {"foo": "bar"})
> table = lookups.get_table("some_table")
> assert table["foo"] == "bar"
> ```

| Name        | Type                          | Description        |
| ----------- | ----------------------------- | ------------------ |
| `name`      | unicode                       | Name of the table. |
| **RETURNS** | [`Table`](/api/lookups#table) | The table.         |

## Lookups.remove_table {#remove_table tag="method"}

Remove a table from the lookups. Raises an error if the table doesn't exist.

> #### Example
>
> ```python
> lookups = Lookups()
> lookups.add_table("some_table")
> removed_table = lookups.remove_table("some_table")
> assert "some_table" not in lookups
> ```

| Name        | Type                          | Description                  |
| ----------- | ----------------------------- | ---------------------------- |
| `name`      | unicode                       | Name of the table to remove. |
| **RETURNS** | [`Table`](/api/lookups#table) | The removed table.           |

## Lookups.has_table {#has_table tag="method"}

Check if the lookups contain a table of a given name. Equivalent to
[`Lookups.__contains__`](/api/lookups#contains).

> #### Example
>
> ```python
> lookups = Lookups()
> lookups.add_table("some_table")
> assert lookups.has_table("some_table")
> ```

| Name        | Type    | Description                                     |
| ----------- | ------- | ----------------------------------------------- |
| `name`      | unicode | Name of the table.                              |
| **RETURNS** | bool    | Whether a table of that name is in the lookups. |

## Lookups.to_bytes {#to_bytes tag="method"}

Serialize the lookups to a bytestring.

> #### Example
>
> ```python
> lookup_bytes = lookups.to_bytes()
> ```

| Name        | Type  | Description             |
| ----------- | ----- | ----------------------- |
| **RETURNS** | bytes | The serialized lookups. |

## Lookups.from_bytes {#from_bytes tag="method"}

Load the lookups from a bytestring.

> #### Example
>
> ```python
> lookup_bytes = lookups.to_bytes()
> lookups = Lookups()
> lookups.from_bytes(lookup_bytes)
> ```

| Name         | Type      | Description            |
| ------------ | --------- | ---------------------- |
| `bytes_data` | bytes     | The data to load from. |
| **RETURNS**  | `Lookups` | The loaded lookups.    |

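The two serialization methods are meant to be used as a pair. The sketch below
shows one possible round trip: a container with a single table is serialized
and then restored into a fresh `Lookups` object. The table name, the data and
the variable name `restored` are example values only.

```python
from spacy.lookups import Lookups

# Build a container with one table; name and data are example values.
lookups = Lookups()
lookups.add_table("some_table", {"foo": "bar"})

# Serialize, then restore into a fresh, empty container.
lookup_bytes = lookups.to_bytes()
restored = Lookups()
restored.from_bytes(lookup_bytes)

# The restored container holds the same table and entries.
assert restored.has_table("some_table")
assert restored.get_table("some_table")["foo"] == "bar"
```

Restoring into a new object rather than the original makes it easy to check
that nothing was lost in transit.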
## Lookups.to_disk {#to_disk tag="method"}

Save the lookups to a directory as `lookups.bin`. Expects a path to a directory,
which will be created if it doesn't exist.

> #### Example
>
> ```python
> lookups.to_disk("/path/to/lookups")
> ```

| Name   | Type             | Description                                                                                                           |
| ------ | ---------------- | --------------------------------------------------------------------------------------------------------------------- |
| `path` | unicode / `Path` | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or `Path`-like objects. |

## Lookups.from_disk {#from_disk tag="method"}

Load lookups from a directory containing a `lookups.bin` file. Loading is
skipped if the file doesn't exist.

> #### Example
>
> ```python
> from spacy.lookups import Lookups
> lookups = Lookups()
> lookups.from_disk("/path/to/lookups")
> ```

| Name        | Type             | Description                                                                |
| ----------- | ---------------- | -------------------------------------------------------------------------- |
| `path`      | unicode / `Path` | A path to a directory. Paths may be either strings or `Path`-like objects. |
| **RETURNS** | `Lookups`        | The loaded lookups.                                                        |

## Table {#table tag="class, ordereddict"}

A table in the lookups. Subclass of `OrderedDict` that implements a slightly
more consistent and unified API and includes a Bloom filter to speed up missed
lookups. Supports **all other methods and attributes** of `OrderedDict` /
`dict`, and the customized methods listed here. Methods that get or set keys
accept both integers and strings (which will be hashed before being added to the
table).

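To make the idea of hashed keys plus a Bloom filter more concrete, here is a
small standard-library sketch. It is not spaCy's implementation: `string_id`,
`TinyBloom` and `SketchTable` are hypothetical names, the SHA-256 based hashing
is a stand-in for spaCy's own string hashing, and `TinyBloom` is a toy
replacement for `preshed.bloom.BloomFilter`.

```python
import hashlib
from collections import OrderedDict


def string_id(key):
    """Map a string to a stable 64-bit integer; pass integers through."""
    if isinstance(key, int):
        return key
    digest = hashlib.sha256(key.encode("utf8")).digest()
    return int.from_bytes(digest[:8], "little")


class TinyBloom:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size=1024, hash_funcs=3):
        self.size = size
        self.hash_funcs = hash_funcs
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive several bit positions from one integer key.
        for i in range(self.hash_funcs):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf8")).digest()
            yield int.from_bytes(digest[:8], "little") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SketchTable(OrderedDict):
    """Toy table: keys are stored as integer IDs, guarded by a Bloom filter."""

    def __init__(self, name=None, data=None):
        super().__init__()
        self.name = name
        self.bloom = TinyBloom()
        for key, value in (data or {}).items():
            self[key] = value

    def __setitem__(self, key, value):
        key = string_id(key)
        self.bloom.add(key)
        super().__setitem__(key, value)

    def __getitem__(self, key):
        return super().__getitem__(string_id(key))

    def __contains__(self, key):
        key = string_id(key)
        # Ask the filter first; confirm with the dict to rule out false positives.
        return key in self.bloom and super().__contains__(key)

    def get(self, key, default=None):
        key = string_id(key)
        return super().get(key, default) if key in self.bloom else default


table = SketchTable(name="some_table", data={"foo": "bar"})
assert "foo" in table
assert table[string_id("foo")] == "bar"  # string and integer ID are interchangeable
assert "missing" not in table
```

In this toy version the filter costs more than the dictionary lookup it guards,
so it only illustrates the control flow. The benefit in a real table depends on
the filter being cheaper than the lookup it avoids.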
### Table.\_\_init\_\_ {#table.init tag="method"}

Initialize a new table.

> #### Example
>
> ```python
> from spacy.lookups import Table
> data = {"foo": "bar", "baz": 100}
> table = Table(name="some_table", data=data)
> assert "foo" in table
> assert table["foo"] == "bar"
> ```

| Name        | Type    | Description                        |
| ----------- | ------- | ---------------------------------- |
| `name`      | unicode | Optional table name for reference. |
| `data`      | dict    | Optional data to add to the table. |
| **RETURNS** | `Table` | The newly constructed object.      |

### Table.from_dict {#table.from_dict tag="classmethod"}

Initialize a new table from a dict.

> #### Example
>
> ```python
> from spacy.lookups import Table
> data = {"foo": "bar", "baz": 100}
> table = Table.from_dict(data, name="some_table")
> ```

| Name        | Type    | Description                        |
| ----------- | ------- | ---------------------------------- |
| `data`      | dict    | The dictionary.                    |
| `name`      | unicode | Optional table name for reference. |
| **RETURNS** | `Table` | The newly constructed object.      |

### Table.set {#table.set tag="method"}

Set a new key / value pair. String keys will be hashed. Same as
`table[key] = value`.

> #### Example
>
> ```python
> from spacy.lookups import Table
> table = Table()
> table.set("foo", "bar")
> assert table["foo"] == "bar"
> ```

| Name    | Type          | Description |
| ------- | ------------- | ----------- |
| `key`   | unicode / int | The key.    |
| `value` | -             | The value.  |

### Table.to_bytes {#table.to_bytes tag="method"}

Serialize the table to a bytestring.

> #### Example
>
> ```python
> table_bytes = table.to_bytes()
> ```

| Name        | Type  | Description           |
| ----------- | ----- | --------------------- |
| **RETURNS** | bytes | The serialized table. |

### Table.from_bytes {#table.from_bytes tag="method"}

Load a table from a bytestring.

> #### Example
>
> ```python
> table_bytes = table.to_bytes()
> table = Table()
> table.from_bytes(table_bytes)
> ```

| Name         | Type    | Description       |
| ------------ | ------- | ----------------- |
| `bytes_data` | bytes   | The data to load. |
| **RETURNS**  | `Table` | The loaded table. |

### Attributes {#table-attributes}

| Name           | Type                        | Description                                           |
| -------------- | --------------------------- | ----------------------------------------------------- |
| `name`         | unicode                     | Table name.                                           |
| `default_size` | int                         | Default size of Bloom filters if no data is provided. |
| `bloom`        | `preshed.bloom.BloomFilter` | The Bloom filters.                                    |