Merge pull request #7207 from adrianeboyd/docs/get-noun-chunks [ci skip]

Extend docs related to Vocab.get_noun_chunks
This commit is contained in:
Ines Montani 2021-02-27 11:51:08 +11:00 committed by GitHub
commit 408b94887a
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 26 additions and 21 deletions

View File

@ -61,6 +61,8 @@ cdef class Vocab:
lookups (Lookups): Container for large lookup tables and dictionaries. lookups (Lookups): Container for large lookup tables and dictionaries.
oov_prob (float): Default OOV probability. oov_prob (float): Default OOV probability.
vectors_name (unicode): Optional name to identify the vectors table. vectors_name (unicode): Optional name to identify the vectors table.
get_noun_chunks (Optional[Callable[[Union[Doc, Span], Iterator[Span]]]]):
A function that yields base noun phrases used for Doc.noun_chunks.
""" """
lex_attr_getters = lex_attr_getters if lex_attr_getters is not None else {} lex_attr_getters = lex_attr_getters if lex_attr_getters is not None else {}
if lookups in (None, True, False): if lookups in (None, True, False):

View File

@ -616,8 +616,10 @@ phrase, or "NP chunk", is a noun phrase that does not permit other NPs to be
nested within it so no NP-level coordination, no prepositional phrases, and no nested within it so no NP-level coordination, no prepositional phrases, and no
relative clauses. relative clauses.
If the `noun_chunk` [syntax iterator](/usage/adding-languages#language-data) has To customize the noun chunk iterator in a loaded pipeline, modify
not been implemeted for the given language, a `NotImplementedError` is raised. [`nlp.vocab.get_noun_chunks`](/api/vocab#attributes). If the `noun_chunk`
[syntax iterator](/usage/adding-languages#language-data) has not been
implemented for the given language, a `NotImplementedError` is raised.
> #### Example > #### Example
> >

View File

@ -21,14 +21,14 @@ Create the vocabulary.
> vocab = Vocab(strings=["hello", "world"]) > vocab = Vocab(strings=["hello", "world"])
> ``` > ```
| Name | Description | | Name | Description |
| ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | | ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `lex_attr_getters` | A dictionary mapping attribute IDs to functions to compute them. Defaults to `None`. ~~Optional[Dict[str, Callable[[str], Any]]]~~ | | `lex_attr_getters` | A dictionary mapping attribute IDs to functions to compute them. Defaults to `None`. ~~Optional[Dict[str, Callable[[str], Any]]]~~ |
| `strings` | A [`StringStore`](/api/stringstore) that maps strings to hash values, and vice versa, or a list of strings. ~~Union[List[str], StringStore]~~ | | `strings` | A [`StringStore`](/api/stringstore) that maps strings to hash values, and vice versa, or a list of strings. ~~Union[List[str], StringStore]~~ |
| `lookups` | A [`Lookups`](/api/lookups) that stores the `lexeme_norm` and other large lookup tables. Defaults to `None`. ~~Optional[Lookups]~~ | | `lookups` | A [`Lookups`](/api/lookups) that stores the `lexeme_norm` and other large lookup tables. Defaults to `None`. ~~Optional[Lookups]~~ |
| `oov_prob` | The default OOV probability. Defaults to `-20.0`. ~~float~~ | | `oov_prob` | The default OOV probability. Defaults to `-20.0`. ~~float~~ |
| `vectors_name` <Tag variant="new">2.2</Tag> | A name to identify the vectors table. ~~str~~ | | `vectors_name` <Tag variant="new">2.2</Tag> | A name to identify the vectors table. ~~str~~ |
| `writing_system` | A dictionary describing the language's writing system. Typically provided by [`Language.Defaults`](/api/language#defaults). ~~Dict[str, Any]~~ | | `writing_system` | A dictionary describing the language's writing system. Typically provided by [`Language.Defaults`](/api/language#defaults). ~~Dict[str, Any]~~ |
| `get_noun_chunks` | A function that yields base noun phrases used for [`Doc.noun_chunks`](/ap/doc#noun_chunks). ~~Optional[Callable[[Union[Doc, Span], Iterator[Span]]]]~~ | | `get_noun_chunks` | A function that yields base noun phrases used for [`Doc.noun_chunks`](/ap/doc#noun_chunks). ~~Optional[Callable[[Union[Doc, Span], Iterator[Span]]]]~~ |
## Vocab.\_\_len\_\_ {#len tag="method"} ## Vocab.\_\_len\_\_ {#len tag="method"}
@ -182,14 +182,14 @@ subword features by average over n-grams of `orth` (introduced in spaCy `v2.1`).
| Name | Description | | Name | Description |
| ----------------------------------- | ---------------------------------------------------------------------------------------------------------------------- | | ----------------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| `orth` | The hash value of a word, or its unicode string. ~~Union[int, str]~~ | | `orth` | The hash value of a word, or its unicode string. ~~Union[int, str]~~ |
| `minn` <Tag variant="new">2.1</Tag> | Minimum n-gram length used for FastText's n-gram computation. Defaults to the length of `orth`. ~~int~~ | | `minn` <Tag variant="new">2.1</Tag> | Minimum n-gram length used for FastText's n-gram computation. Defaults to the length of `orth`. ~~int~~ |
| `maxn` <Tag variant="new">2.1</Tag> | Maximum n-gram length used for FastText's n-gram computation. Defaults to the length of `orth`. ~~int~~ | | `maxn` <Tag variant="new">2.1</Tag> | Maximum n-gram length used for FastText's n-gram computation. Defaults to the length of `orth`. ~~int~~ |
| **RETURNS** | A word vector. Size and shape are determined by the `Vocab.vectors` instance. ~~numpy.ndarray[ndim=1, dtype=float32]~~ | | **RETURNS** | A word vector. Size and shape are determined by the `Vocab.vectors` instance. ~~numpy.ndarray[ndim=1, dtype=float32]~~ |
## Vocab.set_vector {#set_vector tag="method" new="2"} ## Vocab.set_vector {#set_vector tag="method" new="2"}
Set a vector for a word in the vocabulary. Words can be referenced by string Set a vector for a word in the vocabulary. Words can be referenced by string or
or hash value. hash value.
> #### Example > #### Example
> >
@ -300,13 +300,14 @@ Load state from a binary string.
> assert type(PERSON) == int > assert type(PERSON) == int
> ``` > ```
| Name | Description | | Name | Description |
| --------------------------------------------- | ------------------------------------------------------------------------------- | | ---------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `strings` | A table managing the string-to-int mapping. ~~StringStore~~ | | `strings` | A table managing the string-to-int mapping. ~~StringStore~~ |
| `vectors` <Tag variant="new">2</Tag> | A table associating word IDs to word vectors. ~~Vectors~~ | | `vectors` <Tag variant="new">2</Tag> | A table associating word IDs to word vectors. ~~Vectors~~ |
| `vectors_length` | Number of dimensions for each word vector. ~~int~~ | | `vectors_length` | Number of dimensions for each word vector. ~~int~~ |
| `lookups` | The available lookup tables in this vocab. ~~Lookups~~ | | `lookups` | The available lookup tables in this vocab. ~~Lookups~~ |
| `writing_system` <Tag variant="new">2.1</Tag> | A dict with information about the language's writing system. ~~Dict[str, Any]~~ | | `writing_system` <Tag variant="new">2.1</Tag> | A dict with information about the language's writing system. ~~Dict[str, Any]~~ |
| `get_noun_chunks` <Tag variant="new">3.0</Tag> | A function that yields base noun phrases used for [`Doc.noun_chunks`](/ap/doc#noun_chunks). ~~Optional[Callable[[Union[Doc, Span], Iterator[Span]]]]~~ |
## Serialization fields {#serialization-fields} ## Serialization fields {#serialization-fields}