Merge pull request #7207 from adrianeboyd/docs/get-noun-chunks [ci skip]

Extend docs related to Vocab.get_noun_chunks
2025-10-25 21:21:10 +03:00 · 2021-02-27 11:51:08 +11:00 · 2021-02-27 11:51:08 +11:00 · 408b94887a
commit 408b94887a
parent dc46fa078f 6a37f343d5
3 changed files with 26 additions and 21 deletions
--- a/spacy/vocab.pyx
+++ b/spacy/vocab.pyx
@@ -61,6 +61,8 @@ cdef class Vocab:
        lookups (Lookups): Container for large lookup tables and dictionaries.
        oov_prob (float): Default OOV probability.
        vectors_name (unicode): Optional name to identify the vectors table.
+        get_noun_chunks (Optional[Callable[[Union[Doc, Span]], Iterator[Tuple[int, int, int]]]]):
+            A function that yields base noun phrases used for Doc.noun_chunks.
        """
        lex_attr_getters = lex_attr_getters if lex_attr_getters is not None else {}
        if lookups in (None, True, False):
--- a/website/docs/api/doc.md
+++ b/website/docs/api/doc.md
@@ -616,8 +616,10 @@ phrase, or "NP chunk", is a noun phrase that does not permit other NPs to be
 nested within it – so no NP-level coordination, no prepositional phrases, and no
 relative clauses.

-If the `noun_chunk` [syntax iterator](/usage/adding-languages#language-data) has
-not been implemeted for the given language, a `NotImplementedError` is raised.
+To customize the noun chunk iterator in a loaded pipeline, modify
+[`nlp.vocab.get_noun_chunks`](/api/vocab#attributes). If the `noun_chunks`
+[syntax iterator](/usage/adding-languages#language-data) has not been
+implemented for the given language, a `NotImplementedError` is raised.

 > #### Example
 >
--- a/website/docs/api/vocab.md
+++ b/website/docs/api/vocab.md
@@ -21,14 +21,14 @@ Create the vocabulary.
 > vocab = Vocab(strings=["hello", "world"])
 > ```

-| Name                                        | Description                                                                                                                                             |
-| ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `lex_attr_getters`                          | A dictionary mapping attribute IDs to functions to compute them. Defaults to `None`. ~~Optional[Dict[str, Callable[[str], Any]]]~~                      |
-| `strings`                                   | A [`StringStore`](/api/stringstore) that maps strings to hash values, and vice versa, or a list of strings. ~~Union[List[str], StringStore]~~           |
-| `lookups`                                   | A [`Lookups`](/api/lookups) that stores the `lexeme_norm` and other large lookup tables. Defaults to `None`. ~~Optional[Lookups]~~                      |
-| `oov_prob`                                  | The default OOV probability. Defaults to `-20.0`. ~~float~~                                                                                             |
-| `vectors_name` <Tag variant="new">2.2</Tag> | A name to identify the vectors table. ~~str~~                                                                                                           |
-| `writing_system`                            | A dictionary describing the language's writing system. Typically provided by [`Language.Defaults`](/api/language#defaults). ~~Dict[str, Any]~~          |
+| Name                                        | Description                                                                                                                                            |
+| ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `lex_attr_getters`                          | A dictionary mapping attribute IDs to functions to compute them. Defaults to `None`. ~~Optional[Dict[str, Callable[[str], Any]]]~~                     |
+| `strings`                                   | A [`StringStore`](/api/stringstore) that maps strings to hash values, and vice versa, or a list of strings. ~~Union[List[str], StringStore]~~          |
+| `lookups`                                   | A [`Lookups`](/api/lookups) that stores the `lexeme_norm` and other large lookup tables. Defaults to `None`. ~~Optional[Lookups]~~                     |
+| `oov_prob`                                  | The default OOV probability. Defaults to `-20.0`. ~~float~~                                                                                            |
+| `vectors_name` <Tag variant="new">2.2</Tag> | A name to identify the vectors table. ~~str~~                                                                                                          |
+| `writing_system`                            | A dictionary describing the language's writing system. Typically provided by [`Language.Defaults`](/api/language#defaults). ~~Dict[str, Any]~~         |
 | `get_noun_chunks`                           | A function that yields base noun phrases used for [`Doc.noun_chunks`](/api/doc#noun_chunks). ~~Optional[Callable[[Union[Doc, Span]], Iterator[Tuple[int, int, int]]]]~~ |

 ## Vocab.\_\_len\_\_ {#len tag="method"}
@@ -182,14 +182,14 @@ subword features by average over n-grams of `orth` (introduced in spaCy `v2.1`).
 | Name                                | Description                                                                                                            |
 | ----------------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
 | `orth`                              | The hash value of a word, or its unicode string. ~~Union[int, str]~~                                                   |
-| `minn` <Tag variant="new">2.1</Tag> | Minimum n-gram length used for FastText's n-gram computation. Defaults to the length of `orth`. ~~int~~                 |
-| `maxn` <Tag variant="new">2.1</Tag> | Maximum n-gram length used for FastText's n-gram computation. Defaults to the length of `orth`. ~~int~~                 |
+| `minn` <Tag variant="new">2.1</Tag> | Minimum n-gram length used for FastText's n-gram computation. Defaults to the length of `orth`. ~~int~~                |
+| `maxn` <Tag variant="new">2.1</Tag> | Maximum n-gram length used for FastText's n-gram computation. Defaults to the length of `orth`. ~~int~~                |
 | **RETURNS**                         | A word vector. Size and shape are determined by the `Vocab.vectors` instance. ~~numpy.ndarray[ndim=1, dtype=float32]~~ |

 ## Vocab.set_vector {#set_vector tag="method" new="2"}

-Set a vector for a word in the vocabulary. Words can be referenced by string
-or hash value.
+Set a vector for a word in the vocabulary. Words can be referenced by string or
+hash value.

 > #### Example
 >
@@ -300,13 +300,14 @@ Load state from a binary string.
 > assert type(PERSON) == int
 > ```

-| Name                                          | Description                                                                     |
-| --------------------------------------------- | ------------------------------------------------------------------------------- |
-| `strings`                                     | A table managing the string-to-int mapping. ~~StringStore~~                     |
-| `vectors` <Tag variant="new">2</Tag>          | A table associating word IDs to word vectors. ~~Vectors~~                       |
-| `vectors_length`                              | Number of dimensions for each word vector. ~~int~~                              |
-| `lookups`                                     | The available lookup tables in this vocab. ~~Lookups~~                          |
-| `writing_system` <Tag variant="new">2.1</Tag> | A dict with information about the language's writing system. ~~Dict[str, Any]~~ |
+| Name                                           | Description                                                                                                                                            |
+| ---------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `strings`                                      | A table managing the string-to-int mapping. ~~StringStore~~                                                                                            |
+| `vectors` <Tag variant="new">2</Tag>           | A table associating word IDs to word vectors. ~~Vectors~~                                                                                              |
+| `vectors_length`                               | Number of dimensions for each word vector. ~~int~~                                                                                                     |
+| `lookups`                                      | The available lookup tables in this vocab. ~~Lookups~~                                                                                                 |
+| `writing_system` <Tag variant="new">2.1</Tag>  | A dict with information about the language's writing system. ~~Dict[str, Any]~~                                                                        |
+| `get_noun_chunks` <Tag variant="new">3.0</Tag> | A function that yields base noun phrases used for [`Doc.noun_chunks`](/api/doc#noun_chunks). ~~Optional[Callable[[Union[Doc, Span]], Iterator[Tuple[int, int, int]]]]~~ |

 ## Serialization fields {#serialization-fields}
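
The `Doc.noun_chunks` docs changed in this PR say that a loaded pipeline's noun chunk iterator can be customized through `nlp.vocab.get_noun_chunks`. Below is a minimal sketch of such a replacement iterator. It assumes, as spaCy's built-in syntax iterators do, that the function receives a `Doc` or `Span` and yields `(start, end, label)` token-offset tuples rather than `Span` objects. The name `adjective_noun_chunks`, its simple part-of-speech rule (adjectives followed by nouns, ignoring the dependency parse), and the `FakeToken` stand-in are hypothetical illustrations, not part of spaCy. The function reads only `token.i` and `token.pos_`, so it can be exercised without a trained pipeline.

```python
from typing import Any, Iterable, Iterator, NamedTuple, Tuple

# Coarse-grained POS tags treated as the nouns of a chunk (hypothetical rule).
NOUN_POS = {"NOUN", "PROPN"}


class FakeToken(NamedTuple):
    """Hypothetical stand-in for spaCy's Token: only the two attributes read below."""

    i: int     # index of the token in its parent Doc
    pos_: str  # coarse-grained part-of-speech tag, e.g. "ADJ" or "NOUN"


def adjective_noun_chunks(doclike: Iterable[Any]) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, label) for each run of adjectives followed by nouns.

    `doclike` may be a Doc or a Span (any iterable of token-like objects).
    Offsets come from token.i, so they index into the parent Doc even when a
    Span is passed; `end` is exclusive. Adjectives not followed by a noun are
    dropped.
    """
    label = "NP"
    start = None  # Doc index where the current candidate chunk begins
    end = None    # exclusive end of the chunk; stays None until a noun is seen
    for token in doclike:
        if token.pos_ in NOUN_POS:
            if start is None:
                start = token.i
            end = token.i + 1
        elif token.pos_ == "ADJ":
            if end is not None:
                # An adjective after a noun closes that chunk and opens a new one.
                yield start, end, label
                end = None
                start = token.i
            elif start is None:
                start = token.i
        else:
            # Any other token ends the candidate; emit it only if it had a noun.
            if end is not None:
                yield start, end, label
            start, end = None, None
    if end is not None:
        yield start, end, label


if __name__ == "__main__":
    tokens = [FakeToken(0, "DET"), FakeToken(1, "ADJ"), FakeToken(2, "NOUN"),
              FakeToken(3, "VERB"), FakeToken(4, "PROPN"), FakeToken(5, "PROPN")]
    print(list(adjective_noun_chunks(tokens)))  # [(1, 3, 'NP'), (4, 6, 'NP')]
```

In a spaCy v3 pipeline the function would be attached with `nlp.vocab.get_noun_chunks = adjective_noun_chunks` before any texts are processed. This assumes each new `Doc` picks up its noun chunk iterator from the vocab when it is created. `Doc.noun_chunks` then wraps each tuple in a `Span`. The string label is a simplification: spaCy's built-in English iterator yields the hash of `"NP"` from the vocab's `StringStore`, and passing a plain string relies on `Span` accepting string labels.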