From 98c6a85c8bec04dd7be7c0dbc3c18e0efd9f822e Mon Sep 17 00:00:00 2001
From: Ines Montani
Date: Fri, 31 Jul 2020 18:55:38 +0200
Subject: [PATCH] Update docs [ci skip]

---
 website/docs/api/language.md               | 59 ++++++++++++++++++++++
 website/docs/usage/processing-pipelines.md | 55 ++++++++++++++++++++
 2 files changed, 114 insertions(+)

diff --git a/website/docs/api/language.md b/website/docs/api/language.md
index 0662fb12a..608442122 100644
--- a/website/docs/api/language.md
+++ b/website/docs/api/language.md
@@ -598,6 +598,65 @@ contains the information about the component and its default provided by the
 | `name`      | str                           | The pipeline component name. |
 | **RETURNS** | [`FactoryMeta`](#factorymeta) | The factory meta.            |
 
+## Language.analyze_pipes {#analyze_pipes tag="method" new="3"}
+
+Analyze the current pipeline components and show a summary of the attributes
+they assign and require, and the scores they set. The data is based on the
+information provided in the [`@Language.component`](/api/language#component) and
+[`@Language.factory`](/api/language#factory) decorators. If requirements aren't
+met, e.g. if a component specifies a required property that is not set by a
+previous component, a warning is shown.
+
+The pipeline analysis is static and does **not actually run the components**.
+This means that it relies on the information provided by the components
+themselves. If a custom component declares that it assigns an attribute but it
+doesn't, the pipeline analysis won't catch that.
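The static check described above can be illustrated with a small, self-contained sketch. This is plain Python, not spaCy's actual implementation, and the metadata dicts are hypothetical stand-ins for the declarations made via the decorators: the analysis walks the pipeline in order, tracks which attributes have been assigned so far, and flags any requirement that no earlier component satisfies.

```python
# Toy illustration of static pipeline analysis: no component is run,
# only the declared "assigns"/"requires" metadata is inspected.
pipeline = [
    ("tagger", {"assigns": ["token.tag"], "requires": []}),
    ("entity_linker", {
        "assigns": ["token.ent_kb_id"],
        "requires": ["doc.ents", "doc.sents", "token.ent_iob", "token.ent_type"],
    }),
]

def analyze(pipeline):
    assigned = set()   # attributes set by components seen so far
    problems = {}      # component name -> list of unmet requirements
    for name, meta in pipeline:
        missing = [req for req in meta["requires"] if req not in assigned]
        if missing:
            problems[name] = missing
        assigned.update(meta["assigns"])
    return problems

# The entity linker's four requirements are unmet because no earlier
# component assigns them.
print(analyze(pipeline))
```

Because the check only compares declared metadata, adding a component that declares `"assigns": ["doc.ents"]` before the entity linker would remove `doc.ents` from the reported problems without anything being executed.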
+
+> #### Example
+>
+> ```python
+> nlp = spacy.blank("en")
+> nlp.add_pipe("tagger")
+> nlp.add_pipe("entity_linker")
+> nlp.analyze_pipes()
+> ```
+
+```
+============================= Pipeline Overview =============================
+
+#   Component       Assigns           Requires         Scores      Retokenizes
+-   -------------   ---------------   --------------   ---------   -----------
+0   tagger          token.tag                          tag_acc     False
+                                                       pos_acc
+                                                       lemma_acc
+
+1   entity_linker   token.ent_kb_id   doc.ents                     False
+                                      doc.sents
+                                      token.ent_iob
+                                      token.ent_type
+
+
+================================ Problems (4) ================================
+⚠ 'entity_linker' requirements not met: doc.ents, doc.sents,
+token.ent_iob, token.ent_type
+```
+
+| Name           | Type        | Description |
+| -------------- | ----------- | ----------- |
+| _keyword-only_ |             |             |
+| `keys`         | `List[str]` | The values to display in the table. Corresponds to attributes of the [`FactoryMeta`](/api/language#factorymeta). Defaults to `["assigns", "requires", "scores", "retokenizes"]`. |
+| `pretty`       | bool        | Pretty-print the results with colors and icons. Defaults to `True`. |
+| `no_print`     | bool        | Don't print anything and return a structured dict instead. Defaults to `False`. |
+| **RETURNS**    | dict        | Optional dict, if `no_print` is set to `True`. |
+
 ## Language.meta {#meta tag="property"}
 
 Custom meta data for the Language class.
If a model is loaded, contains meta
diff --git a/website/docs/usage/processing-pipelines.md b/website/docs/usage/processing-pipelines.md
index 486cef1be..deca96840 100644
--- a/website/docs/usage/processing-pipelines.md
+++ b/website/docs/usage/processing-pipelines.md
@@ -311,6 +311,61 @@ nlp.rename_pipe("ner", "entityrecognizer")
 nlp.replace_pipe("tagger", my_custom_tagger)
 ```
 
+### Analyzing pipeline components {#analysis new="3"}
+
+The [`nlp.analyze_pipes`](/api/language#analyze_pipes) method analyzes the
+components in the current pipeline and outputs information about them, like the
+attributes they set on the [`Doc`](/api/doc) and [`Token`](/api/token), whether
+they retokenize the `Doc` and which scores they produce during training. It
+will also show warnings if components require values that aren't set by a
+previous component – for instance, if the entity linker is used but no
+component that runs before it sets named entities.
+
+```python
+nlp = spacy.blank("en")
+nlp.add_pipe("tagger")
+nlp.add_pipe("entity_linker")  # this is a problem, because it needs entities
+nlp.analyze_pipes()
+```
+
+```
+### Example output
+============================= Pipeline Overview =============================
+
+#   Component       Assigns           Requires         Scores      Retokenizes
+-   -------------   ---------------   --------------   ---------   -----------
+0   tagger          token.tag                          tag_acc     False
+                                                       pos_acc
+                                                       lemma_acc
+
+1   entity_linker   token.ent_kb_id   doc.ents                     False
+                                      doc.sents
+                                      token.ent_iob
+                                      token.ent_type
+
+
+================================ Problems (4) ================================
+⚠ 'entity_linker' requirements not met: doc.ents, doc.sents,
+token.ent_iob, token.ent_type
+```
+
+If you prefer a structured dictionary containing the component information and
+the problems, you can set `no_print=True`. This will return the data instead
+of printing it.
+
+```python
+result = nlp.analyze_pipes(no_print=True)
+```
+
+The pipeline analysis is static and does **not actually run the components**.
+This means that it relies on the information provided by the components
+themselves. If a custom component declares that it assigns an attribute but it
+doesn't, the pipeline analysis won't catch that.
+
 ## Creating custom pipeline components {#custom-components}
 
 A pipeline component is a function that receives a `Doc` object, modifies it and
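One way to work with a structured analysis result is to filter its problems programmatically. The dict layout below is a hypothetical stand-in shaped like the printed overview above; the patch itself doesn't document the exact shape returned by `analyze_pipes(no_print=True)`, so treat the keys as an illustrative assumption:

```python
# Hypothetical structured result, mirroring the printed table above
# (the real return value's exact keys are an assumption here).
result = {
    "summary": {
        "tagger": {"assigns": ["token.tag"], "requires": [], "retokenizes": False},
        "entity_linker": {
            "assigns": ["token.ent_kb_id"],
            "requires": ["doc.ents", "doc.sents", "token.ent_iob", "token.ent_type"],
            "retokenizes": False,
        },
    },
    "problems": {
        "tagger": [],
        "entity_linker": ["doc.ents", "doc.sents", "token.ent_iob", "token.ent_type"],
    },
}

# Keep only the components whose requirements are unmet.
broken = {name: reqs for name, reqs in result["problems"].items() if reqs}
print(broken)
```

A check like this could run in a test suite or CI step, failing the build whenever a pipeline is assembled with unmet requirements instead of relying on someone reading the printed warnings.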