Update docs [ci skip]

This commit is contained in:
Ines Montani 2020-07-31 18:55:38 +02:00
parent b68c53858c
commit 98c6a85c8b
2 changed files with 114 additions and 0 deletions

@@ -598,6 +598,65 @@ contains the information about the component and its default provided by the
| `name` | str | The pipeline component name. |
| **RETURNS** | [`FactoryMeta`](#factorymeta) |  The factory meta. |
## Language.analyze_pipes {#analyze_pipes tag="method" new="3"}
Analyze the current pipeline components and show a summary of the attributes
they assign and require, and the scores they set. The data is based on the
information provided in the [`@Language.component`](/api/language#component) and
[`@Language.factory`](/api/language#factory) decorators. If requirements aren't
met, e.g. if a component specifies a required property that is not set by a
previous component, a warning is shown.
<Infobox variant="warning" title="Important note">
The pipeline analysis is static and does **not actually run the components**.
This means that it relies on the information provided by the components
themselves. If a custom component declares that it assigns an attribute but it
doesn't, the pipeline analysis won't catch that.
</Infobox>
> #### Example
>
> ```python
> nlp = spacy.blank("en")
> nlp.add_pipe("tagger")
> nlp.add_pipe("entity_linker")
> nlp.analyze_pipes()
> ```
<Accordion title="Example output" spaced>
```
============================= Pipeline Overview =============================
#   Component       Assigns           Requires         Scores      Retokenizes
-   -------------   ---------------   --------------   ---------   -----------
0   tagger          token.tag                          tag_acc     False
                                                       pos_acc
                                                       lemma_acc
1   entity_linker   token.ent_kb_id   doc.ents                     False
                                      doc.sents
                                      token.ent_iob
                                      token.ent_type
================================ Problems (4) ================================
⚠ 'entity_linker' requirements not met: doc.ents, doc.sents,
token.ent_iob, token.ent_type
```
</Accordion>
| Name | Type | Description |
| -------------- | ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| _keyword-only_ | | |
| `keys` | `List[str]` | The values to display in the table. Corresponds to attributes of the [`FactoryMeta`](/api/language#factorymeta). Defaults to `["assigns", "requires", "scores", "retokenizes"]`. |
| `pretty` | bool | Pretty-print the results with colors and icons. Defaults to `True`. |
| `no_print` | bool | Don't print anything and return a structured dict instead. Defaults to `False`. |
| **RETURNS** | dict | Optional dict, if `no_print` is set to `True`. |
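As a minimal sketch of how these keyword arguments combine, reusing the small pipeline from the example above (the exact structure of the returned dict is not shown here):

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("tagger")
nlp.add_pipe("entity_linker")
# Limit the table to the "assigns" and "requires" columns and return the
# analysis as a dict instead of printing it
analysis = nlp.analyze_pipes(keys=["assigns", "requires"], no_print=True)
print(analysis)
```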
## Language.meta {#meta tag="property"}
Custom meta data for the Language class. If a model is loaded, contains meta

@@ -311,6 +311,61 @@ nlp.rename_pipe("ner", "entityrecognizer")
nlp.replace_pipe("tagger", my_custom_tagger)
```
### Analyzing pipeline components {#analysis new="3"}
The [`nlp.analyze_pipes`](/api/language#analyze_pipes) method analyzes the
components in the current pipeline and outputs information about them, like the
attributes they set on the [`Doc`](/api/doc) and [`Token`](/api/token), whether
they retokenize the `Doc` and which scores they produce during training. It will
also show warnings if components require values that aren't set by previous
components, for instance if the entity linker is used but no component that runs
before it sets named entities.
```python
nlp = spacy.blank("en")
nlp.add_pipe("tagger")
nlp.add_pipe("entity_linker") # this is a problem, because it needs entities
nlp.analyze_pipes()
```
```
### Example output
============================= Pipeline Overview =============================
#   Component       Assigns           Requires         Scores      Retokenizes
-   -------------   ---------------   --------------   ---------   -----------
0   tagger          token.tag                          tag_acc     False
                                                       pos_acc
                                                       lemma_acc
1   entity_linker   token.ent_kb_id   doc.ents                     False
                                      doc.sents
                                      token.ent_iob
                                      token.ent_type
================================ Problems (4) ================================
⚠ 'entity_linker' requirements not met: doc.ents, doc.sents,
token.ent_iob, token.ent_type
```
If you prefer a structured dictionary containing the component information and
the problems, you can set `no_print=True`. This will return the data instead of
printing it.
```python
result = nlp.analyze_pipes(no_print=True)
```
<Infobox variant="warning" title="Important note">
The pipeline analysis is static and does **not actually run the components**.
This means that it relies on the information provided by the components
themselves. If a custom component declares that it assigns an attribute but it
doesn't, the pipeline analysis won't catch that.
</Infobox>
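For example, here's a minimal sketch of a hypothetical component (the name `buggy_lemmatizer` and its declaration are made up for illustration) that declares an attribute it never actually sets. Because the analysis only reads the declaration, it will still report `token.lemma` as assigned:

```python
import spacy
from spacy.language import Language

# Hypothetical component: it declares that it assigns token.lemma, but the
# function body never sets it. The static pipeline analysis only inspects
# the declaration, so it won't flag this.
@Language.component("buggy_lemmatizer", assigns=["token.lemma"])
def buggy_lemmatizer(doc):
    return doc

nlp = spacy.blank("en")
nlp.add_pipe("buggy_lemmatizer")
nlp.analyze_pipes()
```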
## Creating custom pipeline components {#custom-components}
A pipeline component is a function that receives a `Doc` object, modifies it and