diff --git a/website/docs/api/top-level.mdx b/website/docs/api/top-level.mdx
index b13a6d28b..f35c8885d 100644
--- a/website/docs/api/top-level.mdx
+++ b/website/docs/api/top-level.mdx
@@ -340,6 +340,128 @@ use with the `manual=True` argument in `displacy.render`.
 | `options`   | Span-specific visualisation options. ~~Dict[str, Any]~~             |
 | **RETURNS** | Generated entities keyed by text (original text) and ents. ~~dict~~ |
 
+### Visualizer data structures {id="displacy_structures"}
+
+You can also use displaCy's data format to manually render data. This can be
+useful if you want to visualize output from other libraries. You can find
+examples of displaCy's data format on the
+[usage page](/usage/visualizers#manual-usage).
+
+> #### DEP data structure
+>
+> ```json
+> {
+>   "words": [
+>     { "text": "This", "tag": "DT" },
+>     { "text": "is", "tag": "VBZ" },
+>     { "text": "a", "tag": "DT" },
+>     { "text": "sentence", "tag": "NN" }
+>   ],
+>   "arcs": [
+>     { "start": 0, "end": 1, "label": "nsubj", "dir": "left" },
+>     { "start": 2, "end": 3, "label": "det", "dir": "left" },
+>     { "start": 1, "end": 3, "label": "attr", "dir": "right" }
+>   ]
+> }
+> ```
+
+#### Dependency Visualizer data structure {id="structure-dep"}
+
+| Dictionary Key | Description                               |
+| -------------- | ----------------------------------------- |
+| `words`        | List of words. ~~List[Dict[str, Any]]~~   |
+| `arcs`         | List of arcs. ~~List[Dict[str, Any]]~~    |
+| `settings`     | Visualization options. ~~Dict[str, Any]~~ |
+
+
+
+| Dictionary Key | Description                          |
+| -------------- | ------------------------------------ |
+| `text`         | The string of the word. ~~str~~      |
+| `tag`          | Dependency tag of the word. ~~str~~  |
+| `lemma`        | Lemma of the word. ~~Optional[str]~~ |
+
+
+
+
+
+| Dictionary Key | Description                                     |
+| -------------- | ----------------------------------------------- |
+| `start`        | Start index. ~~int~~                            |
+| `end`          | End index. ~~int~~                              |
+| `label`        | Label of the arc. ~~str~~                       |
+| `dir`          | Direction of the arc (`left`, `right`). ~~str~~ |
+
+
+
+> #### ENT data structure
+>
+> ```json
+> {
+>   "text": "But Google is starting from behind.",
+>   "ents": [{"start": 4, "end": 10, "label": "ORG"}],
+>   "title": None
+> }
+> ```
+
+#### Named Entity Recognition data structure {id="structure-ent"}
+
+| Dictionary Key | Description                                |
+| -------------- | ------------------------------------------ |
+| `text`         | Text of the document. ~~str~~              |
+| `ents`         | List of entities. ~~List[Dict[str, Any]]~~ |
+| `title`        | Title of the visualization. ~~str~~        |
+| `settings`     | Visualization options. ~~Dict[str, Any]~~  |
+
+
+
+| Dictionary Key | Description                  |
+| -------------- | ---------------------------- |
+| `start`        | Start index. ~~int~~         |
+| `end`          | End index. ~~int~~           |
+| `label`        | Label of the entity. ~~str~~ |
+| `kb_id`        | Knowledgebase ID. ~~str~~    |
+| `kb_url`       | Knowledgebase URL. ~~str~~   |
+
+
+
+> #### SPAN data structure
+>
+> ```json
+> {
+>   "text": "Welcome to the Bank of China.",
+>   "spans": [
+>     { "start_token": 3, "end_token": 6, "label": "ORG" },
+>     { "start_token": 5, "end_token": 6, "label": "GPE" }
+>   ],
+>   "tokens": ["Welcome", "to", "the", "Bank", "of", "China", "."]
+> }
+> ```
+
+#### Span Classification data structure {id="structure-span"}
+
+| Dictionary Key | Description                               |
+| -------------- | ----------------------------------------- |
+| `text`         | Text of the document. ~~str~~             |
+| `spans`        | List of spans. ~~List[Dict[str, Any]]~~   |
+| `title`        | Title of the visualization. ~~str~~       |
+| `tokens`       | List of tokens. ~~List[str]~~             |
+| `settings`     | Visualization options. ~~Dict[str, Any]~~ |
+
+
+
+| Dictionary Key | Description                |
+| -------------- | -------------------------- |
+| `start`        | Start index. ~~int~~       |
+| `end`          | End index. ~~int~~         |
+| `start_token`  | Start token. ~~int~~       |
+| `end_token`    | End token. ~~int~~         |
+| `label`        | Label of the span. ~~str~~ |
+| `kb_id`        | Knowledgebase ID. ~~str~~  |
+| `kb_url`       | Knowledgebase URL. ~~str~~ |
+
+
+
 ### Visualizer options {id="displacy_options"}
 
 The `options` argument lets you specify additional settings for each visualizer.
diff --git a/website/docs/usage/visualizers.mdx b/website/docs/usage/visualizers.mdx
index 1d3682af4..39ecd4f35 100644
--- a/website/docs/usage/visualizers.mdx
+++ b/website/docs/usage/visualizers.mdx
@@ -344,7 +344,8 @@ or
 [SyntaxNet](https://github.com/tensorflow/models/tree/master/research/syntaxnet).
 If you set `manual=True` on either `render()` or `serve()`, you can pass in data
 in displaCy's format as a dictionary (instead of `Doc` objects). There are
-helper functions for converting `Doc` objects to displaCy's format for use with
+helper functions for converting `Doc` objects to
+[displaCy's format](/api/top-level#displacy_structures) for use with
 `manual=True`: [`displacy.parse_deps`](/api/top-level#displacy.parse_deps),
 [`displacy.parse_ents`](/api/top-level#displacy.parse_ents), and
 [`displacy.parse_spans`](/api/top-level#displacy.parse_spans).
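
To illustrate the `manual=True` path that the new section documents, here is a minimal sketch that builds the ENT and SPAN dictionaries from the examples in the diff and sanity-checks them before rendering. The helper names `entity_texts`, `span_texts`, and `render_manual` are hypothetical and not part of spaCy. The sketch also assumes that `start`/`end` are character offsets into `text` and that `end_token` is exclusive, which is what the documented examples suggest.

```python
from typing import Any, Dict, List

# Manual input for the entity visualizer, copied from the ENT example in the diff.
ent_data: Dict[str, Any] = {
    "text": "But Google is starting from behind.",
    "ents": [{"start": 4, "end": 10, "label": "ORG"}],
    "title": None,
}

# Manual input for the span visualizer, copied from the SPAN example in the diff.
span_data: Dict[str, Any] = {
    "text": "Welcome to the Bank of China.",
    "spans": [
        {"start_token": 3, "end_token": 6, "label": "ORG"},
        {"start_token": 5, "end_token": 6, "label": "GPE"},
    ],
    "tokens": ["Welcome", "to", "the", "Bank", "of", "China", "."],
}


def entity_texts(data: Dict[str, Any]) -> List[str]:
    """Hypothetical check: the substring each entity covers (character offsets)."""
    return [data["text"][e["start"]:e["end"]] for e in data["ents"]]


def span_texts(data: Dict[str, Any]) -> List[str]:
    """Hypothetical check: the tokens each span covers (assumes exclusive end_token)."""
    return [
        " ".join(data["tokens"][s["start_token"]:s["end_token"]])
        for s in data["spans"]
    ]


def render_manual(data: Dict[str, Any], style: str) -> Any:
    """Hypothetical wrapper around displacy.render with manual=True.

    spaCy is imported lazily so the checks above run without it installed.
    """
    from spacy import displacy

    return displacy.render(data, style=style, manual=True)
```

With spaCy installed, `render_manual(ent_data, "ent")` or `render_manual(span_data, "span")` should hand the dictionaries to displaCy unchanged. Running the two check helpers first is a cheap way to catch off-by-one offsets in data produced by other libraries.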