mirror of
https://github.com/explosion/spaCy.git
synced 2024-12-30 20:06:30 +03:00
163 lines
6.5 KiB
Plaintext
163 lines
6.5 KiB
Plaintext
//- 💫 DOCS > USAGE > VISUALIZERS > HTML
|
||
|
||
p
|
||
| If you don't need the web server and just want to generate the markup
|
||
| – for example, to export it to a file or serve it in a custom
|
||
| way – you can use #[+api("displacy#render") #[code displacy.render]].
|
||
| It works the same way, but returns a string containing the markup.
|
||
|
||
+code("Example").
|
||
import spacy
|
||
from spacy import displacy
|
||
|
||
nlp = spacy.load('en')
|
||
doc1 = nlp(u'This is a sentence.')
|
||
doc2 = nlp(u'This is another sentence.')
|
||
html = displacy.render([doc1, doc2], style='dep', page=True)
|
||
|
||
p
|
||
| #[code page=True] renders the markup wrapped as a full HTML page.
|
||
| For minified and more compact HTML markup, you can set #[code minify=True].
|
||
| If you're rendering a dependency parse, you can also export it as an
|
||
| #[code .svg] file.
|
||
|
||
+aside("What's SVG?")
|
||
| Unlike other image formats, the SVG (Scalable Vector Graphics) uses XML
|
||
| markup that's easy to manipulate
|
||
| #[+a("https://www.smashingmagazine.com/2014/11/styling-and-animating-svgs-with-css/") using CSS] or
|
||
| #[+a("https://css-tricks.com/smil-is-dead-long-live-smil-a-guide-to-alternatives-to-smil-features/") JavaScript].
|
||
| Essentially, SVG lets you design with code, which makes it a perfect fit
|
||
| for visualizing dependency trees. SVGs can be embedded online in an
|
||
| #[code <img>] tag, or inlined in an HTML document. They're also
|
||
| pretty easy to #[+a("https://convertio.co/image-converter/") convert].
|
||
|
||
+code.
|
||
svg = displacy.render(doc, style='dep')
|
||
output_path = Path('/images/sentence.svg')
|
||
output_path.open('w', encoding='utf-8').write(svg)
|
||
|
||
+infobox("Important note")
|
||
| Since each visualization is generated as a separate SVG, exporting
|
||
| #[code .svg] files only works if you're rendering #[strong one single doc]
|
||
| at a time. (This makes sense – after all, each visualization should be
|
||
| a standalone graphic.) So instead of rendering all #[code Doc]s at one,
|
||
| loop over them and export them separately.
|
||
|
||
|
||
+h(3, "examples-export-svg") Example: Export SVG graphics of dependency parses
|
||
|
||
+code("Example").
|
||
import spacy
|
||
from spacy import displacy
|
||
from pathlib import Path
|
||
|
||
nlp = spacy.load('en')
|
||
sentences = ["This is an example.", "This is another one."]
|
||
for sent in sentences:
|
||
doc = nlp(sentence)
|
||
svg = displacy.render(doc, style='dep')
|
||
file_name = '-'.join([w.text for w in doc if not w.is_punct]) + '.svg'
|
||
output_path = Path('/images/' + file_name)
|
||
output_path.open('w', encoding='utf-8').write(svg)
|
||
|
||
p
|
||
| The above code will generate the dependency visualizations as to
|
||
| two files, #[code This-is-an-example.svg] and #[code This-is-another-one.svg].
|
||
|
||
|
||
+h(3, "manual-usage") Rendering data manually
|
||
|
||
p
|
||
| You can also use displaCy to manually render data. This can be useful if
|
||
| you want to visualize output from other libraries, like
|
||
| #[+a("http://www.nltk.org") NLTK] or
|
||
| #[+a("https://github.com/tensorflow/models/tree/master/research/syntaxnet") SyntaxNet].
|
||
| Simply convert the dependency parse or recognised entities to displaCy's
|
||
| format and set #[code manual=True] on either #[code render()] or
|
||
| #[code serve()].
|
||
|
||
+aside-code("Example").
|
||
ex = [{'text': 'But Google is starting from behind.',
|
||
'ents': [{'start': 4, 'end': 10, 'label': 'ORG'}],
|
||
'title': None}]
|
||
html = displacy.render(ex, style='ent', manual=True)
|
||
|
||
+code("DEP input").
|
||
{
|
||
'words': [
|
||
{'text': 'This', 'tag': 'DT'},
|
||
{'text': 'is', 'tag': 'VBZ'},
|
||
{'text': 'a', 'tag': 'DT'},
|
||
{'text': 'sentence', 'tag': 'NN'}],
|
||
'arcs': [
|
||
{'start': 0, 'end': 1, 'label': 'nsubj', 'dir': 'left'},
|
||
{'start': 2, 'end': 3, 'label': 'det', 'dir': 'left'},
|
||
{'start': 1, 'end': 3, 'label': 'attr', 'dir': 'right'}]
|
||
}
|
||
|
||
+code("ENT input").
|
||
{
|
||
'text': 'But Google is starting from behind.',
|
||
'ents': [{'start': 4, 'end': 10, 'label': 'ORG'}],
|
||
'title': None
|
||
}
|
||
|
||
+h(3, "webapp") Using displaCy in a web application
|
||
|
||
p
|
||
| If you want to use the visualizers as part of a web application, for
|
||
| example to create something like our
|
||
| #[+a(DEMOS_URL + "/displacy") online demo], it's not recommended to
|
||
| simply wrap and serve the displaCy renderer. Instead, you should only
|
||
| rely on the server to perform spaCy's processing capabilities, and use
|
||
| #[+a(gh("displacy")) displaCy.js] to render the JSON-formatted output.
|
||
|
||
+aside("Why not return the HTML by the server?")
|
||
| It's certainly possible to just have your server return the markup.
|
||
| But outputting raw, unsanitised HTML is risky and makes your app vulnerable to
|
||
| #[+a("https://en.wikipedia.org/wiki/Cross-site_scripting") cross-site scripting]
|
||
| (XSS). All your user needs to do is find a way to make spaCy return text
|
||
| like #[code <script src="malicious-code.js"><script>], which
|
||
| is pretty easy in NER mode. Instead of relying on the server to render
|
||
| and sanitise HTML, you can do this on the client in JavaScript.
|
||
| displaCy.js creates the markup as DOM nodes and will never insert raw
|
||
| HTML.
|
||
|
||
p
|
||
| The #[code parse_deps] function takes a #[code Doc] object and returns
|
||
| a dictionary in a format that can be rendered by displaCy.
|
||
|
||
+code("Example").
|
||
import spacy
|
||
from spacy import displacy
|
||
|
||
nlp = spacy.load('en')
|
||
|
||
def displacy_service(text):
|
||
doc = nlp(text)
|
||
return displacy.parse_deps(doc)
|
||
|
||
p
|
||
| Using a library like #[+a("https://falconframework.org/") Falcon] or
|
||
| #[+a("http://www.hug.rest/") Hug], you can easily turn the above code
|
||
| into a simple REST API that receives a text and returns a JSON-formatted
|
||
| parse. In your front-end, include #[+a(gh("displacy")) displacy.js] and
|
||
| initialise it with the API URL and the ID or query selector of the
|
||
| container to render the visualisation in, e.g. #[code '#displacy'] for
|
||
| #[code <div id="displacy">].
|
||
|
||
+code("script.js", "javascript").
|
||
var displacy = new displaCy('http://localhost:8080', {
|
||
container: '#displacy'
|
||
})
|
||
|
||
function parse(text) {
|
||
displacy.parse(text);
|
||
}
|
||
|
||
p
|
||
| When you call #[code parse()], it will make a request to your API,
|
||
| receive the JSON-formatted parse and render it in your container. To
|
||
| create an interactive experience, you could trigger this function by
|
||
| a button and read the text from an #[code <input>] field.
|