spaCy/website/usage/_visualizers/_html.jade
2017-10-10 04:24:22 +02:00

163 lines
6.5 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

//- 💫 DOCS > USAGE > VISUALIZERS > HTML
p
| If you don't need the web server and just want to generate the markup
| for example, to export it to a file or serve it in a custom
| way you can use #[+api("displacy#render") #[code displacy.render]].
| It works the same way, but returns a string containing the markup.
+code("Example").
import spacy
from spacy import displacy
nlp = spacy.load('en')
doc1 = nlp(u'This is a sentence.')
doc2 = nlp(u'This is another sentence.')
html = displacy.render([doc1, doc2], style='dep', page=True)
p
| #[code page=True] renders the markup wrapped as a full HTML page.
| For minified and more compact HTML markup, you can set #[code minify=True].
| If you're rendering a dependency parse, you can also export it as an
| #[code .svg] file.
+aside("What's SVG?")
| Unlike other image formats, the SVG (Scalable Vector Graphics) uses XML
| markup that's easy to manipulate
| #[+a("https://www.smashingmagazine.com/2014/11/styling-and-animating-svgs-with-css/") using CSS] or
| #[+a("https://css-tricks.com/smil-is-dead-long-live-smil-a-guide-to-alternatives-to-smil-features/") JavaScript].
| Essentially, SVG lets you design with code, which makes it a perfect fit
| for visualizing dependency trees. SVGs can be embedded online in an
| #[code <img>] tag, or inlined in an HTML document. They're also
| pretty easy to #[+a("https://convertio.co/image-converter/") convert].
+code.
svg = displacy.render(doc, style='dep')
output_path = Path('/images/sentence.svg')
output_path.open('w', encoding='utf-8').write(svg)
+infobox("Important note")
| Since each visualization is generated as a separate SVG, exporting
| #[code .svg] files only works if you're rendering #[strong one single doc]
| at a time. (This makes sense after all, each visualization should be
| a standalone graphic.) So instead of rendering all #[code Doc]s at one,
| loop over them and export them separately.
+h(3, "examples-export-svg") Example: Export SVG graphics of dependency parses
+code("Example").
import spacy
from spacy import displacy
from pathlib import Path
nlp = spacy.load('en')
sentences = ["This is an example.", "This is another one."]
for sent in sentences:
doc = nlp(sentence)
svg = displacy.render(doc, style='dep')
file_name = '-'.join([w.text for w in doc if not w.is_punct]) + '.svg'
output_path = Path('/images/' + file_name)
output_path.open('w', encoding='utf-8').write(svg)
p
| The above code will generate the dependency visualizations as to
| two files, #[code This-is-an-example.svg] and #[code This-is-another-one.svg].
+h(3, "manual-usage") Rendering data manually
p
| You can also use displaCy to manually render data. This can be useful if
| you want to visualize output from other libraries, like
| #[+a("http://www.nltk.org") NLTK] or
| #[+a("https://github.com/tensorflow/models/tree/master/research/syntaxnet") SyntaxNet].
| Simply convert the dependency parse or recognised entities to displaCy's
| format and set #[code manual=True] on either #[code render()] or
| #[code serve()].
+aside-code("Example").
ex = [{'text': 'But Google is starting from behind.',
'ents': [{'start': 4, 'end': 10, 'label': 'ORG'}],
'title': None}]
html = displacy.render(ex, style='ent', manual=True)
+code("DEP input").
{
'words': [
{'text': 'This', 'tag': 'DT'},
{'text': 'is', 'tag': 'VBZ'},
{'text': 'a', 'tag': 'DT'},
{'text': 'sentence', 'tag': 'NN'}],
'arcs': [
{'start': 0, 'end': 1, 'label': 'nsubj', 'dir': 'left'},
{'start': 2, 'end': 3, 'label': 'det', 'dir': 'left'},
{'start': 1, 'end': 3, 'label': 'attr', 'dir': 'right'}]
}
+code("ENT input").
{
'text': 'But Google is starting from behind.',
'ents': [{'start': 4, 'end': 10, 'label': 'ORG'}],
'title': None
}
+h(3, "webapp") Using displaCy in a web application
p
| If you want to use the visualizers as part of a web application, for
| example to create something like our
| #[+a(DEMOS_URL + "/displacy") online demo], it's not recommended to
| simply wrap and serve the displaCy renderer. Instead, you should only
| rely on the server to perform spaCy's processing capabilities, and use
| #[+a(gh("displacy")) displaCy.js] to render the JSON-formatted output.
+aside("Why not return the HTML by the server?")
| It's certainly possible to just have your server return the markup.
| But outputting raw, unsanitised HTML is risky and makes your app vulnerable to
| #[+a("https://en.wikipedia.org/wiki/Cross-site_scripting") cross-site scripting]
| (XSS). All your user needs to do is find a way to make spaCy return text
| like #[code <script src="malicious-code.js"><script>], which
| is pretty easy in NER mode. Instead of relying on the server to render
| and sanitise HTML, you can do this on the client in JavaScript.
| displaCy.js creates the markup as DOM nodes and will never insert raw
| HTML.
p
| The #[code parse_deps] function takes a #[code Doc] object and returns
| a dictionary in a format that can be rendered by displaCy.
+code("Example").
import spacy
from spacy import displacy
nlp = spacy.load('en')
def displacy_service(text):
doc = nlp(text)
return displacy.parse_deps(doc)
p
| Using a library like #[+a("https://falconframework.org/") Falcon] or
| #[+a("http://www.hug.rest/") Hug], you can easily turn the above code
| into a simple REST API that receives a text and returns a JSON-formatted
| parse. In your front-end, include #[+a(gh("displacy")) displacy.js] and
| initialise it with the API URL and the ID or query selector of the
| container to render the visualisation in, e.g. #[code '#displacy'] for
| #[code <div id="displacy">].
+code("script.js", "javascript").
var displacy = new displaCy('http://localhost:8080', {
container: '#displacy'
})
function parse(text) {
displacy.parse(text);
}
p
| When you call #[code parse()], it will make a request to your API,
| receive the JSON-formatted parse and render it in your container. To
| create an interactive experience, you could trigger this function by
| a button and read the text from an #[code <input>] field.