Update intent parser docs and add to usage docs

This commit is contained in:
ines 2017-10-27 04:49:05 +02:00
parent 954c88f4d8
commit b5643d8575
3 changed files with 117 additions and 7 deletions

View File

@ -1,13 +1,13 @@
#!/usr/bin/env python
# coding: utf-8
"""Using the parser to recognise your own semantics spaCy's parser component
can be used to trained to predict any type of tree structure over your input
text. You can also predict trees over whole documents or chat logs, with
connections between the sentence-roots used to annotate discourse structure.
"""Using the parser to recognise your own semantics
In this example, we'll build a message parser for a common "chat intent":
finding local businesses. Our message semantics will have the following types
of relations: INTENT, PLACE, QUALITY, ATTRIBUTE, TIME, LOCATION. For example:
spaCy's parser component can be used to trained to predict any type of tree
structure over your input text. You can also predict trees over whole documents
or chat logs, with connections between the sentence-roots used to annotate
discourse structure. In this example, we'll build a message parser for a common
"chat intent": finding local businesses. Our message semantics will have the
following types of relations: ROOT, PLACE, QUALITY, ATTRIBUTE, TIME, LOCATION.
"show me the best hotel in berlin"
('show', 'ROOT', 'show')

View File

@ -95,6 +95,102 @@ p
+item
| #[strong Test] the model to make sure the parser works as expected.
+h(3, "intent-parser") Training a parser for custom semantics
p
| spaCy's parser component can be used to trained to predict any type
| of tree structure over your input text  including
| #[strong semantic relations] that are not syntactic dependencies. This
| can be useful to for #[strong conversational applications], which need to
| predict trees over whole documents or chat logs, with connections between
| the sentence roots used to annotate discourse structure. For example, you
| can train spaCy's parser to label intents and their targets, like
| attributes, quality, time and locations. The result could look like this:
+codepen("991f245ef90debb78c8fc369294f75ad", 300)
+code.
doc = nlp(u"find a hotel with good wifi")
print([(t.text, t.dep_, t.head.text) for t in doc if t.dep_ != '-'])
# [('find', 'ROOT', 'find'), ('hotel', 'PLACE', 'find'),
# ('good', 'QUALITY', 'wifi'), ('wifi', 'ATTRIBUTE', 'hotel')]
p
| The above tree attaches "wifi" to "hotel" and assigns the dependency
| label #[code ATTRIBUTE]. This may not be a correct syntactic dependency
| but in this case, it expresses exactly what we need: the user is looking
| for a hotel with the attribute "wifi" of the quality "good". This query
| can then be processed by your application and used to trigger the
| respective action e.g. search the database for hotels with high ratings
| for their wifi offerings.
+aside("Tip: merge phrases and entities")
| To achieve even better accuracy, try merging multi-word tokens and
| entities specific to your domain into one token before parsing your text.
| You can do this by running the entity recognizer or
| #[+a("/usage/linguistic-features#rule-based-matching") rule-based matcher]
| to find relevant spans, and merging them using
| #[+api("span#merge") #[code Span.merge]]. You could even add your own
| custom #[+a("/usage/processing-pipelines#custom-components") pipeline component]
| to do this automatically just make sure to add it #[code before='parser'].
p
| The following example example shows a full implementation of a training
| loop for a custom message parser for a common "chat intent": finding
| local businesses. Our message semantics will have the following types
| of relations: #[code ROOT], #[code PLACE], #[code QUALITY],
| #[code ATTRIBUTE], #[code TIME] and #[code LOCATION].
+github("spacy", "examples/training/train_intent_parser.py")
+h(4) Step by step guide
+list("numbers")
+item
| #[strong Create the training data] consisting of words, their heads
| and their dependency labels in order. A token's head is the index
| of the token it is attached to. The heads don't need to be
| syntactically correct they should express the
| #[strong semantic relations] you want the parser to learn. For words
| that shouldn't receive a label, you can choose an arbitrary
| placeholder, for example #[code -].
+item
| #[strong Load the model] you want to start with, or create an
| #[strong empty model] using
| #[+api("spacy#blank") #[code spacy.blank]] with the ID of your
| language. If you're using a blank model, don't forget to add the
| parser to the pipeline. If you're using an existing model,
| make sure to disable all other pipeline components during training
| using #[+api("language#disable_pipes") #[code nlp.disable_pipes]].
| This way, you'll only be training the parser.
+item
| #[strong Add the dependency labels] to the parser using the
| #[+api("dependencyparser#add_label") #[code add_label]] method.
+item
| #[strong Shuffle and loop over] the examples and create a
| #[code Doc] and #[code GoldParse] object for each example. Make sure
| to pass in the #[code heads] and #[code deps] when you create the
| #[code GoldParse].
+item
| For each example, #[strong update the model]
| by calling #[+api("language#update") #[code nlp.update]], which steps
| through the words of the input. At each word, it makes a
| #[strong prediction]. It then consults the annotations provided on the
| #[code GoldParse] instance, to see whether it was
| right. If it was wrong, it adjusts its weights so that the correct
| action will score higher next time.
+item
| #[strong Save] the trained model using
| #[+api("language#to_disk") #[code nlp.to_disk]].
+item
| #[strong Test] the model to make sure the parser works as expected.
+h(3, "training-json") JSON format for training
include ../../api/_annotation/_training

View File

@ -122,6 +122,20 @@ include ../_includes/_mixins
+github("spacy", "examples/training/train_tagger.py")
+h(3, "intent-parser") Training a custom parser for chat intent semantics
p
| spaCy's parser component can be used to trained to predict any type
| of tree structure over your input text. You can also predict trees
| over whole documents or chat logs, with connections between the
| sentence-roots used to annotate discourse structure. In this example,
| we'll build a message parser for a common "chat intent": finding
| local businesses. Our message semantics will have the following types
| of relations: #[code ROOT], #[code PLACE], #[code QUALITY],
| #[code ATTRIBUTE], #[code TIME] and #[code LOCATION].
+github("spacy", "examples/training/train_intent_parser.py")
+h(3, "textcat") Training spaCy's text classifier
+tag-new(2)