Mirror of https://github.com/explosion/spaCy.git, synced 2025-01-26 09:14:32 +03:00
Update intent parser docs and add to usage docs
Update intent parser docs and add to usage docs

This commit is contained in:
parent 954c88f4d8
commit b5643d8575
@@ -1,13 +1,13 @@
 #!/usr/bin/env python
 # coding: utf-8
-"""Using the parser to recognise your own semantics spaCy's parser component
-can be used to trained to predict any type of tree structure over your input
-text. You can also predict trees over whole documents or chat logs, with
-connections between the sentence-roots used to annotate discourse structure.
-
-In this example, we'll build a message parser for a common "chat intent":
-finding local businesses. Our message semantics will have the following types
-of relations: INTENT, PLACE, QUALITY, ATTRIBUTE, TIME, LOCATION. For example:
+"""Using the parser to recognise your own semantics
+
+spaCy's parser component can be trained to predict any type of tree
+structure over your input text. You can also predict trees over whole documents
+or chat logs, with connections between the sentence-roots used to annotate
+discourse structure. In this example, we'll build a message parser for a common
+"chat intent": finding local businesses. Our message semantics will have the
+following types of relations: ROOT, PLACE, QUALITY, ATTRIBUTE, TIME, LOCATION.
 
 "show me the best hotel in berlin"
 ('show', 'ROOT', 'show')
@@ -95,6 +95,102 @@ p
     +item
         | #[strong Test] the model to make sure the parser works as expected.
+
++h(3, "intent-parser") Training a parser for custom semantics
+
+p
+    | spaCy's parser component can be trained to predict any type
+    | of tree structure over your input text – including
+    | #[strong semantic relations] that are not syntactic dependencies. This
+    | can be useful for #[strong conversational applications], which need to
+    | predict trees over whole documents or chat logs, with connections between
+    | the sentence roots used to annotate discourse structure. For example, you
+    | can train spaCy's parser to label intents and their targets, like
+    | attributes, quality, time and locations. The result could look like this:
+
++codepen("991f245ef90debb78c8fc369294f75ad", 300)
+
++code.
+    doc = nlp(u"find a hotel with good wifi")
+    print([(t.text, t.dep_, t.head.text) for t in doc if t.dep_ != '-'])
+    # [('find', 'ROOT', 'find'), ('hotel', 'PLACE', 'find'),
+    # ('good', 'QUALITY', 'wifi'), ('wifi', 'ATTRIBUTE', 'hotel')]
+
+p
+    | The above tree attaches "wifi" to "hotel" and assigns the dependency
+    | label #[code ATTRIBUTE]. This may not be a correct syntactic dependency –
+    | but in this case, it expresses exactly what we need: the user is looking
+    | for a hotel with the attribute "wifi" of the quality "good". This query
+    | can then be processed by your application and used to trigger the
+    | respective action – e.g. search the database for hotels with high ratings
+    | for their wifi offerings.
+
++aside("Tip: merge phrases and entities")
+    | To achieve even better accuracy, try merging multi-word tokens and
+    | entities specific to your domain into one token before parsing your text.
+    | You can do this by running the entity recognizer or
+    | #[+a("/usage/linguistic-features#rule-based-matching") rule-based matcher]
+    | to find relevant spans, and merging them using
+    | #[+api("span#merge") #[code Span.merge]]. You could even add your own
+    | custom #[+a("/usage/processing-pipelines#custom-components") pipeline component]
+    | to do this automatically – just make sure to add it #[code before='parser'].
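To make the merging tip concrete, here is a minimal sketch of finding a multi-word phrase and merging it into one token before parsing. These docs describe the spaCy v2 `Span.merge` API; so that the sketch runs on a current spaCy install, it uses `Doc.retokenize`, the replacement for `Span.merge` in newer versions. The `PLACE` match key and the example phrase are hypothetical.

```python
import spacy
from spacy.matcher import PhraseMatcher

# No trained model is needed to demonstrate merging - a blank English
# pipeline provides tokenization.
nlp = spacy.blank("en")

# Find a domain-specific multi-word phrase ("new york") with the
# rule-based PhraseMatcher. "PLACE" is an arbitrary match key.
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
matcher.add("PLACE", [nlp.make_doc("new york")])

doc = nlp("find a hotel in new york")

# Merge each matched span into a single token. The v2 docs use
# Span.merge; Doc.retokenize is the current equivalent.
with doc.retokenize() as retokenizer:
    for _, start, end in matcher(doc):
        retokenizer.merge(doc[start:end])

print([t.text for t in doc])
# ['find', 'a', 'hotel', 'in', 'new york']
```

Merging before parsing means the parser only has to attach one token, "new york", rather than learn an arbitrary convention for which half of the phrase carries the label.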
+p
+    | The following example shows a full implementation of a training
+    | loop for a custom message parser for a common "chat intent": finding
+    | local businesses. Our message semantics will have the following types
+    | of relations: #[code ROOT], #[code PLACE], #[code QUALITY],
+    | #[code ATTRIBUTE], #[code TIME] and #[code LOCATION].
+
++github("spacy", "examples/training/train_intent_parser.py")
+
++h(4) Step by step guide
+
++list("numbers")
+    +item
+        | #[strong Create the training data] consisting of words, their heads
+        | and their dependency labels in order. A token's head is the index
+        | of the token it is attached to. The heads don't need to be
+        | syntactically correct – they should express the
+        | #[strong semantic relations] you want the parser to learn. For words
+        | that shouldn't receive a label, you can choose an arbitrary
+        | placeholder, for example #[code -].
+
+    +item
+        | #[strong Load the model] you want to start with, or create an
+        | #[strong empty model] using
+        | #[+api("spacy#blank") #[code spacy.blank]] with the ID of your
+        | language. If you're using a blank model, don't forget to add the
+        | parser to the pipeline. If you're using an existing model,
+        | make sure to disable all other pipeline components during training
+        | using #[+api("language#disable_pipes") #[code nlp.disable_pipes]].
+        | This way, you'll only be training the parser.
+
+    +item
+        | #[strong Add the dependency labels] to the parser using the
+        | #[+api("dependencyparser#add_label") #[code add_label]] method.
+
+    +item
+        | #[strong Shuffle and loop over] the examples and create a
+        | #[code Doc] and #[code GoldParse] object for each example. Make sure
+        | to pass in the #[code heads] and #[code deps] when you create the
+        | #[code GoldParse].
+
+    +item
+        | For each example, #[strong update the model]
+        | by calling #[+api("language#update") #[code nlp.update]], which steps
+        | through the words of the input. At each word, it makes a
+        | #[strong prediction]. It then consults the annotations provided on the
+        | #[code GoldParse] instance, to see whether it was
+        | right. If it was wrong, it adjusts its weights so that the correct
+        | action will score higher next time.
+
+    +item
+        | #[strong Save] the trained model using
+        | #[+api("language#to_disk") #[code nlp.to_disk]].
+
+    +item
+        | #[strong Test] the model to make sure the parser works as expected.
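The step-by-step guide can be condensed into a short runnable sketch. Note that this page documents the spaCy v2 API (`GoldParse`, `nlp.begin_training`); the sketch below uses the current-spaCy equivalents (`Example.from_dict` instead of `GoldParse`, `nlp.initialize` instead of `nlp.begin_training`) so it runs on a modern install. The two training messages and their head/label annotations are made-up miniatures of the data in `examples/training/train_intent_parser.py`.

```python
import random

import spacy
from spacy.training import Example

# Hypothetical miniature training set: (text, {"heads": ..., "deps": ...}).
# A token's head is the index of the token it attaches to; "-" is the
# placeholder label for words that shouldn't receive a relation.
TRAIN_DATA = [
    ("find a hotel with good wifi",
     {"heads": [0, 2, 0, 5, 5, 2],
      "deps": ["ROOT", "-", "PLACE", "-", "QUALITY", "ATTRIBUTE"]}),
    ("find me the cheapest gym near work",
     {"heads": [0, 0, 4, 4, 0, 6, 4],
      "deps": ["ROOT", "-", "-", "QUALITY", "PLACE", "-", "LOCATION"]}),
]

nlp = spacy.blank("en")          # start from an empty English model
parser = nlp.add_pipe("parser")  # add the parser to the pipeline

# Add the dependency labels to the parser before training.
for _, annotations in TRAIN_DATA:
    for dep in annotations["deps"]:
        parser.add_label(dep)

optimizer = nlp.initialize()
for _ in range(30):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for text, annotations in TRAIN_DATA:
        # Example replaces the v2 Doc/GoldParse pair; heads and deps
        # are passed in via the annotation dict.
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], sgd=optimizer, losses=losses)

# nlp.to_disk("/path/to/model")  # save the trained model

# Test the model on one of the training messages.
doc = nlp("find a hotel with good wifi")
print([(t.text, t.dep_, t.head.text) for t in doc if t.dep_ != "-"])
```

With such a tiny dataset the parser simply memorises the examples; the point is the shape of the loop, not the resulting accuracy.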
 
 +h(3, "training-json") JSON format for training
 
 include ../../api/_annotation/_training
@@ -122,6 +122,20 @@ include ../_includes/_mixins
 
 +github("spacy", "examples/training/train_tagger.py")
+
++h(3, "intent-parser") Training a custom parser for chat intent semantics
+
+p
+    | spaCy's parser component can be trained to predict any type
+    | of tree structure over your input text. You can also predict trees
+    | over whole documents or chat logs, with connections between the
+    | sentence-roots used to annotate discourse structure. In this example,
+    | we'll build a message parser for a common "chat intent": finding
+    | local businesses. Our message semantics will have the following types
+    | of relations: #[code ROOT], #[code PLACE], #[code QUALITY],
+    | #[code ATTRIBUTE], #[code TIME] and #[code LOCATION].
+
++github("spacy", "examples/training/train_intent_parser.py")
 
 +h(3, "textcat") Training spaCy's text classifier
 +tag-new(2)