mirror of
https://github.com/explosion/spaCy.git
synced 2025-02-10 00:20:35 +03:00
Update intent parser docs and add to usage docs
parent 954c88f4d8
commit b5643d8575
|
@ -1,13 +1,13 @@
|
||||||
#!/usr/bin/env python
|
#!/usr/bin/env python
|
||||||
# coding: utf-8
|
# coding: utf-8
|
||||||
"""Using the parser to recognise your own semantics spaCy's parser component
|
"""Using the parser to recognise your own semantics
|
||||||
can be used to trained to predict any type of tree structure over your input
|
|
||||||
text. You can also predict trees over whole documents or chat logs, with
|
|
||||||
connections between the sentence-roots used to annotate discourse structure.
|
|
||||||
|
|
||||||
In this example, we'll build a message parser for a common "chat intent":
|
spaCy's parser component can be trained to predict any type of tree
|
||||||
finding local businesses. Our message semantics will have the following types
|
structure over your input text. You can also predict trees over whole documents
|
||||||
of relations: INTENT, PLACE, QUALITY, ATTRIBUTE, TIME, LOCATION. For example:
|
or chat logs, with connections between the sentence roots used to annotate
|
||||||
|
discourse structure. In this example, we'll build a message parser for a common
|
||||||
|
"chat intent": finding local businesses. Our message semantics will have the
|
||||||
|
following types of relations: ROOT, PLACE, QUALITY, ATTRIBUTE, TIME, LOCATION.
|
||||||
|
|
||||||
"show me the best hotel in berlin"
|
"show me the best hotel in berlin"
|
||||||
('show', 'ROOT', 'show')
|
('show', 'ROOT', 'show')
|
||||||
|
|
|
@ -95,6 +95,102 @@ p
|
||||||
+item
|
+item
|
||||||
| #[strong Test] the model to make sure the parser works as expected.
|
| #[strong Test] the model to make sure the parser works as expected.
|
||||||
|
|
||||||
|
+h(3, "intent-parser") Training a parser for custom semantics
|
||||||
|
|
||||||
|
p
|
||||||
|
| spaCy's parser component can be trained to predict any type
|
||||||
|
| of tree structure over your input text – including
|
||||||
|
| #[strong semantic relations] that are not syntactic dependencies. This
|
||||||
|
| can be useful for #[strong conversational applications], which need to
|
||||||
|
| predict trees over whole documents or chat logs, with connections between
|
||||||
|
| the sentence roots used to annotate discourse structure. For example, you
|
||||||
|
| can train spaCy's parser to label intents and their targets, like
|
||||||
|
| attributes, quality, time and locations. The result could look like this:
|
||||||
|
|
||||||
|
+codepen("991f245ef90debb78c8fc369294f75ad", 300)
|
||||||
|
|
||||||
|
+code.
|
||||||
|
doc = nlp(u"find a hotel with good wifi")
|
||||||
|
print([(t.text, t.dep_, t.head.text) for t in doc if t.dep_ != '-'])
|
||||||
|
# [('find', 'ROOT', 'find'), ('hotel', 'PLACE', 'find'),
|
||||||
|
# ('good', 'QUALITY', 'wifi'), ('wifi', 'ATTRIBUTE', 'hotel')]
|
||||||
|
|
||||||
|
p
|
||||||
|
| The above tree attaches "wifi" to "hotel" and assigns the dependency
|
||||||
|
| label #[code ATTRIBUTE]. This may not be a correct syntactic dependency –
|
||||||
|
| but in this case, it expresses exactly what we need: the user is looking
|
||||||
|
| for a hotel with the attribute "wifi" of the quality "good". This query
|
||||||
|
| can then be processed by your application and used to trigger the
|
||||||
|
| respective action – e.g. search the database for hotels with high ratings
|
||||||
|
| for their wifi offerings.
|
||||||
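To illustrate how an application might consume the parser output above, here is a minimal sketch in plain Python. It starts from the list of `(text, dep, head)` tuples printed in the example and folds them into a dictionary that a search backend could use. The `build_query` helper and the shape of the returned dictionary are assumptions made for illustration; they are not part of spaCy.

```python
def build_query(triples):
    """Fold (text, dep, head) triples into a simple query dict.

    Hypothetical helper: the key names below are illustrative only.
    """
    query = {"intent": None, "place": None, "attributes": []}

    # First pass: remember which word each QUALITY modifier is attached to.
    qualities = {}
    for text, dep, head in triples:
        if dep == "QUALITY":
            qualities[head] = text

    # Second pass: fill in the intent, the place and its attributes.
    for text, dep, head in triples:
        if dep == "ROOT":
            query["intent"] = text
        elif dep == "PLACE":
            query["place"] = text
        elif dep == "ATTRIBUTE":
            query["attributes"].append(
                {"name": text, "quality": qualities.get(text)}
            )
    return query


# Tuples as printed for "find a hotel with good wifi" in the example above.
triples = [
    ("find", "ROOT", "find"),
    ("hotel", "PLACE", "find"),
    ("good", "QUALITY", "wifi"),
    ("wifi", "ATTRIBUTE", "hotel"),
]
query = build_query(triples)
print(query)
# {'intent': 'find', 'place': 'hotel',
#  'attributes': [{'name': 'wifi', 'quality': 'good'}]}
```

The sketch matches words by their text, which is enough for a short message but breaks if the same word occurs twice. With real `Doc` objects it would be safer to key on token indices (`t.i` and `t.head.i`) instead.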
|
|
||||||
|
+aside("Tip: merge phrases and entities")
|
||||||
|
| To achieve even better accuracy, try merging multi-word tokens and
|
||||||
|
| entities specific to your domain into one token before parsing your text.
|
||||||
|
| You can do this by running the entity recognizer or
|
||||||
|
| #[+a("/usage/linguistic-features#rule-based-matching") rule-based matcher]
|
||||||
|
| to find relevant spans, and merging them using
|
||||||
|
| #[+api("span#merge") #[code Span.merge]]. You could even add your own
|
||||||
|
| custom #[+a("/usage/processing-pipelines#custom-components") pipeline component]
|
||||||
|
| to do this automatically – just make sure to add it #[code before='parser'].
|
||||||
|
|
||||||
|
p
|
||||||
|
| The following example shows a full implementation of a training
|
||||||
|
| loop for a custom message parser for a common "chat intent": finding
|
||||||
|
| local businesses. Our message semantics will have the following types
|
||||||
|
| of relations: #[code ROOT], #[code PLACE], #[code QUALITY],
|
||||||
|
| #[code ATTRIBUTE], #[code TIME] and #[code LOCATION].
|
||||||
|
|
||||||
|
+github("spacy", "examples/training/train_intent_parser.py")
|
||||||
|
|
||||||
|
+h(4) Step by step guide
|
||||||
|
|
||||||
|
+list("numbers")
|
||||||
|
+item
|
||||||
|
| #[strong Create the training data] consisting of words, their heads
|
||||||
|
| and their dependency labels in order. A token's head is the index
|
||||||
|
| of the token it is attached to. The heads don't need to be
|
||||||
|
| syntactically correct – they should express the
|
||||||
|
| #[strong semantic relations] you want the parser to learn. For words
|
||||||
|
| that shouldn't receive a label, you can choose an arbitrary
|
||||||
|
| placeholder, for example #[code -].
|
||||||
|
|
||||||
|
+item
|
||||||
|
| #[strong Load the model] you want to start with, or create an
|
||||||
|
| #[strong empty model] using
|
||||||
|
| #[+api("spacy#blank") #[code spacy.blank]] with the ID of your
|
||||||
|
| language. If you're using a blank model, don't forget to add the
|
||||||
|
| parser to the pipeline. If you're using an existing model,
|
||||||
|
| make sure to disable all other pipeline components during training
|
||||||
|
| using #[+api("language#disable_pipes") #[code nlp.disable_pipes]].
|
||||||
|
| This way, you'll only be training the parser.
|
||||||
|
|
||||||
|
+item
|
||||||
|
| #[strong Add the dependency labels] to the parser using the
|
||||||
|
| #[+api("dependencyparser#add_label") #[code add_label]] method.
|
||||||
|
|
||||||
|
+item
|
||||||
|
| #[strong Shuffle and loop over] the examples and create a
|
||||||
|
| #[code Doc] and #[code GoldParse] object for each example. Make sure
|
||||||
|
| to pass in the #[code heads] and #[code deps] when you create the
|
||||||
|
| #[code GoldParse].
|
||||||
|
|
||||||
|
+item
|
||||||
|
| For each example, #[strong update the model]
|
||||||
|
| by calling #[+api("language#update") #[code nlp.update]], which steps
|
||||||
|
| through the words of the input. At each word, it makes a
|
||||||
|
| #[strong prediction]. It then consults the annotations provided on the
|
||||||
|
| #[code GoldParse] instance, to see whether it was
|
||||||
|
| right. If it was wrong, it adjusts its weights so that the correct
|
||||||
|
| action will score higher next time.
|
||||||
|
|
||||||
|
+item
|
||||||
|
| #[strong Save] the trained model using
|
||||||
|
| #[+api("language#to_disk") #[code nlp.to_disk]].
|
||||||
|
|
||||||
|
+item
|
||||||
|
| #[strong Test] the model to make sure the parser works as expected.
|
||||||
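The first step in the list above, creating the training data, can be sketched without spaCy. The example below writes one annotated message as a text plus `heads` and `deps` lists, then checks that the annotation is a well-formed tree before it is handed to a training loop. The `check_annotation` helper is hypothetical, and the rule that each message has exactly one root (a token whose head is its own index) is an assumption of this sketch.

```python
# One training example: the text, the index of each token's head, and the
# label of each token. "-" is the placeholder for tokens without a label.
TRAIN_EXAMPLE = (
    "find a hotel with good wifi",
    {
        "heads": [0, 2, 0, 5, 5, 2],
        "deps": ["ROOT", "-", "PLACE", "-", "QUALITY", "ATTRIBUTE"],
    },
)


def check_annotation(text, annotation):
    """Validate one example and return its labelled (word, dep, head) triples.

    Hypothetical helper; assumes whitespace tokenization and a single root.
    """
    words = text.split()
    heads = annotation["heads"]
    deps = annotation["deps"]

    if not (len(words) == len(heads) == len(deps)):
        raise ValueError("words, heads and deps must have the same length")
    for head in heads:
        if not 0 <= head < len(words):
            raise ValueError("head index out of range: %d" % head)

    # The root is the token that is attached to itself.
    roots = [i for i, head in enumerate(heads) if head == i]
    if len(roots) != 1:
        raise ValueError("expected exactly one root, found %d" % len(roots))

    # Every token must reach the root by following heads, without cycles.
    for start in range(len(words)):
        seen = set()
        i = start
        while heads[i] != i:
            if i in seen:
                raise ValueError("cycle detected at token %d" % start)
            seen.add(i)
            i = heads[i]

    return [
        (word, dep, words[head])
        for word, head, dep in zip(words, heads, deps)
        if dep != "-"
    ]


triples = check_annotation(*TRAIN_EXAMPLE)
print(triples)
# [('find', 'ROOT', 'find'), ('hotel', 'PLACE', 'find'),
#  ('good', 'QUALITY', 'wifi'), ('wifi', 'ATTRIBUTE', 'hotel')]
```

Running a check like this over every example before training catches mismatched list lengths and mistyped head indices early, when they are still easy to trace back to a specific message.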
|
|
||||||
+h(3, "training-json") JSON format for training
|
+h(3, "training-json") JSON format for training
|
||||||
|
|
||||||
include ../../api/_annotation/_training
|
include ../../api/_annotation/_training
|
||||||
|
|
|
@ -122,6 +122,20 @@ include ../_includes/_mixins
|
||||||
|
|
||||||
+github("spacy", "examples/training/train_tagger.py")
|
+github("spacy", "examples/training/train_tagger.py")
|
||||||
|
|
||||||
|
+h(3, "intent-parser") Training a custom parser for chat intent semantics
|
||||||
|
|
||||||
|
p
|
||||||
|
| spaCy's parser component can be trained to predict any type
|
||||||
|
| of tree structure over your input text. You can also predict trees
|
||||||
|
| over whole documents or chat logs, with connections between the
|
||||||
|
| sentence roots used to annotate discourse structure. In this example,
|
||||||
|
| we'll build a message parser for a common "chat intent": finding
|
||||||
|
| local businesses. Our message semantics will have the following types
|
||||||
|
| of relations: #[code ROOT], #[code PLACE], #[code QUALITY],
|
||||||
|
| #[code ATTRIBUTE], #[code TIME] and #[code LOCATION].
|
||||||
|
|
||||||
|
+github("spacy", "examples/training/train_intent_parser.py")
|
||||||
|
|
||||||
+h(3, "textcat") Training spaCy's text classifier
|
+h(3, "textcat") Training spaCy's text classifier
|
||||||
+tag-new(2)
|
+tag-new(2)
|
||||||
|
|
||||||
|
|