What's an intent? And what's an entity?
Intents
An intent can be summed up as something the user wants to talk about. It's not exactly a "topic" of conversation, but more a flow they want to follow. Concretely, an intent is a group of what we in the voice assistant world call "utterances," or, speaking more plainly, sentences. Think of it as: "when the user says X, Y, or Z, they want to talk about this intent."
Let's say you have a voice skill for your online shop, which sells clothing and footwear. The user says either "I want to buy a shoe" or "buy shirt". Clearly, the user wants to buy something, so you could group these utterances into an intent called PurchaseItemIntent. This intent represents the direction in which the user wants to steer the conversation.
Now let's say the user says "what did I buy?" or "show my purchase history". Again, it's clear they want to hear what they've previously purchased from your shop. These utterances (and more; an intent isn't limited to two utterances) can be grouped into another intent, which you might call ShowPurchaseHistoryIntent.
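To make the idea concrete, here is a minimal Python sketch, assuming an intent is nothing more than a named list of sample utterances and that matching is an exact comparison after lowercasing and dropping trailing punctuation. The names INTENTS, normalize and match_intent are hypothetical. Real voice platforms generally use statistical language understanding, so phrasings you didn't list can still match; this sketch only shows the grouping.

```python
from typing import Optional

# Hypothetical intent definitions: each intent is a named group of sample utterances.
INTENTS: dict[str, list[str]] = {
    "PurchaseItemIntent": [
        "i want to buy a shoe",
        "buy shirt",
    ],
    "ShowPurchaseHistoryIntent": [
        "what did i buy",
        "show my purchase history",
    ],
}


def normalize(text: str) -> str:
    """Lowercase and strip trailing punctuation so "What did I buy?" matches "what did i buy"."""
    return text.lower().strip().rstrip("?.!")


def match_intent(utterance: str) -> Optional[str]:
    """Return the name of the intent whose utterance group contains the input, or None."""
    said = normalize(utterance)
    for intent_name, samples in INTENTS.items():
        if said in samples:
            return intent_name
    return None


print(match_intent("What did I buy?"))  # ShowPurchaseHistoryIntent
print(match_intent("buy shirt"))        # PurchaseItemIntent
print(match_intent("cancel my order"))  # None
```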
Entities
Hearkening back to our web shop example, think of the PurchaseItemIntent we described. The user says "I want to buy one shirt" or "purchase three sneakers."
As we've stated, the user wants to buy something, and we design an intent around that by grouping specific utterances into it. However, as you may have noticed, these utterances have a common part and a variable part. The common part is the phrase that expresses the wish to buy: "I want to buy" or "purchase".
The variable part is what the user actually wants to buy: sneakers, shirts, pants, dresses, whatever you might have on offer. How do we accommodate that? A naive approach would be to define a variation of an utterance for every single thing the user could possibly buy. This is extremely inefficient and error-prone. You might forget to add a variation, and if you start selling accessories, you'd have to go in and manually add and update tons of different utterances.
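A quick way to see why this doesn't scale: the number of hand-written sentences is the product of every phrasing, quantity and item. The Python sketch below uses illustrative lists (not taken from any real skill) and a hypothetical helper, spell_out_everything, that generates what you would otherwise type by hand.

```python
import itertools

# Illustrative lists; with hand-written utterances, every combination needs its own sentence.
phrasings = ["i want to buy", "purchase"]
quantities = ["two", "three", "four"]
items = ["shirts", "shoes", "pants", "dresses", "jackets"]


def spell_out_everything() -> list[str]:
    """Write out one concrete utterance per (phrasing, quantity, item) combination."""
    return [f"{p} {q} {i}" for p, q, i in itertools.product(phrasings, quantities, items)]


print(len(spell_out_everything()))  # 30

# Starting to sell accessories: one new item costs six more sentences (2 phrasings x 3 quantities).
items.append("belts")
print(len(spell_out_everything()))  # 36
```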
Luckily, there is a better way.
Take the variable part out and turn it into an entity. An entity is a concept that is used frequently in your skill but whose value can vary. Various vendors already define built-in "system" entities for extremely common concepts, such as numbers, colors, cities, countries, proper names, and so on.
Let's expand our example further and say that the user can buy one of the following:
- shirt/shirts
- shoe/shoes
- pants
- dress/dresses
- jacket/jackets
It's a bit sparse for a clothing store, but it will serve the purpose of this example. We'll take this list of things that the user can buy and turn it into an entity called Product.
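One way to sketch this entity in Python, assuming each canonical product is stored together with the singular and plural forms from the list above. PRODUCT_ENTITY and resolve_product are illustrative names; each vendor has its own format for declaring custom entities, and this only mirrors the general idea of a value with synonyms.

```python
from typing import Optional

# Hypothetical Product entity: canonical value -> the surface forms a user might say.
PRODUCT_ENTITY: dict[str, list[str]] = {
    "shirt": ["shirt", "shirts"],
    "shoe": ["shoe", "shoes"],
    "pants": ["pants"],
    "dress": ["dress", "dresses"],
    "jacket": ["jacket", "jackets"],
}

# Inverted once so that looking up a spoken word is a single dictionary access.
SYNONYM_TO_PRODUCT: dict[str, str] = {
    form: canonical
    for canonical, forms in PRODUCT_ENTITY.items()
    for form in forms
}


def resolve_product(word: str) -> Optional[str]:
    """Map a spoken word such as "Dresses" to its canonical Product value, or None if unknown."""
    return SYNONYM_TO_PRODUCT.get(word.lower())


print(resolve_product("Dresses"))  # dress
print(resolve_product("hats"))     # None
```

With this shape, selling accessories means adding entries to PRODUCT_ENTITY; the utterances that refer to the entity stay the same.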
Next, we take the utterances we defined earlier for expressing that the user wants to buy something, and make them generic.
"I want to buy a shirt" and "I want to buy 3 shirts" become "I want to buy a <Product>" and "I want to buy <Quantity> <Product>". You could add more variations, such as using "a" or "an", or dropping the indefinite article altogether.
What we've done is add slots to our utterances. Slots are the variable parts, and each slot has an entity associated with it. A slot's name doesn't have to match the name of its entity; slot names are just labels that help you reason about your conversation's flow.
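Here is a rough Python sketch of slot filling, assuming entities can be approximated by regular expressions and that templates write slots as {slot_name}. ENTITY_PATTERNS, SLOTS, TEMPLATES and fill_slots are hypothetical names, and the slot names count and item are chosen to differ from the Quantity and Product entities on purpose. Real platforms use trained models rather than exact pattern matching, so treat this as an illustration of the idea only.

```python
import re
from typing import Optional

# Hypothetical entity definitions, each written as a regex alternation of accepted words.
ENTITY_PATTERNS = {
    "Product": r"shirts?|shoes?|pants|dress(?:es)?|jackets?",
    "Quantity": r"\d+|one|two|three|four|five",
}

# Slot name -> entity name. "count" and "item" deliberately differ from the
# entity names: a slot name is just a label.
SLOTS = {"count": "Quantity", "item": "Product"}

# Generic utterances for PurchaseItemIntent, with slots written as {slot_name}.
TEMPLATES = [
    "i want to buy a {item}",
    "i want to buy {count} {item}",
]


def compile_template(template: str) -> "re.Pattern[str]":
    """Escape the literal text, then swap each {slot} for a named group using its entity's pattern."""
    pattern = re.escape(template)
    for slot_name, entity_name in SLOTS.items():
        placeholder = re.escape("{" + slot_name + "}")
        group = f"(?P<{slot_name}>{ENTITY_PATTERNS[entity_name]})"
        pattern = pattern.replace(placeholder, group)  # no-op if the slot isn't in this template
    return re.compile(pattern)


COMPILED = [compile_template(t) for t in TEMPLATES]


def fill_slots(utterance: str) -> Optional[dict[str, str]]:
    """Return {slot_name: spoken text} for the first template that matches, else None."""
    text = utterance.lower().rstrip(".!?")
    for regex in COMPILED:
        match = regex.fullmatch(text)
        if match:
            return match.groupdict()
    return None


print(fill_slots("I want to buy a shirt"))   # {'item': 'shirt'}
print(fill_slots("I want to buy 3 shirts"))  # {'count': '3', 'item': 'shirts'}
print(fill_slots("I want to buy a hat"))     # None
```

The slot values come back as the raw spoken text ("shirts", "3"); turning them into canonical values is a separate step.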
Now suppose the user says "I want to buy 7 shoes". What can we glean from this utterance?
- The user triggered the PurchaseItemIntent: they want to buy something.
- They want to buy 7 of whatever item.
- They want to buy shoes.
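The list above can be sketched in Python as a resolution step that takes the raw slot text (as a matcher might extract it) and turns it into typed, canonical values: the number 7 and the product "shoe". PurchaseRequest, resolve and the lookup tables are hypothetical names; vendors' built-in number entities may handle part of this conversion for you.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical resolution tables for the two entities used by PurchaseItemIntent.
PRODUCT_SYNONYMS = {
    "shirt": "shirt", "shirts": "shirt",
    "shoe": "shoe", "shoes": "shoe",
    "pants": "pants",
    "dress": "dress", "dresses": "dress",
    "jacket": "jacket", "jackets": "jacket",
}
NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3}


@dataclass
class PurchaseRequest:
    intent: str
    quantity: int
    product: str


def resolve(intent: str, raw_slots: dict[str, str]) -> Optional[PurchaseRequest]:
    """Turn raw slot text into typed values; return None if a value can't be resolved."""
    product = PRODUCT_SYNONYMS.get(raw_slots.get("item", "").lower())
    # "I want to buy a shirt" has no count slot, so default to "a" (one item).
    raw_count = raw_slots.get("count", "a").lower()
    quantity = int(raw_count) if raw_count.isdecimal() else NUMBER_WORDS.get(raw_count)
    if product is None or quantity is None:
        return None
    return PurchaseRequest(intent=intent, quantity=quantity, product=product)


# Raw slot text as a matcher might extract it from "I want to buy 7 shoes".
print(resolve("PurchaseItemIntent", {"count": "7", "item": "shoes"}))
# PurchaseRequest(intent='PurchaseItemIntent', quantity=7, product='shoe')
```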
With this information, we have everything we need to take the user's request to the next stage of processing: checking whether we have 7 shoes in stock, prompting for payment, and so on.
Hopefully you can see how slots and entities help us define our conversation: they present the user with options while giving us some leeway in how we define our utterances.