Interfaces based on conversation (speech interfaces, personal assistants, chatbots) are becoming more widespread because they rely on a form of communication that everyone knows: language. From an early age we learn to speak through dialogue with our parents, and from that point on we never stop using conversation (oral and then written). Speech interfaces are also quick and immediate: we speak faster than we write and, if we need specific information, it’s easier to ask a question and wait for the answer than to look for it ourselves. Even though this technology has vastly improved, there are still some limitations. A conversation with a bot, if not designed properly, might end up feeling forced and robotic. This risks disappointing the high expectations of users who, by asking a natural question, expect a natural response that is coherent with the context.
So how can we design a good conversation system?
First of all, we need to remember the principles of conversation: when we talk to someone (even a bot), we expect cooperation from the other party. This means that our conversation partner must be able to help us sustain the conversation, offering prompts and other ways to maintain a steady dialogue. If it doesn’t have a precise and direct answer to the query, it can draw on clues from the context to supply additional information, even if that information isn’t explicitly requested.
The tone, style and type of language used also make the conversation more or less natural. Except in very formal contexts, we model conversation on spoken language, not on written language. Whether we are talking to a voice assistant or chatting with a bot, we perceive the dialogue as a spoken interaction: quick, direct and immediately understandable.
Whenever possible, we expect our conversation partner to remember what was said before, so that we don’t have to repeat information and topics every time we want to add or modify something.
It is also fundamental to instruct the user on how to interact with the bot, giving some visual suggestions about what the interface can or cannot do. If no visual direction is provided, the user will be lost, with no instructions to refer to in order to continue. If the interface accepts and understands too few inputs, the user will feel limited and the interface will resemble an automatic response system more than a conversation.
Furthermore, it is necessary to be flexible and adapt to the communicative style of the user. For example, when the bot asks for data, some users provide all of the necessary details at once, while others wait to be asked one question at a time. The conversation needs to support both of these methods: in the first case the bot shouldn’t ask for the same thing twice, and in the second it should guide the user by asking for the missing information.
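One way to support both styles is to keep a record of the details already collected and ask only for what is still missing. The sketch below is a minimal illustration of that idea, not a production design: the table-booking scenario, the slot names, the questions and the deliberately naive regular expressions are all assumptions made for the example.

```python
import re

# Hypothetical details ("slots") for an illustrative table-booking bot.
REQUIRED_SLOTS = ("date", "time", "people")

# Deliberately naive patterns; a real bot would rely on a language
# understanding component rather than regular expressions.
PATTERNS = {
    "date": re.compile(r"\b(today|tomorrow)\b", re.I),
    "time": re.compile(r"\b(\d{1,2}(?::\d{2})?\s?(?:am|pm))\b", re.I),
    "people": re.compile(r"\bfor (\d+)\b", re.I),
}

QUESTIONS = {
    "date": "Which day would you like?",
    "time": "At what time?",
    "people": "For how many people?",
}


def extract_slots(utterance):
    """Return every detail that can be recognised in a single message."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            found[name] = match.group(1).lower()
    return found


def handle_turn(state, utterance):
    """Store what the user just said, then ask only for what is missing."""
    state.update(extract_slots(utterance))
    for slot in REQUIRED_SLOTS:
        if slot not in state:
            return QUESTIONS[slot]
    return "Booking: {people} people, {date}, {time}.".format(**state)
```

A user who says "A table for 4 tomorrow at 8pm" is never asked a follow-up question, while a user who only says "I'd like to book a table" is guided through one question at a time. Because `state` persists between turns, details already given are not requested again.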
To increase the quality of the conversation, you can programme the bot to recognise not only the different ways in which a person gives commands and asks for information, but also the moments when the user is giving feedback on how the conversation is going. Indeed, one of the moments in which the rigidity of the machine becomes apparent is when it provides information that does not correspond to the question asked. The user signals this with phrases such as “I don’t understand”, “That’s not what I meant” or “I didn’t ask that…”. A bot usually treats such a phrase as a new command, one that isn’t within its programming, and so replies with answers such as “I’m sorry. I didn’t understand” or “Please ask a question to begin”.
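A simple way to avoid this is to check each message for feedback before treating it as a command. The sketch below assumes a small, hand-written phrase list and a hypothetical `respond` function; a real bot would more likely use a trained intent classifier, but the ordering of the checks is the point.

```python
# Illustrative phrase list taken from the examples in the text.
FEEDBACK_PHRASES = (
    "i don't understand",
    "that's not what i meant",
    "i didn't ask that",
)


def normalise(text):
    """Lower-case the text and replace curly apostrophes with straight ones."""
    return text.lower().replace("\u2019", "'").strip()


def is_feedback(utterance):
    """True when the user is commenting on the conversation itself."""
    text = normalise(utterance)
    return any(phrase in text for phrase in FEEDBACK_PHRASES)


def respond(utterance):
    """Handle feedback first; return None to pass the message on as a command."""
    if is_feedback(utterance):
        return "Sorry, my last answer missed the point. Could you put your question another way?"
    return None
```

When `respond` returns `None`, the message is not feedback and can go to the normal command handling, so a complaint is never mistaken for a new request.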
Another issue to resolve in order to further improve the verbal interaction is making sure that the bot doesn’t misinterpret the user’s silence as a “yes”, “no” or “I don’t understand”. Just as in a real conversation, it is possible that the user wanted to say one of these things. However, it is also possible that they were interrupted by something else and so weren’t able to respond in time. In these cases, to get the conversation going again, the bot should offer some suggestions, but never in a pedantic way.
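The handling of silence can be sketched as a separate case that never falls through to "yes" or "no". The example assumes that the surrounding system passes `None` (or an empty string) when the user says nothing before a timeout; the function name, the wording of the prompts and the limit of two reminders are illustrative choices.

```python
def on_turn(reply, silent_turns, suggestions):
    """Decide what to do with a reply; returns (action, text, silent_turns).

    `reply` is None or empty when nothing was heard before the timeout.
    """
    if reply is not None and reply.strip():
        # The user said something: handle it and reset the silence counter.
        return ("handle", reply.strip(), 0)

    silent_turns += 1
    if silent_turns == 1:
        # First silence: gently offer a few options.
        hint = "Take your time. You can say: " + ", ".join(suggestions) + "."
        return ("prompt", hint, silent_turns)
    if silent_turns == 2:
        # Second silence: a shorter reminder, without repeating the list.
        return ("prompt", "I'm still here whenever you're ready.", silent_turns)
    # After that, stop prompting instead of nagging the user.
    return ("pause", "", silent_turns)
```

Silence is counted rather than interpreted, so the bot can offer help once or twice and then wait quietly.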
In summary, a good interface needs to know how to listen to and understand the user, calibrating its responses so that the information is sufficient but not overwhelming. Let’s remember that “fast forward” doesn’t exist in a voice interface, and that the user could become bored or feel that they are wasting time if forced to listen to very long lists.
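For long lists, one option is to read out a few items at a time and let the user decide whether to continue. The following sketch assumes a page size of three and invented wording; both would need tuning for a real voice interface.

```python
def read_out(items, offset=0, page_size=3):
    """Build a short spoken reply covering one page of a longer list."""
    page = items[offset:offset + page_size]
    if not page:
        return "There is nothing more to read out."
    remaining = len(items) - (offset + len(page))
    spoken = ", ".join(page)
    if remaining > 0:
        return f"{spoken}. There are {remaining} more. Would you like to hear them?"
    return f"{spoken}. That's all of them."
```

With a list of five cities, the first call reads three names and offers the other two; calling it again with `offset=3` reads the rest.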
Similarly, we must take care with the communicative tone, the choice of words and, if present, the tone of voice, because the user will associate a personality with the virtual speaker. These characteristics must therefore be chosen carefully so that they fall in line with the brand and communicate the right message.