Conversational AI (A Beginner’s Guide)

Sudarshan Sahu
Published in UX Planet
8 min read · Aug 2, 2021

What is Conversational AI?

Conversation is an essential part of human communication. It may be formal or informal talk between two or more people, in order to exchange information, feelings and emotions.

Conversational AI follows the same principle and builds an interaction between human and machine. It allows machines to understand, respond, and react as a person would in a real conversation, using human-like cognitive skills.

It enables machines to get the user input in the form of voice or text, understand and process the user’s intention and respond naturally in a way that mimics human conversation.

Common conversational AI technologies include chatbots, virtual agents, virtual assistants, digital assistants, and digital employees.

How Does Conversational AI Work?

Conversational AI uses various technologies such as Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Dialog Management.

Here is how Conversational AI works:

  • It starts with voice or text input from the user. The input may come through various channels, modalities, and languages.
  • For voice input, Automatic Speech Recognition (ASR) converts the spoken words into a machine-readable format: text.
  • Natural Language Understanding (NLU), a part of NLP, then turns the text into structured data and identifies the right context and language.
  • The application then queries databases and external APIs to retrieve the required information.
  • Next comes dialog management. It manages the response and converts it into a human-understandable format using Natural Language Generation (NLG).
  • Finally, the conversational AI application delivers the response either as text or, via text to speech, as voice.
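The flow above can be sketched as a small Python pipeline. This is only an illustration under stated assumptions: every name here (`understand`, `fetch_data`, `generate_response`, `FAKE_WEATHER_DB`) is hypothetical, keyword matching stands in for a trained NLU model, and the ASR step is left as a placeholder because it needs a real speech engine.

```python
from dataclasses import dataclass, field


@dataclass
class NLUResult:
    intent: str
    entities: dict = field(default_factory=dict)


def speech_to_text(audio: bytes) -> str:
    """Placeholder for ASR; a real system would call a speech model here."""
    raise NotImplementedError("plug in an ASR engine")


def understand(text: str) -> NLUResult:
    """Toy NLU: keyword matching stands in for a trained model."""
    words = text.lower().replace("?", "").split()
    if "weather" in words:
        # Treat the word after "in" as the city entity, if there is one.
        city = words[words.index("in") + 1] if "in" in words[:-1] else None
        return NLUResult("get_weather", {"city": city} if city else {})
    return NLUResult("unknown")


# Stand-in for a database or external API (made-up data).
FAKE_WEATHER_DB = {"paris": "sunny", "london": "rainy"}


def fetch_data(result: NLUResult) -> dict:
    city = result.entities.get("city")
    return {"city": city, "forecast": FAKE_WEATHER_DB.get(city)}


def generate_response(result: NLUResult, data: dict) -> str:
    """Toy dialog management and NLG: pick a template and fill it in."""
    if result.intent != "get_weather":
        return "Sorry, I did not understand that."
    if data["city"] is None:
        return "Which city do you mean?"
    if data["forecast"] is None:
        return f"I have no forecast for {data['city'].title()}."
    return f"It is {data['forecast']} in {data['city'].title()} today."


def handle_turn(user_text: str) -> str:
    """One text turn: NLU, data lookup, then response generation."""
    result = understand(user_text)
    data = fetch_data(result)
    return generate_response(result, data)


print(handle_turn("What is the weather in Paris?"))
```

For voice input, the text passed to `handle_turn` would come from the ASR step, and the returned string would go to a text-to-speech engine.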

Components of Conversational AI:

1. User Input: It starts with voice or text input from the users. Users provide input through various channels, modalities, and languages.

2. Input analysis: In this stage the input is converted into a structured form that is easily readable by the machine.

If the input is voice-based, the conversational AI solution uses Automatic Speech Recognition (ASR) followed by natural language understanding (NLU) to understand the context and the user’s intention. For text-based input, it uses only natural language understanding to analyze the user’s intention.

ASR
Automatic speech recognition (ASR) takes human voice as input and converts it into readable text. The voice input passes through several steps, such as conversion to a spectrogram, a neural acoustic model, and a decoder (with a language model), to produce the output transcript.

NLU
Natural language understanding (NLU) takes text as input and works out its context, intent, and entities so that a response can be generated.

In this stage the input text goes through data processing (tokenization, parsing, stemming) and a pre-trained language model like BERT or ELMo to recognise the correct context.

Below are key terms used in NLU

a. Speech tagging

Tokenization is a way of separating a piece of text into smaller units called tokens. Here, tokens can be either words, characters, or subwords.
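To make the three kinds of tokens concrete, here is a minimal sketch in plain Python. The function names and the tiny subword vocabulary are made up for illustration; production systems typically use trained tokenizers rather than simple rules like these.

```python
import re


def word_tokens(text: str) -> list:
    # Words and punctuation marks become separate tokens.
    return re.findall(r"\w+|[^\w\s]", text)


def char_tokens(text: str) -> list:
    # Every character is its own token.
    return list(text)


def subword_tokens(word: str, vocab: set) -> list:
    """Greedy longest-match split against a given subword vocabulary."""
    pieces, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No known piece starts here: fall back to a single character.
            pieces.append(word[start])
            start += 1
    return pieces


print(word_tokens("Book a flight, please!"))
print(char_tokens("hi"))
print(subword_tokens("unhappiness", {"un", "happi", "ness"}))
```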

Stemming removes the suffix from a word and reduces it to its root. For example, for the word running, it removes the suffix ing and keeps run.

Lemmatization is the process of converting a word to its base form. In lemmatization, the root word is called the lemma. For example, runs, running, and ran are all forms of the word run, so run is the lemma of all these words.

Lemmatization considers the context and converts the word to its meaningful base form, whereas stemming just removes the last few characters.
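The difference shows up clearly in a toy comparison. The sketch below is an assumption-laden illustration, not a real stemmer or lemmatizer: `SUFFIXES` and `LEMMAS` are tiny hand-written tables, whereas real tools use full rule sets, dictionaries, and part-of-speech information.

```python
SUFFIXES = ("ing", "ed", "s")

# Hypothetical mini-lexicon for illustration only.
LEMMAS = {"runs": "run", "running": "run", "ran": "run", "better": "good"}


def naive_stem(word: str) -> str:
    """Strip the first matching suffix; no linguistic knowledge involved."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            stem = word[: -len(suffix)]
            # Undo a doubled final consonant, e.g. "runn" -> "run".
            if len(stem) > 2 and stem[-1] == stem[-2] and stem[-1] not in "lsz":
                stem = stem[:-1]
            return stem
    return word


def toy_lemmatize(word: str) -> str:
    """Look the word up in the lexicon; fall back to the word itself."""
    return LEMMAS.get(word, word)


for w in ("runs", "running", "ran"):
    print(w, naive_stem(w), toy_lemmatize(w))
```

The stemmer leaves ran untouched because there is no suffix to strip, while the lemmatizer maps it to run because it knows the word.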

Text segmentation in NLP is the process of transforming text into meaningful units like words, sentences, different topics, the underlying intent and more.

Syntactic analysis, also referred to as syntax analysis or parsing, is the process of analyzing natural language with the rules of a formal grammar.

Semantic analysis is the process of understanding the meaning and interpretation of words, signs and sentence structure. The word “semantic” is a linguistic term and means “related to meaning or logic.”

Syntax is the grammatical structure of the text, whereas semantics is the meaning being conveyed.

b. Deep Neural Network

Bidirectional Encoder Representations from Transformers (BERT)
It makes use of the Transformer, an architecture built on an attention mechanism that learns contextual relations between words (or sub-words) in a text. In its original form, the Transformer includes two separate mechanisms: an encoder that reads the text input and a decoder that produces a prediction for the task. BERT uses only the encoder.
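As a rough illustration of what an attention mechanism computes, here is scaled dot-product attention in plain Python. It is a simplified sketch: in BERT the queries, keys, and values come from learned projections of the token embeddings and there are many attention heads, while here the raw toy vectors are used directly.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy "word" vectors attending to each other (self-attention).
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(vectors, vectors, vectors)
print(weights[0])
```

Each row of `weights` sums to 1 and says how much one word draws on every other word when building its contextual representation.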

Embeddings from Language Models (ELMo)
It is a deep contextualized word representation that models both the characteristics of word use (e.g. syntax and semantics) and how these uses vary across linguistic contexts. It is trained as a language model, learning to predict the next word in a sentence, and its representations can boost the accuracy of models that build on it.

3. Dialogue management: During this stage, Natural Language Generation (NLG), a component of NLP, formulates a response. It handles sentence aggregation and grammatical structuring, and converts the result into a human-understandable format.

Natural Language Generation converts structured data into plain text, writing the sentences and paragraphs for you much as a human would.
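The simplest form of NLG is template filling. The sketch below assumes a hypothetical order record with `id`, `items`, and `eta` fields and shows structured data being turned into a sentence, with a small aggregation step for the item list. Real NLG systems are far more flexible; this is only meant to show the idea.

```python
def join_items(items):
    """Aggregate a list into one phrase: 'a', 'a and b', 'a, b and c'."""
    items = list(items)
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def render_order_status(order: dict) -> str:
    """Turn a structured record into a sentence (hypothetical field names)."""
    noun = "item" if len(order["items"]) == 1 else "items"
    return (
        f"Your order {order['id']} with {len(order['items'])} {noun} "
        f"({join_items(order['items'])}) will arrive on {order['eta']}."
    )


order = {"id": "A17", "items": ["shoes", "socks", "a belt"], "eta": "Friday"}
print(render_order_status(order))
```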

4. Text to Speech (TTS)
This stage takes the text response generated by the NLG stage and converts it into natural-sounding speech using a speech synthesis neural network and a neural vocoder model.

Why Conversational AI? (Business Value)

The rapid growth of conversational AI serves customer experience (CX) and user experience (UX) at the same time, from higher customer satisfaction (CSAT) to quick, always-available service.

a. From Business PoV

For businesses across various industries, conversational AI is a cost-efficient solution to create a smarter omnichannel experience for their customers. It enables you to meet customers’ needs and address their requests while reducing cost and effort.

Drive more revenue (by increasing sales and engagement)
Conversational AI tools enable businesses to provide real-time information to their end-users, leading to improved customer experience, increased customer loyalty, and additional revenue through referrals.

Reduce Cost
As virtual assistants handle more customer interactions, providing customer assistance via conversational interfaces can reduce business costs around salaries and training, especially for small- or medium-sized companies. Virtual assistants can respond instantly, providing 24-hour availability to potential customers.

Adaptable
Highly scalable, available in a variety of languages, and integrates seamlessly.

Cross-Channel
Provides self-service across popular channels, endpoints, and interactive voice response (IVR) systems.

Quick deployment
The virtual assistant can be deployed within 20–24 hours in any given environment for any channel and source.

b. From Customer PoV

Engaging
Available 24/7. Reduces ticket raising, callbacks, and wait time. Provides timely, accurate information based on the customer’s query.

Ease of Use
Anyone can interact with the business in an easy and natural way. There is no need to search, fill in a form, or raise a ticket.

Accessibility
Unlike other forms of interaction, conversations are the preferred medium for people with diverse technical knowledge and physical abilities. Applications based on conversational user interfaces could be a technological gateway to a wide variety of audiences.

Key Terms frequently used in conversational AI

Utterance
A complete phrase or sentence that the user says or types. The virtual assistant uses it to find the user’s goal.

Intent (Verb)
It is the intention behind the communication: a specific goal or need of the user in pursuing a conversation with a virtual assistant.

Entity (Noun)
Entity is the additional information needed from the user to provide relevant output. An entity in a chatbot is used to add values to the intent and make context more specific.

Context
The context gives us control over what happens in the conversation. Contextual features help shape the response according to the need and environment.
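These four terms fit together naturally as data structures. The following sketch is illustrative only: the class names, the `book_flight` intent, and its required slots are all assumptions, and the intent and entities are filled in by hand where a real assistant would get them from its NLU model.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedUtterance:
    text: str      # what the user said or typed
    intent: str    # the user's goal, usually verb-like
    entities: dict = field(default_factory=dict)  # noun-like details


@dataclass
class Context:
    slots: dict = field(default_factory=dict)

    def update(self, parsed: ParsedUtterance) -> None:
        # Remember entities so that later turns can leave them out.
        self.slots.update(parsed.entities)


# Hypothetical: the details each intent needs before it can be fulfilled.
REQUIRED_SLOTS = {"book_flight": ["destination", "date"]}


def missing_slots(intent: str, context: Context) -> list:
    return [s for s in REQUIRED_SLOTS.get(intent, []) if s not in context.slots]


context = Context()
first = ParsedUtterance("Book a flight to Delhi", "book_flight", {"destination": "Delhi"})
context.update(first)
print(missing_slots("book_flight", context))  # the assistant should ask for these

second = ParsedUtterance("Tomorrow", "book_flight", {"date": "tomorrow"})
context.update(second)
print(missing_slots("book_flight", context))
```

The second utterance only makes sense because the context remembers the intent and destination from the first turn.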

Features of Conversational AI

Human-like conversation
The virtual assistant must understand the user’s goals, no matter how complex the sentence, and be able to ask questions to remove ambiguity or discover more about the user. It needs memory to reuse key pieces of information throughout the conversation for context or personalization purposes, and it must be able to bring the conversation back on track when the user asks off-topic questions.

Understands Sentiment
Conversational AI should understand the emotion and tone of the user and provide an appropriate response that leads to high user satisfaction. This can include spotting key emotional triggers, handing over to a human agent, and using tone and sentiment detection to improve conversational flows.
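A very rough way to picture sentiment detection with human handover is a word-list score. The lexicon, the function names, and the threshold below are all made up for illustration; real systems use trained sentiment models, and a word list like this cannot handle things such as sarcasm.

```python
import re

# Tiny hypothetical lexicon for illustration only.
NEGATIVE = {"angry", "terrible", "useless", "worst", "frustrated"}
POSITIVE = {"great", "thanks", "perfect", "love", "helpful"}


def sentiment_score(text: str) -> int:
    """Positive words add one point, negative words subtract one."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def should_hand_over(text: str, threshold: int = -2) -> bool:
    """Escalate to a human agent when the message is strongly negative."""
    return sentiment_score(text) <= threshold


for message in ("Thanks, that was helpful", "This is the worst, I am so frustrated!"):
    print(message, sentiment_score(message), should_hand_over(message))
```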

Connect to enterprise systems
It allows flexible integration with external systems, web services, and CRM so that the assistant can execute and resolve tasks.

Reinforced learning
The application should learn from the experience to deliver a better response in future interactions.

Data Ownership and Analytics
Huge amounts of data are exchanged between the virtual assistant and the user. Users’ individual preferences, views, opinions, feelings, and so on are all part of the conversation. This information can then be used to:

  • Make the conversation relevant and more engaging
  • Train and maintain your conversational AI chatbot interface
  • Analyze data to deliver actionable business insights

Collaborate with human agent
The virtual assistant should observe and learn from human agents and apply what it has learned in future conversations.

Social Talk
Apart from domain expertise, the virtual assistant should have some social talk skills, for example small talk scripted in AIML (Artificial Intelligence Markup Language), to make the conversation more human-friendly.

Dialog turn management
Conversations often consist of many twists and turns. Conversational AI allows developers to create complex, fluid, and dynamic dialogs.

Context Switching
Conversational AI allows virtual assistants to remember key details from past dialogs, user information, preferences, and more, so they can easily switch from one context to another based on the user’s need and remind users of their original request.
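One simple way to model context switching is a stack of open tasks: a new topic is pushed on top, and when it is finished the assistant reminds the user of the task underneath. The `DialogStack` class below is a hypothetical sketch of that idea, not how any particular platform implements it.

```python
class DialogStack:
    """Keeps interrupted tasks so the assistant can return to them."""

    def __init__(self):
        self._tasks = []

    def start(self, task: str) -> None:
        # A new topic goes on top; earlier tasks stay underneath.
        self._tasks.append(task)

    def finish(self):
        """Close the current task; return a reminder for the previous one."""
        if self._tasks:
            self._tasks.pop()
        if self._tasks:
            return f"Shall we get back to {self._tasks[-1]}?"
        return None

    @property
    def current(self):
        return self._tasks[-1] if self._tasks else None


stack = DialogStack()
stack.start("booking your flight")
stack.start("checking the weather")  # the user switches topic mid-booking
print(stack.current)
print(stack.finish())
```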

Conversational AI Use Cases

Conversational AI applications can be applied across any department and industry. Below are a few of them.

Customer Support
Virtual assistants can be used as a point of contact for customer support queries and provide 24/7 support without wait time. Being machines with the needed domain knowledge, they provide quick and accurate answers to customer problems. They are good at answering FAQs. If the assistant’s confidence in its answer falls below a set threshold, it hands control over to a human. However, it constantly learns from its past mistakes to improve the quality of its answers in the future and avoid such situations.
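The FAQ-with-handover behaviour can be sketched as follows. Everything here is an assumption for illustration: the two FAQ entries are invented, word overlap stands in for the confidence score a trained model would produce, and the 0.5 threshold is arbitrary.

```python
# Made-up FAQ entries for illustration.
FAQ = {
    "what are your opening hours": "We are open 9am to 6pm, Monday to Friday.",
    "how do i reset my password": "Use the 'Forgot password' link on the login page.",
}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets, a crude stand-in for model confidence."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def answer(question: str, threshold: float = 0.5):
    """Return ('bot', reply) or ('handover', message) based on confidence."""
    cleaned = question.strip(" ?!.")
    best_q = max(FAQ, key=lambda q: similarity(cleaned, q))
    confidence = similarity(cleaned, best_q)
    if confidence < threshold:
        # Not confident enough: pass the conversation to a person.
        return "handover", "Let me connect you to a human agent."
    return "bot", FAQ[best_q]


print(answer("What are your opening hours?"))
print(answer("Can I pay with crypto?"))
```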

Employee Productivity
A virtual assistant can resolve HR, finance, and IT service issues for employees through an enterprise channel. It integrates with ERP and CRM solutions to extract the right information and reduce human effort.

Customer-facing interaction
With AI capability, the virtual assistant can act as a digital employee for the business and help in various sectors like banking, insurance, travel, healthcare, e-commerce, and many more.

Challenges of Conversational AI Solutions

Security and privacy
As conversational AI collects and deals with enormous amounts of user data, it is vulnerable to security breaches. Conversational AI apps must be designed with high security and privacy standards to build trust among end-users and increase usage over time.

Discovery and adoption
Conversational AI is used by the general population, so it should be simple to use and consistent for everybody. There are still industries where users need to learn how to use this new technology. It should simplify their tasks and build a better user experience.

Language input
Processing user input (voice or text) can be a challenge for conversational AI because factors such as dialects and background noise affect communication. Slang, accents, and unscripted language can also create issues with handling the information.

Emotion, mood and sarcasm make it hard for conversational AI to understand properly and react accordingly.

Lead Product Designer @ Flipkart. Building Conversational AI Builder and Business Process Automation (RPA)