Designing for Voice
Is Voice the next frontier in designing Human-Computer Interactions?
We started with simple text-based computers.
Then came graphical user interfaces and the mouse.
Next came touchscreens and pinch-to-zoom gestures, popularized by the Apple iPhone.
Are we now moving into the voice era?
In this era of Apple Siri, Google Home, Microsoft Cortana, and Amazon Alexa, it is natural to ask how voice compares with other forms of user interface.
The main differences are listed below.
- Text versus Natural Language versus Gestures: Traditional screen-based interfaces, which are usually graphical, take input as typed text or mouse actions, whereas voice interfaces take natural language and convert it into a command. Some interfaces, such as those on the Sony PlayStation, use gestures, which are neither text nor natural language. There are also touch-based interfaces and experimental thought-based interfaces, in which our thoughts control actions on a machine. Good examples of touch-based interfaces are security devices such as fingerprint readers and the touch interface for any surface developed at Carnegie Mellon University. A loose example of the thought-based kind is the lie detector, which infers a person's state from physiological signals.
- Privacy: Voice interfaces typically offer little privacy and work poorly in noisy environments. Text, touch, and graphical user interfaces can be made private, especially with screen filters on phones, large tablets, or laptop screens.
- Speed: Human beings are inherently visual. A well-designed visual interface can speed up human-machine interaction, whereas voice can still be slow.
Source: Chris Harris/Carnegie Mellon University, USA
What makes voice user interfaces interesting is that they let people interact with machines naturally. Some examples are:
- The Smart Home: Typical home tasks such as checking the time, setting alarms, adjusting the temperature, and controlling lights become easy to manage with simple voice commands.
- The Smart Vehicle: In a car, a voice interface gives the driver hands-free access to information and preserves safety by keeping their attention on the road.
- Phone-based applications: When banking by phone or calling a customer service agent, being verified by voice recognition, without passwords or security questions, is a relief for many customers. Citibank, for example, lets customers in specific markets be verified by voice on the call.
While this is what we may desire, let’s see what the current offerings are in the market.
Google launched Google Home in 2016, after Apple had launched Siri and Amazon had launched Alexa. It is essentially a home automation device that lets up to six users control lights, play music and news, or run Google searches.
Microsoft began developing a voice-enabled digital assistant called Cortana back in 2009. It is now available in cars as well as on Windows devices. Third-party manufacturers have integrated it into speakers, but Microsoft’s own offering is essentially a software virtual assistant. Cortana also drew attention for predicting the outcomes of 2014 FIFA World Cup matches, including Germany’s.
Around the time Microsoft began work on Cortana, Apple acquired Siri, which had started as a third-party app on the iOS App Store, and launched it as a built-in assistant in 2011. Siri is very similar in functionality to its competitors: it offers typical voice commands, navigational aids, sports news, and so on. However, Apple’s strict stance on privacy has limited Siri’s potential.
In 2014, Amazon launched Alexa, a virtual assistant similar to Cortana and Siri that lets users make to-do lists, order food, play music, and check the weather and news. Alexa later allowed users to customize and extend the device’s capabilities through APIs.
The most significant issues with these offerings are incomplete support for languages and accents, and the lack of universal access across operating systems, form factors, and hardware devices.
The offerings also have some security holes and privacy issues. Most notably, in one reported incident an Alexa device recorded a couple’s private conversation and sent it to another person without their knowledge.
On the other hand, Apple’s Siri is limited by the strict privacy constraints that Apple imposes, and Google Home products can struggle to recognize non-native accents.
So how does one design differently for Voice? Here are a few guidelines that we have:
- The KISS principle: Keep commands short and simple so that the machine can reliably recognize them from the users in your organization. For example, if you are designing a voice interaction to start a shop floor machine, we recommend “Start Machine M1” or “Start M1” rather than a step like “Start Machine”, which is vague when there are multiple machines and requires a second interaction.
- Accommodate accents: A common frustration is that the machine cannot follow English spoken with different accents. For example, Google Home can fail to recognize the names of Bulgarian composers. We recommend testing commands with a variety of accents, genders, and ages, and starting with English only.
- Involve human support agents quickly: If the human-machine interaction doesn’t go well, there has to be a way for another human to step in, call the user, and help with their use case. So far, we have not seen any implementation that does this effectively. Alexa comes close but still leaves some room for improvement.
- Incorporate AI and Natural Language: Lastly, we as humans need some flexibility in our user experiences. While the KISS principle is sound, some flexibility through AI and natural language processing will make the experience friendlier and more accessible.
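To make these guidelines concrete, here is a minimal Python sketch of how a shop floor assistant might apply them to text that a speech recognizer has already transcribed. Everything in it is an illustrative assumption rather than part of any real product: the verb synonyms, the machine IDs, the `parse_command` and `VoiceSession` names, and the two-failure threshold for handing off to a human. The small synonym table stands in for the flexibility that real natural language processing would provide.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabulary: verbs, synonyms, and machine IDs are illustrative.
ACTION_SYNONYMS = {
    "start": "start", "launch": "start",
    "stop": "stop", "halt": "stop",
}
KNOWN_MACHINES = {"M1", "M2", "M3"}
MAX_FAILED_ATTEMPTS = 2  # assumed threshold before a human agent takes over


@dataclass
class Command:
    action: str
    machine: str


def parse_command(transcript: str) -> Optional[Command]:
    """Return a Command only if one utterance names both an action and a machine."""
    words = re.findall(r"[a-z0-9]+", transcript.lower())
    action = next((ACTION_SYNONYMS[w] for w in words if w in ACTION_SYNONYMS), None)
    machine = next((w.upper() for w in words if w.upper() in KNOWN_MACHINES), None)
    if action is None or machine is None:
        return None  # vague command such as "Start Machine"
    return Command(action, machine)


class VoiceSession:
    """Counts failed attempts and signals when to hand off to a human agent."""

    def __init__(self) -> None:
        self.failed_attempts = 0

    def handle(self, transcript: str) -> str:
        command = parse_command(transcript)
        if command is not None:
            self.failed_attempts = 0
            return f"OK: {command.action} {command.machine}"
        self.failed_attempts += 1
        if self.failed_attempts >= MAX_FAILED_ATTEMPTS:
            return "ESCALATE: connecting you to a human agent"
        return "RETRY: please name the action and the machine, e.g. 'Start M1'"
```

With this sketch, “Start M1” and “Could you please launch M2 now?” both resolve in a single interaction, while “Start Machine” is rejected as vague and a second failure triggers the handoff. Transcripts collected from speakers with different accents could be replayed through `parse_command` to check which phrasings survive recognition.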
It is still unclear whether voice interfaces will replace traditional graphical, touch, or text-based interfaces, but they will augment them with many new use cases and help business users become more productive at work as well as at home. The critical business outcomes would be time savings, more productive employees, and potentially more commerce via food delivery, home repair, and faster business process execution.