Heuristic Evaluation in the age of AI-driven Tech

A quick and cheap way of getting usability evaluation done efficiently without ignoring AI.

Published in UX Planet · 14 min read · Feb 6, 2020

TL;DR: Nielsen’s heuristic evaluation has served us well for decades as a way to evaluate interfaces. This blog revisits the technique and presents insights into its relevance in the AI age.

Someone suggests: “Let’s evaluate your user interface (UI)!” 😁

And you say/think: “What a flying waste of time and resources!” 😒

Now that I have probably read your mind, I have some good news to share with you. This can be done efficiently within 2 hours, by yourself or with your team, and the only resources you need are some paper, or your computer if you don’t even have a printer at your office. Above all, it’s constructive and fun, so let’s go for it.

If you’re still not convinced and think you are too busy for it, ask yourself this: Have I ever wasted 2 hours in a useless meeting that achieved nothing? If so, can I really claim that I do not have 2 hours to spare? Do I really want to launch an Alpha or Beta version and go through the humiliation of users or clients telling me that I have a broken path in the experience or a spelling mistake elsewhere? We all know the answers to these rhetorical questions.

Background

Usability tends to be a tricky term to define. After years of working on this topic, I feel happy with this very brief description: ease of learning and usage. In practice, this means that users of a specific app can quickly achieve their tasks, and feel satisfied, within a given context. There are different aspects of usability that we inherently expect to find in the software that we use. These range from the software’s performance, efficiency, errors, flexibility, control, robustness, ease of learning and mental effort to reuse, to the help and assistance provided and other similar traits. Clearly, this is all pretty fluid and difficult to quantify, which is why Nielsen proposed this landmark method between 1990 and 1994.

“Did you just suggest that I should use a technique that is three decades old?” 🤔 Yes, and it’s more relevant than ever before. During this past decade, we witnessed an unprecedented focus on Artificial Intelligence (AI) in software. The heuristic evaluation technique allows us to make sure that whatever we are building today is built on the fundamental principles that got us here. I am personally a fan of this technique since it allows for sufficient flexibility, therefore securing its own survival.

The Heuristic Evaluation approach falls within the category of informal or discount methods of evaluating an interface. It is labelled as such since it requires hardly any investment or resources. All you need to do is inspect the UI that you have already prototyped, following a transparent and efficient process.

“Does this mean I’m evaluating the UX of my app?” Well, the UI is a subset of UX, but it’s not the same thing. You might feel more trendy if you use UX instead of UI, but that would mislead you and anyone who knows what she/he is doing. UX is the User Experience of your product, so basically, it includes the experience before you got to know about a product (a), its usage within a context (b) and your experience after using it (c). The UI belongs to phase (b) of your UX so, briefly, the UI is not the UX. I have some other blog entries explaining what UX is and, if you’re intrigued about this difference, feel free to check them out.

There are other informal evaluation techniques, such as Cognitive Walkthrough. In this post, I’m focusing on Heuristic Evaluation.

What is the Heuristic Evaluation Process?

TL;DR: A team of experts (or just yourself) evaluates a given UI by making use of a set of heuristics.

This is a highly informal process that can be fun and also an excellent way for your team to share views during an easy task. Having said that, if this is a solo-venture, you can do it on your own.

The more evaluators on board, the higher the probability of finding issues. Having said that, keep in mind that larger groups are more difficult to manage :) This cool diagram is from the Interaction Design Foundation.

The team members on this task don’t need to have extensive experience in UI or its evaluation. They just need to know the process and be confident with the heuristics described below.

Note: this process is carried out internally and should not involve the end-users.

The motivation of this process is for the evaluators to get a feel of the flow while discovering the UI. It also conveys the general scope of the system. In my experience with startups, this is super handy early in the process to make sure that everyone involved gets a feel of what the entire initiative is all about. Moreover, it also helps the whole team to have a solid sense of ownership of the product.

This blog entry answers all these questions:

What do we have to do?
How do we grade the ‘violations’?
How do we classify the ‘violations’?
What do we need to do all of this?

What do we have to do?

This is how I carry out this fast and cheap process in practice:

Choose who is going to join you. These people don’t need to be experts; however, they should have some appreciation of the software engineering process. It is not advised that you recruit users to do this. Use it as an opportunity to bond with your team. 👥
Allocate between 45 minutes and 2 hours for you and whoever (if available) will join the process. There are no rules but, naturally, the more screens you have, the more time you need. ⏱️
Print the screens of your interface. Yes, print on paper. You should do this offline, so you can focus and not waste time. Phones in airplane mode would help you make sure that you are focusing and sticking to the allocated time. Once done, organise the screens on a single flat surface such as a table, whiteboard, notice board or even the floor if you really want to go low cost! 🗒️📴
Link the screens in the form of a sitemap. I am assuming that you went through the design of the flow before coming up with the actual interface. I have witnessed horror stories of wannabe pro teams who built their UI on gut feeling and ended up thrilling the very same guts two months down the line. 🕸️
Individually, look at one screen at a time, noting any violations. Each one has to be classified under a heuristic and graded with a severity level, as explained below. 🧐
Repeat this for the entire flow 2 or 3 more times, still individually and silently. 🤐
Get together and exchange notes from the process. Agree on a single set of violations with an agreed level of severity. 🙌
Sort out an action plan to address the violations. ☑️

How do we grade the violations?

At the end of the process, we want to end up with a list of issues for every screen. To make these easier to tackle, a sense of priority is needed. For this reason, Nielsen introduced the idea of severity. These are the numerical gradings he suggests:

1 = Cosmetic Problem Only: Need not be fixed unless extra time is available on the project.

2 = Minor Usability Problem: Fixing should be given low priority.

3 = Major Usability Problem: Essential to fix, so should be given high priority.

4 = Usability Catastrophe: Imperative to fix this problem before the product is released.

Theoretically, there is also a zero grade, which means that the finding is not considered a usability problem at all. This is mostly useful when you’re going through the violations as a team.
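If you prefer a spreadsheet or a script to sticky notes, here is a minimal sketch of how the individual notes could be recorded and merged before the team discussion. Everything in it is my own illustration rather than part of Nielsen's method: the `Violation` fields, the `consolidate` helper, grouping notes by screen and heuristic, and proposing the higher median of the individual grades as a starting point. The team still has the final say on the agreed severity.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import IntEnum
from statistics import median_high


class Severity(IntEnum):
    """The 0 to 4 severity grades described above."""
    NOT_A_PROBLEM = 0
    COSMETIC = 1
    MINOR = 2
    MAJOR = 3
    CATASTROPHE = 4


@dataclass(frozen=True)
class Violation:
    """One note taken by one evaluator (hypothetical structure)."""
    evaluator: str
    screen: str
    heuristic: int  # 1 to 10, matching the list of heuristics
    note: str
    severity: Severity


def consolidate(violations):
    """Merge individual notes into one list, most severe first.

    Assumption: notes on the same screen under the same heuristic
    describe the same issue. The proposed grade is the higher median
    of the individual grades, meant only as a conversation starter.
    """
    groups = defaultdict(list)
    for v in violations:
        groups[(v.screen, v.heuristic)].append(v)

    merged = []
    for (screen, heuristic), items in groups.items():
        proposed = median_high([v.severity for v in items])
        if proposed == Severity.NOT_A_PROBLEM:
            continue  # a zero grade drops the issue from the action plan
        merged.append({
            "screen": screen,
            "heuristic": heuristic,
            "severity": proposed,
            "notes": [f"{v.evaluator}: {v.note}" for v in items],
        })
    return sorted(merged, key=lambda m: m["severity"], reverse=True)
```

Calling `consolidate` on everyone's notes gives you a draft list to argue about when you get together, with the catastrophes at the top.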

How do we classify the violations?

Yes, this is where we (finally) explore the 10 heuristics! Below follows a comprehensive list of my favourite examples. These are either remarkable masterpieces of UX or tragedies that we should avoid. Let’s dive in.

1- Visibility of System Status

Imagine you are uploading a critical document such as an assignment, a paper or tax returns a few minutes before a hard deadline (no judgement). The experience of the upload can be torturous, and the confirmation of a successful upload can be a great relief. The UI has to be there for the users in times like these, as well as in less stressful scenarios.

We shouldn’t have to ask any software “what’s going on in there?”. Users need to be informed about what is happening inside. This heuristic was introduced in 1994, and its intent is for the interface to provide visibility into how the system is working.

A clean and simple concept of providing feedback during uploading.

AI Tip: Today, in the age of AI, we expect AI to be explainable (XAI). The UI happens to be an essential tool to communicate this explainability, and this is a very relevant contemporary interpretation of this heuristic.

2- Match between the System and the Real World

The system should speak neither binary machine code nor the engineers’ way of structuring things. It has to speak only the user’s language. This ranges from using simple graphics to carefully crafted error messages. The system has to feel as natural as possible.

Icons and graphics used to describe or list features should be clear enough for users to understand what each feature does without much explanation. You can find this set and more over here.

AI tip: personalisation and context are expected in today’s software. We all expect our devices to know us and know us well. Today’s technology provides excellent opportunities for this heuristic to deliver beautifully tailored experiences.

3- User Control and Freedom

What happens when the user screws up? Who is in control at that point? The right answer is “the user”. As a designer, think about creating enough emergency exits for whenever the user gets into a tight or unwanted situation.

Giving freedom to users and a way to manage their own organisation of content is very important. Check out this WP content builder.

While it may be annoying at times for the users themselves, it is important to give an experience of control over important decisions related to meaningful content such as user data.

AI Tip: With enough data about the different states of your app, the sitemap of your app and goals of the user, you can design an intelligent algorithm to advise about the most efficient emergency exit out of this situation.
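Before reaching for anything fancy, the simplest version of this idea is a shortest-path search over your sitemap. The sketch below is only an illustration under my own assumptions: the sitemap is a dictionary of screens and the screens reachable from each, `safe_screens` are the places you consider a safe landing (for example the home screen), and the function name `shortest_exit` is made up. A real recommender would also weigh the user's goal and usage data.

```python
from collections import deque


def shortest_exit(sitemap, current, safe_screens):
    """Return the shortest list of screens from `current` to any safe screen.

    `sitemap` maps each screen to the screens reachable from it in one step.
    Breadth-first search guarantees the fewest steps. Returns None when
    no safe screen can be reached.
    """
    if current in safe_screens:
        return [current]

    queue = deque([[current]])
    visited = {current}
    while queue:
        path = queue.popleft()
        for nxt in sitemap.get(path[-1], []):
            if nxt in visited:
                continue
            if nxt in safe_screens:
                return path + [nxt]
            visited.add(nxt)
            queue.append(path + [nxt])
    return None


# Hypothetical sitemap of a small shop app
sitemap = {
    "home": ["cart", "profile"],
    "cart": ["checkout", "home"],
    "checkout": ["payment", "cart"],
    "payment": ["checkout"],
}
print(shortest_exit(sitemap, "payment", {"home"}))
```

If the answer is a long path, that in itself is a finding for this heuristic: the screen probably deserves a more direct emergency exit.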

4- Consistency and Standards

The best way to learn how to use an app is not needing to learn it in the first place! If the app is developed consistently with other applications, it is easy to grasp and achieve the goal it is intended to deliver. This is perfectly executed in the Microsoft Office suite. If, for example, you’re using Word, the ribbon is very similar to that of Excel. Most tabs are the same, and then there are application-specific tabs that are different. This renders the experience across different apps seamless: users do not need to learn every app from scratch and can therefore focus only on the functionality that differs.

Screenshots of MS Word (top) and MS Excel (bottom). Note how the first 3 tabs and the last 2 tabs are identical in both cases so the user can feel a sense of familiarity across different apps.

AI Tip: Adaptable interfaces are becoming more popular. If your application is adjusting the UI elements to specific users, it is nonetheless advised that it maintains a certain level of consistency across the adaptation. The adaptation should also follow some general external consistency with other applications.

5- Error Prevention

Prevention is better than cure, so there is no “additional” AI tip here. I strongly suggest you exploit AI and machine learning to predict when the user will potentially run into an error and avoid it before it happens.

This approach can also use regular expressions to detect any inconsistencies in situations such as form-filling.
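As a rough illustration of the regular-expression side of this, here is a small sketch that checks a sign-up form before it is submitted. The email pattern is deliberately loose (something, an @, something, a dot, something) and the password rules are assumptions of mine, not a standard; the function name `check_signup_form` is hypothetical too.

```python
import re

# Deliberately loose: text, "@", text, ".", text, with no spaces or extra "@"
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def check_signup_form(email, password):
    """Return a dict of field name -> friendly message for each problem found."""
    problems = {}

    if not EMAIL_PATTERN.fullmatch(email.strip()):
        problems["email"] = "This email address does not look right. Is there a typo?"

    # Assumed password rules, spelled out so users know exactly what is missing
    missing = []
    if len(password) < 8:
        missing.append("at least 8 characters")
    if not re.search(r"[A-Z]", password):
        missing.append("an uppercase letter")
    if not re.search(r"\d", password):
        missing.append("a digit")
    if missing:
        problems["password"] = "Your password still needs: " + ", ".join(missing) + "."

    return problems
```

An empty dictionary means the form can go ahead. Anything else can be shown next to the relevant field while the user is still typing, which is the whole point of prevention.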

Assistance during form filling is nowadays expected. This is an interesting case where the designer takes it for granted that the users know the general structure of an email and the message is that there is something wrong with the address. However, they take a very detailed approach when it comes to the quality of the password.

6- Recognition rather than Recall

You meet a classmate of yours from 20 years back. You recognise him immediately, and he recognises you too. The conversation is pleasant and friendly but you cannot recall his name! It’s ok and perfectly normal. It is easier for us to recognise a pattern, figure or a face, but it is more difficult for us to recall specific related information such as a name. This more demanding memory effort, known as ‘recall’, is what makes us dislike examination questions that are based on recalling or regurgitating information.

The goal is, therefore, to minimise the memory load on the user when carrying out tasks using your UI. The elements, objects, actions and options need to be clearly presented. With the right presentation and design thought, you can also do without long instructions.

The difference between various fonts can be difficult to recall, especially when the list is extensive. This was cleverly solved by writing the font name in its actual font so users (such as me) can recognise the font without the need for a lot of trial and error during the choice.

AI Tip: Through the usage data of your app, you can understand where users are repeating specific useless patterns to reach a goal or finish a task. By applying the right machine learning techniques, you can identify cases where users are feeling lost. Once there, you can either redesign to eliminate the flaws (ideal) or build further intelligence within your software to guide users out of such situations. On the other hand, if you manage to correctly classify what adds to this memory load, you can also build an intelligent program that highlights potential issues before they happen, hence taking the design process to the next level.
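You do not need a model to get a first signal here. A plain counting baseline, which is what I sketch below instead of actual machine learning, already surfaces sessions where someone keeps bouncing between the same screens. The session format (an ordered list of screen names) and the thresholds are assumptions for the sake of the example.

```python
from collections import Counter


def repeated_patterns(session, length=2, min_repeats=3):
    """Find screen sequences of `length` steps that occur at least `min_repeats` times.

    `session` is the ordered list of screens one user visited.
    Returns a dict mapping each repeated sequence (a tuple) to its count.
    """
    windows = (
        tuple(session[i:i + length])
        for i in range(len(session) - length + 1)
    )
    counts = Counter(windows)
    return {pattern: n for pattern, n in counts.items() if n >= min_repeats}


# Hypothetical session: the user keeps going back and forth
session = ["home", "settings", "profile", "settings",
           "profile", "settings", "profile", "help"]
print(repeated_patterns(session))
```

Patterns that show up across many sessions are good candidates for a closer look during your next evaluation round, and they can later serve as input features if you do move on to a learned model.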

7- Flexibility and Efficiency of Use

A tool does not change its form depending on the skill of the user. However, a well-designed tool still needs to be usable for different users with different abilities. This principle is particularly important in the experience design that goes beyond the UI. Here’s how:

Consider the most cross-application and cross-platform feature: copy/cut/paste. There are many ways to carry out these fundamental functions:

Through the “Edit” menu;
Through buttons in toolbars or ribbons;
Through the context menu presented as you right-click;
or by using the keyboard shortcuts!

These different ways of doing the same thing (aka Accelerators) allow users with varying levels of experience to carry out exactly the same task but according to their skill. The keyboard shortcut approach is naturally the fastest; however, one cannot expect someone who has just learnt about the existence of computers to make use of it. In that case, it would probably be more comfortable to go through menu items or click the button on the toolbar with a pair of scissors and follow the metaphor.

Another way to copy and paste? (GIF-Source)

AI Tip: monitoring whether users are exploiting alternative ways of achieving tasks can help you deliver a better experience through your app. With the right data, you can learn which types of users are not exploiting all the functionality you put in your app and gently guide them towards making more efficient use of your software.
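To make this a bit more concrete, here is a small sketch of the monitoring part. It assumes, purely for illustration, that your app logs each copy/cut/paste as a `(user, method)` pair where the method is `"menu"`, `"toolbar"`, `"context"` or `"shortcut"`. The function names and thresholds are mine.

```python
from collections import defaultdict


def shortcut_share(events):
    """Return, per user, the fraction of actions done via keyboard shortcut."""
    totals = defaultdict(int)
    shortcuts = defaultdict(int)
    for user, method in events:
        totals[user] += 1
        if method == "shortcut":
            shortcuts[user] += 1
    return {user: shortcuts[user] / totals[user] for user in totals}


def users_to_nudge(events, threshold=0.2, min_events=3):
    """List users with enough activity who rarely use shortcuts."""
    counts = defaultdict(int)
    for user, _ in events:
        counts[user] += 1
    shares = shortcut_share(events)
    return sorted(
        user for user, share in shares.items()
        if counts[user] >= min_events and share < threshold
    )


# Hypothetical event log
events = [
    ("ana", "menu"), ("ana", "menu"), ("ana", "toolbar"),
    ("ben", "shortcut"), ("ben", "shortcut"), ("ben", "menu"),
]
print(users_to_nudge(events))
```

What you do with that list is a design decision: a discreet tooltip showing the shortcut next to the menu item is usually gentler than a pop-up.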

8- Aesthetic and Minimalist Design

I’ll keep this as minimalist as possible: your interface should only contain needed and relevant information and elements. Otherwise, users will feel distracted and drift away from the intended purpose of the app.

A fresh and recent concept that follows this line of thought is that of “The best interface is no interface”. I strongly suggest you have a look at it.

All you need if you are searching for information. Moreover, Google also introduced the idea of taking you directly to the results page as soon as you start typing, keeping the process even simpler. Fun Fact: did you know that Google has an entire design team to keep on improving this interface? It’s actually hard work.

AI Tip: you can use straightforward machine learning techniques to better understand what users need most and how to create a path between the landing screen and the task achievement. Optimise towards reaching that as efficiently as possible with minimal clutter along the way.

9- Help users recognise, diagnose and recover from errors

OK, no matter how hard we try, there will always be some error that pops up. One of the fascinating aspects of UX design is the way we design for the recovery from errors.

There are different levels at which one can look at this. The most obvious is the avoidance of cryptic errors that can only be understood by a handful of programmers, so the language has to be the same as the user’s. The correct choice of language on its own does not entirely solve the problem. “Catch-all” errors that fire up when any of a range of issues takes place should also be avoided since they’re ultimately not helpful. When errors are presented to users, I suggest the following:

A clear, easy to understand and relevant message that explains the situation.
A solution for the user to fix the problem.
Ideally, a link or reference to what caused the error.
A neatly presented error code in case the user would need to escalate it with customer care. The error code is only (and only) there to complement the process and must not replace the rest. It is extremely useful when contacting support since the error code can assist agents to swiftly understand what the issue is and assist in recovery.
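The four ingredients above translate nicely into a small data structure. This is only a sketch of how I might model it; the class name `UserFacingError`, its fields and the example values are hypothetical. The one rule it enforces is the last point: the error code complements the message and the solution, so neither of those may be left blank.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserFacingError:
    message: str                       # what happened, in the user's language
    solution: str                      # what the user can do about it
    code: str                          # short reference for customer care
    details_url: Optional[str] = None  # optional link to what caused the error

    def __post_init__(self):
        # The code must never replace the explanation or the way out
        if not self.message.strip() or not self.solution.strip():
            raise ValueError("An error needs both a message and a solution.")

    def render(self) -> str:
        """Build the text shown to the user, with the code at the very end."""
        lines = [self.message, self.solution]
        if self.details_url:
            lines.append(f"More details: {self.details_url}")
        lines.append(f"Error code: {self.code}")
        return "\n".join(lines)


# Hypothetical example
error = UserFacingError(
    message="We could not save your document.",
    solution="Check your internet connection and press Save again.",
    code="DOC-104",
)
print(error.render())
```

Keeping errors in one structure like this also makes them easy to review in a heuristic evaluation: you can list every message in the app and check each one against the points above.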

An example from Google’s People and AI Research team (PAIR) on good practice for errors and graceful failures in the PAIR Guidebook.

Besides all this, if you really and really care about the UX rather than only the UI, you can get creative and spice up the experience that would otherwise be quite dull and negative. Here’s a classic from Google’s Chrome:

This is what in my opinion is an outstanding way to present errors, offer further detail, suggest a recovery and above all help the user relax when everything goes wrong. If this made you feel like a quick break, you don’t have to break your Internet for one…just play it here: https://chromedino.com

AI Tip: besides assistance discussed in the next point and assuming that error prediction and avoidance failed, there is still room for AI to assist in this situation. An error needs to be seen as a state in the app that needs to be efficiently changed, and the user steered away from it. AI can help by walking the user through the error recovery process by also explaining what is going on within the app at the right dose or level of detail.

10- Help and Documentation

“Who needs this anyway?” said every newbie software engineer. Well, we all do. While the process of providing documentation and help resources can be tedious, it is essential, especially in high-stakes situations.

Providing help tooltips and the right level of documentation delivers an experience that feels safe and increases the users’ confidence when using your app. It also demonstrates a level of deep thought about what you designed and built, improving users’ trust in you.

Tooltips still offer a strong opportunity for an excellent experience. Check out this repo with an implementation for React.

Voice User Interfaces (VUIs) play a crucial role in delivering a helpful experience. On the other hand, it is also vital to ensure that the actual commands of the VUI are clear to the user. To that end, make sure that users can simply ask the app or device to outline more information or commands.

VUI is rendering help and documentation more ubiquitous (GIF Source)

AI Tip: with the dramatic and definite improvement in AI conversational technology, in-app assistants can efficiently support the user as issues happen and provide a systematic approach to recovering from the error. The data collected about errors and recoveries can also offer a variety of machine learning opportunities.

So how can I get started?

If you just invested a handful of minutes to go through this blog, you probably have all the motivation you need to get started with this method. This process takes an hour or two to complete.

I am also sharing a slide-based reporting template that I personally use to carry out this evaluation.

Below are some suggested steps:

⏳ Allocate between an hour and two hours for you and a colleague (if available) during which you can focus on this exercise.
📲 Plan a commonly agreed task (or tasks) or sequence of screens of the app being evaluated and ensure it is clear for all evaluators.
🔎 Individually, go through the flow visiting one screen at a time. You may use any medium but I prefer printed screens for better focus.
📝 For every usability violation encountered, record it in a fresh slide in my template or any other way that you prefer.
👫 If there is more than one evaluator, get together and share notes.
🚦 Assign a priority to every violation based on the severity levels agreed upon.
🛠️ Agree on a set of changes based on these conclusions.

There is no maximum number of times for which you can repeat this process. Keep in mind that this process is intended as an internal exercise to rid your app of any usability issues before you release it. From my experience, anywhere between two and four iterations are enough to get a polished version out and actually get proper feedback from users.

Happy designing for the age of AI!