Cooking with a Voice Assistant
Cooking is a common activity in most households: not only a necessity, but also a pleasure and a hobby that many people take up. However, the interaction between technology and the cooking process is not always smooth. To ease that interaction, we propose a cooking app with a voice assistant to help in the kitchen when we are busiest. The goal of including a voice user interface (VUI) is to let users operate devices (such as a phone) hands-free while cooking. Users can consult recipes, ingredients, steps and technique videos through either the voice assistant or the phone, so they can focus on the cooking rather than on how to touch the phone without dirtying it.
You can also see a summary of this project in this HTML Project Page that I've made!
State of the Art
The literature comprises resources on Voice User Interfaces, products similar to a voice assistant with a cooking/reading-instruction function, and the demographics interested in such systems.
One of the first academic materials we took into consideration was ‘Designing Voice User Interfaces: Principles of Conversational Experiences’ (Pearl, 2016). The book acts as a basis for our investigation into VUIs and presents basic design principles, as well as motivations for choosing voice technologies. It also informed us about the different types of voice interfaces and their particular use scenarios; it is important to distinguish between, for example, command-and-control and conversational designs (pp. 32-38). The author, Cathy Pearl, is a design professional and manager at Google with practical experience of VUIs, which lends weight to a book devoted specifically to designing Voice User Interfaces.
Lastly, in order to understand our demographics, we reviewed cooking habits and behaviours amongst our target audience. The focus remained on young adults who are interested in learning how to cook, as well as more experienced people who simply wish to improve their cooking process. We reviewed the findings of ‘Prevalence and socio-demographic correlates of cooking skills in UK adults: cross-sectional analysis of data from the UK National Diet and Nutrition Survey’ (Adams et al., 2015), in which the authors conclude that cooking skills developed early are quite important for maintaining a healthy lifestyle and preventing obesity.
Voice Assistants Business Canvas
The rate of smart speaker adoption has been growing rapidly: by 2018, one in four American adults used a voice-enabled smart speaker, and penetration has continued to climb since. While smart speakers are a catalyst for voice adoption, they are not the only driver of growth.
Based on this research, we initially developed a Lean Canvas. We then went on to do some initial user interviews and refined the canvas with some tools such as persona and user journey map. In particular, the 'Customer', 'Problem' and 'Solution' sections were revised, as shown in the canvas below.
A new Consumer Intelligence Research Partnership analysis found that Apple's HomePod accounts for only 6% of smart speaker devices installed in the US (CIRP, 2019). The HomePod is more expensive and limited compared to Amazon Echo and Google Home (Echo Plus 2G = $149; Apple HomePod = $349). And Apple has recently changed its strategy around smart speakers by allowing Amazon's Alexa to activate Apple Music (Feiner, 2019).
These figures suggest that the high cost of device installation strongly affects whether consumers can experience, and spend on, our services. We therefore drew on Amazon's business model for voice assistants, at the core of which is the razor-and-blade model (Uenlue, 2018): build a broad installed base with low (or no) margins on hardware devices to achieve profitability later. This also lets us acquire installed users and capture data that can improve speech recognition through machine learning.
Design plan and Evaluation Process
Based on the key issues listed and the user needs from our previous research, we proposed our design goal: to help target users safely and quickly make dishes that they have never cooked or are not familiar with, using voice interaction and other interfaces to improve the experience of the whole cooking process. Specifically:
(1) The user touches the screen as little as possible;
(2) The communication between the user and the voice interface should be consistent with the environment in the kitchen;
(3) The product can accurately convey information.
We used the double-diamond principle to organise the main tasks in our design process, placing each planned task in the corresponding stage of the diagram. In the Discover stage, we conducted academic research, initial interviews, persona creation and competitive product analysis. In the Define problem stage, we made the user journey map and business model canvas, then planned further user research and some sketches. In the Develop design idea and Deliver solution design stages, we define the user scenarios and personas and build the lo-fi prototypes through ongoing testing and iterative cycles.
After our initial literature review, we decided to run a survey to gather information about people’s opinions on cooking in general and cooking aided by a voice assistant. We designed the survey to include twelve (12) questions, with the last question reserved for further information. The aim of the survey was to gather general information about our potential users and their attitudes and capabilities in relation to cooking and technology.
We received a total of sixty-six (66) responses. From those, the majority (84.8%) were in the age range of 18-25 years old, followed by 14% over 25 years old. In terms of cooking skill level, there was an almost even split between beginner level (being able to follow basic recipes with little to no skill) and intermediate level (more complex recipes with varied technical skills). This gives us insight into the kind of recipes that should be included in the app, as well as the kind of technical videos we might need to provide.
When asked about the frequency of cooking, most people cook at least a couple of times a week and try a new recipe a few times per month. When it comes to finding and following recipes, the trend shows that users like to find recipes on speciality websites or in cookbooks, alongside social platforms and recipes recommended by friends. Most prefer a technological medium to follow the recipe (phone, tablet or laptop), some like physical books with instructions or being taught in person by friends. If provided with a digital medium, 80% of our respondents chose to watch or listen to video tutorials if they are offered by the platform.
Personas and User Journeys
Narrowing in on the ‘Define’ stage of the design process, we interpreted the results provided by the research and moved on to composing user personas and a potential user journey. We will first discuss the user personas and the considerations we took into account. The main distinction between the two personas is their cooking skill level. We defined Johnny as:
- 21 years old, an undergraduate student in London who loves video games and playing the guitar;
- Has little time to cook, but enjoys the process and always wants to learn something new; driven by curiosity, he often finds new recipes on his phone or laptop;
- Is at beginner cooking level, but is motivated to learn something new so he can have his friends over for dinner;
- Hardly uses voice assistants at all;
- Pain points he encounters while cooking include dirty hands when trying to check the recipe, suddenly realising he is missing an ingredient, and struggling to juggle multiple tasks efficiently.
This is the user journey associated with Johnny. He tried a French toast recipe for the first time, following a YouTube video he saw recently.
Our second user is Jane, who is:
- 24 years old, a financial consultant for a company in London who loves cooking, gardening and a good session of yoga;
- Cooks almost every day and also enjoys trying out new recipes, which she finds and follows on her phone;
- Is at intermediate skill level, but wants to get more creative in the kitchen;
- Uses voice assistants to check the weather, make calls or play music.
This is the user journey associated with Jane.
During the user research we distilled several user needs: not knowing what to eat, having no cooking experience, needing to manage one's own ingredients, needing an assistant while cooking, and needing accurate, well-timed cooking.
We then designed four key functions based on these requirements.
1. Today: solves the needs of users who don't know what to eat and offers them inspiration;
2. Search: improved search, including voice search, to help users quickly find the dishes they want to make;
3. Shopping list: helps users manage their ingredients and organise shopping lists;
4. Cooking: helps users cook better by combining voice and finger tapping, including functions such as timing, minimising the window and starting another dish.
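To make the four functions above concrete, the sketch below models how a recipe and a shopping list grouped by recipe might be represented. All names here (`Recipe`, `ShoppingList`, the example fields) are hypothetical illustrations, not our actual implementation:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Recipe:
    """A recipe as shown on the details page (hypothetical model)."""
    name: str
    minutes: int                  # required time
    difficulty: str               # e.g. "beginner", "intermediate"
    ingredients: list[str] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)


@dataclass
class ShoppingList:
    """Ingredients to buy, viewable as one flat list or grouped by recipe."""
    items: dict[str, list[str]] = field(default_factory=dict)  # recipe -> ingredients

    def add_recipe(self, recipe: Recipe) -> None:
        # "Shopping list" function: add a recipe's ingredients in one tap
        self.items[recipe.name] = list(recipe.ingredients)

    def all_ingredients(self) -> list[str]:
        # "View all" mode: a flat, de-duplicated list across recipes
        seen: list[str] = []
        for ingredients in self.items.values():
            for item in ingredients:
                if item not in seen:
                    seen.append(item)
        return seen


toast = Recipe("French toast", minutes=15, difficulty="beginner",
               ingredients=["bread", "eggs", "milk"])
shopping = ShoppingList()
shopping.add_recipe(toast)
```

Grouping the list by recipe keeps both viewing modes (all ingredients vs. per recipe) available from one data structure.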
Lo-fi and Hi-fi Prototype
The low-fidelity prototype is divided into four main functions, presented in the tab bar as Home, Search, My Recipe and Shopping list.
The Home page mainly addresses the pain point that users don't know what to eat. The Today Recipes section therefore takes up the largest proportion of the page: it is the most prominent recommendation and reduces the user's cost of choosing. Since the personas position our main users as cooking novices, the second major section is Easy Meal, with entry points such as 15-minute meals, 3-ingredient meals and 5-step meals. In addition, there are sections such as Latest Recipes and What to cook this week.
The recipe details page can be reached in both of the ways described above. On the details page, the user can view the required time, ingredients, difficulty and other information, enter the video tutorial, and like, favourite or share the recipe. Favourited recipes are displayed in the My Recipe section.
The video tutorial is entered from the recipe details page. To accommodate the needs of different user groups, the tutorial page can be switched freely between horizontal and vertical orientation; the user first sees the tutorial page in portrait.
At the bottom of the page, the user can switch between steps and view each step's description and required ingredients. When users need to time something during cooking, they can tap to select a duration. The horizontal layout works similarly to the vertical one. If the user returns to the details page while a timer is running, the tutorial is minimised on screen and the remaining time is displayed. Up to two recipes can be timed at the same time.
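The "up to two recipes timed at once" rule described above can be sketched as a small manager class. This is a hypothetical Python illustration of the behaviour, not code from the prototype:

```python
class CookingTimers:
    """Holds at most two concurrent recipe timers, as in the prototype."""

    MAX_TIMERS = 2

    def __init__(self) -> None:
        self.active: dict[str, int] = {}  # recipe name -> remaining seconds

    def start(self, recipe: str, seconds: int) -> bool:
        # Refuse a third timer, mirroring the two-timer limit in the UI
        if recipe not in self.active and len(self.active) >= self.MAX_TIMERS:
            return False
        self.active[recipe] = seconds
        return True

    def tick(self, elapsed: int) -> list[str]:
        """Advance all timers by `elapsed` seconds; return recipes whose time is up."""
        done = []
        for recipe in list(self.active):
            self.active[recipe] -= elapsed
            if self.active[recipe] <= 0:
                done.append(recipe)
                del self.active[recipe]
        return done
```

A third `start` call simply fails, which is how the UI would signal that both timer slots are occupied.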
The last functional partition is the Shopping list. On the recipe details page, the required ingredients can be added to the shopping list to record what to buy. In the Shopping list, users can view all the ingredients they want to buy, or view them by recipe. Tapping the small checkbox marks an ingredient as bought and moves it to the Purchased block. In addition, tapping Edit in the upper right corner allows items to be deleted from the list in batches.
In the high-fidelity prototype, we made some adjustments after some development and evaluation tests.
We chose orange as the theme colour, and mainly used rounded-corners cards to display the content.
The homepage will remind users to use the voice interaction function, emphasising the main features of the application.
In the video tutorial page, the timer function becomes a draggable floating button that can be tapped to start timing. While a timer is running, the remaining time is displayed, and expanding the floating button reveals the timer details (the name of the recipe, and pause and stop controls). If the user leaves the page while a timer is running, the floating button remains visible on other pages along with the number of active timers; it can hold two timers at most. After expanding it, the user can jump straight to the corresponding recipe page, or turn off the timer.
We first did a tree test for our information architecture. After we quickly sketched out the main pages, we followed up with more pages to serve the different functions of the prototype. We wanted to evaluate the information architecture early in the process because the rest of our organisational structure depended on it. The questions were designed to gather information about the paths users took to reach both main and alternative pages. The four questions we defined were:
1. If you want to cook an easy meal, which one would you choose to find a recipe?
2. If you want to make Sushi, which one would you choose?
3. If you want to check all the ingredients you need to buy, which one would you choose?
4. If you want to start to cook your saved recipes, which one would you choose?
Three of the questions had up to two possible correct paths. The goal of this task was to evaluate the structure and wording of our information architecture.
The tree test was followed by a first click test on a lo-fi prototype of the application. Based on the feedback received on the tree test, we re-evaluated the information architecture and went on to design a low fidelity prototype of the main functions of the application. This too we tested early on to ensure we could move forward with a higher-fidelity prototype. The goal of this test was to evaluate how users feel about the functions of the prototype when faced with a visual representation of the app.
We asked participants to click where they thought they would find the answer to the questions we posed. To account for accidental clicking near the correct area, the surface for correct clicking was quite big. Where multiple widgets would lead to correct answers, all correct areas were accounted for.
We plan to invite 3 target users to run the prototype on iPhone 10 to test it in an actual cooking situation. The tasks to be tested are:
1. the number of times the user touches the screen (compared with before using our product);
2. the time it takes the user to complete a new dish through the prototype test;
3. the rate at which the voice support correctly understands and executes user commands;
4. whether the user knows when to speak and what to say to the voice guide (tested at several points in the flow).
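The third measure could be computed from simple session logs of (command, outcome) pairs. The function and the sample session below are hypothetical, included only to make the metric concrete:

```python
def recognition_rate(commands: list[tuple[str, bool]]) -> float:
    """Share of spoken commands the voice support understood and executed.

    Each entry is (command_text, executed_correctly).
    """
    if not commands:
        return 0.0
    correct = sum(1 for _, ok in commands if ok)
    return correct / len(commands)


# Hypothetical log from one cooking session
session = [
    ("next step", True),
    ("set a timer for five minutes", True),
    ("minimise the window", False),  # misheard over kitchen noise
]
```

Averaging this rate over the three test sessions would give a single headline figure for task 3, while the per-command log still shows which phrases fail most often.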
The evaluation of the information architecture provided us with insight into possible problems with the structure and wording of our app. Every question ran into problems in finding the correct pathway. For example, the question “If you want to start to cook your saved recipes, which would you choose?” had only a 50% success rate. Half of the participants chose options that, although not entirely wrong, reflected the lack of clarity in the terms we used to name the pages, as well as the phrasing of the question.
Similarly, for the question “If you want to cook an easy meal, which one would you choose to find a recipe?”, the success rate was only 27.3%. Once again, it was clear that we had not made the page names, or the intent of the questions, clear enough. We also received a comment on the clarity of names. This feedback prompted us to eliminate the page called “Voice Cooking” because it seemed to confuse people. We also shuffled the location of other subpages to be more cohesive in the context of our structure.
First click test
The evaluation for the first click test was mostly positive and showed us that we were on the right path. Most of the tasks were successfully finished, with only one task at 60% failure rate. Compared to the information architecture, people were much more successful at completing the tasks with a visual reference.
The task with the highest failure rate was related to the button placement for minimising the window while cooking. Because there were two possible icons to show a minimising action (one for the tab and one for the cooking steps), users became confused about the meaning of the icons. Subsequently, we replaced the window-minimising icon with one more commonly used and moved the dropdown of steps so as not to confuse users.
At the beginning, we did not intend to set a wake-up word: we wanted the product to listen to the environment continuously while the user cooked, so the user could interact with it directly and naturally. However, testing in the actual scenario revealed many unexpected sound inputs, including cooking noise and conversation with other people, so we decided to add a wake-up word.
When using Protopie, we ran into problems with voice wake-up. First, a self-created name cannot be set as the wake-up word: for example, the name 'Alexa' will not be recognised because Protopie's corpus has no data for that word.
Secondly, waking the prototype on a wake-up word requires keeping the device listening continuously. Because of these limitations of the prototyping software, we used a trigger button instead of a wake-up word: clicking the button is equivalent to the user speaking the wake-up word. In addition, we replaced the semantically more accurate word 'pause' with 'stop', a word that Protopie recognises more easily.
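The behaviour we approximated with the trigger button can be described as a simple gate: input is ignored until the wake-up word (or the button press standing in for it) is heard. A minimal Python sketch, with a hypothetical wake word and method names:

```python
from __future__ import annotations


class WakeWordGate:
    """Ignore audio input until the wake word (or trigger button) fires."""

    def __init__(self, wake_word: str = "hello chef") -> None:  # hypothetical wake word
        self.wake_word = wake_word
        self.listening = False

    def press_trigger_button(self) -> None:
        # Protopie workaround: pressing the button is equivalent
        # to the user speaking the wake-up word
        self.listening = True

    def hear(self, utterance: str) -> str | None:
        """Return a command to act on, or None if the input is ignored."""
        text = utterance.lower().strip()
        if not self.listening:
            if text == self.wake_word:
                self.listening = True  # wake up; next utterance is a command
            return None  # cooking noise and side conversations are dropped
        self.listening = False  # handle one command per wake-up
        return text
```

Requiring a fresh wake-up before each command is the same trade-off we made in the prototype: less natural than continuous listening, but robust against the kitchen's background noise.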
In the second test conducted with the voice prototype, we found that users could use that prototype smoothly and complete the cooking.
Our goal was to investigate the use of voice user interfaces for a cooking app and design a functional prototype that could simulate this interaction as smoothly as possible. If we had more time, we would continue to develop the hi-fi prototype, as well as look into development possibilities and alternative tools for VUIs. Overall, the results of the prototype were positive and worth looking at in more detail in further investigations.