Deceptive Design in Voice Interfaces: Impact on Inclusivity, Accessibility, and Privacy

Posted by Saumyaa Naidu and Shweta Mohandas at Aug 08, 2023 03:22 PM

Filed under: Internet Governance, Privacy

This article was commissioned by the Pranava Institute, as part of their project titled Design Beyond Deception, supported by the University of Notre Dame - IBM's Tech Ethics Lab. The article examines the design of voice interfaces (VIs) to anticipate potential deceptive design patterns in them. It also presents design and regulatory recommendations to mitigate these practices.

The original blog post can be accessed here.

Introduction

Voice Interfaces (VIs) have come a long way in recent years and are easily available as inbuilt technology in smartphones, as downloadable applications, or as standalone devices. In line with growing mobile and internet connectivity, there is now an increasing interest in India in internet-based multilingual VIs, which have the potential to enable people to access services that were earlier restricted by language (primarily English) and interface (text-based systems). This interest has led even global voice applications such as Google Home and Amazon’s Alexa to become available in Hindi (Singal, 2019), alongside the growth of multilingual voice bots for certain banks, hotels, and hospitals (Mohandas, 2022).

The design of VIs can have a significant impact on the behavior of the people using them. Deceptive design patterns, or design practices that trick people into taking actions they might otherwise not take (Tech Policy Design Lab, n.d.), have gradually become pervasive in most digital products and services. Their use in visual interfaces has been widely criticized by researchers (Narayanan, Mathur, Chetty, and Kshirsagar, 2020) and has also been the subject of recent policy interventions (Schroeder and Lützow-Holm Myrstad, 2022). As VIs become more relevant and mainstream, it is critical to anticipate and address the use of deceptive design patterns in them. This article, based on our learnings from the study of VIs in India, examines the various types of deceptive design patterns in VIs and focuses on their implications in terms of linguistic barriers, accessibility, and privacy.

Potential deceptive design patterns in VIs

Our research findings suggest that VIs in India are still a long way off from being inclusive, accessible, and privacy-preserving. While there has been some development in multilingual VIs in India, their compatibility has been limited to a few Indian languages (Mohandas, 2022; Naidu, 2022). The potential of VIs as a tool for people with vision loss and certain cognitive disabilities such as dyslexia is widely recognized (Pradhan, Mehta, and Findlater, 2018), but our conversations suggest that most developers and designers do not consider accessibility when conceptualizing a voice-based product. This leads to interfaces that do not understand non-standard speech patterns, or that have only text-based privacy policies (Mohandas, 2022). Inaccessible privacy policies full of legal jargon, along with the lack of regulations specific to VIs, also make people vulnerable to privacy risks.

Deceptive design patterns can be used by companies to widen these gaps in VIs. As with visual interfaces, the affordances and attributes of VIs can determine the ways in which they can be used to manipulate behavior. In recent research, Kentrell Owens et al. lay out six unique properties of VIs that may be used to implement deceptive design patterns (Owens, Gunawan, Choffnes, Emami-Naeini, Kohno, and Roesner, 2022). Expanding upon these properties, and drawing from our research, we look at how they can be exacerbated in India.

Making processes cumbersome

VIs are often limited by their inability to share large amounts of information through voice. They thus operate in combination with a smartphone app or a website. This can be intentionally used by platforms to make processes such as changing privacy settings or accessing the full privacy notice inconvenient for people to carry out. In India, this is experienced while unsubscribing from services such as Amazon Prime (Owens et al., 2022). Amazon Echo Dot presently allows individuals to subscribe to an Amazon Prime membership using a voice command, but directs them to use the website in order to unsubscribe from the membership. This can also manifest in the form of canceling orders and changing privacy settings.

VIs follow a predetermined linear structure that ensures a tightly controlled interaction. People make decisions based on the information they are provided with at various steps. Changing their decision or switching contexts could involve going back several steps. People may accept undesirable actions from the VI in order to avoid this added effort (Owens et al., 2022). The urgency to make decisions at each step can also cause people to make unfavorable choices such as giving consent to third-party apps. The VI may prompt advertisements and push for the company’s preferred services in this controlled conversation structure, which the user cannot side-step. For example, when being set up on any device, the Google voice assistant nudges people to sign into their Google account. This means the voice assistant gets access to their web and app activity and location history at this step. While the data management of Google accounts can be tweaked through the settings, this may get skipped in a linear set-up flow. Voice assistants can also push people to opt into features such as ads personalisation, default news sources, and location tracking.

Making options difficult to find

Discoverability is another challenge for VIs. This means that people might find it difficult to discover available actions or options using just voice commands. This gap can be misused by companies to trick people into making undesirable choices. For instance, while purchasing items, the VI may suggest products that have been sponsored and not share full information on other cheaper products, forcing people to choose without complete knowledge of their options. Many mobile-based voice apps in India use a combination of images or icons with the voice prompts to enable discoverability of options and potential actions, which excludes people with vision loss (Naidu, 2022). These apps comprise a voice layer added to an otherwise touch-based visual platform, so that people understand and navigate through all available options using the visual interface, and use voice only for purposes such as searching or narrating. This means that these apps cannot be used through voice alone, which puts people with vision loss at a disadvantage.

Discreet integration with third parties

VIs can use the same voice for varying contexts. In the case of Alexa, Skills, which are apps on its platform, have the same voice output and invocation phrases as its own in-built features. End users find it difficult to differentiate between an interaction with Amazon and one with Skills, which are third-party applications. This can cause users to share information with third parties that they otherwise would not have (Mozilla Foundation, 2022). There are numerous Amazon Skills in Hindi, and people might not be aware that the developers of these Skills are not vetted by Amazon. This misunderstanding can create significant privacy or security risks if Skills are linked to contacts, banking, or social media accounts.

Lack of language inclusivity

The lack of local language support, colloquial translations, and accents can lead to individuals not receiving clear and complete information. A VI’s failure to understand certain accents can also make people feel isolated (Harwell, 2018). While voice assistants and even voice bots are available in a few Indic languages in India, the default initial setup, privacy policies, and terms and conditions are still in English. The translated policies also use literary language, which is difficult for people to understand, and miss out on colloquial terms. This could mean that the person might not have fully understood these notices and hence not have given informed consent. Such use of unclear language and unavailability of information in Indic languages can be viewed as a deceptive design pattern.

Making certain choices more apparent

The different dimensions of voice such as volume, pitch, rate, fluency, pronunciation, articulation, and emphasis can be controlled and manipulated to implement deceptive design patterns. VIs may present the more privacy-invasive options more loudly or clearly, and the more privacy-preserving options more softly or quickly. They can also use tone modulations to shame people into making a specific choice (Owens et al., 2022). For example, media streaming platforms may ask people to subscribe to a premium account to avoid ads at normal volume and mention the option to keep ads at a lower volume. Companies have also been observed to discreetly integrate product advertisements in voice assistants using tone. SKIN, a neurotargeting advertising strategy business, used a change of tone of the voice assistant to suggest a dry throat to advertise a drink (Chatellier, Delcroix, Hary, and Girard-Chanudet, 2019).
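
One way a design team might audit for this pattern is to compare the volume and speaking rate assigned to each option within a single prompt. The following is a minimal sketch under stated assumptions: the `SpokenOption` type, its fields, and the tolerance values are hypothetical and do not correspond to any VI platform's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpokenOption:
    """Hypothetical description of how one choice is read out to the user."""
    label: str
    volume_db: float  # gain relative to the assistant's normal volume
    rate: float       # speaking rate, where 1.0 is the normal rate


def find_prosody_imbalance(options, max_volume_gap_db=1.0, max_rate_gap=0.05):
    """Return the labels of options that are spoken noticeably softer than the
    loudest option, or faster than the slowest option, in the same prompt.
    The default tolerances are illustrative assumptions."""
    if not options:
        return []
    loudest = max(o.volume_db for o in options)
    slowest = min(o.rate for o in options)
    flagged = []
    for o in options:
        too_soft = loudest - o.volume_db > max_volume_gap_db
        too_fast = o.rate - slowest > max_rate_gap
        if too_soft or too_fast:
            flagged.append(o.label)
    return flagged


# Example modelled on the streaming scenario described above.
prompt = [
    SpokenOption("subscribe to premium", volume_db=0.0, rate=1.0),
    SpokenOption("keep ads", volume_db=-6.0, rate=1.2),
]
print(find_prosody_imbalance(prompt))  # ['keep ads']
```

A check like this only covers volume and rate; dimensions such as emphasis or tone would still need human review.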

The attribution of gender, race, class, and age through stereotyping can create a persona of the VI for the user. This can extend to personality traits, such as an extroverted or an introverted, docile or aggressive character (Simone, 2020). The default use of female voices with a friendly and polite persona for voice assistants has drawn criticism for perpetuating harmful gender stereotypes (Cambre and Kulkarni, 2019). Although there is an option to change the wake word “Alexa” in Amazon’s devices, certain devices and third-party apps do not work with another wake word (Ard, 2021). Further, projection of demographics can also be used to employ deceptive design patterns. For example, a VI persona that is constructed to create a perception of intelligence, reliability, and credibility can have a stronger influence on people’s decisions. Additionally, the effort to make voice assistants as human-sounding as possible, without letting people know that they are not human, could create a number of issues (X. Chen and Metz, 2019). First-time users might divulge sensitive information thinking that they are interacting with a person. This becomes more ethically challenging when persons with vision loss are not able to know who they are interacting with.

Recording without notification

Owens et al. note that VIs occupy physical domains, which gives them a much wider impact than visual interfaces (Owens et al., 2022). The always-on nature of virtual assistants could result in the personal information of a guest being recorded without their knowledge or consent, as consent is only given at the setup stage by the owner of the device or smartphone.

Making personalization more convenient through data collection

VIs are trained to adapt to the experience and expertise of the user. Virtual assistants provide personalization and the possibility to download a number of skills and to save payment information and phone contacts. In order to facilitate differentiation between multiple users on the same VI, individuals talking to the device are profiled based on their speech patterns and/or voice biometrics. This also helps in controlling or restricting content for children (Naidu, 2022). Commands are also tracked to identify and list their intent for future use. The increase in specific and verified data can be used to provide better targeted advertisements, and the data may also be shared with law enforcement agencies in certain cases. Recently, a payment gateway company was made to share customer information with law enforcement without its customers’ knowledge. This included not just information about the client but also sensitive personal data of the people who had used the gateway to transact with that client. While providing such details is not illegal and companies are meant to comply with requests from law enforcement, if more people knew of the possibility of every conversation in the house being accessible to law enforcement, they would make more informed choices about what the VI records.

Reducing friction in actions desired by the platform

One of the fundamental advantages of VIs is that they can reduce the several steps needed to perform an action to a single command. While this is helpful to people interacting with them, the feature can also be used to reduce friction in actions that the platform wants people to take. These actions could include sharing sensitive information, providing consent to further data sharing, and making purchases. An example of this can be seen where children have found it very easy to purchase items using Alexa (BILD, 2019).

Recommendations for Designers and Policymakers

Through these deceptive design patterns, VIs can obstruct and control information according to the preferences of the platform. This can result in a heightened impact on people with less experience with technology. Presently, profitability is a key driving factor for the development and design of VI products. There is more importance given to data-based and technical approaches, and interfaces are often conceptualized by people with technical expertise, with little input from designers at the early stages (Naidu, 2022). Designers also focus more on the usability and functionality of the interfaces by enabling personalization, but are often not as sensitive to safeguarding the rights of individuals using them. In order to tackle deceptive design, designers must work towards prioritizing ethical practice, and building in more agency and control for people who use VIs.

Many of the potential deceptive design patterns can be addressed by designing for accessibility and inclusivity in a privacy-preserving manner. This includes vetting third-party apps, providing opt-outs, and clearly communicating privacy notices. Privacy implications can also be prompted by the interface at the time of taking actions. There should be clear notice mechanisms, such as a prominent visual cue to alert people when a device is on and recording, along with an easy way to turn off the ‘always listening’ mode. The use of different voice outputs for third-party apps can also signal to people who they are interacting with and help them decide what information they would like to share in that context.

Training data that covers a diverse population should be built for more inclusivity. A linear and time-efficient architecture is helpful for people with cognitive disabilities. However, the constraints of this linearity can be offset by adding conversational markers that let the individual know where they are in the conversation (Pearl, 2016). This could address discoverability as well, allowing people to easily switch between different steps. Speech-only interactions can also allow people with vision loss to access the interface with clarity.
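
As a rough illustration of conversational markers, the sketch below prefixes each prompt in a linear flow with its position and reminds the listener that they can go back. The function name, the prompt wording, and the 'go back' command are assumptions made for illustration, not features of any particular voice platform.

```python
def add_conversational_markers(steps):
    """Prefix each prompt with its position in the flow and, after the first
    step, remind the listener that they can return to the previous step.
    The wording of the markers is illustrative."""
    total = len(steps)
    marked = []
    for index, prompt in enumerate(steps, start=1):
        marker = f"Step {index} of {total}."
        hint = "" if index == 1 else " Say 'go back' to return to the previous step."
        marked.append(f"{marker} {prompt}{hint}")
    return marked


# Hypothetical three-step setup flow.
setup_flow = [
    "Choose a language.",
    "Listen to the privacy notice.",
    "Confirm your choices.",
]
for line in add_conversational_markers(setup_flow):
    print(line)
```

Markers of this kind add a few words to every prompt, so a real design would need to balance them against the time-efficiency that a linear flow is meant to offer.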

A number of policy documents, including the 2019 version of India’s Personal Data Protection Bill, emphasize the need for privacy by design. However, they do not mention how deceptive design practices could be identified and avoided, or prescribe penalties for using these practices (Naidu, Sheshadri, Mohandas, and Bidare, 2020). In the case of VIs in particular, there is a need to treat voice as biometric data that is being collected, and to have related regulations in place to prevent harm to users. In terms of accessibility as well, there could be policies that require not just websites but also apps (including voice-based apps) to be compliant with international accessibility guidelines, and to conduct regular audits to ensure that the apps are meeting the accessibility threshold.