188 KB – 13 Pages
Hugo Nicolau 1, André Rodrigues 2, André Santos 2, Tiago Guerreiro 2, Kyle Montague 3, João Guerreiro 1
1 INESC-ID, Instituto Superior Técnico, Universidade de Lisboa
2 LASIGE, Faculdade de Ciências, Universidade de Lisboa
3 Open Lab, Newcastle University
hman@inesc-id.pt, [email protected], arbsantos@fc.ul.pt, [email protected], [email protected], [email protected]

ABSTRACT
[...] users in an exploration of the design space, to create their own bespoke word completion solutions. Through this study we found that users create alternative interfaces that extend current screen readers' capabilities. The resulting interfaces are less conservative than mainstream solutions in notification frequency and cardinality. Customization decisions were based on perceived benefits and costs, and varied depending on multiple factors such as perceived prediction accuracy, potential keystroke gains, and situational restrictions.

INTRODUCTION
Text entry is one of the most common tasks on smartphones, vital to browsing the web, sending emails, messaging, or using social networks: it is unavoidable. Mobile keyboards often present word completion suggestions for the intended word as users type. These suggestions can potentially save keystrokes, as they are always visible and displayed near the typing area. While sighted users can quickly scan the display for input feedback, content changes, and suggestion updates, blind people interact with touchscreen mobile devices in inherently distinct ways due to the one-dimensional and ephemeral nature of auditory feedback.

Although there is a large body of work on word completion interfaces [2,7,20,25,35,41], there is no prior research into how to design these interfaces for screen reader users. To fill this gap, we propose the first design space for nonvisual representation of word suggestions.
The aim is to identify opportunities for future interaction designs, guide the creation of novel interfaces, and spur research in the field.

Figure 1. The design space of nonvisual word completion.

Our design space for nonvisual representation of word completion covers a taxonomy of properties within seven categories: notification, output, confidence, cardinality, concurrency, interruption, and selection. In a first step, we analyze and deconstruct the typing process of blind users, highlighting key challenges and opportunities that arise from the interaction between capabilities and word completion systems. This analysis served as a framework to build our design space. We detail the design space by describing each category and their possible
instantiations. Additionally, we discuss three different usages of the design space: (1) to analyze existing techniques and gaps in the literature, (2) to design innovative nonvisual representations of word completion suggestions and identify basic interaction possibilities, and (3) as a support tool that provides the building blocks for participatory design activities.

With the design space we offer an approach and perspective for designers, researchers, and practitioners to explore potential techniques arising from the combination of screen readers and word completion systems. We strive to inspire readers to build upon the presented design space, aiming to help uncover opportunities to improve on existing techniques and generate novel solutions.

The contributions of this paper are two-fold. First, a design space for nonvisual representation of word completion suggestions, offering a new way to think about this unexplored topic. Second, we show the potential of the design space to spur innovation by engaging screen reader users in the design of novel interfaces. The solutions that emerged highlight not only the need for alternative word completion interfaces, but also current limitations of mobile screen readers. In terms of resulting interfaces, there were four main designs, in which participants consistently opted for more frequent notifications than mainstream interfaces provide. Customization choices were often dependent on personal and contextual factors such as perceived prediction accuracy, potential keystroke savings, cognitive demand, and situational restrictions.

RELATED WORK
We discuss related work in three fields of research: text input for blind people, word completion, and design spaces.

Text Input for Blind People
Most smartphones already support nonvisual text input via built-in screen readers such as iOS VoiceOver or Android Talkback.
Touchscreen screen readers enable an Explore by Touch approach by allowing users to drag their finger on the screen and having user interface elements (e.g. keys) read aloud as they touch them [22]. Although screen readers are effective in providing access to virtual keyboards, blind users still present significantly slower entry rates than their sighted counterparts [31]. While sighted users achieve mean entry rates of about 40 words per minute (WPM) [46], reported results for blind users are 4 to 5 WPM [4,31].

To address this mismatch in typing performance, there have been many efforts to improve mobile text input for blind people [8,19,33,44]. Previous research has proposed alternative keyboards that leverage gestural interaction [19,44] and multitouch interaction [8], but with limited success in improving typing speed. Following the same approach of alternative keyboards, many Braille-inspired techniques have been presented over the last decade [4,27,34,39,40]. BrailleTouch [39] and Perkinput [4] were particularly successful in improving entry rates, with the most proficient users reaching 32 and 22 WPM, respectively. Both techniques leverage the multitouch capabilities of current touchscreen devices and allow users to type Braille characters by directly entering chords on the screen. These can be complemented with chord-based correction systems that reduce the number of errors [29].

Overall, much work has been done in the field of nonvisual text entry, from understanding the fundamental challenges of interacting with touch-based screen readers [22,31,37,38] to novel keyboard designs [4,39] and speech input [3]. Despite this large body of work [4,8,19,33,39,40], nonvisual word completion interfaces remain unexplored.

Word Completion
Most virtual keyboards make use of word completion through a suggestion bar (Figure 2). For instance, the Android operating system presents three suggestions above the virtual keyboard while users are typing.
When one of the word suggestions is above a confidence threshold, it turns bold: tapping the space bar automatically accepts the suggestion and enters a blank space, i.e. auto-complete. Alternatively, users can tap one of the remaining suggestions at any given time or choose to ignore them. To undo the auto-completion action, they can backspace.

Screen reader users have a significantly different experience with word completion systems. Due to the inherently one-dimensional and ephemeral nature of auditory feedback, suggestion updates are given [...] reader only reads aloud the auto-complete word (i.e. above a confidence threshold). Similarly, in iOS, VoiceOver updates are also restricted to the auto-complete word, forcing users to explore the suggestion bar to get access to the suggested word.

Word completion systems aim to reduce the number of keystrokes users need to enter an intended word [20,41,42]. These have been shown to help sighted people enter text more quickly and accurately. Word completion has also been shown to be useful for users with motor impairments; however, presenting suggestions can impose cognitive and motor costs that sometimes outweigh their benefits [20,25]. With the appropriate configurations, a system can offer both word completions and corrections for typing errors. Bi et al. [7] demonstrated that it is possible to simultaneously optimize a keyboard for both goals, with correction accuracy rates of 8.3% and completion power of 17.7%.

Suggestions beyond word-level, at sentence-level, have also been investigated. Bridge and Healy [9] proposed GhostWriter-2.0, which supports users writing product reviews by suggesting short sentences mined from other reviews. Arnold et al. [2] investigated the use of phrase suggestions in composition tasks. The authors were particularly interested in having a system that provided valuable suggestions rather than just accurate predictions.
Sentence-level prediction has also been used in other applications such as language translation [15], email responses [24], copy and paste tasks [45], fixing typing errors [1], or as an AAC solution [23].

Quinn investigated the effect of visually presenting suggestions, demonstrating a trade-off between keystroke savings and typing speed, which was related to cognitive load [35]. Such cognitive load, introduced by word completion systems, has also been observed in a longitudinal study [12].

Despite its ubiquity, word completion has received little attention when used with screen readers. As nonvisual word completion is still largely unexplored, we outline a design space featuring the categories that can be explored by interface designers.

Design Spaces
Design spaces have been used in the field of human-computer interaction to understand and explore the potential of multiple technologies: from input devices [11,14] and smartphones [5] to shape-changing interfaces [26] and 3D printable interactivity [6]. Almost three decades ago, Foley et al. [14] showed that taxonomies are a useful way to organize knowledge about input devices and interaction techniques. Later, Card et al. [11] extended this work and proposed a design space to systematize the huge variety of input devices arising at that time. More recently, Kwak et al. [26] reported on a design space and elicitation study for shape-changing interfaces. Ballagas et al. [6] surveyed the state of the art in 3D printing and proposed a design space in the form of a multidimensional box known as a Zwicky box [47]. Another example is the work of Hirzle et al. [21], which provides a design space arising from the combination of head-mounted displays and 3D gaze.

An observant reader may note that previous research on design spaces emerged from a need to structure existing solutions. This approach is geared towards identifying gaps in the literature and families of successful solutions.
Although our design space can be used in such a way, the lack of previous literature presents a major challenge. Thus, our design space was mainly built to inspire others [28] and spur research in nonvisual word completion interfaces by offering a new approach to ideate interaction possibilities.

THE DESIGN SPACE
In the following, we analyze the typing process of screen reader users and identify the unique challenges that emerge from that experience. This analysis served as an underpinning framework to build a structured design space for nonvisual word completion interfaces. We then describe categories and values in more detail.

Deconstructing the Typing Process
Current word completion interfaces are not designed to support nonvisual interaction. These solutions rely on the [...] prompt while typing. Although suggestions are visually displayed and accessible on the screen, blind users may not be aware of the available completion options.

Figure 2 illustrates an example of a user typing the word [...] typing the first three characters there is a suggestion that meets the probabilistic confidence threshold of being the intended word. The user is notified visually as the suggestion turns bold [...] the word aloud via speech output. Unfortunately, the suggestion is not the intended word. Recent studies have shown that blind users spend a large amount of time correcting errors [30]; thus, accepting the wrong suggestion can be particularly damaging. This is especially relevant to VoiceOver users who over-rely on the word completion system and can accept the suggestion (via the space bar) without hearing it first.

Figure 2. Left: the user has type[...]. Right: [...] alternative suggestions [...]

On the other hand, screen reader users may ignore early suggestions by continuing to type, thus not benefiting from word completion. In Figure 2, the intended word is already
available in the suggestion bar, just after three keystrokes. While sighted users have instant access to three suggestions, blind users are notified about, at most, a single word. Because auditory feedback is inherently sequential, having access to lower confidence suggestions means intentionally stopping the typing process and engaging in a screen exploration task to select the suggestion. Moreover, users would perform this exploration without any guarantee of finding the intended word.

After six keystrokes (Figure 2, right), there is a notification of the intended word [...] hears the notification and enters a blank space to auto-complete the word, saving 7 keystrokes. However, if s/he continues to type, the suggestion output is interrupted by input feedback. This ephemeral nature of auditory feedback may result in users missing relevant notifications.

Categories and Values
Based on the previous analysis, we propose a taxonomy of properties relevant to non-visually interacting with word suggestions in text entry tasks. Although it is impossible to prove that taxonomies are complete (as technology evolves, so should the taxonomies of properties), the resulting design space aims to make researchers and designers aware of, and help them address, the challenges of future word completion interfaces for screen reader users. Our taxonomy includes seven categories: notification, output, confidence, cardinality, concurrency, interruption, and selection.

Notification. The notification category indicates when to notify users of word completion suggestions. Mainstream screen readers notify users when a suggestion has a high probability of being the intended word, making them threshold-dependent notifications. Older versions of Talkback never notified users, forcing them to interact with the screen to access suggestions (input-dependent).
On the other end of the spectrum, we may have a solution that always notifies users of word suggestions, emulating visual updates. Threshold-dependent designs can also resort to notification behaviors that are related to the user's typing profile. For instance, notifications can occur when there is a keystroke gain from selecting the suggestion [...] typing speed or a fixed amount of characters.

Output. The output category indicates how users are notified of new suggestions. Output can be either implicit or explicit. VoiceOver uses implicit output, since users are notified through an earcon without knowing what the auto-complete suggestion is. Users can then explore the screen to access the suggestion, or simply accept it and trust that the suggestion matches the intended word. On the other hand, Talkback reads aloud the auto-complete suggestion, making it explicit. Both implicit and explicit output can leverage multiple output modalities such as spearcons [43], vibrotactile feedback, haptics, and Braille displays.

Confidence representation. The confidence category indicates whether the confidence representation of suggestions is static or dynamic. Current representations are static: no matter the level of confidence of the word prediction, the feedback is identical, and contemporary screen readers all behave this way. In dynamic representations, the feedback is modified based on the level of confidence for the word completion. This approach can have multiple benefits, e.g. increasing volume can make users more aware of a suggestion that is a strong candidate; alternatively, one can adjust other sound features, such as pitch or speed, to achieve similar results. Comparable behaviors can be mirrored in other modalities such as haptics.

Cardinality. The cardinality category indicates how many word completion suggestions are presented non-visually.
Although there may be many suggestions visible on the screen, the cardinality category specifically indicates whether users are updated about single or multiple suggestions while typing. For instance, in the Google keyboard there are three suggestions visible most of the time; still, Talkback only presents a single suggestion via speech feedback, and only if it is above a confidence threshold.

Concurrency. The concurrency category indicates whether multiple suggestions are presented sequentially or concurrently. [...] one-dimensional. However, as with visual representations of word completions, one can imagine using concurrent feedback to convey multiple suggestions [17,18]. Concurrency can take the form of binaural (left-right ear) or fully spatialized (3D) feedback. In both cases, it requires the use of headphones or even specialized hardware (e.g. head tracking technologies) to achieve the desired effect. Concurrency can also take place using multiple output modalities (e.g. audio and Braille displays).

Interruption. The interruption category indicates whether word completion feedback is interruptible (e.g. the screen reader stops reading word suggestions as the user touches a new key) or continuous. Current screen readers are interruptible; if users touch a key while a suggestion is being read aloud, the feedback is interrupted, as the reader assumes users want to continue typing without hearing the suggestion. Depending on typing speed, this can result in appropriate suggestions never being heard. However, one may wish suggestions to be continuously rendered along with input feedback. Continuous feedback should update when there are new notifications. It can be always active or based on a condition or threshold. For example, screen readers can keep reading suggestions that are above the auto-complete threshold, even when a new key is focused. It is important to highlight that interfaces should always be [...] [32].

Selection shortcut.
We assume that the suggestion list should always be available to select from using the de facto method, e.g. explore by touch. Complementary to this, there
can be shortcuts to select suggestions, namely the most probable suggestion. This category represents the expressiveness of the shortcuts and how many suggestions from the list can be accessed through a direct action: single or multiple. While current screen readers only allow selection of the most probable word suggestion via the space key, providing immediate access to additional, less probable suggestions has been demonstrated to be useful (e.g. for motor-impaired users [13]). Selection shortcut mechanisms may include speech input, 2D/3D gestures, physical buttons, or novel keyboard layouts.

Usage of the Design Space
We propose three ways in which the design space can be leveraged by designers, practitioners, and researchers.

Views on the design space. The resulting design space can be filled with existing solutions, making it possible to visually express the main parameters of word completion representations as a 7-axis radar chart. Figure 3 shows an example of (the few) existing approaches, i.e. Talkback and VoiceOver. The ordinal values are mapped to these axes such that the most informative values are placed further away from the center of the radar chart and the most restrictive are placed near the center. It is worth highlighting that, as we are only using ordinal values, the exact position on the axes is not as important as the relative position when comparing interfaces; thus, these values are simply distributed evenly along the axes. Following these rules, when visualizing word completion interfaces, any representation that entirely contains another is more informative. However, this does not necessarily imply it is a better solution. Views of the design space can also be presented as a table (e.g. Table 1), which helps to identify promising families of solutions, as well as a possible lack of techniques, by the quantity of solutions in each category.
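The radar-chart rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the ordering of values within each category, the `(rank + 1) / n` spacing, and the Talkback and VoiceOver profiles are assumptions drawn from our reading of the category descriptions (Figure 3 is not reproduced here), and the names `DESIGN_SPACE`, `axis_positions`, and `contains` are hypothetical.

```python
from typing import Dict, List

# Values per category, ordered from most restrictive (near the chart center)
# to most informative (far from the center). This ordering is an assumption.
DESIGN_SPACE: Dict[str, List[str]] = {
    "notification": ["input-dependent", "threshold-dependent", "always"],
    "output": ["implicit", "explicit"],
    "confidence": ["static", "dynamic"],
    "cardinality": ["single", "multiple"],
    "concurrency": ["sequential", "concurrent"],
    "interruption": ["interruptible", "continuous"],
    "selection": ["single", "multiple"],
}

# Assumed profiles of the two mainstream screen readers, read off the text.
TALKBACK: Dict[str, str] = {
    "notification": "threshold-dependent",
    "output": "explicit",
    "confidence": "static",
    "cardinality": "single",
    "concurrency": "sequential",
    "interruption": "interruptible",
    "selection": "single",
}
VOICEOVER: Dict[str, str] = {**TALKBACK, "output": "implicit"}


def axis_positions(interface: Dict[str, str]) -> Dict[str, float]:
    """Map each ordinal value to an evenly spaced position in (0, 1] on its axis."""
    positions = {}
    for category, values in DESIGN_SPACE.items():
        rank = values.index(interface[category])  # raises ValueError if unknown
        positions[category] = (rank + 1) / len(values)
    return positions


def contains(outer: Dict[str, str], inner: Dict[str, str]) -> bool:
    """True if `outer` is at least as far from the center as `inner` on every axis."""
    p_outer, p_inner = axis_positions(outer), axis_positions(inner)
    return all(p_outer[c] >= p_inner[c] for c in DESIGN_SPACE)


if __name__ == "__main__":
    print(axis_positions(TALKBACK))
    print(contains(TALKBACK, VOICEOVER))
    print(contains(VOICEOVER, TALKBACK))
```

Under these assumed profiles, the Talkback polygon contains the VoiceOver one because the two differ only on the output axis. As noted above, containment indicates a more informative representation, not necessarily a better one.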
Using the design space as an ideation tool is aimed at those who want to derive new interaction techniques and nonvisual representations of word completions. Each of the design categories and values can be used as a building block for novel technical solutions. Alternatively, and inspired by Card et al.'s [11] operators within design spaces, one can choose to draw inspiration from existing solutions and manipulate them. Here, we start with a concrete set of values in each category and then replace one (or more) to originate a new word completion representation. An example would be to [...] cardinality parameter, enabling users to sequentially hear more than one suggestion. Furthermore, we could extend such a technique by changing the concurrency parameter to allow simultaneous feedback. Moreover, each suggestion could be mapped to a position in space, originating tridimensional auditory output. We can imagine another technique where the output is still tridimensional but sequential rather than simultaneous. These operations illustrate the potential of the design space to generate novel techniques for nonvisual representations of word suggestions based on simple manipulations of its building blocks. Such manipulations can be extended to other [...] [36] (e.g. generalization, fusion, opposite).

Elicitation of bespoke solutions. Our final usage of the design space, and one less common in the literature, is to serve as a tool for participatory design activities. The set of categories and values can be used as materials when engaging users in design. These materials can then be easily mixed and manipulated to create and customize interfaces. The design space serves as an overall framework to expose possibilities, guide the design process, and collect feedback about bespoke solutions. In the next section, we present an example of such a participatory activity with screen reader users.
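The substitution operation described under ideation (start from a concrete set of values and replace one or more) can be sketched as follows. This is an illustrative sketch only: the `Design` class, the `ALLOWED` table, the `substitute` helper, and the starting profile (a Talkback-like baseline, as we read it from the category descriptions) are hypothetical names and assumptions, not part of the paper.

```python
from dataclasses import dataclass, replace

# Allowed values per category, taken from the category descriptions.
ALLOWED = {
    "notification": ("input-dependent", "threshold-dependent", "always"),
    "output": ("implicit", "explicit"),
    "confidence": ("static", "dynamic"),
    "cardinality": ("single", "multiple"),
    "concurrency": ("sequential", "concurrent"),
    "interruption": ("interruptible", "continuous"),
    "selection": ("single", "multiple"),
}


@dataclass(frozen=True)
class Design:
    """One point in the design space: a value for each of the seven categories."""
    notification: str
    output: str
    confidence: str
    cardinality: str
    concurrency: str
    interruption: str
    selection: str

    def __post_init__(self) -> None:
        # Reject values that are not part of the taxonomy.
        for category, allowed in ALLOWED.items():
            value = getattr(self, category)
            if value not in allowed:
                raise ValueError(f"{value!r} is not a valid {category} value")


def substitute(design: Design, **changes: str) -> Design:
    """Return a new design with one or more category values replaced."""
    return replace(design, **changes)


# Assumed baseline resembling a mainstream screen reader.
baseline = Design(
    notification="threshold-dependent",
    output="explicit",
    confidence="static",
    cardinality="single",
    concurrency="sequential",
    interruption="interruptible",
    selection="single",
)

# Step 1: hear more than one suggestion, one after the other.
multi_sequential = substitute(baseline, cardinality="multiple")
# Step 2: present the same suggestions simultaneously.
multi_concurrent = substitute(multi_sequential, concurrency="concurrent")
```

Because each design is immutable, every substitution yields a new candidate while leaving the starting point intact, which makes it easy to enumerate and compare variations.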
EXPLORATION OF THE DESIGN SPACE
To demonstrate the potential of our design space, we took a participatory approach to design novel nonvisual word completion interfaces. We engaged screen reader users in exploring the design space, enabling them to build personalized word completion representations. Such work yields a valuable contribution given the lack of knowledge regarding the expectations and needs of blind users towards mobile word completion interfaces. We aim to answer two main research questions: (1) Do screen reader users value alternative word completion interfaces? (2) What is the rationale for their personalization decisions?

Figure 3. [...] nonvisual word completion interface.

Participants
We recruited 11 legally blind participants, 7 males, from a local training institution for visually impaired people [...] (M=45, SD=7). All participants had owned a smartphone for between 1 and 4 years and required a screen reader to interact with it. Although participants reported performing text entry tasks daily with their devices, only 4 reported using word completion. Those who did not use it said it was either
because they did not know this feature existed (N=4) or did not see the benefit of using it (N=3).

Customizable Prototype
We modified the Android Open Source Project (AOSP) Keyboard 1, enabling us to augment the stock input method of most Android devices. The modifications made no layout or visual changes to the keyboard, i.e. target mapping remained unchanged. Instead, we augmented its settings capabilities to enable customization of word completion representations. Touch interaction also remained unchanged, as participants could drag their finger on the keyboard to have keys read aloud and lift it to insert a character. They could also select suggestions from the top of the keyboard by focusing them and then double tapping. We also relied on the AOSP dictionary to retrieve the top three suggestions.

The keyboard was developed to be customized and to enable users to experience a set of attributes of the design space. To avoid fatiguing participants, we restricted the session length to 90 minutes. This timeframe allowed us to explore in considerable depth five of the seven design categories (notification, confidence, cardinality, concurrency, and interruption).

Notification. Participants could choose when to be notified of word completion suggestions: always or based on the spellchecker confidence.

Confidence representation. The confidence levels of suggestions could be given through static or dynamic feedback; in the dynamic condition, volume was linearly mapped (0 to 100%) to spellchecker confidence.

Cardinality. Participants could receive single or multiple word completion suggestions. In the multiple condition, participants experienced up to three suggestions, emulating the visual interface.

Concurrency. Suggestions could be experienced either sequentially or concurrently. For concurrent feedback, we relied on Amazon Polly 2 to generate the audio with different voices for each suggestion [16,17].
Furthermore, the audio sources were set in different positions in space using the Spatial Audio API 3, with a 250 ms delay between words to help improve speech intelligibility [10]. Participants could decide the position in space for each suggestion based on its confidence value (i.e. right, center, or left).

Interruption. Notification could be interrupted via touch input or continuously read. Additionally, participants could combine options by interrupting notifications only when they were below the auto-complete threshold; otherwise, the suggestion would not be interrupted (herein referred to as conditional interrupt). Input feedback was rendered as [...]

We did not explore variations in the following categories:

Output. Pilot studies showed that explicit feedback (speech) was preferred over implicit output (earcons). Thus, for this participatory design activity, users were notified via speech feedback (for auto-complete) as in Talkback. We solely used auditory feedback, as this is still the most convenient and common output modality of screen readers. Alternative interface designs could leverage haptic devices such as refreshable Braille displays.

Selection shortcuts. The suggestion list was available at the top of the keyboard and could be selected through explore-by-touch. Tapping on the space bar selected a single (the most probable) auto-complete word. We did not explore shortcut techniques for multiple selection, as we wanted to focus on novel word completion representations rather than new input techniques. However, we did invite participants to make suggestions on new selection interfaces.

1 https://source.android.com, accessed 4th April 2019.
2 https://aws.amazon.com/polly/, accessed 4th April 2019.
3 https://developers.google.com/vr/reference/ios-ndk/group/audio, accessed 4th April 2019.

Apparatus
We used the customizable prototype, previously described, running on a Xiaomi Redmi 3.
The mobile device features a 5-inch capacitive touchscreen, running Android 7.1.2. All audio feedback was given either through Android Talkback (female voice) for default text input interactions or Amazon Polly (one male and two female voices) for concurrent word completion feedback. Participants were requested to [...] touchscreen actions were logged through our application.

Procedure
At the beginning of the session, participants were informed they would be exploring a variety of word completion interfaces to find the one that best suited them. After completing a demographics and smartphone usage questionnaire, participants were asked to write sentences while exploring different values of the design space. First, we described how the current default word completion interfaces work. Next, participants were asked to write a sentence with the default keyboard and Talkback behaviors to become familiar with the device. We then prompted them to start exploring the design space. Namely, participants could personalize the notification, confidence, cardinality, concurrency, and interruption categories. We suggested a set of sentences randomly selected from a corpus representative of the language (0.98 correlation with the language character frequency). However, participants were also able to type freely. They were encouraged to use the same sentences to test each interface design. For each personalized interface, participants were asked to share comments, (dis)likes, and improvements. We then invited the participants to experience interfaces that were theoretically more informative but also more cognitively
preference for interruptible notifications, two participants preferred alternative interruption behaviors, which allowed them to explore the keyboard while listening to word suggestions. P9 chose continuous feedback just when suggestions were above the auto-complete threshold, while P10 did not want touch actions to interrupt notifications. [...] keep typing. Sometimes I just ignore the last suggestions. I [...]

Confidence representation is related to perceptual cost. After experiencing dynamic and static representations of confidence, most participants (N=8) preferred static feedback, as illustrated by P6: [...] Most participants valued all word completion suggestions. In the dynamic condition, suggestions with lower confidence levels were harder to hear. Participants commented that if they did not want to hear them, they could continue typing. Otherwise, they would like to clearly hear all feedback. On the other hand, three participants chose the dynamic representation. Interestingly, these participants chose to hear three suggestions, indicating that more demanding interfaces may require dynamic feedback to alleviate cognitive load. [...] word less audible when is not related with what I wrote. [...] is interesting if I can associate different volume levels to the [...] Interface designers may consider other features to represent confidence (e.g. reading speed). It is worth noting that participants had an indirect encoding of confidence, as they positioned suggestions in space from left to right, even with sequential feedback, left being the top suggestion.

Number of suggestions is highly user-dependent. Preferred cardinality [...] abilities and perceived benefit. Participants expressed tensions between potential benefits and ease of use. Two participants preferred just one suggestion.
The main reason was that it was easier to discriminate between touch feedback and word completion suggestions: [...] because then I have a voice giving suggestions and the other [voice] reading the keys I touch. With more suggestions I get confused and lose track of what I [...] P5 felt that hearing more than one suggestion was cognitively demanding and would decrease her input performance. On the other hand, most participants (N=9) preferred multiple word completion suggestions: [...] explore [the screen], I could decide [whether to select a [...] may be biased by a novelty effect, P1 predicted that having the same information as sighted users would allow him to make informed decisions on when to select word suggestions. It is worth highlighting that P1 already uses word completion with his smartphone. Thus, expertise may be playing a significant role. Other participants (N=4) felt that having more than two suggestions read aloud was attentionally demanding: [...] be super attentive. It is confusing, two suggestions are [...]

Word discrimination is the biggest drawback in concurrent suggestions. Of the nine participants who chose multiple suggestions, six preferred sequential feedback while only three preferred concurrent feedback. Participants felt that having multiple suggestions read simultaneously decreased their ability to discriminate words, which could result in missing accurate word predictions: [...]. Although we followed guidelines for concurrent feedback [16], we believe the similarity between word completions was an important factor. Even P1, who preferred concurrent feedback, highlighted the advantage of sequential feedback: [...] not understanding a word [...], but I [...] (P1).

Context. Context of use was raised by some participants as an important factor. Three participants mentioned that the interface should be personalized to the situation, particularly if people are not using earphones: [...] earphones.
It would be good if it [the smartphone] gave me three suggestions in sequen[...] earphones. P4 goes even further and states that multiple suggestions are worthless in mobile contexts: We use our smartphones outside where there are cars and noise. Two suggestions do[n't] work. [Only] when we are [...] It is clear that P4 is well aware of how environmental factors influence cognitive load and its effect on mobile