
Indirect Text Entry Using One or Two Keys

Melanie Baljko, Andrew Tam
Department of Computer Science and Engineering
York University, Toronto, Canada, M3J 1P3
{mb,atam}@cs.yorku.ca

1. ABSTRACT

This paper introduces a new descriptive model for indirect text composition facilities that is based on the notion of a containment hierarchy. This paper also demonstrates a novel, computer-aided technique for the design of indirect text selection interfaces, one in which Huffman coding is used to derive the containment hierarchy. This approach guarantees the derivation of optimal containment hierarchies with respect to mean encoding length. This paper describes an empirical study of two two-key indirect text entry variants and compares them to one another and to the predictions of a model of entry rate. The intended application of these techniques is the design of improved indirect text entry facilities for the users of AAC systems.

Categories and Subject Descriptors D.2.2 [Software Engineering]: Design Tools and Techniques—User Interfaces; H.5.2 [Information Interfaces and Presentation]: User Interfaces—Evaluation/methodology; H.1.2 [Models and Principles]: User/Machine Systems—Human factors

General Terms Human Factors, Measurement, Design

Keywords Augmentative and Alternative Communication (AAC), Voice Output Communication Aids (VOCA), Speech Generating Devices (SGD), Indirect Text Entry, Scanning, Information Theory, Interventions for Communication Disorders, Interface Evaluation

2. INTRODUCTION

Augmentative and Alternative Communication (AAC) is an area of research and clinical practice concerned with, among other things, the development of systems that mediate the communication of individuals who have little or no functional speech, such as those affected by cerebral palsy, amyotrophic lateral sclerosis, or paralysis [2]. A component of many AAC interventions is an AAC device, which, in a large portion of cases, is a laptop, tablet or other similar piece of computational hardware that provides (1) a facility within which text glosses are prepared and (2) a mechanism whereby the text gloss can be passed to a text-to-speech module, so that synthesized speech can be produced from it. Such devices are often referred to as Voice Output Communication Aids (VOCAs). A variety of techniques have been developed for preparing text glosses; they can be broadly categorized as composition-based or retrieval-based. Composition-based techniques entail the selection of symbols (usually orthographic but possibly iconic), one by one, in order to form larger units of text (words, phrases, sentences), whereas retrieval-based techniques entail navigating and making selections from a database of pre-stored units of text [5]. Each technique has its advantages and disadvantages (a full discussion of which, unfortunately, space does not permit; the interested reader is directed to the work of Alm and colleagues, e.g., [20]). Most commercial AAC devices are hybrids, in the sense that they typically implement, to at least some degree, techniques belonging to both classes. Word-prediction and word-completion facilities can be viewed as a retrieval-based technique that has been integrated into a composition-based facility. Despite the utility of the hybrid approach, retrieval-based techniques remain ill-suited to some communicative scenarios (e.g., discussions about unanticipated topics, or the need to provide information about a topic that has little or no vocabulary coverage in the database). Users of AAC devices must therefore be able to compose text. In previous work [3, 4], we have focused on information theory as a framework for theoretical models of the relative utility of various text composition designs. We have a particular interest in the use of Huffman coding for automating the design of text composition facilities.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ASSETS’06, October 22–25, 2006, Portland, Oregon, USA. Copyright 2006 ACM 1-59593-290-9/06/0010 ...$5.00.
In this paper, we discuss a theoretical prediction that relates row-column and Huffman-coding-based indirect text selection and present the results of a relevant empirical user study that we conducted.

3. BACKGROUND

The users of AAC devices are highly heterogeneous with respect to the type and number of input devices that they can reliably operate. By the term input device, we refer to various specialized mechanisms, such as contact switches (buttons) operated by positive or negative mechanical pressure, by puffs of breath or by eyebrow raises, or threshold sensors that incorporate eye tracking or EMG signals (e.g., [8]). Researchers cannot know a priori the type and number of input devices that the members of the target population can reliably operate, other than the fact that this profile will vary widely. The long-term goal of work in this area is to derive text entry techniques that are not only optimal for a particular user profile, but are also parameterized so as to be optimal in a general way (i.e., over a space of possible user profiles).

In this work, we focus on indirect text composition: a task in which the number of input devices that the user can reliably operate is significantly smaller than the number of selectables. (We will henceforth use the term key to refer to any such simple button or switch.) The selectables are typically alphanumeric and punctuation characters, the space character and enter. This contrasts with direct text composition, in which text is entered through a one-to-one mapping between selectables and the buttons of a physical or software keyboard. Indirect text selection, or scanning, relies on the notion of "highlighting". In row-column scanning, a classic variant shown in figure 1, the rows of an on-screen keyboard are highlighted one by one, each for d seconds; this duration is called the dwell period. The highlighting iterates row by row. When the user makes a selection, the buttons within the currently highlighted row are then highlighted one by one, each for d seconds. When the last button in the row (the last "column") has been reached and no selection has been made, the button-by-button highlighting wraps back to the first column. The effect of a user action therefore depends on context: a selection made while the rows are being scanned transfers the highlighting to the buttons within the in-focus row, whereas a selection made while the columns are being scanned selects the presently highlighted button (thereby appending its corresponding symbol to the text gloss under preparation).

Below, we focus on the scenario in which the user can reliably operate only two keys: one dedicated to error correction (e.g., "undo") and the other to selection. This scenario falls towards the restricted end of the spectrum of possible user profiles.1

1 There are user profiles that are more restricted in terms of number and type of input devices, such as single binary switch activations or even no volitional movement (in which case biological or neurological controls, via evoked potentials or sensing of the motor or pre-motor cortex, may be used). These scenarios are not considered here.

Figure 1: A screen shot of an indirect text composition facility that makes use of the row-column variant. The keys are arranged on the basis of unigram frequencies (rather than the more familiar QWERTY layout). The on-screen component entitled "Target Text" has been included for the purposes of user experiments (described below).

4. DESCRIPTIVE MODEL

4.1 The Containment Hierarchy

At this point, we introduce the notion of the containment hierarchy, which we use to characterize indirect selection techniques. We define the containment hierarchy (CH) to be a directed acyclic graph in which (1) each node is associated with a set of selectables; (2) each leaf node's set contains a single selectable; and (3) each internal node's set contains precisely those selectables that are associated with its child nodes.2 At any point in time, one and only one node of the CH is in focus. When a node is in focus, all of the selectables in its set are highlighted. (In principle, any mode of feedback, not just visual highlighting, is possible.) In the indirect selection paradigm, focus advances passively from one sibling to the next, pausing on each for d seconds (the dwell period). A user action corresponds to one of two basic operations, depending on the type of the in-focus node: "drill down" (internal node) or "select" (leaf node). Selection takes place through a sequence of one or more of these basic operations, ending when a leaf has been selected.

A particular instance of row-column scanning (with a non-QWERTY keyboard layout, discussed in detail later) is represented by the CH shown in figure 2. We denote the number of levels in the tree by n. If the outdegree is constant over all the internal nodes, we denote it by k; if the outdegree varies from level to level (we assume it is constant within a level), we denote it by {k1, ..., kn}, where ki is the number of children at level i. The row-column variant (figure 2) is characterized by n = 2 and k = {8, 6}, where the first level has eight children and the second level has six children. The Huffman variant, discussed in section 6.2, is characterized by {n, k} = {12, 2}, since internal nodes have two children at every level (figure 3).

2 In the work described here, the DAGs happen to be trees, since each selectable is reached by a unique path (but this is not necessarily always the case in our analyses).

Figure 2: The containment hierarchy for row-column unigram.

This containment-hierarchy-based representation generalizes those used previously. For example, the three-dimensional matrix method described by Abascal et al. [1] can be represented as a three-level containment hierarchy with {n, k} = {3, {3, 4, 3}}; the authors' "D-dimensional matrix" model encompasses only a subset of the design space covered by the containment hierarchy model.
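The containment hierarchy definition maps naturally onto a recursive tree type. The following Python sketch is illustrative only: the names `Node`, `matrix_hierarchy`, `paths` and `mean_encoding_length` are hypothetical, and it assumes the CH is a tree (as it is for the variants studied here) with sibling positions indexed from 0. Building a hierarchy from per-level outdegrees covers both the row-column case, k = {8, 6}, and Abascal et al.'s {3, 4, 3} matrix.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A containment-hierarchy node; a leaf carries exactly one selectable."""
    children: List["Node"] = field(default_factory=list)
    symbol: Optional[str] = None

    def selectables(self) -> List[str]:
        # Property (3): an internal node's set is the union of its children's sets.
        if self.symbol is not None:
            return [self.symbol]
        return [s for child in self.children for s in child.selectables()]


def matrix_hierarchy(symbols: List[str], outdegrees: List[int]) -> Node:
    """Nest an ordered symbol list so that level i has at most outdegrees[i] children."""
    nodes = [Node(symbol=s) for s in symbols]
    # Group bottom-up: the deepest level uses the last outdegree.
    for k in reversed(outdegrees[1:]):
        nodes = [Node(children=nodes[i:i + k]) for i in range(0, len(nodes), k)]
    if len(nodes) > outdegrees[0]:
        raise ValueError("too many selectables for the given outdegrees")
    return Node(children=nodes)


def paths(root: Node) -> Dict[str, List[int]]:
    """Map each selectable to its sibling positions pos(c, i), one per level i."""
    result: Dict[str, List[int]] = {}

    def walk(node: Node, prefix: List[int]) -> None:
        if node.symbol is not None:
            result[node.symbol] = prefix
            return
        for position, child in enumerate(node.children):
            walk(child, prefix + [position])

    walk(root, [])
    return result


def mean_encoding_length(root: Node, prob: Dict[str, float]) -> float:
    """MEL: expected number of user actions (drill-downs plus the final select)."""
    return sum(prob[c] * len(p) for c, p in paths(root).items())


symbols = [f"s{i}" for i in range(43)]
row_column = matrix_hierarchy(symbols, [8, 6])
print(len(row_column.children), paths(row_column)["s13"])  # 8 [2, 1]
```

With 43 placeholder selectables and rows of six, the sketch yields eight rows, the last of which holds a single key; how the actual layout in figure 1 distributes that remainder is not something this sketch tries to reproduce.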

5. RELATED WORK

Venkatagiri [21] presented a detailed discussion of several text composition facilities, with a focus on the comparison of linear vs. row-column arrangements (although two more complex arrangements were also presented) and of lexicographic vs. unigram-based arrangements. Subsequently, the most significant contribution to the design of indirect text selection has been made by Abascal et al. [1], who generalized the linear and row-column scanning methods into a hierarchical continuum of matrix-dimensional scanning, with a focus on how the position of selectables relates to error correction. Their simulation results show that for a selection set of 36 keys, a square layout is optimal for a two-dimensional matrix, while a 3×4×3 layout yields the best results for a three-dimensional entry scheme. Abascal et al. and Venkatagiri agreed that keys must be located according to their frequencies of use. The comparison of different text entry methods is common in human-computer interaction, notably in the context of mobile telephone keypads [9, 10, 11] and soft keyboards [13, 14, 18]. On the basis of models developed by Soukoreff and MacKenzie [18] for predicting lower bound (novice user) and upper bound (expert user) soft keyboard text entry rates, MacKenzie, Zhang and Soukoreff [14] evaluated six different layouts and tested the model predictions with a small-scale user study. MacKenzie and Zhang [13] designed OPTI, a soft keyboard design, and evaluated it against a soft implementation of QWERTY. The comparison was done through a longitudinal user study, so that learning effects could be characterized and the entry speeds predicted through modelling could be confirmed without the influence of users' familiarity with the QWERTY layout. Also relevant to the current work are alternative text entry methods that include necessary time delays as part of their design.
The half-QWERTY method for one-handed typing [16] required that the user hold the space bar down for a timeout value in order for the keys to "flip over" to the other, mirror half of the QWERTY configuration. The eye typing methods studied by Majaranta et al. [15] featured a "dwell period" for which the typist's gaze must be fixed on a key before it is selected. Both the timeout and the dwell period limit the upper bound on the user's text entry speed, as does the dwell period inherent in the row-column and Huffman methods evaluated here.

6. TEXT COMPOSITION VARIANTS

6.1 Row-Column

The row-column variant features the scanning pattern and containment hierarchy described above and shown in figures 1 and 2. This design provides access to 43 selectables: 26 alphabetical characters, 10 digits and 7 "other" characters (space, punctuation, enter, delete). The unigram probability distribution was derived from a corpus of written English text.

6.2 Huffman

The Huffman variant has a containment hierarchy that we derived using Huffman coding. Our implementation of the k-ary Huffman coding algorithm takes as its input a set of symbols (the selectables), a probability distribution over those symbols, and a value k, which indicates the desired outdegree of the coding tree. The algorithm produces a coding tree with the smallest mean encoding length (MEL) achievable by a symbol-by-symbol code (other techniques can provide more efficient encodings than Huffman coding for a sequence of symbols, but not on a symbol-by-symbol basis). We use the coding tree as our containment hierarchy. Note that the MEL corresponds to the mean number of input actions required to select a selectable. The row-column variant described above has a containment hierarchy with a mean outdegree of 5.375 (7 internal nodes with outdegree 6 and one internal node, the root, with outdegree 8) and an MEL of 1.9994. By comparison, Huffman coding of the same selectable set with k = 5 and k = 6 yields MELs of 1.8666 and 1.69229, respectively. We chose to explore the use of a binary Huffman coding tree (k = 2), shown in figure 3, which yields an MEL of 4.2007. The selectable set and the probability distribution were matched to those of the row-column variant. Our conjecture was that the Huffman variant would afford higher entry rates than the row-column variant, despite its higher MEL. Our hypothesis was that the Huffman variant would entail less time spent in passive focus advancement, which can be a substantial portion of the total time required to select a selectable (i.e., the time spent waiting for the selectable to come into focus, so that it is available for selection). Whereas the row-column variant can be seen as attempting to minimize the total number of levels in the tree,3 Huffman coding succeeds in minimizing the mean encoding length for a given outdegree. Moreover, a binary tree has the smallest possible number of children per internal node, and hence the shortest complete cycles of focus advancement. We test this hypothesis through model predictions and experimental results.

Figure 3: The containment hierarchy for Huffman k=2.

3 Actually, linear scanning corresponds to a containment hierarchy with a single level, but it has been shown to be highly inefficient [21] and is not considered here.
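The k-ary Huffman construction described in section 6.2 can be sketched in Python as follows. This is a generic textbook formulation, not the authors' implementation; the function names are hypothetical. It pads the symbol set with zero-probability dummy leaves so that every merge combines exactly k nodes (the standard way to keep a k-ary Huffman code optimal), represents internal nodes as tuples and leaves as selectable strings, and takes the depth of a leaf as its encoding length.

```python
import heapq
import itertools
from typing import Dict, Tuple, Union

# A leaf is a selectable (str); an internal node is a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def huffman_tree(prob: Dict[str, float], k: int = 2) -> Tree:
    """Build a k-ary Huffman coding tree to serve as a containment hierarchy."""
    if k < 2:
        raise ValueError("outdegree k must be at least 2")
    tie = itertools.count()  # unique tie-breaker so heapq never compares subtrees
    heap = [(p, next(tie), s) for s, p in prob.items()]
    # Pad with zero-probability dummy leaves until (leaves - 1) is divisible by (k - 1),
    # so that every merge combines exactly k nodes.
    while (len(heap) - 1) % (k - 1) != 0:
        heap.append((0.0, next(tie), None))
    heapq.heapify(heap)
    while len(heap) > 1:
        group = [heapq.heappop(heap) for _ in range(k)]
        children = tuple(t for _, _, t in group if t is not None)  # drop dummies
        heapq.heappush(heap, (sum(p for p, _, _ in group), next(tie), children))
    return heap[0][2]


def encoding_lengths(tree: Tree, depth: int = 0) -> Dict[str, int]:
    """Depth of each leaf, i.e. the number of user actions needed to select it."""
    if isinstance(tree, str):
        return {tree: depth}
    lengths: Dict[str, int] = {}
    for child in tree:
        lengths.update(encoding_lengths(child, depth + 1))
    return lengths


def mean_encoding_length(prob: Dict[str, float], tree: Tree) -> float:
    lengths = encoding_lengths(tree)
    return sum(p * lengths[s] for s, p in prob.items())


toy = {"E": 0.5, "T": 0.25, "A": 0.25}
print(mean_encoding_length(toy, huffman_tree(toy, 2)))  # 1.5
```

Given the row-column unigram distribution (which is not reproduced in this paper) as `prob`, `huffman_tree(prob, 2)` would yield a binary CH of the kind shown in figure 3; larger k trades a lower MEL for longer focus cycles.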

7. PREDICTION MODELLING

On the basis of the descriptive containment hierarchy model, a predictive entry speed model was derived. We assumed that the time to compose a given text sequence C = c1 c2 ... cn is additive over the time required to select each of the selectables cj. (All time units are given in seconds.) We hypothesized that selecting cj requires the user first to locate the target cj on the keyboard (the visual scan time, α). Next, the user must wait for focus to advance to the target (the incremental wait for focus to reach the target, β). Note that the target may already have been in focus one or more times, in which case the user must wait for the focus cycle to return to the target (one or more wasted cycles). In particular, when cj is associated with the first child of the root node, the dwell period often elapses before the user locates it. The visual search may also be completed while the target is in focus (in which case β = 0). Once focus reaches the target, a certain amount of time is required for the user to react to this visual event (the visual stimulus time, γ) and to generate a motor response (the key press time, ζ). We assume that the time-to-select cj is the sum of the times-to-select cj at each of the levels of the CH, until the leaf is reached. Thus, time-to-select at the first level (level 0 of the CH) can be characterized by:

$$tts_0(c_j) = \alpha + \beta + \gamma + \zeta$$

We assume that this visual search is done only once, at the beginning, when focus cycles among the children of the root node of the CH, and is not repeated at lower levels of the CH. We use EL(cj) to denote the number of nodes in the path to the leaf node corresponding to cj, i.e., the encoding length of selectable cj. Given a particular level i in the CH, the time required for focus to reach a particular target cj is a function of the associated node's position among the level's children nodes, pos(cj, i), and of the dwell period d.
The levels and sibling positions are indexed by i = 0, 1, ... and s = 0, 1, ..., respectively. From the start of a given focus cycle at level i, the time for focus to reach target cj is pos(cj, i)·d. Thus, time-to-select at subsequent levels of the CH (levels i > 0) can be characterized by:

$$tts_{>0}(c_j, i) = pos(c_j, i) \cdot d + \gamma + \zeta$$

The total time-to-select selectable cj is thus:

$$tts(c_j) = tts_0(c_j) + \sum_{i=1}^{EL(c_j)} tts_{>0}(c_j, i)$$

We constructed a lower bound for tts>0 by assuming that the visual stimulus time and the key press time were zero. We constructed an upper bound by assuming that the user performs the key press at the very end of the dwell period. This is an overestimation, since the last possible point occurs (γ + ζ) seconds before the end of the dwell period (after this point, focus will have shifted before the key press has been performed, which is one sort of input error). Thus,

$$tts^{L}_{>0}(c_j, i) = pos(c_j, i) \cdot d$$

$$tts^{U}_{>0}(c_j, i) = pos(c_j, i) \cdot d + d + \gamma + \zeta$$

We constructed lower and upper bounds for tts0 by first constructing an alternative expression for α + β. We denote the total time required for a complete focus cycle at level i by timeCC(i). The amount of time α + β is equal to the time spent in wasted cycles plus the time for focus to reach the target from the start of a cycle. The time spent in wasted cycles at level i is the product of the number of wasted cycles (numWC) and timeCC(i). Thus,

$$tts_0(c_j) = \alpha + \beta + \gamma + \zeta = pos(c_j, 0) \cdot d + numWC \cdot timeCC(0) + \gamma + \zeta$$

For numWC, the following formula was used:

$$numWC = \begin{cases} 0 & \text{if } \alpha < pos(c_j, 0) \cdot d \\ \left\lfloor \dfrac{pos(c_j, 0) \cdot d + d - (\gamma + \zeta)}{timeCC(0)} \right\rfloor & \text{otherwise} \end{cases}$$

Similarly to tts>0, we assumed for the lower bound that the visual stimulus time and the key press time were zero, and for the upper bound that the user performs the key press at the very end of the dwell period. Thus,

$$tts^{L}_0(c_j) = pos(c_j, 0) \cdot d + numWC \cdot timeCC(0)$$

$$tts^{U}_0(c_j) = pos(c_j, 0) \cdot d + numWC \cdot timeCC(0) + d + \gamma + \zeta$$
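To make the bounds concrete, the following Python sketch sums the per-level lower and upper bounds along one selectable's path. It is a hedged illustration rather than the authors' code: `tts_bounds` is a hypothetical name, `positions` lists pos(cj, i) for each level at which the user acts (root level first), numWC is supplied by the caller rather than computed, and timeCC(0) is assumed to equal the root's outdegree times d. The example uses γ = 0.2 s and ζ = 0.14 s, the values given in the next paragraph.

```python
from typing import Sequence, Tuple


def tts_bounds(positions: Sequence[int], d: float, gamma: float, zeta: float,
               num_wasted_cycles: int, root_outdegree: int) -> Tuple[float, float]:
    """Return (lower, upper) bounds in seconds on the time to select one selectable."""
    if not positions:
        raise ValueError("a selectable is reached through at least one level")
    time_cc_root = root_outdegree * d  # assumed: one full focus cycle at level 0
    # Level 0: wait for focus to reach the target, plus any wasted focus cycles.
    level0 = positions[0] * d + num_wasted_cycles * time_cc_root
    lower, upper = level0, level0 + d + gamma + zeta
    # Lower levels: no repeated visual search, only the incremental wait.
    for pos in positions[1:]:
        lower += pos * d
        upper += pos * d + d + gamma + zeta
    return lower, upper


# Row 2, column 3 of an 8-row row-column CH, d = 0.75 s, no wasted cycles.
low, high = tts_bounds([2, 3], d=0.75, gamma=0.2, zeta=0.14,
                       num_wasted_cycles=0, root_outdegree=8)
```

Here the lower bound is 3.75 s and the upper bound 5.93 s; summing such bounds over every character of a text gives the time-to-enter bounds from which the entry-rate predictions are derived.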

The visual scan time for 43 characters is α = 1.085 s, based on the Hick-Hyman law [18]. The visual stimulus reaction time is γ = 0.2 s, based on the literature [7]. The key press time is ζ = 0.14 s, based on a key repeat experiment by Card et al. [6]. These values have not been established as valid for the target users of AAC devices (but are applicable to the participants in the study described in the next section).

8. EXPERIMENT DESIGN

A user study was conducted to compare the two text entry variants. Twelve people, drawn from the undergraduate population, participated in this study: five females and seven males, with ages ranging from 19 to 39. All were familiar with the conventional QWERTY text entry method, but none had been exposed to the indirect text entry methods compared in the study. None had physical disorders. The experiment was a 2×2×4 factorial design. The three factors were:
1. Indirect text entry variant: {Row-Column, Huffman}
2. Dwell period (d): {750 ms, 1250 ms}
3. Repetition: {4 repetitions}
The first factor was administered within subjects with counterbalancing, such that half the participants started with the Row-Column variant and the other half with the Huffman variant. The second factor was administered between subjects. The third factor was administered within subjects. Table 1 shows how the 12 participants were allocated to the conditions.

Sequence          d = 750 ms    d = 1250 ms
Row-Col, Huff     3             3
Huff, Row-Col     3             3

Table 1: Experiment design: distribution of participants to the conditions.

The experiment software was developed with Java 5.0. The host system was an Intel Pentium 4 at 3 GHz with 1 GB RAM, running Microsoft Windows XP Professional Service Pack 2. The experiment was conducted in a 10-machine lab in York University's Computer Science and Engineering Building. To minimize interference from other users, the lab was booked for the experiment; the only people present during the experiments were the experimenter and the participants. Each session involved between one and three participants, scheduled at their convenience, and featured only one configuration of variant sequence and dwell period. At the beginning of each session, the experimenter gave a demonstration of each variant, in the order and with the dwell period to which the participant(s) would be subjected. Participants entered a target text four times with one variant, and then did the same with the other variant. Data collection for each phrase began when the participant pressed "Enter" on the keyboard. Participants were asked to take their time between phrases.

8.1 Choice of Dwell Period

The dwell period d must be chosen carefully and, for the design domain of AAC devices, needs to be tailored on an individual basis (as the user population is heterogeneous with respect to physical capabilities). A dwell period that is too short will result in many errors, since the target user will have difficulty pressing the key before the dwell period finishes and the focus moves on. A dwell period that is too long results in unnecessarily slow entry speeds. In order to see the effect of varying the scanning period, two periods were chosen for the experiment on the basis of human visual scan time. While the models assume that the slope coefficient of the Hick-Hyman law is 200 ms, work by Welford has shown that this coefficient can range from 160 ms to 320 ms [14]. We used these two coefficient bounds to find visual scan plus repeat key times of 1.0082 s and 1.8764 s. Rounding to the nearest second for convenience, the dwell periods were initially chosen to be d_fast = 1 s and d_slow = 2 s. (For users of AAC devices, other dwell period values will need to be derived.) In a preliminary trial, three participants were asked to enter the test phrase four times with each method, with d_slow = 2 s. It became apparent that this dwell period was too long, as participants were seen to be waiting unnecessarily, especially with the row-column method. On the basis of these preliminary trials, the scanning periods were reassigned to d_fast = 750 ms and d_slow = 1250 ms, which correspond to ±250 ms relative to the visual scan lower bound. Because the two values straddle this visual scan threshold, the effect of the dwell period is emphasized.
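The reported timing values can be reproduced with a short Python calculation. This sketch assumes the Hick-Hyman law in the form slope·log2(N) with no intercept (that form matches α = 1.085 s for N = 43 and a 200 ms slope) and adds the key press time ζ = 0.14 s for the two bounds; `visual_scan_time` is a hypothetical helper name.

```python
import math

N_SELECTABLES = 43
KEY_PRESS = 0.14  # zeta, in seconds


def visual_scan_time(n: int, slope: float) -> float:
    """Hick-Hyman visual scan time, assumed here to be slope * log2(n) (no intercept)."""
    return slope * math.log2(n)


alpha = visual_scan_time(N_SELECTABLES, 0.200)
fast = visual_scan_time(N_SELECTABLES, 0.160) + KEY_PRESS
slow = visual_scan_time(N_SELECTABLES, 0.320) + KEY_PRESS
print(f"alpha = {alpha:.3f} s; bounds = {fast:.4f} s, {slow:.4f} s")
# alpha = 1.085 s; bounds = 1.0082 s, 1.8764 s
print(round(fast), round(slow))  # initial dwell periods in seconds: 1 2
```

Rounding the two bounds gives the initial 1 s and 2 s dwell periods; the final 750 ms and 1250 ms values were then set by hand after the preliminary trials.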

8.2 Choice of Target Text

The same target text was presented repeatedly to all the participants. Our hypothesis was that this would afford a faster learning effect than if the target text varied among the sessions. The effect of learning must be taken into account so that the evaluation reflects the overall relative effectiveness of one variant over another, rather than localized phenomena. We were not concerned that this choice would shift the task from one of skill acquisition to motor memorization, since motor memorization characterizes expert-level text entry anyway.

Figure 4: Screen shot of Huffman variant.

The following character sequence C served as the target text:

THE QUICK BROWN FOX JUMPED OVER THE LAZY DOGS

This target text contains at least one instance of each of the 26 alphabetical selectables. The probability distribution over the selectables in the target text does not resemble the one used to derive the Huffman coding. We decided that this was acceptable, given that the same target text was used for both variants; we assumed that the effect on each variant would be approximately the same. Moreover, confounding due to differences in the probability distributions over the target text selectables is eliminated. A much larger set of target texts would be needed for the probability distribution to be representative of the one used in the design of the variants.
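A quick check of the target text with Python's standard library illustrates both points above: which selectables it covers, and what its empirical unigram distribution looks like. This is an illustration only; the full selectable set of the variants (in particular the exact punctuation keys) is not enumerated in the paper, so the sketch checks only letters and digits.

```python
import string
from collections import Counter

TARGET = "THE QUICK BROWN FOX JUMPED OVER THE LAZY DOGS"

counts = Counter(TARGET)
covers_all_letters = set(string.ascii_uppercase) <= set(TARGET)
digits_present = sorted(set(string.digits) & set(TARGET))
# Empirical unigram distribution of the target text, most frequent first.
distribution = [(ch, n / len(TARGET)) for ch, n in counts.most_common()]
print(covers_all_letters, digits_present, counts[" "], distribution[0])
```

The text is a 45-character pangram in which the space is the most frequent selectable (8 of 45) and no digits appear, which is one reason its distribution differs from a corpus-derived one.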

8.3 Interface Design

Screen shots of the software used in the experiment are shown in figure 1 for the row-column variant and figure 4 for the Huffman variant. A containment hierarchy based on Huffman coding presents a challenge for the keyboard layout: at any given stage, the highlighting alternates between two groups, which are often of quite different sizes. We fashioned the layout such that, no matter the user's current position in the containment hierarchy, the groups between which the highlighting alternates are adjacent and contiguous. The desired effect was that the user would be able to clearly see the scanning alternate between two groups of keys, from a group on the left to one on the right, or from an upper group to a lower one. The layout can be seen in figure 4. The "up arrow" key was assigned to the "undo" function. If an incorrect selection is made when focus is on a leaf node, the undo function deletes the most recently appended selectable from the text being composed. If an incorrect selection is made when focus is on an internal node, the undo function shifts the focus up one level in the CH. Without this undo function, a user who made an incorrect selection at an internal node would be obliged to complete the traversal down the CH until some leaf was chosen, and would then need to choose the selectable corresponding to delete. The participants were instructed that if they appended an incorrect character to the composed text, they should ignore it and continue with the next character of the target text.
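The select and undo behaviour described above amounts to a small state machine over the CH. The Python sketch below is hypothetical: nested lists stand in for internal nodes and strings for selectables, passive focus advancement is abstracted away (the caller passes the index of the child that was in focus at the moment of the key press), and it assumes that focus returns to the root level after a leaf is selected.

```python
from typing import List, Union

Tree = Union[str, List["Tree"]]  # a selectable, or a list of child nodes


class ScanNavigator:
    """Select/undo navigation of a containment hierarchy (illustrative sketch)."""

    def __init__(self, root: List[Tree]) -> None:
        self.root = root
        self.stack: List[List[Tree]] = [root]  # top = node whose children are scanned
        self.text: List[str] = []

    def select(self, in_focus: int) -> None:
        node = self.stack[-1][in_focus]
        if isinstance(node, str):
            self.text.append(node)     # leaf: append its selectable
            self.stack = [self.root]   # assumed: focus returns to the root level
        else:
            self.stack.append(node)    # internal node: drill down one level

    def undo(self) -> None:
        if len(self.stack) > 1:
            self.stack.pop()           # mid-traversal: shift focus up one level
        elif self.text:
            self.text.pop()            # at the root: delete the last selectable


nav = ScanNavigator([["A", "B"], ["C", ["D", "E"]]])
nav.select(1)   # drill into the second group by mistake
nav.undo()      # back up to the root level
nav.select(0)
nav.select(1)   # selects "B"
```

Without the first branch of `undo`, a mistaken drill-down could only be recovered by finishing the traversal and then selecting delete, which is the cost the undo key is meant to avoid.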

9. RESULTS AND DISCUSSION

9.1 Model Predictions


We used words per minute (wpm) as a normalized unit to characterize rate of input (where 5 characters = 1 word). The lower bound on the time to enter text sequence C corresponds to the upper bound on its rate of entry, r. The model predictions are shown in table 2. These predictions only partly support our hypothesis that the Huffman variant affords faster rates of entry: its predicted upper bound exceeds that of the Row-Column variant only for the longer dwell period (d = 1.25 s), and even then by less than one might expect.
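The conversion to wpm, and the inversion of time bounds into rate bounds, can be written out in a few lines of Python. The function names are hypothetical and the 180 s figure in the example is an arbitrary illustration, not a measured value.

```python
from typing import Tuple

TARGET = "THE QUICK BROWN FOX JUMPED OVER THE LAZY DOGS"
CHARS_PER_WORD = 5


def words_per_minute(num_chars: int, seconds: float) -> float:
    """Entry rate r in wpm, counting 5 characters as one word."""
    return (num_chars / CHARS_PER_WORD) / (seconds / 60.0)


def rate_bounds(num_chars: int, t_lower: float, t_upper: float) -> Tuple[float, float]:
    """A lower bound on time-to-enter gives an upper bound on rate, and vice versa."""
    return words_per_minute(num_chars, t_upper), words_per_minute(num_chars, t_lower)


# The 45-character target text entered in 180 s corresponds to 3 wpm.
print(len(TARGET), words_per_minute(len(TARGET), 180.0))  # 45 3.0
```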

Variant       d = 0.75 s          d = 1.25 s
Row-Column    2.48 < r < 4.50     1.57 < r < 3.32
Huffman       1.45 < r < 4.36     0.95 < r < 3.89

Table 2: Predicted rate of entry (wpm) for the Huffman and Row-Column variants for two different dwell periods.

Figure 6: Learning curves by variant, extrapolated to the 12th session. [Mean rate of entry (wpm) against session number (1 to 12) for the Row-Column and Huffman variants, with the mean upper and lower bounds. Logarithmic trendlines: Row-Column, y = 0.7883 ln(x) + 2.093 (R² = 0.9587); Huffman, y = 0.2713 ln(x) + 1.5038 (R² = 0.7929).]

9.2 Entry Speed

The analysis of variance of text entry speed showed significant effects for variant (F1,10 = 101.87, p < .0001), session (F3,30 = 34.11, p < .0001), and the variant-by-session interaction (F3,30 = 6.01, p < .005). Figure 5 shows the observed entry speeds relative to the predicted entry speeds. Contrary to the model predictions, the Row-Column variant was clearly the faster variant. According to the variant-by-session means, the Row-Column variant started at 2.048 wpm and reached 3.075 wpm after four sessions, whereas the Huffman variant started at 1.445 wpm and ended at 1.809 wpm. These empirical results suggest that the upper bound estimate on entry rate (corresponding to the lower bound on time-to-enter) is inaccurate. The Huffman variant minimizes the mean encoding length of the selectables for its outdegree, and the binary instantiation (k = 2) minimizes the size of the focus cycle. Our hypothesis was predicated on the belief that these properties would translate into an improved rate of entry. However, the Huffman variant carries a much higher cognitive load. For instance, at any given point in time, focus is shifting between differently sized groups of selectables. We attempted to mitigate this through spatial proximity (namely, the groups are always contiguous and in either a left-right or top-bottom relationship with each other). Nevertheless, the high cognitive load manifested itself in a much larger number of wasted cycles than predicted by the model. Figure 6 illustrates the learning trends for the Huffman and Row-Column variants. Not only are the Huffman rates of entry well below the upper bounds, but (at R² = 79%) they do not exhibit the adherence to the power law of learning [6] that was demonstrated by the Row-Column variant (R² = 96%). It is possible that after four sessions, users of the Huffman variant are still too early in the learning process for the learning relation to take shape. Adding a parameter to the model to reflect the familiarity or predictability of the scanning pattern may yield a better upper bound prediction.

9.3 Dwell Period and Error Rate

Significant effects were found for dwell period (F1,10 = 9.57, p < .05) and for the dwell-period-by-variant interaction (F1,10 = 8.37, p < .05). On average, users entered text faster when the dwell period was shorter (2.62 wpm at 750 ms, 1.82 wpm at 1250 ms). However, no significant effect was found for dwell period by session (F3,30 = 1.26, p > .05): learning occurred just as quickly with the short dwell period as with the longer one. A better measure of adeptness across sessions may be error rate, since rate of entry may increase for both dwell periods simply with the confidence that comes with familiarity. However, there were no significant effects on error rate for scanning period (F1,10 = 3.91, p > .05), for period by session (F3,30 = 0.179, ns), or for period by variant (F1,10 = 0.667, ns). There was a significant difference between the error rates of the two variants (F1,10 = 14.58, p < .005), with means of 18.50% for Huffman and 6.60% for Row-Column. This is shown in figure 7.

Figure 7: Error rates by variant and session. [Error rate (%) against session number (1 to 4) for the Huffman and Row-Column variants.]
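The trendlines reported for Figure 6 have the form y = a·ln(x) + b. The per-session means behind them are not listed in the text, so the following Python sketch (hypothetical function names) only shows how such a fit and its R² can be computed by ordinary least squares, and evaluates the two reported trendlines at session 12.

```python
import math
from typing import Sequence, Tuple


def fit_log_curve(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of y = a*ln(x) + b; returns (a, b, R^2)."""
    u = [math.log(v) for v in x]
    n = len(u)
    mean_u, mean_y = sum(u) / n, sum(y) / n
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    s_uy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
    a = s_uy / s_uu
    b = mean_y - a * mean_u
    ss_res = sum((yi - (a * ui + b)) ** 2 for ui, yi in zip(u, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def extrapolate(a: float, b: float, session: int) -> float:
    """Evaluate a fitted logarithmic learning curve at a given session number."""
    return a * math.log(session) + b


# Evaluating the reported trendlines at the 12th session:
row_column_12 = extrapolate(0.7883, 2.093, 12)  # about 4.05 wpm
huffman_12 = extrapolate(0.2713, 1.5038, 12)    # about 2.18 wpm
```

Even on the extrapolated trendlines, the Huffman variant remains well below the Row-Column variant at session 12.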

Figure 5: Predicted entry speeds shown in comparison with observed entry speeds. [Four panels: Huffman and Row-Column variants, each at d = 750 ms and d = 1250 ms. Each panel plots mean rate of entry (wpm) against session number (1 to 4) for the observed rate and the predicted upper and lower bounds.]

9.4 User Feedback Participants were asked to make general comments regarding the variants. Those who expressed a preference with regard to speed or usability chose Row-Column over Huffman. The most frequently cited reasons were the predictability of the Row-Column scanning pattern and the high cognitive load of the Huffman variant (e.g., “It felt like torture” and “You always have to be looking”).

10. CONCLUSION AND FUTURE WORK This paper introduced a new descriptive model for indirect text composition facilities that has at its core the notion of a containment hierarchy. We used this descriptive model as a common framework and, within it, identified the Row-Column and Huffman variants. The containment hierarchy improves on Abascal et al.’s “D-dimensional matrix” model [1], which encompasses only a subset of the design space covered by the containment hierarchy model. The containment hierarchy model better characterizes the various factors involved in design and gives the designer an informed perspective from which to optimize parameters. This paper also demonstrated a novel, computer-aided technique for the design of indirect text selection interfaces, one in which Huffman coding is used to derive the containment hierarchy. This approach guarantees the derivation of containment hierarchies that are optimal with respect to mean encoding length. This paper also presented a predictive model for mean entry rate (wpm) and described an empirical study of two two-key indirect text entry variants, comparing them both to one another and to the predictive model. The introduction of the Huffman variant and its accompanying selection tree model provides a general way to approach the design and comparison of all one-key text entry variants. The accuracy of the predictive model must be improved by accounting more completely and explicitly for the characteristics that affect performance, in particular the predictability of the scanning pattern. This deficit is highlighted by the lack of correspondence between the predictions and the empirical observations of subjects using the Huffman variant. We conjecture that scanning predictability is correlated with outdegree in the containment hierarchy: the higher the value of k, and the more consistent the values of k within the hierarchy, the more predictable the scanning pattern will be perceived to be.
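The Huffman-derived containment hierarchy can be sketched as a k-ary Huffman construction in which each merged group becomes a node of the hierarchy and a selectable's depth is the number of selections needed to reach it. The Python below is a minimal sketch under stated assumptions: the function names are hypothetical, the toy probabilities stand in for the V260-derived distribution (which is not reproduced here), and zero-probability placeholders are added, as in standard k-ary Huffman coding, so that every merge combines exactly k nodes. It does not model how groups are laid out on screen or the time cost of the focus cycle.

```python
import heapq
import itertools
from typing import Dict, List, Union

# A node of the containment hierarchy: a selectable (str) or a group of nodes.
Node = Union[str, List["Node"]]


def huffman_hierarchy(probs: Dict[str, float], k: int = 2) -> Node:
    """Build a k-ary Huffman tree; each internal node is a group of selectables."""
    if k < 2:
        raise ValueError("outdegree k must be at least 2")
    if not probs:
        raise ValueError("need at least one selectable")
    tie = itertools.count()  # unique tie-breaker so heapq never compares nodes
    heap = [(p, next(tie), sym) for sym, p in probs.items()]
    # Pad with zero-probability placeholders (None) until every merge
    # can take exactly k nodes, as in standard k-ary Huffman coding.
    while (len(heap) - 1) % (k - 1) != 0:
        heap.append((0.0, next(tie), None))
    heapq.heapify(heap)
    while len(heap) > 1:
        group = [heapq.heappop(heap) for _ in range(k)]
        total = sum(p for p, _, _ in group)
        children = [node for _, _, node in group if node is not None]
        heapq.heappush(heap, (total, next(tie), children))
    return heap[0][2]


def code_lengths(node: Node, depth: int = 0) -> Dict[str, int]:
    """Depth of each selectable, i.e. the number of selections needed to reach it."""
    if isinstance(node, str):
        return {node: depth}
    lengths: Dict[str, int] = {}
    for child in node:
        lengths.update(code_lengths(child, depth + 1))
    return lengths


def mean_encoding_length(probs: Dict[str, float], k: int = 2) -> float:
    """Probability-weighted mean depth of the selectables."""
    lengths = code_lengths(huffman_hierarchy(probs, k))
    return sum(p * lengths[sym] for sym, p in probs.items())


if __name__ == "__main__":
    # Hypothetical toy distribution, not the V260-derived one used in the study.
    toy = {"e": 0.4, "t": 0.2, "a": 0.2, "o": 0.1, "n": 0.1}
    for k in (2, 3, 7):
        print(k, huffman_hierarchy(toy, k), round(mean_encoding_length(toy, k), 3))
```

With k = 2, the toy distribution yields a mean encoding length of 2.2 selections; k = 3 and k = 7 reduce it to 1.4 and 1.0. Those shorter encodings come at the price of larger focus cycles at each level, which the mean encoding length alone does not capture.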

Figure 8 shows a hybrid of the Row-Column and Huffman variants that may serve as a successful compromise between the predictability of the Row-Column variant and the selection-over-wait advantage of Huffman: its outdegree of k = 7 is the ceiling of the outdegree of the Row-Column variant, but its encoding is based on Huffman coding. The method is characterized by {n, k} = {4, 7}.


Figure 8: Containment hierarchy of the hybrid of the Row-Column and Huffman variants.


The intended application of these techniques is the design of indirect text entry facilities for the users of AAC systems. The evaluation of such facilities must be conducted with the actual intended users, namely individuals who have physical disorders. Since such evaluations are serious undertakings, it is prudent to first perform a preliminary evaluation with subjects who are more readily available. The generalizability of the results may be limited, but the exercise is invaluable for homing in on salient experimental questions, and thus for making the best possible use of experimental trials with more representative users. From this study, we have learned that a larger number of sessions needs to be conducted, over a longer period of time, so that any learning effects can become salient. Future experimental conditions will also use other experimental stimuli (e.g., text that includes punctuation) [12] and may include other design variants drawn from mobile-device and soft-keyboard text entry, which is also concerned with few-key text entry and thus also belongs to the domain of AAC [17]. Future analysis will also use a more sophisticated error metric [19] than the simple error rate used here.
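As a point of reference for that future analysis, the minimum string distance (MSD) error rate evaluated in [19] compares the presented text P with the transcribed text T directly: MSD(P, T) / max(|P|, |T|) × 100%. The Python sketch below (function names hypothetical) computes it with the standard Levenshtein dynamic program; it does not implement the new unified error metric also proposed in [19], which additionally requires keystroke-level input data.

```python
def msd(a: str, b: str) -> int:
    """Minimum string distance (Levenshtein): fewest insertions, deletions,
    and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


def msd_error_rate(presented: str, transcribed: str) -> float:
    """MSD error rate in percent: MSD / max(|P|, |T|) * 100."""
    longest = max(len(presented), len(transcribed))
    return 0.0 if longest == 0 else 100.0 * msd(presented, transcribed) / longest
```

For example, transcribing "the cat" as "the bat" gives one substitution over seven characters, an MSD error rate of about 14.3%.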


11. ACKNOWLEDGMENTS This research was supported by the Natural Sciences and Engineering Research Council of Canada. Dr. Horabail Venkatagiri kindly provided the evaluation materials from his 1999 study (the “V260 corpus”), from which we were able to extract the probability distribution that was used in this study. The anonymous reviewers provided much valuable feedback, for which we are sincerely thankful.


12. REFERENCES

[1] J. Abascal, L. Gardeazabal, and N. Garay. Optimisation of the selection set features for scanning text input. In Proceedings of the 9th International Conference on Computers Helping People with Special Needs (ICCHP 2004), pages 788–795, Paris, France, 2004.

[2] American Speech-Language-Hearing Association (ASHA). Competencies for speech-language pathologists providing services in augmentative communication. ASHA, 31(3):107–110, 1989.

[3] M. Baljko. The contrastive evaluation of unimodal and multimodal interfaces for voice output communication aids. In Proceedings of the Seventh International Conference on Multimodal Interfaces (ICMI’05), 2005.

[4] M. Baljko. The information-theoretic analysis of unimodal interfaces and their multimodal counterparts. In Proceedings of the Seventh International ACM Conference on Assistive Technologies (ASSETS’05), 2005.

[5] D. R. Beukelman and P. Mirenda. Augmentative and Alternative Communication: Supporting Children & Adults with Complex Communication Needs. Paul H. Brookes, Baltimore, MD, third edition, 2005.

[6] S. Card, T. Moran, and A. Newell. The Psychology of Human–Computer Interaction. Lawrence Erlbaum Associates, Hillsdale, NJ, 1983.

[7] A. Dix, J. Finlay, G. Abowd, and R. Beale. Human–Computer Interaction. Pearson, third edition, 2004.

[8] T. Felzer and R. Nordmann. How to operate a PC without using the hands. In Proceedings of the Seventh International ACM Conference on Assistive Technologies (ASSETS’05), pages 198–199, Baltimore, MD, October 9–12, 2005.

[9] I. S. MacKenzie. Mobile text entry using three keys. In Proceedings of the Second Nordic Conference on Human–Computer Interaction (NordiCHI 2002), pages 27–34, 2002.

[10] I. S. MacKenzie, H. Kober, D. Smith, T. Jones, and E. Skepner. LetterWise: Prefix-based disambiguation for mobile text input. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST 2001), pages 111–120, 2001.

[11] I. S. MacKenzie and R. W. Soukoreff. Text entry for mobile computing: Models and methods, theory and practice. Human–Computer Interaction, 17:147–198, 2002.

[12] I. S. MacKenzie and R. W. Soukoreff. Phrase sets for evaluating text entry techniques. In Extended Abstracts of the ACM Conference on Human Factors in Computing Systems (CHI 2003), pages 754–755, 2003.

[13] I. S. MacKenzie and S. X. Zhang. The design and evaluation of a high-performance soft keyboard. In Proceedings of the 1999 Conference on Human Factors in Computing Systems (CHI’99), pages 25–31, May 1999.

[14] I. S. MacKenzie, S. X. Zhang, and R. W. Soukoreff. Text entry using soft keyboards. Behaviour & Information Technology, 18:235–244, 1999.

[15] P. Majaranta, I. S. MacKenzie, A. Aula, and K.-J. Räihä. Auditory and visual feedback during eye typing. In Extended Abstracts of the ACM Conference on Human Factors in Computing Systems (CHI 2003), pages 766–767, 2003.

[16] E. Matias, I. S. MacKenzie, and W. Buxton. One-handed touch-typing on a QWERTY keyboard. Human–Computer Interaction, 11(2–3):1–27, 1996.

[17] K. F. McCoy, C. A. Pennington, and A. L. Badman. Compansion: From research prototype to practical integration. Natural Language Engineering, 4(1):73–95, March 1998.

[18] R. W. Soukoreff and I. S. MacKenzie. Theoretical upper and lower bounds on typing speed using a stylus and soft keyboard. Behaviour & Information Technology, 14:370–379, 1995.

[19] R. W. Soukoreff and I. S. MacKenzie. Metrics for text entry research: An evaluation of MSD and KSPC, and a new unified error metric. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2003), pages 113–120. ACM Press, 2003.

[20] J. Todman and N. Alm. Modelling conversational pragmatics in communication aids. Journal of Pragmatics, 35(4):523–538, April 2003.

[21] H. S. Venkatagiri. Efficient keyboard layouts for sequential access in augmentative and alternative communication. Augmentative and Alternative Communication, 15(2):126–134, June 1999.