Automated Nonlinear Regression Modeling for HCI

Antti Oulasvirta
Max Planck Institute for Informatics and Saarland University

ABSTRACT
Predictive models in HCI, such as models of user performance, are often expressed as multivariate nonlinear regressions. This approach has been preferred because it is compact and allows scrutiny. However, existing modeling tools in HCI, along with the common statistical packages, are limited to predefined nonlinear models or support linear models only. To assist researchers in the task of identifying novel nonlinear models, we propose a stochastic local search method that constructs equations iteratively. Instead of predefining a model equation, the researcher defines constraints that guide the search process. Comparison of outputs to published baselines in HCI shows improvements in model fit in seven out of 11 cases. We present a few ways in which the method can help HCI researchers explore modeling problems. We conclude that the approach is particularly suitable for complex datasets that have many predictor variables.

Author Keywords
Predictive modeling in human–computer interaction; multivariate nonlinear regression models; model selection.

ACM Classification Keywords
H.5.m. Information Interfaces and Presentation (e.g., HCI): Miscellaneous

INTRODUCTION
Predictive models are used in human–computer interaction (HCI) in theory construction, user-interface (UI) design, adaptive UIs, and UI optimization. This paper addresses multivariate nonlinear regression equations, a popular approach to modeling (the literature provides a useful introduction [9]). As a concrete example, consider Fitts' law. Although at times presented as a linear model, it is a nonlinear one: given movement amplitude (A) and target width (W) as predictor variables, Fitts' law predicts movement time T (the response variable) as T = a + b·ID = a + b log2(2A/W). Fitts' law is linear only after the nonlinear transformation that yields ID. However, nonlinear models are not limited to pointing tasks; they appear in many modeling papers in the HCI field.

HCI researchers have preferred regression because it supports many research goals. Firstly, regression is a "white box" approach: equations allow scrutiny against pragmatic and theoretical assumptions. Secondly, predictions can be computed rapidly, which makes regression models useful in online applications such as adaptive interfaces and interface optimization. Thirdly, it is flexible: a pragmatic researcher can improve model fit by introducing more free parameters and predictor variables, though at the expense of parsimony and generalizability.

This Note addresses the issue that identifying nonlinear models can be a grueling enterprise, since the number of (theoretically) possible models grows exponentially with the number of predictor variables. While present-day tools support parameter estimation and analytics for predefined equations, they do not adequately support the process of identifying equations with desirable qualities. We work from the observation that a nonlinear regression model can be produced by a sequence of operations starting with a linear model. Fitts' law, for example, can be obtained by starting with a linear model with A and W as predictor variables and applying first division and then log2. We present an algorithmic approach for exploring model spaces like this. The method aims to find the best function that maps experimental variables (predictors) to a response variable while respecting researcher-defined constraints on the function. It is a stochastic local search method that permits control of the model's content and the search process. In its design, we have paid attention to the specific nature of 1) prediction tasks and 2) goals in modeling in HCI. The method allows the researcher to set constraints that aid in striking a balance among parsimony, consistency with assumptions, and model fitness. Instead of working with one equation at a time, the researcher can work with sets of models found via changes in the constraints that guide the search algorithm. The requirement of complete pre-knowledge is relaxed.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CHI'14, April 26–May 1, 2014, Toronto, Canada.
Copyright © 2014 ACM 978-1-4503-2473-1/14/04 $15.00.
http://dx.doi.org/10.1145/2556288.2556999
Such capability can be useful for four purposes:

1. Identifying a "lower bound" for optimal model fit
2. Seeing whether a problem is "modelable" in the first place
3. Exploring alternative models that conform to the desired constraints
4. "Brute force" modeling: acquiring a model for applications wherein theoretical plausibility is unimportant

For evaluation, we compare generated models to those reported in the HCI literature. The datasets range from pointing tasks to menu selection and multitouch rotations. The method found improvements in more than half of the cases and showed comparable performance in the others. The results were particularly promising for complex datasets with more predictors and observations.
BACKGROUND AND APPROACH
The available tools do not adequately support the task of automatically identifying multivariate nonlinear models. Firstly, the tools previously developed within HCI [14, 16, 17] support model fitting, diagnostics, and visualizations but have been limited to predefined pointing models (e.g., those of Fitts, Welford, and MacKenzie). Secondly, although the general statistical packages¹ allow a researcher to enter arbitrary nonlinear models, they support only parameter estimation, not the identification of models. Thirdly, while some tools do support automatic exploration of nonlinear models, they do not support a larger number of predictors (e.g., xuru.org supports one). Finally, a recent interactive tool [10] supports the construction of models with visualizations, but the process is manual and time-consuming.

We build on optimization solutions [12] to the "symbolic regression" problem [6]. We propose a variant of stochastic local search motivated by four observations about modeling in HCI. First, HCI modeling often favors particular operations on terms. Second, the number of predictors is relatively small and typically limited by the number of independent variables in the experiment in which the data are collected; this number typically lies in the range 1–6. These two observations imply that variable selection [9] is not a significant challenge, but control of the terms in the equation is. In contrast, automated tools in, for example, chemistry [15] target problems with hundreds of predictors and need different controls. Third, our examination of some existing models (see Table 1) suggested that a linear model with all predictors as separate terms can serve as an initialization for search. Fourth, HCI models are typically based on averaged point data, so model fitting is significantly faster than with unaveraged data. These observations imply that even local search may yield good results.

METHOD
Our model assumes repeated observations of a response variable Y in independent conditions x1, ..., xk ⊂ X. The value Yi in condition xi follows the model

    Yi = Σ_{j=1}^{m} βj fj(xi) + εi,        (1)
where the coefficients β are free parameters. The task of the algorithm is to find the functions fj(x) that maximize model fitness (e.g., R²). It achieves this by performing a sequence of operations on terms in each iteration. The operations are the familiar algebraic (addition, exponentiation, etc.) and transcendental (exponential, logarithmic, and trigonometric) operations. Unary operations (e.g., log) involve a single term, and binary ones (e.g., multiplication) involve two. At present, we cover 16 operations, but this range is extendable.

Our method is stochastic local search, a random search method [12]. Stochastic local search includes a probabilistic decision criterion that allows it to get past local maxima. Stochastic hill climbing is a baseline method for biologically inspired methods such as genetic algorithms [5].

¹ E.g., SPSS, Stata, R, SAS, Matlab, Maple, and Statistica.
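For illustration only, the core of such a search can be sketched in a few dozen lines of Python. All names here are ours, not the tool's actual code; the tabu list, multi-operation neighborhoods, and most of the 16 operations are omitted. Terms are the functions fj of Eq. (1), the coefficients β are fitted by ordinary least squares, and a probabilistic criterion occasionally accepts a slightly worse candidate:

```python
import math
import random

import numpy as np

def fit_and_score(terms, X, y):
    """Fit Eq. (1) by ordinary least squares for a candidate set of
    term functions f_j and return the coefficient of determination R^2."""
    F = np.column_stack([f(X) for f in terms] + [np.ones(len(y))])  # terms + intercept
    beta, *_ = np.linalg.lstsq(F, y, rcond=None)
    residuals = y - F @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

# A small subset of the unary operations (the paper's tool covers 16 operations).
UNARY_OPS = [np.log2, np.log10, np.sqrt, np.reciprocal]

def neighbors(terms, count=10):
    """Single-operation neighbors: wrap one randomly chosen term
    in one randomly chosen unary operation."""
    result = []
    for _ in range(count):
        candidate = list(terms)
        j = random.randrange(len(candidate))
        op, f = random.choice(UNARY_OPS), candidate[j]
        candidate[j] = lambda X, op=op, f=f: op(f(X))
        result.append(candidate)
    return result

def search(X, y, predictors, iterations=100, temperature=0.01):
    """Stochastic local search: greedy steps, plus Metropolis-style acceptance
    of downhill moves, starting from the all-predictors linear model."""
    current, current_fit = list(predictors), fit_and_score(predictors, X, y)
    best, best_fit = current, current_fit
    for _ in range(iterations):
        scored = []
        for candidate in neighbors(current):
            with np.errstate(all="ignore"):
                try:
                    s = fit_and_score(candidate, X, y)
                except Exception:  # e.g., NaN/inf in a transformed term
                    continue
            if np.isfinite(s):
                scored.append((s, candidate))
        if not scored:
            continue
        s, candidate = max(scored, key=lambda pair: pair[0])
        # Accept improvements greedily; otherwise accept with Metropolis probability.
        if s > current_fit or random.random() < math.exp((s - current_fit) / temperature):
            current, current_fit = candidate, s
        if current_fit > best_fit:
            best, best_fit = current, current_fit
    return best, best_fit
```

A caller would pass the predictors as column selectors, e.g. `[lambda X, i=i: X[:, i] for i in range(X.shape[1])]`; the returned fitness is never below that of the initial linear model, since the best candidate seen so far is tracked separately from the current one.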
Moreover, we hybridize this method with hill climbing (greedy search). Hill climbing examines candidates in the local neighborhood and selects the best neighbor. The probability with which the method performs greedy local search (steepest ascent) versus stochastic search can be controlled by the user on the basis of time constraints. In our implementation, search starts with a linear model that includes every predictor separately as a term. In every iteration, we take the model with the highest fitness value, generate a set of neighbors, and evaluate them. The strategy from here on depends on the outcome: if an improvement was found, all single-operation neighbors are examined to find the best improvement (steepest-ascent hill climbing); if no improvement was found, the neighborhood is searched more deeply via generation of multi-operation variants. The best candidate is always selected. However, we use the Metropolis criterion [12] to stochastically accept a candidate that is poorer than the best known, if it is not too much worse and if many rounds have passed without improvement. A tabu list keeps track of already visited models and prevents re-accessing them.

User controls
Our prime goal has been to provide sufficient control over the process and outcomes. The user can change:

1. The maximum number of free parameters
2. The types of operations permitted
3. The maximum number of operations allowed per term
4. The maximum number of neighbors generated in each iteration
5. A seed equation for search, with a list of terms that are not to be changed
6. The stochasticity factor: with a zero value, search is deterministic and favors steepest ascent to find a local maximum quickly; with a higher value, search is slower and stochastic but has an improved chance of finding the optimum
7. The fitness function: R², AIC (Akaike Information Criterion), AICc (AIC corrected for sample size), or BIC (Bayesian Information Criterion) [1, 9]

Implementation
The algorithm is implemented in Python and uses OLS (ordinary least squares) for fitting. The program is operated from the command line; the user specifies an input file and parameters. By default, variables are treated as scalars, but dummy coding can be applied to categorical variables. The program prints intermediate results to the display and to a file and can be stopped during search. It outputs the best equation found, with coefficients, and reports statistics of model fit and residuals along with some simple diagnostics (e.g., p-values for coefficients).

Limitations
The tool does not eliminate the researcher's responsibility for model quality but introduces new tasks [1]. One such task is encoding domain knowledge. For instance, Eq. (1) does not allow free parameters within a term (e.g., ax^b), but this can be worked around by searching in stages and encoding such parameters into fixed terms. A second task is deciding on a meaningful fitness score; we currently use R², but this can be changed to cross-validation metrics. A third is model diagnostics. For instance, the use of OLS assumes the absence of collinearity and homogeneous error variance [9]. The latter is probably an unrealistic assumption for many HCI datasets, and analytics are needed to examine the consequences. Fourthly, the equations are not always elegant and require manual "beautification." Fifthly, since the outputs are multivariate models with case-specific semantics, we cannot offer general-purpose tools for plotting or diagnostics.

#  | Dataset                     | Predictors*     | n   | k | Baseline model (from paper)         | R²**     | Best model found (this paper)*** | R²
1  | Stylus tapping (1 oz) [8]   | A, W            | 16  | 2 | a + b log2(2A/W)                    | .966     | a + b log2(A/W)                  | .966
2  | Reanalyzed data [8]         | A, We           | 16  | 2 | a + b log2(A/We + 1)                | .987     | a + b(log2(log2 A) − We)         | .981
3  | Mouse pointing [8]          | A, W            | 16  | 2 | a + b log2(A/W + 1)                 | .984     | a + b log2(A/W)                  | .973
4  | (reanalyzed)                | A, We           | 16  | 2 | a + b log2(A/We + 1)                | .980     | a + b log10(A/We)                | .978
5  | Trackball dragging [8]      | A, W            | 16  | 2 | a + b log2(A/W + 1)                 | .965     | a + b log2(A − (W³)⁴)            | .981
6  | (reanalyzed)                | A, We           | 16  | 2 | a + b log2(A/We + 1)                | .817     | a + b(A/(1 − e^(log10 We)))      | .941
7  | Magic lens pointing [13]    | A, W, S         | 16  | 3 | a + b log2(D/S + 1) + c log2(S/(2A))| .88      | a + b(1 − 1/A) + cW⁹             | .947
8  | Tactile guidance [7]        | N, I, D         | 16  | 3 | Eq. 8-9, nonlinear                  | .91, .95 | Nonlinear (k = 3)                | .980
9  | Pointing, angular [3], Exp. 2 | W, H, α, A    | 310 | 4 | Eq. 33, IDpr, nonlinear             | .953     | Nonlinear (k = 4)                | .962
10 | Two-thumb tapping [11]      | ID, Telapsed    | 20  | 6 | Eq. 5-6, quadratic                  | .79      | a + b(Telapsed²/ID)              | .929
11 | Menu selection [2]          | B, I, D, W, Fr  | 10  | 6 | Eq. 1-7, nonlinear                  | .99, .52 | Nonlinear (k = 6)                | .990

Table 1. Benchmarking automatic modeling against previously published models of response time in HCI. Notes: n = number of observations (data rows); k = number of free parameters; * all variable names are from the original papers, except I, which is interface type (dummy-coded); ** as reported in the paper; *** some equations are omitted due to space restrictions.
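The alternative fitness functions listed among the user controls (AIC, AICc, BIC) reduce to simple functions of the residual sum of squares under a Gaussian error model. A minimal sketch, under one common convention (counting only the k regression coefficients and dropping additive constants):

```python
import math

def aic(rss, n, k):
    """Akaike Information Criterion for a least-squares fit with k free
    parameters on n observations (Gaussian errors, constants dropped)."""
    return n * math.log(rss / n) + 2 * k

def aicc(rss, n, k):
    """AIC with the small-sample correction; requires n > k + 1."""
    return aic(rss, n, k) + 2 * k * (k + 1) / (n - k - 1)

def bic(rss, n, k):
    """Bayesian Information Criterion: a heavier complexity penalty than
    AIC once n > 7 (log n > 2), favoring more parsimonious models."""
    return n * math.log(rss / n) + k * math.log(n)
```

Lower values are better. For the small, averaged datasets common in HCI (e.g., the n = 16 rows of datasets 1–6 in Table 1), the AICc correction is non-negligible, which is why it is offered alongside plain AIC.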
EVALUATION

We evaluate the method with a benchmark against published models. We then consider two exploration exercises and, finally, a more complex modeling case. Our intent is not to propose that the outcomes should replace published models but to test whether the method could have aided in the exploration of model spaces.

Benchmarking with published datasets

The datasets shown in Table 1 cover traditional pointing tasks with 2–3 predictors (datasets 1–6), more complex pointing models (7 and 9), and more complex compound tasks that involve visual attention and cognition (8, 10, and 11). To match the prediction task to those in the corresponding papers, we use the same input data, including the predictor variables. Moreover, we cap the number of free parameters (k in Table 1) at the number reported in the paper and use the same fitness metric (unadjusted R²). Second-order predictors (i.e., variables derived from independent variables, such as ID [8]) are not included in the input data. Search parameters (i.e., the stochasticity factor and the number of operations allowed per iteration) were set by hand for each case after experience gained in a few trials. A MacBook Pro (2.8 GHz, 8 GB of RAM) was used for all results reported here. Every dataset was given a minimum of 1 hour 30 minutes of runtime, but the winner was often found much sooner.

Table 1 shows the results: the models identified by the method improved on model fitness in seven of the 11 cases. Model fit was the same in one case. Interestingly, the method "found Fitts' law" for Fitts' original data from 1954 (dataset 1). In three cases, the outcomes were inferior. Two conclusions are drawn:

1. Pointing datasets 1–6 provide the least room to improve, since the R²s are high to begin with.
2. The method is more successful when there are more predictors. The improvements obtained for datasets 7–11 range from small (8, 9, and 11) to medium (7) to large (10).

Constraining of model exploration

To emulate theory-oriented modeling, we took dataset 11 and limited the transformations (1/x, log2(x), ∗, /, +, −) to match the equations in the original paper. Many models were found, the best having three parameters and R² = 0.90.
Modeling of multiple datasets with a single model
We also considered modeling multiple datasets with a single model. Here, as in some pointing papers, the model terms are kept the same but the free parameters are fitted per dataset. We tested this feature in an exercise covering three datasets (1, 3, and 5) that use A and W to predict T. The best model we found had R² = 0.97.

The case of multitouch rotation gestures
To evaluate performance when there are substantially more observations and predictors, we obtained the dataset of a recent study of multitouch rotations [4]. In the study, five factors were controlled: angle (3 levels), direction (2), x-position on the surface (4), y-position on the surface (3), and diameter (4), yielding an experimental design with 288 cells. The response variable is movement time T. We estimate the search space to be on the order of 10¹². The original paper did not present a model, and Fitts' law [8] yields a low fit (R² = 0.28). Our input data contain averaged movement times from trials without contact loss. On the basis of the conclusions of the paper [4], we divided the prediction task into three subtasks: 1) all data, 2) clockwise rotations only, and 3) counterclockwise rotations only. Because no reference model existed, we allowed more time (96 hours) for search; otherwise, the same setup was used as previously. For the case with all data, a model was found with seven parameters and a fit of R² = 0.672. For the counterclockwise case, the best model had 11 parameters and a fit of R² = 0.730. The method was more successful in the clockwise-only case. The best model again had 11 parameters
and R² = 0.835. However, the method also found a model with seven free parameters and R² = 0.827, as well as a simpler model with four parameters and R² = 0.805:

    T = a + bx₁ + e^(c cos(x₃²) cos(1/2) − log₁₀(x₁·x₃)) + d tan(x₃)/x₀        (2)

Here, the variables x₀, ..., x₃ refer to x-position, y-position, angle, and diameter, respectively. Space precludes further analysis, but the results show that performance is satisfactory also for harder problems.

SUMMARY AND CONCLUSION
This paper has presented a proof of concept for automated nonlinear regression modeling in HCI. It builds on the observation that many HCI problems involve a small number of predictors, averaged point data, and a preference for simple models. The implementation offers multiple controls for constraining and guiding search. It runs on a regular computer and produces first results in a matter of seconds.

It is perhaps unsurprising that better models can be found with a search algorithm, because we have defined these tasks such that they involve relatively small search spaces, at least for a computer. However, the results confirm that the approach is sensible. For tasks involving one or two predictors, the results are comparable to those in the literature when model fitness is considered. For tasks involving two or more predictor variables, the method found models that improve on those of the reference papers. For instance, the original model for dataset 10 had six free parameters, but the method found a superior model with only two free parameters.

We hope that automated modeling can aid in both pragmatic and theoretical efforts. The ability to obtain a model automatically can accelerate research in pragmatic pursuits such as adaptive UIs and UI optimization. On the other hand, for theoretically oriented researchers who previously regarded modeling as an arcane enterprise, it might lower the barrier to entry. However, we want to warn against "fishing" in theory construction. Although the method helps with the process of identifying models, it is the modeler's responsibility to explain the terms and parameters.

Two challenges stand out for future work. First, performance should be improved, perhaps by using a tree representation for equations [6] and genetic algorithms instead of local search. Second, to better assist the identification of theoretically plausible models, we need visualizations and more interactive ways to construct models.

ACKNOWLEDGEMENTS
This research was funded by the Max Planck Centre for Visual Computing and Communication and the Cluster of Excellence on Multimodal Computing and Interaction at Saarland University. We thank Miikka Miettinen, Gilles Bailly, Michael Rohs, Andy Cockburn, Stephanie Wuhrer, Timo Kötzing, and Miguel Nacenta. Code and data are shared on the project homepage.

REFERENCES
1. Bunke, O., Droge, B., and Polzehl, J. Model selection, transformations and variance estimation in nonlinear regression. Statistics: A Journal of Theoretical and Applied Statistics 33, 3 (1999), 197–240.
2. Cockburn, A., Gutwin, C., and Greenberg, S. A predictive model of menu performance. In Proc. CHI '07, ACM Press (2007), 627–636.
3. Grossman, T., and Balakrishnan, R. A probabilistic approach to modeling two-dimensional pointing. ACM TOCHI 12, 3 (2005), 435–459.
4. Hoggan, E., et al. Multi-touch rotation gestures: Performance and ergonomics. In Proc. CHI '13, ACM Press (2013), 3047–3050.
5. Juels, A., and Wattenberg, M. Stochastic hillclimbing as a baseline method for evaluating genetic algorithms. 1994.
6. Koza, J. R. Genetic Programming: On the Programming of Computers by Means of Natural Selection, vol. 1. MIT Press, 1992.
7. Lehtinen, V., et al. Dynamic tactile guidance for visual search tasks. In Proc. UIST '12, ACM Press (2012), 445–452.
8. MacKenzie, I. S. Fitts' law as a performance model in human-computer interaction. Doctoral dissertation, University of Toronto, Toronto, Ontario, Canada, 1991.
9. Motulsky, H. J., and Ransnas, L. A. Fitting curves to data using nonlinear regression: A practical and nonmathematical review. The FASEB Journal 1, 5 (1987), 365–374.
10. Mühlbacher, T., and Piringer, H. A partition-based framework for building and validating regression models. IEEE Transactions on Visualization and Computer Graphics 19, 12 (2013), 1962–1971.
11. Oulasvirta, A., et al. Improving two-thumb text entry on touchscreen devices. In Proc. CHI '13, ACM Press (2013), 2765–2774.
12. Rao, S. S. Engineering Optimization: Theory and Practice. John Wiley & Sons, 2009.
13. Rohs, M., and Oulasvirta, A. Target acquisition with camera phones when used as magic lenses. In Proc. CHI '08, ACM Press (2008), 1409–1418.
14. Schedlbauer, M. J. An extensible platform for the interactive exploration of Fitts' law and related movement time models. In Ext. Abs. CHI '07, ACM Press (2007), 2633–2638.
15. Searson, D. P., Leahy, D. E., and Willis, M. J. GPTIPS: An open source genetic programming toolbox for multigene symbolic regression. In Proc. IMECS '10, vol. 1 (2010).
16. Soukoreff, R. W., and MacKenzie, I. S. Generalized Fitts' law model builder. In Ext. Abs. CHI '95, ACM (1995), 113–114.
17. Wobbrock, J. O., Shinohara, K., and Jansen, A. The effects of task dimensionality, endpoint deviation, throughput calculation, and experiment design on pointing measures and models. In Proc. CHI '11, ACM Press (2011), 1639–1648.