A Multidimensional Multivariate Image Evaluation Tool

Pak Chung Wong
R. Daniel Bergeron
Department of Computer Science
University of New Hampshire
Durham, New Hampshire 03824

Abstract. Our current research focuses on the representation and visualization of multidimensional multivariate (mDmV) data. We are developing a visualization evaluation tool whose primary goal is to give visualization researchers an environment for evaluating human responses to different computer-generated visual images. The tool can create mDmV data with embedded stimuli and display them in a variety of ways, including as icons. In addition, it provides statistical analysis functions that let visualization researchers study relationships among variates.
1 Introduction

Evaluating the effects of data visualizations is a complex process. Although a variety of tools exist for coding and supporting various visualization techniques, there are almost no tools that help visualization researchers evaluate the effectiveness of these techniques. Off-the-shelf visualization packages such as AVS[1] could serve as the base for such an environment, but this would require substantial programming work.

In this paper, we describe a tool that aids visual image evaluation by automating the process. The evaluation process involves an examiner, who produces a series of images, and a subject, who responds to them. Because the examiner can define the visualization technique used to display the data, visualization researchers can evaluate the effectiveness of each technique.

The system is a Motif-based tool that makes extensive use of color and iconographic displays. It supports scientific visualization in a variety of ways:
– it generates mDmV test data with embedded stimuli whose characteristics are controlled by visualization researchers;
– it saves test data in a predefined format as an ASCII or binary file;
– it creates images from test data and displays them in either color or grey scale;
– it displays images with icons or plain pixels;
– it accepts keyboard and mouse responses from subjects;
– it keeps track of the results and scores of each test session;
– it generates evaluation reports.

In the next section we describe the properties and characteristics of our computer-generated random data. In section 3, we describe the two major system components, their corresponding screens, and basic system operations. Section 4 illustrates several data display mechanisms. The system supports statistical analysis among different variates, which we discuss in section 5. Future enhancements are discussed briefly in section 6, and section 7 presents our concluding remarks.
2 Computer-Generated Data

Data generated by the system consist of a white-noise background with stimuli embedded at random locations. Each stimulus can have its own probability distribution. The system also supports multivariate stimuli with different probability distributions.

2.1 Random Number Generator

"Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."
John von Neumann
The system needs a random number generator to create its data. Numbers generated by a computer are called pseudo-random because they are produced by deterministic rules, typically a feedback recurrence in which each value is computed from the previous one. The most prevalent method is the linear congruential generator, which is based on the congruence relation
$$I_{i+1} = (a I_i + c) \bmod m, \qquad i = 1, \ldots, n$$
where $a, c, m \in \mathbb{Z}^{+}$. This recurrence is implemented in Unix library generators such as rand() and drand48(). The problem with this algorithm is that successive values are correlated: if k consecutive random numbers are used as the coordinates of a point in k-dimensional space, the points do not fill the space uniformly but instead lie on a limited number of $(k-1)$-dimensional hyperplanes. Park and Miller[8] show that the simple multiplicative congruential algorithm, defined by

$$I_{i+1} = a I_i \bmod m, \qquad i = 1, \ldots, n$$

where $a, m \in \mathbb{Z}^{+}$, can be as good as any of the more general linear congruential generators that have $c \neq 0$, provided that a and m are chosen carefully. With $a = 7^5 = 16807$ and $m = 2^{31} - 1$, the algorithm passes all theoretical tests for randomness. Its implementation uses a trick (Schrage's method), based on an approximate factorization of m, to compute $a I_i \bmod m$ without overflow on any machine whose maximum integer is $2^{31} - 1$ or larger. We chose this algorithm as the basis for generating our mDmV random data.

2.2 Random Variable and Probability Density Function
Before continuing the discussion of probability distributions, we review several statistical definitions. For a given sample space $\mathcal{S}$ of some experiment, a random variable is any rule that associates a number with each outcome in $\mathcal{S}$. For example, with $\mathcal{S} = \{\text{Head}, \text{Tail}\}$, a random variable X can be defined by

$$X(\text{Head}) = 1, \qquad X(\text{Tail}) = 0$$
The random variable X indicates a win (Head) or a loss (Tail) in a coin-tossing experiment. A random variable X is said to be continuous if its set of possible values is an entire interval of numbers. A probability density function (p.d.f.) of a continuous random variable X is a function f(x) such that, for any two numbers a and b with $a \le b$,

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
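The definition can be checked numerically. The sketch below is illustrative and not part of the system described here: it approximates $P(a \le X \le b)$ by integrating a density with composite Simpson's rule and compares the result with a closed form. The exponential density $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, the parameter values, and the function names are assumptions, chosen only because the exact answer, $e^{-\lambda a} - e^{-\lambda b}$, is known.

```python
import math
from typing import Callable


def prob_interval(pdf: Callable[[float], float], a: float, b: float, n: int = 1000) -> float:
    """Approximate P(a <= X <= b), the integral of pdf over [a, b], with composite Simpson's rule."""
    if a > b:
        raise ValueError("require a <= b")
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = pdf(a) + pdf(b)
    for i in range(1, n):
        weight = 4 if i % 2 else 2  # interior weights alternate 4, 2, 4, ...
        total += weight * pdf(a + i * h)
    return total * h / 3


def exponential_pdf(lam: float) -> Callable[[float], float]:
    """Illustrative density f(x) = lam * exp(-lam * x) for x >= 0, and 0 otherwise."""
    return lambda x: lam * math.exp(-lam * x) if x >= 0 else 0.0


if __name__ == "__main__":
    lam, a, b = 2.0, 0.5, 1.5
    approx = prob_interval(exponential_pdf(lam), a, b)
    exact = math.exp(-lam * a) - math.exp(-lam * b)
    print(f"Simpson: {approx:.6f}  closed form: {exact:.6f}")
```

Any other density can be passed in the same way; for densities without a closed-form integral, the numerical value is the only one available.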
As Figure 1 shows, the probability that X takes on a value in the interval [a, b] is the area under the density curve between a and b.
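The generator of section 2.1 and the definitions of this section can be connected in a short sketch, written in Python for readability rather than taken from the authors' system. It implements the multiplicative recurrence with $a = 7^5$ and $m = 2^{31} - 1$, using Schrage's factorization $m = aq + r$ ($q = 127773$, $r = 2836$) so that no intermediate value leaves the 32-bit signed range. Dividing each state by m gives uniform variates on (0, 1), whose density is $f(x) = 1$ on that interval. The sketch then realizes the coin-toss variable X and checks empirically that $P(a \le U \le b)$ approaches $\int_a^b 1\,dx = b - a$. The class and function names are hypothetical, and the 0.5 threshold for the coin is an illustrative choice.

```python
class MinimalStandard:
    """Multiplicative congruential generator I_{i+1} = a * I_i mod m with
    a = 7**5 and m = 2**31 - 1 (the Park and Miller parameters).

    Schrage's factorization m = a*q + r (q = m // a, r = m % a, with r < q)
    keeps every intermediate value within the 32-bit signed range, which is
    what a C implementation needs; Python integers cannot overflow, so here
    it only illustrates the trick."""

    A = 16807               # 7**5
    M = 2_147_483_647       # 2**31 - 1
    Q = M // A              # 127773
    R = M % A               # 2836

    def __init__(self, seed: int = 1) -> None:
        if not 0 < seed < self.M:
            raise ValueError("seed must lie in 1 .. m-1")
        self.state = seed

    def next_int(self) -> int:
        """Advance the recurrence and return the new state in 1 .. m-1."""
        hi, lo = divmod(self.state, self.Q)
        t = self.A * lo - self.R * hi   # equals a*I mod m, or that value minus m
        self.state = t if t > 0 else t + self.M
        return self.state

    def uniform(self) -> float:
        """Uniform variate on the open interval (0, 1)."""
        return self.next_int() / self.M


def coin_toss(gen: MinimalStandard) -> int:
    """Coin-toss random variable X: 1 (Head) or 0 (Tail), each with probability 1/2."""
    return 1 if gen.uniform() < 0.5 else 0


def empirical_prob(gen: MinimalStandard, a: float, b: float, n: int = 100_000) -> float:
    """Fraction of n uniform draws U with a <= U <= b; should approach b - a."""
    hits = sum(1 for _ in range(n) if a <= gen.uniform() <= b)
    return hits / n


if __name__ == "__main__":
    gen = MinimalStandard(seed=1)
    print("ten tosses of X:", [coin_toss(gen) for _ in range(10)])
    print("P(0.2 <= U <= 0.7) is approximately", empirical_prob(gen, 0.2, 0.7))
```

Starting from seed 1, the 10,000th call to next_int should return 1043618065. This is the check value published for this generator, and the C++ standard uses it to specify std::minstd_rand0. It gives a quick way to validate a port to another language.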
[Figure 1: density curve f(x); the area under the curve between a and b is P(a ≤ X ≤ b)]