ARTICLE IN PRESS

Int. J. Human-Computer Studies 64 (2006) 670–682 www.elsevier.com/locate/ijhcs

Computational modeling and experimental investigation of effects of compositional elements on interface and design aesthetics

Michael Bauerly, Yili Liu

Department of Industrial and Operations Engineering, The University of Michigan, 1205 Beal Avenue, Ann Arbor, MI 48109-2117, USA

Received 17 May 2005; received in revised form 15 December 2005; accepted 24 January 2006
Available online 7 March 2006
Communicated by J. Scholtz

Abstract

This article describes computational modeling and two corresponding experimental investigations of the effects of symmetry, balance and quantity of construction elements on interface aesthetic judgments. In the first experiment, 30 black and white geometric images were developed by systematically varying these three attributes in order to validate computational aesthetic quantification algorithms with subject ratings. The second experiment employed the same image layout as Experiment 1 but with realistic looking web pages as stimuli. The images were rated by 16 subjects in each experiment using the ratio-scale magnitude estimation method against a benchmark image with average balance and symmetry values and a standard number of elements. Subjects also established an ordered list of the images according to their aesthetic appeal using the Balanced-Incomplete-Block (BIB) ranking method. Results from both experiments show that subjects are adept at judging symmetry and balance in both the horizontal and vertical directions and thus the quantification of those attributes is justified. The first experiment establishes a relationship between a higher symmetry value and aesthetic appeal for the basic imagery showing that subjects preferred symmetric over non-symmetric images. The second experiment illustrates that increasing the number of groups in a web page causes a decrease in the aesthetic appeal rating.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: Aesthetics; Engineering aesthetics; Balance; Symmetry; Display evaluation

Corresponding author. E-mail addresses: [email protected] (M. Bauerly), [email protected] (Y. Liu).
1071-5819/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.ijhcs.2006.01.002

1. Introduction

Research on visual displays has traditionally defined display effectiveness with criteria such as legibility or difficulty of target search and information access. Most of this research focuses on qualitative description and summarization and does not utilize a quantitative mathematical modeling approach. Because the majority of human factors design guidelines are qualitative, the effectiveness of many design techniques is left to debate because there are no methods to provide numerical analysis or direct comparison between different design proposals. Tullis (1988b), for example, gives many well-defined but qualitative guidelines for text-based screen design. Most of these principles can be transferred to many types of visual

bitmapped displays, which prove to be extremely helpful in many design situations. One major drawback, however, is that they do not provide a way for the designer to assign quantified values to the specific components of screen design. There are some past attempts to adopt a quantitative and computational approach to interface design. For example, Tullis (1983) developed metrics for quantifying the effects of item grouping, density and complexity on the usability of text-based displays and later tested those attributes against measures of search time and preference (Tullis, 1984, 1988a). Streveler and Wasserman (1984) propose creating several classes of screen measures for alphanumeric displays including the aesthetic measures of balance and symmetry. One of the overlapping attributes in these two lines of research is the use of characteristics such as size or number of groups as a descriptive metric. Liu and Wickens (1992) use cluster analysis to achieve visual grouping and encoding of data in a two-dimensional

ARTICLE IN PRESS M. Bauerly, Y. Liu / Int. J. Human-Computer Studies 64 (2006) 670–682

(2-D) grid according to quantitative similarity values. Subjects were then asked to complete various judgment tasks with the display grids and their judgment performance showed significant improvement with the use of the display grids. Sears (1993) developed a metric for developing and comparing user interface widget layout based on a simple description of the stages required to complete a task within the interface. In addition to the research challenge of developing quantitative metrics for interface evaluation and analysis, there is an issue of integrating aesthetic factors in interface evaluation. Consideration of aesthetics has largely been ignored in human factors analysis of displays until some recent work that has emerged on the increasing importance of aesthetics in various domains of contemporary society. Liu (2003a, b) provided a comprehensive review of the major schools of aesthetic theory and their relationship to aesthetic design and human factors engineering. Jordan (1997) calls for the design of products that provide particular aesthetic pleasure beyond simple usability and the associated positive feelings of security, confidence, pride and satisfaction. Hallnas and Redstrom (2002) declare that increasing the aesthetics of the computational interface will only aid in the widespread acceptance and ‘presence’ of ubiquitous computing devices. A recent survey by Kim et al. (2003) analyzes several web pages to determine the common aesthetic design factors and the corresponding emotional responses of users. Another study by Healey and Enns (2002) manipulates visualization techniques in order to provide guidelines for designing effective visual displays by encoding weather data in a map of the US. This process optimized aesthetic appeal and information retrieval through the creation of different presentation techniques to find an optimal strategy. The relation of aesthetics to system effectiveness cannot be ignored. Tractinsky et al. 
(2000) concluded that users of an automated teller machine (ATM) found the system to be more usable based solely on aesthetic alterations to the interface without any changes in functionality. In addition to this recent interest from many areas, artists and designers have long treated aesthetics as a primary aspect of their work; however, they mainly describe aesthetic terms in qualitative or subjective languages that do not easily allow for engineering implementation. Recently, several artists have started to include themes of a computationally generated superior aesthetic. One of the earliest groups to utilize computation was Group de Recherche d’Art Visuel (GRAV) of the 1960s. Prince (2000) notes that this group was ‘‘dedicated to understanding mathematical simulation and aesthetics’’ and served as pioneers in this artistic field. The use of an exploratory approach that utilized the most appropriate solution in future iterations is one that would appear here for the first time computationally. Another contemporary artist making similar efforts is Steven Rooke. His art utilizes a genetic process of aesthetic selection such that


aesthetically interesting features emerge in subsequent iterations of the design process as recorded by World (1996). At each iteration, certain generations are given higher aesthetic fitness scores which are passed on to future generations, creating an aesthetically superior set of offspring. The aesthetic fitness, however, relies on the judgment of the artist to assign a score and thus the process remains highly subjective. Reiser and Reiser (1995) create a list of aesthetic considerations specific to multimedia; but the end result is one that only extends the qualitative human factors checklist to include more items to consider without knowing quantitatively where an optimal design space exists. This paper describes our computational modeling and experimental research work that attempts to bridge the scientific methods of human performance and display analysis with aesthetic design principles. This is executed in a quantitative manner through the development and validation of numerical quantifications of the effects of three compositional elements on aesthetic judgments. The three elements—symmetry, balance and compositional blocking—are present in 2-D medium. Inspiration is taken from the development of methods for quantifying the grouping, density and complexity of text-based displays by Tullis (1983). Relatively similar attempts have been made using many more attributes than what are presented here (Ngo, 2001; Ngo et al., 2003; Lavie and Tractinsky, 2004). These earlier studies, however, have not been validated with any experimental investigation of user judgment of the proposed quantification methods. It is important that we develop quantitative methods that match the perceptual and mental processes of system users, which creates the necessity of human experimental verification. 
While the objective of the present study is to develop quantitative indexes for interface aesthetic evaluation, not for constructing or assessing a theoretical model, the present study is inspired by the psychophysical school of aesthetic theories. As discussed in detail in Liu (2003a), several major schools of aesthetic theories exist, including philosophical theories, cognitive and social theories, natural and sexual selection theories and psychophysical theories. Among these schools of theories, psychophysical theories emphasize investigations of quantitative relationships between aesthetic response and basic pictorial elements, which is an approach adopted by the present study. The results of the present study, in return, provide an evaluation of the role of this school of theories in aesthetic interface design and evaluation. The compositional elements chosen in this study represent three basic pictorial elements and design concepts that follow from previous research. The element of balance is well understood by experts and non-experts alike and the preference for balance in visual displays is well documented (Wilson and Chatterjee, 2005). Existing theories suggest that visual balance is necessary because it unifies the


elements of a display into a cohesive whole thus creating integrity and meaning (Locher et al., 1998). Pre-attentive visual processing of balance can be accomplished within 100 ms (Ognjenovic, 1991; Locher and Nagy, 1996) and thus can quickly help to structure and guide the viewer’s gaze through an image (Locher, 1996). Subject preferences for symmetry also exist for the same reason as those for balance. Any composition with perfect symmetry is by definition perfectly balanced and thus it serves as an element that can be pre-attentively processed and serves as a guide for the viewer. The preference for symmetry is also grounded in sexual selection theories. For example, cross-cultural preferences for symmetrical faces may be explained by evolutionary processes that favor symmetrical facial features, among other things (Langlois and Roggman, 1990). Additionally, symmetric faces may reveal a higher level of ability to resist parasites and may have indicated a stronger hunter in males. In females, many symmetrical body and facial features that are considered attractive may be indicators of higher fertility levels (Buss and Barnes, 1986). The number of design elements or visual groups is also related to aesthetic appraisal. Depending on the number and size of the ‘compositional building blocks,’ a display can appear empty and sparse or dense and overcrowded. A high information density may cause perceptual channels to become overloaded (Cropper and Evans, 1968). Tullis (1983) summarizes the recommendations for an appropriate level of density for alphanumeric displays but he points out that it is not always possible to convert from one density measure to another. This is particularly true when using density as a measurement in modern bitmapped displays. The role of symmetry, grouping and balance in visual perception has long been recognized by the Gestalt theory of perceptual organization. 
‘‘Gestalt’’ means form or shape and, particularly, ‘‘good form’’ or ‘‘good shape’’ that emerges when the parts of a perceived object are grouped to form the perceptual whole (Boring, 1950). Symmetry, grouping and balance are among the numerous Gestalt principles that have been proposed to describe how the perceptual elements are grouped into recognizable whole objects or ‘‘good forms.’’ According to the Gestalt principles, the more symmetrical a region’s shape, the more likely it is seen as a figure in contrast to its background. Similarly, visual patterns grouped together by similarity or proximity tend to be seen as a whole figure and visual arrangements that are more uniform and homogeneous tend to be perceived as ‘‘good forms.’’ These existing theories and research illustrate the importance of balance, symmetry and grouping on perception and preference. However, none of these previous studies has adopted a mathematical approach to quantify the joint effects of these variables on user’s aesthetic judgment, which is the focus of the present study.

1.1. Introduction to the experimental procedure

Two experiments are reported in this paper whose purpose is twofold: to determine whether the metrics of three interface compositional elements (symmetry, balance and the number of compositional groups) reflect subject ratings of those attributes and to determine whether there is a relationship between the compositional elements and subject ratings of aesthetic appeal. Numerical values representing the three compositional elements are calculated for two different types of stimuli in the two experiments. In Experiment 1, basic black and white geometric images were used and in Experiment 2, web pages following the exact same compositions as the images from Experiment 1 were used to represent a real-world interface situation.

The three visual elements of balance, symmetry and the number of groups were chosen because they represent simple, intuitive concepts in design and they are related to the measures described or utilized earlier in the analysis of alphanumeric displays. For example, Streveler and Wasserman (1984) proposed several types of screen measures including the aesthetic measures of balance and symmetry. The work by Tullis (1984) on interface usability metrics included the number of the groups.

In the current study, balance is derived from summing the visual moments in the horizontal and vertical directions to zero about the balance point. Symmetry is analyzed with a pixel-by-pixel comparison about a central axis of reflection giving more influence to comparisons closer to the axis because of the increased visual influence in that area. The number of groups is a simple count of the distinct visual groups that exist in an image. Because rectangles or pictures are used as compositional elements the number of groups is trivial to compute in both experiments.
The validity of the quantitative analysis for symmetry and balance was tested through subject ratings of various images with the ratio-scale magnitude estimation method. The experimental stimuli were created according to target parameters for symmetry, balance and composition elements and all were compared to a benchmark image. Three experimental sessions were completed where: (1) the subjects rated the aesthetic appeal of the images compared to the benchmark; (2) subjects ordered images from least appealing to most appealing in a Balanced-Incomplete-Block (BIB) methodology to establish an overall preference of the images; and (3) subjects rated their impression of the image balance and symmetry in both the horizontal and vertical directions as well as rated the aesthetic appeal for a second time. The results show that subject responses validated the methods for determining symmetry and balance. Additionally, aesthetic appeal was highly dependent on symmetry for the basic images in Experiment 1 and on the number of groups for the web pages in Experiment 2.


2. Image quantification methods

Three basic compositional attributes were each quantified on individual scales. Simple algorithms were created in an attempt to mimic the human cognitive representation of the attributes of symmetry and balance, with the third attribute being the number of compositional building blocks used in the image.

2.1. Symmetry

Symmetry, s, is an analysis of the similarity of pixels on opposite sides of an axis of reflection. This particular type of symmetry is referred to as 'bilateral symmetry'. This algorithm takes a microscopic approach and compares each half of an image pixel by pixel, as opposed to a macroscopic view that might compare higher level elements such as specific shapes or lines. It should also be noted that this algorithm only considers symmetry about a vertical or horizontal axis, but the general strategy can be utilized about any axis of reflection.

It was hypothesized that those pixel comparisons that are close to the axis of reflection have a higher influence on the overall impression of symmetry than do comparisons which are further away from the reflection axis. Take Figs. 1 and 2 as illustrations of this assumption. Each image is 5 × 6 pixels and has only one non-matching pixel pair. Fig. 1 has a non-matching pixel pair closer to the vertical axis of symmetry (at coordinates C3 and D3) than does Fig. 2 (at coordinates A3 and F3). According to the algorithm discussed below, Fig. 1 is less symmetric than Fig. 2 because as the pixel comparisons move farther away from the axis of reflection, their influence on the overall symmetry values are decreased.

Eq. (1) below gives the equation for symmetry. The variable m is the pixel length of the image dimension that is parallel to the axis of reflection. For example, if the axis of reflection is vertical, m is the height of the image in pixels. The variable n is the number of comparisons required in each row or column of pixels. Taking the case where the axis of reflection is again vertical, the number of comparisons is the image width in pixels divided by 2 when the width is an even number of pixels, or it is the image width less 1 divided by 2 when the width is odd. The symmetry factor $X_{ij}$ becomes a binary variable which is equal to one when the pixel pairs are the same and zero when they are opposite. To extend this algorithm to a multi-color image, $X_{ij}$ could be defined for each combination of colors such that black might have a higher symmetry factor with dark gray than with white. As discussed illustratively above, Eq. (1) gives a positive comparison ($X_{ij} = 1$) at the edge of the image less influence on the overall symmetry value than that of a positive comparison occurring at the axis of reflection. Specifically, a positive comparison at the axis of reflection is deemed to be twice as influential as one at the farthest distance away:

$$ s = \frac{2}{3mn} \sum_{i=1}^{m} \sum_{j=1}^{n} X_{ij} \left( 1 + \frac{j-1}{n-1} \right). \tag{1} $$

Fig. 1. 5 × 6 pixel bitmap with asymmetric pixel pairs at coordinates C3 and D3 with a vertical axis of reflection. [Bitmap omitted; rows 1 to 5, columns A to F.]

Fig. 2. 5 × 6 pixel bitmap with asymmetric pixel pairs at coordinates A3 and F3 with a vertical axis of reflection. [Bitmap omitted; rows 1 to 5, columns A to F.]

2.2. Balance

The balance point, b, is the Cartesian coordinate at the center of the visual mass of the image. This center can easily be found once individual masses are assigned to each pixel. In this experiment, black pixels are given a mass of one and white pixels given a mass of zero. Eqs. (2) and (3) give the center of balance as $(x_b, y_b)$ where w is the image width in pixels, h is the image height in pixels and W, the visual weight, is the summation of black pixels in each pixel column (Eq. (2)) or row (Eq. (3)). For the experimental procedure and analysis, b is given as a set of normalized coordinates between zero and one as in Eq. (4):

$$ \sum_{x=1}^{w} W_x (x - x_b) = 0, \tag{2} $$

$$ \sum_{y=1}^{h} W_y (y - y_b) = 0, \tag{3} $$

$$ \left( \frac{x_b}{w}, \frac{y_b}{h} \right). \tag{4} $$
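The two pixel-level measures of Sections 2.1 and 2.2 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation. The function names `symmetry` and `balance_point` are hypothetical. The index j is assumed to run from the image edge (j = 1) to the axis of reflection (j = n), which is inferred from the statement that a match at the axis counts twice as much as one at the edge. Eq. (1) is undefined for n = 1, so a single weight of 1.5 is assumed there. Pixel centres are used as coordinates so that a perfectly symmetric image yields exactly 0.5, and y is measured from the top row. The 5 × 6 test pattern is made up; only the positions of the non-matching pairs follow Figs. 1 and 2.

```python
import numpy as np

def symmetry(img, axis="vertical"):
    """Weighted bilateral symmetry s of a binary image, following Eq. (1)."""
    img = np.asarray(img)
    if axis == "horizontal":
        img = img.T  # a top/bottom reflection becomes left/right after transposing
    m, width = img.shape        # m: pixel length parallel to the axis of reflection
    n = width // 2              # comparisons per row (middle column skipped if odd)
    if n == 0:
        raise ValueError("image is too narrow to compare pixel pairs")
    outer = img[:, :n]              # ordered from the edge (j = 1) to the axis (j = n)
    mirror = img[:, ::-1][:, :n]    # mirror partners in the same order
    X = (outer == mirror).astype(float)   # X_ij = 1 for matching pairs, else 0
    if n > 1:
        weights = 1.0 + np.arange(n) / (n - 1)   # 1 at the edge, 2 at the axis
    else:
        weights = np.array([1.5])   # assumption: keeps s = 1 for a perfect match
    return 2.0 / (3.0 * m * n) * float((X * weights).sum())

def balance_point(img):
    """Normalized balance point (x_b / w, y_b / h), following Eqs. (2)-(4)."""
    img = np.asarray(img, dtype=float)    # black = 1, white = 0
    h, w = img.shape
    total = img.sum()
    if total == 0:
        raise ValueError("image has no black pixels")
    x = np.arange(w) + 0.5    # assumption: pixel-centre coordinates
    y = np.arange(h) + 0.5
    xb = (img.sum(axis=0) * x).sum() / total   # solves sum_x W_x (x - x_b) = 0
    yb = (img.sum(axis=1) * y).sum() / total   # solves sum_y W_y (y - y_b) = 0
    return xb / w, yb / h

# Hypothetical symmetric 5 x 6 pattern (rows 1-5, columns A-F).
base = np.zeros((5, 6), dtype=int)
base[[0, 4], 1] = 1
base[[0, 4], 4] = 1

near_axis = base.copy()
near_axis[2, 2] = 1    # C3 black, D3 white: mismatch next to the axis (as in Fig. 1)
at_edge = base.copy()
at_edge[2, 0] = 1      # A3 black, F3 white: mismatch at the edge (as in Fig. 2)

print(symmetry(base), symmetry(near_axis), symmetry(at_edge))
print(balance_point(base))
```

Under these assumptions a single mismatch next to the axis lowers s by 4/45 and one at the edge by 2/45, so the first image scores lower, which matches the ordering the text gives for Figs. 1 and 2.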




Although it is somewhat related to s, the measure of b is distinct. The strongest relationship between the two measures occurs when an image is perfectly symmetric. For example, an image that is perfectly symmetric (s = 1) about a vertical line will have a balance point that lies somewhere along that line of reflection and thus the x-coordinate of b will be 0.5. As another example, Fig. 3 gives an image with perfectly centered balance (0.5, 0.5) but less than perfect symmetry.

2.3. Number of groups

The number of groups, n, is a simple count of the distinct visual groups that exist in an image. These experiments use rectangles or pictures as compositional elements and thus the number of individual blocks in each image is trivial to calculate. Inclusion of this attribute was driven by the hypothesis that images with a higher n would potentially be less focused, which could have a significant effect on aesthetic appeal.

3. Experiment 1

The image quantification metrics described above are evaluated in Experiment 1 with abstract black-and-white images.

Fig. 3. 5 × 6 pixel bitmap with centered balance (0.5, 0.5) but imperfect symmetry. [Bitmap omitted; rows 1 to 5, columns A to F.]

3.1. Methods

3.1.1. Participants

Sixteen subjects aged 21–29 participated in each of three experimental sessions. All subjects had normal (20/20) or corrected-to-normal vision and normal color vision. Art and architecture students were not allowed to participate in the experiment in order to avoid introduction of any potential influence of specialized aesthetic training or background. The entire experimental procedure took approximately 1 h and subjects were compensated $10.00 for their participation.

3.1.2. Stimuli

Thirty images were developed to target values of s, b and n, including one image as a benchmark and one as a preview with which the subjects rehearsed the questions for each of three sessions. While the images were designed specifically for combinations of the three composition attributes, there exist an infinite number of possible images for each set. Care was taken to make their design as homogeneous as possible. This was done by maintaining similar design elements such as overall visual mass, white space around each rectangle and similar spacing strategies between rectangles. Fig. 4 gives an example of the imagery, showing the benchmark image used in comparison to all images alongside the image used for rehearsing the experimental procedure.

3.1.3. Procedure

All data were collected by recording verbal responses to a standard set of questions about each image or group of images. The experiment was conducted in a sound-insulated, well-lit experimental lab. Participants sat at a desk opposite the experimenter and viewed all images on a 17-inch CRT monitor at 1024 × 768 pixel resolution, with all images measuring 400 pixels square. In the first experimental session, subjects were asked to use the magnitude estimation method to rate the 28 test images. They were instructed to rate the overall aesthetic

Fig. 4. Example of experimental stimuli from Experiment 1. Shown here is the benchmark image (left) and the example image used for question rehearsal (right).


3.2. Results and discussion

The data collected using the magnitude estimation method is log-normally distributed and thus the data analysis uses the log of the geometric mean subject ratings where appropriate. This allows for linear relationships to be established between subjective ratings of symmetry and balance and the corresponding s and b values, as well as for a regression model of aesthetic appeal ratings based on s, b and n.
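As a rough illustration of this analysis step, the sketch below takes the log of the geometric mean rating per image and fits a straight line against an attribute value. It is a minimal sketch under stated assumptions: the base of the logarithm is taken to be 10 (the text only writes LOG), the helper names `log_geometric_mean` and `fit_line` are hypothetical, and the ratings and s values are invented purely so the code runs. They are not data from the experiments.

```python
import numpy as np

def log_geometric_mean(ratings):
    """LOG of the geometric mean of one image's ratings (base 10 assumed).

    The log of a geometric mean equals the arithmetic mean of the logs.
    """
    ratings = np.asarray(ratings, dtype=float)
    return float(np.log10(ratings).mean())

def fit_line(x, y):
    """Least-squares line y = slope * x + intercept, plus its R^2."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    r_squared = 1.0 - (residuals ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return float(slope), float(intercept), float(r_squared)

# Invented example: four images (rows) rated by three subjects (columns),
# with the benchmark fixed at 10 as in magnitude estimation.
s_values = np.array([0.2, 0.5, 0.8, 1.0])
ratings = np.array([
    [4, 6, 5],
    [8, 12, 10],
    [15, 20, 18],
    [20, 30, 25],
])

log_r = np.array([log_geometric_mean(row) for row in ratings])
slope, intercept, r_squared = fit_line(s_values, log_r)
print(slope, intercept, r_squared)
```

Working with the mean of the logs keeps a single very large rating from dominating an image's score, which is one practical reason to pair magnitude estimation with geometric rather than arithmetic means.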

3.2.1. Symmetry ratings

The log of the mean subject symmetry ratings for this experiment is plotted against the s values for symmetry about a horizontal and vertical axis in Figs. 5 and 6. The linearity of the data plots allows for the subject ratings of symmetry to be reflected as a linear regression function of the s value of the image. Eqs. (5) and (6) give the log of the subject rating, r, as a function of s for symmetry about horizontal and vertical axes, respectively:

$$ \mathrm{LOG}(r_{\mathrm{HORSYM1}}) = 1.74\, s_{\mathrm{HOR}} - 0.37, \quad R^2 = 0.78, \tag{5} $$

$$ \mathrm{LOG}(r_{\mathrm{VERTSYM1}}) = 1.83\, s_{\mathrm{VERT}} - 0.48, \quad R^2 = 0.70. \tag{6} $$

These equations and the high $R^2$ values indicate that subjects were quite adept at rating the symmetry of the images and that their ratings corresponded with the s values of the images.

3.2.2. Balance ratings

The log of the mean subject ratings for horizontal and vertical balance in Experiment 1 are plotted against $x_b$ and $y_b$, respectively, in Figs. 7 and 8. Subjects were asked to give high ratings to an image if it was evenly balanced and

Fig. 5. Log of mean subject ratings for symmetry about a horizontal axis are plotted against the s values. [Plot omitted; x-axis: Horizontal Symmetry Algorithm, y-axis: Exp 1: Mean Subject Rating.]


appeal of each test image provided that the benchmark image was rated as a 10. For example, if the test image was twice as appealing as the benchmark then it was rated as a 20 and if it was half as appealing it was rated as a 5. This rating method allows the subjects to use any positive number they see fit and their ratings are not restricted by fixed scales. Each image was displayed on a screen next to the benchmark image and the presentation sequence was randomly ordered for each subject and each trial. Subjects were encouraged to give a rating quickly and not to think about the images for too long. As soon as a rating was given verbally, the next test image in the sequence was displayed. The second session required subjects to complete a BIB ranking of 24 of the 28 test images along with the benchmark image, making 25 images total. With the BIB ranking procedure, the images were presented in groups of 4 rather than all 25 images all at once. This procedure considers human perceptual, memory and judgment capacity and thus it helps obtain more reliable results than asking subjects to judge a large number of images at once. For each group of 4 images, the images were ranked from the least aesthetically appealing (‘‘the ugliest’’) to the most aesthetically appealing (‘‘the prettiest’’). The BIB was a complete design such that each image was seen eight times and was compared to all other images in the comparison set of 25. As in the previous session, as soon as the rank order of a group was given, the next set of 4 images was displayed. In the last session, the magnitude estimation method was again used to rate the images compared to the benchmark on multiple scales. Subjects gave two separate ratings for symmetry about the horizontal and vertical axes with a higher rating corresponding to a higher degree of symmetry. 
Two ratings were given for horizontal and vertical balance, respectively, with higher ratings given to images that were well-balanced and centered and lower ratings given to images that were heavily skewed in one direction. Ideally, subjects would give the same rating to an image that had b = (0.3, 0.3) as one that had b = (0.7, 0.7) because both would have horizontal and vertical balance points the same distance from the center point of the image. Subjects were then asked again to rate the overall aesthetic appeal of the images compared to the benchmark.
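The BIB arrangement described for the second session (25 images, groups of 4, each image seen eight times) can be checked against the standard counting identities for balanced incomplete block designs. The sketch below is a worked check under those identities, not a description of how the authors generated their groups. The function name is hypothetical, and the identities are necessary conditions only: they do not construct the design.

```python
def bib_block_count_and_lambda(v, k, r):
    """Return (blocks, lam) implied by v items, block size k and r appearances.

    Uses the standard BIB identities:
        blocks * k = v * r          (total item slots counted two ways)
        lam * (v - 1) = r * (k - 1) (pairings of one item counted two ways)
    lam is the number of blocks in which any given pair of items meets.
    """
    blocks, rem_blocks = divmod(v * r, k)
    lam, rem_lam = divmod(r * (k - 1), v - 1)
    if rem_blocks or rem_lam:
        raise ValueError("these parameters cannot form a balanced design")
    return blocks, lam

blocks, lam = bib_block_count_and_lambda(v=25, k=4, r=8)
print(blocks, lam)  # prints: 50 1
```

With these parameters the identities imply 50 groups of 4 images, with every pair of images appearing together exactly once, which agrees with the statement that each image was compared to all other images in the set of 25.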

Fig. 6. Log of mean subject ratings for symmetry about a vertical axis are plotted against the s values. [Plot omitted; x-axis: Vertical Symmetry Algorithm, y-axis: Exp 1: Mean Subject Rating.]


to give lower ratings to images where balance was skewed in one direction, thus the subject ratings increase as balance values approach 0.5 (perfect balance) from either direction. Both figures show that subjects were able to judge the balance of images quite well, with lower ratings becoming more common as balance moves away from 0.5 in either direction. The high degree of certainty for images with balance values of 0.5 shows that subjects are quite confident that these images are much more balanced than the benchmark image, which has b = (0.45, 0.55).

As the graphs illustrate, subject ratings of horizontal balance were not highly influenced by whether images were skewed to the left or to the right and ratings of vertical balance were not highly influenced by whether images were skewed to the top or the bottom. This result allows the balance attributes to be slightly transformed such that balance points which are equidistant from the middle point are merged. For example, horizontal balance points at 0.4 and 0.6 are assigned the same value as are balance points at 0.3 and 0.7. This transformation is illustrated mathematically in Eq. (7) such that b′ is the new balance measure and b is the existing measure. This transformation suggests a balance measure that is similar in nature to that proposed by Streveler and Wasserman (1984):

$$ b' = 1 - |2b - 1|. \tag{7} $$

Fig. 7. Log of mean subject ratings for horizontal balance are plotted against $x_b$ values. [Plot omitted; x-axis: Horizontal Balance Algorithm, y-axis: Mean Subject Rating.]

Fig. 9. Log of mean subject ratings for horizontal balance are plotted against $x_{b'}$ values. As $x_{b'}$ increases, the balance of the image becomes more horizontally centered. [Plot omitted; x-axis: Transformed Horizontal Balance Algorithm, y-axis: Mean Subject Rating.]

Values of b′ can vary from 0, in the case where the balance point of an image is at the extreme edge, to 1, where the balance point is the middle of the balance axis. The effectiveness of this transformation can now be seen such that it allows for a functional relationship between the balance quantification and subject ratings. The values of mean subject ratings are plotted against b′ for horizontal and vertical balance in Figs. 9 and 10, respectively. As with symmetry, subject ratings of balance can be given as a function of the transformed b′. Eqs. (8) and (9) give the log of the mean subject rating of balance as a function of b′ for balance in the horizontal and vertical planes:

$$ \mathrm{LOG}(r_{\mathrm{HORBAL1}}) = 1.90\, b'_X - 0.65, \quad R^2 = 0.74, \tag{8} $$

$$ \mathrm{LOG}(r_{\mathrm{VERTBAL2}}) = 1.83\, b'_Y - 0.57, \quad R^2 = 0.79. \tag{9} $$
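The folding step of Eq. (7) and its use in the fitted line of Eq. (8) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the logarithm is assumed to be base 10, and the coefficients 1.90 and -0.65 are read from Eq. (8) with the intercept taken as negative, which is how the equation appears to read in the source.

```python
def transformed_balance(b):
    """Eq. (7): fold a normalized balance coordinate b in [0, 1] about 0.5.

    Returns 1.0 for a centered balance point and 0.0 at either edge.
    """
    if not 0.0 <= b <= 1.0:
        raise ValueError("b must be a normalized coordinate in [0, 1]")
    return 1.0 - abs(2.0 * b - 1.0)

def predicted_log_rating(b_prime, slope=1.90, intercept=-0.65):
    """Linear model of the form used in Eq. (8) (coefficients as assumed above)."""
    return slope * b_prime + intercept

# Points equidistant from the centre collapse onto the same value.
for b in (0.3, 0.7, 0.45, 0.5):
    b_prime = transformed_balance(b)
    print(b, round(b_prime, 3), round(predicted_log_rating(b_prime), 3))
```

Under these assumptions the benchmark's horizontal balance of 0.45 maps to b′ = 0.9 and a predicted LOG rating of about 1.06, close to the value of 1 that corresponds to the benchmark's fixed rating of 10.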

Subjects were particularly confident in rating an image with perfect symmetry as having perfect balance because, by definition, an image with s = 1 must have a balance rating of b = 0.5 (or b′ = 1.0) in the corresponding dimension. Subjects gave lower ratings to images that had perfect balance but were not perfectly symmetric. Overall, however, subjects were able to distinguish between varying levels of balance with a sufficient degree of accuracy and their judgments reflected the quantification method.

Fig. 8. Log of mean subject ratings for vertical balance are plotted against $y_b$ values. [Plot omitted; x-axis: Vertical Balance Algorithm.]

3.2.3. Aesthetic appeal ratings

The complete ordered aesthetic scores obtained with the BIB ranking method are given in Fig. 11, with the most aesthetically appealing image in the upper left and the least aesthetically appealing in the lower right. The aesthetic scores scale all the images along an interval scale with the most appealing image receiving a score of 11.13 and the

ARTICLE IN PRESS M. Bauerly, Y. Liu / Int. J. Human-Computer Studies 64 (2006) 670–682

1.4

web pages were developed in a way to match the abstract images in their composition.

1.2

4.1. Methods

1.6 Exp 1: Mean Subject Rating

677

1.0 0.8 0.6 0.4 0.2 0.0 0.60

0.70 0.80 0.90 1.00 Transformed Vertical Balance Algorithm

1.10

Fig. 10. Log of mean subject ratings for vertical balance are plotted against yb0 values. As yb0 increases, the balance of the image becomes more vertically centered.

least appealing being scored at 0. It is very interesting to note that the top 8 scored images are all symmetrical about one or both axis and that the bottom 17 images have no symmetry. A qualitative analysis of the images shows that there is a distinct difference between the non-symmetric images with scores higher than 5.5 and those that are less appealing. While it is not quantified with any of the attributes in the experiment, the images ranked less appealing than an aesthetic score of 5.5 lack a certain coherency or focus that is present in the images scored higher. While the attributes of symmetry and balance were validated by subject ratings and the BIB results give some insight into how the measures, particularly symmetry, influence aesthetic preferences, the question remains about exactly what influence the attributes have on the aesthetic appeal. It was hypothesized that the maximum value of horizontal and vertical symmetry may have some relation to the aesthetic ratings, particularly in light of the results from the aesthetic scores found in the BIB ranking. This is based on the assumption that, independent of the direction, the existence of any symmetry increases the aesthetic appeal of an image. Eq. (10) gives the log of the mean aesthetic appeal rating, r, as a function of smax, the maximum of the horizontal or vertical s value. Statistical analysis showed no difference between the ratings of aesthetic appeal obtained in trials 1 and 3 and thus the mean of the two trials is used as r LOGðrAESTHETIC Þ ¼ 0:68sMAX þ 0:46;

R2 ¼ 0:59.

(10)

4.1.1. Participants For Experiment 2, 16 subjects aged 18–27 participated in four experimental sessions. All subjects had normal (20/20) or corrected-to-normal vision and normal color vision. None of the subjects participated in Experiment 1. As in Experiment 1, art and architecture students were not allowed to participate in the experiment to avoid introduction of any potential influence of specialized aesthetic training or background. The entire experimental procedure took approximately 1 h and 15 min and subjects were compensated $10.00 per h for their participation. 4.1.2. Stimuli The same 30 compositions from Experiment 1 were used in Experiment 2, but their features were altered to create images that look like web pages. Fig. 12 gives an example of the stimuli used in Experiment 2, showing the benchmark web page and the web page used for question rehearsal. Because the same compositions were used in both experiments, the underlying s, b and n values remained unchanged for the second experiment. This was done under the assumption that the web page text is considered as the background and that the images in the web pages are analogous to the solid blocks used in Experiment 1. 4.1.3. Procedure The same experimental procedure was used in Experiment 2 as in Experiment 1 with one minor modification. The experiment was arranged in the following four stages, where stage 3 deviated from the procedure from Experiment 1: (1) subjects rated the aesthetic appeal; (2) subjects completed a BIB ranking; (3) subjects rated the aesthetic appeal for a second time; and (4) subjects rated balance, symmetry and aesthetic appeal. This design allows for the aesthetic appeal ratings to be checked for repeatability from stage 1 to stage 3 and to account for any novelty effects that may arise in rating the images for the first time. 
The subjects were instructed to make their judgments solely on the basis of the overall layout of the webpage, not on the content of the images or the texts. The 11-point web page text was shown at a resolution of 50% making it virtually impossible to read. To reduce or minimize any potential novel effects of the photographs, all the photographs were shown to each subject prior to the experimental trials.

4. Experiment 2

4.2. Results and discussion

In contrast to Experiment 1, in which abstract blackand-white images were used, Experiment 2 employs web pages to evaluate the image quantification metrics. The

Fig. 11. BIB results given in aesthetic scores from the most appealing (upper left) to least appealing (lower right). The aesthetic score is given below each image, with higher numbers representing more aesthetically appealing images.

4.2.1. Symmetry ratings

The log of the mean subject symmetry ratings for Experiment 2 are plotted against the s values for symmetry about a horizontal and vertical axis in Figs. 13 and 14. Eqs. (11) and (12) give the log of the subject rating, r, as a function of s for symmetry about horizontal and vertical axes, respectively

\[ \mathrm{LOG}(r_{\mathrm{HORSYM2}}) = 1.25\,s_{\mathrm{HOR}} - 0.071, \qquad R^2 = 0.75, \tag{11} \]

\[ \mathrm{LOG}(r_{\mathrm{VERTSYM2}}) = 1.28\,s_{\mathrm{VERT}} - 0.11, \qquad R^2 = 0.69. \tag{12} \]

The regression equations show that subject symmetry ratings of the web pages had a strong relationship with s and that s provides a good estimate of subject perceptions of symmetry.
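To give a sense of scale for Eqs. (11) and (12), the following worked evaluation computes the predicted rating at the two ends of the symmetry range. It assumes that LOG denotes the base-10 logarithm and that s runs from 0 to 1, neither of which is stated explicitly in this section. The resulting numbers are model predictions, not observed ratings.

```latex
\begin{align*}
s_{\mathrm{HOR}} = 1:\quad & \mathrm{LOG}(r_{\mathrm{HORSYM2}}) = 1.25(1) - 0.071 = 1.179
  &&\Rightarrow\quad r_{\mathrm{HORSYM2}} = 10^{1.179} \approx 15.1,\\
s_{\mathrm{HOR}} = 0:\quad & \mathrm{LOG}(r_{\mathrm{HORSYM2}}) = 1.25(0) - 0.071 = -0.071
  &&\Rightarrow\quad r_{\mathrm{HORSYM2}} = 10^{-0.071} \approx 0.85,\\
s_{\mathrm{VERT}} = 1:\quad & \mathrm{LOG}(r_{\mathrm{VERTSYM2}}) = 1.28(1) - 0.11 = 1.17
  &&\Rightarrow\quad r_{\mathrm{VERTSYM2}} = 10^{1.17} \approx 14.8,\\
s_{\mathrm{VERT}} = 0:\quad & \mathrm{LOG}(r_{\mathrm{VERTSYM2}}) = 1.28(0) - 0.11 = -0.11
  &&\Rightarrow\quad r_{\mathrm{VERTSYM2}} = 10^{-0.11} \approx 0.78.
\end{align*}
```

Under these assumptions the two fitted lines give very similar predictions across the whole range of s, so the metric behaves almost identically for the two axes.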

Fig. 12. Example of experimental stimuli from Experiment 2. Shown here are the benchmark image (left) and the example image used for question rehearsal (right).

Fig. 13. Log of mean subject ratings for symmetry about a horizontal axis are plotted against the s values. Axes: Horizontal Symmetry Algorithm (horizontal), Exp 2: Mean Subject Rating (vertical).

Fig. 14. Log of mean subject ratings for symmetry about a vertical axis are plotted against the s values. Axes: Vertical Symmetry Algorithm (horizontal), Exp 2: Mean Subject Rating (vertical).

4.2.2. Balance ratings

The log of the mean subject ratings for horizontal and vertical balance in Experiment 2 are plotted against x_b and y_b, respectively, in Figs. 15 and 16. Similar to the balance results from Experiment 1, balance ratings are not dependent on whether the web page was skewed in one direction or the other, so b can again be transformed to b' in order to establish a relationship with subject ratings. The log of the mean subject ratings are plotted against the values of b' for both horizontal and vertical balance in Figs. 17 and 18. Eqs. (13) and (14) give the log of the mean balance rating as a function of b' for balance in the horizontal and vertical planes, respectively

\[ \mathrm{LOG}(r_{\mathrm{HORBAL2}}) = 1.76\,b'_X - 0.69, \qquad R^2 = 0.72, \tag{13} \]

\[ \mathrm{LOG}(r_{\mathrm{VERTBAL2}}) = 1.76\,b'_Y - 0.65, \qquad R^2 = 0.85. \tag{14} \]

Similar to the findings regarding balance from Experiment 1, subjects were able to accurately judge the balance of the web images in Experiment 2 and that judgment matched the balance metric.
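The section does not state how the regression lines were fitted. A minimal sketch, assuming ordinary least squares on the log of the mean ratings, is given below. The function `fit_line` is hypothetical, and the input points are generated noise-free from Eq. (13) purely to exercise the code; they are not experimental data, so the fit recovers the published slope and intercept with a perfect R^2 rather than the reported 0.72.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two or more paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Noise-free points generated from Eq. (13); NOT experimental data.
b_prime = [0.60, 0.70, 0.80, 0.90, 1.00]
log_rating = [1.76 * bp - 0.69 for bp in b_prime]

slope, intercept, r2 = fit_line(b_prime, log_rating)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r2:.2f}")
```

With real ratings, x would hold the b' value of each stimulus and y the log of its mean rating across subjects.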

Fig. 15. Log of mean subject ratings for horizontal balance are plotted against x_b values for Experiment 2. Axes: Horizontal Balance Algorithm (horizontal), Exp 2: Mean Subject Rating (vertical).

Fig. 16. Log of mean subject ratings for vertical balance are plotted against y_b values for Experiment 2. Axes: Vertical Balance Algorithm (horizontal), Exp 2: Mean Subject Rating (vertical).

Fig. 17. Log of mean subject ratings for horizontal balance are plotted against x_b' values. As x_b' increases, the balance of the image becomes more horizontally centered. Axes: Transformed Horizontal Balance Algorithm (horizontal), Exp 2: Mean Subject Rating (vertical).

Fig. 18. Log of mean subject ratings for vertical balance are plotted against y_b' values. As y_b' increases, the balance of the image becomes more vertically centered. Axes: Transformed Vertical Balance Algorithm (horizontal), Exp 2: Mean Subject Rating (vertical).

4.2.3. Aesthetic appeal ratings

The complete ordered aesthetic scores obtained with the BIB ranking method are given in Fig. 19, with the most aesthetically appealing image in the upper left and the least aesthetically appealing in the lower right. The aesthetic scores place the images on an interval scale with the most appealing image receiving a score of 15.88 and the least appealing being scored at 0. Unlike in the first experiment, where symmetry was the primary indicator of a higher aesthetic score, the greatest predictor of a high aesthetic score appears to be n, the number of groups. The top 3 web pages have 2 or 3 groups and almost all of the web pages with a smaller n are in the top half of the ranking, while the majority of the web pages with the largest n tested (n = 7) are in the lower half of the ranking. One of the major factors underlying this conclusion appears to be an increased sense of organization, such that the more well-organized web pages with fewer elements are ranked higher.

Just as the results from the BIB ranking indicate that n was the largest predictor of an increased aesthetic appeal ranking, it is highly likely that the number of groups also played a part in the ratings given in trials 1, 3 and 4. Eq. (15) gives the log of the mean aesthetic appeal rating, r, as a function of n. Statistical analysis showed no differences between the ratings of aesthetic appeal obtained in trials 1, 3 and 4 and thus the mean of these ratings is used as r. Using n as a proxy for the image complexity, this equation indicates that subjects rated less complex web pages as more aesthetically appealing

\[ \mathrm{LOG}(r_{\mathrm{AESTHETIC}}) = -0.06\,n + 1.39, \qquad R^2 = 0.30. \tag{15} \]

5. Conclusions

The central purpose of these two experiments was to create simple methods to quantitatively analyze and describe the composition of visual imagery. The experimental methodology was designed to use a bottom-up process such that the overall aesthetic appeal of an image might prove to be partially based on individual compositional attributes. Formulae for both symmetry and balance in both horizontal and vertical dimensions were developed and validated against subject ratings for those attributes. Additionally, a strong relationship between perfect symmetry and overall aesthetic appeal was shown in the basic imagery of Experiment 1, but it was shown to diminish in the more realistic looking web pages used in Experiment 2. These findings lend support to the aesthetic theories that emphasize the organizing role of symmetry in aiding the viewer’s understanding of pictorial composition. This understanding is reflected in the higher ratings of these symmetric images.
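The size of the effects described by the two aesthetic appeal regressions, Eq. (10) for Experiment 1 and Eq. (15) for Experiment 2, can be compared with a short Python sketch. It rests on two assumptions that should be read as such: LOG is taken to be the base-10 logarithm, and the coefficient on n in Eq. (15) is taken to be negative, as the reported decrease in appeal with more groups implies. The helper `rating_ratio` is hypothetical.

```python
def rating_ratio(slope: float, delta: float) -> float:
    """Multiplicative change in predicted rating when the predictor rises by delta.

    Assumes LOG is base 10, so LOG(r) = slope * x + c implies
    r(x + delta) / r(x) = 10 ** (slope * delta).
    """
    return 10.0 ** (slope * delta)


# Eq. (10), Experiment 1: s_MAX rising from 0 to 1.
symmetry_effect = rating_ratio(0.68, 1.0)

# Eq. (15), Experiment 2: n rising from 2 to 7 groups
# (slope assumed to be -0.06).
grouping_effect = rating_ratio(-0.06, 7 - 2)

print(f"symmetry 0 -> 1 multiplies the predicted rating by {symmetry_effect:.2f}")
print(f"groups 2 -> 7 multiplies the predicted rating by {grouping_effect:.2f}")
```

Under these assumptions, full symmetry raises the predicted rating of the abstract images several-fold, while moving from the fewest to the most groups tested roughly halves the predicted rating of the web pages. The lower R^2 of Eq. (15) means the second figure is the less reliable of the two.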

Fig. 19. BIB results given in rank order from the most appealing (upper left) to least appealing (lower right). The aesthetic score is given below each web page, with higher numbers representing more aesthetically appealing images.

In a finding similar to that of Tullis (1984), the use of a large number of groups in the web pages was found to have negative effects on the dependent measure. While Tullis used response time to show the relationship, this experiment used the aesthetic appeal. It is highly likely that this measure is a proxy for the complexity of the interface. Just as Tullis suggested that displays can become too complicated and dense, the value of n in the second study is an indicator of the same sense of overcrowding and complexity.

These findings can help influence design in multiple ways. The use of symmetry in very basic imagery, such as icons or logos, has a high probability of making those images more aesthetically appealing. The use of symmetry in more complex interfaces, such as web pages, becomes less important than making sure the web page is clear and coherent. This can be achieved by reducing the number of compositional elements to the lowest amount possible.

The findings of the present study also lend support to the important role of psychophysical theories of aesthetics emphasizing the quantitative relationships between aesthetic response and basic pictorial elements, as investigated in the present study. It should be noted, however, that the present study was designed in a way that minimizes the influence of the pictorial contents on aesthetic judgments through the use of abstract stimuli in Experiment 1 and the same set of pictures seen in advance in Experiment 2. It is highly likely that other schools of theories (Liu, 2003a) will also play an important role in task settings that employ content-rich stimuli and involve active user participation. For example, Kaplan (1987) emphasizes the role of information interchange between the user and the environment, in which people are not passive recipients but active seekers of information and the preferred environments are those that satisfy these informational needs. We plan to extend the research reported in this article to examine the joint contributions of compositional elements and information content when users actively explore a task situation.

References

Boring, E., 1950. A History of Experimental Psychology, second ed. Appleton-Century-Crofts, Inc., New York.
Buss, D.M., Barnes, M., 1986. Preferences in human mate selection. Journal of Personality and Social Psychology 50, 559–570.
Cropper, A.G., Evans, S.J.W., 1968. Ergonomics and computer display design. The Computer Bulletin 12 (3), 94–98.
Hallnas, L., Redstrom, J., 2002. From use to presence: on the expressions and aesthetics of everyday computational things. ACM Transactions on Computer-Human Interaction 9 (2), 106–124.
Healey, C.G., Enns, J.T., 2002. Perception and painting: a search for effective, engaging visualizations. IEEE Computer Graphics and Applications 22 (2), 10–15.
Jordan, P.W., 1997. Human factors for pleasure in product use. Applied Ergonomics 29 (1), 25–33.
Kaplan, S., 1987. Aesthetics, affect, and cognition: environmental preference from an evolutionary perspective. Environment and Behavior 19, 3–32.
Kim, J., Lee, J., Choi, D., 2003. Designing emotionally evocative homepages. International Journal of Human-Computer Studies 59, 899–940.
Langlois, J.H., Roggman, L.A., 1990. Attractive faces are only average. Psychological Science 1, 115–121.
Lavie, T., Tractinsky, N., 2004. Assessing dimensions of perceived visual aesthetics of web sites. International Journal of Human-Computer Studies 60, 269–298.
Liu, Y., 2003a. Engineering aesthetics and aesthetic ergonomics: theoretical foundations and a dual-process research methodology. Ergonomics 46, 1273–1292.
Liu, Y., 2003b. The aesthetic and the ethic dimensions of human factors and design. Ergonomics 46, 1293–1305.
Liu, Y., Wickens, C.D., 1992. Use of computer graphics and cluster analysis in aiding relational judgment. Human Factors 34 (2), 165–178.
Locher, P.J., 1996. The contribution of eye-movement research to an understanding of the nature of pictorial balance perception: a review of the literature. Empirical Studies of the Arts 14 (2), 143–163.
Locher, P.J., Nagy, Y., 1996. Vision spontaneously establishes the percept of pictorial balance. Empirical Studies of the Arts 14 (1), 17–31.
Locher, P.J., Stappers, P.J., Oberbeeke, K., 1998. The role of balance as an organizing design principle underlying adult’s compositional strategies for creating visual displays. Acta Psychologica 99 (2), 141–161.
Ngo, D.C.L., 2001. Measuring the aesthetic elements of screen design. Displays 22 (3), 73–78.
Ngo, D.C.L., Teo, L.S., Byrne, J.G., 2003. Modelling interface aesthetics. Information Sciences: An International Journal 152 (1), 25–46.
Ognjenovic, P., 1991. Processing of aesthetic information. Empirical Studies of the Arts 9 (1), 1–9.
Prince, P.D., 2000. Computer art in the new millennium. IEEE Computer Graphics and Applications 20 (1), 26–27.
Reiser, H., Reiser, B., 1995. Aesthetic considerations unique to interactive multimedia. IEEE Computer Graphics and Applications 15 (3), 24–28.
Sears, A., 1993. Layout appropriateness: a metric for evaluating user interface widget layout. IEEE Transactions on Software Engineering 19 (7), 707–719.
Streveler, D.J., Wasserman, A.I., 1984. Quantitative measures of the spatial properties of screen designs. In: INTERACT ’84 Conference Proceedings. North-Holland, Amsterdam.
Tractinsky, N., Katz, A.S., Ikar, D., 2000. What is beautiful is usable. Interacting with Computers 13 (2), 127–145.
Tullis, T.S., 1983. The formatting of alphanumeric displays: a review and analysis. Human Factors 25 (6), 657–682.
Tullis, T.S., 1984. A computer-based tool for evaluating alphanumeric displays. In: Proceedings of the INTERACT ’84 Conference on Human-Computer Interaction, London, September 1984.
Tullis, T.S., 1988a. A system for evaluating screen formats: research and application. In: Hartson, H.R., Hix, D. (Eds.), Advances in Human-Computer Interaction, vol. 2. Ablex, Norwood, NJ, pp. 214–286.
Tullis, T.S., 1988b. Screen design. In: Helander, M. (Ed.), Handbook of Human-Computer Interaction. Elsevier Science Publishers B.V., North-Holland, Amsterdam, pp. 377–411.
Wilson, A., Chatterjee, A., 2005. The assessment of preference for balance: introducing a new test. Empirical Studies of the Arts 23 (2), 165–180.
World, L., 1996. Aesthetic selection: the evolutionary art of Steven Rooke. IEEE Computer Graphics and Applications 16 (1), 4.