European Journal of Operational Research 134 (2001) 216–227

www.elsevier.com/locate/dsw

Theory and Methodology

Behavioral and procedural consequences of structural variation in value trees

Mari Pöyhönen, Hans Vrolijk, Raimo P. Hämäläinen *

Systems Analysis Laboratory, Helsinki University of Technology, P.O. Box 1100, 02015 HUT, Finland

Received 29 June 1999; accepted 3 October 2000

Abstract

Our experiment shows that the division of attributes in value trees can either increase or decrease the weight of an attribute. The structural variation of value trees may also change the rank of attributes. We propose that our new findings related to the splitting bias, some other phenomena appearing with attribute weighting in value trees, and the number-of-attribute-levels effect in conjoint analysis may have the same origins. One origin for these phenomena is that decision makers' responses mainly reflect the rank of attributes and not to the full extent the strength of their preferences as the value theory assumes. We call this the unadjustment phenomenon. A procedural source of biases is the normalization of attribute weights. One consequence of these two factors is that attribute weights change if attributes are divided in a value tree. We also discuss how the biases in attribute weighting could be avoided in practice. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Multi-attribute value theory; Value tree; Weight elicitation; Splitting bias

1. Introduction

There is growing interest in using multi-attribute value tree methods as a supportive tool in analyzing public policy issues. These methods help to compare different options under multiple criteria, qualitative as well as quantitative. Attribute weights are used to describe the opinions of stakeholder groups. It is, however, well known

* Corresponding author. Tel.: +358-9-451-3056; fax: +358-451-3096. E-mail address: raimo@hut.fi (R.P. Hämäläinen).

that weight elicitation in value trees is prone to many biases (see, e.g., Weber and Borcherding, 1993). Our paper analyses the biases due to the division of attributes in a value tree. So far, it has remained unclear where phenomena such as the splitting bias or the range effect truly originate from. A major shortcoming in previous experimental work is the fact that analyses were based on group averages. The results based on averages of weights over the whole group do not describe the behavior of individuals (Pöyhönen and Hämäläinen, 1998). Our study is the first one focusing on individual behavior when the attributes are divided. We show that depending on the


structure of the value tree, the division of attributes can either increase or decrease the attribute weights. Earlier results have only indicated that there is an increase in weights when attributes are split into sub-attributes (see, e.g., Weber et al., 1988). We also suggest that many of the phenomena related to the biases in attribute weighting may have the same origins. One reason why attribute weighting biases appear is that the decision makers' responses only reflect the rank of attributes. This can, for example, cause the range effect, a bias in which decision makers do not adjust their responses enough as the attribute ranges vary (von Nitzsch and Weber, 1993; Fischer, 1995). Another origin for some of the problems is that the weights are normalized. For example, the normalization causes the rank reversal phenomenon in connection with the Analytic Hierarchy Process (Belton and Gear, 1983; Salo and Hämäläinen, 1997). In Section 2, we summarize how these two origins – responses reflecting only rank, and normalization of weights – may cause biases in attribute weighting. The rest of the paper presents the results from our experiment where some of our suggestions are tested.

2. Consequences of ranking and normalization

Value tree analysis is usually based on additive value models. With an additive value model, attribute weights are scaling constants that reflect the relative importance of the changes in attributes from their worst levels to their best levels. Thus the weights depend on the ranges of the attributes. Weights are normalized to sum to one. The elicitation of attribute weights can be done in two modes, hierarchically or non-hierarchically, see


Fig. 1 (Stillwell et al., 1987; Weber and Borcherding, 1993). In hierarchical weighting, weights are elicited and normalized locally, on each level and within one branch separately, as indicated by the dashed line in Fig. 1. The overall attribute weights are calculated by multiplying the local normalized weights obtained at the different levels down through the tree. In non-hierarchical weighting, only the overall lowest-level weights are elicited, and this is done in one phase. With both modes, different weighting methods, such as SMART or SWING, can be used (von Winterfeldt and Edwards, 1986). Both the weight elicitation mode and the method can have an effect on the results (Stillwell et al., 1987; Borcherding et al., 1991; Pöyhönen and Hämäläinen, 2001). Attribute weights can also depend on the structure of the value tree. For example, the splitting bias suggests that the weight of an attribute increases when it is divided into sub-attributes (Weber et al., 1988; Borcherding and von Winterfeldt, 1988). We claim that there are two main origins for these weighting biases. The first origin is that the decision makers' responses do not describe the strength of their preferences but reflect only ordinal, i.e., ranking, information on the preferences. The second origin is the normalization of weights. This makes the attribute weights dependent on the number of attributes compared simultaneously. Fig. 2 illustrates different problems in attribute weighting and their possible origins. The solid lines represent known consequences and the dashed lines are conjectures that still call for experimental verification. The following sections describe phenomena resulting from ranking, i.e., decision makers only giving ordinal information, and phenomena caused by the normalization of weights.
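To make the two elicitation modes concrete, the following minimal sketch (in Python; the two-branch tree and the ratings are illustrative assumptions, not data from this paper) computes overall weights both non-hierarchically and hierarchically.

    def normalize(ratings):
        """Normalize a dict of ratings so that the values sum to one."""
        total = float(sum(ratings.values()))
        return {name: r / total for name, r in ratings.items()}

    # Non-hierarchical weighting: only the lowest-level attributes are rated
    # (e.g., with SWING: 100 to the most important range, the rest on 0-100)
    # and normalized in one phase.
    lowest_level_ratings = {"A1": 100, "A2": 40, "B1": 60, "B2": 30}
    non_hierarchical = normalize(lowest_level_ratings)

    # Hierarchical weighting: local ratings are normalized on each level and
    # within each branch, and the overall weight of a lowest-level attribute
    # is the product of the local weights on the path down through the tree.
    top_level = normalize({"A": 100, "B": 70})      # level 1
    branch_A = normalize({"A1": 100, "A2": 40})     # level 2, branch A
    branch_B = normalize({"B1": 60, "B2": 30})      # level 2, branch B
    hierarchical = {name: top_level["A"] * w for name, w in branch_A.items()}
    hierarchical.update({name: top_level["B"] * w for name, w in branch_B.items()})

    print(non_hierarchical)  # approx. A1: 0.435, A2: 0.174, B1: 0.261, B2: 0.130
    print(hierarchical)      # approx. A1: 0.420, A2: 0.168, B1: 0.274, B2: 0.137

Even with the same underlying preferences, the two modes need not produce identical overall weights, which is one reason why the elicitation mode itself can affect the results.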

Fig. 1. Hierarchical and non-hierarchical weighting.


Fig. 2. The problems and biases in attribute weighting and their origins.

2.1. Ranking

In conjoint analysis, the attribute weights are calculated from the decision makers' evaluations of hypothetical alternatives (see, e.g., Green and Srinivasan, 1990; Weber and Borcherding, 1993). These hypothetical alternatives are combinations of attribute levels. The attribute `price of a product', for example, can be described with the levels low and high. Wittink et al. (1982) showed empirically that in conjoint analysis the weight of an attribute depends on the number of attribute levels when the decision makers are asked only to rank the alternatives. With the price attribute, for example, this number-of-attribute-levels effect states that the weight of price increases when price is described with the levels low, average, and high instead of the levels low and high only. Wittink et al. (1989) showed analytically that this effect is unavoidable when the decision makers are only asked to rank the alternatives. Even when the decision makers are asked for ratings, it is not certain that they give anything beyond rankings. Steenkamp and Wittink (1994) found that the number-of-attribute-levels effect appeared because decision makers gave only ordinal information when they were asked to rate the alternatives. Pöyhönen and Hämäläinen (2001) concluded that weighting methods, such as SMART and SWING, yield different weights because the subjects used a limited set of numerical

ratings in their evaluations. The subjects' responses mainly reflected the rank of the attributes instead of the strength of their preferences. A subject used, for example, the numerical ratings 100, 90, and 70 with SWING whereas he used 40, 20, and 10 for the same attributes with SMART. It seems that the subjects ranked the attributes directly or implicitly by using the numerical ratings suggested in a particular weighting method. These response scale effects lead to different weights for different weighting methods. The variation of the structure of value trees results in many biases that can be due to ranking. Pöyhönen and Hämäläinen (1998) showed that weighting methods that only use the rank of attributes in the weight calculation, such as SMARTER (Edwards and Barron, 1994), yield biased weights if the attributes are divided. These methods use a limited set of weights; for example, with SMARTER the possible weights are 0.611, 0.278, and 0.111 when there are three attributes. Thus, the rank information and the normalization of weights together lead to a situation where the division of attributes always results in different weights, even if the decision maker gives consistent evaluations. Weber et al. (1988) pointed out that the division of an attribute is actually the same as the division of the attribute ranges and thus the range effect and the splitting bias are the same phenomenon. Stillwell et al. (1987) suggested that one reason for the steeper weights in hierarchical


weighting is that the decision makers only use integers to give weight ratios. All the phenomena discussed in this section may originate from the ranking of attributes. Decision makers do not adjust their responses enough when the attribute ranges change. For example, if the range of the most important attribute is increased, it remains the most important one, i.e., the rank of this attribute does not change. We will refer to this lack of adjustment in the responses to changes in the attribute descriptions as the unadjustment phenomenon. The unadjustment of responses leads to a situation where the numerical ratings given by respondents implicitly reflect only ordinal information about the preferences. Steenkamp and Wittink (1994) call this phenomenon the `lack of metric quality' in responses.
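As an illustration of how purely rank-based weights behave under splitting, the short sketch below (not from the original paper; the assumed rank positions of the split attribute are our own example) computes the SMARTER rank-order centroid weights cited above and shows that splitting an attribute necessarily changes the normalized weights even though no new preference information is given.

    def roc_weights(n):
        """Rank-order centroid weights used by SMARTER for n ranked attributes:
        w_i = (1/n) * sum_{k=i}^{n} 1/k, for rank i = 1 (most important) .. n."""
        return [sum(1.0 / k for k in range(i, n + 1)) / n for i in range(1, n + 1)]

    print(roc_weights(3))  # approx. [0.611, 0.278, 0.111], as cited for three attributes
    print(roc_weights(4))  # approx. [0.521, 0.271, 0.146, 0.063]

    # If the second-ranked attribute is split into two sub-attributes that end up
    # ranked 2nd and 3rd (an assumed ranking), its total weight changes from
    # 0.278 to 0.271 + 0.146 = 0.417, purely because of the rank-based formula
    # and the normalization.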

2.2. Normalization

Normalized weights always depend on the whole set of attributes included in the normalization. Fig. 3 illustrates how the normalization of weights can easily change the weights and can make it difficult to detect the real origins of the biases in weights. The normalized weights show the splitting bias although the decision maker has not changed the ratings for the divided attribute. The ratings change with the undivided attribute only. In the example, the decision maker assigns the ratings 100, 60, and 40 to attributes A, B, and C. After dividing attribute B, the decision maker gives the ratings 100, 40, and 20 to A, B1, and B2, but gives 20 to C. The weight of B increases because the decision maker assigns 20 to C. He gives, however, the same ratings to attribute B in both value trees. One should note that the weight of the most important


attribute A changes even though the attribute is neither divided nor its ratings changed. The example illustrates that by looking at the normalized weights one cannot conclude where the biases in weights really originate from. Pöyhönen and Hämäläinen (1998) pointed out that taking averages over sets of normalized weights can lead to false conclusions about weighting biases. In the case of n weights being normalized, the average of each normalized weight over a large number of subjects tends to approach 1/n. This phenomenon leads, for example, to the splitting bias if one studies the averages of weights only. The averages of weights over a group of subjects do not describe individual behavior, and thus establishing the existence of the biases requires studies at the individual level. The normalization of weights also has other consequences, such as the rank reversal phenomenon in the Analytic Hierarchy Process. Because of the normalization of alternative scores, the addition of alternatives can change the rank of the other alternatives (see, e.g., Belton and Gear, 1983; Salo and Hämäläinen, 1997).
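The numerical example of Fig. 3 can be reproduced directly. The following sketch (illustrative only, using the SWING-type ratings quoted above) shows how the normalized weight of the divided attribute B, and of the untouched attribute A, increases solely because the rating of C changes.

    def normalize(ratings):
        total = float(sum(ratings.values()))
        return {name: r / total for name, r in ratings.items()}

    # Before the division of B (ratings from Fig. 3).
    before = normalize({"A": 100, "B": 60, "C": 40})
    # -> A: 0.50, B: 0.30, C: 0.20

    # After dividing B into B1 and B2.  The ratings for the B branch still
    # sum to 60, i.e., the decision maker has not changed them; only the
    # rating of C drops from 40 to 20.
    after = normalize({"A": 100, "B1": 40, "B2": 20, "C": 20})
    weight_B_after = after["B1"] + after["B2"]
    # -> A: 0.556, B1 + B2: 0.333, C: 0.111

    # The weight of B rises from 0.30 to 0.33 and the weight of the untouched
    # attribute A from 0.50 to 0.56, purely because of the normalization.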

3. Experiment

The experiment aimed at studying the following questions:
· Do the weighting biases appear at the individual level when the structure of the value tree is changed?
· What are possible explanations for the changes in weights?
The 180 subjects were students of the Faculty of Economics at the University of Groningen and they completed the experiment as a course

Fig. 3. An example where the normalized weights show the splitting bias although the decision maker does not change the ratings for the divided attribute. Here the effect is caused by a change in the responses with respect to the least important attribute C.


assignment. The decision task was to choose the most preferred supermarket for their daily shopping. The attributes are described in Table 1 and the value tree is shown in Fig. 4. We assumed the value model to be additive, but we did not test this explicitly. The additivity assumption is rarely validated in experiments with value trees, or in applications, and we decided to skip the validation in our experiment as well. We felt that even without explicitly validating additivity, we still get results that resemble practical situations.

Table 1
The attribute ranges

Attribute                         Worst level    Best level
Price index (A1)                  105            100
Price promotions (A2)             None           A lot
Assortment (B1)                   Narrow         Broad
Quality of fresh products (B2)    Average        High
Personnel (B3)                    Indifferent    Helpful
Average waiting time (B4)         6 minutes      3 minutes
Distance (C)                      2500 m         200 m
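To spell out the additive model assumed here, the following sketch (illustrative only; the linear single-attribute value functions and the weights are our own assumptions and were not elicited in the experiment) evaluates one hypothetical supermarket over three of the quantitative ranges in Table 1.

    # Additive value model: V(x) = sum_i w_i * v_i(x_i), where each v_i maps the
    # attribute range in Table 1 onto [0, 1] (worst level -> 0, best level -> 1)
    # and the weights w_i sum to one.

    def linear_value(x, worst, best):
        """Assumed linear single-attribute value function over [worst, best]."""
        return (x - worst) / (best - worst)

    # Assumed (not elicited) weights for three of the attributes, summing to one.
    weights = {"price_index": 0.5, "waiting_time": 0.3, "distance": 0.2}

    # A hypothetical supermarket: price index 102, 4-minute wait, 800 m away.
    values = {
        "price_index": linear_value(102, worst=105, best=100),   # 0.6
        "waiting_time": linear_value(4, worst=6, best=3),        # about 0.67
        "distance": linear_value(800, worst=2500, best=200),     # about 0.74
    }

    overall = sum(weights[a] * values[a] for a in weights)       # about 0.65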

The subjects were shown the table of attributes and their ranges. They were not shown the structure of the value tree explicitly. Each subject weighted the following five sets of attributes (the comparison sets 1, 2, 3, 4, and 5) in a randomized order (see Fig. 4). We did not find any order effects.
1. AB and C
2. A, B, and C
3. A and B
4. A1, A2, B1, B2, B3, B4, and C
5. A1, A2, B1, B2, B3, and B4
Weights were elicited by the SWING procedure. The subjects were asked to assign a rating of 100 to the most important change from the worst attribute level to the best level. The remaining attributes were rated on a 0–100 scale. This method was chosen because it clearly introduces the attribute ranges in the questions. The range effect has been found to be smaller with this method than with SMART (Fischer, 1995). During the questioning, the attribute ranges were also clearly presented for the higher-level attributes, which consist of a group of sub-attributes. For example, for attribute A, which includes the two price-related attributes, the subject was asked to

Fig. 4. The value tree and the five comparison sets used in the experiment.


consider both the price index (A1) and the price promotions (A2).

4. Results

4.1. Observations at the individual level

The main observation is that the division of attributes changes the weights at the individual level. For most of the subjects the weight of attribute B increases and the weight of attribute A decreases when the attributes are divided. The rank of attributes A and B reverses for most of the subjects who give more weight to (undivided) attribute A than to (undivided) attribute B (see Table 2). They consider attribute A to be more important than B when these two attributes are compared simultaneously, whereas after the division attributes B1, B2, B3 and B4 together get more weight than A1 and A2 together. It should be noted, however, that the subjects do not change the rank of individual attributes, i.e., most of the subjects who give more weight to attribute A also give more weight to either A1 or A2 than to any of the B attributes (see Table 3). In other words, Table 3 shows that most of the subjects have stable opinions about which attributes get the highest rank. One should note that a subject is not inconsistent if he first gives the highest weight to attribute A and after the division of attributes gives the highest


weight to one of the B attributes, because it may be that after the division neither A1 nor A2 is the most important attribute. In order to describe individual weighting behavior, the responses are studied separately within groups of subjects having similar preferences. We used the following four homogeneous groups:
Group A: subjects giving the highest weights to one of the A attributes in all the comparisons (AB = 100, A = 100, and either A1 or A2 = 100) (47 subjects).
Group B: subjects giving the highest weights to one of the B attributes in all the comparisons (AB = 100, B = 100, and either B1, B2, B3 or B4 = 100) (46 subjects).
Group C: subjects giving the highest weights to the C attribute in all the comparisons (C = 100 in comparison sets 1, 2, and 4) (50 subjects).
Others: all subjects not belonging to one of the previous groups (35 subjects).
There are differences between the groups (see Table 4) and thus also differences between individual subjects. In all the groups, the division of attribute B increases the weight of that attribute, although less in Group B than in the other groups. The division of attributes decreases the weight of attribute A in all the groups, but less in Group C than in the other groups. In Group A, the division of attributes changes the preference order of attributes A and B. In Group B, the order of attributes A and B remains the same. In all the groups,

Table 2
The direction of the bias^a

           A1 + A2 > B1 + B2 + B3 + B4    B1 + B2 + B3 + B4 > A1 + A2
A > B      4                              98
B > A      0                              76

^a The number of subjects who give more weight to A (or B) in comparison 2 and then give more weight either to A1 and A2 or to B1, B2, B3, and B4 together in comparison 4 (ties excluded).

Table 3
The order of the attributes^a

           max(A1, A2) > max(B1, B2, B3, B4)    max(B1, B2, B3, B4) > max(A1, A2)
A > B      76                                   18
B > A      8                                    65

^a The number of subjects who give the highest weight to A (or B) in comparison 2 and then give the highest weight either to A1 or A2 or to one of the B sub-attributes in comparison 4 (ties excluded).


Table 4
Average changes in the normalized weights after the attribute is divided^a

Division              Group A (%)    Group B (%)    Group C (%)    Others (%)
AB → A, B             23             24             44             26
A → A1, A2            -15            -15            -6             -14
B → B1, B2, B3, B4    84             48             83             114

^a Comparison sets 1, 2, and 4. For example, in Group A the weight of attribute A decreases on average 15% when it is divided into two attributes.

the division of the top-level attribute AB into two separate attributes A and B increases the weight.

4.2. The way to give numerical ratings – the unadjustment phenomenon

Table 5. The medians of responses within groups A, B, and C

The differences between the groups and the changes in weights originate from the way the

subjects select the responses to express their preferences. The individual responses show that the subjects have a tendency to reduce the starting score of 100 in steps of multiples of ten or five (see Table 5). The lowest given score is about 40 when all seven attributes are evaluated simultaneously (see Table 6). This response scale effect together with the normalization causes the weights to change when the attributes are divided. The decision


Table 6
The range of responses^a

Endpoints      n = 2,    n = 2,    n = 3,    n = 6,    n = 7,
From   To      Comp 1    Comp 3    Comp 2    Comp 5    Comp 4
100    90      40        39        18        1         0
100    80      40        44        38        6         4
100    70      17        22        21        15        7
100    60      13        20        26        14        18
100    50      14        15        24        38        18
100    40      5         0         10        21        39
100    30      6         3         10        22        21
100    20      4         1         4         18        26
100    10      0         2         3         15        23
Total          139       146       154       150       156

^a The number of subjects who used these endpoints in their responses within each comparison set (n is the number of attributes compared simultaneously).

makers are inclined to use the same response scale and they fail to adjust their responses adequately when the attributes are divided. We call this lack of reaction the unadjustment phenomenon. The unadjustment of the responses leads to a situation where the numerical ratings given by the decision maker mainly reflect the rank of attributes. There are subjects who clearly understood the meaning of the attribute weights although their responses reflect the rank of attributes only. For example, a subject first indicates attribute A to be the most important one, but after the division selects attribute C to be the most important one (Fig. 5 presents the responses of this subject). This subject understood that the weights of the other

Fig. 5. The responses of one subject from the group Others. The subject changed the order of the attributes correctly after the division and the rank of the attributes is also consistent. The weights of attributes A, B, and C, however, change because of the division of attributes and because the subject failed to adjust the responses enough.

attributes compared to attribute C should decrease when the attributes are divided. Some subjects (32 out of 180) correctly increased the numerical ratings given to attribute C relative to the other attributes when the attributes were divided. These types of responses show that some subjects correctly considered the attribute ranges.

4.3. Why does the unadjustment phenomenon appear?

The explanation that subjects only react to the labels of attributes (see, e.g., Borcherding and von Winterfeldt, 1988) is too simple to explain the response behavior found in our experiment. There are subjects who only seem to rank the attributes within some comparison sets, because they reduce the starting number of 100 in even steps of 10, 5, or 1. For example, one subject uses the numerical ratings 100, 90, and 80 for attributes A, B, and C. However, none of the subjects behaves in this way throughout all the comparison sets. Thus, they also use some other rules in addition to ranking to adjust their responses when an attribute is divided. We suggest two other behavioral rules, closely related to direct ranking, that can lead to the observed responses. We restrict this analysis to the subjects belonging to Group C, because in this group the division of attributes does not change the reference attribute of the comparisons, i.e., C is the most important


attribute throughout the comparisons. In Group C, a consistent decision maker should give A1 and A2 together the same rating as A alone (in a similar way, the sum of the numerical ratings given to B1, B2, B3, and B4 should equal the rating given to B). Subjects on the right-hand side of the straight line in Fig. 6 increase the numerical ratings assigned to A. Only the subjects situated on (or near) the straight line give completely consistent evaluations with respect to A. Two other possible behavioral rules for adjusting the responses when an attribute is divided are: (1) the average rating assigned to the sub-attributes equals the rating given to the undivided attribute (the average rule), or (2) the maximum rating assigned to one of the sub-attributes equals the rating given to the undivided attribute (the maximum rule, Fig. 7). These rules are related to the availability hypothesis, which states that the decision maker uses the same or a similar rating

Fig. 6. Sum of A1 and A2 versus A.

Fig. 7. Maximum of A1 and A2 versus A.

with the sub-attributes as with the undivided attribute (Weber et al., 1988; Borcherding and von Winterfeldt, 1988). It is noteworthy, however, that only five subjects use exactly the same numerical ratings for A1, A2, and A (e.g., give all of A, A1, and A2 a rating of 20), and none of the subjects uses the same ratings for B1, B2, B3, and B4 as for B. Thus the subjects are not assigning the same numerical ratings to attributes throughout the comparisons. The subjects are classified according to which behavioral rule best matches their responses within each comparison set (see also the sketch after this list). The classification proceeded in the following order:
1. The average rule: the subject gives the divided attributes both lower and higher numerical ratings than the undivided attribute. A typical response is that a subject gives 80 to A and then 95 and 60 to A1 and A2.
2. The maximum rule: the subject assigns to one of the divided attributes the same numerical rating as to the undivided attribute and less to the other divided attributes. One gives, for example, 95 to A and then 95 also to A1 but 80 to A2.
3. Consistent subjects decrease all the numerical ratings given to the divided attributes compared to the undivided attribute. One gives, for example, 90 to A and then 50 and 70 to A1 and A2. One should note that these subjects are consistent only in the qualitative sense, i.e., they decrease the ratings of the divided attributes as they are supposed to do, but the weights still change because the adjustment of the responses is not sufficient.
4. Inconsistent subjects give all the divided attributes equal or higher numerical ratings than the undivided attribute. One gives, for example, 40 to A and then 80 and 90 to A1 and A2.
Table 7 summarizes the behavior of the subjects at each level of the tree. There is no single typical way to react to the division of attributes. Furthermore, subjects do not use the same behavioral rule in the different comparison sets. Based on these observations it is not possible to draw final conclusions about why the unadjustment phenomenon appears, and the topic needs further research.
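The following sketch (our illustration, not code from the study) applies the four rules above to a subject's ratings for an undivided attribute and its sub-attributes; the order of the checks mirrors the list.

    def classify(undivided, divided):
        """Classify a response pattern when one attribute is split.
        undivided: SWING rating given to the undivided attribute (e.g., A).
        divided:   list of ratings given to its sub-attributes (e.g., [A1, A2])."""
        if min(divided) < undivided < max(divided):
            return "average rule"    # ratings spread below and above the original
        if max(divided) == undivided and all(r <= undivided for r in divided):
            return "maximum rule"    # one sub-attribute keeps the original rating
        if all(r < undivided for r in divided):
            return "consistent"      # all sub-attribute ratings are reduced
        return "inconsistent"        # all ratings equal or higher than the original

    print(classify(80, [95, 60]))   # average rule
    print(classify(95, [95, 80]))   # maximum rule
    print(classify(90, [50, 70]))   # consistent
    print(classify(40, [80, 90]))   # inconsistent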


Table 7
Possible behavioral rules used in Group C (number of subjects is 52)

Division              Average rule    Maximum rule    Consistent    Inconsistent    Total
AB → A, B             8               16              12            16              52
A → A1, A2            18              17              10            7               52
B → B1, B2, B3, B4    31              10              8             3               52

4.4. How can the changes in weights be avoided?

In all the groups, the division of attributes would not have changed the weights if the subjects had
· decreased the ratings given to the B attributes,
· increased the ratings given to the A attributes,
· or increased more clearly the rating given to attribute C.
In practice, the decision makers should be guided in the weighting so that the proportions of the ratings divided among the branches remain the same even if the set of attributes is changed (see Fig. 8). If the left branch gets 56% of the weight before the division, it should get the same percentage of the weight also after any changes in the structure of the value tree. It would be interesting to study these phenomena with a weighting procedure where a pie is divided among the attributes, instead of making pairwise comparisons relative to the most important attribute. Pöyhönen and Hämäläinen (1998) showed that the splitting bias does not occur with hierarchical weighting if one studies the sums of the final weights obtained by multiplication down through

the value tree. Fig. 9 illustrates this phenomenon. In Figs. 9(a) and (b) the weights are elicited hierarchically. In Fig. 9(b) one attribute is divided into two attributes. Throughout the comparisons the decision maker gives numerical ratings as the unadjustment phenomenon would suggest. Thus the local weights, i.e., the weights at one level and within one branch, do change when the attribute is divided. However, after the multiplication of the local weights downwards, the final attribute weights do not show any bias. In this procedure the sum of the final weights below one attribute is by definition always the same as the weight of that attribute at the higher level. In practice, attribute weighting most often proceeds hierarchically because in that way the number of comparisons can be kept at a minimum. There are, however, a few remarks that should be taken into account before concluding that the splitting bias is eliminated in this way. First, the local weights do change, and in practice this detailed information is also important. Second, we do not know what biases appear with the weighting at the higher levels of value trees. The attributes at the higher level are multi-dimensional and their weights should reflect the attribute ranges of groups of sub-attributes. These comparisons are much more difficult than the comparison of unidimensional attributes. It may be that hierarchical weighting does not capture true preferences at all.
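A small numerical sketch (our illustration with made-up local ratings, in the spirit of Fig. 9) shows this invariance: even if the local ratings after a split follow the unadjustment pattern, the final weights obtained by multiplying down the tree keep the branch totals fixed.

    def normalize(ratings):
        total = float(sum(ratings.values()))
        return {name: r / total for name, r in ratings.items()}

    # Hierarchical weighting before the split: two branches X and Y;
    # the final weight of each branch equals its top-level weight.
    top = normalize({"X": 100, "Y": 60})           # X: 0.625, Y: 0.375

    # After splitting Y into Y1 and Y2 the decision maker rates them locally,
    # possibly without adjusting much (unadjustment), e.g., 100 and 80.
    local_Y = normalize({"Y1": 100, "Y2": 80})     # Y1: 0.556, Y2: 0.444

    final = {"X": top["X"],
             "Y1": top["Y"] * local_Y["Y1"],
             "Y2": top["Y"] * local_Y["Y2"]}

    # The branch totals are unchanged by construction: X keeps 0.625 and
    # Y1 + Y2 = 0.375, so no splitting bias appears in the final weights,
    # although the local ratings did change because of the division.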

Fig. 8. An example of how the weights change in Group A when the attributes are divided and how this can be avoided. The weights would remain unchanged if the numbers allocated to the divided B attributes were around 140, i.e., on average 40 to each attribute. The proportions of the numerical ratings given to both branches should remain the same.


Fig. 9. An illustrative example showing that the splitting bias cannot be observed in hierarchical weighting. The final weights obtained by multiplication down through the tree do not show any bias although the local weights do change because of the division.

4.5. Summary

The main results from our experiment are:
· The splitting bias appears also at the individual level. There are, however, interpersonal differences, and there are subjects who are able to avoid the changes in weights. For most of the subjects, the division of attributes either increases or decreases the weights and may also change the rank of groups of attributes (before normalization).
· One reason for these biases appears to be that the subjects use even 10s in their evaluations, i.e., the response scale effect. They give ordinal information to express their preferences. However, very straightforward ranking behavior, i.e., a subject giving numerical ratings down from 100 in equal steps (e.g., using only 100, 90, 80, etc.), does not alone explain the observed responses. We suggest alternative behavioral rules that can lead to responses that reflect ordinal information.
· We also describe how the biases could have been avoided in our experiment. We show that hierarchical weighting eliminates the splitting bias in the final weights. However, the local weights, i.e., the weights at one level and within one branch in a value tree, may change due to the unadjustment phenomenon if the attributes are divided. Changes of the local weights are eliminated when the weights are multiplied down through the tree.

5. Conclusions

Attribute weights change when attributes are divided in a value tree because the decision makers do not adjust their responses enough to a change in the tree. We call this the unadjustment phenomenon. Together with normalization, the unadjustment phenomenon leads to situations in which the division of an attribute changes the weights and the weights at the higher levels of the value tree do not necessarily have a correct interpretation. This behavior leads to ratings that reflect the rank of attributes only. This is in conflict with the underlying value theory, which assumes that the numerical ratings given by decision makers describe the strength of their preferences. There remains a major task for decision analysis researchers to fill the gap between observed decision maker behavior and methodological assumptions. Procedures should enable the decision makers to clearly understand the interpretation of an attribute weight in a value tree and the meaning of the numerical ratings they are asked to give. The responses elicited in our experiment show that the subjects do understand the meaning of attribute weights in a qualitative sense, but they fail to adjust their responses sufficiently (see also


Pöyhönen and Hämäläinen, 2000). It is unclear why decision makers fail to react correctly to the division of attributes. In our experiment, simple ranking of attributes does not alone explain the observed responses. We also suggested other adjustment rules that may explain how decision makers think, but none of them was a superior explanation. One solution for practice is to accept that the decision makers' responses reflect the rank of attributes only and to have the weighting methods explicitly ask decision makers to rank attributes. In this way there is less need to speculate about whether decision makers understood what they were required to respond to. Another remedy would be to use methods that allow decision makers to give ranges of responses (see, e.g., Salo and Hämäläinen, 1992a,b). Still, one should be cautious if the structure of a value tree is changed. With hierarchical weighting the final weights obtained by multiplication down through the value tree do not show biases in the way studied in our experiment. The weights at the higher levels of the value tree may, however, be biased in other, yet unknown ways. The decision makers need to give opinions over multi-dimensional attributes. Given the cognitive complexity of these comparisons, the validity and reliability of these ratings are at least questionable. These sources of biases should be studied in the future if hierarchical weighting continues to be applied in practice. Furthermore, in practice the weighting is often done interactively, so that the decision maker is able to revise the weights and get feedback on possible inconsistencies. The effects of interactiveness and learning on biases in attribute weights should get further attention.

References

Belton, V., Gear, T., 1983. On a short-coming of Saaty's method of analytic hierarchies. OMEGA 3, 228–230.
Borcherding, K., von Winterfeldt, D., 1988. The effect of varying value trees on multiattribute evaluations. Acta Psychologica 68, 153–170.
Borcherding, K., Eppel, T., von Winterfeldt, D., 1991. Comparison of weighting judgements in multiattribute utility measurement. Management Science 31 (12), 1603–1619.


Edwards, W., Barron, F.H., 1994. SMARTS and SMARTER: Improved simple methods for multiattribute utility measurement. Organizational Behavior and Human Decision Processes 60, 306–325.
Fischer, G.W., 1995. Range sensitivity of attribute weights in multiattribute value models. Organizational Behavior and Human Decision Processes 62 (3), 252–266.
Green, P.E., Srinivasan, V., 1990. Conjoint analysis in marketing: New developments with implications for research and practice. Journal of Marketing 54, 3–19.
Pöyhönen, M., Hämäläinen, R.P., 1998. Notes on the weighting biases in value trees. Journal of Behavioral Decision Making 11, 139–150.
Pöyhönen, M., Hämäläinen, R.P., 2000. There is hope in attribute weighting. Information Systems and Operational Research 38 (3).
Pöyhönen, M., Hämäläinen, R.P., 2001. On the convergence of multiattribute weighting methods. European Journal of Operational Research 129 (3), 569–585.
Salo, A., Hämäläinen, R.P., 1992a. Preference assessment by imprecise ratio statements. Operations Research 40, 1053–1061.
Salo, A., Hämäläinen, R.P., 1992b. PRIME – Preference ratios in multiattribute evaluation. Helsinki University of Technology, Systems Analysis Laboratory Research Reports A43, July 1992.
Salo, A.A., Hämäläinen, R.P., 1997. On the measurement of preferences in the analytic hierarchy process. Journal of Multi-Criteria Decision Analysis 6, 309–319, and comments by Belton, V., Choo, E., Donegan, T., Gear, T., Saaty, T., Schoner, B., Stam, A., Weber, M., Wedley, B.
Steenkamp, J.E.M., Wittink, D.R., 1994. The metric quality of full-profile judgments and the number-of-attribute-levels effect in conjoint analysis. International Journal of Research in Marketing 11, 275–286.
Stillwell, W.G., von Winterfeldt, D., John, R.S., 1987. Comparing hierarchical and non-hierarchical weighting methods for eliciting multiattribute value models. Management Science 33 (4), 442–450.
von Nitzsch, R., Weber, M., 1993. The effect of attribute ranges on weights in multiattribute utility measurements. Management Science 39, 937–943.
von Winterfeldt, D., Edwards, W., 1986. Decision Analysis and Behavioral Research. Cambridge University Press, Cambridge.
Weber, M., Borcherding, K., 1993. Behavioral influences on weight judgments in multiattribute decision making. European Journal of Operational Research 67, 1–12.
Weber, M., Eisenführ, F., von Winterfeldt, D., 1988. The effects of splitting attributes on weights in multiattribute utility measurement. Management Science 34, 431–445.
Wittink, D.R., Krishnamurthi, L., Nutter, J.B., 1982. Comparing derived importance weights across attributes. Journal of Consumer Research 8, 471–474.
Wittink, D.R., Krishnamurthi, L., Reibstein, D.J., 1989. The effect of differences in the number of attribute levels on conjoint results. Marketing Letters 1, 113–123.