
PSYCHOMETRIKA—VOL. 71, NO. 2, 257–279 JUNE 2006 DOI: 10.1007/S11336-003-1098-9

NONPARAMETRIC ESTIMATION OF ITEM AND RESPONDENT LOCATIONS FROM UNFOLDING-TYPE ITEMS

MATTHEW S. JOHNSON
BARUCH COLLEGE, CITY UNIVERSITY OF NEW YORK

Unlike their monotone counterparts, nonparametric unfolding response models, which assume the item response function is unimodal, have seen little attention in the psychometric literature. This paper studies the nonparametric behavior of unfolding models by building on the work of Post (1992). The paper provides rigorous justification for a class of nonparametric estimators of respondents’ latent attitudes by proving that the estimators consistently rank order the respondents. The paper also suggests an algorithm for the rank ordering of items along the attitude scale. Finally, the methods are evaluated using simulated data.

Key words: unfolding response models, ideal point models, Thurstone scaling, attitude scaling, nonparametric item response theory.

1. Introduction

A number of methods have been developed in the past eight decades for the measurement of attitudes from surveys and other self-report instruments. Such attitudes range from students’ attitudes toward topics of instruction or teaching styles, to changes in smokers’ attitudes as they approach successive change (e.g., Noël, 1999), to citizens’ attitudes toward major policy issues (e.g., Formann, 1988) or toward staying informed about politics (Muhlberger, 1999).

One of the methods suggested in the literature to study such attitudes is the direct response method. In the direct response method respondents are given a set of $J$ items (statements or other stimuli) and asked to examine each one and determine whether or not to endorse it. Item response theory (IRT) can be used to model direct responses from attitudinal studies, if the central assumption of monotonicity of the item response functions (IRFs) is suitably modified. Standard, unidimensional IRT assumes that there is a single, real-valued latent variable $\theta$ being measured; in attitudinal studies the sign of $\theta$ represents the direction of the respondent’s attitude (political liberalism versus conservatism, for example) and the magnitude $|\theta|$ represents the intensity of the respondent’s attitude. When the items are statements that the respondent can endorse (denoted $X_j = 1$) or not (denoted $X_j = 0$), the usual IRF,

$$P_j(\theta) \equiv \Pr\{\text{Subject endorses item } j \text{ given the subject is located at } \theta\} = \Pr\{X_j = 1 \mid \theta\},$$

still plays a central role in modeling.

One of the original direct response methods, suggested by Thurstone (1928), prescribes that, in order to measure $\theta$, the unidimensional latent attitudes of subjects, we should first premeasure the survey items. This premeasurement is achieved by giving the items to a number of judges who are asked to rank the items from one extreme to the other on the latent scale.
These ranks are used to estimate the “locations” (defined more precisely below) of each item on the real line. The items are then given to survey respondents, and the average location of those items endorsed by an individual provides the researcher with an estimate of that respondent’s location on the real line. Thurstone’s procedure has the disadvantage that it requires two stages of estimation.

This research was supported in part by an Educational Testing Service Gulliksen Fellowship, and by the National Science Foundation, Grant DMS-97.05032. The author would like to thank Brian Junker for his help and support on this paper and Paul Holland, Steve Fienberg, and Jay Kadane for their helpful comments. Requests for reprints should be sent to Matthew S. Johnson, Department of Statistics & CIS, Baruch College, City University of New York, New York, NY 10010, USA. E-mail: Matthew [email protected]

© 2006 The Psychometric Society

Coombs (1964) suggests a method, which he calls unfolding, that allows for the joint estimation of item and respondent locations with a single data collection step. His method, based on the assumption that subjects agree with those items, and only those items, which are located near their location on the latent scale, implies that the IRF $P_j(\theta)$ for endorsing the item is deterministic, that is, $P_j(\theta)$ takes only the values 0 and 1. Both Thurstone’s and Coombs’s methods imply that the IRF is a unimodal function of $\theta$. The assumption of unimodality of the IRF is in contrast to the usual assumption in applications of IRT to testing, in which the probability that an examinee correctly responds to a test item is a nondecreasing function of $\theta$.

Item response models that assume a unimodal IRF are called unfolding response models, also called proximity models (Hoijtink, 1991). In any unimodal item response model, define the location of item $j$ as the point at which the IRF reaches a maximum (or the midpoint of the interval of all such points), that is, $\beta_j = \arg\max_{\theta} P_j(\theta)$.

Although Coombs’s method allows for measurement of the survey stimuli directly from the observed responses, its assumption of a deterministic response function, specifically,

$$P_j(\theta) = \begin{cases} 1 & \text{if } |\theta - \beta_j| < \delta_j, \\ 0 & \text{otherwise,} \end{cases}$$

is too restrictive for most applications. The inflexibility of the deterministic model prompted the development of probabilistic unfolding models (Davison, 1977; Andrich, 1988; Hoijtink, 1990; Luo, 1998). The probabilistic unfolding models assume conditional independence, unidimensionality of the latent trait, and unimodality of the response function, that is, the response function is single peaked.
The probabilistic parametrizations of the IRF $P_j(\theta)$ allow for statistical inferences to be made on both the subject’s location $\theta$ and the item locations $\beta_j$. A number of papers have focused on the formulation and estimation of unfolding models (see, e.g., Andrich, 1978, 1989; DeSarbo & Hoffman, 1986; Hoijtink, 1990, 1991; Andrich & Luo, 1993; Verhelst & Verstralen, 1993; Roberts, Donoghue, & Laughlin, 1999, 2000; Maris & Maris, 2002; Johnson & Junker, 2003).

One of the simplest examples of a probabilistic unfolding IRF is the squared logistic model (SLM) (Andrich, 1988), which parametrizes the IRF as

$$P_j(\theta) = \left(1 + \exp\{(\theta - \beta_j)^2 - \gamma_j\}\right)^{-1}.$$

The IRF is symmetric around the location $\beta_j$, and the maximal endorsement probability $P_j(\beta_j) = (1 + \exp(-\gamma_j))^{-1}$ is a function of the parameter $\gamma_j$, which Andrich and Luo (1993) call the item-unit parameter.

Studying the features of the data without assuming a parametric model is more attractive in many situations. Nonparametric monotone IRT models have been studied in great detail by a number of authors, including Mokken (1971), Stout (1990), Junker (1991), Ramsay (1991), Hemker, Sijtsma, Molenaar, and Junker (1997), Douglas (1971), Sijtsma (1998), and many others (see Junker and Sijtsma (2001) for a survey of more recent work). One of the most important results in the nonparametric theory of dichotomous IRT models states that under minor regularity conditions the total score $S_i = \sum_{j=1}^{J} X_{ij}$ consistently estimates the rank order of the sampled individuals (Stout, 1990). The rest-scores $S^{(-j)} \equiv S - X_j$ are particularly useful for examining the overall shape of the IRFs. Junker (1993, Proposition 4.1(a)) proves that the item-rest regression, defined by $\Pr\{X_j = 1 \mid S^{(-j)}\} = E[X_j \mid S^{(-j)}]$, is necessarily monotone, and Douglas (1997) proves that a kernel regression method to estimate the item-rest regression is a consistent estimator of the shape of the IRF.
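As a concrete illustration of the squared logistic model described above, the following Python sketch evaluates the SLM IRF on a grid and checks the two stated properties: the peak sits at the item location, and the peak height equals $(1 + \exp(-\gamma_j))^{-1}$. The function name `slm_irf` and the parameter values are illustrative assumptions, not part of the original paper.

```python
import math


def slm_irf(theta, beta, gamma):
    """SLM endorsement probability: 1 / (1 + exp((theta - beta)^2 - gamma))."""
    return 1.0 / (1.0 + math.exp((theta - beta) ** 2 - gamma))


# Hypothetical item parameters, chosen only for illustration.
beta, gamma = 0.5, 1.0

# Evaluate the IRF on a grid centred at the item location.
grid = [beta + 0.01 * k for k in range(-300, 301)]
probs = [slm_irf(t, beta, gamma) for t in grid]

# Location of the grid maximum and the height of the IRF at beta.
peak_theta = grid[max(range(len(grid)), key=lambda i: probs[i])]
max_prob = slm_irf(beta, beta, gamma)
```

Under these assumed values, `peak_theta` coincides with `beta`, and the IRF takes equal values at points equidistant from `beta`, reflecting the symmetry noted in the text.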
Unlike their monotone counterparts, nonparametric unfolding models have not been studied in great detail. This paper focuses on nonparametric unfolding response models. Section 2 discusses the main results currently available for nonparametric unfolding models, concentrating on the approach of Post (1992). Section 3 presents the major results of this paper: Section 3.1 provides a rigorous justification for a class of nonparametric estimators of person locations which requires knowledge of the rank order of the items (the estimator suggested by Thurstone (1928) is a member of this class of estimators); Section 3.2 develops a consistent estimator of item location ranks, to allow for data-based estimation of the item weights; and Section 3.3 discusses the use of a nonparametric regression estimator of IRFs using the estimates of the person locations. Section 4 performs a series of simulation studies to illustrate the application of these nonparametric estimation procedures.

2. Definition of the Nonparametric Unfolding Model

2.1. Core Assumptions for Item Response Models

Post (1992) develops a nonparametric definition of the dichotomous unidimensional latent trait unfolding model. To facilitate the discussion of Post’s approach, assume that the responses to a set of $J$ stimuli are observed for individuals $i = 1, \ldots, N$. The random variables $X_1, \ldots, X_J$ are indicators of whether or not the subject endorses the item ($X_j = 1$ indicates the subject endorses the stimulus). Post’s three primary assumptions are:

[A1] There exists a latent trait $\theta \in \mathbb{R}$ such that every subject has a position on this latent scale.
[A2] Given the latent trait (attitude) $\theta$, responses to the set of $J$ stimuli are independent; mathematically, $\Pr\{X_1 = x_1, X_2 = x_2, \ldots, X_J = x_J \mid \theta\} = \prod_j \Pr\{X_j = x_j \mid \theta\}$.
[A3] For all items $j = 1, \ldots, J$, the item response function (IRF) $P_j(\theta)$, the probability that a subject responds positively to item $j$, is weakly unimodal (i.e., there exists $\beta_j$ where $P_j(\cdot)$ attains its maximum, $P_j$ is nondecreasing for $\theta < \beta_j$, and nonincreasing for $\theta > \beta_j$).

Assumptions [A1] and [A2] are also common assumptions for monotone IRT models. Assumption [A3] allows us to order the items according to their location $\beta_j$. Although the definition for $\beta_j$ in [A3] does not ensure uniqueness of $\beta_j$, redefining $\beta_j$ as the midpoint of all values where $P_j(\theta)$ attains its maximum (i.e., $\beta_j = \mathrm{midpoint}\{t : P_j(t) = \max_{\theta} P_j(\theta)\}$) does ensure uniqueness, and this paper shall use this midpoint definition hereafter. For the remainder of the paper assume that the items are ordered $1, \ldots, J$ such that $\beta_1 \le \beta_2 \le \cdots \le \beta_J$.

Assumption [A3] only states that the IRFs must be unimodal and makes no further assumptions about the shape of, or relationship between, IRFs. Luo (1998) develops a class of unfolding models based on a generalization of distance models.
The generalization states that unfolding models should be of the form $P_j(\theta) = \psi(\delta_j)/[\psi(\delta_j) + \psi(\theta - \beta_j)]$, where the function $\psi(\cdot)$ is: (a) nonnegative; (b) monotone increasing in the positive domain; and (c) symmetric about the origin. Although distance models, or location family IRFs, are popular in the literature, [A3] does not require the response functions to be a location family; that is, $P_{j_1}(\theta)$ is not necessarily a translation of $P_{j_2}(\theta)$. Furthermore, Luo’s definition of the unfolding model implies that the maximum of the IRF is at least 1/2, which is not a necessary condition. The following section develops a more inclusive class of unfolding response models by generalizing the method by which Andrich and Luo (1993) and Verhelst and Verstralen (1993) developed the hyperbolic cosine model (HCM).

2.2. Nonparametric Latent Response Item Response Functions

Andrich and Luo (1993) and Verhelst and Verstralen (1993) independently developed an unfolding response model with the aid of a latent response model (LRM) (Maris, 1995). The

formulation assumes that each respondent has a one-dimensional trichotomous latent response, denoted $\xi_j$, where $\xi_j = 0$ indicates “Disagree Below,” $\xi_j = 1$ indicates “Agree,” and $\xi_j = 2$ indicates “Disagree Above.” The observed responses are then simply a mapping of these responses. A latent response of “Agree” maps to the observed response “Agree” and the two disagree latent responses map to the one observable “Disagree” category. That is, $X_j = I_{\{\xi_j = 1\}}$, where

$$I_{\{u = v\}} = \begin{cases} 1 & \text{if } u = v, \\ 0 & \text{if } u \neq v, \end{cases}$$

is an indicator function. Maris (1995) calls this the collapsing condensation rule. Johnson and Junker (2003) generalize the rule through importance sampling to facilitate computation when estimating parametric unfolding models.

If the latent responses were observed, rather than unobserved, the researcher would have a large number of models from standard IRT with which to measure the latent attitude $\theta$. Let $R_{jk}(\theta) = \Pr\{\xi_j = k \mid \theta\}$ for $k \in \{0, 1, 2\}$ denote the category response functions (CRFs) of a three-category monotone response model (i.e., $R_{j0}(\theta)$ is monotone decreasing and $R_{j2}(\theta)$ is monotone increasing). Then the IRF for the observed unfolding item is simply the middle CRF, $P_j(\theta) = R_{j1}(\theta)$. Verhelst and Verstralen (1993) and Andrich and Luo (1993) developed the hyperbolic cosine model (HCM) by assuming the partial credit model (PCM) (Masters, 1982) for the latent responses. The resulting model parametrizes the item response model with the equation

$$P_j(\theta) = \frac{\exp(\gamma_j)}{\exp(\gamma_j) + 2\cosh(\theta - \beta_j)}. \tag{1}$$

Under the HCM, the maximal endorsement probability is $P_j(\beta_j) = 1/[1 + 2\exp(-\gamma_j)]$. Any monotone response model can be used with the collapsing condensation rule to produce a dichotomous unfolding model. However, care must be taken when selecting a monotone response model for the latent response $\xi_j$, because, as the following example from Johnson (2001) shows, monotonicity of the latent response model does not guarantee unimodality of the dichotomous unfolding model.

Example 1. (Multimodal Response Function Produced by LRM) The CRFs

$$R_{j0}(\theta) = \Pr(\xi_j = 0 \mid \theta) = \frac{1}{1 + \exp\{(\theta + 0.5)^3\}}$$

and

$$R_{j2}(\theta) = \Pr(\xi_j = 2 \mid \theta) = \frac{1}{1 + \exp\{-(\theta - 0.5)^3\}}$$

are monotone decreasing and monotone increasing, respectively, thus defining a monotone item response model for trichotomous responses. The model derived by application of a latent response model with the collapsing condensation rule on these CRFs does not have a unimodal IRF (see Figure 1).

It is easy to show (Johnson, 2001) that a sufficient condition for unimodality of the unfolding IRF defined by applying the collapsing condensation rule to a symmetric monotone response model (i.e., $R_{j0}(\beta_j - t) = R_{j2}(\beta_j + t)$) is that $R'_{j0}(t) < R'_{j0}(2\beta_j - t)$ for all $t < \beta_j$, where the prime denotes the derivative.

2.3. More Restrictive Assumptions for the Unfolding Model

The final two properties of the unfolding model required by Post (1992) involve the posterior distribution of the latent attitude given the response to a single stimulus. To facilitate the discussion of these assumptions assume that the items are ordered in such a way that $\beta_1 \le \beta_2 \le \cdots \le \beta_J$.

[Figure 1: plot of $P(\theta)$ (vertical axis, 0.0 to 1.0) against $\theta$ (horizontal axis); legend entries “X = ξ = 1”, “ξ = 0”, and “ξ = 2”.]

FIGURE 1. Example of a bimodal IRF, from Example 1, produced by assuming a monotone item response model for latent responses and assuming the collapsing condensation rule to relate observed responses to latent responses.
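The bimodality claimed in Example 1 can be checked numerically. The sketch below evaluates the middle category response function $1 - R_{j0}(\theta) - R_{j2}(\theta)$ obtained from the collapsing condensation rule on a grid and records its interior local maxima. The grid range $[-3, 3]$, the step size, and the function names are assumptions made for illustration only.

```python
import math


def r0(theta):
    """CRF for the latent response 'Disagree Below' (decreasing in theta)."""
    return 1.0 / (1.0 + math.exp((theta + 0.5) ** 3))


def r2(theta):
    """CRF for the latent response 'Disagree Above' (increasing in theta)."""
    return 1.0 / (1.0 + math.exp(-((theta - 0.5) ** 3)))


def irf(theta):
    """Observed-response IRF under the collapsing rule: Pr(xi = 1 | theta)."""
    return 1.0 - r0(theta) - r2(theta)


# Assumed evaluation grid on [-3, 3] with step 0.01.
grid = [-3.0 + 0.01 * k for k in range(601)]
values = [irf(t) for t in grid]

# Grid points whose value strictly exceeds both neighbours (interior local maxima).
modes = [grid[i] for i in range(1, len(grid) - 1)
         if values[i] > values[i - 1] and values[i] > values[i + 1]]
```

On this grid the IRF has one mode on each side of $\theta = 0$ with a dip between them, which is the qualitative shape described in the caption of Figure 1.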

Suppose item $j$ is located below item $k$ on the attitude scale (i.e., $\beta_j < \beta_k$). The first of Post’s additional assumptions is based on the idea that individuals endorsing the item located higher on the attitude scale ($X_k = 1$) should be more likely to have a large $\theta$-value than those individuals who endorse the lower item. For example, if a subject endorses a political statement that is considered conservative, this subject is more likely to be a conservative than a respondent who endorsed a liberal statement. The assumption is a stochastic ordering assumption (Casella & Berger, 1990, p. 411) made on the posterior distribution of the latent trait for any prior distribution of the latent attitude $\theta$. Formally, [A4] states that the latent variable is stochastically ordered by the location of single responses $X_j$:

[A4] For any probability distribution $G(\theta)$ of latent trait values, and any value $\theta_0$ on the latent scale, $P_G(\theta > \theta_0 \mid X_j = 1)$ is nondecreasing in $j$, for all $j$ such that $P(X_j = 1) > 0$.

Assumption [A4] is an item ordering assumption somewhat analogous to the invariant item ordering assumption for monotone IRT models (Rosenbaum, 1987; Sijtsma & Junker, 1996). Stated differently, assumption [A4] assumes that the posterior distribution of $\theta$ given a positive response to an item located at $\beta$ is stochastically ordered by the location $\beta$. Assuming [A1] and [A2], Post shows that assumption [A4] is equivalent to two properties of the IRFs. The first property equivalent to [A4] states that the posterior densities of $\theta$ given endorsement of a single item have a monotone likelihood ratio (MLR) in $\theta$,

$$\forall\, j < k, \quad \frac{g(\theta \mid X_k = 1)}{g(\theta \mid X_j = 1)} \ \text{ is nondecreasing in } \theta. \tag{2}$$


The second property, called the monotone traceline ratio (MTR) (traceline is used interchangeably with IRF), states

$$\forall\, j < k, \quad \frac{\Pr\{X_k = 1 \mid \theta\}}{\Pr\{X_j = 1 \mid \theta\}} \ \text{ is nondecreasing in } \theta. \tag{3}$$

It is easy to see that the properties MLR and MTR are equivalent even for models that do not satisfy assumptions [A1] and [A2]. Johnson (2001) shows that the HCM in (1) with item units equal across items (i.e., $\gamma_1 = \gamma_2 = \cdots = \gamma_J$) satisfies MTR and hence satisfies [A4]. Although there is some intuitive justification for assumption [A4], and some popular unfolding models satisfy [A4], not all unfolding models satisfy the assumption. In fact, as Johnson (2001) shows, the PARELLA model (Hoijtink & Molenaar, 1992) and the HCM with varying item units need not satisfy MTR.

The final property required by Post is specific to unfolding response models, and does not carry over to monotone items. The assumption again is made on the posterior distribution of the latent trait given the endorsement of a single item. The assumption is based on the following. Let $\Theta = (\theta_1, \theta_2) \subset \mathbb{R}$ be an open interval on the latent scale. There is some item located in, or near, $\Theta$ such that agreement with the item increases the posterior probability that this subject’s latent attitude $\theta$ is in the interval. Conversely, the probability decreases as the subject endorses items further and further away from this item. If the items were developed to study the political liberal–conservative scale, then an interval could be created to represent the political moderate range. The assumption states that there is some item that best describes the moderates’ political views. There is a higher probability that an individual who agrees with this item is a moderate than that an individual who agrees with a more liberal or conservative statement is a moderate. Formally, the assumption states:

[A5] For any distribution $G(\theta)$ of latent trait values, and for all $\theta_1 < \theta_2$, the posterior probability $P_G(\theta_1 < \theta < \theta_2 \mid X_j = 1)$ is weakly unimodal in $j$ (recall $\beta_j < \beta_k$ for all $j < k$).

Post (Theorem 2, p.
49) shows that assumption [A4], which is equivalent to the MTR of the IRFs $P_j(\theta)$, is also equivalent to total positivity of order 2 (TP$_2$) (Karlin, 1968); and (Theorem 3, p. 49) if the response functions satisfy $P_j(\theta) > 0$ for all $j$, then [A4] and [A5] hold if and only if the response functions satisfy total positivity of order 3 (TP$_3$) (Karlin, 1968).

3. Results for the Nonparametric Unfolding Model

In addition to being somewhat intuitive assumptions, [A1]–[A5] allow for the derivation of important results. Using assumptions [A1], [A2], and [A3] and a relaxation of [A4], Section 3.1 proves that a class of estimators consistently estimates the rank ordering of respondents on the latent scale. Section 3.2 shows that, under assumptions [A1]–[A5], we can consistently estimate the rank order of item locations. Combining these two results, Section 3.3 develops a practical nonparametric estimation methodology for dichotomous unfolding response models.

3.1. Nonparametric Estimation of Respondents’ Locations

3.1.1. Definition and background of the Thurstone estimator. The Thurstone estimator was first used for the measurement of attitudes by Thurstone (1927, 1928). The estimator, denoted $T_J^{\beta}(\mathbf{X}_i)$, requires that the vector $\beta = (\beta_1, \beta_2, \ldots, \beta_J)^t$ of item locations is known. The Thurstone estimator for respondent $i$ is

$$T_J^{\beta}(\mathbf{X}_i) = \frac{\sum_{j=1}^{J} \beta_j X_{ij}}{\sum_{j=1}^{J} X_{ij}}. \tag{4}$$


Notice that the Thurstone estimator is contained within the range of item locations, $T_J^{\beta}(\mathbf{X}) \in [\beta_1, \beta_J]$, for all respondents endorsing at least one item. The items provide no information about respondents who endorse none of the $J$ items, and the Thurstone estimator is undefined in that case. A number of authors have used the Thurstone estimator to approximate the respondents’ locations on the latent scale. Andrich and Luo (1993), for example, first find a starting value for the vector of item locations, and then use those starting values to calculate the Thurstone estimator for each person in the sample. The Thurstone scores are then used as the starting values in the joint maximum likelihood procedure for estimating the parameters of the HCM.

In many instances the locations of the items are unknown, but the ordering of items is known. In such cases van Schuur (1988) suggests replacing the item locations $\beta_j$ in (4) with the item ranks $r_j = \sum_k I_{\{\beta_k \le \beta_j\}} \in \{1, \ldots, J\}$. In order to keep the estimator bounded, replace the item ranks with the item quantiles $q_j = r_j/J$, in which case the rank-based Thurstone estimator is defined by

$$T_J^{q}(\mathbf{X}_i) = \begin{cases} \dfrac{\sum_{j=1}^{J} q_j X_{ij}}{\sum_{j=1}^{J} X_{ij}} & \text{if } \sum_{j=1}^{J} X_{ij} > 0, \\[2ex] \text{undefined} & \text{if } \sum_{j=1}^{J} X_{ij} = 0. \end{cases} \tag{5}$$
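The two estimators in (4) and (5) differ only in the item scores that are averaged, so a single helper covers both. The following Python sketch is illustrative: the function names, the item locations, and the response pattern are assumptions, and `None` is used here to stand for “undefined.”

```python
def thurstone_score(responses, scores):
    """Average item score over endorsed items; None when no item is endorsed."""
    endorsed = sum(responses)
    if endorsed == 0:
        return None  # undefined: the respondent endorsed nothing
    return sum(a * x for a, x in zip(scores, responses)) / endorsed


def item_quantiles(locations):
    """Item quantiles q_j = r_j / J, with rank r_j = #{k : beta_k <= beta_j}."""
    J = len(locations)
    return [sum(1 for b_k in locations if b_k <= b_j) / J for b_j in locations]


# Hypothetical item locations and one respondent's 0/1 endorsements.
beta = [-2.0, -1.0, 0.0, 1.5, 3.0]
x = [0, 1, 1, 0, 0]

t_beta = thurstone_score(x, beta)   # location-based estimator, as in (4)
q = item_quantiles(beta)            # ranks 1..5 rescaled to 0.2, 0.4, ..., 1.0
t_q = thurstone_score(x, q)         # rank-based estimator, as in (5)
```

For this response pattern the location-based score is the mean of the two endorsed locations, $-0.5$, while the rank-based score is the mean of the endorsed quantiles, $0.5$, which lies in the unit interval rather than in $[\beta_1, \beta_J]$.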

This estimator does not necessarily give an estimate that is within the range of the item locations; rather, the estimator is contained within the unit interval. The Thurstone estimator is undefined when a respondent does not endorse any item, which is sensible because we do not know at which end of the attitude scale such individuals are likely located. If the items are measuring political conservatism, then we do not know whether an individual refuses to endorse the items because they are all too conservative, or because they are all too liberal.

3.1.2. Justification for the rank-based Thurstone estimator. One way to conceptualize the Thurstone estimator is to think of it in terms of the latent response formulation of the unfolding model discussed in Section 2.2. If the latent responses $\xi_j$ were observed for all $j = 1, \ldots, J$, then the rank order of $\theta$ would be consistently estimated with the sum score $(1/J)S = (1/J)\sum_j \xi_j$ (Stout, 1990; Junker, 1991). In essence, Stout’s result, and Junker’s modification of it for polytomous items, are based on the fact that

$$\frac{1}{J} S \to \frac{1}{J} \sum_{j,k} k \Pr\{\xi_j = k \mid \theta\} = \frac{1}{J} \sum_{j=1}^{J} \sum_{k=0}^{2} k R_{jk}(\theta)$$

in probability as $J \to \infty$, and that the right-hand side of the equation above is a monotone increasing function of $\theta$. Now because the values of the latent responses $\xi$ are unknown, they must be estimated with manifest quantities. The first step in estimating the latent responses is to note $\xi_j = 1$ if and only if $X_j = 1$, and so we can write

$$\frac{1}{J} S = \frac{1}{J} \sum_j \left[X_j + 2 I_{\{\xi_j = 2\}}\right].$$

Estimate the remaining latent quantity $I_{\{\xi_j = 2\}}$ by noting

$$\begin{aligned} E[I_{\{\xi_j = 2\}} \mid \theta] &= \Pr\{\xi_j = 2 \mid \theta\} \\ &= \Pr\{\xi_j = 2 \mid X_j = 1, \theta\} P_j(\theta) + (1 - P_j(\theta)) \Pr\{\xi_j = 2 \mid X_j = 0, \theta\} \\ &= E(1 - X_j \mid \theta) \Pr\{\xi_j = 2 \mid X_j = 0, \theta\}. \end{aligned} \tag{6}$$

The first term in the second line vanishes because $\xi_j = 2$ implies $X_j = 0$.


Now replace $E(1 - X_j \mid \theta)$ with the observed quantity $1 - X_j$ and approximate $\Pr\{\xi_j = 2 \mid X_j = 0, \theta\}$ with the proportion of endorsed items which lie above item $j$,

$$\Pr\{\xi_j = 2 \mid X_j = 0, \theta\} \approx \begin{cases} \dfrac{\sum_{\ell > j} X_\ell}{\sum_{\ell=1}^{J} X_\ell} & \text{if } \sum_{\ell=1}^{J} X_\ell > 0, \\[2ex] \text{undefined} & \text{otherwise.} \end{cases} \tag{7}$$

Combine these approximations to obtain the following estimator for the rank order of $\theta$ when $\sum_{\ell=1}^{J} X_\ell > 0$,

$$\frac{1}{J} S \approx \frac{1}{J} \sum_{j=1}^{J} \left[ X_j + 2(1 - X_j) \frac{\sum_{\ell > j} X_\ell}{\sum_{\ell=1}^{J} X_\ell} \right].$$

Johnson (2004) shows that the quantity on the right-hand side of the equation above is equivalent to $2T_J^{q}(\mathbf{X}) - 1/J$.

3.1.3. A general definition of the Thurstone estimator. In order to generalize the two estimators in (4) and (5) the following definition is needed.

Definition 1. The scoring scheme $a$ is an ordered scoring scheme if $a_1 \le a_2 \le \cdots \le a_J$ implies $\beta_1 \le \beta_2 \le \cdots \le \beta_J$. That is, the scores $a_j$ are ordered in the same way as the item locations $\beta_j$.

Define the Thurstone score by replacing the item location $\beta_j$ in (4) with the score $a_j$ (this is analogous to Junker’s (1991) definition of observed score for monotone item responses). That is, the Thurstone score for ordered score vector $a$ is

$$T_J^{a}(\mathbf{X}_i) = \begin{cases} \dfrac{\sum_{j=1}^{J} a_j X_{ij}}{\sum_{j=1}^{J} X_{ij}} & \text{if } \sum_j X_{ij} > 0, \\[2ex] \text{undefined} & \text{if } \sum_j X_{ij} = 0. \end{cases} \tag{8}$$

The Thurstone score is within the range $[a_1, a_J]$ of the scores as long as the subject agrees with at least one item, and is undefined otherwise.

3.1.4. Monotonicity of the true Thurstone score. Define the true Thurstone score, the population version of the Thurstone score, by replacing the observed responses $X_{ij}$ with their expected values. The true Thurstone score is therefore defined by

$$T_J^{a}(\theta) \equiv \frac{\sum_{j=1}^{J} a_j P_j(\theta)}{\sum_{j=1}^{J} P_j(\theta)}. \tag{9}$$

If the true Thurstone score is a monotone function, then the latent trait $\theta$ can be estimated with $\hat{\theta} = [T_J^{a}]^{-1}(\tilde{T})$, where $\tilde{T}$ is some estimate of the true Thurstone score and $\theta = [T_J^{a}]^{-1}(t)$ is the inverse of the true Thurstone score function $t = T_J^{a}(\theta)$, a function of $\theta$.
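The equivalence stated in Section 3.1.2, between the approximation to $(1/J)S$ built from (6) and (7) and the quantity $2T_J^{q}(\mathbf{X}) - 1/J$, can be verified by brute force for a small number of items. The Python sketch below enumerates every response pattern with at least one endorsement; it assumes distinct item locations, so that $q_j = j/J$, and the choice $J = 6$ and the function names are illustrative only.

```python
from itertools import product


def rank_thurstone(x):
    """Rank-based Thurstone score with quantiles q_j = j / J (distinct locations assumed)."""
    J, n = len(x), sum(x)
    return sum((j + 1) / J * x[j] for j in range(J)) / n


def approx_latent_sum(x):
    """(1/J) * sum_j [X_j + 2 (1 - X_j) * (endorsed items above j) / (endorsed items)]."""
    J, n = len(x), sum(x)
    total = 0.0
    for j in range(J):
        above = sum(x[j + 1:])          # number of endorsed items located above item j
        total += x[j] + 2 * (1 - x[j]) * above / n
    return total / J


J = 6  # assumed small test length, so that all 2^J patterns can be enumerated
max_gap = 0.0
for x in product([0, 1], repeat=J):
    if sum(x) == 0:
        continue  # both quantities are undefined when nothing is endorsed
    gap = abs(approx_latent_sum(x) - (2 * rank_thurstone(x) - 1 / J))
    max_gap = max(max_gap, gap)
```

The largest discrepancy `max_gap` over all patterns is zero up to floating-point rounding, so ordering respondents by the approximate latent sum score is the same as ordering them by the rank-based Thurstone score.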
To demonstrate that the true Thurstone score in (9) is in fact monotone under assumptions [A1], [A2], and [A3] consider the following. Suppose a subject is given $J$ unfolding items and asked to choose the one item he or she agrees with most, rather than being allowed to choose all of the items he or she wishes to endorse. Let $Y$ denote this response. One might expect the probability of choosing item $j$ to be

$$\Pr\{Y = j \mid \theta\} = P_j^{*}(\theta) \equiv \frac{P_j(\theta)}{\sum_{k=1}^{J} P_k(\theta)}. \tag{10}$$


In effect, the $J$-item direct response unfolding model is converted into a single $J$-category forced-choice polytomous item. The remainder of the paper refers to the item response model defined by $P_j^{*}(\theta)$ above as the associated forced-choice model (AFCM). If one subject was located above another on the latent scale, then the first subject would be more likely to choose items located high on the scale than would the second. That is, if items are ordered from low to high scale values, the hypothetical $J$-category polytomous forced-choice item is assumed to be a monotone item where $P_j^{*}(\theta)$ is an item-category response function (ICRF), and

$$Q_k(\theta) \equiv P(Y \ge k \mid \theta) = \sum_{j=k}^{J} P_j^{*}(\theta) \ \text{ is nondecreasing in } \theta \tag{11}$$

for all $k = 1, \ldots, J$ [$Q_1(\theta) \equiv 1$]. Hemker et al. (1997) call the class of polytomous item response models that satisfy the monotonicity condition in (11) nonparametric graded response models (np-GRMs). The following lemma states that if the AFCM is a monotone response model (i.e., the AFCM is an np-GRM), then the true Thurstone score is a nondecreasing function of $\theta$. The result is used to prove the consistency of the Thurstone estimator for subjects’ latent attitudes.

Lemma 1. If the AFCM for the unfolding response functions $P_j(\theta)$ is an np-GRM, then the true Thurstone score with ordered scoring scheme $a$, $T_J^{a}(\theta) = \sum_j a_j P_j(\theta) / \sum_k P_k(\theta)$, is nondecreasing in $\theta$.

Proof. By defining $a_0 = 0$ we have

$$\begin{aligned} T_J^{a}(\theta) &= \sum_{j=1}^{J} \frac{a_j P_j(\theta)}{\sum_{k=1}^{J} P_k(\theta)} \\ &= \sum_{j=1}^{J} a_j P_j^{*}(\theta) \\ &= \sum_{j=1}^{J} (a_j - a_{j-1}) Q_j(\theta) \\ &= a_1 + \sum_{j=2}^{J} (a_j - a_{j-1}) Q_j(\theta). \end{aligned}$$

Because each $Q_j(\theta)$ is nondecreasing and $a_j - a_{j-1} \ge 0$ for all $j = 2, \ldots, J$, $T_J^{a}(\theta)$ is clearly nondecreasing in $\theta$.

Should the AFCM really be expected to be an np-GRM? Recall Post’s (1992) assumption [A4], reviewed in Section 2.3 above, which asserts that the posterior probability $\Pr\{\theta > \theta_0 \mid X_j = 1\}$ is nondecreasing in $j$ for all prior distributions $G(\theta)$ of latent attitudes. As stated in Section 2.3, this condition is equivalent to the MTR. The next lemma shows that the AFCM constructed from unfolding items exhibiting MTR (according to (10)) is indeed a monotone response item satisfying the monotonicity condition in (11).

Lemma 2. If the set of unfolding response functions defined by $P_j(\theta)$ exhibits an MTR, then the $J$-category AFCM defined in (10) is a monotone response model.


Proof. Let $Y$ be a response such that $\Pr\{Y = k \mid \theta\} = P_k^{*}(\theta)$. Now note that

$$\Pr\{Y = j \mid \theta;\ Y = j \text{ or } Y = j - 1\} = \frac{P_j^{*}(\theta)}{P_j^{*}(\theta) + P_{j-1}^{*}(\theta)} = \frac{P_j(\theta)}{P_j(\theta) + P_{j-1}(\theta)} \tag{12}$$

is nondecreasing in $\theta$ because by the MTR assumption $P_j(\theta)/P_{j-1}(\theta)$ is assumed to be nondecreasing in $\theta$. Hemker et al. (1997) call the class of polytomous item response models, where the adjacent category odds ratio in (12) is nondecreasing in $\theta$, nonparametric partial credit models (np-PCMs) and show that np-PCMs are a subclass of np-GRMs. Hence the AFCM is an np-PCM and therefore also an np-GRM.

Because the np-PCM is a special case of the np-GRM, assumption [A4] is a stronger assumption than is necessary for the result stated in Theorem 1. Let [A4′] be the assumption that the AFCM is an np-GRM.

3.1.5. Convergence of the sample Thurstone score to the true Thurstone score. Because the true Thurstone score is a nondecreasing function of $\theta$, anything that orders the true Thurstone scores also orders the corresponding latent attitudes $\theta$. So all that is needed is a good estimate of the true Thurstone score. A natural choice is the observed Thurstone score $T_J^{a}(\mathbf{X})$. This section proves that the difference between the Thurstone estimator $T_J^{a}(\mathbf{X})$ defined in (8) and the true Thurstone score $T_J^{a}(\theta)$ defined in (9) becomes negligible in probability as $J \to \infty$.

As in Junker (1991) assume there exists a triangular array of item sets of growing length $J$ designed to study the latent trait $\theta$. Let $\mathbf{X}^{(J)}$ denote the set of length $J$. Thus the sequence of item sets is

$$\begin{aligned} \mathbf{X}^{(1)} &= (X_1^{(1)}), \\ \mathbf{X}^{(2)} &= (X_1^{(2)}, X_2^{(2)}), \\ &\ \ \vdots \\ \mathbf{X}^{(J)} &= (X_1^{(J)}, X_2^{(J)}, \ldots, X_J^{(J)}), \\ &\ \ \vdots \end{aligned} \tag{13}$$

The set of items of length $J - 1$ need not be a subset of the set of items of length $J$. In addition to the items changing, the item scoring scheme might also change. Let $a_J = (a_{1J}, \ldots, a_{JJ})$ denote the ordered scoring scheme for the set of items of length $J$ and assume that the scoring scheme is bounded above by $M < \infty$ for all $J$ (i.e., $a_{JJ} < M < \infty$ for all $J$). In order to ensure that the true Thurstone score is well behaved, we must assume that each IRF is bounded below by some function. Specifically, assume:

[A6] For each row $J = 1, 2, \ldots$ of the triangular array in (13), there is a positive function of $\theta$ that bounds each traceline from below. That is, there exists a function $c_J(\cdot)$ such that $P_{jJ}(\theta) = \Pr\{X_j^{(J)} = 1 \mid \theta\} > c_J(\theta) > 0$ for each $j = 1, \ldots, J$.

To simplify notation, let $T_J^{a}(\mathbf{X})$ denote the Thurstone score calculated from the $J$th row of the triangular array in (13) (i.e., $T_J^{a}(\mathbf{X}) \equiv T_J^{a_J}(\mathbf{X}^{(J)})$) and, similarly, let $T_J^{a}(\theta)$ denote the true Thurstone score calculated from the $J$th row of items in (13).
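Lemmas 1 and 2 can be illustrated numerically for a specific parametric family. The sketch below assumes HCM items as in (1) with a common item unit, a case the text (citing Johnson, 2001) reports as satisfying MTR, and checks on a grid that the cumulative forced-choice probabilities $Q_k(\theta)$ in (11) and the true Thurstone score in (9) are nondecreasing. The item locations, the scores, the grid, and the numerical tolerance are illustrative assumptions.

```python
import math


def hcm_irf(theta, beta, gamma):
    """HCM endorsement probability from equation (1)."""
    return math.exp(gamma) / (math.exp(gamma) + 2.0 * math.cosh(theta - beta))


def afcm_probs(theta, betas, gamma):
    """Forced-choice probabilities P*_j(theta) from equation (10)."""
    p = [hcm_irf(theta, b, gamma) for b in betas]
    total = sum(p)
    return [pj / total for pj in p]


def cumulative_probs(theta, betas, gamma):
    """Q_k(theta) = P(Y >= k | theta) for k = 1..J, as in equation (11)."""
    p_star = afcm_probs(theta, betas, gamma)
    return [sum(p_star[k:]) for k in range(len(p_star))]


def true_thurstone(theta, betas, gamma, scores):
    """True Thurstone score from equation (9)."""
    p = [hcm_irf(theta, b, gamma) for b in betas]
    return sum(a * pj for a, pj in zip(scores, p)) / sum(p)


# Hypothetical ordered items with a common item unit and ordered scores.
betas = [-2.0, -0.5, 0.0, 1.0, 2.5]
gamma = 1.0
scores = [1, 2, 3, 4, 5]

grid = [-6.0 + 0.05 * k for k in range(241)]
q_rows = [cumulative_probs(t, betas, gamma) for t in grid]
t_vals = [true_thurstone(t, betas, gamma, scores) for t in grid]

tol = 1e-12  # slack for floating-point rounding
q_monotone = all(q_rows[i + 1][k] >= q_rows[i][k] - tol
                 for i in range(len(grid) - 1) for k in range(len(betas)))
t_monotone = all(t_vals[i + 1] >= t_vals[i] - tol for i in range(len(grid) - 1))
```

A grid check of this kind is only a sanity check for one parameter setting; it does not replace the proofs, and it says nothing about models that violate MTR.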


Theorem 1. If assumptions [A1], [A2], and [A6] hold, and $a_{jJ} < M < \infty$ for all $J = 1, 2, \ldots$ and $j = 1, \ldots, J$ (the scoring scheme need not be ordered), then conditional on $\theta$, $T_J^{a}(\mathbf{X})$ approaches $T_J^{a}(\theta)$ in probability. That is, for every $\epsilon > 0$,

$$\lim_{J \to \infty} \Pr\{\, |T_J^{a}(\mathbf{X}) - T_J^{a}(\theta)| > \epsilon \mid \theta \,\} = 0.$$

Proof. The proof proceeds by first showing that the numerator and denominator of $T_J^{a}(\mathbf{X})$ approach the numerator and denominator of $T_J^{a}(\theta)$, respectively.

• Show $(1/J)\sum_j a_{jJ} X_j^{(J)} - (1/J)\sum_j a_{jJ} P_{jJ}(\theta) \to 0$ in probability. For any $\epsilon > 0$, by Chebyshev’s inequality,

$$\begin{aligned} \Pr\left\{ \left| \frac{1}{J}\sum_j a_{jJ} X_j^{(J)} - \frac{1}{J}\sum_j a_{jJ} P_{jJ}(\theta) \right| > \epsilon \,\middle|\, \theta \right\} &\le \frac{1}{J^2 \epsilon^2} \mathrm{Var}\left( \sum_j a_{jJ} X_j^{(J)} \,\middle|\, \theta \right) \\ &= \frac{1}{J^2 \epsilon^2} \sum_j a_{jJ}^2 P_{jJ}(\theta)[1 - P_{jJ}(\theta)]. \end{aligned}$$

Now because $p(1 - p) \le \frac{1}{4}$ when $p \in (0, 1)$ and $a_{jJ} < M$ for all $j$,

$$\Pr\left\{ \left| \frac{1}{J}\sum_j a_{jJ} X_j^{(J)} - \frac{1}{J}\sum_j a_{jJ} P_{jJ}(\theta) \right| > \epsilon \,\middle|\, \theta \right\} < \frac{M^2}{4J\epsilon^2} \xrightarrow{J \to \infty} 0.$$

So $(1/J)\sum_j a_{jJ} X_j^{(J)} - (1/J)\sum_j a_{jJ} P_{jJ}(\theta) = o_p(1)$.

• Show $(1/J)\sum_j X_j^{(J)} - (1/J)\sum_j P_{jJ}(\theta) \to 0$ in probability. By applying the previous result with the item scores all equal to one ($a_{jJ} = 1$ for all $j = 1, \ldots, J$ and $J = 1, 2, \ldots$) the desired result holds.

Because $(1/J)\sum_j P_{jJ}(\theta) > c_J(\theta) > 0$ by assumption [A6] we have

$$T_J^{a}(\mathbf{X}) - T_J^{a}(\theta) = \frac{(1/J)\sum_j a_{jJ} X_j^{(J)}}{(1/J)\sum_j X_j^{(J)}} - \frac{(1/J)\sum_j a_{jJ} P_{jJ}(\theta)}{(1/J)\sum_j P_{jJ}(\theta)} \xrightarrow{J \to \infty} 0 \ \text{ in probability.} \tag{14}$$

Theorem 1 shows that the difference between the observed and the true Thurstone scores gets vanishingly small in probability as the number of items increases for each $\theta$ on the latent scale. This suggests estimating $\theta$ with $[T_J^{a}]^{-1}(T_J^{a}(\mathbf{X}))$. In order to allow for discontinuities, define the inverse function $[T_J^{a}]^{-1}(u)$ by

$$[T_J^{a}]^{-1}(u) = \inf_{\theta}\{\theta : T_J^{a}(\theta) \ge u\}.$$
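To make the estimator $[T_J^{a}]^{-1}(T_J^{a}(\mathbf{X}))$ concrete, the sketch below approximates the generalized inverse by bisection and applies it to one simulated response vector. Everything specific here is an assumption made for illustration: the items follow the HCM in (1) with a common item unit and known parameters (in practice the IRFs are unknown), the locations are evenly spaced on $[-3, 3]$, the scores are the item quantiles, and the search interval is $[-10, 10]$.

```python
import math
import random


def hcm_irf(theta, beta, gamma):
    """HCM endorsement probability from equation (1)."""
    return math.exp(gamma) / (math.exp(gamma) + 2.0 * math.cosh(theta - beta))


def true_thurstone(theta, betas, scores, gamma):
    """True Thurstone score from equation (9)."""
    probs = [hcm_irf(theta, b, gamma) for b in betas]
    return sum(a * p for a, p in zip(scores, probs)) / sum(probs)


def inverse_true_thurstone(u, betas, scores, gamma, lo=-10.0, hi=10.0, iters=60):
    """Bisection approximation to inf{theta : T(theta) >= u} on [lo, hi].

    Assumes T is nondecreasing on the search interval.
    """
    if true_thurstone(lo, betas, scores, gamma) >= u:
        return lo
    if true_thurstone(hi, betas, scores, gamma) < u:
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if true_thurstone(mid, betas, scores, gamma) >= u:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical test of J items and one respondent located at theta0.
J, gamma, theta0 = 200, 1.0, 0.8
betas = [-3.0 + 6.0 * j / (J - 1) for j in range(J)]
scores = [(j + 1) / J for j in range(J)]

rng = random.Random(0)  # fixed seed so the simulated responses are reproducible
x = [1 if rng.random() < hcm_irf(theta0, b, gamma) else 0 for b in betas]

t_true = true_thurstone(theta0, betas, scores, gamma)
t_obs = sum(a * xi for a, xi in zip(scores, x)) / sum(x)   # observed score, equation (8)
theta_hat = inverse_true_thurstone(t_obs, betas, scores, gamma)
theta_check = inverse_true_thurstone(t_true, betas, scores, gamma)
```

Inverting the true score at `theta0` returns `theta0` up to the bisection tolerance, while `theta_hat` differs from `theta0` by sampling error that Theorem 1 suggests should shrink as $J$ grows.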

It is important to note that in order for $[T_J^a]^{-1}$ to be well defined it is necessary for the scoring scheme $a_J$ to be ordered for all $J$. The following definition describes a condition under which $[T_J^a]^{-1}(T_J^a(X))$ is a consistent estimator of $\theta$.

Definition 2. Let $T_J^a(\theta)$ be the Thurstone score for ordered scoring scheme $a_J$ for the $J$ items $X_J$. Suppose that for every fixed $\theta_1$ on the latent scale, there exists $\epsilon_{\theta_1} > 0$ and an open neighborhood


$N_{\theta_1}$ such that for all $\theta_2$ in this neighborhood

\[ \frac{T_J^a(\theta_2) - T_J^a(\theta_1)}{\theta_2 - \theta_1} \ge \epsilon_{\theta_1} \quad \text{for all } J. \tag{15} \]

Then the sequence of items $X$ is said to be locally asymptotically discriminating (LAD) with respect to the latent trait $\theta$ and ordered scoring schemes $a_J$.

The definition of LAD is essentially the same as that given for monotone items in Stout (1990) and Junker (1991). The following theorem shows that if the item sequence $X$ is LAD, then $[T_J^a]^{-1}(T_J^a(X))$ consistently estimates the latent trait $\theta$.

Theorem 2. Suppose the sequence of items $X$ is LAD with respect to $\theta$ and ordered scoring scheme $a_J$. Then, conditional on $\theta$,

\[ [T_J^a]^{-1}(T_J^a(X)) \xrightarrow{J \to \infty} \theta \quad \text{in probability.} \tag{16} \]

Proof. The proof follows the proof of Stout (1990, Theorem 3.6) for a similar theorem concerning monotone items. Specific details of the proof are available in Johnson (2001).

The previous theorem provides a large number of consistent estimators for $\theta$, each differing only in the ordered scoring scheme $a$ that is used. However, in order to calculate this consistent estimator both $T_J^a(\theta)$ and $[T_J^a]^{-1}(\theta)$ are needed. This requires knowledge of each IRF $P_j(\theta)$, which is generally unknown. The next section returns to this problem.

3.2. Consistent Estimation of Item Ordering

The previous section provides a method for the consistent estimation of the rank ordering of respondents when the rank order of items is known. In most cases one does not know the true rank ordering of items; hence we require a method for the estimation of the rank ordering of items. This section reviews results due to Post (1992) and uses these results to introduce a method for the consistent estimation of item ordering.

By making assumptions [A1]–[A5], Post is able to make a statement about the conditional adjacency matrix $IP$, defined by $IP_{jk} = \Pr\{X_j = 1 \mid X_k = 1\}$. Post's theorem basically states that the probability an item is endorsed ($X_j = 1$), conditional on another item being endorsed ($X_k = 1$), should be maximal when the conditioning item $X_k$ is adjacent (i.e., $k = j - 1$ or $j + 1$). Formally, Post (1992, pp. 49–50) shows that for all $j$ the vector of conditional probabilities $\Pr\{X_j = 1 \mid X_k = 1\}$ is weakly unimodal in $k \ne j$ (recall that items are assumed to be ordered according to their locations $\beta_j$). In addition, Post (1992) shows that $k_j^* = \arg\max_k \Pr\{X_j = 1 \mid X_k = 1\}$ is a nondecreasing function of $j$, except for possible inversions along the diagonal in $IP$ (i.e., when $k_j^* = j + 1$ and $k_{j+1}^* = j$).

This result has two practical uses. First, the sample version of the conditional adjacency matrix, defined by

\[ \widetilde{IP}_{jk} = \frac{\sum_{i=1}^{N} X_{ij} X_{ik}}{\sum_{i=1}^{N} X_{ik}} \tag{17} \]


can be examined to see if it has the pattern required by the unfolding model. This is used as an exploratory tool to determine whether or not assuming the items are unfolding items is a valid assumption. Second, if the items are unfolding items and the matrix of conditional probabilities $IP$ is known, the items can be ordered along the latent scale with the following algorithm.

Algorithm 1. Because each row $j$ in the matrix $IP$ is weakly unimodal in the columns $k$, the minimum entry in each row occurs at either the highest ranked item ($J$) or the lowest ranked item (1). So,

\[ \arg\min_k IP_{jk} = 1 \text{ or } J \quad \text{for all } j. \]

Furthermore, the minimum entry in the conditional adjacency matrix $IP$ is in the column corresponding to the item that is either lowest or highest on the latent scale.

1. Let $\rho = \min_{j,k} IP_{jk}$ and let $m$ denote the column containing this minimum entry of $IP$: $m = \{k : \Pr\{X_j = 1 \mid X_k = 1\} = \rho;\ j = 1, \ldots, J\}$.
2. Rank the items according to the $m$th row of the conditional adjacency matrix $IP$. Whether the items are ranked from lowest to highest or highest to lowest depends on some directional constraint. This paper utilizes the constraint ($r_1 \le r_J$) by first calculating

\[ r_k = \sum_j I_{\{IP_{jm} \le IP_{km}\}}. \]

If $r_1 > r_J$, then the ranks are transformed, $r_k = J - r_k + 1$. If there is more than one column for which $IP_{jk} = \rho$ (e.g., $m_1$ and $m_2$), then either items $m_1$ and $m_2$ are the lowest and highest ranked items on the latent scale, or items $m_1$ and $m_2$ are tied for the highest (or lowest) ranked items on the survey. When multiple minima occur, this paper takes the average of the rankings based on the multiple columns.

Algorithm 1 demonstrates one method for the rank ordering of items utilizing the conditional adjacency matrix $IP$. However, the matrix $IP$ is unknown, so it is estimated with $\widetilde{IP}$ defined in (17). The next lemma proves that $\widetilde{IP}$ is a consistent estimator of $IP$.

Lemma 3.3. Assume that respondents are randomly sampled from the population $F(\theta)$ (i.e., $\theta \sim F(\theta)$), then the observed conditional adjacency matrix $\widetilde{IP}$ is a consistent estimate of the population conditional adjacency matrix $IP$. That is,

\[ \widetilde{IP}_{jk} \xrightarrow{N \to \infty} IP_{jk} \quad \text{in probability} \tag{18} \]

for all $j, k \in \{1, \ldots, J\}$ such that $\Pr\{X_k = 1\} > 0$.

Proof. Conditional on $\theta$, the vector of latent attitudes, we have, by application of the weak law of large numbers and an elementary result from probability for the limit of a ratio, that

\[ \widetilde{IP}_{jk} = \frac{\sum_i X_{ij} X_{ik}}{\sum_i X_{ik}} \xrightarrow{N \to \infty} \frac{\sum_i E[X_{ij} X_{ik} \mid \theta_i]}{\sum_i E[X_{ik} \mid \theta_i]} \quad \text{in probability} \tag{19} \]

as long as $\sum_i E[X_{ik} \mid \theta_i]$ is bounded away from zero. Now, because the respondents are randomly sampled from the population ($\theta_i \sim F(\theta)$),

\[ \frac{\sum_i E[X_{ij} X_{ik} \mid \theta_i]}{\sum_i E[X_{ik} \mid \theta_i]} \xrightarrow{N \to \infty} \frac{E[X_j X_k]}{E[X_k]} = IP_{jk} \quad \text{in probability} \tag{20} \]


from Monte Carlo integration theory (Ripley, 1987). Substituting (20) into (19) leads to the desired result: $\widetilde{IP}_{jk} \to IP_{jk}$ in probability as $N \to \infty$.

To estimate the rank order of items along the attitude scale, replace the true conditional adjacency matrix $IP$ in Algorithm 1 with its estimate $\widetilde{IP}$. Replacing the true quantiles $q_j$ in the rank-based Thurstone estimator in (5) with their estimated values $\tilde{q}_j = \tilde{r}_j / J$ leads to the following approximation to the Thurstone estimator:

\[ T_J^{\tilde{q}} = \frac{\sum_j \tilde{q}_j X_j}{\sum_j X_j}. \]

Alternatively, the $m$th row of the observed conditional adjacency matrix can be used as an ordered scoring scheme for calculating the Thurstone score, in which case

\[ T_J^{\widetilde{IP}} = \frac{\sum_j \widetilde{IP}_{jm} X_j}{\sum_j X_j}, \tag{21} \]

where $m$ is defined as it is in Algorithm 1.

Although the estimated ranks converge to the true ranks, $\tilde{r}_j \to r_j$, in probability as the number of respondents increases, $N \to \infty$, the true Thurstone score calculated with estimated ranks (or estimated scoring scheme $\widetilde{IP}_m$), $T_J^{\tilde{q}}(\theta)$, is not necessarily a nondecreasing function for fixed $N$. Therefore the rank order of respondents cannot be consistently estimated with the observed Thurstone score when the number of respondents $N$ is small. We do conjecture, however, that if the number of items $J$ increases as the number of individuals $N$ increases, the Thurstone score calculated with estimated ranks will converge to a nondecreasing function, and that estimates of the respondents' locations based on this score will be consistent. Some positive evidence for this conjecture is provided by the examples considered below.

3.3. Regressing Unfolding Responses on the Thurstone Score

Now that we have a consistent estimator of the rank order of the respondents' locations on the latent scale, namely $T_J^a(X)$, one may wish to use it to examine the shape of the response functions $P_j(\theta)$. Theorem 3 proves that the probability of endorsement of a single item given the true Thurstone score $T_J^a(\theta) = t$ is unimodal in $t$.

Theorem 3. For items $j = 1, \ldots, J$ the following is true:

\[ E[X_j \mid T_J^a(\theta) = t] \quad \text{is weakly unimodal in } t. \tag{22} \]

Proof. Lemma 1 states that the true Thurstone score $T_J^a(\theta)$ is a nondecreasing function of $\theta$. Therefore, for any $t_1 \le t_2 \le T_J^a(\beta_j)$, we have

\[ [T_J^a]^{-1}(t_1) = \inf\{u : T_J^a(u) \ge t_1\} \le \inf\{u : T_J^a(u) \ge t_2\} = [T_J^a]^{-1}(t_2) \le \beta_j. \]

By assumption [A3], $E[X_j \mid \theta] = P_j(\theta)$ is a weakly unimodal function of $\theta$, taking its maximum at $\theta = \beta_j$. Therefore,

\[ E[X_j \mid T_J^a(\theta) = t_1] = E[X_j \mid \theta = [T_J^a]^{-1}(t_1)] \le E[X_j \mid \theta = [T_J^a]^{-1}(t_2)] = E[X_j \mid T_J^a(\theta) = t_2]. \tag{23} \]


So $E[X_j \mid T_J^a(\theta) = t]$ is nondecreasing in $t$ for all $t$ less than $T_J^a(\beta_j)$. Similarly, it can be shown that for $T_J^a(\beta_j) \le t_1 \le t_2$, $E[X_j \mid T_J^a(\theta) = t_1] \ge E[X_j \mid T_J^a(\theta) = t_2]$. Hence $E[X_j \mid T_J^a(\theta) = t]$ is a weakly unimodal function of $t$.



Because the expected value of response $X_j$ is a unimodal function of the true Thurstone score $T_J^a(\theta)$, one might expect the probability of endorsement given the observed Thurstone score $T_J^a(X) = t$ to also be unimodal in $t$. If this were the case, then we could examine the shape of the IRF with an estimate of $E[X_j \mid T_J^a(X) = t]$. However, the fact that the probability of endorsement given the true Thurstone score is a unimodal function does not ensure that the probability of endorsement, conditional on the observed Thurstone score, is unimodal. In fact, as the next lemma shows, the probability of endorsing the lowest item on the latent scale, conditional on the observed Thurstone score, is nonincreasing for $T_J^a(X) < a_2$, where $a_2$ is the score associated with the second lowest item.

Lemma 4. The following identities hold for unfolding response models:

\[ E[X_1 \mid T_J^a(X) = t] = 1 \quad \text{for all } t < a_2, \tag{24} \]

\[ E[X_J \mid T_J^a(X) = t] = 1 \quad \text{for all } t > a_{J-1}. \tag{25} \]

Proof. Because

\[ T_J^a(X) = \frac{\sum_j a_j X_j}{\sum_j X_j} \]

is a weighted average of the item scores $a_j$, we know that $T_J^a(X) < a_2$ implies that $X_1 = 1$. Hence, $E[X_1 \mid T_J^a(X) = t] = \Pr\{X_1 = 1 \mid T_J^a(X) = t\} = 1$ for all $t < a_2$. The proof for $X_J$ is similar.
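The argument in Lemma 4 can be checked by brute force for a small survey. The sketch below is an illustration only: it enumerates every nonzero response pattern for a hypothetical ordered scoring scheme and confirms that a score below $a_2$ forces $X_1 = 1$ and a score above $a_{J-1}$ forces $X_J = 1$. The function names and the example scoring schemes are not from the paper.

```python
from fractions import Fraction
from itertools import product

def observed_thurstone(x, a):
    """Average score of the endorsed items, in exact rational arithmetic."""
    return Fraction(sum(aj * xj for aj, xj in zip(a, x)), sum(x))

def check_lemma4(a):
    """True if every nonzero pattern satisfies both implications of Lemma 4."""
    n_items = len(a)
    for x in product((0, 1), repeat=n_items):
        if sum(x) == 0:
            continue                      # score is undefined when nothing is endorsed
        t = observed_thurstone(x, a)
        if t < a[1] and x[0] != 1:        # T < a_2 must force X_1 = 1
            return False
        if t > a[n_items - 2] and x[n_items - 1] != 1:   # T > a_{J-1} must force X_J = 1
            return False
    return True

print(check_lemma4([1, 2, 3, 4, 5, 6]))
```

The check relies only on the score being an average of the endorsed item scores, so it does not depend on any particular item response function.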



So any estimate that uses the observed Thurstone score as the predictor of the item responses will be unable to provide any information about unimodality of the highest and lowest items on the latent scale; the conditional regression for these items is not unimodal.

In monotone response models Junker (1993) showed that the item-rest regression, which calculates a score based on all items except the one under consideration, did in fact have nice properties. A similar approach might examine the expected value of the response to item $j$ conditional on the rest-Thurstone score ($T_{-j}^a(X)$; the Thurstone score calculated using all items except $j$, the one under consideration), that is, we might examine the regression function $\mu_j(t) = E[X_j \mid T_{-j}^a(X) = t]$. Note $T_{-j}^a(X)$ is defined only for the $2^{J-1} - 1$ nonzero response patterns, and hence $\mu_j(t)$ is only defined for those values of $t$.

Figure 2 examines the rest-Thurstone score regression function $\mu_j(t)$ for the fourth ranked item from a set of eight HCM unfolding items; item locations were simulated from a standard normal distribution. The figure plots the values of $\mu_j(t)$ for the 127 possible values of the rest-Thurstone score (points are connected for presentation purposes). If the rest-Thurstone regression function $\mu_j(t)$ were unimodal, then we could examine an estimated version of it using a kernel smoother to examine the properties of the data. However, as Figure 2 demonstrates, the function is not necessarily unimodal, even when the true rank ordering of the items is known. There appears to be an overall unimodal trend, but locally there are jumps and drops in the function. One place that the function depicted in Figure 2 violates unimodality is between the three Thurstone scores 2.25, 2.33, and 2.40; the value of the regression function is 0.61, 0.59, and 0.62 at these three Thurstone scores, respectively. Table 1 examines the rest-response patterns (the response pattern of the remaining seven items) that lead to these rest-Thurstone scores; a space is left between the third and fifth ranked items to show where the fourth item response would be.

[Figure 2: plot of $E(X_k \mid T_{-k}(X))$ (vertical axis) against the rest-Thurstone score $T_{-k}(X)$ (horizontal axis, 0 to 6).]
FIGURE 2. The rest-Thurstone score regression function for a single item from a set of eight HCM unfolding items, with normally distributed item locations.

TABLE 1. The rest-response patterns for three rest-Thurstone scores that produce a nonunimodal pattern in the regression function of the fourth item response given the rest-Thurstone score.

Rest-Thurstone score    Rest-response patterns
2.25                    101 1100
                        110 1010
                        111 0001
2.33                    011 0100
                        100 1100
                        101 0010
                        110 0001
2.40                    111 0110
                        111 1001
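The rest-response patterns in Table 1 can be recovered by enumeration. The sketch below assumes the seven remaining items are scored 0, 1, ..., 6 in rank order; this scoring is inferred from the tabulated scores rather than stated in the text, so treat it as an assumption. The helper name `rest_patterns` is illustrative.

```python
from fractions import Fraction
from itertools import product

def rest_patterns(target, n_rest=7):
    """All rest-response patterns whose rest-Thurstone score equals `target`.

    Assumption: the remaining items carry the scores 0, 1, ..., n_rest - 1.
    """
    scores = range(n_rest)
    hits = []
    for x in product((0, 1), repeat=n_rest):
        if sum(x) == 0:
            continue                                  # score undefined for the zero pattern
        t = Fraction(sum(s * xi for s, xi in zip(scores, x)), sum(x))
        if t == target:
            bits = "".join(map(str, x))
            hits.append(bits[:3] + " " + bits[3:])    # gap marks the omitted fourth item
    return hits

for t in (Fraction(9, 4), Fraction(7, 3), Fraction(12, 5)):
    print(round(float(t), 2), rest_patterns(t))
```

Exact fractions are used so that scores such as 7/3 are matched without floating-point tolerance.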


FIGURE 3. The conditional density (a) and cumulative distribution (b) functions of the latent attitude θ given three values of the rest-Thurstone score T−k (θ ).

The assumptions of the unfolding response model suggest that individuals are more likely to endorse the fourth item if they also endorse items around that item than they would be if they endorse items far from the fourth item. For all response patterns corresponding to rest-Thurstone scores of 2.25 and 2.40, at least one of the third and fifth items has been endorsed. For the rest-Thurstone score of 2.33 the last response pattern for the score does not have either of the third or fifth items endorsed, which may explain the violation of unimodality.

It is also useful to examine the conditional posterior distribution of the latent attitude $\theta$ conditional on the rest-Thurstone score $T_{-k}^a(X)$, because the rest-regression function is defined as

\[ \mu_k(t) = E[X_k \mid T_{-k}(X) = t] = \int_{\theta} E[X_k \mid \theta]\, f(\theta \mid T_{-k}(X))\, d\theta = \int_{\theta} P_k(\theta)\, f(\theta \mid T_{-k}(X))\, d\theta. \]

The more concentrated (in the statistical sense; Bickel and Lehmann, 1979) the posterior density $f(\theta \mid T_{-k}(X))$ is around the location of the fourth item $k = 4$ ($\beta_4 = 0.08$), the larger the value of the regression function $\mu_k(t)$. Figure 3 displays the three density functions for rest-Thurstone scores 2.25, 2.33, and 2.40 in panel (a) and the cumulative distribution function of $|\theta - \beta_4|$, $F(t \mid T_{-k}(X)) = P(|\theta - \beta_4| < t \mid T_{-k}(X))$, in panel (b). It is clear from the figure that the conditional distributions of $\theta$ given rest-Thurstone scores of 2.25 and 2.40 are more concentrated around 0.08 than is the distribution corresponding to a score of 2.33. This is especially evident in panel (b), where $F(t \mid T_{-4}(X) = 2.4) > F(t \mid T_{-4}(X) = 2.25) > F(t \mid T_{-4}(X) = 2.33)$ for all values of $t$, which shows that the conditional distribution of $\theta$ given a score of 2.40 is more concentrated around $\beta_4 = 0.08$ than given a score of 2.25, which is in turn more concentrated than the distribution given a score of 2.33.
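The rest-regression function $\mu_k(t)$ can be computed numerically by summing over rest-response patterns and integrating over $\theta$. The following sketch is not the paper's code and rests on several assumptions: the HCM form $e^{\gamma}/(e^{\gamma} + 2\cosh(\theta - \beta_j))$ with $\gamma = \log(4)$, a standard normal population for $\theta$ approximated on a grid, rest items scored 0, 1, ..., 6, and a hypothetical vector of eight item locations (only $\beta_4 = 0.08$ is given in the text). Because the locations are hypothetical, the output need not reproduce the values plotted in Figure 2.

```python
import numpy as np
from itertools import product

def hcm_irf(theta, beta, gamma=np.log(4.0)):
    """Assumed HCM form: e^gamma / (e^gamma + 2 cosh(theta - beta))."""
    return np.exp(gamma) / (np.exp(gamma) + 2.0 * np.cosh(theta - beta))

def rest_regression(beta, k, grid=np.linspace(-6.0, 6.0, 1201)):
    """Map each attainable rest-score t to mu_k(t) = E[X_k | T_-k(X) = t]."""
    weights = np.exp(-0.5 * grid**2)
    weights /= weights.sum()                        # discretized N(0, 1) density
    p = hcm_irf(grid[:, None], beta[None, :])       # shape (grid points, items)
    rest = [j for j in range(len(beta)) if j != k]
    scores = np.arange(len(rest))                   # assumed scores 0, ..., J - 2
    num, den = {}, {}
    for x in product((0, 1), repeat=len(rest)):
        if sum(x) == 0:
            continue
        x = np.array(x)
        t = round(float(scores @ x) / float(x.sum()), 10)
        # Pr(rest pattern | theta) under local independence, times the prior weight
        lik = np.prod(np.where(x == 1, p[:, rest], 1.0 - p[:, rest]), axis=1) * weights
        num[t] = num.get(t, 0.0) + float(np.sum(lik * p[:, k]))
        den[t] = den.get(t, 0.0) + float(np.sum(lik))
    return {t: num[t] / den[t] for t in sorted(den)}

# Hypothetical item locations; only the fourth value (0.08) comes from the text.
beta = np.array([-1.4, -0.9, -0.3, 0.08, 0.5, 0.9, 1.3, 1.9])
mu = rest_regression(beta, k=3)
for t in (2.25, round(7 / 3, 10), 2.4):
    print(t, mu[t])
```

Grouping patterns by score before dividing mirrors the integral above: the denominator accumulates the unnormalized posterior mass of $\theta$ given the score, and the numerator weights that mass by $P_k(\theta)$.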

274

PSYCHOMETRIKA

4. Simulation Study

To evaluate the effectiveness of the nonparametric estimators for both item rank and subjects' latent attitudes, this section examines data simulated from the HCM in (1) with equal item units γj; the HCM with equal item units does satisfy assumption [A4], and although assumption [A5] does not hold for the HCM, the nonparametric estimators perform quite well. Section 4.1 investigates the item ordering algorithm (Algorithm 1). Section 4.2 studies the performance of the Thurstone estimator of subjects' latent attitudes for an increasing number of items, for both known and estimated item rankings.

4.1. Estimation of Item Ordering

The simulation draws individuals' latent attitudes [θ] independently from the standard normal distribution, and then draws responses for these individuals from the HCM in (1) with item locations β = (1.86, 1.49, 0.08, −0.30, −0.49), each item with equal item unit set at γ = log(4) (although the HCM with equal item units does satisfy assumption [A4], assumption [A5] has not been verified). In order to study how effectively Algorithm 1 estimates the rank ordering of items for various numbers of respondents, the algorithm was applied 1000 times for each sample size N = 100, 250, 500, 1000, and 2500. Then, for each N, the modal rank for each item is calculated, and the root mean squared error (denoted √MSE) of each item's quantile rank (qj = rj/J) is calculated from the 1000 simulations.

When all items are examined jointly, the modal rank ordering for each sample size was equal to the true rank ordering, which occurred in 8.5%, 12.5%, 20.5%, 29.4%, and 47.5% of the simulated data sets for the sample sizes N = 100, 250, 500, 1000, and 2500, respectively. Table 2 summarizes the marginal sampling distributions of the estimated ranks for each item. Although the mode of the joint sampling distribution for all items peaked at the true rank ordering, the distribution for Item 4 has its mode at 5 for both N = 100 and N = 250. The mode of the marginal sampling distributions for the other four items is equal to the true rank. Furthermore, as the number of individuals used in the simulation increases, the estimates become more precise, as indicated by √MSE.

Figure 4 examines more closely the sampling distribution of the item rank estimates for all five items when the number of respondents is only N = 100. These barplots suggest that the algorithm has the most difficulty finding the correct item rank for the third and fourth items. Item 3 is ranked as the fourth item almost as often as it is correctly identified, and as discussed earlier Item 4 was ranked fifth more often than it was correctly identified as the fourth. The distance on the attitude scale between the items (0.19 between Items 4 and 5, and 0.38 between Items 3 and 4) is likely one contributing factor to the estimation problems. However, the first two items are actually closer to one another (0.37) than Items 3 and 4.

TABLE 2. Item rank estimates found using 100, 250, 500, 1000, and 2500 respondents to five items where item responses are simulated from the HCM.

          N = 100        N = 250        N = 500        N = 1000       N = 2500
          mode  √MSE     mode  √MSE     mode  √MSE     mode  √MSE     mode  √MSE
Item 1    1    (0.19)    1    (0.15)    1    (0.11)    1    (0.10)    1    (0.08)
Item 2    2    (0.26)    2    (0.21)    2    (0.16)    2    (0.12)    2    (0.08)
Item 3    3    (0.23)    3    (0.21)    3    (0.19)    3    (0.15)    3    (0.11)
Item 4    5    (0.26)    5    (0.22)    4    (0.19)    4    (0.16)    4    (0.13)
Item 5    5    (0.27)    5    (0.24)    5    (0.22)    5    (0.18)    5    (0.13)
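The item ordering procedure evaluated in this section can be sketched in a few lines. The code below is an illustration under stated assumptions, not the implementation used for Table 2: it computes the sample conditional adjacency matrix in (17), ranks items from the vector of entries indexed by $(j, m)$ as in the formula for $r_k$, and, when several columns attain the minimum, averages the rankings after applying the directional constraint to each (the order of those two steps is an assumption). The HCM form $e^{\gamma}/(e^{\gamma} + 2\cosh(\theta - \beta_j))$ is also assumed, and all function names are illustrative.

```python
import numpy as np

def conditional_adjacency(x):
    """Sample matrix whose (j, k) entry estimates Pr{X_j = 1 | X_k = 1}."""
    x = np.asarray(x, dtype=float)
    joint = x.T @ x                          # joint[j, k] = sum_i x_ij * x_ik
    return joint / x.sum(axis=0)[None, :]    # assumes every item is endorsed at least once

def rank_items(ip):
    """Algorithm 1 sketch: rank items using the column(s) holding the minimum entry."""
    ip = np.asarray(ip, dtype=float)
    n_items = ip.shape[0]
    rho = ip.min()
    cols = np.unique(np.where(np.isclose(ip, rho))[1])
    rankings = []
    for m in cols:
        col = ip[:, m]
        r = (col[None, :] <= col[:, None]).sum(axis=1)   # r_k = #{j : IP_jm <= IP_km}
        if r[0] > r[-1]:                                  # enforce r_1 <= r_J
            r = n_items - r + 1
        rankings.append(r)
    return np.mean(rankings, axis=0)

def simulate_hcm(n, beta, rng, gamma=np.log(4.0)):
    """Draw theta ~ N(0, 1) and binary responses from the assumed HCM form."""
    theta = rng.standard_normal(n)
    p = np.exp(gamma) / (np.exp(gamma) + 2.0 * np.cosh(theta[:, None] - beta[None, :]))
    return (rng.random(p.shape) < p).astype(int)

rng = np.random.default_rng(1)
beta = np.array([1.86, 1.49, 0.08, -0.30, -0.49])   # item locations from Section 4.1
x = simulate_hcm(2500, beta, rng)
print(rank_items(conditional_adjacency(x)))
```

Repeating the last three lines over many seeds and sample sizes would give a rough analogue of the sampling distributions summarized in Table 2, although the exact figures depend on the assumptions listed above.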

[Figure 4: five bar plots (panels Item 1 through Item 5) of probability against estimated rank (1 to 5).]
FIGURE 4. Distribution of estimated ranks over 1000 simulations of 100 individuals' responses to five items. Data were simulated from the HCM.

Figure 5 examines more closely the sampling distribution of the item rank estimates for the fourth item for sample sizes N = 250, 500, 1000, and 2500. Indeed, as Lemma 3.3 suggests, as the number of respondents increases, the precision of the item rank estimate improves. For small sample sizes the ranking algorithm has difficulty scaling this item accurately; the algorithm correctly scales the fourth item in approximately 32% of the samples for N = 250. As the sample sizes increase the algorithm gets more and more accurate; when N = 2500 respondents are used, the algorithm correctly ranked the fourth item in 59% of the simulated data sets.

The simulation study discussed herein only examines five items, which is a small number for an attitude survey. Typical attitude surveys contain between 10 and 25 items. For a fixed number of respondents, increasing the number of items also increases the √MSE. For example, adding three items to the five discussed in this section increases the √MSEs to 0.29, 0.36, 0.22, 0.35, and 0.49. As more and more items are added to the scale, it becomes more and more difficult to distinguish between them with the same number of respondents. The following section demonstrates that this characteristic of the ranking algorithm has direct implications for the estimation of the Thurstone score with estimated item ranks.

[Figure 5: four bar plots (panels N = 250, N = 500, N = 1000, N = 2500) of probability against estimated rank (1 to 5).]
FIGURE 5. The distribution of estimated rank order for the fourth item from the simulated data set.

4.2. Estimation of Respondent Locations

This section discusses the results of a Monte Carlo simulation study designed to study the efficacy of the observed Thurstone score as an estimate of the true Thurstone score. The study drew N = 100, 500, 1000, and 2000 latent traits [θ] from the standard normal distribution, and used these simulated traits to produce item responses to J = 5, 10, 50, and 100 equally spaced items [β] between −2 and 2; the item responses were drawn from the HCM with γj = log(4) for all items. For each respondent, the study examines the observed rank-based Thurstone score in (5) using two sets of ranks: the true item ranks, and the estimated ranks from Algorithm 1. The study repeated the simulation 10,000 times for each sample size (N)-survey length (J) combination. Ranks were estimated for each of the 10,000 simulations.

Figure 6 examines the Monte Carlo approximated bias of the rank-based Thurstone scores as estimators of the true Thurstone score (i.e., $b_J^q(\theta) = T_J^q(\theta) - E[T_J^q(X) \mid \theta]$); the dashed gray line in each panel represents the bias of the Thurstone score that uses the true item ranks, the solid black line uses estimated item ranks. In general the bias function can take on any value between −1 and 1. The lower left panel in the figure shows that the observed Thurstone score using estimated ranks for N = 100 and J = 5 performs the worst; the bias function in that case is nearly linear, ranging from $b_J^{\tilde{q}}(-3) = -0.04$ to $b_J^{\tilde{q}}(3) = 0.04$.

As Theorem 1 states, for a fixed number of respondents (N) the observed Thurstone score that uses true item ranks approaches the true Thurstone score in probability as the number of items increases; the bias function $b^q(\theta)$ of the observed Thurstone score approaches a horizontal line at zero as the number of items increases for all sample sizes. Not only does it appear that the observed score approaches the true score in probability as the number of items (J) increases, but it also appears that the mean squared errors

\[ \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \big( T_J^q(x_i) - T_J^q(\theta_i) \big)^2, \]

which are presented in Table 3, converge to zero as the number of items J → ∞ for each N.

The observed Thurstone score based on estimated ranks does not fare so well. For a small number of items (J = 5) the observed score using estimated ranks actually appears to outperform the score using true ranks. However, as the number of items increases the bias function $b^{\tilde{q}}(\theta)$ moves further away from horizontal. Take, for example, the case where there are J = 10 items

and N = 500 respondents. The solid black curve corresponding to the Thurstone score based on estimated ranks ($T_J^{\tilde{q}}$) is closer to horizontal than the score using true ranks ($T_J^q$). However, as the number of respondents increases $T_J^{\tilde{q}}$ approaches $T_J^q$. This result suggests that $T_J^{\tilde{q}}(X)$ is not a consistent estimator of the true Thurstone score $T_J(\theta)$ for a fixed number of respondents (N), that is, both N and J must approach infinity for consistency to hold.

[Figure 6: a grid of panels, one for each combination of N = 100, 500, 1000, 2000 and J = 5, 10, 50, 100, each plotting the bias $b_J(\theta)$ against θ.]
FIGURE 6. Monte Carlo approximated bias functions ($b_J^q(\theta) = T_J^q(\theta) - E[T_J^q(X) \mid \theta]$) for the observed Thurstone scores using true ranks (gray dashed curves) and estimated ranks (solid black curves).

TABLE 3. The mean squared error of the observed Thurstone score using true ranks. The numbers presented in this table are MSE × 10^7, so the MSE for N = 100 and J = 5 is 1.108 × 10^−4.

         N = 100    N = 500    N = 1000    N = 2000
J = 5      1108       1073       1030        1022
J = 10      464        385        378         399
J = 50       19         21         23          22
J = 100       8          8          7           7
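For a short survey the bias function $b_J^q(\theta) = T_J^q(\theta) - E[T_J^q(X) \mid \theta]$ with true ranks can be computed exactly by enumerating response patterns, which avoids Monte Carlo error. The sketch below is illustrative only and makes three assumptions: the HCM form $e^{\gamma}/(e^{\gamma} + 2\cosh(\theta - \beta_j))$ with $\gamma = \log(4)$, quantile scores $q_j = j/J$, and an expectation taken over patterns with at least one endorsement, since the observed score is undefined otherwise. It is not expected to reproduce the values reported in Figure 6 or Table 3.

```python
import numpy as np
from itertools import product

def hcm_irf(theta, beta, gamma=np.log(4.0)):
    """Assumed HCM form: e^gamma / (e^gamma + 2 cosh(theta - beta))."""
    return np.exp(gamma) / (np.exp(gamma) + 2.0 * np.cosh(theta - beta))

def bias(theta, beta, q):
    """b(theta) = T(theta) - E[T(X) | theta, at least one item endorsed]."""
    p = hcm_irf(theta, beta)
    true_score = np.sum(q * p) / np.sum(p)
    expected, total = 0.0, 0.0
    for x in product((0, 1), repeat=len(beta)):
        if sum(x) == 0:
            continue                                   # observed score undefined here
        x = np.array(x)
        prob = np.prod(np.where(x == 1, p, 1.0 - p))   # local independence
        expected += prob * np.sum(q * x) / x.sum()
        total += prob
    return true_score - expected / total

n_items = 5
beta = np.linspace(-2.0, 2.0, n_items)       # equally spaced items, as in Section 4.2
q = np.arange(1, n_items + 1) / n_items      # true quantile ranks q_j = j / J
for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(theta, bias(theta, beta, q))
```

With symmetric item locations and scores the bias is antisymmetric in θ and vanishes at θ = 0, which gives a quick sanity check on the enumeration.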


5. Discussion

In the last 15 years several authors have developed and studied parametric unfolding response models for the analysis of attitudinal surveys. However, unlike monotone response models, very little work has been done on the nonparametric behavior of unfolding response models. This paper, by building on the work of Post (1992), broadens the field of nonparametric unfolding response models for dichotomous data, and in doing so, gives researchers a bigger bag of tools for the analysis of attitudinal surveys.

One of the main accomplishments of this paper is to provide rigorous justifications for the use of the conditional adjacency matrix and the Thurstone score for the estimation of the rank order of items and respondents on the latent scale. The justification requires that at least some parts of Post's (1992) definition of the nonparametric unfolding response model hold. Although the definition is somewhat restrictive (some popular parametric unfolding response models do not satisfy the assumptions of the definition) it is viewed as a sufficiently broad definition for the results presented here. Future research in the field of nonparametric unfolding should attempt to find similar results under assumptions less restrictive than Post's assumptions.

The proof that the Thurstone score consistently estimates the rank order of respondents according to their latent attitude assumes that the rank order of the item locations is known. Although the paper conjectures that the Thurstone score calculated using estimated item rankings is a consistent estimate of the respondent rankings, it will surely perform poorly in small samples, as demonstrated by the simulation.

References

Andrich, D. (1978). Relationships between the Thurstone and Rasch approaches to item scaling. Applied Psychological Measurement, 2, 451–462.
Andrich, D. (1988). The application of an unfolding model of the PIRT type for the measurement of attitude. Applied Psychological Measurement, 12, 33–51.
Andrich, D. (1989). A probabilistic IRT model for unfolding preference data. Applied Psychological Measurement, 13(2), 193–216.
Andrich, D., & Luo, G. (1993). A hyperbolic cosine latent trait model for unfolding dichotomous single-stimulus responses. Applied Psychological Measurement, 17, 253–276.
Bickel, P.J., & Lehmann, E.L. (1979). Descriptive statistics for nonparametric models IV: Spread. In Jurečková (Ed.), Contributions to Statistics, Hájek Memorial Volume (pp. 33–40). Prague: Academia.
Casella, G., & Berger, R.L. (1990). Statistical inference. Belmont, CA: Duxbury.
Coombs, C.H. (1964). A theory of data. New York: Wiley.
Davison, M. (1977). On a metric, unidimensional unfolding model for attitudinal and developmental data. Psychometrika, 42, 523–548.
DeSarbo, W.S., & Hoffman, D.L. (1986). Simple and weighted unfolding threshold models for the spatial representation of binary choice data. Applied Psychological Measurement, 10, 247–264.
Douglas, J. (1997). Joint consistency of nonparametric item characteristic curve and ability estimates. Psychometrika, 62, 7–28.
Formann, A.K. (1988). Latent class models for nonmonotone dichotomous items. Psychometrika, 53, 45–62.
Hemker, B.T., Sijtsma, K., Molenaar, I.W., & Junker, B.W. (1997). Stochastic ordering using the latent trait and the sum score in polytomous IRT models. Psychometrika, 62, 331–347.
Hoijtink, H. (1990). A latent trait model for dichotomous choice data. Psychometrika, 55, 641–656.
Hoijtink, H. (1991). PARELLA: Measurement of latent traits by proximity items. Leiden, The Netherlands: DSWO Press.
Hoijtink, H., & Molenaar, I.W. (1992). Testing for DIF in a model with single peaked item characteristic curves: The PARELLA model. Psychometrika, 57, 383–397.
Johnson, M.S. (2001). Parametric and non-parametric extensions to unfolding response models. PhD thesis, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Johnson, M.S. (2004). Nonparametric estimation of item and respondent locations from unfolding-type items. Technical report. Baruch College, Department of Statistics & Computer Information Systems, New York. Available for download at stat.baruch.cuny.edu/johnson/NPUF.pdf.
Johnson, M.S., & Junker, B.W. (2003). Using data augmentation and Markov chain Monte Carlo for the estimation of unfolding response models. Journal of Educational and Behavioral Statistics, 28(3), 195–230.
Junker, B.W. (1991). Essential independence and likelihood-based ability estimation for polytomous items. Psychometrika, 56, 255–278.


Junker, B.W. (1993). Conditional association, essential independence and monotone unidimensional item response models. Annals of Statistics, 21, 1359–1378.
Junker, B.W., & Sijtsma, K. (2001). Nonparametric item response theory in action: An overview of the special issue. Applied Psychological Measurement, 25, 211–220.
Karlin, S. (1968). Total positivity (Vol. 1). Stanford, CA: Stanford University Press.
Luo, G. (1998). A general formulation for unidimensional latent trait unfolding models: Making explicit the latitude of acceptance. Journal of Mathematical Psychology, 42, 400–417.
Maris, E. (1995). Psychometric latent response models. Psychometrika, 60, 523–547.
Maris, G., & Maris, E. (2002). Are attitude items monotone or single-peaked? An analysis using Bayesian methods. Technical Report 2002-02, Arnhem: Citogroep Measurement and Research Department.
Masters, G.N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.
Mokken, R.J. (1971). A theory and procedure of scale analysis. New York: De Gruyter.
Muhlberger, P. (1999). A general unfolding, non-folding scaling model and algorithm. Presented at the 1999 American Political Science Association Annual Meeting, Atlanta, GA.
Noël, Y. (1999). Recovering unimodal latent patterns of change by unfolding analysis: Application to smoking cessation. Psychological Methods, 4(2), 173–191.
Post, W.J. (1992). Nonparametric unfolding models: A latent structure approach. M&T Series, Leiden, The Netherlands: DSWO Press.
Ramsay, J.O. (1991). Kernel smoothing approaches to nonparametric item characteristic curve estimation. Psychometrika, 56, 611–630.
Ripley, B.D. (1987). Stochastic simulation. New York: Wiley.
Roberts, J.S., Donoghue, J.R., & Laughlin, J.E. (1999). A general model for unfolding unidimensional polytomous responses using item response theory. Applied Psychological Measurement.
Roberts, J.S., Donoghue, J.R., & Laughlin, J.E. (2000). A general item response theory model for unfolding unidimensional polytomous responses. Applied Psychological Measurement, 24, 3–32.
Rosenbaum, P. (1987). Comparing item characteristic curves. Psychometrika, 52, 217–233.
Sijtsma, K. (1998). Methodology review: Nonparametric IRT approaches to the analysis of dichotomous item scores. Applied Psychological Measurement, 22(1), 3–31.
Sijtsma, K., & Junker, B.W. (1996). A survey of theory and methods of invariant item ordering, with results for parametric models. British Journal of Mathematical and Statistical Psychology, 49, 79–105.
Stout, W.F. (1990). A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55, 293–325.
Thurstone, L.L. (1927). A law of comparative judgment. Psychological Review, 34, 278–286.
Thurstone, L.L. (1928). Attitudes can be measured. American Journal of Sociology, 33, 529–554.
van Schuur, W.H. (1988). Stochastic unfolding. In W.E. Saris & I.N. Gallhofer (Eds.), Sociometric research, volume I: Data collection and scaling (Vol. 1, chap. 9, pp. 137–157). London: Macmillan.
Verhelst, N.D., & Verstralen, H.H.F.M. (1993). A stochastic unfolding model derived from the partial credit model. Kwantitative Methoden, 42, 73–92.

Manuscript received 23 JUN 2003
Final version received 7 MAR 2005
Published Online Date: 6 JUN 2006