DABB 08 13 IRT

Data Analysis Brown Bag: August 2013

Item Response Theory And Rasch Modelling

David Lillis for


ITEM RESPONSE THEORY

Item Response Theory (IRT) refers to a family of statistical models for evaluating the design and scoring of psychometric tests, assessments and surveys.
Used on assessments in psychology, psychometrics, education, health studies, marketing, economics, etc. that involve ordinal categorical items (e.g. Likert items).
Can be used in standards-based assessment, where overall grades and item grades are criterion-referenced.

Copyright 2013 The Analysis Factor http://TheAnalysisFactor.com


IRT PARAMETERS OF INTEREST

DIFFICULTY of an item (δ) - one for each grade or category. High difficulty means that only a small number of able people gain high grades.
ABILITY of the person (θ) - a measure of a person’s performance across the entire test. Responding correctly across most or all items results in a high ability. Also called the Person Location.
DIFFICULTIES and ABILITIES are measured on the same scale. The units are called logits.
DISCRIMINATION - how strongly the item discriminates between levels of performance.

IRT MODELS

Responses may be coded:
0, 1 - two ordered categories (dichotomous)
0, 1, 2 - three ordered categories (polytomous)
0, 1, 2, 3 - four ordered categories (polytomous), etc.

Categories reflect increasing levels on a trait, such as numeracy, extroversion, aggression or skill in an academic subject.
Several items can form a scale that measures the trait.
Responses can be summed over all items to give total scores. A person with a high total score displays more of the trait than others.


CO-LOCATION

Abilities and item difficulties are located along a single (uni-dimensional) latent trait.
The number of items a person answers correctly varies with the item difficulties.
The relationship between person ability and total score is non-linear.

RASCH MODEL

Used for dichotomous and polytomous variables.
The total score characterizes the person ability. Those who attempt the same items and attain the same total score receive the same ability estimate.
Rasch involves only one person parameter, and one parameter corresponding to each category.


RASCH MODEL

PERSON  Q1  Q2  Q3  TOTAL  ABILITY
A       0   0   0   0      NA
B       1   0   0   1      -2.6
C       0   1   0   1      -2.6
D       0   0   1   1      -2.6
E       1   1   0   2      2.7
F       1   0   1   2      2.7
G       0   1   1   2      2.7
H       1   1   1   3      NA

RASCH MODEL

For a dichotomous item (1 = right; 0 = wrong), the probability that a respondent or candidate of ability θ (measured over the entire test, survey or assessment) responds correctly is given by the expression:

$$P(X_{ni} = 1) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}$$

X is a Bernoulli random variable that takes two values (0 or 1).
n indexes the respondents and i indexes the items.
θn is the calculated ability of person n (a measure of performance).
δi is the estimated difficulty of success in item i.
There is no item discrimination.
The difference between the ability and the difficulty, θn – δi, is crucial.


LOGISTIC FUNCTION

THE PROBABILITY

Imagine a person of a given ability taking an item 10 times (retaining no memory of the item).
The probability that he or she is successful on a given attempt is the theoretical average score over those attempts.
The sum of the probabilities across several items is the expected number of items answered correctly; at the person's estimated ability it equals the number of items that the person actually got right.
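The idea that probabilities add up to an expected total score can be sketched in a few lines of Python. This is only an illustration: the function names and the three item difficulties are hypothetical and are not taken from the slides.

```python
import math

def rasch_prob(theta, delta):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def expected_total(theta, deltas):
    """Expected number of items correct: the sum of the per-item probabilities."""
    return sum(rasch_prob(theta, d) for d in deltas)

# Hypothetical item difficulties in logits (not from the slides).
deltas = [-1.0, 0.0, 1.0]

# Equal steps in ability do not give equal steps in expected total score.
for theta in (-3, -2, -1, 0, 1, 2, 3):
    print(f"ability {theta:+d} -> expected total {expected_total(theta, deltas):.2f}")
```

The printed values flatten out towards 0 and 3 at the extremes, which is the non-linear relationship between ability and total score mentioned earlier.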


THE RASCH MODEL

Given any total score, the response pattern (i.e. the particular items that were correct or incorrect) does not contain any further information about the person.
If the observed data fit the model, then all of the information about the person ability is contained within the total score.
Likewise, the total score of an item is the statistic that contains all of the information about the item difficulty.
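One way to see why the total score carries all the information is sketched below, using the dichotomous Rasch expression given earlier. The notation x = (x_1, ..., x_L) for a response pattern, y for a second pattern, and r for the total score is introduced here for illustration only.

```latex
\begin{aligned}
P(x \mid \theta) &= \prod_{i=1}^{L} \frac{\exp[x_i(\theta - \delta_i)]}{1 + \exp(\theta - \delta_i)}
 = \frac{\exp\!\big(r\theta - \sum_i x_i \delta_i\big)}{\prod_i \,[1 + \exp(\theta - \delta_i)]},
 \qquad r = \sum_i x_i \\[4pt]
\frac{P(x \mid \theta)}{P(y \mid \theta)} &= \exp\!\Big(\sum_i (y_i - x_i)\,\delta_i\Big)
 \qquad \text{whenever } \sum_i x_i = \sum_i y_i
\end{aligned}
```

The ratio does not involve θ, so two patterns with the same total score are equally informative about the person's ability.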

GRADED RESPONSE MODEL (Samejima, 1969)

For a polytomous item, the probability that a respondent or candidate of ability θ gains grade (category) j or better is given by the expression:

$$P_{nji} = \frac{1}{1 + \exp[\,k a (\delta_{ji} - \theta_n)\,]}$$

n indexes the person, i indexes the item and j indexes the categories or assessment grades (the lowest passing grade or better, the second lowest passing grade or better … or the highest passing grade).
θn is the calculated ability of person n (which you can also think of as a measure of performance).
a is the item discrimination.
δji is the estimated difficulty of attaining grade j in item i.
k = 1.7 (scales the logistic curve to approximate a cumulative normal ogive). With the exponent written as ka(δji – θn), k must be positive for the probability to rise with ability; the equivalent form with (θn – δji) in the exponent carries the factor –1.7.
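The "grade j or better" probabilities can be turned into probabilities for each single grade by differencing. The sketch below assumes the scaling constant is taken as +1.7 with the exponent written as ka(δ – θ), so that probabilities rise with ability, and it borrows the Q1 discrimination and difficulties from the table later in the slides; the function names are hypothetical.

```python
import math

K = 1.7  # scaling constant; taken as positive here (assumption) so P rises with ability

def grm_cumulative(theta, a, deltas):
    """P(grade j or better) for each grade threshold, lowest grade first."""
    return [1.0 / (1.0 + math.exp(K * a * (d - theta))) for d in deltas]

def grm_category_probs(theta, a, deltas):
    """Probability of each single category, from below every threshold up to the top grade."""
    cum = [1.0] + grm_cumulative(theta, a, deltas) + [0.0]
    return [cum[j] - cum[j + 1] for j in range(len(cum) - 1)]

# Q1 values from the discrimination/difficulty table in the slides.
a_q1 = 1.51
deltas_q1 = [-1.52, 0.37, 2.15]

probs = grm_category_probs(0.0, a_q1, deltas_q1)
print([round(p, 3) for p in probs])  # four category probabilities that sum to 1
```

With three thresholds there are four categories, and the four probabilities always sum to one because the cumulative terms telescope.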


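FOR ALL MODELS . . .

The difference between the ability and the difficulty, θn – δi, is crucial.

$$\operatorname{logit}(P) = \log_e\!\left(\frac{P}{1 - P}\right)$$

For dichotomous Rasch, the odds are P / [1 – P] = exp( θn – δi ), so:

$$\operatorname{logit}(P) = \theta_n - \delta_i$$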

FOR DICHOTOMOUS RASCH

If θn = 3 and δi = 1:

$$P(X_{ni} = 1) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)} = \frac{\exp(3 - 1)}{1 + \exp(3 - 1)} \approx 0.88$$

If θn – δi > 0 then P > 0.5
If θn – δi < 0 then P < 0.5


IRT PARAMETERS

Item discriminations and grade difficulties are estimated using various models.
Ability is estimated from performance on all items, taking account of item difficulties and discriminations.

The most objective measure of performance?

A NEW ZEALAND STANDARD


ITEM-CHARACTERISTIC CURVE (DICHOTOMOUS GRM)

DIFFICULTY: the ability at which P = 0.5
DISCRIMINATION: governs the slope of the curve at that point (a higher discrimination gives a steeper curve)
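The link between discrimination and slope can be made explicit with a short derivation. It assumes the Graded Response Model expression with exponent ka(δ – θ) and a positive scaling constant k, and drops the person and item subscripts for brevity.

```latex
\begin{aligned}
P(\theta) &= \frac{1}{1 + \exp[ka(\delta - \theta)]} \\[4pt]
\frac{dP}{d\theta} &= ka\,P(\theta)\,[1 - P(\theta)] \\[4pt]
\left.\frac{dP}{d\theta}\right|_{\theta = \delta} &= ka \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \frac{ka}{4}
\end{aligned}
```

So the slope at the difficulty is proportional to the discrimination a, and this is also the steepest point of the curve.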

ITEM-CHARACTERISTIC CURVE (GRM POLYTOMOUS)


ITEM DISCRIMINATION & DIFFICULTIES (GRM POLYTOMOUS)

Item  Discrimination  Difficulty (AME)  Difficulty (ME)  Difficulty (E)
Q1    1.51            -1.52             0.37             2.15
Q2    0.73            -2.57             0.59             4.68
Q3    2.43            -0.41             0.54             1.75

HISTOGRAM OF CANDIDATE ABILITIES (PERFORMANCES)


DICHOTOMOUS AND POLYTOMOUS ITEM CHARACTERISTIC CURVES


GRADE THRESHOLDS

MATHEMATICAL EXPRESSIONS (four grades and three boundaries)

The first boundary θNA is found by equating 1 – P(A) = P(A) – P(M):

$$\theta_{NA} = \frac{1}{ka}\,\log_e\left\{ \frac{\exp[ka(b_A + b_M)]}{\exp(ka\,b_M) - 2\exp(ka\,b_A)} \right\}$$

The second boundary θAM is found by equating P(A) – P(M) = P(M) – P(E):

$$\theta_{AM} = \frac{1}{ka}\,\log_e\left\{ \frac{\exp[ka(b_A + b_M)] + \exp[ka(b_M + b_E)] - 2\exp[ka(b_A + b_E)]}{\exp(ka\,b_A) + \exp(ka\,b_E) - 2\exp(ka\,b_M)} \right\}$$

The third boundary θME is found by equating P(M) – P(E) = P(E):

$$\theta_{ME} = \frac{1}{ka}\,\log_e\left\{ \exp(ka\,b_E) - 2\exp(ka\,b_M) \right\}$$
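The three boundary expressions can be checked numerically. The sketch below reads bA, bM and bE as the difficulties of reaching grade A, M or E "or better", takes P(A), P(M) and P(E) as the corresponding Graded Response Model probabilities, and assumes a positive scaling constant k = 1.7. The function names are hypothetical; the Q1 values come from the discrimination/difficulty table.

```python
import math

K = 1.7  # assumed positive scaling constant

def p_or_better(theta, a, b):
    """GRM probability of reaching the grade with difficulty b, or better."""
    return 1.0 / (1.0 + math.exp(K * a * (b - theta)))

def grade_boundaries(a, b_a, b_m, b_e):
    """Return (theta_NA, theta_AM, theta_ME) from the closed-form expressions.

    The logarithms are only defined when the difficulties are separated
    widely enough for each argument to be positive.
    """
    ka = K * a
    e_a, e_m, e_e = math.exp(ka * b_a), math.exp(ka * b_m), math.exp(ka * b_e)
    theta_na = math.log(e_a * e_m / (e_m - 2 * e_a)) / ka
    theta_am = math.log((e_a * e_m + e_m * e_e - 2 * e_a * e_e)
                        / (e_a + e_e - 2 * e_m)) / ka
    theta_me = math.log(e_e - 2 * e_m) / ka
    return theta_na, theta_am, theta_me

# Q1 from the table: discrimination 1.51, difficulties -1.52, 0.37, 2.15.
t_na, t_am, t_me = grade_boundaries(1.51, -1.52, 0.37, 2.15)
print(f"theta_NA = {t_na:.3f}, theta_AM = {t_am:.3f}, theta_ME = {t_me:.3f}")
```

Each boundary is the ability at which two adjacent grades are equally likely, so substituting a boundary back into the probabilities should reproduce the equality it was derived from.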


Item Characteristic Curves (based on three grades), constructed using the Graded Response Model (Samejima, 1969).

TEST INFORMATION CURVE

This curve shows the test information, which is the reciprocal of the squared standard error of the ability estimate, and so measures the precision with which the candidate ability is estimated. Vertical lines indicate the abilities at grade boundaries.


CLASS INTERVALS AND MODEL FIT

We can calculate the average ability for class intervals and superpose the model.

CALCULATING ABILITIES

Maximum Likelihood can be used to find the abilities.
Each item contributes to the log-likelihood.
The ability of each person is found from the location of the maximum.


MAXIMUM LIKELIHOOD

$$L(\theta) = P(x_1 \mid \theta) \times P(x_2 \mid \theta) \times \dots \times P(x_k \mid \theta)$$

$$\log_e L(\theta) = \log_e P(x_1 \mid \theta) + \log_e P(x_2 \mid \theta) + \dots + \log_e P(x_k \mid \theta)$$

The person’s ability (θ) is found from the location of the maximum of the log-likelihood.
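Locating the maximum of the log-likelihood can be sketched for the dichotomous Rasch model with known item difficulties. The example uses Newton-Raphson steps, which is one common choice rather than necessarily the method behind the slides, and the difficulties are hypothetical.

```python
import math

def rasch_prob(theta, delta):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def ml_ability(responses, deltas, tol=1e-10, max_iter=100):
    """Maximum-likelihood ability for one person, given known item difficulties.

    Returns None for all-wrong or all-right patterns, where the log-likelihood
    has no finite maximum.
    """
    score = sum(responses)
    if score == 0 or score == len(responses):
        return None
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, d) for d in deltas]
        gradient = score - sum(probs)                  # d logL / d theta
        curvature = sum(p * (1.0 - p) for p in probs)  # minus the second derivative
        step = gradient / curvature
        theta += step
        if abs(step) < tol:
            break
    return theta

# Hypothetical item difficulties in logits (not from the slides).
deltas = [-1.0, 0.0, 1.0]

for pattern in ([0, 0, 0], [1, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]):
    print(pattern, ml_ability(pattern, deltas))
```

Patterns with the same total score give the same ability, and the all-wrong and all-right patterns have no finite estimate, which mirrors the NA entries for persons A and H in the Rasch table. The numerical values differ from that table because the difficulties here are made up.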

FITTING ITEM PARAMETERS

We have various techniques. Item difficulties can be found from starting conditions (e.g. all θ = 0) and varying the difficulties iteratively, assuming:

$$\text{Total score for person } n = \sum_{i=1}^{L} P_{ni} \quad \text{(over } L \text{ items)}$$

$$\text{Total score for item } i = \sum_{n=1}^{N} P_{ni} \quad \text{(over } N \text{ persons)}$$

Each probability is the theoretical average score that a person would obtain by attempting the same item many times, so the sum of these probabilities across items should equal the number of items that the person answered correctly.
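The two total-score conditions suggest a simple iterative scheme: start from all θ = 0, then alternately nudge the abilities and the difficulties until the summed probabilities match the observed totals. The sketch below does this with clamped Newton steps for the dichotomous Rasch model (a joint maximum likelihood approach, which is only one of the available techniques). The response matrix is made up, and persons or items with zero or perfect totals would have to be removed first.

```python
import math

def prob(theta, delta):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def fit_rasch(data, max_iter=500, tol=1e-8):
    """Alternate Newton steps for abilities and difficulties until totals match."""
    n_persons, n_items = len(data), len(data[0])
    person_totals = [sum(row) for row in data]
    item_totals = [sum(row[i] for row in data) for i in range(n_items)]
    thetas = [0.0] * n_persons  # starting condition: all theta = 0
    deltas = [0.0] * n_items
    for _ in range(max_iter):
        biggest = 0.0
        # Move each ability so that sum_i P_ni approaches the person's total score.
        for n in range(n_persons):
            ps = [prob(thetas[n], d) for d in deltas]
            step = (person_totals[n] - sum(ps)) / sum(p * (1 - p) for p in ps)
            step = max(-1.0, min(1.0, step))  # clamp to avoid overshooting
            thetas[n] += step
            biggest = max(biggest, abs(step))
        # Move each difficulty so that sum_n P_ni approaches the item's total score.
        for i in range(n_items):
            ps = [prob(t, deltas[i]) for t in thetas]
            step = (sum(ps) - item_totals[i]) / sum(p * (1 - p) for p in ps)
            step = max(-1.0, min(1.0, step))
            deltas[i] += step
            biggest = max(biggest, abs(step))
        # Fix the origin of the logit scale at mean difficulty 0; shifting both
        # sets of parameters by the same amount leaves every P_ni unchanged.
        shift = sum(deltas) / n_items
        deltas = [d - shift for d in deltas]
        thetas = [t - shift for t in thetas]
        if biggest < tol:
            break
    return thetas, deltas

# Made-up responses: 7 persons (rows) by 4 items (columns).
data = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]

thetas, deltas = fit_rasch(data)
print("difficulties:", [round(d, 2) for d in deltas])
print("abilities:   ", [round(t, 2) for t in thetas])
```

Items answered correctly by fewer people come out with higher difficulties, and persons with equal totals receive equal abilities.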


OBSERVED RESPONSES

THEORETICAL RESPONSES


IRT AS ONE OF SEVERAL APPROACHES

Use IRT in conjunction with other analytic approaches such as Factor Analysis (including CFA), correlation analysis and measures of internal consistency (e.g. Cronbach's alpha).

DIFFERENTIAL ITEM FUNCTIONING (DIF)

DIF (item bias) occurs where one group tends to answer differently, or to experience difficulty with an item, not because its members have less knowledge of the subject matter, but because they hold different assumptions or have had different experiences.
Typically, the purpose of DIF analysis is to detect test items that are biased in favour of, or against, particular demographic groups (e.g. students of one gender, or from different ethnic or socioeconomic groups).


DIF

If two samples of candidates, representing different groups, matched with respect to their overall ability in a test, perform differently on a given item, then we have evidence that the item is biased. However, the presence of bias requires corroboration by professional judgement.
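A first, descriptive look at DIF can be sketched by matching candidates on total score (used here as a stand-in for overall ability, which is an assumption) and comparing the proportion answering one item correctly in each group. The function name, group labels and data are all made up for illustration; a real analysis would need far more candidates per cell and a formal test.

```python
from collections import defaultdict

def dif_table(records, item):
    """Proportion correct on `item`, by total score and group.

    records: iterable of (group, responses), where responses is a list of 0/1.
    Returns {total_score: {group: proportion_correct}}.
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [correct, n]
    for group, responses in records:
        cell = counts[sum(responses)][group]
        cell[0] += responses[item]
        cell[1] += 1
    return {score: {g: c / n for g, (c, n) in groups.items()}
            for score, groups in sorted(counts.items())}

# Made-up data: two groups, three items.
records = [
    ("X", [1, 1, 0]), ("X", [1, 0, 1]),
    ("Y", [0, 1, 1]), ("Y", [0, 1, 1]),
    ("X", [1, 0, 0]), ("Y", [0, 1, 0]),
]

for score, groups in dif_table(records, item=0).items():
    print(score, groups)
```

Large gaps between groups at the same total score are what would prompt the closer look, and the professional judgement, described above.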

DIF: GRAPHICAL DEPICTION


PRINCIPAL COMPONENTS ANALYSIS

INTERPRETATION OF PCA?

FIRST PC (most or all items): Overall scale
SECOND PC: Possibly Item Type (qualitative or quantitative responses)
THIRD PC: ?


ITEM LOADINGS ON THE PRINCIPAL COMPONENTS

ITEM  PC 1  PC 2
1     0.16  0.56
2     0.47  -0.21
3     0.19  0.34
4     0.48  0.06
5     0.20  0.23
6     0.65  -0.47

MEASURES OF INTERNAL CONSISTENCY

Inter-item Correlation, Item-Total Correlation and Cronbach's Alpha
Range between –1 and +1
Optimal between about 0.4 and 0.7


INTER-ITEM AND ITEM-TOTAL CORRELATIONS

      Q1    Q2    Q3    Q4    Total
Q1    1.00  0.43  0.43  0.50  0.65
Q2    0.43  1.00  0.27  0.41  0.54
Q3    0.43  0.27  1.00  0.42  0.50
Q4    0.50  0.41  0.42  1.00  0.62
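The inter-item correlations above are enough to compute a standardized alpha, which is the version of Cronbach's alpha based on the mean inter-item correlation (it treats the items as having equal variances, so it can differ from the raw alpha computed from item scores). The sketch below uses the six off-diagonal correlations from the table; the function name is hypothetical.

```python
# Off-diagonal inter-item correlations from the table:
# Q1-Q2, Q1-Q3, Q1-Q4, Q2-Q3, Q2-Q4, Q3-Q4.
inter_item = [0.43, 0.43, 0.50, 0.27, 0.41, 0.42]
n_items = 4

def standardized_alpha(correlations, k):
    """Standardized alpha: k * mean_r / (1 + (k - 1) * mean_r)."""
    mean_r = sum(correlations) / len(correlations)
    return k * mean_r / (1.0 + (k - 1) * mean_r)

mean_r = sum(inter_item) / len(inter_item)
alpha = standardized_alpha(inter_item, n_items)
print(f"mean inter-item r = {mean_r:.2f}, standardized alpha = {alpha:.2f}")
```

For these four items the mean inter-item correlation is 0.41, which gives a standardized alpha of about 0.74.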

CONCLUSION

Many psychometric tests, assessments, surveys, agreement scales and rating scales assess a single dominant dimension (cognitive construct) and, for practical purposes, form psychometric scales.
In conjunction with other approaches, IRT provides valuable analytic tools for many tests and assessments.
