Data Analysis Brown Bag: August 2013
Item Response Theory And Rasch Modelling
David Lillis for The Analysis Factor
ITEM RESPONSE THEORY
Item Response Theory (IRT) refers to a family of statistical models for evaluating the design and scoring of psychometric tests, assessments and surveys.
Used on assessments in psychology, psychometrics, education, health studies, marketing, economics etc. that involve ordinal categorical items (e.g. Likert items).
Can be used in standards-based assessment, where overall grades and item grades are criterion-referenced.
Copyright 2013 The Analysis Factor http://TheAnalysisFactor.com
IRT PARAMETERS OF INTEREST
DIFFICULTY of an item (δ) – for each grade or category. High difficulty means small numbers of able people gaining high grades.
ABILITY of the person (θ) – a measure of a person's performance across the entire test. Responding correctly across most or all items results in a high ability. Also called the Person Location.
DIFFICULTIES and ABILITIES are measured on the same scale. The units are called logits.
DISCRIMINATION – how strongly the item discriminates between levels of performance.
IRT MODELS
Responses may be coded:
0, 1 – two ordered categories (dichotomous)
0, 1, 2 – three ordered categories (polytomous)
0, 1, 2, 3 – four ordered categories (polytomous), etc.

Categories reflect increasing levels on a trait, such as numeracy, extroversion, aggression or skill in an academic subject.
Several items can form a scale that measures the trait.
Responses can be summed over all items to give total scores. A person with a high total score displays more of the trait than others.
CO-LOCATION
Abilities and item difficulties are located along a single (uni-dimensional) latent trait.
The number of items a person answers correctly varies with the item difficulties.
The relationship between person ability and total score is non-linear.
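The non-linearity is easy to see by computing the expected total score (the sum of the item success probabilities) at a few abilities: equal steps in ability produce unequal steps in expected score. A minimal Rasch-based sketch; the item difficulties below are hypothetical.

```python
import math

def expected_total(theta, difficulties):
    """Expected total score: the sum of Rasch success probabilities over items."""
    return sum(1.0 / (1.0 + math.exp(-(theta - d))) for d in difficulties)

# Hypothetical difficulties for a five-item test
diffs = [-2.0, -1.0, 0.0, 1.0, 2.0]
for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    # the increments shrink toward the ends of the scale (an S-shaped curve)
    print(theta, round(expected_total(theta, diffs), 2))
```

The curve is bounded by 0 and the number of items, so a one-logit gain in ability is worth more raw marks near the middle of the scale than at the extremes.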
RASCH MODEL
Used for dichotomous and polytomous variables.
The total score characterizes the person ability: those who attempt the same items and attain the same total score receive the same ability estimate.
Rasch involves only one person parameter, and one difficulty parameter corresponding to each category.
RASCH MODEL

PERSON   Q1   Q2   Q3   TOTAL   ABILITY
A         0    0    0     0       NA
B         1    0    0     1      -2.6
C         0    1    0     1      -2.6
D         0    0    1     1      -2.6
E         1    1    0     2       2.7
F         1    0    1     2       2.7
G         0    1    1     2       2.7
H         1    1    1     3       NA
RASCH MODEL
For a dichotomous item (1 = right; 0 = wrong), the probability that a respondent or candidate of ability θ (measured over the entire test, survey or assessment) responds correctly is given by:

P( Xni = 1 ) = exp( θn – δi ) / [ 1 + exp( θn – δi ) ]

X is a Bernoulli random variable that takes two values
n indexes the respondents and i indexes the items
θn is the calculated ability of person n (a measure of performance)
δi is the estimated difficulty of success in item i
There is no item discrimination parameter
The difference between the ability and the difficulty, θn – δi, is crucial
LOGISTIC FUNCTION
THE PROBABILITY
Imagine a person of a given ability taking an item 10 times (retaining no memory of the item).
The probability of success on a given attempt is the theoretical average score over those attempts.
Summing the probabilities across several items gives the expected number of correct items.
THE RASCH MODEL
Given any total score, the response pattern (i.e. which particular items were correct or incorrect) contains no further information: if the observed data fit the model, all of the information about the person ability is contained in the total score.
Likewise, the total score on an item is the statistic that contains all of the information about the item difficulty.
GRADED RESPONSE MODEL (Samejima, 1969)
For a polytomous item, the probability of a respondent or candidate of ability θ gaining grade (category) j or better is given by:

Pnji = 1 / { 1 + exp[ ka( δji – θn ) ] }

n indexes the person, i indexes the item and j indexes the categories or assessment grades (the lowest passing grade or better, the second lowest passing grade or better … or the highest passing grade)
θn is the calculated ability of person n (which you can also think of as a measure of performance)
a is the item discrimination
δji is the estimated difficulty of attaining grade j in item i
k = –1.7 (scales the logistic curve to approximate a cumulative normal ogive)
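A sketch of the GRM in Python. The slides fold the scaling constant into the exponent with k = –1.7; below the sign is arranged so the cumulative probability of grade j or better increases with ability, which is the conventional form. The discrimination and boundary difficulties are invented for illustration.

```python
import math

def grm_cumulative(theta, a, delta, k=1.7):
    """P(grade >= j): the GRM cumulative probability for boundary
    difficulty delta, increasing with ability theta."""
    return 1.0 / (1.0 + math.exp(-k * a * (theta - delta)))

def grm_category_probs(theta, a, deltas):
    """Probability of each individual grade: differences of adjacent
    cumulative probabilities. deltas must be ascending; m boundaries
    give m + 1 grades."""
    cum = [1.0] + [grm_cumulative(theta, a, d) for d in deltas] + [0.0]
    return [cum[j] - cum[j + 1] for j in range(len(deltas) + 1)]

# Illustrative parameters only: a and the boundary difficulties are invented
probs = grm_category_probs(theta=0.5, a=1.5, deltas=[-1.0, 0.4, 1.8])
```

Because the cumulative probabilities telescope between 1 and 0, the category probabilities always sum to 1, and raising θ shifts probability mass toward the higher grades.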
FOR ALL MODELS . . .
The difference between the ability and the difficulty, θn – δi, is crucial.

logit(P) = loge[ P / (1 – P) ]

For dichotomous Rasch: logit(P) = θn – δi
FOR DICHOTOMOUS RASCH
If θn = 3 and δi = 1:

P( Xni = 1 ) = exp( θn – δi ) / [ 1 + exp( θn – δi ) ]
             = exp( 3 – 1 ) / [ 1 + exp( 3 – 1 ) ] = 0.88

If θn – δi > 0 then P > 0.5; if θn – δi < 0 then P < 0.5
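The worked example can be reproduced in a few lines (a sketch; the function name is ours):

```python
import math

def rasch_prob(theta, delta):
    """P(X = 1) = exp(theta - delta) / (1 + exp(theta - delta))."""
    z = theta - delta
    return math.exp(z) / (1.0 + math.exp(z))

print(round(rasch_prob(3.0, 1.0), 2))  # 0.88, matching the slide
print(rasch_prob(1.0, 1.0))            # 0.5 when ability equals difficulty
```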
IRT PARAMETERS
Item discriminations and grade difficulties are estimated using various models.
Ability is estimated from performance on all items, taking account of item difficulties and discriminations.
The most objective measure of performance?
A NEW ZEALAND STANDARD
ITEM-CHARACTERISTIC CURVE (DICHOTOMOUS GRM)
DIFFICULTY: the ability at which P = 0.5
DISCRIMINATION: the slope at that point
ITEM-CHARACTERISTIC CURVE (GRM POLYTOMOUS)
ITEM DISCRIMINATION & DIFFICULTIES (GRM POLYTOMOUS)

Item   Discrimination   Difficulty (AME)   Difficulty (ME)   Difficulty (E)
Q1          1.51             -1.52               0.37              2.15
Q2          0.73             -2.57               0.59              4.68
Q3          2.43             -0.41               0.54              1.75
HISTOGRAM OF CANDIDATE ABILITIES (PERFORMANCES)
DICHOTOMOUS AND POLYTOMOUS ITEM CHARACTERISTIC CURVES
GRADE THRESHOLDS
MATHEMATICAL EXPRESSIONS (four grades and three boundaries)

The first boundary θNA is found by equating: 1 – P(A) = P(A) – P(M)

θNA = (1 / ka) loge { exp[ ka( bA + bM ) ] / [ exp( ka bM ) – 2 exp( ka bA ) ] }

The second boundary θAM is found by equating: P(A) – P(M) = P(M) – P(E)

θAM = (1 / ka) loge { [ exp[ ka( bA + bM ) ] + exp[ ka( bM + bE ) ] – 2 exp[ ka( bA + bE ) ] ] / [ exp( ka bA ) + exp( ka bE ) – 2 exp( ka bM ) ] }

The third boundary θME is found by equating: P(M) – P(E) = P(E)

θME = (1 / ka) loge [ exp( ka bE ) – 2 exp( ka bM ) ]
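These closed forms can be checked numerically against their defining equations. A sketch treating ka as a single positive constant (so the cumulative probability increases with ability) and using hypothetical boundary difficulties bA < bM < bE, spaced so every logarithm's argument is positive:

```python
import math

ka = 1.7  # product of scaling constant and discrimination, taken positive here

def pstar(theta, b):
    """Cumulative probability of attaining the grade with boundary
    difficulty b; increases with ability theta."""
    return 1.0 / (1.0 + math.exp(ka * (b - theta)))

def thresholds(bA, bM, bE):
    """The three closed-form boundary abilities from the equations above."""
    t_na = (1.0 / ka) * math.log(
        math.exp(ka * (bA + bM))
        / (math.exp(ka * bM) - 2.0 * math.exp(ka * bA)))
    t_am = (1.0 / ka) * math.log(
        (math.exp(ka * (bA + bM)) + math.exp(ka * (bM + bE))
         - 2.0 * math.exp(ka * (bA + bE)))
        / (math.exp(ka * bA) + math.exp(ka * bE) - 2.0 * math.exp(ka * bM)))
    t_me = (1.0 / ka) * math.log(math.exp(ka * bE) - 2.0 * math.exp(ka * bM))
    return t_na, t_am, t_me

# Hypothetical boundary difficulties, invented for illustration
bA, bM, bE = -1.0, 0.5, 2.0
t_na, t_am, t_me = thresholds(bA, bM, bE)

# The first boundary satisfies its defining equation 1 - P(A) = P(A) - P(M)
assert abs((1 - pstar(t_na, bA)) - (pstar(t_na, bA) - pstar(t_na, bM))) < 1e-9
```

The other two boundaries can be checked the same way, and the three thresholds come out in ascending order, as the grade structure requires.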
Item Characteristic Curves (based on three grades), constructed using the Graded Response Model (Samejima, 1969).
TEST INFORMATION CURVE
This curve shows the test information, which measures the precision of the candidate ability: the standard error of the ability estimate is the reciprocal of the square root of the information. Vertical lines indicate the abilities at grade boundaries.
CLASS INTERVALS AND MODEL FIT
We can calculate the average ability for class intervals and superpose the model.
CALCULATING ABILITIES

Maximum Likelihood can be used to find the abilities. Each item contributes to the log-likelihood, and the ability of each person is found from the location of the maximum.
MAXIMUM LIKELIHOOD
L(θ) = P(x1 | θ) × P(x2 | θ) × . . . × P(xk | θ)

loge[ L(θ) ] = loge[ P(x1 | θ) ] + loge[ P(x2 | θ) ] + . . . + loge[ P(xk | θ) ]

The person's ability (θ) is found from the location of the maximum of the log-likelihood.
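A minimal sketch of this maximization for the dichotomous Rasch model, using a simple grid search over θ (the item difficulties and the response pattern are hypothetical):

```python
import math

def log_likelihood(theta, responses, difficulties):
    """Dichotomous Rasch log-likelihood of one person's response pattern."""
    ll = 0.0
    for x, d in zip(responses, difficulties):
        p = 1.0 / (1.0 + math.exp(-(theta - d)))
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll

def ml_ability(responses, difficulties, lo=-6.0, hi=6.0, steps=12001):
    """Locate the maximum of the log-likelihood by a coarse grid search."""
    width = (hi - lo) / (steps - 1)
    grid = [lo + i * width for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, difficulties))

# Hypothetical difficulties; the person answered only the easiest item
theta_hat = ml_ability([1, 0, 0], [-1.0, 0.0, 1.0])
print(round(theta_hat, 2))  # close to -0.80
```

In practice Newton-Raphson or EM is used rather than a grid, but the grid makes the "location of the maximum" idea concrete. Note that all-zero or all-correct patterns have no finite maximum, which is why such persons get NA abilities.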
FITTING ITEM PARAMETERS
We have various techniques. Item difficulties can be found from starting conditions (e.g. all θ = 0) and varying the difficulties iteratively, assuming:

Total score for person n = Σ Pni over the L items
Total score for item i = Σ Pni over the N persons

That is, for each person the sum of the success probabilities (the theoretical average scores) over the items should equal that person's observed number of correct items, and likewise for each item over the persons.
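For fixed abilities, the expected item total is monotone in the difficulty, so the item-total condition can be solved directly by bisection. A sketch using the starting condition from the slides (all θ = 0); the observed item total of 3 correct out of 10 persons is invented:

```python
import math

def rasch_p(theta, delta):
    """Rasch success probability."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def fit_difficulty(observed_total, abilities, lo=-6.0, hi=6.0):
    """Bisection on delta until the expected item total (sum of success
    probabilities over persons) matches the observed item total."""
    for _ in range(60):
        mid = (lo + hi) / 2.0
        expected = sum(rasch_p(t, mid) for t in abilities)
        if expected > observed_total:
            lo = mid  # expected total too high: the item needs to be harder
        else:
            hi = mid
    return (lo + hi) / 2.0

# Starting condition from the slides: all person abilities set to 0.
# The observed item total (3 correct of 10 persons) is invented.
abilities = [0.0] * 10
delta = fit_difficulty(3.0, abilities)
```

A full joint-maximum-likelihood fit would alternate this step with re-estimating the abilities until both sets of totals are matched simultaneously.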
OBSERVED RESPONSES
THEORETICAL RESPONSES
IRT AS ONE OF SEVERAL APPROACHES
Use IRT in conjunction with other analytic approaches, such as Factor Analysis (including CFA), correlation analysis and measures of internal consistency (e.g. Cronbach's alpha).
DIFFERENTIAL ITEM FUNCTIONING (DIF)
DIF (item bias) occurs where one group tends to answer an item differently, or to experience difficulty with it, not because its members have less knowledge of the subject matter, but because they hold different assumptions or have had different experiences.
Typically, the purpose of DIF analysis is to detect test items that are biased in favour of, or against, particular demographic groups (e.g. students of one gender, or from different ethnic or socioeconomic groups).
DIF
If two samples of candidates, representing different groups, matched with respect to their overall ability in a test, perform differently on a given item, then we have evidence that the item is biased. However, the presence of bias requires corroboration by professional judgement.
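One simple way to screen for DIF in this spirit is to stratify candidates by total score (a proxy for matched ability) and compare each group's proportion correct on the item within each stratum. This is only an illustrative sketch with invented toy data, not a full Mantel-Haenszel analysis:

```python
from collections import defaultdict

def dif_by_score_group(scores, group, item_resp):
    """For each total-score stratum, compare an item's proportion correct
    between two groups (coded 0 and 1). Consistent gaps across strata
    suggest DIF."""
    cells = defaultdict(lambda: {0: [], 1: []})
    for s, g, x in zip(scores, group, item_resp):
        cells[s][g].append(x)
    gaps = {}
    for s, by_group in sorted(cells.items()):
        if by_group[0] and by_group[1]:
            mean0 = sum(by_group[0]) / len(by_group[0])
            mean1 = sum(by_group[1]) / len(by_group[1])
            gaps[s] = mean1 - mean0  # positive: group 1 favoured at this score
    return gaps

# Toy data: at the same total score, group 1 succeeds more often on this item
scores    = [2, 2, 2, 2, 3, 3, 3, 3]
group     = [0, 0, 1, 1, 0, 0, 1, 1]
item_resp = [0, 0, 1, 1, 0, 1, 1, 1]
print(dif_by_score_group(scores, group, item_resp))
```

As the slide notes, a statistical gap like this is only evidence; judging whether the item is actually biased requires professional review of its content.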
DIF: GRAPHICAL DEPICTION
PRINCIPAL COMPONENTS ANALYSIS
INTERPRETATION OF PCA?
FIRST PC (most or all items): overall scale
SECOND PC: possibly item type (qualitative or quantitative responses)
THIRD PC: ?
ITEM LOADINGS ON THE PRINCIPAL COMPONENTS
ITEM    PC 1    PC 2
1       0.16    0.56
2       0.47   -0.21
3       0.19    0.34
4       0.48    0.06
5       0.20    0.23
6       0.65   -0.47
MEASURES OF INTERNAL CONSISTENCY
Inter-item Correlation, Item-Total Correlation and Cronbach's Alpha
Range between –1 and +1
Optimal between about 0.4 and 0.7
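These statistics are easy to compute directly. A sketch on a tiny made-up response matrix (toy data, not the dataset behind the slides); the item-total correlation here includes the item in the total:

```python
def cronbach_alpha(data):
    """Cronbach's alpha for rows = persons, columns = items
    (population variances)."""
    k = len(data[0])
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    totals = [sum(row) for row in data]
    item_vars = sum(var([row[i] for row in data]) for i in range(k))
    return (k / (k - 1)) * (1.0 - item_vars / var(totals))

def item_total_correlation(data, i):
    """Pearson correlation between item i and the total score."""
    xs = [row[i] for row in data]
    ys = [sum(row) for row in data]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Toy 4-person x 3-item dichotomous data
data = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
alpha = cronbach_alpha(data)
r1 = item_total_correlation(data, 0)
```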
INTER-ITEM AND ITEM-TOTAL CORRELATIONS
        Q1      Q2      Q3      Q4      Total
Q1     1.00    0.43    0.43    0.50    0.65
Q2     0.43    1.00    0.27    0.41    0.54
Q3     0.43    0.27    1.00    0.42    0.50
Q4     0.50    0.41    0.42    1.00    0.62
CONCLUSION
Many psychometric tests, assessments, surveys, agreement scales and rating scales assess a single dominant dimension (cognitive construct) and, for practical purposes, form psychometric scales. In conjunction with other approaches, IRT provides valuable analytic tools for many tests and assessments.