
A PSYCHOMETRIC REVIEW AND EVALUATION OF THE AQUESTT EVIDENCE-BASED ANALYSIS (EBA)
Yukina Chen & Matt Hastings
Data, Research and Evaluation
Nebraska Department of Education

OUTLINE

Section 1: Psychometric Analysis
Section 2: Recommendations For Revision From The Psychometric Analysis
Section 3: Revisions Conducted For The 2016-2017 EBA
Section 4: Recommendations For Incorporating EBA Results Into The Nebraska State Accountability System

PSYCHOMETRIC ANALYSIS

STRUCTURE OF THE EBA
The 2015 – 2016 EBA has two versions: school-level and district-level.

The EBA is designed to use five items to measure each of the six tenets:
• Positive Partnerships, Relationships, and Student Success (1-PPSS)
• Transitions (2-TRANS)
• Educational Opportunities and Access (3-EDOP)
• College and Career Readiness (4-CCR)
• Assessment (5-ASSESS)
• Educator Effectiveness (6-EDEFF)

RESPONSE OPTIONS FOR THE EBA
Never = 0
Seldom = 1
Sometimes = 2
Usually = 3

PSYCHOMETRIC EVALUATION

Purpose: To determine the measurement quality of a test or survey, usually by using a model-based approach.

Model used to conduct the psychometric assessment of the EBA: Item Response Theory (IRT) model.

Data used for the psychometric evaluation: 2015-2016 EBA responses, including both school-level responses (N = 1,127) and district-level responses (N = 242).

ITEM RESPONSE THEORY (IRT) MODEL
Item response theory is a theory of testing based on the relationship between (1) an individual's performance on a test item and (2) the test taker's level on the overall ability that the item was designed to measure.

It does not assume that each item is equally difficult. The model accounts for two types of parameters: person parameters and test item parameters.
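
To make the two-parameter idea concrete, here is a minimal sketch (not the model actually fit to the EBA, and with hypothetical item values) of a two-parameter logistic IRT model, in which each item carries its own difficulty b and discrimination a:

```python
import math

def p_positive(theta, a, b):
    """Two-parameter logistic IRT: probability of a positive response for a
    respondent at trait level theta on an item with discrimination a and
    difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical items with different difficulties: the model does not assume
# that items are equally difficult.
easy_item = {"a": 1.0, "b": -1.5}
hard_item = {"a": 1.0, "b": 1.0}

for theta in (-1.0, 0.0, 1.0):
    print(theta,
          round(p_positive(theta, **easy_item), 2),
          round(p_positive(theta, **hard_item), 2))
```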


KEY FINDINGS – OVERALL ESTIMATION AND MODEL FIT

Unidimensional Model
Purpose: To examine the extent to which the items thought to measure a single trait actually did so, by testing the assumption of unidimensionality within each trait (tenet).
Results: Good fit was achieved, as indicated by CFI measures, but local misfit was found for some traits.

KEY FINDINGS – OVERALL ESTIMATION AND MODEL FIT

Multidimensional Model
Purpose: To examine the multidimensional fit of all six traits (tenets) simultaneously.
Result: The six traits were significantly differentiable, as indicated by a significant decrease in model fit when a single-trait model was fit instead of the hypothesized six-trait model.

KEY FINDINGS – OVERALL ESTIMATION AND MODEL FIT

Single Higher-Order Trait Model
Purpose: To examine the fit of a model in which the six designed traits serve as lower-order indicators of a single higher-order trait.
Result: This model fits well; however, its fit is significantly worse than that of the six-trait model. In conclusion, there is enough evidence to support the six-trait multidimensional model as the best fit for the responses.

KEY FINDINGS – ITEM CHARACTERISTICS
α-value: the item discrimination slope, which indicates the strength of the relation of each item to the trait it measures; α = 0.7 indicates good item discrimination.
Problematic items in the school form:
• Item 5 from 1-PPSS
• Items 1 and 2 from 3-EDOP
• Items 4 and 5 from 4-CCR
• Item 5 from 6-EDEFF
Problematic items in the district form:
• Item 1 from 3-EDOP
• Item 5 from 6-EDEFF
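
As an illustration of how such flags might be produced (the slope values below are hypothetical placeholders, not the estimated EBA parameters), items can be screened against the 0.7 benchmark:

```python
# Hypothetical discrimination slopes for the five school-form 4-CCR items;
# the actual EBA estimates are not reproduced here.
ccr_slopes = {"item_1": 1.12, "item_2": 0.95, "item_3": 0.81,
              "item_4": 0.42, "item_5": 0.55}

# Flag items whose slope falls below the 0.7 benchmark for good discrimination.
flagged = {item: a for item, a in ccr_slopes.items() if a < 0.7}
print(flagged)  # {'item_4': 0.42, 'item_5': 0.55}
```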


KEY FINDINGS – SCHOOL ITEM CHARACTERISTICS
b_j indicates the trait level at which the probability of a response above category j equals 0.50.
• Most b_0 locations are 3 standard deviations below the mean of 0, meaning that the "never" response is largely unhelpful in measuring respondents.
• Most b_1 locations occur between 1 and 3 SD below the mean of 0 for each trait, indicating that the "seldom" response is helpful in measuring respondents with low trait levels.
• Most b_2 locations occur between 1 SD below the mean and 0.5 SD above the mean, indicating that the "usually" response is helpful in measuring mid-range respondents.
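
To show how a b_j threshold is read (a sketch under a graded response model, with hypothetical parameter values echoing the pattern above), the probability of responding above category j is exactly 0.50 when the trait level equals b_j:

```python
import math

def p_above(theta, a, b_j):
    """Graded response model boundary curve: probability of responding in a
    category above threshold j for a respondent at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b_j)))

# Hypothetical thresholds: b0 far below the mean, b1 low, b2 near the mean.
a = 1.2
thresholds = {"b0": -3.2, "b1": -1.8, "b2": -0.3}

for name, b_j in thresholds.items():
    at_threshold = p_above(b_j, a, b_j)   # always 0.50 at theta = b_j
    at_mean = p_above(0.0, a, b_j)        # probability for an average respondent
    print(name, round(at_threshold, 2), round(at_mean, 2))
```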


KEY FINDINGS – DISTRICT ITEM CHARACTERISTICS
Similar patterns can be found for the district version of the EBA. It is also worth noting that there is no response option available with which to measure high-trait respondents, which is a cause for concern.


KEY FINDINGS – ITEM CHARACTERISTICS
[Figure: reliability by trait level for each trait, with a dotted gray line marking the conventional 0.80 level of acceptable reliability.]
As shown, all traits have acceptable reliability from around 2.5 SD below the mean, but reliability becomes problematic for trait levels above roughly 0.5 SD above the mean.
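
As a sketch of how reliability can vary with trait level (using the standard IRT relation between test information and reliability for a unit-variance trait, rel(θ) = I(θ) / (I(θ) + 1); the information values below are hypothetical):

```python
def conditional_reliability(information):
    """Conditional reliability from test information, assuming the latent
    trait is scaled to unit variance: rel = I / (I + 1)."""
    return information / (information + 1.0)

# Hypothetical information values: higher at low trait levels, lower at high ones.
for theta, info in [(-2.5, 6.0), (0.0, 4.0), (0.5, 3.0), (1.5, 1.5)]:
    print(theta, round(conditional_reliability(info), 2))
# Reliability falls below the 0.80 benchmark once information drops under 4.
```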


KEY FINDINGS – LATENT TRAITS VERSUS ITEM MEANS
A latent trait estimate is a score calculated from the model, whereas an item mean is a score calculated directly from the response options. For example, for the school version of the EBA, the estimated latent trait for 1-PPSS is -0.827.


KEY FINDINGS – CONCLUSION
Support was found for six distinct traits, rather than one common trait, as hypothesized. Within each of the six traits, the five item responses did appear to be largely indicative of the trait they were designed to measure (i.e., largely unidimensional). However, other significant relationships were found within subsets of items for four of the six traits, indicating that these items also measured some additional, extraneous trait.


KEY FINDINGS – CONCLUSION
All item responses were significantly predicted by the trait the item was designed to measure (i.e., all items had significant positive discrimination slopes). However, the variability among the item slopes indicates that some items are better at measuring the traits than others. This finding means that model-estimated latent trait scores are more justifiable for characterizing respondents than are trait scores computed from item means.

The lowest response option, "never," was not selected for many of the items and does not appear to be a useful response option. In parallel, the highest response option, "usually," does not appear to extend far enough to capture high trait levels. Consequently, good reliability of measurement was achieved for lower-trait respondents, but not for higher-trait respondents (approximately the top 40%).


RECOMMENDED REVISIONS

RECOMMENDATIONS FOR REVISION FROM THE PSYCHOMETRIC ANALYSIS
Goal: To improve the EBA's reliability and the content coverage of its six traits.

Recommendation 1: Characterize respondents using model-estimated latent traits instead of item means or sums.

Recommendation 2: Revise the current four-option item response format to extend further into higher trait levels. Alternatively, consider more customized, item-specific response options or prototypes.

Recommendation 3: Clarify nouns without clear referents (e.g., "strategies," "processes") whenever possible. Collect information on respondent-specific referents when relevant to guide further item revision.

Recommendation 4: Reconsider the content of items with low discrimination or with redundant content (as identified by the need for method traits that predict additional correlation among those items).

Recommendation 5: Consider ways to collect additional evidence to support the accuracy and validity of the item responses, such as offering respondents a way to provide context for negative answers.

REVISIONS TO 2016 – 2017 EBA

REVISION WORK MADE TO 2016 – 2017 EBA RESPONSE OPTIONS

The 2016 – 2017 EBA offers a 5-point scale, compared to the 4-point scale used in the 2015 – 2016 EBA.

REVISION WORK MADE TO 2016 – 2017 EBA QUESTION STEM


REVISION WORK MADE TO 2016 – 2017 EBA CLARIFYING COMMENTS OPTION


REVISION WORK MADE TO 2016 – 2017 EBA UPLOADING DOCUMENTS OPTION


REVISION WORK MADE TO 2016 – 2017 EBA EBA QUESTION RUBRIC



REVISION WORK MADE TO 2016 – 2017 EBA REVISION REVIEW PROCESS

NDE Data, Research and Evaluation Team

Subject Matter Experts

AQuESTT Tenet Committee

Leadership Council

Educators


REVISION WORK MADE TO 2016 – 2017 EBA
Technical Assistance Guide
• Best Practices for Completing the EBA
• Unit of Analysis for the EBA
• Updates to the EBA
• Using the EBA Rubric
• Model of Best Practice
• Accessing the EBA on the Portal


RECOMMENDATIONS FOR INCORPORATING EBA RESULTS INTO THE NEBRASKA STATE ACCOUNTABILITY SYSTEM

MODEL BASED APPROACH
The state accountability system classifies schools and districts into one of four ordered categories: Needs Improvement, Good, Great, and Excellent.
Mixture models (latent class models) provide a method to determine whether data exhibit clusters (often called classes) of observations.
Since the categories are pre-defined, a confirmatory, four-class mixture model was used for each EBA trait (for both the school and district versions).
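
The sketch below illustrates the general idea of fitting a four-class mixture to trait scores; it uses scikit-learn's GaussianMixture as a stand-in and randomly generated scores, not the confirmatory model or data actually used for the EBA:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical EAP trait scores for one tenet (one value per school).
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(-1.2, 0.3, 300),
                         rng.normal(0.0, 0.3, 500),
                         rng.normal(1.0, 0.3, 300)]).reshape(-1, 1)

# Confirmatory flavor: fix the number of classes at four, one per
# accountability category, and start the class means at ordered locations.
model = GaussianMixture(n_components=4,
                        means_init=np.array([[-1.5], [-0.5], [0.5], [1.5]]),
                        random_state=0)
model.fit(scores)

# Classification probabilities: each row gives a school's probability of
# belonging to each of the four ordered categories.
class_probs = model.predict_proba(scores)
print(class_probs[:3].round(3))
```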

MODEL BASED APPROACH
Use the trait score rather than the mean score. Each trait score is an Expected A Posteriori (EAP) estimate: the mean of the distribution of all possible scores for the trait for a specific school or district.
Example comparisons were made for the school 1-PPSS tenet. The data from last year suggest that the fourth class cannot be identified from such an analysis.
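
For reference, a minimal numerical sketch of an EAP estimate (using a hypothetical posterior over a grid of trait values, not the actual EBA posteriors):

```python
import numpy as np

# Grid of trait values spanning the scale and a hypothetical posterior
# distribution for one school on one tenet.
theta = np.linspace(-4, 4, 81)
posterior = np.exp(-0.5 * ((theta + 0.8) / 0.4) ** 2)  # peaked near -0.8
posterior /= posterior.sum()

# EAP estimate: the mean of the distribution of all possible trait scores.
eap = float(np.sum(theta * posterior))
print(round(eap, 3))  # approximately -0.8
```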

MODEL BASED APPROACH - TRANSFORMING CLASSIFICATION PROBABILITY ESTIMATES INTO EXPECTED ACCOUNTABILITY SCORES
The classification probabilities shown in the previous section can be used to calculate an expected accountability score for each trait for use in the state's accountability formula.
For example, assign the weights 1, 2, 3, and 4 to the Needs Improvement, Good, Great, and Excellent categories, respectively. The expected accountability score for each school or district is then the sum, across all four classes, of each class's classification probability times its numeric weight.


MODEL BASED APPROACH - TRANSFORMING CLASSIFICATION PROBABILITY ESTIMATES INTO EXPECTED ACCOUNTABILITY SCORES
Example:
• A district had classification probabilities of .045, .907, .048, and 0 for the four accountability categories on the 5-ASSESS trait.
• Using the 1, 2, 3, 4 numeric weighting of classes, the district's expected accountability score would be .045*1 + .907*2 + .048*3 + 0*4, or 2.003.
• The deviation from a value of 2 (Good) comes from the district having slightly more plausible values in the "Great" range than in the "Needs Improvement" range.
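
The same arithmetic, sketched in code (the probabilities and weights are taken directly from the example above):

```python
# Classification probabilities for the 5-ASSESS trait, in category order:
# Needs Improvement, Good, Great, Excellent.
probs = [0.045, 0.907, 0.048, 0.0]
weights = [1, 2, 3, 4]

# Expected accountability score: probability-weighted sum of the category weights.
expected_score = sum(p * w for p, w in zip(probs, weights))
print(round(expected_score, 3))  # 2.003
```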


MODEL BASED APPROACH - CONCLUSION
• Choose cut scores that reflect specific quantiles of the distribution of each scaled score (a minimal sketch of this step follows below).
• For each EBA trait, estimate classification probabilities using plausible values.
• Weight each classification category numerically.
• Create expected accountability scores for each trait using the numeric classification weights and the estimated classification probabilities.
• Use all six expected accountability scores in the accountability formula.
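
As a sketch of the cut-score step (the quantile choices and trait scores below are hypothetical; the actual cut points would be a policy decision):

```python
import numpy as np

# Hypothetical EAP trait scores for one tenet across all schools.
rng = np.random.default_rng(1)
trait_scores = rng.normal(0.0, 1.0, 1127)

# Cut scores at chosen quantiles of the scaled-score distribution,
# splitting schools into four ordered categories.
cut_scores = np.quantile(trait_scores, [0.25, 0.60, 0.90])
print(cut_scores.round(2))
```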


QUESTIONS?
Please feel free to contact [email protected] if questions arise about the 2016 – 2017 EBA process.
