Investigating Test Anomalies
Andrew Wiley, Ph.D.
Tracey R. Hembry, Ph.D.
October 27, 2015
Alpine Testing Solutions - Our Background
» Psychometric and assessment design consultative services
» Approximately 45 employees working in areas such as educational assessment, professional certification, IT certification, and credential management
» Partnered with edCount to complete this project in Florida over the summer of 2015
Overview
» A Case Study: Florida Standards Assessment (FSA)
  • An independent evaluation of the psychometric validity of the FSA
  • Today: evaluation of test administration policies and practices
» Key Takeaways
  • Guidance for support from your vendors
  • Recommendations for managing test anomalies
FSA Background
» FSA administered for the 1st time in 2014-15
  • Statewide assessments for Math and ELA
  • High stakes for students, teachers, schools, and districts
  • Both computer-based testing (CBT) and paper-and-pencil (PP) formats
» DDoS attacks on multiple dates
» Additional CBT system issues occurred
» State legislature mandated an independent study of the program
FSA Validation Study
» Legislation required six specific studies of:
  • Test Items
  • Field Testing
  • Test Blueprints and Construction
  • Test Administration
  • Scaling, Equating, Scoring
  • Specific Psychometric Validity Questions
Overview of Validation Work
» Focus on intended uses of test scores
» Define the nature of test administration challenges
» Collect and review available documentation
» Discuss program details with staff and stakeholders
» Collect additional data, as needed
» Compare work and procedures to the Test Standards
» Consider realities and constraints of the program
FSA 2014-15 Test Administration
Define the extent of the administration-related issues:
» DDoS attacks → Delayed testing
» Log-in system issues → Delayed testing
» Movement across test sessions → Impacted standardization
» Test interruptions → Impacted standardization
» Other 1st-year adjustments and learning → Impacted standardization
Key Questions
» How many students were impacted by administration issues?
» What was the impact of the administration issues on student scores?
Both questions were considered for each of the administration issues identified.
Data Sources
» Student testing data
» Feedback from district test administrators
» Other sources
Student Testing Data
» How many?
  • Students who lost work
  • Students who completed multiple sessions in one day
  • Students active in the same session across multiple days
» Impact?
  • Relationship between current- and previous-year scores
  • Calibration results with and without the identified students
When possible, data were considered by test use/level of aggregation (see the sketch below).
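As a minimal sketch of this flagging-and-comparison logic, assume a hypothetical session event log (session_events.csv with columns student_id, session_id, event_date) and a score file (scores.csv with current- and prior-year scale scores); the file names, columns, and thresholds are illustrative assumptions, not the actual FSA data layout.

```python
# Sketch: flag students from a hypothetical session event log, then compare
# the current-/prior-year score relationship for flagged vs. unflagged
# students. File names and columns are illustrative assumptions.
import pandas as pd

events = pd.read_csv("session_events.csv", parse_dates=["event_date"])

# Students who completed multiple sessions in one day: more than one
# distinct session_id on a single calendar date.
per_day = events.groupby(["student_id", "event_date"])["session_id"].nunique()
multi_sessions_one_day = set(per_day[per_day > 1].index.get_level_values("student_id"))

# Students active in the same session across multiple days: more than one
# distinct event date within a single session_id.
per_session = events.groupby(["student_id", "session_id"])["event_date"].nunique()
same_session_multi_days = set(per_session[per_session > 1].index.get_level_values("student_id"))

flagged = multi_sessions_one_day | same_session_multi_days

# Impact check: compare the current-/prior-year score correlation with and
# without the identified students.
scores = pd.read_csv("scores.csv")  # columns: student_id, score_2015, score_2014
scores["flagged"] = scores["student_id"].isin(flagged)
for is_flagged, grp in scores.groupby("flagged"):
    r = grp["score_2015"].corr(grp["score_2014"])
    print(f"flagged={is_flagged}: n={len(grp)}, r={r:.3f}")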
Feedback from Districts
» Online survey
  • Estimated % of students impacted, by content area (one way to roll these estimates up is sketched below)
» 3 focus group meetings
  • Qualitative data about various administration issues
    - What happened
    - Reactions of students
    - Potential impacts on student scores
» Some type of input from ~70% of FL districts (53 of 76)
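A minimal sketch of rolling district survey estimates up to a statewide figure, assuming hypothetical columns (district, content_area, pct_impacted, n_tested); the volume-weighted scheme is an assumption for illustration, not the study's actual method.

```python
# Sketch: aggregate district-level survey estimates into a statewide figure,
# weighting each district by its testing volume. File name, columns, and
# weighting scheme are illustrative assumptions.
import pandas as pd

survey = pd.read_csv("district_survey.csv")
# columns: district, content_area, pct_impacted, n_tested

def weighted_pct(grp: pd.DataFrame) -> float:
    # Larger districts contribute proportionally more to the estimate.
    return (grp["pct_impacted"] * grp["n_tested"]).sum() / grp["n_tested"].sum()

statewide = (survey.groupby("content_area")[["pct_impacted", "n_tested"]]
                   .apply(weighted_pct))
print(statewide)  # estimated % of students impacted, by content area
```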
Other Data Sources
» Test Administration Manuals
» Training Materials
» Help Desk logs
» FLDOE communication with districts
Missing/Unavailable Data
» Degree of delayed testing
  • No tracking of unsuccessful log-ins (due to DDoS and system issues)
  • No data to compare scheduled vs. actual testing
» Measure of impact on student motivation
» Prevalence of unreported issues
Challenge - Amalgamation of Information
» For each known administration issue
  • Considered the key questions of impact: both how many students were affected and the degree of impact on their scores
  • Collected information from various sources, including student test data and district representatives
» Then had to combine the information to gauge the more global impact (a sketch of this step follows)
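One way to picture the amalgamation step, as a minimal sketch: the issue names come from the slides, but the counts and severity coding below are placeholder values invented for illustration, not findings from the actual FSA study.

```python
# Sketch: combine per-issue evidence from two sources into one summary view.
# All numeric values and the 1-5 severity coding are placeholders.
import pandas as pd

# Per-issue counts derived from student test data (hypothetical values).
student_data = pd.DataFrame({
    "issue": ["Movement across sessions", "Test interruptions"],
    "students_flagged": [1500, 900],
})

# Per-issue severity ratings summarized from district feedback
# (hypothetical coding: 1 = minor ... 5 = significant).
district_feedback = pd.DataFrame({
    "issue": ["DDoS attacks", "Log-in system issues",
              "Movement across sessions", "Test interruptions"],
    "district_severity": [5, 4, 2, 3],
})

# An outer merge keeps issues visible to only one source (e.g., delayed
# testing from DDoS attacks left no trace in the student test data).
summary = student_data.merge(district_feedback, on="issue", how="outer")
print(summary.sort_values("district_severity", ascending=False))
```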
Challenge - Consistency of Feedback
[Figure: a scale of the degree of impact/potential impact on scores, running from "Minor" to "Significant". Student data pointed toward the "Minor" end, district feedback toward the "Significant" end, with the reality ("Reality?") somewhere in between.]
Recommendations
» Set realistic expectations for an investigation
  • Validity is NOT yes/no; it is a matter of degree
  • Validity attaches not to the test, but to a particular use of a test score
» Test scores don't communicate every aspect of testing
  • Impacts of student motivation and divergences from standardization are not readily discerned
» When abnormalities are encountered, validity should not be presumed
  • Evidence is needed (Test Standards, 2014)
  • No evidence to invalidate scores ≠ validity evidence
Proactive Planning
» Some things to consider during vendor selection
  • Availability of essential data (you cannot measure and track everything)
    - Examples of previous investigations
    - Other state investigations
  • Contingency planning needs to be embedded during contract negotiation, with clear responsibilities defined
» Investigations of impact should begin ASAP
» Document issues as thoroughly as possible
Questions?
» Andrew Wiley, Director of Education Services
  [email protected]
» Tracey Hembry, Psychometrician II
  [email protected]