Chapter 5 – Reliability and Validity

Random and Systematic Error
• Random error: chance fluctuations in measurement that influence scores on measured variables
  o Can arise from misreading or misunderstanding the questions, measuring participants on different days or in different places, misprinting of questions, or misreporting of answers
  o Random error is self-canceling: some errors increase some people's scores and other errors decrease other people's scores, so the errors cancel each other out (see the sketch after this list)
• Systematic error: the influence on a measured variable of other conceptual variables that are not part of the conceptual variable of interest
  o Systematic errors do not cancel out over time and are therefore a threat to construct validity
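To see why random error is self-canceling while systematic error is not, here is a minimal NumPy simulation (all numbers hypothetical): zero-mean noise leaves the group mean essentially unchanged, while a constant bias shifts every score.

```python
import numpy as np

rng = np.random.default_rng(0)

true_scores = rng.normal(loc=50, scale=10, size=10_000)  # hypothetical true scores
random_error = rng.normal(loc=0, scale=5, size=10_000)   # zero-mean chance fluctuations
systematic_bias = 3.0                                    # constant bias, e.g. a leading question

observed_random = true_scores + random_error
observed_biased = true_scores + systematic_bias

# Random error averages out across people: the mean is nearly unchanged.
print(true_scores.mean(), observed_random.mean())  # ~50.0 vs ~50.0
# Systematic error does not cancel: every score is shifted by the bias.
print(observed_biased.mean())                      # ~53.0
```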
Reliability
• Reliability: the extent to which a measure is free from random error
  o The best way to assess reliability is to test measures repeatedly
• Test-retest reliability: the extent to which scores on the same measured variable correlate with each other on two different measurements given at two different times
  o The correlation between scores would be r = +1.00 for a perfectly reliable test (see the sketch after this list)
• Retesting effects: reactivity that occurs when responses on the second administration are influenced by respondents having been given the same or similar measures before
  o Participants may answer the same questions differently because they believe the experimenter wants different opinions, may get bored, or may deliberately answer exactly the same way; all of these would affect the test-retest reliability
• Equivalent-forms reliability: the extent to which scores on similar, but not identical, measures administered at two different times correlate with each other
  o Equivalent forms of a test may be administered to reduce retesting effects
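A minimal sketch of computing test-retest reliability (made-up scores for eight hypothetical respondents): the reliability estimate is simply the Pearson correlation between the two administrations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical scores for the same 8 respondents at two testing sessions.
time1 = np.array([12, 18, 9, 15, 20, 11, 16, 14], dtype=float)
time2 = time1 + rng.normal(loc=0, scale=1.5, size=time1.size)  # same trait plus random error

# Test-retest reliability is the Pearson correlation between the two administrations.
r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r:.2f}")  # close to, but below, +1.00 because of random error
```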
Reliability as Internal Consistency
• Traits: personality variables that are not expected to vary within people over time
• States: personality variables that are expected to change within the same person over short periods of time
  o The test-retest approach is not effective for measuring states because states are expected to change within a person
  o Because of the problems with test-retest and equivalent-forms reliability, internal consistency is used more often
• Internal consistency: the extent to which the scores on the items of a scale correlate with each other, usually assessed using coefficient alpha
• True score: the part of a scale score that is not random error
  o Although each item contains a certain amount of random error, each also contains a part that assesses the individual's true score
  o Because random error is self-canceling, the random-error components of the items will not correlate with each other, but the true-score components will
• Split-half reliability: a measure of internal consistency that involves correlating the respondents' scores on one half of the items with their scores on the other half of the items
  o This is a limited approach; a measure that uses the correlations among all of the items is best
• Cronbach's coefficient alpha: an estimate of the average correlation among all of the items on the scale; it is numerically equivalent to the average of all possible split-half reliabilities and ranges from 0.00 to +1.00 (error-free)
  o When a new scale is developed, the items are examined with item-to-total correlations: the correlation between the score on each individual item and the total scale score excluding that item (see the sketch after this list)
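A minimal NumPy sketch (made-up responses, not from any real scale) of coefficient alpha and item-to-total correlations, using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the total score):

```python
import numpy as np

# Hypothetical responses: 6 respondents x 4 scale items.
items = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 5, 5],
], dtype=float)

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)        # variance of each item
total_var = items.sum(axis=1).var(ddof=1)    # variance of the total scale score

# Standard formula for Cronbach's coefficient alpha.
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"alpha = {alpha:.2f}")

# Item-to-total correlations: each item vs. the total score excluding that item.
for i in range(k):
    rest = items.sum(axis=1) - items[:, i]
    r = np.corrcoef(items[:, i], rest)[0, 1]
    print(f"item {i + 1}: item-total r = {r:.2f}")
```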
• Interrater reliability: the extent to which the ratings of one or more judges correlate with each other
  o Coefficient alpha can be used when the judges' ratings are quantitative, but if the ratings are nominal, kappa should be used
  o Kappa ranges from 0.00 to +1.00 (error-free); see the sketch below
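A minimal sketch of Cohen's kappa for two hypothetical judges assigning nominal categories: kappa corrects the observed agreement for the agreement that would be expected by chance alone.

```python
# Hypothetical nominal ratings from two judges for 10 observed behaviors
# (categories: "agg" = aggressive, "neu" = neutral, "fri" = friendly).
judge1 = ["agg", "neu", "fri", "neu", "agg", "fri", "neu", "agg", "fri", "neu"]
judge2 = ["agg", "neu", "fri", "agg", "agg", "fri", "neu", "neu", "fri", "neu"]

categories = sorted(set(judge1) | set(judge2))
n = len(judge1)

# Observed agreement: proportion of cases where the judges match.
p_observed = sum(a == b for a, b in zip(judge1, judge2)) / n

# Expected agreement: the chance the judges would match if each rated
# at random according to their own category proportions.
p_expected = sum(
    (judge1.count(c) / n) * (judge2.count(c) / n) for c in categories
)

kappa = (p_observed - p_expected) / (1 - p_expected)
print(f"kappa = {kappa:.2f}")
```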
Construct Validity
• Construct validity: the extent to which a measured variable actually measures the conceptual variable that it is designed to assess
• Face validity: the extent to which the measured variable appears to be an adequate measure of the conceptual variable
• Content validity: the extent to which the measured variable appears to adequately cover the full domain of the conceptual variable
• Convergent validity: the extent to which a measured variable is found to be related to other measured variables designed to measure the same conceptual variable
• Discriminant validity: the extent to which a measured variable is found to be unrelated to other measured variables designed to measure other conceptual variables
  o Face validity can produce reactivity and is therefore not always desirable
• Nomological net: the pattern of correlations among a group of measured variables that provides evidence for the convergent and discriminant validity of the measures (see the sketch after this list)
  o Using physiological measures, self-report measures, and observer-report measures together, for example, can create a nomological net
  o When validity is assessed by correlating a self-report measure with a behavioral measured variable, the behavioral variable is called a criterion variable and the correlation is an assessment of the self-report measure's criterion validity
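A minimal sketch (simulated data, hypothetical constructs) of how a nomological net supports convergent and discriminant validity: two measures of the same construct should correlate highly with each other, and both should correlate weakly with a measure of a different construct.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Hypothetical data: two measures of the same construct (anxiety) and one
# measure of an unrelated construct (vocabulary).
anxiety = rng.normal(size=n)
anxiety_self_report = anxiety + rng.normal(scale=0.5, size=n)
anxiety_physio = anxiety + rng.normal(scale=0.5, size=n)
vocabulary = rng.normal(size=n)

corr = np.corrcoef([anxiety_self_report, anxiety_physio, vocabulary])
print(np.round(corr, 2))
# High correlation between the two anxiety measures -> convergent validity;
# near-zero correlations with vocabulary -> discriminant validity.
```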
  o Criterion validity is known as predictive validity when it involves attempts to foretell the future (e.g., using a measure of job aptitude to predict how well a new employee will perform) and as concurrent validity when it involves the relationship between a self-report measure and a behavioral measure assessed at the same time

Improving the Reliability and Validity of Measures
• Conduct a pilot test: try out a questionnaire or other research on a small group of individuals to get an idea of how they react to it
• Use multiple measures
• Ensure variability within your measures
• Write good items that are not too long or confusing
• Attempt to make items nonreactive
• Use existing measures when possible
• If a measure is not reliable, its construct validity cannot be determined