Internal and external validity

Internal and external validity Research strategies -

Results vs. interpretation

-

Internal validity

-

External validity

-



The “undergraduate participant” problem



Cross-species generalizations (Smith, Minda, & Washburn, 2004)



The “But it’s not real life!” argument

Research strategies

Results and interpretation -

Analyze the data interpret the data

-

Data analysis and data interpretation are two completely different things.

-

Researchers can (and often do) come to different interpretations starting from identical data.

Interpretation example: Fictional clinical psychology experiment -

A clinical psychologist decides to evaluate the efficacy of the medication R22 in treating anxiety. •

40 participants are randomly assigned to one of two conditions: R22 or placebo



The experiment uses a double-blind procedure



Pre-treatment assessments show that both groups have a similar level of anxiety.



Finally, a post-treatment measure of anxiety is taken
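The random assignment step of this fictional design can be sketched in a few lines of Python. This is only an illustration: the participant IDs, the fixed seed, and the equal split of 20 per condition are assumptions (the example only states that 40 participants were randomly assigned to R22 or placebo).

```python
import random

def assign_conditions(participant_ids, seed=None):
    """Randomly split participants into two equal-sized groups.

    Assumption: equal group sizes. The original example only states
    that 40 participants were randomly assigned to R22 or placebo.
    """
    rng = random.Random(seed)          # local generator, reproducible with a seed
    shuffled = list(participant_ids)   # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"R22": shuffled[:half], "placebo": shuffled[half:]}

# Hypothetical participant IDs 1..40
groups = assign_conditions(range(1, 41), seed=2024)
print(len(groups["R22"]), len(groups["placebo"]))
```

Because chance alone decides who lands in which group, pre-existing differences tend to balance out across conditions, which is why similar pre-treatment anxiety levels are expected.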

Fictional study results

-

Eight participants in the R22 condition improved, compared to only four in the placebo condition. Therefore, R22 is an effective treatment for anxiety.

-

The majority of participants who withdrew did so because the experimental situation made them feel anxious!

Conclusions -

Attrition: A loss of participants during a study that may lead to biased results.

-

The data for the participants who completed the study were not in question. The interpretation was, however.

-

Any component of a research study that raises doubts about the quality of the research process or the interpretation of the research results is a threat to validity.

-

Attrition is more likely when a study runs for a long time.

-

In research where attrition is an issue, researchers go to great lengths to find out who dropped out and whether there is something special about them. Dropouts are a concern because it is unknown whether they would have changed the final results.

Internal validity -

Internal validity: The extent to which a research study produces a single, unambiguous explanation for the relationship between two variables.

-

How strongly do the results allow you to defend your interpretation, your hypothesis, and your theory?

-

Measurement validity ≠ (doesn’t equal) study validity

-

Different factors are always at play simultaneously. Ideally, you would be able to consider all of them.

Guy’s corollary of Stanovich’s (2007) connectivity principle -

When interpreting research results, look for the straightforward, common “scientific” sense, and boring explanations before putting forward a spectacular one. •

Attrition



Environmental variables



Assignment bias



History



Maturation



Instrumentation



Testing effects



Regression towards the mean

Environmental variables -

Any changes in the testing environment (people, places, time) may have an impact on the results, and thus, on internal validity. •

People



Places



Times

-

Even in a routine experiment in which participants come into the lab (e.g., a learning experiment), environmental variables are immediately present: some people are tested in the morning, others in the evening. What about temperature? The morning is cool and the afternoon is unbearably hot.

-

You can worry about these effects; however, most of the time they do not matter, because we do not assign participants to experimental conditions in a way that makes them matter. That is, the morning participants go through the same conditions as the afternoon participants.

The effect of context on memory -

Godden and Baddeley (1975). The context in which learning takes place influences memory.

-

16 experienced divers learned lists of 40 words in one of two environments and recalled them in one of the two environments.

-

They found that recall is better when words are recalled in the same environment in which they were learned; that is, performance is better when learning and recall conditions match. They call it episodic cue encoding: the environment serves as a cue for recall.

Godden and Baddeley (1975) results

Assignment bias -

A threat to internal validity that occurs when the process used to assign different participants to different treatments produces groups of individuals with noticeably different characteristics. •

-

Private vs. public school debates (e.g. Lubienski & Lubienski, 2006).

Example: a private-school principal says that the students are doing really well on standardized tests and that this is because the teachers are doing well.

-

What can lead to these different results? •

Socioeconomic status and home environment



The teachers could be better. To test this, the demographics of the children must be the same; only then can you test whether the teachers differ.



Ask yourself… if there's a better or different explanation.

History -

A threat to internal validity from any outside event that occurs during the time that a research study is being conducted and has an influence on the participants’ scores.

-

Bahrick (1984). Semantic Memory Content in Permastore: Fifty Years of Memory for Spanish Learned in School

-

773 participants were tested from 1 to 50 years after taking Spanish course(s) at the university level.

Results -

Even 50 years after the courses ended, a substantial proportion of the participants’ knowledge of Spanish remains.

Results -

The difference in performance between students who obtained As and those who obtained Cs remains constant for up to 50 years.

Bahrick’s interpretation -

Bahrick wanted to see how well people could remember content from academic courses after the passage of time

-

He tested people who had either just finished the exam or had taken the same class up to 50 years earlier

-

He used recognition tests, production tests, short reading passages, etc.: exercises that students would have had to learn in class or complete on a test

-

Is there a threat to validity (history)?

-

“In addition to taking a test of knowledge of Spanish, subjects completed a questionnaire designed to provide information about Spanish instruction; grades obtained in Spanish courses; and various opportunities to read, write, speak, or listen to Spanish and other Romance languages during the retention interval”(p.3).

Results -

After 50 years, people still remembered a certain amount of Spanish. A lot of forgetting occurs early on (the first 5-7 years); after about 7 years, performance levels off.

-

The difference in performance between students who obtained A’s and those who obtained C’s remains constant for up to 50 years.

Permastore

-

Permastore: State of knowledge that remains unchanged for 25 years or more following a 0 to 6 year period of accelerated decline.

-

Almost none of the participants used Spanish again between taking the exam and being retested

Maturation -

A threat to internal validity from any physiological or psychological changes that occur in a participant during the time that a research study is being conducted and that can influence the participant’s scores. •

Especially important at both ends of the life-span (research with infants, children, and the elderly).



Try to find a control group that is similar to the experimental group. E.g., for a 67-year-old man with a university degree, find a man with the same characteristics.

Instrumentation -

-

A threat to internal validity from changes in the measurement instrument that occur during the time a research study is being conducted. •

Grading



Observational studies

The easiest example is grading. Imagine a history professor teaching a large second-year class, with 200 exams of 10 pages of writing each, all to be graded in one week. Will that person grade the 1st, the 100th, and the 200th exam in the same way? •

To reduce this problem, look for very specific things and force yourself to go through every grading sheet and answer.

Testing effects -

A threat to internal validity that occurs when participants are exposed to more than one treatment and their responses are affected by an earlier treatment. •

Practice



Fatigue



Carry-over: learning something in one condition and transferring that knowledge to another condition

Regression toward the mean -

A statistical phenomenon in which extreme scores (high or low) on a first measurement tend to be less extreme on a second measurement.
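The definition above can be illustrated with a small simulation. This is a sketch under stated assumptions, not data from any study: each simulated person has a stable "true" score, and each measurement adds independent random noise. The trait mean (100), the standard deviations (10), and the 10% cut-off are all made-up values.

```python
import random
from statistics import mean

def simulate_regression(n=10_000, top_fraction=0.10, seed=1):
    """Simulate two noisy measurements of the same stable trait.

    All parameters (trait mean 100, SD 10, noise SD 10) are
    illustrative assumptions, not values from the lecture.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(100, 10) for _ in range(n)]
    test1 = [t + rng.gauss(0, 10) for t in true_scores]
    test2 = [t + rng.gauss(0, 10) for t in true_scores]

    # Select the people with the most extreme (highest) first scores.
    k = int(n * top_fraction)
    top = sorted(range(n), key=lambda i: test1[i], reverse=True)[:k]

    return {
        "overall_mean": mean(test1),
        "top_first": mean(test1[i] for i in top),
        "top_second": mean(test2[i] for i in top),
    }

result = simulate_regression()
print(result)
```

Nothing happens to the selected people between the two measurements, yet their second average falls back toward the overall mean while staying above it. Part of what made their first score extreme was noise, and the noise does not repeat.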

-

When you get an extreme score, the probability of getting an even more extreme score is remote; the next score will most likely fall closer to the middle. •

Galton (1886) – Regression towards mediocrity in hereditary stature.



My “psychic” experiment



Sophomore slumps: Sports commentators talk about this. Players have a fantastic first season, and then in their second season they do not do as well. If you have a very good season, it is hard to have a better one the next season.

Guy’s corollary of Stanovich’s (2007) connectivity principle -

When interpreting research results, look for the straightforward, common “scientific” sense, and boring explanations before putting forward a spectacular one.

-

Attrition, environmental variables, assignment bias, history, maturation, instrumentation, testing effects, regression towards the mean

Conclusions about threats to internal validity -

-

Experimental psychologists are trained to: •

catch these threats in non-scientific reports



avoid or minimize these threats in their research

Considering that it is rare that experimental psychologists commit blatant methodological errors that threaten internal validity, then how are the more subtle errors caught?

Replications -

We catch errors by replicating and expanding published research studies.

-

Extraneous variable: Any variable that exists within a study other than the variables being studied. •

A problem if it turns into a confounding variable

-

Confounding variable: An extraneous variable that is allowed to change systematically along with the variables being studied and that threatens internal validity. •

-

The variable you are studying changes systematically with other outside variables

If you can keep getting the same results then there is a good chance the original study was accurate. If it can be replicated in other areas or places, then the environment doesn’t really act as a variable.
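The difference between a harmless extraneous variable and a confounding variable can be made concrete with a small simulation. This is a sketch with made-up numbers: the treatment is assumed to have no true effect at all, and being tested in the morning is assumed to add 5 points to every score. The only thing that changes between the two runs is whether time of day varies systematically with condition.

```python
import random
from statistics import mean

def mean_difference(confounded, n=400, seed=7):
    """Return mean(treatment) - mean(control) when the treatment has NO true effect.

    Assumption: being tested in the morning adds 5 points to every score.
    """
    rng = random.Random(seed)
    treatment, control = [], []
    for _ in range(n):
        morning = rng.random() < 0.5
        if confounded:
            # Time of day changes systematically with condition.
            in_treatment = morning
        else:
            # Condition is assigned at random, independent of time of day.
            in_treatment = rng.random() < 0.5
        score = rng.gauss(50, 1) + (5 if morning else 0)
        (treatment if in_treatment else control).append(score)
    return mean(treatment) - mean(control)

diff_confounded = mean_difference(confounded=True)
diff_randomized = mean_difference(confounded=False)
print(round(diff_confounded, 2), round(diff_randomized, 2))
```

In the confounded run, the groups differ by roughly the size of the time-of-day effect even though the treatment does nothing. In the randomized run, time of day is still present as an extraneous variable, but the difference stays close to zero.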

External validity -

-

The extent to which we can generalize the results of a research study to people, settings, times, measures, and characteristics other than those used in that study. •

Generalization from one study to another



From a given sample to the general population



From the laboratory to the real world.

Another way of saying it: the ability to generalize a result from a given situation to another situation

Generalization from one study to another -

To what extent will the results obtained in one study be also obtained in another similar study? •

Novelty (Clark, 1994): Clark says that technology is like a truck that delivers something; the truck itself is not that important, because the results will be the same. What can happen is that kids sit in front of a computer, get excited, and momentarily work harder because it is cool. The improvement has nothing to do with the technology and everything to do with the novelty. •

Experimenter characteristics (e.g. Rosenthal)



Reactivity (e.g., social desirability): the participant guesses what the researcher wants to hear

-

These potential problems are as much a threat to internal validity as they are to external validity.

Sample to population -

To what extent do the research results obtained with a sample of individuals generalize to the population? •

Selection bias



Undergraduate participants



Cross-species generalizations



Similar to surveys and statistical information but can ask at conceptual level

The “undergraduate participant” problem -

Premise 1. The undergraduate population has specific characteristics (transitional period of life, higher SES, higher IQ…) (Winter, North, & Sugar, 2001)

-

Premise 2. The majority of experiments in psychology are conducted with undergraduate participants.

-

Conclusion. The results of psychological research do not generalize to the general population.

Replies to the “undergraduate participant” problem -

The “undergraduate participant” problem may not be a problem (Bodner, 2006).

-

In some areas of research, the problem does not apply (e.g., perception).

-

Often, researchers seek to establish a phenomenon at a low cost (or with greater ease).

-

Research with undergraduates may sometimes call for replications with other populations, but this does not invalidate the findings per se.

Cross-species generalization -

-

Does the research conducted with animals generalize to humans? •

Visual system (Hubel & Wiesel, 1959)



Operant conditioning (Skinner, 1956)



Pharmaceutical research

Comparative psychology involves the use of a comparative method, in which similar studies are carried out on animals of different species, and the results interpreted in terms of their different phylogenetic or ecological backgrounds.



Comparing humans to animals

Shepard, Hovland, and Jenkins (1961) -

Learning and memorization of classification.

-

Goal. Determine the relation between classification learning difficulty and the assignment of stimuli to categories.

-

Shepard et al. created 8 stimuli from 3 attributes with binary values (2^3 = 2 x 2 x 2 = 8). How can these stimuli be combined to form two categories?
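The counting behind this question can be sketched in Python. The attribute names below are placeholders chosen for illustration. The code builds the 8 stimuli, lists every way to put 4 of them in category A, and then groups those assignments into structural types. Two assignments count as the same type when one can be turned into the other by renaming attributes or by swapping an attribute's two values.

```python
from itertools import combinations, permutations, product

# Three binary attributes; the attribute names are illustrative assumptions.
ATTRIBUTES = ("color", "shape", "size")
stimuli = list(product((0, 1), repeat=len(ATTRIBUTES)))   # 2**3 = 8 stimuli

# Every way to put 4 of the 8 stimuli in category A (the rest go in B).
assignments = list(combinations(stimuli, 4))

def canonical(category_a):
    """Smallest relabelling of a category under attribute swaps and value flips."""
    forms = []
    for order in permutations(range(3)):            # rename the attributes
        for flips in product((0, 1), repeat=3):     # swap the two values of an attribute
            forms.append(tuple(sorted(
                tuple(s[order[d]] ^ flips[d] for d in range(3))
                for s in category_a
            )))
    return min(forms)

structural_types = {canonical(a) for a in assignments}
print(len(stimuli), len(assignments), len(structural_types))
```

The 70 possible assignments collapse into six structurally distinct types, which correspond to the Types I to VI used in the procedure.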

Procedure -

Participants learned the correct category assignment in counter-balanced blocks: •

Type I (5 problems)



Type II (5 problems)



Type III, IV, or V (5 problems)



Type VI (5 problems)

-

Induction learning instructions were given.

-

The learning criterion was 32 consecutive correct trials.

Results

Interpretation -

The stimulus-to-category assignments influenced the difficulty of the category learning task.

-

I < II < III = IV = V < VI

-

Are the participants really using verbal rules to categorize the stimuli or are they learning associations?

-

Monkeys can help because they do not have language, so they learn associatively

Smith, Minda, and Washburn (2004) -

Category learning in Rhesus Monkeys: A study of the Shepard, Hovland, and Jenkins (1961) tasks

-

Goal: To compare the performance of humans and monkeys on Shepard et al.’s (1961) induction tasks. Is the ability to verbalize categorization rules responsible for human performance?

-

Only humans can learn verbally; thus, if humans are learning verbally, the monkeys’ performance should differ, because monkeys learn associatively.

Battle of the primates!

Results -

The monkeys needed 10 times more practice than the humans to achieve a similar success rate.

-

Humans caught on quickly; if learning is associative, Type II will be as difficult as Type VI.

-

Verbal: using verbal rules

-

Associative: learning the associations between stimuli and categories.

Results

Interpretation -

-

Rules I and II are easier for humans because they may be described by simple logical rules: •

Type I: If white, then A; else B.



Type II: If a black triangle or a white square, then A; else B.

Rule II (an XOR) is a difficult problem to solve associatively.
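One way to see why an XOR is hard to learn associatively is to train a very simple associative learner on both rules. The sketch below makes a strong assumption: "associative" learning is modelled as a one-layer perceptron that only adds up separate cue-to-category weights. This is not the model used by Smith et al., and the 0/1 coding of the attributes is also an assumption.

```python
from itertools import product

STIMULI = list(product((0, 1), repeat=3))  # (color, shape, size); coding is an assumption

def type_i(s):
    return 1 if s[0] == 1 else -1          # "If white, then A; else B"

def type_ii(s):
    return 1 if s[0] != s[1] else -1       # XOR of color and shape

def train_perceptron(label, epochs=100):
    """Train a one-layer perceptron and return its final accuracy on the 8 stimuli."""
    weights = [0.0, 0.0, 0.0, 0.0]         # three cue weights plus a bias
    for _ in range(epochs):
        for s in STIMULI:
            x = (*s, 1)
            output = sum(w * xi for w, xi in zip(weights, x))
            if label(s) * output <= 0:     # wrong (or undecided): adjust the associations
                weights = [w + label(s) * xi for w, xi in zip(weights, x)]
    correct = sum(
        label(s) * sum(w * xi for w, xi in zip(weights, (*s, 1))) > 0
        for s in STIMULI
    )
    return correct / len(STIMULI)

accuracy_type_i = train_perceptron(type_i)
accuracy_type_ii = train_perceptron(type_ii)
print(accuracy_type_i, accuracy_type_ii)
```

The learner masters Type I but can never classify all eight Type II stimuli correctly, because no weighted sum of individual cues separates an XOR. A learner that can state the rule verbally does not face this limitation.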

-

These results suggest that category learning via induction is verbal and hypothesis-driven in humans.

From the laboratory to the real world -

To what extent will the “results obtained in a relatively sterile research environment” (p. 141) generalize to the real world?

-

Can only research conducted in the “real world” apply to the “real world”?

The “But it’s not real life” argument (Stanovich, 2007) -

Premise 1. Laboratory experiments in psychology place humans (and other organisms) under highly artificial / unnatural conditions.

-

Premise 2. The artificial / unnatural conditions are not found in the “real world”.

-

Conclusion. Therefore, the findings of laboratory experiments do not apply to the “real world”.

-

This argument stems from a misconception about the way that science works.

Analysis of the “But it’s not real life” argument -

Premise 1 is true. That is the point of laboratory research! We need artificial unnatural conditions to control the environment and, ultimately, ascertain causes and effects.

-

Premise 2 is trivially false. Researchers and experiments are part of “real life”. It is only when applied to psychological research that this argument appears to be meaningful.

-

Even if Premise 2 were true, the conclusion does not follow.

Replies to the “But it’s not real life” argument -

Mook (1983): “[…] it is the understanding [of a phenomenon] which has external validity (if it does)—not the findings themselves, much less the setting and the sample” (p. 382)

How does the sun work? -

By nuclear fusion: hydrogen is transformed into helium (Eddington, 1920s).

-

Prediction. Neutrinos exist (Pauli, 1930s).

Neutrino detector (Davis, 1960s) -

A tank of 615 tons of perchloroethylene (a dry-cleaning fluid).

-

The tank was situated in the Homestake gold mine in South Dakota.

-

Twice every three days, a neutrino would interact with a nucleus of chlorine in the liquid and produce a nucleus of radioactive argon.

-

Asking whether this setup resembles “real life” is irrelevant

-

There is support for it.

-

“It is the understanding of a phenomenon which has external validity, not the findings themselves, much less the setting and the sample” (Mook, 1983)

Basic research vs. applied problems

-

Basic and applied research do different things

-

Applied: literally trying to find an answer; solve a specific problem

-

Basic: applying concepts/ principles

Questions basic research tries to answer (Mook, 1983) -

We may be asking whether something can happen, rather than whether it typically does happen.

-

Second, our prediction may be in the other direction; it may specify something that ought to happen in the lab, and so we go to the lab to see whether it does.

-

Third, we may demonstrate the power of a phenomenon by showing that it happens even under unnatural conditions that ought to preclude it.

-

Finally, we may use the lab to produce conditions that have no counterpart in real life at all, so that the concept of "generalizing to the real world" has no meaning.

Anderson, Lindsay, and Bushman (1999) -

Is there a relation between field research and lab research effects when they target the same constructs?

-

38 meta-analyses in social psychology (aggression, leadership, helping…) were reviewed.

-

All included the results (in effect size) of both field and lab research.

-

Anderson et al. computed the correlation between the two.

-

Meta-analysis: examining a set of studies that all look at the same thing. The studies are separate replications, and together they give you a pool of data that you analyze as if it were one large experiment.
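The correlation step can be sketched as follows. The effect sizes below are made-up numbers for five hypothetical constructs, used only to show the computation. They are not Anderson et al.'s data, and the function name is an illustrative choice.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up effect sizes for five hypothetical constructs (NOT Anderson et al.'s data).
lab_effects = [0.10, 0.25, 0.30, 0.45, 0.60]
field_effects = [0.15, 0.20, 0.35, 0.40, 0.65]

r = pearson_r(lab_effects, field_effects)
print(round(r, 2))
```

Each pair holds the lab and field effect size for one construct. A coefficient near 1 means that constructs with large effects in the lab also tend to show large effects in the field.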

Results -

When the constructs under investigation are identical, field and lab research yield highly similar results.

-

Strong correlation

-

For these constructs, results obtained in the lab correlate with results obtained in the real world

Conclusions about external validity -

External validity is very important in applied research. Otherwise, it is overrated.

-

“Psychology ends up having the worst of both worlds. The same ignorance of the scientific method that supports the belief that psychology just can’t be a science leads to the denigration of psychologists when they, like all other scientists, create the special conditions necessary to uncover more powerful and precise explanations of their phenomena” (p. 104).

Research strategies

-

Descriptive: trying to figure out what is out there; a new topic and you want to learn more about it; surveys.

-

Nonexperimental: trying to relate two variables; often poor research designs, because little attention is paid to internal validity.

-

Correlational: examining the relationship between variables; e.g., giving people lots of tests and seeing what is associated with what.

-

Quasi-experimental: trying to run experimental research when some aspect of the situation prevents it; e.g., going into a school to figure out which of two math programs is better. You cannot take the teachers out of the classroom; there are things that you cannot completely control.

-

Experimental: manipulating independent variables while controlling extraneous ones, in order to say that this causes that (if..., then...); this is the strategy to strive for.