Formative evaluation

FRHD notes for exam

Lecture – Process Evaluation

Types of evaluation research:

Formative evaluation
- E.g. someone learning how to drive – the instructor is not interested in getting to a location; they are concerned with the process.
- E.g. making a soup without instructions – tasting and adjusting as you go.

Summative evaluation
- E.g. whether or not people like the soup you've made.

Hypothetical scenario
- A prof developed an integrated weight-training program that students implemented.
- Students' fitness levels improved.
- You want to learn more about this program to better understand why it was successful. What would you ask?
  o How long / what type of activities?

Black box evaluation
- Pretest → [black box – DO NOT KNOW WHAT CAUSED CHANGE] → effects
- Must open the black box to understand the cause.

Logic model
- Why needed? Target groups, strategies, program activities ******** look at slides

Purpose of process evaluation
- Program monitoring
- Program improvement
- Accountability
- Explain observed program outcomes
  o Why the program had a positive effect
     What worked? Disaggregation – take the group who received the intervention, split it between those who did well and those who didn't, and find out why they are different
  o Why the intervention had no effects
     Was the intervention implemented as intended?
     Was the intervention strong enough to make a difference?
      • Dose-response relationship
     Did the control group receive a similar intervention from another source?
      • Contamination
     Did external events weaken the intervention's impact?
     Did the program's theory of cause and effect work as expected?
     Are there problems in the organization implementing the intervention?
      • Staff not trained well enough / not inspired
  o Why the intervention had unintended consequences

Components of process evaluation

1. Recruitment
- Procedures used to approach and attract prospective participants
- Organizational/community and individual levels
- Resources used
- Reasons for non-participation

2. Reach
- Percent of the intended target audience that participates in the intervention
  o Also for each component – e.g. 50% for one, 40% for another…
  o Attendance
  o Which sub-groups / certain people?
- Barriers to participation
- Characteristics of participants

3. Maintenance
- Keeping participants involved in the program and in data collection
- Assess the dropout rate

4. Dose delivered
- Amount/percent of the intended intervention (or of each component) that is actually delivered to participants
- A function of the efforts of the intervention providers

5. Dose received
- Extent to which participants engage with the intervention
  o People can be given a pill, but do they actually TAKE the pill?
- "Exposure"
- Characteristics of the target audience (who did less/more)

6. Fidelity (faithfulness)
- Extent to which the intervention was delivered as planned
- Whether the intended and implemented intervention are congruent
- A function of the intervention providers

7. Implementation
- Composite score that indicates the extent to which the intervention has been implemented and received by the target audience
- Combination of reach, dose delivered, dose received and fidelity
- Multiplicative vs. averaging approach:
  o Multiplicative – multiply the proportions for the above 4 indicators to get the implementation score; the maximum possible score is 1 (100% × 100% × 100% × 100%)
  o Average – add the percentages and divide by the number of indicators
- Continuous analysis (percentages) vs. category analysis (splitting scores in two)
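The two scoring approaches above can be sketched in a few lines; the four component values below are hypothetical, not from the lecture:

```python
def implementation_multiplicative(reach, dose_delivered, dose_received, fidelity):
    """Multiply the four proportions; the maximum possible score is 1.0."""
    return reach * dose_delivered * dose_received * fidelity

def implementation_average(reach, dose_delivered, dose_received, fidelity):
    """Add the four proportions and divide by the number of indicators."""
    return (reach + dose_delivered + dose_received + fidelity) / 4

# Hypothetical program: 80% reach, 90% dose delivered, 70% dose received, 100% fidelity
mult = implementation_multiplicative(0.8, 0.9, 0.7, 1.0)
avg = implementation_average(0.8, 0.9, 0.7, 1.0)
print(round(mult, 3), round(avg, 2))  # 0.504 0.85
```

Note how the multiplicative score punishes any weak component (one low proportion drags the whole product down), while the average smooths weak components out.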

8. Adaptation
- Changes/modifications made to the original intervention during implementation
- Fidelity–adaptation debate: maximum fidelity vs. adaptation permitted
  o Adaptation as an implementation failure
  o Vs. adaptation as inevitable

9. Contamination
- Extent to which participants receive interventions from outside the program, and extent to which the control group receives the intervention

10. Context
- Aspects of the larger physical, social, political and economic environment that may influence intervention implementation

Illustration of process evaluation – Wang et al. (2013) HBHS study
- Healthy Bodies and Souls
- Church-based health intervention
- Promotes healthy food
- Randomized controlled trial (some churches received the intervention and some didn't)
- Purpose:
  o How well was it implemented in terms of reach, dose and fidelity?
- Had different phases with different components/strategies
- Process evaluation:
  o In-church measures to assess delivery:
     Reach – percent of people in contact with the intervention in each session
     Dose delivered – how many intervention components were distributed to a person in one visit
     Fidelity – percent of effective visits out of total visits per program phase
  o Post-intervention exposure survey:
     Assess individual-level dose received
     Interview a sample of congregants
     Dose received – percent of people who successfully recalled exposure to any intervention component

Lecture – Focus Groups
- Research method for collecting qualitative data
- Group discussions
- Focused

Strengths
  o Exploring – inductive
  o Context and depth (e.g. what are the barriers?)
  o Interact directly with respondents
  o Respondents use their own words
  o Respondents react to and build on others' responses
  o Can collect data from children or illiterate individuals
  o Help you understand results

Weaknesses
- DO NOT quantify comments – use a survey or other method for that. Responses may also be influenced by other participants

Uses
  o Needs assessment
  o Program development
  o Process evaluation (understand what happened during the program)
  o Outcome evaluation

Myths
- Quick and cheap – not such an easy process (lots of organizing of people)
- Require professional moderators – the moderator doesn't need to be professional, just skilled (use whatever resources are available)
- Require special facilities – can do it anywhere, e.g. a restaurant
- Must consist of strangers – no
- Will not work for sensitive topics – will work; just be mindful that people don't disclose too much
- Reach/produce conformity – no, all people do not need to come to the same conclusion; sometimes you want to maximize variation (hear all views)
- Must be validated by other methods – no, you do not need another method (e.g. a survey); focus groups are still good at producing data on their own

Ethical issues
- Informed consent
- Confidentiality
- Privacy
- Dealing with stressful/emotional topics – e.g. take a break, make resources for clients clear, therapists may be involved

How structured should they be?
- More structured
  o If there are pre-determined issues/barriers
  o Questions specific to those barriers
- Less structured
  o If exploratory
  o Don't use prompts; draw out ideas without having a preset idea in mind

Steps in a focus group

1. State research purpose

2. Identify moderator
- Skilled; train
- Compatibility (e.g. female moderator with female interviewees)

3. Develop interview guide
- Provides direction for the discussion
- Only a guide, so you do not have to ask all questions – can take different angles if that seems more appropriate
- Types of questions:
   Opening questions (ice breaker)
   Introductory questions
   Transition questions
   Key questions
   Ending questions
  o Use open-ended questions
  o Unstructured vs. structured questions
  o Order of questions:
     Less structured questions first
     Funnel (general to specific)
     Among the key questions, most important first (time restraints)
  o Number of questions (hard to choose; depends on number of participants, time restraints, complexity)
  o Probes/prompts
  o Pilot test

4. Recruit sample
- Purposive sampling
- Convenience sampling
- Deciding on group composition:
  o Homogeneous participants (relatively similar) – reduces conflicts
  o Segmentation
  o Strangers or acquaintances (strangers usually best, to reduce bias)
- Deciding on group size:
  o 7–10 individuals
  o Consider the amount of time each person will get to talk
  o Larger vs. smaller (large can increase the range of opinions, but can be harder to manage)
- Deciding on number of groups:
  o Diversity of comments
  o 3–5 groups
  o Theoretical saturation – no new information

5. Conduct focus group
- 1.5–2.5 hrs
- "Small talk" before
- Observe interactions (how people feel about others in the group)
- Inform the group that the session is recorded and whether it is "observed" – put name tags in front of people. Seat dominant people beside the interviewer (less eye contact); seat introverted people right in front (more eye contact, so they may talk more)
- Ground rules
- Problems:
  o Experts (self-proclaimed)
  o Others interrupted – if bad, may need to ask the person to leave
  o Friends participating (chatting with each other)

6. Analyze and interpret data
- Moderator and assistant do a preliminary analysis
- Focus on one question at a time
  o Themes
  o Quotes
- Different analysis strategies:
  o Based on memories – efficient
  o Based on notes – more systematic
     Field notes during the session
  o Based on tape recording
     Researcher listens to the tape
     Brief summary
  o Based on transcripts
     Transcribe interviews – figure out themes
     Open coding – open mind; jot down themes/ideas; a first stab at it
     Focused coding – revisit transcripts; merge themes; naming, description, segments of data
- Content analysis
   Cut and paste
   "Find" option in a word processor
   Software packages
    • NVivo
    • NUD*IST
    • Ethnograph – in these packages each line can carry a different code for each theme so you can look it up efficiently (these are identified themes, not "emerging" themes – themes do not magically emerge)

Lecture – Quasi-experimental Research Designs

Questions of interest
  o Is the program effective in achieving its intended objectives?
  o How do you know it was the program and not an alternative explanation?

Criteria of objectives
  o Specific, measurable, purposeful standards, realistic, time frame

Internal validity
  o The validity of inferences about whether the relationship between two variables is causal
  o Threats to internal validity: plausible alternative explanations for what caused the observed effect

- E.g. frog-jumping competition – a study designed to determine how far frogs can jump
- Needs a starting point – researcher yells "jump", frog jumps 10 ft = baseline
- 1 leg weight = 5 ft
- 2 leg weights = 3 ft
- 3 leg weights = 1 ft
- 4 leg weights = doesn't move
- The obvious reason for not moving is that it's weighed down, BUT the novice researcher concludes "when you strap weights on a frog, the frog becomes deaf"

History
 Any event that coincides with the independent variable and could affect the dependent variable (you don't know if it was the historical event or the program causing the effect)
 E.g. a drug makes a rat excited, BUT the rat was looking at a pretty girl at the same time = possible cause
 Must rule out other possible causes

Maturation
 Threat that some biological, psychological, or emotional process within the person, separate from the intervention, will change over time (growth and development transitions – learning, aging…)
 Can be challenging to run a program with children because they are at an age where they change anyway

Testing
 Effects due to repeated testing of participants over time
 E.g. give the same survey twice – participants remember responses from the pretest at posttest. Could broaden the time period, or give two different tests

Instrumentation
 Any change that occurs over time in measurement procedures or devices
 Change of instrument/measurement (e.g. survey, researcher, taking a test at home vs. at the doctor's)
 Should calibrate (e.g. a scale) and standardize

Statistical regression
 No matter what, extreme scores will regress toward the mean on retest (pennies example)
 E.g. athletes' performance goes down after being on the cover of Sports Illustrated – yes, because they were already at the top. FLIP COIN EXAMPLE
 If you select groups that are very different from the mean, they will tend to "improve" (move toward the mean) no matter what
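The FLIP COIN EXAMPLE can be simulated directly: score people on a pure-chance task, keep only the extreme scorers, and retest them. A minimal sketch (sample sizes and cutoffs are arbitrary):

```python
import random

random.seed(1)  # fixed seed for reproducibility

# Each "participant" gets a score from pure chance: the number of heads
# in 10 coin flips (expected value = 5 heads).
def score():
    return sum(random.random() < 0.5 for _ in range(10))

first = [score() for _ in range(10_000)]

# Select only the extreme scorers (8+ heads) and "retest" them.
top = [i for i, s in enumerate(first) if s >= 8]
avg_first = sum(first[i] for i in top) / len(top)
avg_retest = sum(score() for _ in top) / len(top)

print(round(avg_first, 2), round(avg_retest, 2))
# The extreme group averages 8+ at first but about 5 on retest:
# regression to the mean, with no intervention at all.
```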

Threats to internal validity (cont'd)

Attrition
  o Threat that arises when some participants don't continue throughout the study
  o Example: effect of a health education program on physical activity patterns:
     Previous research: males are more active
     Assign 100 M and 100 F to both the intervention and comparison groups
     Results: intervention group more active
     Dropout: 50 F from the intervention group and 50 from the comparison group
  o How would you interpret these results?
     Mostly males left in the intervention group, so it is automatically more likely to be active
     Need to look at dropout AND describe the characteristics of those who dropped out and those who stayed
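The arithmetic behind that example can be sketched as follows. The note does not state the sex of the 50 comparison-group dropouts, so this sketch only tracks the intervention group, where the 50 dropouts are the females:

```python
def pct_male(males, females):
    """Percent of the group that is male."""
    return 100 * males / (males + females)

# Each group starts with 100 males and 100 females.
baseline = pct_male(100, 100)            # 50.0% male
# After 50 females leave the intervention group:
after_dropout = pct_male(100, 100 - 50)  # about 66.7% male
print(baseline, round(after_dropout, 1))
```

If males are more active to begin with, the posttest difference may simply reflect this changed sex mix rather than the program's effect.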

Selection
  o Any pre-existing differences between individuals in the different experimental conditions that can influence the dependent variable – comparing apples to oranges
  o Interaction with other threats

Diffusion or imitation of intervention
  o The program diffuses/spreads from the intervention group to the comparison group
  o E.g. Dartmouth health promotion study

Compensatory equalization of intervention
  o People think the two groups are being treated unjustly, so they share information with the comparison group – you lose the comparison group

Compensatory rivalry
  o The comparison group feels it is being disadvantaged, so it overcompensates to outdo the positive effects – you lose the comparison group

Resentful demoralization
  o The comparison group feels there is no chance of change, and so they change their moods/behaviors in a negative way

Experimenter expectancy
  o The experimenter conveys the expected results to participants, which skews the results

Mediator vs. moderator
  o A correlation between an independent and a dependent variable may be explained by a mediator (a third variable – it must occur before the dependent variable)
  o Independent → mediator → dependent
  o E.g. the dependent variable is healthy-eating behavior and the mediator is the intention to change (you need intention before the behavior changes) – like a domino effect
  o A moderator MODIFIES the relationship between the independent and dependent variables (it can strengthen or weaken it, or show that there is no relationship)
  o E.g. relationship between alcohol use and physical activity – the moderator may be the person's gender; the association is stronger/weaker depending on M/F

Research designs

Notation: X = intervention; x = removal of the intervention; O = observation; R = randomly assigned; --- = groups not randomly assigned

Quasi-experimental designs

Single-group, posttest-only design:
   Group A   X   O1
- E.g. people want to quit smoking; at the end of the program see how many are still smoking – weak, too many internal validity risks

Single-group, pretest-posttest design:
   Group A   O1   X   O2
- Still weak because there are too many risks

Nonequivalent, posttest-only design:
   Group A   X   O1
   ------------------
   Group C       O2
- Still weak because there is no pretest – you only have measurements after the intervention, so you can't tell whether the groups differed to begin with

Pretest-posttest, nonequivalent comparison group design:
   Group A   O1   X   O2
   -----------------------
   Group B   O3        O4
- Stronger

Comparing the nonequivalent posttest-only vs. the pretest-posttest nonequivalent comparison group design:
- The pretest lets you know how people were before the intervention, so you can be more confident about how the intervention worked
- Groups may not be equivalent at baseline – one group could be higher and therefore may change more
- If the intervention group changes a lot and the comparison group doesn't, it could be regression to the mean

Cohort design:
   Earlier cohort   X   O1
   Later cohort     O2   X   O3
- Everyone has received the intervention – can't weed people out
- Combines two group designs – compare the later cohort's pretest (O2) with the earlier cohort's posttest (O1)
- Only as strong as the evidence that both cohorts are the same

Time series designs
- Collect observations/measures before, during and after the intervention
- E.g. simple "interrupted" time series:
   O1 O2 O3 X O4 O5 O6
- At the point of interruption, you are trying to see what the change looks like
- Increases knowledge of the intervention's effect because you can look at a larger time frame and multiple data points (can rule out a pre-existing pattern rather than the intervention working)
- Threat if you give questionnaires all the time (testing effect)
- Used when the data you have are archival – e.g. health records
- X can be one-shot or a continuous intervention (a tornado vs. a new health campaign shown repeatedly over months – O X O X O X O)
  o If the effects are only immediate and do not last, the intervention was not overly effective
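One simple way to read an interrupted series like O1 O2 O3 X O4 O5 O6 is to compare the level of the series before and after X. A minimal sketch with invented observations (say, monthly counts of some outcome):

```python
# Hypothetical observations around the interruption X; numbers invented.
before = [50, 52, 51]   # O1 O2 O3
after = [63, 64, 66]    # O4 O5 O6, after the interruption X

mean_before = sum(before) / len(before)   # 51.0
mean_after = sum(after) / len(after)      # about 64.3

# A stable pre-X series followed by a jump is what the extra data points
# buy you: the change at X is hard to explain as a pre-existing trend.
print(mean_before, round(mean_after, 1))
```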

Interrupted time series with nonequivalent control group:
   Group A   O1 O2 X O3 O4
   -------------------------
   Group B   O1 O2   O3 O4
 Groups must be similar for this to work

Interrupted time series with removed intervention:
   O1 O2 X O3 O4 x O5 O6
 When the intervention is removed, the outcome should get worse
 E.g. gym fees paid for a year; after the year they are no longer paid
 Threat – resentment/demoralization if someone with more power took something away from you

Interrupted time series with multiple replications:
   O1 O2 X O3 O4 x O5 O6 X O7 O8 x O9

Experimental (Randomized) Designs ** notes
- Randomization – e.g. 50 get the intervention, 50 don't
- NOT random selection (selection is a sampling issue – use a sample to generalize to the whole population)
- Tossing a coin
- Table of random numbers
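Random assignment (as opposed to random selection) can be sketched as a shuffle-and-split; the participant IDs below are hypothetical:

```python
import random

random.seed(42)  # fixed seed for reproducibility; omit in a real trial

participants = [f"P{i:03d}" for i in range(100)]  # hypothetical IDs
random.shuffle(participants)  # the computer's version of tossing a coin

intervention = participants[:50]   # 50 get the intervention
control = participants[50:]        # 50 don't
print(len(intervention), len(control))  # 50 50
```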

Between-participants, randomized, posttest-only design:
  o No pretest, so nothing to compare to

Between-participants, matched randomized, posttest-only design:
  o Match on measures that correlate with the topic

Between-participants, randomized, pretest-posttest design:
  o Statistical comparisons:
     Pretests
     Posttests
     Pretest and posttest for each participant
     Compare groups on the difference between pre- and posttests

Solomon four-group design:
- Combines the different methods
- Statistical comparisons:
   O1 and O3 – should be similar
   O2 and O4 – different (group A had the intervention)
   O1→O2 change vs. O3→O4 change – different
   O5 and O6 – different
   O4 and O6 – same (neither had the intervention; both are control groups)
   O2 and O6 – different
   O2 and O5 – same (both had the intervention and there is no interaction); if different, the pretest probably interacted with the intervention
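The expected pattern of comparisons can be encoded as a quick sanity check, using invented means and the usual group layout (A = pretest + X + posttest, B = pretest + posttest, C = X + posttest, D = posttest only):

```python
# Invented values, chosen so the intervention works and the
# pretest does NOT interact with it.
O1, O3 = 10, 10      # pretests – similar, thanks to randomization
O2, O5 = 18, 18      # posttests with the intervention
O4, O6 = 11, 11      # posttests without the intervention

assert O1 == O3                  # baseline equivalence
assert O2 - O1 > O4 - O3         # intervention group changed more
assert O5 > O6                   # replication without any pretest
assert O2 == O5 and O4 == O6     # pretesting itself changed nothing
# If O2 != O5, the pretest probably interacted with the intervention.
print("pattern consistent with an effective, pretest-free intervention")
```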

Between-participants factorial design:
 E.g. FITT – duration and frequency of workout – could do two individual studies on each, or a combination of both (e.g. 30 min/5 days a week vs. 60 min/3 days a week)
 Each factor (duration/frequency) can have different levels, e.g. duration = 0, 30 or 60 min; frequency has 2 levels – 3×/week or 5×/week
- Main effects: differences among groups for a single independent variable that are significant, temporarily ignoring all other independent variables
- Interaction: the influence of one independent variable on the dependent variable depends on the level of another independent variable
  o E.g. no movie if you have an exam, but if there's no exam it DEPENDS on what the movie is (one variable depends on another)
  o E.g. 2 × 3 factorial design: reaction time of people, with an alcohol/caffeine interaction
  o No caffeine / low caffeine / moderate caffeine VS no alcohol / alcohol
     If you only focus on alcohol OR caffeine, that is its MAIN EFFECT; together it is the INTERACTION
     If the lines on the plot are not parallel = interaction
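The main-effect/interaction distinction can be made concrete with hypothetical cell means for the 2 × 3 alcohol-by-caffeine example (all numbers invented for illustration):

```python
# Hypothetical cell means (reaction time in ms): alcohol (no/yes)
# crossed with caffeine (none/low/moderate).
means = {
    ("no_alcohol", "none"): 250, ("no_alcohol", "low"): 240, ("no_alcohol", "mod"): 235,
    ("alcohol", "none"): 320, ("alcohol", "low"): 290, ("alcohol", "mod"): 260,
}

# Main effect of alcohol: average the cells at each alcohol level,
# temporarily ignoring caffeine.
no_alc = sum(v for (a, _), v in means.items() if a == "no_alcohol") / 3
alc = sum(v for (a, _), v in means.items() if a == "alcohol") / 3
main_effect_alcohol = alc - no_alc

# Interaction: the alcohol effect at each caffeine level. If these are not
# all (roughly) equal, the lines on the plot are not parallel -> interaction.
alcohol_effect = {c: means[("alcohol", c)] - means[("no_alcohol", c)]
                  for c in ("none", "low", "mod")}
print(round(main_effect_alcohol, 1), alcohol_effect)
# 48.3 {'none': 70, 'low': 50, 'mod': 25}
```

Here alcohol slows reactions overall (the main effect), but the slowdown shrinks as caffeine increases (the interaction).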

Research designs and statistics
- "An ounce of design is worth a pound of analysis." The design or plan for gathering the evidence does more to determine the quality of a research project than does the statistical analysis. Often one can fix a mistake in the stats, but not in the design. RESEARCH DESIGN TRUMPS STATS

Lecture – Cross-sectional and Longitudinal Designs

Cross-sectional designs
  o Data collected at 1 point in time
- E.g. effect of children on marital satisfaction
- Measure couples' marital satisfaction, whether they have any children, and if so the age of the oldest
- Then create groups and compare marital satisfaction:
  o Couples without children
  o Couples with oldest child under 1 y old
  o Couples with oldest child between 1 and 5 y
- Causality could run the other way: marital satisfaction → children
- E.g. relationship between age and physical activity – select 3 groups of participants from the population:
  o 20 y old today, 30 y old today, 40 y old today
- Similar to a nonequivalent posttest-only design
   The 3 "interventions" = the ages

Repeated cross-sectional design
  o Snapshots taken a number of times
  o Different snapshots of different samples (same questions asked)

Prospective cross-sectional
  o Begin now and repeat the study at different points in the future (forwards)

Retrospective cross-sectional
  o Draw on existing data sets to examine patterns of change up to the present time (backwards)

Pros of cross-sectional designs
- Ideal for descriptive analysis
- Get results quickly
- Relatively cost effective
- Easier to ensure anonymity

Cons of cross-sectional designs
- Internal validity – not good
- External validity – can be good if it's a random sample

Longitudinal research designs
  o Effect of children on marital satisfaction
  o 1st: married couples with no kids
  o Follow the sample 5 years later – couples end up in 1 of 3 groups (no kids, kid >1, kid