Instrumental variables estimates of the effect of subsidized training on ...

Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium Member Libraries

http://www.archive.org/details/instrumentalvariOOabad


Working Paper
Department of Economics

Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings

Alberto Abadie
Joshua Angrist
Guido Imbens

No. 99-16
October 1999

Massachusetts Institute of Technology
50 Memorial Drive
Cambridge, Mass. 02139


Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings*

Alberto Abadie (MIT)
Joshua Angrist (MIT and NBER)
Guido Imbens (UCLA and NBER)

Revised: September 1999

Abstract

The effect of government programs on the distribution of participants' earnings is important for program evaluation and welfare comparisons. This paper reports estimates of the effects of JTPA training programs on the distribution of earnings. The estimation uses a new instrumental variable (IV) method that measures program impacts on the quantiles of outcome variables. This quantile treatment effects (QTE) estimator accommodates exogenous covariates and reduces to quantile regression when selection for treatment is exogenously determined. The QTE estimator can be computed as the solution to a convex linear programming problem, although this requires first-step estimation of a nuisance function. We develop distribution theory for the case where the first step is estimated nonparametrically. For women, the empirical results show that the JTPA program had the largest proportional impact at low quantiles. Perhaps surprisingly, however, JTPA training raised the quantiles of earnings for men only in the upper half of the trainee earnings distribution.

*We thank Moshe Buchinsky, Gary Chamberlain, Jinyong Hahn, Jerry Hausman, Whitney Newey, Shlomo Yitzhaki, and seminar participants at Berkeley, MIT-Harvard, Penn, and the Econometric Society Summer 1998 meetings for helpful comments and discussions. Thanks also go to Erik Beecroft at Abt Associates for providing us with the National JTPA Study data and for helpful discussions. Abadie acknowledges financial support from the Bank of Spain. Imbens acknowledges financial support from the Sloan Foundation.

1. Introduction

Effects of economic variables on distributions of outcomes are of fundamental interest in many areas of empirical economic research. A leading example is the question of how government programs affect the distribution of participants' earnings, since the welfare analysis of public policies involves distributions of outcomes. Policy-makers often hope that subsidized training programs will reduce earnings inequality by raising the lower quantiles of the earnings distribution and thereby reducing poverty (Lalonde (1995), US Department of Labor (1995)). Another example from labor economics is the effect of union status on the distribution of earnings. One of the earliest studies of the distributional consequences of unionism is Freeman (1980), while more recent analyses include Card (1996), and DiNardo, Fortin, and Lemieux (1996), who have asked whether changes in union status can account for a significant fraction of increasing wage inequality in the 1980s.

Although the importance of distribution effects is widely acknowledged, most evaluation research focuses on average outcomes, probably because the statistical techniques required to estimate effects on means are easier to use. Many econometric models also implicitly restrict treatment effects to operate in the form of a simple "location shift", in which case the mean effect captures the impact of treatment at all quantiles. Of course, the impact of treatment on a distribution is easy to assess when treatment status is randomly assigned and there is perfect compliance with treatment assignment. Randomization guarantees that outcomes in the treatment group are directly comparable to outcomes in the control group, so valid causal inferences can be obtained by simply comparing the treatment and control distributions. The problem of how to draw inferences about distributional effects in observational studies with non-random assignment or in randomized studies with non-compliance is more difficult, however, and has received less attention.^1

In this paper, we show how to use a source of exogenous variation in treatment status - an instrumental variable - to estimate the effect of treatment on the quantiles of the distribution of outcomes in non-randomized studies, or in situations where the offer of treatment is randomized but treatment itself is voluntary. This Quantile Treatment Effects (QTE) estimator is used here to estimate the effect of training on trainees served by the Job Training Partnership Act (JTPA) of 1982, a large publicly-funded training program designed to help economically disadvantaged individuals.

The data come from the National JTPA Study, a social experiment begun in the late 1980s at 16 locations across the US to evaluate the effects of JTPA training. For this study, JTPA applicants were randomly assigned to treatment and control groups. Individuals in the treatment group were offered JTPA training, while those in the control group were excluded for a period of 18 months. Only 60 percent of the treatment group actually received training, but we can use the treatment assignment as an instrument for treatment.

The treatment effects estimated using the framework developed here are valid for the subpopulation we call compliers. This terminology is used because in randomized trials with partial compliance, like the JTPA, the relevant subpopulation consists of people who always comply with the treatment protocol. In fact, in the case of the JTPA, where (almost) no one in the control group received treatment, effects for compliers are also representative of effects on the treated.^2 In other cases, compliers are those whose treatment status is affected by an instrumental variable.

The identification results underlying the compliers approach to instrumental variables (IV) models were first established by Imbens and Angrist (1994) and Angrist, Imbens, and Rubin (1996). Imbens and Rubin (1997) extended these results to the identification of the effect of treatment on distributions, and Abadie (1999a) showed how to test global hypotheses about distribution impacts such as stochastic dominance. But neither of these papers developed simple estimators or a scheme for estimating the effect of treatment on quantiles.

We focus here on conditional quantiles because quantiles provide useful summary statistics for distributions, and because quantile comparisons have been at the heart of recent discussions of changing wage inequality (see, e.g., Chamberlain (1991), Katz and Murphy (1992) and Buchinsky (1994)).

The paper is organized as follows. Section 2 outlines the conceptual framework and discusses the identification problem. Section 3 presents the estimator, which allows for a binary endogenous regressor (indicating exposure to treatment) and reduces to Koenker and Bassett (1978) quantile regression when selection for treatment is exogenous. Like quantile regression, the estimator developed here can be written as the solution to a convex linear programming (LP) problem, although implementation of the QTE estimator requires estimation of a nuisance function in a first step. Finally, Section 4 discusses the estimates of effects of training on the quantiles of trainee earnings. The estimates for women show larger proportional increases in earnings at lower quantiles of the trainee earnings distribution. But the estimates for men suggest the impact of training was largest in the upper half of the distribution and not at lower quantiles as policy-makers perhaps would have wished.

^1 Discussions of average treatment effects include Rubin (1977), Rosenbaum and Rubin (1983), and Heckman and Robb (1985). Manski (1994), Heckman, Smith and Clements (1997), Imbens and Rubin (1997), and Abadie (1999a) discuss effects on distributions. Manski (1994, 1997) develops estimators for bounds on quantiles.

^2 Angrist and Imbens (1991) discuss the relationship between instrumental variables and effects on the treated. Orr, et al. (1996) and Heckman, Smith, and Taber (1994) report average treatment effects on the treated in the JTPA. Heckman, Clements and Smith (1997) estimate the distribution of JTPA effects using a non-IV framework.

2. Conceptual Framework

The setup is as follows. The data consist of $n$ observations on a continuously distributed outcome variable, $Y$, a binary treatment indicator $D$, and a binary instrument, $Z$. In the case of subsidized training, $Y$ is earnings, $D$ indicates program participation, and $Z$ is an indicator of the randomized offer of training. $D$ and $Z$ are not equal in the JTPA because not everyone who was offered training received it and because a few people who were not offered training received services anyway. In a study of the effect of unions, $Y$ might be a measure of wages, $D$ would indicate union status, and $Z$ would be an instrument for union status, say a dummy indicating individuals who work in firms that were subject to union organizing campaigns (Lalonde, Marschke and Troske (1996)). We also allow for an $r \times 1$ vector of covariates, $X$.

As in Rubin (1974, 1977) and our earlier work on instrumental variables estimation of causal effects, we define the causal effects of interest using potential outcomes and potential treatment status. In particular, we define potential outcomes indexed against $D$, $Y_d$, and potential treatment status indexed against $Z$, $D_z$. Potential outcomes and potential treatment status describe possibly counterfactual states of the world. Thus, $D_1$ tells us what value $D$ would take if $Z$ were equal to 1, while $D_0$ tells us what value $D$ would take if $Z$ were equal to 0. Similarly, $Y_d$ tells us what someone's outcome would be if they had $D = d$. The objects of causal inference are features of the distribution of potential outcomes, possibly restricted to particular subpopulations.

The observed treatment status is:

$$D = D_0 + (D_1 - D_0) \cdot Z.$$

In other words, if $Z = 1$, then $D_1$ is observed, while if $Z = 0$, then $D_0$ is observed. Likewise, the observed outcome variable is:

$$Y = Y_1 \cdot D + Y_0 \cdot (1 - D). \qquad (1)$$

The reason why causal inference is difficult is that although we think of all possible counterfactual outcomes as being defined for everyone, only one potential treatment status and one potential outcome are ever observed for any one person.^3

2.1. Principal Assumptions

The principal assumptions of the potential outcomes framework for IV are stated below:

Assumption 2.1: For almost all values of $X$,

(i) Independence: $(Y_1, Y_0, D_1, D_0)$ is jointly independent of $Z$ given $X$.
(ii) Non-Trivial Assignment: $P(Z = 1 \mid X) \in (0, 1)$.
(iii) First-Stage: $E[D_1 \mid X] \neq E[D_0 \mid X]$.
(iv) Monotonicity: $P(D_1 \geq D_0 \mid X) = 1$.

Assumption 2.1(i) subsumes two related requirements. First, comparisons by instrument status identify the causal effect of the instrument. This is equivalent to instrument-error independence in traditional simultaneous equations models. Second, potential outcomes are not directly affected by the instrument. This is an exclusion restriction. See Angrist, Imbens and Rubin (1996) for additional discussion of these two requirements and how they differ. Assumption 2.1(i) is plausible (though not guaranteed) in the case of the JTPA because of the randomly assigned offer of treatment.

Assumption 2.1(ii) requires that the conditional distribution of the instrument not be degenerate. The relationship between instruments and treatment assignment is restricted in two other ways as well. As in simultaneous equations models, we require that there be some correlation between $D$ and $Z$; this is stated in Assumption 2.1(iii). Also, Imbens and Angrist (1994) have shown that Assumption 2.1(iv) guarantees identification of a meaningful average treatment effect in any model with heterogeneous potential outcomes that satisfies assumptions 2.1(i)-2.1(iii). This monotonicity assumption means that the instrument can only affect $D$ in one direction. Monotonicity is plausible in most applications and it is automatically satisfied by latent-index models for treatment assignment.^4 It is also a reasonable assumption for the JTPA, where $D_0 = 0$ for (almost) everyone.

The inference problem in evaluation research involves comparisons of observed and counterfactual outcomes, possibly after conditioning on observed covariates, $X$. For example, many evaluation studies focus on estimating the difference between the average outcome for the treated (which is observed) and what this average would have been in the absence of treatment (which is counterfactual). Outside of a randomized trial, the difference in average outcomes by observed treatment status is typically a biased estimate of this effect:

$$E[Y_1 \mid X, D = 1] - E[Y_0 \mid X, D = 0] = \{E[Y_1 \mid X, D = 1] - E[Y_0 \mid X, D = 1]\} + \{E[Y_0 \mid X, D = 1] - E[Y_0 \mid X, D = 0]\}.$$

The first term in brackets is the average effect of the treatment on the treated, which can also be written as $E[Y_1 - Y_0 \mid X, D = 1]$ since expectation is a linear operator; the second is the bias term. For example, comparisons of earnings by training status are biased if trainees are selected for training on the basis of low earnings potential. This bias extends to comparisons other than the mean. For example, the relationship above holds if we replace conditional expectations with conditional quantiles.

^3 The idea of potential outcomes appears in labor economics in discussions of the effects of union status. See, for example, Lewis' (1986) survey of research on union relative wage effects.

^4 A latent-index model for participation is $D = 1\{\lambda_0 + Z \cdot \lambda_1 - \eta > 0\}$, where $\lambda_0$ and $\lambda_1$ are parameters and $\eta$ is an error term that is independent of $Z$. Then $D_0 = 1\{\lambda_0 > \eta\}$, $D_1 = 1\{\lambda_0 + \lambda_1 > \eta\}$, and either $D_1 \geq D_0$ or $D_0 \geq D_1$ for everyone. If $\lambda_1 < 0$, so that $D_0 \geq D_1$ for everyone, then monotonicity holds for $Z' = 1 - Z$.
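The decomposition of the naive treated-versus-untreated contrast into an effect on the treated plus a bias term can be checked numerically. The following is a minimal sketch on simulated data, not the authors' code or data: the data-generating process (a hypothetical `ability` variable, a constant treatment effect of 1, and negative selection into training) is entirely an illustrative assumption.

```python
import numpy as np

def selection_bias_decomposition(n=200_000, seed=0):
    """Simulate potential outcomes and split the naive contrast into
    the effect on the treated plus a selection-bias term."""
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=n)                     # hypothetical earnings potential
    y0 = 10.0 + 2.0 * ability + rng.normal(size=n)   # outcome without training
    y1 = y0 + 1.0                                    # constant effect of 1 (assumption)
    # Negative selection: low-potential individuals are more likely to train.
    d = (ability + rng.normal(size=n) < 0).astype(int)
    y = y1 * d + y0 * (1 - d)                        # observed outcome, as in equation (1)

    naive = y[d == 1].mean() - y[d == 0].mean()      # E[Y|D=1] - E[Y|D=0]
    effect_on_treated = (y1 - y0)[d == 1].mean()     # E[Y1 - Y0 | D=1]
    bias = y0[d == 1].mean() - y0[d == 0].mean()     # E[Y0|D=1] - E[Y0|D=0]
    return naive, effect_on_treated, bias

naive, att, bias = selection_bias_decomposition()
print(f"naive contrast = {naive:.3f}, effect on treated = {att:.3f}, bias = {bias:.3f}")
```

In this simulation the bias term is negative because trainees are drawn from the low-potential part of the population, so the naive contrast understates the effect of training even though the true effect is the same for everyone.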

2.2. Identification Using Instrumental Variables

An instrumental variable solves the problem of identifying causal effects for a group of individuals whose treatment status is affected by the instrument. The following result (Imbens and Angrist (1994)) captures this idea formally:

Lemma 2.1: Under Assumption 2.1 (and assuming that the relevant expectations are finite)

$$\frac{E[Y \mid X, Z = 1] - E[Y \mid X, Z = 0]}{E[D \mid X, Z = 1] - E[D \mid X, Z = 0]} = E[Y_1 - Y_0 \mid X, D_1 > D_0].$$

The parameter identified in Lemma 2.1 is called a Local Average Treatment Effect (LATE). We refer to individuals for whom $D_1 > D_0$ as compliers because in a randomized trial with partial compliance, this group would consist of individuals who comply with the treatment protocol whatever their assignment. In other words, the set of compliers is the set of individuals whose treatment status was changed in the experiment induced by $Z$. Note that individuals in this set cannot be identified (i.e., we cannot name the people who are compliers) because we never observe both $D_1$ and $D_0$ for any one person. Also note that in the special case where $D_0 = 0$ for everyone,

$$E[Y_1 - Y_0 \mid X, D_1 > D_0] = E[Y_1 - Y_0 \mid X, D_1 = 1] = E[Y_1 - Y_0 \mid X, D_1 = 1, Z = 1] = E[Y_1 - Y_0 \mid X, D = 1],$$

so LATE is the effect of treatment on the treated. The equivalence between effects for compliers and effects on the treated in cases where $D_0$ is identically zero holds for any distributional characteristic and not just means.

The compliers concept is at the heart of the LATE framework and provides a simple explanation for how IV methods work. Suppose initially that we could know who the compliers are. For these people, $Z = D$, since it is always true that $D_1 > D_0$. This observation plus Assumption 2.1 leads to the following lemma:

Lemma 2.2: Given Assumption 2.1 and conditional on $X$, treatment assignment is ignorable (independent of the potential outcomes) for compliers: $(Y_1, Y_0) \perp D \mid X, D_1 > D_0$.

Proof: Assumption 2.1(i) says that $(Y_1, Y_0, D_1, D_0) \perp Z \mid X$, so $(Y_1, Y_0) \perp Z \mid X, D_1 = 1, D_0 = 0$. When $D_1 = 1$ and $D_0 = 0$, $D$ can be substituted for $Z$.

A consequence of Lemma 2.2 is that in the subpopulation of compliers, comparisons of means by treatment status estimate an average treatment effect, even though treatment status is not ignorable in the population:

$$E[Y \mid X, D = 1, D_1 > D_0] - E[Y \mid X, D = 0, D_1 > D_0] = E[Y_1 - Y_0 \mid X, D_1 > D_0]. \qquad (2)$$

Of course, as it stands, Lemma 2.2 is of no practical use because the subpopulation of compliers is not identified (i.e., we do not observe $D_1$ and $D_0$ for the same individual). To make Lemma 2.2 operational, we define the following function of $D$, $Z$ and $X$:

$$\kappa = 1 - \frac{D \cdot (1 - Z)}{1 - \pi_0(X)} - \frac{(1 - D) \cdot Z}{\pi_0(X)},$$

where $\pi_0(X) = P(Z = 1 \mid X)$. Note that $\kappa$ equals one when $D = Z$, otherwise $\kappa$ is negative. This function is useful because it "identifies compliers" in the following average sense:

Lemma 2.3: (Abadie, 1999b) Let $h(Y, D, X)$ be any integrable real function of $(Y, D, X)$. Then, given Assumption 2.1,

$$E[h(Y, D, X) \mid D_1 > D_0] = \frac{1}{P(D_1 > D_0)} E[\kappa \cdot h(Y, D, X)].$$

To see why this is true, note that, by monotonicity, the population can be partitioned into three groups: compliers who have $D_1 > D_0$, always-takers who have $D_1 = D_0 = 1$, and never-takers who have $D_1 = D_0 = 0$. Thus,

$$E[h(Y, D, X) \mid X, D_1 > D_0] = \frac{1}{P(D_1 > D_0 \mid X)} \Big\{ E[h(Y, D, X) \mid X] - E[h(Y, D, X) \mid X, D_1 = D_0 = 1] \cdot P(D_1 = D_0 = 1 \mid X) - E[h(Y, D, X) \mid X, D_1 = D_0 = 0] \cdot P(D_1 = D_0 = 0 \mid X) \Big\}.$$

Monotonicity means that all individuals with $Z = 0$ and $D = 1$ must be always-takers. Likewise, those with $Z = 1$ and $D = 0$ must be never-takers. Since $Z$ is ignorable given $X$, we have the following expressions for always-takers and never-takers as a function of observed moments:

$$E[h(Y, D, X) \mid X, D_1 = D_0 = 1] = E[h(Y, D, X) \mid X, D = 1, Z = 0] = \frac{1}{P(D = 1 \mid X, Z = 0)} E\left[ \frac{D \cdot (1 - Z)}{1 - \pi_0(X)} \cdot h(Y, D, X) \,\Big|\, X \right]$$

and

$$E[h(Y, D, X) \mid X, D_1 = D_0 = 0] = E[h(Y, D, X) \mid X, D = 0, Z = 1] = \frac{1}{P(D = 0 \mid X, Z = 1)} E\left[ \frac{(1 - D) \cdot Z}{\pi_0(X)} \cdot h(Y, D, X) \,\Big|\, X \right].$$

Monotonicity and ignorability of $Z$ given $X$ can similarly be used to identify the proportions of always-takers and never-takers using $P(D_1 = D_0 = 1 \mid X) = P(D = 1 \mid X, Z = 0)$ and $P(D_1 = D_0 = 0 \mid X) = P(D = 0 \mid X, Z = 1)$. Integrating over $X$ completes the argument.

An implication of Lemma 2.3 is that any parameter defined as the solution to a moment condition involving $(Y, D, X)$ is identified for compliers. This point is explored in detail in Abadie (1999b).^5 In the next section, we show how Lemma 2.3 can be used to develop an estimator for the causal effect of treatment on the quantiles of an outcome variable.

^5 For example, if we define $(\mu, \alpha) = \arg\min_{(m, a)} E[(Y - m - aD)^2 \mid D_1 > D_0]$, then $\mu = E[Y_0 \mid D_1 > D_0]$ and $\alpha = E[Y_1 - Y_0 \mid D_1 > D_0]$, so that $\alpha$ is LATE (although $\mu$ is not the same intercept that is identified by conventional IV methods). By Lemma 2.3, $(\mu, \alpha)$ also minimizes $E[\kappa \cdot (Y - m - aD)^2]$.
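The way the weights $\kappa$ "find" compliers on average can be illustrated with simulated data in which complier status is known by construction. This is only a sketch under stated assumptions: the type shares (30 percent never-takers, 50 percent compliers, 20 percent always-takers), the instrument probability of 0.6, the outcome equations, and the function names `simulate` and `kappa` are all hypothetical choices made for the illustration, and there are no covariates.

```python
import numpy as np

def simulate(n=400_000, seed=1):
    rng = np.random.default_rng(seed)
    # Latent types: 0 = never-taker, 1 = complier, 2 = always-taker,
    # so monotonicity (D1 >= D0) holds by construction.
    kind = rng.choice(3, size=n, p=[0.3, 0.5, 0.2])
    z = rng.binomial(1, 0.6, size=n)                 # randomized instrument, pi_0 = 0.6
    d0 = (kind == 2).astype(int)                     # potential treatment when Z = 0
    d1 = (kind >= 1).astype(int)                     # potential treatment when Z = 1
    d = d0 + (d1 - d0) * z                           # observed treatment status
    y0 = rng.normal(loc=kind.astype(float), size=n)  # outcomes differ by type
    y1 = y0 + 1.0 + 0.5 * (kind == 1)                # effect for compliers is 1.5
    y = y1 * d + y0 * (1 - d)                        # observed outcome
    return y, d, z, kind

def kappa(d, z, pi0):
    """Weights from Section 2.2: 1 when D = Z, negative otherwise."""
    return 1.0 - d * (1 - z) / (1.0 - pi0) - (1 - d) * z / pi0

y, d, z, kind = simulate()
k = kappa(d, z, pi0=0.6)

share_compliers = k.mean()                           # Lemma 2.3 with h = 1
weighted_mean_y = (k * y).mean() / k.mean()          # Lemma 2.3 with h = Y
true_complier_mean = y[kind == 1].mean()             # infeasible with real data
wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())

print(share_compliers, weighted_mean_y, true_complier_mean, wald)
```

With real data the array `kind` is never observed, which is the point of the lemma: the $\kappa$-weighted average uses only $(Y, D, Z)$ yet approximates the complier mean, and the Wald ratio of Lemma 2.1 approximates the complier effect of 1.5 assumed in the simulation.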

3. Quantile Treatment Effects

3.1. The QTE Model

The QTE estimator is based on a model where the effect of treatment and covariates is linear and additive at each quantile, so that a single treatment effect is estimated. The analysis is straightforward when the treatment effect varies with $X$, but we use an additive model because the resulting estimator simplifies to Koenker and Bassett (1978) quantile regression when there is no instrumenting. The relationship between QTE and quantile regression is therefore analogous to the relationship between conventional IV and ordinary least squares (OLS).

The parameters of interest are defined as follows:

Assumption 3.1: For $\theta \in (0, 1)$, there exist unique $\alpha_\theta \in R$ and $\beta_\theta \in R^r$ such that

$$Q_\theta(Y \mid X, D, D_1 > D_0) = \alpha_\theta D + X'\beta_\theta, \qquad (4)$$

where $Q_\theta(Y \mid X, D, D_1 > D_0)$ denotes the $\theta$-quantile of $Y$ given $X$ and $D$ for compliers.

As a consequence of Lemma 2.2, the parameter of primary interest in this model, $\alpha_\theta$, gives the difference in the $\theta$-quantiles of $Y_1$ and $Y_0$ for compliers. This tells us, for example, whether JTPA training changed the median earnings of participants. Note, however, that in contrast with average treatment effects, where average differences equal differences in averages, $\alpha_\theta$ is not the quantile of the difference $(Y_1 - Y_0)$. Although the latter may also be of interest, we focus on the marginal distributions of potential outcomes because identification of the distribution of $Y_1 - Y_0$ requires much stronger assumptions and because economists making social welfare comparisons typically use differences in distributions and not the distribution of differences for this purpose (see, e.g., Atkinson (1970)).^6

The model above differs in a number of ways from the model in the seminal papers by Amemiya (1982) and Powell (1983), who used least absolute deviations to estimate a simultaneous equations system. Their approach begins with a traditional simultaneous equations model, and is not motivated by an attempt to characterize effects on distributions. Rather, the idea is to improve on 2SLS when the distributions of the error terms are long-tailed. Most importantly, in contrast with the parameters in equation (4), the parameters of interest in the Amemiya/Powell setup do not, in general, define a conditional quantile function.^7

The parameters of the conditional quantile function in equation (4) can be expressed as (see Bassett and Koenker (1982)):

$$(\alpha_\theta, \beta_\theta) = \arg\min_{(\alpha, \beta) \in R^{r+1}} E[\rho_\theta(Y - \alpha D - X'\beta) \mid D_1 > D_0],$$

where $\rho_\theta(\lambda)$ is the check function, defined as $\rho_\theta(\lambda) = (\theta - 1\{\lambda < 0\}) \cdot \lambda$. Therefore, using Lemma 2.3,

$$(\alpha_\theta, \beta_\theta) = \arg\min_{(\alpha, \beta) \in R^{r+1}} E[\kappa \cdot \rho_\theta(Y - \alpha D - X'\beta)]. \qquad (5)$$

[...] $D$ is not equal to $Z$, the sample minimand [...] of algorithms exist for minimization [...] and non-convex objective functions), but they [...] Charnes and Cooper (1957) or Fitzenberger (1997a,b), for a discussion of a related censored quantile regression problem). Unlike the conventional quantile regression minimand, the sample analog of equation (5) does not have a linear programming representation.

^6 Heckman, Smith and Clements (1997) discuss models where features of the distribution of the difference $(Y_1 - Y_0)$ are identified. They note that this may be of interest for questions regarding the political economy of social programs. If the ranking of individuals in the distribution of the outcome is preserved under the treatment, then the estimator in this paper is informative about the distribution of treatment impacts. King (1983) discusses horizontal equity concerns that require welfare analyses involving the joint distribution of outcomes.

^7 Identification in the Amemiya and Powell papers comes from conditional median restrictions on the reduced form. However, a conditional median restriction on the reduced form does not imply that the structural equation is a conditional median. In fact, for a binary endogenous regressor, conditional median restrictions on the reduced form and structural equation are typically incompatible.
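The role of the check function can be made concrete with a small numerical sketch. The code below is illustrative only and is not the estimator of this paper: it has no covariates and no treatment variable, the function names `check_loss` and `quantile_by_check_loss` are hypothetical, and it uses a brute-force search over observed values rather than linear programming. It shows that minimizing an average check loss with nonnegative weights returns a sample quantile.

```python
import numpy as np

def check_loss(lam, theta):
    """Check function rho_theta(lam) = (theta - 1{lam < 0}) * lam."""
    lam = np.asarray(lam, dtype=float)
    return (theta - (lam < 0)) * lam

def quantile_by_check_loss(y, theta, weights=None):
    """Minimize the (weighted) sum of check losses over candidate values.

    With nonnegative weights the minimand is convex and piecewise linear
    in the candidate value, so a minimizer is attained at an observed y.
    """
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    candidates = np.unique(y)
    losses = [(w * check_loss(y - c, theta)).sum() for c in candidates]
    return candidates[int(np.argmin(losses))]

y = np.array([3.0, 1.0, 4.0, 1.5, 9.0, 2.0, 6.0])
median = quantile_by_check_loss(y, theta=0.5)
print(median)  # the sample median of the seven values
```

The search over observed values is justified only when the weights are nonnegative. Weights such as $\kappa$, which are negative whenever $D \neq Z$, break the convexity of the minimand, which is why the sample analog of equation (5) cannot be handled by the usual quantile regression machinery.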

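Now, let $U = (Y, D, X)$; applying the Law of Iterated Expectations to equation (5), we obtain

$$(\alpha_\theta, \beta_\theta) = \arg\min_{(\alpha, \beta) \in R^{r+1}} E[\kappa_\nu(U) \cdot \rho_\theta(Y - \alpha D - X'\beta)],$$

where

$$\kappa_\nu(U) = E[\kappa \mid U] = 1 - \frac{D \cdot (1 - \nu_0(U))}{1 - \pi_0(X)} - \frac{(1 - D) \cdot \nu_0(U)}{\pi_0(X)}$$

and $\nu_0(U) = E[Z \mid U]$. [...]

$$\hat\delta_\theta = \arg\min_{\delta \in R^{r+1}} \sum_{i=1}^{n} 1\{\hat\kappa_\nu(U_i) > 0\} \cdot \hat\kappa_\nu(U_i) \cdot \rho_\theta(Y_i - W_i'\delta), \qquad (7)$$

with $W = (D, X')'$ and $\delta = (\alpha, \beta')'$.

First step estimation of $\kappa_\nu$ is carried out using non-parametric series regression. For an increasing sequence of positive integers $\{\lambda(k)\}_{k=1}^{\infty}$ and a positive integer $K$, let $p^K(Y) = (Y^{\lambda(1)}, ..., Y^{\lambda(K)})'$. Assume that $X$ only takes on a finite number of values (so that $W \in \{w_1, ..., w_J\}$). Then, any random sample $\{V_i\}_{i=1}^{n} = \{(Z_i, U_i)\}_{i=1}^{n}$ from $V = (Z, U)$ can be indexed as $\{\{V_{i_j}\}_{i_j=1}^{n_j}\}_{j=1}^{J}$, where $\{V_{i_j}\}_{i_j=1}^{n_j}$ are subsequences for distinct fixed values of $W = (X, D)$. In the same fashion, the sample can be indexed as $\{\{V_{i_l}\}_{i_l=1}^{n_l}\}_{l=1}^{L}$, where $\{V_{i_l}\}_{i_l=1}^{n_l}$ are subsequences for distinct fixed values of $X$. Now, a nonparametric power series estimator $\hat\nu(U)$ of $\nu_0(U)$ is given by the Least Squares projection of $\{Z_{i_j}\}_{i_j=1}^{n_j}$ on $\{p^K(Y_{i_j})\}_{i_j=1}^{n_j}$ (this amounts to non-parametric series regression of $Z$ on $Y$ in each $W$-cell). Let $\hat\nu_i$ be the fitted values of such estimator for the observations in our sample. Consider the simple estimator $\hat\pi(X)$ of $\pi_0(X)$ obtained by averaging $Z$ within cells of $X$. Our first step estimator of $\kappa_\nu$ is given by:

$$\hat\kappa_\nu(U_i) = 1 - \frac{D_i \cdot (1 - \hat\nu_i)}{1 - \hat\pi(X_i)} - \frac{(1 - D_i) \cdot \hat\nu_i}{\hat\pi(X_i)}.$$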
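The first step described above (a power series regression of $Z$ on $Y$ inside each cell of $(X, D)$, plus cell averages of $Z$ by $X$) can be sketched in a few lines. This is a rough illustration rather than the authors' implementation: the function name `first_step_kappa`, the polynomial degree of 3, the use of ordinary powers $1, Y, ..., Y^3$ as the basis, the within-cell standardization of $Y$ (which leaves the fitted values of the projection unchanged but improves conditioning), and the simulated data are all assumptions made here.

```python
import numpy as np

def first_step_kappa(y, d, z, x_cell, degree=3):
    """Return kappa-hat for each observation; x_cell labels the X cell."""
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    x_cell = np.asarray(x_cell)
    nu_hat = np.empty_like(y)
    pi_hat = np.empty_like(y)
    for c in np.unique(x_cell):
        in_x = x_cell == c
        pi_hat[in_x] = z[in_x].mean()                # average Z within the X cell
        for dv in (0.0, 1.0):                        # a W-cell is (X cell, D value)
            cell = in_x & (d == dv)
            if not cell.any():
                continue
            # Standardize Y in the cell; the polynomial span is unchanged.
            s = (y[cell] - y[cell].mean()) / (y[cell].std() + 1e-12)
            basis = np.vander(s, degree + 1, increasing=True)  # 1, s, ..., s^degree
            coef, *_ = np.linalg.lstsq(basis, z[cell], rcond=None)
            nu_hat[cell] = basis @ coef              # fitted values of Z given Y
    return 1.0 - d * (1.0 - nu_hat) / (1.0 - pi_hat) - (1.0 - d) * nu_hat / pi_hat

# Hypothetical data: one X cell, 50 percent compliers, 20 percent always-takers.
rng = np.random.default_rng(0)
n = 20_000
kind = rng.choice(3, size=n, p=[0.3, 0.5, 0.2])     # 0 never, 1 complier, 2 always
z = rng.binomial(1, 0.5, size=n)
d = np.where(z == 1, kind >= 1, kind == 2).astype(int)
y = rng.normal(size=n) + d
kappa_hat = first_step_kappa(y, d, z, np.zeros(n, dtype=int))
print(kappa_hat.mean())                             # close to the complier share
```

Unlike $\kappa$ itself, the fitted $\hat\kappa_\nu$ is a smooth function of $Y$ within each cell and is mostly positive; the few negative fitted values are what the indicator $1\{\hat\kappa_\nu(U_i) > 0\}$ in the second step trims away.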



3.3. Distribution Theory

This subsection summarizes asymptotic results for the QTE estimator. Proofs are given in the appendix.

Theorem 3.1: Under assumptions 2.1 and 3.1 and if (i) the data are i.i.d.; (ii) conditional on $W$, $Y$ is continuously distributed with support equal to a compact interval and density bounded away from zero; (iii) $\pi_0(X)$ is bounded away from zero and one, and $X$ takes on a finite number of values; (iv) conditional on $W$ and $D_1 > D_0$, $\varepsilon_\theta$ is continuously distributed with bounded density; the distribution function of $\varepsilon_\theta$ conditional on $W$ and $D_1 > D_0$ is continuously differentiable at zero with density $f_{\varepsilon_\theta \mid W, D_1 > D_0}(0)$ that is bounded and bounded away from zero uniformly in $W$; (v) $\kappa_\nu$ is bounded away from zero uniformly in $U$; (vi) for $s$ equal to the number of continuous derivatives in $Y$ of $\nu_0$, $n K^{-2s} \to 0$ and $K^5/n \to 0$. Then,

$$n^{1/2}(\hat\delta_\theta - \delta_\theta) \xrightarrow{d} N(0, \Omega),$$

where $\Omega = J^{-1} \Sigma J^{-1}$, $J = E[f_{\varepsilon_\theta \mid W, D_1 > D_0}(0) \cdot W W' \mid D_1 > D_0] \cdot P(D_1 > D_0)$ and $\Sigma = E[\psi\psi']$ with $\psi = \kappa \cdot m(U) + H(X) \cdot \{Z - \pi_0(X)\}$ [...]

The asymptotic

Assumption

To produce an estimator

loss.

...

[...] $D_0$). This is bigger than the quantile regression intercept because of positive selection for male compliers.

Perhaps most striking among our findings is the result that training for adult men does not seem to have raised the lower quantiles of their earnings. This may be because of an effort by program operators to target services at relatively easy-to-employ men with higher earnings potential. This results in distributional changes that would be undesirable in any assessment using a social welfare function that weights the lower tail of the earnings distribution more heavily. Since the ostensible purpose of the JTPA was to aid economically disadvantaged workers, it seems likely that the lower quantiles are of particular concern to policy makers. One response to this finding might be that few JTPA applicants were very well off, so that distributional effects within applicants are of less concern than the fact that the program helped many applicants overall. However, the upper quantiles of earnings were reasonably high for adult males who participated in the National JTPA Study. Increasing this upper tail is therefore unlikely to have been a high priority.

5. Summary and Conclusions

This paper reports estimates of the effect of subsidized training on the quantiles of earnings for participants. We use a new estimator for the effect of a non-ignorable treatment on quantiles. The QTE estimator can be used to determine how an intervention affects the distribution of any variable for individuals whose treatment status is changed by a binary instrument. The estimator accommodates exogenous covariates and collapses to conventional quantile regression when the treatment is exogenous. It minimizes a convex piecewise-linear objective function similar to that for conventional quantile regression, and can be computed as the solution to a linear programming problem after first-step estimation of a nuisance function. The paper develops distribution theory for the case where this first step is estimated nonparametrically.

QTE estimates of the effect of training on the quantiles of the earnings distribution suggest interesting and important differences in program effects at different quantiles, and differences in distributional impact for men and women. These differences are large enough to potentially change the welfare analysis of the JTPA program.

Appendix Proof of Theorem

3.1:

This proof largely follows that of Theorem

1 in

Buchinski and

Hahn

(1998). Consider,

='^2 gi (j,K)

Gn(T,K)

i

where gi (r, k)

and

egi

\/n(Se

-



=

K{Ui)

{9

— W-5g. The

Y{

Now,

5e)-

function

define r„(r,K)

MLLUA almost surely.

By

- n-

\{e ei

1'2

W[t)+ -

Gn (r, 1{kv >

= E[G n (T, k)}.

= _ n -V2 W

.

e+]

0}

Kv{Ui)

.

kv )



Note

+ (1 - 6) is

- n~ 1 ' 2 W(r)- -

[{e ei

convex in r and

it

is

e«]},

minimized

at

rn

=

that,

{6

_

1{£0i

_ n -l/2

W

-




E r+1 0}

by

Op(l)- So,

.

Op (l)

that:

jT _ T W "(1{^ > '

K„)

0}

+ - 7^ J??n

=

^

r'Jr

+

.

o p (l). Since A n (r)

is

K(t)-\t'Jt -0,

k„)

=

-

Lemma 3

(t

in

-

ry

n )'J(r

-

?

?

J - - rfn Ji ln + rn (r)

Buchinsky and Hahn (1998), we have that r n

=

?7

n

+ o p (l).

A.l

-6 e )^N(O,J- ZJl

1

),

E[ipip'].

PROOF OF Lemma A.l zero.

+

Then,

n 1 / 2 (6

£=

'

T

K„)

0}



G„(t, 1{k„

where

Note

>

r'w n (l{^

= G n (i~, l{«t/ > 0} K„) + r'o; n (l{K„ > 0} K„), then X n (r) applying Pollard's convexity lemma (Pollard (1991)):

any compact subset of

with sup TgT

*Vt "

o 2

g

sup

where

o p (l).

A. 2:

Gn (T, l{Ku > for

=

T)]}

1

To prove this lemma we use the assumption that $\kappa_\nu$ is bounded away from zero. This assumption is probably stronger than necessary but it allows us to ignore the trimming using $1\{\hat\kappa_\nu > 0\}$, making the asymptotics easier. Assumption (vi) implies that $K \cdot ((K/n_j)^{1/2} + K^{-s}) \to 0$ almost surely for all $j \in \{1, ..., J\}$. Therefore $\sup_{U \in \mathcal{U}} |\hat\nu - \nu_0| = o_p(1)$ (see, e.g., Newey (1997), Theorem 4). Since $\pi_0$ is bounded away from zero and one (by (iii)), then $\sup_{U \in \mathcal{U}} |\hat\kappa_\nu - \kappa_\nu| = o_p(1)$. Since $\kappa_\nu$ is bounded away from zero, with probability approaching one the trimming is not binding and we can ignore it for the asymptotics.

uj n

Let

7Tq

{K u )

be the population

^n

=

= -= y.m(Ui) V n ~(

mean

-F=V m(/7i)

V™^

of

Z

V

-

7—-

1

for the /-cell of

i



1

V

TTOi-TTt

- 7r 0(A

X

7T

i )

and ?

its

(A i )

+ Rn /

sample counterpart.

r

-(TTi-TToi)

(l-7Ti)-(l-7r0i)/

24

(l-A,)-^,

A, •(!-£*,)

tt'-tt'

(i_^).(i_4)

o,

note that

D ir {\-VM

l-D^-Vi, 111

fe

-9

n'

V

1

=

l

Lemma

\V

D

f& ~

,m

V^

< sup

Then, applying

(1-Tf')-

(1 -*{,),/

@ii

ii)

~

v Oii)

— Vq\ n

'fe

A, (%,

,

-^Oi,) ^

Op(l).

*W

V

Newey and McFadden

4.3 in

\

)

(l-5?')-(l

-^)J

(1994),

-l

I l

lX n -TV

(i-A,)-^_ £

(!-£>)

D-(l-i/

i/

m([/) (7T

(X))2

A.-(i-hh.)

\

(1-^.(1-4)

J

(1)

(l-D)-Z

)

m(C/)

A"

(l-Tro(X))'

D-(l-Z)

(i-MX)Y

(x)y

(7r

X

Therefore,

i(K v )

To

"

lf. V^ n

+

^Vi/(X )-{^-7r

/

A-(l-gj)

(l-A)-gj

V

1-T0(^i)

TToCX)

i

(X )}+Op (l). I

prove,

J_V- an A

A-(i-Pi)

(l-A)-gj

>™ Vn

"7=

^i

JT(

1

V

-

A



(1

-

Zi) 7T7T

l-TTo(Ai)

(1

- A) ~ 7r

Zi

lv (Ajj

+

,

Op(l)

/

notice that

A V

t=l



(l

-

vj)

l-Tro(Xi)

(l "

- A)

Pt

no(Xi) J

-.

ni

l

- A,) tt

25

(A%)

Pjj

So,

we

show that

just have to

for

each j €

{1,

J}

(l-A)-zv

Di. -(1-Vi.)

(

...,

i,=i

:^m(^). 1-

+0,(1 ,(!)

l-Tro(^) ,

ttoP^.;

This will be done by checking assumptions 6.1 to 6.6 in Newey (1994). Assumptions 6.1 and 6.2 follow from the conditions of the theorem (see Newey~(1994), page 1373). Assumption 6.3 holds with d — and ad = s. Assumption 6.4 holds for b(z) — and derivative equal to directly

m(U) 5 Assumptions 6.5 and 6.6 follow from: (i) rij K~ 2s — 0; (ii) /rij — check Assumption 6.5 note that (vi) implies that s > 5/2, therefore 6.5 is also valid with d — 0). To check assumption 6.6 note that since

K

>

(almost surely). In particular, to

>

K K~ —

m(U) i

D

1-D

-MX)

MX)

s




00,

then, there exists a sequence £ K such that

E as

K

-

oo (see

Newey

D

m(U)

l

~£)-tKP MX),

l-no(X)

(1994), page 1380 last paragraph.)

Di

(1

-

Vi)

(1

-

Di)

Tn^ mm \ V"

Now, applying the

(TT\

(-L

l

A-(l-^Oi)

-

I

-MX,)

(1

~

~ Di)

Z%

MXi)

)

Di V

Proof of Lemma

lemma

A. 2

:

(1994),

-

VQi)

(1

- ZO

+

o p (l)

(1

7To(A'i

N

2=1

VQ

MXi)

l-A

result of the

Newey

Vi

x

and the

results in

MXi)

)

1

K (U)

^ + Op(l) TTo(A'i)

- A)

holds.

Note that pn (Ui,l{K v > 0}-K v ,T)-pn (Ui,K

1/

,T)

=

(1{k„

> 0}-K„-n v )-Sn (Ui,T),

where

Sn (Ui,r)

so \Sn (Ui,r)\

E[n

9-[{e ei

+

(1-6).

< n- x l 2 \{\e 8i < n- l ' 2 \W[ T \} \

\Sn (Ui,r\}

=E


\

{

h

and

(ii)


D (h-z)dz

=

fe e \W,D 1

>D o (0) + h

=

fe e \W, Dl

>D o {0) +

dz

Z- D

WW'\D >

[feelw Dl>Do (0)

>

1

P(D > D l

]

D

)

).

=

0(l/h 2 ).

then

Notice that, since k„

X>„(0i)

Di

W are bounded,

^ 5>„(tfi)

i

,

x

var( Ku -^ h (6e)-WW') Since

U ]W

3.1, E[

Q

= E Also, since

]

hm E [E [

D

=

WW'}

(6e)

W

1

Theorem

(iv) in

absolute value) by a constant. Since

W,D > D =

„(0i) ¥>m&) WiW! lY,K v (Ui) •

2

By

(i)

and

(iv), for

I Y,K v {Ui)

ip

=1

l

(6 e )

WiWi + o„(l).

(A.3)

2=1

some constant C

Ki (6e)

WiWi - K v {Ui)

oo

-—~-
