Working Paper
Department of Economics
Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings
Alberto Abadie, Joshua Angrist, Guido Imbens
No. 99-16, October 1999
Massachusetts Institute of Technology
50 Memorial Drive, Cambridge, Mass. 02139
Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings*
Alberto Abadie (MIT), Joshua Angrist (MIT and NBER), Guido Imbens (UCLA and NBER)
Revised: September 1999
Abstract
The effect of government programs on the distribution of participants' earnings is important for program evaluation and welfare comparisons. This paper reports estimates of the effects of JTPA training programs on the distribution of earnings. The estimation uses a new instrumental variable (IV) method that measures program impacts on the quantiles of outcome variables. This quantile treatment effects (QTE) estimator accommodates exogenous covariates and reduces to quantile regression when selection for treatment is exogenously determined. The QTE estimator can be computed as the solution to a convex linear programming problem, although this requires first-step estimation of a nuisance function. We develop distribution theory for the case where the first step is estimated nonparametrically. For women, the empirical results show that the JTPA program had the largest proportional impact at low quantiles. Perhaps surprisingly, however, JTPA training raised the quantiles of earnings for men only in the upper half of the trainee earnings distribution.
*We thank Moshe Buchinsky, Gary Chamberlain, Jinyong Hahn, Jerry Hausman, Whitney Newey, Shlomo Yitzhaki, and seminar participants at Berkeley, MIT-Harvard, Penn, and the Econometric Society Summer 1998 meetings for helpful comments and discussions. Thanks also go to Erik Beecroft at Abt Associates for providing us with the National JTPA Study data and for helpful discussions. Abadie acknowledges financial support from the Bank of Spain. Imbens acknowledges financial support from the Sloan Foundation.
1. Introduction
Effects of economic variables on distributions of outcomes are of fundamental interest in many areas of empirical economic research. A leading example is the question of how government programs affect the distribution of participants' earnings, since the welfare analysis of public policies involves distributions of outcomes. Policy-makers often hope that subsidized training programs will reduce earnings inequality by raising the lower quantiles of the earnings distribution and thereby reducing poverty (Lalonde (1995), US Department of Labor (1995)). Another example from labor economics is the effect of union status on the distribution of earnings. One of the earliest studies of the distributional consequences of unionism is Freeman (1980), while more recent analyses include Card (1996), and DiNardo, Fortin, and Lemieux (1996), who have asked whether changes in union status can account for a significant fraction of increasing wage inequality in the 1980s.
Although the importance of distribution effects is widely acknowledged, most evaluation research focuses on average outcomes, probably because the statistical techniques required to estimate effects on means are easier to use. Many econometric models also implicitly restrict treatment effects to operate in the form of a simple "location shift", in which case the mean effect captures the impact of treatment at all quantiles. Of course, the impact of treatment on a distribution is easy to assess when treatment status is randomly assigned and there is perfect compliance with treatment assignment. Randomization guarantees that outcomes in the treatment group are directly comparable to outcomes in the control group, so valid causal inferences can be obtained by simply comparing the treatment and control distributions. The problem of how to draw inferences about distributional effects in randomized studies with non-compliance or in observational studies with non-random assignment is more difficult, however, and has received less attention.1

1 Discussions of average treatment effects include Rubin (1977), Rosenbaum and Rubin (1983), and Heckman and Robb (1985). Manski (1994), Heckman, Smith and Clements (1997), Imbens and Rubin (1997), and Abadie (1999a) discuss effects on distributions. Manski (1994, 1997) develops estimators for bounds on quantiles.

In this paper, we show how to use a source of exogenous variation in treatment status (an instrumental variable) to estimate the effect of treatment on the quantiles of the distribution of outcomes in non-randomized studies, or in situations where the offer of treatment is randomized but treatment itself is voluntary. This Quantile Treatment Effects (QTE) estimator is used here to estimate the effect of training on trainees served by the Job Training Partnership Act (JTPA) of 1982, a large publicly-funded training program designed to help economically disadvantaged individuals.
The data come from the National JTPA Study, a social experiment begun in the late 1980s at 16 locations across the US to evaluate the effects of JTPA training. For this study, JTPA applicants were randomly assigned to treatment and control groups. Individuals in the treatment group were offered JTPA training, while those in the control group were excluded for a period of 18 months. Only 60 percent of the treatment group actually received training, but we can use the treatment assignment as an instrument for treatment. The effects estimated using the framework developed here are valid for the subpopulation we call compliers. This terminology is used because in randomized trials with partial compliance, like the JTPA, the relevant subpopulation consists of people who always comply with the treatment protocol. In fact, in the case of the JTPA, where (almost) no one in the control group received treatment, effects for compliers are also representative of effects on the treated.2 In other cases, compliers are those whose treatment status is affected by an instrumental variable.

2 Angrist and Imbens (1991) discuss the relationship between instrumental variables and effects on the treated. Orr, et al. (1996) and Heckman, Smith, and Taber (1994) report average effects on the treated in the JTPA. Heckman, Clements and Smith (1997) estimate the distribution of JTPA treatment effects using a non-IV framework.

The identification results underlying the compliers approach to instrumental variables (IV) models were first established by Imbens and Angrist (1994) and Angrist, Imbens, and Rubin (1996). Imbens and Rubin (1997) extended these results to the identification of the effect of treatment on distributions, and Abadie (1999a) showed how to test global hypotheses about distribution impacts such as stochastic dominance. But neither of these papers developed simple estimators or a scheme for estimating the effect of treatment on quantiles. We focus here on conditional quantiles because quantiles provide useful summary statistics for distributions, and because quantile comparisons have been at the heart of recent discussions of changing wage inequality (see, e.g., Chamberlain (1991), Katz and Murphy (1992) and Buchinsky (1994)).

The paper is organized as follows. Section 2 outlines the conceptual framework and discusses the identification problem. Section 3 presents the estimator, which allows for a binary endogenous regressor (indicating exposure to treatment) and reduces to Koenker and Bassett (1978) quantile regression when selection for treatment is exogenous. Like quantile regression, the estimator developed here can be written as the solution to a convex linear programming (LP) problem, although implementation of the QTE estimator requires estimation of a nuisance function in a first step. Finally, Section 4 discusses the estimates of effects of training on the quantiles of trainee earnings. The estimates for women show larger proportional increases in earnings at lower quantiles of the trainee earnings distribution. But the estimates for men suggest the impact of training was largest in the upper half of the distribution and not at lower quantiles as policy-makers perhaps would have wished.
2. Conceptual Framework
The setup is as follows. The data consist of n observations on a continuously distributed outcome variable, $Y$, a binary treatment indicator $D$, and a binary instrument, $Z$. In the case of subsidized training, $Y$ is earnings, $D$ indicates program participation, and $Z$ is an indicator of the randomized offer of training. $D$ and $Z$ are not equal in the JTPA because not everyone who was offered training received it and because a few people who were not offered training received services anyway. In a study of the effect of unions, $Y$ might be a measure of wages, $D$ would indicate union status, and $Z$ would be an instrument for union status, say a dummy indicating individuals who work in firms that were subject to union organizing campaigns (Lalonde, Marschke and Troske (1996)). We also allow for an $r \times 1$ vector of covariates, $X$.

As in Rubin (1974, 1977) and our earlier work on instrumental variables estimation of causal effects, we define the causal effects of interest using potential outcomes and potential treatment status. In particular, we define potential outcomes indexed against $D$, $Y_d$, and potential treatment status indexed against $Z$, $D_z$. Potential outcomes and potential treatment status describe possibly counterfactual states of the world. Thus, $D_1$ tells us what value $D$ would take if $Z$ were equal to 1, while $D_0$ tells us what value $D$ would take if $Z$ were equal to 0. Similarly, $Y_d$ tells us what someone's outcome would be if they had $D = d$. The objects of causal inference are features of the distribution of potential outcomes, possibly restricted to particular subpopulations.

The observed treatment status is:

$$D = D_0 + (D_1 - D_0) \cdot Z.$$

In other words, if $Z = 1$, then $D_1$ is observed, while if $Z = 0$, then $D_0$ is observed. Likewise, the observed outcome variable is:

$$Y = Y_1 \cdot D + Y_0 \cdot (1 - D). \qquad (1)$$

The reason why causal inference is difficult is that although we think of all possible counterfactual outcomes as being defined for everyone, only one potential treatment status and one potential outcome are ever observed for any one person.3

3 The idea of potential outcomes appears in labor economics in discussions of the effects of union status. See, for example, Lewis' (1986) survey of research on union relative wage effects.

2.1. Principal Assumptions
The principal assumptions of the potential outcomes framework for IV are stated below:

Assumption 2.1: For almost all values of $X$,
(i) Independence: $(Y_1, Y_0, D_1, D_0)$ is jointly independent of $Z$ given $X$.
(ii) Non-Trivial Assignment: $P(Z = 1 \mid X) \in (0, 1)$.
(iii) First-Stage: $E[D_1 \mid X] \neq E[D_0 \mid X]$.
(iv) Monotonicity: $P(D_1 \geq D_0 \mid X) = 1$.
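The mapping from potential values to observed data can be made concrete with a small simulation. The following is only a sketch under stated assumptions: the latent-index participation rule, the parameter values `lam0` and `lam1`, the constant treatment effect of 1.5, and the function name `simulate` are all hypothetical choices for illustration and are not taken from the paper's data.

```python
import random

def simulate(n=1000, lam0=-0.5, lam1=1.0, seed=0):
    """Draw potential values for n hypothetical people and build observed (Z, D, Y)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        eta = rng.gauss(0.0, 1.0)         # unobserved participation cost (assumed normal)
        d0 = int(lam0 > eta)              # potential treatment status if Z = 0
        d1 = int(lam0 + lam1 > eta)       # potential treatment status if Z = 1
        y0 = 10.0 + rng.gauss(0.0, 2.0)   # potential outcome without treatment
        y1 = y0 + 1.5                     # potential outcome with treatment (assumed constant effect)
        z = int(rng.random() < 0.5)       # randomized offer, drawn independently of everything else
        d = d0 + (d1 - d0) * z            # observed treatment status
        y = y1 * d + y0 * (1 - d)         # observed outcome, equation (1)
        rows.append({"z": z, "d": d, "y": y, "d0": d0, "d1": d1, "y0": y0, "y1": y1})
    return rows

rows = simulate()
# Monotonicity (Assumption 2.1(iv)) holds by construction because lam1 > 0.
monotone = all(r["d1"] >= r["d0"] for r in rows)
```

Only `z`, `d` and `y` would be available to an analyst; the other fields are the counterfactual quantities that the framework treats as defined for everyone but never jointly observed.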
Assumption 2.1(i) subsumes two related requirements. First, comparisons by instrument status identify the causal effect of the instrument. This is equivalent to instrument-error independence in traditional simultaneous equations models. Second, potential outcomes are not directly affected by the instrument. This is an exclusion restriction. See Angrist, Imbens and Rubin (1996) for additional discussion of these two requirements and how they differ. Assumption 2.1(i) is plausible (though not guaranteed) in the case of the JTPA because of the randomly assigned offer of treatment.

Assumption 2.1(ii) requires that the conditional distribution of the instrument not be degenerate. The relationship between instruments and treatment assignment is restricted in two other ways as well. As in simultaneous equations models, we require that there be some correlation between $D$ and $Z$; this is stated in Assumption 2.1(iii). Also, Imbens and Angrist (1994) have shown that Assumption 2.1(iv) guarantees identification of a meaningful average treatment effect in any model with heterogeneous potential outcomes that satisfies assumptions 2.1(i)-2.1(iii). This monotonicity assumption means that the instrument can only affect $D$ in one direction. Monotonicity is plausible in most applications and it is automatically satisfied by latent-index models for treatment assignment.4 It is also a reasonable assumption for the JTPA, where $D_0 = 0$ for (almost) everyone.

4 A latent-index model for participation is $D = 1\{\lambda_0 + Z \cdot \lambda_1 - \eta > 0\}$, where $\lambda_0$ and $\lambda_1$ are parameters and $\eta$ is an error term that is independent of $Z$. Then $D_0 = 1\{\lambda_0 > \eta\}$, $D_1 = 1\{\lambda_0 + \lambda_1 > \eta\}$, and either $D_1 \geq D_0$ for everyone or $D_0 \geq D_1$ for everyone. If $\lambda_1 < 0$, so that $D_0 \geq D_1$ for everyone, then monotonicity holds for $Z' = 1 - Z$.

The inference problem in evaluation research involves comparisons of observed and counterfactual outcomes, possibly after conditioning on observed covariates, $X$. For example, many evaluation studies focus on estimating the difference between the average outcome for the treated (which is observed) and what this average would have been in the absence of treatment (which is counterfactual). Outside of a randomized trial, the difference in average outcomes by observed treatment status is typically a biased estimate of this effect:

$$E[Y_1 \mid X, D = 1] - E[Y_0 \mid X, D = 0] = \{E[Y_1 \mid X, D = 1] - E[Y_0 \mid X, D = 1]\} + \{E[Y_0 \mid X, D = 1] - E[Y_0 \mid X, D = 0]\}.$$

The first term in brackets is the average effect of the treatment on the treated, which can also be written as $E[Y_1 - Y_0 \mid X, D = 1]$ since expectation is a linear operator; the second is the bias term. For example, comparisons of earnings by training status are biased if trainees are selected for training on the basis of low earnings potential. This bias extends to comparisons other than the mean. For example, the relationship above holds if we replace conditional expectations with conditional quantiles.
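The decomposition into an effect on the treated plus a selection-bias term can be checked numerically. This is a minimal sketch on simulated data: the earnings distribution, the true effect of 1, and the rule that low earners are more likely to train are illustrative assumptions, not features of the National JTPA Study.

```python
import random
from statistics import mean

rng = random.Random(1)
people = []
for _ in range(5000):
    y0 = rng.gauss(10.0, 2.0)                  # earnings without training
    y1 = y0 + 1.0                              # earnings with training (assumed true effect = 1)
    d = int(y0 + rng.gauss(0.0, 1.0) < 10.0)   # assumed selection: low earners tend to train
    people.append((y0, y1, d))

treated = [p for p in people if p[2] == 1]
untreated = [p for p in people if p[2] == 0]

# Left-hand side: difference in observed average outcomes by treatment status.
naive = mean(p[1] for p in treated) - mean(p[0] for p in untreated)
# First bracket: average effect of treatment on the treated.
att = mean(p[1] - p[0] for p in treated)
# Second bracket: selection bias in the untreated potential outcome.
bias = mean(p[0] for p in treated) - mean(p[0] for p in untreated)
```

In this setup `naive` equals `att + bias` up to floating-point rounding, and `bias` is negative because trainees are drawn from the lower part of the earnings distribution, so the naive comparison understates the benefit of training.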
2.2. Identification Using Instrumental Variables
An instrumental variable solves the problem of identifying causal effects for a group of individuals whose treatment status is affected by the instrument. The following result (Imbens and Angrist (1994)) captures this idea formally:

Lemma 2.1: Under Assumption 2.1 (and assuming that the relevant expectations are finite)

$$\frac{E[Y \mid X, Z = 1] - E[Y \mid X, Z = 0]}{E[D \mid X, Z = 1] - E[D \mid X, Z = 0]} = E[Y_1 - Y_0 \mid X, D_1 > D_0].$$

$E[Y_1 - Y_0 \mid X, D_1 > D_0]$ is called a Local Average Treatment Effect (LATE). We refer to individuals for whom $D_1 > D_0$ as compliers because in a randomized trial with partial compliance, this group would consist of individuals who comply with the treatment protocol whatever their assignment. In other words, the set of compliers is the set of individuals whose treatment status was changed in the experiment induced by $Z$. Note that individuals in this set cannot be identified (i.e., we cannot name the people who are compliers) because we never observe both $D_1$ and $D_0$ for any one person. Also note that in the special case where $D_0 = 0$ for everyone,

$$E[Y_1 - Y_0 \mid X, D_1 > D_0] = E[Y_1 - Y_0 \mid X, D_1 = 1] = E[Y_1 - Y_0 \mid X, D = 1, Z = 1] = E[Y_1 - Y_0 \mid X, D = 1],$$

so LATE is the effect of treatment on the treated. The equivalence between effects for compliers and effects on the treated in cases where $D_0$ is identically zero holds for any distributional characteristic and not just means.

The compliers concept is at the heart of the LATE framework and provides a simple explanation for how IV methods work. Suppose initially that we could know who the compliers are. For these people, $Z = D$, since it is always true that $D_1 > D_0$. This observation plus Assumption 2.1 leads to the following lemma:
Lemma 2.2: Given Assumption 2.1 and conditional on $X$, treatment status, $D$, is ignorable (independent of the potential outcomes) for compliers: $(Y_1, Y_0) \perp D \mid X, D_1 > D_0$.

Proof: Assumption 2.1(i) says that $(Y_1, Y_0, D_1, D_0) \perp Z \mid X$, so $(Y_1, Y_0) \perp Z \mid X, D_1 = 1, D_0 = 0$. When $D_1 = 1$ and $D_0 = 0$, $D$ can be substituted for $Z$.

A consequence of Lemma 2.2 is that in the subpopulation of compliers, comparisons of means by treatment status estimate an average treatment effect, even though treatment is not ignorable in the population:

$$E[Y \mid X, D = 1, D_1 > D_0] - E[Y \mid X, D = 0, D_1 > D_0] = E[Y_1 - Y_0 \mid X, D_1 > D_0]. \qquad (2)$$

Of course, as it stands, Lemma 2.2 is of no practical use because the subpopulation of compliers is not identified (i.e., we do not observe $D_1$ and $D_0$ for the same individual). To make Lemma 2.2 operational, we define the following function of $D$, $Z$ and $X$:

$$\kappa = 1 - \frac{D \cdot (1 - Z)}{1 - \pi_0(X)} - \frac{(1 - D) \cdot Z}{\pi_0(X)},$$

where $\pi_0(X) = P(Z = 1 \mid X)$. Note that $\kappa$ equals one when $D = Z$, otherwise $\kappa$ is negative. This function is useful because it "identifies compliers" in the following average sense:

Lemma 2.3: (Abadie, 1999b) Let $h(Y, D, X)$ be any integrable real function of $(Y, D, X)$. Then, given Assumption 2.1,

$$E[h(Y, D, X) \mid D_1 > D_0] = \frac{1}{P(D_1 > D_0)} E[\kappa \cdot h(Y, D, X)].$$

To see why this is true, note that, by monotonicity, the population can be partitioned into three groups: compliers who have $D_1 > D_0$, always-takers who have $D_1 = D_0 = 1$, and never-takers who have $D_1 = D_0 = 0$. Thus,

$$E[h(Y, D, X) \mid X, D_1 > D_0] = \frac{1}{P(D_1 > D_0 \mid X)} \Big\{ E[h(Y, D, X) \mid X] - E[h(Y, D, X) \mid X, D_1 = D_0 = 1] \cdot P(D_1 = D_0 = 1 \mid X) - E[h(Y, D, X) \mid X, D_1 = D_0 = 0] \cdot P(D_1 = D_0 = 0 \mid X) \Big\}.$$

Monotonicity means that all individuals with $Z = 1$ and $D = 0$ must be never-takers. Likewise, those with $Z = 0$ and $D = 1$ must be always-takers. Since $Z$ is ignorable given $X$, we have the following expressions for always-takers and never-takers as a function of observed moments:

$$E[h(Y, D, X) \mid X, D_1 = D_0 = 1] = E[h(Y, D, X) \mid X, D = 1, Z = 0] = \frac{1}{P(D = 1 \mid X, Z = 0)} E\left[ \frac{D \cdot (1 - Z)}{1 - \pi_0(X)} \cdot h(Y, D, X) \,\Big|\, X \right]$$

and

$$E[h(Y, D, X) \mid X, D_1 = D_0 = 0] = E[h(Y, D, X) \mid X, D = 0, Z = 1] = \frac{1}{P(D = 0 \mid X, Z = 1)} E\left[ \frac{(1 - D) \cdot Z}{\pi_0(X)} \cdot h(Y, D, X) \,\Big|\, X \right].$$

Monotonicity and ignorability of $Z$ given $X$ can similarly be used to identify the proportions of always-takers and never-takers using $P(D_1 = D_0 = 1 \mid X) = P(D = 1 \mid X, Z = 0)$ and $P(D_1 = D_0 = 0 \mid X) = P(D = 0 \mid X, Z = 1)$. Integrating over $X$ completes the argument.

An implication of Lemma 2.3 is that any parameter defined as the solution to a moment condition involving $(Y, D, X)$ is identified for compliers. This point is explored in detail in Abadie (1999b).5 In the next section, we show how Lemma 2.3 can be used to develop an estimator for the causal effect of treatment on the quantiles of an outcome variable.

5 For example, if we define $\mu$ and $\alpha$ as

$$(\mu, \alpha) = \arg\min_{(m, a)} E[(Y - m - aD)^2 \mid D_1 > D_0],$$

then $\mu = E[Y_0 \mid D_1 > D_0]$ and $\alpha = E[Y_1 - Y_0 \mid D_1 > D_0]$, so $\alpha$ is LATE (although $\mu$ is not the same intercept that is identified by conventional IV methods). By Lemma 2.3, $(\mu, \alpha)$ also minimizes $E[\kappa \cdot (Y - m - aD)^2]$.
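Lemma 2.1 and Lemma 2.3 can be verified exactly on a small discrete population. The sketch below enumerates every cell of a hypothetical population with three compliance types and no covariates; the shares, potential outcomes, value of $\pi_0$, and all function names are illustrative assumptions. Exact rational arithmetic is used so that the identities hold without sampling error.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical population (all numbers are illustrative assumptions):
# type -> (population share, D0, D1, Y0, Y1)
TYPES = {
    "complier":     (F(1, 2),  0, 1, 10, 14),
    "always_taker": (F(1, 5),  1, 1, 20, 21),
    "never_taker":  (F(3, 10), 0, 0,  5,  9),
}
PI = F(2, 3)  # pi_0 = P(Z = 1); Z is independent of type, as in Assumption 2.1(i)

def cells():
    """Enumerate (probability, type, Z, D, Y) for every cell of the population."""
    out = []
    for (name, (share, d0, d1, y0, y1)), z in product(TYPES.items(), (0, 1)):
        d = d1 if z == 1 else d0
        y = y1 if d == 1 else y0
        out.append((share * (PI if z == 1 else 1 - PI), name, z, d, y))
    return out

def expect(f, keep=lambda name, z, d, y: True):
    """Exact conditional expectation E[f | keep] over the enumerated cells."""
    kept = [(p, f(name, z, d, y)) for p, name, z, d, y in cells() if keep(name, z, d, y)]
    return sum(p * v for p, v in kept) / sum(p for p, _ in kept)

def kappa(z, d):
    """Weight from Section 2.2: one when D = Z, negative otherwise."""
    return 1 - F(d * (1 - z)) / (1 - PI) - F((1 - d) * z) / PI

# Lemma 2.1: the Wald ratio equals the average effect for compliers (14 - 10 = 4).
wald = (
    (expect(lambda n, z, d, y: y, lambda n, z, d, y: z == 1)
     - expect(lambda n, z, d, y: y, lambda n, z, d, y: z == 0))
    / (expect(lambda n, z, d, y: d, lambda n, z, d, y: z == 1)
       - expect(lambda n, z, d, y: d, lambda n, z, d, y: z == 0))
)

# Lemma 2.3 with h(Y, D, X) = Y: kappa-weighting recovers the complier mean of Y.
share_compliers = expect(lambda n, z, d, y: kappa(z, d))   # Lemma 2.3 with h = 1
weighted = expect(lambda n, z, d, y: kappa(z, d) * y) / share_compliers
direct = expect(lambda n, z, d, y: y, lambda n, z, d, y: n == "complier")
```

The always-taker and never-taker cells receive weights of opposite sign that cancel in expectation, which is why `weighted` matches `direct` even though no individual complier is ever labeled in the observed data.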
3. Quantile Treatment Effects
3.1. The QTE Model
The QTE estimator is based on a model where the effect of treatment and covariates is linear and additive at each quantile, so that a single treatment effect is estimated. The analysis is straightforward when the treatment effect varies with $X$, but we use an additive model because the resulting estimator simplifies to Koenker and Bassett (1978) quantile regression when there is no instrumenting. The relationship between QTE and quantile regression is therefore analogous to the relationship between conventional IV and ordinary least squares (OLS). The parameters of interest are defined as follows:

Assumption 3.1: For $\theta \in (0, 1)$, there exist unique $\alpha_\theta \in \mathbb{R}$ and $\beta_\theta \in \mathbb{R}^r$ such that

$$Q_\theta(Y \mid X, D, D_1 > D_0) = \alpha_\theta D + X'\beta_\theta, \qquad (4)$$

where $Q_\theta(Y \mid X, D, D_1 > D_0)$ denotes the $\theta$-quantile of $Y$ given $X$ and $D$ for compliers.

As a consequence of Lemma 2.2, the parameter of primary interest in this model, $\alpha_\theta$, gives the difference in the $\theta$-quantiles of $Y_1$ and $Y_0$ given $X$ for compliers. This tells us, for example, whether JTPA training changed the median earnings of participants. Note, however, that in contrast with average treatment effects, where average differences equal differences in averages, $\alpha_\theta$ is not the quantile of the difference $(Y_1 - Y_0)$. Although the latter may also be of interest, we focus on the marginal distributions of potential outcomes because identification of the distribution of $Y_1 - Y_0$ requires much stronger assumptions and because economists making social welfare comparisons typically use differences in distributions and not the distribution of differences for this purpose (see, e.g., Atkinson (1970)).6
6 Heckman, Smith and Clements (1997) discuss models where features of the distribution of the difference $(Y_1 - Y_0)$ are identified. They note that this may be of interest for questions regarding the political economy of social programs. If the ranking of individuals in the distribution of the outcome is preserved under the treatment, then the estimator in this paper is informative about the distribution of treatment impacts. King (1983) discusses horizontal equity concerns that require welfare analyses involving the joint distribution of outcomes.

The model above differs in a number of ways from the model in the seminal papers by Amemiya (1982) and Powell (1983), who used least absolute deviations to estimate a simultaneous equations system. Their approach begins with a traditional simultaneous equations model, and is not motivated by an attempt to characterize effects on distributions. Rather, the idea is to improve on 2SLS when the distributions of the error terms are long-tailed. Most importantly, in contrast with the parameters in equation (4), the parameters of interest in the Amemiya/Powell setup do not, in general, define a conditional quantile function.7

7 Identification in the Amemiya and Powell papers comes from conditional median restrictions on the reduced form. However, a conditional median restriction on the reduced form does not imply that the structural equation is a conditional median. In fact, for a binary endogenous regressor, conditional median restrictions on the reduced form and structural equation are typically incompatible.

The parameters of the conditional quantile function in equation (4) can be expressed as (see Bassett and Koenker (1982)):

$$(\alpha_\theta, \beta_\theta) = \arg\min_{(\alpha, \beta) \in \mathbb{R}^{r+1}} E[\rho_\theta(Y - \alpha D - X'\beta) \mid D_1 > D_0],$$

where $\rho_\theta(\lambda)$ is the check function, defined as $\rho_\theta(\lambda) = (\theta - 1\{\lambda < 0\}) \cdot \lambda$. Therefore, using Lemma 2.3,

$$(\alpha_\theta, \beta_\theta) = \arg\min_{(\alpha, \beta) \in \mathbb{R}^{r+1}} E[\kappa \cdot \rho_\theta(Y - \alpha D - X'\beta)]. \qquad (5)$$

Because $\kappa$ is negative when $D$ is not equal to $Z$, the sample minimand is not convex. A number of algorithms exist for minimization of piecewise-linear and non-convex objective functions (see, e.g., Charnes and Cooper (1957) or Fitzenberger (1997a,b), for a discussion of a related censored quantile regression problem). Unlike the conventional quantile regression minimand, the sample analog of equation (5) does not have a linear programming representation.

Now, let $U = (Y, D, X)$; applying the Law of Iterated Expectations to equation (5), we obtain

$$(\alpha_\theta, \beta_\theta) = \arg\min_{(\alpha, \beta) \in \mathbb{R}^{r+1}} E[\kappa_\nu \cdot \rho_\theta(Y - \alpha D - X'\beta)], \qquad (6)$$

where

$$\kappa_\nu = E[\kappa \mid U] = 1 - \frac{D \cdot (1 - \nu_0(U))}{1 - \pi_0(X)} - \frac{(1 - D) \cdot \nu_0(U)}{\pi_0(X)}$$

and $\nu_0(U) = E[Z \mid U]$. Writing $W = (D, X')'$ and $\delta = (\alpha, \beta')'$, the estimator replaces $\kappa_\nu$ with a first-step estimate $\hat\kappa_\nu$, trimmed so that the weights are nonnegative:

$$\hat\delta_\theta = \arg\min_{\delta \in \mathbb{R}^{r+1}} \sum_{i=1}^{n} 1\{\hat\kappa_\nu(U_i) \geq 0\} \cdot \hat\kappa_\nu(U_i) \cdot \rho_\theta(Y_i - W_i'\delta). \qquad (7)$$
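The role of the check function, and the reason nonnegative weights matter, can be illustrated in the simplest case of a single location parameter with no regressors. This is a sketch only: the function names, the toy data, and the grid search over observed values are illustrative assumptions, and a practical implementation with covariates would use a linear programming solver instead.

```python
def check(u, theta):
    """Check function rho_theta(u) = (theta - 1{u < 0}) * u."""
    return (theta - (1.0 if u < 0 else 0.0)) * u

def objective(q, ys, weights, theta):
    """Weighted sample minimand: sum_i w_i * rho_theta(y_i - q)."""
    return sum(w * check(y - q, theta) for y, w in zip(ys, weights))

def weighted_quantile(ys, weights, theta):
    """Minimize the objective over the observed values of y.

    With nonnegative weights the objective is convex and piecewise linear
    in q, so a minimizer can always be found among the data points.
    """
    return min(sorted(ys), key=lambda q: objective(q, ys, weights, theta))

ys = [1.0, 2.0, 3.0, 4.0, 5.0]
median = weighted_quantile(ys, [1.0] * 5, 0.5)
lower = weighted_quantile(ys, [1.0] * 5, 0.25)

# A negative weight (as kappa has when D differs from Z) breaks convexity:
# the value at the midpoint exceeds the average of the values at the endpoints.
f = lambda q: objective(q, [0.0], [-1.0], 0.5)
convexity_violated = f(0.0) > 0.5 * (f(-1.0) + f(1.0))
```

With unit weights the minimizers are the ordinary sample quantiles, while the single negatively weighted observation produces a concave kink. This is the motivation for working with nonnegative weights before minimizing.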
First step estimation of $\kappa_\nu$ is carried out using non-parametric series regression. For an increasing sequence of positive integers $\{\lambda(k)\}_{k=1}^{\infty}$ and a positive integer $K$, let $p^K(Y) = (Y^{\lambda(1)}, ..., Y^{\lambda(K)})'$. Assume that $X$ only takes on a finite number of values (so that $W \in \{w_1, ..., w_J\}$). Then, any random sample $\{V_i\}_{i=1}^{n} = \{(Z_i, U_i)\}_{i=1}^{n}$ from $V = (Z, U)$ can be indexed as $\{\{V_{i_j}\}_{i_j=1}^{n_j}\}_{j=1}^{J}$, where $\{V_{i_j}\}_{i_j=1}^{n_j}$ are subsequences for distinct fixed values of $(X, D)$. In the same fashion, the sample can be indexed as $\{\{V_{i_l}\}_{i_l=1}^{n_l}\}_{l=1}^{L}$, where $\{V_{i_l}\}_{i_l=1}^{n_l}$ are subsequences for distinct fixed values of $X$. Now, a nonparametric power series estimator $\hat\nu(U)$ of $\nu_0(U)$ is given by the Least Squares projection of $\{Z_{i_j}\}_{i_j=1}^{n_j}$ on $\{p^K(Y_{i_j})\}_{i_j=1}^{n_j}$ (this amounts to non-parametric series regression of $Z$ on $Y$ in each $W$-cell). Let $\hat\nu_i$ be the fitted values of such estimator for the observations in our sample. Consider the simple estimator $\hat\pi(X)$ of $\pi_0(X)$ obtained by averaging $Z$ within cells of $X$. Our first step estimator of $\kappa_\nu$ is given by:

$$\hat\kappa_\nu(U_i) = 1 - \frac{D_i \cdot (1 - \hat\nu_i)}{1 - \hat\pi(X_i)} - \frac{(1 - D_i) \cdot \hat\nu_i}{\hat\pi(X_i)}.$$

3.3. Distribution Theory
This subsection summarizes asymptotic results for the QTE estimator. Proofs are given in the appendix.
Theorem 3.1: Under assumptions 2.1 and 3.1 and if (i) the data are i.i.d.; (ii) conditional on $W$, $Y$ is continuously distributed with support equal to a compact interval and density bounded away from zero; (iii) $\pi_0(X)$ is bounded away from zero and one, and $X$ takes on a finite number of values; (iv) conditional on $W$ and $D_1 > D_0$, $\varepsilon_\theta$ is continuously distributed with bounded density; the distribution function of $\varepsilon_\theta$ conditional on $W$ and $D_1 > D_0$ is continuously differentiable at zero with density $f_{\varepsilon_\theta \mid W, D_1 > D_0}(0)$ that is bounded and bounded away from zero uniformly in $W$; (v) $\kappa_\nu$ is bounded away from zero uniformly in $U$; (vi) for $s$ equal to the number of continuous derivatives in $Y$ of $\nu_0$, $n K^{-2s} \to 0$ and $K^5/n \to 0$. Then,

$$n^{1/2}(\hat\delta_\theta - \delta_\theta) \xrightarrow{d} N(0, \Omega),$$

where $\Omega = J^{-1} \Sigma J^{-1}$, $J = E[f_{\varepsilon_\theta \mid W, D_1 > D_0}(0) \cdot W W' \mid D_1 > D_0] \cdot P(D_1 > D_0)$, and $\Sigma = E[\psi \psi']$ with $\psi = \kappa \cdot m(U) + H(X) \cdot (Z - \pi_0(X))$.
The asymptotic
Assumption
To produce an estimator
loss.
...
6" ~1 v ( Yj Wj = D
J
,
,
i
- Dt
•
(1
-
-
W&
Do). This is bigger than the quantile regression intercept because of positive selection male compliers.
constant
21
a
for for
Perhaps most striking among our findings is the result that training for adult men does not seem to have raised the lower quantiles of their earnings. This may be because of an effort by program operators to target services at relatively easy-to-employ men with higher earnings potential. This results in distributional changes that would be undesirable in any assessment using a social welfare function that weights the lower tail of the earnings distribution more heavily. Since the ostensible purpose of the JTPA was to aid economically disadvantaged workers, it seems likely that the lower quantiles are of particular concern to policy makers. One response to this finding might be that few JTPA applicants were very well off, so that distributional effects within applicants are of less concern than the fact that the program helped many applicants overall. However, the upper quantiles of earnings were reasonably high for adult males who participated in the National JTPA Study. Increasing this upper tail is therefore unlikely to have been a high priority.
5. Summary and Conclusions
This paper reports estimates of the effect of subsidized training on the quantiles of earnings for participants. We use a new estimator for the effect of a non-ignorable treatment on quantiles. The QTE estimator can be used to determine how an intervention affects the distribution of any variable for individuals whose treatment status is changed by a binary instrument. The estimator accommodates exogenous covariates and collapses to conventional quantile regression when the treatment is exogenous. It minimizes a convex piecewise-linear objective function similar to that for conventional quantile regression, and can be computed as the solution to a linear programming problem after first-step estimation of a nuisance function. The paper develops distribution theory for the case where this first step is estimated nonparametrically.

QTE estimates of the effect of training on the quantiles of the earnings distribution suggest interesting and important differences in program effects at different quantiles, and differences in distributional impact for men and women. These differences are large enough to potentially change the welfare analysis of the JTPA program.
Appendix
Proof of Theorem 3.1: This proof largely follows that of Theorem 1 in Buchinsky and Hahn (1998). Consider,
='^2 gi (j,K)
Gn(T,K)
i
where gi (r, k)
and
egi
\/n(Se
-
—
=
K{Ui)
{9
— W-5g. The
Y{
Now,
5e)-
function
define r„(r,K)
MLLUA almost surely.
By
- n-
\{e ei
1'2
W[t)+ -
Gn (r, 1{kv >
= E[G n (T, k)}.
= _ n -V2 W
.
e+]
0}
Kv{Ui)
.
kv )
•
Note
+ (1 - 6) is
- n~ 1 ' 2 W(r)- -
[{e ei
convex in r and
it
is
e«]},
minimized
at
rn
=
that,
{6
_
1{£0i
_ n -l/2
W
-
E r+1 0}
by
Op(l)- So,
.
Op (l)
that:
jT _ T W "(1{^ > '
K„)
0}
+ - 7^ J??n
=
^
r'Jr
+
.
o p (l). Since A n (r)
is
K(t)-\t'Jt -0,
k„)
=
-
Lemma 3
(t
in
-
ry
n )'J(r
-
?
?
J - - rfn Ji ln + rn (r)
Buchinsky and Hahn (1998), we have that r n
=
?7
n
+ o p (l).
A.l
-6 e )^N(O,J- ZJl
1
),
E[ipip'].
PROOF OF Lemma A.l zero.
+
Then,
n 1 / 2 (6
£=
'
T
K„)
0}
•
G„(t, 1{k„
where
Note
>
r'w n (l{^
= G n (i~, l{«t/ > 0} K„) + r'o; n (l{K„ > 0} K„), then X n (r) applying Pollard's convexity lemma (Pollard (1991)):
any compact subset of
with sup TgT
*Vt "
o 2
g
sup
where
o p (l).
A. 2:
Gn (T, l{Ku > for
=
T)]}
1
To prove this lemma we use the assumption that $\kappa_\nu$ is bounded away from zero. This assumption is probably stronger than necessary but it allows us to ignore the trimming using $1\{\hat\kappa_\nu > 0\}$, making the asymptotics easier. Assumption (vi) implies that $K \cdot ((K/n_j)^{1/2} + K^{-s}) \to 0$ almost surely for all $j \in \{1, ..., J\}$. Therefore $\sup_{U \in \mathcal{U}} |\hat\nu - \nu_0| = o_p(1)$ (see, e.g., Newey (1997), Theorem 4). Since $\pi_0$ is bounded away from zero and one (by (iii)), then $\sup_{U \in \mathcal{U}} |\hat\kappa_\nu - \kappa_\nu| = o_p(1)$. Since $\kappa_\nu$ is bounded away from zero, with probability approaching one the trimming is not binding and we can ignore it for the asymptotics.
uj n
Let
7Tq
{K u )
be the population
^n
=
= -= y.m(Ui) V n ~(
mean
-F=V m(/7i)
V™^
of
Z
V
-
7—-
1
for the /-cell of
i
•
1
V
TTOi-TTt
- 7r 0(A
X
7T
i )
and ?
its
(A i )
+ Rn /
sample counterpart.
r
-(TTi-TToi)
(l-7Ti)-(l-7r0i)/
24
(l-A,)-^,
A, •(!-£*,)
tt'-tt'
(i_^).(i_4)
o,
note that
D ir {\-VM
l-D^-Vi, 111
fe
-9
n'
V
1
=
l
Lemma
\V
D
f& ~
,m
V^
< sup
Then, applying
(1-Tf')-
(1 -*{,),/
@ii
ii)
~
v Oii)
— Vq\ n
'fe
A, (%,
,
-^Oi,) ^
Op(l).
*W
V
Newey and McFadden
4.3 in
\
)
(l-5?')-(l
-^)J
(1994),
-l
I l
lX n -TV
(i-A,)-^_ £
(!-£>)
D-(l-i/
i/
m([/) (7T
(X))2
A.-(i-hh.)
\
(1-^.(1-4)
J
(1)
(l-D)-Z
)
m(C/)
A"
(l-Tro(X))'
D-(l-Z)
(i-MX)Y
(x)y
(7r
X
Therefore,
i(K v )
To
"
lf. V^ n
+
^Vi/(X )-{^-7r
/
A-(l-gj)
(l-A)-gj
V
1-T0(^i)
TToCX)
i
(X )}+Op (l). I
prove,
J_V- an A
A-(i-Pi)
(l-A)-gj
>™ Vn
"7=
^i
JT(
1
V
-
A
•
(1
-
Zi) 7T7T
l-TTo(Ai)
(1
- A) ~ 7r
Zi
lv (Ajj
+
,
Op(l)
/
notice that
A V
t=l
•
(l
-
vj)
l-Tro(Xi)
(l "
- A)
Pt
no(Xi) J
-.
ni
l
- A,) tt
25
(A%)
Pjj
So,
we
show that
just have to
for
each j €
{1,
J}
(l-A)-zv
Di. -(1-Vi.)
(
...,
i,=i
:^m(^). 1-
+0,(1 ,(!)
l-Tro(^) ,
ttoP^.;
This will be done by checking assumptions 6.1 to 6.6 in Newey (1994). Assumptions 6.1 and 6.2 follow from the conditions of the theorem (see Newey~(1994), page 1373). Assumption 6.3 holds with d — and ad = s. Assumption 6.4 holds for b(z) — and derivative equal to directly
m(U) 5 Assumptions 6.5 and 6.6 follow from: (i) rij K~ 2s — 0; (ii) /rij — check Assumption 6.5 note that (vi) implies that s > 5/2, therefore 6.5 is also valid with d — 0). To check assumption 6.6 note that since
K
>
(almost surely). In particular, to
>
K K~ —
m(U) i
D
1-D
-MX)
MX)
s
00,
then, there exists a sequence £ K such that
E as
K
-
oo (see
Newey
D
m(U)
l
~£)-tKP MX),
l-no(X)
(1994), page 1380 last paragraph.)
Di
(1
-
Vi)
(1
-
Di)
Tn^ mm \ V"
Now, applying the
(TT\
(-L
l
A-(l-^Oi)
-
I
-MX,)
(1
~
~ Di)
Z%
MXi)
)
Di V
Proof of Lemma
lemma
A. 2
:
(1994),
-
VQi)
(1
- ZO
+
o p (l)
(1
7To(A'i
N
2=1
VQ
MXi)
l-A
result of the
Newey
Vi
x
and the
results in
MXi)
)
1
K (U)
^ + Op(l) TTo(A'i)
- A)
holds.
Note that pn (Ui,l{K v > 0}-K v ,T)-pn (Ui,K
1/
,T)
=
(1{k„
> 0}-K„-n v )-Sn (Ui,T),
where
Sn (Ui,r)
so \Sn (Ui,r)\
E[n
9-[{e ei
+
(1-6).
< n- x l 2 \{\e 8i < n- l ' 2 \W[ T \} \
\Sn (Ui,r\}
=E
\
{
h
and
(ii)
D (h-z)dz
=
fe e \W,D 1
>D o (0) + h
=
fe e \W, Dl
>D o {0) +
dz
Z- D
WW'\D >
[feelw Dl>Do (0)
>
1
P(D > D l
]
D
)
).
=
0(l/h 2 ).
then
Notice that, since k„
X>„(0i)
Di
W are bounded,
^ 5>„(tfi)
i
,
x
var( Ku -^ h (6e)-WW') Since
U ]W
3.1, E[
Q
= E Also, since
]
hm E [E [
D
=
WW'}
(6e)
W
1
Theorem
(iv) in
absolute value) by a constant. Since
W,D > D =
„(0i) ¥>m&) WiW! lY,K v (Ui) •
2
By
(i)
and
(iv), for
I Y,K v {Ui)
ip
=1
l
(6 e )
WiWi + o„(l).
(A.3)
2=1
some constant C
Ki (6e)
WiWi - K v {Ui)