Partial Identification and Confidence Sets for Functionals of the Joint Distribution of "Potential Outcomes"

Yanqin Fan, Department of Economics, University of Washington, Box 353330, Seattle, WA 98195

Emmanuel Guerre, School of Economics and Finance, Queen Mary, University of London, Mile End Road, London E1 4NS, United Kingdom

Dongming Zhu, School of Economics, Shanghai University of Finance and Economics, 777 Guoding Road, Yangpu District, Shanghai, 200433, China

First version: May 2009. This version: June 2015
Abstract: In this paper, we present a systematic study of partial identification of two general classes of functionals of the joint distribution of two "potential outcomes" when a bivariate sample from the joint distribution is not available to the econometrician. We establish the identified sets for functionals in both classes under various maintained assumptions and characterize conditions under which our identified sets point identify the true value of the functionals. In addition, we establish sufficient and necessary conditions for the covariate information to tighten the identified sets obtained without the covariate information. Applications include, but are not limited to, the evaluation of distributional treatment effects of a binary treatment and the pricing of options written on two underlying assets when the sample information contains observations only on traded options written on each individual asset. In the former, the class of functionals includes the correlation coefficient between the potential outcomes, many inequality measures of the distribution of treatment effects, and the median of the distribution of the individual treatment effect. We focus on two commonly used frameworks for evaluating treatment effects, the selection-on-observables framework and a latent threshold-crossing model, and characterize the role of the propensity score in the former and the role of endogenous selection in the latter. For the selection-on-observables framework, we construct asymptotically valid confidence sets for the true value of the parameter corresponding to a super-modular functional.

Keywords: Bivariate option pricing; Copula; Distributional treatment effect; Selection-on-observables; Latent threshold-crossing model; Value-at-Risk; Stop-loss premium

JEL codes: C31, C14, C19, C39

We thank Stephane Bonhomme, Yingyao Hu, Shih-Tang Hwu, Simon Lee, Konrad Menzel, Stephen Shore, Kevin Song, Richard Spady, Joerg Stoye, Tiemen Woutersen, participants of Bates White Sixth Annual Antitrust Conference 2009, Southern Economics Association Meetings 2009, International Symposium on Econometrics of Specification Tests in 30 Years at Xiamen University, 2010, and seminar participants at City University of Hong Kong, Johns Hopkins University, New York University, University of Kansas, Yale University, IUPUI, Emory University, Caltech, and Shanghai University of Finance and Economics for helpful comments and discussions.

1 Introduction

1.1 The Set-up, Main Applications, and Contributions
Estimators of parameters that depend on the joint distribution of multiple random variables are straightforward to construct when a random sample from the joint distribution of those variables is available. In many important applications in economics, finance, and other disciplines, however, such a multivariate random sample is not available. This paper considers this latter situation and provides a systematic study of partial identification of two general classes of parameters that depend on the joint distribution of two random variables under various sampling schemes. Specifically, let $Y_1 \in \mathcal{Y}_1$ and $Y_0 \in \mathcal{Y}_0$ denote two real-valued continuous random variables with joint cdf $F_o(y_1, y_0)$, $y_1 \in \mathcal{Y}_1$ and $y_0 \in \mathcal{Y}_0$. Let $\theta_o$ denote the parameter of interest in this paper. It can be written as $\theta_o \equiv E_o[\psi(Y_1, Y_0)] \in \mathbb{R}$ for some real-valued measurable function $\psi(\cdot,\cdot)$, where $E_o$ denotes the expectation taken with respect to $F_o(\cdot,\cdot)$.
Motivated by the commonly used frameworks in econometrics and statistics for evaluating average treatment effect parameters of a binary treatment, we focus on two sampling schemes in the main part of this paper. Under the first sampling scheme, only the marginal distributions of Y1, Y0 are identified, e.g., when data from randomized experiments are available. Under the second sampling scheme, the conditional marginal distributions of Y1, Y0 given a vector of covariates, which may contain unobserved components, and the distribution of the covariates are identified. The second sampling scheme covers the selection-on-observables framework and latent threshold-crossing models used to identify average treatment effect parameters in the literature; see e.g., Rosenbaum and Rubin (1983a, b), Hahn (1998), Heckman, Ichimura, Smith, and Todd (1998a, b), Dehejia and Wahba (1999), and Hirano, Imbens, and Ridder (2003) for the former; and Heckman (1990), Heckman and Vytlacil (2005), and Carneiro and Lee (2009) for the latter. Interestingly, we find that both sampling schemes arise in other important contexts as well, and as such the results in this paper have broad applicability in economics and finance. Two prominent applications in finance are: pricing bivariate options when the researcher only has univariate samples on traded single-asset options written on each individual asset; and evaluating the VaR of a portfolio when the researcher only has univariate samples on each individual component of the portfolio. In bivariate option pricing, Y1, Y0 denote the underlying individual assets, and in the VaR application, Y1, Y0 denote the underlying assets or risks. To simplify the exposition, we borrow the language from the treatment effect literature and refer to Y1, Y0 as the "potential outcomes" of a binary treatment to reflect the lack of a bivariate sample from the joint distribution of Y1, Y0.

We consider two general classes of parameters $\theta_o$ corresponding to different function classes for $\psi$. The first class is characterized by super-modular functions (see Definition 3.1) and the second by what we call $\varphi$-indicator functions ($\psi(Y_1, Y_0) = 1\{\varphi(Y_1, Y_0) \le \delta\}$, where $\varphi$ is monotone in each argument; see Definition 3.3 or Embrechts, Hoeing, and Puccetti (2005)). Members of the first class of parameters include the correlation coefficient between the potential outcomes, the joint distribution of the potential outcomes, and many inequality measures of the distribution of treatment effects. Because of the missing data problem, evaluating these parameters is known to pose more challenges than evaluating average treatment effects, the latter being the focus of most work in the treatment effect literature; see Lee (2005), Abbring and Heckman (2007), and Heckman and Vytlacil (2007a, b) for discussions and references. Prices of most bivariate European options also belong to the first class, including the call on the minimum option, the worst-off call option, and the basket option; see Example 2.2 in this paper or Rapuch and Roncalli (2004) and Tankov (2011) for more examples. Members of the second class of parameters include the cdf of treatment effects, quantiles of the distribution of treatment effects, and the VaR of portfolios.¹ Heckman, Smith, and Clements (1997) and Abbring and Heckman (2007), among others, provide many examples demonstrating the need for evaluating joint distributions of potential outcomes, distributions of treatment effects, or features of the distributions of treatment effects other than various average treatment effects. In integrated risk management, the VaR of portfolios plays an important role, see McNeil et al. (2005).

Under each sampling scheme, we characterize the identified sets for both classes of parameters and show that the identified set of the true parameter in each class is a closed interval. For parameters corresponding to strict super-modular functions and parameters corresponding to $\varphi$ functions that are strictly increasing in each argument, (i) we characterize conditions under which the lower and upper bounds under each sampling scheme coincide and thus point identify the true parameter; and (ii) we establish sufficient and necessary conditions for the covariate information under the second sampling scheme to tighten the identified set under the first sampling scheme. Results in these two cases are then extended to other sampling schemes which may only partially identify the marginal or the conditional marginal cdfs.
To illustrate the important role played by the covariate (observable and unobservable), we provide a detailed analysis of the identified set of the correlation coefficient under each sampling scheme. In particular, we establish sufficient and necessary conditions for its identified set to exclude 0 under the second sampling scheme when there is one observable covariate and when there is endogenous selection in the context of a latent threshold-crossing model. These conditions demonstrate clearly the role of the covariate information and endogenous selection in tightening the identified set. For ideal randomized experiments, Heckman, Smith, and Clements (1997) concluded that the bounds on the correlation coefficient between the potential outcomes implied by the result in Cambanis, Simons, and Stout (1976), i.e., without covariates, are often too wide to be informative. Our results under the second sampling scheme show that (i) by exploiting the information in the observable covariate, these bounds can be narrowed greatly and may be informative about the sign of the correlation coefficient when the dependence between the potential outcomes and the observable covariate is strong enough; and (ii) in the context of a latent threshold-crossing model with endogenous selection, the requirement on the dependence between the potential outcomes and the observable covariate in (i) can be weakened significantly.

The general results established in this paper have immediate applications in several areas, including evaluation of distributional treatment effects, bivariate option pricing, and evaluation of the stop-loss premium of a portfolio of contracts. In this paper we explore their applications in evaluating distributional treatment effects in detail and make several original contributions to the treatment effect literature. First, under the selection-on-observables framework, we characterize the role of the propensity score and show that, in sharp contrast to the identification of average treatment effects, which can be based on either the observable covariates or the propensity score, the identified sets of distributional treatment effect parameters such as the correlation coefficient and the median of the distribution of treatment effects using the observable covariates could be tighter than the corresponding bounds using the propensity score. We provide sufficient and necessary conditions under which the two identified sets are the same. Second, we characterize the identified sets for distributional treatment effect parameters and the role of endogenous selection in the latent threshold-crossing model adopted in Heckman and Vytlacil (2005) and Carneiro and Lee (2009) to identify average treatment effect parameters. Third, we develop inference procedures for the first class of parameters under the selection-on-observables framework. We propose nonparametric estimators of the sharp bounds by plugging in local polynomial quantile estimators, establish their asymptotic distributions, and construct asymptotically valid confidence sets (CSs) for the true parameters. As a by-product, we establish a Bahadur representation of the local linear quantile estimator that is uniformly valid over the entire support of the covariate X and over expanding intervals of the quantile level, extending Guerre and Sabbah (2012), who only consider interior values of the covariate X and quantile levels, and Fan and Guerre (2014), due to the uniformity over expanding intervals for the quantile level. Such a result is of independent interest.

¹ Although quantiles of the distribution of treatment effects and the VaR of portfolios cannot be written in the form of $\theta_o$, their bounds follow immediately from bounds on the distribution of treatment effects and the distribution of the portfolios. So we simply refer to them as members of the second class of parameters.
1.2 Related Works
This paper is related to works in several literatures. Our identification results under the first sampling scheme, where the marginal distributions are identified, collect and strengthen some existing works in the probability literature on solutions to the general Fréchet problem, including a continuous version of the classical monotone rearrangement inequality; see Hardy, Littlewood, and Polya (1934), Cambanis, Simons, and Stout (1976), Tchen (1980), and Rachev and Rüschendorf (1998) for the first class of parameters; and Makarov (1981), Rüschendorf (1982), Frank, Nelsen, and Schweizer (1987), and Williamson and Downs (1990) for the second class of parameters. Most though not all of these works establish sharp bounds on the same class of parameters as in this paper assuming fixed marginal distributions. Building on these important works, Section 3 of this paper characterizes the identified sets for both classes of parameters and presents 'if and only if' conditions for point identification. The results in Section 3 pave the way for the subsequent analysis in this paper.

Some of the above results have been used recently to study partial identification of the joint distributions of potential outcomes and the distributions of treatment effects; see Manski (1997), Heckman, Smith, and Clements (1997), Fan and Park (2009, 2010, 2012), Fan and Wu (2010), Firpo and Ridder (2008), and Fan, Sherman, and Shum (2014). Assuming monotone treatment response, Manski (1997) developed sharp bounds on the distributions of treatment effects, while, assuming the availability of ideal randomized data, Heckman, Smith, and Clements (1997) use the result in Cambanis, Simons, and Stout (1976) to bound the correlation coefficient between the potential outcomes and the variance of the treatment effects. Fan, Sherman, and Shum (2014) examined partial identification of treatment effects under data combination. Fan and Park (2009, 2010, 2012) studied partial identification and inference tools for the distributions of treatment effects, quantiles of treatment effects, and their sharp bounds for randomized experiments (without covariates). Specifically, for randomized experiments (without covariates), these papers study (i) sharp bounds (pointwise) on the cdf of $\Delta = Y_1 - Y_0$, from which they derive sharp bounds on the class of D-parameters, including the quantile of the distribution of $\Delta$, and the class of D2-parameters, including Example 2.1 (ii) and (iii) in the current paper; (ii) they propose estimators of the sharp bounds on the cdf and quantile of $\Delta$ and establish their asymptotic distributions; and (iii) they contain simulation experiments on the finite sample performances of confidence sets. While these papers focus on randomized experiments, Fan and Park (2009, 2010) briefly mention sharp bounds and their estimation in the covariate case. Firpo and Ridder (2008) considered bounding a general functional of the distribution of treatment effects. Note that the bounds on a general functional of the distribution of treatment effects obtained from the bounds on the distribution of treatment effects in Fan and Park (2009, 2010) and Firpo and Ridder (2008) are in general not sharp, as the bounds on the distribution of treatment effects are pointwise sharp, but not uniformly sharp. Firpo and Ridder (2008) presented a general approach to establishing bounds on functionals of the distribution of treatment effects that are tighter than bounds obtained directly from bounds on the distribution of treatment effects. However, the bounds in Firpo and Ridder (2008) are not sharp. In addition, Firpo and Ridder (2008) focused exclusively on partial identification. In the context of switching regime models, Fan and Wu (2010) studied partial identification and inference for conditional distributions of treatment effects given observable covariates.

In the finance, insurance, and risk management literatures, the results in Section 3 have found applications in bivariate option pricing, VaR evaluations of portfolios, and bounding the stop-loss premium of a portfolio of contracts; see Rapuch and Roncalli (2004) and Tankov (2011) for bivariate option pricing; Embrechts, Hoeing, and Puccetti (2005) and McNeil et al. (2005) for VaR evaluations; and Dhaene and Goovaerts (1996), Muller (1997), Bauerle and Muller (1998), and Wang and Dhaene (1998) for bounding the stop-loss premium. However, none of these works employs covariate information.
The rest of this paper is organized as follows. In Section 2, we introduce the general parameter of interest and some examples of the parameter in the two main applications considered in this paper. In Section 3, we characterize the identified sets for the classes of super-modular functions and of $\varphi$-indicator functions under the first sampling scheme. Section 4 introduces the covariate information, extends results in Section 3 to incorporate the covariate information, and compares the identified sets with and without the covariate information. Applications of the results in Section 4 to evaluating distributional treatment effects are presented in Section 5. Section 6 extends results in Sections 3 and 4 to incorporate partial dependence information and to cases where the sampling information only partially identifies the marginal distributions or conditional marginal distributions of the potential outcomes. Section 7 proposes nonparametric estimators of sharp bounds on functionals of the joint distribution of potential outcomes and constructs asymptotically valid confidence sets for these functionals. Section 8 concludes and presents some extensions. Technical proofs are collected in the Appendices. Throughout the paper, we use $\Rightarrow$ to denote weak convergence. All the limits are taken as the sample size goes to $\infty$.
2 Examples of the Parameter of Interest
Recall that the parameter of interest in this paper is $\theta_o \equiv E_o[\psi(Y_1, Y_0)] \in \mathbb{R}$, where $Y_1 \in \mathcal{Y}_1$, $Y_0 \in \mathcal{Y}_0$ denote two real-valued continuous random variables with joint cdf $F_o(y_1, y_0)$, $y_1 \in \mathcal{Y}_1$, $y_0 \in \mathcal{Y}_0$, $\psi(\cdot,\cdot)$ is a real-valued measurable function, and $E_o$ denotes the expectation taken with respect to $F_o(\cdot,\cdot)$. Let $C_o(\cdot,\cdot)$ denote the copula function and $F_{1o}(\cdot)$, $F_{0o}(\cdot)$ denote the marginal cdfs of Y1, Y0 respectively. In this section, we introduce several examples of $\theta_o$, where Example 2.1 is concerned with evaluation of treatment effects, Example 2.2 is on bivariate option pricing, Example 2.3 is on the evaluation of VaR of portfolios, and the last example deals with bounding the stop-loss premium of two contracts in actuarial mathematics.

Example 2.1 (Parameters in the Evaluation of Treatment Effects). The first application concerns the evaluation of effects of a binary treatment in which Y1, Y0 denote potential outcomes of the binary treatment and $\theta_o$ denotes a policy parameter of interest that depends on the joint cdf of Y1, Y0. Let D be the binary treatment indicator such that an individual with D = 1 receives the treatment and an individual with D = 0 does not receive the treatment. Let $\Delta = Y_1 - Y_0$ denote the individual treatment effect with cdf $F_\Delta(\cdot)$ and $\mu_\Delta = E(\Delta)$. Some examples of $\theta_o$ are presented below.

(i) (The Correlation Coefficient). Let $\psi(Y_1, Y_0) = Y_1 Y_0$ and $\sigma_j^2 = Var(Y_j) < \infty$ for $j = 0, 1$. Then the correlation coefficient between Y1 and Y0 is given by
\[ \rho_{10} = \frac{E_o[\psi(Y_1, Y_0)] - E(Y_1) E(Y_0)}{\sigma_1 \sigma_0}. \]
Since $E(Y_j)$ and $Var(Y_j)$ depend on the marginal distributions only, we sometimes refer to $E_o[\psi(Y_1, Y_0)]$ as the correlation coefficient, in which case $\psi(Y_1, Y_0) = [Y_1 Y_0 - E(Y_1) E(Y_0)]/(\sigma_1 \sigma_0)$.

(ii) (Distributional Treatment Effects I). Let $\psi(Y_1, Y_0) = \nu(\Delta)$ for some function $\nu$. Many inequality measures of the distribution of treatment effects can be expressed as $g(E_o[\nu(\Delta)], \mu_\Delta)$, where $g(\cdot,\cdot)$ is increasing in its first argument and $\nu(\cdot)$ is continuous and convex, see Stoye (2010) and references therein. For instance, the coefficient of variation defined as
\[ CV = \frac{\sqrt{Var_o(\Delta)}}{\mu_\Delta} = \frac{\sqrt{E_o(\Delta^2) - \mu_\Delta^2}}{\mu_\Delta} \]
can be written as $g(E_o[\nu(\Delta)], \mu_\Delta)$, where $\nu(\Delta) = \Delta^2$ is continuous and convex and $g(z, \mu_\Delta) = \sqrt{z - \mu_\Delta^2}/\mu_\Delta$ is increasing in z. A general class of inequality measures of the distribution of $\Delta$ is that of generalized entropy measures. Let k denote an even number, $\nu(\Delta) = \Delta^k$, and
\[ g(z, \mu_\Delta) = \frac{1}{k(k-1)} \left[ \frac{z}{\mu_\Delta^k} - 1 \right]. \]
Then $\nu(\cdot)$ is continuous and convex. Further, $g(E_o[\nu(\Delta)], \mu_\Delta)$ is a generalized entropy measure of the distribution of $\Delta$.

(iii) (Distributional Treatment Effects II). (a) Let $\psi(Y_1, Y_0) = 1(\Delta > 0)$. The proportion of people receiving treatment who benefit from it is given by $E_o[\psi(Y_1, Y_0) \mid D = 1] = \Pr(\Delta > 0 \mid D = 1) = 1 - F_\Delta(0 \mid D = 1)$. (b) Let $\tau \in (0, 1)$. Although the $\tau$-quantile of the distribution of $\Delta$, $F_\Delta^{-1}(\tau)$, is strictly speaking not an example of $\theta_o$, its bounds can be obtained by inverting the bounds on $F_\Delta(\delta) = E_o[\psi(Y_1, Y_0)]$ with $\psi(Y_1, Y_0) = 1(\Delta \le \delta)$, and thus we simply refer to $F_\Delta^{-1}(\tau)$ as an example of $\theta_o$.
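The decomposition in (ii) can be made concrete: with $\nu(\Delta) = \Delta^2$ and $g(z, \mu_\Delta) = \sqrt{z - \mu_\Delta^2}/\mu_\Delta$, composing $g$ with $E_o[\nu(\Delta)]$ reproduces the coefficient of variation. A minimal sketch, under an illustrative simulated joint model for (Y1, Y0) chosen so that $\Delta \sim N(2, 0.5^2)$ and the true CV is 0.25:

```python
# Check that CV = g(E_o[nu(Delta)], mu_Delta) with nu(d) = d^2 and
# g(z, mu) = sqrt(z - mu^2)/mu, as in Example 2.1 (ii).
# Illustrative assumption: (Y1, Y0) simulated from a simple joint model.
import math
import random

rng = random.Random(1)
deltas = []
for _ in range(100_000):
    y0 = rng.gauss(1.0, 1.0)
    y1 = y0 + rng.gauss(2.0, 0.5)    # so Delta = Y1 - Y0 ~ N(2, 0.5^2)
    deltas.append(y1 - y0)

mu = sum(deltas) / len(deltas)                    # mu_Delta
e_nu = sum(d * d for d in deltas) / len(deltas)   # E_o[nu(Delta)]

def g(z, m):
    return math.sqrt(z - m * m) / m

cv_direct = math.sqrt(e_nu - mu * mu) / mu        # sqrt(Var(Delta))/mu_Delta
cv_via_g = g(e_nu, mu)                            # same quantity through g

assert abs(cv_direct - cv_via_g) < 1e-12
assert abs(cv_direct - 0.25) < 0.01               # true CV = 0.5/2.0 = 0.25
```

The point of the decomposition is that $E_o[\nu(\Delta)]$ is the only piece depending on the joint distribution; $\mu_\Delta$ depends on the marginals alone, which is what makes the bounds of Section 3 applicable.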
Example 2.2 (Bivariate Option Pricing). The second class of examples is concerned with bivariate option pricing, in which Y1, Y0 denote the values of two individual assets and $\theta_o$ denotes the price of a European-style option on Y1, Y0 with payoff $\psi(Y_1, Y_0)$, see e.g., Rapuch and Roncalli (2004) and Tankov (2011). Suppose there is no arbitrage. It is known from option pricing theory that there exists a risk-neutral probability measure with cdf $F_o(\cdot,\cdot)$ such that the price of the bivariate option is given by the discounted expectation of its payoff under $F_o$, i.e., $\theta_o$. Below we present the payoff functions of three specific bivariate options and refer interested readers to Table 1 in Tankov (2011) for more examples.

(i) (Call on the minimum option). The payoff function of a call on the minimum option with strike K is given by $\psi(Y_1, Y_0) = (\min(Y_1, Y_0) - K)_+$, where $(x)_+ = \max(x, 0)$.

(ii) (Worst-off call option). The payoff function of a worst-off call option is given by $\psi(Y_1, Y_0) = \min\{(Y_1 - K_1)_+, (Y_0 - K_0)_+\}$, where $K_1$, $K_0$ are strike prices.

(iii) (Basket option). The payoff function of a basket option with strike price K is given by $\psi(Y_1, Y_0) = (Y_1 + Y_0 - K)_+$.

Example 2.3 (VaR of a Portfolio). Another application that fits into our framework is the evaluation of the Value-at-Risk (VaR) of a portfolio in risk management. In this application, Y1, Y0 denote the values of two individual risks, such as market risk and credit risk, and $\theta_o$ denotes the VaR of a portfolio of Y1, Y0 such as $\varphi(Y_1, Y_0)$. The VaR of $\varphi(Y_1, Y_0)$ at level $\alpha \in (0, 1)$ is defined as the $\alpha$-quantile of the distribution of $\varphi(Y_1, Y_0)$, denoted $F_\varphi^{-1}(\alpha)$. The VaR of $\varphi(Y_1, Y_0)$ is of interest in risk management and is identified when a bivariate sample from the joint distribution of Y1, Y0 is available. However, when only univariate samples on Y1, Y0 are available, the VaR of $\varphi(Y_1, Y_0)$ is not point identified. To resolve this issue, researchers have adopted the independence assumption on Y1, Y0. This assumption is often violated, see McNeil et al. (2005) and references therein for a detailed discussion. Embrechts, Hoeing, and Juri (2003) and Embrechts, Hoeing, and Puccetti (2005) establish the worst VaR of $\varphi(Y_1, Y_0)$ when the sample information identifies the marginal distributions of Y1, Y0 only. The additional results in this paper can be used to bound the VaR of $\varphi(Y_1, Y_0)$ under different sampling schemes. Since this is closely related to the quantile of the distribution of the individual treatment effect in Example 2.1 (iii) (b), we will omit the detailed results for the VaR of $\varphi(Y_1, Y_0)$ from this paper.

Example 2.4 (The Stop-Loss Premium). An application in actuarial mathematics is that of bounding the stop-loss premium of the sum of two claims for an individual contract or a portfolio of two contracts over a given time period, where in the former case Y1, Y0 denote the two claims for an individual contract and in the latter case Y1, Y0 denote the claims for two contracts. The stop-loss premium is defined as $E(Y_1 + Y_0 - K)_+$, where K is the amount retained by the insured. This is the same as the price of the basket option written on the underlying assets Y1, Y0 with strike price K described in Example 2.2 (iii).

This paper investigates partial identification of $\theta_o$ under different sampling situations. Section 3 deals with the case in which no covariate is present and the sample information identifies the marginal cdfs of Y1, Y0 only. Section 4 extends to the case in which there is a covariate that may contain unobserved components, but the sample information and the model structure allow identification of the conditional marginal cdfs of Y1, Y0 given the covariate and the marginal cdf of the covariate. Some generalizations of the results in Sections 3 and 4 under other sampling situations are discussed in Section 6. In the rest of this paper, we will use parameters in Examples 2.1 and 2.2 to illustrate our results and label them according to the assumptions under which the identified sets are established. For instance, Example 2.1 (I) in the next section refers to Example 2.1 under Assumption (I) below.
3 Partial Identification Without Covariates
This section focuses on the case without covariates and adopts Assumption (I) below.

Assumption (I). The marginal cdfs $F_{1o}$ and $F_{0o}$ of Y1, Y0 are known.²

For Example 2.1, when the researcher has access to data from an ideal randomized experiment, the marginal cdfs $F_{1o}$ and $F_{0o}$ are identified from the sample information, so Assumption (I) is satisfied. However, the joint cdf of Y1, Y0 is in general not point identified. As a result, average treatment effect parameters such as the average treatment effect and the treatment effect for the treated are identified, but treatment effect parameters that depend on the joint distribution of Y1, Y0, such as those in Example 2.1, may not be point identified.

For Example 2.2, Assumption (I) is also satisfied when the researcher observes univariate random samples of prices on traded single-asset options on Y1, Y0 only. For example, if $Y_j$ is the price of an asset at time T and call options on this asset with prices $P_j(K) \equiv E_o[\exp(-rT)(Y_j - K)_+]$ are available, where r is the interest rate and K is the strike price, then the cdf of $Y_j$ under the risk-neutral measure is given by
\[ F_{jo}(K) = 1 + \exp(rT) \frac{\partial P_j(K)}{\partial K}, \quad j = 0, 1, \tag{1} \]
see Ross (1976), Breeden and Litzenberger (1978), and Ait-Sahalia and Lo (1998).

² Throughout this paper, we refer to parameters that are point identified from the sample information as being known, except in Section 7 when we consider estimation and inference.
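Equation (1) can be illustrated numerically: differencing observed call prices in the strike recovers the risk-neutral cdf. A minimal sketch, assuming (purely for illustration; the paper imposes no such model) that call prices come from the Black-Scholes formula, so the implied risk-neutral distribution is lognormal and the recovery can be checked against a closed form:

```python
# Illustrative check of equation (1): recover the risk-neutral cdf of Y_j from
# call prices by finite-differencing P_j(K) in the strike K.
# Illustrative assumption: Black-Scholes call prices, hence a lognormal
# risk-neutral distribution; the market parameters below are hypothetical.
import math
from statistics import NormalDist

S0, r, T, sigma = 100.0, 0.02, 1.0, 0.25
N = NormalDist()

def call_price(K):
    """Black-Scholes price of a call with strike K (illustrative pricing model)."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * N.cdf(d1) - K * math.exp(-r * T) * N.cdf(d2)

def cdf_from_prices(K, h=1e-4):
    """Equation (1): F_jo(K) = 1 + exp(rT) * dP_j/dK, via a central difference."""
    dP_dK = (call_price(K + h) - call_price(K - h)) / (2 * h)
    return 1.0 + math.exp(r * T) * dP_dK

def lognormal_cdf(K):
    """Closed-form risk-neutral cdf implied by the Black-Scholes model."""
    z = (math.log(K / S0) - (r - 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    return N.cdf(z)

for K in (80.0, 100.0, 120.0):
    assert abs(cdf_from_prices(K) - lognormal_cdf(K)) < 1e-6
```

With market data one would smooth the observed price curve across strikes before differencing; the tolerance above reflects only the central-difference error of the sketch.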


Under Assumption (I), the parameter $\theta_o \equiv E_o[\psi(Y_1, Y_0)]$ is in general not point identified. Let $\mathcal{C}$ denote the class of bivariate copula functions. For a general function $\psi$, the identified set for $\theta_o$ under Assumption (I) is given by
\[ \Theta_I = \{\theta \in \mathbb{R} : \theta = E_F[\psi(Y_1, Y_0)], \text{ where } F = C(F_{1o}, F_{0o}) \text{ for some } C \in \mathcal{C}\}, \tag{2} \]
where $E_F$ denotes the expectation taken with respect to F. Parameters $\theta_o$ in Examples 2.1 and 2.2 correspond to $\psi$ functions that belong to one of the two function classes: that of super-modular functions (see Definitions 3.1 and 3.2) and $\varphi$-indicator functions (see Definition 3.3). In the rest of this section, we further characterize the identified set $\Theta_I$ for these two classes of functions.

3.1 A Characterization of $\Theta_I$ for Super-Modular Functions

Definition 3.1 A function $\psi(\cdot,\cdot)$ is called super-modular³ if for all $y_1 \le y_1'$ and $y_0 \le y_0'$,
\[ \psi(y_1, y_0) + \psi(y_1', y_0') - \psi(y_1, y_0') - \psi(y_1', y_0) \ge 0, \]
and sub-modular if $-\psi(\cdot,\cdot)$ is super-modular.

If $\psi(\cdot,\cdot)$ is absolutely continuous, then it is super-modular if and only if $\partial^2 \psi(y_1, y_0)/\partial y_1 \partial y_0 \ge 0$ a.e. Cambanis, Simons, and Stout (1976) provide many examples of super-modular or sub-modular functions, see also Tchen (1980).

Suppose $\psi(\cdot,\cdot)$ is a super-modular and right continuous function.⁴ Sharp bounds on $\theta_o$ are available, see Cambanis, Simons, and Stout (1976), Tchen (1980), and Rachev and Rüschendorf (1998).⁵ For $(u, v) \in [0, 1]^2$, let $M(u, v) \equiv \max(u + v - 1, 0)$ and $W(u, v) \equiv \min(u, v)$ denote the Fréchet-Hoeffding lower and upper bounds for a copula. Then the lower and upper bounds on $\theta_o$, denoted by $\theta^L$ and $\theta^U$, are achieved when $(Y_1, Y_0)$ has the cdfs given by $F^{(-)}(y_1, y_0) \equiv M(F_{1o}(y_1), F_{0o}(y_0))$ and $F^{(+)}(y_1, y_0) \equiv W(F_{1o}(y_1), F_{0o}(y_0))$ respectively, and they can be expressed as
\[ \theta^L \equiv E_{F^{(-)}}[\psi(Y_1, Y_0)] = \int_0^1 \psi\big(F_{1o}^{-1}(u), F_{0o}^{-1}(1 - u)\big)\, du \quad \text{and} \]
\[ \theta^U \equiv E_{F^{(+)}}[\psi(Y_1, Y_0)] = \int_0^1 \psi\big(F_{1o}^{-1}(u), F_{0o}^{-1}(u)\big)\, du, \]
where $F_{jo}^{-1}(u) = \inf\{y : F_{jo}(y) \ge u\}$ is the quantile function of $Y_j$, $j = 0, 1$. Below we restate Theorem 2 in Cambanis, Simons, and Stout (1976).

Lemma 3.1 Suppose that Assumption (I) holds and that $\psi(y_1, y_0)$ is super-modular and right continuous. If $E_{F^{(-)}}[\psi(Y_1, Y_0)]$ and $E_{F^{(+)}}[\psi(Y_1, Y_0)]$ exist (even if infinite valued), then the identified set for $\theta_o$ is $\Theta_I = [\theta^L, \theta^U]$ when either of the following conditions is satisfied: (a) $\psi(y_1, y_0)$ is symmetric and $E[\psi(Y_1, Y_1)]$ and $E[\psi(Y_0, Y_0)]$ are finite (in this case, $-\infty < E_{F^{(-)}}[\psi(Y_1, Y_0)] \le E_{F^{(+)}}[\psi(Y_1, Y_0)] < +\infty$); (b) there are some fixed constants $\bar{y}_0$ and $\bar{y}_1$ such that $E[\psi(Y_1, \bar{y}_0)]$ and $E[\psi(\bar{y}_1, Y_0)]$ are finite and at least one of $E_{F^{(-)}}[\psi(Y_1, Y_0)]$ and $E_{F^{(+)}}[\psi(Y_1, Y_0)]$ is finite.

³ A super-modular function is also called a quasi-monotone function or a super-additive function.
⁴ A super-modular and right continuous function satisfies the "Monge" condition, see Rachev and Rüschendorf (1998).
⁵ Results for sub-modular functions follow straightforwardly from the corresponding results for super-modular functions. To save space, we will not present results for sub-modular functions in this paper.

L

=

U

in which case

o

is point identi…ed

for all the marginal distribution functions F1o ; F0o satisfying the conditions of Lemma 3.1. However, when ( ; ) is not additively separable in its arguments, in general

L

Below we show that for a general class of super-modular functions

6=

U

and

( ; ),

o

o

is only partially identi…ed.

is point identi…ed only in trivial

cases, i.e., when at least one of the marginal distributions F1o ; F0o is degenerate. De…nition 3.2 A function

( ; ) is called “strict super-modular” if it is super-modular and for all y1 < y10

and y0 < y00 , (y1 ; y0 ) + (y10 ; y00 ) and strict sub-modular if

(y1 ; y00 )

(y10 ; y0 ) > 0;

( ; ) is strict super-modular.

A super-modular or sub-modular function can be additively separable in its arguments, but a strict supermodular or a strict sub-modular function can not be additively separable in its arguments. For example, (y1 ; y0 ) = y1

y0 is super-modular but not strict super-modular, while

(y1 ; y0 ) = y1 y0 is strict super-

modular. Proposition 3.2 Suppose the conditions of Lemma 3.1 hold. (i) If L

=

U

; (ii) If

( ; ) is strict super-modular, then

L

=

U

( ; ) is additively separable, then

if and only if at least one of the marginal

distributions F1o ; F0o is degenerate. Consider the treatment e¤ect

= Y1

Y0 . Proposition 3.2 (i) implies that the lower and upper bounds

on E ( ) in Lemma 3.1 coincide and point identify E ( ). In contrast, Proposition 3.2 (ii) implies that the lower and upper bounds on V aro ( ) coincide only when at least one of the marginal distributions F1o ; F0o is degenerate. As a result, inferences for E ( ) and V aro ( ) are fundamentally di¤erent. Under Assumption (I), functionals that depend on the marginals only are identi…ed. Using Lemma 3.1, we can establish sharp bounds on the correlation coe¢ cient between the potential outcomes, inequality measures of the distribution of

introduced in Example 2.1 (i) and (ii), and prices of bivariate options in

Example 2.2. Example 2.1 (I) (i) (The Correlation Coe¢ cient). Applying Lemma 3.1 (a) to the correlation U , where coe¢ cient between the potential outcomes, we obtain L 10 R1 F 1 (u) F0o1 (1 u) du E (Y1 ) E (Y0 ) L and = 0 1o 1 0 R1 F 1 (u) F0o1 (u) du E (Y1 ) E (Y0 ) U = 0 1o : 1 0

For ideal randomized experiments, Heckman, Smith, and Clements (1997) used $\rho_L$ and $\rho_U$ to bound the variance of treatment effects and the correlation coefficient between the two potential outcomes. They found that the bounds are typically too wide to be informative. For example, since $\rho_L \le 0 \le \rho_U$, the sign of $\rho_{10}$ is not identified from $[\rho_L, \rho_U]$. In fact, we can show that if $F_{1o}$ and $F_{0o}$ are in the same location-scale family, then $\rho_L = -1$ and $\rho_U = 1$.
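To make the formulas concrete, the following is a small numerical sketch (ours, not part of the original text). It approximates $\rho_L$ and $\rho_U$ by discretizing the quantile integrals, with two normal marginals as a hypothetical example; since both marginals then lie in the same location-scale family, the computed bounds should come out close to $-1$ and $1$.

```python
from statistics import NormalDist

def corr_bounds(F1inv, F0inv, mean1, mean0, sd1, sd0, n=20000):
    """Midpoint-rule approximation of rho_L and rho_U: the two quantile
    integrals evaluate E(Y1*Y0) under the countermonotone coupling
    (u, 1-u) and the comonotone coupling (u, u) of the marginals."""
    us = [(i + 0.5) / n for i in range(n)]
    anti = sum(F1inv(u) * F0inv(1.0 - u) for u in us) / n
    co = sum(F1inv(u) * F0inv(u) for u in us) / n
    rho_L = (anti - mean1 * mean0) / (sd1 * sd0)
    rho_U = (co - mean1 * mean0) / (sd1 * sd0)
    return rho_L, rho_U

# Hypothetical marginals in the same (normal) location-scale family:
Y1, Y0 = NormalDist(1.0, 2.0), NormalDist(-0.5, 1.0)
rho_L, rho_U = corr_bounds(Y1.inv_cdf, Y0.inv_cdf, 1.0, -0.5, 2.0, 1.0)
```

Here $\rho_L \approx -1$ and $\rho_U \approx 1$, illustrating why the unconditional bounds can be entirely uninformative about the sign of $\rho_{10}$.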

(ii) (Distributional Treatment Effects I). It is known that $\lambda(y_1 - y_0)$ is continuous and super-modular if $\lambda(\cdot)$ is a continuous, concave function, and continuous and sub-modular if $\lambda(\cdot)$ is a continuous, convex function. For example, if $\nu \ge 0$ and $\nu$ is an even number, then $\lambda(\delta) = \delta^{\nu}$ is continuous and convex, and Lemma 3.1 (a) implies that
$$\int_0^1 \lambda\left(F_{1o}^{-1}(u) - F_{0o}^{-1}(u)\right) du \;\le\; E_o[\lambda(\Delta)] \;\le\; \int_0^1 \lambda\left(F_{1o}^{-1}(u) - F_{0o}^{-1}(1-u)\right) du. \quad (3)$$
The result in (3) also follows from Lemma 2.2 in Fan and Park (2010). Since $g(\cdot,\cdot)$ is increasing in its first argument, we obtain:
$$g\left(\int_0^1 \lambda\left(F_{1o}^{-1}(u) - F_{0o}^{-1}(u)\right) du,\; \mu_\Delta\right) \;\le\; g\left(E_o[\lambda(\Delta)],\; \mu_\Delta\right) \;\le\; g\left(\int_0^1 \lambda\left(F_{1o}^{-1}(u) - F_{0o}^{-1}(1-u)\right) du,\; \mu_\Delta\right).$$
Noting that $\mu_\Delta = E(Y_1) - E(Y_0)$, we conclude that the bounds on $g(E_o[\lambda(\Delta)], \mu_\Delta)$ are point identified and $g(E_o[\lambda(\Delta)], \mu_\Delta)$ is partially identified.
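As an illustration (ours, under hypothetical normal marginals), the bounds in (3) with $\lambda(\delta) = \delta^2$ translate, via $g(a, \mu) = a - \mu^2$, into bounds on $Var(\Delta)$; for normal marginals the exact endpoints are $(\sigma_1 - \sigma_0)^2$ and $(\sigma_1 + \sigma_0)^2$, which the discretized quantile integrals should reproduce:

```python
from statistics import NormalDist

def second_moment_bounds(F1inv, F0inv, n=20000):
    """Bounds (3) on E[lambda(Delta)] for the convex choice lambda(d) = d**2:
    comonotone coupling (u, u) -> lower bound, countermonotone (u, 1-u) -> upper."""
    us = [(i + 0.5) / n for i in range(n)]
    lo = sum((F1inv(u) - F0inv(u)) ** 2 for u in us) / n
    hi = sum((F1inv(u) - F0inv(1.0 - u)) ** 2 for u in us) / n
    return lo, hi

Y1, Y0 = NormalDist(2.0, 1.0), NormalDist(0.0, 1.5)
mu_delta = 2.0 - 0.0                 # point identified: E(Y1) - E(Y0)
lo, hi = second_moment_bounds(Y1.inv_cdf, Y0.inv_cdf)
# g(a, mu) = a - mu**2 maps the bounds on E[Delta**2] into bounds on Var(Delta)
var_lo, var_hi = lo - mu_delta ** 2, hi - mu_delta ** 2
```

With $\sigma_1 = 1$ and $\sigma_0 = 1.5$ this gives $Var(\Delta) \in [0.25, 6.25]$ approximately, so the variance of the treatment effect is only partially identified even though $\mu_\Delta$ is pinned down.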

Example 2.2 (I). Applying Lemma 3.1 to the prices of bivariate options with super-modular payoff functions yields the identified sets for the bivariate option prices. For example, the identified set for the price of a call on the minimum option with strike $K$ is given by
$$\left[\, E_{F^{(-)}}\!\left[(\min(Y_1, Y_0) - K)^+\right],\; E_{F^{(+)}}\!\left[(\min(Y_1, Y_0) - K)^+\right] \,\right],$$
where $F^{(-)}$ and $F^{(+)}$ denote the Fréchet-Hoeffding lower and upper bound distributions for the joint distribution of $(Y_1, Y_0)$.

3.2 A Characterization of $\Theta_I$ for $\varphi$-Indicator Functions

We now introduce another class of functions $\lambda$ for which an explicit characterization of $\Theta_I$ is possible.

Definition 3.3 Let $\varphi$ denote a measurable function and $\lambda(Y_1, Y_0) = I\{\varphi(Y_1, Y_0) \le \delta\}$ for a fixed $\delta$ in the support of $\varphi(Y_1, Y_0)$. We refer to this class of functions $\lambda$ as the class of $\varphi$-indicator functions.

Let $F_\varphi(\cdot)$ denote the distribution function of $\varphi(Y_1, Y_0)$. Then for a fixed $\delta$, $\theta_o = \Pr(\varphi(Y_1, Y_0) \le \delta) = F_\varphi(\delta)$. The sharp bounds on $F_\varphi(\delta)$ can be found in Williamson and Downs (1990) and Embrechts, Hoeing, and Juri (2003). In the following proposition, we characterize the identified set for $F_\varphi(\delta)$.

Proposition 3.3 Suppose that Assumption (I) holds and that $\varphi$ is continuous and non-decreasing in each argument. Let
$$F_{\min,\varphi}(\delta) = \sup_{y \in \mathcal{Y}_1} \max\left(F_{1o}(y) + F_{0o}(\hat{\varphi}_y(\delta)) - 1,\; 0\right) \quad \text{and} \quad F_{\max,\varphi}(\delta) = 1 + \inf_{y \in \mathcal{Y}_1} \min\left(F_{1o}(y) + F_{0o}(\hat{\varphi}_y(\delta)) - 1,\; 0\right),$$
where $\hat{\varphi}_y(\delta) = \sup\{y_0 \in \mathcal{Y}_0 : \varphi(y, y_0) < \delta\}$. Then (i) the identified set for $F_\varphi(\delta)$ is $\Theta_I = [F_{\min,\varphi}(\delta), F_{\max,\varphi}(\delta)]$; (ii) if either $F_{1o}(\cdot)$ or $F_{0o}(\cdot)$ is a degenerate distribution, then for all $\delta$, we have $F_{\min,\varphi}(\delta) = F_\varphi(\delta) = F_{\max,\varphi}(\delta)$, so $F_\varphi(\delta)$ is point identified.

For a sum of two random variables, Makarov (1981), Rüschendorf (1982), and Frank, Nelsen, and Schweizer (1987) establish sharp bounds on its distribution function; see also Nelsen (1999). Frank, Nelsen, and Schweizer (1987) demonstrate that their proof based on copulas can be extended to more general functions than the sum. Unlike the sharp bounds for super-modular functions in Lemma 3.1, which are reached at the Fréchet-Hoeffding lower and upper bounds for the distribution of $(Y_1, Y_0)$ (when $Y_1$ and $Y_0$ are perfectly negatively dependent or perfectly positively dependent), the sharp bounds for $\varphi$-indicator functions are not reached at the Fréchet-Hoeffding lower and upper bounds for the distribution of $(Y_1, Y_0)$. Frank, Nelsen, and Schweizer (1987) provide explicit expressions for copulas that reach the bounds on the distribution of $(Y_1 + Y_0)$. Embrechts, Hoeing, and Puccetti (2005) analyze the properties of the dependence structures leading to the lower bound on $F_\varphi(\delta)$ for a general function $\varphi$. We show in the next proposition that for a large class of functions $\varphi$, $F_\varphi(\delta)$ is point identified iff at least one of the marginal distributions $F_{1o}, F_{0o}$ is degenerate, strengthening Proposition 3.3 (ii).

Proposition 3.4 Suppose that Assumption (I) holds and $\varphi$ is continuous. If $\varphi$ is strictly increasing in each argument, then $F_{\min,\varphi}(\delta) = F_\varphi(\delta) = F_{\max,\varphi}(\delta)$ for all $\delta$ if and only if at least one of the marginal distributions $F_{1o}, F_{0o}$ is degenerate.

Example 2.1 (I) (iii) (Distributional Treatment Effects II). (a) Under Assumption (I), $F_\Delta(0|D = 1) = F_\Delta(0)$. Applying Proposition 3.3 to $F_\Delta(0)$ leads to the identified set for $\Pr(Y_1 > Y_0 | D = 1)$. We refer interested readers to Fan and Park (2009, 2010) for a systematic study of partial identification and inference for $F_\Delta(\cdot)$ for ideal randomized experiments. (b) The bounds on $F_\Delta^{-1}(\tau)$ can be obtained by inverting the bounds on $F_\Delta(\cdot)$. Specifically, for $0 < \tau < 1$, we get $Q_L(\tau) \le F_\Delta^{-1}(\tau) \le Q_U(\tau)$, where
$$Q_L(\tau) = \sup_{u \in (0,\tau)} \left[F_1^{-1}(u) - F_0^{-1}(u + 1 - \tau)\right] \quad \text{and} \quad Q_U(\tau) = \inf_{u \in (\tau,1)} \left[F_1^{-1}(u) - F_0^{-1}(u - \tau)\right],$$
and these bounds are sharp. Fan and Park (2012) explore these bounds to construct inference procedures for $F_\Delta^{-1}(\tau)$ for ideal randomized experiments.

Let X 2 X

Rd denote the vector of covariates which may contain unobservable components. In this

section we consider an analogue of Assumption (I) and establish the identi…ed set for

o

accounting for

the presence of X . For strict super-modular and '-indicator functions where ' is non-decreasing in each argument, we establish necessary and su¢ cient conditions for the identi…ed set accounting for information in X to be tighter than that without exploiting the information in X . 11

Assumption (IC). The conditional marginal cdfs of Y1 ; Y0 given X = x denoted as F1o (yjx ) and F0o (yjx ) are known for all x 2 X . Moreover the cdf of X denoted as FX

o

( ) is also known.

Section 5 presents two commonly used frameworks for evaluating treatment e¤ects using observational data and shows that Assumption (IC) is satis…ed in both frameworks under standard conditions used in existing work in the literature to identify average treatment e¤ect parameters. Let Co ( ; jx ) denote the conditional copula of Y1 ; Y0 given X = x , where x 2 X . We note that

o

satis…es: o

Z Z

=E

(y1 ; y0 ) dFo (y1 ; y0 jX )

=E

Z Z

(y1 ; y0 ) dCo (F1o (y1 jX ) ; F0o (y0 jX ) jX ) :

So under Assumption (IC), the identi…ed set for o is RR (y1 ; y0 ) dC (F1o (y1 jX ) ; F0o (y0 jX ) jX ) 2 : =E IC = for some C ( ; jX ) 2 C a.s.

4.1

A Characterization of the Covariate

IC

:

(4)

for Super-Modular Functions and the Role of

Let L

=E

Z

1

F1o1

(ujX

) ; F0o1

(1

ujX ) du

and

U

=E

0

Z

1

F1o1 (ujX ) ; F0o1 (ujX ) du ; (5)

0

where Fjo1 (ujx ) = inf fy : Fjo (yjx )

ug is the quantile function of Yj conditional on X = x , j = 0; 1.

THEOREM 4.1 Suppose that Assumption (IC) holds. Let

(y1 ; y0 ) be a super-modular and right contin-

uous function, and suppose that both expectations in (5) exist (even if in…nite valued) and that either of the following conditions is satis…ed: (A) this case,

1

L

U

(y1 ; y0 ) is symmetric and E [ (Y1 ; Y1 )] and E [ (Y0 ; Y0 )] are …nite (in

< +1); (B) for each x 2 X , there are some …xed constants y 0 (x ) and y 1 (x )

(possibly depending on x ) such that E [ (Y1 ; y 0 (X ))], E [ (y 1 (X ); Y0 )], and E [ (y 1 (X ); y 0 (X ))] are …nite, and at least one of

L

(i) the identi…ed set for

and o

U

is …nite. Then

= Eo [ (Y1 ; Y0 )] is

(ii) for a strict super-modular function

( ; ),

IC

=[

L

=

L ; U ]; U

if and only if at least one of the conditional

marginal distributions F1o ( jx ) ; F0o ( jx ) is degenerate for almost all x 2 X . Theorem 4.1 (i) extends Lemma 3.1 and characterizes the identi…ed set for

o

under Assumption (IC)

for super-modular and right continuous functions . Heuristically we expect that incorporating information in the covariate X would help shrink the identi…ed set for necessary conditions for

IC

=

I

when

o.

Theorem 4.2 below establishes su¢ cient and

is strict super-modular.

THEOREM 4.2 Suppose the assumptions of Lemma 3.1 and Theorem 4.1 hold. Then

12

IC

I

=

L

U

;

and if

( ; ) is strict super-modular, then

IC

=

I

Pr (F1o (y1 jX ) + F0o (y0 jX ) Pr (F1o (y1 jX ) Note that Pr (F1o (y1 jX ) + F0o (y0 jX )

i¤ for

all (y1 ; y0 ),6

c -almost

1 > 0) 2 f0; 1g and

F0o (y0 jX ) < 0) 2 f0; 1g :

1 > 0) 2 f0; 1g is equivalent to

F1o (y1 jx ) + F0o (y0 jx ) > 1 for almost all x 2 X or F1o (y1 jx ) + F0o (y0 jx ) and Pr (F1o (y1 jX )

1 for almost all x 2 X ;

(6)

F0o (y0 jX ) < 0) 2 f0; 1g is equivalent to F1o (y1 jx ) < F0o (y0 jx ) for almost all x 2 X or F1o (y1 jx )

F0o (y0 jx ) for almost all x 2 X :

(7)

Obviously, (6) and (7) hold if both Y1 and Y0 are independent of X in which case the covariate X does not help shrink the identi…ed set

I.

Also if for almost every x 2 X , at least one of the conditional marginal

distributions F1o ( jx ) and F0o ( jx ) is degenerate and does not depend on x , then (6) and (7) hold, so I

=

set

IC . IC

For conditional distributions F1o ( jx ) and F0o ( jx ) that violate either (6) or (7), the identi…ed

is a proper subset of

I,

so incorporating information in X helps shrink the identi…ed set

can be useful when the identi…ed set

I

I.

This

is itself not informative as in Example 2.1 (I) (i). Indeed, we show in

Example 2.1 (IC) (i) below that for some F1o (yjx ) and F0o (yjx ), the identi…ed set

IC

of the correlation

coe¢ cient excludes 0 and hence identi…es the sign of the correlation coe¢ cient between Y1 and Y0 . Example 2.1 (IC) (i) (Correlation Coe¢ cient). Let the covariate X be univariate. For notational simplicity, we denote X as X in this example. Suppose the distribution of (Yj ; X) is known to be a bivariate normal distribution:7 Yj X

0 0

N

L

X

;

U

N 0;

= [ 1; 1], so

2 j

10

. Suppose

=

0X 1X

N

; j = 0; 1:

j jX x;

2 j

1

2 jX

q

(1

L

U,

10

2 ) (1 0X

and

L,

U

L

strict inequality if and only if

6= 0 or

+

1X

N

10

j jX x;

2 j

N (0; 1).

corr(Y1 ; Y0 ) 2 1

2 jX

and

where

2 ) 1X

Three conclusions are immediate. First, 0X

; j = 0; 1; and X

> 0, j = 1; 0. Using Lemma 3.1, we get

is not identi…ed. Now, we know that Yj jX = x

N (0; 1). Theorem 4.1 (i) yields: L

2 j

j jX

1

j jX

Then Assumption (IC) is satis…ed with Yj jX = x Obviously, Yj

2 j

;

0X

U

= U

0X 1X

+

q (1

2 ) (1 0X

2 ): 1X

, and at least one of the inequalities holds as a

1X

6= 0, implying that [

L; U ]

= [ 1; 1] i¤ X is

6 If ( ; ) is super-modular and right continuous, then it uniquely determines a nonnegative measure c on the Borel subsets of the plane R2 such that for all y1 y10 and y0 y00 , c ( y1 ; y10 y0 ; y00 ) = (y1 ; y0 ) + y10 ; y00 y1 ; y00 y10 ; y0 : See Cambanis, Simons, and Stout (1976), and Rachev and Ruschendorf (1998). 7 We present an example in Appendix B where the distributions of the potential outcomes are log-normal.

13

independent of (Y1 ; Y0 ). This conclusion is consistent with Theorem 4.2, since we can show that Pr(F1o (y1 jX) + F0o (y0 jX) Pr(F1o (y1 jX)

1 > 0) 2 f0; 1g for all (y1 ; y0 ) i¤

0X

F0o (y0 jX) < 0) 2 f0; 1g for all (y1 ; y0 ) i¤

0X

In fact, noting that Fjo (yj jX) =

[(yj

2 1=2 ] jX )

j jX X) = j (1

+

1X

= 0 and

1X

= 0:

(j = 1; 0), where

is the cdf of N (0; 1),

we conclude that for the lower bound, F1o (y1 jX) + F0o (y0 jX) "

y1 1

p

It follows from X p 2 + 1X = 1 1X

1

1 1X X 2 1X

#

"

>

y0 0

p

1

0 0X X 2 0X

#

10

is positive and when

is point identi…ed (i.e.,

10

X

j=0;1

2 jX

3

5 X:

N (0; 1) that Pr (F1o (y1 jX) + F0o (y0 jX) 1 > 0) 2 f0; 1g for all (y1 ; y0 ) if and only if p 2 0X = 1 0X = 0, which is a condition equivalent to 0X + 1X = 0. Similarly, we can

show the result for the upper bound. Second, when so

,

1 > 0 is equivalent to 2 X y q j q jX >4 2 1 1 j=0;1 j jX

0X 1X L

=

2 0X

< 0 and

U

=

10 )

+

> 0 and

0X 1X

2 1X

> 1, we have 2 0X

if and only if

= 1 or

2 0X

+

L

2 1X

U 2 1X

In summary, when $\rho_{0X}^2 + \rho_{1X}^2 > 1$, the identified set for $\rho_{10}$ excludes 0 and so identifies the sign of $\rho_{10}$.

Example 2.2 (IC). In bivariate option pricing, $Y_1, Y_0$ are prices of individual assets. When univariate random samples of observations on prices of traded single-asset European call options on them at different strike prices, risk-free interest rates, and other state variables are available, the conditional marginal cdfs corresponding to the bivariate risk-neutral measure are identified: for $j = 0, 1$,
$$F_{jo}(K|S = s) = 1 + \exp(rT)\, \frac{\partial P_j(K, s)}{\partial K},$$
where $P_j(K, s)$ is the price of the traded single-asset European call option on $Y_j$ at strike price $K$, and $S$ is the vector of state variables; see Ait-Sahalia and Lo (1998). So Assumption (IC) holds with $X^* = S$.

Remark 4.1. Often we may be interested in
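The cdf-recovery formula above can be checked numerically; the sketch below (ours) uses the Black-Scholes call price purely as a hypothetical data-generating model for the observed option prices $P_j(K, s)$ and differentiates it in the strike to recover the risk-neutral cdf:

```python
from math import exp, log, sqrt
from statistics import NormalDist

Phi = NormalDist().cdf

def bs_call(S0, K, r, sigma, T):
    """Black-Scholes European call price (a stand-in for observed prices)."""
    d1 = (log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S0 * Phi(d1) - K * exp(-r * T) * Phi(d2)

def risk_neutral_cdf(K, S0, r, sigma, T, h=1e-4):
    """F_jo(K|S=s) = 1 + exp(rT) * dP/dK, via a central finite difference."""
    dPdK = (bs_call(S0, K + h, r, sigma, T) - bs_call(S0, K - h, r, sigma, T)) / (2 * h)
    return 1.0 + exp(r * T) * dPdK

S0, r, sigma, T = 100.0, 0.02, 0.3, 1.0
F_at_100 = risk_neutral_cdf(100.0, S0, r, sigma, T)
# Exact risk-neutral cdf of the lognormal terminal price, for comparison:
d2 = (log(S0 / 100.0) + (r - 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
exact = 1.0 - Phi(d2)
```

The finite-difference value agrees with the closed-form lognormal cdf, confirming that differentiating call prices in the strike identifies the marginal risk-neutral distribution of each asset.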

$\theta_o(x^*) = E_o[\lambda(Y_1, Y_0)|X^* = x^*]$ or $\theta_o(x_1) = E_o[\lambda(Y_1, Y_0)|X^{*1} = x_1]$, where $X^* = (X^{*1}, X^{*-1})$. For example, in the latent threshold-crossing model in (12), we may be interested in $\theta_o(x) = E_o[\lambda(Y_1, Y_0)|X = x]$; then $X^{*1} = X$ and $X^{*-1} = \varepsilon$. When Assumption (IC) holds, the sharp bounds on $\theta_o(x^*)$ and $\theta_o(x_1)$ are given respectively by
$$\int_0^1 \lambda\left(F_{1o}^{-1}(u|x^*),\, F_{0o}^{-1}(1-u|x^*)\right) du \quad \text{and} \quad \int_0^1 \lambda\left(F_{1o}^{-1}(u|x^*),\, F_{0o}^{-1}(u|x^*)\right) du,$$
and by
$$E\left[\int_0^1 \lambda\left(F_{1o}^{-1}(u|x_1, X^{*-1}),\, F_{0o}^{-1}(1-u|x_1, X^{*-1})\right) du\right] \quad \text{and} \quad E\left[\int_0^1 \lambda\left(F_{1o}^{-1}(u|x_1, X^{*-1}),\, F_{0o}^{-1}(u|x_1, X^{*-1})\right) du\right].$$

4.2 A Characterization of $\Theta_{IC}$ for $\varphi$-Indicator Functions and the Role of the Covariate

For $\lambda(Y_1, Y_0) = I\{\varphi(Y_1, Y_0) \le \delta\}$, the following theorem extends Proposition 3.3 by incorporating information in $X^*$.

THEOREM 4.3 Suppose that Assumption (IC) holds and that $\varphi$ is continuous and non-decreasing in each argument. Let $\mathcal{Y}_1(X^*)$ and $\mathcal{Y}_0(X^*)$ be the supports of $Y_1$ and $Y_0$ given $X^*$, respectively, and suppose that they are the Borel sets generated by intervals with both ends being measurable. Define
$$F_{\min,\varphi}(\delta|X^*) = \sup_{y \in \mathcal{Y}_1(X^*)} \max\left\{F_{1o}(y|X^*) + F_{0o}\left(\hat{\varphi}_y(\delta|X^*)\,\big|\,X^*\right) - 1,\; 0\right\} \quad \text{and}$$
$$F_{\max,\varphi}(\delta|X^*) = 1 + \inf_{y \in \mathcal{Y}_1(X^*)} \min\left\{F_{1o}(y|X^*) + F_{0o}\left(\hat{\varphi}_y(\delta|X^*)\,\big|\,X^*\right) - 1,\; 0\right\},$$
where $\hat{\varphi}_y(\delta|X^*) = \sup\{y_0 \in \mathcal{Y}_0(X^*) : \varphi(y, y_0) < \delta\}$. Then (i) the identified set for $\theta_o = F_\varphi(\delta)$ is $\Theta_{IC} = [F_{L,\varphi}(\delta), F_{U,\varphi}(\delta)]$, where $F_{L,\varphi}(\delta) = E[F_{\min,\varphi}(\delta|X^*)]$ and $F_{U,\varphi}(\delta) = E[F_{\max,\varphi}(\delta|X^*)]$; (ii) if $\varphi$ is strictly increasing in each argument, then $F_{L,\varphi}(\delta) = F_\varphi(\delta) = F_{U,\varphi}(\delta)$ for all $\delta$ if and only if for almost every $x^* \in \mathcal{X}^*$, at least one of the conditional marginal distributions $F_{1o}(\cdot|x^*), F_{0o}(\cdot|x^*)$ is degenerate.

Under the conditions of Theorem 4.3, we obtain that for $0 <$ ...
$$\Pr\left(F_{1o}(y_1|X) + F_{0o}(y_0|X) - 1 > 0 \,\middle|\, p(X)\right) \in \{0, 1\} \quad \text{and} \quad \Pr\left(F_{1o}(y_1|X) - F_{0o}(y_0|X) < 0 \,\middle|\, p(X)\right) \in \{0, 1\}. \quad (11)$$

= E (E [Y1 jp (X) ; D = 1] Proposition 5.1 shows that for parameter

o

E [Y0 jp (X) ; D = 0]) :

that is partially identi…ed, the use of the full vector of covariates

X provides tighter bounds than the use of the propensity score p (X) unless the conditional distributions F1o (y1 jX) ; F0o (y0 jX) satisfy (11) which holds if the conditional marginal cdfs of Y1 ; Y0 depend on X only through p (X).10

Remark 5.1. A generalization of Proposition 5.1 implies that the sharp bounds on

o

using the maximal

relevant information set such that Assumption (IX) holds are the tightest, see Heckman and Navarro-Lozano (2004) for discussions on the maximal relevant information set. 8 We

are grateful to an anonymous referee for pointing out the necessary and su¢ cient condition. adapting the proof of Theorem 4.4, one may establish a similar result to Proposition 5.1 for '-indicator functions. To save space, this result is omitted from the paper. 1 0 For the point identi…ed , it is known that matching on the propensity score may result in loss of e¢ ciency, see Hahn (1998, 2004). 9 By

17

5.2

A Latent Threshold-Crossing Model and the Role of Endogenous Selection

Consider the semiparametric latent threshold-crossing model with continuous outcomes in Heckman (1990) and Heckman and Vytlacil (1999, 2001, 2005):
$$Y_1 = g_1(X, U_1), \quad Y_0 = g_0(X, U_0), \quad \text{and} \quad D = I\{g(Z) - \varepsilon > 0\}, \quad (12)$$
where $X \in \mathcal{X} \subseteq \mathbb{R}^{d_x}$ and $Z \in \mathcal{Z} \subseteq \mathbb{R}^{d_z}$ are observable covariates, $U_1, U_0, \varepsilon$ are unobservable univariate covariates, $g_1, g_0$ and $g$ are unknown functions, and the distribution of the unobserved error vector $(U_1, U_0, \varepsilon)'$ is also unknown. Suppose a random sample on $(Y, X, Z, D)$ is available. Heckman and Vytlacil (2005) provided conditions under which various average treatment effect parameters are point identified, while Carneiro and Lee (2009) extended the results in Heckman and Vytlacil (2005) to the identification of distributions of $(U_1, \varepsilon)'$ and $(U_0, \varepsilon)'$ conditional on the observables. We restate these conditions in Assumption (IU) and Assumption (LS) below.

Assumption (IU). Assume that (i) $g(Z)$ is a nondegenerate random variable conditional on $X$; (ii) $(U_1, \varepsilon)'$ and $(U_0, \varepsilon)'$ are independent of $Z$ conditional on $X$; (iii) the distribution of $\varepsilon$ conditional on $X, Z$ and that of $g(Z)$ conditional on $X$ are absolutely continuous with respect to Lebesgue measure.

Without loss of generality, we normalize the distribution of $\varepsilon$ conditional on $X$ and $Z$ to be $U(0, 1)$, implying by Assumption (IU)-(ii) that the distribution of $\varepsilon$ conditional on $X$ is also $U(0, 1)$. Let $p(z) = \Pr(D = 1|Z = z)$. Then $p(z) = g(z)$. Let $\mathcal{P}_x$ denote the support of $p(Z)$ conditional on $X = x \in \mathcal{X}$.

Assumption (LS). For each $x \in \mathcal{X}$, the closure of $\mathcal{P}_x$ is $[0, 1]$.

Let $X^* = (X', \varepsilon)'$. It follows from Theorem 1 in Carneiro and Lee (2009) that under Assumptions (IU) and (LS), $F_{1o}(y|x^*)$ and $F_{0o}(y|x^*)$ are point identified from the sample information. In particular, they showed that
$$F_{1o}(y|x^*) = \Pr(Y \le y \,|\, p(Z) = p, X = x, D = 1) + p\, \frac{\partial}{\partial p} \Pr(Y \le y \,|\, p(Z) = p, X = x, D = 1) \quad \text{and}$$
$$F_{0o}(y|x^*) = \Pr(Y \le y \,|\, p(Z) = p, X = x, D = 0) - (1 - p)\, \frac{\partial}{\partial p} \Pr(Y \le y \,|\, p(Z) = p, X = x, D = 0),$$
where $x^* = (x, p)$. Since $\Pr(X \le x, \varepsilon \le p) = p F_X(x)$, Assumptions (IU) and (LS) imply that Assumption (IC) is satisfied with $X^* = (X', \varepsilon)'$.

In the latent threshold-crossing model (12), $X^* = (X', \varepsilon)'$ and the lower or upper bounds in Theorem 4.1 (i) are reached when the two potential outcomes are perfectly negatively or positively dependent conditional on $X^*$. For example, if $U_1 = U_0$ and $g_1(\cdot,\cdot)$, $g_0(\cdot,\cdot)$ are increasing (or decreasing) respectively in $U_1$ and $U_0$, then $Y_1, Y_0$ are perfectly positively dependent conditional on $X^*$ and the upper bound is reached. When the distribution of either $Y_1$ or $Y_0$ conditional on $X^*$ is degenerate, the lower and upper bounds in Theorem 4.1 (i) coincide and thus point identify $\theta_o$. The following proposition follows from a similar proof to that of Theorem 4.2 or Proposition 5.1.

(y1 ; y0 ) be a strict super-modular11

and right continuous function, and suppose that the four expectations in (13) exist (even if in…nite valued) and that either of the following conditions is satis…ed: (A2)

(y1 ; y0 ) is symmetric and E [ (Y1 ; Y1 )] and

E [ (Y0 ; Y0 )] are …nite; (B2) for each x 2 X, there are some …xed constants y 0 (x) and y 1 (x) (depending on x) such that E [ (Y1 ; y 0 (X))], E [ (y 1 (X); Y0 )] and E [ (y 1 (X); y 0 (X))] are …nite, and at least one of i hR i hR 1 1 E 0 F1o1 (ujX) ; F0o1 (1 ujX) du and E 0 F1o1 (ujX) ; F0o1 (ujX) du is …nite. Then E

Z

0

E

1

F1o1 Z

(ujX

) ; F0o1

(1

ujX ) du = E

c -almost

1

F1o1 (ujX) ; F0o1 (1

ujX) du

and

0

1

F1o1 (ujX ) ; F0o1 (ujX ) du = E

0

i¤ for

Z

Z

1

F1o1 (ujX) ; F0o1 (ujX) du

(13)

0

all (y1 ; y0 ), it holds that Pr (F1o (y1 jX ) + F0o (y0 jX ) Pr (F1o (y1 jX )

1 > 0jX) 2 f0; 1g and

F0o (y0 jX ) < 0jX) 2 f0; 1g :

(14)

Proposition 5.2 implies that in general, taking into account the self-selection process in addition to the covariate $X$ in the latent threshold-crossing model results in tighter bounds than using $X$ only, unless (14) holds. For instance, if $U_1$ and $U_0$ are independent of $\varepsilon$ given $X, Z$, then (14) holds. In fact, when $X$ is an observable covariate in Example 2.1 (IC) (i), the condition $\rho_{0X}^2 + \rho_{1X}^2 > 1$ ensuring identification of the sign of the correlation coefficient might be too strong to be satisfied in some applications. We now present an example of a latent threshold-crossing model to show that when selection is based on both observable and unobservable covariates, the above condition on the dependence between $(Y_1, Y_0)$ and the observable covariate $X$ may be weakened.

Example 2.1 (IU) (i) (Correlation Coefficient). Consider the following special case of the latent threshold-crossing model (12):
$$Y_1 = g_1(X) + U_1, \quad Y_0 = g_0(X) + U_0, \quad \text{and} \quad D = I\{g(Z) - \varepsilon > 0\}.$$
Since the distribution of $\varepsilon$ conditional on $X$ is normalized to be $U(0, 1)$, the distribution of $V \equiv \Phi^{-1}(\varepsilon)$ conditional on $X$ is $N(0, 1)$, where $\Phi(\cdot)$ is the cdf of $N(0, 1)$. Suppose that $(U_1, U_0, \varepsilon)'$ is independent of $Z$ conditional on $X$, implying that Assumption (IU)-(ii) holds. Then the joint distribution of $(U_1, U_0, V, X, Z)'$ can be expressed as
$$f(u_1, u_0, v, x, z) = f(u_1, u_0, v, z|x) f(x) = f(u_1, u_0, v|x) f(z|x) f(x) = f(u_1, u_0, v, x) f(z|x).$$
Thus we only need to consider the joint distribution of $(U_1, U_0, V, X)'$.

$^{11}$ By adapting the proof of Theorem 4.4, one may establish a similar result to Proposition 5.2 for $\varphi$-indicator functions. To save space, this result is omitted from the paper.

Let $U = (U_1, U_0)'$, $X^* = (V, X)'$, and assume for simplicity that $g_i(X) = \mu_i$ ($i = 1, 0$) are constants and $(U_1, U_0, V, X)'$ follows a multivariate normal distribution:$^{12}$
$$\begin{pmatrix} U \\ X^* \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \Sigma \right), \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \quad (15)$$
where
$$\Sigma_{11} = \begin{pmatrix} \sigma_1^2 & \sigma_1\sigma_0\rho_{10} \\ \sigma_1\sigma_0\rho_{10} & \sigma_0^2 \end{pmatrix}, \quad \Sigma_{12} = \Sigma_{21}' = \begin{pmatrix} \sigma_1\rho_{1V} & \sigma_1\sigma_X\rho_{1X} \\ \sigma_0\rho_{0V} & \sigma_0\sigma_X\rho_{0X} \end{pmatrix}, \quad \text{and} \quad \Sigma_{22} = \begin{pmatrix} 1 & \sigma_X\rho_{XV} \\ \sigma_X\rho_{XV} & \sigma_X^2 \end{pmatrix}.$$
Then the conditional distribution of $Y = (Y_1, Y_0)'$ given $X^*$ is normal:
$$Y \,|\, X^* \sim N\left( \mu + \Sigma_{12}\Sigma_{22}^{-1}X^*,\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \right), \quad (16)$$
where $\mu = (\mu_1, \mu_0)'$, and the expression for $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ is given as follows:
$$\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} = \begin{pmatrix} a_{11} & a_{10} \\ a_{10} & a_{00} \end{pmatrix}, \quad (17)$$
in which
$$a_{11} = \sigma_1^2\left[1 - \frac{\rho_{1V}^2 + \rho_{1X}^2 - 2\rho_{1V}\rho_{1X}\rho_{XV}}{1 - \rho_{XV}^2}\right], \quad a_{00} = \sigma_0^2\left[1 - \frac{\rho_{0V}^2 + \rho_{0X}^2 - 2\rho_{0V}\rho_{0X}\rho_{XV}}{1 - \rho_{XV}^2}\right], \quad \text{and}$$
$$a_{10} = \sigma_1\sigma_0\left[\rho_{10} - \frac{\rho_{1V}\rho_{0V} + \rho_{0X}\rho_{1X} - \rho_{0V}\rho_{1X}\rho_{XV} - \rho_{0X}\rho_{1V}\rho_{XV}}{1 - \rho_{XV}^2}\right].$$
The fact that $\mathrm{corr}(Y_1, Y_0 | X^*) = a_{10}/\sqrt{a_{11}a_{00}}$ satisfies $-1 \le \mathrm{corr}(Y_1, Y_0 | X^*) \le 1$ implies that $\rho_L^* \le \rho_{10} \le \rho_U^*$, where
$$\rho_L^* = \frac{\rho_{1V}\rho_{0V} + \rho_{0X}\rho_{1X} - \rho_{0V}\rho_{1X}\rho_{XV} - \rho_{0X}\rho_{1V}\rho_{XV}}{1 - \rho_{XV}^2} - \sqrt{\left(1 - \frac{\rho_{1V}^2 + \rho_{1X}^2 - 2\rho_{1V}\rho_{1X}\rho_{XV}}{1 - \rho_{XV}^2}\right)\left(1 - \frac{\rho_{0V}^2 + \rho_{0X}^2 - 2\rho_{0V}\rho_{0X}\rho_{XV}}{1 - \rho_{XV}^2}\right)} \quad (18)$$
and
$$\rho_U^* = \frac{\rho_{1V}\rho_{0V} + \rho_{0X}\rho_{1X} - \rho_{0V}\rho_{1X}\rho_{XV} - \rho_{0X}\rho_{1V}\rho_{XV}}{1 - \rho_{XV}^2} + \sqrt{\left(1 - \frac{\rho_{1V}^2 + \rho_{1X}^2 - 2\rho_{1V}\rho_{1X}\rho_{XV}}{1 - \rho_{XV}^2}\right)\left(1 - \frac{\rho_{0V}^2 + \rho_{0X}^2 - 2\rho_{0V}\rho_{0X}\rho_{XV}}{1 - \rho_{XV}^2}\right)}. \quad (19)$$

$^{12}$ We present an example in Appendix B where the distributions of the potential outcomes are log-normal.

Case I. Suppose $U_1$ and $U_0$ are jointly independent of $V$ conditional on $X, Z$. Then the selection-on-observables assumption (i.e., Assumption (IX)) holds. It follows from Assumption (IU)-(ii) that $U_1$ and $U_0$ are also jointly independent of $V$ conditional on $X$, implying that
$$\rho_{1V} - \rho_{1X}\rho_{XV} = 0 \quad \text{and} \quad \rho_{0V} - \rho_{0X}\rho_{XV} = 0. \quad (20)$$
Both constraints follow from the fact that the conditional distribution of $(U_1, U_0, V)'$ given $X$ is normal, with conditional covariances $\mathrm{Cov}(U_j, V | X) = \sigma_j(\rho_{jV} - \rho_{jX}\rho_{XV})$, $j = 1, 0$, which must vanish under conditional independence. It follows from (20) that the bounds in (18) and (19) reduce to those in Example 2.1 (IC) (i):
$$\rho_L^* = \rho_L^{(1)} \equiv \rho_{0X}\rho_{1X} - \sqrt{(1 - \rho_{0X}^2)(1 - \rho_{1X}^2)} \quad (21)$$
$$\rho_U^* = \rho_U^{(1)} \equiv \rho_{0X}\rho_{1X} + \sqrt{(1 - \rho_{0X}^2)(1 - \rho_{1X}^2)}. \quad (22)$$
It should be noted that the bounds in (21) and (22) are also those obtained by using only the conditional distribution information given $X$ (that is, from $-1 \le \mathrm{corr}(Y_1, Y_0 | X) \le 1$ we can get $\rho_L^{(1)} \le \mathrm{corr}(Y_1, Y_0) = \rho_{10} \le \rho_U^{(1)}$).

Case II. We now demonstrate that when there is endogenous selection, i.e., $U_1, U_0$ are not jointly independent of $V$ conditional on $X, Z$, the bounds in (21) and (22) may be tightened. Consider the special case of $\rho_{XV} = 0$. In this case, the bounds in (18) and (19) reduce to:
$$\rho_L^{(2)} = (\rho_{1V}\rho_{0V} + \rho_{0X}\rho_{1X}) - \sqrt{(1 - \rho_{1V}^2 - \rho_{1X}^2)(1 - \rho_{0V}^2 - \rho_{0X}^2)} \quad (23)$$
$$\rho_U^{(2)} = (\rho_{1V}\rho_{0V} + \rho_{0X}\rho_{1X}) + \sqrt{(1 - \rho_{1V}^2 - \rho_{1X}^2)(1 - \rho_{0V}^2 - \rho_{0X}^2)}. \quad (24)$$
A straightforward calculation shows that (i) $\rho_L^{(1)} \le \rho_L^{(2)}$ and $\rho_U^{(2)} \le \rho_U^{(1)}$, implying that, on the one hand, with endogenous selection (i.e., $\rho_{1V} \neq 0$ or $\rho_{0V} \neq 0$) the identified set would be tightened; on the other hand, the identified set based on more conditional distribution information (i.e., given $X^* = (V, X)'$) should be smaller than that based on less conditional distribution information (i.e., given $X$ only); (ii) $\rho_L^{(2)} = \rho_L^{(1)}$ iff $\rho_{1V}\sqrt{1 - \rho_{0X}^2} + \rho_{0V}\sqrt{1 - \rho_{1X}^2} = 0$, and $\rho_U^{(2)} = \rho_U^{(1)}$ iff $\rho_{1V}\sqrt{1 - \rho_{0X}^2} - \rho_{0V}\sqrt{1 - \rho_{1X}^2} = 0$.

j(

jV

jX

XV )=(1

and ajj (j = 1; 0) are de…ned in (17). Then F1o (y1 jX ) p (y0 0 )= a00 , speci…cally, y1 p Obviously, since V jX

1

a11

y0 p

0

a00

b1V p a11

) and bjX

j(

jX

jV

XV )=

F0o (y0 jX ) < 0 is equivalent to (y1

b0V p a00

V +

b1X p a11

b0X p a00

X

1

(1) U . j 2 XV

, p 1 )= a11
0;

(26)

< 0 and (26) holds, we have

10

> 1; we can identify the sign

0X 1X

< 0. Obviously, these conditions on

0) cannot identify the sign of

2 1V

From (26), we can see that as long as the correlations between

) are strong enough so that

under quite weak conditions on 0X 1X

2 1V

1

implying that

( ) and Uj (i.e.,

0X 1X

6

2 0V

1

< 0, implying a negative

< 0 when

or

(2) L

> 0 and

0X

0 with and

1X

1V

0V

(i.e.,

> 0, and 0X 1X

0

without endogenous selection.

More on Partial Identi…cation

The results in the previous sections are established under the assumptions that (i) there is no information on the copula (in Section 3) or the conditional copula (in Section 4) of Y1 , Y0 so that the true copula Co or Co ( ; jx ) vary in the whole class of bivariate copula functions and (ii) the marginal (in Section 3) or conditional marginal (in Section 4) distributions of Y1 , Y0 are known. This section extends the results in the previous sections to two cases: (i) partial information on the copula or the conditional copula of Y1 , Y0 is available and/or (ii) the marginal or the conditional marginal distributions of Y1 , Y0 are partially identi…ed.

6.1

Partial Dependence Information

In many applications, some partial dependence information on Y1 , Y0 is available. For example, Y1 and Y0 may be known to be non-negatively dependent; the value of a dependence measure such as Kendall’s 2

is known; or the values of the true copula at some speci…c points in [0; 1] may be known. Nelsen and Ubeda-Flores (2004) and Nelsen et al. (2001, 2004) established improved Fréchet-Hoe¤ding bounds when such partial dependence information is available. Using the improved Fréchet-Hoe¤ding bounds, Tankov (2011) showed that the bounds in Cambanis, Simons, and Stout (1976) for super-modular functions can be tightened. Similar results for '-indicator functions can be obtained using Williamson and Downs (1990) and Embrechts, Hoeing, and Juri (2003), see Fan and Park (2009). In this section, we establish the identi…ed sets for super-modular and '-indicator functions when both the covariate information and partial dependence information are available extending our results in Section 4 and the results in Tankov (2011), Williamson and Downs (1990), and Embrechts, Hoeing, and Juri (2003). We characterize the partial dependence information via a restricted class of copula functions that Co ( ; jx ), x 2 X , belongs to. Let CR (x )

C denote a class of copula functions which may depend on x . Suppose

Assumption (IC) holds and Co ( ; jx ) 2 CR (x ). Then the identi…ed set for o is RR 2 : =E (y1 ; y0 ) dC (F1o (y1 jX ) ; F0o (y0 jX ) jX ) = IC;R for some C ( ; jX ) 2 CR (X ) a.s.

22

:

In many applications, CR (x ) is of the following form: CR (x ) = fC 2 C : CL ( ; jx )

C ( ; jx )

CU ( ; jx )g ;

(27)

where CL ( ; jx ) and CU ( ; jx ) are two known copula functions. For example, if Y1 and Y0 are nonn o 2 negatively dependent conditional on X = x , then CR (x ) = C 2 C : C (u; v) uv for all (u; v) 2 [0; 1] . In some applications, the value of a speci…c dependence measure such as Kendall’s

or the values of

Co (u; vjx ) at some speci…c points (u; v) may be known. Applying the results in Nelsen and Ubeda-Flores (2004), Nelsen et al. (2001, 2004) to the conditional copula Co ( ; jx ) implies that under regularity conditions, there exists known copula functions CL ( ; jx ) ; CU ( ; jx ) depending on the value of the dependence measure or the known values of Co ( ; jx ) such that for all x 2 X , CR (x ) is of the form (27). In the rest of this subsection, we present explicit characterizations of

IC;R

for super-modular and '-

indicator functions respectively assuming that CR (x ) is of the form (27). For a super-modular function , we let Z 1Z 1 F1o1 (ujX ) ; F0o1 (vjX ) dCL (u; vjX ) L;R = E 0

U;R

=E

Z

0

and

0

1

Z

1

F1o1 (ujX ) ; F0o1 (vjX ) dCU (u; vjX ) .

(28)

0

THEOREM 6.1 Suppose that Assumption (IC) holds and that CR (x ) is of the form (27). Let

(y1 ; y0 ) be

a super-modular and right continuous function, and suppose that both the expectations in (28) exist with …nite values and that either of the following conditions is satis…ed: (A3)

(y1 ; y0 ) is symmetric and E [ (Y1 ; Y1 )]

and E [ (Y0 ; Y0 )] are …nite; (B3) for each x 2 X , there are some …xed constants y 0 (x ) and y 1 (x ) (depending on x ) such that E [ (Y1 ; y 0 (X ))], E [ (y 1 (X ); Y0 )] and E [ (y 1 (X ); y 0 (X ))] are …nite. Then the identi…ed set for

o

= Eo [ (Y1 ; Y0 )] is

IC;R

=[

L;R ; U;R ].

Theorem 6.1 extends a similar result in Tankov (2011) to allow for the presence of covariates and extends Theorem 4.1 in Section 4 to allow for restrictions on copulas. For '-indicator functions, Theorem 6.2 below extends Theorem 4.3 in Section 4 to allow for restrictions on copulas. THEOREM 6.2 Suppose that Assumption (IC) holds and ' is continuous and non-decreasing in each argument. Further suppose that CR (X ) is of the form (27). Let Y1 (X ) and Y0 (X ) be the supports of Y1 and Y0 given X , respectively, and suppose that they are the Borel sets generated by intervals with both ends QL (X ) and QR (X ) being measurable. De…ne L Fmin;' ( jX ) =

y2Y1 (X )

U Fmax;' ( jX ) =

y2Y1 (X )

where CLd (u; vjX ) = u + v

sup inf

CL F1o (yjX ); F0o ('^y ( jX ) jX ) ; CLd F1o (yjX ); F0o ('^y ( jX ) jX ) ;

CL (u; vjX ) is the dual of copula CL , and '^y ( jX ) = supfy0 2 Y0 (X ) :

' (y; y0 ) < g. Then the identi…ed set for

o

= F' ( ) is 23

IC;R

L U = E Fmin;' ( jX ) ; E Fmax;' ( jX ) .

As noted in Williamson and Downs (1990) and Embrechts, Hoeing, and Juri (2003), when there is no covariate, for '-indicator functions, the upper bound CU on the true copula does not improve on the bounds for

o.

6.2

This is in sharp contrast to the bounds for super-modular functions stated in Theorem 6.1.

Partially Identi…ed Marginals

Sections 3 and 4 assume that either the marginals of $Y_1$, $Y_0$ are identified (Assumption (I)) or the conditional marginals of $Y_1$, $Y_0$ and the marginal of the covariate are identified (Assumption (IC)). In some applications, the marginals or conditional marginals of $Y_1$, $Y_0$ may be partially identified. In this section, we extend the main identification results in Section 3 to partially identified marginals. The results in this section can be used to extend the results in Section 4 to the case where the conditional marginals of $Y_1$, $Y_0$ are partially identified and the marginal of the covariate is point identified. For space considerations, this extension is omitted from this paper.

Assumption (I)'. For $j=1,0$, $F_{jo}\in\mathcal{F}_j$, where $\mathcal{F}_j$ denotes a class of univariate distribution functions absolutely continuous with respect to Lebesgue measure.

Examples satisfying Assumption (I)' abound. When $Y_1$, $Y_0$ are potential outcomes of a binary treatment, Manski (1990, 2003) presents many examples in which the marginal distributions $F_{jo}$, $j=1,0$, are not point identified. For example, suppose the sample information contains a random sample on $(Y,D)$, where $Y\equiv Y_1D+Y_0(1-D)$ denotes the realized outcome for the individual. Let $p_j\equiv\Pr(D=j)\in(0,1)$. Since
\[
F_{jo}(\cdot) = \Pr(Y\le\cdot|D=j)\,p_j + \Pr(Y_j\le\cdot|D=1-j)(1-p_j),
\]
the identified set for $F_{jo}(\cdot)$ is$^{13}$
\[
\mathcal{F}_j = \big\{F_j(\cdot) = \Pr(Y\le\cdot|D=j)\,p_j + G(\cdot)(1-p_j) \text{ for some } G(\cdot), \text{ an absolutely continuous distribution w.r.t. Lebesgue measure}\big\}.
\]
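Replacing the unidentified $G$ in the mixture above by its extreme values $0$ and $1$ yields the familiar worst-case envelope of this identified set at each point $y$. A minimal sketch with simulated data (the function name and the data-generating process are ours, not the paper's):

```python
import numpy as np

def worst_case_cdf_bounds(y, Y, D, j):
    """Worst-case bounds on F_jo(y) = Pr(Y <= y | D = j) p_j + G(y)(1 - p_j):
    setting G(y) = 0 and G(y) = 1 gives the envelope of the identified set."""
    pj = np.mean(D == j)
    Fj_obs = np.mean(Y[D == j] <= y)
    return Fj_obs * pj, Fj_obs * pj + (1.0 - pj)

rng = np.random.default_rng(0)
D = rng.integers(0, 2, size=10_000)
Y = rng.normal(loc=D, scale=1.0)   # Y = Y1*D + Y0*(1 - D) with Y_j ~ N(j, 1)
lo, hi = worst_case_cdf_bounds(1.0, Y, D, 1)
```

The band has width $1-p_j$ at every $y$, which is why additional restrictions (IV, MIV) are needed to tighten it.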

When additional information is available, such as an Instrumental Variable or a Monotone Instrumental Variable assumption, the above identified set can be tightened. We refer interested readers to Manski (2003). In bivariate option pricing, when markets for traded single-asset options on $Y_1$, $Y_0$ are incomplete, the marginal cdfs corresponding to the risk neutral measure are only partially identified; see Kaido and White (2009).

For a general function $\mu$, the identified set for $\theta_o$ under Assumption (I)' is given by
\[
\Theta_I^0 = \big\{\theta: \theta = E_F[\mu(Y_1,Y_0)], \text{ where } F = C(F_1,F_0) \text{ for some } (C,F_1,F_0)\in\mathcal{C}_R\times\mathcal{F}_1\times\mathcal{F}_0\big\}, \quad (29)
\]
where $\mathcal{C}_R\subseteq\mathcal{C}$ is the class of copula functions that $C_o$ belongs to. In the rest of this section, we establish explicit characterizations of $\Theta_I^0$ for super-modular functions and $\varphi$-indicator functions respectively when $\mathcal{F}_1$, $\mathcal{F}_0$, and $\mathcal{C}_R$ are characterized by Assumption (I)'' below.

Assumption (I)''. (i) The marginal distribution functions $F_{1o}$, $F_{0o}$ are unknown, but with known lower and upper bound distributions denoted as $F_j^L$ and $F_j^U$ for $j=0,1$, that is, $F_j^L(\cdot)\le F_{jo}(\cdot)\le F_j^U(\cdot)$; (ii) There exist known copula functions $C_L$ and $C_U$ such that $\mathcal{C}_R = \{C\in\mathcal{C}: C_L(\cdot,\cdot)\le C(\cdot,\cdot)\le C_U(\cdot,\cdot)\}$.

$^{13}$Tamer (2010) establishes sharp bounds on the joint cdf of the potential outcomes in this case, extending the classical Fréchet-Hoeffding bounds for distributions with fixed marginals.

Assumption (I)''(i) is satisfied by interval outcome data; see Manski and Tamer (2002). It can be relaxed to accommodate identified sets for $F_{1o}$, $F_{0o}$ of the form defined in (6) in Stoye (2010) or identified sets for the corresponding measures of the form in (7) in Stoye (2010). Such extensions are omitted from the current paper for space considerations. As discussed in Stoye (2010), Kitagawa (2009) presents many examples satisfying (6) in Stoye (2010). Assumption (I)''(ii) is satisfied when, e.g., $Y_1$ and $Y_0$ are known to be non-negatively dependent; the value of a dependence measure such as Kendall's $\tau$ is known; or the values of the true copula at some specific points in $[0,1]^2$ are known.

For $i,j = U$ or $L$, define
\[
F_{ij}^{(-)}(y_1,y_0) \equiv C_L\big(F_1^i(y_1), F_0^j(y_0)\big) \quad\text{and}\quad F_{ij}^{(+)}(y_1,y_0) \equiv C_U\big(F_1^i(y_1), F_0^j(y_0)\big).
\]
Obviously, $F_{ij}^{(-)}$ and $F_{ij}^{(+)}$ are bivariate cdfs with marginals satisfying Assumption (I)''(i). If $F_1^L = F_1^U = F_{1o}$, $F_0^L = F_0^U = F_{0o}$, and $C_L = M$, $C_U = W$, we have $F_{ij}^{(-)}(y_1,y_0) = F^{(-)}(y_1,y_0)$ and $F_{ij}^{(+)}(y_1,y_0) = F^{(+)}(y_1,y_0)$ defined in Section 3.1.

THEOREM 6.3 Suppose Assumption (I)'' holds and $\mu(y_1,y_0)$ is super-modular and right continuous. For $i,j = U$ or $L$, define
\[
E_{F_{ij}^{(+)}}[\mu(Y_1,Y_0)] = \int_0^1\!\!\int_0^1 \mu\big(F_{1i}^{-1}(u), F_{0j}^{-1}(v)\big)\,dC_U(u,v) \quad\text{and}\quad
E_{F_{ij}^{(-)}}[\mu(Y_1,Y_0)] = \int_0^1\!\!\int_0^1 \mu\big(F_{1i}^{-1}(u), F_{0j}^{-1}(v)\big)\,dC_L(u,v),
\]
and suppose all of them are finite.

(i) Suppose that $Y_1$ and $Y_0$ are bounded from below by constants $\underline y_1$ and $\underline y_0$ respectively, and that $E_{F_{1j}}[\mu(Y_1,\underline y_0)]$ and $E_{F_{0j}}[\mu(\underline y_1,Y_0)]$ ($j = U,L$) are finite. If $\mu(x,\underline y_0)$ and $\mu(\underline y_1,x)$ are non-decreasing respectively for $x\ge\underline y_1$ and $x\ge\underline y_0$, then the identified set for $\theta_o = E_o[\mu(Y_1,Y_0)]$ is
\[
\Theta_I^0 = \big[E_{F_{UU}^{(-)}}[\mu(Y_1,Y_0)],\; E_{F_{LL}^{(+)}}[\mu(Y_1,Y_0)]\big]. \quad (30)
\]

(ii) Suppose that $Y_1$ is bounded from below by a constant $\underline y_1$, $Y_0$ is bounded from above by a constant $\bar y_0$, and that $E_{F_{1j}}[\mu(Y_1,\bar y_0)]$ and $E_{F_{0j}}[\mu(\underline y_1,Y_0)]$ ($j = U,L$) are finite. If $\mu(x,\bar y_0)$ is non-increasing for $x\ge\underline y_1$ and $\mu(\underline y_1,x)$ is non-decreasing for $x\le\bar y_0$, then the identified set for $\theta_o$ is
\[
\Theta_I^0 = \big[E_{F_{LU}^{(-)}}[\mu(Y_1,Y_0)],\; E_{F_{UL}^{(+)}}[\mu(Y_1,Y_0)]\big]. \quad (31)
\]

(iii) Suppose that $Y_1$ and $Y_0$ are bounded from above by constants $\bar y_1$ and $\bar y_0$ respectively, and that $E_{F_{1j}}[\mu(Y_1,\bar y_0)]$ and $E_{F_{0j}}[\mu(\bar y_1,Y_0)]$ ($j = U,L$) are finite. If $\mu(x,\bar y_0)$ and $\mu(\bar y_1,x)$ are non-increasing respectively for $x\le\bar y_1$ and $x\le\bar y_0$, then the identified set for $\theta_o$ is
\[
\Theta_I^0 = \big[E_{F_{LL}^{(-)}}[\mu(Y_1,Y_0)],\; E_{F_{UU}^{(+)}}[\mu(Y_1,Y_0)]\big]. \quad (32)
\]

(iv) Suppose that $Y_1$ is bounded from above by a constant $\bar y_1$, $Y_0$ is bounded from below by a constant $\underline y_0$, and that $E_{F_{1j}}[\mu(Y_1,\underline y_0)]$ and $E_{F_{0j}}[\mu(\bar y_1,Y_0)]$ ($j = U,L$) are finite. If $\mu(x,\underline y_0)$ is non-decreasing for $x\le\bar y_1$ and $\mu(\bar y_1,x)$ is non-increasing for $x\ge\underline y_0$, then the identified set for $\theta_o$ is
\[
\Theta_I^0 = \big[E_{F_{UL}^{(-)}}[\mu(Y_1,Y_0)],\; E_{F_{LU}^{(+)}}[\mu(Y_1,Y_0)]\big]. \quad (33)
\]

Theorem 6.3 extends the result in Cambanis, Simons, and Stout (1976), stated as Lemma 3.1 in Section 3, in two directions. First, it allows for partially identified marginals that are bounded by known distribution functions from both below and above; second, it allows for restricted copulas. Rachev and Rüschendorf (1994) study a similar problem when one marginal cdf is bounded from below and the other is bounded from above, and derive the lower bound for two special classes of super-modular functions. In the first class, $\mu$ is symmetric and satisfies $\mu(y,y)=0$ for all $y$; in the second class, $\mu$ satisfies the unimodality condition in (2.16) in their paper and $\mu(y,y)=0$ for all $y$; see Theorems 1 and 2 in their paper. Both conditions turn out to be strong and are violated by Example 2.1 (i), as shown in Example 2.1 (I)'' below.

Example 2.1 (I)'' (Interval Data). Suppose that for $j=0,1$, $Y_j$ is interval-observed such that $Y_{jL}\le Y_j\le Y_{jU}$, so that (in the notation of Assumption (I)'') the cdf of $Y_{jU}$ is the lower bound distribution $F_j^L$ and the cdf of $Y_{jL}$ is the upper bound distribution $F_j^U$. Further suppose that $Y_{jL}$ has support $(0,\infty)$. Let $\mu(y_1,y_0)=y_1y_0$. It is easy to show that $\mu$ does not satisfy the conditions in Rachev and Rüschendorf (1994), but does satisfy the conditions of Theorem 6.3 (i). As a result, (30) is valid for any known copulas $C_L$, $C_U$, and when $C_L = M$, $C_U = W$, we obtain:
\[
\int_0^1 \mu\big(F_{1U}^{-1}(u), F_{0U}^{-1}(1-u)\big)\,du \;\le\; E[Y_1Y_0] \;\le\; \int_0^1 \mu\big(F_{1L}^{-1}(u), F_{0L}^{-1}(u)\big)\,du. \quad (34)
\]
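The two integrals in (34) are easy to compute numerically. In the sketch below (our own labeling, to keep the envelopes straight): `q_small` quantile functions belong to the stochastically smallest admissible marginals $Y_{jL}$ (the inverses of $F_j^U$) and `q_large` to the largest $Y_{jU}$; the lower bound couples the smallest marginals antithetically and the upper bound couples the largest marginals comonotonically:

```python
import numpy as np

def product_moment_bounds(q1_small, q0_small, q1_large, q0_large, n=200_000):
    """Bounds (34) on E[Y1*Y0] under interval data Y_jL <= Y_j <= Y_jU.
    q_j_small: quantile function of Y_jL; q_j_large: quantile function of Y_jU."""
    u = (np.arange(n) + 0.5) / n                        # midpoint rule on (0, 1)
    lower = np.mean(q1_small(u) * q0_small(1.0 - u))    # antithetic coupling
    upper = np.mean(q1_large(u) * q0_large(u))          # comonotone coupling
    return lower, upper

# Illustrative marginals: Y_jL ~ U(0, 1) and Y_jU ~ U(1, 2) for both j
lo, hi = product_moment_bounds(lambda u: u, lambda u: u,
                               lambda u: 1.0 + u, lambda u: 1.0 + u)
# lower = int u(1-u) du = 1/6, upper = int (1+u)^2 du = 7/3
```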

Moreover, for $j=0,1$, it holds that
\[
E(Y_{jL}) \le E(Y_j) \le E(Y_{jU}) \quad (35)
\]
and
\[
E\big(Y_{jL}^2\big) \le E\big(Y_j^2\big) \le E\big(Y_{jU}^2\big). \quad (36)
\]
Let $\gamma_j = E(Y_j^2)$ and $\mu_j = E(Y_j)$ for $j=0,1$. Then (35), (30), and (36) imply that the identified set for the correlation coefficient $\rho_{10}$ is
\[
\Big\{\rho\in[-1,1]:\ \rho = \frac{\lambda-\mu_1\mu_0}{\sqrt{(\gamma_1-\mu_1^2)(\gamma_0-\mu_0^2)}},\ \text{where } \lambda,\ \mu_j,\ \text{and } \gamma_j \text{ satisfy (34), (35), and (36) respectively}\Big\}.
\]

The theorem below extends Proposition 3.3 to partially identified marginals and restricted copulas.

THEOREM 6.4 Suppose Assumption (I)'' holds and $\varphi$ is continuous and non-decreasing in each argument. For a fixed $\delta$, let
\[
F^L_{\min,\varphi}(\delta) = \sup_{y\in\mathcal{Y}_1} C_L\big(F_1^L(y), F_0^L(\hat\varphi_y(\delta))\big) \quad\text{and}\quad
F^U_{\max,\varphi}(\delta) = \inf_{y\in\mathcal{Y}_1} C_L^d\big(F_1^U(y), F_0^U(\hat\varphi_y(\delta))\big).
\]
Then the identified set for $\theta_o = F_\varphi(\delta)$ is $\Theta_I^0 = \big[F^L_{\min,\varphi}(\delta),\, F^U_{\max,\varphi}(\delta)\big]$.

7 Inference for Super-Modular $\mu$ in the Selection-on-Observables Framework

We have provided a comprehensive study of partial identification of $\theta_o$ under various scenarios in the previous sections. The structures of the identified sets for $\theta_o$ imply that existing inference procedures developed for partially identified parameters are not directly applicable; see Chernozhukov, Hong, and Tamer (2007), Andrews and Soares (2010), Andrews and Shi (2013), and Santos (2012), among others. When $Y_1$, $Y_0$ are potential outcomes of a binary treatment and data from randomized experiments are available, Fan and Park (2009, 2010, 2012) constructed asymptotically valid CSs for $\theta_o$ when $\mu$ is a $\varphi$-indicator function with $\varphi(Y_1,Y_0) = Y_1 - Y_0$, and for the quantile of $Y_1-Y_0$, respectively. In this section, we construct CSs for $\theta_o$ and its conditional version, denoted as $\theta_o(x)$, for strict super-modular functions $\mu$ under the selection-on-observables assumption, i.e., Assumption (IX), which implies that Assumption (IC) holds with $X^* = X$. Asymptotically valid inference procedures for functionals $\mu$ in the other cases studied in the previous sections, including latent threshold-crossing models and bivariate option pricing, remain to be developed.

7.1 Estimators of the Bounds and Assumptions

Suppose $\mu(\cdot,\cdot)$ is strict super-modular and right continuous. Let $Q_j(u|x) = F_{jo}^{-1}(u|x)$, $j=0,1$, and
\[
\theta_L(x) = \int_0^1 \mu\big(Q_1(u|x), Q_0(1-u|x)\big)\,du, \qquad \theta_U(x) = \int_0^1 \mu\big(Q_1(u|x), Q_0(u|x)\big)\,du.
\]
An application of Lemma 3.1 conditional on the covariate implies that $\theta_o(x)$ is partially identified: $\theta_L(x)\le\theta_o(x)\le\theta_U(x)$, and $\theta_L(x)=\theta_U(x)$ if and only if at least one of the conditional marginal distributions $F_{1o}(\cdot|x)$, $F_{0o}(\cdot|x)$ is degenerate.

Suppose a random sample $\{Y_i,X_i,D_i\}_{i=1}^n$ on $\{Y,X,D\}$ is available. We estimate the conditional quantile function $Q_j(u|x)$ of $Y$ given $X=x$ and $D=j$ using the local polynomial approach. Let
\[
\ell_u(t) = t\big(u - I(t\le 0)\big), \qquad u\in[0,1],
\]
be the quantile check function, and let $Y_{(1)} = \min_{i=1,\dots,n} Y_i$, $Y_{(n)} = \max_{i=1,\dots,n} Y_i$. Consider a kernel function $K(\cdot)$, a bandwidth $a_n>0$, and an integer $s\ge 1$. Let $x=(x_1,\dots,x_d)$ and let $P_1(x)$ be the vector which stacks the powers $x_1^{j_1}\cdots x_d^{j_d}$, $1\le j_1+\cdots+j_d\le s-1$, according to the lexicographic order. Define also $P(x) = (1, P_1(x)')'$. The local polynomial estimator of $Q_j(u|x)$, $j=0,1$, is defined as $\hat Q_j(u|x) = \hat b_{0j}(u|x)$, where $\hat b_{0j}(u|x)$ and $\hat b_{1j}(u|x)$ achieve the minimum of
\[
\sum_{i=1}^n \ell_u\big(Y_i - b_0 - P_1(X_i-x)'b_1\big)\,I\{D_i=j\}\,\frac{1}{a_n^d}K\Big(\frac{X_i-x}{a_n}\Big), \qquad b_0\in\big[Y_{(1)},Y_{(n)}\big],
\]
where an appropriate convention is used to break ties.

The estimators of $\theta_L(x)$, $\theta_U(x)$, $\theta_L$, and $\theta_U$ are, respectively,
\[
\hat\theta_L(x) = \int_0^1 \mu\big(\hat Q_1(u|x), \hat Q_0(1-u|x)\big)\,du, \qquad \hat\theta_U(x) = \int_0^1 \mu\big(\hat Q_1(u|x), \hat Q_0(u|x)\big)\,du,
\]
\[
\hat\theta_L = \frac1n\sum_{i=1}^n \hat\theta_L(X_i), \qquad \hat\theta_U = \frac1n\sum_{i=1}^n \hat\theta_U(X_i).
\]
The restriction that $\hat Q_j(u|x) = \hat b_{0j}(u|x)\in[Y_{(1)},Y_{(n)}]$ imposed on the local polynomial quantile estimators is useful in the extreme cases $u=0$ or $u=1$. As discussed in Hall and van Keilegom (2009), for $u=0,1$, the minimizers $\hat b_{0j}(0|x)$ and $\hat b_{0j}(1|x)$ may become infinite when $x$ is near the boundary of the support of $X$. The restriction that $\hat Q_j(u|x)\in[Y_{(1)},Y_{(n)}]$ is a sample version of a basic property of the population conditional quantile $Q_j(u|x)$, which lies between the minimal and maximal values taken by $Y$. Regarding the estimation of the integral parameters $\theta_L(x)$ and $\theta_U(x)$, the restriction that $\hat Q_j(u|x)\in[Y_{(1)},Y_{(n)}]$ is sufficient to deal with such improper behaviors, which only affect those $u$ that are too close to $0$ or $1$. Indeed, when the support of $Y$ is compact as assumed below, $Y_{(1)}$ and $Y_{(n)}$ are bounded away from infinity, and our integral estimators $\hat\theta_L(x)$, $\hat\theta_U(x)$, $\hat\theta_L$ and $\hat\theta_U$ will not be much affected by poor performances of $\hat Q_j(u|x)$ in a shrinking neighborhood of the extremes $u=0,1$.

We assume that the support of $X$ given $D=j$ is the same as that of $X$, denoted as $\mathcal{X}$. Let $x$ be any point in $\mathcal{X}$, including its boundary. To establish the asymptotic distribution of $(\hat\theta_L(x),\hat\theta_U(x))'$ and $(\hat\theta_L,\hat\theta_U)'$, we introduce the following assumptions. Let $p_j(x) = \Pr(D=j|x)$ and $f_j(y|x)\equiv\partial F_j(y|x)/\partial y$, where $F_j(y|x)\equiv F_{jo}(y|x)$ is the conditional cumulative distribution function of $Y$ given $X=x$ and $D=j$, with support $S_j = \{(x,y): x\in\mathcal{X},\, y\in\mathcal{Y}_j(x)\equiv[Q_j(0|x),Q_j(1|x)]\}$. Note that $\mathcal{Y}_j = [\inf_{x\in\mathcal{X}} Q_j(0|x),\, \sup_{x\in\mathcal{X}} Q_j(1|x)]$.

(A1) (i) The partial derivatives of $F_j(y|x)$ w.r.t. $x$ up to order $s$ are continuous over $S_j$; (ii) $S_j$ is compact, $f_j(\cdot|\cdot)$ is continuously differentiable over $S_j$ and satisfies $\inf_{(y,x)\in S_j} f_j(y|x) > 0$.

(A2) (i) $X|D=j$ is continuous with continuous probability density function $f_j(\cdot)$ satisfying $\inf_{x\in\mathcal{X}} f_j(x) > 0$, $j=0,1$. Further, $p(\cdot)\in(0,1)$ is continuous over $\mathcal{X}$; (ii) There is some $\tau>0$ such that, for all $\delta>0$ small enough and any $x\in\mathcal{X}$, there is $x'\in\mathcal{X}$ such that $B(x',\tau\delta)\subseteq B(x,\delta)\cap\mathcal{X}$, where $B(x,\delta)$ is the Euclidean ball with center $x$ and radius $\delta$.

(A3) $\mu(y_1,y_0)$ is twice differentiable on $\mathcal{Y}_1\times\mathcal{Y}_0$ with bounded second-order partial derivatives.

(A4) (i) The kernel $K(\cdot)$ is non-negative and Lipschitz, i.e., $|K(x)-K(x')|\le L\|x-x'\|$ for any $x,x'\in\mathbb{R}^d$. The kernel $K(\cdot)$ has a compact support and is bounded away from $0$ over the unit ball $B(0,1)$; (ii) The bandwidth sequence $a_n$ satisfies $a_n\to 0$, $na_n^{d+s}/\log^3 n\to\infty$, and $na_n^{2s+d}\to 0$.

Assumption A1-(i) implies that the conditional quantile functions $Q_j(u|x)$, $j=0,1$, are continuously differentiable with respect to $(x,u)$ up to order $s$. An important implication of Assumptions A1-(ii) and A2 is that the quantile density function $1/(f_j(Q_j(u|x)|x)f_j(x))$, which is proportional to the asymptotic variance of many nonparametric quantile estimators, stays bounded away from infinity. As a consequence, studying the variance of $\hat\theta_L(x)$, $\hat\theta_U(x)$, $\hat\theta_L$ and $\hat\theta_U$ is made easier. This however comes at the price of potential boundary bias issues, which are dealt with through the choice of local polynomial quantile estimators. The technical challenge here is that the bias behavior of multivariate local polynomial estimators has not been studied much. Assumption A2-(ii), which is from Fan and Guerre (2014), ensures that the bias of the local polynomial quantile estimators $\hat Q_j(u|x)$ is of order $O(a_n^s)$, including $x$ on the boundary of $\mathcal{X}$ and $u$ close to $0$ and $1$; see Proposition D.3 in Appendix D. This result is important to study the asymptotic bias of $(\hat\theta_L(x),\hat\theta_U(x))'$ and $(\hat\theta_L,\hat\theta_U)'$. As discussed in Fan and Guerre (2014), Assumption A2-(ii) is better suited for global bias study and is more general than a condition of Ruppert and Wand (1994), who studied the pointwise behavior of the bias under a local convexity support restriction. The other assumptions are standard, except the condition $na_n^{d+s}/\log^3 n\to\infty$ in Assumption A4-(ii). This condition is used to establish the asymptotic normality of $\hat\theta_L$ and $\hat\theta_U$. It involves a study of the local polynomial quantile estimator when $u\to 0$ or $u\to 1$ at a suitable rate, and requires this bandwidth condition strengthening the usual consistency condition $na_n^d\to\infty$.

7.2 Asymptotic Normality

Let $e_0 = (1,0,\dots,0)'$ denote the first vector of the canonical basis and
\[
V_{K,a_n}^2(x) = e_0'\Big[\int P(v)P(v)'K(v)\,1(x+a_nv\in\mathcal{X})\,dv\Big]^{-1}\Big[\int P(v)P(v)'K^2(v)\,1(x+a_nv\in\mathcal{X})\,dv\Big]\Big[\int P(v)P(v)'K(v)\,1(x+a_nv\in\mathcal{X})\,dv\Big]^{-1}e_0.
\]
Lemma D.1 in Appendix D shows that under Assumption A2-(ii), $V_{K,a_n}^2(x)$ is well-defined uniformly over the support $\mathcal{X}$ provided that $a_n$ is small enough.
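For intuition, $V_{K,a_n}^2(x)$ can be computed directly in the local constant case ($P(v)=1$, $d=1$), where it reduces to a scalar ratio of kernel integrals over the admissible window. A uniform kernel on $[-1,1]$ and support $\mathcal{X}=[0,\infty)$ are our illustrative choices; the computation shows the variance inflation at the boundary:

```python
import numpy as np

def V2(x, a_n, lo=0.0, hi=np.inf, n=200_000):
    """V^2_{K,a_n}(x) for d = 1, P(v) = 1, uniform kernel K = 1(|v| <= 1):
    (int K 1(x + a_n v in X) dv)^{-1} (int K^2 ...) (int K ...)^{-1}."""
    v = -1.0 + 2.0 * (np.arange(n) + 0.5) / n       # midpoint grid on [-1, 1]
    inside = (x + a_n * v >= lo) & (x + a_n * v <= hi)
    S_K = 2.0 * np.mean(1.0 * inside)               # int K 1(...) dv
    S_K2 = S_K                                      # K^2 = K for this kernel
    return S_K2 / S_K**2

v_int = V2(5.0, 0.1)   # interior point: full window, S_K = 2, V^2 = 1/2
v_bnd = V2(0.0, 0.1)   # boundary point: half window, S_K = 1, V^2 = 1
```

The boundary value is twice the interior one, illustrating why a normalization that is uniform over $\mathcal{X}$ (Lemma D.1) is needed.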

Define also, for $\mu_j(y_1,y_0) = \partial\mu(y_1,y_0)/\partial y_j$,
\[
G_{0L}(u) = \frac{\mu_0\big(Q_1(u|x),Q_0(1-u|x)\big)}{f_0(Q_0(1-u|x)|x)}, \qquad G_{0U}(u) = \frac{\mu_0\big(Q_1(u|x),Q_0(u|x)\big)}{f_0(Q_0(u|x)|x)},
\]
\[
G_{1L}(u) = \frac{\mu_1\big(Q_1(u|x),Q_0(1-u|x)\big)}{f_1(Q_1(u|x)|x)}, \qquad G_{1U}(u) = \frac{\mu_1\big(Q_1(u|x),Q_0(u|x)\big)}{f_1(Q_1(u|x)|x)}.
\]
We are now ready to state the joint asymptotic normality of $(\hat\theta_L(x),\hat\theta_U(x))'$.

THEOREM 7.1 Suppose Assumption (IX) and (A1)-(A4) hold. Then, for any $x\in\mathcal{X}$,
\[
\frac{\sqrt{na_n^d}}{V_{K,a_n}(x)}\begin{pmatrix}\hat\theta_L(x)-\theta_L(x)\\ \hat\theta_U(x)-\theta_U(x)\end{pmatrix} \Longrightarrow N\Bigg(0,\begin{pmatrix}\sigma_L^2(x) & \sigma_{LU}(x)\\ \sigma_{LU}(x) & \sigma_U^2(x)\end{pmatrix}\Bigg),
\]

with
\[
\sigma_L^2(x) = \int_0^1\!\!\int_0^1 \frac{G_{0L}(u)G_{0L}(v)}{f_0(x)\Pr(D=0)}\{\min(1-u,1-v)-(1-u)(1-v)\}\,du\,dv
+ \int_0^1\!\!\int_0^1 \frac{G_{1L}(u)G_{1L}(v)}{f_1(x)\Pr(D=1)}\{\min(u,v)-uv\}\,du\,dv,
\]
\[
\sigma_U^2(x) = \int_0^1\!\!\int_0^1 \Big[\frac{G_{0U}(u)G_{0U}(v)}{f_0(x)\Pr(D=0)} + \frac{G_{1U}(u)G_{1U}(v)}{f_1(x)\Pr(D=1)}\Big]\{\min(u,v)-uv\}\,du\,dv,
\]
and
\[
\sigma_{LU}(x) = \int_0^1\!\!\int_0^1 \frac{G_{0L}(u)G_{0U}(v)}{f_0(x)\Pr(D=0)}\{\min(1-u,v)-(1-u)v\}\,du\,dv
+ \int_0^1\!\!\int_0^1 \frac{G_{1L}(u)G_{1U}(v)}{f_1(x)\Pr(D=1)}\{\min(u,v)-uv\}\,du\,dv.
\]

Theorem 7.1 holds for all $x$ in $\mathcal{X}$, including the boundaries of the support of the covariate $X$, showing in particular that $(\hat\theta_L(x),\hat\theta_U(x))'$ is consistent when $x$ lies on the boundary. Such a result is potentially useful, for instance, when minimal or maximal values of an entry of $X$ — such as education, time spent in unemployment, or health — correspond to specific subpopulations of interest for some policy interventions. The asymptotic normality stated in Theorem 7.1 holds under the additional condition that the variance dominates the bias, that is, $a_n^s = o(1/\sqrt{na_n^d})$, as ensured by Assumption A4-(ii). The asymptotic variance of Theorem 7.1 involves the partial derivatives of $\mu(y_1,y_0)$ due to the use of the Functional Delta method, the inverse of $f_j(Q_j(u|x)|x)$, which is typical of quantile estimation asymptotics, and the inverse of $f_j(x)$, as expected from a local polynomial method.

The proof of Theorem 7.1 uses a Bahadur representation of the local polynomial quantile estimator $\hat Q_j(u|x)$, which is also useful in other econometric contexts; see Guerre and Sabbah (2012) and Kong, Linton and Xia (2010), among others. Let $\bar Q_j(u|x) = \bar b_{0j}(u|x)$, where $(\bar b_{0j}(u|x), \bar b_{1j}(u|x)')'$ achieves the minimum of the population objective function
\[
E\Big[\ell_u\big(Y - b_0 - P_1(X-x)'b_1\big)\,I\{D=j\}\,K\Big(\frac{X-x}{a_n}\Big)\Big].
\]
Define also
\[
\bar Q_j(u|X_i,x) = \bar b_{0j}(u|x) + P_1(X_i-x)'\bar b_{1j}(u|x),
\]
\[
\hat S_j(u|x) = \frac{1}{(na_n^d\Pr(D=j))^{1/2}}\sum_{i=1}^n\big[u - 1\big(Y_i\le\bar Q_j(u|X_i,x)\big)\big]\,1(D_i=j)\,P\Big(\frac{X_i-x}{a_n}\Big)K\Big(\frac{X_i-x}{a_n}\Big),
\]
\[
\bar H_j(u|x) = \int f_j\big(\bar Q_j(u|x+a_nv,x)\,\big|\,x+a_nv\big)\,P(v)P(v)'K(v)\,f_j(x+a_nv)\,dv.
\]
Propositions D.3 and D.4 in Appendix D show that
\[
\bar Q_j(u|x) = Q_j(u|x) + O(a_n^s) \quad (37)
\]
and
\[
\hat Q_j(u|x) = \bar Q_j(u|x) + \frac{e_0'\bar H_j(u|x)^{-1}\hat S_j(u|x)}{(na_n^d\Pr(D=j))^{1/2}} + \big(u(1-u)+a_n^s\big)\,O_P\Big(\Big(\frac{\log n}{na_n^d}\Big)^{3/4}\Big), \quad (38)
\]
uniformly in $x\in\mathcal{X}$ and in $u$ in an interval growing to $[0,1]$ with the sample size. This extends Guerre and Sabbah (2012), who only consider inner $x$ and $u$, and Fan and Guerre (2014), due to the uniformity over an interval growing to $[0,1]$ with the sample size. Using the uniformity in $u$ is crucial to derive a linear approximation for $\hat\theta_k(x)$, $k=L,U$, from the linear representation of $\hat Q_j(u|x)$, $j=0,1$, by the Delta method. Applying the Lindeberg Central Limit Theorem to the linear part of $(\hat\theta_L(x),\hat\theta_U(x))'$ then gives Theorem 7.1. Indeed, it is shown in Appendix D that an important implication of (37) and (38) is that

\[
\hat\theta_L(x) - \theta_L(x) = \frac{e_0'}{(na_n^d)^{1/2}}\int_0^1 \mu_0\big(Q_1(u|x),Q_0(1-u|x)\big)\frac{\bar H_0(1-u|x)^{-1}\hat S_0(1-u|x)}{\Pr(D=0)^{1/2}}\,du
+ \frac{e_0'}{(na_n^d)^{1/2}}\int_0^1 \mu_1\big(Q_1(u|x),Q_0(1-u|x)\big)\frac{\bar H_1(u|x)^{-1}\hat S_1(u|x)}{\Pr(D=1)^{1/2}}\,du + O(a_n^s) + O_P\Big(\Big(\frac{\log n}{na_n^d}\Big)^{3/4}\Big),
\]
and
\[
\hat\theta_U(x) - \theta_U(x) = \frac{e_0'}{(na_n^d)^{1/2}}\int_0^1 \mu_0\big(Q_1(u|x),Q_0(u|x)\big)\frac{\bar H_0(u|x)^{-1}\hat S_0(u|x)}{\Pr(D=0)^{1/2}}\,du
+ \frac{e_0'}{(na_n^d)^{1/2}}\int_0^1 \mu_1\big(Q_1(u|x),Q_0(u|x)\big)\frac{\bar H_1(u|x)^{-1}\hat S_1(u|x)}{\Pr(D=1)^{1/2}}\,du + O(a_n^s) + O_P\Big(\Big(\frac{\log n}{na_n^d}\Big)^{3/4}\Big).
\]
Let $\hat\sigma_L(x)$ and $\hat\sigma_U(x)$ denote consistent estimators of $\sigma_L(x)$ and $\sigma_U(x)$, and let $z_{1-\alpha}$ be the $(1-\alpha)$ quantile of the standard normal distribution. Following Horowitz and Manski (2000), define the confidence set
\[
CS_{1-\alpha}(x) = \Big[\hat\theta_L(x) - \frac{\hat\sigma_L(x)}{\sqrt{na_n^d}}\,z_{1-\alpha},\;\; \hat\theta_U(x) + \frac{\hat\sigma_U(x)}{\sqrt{na_n^d}}\,z_{1-\alpha}\Big].
\]
The next theorem shows that $CS_{1-\alpha}(x)$ contains the true $\theta_o(x)$ with an asymptotic probability of $1-\alpha$.
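Given the point estimates and their standard errors, the construction is immediate; a sketch (the numeric inputs are placeholders, and `z` is the one-sided normal critical value as in the Horowitz-Manski construction):

```python
from scipy.stats import norm

def confidence_set(theta_L_hat, theta_U_hat, se_L, se_U, alpha=0.05):
    """CS_{1-alpha} = [theta_L_hat - z_{1-alpha} se_L, theta_U_hat + z_{1-alpha} se_U],
    with se_k = sigma_k_hat(x) / sqrt(n a_n^d).  Each endpoint is expanded by a
    one-sided critical value, which suffices for covering a partially
    identified parameter lying between the two bounds."""
    z = norm.ppf(1.0 - alpha)
    return theta_L_hat - z * se_L, theta_U_hat + z * se_U

lo, hi = confidence_set(0.20, 0.45, se_L=0.02, se_U=0.03, alpha=0.05)
```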

THEOREM 7.3 Suppose the conditions of Theorem 7.1 hold with $\log n/(na_n^d)^{1/2} = o(a_n^s)$ and $0<\alpha<1/2$. Then $\lim_{n\to\infty}\Pr\big(\theta_o(x)\in CS_{1-\alpha}(x)\big)\ge 1-\alpha$.

for all $x^*\in A$ both $F_{1o}(\cdot|x^*)$ and $F_{0o}(\cdot|x^*)$ are not degenerate, implying by Proposition 3.4 that $F_{\max,\varphi}(\delta|x^*) > F_{\min,\varphi}(\delta|x^*)$ for all $x^*\in A$, where we used the fact $F_{\max,\varphi}(\delta|x^*)\ge F_{\min,\varphi}(\delta|x^*)$ for all $x^*\in\mathcal{X}^*$. This leads to $E[F_{\max,\varphi}(\delta|X^*) - F_{\min,\varphi}(\delta|X^*)]>0$, a contradiction with $E[F_{\max,\varphi}(\delta|X^*)] = E[F_{\min,\varphi}(\delta|X^*)]$.

Proof of Theorem 4.4. We provide a proof for the lower bounds. The proof for the upper bounds is similar and thus omitted. By the definitions of $F_{L,\varphi}(\delta)$, $F_{\min,\varphi}(\delta)$ and Jensen's inequality, we obtain:
\[
F_{L,\varphi}(\delta) = E\Big[\sup_{y\in\mathcal{Y}_1}\max\big\{F_{1o}(y|X^*)+F_{0o}(\hat\varphi_y(\delta)|X^*)-1,\,0\big\}\Big]
\ge \sup_{y\in\mathcal{Y}_1} E\Big[\max\big\{F_{1o}(y|X^*)+F_{0o}(\hat\varphi_y(\delta)|X^*)-1,\,0\big\}\Big]
\]
\[
\ge \sup_{y\in\mathcal{Y}_1}\max\Big\{E\big[F_{1o}(y|X^*)+F_{0o}(\hat\varphi_y(\delta)|X^*)\big]-1,\,0\Big\}
= \sup_{y\in\mathcal{Y}_1}\max\big\{F_{1o}(y)+F_{0o}(\hat\varphi_y(\delta))-1,\,0\big\} = F_{\min,\varphi}(\delta).
\]
Note that $Y_j$ ($j=1,0$) are assumed to be continuous random variables. Then $F_{jo}(y|X^*)$ and $F_{jo}(y)$ are continuous, and thus $\sup_{y\in\mathcal{Y}_1}F_{1o}(y|X^*)=1$ and $\sup_{y\in\mathcal{Y}_1}F_{1o}(y)=1$, implying $\sup_{y\in\mathcal{Y}_1}\{F_{1o}(y|X^*)+F_{0o}(\hat\varphi_y(\delta)|X^*)-1\}\ge 0$ and $\sup_{y\in\mathcal{Y}_1}\{F_{1o}(y)+F_{0o}(\hat\varphi_y(\delta))-1\}\ge 0$. Therefore, the inequality above becomes
\[
F_{L,\varphi}(\delta) = E\Big[\sup_{y\in\mathcal{Y}_1}\big\{F_{1o}(y|X^*)+F_{0o}(\hat\varphi_y(\delta)|X^*)-1\big\}\Big]
\ge \sup_{y\in\mathcal{Y}_1} E\big[F_{1o}(y|X^*)+F_{0o}(\hat\varphi_y(\delta)|X^*)-1\big]
= \sup_{y\in\mathcal{Y}_1}\big\{F_{1o}(y)+F_{0o}(\hat\varphi_y(\delta))-1\big\} = F_{\min,\varphi}(\delta). \quad (A.12)
\]
Let $G_\varphi(y,x^*) = F_{1o}(y|x^*)+F_{0o}(\hat\varphi_y(\delta)|x^*)-1$. Then $\sup_{y\in\mathcal{Y}_1} E[G_\varphi(y,X^*)] = E[G_\varphi(y^*,X^*)]$ for some $y^*$, and it follows from (A.12) that $F_{L,\varphi}(\delta) = F_{\min,\varphi}(\delta)$ iff $E[\sup_{y\in\mathcal{Y}_1}G_\varphi(y,X^*)] = E[G_\varphi(y^*,X^*)]$. Since $\sup_{y\in\mathcal{Y}_1}G_\varphi(y,x^*)\ge G_\varphi(y^*,x^*)$ for all $x^*\in\mathcal{X}^*$, it follows that $E[\sup_{y\in\mathcal{Y}_1}G_\varphi(y,X^*)] = E[G_\varphi(y^*,X^*)]$ iff $\sup_{y\in\mathcal{Y}_1}G_\varphi(y,x^*) = G_\varphi(y^*,x^*)$ for almost all $x^*\in\mathcal{X}^*$; that is, $F_{1o}(y|x^*)+F_{0o}(\hat\varphi_y(\delta)|x^*)-1$ reaches its maximum value uniformly at $y^*$ for almost all $x^*\in\mathcal{X}^*$.

Proof of Proposition 5.1: By using Theorem 4.1 (i) with $X^* = p(X)$, we see that the identified set for $\theta_o$ is $[\theta_{LP},\theta_{UP}]$. Similarly, by using Theorem 4.1 (i) with $X^* = X$, we obtain the other identified set for $\theta_o$, i.e., $[\theta_L,\theta_U]$. Following the proof of Theorem 4.2, we can show $[\theta_L,\theta_U]\subseteq[\theta_{LP},\theta_{UP}]$. Denote
\[
F^L(y_1,y_0) = E\big[M\big(F_{1o}(y_1|X),F_{0o}(y_0|X)\big)\big], \qquad F_P^L(y_1,y_0) = E\big[M\big(F_{1o}(y_1|p(X)),F_{0o}(y_0|p(X))\big)\big],
\]
\[
F^U(y_1,y_0) = E\big[W\big(F_{1o}(y_1|X),F_{0o}(y_0|X)\big)\big], \quad\text{and}\quad F_P^U(y_1,y_0) = E\big[W\big(F_{1o}(y_1|p(X)),F_{0o}(y_0|p(X))\big)\big].
\]
It follows from Jensen's inequality that for all $(y_1,y_0)$, we have
\[
F_P^L(y_1,y_0) = E\big[\max\{F_1(y_1|p(X))+F_0(y_0|p(X))-1,\,0\}\big]
= E\big[\max\{E[F_1(y_1|X)+F_0(y_0|X)-1\,|\,p(X)],\,0\}\big]
\]
\[
\le E\big[E[\max\{F_1(y_1|X)+F_0(y_0|X)-1,\,0\}\,|\,p(X)]\big]
= E\big[\max\{F_1(y_1|X)+F_0(y_0|X)-1,\,0\}\big] = F^L(y_1,y_0), \quad (A.13)
\]
and
\[
F_P^U(y_1,y_0) = E\big[F_{0o}(y_0|p(X)) + \min\big(F_{1o}(y_1|p(X))-F_{0o}(y_0|p(X)),\,0\big)\big]
= E\big[E(F_{0o}(y_0|X)|p(X)) + \min\big(E(F_{1o}(y_1|X)-F_{0o}(y_0|X)|p(X)),\,0\big)\big]
\]
\[
\ge E\big[E(F_{0o}(y_0|X)|p(X)) + E\big(\min(F_{1o}(y_1|X)-F_{0o}(y_0|X),\,0)|p(X)\big)\big]
= E\big[F_{0o}(y_0|X) + \min\big(F_{1o}(y_1|X)-F_{0o}(y_0|X),\,0\big)\big] = F^U(y_1,y_0).
\]
To save space, we only consider case (B1) and show $\theta_{LP}\le\theta_L$. Similar to (A.9) and (A.10), we have
\[
\theta_{LP}(p(X)) = E\big[\mu(Y_1,\underline y_0(p(X)))|p(X)\big] + E\big[\mu(\underline y_1(p(X)),Y_0)|p(X)\big] - \mu\big(\underline y_1(p(X)),\underline y_0(p(X))\big) + \iint B_P^M\,d\mu_c(y_1,y_0), \quad (A.14)
\]
and
\[
\theta_L(X) = E\big[\mu(Y_1,\underline y_0(p(X)))|X\big] + E\big[\mu(\underline y_1(p(X)),Y_0)|X\big] - \mu\big(\underline y_1(p(X)),\underline y_0(p(X))\big) + \iint B_X^M\,d\mu_c(y_1,y_0), \quad (A.15)
\]
where for all $(y_1,y_0)$,
\[
B_P^M = M\big(F_{1o}(y_1|p(X)),F_{0o}(y_0|p(X))\big) - 1\big(\underline y_1(p(X))<y_1\big)F_{0o}(y_0|p(X)) - F_{1o}(y_1|p(X))\,1\big(\underline y_0(p(X))<y_0\big) + 1\big(\underline y_1(p(X))<y_1\big)1\big(\underline y_0(p(X))<y_0\big),
\]
\[
B_X^M = M\big(F_{1o}(y_1|X),F_{0o}(y_0|X)\big) - 1\big(\underline y_1(p(X))<y_1\big)F_{0o}(y_0|X) - F_{1o}(y_1|X)\,1\big(\underline y_0(p(X))<y_0\big) + 1\big(\underline y_1(p(X))<y_1\big)1\big(\underline y_0(p(X))<y_0\big).
\]
Taking expectations of (A.14) and (A.15) with respect to $X$, we have
\[
\theta_{LP} = E\big[\mu(Y_1,\underline y_0(p(X)))\big] + E\big[\mu(\underline y_1(p(X)),Y_0)\big] - E\big[\mu(\underline y_1(p(X)),\underline y_0(p(X)))\big] + \iint E[B_P^M]\,d\mu_c(y_1,y_0), \quad (A.16)
\]
and
\[
\theta_L = E\big[\mu(Y_1,\underline y_0(p(X)))\big] + E\big[\mu(\underline y_1(p(X)),Y_0)\big] - E\big[\mu(\underline y_1(p(X)),\underline y_0(p(X)))\big] + \iint E[B_X^M]\,d\mu_c(y_1,y_0). \quad (A.17)
\]
Note that for given $y_0$ and $y_1$, $E[F_{jo}(y_j|X)|p(X)] = F_{jo}(y_j|p(X))$ ($j=1,0$), and thus
\[
E[B_P^M] - E[B_X^M] = F_P^L(y_1,y_0) - F^L(y_1,y_0).
\]
By using the fact that $F_P^L(y_1,y_0)\le F^L(y_1,y_0)$ for all $(y_1,y_0)$, we have $E[B_P^M]\le E[B_X^M]$, implying, by comparing (A.16) and (A.17), that $\theta_{LP}\le\theta_L$, and $\theta_{LP}=\theta_L$ iff $F_P^L(y_1,y_0) = F^L(y_1,y_0)$ for $\mu_c$-almost all $(y_1,y_0)$. For a strict super-modular function $\mu(\cdot,\cdot)$, it follows from the proof of (A.13) that $F_P^L(y_1,y_0) = F^L(y_1,y_0)$ for $\mu_c$-almost all $(y_1,y_0)$ if and only if $\Pr\{F_{1o}(y_1|X)+F_{0o}(y_0|X)-1>0\,|\,p(X)\}\in\{0,1\}$ for $\mu_c$-almost all $(y_1,y_0)$.

Proof of Proposition 5.2: The proof is similar to that of Proposition 5.1.

Appendix B: Two More Examples of Bounds on the Correlation Coefficient

In this appendix, we present extensions of Example 2.1 (IC) (i) and Example 2.1 (IU) (i) to log-normal potential outcomes distributions.

Example B.1. Let $F_{1o}(\cdot|x)$ and $F_{0o}(\cdot|x)$ denote univariate log-normal distribution functions with known parameters $\big(\mu_1+\sigma_1\rho_{1X}x,\ \sigma_1^2(1-\rho_{1X}^2)\big)$ and $\big(\mu_0+\sigma_0\rho_{0X}x,\ \sigma_0^2(1-\rho_{0X}^2)\big)$ respectively, and let $X\sim N(0,1)$. Then Assumption (IC) is satisfied with $X^* = X$. Let $\rho_{10}(x)$ denote the conditional correlation coefficient between $Y_1$ and $Y_0$ given $X=x$. Using Lemma 3.1 (a), we obtain $\rho_L(x)\le\rho_{10}(x)\le\rho_U(x)$, where
\[
\rho_L(x) = \frac{\exp\big(-\sigma_1\sigma_0\sqrt{(1-\rho_{1X}^2)(1-\rho_{0X}^2)}\big)-1}{\sqrt{\big(\exp(\sigma_1^2(1-\rho_{1X}^2))-1\big)\big(\exp(\sigma_0^2(1-\rho_{0X}^2))-1\big)}} \le 0, \quad (B.1)
\]
\[
\rho_U(x) = \frac{\exp\big(\sigma_1\sigma_0\sqrt{(1-\rho_{1X}^2)(1-\rho_{0X}^2)}\big)-1}{\sqrt{\big(\exp(\sigma_1^2(1-\rho_{1X}^2))-1\big)\big(\exp(\sigma_0^2(1-\rho_{0X}^2))-1\big)}} \ge 0. \quad (B.2)
\]
Considering the unconditional correlation coe¢ cient between Y1 and Y0 , denoted by that E 10

Thus, we have

= L

U

10 (X)

10

p

U,

i V ar (Y1 jX) V ar (Y0 jX) + E [E (Y1 jX) E (Y0 jX)] p V ar (Y1 ) V ar (Y0 )

h

L (X)

10 ,

it is easy to show

E (Y1 ) E (Y0 ) :

where

i V ar (Y1 jX) V ar (Y0 jX) + E [E (Y1 jX) E (Y0 jX)] p = V ar (Y1 ) V ar (Y0 ) h i p E U (X) V ar (Y1 jX) V ar (Y0 jX) + E [E (Y1 jX) E (Y0 jX)] p = V ar (Y1 ) V ar (Y0 ) E

L

h

N (0; 1). Then

p

Now we evaluate the individual components in the expressions for

L

and

E (Y1 ) E (Y0 ) ;

(B.3)

:

(B.4)

E (Y1 ) E (Y0 )

in (B.3) and (B.4). Noting

U

that E (Yj jX) = exp

j jX X

E Yj2 jX = exp 2

+

1 2

2 j

1

2 jX

2 j

1

2 jX

j jX X

+2

1 2

2 j

1

2 jX

exp

1 2

2 2 j jX

and

(B.5)

;

(B.6)

we obtain: E (Yj ) = E [E (Yj jX)] = exp = exp

1 2

2 j

1

E Yj2 = E E Yj2 jX = exp 2

2 j

2 jX

= exp 2 2 jX

1

exp

2 j

1

1 4 2

E (exp [

2 jX 2 2 j jX

= exp

j jX X])

1 2

2 j

E (exp [2 = exp 2

2 j

;

j jX X])

;

and V ar (Yj jX) = exp

2 j

1

2 jX

1 exp 2 52

j jX X

+

2 j

1

2 jX

:

(B.7)

Using the facts in (B.5)-(B.7), we obtain: q p V ar (Y1 ) V ar (Y0 ) = (exp [ E [E (Y1 jX) E (Y0 jX)] 1 2 1 2 1 2

= exp = exp = exp

2 0]

1 2

1) exp

2 0

2 1

+

;

(B.8)

E (Y1 ) E (Y0 )

2 1

1

2 1X

+

2 0

1

2 0X

E exp [(

2 1

1

2 1X

+

2 0

1

2 0X

exp

2 1

+

2 0

2 1]

1) (exp [

[exp (

1 0 1X 0X )

+

1 1X

1 [ 2

1 1X

0 0X ) X] 2 0 0X ]

+

1 2 1 exp 2

exp

2 0

+

2 1

2 0

+

2 1

1] ;

(B.9)

and E

h

=

L

(X)

i p V ar (Y1 jX) V ar (Y0 jX)

exp

0 1

E fexp [( =

q

+ 1 q 1 (1

0 0X

exp

0

2 ) (1 0X

(1

2 ) 1X

1 exp

2 ) 1X

1 exp

"

2 0

1

2 0

+

2 0X

+ 2

2 1

2 1X

1

#

1X ) X]g 2 ) (1 0X

2 1

+2 0 2

1 0X 1X

:

(B.10)

Similarly, E

h

U (X)

=

exp

i p V ar (Y1 jX) V ar (Y0 jX) q 2 ) (1 2 ) 0 1 (1 0X 1X

1 exp

2 0

2 1

+

+2 0 2

Substituting (B.8), (B.9), (B.10) and (B.11) into (B.3) and (B.4) yields h i p 2 ) (1 2 ) exp 0 1 0X 1X (1 0X 1X p L = (exp [ 02 ] 1) (exp [ 12 ] 1) i h p 2 ) (1 2 ) exp 0 1 0X 1X + (1 0X 1X p U = 2 2 (exp [ 0 ] 1) (exp [ 1 ] 1) Like Example 2.1 (IC) (i), when

0X 1X 10

< 0 and

when

2 0X

+

2 0X 2 1X

2 1X

+

0X 1X

> 1, we have

2 0X

> 0 and

L

> 1. By Theorem 4.1 (ii),

2 1X

:

(B.11)

1 ; 1 :

> 1, we have 0
0) 2 = 0 (or

= 0).

Example B.2. Suppose that $X^* = (V,X)'$ follows a bivariate normal distribution, $X^*\sim N[0,\Sigma_{22}]$, $D$ is defined as in (15), and assume that $(U_1,U_0)$ conditional on $V$ and $X$ has a bivariate log-normal distribution with parameters $(\mu_1,\mu_0)' = \Sigma_{12}\Sigma_{22}^{-1}X^*$ and $(a_{ij})_{i,j=1,0} = \Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ given in (16) and (17). Let $F_d(\cdot|V,X)$ denote the marginal distribution function of $U_d$ conditional on $V$ and $X$, which is a univariate log-normal distribution with parameters $(\mu_d, a_{dd})$, where for $d=1,0$,
\[
\mu_d = b_{dV}V + b_{dX}X = b_d'X^*,
\]
with $b_d = (b_{dV},b_{dX})'$, $b_{dV} = \sigma_d(\rho_{dV}-\rho_{dX}\sigma_{XV})/(1-\sigma_{XV}^2)$, $b_{dX} = \sigma_d(\rho_{dX}-\rho_{dV}\sigma_{XV})/(\sigma_X(1-\sigma_{XV}^2))$, and $a_{11}$, $a_{00}$ defined as in (17). We can give expressions for the lower and upper bounds of the conditional correlation coefficient between $U_1$ and $U_0$ given $V$ and $X$, denoted by $\rho_L(V,X)$ and $\rho_U(V,X)$ respectively, as follows. Letting $\rho$ denote the conditional correlation between $\log U_1$ and $\log U_0$, note that
\[
E[U_d|V,X] = \exp\big(\mu_d+\tfrac12 a_{dd}\big), \qquad
E[U_0U_1|V,X] = \exp\Big(\mu_0+\mu_1+\tfrac12\big(a_{11}+a_{00}+2\rho\sqrt{a_{11}a_{00}}\big)\Big),
\]
\[
\mathrm{Cov}[U_0,U_1|V,X] = \exp\Big(\mu_0+\mu_1+\tfrac12(a_{11}+a_{00})\Big)\big[\exp\big(\rho\sqrt{a_{11}a_{00}}\big)-1\big],
\]
\[
E[U_d^2|V,X] = \exp\{2\mu_d+2a_{dd}\}, \qquad
\mathrm{Var}[U_d|V,X] = \exp\{2\mu_d+2a_{dd}\}-\exp\{2\mu_d+a_{dd}\} = \exp\{2\mu_d+a_{dd}\}\big[\exp\{a_{dd}\}-1\big].
\]
Then
\[
\rho_{10}(V,X) = \mathrm{Corr}[U_0,U_1|V,X] = \frac{\mathrm{Cov}[U_0,U_1|V,X]}{\sqrt{\mathrm{Var}[U_1|V,X]\,\mathrm{Var}[U_0|V,X]}} = \frac{\exp\big(\rho\sqrt{a_{11}a_{00}}\big)-1}{\sqrt{[\exp\{a_{11}\}-1][\exp\{a_{00}\}-1]}}.
\]
This implies $\rho_L(V,X)\le\rho_{10}(V,X)\le\rho_U(V,X)$, where $\rho_L(V,X)$ is $\rho_{10}(V,X)$ with $\rho=-1$,
\[
\rho_L(V,X) = \frac{\exp\big(-\sqrt{a_{11}a_{00}}\big)-1}{\sqrt{[\exp\{a_{11}\}-1][\exp\{a_{00}\}-1]}} < 0, \quad (B.12)
\]
and $\rho_U(V,X)$ is $\rho_{10}(V,X)$ with $\rho=1$,
\[
\rho_U(V,X) = \frac{\exp\big(\sqrt{a_{11}a_{00}}\big)-1}{\sqrt{[\exp\{a_{11}\}-1][\exp\{a_{00}\}-1]}} > 0. \quad (B.13)
\]
Now we compute the lower and upper bounds of the unconditional correlation coefficient between $U_1$ and $U_0$, denoted by $\rho_L$ and $\rho_U$, based on the conditional distribution information given $V$ and $X$. It follows from the expressions for $E[U_d|V,X]$, $E[U_d^2|V,X]$ and $E(U_0U_1|V,X)$ that
\[
E(U_d) = E[E(U_d|V,X)] = \exp\big(\tfrac12 a_{dd}\big)\,E\big[\exp\{b_{dV}V+b_{dX}X\}\big] = \exp\big(\tfrac12 a_{dd}\big)\exp\big(\tfrac12 b_d'\Sigma_{22}b_d\big) = \exp\big(\tfrac12(a_{dd}+b_d'\Sigma_{22}b_d)\big),
\]
\[
E(U_d^2) = E\big[E(U_d^2|V,X)\big] = \exp\{2a_{dd}\}\,E\big[\exp\{2b_{dV}V+2b_{dX}X\}\big] = \exp\big\{2(a_{dd}+b_d'\Sigma_{22}b_d)\big\},
\]
\[
\mathrm{Var}(U_d) = E(U_d^2)-[E(U_d)]^2 = \exp\{a_{dd}+b_d'\Sigma_{22}b_d\}\big[\exp\{a_{dd}+b_d'\Sigma_{22}b_d\}-1\big],
\]
\[
E(U_0U_1) = E[E(U_0U_1|V,X)] = \exp\Big(\tfrac{a_{11}+a_{00}+2\rho\sqrt{a_{11}a_{00}}}{2}\Big)\,E\big[\exp\{(b_{0V}+b_{1V})V+(b_{0X}+b_{1X})X\}\big] = \exp\Big(\tfrac{a_{11}+a_{00}+2\rho\sqrt{a_{11}a_{00}}}{2}+\tfrac{(b_0+b_1)'\Sigma_{22}(b_0+b_1)}{2}\Big).
\]
Then
\[
\rho_{10} = \frac{E(U_0U_1)-E(U_1)E(U_0)}{\sqrt{\mathrm{Var}(U_1)\mathrm{Var}(U_0)}} = \frac{\exp\big(\rho\sqrt{a_{11}a_{00}}+b_0'\Sigma_{22}b_1\big)-1}{\sqrt{[\exp\{a_{11}+b_1'\Sigma_{22}b_1\}-1][\exp\{a_{00}+b_0'\Sigma_{22}b_0\}-1]}}.
\]
Let $\rho_L$ be $\rho_{10}$ with $\rho=-1$ and $\rho_U$ be $\rho_{10}$ with $\rho=1$. Then
\[
\rho_L = \frac{\exp\big(b_0'\Sigma_{22}b_1-\sqrt{a_{11}a_{00}}\big)-1}{\sqrt{[\exp\{a_{11}+b_1'\Sigma_{22}b_1\}-1][\exp\{a_{00}+b_0'\Sigma_{22}b_0\}-1]}}, \quad (B.14)
\]
\[
\rho_U = \frac{\exp\big(b_0'\Sigma_{22}b_1+\sqrt{a_{11}a_{00}}\big)-1}{\sqrt{[\exp\{a_{11}+b_1'\Sigma_{22}b_1\}-1][\exp\{a_{00}+b_0'\Sigma_{22}b_0\}-1]}}. \quad (B.15)
\]
We shall show that when $b_0'\Sigma_{22}b_1 > \sqrt{a_{11}a_{00}}$, we have $0<\rho_L\le\rho_U$, implying that $\rho_{10}$ is positive; and when $b_0'\Sigma_{22}b_1 < -\sqrt{a_{11}a_{00}}$, we have $\rho_L\le\rho_U<0$, implying that $\rho_{10}$ is negative. Thus, knowledge of $b_0'\Sigma_{22}b_1$ together with the expressions for $a_{11}$ and $a_{00}$ in (17) can help to identify the sign of $\rho_{10}$ under weaker conditions. In fact, a direct calculation gives
\[
b_0'\Sigma_{22}b_1 = b_{0V}b_{1V}+\sigma_X^2\,b_{0X}b_{1X}+\sigma_X\sigma_{XV}\big(b_{0V}b_{1X}+b_{0X}b_{1V}\big)
= \frac{\sigma_0\sigma_1\big(\rho_{0V}\rho_{1V}+\rho_{0X}\rho_{1X}-\sigma_{XV}(\rho_{0V}\rho_{1X}+\rho_{0X}\rho_{1V})\big)}{1-\sigma_{XV}^2}.
\]
The conditions for identifying the sign of $\rho_{10}$ from (B.14) and (B.15) are the same as those from (18) and (19). If $V$ is uncorrelated with $X$, i.e., $\sigma_{XV}=0$, then
\[
b_0'\Sigma_{22}b_1 = \sigma_0\sigma_1\big(\rho_{0V}\rho_{1V}+\rho_{0X}\rho_{1X}\big), \qquad
a_{11}a_{00} = \sigma_1^2\sigma_0^2\big(1-\rho_{1V}^2-\rho_{1X}^2\big)\big(1-\rho_{0V}^2-\rho_{0X}^2\big),
\]
and $(b_0'\Sigma_{22}b_1)^2 > a_{11}a_{00}$ is equivalent to (26). Again, when (26) holds, we have $0<\rho_L$ if $\rho_{1V}\rho_{0V}+\rho_{0X}\rho_{1X}>0$, and $\rho_U<0$ if $\rho_{1V}\rho_{0V}+\rho_{0X}\rho_{1X}<0$.

Appendix C: Technical Proofs for Section 6

Proof of Theorem 6.1: Under condition (A3), based on equation (5) in Cambanis, Simons, and Stout (1976), we have
\[
2E_o[\mu(Y_1,Y_0)|X^*] = E[\mu(Y_1,Y_1)|X^*] + E[\mu(Y_0,Y_0)|X^*] - \iint A\,d\mu_c(y_1,y_0), \quad (C.1)
\]
where
\[
A = F_{1o}(y_1\wedge y_0|X^*) + F_{0o}(y_1\wedge y_0|X^*) - C_o\big(F_{1o}(y_1\vee y_0|X^*), F_{0o}(y_1\wedge y_0|X^*)|X^*\big) - C_o\big(F_{1o}(y_1\wedge y_0|X^*), F_{0o}(y_1\vee y_0|X^*)|X^*\big).
\]
It follows from $C_L\le C_o\le C_U$ that $A_U\le A\le A_L$ for all $(y_1,y_0)$, where $A_U$ and $A_L$ are defined by $A$ replacing $C_o$ with $C_U$ and $C_L$, respectively. Taking expectations of (C.1) with respect to $X^*$, it follows from $A_U\le A\le A_L$ that
\[
2\theta_{L,R} \le 2\theta_o = E[\mu(Y_1,Y_1)] + E[\mu(Y_0,Y_0)] - E\Big[\iint A\,d\mu_c(y_1,y_0)\Big] \le 2\theta_{U,R},
\]
where, with $j = U, L$,
\[
2\theta_{j,R} = E[\mu(Y_1,Y_1)] + E[\mu(Y_0,Y_0)] - E\Big[\iint A_j\,d\mu_c(y_1,y_0)\Big].
\]
This shows $\Theta_{IC,R}\subseteq[\theta_{L,R},\theta_{U,R}]$ under condition (A3). Note that $C_U(F_{1o}(y_1|x^*),F_{0o}(y_0|x^*)|x^*)\le F_{0o}(y_0|x^*)$ and $C_U(F_{1o}(y_1|x^*),F_{0o}(y_0|x^*)|x^*)\le F_{1o}(y_1|x^*)$ for all $(y_1,y_0)$ and $x^*$, implying $A_U\ge 0$ for all $(y_1,y_0,x^*)$ and $E\big[\iint A_U\,d\mu_c(y_1,y_0)\big]\ge 0$. Thus $\theta_{U,R}<+\infty$ when $E[\mu(Y_1,Y_1)]$ and $E[\mu(Y_0,Y_0)]$ are finite. Similarly, under condition (B3), we can also show $\Theta_{IC,R}\subseteq[\theta_{L,R},\theta_{U,R}]$ based on the following results:
\[
E_o[\mu(Y_1,Y_0)|X^*] = E[\mu(Y_1,\underline y_0(X^*))|X^*] + E[\mu(\underline y_1(X^*),Y_0)|X^*] - \mu\big(\underline y_1(X^*),\underline y_0(X^*)\big) + \iint B\,d\mu_c(y_1,y_0)
\]
(which is from Eq. (9) in Cambanis, Simons, and Stout (1976)), where for all $(y_1,y_0)$,
\[
B = C_o\big(F_{1o}(y_1|X^*),F_{0o}(y_0|X^*)|X^*\big) - 1\big(\underline y_1(X^*)<y_1\big)F_{0o}(y_0|X^*) - F_{1o}(y_1|X^*)\,1\big(\underline y_0(X^*)<y_0\big) + 1\big(\underline y_1(X^*)<y_1\big)1\big(\underline y_0(X^*)<y_0\big),
\]
and $B_L\le B\le B_U$ for all $(y_1,y_0)$ (which is due to $C_L\le C_o\le C_U$), where $B_L$ and $B_U$ are defined by $B$ replacing $C_o$ with $C_L$ and $C_U$, respectively.

Now we show $[\theta_{L,R},\theta_{U,R}]\subseteq\Theta_{IC,R}$. Similar to the proof of Theorem 4.1 (i), without loss of generality suppose, for any given $\theta\in[\theta_{L,R},$