Likelihood Ratio, Score, and Wald Statistics in Models with Monotone Functions: Some Comparisons

by Moulinath Banerjee and Jon Wellner

TECHNICAL REPORT No. 409 April 25, 2002

Department of Statistics Box 354322

University of Washington Seattle, Washington, 98195 USA

Likelihood Ratio, Score, and Wald Statistics in Models with Monotone Functions: Some Comparisons

Moulinath Banerjee¹ and Jon A. Wellner²

University of Michigan and University of Washington

April 25, 2002

Abstract

BANERJEE AND WELLNER (2001) introduced and studied the likelihood ratio statistic for testing the hypothesis that a monotone function takes on a fixed value at a fixed point in the context of estimating the distribution function of the survival time in the interval censoring model. In this paper we continue to use the interval censoring model as a simple "test problem". We introduce three natural "score statistics" for the same testing problem studied in BANERJEE AND WELLNER (2001), which have natural interpretations in terms of certain (weighted) $L_2$ distances. We compare these new test statistics with an analogue of the classical Wald statistic and the likelihood ratio statistic introduced in BANERJEE AND WELLNER (2001). We first establish limiting distribution theory of the statistics under the null hypothesis, and discuss calculation of the relevant critical points for the test statistics. We then establish the limiting behavior of all five statistics under both fixed and local alternatives. Although the asymptotic theory does allow some qualitative conclusions, it unfortunately does not (yet) lead to explicit quantitative comparison of the power behavior of the five different tests. We therefore also compare the power of the five different statistics via a limited Monte-Carlo study. Our preliminary conclusion is that one of the three score tests seems to slightly dominate all the other test statistics, including the likelihood ratio and the Wald statistics.

National Science Foundation grant DMS-D07l8I8 National Science Foundation N1A1D

2R01 A1291968-

1 Introduction

The problem of estimating a monotone function arises frequently in statistics. In what follows we consider a class of monotone function models of a non-regular nature, which can be generically described in the following way: Let $X_1, X_2, \ldots, X_n$ be i.i.d. observations from $f(x, \psi, \xi)$. Here $f$ is a density with respect to some appropriate dominating measure, $\psi$ or some transformation of it is a monotone function of interest (say, a distribution function, a cumulative hazard function, a monotone density or a monotone instantaneous hazard) and $\xi$ is a nuisance parameter. The MLE of $\psi$ based on $X_1, X_2, \ldots, X_n$ is denoted by $\hat\psi_n$. The fundamental feature of this class of problems (non-regularity) that sets it apart from the whole spectrum of regular parametric and semiparametric problems is the slower rate of convergence ($n^{1/3}$) of the maximum likelihood estimates (henceforth MLEs) of the value of the monotone function at a fixed point (recall that the usual rate of convergence in regular parametric/semiparametric problems is $\sqrt{n}$). What happens in each case is the following:

$$n^{1/3}\,(\hat\psi_n(t) - \psi(t)) \to_d C(\psi, \xi, t)\,\mathbb{Z},$$

where the random variable $\mathbb{Z}$ is symmetric about 0 but non-Gaussian, and $C(\psi, \xi, t)$ is a constant depending upon the underlying parameters in the problem and the point of interest $t$. In fact, $\mathbb{Z} = \arg\min_h \{W(h) + h^2\}$, where $W(h)$ is standard two-sided Brownian motion on the line.

In this paper we compare the behavior of three different likelihood based statistics for testing the null hypothesis $\psi(t_0) = \theta_0$ in the above class of problems: these are (i) the likelihood ratio statistic; (ii) a "score statistic" that arises very naturally as the square of the distance between the constrained and unconstrained MLEs with respect to an appropriate metric; (iii) a version of the "Wald statistic".
We deal with the behavior of these statistics not only under the null hypothesis, but also under local (contiguous) alternatives as well as fixed alternatives. The results are obtained in the context of the interval censoring model, which has been dealt with extensively in BANERJEE (2000) and BANERJEE AND WELLNER (2001), but are expected to generalize to other monotone function models in the class. The behavior of the corresponding statistics under a null hypothesis where the monotone function is constrained at more than one (but finitely many) points is also discussed; this can be derived through natural extensions of the arguments in the "constrained at one point" situation. Finally we discuss general versions of these statistics in the class of monotone function models of interest and conjectures on their asymptotic behavior.

We now briefly review some of the background material from BANERJEE AND WELLNER (2001) since it is fundamental to the subsequent development. Given the characterization of the asymptotic distribution of the MLE in this non-regular class of problems, a natural question of interest concerns likelihood ratio tests of null hypotheses where the monotone function is constrained at finitely many points. We consider initially the one-point testing problem. Thus, consider testing $H_0: \psi(t_0) = \theta_0$ against its complement. Let the MLE of $\psi$ under the constraint imposed by the null hypothesis be $\hat\psi_n^0$. The likelihood ratio statistic is defined in the usual manner as twice the log-likelihood ratio. BANERJEE (2000) and BANERJEE AND WELLNER (2001) studied this statistic in the interval censoring problem.

Interval Censoring Model: Let $(X_1, T_1), (X_2, T_2), \ldots, (X_n, T_n)$ be $n$ i.i.d. pairs of random variables. For each $i$, $X_i \sim F$, $T_i \sim G$, and $X_i$ is independent of $T_i$. Here $F$ and $G$ are continuous distributions concentrated on the positive half-line. In the interval censoring model, we do not get to see the actual failure times, the $X_i$'s. All that we observe for the $i$'th individual is the vector $(\Delta_i, T_i)$, where $\Delta_i = 1\{X_i \le T_i\}$ is the indicator of a failure. We are interested in estimating $F$ based on the interval censored data and, more specifically, in the context of likelihood ratio inference, in estimating $F(t_0)$, the value of $F$ at $t_0$. We make the following blanket assumption.

Assumption A: Both $F$ and $G$ are continuously differentiable in a neighborhood of $t_0$ with Lebesgue densities $f$ and $g$ respectively; also $0 < f(t_0), g(t_0)$ and $0 < F(t_0) < 1$.

We denote the distribution of $(\Delta, T)$ under $(F, G)$ by $P_{F,G}$. The log-likelihood $\log L_n = \log L_n(F)$ based on $n$ i.i.d. observations $(\Delta_1, T_1), (\Delta_2, T_2), \ldots, (\Delta_n, T_n)$ is then given by

$$\log L_n(F) = n\,\mathbb{P}_n\big(\Delta \log F(T) + (1 - \Delta)\log(1 - F(T))\big),$$

where $\mathbb{P}_n$ is the empirical measure of the observations $\{(\Delta_i, T_i)\}_{i=1}^n$. The likelihood ratio statistic for testing $F(t_0) = \theta_0$ is given by

$$2\log\lambda_n = 2n\,\mathbb{P}_n\left\{\Delta \log\frac{\mathbb{F}_n(T)}{\mathbb{F}_n^0(T)} + (1 - \Delta)\log\frac{1 - \mathbb{F}_n(T)}{1 - \mathbb{F}_n^0(T)}\right\},$$

where $\mathbb{F}_n$ is the unconstrained MLE and $\mathbb{F}_n^0$ is the constrained MLE under the null hypothesis (based on the interval censored data). The behavior of the likelihood ratio statistic under the null hypothesis was derived in BANERJEE AND WELLNER (2001) and will be discussed in a later section.
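As a rough illustration of how the two MLEs and the likelihood ratio statistic might be computed from data $(\Delta_i, T_i)$, here is a minimal Python sketch. It rests on assumptions not spelled out in this section: the unconstrained MLE at the ordered observation times is taken to be the isotonic regression of the $\Delta$'s, and the constrained MLE is taken to be the one-sided isotonic fits truncated at $\theta_0$ (from above to the left of $t_0$, from below to the right), which is our reading of the characterization in BANERJEE AND WELLNER (2001). The function names `pava`, `log_lik` and `lr_statistic` are hypothetical.

```python
import numpy as np

def pava(y):
    """Non-decreasing least-squares fit (pool-adjacent-violators)."""
    blocks = []  # each block: [mean, count]
    for v in y:
        blocks.append([float(v), 1])
        # Pool the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    if not blocks:
        return np.array([])
    return np.concatenate([np.full(c, m) for m, c in blocks])

def xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    safe = np.where(x == 0, 1.0, y)
    return np.where(x == 0, 0.0, x * np.log(safe))

def log_lik(delta, F):
    """Log-likelihood sum over observations for values F at the ordered times."""
    return float(np.sum(xlogy(delta, F) + xlogy(1 - delta, 1 - F)))

def lr_statistic(delta, T, t0, theta0):
    """Return (2 log lambda_n, unconstrained MLE, constrained MLE)."""
    order = np.argsort(T)
    d = np.asarray(delta, float)[order]
    t = np.asarray(T, float)[order]
    F_hat = pava(d)                      # unconstrained MLE at ordered times
    m = int(np.sum(t <= t0))             # number of times to the left of t0
    # Assumed form of the constrained MLE: truncate one-sided fits at theta0.
    F0 = np.concatenate([np.minimum(pava(d[:m]), theta0),
                         np.maximum(pava(d[m:]), theta0)])
    return 2.0 * (log_lik(d, F_hat) - log_lik(d, F0)), F_hat, F0
```

A typical call would be `lr, F_hat, F0 = lr_statistic(delta, T, t0=0.5, theta0=0.5)`; the returned statistic can then be compared with a quantile of the limit distribution discussed in Section 3.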

2 The Likelihood Ratio, Score and Wald Statistics

In what follows, the null hypothesis is $H_0: F(t_0) = \theta_0$ where $0 < t_0 < \infty$ and $0 < \theta_0 < 1$. We wish to test this against the alternative hypothesis $H_1: F(t_0) \neq \theta_0$. The likelihood ratio statistic is $2\log\lambda_n$, where $\lambda_n$ is the ratio of the likelihoods, and is given by

$$2\log\lambda_n = 2n\,\mathbb{P}_n\left\{\Delta \log\frac{\mathbb{F}_n(T)}{\mathbb{F}_n^0(T)} + (1 - \Delta)\log\frac{1 - \mathbb{F}_n(T)}{1 - \mathbb{F}_n^0(T)}\right\}.$$

If the null hypothesis indeed holds, then one expects the constrained and the unconstrained estimators of the distribution function $F$ to be "close" to each other in some sense. Note that the likelihood ratio measures how close the unconstrained and constrained likelihoods are. One natural and very intuitive way of measuring the distance between the unconstrained MLE $\mathbb{F}_n$ and the constrained MLE $\mathbb{F}_n^0$ is through an $L_2$ distance with respect to the empirical measure of the observation times. We will use scaled/weighted versions of this distance to form our test statistics for $H_0$ against $H_1$. These are of the following types:

(i)

(ii)

(iii)

We will show that these statistics are all asymptotically equivalent under both the null hypothesis and under a sequence of contiguous alternatives; under the null hypothesis, these statistics have the same limiting distribution, which is independent of the underlying parameters in the problem. While each of these statistics can be regarded as an $L_2$ statistic, we will subsequently refer to them as versions of the "score statistic"; the reason why we do so is explained below.

The $L_2$ statistic as a score statistic: The log-likelihood, based on $n$ i.i.d. observations in the interval censoring model, is given by

$$\log L_n(F) = n\,\mathbb{P}_n\big(\Delta \log F(T) + (1 - \Delta)\log(1 - F(T))\big),$$

where $F$ is the survival time distribution. Now consider the following perturbation of $F$ in the direction of some other distribution function $H$ on $[0, \infty)$:

$$F_{\epsilon,H} = (1 - \epsilon)\,F + \epsilon\,H,$$

where $\epsilon \ge 0$. This gives a one-dimensional parametric submodel of the original non-parametric model, passing through $F$, and the (one-sided) score at $F$ is then given by

$$\frac{\partial}{\partial\epsilon}\log L_n(F_{\epsilon,H})\Big|_{\epsilon = 0} = n\,\mathbb{P}_n\left(\frac{\Delta - F(T)}{F(T)\,(1 - F(T))}\,(H - F)(T)\right).$$
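The score formula can be checked numerically by comparing it with a forward difference of the log-likelihood along the path $F_{\epsilon,H}$. The sketch below does this for simulated data; the choices $F(t) = t$, $H(t) = t^2$, the range of the observation times and the step size `eps` are illustrative assumptions only, and the function names are hypothetical.

```python
import numpy as np

def log_lik(delta, F_vals):
    """Sum over observations of Delta*log F(T) + (1-Delta)*log(1-F(T))."""
    return float(np.sum(delta * np.log(F_vals)
                        + (1 - delta) * np.log(1 - F_vals)))

def score(delta, F_vals, H_vals):
    """One-sided score at F in the direction H, i.e. n * P_n of the summand."""
    return float(np.sum((delta - F_vals) / (F_vals * (1 - F_vals))
                        * (H_vals - F_vals)))

rng = np.random.default_rng(1)
n = 200
T = rng.uniform(0.1, 0.9, size=n)            # observation times away from 0 and 1
delta = (rng.uniform(size=n) <= T).astype(float)
F_vals = T                                   # assumed F: U(0,1) cdf evaluated at T
H_vals = T ** 2                              # assumed direction H(t) = t^2

eps = 1e-6
path = (1 - eps) * F_vals + eps * H_vals     # F_{eps,H} evaluated at the T_i
finite_diff = (log_lik(delta, path) - log_lik(delta, F_vals)) / eps
analytic = score(delta, F_vals, H_vals)
```

With a small step, `finite_diff` and `analytic` should agree up to a discretization error of order `eps` times the curvature of the log-likelihood along the path.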

Our score statistics in the interval censoring problem arise from suitable choices of $H$ and $F$. We obtain $S_{n,1}$, our first score statistic, by setting $F = \mathbb{F}_n$ and $H = \mathbb{F}_n^0$; and $S_{n,2}$, our second score statistic, by interchanging the roles of $H$ and $F$. Thus, with $S_{n,1}$, we perturb the unconstrained MLE in the direction of the constrained MLE, and for $S_{n,2}$ we do the reverse. Since $\mathbb{F}_n$ maximizes the likelihood, it is easy to see that $S_{n,1} \le 0$.

Proposition 2.1 Let $S_{n,1}$ and $S_{n,2}$ be defined as above and let the survival time distribution $F$ belong to $H_0$. Then

$$T_{n,2} = S_{n,1} + S_{n,2} + o_p(1).$$

Thus $T_{n,2}$, which is one of the versions of the $L_2$ statistic, is asymptotically equivalent, under the null, to the sum of two one-sided score statistics. In this sense, we shall henceforth refer to $T_{n,2}$ as a "score statistic". Since $T_{n,1}$ and $T_{n,3}$ are versions of the $L_2$ statistic which are asymptotically equivalent to $T_{n,2}$, we shall also regard them as "score statistics".

Proof of Proposition 2.1. First, note that $T_{n,2}$ can be written as

Now, we have

where $r_n \equiv r_n(\Delta, T)$ is the random function given by

$$r_n(\Delta, T) = \frac{(\Delta - \mathbb{F}_n^0(T))\,(\mathbb{F}_n - \mathbb{F}_n^0)^2(T)\,(1 - \mathbb{F}_n(T) - \mathbb{F}_n^0(T))}{\mathbb{F}_n(T)\,(1 - \mathbb{F}_n(T))\,\mathbb{F}_n^0(T)\,(1 - \mathbb{F}_n^0(T))}.$$

It can be shown that $n\,\mathbb{P}_n r_n$ is $o_p(1)$ by writing it as $n\,(\mathbb{P}_n - P)\,r_n + n\,P r_n$ and showing that each of the two terms is $o_p(1)$. We omit a detailed argument but sketch the main steps. It can be shown that the normalized function $n^{2/3} r_n$ eventually lies in a uniformly bounded Donsker class of functions with arbitrarily high probability, whence it follows that $\sqrt{n}\,(\mathbb{P}_n - P)(n^{2/3} r_n)$ is $O_p(1)$; consequently, $n\,(\mathbb{P}_n - P)\,r_n = n^{1/3}\,(\mathbb{P}_n - P)(n^{2/3} r_n)$ is $O_p(n^{-1/6})$.

In what follows $D_n$ denotes the set where $\mathbb{F}_n$ and $\mathbb{F}_n^0$ differ, $\tilde D_n = n^{1/3}(D_n - t_0)$ and $t_n(z) = t_0 + z\,n^{-1/3}$, $X_n(z) = n^{1/3}(\mathbb{F}_n(t_n(z)) - F(t_0))$ and $Y_n(z) = n^{1/3}(\mathbb{F}_n^0(t_n(z)) - F(t_0))$. Now $n\,P r_n$ equals

$$n\int_{D_n} \frac{(F(t) - \mathbb{F}_n^0(t))\,(\mathbb{F}_n - \mathbb{F}_n^0)^2(t)\,(1 - \mathbb{F}_n(t) - \mathbb{F}_n^0(t))}{\mathbb{F}_n\,\mathbb{F}_n^0\,(1 - \mathbb{F}_n)\,(1 - \mathbb{F}_n^0)(t)}\; g(t)\,dt$$

$$= n^{-1/3}\int_{\tilde D_n} \frac{\big(n^{1/3}(F(t_n(z)) - F(t_0)) - Y_n(z)\big)\,(X_n(z) - Y_n(z))^2}{\mathbb{F}_n\,\mathbb{F}_n^0\,(1 - \mathbb{F}_n)\,(1 - \mathbb{F}_n^0)(t_n(z))}\,\big(1 - \mathbb{F}_n(t_n(z)) - \mathbb{F}_n^0(t_n(z))\big)\, g(t_n(z))\,dz$$

$$= n^{-1/3}\int_{\tilde D_n} \frac{\big(z f(t_0) + o(1) - Y_n(z)\big)\,(X_n(z) - Y_n(z))^2}{\mathbb{F}_n\,\mathbb{F}_n^0\,(1 - \mathbb{F}_n)\,(1 - \mathbb{F}_n^0)(t_n(z))}\,\big(1 - \mathbb{F}_n(t_n(z)) - \mathbb{F}_n^0(t_n(z))\big)\, g(t_n(z))\,dz,$$

where the $o(1)$ term goes to 0 uniformly over $z$ in any compact interval of the form $[-K, K]$. Using the fact that $\tilde D_n$ is eventually contained in a compact set with arbitrarily high probability, that $X_n(z)$ and $Y_n(z)$ are uniformly bounded on compact sets with arbitrarily high probability, and that $\mathbb{F}_n(t_n(z))$ and $\mathbb{F}_n^0(t_n(z))$ are eventually bounded away from 0 and 1 with arbitrarily high probability for $z$ in a compact set, it is shown easily that the integral in the last expression of the above display is $O_p(1)$, whence $n^{-1/3}$ times the integral is $O_p(n^{-1/3})$. □

Later we will provide a theorem that describes the behavior of the one-sided score statistics under both the null hypothesis and contiguous alternatives and their relation to the likelihood ratio statistic in the interval censoring model. We now describe the Wald statistic.

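The Wald statistic: It is known that

$$n^{1/3}\,(\mathbb{F}_n(t_0) - F(t_0)) \to_d \left(\frac{F(t_0)\,(1 - F(t_0))\,f(t_0)}{2\,g(t_0)}\right)^{1/3} 2\mathbb{Z};$$

see e.g. GROENEBOOM AND WELLNER (1992). It is also well known that $2\mathbb{Z} =_d g_{1,1}(0)$. Thus a natural analogue $W_n$ of the classical Wald statistic is defined by

$$W_n = n^{2/3}\,(\mathbb{F}_n(t_0) - \theta_0)^2 \left(\frac{\theta_0\,(1 - \theta_0)\,\hat f_n(t_0)}{2\,\hat g_n(t_0)}\right)^{-2/3},$$

where $\hat f_n(t_0)$ and $\hat g_n(t_0)$ are consistent estimators of $f(t_0)$ and $g(t_0)$ respectively. Thus the Wald statistic is simply a scaled version of the squared distance between the MLE of $F$ at the point $t_0$ and the true value $\theta_0$. The scaling ensures that the limit distribution of the Wald statistic is free of the underlying parameters. Note that, as opposed to the likelihood ratio and the score statistics, the Wald statistic entails estimation of the densities $f$ and $g$.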
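For concreteness, a Wald-type statistic of this kind could be computed as in the short Python sketch below. The scaling is chosen so that the statistic has limit $g_{1,1}(0)^2 =_d 4\mathbb{Z}^2$ under the convergence result quoted above; using $\theta_0(1 - \theta_0)$ in place of $F(t_0)(1 - F(t_0))$ is our assumption, and the function name and argument names are hypothetical. The density estimates `f_hat_t0` and `g_hat_t0` must be supplied by the user (for instance from kernel estimates).

```python
def wald_statistic(F_hat_t0, theta0, f_hat_t0, g_hat_t0, n):
    """Scaled squared distance between the MLE at t0 and the null value theta0.

    Assumed scaling: C_hat = (theta0*(1-theta0)*f_hat/(2*g_hat))**(1/3), so that
    n**(1/3) * (F_hat_t0 - theta0) / C_hat is approximately distributed as 2Z.
    """
    c_hat = (theta0 * (1.0 - theta0) * f_hat_t0 / (2.0 * g_hat_t0)) ** (1.0 / 3.0)
    return n ** (2.0 / 3.0) * (F_hat_t0 - theta0) ** 2 / c_hat ** 2

# Illustrative call with made-up inputs (not taken from the paper):
w = wald_statistic(F_hat_t0=0.55, theta0=0.5, f_hat_t0=1.0, g_hat_t0=1.0, n=1000)
```

The resulting value would be compared with a quantile of $4\mathbb{Z}^2$, for example the .95 quantile listed in Table 1.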

3 Limit Theory Under the Null Hypothesis

BANERJEE AND WELLNER (2001) studied the behavior of the likelihood ratio statistic in the interval censoring problem for testing the null hypothesis $F(t_0) = \theta_0$. They showed that, under the same regularity conditions as needed to derive the asymptotic distribution of the MLE, a limit distribution does exist and is free of the underlying parameters in the problem.

We now introduce some standard notation that will be used throughout. For constants $a, b > 0$, we define the process $X_{a,b}(t) \equiv a\,W(t) + b\,t^2$, where $W(t)$ is standard two-sided Brownian motion starting from 0 and $t$ varies over the line. Let $G_{a,b}$ denote the greatest convex minorant (henceforth GCM) of the process $a\,W(t) + b\,t^2$. The GCM is well-characterized: it is a piecewise linear function that touches the Brownian motion path at the points where it changes slope; furthermore, the number of changes of slope in any compact interval is finite. The right derivative process of $G_{a,b}$ is denoted by $g_{a,b}$. On the other hand, $g^0_{a,b}$ is the process of right derivatives of $G^0_{a,b}$, the process of the constrained one-sided GCMs of $X_{a,b}(t)$. In other words, $G^0_{a,b}$ is the process which for $t \ge 0$ is the GCM of $X_{a,b}(t)$ for $t \ge 0$, but subject to the constraint that its slopes do not fall below 0, and for $t < 0$ is the GCM of $X_{a,b}(t)$ for $t < 0$, but subject to the constraint that its slopes stay less than or equal to 0. See GROENEBOOM (1983), GROENEBOOM (1989), BANERJEE (2000), WELLNER (2001) for more detailed descriptions of these processes.

Theorem 3.1 Under the null hypothesis $F(t_0) = \theta_0$ and under assumption A,

$$2\log\lambda_n \to_d \int \{(g_{1,1}(z))^2 - (g^0_{1,1}(z))^2\}\,dz \equiv \mathbb{D},$$

$$W_n \to_d g_{1,1}(0)^2 =_d 4\mathbb{Z}^2,$$

$$T_{n,2} \to_d \int \{g_{1,1}(z) - g^0_{1,1}(z)\}^2\,dz \equiv \mathbb{T}.$$

Furthermore $T_{n,1}$ and $T_{n,3}$ are asymptotically equivalent to $T_{n,2}$ under $H_0$.

The next theorem characterizes the behavior of the one-sided score statistics $S_{n,1}$ and $S_{n,2}$ under the null hypothesis and relates these to the statistics of interest.

Theorem 3.2 Under the null hypothesis $F(t_0) = \theta_0$

Sn.1

-1'd

-

e fl(to)e) f. o

0

f g~,l(Z)

ibn

(gl,l

eo

and under assumption A

Yn(Z) (Xn(Z) - Yn(Z) dz + opel)

-g~,l(Z))

dz,

while dz+

==

+z

and

==

+z

J

where g(to) 80 (1

(Jo)

dz.

The proofs of Theorems 3.1 and 3.2 are given in Section 9. We now briefly discuss the limiting distributions of the three statistics under the null hypothesis. The distributions of $\mathbb{D}$ and $\mathbb{T}$ are not yet known. A rigorous analytical characterization will undoubtedly involve use of the ideas in GROENEBOOM (1983) and GROENEBOOM (1989). Approximations to the distributions of $\mathbb{D}$ and $\mathbb{T}$ can be generated by constructing discrete approximations to Brownian motion on a fine grid, over a (sufficiently large) compact set. For a discussion of this method, see BANERJEE (2000) or BANERJEE AND WELLNER (2001). Estimates of selected quantiles of the distribution of $\mathbb{D}$ based on a sample of size $3 \times 10^4$, along with the associated variability, are provided in BANERJEE AND WELLNER (2001).

The distribution of $g_{1,1}(0)^2 =_d 4\mathbb{Z}^2$ has essentially been analytically characterized: GROENEBOOM (1985) and GROENEBOOM (1989) characterized the distribution of $\mathbb{Z}$, and recently the distribution function, the quantiles and various functionals of $\mathbb{Z}$ have been computed by GROENEBOOM AND WELLNER (2001). The density of $\mathbb{Z}$ can be written as

$$h(z) = \frac{1}{2}\,g(z)\,g(-z), \qquad (3.1)$$

where the function $g(z)$ is explicitly defined in GROENEBOOM (1985) and GROENEBOOM (1989). Note that the density of $\mathbb{Z}$ is symmetric around 0. Since $g_{1,1}(0)^2 =_d 4\mathbb{Z}^2$, the density of $g_{1,1}(0)^2$ is easily written as

$$f(u) = \frac{1}{2\sqrt{u}}\,h\!\left(\frac{\sqrt{u}}{2}\right) = \frac{1}{4\sqrt{u}}\,g\!\left(\frac{\sqrt{u}}{2}\right) g\!\left(-\frac{\sqrt{u}}{2}\right), \qquad u > 0.$$
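The grid-based approximation to the limit variables mentioned above can be sketched in Python as follows. This is only an illustration under stated assumptions: the window half-width `K`, the number of grid cells `m` per side, and all function names are our own choices; slopes of the GCM are obtained as a weighted isotonic regression of the increment slopes, and the constrained slopes are taken to be the one-sided GCM slopes truncated at zero (below on the right, above on the left), which is our reading of the definition of $g^0_{1,1}$.

```python
import numpy as np

def pava(y, w):
    """Weighted non-decreasing least-squares fit (pool-adjacent-violators)."""
    blocks = []  # each block: [mean, weight, count]
    for v, wt in zip(y, w):
        blocks.append([float(v), float(wt), 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    return np.concatenate([np.full(c, m) for m, _, c in blocks])

def gcm_slopes(t, x):
    """Slopes, on each grid cell, of the GCM of the points (t_i, x_i)."""
    dt = np.diff(t)
    return pava(np.diff(x) / dt, dt)

def simulate_limits(K=4.0, m=2000, rng=None):
    """One draw of grid approximations to (D, T, g_{1,1}(0)^2) on [-K, K]."""
    rng = np.random.default_rng() if rng is None else rng
    dt = K / m
    t = np.linspace(-K, K, 2 * m + 1)           # t[m] is the origin
    right = np.cumsum(rng.normal(0.0, np.sqrt(dt), m))
    left = np.cumsum(rng.normal(0.0, np.sqrt(dt), m))
    W = np.concatenate([left[::-1], [0.0], right])   # two-sided BM, W(0) = 0
    X = W + t ** 2
    g = gcm_slopes(t, X)                         # unconstrained slopes
    g_left = np.minimum(gcm_slopes(t[: m + 1], X[: m + 1]), 0.0)
    g_right = np.maximum(gcm_slopes(t[m:], X[m:]), 0.0)
    g0 = np.concatenate([g_left, g_right])       # constrained slopes (assumed form)
    D = float(np.sum(g ** 2 - g0 ** 2) * dt)
    T = float(np.sum((g - g0) ** 2) * dt)
    W0 = float(g[m] ** 2)                        # squared slope just right of 0
    return D, T, W0
```

Repeating `simulate_limits` many times and taking empirical quantiles of the three outputs gives Monte-Carlo approximations of the kind reported in Table 1; the accuracy depends on `K` and `m`, which would need to be tuned.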

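An easy adaptation of the Mathematica code in Section 6 of GROENEBOOM AND WELLNER (2001) yields the density, distribution function, quantiles and moments of $4\mathbb{Z}^2$. Table 1 gives quantiles of $4\mathbb{Z}^2 =_d g_{1,1}(0)^2$ computed via this method, along with the estimated quantiles of $\mathbb{D}$ and $\mathbb{T}$ obtained via discrete approximations to Brownian motion and Monte-Carlo. Figure 1 gives plots of the corresponding distribution functions. Even though the distributions of $\mathbb{D}$ and $\mathbb{T}$ are not analytically characterized, we do have a stochastic ordering result, namely that $\mathbb{D}$ is stochastically larger than $\mathbb{T}$.

Proposition 3.1

$$\mathbb{T} = \int (g_{1,1}(z) - g^0_{1,1}(z))^2\,dz \le \int \big((g_{1,1}(z))^2 - (g^0_{1,1}(z))^2\big)\,dz = \mathbb{D}.$$

Proof. To see this, note that

$$(g_{1,1}(z))^2 - (g^0_{1,1}(z))^2 - (g_{1,1}(z) - g^0_{1,1}(z))^2 = 2\,g^0_{1,1}(z)\,(g_{1,1}(z) - g^0_{1,1}(z)). \qquad (3.2)$$

But for any $z > 0$ in the difference set (the set on which $g_{1,1}$ and $g^0_{1,1}$ differ), either $g^0_{1,1}(z) = 0$, or $g^0_{1,1}(z)$ differs from 0 and coincides with the slope of the unconstrained one-sided convex minorant of the Brownian motion path to the right of 0. But this implies that $g_{1,1}(z) \ge g^0_{1,1}(z) > 0$ (at any point $z > 0$ the slope of the two-sided convex minorant is larger than the slope of the unconstrained one-sided convex minorant to the right of 0). Consequently, $g^0_{1,1}(z)\,(g_{1,1}(z) - g^0_{1,1}(z)) \ge 0$, and the inequality follows. A similar argument holds for $z < 0$ and in the difference set. Consequently, from (3.2), we have

$$(g_{1,1}(z))^2 - (g^0_{1,1}(z))^2 \ge (g_{1,1}(z) - g^0_{1,1}(z))^2$$

for any $z$ in the difference set, implying that

$$\int (g_{1,1}(z) - g^0_{1,1}(z))^2\,dz \le \int \big((g_{1,1}(z))^2 - (g^0_{1,1}(z))^2\big)\,dz,$$

which is all that is required to show. Note that the two statistics are not identical, since with positive probability their pointwise difference, namely

$$2\,g^0_{1,1}(z)\,(g_{1,1}(z) - g^0_{1,1}(z)),$$

is strictly positive over a set of non-zero Lebesgue measure. □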

Figure 1 shows the empirical distributions of $\mathbb{D}$, $\mathbb{T}$ and $4\mathbb{Z}^2$ (based on the finite-grid approximations) and the distribution function of $\chi_1^2$; the picture clearly suggests a stochastic ordering among three of these four random variables. We conjecture

where $<$ is in the sense of stochastic ordering. Table 1 lists the (empirical) quantiles of $\mathbb{D}$ and $\mathbb{T}$ and the quantiles of $4\mathbb{Z}^2$ and $\chi_1^2$. Although the distribution function of $\chi_1^2$ is above that of $4\mathbb{Z}^2$ for most of the range, it follows from the work of GROENEBOOM (1989) that the tails of the density of $\mathbb{Z}$ decay as $\exp(-c z^3)$, and hence the tail of the distribution of $4\mathbb{Z}^2$ decays as $\exp(-c' z^{3/2})$ for some constant $c' > 0$. But the tail of $\chi_1^2$ decays as $\exp(-z/2)$, and hence the tail of $4\mathbb{Z}^2$ should cross the tail of $\chi_1^2$. In fact this crossing occurs a little before the .99 quantile, as can be seen from Table 1.

Table 1: Estimated quantiles of the limit distributions under the null.

p      $F_{\mathbb{T}}^{-1}(p)$   $F_{\mathbb{D}}^{-1}(p)$   $F_{\chi_1^2}^{-1}(p)$   $F_{4\mathbb{Z}^2}^{-1}(p)$
.05    0.002728    0.002419    0.003932    0.004353
.10    0.009590    0.009847    0.015791    0.017475
.15    0.020313    0.022302    0.035766    0.039566
.20    0.034385    0.040279    0.064185    0.070965
.25    0.053023    0.063874    0.101531    0.112176
.30    0.076861    0.092615    0.148472    0.163893
.35    0.104505    0.128481    0.205900    0.227038
.40    0.136222    0.173011    0.274996    0.302833
.45    0.169403    0.224970    0.357317    0.392880
.50    0.214541    0.284706    0.454936    0.499308
.55    0.268220    0.351822    0.570652    0.624969
.60    0.335973    0.432697    0.708326    0.773794
.65    0.398911    0.527391    0.873457    0.951336
.70    0.499376    0.650435    1.074194    1.165780
.75    0.607728    0.802636    1.323304    1.429840
.80    0.734048    0.986587    1.642374    1.764830
.85    0.893789    1.230060    2.072251    2.210680
.90    1.128790    1.610734    2.705543    2.856650
.95    1.599501    2.286922    3.841459    3.985460
.99    2.758688    3.865057    6.6349      6.62198

Figure 1: The three different limit distributions under the null and $\chi_1^2$.

4 Limit Theory Under Local Alternatives

Our next theorem concerns the behavior of the three statistics under contiguous alternatives. We briefly recall the class of contiguous alternatives considered in BANERJEE AND WELLNER (2001). Suppose that $\{F_n\}$ is a sequence of continuous distribution functions satisfying the following conditions: B(1). For some $c > 0$, $F_n(t) = F(t)$ for all $t$ with $|t - t_0| \ge c\,n^{-1/3}$. B(2). The functions An(t) = satisfy

+ uniformly for z E and K vanish on

=

-+ B(z) == f(to)K(z)

(Thus Band K are continuous functions on el and both B It is shown in Theorem 2.6 of BANERJEE AND WELLNER

that let

{

~o


On the other hand, $g^0_{1,1,\phi}$, the slope of the constrained one-sided GCMs of $W(t) + t^2 + \phi(t)$, is the MLE of $f(t)$ under the (false) null hypothesis that $f(0) = 0$. Figure 8 shows the function $2t + \phi'(t)$ (where this function differs from $2t$, it is shown in blue) along with $g^0_{1,1,\phi}$ (where it differs from $g_{1,1,\phi}$) in magenta.

Simulation studies: We now study the power behavior of the likelihood ratio test, the three (asymptotically equivalent) score statistics and the Wald statistic, for different values of the constants $a$, $b$ and $c$, at different sample sizes $n$, and compare with the limiting power predicted by theory. The difficulty for Monte-Carlo experiments is that the inverses $F_n^{-1}(u)$ corresponding to the $B_n$'s defined in (6.1) do not have nice closed forms, so that generating from the distribution function $F_n$ can be complicated. we do know that Fn(to and

c)

Figure 3: Unconstrained, one-sided and constrained minorants of limiting process under null

Figure 5: Slopes of unconstrained and constrained minorants under null: close-up view

Figure 7: Unconstrained, one-sided and constrained minorants under contiguous alternatives

The distribution functions $G_n$ agree with $F$ and $F_n$ on the complement of the interval $[t_0 - c\,n^{-1/3},\, t_0 + c\,n^{-1/3}]$. On this interval the distribution functions $G_n$ are piecewise linear. Thus $G_n(t) = F(t)$ for $t \le t_0 - c\,n^{-1/3}$ or $t \ge t_0 + c\,n^{-1/3}$, while $G_n$ is linear on each of the intervals $t_0 - c\,n^{-1/3} \le t \le t_0$ and $t_0 \le t \le t_0 + c\,n^{-1/3}$.

Also, note that it is quite easy to generate observations from the sequence $G_n$, since there is an explicit expression for $G_n^{-1}$ (so long as we have an explicit expression for $F^{-1}$). We have the following proposition.

Proposition: $C_n(z) \equiv n^{1/3}\,(G_n(t_0 + n^{-1/3} z) - F(t_0 + n^{-1/3} z))$, $-c \le z \le c$, converges uniformly to $B(z)$ on $[-c, c]$, where $B(z)$ is as defined in (6.1). Consequently, the alternatives $\{G_n\}$ are contiguous.

The straightforward proof is omitted. We proceeded to generate the likelihood ratio statistic for contiguous alternatives of the above kind, at different settings of the underlying parameters and for different values of $n$. We also generated all three score statistics and the Wald statistic. Recall that by contiguity, the three different versions of the score statistic are all asymptotically equivalent. The actual values, $f(t_0)$ and $g(t_0)$, of the densities (which were known) were used to compute the Wald statistic; note that in a real life situation these would need to be estimated. Thus, the Wald statistic that we computed is in some sense an "ideal" version. The asymptotic distribution of the likelihood ratio statistic under the sequence of contiguous alternatives $F_n$ (and also $G_n$) is that of the random variable

$$\mathbb{D}_\phi = \int \big((g_{1,1,\phi}(z))^2 - (g^0_{1,1,\phi}(z))^2\big)\,dz,$$

the asymptotic distribution of each of the score statistics is that of

$$\mathbb{T}_\phi = \int \big(g_{1,1,\phi}(z) - g^0_{1,1,\phi}(z)\big)^2\,dz,$$

and the asymptotic distribution of the Wald statistic is that of $g_{1,1,\phi}(0)^2$. Recall that

¢(t) and

i.J!o(h) = 277

= (bja)4 / 3 i.J!o((bja)-2 / 3t) r

lc

h

for h E and is defined the left of -c. The number the null hYIlotJtlesJls. follows:

(i) $F \sim U(0, 1)$, $G \sim U(0, 1)$, $t_0 = 0.5$, so that $F(t_0) = 0.5$. The value of $b/a$ in this situation is 1. The values of $c$ chosen were 0, 0.5, 1 and 2. The value of $\eta$ was taken to be 0.9.

(ii) $F \sim U(0, 1/\sqrt{2})$, $G \sim U(0, 0.5)$, $t_0 = 1/(2\sqrt{2})$, so that $F(t_0) = 0.5$. The value of $b/a$ in this situation is 2. The values of $c$ chosen were 0, 0.5, 1 and 2. The value of $\eta$ was taken to be 0.9.

(iii) $F$

The sample sizes chosen were $n = 100, 300, 500, 800, 1000, 2000, 4000, 8000$, except for the situation $b/a = c = 2$, in which case 100 was changed to 200 to tackle some computational issues. For each value of $n$ and each setting of the parameters $(b/a, c)$ (there are twelve settings in all), a sample of size 2000 (for $c > 0$) or 5000 (for $c = 0$) was generated from the distributions of the likelihood ratio statistic, the three different versions of the score statistic and the Wald statistic. The power of each of these statistics was then computed at the level $\alpha = 0.05$ by computing the proportion of values that exceeded the $(1 - \alpha)$'th quantile of the corresponding limit distribution under the null. Finally, the asymptotic power of each statistic was obtained under each parameter setting with $c \neq 0$ ($c = 0$ corresponds to the null hypothesis case) by generating a sample of size 2000 from (discrete approximations to) the limit distribution under that parameter setting. Thus, these samples were obtained by generating discrete approximations to Brownian motion on a grid over a sufficiently large compact set, adding on the function $t^2 + \phi(t)$ to the Brownian motion path and subsequently differentiating the unconstrained and constrained minorants. The results from these simulations are presented below; these allow us to compare the power behavior of the three competing statistics under contiguous alternatives.
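The power computation described above amounts to an exceedance proportion. A minimal Python sketch follows; the function name is hypothetical, the Monte-Carlo standard error is an addition of ours for orientation, and the critical values are the .95 quantiles read from Table 1.

```python
import numpy as np

# .95 quantiles of the limit distributions under the null, as listed in Table 1.
CRITICAL_95 = {
    "likelihood_ratio": 2.286922,   # quantile of D
    "score": 1.599501,              # quantile of T
    "wald": 3.985460,               # quantile of 4 Z^2
}

def empirical_power(stats, critical_value):
    """Proportion of simulated statistics exceeding the critical value,
    together with its binomial Monte-Carlo standard error."""
    stats = np.asarray(stats, dtype=float)
    p = float(np.mean(stats > critical_value))
    se = float(np.sqrt(p * (1.0 - p) / stats.size))
    return p, se
```

For instance, with 2000 replications and power near 0.05 the standard error is roughly 0.005, which gives a sense of how much of the variation across neighbouring table entries may be simulation noise.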

Table 2: Power at level 0.05 with $b/a = 0.5$, $c = 0.0$.

n →         100     300     500     800     1000    2000    8000    ∞
2 log λ_n   0.0552  0.0514  0.0526  0.0554  0.0484  0.0534  0.0528  0.05
T_{n,1}     0.0518  0.0546  0.0558  0.0616  0.0526  0.0628  0.0544  0.05
T_{n,2}     0.0724  0.0640  0.0618  0.0638  0.0564  0.0648  0.0558  0.05
T_{n,3}     0.0528  0.0546  0.0560  0.0616  0.0530  0.0628  0.0544  0.05
W_n         0.0832  0.0642  0.0616  0.0618  0.0532  0.0...  0.0614  0.05

Table 3: Power at level 0.05 with $b/a = 0.5$, $c = 0.5$.

n →         500     800     1000    2000    4000    ∞
2 log λ_n   0.0490  0.0565  0.053   0.0535  0.0545  0.0575
T_{n,1}     0.0535  0.0545  0.056   0.0575  0.0530  0.061
T_{n,2}     0.0575  0.0595  0.059   0.0615  0.0540  0.061
T_{n,3}     0.0540  0.0545  0.057   0.0580  0.0530  0.061
W_n         0.0760  0.0610  0.060   0.0590  0.0565  0.0535

Table 4: Power at level 0.05 with $b/a = 0.5$, $c = 1.0$.

Table 5: Power at level 0.05 with $b/a = 0.5$, $c = 2.0$.

n →         800     1000    2000    4000    8000    ∞
2 log λ_n   0.4340  0.4255  0.4335  0.4230  0.4115  0.4045
T_{n,1}     0.4450  0.4430  0.4505  0.4315  0.4280  0.4120
T_{n,2}     0.4630  -       0.4595  0.4365  0.4315  0.4120
T_{n,3}     0.4450  0.4440  0.4515  0.4315  0.4285  0.4120
W_n         0.4135  0.4170  0.3925  0.3925  0.3910  0.3945

Table 6: Power at level 0.05 with $b/a = 1$, $c = 0$.

n →         100     300     500     800     1000    2000    4000    8000    ∞
2 log λ_n   0.0546  0.0524  0.0536  0.0560  0.0536  0.0540  0.0504  0.0546  0.05
T_{n,1}     0.0562  0.0562  0.0580  0.0590  0.0520  0.0566  0.0562  0.0580  0.05
T_{n,2}     0.0758  0.0640  0.0632  0.0640  0.0548  0.0590  0.0582  0.0594  0.05
T_{n,3}     0.0570  0.0568  0.0584  0.0592  0.0522  0.0566  0.0562  0.0580  0.05
W_n         0.0860  0.0700  0.0648  0.0648  0.0592  0.0622  0.0538  0.0564  0.05

Table 7: Power at level 0.05 with $b/a = 1$, $c = 0.5$.

n →         100     300     500     800     1000    2000    4000    8000    ∞
2 log λ_n   0.0795  0.0775  0.0820  0.0805  0.077   0.0705  0.0690  0.0725  0.0730
T_{n,1}     0.0740  0.0820  0.0785  0.0835  0.075   0.0750  0.0735  0.0725  0.0705
T_{n,2}     0.0960  0.0925  0.0840  0.0900  0.079   0.0800  0.0755  0.0730  0.0705
T_{n,3}     0.0755  0.0820  0.0785  0.0840  0.075   0.0755  0.0735  0.0725  0.0705
W_n         0.1105  0.0800  0.0880  0.0855  0.080   0.0700  0.0645  0.0725  0.066

Table 8: Power at level 0.05 with $b/a = 1$, $c = 1.0$.

n →         800     1000    2000    4000    ∞
2 log λ_n   0.2600  0.2610  0.2535  0.2560  0.261
T_{n,1}     0.2605  0.2750  0.2705  0.2485  0.2565
T_{n,2}     0.2720  0.2835  0.2760  0.25    0.2565
T_{n,3}     0.2610  0.2750  0.2715  0.24    0.2565
W_n         0.2500  0.2505  0.2375  0.2275  0.2350

Table 9: Power at level 0.05 with $b/a = 1.0$, $c = 2.0$.

Table 10: Power at level 0.05 with $b/a = 2$, $c = 0$.

n →         100     300     800     1000    2000    4000    8000    ∞
2 log λ_n   0.0538  0.0486  0.0470  0.0474  0.0468  0.0490  0.0470  0.05
T_{n,1}     0.0590  0.0534  0.0524  0.0516  0.0506  0.0522  0.0510  0.05
T_{n,2}     0.0736  0.0598  0.0550  0.0546  0.0528  0.0532  0.0524  0.05
T_{n,3}     0.0592  0.053   -       0.0516  0.0508  0.0522  0.0510  0.05
W_n         0.0850  0.067   -       0.0508  0.0554  0.0490  0.0566  0.05

Table 11: Power at level 0.05 with $b/a = 2$, $c = 0.5$.

n →         100     300     500     800     1000    2000    4000    8000    ∞
2 log λ_n   0.1580  -       0.1585  0.1580  0.1550  0.1340  0.1550  0.15    0.1515
T_{n,1}     0.1675  0.1635  0.1635  0.1540  0.1510  0.1435  0.1485  0.155   0.143
T_{n,2}     0.1980  0.1750  0.1785  0.1620  0.1565  0.1495  0.1505  0.1575  0.143
T_{n,3}     0.1705  0.1625  0.1635  0.1545  0.1510  0.1435  0.1485  0.1555  0.143
W_n         0.1890  0.1545  0.1615  0.1505  0.1480  0.1380  0.1405  0.1430  0.1435

Table 12: Power at level 0.05 with $b/a = 2$, $c = 1.0$.

n →         100     300     500     800     1000    2000    4000    8000    ∞
2 log λ_n   0.6845  0.6625  0.6710  0.6655  0.6630  0.6440  0.6370  0.6455  0.6615
T_{n,1}     0.7155  0.6955  0.6970  0.6900  0.6980  0.6730  0.6700  0.6905  0.6915
T_{n,2}     0.7615  0.7215  0.7145  0.6990  0.7035  0.6790  0.6735  0.6935  0.6915
T_{n,3}     0.7190  0.6965  0.6980  0.6900  0.6980  0.6730  0.6700  0.6905  0.6915
W_n         0.7225  0.6775  0.6670  0.6585  0.6550  0.647   0.6270  0.6375  0.6285

Table 13: Power at level 0.05 with $b/a = 2.0$, $c = 2.0$.

n →         200     300     500     800     1000    2000    4000    8000    ∞
2 log λ_n   1       1.0000  1       1.0000  0.9995  0.9990  1.0000  0.9990  -
T_{n,1}     1       1.0000  1       1.0000  0.9995  0.9995  1.0000  0.9990  1.0
T_{n,2}     1       1.0000  1       1.0000  0.9995  0.9995  1.0000  0.9990  1.0
T_{n,3}     1       1.0000  1       1.0000  0.9995  0.9995  1.0000  -       -
W_n         1       0.9995  1       0.9995  0.9980  0.9980  0.9985  0.9985  0.999

Table 14: Power at level 0.1 under a fixed alternative for various sample sizes.

n →         300     500     800     1000    2000    -
2 log λ_n   0.6850  0.8145  0.919   0.9505  0.9970  1
T_{n,1}     0.7165  0.8400  0.935   0.9630  0.9975  1
T_{n,2}     0.7345  0.8540  0.937   0.9670  0.9975  1
T_{n,3}     0.7175  0.8400  0.935   0.9635  0.9975  1
W_n         0.7115  0.8225  0.919   0.9500  0.9960  1

Figure 9: The limiting distribution of the LRS and the finite sample distributions for $n = 100$ and $n = 1000$ under a contiguous sequence with $b/a = 0.5$ and $c = 2$ (legend: n = 100, n = 1000, n = infinity).

0, then

$$n^{1/3}\,(\hat f_n(t_0) - f(t_0)) \to_d |4 f(t_0)\,f'(t_0)|^{1/3}\,\mathbb{Z},$$

as shown by PRAKASA RAO (1969). Let $\hat f_n^0$ denote the MLE of $f$ under the constraint that $f(t_0) = \theta_0$. We can then ask the same question as in the interval censoring problem. What is the limit distribution of the likelihood ratio statistic? Is it the same as in the interval censoring problem? Other models of interest in the same genre include the maximum likelihood estimation of a decreasing density as above but based on right censored observations, treated in HUANG AND ZHANG (1994), maximum likelihood estimation of a monotone instantaneous hazard, treated in HUANG AND WELLNER (1995), the panel count data model studied in WELLNER AND ZHANG (2000), etc., and the same question can be asked for each of these models.

Fundamental to an understanding of the limiting distribution of the likelihood ratio statistic is a characterization of the asymptotic behavior of $\hat\psi_n$ and $\hat\psi_n^0$ (reverting to the notation of Sections 1 and 2). Localized versions of the MLEs are defined in the same way as in the interval censoring problem; thus, we set

$$U_n(z) = n^{1/3}\,(\hat\psi_n(t_0 + z\,n^{-1/3}) - \psi(t_0))$$

and

$$V_n(z) = n^{1/3}\,(\hat\psi_n^0(t_0 + z\,n^{-1/3}) - \psi(t_0)).$$

In each of these examples, $\hat\psi_n$ is obtained by computing the slope of the greatest convex minorant (or the least concave majorant) of a random object constructed from the data, and $\hat\psi_n^0$ is obtained by differentiating appropriately constrained one-sided greatest convex minorants or least concave majorants, where the constraints stem from the fact that under the null the monotone function assumes the value $\theta_0$ at the point $t_0$. We expect that in each of these models, the random object whose minorants or majorants are differentiated to get the MLEs should be asymptotically replaced by the sample path of $X_{a,b}(t) \equiv a\,W(t) + b\,t^2$, where $a, b > 0$ are constants specific to the problem at hand, and the localized and normalized versions of the MLEs, namely $(U_n, V_n)$, ought to converge not only finite dimensionally, but also in the $L_2$ sense to a constant times $(g_{a,b}, g^0_{a,b})$. Finally, the likelihood ratio statistic, $2\log\lambda_n$, should prove to be asymptotically equivalent to a constant times $\int (U_n^2(z) - V_n^2(z))\,dz$ and therefore converge to a constant times $\int ((g_{a,b}(z))^2 - (g^0_{a,b}(z))^2)\,dz$, and this should be equal to $\mathbb{D}$ in distribution through Brownian scaling as in the interval censoring problem. A reason for the above asymptotic equivalence interval follows from a "monotone function estimation in white noise" discussed in WELLNER can be of as an a monotone function various models to this class. We the reader to WELLNER Theorem statistic to other monotone defined

where the $T_i$'s are the observed times in the model, and $w_n(T_i)$ is an appropriate (potentially constant) weight. With appropriate weighting the $L_2$ statistic is expected to behave in the same way as in the interval censoring problem, i.e., to converge in distribution to $\mathbb{T}$. The limiting distribution of the Wald statistic in a general monotone function model is actually well understood at this point, at least under the null hypothesis, since the asymptotic distribution of the MLE of a monotone function at a fixed point is characterized. We state a general lemma here that gives the limit distribution of the Wald statistic for a general monotone function model. Lemma 8.1 Consider a non-regular model of the type described in Section 1. Thus, at a fixed point $t_0$, $n^{1/3} (\hat\psi_n(t_0) - \psi(t_0)) \to_d C(\psi, \xi, t_0)\, g_{1,1}(0)$.

Let $\hat C_n$ be a consistent estimator of the constant $C(\psi, \xi, t_0)$. Then, defining the Wald statistic as

$$W_n \equiv \frac{n^{2/3} (\hat\psi_n(t_0) - \psi(t_0))^2}{\hat C_n^2},$$

we have

$$W_n \to_d (g_{1,1}(0))^2 .$$

The proof of the above lemma follows directly by using the continuous mapping theorem in conjunction with Slutsky's theorem. The power behavior of these statistics under both local and fixed alternatives in general monotone function models also needs to be investigated. The behavior of the likelihood ratio statistic under a fixed alternative should be characterizable in terms of Kullback-Leibler divergences from the null hypothesis in general, as suggested by the specific result for interval-censored data, and also in light of the derivations in BAHADUR (1971). The construction of general contiguous alternatives for these non-regular models is also of interest; there are two different issues involved here. (a) Construct a generic sequence of contiguous alternatives for these problems based on the general formulation in Section 1, in the spirit of the alternatives constructed for the interval censoring model. (b) Construct broader classes of contiguous alternatives for the interval censoring model, by weakening the requirement of uniform convergence of the perturbation functions, the $B_n$'s, and also by allowing the alternatives to vary off shrinking $n^{-1/3}$ neighborhoods. It is clear that the right rate of convergence of local alternatives in these problems (so that we get convergence to a non-null distribution) is $n^{1/3}$, which matches the convergence rate of maximum likelihood estimators. With a faster rate (for example $\sqrt{n}$) of convergence of local alternatives, the likelihood ratio statistic will converge to the null [...] (this is not difficult to [...]); in other [...] we do not [...] power. It is also clear that the [...] will often have to [...] some "intrinsic" conditions [...] the particular model involved.
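The greatest convex minorant construction referred to earlier in this section can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it discretizes a path of $X_{a,b}(t) = a W(t) + b t^2$ on a grid and computes the slopes of its greatest convex minorant by pooling adjacent violators. The function names `simulate_x` and `gcm_slopes`, the grid size, the interval half-width and the seed are all assumptions made for illustration.

```python
import random


def simulate_x(a, b, c=2.0, m=400, seed=0):
    """Discretized path of a*W(t) + b*t^2 on [-c, c] with W(0) = 0 (m even).

    Hypothetical helper: W is built from independent Gaussian increments and
    then re-centred at the grid point t = 0, which gives a two-sided path.
    """
    rng = random.Random(seed)
    h = 2.0 * c / m
    t = [-c + i * h for i in range(m + 1)]
    w = [0.0]
    for _ in range(m):
        w.append(w[-1] + rng.gauss(0.0, h ** 0.5))
    w0 = w[m // 2]  # value of the random walk at the grid point t = 0
    x = [a * (wi - w0) + b * ti * ti for wi, ti in zip(w, t)]
    return t, x


def gcm_slopes(x, y):
    """Slopes of the greatest convex minorant of the points (x[i], y[i]).

    Returns one slope per grid interval. The slopes are the weighted isotonic
    regression of the raw increments, computed by pooling adjacent violators.
    """
    blocks = []  # each block is [total width, pooled slope, number of intervals]
    for i in range(1, len(x)):
        width = x[i] - x[i - 1]
        blocks.append([width, (y[i] - y[i - 1]) / width, 1])
        # Pool while the slope sequence fails to be nondecreasing.
        while len(blocks) > 1 and blocks[-2][1] > blocks[-1][1]:
            w2, s2, c2 = blocks.pop()
            w1, s1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([total, (w1 * s1 + w2 * s2) / total, c1 + c2])
    slopes = []
    for _, slope, count in blocks:
        slopes.extend([slope] * count)
    return slopes


if __name__ == "__main__":
    t, x = simulate_x(a=1.0, b=1.0)
    g = gcm_slopes(t, x)
    print("slope of the minorant just to the right of t = 0:", g[len(g) // 2])
```

The pooled slopes are nondecreasing by construction, which is the monotonicity that the unconstrained estimator inherits. The constrained one-sided versions described in the text would need a separate treatment that this sketch does not attempt.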

9 Proofs

Lemma 9.1 Suppose that $\{X_{n\epsilon}$ [...] $> 0$, the following distributional equality holds in the space

$$\left( g_{a,b,\Psi}(t),\, g^0_{a,b,\Psi}(t) \right) \stackrel{d}{=} \left( a (b/a)^{1/3}\, g_{1,1,\phi}((b/a)^{2/3} t),\; a (b/a)^{1/3}\, g^0_{1,1,\phi}((b/a)^{2/3} t) \right),$$

where $\phi$ is as defined in (4.1). Proof of Lemma 3.3. We establish the distributional equality

The joint distributional equality is established similarly. We will use the following fact (see either BANERJEE AND WELLNER (2001) or BANERJEE (2000)):

We also recall, from the definition of ¢(t) in Section 4 that

¢«bja)2/3 t) Now,

= (bja)4/31/Jo(t).

= a (ajb)1/3

a (ajb)1/3

(Xl,l(t)+¢(t».

Thus

t)

a

=

t)

a (ajb)1/3

v

a

v

+

+ a (ajb)1/3 ¢«bja)2/3 t)

This finishes the proof. $\Box$
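For the case without drift ($\Psi = 0$, hence $\phi = 0$), the scaling identity can be checked by hand. The following is a sketch of that calculation, assuming only the standard Brownian scaling $W(ct) \stackrel{d}{=} c^{1/2} W(t)$ as processes in $t$ for $c > 0$. The symbols $c$, $k$ and $G_{a,b}$ (the greatest convex minorant of $X_{a,b}$, with slope $g_{a,b}$) are introduced here for illustration only.

```latex
% Take c = (a/b)^{2/3} and k = a (a/b)^{1/3}, so that a c^{1/2} = k and b c^2 = k.
\begin{align*}
X_{a,b}(ct) &= a\,W(ct) + b\,c^2 t^2
  \stackrel{d}{=} a\,c^{1/2}\,W(t) + b\,c^2 t^2
  = k\,\bigl(W(t) + t^2\bigr) = k\,X_{1,1}(t), \\
% Taking minorants commutes with positive rescaling of both axes.
G_{a,b}(ct) &\stackrel{d}{=} k\,G_{1,1}(t), \\
% Differentiate in t, then substitute s = ct.
c\,g_{a,b}(ct) &\stackrel{d}{=} k\,g_{1,1}(t), \\
g_{a,b}(s) &\stackrel{d}{=} \frac{k}{c}\,g_{1,1}(s/c)
  = a\,(b/a)^{1/3}\,g_{1,1}\bigl((b/a)^{2/3} s\bigr).
\end{align*}
```

The last line uses $k/c = a (a/b)^{1/3} (b/a)^{2/3} = a (b/a)^{1/3}$. The same steps should carry over to the constrained slope $g^0_{a,b}$, provided the constraint is preserved under multiplication by positive constants.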

Proof of Theorem 3.1. The first assertion, $2 \log \lambda_n \to_d \mathbb{D}$, is proved in BANERJEE AND WELLNER (2001), Theorem 2.5. To show that $W_n$ converges in distribution to its stated limit, note that

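$$n^{1/3} \left( \mathbb{F}_n(t_0) - F(t_0) \right) \to_d 2\mathbb{Z} \left( F(t_0) (1 - F(t_0)) f(t_0) / 2 g(t_0) \right)^{1/3}$$

and that $\mathbb{F}_n(t_0) (1 - \mathbb{F}_n(t_0)) \hat f_n(t_0) / 2 \hat g_n(t_0)$ is a consistent estimator of $F(t_0) (1 - F(t_0)) f(t_0) / 2 g(t_0)$. Consequently,

$$\frac{n^{1/3} \left( \mathbb{F}_n(t_0) - F(t_0) \right)}{\left( \mathbb{F}_n(t_0) (1 - \mathbb{F}_n(t_0)) \hat f_n(t_0) / 2 \hat g_n(t_0) \right)^{1/3}} \to_d 2\mathbb{Z}$$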

by Slutsky's theorem, and the result follows by the continuous mapping theorem. It remains to deal with $T_{n,2}$. Denote the set where $\mathbb{F}_n$ and $\mathbb{F}_n^0$ differ by $D_n$ and the transformed difference set corresponding to the local variable, $z = n^{1/3} (T - t_0)$, by $\tilde{D}_n$. We also denote $t_0 + n^{-1/3} z$ by $t_n(z)$. Now,

$= T_{n,2,1} + T_{n,2,2}$.

But

and this is $o_p(1)$ easily since the (random) function within the curly brackets can be shown to eventually lie in a uniformly bounded Donsker class of functions with arbitrarily high probability (details of analogous arguments can be found in BANERJEE (2000)). Thus we only need to deal with $T_{n,2,2}$. Now, nP

=

=

'~ft

Iv (T») ft

dt

The last step in the above string of inequalities follows from the one above it using the continuity of $g$ in a neighborhood of $t_0$, the almost sure uniform convergence of [...] to $F$ in a neighborhood of $t_0$ and the continuity of [...], the fact that $\tilde{D}_n$ is with arbitrarily high probability contained in a compact set around 0, eventually, and the fact that the processes $X_n(z)$ and $Y_n(z)$ are bounded in probability on compact sets. Thus,

It can be shown by similar steps that the same representation holds for $T_{n,1}$ and $T_{n,3}$ under the null hypothesis, showing that all three statistics are asymptotically equivalent to one another and to

$$T_n = \frac{g(t_0)}{\theta_0 (1 - \theta_0)} \int_{\tilde{D}_n} \left( X_n(z) - Y_n(z) \right)^2 dz .$$

We will now deduce the limit distribution of $T_n$. To this end we will use Lemma 9.1. Now, for each $\epsilon > 0$ we can find a compact set $M_\epsilon$ of the form $[-K_\epsilon, K_\epsilon]$ such that eventually,

(For a detailed proof of this see BANERJEE (2000), pages 159-160.) Here $D_{a,b}$ is the set on which the processes $g_{a,b}$ and $g^0_{a,b}$ differ. Now let

$$X_{n\epsilon} = \frac{g(t_0)}{F(t_0) (1 - F(t_0))} \int_{M_\epsilon} \left( X_n(z) - Y_n(z) \right)^2 dz ,$$

$$W_\epsilon = \frac{g(t_0)}{F(t_0) (1 - F(t_0))} \int_{M_\epsilon} \left( g_{a,b}(z) - g^0_{a,b}(z) \right)^2 dz ,$$

$$\xi = \frac{g(t_0)}{F(t_0) (1 - F(t_0))} \int_{D_{a,b}} \left( g_{a,b}(z) - g^0_{a,b}(z) \right)^2 dz .$$

Set $\xi_n = T_n$. Since $M_\epsilon$ contains $\tilde{D}_n$ with probability greater than $1 - \epsilon$ eventually ($\tilde{D}_n$ is the left-closed, right-open interval over which the processes $X_n$ and $Y_n$ differ) we have $P[X_{n\epsilon} \neq \xi_n] < \epsilon$ eventually. Similarly $P[W_\epsilon \neq \xi] < \epsilon$. Also $X_{n\epsilon} \to_d W_\epsilon$ as $n \to \infty$, for every fixed $\epsilon$. This is so because by Theorem 4.1 with $\Psi = 0$,

as processes in $L_2[-c, c] \times L_2[-c, c]$ for every $c > 0$ (with $a$ and $b$ as defined in Theorem [...]) and because [...] $dz$ [...]. Thus all conditions of Lemma 9.1 [...] establish that the limiting [...] thereby showing universality. We [...]

Now, by Lemma 9.2 with $\Psi = 0$ (the situation under the null), which entails that $\phi = 0$, we have

$$\left( g_{a,b}(t),\, g^0_{a,b}(t) \right) \stackrel{d}{=} \left( a (b/a)^{1/3}\, g_{1,1}((b/a)^{2/3} t),\; a (b/a)^{1/3}\, g^0_{1,1}((b/a)^{2/3} t) \right)$$

as processes in $L_2[-c, c] \times L_2[-c, c]$.

Once again, for each $\epsilon > 0$, we can find $K_\epsilon > 0$ such that

Thus

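$$U_\epsilon \equiv \frac{g(t_0)}{F(t_0) (1 - F(t_0))} \int_{[-K_\epsilon, K_\epsilon]} \left( g_{a,b}(z) - g^0_{a,b}(z) \right)^2 dz$$

$$\stackrel{d}{=} \frac{g(t_0)}{F(t_0) (1 - F(t_0))} \int_{[-K_\epsilon, K_\epsilon]} a^2 (b/a)^{2/3} \left( g_{1,1}((b/a)^{2/3} z) - g^0_{1,1}((b/a)^{2/3} z) \right)^2 dz$$

$$= \int_{[-(b/a)^{2/3} K_\epsilon,\, (b/a)^{2/3} K_\epsilon]} \left( g_{1,1}(z) - g^0_{1,1}(z) \right)^2 dz \equiv \mathbb{T}_\epsilon .$$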

In the above string of equalities we have used the fact that $a^2 = F(t_0) (1 - F(t_0)) / g(t_0)$. Now, note that $P[U_\epsilon \neq U] < \epsilon$ and $P[\mathbb{T}_\epsilon \neq \mathbb{T}] < \epsilon$.

Thus once again by Lemma 9.1 (here set $X_{n\epsilon} = U_\epsilon$ for all $n$, set $\xi_n = U$ for all $n$, set $W_\epsilon = \mathbb{T}_\epsilon$ and set $\xi = \mathbb{T}$), it follows that $U \stackrel{d}{=} \mathbb{T}$. This completes the proof. $\Box$

Proof of Theorem 3.2. From Lemma 2.1 we have

By the asymptotic equivalence of $T_{n,2}$ and $T_n$ under the null hypothesis, it follows immediately that

Now

)(T) )

For this identity refer to pages 156-157 of BANERJEE (2000).

where

and

T~,2 = =

»)

(~(T) -

n

(T) - F(to) F.> nP ( IF (T)(l-lF (T)) (n(T) n n

F(tO

F(to))

)

$+\, o_p(1)$.

(9.3)

The derivation of (9.1) is available on pages 168-169 of BANERJEE (2000). Now, from (9.2), standard manipulations (changing to the local variable and so forth) yield, I

T n,1

g(to)

(}o (1- (}o)

r ibn

2

-Yn(z) dz

+ 01'

(

)

1 ,

and from (9.3), by standard manipulations again, we find that

Hence,

(9.4)

This coupled with the fact that $S_{n,2} + S_{n,1} = T_n + o_p(1)$ yields

(9.5)

Thus

$$S_{n,2} - S_{n,1} = \frac{g(t_0)}{\theta_0 (1 - \theta_0)} \int_{\tilde{D}_n} \left( X_n^2(z) - Y_n^2(z) \right) dz$$