Likelihood Ratio, Score, and Wald Statistics in Models with Monotone Functions: Some Comparisons
by Moulinath Banerjee and Jon Wellner
TECHNICAL REPORT No. 409 April 25, 2002
Department of Statistics Box 354322
University of Washington Seattle, Washington, 98195 USA
Moulinath Banerjee¹ and Jon A. Wellner²
University of Michigan and University of Washington
April 25, 2002
Abstract. BANERJEE AND WELLNER (2001) introduced and studied the likelihood ratio statistic for testing the hypothesis that a monotone function takes on a fixed value at a fixed point, in the context of estimating the distribution function of the survival time in the interval censoring model. In this paper we continue to use the interval censoring model as a simple "test problem". We introduce three natural "score statistics" for the same testing problem studied in BANERJEE AND WELLNER (2001), which have natural interpretations in terms of certain (weighted) $L_2$ distances. We compare these new test statistics with an analogue of the classical Wald statistic and with the likelihood ratio statistic introduced in BANERJEE AND WELLNER (2001). We first establish the limiting distribution theory of the statistics under the null hypothesis, and discuss calculation of the relevant critical points for the test statistics. We then establish the limiting behavior of all five statistics under both fixed and local alternatives. Although the asymptotic theory does allow some qualitative conclusions, it unfortunately does not (yet) lead to an explicit quantitative comparison of the power behavior of the five different tests. We therefore also compare the power of the five statistics via a limited Monte Carlo study. Our preliminary conclusion is that one of the three score tests seems to slightly dominate all the other test statistics, including the likelihood ratio and the Wald statistics.
National Science Foundation grant DMS-D07l8I8; National Science Foundation; NIAID grant 2R01 A1291968-
1 Introduction
The problem of estimating a monotone function arises frequently in statistics. In what follows we consider a class of non-regular monotone function models, which can be described generically as follows. Let $X_1, X_2, \ldots, X_n$ be i.i.d. observations from $f(x, \psi, \xi)$. Here $f$ is a density with respect to some appropriate dominating measure. $\psi$, or some transformation of it, is a monotone function of interest (say, a distribution function, a cumulative hazard function, a monotone density or a monotone instantaneous hazard), and $\xi$ is a nuisance parameter. The MLE of $\psi$ based on $X_1, X_2, \ldots, X_n$ is denoted by $\hat\psi_n$. What sets this class of problems apart from the whole spectrum of regular parametric and semiparametric problems is its non-regularity: the maximum likelihood estimates (henceforth MLEs) of the value of the monotone function at a fixed point converge at the slower rate $n^{1/3}$, whereas the usual rate in regular parametric and semiparametric problems is $\sqrt{n}$. In each case,
$$n^{1/3}\,(\hat\psi_n(t) - \psi(t)) \to_d C(\psi, \xi, t)\, Z,$$
where the random variable $Z$ is symmetric about 0 but non-Gaussian, and $C(\psi, \xi, t)$ is a constant depending upon the underlying parameters in the problem and the point of interest $t$. In fact, $Z = \operatorname{argmin}_h \{W(h) + h^2\}$, where $W(h)$ is standard two-sided Brownian motion on the line. In this paper we compare three different likelihood-based statistics for testing the null hypothesis $\psi(t_0) = \theta_0$ in this class of problems:

(i) the likelihood ratio statistic;
(ii) a "score statistic", which arises very naturally as the squared distance between the constrained and unconstrained MLEs with respect to an appropriate metric;
(iii) a version of the "Wald statistic".
We study these statistics under the null hypothesis and also under local (contiguous) and fixed alternatives. The results are obtained in the context of the interval censoring model, which has been studied extensively in BANERJEE (2000) and BANERJEE AND WELLNER (2001), but they are expected to generalize to other monotone function models in the class. We also discuss the corresponding statistics under a null hypothesis that constrains the monotone function at more than one (but finitely many) points; their behavior follows from natural extensions of the arguments for the "constrained at one point" situation. Finally, we discuss general versions of these statistics for the class of monotone function models of interest, together with conjectures about their asymptotic behavior.

We now briefly review some background material from BANERJEE AND WELLNER (2001), since it is fundamental to what follows. Given the characterization of the asymptotic distribution of the MLE in this non-regular class of problems, a natural question is how likelihood ratio tests behave for null hypotheses that constrain the monotone function at finitely many points. We consider the one-point testing problem first. Thus, consider testing $H_0: \psi(t_0) = \theta_0$ against its complement. Let $\hat\psi_n^0$ be the MLE of $\psi$ under the constraint imposed by the null hypothesis. The likelihood ratio statistic is defined in the usual manner as twice the logarithm of the ratio of the maximized likelihoods. Thus
$$2\log\lambda_n = 2\left\{\sum_{i=1}^n \log f(X_i, \hat\psi_n, \xi) - \sum_{i=1}^n \log f(X_i, \hat\psi_n^0, \xi)\right\}.$$
BANERJEE (2000) and BANERJEE AND WELLNER (2001) studied the likelihood ratio statistic in the interval censoring problem.
Interval Censoring Model: Let $(X_1, T_1), (X_2, T_2), \ldots, (X_n, T_n)$ be $n$ i.i.d. pairs of random variables. For each $i$, $X_i \sim F$, $T_i \sim G$, and $X_i$ is independent of $T_i$. Here $F$ and $G$ are continuous distributions concentrated on the positive half-line. In the interval censoring model the actual failure times, the $X_i$'s, are not observed. All that we observe for the $i$th individual is the vector $(\Delta_i, T_i)$, where $\Delta_i = 1\{X_i \le T_i\}$ is the indicator of a failure. We are interested in estimating $F$ from the interval censored data. More specifically, in the context of likelihood ratio inference, we are interested in estimating $F(t_0)$, the value of $F$ at $t_0$. We make the following blanket assumption.
Assumption A: Both $F$ and $G$ are continuously differentiable in a neighborhood of $t_0$, with Lebesgue densities $f$ and $g$ respectively; also $f(t_0) > 0$, $g(t_0) > 0$ and $0 < F(t_0) < 1$. We denote the distribution of $(\Delta, T)$ under $(F, G)$ by $P_{F,G}$. The log-likelihood $\log L_n = \log L_n(F)$ based on $n$ i.i.d. observations $(\Delta_1, T_1), (\Delta_2, T_2), \ldots, (\Delta_n, T_n)$ is then given by
$$\log L_n(F) = n\,\mathbb{P}_n\big(\Delta \log F(T) + (1 - \Delta)\log(1 - F(T))\big),$$
where $\mathbb{P}_n$ is the empirical measure of the observations $\{(\Delta_i, T_i)\}_{i=1}^n$. The likelihood ratio statistic for testing $F(t_0) = \theta_0$ is given by
$$2\log\lambda_n = 2\left\{\log L_n(\hat F_n) - \log L_n(\hat F_n^0)\right\},$$
where $\hat F_n$ is the unconstrained MLE and $\hat F_n^0$ is the constrained MLE under the null hypothesis, both based on the interval censored data. BANERJEE AND WELLNER (2001) derived the behavior of the likelihood ratio statistic under the null hypothesis; we discuss it in a later section.
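Both MLEs can be computed by isotonic regression. We take the following standard characterizations as given; they are not derived in this section. At the ordered observation times, $\hat F_n$ is the (nondecreasing) isotonic regression of the $\Delta$'s. $\hat F_n^0$ is obtained from separate isotonic regressions on the observations with $T_i \le t_0$ and with $T_i > t_0$, truncated above and below at $\theta_0$ respectively. The sketch below is a minimal Python/NumPy version under those assumptions, using the pool-adjacent-violators algorithm. The function names `pava` and `current_status_lrs` are hypothetical.

```python
import numpy as np


def pava(y):
    """Nondecreasing least-squares isotonic regression of y (unit weights),
    computed with the pool-adjacent-violators algorithm."""
    means, counts = [], []
    for v in y:
        means.append(float(v))
        counts.append(1)
        # Merge the last two blocks while they violate monotonicity.
        while len(means) >= 2 and means[-2] > means[-1]:
            m2, c2 = means.pop(), counts.pop()
            m1, c1 = means.pop(), counts.pop()
            means.append((m1 * c1 + m2 * c2) / (c1 + c2))
            counts.append(c1 + c2)
    return np.repeat(np.array(means), np.array(counts, dtype=int))


def current_status_lrs(delta, T, t0, theta0):
    """Return (sorted T, F_hat, F0_hat, 2 log lambda_n) for current status data.

    F_hat and F0_hat are the unconstrained MLE and the MLE constrained by
    F(t0) = theta0, both evaluated at the sorted observation times.
    """
    order = np.argsort(T)
    t = np.asarray(T, dtype=float)[order]
    d = np.asarray(delta, dtype=float)[order]
    F_hat = pava(d)
    left = t <= t0                      # a prefix, since t is sorted
    F0_hat = np.concatenate((np.minimum(pava(d[left]), theta0),
                             np.maximum(pava(d[~left]), theta0)))
    # 2 log lambda_n: only the log term matching each Delta_i is used,
    # so every log argument is finite and positive.
    ones, zeros = d == 1, d == 0
    lrs = 2.0 * (np.sum(np.log(F_hat[ones] / F0_hat[ones]))
                 + np.sum(np.log((1 - F_hat[zeros]) / (1 - F0_hat[zeros]))))
    return t, F_hat, F0_hat, lrs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, T = rng.uniform(size=500), rng.uniform(size=500)   # F = G = U(0, 1)
    delta = (X <= T).astype(int)
    _, _, _, lrs = current_status_lrs(delta, T, t0=0.5, theta0=0.5)
    print("2 log lambda_n =", lrs)
```

$\hat F_n$ maximizes the likelihood over all distribution functions, and $\hat F_n^0$ is one such function. The computed statistic should therefore never be negative, up to rounding.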
2 The Likelihood Ratio, Score and Wald Statistics
In what follows, the null hypothesis is $H_0: F(t_0) = \theta_0$, where $0 < t_0 < \infty$ and $0 < \theta_0 < 1$. We wish to test this against the alternative hypothesis $H_1: F(t_0) \ne \theta_0$. The likelihood ratio statistic is $2\log\lambda_n$, where $\lambda_n$ is the ratio of the likelihoods; it is given by
$$2\log\lambda_n = 2n\,\mathbb{P}_n\left\{\Delta \log\frac{\hat F_n(T)}{\hat F_n^0(T)} + (1 - \Delta)\log\frac{1 - \hat F_n(T)}{1 - \hat F_n^0(T)}\right\}.$$
If the null hypothesis holds, one expects the constrained and unconstrained estimators of $F$ to be "close" to each other in some sense. The likelihood ratio measures how close the unconstrained and constrained likelihoods are. A natural and very intuitive way to measure the distance between the unconstrained MLE $\hat F_n$ and the constrained MLE $\hat F_n^0$ directly is the squared $L_2$ distance $\int (\hat F_n - \hat F_n^0)^2\, d\mathbb{G}_n$, where $\mathbb{G}_n$ is the empirical measure of the observation times. Our test statistics for $H_0$ against $H_1$ are scaled or weighted versions of this distance, of the following types:
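(i) $T_{n,1}$,
(ii) $T_{n,2} = n \int \frac{(\hat F_n - \hat F_n^0)^2(t)}{\hat F_n(t)\,(1 - \hat F_n(t))}\, d\mathbb{G}_n(t)$,
(iii) $T_{n,3}$.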
We will show that these statistics are all asymptotically equivalent, both under the null hypothesis and under a sequence of contiguous alternatives. Under the null hypothesis they share the same limiting distribution, which does not depend on the underlying parameters of the problem. Each of these statistics can be regarded as an $L_2$ statistic, but we will refer to them as versions of the "score statistic", for the reason explained below. The $L_2$ statistic as a score statistic: The log-likelihood based on $n$ i.i.d. observations in the interval censoring model is given by
$$\log L_n(F) = n\,\mathbb{P}_n\big(\Delta\log F(T) + (1 - \Delta)\log(1 - F(T))\big),$$
where $F$ is the survival time distribution. Now consider perturbing $F$ in the direction of some other distribution function $H$ on $[0, \infty)$:
$$F_{\epsilon,H} = (1 - \epsilon)F + \epsilon H, \qquad \epsilon \ge 0.$$
This gives a one-dimensional parametric submodel of the original nonparametric model that passes through $F$. The (one-sided) score at $F$ is then given by
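$$S_n(F, H) \equiv \frac{d}{d\epsilon}\log L_n(F_{\epsilon,H})\Big|_{\epsilon = 0+} = n\,\mathbb{P}_n\left(\frac{\Delta - F(T)}{F(T)\,(1 - F(T))}\,(H - F)(T)\right).$$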
Our score statistics for the interval censoring problem come from suitable choices of $H$ and $F$. The first score statistic, $S_{n,1}$, sets $F = \hat F_n$ and $H = \hat F_n^0$. The second, $S_{n,2}$, interchanges the roles of $\hat F_n$ and $\hat F_n^0$. Thus $S_{n,1}$ perturbs the unconstrained MLE in the direction of the constrained MLE, and $S_{n,2}$ does the reverse. Since $\hat F_n$ maximizes the likelihood, it is easy to see that $S_{n,1} \le 0$.
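The score formula and the algebraic identity behind Proposition 2.1 can both be checked numerically. The identity is $S_{n,1} + S_{n,2} = T_{n,2} + n\,\mathbb{P}_n r_n$, where $T_{n,2} = n\,\mathbb{P}_n[(\hat F_n - \hat F_n^0)^2/(\hat F_n(1 - \hat F_n))]$ and $r_n$ is as given in the proof. The sketch below assumes those forms. The identity is purely algebraic, so any values in $(0, 1)$ can stand in for the two MLEs at the observation times. All function names are hypothetical.

```python
import numpy as np


def one_sided_score(delta, F, H):
    """S_n(F, H) = n P_n[(Delta - F(T)) / (F(T)(1 - F(T))) * (H - F)(T)],
    with F and H given by their values at the observation times T_i."""
    return np.sum((delta - F) / (F * (1 - F)) * (H - F))


def loglik(delta, F):
    """log L_n(F) = sum_i [Delta_i log F(T_i) + (1 - Delta_i) log(1 - F(T_i))]."""
    return np.sum(delta * np.log(F) + (1 - delta) * np.log(1 - F))


def proposition_2_1_terms(delta, F_hat, F0_hat):
    """Return S_{n,1}, S_{n,2}, T_{n,2} and n P_n r_n for given MLE values."""
    S1 = one_sided_score(delta, F_hat, F0_hat)   # F_hat perturbed toward F0_hat
    S2 = one_sided_score(delta, F0_hat, F_hat)   # F0_hat perturbed toward F_hat
    T2 = np.sum((F_hat - F0_hat) ** 2 / (F_hat * (1 - F_hat)))
    r = ((delta - F0_hat) * (F_hat - F0_hat) ** 2 * (1 - F_hat - F0_hat)
         / (F_hat * (1 - F_hat) * F0_hat * (1 - F0_hat)))
    return S1, S2, T2, np.sum(r)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    delta = rng.integers(0, 2, size=50)
    F, H = rng.uniform(0.1, 0.9, size=50), rng.uniform(0.1, 0.9, size=50)
    # Forward difference of log L_n along F_eps = (1 - eps) F + eps H.
    eps = 1e-7
    fd = (loglik(delta, (1 - eps) * F + eps * H) - loglik(delta, F)) / eps
    print(fd, one_sided_score(delta, F, H))
    S1, S2, T2, R = proposition_2_1_terms(delta, F, H)
    print(S1 + S2, T2 + R)   # equal up to rounding
```

The proof of Proposition 2.1 then reduces to showing that $n\,\mathbb{P}_n r_n = o_p(1)$.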
Proposition 2.1 Let $S_{n,1}$ and $S_{n,2}$ be defined as above, and suppose the survival time distribution $F$ satisfies the null hypothesis $H_0$. Then
$$T_{n,2} = S_{n,1} + S_{n,2} + o_p(1).$$
Thus $T_{n,2}$, one of the versions of the $L_2$ statistic, is asymptotically equivalent under the null to the sum of two one-sided score statistics. In this sense we shall henceforth refer to $T_{n,2}$ as a "score statistic". $T_{n,1}$ and $T_{n,3}$ are also versions of the $L_2$ statistic and are asymptotically equivalent to $T_{n,2}$, so we shall regard them as "score statistics" as well. Proof of Proposition 2.1. First, note that $T_{n,2}$ can be written as
$$T_{n,2} = n\,\mathbb{P}_n\left[\frac{(\hat F_n - \hat F_n^0)^2(T)}{\hat F_n(T)\,(1 - \hat F_n(T))}\right].$$
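Now, we have
$$S_{n,1} + S_{n,2} = T_{n,2} + n\,\mathbb{P}_n r_n,$$
where $r_n \equiv r_n(\Delta, T)$ is the random function given by
$$r_n(\Delta, T) = \frac{(\Delta - \hat F_n^0(T))\,(\hat F_n - \hat F_n^0)^2(T)\,(1 - \hat F_n(T) - \hat F_n^0(T))}{\hat F_n(T)(1 - \hat F_n(T))\,\hat F_n^0(T)(1 - \hat F_n^0(T))}.$$
This identity follows by writing $\Delta - \hat F_n(T) = (\Delta - \hat F_n^0(T)) - (\hat F_n - \hat F_n^0)(T)$ in $S_{n,1}$ and using $\hat F_n(1 - \hat F_n) - \hat F_n^0(1 - \hat F_n^0) = (\hat F_n - \hat F_n^0)(1 - \hat F_n - \hat F_n^0)$. It therefore suffices to show that $n\,\mathbb{P}_n r_n = n(\mathbb{P}_n - P)r_n + nPr_n$ is $o_p(1)$. We omit a detailed argument for the first term but sketch the main idea. With arbitrarily high probability, $n^{2/3} r_n$ eventually lies in a uniformly bounded Donsker class of functions. Hence $\sqrt{n}(\mathbb{P}_n - P)(n^{2/3} r_n)$ is $O_p(1)$, and consequently $n(\mathbb{P}_n - P)r_n = n^{1/3}(\mathbb{P}_n - P)(n^{2/3} r_n)$ is $O_p(n^{-1/6})$. It remains to show that $nPr_n$ is $o_p(1)$.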
In what follows we use this notation:
- $D_n$ is the set where $\hat F_n$ and $\hat F_n^0$ differ, and $\tilde D_n = n^{1/3}(D_n - t_0)$;
- $t_n(z) = t_0 + z n^{-1/3}$;
- $X_n(z) = n^{1/3}(\hat F_n(t_n(z)) - F(t_0))$ and $Y_n(z) = n^{1/3}(\hat F_n^0(t_n(z)) - F(t_0))$.

Now $nPr_n$ equals
$$n\int_{D_n}\frac{(F(t) - \hat F_n^0(t))\,(\hat F_n - \hat F_n^0)^2(t)\,(1 - \hat F_n(t) - \hat F_n^0(t))}{\hat F_n\hat F_n^0(1 - \hat F_n)(1 - \hat F_n^0)(t)}\,g(t)\,dt$$
$$= n^{-1/3}\int_{\tilde D_n}\frac{\big(n^{1/3}(F(t_n(z)) - F(t_0)) - Y_n(z)\big)\,(X_n(z) - Y_n(z))^2\,\big(1 - \hat F_n(t_n(z)) - \hat F_n^0(t_n(z))\big)}{\hat F_n\hat F_n^0(1 - \hat F_n)(1 - \hat F_n^0)(t_n(z))}\,g(t_n(z))\,dz$$
$$= n^{-1/3}\int_{\tilde D_n}\frac{\big(z f(t_0) + o(1) - Y_n(z)\big)\,(X_n(z) - Y_n(z))^2\,\big(1 - \hat F_n(t_n(z)) - \hat F_n^0(t_n(z))\big)}{\hat F_n\hat F_n^0(1 - \hat F_n)(1 - \hat F_n^0)(t_n(z))}\,g(t_n(z))\,dz,$$
where the $o(1)$ term goes to 0 uniformly over $z$ in any compact interval of the form $[-K, K]$. The following facts hold with arbitrarily high probability:
- $\tilde D_n$ is eventually contained in a compact set;
- $X_n(z)$ and $Y_n(z)$ are uniformly bounded on compact sets;
- $\hat F_n(t_n(z))$ and $\hat F_n^0(t_n(z))$ are eventually bounded away from 0 and 1 for $z$ in a compact set.

From these it is easily shown that the integral in the last expression of the display is $O_p(1)$, so $n^{-1/3}$ times the integral is $O_p(n^{-1/3})$. □

Later we will give a theorem describing the one-sided score statistics under both the null hypothesis and contiguous alternatives, and their relation to the likelihood ratio statistic in the interval censoring model. We now describe the Wald statistic.
The Wald statistic: It is known that
$$n^{1/3}(\hat F_n(t_0) - F(t_0)) \to_d \left(\frac{F(t_0)\,(1 - F(t_0))\,f(t_0)}{2g(t_0)}\right)^{1/3} 2Z;$$
see, e.g., GROENEBOOM AND WELLNER (1992). It is also well known that $2Z =_d g_{1,1}(0)$. Thus a natural analogue $W_n$ of the classical Wald statistic is defined by
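$$W_n = \frac{n^{2/3}\,(\hat F_n(t_0) - \theta_0)^2}{\left(\theta_0(1 - \theta_0)\,\hat f_n(t_0)/(2\hat g_n(t_0))\right)^{2/3}},$$
where $\hat f_n(t_0)$ and $\hat g_n(t_0)$ are consistent estimators of $f(t_0)$ and $g(t_0)$ respectively. The Wald statistic is thus a scaled version of the squared distance between the MLE of $F$ at $t_0$ and the true value $\theta_0$. The scaling makes the limit distribution of the Wald statistic free of the underlying parameters. Unlike the likelihood ratio and score statistics, the Wald statistic requires estimating the nuisance parameters $f(t_0)$ and $g(t_0)$.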
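A minimal sketch of the Wald computation follows. The normalizing constant uses $\theta_0$, and $f(t_0)$ and $g(t_0)$ are plugged in as known values. This is the "ideal" version used in the simulations later on; in practice consistent estimates would replace these values. The function name `wald_statistic` is hypothetical.

```python
def wald_statistic(F_hat_t0, theta0, f_t0, g_t0, n):
    """W_n = n^{2/3} (F_hat(t0) - theta0)^2 / c^2, where
    c = (theta0 (1 - theta0) f(t0) / (2 g(t0)))^{1/3} is the scale of the
    n^{1/3} limit of the MLE at t0. f(t0) and g(t0) are taken as known here."""
    c = (theta0 * (1 - theta0) * f_t0 / (2 * g_t0)) ** (1 / 3)
    return (n ** (1 / 3) * (F_hat_t0 - theta0) / c) ** 2


if __name__ == "__main__":
    # F = G = U(0, 1), t0 = 0.5: c = (0.25 * 1 / 2)^{1/3} = 0.5.
    W = wald_statistic(0.55, 0.5, 1.0, 1.0, n=1000)
    # Reject at level 0.05 when W exceeds the .95 quantile of 4Z^2
    # (3.985460 in Table 1).
    print(W, W > 3.985460)
```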
3 Limit Theory Under the Null Hypothesis
BANERJEE AND WELLNER (2001) studied the likelihood ratio statistic in the interval censoring problem for testing the null hypothesis $F(t_0) = \theta_0$. They showed that, under the same regularity conditions needed to derive the asymptotic distribution of the MLE, a limit distribution does exist and is free of the underlying parameters of the problem. We now introduce some standard notation used throughout. For constants $a, b > 0$, define the process $X_{a,b}(t) = aW(t) + bt^2$, where $W(t)$ is standard two-sided Brownian motion starting from 0 and $t$ ranges over the line. Let $G_{a,b}$ denote the greatest convex minorant (henceforth GCM) of $X_{a,b}$. The GCM is well characterized. It is a piecewise linear function that touches the path of $X_{a,b}$ at the points where it changes slope, and it changes slope only finitely many times in any compact interval. The right derivative process of $G_{a,b}$ is denoted by $g_{a,b}$. Similarly, $g^0_{a,b}$ is the process of right derivatives of $G^0_{a,b}$, the process of constrained one-sided GCMs of $X_{a,b}(t)$. $G^0_{a,b}$ is defined piecewise:
- for $t \ge 0$, it is the GCM of $X_{a,b}(t)$ over $t \ge 0$, subject to the constraint that its slopes do not fall below 0;
- for $t < 0$, it is the GCM of $X_{a,b}(t)$ over $t < 0$, subject to the constraint that its slopes do not exceed 0.

See GROENEBOOM (1983), GROENEBOOM (1989), BANERJEE (2000) and BANERJEE AND WELLNER (2001) for more detailed descriptions of these processes.
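The limits below are free of $a$ and $b$ because of Brownian scaling. This is the case $\phi \equiv 0$ of the distributional equality stated as Lemma 9.1 in Section 9, and it can be checked in a few lines. Substitute $t = (a/b)^{2/3}s$ and use $W(\lambda s) =_d \lambda^{1/2}W(s)$ together with $b\,(a/b)^{4/3} = a\,(a/b)^{1/3}$:
$$X_{a,b}\big((a/b)^{2/3}s\big) = aW\big((a/b)^{2/3}s\big) + b\,(a/b)^{4/3}s^2 =_d a\,(a/b)^{1/3}\big(W(s) + s^2\big) = a\,(a/b)^{1/3}\,X_{1,1}(s).$$
Taking GCMs and differentiating with respect to $s$ gives $(a/b)^{2/3}\,g_{a,b}\big((a/b)^{2/3}s\big) =_d a\,(a/b)^{1/3}\,g_{1,1}(s)$, that is,
$$g_{a,b}(t) =_d a\,(b/a)^{1/3}\,g_{1,1}\big((b/a)^{2/3}t\big).$$
The sign constraints at 0 are unaffected by this positive rescaling, so the same holds for $g^0_{a,b}$. Consequently
$$\int \{(g_{a,b}(t))^2 - (g^0_{a,b}(t))^2\}\,dt =_d a^2\,(b/a)^{2/3}(a/b)^{2/3}\int \{(g_{1,1}(s))^2 - (g^0_{1,1}(s))^2\}\,ds = a^2\int \{(g_{1,1}(s))^2 - (g^0_{1,1}(s))^2\}\,ds.$$
Suppose, as an assumption here, that $a = \sqrt{\theta_0(1 - \theta_0)/g(t_0)}$ and $b = f(t_0)/2$ in the interval censoring model. Then $a\,(b/a)^{1/3} = \big(\theta_0(1 - \theta_0) f(t_0)/(2g(t_0))\big)^{1/3}$, which is the constant in the limit of $n^{1/3}(\hat F_n(t_0) - F(t_0))$ used for the Wald statistic.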
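Theorem 3.1 Under the null hypothesis $F(t_0) = \theta_0$ and under Assumption A,
$$2\log\lambda_n \to_d \int \{(g_{1,1}(z))^2 - (g^0_{1,1}(z))^2\}\,dz \equiv \mathbb{D},$$
$$W_n \to_d g_{1,1}(0)^2 =_d 4Z^2,$$
$$T_{n,2} \to_d \int \{g_{1,1}(z) - g^0_{1,1}(z)\}^2\,dz \equiv \mathbb{T}.$$
Furthermore, $T_{n,1}$ and $T_{n,3}$ are asymptotically equivalent to $T_{n,2}$ under $H_0$.

The next theorem characterizes the behavior of the one-sided score statistics $S_{n,1}$ and $S_{n,2}$ under the null hypothesis and relates them to the statistics of interest.

Theorem 3.2 Under the null hypothesis $F(t_0) = \theta_0$ and under Assumption A,
$$S_{n,1} = -\frac{g(t_0)}{\theta_0(1 - \theta_0)}\int_{\tilde D_n} Y_n(z)\,(X_n(z) - Y_n(z))\,dz + o_p(1) \to_d -\int g^0_{1,1}(z)\,\big(g_{1,1}(z) - g^0_{1,1}(z)\big)\,dz,$$
while
$$S_{n,2} = \frac{g(t_0)}{\theta_0(1 - \theta_0)}\int_{\tilde D_n} X_n(z)\,(X_n(z) - Y_n(z))\,dz + o_p(1) \to_d \int g_{1,1}(z)\,\big(g_{1,1}(z) - g^0_{1,1}(z)\big)\,dz,$$
and
$$2\log\lambda_n = S_{n,2} - S_{n,1} + o_p(1), \qquad T_{n,2} = S_{n,2} + S_{n,1} + o_p(1).$$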
The proofs of Theorems 3.1 and 3.2 are given in Section 9. We now briefly discuss the limiting distributions of the three statistics under the null hypothesis. The distributions of $\mathbb{D}$ and $\mathbb{T}$ are not yet known. A rigorous analytical characterization will undoubtedly use the ideas in GROENEBOOM (1983) and GROENEBOOM (1989). Their distributions can be approximated by constructing discrete approximations to Brownian motion on a fine grid over a sufficiently large compact set; see BANERJEE (2000) or BANERJEE AND WELLNER (2001) for a discussion of this method. BANERJEE AND WELLNER (2001) give estimates of selected quantiles of $\mathbb{D}$, based on a sample of size $3 \times 10^4$, together with their variability. The distribution of $g_{1,1}(0)^2 =_d 4Z^2$ has essentially been characterized analytically. GROENEBOOM (1985) and GROENEBOOM (1989) characterized the distribution of $Z$, and GROENEBOOM AND WELLNER (2001) recently computed its distribution function, quantiles and various functionals. The density of $Z$ can be written as
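The discrete-approximation method can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code:
- Brownian motion is simulated on the grid $h\{-m, \ldots, m\}$ with $h = K/m$, where `K` and `m` are hypothetical tuning choices and a finite `K` truncates the line.
- GCM slopes come from a lower convex hull.
- The constrained slopes are obtained by clipping the one-sided GCM slopes at 0 (minimum with 0 on the left, maximum with 0 on the right). This is the usual property of bounded isotonic regression, assumed here rather than proved.

The `drift` argument defaults to $t^2$. Passing $t^2 + \phi(t)$ gives the process used later under local alternatives.

```python
import numpy as np


def gcm_slopes(t, x):
    """Slopes of the greatest convex minorant (lower convex hull) of the
    points (t[i], x[i]), t increasing; entry i is the slope on [t[i], t[i+1]]."""
    hull = [0]
    for i in range(1, len(t)):
        # Drop the last hull point while it lies on or above the chord
        # from the previous hull point to the new point i.
        while len(hull) >= 2:
            j, k = hull[-2], hull[-1]
            if (x[k] - x[j]) * (t[i] - t[j]) >= (x[i] - x[j]) * (t[k] - t[j]):
                hull.pop()
            else:
                break
        hull.append(i)
    slopes = np.empty(len(t) - 1)
    for left, right in zip(hull[:-1], hull[1:]):
        slopes[left:right] = (x[right] - x[left]) / (t[right] - t[left])
    return slopes


def limit_statistics(rng, K=3.0, m=1000, drift=lambda t: t ** 2):
    """One draw of grid approximations to (D, T, g_{1,1}(0)^2).

    The grid is t = -K, ..., 0, ..., K with spacing h = K / m, so 0 is a grid
    point. drift(t) = t**2 gives the null process W(t) + t^2; drift must
    vanish at 0.
    """
    h = K / m
    t = h * np.arange(-m, m + 1)
    steps = rng.normal(scale=np.sqrt(h), size=2 * m)
    # Two-sided Brownian motion with W(0) = 0, built outward from the origin.
    w_right = np.concatenate(([0.0], np.cumsum(steps[m:])))   # W(0), W(h), ...
    w_left = np.concatenate(([0.0], np.cumsum(steps[:m])))    # W(0), W(-h), ...
    x = np.concatenate((w_left[:0:-1], w_right)) + drift(t)

    g = gcm_slopes(t, x)                            # two-sided, unconstrained
    g_left = gcm_slopes(t[: m + 1], x[: m + 1])     # one-sided, t <= 0
    g_right = gcm_slopes(t[m:], x[m:])              # one-sided, t >= 0
    # Constrained slopes: at most 0 left of the origin, at least 0 right of it.
    g0 = np.concatenate((np.minimum(g_left, 0.0), np.maximum(g_right, 0.0)))

    D = h * np.sum(g ** 2 - g0 ** 2)                # Riemann sum for D
    T = h * np.sum((g - g0) ** 2)                   # Riemann sum for T
    wald = g[m] ** 2                                # right derivative at 0, squared
    return D, T, wald


if __name__ == "__main__":
    rng = np.random.default_rng(2002)
    draws = np.array([limit_statistics(rng) for _ in range(200)])
    print("estimated .95 quantiles of D, T, g(0)^2:",
          np.quantile(draws, 0.95, axis=0))
```

The two-sided slopes dominate the right one-sided slopes and are dominated by the left one-sided slopes. As a result, each draw satisfies $\mathbb{D} \ge \mathbb{T}$ exactly, up to rounding, which mirrors Proposition 3.1 below.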
$$h(z) = \frac{1}{2}\,g(z)\,g(-z), \qquad (3.1)$$
where the function $g(z)$ is explicitly defined in GROENEBOOM (1985) and GROENEBOOM (1989). Note that the density of $Z$ is symmetric around 0. Since $g_{1,1}(0)^2 =_d 4Z^2$, the density of $g_{1,1}(0)^2$ is easily written as
$$f(u) = \frac{1}{2\sqrt{u}}\,h\!\left(\frac{\sqrt{u}}{2}\right) = \frac{1}{4\sqrt{u}}\,g\!\left(\frac{\sqrt{u}}{2}\right) g\!\left(-\frac{\sqrt{u}}{2}\right), \qquad u > 0.$$
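The last display follows from a one-line change of variables that uses only the symmetry of $h$:
$$P(4Z^2 \le u) = P\left(|Z| \le \tfrac{\sqrt{u}}{2}\right) = 2\int_0^{\sqrt{u}/2} h(z)\,dz, \qquad \frac{d}{du}\,2\int_0^{\sqrt{u}/2} h(z)\,dz = 2\,h\!\left(\tfrac{\sqrt{u}}{2}\right)\cdot\frac{1}{4\sqrt{u}} = \frac{1}{2\sqrt{u}}\,h\!\left(\tfrac{\sqrt{u}}{2}\right).$$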
An easy adaptation of the Mathematica code in Section 6 of GROENEBOOM AND WELLNER (2001) yields the density, distribution function, quantiles and moments of $4Z^2$. Table 1 gives quantiles of $4Z^2 = g_{1,1}(0)^2$ computed this way, along with estimated quantiles of $\mathbb{D}$ and $\mathbb{T}$ obtained from discrete approximations to Brownian motion and Monte Carlo. Figure 1 plots the corresponding distribution functions. The distributions of $\mathbb{D}$ and $\mathbb{T}$ have not been characterized analytically, but we do have a stochastic ordering result: $\mathbb{D}$ is stochastically larger than $\mathbb{T}$.

Proposition 3.1
$$\mathbb{T} = \int \big(g_{1,1}(z) - g^0_{1,1}(z)\big)^2\,dz \le \int \big((g_{1,1}(z))^2 - (g^0_{1,1}(z))^2\big)\,dz = \mathbb{D}.$$

Proof. To see this, note that
$$\mathbb{D} - \mathbb{T} = 2\int g^0_{1,1}(z)\,\big(g_{1,1}(z) - g^0_{1,1}(z)\big)\,dz. \qquad (3.2)$$
The integrand vanishes outside the difference set $\{z : g_{1,1}(z) \ne g^0_{1,1}(z)\}$, and also wherever $g^0_{1,1}(z) = 0$.

Take any $z > 0$ in the difference set with $g^0_{1,1}(z) \ne 0$. There $g^0_{1,1}(z)$ coincides with the slope of the unconstrained one-sided convex minorant of the path to the right of 0. At any point $z > 0$, the slope of the two-sided convex minorant is at least as large as that one-sided slope, so $g_{1,1}(z) \ge g^0_{1,1}(z) > 0$. Hence $g^0_{1,1}(z)\,(g_{1,1}(z) - g^0_{1,1}(z)) \ge 0$. A similar argument applies to $z < 0$ in the difference set. Consequently,
$$g^0_{1,1}(z)\,\big(g_{1,1}(z) - g^0_{1,1}(z)\big) \ge 0$$
for every $z$, and (3.2) gives $\mathbb{D} - \mathbb{T} \ge 0$, as required.

The two random variables are not identical. With positive probability, the difference of their integrands,
$$2\,g^0_{1,1}(z)\,\big(g_{1,1}(z) - g^0_{1,1}(z)\big),$$
is strictly positive over a set of non-zero Lebesgue measure. □
Figure 1 shows the empirical distributions of $\mathbb{D}$, $\mathbb{T}$ and $4Z^2$ (based on the finite-grid approximations) and the distribution function of $\chi^2_1$. The picture clearly suggests a stochastic ordering among three of these four random variables. We conjecture that
$$\mathbb{T} < \mathbb{D} < 4Z^2,$$
where $<$ denotes stochastic ordering. Table 1 lists the (empirical) quantiles of $\mathbb{D}$ and $\mathbb{T}$ and the quantiles of $4Z^2$ and $\chi^2_1$. Over most of the range, the distribution function of $\chi^2_1$ lies above that of $4Z^2$. However, by the work of GROENEBOOM (1989), the tails of the density of $Z$ decay as $\exp(-cz^3)$, so the tail of $4Z^2$ decays as $\exp(-c' z^{3/2})$. The tail of $\chi^2_1$ decays only as $\exp(-z/2)$, so the two distribution functions must eventually cross. As Table 1 shows, the crossing occurs a little before the .99 quantile.
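Here is a rough version of the tail comparison. The tail bound for $Z$ is used only up to constants and polynomial factors, so $\approx$ below means that the exponential rates agree:
$$P(4Z^2 > z) = P\left(|Z| > \tfrac{\sqrt{z}}{2}\right) \approx \exp\!\left(-c\left(\tfrac{\sqrt{z}}{2}\right)^{3}\right) = \exp\!\left(-\tfrac{c}{8}\,z^{3/2}\right), \qquad P(\chi^2_1 > z) \approx \exp(-z/2).$$
Since $z^{3/2}/z \to \infty$, the tail of $4Z^2$ is eventually the lighter one, so its upper quantiles must eventually fall below those of $\chi^2_1$. At $p = .99$, Table 1 gives 6.62198 for $4Z^2$ against 6.6349 for $\chi^2_1$.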
Table 1: Estimated quantiles of the limit distributions under the null.

p      𝕋          𝔻          χ²₁        4Z²
.05    0.002728   0.002419   0.003932   0.004353
.10    0.009590   0.009847   0.015791   0.017475
.15    0.020313   0.022302   0.035766   0.039566
.20    0.034385   0.040279   0.064185   0.070965
.25    0.053023   0.063874   0.101531   0.112176
.30    0.076861   0.092615   0.148472   0.163893
.35    0.104505   0.128481   0.205900   0.227038
.40    0.136222   0.173011   0.274996   0.302833
.45    0.169403   0.224970   0.357317   0.392880
.50    0.214541   0.284706   0.454936   0.499308
.55    0.268220   0.351822   0.570652   0.624969
.60    0.335973   0.432697   0.708326   0.773794
.65    0.398911   0.527391   0.873457   0.951336
.70    0.499376   0.650435   1.074194   1.165780
.75    0.607728   0.802636   1.323304   1.429840
.80    0.734048   0.986587   1.642374   1.764830
.85    0.893789   1.230060   2.072251   2.210680
.90    1.128790   1.610734   2.705543   2.856650
.95    1.599501   2.286922   3.841459   3.985460
.99    2.758688   3.865057   6.6349     6.62198
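Unlike the other columns, the $\chi^2_1$ column of Table 1 needs no simulation. If $N \sim N(0,1)$, then $P(N^2 \le q) = 2\Phi(\sqrt{q}) - 1$, so the $p$-quantile is the square of the standard normal $(1+p)/2$ quantile. The sketch below uses only the Python standard library. The $4Z^2$ column cannot be reproduced this way, because it requires Groeneboom's function $g$ from (3.1).

```python
from statistics import NormalDist


def chi2_1_quantile(p):
    """p-quantile of the chi-square distribution with 1 degree of freedom,
    computed as the square of the standard normal (1 + p)/2 quantile."""
    return NormalDist().inv_cdf((1 + p) / 2) ** 2


if __name__ == "__main__":
    for p in (0.05, 0.50, 0.90, 0.95, 0.99):
        print(p, round(chi2_1_quantile(p), 6))
```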
Figure 1: The three different limit distributions under the null and $\chi^2_1$.

4 Limit Theory Under Local Alternatives
Our next theorem concerns the behavior of the three statistics under contiguous alternatives. We briefly recall the class of contiguous alternatives considered in BANERJEE AND WELLNER (2001). Suppose that $\{F_n\}$ is a sequence of continuous distribution functions satisfying the following conditions:
B(1). For some $c > 0$, $F_n(t) = F(t)$ for all $t$ with $|t - t_0| \ge c n^{-1/3}$.
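B(2). The functions $A_n(z) = n^{1/3}\big(F_n(t_0 + z n^{-1/3}) - F(t_0 + z n^{-1/3})\big)$ satisfy
$$A_n(z) \to B(z) \equiv f(t_0)K(z)$$
uniformly for $z \in [-c, c]$. (Thus $B$ and $K$ are continuous functions, and both vanish outside $[-c, c]$.) It is shown in Theorem 2.6 of BANERJEE AND WELLNER (2001) that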
On the other hand, $g^0_{1,1,\phi}$, the slope of the constrained one-sided GCMs of $W(t) + t^2 + \phi(t)$, is the MLE of $f(t)$ under the (false) null hypothesis that $f(0) = 0$. Figure 8 shows the function $2t + \phi'(t)$, drawn in blue where it differs from $2t$, together with $g^0_{1,1,\phi}$, drawn in magenta where it differs from $g_{1,1,\phi}$.

Simulation studies: We now study the power of the likelihood ratio test, the three (asymptotically equivalent) score statistics and the Wald statistic. We vary the constants $a$, $b$ and $c$ and the sample size $n$, and compare the results with the limiting power predicted by theory. The difficulty for Monte Carlo experiments is that the quantile functions $F_n^{-1}(u)$ corresponding to the $B_n$'s defined in (6.1) do not have nice closed forms, so generating from the distribution function $F_n$ can be complicated. We do know, however, that $F_n(t_0 \pm c n^{-1/3}) = F(t_0 \pm c n^{-1/3})$.
Figure 3: Unconstrained, one-sided and constrained minorants of the limiting process under the null

Figure 5: Slopes of unconstrained and constrained minorants under the null: close-up view

Figure 7: Unconstrained, one-sided and constrained minorants under contiguous alternatives
The distribution functions $G_n$ agree with $F$ and $F_n$ outside the interval $[t_0 - cn^{-1/3}, t_0 + cn^{-1/3}]$ and are piecewise linear inside it. More precisely, $G_n(t) = F(t)$ for $t \le t_0 - cn^{-1/3}$ or $t \ge t_0 + cn^{-1/3}$, and $G_n$ is linear on each of the intervals $t_0 - cn^{-1/3} \le t \le t_0$ and $t_0 \le t \le t_0 + cn^{-1/3}$.
It is also quite easy to generate observations from $G_n$, since $G_n^{-1}$ has an explicit expression whenever $F^{-1}$ does. We have the following proposition.
Proposition: Let $C_n(z) \equiv n^{1/3}\big(G_n(t_0 + n^{-1/3}z) - F(t_0 + n^{-1/3}z)\big)$ for $-c \le z \le c$. Then $C_n$ converges uniformly on $[-c, c]$ to $B(z)$, where $B(z)$ is as defined in (6.1). Consequently, the alternatives $\{G_n\}$ are contiguous.
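Inverse-CDF sampling from such a $G_n$ can be sketched as follows. $G_n$ equals $F$ outside the window and is linear on $[t_0 - cn^{-1/3}, t_0]$ and $[t_0, t_0 + cn^{-1/3}]$. The section does not give the value of $G_n$ at $t_0$, so the knot value `v` below is an assumption; a natural choice would be $F_n(t_0)$. The factory name and the example numbers are hypothetical. $F$ and $F^{-1}$ are supplied as callables.

```python
import numpy as np


def make_piecewise_linear_quantile(F, F_inv, t0, width, v):
    """Quantile function of a distribution function G that equals F outside
    [t0 - width, t0 + width] and, inside, linearly interpolates the points
    (t0 - width, F(t0 - width)), (t0, v), (t0 + width, F(t0 + width)).
    Assumes F(t0 - width) < v < F(t0 + width)."""
    lo_t = t0 - width
    lo_u, hi_u = F(lo_t), F(t0 + width)

    def G_inv(u):
        u = np.asarray(u, dtype=float)
        out = F_inv(u)                               # G = F outside the window
        on_left = (u > lo_u) & (u <= v)             # inverts the left linear piece
        on_right = (u > v) & (u < hi_u)             # inverts the right linear piece
        out = np.where(on_left, lo_t + (u - lo_u) / (v - lo_u) * width, out)
        out = np.where(on_right, t0 + (u - v) / (hi_u - v) * width, out)
        return out

    return G_inv


if __name__ == "__main__":
    # Hypothetical example: F = U(0, 1), t0 = 0.5, n = 1000, c = 1, and a
    # knot value v = 0.55 standing in for G_n(t0).
    n, c, t0 = 1000, 1.0, 0.5
    G_inv = make_piecewise_linear_quantile(lambda t: np.clip(t, 0, 1),
                                           lambda u: u, t0, c * n ** (-1 / 3),
                                           v=0.55)
    rng = np.random.default_rng(0)
    print(G_inv(rng.uniform(size=5)))                # draws from G
```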
The straightforward proof is omitted. We generated the likelihood ratio statistic for contiguous alternatives of this kind, for several settings of the underlying parameters and several values of $n$. We also generated all three score statistics and the Wald statistic. Recall that, by contiguity, the three versions of the score statistic are all asymptotically equivalent. The Wald statistic was computed from the actual (known) values $f(t_0)$ and $g(t_0)$ of the densities; in a real-life situation these would need to be estimated. The Wald statistic we computed is therefore in some sense an "ideal" version. Under the sequence of contiguous alternatives $F_n$ (and also $G_n$), the likelihood ratio statistic has the asymptotic distribution of the random variable
$$\mathbb{D}_\phi = \int \big((g_{1,1,\phi}(z))^2 - (g^0_{1,1,\phi}(z))^2\big)\,dz.$$
Each of the score statistics has the asymptotic distribution of
$$\mathbb{T}_\phi = \int \big(g_{1,1,\phi}(z) - g^0_{1,1,\phi}(z)\big)^2\,dz,$$
and the Wald statistic has the asymptotic distribution of $g_{1,1,\phi}(0)^2$.
Recall from (4.1) that $\phi(t) = (b/a)^{4/3}\,\Psi_0\big((b/a)^{-2/3}\,t\big)$, where $\Psi_0$ depends on $c$ and on a number $\eta$. The parameter settings were as follows:
(i) $F \sim U(0,1)$, $G \sim U(0,1)$, $t_0 = 0.5$, so that $F(t_0) = 0.5$. The value of $b/a$ in this situation is 1. The values of $c$ chosen were 0, 0.5, 1 and 2. The value of $\eta$ was taken to be 0.9.
(ii) $F \sim U(0, 1/\sqrt{2})$, $G \sim U(0, 0.5)$, $t_0 = 1/(2\sqrt{2})$, so that $F(t_0) = 0.5$. The value of $b/a$ in this situation is 2. The values of $c$ chosen were 0, 0.5, 1 and 2. The value of $\eta$ was taken to be 0.9.
(iii) $F$
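The stated values of $b/a$ for settings (i) and (ii) can be checked directly. We assume $a = \sqrt{F(t_0)(1 - F(t_0))/g(t_0)}$ and $b = f(t_0)/2$ for the interval censoring model; these forms are consistent with the normalizing constant of the Wald statistic in Section 2 but are not stated in this section. Under these assumptions, and with $F(t_0) = 0.5$, $b/a$ reduces to $f(t_0)\sqrt{g(t_0)}$. The function name is hypothetical.

```python
import math


def b_over_a(F_t0, f_t0, g_t0):
    """b/a for the current status model, assuming
    a = sqrt(F(t0)(1 - F(t0)) / g(t0)) and b = f(t0) / 2."""
    a = math.sqrt(F_t0 * (1 - F_t0) / g_t0)
    b = f_t0 / 2
    return b / a


if __name__ == "__main__":
    # Setting (i): F = G = U(0, 1), t0 = 0.5, so f(t0) = g(t0) = 1.
    print(b_over_a(0.5, 1.0, 1.0))
    # Setting (ii): F = U(0, 1/sqrt(2)), G = U(0, 0.5), so f(t0) = sqrt(2), g(t0) = 2.
    print(b_over_a(0.5, math.sqrt(2), 2.0))
```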
The sample sizes chosen were $n$ = 100, 300, 500, 800, 1000, 2000, 4000 and 8000. The exception is the setting $b/a = c = 2$, where 100 was replaced by 200 to avoid some computational issues. There are twelve settings of the parameters $(b/a, c)$ in all. For each value of $n$ and each setting, a sample of size 2000 (for $c > 0$) or 5000 (for $c = 0$) was generated from the distributions of the likelihood ratio statistic, the three versions of the score statistic and the Wald statistic. The power of each statistic at level $\alpha = 0.05$ was then computed as the proportion of values exceeding the $(1-\alpha)$th quantile of the corresponding limit distribution under the null.

Finally, for each parameter setting with $c \ne 0$ ($c = 0$ corresponds to the null hypothesis), the asymptotic power of each statistic was obtained from a sample of size 2000 from (discrete approximations to) the limit distribution under that setting. These samples were produced as follows:
1. generate discrete approximations to Brownian motion on a grid over a sufficiently large compact set;
2. add the function $t^2 + \phi(t)$ to the Brownian motion path;
3. differentiate the unconstrained and constrained minorants.

The results of these simulations are presented below. They allow us to compare the power of the three competing types of statistics under contiguous alternatives.
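The power computation described above reduces to an exceedance proportion. Its binomial standard error gives a rough yardstick for reading the tables that follow. The sketch below assumes simulated statistics are available as an array; the function name is hypothetical.

```python
import numpy as np


def empirical_power(stats, null_quantile):
    """Proportion of simulated statistics exceeding the null critical value,
    with its binomial Monte Carlo standard error."""
    stats = np.asarray(stats, dtype=float)
    power = float(np.mean(stats > null_quantile))
    se = float(np.sqrt(power * (1 - power) / stats.size))
    return power, se


if __name__ == "__main__":
    # Monte Carlo standard error of an estimated size near 0.05.
    for reps in (2000, 5000):
        print(reps, np.sqrt(0.05 * 0.95 / reps))
```

At a true size of 0.05, the standard error is about 0.0049 with 2000 replications and about 0.0031 with 5000 replications.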
Table 2: Power at level 0.05 with b/a = 0.5, c = 0.0.

n        2logλn   Tn,1     Tn,2     Tn,3     Wn
100      0.0552   0.0518   0.0724   0.0528   0.0832
300      0.0514   0.0546   0.0640   0.0546   0.0642
500      0.0526   0.0558   0.0618   0.0560   0.0616
800      0.0554   0.0616   0.0638   0.0616   0.0618
1000     0.0484   0.0526   0.0564   0.0530   0.0532
2000     0.0534   0.0628   0.0648   0.0628
8000     0.0528   0.0544   0.0558   0.0544   0.0614
∞        .05      .05      .05      .05      .05
Table 3: Power at level 0.05 with b/a = 0.5, c = 0.5.

n        2logλn   Tn,1     Tn,2     Tn,3     Wn
500      0.0490   0.0535   0.0575   0.0540   0.0760
800      0.0565   0.0545   0.0595   0.0545   0.0610
1000     0.053    0.056    0.059    0.057    0.060
2000     0.0535   0.0575   0.0615   0.0580   0.0590
4000     0.0545   0.0530   0.0540   0.0530   0.0565
∞        .0575    .061     .061     .061     .0535

Table 4: Power at level 0.05 with b/a = 0.5, c = 1.0.
Table 5: Power at level 0.05 with b/a = 0.5, c = 2.0.

n        2logλn   Tn,1     Tn,2     Tn,3     Wn
800      0.4340   0.4450   0.4630   0.4450   0.4135
1000     0.4255   0.4430            0.4440   0.4170
2000     0.4335   0.4505   0.4595   0.4515   0.3925
4000     0.4230   0.4315   0.4365   0.4315   0.3925
8000     0.4115   0.4280   0.4315   0.4285   0.3910
∞        .4045    .4120    .4120    .4120    .3945
Table 6: Power at level 0.05 with b/a = 1, c = 0.

n        2logλn   Tn,1     Tn,2     Tn,3     Wn
100      0.0546   0.0562   0.0758   0.0570   0.0860
300      0.0524   0.0562   0.0640   0.0568   0.0700
500      0.0536   0.0580   0.0632   0.0584   0.0648
800      0.0560   0.0590   0.0640   0.0592   0.0648
1000     0.0536   0.0520   0.0548   0.0522   0.0592
2000     0.0540   0.0566   0.0590   0.0566   0.0622
4000     0.0504   0.0562   0.0582   0.0562   0.0538
8000     0.0546   0.0580   0.0594   0.0580   0.0564
∞        .05      .05      .05      .05      .05

Table 7: Power at level 0.05 with b/a = 1, c = 0.5.

n        2logλn   Tn,1     Tn,2     Tn,3     Wn
100      0.0795   0.0740   0.0960   0.0755   0.1105
300      0.0775   0.0820   0.0925   0.0820   0.0800
500      0.0820   0.0785   0.0840   0.0785   0.0880
800      0.0805   0.0835   0.0900   0.0840   0.0855
1000     0.077    0.075    0.079    0.075    0.080
2000     0.0705   0.0750   0.0800   0.0755   0.0700
4000     0.0690   0.0735   0.0755   0.0735   0.0645
8000     0.0725   0.0725   0.0730   0.0725   0.0725
∞        .0730    .0705    .0705    .0705    .066

Table 8: Power at level 0.05 with b/a = 1, c = 1.0.

n        2logλn   Tn,1     Tn,2     Tn,3     Wn
800      0.2600   0.2605   0.2720   0.2610   0.2500
1000     0.2610   0.2750   0.2835   0.2750   0.2505
2000     0.2535   0.2705   0.2760   0.2715   0.2375
4000     0.2560   0.2485   0.25     0.24     0.2275
∞        .261     .2565    .2565    .2565    .2350

Table 9: Power at level 0.05 with b/a = 1.0, c = 2.0.
Table 10: Power at level 0.05 with b/a = 2, c = 0.

n        2logλn   Tn,1     Tn,2     Tn,3     Wn
100      0.0538   0.0590   0.0736   0.0592   0.0850
300      0.0486   0.0534   0.0598   0.053    0.067
800      0.0470   0.0524   0.0550
1000     0.0474   0.0516   0.0546   0.0516   0.0508
2000     0.0468   0.0506   0.0528   0.0508   0.0554
4000     0.0490   0.0522   0.0532   0.0522   0.0490
8000     0.0470   0.0510   0.0524   0.0510   0.0566
∞        .05      .05      .05      .05      .05

Table 11: Power at level 0.05 with b/a = 2, c = 0.5.

n        2logλn   Tn,1     Tn,2     Tn,3     Wn
100      0.1580   0.1675   0.1980   0.1705   0.1890
300               0.1635   0.1750   0.1625   0.1545
500      0.1585   0.1635   0.1785   0.1635   0.1615
800      0.1580   0.1540   0.1620   0.1545   0.1505
1000     0.1550   0.1510   0.1565   0.1510   0.1480
2000     0.1340   0.1435   0.1495   0.1435   0.1380
4000     0.1550   0.1485   0.1505   0.1485   0.1405
8000     0.15     0.155    0.1575   0.1555   0.1430
∞        .1515    .143     .143     .143     .1435

Table 12: Power at level 0.05 with b/a = 2, c = 1.0.

n        2logλn   Tn,1     Tn,2     Tn,3     Wn
100      0.6845   0.7155   0.7615   0.7190   0.7225
300      0.6625   0.6955   0.7215   0.6965   0.6775
500      0.6710   0.6970   0.7145   0.6980   0.6670
800      0.6655   0.6900   0.6990   0.6900   0.6585
1000     0.6630   0.6980   0.7035   0.6980   0.6550
2000     0.6440   0.6730   0.6790   0.6730   0.647
4000     0.6370   0.6700   0.6735   0.6700   0.6270
8000     0.6455   0.6905   0.6935   0.6905   0.6375
∞        .6615    .6915    .6915    .6915    .6285

Table 13: Power at level 0.05 with b/a = 2.0, c = 2.0.

n        2logλn   Tn,1     Tn,2     Tn,3     Wn
200      1        1        1        1        1
300      1.0000   1.0000   1.0000   1.0000   0.9995
500      1        1        1        1        1
800      1.0000   1.0000   1.0000   1.0000   0.9995
1000     0.9995   0.9995   0.9995   0.9995   0.9980
2000     0.9990   0.9995   0.9995   0.9995   0.9980
4000     1.0000   1.0000   1.0000   1.0000   0.9985
8000     0.9990   0.9990   0.9990            0.9985
∞        1.0      1.0                        .999
Table 14: Power at level 0.1 under a fixed alternative for various sample sizes.

n        2logλn   Tn,1     Tn,2     Tn,3     Wn
300      0.6850   0.7165   0.7345   0.7175   0.7115
500      0.8145   0.8400   0.8540   0.8400   0.8225
800      0.919    0.935    0.937    0.935    0.919
1000     0.9505   0.9630   0.9670   0.9635   0.9500
2000     0.9970   0.9975   0.9975   0.9975   0.9960
∞        1        1        1        1        1
Figure 9: The limiting distribution of the LRS and the finite sample distributions for n = 100 and n = 1000 under a contiguous sequence with b/a = 0.5 and c = 2
If $f'(t_0) < 0$, then
$$n^{1/3}(\hat f_n(t_0) - f(t_0)) \to_d |4 f(t_0) f'(t_0)|^{1/3}\, Z,$$
as shown by PRAKASA RAO (1969). Let $\hat f_n^0$ denote the MLE of $f$ under the constraint that $f(t_0) = \theta_0$. We can then ask the same questions as in the interval censoring problem: what is the limit distribution of the likelihood ratio statistic, and is it the same as in the interval censoring problem? Other models of interest in the same genre, for each of which the same question can be asked, include:
- maximum likelihood estimation of a decreasing density, as above but based on right-censored observations, treated in HUANG AND ZHANG (1994);
- maximum likelihood estimation of a monotone instantaneous hazard, treated in HUANG AND WELLNER (1995);
- the panel count data model studied in WELLNER AND ZHANG (2000).

To understand the limiting distribution of the likelihood ratio statistic, one first needs a characterization of the asymptotic behavior of $\hat\psi_n$ and $\hat\psi_n^0$ (reverting to the notation of Sections 1 and 2). Localized versions of the MLEs are defined as in the interval censoring problem; thus, we set
$$U_n(z) = n^{1/3}\big(\hat\psi_n(t_0 + z n^{-1/3}) - \psi(t_0)\big) \quad\text{and}\quad V_n(z) = n^{1/3}\big(\hat\psi_n^0(t_0 + z n^{-1/3}) - \psi(t_0)\big).$$
In each of these examples, $\hat\psi_n$ is obtained as the slope of the greatest convex minorant (or least concave majorant) of a random object constructed from the data. $\hat\psi_n^0$ is obtained by differentiating appropriately constrained one-sided greatest convex minorants or least concave majorants. The constraints come from the fact that under the null the monotone function takes the value $\theta_0$ at the point $t_0$. We expect that in each of these models the random object whose minorants or majorants are differentiated can be replaced asymptotically by the sample path of $X_{a,b}(t) = aW(t) + bt^2$, where $a, b > 0$ are constants specific to the problem at hand. We also expect the localized and normalized MLEs $(U_n, V_n)$ to converge, not only finite-dimensionally but also in the $L_2$ sense, to a constant times $(g_{a,b}, g^0_{a,b})$. Finally, $2\log\lambda_n$ should prove to be asymptotically equivalent to a constant times $\int \{(U_n(z))^2 - (V_n(z))^2\}\,dz$. It would therefore converge to a constant times $\int \{(g_{a,b}(z))^2 - (g^0_{a,b}(z))^2\}\,dz$, which by Brownian scaling should equal $\mathbb{D}$ in distribution, as in the interval censoring problem. One reason to expect this asymptotic equivalence with the interval censoring problem is the "monotone function estimation in white noise" model discussed in WELLNER, which can be thought of as an idealized version of the various models in this class; we refer the reader to WELLNER for details. The $L_2$ statistic extends to other monotone function models as
$$T_n = \sum_{i=1}^n w_n(T_i)\,\big(\hat\psi_n(T_i) - \hat\psi_n^0(T_i)\big)^2,$$
where the $T_i$'s are the observed times in the model and $w_n(T_i)$ is an appropriate (possibly constant) weight. With appropriate weighting, the $L_2$ statistic is expected to behave as in the interval censoring problem, i.e., to converge in distribution to $\mathbb{T}$. The limiting distribution of the Wald statistic in a general monotone function model is already well understood, at least under the null hypothesis, because the asymptotic distribution of the MLE of a monotone function at a fixed point is known. The following general lemma gives the limit distribution of the Wald statistic for a general monotone function model.
Lemma 8.1 Consider a non-regular model of the type described in Section 1, so that, at a fixed point $t_0$,
$$n^{1/3}(\hat\psi_n(t_0) - \psi(t_0)) \to_d C(\psi, \xi, t_0)\, g_{1,1}(0).$$
Let $\hat C_n$ be a consistent estimator of the constant $C(\psi, \xi, t_0)$. Then, defining the Wald statistic
$$W_n = \frac{n^{2/3}\big(\hat\psi_n(t_0) - \psi(t_0)\big)^2}{\hat C_n^2},$$
we have $W_n \to_d \big(g_{1,1}(0)\big)^2$.
The proof of the above lemma follows directly from the continuous mapping theorem in conjunction with Slutsky's theorem. The power behavior of these statistics under both local and fixed alternatives in general monotone function models also needs to be investigated. In general, the behavior of the likelihood ratio statistic under a fixed alternative should be characterizable in terms of Kullback-Leibler divergences from the null hypothesis, as suggested by the specific result for interval-censored data and also in light of the derivations in BAHADUR (1971). The construction of general contiguous alternatives for these non-regular models is also of interest; there are two different issues involved here.

(a) Construct a generic sequence of contiguous alternatives for these problems based on the general formulation in Section 1, in the spirit of the alternatives constructed for the interval censoring model.

(b) Construct broader classes of contiguous alternatives for the interval censoring model, by weakening the requirement of uniform convergence of the perturbation functions (the $B_n$'s) and also by allowing the alternatives to vary off shrinking $n^{-1/3}$ neighborhoods.

It is clear that the right rate of convergence of local alternatives in these problems (so that we get convergence to a non-null distribution) is $n^{1/3}$, which matches the convergence rate of maximum likelihood estimators. With a faster rate of convergence of local alternatives (for example $\sqrt{n}$), the likelihood ratio statistic will converge to the null limit (this is not difficult to show); in other words, we do not have power. It is also clear that the [...] will often have to [satisfy] some "intrinsic" conditions [of] the particular model involved.
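For the interval censoring (current status) model, the Wald statistic of Lemma 8.1 can be computed from the NPMLE $\hat F_n$, which is the isotonic regression of the indicators $\Delta_i$ on the ordered observation times. The sketch below is illustrative only and rests on several assumptions. The observation times are assumed distinct. The plug-in estimates `f_hat` and `g_hat` of $f(t_0)$ and $g(t_0)$ are supplied by the user; how to construct them is not addressed here. The normalizing constant is taken to be $(\hat F_n(t_0)(1-\hat F_n(t_0))\hat f_n(t_0)/2\hat g_n(t_0))^{1/3}$, as in the proof of Theorem 3.1. All function names are hypothetical.

```python
import numpy as np


def pava(y):
    """Unweighted isotonic (nondecreasing) least-squares fit by pool-adjacent-violators."""
    means, sizes = [], []
    for v in np.asarray(y, dtype=float):
        means.append(v)
        sizes.append(1)
        # merge adjacent blocks while the monotonicity constraint is violated
        while len(means) >= 2 and means[-2] > means[-1]:
            m2, s2 = means.pop(), sizes.pop()
            m1, s1 = means.pop(), sizes.pop()
            means.append((s1 * m1 + s2 * m2) / (s1 + s2))
            sizes.append(s1 + s2)
    return np.repeat(means, sizes)


def npmle_current_status(t, delta):
    """NPMLE of F at the sorted observation times (assumes distinct times)."""
    order = np.argsort(t)
    return np.asarray(t, dtype=float)[order], pava(np.asarray(delta, dtype=float)[order])


def wald_statistic(t, delta, t0, theta0, f_hat, g_hat):
    """n^{2/3} (F_n(t0) - theta0)^2 / C_n^2 with a plug-in C_n.

    f_hat, g_hat: user-supplied estimates of f(t0) and g(t0) (hypothetical inputs).
    """
    ts, F = npmle_current_status(t, delta)
    # treat the NPMLE as right-continuous: use the fit at the largest time <= t0
    idx = np.searchsorted(ts, t0, side="right") - 1
    F_t0 = F[idx] if idx >= 0 else 0.0
    if not 0.0 < F_t0 < 1.0:
        raise ValueError("estimate at t0 is 0 or 1; the plug-in constant degenerates")
    C_n = (F_t0 * (1.0 - F_t0) * f_hat / (2.0 * g_hat)) ** (1.0 / 3.0)
    return len(ts) ** (2.0 / 3.0) * (F_t0 - theta0) ** 2 / C_n ** 2
```

The limiting law $(g_{1,1}(0))^2$ is not one with standard tables, so critical values would be taken from tabulated or simulated quantiles of that limit.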
9 Proofs
Lemma 9.1 Suppose that $\{X_{n\epsilon}\}$ [...] $> 0$, the following distributional equality holds in the space [...]:
$$\big(g_{a,b,\Psi}(t),\, g^0_{a,b,\Psi}(t)\big) \stackrel{d}{=} \Big(a\,(b/a)^{1/3}\, g_{1,1,\phi}\big((b/a)^{2/3} t\big),\; a\,(b/a)^{1/3}\, g^0_{1,1,\phi}\big((b/a)^{2/3} t\big)\Big),$$
where $\phi$ is as defined in (4.1).

Proof of Lemma 3.3. We establish the distributional equality [...]. The joint distributional equality is established similarly. We will use the following fact (see either BANERJEE AND WELLNER (2001) or BANERJEE (2000)): [...]

We also recall, from the definition of $\phi(t)$ in Section 4, that $\phi\big((b/a)^{2/3} t\big)$ [...] $= (b/a)^{4/3}\,\psi_0(t)$. Now, [...] $= a\,(a/b)^{1/3}\,\big(X_{1,1}(t) + \phi(t)\big)$. Thus [...] $+\; a\,(a/b)^{1/3}\,\phi\big((b/a)^{2/3} t\big)$ [...]. This finishes the proof. □
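For the case $\Psi = 0$ (so that $\phi = 0$), which is the case used in the proof of Theorem 3.1, the constants in the distributional equality can be checked directly by Brownian scaling. The following derivation sketch covers only this case; the general drift $\Psi$ requires the definition of $\phi$ in (4.1).

$$
\begin{aligned}
&\text{Let } c = (a/b)^{2/3}. \text{ Since } \{W(cs)\}_s \stackrel{d}{=} \{c^{1/2}\,W(s)\}_s,\\
&\quad X_{a,b}(cs) = a\,W(cs) + b\,c^2 s^2 \stackrel{d}{=} a\,c^{1/2}\,W(s) + b\,c^2\,s^2,\\
&\quad a\,c^{1/2} = a\,(a/b)^{1/3}, \qquad b\,c^2 = b\,(a/b)^{4/3} = a\,(a/b)^{1/3},\\
&\text{so } \{X_{a,b}(cs)\}_s \stackrel{d}{=} \{a\,(a/b)^{1/3}\,X_{1,1}(s)\}_s.\\
&\text{Taking greatest convex minorants and differentiating with respect to } t = cs:\\
&\quad g_{a,b}(t) \stackrel{d}{=} \frac{a\,(a/b)^{1/3}}{c}\,g_{1,1}(t/c) = a\,(b/a)^{1/3}\,g_{1,1}\big((b/a)^{2/3}\,t\big).
\end{aligned}
$$

The same computation applies to $g^0_{a,b}$. Multiplying values by $a(a/b)^{1/3} > 0$ and rescaling time by $c > 0$ both fix the point $0$, and they fix the constraint value $0$ in local coordinates.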
Proof of Theorem 3.1. The first assertion, $2\log\lambda_n \to_d \mathbb{L}$, is proved in BANERJEE AND WELLNER (2001), Theorem 2.5. To show that $W_n$ converges in distribution to $(2\mathbb{Z})^2$, note that
$$n^{1/3}\big(\hat F_n(t_0) - F(t_0)\big) \to_d \left(\frac{F(t_0)(1-F(t_0))\,f(t_0)}{2g(t_0)}\right)^{1/3} 2\mathbb{Z},$$
and that $\hat F_n(t_0)(1-\hat F_n(t_0))\,\hat f_n(t_0)/2\hat g_n(t_0)$ is a consistent estimator of $F(t_0)(1-F(t_0))\,f(t_0)/2g(t_0)$. Consequently,
$$\frac{n^{1/3}\big(\hat F_n(t_0) - F(t_0)\big)}{\big(\hat F_n(t_0)(1-\hat F_n(t_0))\,\hat f_n(t_0)/2\hat g_n(t_0)\big)^{1/3}} \to_d 2\mathbb{Z}$$
by Slutsky's theorem, and the result follows by the continuous mapping theorem. It remains to deal with $T_{n,2}$. Denote the set where $\hat F_n$ and $\hat F_n^0$ differ by $D_n$, and the transformed difference set corresponding to the local variable $z = n^{1/3}(T - t_0)$ by $\tilde D_n$. We also denote $t_0 + n^{-1/3}z$ by $t_n(z)$. Now,
$$T_{n,2} = [\ldots] = T_{n,2,1} + T_{n,2,2}.$$
But
$$T_{n,2,1} = [\ldots],$$
and this is $o_p(1)$. This is easily seen because the (random) function within the curly brackets can be shown to lie eventually in a uniformly bounded Donsker class of functions with arbitrarily high probability (details of analogous arguments can be found in BANERJEE (2000)). Thus we only need to deal with $T_{n,2,2}$. Now,
$$T_{n,2,2} = nP\big[\ldots\big] = [\ldots]\,dt = [\ldots].$$
The last step in the above string of equalities follows from the one above it. It uses the continuity of $g$ in a neighborhood of $t_0$, the almost sure uniform convergence of [...] to $F$ in a neighborhood of $t_0$, the continuity of [...], the fact that $\tilde D_n$ is, with arbitrarily high probability, eventually contained in a compact set around 0, and the fact that the processes $X_n(z)$ and $Y_n(z)$ are bounded in probability on compact sets. Thus,
$$T_{n,2} = \frac{g(t_0)}{\theta_0(1-\theta_0)}\int_{\tilde D_n}\big(X_n(z) - Y_n(z)\big)^2\,dz + o_p(1).$$
It can be shown by similar steps that the same representation holds for $T_{n,1}$ and $T_{n,3}$ under the null hypothesis. Hence all three statistics are asymptotically equivalent to one another and to
$$\mathbb{T}_n = \frac{g(t_0)}{\theta_0(1-\theta_0)}\int_{\tilde D_n}\big(X_n(z) - Y_n(z)\big)^2\,dz.$$
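In the current status model the statistics being compared can be computed directly from the unconstrained and constrained NPMLEs. The sketch below assumes that $\hat F^0_n$ is obtained from two one-sided isotonic fits of the $\Delta_i$. The fit to the left of $t_0$ is capped above at $\theta_0$, and the fit to the right of $t_0$ is floored below at $\theta_0$. This characterization is stated here as an assumption rather than taken from this section. The sketch computes the unweighted $L_2$ quantity $\sum_i (\hat F_n(T_i) - \hat F^0_n(T_i))^2/(\theta_0(1-\theta_0))$, i.e. $nP_n[(\hat F_n - \hat F^0_n)^2]/(\theta_0(1-\theta_0))$. Heuristically, after the change of variable $t = t_0 + n^{-1/3}z$ this is close to $\mathbb{T}_n$. The exact weights of $T_{n,1}, T_{n,2}, T_{n,3}$ are defined earlier in the paper and may differ. For comparison, the sketch also computes $2\log\lambda_n$. Function names are hypothetical, and distinct observation times and $0 < \theta_0 < 1$ are assumed.

```python
import numpy as np


def pava(y):
    """Unweighted isotonic (nondecreasing) least-squares fit by pool-adjacent-violators."""
    means, sizes = [], []
    for v in np.asarray(y, dtype=float):
        means.append(v)
        sizes.append(1)
        while len(means) >= 2 and means[-2] > means[-1]:
            m2, s2 = means.pop(), sizes.pop()
            m1, s1 = means.pop(), sizes.pop()
            means.append((s1 * m1 + s2 * m2) / (s1 + s2))
            sizes.append(s1 + s2)
    return np.repeat(means, sizes)


def fits(t, delta, t0, theta0):
    """Sorted deltas, the NPMLE, and the (assumed) constrained MLE with F(t0) = theta0."""
    order = np.argsort(t)
    ts = np.asarray(t, dtype=float)[order]
    d = np.asarray(delta, dtype=float)[order]
    left = ts <= t0  # a prefix of the sorted sample
    F0 = np.empty_like(d)
    F0[left] = np.minimum(pava(d[left]), theta0)     # left of t0: capped at theta0
    F0[~left] = np.maximum(pava(d[~left]), theta0)   # right of t0: floored at theta0
    return d, pava(d), F0


def l2_statistic(t, delta, t0, theta0):
    _, F, F0 = fits(t, delta, t0, theta0)
    return float(np.sum((F - F0) ** 2) / (theta0 * (1.0 - theta0)))


def lrt_statistic(t, delta, t0, theta0):
    """2 log lambda_n; terms with zero coefficient (0 log 0) are skipped."""
    d, F, F0 = fits(t, delta, t0, theta0)
    one = d == 1.0
    ll = np.sum(np.log(F[one]) - np.log(F0[one]))
    ll += np.sum(np.log1p(-F[~one]) - np.log1p(-F0[~one]))
    return float(2.0 * ll)
```

On the toy sample `t = [1, 2, 3, 4]`, `delta = [0, 0, 0, 1]` with $t_0 = 2.5$ and $\theta_0 = 0.5$, the two fits differ only at the third point, where $\hat F_n = 0$ and $\hat F^0_n = 0.5$.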
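We will now deduce the limit distribution of $\mathbb{T}_n$. To this end we will use Lemma 9.1. For each $\epsilon > 0$ we can find a compact set $M_\epsilon$ of the form $[-K_\epsilon, K_\epsilon]$ such that, eventually,
$$P\big(\tilde D_n \subset M_\epsilon\big) > 1 - \epsilon \quad\text{and}\quad P\big(D_{a,b} \subset M_\epsilon\big) > 1 - \epsilon.$$
For a detailed proof of this see BANERJEE (2000), pages 159-160. Here $D_{a,b}$ is the set on which the processes $g_{a,b}$ and $g^0_{a,b}$ differ. Now let
$$X_{n\epsilon} = \frac{g(t_0)}{F(t_0)(1-F(t_0))}\int_{M_\epsilon}\big(X_n(z) - Y_n(z)\big)^2\,dz, \qquad W_\epsilon = \frac{g(t_0)}{F(t_0)(1-F(t_0))}\int_{M_\epsilon}\big(g_{a,b}(z) - g^0_{a,b}(z)\big)^2\,dz,$$
and
$$\mathbb{U} = \frac{g(t_0)}{F(t_0)(1-F(t_0))}\int_{D_{a,b}}\big(g_{a,b}(z) - g^0_{a,b}(z)\big)^2\,dz.$$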
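Set $\xi_n = \mathbb{T}_n$ and $\xi = \mathbb{U}$. Here $\tilde D_n$ is the left-closed, right-open interval over which the processes $X_n$ and $Y_n$ differ. Since $M_\epsilon$ contains $\tilde D_n$ with probability greater than $1-\epsilon$ eventually, we have $P(X_{n\epsilon} \neq \xi_n) < \epsilon$ eventually. Similarly, $P(W_\epsilon \neq \xi) < \epsilon$. Also, $X_{n\epsilon} \to_d W_\epsilon$ as $n \to \infty$ for every fixed $\epsilon$. This is so because, by Theorem 4.1 with $\Psi = 0$,
$$(X_n, Y_n) \to_d \big(g_{a,b}, g^0_{a,b}\big) \quad\text{as processes in } L_2[-c,c] \times L_2[-c,c] \text{ for every } c > 0$$
(with $a$ and $b$ as defined there), and because the map $(f_1, f_2) \mapsto \int_{M_\epsilon}(f_1(z) - f_2(z))^2\,dz$ is continuous on $L_2(M_\epsilon) \times L_2(M_\epsilon)$. Thus all conditions of Lemma 9.1 are satisfied, and $\mathbb{T}_n \to_d \mathbb{U}$. We now establish that $\mathbb{U}$ is equal in distribution to $\mathbb{T}$, thereby showing universality.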
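Now, by Lemma 9.2 with $\Psi = 0$ (the situation under the null hypothesis), which entails that $\phi = 0$, we have
$$\big(g_{a,b}(t),\, g^0_{a,b}(t)\big) \stackrel{d}{=} \Big(a\,(b/a)^{1/3}\,g_{1,1}\big((b/a)^{2/3} t\big),\; a\,(b/a)^{1/3}\,g^0_{1,1}\big((b/a)^{2/3} t\big)\Big)$$
as processes in $L_2[-c,c] \times L_2[-c,c]$, for every $c > 0$.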
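Once again, for each $\epsilon > 0$, we can find $K_\epsilon > 0$ such that $P\big(D_{a,b} \subset [-K_\epsilon, K_\epsilon]\big) > 1 - \epsilon$. Thus
$$
\begin{aligned}
\mathbb{U}_\epsilon &\equiv \frac{g(t_0)}{F(t_0)(1-F(t_0))}\int_{[-K_\epsilon, K_\epsilon]}\big(g_{a,b}(z) - g^0_{a,b}(z)\big)^2\,dz\\
&\stackrel{d}{=} \frac{g(t_0)}{F(t_0)(1-F(t_0))}\,a^2 (b/a)^{2/3}\int_{[-K_\epsilon, K_\epsilon]}\big(g_{1,1}((b/a)^{2/3}z) - g^0_{1,1}((b/a)^{2/3}z)\big)^2\,dz\\
&= \int_{[-(b/a)^{2/3}K_\epsilon,\,(b/a)^{2/3}K_\epsilon]}\big(g_{1,1}(z) - g^0_{1,1}(z)\big)^2\,dz \equiv \mathbb{T}_\epsilon .
\end{aligned}
$$
In the above string of equalities we have used the change of variable $u = (b/a)^{2/3}z$ and the fact that $a^2 = F(t_0)(1-F(t_0))/g(t_0)$. Now note that $P[\mathbb{U}_\epsilon \neq \mathbb{U}] < \epsilon$ and $P[\mathbb{T}_\epsilon \neq \mathbb{T}] < \epsilon$. Thus, once again by Lemma 9.1 (here set $X_{n\epsilon} = \mathbb{U}_\epsilon$ for all $n$, $\xi_n = \mathbb{U}$ for all $n$, $W_\epsilon = \mathbb{T}_\epsilon$ and $\xi = \mathbb{T}$), it follows that $\mathbb{U} \stackrel{d}{=} \mathbb{T}$. This completes the proof. □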
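Because the limit $\mathbb{T} = \int (g_{1,1}(z) - g^0_{1,1}(z))^2\,dz$ is universal (free of $a$ and $b$), its critical points need to be computed only once, and simulation is one way to do this. The following is a minimal Monte Carlo sketch under stated assumptions. Two-sided Brownian motion is discretized on a symmetric grid $[-K, K]$ containing 0. $g_{1,1}$ is approximated by the slopes of the greatest convex minorant of the discretized path $W(t) + t^2$. $g^0_{1,1}$ is taken as follows: for $t < 0$, the GCM slope of the path restricted to $[-K, 0]$, capped above at 0; for $t > 0$, the GCM slope of the path restricted to $[0, K]$, floored below at 0. This is our reading of the constrained process and is an assumption. Truncation to $[-K, K]$ and the grid introduce bias. The defaults for `K`, `half` and `reps`, and all function names, are illustrative and hypothetical.

```python
import numpy as np


def gcm_slopes(t, y):
    """Slopes of the greatest convex minorant of (t[i], y[i]) on each grid interval."""
    hull = [0]  # indices of the lower convex hull vertices
    for i in range(1, len(t)):
        while len(hull) >= 2:
            j, k = hull[-2], hull[-1]
            # drop k if it does not lie strictly below the chord from j to i
            if (y[k] - y[j]) * (t[i] - t[k]) >= (y[i] - y[k]) * (t[k] - t[j]):
                hull.pop()
            else:
                break
        hull.append(i)
    slopes = np.empty(len(t) - 1)
    for j, k in zip(hull[:-1], hull[1:]):
        slopes[j:k] = (y[k] - y[j]) / (t[k] - t[j])
    return slopes


def t_functional(t, x):
    """Riemann approximation of the integral of (g - g0)^2 over the grid t (0 must be a grid point)."""
    i0 = int(np.argmin(np.abs(t)))
    g = gcm_slopes(t, x)
    g0_left = np.minimum(gcm_slopes(t[: i0 + 1], x[: i0 + 1]), 0.0)
    g0_right = np.maximum(gcm_slopes(t[i0:], x[i0:]), 0.0)
    g0 = np.concatenate([g0_left, g0_right])
    return float(np.sum((g - g0) ** 2 * np.diff(t)))


def simulate_T(reps=100, K=3.0, half=1000, seed=0):
    """Monte Carlo draws approximating T on [-K, K] with 2 * half grid intervals."""
    rng = np.random.default_rng(seed)
    dt = K / half
    t = dt * np.arange(-half, half + 1)  # t[half] == 0 exactly
    draws = np.empty(reps)
    for r in range(reps):
        steps = rng.normal(scale=np.sqrt(dt), size=2 * half)
        w = np.concatenate([[0.0], np.cumsum(steps)])
        w -= w[half]  # pin the two-sided Brownian motion at W(0) = 0
        draws[r] = t_functional(t, w + t ** 2)
    return draws
```

Empirical quantiles such as `np.quantile(simulate_T(reps=2000), 0.95)` would then serve as approximate critical points. The Python-level hull loop is slow, so a larger study would vectorize it or move it to compiled code.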
Proof of Theorem 3.2. From Lemma 2.1 we have [...]. By the asymptotic equivalence of $T_{n,2}$ and $\mathbb{T}_n$ under the null hypothesis, it follows immediately that [...]. Now
$$[\ldots]$$
(for this identity refer to pages 156-157 of BANERJEE (2000)), where [...] and
$$\tilde T_{n,2} = [\ldots] = nP\left(\frac{[\ldots]}{\hat F_n(T)\big(1-\hat F_n(T)\big)}\right) + o_p(1). \qquad (9.3)$$
The derivation of (9.1) is available on pages 168-169 of BANERJEE (2000). Now, from (9.2), standard manipulations (changing to the local variable and so forth) yield
$$T_{n,1} = \frac{g(t_0)}{\theta_0(1-\theta_0)}\int_{\tilde D_n}[\ldots]\,dz + o_p(1),$$
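and from (9.3), by standard manipulations again, we find that [...]. Hence,
$$[\ldots] \qquad (9.4)$$
This, coupled with the fact that $S_{n,2} + S_{n,1} = \mathbb{T}_n + o_p(1)$, yields
$$[\ldots] \qquad (9.5)$$
Thus
$$S_{n,2} - S_{n,1} = \frac{g(t_0)}{\theta_0(1-\theta_0)}\int_{\tilde D_n}\big(X_n^2(z) - Y_n^2(z)\big)\,dz$$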