ESTIMATION OF A MONOTONE DENSITY OR MONOTONE HAZARD UNDER RANDOM CENSORING
by
Jian Huang and Jon A. Wellner
TECHNICAL REPORT No. 252
April 1993
Department of Statistics, GN-22
University of Washington
Seattle, Washington 98195 USA
Estimation of a Monotone Density or Monotone Hazard under Random Censoring
Jian Huang and Jon A. Wellner
University of Washington
April, 1993
Consider nonparametric estimation of a decreasing density function f under the random (right) censorship model. Alternatively, consider estimation of a monotone increasing (or decreasing) hazard rate λ based on randomly right censored data. We show that the nonparametric maximum likelihood estimator of the density f (introduced by Laslett (1982)) is asymptotically equivalent to the estimator obtained by differentiating the least concave majorant of the Kaplan-Meier estimator, the nonparametric maximum likelihood estimator of the distribution function F in the larger model without any monotonicity assumption. A similar result is shown to hold for the nonparametric maximum likelihood estimator of an increasing hazard rate λ: the nonparametric maximum likelihood estimator of λ (introduced in the uncensored case by Prakasa Rao (1971)) is asymptotically equivalent to the estimator obtained by differentiation of the greatest convex minorant of the Nelson-Aalen estimator, the nonparametric maximum likelihood estimator of the cumulative hazard function Λ in the larger model without any monotonicity assumption. In proving these asymptotic equivalences, we also establish the asymptotic distributions of the different estimators at a fixed point at which the monotonicity assumption is strictly satisfied.
THEOREM 2.3. Suppose that λ is decreasing with λ'(t_0) < 0. Let λ̂_n be the NPMLE of λ, and let λ̃_n be the left continuous slope of Ã_n. Then

(2.3)    n^{1/3} | (1 − H(t_0)) / (2 λ(t_0) λ'(t_0)) |^{1/3} ( λ̂_n(t_0) − λ(t_0) ) →_d Z,

where Z is the same as in the previous theorem. Moreover

n^{1/3} ( λ̂_n(t_0) − λ̃_n(t_0) ) →_p 0,

so the convergence in distribution in (2.3) also holds with λ̂_n replaced by λ̃_n.
REMARK 1. When specialized to the case of no censoring, theorem 2.1 agrees with the result of Groeneboom (1985), and theorems 2.2 and 2.3 agree with theorems 6.1 and 7.1 of Prakasa Rao (1970).

REMARK 2. The distribution of Z has been completely characterized by Groeneboom (1989).

REMARK 3. All of the above results assert that f̂_n and f̃_n (or λ̂_n and λ̃_n) are asymptotically equivalent at the n^{1/3} scaling level when the true density f is decreasing (or the true hazard λ is increasing or decreasing). Similarly, the consistency results to be established in section 4 assert that F̂_n, F̃_n, and 𝔽_n are all asymptotically equivalent at the n^{1/2} scaling level. It would be interesting to study the differences between these estimators. It follows along the lines of Wang (1992) that

n^{2/3} ( F̃_n − 𝔽_n )(t_0) →_d C(0),

where C(0) is the value at 0 of the concave majorant of two-sided Brownian motion minus a parabola. Note that C(0) ≥ 0 a.s. We conjecture that

n^{2/3} ( F̂_n − 𝔽_n )(t_0) →_d D,

where the random variable D is ≤ 0 with positive probability. Other ways of examining the differences may also be of interest.
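The limit variable Z, the location of the maximum of two-sided Brownian motion minus a parabola (Remark 2), is easy to approximate by direct simulation. The sketch below is our own illustration, not from the paper; the grid step, truncation interval, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def chernoff_argmax(step=0.01, half_width=2.0):
    """One draw of Z = argmax_t {W(t) - t^2}, with W a two-sided
    Brownian motion, discretized on [-half_width, half_width]."""
    m = int(half_width / step)
    sd = np.sqrt(step)
    # independent Brownian paths to the right and to the left of 0
    right = np.concatenate([[0.0], np.cumsum(sd * rng.standard_normal(m))])
    left = np.concatenate([[0.0], np.cumsum(sd * rng.standard_normal(m))])
    t = np.arange(-m, m + 1) * step
    w = np.concatenate([left[::-1][:-1], right])   # W(0) = 0 at the center
    return t[np.argmax(w - t ** 2)]

draws = np.array([chernoff_argmax() for _ in range(2000)])
# Z is symmetric about 0; the sample mean should be near 0
assert abs(draws.mean()) < 0.1
```

Truncating to a finite interval is harmless here because the parabola makes the location of the maximum tight; Groeneboom (1989) gives the exact distribution.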
3. Characterization of the NPMLEs f̂_n and λ̂_n. Suppose that the censoring distribution G is absolutely continuous with respect to Lebesgue measure on 0 ≤ t < ∞, with density g. Here Ḡ(t) = 1 − G(t) and, similarly, F̄(t) = 1 − F(t). We use this notation throughout for the survival function associated with a distribution function F. The log-likelihood for the observed data is
l(f) = Σ_{i=1}^{n} { δ_i log f(T_i) + (1 − δ_i) log(1 − F(T_i)) } + C(g, G),

where C(g, G) is a term not involving f or F. Hence we can treat it as a constant term and will not consider it in the following.
(a) Characterization of the NPMLE λ̂_n. Let T_(1) ≤ ... ≤ T_(n) be the order statistics of T_1, ..., T_n, and let δ_(i) be the δ corresponding to T_(i), i.e., if T_(i) = T_j, then δ_(i) = 1{X_j ≤ Y_j}. Define T_(0) = 0. The NPMLE of λ under the monotonicity constraint is the solution to the following maximization problem: maximize

l(λ) = Σ_{i=1}^{n} { δ_(i) log λ(T_(i)) − Λ(T_(i)) }.
The NPMLE λ̂_n of λ is an increasing step function maximizing the log-likelihood function. If λ is constant on the intervals [T_(j−1), T_(j)), then it follows that

Λ(T_(i)) = Σ_{j=1}^{i} ( T_(j) − T_(j−1) ) λ(T_(j)),

with T_(0) = 0. Then l(λ) can be written as

l(λ) = Σ_{i=1}^{n} { δ_(i) log λ(T_(i)) − Σ_{j=1}^{i} ( T_(j) − T_(j−1) ) λ(T_(j)) }.
Hence the NPMLE is given by the vector λ̂_n = (λ̂_1, ..., λ̂_n)^T ∈ ℝ^n maximizing the function

(3.15)    l(z) = Σ_{i=1}^{n} { δ_(i) log(z_i) − (n − i + 1)( T_(i) − T_(i−1) ) z_i }

over the cone

C = { z = (z_1, ..., z_n) : 0 ≤ z_1 ≤ ... ≤ z_n }.
As in Example 1.5.7 of Robertson et al. (1988), the solution to the maximization problem (3.15) can be written as

(3.16)    λ̂_i = max_{1 ≤ s ≤ i} min_{i ≤ t ≤ n} [ Σ_{j=s}^{t} δ_(j) ] / [ Σ_{j=s}^{t} (n − j + 1)( T_(j) − T_(j−1) ) ].
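The max-min formula (3.16) is directly computable. The following is our own Python transcription, not from the paper; the O(n^3) double loop is written for clarity rather than efficiency, and all names are ours.

```python
import numpy as np

def hazard_npmle(times, delta):
    """Max-min formula (3.16): NPMLE of an increasing hazard.

    times: censored observations T_i = X_i ^ Y_i; delta: indicators
    1{X_i <= Y_i}.  Returns lambda-hat at T_(1) <= ... <= T_(n).
    """
    order = np.argsort(times)
    t = np.concatenate([[0.0], times[order]])     # prepend T_(0) = 0
    d = delta[order].astype(float)
    n = len(times)
    # exposure weights (n - j + 1)(T_(j) - T_(j-1)), j = 1..n
    w = (n - np.arange(1, n + 1) + 1) * np.diff(t)
    lam = np.empty(n)
    for i in range(n):                  # lambda_i = max over s, min over t
        best = -np.inf
        for s in range(i + 1):
            num = d[s:].cumsum()[i - s:]          # sums over j = s..t, t >= i
            den = w[s:].cumsum()[i - s:]
            best = max(best, (num / den).min())
        lam[i] = best
    return lam
```

By construction the output is nondecreasing; a pool-adjacent-violators routine would give the same answer in linear time after sorting.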
Notice that, unlike the NPMLE of a monotone density, here we have an explicit solution for the NPMLE of a monotone hazard. Now define

(3.17)    V_n^λ(t) = ∫ 1{x ≤ y ∧ t} dQ_n(x, y) = ℍ_n^{uc}(t)
and

(3.18)    W_n(t) = ∫ [ (x ∧ y) 1{x ∧ y ≤ t} + t 1{x ∧ y > t} ] dQ_n(x, y) = ∫_0^t ( 1 − ℍ_n(s) ) ds,
where ℍ_n^{uc} is the sub-empirical distribution of the uncensored data, ℍ_n is the empirical distribution of T_1, ..., T_n, and Q_n is the empirical distribution of the (unobserved) (X_i, Y_i) pairs, i = 1, ..., n. Then
W_n(T_(i)) = (1/n) Σ_{j=1}^{i} (n − j + 1)( T_(j) − T_(j−1) ).
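The displayed identity is exact, not asymptotic, and is easy to verify numerically. The check below is our own (simulated times are arbitrary); it compares the weighted-spacings sum with the integral of 1 − ℍ_n from (3.18).

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.sort(rng.exponential(size=50))
n = len(t)

# right-hand side: (1/n) * sum_{j<=i} (n - j + 1)(T_(j) - T_(j-1))
gaps = np.diff(np.concatenate([[0.0], t]))
rhs = np.cumsum((n - np.arange(1, n + 1) + 1) * gaps) / n

# left-hand side: integral of 1 - H_n(s) over [0, T_(i)]; on (T_(j-1), T_(j)]
# the empirical cdf H_n equals (j-1)/n, so each gap contributes (1-(j-1)/n)*gap
lhs = np.cumsum((1.0 - np.arange(0, n) / n) * gaps)

assert np.allclose(lhs, rhs)
```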
Using the graphical representation of the isotonic regression (see, e.g., Theorem 1.2.1 of Robertson et al. (1988)), we have the following characterization of λ̂_n.

THEOREM 3.2. Let V_n^λ and W_n be defined by (3.17) and (3.18), respectively. Then λ̂_n is the left derivative of the greatest convex minorant of the cumulative sum diagram consisting of the points

P_j = ( W_n(T_(j)), V_n^λ(T_(j)) ),    j = 1, ..., n,

where P_0 = (0, 0) and T_(j), j = 1, ..., n, are the order statistics of T_j, j = 1, ..., n.
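Theorem 3.2 translates directly into a computation: build the cumulative sum diagram and take left derivatives of its greatest convex minorant. The sketch below is our own (a standard stack-based lower-hull pass; function and variable names are ours).

```python
import numpy as np

def gcm_slopes(x, y):
    """Left derivatives of the greatest convex minorant of the
    cumulative sum diagram {(x_j, y_j)}, with x strictly increasing
    and (x_0, y_0) = (0, 0) supplied as the first point."""
    hull = [0]                              # indices of points on the minorant
    for j in range(1, len(x)):
        hull.append(j)
        while len(hull) >= 3:
            a, b, c = hull[-3], hull[-2], hull[-1]
            # drop b if it lies on or above the chord from a to c
            if (y[b] - y[a]) * (x[c] - x[b]) >= (y[c] - y[b]) * (x[b] - x[a]):
                hull.pop(-2)
            else:
                break
    slopes = np.empty(len(x) - 1)
    for a, b in zip(hull[:-1], hull[1:]):
        slopes[a:b] = (y[b] - y[a]) / (x[b] - x[a])
    return slopes                           # one slope per segment
```

Feeding in P_0 = (0,0) and P_j = (W_n(T_(j)), V_n^λ(T_(j))) yields λ̂_n of theorem 3.2; by construction the returned slopes are nondecreasing.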
THEOREM 3.3. Let V_n^λ and W_n be defined by (3.17) and (3.18), respectively. Then λ̂_n (the NPMLE of a decreasing hazard) is the left derivative of the least concave majorant of the cumulative sum diagram consisting of the points P_j, j = 0, 1, ..., n, where P_0 = (0, 0) and T_(j), j = 1, ..., n, are the order statistics of T_j, j = 1, ..., n.
REMARK 3.2. It is interesting to notice that although the product limit (or Kaplan-Meier) estimator 𝔽_n is the NPMLE of F, the slope of its LCM does not agree with the NPMLE of f (see also Laslett (1982), page 157). Since we can regard the procedure of obtaining the estimator of f from the product limit estimator as a map, the classical invariance property of the likelihood estimator in parametric models breaks down here. However, as stated in theorem 2.1, the two estimators f̂_n and f̃_n are asymptotically equivalent. Similar remarks apply to λ̂_n and λ̃_n.
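The estimator f̃_n of Remark 3.2, the slope of the least concave majorant of the Kaplan-Meier estimator, is simple to compute. A minimal sketch (our own; it assumes no ties among the observation times):

```python
import numpy as np

def kaplan_meier(times, delta):
    """Kaplan-Meier estimate of F at the sorted observation times."""
    order = np.argsort(times)
    t, d = times[order], delta[order]
    n = len(t)
    at_risk = n - np.arange(n)               # n - i + 1 subjects at risk at T_(i)
    surv = np.cumprod(1.0 - d / at_risk)     # product-limit survival function
    return t, 1.0 - surv                     # (T_(i), F_n(T_(i)))

def lcm_slopes(x, y):
    """Left derivatives of the least concave majorant of {(0,0)} u {(x_j, y_j)}."""
    x = np.concatenate([[0.0], x]); y = np.concatenate([[0.0], y])
    hull = [0]
    for j in range(1, len(x)):
        hull.append(j)
        while len(hull) >= 3:
            a, b, c = hull[-3], hull[-2], hull[-1]
            # drop b if it lies on or below the chord: not on the majorant
            if (y[b] - y[a]) * (x[c] - x[b]) <= (y[c] - y[b]) * (x[b] - x[a]):
                hull.pop(-2)
            else:
                break
    slopes = np.empty(len(x) - 1)
    for a, b in zip(hull[:-1], hull[1:]):
        slopes[a:b] = (y[b] - y[a]) / (x[b] - x[a])
    return slopes                            # f-tilde on each segment, nonincreasing
```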
4. Consistency. We first prove that the estimators f̃_n and λ̃_n of f and λ based on the Kaplan-Meier or Nelson-Aalen estimators are consistent. The proof is based on a minor modification of Marshall's lemma. Let ||·||_a^b denote the supremum norm over the interval [a, b]. For concave F or convex Λ, we have the following lemma. Recall from section 1 that F̃_n denotes the least concave majorant of the Kaplan-Meier estimator 𝔽_n and Ã_n denotes the greatest convex minorant of the Nelson-Aalen estimator 𝔸_n.

LEMMA 4.1. (A) (Marshall's lemma) If F is concave, then for any 0 ≤ γ < τ_H,

||F̃_n − F||_0^γ ≤ ||𝔽_n − F||_0^{γ_n},

where γ_n = inf{ t ≥ γ : f̃_n(t+) < f̃_n(t) }. (B) If Λ is convex, then for any 0 ≤ γ < τ_H,

||Ã_n − Λ||_0^γ ≤ ||𝔸_n − Λ||_0^{γ_n},

where γ_n = inf{ t ≥ γ : λ̃_n(t+) > λ̃_n(t) }.
PROOF. (A) We have dV(W^{−1}(z))/dz = λ(W^{−1}(z)), where W^{−1} is the inverse of W. By the assumption that λ is nondecreasing, V ∘ W^{−1}(z) is convex. We first prove the following claim, which can be seen as an extension of Marshall's lemma.
Set γ_n = inf{ t ≥ γ : λ̃_n(t+) > λ̃_n(t) }. This implies

||Ṽ_n ∘ W_n^{−1} ∘ W − V||_0^γ ≤ ||V_n ∘ W_n^{−1} ∘ W − V||_0^{γ_n}
 ≤ ||V_n ∘ W_n^{−1} ∘ W − V ∘ W_n^{−1} ∘ W||_0^{γ_n} + ||V ∘ W_n^{−1} ∘ W − V||_0^{γ_n}
 ≤ ||V_n − V||_0^{τ_H} + ||V ∘ W_n^{−1} ∘ W − V||_0^{γ_n}.
Now since Ṽ_n ∘ W_n^{−1}(z) is a finite convex function on [0, W(τ)], we have

| Ṽ_n ∘ W_n^{−1}(W(t)) − Ṽ_n(t) | = | Ṽ_n ∘ W_n^{−1}(W(t)) − Ṽ_n ∘ W_n^{−1}(W_n(t)) | ≤ β_n | W_n(t) − W(t) |,

where β_n depends only on the lower and upper bounds of the slopes of Ṽ_n(W_n^{−1}(z)) (Rockafellar (1972), Theorem 10.4). Hence
||Ṽ_n − V||_0^γ ≤ ||Ṽ_n − Ṽ_n ∘ W_n^{−1} ∘ W||_0^γ + ||Ṽ_n ∘ W_n^{−1} ∘ W − V||_0^γ →_{a.s.} 0.
THEOREM 4.2. Suppose λ is increasing. Then, for any 0 < t_0 < τ_H, λ̃_n(t_0) →_{a.s.} λ(t_0).

PROOF. For ε > 0, by the definition of λ̃_n(t), we have
and, by the previous lemma, the remaining terms are negligible, so it remains only to show the convergence of the leading term in C[−k, k]. But, for 0 ≤ t ≤ k,

(1/√a_n) { Z(t_0 + a_n t) − Z(t_0) }
 = (1/√a_n) { β(t_0 + a_n t) B(γ(t_0 + a_n t)) − β(t_0) B(γ(t_0)) }
 = (1/√a_n) { β(t_0 + a_n t) − β(t_0) } B(γ(t_0 + a_n t)) + β(t_0) (1/√a_n) { B(γ(t_0 + a_n t)) − B(γ(t_0)) }
 =_d (1/√a_n) { β(t_0 + a_n t) − β(t_0) } O_p(1) + β(t_0) B( ( γ(t_0 + a_n t) − γ(t_0) ) / a_n )
 ⇒ 0 + β(t_0) B( γ'(t_0) t ) =_d B( β²(t_0) γ'(t_0) t ).

Arguing similarly for −k ≤ t ≤ 0 completes the proof of the first claim. To prove the second assertion, write
n^{2/3} { A_n(t_0 + n^{−1/3} t) − A_n(t_0) − n^{−1/3} a(t_0) t }
 = n^{2/3} { A_n(t_0 + n^{−1/3} t) − A_n(t_0) − A(t_0 + n^{−1/3} t) + A(t_0) } + n^{2/3} { A(t_0 + n^{−1/3} t) − A(t_0) − n^{−1/3} a(t_0) t }
 ⇒ β(t_0) W( γ'(t_0) t ) + ½ a'(t_0) t².
of Lo and Singh (1986) or Lo, Mack, and Wang (1989) together with the empirical process methods of Kim and Pollard (1990). From Burke, Csörgő, and Horváth (1981), corollary 6.1, page 104, (5.25) holds for A_n = 𝔸_n and A = Λ with b_n = (log n)^{3/2} n^{−1/3}, where the corresponding limit process Z has β = 1 and γ = ∫_0^· H̄^{−2} dH^{uc} = ∫_0^· F̄^{−2} Ḡ^{−1} dF = C. Since a_n = n^{−1/3}, b_n²/a_n = (log n)³ n^{−1/3} = o(1), and hence the convergence asserted in theorem 5.1 follows from theorem 5.3 with b_n = (log n)^{3/2} n^{−1/3}, γ'(t_0) = C'(t_0) = λ(t_0)/H̄(t_0), and a'(t_0) = λ'(t_0). To complete the proof of theorem 5.1, it remains only to prove (5.23), since the claimed weak convergence of U_n^F follows from (5.23) and the convergence of U_n^λ already proved. First we prove another corollary of theorem 5.3, for a natural local process defined in terms of the Kaplan-Meier estimator 𝔽_n; although this process will not be used in the sequel, it could be used to give an independent treatment of the estimator of f. The corollary further illustrates the use of theorem 5.3.
COROLLARY 5.1. If 𝔽_n is the Kaplan-Meier estimator of F, and f is continuous in a neighborhood of t_0 with f(t_0) > 0 and f'(t_0) < 0, then

n^{2/3} { 𝔽_n(t_0 + n^{−1/3} t) − 𝔽_n(t_0) − n^{−1/3} f(t_0) t }
 ⇒ F̄(t_0) W( λ(t_0) t / H̄(t_0) ) + ½ f'(t_0) t²
 =_d W( f(t_0) t / Ḡ(t_0) ) + ½ f'(t_0) t².
PROOF OF COROLLARY 5.1. From Burke, Csörgő, and Horváth (1981), theorem 4.2, page 94, (5.25) holds with b_n = (log n)^{3/2} n^{−1/3}. Since a_n = n^{−1/3}, b_n²/a_n = (log n)³ n^{−1/3} = o(1), and hence the corollary follows from theorem 5.3 with A_n = 𝔽_n, A = F, [a, b] = [0, τ] for τ < τ_H, β = F̄, γ = ∫_0^· H̄^{−2} dH^{uc} = ∫_0^· F̄^{−2} Ḡ^{−1} dF = C, and b_n = (log n)^{3/2} n^{−1/3}. □
PROOF OF THEOREM 5.1, CONTINUED. It suffices to prove (5.23). First, by standard results from empirical process theory, for any α > 0 and 0 < τ < τ_H,

n^{1/2−α} ||ℍ_n − H||_0^τ →_{a.s.} 0

and

n^{1/2−α} ||ℍ_n^{uc} − H^{uc}||_0^τ →_{a.s.} 0.

These are in particular true for 1/2 − α = 1/3. We will use these two facts repeatedly in the following.
The quantity to be analyzed decomposes as I_1 + I_2 + I_3 + I_4, say. We can write the first term as
By Lemma 2 of Lo and Singh (1986), for any τ ≥ t_0 + n^{−1/3} k,

||I_1||_0^k ≤ 2 n^{2/3} H̄(t_0) || ∫_{t_0}^{·} ( 1/ℍ̄_n − 1/H̄ ) d( ℍ_n^{uc} − H^{uc} ) ||_0^τ →_p 0.
For the I_2 term, we use a maximal inequality. For each n, the relevant class R_n is a class of increasing functions, so it is a VC-hull class. Hence the uniform entropy numbers of R_n grow polynomially, of the order (1/ε)^r with r < 2 as ε → 0; see Dudley (1987) or Lemma 2.42 of Van der Vaart and Wellner (1993). This allows us to use the maximal inequality with a natural envelope. By Kim and Pollard (1990), inequality 2.1, page 199, there is a constant C bounding the relevant moment, and it follows that this contribution converges to zero in probability.
For the I_3 term, a similar bound involving n^{1/3} H̄(t_0) shows that ||I_3||_0^k →_p 0. We will see that the second term I_4, combined with a term from I_2, converges to zero. Now
I_2 = n^{2/3} λ(t_0) ∫_{t_0}^{t_n} ( ℍ̄_n(s) − H̄(t_0) ) ds
 = n^{2/3} λ(t_0) ∫_{t_0}^{t_n} ( ℍ̄_n(s) − H̄(s) ) ds + n^{2/3} λ(t_0) ∫_{t_0}^{t_n} ( H̄(s) − H̄(t_0) ) ds
 = I_{21} + I_{22},
say. But

|| n^{2/3} ∫_{t_0}^{t_n} ( ℍ̄_n(s) − H̄(s) ) ds ||_0^k ≤ n^{1/3} k ||ℍ_n − H||_0^τ →_p 0,

so ||I_{21}||_0^k →_p 0. Thus the remaining contribution satisfies

|| · ||_0^k ≤ || I_4 + I_{22} ||_0^k + o_p(1),

and this bound converges to zero in probability by the lemma proved next.
PROOF. This follows easily from the Glivenko-Cantelli theorem applied to ℍ_n: from (3.4) it follows that

n^{1/3} ( W_n(t_0 + n^{−1/3} t) − W_n(t_0) ) = ∫_0^t ( 1 − ℍ_n(t_0 + n^{−1/3} s) ) ds,

which differs from t H̄(t_0) by at most k ( ||ℍ_n − H||_0^τ + sup_{0 ≤ s ≤ k} | H̄(t_0 + n^{−1/3} s) − H̄(t_0) | ) for 0 ≤ t ≤ k, and hence the claim follows from the Glivenko-Cantelli theorem and continuity of H. □

6. Inverse processes. In this section, we consider some random processes which can be regarded as the inverse processes of the estimators of a monotone f or λ, and which are easier to deal with analytically. In the following we consider estimation of a monotone increasing hazard rate λ and a monotone decreasing density f.
We first define four processes Z_n^λ, Ẑ_n^λ and Z_n^F, Ẑ_n^F on ℝ² by (6.32), for (t, a) ∈ ℝ². Let the inverse processes S_n^λ, Ŝ_n^λ and S_n^F, Ŝ_n^F be defined for a ≥ 0 by

(6.33)
Ŝ_n^λ(a) = sup{ t ∈ [0, T_(n)] : V_n^λ(t) − a W_n(t) is minimal },
S_n^λ(a) = sup{ t ∈ [0, T_(n)] : 𝔸_n(t) − a t is minimal },
Ŝ_n^F(a) = sup{ t ∈ [0, T_(n)] : V_n^F(t) − Ḡ(t_0) F̂_n(t) − a t is maximal },
S_n^F(a) = sup{ t ∈ [0, T_(n)] : V_n^F(t) − Ḡ(t_0) 𝔽_n(t) − a t is maximal }.
From the characterizations given in theorem 3.2 and corollary 3.2 in section 3, it follows that the usual switching relations hold: for example, { λ̂_n(t) > a } = { Ŝ_n^λ(a) < t }, with analogous identities for the other three estimators.
S^λ(a) = sup{ t : √(λ(t_0) H̄(t_0)) W(t) + ½ λ'(t_0) H̄(t_0) t² − a H̄(t_0) t is minimal }
 = a/λ'(t_0) + sup{ t : √(λ(t_0) H̄(t_0)) W(t + a/λ'(t_0)) + ½ λ'(t_0) H̄(t_0) (t + a/λ'(t_0))² − a H̄(t_0) (t + a/λ'(t_0)) is minimal }
 =_d a/λ'(t_0) + sup{ t : √(λ(t_0) H̄(t_0)) W(t) + ½ λ'(t_0) H̄(t_0) t² is minimal }
 = a/λ'(t_0) + S^λ(0).

Hence it follows from the identities (6.35) that

P( n^{1/3} (λ̃_n − λ)(t_0) ≥ a, n^{1/3} (λ̂_n − λ)(t_0) ≥ b )
 = P( n^{1/3} ( S_n^λ(λ(t_0) + n^{−1/3} a) − t_0 ) ≤ 0, n^{1/3} ( Ŝ_n^λ(λ(t_0) + n^{−1/3} b) − t_0 ) ≤ 0 )
 → P( λ'(t_0) S^λ(0) ≤ −a, λ'(t_0) S^λ(0) ≤ −b )
 = P( −λ'(t_0) S^λ(0) ≥ a, −λ'(t_0) S^λ(0) ≥ b ),

which implies that n^{1/3} (λ̃_n − λ)(t_0) and n^{1/3} (λ̂_n − λ)(t_0) converge jointly to the common limit −λ'(t_0) S^λ(0).
This implies the asymptotic equivalence stated in theorem 2.2. To identify the common limit law, note that −λ'(t_0) S^λ(0) =_d λ'(t_0) S^λ(0), since the distribution of S^λ(0) is symmetric about 0. Set R ≡ H̄(t_0)/(λ(t_0) λ'(t_0)). Then the asymptotic distribution of 2^{−2/3} R^{1/3} n^{1/3} (λ̃_n − λ)(t_0) is that of −λ'(t_0) 2^{−2/3} R^{1/3} S^λ(0). But by the definition of S^λ,
2^{−2/3} λ'(t_0) R^{1/3} S^λ(0)
 = sup{ 2^{−2/3} λ'(t_0) R^{1/3} t : √(λ(t_0) H̄(t_0)) W(t) + ½ λ'(t_0) H̄(t_0) t² is minimal }
 = sup{ t : √(λ(t_0) H̄(t_0)) W( λ'(t_0)^{−1} 2^{2/3} R^{−1/3} t ) + ½ λ'(t_0)^{−1} 2^{4/3} R^{−2/3} H̄(t_0) t² is minimal }
 =_d sup{ t : √(λ(t_0) H̄(t_0)) λ'(t_0)^{−1/2} 2^{1/3} R^{−1/6} W(t) + ½ λ'(t_0)^{−1} 2^{4/3} R^{−2/3} H̄(t_0) t² is minimal }
 = sup{ t : W(t) + t² is minimal }
 = sup{ t : −W(t) − t² is maximal }
 =_d sup{ t : W(t) − t² is maximal }
 = Z,

where we have used the scaling property of Brownian motion and the identity

√(λ(t_0) H̄(t_0)) λ'(t_0)^{−1/2} 2^{1/3} R^{−1/6} = ½ λ'(t_0)^{−1} 2^{4/3} R^{−2/3} H̄(t_0) = 2^{1/3} λ(t_0)^{2/3} λ'(t_0)^{−1/3} H̄(t_0)^{1/3},

which follows from R = H̄(t_0)/(λ(t_0) λ'(t_0)).
This yields the stated asymptotic distributions. It remains only to verify the second condition of theorem 6.1. The next section is devoted to these proofs. □

PROOF OF THEOREM 2.1. Fix a, b ∈ ℝ. By theorem 5.1 and lemma 5.2 it follows that
P{ Ẑ_n^F(t, −M_1) ≥ 0 for some t > u_0 }
 ≤ P{ Ẑ_n^F(n^{1/3} u_0, −M_1) ≥ 0 } + P{ Ẑ_n^F(t, −M_1) ≥ 0 for some M_2 ≤ t ≤ n^{1/3} u_0 }.

For t > n^{1/3} u_0 we can no longer use the identity (9.43). Instead, note that V_n^F is determined by F̂_n; the argument then proceeds as above using concavity of F̂_n. Now we prove (7.39): for ε > 0,
P{ Ẑ_n^λ(t, M_1) ≥ 0 for some t > M_2 }
 ≤ P{ D_n^λ(t) − M_1 · n^{1/3} [ W_n(t_0 + n^{−1/3} t) − W_n(t_0) ] / H̄(t_0) ≤ 0 for some t ≥ M_2 }.

Using the definitions of V_n^λ and W_n given in (3.17) and (3.18), we can express D_n^λ as
D_n^λ(t) = ∫ h(x, y; t_0 + n^{−1/3} t) dQ_n(x, y) / H̄(t_0),

where h(x, y; u) is defined as

h(x, y; u) = 1{x ≤ y} [ 1{x ≤ u} − 1{x ≤ t_0} ]
 − λ(t_0) [ (x ∧ y − t_0)( 1{x ∧ y ≤ u} − 1{x ∧ y ≤ t_0} ) + (u − t_0) 1{x ∧ y > u} ].

Using Lemma 4.1 of Kim and Pollard (1990), we can prove that for every ε > 0 there exist random variables A_n of order O_p(1) such that (7.41) holds. For the expectation of U_n(t), some calculations show that it is bounded below by a quadratic in t.
Moreover, we have the corresponding bound for the centered process. Now, taking ε = ½ λ'(t_0) and using (7.41), the term n^{1/3} [ W_n(t_0 + n^{−1/3} t) − W_n(t_0) ] / H̄(t_0) is bounded below by a multiple of t for large n. We choose M_2 > 0 such that the function −A_n + ½ λ'(t_0) t² + o(1) t² − t M_1 is increasing for t ≥ M_2.
Hence

P{ D_n^λ(t) − M_1 n^{1/3} [ W_n(t_0 + n^{−1/3} t) − W_n(t_0) ] ≤ 0 for some M_2 ≤ t ≤ n^{1/3} u_0 }
 ≤ P{ −A_n + ½ λ'(t_0) t² + o(1) t² − t M_1 + o(1) ≤ 0 for some M_2 ≤ t ≤ n^{1/3} u_0 }
 ≤ ε

for n sufficiently large. Now we are left to deal with t > n^{1/3} u_0. We prove that for t > n^{1/3} u_0, the desired bound holds
with probability close to 1 for n sufficiently large. By Lemma 4.3, we have

Ṽ_n^λ(t_0 + u_0) = V_n^λ(t_0 + u_0) + o(1).
By the definition of the GCM, Ṽ_n^λ lies below V_n^λ, with equality at the vertices of the minorant. Thus, by the convexity of Ṽ_n^λ(t),
n^{1/3} [ Ṽ_n^λ(t_0 + n^{−1/3} t) − Ṽ_n^λ(t_0 + u_0) ] − ( n^{1/3} λ(t_0) + M_1 ) [ W_n(t_0 + n^{−1/3} t) − W_n(t_0 + u_0) ] / H̄(t_0)
 ≥ n^{1/3} [ V_n^λ(t_0 + n^{−1/3} t) − V_n^λ(t_0 + u_0) + o(1) ] − ( n^{1/3} λ(t_0) + M_1 ) [ W_n(t_0 + n^{−1/3} t) − W_n(t_0 + u_0) ] / H̄(t_0)
 ≥ 0

for n sufficiently large. This implies
P{ Ẑ_n^λ(t, M_1) ≥ 0 for some t > u_0 }
 ≤ P{ Ẑ_n^λ(n^{1/3} u_0, M_1) ≥ 0 } + P{ Ẑ_n^λ(t, M_1) ≥ 0 for some M_2 ≤ t ≤ n^{1/3} u_0 }
 ≤ ε
for n sufficiently large, and this completes the proof of (7.39). □

8. Illustrations and Examples. Now we illustrate the results obtained in sections 2-6 by calculating the estimators f̂_n, f̃_n, λ̂_n, λ̃_n and their corresponding cumulative versions for simulated data. To compare the NPMLE with the estimators based on the Kaplan-Meier estimator in the estimation of a decreasing density, or the Nelson-Aalen estimator in the estimation of an increasing hazard, and also to see how censoring and sample size affect the estimation, we carried out two sets of simulations.

(i) Estimation of a decreasing density f. The simulated random samples are generated in S as follows. The failure time X ∼ exp(1), i.e., the exponential distribution with mean one. Two different censoring distributions are considered: (a) Y ∼ exp(2), i.e., the exponential distribution with mean 0.5. Then P{X ≤ Y} = 1/3, that is, the uncensored rate is a third. (b) Y ∼ exp(0.5), i.e., the exponential distribution with mean 2. Then P{X ≤ Y} = P(δ = 1) = 2/3. In each of the above cases, two different sample sizes are used, N = 100, 800.

Figure 1 shows the estimators of the distribution function. The step function is the Kaplan-Meier estimator. (In the figures, the step functions are either the Kaplan-Meier estimator of the distribution function, or the Nelson-Aalen estimator of the cumulative hazard function. This should be clear in the description of the plots.) The solid line is the least concave majorant of the Kaplan-Meier estimator. The dotted line is the NPMLE of the distribution function with decreasing density. The dashed line is the true distribution function, i.e., F_0(x) = 1 − exp(−x). The censoring distribution is G(y) = 1 − exp(−2y). Figure 2 shows the estimators of the decreasing density. The solid line is the slope of the concave majorant of the Kaplan-Meier estimator. The dotted line is the NPMLE of the decreasing density. The dashed line is the true density, i.e., f_0(x) = exp(−x).
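The paper's simulations were done in S; an analogous setup in Python (our own sketch, with an arbitrary seed) confirms the stated uncensored rates 1/3 and 2/3, using the fact that for independent exponentials P{X ≤ Y} = rate_X / (rate_X + rate_Y).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.exponential(1.0, n)                 # failure times X ~ exp(1), mean 1

for mean_y, p_uncensored in [(0.5, 1/3), (2.0, 2/3)]:
    y = rng.exponential(mean_y, n)          # censoring times
    t = np.minimum(x, y)                    # observed T = X ^ Y
    delta = (x <= y)                        # uncensored indicator
    # P{X <= Y} = rate_X / (rate_X + rate_Y) for independent exponentials
    assert abs(delta.mean() - p_uncensored) < 0.01
```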
Figures 3 to 6 are similar to Figures 1 and 2, except that the pictures are generated with the different censoring distributions and sample sizes described above. With sample size increased to 800 we further observe that both f̂_n and f̃_n tend to overestimate f at x = 0; see Figures 5 and 6. We believe that both f̂_n and f̃_n are actually not consistent at zero. For the estimation of a decreasing density without censoring, the NPMLE coincides with the slope of the least concave majorant of the empirical distribution function. Woodroofe and Sun (1991) showed that f̂_n(0) is inconsistent, and in fact that

f̂_n(0) / f(0+) →_d sup_{k ≥ 1} k / Γ_k,

where Γ_k denotes the k-th partial sum of independent standard exponential random variables.