Uniform and Non-uniform Bounds in Normal Approximation for Nonlinear Statistics

Louis H.Y. Chen^1 (National University of Singapore) and Qi-Man Shao^2 (University of Oregon and National University of Singapore)

Abstract. Let T be a general sampling statistic that can be written as a linear statistic plus an error term. Uniform and non-uniform Berry-Esseen type bounds for T are obtained. The bounds are best possible for many known statistics. Applications to U-statistics, multi-sample U-statistics, L-statistics, random sums, and functions of non-linear statistics are discussed.

Nov. 6, 2002

AMS 2000 subject classification: Primary 62E20, 60F05; secondary 60G50. Key words and phrases. Normal approximation, uniform Berry-Esseen bound, non-uniform Berry-Esseen bound, concentration inequality approach, nonlinear statistics, U-statistics, multi-sample U-statistics, L-statistics, random sums, functions of non-linear statistics.

^1 Research is partially supported by grant R-146-000-013-112 at the National University of Singapore.
^2 Research is partially supported by NSF grant DMS-0103487 and grant R-146-000-038-101 at the National University of Singapore.

1 Introduction

Let X_1, X_2, \ldots, X_n be independent random variables and let T := T(X_1, \ldots, X_n) be a general sampling statistic. In many cases T can be written as a linear statistic plus an error term, say T = W + ∆, where
W = \sum_{i=1}^n g_i(X_i), \quad ∆ := ∆(X_1, \ldots, X_n) = T − W,
and g_i := g_{n,i} are Borel measurable functions. Typical cases include U-statistics, multi-sample U-statistics, L-statistics, and random sums. Assume that
(1.1) E(g_i(X_i)) = 0 for i = 1, 2, \ldots, n, and \sum_{i=1}^n E(g_i^2(X_i)) = 1.

It is clear that if ∆ → 0 in probability as n → ∞, then we have the following central limit theorem:
(1.2) \sup_z |P(T ≤ z) − Φ(z)| → 0,
where Φ denotes the standard normal distribution function, provided that the Lindeberg condition holds: for every ε > 0,
\sum_{i=1}^n E g_i^2(X_i) I(|g_i(X_i)| > ε) → 0.

If in addition E|∆|^p < ∞ for some p > 0, then by the Chebyshev inequality one can obtain the following rate of convergence:
(1.3) \sup_z |P(T ≤ z) − Φ(z)| ≤ \sup_z |P(W ≤ z) − Φ(z)| + 2 (E|∆|^p)^{1/(1+p)}.
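For completeness, here is a sketch of the standard argument behind (1.3). For any ε > 0,
P(T ≤ z) ≤ P(W ≤ z + ε) + P(|∆| > ε) ≤ Φ(z + ε) + \sup_x |P(W ≤ x) − Φ(x)| + ε^{−p} E|∆|^p,
and Φ(z + ε) − Φ(z) ≤ ε/\sqrt{2π} ≤ ε; a matching lower bound is obtained in the same way. Choosing ε = (E|∆|^p)^{1/(1+p)} gives ε + ε^{−p} E|∆|^p = 2(E|∆|^p)^{1/(1+p)}, which is (1.3).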

The first term on the right hand side of (1.3) is well understood via the Berry-Esseen inequality. For example, using the Stein method, Chen and Shao (2001) obtained
(1.4) \sup_z |P(W ≤ z) − Φ(z)| ≤ 4.1 \Big( \sum_{i=1}^n E g_i^2(X_i) I(|g_i(X_i)| > 1) + \sum_{i=1}^n E|g_i(X_i)|^3 I(|g_i(X_i)| ≤ 1) \Big).

However, the bound (E|∆|^p)^{1/(1+p)} is in general not sharp for many commonly used statistics. Many authors have worked towards obtaining better Berry-Esseen bounds. For example, sharp Berry-Esseen bounds have been obtained for general symmetric statistics in van Zwet (1984) and Friedrich (1989). Bolthausen and Götze (1993) extended them to multivariate symmetric statistics and multivariate sampling statistics. In particular, they (see Theorem 2 in [4]) stated that if ET = 0, then
(1.5) \sup_z |P(T ≤ z) − Φ(z)| ≤ C \Big( E|∆| + \sum_{i=1}^n E|g_i(X_i)|^3 + \sqrt{α} \Big),
where C is an absolute constant, \{\hat X_i, 1 ≤ i ≤ n\} is an independent copy of \{X_i, 1 ≤ i ≤ n\}, and
(1.6) α = \frac{1}{n} \sum_{i=1}^n E|∆(X_1, \ldots, X_i, \ldots, X_n) − ∆(X_1, \ldots, \hat X_i, \ldots, X_n)|.

Shorack (2000) (see Lemma 11.1.3, p. 261, [22]) recently stated that for any random variables W and ∆,
(1.7) \sup_z |P(W + ∆ ≤ z) − Φ(z)| ≤ \sup_z |P(W ≤ z) − Φ(z)| + 4E|W∆| + 4E|∆|.
However, neither (1.5) nor (1.7) is correct, as will be shown by a counter-example in Section 2. The main purpose of this paper is to correct the bounds in (1.5) and (1.7) and to establish uniform and non-uniform Berry-Esseen bounds for general nonlinear statistics. The bounds are best possible for many known statistics. Our proof is based on a randomized concentration inequality approach to estimating P(W + ∆ ≤ z) − P(W ≤ z). Since uniform and non-uniform bounds for sums of independent random variables can be proved via the Stein method [8], which is much neater and simpler than the traditional Fourier analysis approach, this paper provides a direct and unifying treatment of Berry-Esseen bounds for general non-linear statistics.
This paper is organized as follows. A counter-example is provided in the next section. The main results are stated in Section 3, and five applications are presented in Section 4. Proofs of the main results are given in Section 5, while proofs of other results, including Example 2.1, are postponed to Section 6. Throughout this paper, C denotes an absolute constant whose value may change at each appearance. The L_p norm of a random variable X is denoted by ||X||_p, i.e., ||X||_p = (E|X|^p)^{1/p} for p ≥ 1.

2 A counter-example

Example 2.1 Let X_1, \ldots, X_n be independent normally distributed random variables with mean zero and variance 1/n. Let 0 < ε < 1/64 and n ≥ (1/ε)^4. Define
W = \sum_{i=1}^n X_i, \quad T := T_ε = W − ε|W|^{−1/2} + ε c_0 \quad and \quad ∆ = T − W = −ε|W|^{−1/2} + ε c_0,
where c_0 = E(|W|^{−1/2}) = \sqrt{2/π} \int_0^∞ x^{−1/2} e^{−x^2/2}\, dx. Let \{\hat X_i, 1 ≤ i ≤ n\} be an independent copy of \{X_i, 1 ≤ i ≤ n\} and define α as in (1.6). Then ET = 0,
(2.1) P(T ≤ ε c_0) − Φ(ε c_0) ≥ ε^{2/3}/6,
(2.2) E|W∆| + E|∆| ≤ 7ε,
(2.3) E|∆| + \sum_{i=1}^n E|X_i|^3 + \sqrt{α} ≤ Cε,

where C is an absolute constant. Clearly, (2.1) implies that
(2.4) \sup_z |P(T_ε ≤ z) − Φ(z)| ≥ ε^{2/3}/6
for 0 < ε < 1/64, which together with (2.2) shows that (1.7) is false. The mistake in the proof occurred when the author of [22] applied his "especially" remark on page 256 and used the incorrect inequality P(a < W + ∆ < b) ≤ (b − a). The same mistake occurred in the proof of Theorem 11.1.1 in [22]. Since (1.7) was applied to prove a Berry-Esseen bound for the U-statistic in [22], that proof is also incorrect. Now (2.3) and (2.4) imply that (1.5) is not correct. Since (1.5) is a special case of Theorem 2 in Bolthausen and Götze (1993), which was deduced from their Theorem 1 in that paper, the correctness of their Theorem 1 is questionable.
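To see the phenomenon numerically, the following Monte Carlo sketch (our own illustration, not part of the paper's argument; sample sizes and function names are ours) estimates the left-hand sides of (2.4) and (2.2). Since W is exactly N(0, 1), the simulation does not depend on n.

# Illustrative Monte Carlo check of Example 2.1 (a sketch, not from the paper).
import numpy as np
from scipy.stats import norm

def example_2_1(eps, reps=400_000, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal(reps)              # W = X_1 + ... + X_n is exactly N(0, 1)
    c0 = np.mean(np.abs(W) ** -0.5)            # c_0 = E|W|^{-1/2}
    Delta = -eps * np.abs(W) ** -0.5 + eps * c0
    T = W + Delta
    grid = np.linspace(-3.0, 3.0, 1201)
    emp = np.searchsorted(np.sort(T), grid, side="right") / reps
    sup_diff = np.max(np.abs(emp - norm.cdf(grid)))                     # sup_z |P(T <= z) - Phi(z)|
    linear_terms = np.mean(np.abs(W * Delta)) + np.mean(np.abs(Delta))  # E|W Delta| + E|Delta|
    return sup_diff, linear_terms

for eps in (1 / 64, 1 / 256):
    d, b = example_2_1(eps)
    print(f"eps={eps:.5f}: sup-distance {d:.4f} vs eps^(2/3)/6 = {eps ** (2/3) / 6:.4f}; E|W D|+E|D| = {b:.4f}")

In such runs the sup-distance stays of order ε^{2/3} while E|W∆| + E|∆| is of order ε, which is exactly the gap exploited by the counter-example.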

3 Main results

Let \{X_i, 1 ≤ i ≤ n\}, T, W, ∆ be defined as in Section 1. In the following theorems we assume that (1.1) is satisfied. Put
(3.1) β = \sum_{i=1}^n E|g_i(X_i)|^2 I(|g_i(X_i)| > 1) + \sum_{i=1}^n E|g_i(X_i)|^3 I(|g_i(X_i)| ≤ 1)
and let δ > 0 satisfy
(3.2) \sum_{i=1}^n E|g_i(X_i)| \min(δ, |g_i(X_i)|) ≥ 1/2.

Theorem 3.1 For each 1 ≤ i ≤ n, let ∆_i be a random variable such that X_i and (∆_i, W − g_i(X_i)) are independent. Then
(3.3) \sup_z |P(T ≤ z) − P(W ≤ z)| ≤ 4δ + E|W∆| + \sum_{i=1}^n E|g_i(X_i)(∆ − ∆_i)|
for δ satisfying (3.2),
(3.4) \sup_z |P(T ≤ z) − P(W ≤ z)| ≤ 2β + E|W∆| + \sum_{i=1}^n E|g_i(X_i)(∆ − ∆_i)|,
and
(3.5) \sup_z |P(T ≤ z) − Φ(z)| ≤ 6.1β + E|W∆| + \sum_{i=1}^n E|g_i(X_i)(∆ − ∆_i)|.

The next theorem provides a non-uniform bound.

Theorem 3.2 For each 1 ≤ i ≤ n, let ∆_i be a random variable such that X_i and (∆_i, \{X_j, j ≠ i\}) are independent. Then for δ satisfying (3.2) and for z ∈ R^1,
(3.6) |P(T ≤ z) − P(W ≤ z)| ≤ γ_z + e^{−|z|/3} τ,
where
(3.7) γ_z = P(|∆| > (|z| + 1)/3) + \sum_{i=1}^n P(|g_i(X_i)| > (|z| + 1)/3) + \sum_{i=1}^n P(|W − g_i(X_i)| > (|z| − 2)/3) P(|g_i(X_i)| > 1),
(3.8) τ = 21δ + 8.1 ||∆||_2 + 3.5 \sum_{i=1}^n ||g_i(X_i)||_2 ||∆ − ∆_i||_2.
In particular, if E|g_i(X_i)|^p < ∞ for 2 < p ≤ 3, then
(3.9) |P(T ≤ z) − Φ(z)| ≤ P(|∆| > (|z| + 1)/3) + C(|z| + 1)^{−p} \Big( ||∆||_2 + \sum_{i=1}^n ||g_i(X_i)||_2 ||∆ − ∆_i||_2 + \sum_{i=1}^n E|g_i(X_i)|^p \Big).

A result similar to (3.5) has been obtained by Friedrich (1989) for g_i = E(T | X_i) using the characteristic function method. Our proof is more direct and simpler, and the bounds are easier to calculate. The non-uniform bounds in (3.6) and (3.9) for general non-linear statistics are new.

Remark 3.1 Assume E|g_i(X_i)|^p < ∞ for p > 2. Let
(3.10) δ = \Big( \frac{2(p − 2)^{p−2}}{(p − 1)^{p−1}} \sum_{i=1}^n E|g_i(X_i)|^p \Big)^{1/(p−2)}.
Then (3.2) is satisfied. This follows from the inequality
(3.11) \min(a, b) ≥ a − \frac{(p − 2)^{p−2} a^{p−1}}{(p − 1)^{p−1} b^{p−2}}
for a ≥ 0 and b > 0.

Remark 3.2 If β ≤ 1/2, then (3.2) is satisfied with δ = β/2.

Remark 3.3 Let δ > 0 be such that
\sum_{i=1}^n E g_i^2(X_i) I(|g_i(X_i)| > δ) ≤ 1/2.
Then (3.2) holds. In particular, if X_1, X_2, \ldots, X_n are independent and identically distributed (i.i.d.) random variables and g_i = g_1, then (3.2) is satisfied with δ = c_0/\sqrt{n}, where c_0 is a number such that E(\sqrt{n} g_1(X_1))^2 I(|\sqrt{n} g_1(X_1)| > c_0) ≤ 1/2.
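For completeness, one way to check the first claim of Remark 3.3: since |g_i(X_i)| \min(δ, |g_i(X_i)|) ≥ g_i^2(X_i) I(|g_i(X_i)| ≤ δ), assumption (1.1) gives
\sum_{i=1}^n E|g_i(X_i)| \min(δ, |g_i(X_i)|) ≥ 1 − \sum_{i=1}^n E g_i^2(X_i) I(|g_i(X_i)| > δ) ≥ 1/2,
which is (3.2).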

Remark 3.4 In Theorems 3.1 and 3.2, the choice of ∆_i is flexible. For example, one can choose ∆_i = ∆(X_1, \ldots, X_{i−1}, 0, X_{i+1}, \ldots, X_n) or ∆_i = ∆(X_1, \ldots, X_{i−1}, \hat X_i, X_{i+1}, \ldots, X_n), where \{\hat X_i, 1 ≤ i ≤ n\} is an independent copy of \{X_i, 1 ≤ i ≤ n\}. The choice of g_i is also flexible. It can be more general than g_i(x) = E(T | X_i = x), which is commonly used in the literature.

Remark 3.5 Let X_1, \ldots, X_n be independent normally distributed random variables with mean zero and variance 1/n, and let W, T and ∆ be as in Example 2.1. Then
(3.12) E|W∆| + \sum_{i=1}^n E|X_i|^3 + \sum_{i=1}^n E|X_i(∆(X_1, \ldots, X_i, \ldots, X_n) − ∆(X_1, \ldots, 0, \ldots, X_n))| ≤ Cε^{2/3}
for (1/ε)^{4/3} ≤ n ≤ 16(1/ε)^{4/3}. This together with (2.4) shows that the bound in (3.4) is achievable. Moreover, the term \sum_{i=1}^n E|g_i(X_i)(∆ − ∆_i)| in (3.4) cannot be dropped.

4 Applications

Theorems 3.1 and 3.2 can be applied to a wide range of different statistics and provide bounds of the best possible order in many instances. To illustrate the usefulness and the generality of these results, we give five applications in this section. The uniform bounds refine many existing results by specifying absolute constants, while the non-uniform bounds are new in many cases.

4.1 U-statistics

Let X_1, X_2, \ldots, X_n be a sequence of independent and identically distributed (i.i.d.) random variables, and let h(x_1, \ldots, x_m) be a real-valued Borel measurable symmetric function of m variables, where m (2 ≤ m < n) may depend on n. Consider the Hoeffding (1948) U-statistic
U_n = \binom{n}{m}^{-1} \sum_{1 ≤ i_1 < \cdots < i_m ≤ n} h(X_{i_1}, \ldots, X_{i_m}).

Put g(x) = E(h(X_1, \ldots, X_m) | X_1 = x), σ_1^2 = E g^2(X_1) and σ^2 = E h^2(X_1, \ldots, X_m).

Theorem 4.1 Assume that Eh(X_1, \ldots, X_m) = 0, σ^2 < ∞ and σ_1 > 0, and let c_0 be a constant such that E g^2(X_1) I(|g(X_1)| > c_0 σ_1) ≤ σ_1^2/2. If in addition E|g(X_1)|^p < ∞ for 2 < p ≤ 3, then
(4.2) \sup_z \Big| P\Big( \frac{\sqrt{n}}{m σ_1} U_n ≤ z \Big) − Φ(z) \Big| ≤ \frac{(1 + \sqrt{2})(m − 1)σ}{(m(n − m + 1))^{1/2} σ_1} + \frac{6.1\, E|g(X_1)|^p}{n^{(p−2)/2} σ_1^p}

and for z ∈ R^1,
(4.3) \Big| P\Big( \frac{\sqrt{n}}{m σ_1} U_n ≤ z \Big) − Φ(z) \Big| ≤ \frac{9 m σ^2}{(1 + |z|)^2 (n − m + 1) σ_1^2} + \frac{13.5\, e^{−|z|/3} m^{1/2} σ}{(n − m + 1)^{1/2} σ_1} + \frac{C\, E|g(X_1)|^p}{(1 + |z|)^p n^{(p−2)/2} σ_1^p}.
Moreover, if E|h(X_1, \ldots, X_m)|^p < ∞ for 2 < p ≤ 3, then for z ∈ R^1,
(4.4) \Big| P\Big( \frac{\sqrt{n}}{m σ_1} U_n ≤ z \Big) − Φ(z) \Big| ≤ \frac{C m^{1/2} E|h(X_1, \ldots, X_m)|^p}{(1 + |z|)^p (n − m + 1)^{1/2} σ_1^p} + \frac{C\, E|g(X_1)|^p}{(1 + |z|)^p n^{(p−2)/2} σ_1^p}.

Note that the error in (4.1) is of order O(n^{−1/2}) under only the assumption of a finite second moment of h. This result appears not to have been known before. The uniform bound given in (4.2) is not new; however, the explicit constant for general m is new. A finite second moment of h is not the weakest assumption for the uniform bound: Friedrich (1989) obtained the order O(n^{−1/2}) when E|h|^{5/3} < ∞, which is necessary for the bound, as shown by Bentkus, Götze and Zitikis (1994). For the non-uniform bound, Zhao and Chen (1983) proved that if m = 2 and E|h(X_1, X_2)|^3 < ∞, then
(4.5) \Big| P\Big( \frac{\sqrt{n}}{m σ_1} U_n ≤ z \Big) − Φ(z) \Big| ≤ A n^{−1/2} (1 + |z|)^{−3}

for z ∈ R^1, where the constant A does not depend on n and z. Clearly, (4.4) refines Zhao and Chen's result by specifying how the constant A depends on the moment condition. After we finished proving Theorem 4.1, Wang (2001) informed the second author that he had also obtained (4.4) for m = 2 and p = 3.

Remark 4.1 (4.3) implies that
(4.6) \Big| P\Big( \frac{\sqrt{n}}{m σ_1} U_n ≤ z \Big) − Φ(z) \Big| ≤ \frac{C m^{1/2} σ^2}{(1 + |z|)^3 (n − m + 1)^{1/2} σ_1^2} + \frac{C\, E|g(X_1)|^p}{(1 + |z|)^p n^{(p−2)/2} σ_1^p}
for |z| ≤ ((n − m + 1)/m)^{1/2}. For |z| > ((n − m + 1)/m)^{1/2}, a bound like (4.6) can easily be obtained by using the Chebyshev inequality. On the other hand, if (4.6) is to hold for any z ∈ R^1, then it appears necessary to assume E|h(X_1, \ldots, X_m)|^p < ∞.
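As an illustration of the decomposition T = W + ∆ underlying Theorem 4.1, the following sketch (ours; the kernel, sample size and seed are illustrative choices, not from the paper) computes a degree-2 U-statistic and its projection.

# Illustrative sketch (not from the paper): a degree-2 U-statistic with the centered
# sample-variance kernel h(a, b) = (a - b)^2 / 2 - 1, for which g(x) = E h(x, X_2) = (x^2 - 1)/2
# and sigma_1^2 = 1/2 when the data are standard normal.
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.standard_normal(n)

h = lambda a, b: (a - b) ** 2 / 2.0 - 1.0
g = lambda a: (a ** 2 - 1.0) / 2.0

U = np.mean([h(x[i], x[j]) for i, j in itertools.combinations(range(n), 2)])
m, sigma1 = 2, np.sqrt(0.5)

T = np.sqrt(n) * U / (m * sigma1)            # normalised statistic, as in Theorem 4.1
W = np.sum(g(x)) / (np.sqrt(n) * sigma1)     # linear part: sum of g_i(X_i) with sum E g_i^2 = 1
Delta = T - W                                # remainder Delta = T - W, of order n^{-1/2}
print(T, W, Delta)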

4.2 Multi-sample U-statistics

Consider k independent sequences \{X_{j1}, \ldots, X_{j n_j}\} of i.i.d. random variables, j = 1, \ldots, k. Let h(x_{jl}, l = 1, \ldots, m_j, j = 1, \ldots, k) be a measurable function symmetric with respect to the m_j arguments of the j-th set, m_j ≥ 1, j = 1, \ldots, k. Let θ = Eh(X_{jl}, l = 1, \ldots, m_j, j = 1, \ldots, k). The multi-sample U-statistic is defined as
U_{\bar n} = \Big\{ \prod_{j=1}^k \binom{n_j}{m_j}^{-1} \Big\} \sum h(X_{jl}, l = i_{j1}, \ldots, i_{j m_j}, j = 1, \ldots, k),
where \bar n = (n_1, \ldots, n_k) and the summation is carried out over all 1 ≤ i_{j1} < \cdots < i_{j m_j} ≤ n_j, n_j ≥ 2 m_j, j = 1, \ldots, k. Clearly, U_{\bar n} is an unbiased estimate of θ. The two-sample Wilcoxon statistic and the two-sample ω^2-statistic are two typical examples of multi-sample U-statistics. Without loss of generality, assume θ = 0. For j = 1, \ldots, k, define
h_j(x) = E\big( h(X_{11}, \ldots, X_{1 m_1}; \ldots; X_{k1}, \ldots, X_{k m_k}) \mid X_{j1} = x \big)
and let σ_j^2 = E h_j^2(X_{j1}) and
σ_{\bar n}^2 = \sum_{j=1}^k \frac{m_j^2}{n_j} σ_j^2.

A uniform Berry-Esseen bound of order O((\min_{1 ≤ j ≤ k} n_j)^{−1/2}) for multi-sample U-statistics was obtained by Helmers and Janssen (1982) and Borovskich (1983) (see Koroljuk and Borovskich (1994), pp. 304-311). The next theorem refines their results.

Theorem 4.2 Assume that θ = 0, σ^2 := E h^2(X_{11}, \ldots, X_{1 m_1}; \ldots; X_{k1}, \ldots, X_{k m_k}) < ∞ and \max_{1 ≤ j ≤ k} σ_j > 0. Then for 2 < p ≤ 3,
(4.7) \sup_z \big| P(σ_{\bar n}^{-1} U_{\bar n} ≤ z) − Φ(z) \big| ≤ \frac{(1 + \sqrt{2}) σ}{σ_{\bar n}} \sum_{j=1}^k \frac{m_j^2}{n_j} + \frac{6.6}{σ_{\bar n}^p} \sum_{j=1}^k \frac{m_j^p E|h_j(X_{j1})|^p}{n_j^{p−1}}
and for z ∈ R^1,
(4.8) \big| P(σ_{\bar n}^{-1} U_{\bar n} ≤ z) − Φ(z) \big| ≤ \frac{9 σ^2}{(1 + |z|)^2 σ_{\bar n}^2} \Big( \sum_{j=1}^k \frac{m_j^2}{n_j} \Big)^2 + 13.5\, e^{−|z|/3}\, \frac{σ}{σ_{\bar n}} \sum_{j=1}^k \frac{m_j^2}{n_j} + \frac{C}{(1 + |z|)^p σ_{\bar n}^p} \sum_{j=1}^k \frac{m_j^p E|h_j(X_{j1})|^p}{n_j^{p−1}}.
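For concreteness, the two-sample Wilcoxon statistic mentioned above corresponds to k = 2, m_1 = m_2 = 1 and kernel h(x, y) = I(x ≤ y), so that θ = P(X_{11} ≤ X_{21}); a minimal sketch (ours, with illustrative sample sizes):

# Minimal illustration (ours): the two-sample Wilcoxon / Mann-Whitney statistic as a
# multi-sample U-statistic with k = 2, m_1 = m_2 = 1 and kernel h(x, y) = 1{x <= y}.
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(150)        # sample 1
y = rng.standard_normal(120)        # sample 2

# U = (1/(n1*n2)) * sum_{i,j} 1{x_i <= y_j}; here theta = P(X <= Y) = 1/2.
U = np.mean(x[:, None] <= y[None, :])
print(U)                            # close to 1/2 for equal distributions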

4.3 L-statistics

Let X_1, \ldots, X_n be i.i.d. random variables with a common distribution function F, and let F_n be the empirical distribution function defined by
F_n(x) = n^{-1} \sum_{i=1}^n I(X_i ≤ x) for x ∈ R^1.
Let J(t) be a real-valued function on [0, 1] and define
T(G) = \int_{-∞}^{∞} x J(G(x))\, dG(x)
for non-decreasing measurable functions G. Put
σ^2 = \int_{-∞}^{∞} \int_{-∞}^{∞} J(F(s)) J(F(t)) F(\min(s, t)) (1 − F(\max(s, t)))\, ds\, dt
and
g(x) = \int_{-∞}^{∞} (I(x ≤ s) − F(s)) J(F(s))\, ds.

The statistic T(F_n) is called an L-statistic (see Serfling (1980), Chapter 8). Uniform Berry-Esseen bounds for L-statistics with smooth J were given by Helmers (1977) and Helmers, Janssen and Serfling (1990). Applying Theorems 3.1 and 3.2 yields the following uniform and non-uniform bounds for L-statistics.

Theorem 4.3 Let n ≥ 4. Assume that EX_1^2 < ∞ and E|g(X_1)|^p < ∞ for 2 < p ≤ 3. If the weight function J(t) is Lipschitz of order 1 on [0, 1], that is, there exists a constant c_0 such that
(4.9) |J(t) − J(s)| ≤ c_0 |t − s| for 0 ≤ s, t ≤ 1,
then
(4.10) \sup_z |P(\sqrt{n} σ^{-1}(T(F_n) − T(F)) ≤ z) − Φ(z)| ≤ \frac{(1 + \sqrt{2}) c_0 ||X_1||_2}{\sqrt{n}\, σ} + \frac{6.1\, E|g(X_1)|^p}{n^{(p−2)/2} σ^p}
and

(4.11) |P(\sqrt{n} σ^{-1}(T(F_n) − T(F)) ≤ z) − Φ(z)| ≤ \frac{9 c_0^2 E X_1^2}{(1 + |z|)^2 n σ^2} + \frac{C}{(1 + |z|)^p} \Big( \frac{c_0 ||X_1||_2}{\sqrt{n}\, σ} + \frac{E|g(X_1)|^p}{n^{(p−2)/2} σ^p} \Big).
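As a concrete instance of T(F_n) (our own sketch; the weight J(t) = 6t(1 − t) and sample size are illustrative), note that dF_n places mass 1/n at each order statistic X_{(i)} and F_n(X_{(i)}) = i/n, so T(F_n) = n^{-1} \sum_i X_{(i)} J(i/n):

# Illustrative sketch (ours): computing the L-statistic T(F_n) with a smooth weight
# J(t) = 6 t (1 - t), which satisfies the Lipschitz condition (4.9).
import numpy as np

def l_statistic(x, J):
    xs = np.sort(x)
    n = len(xs)
    return np.mean(xs * J(np.arange(1, n + 1) / n))   # F_n(X_(i)) = i/n

rng = np.random.default_rng(3)
x = rng.standard_normal(400)
print(l_statistic(x, lambda t: 6.0 * t * (1.0 - t)))  # for symmetric F, T(F) = 0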

4.4 Random sums of independent random variables with non-random centering

Let \{X_i, i ≥ 1\} be i.i.d. random variables with EX_i = µ and Var(X_i) = σ^2, and let \{N_n, n ≥ 1\} be a sequence of non-negative integer-valued random variables that are independent of \{X_i, i ≥ 1\}. Assume that EN_n^2 < ∞ and
\frac{N_n − EN_n}{\sqrt{Var(N_n)}} \xrightarrow{d.} N(0, 1).
Then, by Robbins (1948),
\frac{\sum_{i=1}^{N_n} X_i − (EN_n) µ}{\sqrt{σ^2 EN_n + µ^2 Var(N_n)}} \xrightarrow{d.} N(0, 1).
This is a special case of limit theorems for random sums with non-random centering. This kind of problem arises, for example, in the study of Galton-Watson branching processes. We refer to Finkelstein, Kruglov and Tucker (1994) and the references therein for recent developments in this area. As another application of our general result, we give a uniform Berry-Esseen bound for the random sum.


Theorem 4.4 Let \{Y_i, i ≥ 1\} be i.i.d. non-negative integer-valued random variables with EY_i = ν and Var(Y_i) = τ^2, and put N_n = \sum_{i=1}^n Y_i. Assume that E|X_i|^3 < ∞ and that \{Y_i, i ≥ 1\} and \{X_i, i ≥ 1\} are independent. Then
(4.12) \sup_x \Big| P\Big( \frac{\sum_{i=1}^{N_n} X_i − n µ ν}{\sqrt{n(ν σ^2 + τ^2 µ^2)}} ≤ x \Big) − Φ(x) \Big| ≤ C n^{-1/2} \Big( \frac{τ^2}{ν^2} + \frac{E|X_1|^3}{σ^3} + \frac{σ}{µ \sqrt{ν}} \Big).
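To illustrate the normalization in Theorem 4.4, here is a small simulation sketch (ours; the Poisson choice for Y_i, for which τ^2 = ν, and all numerical values are illustrative):

# Illustrative sketch (ours): the random-sum normalisation of Theorem 4.4 with
# Y_i ~ Poisson(nu) (so tau^2 = nu) and X_i ~ N(mu, sigma^2).
import numpy as np

rng = np.random.default_rng(4)
n, nu, mu, sigma = 400, 3.0, 1.0, 2.0
reps = 5_000

Nn = rng.poisson(nu, size=(reps, n)).sum(axis=1)                     # N_n = Y_1 + ... + Y_n
S = np.array([rng.normal(mu, sigma, size=k).sum() for k in Nn])      # sum_{i<=N_n} X_i
Z = (S - n * mu * nu) / np.sqrt(n * (nu * sigma**2 + nu * mu**2))    # tau^2 = nu for Poisson
print(Z.mean(), Z.std())                                             # approximately 0 and 1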

4.5 Functions of non-linear statistics

Let X_1, X_2, \ldots, X_n be a random sample and let \hat Θ_n = \hat Θ_n(X_1, \ldots, X_n) be a weakly consistent estimator of an unknown parameter θ. Assume that \hat Θ_n can be written as
\hat Θ_n = θ + \frac{1}{\sqrt{n}} \Big( \sum_{i=1}^n g_i(X_i) + ∆ \Big),
where the g_i are Borel measurable functions with Eg_i(X_i) = 0 and \sum_{i=1}^n Eg_i^2(X_i) = 1, and ∆ := ∆_n(X_1, \ldots, X_n) → 0 in probability. Let h be a real-valued function differentiable in a neighborhood of θ with h'(θ) ≠ 0. Then it is known that
\frac{\sqrt{n}(h(\hat Θ_n) − h(θ))}{h'(θ)} \xrightarrow{d.} N(0, 1)
under some regularity conditions. When \hat Θ_n is the sample mean, the Berry-Esseen bound and Edgeworth expansion have been well studied (see Bhattacharya and Ghosh (1978)). The next theorem shows that the results in Section 3 can be extended to functions of non-linear statistics.

Theorem 4.5 Assume that h'(θ) ≠ 0 and δ(c_0) = \sup_{|x−θ| ≤ c_0} |h''(x)| < ∞ for some c_0 > 0. Then for 2 < p ≤ 3,
(4.13) \sup_z \Big| P\Big( \frac{\sqrt{n}(h(\hat Θ_n) − h(θ))}{h'(θ)} ≤ z \Big) − Φ(z) \Big| ≤ \Big( 1 + \frac{c_0 δ(c_0)}{|h'(θ)|} \Big) \Big( E|W∆| + \sum_{i=1}^n E|g_i(X_i)(∆ − ∆_i)| \Big) + 6.1 \sum_{i=1}^n E|g_i(X_i)|^p + \frac{4}{c_0^2 n} + \frac{2E|∆|}{c_0 n^{1/2}} + \frac{4.4\, c_0^{3−p} δ(c_0)}{|h'(θ)| n^{(p−2)/2}},
where W = \sum_{i=1}^n g_i(X_i).
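The reduction behind Theorem 4.5 is presumably the usual Taylor step (a sketch, not the paper's wording): writing \hat Θ_n = θ + n^{-1/2}(W + ∆) and expanding h around θ, on the event |\hat Θ_n − θ| ≤ c_0 one has, for some ξ_n between θ and \hat Θ_n,
\frac{\sqrt{n}(h(\hat Θ_n) − h(θ))}{h'(θ)} = W + ∆ + \frac{h''(ξ_n)}{2 h'(θ)} \sqrt{n}\, (\hat Θ_n − θ)^2,
so the results of Section 3 apply with the last two terms playing the role of the remainder, while the event |\hat Θ_n − θ| > c_0 contributes the terms involving c_0 in (4.13).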

5 Proof of Main Theorems

Proof of Theorem 3.1. (3.5) follows from (3.4) and (1.4). When β > 1/2, (3.4) is trivial. For β ≤ 1/2, (3.4) is a consequence of (3.3) and Remark 3.2. Thus, we only need to prove (3.3). Note that
(5.1) −P(z − |∆| ≤ W ≤ z) ≤ P(T ≤ z) − P(W ≤ z) ≤ P(z ≤ W ≤ z + |∆|).
It suffices to show that
(5.2) P(z ≤ W ≤ z + |∆|) ≤ 4δ + E|W∆| + \sum_{i=1}^n E|g_i(X_i)(∆ − ∆_i)|
and
(5.3) P(z − |∆| ≤ W ≤ z) ≤ 4δ + E|W∆| + \sum_{i=1}^n E|g_i(X_i)(∆ − ∆_i)|,
where δ satisfies (3.2). Let
(5.4) f_∆(w) = \begin{cases} −|∆|/2 − δ & for w ≤ z − δ, \\ w − \frac{1}{2}(2z + |∆|) & for z − δ ≤ w ≤ z + |∆| + δ, \\ |∆|/2 + δ & for w > z + |∆| + δ. \end{cases}

Let ξ_i = g_i(X_i),
\hat M_i(t) = ξ_i \{ I(−ξ_i ≤ t ≤ 0) − I(0 < t ≤ −ξ_i) \}, \quad M_i(t) = E\hat M_i(t), \quad \hat M(t) = \sum_{i=1}^n \hat M_i(t), \quad M(t) = E\hat M(t).

Since ξ_i and f_{∆_i}(W − ξ_i) are independent for 1 ≤ i ≤ n and Eξ_i = 0, we have
(5.5) E\{W f_∆(W)\} = \sum_{1 ≤ i ≤ n} E\{ξ_i (f_∆(W) − f_∆(W − ξ_i))\} + \sum_{1 ≤ i ≤ n} E\{ξ_i (f_∆(W − ξ_i) − f_{∆_i}(W − ξ_i))\} := H_1 + H_2.
Using the fact that \hat M(t) ≥ 0 and f'_∆(w) ≥ 0, we have

(5.6) H_1 = \sum_{1 ≤ i ≤ n} E\Big\{ ξ_i \int_{−ξ_i}^{0} f'_∆(W + t)\, dt \Big\}
= \sum_{1 ≤ i ≤ n} E\Big\{ \int_{−∞}^{∞} f'_∆(W + t) \hat M_i(t)\, dt \Big\}
= E\Big\{ \int_{−∞}^{∞} f'_∆(W + t) \hat M(t)\, dt \Big\}
≥ E\Big\{ \int_{|t| ≤ δ} f'_∆(W + t) \hat M(t)\, dt \Big\}
≥ E\Big\{ I(z ≤ W ≤ z + |∆|) \int_{|t| ≤ δ} \hat M(t)\, dt \Big\}
= \sum_{1 ≤ i ≤ n} E\big\{ I(z ≤ W ≤ z + |∆|)\, |ξ_i| \min(δ, |ξ_i|) \big\}
≥ H_{1,1} − H_{1,2},
where
H_{1,1} = P(z ≤ W ≤ z + |∆|) \sum_{1 ≤ i ≤ n} Eη_i, \quad H_{1,2} = E\Big| \sum_{1 ≤ i ≤ n} (η_i − Eη_i) \Big|, \quad η_i = |ξ_i| \min(δ, |ξ_i|).

By (3.2), \sum_{1 ≤ i ≤ n} Eη_i ≥ 1/2. Hence
(5.7) H_{1,1} ≥ (1/2) P(z ≤ W ≤ z + |∆|).

By the Cauchy-Schwarz inequality,
(5.8) H_{1,2} ≤ \Big( E\Big( \sum_{1 ≤ i ≤ n} (η_i − Eη_i) \Big)^2 \Big)^{1/2} ≤ \Big( \sum_{1 ≤ i ≤ n} Eη_i^2 \Big)^{1/2} ≤ δ,
since η_i ≤ δ|ξ_i| and \sum_{1 ≤ i ≤ n} Eξ_i^2 = 1.

As to H_2, it is easy to see that
|f_∆(w) − f_{∆_i}(w)| ≤ \big| |∆| − |∆_i| \big|/2 ≤ |∆ − ∆_i|/2.
Hence
(5.9) |H_2| ≤ (1/2) \sum_{i=1}^n E|ξ_i(∆ − ∆_i)|.

Combining (5.5), (5.7), (5.8) and (5.9) yields
P(z ≤ W ≤ z + |∆|) ≤ 2\Big\{ E|W f_∆(W)| + δ + (1/2) \sum_{i=1}^n E|ξ_i(∆ − ∆_i)| \Big\}
≤ E|W∆| + 2δ E|W| + 2δ + \sum_{i=1}^n E|ξ_i(∆ − ∆_i)|
≤ 4δ + E|W∆| + \sum_{i=1}^n E|ξ_i(∆ − ∆_i)|,
using |f_∆(W)| ≤ |∆|/2 + δ and E|W| ≤ (EW^2)^{1/2} = 1.

This proves (5.2). Similarly, one can prove (5.3), and hence Theorem 3.1.

Proof of Theorem 3.2. First we prove (3.9). For |z| ≤ 4, (3.9) holds by (3.5). For |z| > 4, consider two cases.

Case 1: \sum_{i=1}^n E|g_i(X_i)|^p > 1/2.

By the Rosenthal (1970) inequality, we have
(5.10) P(|W| > (|z| − 2)/3) ≤ P(|W| > |z|/6) ≤ (|z|/6)^{−p} E|W|^p ≤ C(|z| + 1)^{−p} \Big\{ \Big( \sum_{i=1}^n Eg_i^2(X_i) \Big)^{p/2} + \sum_{i=1}^n E|g_i(X_i)|^p \Big\} ≤ C(|z| + 1)^{−p} \sum_{i=1}^n E|g_i(X_i)|^p.

Hence
|P(T ≤ z) − Φ(z)| ≤ P(|∆| > (|z| + 1)/3) + P(|W| > (|z| − 2)/3) + P(|N(0, 1)| > |z|) ≤ P(|∆| > (|z| + 1)/3) + C(|z| + 1)^{−p} \sum_{i=1}^n E|g_i(X_i)|^p,

which shows that (3.9) holds.

Case 2: \sum_{i=1}^n E|g_i(X_i)|^p ≤ 1/2.

Similar to (5.10), we have
P(|W − g_i(X_i)| > (|z| − 2)/3) ≤ C(|z| + 1)^{−p} \Big\{ \Big( \sum_{j=1}^n Eg_j^2(X_j) \Big)^{p/2} + \sum_{j=1}^n E|g_j(X_j)|^p \Big\} ≤ C(|z| + 1)^{−p}

and hence
γ_z ≤ P(|∆| > (|z| + 1)/3) + \sum_{i=1}^n ((|z| + 1)/3)^{−p} E|g_i(X_i)|^p + \sum_{i=1}^n C(|z| + 1)^{−p} E|g_i(X_i)|^p
≤ P(|∆| > (|z| + 1)/3) + C(|z| + 1)^{−p} \sum_{i=1}^n E|g_i(X_i)|^p.

By Remark 3.1, we can choose
δ = \Big( \frac{2(p − 2)^{p−2}}{(p − 1)^{p−1}} \sum_{i=1}^n E|g_i(X_i)|^p \Big)^{1/(p−2)} ≤ \frac{2(p − 2)^{p−2}}{(p − 1)^{p−1}} \sum_{i=1}^n E|g_i(X_i)|^p.
Combining the above inequalities with (3.6) and the non-uniform Berry-Esseen bound for independent random variables yields (3.9).

Next we prove (3.6). The main idea of the proof is first to truncate g_i(X_i) and then adapt the proof of Theorem 3.1 to the truncated sum. Without loss of generality, assume z ≥ 0, as we can simply apply the result to −T. By (5.1), it suffices to show that
(5.11) P(z − |∆| ≤ W ≤ z) ≤ γ_z + e^{−z/3} τ
and
(5.12) P(z ≤ W ≤ z + |∆|) ≤ γ_z + e^{−z/3} τ.
Since the proof of (5.12) is similar to that of (5.11), we only prove (5.11). It is easy to see that
P(z − |∆| ≤ W ≤ z) ≤ P(|∆| > (z + 1)/3) + P(z − |∆| ≤ W ≤ z, |∆| ≤ (z + 1)/3).
Now (5.11) follows directly from Lemmas 5.1 and 5.2 below. This completes the proof of Theorem 3.2.

Lemma 5.1 Let ξ_i = g_i(X_i), \bar ξ_i = ξ_i I(ξ_i ≤ 1), and \bar W = \sum_{i=1}^n \bar ξ_i. Then
(5.13) P(z − |∆| ≤ W ≤ z, |∆| ≤ (z + 1)/3) ≤ P(z − |∆| ≤ \bar W ≤ z, |∆| ≤ (z + 1)/3) + \sum_{i=1}^n P(ξ_i > (z + 1)/3) + \sum_{i=1}^n P(W − ξ_i > (z − 2)/3) P(|ξ_i| > 1).

+P (z − |∆| ≤ W ≤ z, |∆| ≤ (z + 1)/3, max |ξi | > 1) 1≤i≤n n X

¯ ≤ z, |∆| ≤ (z + 1)/3) + ≤ P (z − |∆| ≤ W

P (W > (2z − 1)/3, |ξi | > 1)

i=1

and n X

P (W > (2z − 1)/3, |ξi | > 1)

i=1

15

n X



i=1 n X



i=1 n X

=

P (ξi > (z + 1)/3) + P (ξi > (z + 1)/3) + P (ξi > (z + 1)/3) +

n X i=1 n X i=1 n X

i=1

P (W > (2z − 1)/3, ξi ≤ (z + 1)/3, |ξi | > 1) P (W − ξi > (z − 2)/3, |ξi | > 1) P (W − ξi > (z − 2)/3)P (|ξi | > 1),

i=1

as desired. Lemma 5.2 We have ¯ ≤ z, |∆| ≤ (z + 1)/3) ≤ e−z/3 τ. P (z − |∆| ≤ W

(5.14)

Proof. Noting that E ξ¯i ≤ 0, es ≤ 1 + s + s2 (ea − 1 − a)a−2 for s ≤ a and a > 0 and that aξ¯i ≤ a, we have for a > 0 ¯

EeaW

(5.15)

= ≤

n Y

¯

Eeaξi

i=1 n  Y

1 + a E ξ¯i + (ea − 1 − a)E ξ¯i2



i=1



n X



n X

≤ exp (ea − 1 − a)

E ξ¯i2



i=1

≤ exp (ea − 1 − a)



Eξi2 = exp(ea − 1 − a).

i=1

In particular, we have

¯ EeW /2

≤ exp(e1/2 − 1.5). If δ ≥ 0.07, then

¯ ≤ z, |∆| ≤ (z + 1)/3) P (z − |∆| ≤ W ¯ /2 ¯ > (2z − 1)/3) ≤ e−z/3+1/6 EeW ≤ P (W

≤ e−z/3 exp(e.5 − 4/3) ≤ 1.38e−z/3 ≤ 20δ e−z/3 . This proves (5.14) when δ ≥ 0.07. For δ < 0.07, let

(5.16)

f∆ (w) =

 0   

ew/2 (w

  

for w ≤ z − |∆| − δ, − z + |∆| + δ)

ew/2 (|∆| + 2δ)

for z − |∆| − δ ≤ w ≤ z + δ, for w > z + δ.

Put ¯ i (t) = ξi {I(−ξ¯i ≤ t ≤ 0) − I(0 < t ≤ −ξ¯i )}, M ¯ (t) = M

n X i=1

16

¯ i (t). M

By (5.5) and similar to (5.6), we have (5.17)

¯ )} E{W f∆ (W

=

E

nZ



−∞

+

n X

0 ¯ ¯ (t)dt f∆ (W + t)M

n

o

¯ − ξ¯i ) − f∆ (W ¯ − ξ¯i )) E ξi (f∆ (W i

i=1

:= G1 + G2 , ¯ (t) ≥ 0, f 0 (w) ≥ ew/2 for z − |∆| − δ ≤ w ≤ z + δ and f 0 (w) ≥ 0 for It follows from the fact that M ∆ ∆ all w, (5.18)

G1 ≥ E

nZ |t|≤δ

0 ¯ ¯ (t)dt f∆ (W + t)M

n

Z

n

oZ

¯ ¯ ≤ z, |∆| ≤ (z + 1)/3) ≥ E eW /2 I(z − |∆| ≤ W

o

¯ (t)dt M

|t|≤δ ¯

¯ ≤ z, |∆| ≤ (z + 1)/3) = E eW /2 I(z − |∆| ≤ W

¯ (t)dt EM

|t|≤δ

n

¯

¯ ≤ z, |∆| ≤ (z + 1)/3) +E eW /2 I(z − |∆| ≤ W

Z

¯ (t) − E M ¯ (t))dt (M

o

|t|≤δ

≥ G1,1 − G1,2 , where z/3−1/6

G1,1 = e

G1,2 = E

¯ ≤ z, |∆| ≤ (z + 1)/3) P (z − |∆| ≤ W

Z

¯ (t)dt, EM

|t|≤δ

nZ

¯ ¯ (t) − E M ¯ (t)|dt . eW /2 |M

o

|t|≤δ

By (3.2) and the assumption that δ ≤ 0.07, Z

¯ (t)dt = EM

|t|≤δ

n X

E|ξi | min(δ, |ξ¯i |)

i=1

=

n X

E|ξi | min(δ, |ξi |) ≥ 1/2.

i=1

Hence (5.19)





¯ ≤ z, |∆| ≤ (z + 1)/3 . G1,1 ≥ (1/2)ez/3−1/6 P z − |∆| ≤ W ¯

By (5.15), we have EeW ≤ exp(e − 2) < 2.06. It follows from the Cauchy-Schwarz inequality that (5.20)

G1,2 ≤ .5

Z



¯



ˆ (t) − M (t)|2 dt 0.5EeW + 2E|M

|t|≤δ

n

≤ 0.5 2.06δ + 2

n Z X i=1 |t|≤δ

o

Eξi2 (I(−ξ¯i ≤ t ≤ 0) + I(0 < t ≤ −ξ¯i ))dt 17

n

n X

n

i=1 n X

= 0.5 2.06δ + 2

o

Eξi2 min(δ, |ξ¯i |)

≤ 0.5 2.06δ + 2δ

o

Eξi2 ≤ 2.03δ.

i=1

As to G2 , it is easy to see that



|f∆ (w) − f∆i (w)| ≤ ew/2 |∆| − |∆i | ≤ ew/2 |∆ − ∆i |. ¯ − ξ¯i are independent Hence, by the H¨older inequality, (5.15) and the assumption that ξi and W |G2 | ≤

(5.21)

n X

¯

¯

¯

¯

E|ξi e(W −ξi )/2 (∆ − ∆i )|

i=1



n  X

Eξi2 eW −ξi

1/2 

E(∆ − ∆i )2

1/2

i=1

=

n  X

¯

¯

Eξi2 EeW −ξi

1/2

||∆ − ∆i ||2

i=1

≤ 1.44 Following the proof of (5.15) and by ¯ 2 W

EW e

=

n X

¯

n X

||ξi ||2 ||∆ − ∆i ||2 .

i=1 using |es ¯

− 1| ≤ |s|(ea − 1)/a for s ≤ a and a > 0, we have

¯

Eξi2 eξi EeW −ξi +

i=1

¯

¯

¯

¯

1≤i6=j≤n

≤ 2.06 e

n X

X

Eξi2 + 2.06(e − 1)2

i=1

Eξi2 Eξj2

1≤i6=j≤n 2

2

≤ 2.06 e + 2.06(e − 1) < 3.42 . Thus, we obtain ¯ /2 ¯ )} ≤ E|W |eW E{W f∆ (W (|∆| + 2δ)

(5.22)



n

||∆||2 + 2δ

¯

o

1/2

E(W 2 eW )

≤ 3.42(||∆||2 + 2δ). Combining (5.17), (5.19), (5.20), (5.21) and (5.22) yields ¯ ≤ z, |∆| ≤ (z + 1)/3) P (z − |∆| ≤ W n

≤ 2e−z/3+1/6 3.42(||∆||2 + 2δ) + 2.03δ + 1.44

n X

||ξi ||2 ||∆ − ∆i ||2

i=1

n

≤ e−z/3 21δ + 8.1||∆||2 + 3.5

n X i=1

−z/3

= e

τ. 18

¯

Eξi (eξi − 1)Eξj (eξj − 1)EeW −ξi −ξj

X

||ξi ||2 ||∆ − ∆i ||2

o

o

This proves (5.14). Proof of Remark 3.1. It is known that for x ≥ 0, y ≥ 0, α > 0, γ > 0 with α + γ = 1 xα y γ ≤ αx + γy, 

p−1

p−2 p−2 a which yields with α = (p−2)/(p−1), γ = 1/(p−1), x = b(p−1)/(p−2) and y = ( p−1 ) bp−2

a = xα y γ ≤ αx + γy = b +

1/(p−1)

,

(p − 2)p−2 ap−1 (p − 1)p−1 bp−2

or b≥a−

(p − 2)p−2 ap−1 . (p − 1)p−1 bp−2

a≥a−

(p − 2)p−2 ap−1 . (p − 1)p−1 bp−2

On the other hand, it is clear that

This proves (3.11). Now (3.2) follows directly from (3.11), (3.10) and the assumption (1.1). Proof of Remark 3.2. Note that δ = β/2 ≤ 1/4. Applying (3.11) with p = 3 yields n X

E|gi (Xi )| min(δ, |gi (Xi )|)

i=1

≥ ≥

n X

E|gi (Xi )|I(|gi (Xi )| ≤ 1) min(δ, |gi (Xi )|)

i=1 n n X

o

Egi2 (Xi )I(|gi (Xi )| ≤ 1) − E|gi (Xi )|3 I(|gi (Xi )| ≤ 1)/(4δ)

i=1



= 1 − 4δ

n X

Egi2 (Xi )I(|gi (Xi )| > 1) +

i=1

n X



E|gi (Xi )|3 I(|gi (Xi )| ≤ 1) /(4δ)

i=1

≥ 1 − β/(4δ) = 1/2. This proves Remark 3.2. Proof of Remark 3.5. The proof is postponed to the end of next section.

6 6.1

Proofs of Other Theorems Proof of Theorem 4.1

¯ k (x1 , . . . , xk ) = For 1 ≤ k ≤ m, let hk (x1 , . . . , xk ) = E(h(X1 , . . . , Xm )|X1 = x1 , . . . , Xk = xk ) and h hk (x1 , . . . , xk ) −

Pk

i=1 g(xi ).

Observing that

Un = n−1 m

n X i=1

g(Xi ) +

 n −1

m

X 1≤i1 2s|Y |) o

|Z|3/2 n |Y | o ≤ CE ≤ Cs−1/2 . (s|Y |)1/2

Thus, we have R1 ≤ Cs−1/2 . As to R2 , we have R2 ≤ E(1/|sY + rZ|1/2 ) = c0 Note that r ∼ 1 and s ∼

p

2/n as n → ∞. Combining the above inequalities yields α ≤ Cεn−1/4 .

(6.31) By (6.29) and (6.31), we have E|∆| +

n X

E|Xi |3 +



α ≤ 8ε + Cε1/2 n−1/8 ≤ (8 + C)ε

i=1

provided that n > (1/ε)4 . This proves (2.3). Finally, we prove (3.12) in Remark 3.5. Let θ =

(n − 1)/n, ρ = (1/n)1/2 , and let Y and Z be

p

independent standard normal random variables. By (6.29), (6.32)

E|∆| +

n X

E|Xi |3 ≤ 4ε + 4n−1/2 ≤ 8ε2/3

i=1

for n ≥ (1/ε)4/3 . Following the proof of (6.30), we have (6.33)

n X





E Xi (∆(X1 , . . . , Xi , . . . , Xn ) − ∆(X1 , . . . , 0, . . . , Xn ))

i=1

34





= nεE |X1 |(|X1 + X2 + . . . Xn |−1/2 − |X2 + . . . Xn |−1/2 )



= nεE |ρY |(|ρY + θZ|−1/2 − |θZ|−1/2 ) o Y2 |ρY + θZ|1/2 |θZ|1/2 (|ρY + θZ|1/2 + |θZ|1/2 ) n Y 2 I(|θZ| ≤ ρ|Y |/2) o ≤ εE |ρY + θZ||θZ|1/2 n Y 2 I(ρ|Y |/2 < |θZ| ≤ 2ρ|Y |) o +εE |ρY + θZ|1/2 |θZ| n Y 2 I(|θZ| > 2ρ|Y |) o +εE |ρY + θZ|1/2 |θZ|

≤ nερ2 E

n

n

o

≤ 2ερ−1 E |Y | |θZ|−1/2 I(|θZ| ≤ ρ|Y |/2) +2ερ−1 E

n |Y |I(ρ|Y |/2 < |θZ| ≤ 2ρ|Y |) o

|αY + θZ|1/2 n Y 2 I(|θZ| > 2ρ|Y |) o +2εE |θZ|3/2 ≤ Cερ−1/2 = Cεn1/4 ≤ 2Cε2/3 for an absolute constant C, provided that n ≤ 16(1/ε)4/3 . This proves (3.12), by (6.32) and (6.33).

Acknowledgments. The authors are thankful to Xuming He for his contribution to the construction of the counterexample in Section 2.

References

[1] Bentkus, V., Götze, F. and Zitikis, M. (1994). Lower estimates of the convergence rate for U-statistics. Ann. Probab. 22, 1701-1714.
[2] Bhattacharya, R.N. and Ghosh, J.K. (1978). On the validity of the formal Edgeworth expansion. Ann. Statist. 6, 434-485.
[3] Bickel, P. (1974). Edgeworth expansion in nonparametric statistics. Ann. Statist. 2, 1-20.
[4] Bolthausen, E. and Götze, F. (1993). The rate of convergence for multivariate sampling statistics. Ann. Statist. 21, 1692-1710.

[5] Borovskich, Yu.V. (1983). Asymptotics of U-statistics and von Mises' functionals. Sov. Math. Dokl. 27, 303-308.
[6] Callaert, H. and Janssen, P. (1978). The Berry-Esseen theorem for U-statistics. Ann. Statist. 6, 417-421.
[7] Chan, Y.K. and Wierman, J. (1977). On the Berry-Esseen theorem for U-statistics. Ann. Probab. 5, 136-139.
[8] Chen, L.H.Y. and Shao, Q.M. (2001). A non-uniform Berry-Esseen bound via Stein's method. Probab. Theory Related Fields 120, 236-254.
[9] Figiel, T., Hitczenko, P., Johnson, W.B., Schechtman, G., and Zinn, J. (1997). Extremal properties of Rademacher functions with applications to the Khintchine and Rosenthal inequalities. Trans. Amer. Math. Soc. 349, 997-1027.
[10] Filippova, A.A. (1962). Mises' theorem on the asymptotic behavior of functionals of empirical distribution functions and its statistical applications. Theory Probab. Appl. 7, 24-57.
[11] Finkelstein, M., Kruglov, V.M., and Tucker, H.G. (1994). Convergence in law of random sums with non-random centering. J. Theoret. Probab. 3, 565-598.
[12] Friedrich, K.O. (1989). A Berry-Esseen bound for functions of independent random variables. Ann. Statist. 17, 170-183.
[13] Grams, W.F. and Serfling, R.J. (1973). Convergence rates for U-statistics and related statistics. Ann. Statist. 1, 153-160.
[14] Helmers, R. (1977). The order of the normal approximation for linear combinations of order statistics with smooth weight functions. Ann. Probab. 5, 940-953.
[15] Helmers, R. and Janssen, P. (1982). On the Berry-Esseen theorem for multivariate U-statistics. Math. Cent. Rep. SW 90/82, Mathematisch Centrum, Amsterdam, pp. 1-22.
[16] Helmers, R., Janssen, P. and Serfling, R.J. (1990). Berry-Esseen bounds and bootstrap results for generalized L-statistics. Scand. J. Statist. 17, 65-77.


[17] Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Statist. 19, 293-325.
[18] Koroljuk, V.S. and Borovskich, Yu.V. (1994). Theory of U-statistics. Kluwer Academic Publishers, Boston.
[19] Robbins, H. (1948). The asymptotic distribution of the sum of a random number of random variables. Bull. Amer. Math. Soc. 54, 1151-1161.
[20] Rosenthal, H.P. (1970). On the subspaces of L^p (p > 2) spanned by sequences of independent random variables. Israel J. Math. 8, 273-303.
[21] Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.
[22] Shorack, G.R. (2000). Probability for Statisticians. Springer, New York.
[23] Wang, Q. (2001). Non-uniform Berry-Esseen bound for U-statistics. Statist. Sinica (to appear).
[24] Wang, Q., Jing, B.Y. and Zhao, L. (2000). The Berry-Esseen bound for studentized statistics. Ann. Probab. 28, 511-535.
[25] Zhao, L.C. and Chen, X.R. (1983). Non-uniform convergence rates for distributions of U-statistics. Sci. Sinica (Ser. A) 26, 795-810.
[26] van Zwet, W.R. (1984). A Berry-Esseen bound for symmetric statistics. Z. Wahrsch. Verw. Gebiete 66, 425-440.


Louis H.Y. Chen Institute for Mathematical Sciences National University of Singapore Singapore 118402 Republic of Singapore [email protected] and Department of Mathematics Department of Statistics & Applied Probability National University of Singapore Singapore 117543 Republic of Singapore


Qi-Man Shao Department of Mathematics University of Oregon Eugene, OR 97403 U.S.A. [email protected] and Department of Mathematics Department of Statistics & Applied Probability National University of Singapore Singapore 117543 Republic of Singapore