Central Limit Theorem for Testing the Equality of Covariance Matrices


A PROJECT SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA DULUTH BY

Wenchuan Guo

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

Advisor: Yongcheng Qi

May, 2014

Acknowledgements

I would first like to thank my advisor, Professor Yongcheng Qi. I am deeply indebted to him for his support and commitment, his enthusiastic guidance, and his insight throughout the writing process. Furthermore, I would like to thank him for supporting me as a research assistant for this project in the second year. In addition, I would like to thank Dr. Barry James and Dr. Marshall Hampton for being willing to serve on my committee, taking the time to read this paper, and hearing the defense. Last but not least, I would like to profusely thank the faculty, staff, and fellow students in the math department, whose contributions were always helpful and illuminating to me.


Abstract

In this project, we are interested in testing the equality of k covariance matrices of p-dimensional multivariate normal distributions, where the likelihood ratio test is used. The asymptotic distribution of the test statistic under different conditions is needed. In traditional multivariate analysis, the dimension p is considered a fixed constant that is unchanged as n approaches infinity. In practice, however, the dimension may be proportional to the sample size n. In this project, we treat the dimension p as a function of the sample size n and study the hypothesis test under the general condition min_{1≤i≤k} n_i/p ≥ c > 1. We also allow the number of populations k to be a variable that changes with the sample sizes n_i under the condition k/n_i → 0. In this case, the limiting distribution of the log likelihood ratio test statistic as the sample size n goes to infinity is no longer a chi-square distribution. We derive a central limit theorem when p and k are variables. Numerical simulations for the two different approximations, including histograms of simulated values and estimation of power and size, are presented at the end of this project.

Keywords: Likelihood ratio test, central limit theorem, multivariate normal distribution, multivariate Gamma function.


Contents

Acknowledgements .......... i
Abstract .......... ii
1 Introduction .......... 1
  1.1 Likelihood ratio test .......... 1
  1.2 Main result .......... 4
2 Preliminary results .......... 5
  2.1 Basic definitions .......... 5
  2.2 Preliminary proposition and lemmas .......... 6
3 Proof of the main result .......... 16
4 Numerical simulations .......... 21
  4.1 Estimation of size .......... 22
  4.2 Estimation of power .......... 23
References .......... 24
Appendix .......... 25

1 Introduction

High-dimensional data are increasingly common in many applications of statistics, encountered particularly in biological, meteorological, and financial studies. If a sample of size n comes from a population with dimension p, it is natural to set up an n × p data matrix. The analysis of such matrices includes hypothesis testing, interval estimation and modeling, and is elaborated in many classical textbooks of multivariate analysis such as Anderson [2], Muirhead [13], and Eaton [7]. In these books, the properties of the tests, estimators and models are based on the assumption that the data dimension p is a fixed constant or is negligible compared with the sample size n. However, due to the explosive development and improvement of computing power and detection methods in biology, the data dimension is often high. In this case, the former assumption of small p, large n is no longer valid because the dimension p can be proportionally large compared with the sample size n. Some significant results have been achieved for this situation, as reviewed in Bai et al. [3], Ledoit and Wolf [12], Jiang et al. [9], Jiang and Yang [11], and Jiang and Qi [10]. In this project, we are interested in testing the equality of k covariance matrices of multivariate normal distributions under the assumption that p → ∞ and k → ∞.

1.1 Likelihood ratio test

In this subsection, we give a formal description of the problem and introduce some results related to this topic.

Let k ≥ 2 be an integer. We denote by N_p(μ, Σ) the multivariate normal distribution with dimension p, where μ is the mean vector and Σ is the covariance matrix. Consider the following hypothesis test:

\[
H_0:\ \Sigma_1=\cdots=\Sigma_k=\Sigma \quad\text{vs}\quad H_a:\ H_0\ \text{is not true}. \tag{1.1}
\]

Let x_{i1}, …, x_{in_i}, for 1 ≤ i ≤ k, be independent random samples from N_p(μ_i, Σ_i), where the mean vectors μ_i and covariance matrices Σ_i are unknown. Define

\[
\bar x_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij} \quad\text{and}\quad A_i = \sum_{j=1}^{n_i}(x_{ij}-\bar x_i)(x_{ij}-\bar x_i)',\qquad 1\le i\le k, \tag{1.2}
\]

and

\[
A = A_1+\cdots+A_k \quad\text{and}\quad n = n_1+\cdots+n_k. \tag{1.3}
\]

We derive the likelihood ratio test for (1.1) first. The likelihood function for the k independent samples is given by

\[
L(\mu_1,\ldots,\mu_k,\Sigma_1,\ldots,\Sigma_k) = \prod_{i=1}^k L(\mu_i,\Sigma_i)
= \prod_{i=1}^k\Big\{(2\pi)^{-pn_i/2}|\Sigma_i|^{-n_i/2}\exp\Big(-\frac{n_i}{2}(\bar x_i-\mu_i)'\Sigma_i^{-1}(\bar x_i-\mu_i)\Big)\operatorname{etr}\Big(-\frac12\Sigma_i^{-1}A_i\Big)\Big\}, \tag{1.4}
\]

where etr is the exponential trace function, i.e., for a square matrix X, etr(X) = exp(tr(X)). The likelihood ratio test is given as follows:

\[
\Lambda_n = \frac{\displaystyle\sup_{\mu_1,\ldots,\mu_k,\Sigma} L(\mu_1,\ldots,\mu_k,\Sigma,\ldots,\Sigma)}{\displaystyle\sup_{\mu_1,\ldots,\mu_k,\Sigma_1,\ldots,\Sigma_k} L(\mu_1,\ldots,\mu_k,\Sigma_1,\ldots,\Sigma_k)}. \tag{1.5}
\]

Observe that maximizing the likelihood in the denominator amounts to maximizing each of the factors in the product (1.4). Therefore, the denominator in (1.5) is

\[
L(\bar x_1,\ldots,\bar x_k,\hat\Sigma_1,\ldots,\hat\Sigma_k) = (2\pi)^{-pn/2}e^{-pn/2}\prod_{i=1}^k\big(n_i^{pn_i/2}|A_i|^{-n_i/2}\big), \tag{1.6}
\]

where \(\hat\Sigma_i = n_i^{-1}A_i\). When the null hypothesis H_0: Σ_1 = ··· = Σ_k = Σ is true, we write the likelihood function in (1.4) as

\[
L(\mu_1,\ldots,\mu_k,\Sigma,\ldots,\Sigma) = (2\pi)^{-pn/2}|\Sigma|^{-n/2}\operatorname{etr}\Big(-\frac12\Sigma^{-1}A\Big)\prod_{i=1}^k\exp\Big(-\frac{n_i}{2}(\bar x_i-\mu_i)'\Sigma^{-1}(\bar x_i-\mu_i)\Big), \tag{1.7}
\]

which is maximized when μ_i = \bar x_i and Σ = n^{-1}A. Thus the numerator in (1.5) becomes

\[
L(\bar x_1,\ldots,\bar x_k,n^{-1}A,\ldots,n^{-1}A) = (2\pi)^{-pn/2}n^{pn/2}e^{-pn/2}|A|^{-n/2}. \tag{1.8}
\]

Combining (1.6) and (1.8) in (1.5) gives

\[
\Lambda_n = \frac{\prod_{i=1}^k |A_i|^{n_i/2}}{|A|^{n/2}}\cdot\frac{n^{np/2}}{\prod_{i=1}^k n_i^{n_ip/2}}. \tag{1.9}
\]

The likelihood ratio test (1.9) was first obtained by Wilks [17]; the details can be found in Section 8.2.2 of Muirhead [13]. By the result just stated, we reject the null hypothesis H_0 in (1.1) when Λ_n ≤ c_α, where c_α is chosen so that the test has significance level α. The test statistic Λ_n is non-degenerate

only if every A_i has full rank, which requires the condition p < n_i for all i = 1, …, k; only under this condition are the determinants nonzero. However, a drawback of this likelihood ratio test is that when the sample sizes n_1, …, n_k are not all equal it is biased: when H_0 is true, the probability of rejecting H_0 in (1.1) can be larger than the probability of rejecting H_0 when H_0 is false. The bias was first noticed by Brown [6] for p = 1. For this reason, we use instead the following modified likelihood ratio test statistic Λ*_n suggested by Bartlett [4]:

\[
\Lambda_n^* = \frac{\prod_{i=1}^k |A_i|^{(n_i-1)/2}}{|A|^{(n-k)/2}}\cdot\frac{(n-k)^{(n-k)p/2}}{\prod_{i=1}^k (n_i-1)^{(n_i-1)p/2}}. \tag{1.10}
\]
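Since (1.10) involves determinants of large matrices, it is numerically safer to evaluate log Λ*_n directly on the log scale. The thesis's own simulations are written in R (see the Appendix); the following Python sketch is only our illustration of the computation, with a function name of our own choosing:

```python
import numpy as np

def modified_log_lrt(samples):
    """log of the modified likelihood ratio statistic Lambda*_n in (1.10).

    `samples` is a list of k arrays, the i-th of shape (n_i, p).
    Working on the log scale with slogdet avoids overflow/underflow
    in the determinants |A_i| and |A|.
    """
    k = len(samples)
    p = samples[0].shape[1]
    ns = np.array([x.shape[0] for x in samples], dtype=float)
    n = ns.sum()
    # A_i: matrices of sums of squares about the sample means; A = sum of A_i
    A_list = []
    for x in samples:
        c = x - x.mean(axis=0)
        A_list.append(c.T @ c)
    A = np.sum(A_list, axis=0)
    logdet_Ai = np.array([np.linalg.slogdet(Ai)[1] for Ai in A_list])
    logdet_A = np.linalg.slogdet(A)[1]
    return (np.sum((ns - 1) / 2.0 * logdet_Ai) - (n - k) / 2.0 * logdet_A
            + (n - k) * p / 2.0 * np.log(n - k)
            - np.sum((ns - 1) * p / 2.0 * np.log(ns - 1)))
```

Because Λ*_n ≤ 1 (a consequence of the concavity of log det), the returned value is always negative, which is a convenient sanity check.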

Λ*_n is derived from Λ_n by substituting for each sample size n_i the degrees of freedom n_i − 1, and for the total sample size n the quantity n − k. It is proved by Sugiura and Nagao [15] that this modified likelihood ratio test is unbiased when k = 2; a generalization to an arbitrary integer k is made in Perlman [14].

When p is a fixed number, a lot of previous research has developed large-sample approximations for the limiting distribution of Λ*_n. Under the null hypothesis in (1.1), Box [5] proved that −2ρ log Λ*_n has a chi-square approximation:

\[
-2\rho\log\Lambda_n^* \xrightarrow{d} \chi_f^2 \tag{1.11}
\]

as min_{1≤i≤k} n_i → ∞, where

\[
f = \frac12 p(p+1)(k-1),\qquad
\rho = 1 - \frac{2p^2+3p-1}{6(p+1)(k-1)(n-k)}\Big(\sum_{i=1}^k\frac{n-k}{n_i-1}-1\Big). \tag{1.12}
\]

Actually, the cumulative distribution function of −2ρ log Λ*_n can be expanded as

\[
P(-2\rho\log\Lambda_n^*\le x) = P(\chi_f^2\le x) + \frac{\omega_2}{M^2}\big[P(\chi_{f+4}^2\le x)-P(\chi_f^2\le x)\big] + O(M^{-3}) \tag{1.13}
\]

as n → ∞, where M = ρ(n−k) and

\[
\omega_2 = \frac{p(p+1)}{48}\Big\{(p+2)(p-1)\Big(\sum_{i=1}^k\frac{(n-k)^2}{(n_i-1)^2}-1\Big) - 6(k-1)(n-k)^2(1-\rho)^2\Big\}. \tag{1.14}
\]

The likelihood ratio test of size α based on the asymptotic distribution of −2ρ log Λ*_n is to reject H_0 in (1.1) if −2ρ log Λ*_n > c_α, where c_α is the 100α% upper quantile of the χ_f² distribution. The details can be found in [13, p. 309].

In recent studies, the assumption of fixed dimension p is relaxed: p is allowed to change with the sample size n_i so that the ratio p/n_i converges to a constant between 0 and 1, i.e.,

\[
\lim_{n\to\infty}\frac{p}{n_i} = y_i \in (0,1). \tag{1.15}
\]

For instance, using methods from random matrix theory, a correction to the traditional likelihood ratio test for testing the equality of two covariance matrices is studied in Bai et al. [3] under assumption (1.15). Denote the sample covariance matrices for the two populations by A and B, and let

\[
L = \frac{|A|^{n_1/2}\,|B|^{n_2/2}}{|n_1A/n+n_2B/n|^{n/2}}. \tag{1.16}
\]

Then the chi-square approximation of the test statistic is represented as

\[
-2\log L \xrightarrow{d} \chi^2_{\frac12 p(p+1)}. \tag{1.17}
\]

A special case with y_i = 1 is studied in Jiang and Yang [11]. The following central limit theorem (CLT),

\[
\frac{\log\Lambda_n^*-\mu_n}{\sigma_n} \xrightarrow{d} N(0,1), \tag{1.18}
\]

where μ_n and σ_n are functions of n and p (see, e.g., [11, 10]), is obtained in Jiang and Yang [11] and Jiang and Qi [10] under the assumption that the dimension p can be proportionally large relative to the sample size n_i or much smaller than n_i, that is, p/n_i → y_i ∈ [0,1] and n_i − 1 ≥ p for 1 ≤ i ≤ k.

1.2 Main result

Throughout this project, we assume that both the dimension p and the number of populations k change along with the sample sizes n_i, but k is much smaller than n_i, that is, k/n_i → 0. In addition, the condition min_{1≤i≤k} n_i > p is always assumed. We have the following Theorem 1.1.

Theorem 1.1. Assume n_i = n_i(p, k) for all 1 ≤ i ≤ k such that p > 2, min_{1≤i≤k} n_i/p ≥ c > 1 and k = o(min_{1≤i≤k} n_i), where min_{1≤i≤k} n_i → ∞. Let Λ*_n be as in (1.10). Then, under H_0 in (1.1), (log Λ*_n − μ_n)/σ_n converges in distribution to N(0, 1) as p → ∞, k → ∞, where

\[
\mu_n = \frac12\Big[\sum_{i=1}^k\Big(p-n_i+\frac32\Big)(n_i-1)\log\Big(1-\frac{p}{n_i}\Big)
-(n-k)\Big(p-n+k+\frac12\Big)\log\Big(1-\frac{p}{n-k+1}\Big)+(k-1)p\Big], \tag{1.19}
\]

\[
\sigma_n^2 = \frac12\Big[p(k-1)+(n-k)^2\log\Big(1-\frac{p}{n-k+1}\Big)-\sum_{i=1}^k(n_i-1)^2\log\Big(1-\frac{p}{n_i}\Big)\Big]. \tag{1.20}
\]

Based on the above theorem, the rejection region of the test (1.1) is (log Λ*_n − μ_n)/σ_n ≤ −z_α, where z_α is the upper α-quantile of the standard normal distribution.
The remainder of this project proceeds as follows. The preliminary results used in the proof of Theorem 1.1 are presented in Section 2. In Section 2.1, we introduce the multivariate gamma function and Stirling's formula. Section 2.2 is devoted to a key proposition, which is proved through several lemmas. In Section 3, we present the proof of the main result. In Section 4, we provide numerical simulations for the size and power of the tests and compare graphically the classical chi-square approximation with the normal approximation.
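As a quick numerical illustration of Theorem 1.1, the centering μ_n and scaling σ_n² in (1.19)–(1.20) are easy to evaluate. The sketch below is our own Python illustration (the thesis's own code is in R; the function name is ours):

```python
import numpy as np

def clt_centering_scaling(ns, p):
    """mu_n and sigma_n^2 of (1.19) and (1.20) for sample sizes ns = (n_1,...,n_k)."""
    ns = np.asarray(ns, dtype=float)
    k = ns.size
    n = ns.sum()
    log_i = np.log1p(-p / ns)            # log(1 - p/n_i), requires n_i > p
    log_0 = np.log1p(-p / (n - k + 1))   # log(1 - p/(n-k+1))
    mu = 0.5 * (np.sum((p - ns + 1.5) * (ns - 1.0) * log_i)
                - (n - k) * (p - n + k + 0.5) * log_0
                + (k - 1.0) * p)
    sigma2 = 0.5 * (p * (k - 1.0) + (n - k) ** 2 * log_0
                    - np.sum((ns - 1.0) ** 2 * log_i))
    return mu, sigma2

mu, sigma2 = clt_centering_scaling([100] * 10, 40)
# reject H0 at level alpha when (log Lambda*_n - mu)/sqrt(sigma2) <= -z_alpha
```

For the configuration used in the simulations of Section 4 (k = 10, n_i = 100, p = 40), σ_n² is comfortably above the lower bound p²(k−1)/36 established in Lemma 2.4 below.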

2 Preliminary results

2.1 Basic definitions

In this subsection, we introduce the multivariate gamma function and Stirling's formula, as well as some standard notation.

Recall that the gamma function Γ(z) is defined for complex numbers z with Re(z) > 0 through a convergent improper integral:

\[
\Gamma(z) = \int_0^\infty t^{z-1}e^{-t}\,dt. \tag{2.1}
\]

In multivariate analysis, the multivariate gamma function Γ_p(z), defined on the complex plane C, is a generalization of the univariate gamma function. The definition of Γ_p(z) is as follows.

Definition 2.1. The multivariate gamma function, denoted by Γ_p(z), is defined to be

\[
\Gamma_p(z) = \int_{S>0}\exp(-\operatorname{tr}(S))\,|S|^{z-(p+1)/2}\,dS, \tag{2.2}
\]

where Re(z) > (p−1)/2 and the integral is over the space of positive definite p × p matrices S.

Γ_p(z) can also be written as a product of univariate gamma functions, as the following theorem shows (see, e.g., Muirhead [13, p. 62]).

Theorem 2.1.

\[
\Gamma_p(z) = \pi^{p(p-1)/4}\prod_{i=1}^p\Gamma\Big(z-\frac{i-1}{2}\Big) \tag{2.3}
\]

for complex numbers z with Re(z) > (p−1)/2.

By (2.3), it follows immediately that

\[
\Gamma_p(z) = \pi^{(p-1)/2}\,\Gamma_{p-1}(z)\,\Gamma\Big(z+\frac{1-p}{2}\Big). \tag{2.4}
\]
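The product formula (2.3) is also the convenient way to compute log Γ_p numerically: it is just a sum of ordinary log-gamma values. A small sketch (our own illustration, using scipy's gammaln; the recursion comment refers to (2.4)):

```python
import numpy as np
from scipy.special import gammaln

def log_multigamma(p, z):
    """log Gamma_p(z) computed from the product formula (2.3)."""
    i = np.arange(1, p + 1)
    return p * (p - 1) / 4.0 * np.log(np.pi) + np.sum(gammaln(z - (i - 1) / 2.0))

# p = 1 recovers the univariate log-gamma, and the recursion (2.4) holds:
#   log_multigamma(p, z)
#     == (p-1)/2*log(pi) + log_multigamma(p-1, z) + gammaln(z + (1-p)/2)
```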

Note that when p = 1, the multivariate gamma function reduces to the univariate gamma function. In addition, the multivariate gamma function appears in the probability density functions of the Wishart and inverse Wishart distributions. The formula (2.3) is much more useful in practice, however, and is used throughout this project.

The logarithm of the gamma function defined on the complex plane C has the following expansion, called Stirling's formula (see, e.g., Gamelin [8, p. 368] or Equation (37) of Ahlfors [1, p. 204]):

\[
\log\Gamma(z) = \Big(z-\frac12\Big)\log z - z + \log\sqrt{2\pi} + \frac{1}{12z} + O\Big(\frac{1}{x^3}\Big), \tag{2.5}
\]

as x = Re(z) → ∞. Formula (2.5) is a powerful tool for obtaining asymptotic approximations of Γ(z), leading to accurate approximations of factorials; see Gamelin [8, p. 368] for more details.

The standard notation describing the limiting behavior of a function uses big O and small o. Precisely, for two sequences of numbers {a_n; n ≥ 1} and {b_n; n ≥ 1}, a_n = O(b_n) as n → ∞ means that lim sup_{n→∞} |a_n/b_n| < ∞, and a_n = o(b_n) as n → ∞ means that lim_{n→∞} a_n/b_n = 0. The symbol a_n ∼ b_n stands for lim_{n→∞} a_n/b_n = 1. For two functions f and g, we also use f(x) = O(g(x)), f(x) = o(g(x)), and f(x) ∼ g(x) with the analogous meaning as x → x_0 ∈ [−∞, ∞].
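Stirling's formula (2.5) is easy to check numerically: the truncation after the 1/(12z) term already has error of order x^{-3} for real arguments. A quick illustration (our own, not from the thesis):

```python
import math

def stirling_log_gamma(x):
    """Truncation of Stirling's formula (2.5) after the 1/(12x) term, for real x > 0."""
    return (x - 0.5) * math.log(x) - x + 0.5 * math.log(2 * math.pi) + 1.0 / (12 * x)

# the error behaves like O(x^{-3}); at x = 50 it is already below 1e-7
err_50 = abs(stirling_log_gamma(50.0) - math.lgamma(50.0))
```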

2.2 Preliminary proposition and lemmas

The key point in proving Theorem 1.1 is the following expansion for Γ_p(z).

Proposition 2.1. Let p = p_m and t = t_m satisfy, as m → ∞: (1) p_m → ∞; (2) t = t_m = O(m/(p√k)); (3) m/p_m ≥ c > 1. Then

\[
\log\frac{\Gamma_p\big(\frac{m-1}{2}+t\big)}{\Gamma_p\big(\frac{m-1}{2}\big)} = \alpha_m t + \beta_m t^2 + \gamma_m + I_m, \tag{2.6}
\]

where

\[
\alpha_m = \Big(p-m+\frac32\Big)\log\Big(1-\frac pm\Big)-2p;\qquad
\beta_m = -\frac pm-\log\Big(1-\frac pm\Big);
\]
\[
\gamma_m = p\Big[\Big(\frac m2+t\Big)\log\Big(\frac m2+t\Big)-\frac m2\log\frac m2\Big];\qquad
I_m = O\Big(\frac{1}{pk}\Big)+O\Big(\frac{1}{mk^{1/2}}\Big). \tag{2.7}
\]

The proof of Proposition 2.1 is split into three lemmas. The following Lemma 2.1 is actually Lemma 5.1 of Jiang and Qi [10].

Lemma 2.1. For any given δ ∈ (0, 1),

\[
\log\frac{\Gamma(x+b)}{\Gamma(x)} = (x+b)\log(x+b) - x\log x - b - \frac{b}{2x} + O\Big(\frac{b^2+1}{x^2}\Big) \quad\text{as } x\to+\infty, \tag{2.8}
\]

uniformly in b ∈ [−δx, δx].

Lemma 2.2. Let p = p_m be such that 1 ≤ p < m and m/p ≥ c > 1 as p → ∞. Then

\[
\sum_{i=1}^p\Big(\frac{1}{m-i}-\frac1m\Big) = -\log\Big(1-\frac pm\Big)-\frac pm+O\Big(\frac{p}{m^2}\Big), \tag{2.9}
\]

\[
\sum_{i=1}^p\big[\log(m-i)-\log m\big] = \Big(p-m+\frac12\Big)\log\Big(1-\frac pm\Big)-p+O\Big(\frac{p}{m^2}\Big). \tag{2.10}
\]

Proof. Note that

\[
\sum_{i=1}^p\Big(\frac{1}{m-i}-\frac1m\Big) = \sum_{i=1}^p\frac{1}{m-i}-\frac pm. \tag{2.11}
\]

By the asymptotic expansion of the harmonic series [16],

\[
\sum_{i=1}^p\frac1i = \log p+\gamma+\frac{1}{2p}-\frac{1}{12p^2}+\frac{1}{120p^4}-\cdots = \log p+\gamma+\frac{1}{2p}+O(p^{-2}) \tag{2.12}
\]

as p → ∞, where γ is Euler's constant, we obtain

\[
\sum_{i=1}^p\frac{1}{m-i} = \sum_{i=1}^m\frac1i-\sum_{i=1}^{m-p}\frac1i-\frac1m+\frac{1}{m-p}
= \log m-\log(m-p)+\frac{p}{m(m-p)}-\frac{p}{2m(m-p)}+O\Big(\frac{1}{m^2}\Big)+O\Big(\frac{1}{(m-p)^2}\Big)
= -\log\Big(1-\frac pm\Big)+O\Big(\frac{p}{m^2}\Big) \tag{2.13}
\]

as p → ∞. In the last step of (2.13), we used the fact

\[
O\Big(\frac{p}{m(m-p)}\Big)+O\Big(\frac{1}{m^2}\Big)+O\Big(\frac{1}{(m-p)^2}\Big) = O\Big(\frac{p}{m^2}\Big), \tag{2.14}
\]

which holds since m − p ≥ (1 − 1/c)m. Equation (2.9) then follows from (2.13) and (2.11).

Next we show (2.10). Since Γ(n) = (n−1)! for any integer n ≥ 1,

\[
\sum_{i=1}^p\log(m-i) = \sum_{i=1}^{m-1}\log i-\sum_{i=1}^{m-p-1}\log i = \log\Gamma(m)-\log\Gamma(m-p). \tag{2.15}
\]

Applying Stirling's formula (2.5) at x = m and x = m − p, we obtain

\[
\log\Gamma(m)-\log\Gamma(m-p) = \Big(m-\frac12\Big)\log m-\Big(m-p-\frac12\Big)\log(m-p)-p+\frac{1}{12}\Big(\frac1m-\frac{1}{m-p}\Big)+O\Big(\frac{1}{m^3}\Big)
= \Big(m-\frac12\Big)\log m-\Big(m-p-\frac12\Big)\log(m-p)-p+O\Big(\frac{p}{m^2}\Big). \tag{2.16}
\]

Therefore,

\[
\sum_{i=1}^p\big[\log(m-i)-\log m\big] = \log\Gamma(m)-\log\Gamma(m-p)-p\log m
= -\Big(m-p-\frac12\Big)\big[\log(m-p)-\log m\big]-p+O\Big(\frac{p}{m^2}\Big)
= \Big(p-m+\frac12\Big)\log\Big(1-\frac pm\Big)-p+O\Big(\frac{p}{m^2}\Big), \tag{2.17}
\]

as m → ∞.

Lemma 2.3. Let p = p_m be such that 1 ≤ p < m and m/p ≥ c > 1 as p → ∞. Define

\[
g_i(x) = \Big(\frac{m-i}{2}+x\Big)\log\Big(\frac{m-i}{2}+x\Big)-\Big(\frac m2+x\Big)\log\Big(\frac m2+x\Big), \tag{2.18}
\]

for 1 ≤ i ≤ p and x > −(m−p)/2. Set t = O(m/(p√k)). Then

\[
\sum_{i=1}^p\big(g_i(t)-g_i(0)\big) = \Big[\Big(p-m+\frac12\Big)\log\Big(1-\frac pm\Big)-p\Big]t+\Big[-\frac pm-\log\Big(1-\frac pm\Big)\Big]t^2+B, \tag{2.19}
\]

as m → ∞, where

\[
B = O\Big(\frac{1}{mk^{1/2}}\Big)+O\Big(\frac{1}{pk}\Big). \tag{2.20}
\]

Proof. A straightforward computation shows that

\[
g_i'(x) = \log\Big(\frac{m-i}{2}+x\Big)-\log\Big(\frac m2+x\Big),\quad
g_i''(x) = \Big(\frac{m-i}{2}+x\Big)^{-1}-\Big(\frac m2+x\Big)^{-1},\quad
g_i'''(x) = -\Big(\frac{m-i}{2}+x\Big)^{-2}+\Big(\frac m2+x\Big)^{-2}. \tag{2.21}
\]

Hence there exists a constant C, independent of m, such that

\[
\sup_{|x|\le|t|,\,1\le i\le p}|g_i'''(x)| \le \frac{Cp}{m(m-p)^2}. \tag{2.22}
\]

We expand g_i(t) at t = 0 by Taylor's theorem, obtaining

\[
g_i(t) = g_i(0)+g_i'(0)t+\frac{t^2}{2}g_i''(0)+R_2(t)
= g_i(0)+\big[\log(m-i)-\log m\big]t+\Big(\frac{1}{m-i}-\frac1m\Big)t^2+R_2(t), \tag{2.23}
\]

where the remainder satisfies

\[
|R_2(t)| \le \frac{Cp\,|t|^3}{m(m-p)^2}. \tag{2.24}
\]

Therefore, by Lemma 2.2,

\[
\sum_{i=1}^p\big(g_i(t)-g_i(0)\big) = t\sum_{i=1}^p\big[\log(m-i)-\log m\big]+t^2\sum_{i=1}^p\Big(\frac{1}{m-i}-\frac1m\Big)+O\Big(\frac{p^2t^3}{m(m-p)^2}\Big)
= \Big[\Big(p-m+\frac12\Big)\log\Big(1-\frac pm\Big)-p\Big]t+\Big[-\frac pm-\log\Big(1-\frac pm\Big)\Big]t^2+B, \tag{2.25}
\]

where, when t = O(m/(p√k)),

\[
B = O\Big(\frac{p^2t^3}{m(m-p)^2}\Big)+O\Big(\frac{pt}{m^2}\Big)+O\Big(\frac{pt^2}{m^2}\Big)
= O\Big(\frac{1}{pk^{3/2}}\Big)+O\Big(\frac{1}{mk^{1/2}}\Big)+O\Big(\frac{1}{pk}\Big)
= O\Big(\frac{1}{pk}\Big)+O\Big(\frac{1}{mk^{1/2}}\Big). \tag{2.26}
\]

Proof of Proposition 2.1. By the definition of the Γ_p function in (2.3), we have

\[
\log\frac{\Gamma_p\big(\frac{m-1}{2}+t\big)}{\Gamma_p\big(\frac{m-1}{2}\big)} = \sum_{i=1}^p\log\frac{\Gamma\big(\frac{m-i}{2}+t\big)}{\Gamma\big(\frac{m-i}{2}\big)}. \tag{2.27}
\]

Consider the terms in the sum first. By Lemma 2.1 with x = (m−i)/2 and b = t, we write

\[
\log\frac{\Gamma\big(\frac{m-i}{2}+t\big)}{\Gamma\big(\frac{m-i}{2}\big)} = \Big(\frac{m-i}{2}+t\Big)\log\Big(\frac{m-i}{2}+t\Big)-\frac{m-i}{2}\log\frac{m-i}{2}-t-\frac{t}{m-i}+O\Big(\frac{t^2+1}{m^2}\Big). \tag{2.28}
\]

Summing with respect to i yields

\[
\log\frac{\Gamma_p\big(\frac{m-1}{2}+t\big)}{\Gamma_p\big(\frac{m-1}{2}\big)}
= \sum_{i=1}^p\Big[\Big(\frac{m-i}{2}+t\Big)\log\Big(\frac{m-i}{2}+t\Big)-\frac{m-i}{2}\log\frac{m-i}{2}\Big]-tp-t\sum_{i=1}^p\frac{1}{m-i}+O\Big(\frac{pt^2+p}{m^2}\Big). \tag{2.29}
\]

Set

\[
g_i(x) = \Big(\frac{m-i}{2}+x\Big)\log\Big(\frac{m-i}{2}+x\Big)-\Big(\frac m2+x\Big)\log\Big(\frac m2+x\Big). \tag{2.30}
\]

By Lemma 2.3 we obtain

\[
\sum_{i=1}^p\Big[\Big(\frac{m-i}{2}+t\Big)\log\Big(\frac{m-i}{2}+t\Big)-\frac{m-i}{2}\log\frac{m-i}{2}\Big]
= p\Big[\Big(\frac m2+t\Big)\log\Big(\frac m2+t\Big)-\frac m2\log\frac m2\Big]+\sum_{i=1}^p\big(g_i(t)-g_i(0)\big)
\]
\[
= p\Big[\Big(\frac m2+t\Big)\log\Big(\frac m2+t\Big)-\frac m2\log\frac m2\Big]+\Big[\Big(p-m+\frac12\Big)\log\Big(1-\frac pm\Big)-p\Big]t+\Big[-\frac pm-\log\Big(1-\frac pm\Big)\Big]t^2+B. \tag{2.31}
\]

This, together with (2.29) and the fact (from (2.13)) that the sum Σ 1/(m−i) equals −log(1−p/m) + O(p/m²), shows that

\[
\log\frac{\Gamma_p\big(\frac{m-1}{2}+t\big)}{\Gamma_p\big(\frac{m-1}{2}\big)}
= \Big[\Big(p-m+\frac32\Big)\log\Big(1-\frac pm\Big)-2p\Big]t+\Big[-\frac pm-\log\Big(1-\frac pm\Big)\Big]t^2
+ p\Big[\Big(\frac m2+t\Big)\log\Big(\frac m2+t\Big)-\frac m2\log\frac m2\Big]+I_m, \tag{2.32}
\]

where

\[
I_m = B+O\Big(\frac{pt^2+pt+p}{m^2}\Big) = O\Big(\frac{1}{mk^{1/2}}\Big)+O\Big(\frac{1}{pk}\Big). \tag{2.33}
\]

To prove Theorem 1.1, we also need to study the following lemma for σ_n² as defined in (1.20).

Lemma 2.4. Let σ_n² = σ_n²(k, p, n) be defined as in (1.20). Assume min_{1≤i≤k} n_i > p, n_i > k − 2 and p ≥ 3. Then we have

\[
\sigma_n^2(k,p,n) \ge \frac{p^2(k-1)}{36}. \tag{2.34}
\]

The proof of Lemma 2.4 is also split into three lemmas. The idea of the proof is to minimize σ_n² with respect to each individual variable in turn, starting from G(p) = 2σ_n². In the following three lemmas, we treat p, n, k and n_i as continuous variables.

Lemma 2.5. Define

\[
G(p) = p(k-1)+(n-k)^2\log\Big(1-\frac{p}{n-k+1}\Big)-\sum_{i=1}^k(n_i-1)^2\log\Big(1-\frac{p}{n_i}\Big), \tag{2.35}
\]

where min_{1≤i≤k} n_i > p and n_i > k − 2. Then we have

\[
\inf_{p\ge1}G(p) = G(1). \tag{2.36}
\]

Proof. It suffices to show that G(p) is an increasing function of p under the given conditions. Differentiating (2.35) with respect to p, we obtain

\[
G'(p) = k-1-\frac{(n-k)^2}{n-k+1-p}+\sum_{i=1}^k\frac{(n_i-1)^2}{n_i-p}. \tag{2.37}
\]

Observe that

\[
n-k+1-p \ge n-kp \iff kp-k \ge p-1 \iff (k-1)(p-1)\ge0, \tag{2.38}
\]

which holds since k ≥ 1 and p ≥ 1. This gives

\[
(n-k+1-p)\sum_{i=1}^k\frac{(n_i-1)^2}{n_i-p} \ge (n-kp)\sum_{i=1}^k\frac{(n_i-1)^2}{n_i-p} = \Big(\sum_{i=1}^k(n_i-p)\Big)\Big(\sum_{i=1}^k\frac{(n_i-1)^2}{n_i-p}\Big). \tag{2.39}
\]

By the Cauchy–Schwarz inequality we have

\[
\Big(\sum_{i=1}^k(n_i-p)\Big)\Big(\sum_{i=1}^k\frac{(n_i-1)^2}{n_i-p}\Big) \ge \Big(\sum_{i=1}^k(n_i-1)\Big)^2 = (n-k)^2. \tag{2.40}
\]

Hence

\[
\sum_{i=1}^k\frac{(n_i-1)^2}{n_i-p} \ge \frac{(n-k)^2}{n-k+1-p}, \tag{2.41}
\]

which implies

\[
G'(p) \ge k-1 \ge 1. \tag{2.42}
\]

Therefore, G(p) is an increasing function of p, which leads to (2.36).

Next, we fix p = 1 in G(p) and consider the resulting function H(n_1, …, n_k) = G(1).

Lemma 2.6. Define

\[
H(n_1,\ldots,n_k) = (k-1)+(n-k)^2\log\Big(1-\frac{1}{n-k+1}\Big)-\sum_{i=1}^k(n_i-1)^2\log\Big(1-\frac{1}{n_i}\Big), \tag{2.43}
\]

where min_{1≤i≤k} n_i > k − 2. Under the constraint Σ_{i=1}^k n_i = n, we have

\[
\inf H(n_1,\ldots,n_k) = H\Big(\frac nk,\ldots,\frac nk\Big). \tag{2.44}
\]

Proof. To find the extrema of H(n_1, …, n_k) subject to the constraint Σ n_i = n (with n treated as a fixed constant), the method of Lagrange multipliers looks for solutions of the following k + 1 equations:

\[
\frac{\partial}{\partial n_i}\Big(H(n_1,\ldots,n_k)-\lambda\Big(\sum_{j=1}^k n_j-n\Big)\Big)=0,\quad 1\le i\le k;\qquad \sum_{i=1}^k n_i-n=0, \tag{2.45}
\]

where λ is the Lagrange multiplier. Note that

\[
\frac{\partial}{\partial n_i}\Big(H-\lambda\Big(\sum_j n_j-n\Big)\Big) = -2(n_i-1)\log\Big(1-\frac{1}{n_i}\Big)-\Big(1-\frac{1}{n_i}\Big)-\lambda \tag{2.46}
\]

and

\[
\frac{d}{dn_i}\Big[-2(n_i-1)\log\Big(1-\frac{1}{n_i}\Big)-\Big(1-\frac{1}{n_i}\Big)\Big] = \frac{-2n_i^2\log\big(1-\frac{1}{n_i}\big)-2n_i-1}{n_i^2} > 0. \tag{2.47}
\]

(2.47) implies that each equation in (2.45) is strictly monotone in n_i; then, by the symmetry in the n_i, the solution of (2.45) is unique and must be n_1 = n_2 = ··· = n_k = n/k. However, this alone does not tell us whether H(n_1, …, n_k) attains a minimum or a maximum there. To prove (2.44), it suffices to show that

\[
M = \Big(\frac{\partial^2H}{\partial n_i\partial n_j}\Big)_{k\times k}\ \text{is a positive definite matrix.} \tag{2.48}
\]

Let f(x) := (x−1)² log(1 − 1/x) for x ≥ 2. Then

\[
f'(x) = 2(x-1)\log\Big(1-\frac1x\Big)+1-\frac1x \tag{2.49}
\]

and

\[
f''(x) = 2\log\Big(1-\frac1x\Big)+\frac2x+\frac{1}{x^2}. \tag{2.50}
\]

Using the Taylor expansion of log(1 − x) for 0 < |x| < 1, we have

\[
f''(x) = -2\sum_{j=1}^\infty\frac{1}{jx^j}+\frac2x+\frac{1}{x^2} = -\sum_{j=3}^\infty\frac{2}{jx^j} < 0. \tag{2.51}
\]

Note that

\[
M_{ij} = \frac{\partial^2H}{\partial n_i\partial n_j} = \begin{cases}-f''(n_i)>0 & \text{if } i=j,\\ 0 & \text{if } i\ne j,\end{cases} \tag{2.52}
\]

so M is diagonal with positive entries, which leads to (2.48) and then (2.44).

Then we let n_i = n/k and consider L(n) = H(n/k, …, n/k).

Lemma 2.7. Define

\[
L(n) = k-1+(n-k)^2\log\Big(1-\frac{1}{n-k+1}\Big)-\frac{(n-k)^2}{k}\log\Big(1-\frac kn\Big), \tag{2.53}
\]

where n > k ≥ 2. We have

\[
L(n) \ge \frac12(k-1). \tag{2.54}
\]

Proof. We show that L(n) is a non-increasing function of n, in which case L(n) ≥ lim_{n→∞} L(n). Set

\[
l(n) = -2\log\Big(1-\frac{1}{n-k+1}\Big)+\frac2k\log\Big(1-\frac kn\Big)-\frac{k-1}{n(n-k+1)},\qquad n>k\ge2. \tag{2.55}
\]

The derivative of l(n) is given by

\[
l'(n) = \frac{(k-1)\big[k^2-k(n+1)-n\big]}{n^2(n-k)(n-k+1)^2}, \tag{2.56}
\]

which is non-positive when k < n, implying that l(n) is non-increasing; thus

\[
l(n) \ge \inf_{n>k}l(n) = \lim_{n\to\infty}l(n) = 0. \tag{2.57}
\]

A straightforward computation shows

\[
L'(n) = -(n-k)\Big[-2\log\Big(1-\frac{1}{n-k+1}\Big)+\frac2k\log\Big(1-\frac kn\Big)-\frac{k-1}{n(n-k+1)}\Big] = -(n-k)\,l(n) \le 0. \tag{2.58}
\]

Letting n go to infinity and using the Taylor expansion of log(1 − x) for |x| < 1, we get

\[
\lim_{n\to\infty}L(n)
= \lim_{n\to\infty}\Big[k-1-(n-k)^2\Big(\frac{1}{n-k+1}+\frac{1}{2(n-k+1)^2}\Big)+\frac{(n-k)^2}{k}\Big(\frac kn+\frac{k^2}{2n^2}\Big)\Big]
= \lim_{n\to\infty}\Big[\frac32(k-1)+\frac{(n-k)^2}{n}-\frac{(n-k)^2}{n-k+1}\Big]
\]
\[
= \lim_{n\to\infty}\Big[\frac32(k-1)+\frac{(1-k)(n-k)^2}{n(n-k+1)}\Big]
= \frac12(k-1) > 0. \tag{2.59}
\]

Therefore, by (2.58), we get

\[
L(n) \ge \inf_{n>k}L(n) = \lim_{n\to\infty}L(n) = \frac12(k-1). \tag{2.60}
\]

Proof of Lemma 2.4. Let G(p), H(n_1, …, n_k) and L(n) be defined in (2.35), (2.43) and (2.53), respectively. Note that G(1) = H(n_1, …, n_k) and H(n/k, …, n/k) = L(n); thus, by Lemmas 2.5, 2.6 and 2.7,

\[
\inf_{p\ge1}\frac{G(p)}{2} = \frac{G(1)}{2} = \frac{H(n_1,\ldots,n_k)}{2} \ge \frac{H(n/k,\ldots,n/k)}{2} = \frac{L(n)}{2} \ge \frac{k-1}{4}. \tag{2.61}
\]

We then consider the function

\[
G_2(p) = \frac{G(p)}{p^2}, \tag{2.62}
\]

where G(p) is defined in (2.35). Note that 2σ_n² = G(p). To show (2.34), it suffices to show that G_2(p) is increasing with respect to p, since when G_2(p) is increasing we easily have, by (2.61) and p ≥ 3,

\[
\frac{\sigma_n^2}{p^2} = \frac{G_2(p)}{2} \ge \frac{G_2(3)}{2} = \frac{G(3)}{18} \ge \frac{G(1)}{18} \ge \frac{k-1}{36}. \tag{2.63}
\]

Differentiating (2.62) with respect to p, we obtain

\[
G_2'(p) = \frac{G'(p)p-2G(p)}{p^3}. \tag{2.64}
\]

Since p is positive, the sign of G_2'(p) depends only on G'(p)p − 2G(p), i.e.,

\[
G_2'(p) \ge 0 \iff G'(p)p-2G(p) \ge 0. \tag{2.65}
\]

A direct calculation gives

\[
G'(p)p-2G(p) = q\Big(\sum_{i=1}^k t_i\Big)-\sum_{i=1}^k q(t_i), \tag{2.66}
\]

where t_i = n_i − 1 (so that Σ t_i = n − k) and, for x > p − 1,

\[
q(x) := -2x^2\log\Big(1-\frac{p}{x+1}\Big)-\frac{px^2}{x-p+1}-px+p. \tag{2.67}
\]

We write q(x) in another form so that we can see that q(x) is non-decreasing in x. Denote ψ(u) = −log(1 − u) − u for 0 < u < 1; clearly ψ is a positive increasing function. Then

\[
q(x) = 2x^2\,\psi\Big(\frac{p}{x+1}\Big)-p\Big[p-\frac{2}{x+1}+\frac{(p-1)^2}{x-p+1}\Big]. \tag{2.68}
\]

By evaluating the derivatives of the two terms in (2.68), one can check that q(x) is non-decreasing in x when p ≥ 3. Letting x go to infinity and expanding the logarithm, we have

\[
\lim_{x\to\infty}q(x) = 0, \tag{2.69}
\]

which, together with the monotonicity, implies q(x) < 0 for x > p − 1. We can now find a lower bound for G'(p)p − 2G(p) in (2.66): since q is non-decreasing and Σ t_i ≥ t_1,

\[
G'(p)p-2G(p) = q\Big(\sum_{i=1}^k t_i\Big)-\sum_{i=1}^k q(t_i) \ge q(t_1)-\sum_{i=1}^k q(t_i) = -\sum_{i\ne1}q(t_i) \ge 0. \tag{2.70}
\]

This leads to (2.65), and then (2.63) follows from (2.65).

3 Proof of the main result

Before beginning the proof of the main result, we state the following lemma, which expresses the moments of W_n as a product of Γ_p functions; it is available in Muirhead [13, p. 302].

Lemma 3.1. Let Λ*_n be as in (1.10). Set

\[
W_n = \frac{\prod_{i=1}^k |A_i|^{(n_i-1)/2}}{|A|^{(n-k)/2}},\qquad\text{so that}\qquad
\Lambda_n^* = W_n\cdot\frac{(n-k)^{(n-k)p/2}}{\prod_{i=1}^k (n_i-1)^{(n_i-1)p/2}}. \tag{3.1}
\]

Assume n_i > p for 1 ≤ i ≤ k. Then, under H_0 in (1.1), we have

\[
E(W_n^t) = \frac{\Gamma_p\big(\frac12(n-k)\big)}{\Gamma_p\big(\frac12(n-k)(1+t)\big)}\cdot\prod_{i=1}^k\frac{\Gamma_p\big(\frac12(n_i-1)(1+t)\big)}{\Gamma_p\big(\frac12(n_i-1)\big)} \tag{3.2}
\]

for all t > max_{1≤i≤k} (p−1)/(n_i−1) − 1, where Γ_p is defined as in (2.3).

Proof of Theorem 1.1. By (3.1), we write

\[
\log\Lambda_n^* = \log W_n + \frac{(n-k)p}{2}\log(n-k) - \sum_{i=1}^k\frac{(n_i-1)p}{2}\log(n_i-1). \tag{3.3}
\]

Let

\[
\mu_n' = \mu_n - \frac{(n-k)p}{2}\log\frac{n-k}{2} + \sum_{i=1}^k\frac{(n_i-1)p}{2}\log\frac{n_i-1}{2}. \tag{3.4}
\]

To prove Theorem 1.1, it is equivalent to show that

\[
\frac{\log W_n-\mu_n'}{\sigma_n} \xrightarrow{d} N(0,1) \tag{3.5}
\]

as n → ∞. Note that

\[
\frac{\log W_n-\mu_n'}{\sigma_n} \xrightarrow{d} N(0,1)
\iff E\exp\Big(s\,\frac{\log W_n-\mu_n'}{\sigma_n}\Big) \to e^{s^2/2}
\iff \log E(W_n^t) = \mu_n't+\frac12s^2+o(1) \tag{3.6}
\]

for all |s| ≤ 1, where t = s/σ_n. By Lemma 3.1, we write log E(W_n^t) as a sum of log Γ_p terms:

\[
\log E(W_n^t) = -\log\frac{\Gamma_p\big(\frac12(n-k)+\frac12(n-k)t\big)}{\Gamma_p\big(\frac12(n-k)\big)}
+ \sum_{i=1}^k\log\frac{\Gamma_p\big(\frac12(n_i-1)+\frac12(n_i-1)t\big)}{\Gamma_p\big(\frac12(n_i-1)\big)}. \tag{3.7}
\]

By Lemma 2.4, we have

\[
\frac{\sigma_n^2}{kp^2} \ge \frac{k-1}{36k},\qquad\text{so that}\qquad \lim_{n\to\infty}\frac{\sigma_n^2}{kp^2} = c > 0\ \text{ or }\ \infty, \tag{3.8}
\]

as p → ∞, k → ∞; it follows that t = O(1/(p√k)) when the limit is a positive constant c, and t = o(1/(p√k)) when the limit is infinity. Since k/n → 0, (n−k)t = O(n/(p√k)) and (n_i−1)t = O(n_i/(p√k)).

By Proposition 2.1 (applied with m = n − k + 1, so that (m−1)/2 = (n−k)/2, and with t there replaced by (n−k)t/2), the first ratio on the right-hand side of (3.7) satisfies

\[
\log\frac{\Gamma_p\big(\frac12(n-k)+\frac12(n-k)t\big)}{\Gamma_p\big(\frac12(n-k)\big)}
= \gamma_{n-k+1}+\alpha_{n-k+1}\frac{(n-k)t}{2}+\beta_{n-k+1}\frac{(n-k)^2t^2}{4}+I_n
\]
\[
= p\Big[\Big(\frac{n-k+1}{2}+\frac{(n-k)t}{2}\Big)\log\Big(\frac{n-k+1}{2}+\frac{(n-k)t}{2}\Big)-\frac{n-k+1}{2}\log\frac{n-k+1}{2}\Big]
\]
\[
+\Big[\Big(p-n+k+\frac12\Big)\log\Big(1-\frac{p}{n-k+1}\Big)-2p\Big]\frac{(n-k)t}{2}
+\Big[-\frac{p}{n-k+1}-\log\Big(1-\frac{p}{n-k+1}\Big)\Big]\frac{(n-k)^2t^2}{4}+I_n, \tag{3.9}
\]

as p → ∞, k → ∞, where α_{n−k+1}, β_{n−k+1} and γ_{n−k+1} are defined in (2.7). Replacing n − k + 1 by n_i (and (n−k)t/2 by (n_i−1)t/2) gives a similar form for each term in the sum in (3.7):

\[
\log\frac{\Gamma_p\big(\frac12(n_i-1)+\frac12(n_i-1)t\big)}{\Gamma_p\big(\frac12(n_i-1)\big)}
= p\Big[\Big(\frac{n_i}{2}+\frac{(n_i-1)t}{2}\Big)\log\Big(\frac{n_i}{2}+\frac{(n_i-1)t}{2}\Big)-\frac{n_i}{2}\log\frac{n_i}{2}\Big]
\]
\[
+\Big[\Big(p-n_i+\frac32\Big)\log\Big(1-\frac{p}{n_i}\Big)-2p\Big]\frac{(n_i-1)t}{2}
+\Big[-\frac{p}{n_i}-\log\Big(1-\frac{p}{n_i}\Big)\Big]\frac{(n_i-1)^2t^2}{4}+I_{n_i}, \tag{3.10}
\]

as p → ∞, k → ∞. Observe that

\[
I_n = O\Big(\frac{1}{nk^{1/2}}\Big)+O\Big(\frac{1}{pk}\Big) = o(1) \tag{3.11}
\]

and

\[
\sum_{i=1}^kI_{n_i} = \sum_{i=1}^k\Big[O\Big(\frac{1}{n_ik^{1/2}}\Big)+O\Big(\frac{1}{pk}\Big)\Big] = o(1)+\sum_{i=1}^kO\Big(\frac{1}{n_ik^{1/2}}\Big) = o(1). \tag{3.12}
\]

Summing (3.10) over i and subtracting (3.9) gives

\[
\log E(W_n^t) = F_1+F_2t+F_3t^2+o(1), \tag{3.13}
\]

where

\[
F_1 = \sum_{i=1}^k\gamma_{n_i}-\gamma_{n-k+1},\qquad
F_2 = \sum_{i=1}^k\alpha_{n_i}\frac{n_i-1}{2}-\alpha_{n-k+1}\frac{n-k}{2},\qquad
F_3 = \frac14\Big[\sum_{i=1}^k\beta_{n_i}(n_i-1)^2-\beta_{n-k+1}(n-k)^2\Big].
\]

We now compute F_1, F_2 and F_3 in turn. Note that

\[
\frac{\gamma_{n-k+1}}{p} = \frac{n-k+1}{2}\log(1+t)+\frac{(n-k)t}{2}\log\frac{n-k+1}{2}+\frac{(n-k)t}{2}\log(1+t)
+\Big(\frac{n-k+1}{2}+\frac{(n-k)t}{2}\Big)\log\Big(1-\frac{t}{(n-k+1)(1+t)}\Big), \tag{3.14}
\]

where we used the fact

\[
\log\Big(\frac{n-k+1}{2}+\frac{(n-k)t}{2}\Big) = \log\frac{(n-k+1)(1+t)-t}{2}
= \log\frac{n-k+1}{2}+\log(1+t)+\log\Big(1-\frac{t}{(n-k+1)(1+t)}\Big). \tag{3.15}
\]

To get γ_{n_i}/p, we replace n − k + 1 with n_i (and n − k with n_i − 1) in (3.14):

\[
\frac{\gamma_{n_i}}{p} = \frac{n_i}{2}\log(1+t)+\frac{(n_i-1)t}{2}\log\frac{n_i}{2}+\frac{(n_i-1)t}{2}\log(1+t)
+\Big(\frac{n_i}{2}+\frac{(n_i-1)t}{2}\Big)\log\Big(1-\frac{t}{n_i(1+t)}\Big). \tag{3.16}
\]

Subtracting (3.14) from the sum of (3.16) over i, and noting that the coefficients of log(1+t) combine to (k−1)/2 + 0·t while the remainder terms contribute −(k−1)t/2 + o(1/p), we obtain, after also replacing log(n_i/2) by log((n_i−1)/2) and log((n−k+1)/2) by log((n−k)/2) (which produces an extra (k−1)t/2 + o(1/p)),

\[
\frac{F_1}{p} = \frac{k-1}{2}\log(1+t)-\frac{(k-1)t}{2}+\frac{(k-1)t}{2}
-\frac{(n-k)t}{2}\log\frac{n-k}{2}+\sum_{i=1}^k\frac{(n_i-1)t}{2}\log\frac{n_i-1}{2}+o\Big(\frac1p\Big), \tag{3.17}
\]

as n → ∞. Since (log(1+t) − t)·p = O(pt²) = o(1), this yields

\[
F_1 = \frac{(k-1)pt}{2}-\frac{(n-k)pt}{2}\log\frac{n-k}{2}+\sum_{i=1}^k\frac{(n_i-1)pt}{2}\log\frac{n_i-1}{2}+o(1). \tag{3.18}
\]

By the facts Σ n_i = n and Σ (n_i − 1) = n − k, the terms −2p in the α's cancel, and the coefficient F_2 of t in (3.13) is given by

\[
F_2 = \frac12\sum_{i=1}^k\Big(p-n_i+\frac32\Big)(n_i-1)\log\Big(1-\frac{p}{n_i}\Big)
-\frac12(n-k)\Big(p-n+k+\frac12\Big)\log\Big(1-\frac{p}{n-k+1}\Big)
= \mu_n-\frac{(k-1)p}{2}, \tag{3.19}
\]

where μ_n is defined in (1.19). Similarly, using (n_i−1)²/n_i = n_i − 2 + 1/n_i and (n−k)²/(n−k+1) = n − k − 1 + 1/(n−k+1), we obtain

\[
4F_3 = -np+2kp-\sum_{i=1}^k\frac{p}{n_i}-\sum_{i=1}^k(n_i-1)^2\log\Big(1-\frac{p}{n_i}\Big)
+p(n-k+1)-2p+\frac{p}{n-k+1}+(n-k)^2\log\Big(1-\frac{p}{n-k+1}\Big)
\]
\[
= p(k-1)+(n-k)^2\log\Big(1-\frac{p}{n-k+1}\Big)-\sum_{i=1}^k(n_i-1)^2\log\Big(1-\frac{p}{n_i}\Big)
+\frac{p}{n-k+1}-\sum_{i=1}^k\frac{p}{n_i}
= 2\sigma_n^2+\frac{p}{n-k+1}-\sum_{i=1}^k\frac{p}{n_i}, \tag{3.20}
\]

where σ_n² is defined in (1.20). Thus, we have

\[
F_3 = \frac{\sigma_n^2}{2}+\frac{p}{4(n-k+1)}-\sum_{i=1}^k\frac{p}{4n_i}. \tag{3.21}
\]

Plugging (3.18), (3.19) and (3.21) back into (3.13) yields

\[
\log E(W_n^t) = \Big(\mu_n-\frac{(k-1)p}{2}\Big)t+\Big(\frac{\sigma_n^2}{2}+\frac{p}{4(n-k+1)}-\sum_{i=1}^k\frac{p}{4n_i}\Big)t^2
+\frac{(k-1)pt}{2}-\frac{(n-k)pt}{2}\log\frac{n-k}{2}+\sum_{i=1}^k\frac{(n_i-1)pt}{2}\log\frac{n_i-1}{2}+o(1). \tag{3.22}
\]

By the given condition k = o(n_i) and the bound t² = O(1/(p²k)), we have

\[
\Big(\sum_{i=1}^k\frac{p}{n_i}\Big)t^2 = O\Big(\frac1p\Big)\,O\Big(\sum_{i=1}^k\frac{1}{kn_i}\Big) = o(1) \tag{3.23}
\]

and

\[
\frac{p}{4(n-k+1)}\,t^2 = O\Big(\frac1p\Big)\,O\Big(\frac1n\Big) = o(1), \tag{3.24}
\]

as p → ∞, k → ∞. Therefore (3.22) becomes

\[
\log E(W_n^t) = \Big[\mu_n-\frac{(n-k)p}{2}\log\frac{n-k}{2}+\sum_{i=1}^k\frac{(n_i-1)p}{2}\log\frac{n_i-1}{2}\Big]t+\frac12\sigma_n^2t^2+o(1)
= \mu_n't+\frac12s^2+o(1), \tag{3.25}
\]

as p → ∞, k → ∞, where μ'_n is defined in (3.4). This completes the proof.

4 Numerical simulations

In this section, we present numerical simulations with 10,000 iterations to estimate the size and power of the likelihood ratio test using the central limit theorem approximation and the classical χ² approximation (1.11). First, we show histograms for the two tests, comparing them with normal curves and chi-square curves, which gives an intuitive picture of the distinction between the two approximations. The R code for this section is given in the Appendix.

In Figure 1, we display the normal curves and the histogram of (log Λ*_n − μ_n)/σ_n, where n_i = 100 for 1 ≤ i ≤ 10 and p = 5, 20, 40, 60. We see that the histogram of (log Λ*_n − μ_n)/σ_n fits the normal curve better as the dimension p grows. In Figure 2, we present the chi-square curves and the histogram of −2ρ log Λ*_n under the same conditions as in Figure 1. Observe that the histogram of −2ρ log Λ*_n departs from the chi-square curves quickly. The comparison between the CLT and the classical chi-square approximation shows that the CLT is better when p and k are large.

Figure 1: Histogram of $(\log\Lambda_n^*-\mu_n)/\sigma_n$ and normal curves, where $n_i=100$, $1\le i\le 10$ and $p=5,20,40,60$. [Four density-histogram panels ($p=5,20,40,60$, each with $k=10$) overlaid with the standard normal curve; graphics not reproduced here.]

Figure 2: Histogram of $-2\rho\log\Lambda_n^*$ and chi-square curves, where $n_i=100$, $1\le i\le 10$ and $p=5,20,40,60$. [Four density-histogram panels overlaid with the $\chi^2_f$ curve; graphics not reproduced here.]

4.1 Estimation of size

Size is the probability of rejecting $H_0$ when $H_0$ is true. In our simulation, we choose the significance level $\alpha$ to be 0.05. The estimated size is the ratio of the number of rejected tests to the number of iterations when $H_0$ is true. If the test is reasonable, the estimated size should be close to the significance level $\alpha=0.05$. In Table 1, we list the sizes of the traditional chi-square approximation and our normal approximation. When $p=5,20$, the size of the chi-square approximation matches the significance level $\alpha=0.05$ very well. However, as $p$ gets close to $n_i$, the size of the chi-square approximation increases to one rapidly, which means the traditional chi-square test rejects almost every true $H_0$ when $p=60,80,98$. On the other hand, the size of the normal approximation test is always very close to 0.05.

Table 1: The size of LRT for equality of k covariance matrices

    Conditions                    CLT      χ² approx.
    ni = 100, k = 10, p = 5       0.0773   0.0536
    ni = 100, k = 10, p = 20      0.063    0.0588
    ni = 100, k = 10, p = 40      0.0511   0.1939
    ni = 100, k = 10, p = 60      0.0521   0.9477
    ni = 100, k = 10, p = 80      0.057    1
    ni = 100, k = 10, p = 98      0.0483   1

The sizes are estimated based on 10,000 simulations from $N_p(0, I_p)$.
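The size estimate can also be reproduced outside R. Below is a compact Python sketch of the same Monte Carlo loop, our translation of the Appendix R code (assuming only numpy): the problem is shrunk to $p=3$, $k=3$, $n_i=20$ with 200 iterations so it runs in seconds, the critical value `qnorm(0.05)` is hardcoded to avoid a scipy dependency, and the helper name `log_lambda_star` is ours.

```python
import numpy as np

# Monte Carlo estimate of the size of the CLT-based test (scaled-down setting).
rng = np.random.default_rng(1)
p, k, nsim = 3, 3, 200
ni = np.full(k, 20.0)
n = ni.sum()

# Centering and scaling constants (1.19) and (1.20).
mu_n = (np.sum((ni - 1) * (2*p - 2*ni + 3) * np.log(1 - p/ni))
        - (n - k) * (2*p - 2*n + 2*k + 1) * np.log(1 - p/(n - k + 1))) / 4 \
       + (k - 1) * p / 2
sigma = np.sqrt(0.5 * ((n - k)**2 * np.log(1 - p/(n - k + 1))
                       - np.sum((ni - 1)**2 * np.log(1 - p/ni))) + (k - 1) * p / 2)

def log_lambda_star(samples):
    """Log of the modified LRT statistic, as computed in the Appendix."""
    A_i = [(x.shape[0] - 1) * np.cov(x, rowvar=False) for x in samples]
    A = sum(A_i)
    N = sum(x.shape[0] for x in samples)
    out = sum((x.shape[0] - 1) / 2 * np.linalg.slogdet(a)[1]
              for x, a in zip(samples, A_i))
    out += (p / 2) * (N - k) * np.log(N - k) - (N - k) / 2 * np.linalg.slogdet(A)[1]
    out -= sum((x.shape[0] - 1) * p / 2 * np.log(x.shape[0] - 1) for x in samples)
    return out

z05 = -1.6448536269514722          # qnorm(0.05)
rejections = 0
for _ in range(nsim):
    # Under H0 all covariance matrices equal I_p.
    samples = [rng.standard_normal((int(m), p)) for m in ni]
    if (log_lambda_star(samples) - mu_n) / sigma < z05:
        rejections += 1
size_clt = rejections / nsim
print(size_clt)
```

At this small $p$, $k$, and $n_i$ the estimate is somewhat above the nominal 0.05; it tightens as the dimensions grow, consistent with Table 1.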

4.2 Estimation of power

Power is the probability of rejecting $H_0$ when $H_0$ is false, so the test is reasonable when the power is large. Since the estimated power varies with the choice of the alternative hypothesis, the power of a test is relatively small when the covariance matrices are close to each other and larger when they differ significantly. In our numerical simulation, we studied the likelihood ratio test under two different alternative hypotheses, the first one $H_a: \Sigma_1=I_p, \Sigma_2=1.2I_p, \Sigma_3=0.8I_p$ and the second one $H_a: \Sigma_1=\Sigma_2=1.1I_p, \Sigma_3=0.8I_p$. The results are listed in Table 2 and Table 3, respectively.

Table 2: The power of LRT for equality of k covariance matrices

    Conditions                   CLT      χ² approx.
    ni = 100, k = 3, p = 5       0.7431   0.6685
    ni = 100, k = 3, p = 20      0.7495   0.7251
    ni = 100, k = 3, p = 40      0.6415   0.7246
    ni = 100, k = 3, p = 60      0.4972   0.9129
    ni = 100, k = 3, p = 80      0.339    1

The powers are estimated based on 10,000 simulations from the alternative hypothesis that $\Sigma_1=I_p$, $\Sigma_2=1.2I_p$, $\Sigma_3=0.8I_p$.

Table 3: The power of LRT for equality of k covariance matrices

    Conditions                   CLT      χ² approx.
    ni = 100, k = 3, p = 5       0.633    0.5386
    ni = 100, k = 3, p = 20      0.6079   0.5749
    ni = 100, k = 3, p = 40      0.4971   0.5902
    ni = 100, k = 3, p = 60      0.3927   0.8611
    ni = 100, k = 3, p = 80      0.2739   1

The powers are estimated based on 10,000 simulations from the alternative hypothesis that $\Sigma_1=\Sigma_2=1.1I_p$, $\Sigma_3=0.8I_p$.
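A power estimate is obtained the same way, sampling from an alternative instead of $H_0$. The sketch below (our translation of the Appendix R code, assuming only numpy, scaled down to $p=3$, $k=3$, $n_i=20$) uses the first alternative of Table 2, with covariance scalings 1, 1.2, 0.8; the helper name `log_lambda_star` and the hardcoded `qnorm(0.05)` value are ours.

```python
import numpy as np

# Monte Carlo estimate of the power of the CLT-based test under
# Ha: Sigma_1 = I_p, Sigma_2 = 1.2 I_p, Sigma_3 = 0.8 I_p (scaled-down setting).
rng = np.random.default_rng(2)
p, k, nsim = 3, 3, 200
ni = np.full(k, 20.0)
n = ni.sum()
scales = [1.0, 1.2, 0.8]

# Centering and scaling constants (1.19) and (1.20).
mu_n = (np.sum((ni - 1) * (2*p - 2*ni + 3) * np.log(1 - p/ni))
        - (n - k) * (2*p - 2*n + 2*k + 1) * np.log(1 - p/(n - k + 1))) / 4 \
       + (k - 1) * p / 2
sigma = np.sqrt(0.5 * ((n - k)**2 * np.log(1 - p/(n - k + 1))
                       - np.sum((ni - 1)**2 * np.log(1 - p/ni))) + (k - 1) * p / 2)

def log_lambda_star(samples):
    """Log of the modified LRT statistic, as computed in the Appendix."""
    A_i = [(x.shape[0] - 1) * np.cov(x, rowvar=False) for x in samples]
    A = sum(A_i)
    N = sum(x.shape[0] for x in samples)
    out = sum((x.shape[0] - 1) / 2 * np.linalg.slogdet(a)[1]
              for x, a in zip(samples, A_i))
    out += (p / 2) * (N - k) * np.log(N - k) - (N - k) / 2 * np.linalg.slogdet(A)[1]
    out -= sum((x.shape[0] - 1) * p / 2 * np.log(x.shape[0] - 1) for x in samples)
    return out

z05 = -1.6448536269514722          # qnorm(0.05)
power_clt = float(np.mean([
    (log_lambda_star([np.sqrt(s) * rng.standard_normal((int(m), p))
                      for m, s in zip(ni, scales)]) - mu_n) / sigma < z05
    for _ in range(nsim)]))
print(power_clt)
```

With these small dimensions the power is well below the values in Table 2; the comparison across $p$ in the tables is the point, not the absolute level here.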


Appendix

We give the R commands for Sections 4.1 and 4.2 in this section. First, we set up the basic conditions. For example, let p = 40, k = 3, ni = 100 and iterate 10,000 times.

> library(MASS);
> nsim=10000;
> p=40;
> k=3;
> n=rep(100,k);
> N=sum(n);

Generate some empty vectors in R to be used later:

> X=vector("list",k);
> a=vector("list",k);
> logdet=mat.or.vec(1,k);
> logbar=mat.or.vec(1,nsim)

The definitions of $\mu_n$ (1.19), $\sigma_n$ (1.20), $\rho$ (1.12) and the degrees of freedom $f$ (1.12):

> df=p*(p+1)*(k-1)/2;
> rho=1-(sum((N-k)/(n-1))-1)*(2*p^2+3*p-1)/(6*(p+1)*(k-1)*(N-k));
> mu_n=((sum((n-1)*(2*p-2*n+3)*log(1-p/n))-(N-k)*(2*p-2*N+2*k+1)
+   *log(1-p/(N-k+1))))/4+(k-1)*p/2;
> sigma_sq=0.5*((N-k)^2*log(1-p/(N-k+1))-sum((n-1)^2*log(1-p/n)))+(k-1)*p/2;
> sigma=sqrt(sigma_sq);

To reproduce the estimation of sizes of the normal approximation and the chi-square approximation:

> for(j in 1:nsim) {
+   for(i in 1:k) X[[i]]=mvrnorm(n[i], rep(0,p), diag(p));
+   for(i in 1:k) a[[i]]=(n[i]-1)*cov(X[[i]]);
+   for(i in 1:k) logdet[i]=log(det(a[[i]]));
+   A=Reduce('+',a);
+   logbar[j]=sum((n-1)*logdet/2)+(p/2)*(N-k)*log(N-k)-(N-k)*log(det(A))/2
+     -sum((n-1)*p*log(n-1)/2);
+ }
> test_clt=(logbar-mu_n)/sigma;
> test_chi=-2*rho*logbar;
> size_clt=length(which(test_clt<qnorm(0.05)))/nsim;
> size_chi=length(which(test_chi>qchisq(0.95,df)))/nsim;

To reproduce the estimation of powers of the normal approximation and the chi-square approximation:

> for(j in 1:nsim) {
+   X[[1]]=mvrnorm(n[1], rep(0,p), diag(p));
+   X[[2]]=mvrnorm(n[2], rep(0,p), 1.2*diag(p));
+   X[[3]]=mvrnorm(n[3], rep(0,p), 0.8*diag(p));
+   for(i in 1:k) a[[i]]=(n[i]-1)*cov(X[[i]]);
+   for(i in 1:k) logdet[i]=log(det(a[[i]]));
+   A=Reduce('+',a);
+   logbar[j]=sum((n-1)*logdet/2)+(p/2)*(N-k)*log(N-k)-(N-k)*log(det(A))/2
+     -sum((n-1)*p*log(n-1)/2);
+ }
> test_clt=(logbar-mu_n)/sigma;
> test_chi=-2*rho*logbar;
> power_clt=length(which(test_clt<qnorm(0.05)))/nsim;
> power_chi=length(which(test_chi>qchisq(0.95,df)))/nsim;

Figure 1 and Figure 2 can be reproduced in R as follows:

> h_n=hist(test_clt,50,main='p=40,k=10',freq=FALSE);
> seq_n=seq(min(test_clt),max(test_clt),by=0.001);
> density_n=dnorm(seq_n,0,1);
> lines(seq_n,density_n,col="blue",lwd=2);
> h_chi=hist(test_chi,50,main='p=5,k=10',freq=FALSE);
> seq_chi=seq(min(test_chi),max(test_chi),by=0.001);
> density_chi=dchisq(seq_chi,df);
> lines(seq_chi,density_chi,col="blue",lwd=2);