IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 50, NO. 1, JANUARY 2004

Information Properties of Order Statistics and Spacings

Nader Ebrahimi, Ehsan S. Soofi, and Hassan Zahedi

Abstract—We explore properties of the entropy, Kullback–Leibler information, and mutual information for order statistics. The probability integral transformation plays a pivotal role in developing our results. We provide bounds for the entropy of order statistics and some results that relate entropy ordering of order statistics to other well-known orderings of random variables. We show that the discrimination information between order statistics and the data distribution, the discrimination information among the order statistics, and the mutual information between order statistics are all distribution free and are computable using the distributions of the order statistics of samples from the uniform distribution. We also discuss information properties of spacings for uniform and exponential samples and provide a large-sample distribution-free result on the entropy of spacings. The results show interesting symmetries of information orderings among order statistics.

Index Terms—Differential entropy, entropy bounds, Kullback–Leibler information, mutual information, probability integral transformation.


I. INTRODUCTION

Suppose that X_1, ..., X_n are independent and identically distributed observations from a distribution F_X, where F_X is differentiable with a density f_X which is positive in an interval and zero elsewhere. The order statistics of the sample are defined by the arrangement of X_1, ..., X_n from the smallest to the largest, denoted by Y_1 < ... < Y_n. It is well known [1] that the distribution F_i(y) = P(Y_i \le y), i = 1, ..., n, has the following density:

f_i(y) = \frac{\Gamma(n+1)}{\Gamma(n-i+1)\Gamma(i)} [F_X(y)]^{i-1} [1 - F_X(y)]^{n-i} f_X(y)    (1)

where, for a positive integer z, \Gamma(z) = (z-1)! is the gamma function.

Order statistics have been used in a wide range of problems, including robust statistical estimation and detection of outliers, characterization of probability distributions and goodness-of-fit tests, entropy estimation, analysis of censored samples, reliability analysis, quality control, strength of materials, waiting time until a big event, selecting the best, records and allocation of prizes in tournaments, inequality measurement, speech processing, image and picture processing, echo removal, image coding, filtering, spectrum estimation, acoustics, and edge enhancing; see [1]–[3] and references therein. In spite of such a wide scope of applications, not much attention has been given to the study of information properties of order statistics. We have been able to find only three papers on this topic. Wong and Chen [2] showed that the difference between the average entropy of order statistics and the entropy of the data distribution is a constant. They also showed that for symmetric distributions, the entropy of order statistics is symmetric about the median. Park [4] showed some recurrence relations for the entropy of order statistics, and Park [5] provided similar results in terms of the Fisher information.

We develop several results on the properties of the entropy of order statistics and on the Kullback–Leibler discrimination information functions that involve order statistics. The probability integral transformation U = F_X(X) plays a pivotal role in developing our results. It is well known that the distribution of U is uniform over the unit interval. The order statistics of a sample U_1, ..., U_n from the uniform distribution are denoted by W_1 < ... < W_n, and W_i, i = 1, ..., n, has the beta distribution with density

g_i(w) = \frac{1}{B(i, n-i+1)} w^{i-1} (1-w)^{n-i},   0 \le w \le 1    (2)

where B(z_1, z_2) = \Gamma(z_1)\Gamma(z_2)/\Gamma(z_1+z_2); see [1].

Section II presents some results on the entropy of order statistics. Section III gives some results on the discrimination information functions related to order statistics. Section IV presents information properties of spacings for uniform and exponential samples and gives an asymptotic result on the entropy of the spacings.

II. ENTROPY OF ORDER STATISTICS

The probability integral transformation provides the following useful representation of the entropy of the random variable X:

H(X) = -\int_{-\infty}^{\infty} f_X(x) \log f_X(x)\, dx = -\int_0^1 \log f_X(F_X^{-1}(u))\, du.

Hereafter, the range of integration will not be shown and should be clear from the context.

The entropies of the order statistics Y_1, ..., Y_n are found by noting that W_i = F_X(Y_i), i = 1, ..., n. The transformation formula for the entropy applied to Y_i = F_X^{-1}(W_i) gives the following representations of the entropy of order statistics:

H(Y_i) = H_n(W_i) - E_{g_i}[\log f_X(F_X^{-1}(W_i))]    (3)
       = H_n(W_i) - \int f_i(y) \log f_X(y)\, dy    (4)

where H_n(W_i) denotes the entropy of the beta distribution shown in (2). The expression for the beta entropy is

H_n(W_i) = \log B(i, n-i+1) - (i-1)[\psi(i) - \psi(n+1)] - (n-i)[\psi(n-i+1) - \psi(n+1)]    (5)

where \psi(z) = d\log\Gamma(z)/dz is the digamma function. Noting that H_n(W_1) = H_n(W_n) = 1 - \log n - 1/n, representation (4) generalizes [4, Proposition 6.1] for H(Y_1) and H(Y_n).

The following property of the beta entropy is used in the sequel. Let

\Delta_n(i) = H_n(W_i) - H_n(W_{i+1}) = [\log(n-i) - \psi(n-i)] - [\log i - \psi(i)].    (6)

Then


\Delta_n(i) < 0,  for i < n/2;   \Delta_n(i) > 0,  for i > n/2    (7)

and, for an even n, \Delta_n(n/2) = 0. The inequalities in (7) are obtained by noting that \phi(z) = \log z - \psi(z) is a decreasing function; \phi'(z) = 1/z - \psi'(z) < 0, where \psi'(z) is the trigamma function ([6, p. 228]).

As an application of the representation (3), consider the following example.
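The beta entropy (5) and the sign pattern (7) are easy to verify numerically. The following sketch (not from the paper; the choice n = 9 and the use of SciPy are assumptions for illustration) compares (5) with SciPy's beta entropy and prints \Delta_n(i):

```python
# Sketch (not from the paper): the beta entropy (5) versus SciPy's entropy of
# Beta(i, n-i+1), and the sign pattern (7) of Delta_n(i); n = 9 is an assumption.
import numpy as np
from scipy import stats
from scipy.special import betaln, digamma

def H_beta(i, n):
    """Entropy H_n(W_i) of Beta(i, n-i+1), eq. (5)."""
    return (betaln(i, n - i + 1)
            - (i - 1) * (digamma(i) - digamma(n + 1))
            - (n - i) * (digamma(n - i + 1) - digamma(n + 1)))

n = 9
for i in range(1, n + 1):
    assert abs(H_beta(i, n) - stats.beta(i, n - i + 1).entropy()) < 1e-6

delta = [H_beta(i, n) - H_beta(i + 1, n) for i in range(1, n)]   # Delta_n(i), eq. (6)
print([round(d, 4) for d in delta])   # negative for i < n/2, positive for i > n/2
```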


Example 2.1: Let X be a random variable having the exponential distribution F_X(x) = 1 - e^{-\lambda x}. For computing H(Y_i), we find F_X^{-1}(w) = -\lambda^{-1}\log(1-w), and the expectation term in (3) is

E_{g_i}[\log f_X(F_X^{-1}(W_i))] = E_{g_i}[\log\lambda + \log(1 - W_i)] = \log\lambda + \psi(n-i+1) - \psi(n+1).    (8)

For the sample minimum, i = 1, (5) gives H_n(W_1) = 1 - \log n - 1/n. Evaluating (8) and noting that \psi(n+1) = \psi(n) + 1/n, (3) gives H(Y_1) = 1 - \log(n\lambda). Thus, in this case, (3) gives the result in accord with the known fact that the sample minimum has an exponential distribution with parameter n\lambda. However, the case of the sample maximum is more complicated. The distribution function of Y_n is F_n(y) = (1 - e^{-\lambda y})^n and the density is f_n(y) = n\lambda(1 - e^{-\lambda y})^{n-1} e^{-\lambda y}. Noting that H_n(W_n) = 1 - \log n - 1/n, the formula (3) simply gives

H(Y_n) = 1 - \log n - \log\lambda + \psi(n) + \gamma

where \gamma = -\psi(1) = 0.5772... is the Euler constant. Note that H(Y_n) - H(Y_1) = \psi(n) + \gamma \ge 0. The equality holds only when n = 1. That is, the uncertainty about the maximum is always more than the uncertainty about the minimum in exponential samples. The asymptotic difference is H(Y_n) - H(Y_1) \approx \log n + \gamma. Finally, it can be shown that for all i = 1, ..., n-1

H(Y_{i+1}) - H(Y_i) = \frac{1}{n-i} - \Delta_n(i) \ge 0.

The inequality can be seen from (16) in Section III. That is, the entropy of the ith-order statistic of a sample from the exponential distribution is increasing in i.
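The closed forms above are easy to confirm numerically. The sketch below (not part of the paper; the values of n and \lambda and the use of SciPy quadrature are assumptions for illustration) recomputes H(Y_n) by direct integration of -f_n \log f_n:

```python
# Sketch (not from the paper): numerical check of the exponential closed forms in
# Example 2.1; the values n = 8 and lam = 0.5 and the SciPy quadrature are assumed.
import numpy as np
from scipy.special import digamma
from scipy.integrate import quad

n, lam = 8, 0.5
euler = -digamma(1)                                   # Euler's constant gamma

H_Y1 = 1 - np.log(n * lam)
H_Yn = 1 - np.log(n) - np.log(lam) + digamma(n) + euler

def f_max(y):                                         # density of Y_n, exponential parent
    return n * lam * (1 - np.exp(-lam * y)) ** (n - 1) * np.exp(-lam * y)

H_num, _ = quad(lambda y: -f_max(y) * np.log(f_max(y)), 1e-9, np.inf)
print(H_Yn, "vs numeric", H_num)
print("H(Y_n) - H(Y_1) =", H_Yn - H_Y1, "= psi(n) + gamma =", digamma(n) + euler)
```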

The representation (3) also facilitates development of results about the entropy of order statistics. The following theorem provides bounds for the entropy of order statistics H(Y_i) in terms of the entropy of the data distribution H(X).

Theorem 2.1: For any random variable X with entropy H(X) < \infty, the entropy of the order statistics Y_i, i = 1, ..., n, is bounded as follows.

a) Let B_i denote the ith term of the binomial Bin(n-1, p_i) probability function, p_i = (i-1)/(n-1); that is, B_i = \binom{n-1}{i-1} p_i^{i-1}(1-p_i)^{n-i}. Then

H(Y_i) \ge H_n(W_i) + nB_i[H(X) + I(A)]

and

H(Y_i) \le H_n(W_i) + nB_i[H(X) + I(\bar{A})]

where I(A) = \int_A f_X(x)\log f_X(x)\,dx, I(\bar{A}) = \int_{\bar{A}} f_X(x)\log f_X(x)\,dx, A = \{x : f_X(x) \le 1\}, and \bar{A} = \{x : f_X(x) > 1\}.

b) Let M = f_X(m) < \infty, where m is the mode of the distribution. Then

H(Y_i) \ge H_n(W_i) - \log M

and

H(Y_i) \le H_n(W_i) - \log M + nB_i[H(X) + \log M].

Proof:
a) The mode of the beta density g_i(w) is p_i. Thus,

g_i(w) \le g_i(p_i) = n\binom{n-1}{i-1} p_i^{i-1}(1-p_i)^{n-i} = nB_i.    (9)

Now write

-E_{g_i}[\log f_X(F_X^{-1}(W_i))] = -\int_{\bar{A}_1} g_i(w)\log f_X(F_X^{-1}(w))\,dw    (10)
                                   \; - \int_{A_1} g_i(w)\log f_X(F_X^{-1}(w))\,dw    (11)

where A_1 = \{w : f_X(F_X^{-1}(w)) \le 1\} and \bar{A}_1 = \{w : f_X(F_X^{-1}(w)) > 1\}. For the upper bound,

-E_{g_i}[\log f_X(F_X^{-1}(W_i))] \le -\int_{A_1} g_i(w)\log f_X(F_X^{-1}(w))\,dw \le -nB_i\int_{A_1}\log f_X(F_X^{-1}(w))\,dw = -nB_i\int_A f_X(x)\log f_X(x)\,dx = nB_i[H(X) + I(\bar{A})].

The first inequality is obtained by noting that the integral in (10) is nonpositive; the second inequality is obtained using (9) and the fact that the integrand in (11) is nonnegative. The lower bound for H(Y_i) is obtained similarly by retaining only the integral in (10), which is nonpositive, and applying (9); this gives -E_{g_i}[\log f_X(F_X^{-1}(W_i))] \ge -nB_i I(\bar{A}) = nB_i[H(X) + I(A)].

b) Let Z = MX and let V_i = MY_i, i = 1, ..., n, denote the order statistics of Z. Then f_Z(z) = M^{-1} f_X(zM^{-1}) \le 1 for all z. Noting that I(A) = -H(Z) and I(\bar{A}) = 0, from part a) we have

H_n(W_i) \le H(V_i) \le H_n(W_i) + nB_i H(Z).

Using H(Z) = H(X) + \log M and H(V_i) = H(Y_i) + \log M gives the result.

The bounds in Theorem 2.1 are useful when the probability distribution function F_X does not have a closed form, so that the density of order statistics (1) and the beta expectation in (3) cannot be easily evaluated. The entropy expression for many well-known distributions is available, and thus the bounds in Theorem 2.1 are easily computable. When the bounds in both parts of Theorem 2.1 can be computed, one may use the maximum of the two lower bounds and the minimum of the two upper bounds.

Example 2.2: We compute the bounds for the entropies of the sample minimum and maximum for some well-known distributions. Noting that H_n(W_1) = H_n(W_n) = 1 - \log n - 1/n and B_1 = B_n = 1, part b) of Theorem 2.1 gives the following bounds for H(Y_i), i = 1, n:

H(Y_i) \ge 1 - \log n - \frac{1}{n} - \log M    (12)

and

H(Y_i) \le 1 - \log n - \frac{1}{n} - \log M + n[H(X) + \log M].    (13)

a) For the uniform distribution over the interval [a, b], M = (b-a)^{-1} and H(X) = \log(b-a). Thus, the equalities in (12) and (13) hold.

b) For the exponential distribution with parameter \lambda, M = \lambda and H(X) = 1 - \log\lambda. As noted in Example 2.1, H(Y_1) = 1 - \log(n\lambda), H(Y_n) = 1 - \log n - \log\lambda + \psi(n) + \gamma, and the entropy of the exponential order statistics is increasing in i. Thus,

1 - \log n - \frac{1}{n} - \log\lambda \le H(Y_1) \le H(Y_n) \le 1 - \log n - \frac{1}{n} - \log\lambda + n.

The difference between H(Y_1) and the lower bound is n^{-1}, which vanishes as n \to \infty. The difference between the upper bound and H(Y_n) is n - n^{-1} - \psi(n) - \gamma, which is an increasing function of n. Thus, the upper bound is useful when n is not large.

c) The density function of the Pareto distribution with parameters \alpha and \beta is

f_X(x) = \frac{\alpha\beta^{\alpha}}{x^{\alpha+1}},  for x \ge \beta > 0, \alpha > 0;   f_X(x) = 0, otherwise.

Here, M = \alpha\beta^{-1} and H(X) = \log(\beta\alpha^{-1}) + \alpha^{-1} + 1. The distribution of Y_1 is also Pareto with parameters n\alpha and \beta; thus, H(Y_1) = \log[\beta(n\alpha)^{-1}] + (n\alpha)^{-1} + 1. The distribution of Y_n is more complicated. Using (3) gives

H(Y_n) = \log[\beta(n\alpha)^{-1}] - n^{-1} - (\alpha^{-1} + 1)s_n + 1

where

s_n = \sum_{k=1}^{n} (-1)^k \binom{n}{k} \frac{1}{k} \le -1.

Noting that H(Y_n) - H(Y_1) = -(\alpha^{-1} + 1)(n^{-1} + s_n) \ge 0, we have

1 - \log n - \frac{1}{n} + \log(\beta\alpha^{-1}) \le H(Y_1) \le H(Y_n) \le 1 - \log n - \frac{1}{n} + \log(\beta\alpha^{-1}) + (\alpha^{-1} + 1)n.

The difference between H(Y_1) and the lower bound is (\alpha^{-1} + 1)n^{-1}, which vanishes as n \to \infty. The difference between the upper bound and H(Y_n) is (\alpha^{-1} + 1)(n + s_n), which is increasing in n. Thus, the upper bound is useful when n is not large.

d) For a normal distribution with variance \sigma^2, M = (2\pi\sigma^2)^{-1/2} and H(X) = \frac{1}{2} + \frac{1}{2}\log(2\pi\sigma^2). In this case, the entropy of order statistics is symmetric about the median [2]. Thus,

H(Y_1) = H(Y_n) \ge 1 - \log n - \frac{1}{n} + \frac{1}{2}\log(2\pi\sigma^2)

and

H(Y_1) = H(Y_n) \le 1 - \log n - \frac{1}{n} + \frac{1}{2}\log(2\pi\sigma^2) + \frac{n}{2}.

Numerical computations indicate that H(Y_1) = H(Y_n) decreases with n and that the difference between H(Y_1) = H(Y_n) and the lower bound is narrow and slowly (concavely) increasing in n. Numerical computations also indicate that the difference between the upper bound and H(Y_1) = H(Y_n) is increasing in n. Thus, for the minimum and maximum, the lower bound is useful when n is small or moderate and the upper bound is useful when n is small. Finally, an example in Wong and Chen [2] shows graphically that for n = 7, H(Y_1) = H(Y_n) is the maximum and the entropy of the median H(Y_med) is the minimum among the order statistics of the normal distribution. Our numerical computations confirmed this pattern and indicated that the difference between H(Y_med) and its lower bound decreases sharply in n and vanishes for large n. However, the difference between the upper bound and H(Y_med) grows with n. Hence, for the median, the upper bound is useful when n is not large.
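The Pareto expressions in part c) can be checked numerically; the sketch below (not from the paper; the parameter values and the SciPy quadrature are assumptions) verifies the alternating-sum constant s_n against the harmonic-number identity s_n = -(\psi(n+1) + \gamma) and recomputes H(Y_n) by integration:

```python
# Sketch (not from the paper): checks for the Pareto case of Example 2.2 c); the
# parameter values and the SciPy quadrature are assumptions for illustration.
import numpy as np
from math import comb
from scipy.special import digamma
from scipy.integrate import quad

n, alpha, beta = 6, 2.0, 1.5
g = -digamma(1)                                           # Euler's constant

s_n = sum((-1) ** k * comb(n, k) / k for k in range(1, n + 1))
print("s_n =", s_n, "vs -(psi(n+1) + gamma) =", -(digamma(n + 1) + g))

H_Y1 = np.log(beta / (n * alpha)) + 1 / (n * alpha) + 1
H_Yn = np.log(beta / (n * alpha)) - 1 / n - (1 / alpha + 1) * s_n + 1

def f_max(x):                                             # density of Y_n, Pareto parent
    F = 1 - (beta / x) ** alpha
    return n * F ** (n - 1) * alpha * beta ** alpha / x ** (alpha + 1)

H_num, _ = quad(lambda x: -f_max(x) * np.log(f_max(x)), beta * (1 + 1e-9), np.inf)
print(H_Yn, "vs numeric", H_num, "; H(Y_n) - H(Y_1) =", H_Yn - H_Y1)
```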

Next we provide some results on the entropy of order statistics in terms of ordering properties of distributions. We need the following definitions, in which X and Z denote random variables with distribution functions F_X and F_Z, density functions f_X and f_Z, and survival functions \bar{F}_X(x) = 1 - F_X(x) and \bar{F}_Z(z) = 1 - F_Z(z).

Definition 2.1: A nonnegative random variable X is said to have a decreasing (an increasing) failure rate (DFR (IFR)) if the failure rate (hazard function) \lambda_X(t) = f_X(t)/\bar{F}_X(t) is decreasing (increasing) in t \ge 0; equivalently, if \bar{F}_X(x+t)/\bar{F}_X(t) is increasing (decreasing) in t for all x \ge 0.

Definition 2.2: The random variable X is said to be stochastically less than or equal to Z, denoted by X \le_{st} Z, if \bar{F}_X(v) \le \bar{F}_Z(v) for all v.

Definition 2.3: The random variable X is said to be less than or equal to Z in dispersion ordering, denoted by X \le_{d} Z, if and only if

F_X^{-1}(u) - F_X^{-1}(v) \le F_Z^{-1}(u) - F_Z^{-1}(v),   for all 0 \le v < u \le 1.

Definition 2.4: The random variable X is said to be less than or equal to Z in likelihood ratio ordering, denoted by X \le_{lr} Z, if f_X(x)/f_Z(x) is nonincreasing in x.

Definition 2.5: The random variable X is said to be less than or equal to Z in entropy ordering, denoted by X \le_{e} Z, if H(X) \le H(Z).

It is well known that X \le_{lr} Z implies X \le_{st} Z [7]. It is also known that X \le_{d} Z implies X \le_{e} Z [8].

Theorem 2.2: Let X and Z be two nonnegative random variables. If Z \le_{st} X and X is DFR, then Z \le_{e} X.

Proof: Let X be DFR with Z \le_{st} X. Then

-H(Z) = \int f_Z(z)\log f_Z(z)\,dz \ge \int f_Z(z)\log f_X(z)\,dz
      = \int f_Z(z)\log\lambda_X(z)\,dz + \int f_Z(z)\log\bar{F}_X(z)\,dz
      \ge \int f_X(z)\log\lambda_X(z)\,dz + \int f_X(z)\log\bar{F}_X(z)\,dz
      = \int f_X(z)\log f_X(z)\,dz = -H(X).

The first inequality is implied by the nonnegativity of the Kullback–Leibler information between f_Z and f_X. The second inequality is obtained using the following result: Z \le_{st} X if and only if, for any nonincreasing function \phi, E_{f_Z}[\phi(Z)] \ge E_{f_X}[\phi(X)]; here both \log\lambda_X (since X is DFR) and \log\bar{F}_X are nonincreasing.

Corollary 2.2: Let X be a nonnegative random variable having a DFR distribution. If Y_i \le_{st} X, then Y_i \le_{e} X.

Example 2.3: It is well known that the sample minimum Y_1 is stochastically dominated by X. Thus, for a DFR distribution, Y_1 \le_{e} X. Important examples of DFR distributions are the gamma and Weibull distributions with shape parameters less than one, the Pareto distribution, and mixtures of exponential distributions. Theorem 2.2 and Corollary 2.2 apply to these and other DFR distributions. For example, for the Pareto distribution discussed in Example 2.2,

H(X) - H(Y_1) = \log n + \alpha^{-1}(1 - n^{-1}) \ge 0,   for all n \ge 1.

Theorem 2.3: Let X be a random variable and let Y_i, i = 1, ..., n, denote its order statistics.

a) If f_X(F_X^{-1}(u)) is nondecreasing in u, then H(Y_i) is decreasing in i for i > n/2.

b) If f_X(F_X^{-1}(u)) is nonincreasing in u, then H(Y_i) is increasing in i for i < n/2.

Proof:
a) Using (3), we have

H(Y_{i+1}) - H(Y_i) = -\Delta_n(i) + E_{g_i}[\log f_X(F_X^{-1}(W_i))] - E_{g_{i+1}}[\log f_X(F_X^{-1}(W_{i+1}))]

where \Delta_n(i) is defined in (6). Since order statistics are stochastically ordered, we have W_i \le_{st} W_{i+1}. Also, W_i \le_{st} W_{i+1} implies that for any nondecreasing function \phi

E_{g_i}[\phi(W_i)] \le E_{g_{i+1}}[\phi(W_{i+1})].

Thus, the difference of the two expectations is nonpositive and, by (7), -\Delta_n(i) < 0 for i > n/2. Therefore H(Y_{i+1}) - H(Y_i) \le 0 and the result follows.

b) The proof is similar to that of part a) and is omitted.

The following example gives an application of Theorem 2.3.

Example 2.4: Let X be a random variable with the uniform distribution over the unit interval. Noting that F_X(x) = x, F_X^{-1}(u) = u, and f_X(F_X^{-1}(u)) = 1, both conditions of Theorem 2.3 are satisfied. Thus, the entropy of the ith-order statistic is increasing in i for i < n/2 and is decreasing in i for i > n/2. This confirms (7).

Theorem 2.4: Let X and Z be two random variables and denote their order statistics by Y_i and V_i, i = 1, ..., n, respectively. If X \le_{d} Z, then Y_i \le_{e} V_i.

Proof: If X \le_{d} Z, then Y_i \le_{d} V_i [9], and hence Y_i \le_{e} V_i.

III. DISCRIMINATION INFORMATION

This section discusses the discrimination information between the distributions of order statistics and the data distribution, the discrimination information between the distributions of the order statistics, and the mutual information between consecutive order statistics.

A. Discrimination Between Order Statistics and Data Distribution

The Kullback–Leibler discrimination information between the distribution of the order statistics f_i and the data distribution f_X is given by

K_n(f_i : f_X) = K(g_i : U) = \int g_i(w)\log g_i(w)\,dw = -H_n(W_i)

where g_i is the beta density (2) and U is the uniform distribution. The first equality follows from U = F_X(X) being a one-to-one transformation and W_i = F_X(Y_i). Therefore, the discrimination information between the distribution of order statistics and the data distribution is distribution free and is only a function of the sample size n and the index i. As a function of i, K_n(f_i : f_X) is decreasing in i for i < n/2 and is increasing in i for i > n/2. This can be seen by noting that

K_n(f_{i+1} : f_X) - K_n(f_i : f_X) = \Delta_n(i)

where \Delta_n(i) is defined in (6). That is, the information discrepancy between the distribution of order statistics and the data distribution decreases up to the median and then increases. Thus, among the order statistics, the median has the closest distribution to the data distribution. We also note that K_n(f_i : f_X) = H(U) - H_n(W_i). That is, the discrimination information between the distribution of order statistics and the data distribution is the difference between the maximum entropy and the entropy of the beta distribution over the unit interval.

The bounds in Theorem 2.1 provide the following bounds for the sum of the entropy of order statistics H(Y_i) and the relative entropy of order statistics K_n(f_i : f_X) in terms of the entropy of the data distribution:

nB_i[H(X) + I(A)] \le H(Y_i) + K_n(f_i : f_X) \le nB_i[H(X) + I(\bar{A})]

and

-\log M \le H(Y_i) + K_n(f_i : f_X) \le nB_i H(X) + (nB_i - 1)\log M.
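The distribution-free quantity K_n(f_i : f_X) = -H_n(W_i) can be checked directly; the sketch below (not from the paper; n = 8 and the SciPy quadrature are assumptions) computes the KL divergence between Beta(i, n-i+1) and the uniform law for every i, which also exhibits the decrease-then-increase pattern in i:

```python
# Sketch (not from the paper): K_n(f_i : f_X) computed directly as the KL divergence
# between Beta(i, n-i+1) and the uniform law on (0, 1), compared with -H_n(W_i);
# n = 8 and the SciPy quadrature are assumptions.
import numpy as np
from scipy import stats
from scipy.integrate import quad

n = 8
for i in range(1, n + 1):
    g = stats.beta(i, n - i + 1).pdf
    kl, _ = quad(lambda w: g(w) * np.log(g(w)) if g(w) > 0 else 0.0, 0, 1)
    print(i, round(kl, 4), round(-stats.beta(i, n - i + 1).entropy(), 4))
```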

The next result relates the average information discrepancy between the distribution of the order statistics and the data distribution to the difference between the average entropy of order statistics and the entropy of the data distribution.

Theorem 3.1: Let

\bar{K}(f_i : f_X) = \frac{1}{n}\sum_{i=1}^{n} K_n(f_i : f_X)   and   \bar{H}(Y) = \frac{1}{n}\sum_{i=1}^{n} H(Y_i).

Then

\bar{K}(f_i : f_X) = H(X) - \bar{H}(Y) = C_n    (14)

where

C_n = -\frac{1}{n}\sum_{i=1}^{n}\log B(i, n-i+1) - \frac{n-1}{2}

is a constant.

Proof: The first equality in (14) is obtained by noting that

\sum_{i=1}^{n} K_n(f_i : f_X) = \sum_{i=1}^{n}\int f_i(y)\log\frac{f_i(y)}{f_X(y)}\,dy
= \sum_{i=1}^{n}\int f_i(y)\log f_i(y)\,dy - \sum_{i=1}^{n}\int f_i(y)\log f_X(y)\,dy
= -\sum_{i=1}^{n} H(Y_i) - \sum_{i=1}^{n}\int g_i(F_X(y)) f_X(y)\log f_X(y)\,dy
= -\sum_{i=1}^{n} H(Y_i) - \sum_{i=1}^{n}\int n q_{i-1} f_X(y)\log f_X(y)\,dy
= -\sum_{i=1}^{n} H(Y_i) + nH(X)

where q_{i-1}, i = 1, ..., n, are the probability terms of the binomial distribution Bin(n-1, p), p = F_X(y). The last equality is noted by \sum_{i=1}^{n} q_{i-1} = 1.

The second equality in (14) is obtained as follows:

\sum_{i=1}^{n} K_n(f_i : f_X) = -\sum_{i=1}^{n}\log B(i, n-i+1) + \sum_{i=1}^{n}\int (i-1) f_i(y)\log[F_X(y)]\,dy + \sum_{i=1}^{n}\int (n-i) f_i(y)\log[1 - F_X(y)]\,dy.    (15)

Now, letting j = i - 1 and U = F_X(X), we obtain

\sum_{i=1}^{n}\int (i-1) f_i(y)\log[F_X(y)]\,dy = n(n-1)\int F_X(y) f_X(y)\log[F_X(y)]\,dy = n(n-1) E_U(U\log U) = -\frac{n(n-1)}{4}.

The last equality is obtained by noting that U is uniform over the unit interval and E_U(U\log U) = [\psi(2) - \psi(3)]/2 = -1/4. The second sum and integral in (15) can be evaluated similarly. Noting that 1 - U is also uniform, we obtain -n(n-1)/4 for the second sum and integral in (15), which completes the proof.

Wong and Chen [2] proved the second equality in (14) through a tedious induction. The probability integral transformation greatly simplifies the proof. By [2, Theorem 2] we also conclude that the average information discrepancy between the distribution of the order statistics and the data distribution is increasing in the sample size n.

The discrimination information between the data distribution and the distributions of order statistics is

K_n(f_X : f_i) = \log B(i, n-i+1) + n - 1.

In this case

K_n(f_X : f_{i+1}) - K_n(f_X : f_i) = \log\frac{i}{n-i}

which is negative for i < n/2 and positive for i > n/2. The average symmetric divergence between the distribution of the order statistics and the data distribution is simply

\bar{J}(f_i; f_X) = \bar{K}(f_i : f_X) + \bar{K}(f_X : f_i) = \frac{n-1}{2}.
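The constant (14) can be verified for any parent with a known entropy; the sketch below (not from the paper; the exponential parent, n = 12, and \lambda = 3 are assumptions) uses the closed forms of (3) and (8):

```python
# Sketch (not from the paper): the constant (14) for an exponential parent, using
# the closed forms from (3) and (8); n = 12 and lam = 3 are assumptions.
import numpy as np
from scipy.special import betaln, digamma

def H_beta(i, n):                                         # H_n(W_i), eq. (5)
    return (betaln(i, n - i + 1)
            - (i - 1) * (digamma(i) - digamma(n + 1))
            - (n - i) * (digamma(n - i + 1) - digamma(n + 1)))

n, lam = 12, 3.0
H_X = 1 - np.log(lam)
H_Y = [H_beta(i, n) - np.log(lam) - digamma(n - i + 1) + digamma(n + 1)
       for i in range(1, n + 1)]
C_n = -np.mean([betaln(i, n - i + 1) for i in range(1, n + 1)]) - (n - 1) / 2

print(H_X - np.mean(H_Y), "vs C_n =", C_n)                # agree; free of lam
```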

B. Discrimination Between Order Statistics

The discrimination information between the distributions of the ith- and jth-order statistics is given by

K_n(f_i : f_j) = \log\frac{\Gamma(j)\Gamma(n-j+1)}{\Gamma(i)\Gamma(n-i+1)} + (i-j)[\psi(i) - \psi(n-i)] - \frac{i-j}{n-i}.

Consequently, we have

K_n(f_{i+1} : f_i) = \frac{1}{i} + \Delta_n(i)   and   K_n(f_i : f_{i+1}) = \frac{1}{n-i} - \Delta_n(i).    (16)

The symmetric divergence is simply

J_n(f_{i+1}; f_i) = K_n(f_{i+1} : f_i) + K_n(f_i : f_{i+1}) = \frac{1}{i} + \frac{1}{n-i} = \frac{n}{i(n-i)}.

It can be shown that all three measures are decreasing for i \le (n+1)/2 and increasing for i \ge (n+1)/2. Moreover, the symmetric divergence is symmetric in i and n-i. Therefore, the distributions of consecutive order statistics become closer to each other as they approach the median from either extreme.
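The distribution-free quantities in (16) are immediate to tabulate; the sketch below (not from the paper; n = 10 is an assumption) prints the two directed divergences, their sum, and n/(i(n-i)):

```python
# Sketch (not from the paper): the distribution-free divergences (16) between
# consecutive order statistics and their symmetric sum J_n = n / (i(n-i)); n = 10
# is an assumption.
import numpy as np
from scipy.special import digamma

def delta(i, n):                                          # Delta_n(i), eq. (6)
    return (np.log(n - i) - digamma(n - i)) - (np.log(i) - digamma(i))

n = 10
for i in range(1, n):
    k_fwd = 1 / i + delta(i, n)                           # K_n(f_{i+1} : f_i)
    k_bwd = 1 / (n - i) - delta(i, n)                     # K_n(f_i : f_{i+1})
    print(i, round(k_fwd, 4), round(k_bwd, 4),
          round(k_fwd + k_bwd, 4), round(n / (i * (n - i)), 4))
```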

Next we give a result on the discrimination information between the order statistics in two samples.

Theorem 3.2: Let X and Z be two random variables and let f_i and \ell_i denote the densities of their order statistics Y_i and V_i, i = 1, ..., n, respectively.

a) If Y_i \le_{st} Z and X \le_{lr} V_i for an i < n/2, then

K_n(f_{i+1} : \ell_{i+1}) \le K_n(f_i : \ell_i).

b) If Y_{i+1} \le_{st} Z and X \le_{lr} V_{i+1} for an i > n/2, then

K_n(f_{i+1} : \ell_{i+1}) \ge K_n(f_i : \ell_i).

Proof: a) Write

K_n(f_i : \ell_i) = K_n(f_i : f_X) + \int f_i(x)\log\frac{f_X(x)}{\ell_i(x)}\,dx

and similarly for K_n(f_{i+1} : \ell_{i+1}), so that

K_n(f_{i+1} : \ell_{i+1}) - K_n(f_i : \ell_i) = \Delta_n(i) + \int f_{i+1}(x)\log\frac{f_X(x)}{\ell_{i+1}(x)}\,dx - \int f_i(x)\log\frac{f_X(x)}{\ell_i(x)}\,dx.

The first term is negative because \Delta_n(i) < 0 for i < n/2. The remaining difference of integrals is nonpositive: the comparison of the integral against f_i with the corresponding integral against f_{i+1} follows from the facts that Y_i \le_{st} Y_{i+1} and X \le_{lr} V_i, and the comparison of the resulting term follows from the stochastic comparison of Y_{i+1} with Z together with the fact that \bar{F}_Z(x)/F_Z(x) is decreasing in x. This completes the proof.

b) The proof is similar to that of part a) and is omitted.

As an application of Theorem 3.2, consider the following example.

Example 3.1: Let X and Z be two random variables having exponential distributions with parameters \lambda_1 and \lambda_2, and denote their order statistics by Y_i and V_i, i = 1, ..., n, as before. Take \lambda_2 = n^{-1}\lambda_1; then X \le_{lr} V_1, and all the assumptions in Theorem 3.2 hold for i = 1.

C. Mutual Information Between Consecutive Order Statistics

The sequence of order statistics Y_1, ..., Y_n has a Markovian property. We can measure the degree of dependency among Y_1, ..., Y_n by the mutual information between consecutive order statistics

M_n(Y_i; Y_{i+1}) \equiv K_n(f_{i,i+1} : f_i f_{i+1}) = \int\!\!\int_{y_i < y_{i+1}} f_{i,i+1}(y_i, y_{i+1})\log\frac{f_{i,i+1}(y_i, y_{i+1})}{f_i(y_i) f_{i+1}(y_{i+1})}\,dy_i\,dy_{i+1}

where the joint density of (Y_i, Y_{i+1}), i = 1, ..., n-1, is

f_{i,i+1}(y_i, y_{i+1}) = \frac{\Gamma(n+1)}{\Gamma(n-i)\Gamma(i)} [F_X(y_i)]^{i-1} [1 - F_X(y_{i+1})]^{n-i-1} f_X(y_i) f_X(y_{i+1}),   for y_i < y_{i+1}

and f_{i,i+1}(y_i, y_{i+1}) = 0 otherwise. The next theorem gives the mutual information M_n(Y_i; Y_{i+1}) and its properties.

Theorem 3.3: Let X be a random variable with density f_X(x) and let Y_i, i = 1, ..., n, denote its order statistics.

a) The mutual information between consecutive order statistics is distribution free and is given by

M_n(Y_i; Y_{i+1}) = M_n(W_i; W_{i+1}) = -\log\binom{n}{i} + n\psi(n) - i\psi(i) - (n-i)\psi(n-i) - 1    (17)

where W_i, i = 1, ..., n, are the order statistics of the uniform distribution over the unit interval.

b) For a given n, the mutual information between consecutive order statistics is symmetric in i and n-i, increases in i for i < n/2, and decreases in i for i > n/2.

c) The mutual information M_n(Y_i; Y_{i+1}) is increasing in n.

Proof:
a) The first equality in (17) follows from the fact that the mutual information is invariant under one-to-one transformations, W_i = F_X(Y_i). The second equality is then obtained using the beta marginals (2) for i and i+1 and the joint density

g_{i,i+1}(w_i, w_{i+1}) = \frac{\Gamma(n+1)}{\Gamma(n-i)\Gamma(i)} w_i^{i-1}(1 - w_{i+1})^{n-i-1},   for 0 \le w_i < w_{i+1} \le 1

and g_{i,i+1}(w_i, w_{i+1}) = 0 otherwise.

b) The symmetry in i and n-i is clear from the expression (17). Taking the derivative with respect to i and using the recurrence formula for the digamma function give

M_n'(Y_i; Y_{i+1}) = (n-i)\psi'(n-i+1) - i\psi'(i+1).

To show that the derivative is positive for i < n/2 and negative for i > n/2, it suffices to show that z\psi'(z+1) is increasing in z \ge 1. Using the recurrence formula for the trigamma function, we have

(z+1)\psi'(z+2) = (z+1)\left[\psi'(z+1) - \frac{1}{(z+1)^2}\right] = (z+1)\psi'(z+1) - \frac{1}{z+1} \ge z\psi'(z+1).

The inequality is obtained from \psi'(z+1) \ge 1/(z+1).

c) Similarly, the derivative with respect to n is

M_n'(Y_i; Y_{i+1}) = n\psi'(n+1) - (n-i)\psi'(n-i+1) > 0.

It is well known that the order statistics are associated; that is, for any two monotone functions T_1(y_1, y_2, ..., y_n) and T_2(y_1, y_2, ..., y_n), we have COV[T_1(Y_1, Y_2, ..., Y_n), T_2(Y_1, Y_2, ..., Y_n)] \ge 0 [10]. The mutual information M_n(Y_i; Y_{i+1}) captures the extent of any form of functional dependency between the order statistics, including linear dependency. The invariance of M_n(Y_i; Y_{i+1}) under the integral transformation of the random variable X is particularly important in this context. Data from any arbitrary distribution F_X can be obtained by transforming a sample of uniform data u_1, ..., u_n by the inverse transformation x_i = F_X^{-1}(u_i), i = 1, ..., n. Thus, the mutual information function M_n(Y_i; Y_{i+1}) preserves the dependency structure of the order statistics of the uniform sample under the transformation. By part b) of Theorem 3.3, the dependency between consecutive order statistics is maximum at the median and is symmetric about the median. By part c), the extent of the dependency between consecutive order statistics increases with the sample size.
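A simulation check of (17) is straightforward because the quantity is distribution free; the sketch below (not from the paper; n = 8, i = 3, and the Monte Carlo sample size are assumptions) estimates the mutual information of consecutive uniform order statistics and compares it with the closed form:

```python
# Sketch (not from the paper): Monte Carlo check of (17).  The mutual information of
# consecutive uniform order statistics is estimated from simulated samples and
# compared with the closed form; n = 8, i = 3, and the sample size are assumptions.
import numpy as np
from math import comb, lgamma
from scipy import stats
from scipy.special import digamma

n, i = 8, 3
rng = np.random.default_rng(1)
u = np.sort(rng.uniform(size=(200_000, n)), axis=1)
wi, wj = u[:, i - 1], u[:, i]                             # (W_i, W_{i+1})

log_joint = (lgamma(n + 1) - lgamma(n - i) - lgamma(i)
             + (i - 1) * np.log(wi) + (n - i - 1) * np.log(1 - wj))
log_marg = stats.beta(i, n - i + 1).logpdf(wi) + stats.beta(i + 1, n - i).logpdf(wj)
mi_mc = np.mean(log_joint - log_marg)

mi_formula = (-np.log(comb(n, i)) + n * digamma(n)
              - i * digamma(i) - (n - i) * digamma(n - i) - 1)
print(mi_mc, "vs", mi_formula)
```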

IV. SPACINGS

The set of differences between consecutive order statistics, S_i = Y_i - Y_{i-1}, i = 2, ..., n, with S_1 = Y_1, is referred to as the sample spacings. Spacings from the uniform and exponential distributions have elegant distributional structures and are usually used as benchmarks for studying spacings [11]–[14].

If U is a random variable with a uniform distribution over the unit interval [0, 1], then S_i, i = 1, ..., n, are all identically distributed as a beta(1, n) variable W_1 with density g_1(w) shown in (2). Thus, H_n(S_i) = H_n(W_1) = 1 - \log n - 1/n, K_n(f_{S_i} : f_U) = -H_n(W_1), and K_n(f_{S_i} : f_{S_j}) = 0 for all i \ne j. For computing the mutual information between S_i and S_j, we use the joint entropy of the pairs of spacings (S_i, S_j), i \ne j, which are identically distributed and have the following bivariate density:

f_{S_i,S_j}(s_i, s_j) = n(n-1)(1 - s_i - s_j)^{n-2},   for s_i, s_j \ge 0, s_i + s_j \le 1

and f_{S_i,S_j}(s_i, s_j) = 0 otherwise. It can be shown that for all i \ne j, the joint entropy of this bivariate density is

H_n(S_i, S_j) = -\log[n(n-1)] + 2 - \frac{2}{n} - \frac{1}{n-1}.

Using M_n(S_i, S_j) = H_n(S_i) + H_n(S_j) - H_n(S_i, S_j), we find that for all i \ne j, the mutual information between any pair of spacings of the samples from the uniform distribution is given by

M_n(S_i, S_j) = \log\left(1 - \frac{1}{n}\right) + \frac{1}{n-1} = \log(1 + \rho) - \frac{\rho}{1 + \rho}

where \rho = \rho_{ij} = -n^{-1} is the correlation between the uniform spacings. Thus, as n \to \infty, the uniform spacings become less dependent as well as less correlated random variables.
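The uniform-spacings quantities are simple to confirm by simulation; the sketch below (not from the paper; n = 6 and the simulation size are assumptions) checks the correlation \rho = -1/n and evaluates M_n(S_i, S_j):

```python
# Sketch (not from the paper): uniform spacings; the mutual information
# M_n(S_i, S_j) = log(1 - 1/n) + 1/(n-1) and the correlation rho = -1/n,
# with the latter checked by simulation (n = 6 is an assumption).
import numpy as np

n = 6
rng = np.random.default_rng(2)
u = np.sort(rng.uniform(size=(200_000, n)), axis=1)
s = np.diff(u, axis=1, prepend=0.0)                       # spacings S_1, ..., S_n

print("corr(S_1, S_4):", np.corrcoef(s[:, 0], s[:, 3])[0, 1], "vs", -1 / n)
print("M_n(S_i, S_j) =", np.log(1 - 1 / n) + 1 / (n - 1))
```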

If the distribution of X is uniform over the interval [a, b], then the spacings may be represented as D_i = (b-a)S_i, i = 1, ..., n, where the S_i are the spacings of the sample from the uniform distribution over the unit interval [0, 1]. Thus,

H_n(D_i) = H_n(W_1) + \log(b-a)

and

H_n(D_i, D_j) = H_n(S_i, S_j) + 2\log(b-a)

but the discrimination information functions and the mutual information remain unchanged.

If Z has an exponential distribution with parameter \lambda, then its spacings may be represented as

S_i = \frac{1}{n-i+1} Z_i,   i = 1, ..., n

where Z_i, i = 1, ..., n, are independent and identically distributed exponential random variables with density f_Z(z) = \lambda e^{-\lambda z}. Thus,

H(S_i) = H(Z) - \log(n-i+1),   i = 1, ..., n.

That is, for the exponential samples, S_i \le_{e} Z for all i = 1, ..., n and S_i \le_{e} S_j for i < j, j = 2, ..., n. The discrimination information functions K_n(f_{S_i} : f_Z) and K_n(f_{S_i} : f_{S_j}) are free of \lambda and are easily computable. Because of the independence, M_n(S_i, S_j) = 0 for all i \ne j.

More generally, for any random variable X with failure rate \lambda_X(t), the spacings admit the following representation:

S_i = \frac{1}{(n-i+1)\lambda_X(a_i)} V_i    (18)

where V_1, ..., V_n are independent and identically distributed exponential random variables with F_V(v) = 1 - e^{-v}, and T_{i-1} \le a_i \le T_i with

T_i = \sum_{j=1}^{i} \frac{1}{n-j+1} V_j,   i = 1, ..., n;

see [12] for details. The following theorem gives a large-sample result for the entropy of spacings.

Theorem 4.1: Let X be DFR with spacings S_i, i = 1, ..., n. Then, for large n, H(S_i) is increasing in i.

Proof: For large n, the a_i may be treated as nonrandom [12]. Using representation (18), we have

H(S_i) = H(V_i) - \log(n-i+1) - \log\lambda_X(a_i).

The result is implied by the assumption that X is DFR: since \lambda_X is decreasing and the a_i are increasing in i, both -\log(n-i+1) and -\log\lambda_X(a_i) are increasing in i.

This result is applicable to large samples from gamma and Weibull distributions with shape parameters less than one, the Pareto distribution, and mixtures of exponential distributions.
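The exponential-spacings facts quoted above are easy to see in simulation; the sketch below (not from the paper; n = 7, \lambda = 2, and the simulation size are assumptions) checks the Exp((n-i+1)\lambda) marginal means and the near-zero correlation between distinct spacings:

```python
# Sketch (not from the paper): exponential spacings; S_i is distributed as
# Exp((n-i+1) lam), so H(S_i) = H(Z) - log(n-i+1), and distinct spacings are
# independent.  Means and a correlation are checked by simulation (n = 7, lam = 2).
import numpy as np

n, lam = 7, 2.0
rng = np.random.default_rng(3)
y = np.sort(rng.exponential(scale=1 / lam, size=(200_000, n)), axis=1)
s = np.diff(y, axis=1, prepend=0.0)                       # spacings S_1, ..., S_n

for i in (1, 4, 7):
    print(i, s[:, i - 1].mean(), "vs", 1 / ((n - i + 1) * lam))
print("corr(S_1, S_4):", np.corrcoef(s[:, 0], s[:, 3])[0, 1])   # approximately 0
```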

REFERENCES

[1] B. C. Arnold, N. Balakrishnan, and H. N. Nagaraja, A First Course in Order Statistics. New York: Wiley, 1992.
[2] K. M. Wong and S. Chen, "The entropy of ordered sequences and order statistics," IEEE Trans. Inform. Theory, vol. 36, pp. 276–284, Mar. 1990.
[3] J. Beirlant, E. J. Dudewicz, L. Györfi, and E. C. van der Meulen, "Nonparametric entropy estimation: An overview," Int. J. Math. Statist. Sci., vol. 6, pp. 17–39, 1997.
[4] S. Park, "The entropy of consecutive order statistics," IEEE Trans. Inform. Theory, vol. 41, pp. 2003–2007, Nov. 1995.
[5] S. Park, "Fisher information in order statistics," J. Amer. Statist. Assoc., vol. 91, pp. 385–390, 1996.
[6] D. S. Mitrinović, Analytic Inequalities. New York: Springer-Verlag, 1970.
[7] P. J. Bickel and E. L. Lehmann, "Descriptive statistics for nonparametric models. III. Dispersion," Ann. Statist., vol. 4, pp. 1139–1158, 1976.
[8] H. Oja, "On location, scale, skewness and kurtosis of univariate distributions," Scand. J. Statist., vol. 8, pp. 154–168, 1981.
[9] M. Shaked and J. G. Shanthikumar, Stochastic Orders and Their Applications. San Diego, CA: Academic, 1994.
[10] R. E. Barlow and F. Proschan, Statistical Theory of Reliability and Life Testing. Silver Spring, MD: To Begin With, 1981.
[11] A. Rényi, "On the theory of order statistics," Acta Math. Acad. Sci. Hungar., vol. 4, pp. 191–231, 1953.
[12] R. Pyke, "Spacings (with discussion)," J. Roy. Statist. Soc., Ser. B, vol. 27, pp. 395–449, 1965.
[13] Y. Shao and M. G. Hahn, "Limit theorems for the logarithm of sample spacings," Statist. Probab. Lett., vol. 24, pp. 121–132, 1995.
[14] M. Ekström, "Strong limit theorems for sums of logarithms of high order spacings," Statistics, vol. 33, pp. 153–169, 1999.