The Worst Additive Noise Under a Covariance Constraint

Suhas N. Diggavi, Member, IEEE, and Thomas M. Cover, Fellow, IEEE

Abstract—The maximum entropy noise under a p-lag autocorrelation constraint is known by Burg's theorem to be the pth-order Gauss–Markov process satisfying these constraints. The question is, what is the worst additive noise for a communication channel given these constraints? Is it the maximum entropy noise? The problem becomes one of extremizing the mutual information over all noise processes with covariances satisfying the correlation constraints R_0, ..., R_p. For high signal powers, the worst additive noise is Gauss–Markov of order p, as expected. But for low powers, the worst additive noise is Gaussian with a covariance matrix in a convex set which depends on the signal power.

Index Terms—Burg's theorem, mutual information game, worst additive noise.

I. INTRODUCTION

This correspondence treats a simple problem. What is the noisiest noise under certain constraints? There are two possible contexts in which we might ask this question. One is, what is the noisiest random process satisfying, for example, a p-lag covariance constraint, E[Z_i Z_{i+k}] = R_k, k = 0, ..., p? Thus, we ask for the maximum entropy rate for such a process. It is well known from Burg's work [1], [2] that the maximum-entropy noise process under p lag constraints is the pth-order Gauss–Markov process satisfying these constraints, i.e., it is the process that has minimal dependency on the past given the covariance constraints.

Another context in which we might ask this question is for an additive noise channel Y = X + Z, where the noise Z has covariance constraints R_0, ..., R_p and the signal X has a power constraint P. What is the worst possible additive noise subject to these constraints? We expect the answer to be the maximum-entropy noise, as in the first problem. Indeed, we find this is the case, but only when the signal power is high enough to fill the spectrum of the maximum-entropy noise (yielding a white noise sum). Consider the channel

    Y_k = X_k + Z_k                                                                 (1)

Manuscript received September 9, 1999; revised September 25, 2000. This work was supported in part by the National Science Foundation under Grant NSF CCR-9973134 and by ARMY (MURI) DAAD19-99-1-0252. The material in this correspondence was presented in part at the International Symposium on Information Theory (ISIT), Ulm, Germany, June 1997. S. N. Diggavi is with AT&T Shannon Laboratories, Florham Park, NJ 07932 USA (e-mail: [email protected]). T. M. Cover is with the Information Systems Laboratory, Stanford University, Stanford, CA 94305 USA (e-mail: [email protected]). Communicated by S. Shamai, Associate Editor for Shannon Theory. Publisher Item Identifier S 0018-9448(01)08962-3.


where X_k is the transmitted signal and Z_k is the additive noise. Transmission over additive Gaussian noise channels has been well studied over the past several decades [1]. The capacity is achieved by using Gaussian signaling and water-filling over the noise spectrum [1]. The question of communication over partially known additive noise channels is addressed in [3]–[5], where the class of memoryless noise processes with average power constraint N_0 is considered. A game-theoretic problem [3]–[5] is formulated with a mutual information payoff, where the sender maximizes mutual information and the noise minimizes it, subject to average power constraints. It has been shown that an independent and identically distributed (i.i.d.) Gaussian signaling scheme and an i.i.d. Gaussian noise distribution are robust, in that any deviation of either the signal or noise distribution reduces or increases (respectively) the mutual information. Hence, the solution to this game-theoretic problem yields a rate of (1/2) log(1 + P/N_0), where P and N_0 are the signal and noise power constraints, respectively. An excellent survey of communication under channel uncertainties is given in [6]. In [7], [8], a game-theoretic problem with Gaussian inputs transmitted over a jamming channel (having an average power constraint) is studied under a mean-squared error payoff function (for estimation/detection). The problem of worst power-constrained noise when the inputs are limited to the binary alphabet is considered in [9]. The more general M-dimensional problem with an average noise power constraint is considered in [10], where it is shown that even when the channel is not restricted to be memoryless, the white Gaussian codebook and white Gaussian noise constitute a unique saddle point. In [11], [12] (and references therein) it was shown that a Gaussian codebook and minimum Euclidean distance decoding achieve rate (1/2) log(1 + P/N_0) under an average power constraint. Therefore, for average signal and noise power constraints the maximum-entropy noise is the worst additive noise for communication.

We ask whether this principle is true in more generality. Suppose the noise is not memoryless and we have covariance constraints. If the signal is Gaussian with covariance Kx and the noise is Gaussian with covariance Kz, the mutual information I(X; X + Z) is given by

    I(X; X + Z) = (1/2) log( |Kx + Kz| / |Kz| ).

It is well known that the mutual information is maximized by choosing a signal covariance Kx that water-fills Kz [1]. The question we ask is about communication over partially known additive noise channels subject to covariance constraints. We first formulate the game-theoretic problem with mutual information as the payoff. The signal maximizes the mutual information and the noise minimizes it by choosing distributions subject to covariance constraints. Note that the problem considered is similar in formulation to the compound channel problem [13] and, therefore, is more benign than the allowed noise in arbitrarily varying channels [6], [12]. In [14], [15] the problem of a memoryless interference which is statistically dependent on the input was considered. In this correspondence, the additive noise is independent of the input but need not be memoryless.

We first show that Gaussian signaling and Gaussian noise constitute a saddle point to the mutual information game with covariance constraints. Therefore, we can restrict our attention to the solution of a determinant game with payoff (1/2) log(|Kx + Kz|/|Kz|). To solve this problem, one chooses the signal covariance Kx and noise covariance Kz to maximize and minimize (respectively) the payoff (1/2) log(|Kx + Kz|/|Kz|) subject to covariance constraints. Throughout this correspondence, we impose an expected power constraint on the signal,

    (1/n) sum_{i=1}^{n} E[X_i^2] = (1/n) tr(Kx) <= P.
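As a concrete illustration of this payoff and the water-filling step (this sketch is ours, not part of the correspondence; it assumes numpy, and the helper names and toy covariance are hypothetical), the following snippet evaluates (1/2) log(|Kx + Kz|/|Kz|) and water-fills a fixed noise covariance under the constraint tr(Kx) <= nP:

```python
import numpy as np

def gaussian_mi(Kx, Kz):
    """Evaluate (1/2) log |Kx + Kz| / |Kz|, the Gaussian mutual information (nats)."""
    return 0.5 * (np.linalg.slogdet(Kx + Kz)[1] - np.linalg.slogdet(Kz)[1])

def waterfill(Kz, P):
    """Water-fill the eigenvalues of a fixed noise covariance Kz under tr(Kx) <= n*P.
    Returns the optimal signal covariance (it shares Kz's eigenvectors)."""
    n = Kz.shape[0]
    lam, U = np.linalg.eigh(Kz)             # noise eigenvalues and eigenvectors
    total = n * P
    lo, hi = lam.min(), lam.max() + total   # bisect on the water level nu
    for _ in range(100):
        nu = 0.5 * (lo + hi)
        if np.maximum(nu - lam, 0.0).sum() > total:
            hi = nu
        else:
            lo = nu
    powers = np.maximum(nu - lam, 0.0)      # power poured into each eigendirection
    return U @ np.diag(powers) @ U.T

# toy check: an AR(1)-like noise covariance, expected power P per symbol
Kz = np.array([[1.0, 0.9, 0.81], [0.9, 1.0, 0.9], [0.81, 0.9, 1.0]])
Kx = waterfill(Kz, P=2.0)
print(np.trace(Kx), gaussian_mi(Kx, Kz))    # tr(Kx) ~= n*P = 6.0
```

Bisection on the water level is just one of several equivalent ways to satisfy the trace constraint.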


We will also assume that the noise covariance Kz lies in a given convex set 𝒦z, but the noise distribution is otherwise unspecified. For example, the set 𝒦z of covariances Kz satisfying correlation constraints R_0, ..., R_p is a convex set. Also, for some of the results in the correspondence, we assume Kz > 0 for all Kz ∈ 𝒦z, i.e., the noise processes are not degenerate.

We study the properties of the saddle points of the payoff function (1/2) log(|Kx + Kz|/|Kz|). We show that the signaling covariance matrix Kx is unique and water-fills a set of worst noise covariance matrices. The set of worst noise covariance matrices is shown to be convex, and hence the signaling scheme is protected against any mixture of noise covariances. Therefore, choosing a Gaussian signaling scheme with covariance Kx* which water-fills the class of worst covariance matrices will achieve the minimax mutual information. This establishes a single optimal strategy for the sender (Gaussian with a certain covariance matrix designed to water-fill the minimax noise) and a convex set of possible noise covariances, all of which look the same “below the water line.”

Next, we re-examine the question of whether the maximum entropy noise is the worst additive noise when we have a banded matrix constraint, specified up to a certain covariance lag, on the noise covariance matrix. In this case, we show that if we have sufficient input power, the maximum entropy noise is also the worst additive noise in the sense that it achieves the saddle point and minimizes the mutual information.

We put forth the game-theoretic problem in Section II, establish the existence of a saddle point, and also study its properties. We consider the banded noise covariance constraint in Section III. In Section IV, we show this minimax rate is actually achievable using a random Gaussian codebook and Mahalanobis distance decoding.

II. PROBLEM FORMULATION

The general problem is that of finding the maximum reliable communication rate over all noise distributions subject to covariance constraints. Throughout this section, we assume that the constraint sets 𝒦x and 𝒦z are closed, bounded, and convex. Note that we have implicitly associated with 𝒦x and 𝒦z the topology of n × n symmetric matrices, i.e., that associated with R^M, where M = n(n + 1)/2. We need to show that there exists a codebook that is simultaneously good for all such noise distributions. We first guess that this problem can be solved by solving the minimax mutual information game. Later, in Section IV, we examine a random coding scheme and a decoding rule that achieves this rate. Hence, the signal designer maximizes the mutual information and the noise (nature) minimizes it, and this is the minimax communication capacity. Therefore, we consider the minimax problem

    inf_{p_Z ∈ 𝒵}  sup_{p_X ∈ 𝒳}  I(X^{(n)}; X^{(n)} + Z^{(n)})                          (2)

where

    𝒵 = Closure{ p_Z : E[Z] = 0, Kz ∈ 𝒦z }
    𝒳 = Closure{ p_X : E[X] = 0, tr(Kx) <= nP }

and p_X, p_Z are probability measures defined on the Borel σ-algebra of R^n. The closure is defined in terms of the weak topology of probability measures on R^n [16, Sec. 2.2]. We note that if the covariance constraint sets 𝒦x, 𝒦z are closed, then the sets 𝒳, 𝒵 can be proved to be closed (without the closure operation) if the random processes are assumed to have finite fourth moments. If there exist probability measures p_X* and p_Z* such that

    I(X^{(n)}; X^{(n)} + Z^{*(n)}) <= I(X^{*(n)}; X^{*(n)} + Z^{*(n)}) <= I(X^{*(n)}; X^{*(n)} + Z^{(n)})        (3)


for all p_X ∈ 𝒳, p_Z ∈ 𝒵, where X^{*(n)} and Z^{*(n)} are distributed according to the measures p_X* and p_Z*, respectively, then (p_X*, p_Z*) is defined as a saddle point for I(X^{(n)}; X^{(n)} + Z^{(n)}), and I(X^{*(n)}; X^{*(n)} + Z^{*(n)}) is called the value of the game.

To show the existence of such a saddle point, we examine some properties of the mutual information under input and noise constraints. We first show that there exist saddle points which have Gaussian probability measures p_X and p_Z.

Lemma II.1 [1, Ch. 9]: Let Z and Z_G be random vectors in R^n with the same covariance matrix Kz. If Z_G ~ N(0, Kz) and Z has any other distribution, then

    E_Z[ log f_{Z_G}(Z) ] = E_{Z_G}[ log f_{Z_G}(Z_G) ]                                        (4)

where f_{Z_G}(·) denotes the probability density function of Z_G, and E_{Z_G}[·] and E_Z[·] denote the expectations with respect to Z_G and Z, respectively.

The following result (Lemma II.2) has been proved by Ihara [17] based on a result by Pinsker [18]. The alternative proof given below shows the condition for which equality holds. In the proof, we assume the noise has a probability density function.

Lemma II.2: Let X_G ~ N(0, Kx), and let Z and Z_G be random vectors in R^n (independent of X_G) with the same covariance matrix Kz. If Z_G ~ N(0, Kz) and Z has any other distribution with covariance Kz, then

    I(X_G; X_G + Z) >= I(X_G; X_G + Z_G).                                                       (5)
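Lemma II.1 is easy to check numerically; the following small Monte Carlo sketch is ours (not from the correspondence) and assumes numpy. It compares E[log f_{Z_G}(·)] under a Gaussian Z_G and under a non-Gaussian Z shaped to the same covariance Kz:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
Kz = np.array([[1.0, 0.9, 0.81], [0.9, 1.0, 0.9], [0.81, 0.9, 1.0]])

def log_gauss_pdf(z, K):
    """log of the N(0, K) density evaluated at each row of z."""
    d = K.shape[0]
    Kinv = np.linalg.inv(K)
    quad = np.einsum('ij,jk,ik->i', z, Kinv, z)
    return -0.5 * (d * np.log(2 * np.pi) + np.linalg.slogdet(K)[1] + quad)

A = np.linalg.cholesky(Kz)
zg = rng.standard_normal((200000, n)) @ A.T                    # Gaussian, covariance Kz
w = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(200000, n))     # zero mean, unit variance, non-Gaussian
z = w @ A.T                                                    # same covariance Kz

print(log_gauss_pdf(zg, Kz).mean())   # E_{Z_G}[log f_{Z_G}(Z_G)]
print(log_gauss_pdf(z, Kz).mean())    # E_Z[log f_{Z_G}(Z)] -- agrees to Monte Carlo error
```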

If Kx > 0, then equality is achieved iff Z ~ N(0, Kz).

Proof: Let Y = X_G + Z and Y_G = X_G + Z_G. Then Y_G ~ N(0, Kx + Kz), and Y, Y_G have the same covariance matrix Kx + Kz. We assume the existence of probability density functions for Y and Z and denote them by f_Y(·) and f_Z(·), respectively. The Gaussian density functions of Y_G and Z_G are denoted by f_{Y_G}(·) and f_{Z_G}(·), respectively. We have

    I(X_G; X_G + Z_G) − I(X_G; X_G + Z)
        = h(Y_G) − h(Z_G) − h(Y) + h(Z)
        = − ∫ log(f_{Y_G}(y)) f_{Y_G}(y) dy + ∫ log(f_Y(y)) f_Y(y) dy
          + ∫ log(f_{Z_G}(z)) f_{Z_G}(z) dz − ∫ log(f_Z(z)) f_Z(z) dz
    (a) = ∫ log( f_Y(y)/f_{Y_G}(y) ) f_Y(y) dy + ∫ log( f_{Z_G}(z)/f_Z(z) ) f_Z(z) dz
        = D(Y || Y_G) − D(Z || Z_G)
        = ∫∫ log( f_Y(y) f_{Z_G}(z) / ( f_{Y_G}(y) f_Z(z) ) ) f_{Y,Z}(y, z) dy dz
    (b) <= log ∫∫ ( f_Y(y) f_{Z_G}(z) / ( f_{Y_G}(y) f_Z(z) ) ) f_{Y,Z}(y, z) dy dz
    (c)  = log ∫ ( f_Y(y)/f_{Y_G}(y) ) [ ∫ f_{X_G}(y − z) f_{Z_G}(z) dz ] dy
    (d)  = log ∫ ( f_Y(y)/f_{Y_G}(y) ) f_{Y_G}(y) dy = log 1 = 0

where (a) follows from Lemma II.1, (b) follows from Jensen's inequality, (c) follows from

    f_{Y|Z}(y | z) = f_{Y,Z}(y, z) / f_Z(z) = f_{X_G}(y − z)

and (d) follows from

    f_{Y_G}(y) = ∫ f_{X_G}(y − z) f_{Z_G}(z) dz.

The equality in (b) (Jensen's inequality) is achieved iff

    f_Y(y) f_{Z_G}(z) / ( f_{Y_G}(y) f_Z(z) ) = 1,    for y, z such that f_{Y,Z}(y, z) = f_{X_G}(y − z) f_Z(z) > 0.     (6)

If Kx > 0, then the support set of X_G, Y, and Y_G is R^n and, thus, (6) is true for all y ∈ R^n and z in the support set of Z. Therefore, we can write, for any z in the support set of Z,

    f_{Z_G}(z) ∫ f_Y(y) dy = f_Z(z) ∫ f_{Y_G}(y) dy                                              (7)

and so f_{Z_G}(z) = f_Z(z) for all z in the support set of Z, as

    ∫ f_{Y_G}(y) dy = ∫ f_Y(y) dy = 1.

Therefore, to achieve equality in (b) we need Z ~ N(0, Kz) and, therefore, Y ~ N(0, Kx + Kz).

Using Lemma II.2 we examine the properties of the original minimax problem.

Theorem II.1: Consider the channel Y_i = X_i + Z_i for i = 1, ..., n, and impose the constraints p_X ∈ 𝒳 and p_Z ∈ 𝒵. Then there exists a pair (p_X*, p_Z*) (probability measures on R^n) which is a saddle point for the payoff function I(X^{(n)}; X^{(n)} + Z^{(n)}). Moreover, the pair (p_{X_G}*, p_{Z_G}*) is also a saddle point, where p_{X_G}*, p_{Z_G}* are Gaussian distributions with the same covariances as p_X*, p_Z*, respectively. All saddle points have the same payoff value

    V := min_{p_Z ∈ 𝒵} max_{p_X ∈ 𝒳} I(X^{(n)}; X^{(n)} + Z^{(n)}).

If Kz > 0 for all Kz ∈ 𝒦z, then all saddle points are of the form (p_X*, p_Z*), where the saddle-point distribution p_X* is Gaussian and is unique.

Proof: We first argue that the set 𝒳 of all probability measures having covariance matrices in 𝒦x is convex. If p_X^{(1)}, p_X^{(2)} are two probability measures with covariances Kx^{(1)}, Kx^{(2)} ∈ 𝒦x, then the covariance of λ p_X^{(1)} + (1 − λ) p_X^{(2)}, λ ∈ [0, 1], is also in 𝒦x, by the convexity of 𝒦x. Thus, 𝒳 is convex. The same argument is true for the noise probability measure. The mutual information I(X^{(n)}; X^{(n)} + Z^{(n)}) is concave in p_X and convex in p_Z [1], and the constraint sets on the probability measures are closed, convex, and bounded. Hence, using the fundamental theorem of game theory [19], we know that there exists a saddle point (p_X*, p_Z*).

Let X_G^{(n)}, Z_G^{(n)} be Gaussian random vectors in R^n having the same covariances as p_X*, p_Z*, respectively. Furthermore, let X^{*(n)}, Z^{*(n)} be random vectors in R^n having probability measures p_X*, p_Z*, respectively. Then

    I(X^{*(n)}; X^{*(n)} + Z_G^{(n)}) = h(X^{*(n)} + Z_G^{(n)}) − h(X^{*(n)} + Z_G^{(n)} | X^{*(n)})
                                      = h(X^{*(n)} + Z_G^{(n)}) − h(Z_G^{(n)})
                                     <= h(X_G^{(n)} + Z_G^{(n)}) − h(Z_G^{(n)})
                                      = I(X_G^{(n)}; X_G^{(n)} + Z_G^{(n)})                       (8)


where the inequality follows from the fact that the Gaussian distribution maximizes the entropy for a given covariance (see [1, Ch. 9]). Similarly, from Lemma II.2 we have

    I(X_G^{(n)}; X_G^{(n)} + Z_G^{(n)}) <= I(X_G^{(n)}; X_G^{(n)} + Z^{*(n)})                     (9)

for any distribution on Z^{*(n)}, where Z_G^{(n)} ~ N(0, Kz) and Kz is the covariance matrix of Z^{*(n)}. Hence, we have shown that

    I(X^{*(n)}; X^{*(n)} + Z_G^{(n)}) <= I(X_G^{(n)}; X_G^{(n)} + Z_G^{(n)}) <= I(X_G^{(n)}; X_G^{(n)} + Z^{*(n)}).       (10)

But, as we know that (p_X*, p_Z*) is a saddle point, we have the following double inequality:

    I(X_G^{(n)}; X_G^{(n)} + Z^{*(n)}) <= I(X^{*(n)}; X^{*(n)} + Z^{*(n)}) <= I(X^{*(n)}; X^{*(n)} + Z_G^{(n)}).          (11)

Combining (10) and (11), all of these quantities are equal and, hence, we have

    I(X^{*(n)}; X^{*(n)} + Z^{*(n)}) = I(X_G^{(n)}; X_G^{(n)} + Z_G^{(n)}) = min_{p_Z} max_{p_X} I(X^{(n)}; X^{(n)} + Z^{(n)}) = V.   (12)

Thus, (p_{X_G}*, p_{Z_G}*) is also a saddle point. This also shows an interchangeability property, i.e., if (p_X^{(1)}, p_Z^{(1)}) and (p_X^{(2)}, p_Z^{(2)}) are saddle points, then (p_X^{(1)}, p_Z^{(2)}) and (p_X^{(2)}, p_Z^{(1)}) are also saddle points.

Let Z_G^{*(n)} ~ p_{Z_G}* be one of the Gaussian noise saddle points. If p̄_X = λ p_X* + λ̄ p_{X_G}*, then, by the concavity of the mutual information, we observe

    V >= I(X^{(n)}; X^{(n)} + Z_G^{*(n)}) >= λ I(X^{*(n)}; X^{*(n)} + Z_G^{*(n)}) + λ̄ I(X_G^{*(n)}; X_G^{*(n)} + Z_G^{*(n)}) = V      (13)

where X^{(n)} ~ p̄_X, X^{*(n)} ~ p_X*, X_G^{*(n)} ~ p_{X_G}*. Hence

    h(Y^{(n)}) = h(Y^{*(n)}) = h(Y_G^{*(n)})

where Y^{(n)} = X^{(n)} + Z_G^{*(n)}, Y^{*(n)} = X^{*(n)} + Z_G^{*(n)}, and Y_G^{*(n)} = X_G^{*(n)} + Z_G^{*(n)}. If Kz* > 0 then h(Y_G^{*(n)}) < ∞ and the entropy h(Y^{(n)}) is strictly concave in p_Y, and so we have Y^{*(n)} ~ N(0, Kx* + Kz*). Therefore, we have

    Ψ_{Y*}(ω) = Ψ_{X*}(ω) Ψ_{Z_G*}(ω) = Ψ_{X_G*}(ω) Ψ_{Z_G*}(ω) = Ψ_{Y_G*}(ω)                      (14)

where Ψ_{Y*}(ω) is the characteristic function of Y^{*(n)} and Ψ_{Z_G*}(ω) = exp(−(1/2) ω^T Kz* ω). Hence, as Ψ_{Z_G*}(ω) is nonzero for all ω, we conclude that p_X* = p_{X_G}*, and that p_X* is unique.

It is well known from convex analysis [20] that the set of minimizing arguments of a convex function is a convex set. In the next result, we use this to show that the set of worst noise distributions is a convex set.

Corollary II.1: Let X_G^{*(n)} ~ p_{X_G}* have the Gaussian input saddle-point distribution. Then the set of worst noise distributions

    𝒵* = { p_z* ∈ 𝒵 : p_z* = argmin_{p_Z} I(X_G^{*(n)}; X_G^{*(n)} + Z^{(n)}) }

is a convex set.

Proof: From Theorem II.1, we already know that the saddle points are of the form (p_{X_G}*, p_Z*), where p_{X_G}* is unique. Let (p_{X_G}*, p_z^{(1)}) and (p_{X_G}*, p_z^{(2)}) be two saddle points, and λ ∈ [0, 1]. Then

    V <= I(X_G^{*(n)}; X_G^{*(n)} + Z^{(n)})
      <= λ I(X_G^{*(n)}; X_G^{*(n)} + Z_1^{(n)}) + (1 − λ) I(X_G^{*(n)}; X_G^{*(n)} + Z_2^{(n)}) = V         (15)

where X_G^{*(n)} ~ p_{X_G}*, Z_1^{(n)} ~ p_z^{(1)}, Z_2^{(n)} ~ p_z^{(2)}, Z^{(n)} ~ λ p_z^{(1)} + (1 − λ) p_z^{(2)}, and V is the value of the game as defined in Theorem II.1. The second inequality is due to the convexity of I(X^{(n)}; X^{(n)} + Z^{(n)}) in p_Z [1]. Thus, the inequality in (15) is satisfied with equality. Hence, (p_{X_G}*, λ p_z^{(1)} + (1 − λ) p_z^{(2)}) is also a saddle point and, therefore, 𝒵* is a convex set. Moreover, this also implies that the set of worst covariance matrices

    𝒦z* = { Kz : Kz = argmin_{Kz ∈ 𝒦z} (1/2) log( |Kx* + Kz| / |Kz| ) }

is a convex set.

We have shown that the saddle points are of the form (p_{X_G}*, p_Z*), and that (p_{X_G}*, p_{Z_G}*) is also a saddle point, where p_{Z_G}* is Gaussian with the same covariance as p_Z*. We can make the following observation on the noise saddle-point distributions p_Z*.

Let rank(Kx) = ν <= n, and let the eigendecomposition of Kx be Kx = U^T Λ_x U, where Λ_x = diag(λ_1, ..., λ_ν, 0, ..., 0). Hence, we can write

    Ỹ = U Y^{(n)},    X̃ = U X^{(n)},    Z̃ = U Z^{(n)}
    X̃ = [X̃_1^T, X̃_2^T]^T,    Ỹ = [Ỹ_1^T, Ỹ_2^T]^T,    Z̃ = [Z̃_1^T, Z̃_2^T]^T                       (16)

where X̃_1, X̃_2 are of dimension ν and n − ν, respectively. The vectors Ỹ_1, Ỹ_2, Z̃_1, Z̃_2 are defined similarly. The following proposition has been contributed by A. Lapidoth.

Proposition II.1: The noise saddle-point distribution p_z* is such that Z̃_1 − C Z̃_2 has a full-rank Gaussian distribution, where

    C = E[Z̃_1 Z̃_2^T] { E[Z̃_2 Z̃_2^T] }^{-1}.

Note: This means that the estimation error of the best linear estimate of Z̃_1 from Z̃_2 is full-rank Gaussian.

Proof: Let Y^{(n)} = X_G^{(n)} + Z^{(n)}, where X_G^{(n)} has the Gaussian input saddle-point distribution (see Theorem II.1). We define X̃* = U X_G^{(n)}, and the notation from (16) is used. If Kx is not full-rank, i.e., ν < n, then X̃_2* = 0 almost surely (a.s.), and Ỹ_2 = Z̃_2 a.s. Let

    C = E[Z̃_1 Z̃_2^T] { E[Z̃_2 Z̃_2^T] }^{-1};

then we have the following:

    I(X_G^{(n)}; Y^{(n)}) = I(X̃*; Ỹ) = I(X̃_1*; Ỹ)
                         >= I(X̃_1*; Ỹ_1 − C Ỹ_2)
                          = I(X̃_1*; Ỹ_1 − C Z̃_2)
                          = I(X̃_1*; X̃_1* + Z̃_1 − C Z̃_2)
                    (a)  >= (1/2) log( |K_{X̃_1} + K_{Z̃_1 − C Z̃_2}| / |K_{Z̃_1 − C Z̃_2}| )
                    (b)   = (1/2) log( |Kx* + Kz| / |Kz| )                                          (17)


~ 3T ] > 0, ~ 3X where (a) is due to Lemma II.2. Moreover, as KX~ = [X 1 1 ~ 2 is ~ using Lemma II.2 we know that equality is achieved iff Z 1 0 C Z Gaussian. Now, to show (b), we use the determinant relationship of block matrices using the Schur complement (defined as A0BD01 B T ) [21] def

j jj 0 BD0 BT j:

A BT

B = D A D

1

(18)

Next we examine the properties of the function g (Kx ; Kz ). In particular, we show that 12 log jKjK+Kj j is convex in Kz and concave in Kx . Lemma II.3: The function log( jKjK+Kj j ) is convex in Kz , with strict convexity if Kx > 0. Proof: Consider Y = X + Z  and let X  N (0; KX ), and let  be independent of X and be distributed as =

Using (18) and KZ~

0CZ~ =

~ 1Z ~T Z 1

0

~ 1Z ~T Z 2

01

~ 2Z ~T Z 2

~ 2Z ~T Z 1

(19)

~ 1Z ~T Z 1

1 =

~ 2Z ~T Z

K 3 + Kz =

~ 2Z ~T Z

x

2

2

KZ~

~ 2Z ~T Z 2

~ 1Z ~T Z 1

01

~ 2Z ~T Z 2 ~

+ KZ ~

Now, since I (X ; ) = 0 and I (X ; jY )

 0, we have

I (X ; Y )

2

~ 2Z ~T Z 1

(25)

However, (20)

jKx + Kz j jKz j

(24)

j  I (X ; Y ):

~ 1Z ~T Z

Now, this does not completely answer the question of whether all saddle points to this problem are Gaussian. The problem arises primarily because the mutual information is not necessarily a strictly convex function of pZ and, therefore, the noise saddle-point distri3 need not be unique. However, using Theorem II.1, which bution pZ shows the existence of Gaussian saddle points, and Proposition II.1 we believe that it is worthwhile to focus our attention on the Gaussian mutual information game defined as follows. The Gaussian mutual information game is defined with payoff 1

(23)

j j

which completes the proof for (b) in (17). Therefore, equality is ~ 1 0 CZ ~ 2 has a full-rank Gaussian distribution. achieved in (17) iff Z

def

if  = 1 if  = 2.

= I (X ; Y ) + I (X ;  Y ):

0CZ~

(n) (n) (n) g (Kx ; Kz ) = I X G ; X G + Z G = log 2

(22)

I (X ; Y ; ) = I (X ; ) + I (X ; Y )

~ 2Z ~T Z 1

0

w.p. 

Consider the two expansions

0CZ~

KX ~ +

jKX

~ 1Z ~T Z 2

01

~ 2Z ~T Z 2

1 =

0

2;

Z 1; Z 2;

Z = ~ 2Z ~T Z 2

w.p. 

where  = 1 0 . Let Z 1  N (0; KZ ), Z 2  N (0; KZ ) (mutually independent and independent of X ), and let us define

we obtain, Kz =

1;

j

j j

j

j

j

j

j

j

j

:

(26) From Lemma II.2, we have I (X ; X + Z )

where Z G we have  log

zj  I (X ; X + Z G ) = 21 log jKxjK+z K j

 N (0; Kz ) and Kz = Kz

jKx + Kz j jKz j

+  log

(21)

where we have constrained X (n) and Z (n) to be Gaussian with covariances Kx 2 Kx and Kz 2 Kz . Note that all saddle points have the same value and hence the Gaussian saddle points yield the minimax rate. Later, we will examine a sufficient condition under which the saddle point is indeed unique. Note that as all saddle-point covariances are characterized by 3 3 (Kx ; Kz ), Kz 2 Kz . For example, if the input covariance constraint is an average power constraint, Kx3 must water-fill all the covariances in Kz3 . From Corollary II.1, if the noise player chooses to use a mixture of covariances in Kz3 it does not gain, since the signal covariance Kx3 is already water-filling any convex combination of fKz g 2 Kz3 . Moreover, the noise cannot further reduce the mutual information by using any other distribution in Z 3 . In [22], [23], a problem with vector (parallel channels) inputs and outputs with power constraints on the signal and noise was considered. In our problem, the transmitter does not know the noise covariance matrix and cannot use this information to form parallel channels. Moreover, the constraints on the processes are more general than power constraints (or trace constraints on the covariance matrix).

j

I (X ; Y ) = I (X ; Y  = 0) + I (X ; Y  = 1) 1 1 Kx + Kz Kx + Kz =  log +  log 2 Kz 2 Kz

+ Kz

(27)

. Using (25)–(27)

jKx + Kz j jKz j zj  log jKxjK+z K j

(28)

which gives the desired result. Note that if Kx > 0, the inequality in (27) is strict, by Lemma II.2, implying strict convexity. The following lemma [24] has an information-theoretic proof in [25].

Lemma II.4: If Kz > 0, the function log( jKjK+Kj j ) is strictly concave in Kx . We now prove sufficient conditions under which the saddle point to the mutual information game is unique. Lemma II.5: If there exists a saddle point (Kx3 ; Kz3 ) of g (Kx ; Kz ), such that Kx3 > 0, then the saddle point (px3 ; pz3 ) for the mutual information game is unique and Gaussian with covariances Kx3 ; Kz3 , respectively. Proof: From Lemma II.2

3(n) 3(n) I (X G ; X G + Z (n) )

 I (X G3 n ; X G3 n + Z 3G n ) and as Kx3 > 0, equality is achieved iff Z n  N (0; Kz3 ). Now, let 3 zj Kz3 = Kz : Kz = argmin 12 log jKxjK+z K : j K 2K ( )

( )

( )

( )

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 47, NO. 7, NOVEMBER 2001

Now, since g (Kx3 ; Kz ) is strictly convex for Kx3 > 0 (from Lemma II.3), we see that the above minimization has a unique minimum. Thus, pz3 = argminp I (X G3(n) ; X G3(n) +Z (n) )g is unique and Gaussian. This result also helps us make observations on the set of noise saddlepoint distributions for the case when Kx3 is not strictly positive-definite. Here we use the notation of Proposition II.1 and (17). If rank (Kx ) =  ~ 3; X ~3 +Z ~ 1 0 CZ ~ 2 ) we see that for the 0: Using Lemma II.5 on I (X 1 1 ~ ~ Z Z noise saddle-point distribution, ( 1 0 C 2 ) has to be Gaussian with a unique covariance. Therefore, we can observe that the saddle-point distributions are such that the Schur complement of the noise covariance matrix, projected onto the signal covariance eigendirections, is a constant. More precisely, the set of noise saddle-point distributions is ~ 1 0 CZ ~ 2 has a full-rank Gaussian distribuconvex and such that the Z ~ T ] 0 [Z ~ T ]f [Z ~ T ]g01 [Z ~T ] ~ 1Z ~ 1Z ~ 2Z ~ 2Z tion with a covariance [Z 1 2 2 1 which is constant over the set. We know [3] that for average signal and noise power, the pair (Kx = PII ; Kz = N0I ) is a saddle point. The result in Lemma II.5 shows that the saddle point is unique [10]. In the next section, we find the worst additive noise for a banded covariance constraint.
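A quick numerical sanity check of Lemmas II.3 and II.4 (this sketch is ours, not from the correspondence; it assumes numpy, and the random test matrices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

def g(Kx, Kz):
    """Payoff of the Gaussian determinant game, (1/2) log |Kx + Kz| / |Kz|."""
    return 0.5 * (np.linalg.slogdet(Kx + Kz)[1] - np.linalg.slogdet(Kz)[1])

def random_psd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T + 0.1 * np.eye(n)

n, lam = 4, 0.3
Kx = random_psd(n)
Kz1, Kz2 = random_psd(n), random_psd(n)
# Lemma II.3 (convexity in Kz): g(Kx, mixture) <= mixture of g's
print(g(Kx, lam * Kz1 + (1 - lam) * Kz2) <= lam * g(Kx, Kz1) + (1 - lam) * g(Kx, Kz2) + 1e-12)

Kx1, Kx2 = random_psd(n), random_psd(n)
# Lemma II.4 (concavity in Kx): g(mixture, Kz) >= mixture of g's
print(g(lam * Kx1 + (1 - lam) * Kx2, Kz1) >= lam * g(Kx1, Kz1) + (1 - lam) * g(Kx2, Kz1) - 1e-12)
```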


III. BANDED COVARIANCE CONSTRAINT

In this section, we constrain the noise distribution to have a banded covariance matrix. Here we assume that we know the noise covariance lags up to the pth lag, as given by

    E[Z_i Z_{i+k}] = R_k,    k = 0, ..., p,  for all i.                                              (29)

The noise is assumed to have zero mean. Now, as the transmitter knows only partial information about the noise spectrum, the question is what input spectrum solves the mutual information game defined in (2). In this section, we consider noise distributions 𝒵 = { p(z) : E[Z] = 0, Kz ∈ 𝒦z }, where

    𝒦z = { Kz : (Kz)_{i,j} = R_{|i−j|}, (i, j) ∈ S }

and

    S = { (i, j) : |i − j| <= p, i, j = 1, 2, ... }

specifies the constraints on the correlation lags. Let the covariance matrix Kz** be the maximum entropy covariance in 𝒦z (specified by Burg's theorem [2]). The maximum entropy noise is a Gauss–Markov process with covariance lags satisfying the Yule–Walker equations [1, pp. 274–277]. Clearly, we can use a signal design which water-fills the eigenvalues of the maximum-entropy extension Kz**. Let us define this input covariance matrix to be Kx**. We now show that the maximum-entropy extension Kz** is the worst additive noise when we have

    min_{Kz ∈ 𝒦z} max_{Kx ∈ 𝒦x} (1/2) log( |Kx + Kz| / |Kz| ) = min_{Kz ∈ 𝒦z} [ (1/2) log |ν I| − (1/2) log |Kz| ]     (30)

for appropriate ν, which is true if the input power is high enough so that, for all Kz ∈ 𝒦z, Kx^o + Kz = ν I, where Kx^o water-fills Kz. Now ν = P + Σ_i λ_i / n, where {λ_i} are the eigenvalues of Kz. Thus, the minimax problem becomes

    min_{Kz ∈ 𝒦z} [ (1/2) log |(P + Σ_i λ_i / n) I| − (1/2) log |Kz| ].                               (31)

But Σ_i λ_i / n = R_0 is specified in (29), so the minimum in (31) is achieved by maximizing max_{Kz ∈ 𝒦z} (1/2) log |Kz|. However, for this condition, we need the power P to be large. We examine the implication of this high-power requirement. Notice that we need ν > max_i λ_i for (30) to be true. Therefore, we need P > max_i λ_i − R_0 for the naive high-power requirement. This might require a power growing linearly with block size. In Theorem III.1, we show that this requirement is too pessimistic and that the worst additive noise is the maximum-entropy noise for a bounded input power requirement. To show this, we recall two useful facts.

Fact III.1: d log|X| / dX = X^{-1}, for X = X^T > 0.

Fact III.2: For the maximum-entropy completion of the noise specified in (29), the covariance matrix Kz** satisfies (Kz**^{-1})_{i,j} = 0 for (i, j) ∉ S, as shown, for example, in [1].

Now, using these facts we will show that the maximum-entropy extension (Kz**) of the noise and the corresponding signal water-filling covariance matrix (Kx**) do, indeed, form a saddle point for the game defined in (2) for sufficiently high input power.

Theorem III.1: Let Y_i = X_i + Z_i for i = 1, ..., n, and let {Z_i} be a noise process satisfying the constraints given in (29). Let {X_i} satisfy the expected power constraint P. If Kx** > 0, we have

    I(X^{(n)}; X^{(n)} + Z_G^{**(n)}) <= I(X_G^{**(n)}; X_G^{**(n)} + Z_G^{**(n)}) <= I(X_G^{**(n)}; X_G^{**(n)} + Z^{(n)})   (32)

for all p_X ∈ 𝒳, p_Z ∈ 𝒵, where X_G^{**(n)} ~ N(0, Kx**), Z_G^{**(n)} ~ N(0, Kz**), Kz** is the maximum-entropy extension of the noise, and Kx** is the corresponding water-filling signal covariance matrix.

Proof: The first inequality is easy to show from the water-filling argument. For the second inequality, we again use Lemma II.2 to reduce consideration to only Gaussian noise processes. Therefore, the problem reduces to

    min_{Kz} (1/2) log( |Kx** + Kz| / |Kz| )    such that    E[Z_i Z_{i+k}] = R_k,  k = 0, ..., p,  for all i.     (33)

This is again a convex minimization problem over a convex set and, as Kx** > 0, (1/2) log(|Kx** + Kz|/|Kz|) is a strictly convex functional (Lemma II.3), so it has a unique solution [26]. It remains to show that Kz** satisfies the necessary and sufficient conditions for optimality [26]. Setting up the Lagrangian, we have

    L = (1/2) log(|Kx** + Kz|) − (1/2) log(|Kz|) + Σ_{(i,j) ∈ S} β_{i,j} (Kz)_{i,j}.                   (34)

Now, differentiating with respect to Kz and using Fact III.1, we obtain

    dL/dKz = (Kx** + Kz)^{-1} − (Kz)^{-1} + A                                                          (35)

where A is a banded matrix such that (A)_{i,j} = 0 for (i, j) ∉ S. Note that, from Fact III.2, we have (Kz**^{-1})_{i,j} = 0 for (i, j) ∉ S. Hence, it is clear that Kz** satisfies the necessary and sufficient conditions for optimality, since Kx** + Kz** = ν I for some constant ν. This is true as Kx** is the water-filling solution to Kz**. Clearly, from this it follows that Kz** is the minimizing solution. Note that from Lemma II.5, as Kx** > 0, this constitutes a unique saddle point to the problem.

To see what the power requirement is for Kx** > 0 and Theorem III.1 to hold, we see that the power should be large enough so that we can "completely" water-fill the maximum-entropy extension. The power needed for this is bounded, as we now argue. For the maximum-entropy completion, the noise covariance matrix is Toeplitz [1] and, therefore, asymptotically the density of the eigenvalues on the real line tends to the power spectrum of the maximum entropy stochastic process [1].
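The following minimal sketch (ours; it assumes numpy, and the function name is hypothetical) carries out the maximum-entropy completion used above: it solves the Yule–Walker equations for the AR(p) model, extends the lags by the AR recursion, and checks Fact III.2, namely that the inverse of the completed covariance vanishes outside the band:

```python
import numpy as np

def max_entropy_extension(R, n):
    """Burg maximum-entropy completion: extend the lags R_0..R_p to an n x n
    Toeplitz covariance by running the AR(p) (Yule-Walker) recursion."""
    p = len(R) - 1
    # Yule-Walker: R_m + sum_k a_k R_{m-k} = 0, m = 1..p  (noise model 1 + sum_k a_k z^{-k})
    T = [[R[abs(m - (k + 1))] for k in range(p)] for m in range(1, p + 1)]
    a = np.linalg.solve(T, -np.array(R[1:p + 1]))
    r = list(R)
    while len(r) < n:
        r.append(-sum(a[k] * r[-1 - k] for k in range(p)))   # extend by the AR recursion
    return np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])

Kzz = max_entropy_extension([1.0, 0.9], n=5)
print(Kzz[0])                                 # 1, 0.9, 0.81, 0.729, 0.6561
Kinv = np.linalg.inv(Kzz)
print(np.max(np.abs(np.triu(Kinv, k=2))))     # ~0: inverse is banded beyond lag p = 1 (Fact III.2)
```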


Hence, the condition on the power spectral density of the input process for "completely" water-filling the maximum-entropy process is that

    ν − N_ME(f) > 0,    for all f ∈ [−1/2, 1/2)

where ν = P + ∫_{−1/2}^{1/2} N_ME(f) df, and the maximum entropy noise spectral density is

    N_ME(f) = σ² / | 1 + Σ_{k=1}^{p} a_k exp(−j 2π f k) |²

where a_1, ..., a_p, σ² satisfy the Yule–Walker equations [1, pp. 274–277]. If the maximum entropy process is stable (i.e., the noise spectral density does not have poles on the unit circle), then the input power needed for the above condition is finite, as sup_{f ∈ [−1/2, 1/2)} N_ME(f) < ∞. If the banded constraint is not degenerate then the Yule–Walker equations are not degenerate, i.e., we do not have a completely predictable process. Hence, the maximum-entropy completion (for the given banded constraint) cannot be unstable (or critically stable), completing the argument. Now, as we have chosen Kx** > 0, we have a strictly convex minimization problem for Kz, and we establish the result.

Example: This example shows how the maximum entropy noise and the worst additive noise might differ. Let E[Z_i²] = 1 and E[Z_i Z_{i+1}] = 0.9. Thus,

    𝒦z = { Kz : Kz = [ 1    0.9   ?  ]
                      [ 0.9  1    0.9 ]
                      [ ?    0.9  1   ] }                                                              (36)

and the maximum-entropy completion is

    Kz** = [ 1     0.9   0.81 ]
           [ 0.9   1     0.9  ]  = λ_1 v_1 v_1^T + λ_2 v_2 v_2^T + λ_3 v_3 v_3^T                        (37)
           [ 0.81  0.9   1    ]

where λ_1 = 2.7406, λ_2 = 0.19, λ_3 = 0.0693 are the eigenvalues of Kz** and v_1, v_2, v_3 are the associated eigenvectors. If the power is large enough to water-fill Kz** (i.e., tr(Kx) > 5.22), then the conditions needed for Theorem III.1 are satisfied and the maximum-entropy completion Kz** is indeed the worst noise.

We now consider the power constraint tr(Kx) <= 0.1. Here the input power is insufficient to water-fill the maximum-entropy completion. We find the saddle point (Kx*, Kz*) by numerically solving

    max_{Kx ∈ 𝒦x} min_{Kz ∈ 𝒦z} (1/2) log( |Kx + Kz| / |Kz| ).

The covariance Kz* of the worst additive noise is then given by

    Kz* = [ 1      0.9    0.873 ]
          [ 0.9    1      0.9   ]  = λ_1* v_1 v_1^T + λ_2* v_2 v_2^T + λ_3* v_3 v_3^T                    (38)
          [ 0.873  0.9    1     ]

where λ_1* = 0.091, λ_2* = 0.127, λ_3* = 2.782 are the eigenvalues of Kz* and v_1, v_2, v_3 are the associated eigenvectors. The optimal transmitter covariance matrix Kx* is of rank 2, given by

    Kx* = [  0.0275  −0.0228  −0.0045 ]
          [ −0.0228   0.0450  −0.0228 ]                                                                  (39)
          [ −0.0045  −0.0228   0.0275 ]

and

    max_{tr(Kx) <= 0.1} min_{Kz ∈ 𝒦z} (1/2) log( |Kx + Kz| / |Kz| ) = (1/2) log( |Kx* + Kz*| / |Kz*| ) = 0.3915 nats.     (40)

Thus, for this low signal power example, the worst additive noise is N(0, Kz*), which differs from the N(0, Kz**) maximum-entropy noise. Note that if the transmitter uses the minimax distribution N(0, Kx*), but nature deviates from the noise distribution N(0, Kz*) by using the maximum-entropy noise N(0, Kz**), the transmission rate increases to

    (1/2) log( |Kx* + Kz**| / |Kz**| ) = 0.4196 nats.

Thus, deviation by the noise player is strictly punished, and the maximum-entropy noise is seen to be strictly suboptimal for low power. Note that when we have low signal power, the optimal Kx* does not have full rank. In general (for a larger number of dimensions n), there could be a convex set of noise covariance matrices whose projections on the range space of Kx* are identical but which could differ in the null space of Kx* (still satisfying the covariance constraints). Thus, the set of worst noise covariance matrices is convex and looks the same in the range space of Kx* (or "below the water line").
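The high-power numbers in the example can be checked directly (this sketch is ours, assuming numpy): the eigenvalues of the maximum-entropy completion Kz** and the power needed to water-fill it completely.

```python
import numpy as np

# Maximum-entropy completion of the example: R_0 = 1, R_1 = 0.9
Kzz = np.array([[1.0, 0.9, 0.81],
                [0.9, 1.0, 0.9 ],
                [0.81, 0.9, 1.0]])
lam = np.sort(np.linalg.eigvalsh(Kzz))[::-1]
print(lam)                      # ~ [2.7406, 0.19, 0.0693], as in (37)

# Power needed to water-fill Kzz completely: raise the water level to the largest eigenvalue
print(3 * lam[0] - lam.sum())   # ~ 5.22, matching the tr(Kx) > 5.22 condition
```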

IV. DECODING SCHEME

It is difficult for the receiver to form a maximum-likelihood detection scheme for all noise distributions. Therefore, we propose using a simpler detection scheme based on a Gaussian metric and the second-order moments. However, as this is not the optimal metric, it falls into the category of mismatched decoding [11], and it is not obvious that the rate (1/2) log(|Kx + Kz|/|Kz|) is achievable using such a mismatched decoding scheme. In this section, we show that the rate (1/2) log(|Kx + Kz|/|Kz|) is achievable using a random Gaussian codebook and a Gaussian metric under some conditions on the noise process. In [11], [23], it was shown that (1/2) log(1 + P/N_0) is achievable using a Gaussian codebook and a minimum Euclidean distance decoding metric. This result was extended to the vector single-user channel where the transmitter knows the noise covariance matrix and hence can form parallel channels [11], [23]. In our case, we do not assume that the transmitter knows the noise covariance, but show that if the receiver knows Kz, then the rate (1/2) log(|Kx + Kz|/|Kz|) is achievable.

The coding game is played as follows. The transmitter knows the family 𝒦z but not the specific covariance Kz ∈ 𝒦z or the distribution. The transmitter chooses a distribution p(x^{(n)}) and 2^{nR} i.i.d. codewords drawn according to p(x^{(n)}). The transmitter is also allowed to choose a random codebook, where the codebook is known to the receiver. The receiver is assumed to know Kz but not the noise distribution. The receiver chooses a decoding rule based on the knowledge of the noise covariance and the transmitter codebook. The noise can choose any distribution f(z^{(n)}) satisfying the given covariance constraints Kz ∈ 𝒦z and some regularity conditions (C1 and C2 below) on the noise process. We find the highest achievable rate for which the probability of error, averaged over the random codebooks, goes to zero. Let us define M(X^{(n)}, Y^{(n)}) as

    M(X^{(n)}, Y^{(n)}) = (1/2) log( |Kx + Kz| / |Kz| )
                          + (1/2) Y^{(n)T} (Kx + Kz)^{-1} Y^{(n)}
                          − (1/2) (Y^{(n)} − X^{(n)})^T Kz^{-1} (Y^{(n)} − X^{(n)}).                    (41)
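A minimal sketch (ours, assuming numpy and the reconstruction of (41) above; the function names are hypothetical) of this Gaussian metric and of the ε-typicality threshold rule described next:

```python
import numpy as np

def gauss_metric(x, y, Kx, Kz):
    """The Gaussian (Mahalanobis-type) decoding metric M(x, y) of (41)."""
    KxKz_inv = np.linalg.inv(Kx + Kz)
    Kz_inv = np.linalg.inv(Kz)
    logdet = np.linalg.slogdet(Kx + Kz)[1] - np.linalg.slogdet(Kz)[1]
    return 0.5 * logdet + 0.5 * y @ KxKz_inv @ y - 0.5 * (y - x) @ Kz_inv @ (y - x)

def typical_decoder(codebook, y, Kx, Kz, eps):
    """Declare codeword i iff it is the unique codeword jointly eps-typical with y."""
    n = len(y)
    Cn = 0.5 / n * (np.linalg.slogdet(Kx + Kz)[1] - np.linalg.slogdet(Kz)[1])
    hits = [i for i, x in enumerate(codebook)
            if abs(Cn - gauss_metric(x, y, Kx, Kz) / n) < eps]
    return hits[0] if len(hits) == 1 else None   # None = error / erasure
```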

Define X^{(n)} and Y^{(n)} to be jointly ε-typical if we have

    | (1/2n) log( |Kx + Kz| / |Kz| ) − (1/n) M(X^{(n)}, Y^{(n)}) | < ε.                                 (42)

Our detection rule is that we declare X^{(n)}(i) to be decoded if it is the only codeword which is jointly ε-typical with the received Y^{(n)}. Note that the detection rule is equivalent to a Gaussian decoding metric with a threshold detection scheme, where an error is declared if there is more than one codeword below the threshold. This can be seen by rewriting (42) as

    | (1/2n) (Y^{(n)} − X^{(n)})^T Kz^{-1} (Y^{(n)} − X^{(n)}) − (1/2n) Y^{(n)T} (Kx + Kz)^{-1} Y^{(n)} | < ε.

We will show below that for rates R_n below

    C_n = (1/2n) log( |Kx + Kz| / |Kz| )

there exist codes for which the probability of error goes to zero as n → ∞. The regularity conditions on the noise process are

    C1:  lim_{n→∞} Pr[ | (1/n) Z^{(n)T} Kz^{-1} Z^{(n)} − (1/n) E[Z^{(n)T} Kz^{-1} Z^{(n)}] | >= ε ] = 0,  for all ε > 0
    C2:  lim_{n→∞} Pr[ | (1/n) Z^{(n)T} (Kx(1 + δ) + Kz)^{-1} Z^{(n)} − (1/n) E[Z^{(n)T} (Kx(1 + δ) + Kz)^{-1} Z^{(n)}] | >= ε ] = 0,  for all ε > 0, δ > 0.

We define P_e^{(n)} as the probability of error over a block of n samples, averaged over the transmitter codebooks, i.e.,

    P_e^{(n)} = Σ_C P(C) P_e^{(n)}(C),        P_e^{(n)}(C) = (1/2^{nR}) Σ_i Pr[ î(Y^{(n)}) ≠ i | x^{(n)}(i), C ].       (43)

We begin by stating two lemmas, which are proved in the Appendix. Lemma IV.2 requires the use of conditions C1 and C2 on the noise process.

Lemma IV.1: If X^{(n)} ~ N(0, Kx) and is independent of Y^{(n)}, then

    E_{Y,X}[ exp( M(X^{(n)}, Y^{(n)}) ) ] = 1.

Lemma IV.2: If X^{(n)} ~ N(0, Kx) and is independent of Z^{(n)}, E[Z^{(n)} Z^{(n)T}] = Kz > 0, and the noise satisfies C1 and C2, then for every ε > 0, δ > 0, and all n sufficiently large (n >= N(δ)),

    Pr[ (1/2n) Z^{(n)T} Kz^{-1} Z^{(n)} − (1/2n) (Z^{(n)} + X^{(n)})^T (Kx + Kz)^{-1} (Z^{(n)} + X^{(n)}) > ε ]
        <= (1 − δ) exp( −n ε² / 8 ) + δ.

Theorem IV.1: Let the channel be Y^{(n)} = X^{(n)} + Z^{(n)}, where Kx ∈ 𝒦x, Kz ∈ 𝒦z, and Z^{(n)} satisfies conditions C1 and C2. Suppose the transmitter knows the family 𝒦z but not the actual covariance Kz ∈ 𝒦z. Let the receiver know the covariance Kz of Z^{(n)}, but not the distribution. Then there exists a sequence of (2^{n(C_n − ε)}, n) randomly drawn codes with the decoding rule given in (42) such that the probability of error P_e^{(n)} → 0.

Proof: Let X^{(n)}(i), i = 1, ..., 2^{nR}, be independent codewords chosen from a Gaussian distribution with covariance Kx. Let us define the event E_i = {X^{(n)}(i), Y^{(n)} are jointly ε-typical}, where typicality is defined in (42). As the index of the codewords is assumed to be chosen from a uniform distribution, we can assume without loss of generality (w.l.o.g.) that X^{(n)}(W), W = 1, was the transmitted codeword. Hence, we can write the probability of error P[E | W = 1] using the union bound as

    P[E | W = 1] <= Pr[E_1^c] + Σ_{i=2}^{2^{nR}} Pr[E_i].                                              (44)

We can write Pr[E_i] for i ≠ 1 as

    Pr[E_i] = Pr[ (1/n) M(X^{(n)}(i), Y^{(n)}) > C_n − ε ]
        (a) <= E[ exp( M(X^{(n)}(i), Y^{(n)}) ) ] exp( −n(C_n − ε) )
        (b)  = E[ exp( (1/2) log(|Kx + Kz|/|Kz|) + (1/2) Y^{(n)T}(Kx + Kz)^{-1} Y^{(n)}
                       − (1/2) (Y^{(n)} − X^{(n)}(i))^T Kz^{-1} (Y^{(n)} − X^{(n)}(i)) ) ] exp( −n(C_n − ε) )
        (c)  = exp( −n(C_n − ε) )
        (d)  = exp( nε ) ( |Kz| / |Kx + Kz| )^{1/2}                                                     (45)

where (a) follows from the Chernoff bound with parameter α = C_n − ε, (b) follows by expanding M(X^{(n)}(i), Y^{(n)}), (c) uses Lemma IV.1, and (d) uses the definition of C_n. For the transmitted codeword, since Y^{(n)} = X^{(n)}(1) + Z^{(n)},

    Pr[E_1^c] = Pr[ | (1/2n) Z^{(n)T} Kz^{-1} Z^{(n)} − (1/2n) (Z^{(n)} + X^{(n)}(1))^T (Kx + Kz)^{-1} (Z^{(n)} + X^{(n)}(1)) | > ε ]   (46)

and, summing (45) over the 2^{nR} − 1 incorrect codewords,

    Σ_{i=2}^{2^{nR}} Pr[E_i] <= 2^{nR_n} exp( −n(C_n − ε) ) = exp( −n(C_n − R_n − ε) ).                 (47)

Therefore, using (46) and (47) we have

    P[E | W = 1] <= Pr[E_1^c] + exp( −n(C_n − R_n − ε) )
             (a) <= (1 − δ) exp( −n ε²/8 ) + δ + exp( −n(C_n − R_n − ε) )                                (48)

where (a) follows from Lemma IV.2 (using conditions C1 and C2). Therefore,

    lim_{n→∞} P[E | W = 1] = 0    for R_n < C_n − ε.
This result needs to be interpreted with caution, as it is proved that the average error probability, averaged over randomly chosen codebooks, goes to zero. This does not show that a single codebook will suffice for all noise distributions in 𝒦z. Randomization may protect against noise distributions which are designed for specific codebooks. Given

this caveat, we have shown that despite having a mismatched decoder (which treats the noise as Gaussian), we can transmit information reliably at rate

    R_n = (1/2n) log( |Kx + Kz| / |Kz| )

using a codebook consisting of independently drawn Gaussian codewords. Note that we have not used the "worst" covariance Kz* for the decoding rule. It seems difficult to show whether the rate R_n = (1/2n) log(|Kx + Kz*|/|Kz*|) is achievable using the worst covariance for decoding, rather than assuming that the noise covariance Kz is known at the decoder. It can be shown that the equivalent of Lemma IV.1 holds for Kz* as well (and the proof is almost identical to that in the Appendix). However, showing the equivalent of Lemma IV.2 may be harder. An encouraging sign is an adaptation of the result in [13, Lemma 6.10, pp. 212–214] (in the context of a convex class of compound channels), where it is shown that

    E[ log( f_{Y|X}(Y | X) / f_{Y*}(Y) ) ] >= I(X; Y*)

where Y* corresponds to the output of the channel that achieves the saddle point in the mutual information game. Using a similar setup, in our case this translates to

    E_{X,Z}[ Z^T (Kz*)^{-1} Z ] <= E_{X,Z}[ (Z + X)^T (Kx + Kz*)^{-1} (Z + X) ].

We can perhaps use this in order to prove a coding theorem using Kz* for the decoding. However, this is just a conjecture; we have not proved such a result, and it is not clear whether it is true.

V. CONCLUDING REMARKS

The existence of Gaussian saddle points in the mutual information game (under covariance constraints on signal and noise) implies the robustness of Gaussian codebooks. The problem of robust signal design reduces to water-filling on the worst noise processes subject to covariance constraints. We show that for high signal power, the worst noise with a banded covariance constraint is the maximum entropy noise. However, the maximum entropy noise is not the worst noise for low signal powers. Hence, robust signal design depends on the noise constraints as well as the available signal power.

APPENDIX

Lemma A.1: If X ~ N(0, Kx), then

    E_X[ exp( −b (X − a)^T A^{-1} (X − a) / 2 ) ]
        = ( |A/b| / |A/b + Kx| )^{1/2} exp( −a^T (Kx + A/b)^{-1} a / 2 ).                               (49)

Proof: We can always write X = C η, where η ~ N(0, I) and C is an n × m matrix. Here m denotes the rank of Kx. Therefore, we have

    E_X[ exp( −b (X − a)^T A^{-1} (X − a) / 2 ) ] = E_η[ exp( −b (C η − a)^T A^{-1} (C η − a) / 2 ) ]
    (a) = |I + C^T A^{-1} b C|^{-1/2}
          exp( −a^T ( A^{-1} b − A^{-1} b C (I + C^T A^{-1} b C)^{-1} C^T A^{-1} b ) a / 2 )
    (b) = ( |A/b| / |A/b + Kx| )^{1/2} exp( −a^T (Kx + A/b)^{-1} a / 2 )                                 (50)

where (a) follows from η ~ N(0, I) and (b) uses the matrix inversion lemma and the facts Kx = C C^T and |I + U V| = |I + V U| [27].

Lemma IV.1: If X^{(n)} ~ N(0, Kx) and is independent of Y^{(n)}, where Y^{(n)} has an arbitrary distribution, then E_{Y,X}[ exp( M(X^{(n)}, Y^{(n)}) ) ] = 1.

Proof of Lemma IV.1:

    E_{Y,X}[ exp( M(X^{(n)}, Y^{(n)}) ) ]
    (a) = ∫ f_Y(y) ( |Kx + Kz| / |Kz| )^{1/2} exp( (1/2) y^T (Kx + Kz)^{-1} y )
              E_X[ exp( −(1/2) (y − X)^T Kz^{-1} (y − X) ) ] dy                                          (51)
    (b) = ∫ f_Y(y) ( |Kx + Kz| / |Kz| )^{1/2} exp( (1/2) y^T (Kx + Kz)^{-1} y )
              ( |Kz| / |Kz + Kx| )^{1/2} exp( −(1/2) y^T (Kx + Kz)^{-1} y ) dy
        = ∫ f_Y(y) dy = 1                                                                                (52)

where (a) follows from the fact that X^{(n)} and Y^{(n)} are independent, and (b) follows from Lemma A.1 (with a = y, A = Kz, b = 1).

Lemma IV.2: If X^{(n)} ~ N(0, Kx), independent of Z^{(n)}, and E[Z^{(n)} Z^{(n)T}] = Kz > 0, and the noise satisfies C1 and C2, then for every ε > 0, δ > 0 and all n >= N(δ),

    Pr[ (1/2n) Z^{(n)T} Kz^{-1} Z^{(n)} − (1/2n) (Z^{(n)} + X^{(n)})^T (Kx + Kz)^{-1} (Z^{(n)} + X^{(n)}) > ε ]
        <= (1 − δ) exp( −n ε² / 8 ) + δ.                                                                  (53)

Proof of Lemma IV.2: Fix z^{(n)} and a Chernoff parameter θ > 0. Then

    Pr[ (1/2n) z^{(n)T} Kz^{-1} z^{(n)} − (1/2n) (z^{(n)} + X^{(n)})^T (Kx + Kz)^{-1} (z^{(n)} + X^{(n)}) > ε ]
    (a) <= e^{−nθε} E_X[ exp( (θ/2) z^{(n)T} Kz^{-1} z^{(n)} − (θ/2) (z^{(n)} + X^{(n)})^T (Kx + Kz)^{-1} (z^{(n)} + X^{(n)}) ) ]
    (b)  = e^{−nθε} exp( (θ/2) z^{(n)T} Kz^{-1} z^{(n)} ) E_X[ exp( −(θ/2) (z^{(n)} + X^{(n)})^T (Kx + Kz)^{-1} (z^{(n)} + X^{(n)}) ) ]
    (c)  = e^{−nθε} exp( (θ/2) z^{(n)T} Kz^{-1} z^{(n)} )
             ( |(Kx + Kz)/θ| / |Kx + (Kx + Kz)/θ| )^{1/2} exp( −(1/2) z^{(n)T} (Kx + (Kx + Kz)/θ)^{-1} z^{(n)} )
         = exp( −n E(n, θ, z^{(n)}) )                                                                      (54)

where (a) follows from the Chernoff bound with parameter θ, (b) follows from the independence of X^{(n)} and z^{(n)}, (c) follows from Lemma A.1, and

    E(n, θ, z^{(n)}) = θε + (1/2n) log( |Kx + (Kx + Kz)/θ| / |(Kx + Kz)/θ| )
                       − (1/2n) z^{(n)T} ( θ Kz^{-1} − (Kx + (Kx + Kz)/θ)^{-1} ) z^{(n)}.                   (55)

We can rewrite E(n, θ, z^{(n)}) as

    E(n, θ, z^{(n)}) = θε + (1/2n) Σ_{i=1}^{n} log( 1 + θ μ_i / (1 + μ_i) )
                       − (1/2n) z^{(n)T} ( θ Kz^{-1} − (Kx + (Kx + Kz)/θ)^{-1} ) z^{(n)}                     (56)

where μ_i = λ_i( Kz^{-1/2} Kx Kz^{-1/2} ). Hence,

    E(n, θ, z^{(n)})
    (a) >= θε + (1/2n) Σ_{i=1}^{n} θ μ_i / (1 + (1 + θ) μ_i)
           − (1/2n) z^{(n)T} ( θ Kz^{-1} − (Kx + (Kx + Kz)/θ)^{-1} ) z^{(n)}
    (b)  = θε + (1/(1 + θ)) (1/2n) trace( [ θ Kz^{-1} − (Kx + (Kx + Kz)/θ)^{-1} ] Kz )
           − (1/2n) z^{(n)T} ( θ Kz^{-1} − (Kx + (Kx + Kz)/θ)^{-1} ) z^{(n)}                                 (57)

where in (a) we have used log(1 + x) >= x/(1 + x) for x >= 0, and (b) is due to

    Σ_{i=1}^{n} θ μ_i / (1 + (1 + θ) μ_i) = (1/(1 + θ)) trace( [ θ Kz^{-1} − (Kx + (Kx + Kz)/θ)^{-1} ] Kz ).

Let

    C1 = { z^{(n)} : | (1/n) z^{(n)T} Kz^{-1} z^{(n)} − (1/n) E[Z^{(n)T} Kz^{-1} Z^{(n)}] | < ε̃/2 }
    C2 = { z^{(n)} : | (1/n) z^{(n)T} (Kx(1 + θ) + Kz)^{-1} z^{(n)} − (1/n) E[Z^{(n)T} (Kx(1 + θ) + Kz)^{-1} Z^{(n)}] | < ε̃/2 }.

If A = C1 ∩ C2, then from conditions C1 and C2 we have Pr[A] > 1 − δ for all n >= N(δ). If we evaluate E(n, θ, z^{(n)}) when z^{(n)} ∈ A and denote it by E(n, θ, z^{(n)} | A), we have

    E(n, θ, z^{(n)} | A)  (a) >=  θε − θ(θ + ε̃)/2  =  θ [ ε − (θ + ε̃)/2 ]  (b) >=  ε²/8                     (58)

where (a) follows because z^{(n)} ∈ A (so the quadratic term in (57) is within θ ε̃/2 of its expectation (1/2n) trace([θ Kz^{-1} − (Kx + (Kx + Kz)/θ)^{-1}] Kz), while (θ/(1 + θ)) (1/2n) trace([θ Kz^{-1} − (Kx + (Kx + Kz)/θ)^{-1}] Kz) <= θ²/2), and (b) follows by choosing θ = ε/2 and ε̃ = ε. The result follows by using (54), (58), and Pr[A] > 1 − δ.
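Lemma A.1 can also be checked numerically; the following Monte Carlo sketch is ours (assuming numpy; the test matrices are arbitrary positive-definite choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, b = 3, 2.0
Kx = np.array([[1.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 0.5]])
A  = np.array([[2.0, 0.5, 0.1], [0.5, 1.5, 0.0], [0.1, 0.0, 1.0]])
a  = np.array([0.7, -0.4, 1.1])

X = rng.multivariate_normal(np.zeros(n), Kx, size=400000)
lhs = np.exp(-0.5 * b * np.einsum('ij,jk,ik->i', X - a, np.linalg.inv(A), X - a)).mean()

Ab = A / b
rhs = np.sqrt(np.linalg.det(Ab) / np.linalg.det(Ab + Kx)) * \
      np.exp(-0.5 * a @ np.linalg.inv(Kx + Ab) @ a)
print(lhs, rhs)   # agree up to Monte Carlo error, as in (49)
```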

ACKNOWLEDGMENT

The authors wish to thank E. Ordentlich and B. Halder for the stimulating discussions. They would also like to thank A. Lapidoth for a very detailed and helpful review of the manuscript. He (along with his student P. Vontobel) also made the observation leading to Proposition II.1 and contributed to its proof. The authors also wish to thank the referees for helpful reviews and their suggestion to use the characteristic function argument in (14).

REFERENCES

[1] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[2] B. S. Choi and T. M. Cover, “An information theoretic proof of Burg’s maximum entropy spectrum,” Proc. IEEE, vol. 72, pp. 1094–1095, Aug. 1984.
[3] N. M. Blachman, “Communication as a game,” in Proc. WESCON Conf., Aug. 1957, pp. 61–66.
[4] R. L. Dobrushin, “Optimum information transmission through a channel with unknown parameters,” Radiotech. Elektron., vol. 4, pp. 1951–1956, Dec. 1959.


[5] R. J. McEliece and W. E. Stark, “An information theoretic study of communication in the presence of jamming,” in Proc. Int. Conf. Communications, 1981, pp. 45.3.1–45.4.5.
[6] A. Lapidoth and P. Narayan, “Reliable communication under channel uncertainty,” IEEE Trans. Inform. Theory (Special Commemorative Issue), vol. 44, pp. 2148–2177, Oct. 1998.
[7] T. Başar, “The Gaussian test channel with an intelligent jammer,” IEEE Trans. Inform. Theory, vol. IT-29, pp. 152–157, Jan. 1983.
[8] T. Başar and Y.-W. Wu, “A complete characterization of minimax and maximin encoder–decoder policies for communication channels with incomplete statistical description,” IEEE Trans. Inform. Theory, vol. IT-31, pp. 482–489, Jan. 1985.
[9] S. Shamai (Shitz) and S. Verdú, “Worst case power constrained noise for binary input channels,” IEEE Trans. Inform. Theory, vol. 38, pp. 1494–1511, Sept. 1992.
[10] C. R. Baker and I.-F. Chao, “Information capacity of channels with partially unknown noise—Part I: Finite dimensional channels,” SIAM J. Appl. Math., vol. 56, pp. 946–963, June 1996.
[11] A. Lapidoth, “Mismatched decoding of the multiple-access channel and some related issues in lossy source compression,” Ph.D. dissertation, Stanford Univ., Stanford, CA, 1995.
[12] I. Csiszár and P. Narayan, “Capacity of the Gaussian arbitrarily varying channel,” IEEE Trans. Inform. Theory, vol. 37, pp. 18–26, Jan. 1991.
[13] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Channels. New York: Academic, 1982.
[14] N. M. Blachman, “On the capacity of a bandlimited channel perturbed by statistically dependent interference,” IRE Trans. Inform. Theory, vol. IT-8, pp. 48–55, Jan. 1962.
[15] ——, “Effect of statistically dependent interference upon channel capacity,” IRE Trans. Inform. Theory, vol. IT-8, pp. 53–57, Sept. 1962.
[16] R. Durrett, Probability: Theory and Examples, 2nd ed. Boston, MA: Duxbury (PWS Pubs.), 1995.
[17] S. Ihara, “On the capacity of channels with additive non-Gaussian noise,” Inform. Contr., vol. 37, pp. 34–39, 1978.
[18] M. S. Pinsker, “Calculation of the rate of information production by means of stationary random processes and the capacity of stationary channel” (in Russian), Dokl. Akad. Nauk, USSR 111, pp. 753–756, 1956.
[19] M. J. Osborne and A. Rubinstein, A Course in Game Theory. Cambridge, MA: MIT Press, 1994.
[20] R. T. Rockafellar, Convex Analysis. Princeton, NJ: Princeton Univ. Press, 1970.
[21] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1990.
[22] B. Hughes and P. Narayan, “The capacity of a vector Gaussian arbitrarily varying channel,” IEEE Trans. Inform. Theory, vol. 34, pp. 995–1003, Sept. 1988.
[23] A. Lapidoth, “Nearest neighbor decoding for additive non-Gaussian noise channels,” IEEE Trans. Inform. Theory, vol. 42, pp. 1520–1529, Sept. 1996.
[24] K. Fan, “On a theorem of Weyl concerning the eigenvalues of linear transformations II,” in Proc. Nat. Acad. Sci. U.S.A., vol. 36, 1950, pp. 31–35.
[25] T. M. Cover and J. A. Thomas, “Determinant inequalities via information theory,” SIAM J. Matrix Anal. Applic., vol. 9, pp. 384–392, 1988.
[26] D. G. Luenberger, Optimization by Vector Space Methods. New York: Wiley, 1969.
[27] S. Haykin, Adaptive Filter Theory, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 1991.