
A Proof of the Strong Converse Theorem for Gaussian Broadcast Channels via the Gaussian Poincaré Inequality

Silas L. Fong and Vincent Y. F. Tan

arXiv:1509.01380v2 [cs.IT] 11 Sep 2015

Abstract

We prove that 2-user Gaussian broadcast channels admit the strong converse. This implies that every sequence of block codes with an asymptotic average error probability smaller than one is such that all the limit points of the sequence of rate pairs must lie within the capacity region derived by Cover and Bergmans. The main mathematical tool required for our analysis is a logarithmic Sobolev inequality known as the Gaussian Poincaré inequality.

Index Terms: Gaussian broadcast channel, strong converse, information spectrum, Gaussian Poincaré inequality, logarithmic Sobolev inequality

I. INTRODUCTION

This paper revisits the 2-user Gaussian broadcast channel (BC) [1, Section 5.5] in which a single transmitting node X would like to send information to two receiving nodes Y1 and Y2. The channel outputs corresponding to the input X are

  Y1 = X + Z1,   (1)
  Y2 = X + Z2,   (2)

where Z1 and Z2 are Gaussian noise components, each with zero mean and with variances σ1² and σ2² respectively. This channel is a popular model for the downlink of a cellular system. When information is sent over n uses of the channel, the peak power of the input codeword X^n ≜ (X1, X2, …, Xn) is constrained to satisfy

  (1/n) Σ_{k=1}^n Xk² ≤ P   (3)

with probability one for some admissible power P > 0. Assuming that σ1² ≤ σ2² (so the channel is degraded in favor of the first receiver), the capacity region of this channel is well known and is given by the set of all rate pairs (R1, R2) satisfying

  R1 ≤ C(αS1),   (4)
  R2 ≤ C( (1 − α)S2 / (αS2 + 1) )   (5)

for some α ∈ [0, 1], where C(x) ≜ ½ log(1 + x) and Si ≜ P/σi² is the signal-to-noise ratio (SNR) of the i-th user. The achievability part is proved using superposition coding, an idea that originates from Cover [2]. The converse part was proved by Bergmans [3] using the entropy power inequality [4]. See [1, Section 5.5] for a modern exposition of the proof of the capacity region of the Gaussian BC.

One potential drawback of the existing outer bound is that it is only a weak converse, proved using Fano's inequality [1, Section 2.1]. It guarantees only that for all rate pairs not belonging to the capacity region, the average error probability in decoding the transmitted messages is bounded away from zero as the blocklength of any code tends to infinity. In information theory, it is also important to establish strong converses, as such definitive statements indicate that there is a sharp phase transition between rate pairs that are achievable and those that are not. A strong converse indicates that for all codes with rate pairs that are in the exterior of

Silas L. Fong and Vincent Y. F. Tan are with the Department of Electrical and Computer Engineering, National University of Singapore (NUS), Singapore (e-mail: {silas_fong,vtan}@nus.edu.sg). Vincent Y. F. Tan is also with the Department of Mathematics, NUS.


the capacity region, the error probability must necessarily tend to one. The contrapositive of this statement can roughly be stated as follows: all codes operated at a fixed rate pair that result in an error probability not exceeding ε ∈ [0, 1) as the blocklength grows, i.e., ε-reliable codes, must be such that the rate pair belongs to the capacity region. This is clearly a stronger statement than the weak converse, which corresponds to the special case ε = 0.

A. Main Contribution

The main contribution of the present work is a proof of the strong converse for the Gaussian BC. That is, we prove that for the Gaussian BC, its ε-capacity region (the set of all rate pairs for which there exists a sequence of codes whose asymptotic average error probability does not exceed ε) is the region given in (4)–(5). In other words, if one operates at a pair of rates in the exterior of the capacity region, the average error probability must necessarily tend to one as the blocklength grows. Thus, the boundary of the capacity region presents a sharp phase transition.

Our technique hinges on a fundamental inequality in probability theory known as the Gaussian Poincaré inequality [5] (also see [6]), a particular instance of a logarithmic Sobolev inequality. This inequality says that for any n independent and identically distributed standard Gaussian random variables Z^n ≜ (Z1, Z2, …, Zn) and any differentiable mapping f : R^n → R such that E[(f(Z^n))²] < ∞ and E[‖∇f(Z^n)‖²] < ∞,

  Var[f(Z^n)] ≤ E[‖∇f(Z^n)‖²].   (6)
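As a numerical sanity check (our illustration, not part of the paper's argument), both sides of (6) can be estimated by Monte Carlo for a test function with known moments. The choice f(z^n) = Σ_k zk² is an arbitrary assumption for which the two sides are 2n and 4n in closed form:

```python
import random
import statistics

def poincare_check(n=5, m=50_000, seed=0):
    """Estimate both sides of (6) for f(z) = sum(z_k^2) under i.i.d.
    standard Gaussians; here Var[f] = 2n and E||grad f||^2 = 4n exactly."""
    rng = random.Random(seed)
    f_vals, grad_sq = [], []
    for _ in range(m):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        f_vals.append(sum(x * x for x in z))
        grad_sq.append(sum((2.0 * x) ** 2 for x in z))  # grad f(z) = 2z
    return statistics.pvariance(f_vals), statistics.fmean(grad_sq)

var_f, e_grad = poincare_check()
print(var_f, e_grad)      # close to 2n = 10 and 4n = 20
assert var_f <= e_grad    # the Gaussian Poincare inequality (6)
```

Equality in (6) holds for linear f; the quadratic test function above exhibits a factor-2 slack.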

In Shannon theory, this inequality has been used by Polyanskiy and Verdú [7, Theorem 8] to bound the relative entropy between the empirical distribution of a code for an additive white Gaussian noise channel and the n-fold product of the capacity-achieving output distribution. However, it has not been explicitly used in other problems in Shannon theory, for example, to establish strong converses. We find it useful in the context of the Gaussian BC to bound the variance of a certain log-likelihood ratio (information density).

An auxiliary and important contribution of our work is the following. Consider all optimal codes for the Gaussian BC whose rate pairs approach a specific point on the boundary of the capacity region. We show that if the average error probability ε is non-vanishing, those rate pairs converge to the boundary at a rate of O(1/√n), where n is the length of the code. The achievability part is a direct consequence of the central limit theorem, similarly to works on second-order asymptotics [8] and, in particular, the Gaussian multiple access channel with degraded message sets [9]. However, the converse part is more involved, and indeed the strong converse must first be established. The estimates obtained from the various bounding techniques contained herein, including the Gaussian Poincaré inequality, allow us to assert the O(1/√n) speed of convergence. Nailing down the exact speed of convergence and the corresponding constant would be a fruitful but ambitious avenue for further research.

B. Related Work

The blowing-up lemma [10] is the standard technique to establish the strong converse for some network information theory problems. These include the discrete memoryless degraded BC [11, Theorem 16.3] [10, Theorem 4], the lossless one-help-one source coding problem [11, Theorem 16.4], the discrete memoryless multiple access channel [12], and the Gel'fand-Pinsker channel [13]. See [6, Section 3.6] for an exposition of the use of the blowing-up lemma for establishing the strong converse for the discrete memoryless degraded BC, and for bounding the relative entropy between the empirical distribution of good channel codes with non-vanishing error probabilities and the n-fold product of the capacity-achieving output distribution. Similarly to logarithmic Sobolev inequalities, the blowing-up lemma is also a result in the study of concentration of measure [6], [14]. However, its use in Shannon theory is tailored to systems whose underlying alphabets are discrete (finite). It is unclear, at least to the authors, how one can adapt the use of the blowing-up lemma, or more generally, transportation inequalities [6], [14], to establish strong converses for continuous-alphabet systems, such as the Gaussian BC.

C. Paper Outline

In the next subsection, the notation of this paper is stated. Section II contains the formulation of the Gaussian BC and our main result. Section III states some useful preliminary results that are used to prove the main theorem. These include some information spectrum bounds as well as an important bound based on the Gaussian Poincar´e inequality. Section IV presents the proof of our main result. Proofs of auxiliary results are deferred to the appendices.


D. Notation

We use Pr{E} to represent the probability of an event E, and we let 1{E} be the characteristic function of E. We use a capital letter X to denote an arbitrary random variable with alphabet X, and use the small letter x to denote a realization of X. We use X^n to denote a random tuple (X1, X2, …, Xn), where the components Xk have the same alphabet X.

The following notations are used for any arbitrary random variables X and Y and any real-valued mapping g whose domain includes X. We let pX and pY|X denote the probability distribution of X and the conditional probability distribution of Y given X respectively. We let Pr_pX{g(X) ≥ ξ} denote ∫_{x∈X} pX(x) 1{g(x) ≥ ξ} dx for any real-valued function g and any real constant ξ. The expectation and the variance of g(X) are denoted as E_pX[g(X)] and Var_pX[g(X)] ≜ E_pX[(g(X) − E_pX[g(X)])²] respectively. We let pX pY|X denote the joint distribution of (X, Y), i.e., pX pY|X(x, y) = pX(x) pY|X(y|x) for all x and y.

We let N(·; μ, σ²) : R → [0, ∞) be the probability density function of a Gaussian random variable Z whose mean and variance are μ and σ² respectively, i.e.,

  N(z; μ, σ²) = (1/√(2πσ²)) e^{−(z−μ)²/(2σ²)}.   (7)

Similarly, we let N(·; μ, σ²) : R^n → [0, ∞) be the joint probability density function of n independent copies of Z ∼ N(z; μ, σ²), i.e.,

  N(z^n; μ, σ²) = (2πσ²)^{−n/2} e^{−Σ_{k=1}^n (zk−μ)²/(2σ²)}.   (8)

We will take all logarithms to base e throughout this paper, so all information quantities have units of nats. The sets of natural numbers, integers, real numbers and non-negative real numbers are denoted by N, Z, R and R+ respectively. The Euclidean norm of a tuple x^n ∈ R^n is denoted by ‖x^n‖ ≜ √(Σ_{k=1}^n xk²).

II. GAUSSIAN BROADCAST CHANNEL AND ITS ε-CAPACITY REGION

We consider the Gaussian broadcast channel (BC) where a source s wants to transmit a message to two destinations denoted by d1 and d2 respectively in n time slots (channel uses) as follows. Source s chooses a message

  Wi ∈ {1, 2, …, Mi^(n)}   (9)

destined for di for each i ∈ {1, 2}, where Mi^(n) denotes the size of message Wi. For notational convenience, we let I ≜ {1, 2}. In each time slot k, s transmits Xk ∈ R based on (W1, W2), and di receives Yi,k = Xk + Zi,k for each i ∈ I, where {Zi,k}_{k=1}^n are n independent copies of the Gaussian random variable whose mean and variance are 0 and σi² respectively. Without loss of generality, we assume throughout the paper that

  σ2² ≥ σ1² ≥ 1.   (10)

After the n time slots, di declares Ŵi to be the transmitted Wi based on Yi^n. Every codeword x^n(w1, w2) transmitted by node s should always satisfy the peak power constraint Σ_{k=1}^n xk²(w1, w2) ≤ nP, where P denotes the power available to s. The definitions of the BC and the codes defined on it are formally given below.

A. Definitions for the Gaussian Broadcast Channel

Definition 1: An (n, M_I^(n), P)-code, where M_I^(n) ≜ (M1^(n), M2^(n)), consists of the following:

1) A message set Wi = {1, 2, …, Mi^(n)} for each i ∈ I. Message (W1, W2) is uniform on W1 × W2, i.e.,

  Pr{(W1, W2) = (w1, w2)} = 1/(M1^(n) M2^(n))   (11)

for all (w1, w2) ∈ W1 × W2 (which implies the independence of W1 and W2).

2) An encoding function f^(n) : W1 × W2 → R^n, where f^(n) is the encoding function at s such that

  X^n = f^(n)(W1, W2).   (12)

The codebook is defined to be {f^(n)(w1, w2) ∈ R^n : (w1, w2) ∈ W1 × W2}. In addition, the peak power constraint

  ‖f^(n)(w1, w2)‖² ≤ nP   (13)

should be satisfied for each (w1, w2) ∈ W1 × W2.

3) A decoding function φi^(n) : R^n → Wi for each i ∈ I, where φi^(n) is the decoding function at di such that Ŵi = φi^(n)(Yi^n). For each i ∈ I and each wi ∈ Wi, the decoding region of wi is defined to be

  Di^(n)(wi) ≜ {yi^n ∈ R^n | φi^(n)(yi^n) = wi}.   (14)

Definition 2: A Gaussian broadcast channel (BC) is characterized by the conditional probability density function q_{Y1,Y2|X} satisfying

  q_{Y1,Y2|X}(y1, y2|x) = N(y1 − x; 0, σ1²) N(y2 − x; 0, σ2²),   (15)

where σ2² ≥ σ1² ≥ 1, such that the following holds for any (n, M_I^(n), P)-code: For each k ∈ {1, 2, …, n},

  p_{W1,W2,X^n,Y1^n,Y2^n}(w1, w2, x^n, y1^n, y2^n) = p_{W1,W2,X^n}(w1, w2, x^n) Π_{k=1}^n p_{Y1,k,Y2,k|Xk}(y1,k, y2,k|xk)   (16)

for all w1, w2, x^n, y1^n and y2^n, where

  p_{Y1,k,Y2,k|Xk}(y1,k, y2,k|xk) = q_{Y1,Y2|X}(y1,k, y2,k|xk).   (17)

Since p_{Y1,k,Y2,k|Xk} does not depend on k by (17), the channel is stationary.

To simplify notation, we let Y_I ≜ (Y1, Y2) for any random variables (Y1, Y2), and let Y_{I,k} ≜ (Y1,k, Y2,k) for any random variables (Y1,k, Y2,k). For any (n, M_I^(n), P)-code defined on the BC, let p_{W_I,X^n,Y_I^n,Ŵ_I} be the joint distribution induced by the code. We can factorize p_{W_I,X^n,Y_I^n,Ŵ_I} as follows:

  p_{W_I,X^n,Y_I^n,Ŵ_I} (a)= p_{W_I,X^n,Y_I^n} p_{Ŵ1|Y1^n} p_{Ŵ2|Y2^n}   (18)
  (16)= p_{W_I,X^n} ( Π_{k=1}^n p_{Y1,k,Y2,k|Xk} ) p_{Ŵ1|Y1^n} p_{Ŵ2|Y2^n},   (19)

where (a) follows from Definition 1, under which Ŵi is a function of Yi^n for each i ∈ I.

Definition 3: For an (n, M_I^(n), P)-code defined on the BC, we define according to (19) the average probability of decoding error as

  Pr{Ŵ_I ≠ W_I} (14)= Pr{ ∪_{i=1}^2 { Yi^n ∉ Di^(n)(Wi) } }.   (20)

We call an (n, M_I^(n), P)-code with average probability of decoding error no larger than ε an (n, M_I^(n), P, ε)avg-code. Similarly, we define the maximal probability of decoding error as

  max_{w_I∈W_I} Pr{Ŵ_I ≠ w_I | W_I = w_I} (14)= max_{w_I∈W_I} Pr{ ∪_{i=1}^2 { Yi^n ∉ Di^(n)(wi) } | W_I = w_I }.   (21)

We call an (n, M_I^(n), P)-code with maximal probability of decoding error no larger than ε an (n, M_I^(n), P, ε)max-code.

Definition 4: Let ε ∈ [0, 1) be a real number. A rate pair (R1, R2) is ε-achievable for the BC if there exists a sequence of (n, M_I^(n), P, εn)avg-codes on the BC such that

  lim inf_{n→∞} (1/n) log Mi^(n) ≥ Ri   (22)

for each i ∈ I, and

  lim sup_{n→∞} εn ≤ ε.   (23)

Definition 5: The ε-capacity region of the BC, denoted by Cε, is defined to be the set of ε-achievable rate pairs.

Define C(P) ≜ ½ log(1 + P) for all P ≥ 0, and let

  R_BC ≜ ∪_{α∈[0,1]} { (R1, R2) ∈ R+² : R1 ≤ C(αP/σ1²), R2 ≤ C( (1−α)P/(αP + σ2²) ) }.   (24)
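As a quick numerical illustration (ours, not part of the paper; the values of P and the noise variances below are arbitrary), the boundary of R_BC in (24) can be traced by sweeping the power-split parameter α over [0, 1]:

```python
import math

def C(x):
    """C(x) = (1/2) * log(1 + x), in nats, as defined before (24)."""
    return 0.5 * math.log1p(x)

def rbc_boundary(P, s1_sq, s2_sq, num=10):
    """Rate pairs on the boundary of R_BC in (24), one per power split alpha."""
    pts = []
    for i in range(num + 1):
        a = i / num
        R1 = C(a * P / s1_sq)
        R2 = C((1 - a) * P / (a * P + s2_sq))
        pts.append((a, R1, R2))
    return pts

# alpha = 1: all power carries W1, so R1 = C(P/sigma1^2) and R2 = 0.
for a, R1, R2 in rbc_boundary(P=10.0, s1_sq=1.0, s2_sq=4.0, num=5):
    print(f"alpha={a:.1f}  R1={R1:.3f}  R2={R2:.3f}")
```

As α grows, R1 increases and R2 decreases; the two endpoints recover the single-user capacities C(P/σ1²) and C(P/σ2²).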

B. Main Result

The following theorem is the main result of this paper. The proof of this theorem is deferred to Section IV.

Theorem 1: For all ε ∈ [0, 1),

  Cε ⊆ R_BC.   (25)

It is well known [1, Theorem 5.3] that

  C0 = R_BC.   (26)

Therefore Theorem 1 implies the strong converse for the Gaussian BC, i.e., that for every ε ∈ [0, 1),

  Cε = R_BC.   (27)

Before presenting the proof of Theorem 1, we would like to make the following two remarks.

Remark 1: For each ε ∈ [0, 1) and each λ ∈ [0, 1], define the λ-sum capacity as

  Cλ ≜ max_{α∈[0,1]} { λ C(αP/σ1²) + (1 − λ) C( (1−α)P/(αP + σ2²) ) }.   (28)
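The maximization in (28) is over a single scalar α, so Cλ is easy to evaluate numerically. The following grid-search sketch is our illustration (the grid resolution and all parameter values are arbitrary assumptions):

```python
import math

def C(x):
    # C(x) = 0.5 * log(1 + x), in nats
    return 0.5 * math.log1p(x)

def lambda_sum_capacity(lam, P, s1_sq, s2_sq, grid=10_000):
    """Grid-search evaluation of the lambda-sum capacity (28):
    max over alpha in [0, 1] of
    lam*C(alpha*P/s1_sq) + (1 - lam)*C((1 - alpha)*P/(alpha*P + s2_sq))."""
    best = -1.0
    for i in range(grid + 1):
        a = i / grid
        val = lam * C(a * P / s1_sq) + (1 - lam) * C((1 - a) * P / (a * P + s2_sq))
        best = max(best, val)
    return best

# Endpoints: lam = 1 picks alpha = 1 (all power to user 1); lam = 0 picks alpha = 0.
print(lambda_sum_capacity(1.0, 10.0, 1.0, 4.0))  # C(10)
print(lambda_sum_capacity(0.0, 10.0, 1.0, 4.0))  # C(10/4)
```

Since both single-user terms are bounded by their α-endpoint values, Cλ never exceeds the λ-weighted combination of the two single-user capacities.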

Theorem 1 implies that for all (R1, R2) ∈ Cε,

  λR1 + (1 − λ)R2 ≤ Cλ   (29)

for all λ ∈ [0, 1]. In fact, our analysis gives us a useful estimate of the optimal λ-sum rate at finite blocklengths. From the proof of Theorem 1, and specifically the inequalities (127) and (128), we may assert the following for each ε ∈ (0, 1), each λ ∈ [0, 1] and each sequence of (n, M1^(n), M2^(n), P, ε)avg-codes: There exists a constant θ̄ ∈ R that depends on ε and P (but not n) such that

  lim sup_{n→∞} (1/√n) ( λ log M1^(n) + (1 − λ) log M2^(n) − nCλ ) ≤ θ̄.   (30)

On the other hand, for each ε ∈ (0, 1) and each λ ∈ [0, 1], it follows from the standard achievability proof involving superposition coding [1, Chapter 5] using i.i.d. Gaussian codewords with average power P − 1/√n and a generalization of Shannon's non-asymptotic achievability bound [15] that there exists a sequence of (n, M1^(n), M2^(n), P, ε)avg-codes which satisfies the following: There exists a θ ∈ R that depends on ε and P (but not n) such that

  lim inf_{n→∞} (1/√n) ( λ log M1^(n) + (1 − λ) log M2^(n) − nCλ ) ≥ θ.   (31)

If we define (M1*(n, ε, λ), M2*(n, ε, λ)) to be an optimal pair of message sizes that satisfies

  λ log M1*(n, ε, λ) + (1 − λ) log M2*(n, ε, λ) = max{ λ log M1^(n) + (1 − λ) log M2^(n) : there exists an (n, M1^(n), M2^(n), P, ε)avg-code },   (32)

it then follows from (29) and (30) that

  λ log M1*(n, ε, λ) + (1 − λ) log M2*(n, ε, λ) = nCλ + O(√n)   (33)


for each ε ∈ (0, 1) and each λ ∈ [0, 1]. This result is not unexpected in view of recent works on second-order asymptotics for network information theory problems [8]. However, even establishing the strong converse is not trivial. Moreover, characterizing the order of the most significant term in the O(·) notation in (33) appears to be a formidable problem.

Remark 2: As described at the beginning of this subsection, Theorem 1 implies the strong converse under the setting of union average error probability as defined in Definition 3. Our proof technique can also be used to prove the strong converse under the setting of separate maximal error probability as described below. Fix any ε1 ∈ [0, 1) and any ε2 ∈ [0, 1). If we follow the setting of the discrete memoryless degraded BC in [10, Section 1] and define the (ε1, ε2)-capacity region as the set of (ε1, ε2)-achievable rate pairs, where

  εi ≜ max_{w_I∈W_I} Pr{Ŵi ≠ wi | W_I = w_I}   (34)

denotes the maximal probability of decoding error for message i ∈ I, then a slight modification of the proof steps for Theorem 1 in Section IV (ignoring the step of codebook expurgation) will imply that the (ε1, ε2)-capacity region is contained in R_BC, thus establishing the strong converse for the setting of separate maximal error probability.

III. PRELIMINARIES FOR THE PROOF OF THEOREM 1

A. Information Spectrum Bounds

The following lemma is a modification of Verdú-Han's non-asymptotic converse bound [16, Theorem 4] for obtaining a lower bound on the maximal probability of decoding error. Note that the original Verdú-Han bound pertains to the average probability of error, but the maximal probability of error is more useful in our context.

Lemma 1: Fix an (n, M_I^(n), P, ε)max-code with decoding regions {D1^(n)(w1) | w1 ∈ W1} and {D2^(n)(w2) | w2 ∈ W2}. Let p_{W_I,X^n,Y_I^n,Ŵ_I} denote the probability distribution induced by the code. For each i ∈ I and each w_I ∈ W_I, fix a real number γi(w_I). Then, we have for each (i, j) ∈ {(1, 2), (2, 1)}

  Pr_{p_{Yi^n|W_I=w_I}} { log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ≤ log Mi^(n) − γi(w_I) }
    ≤ n² e^{−γi(w_I)} + ε + 1{ M1^(n) M2^(n) ∫_{Di^(n)(wi)} p_{Yi^n}(yi^n) p_{Wj|Wi,Yi^n}(wj|wi, yi^n) dyi^n > n² }.   (35)

Proof: Fix a pair (i, j) ∈ {(1, 2), (2, 1)}, a w_I ∈ W_I and a real number γi(w_I). We first consider the case (i, j) = (1, 2). In order to show (35) for (i, j) = (1, 2), we consider the following chain of inequalities:

  Pr_{p_{Y1^n|W_I=w_I}} { log( p_{Y1^n|W1}(Y1^n|w1) / p_{Y1^n}(Y1^n) ) ≤ log M1^(n) − γ1(w_I) }
  ≤ Pr_{p_{Y1^n|W_I=w_I}} { log( p_{Y1^n|W1}(Y1^n|w1) / p_{Y1^n}(Y1^n) ) ≤ log M1^(n) − γ1(w_I) }
      × 1{ M1^(n) M2^(n) ∫_{D1^(n)(w1)} p_{Y1^n}(y1^n) p_{W2|W1,Y1^n}(w2|w1, y1^n) dy1^n ≤ n² }
    + 1{ M1^(n) M2^(n) ∫_{D1^(n)(w1)} p_{Y1^n}(y1^n) p_{W2|W1,Y1^n}(w2|w1, y1^n) dy1^n > n² }   (36)
  (a)≤ Pr_{p_{Y1^n|W_I=w_I}} { { log( p_{Y1^n|W1}(Y1^n|w1) / p_{Y1^n}(Y1^n) ) ≤ log M1^(n) − γ1(w_I) } ∩ { Y1^n ∈ D1^(n)(w1) } }
      × 1{ M1^(n) M2^(n) ∫_{D1^(n)(w1)} p_{Y1^n}(y1^n) p_{W2|W1,Y1^n}(w2|w1, y1^n) dy1^n ≤ n² }
    + Pr_{p_{Y1^n|W_I=w_I}} { Y1^n ∉ D1^(n)(w1) }
    + 1{ M1^(n) M2^(n) ∫_{D1^(n)(w1)} p_{Y1^n}(y1^n) p_{W2|W1,Y1^n}(w2|w1, y1^n) dy1^n > n² },   (37)

where (a) follows from the union bound. In order to bound the first term in (37), we consider

  Pr_{p_{Y1^n|W_I=w_I}} { { log( p_{Y1^n|W1}(Y1^n|w1) / p_{Y1^n}(y1^n) ) ≤ log M1^(n) − γ1(w_I) } ∩ { Y1^n ∈ D1^(n)(w1) } }
  = ∫_{D1^(n)(w1)} p_{Y1^n|W_I}(y1^n|w_I) 1{ log( p_{Y1^n|W1}(y1^n|w1) / p_{Y1^n}(y1^n) ) ≤ log M1^(n) − γ1(w_I) } dy1^n   (38)
  (11)= M2^(n) ∫_{D1^(n)(w1)} p_{W2,Y1^n|W1}(w2, y1^n|w1) 1{ log( p_{Y1^n|W1}(y1^n|w1) / p_{Y1^n}(y1^n) ) ≤ log M1^(n) − γ1(w_I) } dy1^n   (39)
  = M2^(n) ∫_{D1^(n)(w1)} p_{Y1^n|W1}(y1^n|w1) p_{W2|W1,Y1^n}(w2|w1, y1^n) 1{ log( p_{Y1^n|W1}(y1^n|w1) / p_{Y1^n}(y1^n) ) ≤ log M1^(n) − γ1(w_I) } dy1^n   (40)
  ≤ M1^(n) M2^(n) e^{−γ1(w_I)} ∫_{D1^(n)(w1)} p_{Y1^n}(y1^n) p_{W2|W1,Y1^n}(w2|w1, y1^n) dy1^n,   (41)

which implies that

  Pr_{p_{Y1^n|W_I=w_I}} { { log( p_{Y1^n|W1}(Y1^n|w1) / p_{Y1^n}(Y1^n) ) ≤ log M1^(n) − γ1(w_I) } ∩ { Y1^n ∈ D1^(n)(w1) } }
      × 1{ M1^(n) M2^(n) ∫_{D1^(n)(w1)} p_{Y1^n}(y1^n) p_{W2|W1,Y1^n}(w2|w1, y1^n) dy1^n ≤ n² } ≤ n² e^{−γ1(w_I)}.   (42)

The second term in (37) can be upper bounded as

  Pr_{p_{Y1^n|W_I=w_I}} { Y1^n ∉ D1^(n)(w1) } ≤ ε   (43)

because the maximal probability of decoding error of the code is ε. Combining (37), (42) and (43), we obtain (35) for (i, j) = (1, 2). By symmetry, (35) holds for (i, j) = (2, 1).

The following corollary is a direct consequence of Lemma 1 with an appropriate choice of γi(w_I).


Corollary 2: Fix an ε ∈ [0, 1) and fix an (n, M_I^(n), P, ε)max-code with decoding regions {D1^(n)(w1) | w1 ∈ W1} and {D2^(n)(w2) | w2 ∈ W2}. Let p_{W_I,X^n,Y_I^n,Ŵ_I} denote the probability distribution induced by the code. Fix a pair (i, j) ∈ {(1, 2), (2, 1)}. Then we have for each w_I ∈ W_I

  (1 − ε)/2 ≤ n² exp( −( log Mi^(n) − E_{p_{Yi^n|W_I=w_I}}[ log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ]
      − √( (2/(1−ε)) Var_{p_{Yi^n|W_I=w_I}}[ log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ] ) ) )
    + 1{ M1^(n) M2^(n) ∫_{Di^(n)(wi)} p_{Yi^n}(yi^n) p_{Wj|Wi,Yi^n}(wj|wi, yi^n) dyi^n > n² }.   (44)

Proof: For each i ∈ I and each w_I ∈ W_I, define

  γi(w_I) ≜ log Mi^(n) − E_{p_{Yi^n|W_I=w_I}}[ log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ] − √( (2/(1−ε)) Var_{p_{Yi^n|W_I=w_I}}[ log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ] ).   (45)

Fix a pair (i, j) ∈ {(1, 2), (2, 1)}. By Chebyshev's inequality, we have for each w_I ∈ W_I

  Pr_{p_{Yi^n|W_I=w_I}} { log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ≤ log Mi^(n) − γi(w_I) }
  ≥ 1 − Var_{p_{Yi^n|W_I=w_I}}[ log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ] / ( log Mi^(n) − γi(w_I) − E_{p_{Yi^n|W_I=w_I}}[ log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ] )²   (46)
  = 1 − (1 − ε)/2.   (47)

Combining (35) in Lemma 1, (45) and (47), we obtain (44).
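The Chebyshev step (45)-(47) can be checked exactly on a toy discrete distribution (our illustration; the values, probabilities, and ε below are arbitrary):

```python
import math

# Step (45)-(47): with t = E[X] + sqrt(2*Var[X]/(1 - eps)), Chebyshev's
# inequality gives Pr[X <= t] >= 1 - Var[X]/(t - E[X])^2 = 1 - (1 - eps)/2.
vals  = [0.0, 1.0, 3.0, 7.0]
probs = [0.4, 0.3, 0.2, 0.1]

mean = sum(p * v for p, v in zip(probs, vals))
var  = sum(p * (v - mean) ** 2 for p, v in zip(probs, vals))

eps = 0.25
t = mean + math.sqrt(2 * var / (1 - eps))
prob_le_t = sum(p for p, v in zip(probs, vals) if v <= t)

lower = 1 - (1 - eps) / 2   # = (1 + eps)/2
print(prob_le_t, lower)
assert prob_le_t >= lower
```

Here the exact probability (0.9) comfortably exceeds the Chebyshev lower bound (0.625), as (47) only needs the one-sided tail to be controlled.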

The following proposition guarantees that for any (n, M_I^(n), P, ε)max-code, the last term in (44) equals one for only a small fraction of codewords.

Proposition 3: Fix an (n, M_I^(n), P, ε)max-code with decoding regions {D1^(n)(w1) | w1 ∈ W1} and {D2^(n)(w2) | w2 ∈ W2}, and let p_{W_I,X^n,Y_I^n,Ŵ_I} denote the probability distribution induced by the code. For each (i, j) ∈ {(1, 2), (2, 1)}, define

  A(i,j) ≜ { w_I ∈ W_I : 1{ M1^(n) M2^(n) ∫_{Di^(n)(wi)} p_{Yi^n}(yi^n) p_{Wj|Wi,Yi^n}(wj|wi, yi^n) dyi^n > n² } ≤ 1/n }.   (48)

Then, we have for each (i, j) ∈ {(1, 2), (2, 1)}

  |W_I \ A(i,j)| ≤ (1/n) M1^(n) M2^(n).   (49)

In addition, if n ≥ 2/(1−ε), then the following holds for each (i, j) ∈ {(1, 2), (2, 1)} and each w_I ∈ A(i,j):

  log Mi^(n) ≤ E_{p_{Yi^n|W_I=w_I}}[ log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ] + √( (2/(1−ε)) Var_{p_{Yi^n|W_I=w_I}}[ log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ] ) + 3 log n.   (50)

Proof: Let p_{W_I,X^n,Y_I^n,Ŵ_I} denote the probability distribution induced by the (n, M_I^(n), P, ε)max-code. Fix a pair (i, j) ∈ {(1, 2), (2, 1)} and consider the following chain of inequalities:

  Pr_{p_{W_I}} { W_I ∉ A(i,j) }
  (48)= Σ_{w_I∈W_I} p_{W_I}(w_I) 1{ 1{ M1^(n) M2^(n) ∫_{Di^(n)(wi)} p_{Yi^n}(yi^n) p_{Wj|Wi,Yi^n}(wj|wi, yi^n) dyi^n > n² } > 1/n }   (51)
  = Pr_{p_{W_I}} { 1{ M1^(n) M2^(n) ∫_{Di^(n)(Wi)} p_{Yi^n}(yi^n) p_{Wj|Wi,Yi^n}(Wj|Wi, yi^n) dyi^n > n² } > 1/n }   (52)
  (a)≤ n Pr_{p_{W_I}} { M1^(n) M2^(n) ∫_{Di^(n)(Wi)} p_{Yi^n}(yi^n) p_{Wj|Wi,Yi^n}(Wj|Wi, yi^n) dyi^n > n² }   (53)
  (b)≤ (1/n) E_{p_{W_I}}[ M1^(n) M2^(n) ∫_{Di^(n)(Wi)} p_{Yi^n}(yi^n) p_{Wj|Wi,Yi^n}(Wj|Wi, yi^n) dyi^n ]   (54)
  (11)= (1/n) Σ_{w1∈W1} Σ_{w2∈W2} ∫_{Di^(n)(wi)} p_{Yi^n}(yi^n) p_{Wj|Wi,Yi^n}(wj|wi, yi^n) dyi^n   (55)
  = (1/n) Σ_{wi∈Wi} ∫_{Di^(n)(wi)} p_{Yi^n}(yi^n) Σ_{wj∈Wj} p_{Wj|Wi,Yi^n}(wj|wi, yi^n) dyi^n   (56)
  = (1/n) Σ_{wi∈Wi} ∫_{Di^(n)(wi)} p_{Yi^n}(yi^n) dyi^n   (57)
  (c)≤ (1/n) ∫_{R^n} p_{Yi^n}(yi^n) dyi^n   (58)
  = 1/n,   (59)

where (a) and (b) follow from Markov's inequality, and (c) follows from (14), i.e., from the fact that for each i ∈ {1, 2}, {Di^(n)(wi) | wi ∈ Wi} consists of disjoint decoding regions. Using (59) and (11), we obtain (49).

We will prove the second statement of the proposition in the rest of the proof. To this end, we first assume

  n ≥ 2/(1−ε).   (60)

Then, it follows from Corollary 2 and (48) that for each w_I ∈ A(i,j),

  (1 − ε)/2 ≤ n² exp( −( log Mi^(n) − E_{p_{Yi^n|W_I=w_I}}[ log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ] − √( (2/(1−ε)) Var_{p_{Yi^n|W_I=w_I}}[ log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ] ) ) ),   (61)

which implies from (60) that

  exp( log Mi^(n) − E_{p_{Yi^n|W_I=w_I}}[ log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ] − √( (2/(1−ε)) Var_{p_{Yi^n|W_I=w_I}}[ log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ] ) ) ≤ (2/(1−ε)) n² ≤ n³,   (62)

which then implies (50).

B. The Gaussian Poincaré Inequality

In the proof of the main theorem, we need the following lemma, based on the Gaussian Poincaré inequality, to bound the variance term in (50). The proof of the lemma is contained in [7, Section III-C]; for the sake of completeness, we provide a self-contained proof in Appendix A.

Lemma 4: Let n be a natural number and σ² be a positive number. Let pW be a probability distribution defined on some finite set W, and let g : W → R^n be a mapping. In addition, define pZ^n to be the distribution of n independent copies of the zero-mean Gaussian random variable with variance σ², i.e., pZ^n(z^n) ≜ N(z^n; 0, σ²) for all z^n ∈ R^n. Suppose there exists a 0 ≤ κ < ∞ such that

  max_{w∈W} ‖g(w)‖² ≤ κ.   (63)

Then, we have

  Var_{pZ^n}[ log E_{pW}[ pZ^n(Z^n + g(W)) | Z^n ] ] ≤ 2( n + κ/σ² ).   (64)

Note that the upper bounds on log M1^(n) and log M2^(n) in Proposition 3 do not necessarily hold for all w_I ∈ W_I. Therefore, in the proof of the main theorem, we need other upper bounds on log M1^(n) and log M2^(n) for those w_I ∈ W_I which do not satisfy the assumption in Proposition 3. Consequently, we need the following upper bounds on log M1^(n) and log M2^(n), which hold for all w_I ∈ W_I. Since the proof of the following upper bounds is standard (by the use of Fano's inequality [1, Section 2.1]), it is relegated to Appendix B.

Proposition 5: Fix an (n, M_I^(n), P, ε)max-code. Then, we have for each i ∈ I

  log Mi^(n) ≤ (1/(1−ε)) ( 1 + (n/2) log(1 + P/σi²) ).   (65)
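For intuition (our illustration; the parameter values are arbitrary), the Fano-type cap (65) is easy to evaluate per blocklength. Divided by n, it tends to C(P/σi²)/(1−ε), which for ε > 0 exceeds capacity; this is why a bound like (65) alone yields only a weak converse:

```python
import math

def fano_rate_cap(n, eps, P, sigma_sq):
    """The upper bound (65) on log M_i^(n), in nats:
    (1/(1 - eps)) * (1 + (n/2) * log(1 + P/sigma_sq))."""
    return (1.0 / (1.0 - eps)) * (1.0 + 0.5 * n * math.log1p(P / sigma_sq))

# Per-letter cap (1/n) * log M_i approaches C(P/sigma^2)/(1 - eps) from above.
for n in (10, 100, 10_000):
    print(n, fano_rate_cap(n, eps=0.1, P=10.0, sigma_sq=1.0) / n)
```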

IV. PROOF OF THEOREM 1

It suffices to prove Cε ⊆ R_BC for ε ≠ 0 due to (26). Fix an arbitrary ε ∈ (0, 1) and let (R1, R2) be an ε-achievable rate pair. Then there exists a sequence of (n, M_I^(n), P, εn)avg-codes for the BC such that

  lim inf_{n→∞} (1/n) log Mi^(n) ≥ Ri   (66)

for each i ∈ I, and

  lim sup_{n→∞} εn ≤ ε.   (67)

By (67), there exists an ε̄ ∈ (0, 1) such that for all sufficiently large n,

  εn ≤ ε̄.   (68)

By expurgating appropriate codewords from each (n, M_I^(n), P, εn)avg-code as suggested in [1, Problem 8.11], we can obtain for each sufficiently large n an (n, M̄_I^(n), P, √εn)max-code such that

  M̄i^(n) = ⌊ Mi^(n) / n² ⌋   (69)

for each i ∈ I. Fix a sufficiently large

  n ≥ 2/(1−ε̄)   (70)

such that (68) and (69) hold. Since the (n, M̄_I^(n), P, √εn)max-code is also an (n, M̄_I^(n), P, √ε̄)max-code by (68) and Definition 3, we will regard the (n, M̄_I^(n), P, √εn)max-code as an (n, M̄_I^(n), P, √ε̄)max-code in the rest of the proof. Let p_{W_I,X^n,Y_I^n,Ŵ_I} denote the probability distribution induced by the (n, M̄_I^(n), P, √ε̄)max-code. For each (i, j) ∈ {(1, 2), (2, 1)}, define

  A(i,j) ≜ { w_I ∈ W_I : 1{ M̄1^(n) M̄2^(n) ∫_{Di^(n)(wi)} p_{Yi^n}(yi^n) p_{Wj|Wi,Yi^n}(wj|wi, yi^n) dyi^n > n² } ≤ 1/n }.   (71)

Using Proposition 3 and (70), we have for each (i, j) ∈ {(1, 2), (2, 1)}

  |W_I \ A(i,j)| ≤ (1/n) M̄1^(n) M̄2^(n)   (72)

and for each w_I ∈ A(i,j)

  log M̄i^(n) ≤ E_{p_{Yi^n|W_I=w_I}}[ log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ] + √( (2/(1−ε̄)) Var_{p_{Yi^n|W_I=w_I}}[ log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ] ) + 3 log n.   (73)

Following (73) and letting f^(n) be the encoding function of the (n, M̄_I^(n), P, √ε̄)max-code (cf. Definition 1), we consider the following chain for each w_I ∈ W_I:

  Var_{p_{Y1^n|W_I=w_I}}[ log p_{Y1^n|W1}(Y1^n|w1) ]
  = Var_{p_{Y1^n|W_I=w_I}}[ log( (1/M̄2^(n)) Σ_{w̃2∈W2} p_{Y1^n|W1,W2}(Y1^n|w1, w̃2) ) ]   (74)
  (a)= Var_{p_{Y1^n|X^n=f^(n)(w_I)}}[ log( Σ_{w̃2∈W2} (1/M̄2^(n)) p_{Y1^n|X^n}(Y1^n|f^(n)(w1, w̃2)) ) ]   (75)
  = Var_{p_{Y1^n|X^n=f^(n)(w_I)}}[ log( Σ_{w̃2∈W2} (1/M̄2^(n)) p_{Y1^n|X^n}( Y1^n | f^(n)(w_I) − (f^(n)(w_I) − f^(n)(w1, w̃2)) ) ) ]   (76)
  (b)= ∫_{R^n} N(z^n; 0, σ1²) ( log Σ_{w̃2∈W2} (1/M̄2^(n)) N(z^n + f^(n)(w_I) − f^(n)(w1, w̃2); 0, σ1²) )² dz^n
      − ( ∫_{R^n} N(z^n; 0, σ1²) log Σ_{w̃2∈W2} (1/M̄2^(n)) N(z^n + f^(n)(w_I) − f^(n)(w1, w̃2); 0, σ1²) dz^n )²   (77)
  (c)≤ 2( n + (1/σ1²) max_{w̃2∈W2} ‖f^(n)(w_I) − f^(n)(w1, w̃2)‖² )   (78)
  (d)≤ 2( n + (2/σ1²)( ‖f^(n)(w_I)‖² + max_{w̃2∈W2} ‖f^(n)(w1, w̃2)‖² ) )   (79)
  (13)≤ 2n( 1 + 4P/σ1² ),   (80)

where

(a) follows from the fact that for each w_I ∈ W_I and each y1^n ∈ R^n,

  p_{Y1^n|W_I}(y1^n|w_I) = Σ_{x^n∈R^n} p_{X^n,Y1^n|W_I}(x^n, y1^n|w_I)   (81)
  = Σ_{x^n∈R^n} p_{X^n|W_I}(x^n|w_I) p_{Y1^n|W_I,X^n}(y1^n|w_I, x^n)   (82)
  (12)= Σ_{x^n∈R^n} 1{ x^n = f^(n)(w_I) } p_{Y1^n|W_I,X^n}(y1^n|w_I, x^n)   (83)
  (16)= Σ_{x^n∈R^n} 1{ x^n = f^(n)(w_I) } p_{Y1^n|X^n}(y1^n|x^n)   (84)
  = p_{Y1^n|X^n}(y1^n|f^(n)(w_I)).   (85)

(b) follows from letting z^n ≜ y1^n − f^(n)(w_I) and from Definition 2, by which for each x^n ∈ R^n and each y1^n ∈ R^n,

  p_{Y1^n|X^n}(y1^n|x^n) = N(y1^n − x^n; 0, σ1²).   (86)

(c) follows from viewing the difference of the two terms in (77) as

  Var_{pZ^n}[ log E_{pW2}[ pZ^n(Z^n + f^(n)(w_I) − f^(n)(w1, W2)) ] ]   (87)

and from Lemma 4 by letting

  g(w̃2) ≜ f^(n)(w_I) − f^(n)(w1, w̃2)   (88)

and

  κ ≜ max_{w̃2∈W2} ‖f^(n)(w_I) − f^(n)(w1, w̃2)‖².   (89)

(d) follows from the fact that ‖u^n + v^n‖² ≤ 2(‖u^n‖² + ‖v^n‖²) for any (u^n, v^n) ∈ R^n × R^n.

Following similar procedures to those used to obtain (80), we obtain for all w_I ∈ W_I

  Var_{p_{Y1^n|W_I=w_I}}[ log p_{Y1^n}(Y1^n) ]
  = Var_{p_{Y1^n|W_I=w_I}}[ log( (1/(M̄1^(n) M̄2^(n))) Σ_{w̃_I∈W_I} p_{Y1^n|W_I}(Y1^n|w̃_I) ) ]   (90)
  ≤ 2n( 1 + 4P/σ1² ),   (91)

which implies from (80) that

  max_{w_I∈W_I} max{ Var_{p_{Y1^n|W_I=w_I}}[ log p_{Y1^n|W1}(Y1^n|w1) ], Var_{p_{Y1^n|W_I=w_I}}[ log p_{Y1^n}(Y1^n) ] } ≤ 2n( 1 + 4P/σ1² ).   (92)

Since (92) holds, it follows by symmetry that

  max_{w_I∈W_I} max{ Var_{p_{Y2^n|W_I=w_I}}[ log p_{Y2^n|W2}(Y2^n|w2) ], Var_{p_{Y2^n|W_I=w_I}}[ log p_{Y2^n}(Y2^n) ] } ≤ 2n( 1 + 4P/σ2² ).   (93)

For each (i, j) ∈ {(1, 2), (2, 1)} and each w_I ∈ A(i,j), since

  Var_{p_{Yi^n|W_I=w_I}}[ log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ]
  ≤ 2( Var_{p_{Yi^n|W_I=w_I}}[ log p_{Yi^n|Wi}(Yi^n|wi) ] + Var_{p_{Yi^n|W_I=w_I}}[ log p_{Yi^n}(Yi^n) ] )   (94)
  (a)≤ 8n( 1 + 4P/σi² ),   (95)

where (a) follows from (92) and (93), it follows from (73) that

  log M̄i^(n) ≤ E_{p_{Yi^n|W_I=w_I}}[ log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ] + 4√( (1 + 4P/σi²) n / (1−ε̄) ) + 3 log n.   (96)

Consider the following chain for each (i, j) ∈ {(1, 2), (2, 1)}:

  log M̄i^(n)
  = Σ_{w_I∈A(i,j)} p_{W_I}(w_I) log M̄i^(n) + Σ_{w_I∈W_I\A(i,j)} p_{W_I}(w_I) log M̄i^(n)   (97)
  (96)≤ Σ_{w_I∈A(i,j)} p_{W_I}(w_I) ( E_{p_{Yi^n|W_I=w_I}}[ log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ] + 4√( (1 + 4P/σi²) n / (1−ε̄) ) + 3 log n )
      + Σ_{w_I∈W_I\A(i,j)} p_{W_I}(w_I) log M̄i^(n)   (98)
  = E_{p_{W_I,Yi^n}}[ log( p_{Yi^n|Wi}(Yi^n|Wi) / p_{Yi^n}(Yi^n) ) ] + 4√( (1 + 4P/σi²) n / (1−ε̄) ) + 3 log n + Σ_{w_I∈W_I\A(i,j)} p_{W_I}(w_I) log M̄i^(n)
      − Σ_{w_I∈W_I\A(i,j)} p_{W_I}(w_I) ( E_{p_{Yi^n|W_I=w_I}}[ log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ] + 4√( (1 + 4P/σi²) n / (1−ε̄) ) + 3 log n )   (99)
  ≤ E_{p_{W_I,Yi^n}}[ log( p_{Yi^n|Wi}(Yi^n|Wi) / p_{Yi^n}(Yi^n) ) ] + 4√( (1 + 4P/σi²) n / (1−ε̄) ) + 3 log n + Σ_{w_I∈W_I\A(i,j)} p_{W_I}(w_I) log M̄i^(n)
      − Σ_{w_I∈W_I\A(i,j)} p_{W_I}(w_I) E_{p_{Yi^n|W_I=w_I}}[ log( p_{Yi^n|Wi}(Yi^n|wi) / p_{Yi^n}(Yi^n) ) ].   (100)

In order to obtain a lower bound for the $\mathrm{E}_{p_{Y_i^n|W_I=w_I}}\big[\log\frac{p_{Y_i^n|W_i}(Y_i^n|w_i)}{p_{Y_i^n}(Y_i^n)}\big]$ term in (100), we consider the following two chains of inequalities for each $(i,j)\in\{(1,2),(2,1)\}$, each $w_I\in\mathcal{W}_I$ and each $y_i^n\in\mathbb{R}^n$:
$$p_{Y_i^n}(y_i^n) = \sum_{\tilde{w}_I\in\mathcal{W}_I} p_{W_I,Y_i^n}(\tilde{w}_I, y_i^n) \qquad (101)$$
$$\overset{(85)}{=} \sum_{\tilde{w}_I\in\mathcal{W}_I} p_{W_I}(\tilde{w}_I)\,p_{Y_i^n|X^n}(y_i^n|f^{(n)}(\tilde{w}_I)) \qquad (102)$$
$$\overset{(86)}{=} \sum_{\tilde{w}_I\in\mathcal{W}_I} p_{W_I}(\tilde{w}_I)\,\mathcal{N}(y_i^n - f^{(n)}(\tilde{w}_I); 0, \sigma_i^2) \qquad (103)$$
$$\overset{(8)}{\le} \sum_{\tilde{w}_I\in\mathcal{W}_I} p_{W_I}(\tilde{w}_I)\,(2\pi\sigma_i^2)^{-n/2} \qquad (104)$$
$$= (2\pi\sigma_i^2)^{-n/2}, \qquad (105)$$
where (104) holds because a Gaussian density is maximized at its mean, and
$$p_{Y_i^n|W_i}(y_i^n|w_i) \overset{(11)}{=} \frac{1}{\bar{M}_j^{(n)}}\sum_{\tilde{w}_j\in\mathcal{W}_j} p_{Y_i^n|W_i,W_j}(y_i^n|w_i,\tilde{w}_j) \qquad (106)$$
$$\ge \frac{1}{\bar{M}_j^{(n)}}\,p_{Y_i^n|W_I}(y_i^n|w_I). \qquad (107)$$
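Both pointwise bounds (105) and (107) are elementary and can be checked numerically. The sketch below is illustrative only: it builds a hypothetical encoder $f^{(n)}$ with one codeword per message pair $(w_1, w_2)$ (all sizes and parameters are arbitrary), then verifies that the output density never exceeds the peak Gaussian density and that keeping a single term of the mixture (106) gives a lower bound, as in (107).

```python
import math
import random

def gauss_density(y, mean, sigma2):
    """N(y - mean; 0, sigma2 * I_n) evaluated at a point y in R^n."""
    n = len(y)
    sq = sum((a - b) ** 2 for a, b in zip(y, mean))
    return (2 * math.pi * sigma2) ** (-n / 2) * math.exp(-sq / (2 * sigma2))

random.seed(1)
n, sigma2 = 4, 1.5
M1, M2 = 3, 5  # toy sizes of the message sets W_1 and W_2

# Hypothetical encoder f^{(n)}: one codeword per message pair (w1, w2).
f = {(w1, w2): [random.uniform(-2.0, 2.0) for _ in range(n)]
     for w1 in range(M1) for w2 in range(M2)}

def p_y(y):
    """p_{Y_i^n}(y): average over all message pairs, cf. (101)-(103)."""
    return sum(gauss_density(y, f[w], sigma2) for w in f) / (M1 * M2)

def p_y_given_w1(y, w1):
    """p_{Y_i^n|W_1}(y|w1): average over w2 only, cf. (106)."""
    return sum(gauss_density(y, f[(w1, w2)], sigma2) for w2 in range(M2)) / M2

ok_105 = ok_107 = True
for _ in range(200):
    y = [random.uniform(-4.0, 4.0) for _ in range(n)]
    # (105): the output density never exceeds the peak Gaussian density.
    ok_105 &= p_y(y) <= (2 * math.pi * sigma2) ** (-n / 2) + 1e-15
    # (107): dropping all but one term of the mixture gives a lower bound.
    for w1 in range(M1):
        for w2 in range(M2):
            ok_107 &= (p_y_given_w1(y, w1)
                       >= gauss_density(y, f[(w1, w2)], sigma2) / M2 - 1e-15)
print(ok_105, ok_107)
```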

Following (100), we bound $\mathrm{E}_{p_{Y_i^n|W_I=w_I}}\big[\log\frac{p_{Y_i^n|W_i}(Y_i^n|w_i)}{p_{Y_i^n}(Y_i^n)}\big]$ for each $(i,j)\in\{(1,2),(2,1)\}$ and each $w_I\in\mathcal{W}_I$ as below:
$$\mathrm{E}_{p_{Y_i^n|W_I=w_I}}\!\left[\log\frac{p_{Y_i^n|W_i}(Y_i^n|w_i)}{p_{Y_i^n}(Y_i^n)}\right] \overset{(a)}{\ge} -\frac{n}{2}\log(2\pi\sigma_i^2) - \log\bar{M}_j^{(n)} + \mathrm{E}_{p_{Y_i^n|W_I=w_I}}\!\left[\log p_{Y_i^n|W_I}(Y_i^n|w_I)\right] \qquad (108)$$
$$= -\frac{n}{2}\log(2\pi\sigma_i^2) - \log\bar{M}_j^{(n)} - \mathrm{E}_{p_{Y_i^n|W_I=w_I}}\!\left[\log\frac{1}{p_{Y_i^n|W_I}(Y_i^n|w_I)}\right] \qquad (109)$$
$$\overset{(85)}{=} -\frac{n}{2}\log(2\pi\sigma_i^2) - \log\bar{M}_j^{(n)} - \mathrm{E}_{p_{Y_i^n|X^n=f^{(n)}(w_I)}}\!\left[\log\frac{1}{p_{Y_i^n|X^n}(Y_i^n|f^{(n)}(w_I))}\right] \qquad (110)$$
$$\overset{(86)}{\ge} -n\log(2\pi e\sigma_i^2) - \log\bar{M}_j^{(n)}, \qquad (111)$$
where (a) follows from (105) and (107), and the last step follows from (86) because the expectation in (110) equals the differential entropy $\frac{n}{2}\log(2\pi e\sigma_i^2)$ of the Gaussian noise and $\log(2\pi\sigma_i^2) \le \log(2\pi e\sigma_i^2)$. Combining (100) and (111), we obtain for each $(i,j)\in\{(1,2),(2,1)\}$
$$\log\bar{M}_i^{(n)} \le \mathrm{E}_{p_{W_I,Y_i^n}}\!\left[\log\frac{p_{Y_i^n|W_i}(Y_i^n|W_i)}{p_{Y_i^n}(Y_i^n)}\right] + \sqrt{\frac{4}{1-\bar{\varepsilon}}\left(1+\frac{4P}{\sigma_i^2}\right)n} + 3\log n + \sum_{w_I\in\mathcal{W}_I\setminus\mathcal{A}^{(i,j)}} p_{W_I}(w_I)\left(\log\bar{M}_1^{(n)} + \log\bar{M}_2^{(n)} + n\log(2\pi e\sigma_i^2)\right). \qquad (112)$$

Following (112), we consider for each $(i,j)\in\{(1,2),(2,1)\}$
$$\sum_{w_I\in\mathcal{W}_I\setminus\mathcal{A}^{(i,j)}} p_{W_I}(w_I)\left(\log\bar{M}_1^{(n)} + \log\bar{M}_2^{(n)} + n\log(2\pi e\sigma_i^2)\right)$$
$$\overset{(a)}{\le} \frac{1}{n}\left(\log\bar{M}_1^{(n)} + \log\bar{M}_2^{(n)} + n\log(2\pi e\sigma_i^2)\right) \qquad (113)$$
$$\overset{(b)}{\le} \frac{1}{n}\left(\frac{1}{1-\bar{\varepsilon}}\left(2 + \frac{n}{2}\log\left(1+\frac{P}{\sigma_1^2}\right) + \frac{n}{2}\log\left(1+\frac{P}{\sigma_2^2}\right)\right) + n\log(2\pi e\sigma_i^2)\right) \qquad (114)$$
$$= \frac{1}{1-\bar{\varepsilon}}\left(\frac{2}{n} + \frac{1}{2}\log\left(1+\frac{P}{\sigma_1^2}\right) + \frac{1}{2}\log\left(1+\frac{P}{\sigma_2^2}\right)\right) + \log(2\pi e\sigma_i^2), \qquad (115)$$
where (a) follows from (11) and (72), which imply
$$\sum_{w_I\in\mathcal{W}_I\setminus\mathcal{A}^{(i,j)}} p_{W_I}(w_I) \le \frac{1}{n}, \qquad (116)$$
and (b) follows from Proposition 5. Defining
$$\zeta_i \triangleq \sqrt{\frac{4}{1-\bar{\varepsilon}}\left(1+\frac{4P}{\sigma_i^2}\right)} \qquad (117)$$
and
$$\lambda_i \triangleq 3 + \frac{1}{1-\bar{\varepsilon}}\left(2 + \frac{1}{2}\log\left(1+\frac{P}{\sigma_1^2}\right) + \frac{1}{2}\log\left(1+\frac{P}{\sigma_2^2}\right)\right) + \log(2\pi e\sigma_i^2) \qquad (118)$$
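The constants (117) and (118) do not depend on $n$, which is what makes the residual terms in the final bound vanish per channel use. A small illustrative computation (the power, noise variances, and $\bar{\varepsilon}$ below are arbitrary sample values) evaluates $\zeta_i$ and $\lambda_i$ and checks that the per-symbol penalty $(\zeta_i\sqrt{n} + (\lambda_i+2)\log n + \log 2)/n$ decreases toward zero as the blocklength grows:

```python
import math

def zeta(P, sigma_i_sq, eps_bar):
    # (117): coefficient of the sqrt(n) backoff term
    return math.sqrt(4.0 / (1.0 - eps_bar) * (1.0 + 4.0 * P / sigma_i_sq))

def lam(P, sigma1_sq, sigma2_sq, sigma_i_sq, eps_bar):
    # (118): coefficient of the log(n) backoff term
    return (3.0
            + (2.0 + 0.5 * math.log(1.0 + P / sigma1_sq)
                   + 0.5 * math.log(1.0 + P / sigma2_sq)) / (1.0 - eps_bar)
            + math.log(2.0 * math.pi * math.e * sigma_i_sq))

# Purely illustrative parameters.
P, s1, s2, eps_bar = 5.0, 1.0, 2.0, 0.9
z1 = zeta(P, s1, eps_bar)
l1 = lam(P, s1, s2, s1, eps_bar)

# Per-symbol penalty (zeta*sqrt(n) + (lambda+2)*log n + log 2)/n, cf. (124):
penalties = [(z1 * math.sqrt(n) + (l1 + 2) * math.log(n) + math.log(2)) / n
             for n in (10**2, 10**4, 10**6)]
print(penalties)  # strictly decreasing: the backoff vanishes as n grows
```

Note that the penalty stays finite even for $\bar{\varepsilon}$ close to one; this is exactly why the argument yields a strong converse rather than only a weak one.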

and recalling
$$\mathrm{E}_{p_{W_I,Y_i^n}}\!\left[\log\frac{p_{Y_i^n|W_i}(Y_i^n|W_i)}{p_{Y_i^n}(Y_i^n)}\right] = I_{p_{W_i,Y_i^n}}(W_i; Y_i^n) \qquad (119)$$
for each $i\in\mathcal{I}$, it follows from (112) and (115) that for each $i\in\mathcal{I}$
$$\log\bar{M}_i^{(n)} \le I_{p_{W_i,Y_i^n}}(W_i; Y_i^n) + \zeta_i\sqrt{n} + \lambda_i\log n. \qquad (120)$$
For each $i\in\mathcal{I}$, since
$$\log M_i^{(n)} \overset{(69)}{\le} \log\!\left(n^2\left(1+\bar{M}_i^{(n)}\right)\right) \qquad (121)$$
$$\le \log\!\left(2n^2\bar{M}_i^{(n)}\right) \qquad (122)$$
$$= \log 2 + 2\log n + \log\bar{M}_i^{(n)}, \qquad (123)$$
it follows from (120) that
$$\log M_i^{(n)} \le I_{p_{W_i,Y_i^n}}(W_i; Y_i^n) + \zeta_i\sqrt{n} + (\lambda_i+2)\log n + \log 2. \qquad (124)$$

Following the procedures for obtaining the upper bounds on $I_{p_{W_1,Y_1^n}}(W_1; Y_1^n)$ and $I_{p_{W_2,Y_2^n}}(W_2; Y_2^n)$ in the weak converse proof for the Gaussian BC [1, Section 5.5.2], we conclude that there exists an $\alpha\in[0,1]$ such that
$$I_{p_{W_1,Y_1^n}}(W_1; Y_1^n) \le \frac{n}{2}\log\left(1+\frac{\alpha P}{\sigma_1^2}\right) \qquad (125)$$
and
$$I_{p_{W_2,Y_2^n}}(W_2; Y_2^n) \le \frac{n}{2}\log\left(1+\frac{(1-\alpha)P}{\alpha P+\sigma_2^2}\right). \qquad (126)$$
Combining (124), (125) and (126), we obtain
$$\log M_1^{(n)} \le \frac{n}{2}\log\left(1+\frac{\alpha P}{\sigma_1^2}\right) + \zeta_1\sqrt{n} + (\lambda_1+2)\log n + \log 2 \qquad (127)$$
and
$$\log M_2^{(n)} \le \frac{n}{2}\log\left(1+\frac{(1-\alpha)P}{\alpha P+\sigma_2^2}\right) + \zeta_2\sqrt{n} + (\lambda_2+2)\log n + \log 2, \qquad (128)$$
where $\zeta_1$, $\zeta_2$, $\lambda_1$ and $\lambda_2$ are constants that do not depend on $n$ by (117) and (118). This implies from (66) that
$$R_1 \le \frac{1}{2}\log\left(1+\frac{\alpha P}{\sigma_1^2}\right) \qquad (129)$$
and
$$R_2 \le \frac{1}{2}\log\left(1+\frac{(1-\alpha)P}{\alpha P+\sigma_2^2}\right), \qquad (130)$$
which then implies that
$$(R_1, R_2) \in \mathcal{R}_{\mathrm{BC}}. \qquad (131)$$
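The region carved out by (129) and (130) as the power split $\alpha$ varies is the familiar Cover–Bergmans boundary. A short numerical sketch (the SNR values below are illustrative, with $\sigma_1^2 \le \sigma_2^2$ as assumed throughout) traces this boundary and confirms its trade-off shape: raising $\alpha$ monotonically enlarges the stronger user's bound and shrinks the weaker user's bound.

```python
import math

def C(x):
    """C(x) = (1/2) log(1 + x); natural log is used here for concreteness."""
    return 0.5 * math.log(1.0 + x)

# Illustrative parameters with sigma1^2 <= sigma2^2 (user 1 is the stronger user).
P, s1, s2 = 10.0, 1.0, 4.0

boundary = []
for t in range(101):
    a = t / 100.0  # power-split parameter alpha in [0, 1]
    r1 = C(a * P / s1)                    # right-hand side of (129)
    r2 = C((1.0 - a) * P / (a * P + s2))  # right-hand side of (130)
    boundary.append((r1, r2))

# As alpha grows, R1's bound increases and R2's bound decreases, tracing the
# boundary of R_BC from (0, C(S2)) down to (C(S1), 0).
r1s = [p[0] for p in boundary]
r2s = [p[1] for p in boundary]
print(r1s == sorted(r1s), r2s == sorted(r2s, reverse=True))
```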

Since (131) holds for any $\varepsilon$-achievable $(R_1, R_2)$, it follows from Definition 5 that $\mathcal{C}_\varepsilon \subseteq \mathcal{R}_{\mathrm{BC}}$.

APPENDIX A
PROOF OF LEMMA 4

Define
$$p_{X^n}(x^n) \triangleq \sum_{w\in\mathcal{W}} p_W(w)\,\mathbf{1}\{x^n = g(w)\} \qquad (132)$$
for all $x^n\in\mathbb{R}^n$. It follows from (63) and (132) that
$$\max_{x^n\in\mathbb{R}^n:\,p_{X^n}(x^n)>0} \|x^n\|^2 \le \kappa. \qquad (133)$$

Consider the following chain of inequalities:
$$\mathrm{Var}_{p_{Z^n}}\!\left[\log\mathrm{E}_{p_W}[p_{Z^n}(Z^n+g(W))\,|\,Z^n]\right]$$
$$\overset{(132)}{=} \mathrm{Var}_{p_{Z^n}}\!\left[\log\mathrm{E}_{p_{X^n}}[p_{Z^n}(Z^n+X^n)\,|\,Z^n]\right] \qquad (134)$$
$$= \int_{\mathbb{R}^n} \mathcal{N}(z^n;0,\sigma^2)\left(\log\mathrm{E}_{p_{X^n}}\!\left[\mathcal{N}(z^n+X^n;0,\sigma^2)\right]\right)^2 dz^n - \left(\int_{\mathbb{R}^n} \mathcal{N}(z^n;0,\sigma^2)\log\mathrm{E}_{p_{X^n}}\!\left[\mathcal{N}(z^n+X^n;0,\sigma^2)\right] dz^n\right)^2 \qquad (135)$$
$$\overset{(8)}{=} \int_{\mathbb{R}^n} \mathcal{N}(z^n;0,1)\left(\log\mathrm{E}_{p_{X^n}}\!\left[\mathcal{N}\!\left(z^n+\tfrac{X^n}{\sqrt{\sigma^2}};0,1\right)\right]\right)^2 dz^n - \left(\int_{\mathbb{R}^n} \mathcal{N}(z^n;0,1)\log\mathrm{E}_{p_{X^n}}\!\left[\mathcal{N}\!\left(z^n+\tfrac{X^n}{\sqrt{\sigma^2}};0,1\right)\right] dz^n\right)^2 \qquad (136)$$
$$\overset{(a)}{\le} \int_{\mathbb{R}^n} \mathcal{N}(z^n;0,1)\sum_{k=1}^n\left(\frac{\mathrm{E}_{p_{X^n}}\!\left[-\left(z_k+\tfrac{X_k}{\sqrt{\sigma^2}}\right)\mathcal{N}\!\left(z^n+\tfrac{X^n}{\sqrt{\sigma^2}};0,1\right)\right]}{\mathrm{E}_{p_{X^n}}\!\left[\mathcal{N}\!\left(z^n+\tfrac{X^n}{\sqrt{\sigma^2}};0,1\right)\right]}\right)^2 dz^n \qquad (137)$$
$$= \int_{\mathbb{R}^n} \mathcal{N}(z^n;0,1)\sum_{k=1}^n\left(-z_k-\frac{\mathrm{E}_{p_{X^n}}\!\left[\tfrac{X_k}{\sqrt{\sigma^2}}\mathcal{N}\!\left(z^n+\tfrac{X^n}{\sqrt{\sigma^2}};0,1\right)\right]}{\mathrm{E}_{p_{X^n}}\!\left[\mathcal{N}\!\left(z^n+\tfrac{X^n}{\sqrt{\sigma^2}};0,1\right)\right]}\right)^2 dz^n \qquad (138)$$
$$\overset{(b)}{\le} 2\int_{\mathbb{R}^n} \mathcal{N}(z^n;0,1)\sum_{k=1}^n\left(z_k^2+\left(\frac{\mathrm{E}_{p_{X^n}}\!\left[\tfrac{X_k}{\sqrt{\sigma^2}}\mathcal{N}\!\left(z^n+\tfrac{X^n}{\sqrt{\sigma^2}};0,1\right)\right]}{\mathrm{E}_{p_{X^n}}\!\left[\mathcal{N}\!\left(z^n+\tfrac{X^n}{\sqrt{\sigma^2}};0,1\right)\right]}\right)^2\right) dz^n \qquad (139)$$
$$\overset{(c)}{=} 2n + 2\int_{\mathbb{R}^n} \mathcal{N}(z^n;0,1)\sum_{k=1}^n\left(\frac{\mathrm{E}_{p_{X^n}}\!\left[\tfrac{X_k}{\sqrt{\sigma^2}}\mathcal{N}\!\left(z^n+\tfrac{X^n}{\sqrt{\sigma^2}};0,1\right)\right]}{\mathrm{E}_{p_{X^n}}\!\left[\mathcal{N}\!\left(z^n+\tfrac{X^n}{\sqrt{\sigma^2}};0,1\right)\right]}\right)^2 dz^n, \qquad (140)$$
where
(a) follows from the Gaussian Poincaré inequality [5, Equation (2.16)], which states that for an $n$-dimensional tuple $Z^n$ consisting of independent standard Gaussian random variables and any differentiable mapping $f:\mathbb{R}^n\to\mathbb{R}$ such that $\mathrm{E}_{p_{Z^n}}[(f(Z^n))^2]<\infty$ and $\mathrm{E}_{p_{Z^n}}[\|\nabla f(Z^n)\|^2]<\infty$, where $\nabla f$ denotes the gradient of $f$,
$$\mathrm{Var}_{p_{Z^n}}[f(Z^n)] \le \mathrm{E}_{p_{Z^n}}\!\left[\|\nabla f(Z^n)\|^2\right]. \qquad (141)$$
(b) follows from the fact that $(a+b)^2 \le 2(a^2+b^2)$ for all real numbers $a$ and $b$.
(c) follows from the fact that $\mathrm{E}_{p_{Z^n}}\!\left[\sum_{k=1}^n Z_k^2\right] = n$ when $Z^n$ is distributed according to $p_{Z^n}(z^n) = \mathcal{N}(z^n;0,1)$.

Following (140) and defining for each $z^n\in\mathbb{R}^n$ the distribution $\tilde{p}_{X^n|Z^n=z^n}$ as
$$\tilde{p}_{X^n|Z^n=z^n}(x^n) \triangleq \frac{p_{X^n}(x^n)\,\mathcal{N}\!\left(z^n+\tfrac{x^n}{\sqrt{\sigma^2}};0,1\right)}{\mathrm{E}_{p_{X^n}}\!\left[\mathcal{N}\!\left(z^n+\tfrac{X^n}{\sqrt{\sigma^2}};0,1\right)\right]}, \qquad (142)$$
we consider the following chain of inequalities for each $z^n\in\mathbb{R}^n$:
$$\sum_{k=1}^n\left(\frac{\mathrm{E}_{p_{X^n}}\!\left[\tfrac{X_k}{\sqrt{\sigma^2}}\mathcal{N}\!\left(z^n+\tfrac{X^n}{\sqrt{\sigma^2}};0,1\right)\right]}{\mathrm{E}_{p_{X^n}}\!\left[\mathcal{N}\!\left(z^n+\tfrac{X^n}{\sqrt{\sigma^2}};0,1\right)\right]}\right)^2 \overset{(142)}{=} \sum_{k=1}^n\left(\mathrm{E}_{\tilde{p}_{X^n|Z^n=z^n}}\!\left[\frac{X_k}{\sqrt{\sigma^2}}\right]\right)^2 \qquad (143)$$
$$\le \sum_{k=1}^n \mathrm{E}_{\tilde{p}_{X^n|Z^n=z^n}}\!\left[\left(\frac{X_k}{\sqrt{\sigma^2}}\right)^2\right] \qquad (144)$$
$$= \frac{1}{\sigma^2}\,\mathrm{E}_{\tilde{p}_{X^n|Z^n=z^n}}\!\left[\|X^n\|^2\right] \qquad (145)$$
$$\overset{(a)}{\le} \frac{\kappa}{\sigma^2}, \qquad (146)$$
where (a) follows from (133) and (142). Combining (140) and (146), we obtain (64).
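The Gaussian Poincaré inequality (141) can be illustrated with a simple smooth test function. The sketch below (not the mixture density from the lemma; $f(z^n)=\sum_k \sin z_k$ is chosen only because both sides of (141) have closed forms) compares $\mathrm{Var}[f(Z^n)] = n(1-e^{-2})/2$ against $\mathrm{E}[\|\nabla f(Z^n)\|^2] = n(1+e^{-2})/2$, both exactly and by Monte Carlo:

```python
import math
import random

# f(z^n) = sum_k sin(z_k), so grad f = (cos(z_1), ..., cos(z_n)).
# Closed forms for Z_k ~ N(0, 1):
#   Var[f]            = n * Var[sin Z]   = n * (1 - e^{-2}) / 2
#   E[||grad f||^2]   = n * E[cos^2 Z]   = n * (1 + e^{-2}) / 2
n = 10
var_exact = n * (1.0 - math.exp(-2.0)) / 2.0
grad_exact = n * (1.0 + math.exp(-2.0)) / 2.0

random.seed(2)
trials = 20000
vals, grads = [], []
for _ in range(trials):
    z = [random.gauss(0.0, 1.0) for _ in range(n)]
    vals.append(sum(math.sin(v) for v in z))
    grads.append(sum(math.cos(v) ** 2 for v in z))
mean = sum(vals) / trials
var_mc = sum((v - mean) ** 2 for v in vals) / trials
grad_mc = sum(grads) / trials

# Both the exact values and the Monte Carlo estimates satisfy (141).
print(var_exact <= grad_exact, var_mc <= grad_mc)
```

In the proof above, the same inequality is applied with $f$ equal to the log of a Gaussian-mixture density, whose gradient is exactly the ratio of expectations appearing in (137).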

APPENDIX B
PROOF OF PROPOSITION 5

Let $p_{W_I,X^n,Y^n,\hat{W}_I}$ denote the probability distribution induced by the $(n, M_I^{(n)}, P, \varepsilon)_{\max}$-code. For each $(i,j)\in\{(1,2),(2,1)\}$, consider the following chain of inequalities:
$$\log M_i^{(n)} \overset{(a)}{=} H_{p_{W_i}}(W_i) \qquad (147)$$
$$= H_{p_{W_i,Y_i^n}}(W_i|Y_i^n) + I_{p_{W_i,Y_i^n}}(W_i; Y_i^n) \qquad (148)$$
$$\overset{(b)}{\le} 1 + \varepsilon\log M_i^{(n)} + I_{p_{W_i,Y_i^n}}(W_i; Y_i^n) \qquad (149)$$
$$\overset{(c)}{\le} 1 + \varepsilon\log M_i^{(n)} + I_{p_{W_I,Y_i^n}}(W_i; Y_i^n|W_j) \qquad (150)$$
$$\le 1 + \varepsilon\log M_i^{(n)} + \sum_{k=1}^n\left(h_{p_{Y_{i,k}}}(Y_{i,k}) - h_{p_{W_I,Y_i^k}}(Y_{i,k}|W_I, Y_i^{k-1})\right) \qquad (151)$$
$$\overset{(d)}{=} 1 + \varepsilon\log M_i^{(n)} + \sum_{k=1}^n\left(h_{p_{Y_{i,k}}}(Y_{i,k}) - h_{p_{W_I,X_k,Y_i^k}}(Y_{i,k}|W_I, X_k, Y_i^{k-1})\right) \qquad (152)$$
$$\overset{(e)}{=} 1 + \varepsilon\log M_i^{(n)} + \sum_{k=1}^n\left(h_{p_{Y_{i,k}}}(Y_{i,k}) - h_{p_{X_k,Y_{i,k}}}(Y_{i,k}|X_k)\right), \qquad (153)$$
where
(a) follows from (11), which states that $W_i$ is uniform on $\mathcal{W}_i$.
(b) follows from Fano's inequality.
(c) follows from (11), which states that $W_1$ and $W_2$ are independent.
(d) follows from (12), which states that $X_k$ is a function of $W_I$.
(e) follows from (16), which states that $(W_I, Y_i^{k-1}) \to X_k \to Y_{i,k}$ forms a Markov chain when they are distributed according to $p_{W_I,X_k,Y_i^k}$.
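Step (b) above invokes Fano's inequality in the form $H(W_i|Y_i^n) \le 1 + \varepsilon\log M_i^{(n)}$. A minimal numerical sketch (the message-set size and error probability below are illustrative, and logarithms are taken in nats) uses a symmetric toy decoder that errs uniformly over the wrong messages, for which Fano's bound $h(\varepsilon) + \varepsilon\log(M-1)$ holds with equality and is in turn dominated by the looser $1 + \varepsilon\log M$:

```python
import math

def entropy(probs):
    """Shannon entropy in nats, ignoring zero masses."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

M, eps = 4, 0.1  # toy message-set size and error probability

# Symmetric "decoder": correct with probability 1 - eps, otherwise uniform
# over the other M - 1 messages. This attains Fano's bound with equality.
posterior = [1.0 - eps] + [eps / (M - 1)] * (M - 1)
H_W_given_Y = entropy(posterior)  # H(W|Y) is the same for every observation y

fano = entropy([eps, 1.0 - eps]) + eps * math.log(M - 1)
loose = 1.0 + eps * math.log(M)  # the bound used in step (b), in nats

print(abs(H_W_given_Y - fano) < 1e-12, H_W_given_Y <= loose)
```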

Following (153) and defining
$$p_{Z_i^n}(z_i^n) \triangleq \mathcal{N}(z_i^n; 0, \sigma_i^2) \qquad (154)$$
for each $i\in\mathcal{I}$, we consider the following chain of inequalities for each $i\in\mathcal{I}$:
$$\sum_{k=1}^n\left(h_{p_{Y_{i,k}}}(Y_{i,k}) - h_{p_{X_k,Y_{i,k}}}(Y_{i,k}|X_k)\right)$$
$$\overset{(a)}{=} \sum_{k=1}^n\left(h_{p_{X_k}p_{Z_{i,k}}}(X_{i,k}+Z_{i,k}) - h_{p_{X_k}p_{Z_{i,k}}}(X_{i,k}+Z_{i,k}|X_k)\right) \qquad (155)$$
$$= \sum_{k=1}^n\left(h_{p_{X_k}p_{Z_{i,k}}}(X_{i,k}+Z_{i,k}) - h_{p_{Z_{i,k}}}(Z_{i,k})\right) \qquad (156)$$
$$\overset{(154)}{=} \sum_{k=1}^n\left(h_{p_{X_k}p_{Z_{i,k}}}(X_{i,k}+Z_{i,k}) - \frac{1}{2}\log(2\pi e\sigma_i^2)\right) \qquad (157)$$
$$\overset{(b)}{\le} \sum_{k=1}^n\left(\frac{1}{2}\log\!\left(2\pi e\,\mathrm{Var}_{p_{X_k}p_{Z_{i,k}}}[X_{i,k}+Z_{i,k}]\right) - \frac{1}{2}\log(2\pi e\sigma_i^2)\right) \qquad (158)$$
$$\overset{(c)}{=} \sum_{k=1}^n \frac{1}{2}\log\left(1+\frac{\mathrm{Var}_{p_{X_k}}[X_k]}{\sigma_i^2}\right) \qquad (159)$$
$$\le \sum_{k=1}^n \frac{1}{2}\log\left(1+\frac{\mathrm{E}_{p_{X_k}}[X_k^2]}{\sigma_i^2}\right) \qquad (160)$$
$$\overset{(d)}{\le} \frac{n}{2}\log\left(1+\frac{\mathrm{E}_{p_{X^n}}\!\left[\frac{1}{n}\sum_{k=1}^n X_k^2\right]}{\sigma_i^2}\right) \qquad (161)$$
$$\overset{(13)}{\le} \frac{n}{2}\log\left(1+\frac{P}{\sigma_i^2}\right), \qquad (162)$$
where
(a) follows from (15) and (17), which imply that $Y_{i,k}$ and $X_{i,k}+Z_{i,k}$ have the same distribution when $Y_{i,k}$ is distributed according to $p_{Y_{i,k}}$ and $(X_{i,k}, Z_{i,k})$ is distributed according to $p_{X_k}p_{Z_{i,k}}$.
(b) follows from the fact that the differential entropy of a random variable $X$ is always upper bounded by that of the zero-mean Gaussian random variable whose variance equals $\mathrm{Var}[X]$.
(c) follows from (154), which gives $\mathrm{Var}_{p_{X_k}p_{Z_{i,k}}}[X_{i,k}+Z_{i,k}] = \mathrm{Var}_{p_{X_k}}[X_{i,k}] + \sigma_i^2$.
(d) follows from Jensen's inequality.

Combining (153) and (162), we obtain for each $i\in\mathcal{I}$
$$\log M_i^{(n)} \le 1 + \varepsilon\log M_i^{(n)} + \frac{n}{2}\log\left(1+\frac{P}{\sigma_i^2}\right), \qquad (163)$$

which then implies (65).

ACKNOWLEDGMENT
The authors are supported by an NUS grant (R-263-000-A98-750/133), an NUS Young Investigator Award (R-263-000-B37-133), and a Ministry of Education (MOE) Tier 2 grant (R-263-000-B61-112). The authors would like to thank Prof. Shun Watanabe for pointing out a possible extension of our result, which leads to Remark 2.

REFERENCES
[1] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge, U.K.: Cambridge University Press, 2012.
[2] T. Cover, "Broadcast channels," IEEE Trans. on Inf. Theory, vol. 18, no. 1, pp. 2–14, 1972.
[3] P. P. Bergmans, "A simple converse for broadcast channels with additive white Gaussian noise," IEEE Trans. Inf. Theory, vol. 20, no. 2, pp. 279–280, 1974.
[4] A. J. Stam, "Some inequalities satisfied by the quantities of information of Fisher and Shannon," Inf. Control, vol. 2, no. 2, pp. 101–112, 1959.
[5] M. Ledoux, "Concentration of measure and logarithmic Sobolev inequalities," in Séminaire de Probabilités, vol. 33, 1999, pp. 120–216.
[6] M. Raginsky and I. Sason, "Concentration of measure inequalities in information theory, communications and coding," Foundations and Trends in Communications and Information Theory, vol. 10, no. 1–2, 2013.
[7] Y. Polyanskiy and S. Verdú, "Empirical distribution of good channel codes with nonvanishing error probability," IEEE Trans. on Inf. Theory, vol. 60, no. 1, pp. 5–21, 2014.
[8] V. Y. F. Tan, "Asymptotic estimates in information theory with non-vanishing error probabilities," Foundations and Trends in Communications and Information Theory, vol. 11, no. 1–2, 2014.
[9] J. Scarlett and V. Y. F. Tan, "Second-order asymptotics for the Gaussian MAC with degraded message sets," in Proc. of the IEEE Intl. Symp. on Inf. Theory, 2014, arXiv:1310.1197 [cs.IT].
[10] R. Ahlswede, P. Gács, and J. Körner, "Bounds on conditional probabilities with applications in multi-user communication," Z. Wahrscheinlichkeitstheorie verw. Gebiete, vol. 34, no. 3, pp. 157–177, 1976.
[11] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge University Press, 2011.
[12] G. Dueck, "The strong converse coding theorem for the multiple-access channel," Journal of Combinatorics, Information & System Sciences, vol. 6, no. 3, pp. 187–196, 1981.
[13] H. Tyagi and P. Narayan, "The Gelfand–Pinsker channel: Strong converse and upper bound for the reliability function," in Proc. of the IEEE Intl. Symp. on Inf. Theory, Seoul, Korea, Jun. 2009, pp. 1954–1957.
[14] S. Boucheron, G. Lugosi, and P. Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence. Cambridge University Press, 2013.
[15] C. E. Shannon, "Certain results in coding theory for noisy channels," Inf. Control, vol. 1, pp. 6–25, 1957.
[16] S. Verdú and T. S. Han, "A general formula for channel capacity," IEEE Trans. on Inf. Theory, vol. 40, no. 4, pp. 1147–1157, 1994.