

On the Empirical Output Distribution of ε-Good Codes for Gaussian Channels under a Long-Term Power Constraint

Silas L. Fong and Vincent Y. F. Tan

arXiv:1510.08544v1 [cs.IT] 29 Oct 2015

Abstract—This paper considers additive white Gaussian noise (AWGN) channels under the long-term (also called the average-over-the-codebook) power constraint. It is shown that the relative entropy between the output distribution induced by any sequence of codes with non-vanishing maximal probabilities of error and the $n$-fold product of the capacity-achieving output distribution, denoted by $D(p_{Y^n} \| p^*_{Y^n})$, is upper bounded by $nC - \log M_n + O(n^{2/3})$, where $M_n$ is the number of codewords, $C$ is the AWGN channel capacity and $n$ is the blocklength of the code. Consequently, $\frac{1}{n} D(p_{Y^n} \| p^*_{Y^n})$ converges to 0 as $n$ tends to infinity for any sequence of capacity-achieving codes with non-vanishing maximal error probabilities, which settles an open question posed by Polyanskiy and Verdú (2014). Additionally, we extend our convergence result for capacity-achieving codes with non-vanishing maximal error probabilities to quasi-static fading channels.

I. INTRODUCTION

In information and coding theory, the search for good codes for various classes of channels is of paramount importance. By "good" we mean that the code is reliable, i.e., its (average or maximal) probability of error is arbitrarily small when the blocklength is sufficiently large. In addition to being good, the communication engineer would also like the code to be optimal in the sense that the rate of the code (the ratio of the logarithm of the number of codewords to the blocklength) converges to the channel capacity as the blocklength grows. The search for optimal good codes for memoryless channels, however, is known to be challenging and has remained elusive for decades. As a result, information theorists have resorted to characterizing the nature or properties of good codes that are asymptotically optimal. One of the useful characterizations is in terms of the so-called approximation of output statistics [1], studied by Han and Verdú. In [1, Theorem 15], general channels [2] that satisfy the strong converse property and whose input alphabet is finite were considered. For this class of channels, it was shown that for reliable codes whose rates approach the channel capacity, the normalized relative entropy between $p_{Y^n}$, the output distribution induced by the code, and $p^*_{Y^n}$, the $n$-fold product of the (unique) capacity-achieving output distribution, converges to zero, i.e.,

$$\lim_{n\to\infty} \frac{1}{n} D(p_{Y^n} \| p^*_{Y^n}) = 0. \qquad (1)$$

This result implies that good capacity-achieving codes must necessarily be such that their empirical output distribution is close to the maximum mutual information output distribution in the sense of (1). Thus, to find optimal codes, a communication engineer can, and indeed must, restrict his/her search to this class of codes. The seminal work by Han and Verdú [1] was subsequently generalized by Shamai and Verdú [3], who lifted the restriction concerning the finiteness of the input alphabet of the channel. They showed that (1) holds under the condition that the capacity of the general channel can be written as

$$C = \lim_{n\to\infty} \sup_{p_{X^n}} \frac{1}{n} I(X^n; Y^n). \qquad (2)$$

Indeed, when the input alphabet is finite and the strong converse property holds, the capacity [1, Theorem 8] is given by the expression in (2). In yet another generalization, Polyanskiy and Verdú [4] studied the properties of ε-good codes under the maximal probability of error formalism and with deterministic encoders. These are codes whose maximal probabilities of error are bounded above by a non-vanishing constant ε ∈ [0, 1). The study of ε-good codes has gained prominence

Silas L. Fong and Vincent Y. F. Tan are with the Department of Electrical and Computer Engineering, National University of Singapore (NUS), Singapore (e-mail: {silas_fong,vtan}@nus.edu.sg). Vincent Y. F. Tan is also with the Department of Mathematics, NUS.


recently due to the interest in the finite blocklength regime [5] and second-order asymptotics [5]–[7]. Polyanskiy and Verdú showed that despite this generalization, the approximation in (1) continues to hold for a large class of channels including discrete memoryless channels (DMCs) [4, Theorems 6 & 7] and additive white Gaussian noise (AWGN) channels under the peak power constraint [4, Theorem 8], i.e., the requirement that every transmitted codeword $(x_1, x_2, \ldots, x_n)$ satisfy

$$\frac{1}{n}\sum_{i=1}^{n} x_i^2 \le P \qquad (3)$$

for some power P > 0. Furthermore, sharper approximations than (1) were provided. For example, it was shown that for codes of size $M_n$ on a channel with capacity C, the approximation in (1) may be refined to

$$D(p_{Y^n} \| p^*_{Y^n}) \le nC - \log M_n + O(\sqrt{n}) \qquad (4)$$

for certain classes of DMCs (specifically, those with positive entries) and AWGN channels under the peak power constraint in (3). Observe that for any sequence of capacity-achieving codes, which by definition satisfies $\frac{1}{n}\log M_n \to C$, equation (1) is a direct consequence of (4). We note that the upper bound on the relative entropy in (4) fails to hold under the average error probability formalism or if stochastic encoders are allowed [4, Remark 5]. It is also of interest, especially for wireless fading channels [8], to study codebooks whose cost constraints are imposed in the average-over-the-codebook sense, i.e.,

$$\frac{1}{M_n}\sum_{w=1}^{M_n}\Bigg(\frac{1}{n}\sum_{i=1}^{n} x_i^2(w)\Bigg) \le P. \qquad (5)$$

This is also known in the wireless communication community as the long-term power constraint [8], [9]. This more relaxed constraint is useful and practical in wireless communication as it allows for the dynamic allocation of available power based on the current state (fading statistics) of the channel. The question of whether the property in (1) continues to hold for ε-good codes when the cost constraint is as in (5) was left as an open question in the work by Polyanskiy and Verdú [4, Remark 8].

A. Main Contribution

The main contribution of this work is a resolution of the above open question in the affirmative. We show that (1) continues to hold for ε-good codes under the long-term power constraint in (5). Indeed, we show that

$$D(p_{Y^n} \| p^*_{Y^n}) \le nC - \log M_n + O(n^{2/3}) \qquad (6)$$

under the long-term power constraint. The main technical tool we employ is a version of the Gaussian Poincaré inequality. This concentration-of-measure inequality was also used for the peak power constraint case in [4, Theorem 8] and more recently to establish the strong converse for the Gaussian broadcast channel [10]. However, because the long-term power constraint in (5) is more general than the peak power constraint in (3), several other careful approximations have to be made in the present work. In particular, before applying the Gaussian Poincaré inequality we have to appropriately choose a set of codewords which excludes certain high-power codewords and whose probability is asymptotically one. Additionally, we extend our result to quasi-static fading channels under the maximal probability of error formalism by showing that (1) continues to hold.

B. Related Work

In Shamai and Verdú [3, Equations (10)–(12) and Section V], it was already mentioned that (1) holds when there is a cost constraint over the whole codebook and the average (as well as maximal) error probabilities are vanishing. Thus, the present work generalizes Shamai and Verdú's work [3] to the case where the maximal error probabilities are non-vanishing. In another closely related work, Raginsky and Sason [11], [12] used alternative concentration-of-measure techniques to improve the constant term in the O(·) notation in (4) for DMCs.

C. Paper Outline

This paper is organized as follows. The next subsection presents the notation used in this paper. Section II provides the problem formulation of the AWGN channel under a long-term power constraint and presents our


main result – an upper bound on the divergence between the output distribution and the capacity-achieving output distribution. Section III contains preliminary results for establishing the upper bound, which include the Gaussian Poincaré inequality and a multi-letter converse bound. Section IV presents the proof of the upper bound. Section V extends our main result for the AWGN channel to the quasi-static fading channel.

D. Notation

We use Pr{E} to represent the probability of the event E, and we let 1{E} be the characteristic function of E. We use a capital letter X to denote an arbitrary random variable with alphabet $\mathcal{X}$, and use the small letter x to denote a realization of X. We use $X^n$ to denote a random tuple $(X_1, X_2, \ldots, X_n)$, where the components $X_k$ have the same alphabet $\mathcal{X}$. The following notations are used for any arbitrary random variables X and Y. We let $p_X$ and $p_{Y|X}$ denote the probability distribution of X and the conditional probability distribution of Y given X respectively. We let $\Pr_{p_X}\{X \in \mathcal{A}\}$ denote $\int p_X(x)\,\mathbf{1}\{x \in \mathcal{A}\}\,dx$ for any set $\mathcal{A}$. For any real-valued mapping g whose domain includes $\mathcal{X}$, the expectation and the variance of g(X) are denoted by $E_{p_X}[g(X)]$ and $\mathrm{Var}_{p_X}[g(X)] \triangleq E_{p_X}[(g(X)-E_{p_X}[g(X)])^2]$ respectively. We let $p_X p_{Y|X}$ denote the joint distribution of (X, Y), i.e., $p_X p_{Y|X}(x, y) = p_X(x)\,p_{Y|X}(y|x)$ for all x and y. We let $\mathcal{N}(\cdot\,; \mu, \sigma^2) : \mathbb{R} \to [0, \infty)$ be the probability density function of a Gaussian random variable Z with mean µ and variance σ², i.e.,

$$\mathcal{N}(z; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(z-\mu)^2}{2\sigma^2}}. \qquad (7)$$

Similarly, we let $\mathcal{N}(\cdot\,; \mu, \sigma^2) : \mathbb{R}^n \to [0, \infty)$ be the joint probability density function of n independent copies of $Z \sim \mathcal{N}(\cdot\,; \mu, \sigma^2)$, i.e.,

$$\mathcal{N}(z^n; \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\sum_{k=1}^{n}\frac{(z_k-\mu)^2}{2\sigma^2}}. \qquad (8)$$

We take all logarithms to base e throughout this paper, so all information quantities have units of nats. The sets of natural, real and non-negative real numbers are denoted by $\mathbb{N}$, $\mathbb{R}$ and $\mathbb{R}^+$ respectively. The Euclidean norm of a tuple $x^n \in \mathbb{R}^n$ is denoted by $\|x^n\| \triangleq \sqrt{\sum_{k=1}^{n} x_k^2}$.

II. ADDITIVE WHITE GAUSSIAN NOISE CHANNEL UNDER A LONG-TERM POWER CONSTRAINT

We consider an additive white Gaussian noise (AWGN) channel that consists of one source and one destination, denoted by s and d respectively. Node s transmits information to node d in n time slots as follows. Node s chooses a message W from the set

$$\mathcal{W} \triangleq \{1, 2, \ldots, M_n\} \qquad (9)$$

and sends W to node d, where $M_n = |\mathcal{W}|$. We assume that W is uniformly distributed over $\mathcal{W}$. The encoder at s is denoted by a mapping $f : \mathcal{W} \to \mathbb{R}^n$, where f(W) is the codeword corresponding to W, and the codebook is the set $\{f(w) \mid w \in \mathcal{W}\}$. The codebook should satisfy the long-term power constraint

$$\frac{1}{M_n}\sum_{w\in\mathcal{W}} \|f(w)\|^2 \le nP \qquad (10)$$

for some fixed P > 0. Let $X_k$ denote the kth coordinate of f(W). Then for each $k \in \{1, 2, \ldots, n\}$, node s transmits $X_k$ in time slot k and node d receives

$$Y_k \triangleq X_k + Z_k, \qquad (11)$$

where $Z_1, Z_2, \ldots, Z_n$ are n independent copies of the standard Gaussian random variable. In addition, $X^n$ and $Z^n$ are assumed to be independent. After n time slots, node d declares $\hat{W}$ to be the transmitted W based on $Y^n$.

Definition 1: An $(n, M_n, P)$-code consists of the following:

1) A message set $\mathcal{W} \triangleq \{1, 2, \ldots, M_n\}$ at node s. Message W is uniform on $\mathcal{W}$. (12)

2) An encoding function

$$f : \mathcal{W} \to \mathbb{R}^n, \qquad (13)$$

where f is the encoding function at node s for encoding W such that

$$X^n = f(W). \qquad (14)$$

The codebook, defined to be $\{f(w) \mid w \in \mathcal{W}\}$, should satisfy the long-term power constraint

$$\frac{1}{M_n}\sum_{w\in\mathcal{W}} \|f(w)\|^2 \le nP. \qquad (15)$$

3) A decoding function

$$\varphi : \mathbb{R}^n \to \mathcal{W}, \qquad (16)$$

where φ is the decoding function for W at node d such that

$$\hat{W} = \varphi(Y^n). \qquad (17)$$

For each $w \in \mathcal{W}$, the decoding region for message w is defined to be

$$\mathcal{D}^{(n)}(w) \triangleq \{y^n \in \mathbb{R}^n \mid \varphi(y^n) = w\}. \qquad (18)$$
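As a side note, the distinction between the long-term constraint (15) and a per-codeword (peak) constraint can be illustrated numerically. The following sketch is our own illustration and not part of the paper; the Gaussian codebook and all parameter values are assumptions chosen only for the check.

```python
import numpy as np

def satisfies_long_term_power(codebook: np.ndarray, P: float) -> bool:
    """Check the long-term power constraint (15):
    (1/M_n) * sum_w ||f(w)||^2 <= n * P, for a codebook of shape (M, n)."""
    M, n = codebook.shape
    return float(np.sum(codebook ** 2)) / M <= n * P

rng = np.random.default_rng(0)
M, n, P = 64, 100, 2.0
# i.i.d. N(0, 0.9*P) entries: the codebook-averaged power stays below n*P,
# although individual codewords may exceed the peak threshold n*P.
codebook = rng.normal(0.0, np.sqrt(0.9 * P), size=(M, n))

assert satisfies_long_term_power(codebook, P)
# Peak constraint (3) need not hold for every codeword under (15):
peak_ok = bool(np.all(np.sum(codebook ** 2, axis=1) <= n * P))
```

The point of the sketch is simply that (15) constrains only the codebook average, so it is strictly weaker than the peak constraint examined in [4, Theorem 8].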

Definition 2: An additive white Gaussian noise (AWGN) channel is characterized by the conditional probability density function $q_{Y|X}$ satisfying

$$q_{Y|X}(y|x) = \mathcal{N}(y - x; 0, 1) \qquad (19)$$

such that the following holds for any $(n, M_n, P)$-code:

$$p_{W,X^n,Y^n}(w, x^n, y^n) = p_{W,X^n}(w, x^n)\prod_{k=1}^{n} p_{Y_k|X_k}(y_k|x_k) \qquad (20)$$

for all $w \in \mathcal{W}$, $x^n \in \mathbb{R}^n$ and $y^n \in \mathbb{R}^n$, where

$$p_{Y_k|X_k}(y_k|x_k) = q_{Y|X}(y_k|x_k). \qquad (21)$$

Since $p_{Y_k|X_k}(y_k|x_k)$ does not depend on k by (21), the channel is stationary.

For any $(n, M_n, P)$-code defined on the AWGN channel, let $p_{W,X^n,Y^n,\hat{W}}$ be the joint distribution induced by the code. We can factorize $p_{W,X^n,Y^n,\hat{W}}$ as follows:

$$p_{W,X^n,Y^n,\hat{W}} \stackrel{(a)}{=} p_{W,X^n,Y^n}\, p_{\hat{W}|Y^n} \qquad (22)$$
$$\stackrel{(20)}{=} p_{W,X^n}\Bigg(\prod_{k=1}^{n} p_{Y_k|X_k}\Bigg) p_{\hat{W}|Y^n}, \qquad (23)$$

where (a) follows from the fact that $\hat{W}$ is a function of $Y^n$ by Definition 1.
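To build intuition for the capacity-achieving output distribution used throughout the paper, one can simulate the channel law (11) with a Gaussian input. The Monte-Carlo sketch below is our own illustration (the input distribution, seed and sample size are assumptions, not part of the paper): with $X \sim \mathcal{N}(0, P)$ and standard Gaussian noise, the output is $\mathcal{N}(0, 1+P)$, i.e., exactly the per-letter capacity-achieving output distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
num_samples, P = 200_000, 1.5

# Capacity-achieving input X ~ N(0, P); channel (11): Y = X + Z with Z ~ N(0, 1).
X = rng.normal(0.0, np.sqrt(P), size=num_samples)
Z = rng.normal(0.0, 1.0, size=num_samples)
Y = X + Z

# The induced output should be close to N(0, 1 + P) in mean and variance.
assert abs(float(Y.mean())) < 0.05
assert abs(float(Y.var()) - (1 + P)) < 0.05
```

This is the distribution denoted $p^*_{Y}$ (and, in product form, $p^*_{Y^n}$) in the sequel; the paper's results quantify how close the output of an arbitrary ε-good code must be to it.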

Definition 3: For an $(n, M_n, P)$-code defined on the AWGN channel, we can calculate according to (23) the maximal probability of decoding error, defined as $\max_{w\in\mathcal{W}} \Pr\{\hat{W} \ne w \mid W = w\}$. We call an $(n, M_n, P)$-code with maximal probability of decoding error no larger than ε an $(n, M_n, P, \varepsilon)_{\max}$-code. Similarly, we can calculate the average probability of decoding error, defined as $\Pr\{\hat{W} \ne W\}$. We call an $(n, M_n, P)$-code with average probability of decoding error no larger than ε an $(n, M_n, P, \varepsilon)_{\mathrm{avg}}$-code.

Definition 4: Let ε ∈ (0, 1) be a real number. A rate R is ε-achievable for the AWGN channel if there exists a sequence of $(n, M_n, P, \varepsilon)_{\max}$-codes on the AWGN channel such that

$$\liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R. \qquad (24)$$


Definition 5: Let ε ∈ (0, 1) be a real number. The ε-capacity of the AWGN channel, denoted by $C_\varepsilon$, is defined to be $C_\varepsilon \triangleq \sup\{R \mid R \text{ is } \varepsilon\text{-achievable}\}$. The capacity is defined to be $\inf_{\varepsilon\in(0,1)} C_\varepsilon$.

Define

$$C(P) \triangleq \frac{1}{2}\log(1 + P) \qquad (25)$$

to be the capacity of the AWGN channel. It was shown in [13, Theorem 73] that

$$C_\varepsilon = C(P) \qquad (26)$$

for all ε ∈ (0, 1). This justifies the following definition of capacity-achieving codes.

Definition 6: For each ε ∈ (0, 1), a sequence of $(n, M_n, P, \varepsilon)_{\max}$-codes is said to be capacity-achieving if

$$\lim_{n\to\infty} \frac{1}{n}\log M_n = C(P). \qquad (27)$$

The following theorem is the main result in this paper, and its proof will be presented in Section IV.

Theorem 1: Fix an ε ∈ (0, 1) and a sequence of $(n, M_n, P, \varepsilon)_{\max}$-codes. For each n ∈ ℕ, define

$$p^*_{Y^n}(y^n) \triangleq \prod_{k=1}^{n} \mathcal{N}(y_k; 0, 1 + P) \qquad (28)$$

to be the n-fold product of the capacity-achieving output distribution, and let $p_{Y^n}$ be the output distribution induced by the $(n, M_n, P, \varepsilon)_{\max}$-code on the AWGN channel. Then for all $n \ge \frac{2}{1-\varepsilon}$,

$$D(p_{Y^n} \| p^*_{Y^n}) \le nC(P) - \log M_n + \xi n^{2/3}, \qquad (29)$$

where $\xi \triangleq \frac{17(1+P)}{1-\varepsilon}$. In particular, if the sequence of codes is capacity-achieving, then

$$\lim_{n\to\infty} \frac{1}{n} D(p_{Y^n} \| p^*_{Y^n}) = 0. \qquad (30)$$
Remark 1: The proof of Theorem 1 relies on a multi-letter converse bound based on a logarithmic Sobolev inequality known as the Gaussian Poincaré inequality. The Gaussian Poincaré inequality and the multi-letter converse bound will be introduced in the following two sections respectively, followed by the proof of Theorem 1 in Section IV.

Remark 2: It was shown in [3, Section V] that (30) holds if the average as well as maximal error probabilities vanish, which was proved by using Fano's inequality. Here, we strengthen the result in [3, Section V] under the maximal error formalism by showing that (29) holds for non-vanishing error probabilities, and our proof technique involves a combination of an information spectrum bound, the Gaussian Poincaré inequality and Fano's inequality.

Remark 3: It has been proved in [4, Theorem 8] by using the Gaussian Poincaré inequality that (30) holds when the long-term power constraint (15) is replaced by the peak power constraint

$$\max_{w\in\mathcal{W}} \|f(w)\|^2 \le nP. \qquad (31)$$

Here, we strengthen the result in [4, Theorem 8] by proving (30) under the long-term power constraint (15).

Remark 4: It was conjectured in [4, Remark 8] that (30) need not hold under the long-term power constraint (15). We show that (30) does hold by establishing the non-asymptotic bound in (29).

Remark 5: If the maximal error probability criterion were replaced with the average error probability criterion (cf. Definition 3), or if the encoder were allowed to be stochastic (cf. Definition 1), (30) would no longer hold. This can be seen from the counterexamples suggested in [4, Remark 8], which are repeated below to facilitate understanding. Fix any ε ∈ (0, 1). We will show that there exists a sequence of capacity-achieving $(n, M_n, P, \varepsilon'_n)_{\mathrm{avg}}$-codes such that


$\lim_{n\to\infty} \frac{1}{n} D(p_{Y^n} \| p^*_{Y^n}) \ge \frac{\varepsilon}{4}\log(1+P) > 0$. Construct a sequence of capacity-achieving $(n, M_n, P, \varepsilon_n)_{\max}$-codes under the peak power constraint (31) such that $\lim_{n\to\infty}\varepsilon_n = 0$, which is always possible due to the channel coding theorem. Then, replace a fraction ε/2 of the codewords with the codeword $(\sqrt{P}, \sqrt{P}, \ldots, \sqrt{P})$ in each $(n, M_n, P, \varepsilon_n)_{\max}$-code, and let $(n, M_n, P, \varepsilon'_n)_{\mathrm{avg}}$ denote the resultant code. Define

$$S \triangleq \mathbf{1}\big\{X^n = (\sqrt{P}, \sqrt{P}, \ldots, \sqrt{P})\big\} \qquad (32)$$

for the $(n, M_n, P, \varepsilon'_n)_{\mathrm{avg}}$-code. Straightforward calculations reveal that the average error probabilities $\varepsilon'_n$ are upper bounded by ε for all sufficiently large n. However, (30) does not hold for the sequence of $(n, M_n, P, \varepsilon'_n)_{\mathrm{avg}}$-codes due to the following chain of inequalities:

$$D(p_{Y^n} \| p^*_{Y^n}) = D(p_{Y^n|S} \| p^*_{Y^n} | p_S) - I_{p_{S,Y^n}}(S; Y^n) \qquad (33)$$
$$\ge D(p_{Y^n|S} \| p^*_{Y^n} | p_S) - \log 2 \qquad (34)$$
$$= D(p_{Y^n|S=1} \| p^*_{Y^n})\, p_S(1) + D(p_{Y^n|S=0} \| p^*_{Y^n})\, p_S(0) - \log 2 \qquad (35)$$
$$\stackrel{(a)}{=} nD(p_{Y_1|X_1=\sqrt{P}} \| p^*_{Y_1})\, p_S(1) + D(p_{Y^n|S=0} \| p^*_{Y^n})\, p_S(0) - \log 2 \qquad (36)$$
$$\stackrel{(b)}{=} \frac{n\varepsilon}{4}\log(1+P) + D(p_{Y^n|S=0} \| p^*_{Y^n})\, p_S(0) - \log 2 \qquad (37)$$
$$\stackrel{(c)}{=} \frac{n\varepsilon}{4}\log(1+P) + o(n) - \log 2, \qquad (38)$$

where
(a) follows from (21), (28) and (32);
(b) follows from the fact that $p_S(1) = \varepsilon/2$ and the fact that $D(p_{Y_1|X_1=\sqrt{P}} \| p^*_{Y_1}) = \frac{1}{2}\log(1+P)$ by (19) and (28);
(c) follows from Theorem 1 and the fact that the sequence of codes formed by expurgating a fraction ε/2 of the codewords from the $(n, M_n, P, \varepsilon'_n)_{\mathrm{avg}}$-codes is capacity-achieving (recall that the sequence of $(n, M_n, P, \varepsilon'_n)_{\mathrm{avg}}$-codes is capacity-achieving).

On the other hand, if the encoders are allowed to be stochastic, then we can modify the encoders of the sequence of capacity-achieving $(n, M_n, P, \varepsilon_n)_{\max}$-codes with $\lim_{n\to\infty}\varepsilon_n = 0$ in such a way that the modified encoders output the codeword $(\sqrt{P}, \sqrt{P}, \ldots, \sqrt{P})$ with probability ε/2 regardless of the chosen message, and output the original codeword corresponding to the message with probability 1 − ε/2. Consequently, the resultant maximal error probabilities are upper bounded by ε for sufficiently large n. However, we have $\lim_{n\to\infty} \frac{1}{n} D(p_{Y^n} \| p^*_{Y^n}) \ge \frac{\varepsilon}{4}\log(1+P) > 0$ by arguments similar to the chain of inequalities leading to (38).

III. PRELIMINARIES FOR THE PROOF OF THEOREM 1

A. Information Spectrum Bounds

The following lemma is a non-asymptotic lower bound on the maximal probability of decoding error in terms of an information spectrum quantity.

Lemma 1: Fix an $(n, M_n, P, \varepsilon)_{\max}$-code with decoding regions $\{\mathcal{D}^{(n)}(w) \mid w \in \mathcal{W}\}$ and let $p_{W,X^n,Y^n,\hat{W}}$ denote the probability distribution induced by the code. Let $\mathcal{A}$ be a subset of $\mathcal{W}$, define $p_{E|W}$ to be the conditional distribution such that

$$p_{E|W}(e|w) \triangleq \begin{cases} 1 & \text{if } e = 1 \text{ and } w \in \mathcal{A}, \\ 1 & \text{if } e = 0 \text{ and } w \notin \mathcal{A}, \\ 0 & \text{otherwise}, \end{cases} \qquad (39)$$

and define

$$p_{W,E,X^n,Y^n,\hat{W}} \triangleq p_{W,X^n,Y^n,\hat{W}}\, p_{E|W}. \qquad (40)$$

Equation (39) implies that E is the indicator random variable $\mathbf{1}\{W \in \mathcal{A}\}$. Fix a real number γ(w) for each $w \in \mathcal{A}$. Then for each $w \in \mathcal{A}$, we have

$$\Pr_{p_{Y^n|W=w,E=1}}\bigg\{\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)} \le \log M_n - \gamma(w)\bigg\} \le ne^{-\gamma(w)} + \varepsilon + \mathbf{1}\bigg\{M_n \int_{\mathcal{D}^{(n)}(w)} p_{Y^n|E}(y^n|1)\,dy^n > n\bigg\}. \qquad (41)$$

Proof: Fix a $w \in \mathcal{A}$ and a real number γ(w). Consider the following chain of inequalities:

$$\Pr_{p_{Y^n|W=w,E=1}}\bigg\{\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)} \le \log M_n - \gamma(w)\bigg\}$$
$$\stackrel{(a)}{\le} \Pr_{p_{Y^n|W=w,E=1}}\bigg\{\Big\{\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)} \le \log M_n - \gamma(w)\Big\} \cap \big\{Y^n \in \mathcal{D}^{(n)}(w)\big\}\bigg\} + \Pr_{p_{Y^n|W=w}}\big\{Y^n \notin \mathcal{D}^{(n)}(w)\big\} \qquad (42)$$
$$\le \Pr_{p_{Y^n|W=w,E=1}}\bigg\{\Big\{\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)} \le \log M_n - \gamma(w)\Big\} \cap \big\{Y^n \in \mathcal{D}^{(n)}(w)\big\}\bigg\} \times \mathbf{1}\bigg\{M_n \int_{\mathcal{D}^{(n)}(w)} p_{Y^n|E}(y^n|1)\,dy^n \le n\bigg\}$$
$$\qquad + \mathbf{1}\bigg\{M_n \int_{\mathcal{D}^{(n)}(w)} p_{Y^n|E}(y^n|1)\,dy^n > n\bigg\} + \Pr_{p_{Y^n|W=w}}\big\{Y^n \notin \mathcal{D}^{(n)}(w)\big\}, \qquad (43)$$

where (a) follows from the union bound. In order to bound the first term in (43), consider the following chain of inequalities:

$$\Pr_{p_{Y^n|W=w,E=1}}\bigg\{\Big\{\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)} \le \log M_n - \gamma(w)\Big\} \cap \big\{Y^n \in \mathcal{D}^{(n)}(w)\big\}\bigg\}$$
$$= \int_{\mathcal{D}^{(n)}(w)} p_{Y^n|W,E}(y^n|w,1)\,\mathbf{1}\bigg\{\log\frac{p_{Y^n|W,E}(y^n|w,1)}{p_{Y^n|E}(y^n|1)} \le \log M_n - \gamma(w)\bigg\}\,dy^n \qquad (44)$$
$$\le M_n e^{-\gamma(w)} \int_{\mathcal{D}^{(n)}(w)} p_{Y^n|E}(y^n|1)\,\mathbf{1}\bigg\{\log\frac{p_{Y^n|W,E}(y^n|w,1)}{p_{Y^n|E}(y^n|1)} \le \log M_n - \gamma(w)\bigg\}\,dy^n \qquad (45)$$
$$\le M_n e^{-\gamma(w)} \int_{\mathcal{D}^{(n)}(w)} p_{Y^n|E}(y^n|1)\,dy^n, \qquad (46)$$

which implies that

$$\Pr_{p_{Y^n|W=w,E=1}}\bigg\{\Big\{\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)} \le \log M_n - \gamma(w)\Big\} \cap \big\{Y^n \in \mathcal{D}^{(n)}(w)\big\}\bigg\} \times \mathbf{1}\bigg\{M_n \int_{\mathcal{D}^{(n)}(w)} p_{Y^n|E}(y^n|1)\,dy^n \le n\bigg\} \le ne^{-\gamma(w)}. \qquad (47)$$

In addition,

$$\Pr_{p_{Y^n|W=w}}\big\{Y^n \notin \mathcal{D}^{(n)}(w)\big\} \le \varepsilon \qquad (48)$$

because the maximal probability of decoding error of the code is no larger than ε. Combining (43), (47) and (48), we obtain (41).

Note that the definitions of $\mathcal{A}$ and E may seem redundant in Lemma 1 because the property of $\mathcal{A}$ plays no role in the proof. However, a judicious choice of $\mathcal{A}$ later (cf. (67) in the proof of Theorem 2) will enable us to simplify the bound in (41). The following corollary is a direct consequence of Lemma 1 with an appropriate choice of γ(w).

Corollary 2: Fix an ε ∈ (0, 1), fix an $(n, M_n, P, \varepsilon)_{\max}$-code and let $p_{W,X^n,Y^n,\hat{W}}$ denote the probability distribution induced by the code. Let $\mathcal{A}$ be a subset of $\mathcal{W}$, and define $p_{E|W}$ and $p_{W,E,X^n,Y^n,\hat{W}}$ as in (39) and (40) respectively. In addition, define

$$\mathcal{B} \triangleq \bigg\{w \in \mathcal{W}\ \bigg|\ M_n \int_{\mathcal{D}^{(n)}(w)} p_{Y^n|E}(y^n|1)\,dy^n \le n\bigg\} \qquad (49)$$

to be a subset of $\mathcal{W}$. Then,

$$|\mathcal{W} \setminus \mathcal{B}| \le \frac{M_n}{n}. \qquad (50)$$

In addition, for each $n \ge \frac{2}{1-\varepsilon}$, we have for each $w \in \mathcal{A} \cap \mathcal{B}$

$$\log M_n \le E_{p_{Y^n|W=w,E=1}}\bigg[\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)}\bigg] + \sqrt{\frac{2}{1-\varepsilon}\,\mathrm{Var}_{p_{Y^n|W=w,E=1}}\bigg[\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)}\bigg]} + 2\log n. \qquad (51)$$

Proof: We first prove (50). To this end, consider the following chain of inequalities:

$$\Pr_{p_W}\{W \notin \mathcal{B}\} \stackrel{(49)}{=} \Pr_{p_W}\bigg\{M_n \int_{\mathcal{D}^{(n)}(W)} p_{Y^n|E}(y^n|1)\,dy^n > n\bigg\} \qquad (52)$$
$$\stackrel{(a)}{\le} \frac{1}{n}\, E_{p_W}\bigg[M_n \int_{\mathcal{D}^{(n)}(W)} p_{Y^n|E}(y^n|1)\,dy^n\bigg] \qquad (53)$$
$$\stackrel{(b)}{=} \frac{1}{n} \sum_{w\in\mathcal{W}} \int_{\mathcal{D}^{(n)}(w)} p_{Y^n|E}(y^n|1)\,dy^n \qquad (54)$$
$$\stackrel{(c)}{\le} \frac{1}{n}, \qquad (55)$$

where
(a) follows from Markov's inequality;
(b) follows from Definition 1, which implies that

$$p_W(w) = \frac{1}{M_n} \qquad (56)$$

for all $w \in \mathcal{W}$;
(c) follows from Definition 1, which implies that the decoding regions $\{\mathcal{D}^{(n)}(w) \mid w \in \mathcal{W}\}$ are disjoint.

Consequently, (50) follows from (56) and (55).

We now prove (51). For each $w \in \mathcal{A} \cap \mathcal{B}$, define

$$\gamma(w) \triangleq \log M_n - E_{p_{Y^n|W=w,E=1}}\bigg[\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)}\bigg] - \sqrt{\frac{2}{1-\varepsilon}\,\mathrm{Var}_{p_{Y^n|W=w,E=1}}\bigg[\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)}\bigg]}. \qquad (57)$$

By Chebyshev's inequality, we have for each $w \in \mathcal{A} \cap \mathcal{B}$

$$\Pr_{p_{Y^n|W=w,E=1}}\bigg\{\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)} \le \log M_n - \gamma(w)\bigg\} \ge 1 - \frac{\mathrm{Var}_{p_{Y^n|W=w,E=1}}\Big[\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)}\Big]}{\Big(\log M_n - \gamma(w) - E_{p_{Y^n|W=w,E=1}}\Big[\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)}\Big]\Big)^2} \qquad (58)$$
$$\stackrel{(57)}{=} 1 - \frac{1-\varepsilon}{2}. \qquad (59)$$

Combining (41) in Lemma 1, (49), (57) and (59), we obtain

$$\frac{1-\varepsilon}{2} \le n e^{-\gamma(w)}. \qquad (60)$$

If $n \ge \frac{2}{1-\varepsilon}$, it follows from (60) that

$$\frac{1}{n} \le n e^{-\gamma(w)}, \qquad (61)$$

which then implies (51).

B. The Gaussian Poincaré Inequality

In the proof of the main theorem, we need the following lemma, which is based on the Gaussian Poincaré inequality, to bound the variance term in (51). The proof of the following lemma is contained in [4, Section III-C] and a complete proof can be found in [10, Lemma 4].

Lemma 3: Let n be a natural number and let σ² be a positive number. Let $p_W$ be a probability distribution defined on some finite set $\mathcal{A}$, and let $g : \mathcal{A} \to \mathbb{R}^n$ be a mapping. In addition, define $p_{Z^n}$ to be the distribution of n independent copies of the zero-mean Gaussian random variable with variance σ², i.e., $p_{Z^n}(z^n) \triangleq \mathcal{N}(z^n; 0, \sigma^2)$ for all $z^n \in \mathbb{R}^n$. Suppose there exists a $0 \le \kappa < \infty$ such that

$$\max_{w\in\mathcal{A}} \|g(w)\|^2 \le \kappa. \qquad (62)$$

Then, we have

$$\mathrm{Var}_{p_{Z^n}}\big[\log E_{p_W}[p_{Z^n}(Z^n + g(W)) \mid Z^n]\big] \le 2\Big(n + \frac{\kappa}{\sigma^2}\Big). \qquad (63)$$
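A Monte-Carlo sanity check of the variance bound (63) can be run in a few lines. The sketch below is our own illustration, not part of the paper: the mapping g, the uniform choice of $p_W$, and all parameter values are assumptions made only for the test. Additive constants do not affect the variance, so the Gaussian density is evaluated up to a constant via a log-sum-exp.

```python
import numpy as np

rng = np.random.default_rng(2)
n, A_size, sigma2 = 3, 5, 1.0

# An arbitrary mapping g : A -> R^n and its squared-norm bound kappa, eq. (62).
g = rng.normal(0.0, 1.0, size=(A_size, n))
kappa = float(np.max(np.sum(g ** 2, axis=1)))

# Samples Z^n ~ N(0, sigma^2 I_n).
Z = rng.normal(0.0, np.sqrt(sigma2), size=(200_000, n))

# log E_W[p_{Z^n}(Z^n + g(W))] up to an additive constant, with W uniform on A.
a = -np.sum((Z[:, None, :] + g[None, :, :]) ** 2, axis=2) / (2 * sigma2)
m = a.max(axis=1)
log_mix = m + np.log(np.exp(a - m[:, None]).sum(axis=1))  # stable log-sum-exp

emp_var = float(log_mix.var())
# Gaussian Poincare bound (63): Var <= 2 * (n + kappa / sigma^2).
assert emp_var <= 2 * (n + kappa / sigma2)
```

The empirical variance sits comfortably below the bound, which is consistent with (63) being a worst-case concentration estimate rather than a tight equality.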

In the proof of the main theorem, we will make a judicious choice of $\mathcal{A}$ so that for all $w \in \mathcal{A}$, the upper bound on $\log M_n$ in Corollary 2 can be applied and simplified. For those $w \notin \mathcal{A}$, we will use the following upper bound, whose proof is standard (by the use of Fano's inequality [14, Section 2.1]) and therefore omitted. Readers who are interested in the proof may refer to [10, Proposition 5].

Proposition 4: Fix an $(n, M_n, P, \varepsilon)_{\max}$-code and let $p_{W,X^n,Y^n,\hat{W}}$ be the distribution induced by the code. Then,

$$I_{p_{X^n,Y^n}}(X^n; Y^n) \le \frac{n}{2}\log(1 + P). \qquad (64)$$

C. A Multi-Letter Converse Bound

The following multi-letter converse bound for the AWGN channel, which is based on the Gaussian Poincaré inequality, is the key to the proof of Theorem 1. The proof relies on Corollary 2 with an appropriate choice of $\mathcal{A}$, Lemma 3 and Proposition 4.

Theorem 2: Fix an ε ∈ (0, 1). Suppose we are given a sequence of $(n, M_n, P, \varepsilon)_{\max}$-codes, and let $p_{W,X^n,Y^n,\hat{W}}$ be the distribution induced by the $(n, M_n, P, \varepsilon)_{\max}$-code. Then for all $n \ge \frac{2}{1-\varepsilon}$,

$$\log M_n \le I_{p_{X^n,Y^n}}(X^n; Y^n) + \xi n^{2/3}, \qquad (65)$$

where $\xi \triangleq \frac{17(1+P)}{1-\varepsilon}$.

Proof: Fix an ε ∈ (0, 1) and fix an n ∈ ℕ such that

$$n \ge \frac{2}{1-\varepsilon}. \qquad (66)$$

Fix the corresponding $(n, M_n, P, \varepsilon)_{\max}$-code and let $p_{W,X^n,Y^n,\hat{W}}$ be the distribution induced by the code. Define

$$\mathcal{A} \triangleq \big\{w \in \mathcal{W}\ \big|\ \|f(w)\|^2 \le n^{4/3}P\big\} \qquad (67)$$

so that all the codewords in $\{f(w) \mid w \in \mathcal{A}\}$ have power no greater than $n^{1/3}P$, i.e., $\frac{1}{n}\|f(w)\|^2 \le n^{1/3}P$. Define $p_{E|W}$ and $p_{W,E,X^n,Y^n,\hat{W}}$ as in (39) and (40) so that the following two statements hold:


(i) The distribution induced by the $(n, M_n, P, \varepsilon)_{\max}$-code is a marginal distribution of $p_{W,E,X^n,Y^n,\hat{W}}$, and

$$p_{W,E,X^n,Y^n,\hat{W}} = p_{W,X^n,Y^n,\hat{W}}\, p_{E|W} \qquad (68)$$
$$\stackrel{(23)}{=} p_{W,X^n}\, p_{E|W}\Bigg(\prod_{k=1}^{n} p_{Y_k|X_k}\Bigg) p_{\hat{W}|Y^n}. \qquad (69)$$

(ii) (W, E) is distributed according to $p_{W,E}$ such that

$$E = \mathbf{1}\{W \in \mathcal{A}\}. \qquad (70)$$

In order to obtain an upper bound on the probability of W falling outside $\mathcal{A}$, consider

$$\Pr_{p_W}\{W \notin \mathcal{A}\} \stackrel{(67)}{=} \Pr_{p_W}\big\{\|f(W)\|^2 > n^{4/3}P\big\} \qquad (71)$$
$$\stackrel{(a)}{\le} \frac{E_{p_W}\big[\|f(W)\|^2\big]}{n^{4/3}P} \qquad (72)$$
$$\stackrel{(b)}{\le} \frac{1}{n^{1/3}}, \qquad (73)$$

where (a) follows from Markov's inequality, and (b) follows from (56) and the long-term power constraint (15).

In order to prove (65), we construct $\mathcal{B}$ as in (49) (the construction of $\mathcal{B}$ depends on our choice of $\mathcal{A}$ in (67) through the indicator random variable E) and consider the following chain of inequalities:

$$\log M_n = \sum_{w\in\mathcal{A}} p_W(w)\log M_n + \sum_{w\in\mathcal{W}\setminus\mathcal{A}} p_W(w)\log M_n \qquad (74)$$
$$\stackrel{(73)}{\le} \frac{1}{n^{1/3}}\log M_n + \sum_{w\in\mathcal{A}} p_W(w)\log M_n \qquad (75)$$
$$= \frac{1}{n^{1/3}}\log M_n + \sum_{w\in\mathcal{A}\cap\mathcal{B}} p_W(w)\log M_n + \sum_{w\in\mathcal{A}\setminus\mathcal{B}} p_W(w)\log M_n \qquad (76)$$
$$\stackrel{(a)}{\le} \frac{1}{n^{1/3}}\log M_n + \frac{1}{n}\log M_n + \sum_{w\in\mathcal{A}\cap\mathcal{B}} p_W(w)\log M_n \qquad (77)$$
$$\le \frac{2}{n^{1/3}}\log M_n + \sum_{w\in\mathcal{A}\cap\mathcal{B}} p_W(w)\log M_n \qquad (78)$$
$$\stackrel{(70)}{=} \frac{2}{n^{1/3}}\log M_n + \sum_{w\in\mathcal{A}\cap\mathcal{B}} p_{W,E}(w,1)\log M_n \qquad (79)$$
$$\stackrel{(b)}{\le} \frac{2}{n^{1/3}}\log M_n + \sum_{w\in\mathcal{A}\cap\mathcal{B}} p_{W,E}(w,1)\Bigg(E_{p_{Y^n|W=w,E=1}}\bigg[\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)}\bigg] + \sqrt{\frac{2}{1-\varepsilon}\,\mathrm{Var}_{p_{Y^n|W=w,E=1}}\bigg[\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)}\bigg]} + 2\log n\Bigg), \qquad (80)$$

where
(a) follows from (50) in Corollary 2 and (56), which imply $\sum_{w\in\mathcal{A}\setminus\mathcal{B}} p_W(w) \le \sum_{w\in\mathcal{W}\setminus\mathcal{B}} p_W(w) \le \frac{1}{n}$;
(b) follows from (66) and Corollary 2.

Following (80) and letting f be the encoding function of the $(n, M_n, P, \varepsilon)_{\max}$-code (cf. Definition 1), we consider the following chain of inequalities for each $w \in \mathcal{A}$:

$$\mathrm{Var}_{p_{Y^n|W=w,E=1}}\big[\log p_{Y^n|W,E}(Y^n|w,1)\big] \stackrel{(a)}{=} \mathrm{Var}_{p_{Y^n|X^n=f(w)}}\big[\log p_{Y^n|X^n}(Y^n|f(w))\big] \qquad (81)$$


$$\stackrel{(b)}{=} \int_{\mathbb{R}^n} \mathcal{N}(z^n; 0, 1)\big(\log \mathcal{N}(z^n; 0, 1)\big)^2\,dz^n - \bigg(\int_{\mathbb{R}^n} \mathcal{N}(z^n; 0, 1)\log \mathcal{N}(z^n; 0, 1)\,dz^n\bigg)^2 \qquad (82)$$
$$\stackrel{(c)}{=} n, \qquad (83)$$

where (a) follows from the fact that for each $w \in \mathcal{A}$ and each $y^n \in \mathbb{R}^n$,

$$p_{Y^n|W,E}(y^n|w,1) = \int_{\mathbb{R}^n} p_{X^n,Y^n|W,E}(x^n, y^n|w,1)\,dx^n \qquad (84)$$
$$= \int_{\mathbb{R}^n} p_{X^n|W,E}(x^n|w,1)\, p_{Y^n|W,E,X^n}(y^n|w,1,x^n)\,dx^n \qquad (85)$$
$$= \int_{\mathbb{R}^n} \mathbf{1}\{x^n = f(w)\}\, p_{Y^n|W,E,X^n}(y^n|w,1,x^n)\,dx^n \qquad (86)$$
$$\stackrel{(69)}{=} \int_{\mathbb{R}^n} \mathbf{1}\{x^n = f(w)\}\, p_{Y^n|X^n}(y^n|x^n)\,dx^n \qquad (87)$$
$$= p_{Y^n|X^n}(y^n|f(w)); \qquad (88)$$

(b) follows by letting $z^n \triangleq y^n - f(w)$ together with the following fact implied by Definition 2: for each $x^n \in \mathbb{R}^n$ and each $y^n \in \mathbb{R}^n$,

$$p_{Y^n|X^n}(y^n|x^n) = \mathcal{N}(y^n - x^n; 0, 1); \qquad (89)$$

and (c) follows from viewing the difference of the two terms in (82) as $\mathrm{Var}_{p_{Z^n}}[\log p_{Z^n}(Z^n)]$.

In addition, following (80), we consider the chain of inequalities below for each $w \in \mathcal{A}$:

$$\mathrm{Var}_{p_{Y^n|W=w,E=1}}\big[\log p_{Y^n|E}(Y^n|1)\big] = \mathrm{Var}_{p_{Y^n|W=w,E=1}}\bigg[\log \sum_{\tilde{w}\in\mathcal{W}} p_{W|E}(\tilde{w}|1)\, p_{Y^n|W,E}(Y^n|\tilde{w},1)\bigg] \qquad (90)$$
$$\stackrel{(a)}{=} \mathrm{Var}_{p_{Y^n|W=w,E=1}}\bigg[\log \sum_{\tilde{w}\in\mathcal{A}} p_{W|E}(\tilde{w}|1)\, p_{Y^n|W,E}(Y^n|\tilde{w},1)\bigg] \qquad (91)$$
$$\stackrel{(88)}{=} \mathrm{Var}_{p_{Y^n|X^n=f(w)}}\bigg[\log \sum_{\tilde{w}\in\mathcal{A}} p_{W|E}(\tilde{w}|1)\, p_{Y^n|X^n}(Y^n|f(\tilde{w}))\bigg] \qquad (92)$$
$$= \mathrm{Var}_{p_{Y^n|X^n=f(w)}}\bigg[\log \sum_{\tilde{w}\in\mathcal{A}} p_{W|E}(\tilde{w}|1)\, p_{Y^n|X^n}\big(Y^n\,\big|\,f(w) - (f(w) - f(\tilde{w}))\big)\bigg] \qquad (93)$$
$$\stackrel{(b)}{=} \int_{\mathbb{R}^n} \mathcal{N}(z^n; 0, 1)\bigg(\log \sum_{\tilde{w}\in\mathcal{A}} p_{W|E}(\tilde{w}|1)\,\mathcal{N}(z^n + f(w) - f(\tilde{w}); 0, 1)\bigg)^2 dz^n - \bigg(\int_{\mathbb{R}^n} \mathcal{N}(z^n; 0, 1)\log \sum_{\tilde{w}\in\mathcal{A}} p_{W|E}(\tilde{w}|1)\,\mathcal{N}(z^n + f(w) - f(\tilde{w}); 0, 1)\,dz^n\bigg)^2 \qquad (94)$$
$$\stackrel{(c)}{\le} 2\Big(n + \max_{\tilde{w}\in\mathcal{A}} \|f(w) - f(\tilde{w})\|^2\Big) \qquad (95)$$
$$\stackrel{(d)}{\le} 2\Big(n + 2\big(\max_{\tilde{w}\in\mathcal{A}} \|f(w)\|^2 + \max_{\tilde{w}\in\mathcal{A}} \|f(\tilde{w})\|^2\big)\Big) \qquad (96)$$
$$\stackrel{(67)}{\le} 2n + 8Pn^{4/3}, \qquad (97)$$

where


(a) follows from the fact that for all $\tilde{w} \in \mathcal{W} \setminus \mathcal{A}$,

$$p_{W|E}(\tilde{w}|1) = \frac{p_W(\tilde{w})\, p_{E|W}(1|\tilde{w})}{p_E(1)} \qquad (98)$$
$$\stackrel{(70)}{=} 0; \qquad (99)$$

(b) follows by letting $z^n \triangleq y^n - f(w)$ together with the following fact implied by Definition 2: for each $x^n \in \mathbb{R}^n$ and each $y^n \in \mathbb{R}^n$,

$$p_{Y^n|X^n}(y^n|x^n) = \mathcal{N}(y^n - x^n; 0, 1); \qquad (100)$$

(c) follows from viewing the difference of the two terms in (94) as

$$\mathrm{Var}_{p_{Z^n}}\big[\log E_{p_{W|E=1}}[p_{Z^n}(Z^n + f(w) - f(W))]\big], \qquad (101)$$

where $p_{W|E=1}$ is a distribution on $\mathcal{A}$ by (99), and from Lemma 3 by letting

$$g(\tilde{w}) \triangleq f(w) - f(\tilde{w}) \qquad (102)$$

and

$$\kappa \triangleq \max_{\tilde{w}\in\mathcal{A}} \|f(w) - f(\tilde{w})\|^2; \qquad (103)$$

(d) follows from the fact that $\|u^n + v^n\|^2 \le 2(\|u^n\|^2 + \|v^n\|^2)$ for any $(u^n, v^n) \in \mathbb{R}^n \times \mathbb{R}^n$.

For each $w \in \mathcal{A}$, since

$$\mathrm{Var}_{p_{Y^n|W=w,E=1}}\bigg[\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)}\bigg] \le 2\Big(\mathrm{Var}_{p_{Y^n|W=w,E=1}}\big[\log p_{Y^n|W,E}(Y^n|w,1)\big] + \mathrm{Var}_{p_{Y^n|W=w,E=1}}\big[\log p_{Y^n|E}(Y^n|1)\big]\Big) \qquad (104)$$
$$\stackrel{(a)}{\le} 3n + 8Pn^{4/3} \qquad (105)$$
$$\le (3 + 8P)n^{4/3}, \qquad (106)$$

where (a) follows from (83) and (97), it follows from (80) that

$$\log M_n \le \frac{2}{n^{1/3}}\log M_n + \sum_{w\in\mathcal{A}\cap\mathcal{B}} p_{W,E}(w,1)\Bigg(E_{p_{Y^n|W=w,E=1}}\bigg[\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)}\bigg] + \sqrt{\frac{2(3+8P)n^{4/3}}{1-\varepsilon}} + 2\log n\Bigg)$$
$$\le \frac{2}{n^{1/3}}\log M_n + \sum_{w\in\mathcal{A}\cap\mathcal{B}} p_{W,E}(w,1)\, E_{p_{Y^n|W=w,E=1}}\bigg[\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)}\bigg] + \sqrt{\frac{2(3+8P)n^{4/3}}{1-\varepsilon}} + 2\log n. \qquad (107)$$

Following (107), consider

$$\sum_{w\in\mathcal{A}\cap\mathcal{B}} p_{W,E}(w,1)\, E_{p_{Y^n|W=w,E=1}}\bigg[\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)}\bigg]$$
$$\stackrel{(70)}{=} \sum_{w\in\mathcal{A}} p_{W,E}(w,1)\, E_{p_{Y^n|W=w,E=1}}\bigg[\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)}\bigg] - \sum_{w\in\mathcal{A}\setminus\mathcal{B}} p_{W,E}(w,1)\, E_{p_{Y^n|W=w,E=1}}\bigg[\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)}\bigg] \qquad (108)$$
$$\stackrel{(70)}{=} E_{p_{W,E,Y^n}}\bigg[\log\frac{p_{Y^n|W,E}(Y^n|W,E)}{p_{Y^n|E}(Y^n|E)}\bigg] - \sum_{w\in\mathcal{W}\setminus\mathcal{A}} p_{W,E}(w,0)\, E_{p_{Y^n|W=w,E=0}}\bigg[\log\frac{p_{Y^n|W,E}(Y^n|w,0)}{p_{Y^n|E}(Y^n|0)}\bigg] - \sum_{w\in\mathcal{A}\setminus\mathcal{B}} p_{W,E}(w,1)\, E_{p_{Y^n|W=w,E=1}}\bigg[\log\frac{p_{Y^n|W,E}(Y^n|w,1)}{p_{Y^n|E}(Y^n|1)}\bigg] \qquad (109)$$


$$= E_{p_{W,E,Y^n}}\bigg[\log\frac{p_{Y^n|W,E}(Y^n|W,E)}{p_{Y^n|E}(Y^n|E)}\bigg] - \sum_{w\in\mathcal{W}\setminus\mathcal{A}} p_{W,E}(w,0)\, D(p_{Y^n|W=w,E=0} \| p_{Y^n|E=0}) - \sum_{w\in\mathcal{A}\setminus\mathcal{B}} p_{W,E}(w,1)\, D(p_{Y^n|W=w,E=1} \| p_{Y^n|E=1}) \qquad (110)$$
$$\le E_{p_{W,E,Y^n}}\bigg[\log\frac{p_{Y^n|W,E}(Y^n|W,E)}{p_{Y^n|E}(Y^n|E)}\bigg], \qquad (111)$$

which, together with (107), implies that

$$\log M_n \le E_{p_{W,E,Y^n}}\bigg[\log\frac{p_{Y^n|W,E}(Y^n|W,E)}{p_{Y^n|E}(Y^n|E)}\bigg] + \frac{2}{n^{1/3}}\log M_n + \sqrt{\frac{2(3+8P)n^{4/3}}{1-\varepsilon}} + 2\log n. \qquad (112)$$

In order to simplify (112), consider

$$E_{p_{W,E,Y^n}}\bigg[\log\frac{p_{Y^n|W,E}(Y^n|W,E)}{p_{Y^n|E}(Y^n|E)}\bigg] = I_{p_{W,E,Y^n}}(W; Y^n|E) \qquad (113)$$
$$\le I_{p_{W,E,Y^n}}(W, E; Y^n) \qquad (114)$$
$$= I_{p_{W,Y^n}}(W; Y^n) + I_{p_{W,E,Y^n}}(E; Y^n|W) \qquad (115)$$
$$\stackrel{(70)}{\le} I_{p_{W,Y^n}}(W; Y^n) + H_{p_E}(E) \qquad (116)$$
$$\le I_{p_{W,Y^n}}(W; Y^n) + 1 \qquad (117)$$
$$\stackrel{(a)}{\le} I_{p_{X^n,Y^n}}(X^n; Y^n) + 1 \qquad (118)$$

and

$$\log M_n \stackrel{(b)}{\le} \frac{1}{1-\varepsilon}\Big(1 + \frac{n}{2}\log(1+P)\Big), \qquad (119)$$

where
(a) follows from the fact, implied by (69), that $W \to X^n \to Y^n$ forms a Markov chain when $(W, X^n, Y^n)$ is distributed according to $p_{W,X^n,Y^n}$;
(b) follows from Proposition 4.

Combining (112), (118) and (119), we obtain

$$\log M_n \le I_{p_{X^n,Y^n}}(X^n; Y^n) + \frac{n^{2/3}\log(1+P)}{1-\varepsilon} + \frac{2}{(1-\varepsilon)n^{1/3}} + \sqrt{\frac{2(3+8P)}{1-\varepsilon}}\, n^{2/3} + 2\log n + 1 \qquad (120)$$
$$= I_{p_{X^n,Y^n}}(X^n; Y^n) + n^{2/3}\Bigg(\frac{\log(1+P)}{1-\varepsilon} + \sqrt{\frac{2(3+8P)}{1-\varepsilon}} + \frac{2\log n + 1}{n^{2/3}} + \frac{2}{(1-\varepsilon)n}\Bigg) \qquad (121)$$
$$< I_{p_{X^n,Y^n}}(X^n; Y^n) + n^{2/3}\Bigg(\frac{\log(1+P)}{1-\varepsilon} + \sqrt{\frac{2(3+8P)}{1-\varepsilon}} + 3 + \frac{2}{1-\varepsilon}\Bigg). \qquad (122)$$

Since

$$\frac{\log(1+P)}{1-\varepsilon} + \sqrt{\frac{2(3+8P)}{1-\varepsilon}} + 3 + \frac{2}{1-\varepsilon} \qquad (123)$$
$$< \frac{1}{1-\varepsilon}\Big(\log(1+P) + \sqrt{2(3+8P)} + 5\Big) \qquad (124)$$
$$\le \frac{1}{1-\varepsilon}\big(P + 2(3+8P) + 5\big) \qquad (125)$$
$$< \frac{17(1+P)}{1-\varepsilon}, \qquad (126)$$

14

it follows from (122) that Theorem 2 holds by letting ξ,

.

17(1 + P ) . 1−ε
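The elementary constant comparison in (123)-(126) can be sanity-checked numerically. The following sketch is not part of the paper's argument; it merely evaluates both sides of the chain over a grid of hypothetical values of P and ε (logs in nats, matching (125)):

```python
import math

def lhs(P: float, eps: float) -> float:
    # The n^{2/3} coefficient obtained in (122):
    # log(1+P)/(1-eps) + 2/(1-eps) + sqrt(2(3+8P)/(1-eps)) + 3.
    return (math.log(1.0 + P) / (1.0 - eps)
            + 2.0 / (1.0 - eps)
            + math.sqrt(2.0 * (3.0 + 8.0 * P) / (1.0 - eps))
            + 3.0)

def xi(P: float, eps: float) -> float:
    # The final constant in (126)-(127): xi = 17(1+P)/(1-eps).
    return 17.0 * (1.0 + P) / (1.0 - eps)

# The strict bound (123)-(126) should hold for every P > 0 and eps in (0,1).
for P in (0.1, 1.0, 10.0, 100.0):
    for eps in (0.01, 0.5, 0.99):
        assert lhs(P, eps) < xi(P, eps)
print("constant check (123)-(126) passed")
```

The grid is illustrative only; the inequalities themselves hold for all P > 0 and ε ∈ (0, 1).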

IV. PROOF OF THEOREM 1

Fix an ε ∈ (0, 1). Suppose we are given a sequence of $(n, M_n, P, \varepsilon)_{\max}$-codes, and let $p_{W,X^n,Y^n,\hat W}$ be the distribution induced by the $(n, M_n, P, \varepsilon)_{\max}$-code. For each n ∈ ℕ, define $p^*_{Y^n}(y^n)$ to be the product of the capacity-achieving output distribution as in (28). In order to prove (29), we follow the standard steps and consider
\begin{align}
D(p_{Y^n}\|p^*_{Y^n}) &= D(p_{Y^n|X^n}\|p^*_{Y^n}|p_{X^n}) - D(p_{Y^n|X^n}\|p_{Y^n}|p_{X^n}) && (128)\\
&= D(p_{Y^n|X^n}\|p^*_{Y^n}|p_{X^n}) - I_{p_{X^n,Y^n}}(X^n;Y^n). && (129)
\end{align}
By Theorem 2, we have for all $n \ge \frac{2}{1-\varepsilon}$
$$\log M_n \le I_{p_{X^n,Y^n}}(X^n;Y^n) + \xi n^{2/3} \quad (130)$$
where $\xi \triangleq \frac{17(1+P)}{1-\varepsilon}$, which implies from (129) that
$$D(p_{Y^n}\|p^*_{Y^n}) \le D(p_{Y^n|X^n}\|p^*_{Y^n}|p_{X^n}) - \log M_n + \xi n^{2/3}. \quad (131)$$

Consider the following chain of inequalities for each n ∈ N:

\begin{align}
&D(p_{Y^n|X^n}\|p^*_{Y^n}|p_{X^n})\notag\\
&= \int_{\mathbb{R}^n}\int_{\mathbb{R}^n} p_{X^n}(x^n)\,p_{Y^n|X^n}(y^n|x^n)\log\frac{p_{Y^n|X^n}(y^n|x^n)}{p^*_{Y^n}(y^n)}\,\mathrm{d}y^n\,\mathrm{d}x^n && (132)\\
&\overset{(a)}{=} \int_{\mathbb{R}^n}\int_{\mathbb{R}^n} p_{X^n}(x^n)\,\mathcal{N}(y^n - x^n; 0, 1)\log\frac{\mathcal{N}(y^n - x^n; 0, 1)}{\mathcal{N}(y^n; 0, 1+P)}\,\mathrm{d}y^n\,\mathrm{d}x^n && (133)\\
&\overset{(b)}{=} \int_{\mathbb{R}^n}\int_{\mathbb{R}^n} p_{X^n}(x^n)\,\mathcal{N}(z^n; 0, 1)\log\frac{\mathcal{N}(z^n; 0, 1)}{\mathcal{N}(z^n + x^n; 0, 1+P)}\,\mathrm{d}z^n\,\mathrm{d}x^n && (134)\\
&\overset{(8)}{=} \frac{n}{2}\log(1+P) + \frac{1}{2(1+P)}\int_{\mathbb{R}^n} p_{X^n}(x^n)\sum_{k=1}^n\int_{\mathbb{R}}\mathcal{N}(z_k; 0, 1)\left(-Pz_k^2 + 2x_kz_k + x_k^2\right)\mathrm{d}z_k\,\mathrm{d}x^n && (135)\\
&= \frac{n}{2}\log(1+P) + \frac{-nP + \int_{\mathbb{R}^n} p_{X^n}(x^n)\sum_{k=1}^n x_k^2\,\mathrm{d}x^n}{2(1+P)} && (136)\\
&\overset{(c)}{\le} \frac{n}{2}\log(1+P) && (137)
\end{align}
where
(a) follows from Definition 2 and (28);
(b) follows from letting $z^n = y^n - x^n$;
(c) follows from letting $f$ be the encoding function of the $(n, M_n, P, \varepsilon)_{\max}$-code and the fact that
$$\int_{\mathbb{R}^n} p_{X^n}(x^n)\sum_{k=1}^n x_k^2\,\mathrm{d}x^n \overset{(15)}{=} \sum_{w\in\mathcal{W}} p_W(w)\|f(w)\|^2 \le nP. \quad (138)$$
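The single-letter computation behind (135)-(137) reduces to a closed-form Gaussian relative entropy. The sketch below (illustrative only; the toy codebook is hypothetical and not from the paper) verifies that $D(\mathcal{N}(x,1)\,\|\,\mathcal{N}(0,1+P)) = \frac{1}{2}\log(1+P) + \frac{x^2-P}{2(1+P)}$ per coordinate, so any codeword obeying the power constraint meets the bound (137):

```python
import math

def kl_gauss(m1, v1, m2, v2):
    # D(N(m1,v1) || N(m2,v2)) in nats, closed form.
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def kl_output(x, P):
    # Per-letter divergence D(N(x,1) || N(0,1+P)), cf. (135)-(136):
    # equals (1/2) log(1+P) + (x^2 - P) / (2(1+P)).
    return kl_gauss(x, 1.0, 0.0, 1.0 + P)

P, n = 2.0, 4
# Hypothetical toy codewords, each satisfying ||x||^2 <= nP = 8.
codebook = [[1.0, -1.0, 2.0, 0.0], [0.5, 0.5, -0.5, 2.5]]
for x in codebook:
    power = sum(c * c for c in x)
    assert power <= n * P
    d = sum(kl_output(c, P) for c in x)       # D(p_{Y^n|X^n=x} || p*_{Y^n})
    closed = n / 2 * math.log(1 + P) + (power - n * P) / (2 * (1 + P))
    assert abs(d - closed) < 1e-12            # matches (136)
    assert d <= n / 2 * math.log(1 + P) + 1e-12   # the bound (137)
print("per-codeword divergence check passed")
```

The two assertions mirror (136) and (137): the divergence is linear in the codeword power, and the power constraint caps it at $(n/2)\log(1+P)$.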

Combining (131) and (137), we obtain that for all $n \ge \frac{2}{1-\varepsilon}$
\begin{align}
D(p_{Y^n}\|p^*_{Y^n}) &\le \frac{n}{2}\log(1+P) - \log M_n + \xi n^{2/3} && (139)\\
&\overset{(25)}{=} nC(P) - \log M_n + \xi n^{2/3}, && (140)
\end{align}

which is precisely (29). In particular, if the sequence of codes is capacity-achieving, it follows from Definition 6 that
$$\lim_{n\to\infty}\frac{1}{n}\log M_n = C(P). \quad (141)$$
Combining (140) and (141), we obtain
$$\lim_{n\to\infty}\frac{1}{n}D(p_{Y^n}\|p^*_{Y^n}) \le \lim_{n\to\infty}\left(C(P) - \frac{1}{n}\log M_n + \frac{\xi}{n^{1/3}}\right) = 0, \quad (142)$$
which is precisely (30).

V. QUASI-STATIC FADING CHANNEL UNDER A LONG-TERM POWER CONSTRAINT

In this section, we will establish a fading channel version of Theorem 1. The problem formulation and the main results are stated in the following subsection. The proofs of the main results are given afterwards.

A. Problem Formulation and Main Results

We consider a quasi-static fading channel, where the fading coefficient H is selected randomly and kept constant during the course of transmission (we follow the notation in [15, Section II] and use the upper-case letter H to denote the random fading coefficient). The fading coefficient is assumed to be real and non-negative, and we let $p_H$ denote its distribution. In addition, we assume $0 < \mathbb{E}_{p_H}\!\left[\frac{1}{H}\right] < \infty$, i.e.,
$$0 < \int_{\mathbb{R}_+}\frac{p_H(h)}{h}\,\mathrm{d}h < \infty, \quad (143)$$
which is a common assumption for fading channels with positive zero-outage capacity [16, Section 4.2.4]. Source s has knowledge of H but destination d does not. In other words, we assume the availability of channel state information at the transmitter (CSIT). The encoder at s is allowed to adapt to the fading coefficient H, and the encoding function for H = h is denoted by $f_h : \mathcal{W}\to\mathbb{R}^n$, where $f_h(W)$ is the codeword corresponding to W when H = h. The set of encoding functions $\{f_h \mid h \ge 0\}$ satisfies the long-term power constraint
$$\frac{1}{M_n}\sum_{w\in\mathcal{W}}\mathbb{E}_{p_H}\!\left[\|f_H(w)\|^2\right] \le nP \quad (144)$$
for some fixed P > 0, where the average is taken over the realizations of both the fading coefficient H and the message W. Let $X_k$ denote the kth coordinate of $f_H(W)$. In each time slot $k\in\{1,2,\ldots,n\}$, s transmits $X_k$ and d receives
$$Y_k = \sqrt{H}X_k + Z_k, \quad (145)$$

where $\{Z_k\}_{k=1}^n$ are i.i.d. standard normal random variables. In addition, we assume that H and $Z^n$ are independent. In this work, we are interested in the zero-outage capacity [16, Section 4.2.4] (also called the delay-limited capacity [15]), which is the maximum transmission rate under the constraint that the maximal error probabilities (to be defined precisely in Definition 9) vanish for all H > 0. By the CSIT assumption and the long-term power constraint, a simple coding strategy that achieves the zero-outage capacity would be for s to perform channel inversion [16, Section 4.2.4], i.e., to use one codebook and transmit the codewords multiplied by $\frac{1}{\sqrt{H}}$ so that the fading effect disappears from the point of view of d. The formal definition of the zero-outage capacity will be given later in Definition 11. The formal definition of an $(n, M_n, P)$-code for the quasi-static fading channel is given below.

Definition 7: An $(n, M_n, P)$-code consists of the following:
1) A message set
$$\mathcal{W} \triangleq \{1, 2, \ldots, M_n\} \quad (146)$$
at node s. Message W is uniform on $\mathcal{W}$.
2) An encoding function
$$f_h : \mathcal{W}\to\mathbb{R}^n \quad (147)$$
for each $h \ge 0$, where $f_h$ is the encoding function at node s for encoding W when the fading coefficient H is equal to h such that
$$X^n = f_H(W). \quad (148)$$
The h-fading codebook is defined to be $\{f_h(w) \mid w\in\mathcal{W}\}$. The set of h-fading codebooks should satisfy the following long-term power constraint:
$$\frac{1}{M_n}\sum_{w\in\mathcal{W}}\mathbb{E}_{p_H}\!\left[\|f_H(w)\|^2\right] \le nP. \quad (149)$$
3) A decoding function
$$\varphi : \mathbb{R}^n \to \mathcal{W}, \quad (150)$$
where $\varphi$ is the decoding function for W at node d such that
$$\hat{W} = \varphi(Y^n). \quad (151)$$
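The channel-inversion strategy mentioned above can be made concrete with a small sketch (the three-state fading law and the random base codebook are hypothetical, chosen only for illustration): one base codebook of per-codeword power $nP^{0\text{-out}} = nP/\mathbb{E}_{p_H}[1/H]$ is scaled by $1/\sqrt{h}$, and the long-term power constraint (144) is then met with equality.

```python
import math, random

random.seed(0)

# Hypothetical fading law: H uniform on {0.5, 1.0, 2.0} (illustration only).
fading_states = [0.5, 1.0, 2.0]
E_inv_H = sum(1.0 / h for h in fading_states) / len(fading_states)

n, M, P = 8, 4, 3.0
P_zero_out = P / E_inv_H                      # P^{0-out} = P / E[1/H], cf. (159)

# Base codebook normalized to per-codeword power n * P^{0-out}.
base = [[random.gauss(0, 1) for _ in range(n)] for _ in range(M)]
base = [[c * math.sqrt(n * P_zero_out / sum(x * x for x in cw)) for c in cw]
        for cw in base]

def f(h, w):
    # Channel-inversion encoder: sqrt(h) * f_h(w) equals the base codeword,
    # so the fading disappears at the decoder.
    return [c / math.sqrt(h) for c in base[w]]

# Long-term power (144): E_{H,W}[ ||f_H(W)||^2 ] = E[1/H] * n * P^{0-out} = nP.
avg = sum(sum(x * x for x in f(h, w)) for h in fading_states for w in range(M)) \
      / (len(fading_states) * M)
assert abs(avg - n * P) < 1e-9
print("long-term power constraint met with equality")
```

This is exactly why the effective SNR of channel inversion is $P^{0\text{-out}}$: the base codebook cannot spend more than $nP/\mathbb{E}[1/H]$ per codeword without violating (144).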

We now state the definitions of the quasi-static fading channel and the probability of decoding error.

Definition 8: A quasi-static fading channel is characterized by the fading distribution $p_H$ and the probability density distribution $q_{Y|X,H}$ satisfying
$$q_{Y|X,H}(y|x,h) = \mathcal{N}(y - \sqrt{h}x;\, 0, 1) \quad (152)$$
such that the following holds for any $(n, M_n, P)$-code:
$$p_{H,W,X^n,Y^n}(h,w,x^n,y^n) = p_{H,W,X^n}(h,w,x^n)\prod_{k=1}^n p_{Y_k|X_k,H}(y_k|x_k,h) \quad (153)$$
for all $h \ge 0$, $w\in\mathcal{W}$, $x^n\in\mathbb{R}^n$ and $y^n\in\mathbb{R}^n$, where
$$p_{Y_k|X_k,H}(y_k|x_k,h) = q_{Y|X,H}(y_k|x_k,h). \quad (154)$$
Since $p_{Y_k|X_k,H}(y_k|x_k,h)$ does not depend on k for each fixed $h \ge 0$ by (154), the quasi-static fading channel is stationary conditioned on the channel state h.

The proof of the following proposition can be established in a standard way using the above two definitions and hence is omitted.

Proposition 5: Fix any $(n, M_n, P)$-code and let $p_{H,W,X^n,Y^n,\hat W}$ denote the probability distribution induced by the code. Then, the following two statements hold:
(i) $p_{H,W,X^n,Y^n,\hat W} = p_H p_W p_{X^n|W,H}\left(\prod_{k=1}^n p_{Y_k|X_k,H}\right)p_{\hat W|Y^n}$.
(ii) For each $k\in\{1,2,\ldots,n\}$, $p_{Y_k|X_k,H}(y|x,h) = \mathcal{N}(y-\sqrt{h}x;\,0,1)$ for all $(x,y)\in\mathbb{R}^2$.

We define the zero-outage ε-capacity via the following three definitions.

Definition 9: For an $(n, M_n, P)$-code, we can calculate according to Proposition 5 the maximal probability of decoding error defined as
$$\sup_{h>0}\max_{w\in\mathcal{W}}\Pr\{\hat W \ne w \mid W = w, H = h\}. \quad (155)$$
We call an $(n, M_n, P)$-code with maximal probability of decoding error no larger than ε an $(n, M_n, P, \varepsilon)_{\max}$-code. Similarly, we can calculate the average probability of decoding error defined as $\Pr\{\hat W \ne W\}$, where the error probability is averaged over the realizations of both the fading coefficient H and the message W. We call an $(n, M_n, P)$-code with average probability of decoding error no larger than ε an $(n, M_n, P, \varepsilon)_{\mathrm{avg}}$-code.

Remark 6: If we view each positive realization of H as a time-invariant fading (slow fading) process, then the definition of maximal error probability above is a generalization of Definition 2.2 in [15], which insists that the transmission is reliable (meaning that the maximal error probability vanishes) whenever H > 0.

Definition 10: Let ε ∈ (0, 1) be a real number. A rate R is zero-outage ε-achievable for the fading channel if there exists a sequence of $(n, M_n, P, \varepsilon)_{\max}$-codes such that
$$\liminf_{n\to\infty}\frac{1}{n}\log M_n \ge R. \quad (156)$$
Similarly, R is called an ε-achievable rate if there exists a sequence of $(n, M_n, P, \varepsilon)_{\mathrm{avg}}$-codes such that (156) holds.

The difference between achieving a zero-outage ε-achievable rate and an ε-achievable rate pertains to the two different error probability formalisms, namely the maximal and the average error probability formalisms respectively. By definition, any zero-outage ε-achievable rate is also ε-achievable, but not vice versa. The maximal error probability in Definition 9 considers the maximal error over all realizations of both the fading coefficient H and the message W, so the error probability is guaranteed to be less than ε regardless of fading; hence fading does not contribute to an "outage event". In contrast, under the average probability formalism, the source may choose to transmit nothing when the fading coefficient falls below a certain threshold, thus creating an "outage event".

Definition 11: Let ε ∈ (0, 1) be a real number. The zero-outage ε-capacity for the fading channel, denoted by $C_\varepsilon^{0\text{-out}}$, is defined to be
$$C_\varepsilon^{0\text{-out}} \triangleq \sup\{R \mid R \text{ is zero-outage } \varepsilon\text{-achievable}\}. \quad (157)$$
The zero-outage capacity is defined to be $\inf_{\varepsilon\in(0,1)} C_\varepsilon^{0\text{-out}}$. Similarly, the ε-capacity for the fading channel, denoted by $C_\varepsilon$, is defined to be
$$C_\varepsilon \triangleq \sup\{R \mid R \text{ is } \varepsilon\text{-achievable}\}. \quad (158)$$
The capacity is defined to be $\inf_{\varepsilon\in(0,1)} C_\varepsilon$.

Define
$$P^{0\text{-out}} \triangleq \frac{P}{\mathbb{E}_{p_H}[1/H]}, \quad (159)$$

which is positive and finite by (143). The following upper bound on the zero-outage ε-capacity is the first result in this section, and its proof will be presented in Section V-B.

Theorem 3: Fix any ε ∈ (0, 1) and a sequence of $(n, M_n, P, \varepsilon)_{\max}$-codes. Then, for each $n \ge \frac{2}{1-\varepsilon}$,
$$\log M_n \le nC(P^{0\text{-out}}) + \frac{17(1+P^{0\text{-out}})n^{2/3}}{1-\varepsilon}. \quad (160)$$
In particular,
$$C_\varepsilon^{0\text{-out}} \le C(P^{0\text{-out}}). \quad (161)$$
In addition, if
$$\lim_{n\to\infty}\frac{1}{n}\log M_n = C(P^{0\text{-out}}), \quad (162)$$
then
$$\limsup_{n\to\infty}\mathbb{E}_{p_H}\!\left[\frac{H\sum_{w\in\mathcal{W}}\|f_H(w)\|^2}{nM_n}\right] \le P^{0\text{-out}}. \quad (163)$$
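To get a feel for the non-asymptotic bound (160), the sketch below evaluates its normalized right-hand side for a hypothetical fading law H ~ Uniform(1, 2) (so $\mathbb{E}[1/H] = \ln 2$; this example is not from the paper). The $n^{-1/3}$ overhead decays slowly but the per-symbol bound does approach the zero-outage capacity $C(P^{0\text{-out}})$:

```python
import math

def C(x):
    # AWGN capacity C(x) = (1/2) log(1 + x) in nats, cf. (25).
    return 0.5 * math.log(1.0 + x)

# Hypothetical example: H ~ Uniform(1, 2), so E[1/H] = ln 2 and, by (159),
# P^{0-out} = P / ln 2.
P = 1.0
P0 = P / math.log(2.0)

def rate_bound(n, eps):
    # (1/n) times the RHS of (160):
    # C(P^{0-out}) + 17 (1 + P^{0-out}) / ((1 - eps) n^{1/3}).
    return C(P0) + 17.0 * (1.0 + P0) / ((1.0 - eps) * n ** (1.0 / 3.0))

gaps = [rate_bound(n, 0.1) - C(P0) for n in (10 ** 3, 10 ** 6, 10 ** 9)]
assert gaps[0] > gaps[1] > gaps[2] > 0   # overhead shrinks monotonically
assert gaps[2] < 0.05                    # roughly 0.046 nats/symbol at n = 10^9
```

Note that the bound is uniform in ε only through the $\frac{1}{1-\varepsilon}$ factor, consistent with the strong converse discussed in Remark 7.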

Remark 7: It is well-known (e.g., [15, Section III-B]) that the zero-outage capacity satisfies
$$\inf_{\varepsilon\in(0,1)} C_\varepsilon^{0\text{-out}} = C(P^{0\text{-out}}), \quad (164)$$
which implies that for all ε ∈ (0, 1),
$$C_\varepsilon^{0\text{-out}} \ge C(P^{0\text{-out}}), \quad (165)$$
which together with Theorem 3 implies that for all ε ∈ (0, 1),
$$C_\varepsilon^{0\text{-out}} = C(P^{0\text{-out}}) \quad (166)$$
and hence $C_\varepsilon^{0\text{-out}}$ does not depend on ε. In other words, the quasi-static fading channel admits the strong converse property under the maximal error criterion. This is in contrast to the dependence on ε of the ε-capacity $C_\varepsilon$ under the average error criterion, which will be explained in the following remark.

Remark 8: By Definition 11, $C_\varepsilon^{0\text{-out}} \le C_\varepsilon$, where the closed-form expression of $C_\varepsilon$ for each ε ∈ (0, 1) is stated in [9, Section I] as
$$C_\varepsilon = C\left(\frac{P}{\mathbb{E}_{p_H}\!\left[\frac{1}{H}\mathbf{1}\{H > F^{-1}(\varepsilon)\}\right] + \frac{\Pr_{p_H}\{H\le F^{-1}(\varepsilon)\}-\varepsilon}{F^{-1}(\varepsilon)}}\right) \quad (167)$$
where
$$F^{-1}(\varepsilon) \triangleq \sup\{h \mid \Pr_{p_H}\{H < h\} \le \varepsilon\}. \quad (168)$$
Indeed, under our assumption that P > 0,
$$C_\varepsilon^{0\text{-out}} < C_\varepsilon \quad (169)$$
for each ε ∈ (0, 1) because
\begin{align}
\mathbb{E}_{p_H}\!\left[\frac{1}{H}\right] &= \mathbb{E}_{p_H}\!\left[\frac{1}{H}\mathbf{1}\{H > F^{-1}(\varepsilon)\}\right] + \mathbb{E}_{p_H}\!\left[\frac{1}{H}\mathbf{1}\{H \le F^{-1}(\varepsilon)\}\right] && (170)\\
&\ge \mathbb{E}_{p_H}\!\left[\frac{1}{H}\mathbf{1}\{H > F^{-1}(\varepsilon)\}\right] + \frac{\mathbb{E}_{p_H}\!\left[\mathbf{1}\{H \le F^{-1}(\varepsilon)\}\right]}{F^{-1}(\varepsilon)} && (171)\\
&= \mathbb{E}_{p_H}\!\left[\frac{1}{H}\mathbf{1}\{H > F^{-1}(\varepsilon)\}\right] + \frac{\Pr_{p_H}\{H \le F^{-1}(\varepsilon)\}}{F^{-1}(\varepsilon)} && (172)\\
&> \mathbb{E}_{p_H}\!\left[\frac{1}{H}\mathbf{1}\{H > F^{-1}(\varepsilon)\}\right] + \frac{\Pr_{p_H}\{H \le F^{-1}(\varepsilon)\} - \varepsilon}{F^{-1}(\varepsilon)}, && (173)
\end{align}
which implies from (159), (166) and (167) that (169) holds. By inspecting the closed-form expression of $C_\varepsilon$ in (167), we see that $C_\varepsilon$ depends on ε, which implies that the quasi-static fading channel does not admit the strong converse property under the average error criterion.

Theorem 3, Remark 7 and Remark 8 justify the following definition of capacity-achieving codes for the quasi-static fading channel.

Definition 12: For each ε ∈ (0, 1), a sequence of $(n, M_n, P, \varepsilon)_{\max}$-codes is said to be capacity-achieving if
$$\lim_{n\to\infty}\frac{1}{n}\log M_n = C(P^{0\text{-out}}). \quad (174)$$
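The strict gap $C_\varepsilon^{0\text{-out}} < C_\varepsilon$ in Remark 8 can be checked numerically. The sketch below again uses the hypothetical example H ~ Uniform(1, 2) (not from the paper), for which $F^{-1}(\varepsilon) = 1+\varepsilon$, $\Pr\{H \le F^{-1}(\varepsilon)\} = \varepsilon$ (so the second term in the denominator of (167) vanishes) and $\mathbb{E}[(1/H)\mathbf{1}\{H > 1+\varepsilon\}] = \ln\frac{2}{1+\varepsilon}$:

```python
import math

def C(x):
    # AWGN capacity (1/2) log(1 + x) in nats.
    return 0.5 * math.log(1.0 + x)

# Hypothetical example: H ~ Uniform(1, 2), P = 1.
P = 1.0
C_zero_out = C(P / math.log(2.0))        # C(P^{0-out}) via (159) and (166)

for eps in (0.1, 0.3, 0.5):
    h_star = 1.0 + eps                   # F^{-1}(eps) for Uniform(1, 2)
    denom = math.log(2.0 / h_star)       # E[(1/H) 1{H > h_star}]; other term is 0
    C_eps = C(P / denom)                 # eps-capacity from (167)
    # Strict inequality (169): allowing outages strictly helps under
    # the average error criterion.
    assert C_eps > C_zero_out
print("C_eps > C(P^{0-out}) confirmed for all tested eps")
```

Since $\ln\frac{2}{1+\varepsilon} < \ln 2$ for every ε > 0, the effective SNR in (167) strictly exceeds $P^{0\text{-out}}$, which is exactly the mechanism behind (170)-(173).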

The following theorem is the main result in this section, and its proof will be presented in Section V-C.

Theorem 4: Fix an ε ∈ (0, 1) and a sequence of $(n, M_n, P, \varepsilon)_{\max}$-codes. For each n ∈ ℕ, define
$$p^*_{Y^n}(y^n) \triangleq \prod_{k=1}^n \mathcal{N}(y_k;\, 0, 1+P^{0\text{-out}}) \quad (175)$$
to be the product of the capacity-achieving output distribution, and let $p_{Y^n}$ be the output distribution induced by the $(n, M_n, P, \varepsilon)_{\max}$-code on the quasi-static fading channel. If the sequence of codes is capacity-achieving, then
$$\lim_{n\to\infty}\frac{1}{n}D(p_{Y^n}\|p^*_{Y^n}) = 0. \quad (176)$$

Remark 9: Theorem 4 is a generalization of (30) in Theorem 1. If H is deterministically equal to 1, then (30) can be recovered from (176). However, we cannot establish a counterpart of the non-asymptotic bound in (29) for the fading channel due to the difficulty in characterizing the convergence rate in the limiting statement in (163), which depends on not only the h-fading codebooks but also the fading process. With the additional assumption that the number of fading states is finite, our proof technique can indeed yield a counterpart of (29) by invoking Theorem 1 finitely many times.

Remark 10: As explained in Remark 8 after Theorem 3, the quasi-static fading channel under the average error probability criterion does not possess the strong converse property. Hence, Theorem 4 does not hold true if the maximal error criterion is replaced with the average error criterion.

B. Proof of Theorem 3

The proof of Theorem 3 relies on the following proposition. Since the proof is technical, it is deferred to the Appendix.

Proposition 6: For any t ∈ (0, 1],
$$\inf_{\mathcal{J}\in\mathscr{B}(\mathbb{R}):\,\Pr_{p_H}\{H\in\mathcal{J}\}\ge t}\mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,H\in\mathcal{J}\right] > 0 \quad (177)$$
where $\mathscr{B}(\mathbb{R})$ is the Borel σ-algebra on ℝ.

Note that the infimum in Proposition 6 does depend on t ∈ (0, 1] and hence is not uniform over t ∈ (0, 1]. In the following, for conciseness, we omit mentioning that $\mathcal{J}$ is a Borel-measurable set. We are now ready to present the proof of Theorem 3.

Proof of Theorem 3: Fix an ε ∈ (0, 1). Our goal is to prove $C_\varepsilon^{0\text{-out}} \le C(P^{0\text{-out}})$. To this end, fix any sequence of $(n, M_n, P, \varepsilon)_{\max}$-codes. By Definition 9, we have for each h > 0
$$\max_{w\in\mathcal{W}}\Pr\{\hat W \ne w \mid W = w, H = h\} \le \varepsilon. \quad (178)$$
Define for each h > 0 the average power of the codebook $\{f_h(w) \mid w\in\mathcal{W}\}$ as
$$S_h^{(n)} \triangleq \frac{1}{nM_n}\sum_{w\in\mathcal{W}}\|f_h(w)\|^2. \quad (179)$$

Combining the properties of $p_{H,W,X^n,Y^n,\hat W}$ in Proposition 5, the upper bound on the maximal error probability in (178), the long-term power constraint for each h-fading codebook in (179), the strong converse theorem for the no-fading case in Theorem 2 and an upper bound on $I_{p_{X^n,Y^n|H=h}}(X^n;Y^n)$ in Proposition 4, we have
$$\log M_n \le \frac{n}{2}\log(1+hS_h^{(n)}) + \frac{17(1+hS_h^{(n)})n^{2/3}}{1-\varepsilon} \quad (180)$$
for all h > 0 and each $n \ge \frac{2}{1-\varepsilon}$, which then implies that
$$\log M_n \le \inf_{h>0}\left\{\frac{n}{2}\log(1+hS_h^{(n)}) + \frac{17(1+hS_h^{(n)})n^{2/3}}{1-\varepsilon}\right\} \quad (181)$$
for each $n \ge \frac{2}{1-\varepsilon}$.

In the following, we would like to show
$$\inf_{h>0}\left\{\frac{n}{2}\log(1+hS_h^{(n)}) + \frac{17n^{2/3}}{1-\varepsilon}(1+hS_h^{(n)})\right\} \le \frac{n}{2}\log(1+P^{0\text{-out}}) + \frac{17n^{2/3}(1+P^{0\text{-out}})}{1-\varepsilon} \quad (182)$$
by assuming the contrary, i.e., there exists some ζ > 0 such that
$$\frac{n}{2}\log(1+hS_h^{(n)}) + \frac{17n^{2/3}}{1-\varepsilon}(1+hS_h^{(n)}) \ge \frac{n}{2}\log(1+P^{0\text{-out}}) + \frac{17n^{2/3}(1+P^{0\text{-out}})}{1-\varepsilon} + \zeta \quad (183)$$
for all h > 0, which implies that
$$hS_h^{(n)} > P^{0\text{-out}} \quad (184)$$
for each h > 0, which then implies that
$$\mathbb{E}_{p_H}[S_H^{(n)}] > P^{0\text{-out}}\,\mathbb{E}_{p_H}[1/H]. \quad (185)$$
Since
$$\mathbb{E}_{p_H}[S_H^{(n)}] \le P \quad (186)$$
by (149) and (179), it follows from (185) that
$$P > P^{0\text{-out}}\,\mathbb{E}_{p_H}[1/H], \quad (187)$$
which contradicts the definition of $P^{0\text{-out}}$ in (159). Consequently, the assumption (183) is incorrect and (182) holds. Using (181) and (182), we obtain (160). Using (160), Definition 10 and Definition 11, we obtain (161).

It remains to prove (163) under the condition in (162). To this end, suppose (162) holds, i.e.,
$$\lim_{n\to\infty}\frac{1}{n}\log M_n = C(P^{0\text{-out}}). \quad (188)$$
Combining (188) and (181), we have
$$\frac{1}{2}\log(1+P^{0\text{-out}}) \le \liminf_{n\to\infty}\left\{\inf_{h>0}\left\{\frac{1}{2}\log(1+hS_h^{(n)}) + \frac{17(1+hS_h^{(n)})n^{-1/3}}{1-\varepsilon}\right\}\right\}, \quad (189)$$
which implies that
$$\liminf_{n\to\infty}\left\{\inf_{h>0}\, hS_h^{(n)}\right\} \ge P^{0\text{-out}}. \quad (190)$$
We will prove (163) by assuming the contrary, i.e., there exists some δ > 0 such that
$$\limsup_{n\to\infty}\mathbb{E}_{p_H}\!\left[\frac{H\sum_{w\in\mathcal{W}}\|f_H(w)\|^2}{nM_n}\right] > P^{0\text{-out}}(1+2\delta), \quad (191)$$
which implies that there exists some subsequence of $\{n\}_{n=1}^\infty$, denoted by $\{n_\ell\}_{\ell=1}^\infty$, such that for all ℓ ∈ ℕ,
$$\mathbb{E}_{p_H}\!\left[\frac{H\sum_{w\in\mathcal{W}}\|f_H(w)\|^2}{n_\ell M_{n_\ell}}\right] > P^{0\text{-out}}(1+\delta), \quad (192)$$
which then implies that there exists some t > 0 such that for all sufficiently large ℓ,
$$\Pr_{p_H}\!\left\{HS_H^{(n_\ell)} > P^{0\text{-out}}(1+\delta)\right\} = t > 0 \quad (193)$$
(recall the definition of $S_H^{(n_\ell)}$ in (179)). Define
$$\upsilon(\delta,t) \triangleq \frac{\delta t P^{0\text{-out}}\inf_{\mathcal{J}:\,\Pr_{p_H}\{H\in\mathcal{J}\}\ge t}\mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\big|\,H\in\mathcal{J}\right]}{2\,\mathbb{E}_{p_H}\!\left[\frac{1}{H}\right]}. \quad (194)$$
Using Proposition 6 and the facts that δ > 0, t > 0, $P^{0\text{-out}} > 0$ and $\mathbb{E}_{p_H}\!\left[\frac{1}{H}\right] > 0$, we have
$$\upsilon(\delta,t) > 0. \quad (195)$$
Using (190), (193) and (195), we have for all sufficiently large ℓ
$$\inf_{h>0}\, hS_h^{(n_\ell)} > P^{0\text{-out}} - \upsilon(\delta,t). \quad (196)$$
Consider the following chain of inequalities for all sufficiently large ℓ:
\begin{align}
P &\overset{(186)}{\ge} \mathbb{E}_{p_H}\!\left[S_H^{(n_\ell)}\right] && (197)\\
&= \mathbb{E}_{p_H}\!\left[S_H^{(n_\ell)}\,\big|\,HS_H^{(n_\ell)}\le P^{0\text{-out}}(1+\delta)\right]\Pr\!\left\{HS_H^{(n_\ell)}\le P^{0\text{-out}}(1+\delta)\right\} + \mathbb{E}_{p_H}\!\left[S_H^{(n_\ell)}\,\big|\,HS_H^{(n_\ell)}> P^{0\text{-out}}(1+\delta)\right]\Pr\!\left\{HS_H^{(n_\ell)}> P^{0\text{-out}}(1+\delta)\right\} && (198)\\
&\overset{(196)}{\ge} \mathbb{E}_{p_H}\!\left[\frac{P^{0\text{-out}}-\upsilon(\delta,t)}{H}\,\Big|\,HS_H^{(n_\ell)}\le P^{0\text{-out}}(1+\delta)\right]\Pr\!\left\{HS_H^{(n_\ell)}\le P^{0\text{-out}}(1+\delta)\right\} + \mathbb{E}_{p_H}\!\left[S_H^{(n_\ell)}\,\big|\,HS_H^{(n_\ell)}> P^{0\text{-out}}(1+\delta)\right]\Pr\!\left\{HS_H^{(n_\ell)}> P^{0\text{-out}}(1+\delta)\right\} && (199)\\
&\overset{(a)}{\ge} \mathbb{E}_{p_H}\!\left[\frac{P^{0\text{-out}}-\upsilon(\delta,t)}{H}\,\Big|\,HS_H^{(n_\ell)}\le P^{0\text{-out}}(1+\delta)\right]\Pr\!\left\{HS_H^{(n_\ell)}\le P^{0\text{-out}}(1+\delta)\right\} + \mathbb{E}_{p_H}\!\left[\frac{P^{0\text{-out}}(1+\delta)}{H}\,\Big|\,HS_H^{(n_\ell)}> P^{0\text{-out}}(1+\delta)\right]\Pr\!\left\{HS_H^{(n_\ell)}> P^{0\text{-out}}(1+\delta)\right\} && (200)\\
&= \left(P^{0\text{-out}}-\upsilon(\delta,t)\right)\mathbb{E}_{p_H}\!\left[\frac{1}{H}\right] + \left(\delta P^{0\text{-out}}+\upsilon(\delta,t)\right)\mathbb{E}_{p_H}\!\left[\frac{1}{H}\mathbf{1}\!\left\{HS_H^{(n_\ell)}> P^{0\text{-out}}(1+\delta)\right\}\right] && (201)\\
&\ge \left(P^{0\text{-out}}-\upsilon(\delta,t)\right)\mathbb{E}_{p_H}\!\left[\frac{1}{H}\right] + \delta P^{0\text{-out}}\,\mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,HS_H^{(n_\ell)}> P^{0\text{-out}}(1+\delta)\right]\Pr\!\left\{HS_H^{(n_\ell)}> P^{0\text{-out}}(1+\delta)\right\} && (202)\\
&\overset{(193)}{\ge} \left(P^{0\text{-out}}-\upsilon(\delta,t)\right)\mathbb{E}_{p_H}\!\left[\frac{1}{H}\right] + \delta t P^{0\text{-out}}\inf_{\mathcal{J}:\,\Pr_{p_H}\{H\in\mathcal{J}\}\ge t}\mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,H\in\mathcal{J}\right] && (203)\\
&\overset{(194)}{=} \left(P^{0\text{-out}}-\upsilon(\delta,t)\right)\mathbb{E}_{p_H}\!\left[\frac{1}{H}\right] + 2\upsilon(\delta,t)\,\mathbb{E}_{p_H}\!\left[\frac{1}{H}\right] && (204)\\
&= P^{0\text{-out}}\,\mathbb{E}_{p_H}\!\left[\frac{1}{H}\right] + \upsilon(\delta,t)\,\mathbb{E}_{p_H}\!\left[\frac{1}{H}\right] && (205)\\
&\overset{(b)}{>} P^{0\text{-out}}\,\mathbb{E}_{p_H}\!\left[\frac{1}{H}\right] \overset{(159)}{=} P, && (206)
\end{align}

where
(a) follows from the inequality in the second conditional expectation;
(b) follows from (195) and the fact that $\mathbb{E}_{p_H}\!\left[\frac{1}{H}\right] > 0$ by (143).

Since (206) is a contradiction due to the strict inequality, the assumption in (191) does not hold for any δ > 0, which implies that (163) holds.

C. Proof of Theorem 4

Fix an ε ∈ (0, 1). Suppose we are given a sequence of $(n, M_n, P, \varepsilon)_{\max}$-codes which is capacity-achieving and hence satisfies
$$\lim_{n\to\infty}\frac{1}{n}\log M_n = C(P^{0\text{-out}}) \quad (207)$$
by Definition 12. Let $p_{H,W,X^n,Y^n,\hat W}$ be the distribution induced by the $(n, M_n, P, \varepsilon)_{\max}$-code. For each n ∈ ℕ, define $p^*_{Y^n}(y^n)$ to be the product of the capacity-achieving output distribution as in (175). In order to prove (176), we fix an $n \ge \frac{2}{1-\varepsilon}$ and consider the following standard steps:
\begin{align}
D(p_{Y^n}\|p^*_{Y^n}) &\le D(p_{H,Y^n}\|p_H p^*_{Y^n}) && (208)\\
&= D(p_{Y^n|H}\|p^*_{Y^n}|p_H) && (209)\\
&= \int_{\mathbb{R}_+} p_H(h)\,D(p_{Y^n|H=h}\|p^*_{Y^n})\,\mathrm{d}h && (210)\\
&= \int_{\mathbb{R}_+} p_H(h)\left(D(p_{Y^n|X^n,H=h}\|p^*_{Y^n}|p_{X^n|H=h}) - D(p_{Y^n|X^n,H=h}\|p_{Y^n|H=h}|p_{X^n|H=h})\right)\mathrm{d}h && (211)\\
&= \int_{\mathbb{R}_+} p_H(h)\left(D(p_{Y^n|X^n,H=h}\|p^*_{Y^n}|p_{X^n|H=h}) - I_{p_{X^n,Y^n|H=h}}(X^n;Y^n)\right)\mathrm{d}h. && (212)
\end{align}
For every h > 0, we have the following two observations:
(i) The codebook $\{\sqrt{h}f_h(w) \mid w\in\mathcal{W}\}$ satisfies the long-term power constraint
$$\frac{1}{M_n}\sum_{w\in\mathcal{W}}\|\sqrt{h}f_h(w)\|^2 \overset{(179)}{=} nhS_h^{(n)}. \quad (213)$$
(ii) The codebook $\{\sqrt{h}f_h(w) \mid w\in\mathcal{W}\}$ is an $(n, M_n, hS_h^{(n)}, \varepsilon)_{\max}$-code for the AWGN channel (cf. (11)), because the codebook $\{f_h(w) \mid w\in\mathcal{W}\}$ is an $(n, M_n, S_h^{(n)}, \varepsilon)_{\max}$-code for the fading channel with H = h (cf. (145)).
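Observation (ii) is simply a change of variables. A two-line simulation (illustrative only; the codeword and noise draws are hypothetical) makes the reduction concrete: for a fixed fading realization h, the channel output produced by the codeword $f_h(w)$ over the fading channel coincides with the output produced by the scaled codeword $\sqrt{h}f_h(w)$ over the unfaded AWGN channel, for the same noise realization.

```python
import random

random.seed(1)

n, h = 6, 2.5
f_h_w = [random.gauss(0, 1) for _ in range(n)]   # hypothetical codeword f_h(w)
z = [random.gauss(0, 1) for _ in range(n)]       # one shared noise realization

# Fading channel with H = h: Y_k = sqrt(h) X_k + Z_k, X^n = f_h(w).
y_fading = [h ** 0.5 * x + zk for x, zk in zip(f_h_w, z)]
# AWGN channel fed with the scaled codeword sqrt(h) * f_h(w).
scaled = [h ** 0.5 * x for x in f_h_w]
y_awgn = [xs + zk for xs, zk in zip(scaled, z)]

assert y_fading == y_awgn
```

Hence the error probability and the per-codeword power of the two codes match, which is what lets Theorem 2 be applied conditionally on H = h.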

Based on the above observations, we can apply Theorem 2 (no-fading case) to the $(n, M_n, hS_h^{(n)}, \varepsilon)_{\max}$-code for the AWGN channel and obtain
$$\log M_n \le I_{p_{X^n,Y^n|H=h}}(X^n;Y^n) + \frac{17(1+hS_h^{(n)})n^{2/3}}{1-\varepsilon} \quad (214)$$
for each h > 0, which implies from (212) that
$$D(p_{Y^n}\|p^*_{Y^n}) \le \int_{\mathbb{R}_+} p_H(h)\left(D(p_{Y^n|X^n,H=h}\|p^*_{Y^n}|p_{X^n|H=h}) - \log M_n + \frac{17(1+hS_h^{(n)})n^{2/3}}{1-\varepsilon}\right)\mathrm{d}h. \quad (215)$$

In order to obtain an upper bound on the divergence term in (215), consider the following chain of inequalities for each h > 0:
\begin{align}
&D(p_{Y^n|X^n,H=h}\|p^*_{Y^n}|p_{X^n|H=h})\notag\\
&= \int_{\mathbb{R}^n}\int_{\mathbb{R}^n} p_{X^n|H}(x^n|h)\,p_{Y^n|X^n,H}(y^n|x^n,h)\log\frac{p_{Y^n|X^n,H}(y^n|x^n,h)}{p^*_{Y^n}(y^n)}\,\mathrm{d}y^n\,\mathrm{d}x^n && (216)\\
&\overset{(a)}{=} \int_{\mathbb{R}^n}\int_{\mathbb{R}^n} p_{X^n|H}(x^n|h)\,\mathcal{N}(y^n-\sqrt{h}x^n;0,1)\log\frac{\mathcal{N}(y^n-\sqrt{h}x^n;0,1)}{\mathcal{N}(y^n;0,1+P^{0\text{-out}})}\,\mathrm{d}y^n\,\mathrm{d}x^n && (217)\\
&\overset{(b)}{=} \int_{\mathbb{R}^n}\int_{\mathbb{R}^n} p_{X^n|H}(x^n|h)\,\mathcal{N}(z^n;0,1)\log\frac{\mathcal{N}(z^n;0,1)}{\mathcal{N}(z^n+\sqrt{h}x^n;0,1+P^{0\text{-out}})}\,\mathrm{d}z^n\,\mathrm{d}x^n && (218)\\
&\overset{(8)}{=} \frac{n}{2}\log(1+P^{0\text{-out}}) + \frac{1}{2(1+P^{0\text{-out}})}\int_{\mathbb{R}^n} p_{X^n|H}(x^n|h)\sum_{k=1}^n\int_{\mathbb{R}}\mathcal{N}(z_k;0,1)\left(-P^{0\text{-out}}z_k^2 + 2\sqrt{h}x_kz_k + hx_k^2\right)\mathrm{d}z_k\,\mathrm{d}x^n && (219)\\
&= \frac{n}{2}\log(1+P^{0\text{-out}}) + \frac{-nP^{0\text{-out}} + \int_{\mathbb{R}^n} p_{X^n|H}(x^n|h)\sum_{k=1}^n hx_k^2\,\mathrm{d}x^n}{2(1+P^{0\text{-out}})} && (220)\\
&\overset{(c)}{\le} \frac{n}{2}\log(1+P^{0\text{-out}}) + \frac{-nP^{0\text{-out}} + nhS_h^{(n)}}{2(1+P^{0\text{-out}})} && (221)
\end{align}
where
(a) follows from Definition 8 and (175);
(b) follows from letting $z^n = y^n - \sqrt{h}x^n$;
(c) follows from the fact that
\begin{align}
\int_{\mathbb{R}^n} p_{X^n|H}(x^n|h)\sum_{k=1}^n hx_k^2\,\mathrm{d}x^n &= \sum_{w\in\mathcal{W}} p_W(w)\,h\|f_h(w)\|^2 && (222)\\
&\overset{(179)}{=} nhS_h^{(n)}. && (223)
\end{align}

Combining (215) and (221), we obtain that for all $n \ge \frac{2}{1-\varepsilon}$
$$\frac{1}{n}D(p_{Y^n}\|p^*_{Y^n}) \le \frac{1}{2}\log(1+P^{0\text{-out}}) - \frac{1}{n}\log M_n + \frac{-P^{0\text{-out}} + \mathbb{E}_{p_H}[HS_H^{(n)}]}{2(1+P^{0\text{-out}})} + \frac{17(1+\mathbb{E}_{p_H}[HS_H^{(n)}])n^{-1/3}}{1-\varepsilon}. \quad (224)$$
Since
$$\limsup_{n\to\infty}\mathbb{E}_{p_H}[HS_H^{(n)}] \le P^{0\text{-out}} \quad (225)$$
by (207), (163) in Theorem 3 and (179), it follows from (224) that
\begin{align}
\limsup_{n\to\infty}\frac{1}{n}D(p_{Y^n}\|p^*_{Y^n}) &\le \limsup_{n\to\infty}\left(\frac{1}{2}\log(1+P^{0\text{-out}}) - \frac{1}{n}\log M_n\right) && (226)\\
&\overset{(207)}{=} 0. && (227)
\end{align}

APPENDIX
PROOF OF PROPOSITION 6

To keep the exposition concise in this proof, we omit mentioning that subsets of the real line such as $\mathcal{J}$, $\mathcal{K}$, $\tilde{\mathcal{K}}$ and $\hat{\mathcal{K}}$ are Borel-measurable. Fix a t ∈ (0, 1]. If t = 1, it follows that
\begin{align}
\inf_{\mathcal{J}:\,\Pr_{p_H}\{H\in\mathcal{J}\}\ge t}\mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,H\in\mathcal{J}\right] &= \inf_{\mathcal{J}:\,\Pr_{p_H}\{H\in\mathcal{J}\}=1}\mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,H\in\mathcal{J}\right] && (228)\\
&= \mathbb{E}_{p_H}\!\left[\frac{1}{H}\right] && (229)\\
&\overset{(143)}{>} 0, && (230)
\end{align}
which implies (177). Therefore, we assume in the rest of the proof that
$$0 < t < 1. \quad (231)$$
Let
$$a \triangleq \sup\{h \ge 0 \mid \Pr_{p_H}\{H \ge h\} \ge t\}. \quad (232)$$
Since t > 0, we have
$$a < \infty. \quad (233)$$
We want to show that a > 0 by assuming the contrary, i.e.,
$$a = 0. \quad (234)$$

Using (232) and (234), we have for all n ∈ ℕ
$$\Pr_{p_H}\{H \ge 1/n\} < t, \quad (235)$$
which implies that for all n ∈ ℕ
$$\Pr_{p_H}\{H < 1/n\} \ge 1 - t. \quad (236)$$
Since
$$\Pr_{p_H}\{H = 0\} = \lim_{n\to\infty}\Pr_{p_H}\{H < 1/n\} \quad (237)$$
by continuity of measure, it follows from (236) that
$$\Pr_{p_H}\{H = 0\} \ge 1 - t \overset{(231)}{>} 0, \quad (238)$$
contradicting the assumption on $p_H$ in (143). Consequently, we conclude that
$$a > 0. \quad (239)$$

In order to prove (177), we consider the following two cases:

Case (i): $\Pr_{p_H}\{H > a\} = 0$: Since $\Pr_{p_H}\{H \le a\} = 1$,
$$\mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,H\in\mathcal{J}\right] \ge \frac{1}{a} \quad (240)$$
for any $\mathcal{J}$ with $\Pr_{p_H}\{H\in\mathcal{J}\}\ge t > 0$, which together with (239) implies (177).

Case (ii): $\Pr_{p_H}\{H > a\} > 0$: It follows from the definition of a in (232) that
$$\Pr_{p_H}\{H > a\} \le t. \quad (241)$$


In order to show (177), we first fix an arbitrary $\mathcal{J}$ such that
$$\Pr_{p_H}\{H\in\mathcal{J}\} \ge t. \quad (242)$$
Using (241) and (242), we have
$$\Pr_{p_H}\{H\in\mathcal{J}\} \ge \Pr_{p_H}\{H > a\}. \quad (243)$$
In addition, we claim that
$$\mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,H > a\right] = \inf_{\mathcal{K}}\left\{\mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,H\in\mathcal{K}\right]\,\Big|\,\Pr_{p_H}\{H\in\mathcal{K}\} = \Pr_{p_H}\{H > a\}\right\}. \quad (244)$$
Note that the infimum is achieved by the set $\mathcal{K} = (a,\infty)$. To prove the above claim, it suffices to show that for any other $\tilde{\mathcal{K}}$ with equal measure as $(a,\infty)$ with respect to $p_H$, i.e.,
$$\Pr_{p_H}\{H\in\tilde{\mathcal{K}}\} = \Pr_{p_H}\{H > a\}, \quad (245)$$
we must have
$$\mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,H\in\tilde{\mathcal{K}}\right] - \mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,H > a\right] \ge 0. \quad (246)$$

To this end, fix any $\tilde{\mathcal{K}}$ that satisfies (245) and let $\mathcal{A} \triangleq (a,\infty)\setminus\tilde{\mathcal{K}}$ and $\mathcal{B} \triangleq \tilde{\mathcal{K}}\cap[0,a]$. Then, we have
\begin{align}
\Pr_{p_H}\{H\in\mathcal{A}\} &= \Pr_{p_H}\{H\in(a,\infty)\setminus\tilde{\mathcal{K}}\} && (247)\\
&= \Pr_{p_H}\{H\in(a,\infty)\} - \Pr_{p_H}\{H\in(a,\infty)\cap\tilde{\mathcal{K}}\} && (248)\\
&\overset{(245)}{=} \Pr_{p_H}\{H\in\tilde{\mathcal{K}}\} - \Pr_{p_H}\{H\in(a,\infty)\cap\tilde{\mathcal{K}}\} && (249)\\
&= \Pr_{p_H}\{H\in\tilde{\mathcal{K}}\cap[0,a]\} && (250)\\
&= \Pr_{p_H}\{H\in\mathcal{B}\}. && (251)
\end{align}
Consider the following chain of inequalities:
\begin{align}
&\mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,H\in\tilde{\mathcal{K}}\right] - \mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,H > a\right]\notag\\
&= \int_{\tilde{\mathcal{K}}}\frac{p_{H|H\in\tilde{\mathcal{K}}}(h)}{h}\,\mathrm{d}h - \int_{(a,\infty)}\frac{p_{H|H>a}(h)}{h}\,\mathrm{d}h && (252)\\
&\overset{(245)}{=} \frac{1}{\Pr_{p_H}\{H>a\}}\left(\int_{\tilde{\mathcal{K}}}\frac{p_H(h)}{h}\,\mathrm{d}h - \int_{(a,\infty)}\frac{p_H(h)}{h}\,\mathrm{d}h\right) && (253)\\
&= \frac{1}{\Pr_{p_H}\{H>a\}}\left(\int_{\tilde{\mathcal{K}}\cap[0,a]}\frac{p_H(h)}{h}\,\mathrm{d}h + \int_{\tilde{\mathcal{K}}\cap(a,\infty)}\frac{p_H(h)}{h}\,\mathrm{d}h - \int_{(a,\infty)\cap\tilde{\mathcal{K}}}\frac{p_H(h)}{h}\,\mathrm{d}h - \int_{(a,\infty)\setminus\tilde{\mathcal{K}}}\frac{p_H(h)}{h}\,\mathrm{d}h\right) && (254)\\
&= \frac{1}{\Pr_{p_H}\{H>a\}}\left(\int_{\tilde{\mathcal{K}}\cap[0,a]}\frac{p_H(h)}{h}\,\mathrm{d}h - \int_{(a,\infty)\setminus\tilde{\mathcal{K}}}\frac{p_H(h)}{h}\,\mathrm{d}h\right) && (255)\\
&\ge \frac{1}{a\Pr_{p_H}\{H>a\}}\left(\int_{\tilde{\mathcal{K}}\cap[0,a]}p_H(h)\,\mathrm{d}h - \int_{(a,\infty)\setminus\tilde{\mathcal{K}}}p_H(h)\,\mathrm{d}h\right) && (256)\\
&\overset{(251)}{=} 0. && (257)
\end{align}

Consequently, the claim in (244) follows from (245) and (257). On the other hand, for any $\mathcal{K}$ with $\Pr_{p_H}\{H\in\mathcal{K}\} > 0$ and any positive $\lambda \le \Pr_{p_H}\{H\in\mathcal{K}\}$, we have
$$\mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,H\in\mathcal{K}\right] \ge \inf_{\hat{\mathcal{K}}}\left\{\mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,H\in\hat{\mathcal{K}}\right]\,\Big|\,\Pr_{p_H}\{H\in\hat{\mathcal{K}}\} = \lambda\right\} \quad (258)$$
because we can always construct a $\hat{\mathcal{K}}\subseteq\mathcal{K}$ with $\Pr_{p_H}\{H\in\hat{\mathcal{K}}\} = \lambda$ by excluding an appropriate leftmost subset of $\mathcal{K}$ such that $\mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\big|\,H\in\mathcal{K}\right] \ge \mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\big|\,H\in\hat{\mathcal{K}}\right]$. We now consider
\begin{align}
\mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,H\in\mathcal{J}\right] &\ge \inf_{\mathcal{K}}\left\{\mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,H\in\mathcal{K}\right]\,\Big|\,\Pr_{p_H}\{H\in\mathcal{K}\} = \Pr_{p_H}\{H\in\mathcal{J}\}\right\} && (259)\\
&\overset{(a)}{\ge} \inf_{\hat{\mathcal{K}}}\left\{\mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,H\in\hat{\mathcal{K}}\right]\,\Big|\,\Pr_{p_H}\{H\in\hat{\mathcal{K}}\} = \Pr_{p_H}\{H > a\}\right\} && (260)\\
&\overset{(244)}{=} \mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,H > a\right] && (261)
\end{align}
where (a) follows from (243) and (258). Since (261) holds for any arbitrary $\mathcal{J}$ that satisfies (242), we have
$$\inf_{\mathcal{J}:\,\Pr_{p_H}\{H\in\mathcal{J}\}\ge t}\mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,H\in\mathcal{J}\right] \ge \mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,H > a\right], \quad (262)$$
which then implies from the assumption under this case and (233) that
$$\inf_{\mathcal{J}:\,\Pr_{p_H}\{H\in\mathcal{J}\}\ge t}\mathbb{E}_{p_H}\!\left[\frac{1}{H}\,\Big|\,H\in\mathcal{J}\right] > 0. \quad (263)$$
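The extremal-set claim (244), that among sets of a fixed probability mass the right tail $(a,\infty)$ minimizes $\mathbb{E}[1/H \mid H\in\mathcal{K}]$, can be illustrated by Monte Carlo. The fading law H ~ Uniform(0.1, 2) and the competing sets below are hypothetical, chosen only for illustration:

```python
import random

random.seed(2)

lo, hi = 0.1, 2.0
mass = 0.25                                  # target Pr{H in K}
a = hi - mass * (hi - lo)                    # right tail (a, hi) has this mass

samples = [random.uniform(lo, hi) for _ in range(100_000)]

def cond_mean_inv(pred):
    # Monte Carlo estimate of E[1/H | H in K] for K = {h : pred(h)}.
    vals = [1.0 / h for h in samples if pred(h)]
    return sum(vals) / len(vals)

tail = cond_mean_inv(lambda h: h > a)
# Competing sets of (approximately) the same probability mass:
left = cond_mean_inv(lambda h: h < lo + mass * (hi - lo))            # left tail
middle = cond_mean_inv(lambda h: 1.0 < h <= 1.0 + mass * (hi - lo))  # middle band
assert tail < middle < left
print("right tail minimizes E[1/H | H in K], as claimed in (244)")
```

Since 1/h is decreasing, pushing the mass of $\mathcal{K}$ to the right can only lower the conditional mean of 1/H, which is the content of the rearrangement argument in (247)-(257).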

ACKNOWLEDGMENTS

The authors are supported by an NUS grant (R-263-000-A98-750/133), an NUS Young Investigator Award (R-263-000-B37-133), and a Ministry of Education (MOE) Tier 2 grant (R-263-000-B61-112).

REFERENCES

[1] T. S. Han and S. Verdú, "Approximation of output statistics," IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 752–772, 1993.
[2] S. Verdú and T. S. Han, "A general formula for channel capacity," IEEE Trans. Inf. Theory, vol. 40, no. 4, pp. 1147–1157, 1994.
[3] S. Shamai and S. Verdú, "The empirical distribution of good codes," IEEE Trans. Inf. Theory, vol. 43, no. 3, pp. 836–846, 1997.
[4] Y. Polyanskiy and S. Verdú, "Empirical distribution of good channel codes with nonvanishing error probability," IEEE Trans. Inf. Theory, vol. 60, no. 1, pp. 5–21, 2014.
[5] Y. Polyanskiy, H. V. Poor, and S. Verdú, "Channel coding rate in the finite blocklength regime," IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2307–2359, 2010.
[6] M. Hayashi, "Information spectrum approach to second-order coding rate in channel coding," IEEE Trans. Inf. Theory, vol. 55, no. 11, pp. 4947–4966, 2009.
[7] V. Y. F. Tan, "Asymptotic estimates in information theory with non-vanishing error probabilities," Foundations and Trends in Communications and Information Theory, vol. 11, no. 1–2, 2014.
[8] G. Caire, G. Taricco, and E. Biglieri, "Optimum power control over fading channels," IEEE Trans. Inf. Theory, vol. 45, no. 5, pp. 1468–1489, 1999.
[9] W. Yang, G. Caire, G. Durisi, and Y. Polyanskiy, "Optimum power control at finite blocklength," IEEE Trans. Inf. Theory, vol. 61, no. 9, pp. 4598–4615, 2015.
[10] S. L. Fong and V. Y. F. Tan, "A proof of the strong converse theorem for Gaussian broadcast channels via the Gaussian Poincaré inequality," submitted to IEEE Trans. Inf. Theory, Sep. 2015, arXiv:1509.01380 [cs.IT].
[11] M. Raginsky and I. Sason, "Concentration of measure inequalities in information theory, communications and coding," Foundations and Trends in Communications and Information Theory, vol. 10, no. 1–2, 2013.
[12] ——, "Refined bounds on the empirical distribution of good channel codes via concentration inequalities," in Proc. IEEE Intl. Symp. Inf. Theory, Istanbul, Turkey, 2013.
[13] Y. Polyanskiy, "Channel coding: Non-asymptotic fundamental limits," Ph.D. dissertation, Princeton University, 2010.
[14] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge, U.K.: Cambridge University Press, 2012.
[15] S. V. Hanly and D. N. C. Tse, "Multiaccess fading channels – Part II: Delay-limited capacities," IEEE Trans. Inf. Theory, vol. 44, no. 7, pp. 2816–2831, 1998.
[16] A. Goldsmith, Wireless Communications. Cambridge, U.K.: Cambridge University Press, 2005.