The Optimal Density of Infinite Constellations for the Gaussian Channel


Amir Ingber, Ram Zamir and Meir Feder
Dept. of EE-Systems, The Faculty of Engineering, Tel Aviv University, Tel Aviv 69978, Israel
Email: {ingber, zamir, meir}@eng.tau.ac.il

Abstract—The setting of a Gaussian channel without power constraints is considered. In this setting the codewords are points in an n-dimensional Euclidean space (an infinite constellation). The channel-coding analog of the number of codewords is the density of the constellation points, and the analog of the communication rate is the normalized log density (NLD). The highest achievable NLD with vanishing error probability (which can be thought of as the capacity) is known, as are error exponents for the setting. In this work we are interested in the optimal NLD for communication when a fixed, nonzero error probability is allowed. In classical channel coding the gap to capacity is characterized by the channel dispersion (and cannot be derived from error exponent theory). In the unconstrained setting, we show that as the codeword length (dimension) n grows, the gap to the highest achievable NLD is inversely proportional (to the first order) to the square root of the block length. We give an explicit expression for the proportionality constant, which is given by the inverse Q-function of the allowed error probability, times the square root of 1/2. In analogy to a similar result in classical channel coding, it follows that the dispersion of infinite constellations is given by 1/2 nat² per channel use. We show that this optimal convergence rate can be achieved using lattices; therefore the result holds for the maximal error probability as well. Connections to the error exponent of the power-constrained Gaussian channel and to the volume-to-noise ratio as a figure of merit are discussed.

I. INTRODUCTION

Coding schemes over the Gaussian channel are traditionally limited by the average/peak power of the transmitted signal [1]. Without the power restriction (or a similar restriction) the channel capacity becomes infinite, since one can space the codewords arbitrarily far apart from each other and achieve a vanishing error probability. However, many coded modulation schemes take an infinite constellation (IC) and restrict the usage to points of the IC that lie within some n-dimensional form in Euclidean space (a 'shaping' region). Probably the most important example of an IC is a lattice, and examples of shaping regions include a hypersphere in n dimensions and a Voronoi region of another lattice [2].

In 1994, Poltyrev [3] studied the model of a channel with Gaussian noise without power constraints. In this setting the codewords are simply points in the n-dimensional Euclidean space. The analog of the number of codewords is the density γ of the constellation points (the average number of points per unit volume). The analog of the communication rate is the normalized log density (NLD) δ ≜ (1/n) log γ.

A. Ingber is supported by the Adams Fellowship Program of the Israel Academy of Sciences and Humanities. This research was supported in part by the Israel Science Foundation, grant no. 634/09.


The error probability in this setting can be thought of as the average error probability, where all the points of the IC have equal transmission probability (precise definitions follow later in the paper). Poltyrev showed that the NLD δ is the analog of the rate in classical channel coding, and established the analog of the capacity, the ultimate limit for the NLD, denoted δ*. Random coding and sphere packing error exponent bounds were also derived, which are analogous to Gallager's error exponents in the classical channel coding setting [4], and to the error exponents of the power-constrained AWGN channel [5], [4].

In classical channel coding, the channel capacity gives the ultimate limit for the rate when arbitrarily small error probability is required, and the error exponent quantifies the (exponential) speed at which the error probability goes to zero when the rate is fixed (and below the channel capacity). Another question of interest is the following: for a fixed error probability ε, what is the optimal (maximal) rate that is achievable when the codeword length n is fixed? While the exact answer to this question for any finite n is still open (see [6] for the current state of the art), the speed at which the optimal rate converges to the capacity is known. Letting R_ε(n) denote the maximal rate for which there exist communication schemes with codelength n and error probability at most ε, it is known that for a channel with capacity C [7]:

\[ R_\varepsilon(n) = C - \sqrt{\frac{V}{n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{\log n}{n}\right), \tag{1} \]

where Q^{-1}(·) is the inverse complementary standard Gaussian cumulative distribution function. The constant V, termed the channel dispersion, is the variance of the information spectrum i(x; y) ≜ log [P_{XY}(x, y) / (P_X(x) P_Y(y))] for a capacity-achieving distribution. More details and extensions can be found in [6].

In this paper we are interested in finding out whether the behavior demonstrated in (1) exists in the setting of a Gaussian channel without power constraints. We answer this question in the affirmative. The main result is the following: for a given, fixed, nonzero error probability ε, denote by δ_ε(n) the maximal NLD for which there exists an IC with dimension n and error probability at most ε. Then

\[ \delta_\varepsilon(n) = \delta^* - \sqrt{\frac{1}{2n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{\log n}{n}\right), \tag{2} \]

where δ* is the ultimate limit for the NLD at any dimension [3], given by δ* = ½ log [1/(2πeσ²)], where σ² is the variance of the additive Gaussian noise (logarithms are taken w.r.t. the natural base e).

In the achievability part we use lattices (and the Minkowski-Hlawka theorem [8]). Because of the regular structure of lattices, our achievability result holds for the maximal error probability. The proof technique used is somewhat different from that used by Poltyrev in [3]. Here we use a suboptimal 'typicality decoder', closer in spirit to that used in standard achievability proofs [9] (rather than the technical ML-decoder-based proof of [3]). In addition, a variant of the typicality decoder can be used to prove Poltyrev's random coding exponent. In the converse part of the proof we consider the average error probability and any IC (not only lattices); therefore our result (2) holds for both average and maximal error probability, and for any IC (lattice or not).
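To get a feel for the main result, (2) is easy to evaluate numerically. The following sketch (not part of the original paper; it assumes NumPy/SciPy, uses natural logarithms, and drops the O(log n/n) term) computes δ* and the first-order approximation of δ_ε(n):

```python
import numpy as np
from scipy.stats import norm

def nld_capacity(sigma2=1.0):
    # delta* = (1/2) log(1/(2*pi*e*sigma^2)), in nats
    return 0.5 * np.log(1.0 / (2 * np.pi * np.e * sigma2))

def nld_approx(n, eps, sigma2=1.0):
    # First-order approximation of (2): delta* - sqrt(1/(2n)) * Q^{-1}(eps)
    return nld_capacity(sigma2) - np.sqrt(1.0 / (2 * n)) * norm.isf(eps)

for n in (100, 1000, 10000):
    print(n, nld_approx(n, eps=1e-3))  # gap to delta* shrinks like 1/sqrt(n)
```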


Another figure of merit for lattices (which can be defined for general ICs as well) is the volume-to-noise ratio (VNR), which generalizes the SNR notion (see, e.g., [10]). The VNR quantifies how good a lattice is for channel coding over the unconstrained AWGN channel. It is known that the VNR of any lattice cannot be below 2πe, and that there exist lattices that approach this value as the dimension grows. As a consequence of the paper's main result we show the asymptotic behavior of the optimal VNR.

The paper is organized as follows. In Section II we discuss the relations of our result to error exponent theory and to the power-constrained AWGN channel. In Section III we define the notation and prove a key lemma that is required for the proof of our main result. The direct and converse proofs are given in Sections IV and V respectively. In Section VI the random coding exponent of the unconstrained setting is re-derived using techniques from the previous sections and the Laplace method of integration. In Section VII the asymptotic behavior of the optimal VNR is obtained as a consequence of the paper's main result. We conclude the paper in Section VIII.

II. PREDICTIONS

By the similarity of Equations (1) and (2) we can isolate the constant 1/2 and identify it as the dispersion of the unconstrained AWGN setting. In this section we discuss this fact and its relation to classical channel coding and to the power-constrained AWGN channel.

One interesting property of the channel dispersion theorem (1) is the following connection to the error exponent. Under some mild regularity assumptions, the error exponent can be approximated near the capacity by

\[ E(R) \cong \frac{(C-R)^2}{2V}, \tag{3} \]

where V is the channel dispersion. This property, which is attributed to Shannon (see [6, Fig. 18]), holds for DMCs and for the power-constrained AWGN channel, and is conjectured to hold in more general cases. Note, however, that while the parabolic behavior of the exponent hints that the gap to the capacity should behave as O(1/√n), the dispersion theorem (1) cannot be derived directly from error exponent theory (even if the error probability were given by e^{−nE(R)} exactly).

Analogously to (3), we examine the error exponent for the unconstrained Gaussian setting. For NLD values above the critical NLD δ_cr ≜ ½ log [1/(4πeσ²)] (but below δ*), the error exponent is given by [3]:

\[ E(\delta, \sigma^2) = \frac{e^{-2\delta}}{4\pi e\sigma^2} + \delta + \frac{1}{2}\log 2\pi\sigma^2. \tag{4} \]

By straightforward differentiation we get that the second derivative (w.r.t. δ) of E(δ, σ²) at δ = δ* equals 2, so according to (3) it is expected that the dispersion of the unconstrained AWGN channel will be 1/2. This agrees with our main result and its similarity to (1), and extends the correctness of the conjecture (3) to the unconstrained AWGN setting as well. It should be noted, however, that our result provides more than just proving the conjecture: there also exist examples where the error exponent is well defined (with second derivative), but a connection of the type (3) can only be achieved asymptotically with ε → 0 (see, e.g., [11]). Our result (2) holds for any finite ε.
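Spelling out the differentiation behind this claim (a short derivation added here for completeness): from (4),

\[ \frac{\partial E}{\partial\delta} = -\frac{2e^{-2\delta}}{4\pi e\sigma^2} + 1, \qquad \frac{\partial^2 E}{\partial\delta^2} = \frac{4e^{-2\delta}}{4\pi e\sigma^2} = \frac{e^{-2\delta}}{\pi e\sigma^2}. \]

At δ = δ* we have e^{−2δ*} = 2πeσ², so ∂²E/∂δ² |_{δ=δ*} = 2; by (3), the predicted dispersion is V = 1/E″(δ*) = 1/2.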


[Fig. 1. The power-constrained AWGN dispersion (solid) vs. the unconstrained dispersion (dashed). Axes: dispersion (nat² per channel use) vs. SNR (dB), for SNR between −30 and 30 dB.]

Another indication that the dispersion of the unconstrained setting should be 1/2 comes from the connection to the power-constrained AWGN channel. While the capacity ½ log(1 + P), where P denotes the channel SNR, is clearly unbounded with P, the form of the error exponent curve does have a nontrivial limit as P → ∞. In [2] it was noticed that this limit is the error exponent of the unconstrained AWGN channel (sometimes termed the 'Poltyrev exponent'), where the distance to the capacity is replaced by the NLD distance to δ*. By this analogy we examine the dispersion of the power-constrained AWGN channel at high SNR. In [6] the dispersion was found to be (in nat² per channel use)

\[ V_{AWGN} = \frac{P(P+2)}{2(P+1)^2}. \tag{5} \]

This term already appeared in Shannon's 1959 paper on the AWGN error exponent [5], where its inverse is exactly the second derivative of the error exponent at the capacity (i.e., (3) holds for the AWGN channel). It is therefore no surprise that by taking P → ∞, we get the desired value of 1/2, thus completing the analogy between the power-constrained AWGN channel and its unconstrained version. This convergence is quite fast, and is tight for SNR as low as 10 dB (see Fig. 1).
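As a quick numerical check of this convergence (a sketch, not in the original paper; plain NumPy):

```python
import numpy as np

def awgn_dispersion(P):
    # V_AWGN = P(P+2) / (2(P+1)^2), Eq. (5), in nat^2 per channel use
    return P * (P + 2) / (2 * (P + 1) ** 2)

for snr_db in (0, 10, 20, 30):
    P = 10 ** (snr_db / 10)
    print(f"{snr_db:3d} dB: {awgn_dispersion(P):.4f}")  # -> 0.5 as P grows
```

At 10 dB the value is already ≈ 0.496, illustrating that the convergence is essentially complete from about 10 dB.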

III. PRELIMINARIES

A. Notation

We adopt most of the notation of Poltyrev's paper [3]. Let Cb(a) denote a hypercube in R^n:

\[ \mathrm{Cb}(a) \triangleq \left\{ \mathbf{x} \in \mathbb{R}^n \;\middle|\; \forall i \;\; |x_i| < \frac{a}{2} \right\}. \tag{6} \]

Let Ball(r) denote a hypersphere in R^n of radius r > 0, centered at the origin,

\[ \mathrm{Ball}(r) \triangleq \{ \mathbf{x} \in \mathbb{R}^n \;|\; \|\mathbf{x}\| < r \}, \tag{7} \]

and let Ball(y, r) denote a hypersphere in R^n of radius r > 0, centered at y ∈ R^n,

\[ \mathrm{Ball}(\mathbf{y}, r) \triangleq \{ \mathbf{x} \in \mathbb{R}^n \;|\; \|\mathbf{x} - \mathbf{y}\| < r \}. \tag{8} \]


Let S be an IC. We denote by M(S, a) the number of points in the intersection of Cb(a) and the IC S, i.e., M(S, a) ≜ |S ∩ Cb(a)|. The density of S, denoted by γ(S), or simply γ, measured in points per unit volume, is defined by

\[ \gamma(S) \triangleq \limsup_{a \to \infty} \frac{M(S, a)}{a^n}. \tag{9} \]

The normalized log density (NLD) δ is defined by

\[ \delta \triangleq \frac{1}{n} \log \gamma. \tag{10} \]

It will prove useful to define the following:

Definition 1 (Expectation over points in a hypercube): Let f : S → R be an arbitrary function. Let E_a[f(s)] denote the expectation of f(s), where s is drawn uniformly from the code points that reside in the hypercube Cb(a):

\[ \mathbb{E}_a[f(s)] \triangleq \frac{1}{M(S, a)} \sum_{s \in S \cap \mathrm{Cb}(a)} f(s). \tag{11} \]
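As a concrete illustration of these definitions (an example added here, not from the original text): for the scaled integer lattice S = αZ^n, a cube of side a contains M(S, a) ≈ (a/α)^n points, so

\[ \gamma(S) = \lim_{a\to\infty} \frac{(a/\alpha)^n}{a^n} = \alpha^{-n}, \qquad \delta = \frac{1}{n}\log\gamma = -\log\alpha. \]

Densifying the lattice (decreasing α) increases the NLD, at the price of a higher error probability.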

Throughout the paper, an IC will be used for transmission of information through the unconstrained AWGN channel with noise variance σ² (per dimension). The additive noise is denoted by Z = [Z₁, ..., Z_n]^T; an instantiation of the noise vector is denoted by z = [z₁, ..., z_n]^T. For s ∈ S, let P_e(s) denote the error probability when s is transmitted. When the maximum likelihood (ML) decoder is used, the error probability is given by

\[ P_e(s) = \Pr\{ s + \mathbf{Z} \notin W(s) \}, \tag{12} \]

where W(s) is the Voronoi region of s, i.e., the convex polytope of the points that are closer to s than to any other point s′ ∈ S. The maximal error probability is defined by

\[ P_e^{\max}(S) \triangleq \sup_{s \in S} P_e(s), \tag{13} \]

and the average error probability is defined by

\[ P_e(S) \triangleq \limsup_{a\to\infty} \mathbb{E}_a[P_e(s)]. \tag{14} \]

B. A Key Lemma

A key lemma that will be used throughout the paper concerns the norm of a Gaussian vector.

Lemma 1: Let Z = [Z₁, ..., Z_n]^T be a vector of n independent, zero-mean Gaussian random variables, each with variance σ². Let r > 0 be a given arbitrary radius. Then the following holds for any dimension n:

\[ \left| \Pr\{\|\mathbf{Z}\| > r\} - Q\!\left( \frac{r^2 - n\sigma^2}{\sigma^2\sqrt{2n}} \right) \right| \le \frac{6T}{\sqrt{n}}, \tag{15} \]

where Q(·) is the standard complementary cumulative distribution function, ‖·‖ is the usual ℓ₂ norm, and

\[ T = \mathbb{E}\left[ \left| \frac{X^2 - 1}{\sqrt{2}} \right|^3 \right] \approx 3.0785, \tag{16} \]

where X is a standard Gaussian RV.

Proof: The proof relies on the convergence of a sum of independent random variables to a Gaussian random variable, i.e., the central limit theorem. We first note that

\[ \Pr\{\|\mathbf{Z}\| > r\} = \Pr\left\{ \sum_{i=1}^n Z_i^2 > r^2 \right\}. \tag{17} \]

Let Y_i = (Z_i² − σ²)/(σ²√2). It is easy to verify that E[Y_i] = 0 and that VAR[Y_i] = 1. Let S_n ≜ (1/√n) Σ_{i=1}^n Y_i. Note that S_n also has zero mean and unit variance. It follows that

\[ \Pr\left\{ \sum_{i=1}^n Z_i^2 \ge r^2 \right\} = \Pr\left\{ \sum_{i=1}^n \frac{Z_i^2 - \sigma^2}{\sigma^2\sqrt{2}} \ge \frac{r^2 - n\sigma^2}{\sigma^2\sqrt{2}} \right\} = \Pr\left\{ S_n \ge \frac{r^2 - n\sigma^2}{\sigma^2\sqrt{2n}} \right\}. \tag{18} \]

S_n is a normalized sum of i.i.d. variables, and by the central limit theorem converges to a standard Gaussian random variable. The Berry-Esseen theorem (see Appendix A) quantifies the rate of this convergence in the cumulative-distribution-function sense. In the specific case discussed in the lemma we get

\[ \left| \Pr\left\{ S_n \ge \frac{r^2 - n\sigma^2}{\sigma^2\sqrt{2n}} \right\} - Q\!\left( \frac{r^2 - n\sigma^2}{\sigma^2\sqrt{2n}} \right) \right| \le \frac{6T}{\sqrt{n}}, \tag{19} \]

where T = E[|Y_i|³]. Note that T is independent of σ², finite, and can be evaluated numerically to about 3.0785.
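The constant T can be reproduced by a one-dimensional numerical integration; a sketch (not in the original; assumes SciPy):

```python
import numpy as np
from scipy import integrate, stats

# T = E[ |(X^2 - 1)/sqrt(2)|^3 ] for standard Gaussian X, cf. (16)
integrand = lambda x: abs((x * x - 1) / np.sqrt(2)) ** 3 * stats.norm.pdf(x)
T, _ = integrate.quad(integrand, -np.inf, np.inf)
print(T)                      # ~= 3.0785
print(6 * T / np.sqrt(1000))  # the Berry-Esseen slack 6T/sqrt(n) at n = 1000
```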

IV. DIRECT

In this section we show that for any fixed, nonzero error probability ε > 0, there exist ICs with NLD δ and average error probability at most ε, where

\[ \delta \approx \delta^* - \sqrt{\frac{1}{2n}}\, Q^{-1}(\varepsilon). \tag{20} \]

In fact, these ICs will be lattices, and therefore the same result will hold for the maximal error probability. For lattices the error probability is identical for all the code points, since all the Voronoi cells of a lattice are congruent. In addition, the volume of a Voronoi cell is equal for all cells, and equals the determinant of the lattice, det Λ. The density (in code points per unit volume) is therefore γ = (det Λ)^{−1}, and the NLD is δ = (1/n) log γ = −(1/n) log det Λ.

Theorem 1: Let ε > 0. There exists a lattice Λ with maximal error probability at most ε, and NLD

\[ \delta = \delta^* - \sqrt{\frac{1}{2n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{1}{n}\right). \tag{21} \]

Proof: Let Λ be a lattice that is used as an IC for transmission over the unconstrained AWGN channel. We consider a suboptimal decoder, and therefore the performance of the optimal ML decoder can only be better. The decoder, called a typicality decoder, operates as follows. Suppose that λ ∈ Λ is sent, and the point y = λ + z is received, where z is the additive noise. If there is only a single code point in the ball Ball(y, r), then this is the decoded word. If there are no code points in the ball, or more than one code point in the ball, an error is declared (one of the code points is chosen at random).

Lemma 2: The average error probability of a lattice Λ (with the typicality decoder) is bounded by

\[ P_e(\Lambda) \le \Pr\{\mathbf{Z} \notin \mathrm{Ball}(r)\} + \sum_{\lambda \in \Lambda\setminus\{0\}} \Pr\{\mathbf{Z} \in \mathrm{Ball}(\lambda, r) \cap \mathrm{Ball}(r)\}, \tag{22} \]

where Z denotes the noise vector.

Proof: Since Λ is a lattice, we can assume without loss of generality that the zero point was sent. We divide the error events into two cases. First, if the noise falls outside the ball of radius r (centered at the origin), then decoding surely fails, since the transmitted point 0 lies outside Ball(y, r). The remaining error cases are those where the noise Z is within Ball(r), but also falls in the typical ball of some other lattice point (different from the transmitted zero point). We therefore get

\[
P_e(\Lambda) \le \Pr\{\mathbf{Z} \notin \mathrm{Ball}(r)\} + \Pr\left\{ \mathbf{Z} \in \mathrm{Ball}(r) \cap \bigcup_{\lambda \in \Lambda\setminus\{0\}} \mathrm{Ball}(\lambda, r) \right\}
= \Pr\{\mathbf{Z} \notin \mathrm{Ball}(r)\} + \Pr\left\{ \mathbf{Z} \in \bigcup_{\lambda \in \Lambda\setminus\{0\}} \left( \mathrm{Ball}(\lambda, r) \cap \mathrm{Ball}(r) \right) \right\}
\le \Pr\{\mathbf{Z} \notin \mathrm{Ball}(r)\} + \sum_{\lambda \in \Lambda\setminus\{0\}} \Pr\{\mathbf{Z} \in \mathrm{Ball}(\lambda, r) \cap \mathrm{Ball}(r)\}, \tag{23}
\]

where the last inequality follows from the union bound.

In order to show the existence of a lattice that achieves the desired performance, we use a version of the Minkowski-Hlawka (MH) theorem [8, Lemma 3, p. 65]:

Theorem 2 (MH): Let f : R^n → R₊ be a nonnegative integrable function with bounded support. Then for every ξ > 0, there exists a lattice Λ with det Λ = 1 that satisfies

\[ \sum_{\lambda \in \Lambda\setminus\{0\}} f(\lambda) \le \int_{\mathbb{R}^n} f(\lambda)\, d\lambda + \xi. \tag{24} \]

We utilize the MH theorem to show the existence of a good lattice. Let γ > 0 be the desired density of the lattice, and let

\[ f'(\lambda') \triangleq \Pr\left\{ \mathbf{Z} \in \mathrm{Ball}\!\left(\lambda'\gamma^{-1/n},\, r\right) \cap \mathrm{Ball}(r) \right\}. \tag{25} \]

Note that f′(λ′) = 0 for all ‖λ′‖ > 2r·γ^{1/n}, and that f′ is clearly integrable. By the MH theorem, there exists a lattice Λ′ with det Λ′ = 1 s.t.

\[ \sum_{\lambda' \in \Lambda'\setminus\{0\}} f'(\lambda') \le \int_{\mathbb{R}^n} f'(\lambda')\, d\lambda' + \xi. \tag{26} \]

Define the lattice Λ to be a scaled version of Λ′:

\[ \Lambda \triangleq \gamma^{-1/n}\Lambda' = \{ \lambda'\gamma^{-1/n} \;|\; \lambda' \in \Lambda' \}. \tag{27} \]

It is easily verified that det Λ = γ^{−1} det Λ′ = γ^{−1}, i.e., Λ has density γ. We therefore get

\[
\sum_{\lambda \in \Lambda\setminus\{0\}} \Pr\{\mathbf{Z} \in \mathrm{Ball}(\lambda, r) \cap \mathrm{Ball}(r)\}
= \sum_{\lambda' \in \Lambda'\setminus\{0\}} \Pr\left\{ \mathbf{Z} \in \mathrm{Ball}\!\left(\lambda'\gamma^{-1/n}, r\right) \cap \mathrm{Ball}(r) \right\}
\le \int_{\mathbb{R}^n} \Pr\left\{ \mathbf{Z} \in \mathrm{Ball}\!\left(\lambda'\gamma^{-1/n}, r\right) \cap \mathrm{Ball}(r) \right\} d\lambda' + \xi
= \gamma \int_{\mathbb{R}^n} \Pr\{\mathbf{Z} \in \mathrm{Ball}(\lambda, r) \cap \mathrm{Ball}(r)\}\, d\lambda + \xi. \tag{28}
\]

We further examine the resulting integral:

\[
\int_{\mathbb{R}^n} \Pr\{\mathbf{Z} \in \mathrm{Ball}(\lambda, r) \cap \mathrm{Ball}(r)\}\, d\lambda
= \int_{\mathbb{R}^n} \int_{\mathrm{Ball}(\lambda, r) \cap \mathrm{Ball}(r)} f_{\mathbf{Z}}(\mathbf{z})\, d\mathbf{z}\, d\lambda
\le \int_{\mathbb{R}^n} \int_{\mathrm{Ball}(\lambda, r)} f_{\mathbf{Z}}(\mathbf{z})\, d\mathbf{z}\, d\lambda
= \int_{\mathbb{R}^n} \int_{\mathrm{Ball}(r)} f_{\mathbf{Z}}(\mathbf{z}' + \lambda)\, d\mathbf{z}'\, d\lambda
= \int_{\mathrm{Ball}(r)} \int_{\mathbb{R}^n} f_{\mathbf{Z}}(\mathbf{z}' + \lambda)\, d\lambda\, d\mathbf{z}'
= \int_{\mathrm{Ball}(r)} 1\, d\mathbf{z}' = V_n r^n, \tag{29}
\]

where V_n is the volume of an n-dimensional hypersphere of radius 1. Combined with (22) we get that there exists a lattice Λ with density γ for which

\[ P_e(\Lambda) \le \Pr\{\|\mathbf{Z}\| > r\} + \gamma V_n r^n + \xi, \tag{30} \]

where r > 0, γ > 0 and ξ > 0 can be chosen arbitrarily. It appears that the dominant term in (30) is Pr{‖Z‖ > r}. The intuition follows from the converse result (Theorem 3 in the next section), where Pr{‖Z‖ > r} is the only term in the lower bound.

Let ε > 0 be the desired error probability. Determine r s.t. Pr{‖Z‖ > r} = ε(1 − 2/√n), γ s.t. γ V_n r^n = ε/√n, and ξ = ε/√n. This way it is assured that the error probability is not greater than ε(1 − 2/√n) + ε/√n + ε/√n = ε. Define α_n s.t. r² = nσ²(1 + α_n) (recall that r implicitly depends on n as well).
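This construction is fully explicit and can be evaluated numerically for finite n; a sketch (not in the original; assumes SciPy and σ² = 1) that follows the choices of r, γ and ξ above and compares the resulting NLD with the approximation (20):

```python
import numpy as np
from scipy import stats, special

def achievable_nld(n, eps):
    # choose r s.t. Pr{||Z|| > r} = eps*(1 - 2/sqrt(n)); ||Z||^2 ~ chi^2_n
    r2 = stats.chi2.isf(eps * (1 - 2 / np.sqrt(n)), df=n)
    # choose gamma s.t. gamma * V_n * r^n = eps/sqrt(n)
    log_Vn = (n / 2) * np.log(np.pi) - special.gammaln(n / 2 + 1)
    log_gamma = np.log(eps / np.sqrt(n)) - log_Vn - (n / 2) * np.log(r2)
    return log_gamma / n  # delta = (1/n) log gamma

n, eps = 1000, 1e-3
delta_star = 0.5 * np.log(1 / (2 * np.pi * np.e))
print(achievable_nld(n, eps))                                   # construction
print(delta_star - np.sqrt(1 / (2 * n)) * stats.norm.isf(eps))  # approx. (20)
```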


Lemma 3: α_n, defined above, satisfies

\[ \alpha_n = \sqrt{\frac{2}{n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{1}{n}\right). \tag{31} \]

Proof: By construction, r is chosen s.t.

\[ \Pr\{\|\mathbf{Z}\|^2 > r^2\} = \varepsilon\left(1 - \frac{2}{\sqrt{n}}\right). \tag{32} \]

By the definition of α_n,

\[ \Pr\{\|\mathbf{Z}\|^2 > n\sigma^2(1 + \alpha_n)\} = \varepsilon\left(1 - \frac{2}{\sqrt{n}}\right). \tag{33} \]

By Lemma 1,

\[ \Pr\{\|\mathbf{Z}\|^2 > n\sigma^2(1 + \alpha_n)\} = Q\!\left( \frac{n\sigma^2(1+\alpha_n) - n\sigma^2}{\sigma^2\sqrt{2n}} \right) + O\!\left(\frac{1}{\sqrt{n}}\right) = Q\!\left( \sqrt{\frac{n}{2}}\,\alpha_n \right) + O\!\left(\frac{1}{\sqrt{n}}\right). \tag{34} \]

Combined with (33), we get

\[ \varepsilon\left(1 - \frac{2}{\sqrt{n}}\right) = Q\!\left( \sqrt{\frac{n}{2}}\,\alpha_n \right) + O\!\left(\frac{1}{\sqrt{n}}\right), \tag{35} \]

or

\[ \varepsilon + O\!\left(\frac{1}{\sqrt{n}}\right) = Q\!\left( \sqrt{\frac{n}{2}}\,\alpha_n \right). \tag{36} \]

Taking Q^{-1}(·) of both sides, we get

\[ \sqrt{\frac{n}{2}}\,\alpha_n = Q^{-1}\!\left( \varepsilon + O\!\left(\frac{1}{\sqrt{n}}\right) \right). \tag{37} \]

By the Taylor approximation of Q^{-1}(ε + x) around x = 0, we get

\[ \sqrt{\frac{n}{2}}\,\alpha_n = Q^{-1}(\varepsilon) + O\!\left(\frac{1}{\sqrt{n}}\right), \tag{38} \]

or

\[ \alpha_n = \sqrt{\frac{2}{n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{1}{n}\right), \tag{39} \]

as required.

So far, we have shown the existence of a lattice Λ with error probability at most ε. We now calculate its NLD δ. The NLD is given by

\[
\delta = \frac{1}{n}\log\gamma = \frac{1}{n}\log\frac{\varepsilon}{\sqrt{n}\, V_n r^n}
= -\frac{1}{n}\log V_n - \log r - \frac{\log n}{2n} + \frac{1}{n}\log\varepsilon
= -\frac{1}{n}\log V_n - \frac{1}{2}\log\left[ n\sigma^2(1+\alpha_n) \right] - \frac{\log n}{2n} + \frac{1}{n}\log\varepsilon.
\]


Lemma 4:

\[ \frac{1}{n}\log V_n = \frac{1}{2}\log\frac{2\pi e}{n} - \frac{1}{2n}\log n + O\!\left(\frac{1}{n}\right). \tag{40} \]

Proof: Appendix B.

With Lemma 4, we have

\[ \delta \ge -\frac{1}{2}\log(2\pi e\sigma^2) - \frac{1}{2}\log(1+\alpha_n) + O\!\left(\frac{1}{n}\right). \tag{41} \]

Note that −½ log(2πeσ²) = δ*, the ultimate NLD at infinite dimension [3]. By Lemma 3, we see that α_n = O(1/√n), and by the Taylor approximation of log(1 + α_n) around α_n = 0, we get

\[ \delta \ge \delta^* - \frac{1}{2}\left[ \alpha_n + O(\alpha_n^2) \right] + O\!\left(\frac{1}{n}\right) = \delta^* - \frac{1}{2}\alpha_n + O\!\left(\frac{1}{n}\right) = \delta^* - \sqrt{\frac{1}{2n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{1}{n}\right), \tag{42} \]

which completes the proof of the theorem.

In [3], the author used a random coding technique over a hypercube and gave an error exponent achievability result. This technique can also be used to prove Theorem 1. However, in order for the result to hold for the maximal error probability as well, an additional expurgation step is required, which weakens the result by a factor of (log n)/(2n). In our proof we avoided the need for expurgation by using lattices.

V. CONVERSE

In the previous section we have shown the existence of good ICs whose NLD approaches the NLD capacity δ*. These ICs were lattices, and the convergence to δ* was of the form √(1/(2n)) Q^{-1}(ε). In this section we show that this is the optimal convergence rate, for any IC (not only for lattices).

We start by showing a bound on the NLD of ICs with fixed Voronoi cell volume (which includes lattices) and average error probability. We then extend the result to regular ICs, and finally to any IC. The results in this section concern the average error probability P_e(S). A lower bound on the average error probability is clearly a lower bound on the maximal error probability as well.

A. ICs with Identical Voronoi Cell Volume

Suppose the Voronoi regions of S all have the same volume 1/γ. Such ICs include the important class of lattices, as well as many other constellation types. For this type of IC, we show the following:

Theorem 3: Let S be an IC with NLD δ = (1/n) log γ and average error probability P_e(S) = ε. Suppose that all the Voronoi cells of S have the same volume 1/γ. Then the NLD δ is bounded by

\[ \delta \le \delta^* - \sqrt{\frac{1}{2n}}\, Q^{-1}(\varepsilon) + \frac{1}{2n}\log n + O\!\left(\frac{1}{n}\right). \tag{43} \]


Proof: Suppose s ∈ S is sent. Let r be the radius of a sphere with the same volume as the Voronoi region W(s):

\[ |W(s)| = \frac{1}{\gamma} = e^{-n\delta} = r^n V_n, \tag{44} \]

and therefore

\[ r = e^{-\delta} V_n^{-1/n}. \tag{45} \]

By the equivalent sphere argument [3], [12], the probability that the noise leaves W(s) is lower bounded by the probability that it leaves a sphere of the same volume:

\[ P_e(s) = \Pr\{ s + \mathbf{Z} \notin W(s) \} \ge \Pr\{ \mathbf{Z} \notin \mathrm{Ball}(r) \} = \Pr\{ \|\mathbf{Z}\| \ge r \} = \Pr\{ \|\mathbf{Z}\| \ge e^{-\delta} V_n^{-1/n} \}. \tag{46} \]

By assumption, all the Voronoi regions have the same volume. Therefore the bound (46) holds for any s ∈ S, and hence also for the average error probability ε. The probability Pr{‖Z‖ ≥ r}, or Pr{‖Z‖² ≥ r²}, is given by the upper tail of a χ² random variable with n degrees of freedom. There is no closed-form expression for the CDF of this distribution. In [3], this probability is lower bounded by exp[−n(E_L − o(1))], where E_L is a function of δ and σ² only (and not of n); this gives the sphere packing exponent for this setting. In [12], this probability was calculated as a sum of n/2 elements, which gives an exact expression, but its asymptotic behavior is hard to characterize. Here we use the normal approximation in order to determine the behavior of the density δ with n, where the error probability ε remains fixed. The arguments so far give

\[ \varepsilon \ge \Pr\{ \mathbf{Z} \notin \text{Ball of volume } \gamma^{-1} \} = \Pr\{ \|\mathbf{Z}\| \ge r \} = \Pr\left\{ \sum_{i=1}^n Z_i^2 \ge r^2 \right\} = \Pr\left\{ \sum_{i=1}^n Z_i^2 \ge e^{-2\delta} V_n^{-2/n} \right\}. \tag{47} \]

By Lemma 1, we have

\[ \varepsilon \ge \Pr\left\{ \sum_{i=1}^n Z_i^2 \ge e^{-2\delta} V_n^{-2/n} \right\} \ge Q\!\left( \frac{e^{-2\delta} V_n^{-2/n} - n\sigma^2}{\sigma^2\sqrt{2n}} \right) - \frac{6T}{\sqrt{n}}, \tag{48} \]

where T is the constant given in (16). It follows that

\[ \varepsilon + \frac{6T}{\sqrt{n}} \ge Q\!\left( \frac{e^{-2\delta} V_n^{-2/n} - n\sigma^2}{\sigma^2\sqrt{2n}} \right). \tag{49} \]


Since the Q function is monotonically decreasing, we get

\[ \frac{e^{-2\delta} V_n^{-2/n} - n\sigma^2}{\sigma^2\sqrt{2n}} \ge Q^{-1}\!\left( \varepsilon + \frac{6T}{\sqrt{n}} \right), \tag{50} \]

or

\[ \frac{1}{n\sigma^2}\, e^{-2\delta} V_n^{-2/n} \ge 1 + \sqrt{\frac{2}{n}}\, Q^{-1}\!\left( \varepsilon + \frac{6T}{\sqrt{n}} \right). \tag{51} \]

We take the logarithm of both sides and multiply by ½, and get

\[ -\delta + \frac{1}{2}\log\frac{1}{n\sigma^2} - \frac{1}{n}\log V_n \ge \frac{1}{2}\log\left[ 1 + \sqrt{\frac{2}{n}}\, Q^{-1}\!\left( \varepsilon + \frac{6T}{\sqrt{n}} \right) \right]. \tag{52} \]

By using Lemma 4, the LHS of (52) becomes

\[ -\delta + \frac{1}{2}\log\frac{1}{n\sigma^2} - \frac{1}{2}\log\frac{2\pi e}{n} + \frac{1}{2n}\log n + O\!\left(\frac{1}{n}\right) = -\delta + \frac{1}{2}\log\frac{1}{2\pi e\sigma^2} + \frac{1}{2n}\log n + O\!\left(\frac{1}{n}\right) = \delta^* - \delta + \frac{1}{2n}\log n + O\!\left(\frac{1}{n}\right), \tag{53} \]

where δ* = ½ log [1/(2πeσ²)] is the optimal NLD. We continue with the RHS of (52):

\[
\frac{1}{2}\log\left[ 1 + \sqrt{\frac{2}{n}}\, Q^{-1}\!\left( \varepsilon + \frac{6T}{\sqrt{n}} \right) \right]
\overset{(a)}{=} \frac{1}{2}\log\left[ 1 + \sqrt{\frac{2}{n}}\left( Q^{-1}(\varepsilon) + \frac{6T}{\sqrt{n}}\,\dot{Q}^{-1}(\varepsilon) + O\!\left(\frac{1}{n}\right) \right) \right]
= \frac{1}{2}\log\left[ 1 + \sqrt{\frac{2}{n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{1}{n}\right) \right]
\overset{(b)}{=} \frac{1}{2}\left[ \sqrt{\frac{2}{n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{1}{n}\right) \right] + O\!\left(\frac{1}{n}\right)
= \sqrt{\frac{1}{2n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{1}{n}\right). \tag{54}
\]

Here (a) follows from the Taylor expansion of Q^{-1}(·) around ε, where \dot{Q}^{-1}(·) denotes the derivative of Q^{-1}(·), and (b) follows from the Taylor expansion of log(1 + x) around x = 0.


We plug (53) and (54) back into (52) and get

\[ \delta \le \delta^* - \sqrt{\frac{1}{2n}}\, Q^{-1}(\varepsilon) + \frac{1}{2n}\log n + O\!\left(\frac{1}{n}\right), \tag{55} \]

as required.

Notes:
• When comparing the converse to the achievability result (Theorem 1) we note a gap of (log n)/(2n) in addition to the O(1/n) term. The (log n)/(2n) term comes from the approximation of the Gamma function. Since this approximation is tight (in the multiplicative sense), it seems that this factor cannot be improved and is inherent to the equivalent sphere argument.
• A more careful analysis can yield the next-order term in the expansion of the RHS of (52), and results in

\[ \delta \le \delta^* - \sqrt{\frac{1}{2n}}\, Q^{-1}(\varepsilon) + \frac{1}{2n}\log n + \frac{1}{n}\left( -3\sqrt{2}\, T\,\dot{Q}^{-1}(\varepsilon) + \frac{1}{2}\left(Q^{-1}(\varepsilon)\right)^2 + \frac{1}{2}\log\pi \right) + O\!\left(\frac{1}{n\sqrt{n}}\right). \]

It should be noted that \dot{Q}^{-1}(ε) can be quite large (in absolute value). For example, for ε = 10^{-3}, \dot{Q}^{-1}(ε) = −296.992. Therefore, for smaller values of ε, the bound becomes tight only at larger n.

B. Converse for Regular ICs

In order to extend Theorem 3 to any IC, we first concentrate on ICs with some mild regularity assumptions.

Definition 2 (Regular ICs): An IC S is called regular if:
1) There exists a radius r₀ > 0 s.t. for all s ∈ S, the Voronoi cell W(s) is contained in Ball(s, r₀).
2) The density γ(S) is given by lim_{a→∞} M(S, a)/a^n (rather than the lim sup in the original definition).

For s ∈ S, we denote by v(s) the volume of the Voronoi cell of s, |W(s)|.

Definition 3 (Average Voronoi cell volume): For a regular IC S, the average Voronoi cell volume is defined by

\[ v(S) \triangleq \limsup_{a\to\infty} \mathbb{E}_a[v(s)]. \tag{56} \]

Lemma 5: For a regular IC S, the average volume is given by the inverse of the density:

\[ \gamma(S) = \frac{1}{v(S)}. \tag{57} \]

Proof: Appendix C.

In Theorem 3 we used in (46) the equivalent sphere bound, i.e., the fact that the probability of leaving a Voronoi cell of volume v is lower bounded by the probability that the noise leaves a sphere of volume v. Let SPB(v) denote that probability, i.e.,

\[ \mathrm{SPB}(v) \triangleq \Pr\{ \mathbf{Z} \notin \text{Ball of volume } v \}. \tag{58} \]

The equivalent sphere bound can be applied to any of the individual codewords in S:

\[ P_e(s) \ge \mathrm{SPB}(v(s)). \tag{59} \]
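For finite n, SPB(·) is simply the upper tail of a χ² distribution, so bounds such as (59) are easy to evaluate; a numerical sketch (not in the original; assumes SciPy):

```python
import numpy as np
from scipy import stats, special

def SPB(v, n, sigma2=1.0):
    # Pr{Z leaves a ball of volume v}: r = (v / V_n)^(1/n), ||Z||^2/sigma^2 ~ chi^2_n
    log_Vn = (n / 2) * np.log(np.pi) - special.gammaln(n / 2 + 1)
    r2 = np.exp(2.0 * (np.log(v) - log_Vn) / n)
    return stats.chi2.sf(r2 / sigma2, df=n)
```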


We now show that the above equation holds for the average volume and error probability as well.

Theorem 4: Let S be a regular IC, and let v(S) be the average Voronoi cell volume of S. Then the average error probability of S is lower bounded by

\[ P_e(S) \ge \mathrm{SPB}(v(S)). \tag{60} \]

Proof: The proof relies on the convexity of the sphere bound:

Lemma 6: The equivalent sphere bound SPB(v) is a convex function of the Voronoi cell volume v.

Proof: Appendix D.

We therefore get

\[
P_e(S) = \limsup_{a\to\infty} \mathbb{E}_a[P_e(s)]
\overset{(a)}{\ge} \limsup_{a\to\infty} \mathbb{E}_a[\mathrm{SPB}(v(s))]
\overset{(b)}{\ge} \limsup_{a\to\infty} \mathrm{SPB}(\mathbb{E}_a[v(s)])
\overset{(c)}{=} \mathrm{SPB}\!\left( \limsup_{a\to\infty} \mathbb{E}_a[v(s)] \right)
= \mathrm{SPB}(v(S)). \tag{61}
\]

Here (a) follows from the sphere bound applied to each individual point s ∈ S, (b) follows from Jensen's inequality and the convexity of SPB(·) (Lemma 6), and (c) follows from the fact that SPB(·) is continuous.

Theorem 4 extends the sphere bound to (regular) infinite constellations. For finite dimension n, this allows easy computation of bounds on the performance of infinite constellations with known average Voronoi cell volume. It also helps in proving the following:

Theorem 5: For any regular IC S with NLD δ and average error probability ε, the following must hold:

\[ \delta \le \delta^* - \sqrt{\frac{1}{2n}}\, Q^{-1}(\varepsilon) + \frac{1}{2n}\log n + O\!\left(\frac{1}{n}\right). \tag{62} \]

Proof: By Theorem 4, the average error probability is lower bounded by SPB(v(S)). Since S is regular, v(S) = γ^{−1} = e^{−nδ}, and we get the same relationship between ε and γ as for ICs with constant Voronoi cell volume in Eq. (47). The rest of the proof is identical to the proof of Theorem 3.

C. Converse for General ICs

In the previous subsection the analysis was limited to regular ICs only. This excludes constellations that are semi-infinite (e.g., contain points only in half of the space), and also constellations in which the density oscillates with the cube size a (so that the formal limit γ does not exist). We now extend the proof of the converse to any IC, without the regularity assumptions. The proof is based on the following regularization process:


Lemma 7 (Regularization): Let S be an IC with density γ and average error probability P_e(S) = ε. Then for any ξ > 0 there exists a regular IC S′ with density γ′ ≥ γ/(1 + ξ) and average error probability P_e(S′) = ε′ ≤ ε(1 + ξ).

Proof: Appendix E.

Theorem 6: For any IC S with NLD δ and average error probability ε, the following must hold:

\[ \delta \le \delta^* - \sqrt{\frac{1}{2n}}\, Q^{-1}(\varepsilon) + \frac{1}{2n}\log n + O\!\left(\frac{1}{n}\right). \tag{63} \]

Proof: Let S be an IC. Let ξ = 1/√n, and let S′ be the regularized IC, according to ξ and Lemma 7. Since S′ is regular, we get by Theorem 5 that

\[ \delta' \le \delta^* - \sqrt{\frac{1}{2n}}\, Q^{-1}(\varepsilon') + \frac{1}{2n}\log n + O\!\left(\frac{1}{n}\right), \tag{64} \]

where ε′ = P_e(S′) is the average error probability, γ′ is the density, and δ′ = (1/n) log γ′ is the NLD of S′. According to the lemma, ε′ ≤ ε(1 + 1/√n) and γ′ ≥ γ/(1 + 1/√n). It follows that

\[ \delta' = \frac{1}{n}\log\gamma' \ge \frac{1}{n}\log\gamma - \frac{1}{n}\log\left(1 + \frac{1}{\sqrt{n}}\right) \ge \delta - \frac{1}{n}. \tag{65} \]

Similarly to (54), we use the Taylor approximation and get

\[ Q^{-1}(\varepsilon') \ge Q^{-1}\!\left( \varepsilon\left(1 + \frac{1}{\sqrt{n}}\right) \right) = Q^{-1}(\varepsilon) + O\!\left(\frac{1}{\sqrt{n}}\right). \tag{66} \]

Combining (65) and (66) into (64) gives

\[ \delta - \frac{1}{n} \le \delta' \le \delta^* - \sqrt{\frac{1}{2n}}\, Q^{-1}(\varepsilon') + \frac{1}{2n}\log n + O\!\left(\frac{1}{n}\right) \le \delta^* - \sqrt{\frac{1}{2n}}\left[ Q^{-1}(\varepsilon) + O\!\left(\frac{1}{\sqrt{n}}\right) \right] + \frac{1}{2n}\log n + O\!\left(\frac{1}{n}\right), \tag{67} \]

and (63) follows immediately.

VI. ERROR EXPONENTS

The error exponent of the Poltyrev setting is defined in the usual manner:

\[ E(\delta) \triangleq \lim_{n\to\infty} -\frac{1}{n}\log P_e(n, \delta), \tag{68} \]

where P_e(n, δ) is the lowest error probability that can be achieved by any IC in n dimensions with NLD δ. An upper bound on this exponent is the analog of the sphere packing exponent; it is given by [3]

\[ E_{sp}(\delta) = \frac{e^{-2\delta}}{4\pi e\sigma^2} + \delta + \frac{1}{2}\log 2\pi\sigma^2 = \frac{1}{2}\left[ e^{2(\delta^*-\delta)} - 1 - 2(\delta^*-\delta) \right], \tag{69} \]

where δ* = ½ log [1/(2πeσ²)]. The random coding exponent (a lower bound on the error exponent) is given by

\[ E_r(\delta) = \begin{cases} \delta^* - \delta + \frac{1}{2}\log\frac{e}{4}, & \delta \le \delta_{cr}; \\[2pt] \frac{e^{-2\delta}}{4\pi e\sigma^2} + \delta + \frac{1}{2}\log 2\pi\sigma^2, & \delta_{cr} \le \delta < \delta^*; \\[2pt] 0, & \delta \ge \delta^*, \end{cases} \tag{70} \]

where δ_cr = ½ log [1/(4πeσ²)]. It is tempting to try to prove the random coding exponent using the simple suboptimal decoder used in the proof in Section IV, rather than the ML decoder and its complex analysis in [3]. Although it gives the optimal rate of convergence to the capacity, the technique based on the simple decoder does not yield the random coding exponent.

A. Typicality Decoder

In (30) we have shown that for any r > 0 and for all ξ > 0 there exists a lattice Λ for which

\[ P_e(\Lambda) \le \Pr\{\|\mathbf{Z}\| > r\} + \gamma V_n r^n + \xi. \tag{71} \]

In the error exponent framework, we fix the NLD δ and optimize this expression w.r.t. r so as to achieve the lowest possible error probability, which will hopefully give a good (high) error exponent. It is known [3] that

\[ \Pr\{\|\mathbf{Z}\|^2 > r^2\} = \Pr\{\|\mathbf{Z}\|^2 > n\rho\} \le \begin{cases} \exp\left[ -n\left( \dfrac{\rho}{2\sigma^2} - \dfrac{1}{2}\log\dfrac{\rho e}{\sigma^2} \right) \right], & \rho \ge \sigma^2; \\[4pt] 1, & \rho \le \sigma^2, \end{cases} \tag{72} \]

where ρ = r²/n. This bounding technique is used in the proof of the converse part (Theorem 3), which gives the tight error exponent; it therefore makes sense to use the same bound here. Using the last inequality we get, for any ρ > σ²,

\[ P_e(\Lambda) \le \exp\left[ -n\left( \frac{\rho}{2\sigma^2} - \frac{1}{2}\log\frac{\rho e}{\sigma^2} \right) \right] + e^{n\delta}(n\rho)^{n/2} V_n + \xi. \tag{73} \]

Note that ξ can be chosen arbitrarily small, e.g., equal to the first term in the expression, so that it does not affect the exponent. We therefore get that the achieved error probability is bounded by

\[ 3\exp\left[ -n\min\left\{ \frac{\rho}{2\sigma^2} - \frac{1}{2}\log\frac{\rho e}{\sigma^2},\; -\delta - \frac{1}{n}\log\left( (n\rho)^{n/2} V_n \right) \right\} \right]. \tag{74} \]

The above exponent holds for any selection of ρ. We apply the Stirling approximation (see Appendix B) and get that

\[ -\frac{1}{n}\log\left( (n\rho)^{n/2} V_n \right) = -\frac{1}{2}\log(2\pi e\rho) + O\!\left(\frac{\log n}{n}\right). \tag{75} \]

We note that, after ignoring the O(log n/n) part, one of the expressions in the minimization increases with ρ while the other decreases. We therefore select the value of ρ that makes them equal in order to get the maximal exponent. The optimal value is ρ* = σ²(1 + 2(δ* − δ)).

[Fig. 2. The error exponent achieved by typicality decoding, E_t(δ) (dot-dashed), vs. the random coding exponent E_r(δ) (dotted) and the sphere packing exponent E_sp(δ) (solid), plotted against the NLD δ over a range including δ_cr and δ*. The noise variance σ² is set to 1.]

Overall, the achieved error exponent for the typicality decoder, denoted E_t(δ), is given by

\[ E_t(\delta) = \delta^* - \delta - \frac{1}{2}\log\left( 1 + 2(\delta^* - \delta) \right). \tag{76} \]

This exponent is lower than Poltyrev's random coding exponent (70); the difference can be seen in Figure 2. This may seem surprising, since this decoder gives the optimal behavior in the dispersion sense (Theorem 1). The apparent paradox is resolved by noting that the second derivatives of all the curves in the figure coincide at δ* (this can also be verified analytically), which hints that at rates close to the capacity the suboptimality of the typicality decoder can be tolerated.
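The three exponents are simple closed-form expressions, and the comparison in Figure 2 is easy to reproduce; a sketch (not in the original; plain NumPy, σ² = 1 as in the figure):

```python
import numpy as np

D_STAR = 0.5 * np.log(1 / (2 * np.pi * np.e))  # delta* for sigma^2 = 1
D_CR = D_STAR - 0.5 * np.log(2)                # critical NLD delta_cr

def E_sp(d):   # sphere packing exponent, Eq. (69)
    g = D_STAR - d
    return 0.5 * (np.exp(2 * g) - 1 - 2 * g)

def E_r(d):    # random coding exponent, Eq. (70)
    if d >= D_STAR:
        return 0.0
    if d <= D_CR:
        return D_STAR - d + 0.5 * np.log(np.e / 4)
    return E_sp(d)  # E_r and E_sp coincide for delta_cr <= delta < delta*

def E_t(d):    # typicality-decoder exponent, Eq. (76)
    g = D_STAR - d
    return g - 0.5 * np.log(1 + 2 * g)

for d in np.linspace(D_CR, D_STAR - 1e-3, 5):
    print(f"{d:.3f}: Esp={E_sp(d):.4f} Er={E_r(d):.4f} Et={E_t(d):.4f}")
```

Near δ* all three curves agree to second order, in line with the dispersion result.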

B. ML Decoder

We now show how to derive Poltyrev's random coding exponent using techniques from Theorem 1.

Lemma 8: For any dimension n, r* > 0 and ξ > 0 there exists a lattice Λ for which the ML error probability is bounded by

\[ P_e(\Lambda) \le \gamma V_n \int_0^{r^*} f_R(r)\, r^n\, dr + \Pr\{\|\mathbf{z}\| > r^*\} + \xi, \tag{77} \]

where f_R(r) denotes the PDF of the noise vector's radius.

Proof: Suppose that the zero lattice point was sent, and the noise vector is z ∈ R^n. An error event occurs (for an ML decoder) when there is a nonzero lattice point λ ∈ Λ whose Euclidean distance to the noise vector z is smaller than the distance from z to the zero point. We denote by E the error event and get

\[ P_e(\Lambda) = \Pr\{E\} = \mathbb{E}_r\left[ \Pr\{E \mid \|\mathbf{z}\| = r\} \right] = \int_0^\infty f_R(r)\Pr\{E \mid \|\mathbf{z}\| = r\}\, dr \le \int_0^{r^*} f_R(r)\Pr\{E \mid \|\mathbf{z}\| = r\}\, dr + \Pr\{\|\mathbf{z}\| > r^*\}, \tag{78} \]

where the last inequality follows by upper bounding the probability by 1; it holds for any r* > 0. We examine the conditional error probability Pr{E | ‖z‖ = r}:

\[ \Pr\{E \mid \|\mathbf{z}\| = r\} = \Pr\left\{ \bigcup_{\lambda \in \Lambda\setminus\{0\}} \left\{ \|\mathbf{z} - \lambda\| \le \|\mathbf{z}\| \right\} \;\middle|\; \|\mathbf{z}\| = r \right\} \le \sum_{\lambda \in \Lambda\setminus\{0\}} \Pr\{\|\mathbf{z} - \lambda\| \le \|\mathbf{z}\| \mid \|\mathbf{z}\| = r\} = \sum_{\lambda \in \Lambda\setminus\{0\}} \Pr\{\lambda \in \mathrm{Ball}(\mathbf{z}, \|\mathbf{z}\|) \mid \|\mathbf{z}\| = r\}, \tag{79} \]

where the inequality follows from the union bound. Plugging into the left term in (78) gives

\[ \int_0^{r^*} f_R(r) \sum_{\lambda \in \Lambda\setminus\{0\}} \Pr\{\lambda \in \mathrm{Ball}(\mathbf{z}, \|\mathbf{z}\|) \mid \|\mathbf{z}\| = r\}\, dr = \sum_{\lambda \in \Lambda\setminus\{0\}} \int_0^{r^*} f_R(r)\Pr\{\lambda \in \mathrm{Ball}(\mathbf{z}, \|\mathbf{z}\|) \mid \|\mathbf{z}\| = r\}\, dr. \tag{80} \]

Note that the last integral has bounded support (w.r.t. λ): it is always zero if ‖λ‖ ≥ 2r*. Therefore we can apply the Minkowski-Hlawka theorem as in Theorem 1 and get that for any ξ > 0 there exists a lattice Λ with density γ whose error probability is upper bounded by

\[ P_e(\Lambda) \le \gamma \int_{\mathbb{R}^n} \int_0^{r^*} f_R(r)\Pr\{\lambda \in \mathrm{Ball}(\mathbf{z}, \|\mathbf{z}\|) \mid \|\mathbf{z}\| = r\}\, dr\, d\lambda + \Pr\{\|\mathbf{z}\| > r^*\} + \xi. \]

We continue with

\[
\int_{\mathbb{R}^n} \int_0^{r^*} f_R(r)\Pr\{\lambda \in \mathrm{Ball}(\mathbf{z}, \|\mathbf{z}\|) \mid \|\mathbf{z}\| = r\}\, dr\, d\lambda
= \int_0^{r^*} f_R(r) \int_{\mathbb{R}^n} \Pr\{\lambda \in \mathrm{Ball}(\mathbf{z}, \|\mathbf{z}\|) \mid \|\mathbf{z}\| = r\}\, d\lambda\, dr
= \int_0^{r^*} f_R(r)\, \mathbb{E}\left[ \int_{\mathbb{R}^n} 1_{\{\lambda \in \mathrm{Ball}(\mathbf{z}, \|\mathbf{z}\|)\}}\, d\lambda \;\middle|\; \|\mathbf{z}\| = r \right] dr
= \int_0^{r^*} f_R(r)\, \mathbb{E}\left[ r^n V_n \mid \|\mathbf{z}\| = r \right] dr
= V_n \int_0^{r^*} f_R(r)\, r^n\, dr,
\]

thus completing the proof of the lemma.

When analyzing error exponents we may ignore the addition of ξ, since it can be chosen to be equal to the sum of the other terms, and thereby have no effect on the exponent.¹ Similarly to Theorem 1, we select r* to be (approximately) the radius of the equivalent sphere of a Voronoi cell of the lattice (which is a function of its NLD δ only). Instead of the exact radius e^{−δ}V_n^{−1/n}, we select the first term of the approximation obtained by the Stirling approximation, and set r* = e^{−δ}√(n/(2πe)). We choose the approximated version so that ρ* ≜ r*²/n does not depend on n; the (technical) motivation for this will be clarified later on. Using (72) we can bound the term Pr{‖z‖ > r*} by

\[ \Pr\{\|\mathbf{Z}\|^2 > r^{*2}\} = \Pr\{\|\mathbf{Z}\|^2 > n\rho^*\} \le \exp\left[ -n\left( \frac{\rho^*}{2\sigma^2} - \frac{1}{2}\log\frac{\rho^* e}{\sigma^2} \right) \right]. \tag{81} \]

Since ρ*/σ² = e^{2(δ*−δ)}, we have

\[ \Pr\{\|\mathbf{Z}\|^2 > n\rho^*\} \le \exp\left[ -n\cdot\frac{1}{2}\left( e^{2(\delta^*-\delta)} - 1 - 2(\delta^*-\delta) \right) \right] = \exp\left[ -n E_{sp}(\delta) \right]. \tag{82} \]

¹The same result without the addition of ξ can be derived using another version of the Minkowski-Hlawka theorem.

We now turn to handle the left term in (77). We first write the integral in terms of ρ = r²/n:

\[ \gamma V_n \int_0^{r^*} f_R(r)\, r^n\, dr = \gamma V_n\, \frac{n}{2} \int_0^{\rho^*} f_R(\sqrt{n\rho})\, (n\rho)^{\frac{n-1}{2}}\, d\rho. \tag{83} \]

The amplitude ‖Z‖ of the noise vector is a (normalized) χ random variable with n degrees of freedom, and therefore f_R(r) is given by

\[ f_R(r) = \frac{1}{\sigma} f_{\chi_n}(r/\sigma) = \frac{1}{\sigma}\cdot\frac{2^{1-\frac{n}{2}} (r/\sigma)^{n-1} e^{-\frac{r^2}{2\sigma^2}}}{\Gamma\!\left(\frac{n}{2}\right)} = \exp\left[ -\log\sigma + \left(1 - \frac{n}{2}\right)\log 2 + (n-1)\log(r/\sigma) - \frac{r^2}{2\sigma^2} - \log\Gamma\!\left(\frac{n}{2}\right) \right]. \]

Rewriting in terms of ρ and using the Stirling approximation gives

\[ f_R(\sqrt{n\rho}) = \exp\left[ -n\left( \frac{\rho}{2\sigma^2} - \frac{1}{2} - \frac{1}{2}\log\frac{\rho}{\sigma^2} \right) + O(\log n) \right]. \]

Note that the above exponent is identical to the sphere packing exponent, with ρ* replaced by ρ. We therefore define

\[ \tilde{E}_{sp}(\rho) \triangleq \frac{\rho}{2\sigma^2} - \frac{1}{2} - \frac{1}{2}\log\frac{\rho}{\sigma^2}. \tag{84} \]

We now examine the expression γ V_n (nρ)^{n/2}:

\[ \gamma V_n (n\rho)^{n/2} = \exp\left[ -n\left( \delta^* - \delta - \frac{1}{2}\log\frac{\rho}{\sigma^2} \right) + O(\log n) \right] \triangleq \exp\left[ -n E_1(\delta, \rho) + O(\log n) \right]. \tag{85} \]

We combine (83), (84) and (85) and get

\[ P_e(\Lambda) \le \int_0^{\rho^*} e^{-n\left( E_1(\delta,\rho) + \tilde{E}_{sp}(\rho) + O\left(\frac{\log n}{n}\right) \right)}\, d\rho + e^{-n E_{sp}(\delta)}. \tag{86} \]

˜ sp (ρ). The first integral can then be For convenience, we define E2 (δ, ρ) , E1 (δ, ρ) + E R ∗ ρ upper bounded by nk 0 e−nE2 (δ,ρ) dρ for some constant k. When n grows, the asymptotical behavior of the integral is dominated by the value ρ˜ that minimizes E2 (δ, ρ). This is formalized by Laplace’s method of integration (see, e.g. [13, Sec. 3.3]): s    Z ρ∗ 2π 1 −nE2 (δ,ρ) −nE2 (δ,˜ ρ) , (87) e dρ = e 1 + O 2 n n ∂ E∂ρ2 (δ,ρ) |ρ=˜ρ 0 2 where ρ˜ , arg minρ∈[0,ρ∗ ] E2 (δ, ρ). Here we used the fact that the upper integration limit ρ∗ does not depend on n. Straightforward optimization of E2 (δ, ρ) gives  2 2σ , 2σ 2 ≤ ρ∗ ; 2 ∗ ρ˜ = argmin E2 (δ, ρ) = min{2σ , ρ } = (88) ρ∗ , o.w. ρ∈[0,ρ∗ ]

Note that the boundary 2σ 2 = ρ∗ occurs exactly when δ = δ ∗ − 21 log 2, which is the ’critical rate’ δcr (see (70)). It follows that  ∗ δ − δ + 12 log 4e , δ ≤ δ ∗ + 12 log 2; ∗ (89) E2 (δ, ρ˜) = δ − δ ∗ − 12 + 21 e−2(δ−δ ) , o.w.

which can be written E2 (δ, ρ˜) = Er (δ) (see (70)). Since the second derivative of E2 (δ, ρ) is constant and bounded away from zero, we plug back into (86) and get log n Pe (Λ) ≤ e−n(Er (δ)+O( n )) + e−nEsp (δ) log n = e−n(min{Er (δ),Esp (δ)}+O( n ))

= e−n(Er (δ)+O( as required.

log n n

)) ,

(90)

21

VII. VOLUME-TO-NOISE RATIO

The volume-to-noise ratio (VNR) of a lattice Λ is defined as

\[ \mu(\Lambda, \varepsilon) = \frac{[\text{Volume of Voronoi region}]^{2/n}}{[\text{Noise variance}]} = \frac{\gamma^{-2/n}}{\sigma^2(\varepsilon)}, \tag{91} \]

where σ²(ε) is the noise variance s.t. the error probability is exactly ε. This dimensionless figure of merit is another way to quantify the goodness of a lattice for coding over the unconstrained AWGN channel. Note that the VNR is invariant to scaling of the lattice, and that the definition can be extended to general infinite constellations. The minimum possible value of μ(Λ, ε) over all lattices in R^n is denoted by μ_n(ε), and it is known that for any 1 > ε > 0, lim_{n→∞} μ_n(ε) = 2πe. Using results from the previous sections we can show how μ_n(ε) approaches 2πe:

Theorem 7: For a fixed error probability ε > 0, the optimal VNR μ_n(ε) is given by

\[ \mu_n(\varepsilon) = 2\pi e + \sqrt{\frac{8\pi^2 e^2}{n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{\log n}{n}\right). \tag{92} \]

Proof: In Theorems 1 and 3 we have shown that for given ε and σ², the optimal normalized log density δ is given by

\[ \delta_\varepsilon(n) = \delta^* - \sqrt{\frac{1}{2n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{\log n}{n}\right), \tag{93} \]

where δ* = ½ log [1/(2πeσ²)]. By definition, the following relation holds for any σ²:

\[ \mu_n(\varepsilon) = \frac{e^{-2\delta_\varepsilon(n)}}{\sigma^2} \tag{94} \]

(note that δ_ε(n) implicitly depends on σ² as well). It follows that

\[
\mu_n(\varepsilon) = \frac{1}{\sigma^2}\exp\left[ -2\delta^* + \sqrt{\frac{2}{n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{\log n}{n}\right) \right]
= \frac{1}{\sigma^2}\exp\left[ \log(2\pi e\sigma^2) + \sqrt{\frac{2}{n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{\log n}{n}\right) \right]
= 2\pi e\cdot\exp\left[ \sqrt{\frac{2}{n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{\log n}{n}\right) \right]
= 2\pi e\left[ 1 + \sqrt{\frac{2}{n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{\log n}{n}\right) \right], \tag{95}
\]

where the last step follows from the Taylor expansion of e^x. (92) follows immediately.

Note that the theorem can be slightly strengthened by using the more delicate bounds on δ_ε(n) in Theorems 1 and 3 rather than the loose term O(log n / n).

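Theorem 7 gives a handy closed-form approximation for finite n; a numerical sketch (not in the original; assumes SciPy):

```python
import numpy as np
from scipy.stats import norm

def vnr_approx(n, eps):
    # mu_n(eps) ~ 2*pi*e + sqrt(8*pi^2*e^2/n) * Q^{-1}(eps), cf. (92)
    return 2 * np.pi * np.e + np.sqrt(8 * np.pi**2 * np.e**2 / n) * norm.isf(eps)

for n in (100, 1000, 10000):
    print(n, vnr_approx(n, 1e-3) / (2 * np.pi * np.e))  # ratio -> 1
```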

VIII. SUMMARY AND CONCLUSIONS

In this paper we examined the optimal normalized log density (NLD) of infinite constellations for channel coding over the unconstrained AWGN channel, when a fixed error probability ε is allowed. We showed that the optimal NLD can be approximated by a closed-form expression, and that the gap to the optimal NLD vanishes as the inverse of the square root of the dimension n. The result is analogous to the channel dispersion theorem in classical channel coding, and agrees with the interpretation of the unconstrained setting as the high-SNR limit of the power-constrained AWGN channel. As a consequence of the main result, the optimal volume-to-noise ratio of lattices and infinite constellations can also be approximated in a similar manner. The paper's main result can be extended to more general noise models.

APPENDIX A
THE CENTRAL LIMIT THEOREM AND THE BERRY-ESSEEN THEOREM

By the central limit theorem (CLT), a normalized sum of n independent random variables converges (in distribution) to a Gaussian random variable. The Berry-Esseen theorem gives the speed of this convergence (see [14, Ch. XVI.5]). We quote here the version for i.i.d. random variables, which is sufficient for this paper.

Theorem 8 (Berry-Esseen for i.i.d. RVs [14]): Let {Y_i}_{i=1}^n be i.i.d. random variables with zero mean and unit variance. Let T ≜ E[|Y_i|³] and assume it is finite. Let S_n ≜ (1/√n) Σ_{i=1}^n Y_i be the normalized sum, and note that S_n also has zero mean and unit variance. Then for all α ∈ R and for all n ∈ N,

\[ \left| \Pr\{S_n \ge \alpha\} - Q(\alpha) \right| \le \frac{6T}{\sqrt{n}}. \tag{96} \]
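The theorem is easy to probe by simulation for the variables Y_i used in Lemma 1; a Monte Carlo sketch (not in the original; assumes NumPy/SciPy):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, trials, alpha = 100, 50_000, 1.0

# Y_i = (Z_i^2 - 1)/sqrt(2) with Z_i ~ N(0,1): zero mean, unit variance
Z = rng.standard_normal((trials, n))
Sn = ((Z**2 - 1) / np.sqrt(2)).sum(axis=1) / np.sqrt(n)

print(abs((Sn >= alpha).mean() - norm.sf(alpha)))  # empirical gap to Q(alpha)
print(6 * 3.0785 / np.sqrt(n))                     # Berry-Esseen bound 6T/sqrt(n)
```

The observed gap is typically far below the worst-case guarantee 6T/√n.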

APPENDIX B
APPROXIMATING V_n

Proof of Lemma 4: The volume of a hypersphere of unit radius is V_n = π^{n/2}/Γ(n/2 + 1) (see, e.g., [15, p. 9]). It follows that

\[ \frac{1}{n}\log V_n = \frac{1}{2}\log\pi - \frac{1}{n}\log\Gamma\!\left(\frac{n}{2}+1\right). \tag{97} \]

We use the Stirling approximation for the Gamma function for z ∈ R (see, e.g., [16, Sec. 5.11]):

\[ \Gamma(z+1) = z\Gamma(z) = \sqrt{2\pi z}\left(\frac{z}{e}\right)^z\left( 1 + O\!\left(\frac{1}{z}\right) \right), \tag{98} \]

\[ \log\left( z\Gamma(z) \right) = \frac{1}{2}\log(2\pi z) + z\log\frac{z}{e} + \log\left( 1 + O\!\left(\frac{1}{z}\right) \right) = \frac{1}{2}\log(2\pi z) + z\log\frac{z}{e} + O\!\left(\frac{1}{z}\right). \tag{99} \]


By letting z = n/2 we get

\[ \log\Gamma\!\left(\frac{n}{2}+1\right) = \log\left( \frac{n}{2}\Gamma\!\left(\frac{n}{2}\right) \right) = \frac{1}{2}\log(n\pi) + \frac{n}{2}\log\frac{n}{2e} + O\!\left(\frac{1}{n}\right), \tag{100} \]

\[ \frac{1}{n}\log\Gamma\!\left(\frac{n}{2}+1\right) = \frac{1}{2n}\log(n\pi) + \frac{1}{2}\log\frac{n}{2e} + O\!\left(\frac{1}{n^2}\right). \tag{101} \]

Combined with (97) we have

\[ \frac{1}{n}\log V_n = \frac{1}{2}\log\pi - \frac{1}{2n}\log(n\pi) - \frac{1}{2}\log\frac{n}{2e} + O\!\left(\frac{1}{n^2}\right) = \frac{1}{2}\log\frac{2\pi e}{n} - \frac{1}{2n}\log n + O\!\left(\frac{1}{n}\right), \tag{102} \]

as required.
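The quality of the approximation (102) can be checked directly against the exact log-volume; a sketch (not in the original; assumes SciPy):

```python
import numpy as np
from scipy import special

def log_Vn_exact(n):
    # V_n = pi^(n/2) / Gamma(n/2 + 1)
    return (n / 2) * np.log(np.pi) - special.gammaln(n / 2 + 1)

def log_Vn_approx(n):
    # (1/n) log V_n ~ (1/2) log(2*pi*e/n) - log(n)/(2n), cf. (102)
    return 0.5 * np.log(2 * np.pi * np.e / n) - np.log(n) / (2 * n)

for n in (10, 100, 1000):
    print(n, log_Vn_exact(n) / n - log_Vn_approx(n))  # O(1/n) residual
```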

APPENDIX C
PROPERTIES OF REGULAR ICs

Proof of Lemma 5: Let S be a regular IC with a given r₀. Let V(a) denote the union of all the Voronoi cells of code points in Cb(a):

\[ \mathcal{V}(a) \triangleq \bigcup_{s \in S \cap \mathrm{Cb}(a)} W(s). \tag{103} \]

Since all Voronoi cells are bounded in spheres of radius r₀, we note the following (for a > 2r₀):
• All the Voronoi cells of the code points in Cb(a) are contained in Cb(a + 2r₀), and therefore V(a) ⊆ Cb(a + 2r₀).
• Any point in Cb(a − 2r₀) must be in a Voronoi cell of some code point. These code points cannot lie outside Cb(a), because the Voronoi cells are bounded in spheres of radius r₀; so they must lie within Cb(a), and we get that Cb(a − 2r₀) ⊆ V(a).

It follows that (a − 2r₀)^n ≤ |V(a)| ≤ (a + 2r₀)^n, or

\[ (a - 2r_0)^n \le \sum_{s \in S \cap \mathrm{Cb}(a)} v(s) \le (a + 2r_0)^n. \tag{104} \]

Dividing by a^n and taking the limit a → ∞ gives

\[ \lim_{a\to\infty} \frac{\sum_{s \in S \cap \mathrm{Cb}(a)} v(s)}{a^n} = 1. \tag{105} \]

Since, by assumption, the limit defining the density γ(S) exists, we get

\[ \gamma(S) = \lim_{a\to\infty} \frac{M(S,a)}{a^n} = \lim_{a\to\infty} \frac{M(S,a)}{\sum_{s \in S \cap \mathrm{Cb}(a)} v(s)} = \frac{1}{\displaystyle\lim_{a\to\infty} \frac{1}{M(S,a)} \sum_{s \in S \cap \mathrm{Cb}(a)} v(s)} = \frac{1}{v(S)}. \tag{106} \]


As a corollary, we get that the average volume v(S) exists as a limit (and not only in the lim sup sense).

APPENDIX D
CONVEXITY OF THE EQUIVALENT SPHERE BOUND

Proof of Lemma 6: Suppose v is the volume of the Voronoi cell. The radius of the equivalent sphere is given by r = v^{1/n} V_n^{−1/n}. The equivalent sphere bound is given by

\[ \mathrm{SPB}(v) = \Pr\left\{ \sum_{i=1}^n Z_i^2 \ge r^2 \right\} = \Pr\left\{ \sum_{i=1}^n (Z_i/\sigma)^2 \ge \frac{v^{2/n}}{V_n^{2/n}\sigma^2} \right\} \triangleq \Pr\left\{ \sum_{i=1}^n (Z_i/\sigma)^2 \ge (C_1\cdot v)^{2/n} \right\}, \tag{107} \]

where C₁ is a constant. We note that Σ_{i=1}^n (Z_i/σ)² is a sum of n i.i.d. squared Gaussian RVs with zero mean and unit variance, which is exactly a χ² distribution with n degrees of freedom. We therefore get

\[ \mathrm{SPB}(v) = \frac{1}{\Gamma(n/2)\, 2^{n/2}} \int_{(C_1 v)^{2/n}}^\infty x^{n/2-1} e^{-x/2}\, dx = C_2 \int_{(C_1 v)^{2/n}}^\infty x^{n/2-1} e^{-x/2}\, dx \triangleq C_2\, F(C_1\cdot v), \tag{108} \]

where C₂ is a constant and F(t) ≜ ∫_{t^{2/n}}^∞ x^{n/2−1} e^{−x/2} dx. It can be verified by straightforward differentiation that

\[ \frac{\partial^2}{\partial t^2} F(t) = \frac{\partial^2}{\partial t^2} \int_{t^{2/n}}^\infty x^{n/2-1} e^{-x/2}\, dx = \frac{2}{n^2}\, t^{\frac{2}{n}-1} \exp\left( -\frac{1}{2} t^{2/n} \right), \tag{109} \]

which is strictly positive for all t > 0. Therefore F(t) is convex, and the equivalent sphere bound SPB(v) = C₂F(C₁·v) is a convex function of v.

APPENDIX E
PROOF OF THE REGULARIZATION LEMMA

Proof of Lemma 7: Our first step will be to find a hypercube Cb(a*), so that the density of the points in S ∩ Cb(a*) and the error probability of the codewords in S ∩ Cb(a*) are close enough to γ and ε, respectively. We then replicate this cube in order to get a regular IC with the desired properties. The idea is similar to that used in [3, Appendix C]. By the definitions of P_e(S) and γ(S),

\[ \gamma(S) = \limsup_{a\to\infty} \frac{M(S,a)}{a^n} = \lim_{a\to\infty} \sup_{b>a} \frac{M(S,b)}{b^n}, \tag{110} \]

\[ P_e(S) = \limsup_{a\to\infty} \frac{1}{M(S,a)} \sum_{s \in S \cap \mathrm{Cb}(a)} P_e(s) = \lim_{a\to\infty} \sup_{b>a} \frac{1}{M(S,b)} \sum_{s \in S \cap \mathrm{Cb}(b)} P_e(s). \tag{111} \]

Let τ_γ = √(1+ξ) and τ_ε = 1 + ξ/2. By the definition of the limit, there must exist a₀ large enough s.t. for every a > a₀ both

\[ \sup_{b>a} \frac{M(S,b)}{b^n} > \gamma\cdot\frac{1}{\tau_\gamma} \tag{112} \]

and

\[ \sup_{b>a} \frac{1}{M(S,b)} \sum_{s \in S \cap \mathrm{Cb}(b)} P_e(s) < \varepsilon\cdot\tau_\varepsilon \tag{113} \]

hold.

Define ∆ s.t. Q(∆/σ) = ε·ξ/2, and define a_∆ as the solution to

\[ \left( \frac{a_\Delta + 2\Delta}{a_\Delta} \right)^n = \sqrt{1+\xi}. \tag{114} \]

Let a_max = max{a₀, a_∆}. According to (112), there must exist a* > a_max s.t.

\[ \frac{M(S, a^*)}{a^{*n}} > \gamma\cdot\frac{1}{\tau_\gamma}. \tag{115} \]

By (113) we get that

\[ \frac{1}{M(S, a^*)} \sum_{s \in S \cap \mathrm{Cb}(a^*)} P_e(s) \le \sup_{b > a_{\max}} \frac{1}{M(S,b)} \sum_{s \in S \cap \mathrm{Cb}(b)} P_e(s) < \varepsilon\cdot\tau_\varepsilon. \tag{116} \]

Now consider the finite constellation G = S ∩ Cb(a*). For s ∈ G, denote by P_e^G(s) the error probability of s when G is used for transmission with Gaussian noise. Since G ⊂ S, clearly P_e^G(s) ≤ P_e(s). The average error probability for G is bounded by

\[ P_e(G) \triangleq \frac{1}{|G|} \sum_{s \in G} P_e^G(s) \le \frac{1}{|G|} \sum_{s \in G} P_e(s) \le \varepsilon\cdot\tau_\varepsilon. \tag{117} \]

We now turn to the second part: constructing an IC from the expurgated code G. Define the IC S′ as an infinite replication of G with spacing 2∆ between every two copies, as follows:

\[ S' \triangleq \{ s + I\cdot(a^* + 2\Delta) \;:\; s \in G,\; I \in \mathbb{Z}^n \}, \tag{118} \]

where Z^n denotes the integer lattice of dimension n. Now consider the error probability of a point s ∈ S′, denoted by P_e^{S′}(s). This error probability equals the probability of decoding by mistake to another codeword from the same copy of G or to a codeword in another copy. By the union bound, we get that

\[ P_e^{S'}(s) \le P_e^G(s) + Q(\Delta/\sigma). \tag{119} \]

The right term follows from the fact that in order to err toward a codeword in a different copy of G, the noise must have an amplitude of at least ∆. The average error probability over S′ is bounded by

\[ P_e(S') \le P_e(G) + Q(\Delta/\sigma) \le \varepsilon\cdot\tau_\varepsilon + Q(\Delta/\sigma) = \varepsilon(1+\xi), \tag{120} \]


as required, where the last equality follows from the definitions of τ_ε and ∆.

The density of a cube of edge size a* + 2∆ is given by |G|(a* + 2∆)^{−n}. Define ã_k ≜ (a* + 2∆)(2k − 1) for any integer k. Note that for any k > 0, Cb(ã_k) contains exactly (2k − 1)^n copies of G, and therefore

\[ \frac{M(S', \tilde a_k)}{\tilde a_k^n} = \frac{|G|(2k-1)^n}{\tilde a_k^n} = \frac{|G|}{(a^* + 2\Delta)^n}. \tag{121} \]

For any a > 0, let k* be the minimal integer k s.t. ã_k ≥ a. Clearly,

\[ \tilde a_{k^*-1} = \tilde a_{k^*} - (a^* + 2\Delta) < a \le \tilde a_{k^*}. \tag{122} \]

Therefore

\[ \frac{M(S', \tilde a_{k^*-1})}{a^n} < \frac{M(S', a)}{a^n} \le \frac{M(S', \tilde a_{k^*})}{a^n}, \tag{123} \]

and

\[ \frac{|G|}{(a^*+2\Delta)^n}\cdot\frac{\tilde a_{k^*-1}^n}{a^n} < \frac{M(S', a)}{a^n} \le \frac{|G|}{(a^*+2\Delta)^n}\cdot\frac{\tilde a_{k^*}^n}{a^n}. \tag{124} \]

By taking the limit a → ∞ of (124), we get that the limit exists and is given by

\[ \gamma(S') = \lim_{a\to\infty} \frac{M(S', a)}{a^n} = \frac{|G|}{(a^* + 2\Delta)^n}. \tag{125} \]

It follows that

\[ \gamma(S') = \frac{|G|}{(a^*+2\Delta)^n} = \frac{|G|}{a^{*n}}\cdot\left( \frac{a^*}{a^*+2\Delta} \right)^n \overset{(a)}{\ge} \gamma(S)\,\frac{1}{\tau_\gamma}\left( \frac{a^*}{a^*+2\Delta} \right)^n \overset{(b)}{\ge} \gamma(S)\,\frac{1}{1+\xi}, \tag{126} \]

where (a) follows from (115) and (b) follows from the definitions of τ_γ and a_∆, and from the fact that a_∆ ≤ a*.

It remains to show that the resulting IC S′ is regular, i.e., that all its Voronoi cells are bounded in a sphere of some fixed radius r₀. The fact that the density is achieved as a limit (and not only in the lim sup sense) was already established in (125). Let s be an arbitrary point in S′. By construction (see (118)), the points {s ± (a* + 2∆)e_i : i = 1, ..., n} are also in S′ (where e_i denotes the vector with 1 in the i-th coordinate and zeros elsewhere). We therefore conclude that the Voronoi cell W(s) is contained in the hypercube s + Cb(a* + 2∆), and is clearly bounded in a sphere of radius r₀ ≜ √n (a* + 2∆).

REFERENCES

[1] G. D. Forney Jr. and G. Ungerboeck, "Modulation and coding for linear Gaussian channels," IEEE Trans. on Information Theory, vol. 44, no. 6, pp. 2384–2415, 1998.
[2] U. Erez and R. Zamir, "Achieving ½ log(1+SNR) over the additive white Gaussian noise channel with lattice encoding and decoding," IEEE Trans. on Information Theory, vol. 50, pp. 2293–2314, Oct. 2004.
[3] G. Poltyrev, "On coding without restrictions for the AWGN channel," IEEE Trans. on Information Theory, vol. 40, no. 2, pp. 409–417, 1994.
[4] R. G. Gallager, Information Theory and Reliable Communication. New York, NY, USA: John Wiley & Sons, 1968.
[5] C. E. Shannon, "Probability of error for optimal codes in a Gaussian channel," The Bell System Technical Journal, vol. 38, pp. 611–656, 1959.
[6] Y. Polyanskiy, H. V. Poor, and S. Verdú, "Channel coding rate in the finite blocklength regime," IEEE Trans. on Information Theory, vol. 56, no. 5, pp. 2307–2359, May 2010.
[7] V. Strassen, "Asymptotische Abschätzungen in Shannons Informationstheorie," in Trans. Third Prague Conf. Information Theory, Czechoslovak Academy of Sciences, 1962, pp. 689–723.
[8] E. Hlawka, J. Schoißengeier, and R. Taschner, Geometric and Analytic Number Theory. Springer-Verlag, 1991.
[9] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 1991.
[10] R. Zamir, "Lattices are everywhere," in 4th Annual Workshop on Information Theory and its Applications (ITA), La Jolla, CA, 2009.
[11] A. Ingber and M. Feder, "Parallel bit-interleaved coded modulation," available on arxiv.org.
[12] V. Tarokh, A. Vardy, and K. Zeger, "Universal bound on the performance of lattice codes," IEEE Trans. on Information Theory, vol. 45, no. 2, pp. 670–681, Mar. 1999.
[13] O. E. Barndorff-Nielsen and D. R. Cox, Asymptotic Techniques for Use in Statistics. New York: Chapman and Hall, 1989.
[14] W. Feller, An Introduction to Probability Theory and Its Applications, vol. 2, 2nd ed. John Wiley & Sons, 1971.
[15] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices and Groups, ser. Grundlehren der mathematischen Wissenschaften, vol. 290. Springer, 1993.
[16] National Institute of Standards and Technology, "Digital library of mathematical functions," http://dlmf.nist.gov, May 2010.