

The capacity of finite Abelian group codes over symmetric memoryless channels

G. Como and F. Fagnani

Abstract—The capacity of finite Abelian group codes over symmetric memoryless channels is determined. For certain important examples, such as m-PSK constellations over AWGN channels, with m a prime power, it is shown that this capacity coincides with the Shannon capacity; i.e. there is no loss in capacity using group codes. (This had previously been known for binary linear codes used over binary-input output-symmetric memoryless channels.) On the other hand, a counterexample involving a three-dimensional geometrically uniform constellation is presented in which the use of Abelian group codes leads to a loss in capacity. The error exponent of the average group code is determined, and it is shown to be bounded away from the random-coding error exponent, at low rates, for finite Abelian groups not admitting Galois field structure.

Keywords: non-binary constellation, geometrically uniform constellation, m-PSK, group codes, Shannon capacity, error exponent, channel coding theorem.

I. INTRODUCTION

It is a well-known fact that binary linear codes suffice to reach capacity on binary-input output-symmetric channels [1], [2], [3]. Moreover, by averaging over the ensemble of linear codes, the same error exponent is achieved as by averaging over the ensemble of all codes. The same has been proven to hold true [4] for group codes over finite Abelian groups admitting Galois field structure. In this paper we investigate the same question for group codes employed over non-binary channels exhibiting symmetries with respect to the action of a finite Abelian group G. The main example we have in mind is the additive white Gaussian noise (AWGN) channel with input set restricted to a finite geometrically uniform (GU) constellation [5] (m-PSK, for instance) and with a possibly hard- or soft-decision decoding rule. In [6] it was conjectured that group codes should suffice in this case to achieve capacity, exactly as in the field case. On the other hand, in [4] it was conjectured that group codes do not achieve the random-coding exponent if the group G does not admit Galois field structure. To our knowledge, there has been no progress in either of these directions. Some of the material of this paper was presented at ISIT 2005 [30].

G. Como was with the Department of Mathematics, Politecnico di Torino, and the Department of Electrical Engineering, Yale University. He is now with the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA. Email: [email protected]
F. Fagnani is with the Dipartimento di Matematica, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10126 Torino, Italy. Email: [email protected]

However, interest in group codes has not decreased over the years. Indeed, they offer the possibility of using more spectrally efficient signal constellations while retaining many of the good qualities of binary linear codes. More specifically, on the one hand, group codes have congruent Voronoi regions and invariant distance profiles, and they enjoy the uniform error property. On the other hand, the nice structure of the corresponding minimal encoders, syndrome formers and trellis representations makes group codes appealing for low-memory encoding and low-complexity iterative decoding schemes. We refer to [7]–[20] and references therein for an overview of the many lines of research on group codes that have been developing in recent years. Observe that coset codes over finite fields achieve the capacity and the random-coding error exponent of any memoryless channel [2]. However, whenever the group structure does not match the symmetry of the channel (e.g. binary coset codes on 2^r-PSK AWGN channels, for r > 2), or if the channel is not symmetric, coset codes in general fail to be GU, do not enjoy the uniform error property, and have non-invariant distance profiles. Recently, group codes have also appeared in the context of turbo concatenated schemes [21], [22] and of low-density parity-check (LDPC) codes [23], [24], [25], [26]. In the binary case an important issue, for these types of high-performance coding schemes, is the evaluation of the gap to Shannon capacity, as well as the rate of convergence to zero of the word and bit error rates. For regular LDPC codes such gaps have been evaluated quite precisely [23], [27], [28], and it has been shown that, as the density parameters are allowed to increase, these schemes tend to attain the performance of generic binary linear codes. In [24], [25] the authors extend such an analysis to LDPC codes over the cyclic group Z_q, but they must restrict themselves to the case of prime q.
We believe that, without first a complete understanding of our original question, namely whether group codes themselves allow capacity and the correct error exponent to be reached, LDPC codes over general Abelian groups cannot be properly analyzed, since it cannot be determined whether the gap to capacity is due to the group structure or to the sparseness of the syndrome representation. In [29], a fundamental analysis of LDPC codes over Abelian groups is proposed, based on the general results for group codes presented in this paper. Our work focuses on the case of finite Abelian groups and is organized as follows. In Section II we introduce all relevant notation, briefly review the Shannon-Gallager theory of the capacity and error exponents of memoryless channels together with basic concepts concerning GU constellations, and formally state the main question of whether group codes can


achieve the capacity of a symmetric channel. In Section III we consider memoryless channels which are symmetric with respect to the action of cyclic groups Z_{p^r} of prime power order, and we determine, in a computationally effective way, the capacity achievable by group codes over such channels. This capacity is called the Z_{p^r}-capacity and equals the minimum of the normalized Shannon capacities of the channels obtained by restricting the input to all non-trivial subgroups of Z_{p^r}. The results are contained in Theorem 5, an inverse coding theorem for group codes, and in Theorem 7, an average result over the ensemble of group codes. The error exponent for the average group code is determined as well. It is shown that for r > 1 the average Z_{p^r}-code is bounded away from the random-coding exponent, at least at low rates, confirming a conjecture of Dobrushin [4]. In Section IV we show that, for the p^r-PSK AWGN channel, the Z_{p^r}-capacity and the classical Shannon capacity coincide, so that group codes achieve capacity in this case. This proves a conjecture of Loeliger [6]. In Section V we present a counterexample based on a three-dimensional GU constellation for which the two capacities are instead shown to differ. It remains an open problem whether the Shannon capacity can be achieved in this case using non-Abelian generating groups. Finally, in Section VI we generalize the theory to channels symmetric with respect to the action of arbitrary finite Abelian generating groups.

II. PROBLEM STATEMENT

In this section all relevant notation and definitions are introduced, and a formal statement of the problem is presented.

A. Notation

Throughout the paper the functions exp : R → R and log : (0, +∞) → R are taken with respect to the same, arbitrarily chosen, base a ∈ (1, +∞), unless explicitly stated otherwise. For a subset A ⊆ B, 1_A : B → {0, 1} will denote the indicator function of A, defined by 1_A(x) = 1 if x ∈ A and 1_A(x) = 0 if x ∉ A. For two groups G and H we will write G ≃ H to mean that they are isomorphic, while H ≤ G will mean that H is a subgroup of G. Unless otherwise stated, we shall use multiplicative notation for a generic group G, with 1_G denoting the identity element; when restricted to the Abelian case we shall switch to additive notation, with 0 denoting the identity element. Given a finite set A, we shall consider the simplex \(\mathcal{P}(A) := \{\theta : A \to [0, +\infty) \,|\, \sum_a \theta(a) = 1\}\) of probability measures on A. The discrete entropy function \(H : \mathcal{P}(A) \to \mathbb{R}\) is defined by \(H(\theta) := -\sum_{\theta(a) > 0} \theta(a) \log \theta(a)\). Similarly, for a continuous space B we shall denote by \(\mathcal{P}(B)\) the set of probability densities on B and define the entropy function \(H : \mathcal{P}(B) \to [-\infty, +\infty]\) by \(H(\mu) := -\int_B \mu(x) \log \mu(x)\,dx\). Given \(x \in A^N\), its A-type (or empirical frequency) is the probability measure \(\theta_A(x) \in \mathcal{P}(A)\) given by \(\theta_A(x) := \frac{1}{N} \sum_{1 \le i \le N} 1_{\{x_i\}}\). Define the set of types of all N-tuples by \(\mathcal{P}_N(A) := \theta_A(A^N)\); the union \(\bigcup_N \mathcal{P}_N(A)\) is then the set of all A-types. The number of A-types, \(|\mathcal{P}_N(A)| = \binom{N + |A| - 1}{|A| - 1}\), grows polynomially fast in N. Instead, the set of N-tuples of a given type θ, denoted by \(A^N_\theta := \{x \in A^N \text{ s.t. } \theta_A(x) = \theta\}\), has cardinality \(|A^N_\theta| = \binom{N}{N\theta} := \frac{N!}{\prod_a (N\theta(a))!}\), growing exponentially fast with N.
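These counting facts are easy to check numerically; the following sketch (illustrative code, not part of the paper, with hypothetical helper names) computes the number of A-types and the size of a type class:

```python
from math import comb, factorial
from collections import Counter

def num_types(N, alphabet_size):
    # |P_N(A)| = C(N + |A| - 1, |A| - 1): weak compositions of N into |A| parts
    return comb(N + alphabet_size - 1, alphabet_size - 1)

def type_class_size(x):
    # |A^N_theta| = N! / prod_a (N * theta(a))!  for theta = theta_A(x)
    size = factorial(len(x))
    for count in Counter(x).values():
        size //= factorial(count)
    return size

# polynomially many types, exponentially large (balanced) type classes:
print(num_types(10, 2))             # 11
print(type_class_size([0, 1] * 5))  # 10!/(5!5!) = 252
```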

B. Coding theory for memoryless channels

Throughout the present paper, stationary memoryless channels (MCs) will be considered, described by a triple (X, Y, W), where X is the input set, Y is the output set and, for every x in X, W(·|x) is a probability density on Y describing the conditional distribution of the output given that the input x has been transmitted. The input set X will always be assumed finite, while the output set Y will often be identified with the n-dimensional Euclidean space R^n. Nevertheless, all the results presented in this paper continue to hold when Y is a discrete space as well; in this case one simply replaces Lebesgue integrals with sums over Y.¹ We shall consider the N-th extension of an MC (X, Y, W), having input set X^N, output set Y^N and transition probability densities \(W_N(y|x) = \prod_{j=1}^N W(y_j|x_j)\). This motivates the name memoryless, the various transmissions being probabilistically independent once the input signals have been fixed. As usual, a block code is any subset C ⊆ X^N, while a decoder is any (measurable) mapping D : Y^N → C. A coding scheme consists of a pair of a code and a decoder. N is the block-length, while R = log |C|/N will denote the transmission rate. The probabilistic model of transmission is obtained by assuming that the transmitted codeword is a random variable (r.v.) X uniformly distributed over C, and that the channel-output r.v. Y has conditional probability density W_N(·|X) given X. An error occurs when the output Y is incorrectly decoded, i.e. it is the event {D(Y) ≠ X}. The error probability of the coding scheme (C, D) is therefore given by
\[ p_e(C, D) := \frac{1}{|C|} \sum_{x \in C} p_e(C, D \,|\, x) , \]
where \(p_e(C, D \,|\, x) := \int_{\mathcal{Y}^N} 1_{C \setminus \{x\}}(D(y))\, W_N(y|x)\, dy\) is the error probability conditioned on the transmission of the codeword x.
It is well known that, given a code C, the decoder minimizing the error probability is the maximum-likelihood (ML) one, \(D_{ML}(y) := \operatorname{argmax}_{x \in C} W_N(y|x)\), cases of non-uniqueness being resolved by assigning to D_ML(y) a value x ∈ C arbitrarily chosen from the set of maxima of W_N(y|x). Throughout the paper we will always assume that ML decoding is used, and write p_e(C) and p_e(C|x) for p_e(C, D_ML) and p_e(C, D_ML|x), respectively.

¹ In fact, all the results hold true when Y is a Borel space [31] and integrations are carried out with respect to an abstract σ-finite reference measure, with respect to which all conditional output measures are absolutely continuous.
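For a discrete output alphabet, ML decoding reduces to maximizing a sum of log-transition probabilities over the codebook; a minimal sketch (illustrative code, not from the paper):

```python
import math

def ml_decode(y, codebook, W):
    # ML decoder: maximize W_N(y|x) = prod_j W[y_j][x_j] over the code;
    # log-domain sums avoid underflow; ties are resolved arbitrarily (first maximum)
    return max(codebook, key=lambda x: sum(math.log(W[yj][xj]) for yj, xj in zip(y, x)))

# BSC with crossover 0.1, stored as W[y][x], and the length-3 repetition code
W = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
code = [(0, 0, 0), (1, 1, 1)]
assert ml_decode((0, 1, 0), code, W) == (0, 0, 0)
```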


In order to state the classical channel coding theorem we are only left with defining the capacity and the random-coding exponent of an MC (X, Y, W). The former is defined as

\[ C := \max_{p \in \mathcal{P}(\mathcal{X})} \int_{\mathcal{Y}} \sum_{x \in \mathcal{X}} p(x)\, W(y|x) \log \frac{W(y|x)}{\sum_{z \in \mathcal{X}} p(z) W(y|z)} \, dy . \tag{1} \]

The latter is instead given, for R ∈ [0, log |X|], by

\[ E(R) := \max_{0 \le \rho \le 1} \; \max_{p \in \mathcal{P}(\mathcal{X})} \left( E_0(\rho, p) - \rho R \right) , \tag{2} \]

where, for every ρ ∈ [0, 1] and p ∈ P(X),

\[ E_0(\rho, p) := -\log \int_{\mathcal{Y}} \left( \sum_{x \in \mathcal{X}} p(x)\, W(y|x)^{\frac{1}{1+\rho}} \right)^{1+\rho} dy . \tag{3} \]

A well-known fact (see [2], [3]) is that

\[ E(R) > 0 \;\Leftrightarrow\; R < C . \tag{4} \]
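For a channel with finitely many outputs, (1)-(4) reduce to finite sums and can be evaluated by a simple grid search over ρ; a sketch for the BSC (illustrative code, not from the paper, using the uniform input, which is optimal by symmetry):

```python
import math

def E0(rho, p, W):
    # Gallager function (3), natural logs:
    # E0(rho, p) = -log sum_y ( sum_x p[x] * W[x][y]**(1/(1+rho)) )**(1+rho)
    ys = {y for x in W for y in W[x]}
    s = sum(
        sum(p[x] * W[x][y] ** (1.0 / (1.0 + rho)) for x in W) ** (1.0 + rho)
        for y in ys
    )
    return -math.log(s)

def random_coding_exponent(R, W, grid=1000):
    # E(R) from (2), with the uniform input p
    p = {x: 1.0 / len(W) for x in W}
    return max(E0(i / grid, p, W) - (i / grid) * R for i in range(grid + 1))

# BSC with crossover eps, stored as W[x][y]
eps = 0.1
W = {0: {0: 1 - eps, 1: eps}, 1: {0: eps, 1: 1 - eps}}
# Shannon capacity of the BSC in nats: log 2 - H(eps)
C = math.log(2) + (1 - eps) * math.log(1 - eps) + eps * math.log(eps)
assert random_coding_exponent(0.5 * C, W) > 1e-6   # E(R) > 0 below capacity
assert random_coding_exponent(1.1 * C, W) < 1e-6   # and E(R) = 0 above it, as in (4)
```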

Moreover, the random-coding exponent E(R) is continuous, monotonically decreasing and convex in the interval [0, C), while the dependence of both C and E(R) on the channel is continuous. Given a design rate R ∈ [0, log |X|] and block-length N, the random-coding ensemble is obtained by considering a random collection C_N of ⌈exp(RN)⌉ possibly non-distinct N-tuples from X^N, each sampled independently with distribution \(\bigotimes_{1 \le j \le N} \mu^*\), where μ* ∈ P(X) is the optimal input distribution in (2). \(\overline{p}_e(C_N)\) will denote the average error probability with respect to such a probability distribution. We can now state the Shannon-Gallager coding theorem for MCs.

Theorem. Let an MC (X, Y, W) be given, having capacity C and random-coding exponent E(R). Then:
(a) \(\overline{p}_e(C_N) \le \exp(-N E(R))\). In particular, this implies that the average error probability tends to 0 exponentially fast as N → +∞, provided that the rate of the codes is kept below C.
(b) For every R > C there exists a constant A_R > 0, independent of N, such that for any coding scheme having rate not smaller than R we have p_e(C) ≥ A_R.

C. Symmetric memoryless channels and group codes

In this paper we shall focus on MCs exhibiting symmetries, and on codes matching such symmetries. In order to formalize the notion of symmetry, a few concepts about group actions need to be recalled. Given a finite group G, with identity 1_G, and a set A, we say that G acts on A if for every g ∈ G a bijection of A, denoted by a ↦ ga, is defined, such that h(ga) = (hg)a for all h, g ∈ G and all a ∈ A. In particular, the identity map corresponds to 1_G, and the maps corresponding to an element g and its inverse g⁻¹ are the inverse of each other. The action is said to be

transitive if for every a, b ∈ A there exists g ∈ G such that ga = b, and simply transitive if the element g above is always unique in G. If G acts simply transitively on a set A, then G is necessarily in bijection with A, a possible bijection being given by g ↦ ga₀ for any fixed a₀ ∈ A. Finally, the action of a group G on a measure space A is said to be isometric if it consists of measure-preserving bijections. In particular, when A is a finite set, all group actions are isometric. When A = R^n, instead, this is a real restriction, and it is satisfied if the maps a ↦ ga are isometries of R^n, i.e. maps preserving the Euclidean distance.

Definition 1. Let G be a group. An MC (X, Y, W) is said to be G-symmetric if
(a) G acts simply transitively on X,
(b) G acts isometrically on Y,
(c) W(y|x) = W(gy|gx) for every g ∈ G, x ∈ X, y ∈ Y.

The simplest example of a symmetric MC is the following one; a much richer family of symmetric MCs, based on GU signal constellations, will be presented in Sect. II-D.

Example 1 (m-ary symmetric channel). Consider a finite set X of cardinality m ≥ 2 and some ε ∈ [0, 1]. The m-ary symmetric channel is described by the triple (X, X, W), where W(y|x) = 1 − ε if y = x and W(y|x) = ε/(m − 1) otherwise. This channel returns the transmitted input symbol x as output with probability 1 − ε, while with probability ε a wrong symbol is received, uniformly distributed over the set X \ {x}. The special case m = 2 corresponds to the BSC. The m-ary symmetric channel exhibits the highest possible level of symmetry. Indeed, it is G-symmetric for every group G of order |G| = m. To see this, it is sufficient to observe that every group acts simply transitively on itself. Notice that whenever m = p^r for some prime p and positive integer r, the group G can be chosen to be Z_p^r, which is compatible with the structure of the Galois field F_{p^r}.
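For the m-ary symmetric channel the uniform input is optimal, which yields the closed form C = log m − H(ε) − ε log(m − 1); a quick numerical cross-check against definition (1) (illustrative sketch, not from the paper, in nats):

```python
import math

def mary_symmetric_capacity(m, eps):
    # closed form in nats: C = log m - H(eps) - eps * log(m - 1),
    # H being the binary entropy function
    H = -eps * math.log(eps) - (1 - eps) * math.log(1 - eps) if 0 < eps < 1 else 0.0
    return math.log(m) - H - eps * math.log(m - 1)

def mutual_information_uniform(m, eps):
    # I(X;Y) of definition (1) with the uniform input, evaluated directly;
    # by symmetry the output marginal is uniform as well, so p(y) = 1/m
    I = 0.0
    for x in range(m):
        for y in range(m):
            w = (1 - eps) if y == x else eps / (m - 1)
            if w > 0:
                I += (1.0 / m) * w * math.log(w * m)
    return I

m, eps = 8, 0.2
assert abs(mary_symmetric_capacity(m, eps) - mutual_information_uniform(m, eps)) < 1e-9
```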
A first property of G-symmetric channels is that, for both their Shannon capacity C and their random-coding exponent E(R), the maximizing probability distribution p ∈ P(X) in the variational definitions (1) and (2) can be chosen to be the uniform distribution over the input set X. Since the input of a G-symmetric MC can be identified with the group G itself, block codes for such channels are subsets C ⊆ G^N. However, it is natural to consider a subclass of codes matching the symmetry of the channel: they are known as group codes.

Definition 2. For a finite group G, a G-code is a subgroup C ≤ G^N.

G-codes enjoy many properties when used over G-symmetric MCs. In particular [5], they have congruent Voronoi (ML-decoding) regions and invariant distance profiles. As a consequence, the uniform error property (UEP) holds true, namely the error probability does not depend on the transmitted codeword: p_e(C|x) = p_e(C|x′) for every x, x′ in C. Another important property is that the ML error probability of a G-code can be bounded by a function of its type-spectrum only. For a code C ⊆ G^N and a type θ in P(G), let \(S_C(\theta) := |C \cap G^N_\theta|\) be the number of codewords x of C of type θ. The following estimate is proved using techniques similar to those in [32]. It will be used in Sect. III-B while proving the direct coding theorem for Z_{p^r}-codes.

Lemma 3. Let G be a finite group, (G, Y, W) a G-symmetric MC, and C ⊆ G^N a code such that 1_{G^N} ∈ C. Then, for every ρ ∈ [0, 1],

\[ p_e(C \,|\, 1_{G^N}) \le \frac{1}{|G|^N} \sum_{z \in G^N} \int_{\mathcal{Y}^N} W_N^{\frac{1}{1+\rho}}(y|z) \Bigg( \sum_{\substack{\theta \in \mathcal{P}_N(G) \\ \theta \ne \delta_{1_G}}} \frac{S_C(\theta)}{\binom{N}{N\theta}} \sum_{x \in G^N_\theta} W_N^{\frac{1}{1+\rho}}(y|zx) \Bigg)^{\!\rho} dy . \tag{5} \]

Proof: See Appendix A.

Observe that Lemma 3 does not assume C to be a G-code. However, when C is a G-code, (5) provides an estimate of p_e(C) by the UEP. A fundamental question arising is whether G-codes allow the capacity of a G-symmetric MC to be achieved. This is known to be the case for binary linear codes over binary-input output-symmetric channels. Moreover, as shown in [4], the same continues to hold true whenever the group G has the property that every non-trivial element g in G has the same order, i.e. when G is isomorphic to Z_p^r for some prime p and positive integer r. However, in [6] Loeliger conjectured that Z_m-codes should suffice to achieve capacity on the m-PSK AWGN channel even for non-prime m. In this paper Loeliger's conjecture will be proved true for m equal to a prime power. More generally, the capacity achievable by G-codes over G-symmetric channels will be characterized for any finite Abelian group G, and a counterexample will be presented showing that, when G is not isomorphic to Z_p^r, G-codes may fail to achieve Shannon capacity.

Fig. 1. 8-PSK constellation with the two labelings Z8 and D4.

D. Geometrically uniform signal constellations

A finite n-dimensional constellation is a finite subset S ⊂ R^n spanning R^n, i.e. every x ∈ R^n can be written as \(x = \sum_{s \in S} \alpha_s s\) with α_s ∈ R. We shall restrict ourselves to the study of finite constellations S ⊂ R^n with barycenter 0, i.e. such that \(\sum_{s \in S} s = 0\): these minimize the average per-symbol energy over the class of constellations obtained from one another by applying isometries. We denote by Γ(S) the symmetry group of S, namely the set of all isometric permutations of S with the group structure given by composition. Clearly Γ(S) acts on S. S is said to be geometrically uniform (GU) if this action is transitive; a subgroup G ≤ Γ(S) is a generating group for S if for every s, r ∈ S a unique g ∈ G exists such that gs = r, namely if G acts simply transitively on S. It is well known that not every finite GU constellation admits a generating group (see [33] for a counterexample). In what follows, however, we will always assume that the constellations we deal with admit generating groups, and indeed Abelian ones.

Let S be a finite n-dimensional GU constellation equipped with a generating group G. Define the S-AWGN channel as the n-dimensional unquantized AWGN channel with input set S, output R^n, and Gaussian transition densities given by

\[ W(y|x) = \frac{1}{(2\pi\sigma^2)^{n/2}} \, e^{-\frac{\|y-x\|^2}{2\sigma^2}} . \]

The S-AWGN channel is G-symmetric. A well-known fact (see [7]) is that every finite GU constellation S lies on a sphere; with no loss of generality we shall assume that the radius of this sphere is unitary. The above construction of G-symmetric channels with a finite GU constellation S as input can be extended to a much wider class of channels. Indeed, one can consider the hard-decoded version of the S-AWGN channel, obtained by quantizing the output over the Voronoi regions of S through the map Q : R^n → S,

\[ Q(x) = \operatorname{argmin}_{s \in S} \|x - s\| . \]

Moreover, the whole theory can be generalized to MCs having a finite GU constellation S as input and transition densities W(y|x) which are functions of the Euclidean distance ||y − x|| only. As an example, one can consider the Laplacian channel with transition probability densities given by

\[ W(y|x) = \frac{\lambda^n \,\Gamma(n/2)}{2\pi^{n/2}\,\Gamma(n)} \, e^{-\lambda \|x-y\|} , \]

where λ > 0 is a parameter and \(\Gamma(t) := \int_0^{+\infty} x^{t-1} e^{-x} dx\) is Euler's Γ function. In the following we present some examples of finite GU constellations admitting an Abelian generating group. We start with the simplest example, provided by a binary constellation.

Example 2 (2-PAM). The 2-PAM constellation is defined by K₂ := {1, −1}. It is trivial to see that Γ(K₂) ≃ Z₂ is a generating group for K₂. It is also possible to show that K₂ is the only one-dimensional GU constellation.

We now pass to the m-PSK constellation, which is the main practical example of a finite GU constellation.

Example 3 (m-PSK). For any integer m ≥ 2, define \(\xi_m := e^{i 2\pi/m}\). The m-PSK constellation is \(K_m := \{\xi_m^k, \; 1 \le k \le m\}\).
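The Z_m action on K_m is rotation by multiples of 2π/m; a small numerical check that each rotation permutes the constellation (illustrative sketch, not from the paper):

```python
import numpy as np

def m_psk(m):
    # K_m = { xi_m^k : 1 <= k <= m }, with xi_m = exp(i*2*pi/m)
    return np.exp(2j * np.pi * np.arange(1, m + 1) / m)

# The generating group Z_m acts by rotation: multiplication by xi_m^g.
# Each rotation permutes the constellation, which is the simply transitive
# action exploited by group codes.
m = 8
K = m_psk(m)
xi = np.exp(2j * np.pi / m)
for g in range(m):
    rotated = xi ** g * K
    # each rotated point coincides with some constellation point
    assert all(np.min(np.abs(K - s)) < 1e-12 for s in rotated)
```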

Fig. 2. (a) Z6-labelled 2-PAM×3-PSK; (b) Z8-labelled K_8^β constellation.

Clearly K_m is two-dimensional for m ≥ 3. It can be shown that Γ(K_m) ≃ D_m, where D_m is the dihedral group with 2m elements. K_m admits Z_m, i.e. the Abelian group of integers modulo m, as a generating group. When m is even there is another generating group (see [5], [6]): the dihedral group D_{m/2}, which is non-commutative for m ≥ 6. It follows that the m-PSK AWGN channel is both Z_m-symmetric and (for even m) D_{m/2}-symmetric. The constellation K₈ with the two possible labelings Z₈ and D₄ is reported in Fig. 1.

The next example shows how higher-dimensional GU constellations can be obtained as Cartesian products of lower-dimensional ones. This constellation will be considered in Sect. VI, to show how the G-capacity can be evaluated for Abelian group codes whose order is not a prime power.

Example 4 (Cartesian product constellation). For any integer m > 2, consider the family of three-dimensional GU constellations, parameterized by β ∈ (0, +∞),

\[ K^\beta_{m \times 2} := \left\{ \frac{1}{\sqrt{1+\beta^2}} \left( \xi_m^k, (-1)^l \beta \right) \;\Big|\; 0 \le k < m, \; l = 0, 1 \right\} . \]

Fig. 2(a) shows the special case m = 3. It is easy to show that Z_m × Z₂ is a generating group for K^β_{m×2}; notice that, for odd m, Z_m × Z₂ ≃ Z_{2m}. Thus, for odd m, AWGN channels with input m-PSK×2-PAM are Z_{2m}-symmetric.

Finally, we provide an example of an 'effectively' three-dimensional constellation, i.e. one which is not obtained as the Cartesian product of lower-dimensional ones. This constellation will be used as a counterexample in Sect. V.

Example 5 (3-D constellation). For even m > 2, we introduce the family of three-dimensional GU constellations, parameterized by β ∈ (0, +∞),

\[ K^\beta_m := \left\{ \left( \sqrt{\tfrac{1}{1+\beta^2}}\, \xi_m^k, \; (-1)^k \sqrt{\tfrac{\beta^2}{1+\beta^2}} \right) , \; 1 \le k \le m \right\} . \]

An example with m = 8 is shown in Fig. 2(b): observe that even-labeled points and odd-labeled ones have an offset of π/4. It can be shown that, similarly to the constellations K_m, the constellations K^β_m have two different generating groups, Z_m and D_{m/2}; so, K^β_m-AWGN channels are both Z_m-symmetric and D_{m/2}-symmetric.
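As a sanity check, the points of K^β_m indeed lie on the unit sphere, consistently with the general fact about GU constellations recalled above; an illustrative sketch (not from the paper):

```python
import numpy as np

def K_beta(m, beta):
    # Example 5 points, embedded in R^3 as (Re xi, Im xi, +/- offset):
    # ( sqrt(1/(1+beta^2)) * xi_m^k , (-1)^k * sqrt(beta^2/(1+beta^2)) )
    a = (1.0 / (1.0 + beta ** 2)) ** 0.5
    b = (beta ** 2 / (1.0 + beta ** 2)) ** 0.5
    pts = []
    for k in range(1, m + 1):
        z = a * np.exp(2j * np.pi * k / m)
        pts.append((z.real, z.imag, b if k % 2 == 0 else -b))
    return np.array(pts)

# all points lie on the unit sphere: a^2 + b^2 = 1 by construction
S = K_beta(8, 0.5)
assert np.allclose(np.linalg.norm(S, axis=1), 1.0)
```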

III. THE CODING THEOREM FOR Z_{p^r}-CODES ON Z_{p^r}-SYMMETRIC MEMORYLESS CHANNELS

Given a prime p and a positive integer r, let (Z_{p^r}, Y, W) be a Z_{p^r}-symmetric MC, whose input has been identified with the group Z_{p^r} itself with no loss of generality. For 1 ≤ l ≤ r, consider the MC (p^{r−l} Z_{p^r}, Y, W) obtained by restricting the input of the original MC to the subgroup p^{r−l} Z_{p^r}. We shall denote by C_l the Shannon capacity of such a channel, and by E_l(R) its error exponent.

Definition 4. The Z_{p^r}-capacity of the MC (Z_{p^r}, Y, W) is

\[ C_{\mathbb{Z}_{p^r}} := \min_{1 \le l \le r} \frac{r}{l}\, C_l \, ; \]

its Z_{p^r}-error exponent is

\[ E_{\mathbb{Z}_{p^r}}(R) := \min_{1 \le l \le r} E_l\!\left( \frac{l}{r}\, R \right) . \]
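When the output alphabet is finite, Definition 4 can be evaluated directly; a sketch for the p^r-ary symmetric channel of Example 1 (illustrative code, not from the paper; `restricted_capacity` is a hypothetical helper name):

```python
import math

def restricted_capacity(p, r, l, eps):
    # Capacity C_l (nats) of the p^r-ary symmetric channel with input restricted
    # to the subgroup p^(r-l) * Z_{p^r} (p^l symbols); by symmetry the uniform
    # input over the subgroup is optimal, so this is a plain mutual information.
    m = p ** r
    inputs = [p ** (r - l) * k for k in range(p ** l)]
    n = len(inputs)
    def w(y, x):
        return (1 - eps) if y == x else eps / (m - 1)
    I = 0.0
    for x in inputs:
        for y in range(m):
            py = sum(w(y, z) for z in inputs) / n   # output marginal
            I += w(y, x) * math.log(w(y, x) / py) / n
    return I

def zpr_capacity(p, r, eps):
    # Definition 4: C_{Z_{p^r}} = min_{1 <= l <= r} (r/l) * C_l
    return min((r / l) * restricted_capacity(p, r, l, eps) for l in range(1, r + 1))

# Z_4-symmetric 4-ary symmetric channel: the Z_4-capacity never exceeds
# the l = r term, i.e. the Shannon capacity of the unrestricted channel
p, r, eps = 2, 2, 0.05
assert zpr_capacity(p, r, eps) <= restricted_capacity(p, r, r, eps) + 1e-12
```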

It is easily observed that E_{Z_{p^r}}(R) > 0 if and only if R < C_{Z_{p^r}}. In the rest of this section the quantity C_{Z_{p^r}} will be shown to be exactly the capacity achievable by Z_{p^r}-codes over the Z_{p^r}-symmetric MC (Z_{p^r}, Y, W). In particular, in Sect. III-A it will be proven that reliable transmission with Z_{p^r}-codes is not possible at any rate beyond C_{Z_{p^r}}. In Sect. III-B, instead, a random-coding argument will be used to show that Z_{p^r}-codes of arbitrarily small error probability exist at any rate below C_{Z_{p^r}}, and that E_{Z_{p^r}}(R) is a lower bound on the error exponent of the average Z_{p^r}-code of rate R. Sect. III-C will deal with issues of tightness of E_{Z_{p^r}}(R).

A. The converse coding theorem for Z_{p^r}-codes

Let C ≤ Z^N_{p^r} be a Z_{p^r}-code of length N and rate R. Standard algebraic arguments (see [14] for instance) show that

\[ C \simeq \bigoplus_{1 \le s \le r} \mathbb{Z}_{p^s}^{K_s} \]

for some nonnegative integers K_s satisfying

\[ \frac{1}{N} \sum_{1 \le s \le r} s K_s \log p = R . \]

For every 1 ≤ l ≤ r, we consider the code \(C_l := C \cap p^{r-l} \mathbb{Z}^N_{p^r}\) obtained by restricting the original code C to the subgroup \(p^{r-l}\mathbb{Z}^N_{p^r}\). This is tantamount to considering only those codewords of C of order not exceeding p^l. It follows that

\[ C_l \simeq \bigoplus_{1 \le s < l} \mathbb{Z}_{p^s}^{K_s} \;\oplus\; \bigoplus_{l \le s \le r} \mathbb{Z}_{p^l}^{K_s} . \]

Comparing the rate of each C_l with the Shannon capacity C_l of the restricted channel (p^{r−l} Z_{p^r}, Y, W) leads to the following inverse coding theorem.

Theorem 5. Let (Z_{p^r}, Y, W) be a Z_{p^r}-symmetric MC of Z_{p^r}-capacity C_{Z_{p^r}}. For every R > C_{Z_{p^r}} there exists a constant A_R > 0 such that p_e(C) ≥ A_R for every Z_{p^r}-code C of rate not smaller than R.

B. A coding theorem for Z_{p^r}-codes

Theorem 5 provides a necessary condition for reliable transmission using Z_{p^r}-codes on Z_{p^r}-symmetric channels: for this to be possible, the rate must not exceed the Z_{p^r}-capacity C_{Z_{p^r}}. However, it is not at all clear whether every rate below C_{Z_{p^r}} can actually be achieved by means of Z_{p^r}-codes. In principle there could be other algebraic constraints coming into the picture which have been overlooked in our analysis. In fact, we will see that this is not the case: the condition R < C_{Z_{p^r}} will be shown to be sufficient for reliable transmission using Z_{p^r}-codes over a Z_{p^r}-symmetric MC.

Given a design rate R in (0, log p^r), we introduce the Z_{p^r}-code ensemble as follows. For every block-length N, we set \(L := \lfloor (1 - \frac{R}{\log p^r}) N \rfloor\), and consider a random parity-check operator Φ_N uniformly distributed over the set \(\mathrm{hom}(\mathbb{Z}^N_{p^r}, \mathbb{Z}^L_{p^r})\) of all homomorphisms from \(\mathbb{Z}^N_{p^r}\) to \(\mathbb{Z}^L_{p^r}\). Finally, let C_N := ker Φ_N be the random Z_{p^r}-code obtained as the kernel of Φ_N, i.e. the set of all those N-tuples x in \(\mathbb{Z}^N_{p^r}\) such that Φ_N x = 0. Observe that the rate of C_N is deterministically not smaller than R. We are interested in estimating the average error probability \(\overline{p}_e(C_N)\) of the parity-check ensemble of Z_{p^r}-codes of design rate R. A first step in our analysis consists in evaluating the average type-spectrum. For any type θ in P(Z_{p^r}), let S_N(θ) := S_{C_N}(θ) be the number of codewords in the random code C_N of type θ. We have

\[ S_N(\theta) = \sum_{x \in (\mathbb{Z}_{p^r})^N_\theta} 1_{C_N}(x) = \sum_{x \in (\mathbb{Z}_{p^r})^N_\theta} 1_{\{\Phi_N x = 0\}} . \tag{7} \]

The expected value of S_N(θ) can be evaluated as follows.

Lemma 6. For every θ in P_N(Z_{p^r}), the average type-spectrum of the parity-check ensemble of Z_{p^r}-codes of design rate R is given by

\[ \overline{S}_N(\theta) = \binom{N}{N\theta} \, p^{-L s(\theta)} , \]

where, for θ in P(Z_{p^r}), s(θ) denotes the smallest integer l ≥ 0 such that θ(a) = 0 for all a ∉ p^{r−l} Z_{p^r}.
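The key fact behind Lemma 6 is that P(Φ_N x = 0) = p^{−L s(θ)} for any fixed x of type θ; this can be checked by simulation (illustrative sketch with small, hypothetical parameters, not from the paper):

```python
import random

def s_of_type(x, p, r):
    # s(theta): the smallest l >= 0 such that every entry of x lies in p^(r-l) * Z_{p^r}
    for l in range(r + 1):
        if all(xi % p ** (r - l) == 0 for xi in x):
            return l
    return r

def kernel_probability(x, p, r, L, trials=100000, seed=0):
    # Monte Carlo estimate of P(Phi_N x = 0) for Phi_N uniform over
    # hom(Z_{p^r}^N, Z_{p^r}^L), i.e. a uniform L x N matrix over Z_{p^r}
    rng = random.Random(seed)
    m, hits = p ** r, 0
    for _ in range(trials):
        if all(sum(rng.randrange(m) * xi for xi in x) % m == 0 for _ in range(L)):
            hits += 1
    return hits / trials

p, r, L = 2, 2, 2
x = (2, 0, 2)                  # entries in 2*Z_4 but not all 0, so s(theta) = 1
assert s_of_type(x, p, r) == 1
assert abs(kernel_probability(x, p, r, L) - p ** (-L * 1)) < 0.01   # P = 1/4
```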

Proof: Consider the standard basis {δ_i}_{1≤i≤N} of \(\mathbb{Z}^N_{p^r}\). An equivalent condition for Φ_N to be uniformly distributed over \(\mathrm{hom}(\mathbb{Z}^N_{p^r}, \mathbb{Z}^L_{p^r})\) is that the r.v.s Φ_N δ_i, 1 ≤ i ≤ N, are mutually independent and uniformly distributed over \(\mathbb{Z}^L_{p^r}\). Let us now fix an N-tuple x of type θ ∈ P_N(Z_{p^r}). From the definition of s(θ) it follows that x_j belongs to p^{r−s(θ)} Z_{p^r} for all 1 ≤ j ≤ N, and that x_i ∉ p^{r−s(θ)+1} Z_{p^r} for some 1 ≤ i ≤ N. It follows that the r.v. \(X_i := x_i \Phi_N \delta_i\) is uniformly distributed over \(p^{r-s(\theta)} \mathbb{Z}^L_{p^r}\), while the r.v. \(X_{-i} := \sum_{j \ne i} x_j \Phi_N \delta_j\) takes values in \(p^{r-s(\theta)} \mathbb{Z}^L_{p^r}\) and is independent of X_i. Therefore, \(\Phi_N x = \sum_{1 \le j \le N} x_j \Phi_N \delta_j = X_i + X_{-i}\) is uniformly distributed over \(p^{r-s(\theta)} \mathbb{Z}^L_{p^r}\). So, in particular, \(\mathbb{P}(\Phi_N x = 0) = p^{-L s(\theta)}\). Hence, for any type θ in P_N(Z_{p^r}) we have

\[ \mathbb{E}[S_N(\theta)] = \sum_{x \in (\mathbb{Z}_{p^r})^N_\theta} \mathbb{P}(\Phi_N x = 0) = \binom{N}{N\theta} p^{-L s(\theta)} , \]

the first equality following from (7) and the linearity of expectation. ∎

Lemma 6 and Lemma 3 allow us to prove the following fundamental estimate on the average error probability of the parity-check ensemble of Z_{p^r}-codes.

Theorem 7. Let (Z_{p^r}, Y, W) be a Z_{p^r}-symmetric MC, and let E_{Z_{p^r}}(R) be its Z_{p^r}-error exponent. Then the average error probability of the Z_{p^r}-code ensemble of design rate R satisfies

\[ \overline{p}_e(C_N) \le r \exp\left( -N E_{\mathbb{Z}_{p^r}}(R) \right) . \]

Proof: For all 1 ≤ s ≤ r, let

\[ C^s_N := \{0\} \,\cup\, \Big( C_N \cap \bigcup_{\theta : s(\theta) = s} (\mathbb{Z}_{p^r})^N_\theta \Big) \]

be the sub-code of C_N consisting of the all-zero codeword and of all the codewords of C_N whose type θ is such that s(θ) = s. Observe that \(C^s_N \subseteq p^{r-s} \mathbb{Z}^N_{p^r}\). By the UEP and the union bound, we have

\[ \overline{p}_e(C_N) = \overline{p}_e(C_N | 0) \le \sum_{1 \le s \le r} \overline{p}_e(C^s_N | 0) . \tag{8} \]

For every 1 ≤ s ≤ r and 0 ≤ ρ ≤ 1, by applying Lemma 3 to the code C^s_N and the MC (p^{r−s} Z_{p^r}, Y, W), then the Jensen inequality and Lemma 6, we get

\[ \begin{aligned} \overline{p}_e(C^s_N | 0) &\le \frac{1}{p^{sN}} \sum_{z} \int_{\mathcal{Y}^N} W_N^{\frac{1}{1+\rho}}(y|z) \Bigg( \sum_{\theta} \frac{\overline{S}_N(\theta)}{\binom{N}{N\theta}} \sum_{x} W_N^{\frac{1}{1+\rho}}(y|z+x) \Bigg)^{\!\rho} dy \\ &\le \frac{1}{p^{sN}} \sum_{z} \int_{\mathcal{Y}^N} W_N^{\frac{1}{1+\rho}}(y|z) \Bigg( \frac{1}{p^{sL}} \sum_{z} W_N^{\frac{1}{1+\rho}}(y|z) \Bigg)^{\!\rho} dy \\ &= p^{s\rho(N-L)} \int_{\mathcal{Y}^N} \Bigg( \frac{1}{p^{sN}} \sum_{z} W_N^{\frac{1}{1+\rho}}(y|z) \Bigg)^{\!1+\rho} dy , \end{aligned} \]

where the summation index z runs over \(p^{r-s} \mathbb{Z}^N_{p^r}\), θ over types in P_N(Z_{p^r}) such that s(θ) = s, and x over \((\mathbb{Z}_{p^r})^N_\theta\), the set


of type-θ N -tuples. Observe that psρ(N −L) ≤ exp(N ρ rs R) while, since the channel is stationary and memoryless, 1+ρ  1 R 1+ρ 1 P WN (y|z) dy psN YN z 1+ρ !N  R P 1 1 dy W 1+ρ (y|z) = ps Y

C. On tightness of the error exponent Theorem 7 provides an exponential upper bound on the average error probability of the parity-check ensemble of Zpr codes on a Zpr -symmetric MC. Corollary 8 states that the same error exponent is asymptotically achieved by a typical code sequence sampled from the Zpr -code ensemble. A natural question arising is whether these bounds are tight. We conjecture that EZpr (R) is the correct error exponent for the average Zpr -code at any rate 0 < R < CZpr , i.e. that

s |0) ≤ exp(N ρ s R − N E 0 (u , ρ)) , pe (CN s s r

1 log pe (CN ) = EZpr (R) . (9) N ∈N N No proof of (9) in its generality will be presented here. Rather, we shall confine ourselves to consider the high-rate and the low-rate regimes.

z

Therefore, we get

where Es0 ( · , · ) denotes the Gallager exponent of the MC (pr−s Zpr , Y, W ) (as defined in (3)) and us is the uniform distribution over pr−s Zpr . Since the MC (pr−s Zpr , Y, W ) is pr−s Zpr -symmetric, us is the optimal input distribution. Then, by optimizing the exponent Es0 (us , ρ) − ρ rs R over ρ in [0, 1], we get the error exponent Es ( rs R), so that  s   s |0) ≤ exp N E pe (CN R . s r The claim now follows by combining the above inequality with (8), and recalling Def.4.

Standard probabilistic arguments allow us to prove the following corollary of Theorem 7, estimating the asymptotic error exponent of the typical Z_{p^r}-code.

Corollary 8. Let (Z_{p^r}, Y, W) be a Z_{p^r}-symmetric MC of Z_{p^r}-capacity C_{Z_{p^r}} and Z_{p^r}-error exponent E_{Z_{p^r}}(R). Then, for every 0 < R < C_{Z_{p^r}}, we have
$$ \liminf_{N\in\mathbb{N}} -\frac{1}{N}\log p_e(\mathcal{C}_N) \;\ge\; E_{\mathbb{Z}_{p^r}}(R)\,, $$
with probability one over the Z_{p^r}-coding ensemble of design rate R.

Proof. With no loss of generality we can restrict ourselves to rates 0 ≤ R < C_{Z_{p^r}}, since otherwise E_{Z_{p^r}}(R) = 0 and the claim is trivial as p_e(C_N) ≤ 1. For any 0 < ε < E_{Z_{p^r}}(R) and N ∈ N, define the event
$$ \mathcal{A}_N^{\varepsilon} := \big\{\, p_e(\mathcal{C}_N) \ge r\exp\!\big({-N(E_{\mathbb{Z}_{p^r}}(R)-\varepsilon)}\big) \,\big\}\,. $$
By applying Theorem 7 and the Markov inequality, we obtain
$$ \mathbb{P}(\mathcal{A}_N^{\varepsilon}) \;\le\; \mathbb{P}\Big( p_e(\mathcal{C}_N) \ge \tfrac{1}{r}\exp(N\varepsilon)\,\overline{p}_e(\mathcal{C}_N) \Big) \;\le\; r\exp(-N\varepsilon)\,. $$
Then Σ_N P(A_N^ε) ≤ Σ_N r exp(−Nε) < +∞, and the Borel–Cantelli lemma implies that with probability one the event A_N^ε occurs for finitely many N in N. Therefore, with probability one liminf_N −(1/N) log p_e(C_N) ≥ E_{Z_{p^r}}(R) − ε. Finally, the claim follows from the arbitrariness of ε in (0, E_{Z_{p^r}}(R)).

Corollary 9. Let (Z_{p^r}, Y, W) be a Z_{p^r}-symmetric MC, and let C_{Z_{p^r}} be its Z_{p^r}-capacity. Then, for all 0 ≤ R < C_{Z_{p^r}} there exist Z_{p^r}-codes C of rate not smaller than R and arbitrarily low error probability.

Theorem 10. For any non-trivial Z_{p^r}-symmetric MC, there exist some 0 < R_0 ≤ R_1 < C_{Z_{p^r}} such that (9) holds true for the Z_{p^r}-code ensemble of design rate R ∈ (0, R_0) ∪ (R_1, C_{Z_{p^r}}).

Proof. First we concentrate on the high-rate regime. For rates R close enough to C_{Z_{p^r}}, from the continuity of the exponents E_s(R), it follows that E_{Z_{p^r}}(R) = E_s((s/r)R) for one of the channels (p^{r−s}Z_{p^r}, Y, W) whose normalized capacity (r/s)C_s coincides with the Z_{p^r}-capacity C_{Z_{p^r}}. It is known that close to capacity the random-coding exponent coincides with the sphere-packing exponent [2], [3]. Then, by applying the sphere-packing bound to the sub-code C_N ∩ p^{r−s}Z_{p^r}^N (whose rate is not smaller than (s/r)R), we get that, for all rates R not smaller than some 0 < R_1 < C_{Z_{p^r}},
$$ E_{\mathbb{Z}_{p^r}}(R) = E_s\big(\tfrac{s}{r}R\big) \;\ge\; \limsup_{N\in\mathbb{N}} -\tfrac{1}{N}\log p_e\big(\mathcal{C}_N \cap p^{r-s}\mathbb{Z}_{p^r}^N\big) \;\ge\; \limsup_{N\in\mathbb{N}} -\tfrac{1}{N}\log p_e(\mathcal{C}_N)\,. \qquad (10) $$

We shall now concentrate on showing the validity of (9) in the low-rate regime. First, observe that at rate R = 0
$$ E_1(0) \le E_2(0) \le \ldots \le E_r(0)\,, \qquad (11) $$
the inequalities above being strict on non-trivial Z_{p^r}-symmetric MCs. From the continuity of the error exponents as functions of the rate R, it follows that for any non-trivial Z_{p^r}-symmetric MC
$$ E_{\mathbb{Z}_{p^r}}(R) = E_1\big(\tfrac{1}{r}R\big)\,, \qquad \forall\, R \le R_0\,, \qquad (12) $$
for some R_0 > 0. Notice that C_N ∩ p^{r−1}Z_{p^r}^N coincides with the Z_p-linear code ensemble of rate (1/r)R. It is known [34] that E_1((1/r)R) is the correct error exponent for the average Z_p-linear code. In fact, the arguments developed in [35] in order to prove tightness of the error exponent for the average code sampled from the random-coding ensemble only require pairwise independence of the random codewords. In the Z_p-linear ensemble the events {x ∈ C_N} and {w ∈ C_N} are independent whenever x and w are linearly independent in Z_{p^r}^N. Since every x in Z_{p^r}^N has only p linearly dependent elements in Z_{p^r}^N, the arguments of [35] can still be used to show that
$$ \limsup_{N\in\mathbb{N}} -\frac{1}{N}\log p_e\big(\mathcal{C}_N \cap p^{r-1}\mathbb{Z}_{p^r}^N\big) \;\le\; E_1\big(\tfrac{1}{r}R\big) = E_{\mathbb{Z}_{p^r}}(R)\,. $$
Then, since p_e(C_N) ≥ p_e(C_N ∩ p^{r−1}Z_{p^r}^N), from (12) it follows that
$$ \limsup_{N\in\mathbb{N}} -\frac{1}{N}\log p_e(\mathcal{C}_N) \;\le\; E_{\mathbb{Z}_{p^r}}(R)\,, \qquad \forall\, R \le R_0\,. \qquad (13) $$
Finally, the claim follows from (10), (13) and Theorem 7.

Notice that, for r ≥ 2, strict inequalities in (11) imply that
$$ E_{\mathbb{Z}_{p^r}}(R) < E_r(R)\,, \qquad R \le R_0\,. \qquad (14) $$
Therefore, for r ≥ 2, on any non-trivial Z_{p^r}-symmetric MC the average Z_{p^r}-code exhibits poorer performance than the average code (i.e. a code sampled from the random-coding ensemble). This result had first been conjectured in [4], where the author hypothesized that the random-coding exponent of any G-symmetric MC is achieved by the average G-code only if G ≃ Z_p^r for prime p, namely when G admits Galois field structure.

However, it can be shown that, at low rates, E_{Z_{p^r}}(R) is not the correct error exponent for the Z_{p^r}-code ensemble. In fact, similarly to the random-coding ensemble and the linear-coding ensemble [34], it can be shown that at low rates the error exponent of a typical Z_{p^r}-code is higher than E_{Z_{p^r}}(R). This is because the average error probability is affected by an asymptotically negligible fraction of codes with poor behavior. In other words, at low rates the bound of Corollary 8 is not tight. In a forthcoming work we shall show that the typical Z_{p^r}-code achieves the expurgated error exponent on many Z_{p^r}-symmetric MCs of interest, including the p^r-PSK AWGN channel. Since it is known that the random-coding ensemble does not achieve the expurgated error exponent with probability one, this will show that at low rates the hierarchies for the average and the typical error exponent can be reversed: while the average random code behaves better than the average group code, the typical group code exhibits better performance than the typical random code.

IV. Z_{p^r}-CODES ACHIEVE CAPACITY ON THE p^r-PSK AWGN CHANNEL

This section will be focused on the p^r-PSK AWGN channel, for which it will be shown that the Z_{p^r}-capacity C_{Z_{p^r}} coincides with the Shannon capacity C. As a consequence, Z_{p^r}-codes are capacity-achieving for this important family of symmetric MCs, thus confirming a conjecture of Loeliger [6].

Throughout this section p will be some given prime number and r a fixed positive integer. The base of log (and thus of the entropy function H) will be p. For m in N, ξ_m = e^{2πi/m} ∈ C will denote a primitive m-th root of 1. (Z_{p^r}, C, W) will denote the p^r-PSK AWGN channel, with input X identified with Z_{p^r}, output Y identified with the complex field C, and transition probability densities accordingly given by
$$ W(y|x) = \frac{1}{2\pi\sigma^2}\, e^{-\|y-\xi_{p^r}^x\|^2/2\sigma^2}\,. $$
Recall that, by Def. 4, C_{Z_{p^r}} = min_{1≤l≤r} (r/l)C_l, where C_l is the Shannon capacity of the MC (p^{r−l}Z_{p^r}, C, W), i.e. the AWGN channel with input restricted to the p^l-PSK constellation. Hence, the condition C = C_{Z_{p^r}} is equivalent to rC_l ≥ lC_r for all 1 ≤ l ≤ r. A simple inductive argument shows that this is in turn equivalent to
$$ q\,C_{q+1} \;\le\; (q+1)\,C_q\,, \qquad \forall\, 1 \le q \le r-1\,. \qquad (15) $$

The rest of the section will be devoted to the proof of (15). The result will be achieved through a series of technical intermediate steps. We start by introducing some related probability densities which will play a key role in the sequel:

• for every 1 ≤ q ≤ r, λ_q in P(C) defined by
$$ \lambda_q(y) := \frac{1}{p^q} \sum_{x \in p^{r-q}\mathbb{Z}_{p^r}} W(y|x) \;=\; \frac{1}{p^q} \sum_{j=0}^{p^q-1} W\big(y\,\xi_{p^q}^j \,\big|\, 0\big) $$
(with the second equality above following from the symmetry of the MC);

• for every 1 ≤ q ≤ r−1 and y ∈ C, ν_q(y) in P(Z_p) defined by
$$ [\nu_q(y)](a) := \frac{\lambda_q\big(y\,\xi_{p^{q+1}}^a\big)}{p\,\lambda_{q+1}(y)}\,; \qquad (16) $$

• for every 1 ≤ q ≤ r and y ∈ C, a probability distribution ω_q(y) in P(p^{r−q}Z_{p^r}) defined by
$$ [\omega_q(y)](x) := \frac{1}{p^q\,\lambda_q(y)}\, W(y|x)\,. $$

For any 1 ≤ q ≤ r, consider the p^q-PSK AWGN channel (p^{r−q}Z_{p^r}, C, W). Since it is symmetric, its Shannon capacity C_q is achieved by a uniform distribution over the input p^{r−q}Z_{p^r}. The corresponding output probability density is given by Σ_{x∈p^{r−q}Z_{p^r}} p^{−q}W(y|x) = λ_q(y), so that
$$ C_q = H(\lambda_q) - H\big(W(\cdot|0)\big)\,. \qquad (17) $$
Therefore (15) is equivalent to
$$ H\big(W(\cdot|0)\big) + q\,H(\lambda_{q+1}) \;\le\; (q+1)\,H(\lambda_q)\,, \qquad 1 \le q < r\,. \qquad (18) $$
The following result relates the entropies of the discrete probability distributions ω_q(y) and ν_q(y) to those of the continuous densities λ_q and W(·|0).

Lemma 11. For every 1 ≤ q < r,
$$ H\big(W(\cdot|0)\big) = H(\lambda_q) - q + \int_{\mathbb{C}} \lambda_q(y)\, H\big(\omega_q(y)\big)\, dy\,; \qquad (19) $$
$$ H(\lambda_q) = H(\lambda_{q+1}) - 1 + \int_{\mathbb{C}} \lambda_{q+1}(y)\, H\big(\nu_q(y)\big)\, dy\,. \qquad (20) $$
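Identity (17) says that, with uniform input, C_q is the mutual information of the p^q-PSK AWGN channel, which can be estimated by Monte Carlo. The sketch below (my own illustration; the circular complex-noise normalization with total variance σ² per symbol is an assumption, and natural logarithms are used instead of the base-p convention of this section) estimates the C_s and then the Z_{p^r}-capacity of Def. 4:

```python
import numpy as np

# Monte Carlo sketch of C_q = I(X;Y) for the p^q-PSK AWGN channel with
# uniform input: I = log M - E[ log sum_{x'} W(y|x')/W(y|x) ].
# Assumption: circular complex noise with total variance sigma^2.
def psk_capacity(M, sigma, n=50000, rng=None):
    rng = rng or np.random.default_rng(0)
    pts = np.exp(2j * np.pi * np.arange(M) / M)      # PSK constellation points
    x = pts[rng.integers(M, size=n)]                 # uniform input symbols
    noise = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    y = x + sigma * noise
    ll = -np.abs(y[:, None] - pts[None, :]) ** 2 / (2 * sigma ** 2)
    llx = -np.abs(y - x) ** 2 / (2 * sigma ** 2)
    return np.log(M) - np.mean(np.log(np.exp(ll - llx[:, None]).sum(axis=1)))

# Z_{p^r}-capacity of Def. 4: C_{Z_{p^r}} = min_{1<=s<=r} (r/s) C_s.
p, r, sigma = 2, 3, 0.4
Cs = {s: psk_capacity(p ** s, sigma) for s in range(1, r + 1)}
CZ = min(r / s * Cs[s] for s in range(1, r + 1))
```

The estimator never exceeds log M, since the x' = x term alone makes each inner sum at least one.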

Proof. See Appendix B-A.

As a consequence of Lemma 11, we have that (18) is equivalent to
$$ q \int_{\mathbb{C}} \lambda_{q+1}(y)\, H\big(\nu_q(y)\big)\, dy \;\ge\; \int_{\mathbb{C}} \lambda_q(y)\, H\big(\omega_q(y)\big)\, dy\,, \qquad (21) $$
for all 1 ≤ q ≤ r−1.

We pass now to the core of the argument, which relies on geometric considerations. For 1 ≤ q < r, fix an arbitrary point y in the output set C, and consider the multiset of likelihood values for the input p^q-PSK, given by
$$ \mathcal{W}_q(y) := \big\{ W(y|0), W(y|1), \ldots, W(y|p^q-1) \big\} = \big\{ W(y|0), W(y\,\xi_{p^q}|0), \ldots, W(y\,\xi_{p^q}^{p^q-1}|0) \big\}\,. $$
Since the p^{q+1}-PSK constellation is the disjoint union of p copies of the p^q-PSK constellation, each rotated by an angle multiple of 2π/p^{q+1}, we have
$$ \mathcal{W}_{q+1}(y) = \bigcup_{0 \le j < p} \mathcal{W}_q\big(y\,\xi_{p^{q+1}}^j\big)\,. \qquad (22) $$

Fix β > 0, and consider the corresponding family of K_{2^r}^β-AWGN channels (K_{2^r}^β, R^3, W), whose Z_{2^r}-capacity will be denoted by C_{Z_{2^r}}(β). For 1 ≤ s ≤ r, C_{2^s}(β) will denote the capacity of the AWGN channel with input restricted to the sub-constellation {x_{k2^{r−s}} | 1 ≤ k ≤ 2^s}, so that
$$ C_{\mathbb{Z}_{2^r}}(\beta) = \min_{1 \le s \le r} \frac{r}{s}\, C_{2^s}(\beta)\,. $$
We start our analysis by considering the limit case β = 0. In this case K_{2^r}^0 coincides with an R^3 embedding of the 2^r-PSK constellation and it is clearly not three-dimensional since it does not span R^3. Since orthogonal components of the AWGN are mutually independent, for every 1 ≤ s ≤ r, C_{2^s}(0) coincides with the Shannon capacity of the 2^s-PSK-AWGN channel. Thus, all the results of Sect. IV hold true: in particular, the Z_{2^r}-capacity and the Shannon capacity coincide, i.e.
$$ C_{\mathbb{Z}_{2^r}}(0) = C_{2^r}(0)\,. \qquad (28) $$
Similar arguments can be applied, for every given β > 0, to the sub-constellation
$$ \Big\{ \Big( \sqrt{\tfrac{1}{1+\beta^2}}\; e^{\frac{2\pi k i}{2^{r-1}}},\; \sqrt{\tfrac{\beta^2}{1+\beta^2}} \Big) \;\Big|\; 1 \le k \le 2^{r-1} \Big\}\,, $$
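The capacities C_{2^s}(β) discussed here are evaluated by Monte Carlo simulation (cf. Fig. 3). A sketch of such an estimator, together with one possible R^3 embedding of the K^β family, follows; the point coordinates are an assumption inferred from the sub-constellation formula above, not the paper's exact definition from Ex. 4:

```python
import numpy as np

# Hedged sketch of a K^beta_{2^r}-type constellation in R^3 (assumed layout):
# 2^r points on a circle of radius 1/sqrt(1+beta^2), with alternating third
# coordinate +-beta/sqrt(1+beta^2); beta = 0 recovers a planar 2^r-PSK, and
# large beta collapses even/odd points toward (0, 0, +-1).
def K(r, beta):
    m = 2 ** r
    rad = 1.0 / np.sqrt(1.0 + beta ** 2)
    h = beta / np.sqrt(1.0 + beta ** 2)
    ang = 2.0 * np.pi * np.arange(m) / m
    signs = (-1.0) ** np.arange(m)
    return np.stack([rad * np.cos(ang), rad * np.sin(ang), h * signs], axis=1)

# Monte Carlo capacity of an AWGN channel with input restricted to the rows
# of S (noise N(0, sigma^2 I_3), uniform input), as used for C_{2^s}(beta).
def mc_capacity(S, sigma, n=50000, rng=None):
    rng = rng or np.random.default_rng(1)
    M = len(S)
    idx = rng.integers(M, size=n)
    y = S[idx] + sigma * rng.standard_normal((n, 3))
    ll = -((y[:, None, :] - S[None, :, :]) ** 2).sum(axis=2) / (2 * sigma ** 2)
    llx = ll[np.arange(n), idx]
    return np.log(M) - np.mean(np.log(np.exp(ll - llx[:, None]).sum(axis=1)))
```

Note that every point has unit norm, so the family compares constellations at equal average energy.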

Fig. 3. Shannon capacity and Z_8-capacity of the K_8^β-AWGN channel as functions of β. It can be seen that C_{Z_8}(β) = min{C_8(β), (3/2)C_4(β)} coincides with C_8(β) only for values of β below a certain threshold. The maxima of C_8(β) and C_{Z_8}(β) are achieved for values of β close to this threshold, i.e. the two problems of optimizing respectively the Shannon capacity and the Z_8-capacity seem to have similar solutions. The optimal values are greater than the 8-PSK-AWGN capacity.

coinciding with a three-dimensional embedding of a rescaled 2^{r−1}-PSK. Applying the results of the previous section, we get that
$$ (r-1)\, C_{2^s}(\beta) \;\ge\; s\, C_{2^{r-1}}(\beta)\,, \qquad 1 \le s \le r-1\,. \qquad (29) $$
Thus, for every β ∈ (0, +∞), in order to check whether C_{2^r}(β) and C_{Z_{2^r}}(β) coincide, one is only left to compare the two capacities C_{2^r}(β) and C_{2^{r−1}}(β).

If we now let the parameter β go to +∞, the constellation K_{2^r}^β approaches an R^3-embedding of the 2-PAM constellation, with the 2^{r−1} even-labeled points {x_{2k} | 1 ≤ k ≤ 2^{r−1}} collapsed into the point (0, 1), and the odd-labeled ones {x_{2k−1} | 1 ≤ k ≤ 2^{r−1}} into the point (0, −1). Let us define this limit constellation as K^∞ := {(0, 1), (0, −1)}. Notice that, for every finite standard deviation value σ > 0, the Shannon capacity of the K^∞-AWGN channel is strictly positive, while C_{2^{r−1}}(∞) = 0, since it is the capacity of an MC with indistinguishable inputs. A continuity argument yields the following result.

Proposition 18. For every finite variance σ² > 0 and any integer r ≥ 2, the family of K_{2^r}^β-AWGN channels satisfies
$$ \lim_{\beta \to \infty} C_{2^r}(\beta) = C(\infty) > 0\,, \qquad \lim_{\beta \to \infty} C_{\mathbb{Z}_{2^r}}(\beta) = 0\,. $$

Proof. See Appendix C.

Theorem 5 and Proposition 18 have the following immediate consequence.

Corollary 19. For all variance σ² > 0, there exists a positive finite β̄ such that, for any β > β̄, Z_{2^r}-codes do not achieve the Shannon capacity of the K_{2^r}^β-AWGN channel.

On the other hand, it can be proved that
$$ (r-1)\, C_{2^r}(0) < r\, C_{2^{r-1}}(0)\,, $$
for all r > 2. Then, by a continuity argument it can be shown that instead, for sufficiently small values of β,
$$ C_{2^r}(\beta) = C_{\mathbb{Z}_{2^r}}(\beta)\,, $$
so that Z_{2^r}-codes do achieve capacity of the K_{2^r}^β-AWGN channel. Fig. 3 refers to the case 2^r = 8: the normalized Shannon capacities C_8(β) and C_4(β) are plotted as functions of the parameter β (Monte Carlo simulations).

VI. ARBITRARY FINITE ABELIAN GROUPS

A. The algebraic structure of finite Abelian groups

In order to generalize the results of Sect. III, some basic facts about the structure of finite Abelian groups need to be recalled. We refer to standard textbooks in algebra ([37] for instance) for a more detailed treatment.

Let M be a finite Abelian group. Given µ ∈ N, define the following subgroups of M:
$$ \mu M = \{ \mu x \mid x \in M \}\,, \qquad M_{(\mu)} = \{ x \in M \mid \mu x = 0 \}\,. $$
It is immediate to verify that µM = {0} if and only if M_{(µ)} = M. Define
$$ \mu_M := \min\{ \mu \in \mathbb{N} \mid M_{(\mu)} = M \} = \min\{ \mu \in \mathbb{N} \mid \mu M = \{0\} \}\,. $$
Write µ_M = p_1^{r_1} ⋯ p_s^{r_s}, where p_1 < p_2 < ⋯ < p_s are distinct primes and r_1, ..., r_s are non-negative integers, existence and uniqueness of such a decomposition being guaranteed by the fundamental theorem of arithmetic. It is a standard fact that M admits the direct sum decomposition
$$ M = M_{(p_1^{r_1})} \oplus \cdots \oplus M_{(p_s^{r_s})}\,. \qquad (30) $$
Each M_{(p_i^{r_i})} is a Z_{p_i^{r_i}}-module and, up to isomorphisms, can be further decomposed, in a unique way, as a direct sum of cyclic groups
$$ M_{(p_i^{r_i})} = \mathbb{Z}_{p_i}^{k_{i,1}} \oplus \mathbb{Z}_{p_i^2}^{k_{i,2}} \oplus \cdots \oplus \mathbb{Z}_{p_i^{r_i}}^{k_{i,r_i}}\,. \qquad (31) $$
The sequence σ^M = (p_1, ..., p_s) will be called the spectrum of M, the sequence r^M = (r_1^M, ..., r_s^M) the multiplicity and, finally, the double-indexed sequence
$$ k^M = \big( k_{i,j} \mid 1 \le i \le s\,,\; 1 \le j \le r_i^M \big) $$
will be called the type of M. It will often be convenient to use the following extension: k_{i,j} = 0 for j > r_i^M. Given a sequence of primes σ = (p_1, ..., p_s), we will say that M is σ-adapted if σ^M is a subsequence of σ. Notice that, once the sequence of primes σ has been fixed, all σ-adapted Abelian groups are completely determined by their type (which includes the multiplicities r_i^M, with the agreement that some of them could be equal to 0). We will denote by M_k the finite Abelian group having type k. Notice that if M is a finite Abelian group with type k and N ∈ N, the Abelian group M^N has the same spectrum and multiplicity as M, and type Nk.

If M and L are finite Abelian groups and φ ∈ Hom(M, L), then φ(M_{(µ)}) ⊆ L_{(µ)} and φ(µM) ⊆ µL for every µ ∈ N. It follows that φ is surely non-injective if M is not σ^L-adapted or if any of the multiplicities in M is strictly larger than the corresponding one in L.

B. The inverse channel coding theorem for Abelian G-codes

Suppose now we have fixed, once and for all, a finite Abelian group G having spectrum σ^G = (p_1, ..., p_s), multiplicity r^G = (r_1^G, ..., r_s^G) and type k^G. Consider a G-code M ≤ G^N of rate R = (1/N) log |M|. Clearly M is σ^G-adapted and r_i^M ≤ r_i^G for all 1 ≤ i ≤ s, since otherwise the immersion of
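The spectrum, multiplicity, and type are straightforward to compute. The sketch below (my own helper, assuming the group is given as a direct sum of cyclic groups Z_{n_1} ⊕ ⋯ ⊕ Z_{n_t}) recovers all three invariants via the Chinese-remainder splitting of each cyclic factor:

```python
from collections import defaultdict

def factor(n):
    # Trial-division prime factorization: returns {p: exponent}.
    f, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            f[d] = f.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def group_invariants(orders):
    # k[p][j] counts the Z_{p^j} factors in the primary decomposition (31).
    k = defaultdict(lambda: defaultdict(int))
    for n in orders:
        for p, e in factor(n).items():   # CRT: Z_n = direct sum of Z_{p^e}
            k[p][e] += 1
    spectrum = sorted(k)
    multiplicity = [max(k[p]) for p in spectrum]
    ktype = {(p, j): k[p][j] for p in spectrum for j in k[p]}
    return spectrum, multiplicity, ktype

# Z_12 + Z_18 splits as (Z_4 + Z_2) at p = 2 and (Z_3 + Z_9) at p = 3.
print(group_invariants([12, 18]))
```

The invariants are independent of how the group is presented, which is exactly the uniqueness statement behind (30) and (31).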

M in G^N would not be injective. Then M can be decomposed as illustrated above in (30) and (31).

Let us now fix a matrix
$$ l = \big( l_{i,j} \in \mathbb{Z}_+ \mid 1 \le i \le s\,,\; 1 \le j \le r_i^G \big) $$
such that l_{i,j} ≤ j for every i and j. We will say that l is an r^G-compatible matrix. Define
$$ M(l) = \bigoplus_{\substack{1 \le i \le s \\ 1 \le j \le r_i^G}} p_i^{\,j-l_{i,j}}\, \mathbb{Z}_{p_i^j}^{k_{i,j}}\,. \qquad (32) $$
An immediate consequence of the previous considerations is that
$$ M(l) \subseteq G_l^N\,, \qquad G_l := \sum_{\substack{1 \le i \le s \\ 1 \le j \le r_i^G}} p_i^{\,j-l_{i,j}}\, G_{(p_i^j)}\,. $$
These inclusions automatically give information-theoretic constraints on the possibility of reliable transmission using this type of codes. Denote by R_l the rate of M(l) and by C_l the capacity of the subchannel having as input alphabet the subgroup G_l. Then, a necessary condition for p_e(M) not to be bounded away from 0 by some constant independent of N is that R_l ≤ C_l for every r^G-compatible l.

This does not yet give explicit constraints on the rates R at which reliable transmission is possible using G-codes. For this, some extra work is needed using the structure of the Abelian groups M(l). Notice that
$$ R_l = \frac{1}{N} \sum_{1 \le i \le s} \sum_{1 \le j \le r_i^G} l_{i,j}\, k_{i,j} \log p_i\,. $$
It is useful to introduce the following probability distribution on the pairs (i, j):
$$ \alpha_{i,j} = \frac{j\, k_{i,j} \log p_i}{\log |M|}\,. $$
From the above definition, and recalling that log |M| = RN, we have k_{i,j} = RN α_{i,j} / (j log p_i). Denote now by P(r^G) the space of probability distributions (α_{i,j}) on the set of pairs (i, j) with 1 ≤ i ≤ s and 1 ≤ j ≤ r_i^G. We introduce the following definition.

Definition 20. Let G be a finite Abelian group of spectrum σ^G = (p_1, ..., p_s) and type k^G. Let (G, Y, W) be a G-symmetric MC. For each r^G-compatible matrix l, let C_l be the capacity of the MC (G_l, Y, W). The G-capacity of the MC (G, Y, W) is
$$ C_G := \max_{\alpha \in \mathcal{P}(r^G)} \; \min_{\substack{l \ne 0 \\ r^G\text{-comp.}}} \; \frac{C_l}{\displaystyle\sum_{1 \le i \le s}\sum_{1 \le j \le r_i^G} \frac{l_{i,j}}{j}\, \alpha_{i,j}}\,, \qquad (33) $$
where l ≠ 0 means that l_{i,j} ≠ 0 for some i, j.

It clearly follows from our previous considerations that C_G is an upper bound on the rates at which reliable transmission using G-codes is possible. More precisely, we have the following result, which is an immediate consequence of the inverse channel coding theorem.

Theorem 21. Consider a G-symmetric channel and let C_G be its G-capacity. Then, for every rate R > C_G there exists a constant A_R > 0 such that the error probability of any G-code C of rate R satisfies p_e(C) ≥ A_R.
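The min in (33) ranges over the r^G-compatible matrices, which for small groups are easy to enumerate exhaustively. A sketch (my own helper; the index i is 0-based here):

```python
import itertools

# Enumerate the r^G-compatible matrices l: entries l_{i,j} in Z_+ with
# l_{i,j} <= j, for a given multiplicity vector rG = (r_1, ..., r_s).
# Each matrix is returned as a dict {(i, j): l_{i,j}}.
def compatible_matrices(rG):
    cells = [(i, j) for i, ri in enumerate(rG) for j in range(1, ri + 1)]
    ranges = [range(j + 1) for _, j in cells]   # l_{i,j} in {0, ..., j}
    for vals in itertools.product(*ranges):
        yield {c: v for c, v in zip(cells, vals)}

# For G = Z_{p^2}: rG = (2), giving prod_{j=1}^{2} (j+1) matrices (incl. l = 0).
mats = list(compatible_matrices([2]))
print(len(mats))  # 6
```

Dropping the all-zero matrix gives exactly the index set of the min in (33).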


C. A coding theorem for Abelian G-codes

Given a design rate R and a splitting α ∈ P(r^G), for each block-length N ∈ N define
$$ (h_N)_{i,j} = \left\lfloor \frac{RN\,(1-\alpha_{i,j})}{j \log p_i} \right\rfloor\,. $$
Let V_{h_N} be the Abelian group having spectrum σ^G and type h_N. Consider a sequence of independent r.v.s Φ_N uniformly distributed over Hom(G^N, V_{h_N}). Let C_N := ker(Φ_N) be the corresponding sequence of random G-codes. We shall refer to such a random code construction as the G-coding ensemble of design rate R and splitting α. Notice that C_N has rate deterministically not smaller than R. Let p̄_e^{(R,α)}(C_N) denote the word error probability averaged over this ensemble. Theorem 7 admits the following generalization.

Theorem 22. Let (G, Y, W) be a G-symmetric MC. For every R ∈ [0, log |G|) and α ∈ P(r^G),
$$ \overline{p}_e^{\,(R,\alpha)}(\mathcal{C}_N) \;\le\; \sum_{\substack{l \ne 0 \\ r^G\text{-compatible}}} \exp\big({-N E_l(R_l)}\big)\,, $$
where E_l(R) is the error exponent of the MC (G_l, Y, W), and
$$ R_l := R \sum_{1 \le i \le s} \sum_{1 \le j \le r_i^G} \frac{l_{i,j}}{j}\, \alpha_{i,j}\,. $$

By choosing α ∈ P(r^G) such that
$$ C_G = \min_{\substack{l \ne 0 \\ r^G\text{-comp.}}} \; \frac{C_l}{\displaystyle\sum_{1 \le i \le s}\sum_{1 \le j \le r_i^G} \frac{l_{i,j}}{j}\, \alpha_{i,j}}\,, $$
one has that min_{l≠0} E_l(R_l) > 0 for all R < C_G. Therefore, Theorem 22 has the following corollary.

Corollary 23. Let (G, Y, W) be a G-symmetric MC of G-capacity C_G. Then, for every rate 0 < R < C_G, there exists a G-code C of rate not smaller than R and arbitrarily low error probability.

Finally, for 0 < R < C_G, it is possible to optimize the error exponent over all splittings α in P(r^G). This leads to the following definition of the G-coding error exponent of a MC (G, Y, W):
$$ E_G(R) = \max_{\alpha \in \mathcal{P}(r^G)} \; \min_{\substack{l \ne 0 \\ r^G\text{-comp.}}} E_l\Big( R \sum_{1 \le i \le s}\sum_{1 \le j \le r_i^G} \frac{l_{i,j}}{j}\, \alpha_{i,j} \Big)\,. \qquad (34) $$
By letting α_G(R) in P(r^G) be an optimal splitting in the maximization above, and using arguments similar to the proof of Corollary 8, the following corollary can be proved.

Corollary 24. Let (G, Y, W) be a G-symmetric MC of G-capacity C_G and G-coding exponent E_G(R). Then, for all 0 < R < C_G we have
$$ \liminf_{N\in\mathbb{N}} -\frac{1}{N}\log p_e(\mathcal{C}_N) \;\ge\; E_G(R)\,, $$
with probability one over the G-coding ensemble of design rate R and optimal splitting α_G(R).

D. Examples

In the sequel, three examples will be presented with explicit computations of C_G for Abelian groups G with particular algebraic structure. First we examine groups admitting Galois field structure, showing that in this case the G-capacity C_G coincides with the Shannon capacity C, as follows from classical linear coding theory.

Example 6. Suppose that G ≃ Z_p^k for some prime p and positive integer k. Thus
$$ \sigma^G = (p)\,, \qquad r^G = (1)\,. $$
Consequently, the only r^G-compatible l ≠ 0 is given by l = 1, and therefore we have that in this case C_G = C and E_{Z_p^k}(R) = E(R). In other words, Z_p^k-codes achieve both the capacity and the random-coding exponent of every Z_p^k-symmetric MC. This had first been shown in [4]. In fact, in this case it is known that linear codes over the Galois field F_{p^k} suffice to achieve capacity and the random-coding exponent. However, GU constellations admitting a generating group isomorphic to Z_p^k are affected by a constraint on their bandwidth efficiency. In fact, if S is an n-dimensional GU constellation admitting Z_p^k as generating group, then standard arguments using group representation theory allow to conclude that
$$ n \ge \begin{cases} k\,, & \text{if } p = 2\,; \\ 2k\,, & \text{if } p \ge 3\,. \end{cases} $$

In the next example we show that when G = Z_{p^r}, Def. 20 reduces to Def. 4 of Section III.

Example 7. Let G ≃ Z_{p^r}. We want to show that
$$ C_G = \min_{l=1,\ldots,r} \frac{r}{l}\, C_l\,. $$
Notice first that in this case σ^G = (p) and r^G = (r). A vector l = (l_1, ..., l_r) is r^G-compatible if and only if l_j ≤ j for every j = 1, ..., r. Notice now that
$$ G_l = \sum_{1 \le j \le r} p^{\,j-l_j}\, G_{(p^j)} = \sum_{1 \le j \le r} p^{\,r-l_j}\, \mathbb{Z}_{p^r} = p^{\,r-l^*} \mathbb{Z}_{p^r}\,, $$
where l* := max_{1≤j≤r} l_j. Hence, C_l = C_{l*}. Notice now that P(r^G) simply consists of the probability distributions α = (α_1, ..., α_r). Suppose we are given some α in P(r^G). We have that
$$ \min_{\substack{l \ne 0 \\ r^G\text{-comp.}}} \frac{C_l}{\sum_{1\le j\le r} \frac{l_j}{j}\,\alpha_j} \;=\; \min_{\rho=1,\ldots,r} \; \frac{C_\rho}{\displaystyle\max_{\substack{l \ne 0,\ r^G\text{-comp.} \\ l^*=\rho}} \sum_{1\le j\le r} \frac{l_j}{j}\,\alpha_j}\,. $$
Now,
$$ \max_{\substack{l \ne 0,\ r^G\text{-comp.} \\ l^*=\rho}} \; \sum_{1\le j\le r} \frac{l_j}{j}\, \alpha_j \;\ge\; \frac{\rho}{r}\,, $$
and equality holds true if and only if α_r = 1 and α_j = 0 for every j ≠ r. Hence,
$$ C_{\mathbb{Z}_{p^r}} = \min_{1\le\rho\le r} \frac{r}{\rho}\, C_\rho\,, \qquad \alpha_{\mathbb{Z}_{p^r}} = (0, \ldots, 0, 1)\,. $$
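Example 7's conclusion can be checked numerically for r = 2: brute-forcing (33) over a grid of splittings α recovers min{2C_1, C_2}, with the optimum at α = (0, 1). A sketch with assumed values for the sub-constellation capacities C1, C2:

```python
import itertools
import numpy as np

# Brute-force check of (33) for G = Z_{p^2}: r^G = (2), so a compatible
# l = (l_1, l_2) has l_1 <= 1 and l_2 <= 2, and C_l = C_{max(l)} as shown
# in Example 7. Terms with zero denominator impose no rate constraint.
def CG_bruteforce(C1, C2, steps=2000):
    Cl = {1: C1, 2: C2}
    ls = [l for l in itertools.product(range(2), range(3)) if any(l)]
    best = 0.0
    for a1 in np.linspace(0.0, 1.0, steps + 1):
        a2 = 1.0 - a1
        ratios = [Cl[max(l)] / d for l in ls
                  if (d := l[0] * a1 + l[1] / 2.0 * a2) > 0]
        best = max(best, min(ratios))
    return best
```

The grid maximum is attained at the endpoint α = (0, 1), matching α_{Z_{p^r}} = (0, ..., 0, 1).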


Fig. 4. The optimal splitting for the Cartesian product constellation K_{2×3}^β as a function of β.

Finally, the following example concerns one of the Cartesian product GU constellations introduced in Ex. 4.

Example 8. Consider the K_{2×3}^β constellation introduced in Example 4, and a K_{2×3}^β-AWGN channel. It is easy to show that the independence of orthogonal components of the Gaussian noise implies that the capacity C_6(β) of such a channel is equal to the sum of the capacities of its two subchannels, C_2(β) and C_3(β). This fact allows us to explicitly write down the optimal splitting, i.e. the α ∈ P(r^G) solving the variational problem (33) defining C_{Z_6}, as a function of the parameter β. Since Z_6 ≃ Z_2 × Z_3, we have that s = 2, p_1 = 2, p_2 = 3, and r^G = (r_1^G, r_2^G) = (1, 1). Thus (33) reduces to
$$ C_{\mathbb{Z}_6}(\beta) = \max_{\alpha \in \mathcal{P}(\{2,3\})} \min\left\{ \frac{C_2(\beta)}{\alpha_2},\, \frac{C_3(\beta)}{\alpha_3},\, C_6(\beta) \right\}\,. $$
We claim that, for every β ∈ (0, +∞), C_{Z_6}(β) = C_6(β) and the optimal splitting is given by
$$ \alpha_{\mathbb{Z}_6}(\beta) = \big( \alpha_2^{\mathbb{Z}_6}(\beta),\, \alpha_3^{\mathbb{Z}_6}(\beta) \big) = \frac{1}{C_6(\beta)}\, \big( C_2(\beta),\, C_3(\beta) \big)\,. $$
Indeed, we have that
$$ C_6(\beta) \;\ge\; C_{\mathbb{Z}_6}(\beta) \;=\; \max_{\alpha \in \mathcal{P}(\{2,3\})} \min\left\{ C_6(\beta),\, \frac{C_2(\beta)}{\alpha_2},\, \frac{C_3(\beta)}{\alpha_3} \right\} \;\ge\; \min\left\{ C_6(\beta),\, \frac{C_2(\beta)}{\alpha_2^{\mathbb{Z}_6}(\beta)},\, \frac{C_3(\beta)}{\alpha_3^{\mathbb{Z}_6}(\beta)} \right\} \;=\; C_6(\beta)\,. $$
In Fig. 4, α_2^{Z_6}(β) is plotted: notice how the optimal splitting follows the geometry of the constellation, as α_2(β) is monotonically increasing in β, with lim_{β→0} α_{Z_6}(β) = (0, 1) (as β goes to 0, K_{2×3}(β) collapses onto the constellation K_3) and lim_{β→+∞} α_{Z_6}(β) = (1, 0) (as β goes to +∞, K_{2×3}(β) collapses onto the constellation 2-PAM).
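The equalizing property of the claimed splitting is easy to verify numerically (with assumed values for the sub-capacities): at α = (C_2/C_6, C_3/C_6) all three terms of the inner min equal C_6, and no other splitting does better. A sketch:

```python
# Numeric check of Example 8's claim, with assumed sub-capacities C2, C3
# and C6 = C2 + C3: maximize over a grid of splittings (a2, a3) the inner
# min of (33); terms with zero weight impose no constraint and are skipped.
def CZ6(C2, C3, steps=4000):
    C6 = C2 + C3
    best = 0.0
    for k in range(steps + 1):
        a2 = k / steps
        a3 = 1.0 - a2
        terms = [C6]
        if a2 > 0:
            terms.append(C2 / a2)
        if a3 > 0:
            terms.append(C3 / a3)
        best = max(best, min(terms))
    return best, C6
```

Up to grid resolution, the returned maximum coincides with C6, i.e. C_{Z_6} = C_6.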



VII. CONCLUSION

In this paper we analyzed the information-theoretic limits of Abelian group codes over symmetric memoryless channels. Our results generalize the classical theory for binary linear codes over binary-input symmetric-output channels. The main example we have in mind is the AWGN channel with input restricted to a geometrically uniform constellation S admitting G as generating group, with either soft or quantized output. We have characterized the threshold value for the rates at which reliable transmission is possible with G-codes, which we called the G-capacity C_G. The G-capacity is defined as the solution of an optimization problem involving the Shannon capacities of the channels obtained by restricting the input to some of the subgroups of G. We have shown that at rates below C_G the average ML word error probability of the ensemble of G-codes goes to zero exponentially fast with the blocklength, with exponent at least equal to the G-coding exponent E_G(R), while at rates beyond C_G the word error probability of any G-code is bounded from below by a strictly positive constant. We have proved that for the AWGN channel with input constrained to the m-PSK constellation (with m the power of a prime) the G-capacity C_G does coincide with the Shannon capacity C, so that in this case reliable transmission at any rate R < C is in fact possible using group codes over Z_m. Finally, we have exhibited a counterexample where C_G < C: it consists of the AWGN channel with as input a particular three-dimensional constellation admitting Z_m as generating group.

Among the still open problems we recall:
• giving a full proof that E_G(R) is tight for the average G-code, and analyzing the error exponent of the typical G-code;
• extending the theory to non-Abelian groups: indeed, it is known [8], [6] that GU constellations with Abelian generating group do not allow to achieve the unconstrained AWGN capacity.

APPENDIX A
PROOF OF LEMMA 3

For the reader's convenience, all statements are repeated before their proof.

Lemma. Let G be a finite group, (G, Y, W) a G-symmetric MC, and C ⊆ G^N a code such that 1_{G^N} ∈ C. Then
$$ p_e(\mathcal{C}\,|\,1_{G^N}) \;\le\; \frac{1}{|G|^N} \sum_{z \in G^N} \int_{\mathcal{Y}^N} W_N^{\frac{1}{1+\rho}}(y|z) \left( \sum_{\theta \ne \delta_{1_G}} \frac{S_{\mathcal{C}}(\theta)}{\binom{N}{N\theta}} \sum_{x \in G_\theta^N} W_N^{\frac{1}{1+\rho}}(y|zx) \right)^{\!\rho} dy\,. $$

Proof. We start by recalling the Gallager bound [2]. Given a MC (X, Y, W) and a code C ⊆ X^N, for every x in C and ρ > 0 the conditioned word error probability satisfies
$$ p_e(\mathcal{C}\,|\,x) \;\le\; \int_{\mathcal{Y}^N} W_N(y|x)^{\frac{1}{1+\rho}} \left( \sum_{z \in \mathcal{C}\setminus\{x\}} W_N(y|z)^{\frac{1}{1+\rho}} \right)^{\!\rho} dy\,. $$
From the given code C we generate the random code C′ := ZΠC, where Π is a r.v. uniformly distributed over the permutation group S_N (where π ∈ S_N acts on x ∈ G^N by permuting its components, i.e. (πx)_i := (x)_{π_i}) and Z is a r.v. uniformly distributed over G^N, independent from Π. Throughout the proof we will denote by E[·] the average operator with respect to such a probabilistic structure.

The crucial point here is that the average word error probability of the random code C′ conditioned on the transmission of Z is equal to the word error probability of C conditioned on the transmission of 1_{G^N}. In fact, for every π ∈ S_N we have that 1_{G^N} ∈ πC and, since the channel is memoryless and stationary, the ML-decision region Λ_{πC} for the codeword 1_{G^N} in the code πC coincides with πΛ_C, where Λ_C denotes the ML-decision region of 1_{G^N} in the code C. Thus
$$ p_e(\pi\mathcal{C}\,|\,1_{G^N}) = 1 - \int_{\Lambda_{\pi\mathcal{C}}} W_N(y|1_{G^N})\,dy = 1 - \int_{\pi\Lambda_{\mathcal{C}}} W_N(y|1_{G^N})\,dy = 1 - \int_{\Lambda_{\mathcal{C}}} W_N(y|1_{G^N})\,dy = p_e(\mathcal{C}\,|\,1_{G^N})\,. $$
Similarly, for any z ∈ G^N we have z ∈ zC and, due to the G-symmetry of the channel, the ML-decision region Λ_{zC} of z in zC coincides with zΛ_C, so that p_e(zC|z) = p_e(C|1_{G^N}). Therefore, we have
$$ \mathbb{E}\big[p_e(\mathcal{C}'\,|\,Z)\big] = p_e(\mathcal{C}\,|\,1_{G^N})\,. \qquad (35) $$
From (35), by applying the Gallager bound to each realization of the random code C′, and observing that, for any w ∈ C, Πw is uniformly distributed over the set G_θ^N of N-tuples of type θ and independent from Z, we get
$$ \begin{aligned} p_e(\mathcal{C}\,|\,1_{G^N}) &= \mathbb{E}\big[p_e(\mathcal{C}'\,|\,Z)\big] \\ &\le \mathbb{E}\left[ \int_{\mathcal{Y}^N} W_N^{\frac{1}{1+\rho}}(y|Z) \Big( \sum\nolimits_{w} W_N^{\frac{1}{1+\rho}}(y|Z\Pi w) \Big)^{\!\rho} dy \right] \\ &= \frac{1}{|G|^N} \sum_{z} \int_{\mathcal{Y}^N} W_N^{\frac{1}{1+\rho}}(y|z)\, \mathbb{E}\left[ \Big( \sum\nolimits_{w} W_N^{\frac{1}{1+\rho}}(y|z\Pi w) \Big)^{\!\rho} \right] dy \\ &\le \frac{1}{|G|^N} \sum_{z} \int_{\mathcal{Y}^N} W_N^{\frac{1}{1+\rho}}(y|z) \left( \sum_{\theta} \frac{S_{\mathcal{C}}(\theta)}{\binom{N}{N\theta}} \sum_{x} W_N^{\frac{1}{1+\rho}}(y|zx) \right)^{\!\rho} dy\,, \end{aligned} $$
the last step by Jensen's inequality (t ↦ t^ρ being concave for ρ ≤ 1), with the summation index w running over C \ {1_{G^N}}, z over G^N, θ over P_N(G) \ {δ_{1_G}} and x over G_θ^N.

APPENDIX B
PROOFS FOR SECTION IV

A. Proof of Lemma 11

Lemma. For every 1 ≤ q < r,
$$ H\big(W(\cdot|0)\big) = H(\lambda_q) - q + \int_{\mathbb{C}} \lambda_q(y)\, H\big(\omega_q(y)\big)\, dy\,; $$
$$ H(\lambda_q) = H(\lambda_{q+1}) - 1 + \int_{\mathbb{C}} \lambda_{q+1}(y)\, H\big(\nu_q(y)\big)\, dy\,. $$

Proof. We have, for K := p^{r−q}Z_{p^r},
$$ \begin{aligned} H\big(W(\cdot|0)\big) &= -\int_{\mathbb{C}} W(y|0) \log W(y|0)\, dy \\ &= -\frac{1}{p^q} \sum_{k\in K} \int_{\mathbb{C}} W\big(y\,\xi_{p^r}^k\,\big|\,0\big) \log W\big(y\,\xi_{p^r}^k\,\big|\,0\big)\, dy \\ &= -\frac{1}{p^q} \sum_{k\in K} \int_{\mathbb{C}} W(y|k) \log W(y|k)\, dy \\ &= -\int_{\mathbb{C}} \lambda_q(y) \log \lambda_q(y)\, dy - \int_{\mathbb{C}} \lambda_q(y) \sum_{k\in K} (\omega_q(y))_k \log\big(p^q (\omega_q(y))_k\big)\, dy \\ &= H(\lambda_q) - q + \int_{\mathbb{C}} \lambda_q(y)\, H\big(\omega_q(y)\big)\, dy\,, \end{aligned} $$
and
$$ \begin{aligned} H(\lambda_q) &= -\int_{\mathbb{C}} \lambda_q(y) \log \lambda_q(y)\, dy \\ &= -\frac{1}{p} \sum_{k\in\mathbb{Z}_p} \int_{\mathbb{C}} \lambda_q\big(y\,\xi_{p^{q+1}}^k\big) \log \lambda_q\big(y\,\xi_{p^{q+1}}^k\big)\, dy \\ &= -\int_{\mathbb{C}} \frac{1}{p} \sum_{k\in\mathbb{Z}_p} \lambda_q\big(y\,\xi_{p^{q+1}}^k\big) \log \lambda_{q+1}(y)\, dy - \int_{\mathbb{C}} \frac{1}{p} \sum_{k\in\mathbb{Z}_p} \lambda_q\big(y\,\xi_{p^{q+1}}^k\big) \log \frac{\lambda_q\big(y\,\xi_{p^{q+1}}^k\big)}{\lambda_{q+1}(y)}\, dy \\ &= -\int_{\mathbb{C}} \lambda_{q+1}(y) \log \lambda_{q+1}(y)\, dy - \int_{\mathbb{C}} \lambda_{q+1}(y) \sum_{k\in\mathbb{Z}_p} (\nu_q(y))_k \log\big(p\,(\nu_q(y))_k\big)\, dy \\ &= H(\lambda_{q+1}) - 1 + \int_{\mathbb{C}} \lambda_{q+1}(y)\, H\big(\nu_q(y)\big)\, dy\,. \end{aligned} $$

B. Proof of Lemma 12

Lemma. For every 1 ≤ q < r and y ∈ C, there exists a partition
$$ \mathcal{W}_{q+1}(y) = \bigcup_{1 \le k \le p^q} \mathcal{W}_q^k(y)\,, $$
where each multiset W_q^k(y) = {w_{q,0}^k, w_{q,1}^k, ..., w_{q,p−1}^k} is such that, for all 0 ≤ j, i ≤ p−1, w_{q,j}^k belongs to W_q(yξ_{p^{q+1}}^j), and
$$ 0 \le k < k' < p^q \;\Longrightarrow\; w_{q,i}^k(y) \ge w_{q,j}^{k'}(y)\,. $$

Proof. Since the transition densities W(y|x) are decreasing functions of the Euclidean distance |y − x|, the decreasing ordering of the multiset W_{q+1}(y) coincides with the increasing ordering of the set of distances {|y − ξ_{p^r}^x|, x ∈ p^{r−q−1}Z_{p^r}}. Define y = ρe^{θi} and ϕ_j = j(2π/p^r) for j ∈ p^{r−q−1}Z_{p^r}. Then,
$$ |y - \xi_{p^r}^j|^2 = (\rho\cos\theta - \cos\varphi_j)^2 + (\rho\sin\theta - \sin\varphi_j)^2 = \rho^2 + 1 - 2\rho(\cos\theta\cos\varphi_j + \sin\theta\sin\varphi_j) = \rho^2 + 1 - 2\rho\cos(\theta - \varphi_j)\,. $$
Let j* be the closest input in p^{r−q−1}Z_{p^r} to the given output y, i.e. j* is such that |θ − ϕ_{j*}| ≤ |θ − ϕ_j| for all j ∈ p^{r−q−1}Z_{p^r}. Then, either
$$ \varphi_{j^*} \le \theta \le \varphi_{j^*} + \frac{1}{2}\,\frac{2\pi}{p^{q+1}} \qquad (36) $$
or
$$ \varphi_{j^*} - \frac{1}{2}\,\frac{2\pi}{p^{q+1}} \le \theta \le \varphi_{j^*} \qquad (37) $$
holds true. Suppose that (36) holds true, and define m := p^{r−q−1}. Then,
$$ \cos(\theta-\varphi_{j^*}) \ge \cos(\theta-\varphi_{j^*+m}) \ge \cos(\theta-\varphi_{j^*-m}) \ge \cos(\theta-\varphi_{j^*+2m}) \ge \ldots \ge \cos(\theta-\varphi_{j^*-\lfloor p^q/2\rfloor m})\,. \qquad (38) $$
From (38) it follows that, for odd p,
$$ \begin{aligned} \mathcal{W}_q^0(y) &= \big\{ W(y|j^*),\, W(y|j^*+m),\, W(y|j^*-m),\, \ldots,\, W(y|j^*-\lfloor\tfrac{p}{2}\rfloor m) \big\} \\ \mathcal{W}_q^1(y) &= \big\{ W(y|j^*+\lceil\tfrac{p}{2}\rceil m),\, W(y|j^*-\lceil\tfrac{p}{2}\rceil m),\, \ldots,\, W(y|j^*+p\,m) \big\} \\ &\;\;\vdots \\ \mathcal{W}_q^{p^q-1}(y) &= \big\{ W(y|j^*-(\lfloor\tfrac{p^q}{2}\rfloor+\lfloor\tfrac{p}{2}\rfloor)m),\, \ldots,\, W(y|j^*-\lfloor\tfrac{p^q}{2}\rfloor m) \big\}\,. \end{aligned} $$
The claim follows, since for every k, W_q^k(y) contains exactly one W(y|j) with j belonging to each coset of p^{r−q}Z_{p^r} in p^{r−q−1}Z_{p^r}. The case when (37) holds true instead of (36) is analogous, while the case p = 2 is much simpler.

C. Proof of Lemma 14

For any subset K ⊆ R^n, let co(K) denote the convex hull of K, i.e. the smallest convex subset of R^n containing K. A polytope is the convex hull of a finite set K ⊂ R^n. A general fundamental result (see [38]) states that P ⊂ R^n is a polytope if and only if it is a bounded intersection of closed half-spaces. In the sequel, we shall deal with a special class of polytopes: given a point x ∈ R^n, we shall consider co(S_n x), i.e. the convex hull of the set of all component permutations of x. This is sometimes called the (generalized) permutahedron of x. The next result explicitly characterizes co(S_n x) as an intersection of half-spaces.

Lemma 25. Let w ∈ R^n be such that
$$ w_1 \ge w_2 \ge \ldots \ge w_n\,. \qquad (39) $$
Then co(S_n w) = A, where
$$ A := \Big\{ x \in \mathbb{R}^n \,:\; \sum_{i\in J} x_i \le \sum_{1\le i\le |J|} w_i \;\;\forall\, J \subseteq \{1,\ldots,n\}\,, \;\; \sum_{1\le i\le n} x_i = \sum_{1\le i\le n} w_i \Big\}\,. $$

Proof. In order to prove that co(S_n w) ⊆ A, it suffices to note that, for every σ ∈ S_n, σw ∈ A: it is easy to check that, due to (39), every constraint is satisfied. Since A is convex, it immediately follows that co(S_n w) ⊆ A.

We now prove the converse inclusion, A ⊆ co(S_n w), by induction. The statement is trivially true for n = 1. Suppose that the claim is true for every m ≤ n for some n ∈ N, and let w ∈ R^{n+1} be such that w_1 ≥ ... ≥ w_{n+1}. Define
$$ D := \Big\{ x \in \mathbb{R}^{n+1} : \sum_{1\le i\le n+1} x_i = \sum_{1\le i\le n+1} w_i \Big\}\,. $$
For each J ⊂ {1, ..., n+1}, define D_J, F_J ⊆ R^{n+1}, with
$$ D_J := \Big\{ x : \sum_{i\in J} x_i = \sum_{1\le i\le |J|} w_i \Big\} $$
and F_J given by the analogous condition on the coordinates in J^c; their intersection defines A_J. For the latter it suffices to observe that, for each I ⊂ J^c, if x is in A_J, then
$$ \sum_{i\in I} x_i \;=\; \sum_{i\in I\cup J} x_i - \sum_{i\in J} x_i \;\le\; \sum_{i=1}^{|I|+|J|} w_i - \sum_{i=1}^{|J|} w_i \;=\; \sum_{i=|J|+1}^{|I|+|J|} w_i\,. $$

For J ⊆ {1, ..., n+1}, let Ψ_J ∈ S_{n+1} be any permutation mapping the first |J| elements onto J, i.e. such that Ψ_J({1, ..., |J|}) = J. Define S_J ⊆ S_{n+1} to be the set of permutations σ such that σ restricted to {1, ..., |J|} is the identity. Notice that S_J commutes with S_{J^c}, in the sense that σρ = ρσ for all σ ∈ S_J and ρ ∈ S_{J^c}. Let φ_J : π_J R^{n+1} → R^{|J|} and φ_{J^c} : π_{J^c} R^{n+1} → R^{|J^c|} be the standard isomorphisms. By applying the inductive hypothesis to φ_J π_J Ψ_J w and φ_{J^c} π_{J^c} Ψ_J w respectively, and then immersing the results back into R^{n+1} by φ_J^{-1} and φ_{J^c}^{-1} respectively, we have that
$$ B_J \subseteq \mathrm{co}(\pi_J \Psi_J S_J w)\,, \qquad C_J \subseteq \mathrm{co}(\pi_{J^c} \Psi_J S_{J^c} w)\,. \qquad (41) $$
For every x ∈ A_J we have π_J x ∈ B_J and π_{J^c} x ∈ C_J from (40). Then (41) implies that λ′ ∈ P(S_J) and λ″ ∈ P(S_{J^c}) exist such that
$$ x = \pi_J x + \pi_{J^c} x = \sum_{\sigma\in S_J} \lambda'(\sigma)\, \pi_J \Psi_J \sigma w + \sum_{\rho\in S_{J^c}} \lambda''(\rho)\, \pi_{J^c} \Psi_J \rho w = \sum_{\sigma\in S_J,\,\rho\in S_{J^c}} \lambda'(\sigma)\lambda''(\rho)\, \Psi_J \sigma\rho w = \sum_{\sigma\in\Psi_J S_J S_{J^c}} \lambda(\sigma)\, \sigma w \;\in\; \mathrm{co}(S_{n+1} w)\,, $$
with λ ∈ P(Ψ_J S_J S_{J^c}) ⊆ P(S_{n+1}) defined by λ(Ψ_J σρ) := λ′(σ)λ″(ρ). Therefore, for every J ⊂ {1, ..., n+1}, we have A_J ⊆ co(S_{n+1} w), and so A = co(∪_J A_J) ⊆ co(S_{n+1} w).

Lemma 26. Suppose n² real numbers {a_i^k, 1 ≤ i, k ≤ n} are given, such that
$$ k < k' \;\Longrightarrow\; a_j^{k'} \le a_i^k\,, \qquad 1 \le j, i \le n\,. \qquad (42) $$
Define x and v in R^n,
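The half-space description of Lemma 25 is the classical majorization characterization of the permutahedron: for each size |J| = m the binding constraint is attained by the m largest entries of x, so membership reduces to comparing sorted partial sums. A sketch of the resulting test:

```python
import numpy as np

# Membership test implied by Lemma 25: x lies in co(S_n w) iff the total
# sums agree and, for every m < n, the sum of the m largest entries of x
# is at most the sum of the m largest entries of w.
def in_permutohedron(x, w, tol=1e-9):
    xs = np.sort(x)[::-1]
    ws = np.sort(w)[::-1]
    if abs(xs.sum() - ws.sum()) > tol:
        return False
    return bool(np.all(np.cumsum(xs)[:-1] <= np.cumsum(ws)[:-1] + tol))

w = np.array([3.0, 2.0, 1.0])
print(in_permutohedron(np.array([2.0, 2.0, 2.0]), w))  # True (barycenter)
print(in_permutohedron(np.array([3.5, 1.5, 1.0]), w))  # False
```

Every permutation of w itself passes the test, which is the forward inclusion co(S_n w) ⊆ A of the proof.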