ISIT 2008, Toronto, Canada, July 6 - 11, 2008

Channel Polarization: A Method for Constructing Capacity-Achieving Codes

Erdal Arıkan
Electrical-Electronics Engineering Department
Bilkent University, Ankara, TR-06800, Turkey
Email: [email protected]

Abstract— A method is proposed, called channel polarization, to construct code sequences that achieve the symmetric capacity I(W) of any given binary-input discrete memoryless channel (B-DMC) W. The symmetric capacity I(W) is the highest rate achievable subject to using the input letters of the channel equiprobably, and equals the capacity C(W) if the channel has certain symmetry properties. Channel polarization refers to the fact that it is possible to synthesize, out of N independent copies of a given B-DMC W, a different set of N binary-input channels such that the capacities of the latter set, except for a negligible fraction of them, are either near 1 or near 0. This second set of N channels is well-conditioned for channel coding: one need only send data at full rate through channels with capacity near 1 and at zero rate through the others. The main coding theorem about polar coding states that, given any B-DMC W with I(W) > 0 and any fixed 0 < δ < I(W), there exist finite constants n1(W, δ) and c(W, δ) such that for all n ≥ n1, there exist polar codes with block length N = 2^n, rate R > I(W) − δ, and probability of block decoding error Pe ≤ c N^{−1/4}. The codes with this performance can be encoded and decoded within complexity O(N log N).

I. CHANNEL POLARIZATION

Let W : X → Y denote an arbitrary B-DMC with input alphabet X = {0, 1}, output alphabet Y, and transition probabilities [W(y|x)], x ∈ X, y ∈ Y. Let W^N denote the DMC consisting of N independent copies of W. The channel polarization method has two parts: channel combining and channel splitting. The channel combining part is a recursive method that combines N = 2^n independent copies of W to construct a channel W_N : X^N → Y^N. The recursion begins by combining two copies of W as shown in Fig. 1 to obtain the channel W_2 : X^2 → Y^2 with transition probabilities

    W_2(y_1 y_2 | u_1 u_2) = W(y_1 | u_1 ⊕ u_2) W(y_2 | u_2).    (1)

Fig. 1. The channel W_2.

The next level of recursion is shown in Fig. 2, where two independent copies of W_2 are combined to create the channel W_4 : X^4 → Y^4 with transition probabilities

    W_4(y_1^4 | u_1^4) = W_2(y_1^2 | u_1 ⊕ u_2, u_3 ⊕ u_4) W_2(y_3^4 | u_2, u_4).

Fig. 2. The channel W_4 and its relation to W_2 and W.

We use the notation a_1^N to denote an arbitrary vector (a_1, ..., a_N), and a_i^j to denote the subvector (a_i, ..., a_j). In Fig. 2, Π_4 is the permutation a_1^4 ↦ (a_1, a_3, a_2, a_4). Note that the mapping u_1^4 ↦ x_1^4 from the input of the channel W_4 to the input of W^4 can be written as x_1^4 = u_1^4 G_4, where

    G_4 = | 1 0 0 0 |
          | 1 0 1 0 |    (2)
          | 1 1 0 0 |
          | 1 1 1 1 |

We call G_4 the generator matrix of size 4. Thus, we have the relation W_4(y_1^4 | u_1^4) = W^4(y_1^4 | u_1^4 G_4) between the transition probabilities of the combined channel W_4 and those of the collection of underlying raw channels W^4. The general form of the recursion is shown in Fig. 3, where two independent copies of W_N are combined to create W_{2N} : X^{2N} → Y^{2N}. The input vector u_1^{2N} to W_{2N} is first
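The generator matrix is easy to compute explicitly. The following is a minimal sketch in pure Python (the helper names are our own, not from the paper): it builds G_N as the bit-reversal row permutation of the n-fold tensor power of F = [[1,0],[1,1]], as stated later in (4), and for n = 2 recovers the G_4 of (2).

```python
def bit_reverse(i, n):
    """Reverse the n-bit binary expansion of i (the bit-reversal indexing)."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def kron(A, B):
    """Kronecker (tensor) product of two 0/1 matrices given as lists of lists."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def generator_matrix(n):
    """G_N: bit-reversal row permutation of F^{(x)n} (a sketch of eq. (4))."""
    F = [[1, 0], [1, 1]]
    G = [[1]]
    for _ in range(n):
        G = kron(G, F)
    return [G[bit_reverse(i, n)] for i in range(1 << n)]

def encode(u, G):
    """The mapping x = u G over GF(2)."""
    return [sum(ui * gij for ui, gij in zip(u, col)) % 2 for col in zip(*G)]

G4 = generator_matrix(2)        # matches eq. (2)
x = encode([1, 0, 1, 1], G4)    # x_1^4 = u_1^4 G_4
```

For n = 2 this reproduces the matrix in (2), so the mapping u_1^4 ↦ x_1^4 is a single GF(2) matrix-vector product.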


transformed to a vector s_1^{2N} such that s_{2i−1} = u_{2i−1} ⊕ u_{2i} and s_{2i} = u_{2i} for 1 ≤ i ≤ N. The permutation Π_{2N} acts on s_1^{2N} to generate v_1^{2N} such that v_i = s_{2i−1} and v_{N+i} = s_{2i} for 1 ≤ i ≤ N. This completes the description of the channel combining part.

Fig. 3. The relation between the channels W_{2N} and W_N.

To give a concise algebraic description of channel combining, we define G_N, the generator matrix of size N, as the matrix such that

    W_N(y_1^N | u_1^N) = W^N(y_1^N | u_1^N G_N)    (3)

for all y_1^N ∈ Y^N, u_1^N ∈ X^N.

Proposition 1: The generator matrix is given by

    G_N = F^{⊗n} Π̃_N = Π̃_N F^{⊗n}    (4)

where F^{⊗n} is the n-fold tensor product of F = [1 0; 1 1] with itself and Π̃_N is a permutation matrix known as the bit-reversal operation.

We omit proofs due to space limitations and refer to [1].

To describe the bit-reversal operation, we need to consider an alternative indexing for vectors. Given a vector a_1^N with length N = 2^n for some n ≥ 0, we may denote its ith element, a_i, 1 ≤ i ≤ N, alternatively as a_{b_n···b_1}, where b_n···b_1 is the binary expansion of the integer i − 1, i.e., i − 1 = Σ_{j=1}^{n} b_j 2^{j−1}. Now, bit-reversal applied to a_1^N sends it to c_1^N such that c_{b_n···b_1} = a_{b_1···b_n}. For example, a_1^4 = (a_00, a_01, a_10, a_11) is sent to c_1^4 = (a_00, a_10, a_01, a_11).

Having combined N independent copies of W into a channel W_N, the next and final step of channel polarization is to split W_N back into a set of N binary-input channels W_N^{(i)} : X → Y^N × X^{i−1}, 1 ≤ i ≤ N, defined by the transition probabilities

    W_N^{(i)}(y_1^N, u_1^{i−1} | u_i) = Σ_{u_{i+1}^N ∈ X^{N−i}} (1 / 2^{N−1}) W_N(y_1^N | u_1^N)    (5)

where (y_1^N, u_1^{i−1}) represents the output of W_N^{(i)} and u_i its input. The channels W_N^{(i)} exhibit a polarization effect in the sense that the fraction of indices i for which the symmetric capacity I(W_N^{(i)}) is inside the interval (δ, 1 − δ) goes to zero as N goes to infinity for any fixed δ > 0. This is illustrated in Fig. 4 for W a binary erasure channel (BEC) with erasure probability ε = 0.5.

Fig. 4. Histogram of I(W_N^{(i)}) for N = 2^6 and W a BEC with ε = 0.5. (Histogram axes: symmetric capacity in bits vs. frequency.)
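For the BEC, the polarization effect behind Fig. 4 can be observed numerically: each synthesized channel is again a BEC, and its erasure probability follows the single-step split e ↦ (2e − e², e²) given later in Proposition 6, so the capacities I(W_N^{(i)}) = 1 − ε_N^{(i)} are exactly computable. A small illustrative sketch (our own code, not from the paper):

```python
def bec_erasure_probs(eps, n):
    """Erasure probabilities of the N = 2^n synthesized channels of a BEC,
    expanded level by level: each e splits into (2e - e^2, e^2)."""
    probs = [eps]
    for _ in range(n):
        probs = [p for e in probs for p in (2 * e - e * e, e * e)]
    return probs

delta = 0.1
for n in (4, 6, 8, 10):
    caps = [1 - e for e in bec_erasure_probs(0.5, n)]  # I(W) = 1 - eps for a BEC
    frac = sum(delta < c < 1 - delta for c in caps) / len(caps)
    print(f"N = 2^{n:2d}: fraction of channels with I in ({delta}, {1 - delta}) = {frac:.3f}")
```

Note that the split is capacity-conserving: (2e − e²) + e² = 2e, so the erasure probabilities always sum to N·ε, in agreement with (11).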

We take advantage of the polarization effect to construct codes that achieve rates approaching I(W) by a method we call polar coding. The basic idea of polar coding is to create a coding system where one can access each synthesized channel W_N^{(i)} individually and send data only through the subset of them for which I(W_N^{(i)}) is near 1.

II. POLAR CODING

For any subset A of {1, ..., N}, we will write G_N(A) to denote the submatrix of G_N consisting of rows with indices in A. We will refer to the set A as the information set and denote its cardinality by |A|. We will denote by A^c the complement of A in {1, ..., N}. For any such A and any binary vector f_1^{|A^c|} of length N − |A|, the pair (A, f_1^{|A^c|}) will parameterize a block code that maps a data vector d_1^{|A|} to a codeword vector x_1^N by the transformation

    x_1^N = d_1^{|A|} G_N(A) ⊕ f_1^{|A^c|} G_N(A^c).    (6)

Clearly, for any given f_1^{|A^c|}, such a code is a coset of the linear code with parameter (A, 0_1^N), where 0_1^N denotes the


zero vector. Polar codes are coset codes of this type with a particular rule for selecting the set A, paying no special attention to the choice of f_1^{|A^c|}. The choice of A turns out to affect code performance critically, while that of f_1^{|A^c|} is not so critical. We already hinted in the previous section that one should choose A from among indices i such that I(W_N^{(i)}) is near 1. However, instead of I(W_N^{(i)}), we use a more tractable and equally effective channel parameter for selecting the information set, namely Z(W_N^{(i)}), which for any B-DMC W is defined as

    Z(W) = Σ_{y∈Y} √( W(y|0) W(y|1) )    (7)
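As a concrete check of (7): for a BSC with crossover probability p, Z(W) = 2√(p(1 − p)), and for a BEC with erasure probability ε, Z(W) = ε. A small sketch (our own code; a channel is given as a dictionary (y, x) ↦ W(y|x), and the output-symbol encodings are illustrative choices):

```python
import math

def bhattacharyya(W):
    """Z(W) = sum over y of sqrt(W(y|0) * W(y|1)), per eq. (7)."""
    ys = {y for (y, _) in W}
    return sum(math.sqrt(W[(y, 0)] * W[(y, 1)]) for y in ys)

p = 0.1   # BSC(p): Z = 2 * sqrt(p * (1 - p))
bsc = {(0, 0): 1 - p, (1, 0): p, (0, 1): p, (1, 1): 1 - p}

eps = 0.5  # BEC(eps): Z = eps ('?' denotes the erasure symbol)
bec = {(0, 0): 1 - eps, ('?', 0): eps, (1, 0): 0.0,
       (0, 1): 0.0, ('?', 1): eps, (1, 1): 1 - eps}

print(bhattacharyya(bsc))  # ~ 2 * sqrt(0.1 * 0.9) = 0.6
print(bhattacharyya(bec))  # ~ eps = 0.5
```

Z ranges over [0, 1], equalling 0 for a noiseless channel and 1 for a useless one, which is why small Z signals a reliable channel.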

and is a measure of how error-prone the channel is. The rule we follow for constructing a polar code is as follows.

Polar code construction rule. To construct an (N, K) polar code for a B-DMC W, we select A as the subset of indices A ⊂ {1, ..., N} such that |A| = K and, for each i ∈ A, the value Z(W_N^{(i)}) is among the smallest K values in the set {Z(W_N^{(j)}) : j = 1, ..., N}. The choice of the frozen vector f_1^{|A^c|} is unspecified; any choice (A, f_1^{|A^c|}) defines a polar code.

The intuition here is that we select the most reliable K channels from the collection {W_N^{(i)}}. Thus, polar codes are custom-designed for the channel at hand. By leaving the frozen vector unspecified, we are actually defining, for a given parameter (N, K), a polar code ensemble with 2^{N−K} codes, all sharing a common information set A. This degree of freedom provided by having an ensemble of codes makes it possible to use random-coding techniques in the analysis of polar codes. We will see that on channels that have certain symmetries, all codes in the polar code ensemble have the same performance; for such channels the above rule in effect specifies a unique code for a given (N, K), and hence can be said to be fully constructive.

Polar codes are tailored for successive cancellation (SC) decoding, which is the type of decoder considered in this paper. To describe the overall coding system, fix an (N, K) polar code with parameter (A, f_1^{|A^c|}). Given a data vector d_1^{|A|} for transmission over W_N, the encoder first prepares the full input vector u_1^N to the channel W_N by setting u_A = d_1^{|A|} and u_{A^c} = f_1^{|A^c|}. Here, u_A designates the subvector of u_1^N consisting of coordinates in A, and similarly for u_{A^c}. In the sequel, we use u_A and d_1^{|A|} interchangeably, and refer to them as the data vector. Likewise, we use u_{A^c} and f_1^{|A^c|} interchangeably, and refer to them as the frozen vector.

The encoder next computes the codeword x_1^N = u_1^N G_N, which gets sent over W^N to generate an output vector y_1^N. The decoder's task is to generate an estimate û_A of the data vector u_A, using the observation y_1^N and the knowledge of the frozen vector u_{A^c}. A decoding error is said to occur if û_A ≠ u_A. We assume that the decoder in the system is a SC decoder that generates its estimate û_1^N by setting

    û_i = u_i                      if i ∈ A^c
    û_i = h_i(y_1^N, û_1^{i−1})    if i ∈ A          (8)

in the order i from 1 to N, using the decision functions

    h_i(y_1^N, u_1^{i−1}) = 0  if W_N^{(i)}(y_1^N, u_1^{i−1} | 0) > W_N^{(i)}(y_1^N, u_1^{i−1} | 1),
                            1  otherwise.

Although we defined a decision function h_i for each 1 ≤ i ≤ N, the SC decoder need only use the decision functions {h_i : i ∈ A}; it already knows the correct decisions for i ∈ A^c and follows the first clause in (8). The decision functions {h_i} defined above look like ML decision functions but are not truly ML, since they treat the upstream frozen bits (u_j : j > i, j ∈ A^c) as random although they are fixed. In exchange for this suboptimality, h_i can be computed efficiently using recursive formulas, which leads to a low-complexity decoding algorithm. This recursive structure also renders the performance analysis of polar coding analytically tractable. Fortunately, the loss in performance due to not using true ML decision functions is negligible in the sense that the symmetric channel capacity I(W) can still be achieved.

For a given polar code with parameters (A, f_1^{|A^c|}), we denote by P_e(A, f_1^{|A^c|}) the probability of block decoding error under the above SC decoder, and take this as the main performance criterion for polar coding. This is the probability that the decoder's estimate û_A differs from the actual data vector u_A, assuming that the latter is chosen from the uniform distribution on the set of all 2^{|A|} alternatives. Let P̄_e(A) denote the average of P_e(A, f_1^{|A^c|}) over all 2^{|A^c|} possible choices for the frozen vector f_1^{|A^c|}.

Proposition 2: For polar coding on any B-DMC W under the above SC decoding rule,

    P̄_e(A) ≤ Σ_{i∈A} Z(W_N^{(i)}).    (9)

We should note that, actually, the bound (9) holds for any choice of the information set A, whether selected in accordance with the polar code construction rule or not. This is a typical random-coding result that bounds the average error probability over all coset codes belonging to the linear code with parameter (A, 0_1^N). The result does not specify any individual coset with the indicated performance; however, as usual in such cases, if the channel has certain symmetries, all cosets are equally good. Let us say that a B-DMC W is symmetric if there exists a permutation π of its output alphabet Y such that W(y|1) = W(π(y)|0) for all y ∈ Y. Examples of symmetric channels are the BEC and the BSC (binary symmetric channel).

Proposition 3: For symmetric B-DMCs, we have

    P_e(A, f_1^{|A^c|}) ≤ Σ_{i∈A} Z(W_N^{(i)})    (10)

uniformly for all f_1^{|A^c|}.


III. RECURSIVE STRUCTURE

The performance analysis of polar codes is made possible by a number of recursive relationships that we give in this section. These recursive relationships are also important since they form the basis of low-complexity decoding algorithms for polar codes.

Proposition 4: For any n ≥ 0, N = 2^n, 1 ≤ i ≤ N, we have

    W_{2N}^{(2i−1)}(y_1^{2N}, u_1^{2i−2} | u_{2i−1})
        = Σ_{u_{2i}} (1/2) W_N^{(i)}(y_1^N, v_1^{i−1} | v_i) W_N^{(i)}(y_{N+1}^{2N}, v_{N+1}^{N+i−1} | v_{N+i})

and

    W_{2N}^{(2i)}(y_1^{2N}, u_1^{2i−1} | u_{2i})
        = (1/2) W_N^{(i)}(y_1^N, v_1^{i−1} | v_i) W_N^{(i)}(y_{N+1}^{2N}, v_{N+1}^{N+i−1} | v_{N+i})

where the variables v_1^{2N} are as shown in Fig. 3.

These recursive relationships among the channel transition probabilities give rise to recursive relationships among the rate and reliability parameters in polar coding.

Proposition 5: For any N = 2^n, n ≥ 0, 1 ≤ i ≤ N, the channel polarization operation is rate-preserving and reliability-improving in the sense that

    I(W_{2N}^{(2i−1)}) + I(W_{2N}^{(2i)}) = 2 I(W_N^{(i)})    (11)
    Z(W_{2N}^{(2i−1)}) + Z(W_{2N}^{(2i)}) ≤ 2 Z(W_N^{(i)})    (12)

Channel polarization moves the rate and reliability away from the center in the sense that

    I(W_{2N}^{(2i−1)}) ≤ I(W_N^{(i)}) ≤ I(W_{2N}^{(2i)})      (13)
    Z(W_{2N}^{(2i−1)}) ≥ Z(W_N^{(i)}) ≥ Z(W_{2N}^{(2i)})      (14)

where the inequalities in (13) and (14) are all strict if 0 < I(W) < 1, and all hold with equality otherwise. The reliability terms further satisfy

    Z(W_{2N}^{(2i−1)}) ≤ 2 Z(W_N^{(i)}) − Z(W_N^{(i)})^2      (15)
    Z(W_{2N}^{(2i)}) = Z(W_N^{(i)})^2                          (16)

with equality in (15) iff W is a BEC.

The BEC plays an important role as an extreme case in channel polarization. We use the term BEC in a broad sense to mean any B-DMC W : X → Y such that for each output y ∈ Y, either W(y|0)W(y|1) = 0 or W(y|0) = W(y|1). In the latter case, we call y an erasure symbol. The sum of W(y|0) over all erasure symbols y is called the erasure probability of the BEC. We have I(W) = 1 − ε and Z(W) = ε if W is a BEC with erasure probability ε.

Proposition 6: Let W be a BEC with erasure probability ε. Then, each channel W_N^{(i)} created by channel polarization is a BEC with erasure probability ε_N^{(i)} that can be computed by the recursion

    ε_{2k}^{(2j−1)} = 2 ε_k^{(j)} − (ε_k^{(j)})^2    (17)
    ε_{2k}^{(2j)} = (ε_k^{(j)})^2

starting with ε_1^{(1)} = ε.

IV. A NUMERICAL EXAMPLE

In this section, we give a numerical example that illustrates the upper bound of Proposition 2. We fix W as a BEC with erasure probability ε = 0.5 and consider polar codes of various block lengths N = 2^n for n as given in the first column of Table I. The second column is a threshold parameter η, which we take as η = 2^{−5n/4}. The information sets for the codes are determined by the threshold parameter as A(η) = {i : Z(W_N^{(i)}) < η}. The third and fourth columns of the table give the rate and reliability figures, defined respectively as R(η) = |A(η)|/N and B(η) = Σ_{i∈A(η)} Z(W_N^{(i)}). The fifth column equals L(η) = max{Z(W_N^{(i)}) : i ∈ A(η)}, which is meant to be a lower bound on the probability of block decoding error for polar codes under SC decoding. As might be expected, L(η) becomes almost identical to η as the code block length increases.

TABLE I
CODE PERFORMANCE VS. BLOCK LENGTH FOR BEC WITH ε = 0.5.

    Order n | Threshold η = 2^{−5n/4} | Rate R(η) | Upper bound B(η) | Lower bound L(η)
    --------|-------------------------|-----------|------------------|-----------------
       5    |        1.31 E-2         |  0.1875   |     1.17 E-2     |     1.00 E-2
      10    |        1.73 E-4         |  0.3105   |     2.08 E-3     |     1.72 E-4
      15    |        2.27 E-6         |  0.4009   |     3.38 E-4     |     2.27 E-6
      20    |        2.98 E-8         |  0.4530   |     5.32 E-5     |     2.98 E-8
      25    |        3.92 E-10        |  0.4789   |     8.09 E-6     |     3.92 E-10

The evolution of the rate and reliability figures in the table provides empirical evidence that polar coding achieves channel capacity under SC decoding as the block length is increased. This is indeed true, and it will be stated more precisely as the main result of this paper in the next section. The table also suggests that convergence of the coding rate to channel capacity under polar coding may be extremely slow. Fortunately, a significant part of this slow convergence can be attributed to the use of a SC decoder. The SC decoder is analytically tractable due to its one-pass nature, but also far from optimal for the same reason. We have obtained better decoder performance using an iterative decoder, as we will discuss in Sect. VI.

V. MAIN CODING THEOREMS

Theorem 1: For any B-DMC W and constant δ > 0, there exist finite constants n1 = n1(δ, W) and c = c(δ, W) such that for all n ≥ n1, there exists a polar code ensemble, with block length N = 2^n, information set A ⊂ {1, ..., N}, and rate |A|/N > I(W) − δ, for which the ensemble average of the probability of error under successive cancellation decoding satisfies

    P̄_e(A) ≤ c N^{−1/4}.    (18)

If W is a symmetric B-DMC, then the above claim is true with channel capacity C(W) in place of symmetric capacity I(W). Furthermore, for symmetric channels, whenever the


bound (18) holds for an ensemble, it holds for each code in the ensemble, i.e.,

    P_e(A, f_1^{|A^c|}) ≤ c N^{−1/4}    (19)

for all f_1^{|A^c|}.

This bound on error probability is known to be loose [1]. Obtaining better bounds is an important problem for further research on polar codes.

Theorem 2: For any (N, K) polar code on any B-DMC W, there exist an encoder and a SC decoder, each with the same order of complexity O(N log N).

A noteworthy feature of this theorem is that the complexity bound applies uniformly for all coding rates. This theorem is proved in [1] by giving specific encoding and decoding algorithms.

An important complexity issue regarding any coding scheme is construction complexity. For polar coding, we have not found a low-complexity algorithm for constructing the information set A in the exact manner specified in Sect. II. Low-complexity polar code construction algorithms are possible by using Monte-Carlo estimates of the numbers Z(W_N^{(i)}) instead of their exact values. An exception to these statements is the BEC, for which exact code construction is possible in complexity O(N log N) by using the recursive formula (17).

VI. RELATIONS TO PREVIOUS WORK

Channel polarization is based on ideas of [2], where channel combining and splitting were used to show that improvements can be obtained in the sum cutoff rate. However, no recursive method was suggested in [2] to reach the ultimate limit of such improvements.

Polar coding has a strong resemblance to Reed-Muller (RM) coding [3], [4], and may be regarded as a generalization of it. Both coding schemes start with a generator matrix for a rate-one code and obtain generator matrices of lower-rate codes in the family by expurgating rows of the initial generator matrix. Polar coding starts with the matrix G_N = F^{⊗n} Π̃_N and obtains lower-rate codes by expurgating rows using a reliability-based criterion, as we described above. For RM coding, the rate-one generator matrix, denoted G_RM(n, n), may be taken as F^{⊗n}. The rth-order RM code RM(r, n) is defined as the linear code with generator matrix G_RM(r, n), which is obtained by expurgating those rows of F^{⊗n} with Hamming weight less than 2^{n−r}. Thus, the two coding schemes start with full-order matrices that have the same set of row vectors, only in different orders. The main difference between the two coding schemes comes from the use of different rules for row expurgation; while polar codes use a rule that is matched to the channel and the decoder, the RM rule is static and not necessarily matched to any channel. This issue is further discussed in [1].

While in this context, we would like to point out that, as Forney [5] shows, RM codes have a sparse factor graph representation and can be decoded using a belief propagation (BP) decoder for superior performance. Since polar codes have the same skeleton as RM codes, they too can be decoded by BP decoders. Indeed, we carried out a comparison of RM codes and polar codes under BP decoding and reported the results in [6]. The results show significantly better performance in favor of polar codes.

VII. CONCLUDING REMARKS

The basic idea that we pursued in this work has been that channels tend to polarize with respect to rate and reliability under certain combining and splitting operations. We investigated a particular polarization scheme whose salient feature was recursiveness. Recursiveness rendered the scheme analytically tractable, and also provided low-complexity encoding and decoding algorithms. Polar coding for non-binary channels is an open research subject. Another subject for future work is to explore the practical utility of polar codes by studying their performance under more powerful decoding algorithms.

The analysis in [1] provides strong evidence that channel polarization is a commonplace phenomenon, which is almost impossible to avoid as long as channels are combined with a sufficient density and mix of connections, whether chosen recursively or at random. The study of channel polarization in such generality is an interesting theoretical problem.

ACKNOWLEDGMENT

This work was supported in part by The Scientific and Technological Research Council of Turkey (TÜBİTAK) under contract no. 107E216 and in part by the European Commission FP7 Network of Excellence NEWCOM++ (contract no. 216715).

REFERENCES

[1] E. Arıkan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," submitted to IEEE Trans. Inform. Theory, Oct. 2007.
[2] E. Arıkan, "Channel combining and splitting for cutoff rate improvement," IEEE Trans. Inform. Theory, vol. IT-52, Feb. 2006.
[3] D. E. Muller, "Application of boolean algebra to switching circuit design and to error correction," IRE Trans. Ele