Polar Lattices are Good for Lossy Compression

Ling Liu and Cong Ling

arXiv:1501.05683v1 [cs.IT] 22 Jan 2015

Department of Electrical and Electronic Engineering, Imperial College London, London, UK. Email: [email protected], [email protected]

Abstract—To be considered for a 2015 IEEE Jack Keil Wolf ISIT Student Paper Award. Polar lattices, which are constructed from polar codes, have recently been proved to achieve the capacity of the additive white Gaussian noise (AWGN) channel. In this work, we show that polar lattices can also solve the dual problem, i.e., achieve the rate-distortion bound of a memoryless Gaussian source, which means that polar lattices are also good for the lossy compression of continuous sources. The structure of the proposed polar lattices enables us to integrate the post-entropy-coding process into the lattice quantizer, which simplifies the quantization process. Moreover, the nesting structure of polar lattices further provides solutions for some multi-terminal coding problems. The Wyner-Ziv problem for a Gaussian source can be solved by an AWGN capacity-achieving polar lattice nested within a rate-distortion bound achieving one, and the Gelfand-Pinsker problem can be solved in the reverse manner.

I. INTRODUCTION

Vector quantization (VQ) [1] has been widely used for source coding of image and speech data since the 1980s. Compared with scalar quantization, the advantage of VQ, guaranteed by Shannon's rate-distortion theory, is that better performance can always be achieved by coding vectors instead of scalars, even for memoryless sources. However, Shannon theory does not provide any constructive VQ design. During the past several decades, many practical VQ techniques with relatively low complexity have been proposed, such as lattice VQ [2], multistage VQ, tree-structured VQ, gain-shape VQ, etc. Among them, lattice VQ is of particular interest because its highly regular structure makes compact storage and fast quantization algorithms possible. In this work, we discuss the explicit construction of polar lattices for quantization, which achieve the rate-distortion bound of the continuous Gaussian source. It is well known that the optimal output alphabet size is infinite for continuous-amplitude sources. In particular, the rate-distortion function for the Gaussian source of variance $\sigma_s^2$ under the squared-error distortion measure $d(x,y) = (x-y)^2$ is given by
$$R(\Delta) = \max\left\{\frac{1}{2}\log\frac{\sigma_s^2}{\Delta},\, 0\right\}, \qquad (1)$$
where $\Delta$ and $R$ denote the average distortion and the rate per symbol, respectively. However, in practice, the size of the reconstruction alphabet needs to be finite. Non-constructively, [3, Theorem 9.6.2] shows the existence of a block code with a finite number of output letters that achieves performance arbitrarily close to the rate-distortion bound.
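For concreteness, here is a minimal numerical sketch of (1) (our illustration, not part of the paper; the variance and distortion values are arbitrary):

```python
import math

def gaussian_rate_distortion(var_s: float, delta: float) -> float:
    """R(Delta) of an N(0, var_s) source under squared-error
    distortion, in bits per symbol (Eq. (1))."""
    return max(0.5 * math.log2(var_s / delta), 0.0)

if __name__ == "__main__":
    var_s = 1.0  # illustrative source variance
    for delta in (0.5, 0.25, 0.1, 1.0, 2.0):
        r = gaussian_rate_distortion(var_s, delta)
        print(f"Delta = {delta:4.2f} -> R = {r:.3f} bits")
    # Delta >= var_s gives R = 0: reproducing the mean already meets the distortion.
```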

A size-constrained output-alphabet rate-distortion function $R_M(\Delta)$ was then defined in [4], with $M$ denoting the size of the output alphabet. The well-known trellis coded quantization (TCQ) [5] was motivated by this alphabet-constrained rate-distortion theory. It was shown that, for a given encoding rate of $R$ bits per symbol, the rate-distortion function $R(\Delta)$ can be approached by using a TCQ encoder with rate $R+1$ after an initial Lloyd-Max quantization. It is equivalent to trellis coded modulation (TCM) in the sense that $m$ information bits are transmitted using $2^{m+1}$ constellation points. A near-optimum lattice quantization scheme based on tail-biting convolutional codes was introduced in [6]. Despite its good practical performance, a theoretical proof that a low-complexity TCQ achieves the rate-distortion bound is still missing. More recently, a scheme based on low-density Construction-A (LDA) lattices [7] was proved to be quantization-good (defined in Sect. III) under minimum-distance lattice decoding. However, in practice it cannot be realized by the belief-propagation decoding algorithm.

Polar lattices have the potential to solve this problem with low complexity. As shown in [8], this lattice structure enables us to employ the discrete lattice Gaussian distribution for shaping. This distribution shares many properties with the continuous Gaussian distribution and attains the optimal shaping gain when its associated flatness factor is negligible. We may use it instead of the continuous Gaussian as the distribution of the reconstruction alphabet. It is also shown that, even with binary lattice partitions, the number of partition levels $r$ does not need to be very large ($O(\log\log N)$) to achieve the AWGN capacity $\frac{1}{2}\log(1+\mathrm{SNR})$, where SNR denotes the signal-to-noise ratio of the AWGN channel. By the duality between source coding and channel coding, quantization lattices can be viewed as channel coding lattices constructed on the test channel. For a Gaussian source with variance $\sigma_s^2$ and an average distortion $\Delta$, the test channel is an AWGN channel with noise variance $\Delta$. In this case, the "SNR" of the test channel is $\frac{\sigma_s^2-\Delta}{\Delta}$, and its "capacity" is $\frac{1}{2}\log(\frac{\sigma_s^2}{\Delta})$, which implies that the rate of the polar lattice quantizer can be made arbitrarily close to $\frac{1}{2}\log(\frac{\sigma_s^2}{\Delta})$. Based on this idea, we introduce in this work the construction of polar lattices that are good for quantization. Compared with traditional lattice quantization schemes, which generally require a separate entropy encoding stage after obtaining the quantized lattice points, our scheme naturally integrates the two processes.
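The test-channel view can be checked numerically. The following sketch (ours; the parameter values are assumptions) confirms that the test channel's "capacity" $\frac{1}{2}\log(1+\mathrm{SNR})$ coincides with $R(\Delta)$ of (1):

```python
import math

var_s = 1.0    # assumed source variance
delta = 0.25   # assumed target distortion, delta < var_s

# Test channel: AWGN with noise variance delta, "signal" variance var_s - delta.
snr_test = (var_s - delta) / delta
cap_test = 0.5 * math.log2(1.0 + snr_test)   # test-channel "capacity"
rate_rd = 0.5 * math.log2(var_s / delta)     # R(Delta) from Eq. (1)

print(f"test-channel SNR     = {snr_test:.3f}")
print(f"0.5*log2(1+SNR_test) = {cap_test:.3f} bits")
print(f"R(Delta)             = {rate_rd:.3f} bits")  # equal: 1 + SNR_test = var_s/delta
```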

The paper is organized as follows: Section II presents background on lattices and the discrete Gaussian distribution. The construction of rate-distortion bound achieving polar lattices is investigated in Section III. In Section IV, by combining AWGN capacity-achieving polar lattices with the proposed quantization polar lattices, a brief discussion of Wyner-Ziv and Gelfand-Pinsker coding is given. The paper is concluded in Section V.

All random variables (RVs) are denoted by capital letters. Let $P_X$ denote the probability distribution of a RV $X$ taking values $x$ in a set $\mathcal{X}$, and let $H(X)$ denote its entropy. For multilevel coding, we denote by $X_\ell$ a RV $X$ at level $\ell$. The $i$-th realization of $X_\ell$ is denoted by $x_\ell^i$. We also use the notation $x_\ell^{i:j}$ as a shorthand for a vector $(x_\ell^i, \ldots, x_\ell^j)$, which is a realization of the RVs $X_\ell^{i:j} = (X_\ell^i, \ldots, X_\ell^j)$. Similarly, $x_{\ell:r}^i$ denotes the realization of the $i$-th RV from level $\ell$ to level $r$, i.e., of $X_{\ell:r}^i = (X_\ell^i, \ldots, X_r^i)$. For a set $\mathcal{I}$, $|\mathcal{I}|$ represents its cardinality. Throughout this paper, we use the binary logarithm, and information is measured in bits.

II. BACKGROUND ON LATTICES
A. Definitions

A lattice is a discrete subgroup of $\mathbb{R}^n$ which can be described by $\Lambda = \{\lambda = Bz : z \in \mathbb{Z}^n\}$, where the columns of the generator matrix $B = [b_1, \ldots, b_n]$ are linearly independent. For a vector $x \in \mathbb{R}^n$, the nearest-neighbor quantizer associated with $\Lambda$ is $Q_\Lambda(x) = \arg\min_{\lambda \in \Lambda} \|\lambda - x\|$. We define the modulo lattice operation by $x \bmod \Lambda \triangleq x - Q_\Lambda(x)$. The Voronoi region of $\Lambda$, defined by $\mathcal{V}(\Lambda) = \{x : Q_\Lambda(x) = 0\}$, specifies the nearest-neighbor decoding region. The volume of a fundamental region equals that of the Voronoi region $\mathcal{V}(\Lambda)$, which is given by $V(\Lambda) = |\det(B)|$.

For $\sigma > 0$ and $c \in \mathbb{R}^n$, we define the Gaussian distribution of variance $\sigma^2$ centered at $c$ as
$$f_{\sigma,c}(x) = \frac{1}{(\sqrt{2\pi}\sigma)^n} e^{-\frac{\|x-c\|^2}{2\sigma^2}}, \quad x \in \mathbb{R}^n.$$
Let $f_{\sigma,0}(x) = f_\sigma(x)$ for short. The $\Lambda$-periodic function is defined as
$$f_{\sigma,\Lambda}(x) = \sum_{\lambda \in \Lambda} f_{\sigma,\lambda}(x) = \frac{1}{(\sqrt{2\pi}\sigma)^n} \sum_{\lambda \in \Lambda} e^{-\frac{\|x-\lambda\|^2}{2\sigma^2}}.$$
We note that $f_{\sigma,\Lambda}(x)$ is a probability density function (PDF) if $x$ is restricted to the fundamental region $\mathcal{R}(\Lambda)$. This distribution is actually the PDF of the $\Lambda$-aliased Gaussian noise, i.e., the Gaussian noise after the mod-$\Lambda$ operation [9].

B. Flatness Factor and Discrete Lattice Gaussian Distribution

The flatness factor of a lattice $\Lambda$ is defined as [10]
$$\epsilon_\Lambda(\sigma) \triangleq \max_{x \in \mathcal{R}(\Lambda)} |V(\Lambda) f_{\sigma,\Lambda}(x) - 1|.$$
We define the discrete Gaussian distribution over $\Lambda$ centered at $c$ as the discrete distribution taking values in $\lambda \in \Lambda$:
$$D_{\Lambda,\sigma,c}(\lambda) = \frac{f_{\sigma,c}(\lambda)}{f_{\sigma,c}(\Lambda)}, \quad \forall \lambda \in \Lambda, \qquad (2)$$
where $f_{\sigma,c}(\Lambda) = \sum_{\lambda \in \Lambda} f_{\sigma,c}(\lambda)$. For convenience, we write $D_{\Lambda,\sigma} = D_{\Lambda,\sigma,0}$. This distribution has been proved to achieve the optimal shaping gain when the flatness factor is negligible [11].

A sublattice $\Lambda' \subset \Lambda$ induces a partition (denoted by $\Lambda/\Lambda'$) of $\Lambda$ into equivalence classes modulo $\Lambda'$. The order of the partition is denoted by $|\Lambda/\Lambda'|$, which is equal to the number of cosets. If $|\Lambda/\Lambda'| = 2$, we call this a binary partition. Let $\Lambda(=\Lambda_0)/\Lambda_1/\cdots/\Lambda_{r-1}/\Lambda'(=\Lambda_r)$ for $r \geq 1$ be an $n$-dimensional lattice partition chain. If only one level is applied ($r = 1$), the construction is known as "Construction A"; if multiple levels are used, it is known as "Construction D" [2, p. 232]. For each partition $\Lambda_{\ell-1}/\Lambda_\ell$ ($1 \leq \ell \leq r$), a code $C_\ell$ over $\Lambda_{\ell-1}/\Lambda_\ell$ selects a sequence of coset representatives $a_\ell$ in a set $A_\ell$ of representatives for the cosets of $\Lambda_\ell$. This construction requires a set of nested linear binary codes $C_\ell$ with block length $N$ and $k_\ell$ information bits, represented as $[N, k_\ell]$ for $1 \leq \ell \leq r$, with $C_1 \subseteq C_2 \subseteq \cdots \subseteq C_r$. Let $\psi$ be the natural embedding of $\mathbb{F}_2^N$ into $\mathbb{Z}^N$, where $\mathbb{F}_2$ is the binary field. Let $b_1, b_2, \ldots, b_N$ be a basis of $\mathbb{F}_2^N$ such that $b_1, \ldots, b_{k_\ell}$ span $C_\ell$. When $n = 1$, the binary lattice $L$ consists of all vectors of the form
$$\sum_{\ell=1}^{r} 2^{\ell-1} \sum_{j=1}^{k_\ell} \alpha_j^{(\ell)} \psi(b_j) + 2^r z, \qquad (3)$$
where $\alpha_j^{(\ell)} \in \{0,1\}$ and $z \in \mathbb{Z}^N$.

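As a concrete illustration of (3), the following sketch (our own; the toy component codes are assumptions chosen only for illustration) assembles points of a two-level binary lattice built from nested codes of length $N = 4$:

```python
import itertools
import numpy as np

# Toy nested binary codes of length N = 4 (assumptions for illustration):
# C1 = [4,1] repetition code is a subcode of C2 = [4,3] single-parity-check code.
basis = np.array([[1, 1, 1, 1],   # b1 spans C1
                  [1, 1, 0, 0],   # b1..b3 span C2
                  [0, 1, 1, 0]])
k = [1, 3]  # k_1, k_2: b_1,...,b_{k_l} span C_l
r = 2

def lattice_point(alphas, z):
    """Evaluate Eq. (3): sum_l 2^{l-1} * (codeword of C_l) + 2^r * z."""
    x = (2 ** r) * np.asarray(z)
    for ell in range(1, r + 1):
        cw = np.zeros(4, dtype=int)
        for j in range(k[ell - 1]):
            cw ^= alphas[ell - 1][j] * basis[j]   # codeword over F_2
        x = x + (2 ** (ell - 1)) * cw             # lift to Z^N via psi and scale
    return x

# Enumerate a few lattice points (z fixed to 0 for brevity).
for a1 in itertools.product([0, 1], repeat=k[0]):
    for a2 in itertools.product([0, 1], repeat=k[1]):
        print(a1, a2, "->", lattice_point([list(a1), list(a2)], [0, 0, 0, 0]))
```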
When $\{C_1, \ldots, C_r\}$ is a series of nested polar codes, we obtain a polar lattice.

III. POLAR LATTICES FOR QUANTIZATION

We use $Y \sim N(0, \sigma_s^2)$ to denote a one-dimensional Gaussian source with zero mean and variance $\sigma_s^2$. Let $Y^{1:N}$ be $N$ independent copies of $Y$, and let a realization of $Y^{1:N}$ ($\bar{Y}$) be denoted by $y^{1:N}$ ($\bar{y}$). The PDF of $\bar{Y}$ is given by $f_{\bar{Y}}(\bar{y}) = f_{\sigma_s}(\bar{y})$. For an $N$-dimensional polar lattice $L$ and its associated quantizer $Q_L(\cdot)$, the average distortion $\Delta$ after quantization is given by
$$\Delta = \frac{1}{N} \int_{\mathbb{R}^N} \|\bar{y} - Q_L(\bar{y})\|^2 f_{\bar{Y}}(\bar{y}) \, d\bar{y}. \qquad (4)$$
The normalized second moment (NSM) of a quantization lattice $L$ is defined as
$$G(L) = \frac{\frac{1}{N} \int_{\mathcal{V}(L)} \|v\|^2 \, dv}{V(L)^{1+2/N}},$$
where the vector $v$ is uniformly distributed in $\mathcal{V}(L)$. An $N$-dimensional lattice $L$ is called quantization-good [12] if $\lim_{N \to \infty} G(L) = \frac{1}{2\pi e}$.

In [13], an entropy-coded dithered quantization (ECDQ) scheme based on quantization-good lattices was proposed to achieve the rate-distortion bound of the Gaussian source. This scheme requires a pre-shared dither, uniformly distributed in the Voronoi region of a quantization-good lattice, and an entropy encoder after lattice quantization. For our quantization scheme, we will show that the dither is not necessary and that the entropy encoder can be integrated into the lattice quantization process, which is convenient for practical applications.

Our task is to construct a polar lattice that achieves the rate-distortion bound of the Gaussian source with reconstruction distribution $D_{\Lambda, \sqrt{\sigma_s^2 - \Delta}}$. Following the notation of AWGN-good polar lattices, we use $X$ to denote the reconstruction alphabet. Firstly, we prove that the rate of the $D_{\Lambda, \sqrt{\sigma_s^2 - \Delta}}$-distributed reconstruction can be made arbitrarily close to $\frac{1}{2}\log(\frac{\sigma_s^2}{\Delta})$. Note that the following theorem is essentially the same as Theorem 2 in [11]; here we merely re-express it in the source coding formulation.

Theorem 1 ([11]): Consider a test channel where the reconstruction constellation $X$ has a discrete Gaussian distribution $D_{\Lambda-c,\sigma_r}$ for arbitrary $c \in \mathbb{R}^n$, and where $\sigma_r^2 = \sigma_s^2 - \Delta$ with $\Delta$ being the average distortion. Let $\tilde{\sigma}_\Delta \triangleq \frac{\sigma_r \sqrt{\Delta}}{\sigma_s}$. Then, if $\epsilon = \epsilon_\Lambda(\tilde{\sigma}_\Delta) < \frac{1}{2}$ and $\frac{\pi \epsilon_t}{1-\epsilon_t} < \epsilon$, where
$$\epsilon_t \triangleq \begin{cases} \epsilon_\Lambda\!\left(\sigma_r \big/ \sqrt{\tfrac{\pi}{\pi - t}}\right), & t \geq 1/e, \\[4pt] (t^{-4} + 1)\,\epsilon_\Lambda\!\left(\sigma_r \big/ \sqrt{\tfrac{\pi}{\pi - t}}\right), & 0 < t < 1/e, \end{cases} \qquad (5)$$
the discrete Gaussian constellation results in mutual information $I_\Delta \geq \frac{1}{2}\log(\frac{\sigma_s^2}{\Delta}) - \frac{5\epsilon}{n}$ per channel use.

The statement of Theorem 1 is non-asymptotic, i.e., it can hold even for $n = 1$. Therefore, it is possible to construct a good polar lattice over a one-dimensional lattice partition chain such as $\mathbb{Z}/2\mathbb{Z}/4\mathbb{Z}\cdots$. The flatness factor $\epsilon$ can be made negligible by scaling this binary partition chain; this technique has already been used for the construction of AWGN-good polar lattices. Note that when the test channel is chosen to be an AWGN channel with noise variance $\Delta$ and the reconstruction alphabet is discrete Gaussian distributed, the source distribution is not exactly a continuous Gaussian. In fact, it is the distribution obtained by adding a continuous Gaussian of variance $\Delta$ to a discrete Gaussian $D_{\Lambda-c,\sigma_r}$, which is expressed as the convolution
$$f_{Y'}(y') = \frac{1}{f_{\sigma_r}(\Lambda - c)} \sum_{t \in \Lambda - c} f_{\sigma_r}(t) f_\sigma(y' - t), \quad y' \in \mathbb{R}^n, \qquad (6)$$
where $\sigma = \sqrt{\Delta}$ and $Y'$ denotes the new source. For simplicity, in this work we only consider a one-dimensional binary partition chain ($n = 1$), so $Y'$ is also a one-dimensional source. Therefore, we are actually quantizing the source $Y'$ instead of $Y$ using the discrete Gaussian distribution. However, when the flatness factor $\epsilon_\Lambda(\tilde{\sigma}_\Delta)$ is small, a good polar quantizer for the source $Y'$ is also good for the source $Y$, by the following lemma. The relationship between the quantization of the sources $Y'$ and $Y$ is shown in Fig. 1.

Fig. 1. The relationship between the quantization of source $Y'$ and $Y$: $X \sim D_{\Lambda,\sigma_r}$ plus the test-channel noise $N(0,\Delta)$ yields $Y' \sim f_{Y'}$, which is similarly distributed to $Y \sim N(0,\sigma_s^2)$.

Lemma 1 ([11]): If $\epsilon = \epsilon_\Lambda(\tilde{\sigma}_\Delta) < \frac{1}{2}$, the variational distance between the density $f_{Y'}$ of the source $Y'$ defined in (6) and the Gaussian density $f_Y$ satisfies $\mathbb{V}(f_{Y'}, f_Y) \leq 4\epsilon$.
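A numerical sanity check of Lemma 1 (our sketch; the lattice scaling, truncation window, and parameter values are assumptions): it builds the mixture density (6) over a truncated scaled copy of $\mathbb{Z}$, estimates the flatness factor $\epsilon_\Lambda(\tilde{\sigma}_\Delta)$, and integrates $|f_{Y'} - f_Y|$ numerically.

```python
import numpy as np

sigma_s, delta = 1.0, 0.25                    # assumed source variance and distortion
sigma_r = np.sqrt(sigma_s**2 - delta)
sigma_t = sigma_r * np.sqrt(delta) / sigma_s  # tilde{sigma}_Delta

def gauss(x, sig):
    return np.exp(-x**2 / (2 * sig**2)) / (np.sqrt(2 * np.pi) * sig)

a = 1.0                             # coarse scaling so the flatness factor is visible;
lam = a * np.arange(-200, 201)      # shrinking a drives both quantities toward 0

# Flatness factor of a*Z at sigma_t: max over a fundamental region of |V * f_{sigma,Lambda} - 1|.
x = np.linspace(0, a, 512)
f_per = np.array([gauss(xi - lam, sigma_t).sum() for xi in x])
eps = np.max(np.abs(a * f_per - 1))

# Mixture density (6): discrete Gaussian D_{Lambda,sigma_r} convolved with N(0, delta).
w = gauss(lam, sigma_r); w /= w.sum()
y = np.linspace(-6, 6, 4001)
f_yp = np.array([np.dot(w, gauss(yi - lam, np.sqrt(delta))) for yi in y])
vd = np.sum(np.abs(f_yp - gauss(y, sigma_s))) * (y[1] - y[0])  # variational distance

print(f"flatness factor ~ {eps:.3e},  V(f_Y', f_Y) ~ {vd:.3e},  4*eps = {4*eps:.3e}")
```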

Now we consider the construction of polar lattices. Firstly, we consider the quantization of the source $Y'$ using the reconstruction distribution $D_{\Lambda,\sigma_r}$. Since a binary partition chain is used, $X$ can be represented by a binary string $X_{1:r}$, and we have $\lim_{r \to \infty} P_{X_{1:r}} = P_X = D_{\Lambda,\sigma_r}$. Because polar lattices are built by Construction D, we are interested in the test channel at each level. Similarly to the setting of shaping for AWGN-good polar lattices [8], given the previous bits $x_{1:\ell-1}$, the channel transition PDF at level $\ell$ is
$$P_{Y'|X_\ell,X_{1:\ell-1}}(y'|x_\ell, x_{1:\ell-1}) = \frac{\sum_{a \in A_\ell(x_{1:\ell})} P(a) P_{Y'|A}(y'|a)}{P\{A_\ell(x_{1:\ell})\}} = \frac{1}{f_{\sigma_r}(A_\ell(x_{1:\ell}))} \frac{1}{2\pi \sigma_r \sqrt{\Delta}} \exp\left(-\frac{y'^2}{2(\sigma_r^2 + \Delta)}\right) \sum_{a \in A_\ell(x_{1:\ell})} \exp\left(-\frac{|\alpha y' - a|^2}{2\tilde{\sigma}_\Delta^2}\right),$$
where $\alpha = \frac{\sigma_r^2}{\sigma_r^2 + \Delta}$ is the MMSE coefficient and $\tilde{\sigma}_\Delta = \frac{\sigma_r \sqrt{\Delta}}{\sigma_s}$.

Consequently, using $D_{\Lambda,\sigma_r}$ as the constellation, the $\ell$-th channel is generally asymmetric, with the input distribution $P_{X_\ell|X_{1:\ell-1}}$ ($\ell \leq r$), which can be calculated according to the definition of $D_{\Lambda,\sigma_r}$. The lattice quantization can be viewed as lossy compression on all the binary test channels from level 1 to $r$. Here we start with the first level. Let $y'^{1:N}$ denote a realization of $N$ i.i.d. copies of the source $Y'$. Although $Y'$ is a continuous source with density given by (6) and $y'$ is drawn from $\mathbb{R}$, to keep the polar coding notation consistent (the definition of the Bhattacharyya parameter is given by a summation), from now on we express the distortion measure as well as the variational distance as summations instead of integrals. These changes do not affect our results. In fact, the Bhattacharyya parameter can also be defined for channels with infinite output alphabets, and it can be calculated efficiently to arbitrary precision (see [14]).

Since the test channel at each level is not necessarily symmetric and the reconstruction constellation is not uniformly distributed, we have to consider lossy compression for a nonuniform source and an asymmetric distortion measure. The solution to this problem was introduced in [15], and it turns out to be similar to the construction of polar codes for asymmetric channels. For the first level, letting $U_1^{1:N} = X_1^{1:N} G_N$, where $G_N$ is the $N \times N$ generator matrix of polar codes, we define the information set $\mathcal{I}_1$, frozen set $\mathcal{F}_1$ and shaping set $\mathcal{S}_1$ as follows:
$$\begin{cases} \mathcal{F}_1 = \{i \in [N] : Z(U_1^i | U_1^{1:i-1}, Y'^{1:N}) \geq 1 - 2^{-N^\beta}\}, \\ \mathcal{I}_1 = \{i \in [N] : Z(U_1^i | U_1^{1:i-1}) > 2^{-N^\beta} \text{ and } Z(U_1^i | U_1^{1:i-1}, Y'^{1:N}) < 1 - 2^{-N^\beta}\}, \\ \mathcal{S}_1 = \{i \in [N] : Z(U_1^i | U_1^{1:i-1}) \leq 2^{-N^\beta}\}. \end{cases} \qquad (7)$$
Note that this definition is similar to that in [8, Equation (21)]; the only difference is that $\mathcal{I}_1$ is designed to be slightly larger, to guarantee a desired distortion level.
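The partition (7) is straightforward to express in code. The sketch below is ours; the Bhattacharyya values are random placeholders (in a real construction they come from density-evolution-style methods as in [14], and conditioning on $Y'^{1:N}$ can only decrease them, which makes the three sets disjoint):

```python
import numpy as np

rng = np.random.default_rng(0)
N, beta = 1024, 0.4
thr = 2.0 ** (-(N ** beta))      # threshold 2^{-N^beta}

# Placeholder parameters, polarized toward {0,1} to mimic real behavior:
# z_chan approximates Z(U^i | U^{1:i-1}, Y'^{1:N}); z_src approximates Z(U^i | U^{1:i-1}).
z_chan = rng.beta(0.05, 0.05, N)
z_src = np.maximum(z_chan, rng.beta(0.05, 0.05, N))  # conditioning cannot increase Z

F = [i for i in range(N) if z_chan[i] >= 1 - thr]                    # frozen set
S = [i for i in range(N) if z_src[i] <= thr]                         # shaping set
I = [i for i in range(N) if z_src[i] > thr and z_chan[i] < 1 - thr]  # information set

print(len(F), len(I), len(S))  # the rate of this level is |I| / N
```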

The asymmetric Bhattacharyya parameters (defined in [8]) $Z(U_1^i|U_1^{1:i-1}, Y'^{1:N})$ and $Z(U_1^i|U_1^{1:i-1})$ can be efficiently calculated from the symmetric Bhattacharyya parameters [16] $\tilde{Z}(\tilde{U}_1^i|\tilde{U}_1^{1:i-1}, X_1^{1:N} \oplus \tilde{X}_1^{1:N}, Y'^{1:N})$ and $\tilde{Z}(\tilde{U}_1^i|\tilde{U}_1^{1:i-1}, X_1^{1:N} \oplus \tilde{X}_1^{1:N})$, respectively (see [8] for more details). According to [8, Theorem 5], the proportion of the set $\mathcal{I}_1$ approaches $I(X_1; Y')$ as $N \to \infty$.

After obtaining $\mathcal{F}_1$, $\mathcal{I}_1$ and $\mathcal{S}_1$, for a source sequence $y'^{1:N}$, the encoder determines $u_1^{1:N}$ according to the following rules:
$$u_1^i = \begin{cases} 0 & \text{w.p. } P_{U_1^i|U_1^{1:i-1},Y'^{1:N}}(0|u_1^{1:i-1}, y'^{1:N}), \\ 1 & \text{w.p. } P_{U_1^i|U_1^{1:i-1},Y'^{1:N}}(1|u_1^{1:i-1}, y'^{1:N}), \end{cases} \quad \text{if } i \in \mathcal{I}_1, \qquad (8)$$
and
$$u_1^i = \begin{cases} \bar{u}_1^i & \text{if } i \in \mathcal{F}_1, \\ \arg\max_u P_{U_1^i|U_1^{1:i-1}}(u|u_1^{1:i-1}) & \text{if } i \in \mathcal{S}_1. \end{cases} \qquad (9)$$
Here $\bar{u}_1^i$ is a uniformly random bit determined before the lossy compression. The output of the encoder at level 1 is $u_1^{\mathcal{I}_1} = \{u_1^i, i \in \mathcal{I}_1\}$. To reconstruct $x_1^{1:N}$, the decoder uses the shared $u_1^{\mathcal{F}_1}$ and the received $u_1^{\mathcal{I}_1}$ to recover $u_1^{\mathcal{S}_1}$ according to $\arg\max_u P_{U_1^i|U_1^{1:i-1}}(u|u_1^{1:i-1})$, and then sets $x_1^{1:N} = u_1^{1:N} G_N$. The probabilities $P_{U_1^i|U_1^{1:i-1}}$ and $P_{U_1^i|U_1^{1:i-1},Y'^{1:N}}$ can both be calculated efficiently by the successive cancellation algorithm, with complexity $O(N \log N)$ [8].

Theorem 2: Let $Q_{U_1^{1:N},Y'^{1:N}}(u_1^{1:N}, y'^{1:N})$ denote the joint distribution of $U_1^{1:N}$ and $Y'^{1:N}$ under the encoding rules (8) and (9). Consider another encoder using the encoding rule (8) for all $i \in [N]$, and let $P_{U_1^{1:N},Y'^{1:N}}(u_1^{1:N}, y'^{1:N})$ denote the resulting joint distribution of $U_1^{1:N}$ and $Y'^{1:N}$. For any $\beta' < \beta < 1/2$ satisfying (7) and $R_1 > I(X_1; Y')$,
$$\mathbb{V}(P_{U_1^{1:N},Y'^{1:N}}, Q_{U_1^{1:N},Y'^{1:N}}) = O(2^{-N^{\beta'}}). \qquad (10)$$
The same statement was given in [15], but without proof; for completeness, we prove the theorem in Appendix A.
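The randomized rounding rule (8) together with the MAP rule (9) can be sketched as follows (our illustration; `cond_prob` and `prior_prob` stand in for the successive cancellation computations of $P_{U_1^i|U_1^{1:i-1},Y'^{1:N}}$ and $P_{U_1^i|U_1^{1:i-1}}$, which we do not implement here):

```python
import numpy as np

rng = np.random.default_rng(1)

def encode_level(y, F, I, S, u_bar, cond_prob, prior_prob):
    """One level of the quantization encoder, following rules (8) and (9).

    F, I, S partition range(N).
    cond_prob(i, u_prefix, y) -> P(U^i = 0 | u^{1:i-1}, y'^{1:N})  (assumed given)
    prior_prob(i, u_prefix)   -> P(U^i = 0 | u^{1:i-1})            (assumed given)
    """
    N = len(y)
    u = np.zeros(N, dtype=int)
    for i in range(N):
        if i in I:                   # rule (8): randomized rounding
            p0 = cond_prob(i, u[:i], y)
            u[i] = 0 if rng.random() < p0 else 1
        elif i in F:                 # rule (9): pre-shared uniformly random bit
            u[i] = u_bar[i]
        else:                        # i in S, rule (9): MAP (shaping) decision
            u[i] = 0 if prior_prob(i, u[:i]) >= 0.5 else 1
    return u                         # only u[i], i in I, is sent to the decoder
```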

Now we introduce the construction for higher levels. Taking the second level as an example: to reproduce the reconstruction constellation distribution, the input distribution at level 2 should be $P_{X_2|X_1}$. Based on the quantization result $(U_1^{1:N}, Y'^{1:N})$ given by the encoder at level 1, $U_2^i$ (with $U_2^{1:N} = X_2^{1:N} G_N$) is almost deterministic given some $(U_2^{1:i-1}, U_1^{1:N})$. Since there is a one-to-one mapping between $X_1^{1:N}$ and $U_1^{1:N}$, conditioning on $(U_2^{1:i-1}, U_1^{1:N})$ is the same as conditioning on $(U_2^{1:i-1}, X_1^{1:N})$. We define the information set $\mathcal{I}_2$, frozen set $\mathcal{F}_2$ and shaping set $\mathcal{S}_2$ as follows:
$$\begin{cases} \mathcal{F}_2 = \{i \in [N] : Z(U_2^i | U_2^{1:i-1}, X_1^{1:N}, Y'^{1:N}) \geq 1 - 2^{-N^\beta}\}, \\ \mathcal{I}_2 = \{i \in [N] : Z(U_2^i | U_2^{1:i-1}, X_1^{1:N}) > 2^{-N^\beta} \text{ and } Z(U_2^i | U_2^{1:i-1}, X_1^{1:N}, Y'^{1:N}) < 1 - 2^{-N^\beta}\}, \\ \mathcal{S}_2 = \{i \in [N] : Z(U_2^i | U_2^{1:i-1}, X_1^{1:N}) \leq 2^{-N^\beta}\}. \end{cases} \qquad (11)$$
The proportion of $\mathcal{I}_2$ approaches $I(X_2; Y'|X_1)$ when $N$ is sufficiently large. For a given source sequence pair $(u_1^{1:N}, y'^{1:N})$ or $(x_1^{1:N}, y'^{1:N})$, the encoder at level 2 determines $u_2^{1:N}$ according to the following rules:
$$u_2^i = \begin{cases} 0 & \text{w.p. } P_{U_2^i|U_2^{1:i-1},X_1^{1:N},Y'^{1:N}}(0|u_2^{1:i-1}, x_1^{1:N}, y'^{1:N}), \\ 1 & \text{w.p. } P_{U_2^i|U_2^{1:i-1},X_1^{1:N},Y'^{1:N}}(1|u_2^{1:i-1}, x_1^{1:N}, y'^{1:N}), \end{cases} \quad \text{for } i \in \mathcal{I}_2, \qquad (12)$$
and
$$u_2^i = \begin{cases} \bar{u}_2^i & \text{if } i \in \mathcal{F}_2, \\ \arg\max_u P_{U_2^i|U_2^{1:i-1},X_1^{1:N}}(u|u_2^{1:i-1}, x_1^{1:N}) & \text{if } i \in \mathcal{S}_2. \end{cases} \qquad (13)$$
We further extend Theorem 2 to the second level.

Theorem 3: Let $Q_{U_2^{1:N},U_1^{1:N},Y'^{1:N}}(u_2^{1:N}, u_1^{1:N}, y'^{1:N})$ denote the joint distribution of $U_2^{1:N}$ and $(U_1^{1:N}, Y'^{1:N})$ under the encoding rules (12) and (13). Consider another encoder using the encoding rule (12) for all $i \in [N]$, and let $P_{U_2^{1:N},U_1^{1:N},Y'^{1:N}}(u_2^{1:N}, u_1^{1:N}, y'^{1:N})$ denote the resulting joint distribution. For any $\beta' < \beta < 1/2$ satisfying (11) and $R_2 > I(X_2; Y'|X_1)$,
$$\mathbb{V}(P_{U_2^{1:N},U_1^{1:N},Y'^{1:N}}, Q_{U_2^{1:N},U_1^{1:N},Y'^{1:N}}) = O(2^{-N^{\beta'}}). \qquad (14)$$
Note that Theorem 3 is based on the assumption that $\mathbb{V}(P_{U_1^{1:N},Y'^{1:N}}, Q_{U_1^{1:N},Y'^{1:N}}) = O(2^{-N^{\beta'}})$, which means we also need $R_1 > I(X_1; Y')$. Therefore, we have $\sum_{i=1}^{2} R_i > I(X_1 X_2; Y')$.

By induction, for level $\ell$ ($\ell \leq r$), we define the three sets $\mathcal{F}_\ell$, $\mathcal{I}_\ell$ and $\mathcal{S}_\ell$ in the same form as (11), with $X_{1:\ell-1}^{1:N}$ replacing $X_1^{1:N}$ and $U_\ell$ replacing $U_2$. Similarly, the encoder determines $u_\ell^{1:N}$ (with $u_\ell^{1:N} G_N = x_\ell^{1:N}$) according to the rules (12) and (13), with $X_{1:\ell-1}^{1:N}$ and $x_{1:\ell-1}^{1:N}$ replacing $X_1^{1:N}$ and $x_1^{1:N}$, respectively. Let $Q_{U_{1:\ell}^{1:N},Y'^{1:N}}(u_{1:\ell}^{1:N}, y'^{1:N})$ denote the joint distribution resulting from this encoder, and let $P_{U_{1:\ell}^{1:N},Y'^{1:N}}(u_{1:\ell}^{1:N}, y'^{1:N})$ denote the one resulting from an encoder that only uses (12) for all $i \in [N]$. We have $\mathbb{V}(P_{U_{1:\ell}^{1:N},Y'^{1:N}}, Q_{U_{1:\ell}^{1:N},Y'^{1:N}}) = O(\ell \cdot 2^{-N^{\beta'}})$ for any rate $R_\ell > I(X_\ell; Y'|X_{1:\ell-1})$. Specifically, at level $r$, for any rate $R_r > I(X_r; Y'|X_{1:r-1})$ and $\sum_{i=1}^{r} R_i > I(X_{1:r}; Y')$, we have
$$\mathbb{V}(P_{U_{1:r}^{1:N},Y'^{1:N}}, Q_{U_{1:r}^{1:N},Y'^{1:N}}) = O(r \cdot 2^{-N^{\beta'}}). \qquad (15)$$
By [8, Lemma 5], $I(X_{1:r}; Y')$ is arbitrarily close to $I(X; Y')$ when $N$ is sufficiently large and $r = O(\log\log N)$, which gives us $\mathbb{V}(P_{U_{1:r}^{1:N},Y'^{1:N}}, Q_{U_{1:r}^{1:N},Y'^{1:N}}) = O(2^{-N^{\beta'}})$.

Now we present the main theorem of this paper.

Theorem 4: Given a Gaussian source $Y$ with variance $\sigma_s^2$ and an average distortion $\Delta \leq \sigma_s^2$, for any rate $R > \frac{1}{2}\log(\frac{\sigma_s^2}{\Delta})$, there exists a multilevel polar code with rate $R$ such that the distortion is arbitrarily close to $\Delta$ when $N \to \infty$ and $r = O(\log\log N)$. This multilevel polar code is actually a shifted polar lattice $L + c$ constructed from the lattice partition $\Lambda/\Lambda'$, shaped according to the discrete Gaussian distribution $D_{\Lambda,\sigma_r}$, where $\sigma_r = \sqrt{\sigma_s^2 - \Delta}$ and the partition chain is scaled so that $\epsilon_\Lambda(\frac{\sigma_r\sqrt{\Delta}}{\sigma_s}) \to 0$. (The proof is given in Appendix C.)

IV. WYNER-ZIV AND GELFAND-PINSKER CODING

According to the prior work of Zamir, Shamai and Erez [17], the Wyner-Ziv and Gelfand-Pinsker problems can be solved by nested quantization-good and AWGN-good lattices. However, due to the lack of an explicit construction of such good lattices, no explicit solution was given. In this section, we solve these problems by combining the recently proposed AWGN capacity-achieving polar lattices [8] with the rate-distortion bound achieving ones. An advantage of this new structure is that the two kinds of lattices can be nested within each other and shaped simultaneously. To save space, we only present the basic design principle in this section; more details and rigorous proofs will be given in the upcoming journal version of this work.

For the Wyner-Ziv problem, let $X, Y$ be jointly Gaussian sources with $X = Y + Z$, where $Z$ is a Gaussian noise independent of $Y$ with variance $\sigma_z^2$. Given $Y$ as the side information, the Wyner-Ziv rate-distortion bound for the source $X$ is given by
$$R_{WZ}(\Delta) = \max\left\{\frac{1}{2}\log\frac{\sigma_z^2}{\Delta},\, 0\right\}.$$
To achieve this bound, we first design a quantization polar lattice $L_1$ for the source $X$ with Gaussian reconstruction alphabet $X'$. Letting $\eta = \frac{\sigma_z^2}{\sigma_z^2 - \Delta}$, the distortion between $X'$ and $X$ is targeted to be $\eta\Delta$, i.e., $X' = X + N(0, \eta\Delta)$. It follows that $X' = Y + N(0, \eta\sigma_z^2)$, which corresponds to an AWGN capacity-achieving polar lattice $L_2$ for the channel from $Y$ to $X'$. The final reconstruction of $X$ is given by $\hat{X} = Y + \frac{1}{\eta}(X' - Y)$. One can check that $X = \hat{X} + N(0, \Delta)$ and $I(X'; X) - I(X'; Y) = \frac{1}{2}\log(\frac{\sigma_z^2}{\Delta})$. Note that $X'$ is continuous Gaussian distributed, and we can replace it with the discrete lattice Gaussian $D_{\Lambda,\sigma_{x'}}$. When the flatness factor is negligible, $R_{WZ}(\Delta)$ can be made arbitrarily close to $\frac{1}{2}\log(\frac{\sigma_z^2}{\Delta})$. Since $\Delta \leq \sigma_z^2$, it is not difficult to observe that $L_2$ is nested within $L_1$.
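A quick Monte Carlo check of the Wyner-Ziv chain above (our sketch; it idealizes the quantizer $L_1$ as an exact test channel $X' = X + N(0, \eta\Delta)$ with independent noise, and the variances are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
var_y, var_z, delta = 4.0, 1.0, 0.25   # assumed variances and target distortion

eta = var_z / (var_z - delta)
y = rng.normal(0, np.sqrt(var_y), n)
x = y + rng.normal(0, np.sqrt(var_z), n)           # source X = Y + Z
x_p = x + rng.normal(0, np.sqrt(eta * delta), n)   # idealized quantizer: X' = X + N(0, eta*delta)
x_hat = y + (x_p - y) / eta                        # final reconstruction

print(np.mean((x - x_hat) ** 2))       # ~ delta, as claimed
print(0.5 * np.log2(var_z / delta))    # the Wyner-Ziv rate R_WZ(delta)
```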

For the Gelfand-Pinsker problem, with some abuse of notation, consider the channel $Y = X + S + Z$, where $X$ and $Y$ are the channel input and output, respectively, $Z$ is an unknown additive Gaussian noise with variance $\sigma_z^2$, and $S$ is a Gaussian interference signal with variance $\sigma_i^2$ known only to the encoder. Given the input power constraint $\frac{1}{N} E[\|X\|^2] \leq P$, the capacity with side information $S$ at the transmitter is
$$C_{GP} = \frac{1}{2}\log\left(1 + \frac{P}{\sigma_z^2}\right).$$
To achieve this capacity, the roles of $L_1$ and $L_2$ should be reversed. Letting $\gamma = \frac{P}{P + \sigma_z^2}$, we first design $L_1$ for the source $\gamma S$ with Gaussian reconstruction alphabet $S'$. The distortion between $S'$ and $\gamma S$ is targeted to be $P$, i.e., $S' = \gamma S + N(0, P)$. The encoder then transmits $X = S' - \gamma S$, which satisfies the power constraint. Moreover, the relationship between $Y$ and $S'$ is given by $S' = \gamma Y + (1-\gamma)X - \gamma Z$. According to the analysis in [17], the noise term $(1-\gamma)X - \gamma Z$ is independent of $Y$, which leaves us with $S' = \gamma Y + N(0, \frac{P\sigma_z^2}{P + \sigma_z^2})$. $L_2$ is designed to recover $S'$ from $\gamma Y$. Without the power constraint, the maximum data rate that can be sent is $I(S'; \gamma Y)$. However, when the power constraint is taken into consideration, $I(S'; \gamma S)$ bits should be selected according to the realization of $S$, which gives us the actual data rate $I(S'; \gamma Y) - I(S'; \gamma S) = \frac{1}{2}\log(1 + \frac{P}{\sigma_z^2})$. Similarly, we can use the discrete lattice Gaussian distributed version of $S'$ to approach this capacity. It is worth mentioning that, in this case, $L_1$ is not exactly nested within $L_2$ because of some unpolarized indices, as already observed in [18]. Fortunately, the proportion of these indices is negligible, and we can use the two-phase transmission method of [18] to fix this issue.
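The orthogonality claim behind the Gelfand-Pinsker construction can also be checked numerically (our sketch; the quantizer $L_1$ is again idealized as $S' = \gamma S + N(0, P)$ with independent noise, and the parameter values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
P, var_z, var_i = 1.0, 0.5, 2.0   # assumed power, noise and interference variances

gamma = P / (P + var_z)
s = rng.normal(0, np.sqrt(var_i), n)
s_p = gamma * s + rng.normal(0, np.sqrt(P), n)  # idealized quantizer output S'
x = s_p - gamma * s                              # transmitted signal, E[x^2] ~ P
z = rng.normal(0, np.sqrt(var_z), n)
y = x + s + z                                    # channel output

noise = (1 - gamma) * x - gamma * z              # so that S' = gamma*Y + noise
print(np.mean(x ** 2))                           # ~ P (power constraint met)
print(np.corrcoef(noise, y)[0, 1])               # ~ 0 (noise uncorrelated with Y)
print(np.var(noise), P * var_z / (P + var_z))    # matching noise variances
```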

V. CONCLUDING REMARKS

In this work, we presented an explicit construction of polar lattices that are good for lossy compression. They were further utilized to solve the Gaussian versions of the Wyner-Ziv and Gelfand-Pinsker problems. Compared with the original idea in [17], no dither is necessary in our scheme, thanks to the properties of the discrete lattice Gaussian distribution [11], and the entropy encoder is integrated into the lattice quantization process.

REFERENCES

[1] R. Gray, "Vector quantization," IEEE ASSP Mag., vol. 1, no. 2, pp. 4-29, Apr. 1984.
[2] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices, and Groups. New York: Springer, 1993.
[3] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[4] W. Finamore and W. Pearlman, "Optimal encoding of discrete-time continuous-amplitude memoryless sources with finite output alphabets," IEEE Trans. Inf. Theory, vol. 26, no. 2, pp. 144-155, Mar. 1980.
[5] M. Marcellin and T. Fischer, "Trellis coded quantization of memoryless and Gauss-Markov sources," IEEE Trans. Commun., vol. 38, no. 1, pp. 82-93, Jan. 1990.
[6] B. Kudryashov and K. Yurkov, "Near-optimum low-complexity lattice quantization," in Proc. 2010 IEEE Int. Symp. Inform. Theory, Austin, TX, USA, Jun. 2010, pp. 1032-1036.
[7] S. Vatedka and N. Kashyap, "Some 'goodness' properties of LDA lattices," Oct. 2014. [Online]. Available: http://arxiv.org/abs/1410.7619
[8] Y. Yan, L. Liu, C. Ling, and X. Wu, "Construction of capacity-achieving lattice codes: Polar lattices," Nov. 2014. [Online]. Available: http://arxiv.org/abs/1411.0187
[9] G. D. Forney Jr., M. Trott, and S.-Y. Chung, "Sphere-bound-achieving coset codes and multilevel coset codes," IEEE Trans. Inf. Theory, vol. 46, no. 3, pp. 820-850, May 2000.
[10] C. Ling, L. Luzzi, J.-C. Belfiore, and D. Stehlé, "Semantically secure lattice codes for the Gaussian wiretap channel," IEEE Trans. Inf. Theory, vol. 60, no. 10, pp. 6399-6416, Oct. 2014.
[11] C. Ling and J.-C. Belfiore, "Achieving AWGN channel capacity with lattice Gaussian coding," IEEE Trans. Inf. Theory, vol. 60, no. 10, pp. 5918-5929, Oct. 2014.
[12] R. Zamir and M. Feder, "On lattice quantization noise," IEEE Trans. Inf. Theory, vol. 42, no. 4, pp. 1152-1159, Jul. 1996.
[13] R. Zamir, Lattice Coding for Signals and Networks: A Structured Coding Approach to Quantization, Modulation, and Multiuser Information Theory. Cambridge, UK: Cambridge University Press, 2014.
[14] I. Tal and A. Vardy, "How to construct polar codes," IEEE Trans. Inf. Theory, vol. 59, no. 10, pp. 6562-6582, Oct. 2013.
[15] J. Honda and H. Yamamoto, "Polar coding without alphabet extension for asymmetric models," IEEE Trans. Inf. Theory, vol. 59, no. 12, pp. 7829-7838, Dec. 2013.
[16] E. Arıkan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051-3073, Jul. 2009.
[17] R. Zamir, S. Shamai, and U. Erez, "Nested linear/lattice codes for structured multiterminal binning," IEEE Trans. Inf. Theory, vol. 48, no. 6, pp. 1250-1276, Jun. 2002.
[18] S. B. Korada, "Polar codes for channel and source coding," Ph.D. dissertation, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, 2009.

APPENDIX A
PROOF OF THEOREM 2

Proof: Firstly, we change the encoding rule for $u_1^i$ with $i \in \mathcal{S}_1$; (9) is modified to
$$u_1^i = \begin{cases} \bar{u}_1^i & \text{if } i \in \mathcal{F}_1, \\ 0 \text{ w.p. } P_{U_1^i|U_1^{1:i-1}}(0|u_1^{1:i-1}),\ 1 \text{ w.p. } P_{U_1^i|U_1^{1:i-1}}(1|u_1^{1:i-1}) & \text{if } i \in \mathcal{S}_1. \end{cases} \qquad (16)$$
Let $Q'_{U_1^{1:N},Y'^{1:N}}(u_1^{1:N}, y'^{1:N})$ denote the joint distribution of $U_1^{1:N}$ and $Y'^{1:N}$ under the encoding rules (8) and (16). Then the variational distance between $P_{U_1^{1:N},Y'^{1:N}}$ and $Q'_{U_1^{1:N},Y'^{1:N}}$ can be bounded as
$$\begin{aligned}
2\mathbb{V}(P_{U_1^{1:N},Y'^{1:N}}, Q'_{U_1^{1:N},Y'^{1:N}}) &= \sum_{u_1^{1:N},y'^{1:N}} \big|Q'(u_1^{1:N}, y'^{1:N}) - P(u_1^{1:N}, y'^{1:N})\big| \\
&\overset{(a)}{=} \sum_{u_1^{1:N},y'^{1:N}} \Big| \sum_i \big(Q'(u_1^i|u_1^{1:i-1}, y'^{1:N}) - P(u_1^i|u_1^{1:i-1}, y'^{1:N})\big) \Big(\prod_{j=1}^{i-1} P(u_1^j|u_1^{1:j-1}, y'^{1:N})\Big) \Big(\prod_{j=i+1}^{N} Q'(u_1^j|u_1^{1:j-1}, y'^{1:N})\Big) P(y'^{1:N}) \Big| \\
&\overset{(b)}{\leq} \sum_{i \in \mathcal{F}_1 \cup \mathcal{S}_1} \sum_{u_1^{1:N},y'^{1:N}} \big|Q'(u_1^i|u_1^{1:i-1}, y'^{1:N}) - P(u_1^i|u_1^{1:i-1}, y'^{1:N})\big| \Big(\prod_{j=1}^{i-1} P\Big) \Big(\prod_{j=i+1}^{N} Q'\Big) P(y'^{1:N}) \\
&= \sum_{i \in \mathcal{F}_1 \cup \mathcal{S}_1} \sum_{u_1^{1:i},y'^{1:N}} \big|Q'(u_1^i|u_1^{1:i-1}, y'^{1:N}) - P(u_1^i|u_1^{1:i-1}, y'^{1:N})\big|\, P(u_1^{1:i-1}, y'^{1:N}) \\
&= \sum_{i \in \mathcal{F}_1 \cup \mathcal{S}_1} \sum_{u_1^{1:i-1},y'^{1:N}} 2 P(u_1^{1:i-1}, y'^{1:N})\, \mathbb{V}\big(Q'_{U_1^i|U_1^{1:i-1}=u_1^{1:i-1},Y'^{1:N}=y'^{1:N}}, P_{U_1^i|U_1^{1:i-1}=u_1^{1:i-1},Y'^{1:N}=y'^{1:N}}\big) \\
&\overset{(c)}{\leq} \sum_{i \in \mathcal{F}_1 \cup \mathcal{S}_1} \sum_{u_1^{1:i-1},y'^{1:N}} P(u_1^{1:i-1}, y'^{1:N}) \sqrt{2\ln 2\, D\big(P_{U_1^i|U_1^{1:i-1}=u_1^{1:i-1},Y'^{1:N}=y'^{1:N}} \,\|\, Q'_{U_1^i|U_1^{1:i-1}=u_1^{1:i-1},Y'^{1:N}=y'^{1:N}}\big)} \\
&\overset{(d)}{\leq} \sum_{i \in \mathcal{F}_1 \cup \mathcal{S}_1} \sqrt{2\ln 2 \sum_{u_1^{1:i-1},y'^{1:N}} P(u_1^{1:i-1}, y'^{1:N})\, D\big(P_{U_1^i|\cdot} \,\|\, Q'_{U_1^i|\cdot}\big)} = \sum_{i \in \mathcal{F}_1 \cup \mathcal{S}_1} \sqrt{2\ln 2\, D\big(P_{U_1^i} \,\|\, Q'_{U_1^i} \,\big|\, U_1^{1:i-1}, Y'^{1:N}\big)} \\
&\overset{(e)}{=} \sum_{i \in \mathcal{F}_1} \sqrt{2\ln 2\, \big(1 - H(U_1^i|U_1^{1:i-1}, Y'^{1:N})\big)} + \sum_{i \in \mathcal{S}_1} \sqrt{2\ln 2\, \big(H(U_1^i|U_1^{1:i-1}) - H(U_1^i|U_1^{1:i-1}, Y'^{1:N})\big)} \\
&\overset{(f)}{\leq} \sum_{i \in \mathcal{F}_1} \sqrt{2\ln 2\, \big(1 - Z(U_1^i|U_1^{1:i-1}, Y'^{1:N})^2\big)} + \sum_{i \in \mathcal{S}_1} \sqrt{2\ln 2\, \big(Z(U_1^i|U_1^{1:i-1}) - Z(U_1^i|U_1^{1:i-1}, Y'^{1:N})^2\big)} \\
&\overset{(g)}{\leq} 2N \sqrt{4\ln 2 \cdot 2^{-N^\beta}} = O(2^{-N^{\beta'}}), \qquad (17)
\end{aligned}$$
where $D(\cdot\|\cdot)$ is the relative entropy, and the equalities and inequalities follow from:
(a) the telescoping expansion [15];
(b) $Q'(u_1^i|u_1^{1:i-1}, y'^{1:N}) = P(u_1^i|u_1^{1:i-1}, y'^{1:N})$ for $i \in \mathcal{I}_1$;
(c) Pinsker's inequality;
(d) Jensen's inequality;
(e) $Q'(u_1^i|u_1^{1:i-1}) = \frac{1}{2}$ ($\bar{u}_1^i$ is uniformly random) for $i \in \mathcal{F}_1$, and $Q'_{U_1^i|U_1^{1:i-1},Y'^{1:N}} = P_{U_1^i|U_1^{1:i-1}}$ for $i \in \mathcal{S}_1$;
(f) $Z(X|Y)^2 < H(X|Y) < Z(X|Y)$;
(g) (7).

Following the same fashion, for the encoder with the MAP rule on $\mathcal{S}_1$,
$$2\mathbb{V}(Q'_{U_1^{1:N},Y'^{1:N}}, Q_{U_1^{1:N},Y'^{1:N}}) \leq \sum_{i \in \mathcal{S}_1} \sqrt{2\ln 2\, D\big(Q_{U_1^i} \,\|\, Q'_{U_1^i} \,\big|\, U_1^{1:i-1}, Y'^{1:N}\big)} \overset{(h)}{=} \sum_{i \in \mathcal{S}_1} \sqrt{2\ln 2\, \big(H(U_1^i|U_1^{1:i-1}) - 0\big)} \leq \sum_{i \in \mathcal{S}_1} \sqrt{2\ln 2\, Z(U_1^i|U_1^{1:i-1})} \leq N \sqrt{2\ln 2 \cdot 2^{-N^\beta}} = O(2^{-N^{\beta'}}),$$
where equality (h) follows from the MAP decision in (9) for $i \in \mathcal{S}_1$. Finally, we have
$$\mathbb{V}(P_{U_1^{1:N},Y'^{1:N}}, Q_{U_1^{1:N},Y'^{1:N}}) \leq \mathbb{V}(P_{U_1^{1:N},Y'^{1:N}}, Q'_{U_1^{1:N},Y'^{1:N}}) + \mathbb{V}(Q'_{U_1^{1:N},Y'^{1:N}}, Q_{U_1^{1:N},Y'^{1:N}}) = O(2^{-N^{\beta'}}).$$
Clearly, as $N$ goes to infinity, for any $R > \frac{|\mathcal{I}_1|}{N} = I(X_1; Y')$, $\mathbb{V}(P_{U_1^{1:N},Y'^{1:N}}, Q_{U_1^{1:N},Y'^{1:N}})$ is arbitrarily small. $\blacksquare$

APPENDIX B
PROOF OF THEOREM 3

Proof: The variational distance can be upper bounded as
$$\begin{aligned}
2\mathbb{V}(P_{U_2^{1:N},U_1^{1:N},Y'^{1:N}}, Q_{U_2^{1:N},U_1^{1:N},Y'^{1:N}}) &= \sum_{u_2^{1:N},u_1^{1:N},y'^{1:N}} \big|Q(u_2^{1:N}, u_1^{1:N}, y'^{1:N}) - P(u_2^{1:N}, u_1^{1:N}, y'^{1:N})\big| \\
&= \sum_{u_2^{1:N},u_1^{1:N},y'^{1:N}} \big|P(u_2^{1:N}|u_1^{1:N}, y'^{1:N}) P(u_1^{1:N}, y'^{1:N}) - Q(u_2^{1:N}|u_1^{1:N}, y'^{1:N}) Q(u_1^{1:N}, y'^{1:N})\big| \\
&\leq \sum_{u_2^{1:N},u_1^{1:N},y'^{1:N}} \big|P(u_2^{1:N}|u_1^{1:N}, y'^{1:N}) - Q(u_2^{1:N}|u_1^{1:N}, y'^{1:N})\big|\, P(u_1^{1:N}, y'^{1:N}) \\
&\quad + \sum_{u_2^{1:N},u_1^{1:N},y'^{1:N}} \big|P(u_1^{1:N}, y'^{1:N}) - Q(u_1^{1:N}, y'^{1:N})\big|\, Q(u_2^{1:N}|u_1^{1:N}, y'^{1:N}). \qquad (18)
\end{aligned}$$
Treating $(U_1^{1:N}, Y'^{1:N})$ as a new source with distribution $P(u_1^{1:N}, y'^{1:N})$, the first summation can be shown to be $O(2^{-N^{\beta'}})$ in the same fashion as in the proof of Theorem 2. For the second summation, we have
$$\sum_{u_2^{1:N},u_1^{1:N},y'^{1:N}} \big|P(u_1^{1:N}, y'^{1:N}) - Q(u_1^{1:N}, y'^{1:N})\big|\, Q(u_2^{1:N}|u_1^{1:N}, y'^{1:N}) = \sum_{u_1^{1:N},y'^{1:N}} \big|P(u_1^{1:N}, y'^{1:N}) - Q(u_1^{1:N}, y'^{1:N})\big| = 2\mathbb{V}(P_{U_1^{1:N},Y'^{1:N}}, Q_{U_1^{1:N},Y'^{1:N}}) = O(2^{-N^{\beta'}}).$$
Finally,
$$\mathbb{V}(P_{U_2^{1:N},U_1^{1:N},Y'^{1:N}}, Q_{U_2^{1:N},U_1^{1:N},Y'^{1:N}}) = O(2 \cdot 2^{-N^{\beta'}}). \qquad \blacksquare$$

APPENDIX C
PROOF OF THEOREM 4

Proof: Firstly, for the source $Y'$, we consider the average performance of the multilevel polar codes over all possible choices of $u_\ell^{\mathcal{F}_\ell}$ at each level. If the encoding rule of the form (12) is used for all $i \in [N]$ at each level, the resulting average distortion is given by
$$D_{P,Y'} = \frac{1}{N} \sum_{u_{1:r}^{1:N}, y'^{1:N}} P_{U_{1:r}^{1:N},Y'^{1:N}}(u_{1:r}^{1:N}, y'^{1:N})\, d(y'^{1:N}, u_{1:r}^{1:N} G_N),$$
where $u_{1:r}^{1:N} G_N$ denotes the mapping from $u_{1:r}^{1:N}$ to $x^{1:N}$ (recall that $x$ is drawn from $\Lambda$ according to $D_{\Lambda,\sigma_r}$). For instance, let $\Lambda = \mathbb{Z}$ with the partition chain $\mathbb{Z}/2\mathbb{Z}/\cdots/2^r\mathbb{Z}$; then $x^{1:N} = x_1^{1:N} + 2x_2^{1:N} + \cdots + 2^{r-1}x_r^{1:N}$ with $x_\ell^{1:N} = u_\ell^{1:N} G_N$. When $r \to \infty$, there is a one-to-one mapping from $u_{1:r}^{1:N}$ to $x^{1:N}$. Then we have
$$\begin{aligned}
D_{P,Y'} &= \frac{1}{N} \sum_{x^{1:N}, y'^{1:N}} P_{X^{1:N},Y'^{1:N}}(x^{1:N}, y'^{1:N})\, d(y'^{1:N}, x^{1:N}) = \frac{1}{N} \cdot N \sum_{x, y'} P_{X,Y'}(x, y')\, d(x, y') \\
&= \sum_{x \in \Lambda} P_X(x) \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi\Delta}} \exp\Big(-\frac{(y'-x)^2}{2\Delta}\Big) (y'-x)^2\, dy' = \Delta.
\end{aligned}$$
The result $D_{P,Y'} = \Delta$ is reasonable, since this encoder does not perform any compression. If we replace $P_{U_{1:r}^{1:N},Y'^{1:N}}(u_{1:r}^{1:N}, y'^{1:N})$ with $Q_{U_{1:r}^{1:N},Y'^{1:N}}(u_{1:r}^{1:N}, y'^{1:N})$ and compress $y'^{1:N}$ to $u_\ell^{\mathcal{I}_\ell}$ at each level, the resulting average distortion $D_{Q,Y'}$ can be bounded as
$$\begin{aligned}
D_{Q,Y'} &= \frac{1}{N} \sum_{u_{1:r}^{1:N}, y'^{1:N}} Q_{U_{1:r}^{1:N},Y'^{1:N}}(u_{1:r}^{1:N}, y'^{1:N})\, d(y'^{1:N}, u_{1:r}^{1:N} G_N) \\
&\leq \frac{1}{N} \sum_{u_{1:r}^{1:N}, y'^{1:N}} P_{U_{1:r}^{1:N},Y'^{1:N}}(u_{1:r}^{1:N}, y'^{1:N})\, d(y'^{1:N}, u_{1:r}^{1:N} G_N) \\
&\quad + \frac{1}{N} \sum_{u_{1:r}^{1:N}, y'^{1:N}} \big|P_{U_{1:r}^{1:N},Y'^{1:N}}(u_{1:r}^{1:N}, y'^{1:N}) - Q_{U_{1:r}^{1:N},Y'^{1:N}}(u_{1:r}^{1:N}, y'^{1:N})\big|\, d(y'^{1:N}, u_{1:r}^{1:N} G_N).
\end{aligned}$$
Since the densities of both $Y'$ and $X$ decay exponentially in their squared norms, the distortion caused by large $x$ or $y'$ is negligible, and we can always assume a maximum distortion $d_{\max}$ between $y'$ and $x$. Then we have
$$D_{Q,Y'} \leq D_{P,Y'} + \frac{2}{N} \mathbb{V}\big(P_{U_{1:r}^{1:N},Y'^{1:N}}, Q_{U_{1:r}^{1:N},Y'^{1:N}}\big) \cdot N d_{\max} = \Delta + O(2^{-N^{\beta'}}),$$
where the equality follows from (15) and $r = O(\log\log N)$ [8, Lemma 5].

Now we consider using the same encoder to quantize the Gaussian source $Y$. The resulting average distortion $D_{Q,Y}$ can be written as
$$D_{Q,Y} = \frac{1}{N} \sum_{u_{1:r}^{1:N}, y^{1:N}} Q_{U_{1:r}^{1:N},Y^{1:N}}(u_{1:r}^{1:N}, y^{1:N})\, d(y^{1:N}, u_{1:r}^{1:N} G_N) = \frac{1}{N} \sum_{u_{1:r}^{1:N}, y^{1:N}} P_{Y^{1:N}}(y^{1:N})\, Q_{U_{1:r}^{1:N}|Y^{1:N}}(u_{1:r}^{1:N}|y^{1:N})\, d(y^{1:N}, u_{1:r}^{1:N} G_N).$$
Since the same encoder is used, for the same realization $y^{1:N}$ we have $Q_{U_{1:r}^{1:N}|Y^{1:N}}(u_{1:r}^{1:N}|y^{1:N}) = Q_{U_{1:r}^{1:N}|Y'^{1:N}}(u_{1:r}^{1:N}|y^{1:N})$, and hence
$$D_{Q,Y} - D_{Q,Y'} = \frac{1}{N} \sum_{u_{1:r}^{1:N}, y^{1:N}} \big(P_{Y^{1:N}}(y^{1:N}) - P_{Y'^{1:N}}(y^{1:N})\big)\, Q_{U_{1:r}^{1:N}|Y^{1:N}}(u_{1:r}^{1:N}|y^{1:N})\, d(y^{1:N}, u_{1:r}^{1:N} G_N) \leq \frac{1}{N} \cdot N d_{\max} \sum_{y^{1:N}} \big|P_{Y^{1:N}}(y^{1:N}) - P_{Y'^{1:N}}(y^{1:N})\big|.$$
Again, by the telescoping expansion,
$$\sum_{y^{1:N}} \big|P_{Y^{1:N}}(y^{1:N}) - P_{Y'^{1:N}}(y^{1:N})\big| \leq \sum_{y^{1:N}} \sum_{i=1}^{N} \big|P_{Y^i}(y^i) - P_{Y'^i}(y^i)\big|\, P_{Y^{1:i-1}}(y^{1:i-1})\, P_{Y'^{i+1:N}}(y^{i+1:N}) = \sum_{i=1}^{N} \sum_{y^i} \big|P_{Y^i}(y^i) - P_{Y'^i}(y^i)\big| \overset{\text{Lemma 1}}{\leq} N \cdot 8\epsilon_\Lambda(\tilde{\sigma}_\Delta).$$
As a result,
$$D_{Q,Y} \leq \Delta + O(2^{-N^{\beta'}}) + 8\epsilon_\Lambda(\tilde{\sigma}_\Delta)\, d_{\max} N. \qquad (19)$$
By scaling $\Lambda$, we can make $\epsilon_\Lambda(\tilde{\sigma}_\Delta) \ll \frac{1}{8 d_{\max} N}$, and $D_{Q,Y}$ can be made arbitrarily close to $\Delta$ with $R > I(X; Y') \geq \frac{1}{2}\log\frac{\sigma_s^2}{\Delta} - \frac{5\epsilon_\Lambda(\tilde{\sigma}_\Delta)}{n}$ ($n$ can be 1). When $\epsilon_\Lambda(\tilde{\sigma}_\Delta) \to 0$, we have $I(X; Y') \to \frac{1}{2}\log\frac{\sigma_s^2}{\Delta}$ and $R > \frac{1}{2}\log\frac{\sigma_s^2}{\Delta}$.

Now we are ready to explain the lattice structure. From the definition of $\mathcal{F}_\ell$ and [8, Lemma 6], it is easy to see that $\mathcal{F}_\ell \subseteq \mathcal{F}_{\ell-1}$ for $1 < \ell \leq r$. When $u_\ell^{\mathcal{S}_\ell}$ is uniformly selected and $u_\ell^{\mathcal{F}_\ell} = 0$ at each level, the polar code constructed at level $\ell - 1$ is a subset of the polar code at level $\ell$. Therefore, the resulting multilevel code is actually a polar lattice, and the MAP decision on the bits in $\mathcal{S}_\ell$ is a shaping operation according to $D_{\Lambda,\sigma_r}$. Moreover, since $D_{Q,Y}$ is an average distortion over all random choices of $u_\ell^{\mathcal{F}_\ell}$, there exists at least one specific choice of $u_\ell^{\mathcal{F}_\ell}$ at each level for which the average distortion satisfies (19). This corresponds exactly to a shift of the constructed polar lattice. Consequently, the shifted polar lattice achieves the rate-distortion bound of the Gaussian source. $\blacksquare$

Remark 1: From the proof of Theorem 4, it may seem that $R$ could be slightly smaller than $\frac{1}{2}\log\frac{\sigma_s^2}{\Delta}$ (since $R > I(X; Y') \geq \frac{1}{2}\log\frac{\sigma_s^2}{\Delta} - \frac{5\epsilon_\Lambda(\tilde{\sigma}_\Delta)}{n}$) while still reaching an average distortion $\Delta$, which would contradict Shannon's rate-distortion theory. In fact, this is not the case. When $R < \frac{1}{2}\log\frac{\sigma_s^2}{\Delta}$, an arbitrarily small $\epsilon_\Lambda(\tilde{\sigma}_\Delta)$ cannot be guaranteed, which means that the resulting distortion cannot be arbitrarily close to $\Delta$. To achieve the distortion, we need $R > \frac{1}{2}\log\frac{\sigma_s^2}{\Delta} - \frac{5\epsilon_\Lambda(\tilde{\sigma}_\Delta)}{n}$ for all possibly small $\epsilon_\Lambda(\tilde{\sigma}_\Delta)$, which leads to $R > \frac{1}{2}\log\frac{\sigma_s^2}{\Delta}$.
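The identity $D_{P,Y'} = \Delta$ at the start of Appendix C is easy to verify empirically. The sketch below is ours (the lattice is a truncated scaled copy of $\mathbb{Z}$ and the parameter values are illustrative): it samples $X$ from $D_{\Lambda,\sigma_r}$, passes it through the test channel, and measures the distortion of the uncompressed encoder.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma_s, delta = 1.0, 0.25
sigma_r = np.sqrt(sigma_s**2 - delta)

a = 0.25                            # assumed scaling of the base lattice a*Z
lam = a * np.arange(-200, 201)      # truncated lattice support
w = np.exp(-lam**2 / (2 * sigma_r**2)); w /= w.sum()  # D_{Lambda,sigma_r}

n = 500_000
x = rng.choice(lam, size=n, p=w)            # reconstruction points ~ D_{Lambda,sigma_r}
y_p = x + rng.normal(0, np.sqrt(delta), n)  # test channel output, i.e., the source Y'

print(np.mean((y_p - x) ** 2))      # ~ delta: distortion without any compression
print(np.var(y_p), sigma_s**2)      # Var(Y') ~ sigma_r^2 + delta = sigma_s^2
```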

Now it is ready to explain the lattice structure. From the definition of Fℓ and [8, Lemma 6], it is easy to find that Fℓ ⊆ Fℓ−1 for 1 < ℓ ≤ r. When uSℓ ℓ is uniformly selected ℓ and uF ℓ = 0 at each level, the constructed polar code at level ℓ − 1 is a subset of the polar code at level ℓ. Therefore, the resulted multilevel code is actually a polar lattice and the MAP decision on the bits in Sℓ is a shaping operation according to DΛ,σr . Moreover, since DQ,Y is an average distortion over ℓ all random choices of uF ℓ , there exists at least one specific Fℓ choice of uℓ at each level making the average distortion satisfying (19). This is exactly a shift on the constructed polar lattice. Consequently, the shifted polar lattice achieves the ratedistortion bound of the Gaussian source. Remark 1. From the proof of Theorem 4, it seems that R σ2 could be slightly smaller than 21 log ∆s (since R > I(X; Y ′ ) ≥ σs2 5ǫΛ (˜ σ∆ ) 1 ) to reach an average distortion ∆, which 2 log ∆ − n would be contradictory to Shannon’s rate-distortion theory. In σ2 fact, this is not the case. When R < 12 log ∆s , an arbitrarily small ǫΛ (˜ σ∆ ) cannot be guaranteed, which means that the resulted distortion cannot be arbitrarily close to ∆. To say σ2 achieving the distortion, we need R > 21 log ∆s − 5ǫΛn(˜σ∆ ) for σ2 all possibly small ǫΛ (˜ σ∆ ), which leads to R > 21 log ∆s .