Lossy Compression with Near-uniform Encoder Outputs

Badri N. Vellambi, Jörg Kliewer
New Jersey Institute of Technology, Newark, NJ 07102
Email: {badri.vellambi, jkliewer}@njit.edu

Matthieu R. Bloch
Georgia Institute of Technology, Atlanta, GA 30332
Email: [email protected]
Abstract—It is well known that lossless compression of a discrete memoryless source at a rate just above entropy with near-uniform encoder output is possible if and only if the encoder and decoder share a common random seed. This work focuses on deriving conditions for near-uniform lossy compression in the Wyner-Ziv and the distributed lossy compression problems. We show that in the Wyner-Ziv case, near-uniform encoder output and operation close to the WZ rate limit are simultaneously possible, while in the distributed lossy compression problem, jointly near-uniform outputs are achievable at any rate point in the interior of the rate region provided the sources share non-trivial common randomness.

Index Terms—Rate-distortion, Slepian-Wolf problem, Wyner-Ziv problem, distributed lossy source coding.
I. INTRODUCTION

Owing to the source-channel separation theorem for the point-to-point communication setup and the convenience that separation offers, separate source and channel coding and the optimality of separation have been studied in various network contexts. Separation-based approaches, especially in multi-user settings, usually assume that the outputs of the source encoders are near-uniform on their alphabets, where uniformity is measured using the variational distance metric. Lossless compression with vanishing error probability and near-uniform encoder output was studied in [1], [2]. Hayashi showed that only one of the two constraints (i.e., vanishing error probability or near-uniform encoder output) can be achieved [2]. However, if the encoder and decoder are allowed to share common randomness (that grows at least as the square root of the blocklength), one can design lossless codes with near-uniform encoder output [3], [4]. In [4], we also showed, using the finite-length results of Kontoyiannis [5], that lossy compression with deterministic encoders, at rates arbitrarily close to the rate-distortion limit, and with near-uniform encoder output can be simultaneously achieved.

In this work, we analyze the Wyner-Ziv (WZ) and distributed lossy compression problems to ascertain whether near-uniform encoder output can be achieved at any rate point in the corresponding rate regions. Specifically, we establish the following results.

• In the WZ problem, there exist codes operating just above the WZ rate limit while offering near-uniform encoder output.
• In the two-source distributed lossy compression setting, if the sources share non-trivial Gács-Körner common information, then for any rate pair in the interior of the rate region, there exist codes operating at that rate pair whose encoder outputs are jointly near-uniform. Note that this result is proven without needing a characterization of the underlying rate region. The setting in which the sources possess no Gács-Körner common information remains open.

In both problems, the proofs employ ideas from channel resolvability codes [6, p. 404] and the likelihood encoder [7]. In the distributed lossy compression case, the proof additionally uses the existence of near-uniform codes for the Slepian-Wolf problem with a non-standard decoding constraint. The remainder of the paper is organized as follows: Section II provides the notation, Section III formally presents the problems, Section IV presents the main results of this work, and lastly, Section V presents the results used in the proofs of Section IV.

This work is supported by NSF grants CCF-1440014, CCF-1439465, CCF-1320304, and CCF-1527074.
II. NOTATION

For $m, n \in \mathbb{N}$ with $m < n$, $\llbracket m, n \rrbracket \triangleq \{m, m+1, \ldots, n\}$. Uppercase letters (e.g., $X$, $Y$) denote random variables (RVs), and the respective script versions (e.g., $\mathcal{X}$, $\mathcal{Y}$) denote their alphabets. In this work, all alphabets are assumed to be finite. Lowercase letters denote realizations of random variables (e.g., $x$, $y$). Superscripts indicate the length of vectors and subscripts indicate the component index. Given a finite set $\mathcal{S}$, $\mathrm{unif}(\mathcal{S})$ denotes the uniform distribution on the set. Given a p.m.f. $p_X$, $T_\varepsilon^n[p_X]$ denotes the set of all $\varepsilon$-strongly letter-typical sequences of length $n$ [8], $\mathrm{supp}(p_X)$ denotes the support of the p.m.f. $p_X$, and $p_X^{\otimes n}$ denotes the joint p.m.f. of $n$ i.i.d. random variables, each distributed according to $p_X$. Given an event $E$, $\mathbb{P}(E)$ denotes the probability of its occurrence. Lastly, given two p.m.f.s $p$ and $q$ over a set $\mathcal{X}$, the variational distance $\mathbb{V}(p,q)$ is defined as $\mathbb{V}(p, q) \triangleq \sum_{x \in \mathcal{X}} |p(x) - q(x)|$.

III. PROBLEM DEFINITION

The lossy coding problems studied in this work impose a near-uniform encoder output constraint on the classical Wyner-Ziv and distributed lossy compression problems, and are formally defined here for the sake of completeness.

Definition 1: Let discrete memoryless sources $(X, Y)$ correlated according to p.m.f. $Q_{XY}$, a bounded distortion measure $d: \mathcal{X} \times \hat{\mathcal{X}} \to [0, d_{\max}]$, and a distortion requirement $\Delta$ for the source $X$ be given. We say that Wyner-Ziv coding of the source $X$ with receiver side information $Y$ and near-uniform encoder output is achievable at rate $R$ if for every $\varepsilon > 0$, there exist $n \in \mathbb{N}$, an encoder function $f_X: \mathcal{X}^n \to \llbracket 1, 2^{n(R+\varepsilon)} \rrbracket$, and a reconstruction function $g_X: \llbracket 1, 2^{n(R+\varepsilon)} \rrbracket \times \mathcal{Y}^n \to \hat{\mathcal{X}}^n$ such that
$$\mathbb{V}\big(Q_{f_X(X^n)}, \mathrm{unif}(\llbracket 1, 2^{n(R+\varepsilon)} \rrbracket)\big) \le \varepsilon, \tag{1}$$
$$\sum_{i=1}^{n} d(X_i, \hat{X}_i) \le n(\Delta + \varepsilon), \tag{2}$$
where $Q_{f_X(X^n)}$ is the p.m.f. of the encoder output $f_X(X^n)$ and $\hat{X}^n = g_X(f_X(X^n), Y^n)$ is the receiver reconstruction.

Definition 2: Let discrete memoryless sources $(X, Y)$ correlated according to $Q_{XY}$, bounded distortion measures $d_X: \mathcal{X} \times \hat{\mathcal{X}} \to [0, d_{\max}]$ and $d_Y: \mathcal{Y} \times \hat{\mathcal{Y}} \to [0, d_{\max}]$, and distortion requirements $\Delta_x$, $\Delta_y$ for sources $X$ and $Y$, respectively, be given. We say that distributed lossy compression of sources $X$ and $Y$ with jointly near-uniform encoder outputs is achievable at rate pair $(R_x, R_y)$ if for every $\varepsilon > 0$, there exist $n \in \mathbb{N}$, encoder functions $f_X: \mathcal{X}^n \to \llbracket 1, 2^{n(R_x+\varepsilon)} \rrbracket$ and $f_Y: \mathcal{Y}^n \to \llbracket 1, 2^{n(R_y+\varepsilon)} \rrbracket$, and a receiver reconstruction function $g_{XY}: \llbracket 1, 2^{n(R_x+\varepsilon)} \rrbracket \times \llbracket 1, 2^{n(R_y+\varepsilon)} \rrbracket \to \hat{\mathcal{X}}^n \times \hat{\mathcal{Y}}^n$ such that
$$\mathbb{V}\big(Q_{f_X(X^n) f_Y(Y^n)}, Q_U\big) \le \varepsilon, \tag{3}$$
$$\sum_{i=1}^{n} d_X(X_i, \hat{X}_i) \le n(\Delta_x + \varepsilon), \tag{4}$$
$$\sum_{i=1}^{n} d_Y(Y_i, \hat{Y}_i) \le n(\Delta_y + \varepsilon), \tag{5}$$
where $Q_U = \mathrm{unif}(\llbracket 1, 2^{n(R_x+\varepsilon)} \rrbracket \times \llbracket 1, 2^{n(R_y+\varepsilon)} \rrbracket)$ is the uniform p.m.f. on $\llbracket 1, 2^{n(R_x+\varepsilon)} \rrbracket \times \llbracket 1, 2^{n(R_y+\varepsilon)} \rrbracket$, $Q_{f_X(X^n) f_Y(Y^n)}$ is the joint p.m.f. of the outputs of the two encoders $(f_X(X^n), f_Y(Y^n))$, and $(\hat{X}^n, \hat{Y}^n) = g_{XY}(f_X(X^n), f_Y(Y^n))$ is the pair of receiver reconstructions.

IV. MAIN RESULTS

A. Near-uniform Wyner-Ziv Coding

Theorem 1: Wyner-Ziv coding with near-uniform encoder output is achievable at rate $R$ if and only if $R \ge R_{WZ}(\Delta)$.

Proof: The proof uses ideas from channel resolvability and the likelihood encoder instead of the standard random-coding argument. This approach affords us the advantage of being able to track the distribution of the encoder output more easily than when using the covering lemma.
To prove the claim, we proceed as follows. Pick a test channel $Q_{\hat{X}|X}$ such that the joint p.m.f. $Q_{\hat{X}|X} Q_{XY}$ satisfies
$$I(X; \hat{X} \,|\, Y) = I(X; \hat{X}) - I(Y; \hat{X}) = R_{WZ}(\Delta). \tag{6}$$
Now, let $R \triangleq I(X; \hat{X} \,|\, Y) + 2\varepsilon$ and $R' \triangleq I(\hat{X}; Y) - \varepsilon$. Let the codebook $\mathcal{C}$ comprising $2^{n(R+R')}$ $\hat{X}$-codewords be constructed such that the components of the codewords $\{\hat{X}_i(j,k): i \in \llbracket 1, n \rrbracket, j \in \llbracket 1, 2^{nR} \rrbracket, k \in \llbracket 1, 2^{nR'} \rrbracket\}$ are i.i.d. according to $Q_{\hat{X}}$, the marginal of $\hat{X}$ obtained from $Q_{\hat{X}|X} Q_X$. We arrange the codewords of the random codebook $\mathcal{C}$ in a table of $2^{nR}$ rows and $2^{nR'}$ columns. Suppose that $(K, K') \sim \mathrm{unif}(\llbracket 1, 2^{nR} \rrbracket \times \llbracket 1, 2^{nR'} \rrbracket)$ denotes the random variable pair used for selecting the codewords. Let $\hat{X}^n(K, K')$ be selected and transmitted over the discrete memoryless channel (DMC) $Q_{X|\hat{X}}$, let $\tilde{X}^n$ be the corresponding output, and let $\tilde{Y}^n$ be the output when $\tilde{X}^n$ is transmitted over the DMC $Q_{Y|X}$. For this construction, the following hold:
Fig. 1. Source generation and the derived near-uniform WZ scheme: (a) channel resolvability code for generating the source $X$, in which $(K, K') \sim \mathrm{unif}(\llbracket 1, 2^{nR} \rrbracket \times \llbracket 1, 2^{nR'} \rrbracket)$ selects $\hat{X}^n(K, K')$, which is passed through $Q_{X|\hat{X}}$ and then $Q_{Y|X}$ to produce $\tilde{X}^n$ and $\tilde{Y}^n$, with the receiver forming $\bar{K}' = g_{\mathcal{C}}(\tilde{Y}^n, K)$; (b) the derived scheme for near-uniform WZ coding, in which $(\mathring{K}, \mathring{K}') \sim Q_{K,K'|\tilde{X}^n}$ is generated from $X^n$, and the receiver forms $\hat{x}^n(\mathring{K}, \bar{K}')$ from $Y^n$.
1. Since $R + R' > I(X; \hat{X})$, by the channel resolvability theorem [6, Theorem 6.3.1], we are guaranteed that, when averaging over all codebook realizations,
$$\mathbb{E}\big[\mathbb{V}(Q_{\tilde{X}^n}, Q_X^{\otimes n})\big] \xrightarrow{n \to \infty} 0. \tag{7}$$
2. Since $R' < I(\hat{X}; Y)$, there exists a function $g_{\mathcal{C}}(\tilde{Y}^n, K)$ (depending on the realization of the codebook $\mathcal{C}$) such that
$$\mathbb{E}\big[\mathbb{P}[K' \ne g_{\mathcal{C}}(\tilde{Y}^n, K)]\big] \xrightarrow{n \to \infty} 0. \tag{8}$$
Note that the above is merely a restatement of the achievability of channel capacity using random codebooks.
3. Since $\hat{X}^n(K, K')$ is generated using the p.m.f. $Q_{\hat{X}}$, and since $\tilde{X}^n$ and $\hat{X}^n(K, K')$ are related through the DMC $Q_{X|\hat{X}}$, by the weak law of large numbers, we also have
$$\mathbb{P}\big[(\tilde{X}^n, \hat{X}^n(K, K')) \notin T_\varepsilon^n[Q_{X,\hat{X}}]\big] \xrightarrow{n \to \infty} 0. \tag{9}$$
Now, for sufficiently large $n$, we can find a realization $\mathcal{C}_0 = \{\hat{x}^n_{\mathcal{C}_0}(j,k): j \in \llbracket 1, 2^{nR} \rrbracket, k \in \llbracket 1, 2^{nR'} \rrbracket\}$ of the random codebook such that the sources $\tilde{X}^n$ and $\tilde{Y}^n$ generated by transmitting a codeword selected uniformly from $\mathcal{C}_0$ satisfy
$$\mathbb{V}(Q_{\tilde{X}^n}, Q_X^{\otimes n}) \le \varepsilon/2, \tag{10}$$
$$\mathbb{P}[K' \ne g_{\mathcal{C}_0}(\tilde{Y}^n, K)] \le \varepsilon/2, \tag{11}$$
$$\mathbb{P}\big[(\tilde{X}^n, \hat{x}^n_{\mathcal{C}_0}(K, K')) \notin T_\varepsilon^n[Q_{X,\hat{X}}]\big] \le \varepsilon/2. \tag{12}$$
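To make the random codebook construction and the resolvability property in (7)/(10) concrete, the following minimal sketch (a toy illustration with assumed binary alphabets and a hypothetical test channel $Q_{X|\hat{X}}$, not the asymptotic construction used in the proof) draws a table of $\hat{X}$-codewords with components i.i.d. according to $Q_{\hat{X}}$, computes the exact p.m.f. of the channel output $\tilde{X}^n$ by enumeration, and evaluates the variational distance to $Q_X^{\otimes n}$:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy test channel: Xhat ~ Q_Xhat, X | Xhat ~ Q_{X|Xhat} (binary).
Q_xhat = np.array([0.5, 0.5])
Q_x_given_xhat = np.array([[0.9, 0.1],   # row: xhat, column: x
                           [0.2, 0.8]])
Q_x = Q_xhat @ Q_x_given_xhat            # induced marginal Q_X

n = 8
num_codewords = 2 ** 6                   # plays the role of 2^{n(R+R')} > 2^{n I(X;Xhat)}

# Random codebook: codeword components i.i.d. according to Q_Xhat.
codebook = rng.choice(2, size=(num_codewords, n), p=Q_xhat)

# Exact p.m.f. of Xtilde^n: a uniformly chosen codeword sent over the DMC Q_{X|Xhat}.
V = 0.0
for x_seq in itertools.product(range(2), repeat=n):
    x = np.array(x_seq)
    q_out = np.prod(Q_x_given_xhat[codebook, x], axis=1).mean()
    V += abs(q_out - np.prod(Q_x[x]))    # accumulate |Q_Xtilde^n - Q_X^{xn}| termwise
print(f"V(Q_Xtilde^n, Q_X^xn) = {V:.4f}")
```

With the codebook rate held above $I(X;\hat{X})$, the computed distance shrinks as $n$ grows, mirroring (7).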
Let $Q_{K,K',\tilde{X}^n}$ be the p.m.f. induced by the codebook $\mathcal{C}_0$. Now, to derive a (randomized) WZ scheme from this channel resolvability code, we proceed as follows. To encode $X^n$, as illustrated in Fig. 1, we simply generate an instance of indices $(\mathring{K}, \mathring{K}') \sim Q^{\mathrm{approx}}_{K,K'|\tilde{X}^n}(\cdot, \cdot \,|\, X^n)$, where $Q^{\mathrm{approx}}_{K,K'|\tilde{X}^n}(\cdot, \cdot \,|\, X^n)$ is a p.m.f. close to $Q_{K,K'|\tilde{X}^n}(\cdot, \cdot \,|\, X^n)$ satisfying
$$\mathbb{V}\big(Q^{\mathrm{approx}}_{K,K',\tilde{X}^n}, Q_{K,K',\tilde{X}^n}\big) \le \varepsilon/2, \tag{13}$$
and communicate $\mathring{K}$ to the receiver. The reason for employing an approximation $Q^{\mathrm{approx}}_{K,K',\tilde{X}^n}$ of $Q_{K,K',\tilde{X}^n}$ will become clear later, when we emulate the latter p.m.f. using $\tilde{X}^n$ and a near-uniform random seed. Now, supposing that we are able to generate an instance of the pair of indices $(\mathring{K}, \mathring{K}')$, we have
$$Q_{\mathring{K}, \mathring{K}', X^n}(k, k', x^n) \triangleq Q^{\mathrm{approx}}_{K,K'|\tilde{X}^n}(k, k' \,|\, x^n)\, Q_X^{\otimes n}(x^n). \tag{14}$$
Hence, from (10), (13), and (14), we are guaranteed that
$$\mathbb{V}\big(Q_{\mathring{K}, \mathring{K}', X^n}, Q_{K,K',\tilde{X}^n}\big) \le \mathbb{V}\big(Q^{\mathrm{approx}}_{K,K',\tilde{X}^n}, Q_{K,K',\tilde{X}^n}\big) + \mathbb{V}\big(Q_{\tilde{X}^n}, Q_X^{\otimes n}\big) \le \varepsilon.$$
Consequently, we also have
$$\mathbb{V}\big(Q_{\mathring{K}, \mathring{K}', Y^n}, Q_{K,K',\tilde{Y}^n}\big) \le \varepsilon, \tag{15}$$
$$\mathbb{V}\big(Q_{\mathring{K}, \mathring{K}'}, Q_{K,K'}\big) \le \varepsilon, \tag{16}$$
where (15) follows because $Y^n$ and $\tilde{Y}^n$ are the channel outputs when the DMC $Q_{Y|X}$ is fed with $X^n$ and $\tilde{X}^n$, respectively. From (16), we see that $\mathring{K}$ and $\mathring{K}'$ are jointly near-uniform. Consequently, $\mathring{K}$, which is the WZ encoder output, is also near-uniform. Further, (12) and (16) together imply that
$$\mathbb{P}\big[(X^n, \hat{x}^n_{\mathcal{C}_0}(\mathring{K}, \mathring{K}')) \notin T_\varepsilon^n[Q_{X,\hat{X}}]\big] \le 3\varepsilon/2. \tag{17}$$
Note that reconstruction is possible at the receiver, since from (11), (15), and Lemma 1 of Section V, it follows that
$$\mathbb{P}[\mathring{K}' \ne g_{\mathcal{C}_0}(Y^n, \mathring{K})] \le 3\varepsilon/2, \tag{18}$$
$$\mathbb{P}\big[\hat{x}^n_{\mathcal{C}_0}(\mathring{K}, \mathring{K}') \ne \hat{x}^n_{\mathcal{C}_0}(\mathring{K}, g_{\mathcal{C}_0}(Y^n, \mathring{K}))\big] \le 3\varepsilon/2. \tag{19}$$
Combining (17) and (19), we see that
$$\mathbb{P}\big[(X^n, \hat{x}^n_{\mathcal{C}_0}(\mathring{K}, g_{\mathcal{C}_0}(Y^n, \mathring{K}))) \notin T_\varepsilon^n[Q_{X,\hat{X}}]\big] \le 3\varepsilon. \tag{20}$$
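The encoding step just described is the likelihood encoder of [7]: since $(K, K')$ is uniform over the table, the posterior $Q_{K,K'|\tilde{X}^n}(k, k' \,|\, x^n)$ is proportional to $Q_{X|\hat{X}}^{\otimes n}(x^n \,|\, \hat{x}^n(k,k'))$. A minimal sketch of this sampling step, under the same assumed toy setup as above (toy table sizes stand in for $2^{nR}$ and $2^{nR'}$):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy setup: binary alphabets, test channel Q_{X|Xhat}.
Q_xhat = np.array([0.5, 0.5])
Q_x_given_xhat = np.array([[0.9, 0.1],
                           [0.2, 0.8]])

n, rows, cols = 8, 4, 4                      # toy stand-ins for 2^{nR} and 2^{nR'}
codebook = rng.choice(2, size=(rows, cols, n), p=Q_xhat)

def likelihood_encoder(x, codebook, Q):
    """Sample (k, k') with probability proportional to Q^{xn}(x | xhat(k, k')),
    i.e., an instance from the posterior Q_{K,K'|Xtilde^n}(., . | x)."""
    r, c, _ = codebook.shape
    lik = np.prod(Q[codebook, x], axis=2)    # likelihood of each codeword, shape (r, c)
    post = lik.ravel() / lik.sum()           # uniform prior => posterior prop. to likelihood
    return divmod(rng.choice(r * c, p=post), c)

x_obs = rng.choice(2, size=n, p=[0.55, 0.45])     # an observed source block
k_ring, kp_ring = likelihood_encoder(x_obs, codebook, Q_x_given_xhat)
print(f"encoder output K = {k_ring} (sent); column index K' = {kp_ring}")
```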
We are nearly done, except for the detail that the encoding is not deterministic because of the use of $Q^{\mathrm{approx}}_{K,K'|\tilde{X}^n}(\cdot, \cdot \,|\, X^n)$. Since we require the encoding to be deterministic, we can invoke Lemma 2 to obtain an approximation $Q^{\mathrm{approx}}_{K,K'|\tilde{X}^n}$ of $Q_{K,K'|\tilde{X}^n}$ that satisfies (13). This approximation can be realized as a deterministic function of $\tilde{X}^n$ and a uniform random seed of rate $R + R' - I(X; \hat{X}) + \varepsilon = 2\varepsilon$ that is independent of $X^n$. Further, this uniform seed of rate $2\varepsilon$ can be approximated by a near-uniform seed of rate $2\varepsilon$ realized as a deterministic function of $\{X_{n+\ell}: \ell = 1, \ldots, \frac{3\varepsilon n}{H(X)}\}$ by extracting its intrinsic randomness (Lemma 5). Thus, in effect, the approximation $Q^{\mathrm{approx}}_{K,K'|\tilde{X}^n}$ of $Q_{K,K'|\tilde{X}^n}$ is realized as a deterministic function of $n + \frac{3\varepsilon n}{H(X)}$ symbols of the source $X$. From (20), we see that the average per-symbol distortion for the first $n$ symbols at the receiver is bounded above by $\Delta(1 + 3\varepsilon)$. However, for the last $\frac{3\varepsilon n}{H(X)}$ symbols, which are merely used for generating the random seed, the corresponding per-symbol distortion is no more than $d_{\max}$, the maximum value of the distortion function. Combining the two, we see that the overall code encoding $n + \frac{3\varepsilon n}{H(X)}$ symbols offers a distortion of no more than $\Delta + 3\varepsilon d_{\max}$.

B. Near-uniform Distributed Lossy Source Coding Problem

We begin this part by analyzing joint near-uniformity of encoder outputs in the Slepian-Wolf (SW) problem, which will be used in studying the corresponding variant of the distributed lossy compression problem. Since lossless compression of a source with near-uniform output is not possible (without shared encoder-decoder randomness), SW coding with jointly near-uniform encoder outputs is also not possible. However, we show a weaker result: SW coding with jointly near-uniform encoder outputs is possible provided the two sources share non-trivial Gács-Körner common randomness, and all but a small fraction of symbols need to be losslessly recovered. The following result quantifies this precisely.

Theorem 2: Consider the distributed lossless coding of two sources $X$ and $Y$ correlated according to $Q_{XY}$. Suppose that the Gács-Körner common randomness $U$ between $X$ and $Y$ is non-trivial, and let $0 < \varepsilon < H(U)$. Then, for all $(R_x, R_y)$ in the interior of the Slepian-Wolf rate region, there exist integers $n, m \in \mathbb{N}$ with $m \in \llbracket n, n + 9\varepsilon n \rrbracket$, encoding functions $f_X: \mathcal{X}^m \to \llbracket 1, 2^{nR_x} \rrbracket$ and $f_Y: \mathcal{Y}^m \to \llbracket 1, 2^{nR_y} \rrbracket$, and a decoding function $g_{XY}: \llbracket 1, 2^{nR_x} \rrbracket \times \llbracket 1, 2^{nR_y} \rrbracket \to \mathcal{X}^n \times \mathcal{Y}^n$ such that
$$\mathbb{V}\big(Q_{f_X(X^m), f_Y(Y^m)}, \mathrm{unif}(\llbracket 1, 2^{nR_x} \rrbracket \times \llbracket 1, 2^{nR_y} \rrbracket)\big) \le \varepsilon,$$
$$\mathbb{P}\big[(X^n, Y^n) \ne g_{XY}(f_X(X^m), f_Y(Y^m))\big] \le \varepsilon.$$
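The non-triviality hypothesis can be checked directly from $Q_{XY}$: the Gács-Körner common part $U$ labels the connected components of $\mathrm{supp}(Q_{XY})$ viewed as a bipartite graph on $\mathcal{X} \cup \mathcal{Y}$, and it is non-trivial exactly when more than one component carries positive mass. A sketch of this computation (toy p.m.f. assumed):

```python
import numpy as np

def gacs_korner_common_part(Q_xy, tol=1e-12):
    """Label each x (and y) with its connected component in the bipartite graph
    on supp(Q_XY); U = component label is the Gacs-Korner common part."""
    nx, ny = Q_xy.shape
    parent = list(range(nx + ny))            # union-find: x-nodes, then y-nodes
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for x in range(nx):
        for y in range(ny):
            if Q_xy[x, y] > tol:
                parent[find(x)] = find(nx + y)
    u_of_x = [find(x) for x in range(nx)]
    u_of_y = [find(nx + y) for y in range(ny)]
    labels = {r: i for i, r in enumerate(dict.fromkeys(u_of_x + u_of_y))}
    return [labels[r] for r in u_of_x], [labels[r] for r in u_of_y]

# Hypothetical example: block-diagonal support => U distinguishes the two blocks.
Q = np.array([[0.2, 0.2, 0.0],
              [0.1, 0.1, 0.0],
              [0.0, 0.0, 0.4]])
u_x, u_y = gacs_korner_common_part(Q)
print(u_x, u_y)   # [0, 0, 1] [0, 0, 1]: two components with masses 0.6 and 0.4
```

For this example, $U$ takes two values with masses $0.6$ and $0.4$, so $H(U) = h(0.6) > 0$ and Theorem 2 applies for any $\varepsilon < h(0.6)$.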
Proof: We prove that the claim holds for a corner point of the SW rate region; the claim extends to the other corner point by reversing the roles of the sources, and to the interior of the rate region by time-sharing. Without loss of generality, we can assume that $Y$ is available at the decoder. Let $U$ denote the Gács-Körner common randomness between $X$ and $Y$. Let
$$[R_u \ \ R_x \ \ R_y] \triangleq [H(U) + \varepsilon/2 \ \ \ H(X|Y) + \varepsilon/2 \ \ \ H(Y|U) + \varepsilon/2], \tag{21}$$
$$R_x' \triangleq I(X; Y \,|\, U) + \varepsilon/2 < I(X; Y). \tag{22}$$
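The rate assignments in (21) and (22) are easy to evaluate numerically. The following sketch computes $H(U)$, $H(X|Y)$, $H(Y|U)$, and $I(X;Y|U)$ (in bits) for a hypothetical block-diagonal $Q_{XY}$ whose blocks define $U$:

```python
import numpy as np

def H(p):
    """Entropy in bits of a p.m.f. given as an array (any shape)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Hypothetical block-diagonal Q_XY; U = block index is the GK common part.
Q = np.array([[0.20, 0.20, 0.00],
              [0.10, 0.10, 0.00],
              [0.00, 0.00, 0.40]])
blocks = [([0, 1], [0, 1]), ([2], [2])]        # (x-indices, y-indices) for U = 0, 1

p_u = np.array([Q[np.ix_(xs, ys)].sum() for xs, ys in blocks])
H_U = H(p_u)
H_XgY = H(Q) - H(Q.sum(axis=0))                # H(X|Y) = H(X,Y) - H(Y)
H_YgU, I_XYgU = 0.0, 0.0
for pu, (xs, ys) in zip(p_u, blocks):
    Qu = Q[np.ix_(xs, ys)] / pu                # conditional p.m.f. Q_{XY|U=u}
    H_YgU += pu * H(Qu.sum(axis=0))
    I_XYgU += pu * (H(Qu.sum(axis=1)) + H(Qu.sum(axis=0)) - H(Qu))

eps = 0.05                                     # requires eps < H(U)
print("R_u =", H_U + eps/2, " R_x =", H_XgY + eps/2, " R_y =", H_YgU + eps/2)
print("R_x' =", I_XYgU + eps/2,
      "< I(X;Y) =", H(Q.sum(axis=1)) + H(Q.sum(axis=0)) - H(Q))   # checks (22)
```

For this example, $I(X;Y) = H(U) + I(X;Y|U)$ (since $U$ is a function of each source), so (22) holds whenever $\varepsilon/2 < H(U)$.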
As illustrated in Fig. 2, for sufficiently large $n$, let us construct a random codebook of $2^{nR_u}$ $U$-codewords generated using $Q_U$. For each $i \in \llbracket 1, 2^{nR_u} \rrbracket$, generate a tabular codebook of $X$-codewords with $2^{nR_x}$ rows and $2^{nR_x'}$ columns, with the $\ell$-th component of each codeword generated using $Q_{X|U}(\cdot \,|\, U_\ell(i))$. Note that this codebook has $2^{n(H(X|U)+\varepsilon)}$ entries. Next, for each $i \in \llbracket 1, 2^{nR_u} \rrbracket$, generate a codebook of $Y$-codewords with $2^{nR_y}$ entries, with the $\ell$-th component of each codeword generated using $Q_{Y|U}(\cdot \,|\, U_\ell(i))$.

Fig. 2. Codebook setup for the Slepian-Wolf problem: $U$-codewords $U^n(1), \ldots, U^n(2^{nR_u})$; for each $i$, a table of $X$-codewords $X^n(i, j, k)$, $j \in \llbracket 1, 2^{nR_x} \rrbracket$, $k \in \llbracket 1, 2^{nR_x'} \rrbracket$, and $Y$-codewords $Y^n(i, l)$, $l \in \llbracket 1, 2^{nR_y} \rrbracket$.

Let $I, J, K, L$ satisfy
$$(I, J, K) \sim \mathrm{unif}(\llbracket 1, 2^{nR_u} \rrbracket \times \llbracket 1, 2^{nR_x} \rrbracket \times \llbracket 1, 2^{nR_x'} \rrbracket), \tag{23}$$
$$(I, L) \sim \mathrm{unif}(\llbracket 1, 2^{nR_u} \rrbracket \times \llbracket 1, 2^{nR_y} \rrbracket). \tag{24}$$
Let $\hat{U}^n \triangleq U^n(I)$, $\hat{X}^n \triangleq X^n(I, J, K)$, and $\tilde{Y}^n \triangleq Y^n(I, L)$, and let $\hat{Y}^n$ be the output of the DMC $Q_{Y|X}$ when the input is $\hat{X}^n = X^n(I, J, K)$. By the channel resolvability theorem [6, Theorem 6.3.1] and the choice of codebook rates, we have
$$\mathbb{E}\big[D_{\mathrm{KL}}(Q_{\hat{U}^n} \,\|\, Q_U^{\otimes n})\big] \xrightarrow{n \to \infty} 0, \tag{25}$$
$$\mathbb{E}\big[D_{\mathrm{KL}}(Q_{\hat{X}^n \hat{Y}^n} \,\|\, Q_{XY}^{\otimes n})\big] \xrightarrow{n \to \infty} 0, \tag{26}$$
$$\mathbb{E}\big[D_{\mathrm{KL}}(Q_{\tilde{Y}^n} \,\|\, Q_Y^{\otimes n})\big] \xrightarrow{n \to \infty} 0, \tag{27}$$
$$\mathbb{E}\Big[D_{\mathrm{KL}}\Big(Q_{\hat{X}^n \hat{Y}^n | I=1} \,\Big\|\, \textstyle\prod_{j=1}^{n} Q_{XY|U_j(1)}\Big)\Big] \xrightarrow{n \to \infty} 0, \tag{28}$$
where $\mathcal{C}$ collectively represents the three codebooks. Now, let $\eta \triangleq \mathbb{E}[D_{\mathrm{KL}}(Q_{\hat{Y}^n, I, J} \,\|\, Q_{\hat{Y}^n, I}\, Q_J)]$. Then, the manipulations in (30)-(35) hold due to the arguments listed below the chain.
$$\eta = \mathbb{E}\big[D_{\mathrm{KL}}(Q_{\hat{Y}^n, I, J} \,\|\, Q_{\hat{Y}^n, I}\, Q_J)\big] \tag{29}$$
$$= \mathbb{E}\Big[D_{\mathrm{KL}}\Big(Q_{\hat{Y}^n, I, J} \,\Big\|\, Q_{I,J} \prod_{\ell=1}^{n} Q_{Y|U}(y_\ell \,|\, U_\ell(I))\Big) - D_{\mathrm{KL}}\Big(Q_{\hat{Y}^n, I} \,\Big\|\, Q_I \prod_{\ell=1}^{n} Q_{Y|U}(y_\ell \,|\, U_\ell(I))\Big)\Big] \tag{30}$$
$$\stackrel{(a)}{\le} \mathbb{E}\Big[D_{\mathrm{KL}}\Big(Q_{\hat{Y}^n, I, J} \,\Big\|\, Q_{I,J} \prod_{\ell=1}^{n} Q_{Y|U}(y_\ell \,|\, U_\ell(I))\Big)\Big] \tag{31}$$
$$\stackrel{(b)}{=} \mathbb{E}\Bigg[\sum_{y^n, k} \frac{Q_{Y|X}^{\otimes n}(y^n \,|\, X^n(1,1,k))}{2^{nR_x'}} \log \frac{\sum_{k'} Q_{Y|X}^{\otimes n}(y^n \,|\, X^n(1,1,k'))}{2^{nR_x'} \prod_{\ell=1}^{n} Q_{Y|U}(y_\ell \,|\, U_\ell(1))}\Bigg]$$
$$= \mathbb{E}\Bigg[\sum_{y^n, k} \frac{Q_{Y|X}^{\otimes n}(y^n \,|\, X^n(1,1,k))}{2^{nR_x'}}\, \mathbb{E}\Bigg[\log \frac{\sum_{k'} Q_{Y|X}^{\otimes n}(y^n \,|\, X^n(1,1,k'))}{2^{nR_x'} \prod_{\ell=1}^{n} Q_{Y|U}(y_\ell \,|\, U_\ell(1))} \,\Bigg|\, X^n(1,1,k), U^n(1)\Bigg]\Bigg] \tag{32}$$
$$\le \mathbb{E}\Bigg[\sum_{y^n, k} \frac{Q_{Y|X}^{\otimes n}(y^n \,|\, X^n(1,1,k))}{2^{nR_x'}} \log \mathbb{E}\Bigg[\frac{\sum_{k'} Q_{Y|X}^{\otimes n}(y^n \,|\, X^n(1,1,k'))}{2^{nR_x'} \prod_{\ell=1}^{n} Q_{Y|U}(y_\ell \,|\, U_\ell(1))} \,\Bigg|\, X^n(1,1,k), U^n(1)\Bigg]\Bigg] \tag{33}$$
$$\le \mathbb{E}\Bigg[\sum_{y^n, k} \frac{Q_{Y|X}^{\otimes n}(y^n \,|\, X^n(1,1,k))}{2^{nR_x'}} \log\Bigg(1 + \frac{Q_{Y|X}^{\otimes n}(y^n \,|\, X^n(1,1,k))}{2^{nR_x'} \prod_{\ell=1}^{n} Q_{Y|U}(y_\ell \,|\, U_\ell(1))}\Bigg)\Bigg] \tag{34}$$
$$\le \log\Big(1 + 2^{n(I(X;Y|U) - R_x' + 2\delta \log |\mathcal{Y}|)}\Big) + 2|\mathcal{X}||\mathcal{Y}||\mathcal{U}|\, e^{-n\delta^2 \mu} \log\big(1 + \mu^{-n}\big). \tag{35}$$
• (a) follows by dropping the second non-negative term that is subtracted, and (b) follows from the i.i.d. random codebook construction (by symmetry, it suffices to consider $(I, J) = (1, 1)$);
• (32) uses the law of iterated expectations, where the inner conditional expectation is computed over all codewords except $(U^n(1), X^n(1,1,k))$;
• (33) uses Jensen's inequality for the log function;
• (34) follows from the fact that for $k' \ne k$,
$$\mathbb{E}\big[Q_{Y|X}^{\otimes n}(\cdot \,|\, X^n(1,1,k')) \,\big|\, X^n(1,1,k)\big] = Q_{Y|U}^{\otimes n}(\cdot \,|\, U^n(1)),$$
since $X^n(1,1,k')$ is chosen using $\prod_{\ell=1}^{n} Q_{X|U}(\cdot \,|\, U_\ell(1))$; and
• finally, (35) follows by splitting the outer sum depending on whether the realization of the codeword $X^n(1,1,k)$ and $y^n$ are jointly $\delta$-strongly letter typical, where $\delta < \frac{\varepsilon}{4 \log |\mathcal{Y}|}$ and $\mu \triangleq \min\{Q_{X,Y,U}(x,y,u): (x,y,u) \in \mathrm{supp}(Q_{X,Y,U})\}$.
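As a sanity check, the right side of (35) can be evaluated numerically. A sketch with hypothetical parameter values consistent with (22) (i.e., $R_x' = I(X;Y|U) + \varepsilon/2$ and $\delta < \varepsilon/(4\log_2|\mathcal{Y}|)$); both terms eventually decay in $n$, the second only slowly since it behaves like $n\, e^{-n\delta^2\mu}$:

```python
import numpy as np

def rhs_35(n, I_xy_u, Rx_p, delta, mu, cx, cy, cu):
    """Right side of (35), logs in bits: log(1 + 2^{n(I(X;Y|U) - R_x' + 2 delta log|Y|)})
    + 2|X||Y||U| e^{-n delta^2 mu} log(1 + mu^{-n})."""
    t1 = np.log2(1.0 + 2.0 ** (n * (I_xy_u - Rx_p + 2 * delta * np.log2(cy))))
    # log2(1 + mu^{-n}) = -n log2(mu) + log2(1 + mu^n), written to avoid overflow:
    t2 = (2 * cx * cy * cu * np.exp(-n * delta**2 * mu)
          * (-n * np.log2(mu) + np.log2(1 + mu**n)))
    return t1 + t2

# Hypothetical values: eps = 0.4, so Rx' = I(X;Y|U) + 0.2 and delta < 0.1 for binary Y.
I_xy_u, mu = 0.30, 0.10
Rx_p, delta = I_xy_u + 0.2, 0.09
for n in (3_000, 10_000, 30_000):
    print(n, rhs_35(n, I_xy_u, Rx_p, delta, mu, 2, 2, 2))
```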
Note that because of the choice of $R_x'$ in (22), the bound in (35) approaches $0$ as $n \to \infty$. Thus, for sufficiently large $n$, there must exist a codebook $\mathcal{C}^*$ such that $\mathbb{V}(Q_{\hat{U}^n}, Q_U^{\otimes n})$ […]

Lemma 2: […] $\rho > R - I(A; B)$. Then, there exists $\phi_{\mathcal{C}_n}: \mathcal{B}^n \times \llbracket 1, 2^{n\rho} \rrbracket \to \llbracket 1, 2^{nR} \rrbracket$ (that depends on $\mathcal{C}_n$) such that
$$\lim_{n \to \infty} \mathbb{E}\big[\mathbb{V}(Q_{\phi(\tilde{B}^n, S), \tilde{B}^n},\, Q_{L, \tilde{B}^n})\big] = 0, \tag{52}$$
where $Q_{L, \tilde{B}^n}$ is the joint p.m.f. of $(L, \tilde{B}^n)$ induced by $\mathcal{C}_n$.

Proof: Let $\delta, \varepsilon > 0$ be chosen such that
$$\rho - R + I(A; B) - 4\delta \log_2(|\mathcal{A}||\mathcal{B}|) > \varepsilon. \tag{53}$$
By the random codebook construction, $(A^n(L), \tilde{B}^n)$ behaves as the output of a discrete memoryless source $Q_{AB}$. Hence, by [8, Theorem 1.1], it follows that
$$\mathbb{P}\big[(A^n(L), \tilde{B}^n) \notin T_\delta^n[Q_{AB}]\big] \le 2M e^{-n\delta^2 \mu}, \tag{54}$$
where $M \triangleq |\mathcal{A}||\mathcal{B}|$ and $\mu \triangleq \min_{(a,b) \in \mathrm{supp}(Q_{AB})} Q_{AB}(a, b)$. Now, for a codebook $\mathcal{C}_\circ \triangleq \{a^n(l)\}_{l \in \llbracket 1, 2^{nR} \rrbracket}$ and $b^n \in \mathcal{B}^n$, let
$$\mathcal{L}_{\mathcal{C}_\circ}(b^n) \triangleq \big\{l: (a^n(l), b^n) \in T_\delta^n[Q_{AB}]\big\}. \tag{55}$$
From Lemma 3 below, the following holds for sufficiently large $n$:
$$\mathbb{E}\big[|\mathcal{L}_{\mathcal{C}}(\tilde{B}^n)|\big] \le 2^{1 + n(R - I(A;B) + 2\delta \log_2 M)}. \tag{56}$$
Let $\mathcal{F}$ be the collection of codebooks $\mathcal{C}_0 \triangleq \{a^n(l)\}_{l \in \llbracket 1, 2^{nR} \rrbracket}$ such that the following hold:
$$\mathbb{P}\big[(a^n(L), \tilde{B}^n) \notin T_\delta^n[Q_{AB}] \,\big|\, \mathcal{C} = \mathcal{C}_0\big] \le \sqrt{2M e^{-n\delta^2 \mu}}, \tag{57}$$
$$\frac{\mathbb{E}\big[|\mathcal{L}_{\mathcal{C}}(\tilde{B}^n)| \,\big|\, \mathcal{C} = \mathcal{C}_0\big]}{\mathbb{E}\big[|\mathcal{L}_{\mathcal{C}}(\tilde{B}^n)|\big]} \le 2^{n\delta \log_2 M}. \tag{58}$$
By Markov's inequality, we then have
$$\mathbb{P}[\mathcal{C} \notin \mathcal{F}] \le \sqrt{2M e^{-n\delta^2 \mu}} + 2^{-n\delta \log_2 M}. \tag{59}$$
Now, pick $\mathcal{C}^* \triangleq \{a^{*n}(l)\}_{l \in \llbracket 1, 2^{nR} \rrbracket} \in \mathcal{F}$ and define $\mathcal{G}_{\mathcal{C}^*}$ as the set of all $b^n$ such that
$$\mathbb{P}\big[(a^{*n}(L), \tilde{B}^n) \notin T_\delta^n[Q_{AB}] \,\big|\, \tilde{B}^n = b^n, \mathcal{C} = \mathcal{C}^*\big] \le \sqrt[4]{2M e^{-n\delta^2 \mu}}, \tag{60}$$
$$|\mathcal{L}_{\mathcal{C}^*}(b^n)| \le 2^{1 + n(R - I(A;B) + 4\delta \log_2 M)}. \tag{61}$$
Again, by Markov's inequality, it follows that
$$\mathbb{P}[\tilde{B}^n \notin \mathcal{G}_{\mathcal{C}^*} \,|\, \mathcal{C} = \mathcal{C}^*] \le \eta_0 \triangleq \sqrt[4]{2M e^{-n\delta^2 \mu}} + 2^{-n\delta \log_2 M}.$$
Further, it also follows that for each $b^n \in \mathcal{G}_{\mathcal{C}^*}$,
$$\sum_{l \notin \mathcal{L}_{\mathcal{C}^*}(b^n)} Q_{L|\tilde{B}^n}(l \,|\, b^n) \le \sqrt[4]{2M e^{-n\delta^2 \mu}}. \tag{62}$$
Thus, by Lemma 4, we see that given a random seed $S \sim \mathrm{unif}(\llbracket 1, 2^{n\rho} \rrbracket)$, for each $b^n \in \mathcal{G}_{\mathcal{C}^*}$ we can construct $f_{b^n}: \llbracket 1, 2^{n\rho} \rrbracket \to \llbracket 1, 2^{nR} \rrbracket$ with
$$\|Q_{f_{b^n}(S)} - Q_{L|\tilde{B}^n = b^n}\|_1 \le \frac{|\mathcal{L}_{\mathcal{C}^*}(b^n)|}{2^{n\rho}} + \sqrt[4]{2M e^{-n\delta^2 \mu}} \tag{63}$$
$$\stackrel{(61)}{\le} 2^{1 + n(R - I(A;B) + 4\delta \log_2 M - \rho)} + \sqrt[4]{2M e^{-n\delta^2 \mu}} \tag{64}$$
$$\stackrel{(53)}{\le} \eta \triangleq 2^{1 - n\varepsilon} + \sqrt[4]{2M e^{-n\delta^2 \mu}}. \tag{65}$$
We can now glue these functions together to define
$$\Lambda_{\mathcal{C}^*}(b^n, S) \triangleq \begin{cases} f_{b^n}(S), & b^n \in \mathcal{G}_{\mathcal{C}^*}, \\ l^*, & b^n \notin \mathcal{G}_{\mathcal{C}^*}, \end{cases} \tag{66}$$
where $l^* \in \llbracket 1, 2^{nR} \rrbracket$ is arbitrary. By construction, for the selected code $\mathcal{C}^*$, we now have
$$\sum_{b^n \in \mathcal{G}_{\mathcal{C}^*}} Q_{\tilde{B}^n}(b^n)\, \|Q_{\Lambda_{\mathcal{C}^*}(b^n, S)} - Q_{L|\tilde{B}^n = b^n}\|_1 \le \eta,$$
$$\sum_{b^n \in \mathcal{B}^n} Q_{\tilde{B}^n}(b^n)\, \|Q_{\Lambda_{\mathcal{C}^*}(b^n, S)} - Q_{L|\tilde{B}^n = b^n}\|_1 \le \eta + 2\eta_0.$$
Since the RHS does not depend on the choice of $\mathcal{C}^*$ in $\mathcal{F}$,
$$\mathbb{E}\big[\|Q_{\Lambda_{\mathcal{C}}(\tilde{B}^n, S), \tilde{B}^n} - Q_{L, \tilde{B}^n}\|_1 \,\big|\, \mathcal{C} \in \mathcal{F}\big] \le \eta + 2\eta_0.$$
Next, using the fact that the variational distance between two p.m.f.s is at most 2, we also have
$$\mathbb{E}\big[\|Q_{\Lambda_{\mathcal{C}}(\tilde{B}^n, S), \tilde{B}^n} - Q_{L, \tilde{B}^n}\|_1 \,\big|\, \mathcal{C} \notin \mathcal{F}\big] \le 2.$$
Finally, combining the above two bounds and using (59) completes the claim.

Lemma 3: Consider the setup of Lemma 2. Let $\mathcal{L}_{\mathcal{C}}(\cdot)$ be as defined in (55). Then, for $n$ large,
$$\mathbb{E}\big[|\mathcal{L}_{\mathcal{C}}(\tilde{B}^n)|\big] \le 2^{1 + n(R - I(A;B) + 2\delta \log_2(|\mathcal{A}||\mathcal{B}|))}. \tag{67}$$
Proof: Owing to the random codebook construction,
$$\mathbb{E}\big[|\mathcal{L}_{\mathcal{C}}(\tilde{B}^n)|\big] = \mathbb{E}\big[|\mathcal{L}_{\mathcal{C}}(\tilde{B}^n)| \,\big|\, L = 1\big] \tag{68}$$
$$= \sum_{l} \mathbb{E}\big[\mathbb{1}\{l \in \mathcal{L}_{\mathcal{C}}(\tilde{B}^n)\} \,\big|\, L = 1\big]. \tag{69}$$
Since the codewords are chosen randomly, $\mathbb{E}[\mathbb{1}\{l \in \mathcal{L}_{\mathcal{C}}(\tilde{B}^n)\} \,|\, L = 1]$ is identical for all $l \ge 2$. Hence,
$$\mathbb{E}\big[|\mathcal{L}_{\mathcal{C}}(\tilde{B}^n)|\big] \le 1 + (2^{nR} - 1)\, \mathbb{E}\big[\mathbb{1}\{2 \in \mathcal{L}_{\mathcal{C}}(\tilde{B}^n)\} \,\big|\, L = 1\big]. \tag{70}$$
Clearly, $\mathbb{E}[\mathbb{1}\{2 \in \mathcal{L}_{\mathcal{C}}(\tilde{B}^n)\} \,|\, L = 1]$ is exactly the probability that realizations $A^n \sim Q_A^{\otimes n}$ and $B^n \sim Q_B^{\otimes n}$, selected independently of one another, are jointly $\delta$-letter typical. Thus, by [8, Theorem 1.1], it follows that
$$\mathbb{E}\big[\mathbb{1}\{2 \in \mathcal{L}_{\mathcal{C}}(\tilde{B}^n)\} \,\big|\, L = 1\big] = \sum_{(a^n, b^n) \in T_\delta^n[Q_{AB}]} Q_A^{\otimes n}(a^n)\, Q_B^{\otimes n}(b^n) \le 2^{-n(I(A;B) - 2\delta \log_2 |\mathcal{A}||\mathcal{B}|)}.$$
Combining the above bound with (70) completes the proof.

Lemma 4: Let $Q$ be a p.m.f. on a finite set $\mathcal{A}$ such that there exists $\mathcal{B} \subseteq \mathcal{A}$ with $|\mathcal{B}| = M$ and $\sum_{b \in \mathcal{B}} Q(b) \ge 1 - \varepsilon$ for some $0 < \varepsilon < 1$. Now, suppose that $L \sim \mathrm{unif}(\llbracket 1, \ell \rrbracket)$. Then, there exists $f: \llbracket 1, \ell \rrbracket \to \mathcal{A}$ such that $Q_{f(L)}$, the p.m.f. of $f(L)$, satisfies $\|Q_{f(L)} - Q\|_1 \le \varepsilon + \frac{M}{\ell}$.

Proof: Let $b_1, b_2, \ldots, b_M$ be an ordering of $\mathcal{B}$. Let $p_0 = 0$, and for $1 \le i \le M$, let $p_i \triangleq \sum_{j=1}^{i} Q(b_j)$ denote the cumulative mass function. Now, let $N_i \triangleq \lfloor p_i \ell \rfloor$, $i = 0, \ldots, M$, and let $f: \llbracket 1, N_M \rrbracket \to \mathcal{B}$ be defined by the pre-images $f^{-1}(b_i) = \{N_{i-1} + 1, \ldots, N_i\}$, $i = 1, \ldots, M$. Fig. 3 provides an illustration of these operations. Now, by construction, we have
$$0 \le p_i - \mathbb{P}[f(L) \in \{b_1, \ldots, b_i\}] \le \ell^{-1}, \quad i = 1, \ldots, M. \tag{71}$$

Fig. 3. An illustration of approximating a p.m.f. using a function of a uniform RV: the interval $[0, p_M]$ is partitioned into segments of lengths $Q(b_1), \ldots, Q(b_M)$, and the indices $N_{i-1}+1, \ldots, N_i$ are mapped to $b_i$.
Consequently, we also have, for any $i = 1, \ldots, M$,
$$-\ell^{-1} \le p_i - p_{i-1} - Q_{f(L)}(b_i) = Q(b_i) - Q_{f(L)}(b_i) \le \ell^{-1}. \tag{72}$$
Hence, we see that
$$\sum_{a \in \mathcal{A}} |Q(a) - Q_{f(L)}(a)| = \sum_{i=1}^{M} |Q(b_i) - Q_{f(L)}(b_i)| + \mathbb{P}[A \notin \mathcal{B}] \stackrel{(72)}{\le} \frac{M}{\ell} + \varepsilon. \tag{73}$$
Lemma 5 (Theorem 2.2.2 [6]): Let $X^n$ be i.i.d. according to $p_X$. Then, for each $R < H(X)$, there exists a sequence of mappings $\{\phi_{n,R}: \mathcal{X}^n \to \llbracket 1, 2^{nR} \rrbracket\}_{n \in \mathbb{N}}$ such that
$$\lim_{n \to \infty} \mathbb{V}\big(Q_{\phi_{n,R}(X^n)}, \mathrm{unif}(\llbracket 1, 2^{nR} \rrbracket)\big) = 0. \tag{74}$$
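Lemma 5 is Han's intrinsic-randomness result. The sketch below illustrates the guarantee (not the explicit construction in [6]) via uniform random binning: with $R < H(X)$, hashing $X^n$ into $2^{nR}$ bins yields an output p.m.f. close to uniform.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)

p = np.array([0.6, 0.4])          # Bernoulli source, H(X) ~ 0.971 bits
n, R = 14, 0.5                    # extraction rate R < H(X)
num_bins = int(2 ** (n * R))      # 2^{nR} output values

# Random binning: map each x^n to a uniformly chosen bin and accumulate the
# exact source probability mass landing in each bin.
bin_mass = np.zeros(num_bins)
for seq in itertools.product(range(2), repeat=n):
    bin_mass[rng.integers(num_bins)] += np.prod(p[list(seq)])

V = np.abs(bin_mass - 1.0 / num_bins).sum()   # variational distance to uniform
print(f"V(Q_phi(X^n), unif) = {V:.4f}")        # vanishes as n grows, per (74)
```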
REFERENCES

[1] T. S. Han, "Folklore in source coding: Information-spectrum approach," IEEE Transactions on Information Theory, vol. 51, no. 2, pp. 747–753, February 2005.
[2] M. Hayashi, "Second-order asymptotics in fixed-length source coding and intrinsic randomness," IEEE Transactions on Information Theory, vol. 54, no. 10, pp. 4619–4637, October 2008.
[3] R. Chou and M. Bloch, "Data compression with nearly uniform output," 2013 IEEE International Symposium on Information Theory, pp. 1979–1983, 2013.
[4] B. N. Vellambi, M. Bloch, R. Chou, and J. Kliewer, "Lossless and lossy source compression with near-uniform output: Is common randomness always required?" 2015 IEEE International Symposium on Information Theory, pp. 2171–2175, 2015.
[5] I. Kontoyiannis, "Pointwise redundancy in lossy data compression and universal lossy data compression," IEEE Transactions on Information Theory, vol. 46, no. 1, pp. 136–152, January 2000.
[6] T. S. Han, Information-Spectrum Methods in Information Theory, 1st ed. Springer, 2003.
[7] P. Cuff and E. Song, "The likelihood encoder for source coding," in 2013 IEEE Information Theory Workshop, Sept 2013, pp. 1–2.
[8] G. Kramer, "Topics in multi-user information theory," Found. Trends Commun. Inf. Theory, vol. 4, no. 4-5, pp. 265–444, 2007.