Non-Asymptotic and Asymptotic Analyses on Markov Chains in Several Problems

Masahito Hayashi* and Shun Watanabe†

*Graduate School of Mathematics, Nagoya University, Japan, and Centre for Quantum Technologies, National University of Singapore, Singapore. E-mail: [email protected]
†Department of Information Science and Intelligent Systems, University of Tokushima, Japan, and Institute for Systems Research, University of Maryland, College Park. E-mail: [email protected]

Abstract—In this paper, we derive non-asymptotic achievability and converse bounds on the source coding with side-information and the random number generation with side-information. Our bounds are efficiently computable in the sense that the computational complexity does not depend on the block length. We also characterize the asymptotic behaviors of the large deviation regime and the moderate deviation regime by using our bounds, which implies that our bounds are asymptotically tight in those regimes. We also show the second order rates of those problems, and derive single letter forms of the variances characterizing the second order rates.
I. INTRODUCTION

The non-asymptotic analyses of coding problems have been attracting considerable attention recently [1], [2]. In this paper, we further develop the non-asymptotic analyses for the fixed-length source coding with (full) side-information at the decoder and the uniform random number generation with side-information¹. In particular, we are interested in the case where the underlying sources are Markov chains.

1 The uniform random number generation with side-information is also known as the privacy amplification [3], [4].

A. Motivation

To begin with, we shall explain the motivations of this paper. Although the problems treated in this paper are not the channel coding, we consider the channel coding here to explain those motivations. Although it is not stated explicitly in the literature, we believe that there are two important criteria for non-asymptotic bounds:
• computational complexity, and
• asymptotic optimality.

So far, many types of non-asymptotic achievability bounds have been proposed. For example, Verdú and Han derived a non-asymptotic bound by using the information spectrum approach in order to derive the general formula [5] (see also [6]), which we call the information-spectrum bound. One of the authors and Nagaoka derived a bound (for the classical-quantum channel) by relating the error probability to binary hypothesis testing [7, Remark 15] (see also [8]), which we call the hypothesis testing bound. Polyanskiy et al. derived the RCU (random coding union) bound and the DT (dependence testing) bound [1]². There is also Gallager's bound [9].

2 A bound slightly looser (the coefficients are worse) than the DT bound can be derived from the hypothesis testing bound of [7].

Let us first consider the first criterion, i.e., the computational complexity. For the BSC, the computational complexity of the RCU bound is O(n²) and that of the DT bound is O(n) [10]. However, the computational complexities of these bounds are much larger for general DMCs or channels with memory. It is known that the hypothesis testing bound can be described as a linear program (e.g., see [11], [12]³) and can be efficiently computed under certain symmetry. However, the number of variables in the linear program grows exponentially in the block length, and it is difficult to compute in general. The computation of the information-spectrum bound depends on the evaluation of a tail probability. The information-spectrum bound is less operational than the hypothesis testing bound in the sense of the hierarchy introduced in [11], and the computational complexity of the former is much smaller than that of the latter. However, the computation of a tail probability is still not easy unless the channel is a DMC. For DMCs, the computational complexity of Gallager's bound is O(1) since the Gallager function is an additive quantity for DMCs. However, this is not the case if there is memory⁴. Consequently, no bound that is efficiently computable for the Markov chain has been available so far. The situation is the same for the source coding with side-information.

3 In the case of the quantum channel, the bound is described as a semi-definite program.
4 The Gallager bound for finite state channels was considered in [13, Section 5.9], but a closed form expression for the exponent was not derived.

Let us now move to achievability bounds on the random number generation with side-information, i.e., the privacy amplification. Renner derived a bound by using the leftover hash lemma and the smooth min-entropy [14], which we call the smooth min-entropy bound. By combining this bound and the information-spectrum approach, Tomamichel and one of the authors derived another bound [11], which we call the inf-spectral entropy bound. One of the authors also derived another bound by using the leftover hash lemma, the approximate smoothing of the Rényi entropy of order 2, and the large deviation technique [15], which we call the exponential bound. Further, the authors compared the inf-spectral entropy bound and the exponential bound [16]. It turned out that the former is tighter than the latter when the required security level ε is rather large, and the latter is tighter than the former when ε is rather small. A bound that interpolates between the two was also derived in [16], which we call the hybrid bound. Regarding the computational complexity issue of the privacy amplification, the situation is the same as for the coding problems, i.e., there is no bound that is efficiently computable for the Markov chain. The smooth min-entropy bound can be computed by linear programming for rather small block lengths, but it is difficult to compute in general. The computation of the inf-spectral entropy bound depends on the evaluation of a tail probability, and it is also difficult to compute in general. The exponential bound is described by the Gallager function, and thus can be easily computed provided that the source is memoryless. As described above, there is no bound that is efficiently computable for the Markov chain, and the first purpose of this paper is to derive non-asymptotic bounds that are efficiently computable.

Next, let us consider the second criterion, i.e., asymptotic optimality. So far, three kinds of asymptotic regimes have been studied in information theory [1], [2], [17], [18], [19], [20], [21]:
• the large deviation regime, in which the error probability ε asymptotically behaves like $e^{-nr}$ for some r > 0,
• the moderate deviation regime, in which ε asymptotically behaves like $e^{-n^{1-2t} r}$ for some r > 0 and t ∈ (0, 1/2), and
• the second order regime, in which ε is a constant.
We claim that a good non-asymptotic bound should be asymptotically optimal in at least one of the three regimes mentioned above. In fact, the information-spectrum bound, the hypothesis testing bound, and the DT bound are asymptotically optimal in the moderate deviation regime and the second order regime; the Gallager bound is asymptotically optimal in the large deviation regime; and the RCU bound is asymptotically optimal in all the regimes⁵.

5 The Gallager bound and the RCU bound are asymptotically optimal in the large deviation regime only up to the critical rate.

B. Main Contribution for Non-Asymptotic Analysis

To derive non-asymptotic achievability bounds on the problems, we basically use exponential type bounds⁶ for the single-shot setting. For the source coding with side-information and the random number generation with side-information, we consider two assumptions on transition matrices (see Assumption 1 and Assumption 2 of Section II). Although a computable form of the conditional entropy rate is not known in general, Assumption 1, which is less restrictive than Assumption 2, enables us to derive a computable form of the conditional entropy rate.

6 For the channel coding, it corresponds to the Gallager bound.

In the problems with side-information, exponential type bounds are described by conditional Rényi entropies. There are several definitions of conditional Rényi entropies (see [22], [23] for an extensive review); we use the one defined in [24] and the one defined by Arimoto [25]. We call the former the lower conditional Rényi entropy (cf. (2)) and the latter the upper conditional Rényi entropy (cf. (7)). To derive non-asymptotic bounds, we need to evaluate these information measures for the Markov chain. For this purpose, under Assumption 1, we introduce the lower conditional Rényi entropy for transition matrices (cf. (21)). Then, we evaluate the lower conditional Rényi entropy for the Markov chain in terms of its transition matrix counterpart. This evaluation gives non-asymptotic bounds for the coding and random number generation problems under Assumption 1. Under the more restrictive assumption, i.e., Assumption 2, we also introduce the upper conditional Rényi entropy for a transition matrix (cf. (26)). Then, we evaluate the upper conditional Rényi entropy for the Markov chain in terms of its transition matrix counterpart. This evaluation gives non-asymptotic bounds that are tighter than those obtained under Assumption 1.

We also derive converse bounds for every problem by using the change of measure argument developed by the authors in the accompanying paper on information geometry [26]. To derive converse bounds for the problems with side-information, we further introduce a two-parameter conditional Rényi entropy and its transition matrix counterpart (cf. (14) and (30)). This novel information measure includes the lower conditional Rényi entropy and the upper conditional Rényi entropy as special cases.

Here, we would like to remark on terminologies. There are a few ways to express exponential type bounds. In statistics or the large deviation theory, the cumulant generating function (CGF) is usually used to describe exponents. In information theory, the Gallager function or the Rényi entropies are used. Although these three terminologies are essentially the same and are related by changes of variables, the CGF and the Gallager function are convenient for some calculations since they have good properties such as convexity. However, they are merely mathematical functions. On the other hand, the Rényi entropies are information measures that include Shannon's information measures as special cases. Thus, the Rényi entropies are intuitively familiar in the field of information theory. The Rényi entropies also have the advantage that two types of bounds can be expressed in a unified manner. For these reasons, we state our main results in terms of the Rényi entropies, while we use the CGF and the Gallager function in the proofs.
C. Main Contribution for Asymptotic Analysis

For the asymptotic analyses of the large deviation and the moderate deviation regimes, we derive the characterizations⁷ by using our non-asymptotic achievability and converse bounds, which implies that our non-asymptotic bounds are tight in the large deviation regime and the moderate deviation regime. We also derive the second order rate. It is also clarified that the reciprocal coefficient of the moderate deviation regime and the variance of the second order regime coincide. Furthermore, a single letter form of the variance is clarified⁸.

7 For the large deviation regime, we only derive the characterizations up to the critical rates.
8 An alternative way to derive a single letter characterization of the variance for the Markov chain was shown in [27, Lemma 20]. It should also be noted that a single letter characterization can be derived by using the fundamental matrix [28]. The single letter characterization of the variance in [17, Section VII] and [2, Section III] has an error, which is corrected in this paper.
D. Organization of Paper

In Section II, we introduce the information measures that will be needed in later sections. Then, we consider the source coding with side-information and the uniform random number generation with side-information in Section III and Section IV, respectively. Omitted results and proofs can be found in [29].

II. INFORMATION MEASURES

In this section, we introduce information measures that will be used in later sections.

A. Information Measures for Single-Shot Setting

In this section, we introduce conditional Rényi entropies for the single-shot setting. For a more detailed review of conditional Rényi entropies, see [23]. For a correlated random variable (X, Y) on $\mathcal{X} \times \mathcal{Y}$ with probability distribution $P_{XY}$ and a marginal distribution $Q_Y$ on $\mathcal{Y}$, we introduce the conditional Rényi entropy of order $1+\theta$ relative to $Q_Y$ as

$$H_{1+\theta}(P_{XY}|Q_Y) := -\frac{1}{\theta} \log \sum_{x,y} P_{XY}(x,y)^{1+\theta} Q_Y(y)^{-\theta}, \qquad (1)$$

where $\theta \in (-1,0) \cup (0,\infty)$. The conditional Rényi entropy of order 1 relative to $Q_Y$ is defined by the limit with respect to $\theta$.

One of the important special cases of $H_{1+\theta}(P_{XY}|Q_Y)$ is the case with $Q_Y = P_Y$. We call this special case the lower conditional Rényi entropy of order $1+\theta$ and denote it by⁹

$$H^{\downarrow}_{1+\theta}(X|Y) := H_{1+\theta}(P_{XY}|P_Y) \qquad (2)$$
$$= -\frac{1}{\theta} \log \sum_{x,y} P_{XY}(x,y)^{1+\theta} P_Y(y)^{-\theta}. \qquad (3)$$

9 This notation was first introduced in [30].

The following property holds.

Lemma 1 We have

$$\lim_{\theta \to 0} H^{\downarrow}_{1+\theta}(X|Y) = H(X|Y) \qquad (4)$$

and

$$V(X|Y) := \mathrm{Var}\left[ \log \frac{1}{P_{X|Y}(X|Y)} \right] \qquad (5)$$
$$= \lim_{\theta \to 0} \frac{2\left[ H(X|Y) - H^{\downarrow}_{1+\theta}(X|Y) \right]}{\theta}. \qquad (6)$$

The other important special case of $H_{1+\theta}(P_{XY}|Q_Y)$ is the measure maximized over $Q_Y$. We call this special case the upper conditional Rényi entropy of order $1+\theta$ and denote it by¹⁰

$$H^{\uparrow}_{1+\theta}(X|Y) := \max_{Q_Y \in \mathcal{P}(\mathcal{Y})} H_{1+\theta}(P_{XY}|Q_Y) \qquad (7)$$
$$= H_{1+\theta}(P_{XY}|P_Y^{(1+\theta)}) \qquad (8)$$
$$= -\frac{1+\theta}{\theta} \log \sum_y P_Y(y) \left[ \sum_x P_{X|Y}(x|y)^{1+\theta} \right]^{\frac{1}{1+\theta}}, \qquad (9)\text{--}(10)$$

where

$$P_Y^{(1+\theta)}(y) := \frac{\left[ \sum_x P_{XY}(x,y)^{1+\theta} \right]^{\frac{1}{1+\theta}}}{\sum_{y'} \left[ \sum_x P_{XY}(x,y')^{1+\theta} \right]^{\frac{1}{1+\theta}}}. \qquad (11)$$

10 For $-1 < \theta < 0$, (9) can be proved by using the Hölder inequality, and, for $0 < \theta$, (9) can be proved by using the reverse Hölder inequality [31, Lemma 8].

For this measure, we also have properties similar to Lemma 1.

Lemma 2 We have

$$\lim_{\theta \to 0} H^{\uparrow}_{1+\theta}(X|Y) = H(X|Y) \qquad (12)$$

and

$$\lim_{\theta \to 0} \frac{2\left[ H(X|Y) - H^{\uparrow}_{1+\theta}(X|Y) \right]}{\theta} = V(X|Y). \qquad (13)$$

When we derive converse bounds on the source coding with side-information or the random number generation with side-information, we need to consider the case such that the order of the Rényi entropy and the order of the conditioning distribution defined in (11) are different. For this purpose, we introduce the two-parameter conditional Rényi entropy:

$$H_{1+\theta,1+\theta'}(X|Y) := H_{1+\theta}(P_{XY}|P_Y^{(1+\theta')}) \qquad (14)$$
$$= -\frac{1}{\theta} \log \sum_y P_Y(y) \left[ \sum_x P_{X|Y}(x|y)^{1+\theta} \right] \left[ \sum_x P_{X|Y}(x|y)^{1+\theta'} \right]^{-\frac{\theta}{1+\theta'}} + \frac{\theta'}{1+\theta'} H^{\uparrow}_{1+\theta'}(X|Y). \qquad (15)\text{--}(17)$$
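As a concrete illustration (not part of the paper), the single-shot quantities (3) and (9) can be evaluated directly from a joint probability table. The following minimal Python/NumPy sketch does exactly that; the function names and the toy distribution are our own choices.

```python
import numpy as np

def lower_cond_renyi(P_xy, theta):
    """Lower conditional Renyi entropy H^down_{1+theta}(X|Y) of eq. (3).

    P_xy is a |X| x |Y| array holding the joint probabilities P_XY(x, y).
    """
    P_y = P_xy.sum(axis=0)                  # marginal P_Y(y)
    cols = P_y > 0                          # restrict to the support of P_Y
    s = np.sum(P_xy[:, cols] ** (1 + theta) * P_y[cols] ** (-theta))
    return -np.log(s) / theta

def upper_cond_renyi(P_xy, theta):
    """Upper conditional Renyi entropy H^up_{1+theta}(X|Y) of eq. (9)."""
    inner = np.sum(P_xy ** (1 + theta), axis=0) ** (1.0 / (1 + theta))
    return -(1 + theta) / theta * np.log(inner.sum())

# Toy joint distribution; by (7) the upper entropy is never below the lower one.
P_xy = np.array([[0.50, 0.05],
                 [0.20, 0.25]])
print(lower_cond_renyi(P_xy, 0.5), upper_cond_renyi(P_xy, 0.5))
```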
B. Information Measures for Transition Matrix

Let $\{W(x,y|x',y')\}_{((x,y),(x',y')) \in (\mathcal{X}\times\mathcal{Y})^2}$ be an ergodic and irreducible transition matrix. The purpose of this section is to introduce transition matrix counterparts of the measures in Section II-A. For this purpose, we first need to introduce some assumptions on transition matrices.

Assumption 1 (Non-Hidden) We say that a transition matrix $W$ is non-hidden if

$$\sum_x W(x,y|x',y') = W(y|y') \qquad (18)$$

for every $x' \in \mathcal{X}$ and $y, y' \in \mathcal{Y}$.

Assumption 2 (Strongly Non-Hidden) We say that a transition matrix $W$ is strongly non-hidden if, for every $\theta \in (-1,\infty)$ and $y, y' \in \mathcal{Y}$,

$$W_\theta(y|y') := \sum_x W(x,y|x',y')^{1+\theta} \qquad (19)$$

is well defined, i.e., the right hand side of (19) is independent of $x'$.

Assumption 1 requires (19) to hold only for $\theta = 0$, and thus Assumption 2 implies Assumption 1. However, Assumption 2 is a strictly stronger condition than Assumption 1. For example, consider the case where the transition matrix has a product form, i.e., $W(x,y|x',y') = W(x|x')W(y|y')$. In this case, Assumption 1 is obviously satisfied, but Assumption 2 is not satisfied in general.
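As an aside (not from the paper), both assumptions can be probed numerically for a transition matrix stored as a four-dimensional array indexed by (x, y, x', y'); the array layout and function name below are our own assumptions.

```python
import numpy as np

def satisfies_assumption(W, thetas=(0.0,)):
    """Check that sum_x W(x,y|x',y')^(1+theta) does not depend on x'.

    W has shape (|X|, |Y|, |X|, |Y|), indexed as W[x, y, xp, yp].
    thetas=(0.0,) probes Assumption 1; a grid of theta values probes
    Assumption 2 (which formally requires every theta > -1).
    """
    for theta in thetas:
        W_theta = np.sum(W ** (1 + theta), axis=0)    # shape (|Y|, |X|, |Y|)
        spread = W_theta.max(axis=1) - W_theta.min(axis=1)
        if not np.allclose(spread, 0.0):
            return False
    return True
```

For a product-form matrix $W(x,y|x',y') = W(x|x')W(y|y')$, such a check passes for θ = 0 but generally fails for θ ≠ 0, matching the remark above.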
First, we introduce information measures under Assumption 1. In order to define a transition matrix counterpart of (2), let us introduce the following tilted matrix:

$$\tilde{W}_\theta(x,y|x',y') := W(x,y|x',y')^{1+\theta} W(y|y')^{-\theta}. \qquad (20)$$

Let $\lambda_\theta$ be its Perron-Frobenius eigenvalue and $\tilde{P}_{\theta,XY}$ be its normalized eigenvector. Then, we define the lower conditional Rényi entropy for $W$ by

$$H^{\downarrow,W}_{1+\theta}(X|Y) := -\frac{1}{\theta} \log \lambda_\theta, \qquad (21)$$

where $\theta \in (-1,0) \cup (0,\infty)$. For $\theta = 0$, we define the lower conditional Rényi entropy for $W$ by

$$H^{\downarrow,W}_1(X|Y) := \lim_{\theta\to 0} H^{\downarrow,W}_{1+\theta}(X|Y) \qquad (22)$$
$$= H^W(X|Y), \qquad (23)$$

and we just call it the conditional entropy for $W$. As a counterpart of (6), we also define

$$V^W(X|Y) := \lim_{\theta\to 0} \frac{2\left[ H^W(X|Y) - H^{\downarrow,W}_{1+\theta}(X|Y) \right]}{\theta}. \qquad (24)$$

Next, we introduce information measures under Assumption 2. In order to define a transition matrix counterpart of (7), let us introduce the following $|\mathcal{Y}| \times |\mathcal{Y}|$ matrix:

$$K_\theta(y|y') := W_\theta(y|y')^{\frac{1}{1+\theta}}, \qquad (25)$$

where $W_\theta$ is defined by (19). Let $\kappa_\theta$ be its Perron-Frobenius eigenvalue. Then, we define the upper conditional Rényi entropy for $W$ by

$$H^{\uparrow,W}_{1+\theta}(X|Y) := -\frac{1+\theta}{\theta} \log \kappa_\theta, \qquad (26)$$

where $\theta \in (-1,0) \cup (0,\infty)$. We have the following properties.

Lemma 3 We have

$$\lim_{\theta\to 0} H^{\uparrow,W}_{1+\theta}(X|Y) = H^W(X|Y) \qquad (27)$$

and

$$\lim_{\theta\to 0} \frac{2\left[ H^W(X|Y) - H^{\uparrow,W}_{1+\theta}(X|Y) \right]}{\theta} = V^W(X|Y). \qquad (28)$$

Now, let us introduce a transition matrix counterpart of (14). For this purpose, we introduce the following $|\mathcal{Y}| \times |\mathcal{Y}|$ matrix:

$$N_{\theta,\theta'}(y|y') := W_\theta(y|y') W_{\theta'}(y|y')^{-\frac{\theta}{1+\theta'}}. \qquad (29)$$

Let $\nu_{\theta,\theta'}$ be the Perron-Frobenius eigenvalue of $N_{\theta,\theta'}$. Then, we define the two-parameter conditional Rényi entropy for $W$ by

$$H^W_{1+\theta,1+\theta'}(X|Y) := -\frac{1}{\theta} \log \nu_{\theta,\theta'} + \frac{\theta'}{1+\theta'} H^{\uparrow,W}_{1+\theta'}(X|Y). \qquad (30)$$

For the information measures introduced in this section, we have the following properties.

Lemma 4
1) The function $\theta H^{\downarrow,W}_{1+\theta}(X|Y)$ is a concave function of $\theta$, and it is strictly concave if and only if $V^W(X|Y) > 0$.
2) $H^{\downarrow,W}_{1+\theta}(X|Y)$ is a monotonically decreasing function of $\theta$.
3) The function $\theta H^{\uparrow,W}_{1+\theta}(X|Y)$ is a concave function of $\theta$, and it is strictly concave if and only if $V^W(X|Y) > 0$.
4) $H^{\uparrow,W}_{1+\theta}(X|Y)$ is a monotonically decreasing function of $\theta$.
5) For every $\theta \in (-1,0) \cup (0,\infty)$, we have $H^{\downarrow,W}_{1+\theta}(X|Y) \le H^{\uparrow,W}_{1+\theta}(X|Y)$.
6) For fixed $\theta'$, the function $\theta H^W_{1+\theta,1+\theta'}(X|Y)$ is a concave function of $\theta$, and it is strictly concave if and only if $V^W(X|Y) > 0$.
7) For fixed $\theta'$, $H^W_{1+\theta,1+\theta'}(X|Y)$ is a monotonically decreasing function of $\theta$.
8) We have

$$H^W_{1+\theta,1}(X|Y) = H^{\downarrow,W}_{1+\theta}(X|Y). \qquad (31)$$

9) We have

$$H^W_{1+\theta,1+\theta}(X|Y) = H^{\uparrow,W}_{1+\theta}(X|Y). \qquad (32)$$

10) For every $\theta \in (-1,0) \cup (0,\infty)$, $H^W_{1+\theta,1+\theta'}(X|Y)$ is maximized at $\theta' = \theta$.
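For intuition, the transition matrix quantities (21) and (26) reduce to ordinary eigenvalue computations. The following sketch (our own illustration, not the paper's implementation; the array layout and names are assumptions, and all entries of W are assumed positive so the tilted matrix is well defined) evaluates them with NumPy.

```python
import numpy as np

def lower_renyi_W(W, W_y, theta):
    """H^{down,W}_{1+theta}(X|Y) from eqs. (20)-(21).

    W[x, y, xp, yp] is the transition matrix of (X, Y); W_y[y, yp] that of Y.
    """
    nx, ny = W.shape[0], W.shape[1]
    tilted = W ** (1 + theta) * W_y[np.newaxis, :, np.newaxis, :] ** (-theta)
    T = tilted.reshape(nx * ny, nx * ny)           # rows (x, y), columns (x', y')
    lam = np.max(np.abs(np.linalg.eigvals(T)))     # Perron-Frobenius eigenvalue
    return -np.log(lam) / theta

def upper_renyi_W(W, theta):
    """H^{up,W}_{1+theta}(X|Y) from eqs. (25)-(26), assuming Assumption 2."""
    W_theta = np.sum(W ** (1 + theta), axis=0)[:, 0, :]   # independent of x' by (19)
    kappa = np.max(np.abs(np.linalg.eigvals(W_theta ** (1.0 / (1 + theta)))))
    return -(1 + theta) / theta * np.log(kappa)
```

Since the matrices involved are nonnegative, the spectral radius computed above coincides with the Perron-Frobenius eigenvalue used in (21) and (26).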
Next, we consider the extreme cases of $H^{\downarrow,W}_{1+\theta}(X|Y)$. Let $\lambda_{-1}$ be the Perron-Frobenius eigenvalue of

$$1[W(x,y|x',y') > 0]\, W(y|y'). \qquad (33)$$

Then, we define

$$H^{\downarrow,W}_0(X|Y) := \log \lambda_{-1}. \qquad (34)$$

On the other hand, let $G_W = (\mathcal{X}\times\mathcal{Y}, E_W)$ be the graph such that $((x',y'),(x,y)) \in E_W$ if and only if $W(x,y|x',y') > 0$. Then, for each $(x,y) \in \mathcal{X}\times\mathcal{Y}$, let $C_{(x,y)}$ be the set of all Hamiltonian cycles from $(x,y)$ to itself. Then, we define

$$H^{\downarrow,W}_\infty(X|Y) := -\log \max_{(\bar{x},\bar{y}) \in \mathcal{X}\times\mathcal{Y}} \max_{c \in C_{(\bar{x},\bar{y})}} \left[ \prod_{((x',y'),(x,y)) \in c} W(x|x',y',y) \right]^{1/|c|}. \qquad (35)\text{--}(36)$$

Lemma 5 We have

$$\lim_{\theta\to -1} H^{\downarrow,W}_{1+\theta}(X|Y) = H^{\downarrow,W}_0(X|Y), \qquad (37)$$
$$\lim_{\theta\to \infty} H^{\downarrow,W}_{1+\theta}(X|Y) = H^{\downarrow,W}_\infty(X|Y). \qquad (38)$$

Next, we consider the extreme cases of $H^{\uparrow,W}_{1+\theta}(X|Y)$. When $W$ satisfies Assumption 2, we note that

$$T(y|y') := |\mathrm{supp}(W(\cdot|x',y',y))|, \qquad (39)$$
$$S(y|y') := \max_x W(x|x',y',y) \qquad (40)$$

are well defined, i.e., the right hand sides of (39) and (40) are independent of $x'$. Let $G_W = (\mathcal{Y}, E_W)$ be the graph such that $(y',y) \in E_W$ if and only if $W(y|y') > 0$. Then, for each $y \in \mathcal{Y}$, let $C_y$ be the set of all Hamiltonian cycles from $y$ to itself. Then, we define

$$H^{\uparrow,W}_0(X|Y) := \log \max_{\bar{y}\in\mathcal{Y}} \max_{c\in C_{\bar{y}}} \left[ \prod_{(y',y)\in c} T(y|y') \right]^{1/|c|}. \qquad (41)$$

On the other hand, let $\kappa_\infty$ be the Perron-Frobenius eigenvalue of $W(y|y')S(y|y')$. Then, we define

$$H^{\uparrow,W}_\infty(X|Y) := -\log \kappa_\infty. \qquad (42)$$

Lemma 6 We have

$$\lim_{\theta\to -1} H^{\uparrow,W}_{1+\theta}(X|Y) = H^{\uparrow,W}_0(X|Y), \qquad (43)$$
$$\lim_{\theta\to \infty} H^{\uparrow,W}_{1+\theta}(X|Y) = H^{\uparrow,W}_\infty(X|Y). \qquad (44)$$

From Statement 1 of Lemma 4, $\frac{d[\theta H^{\downarrow,W}_{1+\theta}(X|Y)]}{d\theta}$ is monotonically decreasing. Thus, we can define the inverse function $\theta(a)$ by

$$\frac{d[\theta H^{\downarrow,W}_{1+\theta}(X|Y)]}{d\theta}\Big|_{\theta=\theta(a)} = a \qquad (45)$$

for $\underline{a} < a \le \overline{a}$, where $\overline{a} := \lim_{\theta\to -1} \frac{d[\theta H^{\downarrow,W}_{1+\theta}(X|Y)]}{d\theta}$ and $\underline{a} := \lim_{\theta\to\infty} \frac{d[\theta H^{\downarrow,W}_{1+\theta}(X|Y)]}{d\theta}$. Let

$$R(a) := (1+\theta(a))a - \theta(a) H^{\downarrow,W}_{1+\theta(a)}(X|Y). \qquad (46)$$

Since

$$R'(a) = 1+\theta(a), \qquad (47)$$

$R(a)$ is a monotonically increasing function for $\underline{a} < a < \overline{a}$. Thus, we can define the inverse function $a(R)$ of $R(a)$ by

$$(1+\theta(a(R)))a(R) - \theta(a(R)) H^{\downarrow,W}_{1+\theta(a(R))}(X|Y) = R \qquad (48)$$

for $R(\underline{a}) < R < H^{\downarrow,W}_0(X|Y)$. For $\theta H^{\uparrow,W}_{1+\theta}(X|Y)$, by the same reason, we can define the inverse function $\theta(a)$ by

$$\frac{d[\theta H^{\uparrow,W}_{1+\theta}(X|Y)]}{d\theta}\Big|_{\theta=\theta(a)} = a, \qquad (49)$$

the function

$$R(a) := (1+\theta(a))a - \theta(a) H^{\uparrow,W}_{1+\theta(a)}(X|Y), \qquad (50)$$

and the inverse function $a(R)$ of $R(a)$ by

$$(1+\theta(a(R)))a(R) - \theta(a(R)) H^{\uparrow,W}_{1+\theta(a(R))}(X|Y) = R \qquad (51)$$

for $R(\underline{a}) < R < H^{\uparrow,W}_0(X|Y)$.
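The inverse functions θ(a) and a(R) have no closed form in general, but since θ ↦ θH^{↓,W}_{1+θ}(X|Y) is concave (Lemma 4), equation (45) can be solved numerically by a one-dimensional root search. The following rough sketch is our own illustration (not a procedure given in the paper); it assumes a single-argument routine such as `lambda t: lower_renyi_W(W, W_y, t)` from the earlier snippet, handling θ near 0 by its limiting value.

```python
import numpy as np
from scipy.optimize import brentq   # requires SciPy

def theta_of_a(a, lower_renyi_W_fn, lo=-0.99, hi=50.0, h=1e-4):
    """Solve d[theta * H^{down,W}_{1+theta}]/dtheta = a for theta, eq. (45).

    The derivative is approximated by a central finite difference; since it is
    monotonically decreasing in theta, bisection finds the unique root provided
    the bracket [lo, hi] produces a sign change.
    """
    def deriv(theta):
        f = lambda t: t * lower_renyi_W_fn(t)
        return (f(theta + h) - f(theta - h)) / (2 * h)

    return brentq(lambda t: deriv(t) - a, lo, hi)
```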
C. Information Measures for Markov Chain

Let (X, Y) be the Markov chain induced by the transition matrix $W$ and some initial distribution $P_{X_1Y_1}$. Now, we show how the information measures introduced in Section II-B are related to the conditional Rényi entropy rates. First, we introduce the following lemma, which gives finite upper and lower bounds on the lower conditional Rényi entropy.

Lemma 7 Suppose that the transition matrix $W$ satisfies Assumption 1. Let $v_\theta$ be the eigenvector of $\tilde{W}_\theta^T$ with respect to the Perron-Frobenius eigenvalue $\lambda_\theta$ such that $\min_{x,y} v_\theta(x,y) = 1$. Let $w_\theta(x,y) := P_{X_1Y_1}(x,y)^{1+\theta} P_{Y_1}(y)^{-\theta}$. Then, we have

$$(n-1)\theta H^{\downarrow,W}_{1+\theta}(X|Y) + \underline{\delta}(\theta) \le \theta H^{\downarrow}_{1+\theta}(X^n|Y^n) \le (n-1)\theta H^{\downarrow,W}_{1+\theta}(X|Y) + \overline{\delta}(\theta), \qquad (52)\text{--}(54)$$

where

$$\overline{\delta}(\theta) := -\log\langle v_\theta | w_\theta \rangle + \log\max_{x,y} v_\theta(x,y), \qquad (55)$$
$$\underline{\delta}(\theta) := -\log\langle v_\theta | w_\theta \rangle. \qquad (56)$$
From Lemma 7, we have the following.

Theorem 1 Suppose that the transition matrix $W$ satisfies Assumption 1. For any initial distribution, we have

$$\lim_{n\to\infty} \frac{1}{n} H^{\downarrow}_{1+\theta}(X^n|Y^n) = H^{\downarrow,W}_{1+\theta}(X|Y), \qquad (57)$$
$$\lim_{n\to\infty} \frac{1}{n} H(X^n|Y^n) = H^W(X|Y), \qquad (58)$$
$$\lim_{n\to\infty} \frac{1}{n} H^{\downarrow}_0(X^n|Y^n) = H^{\downarrow,W}_0(X|Y), \qquad (59)$$
$$\lim_{n\to\infty} \frac{1}{n} H^{\downarrow}_\infty(X^n|Y^n) = H^{\downarrow,W}_\infty(X|Y). \qquad (60)$$

We also have the following asymptotic evaluation of the variance.

Theorem 2 Suppose that the transition matrix $W$ satisfies Assumption 1. For any initial distribution, we have

$$\lim_{n\to\infty} \frac{1}{n} V(X^n|Y^n) = V^W(X|Y). \qquad (61)$$

Theorem 2 is practically important since the limit of the variance can be described by a single letter characterized quantity. A method to calculate $V^W(X|Y)$ can be found in [32].
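As a crude alternative to the procedure in [32] (and not the paper's method), V^W(X|Y) can also be approximated directly from its definition (24) by a finite difference, reusing a single-argument routine such as `lambda t: lower_renyi_W(W, W_y, t)` from the earlier sketch; the helper name below is ours.

```python
def variance_W(lower_renyi_W_fn, entropy_W, theta=1e-3):
    """Approximate V^W(X|Y) = lim_{theta->0} 2[H^W - H^{down,W}_{1+theta}]/theta, eq. (24).

    lower_renyi_W_fn(theta) returns H^{down,W}_{1+theta}(X|Y) and entropy_W is
    H^W(X|Y). A small but not too small theta balances the truncation error of
    the limit against floating-point rounding in the eigenvalue computation.
    """
    return 2.0 * (entropy_W - lower_renyi_W_fn(theta)) / theta
```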
Next, we show the lemma that gives finite upper and lower bounds on the upper conditional Rényi entropy in terms of the upper conditional Rényi entropy for the transition matrix.

Lemma 8 Suppose that the transition matrix $W$ satisfies Assumption 2. Let $v_\theta$ be the eigenvector of $K_\theta^T$ with respect to the Perron-Frobenius eigenvalue $\kappa_\theta$ such that $\min_y v_\theta(y) = 1$. Let $w_\theta$ be the $|\mathcal{Y}|$-dimensional vector defined by

$$w_\theta(y) := \left[ \sum_x P_{X_1Y_1}(x,y)^{1+\theta} \right]^{\frac{1}{1+\theta}}. \qquad (62)$$

Then, we have

$$(n-1)\frac{\theta}{1+\theta} H^{\uparrow,W}_{1+\theta}(X|Y) + \underline{\xi}(\theta) \le \frac{\theta}{1+\theta} H^{\uparrow}_{1+\theta}(X^n|Y^n) \le (n-1)\frac{\theta}{1+\theta} H^{\uparrow,W}_{1+\theta}(X|Y) + \overline{\xi}(\theta), \qquad (63)\text{--}(65)$$

where

$$\overline{\xi}(\theta) := -\log\langle v_\theta | w_\theta \rangle + \log\max_y v_\theta(y), \qquad (66)$$
$$\underline{\xi}(\theta) := -\log\langle v_\theta | w_\theta \rangle. \qquad (67)$$

From Lemma 8, we have the following.

Theorem 3 Suppose that the transition matrix $W$ satisfies Assumption 2. For any initial distribution, we have

$$\lim_{n\to\infty} \frac{1}{n} H^{\uparrow}_{1+\theta}(X^n|Y^n) = H^{\uparrow,W}_{1+\theta}(X|Y), \qquad (68)$$
$$\lim_{n\to\infty} \frac{1}{n} H^{\uparrow}_0(X^n|Y^n) = H^{\uparrow,W}_0(X|Y), \qquad (69)$$
$$\lim_{n\to\infty} \frac{1}{n} H^{\uparrow}_\infty(X^n|Y^n) = H^{\uparrow,W}_\infty(X|Y). \qquad (70)$$

Finally, we show the lemma that gives finite upper and lower bounds on the two-parameter conditional Rényi entropy in terms of the two-parameter conditional Rényi entropy for the transition matrix.

Lemma 9 Suppose that the transition matrix $W$ satisfies Assumption 2. Let $v_{\theta,\theta'}$ be the eigenvector of $N_{\theta,\theta'}^T$ with respect to the Perron-Frobenius eigenvalue $\nu_{\theta,\theta'}$ such that $\min_y v_{\theta,\theta'}(y) = 1$. Let $w_{\theta,\theta'}$ be the $|\mathcal{Y}|$-dimensional vector defined by

$$w_{\theta,\theta'}(y) := \left[ \sum_x P_{X_1Y_1}(x,y)^{1+\theta} \right] \left[ \sum_x P_{X_1Y_1}(x,y)^{1+\theta'} \right]^{-\frac{\theta}{1+\theta'}}. \qquad (71)\text{--}(72)$$

Then, we have

$$(n-1)\theta H^W_{1+\theta,1+\theta'}(X|Y) + \underline{\zeta}(\theta,\theta') \le \theta H_{1+\theta,1+\theta'}(X^n|Y^n) \le (n-1)\theta H^W_{1+\theta,1+\theta'}(X|Y) + \overline{\zeta}(\theta,\theta'), \qquad (73)\text{--}(75)$$

where

$$\overline{\zeta}(\theta,\theta') := -\log\langle v_{\theta,\theta'} | w_{\theta,\theta'} \rangle + \log\max_y v_{\theta,\theta'}(y) + \theta\overline{\xi}(\theta'),$$
$$\underline{\zeta}(\theta,\theta') := -\log\langle v_{\theta,\theta'} | w_{\theta,\theta'} \rangle + \theta\underline{\xi}(\theta')$$

for $\theta > 0$, and

$$\overline{\zeta}(\theta,\theta') := -\log\langle v_{\theta,\theta'} | w_{\theta,\theta'} \rangle + \log\max_y v_{\theta,\theta'}(y) + \theta\underline{\xi}(\theta'),$$
$$\underline{\zeta}(\theta,\theta') := -\log\langle v_{\theta,\theta'} | w_{\theta,\theta'} \rangle + \theta\overline{\xi}(\theta')$$

for $\theta < 0$. From Lemma 9, we have the following.

Theorem 4 Suppose that the transition matrix $W$ satisfies Assumption 2. For any initial distribution, we have

$$\lim_{n\to\infty} \frac{1}{n} H_{1+\theta,1+\theta'}(X^n|Y^n) = H^W_{1+\theta,1+\theta'}(X|Y). \qquad (76)$$

III. SOURCE CODING WITH FULL SIDE-INFORMATION

A. Problem Formulation

A code $\Psi = (e,d)$ consists of an encoder $e : \mathcal{X} \to \{1,\dots,M\}$ and a decoder $d : \{1,\dots,M\} \times \mathcal{Y} \to \mathcal{X}$. The decoding error probability is defined by

$$P_e(\Psi) := \Pr\{X \ne d(e(X), Y)\}. \qquad (77)$$

For notational convenience, we introduce the infimum of the error probabilities under the condition that the message size is $M$:

$$P_e(M) := \inf_{\Psi} P_e(\Psi). \qquad (78)$$

When we construct a source code, we often use a two-universal hash family¹¹ $\mathcal{F}$ and a random function $F$ on $\mathcal{F}$. Then, we bound the error probability $P_e(\Psi(F))$ averaged over the random function by using only the property of two-universality. For this reason, it is convenient to introduce the quantity

$$\bar{P}_e(M) := \sup_{\mathcal{F}} \mathbb{E}[P_e(\Psi(F))], \qquad (79)$$

where the supremum is taken over all two-universal hash families from $\mathcal{X}$ to $\{1,\dots,M\}$. From the definition, we obviously have $P_e(M) \le \bar{P}_e(M)$.

11 A family $\mathcal{F}$ of functions is said to be universal-two if $\Pr\{F(x) = F(x')\} \le \frac{1}{M}$ for any distinct $x$ and $x'$ [33].

When we consider the $n$-fold extension, the source code and related quantities are denoted with the subscript $n$. Instead of evaluating the error probability $P_e(M_n)$ (or $\bar{P}_e(M_n)$) for a given $M_n$, we are also interested in evaluating

$$M(n,\varepsilon) := \inf\{M_n : P_e(M_n) \le \varepsilon\}, \qquad (80)$$
$$\bar{M}(n,\varepsilon) := \inf\{M_n : \bar{P}_e(M_n) \le \varepsilon\}. \qquad (81)$$
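To make footnote 11 concrete, the following is a textbook Carter-Wegman style construction in the spirit of [33] (not a construction given in this paper): random affine maps over a prime field, reduced modulo M. The parameter choices below are our own illustrations.

```python
import random

def sample_two_universal_hash(p, M):
    """Sample h(x) = ((a*x + b) mod p) mod M from the Carter-Wegman family.

    For a prime p and inputs x in {0, ..., p-1}, this family satisfies
    Pr{h(x) = h(x')} <= 1/M for any distinct x, x' (cf. footnote 11);
    relabelling the outputs {0, ..., M-1} as {1, ..., M} matches the text.
    """
    a = random.randint(1, p - 1)
    b = random.randint(0, p - 1)
    return lambda x: ((a * x + b) % p) % M

# Example: hash a source symbol of up to 31 bits into M = 256 bins.
h = sample_two_universal_hash(p=2**31 - 1, M=256)   # 2**31 - 1 is a Mersenne prime
print(h(123456789))
```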
Theorem 8 Suppose that the transition matrix $W$ satisfies Assumption 2. Let $R := \frac{1}{n}\log M_n$. For any $H^W(X|Y) < R < H^{\uparrow,W}_0(X|Y)$, we have

$$-\log P_e(M_n) \le \inf_{s>0,\ -1<\tilde\theta<\theta(R)} \Big\{ (n-1)(1+s)\tilde\theta\big[H^{\downarrow,W}_{1+\tilde\theta}(X|Y) - H^{\downarrow,W}_{1+(1+s)\tilde\theta}(X|Y)\big] + \delta_1 - (1+s)\log\big(1 - e^{(n-1)E(R,\tilde\theta)+\delta_2}/s\big) \Big\} + 1,$$

where

$$E(R,\tilde\theta) := (\theta(R)-\tilde\theta)R - \theta(R)H^{\downarrow,W}_{1+\theta(R)}(X|Y) + \tilde\theta H^{\downarrow,W}_{1+\tilde\theta}(X|Y),$$

$\theta(a)$ is the inverse function defined by (45), and

$$\delta_1 := (1+s)\overline{\delta}(\tilde\theta) - \underline{\delta}((1+s)\tilde\theta), \qquad \delta_2 := (\theta(R)-\tilde\theta)R - \underline{\delta}(\theta(R)) + \overline{\delta}(\tilde\theta).$$

Next, we derive tighter achievability and converse bounds under Assumption 2.

Theorem 15 Suppose that the transition matrix $W$ satisfies Assumption 2. Let $R := \frac{1}{n}\log M_n$. Then we have

$$-\log\bar{\Delta}(M_n) \ge \sup_{0\le\theta\le1}\left\{ \frac{-\theta nR + (n-1)\theta H^{\uparrow,W}_{1+\theta}(X|Y)}{1+\theta} + \underline{\xi}(\theta) \right\} - \log(3/2).$$

Theorem 16 Suppose that the transition matrix $W$ satisfies Assumption 2. Let $R$ be such that

$$(n-1)R + (1+\theta(a(R)))\big(a(R) - \underline{\xi}(\theta(a(R)))\big) = \log(M_n/2).$$

If $R(\underline{a}) < R < H^W(X|Y)$, then we have

$$-\log\bar{\Delta}(M_n) \le \inf_{s>0,\ \tilde\theta>\theta(a(R))} \Big\{ (n-1)(1+s)\tilde\theta\big[H^W_{1+\tilde\theta,1+\theta(a(R))}(X|Y) - H^W_{1+(1+s)\tilde\theta,1+\theta(a(R))}(X|Y)\big] + \delta_1 - (1+s)\log\big(1 - e^{(n-1)E(R,\tilde\theta)+\delta_2}/s\big) \Big\} + 2,$$

where

$$E(R,\tilde\theta) := (\theta(a(R))-\tilde\theta)a(R) - \theta(a(R))H^{\uparrow,W}_{1+\theta(a(R))}(X|Y) + \tilde\theta H^W_{1+\tilde\theta,1+\theta(a(R))}(X|Y),$$

$\theta(a)$ and $a(R)$ are the inverse functions defined by (49) and (51), respectively, and

$$\delta_1 := (1+s)\overline{\zeta}(\tilde\theta,\theta(a(R))) - \underline{\zeta}((1+s)\tilde\theta,\theta(a(R))), \qquad \delta_2 := (\theta(a(R))-\tilde\theta)a(R) - \underline{\zeta}(\theta(a(R)),\theta(a(R))) + \overline{\zeta}(\tilde\theta,\theta(a(R))).$$

C. Large Deviation

From Theorem 13 and Theorem 14, we have the following.

Theorem 17 Suppose that the transition matrix $W$ satisfies Assumption 1. For $R < H^W(X|Y)$, we have

$$\liminf_{n\to\infty} -\frac{1}{n}\log\bar{\Delta}(e^{nR}) \ge \sup_{0\le\theta\le1} \frac{-\theta R + \theta H^{\downarrow,W}_{1+\theta}(X|Y)}{1+\theta}.$$

On the other hand, for $\underline{a} < R < H^W(X|Y)$, we have

$$\limsup_{n\to\infty} -\frac{1}{n}\log\bar{\Delta}(e^{nR}) \le -\theta(R)R + \theta(R)H^{\downarrow,W}_{1+\theta(R)}(X|Y).$$

Under Assumption 2, from Theorem 15 and Theorem 16, we have the following tighter bounds.

Theorem 18 Suppose that the transition matrix $W$ satisfies Assumption 2. For $R < H^W(X|Y)$, we have

$$\liminf_{n\to\infty} -\frac{1}{n}\log\bar{\Delta}(e^{nR}) \ge \sup_{0\le\theta\le1} \frac{-\theta R + \theta H^{\uparrow,W}_{1+\theta}(X|Y)}{1+\theta}.$$

On the other hand, for $R(\underline{a}) < R < H^W(X|Y)$, we have

$$\limsup_{n\to\infty} -\frac{1}{n}\log\bar{\Delta}(e^{nR}) \le -\theta(a(R))a(R) + \theta(a(R))H^{\uparrow,W}_{1+\theta(a(R))}(X|Y).$$

Remark 2 For $R_{cr} \le R$, where

$$R_{cr} := R\!\left( \frac{d[\theta H^{\uparrow,W}_{1+\theta}(X|Y)]}{d\theta}\Big|_{\theta=1} \right)$$

is the critical rate (cf. (50) for the definition of $R(a)$), we can rewrite the lower bound in (87) as

$$\sup_{0\le\theta\le1} \frac{-\theta R + \theta H^{\uparrow,W}_{1+\theta}(X|Y)}{1+\theta} = -\theta(a(R))a(R) + \theta(a(R))H^{\uparrow,W}_{1+\theta(a(R))}(X|Y).$$

Thus, the lower bound and the upper bound coincide up to the critical rate.

D. Moderate Deviation

From Theorem 13 and Theorem 14, we have the following.

Theorem 19 Suppose that the transition matrix $W$ satisfies Assumption 1. For arbitrary $t \in (0,1/2)$ and $\delta > 0$, we have

$$\lim_{n\to\infty} -\frac{1}{n^{1-2t}}\log\Delta\!\left(e^{nH^W(X|Y)-n^{1-t}\delta}\right) = \lim_{n\to\infty} -\frac{1}{n^{1-2t}}\log\bar{\Delta}\!\left(e^{nH^W(X|Y)-n^{1-t}\delta}\right) = \frac{\delta^2}{2V^W(X|Y)}.$$

E. Second Order

By applying the central limit theorem to information spectrum bounds, and by using Theorem 2, we have the following.

Theorem 20 Suppose that the transition matrix $W$ satisfies Assumption 1. For arbitrary $\varepsilon \in (0,1)$, we have

$$\lim_{n\to\infty} \frac{\log M(n,\varepsilon) - nH^W(X|Y)}{\sqrt{n}} = \lim_{n\to\infty} \frac{\log\bar{M}(n,\varepsilon) - nH^W(X|Y)}{\sqrt{n}} = \sqrt{V^W(X|Y)}\,\Phi^{-1}(\varepsilon).$$
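Theorem 20 suggests the finite-length normal approximation $\log M(n,\varepsilon) \approx nH^W(X|Y) + \sqrt{nV^W(X|Y)}\,\Phi^{-1}(\varepsilon)$. The following small sketch of that approximation is our own illustration (not from the paper); H_W and V_W stand for values obtained, e.g., from the earlier routines, and Φ⁻¹ is the standard normal quantile.

```python
from math import sqrt
from statistics import NormalDist

def second_order_log_M(n, eps, H_W, V_W):
    """Normal approximation to log M(n, eps) suggested by Theorem 20:
    n * H^W(X|Y) + sqrt(n * V^W(X|Y)) * Phi^{-1}(eps)."""
    return n * H_W + sqrt(n * V_W) * NormalDist().inv_cdf(eps)

# For eps < 1/2 the sqrt(n) correction is negative, since Phi^{-1}(eps) < 0.
```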
ACKNOWLEDGMENT

HM is partially supported by a MEXT Grant-in-Aid for Scientific Research (A) No. 23246071. He is also partially supported by the National Institute of Information and Communication Technology (NICT), Japan. The Centre for Quantum Technologies is funded by the Singapore Ministry of Education and the National Research Foundation as part of the Research Centres of Excellence programme.
REFERENCES

[1] Y. Polyanskiy, H. V. Poor, and S. Verdú, "Channel coding rate in the finite blocklength regime," IEEE Trans. Inform. Theory, vol. 56, no. 5, pp. 2307–2359, May 2010.
[2] M. Hayashi, "Information spectrum approach to second-order coding rate in channel coding," IEEE Trans. Inform. Theory, vol. 55, no. 11, pp. 4947–4966, November 2009.
[3] C. H. Bennett, G. Brassard, and J. M. Robert, "Privacy amplification by public discussion," SIAM Journal on Computing, vol. 17, no. 2, pp. 210–229, April 1988.
[4] C. H. Bennett, G. Brassard, C. Crépeau, and U. Maurer, "Generalized privacy amplification," IEEE Trans. Inform. Theory, vol. 41, no. 6, pp. 1915–1923, November 1995.
[5] S. Verdú and T. S. Han, "A general formula for channel capacity," IEEE Trans. Inform. Theory, vol. 40, no. 4, pp. 1147–1157, July 1994.
[6] T. S. Han, Information-Spectrum Methods in Information Theory. Springer, 2003.
[7] M. Hayashi and H. Nagaoka, "General formulas for capacity of classical-quantum channels," IEEE Trans. Inform. Theory, vol. 49, no. 7, pp. 1753–1768, July 2003.
[8] L. Wang and R. Renner, "One-shot classical-quantum capacity and hypothesis testing," Phys. Rev. Lett., vol. 108, no. 20, p. 200501, May 2012.
[9] R. G. Gallager, "A simple derivation of the coding theorem and some applications," IEEE Trans. Inform. Theory, vol. 11, no. 1, pp. 3–18, January 1965.
[10] Y. Polyanskiy, "Channel coding: Non-asymptotic fundamental limits," Ph.D. dissertation, Princeton University, November 2010.
[11] M. Tomamichel and M. Hayashi, "A hierarchy of information quantities for finite block length analysis of quantum tasks," IEEE Trans. Inform. Theory, vol. 59, no. 11, pp. 7693–7710, November 2013.
[12] W. Matthews and S. Wehner, "Finite blocklength converse bounds for quantum channels," 2012, arXiv:1210.4722.
[13] R. G. Gallager, Information Theory and Reliable Communication. John Wiley & Sons, 1968.
[14] R. Renner, "Security of quantum key distribution," Ph.D. dissertation, ETH Zurich, Switzerland, February 2005.
[15] M. Hayashi, "Tight exponential analysis of universally composable privacy amplification and its applications," IEEE Trans. Inform. Theory, vol. 59, no. 11, pp. 7728–7746, November 2013.
[16] S. Watanabe and M. Hayashi, "Non-asymptotic analysis of privacy amplification via Rényi entropy and inf-spectral entropy," in Proc. IEEE Int. Symp. Inf. Theory 2013, Istanbul, Turkey, 2013, pp. 2715–2719, arXiv:1211.5252.
[17] M. Hayashi, "Second-order asymptotics in fixed-length source coding and intrinsic randomness," IEEE Trans. Inform. Theory, vol. 54, no. 10, pp. 4619–4637, October 2008, arXiv:cs/0503089.
[18] Y. Altug and A. B. Wagner, "Moderate deviation analysis of channel coding: Discrete memoryless case," in Proc. IEEE Int. Symp. Inf. Theory 2010, Austin, TX, USA, June 2010, pp. 265–269.
[19] D. He, L. A. Lastras-Montano, E. Yang, A. Jagmohan, and J. Chen, "On the redundancy of Slepian-Wolf coding," IEEE Trans. Inform. Theory, vol. 55, no. 12, pp. 5607–5627, December 2009.
[20] V. Y. F. Tan, "Moderate-deviations of lossy source coding for discrete and Gaussian sources," in Proc. IEEE Int. Symp. Inf. Theory 2012, Cambridge, MA, 2012, pp. 920–924.
[21] S. Kuzuoka, "A simple technique for bounding the redundancy of source coding with side information," in Proc. IEEE Int. Symp. Inf. Theory 2012, Cambridge, MA, 2012, pp. 915–919.
[22] A. Teixeira, A. Matos, and L. Antunes, "Conditional Rényi entropies," IEEE Trans. Inform. Theory, vol. 58, no. 7, pp. 4273–4277, July 2012.
[23] M. Iwamoto and J. Shikata, "Information theoretic security for encryption based on conditional Rényi entropies," 2013, http://eprint.iacr.org/2013/440.pdf.
[24] M. Hayashi, "Exponential decreasing rate of leaked information in universal random privacy amplification," IEEE Trans. Inform. Theory, vol. 57, no. 6, pp. 3989–4001, June 2011, arXiv:0904.0308.
[25] S. Arimoto, "Information measures and capacity of order α for discrete memoryless channels," Colloquia Mathematica Societatis János Bolyai, 16. Topics in Information Theory, pp. 41–52, 1975.
[26] M. Hayashi and S. Watanabe, "Information geometry approach to Markov chains," 2014.
[27] M. Tomamichel and V. Y. F. Tan, "ε-capacities and second-order coding rates for channels with general state," 2013, arXiv:1305.6789.
[28] J. G. Kemeny and J. Snell, Finite Markov Chains. Springer, 1976.
[29] M. Hayashi and S. Watanabe, "Non-asymptotic and asymptotic analyses on Markov chains in several problems," 2013, arXiv:1309.7528.
[30] M. Tomamichel, M. Berta, and M. Hayashi, "A duality relation connecting different quantum generalizations of the conditional Rényi entropy," 2013, arXiv:1311.3887.
[31] M. Hayashi, "Large deviation analysis for classical and quantum security via approximate smoothing," 2012, arXiv:1202.0322.
[32] S. Watanabe and M. Hayashi, "Finite-length analysis on tail probability and simple hypothesis testing for Markov chain," 2014.
[33] M. N. Wegman and J. L. Carter, "New hash functions and their use in authentication and set equality," Journal of Computer and System Sciences, vol. 22, pp. 265–279, 1981.