New Strong Converse for Asymmetric Broadcast Channels

Yasutada Oohama

arXiv:1604.02901v2 [cs.IT] 12 Apr 2016

University of Electro-Communications, Tokyo, Japan Email: [email protected]

Abstract—We consider discrete memoryless asymmetric broadcast channels. We prove that the error probability of decoding tends to one exponentially for rates outside the capacity region and derive an explicit lower bound of this exponent function. We demonstrate that the information spectrum approach is quite useful for investigating this problem.

Index Terms—Discrete memoryless channels, asymmetric broadcast channels, strong converse theorem, exponent of correct probability of decoding

I. ASYMMETRIC BROADCAST CHANNELS

Let $\mathcal{X}$, $\mathcal{Y}$, $\mathcal{Z}$ be finite sets. The broadcast channel we study in this paper is defined by a discrete memoryless channel specified with the following stochastic matrix:

$$W \triangleq \{W(y,z|x)\}_{(x,y,z)\in\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}}. \tag{1}$$

Here the set $\mathcal{X}$ corresponds to the channel input and the sets $\mathcal{Y}$ and $\mathcal{Z}$ correspond to the two channel outputs. Let $X^n$ be a random variable taking values in $\mathcal{X}^n$. We write an element of $\mathcal{X}^n$ as $x^n = x_1x_2\cdots x_n$. Suppose that $X^n$ has a probability distribution on $\mathcal{X}^n$ denoted by $p_{X^n} = \{p_{X^n}(x^n)\}_{x^n\in\mathcal{X}^n}$. Similar notations are adopted for other random variables. Let $Y^n\in\mathcal{Y}^n$ and $Z^n\in\mathcal{Z}^n$ be the random variables obtained as the channel outputs by connecting $X^n$ to the input of the channel. We write the conditional distribution of $(Y^n,Z^n)$ given $X^n$ as $W^n = \{W^n(y^n,z^n|x^n)\}_{(x^n,y^n,z^n)\in\mathcal{X}^n\times\mathcal{Y}^n\times\mathcal{Z}^n}$. Since the channel is memoryless, we have

$$W^n(y^n,z^n|x^n) = \prod_{t=1}^{n} W(y_t,z_t|x_t). \tag{2}$$

In this paper we deal with the case where the components $W(y,z|x)$ of $W$ satisfy the following condition:

$$W(y,z|x) = W_1(y|x)W_2(z|x). \tag{3}$$

In this case the broadcast channel (BC) is specified with $(W_1,W_2)$. Under the assumption (3), the conditional probability in (2) is given by

$$W^n(y^n,z^n|x^n) = W_1^n(y^n|x^n)W_2^n(z^n|x^n) = \prod_{t=1}^{n} W_1(y_t|x_t)W_2(z_t|x_t).$$

Transmission of messages via the BC is shown in Fig. 1. Let $K_n$ and $L_n$ be uniformly distributed random variables taking values in message sets $\mathcal{K}_n$ and $\mathcal{L}_n$, respectively. The random variable $K_n$ is a message sent to receiver 1. The random variable $L_n$ is a message sent to receivers 1 and 2. A sender transforms $K_n$ and $L_n$ into a transmitted sequence $X^n$ using an encoder function $\varphi^{(n)}$ and sends it to receivers 1 and 2. In this paper we assume that the encoder function $\varphi^{(n)}$ is a stochastic encoder. In this case, $\varphi^{(n)}$ is a stochastic matrix given by

$$\varphi^{(n)} = \{\varphi^{(n)}(x^n|k,l)\}_{(k,l,x^n)\in\mathcal{K}_n\times\mathcal{L}_n\times\mathcal{X}^n},$$

where $\varphi^{(n)}(x^n|k,l)$ is the conditional probability of $x^n\in\mathcal{X}^n$ given the message pair $(k,l)\in\mathcal{K}_n\times\mathcal{L}_n$. The joint probability mass function on $\mathcal{K}_n\times\mathcal{L}_n\times\mathcal{X}^n\times\mathcal{Y}^n\times\mathcal{Z}^n$ is given by

$$\Pr\{(K_n,L_n,X^n,Y^n,Z^n) = (k,l,x^n,y^n,z^n)\} = \frac{\varphi^{(n)}(x^n|k,l)}{|\mathcal{K}_n||\mathcal{L}_n|}\prod_{t=1}^{n} W_1(y_t|x_t)W_2(z_t|x_t),$$

where $|\mathcal{K}_n|$ is the cardinality of the set $\mathcal{K}_n$.

Fig. 1. Transmission of messages via the BC. (The sender encodes $(K_n,L_n)$ into $X^n$ with $\varphi^{(n)}$; receiver 1 observes $Y^n$ through $W_1$ and outputs $(\hat{K}_n,\hat{L}_n)$ with $\psi_1^{(n)}$; receiver 2 observes $Z^n$ through $W_2$ and outputs $\hat{L}_n$ with $\psi_2^{(n)}$.)

The decoding functions at receiver 1 and receiver 2 are denoted by $\psi_1^{(n)}$ and $\psi_2^{(n)}$, respectively. Those functions are formally defined by

$$\psi_1^{(n)}:\mathcal{Y}^n\to\mathcal{K}_n\times\mathcal{L}_n,\qquad \psi_2^{(n)}:\mathcal{Z}^n\to\mathcal{L}_n.$$

The average error probabilities of decoding at receivers 1 and 2 are defined by

$$P_{e,1}^{(n)} = P_{e,1}^{(n)}(\varphi^{(n)},\psi_1^{(n)}) \triangleq \Pr\{\psi_1^{(n)}(Y^n)\ne(K_n,L_n)\},$$
$$P_{e,2}^{(n)} = P_{e,2}^{(n)}(\varphi^{(n)},\psi_2^{(n)}) \triangleq \Pr\{\psi_2^{(n)}(Z^n)\ne L_n\}.$$

Furthermore, we set

$$P_e^{(n)} = P_e^{(n)}(\varphi^{(n)},\psi_1^{(n)},\psi_2^{(n)}) \triangleq \Pr\{\psi_1^{(n)}(Y^n)\ne(K_n,L_n)\ \text{or}\ \psi_2^{(n)}(Z^n)\ne L_n\}.$$

It is obvious that we have the following relation:

$$P_e^{(n)} \le P_{e,1}^{(n)} + P_{e,2}^{(n)}. \tag{4}$$

For $k\in\mathcal{K}_n$ and $l\in\mathcal{L}_n$, set

$$\mathcal{D}_1(k,l) \triangleq \{y^n : \psi_1^{(n)}(y^n) = (k,l)\},\qquad \mathcal{D}_2(l) \triangleq \{z^n : \psi_2^{(n)}(z^n) = l\}.$$

The families of sets $\{\mathcal{D}_1(k,l)\}_{(k,l)\in\mathcal{K}_n\times\mathcal{L}_n}$ and $\{\mathcal{D}_2(l)\}_{l\in\mathcal{L}_n}$ are called the decoding regions. Using the decoding regions, $P_e^{(n)}$ can be written as

$$P_e^{(n)} = \frac{1}{|\mathcal{K}_n||\mathcal{L}_n|}\sum_{(k,l)\in\mathcal{K}_n\times\mathcal{L}_n}\ \sum_{\substack{(x^n,y^n,z^n)\in\mathcal{X}^n\times\mathcal{Y}^n\times\mathcal{Z}^n:\\ y^n\in\mathcal{D}_1^c(k,l)\ \text{or}\ z^n\in\mathcal{D}_2^c(l)}} \varphi^{(n)}(x^n|k,l)W_1^n(y^n|x^n)W_2^n(z^n|x^n).$$

Set

$$P_c^{(n)} = P_c^{(n)}(\varphi^{(n)},\psi_1^{(n)},\psi_2^{(n)}) \triangleq 1 - P_e^{(n)}(\varphi^{(n)},\psi_1^{(n)},\psi_2^{(n)}).$$

The quantity $P_c^{(n)}$ is called the average correct probability of decoding. This quantity has the following form:

$$P_c^{(n)} = \frac{1}{|\mathcal{K}_n||\mathcal{L}_n|}\sum_{(k,l)\in\mathcal{K}_n\times\mathcal{L}_n}\ \sum_{\substack{(x^n,y^n,z^n)\in\mathcal{X}^n\times\mathcal{Y}^n\times\mathcal{Z}^n:\\ y^n\in\mathcal{D}_1(k,l),\ z^n\in\mathcal{D}_2(l)}} \varphi^{(n)}(x^n|k,l)W_1^n(y^n|x^n)W_2^n(z^n|x^n).$$

For a given $(\varepsilon_1,\varepsilon_2)\in(0,1)^2$, a rate pair $(R_1,R_2)$ is $(\varepsilon_1,\varepsilon_2)$-achievable if there exists a sequence of triples $\{(\varphi^{(n)},\psi_1^{(n)},\psi_2^{(n)})\}_{n=1}^{\infty}$ such that for any $\delta>0$ and for any $n$ with $n\ge n_0 = n_0(\varepsilon_1,\varepsilon_2,\delta)$,

$$P_{e,i}^{(n)}(\varphi^{(n)},\psi_i^{(n)})\le\varepsilon_i,\ i=1,2,\qquad \frac{1}{n}\log|\mathcal{K}_n|\ge R_1-\delta,\qquad \frac{1}{n}\log|\mathcal{L}_n|\ge R_2-\delta.$$

The set that consists of all $(\varepsilon_1,\varepsilon_2)$-achievable rate pairs is denoted by $\mathcal{C}_{\rm ABC}(\varepsilon_1,\varepsilon_2|W_1,W_2)$. For a given $\varepsilon\in(0,1)$, a pair $(R_1,R_2)$ is $\varepsilon$-achievable if there exists a sequence of triples $\{(\varphi^{(n)},\psi_1^{(n)},\psi_2^{(n)})\}_{n=1}^{\infty}$ such that for any $\delta>0$ and for any $n$ with $n\ge n_0 = n_0(\varepsilon,\delta)$,

$$P_e^{(n)}(\varphi^{(n)},\psi_1^{(n)},\psi_2^{(n)})\le\varepsilon,\qquad \frac{1}{n}\log|\mathcal{K}_n|\ge R_1-\delta,\qquad \frac{1}{n}\log|\mathcal{L}_n|\ge R_2-\delta.$$

The set that consists of all $\varepsilon$-achievable rate pairs is denoted by $\mathcal{C}_{\rm ABC}(\varepsilon|W_1,W_2)$. It is obvious that for $0\le\varepsilon_1+\varepsilon_2\le 1$, we have

$$\mathcal{C}_{\rm ABC}(\varepsilon_1,\varepsilon_2|W_1,W_2)\subseteq\mathcal{C}_{\rm ABC}(\varepsilon_1+\varepsilon_2|W_1,W_2).$$

We set

$$\mathcal{C}_{\rm ABC}(W_1,W_2) \triangleq \bigcap_{\varepsilon\in(0,1)}\mathcal{C}_{\rm ABC}(\varepsilon|W_1,W_2),$$

which is called the capacity region of the ABC. The two maximum error probabilities of decoding are defined as follows:

$$P_{e,m,1}^{(n)} = P_{e,m,1}^{(n)}(\varphi^{(n)},\psi_1^{(n)}) \triangleq \max_{(k,l)\in\mathcal{K}_n\times\mathcal{L}_n}\Pr\{\psi_1^{(n)}(Y^n)\ne(k,l)\,|\,K_n=k,L_n=l\},$$
$$P_{e,m,2}^{(n)} = P_{e,m,2}^{(n)}(\varphi^{(n)},\psi_2^{(n)}) \triangleq \max_{l\in\mathcal{L}_n}\Pr\{\psi_2^{(n)}(Z^n)\ne l\,|\,L_n=l\}.$$

Based on those quantities, we define the maximum capacity region as follows. For a given $(\varepsilon_1,\varepsilon_2)\in(0,1)^2$, a pair $(R_1,R_2)$ is $(\varepsilon_1,\varepsilon_2)$-achievable if there exists a sequence of triples $\{(\varphi^{(n)},\psi_1^{(n)},\psi_2^{(n)})\}_{n=1}^{\infty}$ such that for any $\delta>0$ and for any $n$ with $n\ge n_0 = n_0(\varepsilon_1,\varepsilon_2,\delta)$,

$$P_{e,m,i}^{(n)}(\varphi^{(n)},\psi_i^{(n)})\le\varepsilon_i,\ i=1,2,\qquad \frac{1}{n}\log|\mathcal{K}_n|\ge R_1-\delta,\qquad \frac{1}{n}\log|\mathcal{L}_n|\ge R_2-\delta.$$

The set that consists of all such $(\varepsilon_1,\varepsilon_2)$-achievable rate pairs is denoted by $\mathcal{C}_{\rm m,ABC}(\varepsilon_1,\varepsilon_2|W_1,W_2)$. We set

$$\mathcal{C}_{\rm m,ABC}(W_1,W_2) \triangleq \bigcap_{(\varepsilon_1,\varepsilon_2)\in(0,1)^2}\mathcal{C}_{\rm m,ABC}(\varepsilon_1,\varepsilon_2|W_1,W_2),$$

which is called the maximum capacity region of the ABC. It is obvious that

$$\mathcal{C}_{\rm m,ABC}(\varepsilon_1,\varepsilon_2|W_1,W_2)\subseteq\mathcal{C}_{\rm ABC}(\varepsilon_1,\varepsilon_2|W_1,W_2).$$

To describe previous works on $\mathcal{C}_{\rm m,ABC}(\varepsilon_1,\varepsilon_2|W_1,W_2)$ and $\mathcal{C}_{\rm ABC}(\varepsilon_1,\varepsilon_2|W_1,W_2)$, we introduce an auxiliary random variable $U$ taking values in a finite set $\mathcal{U}$. We assume that the joint distribution of $(U,X,Y,Z)$ is

$$p_{UXYZ}(u,x,y,z) = p_U(u)p_{X|U}(x|u)W_1(y|x)W_2(z|x).$$

The above condition is equivalent to $Y\leftrightarrow X\leftrightarrow Z$ and $U\leftrightarrow X\leftrightarrow(Y,Z)$. Define the set of probability distributions $p = p_{UXYZ}$ of $(U,X,Y,Z)\in\mathcal{U}\times\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}$ by

$$\mathcal{P}(W_1,W_2) \triangleq \{p : |\mathcal{U}|\le\min\{|\mathcal{X}|,|\mathcal{Y}|+|\mathcal{Z}|\}+1,\ p_{Y|X}=W_1,\ p_{Z|X}=W_2,\ U\leftrightarrow X\leftrightarrow(Y,Z),\ Y\leftrightarrow X\leftrightarrow Z\}.$$

Set

$$\mathcal{C}(p) \triangleq \{(R_1,R_2) : R_1,R_2\ge 0,\ R_1\le I_p(X;Y|U),\ R_2\le I_p(U;Z),\ R_1+R_2\le I_p(X;Y)\},$$
$$\mathcal{C}(W_1,W_2) = \bigcup_{p\in\mathcal{P}(W_1,W_2)}\mathcal{C}(p).$$
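As a concrete illustration of the region $\mathcal{C}(p)$, the following sketch evaluates the three mutual-information bounds $I_p(X;Y|U)$, $I_p(U;Z)$, $I_p(X;Y)$ for one fixed distribution $p = p_U\,p_{X|U}\,W_1W_2$ and tests whether a rate pair satisfies them. The function names, the nested-list representation of the stochastic matrices, and the use of base-2 logarithms are assumptions made for this example only; they are not part of the paper.

```python
import math

def mutual_information(joint):
    """I(A;B) in bits for a joint pmf given as a matrix joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(v * math.log2(v / (pa[a] * pb[b]))
               for a, row in enumerate(joint)
               for b, v in enumerate(row) if v > 0)

def region_bounds(p_u, p_x_given_u, w1, w2):
    """Return (I(X;Y|U), I(U;Z), I(X;Y)) for p = p_U p_{X|U} W1 W2.

    w1[x][y] = W1(y|x), w2[x][z] = W2(z|x), p_x_given_u[u][x] = p(x|u).
    """
    nu, nx = len(p_u), len(w1)
    ny, nz = len(w1[0]), len(w2[0])
    # I(X;Y|U) = sum_u p_U(u) * I(X;Y | U=u)
    i_xy_u = sum(
        p_u[u] * mutual_information(
            [[p_x_given_u[u][x] * w1[x][y] for y in range(ny)]
             for x in range(nx)])
        for u in range(nu) if p_u[u] > 0)
    # Joint pmf of (U, Z): Z depends on U only through X.
    joint_uz = [[p_u[u] * sum(p_x_given_u[u][x] * w2[x][z] for x in range(nx))
                 for z in range(nz)] for u in range(nu)]
    # Joint pmf of (X, Y) with the marginal p_X induced by p_U and p_{X|U}.
    p_x = [sum(p_u[u] * p_x_given_u[u][x] for u in range(nu)) for x in range(nx)]
    joint_xy = [[p_x[x] * w1[x][y] for y in range(ny)] for x in range(nx)]
    return i_xy_u, mutual_information(joint_uz), mutual_information(joint_xy)

def in_region(r1, r2, bounds, tol=1e-12):
    """Check R1 <= I(X;Y|U), R2 <= I(U;Z), R1 + R2 <= I(X;Y), R1, R2 >= 0."""
    a, b, c = bounds
    return 0 <= r1 <= a + tol and 0 <= r2 <= b + tol and r1 + r2 <= c + tol

# Toy example (assumed): noiseless binary channels, U = X uniform.
identity = [[1, 0], [0, 1]]
bounds = region_bounds([0.5, 0.5], identity, identity, identity)
```

In this toy case all of the input information is carried by $U$, so only the common message can have positive rate; taking $U$ constant instead moves the whole rate to the private message.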

We can show that the above functions and sets satisfy the following property.

Property 1:

a) Set

$$\mathcal{C}_{\rm ext}(p) \triangleq \{(r_1,r_2,r_3) : r_1,r_2,r_3\ge 0,\ r_1\le I_p(X;Y|U),\ r_2\le I_p(U;Z),\ r_3\le I_p(X;Y)\},$$
$$\mathcal{C}_{\rm ext}(W_1,W_2) = \bigcup_{p\in\mathcal{P}(W_1,W_2)}\mathcal{C}_{\rm ext}(p).$$

Then we have

$$\mathcal{C}(W_1,W_2) = \{(R_1,R_2) : (R_1,R_2,R_1+R_2)\in\mathcal{C}_{\rm ext}(W_1,W_2)\}.$$

b) The cardinality bound in $\mathcal{P}(W_1,W_2)$ is sufficient to describe $\mathcal{C}(W_1,W_2)$ and $\mathcal{C}_{\rm ext}(W_1,W_2)$.

c) The region $\mathcal{C}(W_1,W_2)$ is a closed convex subset of $\mathbb{R}_+^2$ and the region $\mathcal{C}_{\rm ext}(W_1,W_2)$ is a closed convex subset of $\mathbb{R}_+^3$, where

$$\mathbb{R}_+^2 \triangleq \{(R_1,R_2) : R_1\ge 0, R_2\ge 0\},\qquad \mathbb{R}_+^3 \triangleq \{(r_1,r_2,r_3) : r_i\ge 0,\ i=1,2,3\}.$$

d) The region $\mathcal{C}(W_1,W_2)$ is always contained in, and may coincide with, the triangle with vertices $(0,0)$, $(C(W_1),0)$, $(0,C(W_1))$, where $C(W_1)$ is the capacity of the channel $W_1$. The point $(C(W_1),0)$ always belongs to $\mathcal{C}(W_1,W_2)$. In general, the upper boundary of $\mathcal{C}(W_1,W_2)$ contains a line segment of slope $-1$ going through the point $(C(W_1),0)$, but this line segment may reduce to the point $(C(W_1),0)$.

Property 1 part a) is obvious. Property 1 parts b) and c) are well-known results. Property 1 part d) is found in [2]. Proofs of those properties are omitted.

The broadcast channel was posed and investigated by Cover [1]. The capacity region of the ABC was given by Körner and Marton [2]. They called the ABC the broadcast channel with degraded message sets. Körner and Marton [2] obtained the following result.

Theorem 1 (Körner and Marton [2]): For each fixed $\varepsilon\in(0,1)$ and for any $(W_1,W_2)$, we have

$$\mathcal{C}_{\rm m,ABC}(\varepsilon,\varepsilon|W_1,W_2) = \mathcal{C}_{\rm ABC}(W_1,W_2) = \mathcal{C}(W_1,W_2).$$

To prove this theorem they used a combinatorial lemma called the "blowing up lemma". Their method was extended to the method called the entropy and image size characterization by Csiszár and Körner [3], where they obtained the following result.

Theorem 2 (Csiszár and Körner [3]): For $\varepsilon\in(0,1/2)$,

$$\mathcal{C}_{\rm ABC}(\varepsilon,\varepsilon|W_1,W_2) = \mathcal{C}_{\rm ABC}(W_1,W_2).$$

Csiszár and Körner [3] applied the method of entropy and image size characterization to other coding problems in multiuser information theory to prove strong converse theorems for those problems. Universally attainable error exponents for rates inside the capacity region $\mathcal{C}(W_1,W_2)$ were studied by Körner and Sgarro [4] and Kaspi and Merhav [5].

To examine the asymptotic behavior of $P_c^{(n)}$ for rates outside the capacity region $\mathcal{C}(W_1,W_2)$, we define the following quantity:

$$G^{(n)}(R_1,R_2|W_1,W_2) \triangleq \min_{\substack{(\varphi^{(n)},\psi_1^{(n)},\psi_2^{(n)}):\\ (1/n)\log|\mathcal{K}_n|\ge R_1,\ (1/n)\log|\mathcal{L}_n|\ge R_2}} \left(-\frac{1}{n}\right)\log P_c^{(n)}(\varphi^{(n)},\psi_1^{(n)},\psi_2^{(n)}).$$

By time sharing we have that

$$G^{(n+m)}\!\left(\left.\frac{nR_1+mR_1'}{n+m},\frac{nR_2+mR_2'}{n+m}\right|W_1,W_2\right) \le \frac{nG^{(n)}(R_1,R_2|W_1,W_2)+mG^{(m)}(R_1',R_2'|W_1,W_2)}{n+m}. \tag{5}$$

Choosing $R_1=R_1'$ and $R_2=R_2'$ in (5), we obtain the following subadditivity property on $\{G^{(n)}(R_1,R_2|W_1,W_2)\}_{n\ge 1}$:

$$G^{(n+m)}(R_1,R_2|W_1,W_2) \le \frac{nG^{(n)}(R_1,R_2|W_1,W_2)+mG^{(m)}(R_1,R_2|W_1,W_2)}{n+m},$$

from which we have that the limit of $G^{(n)}(R_1,R_2|W_1,W_2)$ exists and satisfies

$$\lim_{n\to\infty}G^{(n)}(R_1,R_2|W_1,W_2) = \inf_{n\ge 1}G^{(n)}(R_1,R_2|W_1,W_2).$$

Set

$$G(R_1,R_2|W_1,W_2) \triangleq \lim_{n\to\infty}G^{(n)}(R_1,R_2|W_1,W_2),$$
$$\mathcal{R}(W_1,W_2) \triangleq \{(R_1,R_2,G) : G\ge G(R_1,R_2|W_1,W_2)\}.$$

The exponent function $G(R_1,R_2|W_1,W_2)$ is a convex function of $(R_1,R_2)$. In fact, from (5), we have that for any $\alpha\in[0,1]$ and $\bar\alpha = 1-\alpha$,

$$G(\alpha R_1+\bar\alpha R_1',\alpha R_2+\bar\alpha R_2'|W_1,W_2) \le \alpha G(R_1,R_2|W_1,W_2)+\bar\alpha G(R_1',R_2'|W_1,W_2).$$

The region $\mathcal{R}(W_1,W_2)$ is also a closed convex set. Our main aim is to find an explicit characterization of $\mathcal{R}(W_1,W_2)$. In this paper we derive an explicit outer bound of $\mathcal{R}(W_1,W_2)$ whose section by the plane $G=0$ coincides with $\mathcal{C}(W_1,W_2)$.

II. MAIN RESULT

In this section we state our main result. Before describing the result, we state that the region $\mathcal{C}_{\rm ext}(W_1,W_2)$ can be expressed with two families of supporting hyperplanes. We define the set of probability distributions $p = p_{UXYZ}$ of $(U,X,Y,Z)\in\mathcal{U}\times\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}$ by

$$\mathcal{P}_{\rm sh}(W_1,W_2) \triangleq \{p = p_{UXYZ} : |\mathcal{U}|\le\min\{|\mathcal{X}|,|\mathcal{Y}|+|\mathcal{Z}|-1\},\ p_{Y|X}=W_1,\ p_{Z|X}=W_2,\ Y\leftrightarrow X\leftrightarrow Z,\ U\leftrightarrow X\leftrightarrow(Y,Z)\},$$
$$\mathcal{Q} \triangleq \{q = q_{UXYZ} : |\mathcal{U}|\le|\mathcal{Y}|+|\mathcal{Z}|-1\}.$$

Throughout, $\bar\gamma = 1-\gamma$ and $\bar\mu = 1-\mu$. We set

$$C^{(\gamma,\mu)}(W_1,W_2) \triangleq \max_{p\in\mathcal{P}_{\rm sh}(W_1,W_2)}\{\gamma\mu I_p(X;Y|U)+\gamma\bar\mu I_p(U;Z)+\bar\gamma I_p(X;Y)\},$$
$$\tilde{C}^{(\alpha,\beta,\gamma,\mu)}(W_1,W_2) \triangleq \max_{q\in\mathcal{Q}}\{-\alpha D(q_{Y|XZU}\|W_1|q_{XZU})-\beta D(q_{Z|XYU}\|W_2|q_{XYU})+\gamma\mu I_q(X;Y|U)+\gamma\bar\mu I_q(U;Z)+\bar\gamma I_q(X;Y)\},$$
$$\mathcal{C}_{\rm ext,sh}(W_1,W_2) = \bigcap_{\gamma\in[0,1],\ \mu\in[0,1/2]}\{(r_1,r_2,r_3) : \gamma[\mu r_1+\bar\mu r_2]+\bar\gamma r_3\le C^{(\gamma,\mu)}(W_1,W_2)\},$$
$$\tilde{\mathcal{C}}_{\rm ext,sh}(W_1,W_2) = \bigcap_{\alpha,\beta>0,\ \gamma\in[0,1],\ \mu\in[0,1/2]}\{(r_1,r_2,r_3) : \gamma[\mu r_1+\bar\mu r_2]+\bar\gamma r_3\le\tilde{C}^{(\alpha,\beta,\gamma,\mu)}(W_1,W_2)\}.$$

Then we have the following property.

Property 2:

a) The cardinality bound in $\mathcal{P}_{\rm sh}(W_1,W_2)$ is sufficient to describe $\mathcal{C}_{\rm ext,sh}(W_1,W_2)$.

b) $\mathcal{C}_{\rm ext,sh}(W_1,W_2) = \tilde{\mathcal{C}}_{\rm ext,sh}(W_1,W_2) = \mathcal{C}_{\rm ext}(W_1,W_2)$.

Proof of Property 2 part a) is given in Appendix A. Proof of Property 2 part b) is given in Appendix B. Define

$$\omega_q^{(\alpha,\beta,\gamma,\mu)}(x,y,z|u) \triangleq \alpha\log\frac{W_1(y|x)}{q_{Y|XZU}(y|x,z,u)}+\beta\log\frac{W_2(z|x)}{q_{Z|XYU}(z|x,y,u)}+\gamma\left[\mu\log\frac{W_1(y|x)}{q_{Y|U}(y|u)}+\bar\mu\log\frac{q_{Z|U}(z|u)}{q_Z(z)}\right]+\bar\gamma\log\frac{W_1(y|x)}{q_Y(y)}.$$

Furthermore, define

$$\Lambda^{(\alpha,\beta,\gamma,\mu,\lambda)}(q|W_1,W_2) \triangleq \sum_{(u,x,y,z)\in\mathcal{U}\times\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}} q_{UXYZ}(u,x,y,z)\exp\{\lambda\,\omega_q^{(\alpha,\beta,\gamma,\mu)}(x,y,z|u)\},$$
$$\Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(q|W_1,W_2) \triangleq \log\Lambda^{(\alpha,\beta,\gamma,\mu,\lambda)}(q|W_1,W_2),$$
$$\Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(W_1,W_2) \triangleq \max_{q\in\mathcal{Q}}\Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(q|W_1,W_2),$$
$$F_{\rm ext}^{(\alpha,\beta,\gamma,\mu,\lambda)}(r_1,r_2,r_3|W_1,W_2) \triangleq \frac{\lambda\{\gamma[\mu r_1+\bar\mu r_2]+\bar\gamma r_3\}-\Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(W_1,W_2)}{1+\lambda[1+\alpha+\beta+(2-3\mu)\gamma]},$$
$$F^{(\alpha,\beta,\gamma,\mu,\lambda)}(R_1,R_2|W_1,W_2) \triangleq F_{\rm ext}^{(\alpha,\beta,\gamma,\mu,\lambda)}(R_1,R_2,R_1+R_2|W_1,W_2) = \frac{\lambda[(\gamma\mu+\bar\gamma)R_1+(\gamma\bar\mu+\bar\gamma)R_2]-\Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(W_1,W_2)}{1+\lambda[1+\alpha+\beta+(2-3\mu)\gamma]},$$
$$F_{\rm ext}(r_1,r_2,r_3|W_1,W_2) \triangleq \sup_{\alpha,\beta,\lambda>0,\ (\gamma,\mu)\in[0,1]\times[0,1/2]}F_{\rm ext}^{(\alpha,\beta,\gamma,\mu,\lambda)}(r_1,r_2,r_3|W_1,W_2),$$
$$F(R_1,R_2|W_1,W_2) \triangleq F_{\rm ext}(R_1,R_2,R_1+R_2|W_1,W_2) = \sup_{\alpha,\beta,\lambda>0,\ (\gamma,\mu)\in[0,1]\times[0,1/2]}F^{(\alpha,\beta,\gamma,\mu,\lambda)}(R_1,R_2|W_1,W_2),$$
$$\overline{\mathcal{R}}(W_1,W_2) \triangleq \{(R_1,R_2,G) : G\ge F(R_1,R_2|W_1,W_2)\}.$$

Here the overline distinguishes the region defined through $F$ from the region $\mathcal{R}(W_1,W_2)$ defined through $G$. We can show that the above functions and sets satisfy the following property.

Property 3:

a) $\Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(q|W_1,W_2)$ is a convex function of $\lambda>0$.

b) For every $q\in\mathcal{Q}$, we have

$$\lim_{\lambda\to+0}\frac{\Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(q|W_1,W_2)}{\lambda} = -\alpha D(q_{Y|XZU}\|W_1|q_{XZU})-\beta D(q_{Z|XYU}\|W_2|q_{XYU})-\gamma\mu D(q_{Y|XU}\|W_1|q_{XU})+\gamma[\mu I_q(X;Y|U)+\bar\mu I_q(U;Z)]+\bar\gamma I_q(X;Y).$$

c) If $(r_1,r_2,r_3)\notin\mathcal{C}_{\rm ext}(W_1,W_2)$, then we have $F_{\rm ext}(r_1,r_2,r_3|W_1,W_2)>0$. In particular, if $(R_1,R_2,R_1+R_2)\notin\mathcal{C}_{\rm ext}(W_1,W_2)$, then we have $F_{\rm ext}(R_1,R_2,R_1+R_2|W_1,W_2)>0$, which implies that if $(R_1,R_2)\notin\mathcal{C}(W_1,W_2)$, then we have $F(R_1,R_2|W_1,W_2)>0$.

Proof of Property 3 is given in Appendix C. Our main result is the following.

Theorem 3: For any BC $(W_1,W_2)$, we have

$$G(R_1,R_2|W_1,W_2)\ge F(R_1,R_2|W_1,W_2), \tag{6}$$
$$\mathcal{R}(W_1,W_2)\subseteq\overline{\mathcal{R}}(W_1,W_2). \tag{7}$$
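Property 3 parts a) and b) rest on two general facts about a log-moment generating function $\Omega(\lambda)=\log\sum_i q_i e^{\lambda w_i}$: it is convex in $\lambda$, and $\Omega(\lambda)/\lambda$ tends to the mean $\sum_i q_i w_i$ as $\lambda\to+0$. The sketch below checks both facts numerically for an arbitrary finite distribution. The values `q` and `w` are made-up stand-ins for $q_{UXYZ}$ and $\omega_q^{(\alpha,\beta,\gamma,\mu)}$, and the function names are hypothetical; nothing here evaluates the paper's actual $\Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(q|W_1,W_2)$.

```python
import math

def log_mgf(q, w, lam):
    """Omega(lam) = log sum_i q[i] * exp(lam * w[i]), via log-sum-exp."""
    terms = [math.log(qi) + lam * wi for qi, wi in zip(q, w) if qi > 0]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def is_midpoint_convex(q, w, lams, tol=1e-12):
    """Check Omega((a+b)/2) <= (Omega(a) + Omega(b)) / 2 on a grid of points."""
    for a in lams:
        for b in lams:
            mid = log_mgf(q, w, 0.5 * (a + b))
            if mid > 0.5 * (log_mgf(q, w, a) + log_mgf(q, w, b)) + tol:
                return False
    return True

# Illustrative (assumed) data: a three-point distribution and arbitrary values.
q = [0.5, 0.3, 0.2]
w = [-1.0, 0.25, 2.0]
mean_w = sum(qi * wi for qi, wi in zip(q, w))

grid = [0.1 * k for k in range(1, 31)]
convex_ok = is_midpoint_convex(q, w, grid)
slope_at_zero = log_mgf(q, w, 1e-6) / 1e-6  # approximates the mean of w
```

The same two facts are what make the sign of $F$ depend only on the first-order behaviour of $\Omega$ near $\lambda=0$.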

It follows from Theorem 3 and Property 3 part c) that if $(R_1,R_2)$ is outside the capacity region, then the error probability of decoding goes to one exponentially and its exponent is not below $F(R_1,R_2|W_1,W_2)$. From this theorem we immediately obtain the following corollary, which recovers the strong converse theorem by Csiszár and Körner [3].

Corollary 1: For each fixed $\varepsilon\in(0,1)$, and for any $(W_1,W_2)$, we have

$$\mathcal{C}_{\rm ABC}(\varepsilon|W_1,W_2) = \mathcal{C}_{\rm ABC}(W_1,W_2) = \mathcal{C}(W_1,W_2).$$

For each fixed $\varepsilon\in(0,1/2)$, and for any $(W_1,W_2)$, we have

$$\mathcal{C}_{\rm ABC}(\varepsilon,\varepsilon|W_1,W_2) = \mathcal{C}_{\rm ABC}(W_1,W_2) = \mathcal{C}(W_1,W_2).$$

Proof of Theorem 3 will be given in the next section. The strong converse theorem for the single-user channel is proved by a combinatorial method called the method of types, which was developed by Csiszár and Körner [3]. This method is not useful for proving Theorem 3. In fact, when we use the method of types, it is very hard to extract a condition related to the Markov chain condition $U\leftrightarrow X\leftrightarrow Y$, which the auxiliary random variable $U\in\mathcal{U}$ must satisfy when $(R_1,R_2)$ is on the boundary of $\mathcal{C}(W_1,W_2)$. Some novel techniques based on the information spectrum method introduced by Han [6] are necessary to prove this theorem.

III. PROOFS OF THE MAIN RESULTS

We first prove the following lemma.

Lemma 1: For any $\eta>0$ and for any $(\varphi^{(n)},\psi_1^{(n)},\psi_2^{(n)})$ satisfying

$$\frac{1}{n}\log|\mathcal{K}_n|\ge R_1,\qquad \frac{1}{n}\log|\mathcal{L}_n|\ge R_2,$$

we have

$$P_c^{(n)}(\varphi^{(n)},\psi_1^{(n)},\psi_2^{(n)}) \le p_{L_nX^nY^nZ^n}\Big\{$$
$$0\le\frac{1}{n}\log\frac{W_1^n(Y^n|X^n)}{Q^{\rm (i)}_{Y^n|X^nZ^nL_n}(Y^n|X^n,Z^n,L_n)}+\eta, \tag{8}$$
$$0\le\frac{1}{n}\log\frac{W_2^n(Z^n|X^n)}{Q^{\rm (ii)}_{Z^n|X^nY^nL_n}(Z^n|X^n,Y^n,L_n)}+\eta, \tag{9}$$
$$R_1\le\frac{1}{n}\log\frac{W_1^n(Y^n|X^n)}{Q^{\rm (iii)}_{Y^n|L_n}(Y^n|L_n)}+\eta, \tag{10}$$
$$R_2\le\frac{1}{n}\log\frac{p_{Z^n|L_n}(Z^n|L_n)}{Q^{\rm (iv)}_{Z^n}(Z^n)}+\eta, \tag{11}$$
$$R_1+R_2\le\frac{1}{n}\log\frac{W_1^n(Y^n|X^n)}{Q^{\rm (v)}_{Y^n}(Y^n)}+\eta\Big\}+5e^{-n\eta}. \tag{12}$$

In (8), we can choose any conditional distribution $Q^{\rm (i)}_{Y^n|X^nZ^nL_n}$ on $\mathcal{Y}^n$ given $(X^n,Z^n,L_n)$. In (9), we can choose any conditional distribution $Q^{\rm (ii)}_{Z^n|X^nY^nL_n}$ on $\mathcal{Z}^n$ given $(X^n,Y^n,L_n)$. In (10), we can choose any conditional distribution $Q^{\rm (iii)}_{Y^n|L_n}$ on $\mathcal{Y}^n$ given $L_n$. In (11), we can choose any distribution $Q^{\rm (iv)}_{Z^n}$ on $\mathcal{Z}^n$. In (12), we can choose any distribution $Q^{\rm (v)}_{Y^n}$ on $\mathcal{Y}^n$.

Proof of this lemma is given in Appendix D. For $t=1,2,\cdots,n$, set

$$\mathcal{U}_t \triangleq \mathcal{L}_n\times\mathcal{Y}^{t-1}\times\mathcal{Z}_{t+1}^n,\qquad U_t \triangleq (L_n,Y^{t-1},Z_{t+1}^n)\in\mathcal{U}_t,\qquad u_t \triangleq (l,y^{t-1},z_{t+1}^n)\in\mathcal{U}_t,$$
$$\mathcal{V}_t \triangleq \mathcal{L}_n\times\mathcal{Z}_{t+1}^n,\qquad V_t \triangleq (L_n,Z_{t+1}^n)\in\mathcal{V}_t.$$

For each $t=1,2,\cdots,n$, let $\kappa_t$ be the natural projection from $\mathcal{U}_t$ onto $\mathcal{V}_t$. Using $\kappa_t$, we have $V_t = \kappa_t(U_t)$, $t=1,2,\cdots,n$. From Lemma 1 we have the following.

Lemma 2: For any $\eta>0$ and for any $(\varphi^{(n)},\psi_1^{(n)},\psi_2^{(n)})$ satisfying

$$\frac{1}{n}\log|\mathcal{K}_n|\ge R_1,\qquad \frac{1}{n}\log|\mathcal{L}_n|\ge R_2,$$

we have

$$\begin{aligned}
P_c^{(n)}(\varphi^{(n)},\psi_1^{(n)},\psi_2^{(n)}) \le p_{L_nX^nY^nZ^n}\Big\{
& 0\le\frac{1}{n}\sum_{t=1}^{n}\log\frac{W_1(Y_t|X_t)}{Q^{\rm (i)}_{Y_t|X_tZ_tU_t}(Y_t|X_t,Z_t,U_t)}+\eta,\\
& 0\le\frac{1}{n}\sum_{t=1}^{n}\log\frac{W_2(Z_t|X_t)}{Q^{\rm (ii)}_{Z_t|X_tY_tU_t}(Z_t|X_t,Y_t,U_t)}+\eta,\\
& R_1\le\frac{1}{n}\sum_{t=1}^{n}\log\frac{W_1(Y_t|X_t)\,Q^{\rm (iii)}_{Z_t|U_t}(Z_t|U_t)}{Q^{\rm (iii)}_{Y_t|U_t}(Y_t|U_t)\,Q^{\rm (iii)}_{Z_t|V_t}(Z_t|V_t)}+\eta,\\
& R_2\le\frac{1}{n}\sum_{t=1}^{n}\log\frac{p_{Z_t|V_t}(Z_t|V_t)}{Q^{\rm (iv)}_{Z_t}(Z_t)}+\eta,\\
& R_1+R_2\le\frac{1}{n}\sum_{t=1}^{n}\log\frac{W_1(Y_t|X_t)}{Q^{\rm (v)}_{Y_t}(Y_t)}+\eta\Big\}+5e^{-n\eta},
\end{aligned} \tag{13}$$

where for each $t=1,2,\cdots,n$, the following probability and conditional probability distributions:

$$Q^{\rm (i)}_{Y_t|X_tZ_tU_t},\ Q^{\rm (ii)}_{Z_t|X_tY_tU_t},\ Q^{\rm (iii)}_{Y_t|U_t},\ Q^{\rm (iii)}_{Z_t|U_t},\ Q^{\rm (iii)}_{Z_t|V_t},\ Q^{\rm (iv)}_{Z_t},\ Q^{\rm (v)}_{Y_t} \tag{14}$$

appearing in the first term in the right member of (13) can be chosen arbitrarily.

Proof: In (8), we choose $Q^{\rm (i)}_{Y^n|X^nZ^nL_n}$ so that

$$Q^{\rm (i)}_{Y^n|X^nZ^nL_n}(Y^n|X^n,Z^n,L_n) = \prod_{t=1}^{n}Q^{\rm (i)}_{Y_t|X_tY^{t-1}Z_{t+1}^nL_n}(Y_t|X_t,Y^{t-1},Z_{t+1}^n,L_n) = \prod_{t=1}^{n}Q^{\rm (i)}_{Y_t|X_tU_t}(Y_t|X_t,U_t). \tag{15}$$

In (9), we choose $Q^{\rm (ii)}_{Z^n|X^nY^nL_n}$ so that

$$Q^{\rm (ii)}_{Z^n|X^nY^nL_n}(Z^n|X^n,Y^n,L_n) = \prod_{t=1}^{n}Q^{\rm (ii)}_{Z_t|X_tY^{t-1}Z_{t+1}^nL_n}(Z_t|X_t,Y^{t-1},Z_{t+1}^n,L_n) = \prod_{t=1}^{n}Q^{\rm (ii)}_{Z_t|X_tU_t}(Z_t|X_t,U_t). \tag{16}$$

In (10), we have the following chain of equalities:

$$\begin{aligned}
\frac{W_1^n(Y^n|X^n)}{Q^{\rm (iii)}_{Y^n|L_n}(Y^n|L_n)}
&= \frac{W_1^n(Y^n|X^n)\,Q^{\rm (iii)}_{Z^n|L_n}(Z^n|L_n)}{Q^{\rm (iii)}_{Y^n|L_n}(Y^n|L_n)\,Q^{\rm (iii)}_{Z^n|L_n}(Z^n|L_n)}\\
&= \prod_{t=1}^{n}\frac{Q^{\rm (iii)}_{Z_t|Y^{t-1}Z_{t+1}^nL_n}(Z_t|Y^{t-1},Z_{t+1}^n,L_n)}{Q^{\rm (iii)}_{Y_t|Y^{t-1}Z_{t+1}^nL_n}(Y_t|Y^{t-1},Z_{t+1}^n,L_n)}\times\prod_{t=1}^{n}\frac{W_1(Y_t|X_t)}{Q^{\rm (iii)}_{Z_t|Z_{t+1}^nL_n}(Z_t|Z_{t+1}^n,L_n)}\\
&= \prod_{t=1}^{n}\frac{W_1(Y_t|X_t)\,Q^{\rm (iii)}_{Z_t|U_t}(Z_t|U_t)}{Q^{\rm (iii)}_{Y_t|U_t}(Y_t|U_t)\,Q^{\rm (iii)}_{Z_t|V_t}(Z_t|V_t)}.
\end{aligned} \tag{17}$$

In (11), we choose $Q^{\rm (iv)}_{Z^n}$ so that

$$Q^{\rm (iv)}_{Z^n}(Z^n) = \prod_{t=1}^{n}Q^{\rm (iv)}_{Z_t}(Z_t). \tag{18}$$

In (12), we choose $Q^{\rm (v)}_{Y^n}$ so that

$$Q^{\rm (v)}_{Y^n}(Y^n) = \prod_{t=1}^{n}Q^{\rm (v)}_{Y_t}(Y_t). \tag{19}$$

From Lemma 1 and (15)-(19), we have the bound (13) in Lemma 2.

For each $t=1,2,\cdots,n$, let $\mathcal{Q}_t$ be the set of all

$$Q_t = (Q^{\rm (i)}_{Y_t|X_tZ_tU_t},\ Q^{\rm (ii)}_{Z_t|X_tY_tU_t},\ Q^{\rm (iii)}_{Y_t|U_t},\ Q^{\rm (iii)}_{Z_t|U_t},\ Q^{\rm (iii)}_{Z_t|V_t},\ Q^{\rm (iv)}_{Z_t},\ Q^{\rm (v)}_{Y_t}).$$

Set

$$\mathcal{Q}^n \triangleq \prod_{t=1}^{n}\mathcal{Q}_t,\qquad Q^n \triangleq \{Q_t\}_{t=1}^{n}\in\mathcal{Q}^n.$$

To evaluate an upper bound of (13) in Lemma 2, we use the following lemma, which is well known as Cramér's bound in large deviation theory.

Lemma 3: For any real-valued random variable $A$ and any $\theta>0$, we have

$$\Pr\{A\ge a\}\le\exp[-(\theta a-\log{\rm E}[\exp(\theta A)])].$$

Here we define a quantity which serves as an exponential upper bound of $P_c^{(n)}(\varphi^{(n)},\psi_1^{(n)},\psi_2^{(n)})$. Let $\mathcal{P}^{(n)}(W_1,W_2)$ be the set of all probability distributions $p_{L_nX^nY^nZ^n}$ on $\mathcal{L}_n\times\mathcal{X}^n\times\mathcal{Y}^n\times\mathcal{Z}^n$ having the form:

$$p_{L_nX^nY^nZ^n}(l,x^n,y^n,z^n) = p_{L_n}(l)\prod_{t=1}^{n}p_{X_t|L_nX^{t-1}}(x_t|l,x^{t-1})W_1(y_t|x_t)W_2(z_t|x_t).$$

For simplicity of notation we write $p^{(n)}$ for $p_{L_nX^nY^nZ^n}\in\mathcal{P}^{(n)}(W_1,W_2)$. We assume that $p_{U_tX_tY_tZ_t} = p_{L_nX_tY^tZ_t^n}$ is a marginal distribution induced by $p^{(n)}$. For $t=1,2,\cdots,n$, we simply write $p_t = p_{U_tX_tY_tZ_t}$. For each $t=1,2,\cdots,n$, let ${\rm Proj}(\mathcal{U}_t\to\mathcal{V}_t)$ be the set of all projections $\kappa_t$ from $\mathcal{U}_t$ onto $\mathcal{V}_t$. For $p^{(n)}\in\mathcal{P}^{(n)}(W_1,W_2)$, $\kappa^n = \{\kappa_t\}_{t=1}^{n}\in\prod_{t=1}^{n}{\rm Proj}(\mathcal{U}_t\to\mathcal{V}_t)$, and $Q^n\in\mathcal{Q}^n$, we define

$$\begin{aligned}
\Omega^{(\alpha,\beta,\gamma,\mu,\theta)}(p^{(n)},\kappa^n,Q^n) \triangleq \log{\rm E}_{p^{(n)}}\Bigg[
&\prod_{t=1}^{n}\left\{\frac{W_1(Y_t|X_t)}{Q^{\rm (i)}_{Y_t|X_tZ_tU_t}(Y_t|X_t,Z_t,U_t)}\right\}^{\alpha\theta}
\times\prod_{t=1}^{n}\left\{\frac{W_2(Z_t|X_t)}{Q^{\rm (ii)}_{Z_t|X_tY_tU_t}(Z_t|X_t,Y_t,U_t)}\right\}^{\beta\theta}\\
&\times\prod_{t=1}^{n}\left\{\frac{W_1(Y_t|X_t)}{Q^{\rm (iii)}_{Y_t|U_t}(Y_t|U_t)}\right\}^{\gamma\mu\theta}
\times\prod_{t=1}^{n}\left\{\frac{Q^{\rm (iii)}_{Z_t|U_t}(Z_t|U_t)}{Q^{\rm (iii)}_{Z_t|V_t}(Z_t|V_t)}\right\}^{\gamma\mu\theta}\\
&\times\prod_{t=1}^{n}\left\{\frac{p_{Z_t|V_t}(Z_t|V_t)}{Q^{\rm (iv)}_{Z_t}(Z_t)}\right\}^{\gamma\bar\mu\theta}
\times\prod_{t=1}^{n}\left\{\frac{W_1(Y_t|X_t)}{Q^{\rm (v)}_{Y_t}(Y_t)}\right\}^{\bar\gamma\theta}\Bigg],
\end{aligned}$$

where for each $t=1,2,\cdots,n$, the following probability and conditional probability distributions:

$$Q^{\rm (i)}_{Y_t|X_tZ_tU_t},\ Q^{\rm (ii)}_{Z_t|X_tY_tU_t},\ Q^{\rm (iii)}_{Y_t|U_t},\ Q^{\rm (iii)}_{Z_t|U_t},\ Q^{\rm (iii)}_{Z_t|V_t},\ Q^{\rm (iv)}_{Z_t},\ Q^{\rm (v)}_{Y_t} \tag{20}$$

appearing in the definition of $\Omega^{(\alpha,\beta,\gamma,\mu,\theta)}(p^{(n)},\kappa^n,Q^n)$ can be chosen arbitrarily.

By Lemmas 2 and 3, we have the following proposition.

Proposition 1: For any $\alpha,\beta,\gamma,\mu,\theta>0$, any $Q^n\in\mathcal{Q}^n$, and any $(\varphi^{(n)},\psi_1^{(n)},\psi_2^{(n)})$ satisfying

$$\frac{1}{n}\log|\mathcal{K}_n|\ge R_1,\qquad \frac{1}{n}\log|\mathcal{L}_n|\ge R_2,$$

we have

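$$P_c^{(n)}(\varphi^{(n)},\psi_1^{(n)},\psi_2^{(n)}) \le 6\exp\left\{-n[1+\theta(1+\alpha+\beta)]^{-1}\left(\theta[(\bar\gamma+\gamma\mu)R_1+(\bar\gamma+\gamma\bar\mu)R_2]-\frac{1}{n}\Omega^{(\alpha,\beta,\gamma,\mu,\theta)}(p^{(n)},\kappa^n,Q^n)\right)\right\}.$$

Proof: We define five random variables $A_i$, $i=1,2,\cdots,5$, by

$$A_1 \triangleq \frac{1}{n}\sum_{t=1}^{n}\log\frac{W_1(Y_t|X_t)}{Q^{\rm (i)}_{Y_t|X_tZ_tU_t}(Y_t|X_t,Z_t,U_t)},\qquad
A_2 \triangleq \frac{1}{n}\sum_{t=1}^{n}\log\frac{W_2(Z_t|X_t)}{Q^{\rm (ii)}_{Z_t|X_tY_tU_t}(Z_t|X_t,Y_t,U_t)},$$
$$A_3 \triangleq \frac{1}{n}\sum_{t=1}^{n}\log\frac{W_1(Y_t|X_t)\,Q^{\rm (iii)}_{Z_t|U_t}(Z_t|U_t)}{Q^{\rm (iii)}_{Y_t|U_t}(Y_t|U_t)\,Q^{\rm (iii)}_{Z_t|V_t}(Z_t|V_t)}-R_1,\qquad
A_4 \triangleq \frac{1}{n}\sum_{t=1}^{n}\log\frac{p_{Z_t|V_t}(Z_t|V_t)}{Q^{\rm (iv)}_{Z_t}(Z_t)}-R_2,$$
$$A_5 \triangleq \frac{1}{n}\sum_{t=1}^{n}\log\frac{W_1(Y_t|X_t)}{Q^{\rm (v)}_{Y_t}(Y_t)}-(R_1+R_2).$$

Then by Lemma 2, for any $(\varphi^{(n)},\psi_1^{(n)},\psi_2^{(n)})$ satisfying

$$\frac{1}{n}\log|\mathcal{K}_n|\ge R_1,\qquad \frac{1}{n}\log|\mathcal{L}_n|\ge R_2,$$

we have

$$\begin{aligned}
P_c^{(n)}(\varphi^{(n)},\psi_1^{(n)},\psi_2^{(n)})
&\le p_{L_nX^nY^nZ^n}\{A_i\ge-\eta\ \text{for}\ i=1,2,3,4,5\}+5e^{-n\eta}\\
&\le p_{L_nX^nY^nZ^n}\{\alpha A_1+\beta A_2+\gamma[\mu A_3+\bar\mu A_4]+\bar\gamma A_5\ge-\eta[1+\alpha+\beta]\}+5e^{-n\eta}\\
&= p_{L_nX^nY^nZ^n}\{A\ge a\}+5e^{-n\eta},
\end{aligned} \tag{21}$$

where we set

$$A \triangleq \alpha A_1+\beta A_2+\gamma[\mu A_3+\bar\mu A_4]+\bar\gamma A_5,\qquad a \triangleq -\eta[1+\alpha+\beta].$$

Applying Lemma 3, with $\theta$ replaced by $n\theta$, to the first term in the right member of (21), we have

$$\begin{aligned}
P_c^{(n)}(\varphi^{(n)},\psi_1^{(n)},\psi_2^{(n)})
&\le\exp[-(n\theta a-\log{\rm E}_{p^{(n)}}[\exp(n\theta A)])]+5e^{-n\eta}\\
&= \exp\left\{n\left(\theta(1+\alpha+\beta)\eta-\theta[(\bar\gamma+\gamma\mu)R_1+(\bar\gamma+\gamma\bar\mu)R_2]+\frac{1}{n}\Omega^{(\alpha,\beta,\gamma,\mu,\theta)}(p^{(n)},\kappa^n,Q^n)\right)\right\}+5e^{-n\eta}.
\end{aligned} \tag{22}$$

We choose $\eta$ so that

$$-\eta = \theta(1+\alpha+\beta)\eta-\theta[(\bar\gamma+\gamma\mu)R_1+(\bar\gamma+\gamma\bar\mu)R_2]+\frac{1}{n}\Omega^{(\alpha,\beta,\gamma,\mu,\theta)}(p^{(n)},\kappa^n,Q^n). \tag{23}$$

Solving (23) with respect to $\eta$, we have

$$\eta = [1+\theta(1+\alpha+\beta)]^{-1}\left(\theta[(\bar\gamma+\gamma\mu)R_1+(\bar\gamma+\gamma\bar\mu)R_2]-\frac{1}{n}\Omega^{(\alpha,\beta,\gamma,\mu,\theta)}(p^{(n)},\kappa^n,Q^n)\right).$$

For this choice of $\eta$ and (22), we have

$$P_c^{(n)}(\varphi^{(n)},\psi_1^{(n)},\psi_2^{(n)}) \le 6e^{-n\eta} = 6\exp\left\{-n\{1+\theta(1+\alpha+\beta)\}^{-1}\left(\theta[(\bar\gamma+\gamma\mu)R_1+(\bar\gamma+\gamma\bar\mu)R_2]-\frac{1}{n}\Omega^{(\alpha,\beta,\gamma,\mu,\theta)}(p^{(n)},\kappa^n,Q^n)\right)\right\},$$

completing the proof.

Set

$$\overline{\Omega}^{(\alpha,\beta,\gamma,\mu,\theta)}(W_1,W_2) \triangleq \sup_{n\ge 1}\ \max_{\substack{p^{(n)}\in\mathcal{P}^{(n)}(W_1,W_2),\\ \kappa^n\in\prod_{t=1}^{n}{\rm Proj}(\mathcal{U}_t\to\mathcal{V}_t)}}\ \min_{Q^n\in\mathcal{Q}^n}\ \frac{1}{n}\Omega^{(\alpha,\beta,\gamma,\mu,\theta)}(p^{(n)},\kappa^n,Q^n),$$

where the overline distinguishes this multi-letter quantity from the single-letter quantity $\Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(W_1,W_2)$ of Section II. Then we have the following corollary from Proposition 1.

Corollary 2: For any positive $R_1$, $R_2$ and for any positive $\alpha$, $\beta$, $\gamma$, $\mu$, and $\theta$, we have

$$G(R_1,R_2|W_1,W_2)\ge\frac{\theta[(\bar\gamma+\gamma\mu)R_1+(\bar\gamma+\gamma\bar\mu)R_2]-\overline{\Omega}^{(\alpha,\beta,\gamma,\mu,\theta)}(W_1,W_2)}{1+\theta(1+\alpha+\beta)}.$$

Proof: By the definition of $\overline{\Omega}^{(\alpha,\beta,\gamma,\mu,\theta)}(W_1,W_2)$, the definition of $G^{(n)}(R_1,R_2|W_1,W_2)$, and Proposition 1, we have

$$G^{(n)}(R_1,R_2|W_1,W_2)\ge\frac{\theta[(\bar\gamma+\gamma\mu)R_1+(\bar\gamma+\gamma\bar\mu)R_2]-\overline{\Omega}^{(\alpha,\beta,\gamma,\mu,\theta)}(W_1,W_2)}{1+\theta(1+\alpha+\beta)}-\frac{1}{n}\log 6,$$

from which we have Corollary 2.

We shall call $\overline{\Omega}^{(\alpha,\beta,\gamma,\mu,\theta)}(W_1,W_2)$ the communication potential. The above corollary implies that the analysis of $\overline{\Omega}^{(\alpha,\beta,\gamma,\mu,\theta)}(W_1,W_2)$ leads to the establishment of a strong converse theorem for the BC. In the following argument we derive an explicit upper bound of $\overline{\Omega}^{(\alpha,\beta,\gamma,\mu,\theta)}(W_1,W_2)$. To this end we use a novel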

technique called the recursive method. This method is a powerful tool to derive a single-letterized exponent function for rates below the rate distortion function. This method is also applicable to prove the exponential strong converse theorems for other network information theory problems [7], [8], [9]. For each t = 1, 2, · · · , n, define a function of (ut , xt , yt , zt ) ∈ Ut ×X ×Y ×Z by (α,β,γ,µ,θ) f(pZ |V ,κt ),Q t t △

=

(xt , yt , zt |ut )

(α,β,γ,µ,θ)

×pX t−1 Y t−1 |Ln Z n ;F t−1 (xt−1 , y t−1 |l, z n ) ×pXt Yt |Ln X t−1 Z n (xt , yt |l, xt−1 , y t−1 , z n ) (α,β,γ,µ,θ)

xt ,y t

×pXt Yt |Ln X t−1 Z n (xt , yt |l, xt−1 , y t−1 , z n ) (α,β,γ,µ,θ)

(xt , yt , zt |ut ), ×fFt o n exp Ω(α,β,γ,µ,θ)(p(n) , κn , Qn ) =

n Y

(α,β,γ,µ,θ)

Φt,F t

(l, z n ).

t=1

(l,z n )∈Ln ×Z n

n

of the random variable (Ln , Z ) taking values in Ln ×Z n by (α,β,γ,µ,θ)

pLn Z n ;F t (l, z n )



F t = {Fi }ti=1 .

= C˜t−1 pLn Z n (l, z n )

t Y

(α,β,γ,µ,θ)

Φi,F i

(l, z n ),

(25)

i=1

where C˜t is a constant for normalization given by

(xt ,y t ,l,z n )∈X t ×Y t ×Ln ×Z

, n

(α,β,γ,µ,θ) pX t Y t |Ln Z n ;F t (xt , y t |l, z n )

Ct (l, z ) =

X

t

×

t Y

l,z n

t Y

(α,β,γ,µ,θ)

Φi,F i

(l, z n ).

i=1

t

(α,β,γ,µ,θ)

fFi

−1 = C˜t C˜t−1 ,

where we define C˜0 = 1. Then we have the following. Lemma 5: n X (α,β,γ,µ,θ) Ω(α,β,γ,µ,θ)(p(n) , κn , Qn ) = log Λt,F t ,

(26)

(27)

t=1

n

(α,β,γ,µ,θ)

Λt,F t X (α,β,γ,µ,θ) (α,β,γ,µ,θ) = pLn Z n ;F t−1 (l, z n )Φt,F t (l, z n )

(xi , yi , zi |ui )

l,z n

i=1

are constants for normalization. For t = 1, 2, · · · , n, define

=

XX l,z n

△ (α,β,γ,µ,θ) Φt,F t (l, z n ) =

pLn Z n (l, z n )

(α,β,γ,µ,θ) △

pX t Y t |Ln Z n (x , y |l, z )

xt ,y t

X

Λt,F t

where △

C˜t =

For t = 1, 2, · · · , n, define

Ct−1 (l, z n )pX t Y t |Ln Z n (xt , y t |l, z n ) t Y (α,β,γ,µ,θ) fFi (xi , yi , zi |ui ), × i=1 n

pLn Z n (l, z n )

Proof of this lemma is given in Appendix E. Next we define the probability distribution n o (α,β,γ,µ,θ) (α,β,γ,µ,θ) pLn Z n ;F t = pLn Z n ;F t (l, z n )

(α,β,γ,µ,θ)

pX t Y t |Ln Z n ;F t n o △ (α,β,γ,µ,θ) = pX t Y t |Ln Z n ;F t (xt , y t |l, z n ) =

X l,z n

For each t = 1, 2, · · · , n, we define a conditional probability distribution of (X t , Y t ) given (Ln , Z n ) by



(xt , yt , zt |ut ).

(α,β,γ,µ,θ)

Set △

(l, z n ))−1

Φt,F t (l, z n ) X (α,β,γ,µ,θ) pX t−1 Y t−1 |Ln Z n ;F t−1 (xt−1 , y t−1 |l, z n ) =

  Q(i) Yt |Xt Zt Ut (yt |xt , zt , ut )  βθ   W2 (zt |xt ) ×  Q(ii)  Zt |Xt Yt Ut (zt |xt , yt , ut ) γµθ   W (y |x )  1 t t ×  Q(iii) (yt |ut )  Yt |Ut γµθ (  )γ µ¯θ  Q(iii) (zt |ut )  pZt |Vt (zt |vt ) Zt |Ut × (iv)  Q(iii) (zt |vt )  QZt (zt ) Zt |Vt ( )γ¯ θ W1 (yt |xt ) × . (v) QYt (yt )

Ft = (pZt |Vt , κt , Qt ),

(α,β,γ,µ,θ)

Furthermore, we have αθ 

W1 (yt |xt )

(α,β,γ,µ,θ)

pX t Y t |Ln Z n ;F t (xt , y t |l, z n ) = (Φt,F t

×fFt

t

 

Lemma 4: For each t = 1, 2, · · · , n, and for any (l, z n x , y t ) ∈ Ln ×Z n ×X t ×Y t , we have t

Ct (l, z

n

−1 )Ct−1 (l, z n ),

(24)

where we define C0 (l, z n ) = 1 for (l, z n ) ∈ Ln ×Z n . Then we have the following lemma.

(α,β,γ,µ,θ)

pLn Z n ;F t−1 (l, z n )

xt ,y t (α,β,γ,µ,θ)

×pX t−1 Y t−1 |Ln Z n ;F t−1 (xt−1 , y t−1 |l, z n )

×pXt Yt |X t−1 Y t−1 Ln Z n (xt , yt |xt−1 , y t−1 , l, z n ) (α,β,γ,µ,θ)

×fFt

(xt , yt , zt |ut ).

(28)

Proof: By Lemma 4, we have n o exp Ω(α,β,γ,µ,θ)(p(n) , κn , Qn ) = C˜n =

n Y

−1 (a) = C˜t C˜t−1

n Y

(α,β,γ,µ,θ)

Λt,F t

.

(29)

t=1

t=1

and choose the following probability and conditional probability distributions: ) (iii) (iii) (ii) (i) QYt |Xt Zt Ut , QZt |Xt Yt Ut , QYt |Ut , QZt |Ut , appearing in

(α,β,γ,µ,θ) Λt,qt .

Step (a) follows from the definition (26) of From (29), we have (27) in Lemma 5. We next prove (28) in Lemma (α,β,γ,µ,θ) 5. Multiplying Λt,F t = C˜t /C˜t−1 to both sides of (25), we have (α,β,γ,µ,θ) (α,β,γ,µ,θ) pLn Z n ;F t (l, z n ) t Y (α,β,γ,µ,θ) −1 (l, z n ), Φi,F i C˜t−1 pLn Z n (l, z n ) i=1 (α,β,γ,µ,θ) (α,β,γ,µ,θ) (l, z n ). pLn Z n ;F t−1 (l, z n )Φt,F t

Λt,F t

= =

(30)

(31)

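Taking summations of (30) and (31) with respect to $(l,z^n)$, we have (28) in Lemma 5.

The following proposition is the mathematical core of the proof of our main result.

Proposition 2: For $\theta\in(0,[(2-3\mu)\gamma]^{-1})$, set

$$\lambda = \frac{\theta}{1-(2-3\mu)\gamma\theta}\ \Leftrightarrow\ \theta = \frac{\lambda}{1+(2-3\mu)\gamma\lambda}. \tag{32}$$

Then, for any positive $\alpha$, $\beta$, $\gamma$, $\mu$, and any $\theta\in(0,[(2-3\mu)\gamma]^{-1})$, we have

$$\overline{\Omega}^{(\alpha,\beta,\gamma,\mu,\theta)}(W_1,W_2)\le\frac{\Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(W_1,W_2)}{1+(2-3\mu)\gamma\lambda}.$$

Here the left-hand side is the communication potential introduced before Corollary 2, and the right-hand side involves the single-letter quantity defined in Section II.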
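The change of variables (32) can be checked numerically. The sketch below, with hypothetical function names and arbitrarily chosen parameter values, verifies that the two maps in (32) are inverse to each other, and that substituting the bound of Proposition 2 (taken with equality) into the exponent of Corollary 2 gives exactly the form $\{\lambda r-\Omega\}/\{1+\lambda[1+\alpha+\beta+(2-3\mu)\gamma]\}$ used in the definition of $F^{(\alpha,\beta,\gamma,\mu,\lambda)}$. The value `omega_lam` is a made-up number standing in for $\Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(W_1,W_2)$, and `r` stands for $(\gamma\mu+\bar\gamma)R_1+(\gamma\bar\mu+\bar\gamma)R_2$.

```python
def lam_from_theta(theta, gamma, mu):
    """lambda = theta / (1 - (2 - 3*mu)*gamma*theta), valid for 0 < c*theta < 1."""
    c = (2.0 - 3.0 * mu) * gamma
    if theta <= 0 or c * theta >= 1:
        raise ValueError("theta must lie in (0, 1/((2-3*mu)*gamma))")
    return theta / (1.0 - c * theta)

def theta_from_lam(lam, gamma, mu):
    """theta = lambda / (1 + (2 - 3*mu)*gamma*lambda), valid for lambda > 0."""
    c = (2.0 - 3.0 * mu) * gamma
    if lam <= 0:
        raise ValueError("lambda must be positive")
    return lam / (1.0 + c * lam)

def exponent_theta_form(theta, r, omega_bar, alpha, beta):
    """Right-hand side of Corollary 2 with r the weighted rate sum."""
    return (theta * r - omega_bar) / (1.0 + theta * (1.0 + alpha + beta))

def exponent_lam_form(lam, r, omega_lam, alpha, beta, gamma, mu):
    """The expression defining F^{(alpha,beta,gamma,mu,lambda)}."""
    c = (2.0 - 3.0 * mu) * gamma
    return (lam * r - omega_lam) / (1.0 + lam * (1.0 + alpha + beta + c))

# Arbitrary illustrative values (assumptions, not taken from the paper).
alpha, beta, gamma, mu = 0.7, 1.3, 0.5, 0.25
theta, r, omega_lam = 0.2, 0.9, 0.05

lam = lam_from_theta(theta, gamma, mu)
c = (2.0 - 3.0 * mu) * gamma
omega_bar = omega_lam / (1.0 + c * lam)   # Proposition 2 taken with equality
lhs = exponent_theta_form(theta, r, omega_bar, alpha, beta)
rhs = exponent_lam_form(lam, r, omega_lam, alpha, beta, gamma, mu)
```

With these values `lhs` and `rhs` agree up to rounding, which is the algebraic identity behind the passage from the $\theta$-form to the $\lambda$-form of the exponent.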

Proof: Set △

ˆ n = {q = qUXY Z : |U| ≤ |Ln ||Y n−1 ||Z n−1 |}, Q △ ˆ (α,β,γ,µ,λ)(W1 , W2 ) = Ω min Ω(α,β,γ,µ,λ) (q|W1 , W2 ). n

ˆn q∈Q

Set (α,β,γ,µ,θ)

pLn Xt Y t Z n ;F t−1 (l, xt , y t , ztn ) t

(α,β,γ,µ,θ)

= pUt Xt Yt Zt ;F t−1 (ut , xt , yt , zt ) X △ (α,β,γ,µ,θ) = pLn Z n ;F t−1 (l, z n )

(α,β,γ,µ,θ)

=

fFt  

×pXt Yt |X t−1 Y t−1 Ln Z n (xt , yt |xt−1 , y t−1 , l, z n). (33) Then by Lemma 5, we have X (α,β,γ,µ,θ) (α,β,γ,µ,θ) pUt Xt Yt Zt ;F t−1 (ut , xt , yt , zt ) Λt,F t = ut ,xt ,yt ,zt

(α,β,γ,µ,θ)

(xt , yt , zt |ut ).

For each t = 1, 2, · · · , n, we choose qt = qUt Xt Yt Zt so that (α,β,γ,µ,θ)

qUt Xt Yt Zt (ut , xt , yt , zt ) = pUt Xt Yt Zt ;F t−1 (ut , xt , yt , zt )

W1 (yt |xt )

αθ 

such that they are the distributions induced by qUt Xt Yt Zt . Then for each t = 1, 2, · · · , n, we have the following chain of inequalities: (α,β,γ,µ,θ)

Λt,F t "(

W1αθ (Yt |Xt ) αθ qYt |Xt Zt Ut (Yt |Xt , Zt , Ut ) W2βθ (Zt |Xt ) W1γµθ (Yt |Xt ) × βθ qZt |Xt Yt Ut (Zt |Xt , Yt , Ut ) qYγµθ (Yt |Ut ) t |Ut

= Eqt

×

γµ ¯θ qZ (Zt |Ut ) W1γ¯ θ (Yt |Xt ) t |Ut γµ ¯θ (Zt ) qZ t

)

qYγ¯tθ (Yt )  ( γ µ¯θ )  γ(¯µ−µ)θ pZt |Vt (Zt |Vt )  qZt |Vt (Zt |Vt )   × γµ ¯θ  q γ(¯µ−µ)θ (Zt |Ut )  qZ (Z |V ) t t Zt |Ut t |Vt "( αθ (a) W1 (Yt |Xt ) ≤ Eqt αθ qYt |Xt Zt Ut (Yt |Xt , Zt , Ut ) W2βθ (Zt |Xt )

W1γµθ (Yt |Xt )

βθ qZ (Zt |Xt , Yt , Ut ) qYγµθ (Yt |Ut ) t |Xt Yt Ut t |Ut

(α,β,γ,µ,θ)

×pX t−1 Y t−1 |Ln Z n ;F t−1 (xt−1 , y t−1 |l, z n )

(xt , yt , zt |ut )

  Q(i) Yt |Xt Zt Ut (yt |xt , zt , ut )  βθ  γµθ    W (y |x )  W2 (zt |xt ) 1 t t × (iii)  Q(ii)    (z |x , y , u ) Q t Zt |Xt Yt Ut t t t Yt |Ut (yt |ut ) γµθ (  )γ µ¯θ  Q(iii) (zt |ut )  pZt |Vt (zt |vt ) Zt |Ut × (iv)  Q(iii) (zt |vt )  QZt (zt ) Zt |Vt )γ¯ θ ( W1 (yt |xt ) × (v) QYt (yt )

×

xt−1 ,z t−1

×fFt

(v)

(iv)

QZt , QYt

×

γµ ¯θ qZ (Zt |Ut ) W1γ¯ θ (Yt |Xt ) t |Ut

1−(2−3µ)γθ 1 ) 1−(2−3µ)γθ

γµ ¯θ (Zt ) qZ qYγ¯tθ (Yt ) t γ µ¯θ   pZt |Vt (Zt |Vt ) × Eqt qZt |Vt (Zt |Vt )    qZt |Vt (Zt |Vt ) γ(¯µ−µ)θ × Eqt qZt |Ut (Zt |Ut )  = exp [1 − (2 − 3µ)γθ]

o θ ×Ω(α,β,γ,µ, 1−(2−3µ)γθ ) (qt |W1 , W2 )



 Ω(α,β,γ,µ,λ) (qt |W1 , W2 ) = exp 1 + (2 − 3µ)γλ ) ( (α,β,γ,µ,λ) (c) ˆ (W1 , W2 ) Ωn ≤ exp 1 + (2 − 3µ)γλ   (α,β,γ,µ,λ) Ω (W1 , W2 ) (d) . = exp 1 + (2 − 3µ)γλ 

(b)

A PPENDIX A. Cardinality Bound on Auxiliary Random Variables (34)

Step (a) follows from H¨older’s inequality. Step (b) follows from (32). Step (c) follows from qt ∈ Pˆn (W1 , W2 ) and ˆ n(α,β,γ,µ,λ) (W1 , W2 ). Step (d) follows the definition of Ω from Lemma 6 in Appendix A. To prove this lemma we bound the cardinality |U| appearing in the definition of ˆ n(α,β,γ,µ,λ) (W1 , W2 ) to show that the bound |U| ≤|Y|+|Z|− Ω ˆ n(α,β,γ,µ,λ) (W1 , W2 ). Hence we 1 is sufficient to describe Ω have the following: 1 (α,β,γ,µ,θ) (n) n n Ω (p , κ , Q ) n n 1 (a) 1 X (α,β,γ,µ,θ) ≤ Ω(α,β,γ,µ,θ)(p(n) , κn , Qn ) = log Λt,F t n n t=1 min

qn ∈Qn

(b)





(α,β,γ,µ,λ)

(W1 , W2 ) . 1 + (2 − 3µ)γλ

(35)

Step (a) follows from (27) in Lemma 5. Step (b) follows from (34). Since (35) holds for any $n \ge 1$ and any $p^{(n)} \in \mathcal{P}^{(n)}(W_1,W_2)$, we have

$$\Omega^{(\alpha,\beta,\gamma,\mu,\theta)}(W_1,W_2) \le \frac{\Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(W_1,W_2)}{1+(2-3\mu)\gamma\lambda},$$

completing the proof.

Proof of Theorem 3: For $\theta \in (0, [(2-3\mu)\gamma]^{-1})$, set

$$\lambda = \frac{\theta}{1-(2-3\mu)\gamma\theta} \iff \theta = \frac{\lambda}{1+(2-3\mu)\gamma\lambda}. \tag{36}$$

Then we have the following:

$$\begin{aligned}
G(R_1,R_2|W_1,W_2)
&\overset{(a)}{\ge} \frac{\theta[(\gamma\mu+\bar\gamma)R_1+(\gamma\bar\mu+\bar\gamma)R_2]-\Omega^{(\alpha,\beta,\gamma,\mu,\theta)}(W_1,W_2)}{1+\theta[1+\alpha+\beta]} \\
&\overset{(b)}{\ge} \frac{\dfrac{\lambda[(\gamma\mu+\bar\gamma)R_1+(\gamma\bar\mu+\bar\gamma)R_2]-\Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(W_1,W_2)}{1+(2-3\mu)\gamma\lambda}}{1+\dfrac{\lambda[1+\alpha+\beta]}{1+(2-3\mu)\gamma\lambda}} \\
&= \frac{\lambda[(\gamma\mu+\bar\gamma)R_1+(\gamma\bar\mu+\bar\gamma)R_2]-\Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(W_1,W_2)}{1+\lambda[1+\alpha+\beta+(2-3\mu)\gamma]} \\
&= F^{(\alpha,\beta,\gamma,\mu,\lambda)}(R_1,R_2|W_1,W_2).
\end{aligned} \tag{37}$$

Step (a) follows from Corollary 2. Step (b) follows from Proposition 2 and (36). Since (37) holds for any positive $\alpha$, $\beta$, $\gamma$, $\mu$, and $\lambda$, we have $G(R_1,R_2|W_1,W_2) \ge F(R_1,R_2|W_1,W_2)$. Thus (6) in Theorem 3 is proved. The inclusion R(W1 , W2 ) ⊆ R(W1 , W2 ) is obvious from this bound.

In this appendix we prove the cardinality bounds on auxiliary random variables appearing in this paper. We first prove Property 2 part a). Observe that

$$p_X(x) = \sum_{u\in\mathcal{U}} p_U(u)\,p_{X|U}(x|u), \tag{38}$$

$$\left.\begin{aligned}
p_Y(y) &= \sum_{u\in\mathcal{U}}\sum_{x\in\mathcal{X}} p_U(u)\,W_1(y|x)\,p_{X|U}(x|u),\\
p_Z(z) &= \sum_{u\in\mathcal{U}}\sum_{x\in\mathcal{X}} p_U(u)\,W_2(z|x)\,p_{X|U}(x|u),
\end{aligned}\right\} \tag{39}$$

$$\gamma[\mu I_p(X;Y|U)+\bar\mu I_p(U;Z)]+\bar\gamma I_p(X;Y) = \sum_{u\in\mathcal{U}} p_U(u)\left[\pi_1(p_{X|U}(\cdot|u))+\pi_2(p_{X|U}(\cdot|u))\right], \tag{40}$$

where we set

$$\pi_1(p_{X|U}(\cdot|u)) := \sum_{(x,y)\in\mathcal{X}\times\mathcal{Y}} p_{X|U}(x|u)\,W_1(y|x)\,\log\left\{\frac{W_1(y|x)^{\bar\gamma+\gamma\mu}}{\Big[\sum_{\tilde x\in\mathcal{X}} W_1(y|\tilde x)\,p_{X|U}(\tilde x|u)\Big]^{\gamma\mu}\,p_Y(y)^{\bar\gamma}}\right\},$$

$$\pi_2(p_{X|U}(\cdot|u)) := \sum_{(x,z)\in\mathcal{X}\times\mathcal{Z}} p_{X|U}(x|u)\,W_2(z|x)\,\log\left\{\left[\frac{\sum_{\tilde x\in\mathcal{X}} W_2(z|\tilde x)\,p_{X|U}(\tilde x|u)}{p_Z(z)}\right]^{\gamma\bar\mu}\right\}.$$

Proof of Property 2 part a): We bound the cardinality $|\mathcal{U}|$ of $\mathcal{U}$ to show that the bound $|\mathcal{U}| \le \min\{|\mathcal{X}|, |\mathcal{Y}|+|\mathcal{Z}|-1\}$ is sufficient to describe $C^{(\mu)}(W_1,W_2)$. We first derive a sufficient value of $|\mathcal{U}|$ to express $|\mathcal{X}|-1$ values of (38) and one value of (40). Note that by (39) the quantities $p_Y(\cdot)$ and $p_Z(\cdot)$ appearing in the above definitions of $\pi_i(p_{X|U}(\cdot|u))$, $i = 1, 2$, are regarded as constants under (38). For each $u \in \mathcal{U}$, $\pi_i(p_{X|U}(\cdot|u))$, $i = 1, 2$, is a continuous function of $p_{X|U}(\cdot|u)$. Then by the support lemma, $|\mathcal{U}| \le |\mathcal{X}|-1+1 = |\mathcal{X}|$ is sufficient to express $|\mathcal{X}|-1$ values of (38) and one value of (40).

We next derive a sufficient value of $|\mathcal{U}|$ to express $|\mathcal{Y}|+|\mathcal{Z}|-2$ values of (39) and one value of (40). Note that the quantities $p_Y(\cdot)$ and $p_Z(\cdot)$ appearing in the above definitions of $\pi_i(p_{X|U}(\cdot|u))$, $i = 1, 2$, are regarded as constants under (39). For each $u \in \mathcal{U}$, $\pi_i(p_{X|U}(\cdot|u))$, $i = 1, 2$, is a continuous function of $p_{X|U}(\cdot|u)$. Then by the support lemma, $|\mathcal{U}| \le |\mathcal{Y}|+|\mathcal{Z}|-2+1 = |\mathcal{Y}|+|\mathcal{Z}|-1$ is sufficient to express $|\mathcal{Y}|+|\mathcal{Z}|-2$ values of (39) and one value of (40).

Next we prove the following lemma.

Lemma 6: For each integer $n \ge 2$, we define $\hat{\Omega}_n^{(\alpha,\beta,\gamma,\mu,\lambda)}(W_1,W_2)$



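$$= \max_{q=q_{UXYZ}:\ |\mathcal{U}|\le|\mathcal{L}_n||\mathcal{Y}|^{n-1}|\mathcal{Z}|^{n-1}} \Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(q|W_1,W_2),$$

$$\Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(W_1,W_2) = \max_{q=q_{UXYZ}:\ |\mathcal{U}|\le|\mathcal{Y}|+|\mathcal{Z}|-1} \Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(q|W_1,W_2).$$

Then we have $\hat{\Omega}_n^{(\alpha,\beta,\gamma,\mu,\lambda)}(W_1,W_2) = \Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(W_1,W_2)$.

Proof: We bound the cardinality $|\mathcal{U}|$ of $\mathcal{U}$ to show that the bound $|\mathcal{U}| \le |\mathcal{Y}|+|\mathcal{Z}|-1$ is sufficient to describe $\hat{\Omega}_n^{(\alpha,\beta,\gamma,\mu,\lambda)}(W_1,W_2)$. Observe that

$$\left.\begin{aligned}
q_Y(y) &= \sum_{u\in\mathcal{U}} q_U(u)\,q_{Y|U}(y|u),\\
q_Z(z) &= \sum_{u\in\mathcal{U}} q_U(u)\,q_{Z|U}(z|u),
\end{aligned}\right\} \tag{41}$$

$$\Lambda^{(\alpha,\beta,\gamma,\mu,\lambda)}(q|W_1,W_2) = \sum_{u\in\mathcal{U}} q_U(u)\,\zeta^{(\alpha,\beta,\gamma,\mu,\lambda)}(q_{XYZ|U}(\cdot|u)), \tag{42}$$

where we set

$$\zeta^{(\alpha,\beta,\gamma,\mu,\lambda)}(q_{XYZ|U}(\cdot,\cdot,\cdot|u)) := \sum_{(x,y,z)\in\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}} q_{XYZ|U}(x,y,z|u)\,\exp\left\{\lambda\,\omega_q^{(\alpha,\beta,\gamma,\mu)}(x,y,z|u)\right\}.$$

For the quantities $q_Y(\cdot)$ and $q_Z(\cdot)$ contained in the forms of $\zeta^{(\alpha,\beta,\gamma,\mu,\lambda)}(q_{XYZ|U}(\cdot|u))$, $u \in \mathcal{U}$, we regard them as constants under (41). For each $u \in \mathcal{U}$, $\zeta^{(\alpha,\beta,\gamma,\mu,\lambda)}(q_{XYZ|U}(\cdot|u))$ is a continuous function of $q_{XYZ|U}(\cdot,\cdot,\cdot|u)$. Then by the support lemma, $|\mathcal{U}| \le |\mathcal{Y}|+|\mathcal{Z}|-2+1 = |\mathcal{Y}|+|\mathcal{Z}|-1$ is sufficient to express $|\mathcal{Y}|+|\mathcal{Z}|-2$ values of (41) and one value of (42).

B. Supporting Hyperplane Expression of the Capacity Region

In this appendix we prove Property 2 part b). From Property 1 part c), we have the following lemma.

Lemma 7: Suppose that $(\hat r_1,\hat r_2,\hat r_3)$ does not belong to $\mathcal{C}_{\rm ext}(W_1,W_2)$. Then there exist $\epsilon, \gamma^*, \mu^* > 0$ such that for any $(r_1,r_2,r_3) \in \mathcal{C}_{\rm ext}(W_1,W_2)$ we have

$$\gamma^*\mu^*(r_1-\hat r_1)+\gamma^*\bar\mu^*(r_2-\hat r_2)+\bar\gamma^*(r_3-\hat r_3)+\epsilon \le 0.$$

Proof of this lemma is omitted here. This lemma will be used to prove Property 2 part b).

Proof of Property 2 part b): We first recall the following definitions of $\mathcal{P}(W_1,W_2)$, $\mathcal{P}_{\rm sh}(W_1,W_2)$, and $\mathcal{Q}$:

$$\mathcal{P}(W_1,W_2) = \{p = p_{UXYZ}: |\mathcal{U}| \le \min\{|\mathcal{X}|, |\mathcal{Y}|+|\mathcal{Z}|\}+1,\ p_{Y|X}=W_1,\ p_{Z|X}=W_2,\ Y\leftrightarrow X\leftrightarrow Z,\ U\leftrightarrow X\leftrightarrow(Y,Z)\},$$

$$\mathcal{P}_{\rm sh}(W_1,W_2) = \{p = p_{UXYZ}: |\mathcal{U}| \le \min\{|\mathcal{X}|, |\mathcal{Y}|+|\mathcal{Z}|-1\},\ p_{Y|X}=W_1,\ p_{Z|X}=W_2,\ Y\leftrightarrow X\leftrightarrow Z,\ U\leftrightarrow X\leftrightarrow(Y,Z)\},$$

$$\mathcal{Q} = \{q = q_{UXYZ}: |\mathcal{U}| \le |\mathcal{Y}|+|\mathcal{Z}|-1\}.$$

We first show that $\mu \in [0,1/2]$ is sufficient to describe $\mathcal{C}_{\rm ext}(W_1,W_2)$. The supporting hyperplane of $\mathcal{C}_{\rm ext}(W_1,W_2)$ with a normal vector $(\gamma\mu, \gamma\bar\mu, \bar\gamma)$ can be mapped to that of $\mathcal{C}(W_1,W_2)$ with $(\gamma\mu+\bar\gamma, \gamma\bar\mu+\bar\gamma)$. Hence by Property 1 part d), we have $\gamma\bar\mu+\bar\gamma-(\gamma\mu+\bar\gamma) = \gamma(1-2\mu) \ge 0$.

We next prove $\mathcal{C}_{\rm ext,sh}(W_1,W_2) \subseteq \mathcal{C}_{\rm ext}(W_1,W_2)$. We assume that $(\hat r_1,\hat r_2,\hat r_3) \notin \mathcal{C}_{\rm ext}(W_1,W_2)$. Then by Lemma 7, there exist $\epsilon > 0$, $\gamma^* \in [0,1]$, $\mu^* \in [0,1/2]$ such that for any $(r_1,r_2,r_3) \in \mathcal{C}_{\rm ext}(W_1,W_2)$ we have

$$\gamma^*\mu^*(r_1-\hat r_1)+\gamma^*\bar\mu^*(r_2-\hat r_2)+\bar\gamma^*(r_3-\hat r_3)+\epsilon \le 0.$$

Then we have

$$\begin{aligned}
\gamma^*(\mu^*\hat r_1+\bar\mu^*\hat r_2)+\bar\gamma^*\hat r_3
&\ge \max_{(r_1,r_2,r_3)\in\mathcal{C}_{\rm ext}(W_1,W_2)} \{\gamma^*(\mu^* r_1+\bar\mu^* r_2)+\bar\gamma^* r_3\}+\epsilon \\
&\overset{(a)}{=} \max_{p\in\mathcal{P}(W_1,W_2)} \{\gamma^*\mu^* I_p(X;Y|U)+\gamma^*\bar\mu^* I_p(U;Z)+\bar\gamma^* I_p(X;Y)\}+\epsilon \\
&= C^{(\gamma^*,\mu^*)}(W_1,W_2)+\epsilon.
\end{aligned} \tag{43}$$

Step (a) follows from the definition of $\mathcal{C}_{\rm ext}(W_1,W_2)$. The bound (43) implies that $(\hat r_1,\hat r_2,\hat r_3) \notin \mathcal{C}_{\rm ext,sh}(W_1,W_2)$. Thus $\mathcal{C}_{\rm ext,sh}(W_1,W_2) \subseteq \mathcal{C}_{\rm ext}(W_1,W_2)$ is proved. To complete the proof it suffices to prove the following inclusions:

$$\mathcal{C}_{\rm ext}(W_1,W_2) \subseteq \tilde{\mathcal{C}}_{\rm ext,sh}(W_1,W_2) \subseteq \mathcal{C}_{\rm ext,sh}(W_1,W_2).$$

We first prove $\mathcal{C}_{\rm ext}(W_1,W_2) \subseteq \tilde{\mathcal{C}}_{\rm ext,sh}(W_1,W_2)$. We assume that $(r_1,r_2,r_3) \in \mathcal{C}_{\rm ext}(W_1,W_2)$. Then there exists $p \in \mathcal{P}(W_1,W_2)$ such that

$$r_1 \le I_p(X;Y|U),\quad r_2 \le I_p(U;Z),\quad r_3 \le I_p(X;Y). \tag{44}$$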
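As a numerical illustration (not part of the original argument), the three mutual informations bounding $(r_1,r_2,r_3)$ in (44) can be evaluated for a toy test channel, and the per-letter decomposition (40) of the weighted sum can be checked against them. The sketch below is written under stated assumptions: the alphabets are binary, all numbers are hypothetical, logarithms are natural, and $\pi_1$, $\pi_2$ are taken in the form $\pi_1 = \sum p_{X|U} W_1 \log\{W_1^{\bar\gamma+\gamma\mu}/(p_{Y|U}^{\gamma\mu} p_Y^{\bar\gamma})\}$ and $\pi_2 = \sum p_{X|U} W_2 \log\{(p_{Z|U}/p_Z)^{\gamma\bar\mu}\}$. The function names are illustrative only.

```python
import math

def marginals(p_u, p_x_u, w1, w2):
    """Output laws p_{Y|U}, p_{Z|U}, p_Y, p_Z induced by U -> X -> (Y, Z)."""
    nu, nx, ny, nz = len(p_u), len(w1), len(w1[0]), len(w2[0])
    p_y_u = [[sum(p_x_u[u][x] * w1[x][y] for x in range(nx)) for y in range(ny)]
             for u in range(nu)]
    p_z_u = [[sum(p_x_u[u][x] * w2[x][z] for x in range(nx)) for z in range(nz)]
             for u in range(nu)]
    p_y = [sum(p_u[u] * p_y_u[u][y] for u in range(nu)) for y in range(ny)]
    p_z = [sum(p_u[u] * p_z_u[u][z] for u in range(nu)) for z in range(nz)]
    return p_y_u, p_z_u, p_y, p_z

def informations(p_u, p_x_u, w1, w2):
    """Return (I(X;Y|U), I(U;Z), I(X;Y)) in nats."""
    p_y_u, p_z_u, p_y, p_z = marginals(p_u, p_x_u, w1, w2)
    i_xy_u = i_uz = i_xy = 0.0
    for u, pu in enumerate(p_u):
        for x, pxu in enumerate(p_x_u[u]):
            for y, w in enumerate(w1[x]):
                mass = pu * pxu * w
                if mass > 0:
                    i_xy_u += mass * math.log(w / p_y_u[u][y])
                    i_xy += mass * math.log(w / p_y[y])
        for z, pzu in enumerate(p_z_u[u]):
            if pu * pzu > 0:
                i_uz += pu * pzu * math.log(pzu / p_z[z])
    return i_xy_u, i_uz, i_xy

def pi_sum(p_u, p_x_u, w1, w2, gamma, mu):
    """Right-hand side of (40): sum_u p_U(u) [pi_1 + pi_2] (assumed form)."""
    p_y_u, p_z_u, p_y, p_z = marginals(p_u, p_x_u, w1, w2)
    gbar, mbar = 1.0 - gamma, 1.0 - mu
    total = 0.0
    for u, pu in enumerate(p_u):
        pi1 = pi2 = 0.0
        for x, pxu in enumerate(p_x_u[u]):
            for y, w in enumerate(w1[x]):
                if pxu * w > 0:
                    pi1 += pxu * w * ((gbar + gamma * mu) * math.log(w)
                                      - gamma * mu * math.log(p_y_u[u][y])
                                      - gbar * math.log(p_y[y]))
            for z, w in enumerate(w2[x]):
                if pxu * w > 0:
                    pi2 += pxu * w * gamma * mbar * math.log(p_z_u[u][z] / p_z[z])
        total += pu * (pi1 + pi2)
    return total

# Hypothetical toy instance: binary U, X, Y, Z with strictly positive channels.
P_U = [0.4, 0.6]
P_X_U = [[0.9, 0.1], [0.2, 0.8]]
W1 = [[0.85, 0.15], [0.10, 0.90]]
W2 = [[0.70, 0.30], [0.25, 0.75]]
```

Any triple $(r_1,r_2,r_3)$ dominated componentwise by the output of `informations(P_U, P_X_U, W1, W2)` satisfies (44) for this particular $p$.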

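Then, for $(r_1,r_2,r_3) \in \mathcal{C}_{\rm ext}(W_1,W_2)$, we have the following chain of inequalities:

$$\begin{aligned}
&\gamma(\mu r_1+\bar\mu r_2)+\bar\gamma r_3 \\
&\overset{(a)}{\le} \gamma[\mu I_p(X;Y|U)+\bar\mu I_p(U;Z)]+\bar\gamma I_p(X;Y) \\
&\le \max_{p\in\mathcal{P}(W_1,W_2)} \{\gamma\mu I_p(X;Y|U)+\gamma\bar\mu I_p(U;Z)+\bar\gamma I_p(X;Y)\} \\
&\overset{(b)}{=} \max_{p\in\mathcal{P}_{\rm sh}(W_1,W_2)} \{\gamma[\mu I_p(X;Y|U)+\bar\mu I_p(U;Z)]+\bar\gamma I_p(X;Y)\} \\
&\overset{(c)}{=} \max_{p\in\mathcal{P}_{\rm sh}(W_1,W_2)} \{-\alpha D(p_{Y|XZU}\|W_1|p_{XZU})-\beta D(p_{Z|XYU}\|W_2|p_{XYU})+\gamma[\mu I_p(X;Y|U)+\bar\mu I_p(U;Z)]+\bar\gamma I_p(X;Y)\} \\
&\le \max_{p\in\mathcal{Q}} \{-\alpha D(p_{Y|XZU}\|W_1|p_{XZU})-\beta D(p_{Z|XYU}\|W_2|p_{XYU})+\gamma[\mu I_p(X;Y|U)+\bar\mu I_p(U;Z)]+\bar\gamma I_p(X;Y)\} \\
&= \tilde{C}^{(\alpha,\beta,\gamma,\mu)}(W_1,W_2).
\end{aligned}$$

Step (a) follows from (44). Step (b) follows from the fact that, by Property 2 part a), the cardinality bound in $\mathcal{P}(W_1,W_2)$ can be reduced to that in $\mathcal{P}_{\rm sh}(W_1,W_2)$. Step (c) follows from the fact that when $p \in \mathcal{P}_{\rm sh}(W_1,W_2)$, we have $D(p_{Y|XZU}\|W_1|p_{XZU}) = D(p_{Z|XYU}\|W_2|p_{XYU}) = 0$. Hence we have $\mathcal{C}_{\rm ext}(W_1,W_2) \subseteq \tilde{\mathcal{C}}_{\rm ext,sh}(W_1,W_2)$.

Finally we prove $\tilde{\mathcal{C}}_{\rm ext,sh}(W_1,W_2) \subseteq \mathcal{C}_{\rm ext,sh}(W_1,W_2)$. We assume that $(\tilde r_1,\tilde r_2,\tilde r_3) \in \tilde{\mathcal{C}}_{\rm ext,sh}(W_1,W_2)$. Then we have

$$\begin{aligned}
&\gamma[\mu\tilde r_1+\bar\mu\tilde r_2]+\bar\gamma\tilde r_3 \le \tilde{C}^{(\alpha,\beta,\gamma,\mu)}(W_1,W_2) \\
&= \max_{q\in\mathcal{Q}} \{-\alpha D(q_{Y|XZU}\|W_1|q_{XZU})-\beta D(q_{Z|XYU}\|W_2|q_{XYU})+\gamma[\mu I_q(X;Y|U)+\bar\mu I_q(U;Z)]+\bar\gamma I_q(X;Y)\} \\
&= -\alpha D(q^*_{Y|XZU,\alpha,\beta,\gamma,\mu}\|W_1|q^*_{XZU,\alpha,\beta,\gamma,\mu})-\beta D(q^*_{Z|XYU,\alpha,\beta,\gamma,\mu}\|W_2|q^*_{XYU,\alpha,\beta,\gamma,\mu}) \\
&\quad +\gamma[\mu I_{q^*_{\alpha,\beta,\gamma,\mu}}(X;Y|U)+\bar\mu I_{q^*_{\alpha,\beta,\gamma,\mu}}(U;Z)]+\bar\gamma I_{q^*_{\alpha,\beta,\gamma,\mu}}(X;Y),
\end{aligned} \tag{45}$$

where $q^*_{\alpha,\beta,\gamma,\mu} = q^*_{UXYZ,\alpha,\beta,\gamma,\mu} \in \mathcal{Q}$ is a probability distribution which attains the maximum in the definition of $\tilde{C}^{(\alpha,\beta,\gamma,\mu)}(W_1,W_2)$. The quantities $q^*_{Y|XZU,\alpha,\beta,\gamma,\mu}$, $q^*_{XZU,\alpha,\beta,\gamma,\mu}$, $q^*_{Z|XYU,\alpha,\beta,\gamma,\mu}$, $q^*_{XYU,\alpha,\beta,\gamma,\mu}$ appearing in the right members of (45) are the (conditional) distributions induced by $q^*_{\alpha,\beta,\gamma,\mu}$. We set

$$\Delta^{(\gamma,\mu)} := \gamma[\mu I_{q^*_{\alpha,\beta,\gamma,\mu}}(X;Y|U)+\bar\mu I_{q^*_{\alpha,\beta,\gamma,\mu}}(U;Z)]+\bar\gamma I_{q^*_{\alpha,\beta,\gamma,\mu}}(X;Y)-\gamma[\mu\tilde r_1+\bar\mu\tilde r_2]-\bar\gamma\tilde r_3.$$

From (45), we must have

$$\left.\begin{aligned}
0 &\le \alpha D(q^*_{Y|XZU,\alpha,\beta,\gamma,\mu}\|W_1|q^*_{XZU,\alpha,\beta,\gamma,\mu}) \le \Delta^{(\gamma,\mu)},\\
0 &\le \beta D(q^*_{Z|XYU,\alpha,\beta,\gamma,\mu}\|W_2|q^*_{XYU,\alpha,\beta,\gamma,\mu}) \le \Delta^{(\gamma,\mu)}
\end{aligned}\right\} \tag{46}$$

for any $\alpha, \beta, \gamma, \mu > 0$. From (46), we have

$$\left.\begin{aligned}
0 &\le D(q^*_{Y|XZU,\alpha,\beta,\gamma,\mu}\|W_1|q^*_{XZU,\alpha,\beta,\gamma,\mu}) \le \frac{\Delta^{(\gamma,\mu)}}{\alpha},\\
0 &\le D(q^*_{Z|XYU,\alpha,\beta,\gamma,\mu}\|W_2|q^*_{XYU,\alpha,\beta,\gamma,\mu}) \le \frac{\Delta^{(\gamma,\mu)}}{\beta}.
\end{aligned}\right\} \tag{47}$$

From (47), we have

$$\gamma(\mu\tilde r_1+\bar\mu\tilde r_2)+\bar\gamma\tilde r_3 \le \gamma[\mu I_{q^*_{\alpha,\beta,\gamma,\mu}}(X;Y|U)+\bar\mu I_{q^*_{\alpha,\beta,\gamma,\mu}}(U;Z)]+\bar\gamma I_{q^*_{\alpha,\beta,\gamma,\mu}}(X;Y) \tag{48}$$

for any $\alpha$, $\beta$, $\gamma$, and $\mu > 0$. Let $\hat q_{\alpha,\beta,\gamma,\mu} = \hat q_{UXYZ,\alpha,\beta,\gamma,\mu}$ be a probability distribution with the form

$$\hat q_{UXYZ,\alpha,\beta,\gamma,\mu}(u,x,y,z) = q^*_{UX,\alpha,\beta,\gamma,\mu}(u,x)\,W_1(y|x)\,W_2(z|x).$$

Define

$$\mathcal{Q}(W_1,W_2) := \{q_{UXYZ}: |\mathcal{U}| \le |\mathcal{Y}|+|\mathcal{Z}|-1,\ q_{Y|X}=W_1,\ q_{Z|X}=W_2,\ Y\leftrightarrow X\leftrightarrow Z,\ U\leftrightarrow X\leftrightarrow(Y,Z)\}.$$

By definition, we have $\hat q_{\alpha,\beta,\gamma,\mu} \in \mathcal{Q}(W_1,W_2)$. Computing $D(q^*_{\alpha,\beta,\gamma,\mu}\|\hat q_{\alpha,\beta,\gamma,\mu})$, we have the following:

$$\begin{aligned}
D(q^*_{\alpha,\beta,\gamma,\mu}\|\hat q_{\alpha,\beta,\gamma,\mu})
&= D(q^*_{Y|XZU,\alpha,\beta,\gamma,\mu}\|W_1|q^*_{XZU,\alpha,\beta,\gamma,\mu})+D(q^*_{Z|XYU,\alpha,\beta,\gamma,\mu}\|W_2|q^*_{XYU,\alpha,\beta,\gamma,\mu})-I_{q^*_{\alpha,\beta,\gamma,\mu}}(Y;Z|XU) \\
&\overset{(a)}{\le} \Delta^{(\gamma,\mu)}\left(\frac{1}{\alpha}+\frac{1}{\beta}\right).
\end{aligned} \tag{49}$$

Step (a) follows from (47). From (49), we have

$$\lim_{\alpha,\beta\to\infty} D(q^*_{\alpha,\beta,\gamma,\mu}\|\hat q_{\alpha,\beta,\gamma,\mu}) = 0,$$

from which we have

$$q^*_{\alpha,\beta,\gamma,\mu} \to \hat q_{\alpha,\beta,\gamma,\mu} \quad\text{as } \alpha,\beta\to\infty. \tag{50}$$

By (50) and the continuity of $I_q(X;Y|U)$, $I_q(U;Z)$, and $I_q(X;Y)$ with respect to $q$, for any $\gamma, \mu > 0$ and any sufficiently large $\alpha, \beta$, we have

$$\gamma[\mu I_{q^*_{\alpha,\beta,\gamma,\mu}}(X;Y|U)+\bar\mu I_{q^*_{\alpha,\beta,\gamma,\mu}}(U;Z)]+\bar\gamma I_{q^*_{\alpha,\beta,\gamma,\mu}}(X;Y) \le \gamma[\mu I_{\hat q_{\alpha,\beta,\gamma,\mu}}(X;Y|U)+\bar\mu I_{\hat q_{\alpha,\beta,\gamma,\mu}}(U;Z)]+\bar\gamma I_{\hat q_{\alpha,\beta,\gamma,\mu}}(X;Y)+\tau(\alpha,\beta,\gamma,\mu),$$

where $\tau(\alpha,\beta,\gamma,\mu)$ is a positive number that satisfies

$$\lim_{\alpha,\beta\to\infty} \tau(\alpha,\beta,\gamma,\mu) = 0. \tag{51}$$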
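The decomposition of $D(q^*\|\hat q)$ used in (49) is a chain-rule identity that holds for any strictly positive joint distribution $q$ on $(U,X,Y,Z)$ with $\hat q(u,x,y,z) = q_{UX}(u,x)W_1(y|x)W_2(z|x)$. The following sketch, which is only a sanity check and not part of the proof, verifies it on randomly drawn toy distributions. Alphabet sizes, the random seed handling, and the function name `check_decomposition` are assumptions made for illustration.

```python
import itertools
import math
import random

def normalize(values):
    """Scale a list of positive numbers so that it sums to one."""
    total = sum(values)
    return [v / total for v in values]

def check_decomposition(seed=0, nu=2, nx=2, ny=2, nz=3):
    """Return (D(q||q_hat), D_Y + D_Z - I(Y;Z|XU)) for a random positive q."""
    rng = random.Random(seed)
    cells = list(itertools.product(range(nu), range(nx), range(ny), range(nz)))
    q = dict(zip(cells, normalize([rng.uniform(0.1, 1.0) for _ in cells])))
    w1 = [normalize([rng.uniform(0.1, 1.0) for _ in range(ny)]) for _ in range(nx)]
    w2 = [normalize([rng.uniform(0.1, 1.0) for _ in range(nz)]) for _ in range(nx)]

    def marginal(keep):
        """Marginal of q on the coordinates listed in `keep` (0=u, 1=x, 2=y, 3=z)."""
        out = {}
        for cell, mass in q.items():
            key = tuple(cell[i] for i in keep)
            out[key] = out.get(key, 0.0) + mass
        return out

    q_ux, q_uxy, q_uxz = marginal((0, 1)), marginal((0, 1, 2)), marginal((0, 1, 3))

    d_direct = d_y = d_z = i_yz = 0.0
    for (u, x, y, z), mass in q.items():
        q_hat = q_ux[(u, x)] * w1[x][y] * w2[x][z]
        d_direct += mass * math.log(mass / q_hat)
        # D(q_{Y|XZU} || W1 | q_{XZU}): q(y|x,z,u) = q(u,x,y,z) / q(u,x,z)
        d_y += mass * math.log(mass / q_uxz[(u, x, z)] / w1[x][y])
        # D(q_{Z|XYU} || W2 | q_{XYU}): q(z|x,y,u) = q(u,x,y,z) / q(u,x,y)
        d_z += mass * math.log(mass / q_uxy[(u, x, y)] / w2[x][z])
        # I(Y;Z|XU) = E log [ q(u,x,y,z) q(u,x) / (q(u,x,y) q(u,x,z)) ]
        i_yz += mass * math.log(mass * q_ux[(u, x)]
                                / (q_uxy[(u, x, y)] * q_uxz[(u, x, z)]))
    return d_direct, d_y + d_z - i_yz
```

Because $I_q(Y;Z|XU) \ge 0$, the identity immediately gives $D(q\|\hat q) \le D(q_{Y|XZU}\|W_1|q_{XZU}) + D(q_{Z|XYU}\|W_2|q_{XYU})$, which is how the bounds in (47) enter step (a) of (49).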

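Then we have the following chain of inequalities:

$$\begin{aligned}
&\gamma[\mu\tilde r_1+\bar\mu\tilde r_2]+\bar\gamma\tilde r_3 \\
&\overset{(a)}{\le} \gamma[\mu I_{q^*_{\alpha,\beta,\gamma,\mu}}(X;Y|U)+\bar\mu I_{q^*_{\alpha,\beta,\gamma,\mu}}(U;Z)]+\bar\gamma I_{q^*_{\alpha,\beta,\gamma,\mu}}(X;Y) \\
&\overset{(b)}{\le} \gamma[\mu I_{\hat q_{\alpha,\beta,\gamma,\mu}}(X;Y|U)+\bar\mu I_{\hat q_{\alpha,\beta,\gamma,\mu}}(U;Z)]+\bar\gamma I_{\hat q_{\alpha,\beta,\gamma,\mu}}(X;Y)+\tau(\alpha,\beta,\gamma,\mu) \\
&\overset{(c)}{\le} \max_{p\in\mathcal{Q}(W_1,W_2)} \{\gamma[\mu I_p(X;Y|U)+\bar\mu I_p(U;Z)]+\bar\gamma I_p(X;Y)\}+\tau(\alpha,\beta,\gamma,\mu) \\
&\overset{(d)}{=} \max_{p\in\mathcal{P}_{\rm sh}(W_1,W_2)} \{\gamma[\mu I_p(X;Y|U)+\bar\mu I_p(U;Z)]+\bar\gamma I_p(X;Y)\}+\tau(\alpha,\beta,\gamma,\mu) \\
&= C^{(\gamma,\mu)}(W_1,W_2)+\tau(\alpha,\beta,\gamma,\mu).
\end{aligned} \tag{52}$$

Step (a) follows from (48). Step (b) follows from (51). Step (c) follows from $\hat q_{\alpha,\beta,\gamma,\mu} \in \mathcal{Q}(W_1,W_2)$. Step (d) follows from the fact that, by Property 2 part a), the cardinality $|\mathcal{U}|$ of $\mathcal{U}$ in $\mathcal{Q}(W_1,W_2)$ can be upper bounded by $|\mathcal{X}|$ for describing $C^{(\gamma,\mu)}(W_1,W_2)$. Since in (52) the quantity $\tau(\alpha,\beta,\gamma,\mu)$ can be made arbitrarily close to zero, we conclude that $(\tilde r_1,\tilde r_2,\tilde r_3) \in \mathcal{C}_{\rm ext,sh}(W_1,W_2)$. Thus $\tilde{\mathcal{C}}_{\rm ext,sh}(W_1,W_2) \subseteq \mathcal{C}_{\rm ext,sh}(W_1,W_2)$ is proved.

C. Proof of Property 3

In this appendix we prove Property 3.

Proof of Property 3: We first prove parts a) and b). For simplicity of notation, set

$$a := (u,x,y,z),\quad A := (U,X,Y,Z),\quad \mathcal{A} := \mathcal{U}\times\mathcal{X}\times\mathcal{Y}\times\mathcal{Z},$$

$$\rho(a) := \omega_q^{(\alpha,\beta,\gamma,\mu)}(x,y,z|u),\quad \xi(\lambda) := \Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(q|W_1,W_2).$$

Then we have

$$\Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(q|W_1,W_2) = \xi(\lambda) = \log\left[\sum_{a\in\mathcal{A}} q_A(a)\,e^{\lambda\rho(a)}\right].$$

By simple computations we have

$$\xi'(\lambda) = \left[\sum_{a} q_A(a)\,e^{\lambda\rho(a)}\right]^{-1}\left[\sum_{a} q_A(a)\,\rho(a)\,e^{\lambda\rho(a)}\right], \tag{53}$$

$$\xi''(\lambda) = \left[\sum_{a} q_A(a)\,e^{\lambda\rho(a)}\right]^{-2}\left[\sum_{a,b\in\mathcal{A}} q_A(a)\,q_A(b)\,\frac{\{\rho(a)-\rho(b)\}^2}{2}\,e^{\lambda\{\rho(a)+\rho(b)\}}\right]. \tag{54}$$

From (54), it is obvious that $\xi''(\lambda)$ is nonnegative. Hence $\Omega^{(\alpha,\beta,\gamma,\mu,\lambda)}(q|W_1,W_2)$ is a convex function of $\lambda$. From (53), we have

$$\begin{aligned}
\xi'(0) = \sum_{a} q_A(a)\,\rho(a)
&= -\alpha D(q_{Y|XZU}\|W_1|q_{XZU})-\beta D(q_{Z|XYU}\|W_2|q_{XYU})-\gamma\mu D(q_{Y|XU}\|W_1|q_{XU}) \\
&\quad +\gamma\mu I_q(X;Y|U)+\gamma\bar\mu I_q(U;Z)+\bar\gamma I_q(X;Y).
\end{aligned} \tag{55}$$

Hence we have the part b). Next we prove the part c). We assume that $(r_1,r_2,r_3) \notin \mathcal{C}_{\rm ext}(W_1,W_2)$. Then by Property 2 part b), there exist positive $\alpha^*$, $\beta^*$, $\gamma^*$, $\mu^*$, and $\epsilon$ such that

$$\gamma^*(\mu^* r_1+\bar\mu^* r_2)+\bar\gamma^* r_3 \ge \tilde{C}^{(\alpha^*,\beta^*,\gamma^*,\mu^*)}(W_1,W_2)+\epsilon. \tag{56}$$

Set

$$\begin{aligned}
\zeta(\lambda) := \Omega^{(\alpha^*,\beta^*,\gamma^*,\mu^*,\lambda)}(q|W_1,W_2)-\lambda\Big[&-\alpha^* D(q_{Y|XZU}\|W_1|q_{XZU})-\beta^* D(q_{Z|XYU}\|W_2|q_{XYU})-\gamma^*\mu^* D(q_{Y|XU}\|W_1|q_{XU}) \\
&+\gamma^*[\mu^* I_q(X;Y|U)+\bar\mu^* I_q(U;Z)]+\bar\gamma^* I_q(X;Y)+\frac{\epsilon}{2}\Big].
\end{aligned}$$

Then we have the following:

$$\zeta(0) = 0,\quad \zeta'(0) = -\frac{\epsilon}{2} < 0,\quad \zeta''(\lambda) = \xi''(\lambda) \ge 0. \tag{57}$$

It follows from (57) that there exists $\nu(\epsilon) > 0$ such that we have $\zeta(\lambda) \le 0$ for $\lambda \in (0,\nu(\epsilon)]$. Hence for any $\lambda \in (0,\nu(\epsilon)]$ and for every $q \in \mathcal{Q}$, we have

$$\begin{aligned}
\Omega^{(\alpha^*,\beta^*,\gamma^*,\mu^*,\lambda)}(q|W_1,W_2)
&\le \lambda\Big[-\alpha^* D(q_{Y|XZU}\|W_1|q_{XZU})-\beta^* D(q_{Z|XYU}\|W_2|q_{XYU})-\gamma^*\mu^* D(q_{Y|XU}\|W_1|q_{XU}) \\
&\qquad +\gamma^*[\mu^* I_q(X;Y|U)+\bar\mu^* I_q(U;Z)]+\bar\gamma^* I_q(X;Y)+\frac{\epsilon}{2}\Big] \\
&\le \lambda\Big[-\alpha^* D(q_{Y|XZU}\|W_1|q_{XZU})-\beta^* D(q_{Z|XYU}\|W_2|q_{XYU}) \\
&\qquad +\gamma^*[\mu^* I_q(X;Y|U)+\bar\mu^* I_q(U;Z)]+\bar\gamma^* I_q(X;Y)+\frac{\epsilon}{2}\Big].
\end{aligned} \tag{58}$$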
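The facts about $\xi(\lambda) = \log\sum_a q_A(a)e^{\lambda\rho(a)}$ used in this appendix, namely $\xi(0) = 0$, $\xi'(0) = \sum_a q_A(a)\rho(a)$, and $\xi''(\lambda) \ge 0$, are generic properties of a cumulant generating function. The short sketch below evaluates (53) and (54) directly for a hypothetical three-point distribution `Q_A` and score vector `RHO` (both invented for illustration) so that they can be compared with a finite-difference slope and with the variance of $\rho$ under the exponentially tilted distribution.

```python
import math

def xi(q, rho, lam):
    """xi(lambda) = log sum_a q(a) exp(lambda * rho(a))."""
    return math.log(sum(qa * math.exp(lam * ra) for qa, ra in zip(q, rho)))

def xi_prime(q, rho, lam):
    """First derivative, written as in (53)."""
    den = sum(qa * math.exp(lam * ra) for qa, ra in zip(q, rho))
    num = sum(qa * ra * math.exp(lam * ra) for qa, ra in zip(q, rho))
    return num / den

def xi_second(q, rho, lam):
    """Second derivative, written as the symmetric double sum in (54)."""
    den = sum(qa * math.exp(lam * ra) for qa, ra in zip(q, rho))
    pair_sum = sum(qa * qb * 0.5 * (ra - rb) ** 2 * math.exp(lam * (ra + rb))
                   for qa, ra in zip(q, rho) for qb, rb in zip(q, rho))
    return pair_sum / den ** 2

def tilted_variance(q, rho, lam):
    """Variance of rho under the tilted law proportional to q(a) exp(lambda rho(a))."""
    weights = [qa * math.exp(lam * ra) for qa, ra in zip(q, rho)]
    total = sum(weights)
    mean = sum(w * ra for w, ra in zip(weights, rho)) / total
    return sum(w * (ra - mean) ** 2 for w, ra in zip(weights, rho)) / total

# Hypothetical toy data: a distribution on three points and arbitrary scores.
Q_A = [0.2, 0.5, 0.3]
RHO = [-1.0, 0.4, 2.0]
```

The double sum in (54) equals the tilted variance, which makes the nonnegativity of $\xi''$ explicit. Convexity together with $\zeta(0) = 0$ and $\zeta'(0) < 0$ is what yields an interval $(0,\nu(\epsilon)]$ on which $\zeta(\lambda) \le 0$.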

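From (58), we have that for any $\lambda \in (0,\nu(\epsilon)]$,

$$\begin{aligned}
\Omega^{(\alpha^*,\beta^*,\gamma^*,\mu^*,\lambda)}(W_1,W_2)
&= \max_{q\in\mathcal{Q}} \Omega^{(\alpha^*,\beta^*,\gamma^*,\mu^*,\lambda)}(q|W_1,W_2) \\
&\le \lambda\max_{q\in\mathcal{Q}}\Big\{-\alpha^* D(q_{Y|XZU}\|W_1|q_{XZU})-\beta^* D(q_{Z|XYU}\|W_2|q_{XYU}) \\
&\qquad\qquad +\gamma^*[\mu^* I_q(X;Y|U)+\bar\mu^* I_q(U;Z)]+\bar\gamma^* I_q(X;Y)+\frac{\epsilon}{2}\Big\} \\
&= \lambda\Big[\tilde{C}^{(\alpha^*,\beta^*,\gamma^*,\mu^*)}(W_1,W_2)+\frac{\epsilon}{2}\Big].
\end{aligned} \tag{59}$$

Under (56) and (59), we have the following chain of inequalities:

$$\begin{aligned}
F_{\rm ext}(r_1,r_2,r_3|W_1,W_2)
&= \sup_{\alpha,\beta,\lambda>0,\ (\gamma,\mu)\in[0,1]\times[0,1/2)} F_{\rm ext}^{(\alpha,\beta,\gamma,\mu,\lambda)}(r_1,r_2,r_3|W_1,W_2) \\
&\ge \sup_{\lambda\in(0,\nu(\epsilon)]} F_{\rm ext}^{(\alpha^*,\beta^*,\gamma^*,\mu^*,\lambda)}(r_1,r_2,r_3|W_1,W_2) \\
&= \sup_{\lambda\in(0,\nu(\epsilon)]} \{1+\lambda[1+\alpha^*+\beta^*+(2-3\mu^*)\gamma^*]\}^{-1} \\
&\qquad\times\Big\{\lambda\big[\gamma^*(\mu^* r_1+\bar\mu^* r_2)+\bar\gamma^* r_3\big]-\Omega^{(\alpha^*,\beta^*,\gamma^*,\mu^*,\lambda)}(W_1,W_2)\Big\} \\
&\overset{(a)}{\ge} \sup_{\lambda\in(0,\nu(\epsilon)]} \{1+\lambda[1+\alpha^*+\beta^*+(2-3\mu^*)\gamma^*]\}^{-1} \\
&\qquad\times\lambda\Big[\gamma^*(\mu^* r_1+\bar\mu^* r_2)+\bar\gamma^* r_3-\tilde{C}^{(\alpha^*,\beta^*,\gamma^*,\mu^*)}(W_1,W_2)-\frac{\epsilon}{2}\Big] \\
&\overset{(b)}{\ge} \sup_{\lambda\in(0,\nu(\epsilon)]} \frac{1}{2}\cdot\frac{\lambda\epsilon}{1+\lambda[1+\alpha^*+\beta^*+(2-3\mu^*)\gamma^*]} \\
&= \frac{1}{2}\cdot\frac{\nu(\epsilon)\epsilon}{1+\nu(\epsilon)[1+\alpha^*+\beta^*+(2-3\mu^*)\gamma^*]} > 0.
\end{aligned}$$

Step (a) follows from (59). Step (b) follows from (56).

D. Proof of Lemma 1

In this appendix we prove Lemma 1. For $l \in \mathcal{L}_n$, set

$$\begin{aligned}
\mathcal{A}_1(l) &:= \left\{(x^n,y^n,z^n): \frac{1}{n}\log\frac{p_{Y^n|X^nZ^nL_n}(y^n|x^n,z^n,l)}{q^{(i)}_{Y^n|X^nZ^nL_n}(y^n|x^n,z^n,l)} \ge -\eta\right\} \\
&= \left\{(x^n,y^n,z^n): \frac{1}{n}\log\frac{W_1^n(y^n|x^n)}{q^{(i)}_{Y^n|X^nZ^nL_n}(y^n|x^n,z^n,l)} \ge -\eta\right\},
\end{aligned}$$

$$\begin{aligned}
\mathcal{A}_2(l) &:= \left\{(x^n,y^n,z^n): \frac{1}{n}\log\frac{p_{Z^n|X^nY^nL_n}(z^n|x^n,y^n,l)}{q^{(ii)}_{Z^n|X^nY^nL_n}(z^n|x^n,y^n,l)} \ge -\eta\right\} \\
&= \left\{(x^n,y^n,z^n): \frac{1}{n}\log\frac{W_2^n(z^n|x^n)}{q^{(ii)}_{Z^n|X^nY^nL_n}(z^n|x^n,y^n,l)} \ge -\eta\right\}.
\end{aligned}$$

Furthermore, for $l \in \mathcal{L}_n$, set

$$\mathcal{A}_3(l) := \{(x^n,y^n,z^n): W_1^n(y^n|x^n) \ge |\mathcal{K}_n|\,e^{-n\eta}\,Q^{(iii)}_{Y^n|L_n}(y^n|l)\},$$

$$\mathcal{A}_4(l) := \{(x^n,y^n,z^n): p_{Z^n|L_n}(z^n|l) \ge |\mathcal{L}_n|\,e^{-n\eta}\,Q^{(iv)}_{Z^n}(z^n)\},$$

$$\mathcal{A}_5(l) := \{(x^n,y^n,z^n): W_1^n(y^n|x^n) \ge |\mathcal{K}_n||\mathcal{L}_n|\,e^{-n\eta}\,Q^{(v)}_{Y^n}(y^n)\},$$

$$\mathcal{A}(l) := \bigcap_{i=1}^{5} \mathcal{A}_i(l).$$

Proof of Lemma 1: We have the following:

$$\begin{aligned}
{\rm P}_{\rm c}^{(n)}
&= \frac{1}{|\mathcal{K}_n||\mathcal{L}_n|}\sum_{(k,l)\in\mathcal{K}_n\times\mathcal{L}_n}\ \sum_{\substack{(x^n,y^n,z^n)\in\mathcal{A}(l),\\ y^n\in\mathcal{D}_1(k,l),\ z^n\in\mathcal{D}_2(l)}} \varphi^{(n)}(x^n|k,l)\,W_1^n(y^n|x^n)\,W_2^n(z^n|x^n) \\
&\quad +\frac{1}{|\mathcal{K}_n||\mathcal{L}_n|}\sum_{(k,l)\in\mathcal{K}_n\times\mathcal{L}_n}\ \sum_{\substack{(x^n,y^n,z^n)\in\mathcal{A}^c(l):\\ y^n\in\mathcal{D}_1(k,l),\ z^n\in\mathcal{D}_2(l)}} \varphi^{(n)}(x^n|k,l)\,W_1^n(y^n|x^n)\,W_2^n(z^n|x^n) \\
&\le \sum_{i=0}^{5}\Delta_i,
\end{aligned}$$

where

$$\Delta_0 := \frac{1}{|\mathcal{K}_n||\mathcal{L}_n|}\sum_{(k,l)\in\mathcal{K}_n\times\mathcal{L}_n}\ \sum_{(x^n,y^n,z^n)\in\mathcal{A}(l)} \varphi^{(n)}(x^n|k,l)\,W_1^n(y^n|x^n)\,W_2^n(z^n|x^n),$$

$$\Delta_i := \frac{1}{|\mathcal{K}_n||\mathcal{L}_n|}\sum_{(k,l)\in\mathcal{K}_n\times\mathcal{L}_n}\ \sum_{(x^n,y^n,z^n)\in\mathcal{A}_i^c(l)} \varphi^{(n)}(x^n|k,l)\,W_1^n(y^n|x^n)\,W_2^n(z^n|x^n),\quad\text{for } i = 1, 2,$$

$$\Delta_i := \frac{1}{|\mathcal{K}_n||\mathcal{L}_n|}\sum_{(k,l)\in\mathcal{K}_n\times\mathcal{L}_n}\ \sum_{\substack{(x^n,y^n,z^n)\in\mathcal{A}_i^c(l),\\ y^n\in\mathcal{D}_1(k,l),\ z^n\in\mathcal{D}_2(l)}} \varphi^{(n)}(x^n|k,l)\,W_1^n(y^n|x^n)\,W_2^n(z^n|x^n),\quad\text{for } i = 3, 4, 5.$$

By definition we have

$$\begin{aligned}
\Delta_0 = p_{L_nX^nY^nZ^n}\Big\{
&0 \le \frac{1}{n}\log\frac{W_1^n(Y^n|X^n)}{q_{Y^n|X^nZ^nL_n}(Y^n|X^n,Z^n,L_n)}+\eta, \\
&0 \le \frac{1}{n}\log\frac{W_2^n(Z^n|X^n)}{q_{Z^n|X^nY^nL_n}(Z^n|X^n,Y^n,L_n)}+\eta, \\
&\frac{1}{n}\log|\mathcal{K}_n| \le \frac{1}{n}\log\frac{W_1^n(Y^n|X^n)}{q_{Y^n|L_n}(Y^n|L_n)}+\eta, \\
&\frac{1}{n}\log|\mathcal{L}_n| \le \frac{1}{n}\log\frac{p_{Y^n|L_n}(Y^n|L_n)}{q_{Y^n}(Y^n)}+\eta, \\
&\frac{1}{n}\log|\mathcal{L}_n| \le \frac{1}{n}\log\frac{p_{Z^n|L_n}(Z^n|L_n)}{q_{Z^n}(Z^n)}+\eta\Big\}.
\end{aligned} \tag{60}$$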



l∈Ln

From (60), it follows that if $(\varphi^{(n)}, \psi_1^{(n)}, \psi_2^{(n)})$ satisfies

$$\frac{1}{n}\log|\mathcal{K}_n| \ge R_1,\quad \frac{1}{n}\log|\mathcal{L}_n| \ge R_2,$$

then the quantity $\Delta_0$ is upper bounded by the first term in the right members of (12) in Lemma 1. Hence it suffices to show $\Delta_i \le e^{-n\eta}$, $i = 1, 2, 3, 4, 5$, to prove Lemma 1. We first prove $\Delta_i \le e^{-n\eta}$ for $i = 1, 2$. By the symmetrical structure of $\mathcal{A}_1(\cdot)$ and $\mathcal{A}_2(\cdot)$, it suffices to prove $\Delta_1 \le e^{-n\eta}$. We have the following chain of inequalities:

$$\begin{aligned}
\Delta_1 &= \sum_{l\in\mathcal{L}_n}\ \sum_{(x^n,y^n,z^n)\in\mathcal{A}_1^c(l)} p_{L_nX^nY^nZ^n}(l,x^n,y^n,z^n) \\
&\le e^{-n\eta}\sum_{l\in\mathcal{L}_n}\ \sum_{(x^n,y^n,z^n)\in\mathcal{A}_1^c(l)} Q^{(i)}_{Y^n|X^nZ^nL_n}(y^n|x^n,z^n,l)\,p_{X^nZ^nL_n}(x^n,z^n,l) \\
&\le e^{-n\eta}.
\end{aligned}$$

1 X |Ln |

(k,l)∈Kn ×Ln

(x ,y ,z ): y n ∈D1 (k,l),z n ∈D2 (l) W1n (y n |xn )<e−nη (iii) ×|Kn |QY n |L (y n |l)

k∈Kn (xn ,y n )∈X n ×Y n z n ∈D2 (l), pZ n |Ln (z n |l)<e−nη (iv)

=

1 X |Ln |

l∈Ln

n

X

pZ n |Ln (z n |l)

z ∈D2 (l), pZ n |Ln (z n |l)<e−nη (iv)

×|Ln |QZ n (z n )

≤ e−nη

X

X

(v)

[

(v)

(k,l)∈Kn ×Ln

(x ,y ,z ): y n ∈D1 (k,l),z n ∈D2 (l)

(iii)

×ϕ(n) (xn |k, l)QY n |Ln (y n |l)W2n (z n |xn ) X X e−nη (iii) QY n |Ln (y n |l) = |Ln | n ≤

(k,l)∈Kn ×Ln y ∈D1 (k,l) ×W2n (D2 (l)|xn ) −nη X X e (iii) QY n |Ln ( D1 (k, l)| l) |Ln | l∈Ln k∈Kn

e−nη X (iii) QY n |Ln = |Ln | l∈Ln



e−nη X 1 = e−nη . |Ln |

[

k∈Kn

! D1 (k, l) l

l∈Ln

We prove ∆4 ≤ e−nη . We have the following chain of inequalities: X X 1 ∆4 = |Ln | n n n (k,l)∈Kn ×Ln

(x ,y ,z ): y ∈D1 (k,l),z n ∈D2 (l) pZ n |Ln (z n |l)<e−nη n

(iv)

×|Ln |QZ n (z n ) n n n

×pKn X n Y n Z n |Ln (k, x , y , z |l)

(iv)

QZ n (D2 (l))

l∈Ln

l∈Ln z n ∈D2 (l)

= e−nη QZ n

X

qZ n (z n ) = e−nη !

D2 (l)

l∈Ln

≤ e−nη .

Finally, we prove ∆5 ≤ e−nη . We have the following chain of inequalities: 1 |Kn ||Ln |

X

(k,l)∈Kn ×Ln

X

(xn ,y n ,z n ): y n ∈D1 (k,l),z n ∈D2 (l) W1n (y n |xn )<e−nη (v) ×|Kn ||Ln |QY n (y n )

×ϕ(n) (xn |k, l)W1n (y n |xn )W2n (z n |xn ) X X 1 = |Kn ||Ln | n n (k,l)∈Kn ×Ln

n

×ϕ(n) (xn |k, l)W1n (y n |xn )W2n (z n |xn ) X X e−nη ≤ |Ln | n n n

X

×|Ln |QZ n (z n ) ×pKn X n Y n Z n |Ln (k, xn , y n , z n |l)

∆5 =

Next, we prove ∆3 ≤ e−nη . We have the following chain of inequalities: X X 1 ∆3 = |Kn ||Ln | n n n

X

X

(x ,y ):

y n ∈D1 (k,l), W1n (y n |xn )<e−nη (v) ×|Kn ||Ln |QY n (y n )

×ϕ(n) (xn |k, l)W1n (y n |xn )W2n (D2 (l)|xn ) X X 1 = |Kn ||Ln | n n (k,l)∈Kn ×Ln

(x ,y ): y n ∈D1 (k,l), W1n (y n |xn )<e−nη (v) ×|Kn ||Ln |QY n (y n )

×ϕ(n) (xn |k, l)W1n (y n |xn )W2n (D2 (l)|xn ) X X 1 ≤ |Kn ||Ln | n n (k,l)∈Kn ×Ln

(x ,y ): y n ∈D1 (k,l), W (y n |xn )<e−nη (v) ×|Kn ||Ln |QY n (y n ) n

×ϕ(n) (xn |k, l)W1n (y n |xn ) X X X ≤ e−nη l∈Ln k∈Kn

(xn ,y n ): y n ∈D1 (k,l), W n (y n |xn )<e−nη (v) |Kn ||Ln |QY n (y n ) (v)

×ϕ(n) (xn |k, l)QY n (y n ) X X X = e−nη l∈Ln k∈Kn

≤ e−nη

X X

l∈Ln k∈Kn

y n ∈D1 (k,l), W n (y n |xn )<e−nη (v) |Kn ||Ln |QY n (y n ) (v)

QY n (D1 (k, l))

(v)

QY n (y n )

$$= e^{-n\eta}\,Q^{(v)}_{Y^n}\Bigg(\bigcup_{(k,l)\in\mathcal{K}_n\times\mathcal{L}_n} \mathcal{D}_1(k,l)\Bigg) \le e^{-n\eta}.$$

Thus Lemma 1 is proved.

E. Proof of Lemma 4

In this appendix we prove Lemma 4.

Proof of Lemma 4: By the definition of $p^{(\alpha,\beta,\gamma,\mu,\theta)}_{X^tY^t|L_nZ^n;F^t}(x^t,y^t|l,z^n)$, for $t = 1, 2, \cdots, n$, we have

$$p^{(\alpha,\beta,\gamma,\mu,\theta)}_{X^tY^t|L_nZ^n;F^t}(x^t,y^t|l,z^n) = C_t^{-1}(l,z^n)\,p_{X^tY^t|L_nZ^n}(x^t,y^t|l,z^n)\prod_{i=1}^{t} f^{(\alpha,\beta,\gamma,\mu,\theta)}_{F_i}(x_i,y_i,z_i|u_i). \tag{61}$$

Then we have the following chain of equalities:

$$\begin{aligned}
&p^{(\alpha,\beta,\gamma,\mu,\theta)}_{X^tY^t|L_nZ^n;F^t}(x^t,y^t|l,z^n) \\
&\overset{(a)}{=} C_t^{-1}(l,z^n)\,p_{X^tY^t|L_nZ^n}(x^t,y^t|l,z^n)\prod_{i=1}^{t} f^{(\alpha,\beta,\gamma,\mu,\theta)}_{F_i}(x_i,y_i,z_i|u_i) \\
&= C_t^{-1}(l,z^n)\,p_{X^{t-1}Y^{t-1}|L_nZ^n}(x^{t-1},y^{t-1}|l,z^n)\prod_{i=1}^{t-1} f^{(\alpha,\beta,\gamma,\mu,\theta)}_{F_i}(x_i,y_i,z_i|u_i) \\
&\quad\times p_{X_tY_t|X^{t-1}Y^{t-1}L_nZ^n}(x_t,y_t|x^{t-1},y^{t-1},l,z^n)\,f^{(\alpha,\beta,\gamma,\mu,\theta)}_{F_t}(x_t,y_t,z_t|u_t) \\
&\overset{(b)}{=} \frac{C_{t-1}(l,z^n)}{C_t(l,z^n)}\,p^{(\alpha,\beta,\gamma,\mu,\theta)}_{X^{t-1}Y^{t-1}|L_nZ^n;F^{t-1}}(x^{t-1},y^{t-1}|l,z^n) \\
&\quad\times p_{X_tY_t|X^{t-1}Y^{t-1}L_nZ^n}(x_t,y_t|x^{t-1},y^{t-1},l,z^n)\,f^{(\alpha,\beta,\gamma,\mu,\theta)}_{F_t}(x_t,y_t,z_t|u_t) \\
&= \big(\Phi^{(\alpha,\beta,\gamma,\mu,\theta)}_{t,F^t}(l,z^n)\big)^{-1}\,p^{(\alpha,\beta,\gamma,\mu,\theta)}_{X^{t-1}Y^{t-1}|L_nZ^n;F^{t-1}}(x^{t-1},y^{t-1}|l,z^n) \\
&\quad\times p_{X_tY_t|X^{t-1}Y^{t-1}L_nZ^n}(x_t,y_t|x^{t-1},y^{t-1},l,z^n)\,f^{(\alpha,\beta,\gamma,\mu,\theta)}_{F_t}(x_t,y_t,z_t|u_t).
\end{aligned} \tag{62}$$

Steps (a) and (b) follow from (61). From (62), we have

$$\Phi^{(\alpha,\beta,\gamma,\mu,\theta)}_{t,F^t}(l,z^n)\,p^{(\alpha,\beta,\gamma,\mu,\theta)}_{X^tY^t|L_nZ^n;F^t}(x^t,y^t|l,z^n) \tag{63}$$

$$= p^{(\alpha,\beta,\gamma,\mu,\theta)}_{X^{t-1}Y^{t-1}|L_nZ^n;F^{t-1}}(x^{t-1},y^{t-1}|l,z^n)\,p_{X_tY_t|X^{t-1}Y^{t-1}L_nZ^n}(x_t,y_t|x^{t-1},y^{t-1},l,z^n)\,f^{(\alpha,\beta,\gamma,\mu,\theta)}_{F_t}(x_t,y_t,z_t|u_t). \tag{64}$$

Taking summations of (63) and (64) with respect to $x^t, y^t$, we obtain

$$\Phi^{(\alpha,\beta,\gamma,\mu,\theta)}_{t,F^t}(l,z^n) = \sum_{x^t,y^t} p^{(\alpha,\beta,\gamma,\mu,\theta)}_{X^{t-1}Y^{t-1}|L_nZ^n;F^{t-1}}(x^{t-1},y^{t-1}|l,z^n)\,p_{X_tY_t|X^{t-1}Y^{t-1}L_nZ^n}(x_t,y_t|x^{t-1},y^{t-1},l,z^n)\,f^{(\alpha,\beta,\gamma,\mu,\theta)}_{F_t}(x_t,y_t,z_t|u_t),$$

completing the proof.

Acknowledgements

I am very grateful to Dr. Shun Watanabe for his helpful comments.

REFERENCES

[1] T. M. Cover, “Broadcast channels,” IEEE Trans. Inform. Theory, vol. IT-18, no. 1, pp. 2–13, Jan. 1972.
[2] J. Körner and K. Marton, “General broadcast channels with degraded message sets,” IEEE Trans. Inform. Theory, vol. IT-23, no. 1, pp. 60–64, Jan. 1977.
[3] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Academic Press, New York, 1981.
[4] J. Körner and A. Sgarro, “Universally attainable error exponents for broadcast channels with degraded message sets,” IEEE Trans. Inform. Theory, vol. IT-26, no. 6, pp. 670–679, Nov. 1980.
[5] Y. Kaspi and N. Merhav, “Error exponents for broadcast channels with degraded message sets,” IEEE Trans. Inform. Theory, vol. 57, no. 1, pp. 101–123, Jan. 2011.
[6] T. S. Han, Information-Spectrum Methods in Information Theory. Springer-Verlag, Berlin, New York, 2002. The Japanese edition was published by Baifukan-publisher, Tokyo, 1998.
[7] Y. Oohama, “Exponent function for one helper source coding problem at rates outside the rate region,” Proceedings of the 2015 IEEE International Symposium on Information Theory, pp. 1575–1579, Hong Kong, China, June 14–19, 2015.
[8] Y. Oohama, “Strong converse exponent for degraded broadcast channels at rates outside the capacity region,” Proceedings of the 2015 IEEE International Symposium on Information Theory, pp. 939–943, Hong Kong, China, June 14–19, 2015.
[9] Y. Oohama, “Strong converse theorems for degraded broadcast channels with feedback,” Proceedings of the 2015 IEEE International Symposium on Information Theory, pp. 2510–2514, Hong Kong, China, June 14–19, 2015.