Error Exponents for Channel Coding with Side Information


arXiv:cs/0410003v1 [cs.IT] 1 Oct 2004

Pierre Moulin and Ying Wang
University of Illinois at Urbana-Champaign
Beckman Institute, Coordinated Science Laboratory, and Dept. of Electrical and Computer Engineering, Urbana, IL 61801, USA
{moulin,ywang11}@ifp.uiuc.edu ∗

September 30, 2004

Abstract

Capacity formulas and random-coding and sphere-packing exponents are derived for a generalized family of Gel'fand-Pinsker coding problems. These exponents yield asymptotic upper and lower bounds, respectively, on the achievable log probability of error. Information is to be reliably transmitted through a noisy channel with finite input and output alphabets and a random state sequence. The channel is selected by a hypothetical adversary. Partial information about the state sequence is available to the encoder, adversary, and decoder. The design of the transmitter is subject to a cost constraint. Two families of channels are considered: 1) compound discrete memoryless channels (C-DMC's), and 2) channels with arbitrary memory, subject to an additive cost constraint, or more generally to a hard constraint on the conditional type of the channel output given the input. The two problems are closely connected. In each case the random-coding and sphere-packing exponents coincide at high rates, thereby determining the reliability function of the channel family. The random-coding exponent is achieved using a stacked binning scheme and a maximum penalized mutual information decoder. The sphere-packing exponent is obtained by defining a dummy sequence whose joint type with the state sequence and the channel input and output sequences is made available to the decoder. For channels with arbitrary memory, the random-coding and sphere-packing exponents are larger than their C-DMC counterparts. Applications of this study include watermarking, data hiding, communication in the presence of partially known interferers, and problems such as broadcast channels, all of which involve the fundamental idea of binning.

Index terms: channel coding with side information, error exponents, arbitrarily varying channels, universal coding and decoding, randomized codes, random binning, capacity, reliability function, method of types, watermarking, data hiding, broadcast channels.



∗ This research was supported by NSF under ITR grants CCR 00-81268 and CCR 03-25924.


1 Introduction

In 1980, Gel'fand and Pinsker studied the problem of coding for a discrete memoryless channel (DMC) p(y|x, s) with random states S that are observed by the encoder but not by the decoder [1]. They derived the capacity of this channel and showed it is achievable by a random binning scheme and a joint-typicality decoder. Applications of their work include computer memories with defects [2] and writing on dirty paper [3]. Duality with source coding problems with side information was explored in [4, 5, 6]. In the late 1990's, it was found that the problems of embedding and hiding information in cover signals are closely related to the Gel'fand-Pinsker problem: the cover signal plays the role of the state sequence in the Gel'fand-Pinsker problem [7, 8, 9]. Capacity expressions were derived under expected distortion constraints for the transmitter and a memoryless adversary [9].

One difference between the basic Gel'fand-Pinsker problem and the various formulations of data-hiding and watermarking problems resides in the amount of side information available to the encoder, channel designer (adversary), and decoder. A unified framework for studying such problems is considered in this paper. The encoder, adversary, and decoder have access to degraded versions s^e, s^a, s^d, respectively, of a state sequence s. Capacity takes the form

    C = sup_{p_{XU|S^e}} min_{p_{Y|XS^a}} [I(U; Y S^d) − I(U; S^e)],

where U is an auxiliary random variable, and the sup and min are subject to appropriate constraints.

In problems such as data hiding, the assumption of a fixed channel is untenable when the channel is under partial control of an adversary. This motivated the game-theoretic approach of [9], where the worst channel in a class of memoryless channels was derived, and capacity is the solution to a maxmin mutual-information game. This game-theoretic approach was recently extended by Cohen and Lapidoth [10] and by Somekh-Baruch and Merhav [11, 12], who considered a class of channels with arbitrary memory, subject to almost-sure distortion constraints. In the special case of private data hiding, in which the cover signal is known to both the encoder and the decoder, Somekh-Baruch and Merhav also derived random-coding and sphere-packing exponents [11]. Binning is not needed in this scenario. The channel model of [10, 11, 12] is reminiscent of the classical memoryless arbitrarily varying channel (AVC) [13, 14, 15], which is often used to analyze jamming problems. In the classical AVC model, no side information is available to the encoder or decoder. Error exponents for this problem were derived by Ericson [16] and by Hughes and Thomas [17]. The capacity of the AVC with side information at the encoder was derived by Ahlswede [18].

The coding problems considered in this paper are motivated by data-hiding applications in which the decoder has partial^1 or no knowledge of the cover signal. In all cases capacity is achievable by random-binning schemes. Roughly speaking, the encoder designs a codebook for the auxiliary U. The selected sequence U plays the role of input to a fictitious DMC and conveys information about both the encoder's state sequence S^e and the message M to the decoder. Finding the best error exponents for such schemes is challenging. Initial attempts in this direction for the Gel'fand-Pinsker DMC have been reported by Haroutunian et al. [19, 20], but errors were discovered later [21, 22]. Very recently, random-coding exponents have been independently obtained by Haroutunian and Tonoyan [23] and by Somekh-Baruch and Merhav [24]. Their results and ours [25] were presented at the 2004 ISIT conference.

^1 For instance, the decoder may have access to a noisy, compressed version of the original cover signal.


The random-coding exponents we have derived cannot be achieved by standard binning schemes and standard maximum mutual information (MMI) decoders [13, 15]. Instead we use a stack of variable-size codeword arrays indexed by the type of the encoder's state sequence S^e. The appropriate decoder is a maximum penalized mutual information (MPMI) decoder, where the penalty is a function of the state sequence type. The case where the probability mass function (p.m.f.) of the state sequence is unknown can also be accommodated in this framework. Similarly to [1], the auxiliary random variable U that appears in the proof of the converse capacity theorem does not admit an obvious coding interpretation. (This is in sharp contrast with conventional point-to-point communication problems, where U coincides with X, the channel input.) Our derivation of sphere-packing bounds provides a dual role for U: a construct from a converse capacity theorem, as well as a fictitious channel input.

This paper is organized as follows. A statement of the problem is given in Sec. 2, together with basic definitions. Our main results are stated in Sec. 3 in the form of six theorems. An application to binary alphabets under Hamming cost constraints for the transmitter and adversary is given in Sec. 4. Proofs of the theorems appear in Secs. 5-10. All our derivations are based on the method of types [26]. The paper concludes with a discussion in Sec. 11 and appendices.

1.1 Notation

We use uppercase letters for random variables, lowercase letters for individual values, and boldface fonts for sequences. The p.m.f. of a random variable X ∈ X is denoted by p_X = {p_X(x), x ∈ X}, and the probability of a set Ω under p_X is denoted by P_X(Ω). The entropy of a random variable X is denoted by H(X), and the mutual information between two random variables X and Y is denoted by I(X; Y) = H(X) − H(X|Y), or by Ĩ_{XY}(p_{XY}) when the dependency on p_{XY} should be explicit; similarly we sometimes use the notation Ĩ_{XY|Z}(p_{XYZ}). The Kullback-Leibler divergence between two p.m.f.'s p and q is denoted by D(p||q). We denote by D(p_{Y|X} || q_{Y|X} | p_X) = D(p_{Y|X} p_X || q_{Y|X} p_X) the conditional Kullback-Leibler divergence of p_{Y|X} and q_{Y|X} with respect to p_X. All logarithms are in base 2.

Following the notation in Csiszár and Körner [13], we let p_x denote the type of a sequence x ∈ X^N (p_x is an empirical p.m.f. over X) and T_x the type class associated with p_x, i.e., the set of all sequences of type p_x. Likewise, we define the joint type p_{xy} of a pair of sequences (x, y) ∈ X^N × Y^N (a p.m.f. over X × Y) and T_{xy} the type class associated with p_{xy}, i.e., the set of all pairs of sequences of joint type p_{xy}. Finally, we define the conditional type p_{y|x} of a pair of sequences (x, y) as p_{y|x}(y|x) = p_{xy}(x, y) / p_x(x) for all x ∈ X such that p_x(x) > 0. The conditional type class T_{y|x}(x̃) is the set of all sequences ỹ such that (x̃, ỹ) ∈ T_{xy}. We denote by H(x) the entropy of the p.m.f. p_x and by I(x; y) the mutual information for the joint p.m.f. p_{xy}. Recall that

    (N + 1)^{−|X|} 2^{N H(x)} ≤ |T_x| ≤ 2^{N H(x)}    (1.1)

and

    (N + 1)^{−|X| |Y|} 2^{N H(y|x)} ≤ |T_{y|x}| ≤ 2^{N H(y|x)}.    (1.2)

We let P_X and P_X^{[N]} represent the set of all p.m.f.'s and all empirical p.m.f.'s, respectively, for a random variable X. Likewise, P_{Y|X} and P_{Y|X}^{[N]} denote the set of all conditional p.m.f.'s and all empirical conditional p.m.f.'s, respectively, for a random variable Y given X.

The notations f(N) ≪ g(N), f(N) = O(g(N)), and f(N) ≫ g(N) indicate that lim_{N→∞} f(N)/g(N) is zero, finite but nonzero, and infinite, respectively. The shorthands f(N) ≐ g(N), f(N) ≤̇ g(N), and f(N) ≥̇ g(N) denote equality and inequality on the exponential scale: lim_{N→∞} (1/N) ln [f(N)/g(N)] = 0, ≤ 0, and ≥ 0, respectively. We let 1_{x∈Ω} denote the indicator function of a set Ω, and U(Ω) denote the uniform p.m.f. over a finite set Ω. We define |t|^+ ≜ max(0, t), exp_2(t) ≜ 2^t, and h(t) ≜ −t log t − (1 − t) log(1 − t) (the binary entropy function). We adopt the notational convention that the minimum of a function over an empty set is +∞.
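To make these definitions concrete, here is a small Python illustration (ours, not part of the paper) of a sequence's type, a joint type, and the empirical mutual information I(x; y):

```python
import numpy as np
from collections import Counter

def empirical_pmf(x, alphabet):
    """Type (empirical p.m.f.) p_x of a sequence x."""
    c = Counter(x)
    return np.array([c[a] / len(x) for a in alphabet])

def joint_type(x, y, ax, ay):
    """Joint type p_xy of a pair of sequences of equal length."""
    c = Counter(zip(x, y))
    return np.array([[c[(a, b)] / len(x) for b in ay] for a in ax])

def empirical_mi(x, y, ax, ay):
    """Empirical mutual information I(x; y) = H(x) + H(y) - H(x, y), in bits."""
    pxy = joint_type(x, y, ax, ay)
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    h = lambda p: -sum(q * np.log2(q) for q in np.ravel(p) if q > 0)
    return h(px) + h(py) - h(pxy)

x = [0, 1, 1, 0, 1, 0, 0, 1]
y = [0, 1, 0, 0, 1, 1, 0, 1]
print(empirical_pmf(x, [0, 1]))             # type p_x = [0.5, 0.5]
print(empirical_mi(x, y, [0, 1], [0, 1]))   # empirical I(x; y)
```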

2 Statement of the Problem

Our generic problem of communication with side information at the encoder and decoder is diagrammed in Fig. 1. There the state sequence S = (S_1, ..., S_N) consists of independent and identically distributed (i.i.d.) samples drawn from a p.m.f. p_S(s), s ∈ S. Degraded versions S^e, S^a, and S^d of the state sequence S are available to the encoder, adversary, and decoder, respectively. The triple (S^e, S^a, S^d) is the output of a memoryless channel with conditional p.m.f. p(s^e, s^a, s^d|s) and input S. The sequences S^e, S^a, S^d are available noncausally to the encoder, adversary, and decoder, respectively. The adversary's channel is of the form p_{Y|XS^a}(y|x, s^a). This includes the problems listed in Table 1 as special cases. To simplify the presentation, we set up the problem as

    (S^e, S^a, S^d) = F(S),    (2.1)

where F : S → S^e × S^a × S^d is an invertible mapping. There is no loss of generality here, because only the joint statistics of (S^e, S^a, S^d) matter in the performance analysis. The alphabets S, X, and Y are finite.

[Figure 1: Communication with side information at the encoder and decoder. Cost constraints are imposed on the encoder and channel. The message M ∈ {1, ..., 2^{NR}} and S^e enter the encoder f_N, which outputs X; the channel p(y|x, s^a) observes S^a and outputs Y; the decoder g_N observes (Y, S^d) and outputs M̂. The triple (S^e, S^a, S^d) is generated from the state sequence S via p(s^e, s^a, s^d|s). The randomized code is denoted by C.]

| Problem                      | S^e | S^a        | S^d       | Binning? |
| Gel'fand-Pinsker [1]         | S   | S          | ∅         | yes      |
| Public Watermarking [9, 12]  | S   | ∅          | ∅         | yes      |
| Semiblind Watermarking [9]   | S^e | ∅          | S^d ≠ S^e | yes      |
| Cover-Chiang [4]             | S^e | (S^e, S^d) | S^d       | yes      |
| Private Watermarking [9, 11] | S   | ∅          | S         | no       |
| Jamming [13, 16, 17]         | ∅   | S          | ∅         | no       |

Table 1: Relation between S^e, S^a, S^d, and S for various coding problems with side information.

A message M is to be transmitted to a decoder; M is uniformly distributed over the message set M. The transmitter produces a sequence X = f_N(S^e, M). The adversary passes X through the channel p_{Y|XS^a}(y|x, s^a) to produce corrupted data Y. The decoder does not know the p_{Y|XS^a} selected by the adversary and has access to a version s^d of the state sequence s. The decoder produces an estimate M̂ = g_N(Y, S^d) ∈ M of the transmitted message. The alphabets for X and Y are denoted by X and Y, respectively.

We allow the encoder/decoder pair (f_N, g_N) to be randomized, i.e., the choice of (f_N, g_N) is a function of a random variable known to the encoder and decoder but not to the adversary. This random variable is independent of all other random variables and plays the role of a secret key. The randomized code will be denoted by (F_N, G_N). To summarize, the random variables M, F_N, G_N, S^e, S^a, S^d, X, and Y have joint p.m.f.

    p_M(m) p_{F_N G_N}(f_N, g_N) [∏_{i=1}^N p_{S^e S^a S^d}(s^e_i, s^a_i, s^d_i)] 1_{x = f_N(s^e, m)} p_{Y|XS^a}(y|x, s^a).

2.1 Constrained Side-Information Codes

A cost function Γ : S^e × X → R^+ is defined to quantify the cost Γ(s^e, x) of transmitting symbol x when the channel state at the encoder is s^e. This definition is extended to N-vectors using Γ^N(s^e, x) = (1/N) ∑_{i=1}^N Γ(s^e_i, x_i). In information embedding applications, Γ is a distortion function measuring the distortion between the host signal and the marked signal.

We now define a class of codes satisfying maximum-cost constraints (Def. 2.1) and a class of codes satisfying average-cost constraints (Def. 2.2). The latter class is of course larger than the former. We also define a class of codes that have constant composition, conditioned on the encoder's state sequence type (Def. 2.3), a class of extended codes (Def. 2.4), and a class of randomly-modulated (RM) codes (Def. 2.5). The latter terminology is adopted from [17].

Def. 2.2 is analogous to the definition of a length-N information hiding code in [9]. The common source of randomness between encoder and decoder appears via the distribution p_{F_N G_N}(f_N, g_N), whereas in [9] it appears via a cryptographic key sequence k with finite entropy rate. Def. 2.4 is applicable to the binning codes with conditionally-constant composition that are used to prove the achievability theorems. The same definition is also applicable to codes extended with a dummy sequence, which are used to prove the sphere-packing theorems.

Definition 2.1 A length-N, rate-R, randomized code with side information and maximum cost D_1 is a triple (M, F_N, G_N), where

• M is the message set, of cardinality |M| = ⌈2^{NR}⌉;

• (F_N, G_N) has joint distribution p_{F_N G_N}(f_N, g_N);

• f_N : (S^e)^N × M → X^N is the encoder mapping the state sequence s^e and message m to the transmitted sequence x = f_N(s^e, m). The mapping is subject to the cost constraint

    Γ^N(s^e, f_N(s^e, m)) ≤ D_1   almost surely (p_{S^e} p_{F_N} p_M);    (2.2)

• g_N : Y^N × (S^d)^N → M ∪ {e} is the decoder mapping the received sequence y and channel state sequence s^d to a decoded message m̂ = g_N(y, s^d). The decision m̂ = e is a declaration of error.

Definition 2.2 A length-N, rate-R, randomized code with side information and expected cost D_1 is a triple (M, F_N, G_N) which satisfies the same conditions as in Def. 2.1, except that (2.2) is replaced with the looser constraint

    ∑_{s^e} p^N_{S^e}(s^e) ∑_{f_N} p_{F_N}(f_N) ∑_{m∈M} (1/|M|) Γ^N(s^e, f_N(s^e, m)) ≤ D_1.    (2.3)

Definition 2.3 A length-N, rate-R, randomized code with side information, maximum cost D_1, and conditionally constant composition (c.c.c.) is a quadruple (M, Λ, F_N^{ccc}, G_N), where Λ is a mapping from P^{[N]}_{S^e} to P^{[N]}_{X|S^e}. The transmitted sequence x = f_N^{ccc}(s^e, m) has conditional type p_{x|s^e} = Λ(p_{s^e}) and satisfies the cost constraint (2.2).

Definition 2.4 A length-N, rate-R, randomized, extended code with side information, maximum cost D_1, and conditionally constant composition is a quadruple (M, Λ, F_N^{ext}, G_N), where Λ is a mapping from P^{[N]}_{S^e} to P^{[N]}_{XU|S^e}. The output of the code is a pair of sequences (x, u) = f_N^{ext}(s^e, m) with conditional type p_{xu|s^e} = Λ(p_{s^e}). The first sequence satisfies the cost constraint (2.2).

Definition 2.5 A randomly modulated (RM) code with side information is a randomized code defined via permutations of a prototype (f_N, g_N). Such codes are of the form

    x = f_N^π(s^e, m) ≜ π^{−1} f_N(π s^e, m),
    g_N^π(y, s^d) ≜ g_N(π y, π s^d),

where π is chosen uniformly from the set of all N! permutations, and the sequence π x is obtained by applying π to the elements of x.
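A randomly modulated code is easy to simulate. The sketch below is our illustration of Def. 2.5, with a hypothetical toy prototype (f_N, g_N); only the permutation mechanics come from the definition:

```python
import numpy as np

rng = np.random.default_rng(0)

def rm_encode(f_N, s_e, m, pi):
    """x = pi^{-1} f_N(pi s^e, m): permute the state, encode, unpermute."""
    x_perm = f_N(s_e[pi], m)      # f_N(pi s^e, m)
    x = np.empty_like(x_perm)
    x[pi] = x_perm                # apply pi^{-1}
    return x

def rm_decode(g_N, y, s_d, pi):
    """m_hat = g_N(pi y, pi s^d)."""
    return g_N(y[pi], s_d[pi])

# Toy prototype code on binary alphabets (hypothetical, for illustration only):
f_N = lambda s_e, m: (s_e + m) % 2                      # embed a repeated bit m
g_N = lambda y, s_d: int(round(np.mean((y - s_d) % 2))) # majority vote

N = 8
s = rng.integers(0, 2, N)     # state known to encoder and decoder in this toy
pi = rng.permutation(N)       # shared secret key
x = rm_encode(f_N, s, 1, pi)
print(rm_decode(g_N, x, s, pi))   # recovers m = 1 over a noiseless channel
```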

2.2 Constrained Attack Channels

Next we define a class A of DMC's (Def. 2.6) and a closely related class P_{Y|XS^a}[A] of AVC's in which the conditional type of y given (x, s^a) is constrained (Def. 2.7).

Definition 2.6 The C-DMC class A is a given subset of P_{Y|XS^a}.

For C-DMC's, we have p_{Y|XS^a}(y|x, s^a) = ∏_{i=1}^N p_{Y|XS^a}(y_i|x_i, s^a_i), where p_{Y|XS^a} ∈ A. To simplify the exposition, we assume that A is a closed set. The set A is defined according to the application.

1. In the case of a known channel [1], A is a singleton.

2. In information hiding problems [9], A is the class of DMC's that introduce expected distortion between X and Y at most equal to D_2:

    ∑_{s^a, x, y} p_{XS^a}(x, s^a) p_{Y|XS^a}(y|x, s^a) d(x, y) ≤ D_2,    (2.4)

where d : X × Y → R^+ is a distortion function. A can also be defined to be a subset of the above class.

3. In some applications, A could be defined via multiple cost constraints.

Definition 2.7 The AVC class P_{Y|XS^a}[A] is the set of channels such that for any channel input (x, s^a) and output y, the conditional type p_{y|xs^a} belongs to A ∩ P^{[N]}_{Y|XS^a} with probability 1:

    Pr[p_{y|xs^a} ∈ A] = 1.    (2.5)

If A is defined via the distortion constraint (2.4), let d^N(x, y) = (1/N) ∑_{i=1}^N d(x_i, y_i). Condition (2.5) may then be rewritten as

    Pr[d^N(x, y) ≤ D_2] = 1,    (2.6)

i.e., feasible channels have total distortion bounded by N D_2 and arbitrary memory.^2

Comparing the C-DMC class A and the AVC class P_{Y|XS^a}[A], we see that 1) the p.m.f. of Y given (X, S^a) is uniform in the C-DMC case but not necessarily so in the AVC case, and 2) while conditional types p_{y|xs^a} ∉ A may have exponentially vanishing probability under the C-DMC model, such types are prohibited in the AVC case. One may expect that both factors have an effect on capacity and error exponents. As we shall see, only the latter factor does.

The relation between the AVC class P_{Y|XS^a}[A] in (2.6) and the classical AVC model [13] is detailed in Appendix A. The class (2.6) is not a special case of the classical AVC model because arbitrary memory is allowed; conversely, the classical AVC model is not a subset of the class (2.6) because the latter imposes a hard constraint on the conditional type of the channel output given the input. In some special cases, P_{Y|XS^a}[A] may be a classical AVC.

Lastly, we introduce the following class of attack channels, which are the counterparts of the c.c.c. codes of Def. 2.3 and are used in the proofs.

Definition 2.8 An attack channel p_{Y|XS^a} uniform over single conditional types is defined via a mapping Λ : P^{[N]}_{XS^a} → P^{[N]}_{Y|XS^a} such that with probability 1, the channel output y has conditional type p_{y|xs^a} = Λ(p_{xs^a}). Moreover, Y is uniformly distributed over the corresponding conditional type class.
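The following sketch (ours; binary alphabets, no s^a, and a BSC-like conditional type are simplifying assumptions) draws a channel output uniformly from a fixed conditional type class, in the spirit of Def. 2.8; the empirical distortion of every realization is then pinned to the chosen type, as the hard constraint (2.6) requires:

```python
import numpy as np

rng = np.random.default_rng(1)

def attack_uniform_over_conditional_type(x, flip_fraction):
    """Def. 2.8 sketch: fix the conditional type p_{y|x} (a crossover fraction
    within each input symbol class) and draw y uniformly from that conditional
    type class by choosing the flip positions uniformly at random."""
    y = x.copy()
    for b in (0, 1):
        idx = np.flatnonzero(x == b)
        k = round(flip_fraction * len(idx))   # conditional type fixed given p_x
        flip = rng.choice(idx, size=k, replace=False)
        y[flip] ^= 1
    return y

x = rng.integers(0, 2, 20)
y = attack_uniform_over_conditional_type(x, 0.2)
print(np.mean(x != y))   # empirical Hamming distortion d^N(x, y), close to 0.2
```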

^2 The case of channels with arbitrary memory subject to expected-distortion constraints admits a trivial solution: the adversary "obliterates" X with a fixed, nonzero probability that depends on D_2 but not on N, and therefore no reliable communication is possible in the sense of Def. 2.9 below.


2.3 Probability of Error

The average probability of error for a deterministic code (f_N, g_N) when channel p_{Y|XS^a} is in effect is given by

    P_e(f_N, g_N, p_{Y|XS^a}) = Pr(M̂ ≠ M)
     = (1/|M|) ∑_m ∑_{s^e, s^a, s^d, x} ∑_{y : (y, s^d) ∉ g_N^{−1}(m)} p_{Y|XS^a}(y|x, s^a) 1_{x = f_N(s^e, m)} p^N_{S^e S^a S^d}(s^e, s^a, s^d).    (2.7)

For a randomized code the expression above is averaged with respect to p_{F_N G_N}(f_N, g_N); this average is denoted by P_e(F_N, G_N, p_{Y|XS^a}). The minmax probability of error for the class of randomized codes and the class of attack channels considered is given by

    P*_{e,N} = min_{p_{F_N G_N}} max_{p_{Y|XS^a}} ∑_{f_N, g_N} p_{F_N G_N}(f_N, g_N) P_e(f_N, g_N, p_{Y|XS^a}).    (2.8)

Definition 2.9 A rate R is said to be achievable if P*_{e,N} → 0 as N → ∞.

Definition 2.10 The capacity C(D_1, A) is the supremum of all achievable rates.

Definition 2.11 The reliability function of the class of attack channels considered is

    E(R) = limsup_{N→∞} [ −(1/N) log P*_{e,N} ].    (2.9)

There are four combinations of maximum/expected cost constraints for the transmitter and C-DMC/AVC designs for the adversary (four flavors of the generalized Gel'fand-Pinsker problem), and a natural question is whether the same capacity and error exponents are obtained in all four cases. We now define transmit channels, which play a crucial role in deriving capacity and error exponents.

Definition 2.12 Given alphabets X, U, and S^e, a transmit channel p_{XU|S^e} is a conditional p.m.f. which satisfies the distortion constraint

    ∑_{u, s^e, x} p_{XU|S^e}(x, u|s^e) p_{S^e}(s^e) Γ(s^e, x) ≤ D_1.

We denote by P_{XU|S^e}(D_1) the set of feasible transmit channels. Note that transmit channels have been termed covert channels [9] and watermarking channels [11, 12] in the context of information hiding. In those papers, the channel p_{Y|X} was termed the attack channel; we retain this terminology for p_{Y|XS^a} in this paper.
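Checking whether a candidate p_{XU|S^e} is a feasible transmit channel (Def. 2.12) is a single expected-cost computation; a minimal sketch (ours), with a hypothetical binary example:

```python
import numpy as np

def is_transmit_channel(p_xu_given_se, p_se, Gamma, D1):
    """Check Def. 2.12: sum_{u,s^e,x} p(x,u|s^e) p(s^e) Gamma(s^e,x) <= D1.
    p_xu_given_se[s, x, u] is a conditional p.m.f.; Gamma[s, x] is a cost table."""
    cost = np.einsum('sxu,s,sx->', p_xu_given_se, p_se, Gamma)
    return cost <= D1

# Toy binary example with Hamming cost Gamma(s, x) = 1{s != x}:
p_se = np.array([0.5, 0.5])
Gamma = np.array([[0., 1.], [1., 0.]])
p = np.full((2, 2, 2), 0.25)   # (X, U) uniform, independent of S^e
print(is_transmit_channel(p, p_se, Gamma, D1=0.4))   # cost 0.5 > 0.4 -> False
```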


2.4 Preliminaries

Consider a sextuple of random variables (S^e, S^a, S^d, U, X, Y) with joint p.m.f. p_{S^eS^aS^dUXY}. The following difference of mutual informations plays a fundamental role in capacity analyses [1]-[12] of channels with side information. It plays a central role in the analysis of error exponents as well:

    J(p_{S^eS^aS^dUXY}) = I(U; Y S^d) − I(U; S^e).    (2.10)

Note that J depends on p_{S^eS^aS^dUXY} only via the marginal p_{US^eS^dY}. Hence we can define

    J_4(p_{US^eS^dY}) = J(p_{S^eS^aS^dUXY}).    (2.11)

The following properties are analogous to [1, Prop. 1] and [9, Prop. 4.1].

Property 2.1 The function J is concave in p_{U|S^e} for p_{X|US^e}, p_{S^eS^aS^d}, and p_{Y|XS^a} fixed.

Property 2.2 The function J is convex in p_{X|US^e} for p_{U|S^e}, p_{S^eS^aS^d}, and p_{Y|XS^a} fixed.

Property 2.3 The function J is convex in p_{Y|XS^a} for p_{S^eS^aS^d} and p_{XU|S^e} fixed.

Channel capacity for the problems studied in [1]-[12] is given by

    C(D_1, A) = sup_U max_{p_{XU|S^e}} min_{p_{Y|XS^a}} J(p_{S^eS^aS^d} p_{XU|S^e} p_{Y|XS^a}),    (2.12)

where restrictions are imposed on the joint distribution of (S^e, S^a, S^d) (including the absence of some of these variables, see Table 1), and the maximization over p_{XU|S^e} and minimization over p_{Y|XS^a} are possibly subject to cost constraints. Generally, the auxiliary random variable U is defined over an arbitrary, possibly unbounded, alphabet, hence the supremum over U in (2.12). To evaluate (2.12) in the case S^a = ∅, Moulin and O'Sullivan [9] claimed that the cardinality of the alphabet U can be restricted to |S^e| |X| + 1 without loss of optimality. The proof is based on Caratheodory's theorem, as suggested in [1]. However, the proof in [9] applies only to the fixed-channel case.^3 The use of alphabets with unbounded cardinality introduces some technical subtleties. We will be using the following properties.

• Given alphabets U and Z such that |U| |Z| ≪ N / log N, it follows from (1.1) and (1.2) that

    |T_u| ≐ 2^{N H(u)},   |T_{u|z}| ≐ 2^{N H(u|z)}    (2.13)

for all (u, z) ∈ (U × Z)^N.

• The mutual information, Kullback-Leibler divergence, and J functionals are continuous with respect to the l_1 norm. For instance, for any p.m.f.'s p_{S^eS^aS^dUXY}, p'_{S^eS^aS^dUXY} and ε > 0, there exists δ such that

    ||p_{S^eS^aS^dUXY} − p'_{S^eS^aS^dUXY}|| < δ  ⟹  |J(p_{S^eS^aS^dUXY}) − J(p'_{S^eS^aS^dUXY})| < ε,

where the norm on p − p' is the l_1 norm.

• Given two compact sets P and Q and two sequences of subsets {P_n}_{n=1}^∞ and {Q_n}_{n=1}^∞ dense in P and Q, respectively, under the l_1 norm, we have the following property. For any continuous functional φ : P × Q → R^+, the functional φ*(p) = inf_{q∈Q} φ(p, q) is also continuous, and we have^4

    sup_{p∈P} inf_{q∈Q} φ(p, q) = lim_{n→∞} sup_{p∈P_n} lim_{m→∞} inf_{q∈Q_m} φ(p, q)    (2.14)
     = lim_{n→∞} sup_{p∈P_n} inf_{q∈Q_n} φ(p, q).    (2.15)

^3 Equation (A7) in the proof of [9, Prop. 4.1(iv)] is associated with a fixed DMC.
^4 Consider for instance P = P_Z, Q = P_{U|Z}, φ(p, q) = Ĩ_{U;Z}(pq), P_n = P_Z^{[N]}, and Q_n = P_{U|Z}^{[N]}.

3 Main Results

The main tool used to prove the coding theorems in this paper is the method of types [26]. Our random-coding schemes (Sec. 3.1) are random-binning schemes in which the auxiliary random variable U is input to a fictitious channel. The role of U in our converse and sphere-packing theorems is detailed in Sec. 3.2. In all derivations, optimal types for sextuples (s^e, s^a, s^d, u, x, y) are obtained as solutions to maxmin problems. Two key facts used to prove the theorems are: 1) the number of conditional types is polynomial in N, and 2) in the AVC case, the worst attacks are uniform over conditional types, as in Somekh-Baruch and Merhav's watermarking capacity game [12]. Proofs of the theorems appear in Secs. 5-10. Related, known results for C-DMC's without side information are summarized in Appendix B.

The expression (2.12), restated below in a slightly different form, turns out to be a capacity expression for the problems considered in this paper (Theorem 3.4):

    C = C(D_1, A) = sup_U max_{p_{XU|S^e} ∈ P_{XU|S^e}(D_1)} min_{p_{Y|XS^a} ∈ A} J(p_{S^eS^aS^d} p_{XU|S^e} p_{Y|XS^a}).    (3.1)

In the special case of degenerate p_{S^eS^d} (no side information at the encoder and decoder), it is known that the maximum above is achieved by U = X, and capacity reduces to the standard formula C = max_{p_X} min_{p_{Y|XS^a}} Ĩ_{XY}(p_X p_{Y|XS^a} p_{S^a}). If S^e = S^d = S and S^a = ∅ (private watermarking), the optimal choice is again U = X, and C = max_{p_{X|S}} min_{p_{Y|X}} Ĩ_{XY|S}(p_{X|S} p_{Y|X} p_S).
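For degenerate side information the formula reduces to a max-min mutual information, which can be brute-forced on a grid. In this sketch (ours), we restrict the adversary to BSC attacks with expected Hamming distortion at most D_2, an assumption made only for illustration:

```python
import numpy as np

def mi(px, W):
    """I(X;Y) in bits for input p.m.f. px and channel matrix W[x, y]."""
    pxy = px[:, None] * W
    py = pxy.sum(0)
    m = pxy > 0
    return float((pxy[m] * np.log2(pxy[m] / (px[:, None] * py[None, :])[m])).sum())

bsc = lambda q: np.array([[1 - q, q], [q, 1 - q]])
D2 = 0.2
C = max(min(mi(np.array([a, 1 - a]), bsc(q)) for q in np.linspace(0.0, D2, 21))
        for a in np.linspace(0.01, 0.99, 99))
print(C)   # ~ 1 - h(0.2) = 0.2781, achieved by uniform p_X against the BSC(D2)
```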

3.1 Random-Coding Exponents

Theorem 3.1 For the C-DMC case (Def. 2.6) with maximum-cost constraint (2.2) or expected-cost constraint (2.3) on the transmitter, the reliability function is lower-bounded by the random-coding error exponent

    E_r^{C-DMC}(R) = min_{p̃_{S^e} ∈ P_{S^e}} sup_U max_{p_{XU|S^e} ∈ P_{XU|S^e}(D_1)} min_{p̃_{YS^aS^d|XUS^e} ∈ P_{YS^aS^d|XUS^e}} min_{p_{Y|XS^a} ∈ A}
        [ D(p̃_{S^e} p_{XU|S^e} p̃_{YS^aS^d|XUS^e} || p_{S^eS^aS^d} p_{XU|S^e} p_{Y|XS^a}) + |J(p̃_{S^e} p_{XU|S^e} p̃_{YS^aS^d|XUS^e}) − R|^+ ].    (3.2)

Moreover, E_r^{C-DMC}(R) = 0 if and only if R ≥ C.

[Figure 2: Representation of the binning scheme as a stack of variable-size arrays C(p_{s^e}) indexed by the state sequence type p_{s^e}; each array has 2^{NR} columns and 2^{Nρ(p_{s^e})} rows, with ρ(p_{s^e}) ∼ I(u; s^e).]

The random-coding error exponent (3.2) is achieved by deterministic, extended codes with conditionally constant composition (Def. 2.4) (using the stacked binning technique depicted in Fig. 2) and the MPMI decoder (5.6). At each level of the stack, the array depth parameter ρ(p_{s^e}) is 1/N times the logarithm of the number of codewords in any column of the array. The parameter ρ(p_{s^e}) is optimized so as to optimally balance the probability of encoding errors and the probability of decoding errors, conditioned on the encoder's state sequence type p_{s^e}. The former does not vanish when ρ(p_{s^e}) ≤ I(u; s^e) but vanishes at a double-exponential rate otherwise. The latter increases exponentially with ρ(p_{s^e}). Therefore making ρ(p_{s^e}) a function of p_{s^e} allows optimal fine-tuning of this tradeoff, resulting in ρ(p_{s^e}) ∼ I(u; s^e). Such fine-tuning is not possible with the standard fixed-size binning scheme; see Remark 3.3 for more details.

Theorem 3.2 For the AVC case (Def. 2.7) with maximum-cost constraint (2.2) or expected-cost constraint (2.3) on the transmitter, the reliability function is lower-bounded by the random-coding error exponent

    E_r^{AVC}(R) = min_{p̃_{S^e} ∈ P_{S^e}} sup_U max_{p_{XU|S^e} ∈ P_{XU|S^e}(D_1)} min_{p̃_{YS^aS^d|XUS^e} ∈ P_{YS^aS^d|XUS^e}[A]}
        [ D(p̃_{S^eS^aS^d} || p_{S^eS^aS^d}) + Ĩ_{Y;US^eS^d|XS^a}(p̃_{S^e} p_{XU|S^e} p̃_{YS^aS^d|XUS^e}) + |J(p̃_{S^e} p_{XU|S^e} p̃_{YS^aS^d|XUS^e}) − R|^+ ].    (3.3)

Moreover, E_r^{AVC}(R) = 0 if and only if R ≥ C.
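The stacked-binning structure described above can be sketched schematically as follows (our toy illustration, not the paper's construction; the agreement score stands in for the joint-typicality criterion, and rho is a placeholder depth schedule):

```python
import random
from collections import Counter

random.seed(0)
N, R = 8, 0.25

def seq_type(s):                 # type p_s over the binary alphabet
    c = Counter(s)
    return tuple(c[a] / len(s) for a in (0, 1))

def rho(t):                      # placeholder schedule, rho(p_se) ~ I(u; s^e)
    return 0.25

def build_stack(types):
    """Stacked binning: one array per state type, 2^{NR} bins (columns),
    each holding 2^{N rho(t)} candidate codewords (rows)."""
    return {t: [[tuple(random.randint(0, 1) for _ in range(N))
                 for _ in range(2 ** round(N * rho(t)))]
                for _ in range(2 ** round(N * R))]
            for t in types}

def encode(stack, s_e, m, score):
    """Pick, within bin m of the array indexed by the type of s^e, the codeword
    best matched to s^e (score is a stand-in for joint typicality)."""
    bin_m = stack[seq_type(s_e)][m]
    return max(bin_m, key=lambda u: score(u, s_e))

score = lambda u, s: sum(ui == si for ui, si in zip(u, s))  # toy agreement score
stack = build_stack({(0.5, 0.5), (0.25, 0.75)})
s_e = (0, 1, 1, 0, 1, 0, 1, 0)
print(encode(stack, s_e, m=1, score=score))
```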

The random-coding error exponent (3.3) is achieved by randomly-modulated, extended codes with conditionally constant composition (Defs. 2.4 and 2.5), stacked binning, and an MPMI decoder. The worst attack channel is uniform over single conditional types (Def. 2.8). It should also be noted that:

1. the worst type classes T_{s^e}, T_{ys^as^d|xus^e} and the best type class T_{xu|s^e} (in an appropriate min-max-min sense) determine the error exponents;

2. the order of the min, max, and min is determined by the knowledge available to the encoder: the encoder knows s^e and can optimize T_{xu|s^e}, but has no control over T_{ys^as^d|xus^e};

3. the straight-line part of E_r(R) results from the union bound;

4. random codes are generally suboptimal at low rates.

Theorems 3.1 and 3.2 imply the following relationship between error exponents in the C-DMC and AVC cases.

Corollary 3.3 E_r^{C-DMC}(R) ≤ E_r^{AVC}(R) for all R.

Proof. Using the relation I(Y; US^eS^d|XS^a) = D(p_{YXUS^eS^aS^d} || p_{Y|XS^a} p_{XUS^eS^aS^d}), we can rewrite the cost function in (3.3) as

    D(p̃_S || p_S) + D(p̃_{S^e} p_{XU|S^e} p̃_{YS^aS^d|XUS^e} || p̃_S p_{XU|S^e} p_{Y|XS^a}) + |J − R|^+,

where Bayes' rule gives p_{Y|XS^a} explicitly as a function of p̃_{YS^aS^d|XUS^e}, p_{XU|S^e}, and p̃_S; moreover, p_{Y|XS^a} ∈ A. Next, using the chain rule for divergence, it is seen that the cost function takes the same form as that in (3.2). Unlike in (3.2), however, the minimizations over p_{Y|XS^a} and p̃_{YS^aS^d|XUS^e} are over smaller sets: respectively the singleton defined above and P_{YS^aS^d|XUS^e}[A]. □

The inequality E_r^{C-DMC}(R) ≤ E_r^{AVC}(R) is not as surprising as it initially seems, because the proof of Theorem 3.2 shows there is no loss of optimality in considering AVC's that are uniform over conditional types, and there are more conditional types to choose from under the C-DMC model. Generally that additional flexibility is beneficial for the adversary, and the worst conditional type does not satisfy the hard constraint (2.5). See Sec. 4 for an example.

Remark 3.1 An upper bound on the error exponents (3.2) and (3.3) can be obtained by fixing p̃_{S^e} = p_{S^e} and p̃_{YS^aS^d|XUS^e} = p_{Y|XS^a} p_{S^aS^d|S^e}. We obtain E_r^{{C-DMC, AVC}}(R) ≤ |C − R|^+; equality is achieved if the minimizing p̃_{S^e} and p̃_{YS^aS^d|XUS^e} in (3.2) and (3.3) are equal to p_{S^e} and p_{Y|XS^a} p_{S^aS^d|S^e}, respectively.

Remark 3.2 In the absence of side information (degenerate p_{S^eS^d}), the optimal U = X, and (3.3) becomes E_r^{AVC}(R) = |C − R|^+. The expression for E_r^{AVC}(R) derived by Hughes and Thomas [17] (Eqns. (9), (6); also see the observation at the top of p. 96) is upper-bounded by |C − R|^+; they also provide a binary-Hamming example in which equality is achieved. Our result implies that the upper bound |C − R|^+ is in fact achieved for any problem without side information in which there exists a hard constraint on the conditional type of the channel output given the input. This result is also in agreement with the random-coding error exponent given in [11, Eqns. (17)-(19)] for the private watermarking problem under hard distortion constraints (λ = ∞). See (B.5) and (B.6).

Remark 3.3 For a standard 2-D random binning scheme with a 2^{NR} × 2^{Nρ} array, derivations similar to those in Theorem 3.1 result in the following error exponent:

    E_{r-2Dbin}^{C-DMC}(R) = max_{ρ≥0} min_{p̃_{S^e} ∈ P_{S^e}} sup_U max_{p_{XU|S^e} ∈ P_{XU|S^e}(D_1)} min_{p̃_{YS^aS^d|XUS^e} ∈ P_{YS^aS^d|XUS^e}} min_{p_{Y|XS^a} ∈ A}
        [ D(p̃_{S^e} p_{XU|S^e} p̃_{YS^aS^d|XUS^e} || p_{S^eS^aS^d} p_{XU|S^e} p_{Y|XS^a}) + γ(ρ, R, p̃_{S^e}, p_{XU|S^e}, p̃_{YS^aS^d|XUS^e}) ]    (3.4)

     ≤ min_{p̃_{S^e} ∈ P_{S^e}} max_{ρ≥0} sup_U max_{p_{XU|S^e} ∈ P_{XU|S^e}(D_1)} min_{p̃_{YS^aS^d|XUS^e} ∈ P_{YS^aS^d|XUS^e}} min_{p_{Y|XS^a} ∈ A}
        [ D(p̃_{S^e} p_{XU|S^e} p̃_{YS^aS^d|XUS^e} || p_{S^eS^aS^d} p_{XU|S^e} p_{Y|XS^a}) + γ(ρ, R, p̃_{S^e}, p_{XU|S^e}, p̃_{YS^aS^d|XUS^e}) ]    (3.5)

     = E_r^{C-DMC}(R),

where the function γ is defined as

    γ(ρ, R, p̃_{S^e}, p_{XU|S^e}, p̃_{YS^aS^d|XUS^e}) =
        |Ĩ_{U;YS^d}(p̃_{S^e} p_{XU|S^e} p̃_{YS^aS^d|XUS^e}) − ρ − R|^+   if ρ ≥ Ĩ_{U;S^e}(p̃_{S^e} p_{U|S^e});
        0                                                              otherwise.    (3.6)

The inequality (3.5) is obtained by switching the maximization over ρ and the minimization over p̃_{S^e}. For fixed p̃_{S^e} and p_{XU|S^e}, the maximizing ρ in (3.5) is equal to Ĩ_{U;S^e}(p̃_{S^e} p_{U|S^e}). The function γ is nonconcave in ρ, and therefore we generally have strict inequality in (3.5): the standard 2-D binning scheme is suboptimal. We similarly have E_{r-2Dbin}^{AVC}(R) ≤ E_r^{AVC}(R).

Remark 3.4 In the proof of Theorems 3.1 and 3.2, neither the construction of the random code C = ∪_{p_{s^e}} C(p_{s^e}) nor the encoding or decoding rule depends on the actual probability law p_S of the state sequence. Therefore the random-coding exponents (3.2) and (3.3) are universally attainable for all p_S, even when p_S is unknown. If p_S is known to belong to a class P*_S, the worst-case error exponents over P*_S are obtained by performing an additional inner minimization of (3.2) and (3.3) over p_S ∈ P*_S.
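The role of the bin-depth parameter ρ in Remark 3.3 is easy to see numerically; with assumed toy values (ours, for illustration) for Ĩ_{U;S^e} and Ĩ_{U;YS^d}, γ in (3.6) jumps from 0 to its maximum at ρ = Ĩ_{U;S^e} and decreases linearly thereafter:

```python
# Toy illustration (assumed values) of the gamma function in (3.6):
# gamma(rho) = |I_uy - rho - R|^+ for rho >= I_us, and 0 otherwise.
I_us, I_uy, R = 0.3, 0.8, 0.2   # hypothetical I(U;S^e), I(U;YS^d), rate

def gamma(rho):
    return max(0.0, I_uy - rho - R) if rho >= I_us else 0.0

for rho in [0.0, 0.2, 0.3, 0.4, 0.5, 0.6]:
    print(f"rho={rho:.1f}  gamma={gamma(rho):.2f}")
# The maximum over rho is attained at rho = I_us (= 0.3), matching the remark
# that the optimal bin depth is rho ~ I(U;S^e).
```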

3.2 Converse Theorems, Capacity, and Sphere-Packing Exponents

To prove their converse theorem, Gel'fand and Pinsker [1] used Fano's inequality together with a telescoping technique to upper-bound the mutual information I(M; Y) with a sum ∑_{i=1}^N [I(U_i; Y_i) − I(U_i; S_i)]. Then they derived a single-letter upper bound N[I(U; Y) − I(U; S)]. The random variables U_i used in this construction are of the form U_i = (M, S_{i+1}, ..., S_N, Y_1, ..., Y_{i−1}) and as such do not admit an obvious coding interpretation (Y is not available to the encoder). The only natural coding interpretation known to date arises in the context of binning schemes, where the fictitious channel from U to Y conveys information about M as well as about S (overhead information). The Gel'fand-Pinsker proof of the converse theorem can be extended to more complex problems such as compound Gel'fand-Pinsker channels [9, 12] and sphere-packing bounds. Our first result

is the capacity of the generalized Gel'fand-Pinsker problem, stated in Theorem 3.4. Achievability follows from Theorems 3.1 and 3.2. The proof of the converse appears in Sec. 7.

Theorem 3.4 Capacity for the generalized Gel'fand-Pinsker problem is given by (3.1) for all four combinations of maximum-cost constraints (2.2) and expected-cost constraints (2.3) on the transmitter and C-DMC (Def. 2.6) and AVC models (Def. 2.7) for the adversary.

The proof of the C-DMC converse is similar to that in [9]; the proof of the AVC case exploits the close connection between the AVC and C-DMC problems. Our derivation of sphere-packing bounds builds on the existence of a converse theorem (Prop. 8.4) and makes use of a random variable U as discussed above. The derivation also relies on extended codes (Def. 2.4), for which U admits a coding interpretation as well.

Theorem 3.5 For the C-DMC case with maximum-cost constraint (2.2) or expected-cost constraint (2.3) on the transmitter, the reliability function is upper-bounded by the sphere-packing exponent

    E_sp^{C-DMC}(R) = min_{p̃_{S^e} ∈ P_{S^e}} sup_U max_{p_{XU|S^e} ∈ P_{XU|S^e}(D_1)} min_{p_{Y|XS^a} ∈ A} min_{p̃_{YS^aS^d|XUS^e} : J(p̃_{S^e} p_{XU|S^e} p̃_{YS^aS^d|XUS^e}) ≤ R}
        D(p̃_{S^e} p_{XU|S^e} p̃_{YS^aS^d|XUS^e} || p_{S^eS^aS^d} p_{XU|S^e} p_{Y|XS^a})    (3.7)

for all R_∞ ≤ R ≤ C, where R_∞ is the infimum of all R such that E_sp^{C-DMC}(R) < ∞ in (3.7). Moreover, E_sp^{C-DMC}(R) = 0 if and only if R > C.

Theorem 3.6 For the AVC case with maximum-cost constraint (2.2) or expected-cost constraint (2.3) on the transmitter, the reliability function is upper-bounded by the sphere-packing exponent

    E_sp^{AVC}(R) = min_{p̃_{S^e} ∈ P_{S^e}} sup_U max_{p_{XU|S^e} ∈ P_{XU|S^e}(D_1)} min_{p̃_{YS^aS^d|XUS^e} : p̃_{Y|XS^a} ∈ A, J(p̃_{S^e} p_{XU|S^e} p̃_{YS^aS^d|XUS^e}) ≤ R}
        [ D(p̃_{S^eS^aS^d} || p_{S^eS^aS^d}) + Ĩ_{Y;US^eS^d|XS^a}(p̃_{S^e} p_{XU|S^e} p̃_{YS^aS^d|XUS^e}) ]    (3.8)

for all R_∞ ≤ R ≤ C, where R_∞ is the infimum of all R such that E_sp^{AVC}(R) < ∞ in (3.8). Moreover, E_sp^{AVC}(R) = 0 if and only if R ≥ C.

Similarly to Corollary 3.3, we have:

Corollary 3.7 E_sp^{C-DMC}(R) ≤ E_sp^{AVC}(R) for all R.

Theorem 3.8 determines the reliability function at high rates.

Theorem 3.8 For both the C-DMC and AVC cases, there exists a critical rate R_cr < C such that

    E_r(R) = E_sp(R)                       if R ≥ R_cr;
             E_sp(R_cr) + R_cr − R          otherwise,    (3.9)

and E(R) = E_sp(R) for all R ≥ R_cr. The critical rates are generally not the same in the C-DMC and AVC cases.
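The shape of (3.9) is easy to check numerically with a placeholder sphere-packing exponent (assumed convex and decreasing; ours, for illustration): E_r follows E_sp above R_cr and continues as a straight line of slope −1 below it.

```python
# Toy illustration of (3.9): E_r(R) equals E_sp(R) above the critical rate and
# is the slope -1 tangent line below it. E_sp here is a hypothetical placeholder.
E_sp = lambda R: (1.0 - R) ** 2   # assumed convex, decreasing on [0, 1]
R_cr = 0.5                        # point where d E_sp / dR = -1 for this toy

def E_r(R):
    return E_sp(R) if R >= R_cr else E_sp(R_cr) + R_cr - R

for R in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(f"R={R:.2f}  E_sp={E_sp(R):.3f}  E_r={E_r(R):.3f}")
```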

4 Binary-Hamming Case

In this section, we consider a problem of theoretical and practical interest where S = S^e, S^e is a Bernoulli sequence over {0, 1} with Pr[S^e = 1] = p_e = 1 − Pr[S^e = 0], transmission is subject to the cost constraint (2.2) in which Γ is the Hamming distance, and the adversary is subject to the expected-distortion constraint (2.4) or to the maximum-distortion constraint (2.6), in which d is also the Hamming distance. In both cases the set A is given by (2.4). We study three cases:

Case I: p_e = 1/2, S^a = S^d = ∅. This problem is analogous to the public watermarking problem of [5, 6, 9].

Case II: p_e = 1/2, S^a = ∅, S^d = S^e. This is the private watermarking problem of [9]. The AVC version of this problem is closely related to a problem studied by Csiszár and Narayan [14] and by Hughes and Thomas [17].

Case III: Degenerate side information: p_e = 0, S^e = S^a = S^d = ∅. Unlike [14, 17], the attacker's noise may depend on X.

In all three cases, we are able to derive some analytical results and to numerically evaluate error exponents. Capacity formulas for these problems are given below and illustrated in Fig. 3. In this section we use the notation p ⋆ q ≜ p(1 − q) + (1 − p)q.

[Figure 3: Capacity functions for Cases I-III when D_2 = 0.2. Horizontal axis: D_1 from 0 to 0.5; vertical axis: capacity in bit/sample, from 0 to 0.25. The "Private Case" curve lies above the "Public Case & Degenerate Case" curve; an annotation reads h(D_1) − h(D_2).]

4.1 Case I: Public Watermarking

Here p_e = 1/2, S^a = S^d = ∅. Capacity for a fixed-DMC problem (adversary implements a binary symmetric channel (BSC) with crossover probability D_2) is given in Barron et al. [5] and Pradhan et al. [6]:

    C^pub = g(D_1, D_2) ≜
        (D_1/δ_2) [h(δ_2) − h(D_2)]   if 0 ≤ D_1 < δ_2;
        h(δ_2) − h(D_2)               if δ_2 ≤ D_1 ≤ 1/2;
        1 − h(D_2)                    if D_1 > 1/2,    (4.1)

where δ_2 = 1 − 2^{−h(D_2)} and h(·) is the binary entropy function. The straight-line portion of the capacity function is achieved by time-sharing. Proposition 4.1 shows that the BSC is the worst channel for the C-DMC and AVC classes considered.

Proposition 4.1 Capacity under the C-DMC and AVC models defined by the distortion constraints (2.4) and (2.6), respectively, is equal to C^pub and is achieved for |U| = 2.

Proof: See Appendix C.

The sphere-packing exponent in the AVC case is trivial (∞) because its calculation requires minimization of a function over an empty set. The same phenomenon was observed and discussed in [17].

Proposition 4.2 E_sp^{AVC,pub}(R) = ∞ for all R < C^pub.

Proof: See Appendix D.

Proposition 4.3 The random-coding error exponent is a straight line in the AVC case: E_r^{AVC,pub}(R) = |C^pub − R|^+ for all R. The minimizing p̃_S in (3.3) coincides with p_S, the maximizing |U| = 2, and the minimizing p_{Y|XUS} is the BSC p_{Y|X} with crossover probability D_2.

Proof: See Appendix E.

Error exponents in the case D_1 = 0.4, D_2 = 0.2 are given in Fig. 4. For the C-DMC case, we have found numerically (see discussion in Sec. 4.4) that the worst attack channel p_{Y|X} is the BSC with crossover probability D_2, and that the worst-case p̃_S in (3.2) coincides with p_S.
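Evaluating (4.1) and the private capacity C^priv = h(D_1 ⋆ D_2) − h(D_2) of Prop. 4.4 below reproduces the curves in Fig. 3; a minimal sketch (ours):

```python
import math

h = lambda t: 0.0 if t in (0.0, 1.0) else -t*math.log2(t) - (1-t)*math.log2(1-t)
star = lambda p, q: p*(1-q) + (1-p)*q          # p * q = p(1-q) + (1-p)q

def C_pub(D1, D2):
    """g(D1, D2) from (4.1)."""
    d2 = 1 - 2 ** (-h(D2))
    if D1 < d2:
        return (D1/d2) * (h(d2) - h(D2))
    return h(d2) - h(D2) if D1 <= 0.5 else 1 - h(D2)

C_priv = lambda D1, D2: h(star(D1, D2)) - h(D2)

D2 = 0.2
for D1 in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"D1={D1:.1f}  C_pub={C_pub(D1, D2):.4f}  C_priv={C_priv(D1, D2):.4f}")
```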

4.2 Case II: Private Watermarking

Here pe = 21 , S a = ∅, S d = S e . Proposition 4.4 [9]. Capacity is given by C priv = h(D1 ⋆ D2 ) − h(D2 ). AV C,priv Proposition 4.5 Esp (R) = ∞ for all R < C pub . AV C,pub pub priv For C 0. For any rate-R encoder fN and attack channel pY |XS a ∈ A such that I(M, YSd ) ≤ N (R − η), (7.1) we have N R = H(M ) = H(M |YSd ) + I(M ; YSd ) d ≤ 1 + Pe (fN , gN , pN Y |XS a ) N R + I(M ; YS )

≤ 1 + Pe (fN , gN , pN Y |XS a ) N R + N (R − η) where the first inequality is due to Fano’s inequality, and the second is due to (7.1). Hence Pe (fN , gN , pN Y |XS a ) ≥

Nη − 1 . NR

We conclude that the probability of error is bounded away from zero: Pe (fN , gN , pN Y |XS a ) ≥

η 2R

(7.2)

for all N > η2 . Therefore rate R is not achievable if min

pY |XS a ∈A

I(M ; YSd ) ≤ N R.

(7.3)

Step 2. The joint p.m.f. of (M, S, X, Y) is given by pM SXY = pM pSe 1{X=fN (Se ,M )}

N Y

pY |XS a (yi |xi , sai )pS a S d |S e (sai , sdi |sei ).

(7.4)

i=1

Define the random variables d e e , Y1 , · · · , Yi−1 ), Wi = (M, Si+1 , · · · , SN , S1d , · · · , Si−1

1 ≤ i ≤ N.

(7.5)

Since (M, {Sje , Sjd , Yj }j6=i ) → Xi Sie → Yi Sia Sid forms a Markov chain for any 1 ≤ i ≤ N , so does Wi → Xi Sie → Yi Sia Sid . 27

(7.6)

Also define the quadruple of random variables (W, S, X, Y ) as (WI , SI , XI , YI ), where I is a timesharing random variable, uniformly distributed over {1, · · · , N } and independent of all the other random variables. The random variable W is defined over an alphabet of cardinality exp2 {N [R + log max(|S e |, |Z|)]}. Due to (7.4) and (7.6), W → XS e → Y S a S d forms a Markov chain. Using the same inequalities as in [1, Lemma 4] (with (Yi , Sid ) and Sie playing the roles of Yi and Si , respectively), we obtain I(M ; YSd ) ≤

N X [I(Wi ; Yi Sid ) − I(Wi ; Sie )].

(7.7)

i=1

Using the definition of (W, S, X, Y ) above and the same inequalities as in [9, (C16)], we obtain N X

[I(Wi ; Yi Sid ) − I(Wi ; Sie )] = N [I(W ; Y S d |T ) − I(W ; S e |T )]

i=1

≤ N [I(W T ; Y S d ) − I(W T ; S e )] = N [I(U ; Y S d ) − I(U ; S e )]

(7.8)

where U = (W, T ). Therefore I(M ; YSd ) ≤ N [I(U ; Y S d ) − I(U ; S e )] min

pY |XS a ∈A

I(M ; YSd ) ≤ N = N

min

[I(U ; Y S d ) − I(U ; S e )]

min

J(pU SXY )

pY |XS a ∈A pY |XS a ∈A

≤ N sup max

min

U pXU |S e pY |XS a ∈A

J(pU SXY ).

(7.9)

Combining (7.3) and (7.9), we conclude that R is not achievable if sup max

min

U pXU |S e pY |XS a ∈A

J(pU SXY ) ≤ R,

which proves the claim.

2

(b) AVC Case (Def. 2.7). Choose ǫ arbitrarily small and define the slightly enlarged class of DMC’s: ) ( Aǫ =

pY |XS a :

min

p′Y |XS a ∈A

kpY |XS a − p′Y |XS a k ≤ ǫ

where the norm is the l1 norm. Therefore limǫ→0 C(D1 , Aǫ ) = C(D1 , A). In order to prove the converse theorem, it is sufficient to show that reliable communication at rates above C(D1 , Aǫ ) is impossible for a particular attack channel pY|XSa ∈ PY|XSa [Aǫ ]. The channel we select is “nearly memoryless”. We show that minfN ,gN Pe,N (fN , gN , pY|XSa ) 6→ 0 as N → ∞. Construction of pY|XSa . Consider any rate-R code (M, fN , gN ) where R > C and select an arbitrary DMC p˜Y |XS a ∈ A such that minfN ,gN Pe,N (fN , gN , p˜N Y |XS a ) 6→ 0 as N → ∞. The existence 28

of such a DMC is guaranteed from Part (a). Define an arbitrary mapping Λ : X N × (S a )N → Y N ˜ be the output of p˜N a . Define the following such that pΛ(x,sa )|xsa ∈ Aǫ for all (x, sa ). Let Y Y |XS a ˜ ): the binary quantity functions of (x, s , y B = 1{py˜ |xsa ∈A / ǫ} and the sequence y=



˜ y : if py˜ |xsa ∈ Aǫ (B = 0) Λ(x, sa ) : else (B = 1).

Therefore a

pY|XSa (y|x, s ) =

(" n Y

#

p˜Y |XS a (yi |xi , sai )

i=1

)

P r(B = 0) + 1{y=Λ(x,sa )} P r(B = 1) 1{py|x,sa ∈Aǫ }

belongs to Aǫ . We also have Pr[B = 1] ≤ Pr[kpy˜ |xsa − p˜Y |XS a k > ǫ]   ǫ2 pY |XS a ) > ≤ Pr D(py˜ |xsa ||˜ 2 ln 2   2 ǫ . = exp2 −N 2 ln 2 where the second inequality follows from Pinsker’s inequality. For any (fN , gN ), we have Pe,N (fN , gN , p˜N Y |XS a ) ˜ Sd ] ˆ = = P r[M 6 M |Y, ˆ = ˜ Sd , B = 0]P r(B = 0) + P r[M ˆ 6= M |Y, ˜ Sd , B = 1]P r(B = 1). = P r[M 6 M |Y,

(7.10)

Likewise, Pe,N (fN , gN , pY|XSa ) ˆ 6= M |Y, Sd ] = P r[M ˆ 6= M |Y, Sd , B = 0]P r(B = 0) + P r[M ˆ 6= M |Y, Sd , B = 1]P r(B = 1). = P r[M

(7.11)

Noting that the terms multiplying P r[B = 0] in (7.10) and (7.11) are identical and that P r[B = 1] vanishes exponentially with N , we obtain |Pe,N (fN , gN , pY|XSa ) − Pe,N (fN , gN , p˜N Y |XS a )| → 0,

∀fN , gN .

Since minfN ,gN Pe,N (fN , gN , p˜N Y |XS a ) 6→ 0, we necessarily have min Pe,N (fN , gN , p˜N Y |XS a ) 6→ 0,

fN ,gN

which proves the claim.

2 29

8

Proof of Theorem 3.5

The proof is given for the average-cost constraint (2.3) on the transmitter. The upper bound C−DM C (R) we develop on the reliability function is still an upper bound under the stronger Esp maximum-cost constraint (2.2). Our lower bound on error probability is obtained by creating a dummy sequence U ∈ U N at the encoder (see Step 2 below) and making the joint type psuxy of the sequences (S, U, fN (Se , M ), Y) available to the decoder. Therefore the decoder gN is now a mapping [N ] gN : (Y × S d )N × PSU XY → M such that m ˆ = gN (y, sd , psuxy ) is the decoded message. To control the rate of growth of |U | [N ] as N → ∞, we restrict pysa sd |xuse to a subset PY S a S d |XU S e [ǫN ] of PY S a S d |XU S e , where ǫN is a resolution parameter which tends to zero at an appropriate rate as N → ∞. The sequence of sets PY S a S d |XU S e [ǫN ],

N ≥1

is dense in PY S a S d |XU S e (under the l1 norm) as N → ∞. Given a triple (U, V, Z) of random variables with joint p.m.f. pU V Z , we shall use the notation I(V ; Z | Tuz ) = I˜V ;Z (pV |U Z puz ) instead of the cumbersome

1 N I(V; Z

(8.1)

| (U, Z) ∈ Tuz ) where (U, V, Z) are i.i.d. pU V Z .

Step 1. For any pY|XSa , we have, using (2.8), ∗ Pe,N

≥ =

min

pFN GN

X

pFN GN (fN , gN )Pe (fN , gN , pY|XSa )

fN ,gN

min Pe (fN , gN , pY|XSa )

fN ,gN

(8.2)

(The minimum on the first line is achieved by a deterministic code.) The minimizing gN is the maximum-likelihood decoder tuned to pY|XSa . Denote by Dm (Tsuxy ) = {(˜ y , ˜sd ) ∈ Tysd : gN (˜ y, ˜sd , psuxy ) = m}

(8.3)

the decoding region associated with message m and type class Tsuxy , and c Dm (Tsuxy ) = Tysd \ Dm (Tsuxy )

its complement. We choose pY|XSa = pN Y |XS a , the length-N extension of a DMC pY |XS a ∈ A. ˜ = fN (˜se , m), we draw a dummy sequence u ˜ uniformly from the Step 2. Given ˜se and x conditional type class Tu|xse (˜ x, ˜se ). We call this sequence “dummy” because the actual channel ˜ ) as the output of an extended code x, u pN Y |XS a has no access to it. We may also think of the pair (˜ (Def. 2.4).

30

The probability of error conditioned on message m being sent and (S, U, fN (Se , m), Y) ∈ Tsuxy is given by Pe (fN , gN | Tsuxy ) 1 X P r[gN (Y, Sd , psuxy ) 6= m | m sent, Tsuxy ] = |M| m∈M X X 1 1 1 X 1 X = |M| |Ts | |Tu|xse | |Ty|xsa | d c ˜ s

m∈M

˜ u

1{(˜s,˜u,fN (˜se ,m),˜y)∈Tsuxy }

(8.4)

˜ : (˜ y y,˜ s )∈Dm (Tsuxy )

and is independent of pY |XS a . The overall probability of error is given by Pe (fN , gN , pN Y |XS a ) =

X

P r[Tsuxy ] Pe (fN , gN | Tsuxy ).

(8.5)

Tsuxy

Next we lower bound the sum in (8.5) with a single term corresponding to a particular type class Tsuxy , to be optimized later: Pe (fN , gN , pN Y |XS a ) ≥ P r[Tsuxy ] Pe (fN , gN | Tsuxy ).

(8.6)

Step 3. Define the conditional type distribution β(px|se |fN ) of a code fN as the relative frequency of the conditional types px|se , i.e., β(px|se |fN ) =

1 X 1 |M| |Tse | m∈M

Clearly

P

px|se β(px|se |fN ) = 1. Since e 1)|X | |S | , we have β(px|se |fN )

X

1{˜x=x(˜se ,m)} ,

[N ]

px|se ∈ PX|S e .

(8.7)

(˜ x,˜ se )∈Txse

the number of conditional types px|se is upper-bounded e

by (N + ≥ (N + 1)−|X | |S | for at least one conditional type px|se . Therefore, there exists a subset of M × Tse of size equal to |M||Tse | on the exponential scale, such that the codewords fN (˜se , m) have constant conditional type given ˜se . Without loss of optimality on the exponential scale, we can thus restrict our attention to codes with conditionally constant composition: . ccc (8.8) , gN , pN min Pe (fN min Pe (fN , gN , pN Y |XS a ). Y |XS a ) = ccc fN ,gN

fN ,gN

ext with conditionally constant composition, the sequences (f ccc (˜ e ˜) For extended codes fN N s , m), u have fixed conditional type pxu|se = Λ(pse ). ext with conditional types p Step 4. Given any extended code fN xu|se = Λ(pse ), the joint p.m.f. of (M, S, U, X, Y) is given by N ccc (Se ,M )} pU|XSe p pM SUXY = pM pN S 1{X=fN Y |XS a .

(8.9)

Next we define a septuple (W, S e , S a , S d , U, X, Y ) as follows. As in the proof of Theorem 3.4, define the random variables e e d Wi = (M, Si+1 , · · · , SN , S1d , · · · , Si−1 , Y1 , · · · , Yi−1 ),

31

1 ≤ i ≤ N.

(8.10)

Since (M, {Sje , Sjd , Yj }j6=i ) → Xi Sie → Yi Sia Sid forms a Markov chain for any 1 ≤ i ≤ N , so does Wi → Xi Sie → Yi Sia Sid .

(8.11)

Next define (W, S e , S a , S d , U, X, Y ) as (WT , STe , STa , STd , UT , XT , YT ), where T is a time-sharing random variable, uniformly distributed over {1, · · · , N }, and independent of all the other random variables. The random variable W is defined over an alphabet W of cardinality exp2 {N [R + log max(|S e |, |S d | |Y|)]}. Also note that W is a function of M, Se , Sd , Y, and T ; however, due to (8.9) and (8.11), W → XS e → Y S a S d (8.12) forms a Markov chain. The joint p.m.f. of (W, S e , S a , S d , U, X, Y ) is given by pW S e S a S d U XY = pW |XS e pXS e pu|xse pY |XS a pS a S d |S e .

(8.13)

ccc , and that Observe that pW |XS e is a function of fN

pW S eS a S d U XY |Tsuxy = pW |XS e psuxy . Define the function J7 (pW S eS a S d U XY ) = I(W ; Y S d ) − I(W ; S e ).

(8.14)

Three elementary properties of the function J7 will be useful in the sequel. 1. Recalling (2.10) and (2.11), note that J7 (pW S e S a S d U XY ) = J(pS e S a S d W XY ) = J4 (pW S e S d Y ).

(8.15)

2. Due to (8.1) and (8.13), we have I(W ; Y S d | Tsuxy ) − I(W ; S e | Tsuxy ) = J7 (pW |XS e psuxy ).

(8.16)

J7 (1{W =U } pS e S a S d U XY ) = J(pS e S a S d U XY ).

(8.17)

3. Finally,

Step 5. We derive a condition that ensures no reliable communication is possible for certain types psuxy . Choose an arbitrarily small η. ext with conditionally constant types p Lemma 8.1 For any rate-R extended code fN xu|se = Λ(pse ), decoder gN , and joint type psuxy such that

I(M ; YSd Tsuxy ) ≤ N (R − η),

(8.18)

the probability of error conditioned on (S, U, X, Y) ∈ Tsuxy is bounded away from zero: ccc Pe (fN , gN | Tsuxy ) ≥

for all N > η2 . 32

η 2R

(8.19)

ext , p , and p Proof: The p.m.f. of Tsuxy is induced by fN S Y |XS a . We have

N R = H(M ) = H(M |YSd Tsuxy ) + I(M ; YSd Tsuxy ) ext ≤ 1 + Pe (fN , gN |Tsuxy ) N R + I(M ; YSd Tsuxy ) ext ≤ 1 + Pe (fN , gN |Tsuxy ) N R + N (R − η)

where the first inequality is due to Fano’s inequality, and the second is due to (8.18). Hence ext Pe (fN , gN |Tsuxy ) ≥

Nη − 1 , NR

from which the claim follows.

2

ccc with conditionally constant types Lemma 8.2 Assume |U | ≪ logNN . For any rate-R code fN ccc , such that px|se = Λ(pse ), decoder gN , there exists pW |XS e , functional of fN

1 1 I(M ; YSd Tsuxy ) ≤ I(M ; YSd | Tsuxy ) + o(1) N N ≤ J7 (pW |XS e psuxy ) + o(1),

(8.20)

[N ]

uniformly over all types psuxy ∈ PSU XY . Proof: The first inequality holds because I(M ; YSd Tsuxy ) = I(M ; YSd | Tsuxy ) + I(M ; Tsuxy ) ≤ I(M ; YSd | Tsuxy ) + H(Tsuxy ) ≤ I(M ; YSd | Tsuxy ) + |S| |U | |X | |Y| log(N + 1) and |U | ≪

N log N .

Referring to (8.9), the conditional p.m.f. pM SUXY| Tsuxy = pM |Tse |−1 1{X=fNccc (Se ,M )} |Tu|xse |−1 |Tysa sd |xuse |−1 1{(S,U,X,Y)∈Tsuxy } does not depend on pY |XS a ; hence I(M, YSd | Tsuxy ) is independent of pY |XS a as well. Recalling (8.10) and (8.14) and using the inequalities (7.7) and (7.8), with an additional conditioning on Tsuxy , we obtain the two inequalities below: d

I(M ; YS | Tsuxy ) ≤

N X

J7 (pWi |Xi Sie psuxy )

i=1

≤ N J7 (pW |XS e psuxy ), [N ]

which hold for all psuxy ∈ PSU XY .

(8.21) 2

Step 6. The following lemma shows that we can replace W with another auxiliary random ˜ defined over an alphabet U whose cardinality satisfies |U | ≪ N . This is done without variable U log N loss of optimality (in terms of J7 ) as psuxy ranges over a set of similar cardinality. 33

[N ]

The main difficulties are that the set PSU XY has unbounded cardinality as N → ∞, for any fixed U, and also that |U | itself is unbounded as N → ∞. We can however address both difficulties by fixing pxuse and restricting the stochastic matrix pysa sd |xuse to belong to the ǫ-net of some space of smooth matrix-valued functions (with respect to l1 distance) [30]. The specific choice of the functional space is not important here, however the cardinality 2H(ǫ) of the ǫ-net can be selected independently of the cardinality of U. Here ǫ is a resolution parameter and H(ǫ) denotes ǫ-entropy of an ǫ-net, which is O(ǫ−q ), q > 0, for functions that satisfy a Lipshitz condition, or O(logq 1ǫ ) for analytic functions. For our purposes it suffices to consider a sequence PY S a S d |XU S e [ǫN ], [N ]

N ≥1

of ǫ-nets in the set PY S a S d |XU S e , with ǫ-entropy H(ǫN ) =

1 ǫN .

The resolution parameter ǫN tends

to zero at a rate slower than H−1 (log N ). Observe that U in (8.22) satisfies |U | ≪

N log N .

Lemma 8.3 (Cardinality Reduction). Given pxse and a set PY S a S d |XU S e [ǫN ] of stochastic matrices with log-cardinality H(ǫN ) ≪ log N , let U be an alphabet of cardinality |U | = |X | |S e | + 2H(ǫN ) .

(8.22)

˜ Given pxse and pW |XS e , one can construct pU|XS e where U ∈ U, such that ˜ J7 (pU˜ |XS e psuxy ) = J7 (pW |XS e pxusy ) [N ]

(8.23)

[N,ǫ]

for all psuxy ∈ Ω , {pxse } × PU |XS e × PY S a S d |XU S e . Proof. The proof is an application of Caratheodory’s theorem, similarly to Prop. 4.1(iv) in [9] ˜ with alphabet and Lemma 7 in [12]. Let IN = |U |. We claim there exists a random variable U U = {1, 2, · · · , IN }, such that (8.23) holds. Due to property (8.15), the value of J7 (pW |XS e pxse pu|xse pysa sd |xuse ) does not depend on the particular choice of u. Therefore it suffices to prove (8.23) when u is an arbitrary, fixed sequence, [N,ǫ] and psuxy ∈ Ω′ , {pxuse } × PY S a S d |XU S e . Let fj , 1 ≤ j < IN be real-valued, continuous functions defined over PSU XY , and C be the image N −1 of Ω ⊂ PSU XY under the continuous mapping f (·) = {fj (·)}Ij=1 . By application of Caratheodory’s theorem, each element in the convex closure of C can be represented as the convex combination of at most IN elements of C. Hence for any pW , there exist IN elements wj , 1 ≤ j ≤ IN , of W and IN nonnegative numbers αj , 1 ≤ j ≤ IN , summing to one, such that X

w∈W

pW (w)fj (pSU XY |W =w ) =

IN X

αj fj (pSU XY |W =wj ),

1 ≤ j < IN .

(8.24)

j=1

Apply (8.24) to the following IN − 1 functions: fj (pSU XY |W =w ) = pXS e |W (x, se |w),

1 ≤ j = j(x, se ) ≤ |X | |S e | − 1,

(8.25)

fj (pSU XY |W =w ) = H(Y S d |W = w) − H(S e |W = w), |X | |S e | ≤ j = j(psuxy ) ≤ |X | |S e | + 2H(ǫN ) − 1, 34

(8.26)

where w ∈ W. In (8.25) j = j(x, se ) indexes all elements of X × S e except possibly one. In (8.26), ˜ such j indexes all the elements of Ω′ . Due to (8.24) and (8.25), there exists a random variable U that pU˜ XS e (˜ u, x, se ) = αu˜ pXS e |W (x, se |wu˜ )

∀(˜ u, x, se ) ∈ U × X × S e .

Due to (8.24) and (8.26), we have J7 (pW |XS e psuxy ) = H(Y S d |W ) − H(S e |W ) X = pW (w)[H(Y S d |W = w) − H(S e |W = w)] w∈W

=

IN X

αu˜ [H(Y S d |W = wu˜ ) − H(S e |W = wu˜ )]

v=1

˜ ) − H(S e |U ˜) = H(Y S d |U = J7 (pU˜ |XS e psuxy ) for all psuxy ∈ Ω′ . Combining Lemmas 8.1—8.3, we obtain the following proposition.

2

ccc with conditional types Proposition 8.4 (Converse for c.c.c. codes). For any rate-R code fN px|se = Λ(pse ), associated p.m.f. pU˜ |XS e , decoder gN , and psuxy ∈ Ω satisfying

J7 (pU˜ |XS e psuxy ) ≤ R − η,

(8.27)

the probability of error conditioned on (S, U, X, Y) ∈ Tsuxy is bounded away from zero: ccc Pe (fN , gN | Tsuxy ) ≥

η , 2R

∀N >

2 . η

(8.28)

Step 7. For a code $f_N^{ext}$ with conditional types $p_{\mathbf{xu}|\mathbf{s}^e}$, the probability of the conditional type class $T_{\mathbf{suxy}}$ under $p_S$ and the worst-case $p_{Y|XS^a}$ is given by (see (5.12))
$$\max_{p_{Y|XS^a} \in \mathcal{A}} \Pr[T_{\mathbf{suxy}}] \doteq \max_{p_{Y|XS^a} \in \mathcal{A}} \exp_2\{-N\, D(p_{\mathbf{s}^e\mathbf{s}^a\mathbf{s}^d\mathbf{uxy}} \,\|\, p_{S^eS^aS^d}\, p_{\mathbf{xu}|\mathbf{s}^e}\, p_{Y|XS^a})\}. \qquad (8.29)$$
Moreover, $P_e(f_N^{ext}, g_N \,|\, T_{\mathbf{suxy}}) \ge \frac{\eta}{2R}$ if inequality (8.18) holds.

We shall be interested in the $p_{\mathbf{y}s^as^d|\mathbf{xus}^e} \in \mathcal{P}_{YS^aS^d|XUS^e}[\epsilon_N]$ that maximizes the right side of (8.29) subject to the constraint (8.18). A lower bound on this quantity may be obtained by (i) replacing the constraint (8.18) with the stronger constraint (8.27), resulting in a smaller feasible set for the maximization over $p_{\mathbf{y}s^as^d|\mathbf{xus}^e}$, and (ii) minimizing the right side of (8.29) over $p_{\mathbf{x}|\mathbf{s}^e}$ and $p_{\tilde{U}|XS^e}$. This lower bound holds for any choice of $p_{\mathbf{u}|\mathbf{xs}^e} \in \mathcal{P}^{[N]}_{U|XS^e}$. From (8.6) we obtain
$$\max_{p_{Y|XS^a}} P_e(f_N^{ext}, g_N, p^N_{Y|XS^a}) \ge \max_{p_{\mathbf{s}^e}} \min_{p_{\mathbf{x}|\mathbf{s}^e}} \max_{p_{\mathbf{y}s^as^d|\mathbf{xus}^e}} \max_{p_{Y|XS^a}} \Pr[T_{\mathbf{suxy}}]\; P_e(f_N^{ext}, g_N \,|\, T_{\mathbf{suxy}}) \qquad (8.30)$$
$$\ge \max_{p_{\mathbf{s}^e}} \min_{p_{\mathbf{x}|\mathbf{s}^e}} \min_{p_{\tilde{U}|XS^e}} \max_{p_{\mathbf{y}s^as^d|\mathbf{xus}^e}} \max_{p_{Y|XS^a}} \exp_2\big\{-N\, D(p_{\mathbf{s}^e}\, p_{\mathbf{xu}|\mathbf{s}^e}\, p_{\mathbf{y}s^as^d|\mathbf{xus}^e} \,\|\, p_{S^eS^aS^d}\, p_{\mathbf{xu}|\mathbf{s}^e}\, p_{Y|XS^a})\big\} \qquad (8.31)$$
where the maximizations over $p_{\mathbf{y}s^as^d|\mathbf{xus}^e}$ in (8.30) and (8.31) are restricted to the set $\mathcal{P}_{YS^aS^d|XUS^e}[\epsilon_N]$ and are subject to (8.18) and (8.27), respectively.

Step 8. Replace the stochastic matrix $p_{\tilde{U}|XS^e}$ with its closest approximation (in the $l_1$ sense) $p_{\tilde{\mathbf{u}}|\mathbf{xs}^e}$ in $\mathcal{P}^{[N]}_{U|XS^e}$. Since $|\tilde{\mathcal{U}}| \ll N/\log N$, we obtain
$$\|p_{\tilde{U}|XS^e}\, p_{\mathbf{xs}^e} - p_{\tilde{\mathbf{u}}|\mathbf{xs}^e}\, p_{\mathbf{xs}^e}\| \ll 1.$$
By continuity of $J_7$ with respect to $p_{\tilde{U}|XS^e}$, we obtain
$$\max_{p_{\mathbf{suxy}}} |J_7(p_{\tilde{U}|XS^e}\, p_{\mathbf{suxy}}) - J_7(p_{\tilde{\mathbf{u}}|\mathbf{xs}^e}\, p_{\mathbf{suxy}})| \ll 1. \qquad (8.32)$$
It follows from (8.15) that
$$J_7(p_{\tilde{\mathbf{u}}|\mathbf{xs}^e}\, p_{\mathbf{suxy}}) = J(p_{\mathbf{s}\tilde{\mathbf{u}}\mathbf{xy}}). \qquad (8.33)$$
Combining (8.32) and (8.33), we obtain
$$\max_{p_{\mathbf{suxy}}} |J_7(p_{\tilde{U}|XS^e}\, p_{\mathbf{suxy}}) - J(p_{\mathbf{s}\tilde{\mathbf{u}}\mathbf{xy}})| < \eta$$
for all $N$ greater than some $N_0(\eta)$.

Step 9. A lower bound on (8.31) is obtained by replacing (8.27) with the stronger constraint
$$J_7(p_{\tilde{\mathbf{u}}|\mathbf{xs}^e}\, p_{\mathbf{suxy}}) \le R - 2\eta. \qquad (8.34)$$

Recall from Prop. 8.4 that this bound holds for any choice of $p_{\mathbf{u}|\mathbf{xs}^e}$. Instead of optimizing $p_{\mathbf{u}|\mathbf{xs}^e}$, we simply choose $\mathbf{u} = \tilde{\mathbf{u}}$ (we conjecture that this choice is optimal). Eliminating $\tilde{\mathbf{u}}$ from the above equations, we obtain
$$\max_{p_{Y|XS^a}} P_e(f_N^{ext}, g_N, p^N_{Y|XS^a}) \ge \max_{p_{\mathbf{s}^e}} \min_{p_{\mathbf{xu}|\mathbf{s}^e}} \max_{p_{\mathbf{y}s^as^d|\mathbf{xus}^e}} \max_{p_{Y|XS^a}} \exp_2\big\{-N\, D(p_{\mathbf{s}^e}\, p_{\mathbf{xu}|\mathbf{s}^e}\, p_{\mathbf{y}s^as^d|\mathbf{xus}^e} \,\|\, p_{S^eS^aS^d}\, p_{\mathbf{xu}|\mathbf{s}^e}\, p_{Y|XS^a})\big\} \qquad (8.35)$$
where the maximization over $p_{\mathbf{y}s^as^d|\mathbf{xus}^e}$ is subject to
$$p_{\mathbf{y}s^as^d|\mathbf{xus}^e} \in \mathcal{P}_{YS^aS^d|XUS^e}[\epsilon_N], \qquad J(p_{\mathbf{s}^e}\, p_{\mathbf{xu}|\mathbf{s}^e}\, p_{\mathbf{y}s^as^d|\mathbf{xus}^e}) \le R - 2\eta. \qquad (8.36)$$
We have
$$P^*_{e,N} \doteq \min_{f_N^{ext}, g_N} \max_{p_{Y|XS^a}} P_e(f_N^{ext}, g_N, p^N_{Y|XS^a}) \ge 2^{-N E_{sp,N}(R)} \qquad (8.37)$$
where
$$E_{sp,N}(R) = \min_{p_{\mathbf{s}^e}} \max_{p_{\mathbf{xu}|\mathbf{s}^e}} \min_{p_{\mathbf{y}s^as^d|\mathbf{xus}^e}} \min_{p_{Y|XS^a}} D(p_{\mathbf{s}^e}\, p_{\mathbf{xu}|\mathbf{s}^e}\, p_{\mathbf{y}s^as^d|\mathbf{xus}^e} \,\|\, p_{S^eS^aS^d}\, p_{\mathbf{xu}|\mathbf{s}^e}\, p_{Y|XS^a}) \qquad (8.38)$$
and the minimization over $p_{\mathbf{y}s^as^d|\mathbf{xus}^e}$ is subject to the constraint (8.36).

Step 10. Taking limits as $N \to \infty$, replacing the optimization over conditional types with an optimization over conditional p.m.f.'s, and letting $\eta \to 0$, we obtain the sphere-packing exponent $E_{sp}^{C\text{-}DMC}(R)$ in (3.7). $\Box$

Clearly $E_{sp}^{C\text{-}DMC}(R) \ge 0$, with equality if and only if the following three conditions are met: 1) $\tilde{p}_{S^e} = p_{S^e}$; 2) $\tilde{p}_{YS^aS^d|XUS^e} = p_{S^aS^d|S^e}\, p_{Y|XS^a}$; and 3) $R \ge C$.

9  Proof of Theorem 3.6

The derivation is similar to that of the sphere-packing bound in the C-DMC case (Theorem 3.5). The proof is outlined for the average-cost constraint (2.3) on the transmitter. The upper bound $E_{sp}^{AVC}(R)$ we develop on the reliability function is still an upper bound under the stronger maximum-cost constraint (2.2).

In Step 1 of the proof, we choose $p_{\mathbf{Y}|\mathbf{XS}^a}$ to be an attack channel uniform over a single conditional type $p_{\mathbf{y}|\mathbf{xs}^a} \in \mathcal{A}$ (Def. 2.8) instead of some DMC $p^N_{Y|XS^a}$ as in the C-DMC case. Steps 2--6 are identical to those in the C-DMC case. In particular, (8.6) applies, with the same expression for $P_e(f_N, g_N \,|\, T_{\mathbf{suxy}})$. In Step 7, $\Pr[T_{\mathbf{suxy}}]$ is given by (6.8), reiterated below for convenience:
$$\Pr[T_{\mathbf{suxy}}] \doteq \exp_2\big\{-N\, [D(p_{\mathbf{s}} \| p_S) + \tilde{I}_{Y;US^eS^d|XS^a}(p_{\mathbf{s}^e}\, p_{\mathbf{xu}|\mathbf{s}^e}\, p_{\mathbf{y}s^as^d|\mathbf{xus}^e})]\big\}.$$
Following Steps 8 and 9, we obtain the bound
$$P_{e,N} \ge 2^{-N E_{sp,N}(R)} \qquad (9.1)$$
where
$$E_{sp,N}(R) = \min_{\tilde{p}_{\mathbf{s}^e}} \max_{p_{\mathbf{xu}|\mathbf{s}^e}} \min_{p_{\mathbf{y}s^as^d|\mathbf{xus}^e}} \big[D(p_{\mathbf{s}} \| p_S) + \tilde{I}_{Y;US^eS^d|XS^a}(p_{\mathbf{s}^e}\, p_{\mathbf{xu}|\mathbf{s}^e}\, p_{\mathbf{y}s^as^d|\mathbf{xus}^e})\big] \qquad (9.2)$$
and the minimization over $p_{\mathbf{y}s^as^d|\mathbf{xus}^e}$ is subject to the constraints (8.34) and $p_{\mathbf{y}|\mathbf{xs}^a} \in \mathcal{A}$. Step 10 is analogous to Step 10 in the proof of Theorem 3.5. $\Box$

10  Proof of Theorem 3.8

The method of proof is identical for the C-DMC and AVC cases and is presented only for the C-DMC case. Define the conditional error exponents
$$\tilde{E}_r(R, \tilde{p}_{S^e}, p_{XU|S^e}, p_{Y|XS^a}) \triangleq \min_{\tilde{p}_{YS^aS^d|XUS^e} \in \mathcal{P}_{YS^aS^d|XUS^e}} \big[D(\tilde{p}_{S^e}\, p_{XU|S^e}\, \tilde{p}_{YS^aS^d|XUS^e} \,\|\, p_{S}\, p_{XU|S^e}\, p_{Y|XS^a}) + |J(\tilde{p}_{S^e}\, p_{XU|S^e}\, \tilde{p}_{YS^aS^d|XUS^e}) - R|^+\big] \qquad (10.1)$$
and
$$\tilde{E}_{sp}(R, \tilde{p}_{S^e}, p_{XU|S^e}, p_{Y|XS^a}) \triangleq \min_{\substack{\tilde{p}_{YS^aS^d|XUS^e} \in \mathcal{P}_{YS^aS^d|XUS^e}:\\ J(\tilde{p}_{S^e}\, p_{XU|S^e}\, \tilde{p}_{YS^aS^d|XUS^e}) \le R}} D(\tilde{p}_{S^e}\, p_{XU|S^e}\, \tilde{p}_{YS^aS^d|XUS^e} \,\|\, p_{S}\, p_{XU|S^e}\, p_{Y|XS^a}) \qquad (10.2)$$
as well as lower and upper values for $R$:
$$\tilde{R}_\infty(\tilde{p}_{S^e}, p_{XU|S^e}, p_{Y|XS^a}) = \inf\{R : \tilde{E}_{sp}(R, \tilde{p}_{S^e}, p_{XU|S^e}, p_{Y|XS^a}) < \infty\} \qquad (10.3)$$
and
$$\tilde{C}(\tilde{p}_{S^e}, p_{XU|S^e}, p_{Y|XS^a}) = \sup\{R : \tilde{E}_{sp}(R, \tilde{p}_{S^e}, p_{XU|S^e}, p_{Y|XS^a}) > 0\}. \qquad (10.4)$$
We can rewrite (3.2) and (3.7) as
$$E_r^{C\text{-}DMC}(R) = \min_{\tilde{p}_{S^e}} \sup_{\mathcal{U}} \max_{p_{XU|S^e}} \min_{p_{Y|XS^a} \in \mathcal{A}} \tilde{E}_r(R, \tilde{p}_{S^e}, p_{XU|S^e}, p_{Y|XS^a}) \qquad (10.5)$$
$$E_{sp}^{C\text{-}DMC}(R) = \min_{\tilde{p}_{S^e}} \sup_{\mathcal{U}} \max_{p_{XU|S^e}} \min_{p_{Y|XS^a} \in \mathcal{A}} \tilde{E}_{sp}(R, \tilde{p}_{S^e}, p_{XU|S^e}, p_{Y|XS^a}). \qquad (10.6)$$
We have the following properties, which follow directly from [13, pp. 168--169].

Property 10.1. The function $\tilde{E}_{sp}$ is convex in $R$.

Property 10.2. The function $\tilde{E}_{sp}$ is strictly decreasing in $R$ for $\tilde{R}_\infty < R \le \tilde{C}$.

Property 10.2 implies that the minimizing $\tilde{p}_{YS^aS^d|XUS^e}$ in (10.2) is such that $J(\tilde{p}_{S^e}\, p_{XU|S^e}\, \tilde{p}_{YS^aS^d|XUS^e}) = R$, provided that $\tilde{R}_\infty \le R \le \tilde{C}$. Next we have
$$\tilde{E}_r(R, \tilde{p}_{S^e}, p_{XU|S^e}, p_{Y|XS^a}) = \min_{\tilde{p}_{YS^aS^d|XUS^e}} \big[D(\tilde{p}_{S^e}\, p_{XU|S^e}\, \tilde{p}_{YS^aS^d|XUS^e} \,\|\, p_{S}\, p_{XU|S^e}\, p_{Y|XS^a}) + |J(\tilde{p}_{S^e}\, p_{XU|S^e}\, \tilde{p}_{YS^aS^d|XUS^e}) - R|^+\big]$$
$$= \min_{\tilde{R}_\infty \le R' \le \tilde{C}} \;\; \min_{\tilde{p}_{YS^aS^d|XUS^e}:\, J(\tilde{p}_{S^e}\, p_{XU|S^e}\, \tilde{p}_{YS^aS^d|XUS^e}) = R'} \big[D(\cdot \,\|\, \cdot) + |R' - R|^+\big]$$
$$\overset{(a)}{=} \min_{\tilde{R}_\infty \le R' \le \tilde{C}} \;\; \min_{\tilde{p}_{YS^aS^d|XUS^e}:\, J(\tilde{p}_{S^e}\, p_{XU|S^e}\, \tilde{p}_{YS^aS^d|XUS^e}) \le R'} \big[D(\cdot \,\|\, \cdot) + |R' - R|^+\big]$$
$$= \min_{\tilde{R}_\infty \le R' \le \tilde{C}} \big[\tilde{E}_{sp}(R', \tilde{p}_{S^e}, p_{XU|S^e}, p_{Y|XS^a}) + |R' - R|^+\big]$$
$$= \min_{R' \ge R} \big[\tilde{E}_{sp}(R', \tilde{p}_{S^e}, p_{XU|S^e}, p_{Y|XS^a}) + |R' - R|^+\big]$$
$$= \begin{cases} \tilde{E}_{sp}(R, \tilde{p}_{S^e}, p_{XU|S^e}, p_{Y|XS^a}) & : R \ge \tilde{R}_{cr} \\ \tilde{E}_{sp}(\tilde{R}_{cr}, \tilde{p}_{S^e}, p_{XU|S^e}, p_{Y|XS^a}) + \tilde{R}_{cr} - R & : \text{else} \end{cases} \qquad (10.7)$$
where (a) follows from Property 10.2, and $\tilde{R}_{cr}(\tilde{p}_{S^e}, p_{XU|S^e}, p_{Y|XS^a})$ in the last line is the value of $R$ for which $\frac{\partial}{\partial R} \tilde{E}_{sp}(R, \tilde{p}_{S^e}, p_{XU|S^e}, p_{Y|XS^a}) = -1$. The claim (3.9) follows, where
$$R_{cr} \triangleq \lim_{|\mathcal{U}| \to \infty} \tilde{R}_{cr}(\tilde{p}_{S^e}, p_{XU|S^e}, p_{Y|XS^a}) \qquad (10.8)$$
and $(\tilde{p}_{S^e}, p_{XU|S^e}, p_{Y|XS^a})$ achieves the min-max-min in (10.6) for fixed $|\mathcal{U}|$. $\Box$
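The relation (10.7) is easy to explore numerically. The sketch below, a minimal illustration rather than a computation from this paper, uses the classical single-DMC sphere-packing curve of a BSC (see Appendix B) as a stand-in for the convex decreasing $\tilde{E}_{sp}$ and evaluates $\min_{R' \ge R} [\tilde{E}_{sp}(R') + R' - R]$; below the critical rate the result is a straight line of slope $-1$. The crossover probability and grids are assumptions.

```python
# Minimal numerical sketch of (10.7): given a convex, decreasing Esp(R'),
# the random-coding-type exponent is Er(R) = min_{R' >= R} [Esp(R') + R' - R],
# a straight line of slope -1 below the critical rate.
import numpy as np

def h2(x):                      # binary entropy (bits)
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def kl2(a, b):                  # binary divergence D(a || b) in bits
    a = np.clip(a, 1e-12, 1 - 1e-12)
    b = np.clip(b, 1e-12, 1 - 1e-12)
    return a * np.log2(a / b) + (1 - a) * np.log2((1 - a) / (1 - b))

p = 0.05                                 # BSC crossover (assumed)
delta = np.linspace(p, 0.5, 2001)        # parametrizes the Esp curve
R_grid = 1.0 - h2(delta)                 # R(delta), decreasing in delta
Esp = kl2(delta, p)                      # Esp(R(delta)) = D(delta || p)

for R in [0.05, 0.2, 0.4, 0.6]:
    mask = R_grid >= R                   # feasible R' >= R
    Er = np.min(Esp[mask] + (R_grid[mask] - R))
    Esp_R = np.interp(R, R_grid[::-1], Esp[::-1])
    print(f"R = {R:.2f}:  Er = {Er:.4f},  Esp = {Esp_R:.4f}")
```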

11  Discussion

In their landmark paper, Gel'fand and Pinsker [1] showed that random binning achieves the capacity of a DMC with random states known to the encoder. Their encoder structure is, however, suboptimal at rates below capacity: a stack of binning arrays, indexed by the state sequence type, achieves higher error exponents when the appropriate decoder is used.

The channel models studied in this paper generalize the Gel'fand-Pinsker setup in two ways. First, partial information about the state sequence is available to the encoder, adversary, and decoder. Second, both C-DMC and AVC channel models are studied. In all cases, the appropriate decoder is the MPMI decoder (5.6), where the penalty is a function of the state sequence type. To the best of our knowledge, such decoders have not been encountered before.

We have considered four combinations of maximum/expected cost constraints for the transmitter and C-DMC/AVC designs for the adversary, and obtained the same capacity in all four cases. There is thus no advantage (in terms of capacity or error probability) to the transmitter in operating under expected-cost constraints instead of the stronger maximum-cost constraints. More remarkably, there is a definite advantage to the adversary in choosing a C-DMC rather than an AVC design of the channel. This is because 1) arbitrary memory does not help the adversary, since randomly-modulated codes and an MMI-type decoder are used; 2) the set of conditional types the adversary can choose from is constrained in the AVC case but not in the C-DMC case; and 3) the error exponents are determined by the worst types. When no side information is available to the encoder, the random-coding exponent in the AVC case is a straight line with slope $-1$ at all rates below capacity.

In their study of AVC's without side information, Hughes and Thomas [17] observed that the sphere-packing exponent is trivial ($\infty$) in a binary-alphabet example. This is not an anomaly, but is typical of problems in which the conditional type of the channel output given the input is constrained, i.e., the channel output sequence lives in a "ball" whose location is determined by the channel input. At low rates the codes can certainly be designed in such a way that these balls do not overlap, and thus error-free transmission is possible. At high rates it is still an open question whether nontrivial sphere-packing bounds can be obtained. In the case of AVC's with side information, we have been able to derive nontrivial sphere-packing bounds.

Finally, neither the MMI nor the MPMI decoder is practical, and it remains to be seen whether good, practical encoders and decoders can be developed.

Acknowledgements. The authors are grateful to R. Koetter for insightful comments about sphere-packing bounds, and to P. Narayan, N. Merhav and M. Haroutunian for helpful discussions.

A  Relation to AVC's

In this appendix, we detail the relation between the channel model $p_{\mathbf{Y}|\mathbf{XS}^a}$, with the maximum-distortion constraint (2.6), and the AVC model in [13]. The AVC is a family of conditional p.m.f.'s $W(y|x, \theta)$, where $\theta \in \Theta$ (a finite set) is a "channel state" selected by the adversary; in general (but not always [14, 17]), $\theta$ is allowed to depend on $\mathbf{X}$. A cost function $l : \Theta \to \mathbb{R}^+$ for the states is also defined. The channel law is of the form
$$p(\mathbf{y}|\mathbf{x}, \boldsymbol{\theta}) = \prod_{i=1}^N W(y_i|x_i, \theta_i) \qquad (A.1)$$
where the sequence $\boldsymbol{\theta} = \{\theta_1, \cdots, \theta_N\}$ is arbitrary except for a maximum-cost constraint
$$l^N(\boldsymbol{\theta}) \triangleq \frac{1}{N} \sum_{i=1}^N l(\theta_i) \le l_{max}. \qquad (A.2)$$
Define
$$\mathcal{Y}(x, \theta) = \{y \in \mathcal{Y} : d(x, y) = \theta\}, \qquad x \in \mathcal{X},\ \theta \ge 0,$$
$$\mathcal{D}(x) = \{\theta \ge 0 : \mathcal{Y}(x, \theta) \ne \emptyset\}, \qquad x \in \mathcal{X}.$$
Therefore $\bigcup_\theta \mathcal{Y}(x, \theta) = \mathcal{Y}$.

The problem with maximum-distortion constraint (2.6) may be formulated as (A.1)-(A.2) with appropriately defined channels $W$ and cost $l$: define $W(y|x, \theta)$ as any p.m.f. over $\mathcal{Y}(x, \theta)$, and let $l(\theta) = \theta$. In other words, given $x$, the adversary selects some $\theta \in \mathcal{D}(x)$ and a $y$ such that $d(x, y) = \theta$. The maximum-cost constraint (A.2) is then equivalent to the maximum-distortion constraint (2.6), with $l_{max} = D_2$. Unlike in [14, 17], $\theta$ depends on $\mathbf{X}$ here. The state sequence $\boldsymbol{\theta}$ should not be confused with $\mathbf{S}^a$ in our model.

The AVC model (A.1) is memoryless in that $\theta_i$ is not allowed to depend on $x_j$ for any $j \ne i$. This model has been generalized to the A*VC model [13], in which $\theta_i$ may depend on $x_j$ for all $j \le i$. The model used in [11, 12] and in the present paper is even more general in that $\theta_i$ may depend on the entire sequence $\mathbf{x}$. This additional flexibility may, however, not be beneficial to the jammer in terms of error exponents. Finally, note that for a general problem of the form (A.1)-(A.2), the conditional type of the channel output given the input need not be restricted as in our model (2.5).
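As a concrete illustration of this reformulation, the following sketch instantiates $\mathcal{Y}(x, \theta)$, $W$, and $l$ for binary alphabets with Hamming distortion; the uniform choice of $W$ and the example sequences are assumptions made for illustration only.

```python
# Illustration of the AVC reformulation above for binary alphabets and
# Hamming distortion: Y(x, theta) = {y : d(x, y) = theta}, l(theta) = theta,
# and W(.|x, theta) an (arbitrary, here uniform) p.m.f. on Y(x, theta).
X_ALPHABET = Y_ALPHABET = (0, 1)

def d(x, y):                      # Hamming distortion on single letters
    return int(x != y)

def Y_set(x, theta):              # Y(x, theta): outputs at distortion theta
    return [y for y in Y_ALPHABET if d(x, y) == theta]

def W(y, x, theta):               # one valid W: uniform over Y(x, theta)
    ys = Y_set(x, theta)
    return 1.0 / len(ys) if y in ys else 0.0

def l_N(thetas):                  # per-letter average cost, l(theta) = theta
    return sum(thetas) / len(thetas)

# With this choice, the cost of a state sequence equals the average
# distortion between input and output, so (A.2) with l_max = D2 is the
# maximum-distortion constraint (2.6).
x = (0, 1, 1, 0)
theta = (0, 1, 0, 0)                           # jammer flips the second symbol
y = tuple(xi ^ ti for xi, ti in zip(x, theta))
assert l_N(theta) == sum(d(a, b) for a, b in zip(x, y)) / len(x)
print("y =", y, " average cost =", l_N(theta))
```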


B  Error Exponents for Channels Without Side Information

This appendix summarizes some known results on error exponents.

Single DMC: Let $p_{Y|X}$ and $p_X$ be the channel law and input p.m.f., respectively. Referring to [13, pp. 165--166], we have
$$E_r(R, p_X, p_{Y|X}) = \min_{\tilde{p}_{Y|X}} \big[D(\tilde{p}_{Y|X} \,\|\, p_{Y|X} \,|\, p_X) + |\tilde{I}_{XY}(p_X, \tilde{p}_{Y|X}) - R|^+\big], \qquad (B.1)$$
$$E_{sp}(R, p_X, p_{Y|X}) = \min_{\tilde{p}_{Y|X} :\, \tilde{I}_{XY}(p_X, \tilde{p}_{Y|X}) \le R} D(\tilde{p}_{Y|X} \,\|\, p_{Y|X} \,|\, p_X). \qquad (B.2)$$
We trivially have $E_{sp}(R, p_X, p_{Y|X}) = 0$ if $R \ge \tilde{I}_{XY}(p_X, p_{Y|X})$.

Compound DMC: Here $p_{Y|X}$ belongs to a set $\mathcal{A}$. We have
$$E_r(R, p_X, \mathcal{A}) = \min_{p_{Y|X} \in \mathcal{A}} E_r(R, p_X, p_{Y|X}) = \min_{p_{Y|X} \in \mathcal{A}} \min_{\tilde{p}_{Y|X}} \big[D(\tilde{p}_{Y|X} \,\|\, p_{Y|X} \,|\, p_X) + |\tilde{I}_{XY}(p_X, \tilde{p}_{Y|X}) - R|^+\big] \qquad (B.3)$$
$$E_{sp}(R, p_X, \mathcal{A}) = \min_{p_{Y|X} \in \mathcal{A}} E_{sp}(R, p_X, p_{Y|X}) = \min_{p_{Y|X} \in \mathcal{A}} \min_{\tilde{p}_{Y|X} :\, \tilde{I}_{XY}(p_X, \tilde{p}_{Y|X}) \le R} D(\tilde{p}_{Y|X} \,\|\, p_{Y|X} \,|\, p_X) \qquad (B.4)$$
which is zero if $R \ge \min_{p_{Y|X} \in \mathcal{A}} \tilde{I}_{XY}(p_X, p_{Y|X})$.

Private Watermarking: The set $\mathcal{A}$ is defined by the distortion constraint (2.4). Then [11]
$$E_r^{AVC}(R, D_1, D_2) = \max_{p_{X|S}} \min_{\tilde{p}_{Y|XS} \in \tilde{\mathcal{A}}} \big[\tilde{I}_{SY|X}(p_S, p_{X|S}, \tilde{p}_{Y|XS}) + |\tilde{I}_{XY|S}(p_S, p_{X|S}, \tilde{p}_{Y|XS}) - R|^+\big] \qquad (B.5)$$
$$E_{sp}^{AVC}(R, D_1, D_2) = \max_{p_{X|S}} \min_{\tilde{p}_{Y|XS} \in \tilde{\mathcal{A}} :\, \tilde{I}_{XY|S} \le R} \tilde{I}_{SY|X}(p_S, p_{X|S}, \tilde{p}_{Y|XS}) \qquad (B.6)$$
where $\tilde{\mathcal{A}} \triangleq \{p_{Y|XS} : \sum_{s,x,y} p_S(s)\, p_{X|S}(x|s)\, p_{Y|XS}(y|x, s)\, d(x, y) \le D_2\}$. The maximization over $p_{X|S}$ is also subject to a distortion constraint.

Jamming, with channel state $S$ selected independently of the input $X$ [14, 17]: We have
$$E_r^{jam}(R) = \max_{p_X} \min_{p_S} \min_{\tilde{p}_{YSX} :\, \tilde{p}_X = p_X,\, \tilde{p}_S = p_S} \big[D(\tilde{p}_{YSX} \,\|\, p_{Y|SX}\, p_X\, p_S) + |\tilde{I}_{XY}(p_X, \tilde{p}_{Y|X}) - R|^+\big] \qquad (B.7)$$
$$E_{sp}^{jam}(R) = \max_{p_X} \min_{p_S} \min_{\tilde{p}_{YSX} :\, \tilde{p}_X = p_X,\, \tilde{p}_S = p_S,\, I(p_X, \tilde{p}_{Y|X}) \le R} D(\tilde{p}_{YSX} \,\|\, p_{Y|SX}\, p_X\, p_S). \qquad (B.8)$$
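For a concrete feel for (B.1)-(B.2), the sketch below evaluates both exponents for a single BSC by brute force over the two free parameters of $\tilde{p}_{Y|X}$. The crossover probability, input p.m.f., rate, and grid resolution are assumptions chosen for illustration.

```python
# Numerical sketch of (B.1)-(B.2) for a single DMC: a BSC(p) with a fixed
# input p.m.f. The test channel p~_{Y|X} is parametrized by its two
# crossover probabilities (a, b), and the minimizations are grid searches.
import numpy as np

def h2(x):
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def kl2(a, b):                      # binary divergence D(a || b) in bits
    a = np.clip(a, 1e-12, 1 - 1e-12)
    b = np.clip(b, 1e-12, 1 - 1e-12)
    return a * np.log2(a / b) + (1 - a) * np.log2((1 - a) / (1 - b))

p, pX = 0.1, np.array([0.5, 0.5])       # channel and input (assumed)
grid = np.linspace(0.001, 0.999, 400)
a, b = np.meshgrid(grid, grid)          # a = p~(1|0), b = p~(0|1)

# Conditional divergence D(p~_{Y|X} || p_{Y|X} | p_X)
Dcond = pX[0] * kl2(a, p) + pX[1] * kl2(b, p)
# Mutual information I(X;Y) under (p_X, p~_{Y|X})
pY1 = pX[0] * a + pX[1] * (1 - b)
I = h2(pY1) - pX[0] * h2(a) - pX[1] * h2(b)

R = 0.3
Er = np.min(Dcond + np.maximum(I - R, 0.0))      # (B.1)
Esp = np.min(np.where(I <= R, Dcond, np.inf))    # (B.2)
print(f"Er({R}) ~ {Er:.4f}   Esp({R}) ~ {Esp:.4f}")
```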

C  Proof of Proposition 4.1

The set $\mathcal{A}$, denoted here by $\mathcal{P}_{Y|X}(D_2)$, is the set of DMC's that introduce maximum Hamming distortion $D_2$. Let the attack channel $p^*_{Y|X}$ be the BSC with crossover probability $D_2$. Since $p^*_{Y|X}$ may not be the worst channel, we have
$$C^{pub} = \sup_{\mathcal{U}} \max_{p_{XU|S} \in \mathcal{P}_{XU|S}(D_1)} \min_{p_{Y|X} \in \mathcal{P}_{Y|X}(D_2)} J(p_S\, p_{XU|S}\, p_{Y|X}) \le \sup_{\mathcal{U}} \max_{p_{XU|S} \in \mathcal{P}_{XU|S}(D_1)} J(p_S\, p_{XU|S}\, p^*_{Y|X}) = g^*(D_1, D_2), \qquad (C.1)$$
where the last step is derived in [5, 6]. The function $g^*$ is defined in (4.1).

Next we prove that $C^{pub} \ge g^*(D_1, D_2)$. Consider $D_1 = D_1' \theta$, where $D_1' \in [0, \frac{1}{2}]$ and $\theta \in [0, 1]$. Let $p^*_{U|S}$ be the BSC with crossover probability $D_1'$; furthermore, $X = U$ makes the distortion equal to $D_1'$. (Note that $|\mathcal{U}| = 2$ in this case.) Clearly,
$$C^{pub}(D_1') \ge \min_{p_{Y|X} \in \mathcal{P}_{Y|X}(D_2)} J(p_S\, p^*_{XU|S}\, p_{Y|X}) = \min_{p_{Y|X} \in \mathcal{P}_{Y|X}(D_2)} I(X; Y) - \underbrace{(1 - h(D_1'))}_{I(U;S)} = (1 - h(D_2)) - (1 - h(D_1')) = h(D_1') - h(D_2), \qquad (C.2)$$
where
$$\min_{p_{Y|X} \in \mathcal{P}_{Y|X}(D_2)} I(X; Y) = 1 - h(D_2)$$
is achieved by $p^*_{Y|X}$. Using time-sharing arguments, Barron et al. [5] proved that capacity is a concave function of $D_1$ in the case $\mathcal{A} = \{p^*_{Y|X}\}$. It can be shown that their result also holds in the case $\mathcal{A} = \mathcal{P}_{Y|X}(D_2)$ considered here. Therefore we have
$$C^{pub}(D_1) = C^{pub}(D_1' \theta) \ge \theta\, C^{pub}(D_1') \ge \theta\, \big(h(D_1') - h(D_2)\big), \qquad \forall \theta \in [0, 1].$$
It may be verified that
$$\max_{0 \le \theta \le 1} \theta\, \big(h(D_1') - h(D_2)\big) = g^*(D_1, D_2).$$
Therefore
$$C^{pub} \ge g^*(D_1, D_2). \qquad (C.3)$$
From (C.1) and (C.3), we conclude that $C^{pub} = g^*(D_1, D_2)$; moreover, $|\mathcal{U}| = 2$ suffices. $\Box$
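The time-sharing optimization behind $g^*$ is one-dimensional and easy to evaluate. The sketch below computes $g^*(D_1, D_2) = \max_{\theta} \theta\,(h(D_1/\theta) - h(D_2))$ over the constraint $D_1/\theta \le 1/2$, i.e., the value of $C^{pub}$ established above for the binary Hamming case; the grid and the example $(D_1, D_2)$ pairs are assumptions for illustration.

```python
# Numerical sketch of g*(D1, D2) = max_theta theta * (h(D1/theta) - h(D2)),
# with D1 = D1' * theta and D1' = D1/theta <= 1/2 (Prop. 4.1).
import numpy as np

def h2(x):
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def g_star(D1, D2, n=100000):
    theta = np.linspace(max(2 * D1, 1e-6), 1.0, n)   # enforce D1/theta <= 1/2
    vals = theta * (h2(D1 / theta) - h2(D2))
    return max(vals.max(), 0.0)

for D1, D2 in [(0.4, 0.1), (0.1, 0.05), (0.05, 0.1)]:   # illustrative pairs
    no_ts = h2(D1) - h2(D2)       # value without time sharing (theta = 1)
    print(f"D1={D1}, D2={D2}:  g* = {g_star(D1, D2):.4f}, "
          f"h(D1)-h(D2) = {no_ts:.4f}")
```

When $D_1 \ge \delta_2$ the maximum is attained at $\theta = 1$ and $g^* = h(D_1) - h(D_2)$; for smaller $D_1$, time sharing strictly improves on $h(D_1) - h(D_2)$.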

D  Proof of Proposition 4.2

From (3.8), we have
$$E_{sp}^{AVC,pub}(R) = \min_{\tilde{p}_S} \sup_{\mathcal{U}} \max_{p_{XU|S} \in \mathcal{P}_{XU|S}(D_1)} \min_{\substack{p_{Y|XUS} \in \mathcal{P}_{Y|XUS}(D_2):\\ J(\tilde{p}_S\, p_{XU|S}\, p_{Y|XUS}) \le R}} \big[ D(\tilde{p}_S \| p_S) + I_{Y;US|X}(\tilde{p}_S\, p_{XU|S}\, p_{Y|XUS}) \big].$$

Step 1. First we prove that
$$L_2(D_1, D_2) \triangleq \min_{\tilde{p}_S} \sup_{\mathcal{U}} \max_{p_{XU|S} \in \mathcal{P}_{XU|S}(D_1)} \min_{p_{Y|XUS} \in \mathcal{P}_{Y|XUS}(D_2)} J(\tilde{p}_S\, p_{XU|S}\, p_{Y|XUS}) \qquad (D.1)$$
$$= C^{pub}, \qquad (D.2)$$
with equality if $\tilde{p}_S = p_S$. Referring to (4.1), we first consider the regime in which time sharing is not needed: $D_1 \ge \delta_2 = 1 - 2^{-h(D_2)}$, and therefore $C^{pub} = h(D_1) - h(D_2)$. Letting $U = X$ and $p^*_{X|S}$ be the BSC with crossover probability $D_1$, we obtain a lower bound on $L_2(D_1, D_2)$:
$$L_2(D_1, D_2) \ge \min_{\tilde{p}_S} \min_{p_{Y|XS} \in \mathcal{P}_{Y|XS}(D_2)} J(\tilde{p}_S\, p^*_{X|S}\, p_{Y|XS}) = \min_{\tilde{p}_S} \min_{p_{Y|XS} \in \mathcal{P}_{Y|XS}(D_2)} \big[ \tilde{I}_{X;Y}(\tilde{p}_S\, p^*_{X|S}\, p_{Y|XS}) - \tilde{I}_{X;S}(\tilde{p}_S\, p^*_{X|S}) \big]$$
$$= \min_{\tilde{p}_S} \min_{p_{Y|XS} \in \mathcal{P}_{Y|XS}(D_2)} \big[ \tilde{I}_{X;Y}(\tilde{p}_S\, p^*_{X|S}\, p_{Y|XS}) - h(D_1 \star p_0) + h(D_1) \big] \qquad (D.3)$$
where we use the shorthand $p_0 = \tilde{p}_S(0)$. Next, write $p_{Y|XS}$ as
$$\begin{array}{c|cccc}
p_{Y|XS} & XS = 00 & XS = 10 & XS = 01 & XS = 11 \\ \hline
Y = 0 & 1 - e & f & 1 - g & h \\
Y = 1 & e & 1 - f & g & 1 - h
\end{array}$$
The p.m.f. of $X$ induced by $\tilde{p}_S$ and $p^*_{X|S}$ is given by
$$p_X = (p_{X0}, p_{X1}) = \big( p_0 (1 - D_1) + (1 - p_0) D_1,\; p_0 D_1 + (1 - p_0)(1 - D_1) \big).$$
We derive
$$\min_{p_{Y|XS} \in \mathcal{P}_{Y|XS}(D_2)} \tilde{I}_{X;Y}(\tilde{p}_S\, p^*_{X|S}\, p_{Y|XS}) = \min_{\substack{e,f,g,h:\\ p_0((1-D_1)e + D_1 f) + (1-p_0)((1-D_1)h + D_1 g) \le D_2}} \big[ h(p_{X0}(1 - \alpha) + p_{X1} \beta) - p_{X0}\, h(1 - \alpha) - p_{X1}\, h(\beta) \big] = h(D_1 \star p_0) - h(D_2) \qquad (D.4)$$
where
$$\alpha = \frac{p_0 (1 - D_1)\, e + (1 - p_0) D_1\, g}{p_{X0}} \qquad \text{and} \qquad \beta = \frac{p_0 D_1\, f + (1 - p_0)(1 - D_1)\, h}{p_{X1}}.$$
The minimum is achieved by
$$\alpha^* = \frac{D_2}{p_{X0}} \cdot \frac{p_{X1} - D_2}{1 - 2D_2}, \qquad \beta^* = \frac{D_2}{p_{X1}} \cdot \frac{p_{X0} - D_2}{1 - 2D_2}.$$
Combining (D.3) and (D.4), we obtain
$$L_2(D_1, D_2) \ge h(D_1) - h(D_2) = C^{pub}. \qquad (D.5)$$
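The closed-form minimizer in (D.4) is easy to sanity-check numerically. The sketch below, with illustrative values of $p_0$, $D_1$, $D_2$ (assumptions, not values from the paper), verifies that $(\alpha^*, \beta^*)$ meets the distortion budget with equality and drives $I(X;Y)$ down to $h(D_1 \star p_0) - h(D_2)$.

```python
# Numerical check of (D.4): at (alpha*, beta*) the attack spends exactly D2
# of Hamming distortion and I(X;Y) = h(D1 * p0) - h(D2).
import numpy as np

def h2(x):
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return float(-x * np.log2(x) - (1 - x) * np.log2(1 - x))

p0, D1, D2 = 0.6, 0.3, 0.1
pX0 = p0 * (1 - D1) + (1 - p0) * D1            # = D1 * p0 (binary convolution)
pX1 = 1 - pX0

alpha = (D2 / pX0) * (pX1 - D2) / (1 - 2 * D2) # alpha*, beta* from the text
beta = (D2 / pX1) * (pX0 - D2) / (1 - 2 * D2)

distortion = pX0 * alpha + pX1 * beta          # average Hamming distortion
I = h2(pX0 * (1 - alpha) + pX1 * beta) - pX0 * h2(alpha) - pX1 * h2(beta)

print(f"distortion       = {distortion:.6f} (budget D2 = {D2})")
print(f"I(X;Y)           = {I:.6f}")
print(f"h(D1*p0) - h(D2) = {h2(pX0) - h2(D2):.6f}")
```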

In the case $D_1 < \delta_2$, capacity is achieved using time sharing: $C^{pub} > h(D_1) - h(D_2)$. Similarly to [5], it can be shown that $L_2(D_1, D_2)$ is a nondecreasing, concave function of $D_1$. Hence,
$$L_2(D_1, D_2) = L_2(D_1' \theta, D_2) \ge \max_{0 \le \theta \le 1} \theta\, L_2(D_1', D_2) \ge \max_{0 \le \theta \le 1} \theta\, \big(h(D_1') - h(D_2)\big) = C^{pub}. \qquad (D.6)$$
For all values of $D_1$, letting $\tilde{p}_S = p_S$ in (D.2) and further restricting the minimization over $p_{Y|XUS}$, we have
$$L_2(D_1, D_2) \le \sup_{\mathcal{U}} \max_{p_{XU|S} \in \mathcal{P}_{XU|S}(D_1)} \min_{p_{Y|X} \in \mathcal{P}_{Y|X}(D_2)} J(\tilde{p}_S\, p_{XU|S}\, p_{Y|X}) = C^{pub}. \qquad (D.7)$$
Combining (D.5), (D.6) and (D.7), we obtain (D.2).

Step 2. Due to (D.2), for any $\tilde{p}_S$ there exists $p_{XU|S}$ such that
$$\min_{p_{Y|XUS} \in \mathcal{P}_{Y|XUS}(D_2)} J(\tilde{p}_S\, p_{XU|S}\, p_{Y|XUS}) \ge C^{pub}.$$
Therefore, if $R < C^{pub}$, the innermost minimization over $p_{Y|XUS}$ in (D.1) has an empty feasible set for this choice of $p_{XU|S}$. It follows that $E_{sp}^{AVC,pub}(R) = \infty$ when $R < C^{pub}$. $\Box$

E  Proof of Proposition 4.3

From (3.3), we have
$$E_r^{AVC,pub}(R) = \min_{\tilde{p}_S} \sup_{\mathcal{U}} \max_{p_{XU|S} \in \mathcal{P}_{XU|S}(D_1)} \min_{p_{Y|XUS} \in \mathcal{P}_{Y|XUS}(D_2)} \big[ D(\tilde{p}_S \| p_S) + I_{Y;US|X}(\tilde{p}_S\, p_{XU|S}\, p_{Y|XUS}) + |J(\tilde{p}_S\, p_{XU|S}\, p_{Y|XUS}) - R|^+ \big]. \qquad (E.1)$$
If we fix $\tilde{p}_S = p_S = (\frac{1}{2}, \frac{1}{2})$ and restrict $p_{Y|XUS}$ to be of the form $p_{Y|X}$, we obtain an upper bound on $E_r^{AVC,pub}(R)$:
$$E_r^{AVC,pub}(R) \le \sup_{\mathcal{U}} \max_{p_{XU|S} \in \mathcal{P}_{XU|S}(D_1)} \min_{p_{Y|X} \in \mathcal{P}_{Y|X}(D_2)} |J(\tilde{p}_S\, p_{XU|S}\, p_{Y|X}) - R|^+ = |C^{pub} - R|^+. \qquad (E.2)$$
On the other hand, the first two bracketed terms in (E.1) are nonnegative. This yields a lower bound on $E_r^{AVC,pub}(R)$:
$$E_r^{AVC,pub}(R) \ge \min_{\tilde{p}_S} \sup_{\mathcal{U}} \max_{p_{XU|S} \in \mathcal{P}_{XU|S}(D_1)} \min_{p_{Y|XUS} \in \mathcal{P}_{Y|XUS}(D_2)} |J(\tilde{p}_S\, p_{XU|S}\, p_{Y|XUS}) - R|^+ = |C^{pub} - R|^+, \qquad (E.3)$$
where the last equality is due to (D.2). Combining (E.2) and (E.3), we obtain $E_r^{AVC,pub}(R) = |C^{pub} - R|^+$. $\Box$

F  Proof of Proposition 4.5

From (3.8), we have
$$E_{sp}^{AVC,priv}(R) = \min_{\tilde{p}_S} \max_{p_{X|S} \in \mathcal{P}_{X|S}(D_1)} \min_{\substack{p_{Y|XS} \in \mathcal{P}_{Y|XS}(D_2):\\ \tilde{I}_{X;Y|S}(\tilde{p}_S\, p_{X|S}\, p_{Y|XS}) \le R}} \big[ D(\tilde{p}_S \| p_S) + I_{Y;S|X}(\tilde{p}_S\, p_{X|S}\, p_{Y|XS}) \big]. \qquad (F.1)$$

Step 1. First we prove that
$$L_1(D_1, D_2) \triangleq \min_{\tilde{p}_S} \max_{p_{X|S} \in \mathcal{P}_{X|S}(D_1)} \min_{p_{Y|XS} \in \mathcal{P}_{Y|XS}(D_2)} \tilde{I}_{X;Y|S}(\tilde{p}_S\, p_{X|S}\, p_{Y|XS}) \ge C^{pub}. \qquad (F.2)$$
We first consider the case of no time sharing: $D_1 \ge \delta_2 = 1 - 2^{-h(D_2)}$, in which case $C^{pub} = h(D_1) - h(D_2)$. Let $p^*_{X|S}$ be the BSC with crossover probability $D_1$. Denote $\tilde{p}_S(0)$ by $p_0$, write $p_{Y|XS}$ as
$$\begin{array}{c|cccc}
p_{Y|XS} & XS = 00 & XS = 10 & XS = 01 & XS = 11 \\ \hline
Y = 0 & 1 - e & f & 1 - g & h \\
Y = 1 & e & 1 - f & g & 1 - h
\end{array}$$
and define $D(a \| b) = a \log \frac{a}{b} + (1 - a) \log \frac{1-a}{1-b}$. We have
$$L_1(D_1, D_2) \ge \min_{\tilde{p}_S} \min_{p_{Y|XS} \in \mathcal{P}_{Y|XS}(D_2)} \tilde{I}_{X;Y|S}(\tilde{p}_S, p^*_{X|S}, p_{Y|XS})$$
$$= \min_{p_0} \min_{\substack{e,f,g,h:\\ p_0((1-D_1)e + D_1 f) + (1-p_0)((1-D_1)h + D_1 g) \le D_2}} \Big\{ p_0 \big[ (1 - D_1)\, D(e \| (1-D_1)e + D_1(1-f)) + D_1\, D(f \| (1-D_1)(1-e) + D_1 f) \big]$$
$$\qquad\qquad + (1 - p_0) \big[ (1 - D_1)\, D(h \| (1-D_1)h + D_1(1-g)) + D_1\, D(g \| D_1 g + (1-D_1)(1-h)) \big] \Big\}. \qquad (F.3)$$
For fixed $\tilde{p}_S$ and $p_{X|S}$, $\tilde{I}_{X;Y|S}(\tilde{p}_S, p_{X|S}, p_{Y|XS})$ is a convex function of $p_{Y|XS}$. Using the Lagrange multiplier method, the inner minimization in (F.3) can be solved explicitly. The optimal $p^*_{Y|XS}$ is independent of $p_0$ and is given by
$$e^* = h^* = \frac{D_2 (D_1 - D_2)}{(1 - D_1)(1 - 2D_2)} \qquad \text{and} \qquad g^* = f^* = \frac{D_2 (1 - D_1 - D_2)}{D_1 (1 - 2D_2)}.$$
Substituting $e^*$, $h^*$, $g^*$ and $f^*$ into the cost function in (F.3) and performing tedious simplifications, we obtain $h(D_1) - h(D_2)$ for the right side of (F.3). Thus, we have
$$L_1(D_1, D_2) \ge h(D_1) - h(D_2). \qquad (F.4)$$
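Since the simplifications behind (F.4) are tedious, a quick numerical check is reassuring. The sketch below plugs $e^*, f^*, g^*, h^*$ into the objective of (F.3) for illustrative $(p_0, D_1, D_2)$ (assumed values) and compares against $h(D_1) - h(D_2)$; the result is indeed independent of $p_0$.

```python
# Numerical check of (F.3)-(F.4): at the optimal attack (e*, f*, g*, h*),
# the conditional mutual information I(X;Y|S) equals h(D1) - h(D2).
import numpy as np

def h2(x):
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return float(-x * np.log2(x) - (1 - x) * np.log2(1 - x))

def kl2(a, b):                  # binary divergence D(a||b), in bits
    a, b = np.clip(a, 1e-12, 1 - 1e-12), np.clip(b, 1e-12, 1 - 1e-12)
    return float(a * np.log2(a / b) + (1 - a) * np.log2((1 - a) / (1 - b)))

D1, D2 = 0.3, 0.1
e = h = D2 * (D1 - D2) / ((1 - D1) * (1 - 2 * D2))   # e*, h*
g = f = D2 * (1 - D1 - D2) / (D1 * (1 - 2 * D2))     # g*, f*

for p0 in [0.2, 0.5, 0.8]:      # the value should not depend on p0
    obj = (p0 * ((1 - D1) * kl2(e, (1 - D1) * e + D1 * (1 - f))
                 + D1 * kl2(f, (1 - D1) * (1 - e) + D1 * f))
           + (1 - p0) * ((1 - D1) * kl2(h, (1 - D1) * h + D1 * (1 - g))
                         + D1 * kl2(g, D1 * g + (1 - D1) * (1 - h))))
    print(f"p0 = {p0}:  I(X;Y|S) = {obj:.6f}")
print(f"h(D1) - h(D2) = {h2(D1) - h2(D2):.6f}")
```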

Similarly to [5], it can be shown that $L_1(D_1, D_2)$ is a nondecreasing, concave function of $D_1$. Thus,
$$L_1(D_1, D_2) = L_1(D_1' \theta, D_2) \ge \max_{0 \le \theta \le 1} \theta\, L_1(D_1', D_2) \ge \max_{0 \le \theta \le 1} \theta\, \big(h(D_1') - h(D_2)\big) = C^{pub}, \qquad (F.5)$$
which proves (F.2).

Step 2. Due to (F.2), for any $\tilde{p}_S$ there exists $p_{X|S}$ such that
$$\min_{p_{Y|XS} \in \mathcal{P}_{Y|XS}(D_2)} \tilde{I}_{X;Y|S}(\tilde{p}_S, p_{X|S}, p_{Y|XS}) \ge C^{pub}.$$
Therefore, if $R < C^{pub}$, the innermost minimization over $p_{Y|XS}$ in (F.1) has an empty feasible set for this choice of $p_{X|S}$. It follows that $E_{sp}^{AVC,priv}(R) = \infty$ when $R < C^{pub}$. $\Box$

G  Proof of Proposition 4.6

We have
$$C^{deg} = \max_{p_X \in \mathcal{P}_X(D_1)} \min_{p_{Y|X} \in \mathcal{P}_{Y|X}(D_2)} I(X; Y). \qquad (G.1)$$
Let $a = p_X(1)$, $e = p_{Y|X}(1|0)$, and $f = p_{Y|X}(0|1)$, which satisfy the distortion constraints
$$a \le D_1, \qquad (1 - a)e + af \le D_2.$$
Substituting these probabilities into (G.1), we obtain
$$C^{deg} = \max_{a \le D_1} \min_{(1-a)e + af \le D_2} \big[ h((1 - a)(1 - e) + af) - (1 - a)\,h(e) - a\,h(f) \big].$$
Solving the above max-min problem in the case $D_1 \ge \delta_2 = 1 - 2^{-h(D_2)}$, we obtain the optimal $p^*_X$ and $p^*_{Y|X}$ from
$$a = D_1, \qquad e = \frac{D_2 (D_1 - D_2)}{(1 - D_1)(1 - 2D_2)}, \qquad f = \frac{D_2 (1 - D_1 - D_2)}{D_1 (1 - 2D_2)}.$$
After some algebraic simplifications, we obtain $C^{deg} = h(D_1) - h(D_2)$. Applying the same time-sharing argument as in the proof of Prop. 4.1, we obtain $C^{deg} = g^*(D_1, D_2)$, which is the same as the capacity $C^{pub}$ of the public watermarking game. $\Box$
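The max-min in (G.1) is low-dimensional and can be checked by brute force. The sketch below grid-searches $a$, $e$, $f$ (coarse grids and illustrative distortion levels, both assumptions) and compares the result with $h(D_1) - h(D_2)$.

```python
# Brute-force check of the max-min (G.1):
# C_deg = max_{a <= D1} min_{(1-a)e + a f <= D2} I(X;Y).
import numpy as np

def h2(x):
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

D1, D2 = 0.4, 0.1                     # D1 >= delta_2: no time sharing needed
grid = np.linspace(0.0, 1.0, 201)
e, f = np.meshgrid(grid, grid)

best = -np.inf
for a in np.linspace(0.0, D1, 81):    # input p.m.f. p_X = (1 - a, a)
    feasible = (1 - a) * e + a * f <= D2
    I = h2((1 - a) * (1 - e) + a * f) - (1 - a) * h2(e) - a * h2(f)
    worst = np.min(np.where(feasible, I, np.inf))   # adversary's best reply
    best = max(best, worst)

print(f"max-min I(X;Y) ~ {best:.4f}")
print(f"h(D1) - h(D2)  = {h2(D1) - h2(D2):.4f}")
```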

References

[1] S. I. Gel'fand and M. S. Pinsker, "Coding for Channel with Random Parameters," Problems of Control and Information Theory, Vol. 9, No. 1, pp. 19–31, 1980.

[2] C. Heegard and A. A. El Gamal, "On the Capacity of Computer Memory with Defects," IEEE Trans. Info. Thy, Vol. 29, No. 5, pp. 731–739, Sep. 1983.

[3] M. Costa, "Writing on Dirty Paper," IEEE Trans. Info. Thy, Vol. 29, No. 3, pp. 439–441, May 1983.

[4] T. M. Cover and M. Chiang, "Duality Between Channel Capacity and Rate Distortion with Side Information," IEEE Trans. Info. Thy, Vol. 48, No. 6, pp. 1629–1638, June 2002.

[5] R. J. Barron, B. Chen and G. W. Wornell, "The Duality Between Information Embedding and Source Coding with Side Information and Some Applications," IEEE Trans. Info. Thy, Vol. 49, No. 5, pp. 1159–1180, May 2003.

[6] S. S. Pradhan, J. Chou and K. Ramchandran, "Duality Between Source Coding and Channel Coding and its Extension to the Side Information Case," IEEE Trans. Info. Thy, Vol. 49, No. 5, pp. 1181–1203, May 2003.

[7] F. M. J. Willems, "An Information Theoretical Approach to Information Embedding," Proc. 21st Symp. Info. Thy in the Benelux, pp. 255–260, Wassenaar, The Netherlands, May 2000.

[8] B. Chen and G. W. Wornell, "Quantization Index Modulation: A Class of Provably Good Methods for Digital Watermarking and Information Embedding," IEEE Trans. Info. Thy, Vol. 47, No. 4, pp. 1423–1443, May 2001.

[9] P. Moulin and J. A. O'Sullivan, "Information-Theoretic Analysis of Information Hiding," IEEE Trans. Info. Thy, Vol. 49, No. 3, pp. 563–593, March 2003.

[10] A. S. Cohen and A. Lapidoth, "The Gaussian Watermarking Game," IEEE Trans. Info. Thy, Vol. 48, No. 6, pp. 1639–1667, June 2002.

[11] A. Somekh-Baruch and N. Merhav, "On the Error Exponent and Capacity Games of Private Watermarking Systems," IEEE Trans. Info. Thy, Vol. 49, No. 3, pp. 537–562, March 2003.

[12] A. Somekh-Baruch and N. Merhav, "On the Capacity Game of Public Watermarking Systems," IEEE Trans. Info. Thy, Vol. 50, No. 3, pp. 511–524, March 2004.

[13] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, NY, 1981.

[14] I. Csiszár and P. Narayan, "Arbitrarily Varying Channels with Constrained Inputs and States," IEEE Trans. Info. Thy, Vol. 34, No. 1, pp. 27–34, Jan. 1988.

[15] A. Lapidoth and P. Narayan, "Reliable Communication Under Channel Uncertainty," IEEE Trans. Info. Thy, Vol. 44, No. 6, pp. 2148–2177, Oct. 1998.

[16] T. Ericson, "Exponential Error Bounds for Random Codes in the Arbitrarily Varying Channel," IEEE Trans. Info. Thy, Vol. 31, No. 1, pp. 42–48, Jan. 1985.

[17] B. L. Hughes and T. G. Thomas, "On Error Exponents for Arbitrarily Varying Channels," IEEE Trans. Info. Thy, Vol. 42, No. 1, pp. 87–98, Jan. 1996.

[18] R. Ahlswede, "Arbitrarily Varying Channels with States Sequence Known to the Sender," IEEE Trans. Info. Thy, Vol. 32, No. 5, pp. 621–629, Sep. 1986.

[19] E. A. Haroutunian and M. E. Haroutunian, "E-Capacity Upper Bound for a Channel with Random Parameter," Problems of Control and Information Theory, Vol. 17, No. 2, pp. 99–105, 1988.

[20] M. E. Haroutunian, "Bounds on E-Capacity of a Channel with a Random Parameter," Probl. Info. Transmission, Vol. 27, No. 1, pp. 14–23, 1991.

[21] M. E. Haroutunian, "Letter to the Editor," Probl. Info. Transmission, Vol. 36, No. 4, p. 140, 2000.

[22] M. E. Haroutunian, "New Bounds on E-Capacity of Arbitrarily Varying Channel and Channels with Random Parameters," Trans. IIAP NAS RA and YSU (Inst. for Informatics and Automation Problems of the Nat. Acad. Sci. Rep. Armenia and Yerevan State U.), Mathematical Problems of Computer Science, Vol. 22, pp. 44–59, 2001. Available from http://ipia.sci.am/~itas/Mariam/mariam.htm.

[23] M. E. Haroutunian and S. A. Tonoyan, "Random Coding Bound of Information Hiding E-Capacity," Proc. IEEE Int. Symp. Info. Theory, p. 536, Chicago, IL, June-July 2004.

[24] A. Somekh-Baruch and N. Merhav, "On the Random Coding Error Exponents of the Single-User and the Multiple-Access Gel'fand-Pinsker Channels," Proc. IEEE Int. Symp. Info. Theory, p. 448, Chicago, IL, June-July 2004.

[25] P. Moulin and Y. Wang, "Error Exponents for Channel Coding With Side Information," presented at the Recent Results session, IEEE Int. Symp. Info. Theory, Chicago, IL, June 2004.

[26] I. Csiszár, "The Method of Types," IEEE Trans. Info. Thy, Vol. 44, No. 6, pp. 2505–2523, Oct. 1998.

[27] P. Moulin and Y. Wang, "New Results on Steganographic Capacity," Proc. CISS'04, Princeton, NJ, March 2004.

[28] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer, New York, 1996.

[29] G. Rudolph, Convergence Properties of Evolutionary Algorithms, Verlag Dr. Kovac, Hamburg, Germany, 1997.

[30] A. N. Kolmogorov and V. M. Tihomirov, "ǫ-Entropy and ǫ-Capacity of Sets in Functional Spaces," Amer. Math. Soc. Translations, Series 2, Vol. 17, pp. 277–364, 1961. Original Russian paper in Uspehi Mat. Nauk (N.S.), Vol. 14, No. 2, pp. 3–86, 1959.
