DIMACS Series in Discrete Mathematics and Theoretical Computer Science
Coding Theorems for Reversible Embedding

Frans M.J. Willems and Ton Kalker

Abstract. We consider embedding of messages (data-hiding) into i.i.d. host sequences. As in Fridrich et al. [FGD02] we focus on the case where reconstruction of the host sequence from the composite sequence is required. We study the balance between embedding rate and embedding distortion. First we determine the distortion-rate region corresponding to this setup. Then we generalize this result in two directions. (A) The reversible embedding setup is not robust. Therefore we also consider reconstruction based on the output sequence of a discrete memoryless channel whose input is the composite sequence. We also determine the distortion-rate region for this setup. (B) Then we consider the case where only partial reconstruction of the host sequence is required. We determine the possible trade-offs here between embedding rate, distortion between source sequence and composite sequence (embedding distortion), and distortion between source sequence and restoration sequence (restoration distortion), i.e. the distortion-rate region. All achievability proofs in this paper are based on the Gelfand-Pinsker [GP80] achievability proof.
Key words and phrases: data embedding, data hiding, channels with side information, information theory.

1. Introduction

In 1999 it was observed that data-hiding (embedding) is closely related to the information-theoretical concept of "channels with side-information". E.g. Chen [C00], Chen and Wornell [CW00], and Moulin and O'Sullivan [MoS03] realized that in the Gaussian case there is a close connection between data embedding and Costa's result "writing on dirty paper" [C83]. The achievability proof in the Costa paper is a consequence of the proof of Gelfand and Pinsker [GP80]. The Gelfand-Pinsker result can be regarded as an information-theoretical follow-up of the "memories with defects" paper by Kuznetsov and Tsybakov [KT77]. Heegard and El Gamal [HeG83] studied codes based on the Gelfand-Pinsker result for computer memories with defects. Coding theorems for data embedding situations appeared in Chen [C00] (specialized to the Gaussian case), Moulin and O'Sullivan [MoS03], Barron [B00], Barron, Chen, and Wornell [BCW03], and Willems [W00]. In all these papers the trade-off between embedding rate and embedding distortion is investigated (standard embedding). In the present paper we will focus on data
embedding schemes that are reversible. These schemes satisfy the additional requirement that the decoder should be able to reconstruct the host sequence. A good overview of the history and the state of the art of reversible data embedding can be found in Fridrich, Goljan, and Du [FGD02]. Reversible data embedding schemes are important in cases where no degradation of the original host signal is allowed. This is for example true for medical imagery, military imagery, and multimedia archives of valuable original works. In [KW02], [WK02], and [KW03] we have determined the fundamental limits of several (partially) reversible embedding setups. The objective of the current paper is to present these results in a unified way. It is demonstrated here that all our achievability proofs are extensions of the Gelfand-Pinsker achievability proof.

The paper is organized as follows. In section 2 we consider the Gelfand-Pinsker channel with side-information, since its achievability proof forms the basis of all our achievability proofs. We start in section 3 with standard noise-free embedding. This "classical" embedding method is not reversible; we study this case for comparison only. The first section dealing with reversible embedding is section 4, in which the rate-distortion region for the basic setup is determined. In this section we assume that the composite sequence is observed by the decoder (noise-free). In section 5 we study the robust version of this reversible setup. We assume there that the composite sequence is degraded by transmitting it via a discrete memoryless channel. Looking at the output of this channel, the decoder should be able to reconstruct the host sequence and the embedded message. In section 6 we generalize the basic reversible embedding setup in another direction: there we only require partial reconstruction of the host sequence. It is assumed that the composite sequence is observed noise-free by the decoder. Section 7 concludes the paper.

2. The Gelfand-Pinsker Coding Theorem
[Figure 1 block diagram: message W and state sequence S^N ~ P_s(s) enter the encoder X^N = e(W, S^N); X^N passes through the channel P_c(y|x, s); the decoder observes Y^N and outputs Ŵ = d(Y^N).]
Figure 1. The Gelfand-Pinsker side-information channel.

2.1. System description, statement of result. The characteristic of the Gelfand-Pinsker [GP80] setup (see figure 1) is that the transmitter has knowledge of the sequence of channel states that will occur during a block of N transmissions prior to these transmissions. For such a block of transmissions a message source produces one out of M possible message indices. Message index w ∈ {1, 2, ..., M} occurs with probability Pr{W = w} = 1/M. This message index is conveyed from the transmitter to the receiver in these N transmissions.

The channel {X × S, P_c(y|x, s), Y} is memoryless. It has input alphabet X, output alphabet Y, and state alphabet S; all these alphabets are finite. Given an input symbol x ∈ X and a channel state s ∈ S, the output symbol y ∈ Y occurs
with probability P_c(y|x, s). The state sequence s^N = (s_1, s_2, ..., s_N) is assumed to be generated at random. Sequence s^N ∈ S^N occurs with probability Pr{S^N = s^N} = ∏_{n=1}^{N} P_s(s_n); therefore the state sequence is i.i.d. and its distribution is {P_s(s), s ∈ S}.

The transmitter (encoder) produces the sequence of channel inputs x^N = (x_1, x_2, ..., x_N) based on the message index w that is to be transmitted and the state sequence s^N; we write X^N = e(W, S^N). The receiver (decoder) observes the sequence of channel outputs y^N = (y_1, y_2, ..., y_N) and forms an estimate ŵ of the transmitted message index, thus Ŵ = d(Y^N).

The performance of the system is determined by its error probability P_E = Pr{Ŵ ≠ W} and its transmission rate R = (1/N) log₂(M). We assume throughout this manuscript that the base of the logarithm is 2; therefore rates, entropies, and mutual informations are expressed in bits. The Gelfand-Pinsker capacity C_GP is the largest ρ such that for all ε > 0 there exist for all large enough N encoders and decoders with R ≥ ρ − ε and P_E ≤ ε.

Theorem 2.1. (Gelfand and Pinsker [GP80]) The capacity
$$C_{GP} = \max_{P_t(u,x|s)} \left[ I(U;Y) - I(U;S) \right]. \tag{2.1}$$
The maximum is over all test-channels {S, P_t(u, x|s), U × X} with input alphabet S, output alphabets U and X, and transition probability matrix P_t(u, x|s). The joint distribution of S, U, X, and Y is given by P(s, u, x, y) = P_s(s) P_t(u, x|s) P_c(y|x, s). The cardinality |U| of the auxiliary alphabet need not be larger than |S| + |X|.

2.2. Achievability proof. We will only give a brief outline of the achievability proof here. Our proof is along the lines of the Gelfand-Pinsker proof [GP80]. Details can be found in appendix A.

(a) First fix a test-channel P_t(u, x|s). This determines the joint distribution P(s, u, x, y) = P_s(s) P_t(u, x|s) P_c(y|x, s). Assume that the corresponding I(U;Y) − I(U;S) > 0. If a test-channel with I(U;Y) − I(U;S) > 0 does not exist, C_GP = 0. Fix 0 < η < 1. Consider sets T^N_η(·) of (strongly) typical sequences, see (A.1) and (A.3), and a reverse set R^N_γ(·) of typical sequences as defined in (A.9). For each message index w ∈ {1, ..., 2^{NR}}, generate 2^{NR_u} sequences u^N at random according to P(u) = Σ_{s,x} P_s(s) P_t(u, x|s) and give these sequences the label w.

(b) The encoder observes the state sequence s^N. If s^N ∉ T^N_η(S), an error is declared by the encoder and we say that an error event E_1 (of the first kind) occurred. It can be shown that for all N large enough Pr{E_1} ≤ η.

(c) Assume that the complement E_1^c of E_1 occurred, thus s^N ∈ T^N_η(S). When the message index w is to be transmitted, the encoder chooses a sequence u^N with label w such that u^N ∈ T^N_η(U|s^N). If such a sequence was not generated, the encoder declares an error and we say that an error event E_2 (of the second kind) occurred. It can be shown that Pr{E_2|E_1^c} ≤ η for N large enough if
$$R_u = (1+\eta)^2 H(U) - (1-\eta)^2 H(U|S) + 2\eta. \tag{2.2}$$
(d) The input sequence x^N results from applying the "channel" P(x|u, s) = P_t(u, x|s) / Σ_x P_t(u, x|s) to u^N and s^N. The resulting x^N is transmitted over the side-information channel, the channel with transition probability matrix P_c(y|x, s). The output of this channel is the sequence y^N. Assume that the complementary
events E_1^c and E_2^c occurred. Then by (A.5) the pair (s^N, u^N) ∈ T^N_{(1+η)²−1}(S, U). We say that an error event E_3 (of the third kind) occurred when the channel output sequence y^N ∉ T^N_{(1+η)²−1}(Y|s^N, u^N). It can be shown that Pr{E_3|E_1^c, E_2^c} ≤ η for all large enough N.

(e) The decoder, upon receiving y^N, finds the unique sequence u^N such that u^N ∈ R^N_ζ(U|y^N), where ζ ≜ (1+η)⁴/(1−η)² − 1. If such a unique sequence u^N is not found, the decoder declares an error and we say that an error event E_4 (of the fourth kind) occurred. If we assume that the complementary events E_1^c, E_2^c, and E_3^c occurred, the transmitted sequence u^N ∈ R^N_ζ(U|y^N). It can be shown that
$$R + R_u = (1-\zeta)H(U) - (1+\zeta)^2 H(U|Y) - \eta \tag{2.3}$$
implies that Pr{E_4|E_1^c, E_2^c, E_3^c} ≤ η for all N large enough. The estimated message index ŵ is the label of the decoded sequence u^N. If also E_4 did not occur, the decoded message index ŵ is equal to w.

(f) It can be shown that for all N large enough the total error probability
$$\Pr\{E_1 \cup E_2 \cup E_3 \cup E_4\} \leq 4\eta, \tag{2.4}$$
while, by (2.2) and (2.3), the rate satisfies
$$\begin{aligned} R &= (1-\zeta)H(U) - (1+\zeta)^2 H(U|Y) - \eta - (1+\eta)^2 H(U) + (1-\eta)^2 H(U|S) - 2\eta \\ &= I(U;Y) - I(U;S) - \delta(\eta), \end{aligned} \tag{2.5}$$
where lim_{η↓0} δ(η) = 0. So far we have shown that, averaged over the ensemble of generated auxiliary codewords, the probability Pr{E_1 ∪ E_2 ∪ E_3 ∪ E_4} ≤ 4η. This however implies that there are encoders and decoders that actually achieve Pr{E_1 ∪ E_2 ∪ E_3 ∪ E_4} ≤ 4η and thus also P_E ≤ 4η. If we now let η ↓ 0 then also δ(η) ↓ 0 and we may conclude that
$$\rho = I(U;Y) - I(U;S) \tag{2.6}$$
is achievable. Maximizing over all test channels P_t(u, x|s) yields that C_GP is achievable.
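To make the single-letter expression in Theorem 2.1 concrete, the following numerical sketch (our own illustration, not part of the original paper; the array layout and the binary example are assumptions) evaluates I(U;Y) − I(U;S) for one fixed test channel:

```python
import numpy as np

def mutual_information(p_ab):
    """I(A;B) in bits for a joint distribution given as a 2-D array."""
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of A (column vector)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of B (row vector)
    m = p_ab > 0
    return float((p_ab[m] * np.log2(p_ab[m] / (p_a @ p_b)[m])).sum())

def gp_functional(Ps, Pt, Pc):
    """I(U;Y) - I(U;S) for state distribution Ps[s], test channel
    Pt[s,u,x] = P_t(u,x|s), and channel Pc[x,s,y] = P_c(y|x,s)."""
    P = np.einsum('s,sux,xsy->suxy', Ps, Pt, Pc)     # P(s,u,x,y)
    I_uy = mutual_information(P.sum(axis=(0, 2)))    # joint of (U,Y)
    I_us = mutual_information(P.sum(axis=(2, 3)).T)  # joint of (U,S)
    return I_uy - I_us

# Sanity check: noise-free channel Y = X together with the choice U = X.
# The functional then collapses to H(X|S), the rate of Theorem 3.1 below.
Ps = np.array([0.5, 0.5])
q = 0.2                                  # test channel flips s with prob. q
Pt = np.zeros((2, 2, 2))                 # supported on u == x only
for s in range(2):
    for x in range(2):
        Pt[s, x, x] = q if x != s else 1 - q
Pc = np.zeros((2, 2, 2))                 # noise-free channel: y = x
for x in range(2):
    Pc[x, :, x] = 1.0
print(gp_functional(Ps, Pt, Pc))         # ≈ h(0.2) ≈ 0.7219 bit
```

With U = X and Y ≡ X the two mutual informations reduce to H(X) and I(X;S), so the printed value equals H(X|S); this is exactly the substitution used in section 3.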
2.3. Some observations. From the previous subsection we know that for 0 < η < 1 and N large enough, there are codes with R = I(U;Y) − I(U;S) − δ(η) that achieve Pr{E_1 ∪ E_2 ∪ E_3 ∪ E_4} ≤ 4η. Note also that lim_{η↓0} δ(η) = 0. Consider such a code.

(a) With probability not smaller than 1 − 4η the state sequence s^N ∈ T^N_η(S) and the auxiliary sequence u^N is conditionally typical with the state sequence s^N, i.e. u^N ∈ T^N_η(U|s^N). Therefore the joint composition of (s^N, u^N) is close to P(s, u) = Σ_x P_s(s) P_t(u, x|s), see (A.5). Consider a matrix {D_su(s, u), s ∈ S, u ∈ U} with
non-negative components, and let Σ_{s,u} P(s, u) D_su(s, u) < ∞. Then we can write
$$\frac{1}{N}\sum_{n=1}^{N} D_{su}(s_n, u_n) = \frac{1}{N}\sum_{s,u} \#(s,u|s^N,u^N)\, D_{su}(s,u) \leq (1+\eta)^2 \sum_{s,u} P(s,u) D_{su}(s,u) \leq \sum_{s,u} P(s,u) D_{su}(s,u) + 3\eta d_{\max}, \tag{2.7}$$
where d_max ≜ max_{s,u: P(s,u)>0} D_su(s, u) < ∞. Now suppose that the encoder, when E_1 or E_2 occurred, chooses the auxiliary sequence u^N that minimizes Σ_{n=1}^{N} D_su(s_n, u_n). Then we obtain from (2.7) the following upper bound:
$$\begin{aligned} D_{su} &= \frac{1}{M}\sum_{w}\sum_{s^N} \Pr\{S^N = s^N\}\, \frac{1}{N}\sum_{n=1}^{N} D_{su}(s_n, u_n(w, s^N)) \\ &\leq \Pr\{E_1^c \cap E_2^c\}\left(\sum_{s,u} P(s,u) D_{su}(s,u) + 3\eta d_{\max}\right) + \Pr\{E_1 \cup E_2\}\, d_{\max} \\ &\leq \sum_{s,u} P(s,u) D_{su}(s,u) + 7\eta d_{\max}. \end{aligned} \tag{2.8}$$
Here u_n(w, s^N) is the n-th component of u^N.

(b) Moreover, observe that as an "intermediate" result the decoder recovers the auxiliary sequence u^N if the event E_1 ∪ E_2 ∪ E_3 ∪ E_4 does not occur, i.e. with probability not smaller than 1 − 4η for N large enough.

Both results (a) and (b) will be used in the achievability proofs for the several embedding situations that we investigate in the next sections.

3. Standard Noise-Free Embedding
[Figure 2 block diagram: message W and host sequence S^N ~ P_s(s) enter the encoder X^N = e(W, S^N); the decoder observes X^N and outputs Ŵ = d(X^N).]
Figure 2. The standard noise-free embedding situation.

3.1. System description, statement of result. Standard noise-free embedding was first investigated by Chen [C00] and Barron [B00]. In [WvD01] codes were studied for the standard noise-free case for gray-scale symbols and squared-error distortion. These authors considered the situation depicted in figure 2. The objective there is to embed as much information in an i.i.d. host sequence as possible without changing this sequence too much, i.e. without increasing the distortion between the source sequence and the composite sequence too much.

We model this system as follows. A message source produces one out of M possible message indices. Message index w ∈ {1, 2, ..., M} occurs with probability Pr{W = w} = 1/M. This message
index is embedded in a host sequence s^N = (s_1, s_2, ..., s_N) consisting of N symbols from the finite alphabet S. The host sequence is assumed to be generated at random. Sequence s^N ∈ S^N occurs with probability Pr{S^N = s^N} = ∏_{n=1}^{N} P_s(s_n); therefore the host sequence is i.i.d. and its distribution is {P_s(s), s ∈ S}.

The encoder produces the composite sequence x^N = (x_1, x_2, ..., x_N) based on the message index w that must be embedded and the host sequence s^N; we write X^N = e(W, S^N). The symbols x_n, n = 1, ..., N, of the composite sequence assume values from the finite alphabet X. The decoder observes the composite sequence x^N and forms an estimate ŵ of the embedded message index, thus Ŵ = d(X^N).

The performance of the system is determined by its error probability P_E = Pr{Ŵ ≠ W}, its embedding rate R_e = (1/N) log₂(M), and the average embedding distortion
$$D^e_{sx} = \frac{1}{M}\sum_{w}\sum_{s^N} \Pr\{S^N = s^N\}\, \frac{1}{N}\sum_{n=1}^{N} D^e_{sx}(s_n, e_n(w, s^N)), \tag{3.1}$$
where e_n(w, s^N) is the n-th component of e(w, s^N) and D^e_sx(s, x) is the embedding distortion between symbols s ∈ S and x ∈ X. Without loss of generality we assume that the components of D^e_sx(·,·) are non-negative.

We say that a distortion-rate pair (Δ_sx, ρ_e) is achievable in the standard noise-free case if for all ε > 0 there exist for all large enough N encoders and decoders with
$$R_e \geq \rho_e - \epsilon, \quad D^e_{sx} \leq \Delta_{sx} + \epsilon, \quad \text{and} \quad P_E = \Pr\{\hat{W} \neq W\} \leq \epsilon. \tag{3.2}$$
Here we restrict ourselves to rates ρ_e ≥ 0 and finite distortions Δ_sx. It makes no sense to consider rates smaller than zero, and allowing infinite distortion is essentially the same as having no distortion constraint. The set of all achievable distortion-rate pairs in the standard noise-free embedding case is denoted by G_snf. The capacity-distortion function C_snf(Δ) is defined as C_snf(Δ) ≜ max{ρ : (Δ, ρ) ∈ G_snf}.

Theorem 3.1. (Chen [C00] and Barron [B00]) The achievable region in the standard noise-free embedding case is given by
$$\mathcal{G}_{snf} = \Big\{ (\Delta_{sx}, \rho_e) : \Delta_{sx} \geq \sum_{s,x} P(s,x) D^e_{sx}(s,x),\; 0 \leq \rho_e \leq H(X|S), \text{ where } P(s,x) = P_s(s)P_t(x|s) \text{ for some } P_t(x|s) \Big\}. \tag{3.3}$$
Therefore the capacity-distortion function
$$C_{snf}(\Delta) = \max_{P_t(x|s):\, \sum_{s,x} P_s(s)P_t(x|s)D^e_{sx}(s,x) \leq \Delta} H(X|S). \tag{3.4}$$
The test-channel {S, P_t(x|s), X} has input alphabet S, output alphabet X, and transition probability matrix P_t(x|s).
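As an illustration (our own toy example, not from the paper): for a binary host and Hamming distortion, (3.4) can be evaluated by brute force over binary test channels. Since H(X|S) is maximized by flipping each host symbol with probability Δ, one expects C_snf(Δ) = h(Δ) for Δ ≤ 1/2, independently of the host statistics; the sketch below (function and parameter names are our own) confirms this numerically.

```python
import numpy as np

def h(p):
    """Binary entropy function in bits."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def C_snf_binary(Ps, Delta, grid=201):
    """Brute-force (3.4): maximize H(X|S) over binary test channels
    Pt(x|s), subject to expected Hamming distortion <= Delta."""
    best = 0.0
    for a in np.linspace(0.0, 0.5, grid):       # flip probability given s = 0
        for b in np.linspace(0.0, 0.5, grid):   # flip probability given s = 1
            if Ps[0] * a + Ps[1] * b <= Delta:  # distortion constraint
                best = max(best, Ps[0] * h(a) + Ps[1] * h(b))
    return best

print(C_snf_binary([0.9, 0.1], 0.1))   # ≈ h(0.1) ≈ 0.469 bit
```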
3.2. Achievable region. Fix a test-channel {S, P_t(x|s), X} with H(X|S) > 0. If such a test-channel does not exist, only distortion-rate pairs (Δ_sx, 0) must be shown to be achievable, which is trivial. Let 0 < η < 1. The definition of achievability implies that Σ_{s,x} P_s(s) P_t(x|s) D^e_sx(s, x) < ∞. The achievability proof
now follows directly from the Gelfand-Pinsker achievability proof in subsection 2.2 and observation (a) in subsection 2.3. Note that the host sequence plays the role of the state sequence here. Also observe that Y ≡ X since our "channel" is noise-free. Finally substitute X for the auxiliary random variable U, thus U = X and D_su(s, u) = D^e_sx(s, u) for s ∈ S, u ∈ X. Then from the arguments in subsection 2.3 we may conclude that there exists a code with error probability P_E ≤ 4η, and embedding rate and average embedding distortion satisfying
$$\begin{aligned} R_e &= I(U;Y) - I(U;S) - \delta(\eta) = H(X|S) - \delta(\eta), \\ D^e_{sx} &\leq \sum_{s,u} P(s,u)D_{su}(s,u) + 7\eta d_{\max} = \sum_{s,x} P_s(s)P_t(x|s)D^e_{sx}(s,x) + 7\eta d_{\max}, \end{aligned} \tag{3.5}$$
for all N large enough. Here d_max ≜ max_{s,x: P(s,x)>0} D^e_sx(s, x). The achievability proof is complete if we let η ↓ 0.
3.3. Converse. Given the standard noise-free embedding system as shown in figure 2, we first define the joint distribution of (S, X). To this end, consider the random variable I assuming values from {1, 2, ..., N} with probability 1/N, independently. Next define the single-letter random variables
$$(S, X) \triangleq (S_n, X_n) \text{ if } I = n. \tag{3.6}$$
Now the probability distribution of (S, X) is given by
$$\Pr\{(S,X) = (s,x)\} = \frac{1}{N}\sum_{n=1}^{N} \Pr\{(S_n, X_n) = (s,x)\}, \tag{3.7}$$
for s ∈ S, x ∈ X. Note that for these joint probabilities P(s, x) we can write P(s, x) = P_s(s) P_t(x|s) for some test-channel P_t(x|s). The proof of the converse now proceeds in a number of steps. We start with Fano's inequality:
$$H(W|\hat{W}) \leq h(P_E) + P_E \log_2(M-1) \leq 1 + P_E \log_2(M), \tag{3.8}$$
where h(α) ≜ −α log₂ α − (1−α) log₂(1−α) for 0 ≤ α ≤ 1 is the binary entropy function. For the rate part we have:
$$\begin{aligned} \log_2(M) &\leq H(W) - H(W|\hat{W}) + 1 + P_E\log_2(M) \\ &\leq H(W|S^N) - H(W|S^N, X^N) + 1 + P_E\log_2(M) \\ &= I(W; X^N|S^N) + 1 + P_E\log_2(M) \\ &\leq H(X^N|S^N) + 1 + P_E\log_2(M) \\ &\leq \sum_{n=1}^{N} H(X_n|S_n) + 1 + P_E\log_2(M) \\ &= N H(X|S,I) + 1 + P_E\log_2(M) \\ &\leq N H(X|S) + 1 + P_E\log_2(M), \end{aligned} \tag{3.9}$$
or
$$R_e = \frac{1}{N}\log_2(M) \leq \frac{1}{1-P_E}\left( H(X|S) + \frac{1}{N} \right). \tag{3.10}$$
For the distortion part we get:
$$D^e_{sx} = \sum_{s^N, x^N} \Pr\{(S^N, X^N) = (s^N, x^N)\}\, \frac{1}{N}\sum_{n=1}^{N} D^e_{sx}(s_n, x_n) = \frac{1}{N}\sum_{n=1}^{N}\sum_{s,x} \Pr\{(S_n, X_n) = (s,x)\}\, D^e_{sx}(s,x) = \sum_{s,x} \Pr\{(S,X) = (s,x)\}\, D^e_{sx}(s,x). \tag{3.11}$$
Now note that an achievable (Δ_sx, ρ_e) satisfies Δ_sx ≥ D^e_sx − ε and ρ_e ≤ R_e + ε, for each ε > 0 and all large enough N. Moreover P_E ≤ ε. Therefore from (3.10) and (3.11) we may conclude that for an achievable (Δ_sx, ρ_e)
$$\Delta_{sx} \geq \sum_{s,x} P(s,x) D^e_{sx}(s,x), \qquad \rho_e \leq H(X|S), \tag{3.12}$$
for some joint distribution P(s, x) = P_s(s) P_t(x|s), s ∈ S, x ∈ X. This completes the converse for the standard noise-free case.

4. Reversible Noise-Free Embedding
[Figure 3 block diagram: message W and host sequence S^N ~ P_s(s) enter the encoder X^N = e(W, S^N); the decoder observes X^N and outputs (Ŵ, Ŝ^N) = d(X^N).]
Figure 3. The reversible noise-free embedding situation.

4.1. System description, statement of result. Reversible embedding situations were considered recently by Fridrich, Goljan, and Du [FGD02]. Their setup is similar to the standard noise-free setup. There is again a balance between embedding rate and embedding distortion, but in addition to the message the decoder should reconstruct the host sequence after having received the composite sequence, see figure 3. Although actual embedding methods were discussed in [FGD02], the capacity-distortion function for reversible noise-free embedding was determined a little later by the authors in [KW02]. Coding techniques based on [KW02] were discussed in [MKW02]. We first describe the reversible noise-free embedding model and then state the result of [KW02].

As before, the message source produces M possible message indices. Index w ∈ {1, 2, ..., M} occurs with probability Pr{W = w} = 1/M. This message index is embedded in host sequence s^N = (s_1, s_2, ..., s_N) consisting of symbols from the finite alphabet S. Sequence s^N ∈ S^N occurs with probability Pr{S^N = s^N} = ∏_{n=1}^{N} P_s(s_n); therefore the host sequence is i.i.d. with distribution {P_s(s), s ∈ S}. The encoder produces the composite sequence x^N = (x_1, x_2, ..., x_N) based on the message index w that is to be embedded and host sequence s^N; we write X^N = e(W, S^N). The symbols x_n, n = 1, ..., N, are in the finite alphabet X. The decoder
observes the composite sequence x^N and forms an estimate ŵ of the embedded message index, and an estimate ŝ^N of the host sequence; hence (Ŵ, Ŝ^N) = d(X^N).

The performance of the reversible noise-free embedding system is determined by the error probability P_E = Pr{Ŵ ≠ W ∪ Ŝ^N ≠ S^N}, its embedding rate R_e = (1/N) log₂(M), and the average embedding distortion
$$D^e_{sx} = \frac{1}{M}\sum_{w}\sum_{s^N} \Pr\{S^N = s^N\}\, \frac{1}{N}\sum_{n=1}^{N} D^e_{sx}(s_n, e_n(w, s^N)). \tag{4.1}$$
We again say that a distortion-rate pair (Δ_sx, ρ_e) (for Δ_sx < ∞ and ρ_e ≥ 0) is achievable in the reversible noise-free case if for all ε > 0 there exist for all large enough N encoders and decoders with
$$R_e \geq \rho_e - \epsilon, \quad D^e_{sx} \leq \Delta_{sx} + \epsilon, \quad \text{and} \quad P_E = \Pr\{\hat{W} \neq W \cup \hat{S}^N \neq S^N\} \leq \epsilon. \tag{4.2}$$
The set of all achievable distortion-rate pairs in the reversible noise-free embedding case is denoted by G_rnf. The capacity-distortion function C_rnf(Δ) is defined as C_rnf(Δ) ≜ max{ρ : (Δ, ρ) ∈ G_rnf}.

Theorem 4.1. (Kalker and Willems [KW02]) The achievable region in the reversible noise-free embedding case is given by
$$\mathcal{G}_{rnf} = \Big\{ (\Delta_{sx}, \rho_e) : \Delta_{sx} \geq \sum_{s,x} P(s,x) D^e_{sx}(s,x),\; 0 \leq \rho_e \leq H(X) - H(S), \text{ where } P(s,x) = P_s(s)P_t(x|s) \text{ for some } P_t(x|s) \Big\}. \tag{4.3}$$
Therefore the capacity-distortion function
$$C_{rnf}(\Delta) = \max_{P_t(x|s):\, \sum_{s,x} P_s(s)P_t(x|s)D^e_{sx}(s,x) \leq \Delta} H(X) - H(S). \tag{4.4}$$
Test-channel {S, P_t(x|s), X} has input alphabet S, output alphabet X, and transition probability matrix P_t(x|s).
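A quick worked example (our own numbers, assuming a binary host, Hamming distortion, and no claim of optimality): let P_s = (0.9, 0.1), so H(S) = h(0.1) ≈ 0.469 bit, and take for P_t(x|s) a binary symmetric test channel with crossover d = 0.1. Then
$$\Pr\{X = 1\} = 0.1 \cdot 0.9 + 0.9 \cdot 0.1 = 0.18, \qquad \rho_e = H(X) - H(S) = h(0.18) - h(0.1) \approx 0.680 - 0.469 = 0.211 \text{ bit},$$
at embedding distortion Δ_sx = 0.1. Since the symmetric test channel need not be optimal, this only shows C_rnf(0.1) ≥ 0.211 bit for this host; it does, however, make the intuition visible that the reversible embedding rate is exactly the entropy increase from S to X.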
4.2. Achievability proof. We start by fixing a test-channel {S, P_t(x|s), X} for which H(X) − H(S) > 0, and 0 < η < 1. We assume that Σ_{s,x} P_s(s) P_t(x|s) D^e_sx(s, x) < ∞. The achievability proof now follows again from the Gelfand-Pinsker achievability proof in subsection 2.2 and observations (a) and (b) in subsection 2.3. The host sequence plays the role of the state sequence. Note that Y ≡ X since the channel is noise-free. We now substitute (S, X) for the auxiliary random variable U, thus U = (S, X) and D_su(s, (s′, x)) = D^e_sx(s, x) if s = s′, for s ∈ S, x ∈ X. From the arguments in subsection 2.3 we may conclude that there exists a code with embedding rate and embedding distortion satisfying
$$\begin{aligned} R_e &= I(U;Y) - I(U;S) - \delta(\eta) = H(X) - H(S) - \delta(\eta), \\ D^e_{sx} &\leq \sum_{s,u} P(s,u)D_{su}(s,u) + 7\eta d_{\max} = \sum_{s,x} P_s(s)P_t(x|s)D^e_{sx}(s,x) + 7\eta d_{\max}, \end{aligned} \tag{4.5}$$
for all N large enough. Here d_max ≜ max_{s,x: P(s,x)>0} D^e_sx(s, x). Moreover, from observation (b) in subsection 2.3 we may conclude that the decoder recovers u^N = (s^N, x^N) if E_1 ∪ E_2 ∪ E_3 ∪ E_4 does not occur. This implies that also s^N is reconstructed, and hence P_E ≤ 4η for all N large enough. The achievability proof for test-channels with H(X) − H(S) > 0 is complete if we let η ↓ 0.

Let (d, 0) ∈ G_rnf be the distortion-rate pair with minimal distortion d. This distortion-rate pair corresponds to a test channel having ρ = H(X) − H(S) = 0. Since G_rnf can be shown to be closed and convex, also the distortion-rate pair (d, 0) is achievable (if there is a test channel with ρ = H(X) − H(S) > 0 and Δ_sx < ∞).
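For the reader's convenience we spell out the substitution step, which the proof leaves implicit: with U = (S, X) and Y ≡ X,
$$I(U;Y) - I(U;S) = I(S,X;X) - I(S,X;S) = H(X) - H(S),$$
since H(X | S, X) = 0 and H(S | S, X) = 0.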
4.3. Converse. Just like for standard noise-free embedding we define the joint distribution of (S, X) as
$$\Pr\{(S,X) = (s,x)\} = \frac{1}{N}\sum_{n=1}^{N}\Pr\{(S_n, X_n) = (s,x)\}. \tag{4.6}$$
This follows from defining the random variable I that assumes values in {1, 2, ..., N} with probability 1/N, independently, and setting
$$(S,X) \triangleq (S_n, X_n) \text{ if } I = n. \tag{4.7}$$
Again we can write P(s, x) = P_s(s) P_t(x|s) for some test-channel P_t(x|s). The proof of the converse starts with Fano's inequality:
$$H(W, S^N | \hat{W}, \hat{S}^N) \leq h(P_E) + P_E\log_2(M|\mathcal{S}|^N - 1) \leq 1 + P_E\log_2(M) + NP_E\log_2(|\mathcal{S}|). \tag{4.8}$$
For the rate part we have:
$$\begin{aligned} \log_2(M) &\leq H(W) - H(W, S^N|\hat{W}, \hat{S}^N) + 1 + P_E\log_2(M) + NP_E\log_2(|\mathcal{S}|) \\ &\leq H(W,S^N) - H(W,S^N|X^N) - H(S^N) + 1 + P_E\log_2(M) + NP_E\log_2(|\mathcal{S}|) \\ &= I(W,S^N;X^N) - H(S^N) + 1 + P_E\log_2(M) + NP_E\log_2(|\mathcal{S}|) \\ &\leq H(X^N) - H(S^N) + 1 + P_E\log_2(M) + NP_E\log_2(|\mathcal{S}|) \\ &\leq \sum_{n=1}^{N}H(X_n) - \sum_{n=1}^{N}H(S_n) + 1 + P_E\log_2(M) + NP_E\log_2(|\mathcal{S}|) \\ &= NH(X|I) - NH(S|I) + 1 + P_E\log_2(M) + NP_E\log_2(|\mathcal{S}|) \\ &\leq NH(X) - NH(S) + 1 + P_E\log_2(M) + NP_E\log_2(|\mathcal{S}|), \end{aligned} \tag{4.9}$$
or
$$R_e = \frac{1}{N}\log_2(M) \leq \frac{1}{1-P_E}\left( H(X) - H(S) + \frac{1}{N} + P_E\log_2(|\mathcal{S}|) \right). \tag{4.10}$$
For the distortion part we find, as in (3.11), that
$$D^e_{sx} = \sum_{s,x}\Pr\{(S,X) = (s,x)\}\,D^e_{sx}(s,x). \tag{4.11}$$
Now note that an achievable (Δ_sx, ρ_e) satisfies Δ_sx ≥ D^e_sx − ε and ρ_e ≤ R_e + ε for each ε > 0 and all large enough N. Moreover P_E ≤ ε. Therefore from (4.10) and
(4.11) we may conclude that for an achievable (Δ_sx, ρ_e)
$$\Delta_{sx} \geq \sum_{s,x} P(s,x)D^e_{sx}(s,x), \qquad \rho_e \leq H(X) - H(S), \tag{4.12}$$
for some joint distribution P(s, x) = P_s(s) P_t(x|s), s ∈ S, x ∈ X. The converse for the reversible noise-free case is now complete.

5. Reversible and Robust Embedding
[Figure 4 block diagram: message W and host sequence S^N ~ P_s(s) enter the encoder X^N = e(W, S^N); X^N is transmitted over the channel P_c(y|x); the decoder observes Y^N and outputs (Ŵ, Ŝ^N) = d(Y^N).]
Figure 4. The reversible and robust embedding situation.

5.1. System description, statement of result. Reversible noise-free embedding has the disadvantage that it is not robust against errors that may occur in the composite sequence. Therefore in [KW03] the authors considered a modification of the reversible noise-free embedding configuration in figure 3 in which the composite sequence is transmitted over a discrete memoryless channel before being observed by the decoder, which extracts the host sequence and the embedded message index, see figure 4. For this model the authors could determine the capacity-distortion region. We first describe the reversible and robust embedding model and then state the result of [KW03]. We mainly describe the differences with the reversible noise-free embedding situation.

There is again a message source and a source that produces the host sequence. The message index w is embedded in host sequence s^N. The encoder produces the composite sequence x^N based on the message index w that is to be embedded and host sequence s^N, i.e. X^N = e(W, S^N). The composite sequence x^N is now transmitted over the channel {X, P_c(y|x), Y}. This channel is memoryless. It has input alphabet X and output alphabet Y, both finite. Given an input symbol x ∈ X, the output symbol y ∈ Y occurs with probability P_c(y|x). The decoder observes the channel output sequence y^N = (y_1, y_2, ..., y_N) and forms an estimate ŵ of the embedded message index and an estimate ŝ^N of the host sequence, thus (Ŵ, Ŝ^N) = d(Y^N).

The performance of the reversible embedding system is determined by the error probability P_E = Pr{Ŵ ≠ W ∪ Ŝ^N ≠ S^N}, its embedding rate R_e = (1/N) log₂(M), and the average embedding distortion
$$D^e_{sx} = \frac{1}{M}\sum_{w}\sum_{s^N} \Pr\{S^N = s^N\}\, \frac{1}{N}\sum_{n=1}^{N} D^e_{sx}(s_n, e_n(w, s^N)). \tag{5.1}$$
A distortion-rate pair (Δ_sx, ρ_e) with Δ_sx < ∞ and ρ_e ≥ 0 is achievable in the robust and reversible case if for all ε > 0 there exist for all large enough N encoders
and decoders with
$$R_e \geq \rho_e - \epsilon, \quad D^e_{sx} \leq \Delta_{sx} + \epsilon, \quad \text{and} \quad P_E = \Pr\{\hat{W} \neq W \cup \hat{S}^N \neq S^N\} \leq \epsilon. \tag{5.2}$$
The set of all achievable distortion-rate pairs in the reversible and robust embedding case is denoted by G_rr. The capacity-distortion function C_rr(Δ) is defined as C_rr(Δ) ≜ max{ρ : (Δ, ρ) ∈ G_rr}.

Theorem 5.1. (Kalker and Willems [KW03]) The achievable region in the reversible and robust embedding case is given by
$$\mathcal{G}_{rr} = \Big\{ (\Delta_{sx}, \rho_e) : \Delta_{sx} \geq \sum_{s,x} P(s,x) D^e_{sx}(s,x),\; 0 \leq \rho_e \leq I(X;Y) - H(S), \text{ where } P(s,x,y) = P_s(s)P_t(x|s)P_c(y|x) \text{ for some } P_t(x|s) \Big\}. \tag{5.3}$$
Therefore the capacity-distortion function
$$C_{rr}(\Delta) = \max_{P_t(x|s):\, \sum_{s,x} P_s(s)P_t(x|s)D^e_{sx}(s,x) \leq \Delta} I(X;Y) - H(S). \tag{5.4}$$
Test-channel {S, P_t(x|s), X} has input alphabet S, output alphabet X, and transition probability matrix P_t(x|s).
5.2. Achievability proof. Fix a test-channel {S, P_t(x|s), X} and 0 < η < 1. We assume that Σ_{s,x} P_s(s) P_t(x|s) D^e_sx(s, x) < ∞. The achievability proof follows again from the Gelfand-Pinsker achievability proof in subsection 2.2 and observations (a) and (b) in subsection 2.3. The host sequence plays the role of the state sequence. Substitute (S, X) for the auxiliary random variable U again, thus U = (S, X) and D_su(s, (s′, x)) = D^e_sx(s, x) if s = s′, for s ∈ S, x ∈ X. Note that (S, X) − X − Y form a Markov chain, so that I(U;Y) = I(X;Y). The arguments in subsection 2.3 imply that there exists a code with embedding rate and embedding distortion such that
$$\begin{aligned} R_e &= I(U;Y) - I(U;S) - \delta(\eta) = I(X;Y) - H(S) - \delta(\eta), \\ D^e_{sx} &\leq \sum_{s,u} P(s,u)D_{su}(s,u) + 7\eta d_{\max} = \sum_{s,x} P_s(s)P_t(x|s)D^e_{sx}(s,x) + 7\eta d_{\max}, \end{aligned} \tag{5.5}$$
for all N large enough. Here d_max ≜ max_{s,x: P(s,x)>0} D^e_sx(s, x). Moreover, from observation (b) in subsection 2.3 we may conclude that the decoder recovers u^N = (s^N, x^N), and hence s^N, if E_1 ∪ E_2 ∪ E_3 ∪ E_4 does not occur. Hence P_E ≤ 4η for all N large enough. The achievability part of theorem 5.1 for pairs with I(X;Y) − H(S) > 0 is complete if we let η ↓ 0. Distortion-rate pairs having I(X;Y) − H(S) = 0 are achievable by the fact that G_rr is closed and convex.
5.3. Converse. Consider the reversible and robust embedding system shown in figure 4. We define the joint distribution of (S, X, Y). To this end, consider the random
variable I assuming values from {1, 2, ..., N} with probability 1/N, independently. Now define the single-letter random variables
$$(S,X,Y) \triangleq (S_n, X_n, Y_n) \text{ if } I = n. \tag{5.6}$$
The probability distribution of (S, X, Y) is therefore given by
$$\Pr\{(S,X,Y) = (s,x,y)\} = \frac{1}{N}\sum_{n=1}^{N}\Pr\{(S_n, X_n, Y_n) = (s,x,y)\}, \tag{5.7}$$
for s ∈ S, x ∈ X, and y ∈ Y. For these joint probabilities P(s, x, y) we can write P(s, x, y) = P_s(s) P_t(x|s) P_c(y|x) for some test-channel P_t(x|s). Only the rate part of the proof differs from that for the reversible noise-free case:
$$\begin{aligned} \log_2(M) &\leq H(W) - H(W,S^N|\hat{W},\hat{S}^N) + 1 + P_E\log_2(M) + NP_E\log_2(|\mathcal{S}|) \\ &\leq H(W,S^N) - H(W,S^N|Y^N) - H(S^N) + 1 + P_E\log_2(M) + NP_E\log_2(|\mathcal{S}|) \\ &= I(W,S^N;Y^N) - H(S^N) + 1 + P_E\log_2(M) + NP_E\log_2(|\mathcal{S}|) \\ &\leq I(X^N;Y^N) - H(S^N) + 1 + P_E\log_2(M) + NP_E\log_2(|\mathcal{S}|) \\ &\leq \sum_{n=1}^{N}H(Y_n) - \sum_{n=1}^{N}H(Y_n|X_n) - \sum_{n=1}^{N}H(S_n) + 1 + P_E\log_2(M) + NP_E\log_2(|\mathcal{S}|) \\ &= NH(Y|I) - NH(Y|X,I) - NH(S|I) + 1 + P_E\log_2(M) + NP_E\log_2(|\mathcal{S}|) \\ &\leq NH(Y) - NH(Y|X) - NH(S) + 1 + P_E\log_2(M) + NP_E\log_2(|\mathcal{S}|) \\ &= NI(X;Y) - NH(S) + 1 + P_E\log_2(M) + NP_E\log_2(|\mathcal{S}|), \end{aligned} \tag{5.8}$$
or
$$R_e = \frac{1}{N}\log_2(M) \leq \frac{1}{1-P_E}\left( I(X;Y) - H(S) + \frac{1}{N} + P_E\log_2(|\mathcal{S}|) \right). \tag{5.9}$$
The distortion part and the rest of the converse are analogous to those of the non-robust reversible case.

6. Partially Reversible Noise-Free Embedding
[Figure 5 block diagram: message W and host sequence S^N ~ P_s(s) enter the encoder X^N = e(W, S^N); the decoder observes X^N and outputs Ŵ = d(X^N) and the restoration sequence V^N = f(X^N).]
Figure 5. The partially reversible noise-free embedding situation.

6.1. System description, statement of result. Another disadvantage of reversible noise-free embedding is that the requirement that the host sequence must be reconstructed in addition to the message index has a strong negative effect on the embedding rate. In [WK02] the authors studied a reversible embedding situation
where only partial reconstruction of the host sequence was required, see figure 5. Note that this setup is not robust.

There is a message source and a source that produces the host sequence. Message index w is embedded in host sequence s^N. The encoder produces the composite sequence x^N based on the message index w that is to be embedded and host sequence s^N, i.e. X^N = e(W, S^N). The composite sequence x^N is now observed by the decoder, which forms an estimate ŵ of the embedded message index and a restoration sequence v^N = (v_1, v_2, ..., v_N) ∈ V^N that is close to the host sequence s^N. We write Ŵ = d(X^N) and V^N = f(X^N). We assume that the alphabet V is finite.

The performance of the partially reversible noise-free embedding system is determined by the error probability P_E = Pr{Ŵ ≠ W}, its embedding rate R_e = (1/N) log₂(M), the average embedding distortion
$$D^e_{sx} = \frac{1}{M}\sum_{w}\sum_{s^N} \Pr\{S^N = s^N\}\, \frac{1}{N}\sum_{n=1}^{N} D^e_{sx}(s_n, e_n(w, s^N)), \tag{6.1}$$
and the average restoration distortion
$$D^r_{sv} = \frac{1}{M}\sum_{w}\sum_{s^N} \Pr\{S^N = s^N\}\, \frac{1}{N}\sum_{n=1}^{N} D^r_{sv}(s_n, f_n(e(w, s^N))), \tag{6.2}$$
where f_n(x^N) is the n-th component of f(x^N) and D^r_sv(s, v) is the restoration distortion between symbols s ∈ S and v ∈ V. Without loss of generality we assume that the components of D^r_sv(·,·) are non-negative. We assume moreover that all components are finite.

A distortion-rate triple (Δ_sx, Δ_sv, ρ_e) is achievable in the partially reversible noise-free case if for all ε > 0 there exist for all large enough N encoders and decoders with
$$R_e \geq \rho_e - \epsilon, \quad D^e_{sx} \leq \Delta_{sx} + \epsilon, \quad D^r_{sv} \leq \Delta_{sv} + \epsilon, \quad \text{and} \quad P_E = \Pr\{\hat{W} \neq W\} \leq \epsilon. \tag{6.3}$$
The set of all achievable distortion-rate triples in the partially reversible noise-free embedding case is denoted by G_prnf.

Theorem 6.1. (Willems and Kalker [WK02]) The achievable region in the partially reversible noise-free embedding case is given by
$$\mathcal{G}_{prnf} = \Big\{ (\Delta_{sx}, \Delta_{sv}, \rho_e) : \Delta_{sx} \geq \sum_{s,x} P(s,x)D^e_{sx}(s,x),\; \Delta_{sv} \geq \sum_{s,v} P(s,v)D^r_{sv}(s,v),\; 0 \leq \rho_e \leq H(X) - I(S;X,V), \text{ where } P(s,x,v) = P_s(s)P_t(x,v|s) \text{ for some } P_t(x,v|s) \Big\}. \tag{6.4}$$
Test-channel {S, P_t(x, v|s), X × V} has input alphabet S, output alphabets X and V, and transition probability matrix P_t(x, v|s).
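Two sanity checks (our own observations, not stated in the paper) show how Theorem 6.1 interpolates between the earlier results. Choosing V = X, i.e. taking the composite sequence itself as restoration, gives
$$H(X) - I(S; X, X) = H(X) - I(S;X) = H(X|S),$$
the standard embedding rate of Theorem 3.1, with restoration distortion equal to embedding distortion. Conversely, if perfect restoration V = S is required, then I(S; X, V) ≥ I(S; V) = H(S) and the rate bound tightens to H(X) − H(S), the reversible rate of Theorem 4.1.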
6.2. Achievability proof. Fix a test-channel {S, P_t(x, v|s), X × V} and an η such that 0 < η < 1. Assume that H(X) − I(S;X,V) > 0. By the definition of achievability we may assume that Σ_{s,x,v} P_s(s) P_t(x, v|s) D^e_sx(s, x) < ∞ and Σ_{s,x,v} P_s(s) P_t(x, v|s) D^r_sv(s, v) < ∞. The achievability part of theorem 6.1 now follows directly from the Gelfand-Pinsker proof in subsection 2.2 and the two observations in subsection 2.3. The host sequence plays the role of the state sequence. Also observe that Y ≡ X; the channel is noise-free. Finally substitute (V, X) for the auxiliary random variable U, thus U = (V, X). Now define the matrices D′_su(·,·) and D″_su(·,·):
$$D'_{su}(s,(v,x)) = D^e_{sx}(s,x) \quad \text{and} \quad D''_{su}(s,(v,x)) = D^r_{sv}(s,v) \quad \text{for } s \in \mathcal{S},\, x \in \mathcal{X},\, v \in \mathcal{V}. \tag{6.5}$$
The arguments in subsection 2.3 imply that there exists a code with P_E ≤ 4η and with embedding rate and embedding distortion such that
$$\begin{aligned} R_e &= I(U;Y) - I(U;S) - \delta(\eta) = H(X) - I(S;X,V) - \delta(\eta), \\ D^e_{sx} = D'_{su} &\leq \sum_{s,u} P(s,u)D'_{su}(s,u) + 7\eta d'_{\max} = \sum_{s,x} P_s(s)P_t(x|s)D^e_{sx}(s,x) + 7\eta d'_{\max}, \end{aligned} \tag{6.6}$$
for all N large enough. Here d′_max ≜ max_{s,x: P(s,x)>0} D^e_sx(s, x). Moreover, from observation (b) in subsection 2.3 we may conclude that the decoder recovers u^N = (v^N, x^N) if E_1 ∪ E_2 ∪ E_3 ∪ E_4 does not occur. We can now write
$$\begin{aligned} D^r_{sv} &\leq (1 - \Pr\{E_1\cup E_2\cup E_3\cup E_4\})\Big(\sum_{s,u}P(s,u)D''_{su}(s,u) + 7\eta d''_{\max}\Big) + \Pr\{E_1\cup E_2\cup E_3\cup E_4\}\, d''_{\max} \\ &\leq \sum_{s,u}P_s(s)P_t(u|s)D''_{su}(s,u) + 7\eta d''_{\max} + 4\eta d''_{\max} \\ &= \sum_{s,v}P_s(s)P_t(v|s)D^r_{sv}(s,v) + 11\eta d''_{\max}, \end{aligned} \tag{6.7}$$
for all N large enough. Here d″_max ≜ max_{s,v} D^r_sv(s, v). The achievability part of theorem 6.1 for triples with H(X) − I(S;X,V) > 0 follows if we let η ↓ 0. Note that by convexity and closedness of G_prnf, triples with H(X) − I(S;X,V) = 0 are achievable.
6.3. Converse. Consider the partially reversible embedding system as shown in figure 5. We begin by defining the joint distribution of (S, X, V). To this end, consider the random variable I assuming values from {1, 2, ..., N} with probability 1/N, independently. Now define the single-letter random variables
$$(S,X,V) \triangleq (S_n, X_n, V_n) \text{ if } I = n. \tag{6.8}$$
The probability distribution of (S, X, V) is therefore given by
$$\Pr\{(S,X,V) = (s,x,v)\} = \frac{1}{N}\sum_{n=1}^{N}\Pr\{(S_n, X_n, V_n) = (s,x,v)\}, \tag{6.9}$$
for s ∈ S, x ∈ X, and v ∈ V. For these joint probabilities P(s, x, v) we can write P(s, x, v) = P_s(s) P_t(x, v|s) for some test-channel P_t(x, v|s).
For the distortion parts we get:
$$D^e_{sx} = \sum_{s^N,x^N}\Pr\{(S^N,X^N)=(s^N,x^N)\}\,\frac{1}{N}\sum_{n=1}^{N}D^e_{sx}(s_n,x_n) = \frac{1}{N}\sum_{n=1}^{N}\sum_{s,x}\Pr\{(S_n,X_n)=(s,x)\}\,D^e_{sx}(s,x) = \sum_{s,x}\Pr\{(S,X)=(s,x)\}\,D^e_{sx}(s,x), \tag{6.10}$$
and
$$D^r_{sv} = \sum_{s^N,v^N}\Pr\{(S^N,V^N)=(s^N,v^N)\}\,\frac{1}{N}\sum_{n=1}^{N}D^r_{sv}(s_n,v_n) = \frac{1}{N}\sum_{n=1}^{N}\sum_{s,v}\Pr\{(S_n,V_n)=(s,v)\}\,D^r_{sv}(s,v) = \sum_{s,v}\Pr\{(S,V)=(s,v)\}\,D^r_{sv}(s,v). \tag{6.11}$$
We continue with Fano's inequality:
$$H(W|\hat{W}) \leq h(P_E) + P_E\log_2(M-1) \leq 1 + P_E\log_2(M), \tag{6.12}$$
where h(·) is the binary entropy function. The rate part consists of the following steps:
$$\begin{aligned} \log_2(M) &= H(X^N, V^N, W) - H(X^N, V^N|W) \\ &\leq H(X^N) + H(W|X^N) + H(V^N|X^N, W) - I(X^N, V^N; S^N|W) \\ &\leq H(X^N) + H(W|\hat{W}) - I(X^N, V^N; S^N|W) \\ &\leq H(X^N) - H(S^N|W) + H(S^N|W, X^N, V^N) + 1 + P_E\log_2(M) \\ &\leq H(X^N) - H(S^N) + H(S^N|X^N, V^N) + 1 + P_E\log_2(M) \\ &\leq \sum_{n=1}^{N}H(X_n) - \sum_{n=1}^{N}H(S_n) + \sum_{n=1}^{N}H(S_n|X_n, V_n) + 1 + P_E\log_2(M) \\ &= NH(X|I) - NH(S|I) + NH(S|X,V,I) + 1 + P_E\log_2(M) \\ &\leq NH(X) - NH(S) + NH(S|X,V) + 1 + P_E\log_2(M) \\ &= NH(X) - NI(S;X,V) + 1 + P_E\log_2(M), \end{aligned} \tag{6.13}$$
or
$$R_e = \frac{1}{N}\log_2(M) \leq \frac{1}{1-P_E}\left( H(X) - I(S;X,V) + \frac{1}{N}\right). \tag{6.14}$$
Now note that an achievable triple (Δ_sx, Δ_sv, ρ_e) satisfies Δ_sx ≥ D^e_sx − ε, Δ_sv ≥ D^r_sv − ε, and ρ_e ≤ R_e + ε, for each ε > 0 and all large enough N. Moreover P_E ≤ ε. Therefore from (6.14), (6.10), and (6.11), we may conclude that for an
achievable (Δ_sx, Δ_sv, ρ_e)
$$\Delta_{sx} \geq \sum_{s,x}P(s,x)D^e_{sx}(s,x), \qquad \Delta_{sv} \geq \sum_{s,v}P(s,v)D^r_{sv}(s,v), \qquad \rho_e \leq H(X) - I(S;X,V), \tag{6.15}$$
for some joint distribution P(s, x, v) = P_s(s) P_t(x, v|s), s ∈ S, x ∈ X, and v ∈ V. This completes the converse for the partially reversible noise-free case.

7. Concluding remarks

The reader may have observed that a section is actually missing from this paper. We cannot give results on partially reversible and robust embedding here. Although it is possible to prove the achievability of a (reasonable) distortion-rate region for this setup, we cannot prove the corresponding converse, and therefore we omit this "result". The region that we have in mind is very similar to the region mentioned in Sutivong et al. [SCCK02] (see also [SCC01]) for channels with state information available to the transmitter, whose task it is to send the states and a message to the receiver. In the setup of Sutivong et al. embedding distortion does not play a role; in our setup, however, it is an essential parameter.

In both the reversible and robust case considered in section 5 and the partially reversible noise-free case that was the subject of section 6 it makes sense to consider the zero-rate case. In the reversible and robust case this leads to an expression for the smallest possible distortion that can be obtained if we want to transmit an i.i.d. sequence over a discrete memoryless channel by changing it into a channel codeword. This instance of joint source-channel coding shows that reliable transmission is also possible when channel codewords are only slightly different from the source sequences that need to be transmitted to the receiver. In the zero-rate partially reversible noise-free case the composite sequence is close to the host sequence (measured by the embedding distortion), but from this composite sequence a restoration sequence can be derived which is in general even closer to the host sequence (if the restoration distortion and embedding distortion measures are identical). Such a system is actually a scalar quantizer in which a vector quantizer is hidden [WK02]. Theorem 6.1 shows which embedding (scalar) distortion and restoration (vector) distortions are jointly achievable.

Appendix A. Details of the Gelfand-Pinsker achievability proof

This proof is based on strong typicality, see e.g. Cover and Thomas [CT91], pp. 370-372. Fix 0 < γ < 1 and a block-length N.

(a) Consider random variables A and B with finite alphabets A and B and joint distribution {P(a, b), a ∈ A, b ∈ B}. The marginal distribution of A is {P(a), a ∈ A}; the marginal distribution of B is {P(b), b ∈ B}. First consider the set T^N_γ(A) of typical a-sequences a^N ∈ A^N defined by
$$(1-\gamma)NP(a) \leq \#(a|a^N) \leq (1+\gamma)NP(a), \tag{A.1}$$
for all a ∈ A, where #(a|a^N) denotes the number of occurrences of a in a^N. Then, by the weak law of large numbers,
$$\lim_{N\to\infty} \sum_{a^N \in T^N_\gamma(A)} P(a^N) = 1. \tag{A.2}$$
For the conditional probability of b given a with P(a) > 0 we can write P(b|a) = P(a, b)/P(a) for a ∈ A, b ∈ B. Next consider for each a^N ∈ A^N the set T^N_γ(B|a^N) of b-sequences b^N ∈ B^N conditionally typical with a^N, i.e. sequences that satisfy
$$(1-\gamma)\#(a|a^N)P(b|a) \leq \#(a,b|a^N,b^N) \leq (1+\gamma)\#(a|a^N)P(b|a), \tag{A.3}$$
for all a ∈ A, b ∈ B. Again by the weak law of large numbers, for all a^N ∈ T^N_γ(A),
$$\lim_{N\to\infty} \sum_{b^N \in T^N_\gamma(B|a^N)} P(b^N|a^N) = 1. \tag{A.4}$$
Note that for all b^N ∈ T^N_γ(B|a^N) with a^N ∈ T^N_γ(A) the composition #(a, b|a^N, b^N) satisfies, for all a ∈ A, b ∈ B,
$$(1-\gamma)^2 NP(a,b) \leq \#(a,b|a^N,b^N) \leq (1+\gamma)^2 NP(a,b), \tag{A.5}$$
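The two typicality tests (A.1) and (A.3) are easy to simulate. The sketch below (our own illustration; names and parameters are assumptions) checks them for an i.i.d. pair source; by (A.2) and (A.4) the checks should pass with high probability once N is large.

```python
import numpy as np
from collections import Counter

def is_typical(a_seq, P, gamma):
    """Strong typicality test (A.1): every symbol count lies within a
    factor (1 -/+ gamma) of its expectation N*P(a)."""
    N = len(a_seq)
    counts = Counter(a_seq)
    return all(
        (1 - gamma) * N * P[a] <= counts.get(a, 0) <= (1 + gamma) * N * P[a]
        for a in range(len(P)))

def is_cond_typical(a_seq, b_seq, P_b_given_a, gamma):
    """Conditional strong typicality test (A.3): the joint counts #(a,b)
    lie within (1 -/+ gamma) of #(a) * P(b|a)."""
    pair_counts = Counter(zip(a_seq, b_seq))
    a_counts = Counter(a_seq)
    return all(
        (1 - gamma) * a_counts.get(a, 0) * P_b_given_a[a][b]
        <= pair_counts.get((a, b), 0)
        <= (1 + gamma) * a_counts.get(a, 0) * P_b_given_a[a][b]
        for a in range(len(P_b_given_a))
        for b in range(len(P_b_given_a[a])))

# Quick demo: an i.i.d. pair source is typical with high probability.
rng = np.random.default_rng(0)
N, gamma = 10_000, 0.05
P_a = [0.5, 0.5]
P_b_given_a = [[0.9, 0.1], [0.1, 0.9]]
a = rng.choice(2, size=N, p=P_a)
b = np.array([rng.choice(2, p=P_b_given_a[x]) for x in a])
print(is_typical(a, P_a, gamma), is_cond_typical(a, b, P_b_given_a, gamma))
```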
which implies that (a^N, b^N) ∈ T^N_{(1+γ)²−1}(A, B). From (A.4) and (A.5) we obtain that, for all N large enough, the size of the set T^N_γ(B|a^N) for a^N ∈ T^N_γ(A) is lower bounded by
$$|T^N_\gamma(B|a^N)| \geq 2^{N[(1-\gamma)^2 H(B|A) - \gamma]}. \tag{A.6}$$
Furthermore, if b^N ∈ T^N_γ(B|a^N) for a^N ∈ T^N_γ(A), the composition #(b|b^N) is such that, for all b ∈ B,
$$(1-\gamma)^2 NP(b) \leq \#(b|b^N) \leq (1+\gamma)^2 NP(b). \tag{A.7}$$
If we now define Q(a^N) ≜ Σ_{b^N ∈ T^N_γ(B|a^N)} P(b^N), this leads to
$$Q(a^N) \geq 2^{-N[(1+\gamma)^2 H(B) - (1-\gamma)^2 H(B|A) + \gamma]} \tag{A.8}$$
for N large enough. Next we define for each b^N ∈ B^N the "reverse" set
$$R^N_\gamma(A|b^N) \triangleq \{a^N \in T^N_\gamma(A) \text{ such that } b^N \in T^N_\gamma(B|a^N)\}; \tag{A.9}$$
then the size of the set R^N_γ(A|b^N) is upper bounded by
$$|R^N_\gamma(A|b^N)| \leq 2^{N(1+\gamma)^2 H(A|B)}. \tag{A.10}$$
If we now define Q(b^N) ≜ Σ_{a^N ∈ R^N_γ(A|b^N)} P(a^N), this leads to
$$Q(b^N) \leq 2^{-N[(1-\gamma)H(A) - (1+\gamma)^2 H(A|B)]}. \tag{A.11}$$
(b) When the encoder receives a state sequence s^N ∉ T^N_η(S), an error event E_1 of the first kind occurs. By (A.2), however, Pr{E_1} ≤ η for all N large enough.

(c) An error event E_2 of the second kind occurs if among the 2^{NR_u} sequences u^N that were generated at random, and that have label w, there is no sequence in
T^N_η(U|s^N). By event E_1^c the state sequence s^N ∈ T^N_η(S). Now for large enough N
$$\begin{aligned} \Pr\{E_2 | s^N \in T^N_\eta(S)\} &= (1 - Q(s^N))^{2^{NR_u}} = 2^{2^{NR_u}\log_2(1-Q(s^N))} \leq 2^{-2^{NR_u}Q(s^N)/\ln(2)} \\ &\leq 2^{-2^{NR_u}\, 2^{-N[(1+\eta)^2H(U) - (1-\eta)^2H(U|S)+\eta]}/\ln(2)} = \exp(-2^{N\eta}) \end{aligned} \tag{A.12}$$
if R_u = (1+η)²H(U) − (1−η)²H(U|S) + 2η. Note that the last inequality follows from (A.8). From (A.12) we can conclude that also Pr{E_2|E_1^c} ≤ η for all N large enough.

(d) An error event of the third kind occurs if y^N ∉ T^N_{(1+η)²−1}(Y|s^N, u^N). It follows from (A.4) and the fact that (s^N, u^N) ∈ T^N_{(1+η)²−1}(S, U) that Pr{E_3|E_1^c, E_2^c} ≤ η for all large enough N.

(e) First note that the occurrence of the complementary events E_1^c, E_2^c, and E_3^c implies that
$$NP(s,u,y)(1-2\eta-\eta^2)^2 \leq \#(s,u,y|s^N,u^N,y^N) \leq NP(s,u,y)(1+\eta)^4. \tag{A.13}$$
Since (1−η)²NP(u) ≤ #(u|u^N) ≤ (1+η)²NP(u) for u ∈ U, we obtain
$$\frac{(1-2\eta-\eta^2)^2}{(1+\eta)^2}\,\#(u|u^N)P(y|u) \leq \#(u,y|u^N,y^N) \leq \frac{(1+\eta)^4}{(1-\eta)^2}\,\#(u|u^N)P(y|u), \tag{A.14}$$
and it can be checked that y^N ∈ T^N_ζ(Y|u^N) for ζ ≜ (1+η)⁴/(1−η)² − 1. Moreover u^N ∈ T^N_{(1+η)²−1}(U) ⊆ T^N_ζ(U). Hence u^N ∈ R^N_ζ(U|y^N). Event E_4 occurs when among the other 2^{N(R+R_u)} − 1 auxiliary sequences there is at least one sequence ũ^N ≠ u^N that is in the reverse set R^N_ζ(U|y^N). Now we can write
$$\Pr\{E_4|y^N \in T^N_\zeta(Y)\} \leq (2^{N(R+R_u)}-1)\,Q(y^N) \leq 2^{N(R+R_u)}\,2^{-N[(1-\zeta)H(U)-(1+\zeta)^2H(U|Y)]} = 2^{-N\eta} \tag{A.15}$$
if R + R_u = (1−ζ)H(U) − (1+ζ)²H(U|Y) − η. Note that the first inequality in (A.15) comes from the union bound; the second follows from (A.11).

(f) For the total error probability we can write
$$\begin{aligned} \Pr\{E_1\cup E_2\cup E_3\cup E_4\} &= \Pr\{E_1\} + \Pr\{E_1^c, E_2\} + \Pr\{E_1^c, E_2^c, E_3\} + \Pr\{E_1^c, E_2^c, E_3^c, E_4\} \\ &\leq \Pr\{E_1\} + \Pr\{E_2|E_1^c\} + \Pr\{E_3|E_1^c,E_2^c\} + \Pr\{E_4|E_1^c,E_2^c,E_3^c\} \leq 4\eta, \end{aligned} \tag{A.16}$$
for all N large enough.

References

[B00] R.J. Barron, Systematic Hybrid Analog/Digital Signal Coding, Ph.D. dissertation, Massachusetts Inst. of Techn., June 2000.
[BCW03] R.J. Barron, B. Chen, and G.W. Wornell, "The duality between information embedding and source coding with side information and some applications," IEEE Trans. Inform. Theory, vol. IT-49, pp. 1159-1180, May 2003.
[C00] B. Chen, Design and Analysis of Digital Watermarking, Information Embedding, and Data Hiding Systems, Ph.D. dissertation, Massachusetts Inst. of Techn., June 2000.
[CW00] B. Chen and G.W. Wornell, "Quantization index modulation: A class of provably good methods for digital watermarking and information embedding," IEEE Trans. Inform. Theory, vol. IT-47, pp. 1423-1443, May 2001.
[C83] M.H.M. Costa, "Writing on dirty paper," IEEE Trans. Inform. Theory, vol. IT-29, pp. 439-441, May 1983.
[CT91] T.M. Cover and J.A. Thomas, Elements of Information Theory. Wiley, New York, 1991.
[FGD02] J. Fridrich, M. Goljan, and R. Du, "Lossless data embedding for all image formats," Proc. SPIE, Security and Watermarking of Multimedia Contents, San Jose, California, 2002.
[GP80] S. Gelfand and M. Pinsker, "Coding for a channel with random parameters," Problems of Control and Information Theory, vol. 9, pp. 19-31, 1980.
[HeG83] C. Heegard and A. El Gamal, "On the capacity of computer memory with defects," IEEE Trans. Inform. Theory, vol. IT-29, pp. 731-739, September 1983.
[KT77] A.V. Kuznetsov and B.S. Tsybakov, "Coding in a memory with defective cells," translated from Prob. Peredach. Inform., vol. 10, no. 2, pp. 52-60, April-June 1974.
[KW02] T. Kalker and F.M.J. Willems, "Capacity bounds and constructions for reversible data-hiding," Proc. Int. Conf. DSP, Santorini, Greece, July 1-3, 2002.
[KW03] T. Kalker and F.M.J. Willems, "Capacity bounds and code constructions for reversible data-hiding," IS&T/SPIE's 15th Ann. Symp. Electronic Imaging, January 20-24, 2003, Santa Clara, California. CDROM.
[MKW02] D. Maas, T. Kalker, and F. Willems, "Code construction for recursive reversible data-hiding," Multimedia and Security Workshop at ACM Multimedia 2002, December 1-6, 2002, Juan-les-Pins, France. CDROM.
[MoS03] P. Moulin and J.A. O'Sullivan, "Information-theoretic analysis of information hiding," IEEE Trans. Inform. Theory, vol. IT-49, pp. 563-593, March 2003.
[SCC01] A. Sutivong, T.M. Cover, and M. Chiang, "Trade-off between message and state information rates," Proc. ISIT 2001, Washington DC, June 24-29, 2001, p. 303.
[SCCK02] A. Sutivong, T.M. Cover, M. Chiang, and Young-Han Kim, "Rate vs. distortion trade-off for channels with state information," Proc. ISIT 2002, Lausanne, Switzerland, June 30 - July 5, 2002, p. 226.
[WvD01] F.M.J. Willems and M. van Dijk, "Codes for embedding information in grayscale signals," Proceedings 39th Allerton Conference, October 1-3, 2001, Monticello, Illinois.
[WK02] F.M.J. Willems and T. Kalker, "Methods for reversible embedding," Proc. 40th Annual Allerton Conference on Communication, Control, and Computing, Allerton House, Monticello, Illinois, Oct. 2-4, 2002. CDROM.
[W00] F.M.J. Willems, "An information-theoretical approach to information embedding," Proc. 21st Symp. Inform. and Comm. Theory in the Benelux, May 25-26, 2000, Wassenaar, Werkgemeenschap voor Informatie- en Communicatietheorie, Enschede, pp. 255-260.

Eindhoven University of Technology, Eindhoven, The Netherlands. Also Philips Research Laboratories, Eindhoven.
E-mail address: [email protected]

Philips Research Laboratories, Eindhoven, The Netherlands. Also Eindhoven University of Technology, Eindhoven.
E-mail address: [email protected]