TRANSACTIONS ON INFORMATION THEORY, VOL. 1, NO. 11, NOVEMBER 2002

Capacity of Steganographic Channels

Jeremiah J. Harmsen, Member, IEEE, and William A. Pearlman, Fellow, IEEE

Abstract— This work investigates a central problem in steganography: how much data can safely be hidden without being detected? To answer this question, a formal definition of steganographic capacity is presented. Once this has been defined, a general formula for the capacity is developed. The formula is applicable to a very broad spectrum of channels due to the use of an information-spectrum approach. This approach allows for the analysis of arbitrary steganalyzers as well as non-stationary, non-ergodic encoder and attack channels. After the general formula is presented, various simplifications are applied to gain insight into example hiding and detection methodologies. Finally, the context and applications of the work are summarized in a general discussion.

Index Terms— Steganographic capacity, stego-channel, steganalysis, steganography, information theory, information spectrum

I. INTRODUCTION

A. Background

Shannon's pioneering work provides bounds on the amount of information that can be transmitted over a noisy channel. His results show that capacity is an intrinsic property of the channel itself. This work takes a similar viewpoint in seeking to find the amount of information that may be transferred over a stego-channel, as seen in Figure 1. The stego-channel is equivalent to the classic channel with the addition of the detection function and attack channel. For the classic channel, a transmission is considered successful if the decoder properly determines which message the encoder has sent. In the stego-channel, a transmission is successful not only if the decoder properly determines the sent message, but also if the detection function is not triggered.

This additional constraint on the channel use leads to the fundamental view that the capacity of a stego-channel is an intrinsic property of both the channel and the detection function. That is, the properties of the detection function influence the capacity just as much as the noise in the channel.

B. Previous Work

There have been a number of applications of information theory to the steganographic capacity problem [1], [2]. These works give capacity results under distortion constraints on the hider as well as an active adversary. The additional constraint that the stego-signal retain the same distribution as the cover-signal serves as the steganalysis detection function.

Somewhat less work exists exploring capacity with arbitrary detection functions. These works are written from a steganalysis perspective [3], [4] and accordingly give heavy consideration to the detection function. This work differs from previous work in a number of aspects.

This work was carried out at Rensselaer Polytechnic Institute and was supported by the Air Force Research Labs. J. Harmsen is now with Google Inc. in Mountain View, CA 94043, USA; E-mail: [email protected].

W. Pearlman is with the Elec. Comp. and Syst. Engineering Dept., Rensselaer Polytechnic Institute, Troy, NY 12180-3590, USA; E-mail: [email protected].

May 2, 2006                                                                DRAFT
Most notable is the use of information-spectrum methods that allow for the analysis of arbitrary detection algorithms. This eliminates the need to restrict interest to detection algorithms that operate on sample averages or behave consistently. Instead, the detection functions may be instantaneous; that is, the properties of a detector for n samples need not have any relation to those of the same detector for n + 1 samples.

Another substantial difference is the presence of noise before the detector. This placement enables the modeling of common signal processing distortions such as compression, quantization, etc. The location of the noise adds complexity not only because of confusion at the decoder, but also because a signal carefully crafted to avoid detection may be corrupted into one that will trigger the detector.

Finally, the consideration of a cover-signal and distortion constraint in the encoding function is omitted. This is due to the view that steganographic capacity is a property of the channel and the detection function. This viewpoint, along with the above differences, makes a direct comparison to previous work somewhat difficult, although possible with a number of simplifications explored in Section V.

Fig. 1. Active system with noise. The encoder f_n(m) produces X^n; the encoder noise W^n(y|x) yields Y^n, which is observed by the steganalyzer g_n(y); the attack channel A^n(z|y) produces Z^n; and the decoder φ_n(z) recovers the message. The composite encoder-attack channel is Q^n(z|x).

C. Groundwork

This section lays the groundwork for determining the amount of information that may be transferred over the channel shown in Figure 1. Here, the adversary's goal is to disrupt any steganographic communication between the encoder and decoder. To accomplish this, a steganalyzer is used to intercept steganographic messages, and an attack function may alter the signal.

We now formally define each of the components in the system, beginning with the random variable notation.

1) Random Variables: Random variables are denoted by capital letters, e.g. X. Realizations of these random variables are denoted by lowercase letters, e.g. x. Each random variable is defined over a domain denoted with a script letter 𝒳. A sequence of n random variables is denoted X^n = (X_1, ..., X_n). Similarly, an n-length sequence of random variable realizations is denoted x = (x_1, ..., x_n) ∈ 𝒳^n. The probability of X taking value x ∈ 𝒳 is p_X(x).

Following a signal through Figure 1, we begin in the space of n-length stego-signals, denoted 𝒳^n. The signal then undergoes some distortion as it travels through the encoder-channel. This results in an element from the corrupted stego-signal space 𝒴^n. Finally, the signal is attacked to produce the attacked stego-signal in the space 𝒵^n.

2) Steganalyzer: The steganalyzer is a function g_n : 𝒴^n → {0, 1} that classifies a sequence of signals from 𝒴^n into one of two categories: containing steganographic information, and not containing steganographic information. The function is defined for all y ∈ 𝒴^n as,

    g_n(y) = 1, if y is steganographic; 0, if y is not steganographic.   (1)

The specific type of function may be that of a support vector machine or a Bayesian classifier, etc. A steganalyzer sequence is denoted as,

    g := {g_1, g_2, g_3, ...},   (2)

where g_n : 𝒴^n → {0, 1}. The set of all n-length steganalyzers is denoted G_n.

3) Permissible Set: For any steganalyzer g_n, the space of signals 𝒴^n is split into the permissible set and the impermissible set, defined below. The permissible set P_{g_n} ⊆ 𝒴^n is the inverse image of 0 under g_n. That is,

    P_{g_n} := g_n^{-1}({0}) = {y ∈ 𝒴^n : g_n(y) = 0}.   (3)

The permissible set is the set of all signals of 𝒴^n that the given steganalyzer g_n will classify as non-steganographic.

Since each steganalyzer has a binary range, a steganalyzer sequence may be completely described by a sequence of permissible sets. To denote a steganalyzer sequence in such a way the following notation is used,

    g ≅ {P_1, P_2, P_3, ...},

where P_n ⊆ 𝒴^n is the permissible set for g_n.

Fig. 2. Permissible and impermissible sets: g_n maps 𝒴^n onto {0, 1}, with P_{g_n} = g_n^{-1}({0}) and I_{g_n} = g_n^{-1}({1}).

4) Impermissible Set: The impermissible set I_{g_n} ⊆ 𝒴^n is the inverse image of 1 under g_n. That is,

    I_{g_n} := g_n^{-1}({1}) = {y ∈ 𝒴^n : g_n(y) = 1}.   (4)

For a given g_n the impermissible set is the set of all signals in 𝒴^n that g_n will classify as steganographic.

Example 1: Consider the illustrative sum steganalyzer defined for binary channel outputs (𝒴 = {0, 1}). The steganalyzer is defined for y = (y_1, ..., y_n) as,

    g_n(y) = 1, if Σ_{i=1}^{n} y_i > n/2; 0, else.   (5)

The permissible sets for n = 1, 2, 3, 4 are shown in Table I.

TABLE I
SUM STEGANALYZER PERMISSIBLE SETS

    P_1 = {(0)}
    P_2 = {(0,0),(0,1),(1,0)}
    P_3 = {(0,0,0),(1,0,0),(0,1,0),(0,0,1)}
    P_4 = {(0,0,0,0),(1,0,0,0),(0,1,0,0),(0,0,1,0),(0,0,0,1),
           (1,1,0,0),(1,0,1,0),(1,0,0,1),(0,1,1,0),(0,1,0,1),(0,0,1,1)}

5) Memoryless Steganalyzers: A memoryless steganalyzer g = {g_n}_{n=1}^{∞} is one where each g_n is defined for y = (y_1, y_2, ..., y_n) as,

    g_n(y) = 1, if ∃ i ∈ {1, 2, ..., n} such that g(y_i) = 1; 0, if g(y_i) = 0 ∀ i ∈ {1, 2, ..., n},   (6)

where g ∈ G_1 is said to specify g_n (and g). To denote that a steganalyzer sequence is memoryless the following notation will be used: g = {g}.

The analysis of the memoryless steganalyzer is motivated by the current real-world implementation of detection systems. As an example we may consider each y_i to be a digital image sent via email. When sending n emails, the hider attaches one of the y_i's to each message.
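As a concrete illustration (mine, not the paper's), the sum steganalyzer of Example 1 and its permissible sets can be enumerated by brute force for small n; the resulting set sizes reproduce Table I.

```python
from itertools import product

def g_sum(y):
    """Sum steganalyzer of Example 1: flag y when its sum exceeds n/2."""
    return 1 if sum(y) > len(y) / 2 else 0

def permissible_set(n):
    """P_{g_n}: all binary n-tuples classified as non-steganographic."""
    return [y for y in product((0, 1), repeat=n) if g_sum(y) == 0]

sizes = [len(permissible_set(n)) for n in range(1, 5)]
print(sizes)  # [1, 3, 4, 11], matching Table I
```

Note that P_2 ≠ P_1 × P_1, so these permissible sets lack the product structure a memoryless steganalyzer would impose: the sum steganalyzer is not memoryless.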
The entire sequence of images is considered to be y. Typically, steganalyzers do not make use of the entire sequence y. Instead each image is sequentially processed by a given steganalyzer g, where if any of the y_i trigger the detector the entire sequence of emails is treated as steganographic.

Clearly, for a memoryless steganalyzer g_n defined by g we have that,

    P_{g_n} = P_g × P_g × ··· × P_g   (n factors).   (7)

That is, the permissible set of g_n is the n-dimensional product of P_g.

D. Channels

We now define two channels. The first models inherent distortions occurring between the encoder and detection function, such as the compression of the stego-signal. The second models a malicious attack by an active adversary, such as cropping or additive noise.

1) Encoder-Noise Channel: The encoder-noise channel is denoted as W^n, where W^n : 𝒴^n × 𝒳^n → [0, 1] has the following property for all x ∈ 𝒳^n,

    W^n(y|x) := Pr{Y^n = y | X^n = x}.

The channel represents the conditional probabilities of the steganalyzer receiving y ∈ 𝒴^n when x ∈ 𝒳^n is sent. The random variable Y resulting from transmitting X through the channel W will be denoted X →(W) Y.

We denote an arbitrary encoder-noise channel as the sequence of transition probabilities,

    W := {W^1, W^2, W^3, ...}.

2) Attack Channel: The attack function maps 𝒴^n to 𝒵^n as,

    A^n(z|y) = Pr{Z^n = z | Y^n = y}.   (8)

The attack channel may be deterministic or probabilistic. Similarly to the encoder-noise channel, we denote an arbitrary attack channel as the sequence of transition probabilities,

    A := {A^1, A^2, A^3, ...}.

3) Encoder-Attack Channel: The encoder-attack channel, or simply channel, is a function Q^n : 𝒳^n → 𝒵^n defined to model the combined effect of the encoder-noise and attack channels. Thus,

    Q^n(z|x) = Σ_{y∈𝒴^n} A^n(z|y) W^n(y|x).   (9)

The specification of Q^n by A^n and W^n is denoted Q^n = A^n ∘ W^n. The arbitrary encoder-attack channel is a sequence of transition probabilities,

    Q = {Q^1, Q^2, Q^3, ...}.   (10)

We will express the dependence between the arbitrary encoder-noise and attack channels and the arbitrary encoder-attack channel as Q = A ∘ W.

4) Memoryless Channels: In the case where the channel distortions act independently and identically on each input letter x_i, we say the channel is memoryless. In this instance the n-length transition probabilities can be written as,

    W^n(y|x) = ∏_{i=1}^{n} W(y_i|x_i),   (11)

where W is said to define the channel. To denote that a channel is memoryless and defined by W we will write W = {W}.

E. Encoder and Decoder

The purpose of the encoder and decoder is to transmit and receive information across a channel.
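Before moving to codes, a small numeric sketch of the composition (9): for memoryless channels it reduces to a per-letter matrix product. The two binary symmetric channels below are assumed parameters of mine, not an example from the paper.

```python
def compose(A, W):
    """Per-letter form of Eq. (9): Q(z|x) = sum_y A(z|y) * W(y|x)."""
    n_y = len(W)                     # channels stored as matrices M[out][in]
    return [[sum(A[z][y] * W[y][x] for y in range(n_y))
             for x in range(len(W[0]))]
            for z in range(len(A))]

W = [[0.9, 0.1], [0.1, 0.9]]   # encoder noise: BSC with crossover 0.1 (assumed)
A = [[0.8, 0.2], [0.2, 0.8]]   # attack channel: BSC with crossover 0.2 (assumed)
Q = compose(A, W)
# the composite is again a BSC, with crossover 0.1*0.8 + 0.9*0.2 = 0.26
```

Each column of Q still sums to one, as any transition matrix must.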
The information to be transferred is assumed to be from a uniformly distributed message set, denoted M_n, with cardinality M_n.

The encoding function embeds a message into a stego-signal. That is, f_n : M_n → 𝒳^n. The element of 𝒳^n to which the i-th message maps is called the codeword for i and is denoted u_i. That is,

    f_n(i) = u_i,   i ∈ {1, ..., M_n}.

The collection of codewords C_n = {u_1, ..., u_{M_n}} is called the code. The rate of an encoding function is given as,

    R_n := (1/n) log M_n.

The decoding function φ_n : 𝒵^n → M_n maps a corrupted stego-signal to a message. The decoder is defined by the set of decoding regions for the messages. The decoding regions D_1, ..., D_{M_n} are disjoint sets that cover 𝒵^n, defined such that,

    D_m := φ_n^{-1}({m}) = {z ∈ 𝒵^n : φ_n(z) = m},

for m = 1, ..., M_n.

Next, two important terms are presented that allow for the analysis of steganographic systems. The first is the probability the decoder makes a mistake, called the probability of error. The second is the probability the steganalyzer is triggered, called the probability of detection. In both cases they are calculated for a given code C_n = {u_1, ..., u_{M_n}}, encoder-channel W^n, attack channel A^n and impermissible set I_{g_n} (corresponding to some g_n).

The probability of error in decoding the message can be found as,

    ε_n = (1/M_n) Σ_{i=1}^{M_n} Q^n(D_i^c | u_i),   (12)

where Q^n = A^n ∘ W^n. Similarly, the probability of detection for the steganalyzer is calculated as,

    δ_n = (1/M_n) Σ_{i=1}^{M_n} W^n(I_{g_n} | u_i).   (13)

F. Stego-Channel

A steganographic channel or stego-channel is a triple (W, g, A), where W is an arbitrary encoder-noise channel, g is a steganalyzer sequence, and A is an arbitrary attack channel. To reinforce the notion that a stego-channel is defined by a sequence of triples we will typically write (W, g, A) = {(W^n, g_n, A^n)}_{n=1}^{∞}.

1) Discrete Stego-Channel: A discrete stego-channel is one where at least one of the following holds:

    |𝒳| < ∞,   |𝒴| < ∞,   |𝒵| < ∞,   or |P_{g_n}| < ∞ ∀n.

2) Discrete Memoryless Stego-Channel: A discrete memoryless stego-channel (DMSC) is a stego-channel where,

    1) (W, g, A) is discrete
    2) W is memoryless
    3) g is memoryless
    4) A is memoryless

A DMSC is said to be defined by the triple (W, g, A) and will be denoted (W, g, A) = {(W, g, A)}.

G. Steganographic Capacity

The secure capacity tells us how much information can be transferred with arbitrarily low probabilities of error and detection.

An (n, M_n, ε_n, δ_n)-code (for a given stego-channel) consists of an encoder and decoder. The encoder and decoder are capable of transferring one of M_n messages in n uses of the channel with an average probability of error of less than (or equal to) ε_n and a probability of detection of less than (or equal to) δ_n.
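To make (12) and (13) concrete, consider a hypothetical toy system of my construction (not the paper's): a two-message repetition code of length 3, a BSC(0.1) encoder-noise channel, a passive adversary (Z^n = Y^n, so Q^n = W^n), majority-vote decoding, and the sum steganalyzer of Example 1.

```python
from itertools import product

p = 0.1                                    # assumed BSC crossover probability
codebook = {1: (0, 0, 0), 2: (1, 1, 1)}    # M_3 = 2 messages, repetition code

def W3(y, x):
    """Memoryless encoder noise, Eq. (11): product of per-letter BSC terms."""
    out = 1.0
    for yi, xi in zip(y, x):
        out *= (1 - p) if yi == xi else p
    return out

def decode(y):                             # majority-vote decoding regions
    return 2 if sum(y) >= 2 else 1

def g3(y):                                 # sum steganalyzer, n = 3
    return 1 if sum(y) > 1.5 else 0

ys = list(product((0, 1), repeat=3))
# Eq. (12) with a passive adversary (Q^3 = W^3), averaged over both messages
eps = sum(W3(y, u) for m, u in codebook.items()
          for y in ys if decode(y) != m) / len(codebook)
# Eq. (13): average probability of landing in the impermissible set
delta = sum(W3(y, u) for u in codebook.values()
            for y in ys if g3(y) == 1) / len(codebook)
print(eps, delta)   # eps ≈ 0.028, delta ≈ 0.5
```

The code decodes reliably (ε_3 ≈ 0.028) but is wildly insecure: the all-ones codeword lands in the impermissible set with probability 0.972, giving δ_3 = 0.5. Driving δ_n down as well is exactly what the secure capacity defined above demands.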
1) Secure Capacity: A rate R is said to be securely achievable for a stego-channel (W, g, A) = {(W^n, g_n, A^n)}_{n=1}^{∞} if there exists a sequence of (n, M_n, ε_n, δ_n)-codes such that:

    1) lim_{n→∞} ε_n = 0
    2) lim_{n→∞} δ_n = 0
    3) lim inf_{n→∞} (1/n) log M_n ≥ R

The secure capacity of a stego-channel (W, g, A) is denoted as C(W, g, A). This is defined as the supremum of all securely achievable rates for (W, g, A).

H. (ε, δ)-Secure Capacity

A rate R is said to be (ε, δ)-securely achievable for a stego-channel (W, g, A) = {(W^n, g_n, A^n)}_{n=1}^{∞} if there exists a sequence of (n, M_n, ε_n, δ_n)-codes such that:

    1) lim sup_{n→∞} ε_n ≤ ε
    2) lim sup_{n→∞} δ_n ≤ δ
    3) lim inf_{n→∞} (1/n) log M_n ≥ R

II. SECURE CAPACITY FORMULA

A. Information-Spectrum Methods

The information-spectrum method [5], [6], [7], [8], [9] is a generalization of information theory created to apply to systems where either the channel or its inputs are not necessarily ergodic or stationary. Its use is required in this work because the steganalyzer is not assumed to have any ergodic or stationary properties.

The information-spectrum method uses the general source (also called general sequence) defined as,

    X := {X^n = (X_1^(n), X_2^(n), ..., X_n^(n))}_{n=1}^{∞},   (14)

where each X_m^(n) is a random variable defined over the alphabet 𝒳. It is important to note that the general source makes no assumptions about consistency, ergodicity, or stationarity.

The information-spectrum method also uses two novel quantities defined for sequences of random variables, called the lim sup and lim inf in probability. The limsup in probability of a sequence of random variables {Z_n}_{n=1}^{∞} is defined as,

    p-lim sup_{n→∞} Z_n := inf{α : lim_{n→∞} Pr{Z_n > α} = 0}.

Similarly, the liminf in probability of a sequence of random variables {Z_n}_{n=1}^{∞} is,

    p-lim inf_{n→∞} Z_n := sup{β : lim_{n→∞} Pr{Z_n < β} = 0}.

The spectral sup-entropy rate of a general source X = {X^n}_{n=1}^{∞} is defined as,

    H̄(X) := p-lim sup_{n→∞} (1/n) log [1 / p_{X^n}(X^n)].   (15)

Analogously, the spectral inf-entropy rate of a general source X = {X^n}_{n=1}^{∞} is defined as,

    H̲(X) := p-lim inf_{n→∞} (1/n) log [1 / p_{X^n}(X^n)].   (16)

The spectral entropy rate has a number of natural properties, such as, for any X, H̄(X) ≥ H̲(X) ≥ 0 [5, Thm. 1.7.2].

The spectral sup-mutual information rate for the pair of general sequences (X, Y) = {(X^n, Y^n)}_{n=1}^{∞} is defined as,

    Ī(X; Y) := p-lim sup_{n→∞} (1/n) i(X^n; Y^n),   (17)

where,

    i(X^n; Y^n) := log [p_{Y^n|X^n}(Y^n|X^n) / p_{Y^n}(Y^n)].   (18)

Likewise the spectral inf-mutual information rate for the pair of general sequences (X, Y) = {(X^n, Y^n)}_{n=1}^{∞} is defined as,

    I̲(X; Y) := p-lim inf_{n→∞} (1/n) i(X^n; Y^n).   (19)
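For an i.i.d. source these spectral quantities collapse to the Shannon entropy: (1/n) log 1/p_{X^n}(X^n) concentrates around H(X), so (15) and (16) coincide. The check below (an illustration under assumed parameters, not from the paper) evaluates the deviation probability exactly for a Bernoulli(0.2) source.

```python
from math import comb, log2

q, tol = 0.2, 0.1
H = -q * log2(q) - (1 - q) * log2(1 - q)       # Shannon entropy ≈ 0.7219 bits

def deviation_prob(n):
    """Pr{ |(1/n) log2 1/p(X^n) - H| > tol } for i.i.d. Bernoulli(q)."""
    total = 0.0
    for k in range(n + 1):                     # k = number of ones in X^n
        z = -(k * log2(q) + (n - k) * log2(1 - q)) / n
        if abs(z - H) > tol:
            total += comb(n, k) * q**k * (1 - q)**(n - k)
    return total

probs = [deviation_prob(n) for n in (10, 100, 1000)]
# probs shrinks toward 0, so p-limsup = p-liminf = H for this source
```

The steganalyzer-constrained sources of interest here are exactly the ones for which this collapse can fail, which is why the p-lim quantities are needed.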
B. Information-Spectrum Results

This section lists some of the fundamental results from information-spectrum theory [5] that will be used in the remainder of the paper.

    H̲(X) ≤ lim inf_{n→∞} (1/n) H(X^n)   (20)

    Ī(X; Y) ≤ H̄(Y) − H̲(Y|X)   (21)

    I̲(X; Y) ≥ H̲(Y) − H̄(Y|X)   (22)

C. Secure Sequences

1) Secure Input Sequences: For a given stego-channel (W, g, A), a general source X = {X^n}_{n=1}^{∞} is called δ-secure if the resulting Y = {Y^n}_{n=1}^{∞} satisfies,

    lim sup_{n→∞} Pr{g_n(Y^n) = 1} ≤ δ,   (23)

or either of the following equivalent conditions,

    lim sup_{n→∞} p_{Y^n}(I_{g_n}) ≤ δ,   (24)

    lim inf_{n→∞} p_{Y^n}(P_{g_n}) ≥ 1 − δ.   (25)

The set of all general sources that are δ-secure is denoted S_δ, that is,

    S_δ := {X : lim sup_{n→∞} Σ_{x∈𝒳^n} W^n(I_{g_n}|x) p_{X^n}(x) ≤ δ},   (26)

where X = {X^n}_{n=1}^{∞}. The set for δ = 0 is called the secure input set and denoted S_0.

2) Secure Output Sequences: For a given steganalyzer sequence g = {g_n}_{n=1}^{∞}, a general sequence Y = {Y^n}_{n=1}^{∞} is called δ-secure if,

    lim sup_{n→∞} Pr{g_n(Y^n) = 1} ≤ δ.   (27)

The set of all general output sequences that are δ-secure is denoted T_δ, that is,

    T_δ := {Y = {Y^n}_{n=1}^{∞} : lim sup_{n→∞} p_{Y^n}(I_{g_n}) ≤ δ}.   (28)

The set for δ = 0 is called the secure output set and denoted T_0.

D. (ε, δ)-Channel Capacity

We are now prepared to derive the first fundamental result: the (ε, δ)-channel capacity. This capacity will make use of the following definition,

    J(R|X) := lim sup_{n→∞} Pr{(1/n) i(X^n; Z^n) ≤ R}
            = lim sup_{n→∞} Pr{(1/n) log [p_{Z^n|X^n}(Z^n|X^n) / p_{Z^n}(Z^n)] ≤ R}
            = lim sup_{n→∞} Pr{(1/n) log [Q^n(Z^n|X^n) / p_{Z^n}(Z^n)] ≤ R}.

The proof is the general ε-capacity proof given by Han [5], [6], with the restriction to the secure input set.

Theorem 2.1 ((ε, δ)-Channel Capacity): The (ε, δ)-channel capacity C(ε, δ|W, g, A) of a stego-channel (W, g, A) is given by,

    C(ε, δ|W, g, A) = sup_{X∈S_δ} sup{R : J(R|X) ≤ ε},   (29)

for any 0 ≤ ε < 1 and 0 ≤ δ < 1.

Proof: This proof is based on [5], [6]. Let C = sup_{X∈S_δ} sup{R : J(R|X) ≤ ε}, and Q^n = A^n ∘ W^n.

Achievability: Choose any ε ≥ 0 and δ > 0. Let R = C − 3γ, for any γ > 0. By the definition of C we have that there exists an X ∈ S_δ such that,

    sup{R : J(R|X) ≤ ε} ≥ C − γ = R + 2γ.   (30)

Similarly we may find an R′ > R + γ such that J(R′|X) ≤ ε. By the monotonicity of J(R|X) we have that,

    J(R + γ|X) ≤ ε.   (31)

Next, by letting M_n = e^{nR} we have that,

    lim inf_{n→∞} (1/n) log M_n ≥ R.
Using Feinstein's Lemma [10] we have that there exists an (n, M_n, ε_n)-code with,

    ε_n ≤ Pr{(1/n) log [Q^n(Z^n|X^n) / p_{Z^n}(Z^n)] ≤ (1/n) log M_n + γ} + e^{−nγ}.   (32)

As (1/n) log M_n = R for all n we have,

    ε_n ≤ Pr{(1/n) log [Q^n(Z^n|X^n) / p_{Z^n}(Z^n)] ≤ R + γ} + e^{−nγ}.   (33)

Taking the lim sup of each side we have,

    lim sup_{n→∞} ε_n ≤ J(R + γ|X),   (34)

which with J(R + γ|X) ≤ ε shows that lim sup_{n→∞} ε_n ≤ ε. Finally, since X ∈ S_δ we have that,

    lim sup_{n→∞} p_{Y^n}(I_{g_n}) ≤ δ.   (35)

Converse: Let R > C, and choose γ > 0 such that R − 2γ > C. Assume that R is (ε, δ)-achievable, so there exists an (n, M_n, ε_n, δ_n)-code such that,

    lim inf_{n→∞} (1/n) log M_n ≥ R,   (36)

    lim sup_{n→∞} ε_n ≤ ε,   (37)

and

    lim sup_{n→∞} δ_n ≤ δ.   (38)

Let X = {X^n}_{n=1}^{∞} be the distribution of this code and let Z be the corresponding output, where X →(Q) Z. As R − 2γ > C ≥ sup{R : J(R|X) ≤ ε} we must have,

    J(R − 2γ|X) > ε.   (39)

Using the Feinstein Dual [5], [6] we have,

    ε_n ≥ Pr{(1/n) log [Q^n(Z^n|X^n) / p_{Z^n}(Z^n)] ≤ (1/n) log M_n − γ} − e^{−nγ}.   (40)

Using the property of lim inf we have for all n > n_0 that,

    (1/n) log M_n ≥ R − γ.   (41)

Thus for n > n_0 we have,

    ε_n ≥ Pr{(1/n) log [Q^n(Z^n|X^n) / p_{Z^n}(Z^n)] ≤ R − 2γ} − e^{−nγ}.   (42)

Taking the lim sup of both sides,

    lim sup_{n→∞} ε_n ≥ J(R − 2γ|X).   (43)

Since J(R − 2γ|X) > ε by (39), we see that,

    lim sup_{n→∞} ε_n > ε.   (44)

E. Secure Channel Capacity

The next result deals with a special case of (ε, δ)-capacity, namely the one where ε = δ = 0. The secure channel capacity is the maximum amount of information that may be sent over a channel with arbitrarily small probabilities of error and detection.

Fig. 3. Stego-channels. The four formulations: the full channel (W, g, A) with encoder noise W^n(y|x), steganalyzer g_n(y) and attack A^n(z|y); the noiseless-encoder channel (·, g, A) with X^n = Y^n; the passive-adversary channel (W, g, ·) with Y^n = Z^n; and the noiseless, passive channel (·, g, ·) with X^n = Y^n = Z^n.

The four potential formulations for our model are shown in Figure 3. The capacity of the stego-channel (W, g, A) is shown in Theorem 2.2 to follow and specialized to the other cases in Theorems 2.3, 2.4 and 2.5. The results of these capacities are summarized in Table II.
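Before stating the theorems, it may help to see the information density (1/n) i(X^n; Z^n) that drives J(R|X) in a case where everything is computable. For a uniform i.i.d. input over a memoryless BSC (a hypothetical channel choice of mine, not the paper's), the density depends only on the number of bit flips, so the probability inside J(R|X) can be evaluated exactly.

```python
from math import comb, log2

rho = 0.11                                   # assumed BSC crossover
C = 1 + rho * log2(rho) + (1 - rho) * log2(1 - rho)   # 1 - h(rho) ≈ 0.5 bits

def pr_density_at_most(R, n):
    """Pr{(1/n) i(X^n;Z^n) <= R} for uniform input on a memoryless BSC(rho)."""
    total = 0.0
    for k in range(n + 1):                   # k = number of flipped bits
        density = 1 + (k * log2(rho) + (n - k) * log2(1 - rho)) / n
        if density <= R:
            total += comb(n, k) * rho**k * (1 - rho)**(n - k)
    return total

below = pr_density_at_most(C - 0.1, 1000)    # -> 0, so J(C - 0.1 | X) = 0
above = pr_density_at_most(C + 0.1, 1000)    # -> 1, so J(C + 0.1 | X) = 1
```

The step of J(R|X) from 0 to 1 at R = 1 − h(ρ) recovers the classical BSC capacity; the secure-capacity results restrict the same optimization to inputs in S_0.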
Theorem 2.2 (Secure Capacity): The secure channel capacity C(W, g, A) of a stego-channel (W, g, A) is given by,

    C(W, g, A) = sup_{X∈S_0} I̲(X; Z).   (45)

Proof: We apply Theorem 2.1 with ε = 0 and δ = 0. This gives,

    C(W, g, A) = C(0, 0|W, g, A)   (46a)
               = sup_{X∈S_0} sup{R : J(R|X) ≤ 0}   (46b)
               = sup_{X∈S_0} sup{R : lim sup_{n→∞} Pr{(1/n) i(X^n; Z^n) ≤ R} ≤ 0}   (46c)
               = sup_{X∈S_0} I̲(X; Z).   (46d)

Here the last line is due to the definition of p-lim inf.

Theorem 2.3 (Noiseless Encoder, Active Adversary): The secure capacity of a stego-channel (·, g, A), with a noiseless encoder and active adversary, denoted C(·, g, A), is given by,

    C(·, g, A) = sup_{Y∈T_0} I̲(Y; Z),   (47)

where

    I̲(Y; Z) = p-lim inf_{n→∞} (1/n) log [A^n(Z^n|Y^n) / p_{Z^n}(Z^n)].   (48)

Proof: From Theorem 2.2 we have,

    C(W, g, A) = sup_{X∈S_0} I̲(X; Z).   (49)

Since there is no encoder noise we have that X = Y and S_0 = T_0, so

    C(·, g, A) = sup_{X∈S_0} I̲(X; Z)   (50)
               = sup_{Y∈T_0} I̲(Y; Z).   (51)

Theorem 2.4 (Passive Adversary): The secure channel capacity with a passive adversary, denoted C(W, g), of a stego-channel (W, g, ·) is given by,

    C(W, g) = sup_{X∈S_0} I̲(X; Y).   (52)

Proof: From the above theorem we have,

    C(W, g, A) = sup_{X∈S_0} I̲(X; Z).   (53)

Since the adversary is passive, we have that Z = Y.

Theorem 2.5 (Noiseless Encoder, Passive Adversary): The secure capacity of a stego-channel (·, g, ·), with a noiseless encoder and passive adversary, denoted C(·, g), is given by,

    C(·, g) = sup_{Y∈T_0} H̲(Y).   (54)

Proof: From Theorem 2.2 we have,

    C(W, g, A) = sup_{X∈S_0} I̲(X; Z).   (55)

Since the adversary is passive, we have that Z = Y, and since there is no encoder noise we have that X = Y as well as S_0 = T_0. Thus,

    C(·, g) = sup_{X∈S_0} I̲(X; X)   (56)
            = sup_{Y∈T_0} H̲(Y).   (57)

Here we have made use of the result that since X = Y, we have p_{X^n|Y^n}(X^n|Y^n) = 1 and,

    I̲(Y; X) = p-lim inf_{n→∞} (1/n) log [p_{X^n|Y^n}(X^n|Y^n) / p_{Y^n}(Y^n)]   (58)
             = p-lim inf_{n→∞} (1/n) log [1 / p_{Y^n}(Y^n)]   (59)
             = H̲(Y).   (60)

TABLE II
SECURE CAPACITY FORMULAS

    Secure Capacity                         | Noise     | Attack  | Thm.
    C(W, g, A) = sup_{X∈S_0} I̲(X; Z)       | W         | A       | 2.2
    C(·, g, A) = sup_{Y∈T_0} I̲(Y; Z)       | Noiseless | A       | 2.3
    C(W, g)    = sup_{X∈S_0} I̲(X; Y)       | W         | Passive | 2.4
    C(·, g)    = sup_{Y∈T_0} H̲(Y)          | Noiseless | Passive | 2.5

F. Strong Converse

A stego-channel (W, g, A) is said to satisfy the ε-strong converse property if for any R > C(0, δ|W, g, A), every (n, M_n, ε_n, δ_n)-code with,

    lim inf_{n→∞} (1/n) log M_n ≥ R,

and

    lim sup_{n→∞} δ_n ≤ δ,

satisfies,

    lim_{n→∞} ε_n = 1.

Thus if a channel satisfies the ε-strong converse,

    C(ε, δ|W, g, A) = C(0, δ|W, g, A),   (61)

for any ε ∈ [0, 1).

Theorem 2.6 (ε-Strong Converse): A stego-channel (W, g, A) satisfies the ε-strong converse property (for a fixed δ) if and only if,

    sup_{X∈S_δ} I̲(X; Z) = sup_{X∈S_δ} Ī(X; Z).   (62)

Proof: This proof is based on [5], [6]. First assume sup_{X∈S_δ} I̲(X; Z) = sup_{X∈S_δ} Ī(X; Z). Let R = C(0, δ|W, g, A) + 3γ with γ > 0. Consider an (n, M_n, ε_n, δ_n)-code with,

    lim inf_{n→∞} (1/n) log M_n ≥ R,

and lim sup_{n→∞} δ_n ≤ δ. Let X represent the uniform input due to this code and Z the output after the channel Q = A ∘ W. From the Feinstein Dual [5], [6] we know,

    ε_n ≥ Pr{(1/n) i(X^n; Z^n) ≤ (1/n) log M_n − γ} − e^{−nγ}.   (63)

We also know there exists n_0 such that for all n > n_0,

    (1/n) log M_n ≥ R − γ,   (64)

so for n > n_0,

    ε_n ≥ Pr{(1/n) i(X^n; Z^n) ≤ R − 2γ} − e^{−nγ}.   (65)

We now show that the probability term above tends to 1. Using Theorem 2.2 we have,

    R = C(0, δ|W, g, A) + 3γ   (66)
      = sup_{X∈S_δ} I̲(X; Z) + 3γ   (67)
      = sup_{X∈S_δ} Ī(X; Z) + 3γ.   (68)

Rewriting gives,

    R − 2γ = sup_{X∈S_δ} Ī(X; Z) + γ.   (69)

By the definition of Ī(X; Z) we finally have,

    lim_{n→∞} Pr{(1/n) i(X^n; Z^n) ≤ R − 2γ} = 1,   (70)

which together with (65) shows that lim_{n→∞} ε_n = 1.

For the other direction assume,

    lim_{n→∞} ε_n = 1,   (71)

and,

    lim sup_{n→∞} δ_n ≤ δ.   (72)

Set R = C(0, δ|W, g, A) + γ for any γ > 0 and set M_n = e^{nR}. Clearly,

    lim inf_{n→∞} (1/n) log M_n = R > C(0, δ|W, g, A).
For any X ∈ S_δ (and its corresponding Z), using Feinstein's Lemma [10] we have an (n, M_n, ε_n)-code satisfying,

    ε_n ≤ Pr{(1/n) i(X^n; Z^n) ≤ R + γ} + e^{−nγ}.   (73)

From the error assumption we see that,

    lim_{n→∞} Pr{(1/n) i(X^n; Z^n) ≤ R + γ} = 1.   (74)

This means that,

    R + γ ≥ Ī(X; Z),   (75)

and since X ∈ S_δ is arbitrary we have,

    R + γ ≥ sup_{X∈S_δ} Ī(X; Z).   (76)

Substituting we have that,

    sup_{X∈S_δ} Ī(X; Z) ≤ R + γ   (77)
                        = C(0, δ|W, g, A) + 2γ   (78)
                        = sup_{X∈S_δ} I̲(X; Z) + 2γ.   (79)

As γ is arbitrarily close to 0 we have,

    sup_{X∈S_δ} Ī(X; Z) ≤ sup_{X∈S_δ} I̲(X; Z).   (80)

Also, by definition,

    sup_{X∈S_δ} Ī(X; Z) ≥ sup_{X∈S_δ} I̲(X; Z),   (81)

showing equality and completing the proof.

G. Bounds

We now derive a number of useful bounds on the spectral entropy of an output sequence in relation to the permissible set. These bounds will then be used to prove general bounds for steganographic systems, and see further application in Section III.

Theorem 2.7 (Spectral inf-entropy bound): For a discrete g = {P_n}_{n=1}^{∞} with corresponding secure output set T_0,

    sup_{Y∈T_0} H̲(Y) = lim inf_{n→∞} (1/n) log |P_n|.   (82)

Proof: Let U(A) represent the uniform distribution on a set A. Since Y* = {U(P_n)}_{n=1}^{∞} ∈ T_0 we have,

    sup_{Y∈T_0} H̲(Y) ≥ H̲(Y*)   (83a)
                      = lim inf_{n→∞} (1/n) log |P_n|.   (83b)

Now assume there exists Y ∈ T_0 with Y = {Ȳ^n}_{n=1}^{∞}, such that,

    H̲(Y) = H̲(Y*) + 3γ,   (84)

for any γ > 0. This means that,

    lim_{n→∞} Pr{(1/n) log [1/p_{Ȳ^n}(Ȳ^n)] < H̲(Y*) + 2γ} = 0.   (85)

By (83b) we have H̲(Y*) = lim inf_{n→∞} (1/n) log |P_n|, and from the definition of lim inf we may find a subsequence indexed by k_n such that,

    H̲(Y*) + 2γ ≥ (1/k_n) log |P_{k_n}| + γ.   (86a)

For any k_n (86a) holds and we have,

    Pr{(1/k_n) log [1/p_{Ȳ^{k_n}}(Ȳ^{k_n})] < (1/k_n) log |P_{k_n}| + γ}   (87)
        ≤ Pr{(1/k_n) log [1/p_{Ȳ^{k_n}}(Ȳ^{k_n})] < H̲(Y*) + 2γ}.   (88)

Applying this result to (85) we have,

    lim_{n→∞} Pr{(1/k_n) log [1/p_{Ȳ^{k_n}}(Ȳ^{k_n})] < (1/k_n) log |P_{k_n}| + γ} = 0.   (89)

Rearranging the inner term we have,

    lim_{n→∞} Pr{p_{Ȳ^{k_n}}(Ȳ^{k_n}) > e^{−k_n γ} / |P_{k_n}|} = 0.   (90)

This means we may find n_0 such that for any ε > 0 and n > n_0,

    Pr{p_{Ȳ^{k_n}}(Ȳ^{k_n}) > e^{−k_n γ} / |P_{k_n}|} < ε.   (91)

Let,

    A_{k_n} = {y ∈ 𝒴^{k_n} : p_{Ȳ^{k_n}}(y) > e^{−k_n γ} / |P_{k_n}|},   (92)
so for all n > n_0,

    p_{Ȳ^{k_n}}(A_{k_n}) < ε.   (93)

So for n > n_0 we may calculate the probability of the permissible set (for the subsequence) as,

    p_{Ȳ^{k_n}}(P_{k_n}) = Σ_{y∈P_{k_n}} p_{Ȳ^{k_n}}(y)   (94a)
        = Σ_{y∈P_{k_n}∩A_{k_n}^c} p_{Ȳ^{k_n}}(y) + Σ_{y∈P_{k_n}∩A_{k_n}} p_{Ȳ^{k_n}}(y)   (94b)
        ≤ Σ_{y∈P_{k_n}∩A_{k_n}^c} e^{−k_n γ}/|P_{k_n}| + Σ_{y∈P_{k_n}∩A_{k_n}} p_{Ȳ^{k_n}}(y)   (94c)
        ≤ e^{−k_n γ} + Σ_{y∈P_{k_n}∩A_{k_n}} p_{Ȳ^{k_n}}(y)   (94d)
        ≤ e^{−k_n γ} + Σ_{y∈A_{k_n}} p_{Ȳ^{k_n}}(y)   (94e)
        < e^{−k_n γ} + ε.   (94f)

Thus we see that for the subsequence,

    lim sup_{n→∞} p_{Ȳ^{k_n}}(P_{k_n}) ≤ ε,   (95)

for all ε > 0, so clearly,

    lim_{n→∞} p_{Ȳ^n}(P_n) = 1   (96)

is impossible. Thus from (25) we have a contradiction, as the above implies that Y ∉ T_0.

Theorem 2.8 (Spectral sup-entropy bound): For discrete g = {P_n}_{n=1}^{∞} with corresponding secure output set T_0,

    sup_{Y∈T_0} H̄(Y) = lim sup_{n→∞} (1/n) log |P_n|.   (97)

Proof: Since Y* = {U(P_n)}_{n=1}^{∞} ∈ T_0 we have,

    sup_{Y∈T_0} H̄(Y) ≥ H̄(Y*)   (98a)
                      = lim sup_{n→∞} (1/n) log |P_n|.   (98b)

Now assume there exists Y ∈ T_0, with Y = {Ȳ^n}_{n=1}^{∞}, such that,

    H̄(Y) = H̄(Y*) + γ/4,   (99)

for any γ > 0. This means that,

    lim_{n→∞} Pr{(1/n) log [1/p_{Ȳ^n}(Ȳ^n)] > H̄(Y) + γ/4} = 0.   (100)

Thus for some subsequence k_n we have,

    (1/k_n) log |P_{k_n}| + γ > H̄(Y) + γ/4,   (101)

and

    lim_{n→∞} Pr{(1/k_n) log [1/p_{Ȳ^{k_n}}(Ȳ^{k_n})] > (1/k_n) log |P_{k_n}| + γ} = 0.   (102)

Rewriting we have,

    lim_{n→∞} Pr{p_{Ȳ^{k_n}}(Ȳ^{k_n}) < e^{−k_n γ}/|P_{k_n}|} = 0.   (103)

Let,

    A_{k_n} = {y ∈ 𝒴^{k_n} : p_{Ȳ^{k_n}}(y) < e^{−k_n γ}/|P_{k_n}|},   (104)

and given any ε > 0 we may find n_0 so that for n > n_0,

    p_{Ȳ^{k_n}}(A_{k_n}) < ε.   (105)
For n > n_0 the probability of the permissible set (in this subsequence) is,

    p_{Ȳ^{k_n}}(P_{k_n}) = Σ_{y∈P_{k_n}} p_{Ȳ^{k_n}}(y)   (106a)
        = Σ_{y∈P_{k_n}∩A_{k_n}} p_{Ȳ^{k_n}}(y) + Σ_{y∈P_{k_n}∩A_{k_n}^c} p_{Ȳ^{k_n}}(y)   (106b)
        ≤ Σ_{y∈P_{k_n}∩A_{k_n}} e^{−k_n γ}/|P_{k_n}| + Σ_{y∈P_{k_n}∩A_{k_n}^c} p_{Ȳ^{k_n}}(y)   (106c)
        ≤ e^{−k_n γ} + Σ_{y∈P_{k_n}∩A_{k_n}^c} p_{Ȳ^{k_n}}(y)   (106d)
        ≤ e^{−k_n γ} + Σ_{y∈A_{k_n}} p_{Ȳ^{k_n}}(y)   (106e)
        < e^{−k_n γ} + ε.   (106f)

This gives,

    lim sup_{n→∞} p_{Ȳ^{k_n}}(P_{k_n}) ≤ ε,   (107)

for any ε > 0. Since the subsequence above does not converge to 1 it is impossible for,

    lim inf_{n→∞} p_{Ȳ^n}(P_n) = 1,   (108)

and by (25) we see Y ∉ T_0.

H. Capacity Bounds

This section presents a number of fundamental bounds on the secure capacity of a stego-channel based on the properties of that channel. We make use of the following lemma.

Lemma 2.1: For a stego-channel (W, g, A) the following hold,

    I̲(X; Z) ≤ I̲(X; Y),   (109)

    I̲(X; Z) ≤ I̲(Y; Z).   (110)

Proof: We note that the general distributions form a Markov chain, X → Y → Z.¹ A property of the inf-information rate [6] is,

    I̲(X; Z) ≤ I̲(X; Y),   (111)

when X → Y → Z. Since X → Y → Z implies Z → Y → X we also have,

    I̲(X; Z) ≤ I̲(Y; Z).   (112)

The first capacity bound gives an upper bound based on the sup-entropy of the secure input set.

Theorem 2.9 (Input Sup-Entropy Bound): For a stego-channel (W, g, A) the secure capacity is bounded as,

    C(W, g, A) ≤ sup_{X∈S_0} H̄(X).   (113)

Proof: Using (21) and the property that H̲(X|Z) ≥ 0 we have,

    C(W, g, A) = sup_{X∈S_0} I̲(X; Z)            [Thm. 2.2]
               ≤ sup_{X∈S_0} (H̄(X) − H̲(X|Z))    [by (21)]
               ≤ sup_{X∈S_0} H̄(X).

The following corollary specializes the previous one with the restriction that the input alphabet is finite.

Corollary 2.1: For a given stego-channel (W, g, A) = {(W^n, P_{g_n}, A^n)}_{n=1}^{∞} with a discrete input set (|𝒳| < ∞) the secure capacity is bounded from above as,

    C(W, g, A) ≤ log |𝒳|.   (114)

¹ X → Y → Z is said to hold when for all n, X^n and Z^n are conditionally independent given Y^n.
Proof: We make use of Theorem 2.9:

    C(W, g, A) ≤ sup_{X∈S_0} H̄(X)   [Thm. 2.9]
               ≤ sup_X H̄(X)
               = log |𝒳|.

The next theorem gives two upper bounds on the capacity based on the sup-entropy of the secure input and output sets.

Theorem 2.10 (Output Sup-Entropy Bounds): For a stego-channel (W, g, A) the secure capacity is bounded as,

    C(W, g, A) ≤ sup_{X∈S_0} H̄(Y)   (117a)
               ≤ sup_{Y∈T_0} H̄(Y).   (117b)

Proof: Using (21) and the property that H̲(Y|X) ≥ 0 we have,

    C(W, g, A) = sup_{X∈S_0} I̲(X; Z)            [Thm. 2.2]
               ≤ sup_{X∈S_0} I̲(X; Y)            [Lemma 2.1]
               ≤ sup_{X∈S_0} (H̄(Y) − H̲(Y|X))    [by (21)]
               ≤ sup_{X∈S_0} H̄(Y)
               ≤ sup_{Y∈T_0} H̄(Y).

Here the final line follows since if X ∈ S_0 and X →(W) Y then Y ∈ T_0.

The next corollary specializes the above theorem when the permissible set is finite.

Corollary 2.2 (Discrete Permissible Set Bound): For a given discrete stego-channel (W, g, A) = {(W^n, P_{g_n}, A^n)}_{n=1}^{∞} the secure capacity is bounded from above as,

    C(W, g, A) ≤ lim sup_{n→∞} (1/n) log |P_{g_n}|.   (119)

Proof: Combining Theorem 2.8 and line (117b) of Theorem 2.10 gives the desired result.

The next theorem provides an intuitive result dealing with the capacity of two stego-channels having related steganalyzers.

Theorem 2.11 (Permissible Set Relation): For two stego-channels (W, g, A) and (W, v, A), if P_{g_n} ⊆ P_{v_n} for all but finitely many n, then,

    C(W, g, A) ≤ C(W, v, A).   (120)

Proof: Let {f_n}_{n=1}^{∞} and {φ_n}_{n=1}^{∞} be a sequence of encoding and decoding functions that achieves C(W, g, A). Such a sequence exists by the definition of secure capacity. The following definitions will be used for i = 1, ..., M_n,

    u_i = f_n(i),   D_i = φ_n^{-1}({i}).

The probability of error for this sequence is given by (12),

    ε_n = (1/M_n) Σ_{i=1}^{M_n} Q^n(D_i^c | u_i),

where Q^n = A^n ∘ W^n. Clearly, this value is independent of the permissible sets, and if ε_n → 0 for the stego-channel (W, g, A) then it also goes to zero for (W, v, A).

Next we know that the probability of detection for (W, g, A) is given by (13),

    δ_n^g = (1/M_n) Σ_{i=1}^{M_n} W^n(I_{g_n} | u_i),

and that δ_n^g → 0. Since P_{g_n} ⊆ P_{v_n} for all n > N, we have that I_{g_n} ⊇ I_{v_n} if n > N, and thus,

    W^n(I_{g_n}|x) ≥ W^n(I_{v_n}|x),   ∀n > N, x ∈ 𝒳^n.   (121)
TRANSACTIONS ON INFORMATION THEORY, VOL. 1, NO. 11, NOVEMBER 2002
W n (y|x)
fn (m) Encoder
φn (y)
Noise
Decoder
gn (y) Detection g
Fig. 4.
Detection g
Fig. 5.
vn (z) Detection v
Two Noise Channel
Proof:
We first show that C(W, h, A)
≤
C(W, g, A). The permissible set of the composite is equal to the
Mn 1 X W n (Ivn |ui ) Mn i=1
intersection of the base detection functions,
Mn 1 X W n (Ign |ui ) Mn i=1
Phn = Pgn ∩ Pvn , ∀n,
(124)
thus we have that Phn ⊆ Pgn and we may apply
=δng
Theorem 2.11 to state,
Since δng → 0 we see that δnv → 0 as well.
C(W, h, A) ≤ C(W, g, A). The above argument may be applied using Phn ⊆ Pvn
I. Applications 1) Composite steganalyzers: This final theorem of the previous section is intuitively pleasing and leads to some immediate results. An example of this is the composite steganalyzer pictured in Figure 4.
to show C(W, h, A) ≤ C(W, v, A). 2) Two Noise Systems: We briefly present and discuss an interesting case that is somewhat counter-intuitive. Consider the channel shown in Figure 5. In this case there is distortion A after the encoder and a second dis-
In this system two steganalyzers, g and v are used sequentially on the corrupted stego-signal. If either of these steganalyzers are triggered, the message is considered steganographic. We will denote the composite stego-channel of this system as (W, h, A).
tortion, B before the second steganalyzer. In the previous section it was shown that in the composite steganalyzer the addition of a second steganalyzer (Figure 5) lowers the capacity of the stego-channel. A surprising result for the two noise system is that this may not be the case-
As one would expect the capacity of the composite channel, C(W, h, A), is smaller than either C(W, g, A) or C(W, v, A). This is shown in the next theorem. Theorem 2.12 (Composite Stego-Channel): For
in fact, the addition of a second distortion can increase the capacity of a stego-channel! To see this consider the two steganalyzers g and v.
a
composite stego-channel (W, h, A) defined by g and v, the following inequality holds,
Assume that g classifies signals with positive means as steganographic, while v classifies signals with negative means as steganographic. If these detection functions
C(W, h, A) ≤ min {C(W, g, A), C(W, v, A)} . (123) May 2, 2006
Noise B
Detection v
for (W, v, A) and n > N as,
≤
B(z|y)
gn (y)
Using this we may bound the probability of detection
(121)
A(y|x) Noise A
vn (y)
Composite steganalyzer
δnv =
15
were in series, clearly the permissible set (of the composite detection function) is empty as a signal cannot DRAFT
have a positive and a negative mean. Now if the distortion B is deterministic, for example,

        B^n(−y | y) = 1,

we may send any signal we wish, as long as its mean is positive. So in some instances, it is possible for the addition of a distortion to actually increase the capacity.

Fig. 6.  Noiseless Stego-Channel (X^n = Y^n = Z^n).

III. NOISELESS CHANNELS

This section investigates the capacity of the noiseless stego-channel shown in Figure 6. In this system there is no encoder-noise and the adversary is passive. This means that not only does the decoder receive exactly what the encoder sends, but the steganalyzer does as well. This section finds the perfectly secure capacity of this system, and then derives a number of intuitive bounds relating to this capacity.

Theorem 3.1 (Secure Noiseless Capacity): For a discrete noiseless channel (·, g, ·) the secure channel capacity is given by,

        C(·, g) = lim inf_{n→∞} (1/n) log |P_{g_n}|.    (125)

Proof: Using Theorem 2.5 and Theorem 2.7 we have,

        C(·, g) = sup_{Y ∈ T_0} H(Y)                     (by T2.5)
                = lim inf_{n→∞} (1/n) log |P_{g_n}|.     (by T2.7)

Example 2 (Capacity of the Sum Steganalyzer): We now use this result to find the noiseless capacity of the sum steganalyzer of Example 1. The size of the permissible set for n is equal to the number of different ways we may arrange up to ⌊n/2⌋ 1s into n positions,

        |P_{g_n}| = Σ_{i: 0 ≤ i ≤ ⌊n/2⌋} C(n, i),    (126)

where C(n, i) denotes the binomial coefficient. Making use of the following properties,

        Σ_{i=0}^{n} C(n, i) = 2^n,    (127)
        C(n, k) = C(n, n−k),          (128)

we begin by finding |P_{g_n}| when n is odd. Letting k = (n−1)/2,

        2^n = Σ_{i=0}^{n} C(n, i)                                  (by (127))    (129)
            = Σ_{j=0}^{k} C(n, j) + Σ_{j=k+1}^{n} C(n, j)                        (130)
            = Σ_{j=0}^{k} C(n, j) + Σ_{j=k+1}^{n} C(n, n−j)        (by (128))    (131)
            = 2 Σ_{j=0}^{k} C(n, j) = 2 |P_{g_n}|.                               (132)
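The counting identity above, together with the even-n counterpart derived next, can be checked directly. The following is a small illustrative script (not part of the paper):

```python
# Check |P_gn| = sum_{i <= floor(n/2)} C(n, i) against the closed forms:
# 2^(n-1) for odd n, and 2^(n-1) + C(n, n/2)/2 for even n.
from math import comb, log2

def permissible_size(n):
    # number of binary words of length n with at most floor(n/2) ones
    return sum(comb(n, i) for i in range(n // 2 + 1))

for n in range(1, 30):
    size = permissible_size(n)
    if n % 2 == 1:
        assert size == 2 ** (n - 1)
    else:
        assert size == 2 ** (n - 1) + comb(n, n // 2) // 2

# the rate (1/n) log2 |P_gn| approaches 1 bit/use
rate = log2(permissible_size(201)) / 201
assert 0.99 < rate < 1.0
```

The final assertion previews the capacity result of this example: the permissible set grows at essentially 1 bit per channel use.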
We now find |P_{g_n}| for n even. Letting k = n/2,

        2^{n−1} = (1/2) Σ_{i=0}^{n} C(n, i)                                              (by (127))    (133)
                = (1/2) Σ_{j=0}^{k−1} C(n, j) + (1/2) C(n, k) + (1/2) Σ_{j=k+1}^{n} C(n, j)            (134)
                = Σ_{j=0}^{k−1} C(n, j) + (1/2) C(n, k)                                  (by (128))    (135)
                = Σ_{j=0}^{k} C(n, j) − (1/2) C(n, k)                                                  (136)
                = |P_{g_n}| − (1/2) C(n, k).                                                           (137)

This gives |P_{g_n}| = 2^{n−1} + (1/2) C(n, k) for n even, and |P_{g_n}| = 2^{n−1} for n odd from above. This produces the following capacity result,

        C(·, g) = sup_{Y ∈ T_0} H(Y)                     (by T2.5)
                = lim inf_{n→∞} (1/n) log |P_{g_n}|      (by T2.7)
                = lim_{n→∞} (1/n) log 2^{n−1}
                = 1 bit/use.    (138a)

3) ǫ-Strong Converse for Noiseless Channels: We now present a fundamental result for discrete noiseless channels regarding the ǫ-strong converse property. It gives the necessary and sufficient conditions for a noiseless stego-channel to satisfy the ǫ-strong converse property.

Theorem 3.2 (Noiseless ǫ-Strong Converse): A discrete noiseless stego-channel (·, g, ·) satisfies the ǫ-strong converse property if and only if,

        C(·, g) = lim_{n→∞} (1/n) log |P_{g_n}|.    (139)

Proof: Since the channel is noiseless, X = Y = Z, we have,

        sup_{X ∈ S_0} I(X; Z) = sup_{Y ∈ T_0} H(Y),    (140)
        sup_{X ∈ S_0} Ī(X; Z) = sup_{Y ∈ T_0} H̄(Y).    (141)

First assume that the stego-channel satisfies the ǫ-strong converse property. This gives,

        sup_{Y ∈ T_0} H(Y) = sup_{X ∈ S_0} I(X; Z)    (by (140))    (142a)
                           = sup_{X ∈ S_0} Ī(X; Z)                  (142b)
                           = sup_{Y ∈ T_0} H̄(Y).     (by (141))    (142c)

The capacity is then,

        C(·, g) = sup_{Y ∈ T_0} H(Y) = lim inf_{n→∞} (1/n) log |P_{g_n}|,    (by T2.5, T2.7)

and, using (142c),

        C(·, g) = sup_{Y ∈ T_0} H̄(Y) = lim sup_{n→∞} (1/n) log |P_{g_n}|.    (by T2.8)

Since the lim inf and lim sup coincide, the limit exists and C(·, g) = lim_{n→∞} (1/n) log |P_{g_n}|.

For the other direction assume that C(·, g) = lim_{n→∞} (1/n) log |P_{g_n}|; thus we have,

        C(·, g) = sup_{X ∈ S_0} I(X; Z)
                = sup_{Y ∈ T_0} H(Y)                     (by T2.5)
                = lim_{n→∞} (1/n) log |P_{g_n}|
                = lim sup_{n→∞} (1/n) log |P_{g_n}|
                = sup_{Y ∈ T_0} H̄(Y)                    (by T2.8)
                = sup_{X ∈ S_0} Ī(X; Z).                (by (141))

Thus, sup_{X ∈ S_0} I(X; Z) = sup_{X ∈ S_0} Ī(X; Z) and by Theorem 2.6 the stego-channel satisfies the ǫ-strong converse property.

Example 3 (Sum Steganalyzer): We now determine if the sum steganalyzer satisfies the ǫ-strong converse.
From Example 2 the size of the permissible set is,

        |P_{g_n}| = 2^{n−1} + (1/2) C(n, n/2),   for even n,
        |P_{g_n}| = 2^{n−1},                     for odd n.    (144)

We will make use of Stirling's approximation,

        n! = √(2π) n^{n+1/2} e^{−n+λ_n},    (145)

where 1/(12n + 1) < λ_n < 1/(12n). For n even,

        |P_{g_n}| = 2^{n−1} + (1/2) n! / ((n/2)! (n/2)!)                                                  (146)
                  = 2^{n−1} + (1/2) √(2π) n^{n+1/2} e^{−n+λ_n} / (2π (n/2)^{n+1} e^{−n+2λ_{n/2}})         (147)
                  = 2^{n−1} + (1/2) √(2π) n^{n+1/2} e^{λ_n} / (2π (n/2)^{n+1} e^{2λ_{n/2}})               (148)
                  = 2^{n−1} + (1/2) 2^{n+1} e^{λ_n} / (√(2πn) e^{2λ_{n/2}})                               (149)
                  = 2^{n−1} + 2^{n−1} · 2 e^{λ_n − 2λ_{n/2}} / √(2πn)                                     (150)
                  = 2^{n−1} (1 + 2 e^{λ_n − 2λ_{n/2}} / √(2πn))                                           (151)
                  ≤ 2^{n−1} (1 + 2e / √(2πn)).                                                            (152)

This gives,

        lim sup_{n→∞} (1/n) log |P_{g_n}| ≤ lim sup_{n→∞} (1/n) log [2^{n−1} (1 + 2e/√(2πn))]    (153)
                ≤ lim_{n→∞} (1/n) log 2^{n−1} + lim sup_{n→∞} (1/n) log (1 + 2e/√(2πn))          (154)
                = 1.                                                                              (155)

This shows,

        lim inf_{n→∞} (1/n) log |P_{g_n}| = 1 bit/use ≥ lim sup_{n→∞} (1/n) log |P_{g_n}|.    (156)

Since the lim inf and lim sup coincide the limit is indeed a true one. Thus, this stego-channel satisfies the ǫ-strong converse.

4) Properties of the Noiseless DMSC: In this section we briefly investigate the secure capacity of the discrete memoryless stego-channel (cf. I-F.2).

Theorem 3.3 (Noiseless DMSC Secure Capacity): For the stego-channel (·, g, ·) with g = {g}, the secure capacity is given by,

        C(·, g) = log |P_g|,    (157)

and furthermore this stego-channel satisfies the strong converse.

Proof: As the channel is noiseless and the input alphabet is finite we may use Theorem 3.1,

        C(·, g) = lim inf_{n→∞} (1/n) log |P_{g_n}|.    (158)

Note that by (7) we have for all n,

        (1/n) log |P_{g_n}| = (1/n) log |P_g × P_g × ⋯ × P_g|   (n factors)
                            = (1/n) log |P_g|^n
                            = log |P_g|.    (159)

Thus,

        C(·, g) = log |P_g|.

We also have that,

        C(·, g) = lim inf_{n→∞} (1/n) log |P_{g_n}| = log |P_g| = lim_{n→∞} (1/n) log |P_{g_n}|,    (160)

thus by Theorem 3.2 the stego-channel satisfies the strong converse.

IV. ADDITIVE NOISE STEGO-CHANNELS

In this section we evaluate the capacity of a particular stego-channel, shown in Figure 7. In this channel both
Fig. 7.  Additive Noise Channel Active Adversary (Y^n = X^n + N_e^n, Z^n = Y^n + N_a^n).

Fig. 8.  AWGN Channel Active Adversary (encoder-noise N(0, σ_e²), attack-noise N(0, σ_a²)).

the encoder-noise and attack-noise are additive and independent from the channel input.

A. Additive Noise

Denote the sum of two general sequences X = {X^n = (X_1^(n), …, X_n^(n))}_{n=1}^∞ and Y = {Y^n = (Y_1^(n), …, Y_n^(n))}_{n=1}^∞ as,

        X + Y := {X^n + Y^n = (X_1^(n) + Y_1^(n), …, X_n^(n) + Y_n^(n))}_{n=1}^∞.    (161)

Letting the encoder-noise be denoted as N_e = {N_e^n}_{n=1}^∞ and the attack-noise denoted as N_a = {N_a^n}_{n=1}^∞ we have the following relations,

        Y = X + N_e,
        Z = Y + N_a = X + N_e + N_a = X + N,

where N = {N^n}_{n=1}^∞ = N_e + N_a. As the noises are independent from the stego-signal we may use the simplification,

        p_{Z^n | X^n}(X^n + N^n | X^n) = p_{N^n}(N^n),

leading to the following simplifications in spectral entropies,

        H(Z|X) = H(N),    (162)
        H̄(Z|X) = H̄(N).    (163)

We now use these simplifications to present a useful capacity result for additive noise channels.

Theorem 4.1: For an additive noise stego-channel defined with N_e + N_a = N, if N satisfies the strong converse (i.e. H̄(N) = H(N)) then the capacity is,

        C(W, g, A) = sup_{X ∈ S_0} {H(Z)} − H(N).    (164)

Proof: First we find a lower bound as,

        C(W, g, A) ≥ sup_{X ∈ S_0} [H(Z) − H̄(Z|X)]    (by (22))    (165)
                   = sup_{X ∈ S_0} {H(Z)} − H̄(N).                  (166)

Next we upper bound the capacity as,

        C(W, g, A) ≤ sup_{X ∈ S_0} [H(Z) − H(Z|X)]    (by (21))    (167)
                   = sup_{X ∈ S_0} {H(Z)} − H(N).                   (168)

By assumption H̄(N) = H(N) and combining (166) and (168) we have the desired result.

B. AWGN Example

The general formula of the previous section is now applied to the commonly found additive white Gaussian noise channel. The detector is motivated by the use of spread spectrum steganography [11] or, more generally, stochastic modulation [12].

The encoder-noise and attack-channel to be considered are additive white Gaussian noise (AWGN). Thus for a stego-signal, x = (x_1, …, x_n), the corrupted stego-signal is given by,

        y = (x_1 + n_1, …, x_n + n_n),
where each n_i ∼ N(0, σ_e²), and all are independent. The transition probabilities of the encoder-noise are given by,

        W^n(y|x) = (2πσ_e²)^{−n/2} exp{ −(1/(2σ_e²)) Σ_{i=1}^{n} (y_i − x_i)² }.    (169)

Similarly, the attack-channel is AWGN as N(0, σ_a²) so the transition probabilities are,

        A^n(z|y) = (2πσ_a²)^{−n/2} exp{ −(1/(2σ_a²)) Σ_{i=1}^{n} (z_i − y_i)² }.    (170)

1) Variance Steganalyzer: In stochastic modulation a pseudo-noise is modulated by a message and added to the cover signal. This is done as the presence of noise in signal processing applications is a common occurrence.

If the passive adversary has knowledge of the distribution of the cover-signal and knows that the hider is using stochastic modulation, it also knows that the variance of a cover-signal will differ from the variance of a stego-signal. Thus if the passive adversary knows the variance of the cover-distribution it could design a steganalyzer to trigger if the variance of a test signal is higher than this threshold.

For example, when testing the signal y = (y_1, …, y_n) the variance steganalyzer operates as,

        g_n(y) = 1,   if (1/n) Σ_{i=1}^{n} y_i² > c,
        g_n(y) = 0,   else.    (171)

Thus, if the empirical variance of a test signal is above a certain threshold, the signal is considered steganographic.

2) Additive Gaussian Channel Active Adversary: In this section we derive the capacity under an active adversary. Assume that the adversary uses an additive i.i.d. Gaussian noise with variance σ_a² while the encoder noise is additive i.i.d. Gaussian with variance σ_e².

Let N_e = {N_e}² where N_e ∼ N(0, σ_e²) and N_a = {N_a} where N_a ∼ N(0, σ_a²). Let N = N_e + N_a = {N^n = N_e^n + N_a^n}_{n=1}^∞. Since both N_e and N_a are i.i.d. as N(0, σ_e²) and N(0, σ_a²), respectively, their sum is i.i.d. as N(0, σ_e² + σ_a²), i.e. N = {N} with N ∼ N(0, σ_e² + σ_a²). We then have the following relations,

        H(N) = H̄(N) = H(N) = (1/2) log 2πe(σ_a² + σ_e²).    (172)

Since H(N) = H̄(N) we see that the noise sequence satisfies the strong converse property.

3) Active Adversary Capacity: We now derive the secure capacity of the above stego-channel. Since the noises are i.i.d. the general sequence N will satisfy the strong converse and allow the use of Theorem 4.1. The formal proof is then followed by a discussion of the results and a description using the classic sphere packing intuition.

Theorem 4.2: For the stego-channel (W, g, A) = {(W^n, g_n, A^n)}_{n=1}^∞ with W^n and A^n defined by (169) and (170) respectively, and g_n defined by (171), the secure capacity is,

        C(W, g, A) = (1/2) log (c + σ_a²)/(σ_e² + σ_a²).    (173)

Proof: From Theorem 4.1 and (172) we have,

        C(W, g, A) = sup_{X ∈ S_0} {H(Z)} − H(N)    (174)
                   = sup_{X ∈ S_0} {H(Z)} − (1/2) log 2πe(σ_a² + σ_e²).    (175)

Achievability: Let X = {X} where X ∼ N(0, c − σ_e²). Thus Y =

² Recall that for a general sequence X = {X^n = (X_1^(n), …, X_n^(n))}_{n=1}^∞, when X = {X} is written it means that each X_i^(n) is independent and identically distributed as X.
X + N_e = {Y} with Y = X + N_e. By addition of independent Gaussians, Y ∼ N(0, c). This gives,

        Pr{ (1/n) Σ_{i=1}^{n} (Y_i^(n))² > c } → 0,    (176)

and we see that X ∈ S_0. Similarly, Z = N_a + Y = {Z} with Z = X + N_e + N_a. Again by addition of independent Gaussians we have Z ∼ N(0, c + σ_a²). This allows for a lower bound of,

        C(W, g, A) = sup_{X ∈ S_0} H(Z) − (1/2) log 2πe(σ_e² + σ_a²)    (by (175))    (177a)
                   ≥ H(Z) − (1/2) log 2πe(σ_e² + σ_a²)                               (177b)
                   = H(Z) − (1/2) log 2πe(σ_e² + σ_a²)       (since Z is i.i.d.)     (177c)
                   = (1/2) log 2πe(c + σ_a²) − (1/2) log 2πe(σ_e² + σ_a²)            (177d)
                   = (1/2) log (c + σ_a²)/(σ_e² + σ_a²).                             (177e)

Converse: To find the upper bound we will make use of a number of simple lemmas.

Lemma 4.1: For a given stego-channel with secure input distribution set S_0 and secure output distribution set T_0, the following holds,

        sup_{X ∈ S_0} H̄(Z) ≤ sup_{Y ∈ T_0} H̄(Z),    (178)

where Y → Z through A. Proof: By definition, for any X ∈ S_0 with X → Y through W, we have Y ∈ T_0.

Lemma 4.2: For Y^n = (Y_1^(n), Y_2^(n), …, Y_n^(n)) let K_ij^(n) be the covariance between Y_i^(n) and Y_j^(n), that is K_ij^(n) := E{Y_i^(n) Y_j^(n)}. For the stego-channel defined above, if Y = {Y^n}_{n=1}^∞ ∈ T_0 we have for any γ > 0 there exists some N such that for all n > N,

        (1/n) Σ_{i=1}^{n} K_ii^(n) + σ_a² < c + σ_a² + γ.    (179)

Proof: Since σ_a² is a variance and always positive it suffices to show,

        (1/n) Σ_{i=1}^{n} K_ii^(n) < c + γ,    (180)

for all n greater than some N. To show this, assume that no such N exists; thus we have a subsequence n_k such that,

        (1/n_k) Σ_{i=1}^{n_k} K_ii^(n_k) ≥ c + γ.    (181)

This means that,

        E{ (1/n_k) Σ_{i=1}^{n_k} Y_i² } = (1/n_k) Σ_{i=1}^{n_k} K_ii^(n_k) ≥ c + γ,

which in turn implies that,

        Pr{ g_{n_k}(Y^{n_k}) = 0 } → 0.

This is a contradiction as it shows Y = {Y^n}_{n=1}^∞ ∉ T_0.

Lemma 4.3: For any Z^n = (Z_1, …, Z_n) with C_ij = E{Z_i Z_j},

        H(Z^n) ≤ (1/2) log (2πe)^n ( (1/n) Σ_{i=1}^{n} C_ii )^n.    (182)

Proof: From [13, Chap. 9.6] we have,

        H(Z^n) ≤ (1/2) log (2πe)^n ∏_{i=1}^{n} C_ii.    (183)

The result follows from application of the arithmetic-geometric inequality.

Lemma 4.4: For the above stego-channel, any Y ∈ T_0 and any ǫ > 0 we have,

        (1/n) H(Z^n) < (1/2) log 2πe(c + σ_a²) + ǫ,    (184)

for all but a finite number of n, where Z = {Z^n}_{n=1}^∞ and Y → Z through A.

Proof: Let any ǫ > 0 be given and choose γ > 0 such that,

        γ ≤ (c + σ_a²)(e^{2ǫ} − 1);

this gives,

        (1/2) log 2πe(c + σ_a² + γ) ≤ (1/2) log 2πe(c + σ_a²) + ǫ.    (185)

Since Y ∈ T_0, by Lemma 4.2 there exists some N such that for all n > N,

        (1/n) Σ_{i=1}^{n} K_ii^(n) + σ_a² < c + σ_a² + γ.
Letting C_ij^(n) = E{Z_i^(n) Z_j^(n)} and K_ij^(n) = E{Y_i^(n) Y_j^(n)}, we note that Z_i^(n) = Y_i^(n) + N_a. This gives,

        C_ii^(n) = K_ii^(n) + σ_a².    (186)

This gives,

        (1/n) H(Z^n) ≤ (1/(2n)) log (2πe)^n ( (1/n) Σ_{i=1}^{n} C_ii^(n) )^n           (by L4.3)    (187)
                     = (1/(2n)) log (2πe)^n ( (1/n) Σ_{i=1}^{n} K_ii^(n) + σ_a² )^n    (by (186))   (188)
                     ≤ (1/(2n)) log (2πe)^n (c + σ_a² + γ)^n                           (by L4.2)    (189)
                     = (1/2) log 2πe(c + σ_a² + γ)
                     ≤ (1/2) log 2πe(c + σ_a²) + ǫ.                                    (by (185))

The inequality of (189) holds for all but a finite number of n. Combining this upper bound with the lower bound of the achievability argument, for any ǫ > 0,

        (1/2) log (c + σ_a²)/(σ_e² + σ_a²) ≤ C(W, g, A) < (1/2) log (c + σ_a²)/(σ_e² + σ_a²) + ǫ,

and we see that C(W, g, A) = (1/2) log (c + σ_a²)/(σ_e² + σ_a²).

TABLE III
GAUSSIAN ADDITIVE NOISE CAPACITIES

        Channel      Encoder Noise   Attack Noise   Secure Capacity
        C(W, g, A)   σ_e²            σ_a²           (1/2) log (c + σ_a²)/(σ_e² + σ_a²)
        C(W, g)      σ_e²            0              (1/2) log (c/σ_e²)
        C(·, g, A)   0               σ_a²           (1/2) log ((c + σ_a²)/σ_a²)
        C(·, g)      0               0              (1/2) log (c/0) = ∞

4) Noise Cases: We now use this theorem to investigate a number of special noise cases; the resulting capacities are summarized in Table III. This demonstrates the severity of the detection constraint in communication over a stego-channel, as even if ǫ → 0 the capacity of the stego-channel is still zero.
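A small simulation can illustrate the variance steganalyzer of (171) and the capacity formula (173). The sketch below is illustrative only: the stego-signal variance is backed off from the threshold by a small margin so the empirical power stays below c with high probability, whereas the achievability proof works with variance exactly c − σ_e²; all parameter values are arbitrary, and the capacity is reported in bits (log base 2).

```python
# Illustrative simulation of the variance steganalyzer (171): a Gaussian
# stego-signal whose power after encoder noise stays below the threshold c
# passes undetected; the secure capacity then follows (173).
import math
import random

random.seed(1)
c, var_e, var_a = 4.0, 1.0, 0.5     # detection threshold and noise variances
n = 2000
margin = 0.8                        # back-off so empirical power < c w.h.p.

def g(y, c):
    """Variance steganalyzer: 1 (steganographic) if empirical power exceeds c."""
    return 1 if sum(v * v for v in y) / len(y) > c else 0

x = [random.gauss(0.0, math.sqrt(c - var_e - margin)) for _ in range(n)]
y = [xi + random.gauss(0.0, math.sqrt(var_e)) for xi in x]   # encoder noise
detected = g(y, c)                  # Var(y) = c - margin, so 0 w.h.p.

capacity = 0.5 * math.log2((c + var_a) / (var_e + var_a))    # eq. (173)
print("detected:", detected)
print(f"secure capacity = {capacity:.3f} bits/use")
```

With these parameters the deviation needed to trigger detection is roughly eight standard deviations of the empirical power, so the steganalyzer essentially never fires on the undetectable design.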
Fig. 9.  AWGN Channel Passive Adversary (encoder-noise N(0, σ²)).

7) Noiseless Case: Consider the noiseless case where σ_e² = σ_a² = σ² and σ² → 0. This gives,

        lim_{σ² → 0} C(W, g, A) = lim_{σ² → 0} (1/2) log (c + σ²)/(σ² + σ²) = ∞.

Thus we see that since the channel is noiseless, and the permissible set size (as well as the input and output alphabets) is uncountable (thus infinite), the capacity is unbounded.

8) Geometric Intuition: In this section we present some geometric intuition for the previous results, similar to the case of the classic additive Gaussian noise channel [13], [14]. We will consider the case of only an encoder-noise of σ², shown in Figure 9. From the above theorem we see that,

        C(W, g) = (1/2) log (c/σ²).    (193)

The most basic element will be the volume of an n-dimensional sphere of radius r. In this case the volume is equal to,

        A_n r^n,    (194)

where A_n is a constant dependent only on the dimension n.

The fundamental question is what is the capacity of the stego-channel, or how many codewords can we reliably use? To answer this we must consider the two constraints on a secure system: error probability and detection probability.

9) Error Probability: Since we have that X^n = Y^n = ℝ^n, we may view each codeword as a point in ℝ^n. When we transmit a given codeword we may think of the addition of noise as moving the point around in that space. Since the power of the noise is σ², the probability that the received codeword has moved more than √(nσ²) away from where it started goes to zero as n → ∞. Thus we know that if we transmit a codeword, it will likely be contained in a sphere (centered on the codeword) of radius √(nσ²). This means that if we receive a signal inside such a sphere, it is likely that the transmitted codeword was the center of that sphere.

In this manner we can define a coding system. We know that for secure capacity the probability of error must go to zero. We also know that each codeword has an associated sphere that the received signal will fall inside. Thus if we choose the codewords such that their spheres do not overlap, there will be no confusion in decoding and the probability of error will go to zero.

10) Detection Probability: We begin by looking at the permissible set. The permissible set for our g_n is given by,

        P_{g_n} = { y ∈ Y^n : Σ_{i=1}^{n} y_i² < nc }.    (195)

Clearly the permissible set is a sphere of radius √(nc) centered at the origin. If a test signal falls inside this sphere it is classified as non-steganographic, whereas if it is outside it is considered steganographic.

The second criterion for a secure system is that the probability of detection go to zero. If we were to place each codeword such that its sphere was inside the permissible set, we know that the probability of detection will go to zero.
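The concentration claim behind the sphere picture, that an n-dimensional noise vector has norm close to √(nσ²), is easy to see numerically. An illustrative sketch (arbitrary parameters):

```python
# As n grows, |noise| / sqrt(n * sigma^2) concentrates around 1, which is why
# each codeword can be associated with a noise sphere of radius sqrt(n*sigma^2).
import math
import random

random.seed(7)
sigma2 = 1.0
ratios = {}
for n in (10, 100, 1000, 10000):
    noise = [random.gauss(0.0, math.sqrt(sigma2)) for _ in range(n)]
    radius = math.sqrt(sum(v * v for v in noise))
    ratios[n] = radius / math.sqrt(n * sigma2)
    print(f"n={n:6d}:  |noise|/sqrt(n*sigma2) = {ratios[n]:.3f}")
```

The ratio's standard deviation shrinks like 1/√(2n), so for large n the received signal sits essentially on the surface of the codeword sphere, which is what the packing argument that follows exploits.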
11) Capacity: From the above we know that the codeword spheres cannot overlap (to ensure no errors),
and we also know that all the codeword spheres must fit inside the permissible set (to ensure no detection). Thus if we calculate the number of non-overlapping spheres we may pack into the permissible set, we will have a general idea of the number of codewords we can use.

Since the volume of the permissible set is A_n (nc)^{n/2} and the volume of each codeword sphere is A_n (nσ²)^{n/2}, we can place approximately,

        A_n (nc)^{n/2} / (A_n (nσ²)^{n/2}) = (c/σ²)^{n/2},

non-overlapping spheres inside the permissible set. Thus using the center of each sphere as a codeword, we have M_n codewords where,

        M_n = (c/σ²)^{n/2}.

If we consider the capacity as C(W, g) = lim (1/n) log M_n we have,

        C(W, g) = lim_{n→∞} (1/n) log M_n               (196a)
                = lim_{n→∞} (1/n) log (c/σ²)^{n/2}      (196b)
                = (1/2) log (c/σ²),                     (196c)

which agrees with the result of Theorem 4.2.

V. PREVIOUS WORK REVISITED

12) Cachin Perfect Security: In Cachin's definition of perfect security the cover-signal distribution and the stego-signal distribution are each required to be independent and identically distributed. This gives the following secure-input set,

        S_0 = { X = {X} : lim_{n→∞} (1/n) D(S^n || X^n) = 0 }.    (197)

The i.i.d. property means that D(S^n || X^n) = n D(S || X), so we see that the above is equivalent to,

        S_0 = { X = {X} : D(S || X) = 0 }    (198)
            = { X = {X} : p_S = p_X }.       (199)

Since Cachin's definition does not model noise, we may consider it as noiseless and apply Theorem 3.1,

        C(W, g) = sup_{X ∈ S_0} H(X) = H(S).    (200)

This result states that in a system that is perfectly secure (in Cachin's definition) the limit on the amount of information that may be transferred each channel use is equal to the entropy of the source. This is intuitive because in Cachin's definition the output distribution of the encoder is constrained to be equal to the cover distribution.

13) Empirical Distribution Steganalyzer: The empirical distribution steganalyzer is motivated by the fact that the empirical distribution from a stationary memoryless source converges to the actual distribution of that source. Accordingly, if the empirical distribution of the test signal converges to the cover-signal distribution it is considered to be non-steganographic.

Assume that p_S is a discrete distribution over the finite alphabet S. Let a sequence {s^n}_{n=1}^∞ with each s^n ∈ S^n be used to specify the steganalyzer for a test signal x as,

        g_n(x) = 0   if P_[x] = P_[s^n],
        g_n(x) = 1   if P_[x] ≠ P_[s^n],    (201)

where P_[x] is the empirical distribution of x. The permissible set for g_n is equal to the type class of P_[s^n], i.e.,

        P_{g_n} = T(P_[s^n]) := { x ∈ X^n : P_[x] = P_[s^n] }.    (202)

14) Moulin Steganographic Capacity: Moulin's formulation [2] of the stego-channel is shown in Figure 10. This is somewhat different than the formulation shown in Figure 1; most notable is the presence of distortion constraints. Additionally there is an absence of a distortion function prior to the steganalyzer. Also in this model the steganalyzer is fixed as the previously discussed empirical distribution
Fig. 10.  Moulin Stego-channel.

Fig. 11.  Equivalent Stego-channel.

steganalyzer. The sequence of s^n used to specify the steganalyzer is drawn i.i.d. as S. In order to have the two formulations coincide a number of simplifications are needed for each model.

For our model,
• The stego-channel is noiseless
• The steganalyzer is the empirical distribution steganalyzer

For Moulin's model,
• Passive adversary (D_2 = 0)
• No distortion constraint on encoder (D_1 = ∞)

These changes produce the stego-channel shown in Figure 11.

Theorem 5.1: For the stego-channel shown in Figure 11, the capacities of this work and Moulin's agree. That is,

        C(W, g) = C^{STEG}(∞, 0) = H(S).    (203)

Proof: Since the channel is noiseless we may apply Theorem 3.1,

        C(W, g) = lim inf_{n→∞} (1/n) log |P_{g_n}|    (204a)
                = lim inf_{n→∞} (1/n) log |T(s^n)|     (204b)
                = H(S).                                (204c)

Here we have used the fact that the permissible set for the empirical distribution detection function is the type class in (204b). Additionally, by Varadarajan's Theorem [15], P_[s^n](x) → p_S(x) almost surely (here the convergence is uniform in x as well). This allows for the use of the type class-entropy bound from Theorem 1.1 that provides the final result.

We now show Moulin's capacity is equal to this value. In the case of a passive adversary (D_2 = 0), the following is the capacity of the stego-channel [2],

        C^{STEG}(D_1, 0) = sup_{p(x|s) ∈ Q'} H(X|S),    (205)

where a p ∈ Q' is feasible if,

        Σ_{s,x} p(x|s) p_S(s) d(s, x) ≤ D_1,    (206)

and

        Σ_s p(x|s) p_S(s) = p_S(x).    (207)

First we upper-bound the secure capacity as,

        C^{STEG}(∞, 0) = sup_{p(x|s) ∈ Q'} H(X|S)    (208a)
                       ≤ sup_{p(x) ∈ Q'} H(X)        (208b)
                       = H(S),                       (208c)

where the final line comes from the requirement that if p ∈ Q' and p(x|s) = p(x) then p(x) = p_S(x) for all x, to satisfy (207).

For the lower bound we let p_{X̃S}(x, s) = p_{X̃|S}(x|s) p_S(s) = p_S(x) p_S(s), i.e. X̃ ∼ p_S. This defines a feasible covert-channel as (206) is trivially
satisfied (since D_1 = ∞) and (207) is as well since,

        Σ_s p_{X̃|S}(x|s) p_S(s) = Σ_s p_S(x) p_S(s) = p_S(x).    (209)

This gives,

        C^{STEG}(∞, 0) = sup_{p(x|s) ∈ Q'} H(X|S)    (210a)
                       ≥ H(X̃|S)                     (210b)
                       = H(X̃)                       (210c)
                       = H(S).                       (210d)

Here (210c) is because X̃ and S are independent (p_{X̃S}(x, s) = p_{X̃}(x) p_S(s)).

VI. CONCLUSIONS

A framework for evaluating the capacity of steganographic channels under a passive adversary has been introduced. The system considers a noise corrupting the signal before the detection function in order to model real-world distortions such as compression, quantization, etc.

Constraints on the encoder dealing with distortion and a cover-signal are not considered. Instead, the focus is to develop the theory necessary to analyze the interplay between the channel and detection function that results in the steganographic capacity.

The method uses an information-spectrum approach … these algorithms. This work presents a theory to shed light onto this important quantity called steganographic capacity.

APPENDIX I

Theorem 1.1 (Entropy): Let (p_1, p_2, …) be a sequence of types defined over the finite alphabet X where p_n ∈ P_n. Assume this sequence satisfies the following:
1) p_n → p
2) p_n ≺≺ p,   ∀ n

Then,

        lim inf_{n→∞} (1/n) log |T(p_n)| = H(p).    (211)

Proof: We first show,

        lim inf_{n→∞} (1/n) log |T(p_n)| ≥ H(p).    (212)

A sharpening of Stirling's approximation states that,

        n! = √(2π) n^{n+1/2} e^{−n} e^{λ_n},

with 1/(12n + 1) < λ_n < 1/(12n).
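The sharpened form of Stirling's approximation quoted here can be checked numerically; an illustrative sketch:

```python
# Verify 1/(12n+1) < lambda_n < 1/(12n), where
# lambda_n = ln(n!) - [0.5*ln(2*pi) + (n + 0.5)*ln(n) - n].
import math

for n in (1, 2, 5, 10, 50, 100):
    lam = math.log(math.factorial(n)) - (
        0.5 * math.log(2 * math.pi) + (n + 0.5) * math.log(n) - n)
    assert 1 / (12 * n + 1) < lam < 1 / (12 * n)
    print(f"n={n:3d}: lambda_n = {lam:.6f}")
```

The correction term λ_n is squeezed into an interval of width O(1/n²), which is what makes the bound sharp enough for the type-class estimates in this appendix.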