TRANSACTIONS ON INFORMATION THEORY, VOL. 1, NO. 11, NOVEMBER 2002

Capacity of Steganographic Channels

Jeremiah J. Harmsen, Member, IEEE, and William A. Pearlman, Fellow, IEEE

Abstract— This work investigates a central problem in steganography: how much data can safely be hidden without being detected? To answer this question a formal definition of steganographic capacity is presented. Once this has been defined, a general formula for the capacity is developed. The formula is applicable to a very broad spectrum of channels due to the use of an information-spectrum approach. This approach allows for the analysis of arbitrary steganalyzers as well as non-stationary, non-ergodic encoder and attack channels. After the general formula is presented, various simplifications are applied to gain insight into example hiding and detection methodologies. Finally, the context and applications of the work are summarized in a general discussion.

Index Terms— Steganographic capacity, stego-channel, steganalysis, steganography, information theory, information spectrum

This work was carried out at Rensselaer Polytechnic Institute and was supported by the Air Force Research Labs. J. Harmsen is now with Google Inc. in Mountain View, CA 94043, USA; E-mail: [email protected].
W. Pearlman is with the Elec. Comp. and Syst. Engineering Dept., Rensselaer Polytechnic Institute, Troy, NY 12180-3590, USA; E-mail: [email protected].

May 2, 2006
DRAFT

I. INTRODUCTION

A. Background

Shannon's pioneering work provides bounds on the amount of information that can be transmitted over a noisy channel. His results show that capacity is an intrinsic property of the channel itself. This work takes a similar viewpoint in seeking to find the amount of information that may be transferred over a stego-channel, as seen in Figure 1. The stego-channel is equivalent to the classic channel with the addition of the detection function and attack channel. For the classic channel, a transmission is considered successful if the decoder properly determines which message the encoder has sent. In the stego-channel a transmission is successful not only if the decoder properly determines the sent message, but also if the detection function is not triggered.

This additional constraint on the channel use leads to the fundamental view that the capacity of a stego-channel is an intrinsic property of both the channel and the detection function. That is, the properties of the detection function influence the capacity just as much as the noise in the channel.

B. Previous Work

There have been a number of applications of information theory to the steganographic capacity problem [1], [2]. These works give capacity results under distortion constraints on the hider as well as an active adversary. The additional constraint that the stego-signal retain the same distribution as the cover-signal serves as the steganalysis detection function.

Somewhat less work exists exploring capacity with arbitrary detection functions. These works are written from a steganalysis perspective [3], [4] and accordingly give heavy consideration to the detection function. This work differs from previous work in a number of

aspects. Most notable is the use of information-spectrum methods that allow for the analysis of arbitrary detection algorithms. This eliminates the need to restrict interest to detection algorithms that operate on sample averages or behave consistently. Instead the detection functions may be instantaneous; that is, the properties of a detector for n samples need not have any relation to those of the same detector for n + 1 samples.

Another substantial difference is the presence of noise before the detector. This placement enables the modeling of common signal processing distortions such as compression, quantization, etc. The location of the noise adds complexity not only because of confusion at the decoder, but also because a signal carefully crafted to avoid detection may be corrupted into one that will trigger the detector.

Finally, the consideration of a cover-signal and distortion constraint in the encoding function is omitted. This is due to the view that steganographic capacity is a property of the channel and the detection function. This viewpoint, along with the above differences, makes a direct comparison to previous work somewhat difficult, although possible with a number of simplifications explored in Section V.

[Fig. 1. Active System with Noise: the encoder fn(m) produces X^n, which passes through the encoder noise W^n(y|x) to give Y^n; the steganalyzer gn(y) observes Y^n; the attack channel A^n(z|y) produces Z^n, which is read by the decoder φn(z). The cascade of encoder noise and attack is the encoder-attack channel Q^n(z|x).]

C. Groundwork

This section lays the groundwork for determining the amount of information that may be transferred over the channel shown in Figure 1. Here, the adversary's goal is to disrupt any steganographic communication between the encoder and decoder. To accomplish this a steganalyzer is used to intercept steganographic messages, and an attack function may alter the signal.

We now formally define each of the components in the system, beginning with the random variable notation.

1) Random Variables: Random variables are denoted by capital letters, e.g. X. Realizations of these random variables are denoted by lowercase letters, e.g. x. Each random variable is defined over a domain denoted with a script X. A sequence of n random variables is denoted X^n = (X1, . . . , Xn). Similarly, an n-length sequence of random variable realizations is denoted x = (x1, . . . , xn) ∈ X^n. The probability of X taking value x ∈ X is pX(x).

Following a signal through Figure 1, we begin in the space of n-length stego-signals denoted X^n. The signal then undergoes some distortion as it travels through the encoder-channel. This results in an element from the corrupted stego-signal space Y^n. Finally, the signal is attacked to produce the attacked stego-signal in the space Z^n.

2) Steganalyzer: The steganalyzer is a function gn : Y^n → {0, 1} that classifies a sequence of signals from Y^n into one of two categories: containing steganographic information, and not containing steganographic information. The function is defined as follows for all y ∈ Y^n,

  gn(y) = 1, if y is steganographic; 0, if y is not steganographic.  (1)

The specific type of function may be that of a support vector machine, a Bayesian classifier, etc. A steganalyzer sequence is denoted as,

  g := {g1, g2, g3, . . .},  (2)

where gn : Y^n → {0, 1}. The set of all n-length steganalyzers is denoted Gn.

3) Permissible Set: For any steganalyzer gn, the space of signals Y^n is split into the permissible set and the impermissible set, defined below. The permissible set Pgn ⊆ Y^n is the inverse image of 0 under gn. That is,

  Pgn := gn^{-1}({0}) = {y ∈ Y^n : gn(y) = 0}.  (3)

The permissible set is the set of all signals of Y^n that the given steganalyzer gn will classify as non-steganographic.

Since each steganalyzer has a binary range, a steganalyzer sequence may be completely described by a sequence of permissible sets. To denote a steganalyzer sequence in such a way the following notation is used,

  g ≅ {P1, P2, P3, . . .},

where Pn ⊆ Y^n is the permissible set for gn.

4) Impermissible Set: The impermissible set Ign ⊆ Y^n is the inverse image of 1 under gn. That is,

  Ign := gn^{-1}({1}) = {y ∈ Y^n : gn(y) = 1}.  (4)

For a given gn the impermissible set is the set of all signals in Y^n that gn will classify as steganographic.

[Fig. 2. Permissible and Impermissible Sets: gn partitions Y^n into Pgn = gn^{-1}({0}) and Ign = gn^{-1}({1}).]

Example 1: Consider the illustrative sum steganalyzer defined for binary channel outputs (Y = {0, 1}). The steganalyzer is defined for y = (y1, . . . , yn) as,

  gn(y) = 1, if Σ_{i=1}^{n} yi > n/2; 0, else.  (5)

The permissible sets for n = 1, 2, 3, 4 are shown in Table I.

TABLE I
SUM STEGANALYZER PERMISSIBLE SETS

P1 = {(0)}
P2 = {(0,0),(0,1),(1,0)}
P3 = {(0,0,0),(1,0,0),(0,1,0),(0,0,1)}
P4 = {(0,0,0,0),(1,0,0,0),(0,1,0,0),(0,0,1,0),(0,0,0,1),(1,1,0,0),(1,0,1,0),(1,0,0,1),(0,1,1,0),(0,1,0,1),(0,0,1,1)}

5) Memoryless Steganalyzers: A memoryless steganalyzer, g = {gn}∞n=1, is one where each gn is defined for y = (y1, y2, . . . , yn) as,

  gn(y) = 1, if ∃ i ∈ {1, 2, . . . , n} such that g(yi) = 1; 0, if g(yi) = 0 ∀ i ∈ {1, 2, . . . , n},  (6)

where g ∈ G1 is said to specify gn (and g). To denote that a steganalyzer sequence is memoryless the following notation will be used: g = {g}.

The analysis of the memoryless steganalyzer is motivated by current real-world implementations of detection systems. As an example we may consider each yi to be a digital image sent via email. When sending n emails, the hider attaches one of the yi's to each message. The entire sequence of images is considered to be y. Typically steganalyzers do not make use of the entire sequence y. Instead each image is sequentially processed by a given steganalyzer g, where if any of the yi trigger the detector the entire sequence of emails is treated as steganographic.

Clearly for a memoryless steganalyzer gn, defined by g, we have that,

  Pgn = Pg × Pg × · · · × Pg (n times).  (7)

That is, the permissible set of gn is the n-fold product of Pg.

D. Channels

We now define two channels. The first models inherent distortions occurring between the encoder and detection function, such as compression of the stego-signal. The second models a malicious attack by an active adversary, such as cropping or additive noise.

1) Encoder-Noise Channel: The encoder-noise channel is denoted as W^n, where W^n : Y^n × X^n → [0, 1] has the following property for all x ∈ X^n,

  W^n(y|x) := Pr{Y^n = y | X^n = x}.

The channel represents the conditional probabilities of the steganalyzer receiving y ∈ Y^n when x ∈ X^n is sent. The random variable Y resulting from transmitting X through the channel W will be denoted X →W Y.

We denote an arbitrary encoder-noise channel as the sequence of transition probabilities,

  W := {W^1, W^2, W^3, . . .}.

2) Attack Channel: The attack channel A^n : Y^n → Z^n is given by,

  A^n(z|y) = Pr{Z^n = z | Y^n = y}.  (8)

The attack channel may be deterministic or probabilistic. Similarly to the encoder-noise channel, we denote an arbitrary attack channel as the sequence of transition probabilities,

  A := {A^1, A^2, A^3, . . .}.

3) Encoder-Attack Channel: The encoder-attack channel, or simply channel, is a function Q^n : X^n → Z^n defined to model the combined effect of the encoder-noise and attack channels. Thus,

  Q^n(z|x) = Σ_{y∈Y^n} A^n(z|y) W^n(y|x).  (9)

The specification of Q^n by A^n and W^n is denoted Q^n = A^n ∘ W^n.

The arbitrary encoder-attack channel is a sequence of transition probabilities,

  Q := {Q^1, Q^2, Q^3, . . .}.  (10)

We will express the dependence between the arbitrary encoder-noise and attack channels and the arbitrary encoder-attack channel as Q = A ∘ W.

4) Memoryless Channels: In the case where channel distortions act independently and identically on each input letter xi, we say the channel is memoryless. In this instance the n-length transition probabilities can be written as,

  W^n(y|x) = Π_{i=1}^{n} W(yi|xi),  (11)

where W is said to define the channel. To denote that a channel is memoryless and defined by W we will write W = {W}.

E. Encoder and Decoder

The purpose of the encoder and decoder is to transmit and receive information across a channel. The information to be transferred is assumed to be from a uniformly distributed message set denoted Mn, with cardinality Mn.

The encoding function embeds a message into a stego-signal. That is, fn : Mn → X^n. The element of X^n to which the ith message maps is called the codeword for i and is denoted ui. That is,

  fn(i) = ui,  i ∈ {1, . . . , Mn}.

The collection of codewords, Cn = {u1, . . . , uMn}, is called the code. The rate of an encoding function is given as,

  Rn := (1/n) log Mn.

The decoding function, φn : Z^n → Mn, maps a corrupted stego-signal to a message. The decoder is defined by the set of decoding regions for each message. The decoding regions, D1, . . . , DMn, are disjoint sets that cover Z^n, defined such that,

  Dm := φn^{-1}({m}) = {z ∈ Z^n : φn(z) = m},

for m = 1, . . . , Mn.

Next, two important terms are presented that allow for the analysis of steganographic systems. The first is the probability the decoder makes a mistake, called the probability of error. The second is the probability the steganalyzer is triggered, called the probability of detection. In both cases they are calculated for a given code Cn = {u1, . . . , uMn}, encoder-channel W^n, attack channel A^n and impermissible set Ign (corresponding to some gn).

The probability of error in decoding the message can be found as,

  ǫn = (1/Mn) Σ_{i=1}^{Mn} Q^n(Di^c | ui),  (12)

where Q^n = A^n ∘ W^n. Similarly the probability of detection for the steganalyzer is calculated as,

  δn = (1/Mn) Σ_{i=1}^{Mn} W^n(Ign | ui).  (13)

F. Stego-Channel

A steganographic channel or stego-channel is a triple (W, g, A), where W is an arbitrary encoder-noise channel, g is a steganalyzer sequence, and A is an arbitrary attack channel. To reinforce the notion that a stego-channel is defined by a sequence of triples we will typically write (W, g, A) = {(W^n, gn, A^n)}∞n=1.

1) Discrete Stego-Channel: A discrete stego-channel is one where at least one of the following holds:

  |X| < ∞,  |Y| < ∞,  |Z| < ∞,  or |Pgn| < ∞ ∀n.

2) Discrete Memoryless Stego-Channel: A discrete memoryless stego-channel (DMSC) is a stego-channel where,
1) (W, g, A) is discrete
2) W is memoryless
3) g is memoryless
4) A is memoryless
A DMSC is said to be defined by the triple (W, g, A) and will be denoted (W, g, A) = {(W, g, A)}.

G. Steganographic Capacity

The secure capacity tells us how much information can be transferred with arbitrarily low probabilities of error and detection.

An (n, Mn, ǫn, δn)-code (for a given stego-channel) consists of an encoder and decoder. The encoder and decoder are capable of transferring one of Mn messages in n uses of the channel with an average probability of error of less than (or equal to) ǫn and a probability of detection of less than (or equal to) δn.
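As an illustration of the definitions above, the following sketch (not from the paper; the two-codeword code and the binary symmetric crossover value are assumptions made for this example) enumerates the permissible sets of the sum steganalyzer from Example 1 and evaluates the probability of detection (13) under a memoryless encoder-noise channel.

```python
from itertools import product

def g_sum(y):
    # Sum steganalyzer of Example 1: flag y when sum(y) > n/2.
    return 1 if sum(y) > len(y) / 2 else 0

def permissible_set(g, n):
    # P_{g_n} = g_n^{-1}({0}): all length-n binary outputs classified as clean.
    return [y for y in product((0, 1), repeat=n) if g(y) == 0]

sizes = [len(permissible_set(g_sum, n)) for n in range(1, 5)]
print(sizes)  # [1, 3, 4, 11], matching Table I

def detection_probability(codewords, W, g, n):
    # delta_n of (13): average over codewords of Pr{Y^n in I_{g_n}},
    # where Y^n is produced letter-by-letter through a memoryless W(y|x).
    total = 0.0
    for u in codewords:
        for y in product((0, 1), repeat=n):
            if g(y) == 1:
                p = 1.0
                for xi, yi in zip(u, y):
                    p *= W[xi][yi]
                total += p
    return total / len(codewords)

# Hypothetical toy code over a binary symmetric channel with crossover 0.1.
W_bsc = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
code = [(0, 0, 0), (0, 0, 1)]
print(detection_probability(code, W_bsc, g_sum, 3))  # 0.1
```

The second codeword already carries a 1, so it is far more likely to land in the impermissible set; averaging over the code gives δ3 = 0.1 for these assumed values.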

1) Secure Capacity: A rate R is said to be securely achievable for a stego-channel (W, g, A) = {(W^n, gn, A^n)}∞n=1 if there exists a sequence of (n, Mn, ǫn, δn)-codes such that:
1) lim_{n→∞} ǫn = 0
2) lim_{n→∞} δn = 0
3) liminf_{n→∞} (1/n) log Mn ≥ R

The secure capacity of a stego-channel (W, g, A) is denoted C(W, g, A). This is defined as the supremum of all securely achievable rates for (W, g, A).

H. (ǫ, δ)-Secure Capacity

A rate R is said to be (ǫ, δ)-securely achievable for a stego-channel (W, g, A) = {(W^n, gn, A^n)}∞n=1 if there exists a sequence of (n, Mn, ǫn, δn)-codes such that:
1) limsup_{n→∞} ǫn ≤ ǫ
2) limsup_{n→∞} δn ≤ δ
3) liminf_{n→∞} (1/n) log Mn ≥ R

II. SECURE CAPACITY FORMULA

A. Information-Spectrum Methods

The information-spectrum method [5], [6], [7], [8], [9] is a generalization of information theory created to apply to systems where either the channel or its inputs are not necessarily ergodic or stationary. Its use is required in this work because the steganalyzer is not assumed to have any ergodic or stationary properties.

The information-spectrum method uses the general source (also called a general sequence) defined as,

  X := {X^n = (X1^(n), X2^(n), . . . , Xn^(n))}∞n=1,  (14)

where each Xm^(n) is a random variable defined over the alphabet X. It is important to note that the general source makes no assumptions about consistency, ergodicity, or stationarity.

The information-spectrum method also uses two novel quantities defined for sequences of random variables, called the limsup and liminf in probability.

The limsup in probability of a sequence of random variables {Zn}∞n=1 is defined as,

  p-limsup Zn := inf{α : lim_{n→∞} Pr{Zn > α} = 0}.

Similarly, the liminf in probability of a sequence of random variables {Zn}∞n=1 is,

  p-liminf Zn := sup{β : lim_{n→∞} Pr{Zn < β} = 0}.

The spectral sup-entropy rate of a general source X = {X^n}∞n=1 is defined as,

  H̄(X) := p-limsup (1/n) log [1/pX^n(X^n)].  (15)

Analogously, the spectral inf-entropy rate of a general source X = {X^n}∞n=1 is defined as,

  H̲(X) := p-liminf (1/n) log [1/pX^n(X^n)].  (16)

The spectral entropy rates have a number of natural properties, such as, for any X, H̄(X) ≥ H̲(X) ≥ 0 [5, Thm. 1.7.2].

The spectral sup-mutual information rate for the pair of general sequences (X, Y) = {(X^n, Y^n)}∞n=1 is defined as,

  Ī(X; Y) := p-limsup (1/n) i(X^n; Y^n),  (17)

where,

  i(X^n; Y^n) := log [pY^n|X^n(Y^n|X^n) / pY^n(Y^n)].  (18)

Likewise the spectral inf-mutual information rate for the pair of general sequences (X, Y) = {(X^n, Y^n)}∞n=1 is defined as,

  I̲(X; Y) := p-liminf (1/n) i(X^n; Y^n).  (19)
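For a concrete feel for the spectral entropy rates, consider an i.i.d. Bernoulli source (an assumption made for this illustration only, since the general source needs no such structure). The normalized self-information (1/n) log [1/p(X^n)] then concentrates at the Shannon entropy, so H̲ and H̄ coincide. The exact binomial computation below shows the deviation probability vanishing:

```python
from math import comb, log

q = 0.3                                     # assumed i.i.d. Bernoulli(q) source
H = -(q * log(q) + (1 - q) * log(1 - q))    # Shannon entropy in nats

def tail(n, eps):
    # Exact Pr{ |(1/n) log 1/p(X^n) - H| > eps }: for a Bernoulli source the
    # normalized self-information depends only on the number of ones k.
    total = 0.0
    for k in range(n + 1):
        z = -(k * log(q) + (n - k) * log(1 - q)) / n
        if abs(z - H) > eps:
            total += comb(n, k) * q**k * (1 - q)**(n - k)
    return total

for n in (10, 100, 1000):
    print(n, tail(n, 0.1))   # decreases toward 0 as n grows
```

Since the deviation probability tends to zero for every eps > 0, both the p-liminf and p-limsup of (1/n) log [1/p(X^n)] equal H here, i.e. H̲(X) = H̄(X) = H.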

B. Information-Spectrum Results

This section lists some of the fundamental results from information-spectrum theory [5] that will be used in the remainder of the paper.

  H̲(X) ≤ liminf_{n→∞} (1/n) H(X^n)  (20)
  Ī(X; Y) ≤ H̄(Y) − H̲(Y|X)  (21)
  I̲(X; Y) ≥ H̲(Y) − H̄(Y|X)  (22)

C. Secure Sequences

1) Secure Input Sequences: For a given stego-channel (W, g, A), a general source X = {X^n}∞n=1 is called δ-secure if the resulting Y = {Y^n}∞n=1 satisfies,

  limsup_{n→∞} Pr{gn(Y^n) = 1} ≤ δ,  (23)

or either of the following equivalent conditions,

  limsup_{n→∞} pY^n(Ign) ≤ δ,  (24)
  liminf_{n→∞} pY^n(Pgn) ≥ 1 − δ.  (25)

The set of all general sources that are δ-secure is denoted Sδ, that is,

  Sδ := {X : limsup_{n→∞} Σ_{x∈X^n} W^n(Ign|x) pX^n(x) ≤ δ},  (26)

where X = {X^n}∞n=1. The set for δ = 0 is called the secure input set and denoted S0.

2) Secure Output Sequences: For a given steganalyzer sequence g = {gn}∞n=1, a general sequence Y = {Y^n}∞n=1 is called δ-secure if,

  limsup_{n→∞} Pr{gn(Y^n) = 1} ≤ δ.  (27)

The set of all general output sequences that are δ-secure is denoted Tδ, that is,

  Tδ := {Y = {Y^n}∞n=1 : limsup_{n→∞} pY^n(Ign) ≤ δ}.  (28)

The set for δ = 0 is called the secure output set and denoted T0.

D. (ǫ, δ)-Channel Capacity

We are now prepared to derive the first fundamental result, the (ǫ, δ)-channel capacity. This capacity will make use of the following definition,

  J(R|X) := limsup_{n→∞} Pr{(1/n) i(X^n; Z^n) ≤ R}
  = limsup_{n→∞} Pr{(1/n) log [pZ^n|X^n(Z^n|X^n) / pZ^n(Z^n)] ≤ R}
  = limsup_{n→∞} Pr{(1/n) log [Q^n(Z^n|X^n) / pZ^n(Z^n)] ≤ R}.

The proof is the general ǫ-capacity proof given by Han [5], [6], with the restriction to the secure input set.

Theorem 2.1 ((ǫ, δ)-Channel Capacity): The (ǫ, δ)-channel capacity C(ǫ, δ|W, g, A) of a stego-channel (W, g, A) is given by,

  C(ǫ, δ|W, g, A) = sup_{X∈Sδ} sup{R : J(R|X) ≤ ǫ},  (29)

for any 0 ≤ ǫ < 1 and 0 ≤ δ < 1.

Proof: This proof is based on [5], [6]. Let C = sup_{X∈Sδ} sup{R : J(R|X) ≤ ǫ}, and Q^n = A^n ∘ W^n.

Achievability: Choose any ǫ ≥ 0 and δ > 0. Let R = C − 3γ, for any γ > 0. By the definition of C we have that there exists an X ∈ Sδ such that,

  sup{R : J(R|X) ≤ ǫ} ≥ C − γ = R + 2γ.  (30)

Similarly we may find an R′ > R + γ such that J(R′|X) ≤ ǫ. By the monotonicity of J(R|X) we have that,

  J(R + γ|X) ≤ ǫ.  (31)

Next, by letting Mn = e^{nR} we have that,

  liminf_{n→∞} (1/n) log Mn ≥ R.
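To make the δ-secure condition (23) concrete, take the sum steganalyzer of Example 1 and an output sequence that is i.i.d. Bernoulli(q). This is an assumed toy setup, not from the paper: for q < 1/2 the detection probability Pr{gn(Y^n) = 1} vanishes by the law of large numbers, so the sequence is 0-secure, while for q > 1/2 it tends to one.

```python
from math import comb

def p_detect(n, q):
    # Pr{g_n(Y^n) = 1} for the sum steganalyzer of Example 1 when
    # Y_1..Y_n are i.i.d. Bernoulli(q): exact Pr{sum(Y) > n/2}.
    return sum(comb(n, k) * q**k * (1 - q)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# q < 1/2: detection probability vanishes, so the sequence lies in T_0;
# q > 1/2: it tends to one, so the sequence is not delta-secure for delta < 1.
for q in (0.3, 0.7):
    print(q, [round(p_detect(n, q), 4) for n in (11, 51, 201)])
```

The same computation, weighted by a code's codewords through W^n as in (26), decides whether a code-induced input distribution belongs to Sδ.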

Using Feinstein's Lemma [10] we have that there exists an (n, Mn, ǫn)-code with,

  ǫn ≤ Pr{(1/n) log [Q^n(Z^n|X^n)/pZ^n(Z^n)] ≤ (1/n) log Mn + γ} + e^{−nγ}.  (32)

As (1/n) log Mn = R for all n, we have,

  ǫn ≤ Pr{(1/n) log [Q^n(Z^n|X^n)/pZ^n(Z^n)] ≤ R + γ} + e^{−nγ}.  (33)

Taking the limsup of each side we have,

  limsup_{n→∞} ǫn ≤ J(R + γ|X),  (34)

which with J(R + γ|X) ≤ ǫ shows that limsup_{n→∞} ǫn ≤ ǫ. Finally, since X ∈ Sδ we have that,

  limsup_{n→∞} pY^n(Ign) ≤ δ.  (35)

Converse: Let R > C, and choose γ > 0 such that R − 2γ > C. Assume that R is (ǫ, δ)-achievable, so there exists an (n, Mn, ǫn, δn)-code such that,

  liminf_{n→∞} (1/n) log Mn ≥ R,  (36)
  limsup_{n→∞} ǫn ≤ ǫ,  (37)

and

  limsup_{n→∞} δn ≤ δ.  (38)

Let X = {X^n}∞n=1 be the input distribution of this code and let Z be the corresponding output of X through Q. As R − 2γ > C ≥ sup{R : J(R|X) ≤ ǫ} we must have,

  J(R − 2γ|X) > ǫ.  (39)

Using the Feinstein Dual [5], [6] we have,

  ǫn ≥ Pr{(1/n) log [Q^n(Z^n|X^n)/pZ^n(Z^n)] ≤ (1/n) log Mn − γ} − e^{−nγ}.  (40)

Using the property of liminf we have for all n > n0 that,

  (1/n) log Mn ≥ R − γ.  (41)

Thus for n > n0 we have,

  ǫn ≥ Pr{(1/n) log [Q^n(Z^n|X^n)/pZ^n(Z^n)] ≤ R − 2γ} − e^{−nγ}.  (42)

Taking the limsup of both sides,

  limsup_{n→∞} ǫn ≥ J(R − 2γ|X).  (43)

Since J(R − 2γ|X) > ǫ by (39), we see that,

  limsup_{n→∞} ǫn > ǫ.  (44)

E. Secure Channel Capacity

The next result deals with a special case of (ǫ, δ)-capacity, namely the one where ǫ = δ = 0. The secure channel capacity is the maximum amount of information that may be sent over a channel with arbitrarily small probabilities of error and detection.

The four potential formulations for our model are shown in Figure 3. The capacity of the stego-channel (W, g, A) is given in Theorem 2.2 to follow and specialized to the other cases in Theorems 2.3, 2.4 and 2.5. The resulting capacities are summarized in Table II.

[Fig. 3. Stego-channels: the four formulations. (W, g, A): encoder noise W^n(y|x) and attack A^n(z|y); (W, g, ·): encoder noise with a passive adversary, so Y^n = Z^n; (·, g, A): a noiseless encoder, so X^n = Y^n, with attack; (·, g, ·): noiseless encoder and passive adversary, so X^n = Y^n = Z^n. In each case the steganalyzer gn(y) observes Y^n.]
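The proofs above repeatedly use the composition Q^n = A^n ∘ W^n of (9). For single-letter (memoryless) channels this is just a product of row-stochastic matrices; the sketch below composes two binary symmetric channels whose crossover values are assumed for illustration:

```python
def compose(W, A):
    # Q(z|x) = sum_y A(z|y) W(y|x)  -- equation (9) for one letter.
    ys = range(len(A))
    return [[sum(A[y][z] * W[x][y] for y in ys) for z in range(len(A[0]))]
            for x in range(len(W))]

W = [[0.9, 0.1], [0.1, 0.9]]   # BSC(0.1) as encoder noise (assumed values)
A = [[0.8, 0.2], [0.2, 0.8]]   # BSC(0.2) as attack channel (assumed values)
Q = compose(W, A)
print(Q[0][1])                 # composed crossover: 0.9*0.2 + 0.1*0.8 = 0.26
print([sum(row) for row in Q]) # each row sums to 1: Q is a valid channel
```

Composing two binary symmetric channels yields another binary symmetric channel, which is why the encoder-attack channel can be treated as a single channel Q throughout the capacity proofs.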

Theorem 2.2 (Secure Capacity): The secure channel capacity C(W, g, A) of a stego-channel (W, g, A) is given by,

  C(W, g, A) = sup_{X∈S0} I̲(X; Z).  (45)

Proof: We apply Theorem 2.1 with ǫ = 0 and δ = 0. This gives,

  C(W, g, A) = C(0, 0|W, g, A)  (46a)
  = sup_{X∈S0} sup{R : J(R|X) ≤ 0}  (46b)
  = sup_{X∈S0} sup{R : limsup_{n→∞} Pr{(1/n) i(X^n; Z^n) ≤ R} ≤ 0}  (46c)
  = sup_{X∈S0} I̲(X; Z).  (46d)

Here the last line is due to the definition of p-liminf.

Theorem 2.3 (Noiseless Encoder, Active Adversary): The secure capacity of a stego-channel (·, g, A), with a noiseless encoder and active adversary, denoted C(·, g, A), is given by,

  C(·, g, A) = sup_{Y∈T0} I̲(Y; Z),  (47)

where,

  I̲(Y; Z) = p-liminf (1/n) log [A^n(Z^n|Y^n) / pZ^n(Z^n)].  (48)

Proof: From Theorem 2.2 we have,

  C(W, g, A) = sup_{X∈S0} I̲(X; Z).  (49)

Since there is no encoder noise we have that X = Y and S0 = T0, so

  C(·, g, A) = sup_{X∈S0} I̲(X; Z)  (50)
  = sup_{Y∈T0} I̲(Y; Z).  (51)

Theorem 2.4 (Passive Adversary): The secure channel capacity with a passive adversary, denoted C(W, g), of a stego-channel (W, g, ·) is given by,

  C(W, g) = sup_{X∈S0} I̲(X; Y).  (52)

Proof: From Theorem 2.2 we have,

  C(W, g, A) = sup_{X∈S0} I̲(X; Z).  (53)

Since the adversary is passive, we have that Z = Y.

Theorem 2.5 (Noiseless Encoder, Passive Adversary): The secure capacity of a stego-channel (·, g, ·), with a noiseless encoder and passive adversary, denoted C(·, g), is given by,

  C(·, g) = sup_{Y∈T0} H̲(Y).  (54)

Proof: From Theorem 2.2 we have,

  C(W, g, A) = sup_{X∈S0} I̲(X; Z).  (55)

Since the adversary is passive, we have that Z = Y, and since there is no encoder noise we have that X = Y as well as S0 = T0. Thus,

  C(·, g) = sup_{X∈S0} I̲(X; X)  (56)
  = sup_{Y∈T0} H̲(Y).  (57)

Here we have made use of the result that, since X = Y, pX^n|Y^n(X^n|Y^n) = 1 and,

  I̲(Y; X) = p-liminf (1/n) log [pX^n|Y^n(X^n|Y^n) / pY^n(Y^n)]  (58)
  = p-liminf (1/n) log [1 / pY^n(Y^n)]  (59)
  = H̲(Y).  (60)

F. Strong Converse

A stego-channel (W, g, A) is said to satisfy the ǫ-strong converse property if for any R > C(0, δ|W, g, A), every (n, Mn, ǫn, δn)-code with,

  liminf_{n→∞} (1/n) log Mn ≥ R,

and

  limsup_{n→∞} δn ≤ δ,

we have,

  lim_{n→∞} ǫn = 1.

Thus if a channel satisfies the ǫ-strong converse,

  C(ǫ, δ|W, g, A) = C(0, δ|W, g, A),  (61)

for any ǫ ∈ [0, 1).

TABLE II
SECURE CAPACITY FORMULAS

Secure Capacity | Noise | Attack | Thm.
C(W, g, A) = sup_{X∈S0} I̲(X; Z) | W | A | 2.2
C(·, g, A) = sup_{Y∈T0} I̲(Y; Z) | Noiseless | A | 2.3
C(W, g) = sup_{X∈S0} I̲(X; Y) | W | Passive | 2.4
C(·, g) = sup_{Y∈T0} H̲(Y) | Noiseless | Passive | 2.5

Theorem 2.6 (ǫ-Strong Converse): A stego-channel (W, g, A) satisfies the ǫ-strong converse property (for a fixed δ) if and only if,

  sup_{X∈Sδ} I̲(X; Z) = sup_{X∈Sδ} Ī(X; Z).  (62)

Proof: This proof is based on [5], [6].

First assume sup_{X∈Sδ} I̲(X; Z) = sup_{X∈Sδ} Ī(X; Z). Let R = C(0, δ|W, g, A) + 3γ with γ > 0. Consider an (n, Mn, ǫn, δn)-code with,

  liminf_{n→∞} (1/n) log Mn ≥ R,

and

  limsup_{n→∞} δn ≤ δ.

Let X represent the uniform input due to this code and Z the output after the channel Q = A ∘ W. From the Feinstein Dual [5], [6] we know,

  ǫn ≥ Pr{(1/n) i(X^n; Z^n) ≤ (1/n) log Mn − γ} − e^{−nγ}.  (63)

We also know there exists n0 such that for all n > n0,

  (1/n) log Mn ≥ R − γ,  (64)

so for n > n0,

  ǫn ≥ Pr{(1/n) i(X^n; Z^n) ≤ R − 2γ} − e^{−nγ}.  (65)

We now show that the probability term above tends to 1. Using Theorem 2.2 we have,

  R = C(0, δ|W, g, A) + 3γ  (66)
  = sup_{X∈Sδ} I̲(X; Z) + 3γ  (67)
  = sup_{X∈Sδ} Ī(X; Z) + 3γ.  (68)

Rewriting gives,

  R − 2γ = sup_{X∈Sδ} Ī(X; Z) + γ.  (69)

By the definition of Ī(X; Z) we finally have,

  lim_{n→∞} Pr{(1/n) i(X^n; Z^n) ≤ R − 2γ} = 1,  (70)

which together with (65) shows that lim_{n→∞} ǫn = 1.

For the other direction assume,

  lim_{n→∞} ǫn = 1,  (71)

and,

  limsup_{n→∞} δn ≤ δ.  (72)

Set R = C(0, δ|W, g, A) + γ for any γ > 0 and set Mn = e^{nR}. Clearly,

  liminf_{n→∞} (1/n) log Mn = R > C(0, δ|W, g, A).
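A hedged numeric aside: for a stationary memoryless channel driven by an i.i.d. input, the normalized information density (1/n) i(X^n; Z^n) concentrates at the Shannon mutual information by the law of large numbers, so I̲ and Ī agree and condition (62) holds for such channels. The sketch below computes the Shannon value I(X; Y) = H(Y) − H(Y|X) for a binary symmetric channel; the input bias and crossover values are assumptions for this example.

```python
from math import log2

def h2(p):
    # Binary entropy in bits.
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_mutual_info(px1, eps):
    # I(X;Y) = H(Y) - H(Y|X) for a BSC(eps) with Pr{X=1} = px1.
    py1 = px1 * (1 - eps) + (1 - px1) * eps
    return h2(py1) - h2(eps)

# Uniform input maximizes I for the BSC, giving the classic 1 - h2(eps).
print(bsc_mutual_info(0.5, 0.1))
print(bsc_mutual_info(0.3, 0.1))  # biased input gives a smaller value
```

Under this concentration both spectral rates equal the same Shannon quantity, which is why discrete memoryless stego-channels are the natural setting for the strong converse.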

For any X ∈ Sδ (and its corresponding Z), using Feinstein's Lemma [10] we have an (n, Mn, ǫn)-code satisfying,

  ǫn ≤ Pr{(1/n) i(X^n; Z^n) ≤ R + γ} + e^{−nγ}.  (73)

From the error assumption we see that,

  lim_{n→∞} Pr{(1/n) i(X^n; Z^n) ≤ R + γ} = 1.  (74)

This means that,

  R + γ ≥ Ī(X; Z),  (75)

and since X ∈ Sδ is arbitrary we have,

  R + γ ≥ sup_{X∈Sδ} Ī(X; Z).  (76)

Substituting we have that,

  sup_{X∈Sδ} Ī(X; Z) ≤ R + γ  (77)
  = C(0, δ|W, g, A) + 2γ  (78)
  = sup_{X∈Sδ} I̲(X; Z) + 2γ.  (79)

As γ is arbitrarily close to 0 we have,

  sup_{X∈Sδ} Ī(X; Z) ≤ sup_{X∈Sδ} I̲(X; Z).  (80)

Also, by definition,

  sup_{X∈Sδ} Ī(X; Z) ≥ sup_{X∈Sδ} I̲(X; Z),  (81)

showing equality and completing the proof.

G. Bounds

We now derive a number of useful bounds on the spectral entropy of an output sequence in relation to the permissible set. These bounds will then be used to prove general bounds for steganographic systems, and see further application in Section III.

Theorem 2.7 (Spectral inf-entropy bound): For a discrete g = {Pn}∞n=1 with corresponding secure output set T0,

  sup_{Y∈T0} H̲(Y) = liminf_{n→∞} (1/n) log |Pn|.  (82)

Proof: Let U(A) represent the uniform distribution on a set A. Since Y* = {U(Pn)}∞n=1 ∈ T0 we have,

  sup_{Y∈T0} H̲(Y) ≥ H̲(Y*)  (83a)
  = liminf_{n→∞} (1/n) log |Pn|.  (83b)

Now assume there exists Y ∈ T0 with Y = {Ȳ^n}∞n=1 such that,

  H̲(Y) = H̲(Y*) + 3γ,  (84)

for some γ > 0. This means that,

  lim_{n→∞} Pr{(1/n) log [1/pȲ^n(Ȳ^n)] < H̲(Y*) + 2γ} = 0.  (85)

By (83b) we have H̲(Y*) = liminf_{n→∞} (1/n) log |Pn|, and from the definition of liminf we may find a subsequence indexed by kn such that,

  H̲(Y*) + 2γ ≥ (1/kn) log |Pkn| + γ.  (86a)

For any kn, (86a) holds and we have,

  Pr{(1/kn) log [1/pȲ^{kn}(Ȳ^{kn})] < (1/kn) log |Pkn| + γ}  (87)
  ≤ Pr{(1/kn) log [1/pȲ^{kn}(Ȳ^{kn})] < H̲(Y*) + 2γ}.  (88)

Applying this result to (85) we have,

  lim_{n→∞} Pr{(1/kn) log [1/pȲ^{kn}(Ȳ^{kn})] < (1/kn) log |Pkn| + γ} = 0.  (89)

Rearranging the inner term we have,

  lim_{n→∞} Pr{pȲ^{kn}(Ȳ^{kn}) > e^{−knγ}/|Pkn|} = 0.  (90)

This means that for any ǫ > 0 we may find n0 such that for n > n0,

  Pr{pȲ^{kn}(Ȳ^{kn}) > e^{−knγ}/|Pkn|} < ǫ.  (91)

Let,

  Akn := {y ∈ Y^{kn} : pȲ^{kn}(y) > e^{−knγ}/|Pkn|},  (92)

TRANSACTIONS ON INFORMATION THEORY, VOL. 1, NO. 11, NOVEMBER 2002

12

Proof: Since Y∗ = {U(Pn )}∞ i=1 ∈ T0 we have,

so for all n > n0 ,

pY¯ kn (Akn ) < ǫ.

sup H(Y) ≥ H(Y∗ )

(93)

(98a)

Y∈T0

= lim sup

So for n > n0 we may calculate the probability of the

n→∞

1 log |Pn | n

(98b)

permissible set (for the subsequence) as, pY¯ kn (Pkn ) =

X

pY¯ kn (y)

(94a)

Now assume there exists Y ∈ T0 , with Y = {Y¯ n }∞ n=1 such that,

y∈Pkn

X

=

X

pY¯ kn (y) +

y∈Pkn ∩Ackn

y∈Pkn ∩Akn

(94b) ≤

−kn γ

X

y∈Pkn ∩Ackn

e + |Pkn |

X

pY¯ kn (y)

X e−kn γ ≤ + |Pkn | y∈Pkn

X

This means that,

lim Pr

pY¯ kn (y) (94d)

pY¯ kn (y)



X

pY¯ kn (y)

(94f)

< e−kn γ + ǫ

lim Pr

n→∞

Thus we see that for the subsequence, (95)

= 0 (100)

(101)

and

(94g)

lim sup pY¯ kn (Pkn ) < ǫ,



1 γ log |Pkn | + γ > H(Y) + kn 4

(94e)

y∈Akn

1 1 γ log > H(Y) + n ¯ n 4 pY¯ n (Y )

Thus for some subsequence kn we have,

y∈Pkn ∩Akn

≤ e−kn γ +

(99)

for any γ > 0.

n→∞

y∈Pkn ∩Akn

X

γ , 4

y∈Pkn ∩Akn

(94c)

= e−kn γ +

H(Y) = H(Y∗ ) +

pY¯ kn (y)



1 1 1 log log |Pkn | + γ > k ¯ n kn k pY¯ kn (Y ) n



= 0.

(102)

Rewriting we have,

n→∞

  e−kn γ lim Pr pY¯ kn (Y¯ kn ) < =0 n→∞ |Pkn |

for all ǫ > 0 so clearly, lim pY¯ n (Pn ) = 1,

n→∞

(96) Let,

is impossible. Thus from (25) we have a contradiction as the above / T0 . implies that Y ∈ Theorem 2.8 (Spectral sup-entropy bound): For discrete g = {Pn }∞ n=1 with corresponding secure output set T0 , sup H(Y) = lim sup Y∈T0

May 2, 2006

(103)

n→∞

1 log |Pn | n

(97)

Akn =

  e−kn γ y ∈ X n : pY¯ kn (Y¯ kn ) < |Pkn |

(104)

and given any ǫ > 0 we may find n0 so for n > n0 , pY¯ kn (Akn ) < ǫ.

(105)

For n > n0 the probability of the permissible set (in this DRAFT

subsequence) is,

    p_{Ȳ^{k_n}}(P_{k_n}) = Σ_{y ∈ P_{k_n}} p_{Ȳ^{k_n}}(y)    (106a)
        = Σ_{y ∈ P_{k_n} ∩ A_{k_n}} p_{Ȳ^{k_n}}(y) + Σ_{y ∈ P_{k_n} ∩ Ā_{k_n}} p_{Ȳ^{k_n}}(y)    (106b, 106c)
        ≤ e^{−k_n γ} + Σ_{y ∈ P_{k_n} ∩ Ā_{k_n}} p_{Ȳ^{k_n}}(y)    (106d)
        ≤ e^{−k_n γ} + Σ_{y ∈ Ā_{k_n}} p_{Ȳ^{k_n}}(y)    (106e)
        < e^{−k_n γ} + ε.    (106f)

This gives,

    lim sup_{n→∞} p_{Ȳ^{k_n}}(P_{k_n}) < ε,    (107)

for any ε > 0. Since the subsequence above does not converge to 1 it is impossible for,

    lim inf_{n→∞} p_{Ȳ^n}(P_n) = 1,    (108)

and by (25) we see Y ∉ T0.

H. Capacity Bounds

This section presents a number of fundamental bounds on the secure capacity of a stego-channel based on the properties of that channel. We make use of the following lemma.

Lemma 2.1: For a stego-channel (W, g, A) the following hold,

    I(X; Z) ≤ I(X; Y),    (109)
    I(X; Z) ≤ I(Y; Z),    (110)

when X → Y → Z. (X → Y → Z is said to hold when, for all n, X^n and Z^n are conditionally independent given Y^n.)

Proof: We note that the general distributions form a Markov chain, X → Y → Z. A property of the inf-information rate [6] is,

    I(X; Z) ≤ I(X; Y),    (111)

when X → Y → Z. Since X → Y → Z implies Z → Y → X, we also have,

    I(X; Z) ≤ I(Y; Z).    (112)

The first capacity bound gives an upper bound based on the sup-entropy of the secure input set.

Theorem 2.9 (Input Sup-Entropy Bound): For a stego-channel (W, g, A) the secure capacity is bounded as,

    C(W, g, A) ≤ sup_{X∈S0} H̄(X).    (113)

Proof: Using (21) and the property that H(X|Z) ≥ 0 we have,

    C(W, g, A) = sup_{X∈S0} I(X; Z)    (21)
        ≤ sup_{X∈S0} [ H̄(X) − H(X|Z) ]    (T2.2)
        ≤ sup_{X∈S0} H̄(X).

The following corollary specializes the previous one with the restriction that the input alphabet is finite.

Corollary 2.1: For a given stego-channel (W, g, A) = {(W^n, P_g^n, A^n)}_{n=1}^∞ with a discrete input set (|X| < ∞) the secure capacity is bounded from above as,

    C(W, g, A) ≤ log |X|.    (115)
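Lemma 2.1 is the information-spectrum analogue of the classical data-processing inequality. As a sanity check of the single-letter intuition, the following sketch (an arbitrary toy Markov chain X → Y → Z; all distributions here are illustrative, not taken from the paper) computes the mutual informations and confirms I(X;Z) ≤ I(X;Y):

```python
import math

def mutual_information(p_joint):
    """I between the two coordinates, in bits, from a joint pmf {(a, b): p}."""
    pa, pb = {}, {}
    for (a, b), p in p_joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in p_joint.items() if p > 0)

# Arbitrary toy chain X -> Y -> Z over binary alphabets.
p_x = {0: 0.5, 1: 0.5}
w = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # p(y|x), the "encoder noise"
a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}   # p(z|y), the "attack"

p_xy = {(x, y): p_x[x] * w[x][y] for x in p_x for y in (0, 1)}
p_xz = {(x, z): sum(p_x[x] * w[x][y] * a[y][z] for y in (0, 1))
        for x in p_x for z in (0, 1)}

i_xy = mutual_information(p_xy)
i_xz = mutual_information(p_xz)
assert i_xz <= i_xy + 1e-12   # data processing: I(X;Z) <= I(X;Y)
```

Any choice of channels in place of `w` and `a` satisfies the same inequality, which is what makes the lemma independent of the particular attack.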

Proof: We make use of Theorem 2.9,

    C(W, g, A) ≤ sup_{X∈S0} H̄(X)    (T2.9)
        ≤ sup_X H̄(X)
        = log |X|.

The next theorem gives two upper bounds on the capacity based on the sup-entropy of the secure input and output sets.

Theorem 2.10 (Output Sup-Entropy Bounds): For a stego-channel (W, g, A) the secure capacity is bounded as,

    C(W, g, A) ≤ sup_{X∈S0} H̄(Y)    (117a)
        ≤ sup_{Y∈T0} H̄(Y).    (117b)

Proof: Using (21) and the property that H(Y|X) ≥ 0 we have,

    C(W, g, A) = sup_{X∈S0} I(X; Z)    (21)
        ≤ sup_{X∈S0} I(X; Y)    (L2.1)
        ≤ sup_{X∈S0} [ H̄(Y) − H(Y|X) ]
        ≤ sup_{X∈S0} H̄(Y)
        ≤ sup_{Y∈T0} H̄(Y).

Here the final line follows since if X ∈ S0 and X →W→ Y, then Y ∈ T0.

The next corollary specializes the above theorem when the permissible set is finite.

Corollary 2.2 (Discrete Permissible Set Bound): For a given discrete stego-channel (W, g, A) = {(W^n, P_g^n, A^n)}_{n=1}^∞ the secure capacity is bounded from above as,

    C(W, g, A) ≤ lim sup_{n→∞} (1/n) log |P_g^n|.    (119)

Proof: Combining Theorem 2.8 and line (117b) of Theorem 2.10 gives the desired result.

The next theorem provides an intuitive result dealing with the capacity of two stego-channels having related steganalyzers.

Theorem 2.11 (Permissible Set Relation): For two stego-channels, (W, g, A) and (W, v, A), if P_g^n ⊆ P_v^n for all but finitely many n, then,

    C(W, g, A) ≤ C(W, v, A).    (120)

Proof: Let {f_n}_{n=1}^∞ and {φ_n}_{n=1}^∞ be a sequence of encoding and decoding functions that achieves C(W, g, A). Such a sequence exists by the definition of secure capacity. The following definitions will be used for i = 1, ..., M_n,

    u_i = f_n(i),    D_i = φ_n^{−1}({i}).

The probability of error for this sequence is given by (12),

    ε_n = (1/M_n) Σ_{i=1}^{M_n} Q_n(D_i^c | u_i),

where Q_n = A^n ∘ W^n. Clearly, this value is independent of the permissible sets, and if ε_n → 0 for the stego-channel (W, g, A) then it also goes to zero for (W, v, A).

Next we know that the probability of detection for (W, g, A) is given by (13),

    δ_n^g = (1/M_n) Σ_{i=1}^{M_n} W^n(I_g^n | u_i),

and that δ_n^g → 0. Since P_g^n ⊆ P_v^n for all n > N, we have that I_g^n ⊇ I_v^n if n > N, and thus,

    W^n(I_g^n | x) ≥ W^n(I_v^n | x),    ∀ n > N, x ∈ X^n.    (121)

Using this we may bound the probability of detection for (W, v, A) and n > N as,

    δ_n^v = (1/M_n) Σ_{i=1}^{M_n} W^n(I_v^n | u_i)
        ≤ (1/M_n) Σ_{i=1}^{M_n} W^n(I_g^n | u_i)    (121)
        = δ_n^g.

Since δ_n^g → 0 we see that δ_n^v → 0 as well.

Fig. 4. Composite steganalyzer. [f_n(m) Encoder → W^n(y|x) Noise → g_n(y) Detection g and v_n(y) Detection v → φ_n(y) Decoder]

Fig. 5. Two Noise Channel. [f_n(m) Encoder → A(y|x) Noise A → g_n(y) Detection g → B(z|y) Noise B → v_n(z) Detection v]

I. Applications

1) Composite steganalyzers: The final theorem of the previous section is intuitively pleasing and leads to some immediate results. An example of this is the composite steganalyzer pictured in Figure 4. In this system two steganalyzers, g and v, are used sequentially on the corrupted stego-signal. If either of these steganalyzers is triggered, the message is considered steganographic. We will denote the composite stego-channel of this system as (W, h, A).

As one would expect, the capacity of the composite channel, C(W, h, A), is smaller than either C(W, g, A) or C(W, v, A). This is shown in the next theorem.

Theorem 2.12 (Composite Stego-Channel): For a composite stego-channel (W, h, A) defined by g and v, the following inequality holds,

    C(W, h, A) ≤ min { C(W, g, A), C(W, v, A) }.    (123)

Proof: We first show that C(W, h, A) ≤ C(W, g, A). The permissible set of the composite is equal to the intersection of those of the base detection functions,

    P_h^n = P_g^n ∩ P_v^n,    ∀ n,    (124)

thus we have that P_h^n ⊆ P_g^n and we may apply Theorem 2.11 to state,

    C(W, h, A) ≤ C(W, g, A).

The above argument may be applied using P_h^n ⊆ P_v^n to show C(W, h, A) ≤ C(W, v, A).
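The mechanism behind Theorem 2.12 is just set intersection. A small sketch (hypothetical threshold detectors over length-10 binary signals; the detectors `g` and `v` below are invented for illustration) showing that the composite permissible set is the intersection of the base sets, so a Corollary 2.2-style rate estimate for the composite never exceeds that of either base steganalyzer:

```python
import itertools, math

n = 10
signals = list(itertools.product((0, 1), repeat=n))

# Hypothetical detectors: g triggers when the whole signal has too many 1s,
# v triggers when the first half has too many 1s.
def g(y): return sum(y) > n // 2
def v(y): return sum(y[: n // 2]) > n // 4

P_g = {y for y in signals if not g(y)}   # permissible set of g
P_v = {y for y in signals if not v(y)}   # permissible set of v
P_h = P_g & P_v                          # composite: neither detector triggers

assert P_h <= P_g and P_h <= P_v         # P_h^n is a subset of both
rate = lambda P: math.log2(len(P)) / n   # (1/n) log2 |P|, bits per use
assert rate(P_h) <= min(rate(P_g), rate(P_v))
```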

2) Two Noise Systems: We briefly present and discuss an interesting case that is somewhat counter-intuitive. Consider the channel shown in Figure 5. In this case there is a distortion A after the encoder and a second distortion, B, before the second steganalyzer.

In the previous section it was shown that in the composite steganalyzer (Figure 4) the addition of a second steganalyzer lowers the capacity of the stego-channel. A surprising result for the two noise system is that this may not be the case; in fact, the addition of a second distortion can increase the capacity of a stego-channel!

To see this consider the two steganalyzers g and v. Assume that g classifies signals with positive means as steganographic, while v classifies signals with negative means as steganographic. If these detection functions were in series, clearly the permissible set (of the composite detection function) is empty, as a signal cannot have a positive and negative mean. Now if the distortion B is deterministic, for example, B^n(−y|y) = 1, we may send any signal we wish, as long as its mean is positive. So in some instances, it is possible for the addition of a distortion to actually increase the capacity.

III. Noiseless Channels

This section investigates the capacity of the noiseless stego-channel shown in Figure 6. In this system there is no encoder-noise and the adversary is passive. This means that not only does the decoder receive exactly what the encoder sends, but the steganalyzer does as well.

Fig. 6. Noiseless Stego-Channel. [M_n → f_n(m) Encoder → X^n = Y^n = Z^n → φ_n(z) Decoder → M̂_n; g_n(y) Detection]

This section finds the perfectly secure capacity of this system, and then derives a number of intuitive bounds relating to this capacity.

Theorem 3.1 (Secure Noiseless Capacity): For a discrete noiseless channel (·, g, ·) the secure channel capacity is given by,

    C(·, g) = lim inf_{n→∞} (1/n) log |P_g^n|.    (125)

Proof: Using Theorem 2.5 and Theorem 2.7 we have,

    C(·, g) = sup_{Y∈T0} H(Y)    (T2.5)
        = lim inf_{n→∞} (1/n) log |P_g^n|.    (T2.7)

Example 2 (Capacity of the Sum Steganalyzer): We now use this result to find the noiseless capacity of the parity steganalyzer of Example 1. The size of the permissible set for n is equal to the number of different ways we may arrange up to ⌊n/2⌋ 1s into n positions,

    |P_g^n| = Σ_{0 ≤ i ≤ ⌊n/2⌋} \binom{n}{i}.    (126)

We make use of the following properties,

    Σ_{i=0}^{n} \binom{n}{i} = 2^n,    (127)

    \binom{n}{k} = \binom{n}{n−k}.    (128)

We begin by finding |P_g^n| when n is odd. Letting k = (n−1)/2,

    2^n = Σ_{i=0}^{n} \binom{n}{i}    (129)
        = Σ_{j=0}^{k} \binom{n}{j} + Σ_{j=k+1}^{n} \binom{n}{j}    (130)
        = Σ_{j=0}^{k} \binom{n}{j} + Σ_{j=k+1}^{n} \binom{n}{n−j}    (131)
        = 2 Σ_{j=0}^{k} \binom{n}{j} = 2 |P_g^n|,    (132)

where (131) follows from (128), since as j runs from k+1 to n, n−j runs from k down to 0.

We now find |P_g^n| for n even. Letting k = n/2,

    2^{n−1} = (1/2) Σ_{i=0}^{n} \binom{n}{i}    (133)
        = (1/2) Σ_{j=0}^{k−1} \binom{n}{j} + (1/2) \binom{n}{k} + (1/2) Σ_{j=k+1}^{n} \binom{n}{j}    (134)
        = Σ_{j=0}^{k−1} \binom{n}{j} + (1/2) \binom{n}{k}    (by 128)
        = Σ_{j=0}^{k} \binom{n}{j} − (1/2) \binom{n}{k}    (135)
        = |P_g^n| − (1/2) \binom{n}{k}.    (136)

This gives,

    |P_g^n| = 2^{n−1} + (1/2) \binom{n}{k}    (137)

for n even, and |P_g^n| = 2^{n−1} for n odd from above. The capacity is then,

    C(·, g) = lim inf_{n→∞} (1/n) log |P_g^n| = lim_{n→∞} (1/n) log 2^{n−1} = 1 bit/use.    (138a)

3) ε-Strong Converse for Noiseless Channels: We now present a fundamental result for discrete noiseless channels regarding the ε-strong converse property. It gives the necessary and sufficient conditions for a noiseless stego-channel to satisfy the ε-strong converse property.

Theorem 3.2 (Noiseless ε-Strong Converse): A discrete noiseless stego-channel (·, g, ·) satisfies the ε-strong converse property if and only if,

    C(·, g) = lim_{n→∞} (1/n) log |P_g^n|.    (139)

Proof: Since the channel is noiseless, X = Y = Z, we have,

    sup_{X∈S0} I(X; Z) = sup_{Y∈T0} H(Y),    (140)

    sup_{X∈S0} Ī(X; Z) = sup_{Y∈T0} H̄(Y).    (141)

First assume that the stego-channel satisfies the ε-strong converse property. This gives,

    sup_{Y∈T0} H(Y) = sup_{X∈S0} I(X; Z)    (140)    (142a)
        = sup_{X∈S0} Ī(X; Z)    (T2.6)    (142b)
        = sup_{Y∈T0} H̄(Y).    (141)    (142c)

The capacity is then,

    C(·, g) = sup_{Y∈T0} H(Y)    (T2.5)
        = lim inf_{n→∞} (1/n) log |P_g^n|    (T2.7)
        = sup_{Y∈T0} H̄(Y)    (142c)
        = lim sup_{n→∞} (1/n) log |P_g^n|    (T2.8)
        = lim_{n→∞} (1/n) log |P_g^n|.

Here the final line results as the lim inf and lim sup coincide.

For the other direction assume that C(·, g) = lim_{n→∞} (1/n) log |P_g^n|; thus we have,

    C(·, g) = sup_{X∈S0} I(X; Z)
        = sup_{Y∈T0} H(Y)    (140)
        = lim inf_{n→∞} (1/n) log |P_g^n|    (T2.7)
        = lim sup_{n→∞} (1/n) log |P_g^n|
        = sup_{Y∈T0} H̄(Y)    (T2.8)
        = sup_{X∈S0} Ī(X; Z).    (141)

Thus, sup_{X∈S0} I(X; Z) = sup_{X∈S0} Ī(X; Z), and by Theorem 2.6 the stego-channel satisfies the ε-strong converse property.

Example 3 (Sum Steganalyzer): We now determine if the sum steganalyzer satisfies the ε-strong converse.

From Example 2 the size of the permissible set is,

    |P_g^n| = 2^{n−1} + (1/2) \binom{n}{n/2},   for even n,
    |P_g^n| = 2^{n−1},                          for odd n.    (144)

We will make use of Stirling's approximation,

    n! = √(2π) n^{n+1/2} e^{−n+λ_n},    (145)

where 1/(12n+1) < λ_n < 1/(12n). For n even,

    |P_g^n| = 2^{n−1} + (1/2) · n! / ( (n/2)! (n/2)! )    (146)
        = 2^{n−1} + (1/2) · √(2π) n^{n+1/2} e^{−n+λ_n} / ( 2π (n/2)^{n+1} e^{−n+2λ_{n/2}} )    (147)
        = 2^{n−1} + (1/2) · n^{n+1/2} e^{λ_n − 2λ_{n/2}} / ( √(2π) (n/2)^{n+1} )    (148)
        = 2^{n−1} + (1/2) · 2^{n+1} e^{λ_n − 2λ_{n/2}} / √(2πn)    (149)
        = 2^{n−1} + 2^{n−1} · 2 e^{λ_n − 2λ_{n/2}} / √(2πn)    (150)
        = 2^{n−1} ( 1 + 2 e^{λ_n − 2λ_{n/2}} / √(2πn) )    (151)
        ≤ 2^{n−1} ( 1 + 2e / √(2πn) ).    (152)

This gives,

    lim sup_{n→∞} (1/n) log |P_g^n| ≤ lim sup_{n→∞} (1/n) log [ 2^{n−1} (1 + 2e/√(2πn)) ]    (153)
        ≤ lim sup_{n→∞} (1/n) log 2^{n−1} + lim sup_{n→∞} (1/n) log (1 + 2e/√(2πn))    (154)
        = 1.    (155)

This shows,

    1 bit/use = lim inf_{n→∞} (1/n) log |P_g^n| ≥ lim sup_{n→∞} (1/n) log |P_g^n|.    (156)

Since the lim inf and lim sup coincide the limit is indeed a true one. Thus, this stego-channel satisfies the ε-strong converse.

4) Properties of the Noiseless DMSC: In this section we briefly investigate the secure capacity of the discrete memoryless stego-channel (cf. I-F.2).

Theorem 3.3 (Noiseless DMSC Secure Capacity): For the stego-channel (·, g, ·) with g = {g}, the secure capacity is given by,

    C(·, g) = log |P_g|,    (157)

and furthermore this stego-channel satisfies the strong converse.

Proof: As the channel is noiseless and the input alphabet is finite we may use Theorem 3.1,

    C(·, g) = lim inf_{n→∞} (1/n) log |P_g^n|.    (158)

Note that by (7) we have for all n,

    (1/n) log |P_g^n| = (1/n) log |P_g × P_g × ··· × P_g|
        = (1/n) log |P_g|^n    (159)
        = log |P_g|.

We also have that,

    C(·, g) = lim inf_{n→∞} (1/n) log |P_g^n| = log |P_g| = lim_{n→∞} (1/n) log |P_g^n|,    (160)

thus by Theorem 3.2 the stego-channel satisfies the strong converse. This gives,

    C(·, g) = log |P_g|.
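The closed forms derived in Examples 2 and 3 are easy to verify numerically. Below is a small sketch (using `math.comb` for the binomial coefficients; the particular values of n are arbitrary) checking (144) and watching (1/n) log₂ |P_g^n| approach the 1 bit/use of (138a):

```python
import math

def perm_size(n):
    """|P_g^n|: number of length-n binary signals with at most floor(n/2) ones."""
    return sum(math.comb(n, i) for i in range(n // 2 + 1))

# Closed forms of (144): 2^(n-1) for odd n, 2^(n-1) + C(n, n/2)/2 for even n.
for n in range(1, 40):
    if n % 2 == 1:
        assert perm_size(n) == 2 ** (n - 1)
    else:
        assert perm_size(n) == 2 ** (n - 1) + math.comb(n, n // 2) // 2

# The rate converges to 1 bit/use, consistent with the eps-strong converse.
rates = [math.log2(perm_size(n)) / n for n in (10, 100, 1000)]
assert abs(rates[-1] - 1.0) < 0.01
```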

IV. Additive Noise Stego-Channels

In this section we evaluate the capacity of a particular stego-channel, shown in Figure 7. In this channel both the encoder-noise and attack-noise are additive and independent from the channel input.

Fig. 7. Additive Noise Channel Active Adversary. [f_n(m) Encoder → X^n → Y^n = X^n + N_e^n (Noise) → g_n(y) Detection → Z^n = Y^n + N_a^n (Attack) → φ_n(z) Decoder]

Fig. 8. AWGN Channel Active Adversary. [As Figure 7, with N_e^n ~ N(0, σ_e²) and N_a^n ~ N(0, σ_a²)]

A. Additive Noise

Denote the sum of two general sequences X = {X^n = (X_1^{(n)}, ..., X_n^{(n)})}_{n=1}^∞ and Y = {Y^n = (Y_1^{(n)}, ..., Y_n^{(n)})}_{n=1}^∞ as,

    X + Y := {X^n + Y^n = (X_1^{(n)} + Y_1^{(n)}, ..., X_n^{(n)} + Y_n^{(n)})}_{n=1}^∞.    (161)

Letting the encoder-noise be denoted as N_e = {N_e^n}_{n=1}^∞ and the attack-noise as N_a = {N_a^n}_{n=1}^∞, we have the following relations,

    Y = X + N_e,
    Z = Y + N_a = X + N_e + N_a = X + N,

where N = {N^n}_{n=1}^∞ = N_e + N_a.

As the noises are independent from the stego-signal we may use,

    p_{Z^n|X^n}(x^n + n^n | x^n) = p_{N^n}(n^n),

leading to the following simplifications in spectral entropies,

    H̄(Z|X) = H̄(N),    (162)

    H(Z|X) = H(N).    (163)

We now use these simplifications to present a useful capacity result for additive noise channels.

Theorem 4.1: For an additive noise stego-channel defined with N_e + N_a = N, if N satisfies the strong converse (i.e. H(N) = H̄(N)) then the capacity is,

    C(W, g, A) = sup_{X∈S0} {H(Z)} − H(N).    (164)

Proof: First we find a lower bound as,

    C(W, g, A) ≥ sup_{X∈S0} [ H(Z) − H̄(Z|X) ]    (22)    (165)
        = sup_{X∈S0} {H(Z)} − H̄(N).    (166)

Next we upper-bound the capacity as,

    C(W, g, A) ≤ sup_{X∈S0} [ H(Z) − H(Z|X) ]    (21)    (167)
        = sup_{X∈S0} {H(Z)} − H(N).    (168)

By assumption H(N) = H̄(N), and combining (166) and (168) we have the desired result.

B. AWGN Example

The general formula of the previous section is now applied to the commonly found additive white Gaussian noise channel. The detector is motivated by the use of spread spectrum steganography [11] or, more generally, stochastic modulation [12]. The encoder-noise and attack-channel to be considered are additive white Gaussian noise (AWGN).

Thus for a stego-signal, x = (x_1, ..., x_n), the corrupted stego-signal is given by,

    y = (x_1 + n_1, ..., x_n + n_n),

where each n_i ~ N(0, σ_e²), and all are independent. The transition probabilities of the encoder-noise are given by,

    W^n(y|x) = (2πσ_e²)^{−n/2} exp{ −(1/(2σ_e²)) Σ_{i=1}^{n} (y_i − x_i)² }.    (169)

Similarly, the attack-channel is AWGN as N(0, σ_a²), so the transition probabilities are,

    A^n(z|y) = (2πσ_a²)^{−n/2} exp{ −(1/(2σ_a²)) Σ_{i=1}^{n} (z_i − y_i)² }.    (170)

1) Variance Steganalyzer: In stochastic modulation a pseudo-noise is modulated by a message and added to the cover signal. This is done because the presence of noise in signal processing applications is a common occurrence.

If the passive adversary has knowledge of the distribution of the cover-signal and knows that the hider is using stochastic modulation, it also knows that the variance of a cover-signal will differ from the variance of a stego-signal. Thus, if the passive adversary knows the variance of the cover-distribution, it could design a steganalyzer that triggers if the variance of a test signal is higher than a threshold.

For example, when testing the signal y = (y_1, ..., y_n) the variance steganalyzer operates as,

    g_n(y) = 1,   if (1/n) Σ_{i=1}^{n} y_i² > c,
    g_n(y) = 0,   else.    (171)

Thus, if the empirical variance of a test signal is above a certain threshold, the signal is considered steganographic.
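A direct implementation of the variance steganalyzer (171) is straightforward. In the sketch below the threshold c, the signal length, and the noise powers are hypothetical, chosen only so that the empirical power concentrates clearly on one side of the threshold:

```python
import random

def variance_steganalyzer(y, c):
    """g_n(y) from (171): 1 (steganographic) iff empirical power (1/n)*sum(y_i^2) > c."""
    n = len(y)
    return 1 if sum(v * v for v in y) / n > c else 0

random.seed(1)
c, n = 4.0, 10_000
# A cover-like signal with power well below c passes...
cover = [random.gauss(0.0, 1.0) for _ in range(n)]
assert variance_steganalyzer(cover, c) == 0
# ...while a high-power stego-signal is flagged.
stego = [random.gauss(0.0, 3.0) for _ in range(n)]
assert variance_steganalyzer(stego, c) == 1
```

For long signals the empirical power concentrates around the true variance, which is exactly why the secure input distributions of Theorem 4.2 must keep the output power at or below c.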

2) Additive Gaussian Channel Active Adversary: In this section we derive the capacity under an active adversary. Assume that the adversary uses additive i.i.d. Gaussian noise with variance σ_a² while the encoder noise is additive i.i.d. Gaussian with variance σ_e².

Let N_e = {N_e} where N_e ~ N(0, σ_e²), and N_a = {N_a} where N_a ~ N(0, σ_a²). (Recall that for a general sequence X = {X^n = (X_1^{(n)}, ..., X_n^{(n)})}_{n=1}^∞, when X = {X} is written it means that each X_i^{(n)} is independent and identically distributed as X.)

Let N = N_e + N_a = {N^n = N_e^n + N_a^n}_{n=1}^∞. Since both N_e and N_a are i.i.d. as N(0, σ_e²) and N(0, σ_a²), respectively, their sum is i.i.d. as N(0, σ_e² + σ_a²), i.e. N = {N} with N ~ N(0, σ_e² + σ_a²). We then have the following relations,

    H(N) = H̄(N) = (1/2) log 2πe(σ_a² + σ_e²).    (172)

Since H(N) = H̄(N) we see that the noise sequence satisfies the strong converse property.

3) Active Adversary Capacity: We now derive the secure capacity of the above stego-channel. Since the noises are i.i.d., the general sequence N satisfies the strong converse and allows the use of Theorem 4.1. The formal proof is then followed by a discussion of the results and a description using the classic sphere packing intuition.

Theorem 4.2: For the stego-channel (W, g, A) = {(W^n, g_n, A^n)}_{n=1}^∞ with W^n and A^n defined by (169) and (170) respectively, and g_n defined by (171), the secure capacity is,

    C(W, g, A) = (1/2) log [ (c + σ_a²) / (σ_e² + σ_a²) ].    (173)

Proof: From Theorem 4.1 and (172) we have,

    C(W, g, A) = sup_{X∈S0} {H(Z)} − H(N)    (174)
        = sup_{X∈S0} {H(Z)} − (1/2) log 2πe(σ_a² + σ_e²).    (175)

Achievability: Let X = {X} where X ~ N(0, c − σ_e²). Thus Y = X + N_e = {Y} with Y = X + N_e. By addition of independent Gaussians, Y ~ N(0, c). This gives,

    Pr{ (1/n) Σ_{i=1}^{n} (Y_i^{(n)})² > c } → 0,    (176)

and we see that X ∈ S0. Similarly, Z = N_a + Y = {Z} with Z = X + N_e + N_a. Again by addition of independent Gaussians we have Z ~ N(0, c + σ_a²). This allows for a lower bound of,

    C(W, g, A) = sup_{X∈S0} H(Z) − (1/2) log 2πe(σ_e² + σ_a²)    (175)    (177a)
        ≥ H(Z) − (1/2) log 2πe(σ_e² + σ_a²)    (177b, 177c)
        = (1/2) log 2πe(c + σ_a²) − (1/2) log 2πe(σ_e² + σ_a²)    (177d)
        = (1/2) log [ (c + σ_a²) / (σ_e² + σ_a²) ].    (177e)

Converse: To find the upper bound we will make use of a number of simple lemmas.

Lemma 4.1: For a given stego-channel with secure input distribution set S0 and secure output distribution set T0, the following holds,

    sup_{X∈S0} H(Z) ≤ sup_{Y∈T0} H(Z).    (178)

Proof: By definition, for any X ∈ S0 with X →W→ Y, we have Y ∈ T0.

Lemma 4.2: For Y^n = (Y_1^{(n)}, Y_2^{(n)}, ..., Y_n^{(n)}) let K_ij^{(n)} be the covariance between Y_i^{(n)} and Y_j^{(n)}, that is K_ij^{(n)} := E{ Y_i^{(n)} Y_j^{(n)} }. For the stego-channel defined above, if Y = {Y^n}_{n=1}^∞ ∈ T0 we have, for any γ > 0, that there exists some N such that for all n > N,

    (1/n) Σ_{i=1}^{n} K_ii^{(n)} < c + γ.    (180)

Proof: Assume that no such N exists; thus we have a subsequence n_k such that,

    (1/n_k) Σ_{i=1}^{n_k} K_ii^{(n_k)} ≥ c + γ.    (181)

This means that,

    E{ (1/n_k) Σ_{i=1}^{n_k} y_i² } = (1/n_k) Σ_{i=1}^{n_k} K_ii^{(n_k)} ≥ c + γ,

which in turn implies that,

    Pr{ g_{n_k}(Y^{n_k}) = 0 } → 0.

This is a contradiction as it shows Y = {Y^n}_{n=1}^∞ ∉ T0.

Lemma 4.3: For any Z^n = (Z_1, ..., Z_n) with C_ij = E{Z_i Z_j},

    H(Z^n) ≤ (1/2) log [ (2πe)^n ( (1/n) Σ_{i=1}^{n} C_ii )^n ].    (182)

Proof: From [13, Chap. 9.6] we have,

    H(Z^n) ≤ (1/2) log [ (2πe)^n Π_{i=1}^{n} C_ii ].    (183)

The result follows from application of the arithmetic-geometric mean inequality.

Lemma 4.4: For the above stego-channel, any Y ∈ T0 and any ε > 0, we have for all sufficiently large n,

    (1/n) H(Z^n) < (1/2) log 2πe(c + σ_a²) + ε,    (184)

where Z = {Z^n}_{n=1}^∞ and Y →A→ Z.

Proof: Let any ε > 0 be given. Since σ_a² is a variance and always positive, we may choose γ > 0 such that,

    γ ≤ (c + σ_a²)(e^{2ε} − 1).

This gives,

    (1/2) log 2πe(c + σ_a² + γ) ≤ (1/2) log 2πe(c + σ_a²) + ε.    (185)

Letting C_ij^{(n)} = E{ Z_i^{(n)} Z_j^{(n)} } and K_ij^{(n)} = E{ Y_i^{(n)} Y_j^{(n)} }, we note that Z_i^{(n)} = Y_i^{(n)} + N_a. This gives,

    C_ii^{(n)} = K_ii^{(n)} + σ_a².    (186)

This gives,

    (1/n) H(Z^n) ≤ (1/(2n)) log [ (2πe)^n ( (1/n) Σ_{i=1}^{n} C_ii^{(n)} )^n ]    (L4.3)    (187)
        = (1/(2n)) log [ (2πe)^n ( (1/n) Σ_{i=1}^{n} K_ii^{(n)} + σ_a² )^n ]    (186)    (188)
        ≤ (1/(2n)) log [ (2πe)^n ( c + σ_a² + γ )^n ]    (L4.2)    (189)
        = (1/2) log 2πe(c + σ_a² + γ)    (190)
        ≤ (1/2) log 2πe(c + σ_a²) + ε.    (185)    (191)

The inequality of (189) holds for all but a finite number of n. Combining Lemma 4.1 and Lemma 4.4 with (175), together with the achievability bound, gives for every ε > 0,

    (1/2) log [ (c + σ_a²) / (σ_e² + σ_a²) ] ≤ C(W, g, A) < (1/2) log [ (c + σ_a²) / (σ_e² + σ_a²) ] + ε,

and we see that C(W, g, A) = (1/2) log [ (c + σ_a²) / (σ_e² + σ_a²) ].

TABLE III
Gaussian Additive Noise Capacities

    Channel      Encoder noise   Attack noise   Secure capacity
    C(W, g, A)   σ_e²            σ_a²           (1/2) log [(c + σ_a²)/(σ_e² + σ_a²)]
    C(W, g)      σ_e²            0              (1/2) log (c/σ_e²)
    C(·, g, A)   0               σ_a²           (1/2) log [(c + σ_a²)/σ_a²]
    C(·, g)      0               0              (1/2) log (c/0) = ∞

4) Noise Cases: We now use this theorem to investigate the secure capacities obtained when the encoder-noise or the attack-noise vanishes; the results are collected in Table III. This highlights the fundamental nature of the detection constraint in communication over a stego-channel, as even if ε → 0 the capacity of the stego-channel is still zero.
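All rows of Table III follow from Theorem 4.2 by zeroing the appropriate variance. A small helper makes the pattern explicit (a sketch in bits, hence log₂; the zero-denominator case is mapped to infinite capacity to match the noiseless row, and the numeric arguments are illustrative):

```python
import math

def secure_capacity(c, var_e, var_a):
    """(1/2) log2((c + var_a) / (var_e + var_a)) bits/use, per Theorem 4.2."""
    denom = var_e + var_a
    if denom == 0:
        return math.inf          # noiseless case: unbounded capacity
    return 0.5 * math.log2((c + var_a) / denom)

assert secure_capacity(4.0, 1.0, 1.0) == 0.5 * math.log2(5.0 / 2.0)
assert secure_capacity(4.0, 1.0, 0.0) == 0.5 * math.log2(4.0)  # passive adversary
assert secure_capacity(4.0, 0.0, 1.0) == 0.5 * math.log2(5.0)  # noiseless encoder
assert secure_capacity(4.0, 0.0, 0.0) == math.inf              # fully noiseless
```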

Fig. 9. AWGN Channel Passive Adversary. [f_n(m) Encoder → N(0, σ²) Noise → φ_n(y) Decoder; g_n(y) Detection]

7) Noiseless Case: Consider the noiseless case where σ_e² = σ_a² = σ² and σ² → 0. This gives,

    lim_{σ²→0} C(W, g, A) = lim_{σ²→0} (1/2) log [ (c + σ²) / (σ² + σ²) ] = ∞.

Thus we see that since the channel is noiseless, and the permissible set size (as well as the input and output alphabets) is uncountable (thus infinite), the capacity is unbounded.

8) Geometric Intuition: In this section we present some geometric intuition for the previous results, similar to the case of the classic additive Gaussian noise [13], [14]. We will consider the case of only an encoder-noise of σ², shown in Figure 9. From the above theorem we see that,

    C(W, g) = (1/2) log (c/σ²).    (193)

The most basic element will be the volume of an n-dimensional sphere of radius r. In this case the volume is equal to,

    A_n r^n,    (194)

where A_n is a constant dependent only on the dimension n.

The fundamental question is what is the capacity of the stego-channel, or how many codewords can we reliably use? To answer this we must consider the two constraints on a secure system: error probability and detection probability.

9) Error Probability: Since we have that X^n = Y^n = ℜ^n, we may view each codeword as a point in ℜ^n. When we transmit a given codeword we may think of the addition of noise as moving the point around in that space. Since the power of the noise is σ², the probability that the received codeword has moved more than √(nσ²) away from where it started goes to zero as n → ∞. Thus we know that if we transmit a codeword, it will likely be contained in a sphere (centered on the codeword) of radius √(nσ²). This means that if we receive a signal inside such a sphere, it is likely that the transmitted codeword was the center of that sphere.

In this manner we can define a coding system. We know that for secure capacity the probability of error must go to zero. We also know that each codeword has an associated sphere that the received signal will fall inside. Thus if we choose the codewords such that their spheres do not overlap, there will be no confusion in decoding and the probability of error will go to zero.

10) Detection Probability: We begin by looking at the permissible set. The permissible set for our g_n is given by,

    P_g^n = { y ∈ Y^n : Σ_{i=1}^{n} y_i² < nc }.    (195)

Clearly the permissible set is a sphere of radius √(nc) centered at the origin. If a test signal falls inside this sphere it is classified as non-steganographic, whereas if it is outside it is considered steganographic.

The second criterion for a secure system is that the probability of detection go to zero. If we place each codeword such that its sphere is inside the permissible set, we know that the probability of detection will go to zero.
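Putting the two sphere radii together, √(nσ²) for a codeword sphere and √(nc) for the permissible sphere, the volume ratio A_n(nc)^{n/2} / A_n(nσ²)^{n/2} = (c/σ²)^{n/2} suggests a codeword count M_n. A quick check (illustrative values of c and σ²) that the resulting rate equals (1/2) log₂(c/σ²) for every n:

```python
import math

def packing_rate(c, sigma2, n):
    """(1/n) log2 of the volume-ratio codeword count M_n = (c/sigma2)^(n/2)."""
    M_n = (c / sigma2) ** (n / 2)
    return math.log2(M_n) / n

c, sigma2 = 4.0, 1.0
target = 0.5 * math.log2(c / sigma2)   # 1/2 log2(c/sigma^2) = 1 bit/use here
for n in (2, 10, 100):
    assert abs(packing_rate(c, sigma2, n) - target) < 1e-9
```

The cancellation of A_n and of the n^{n/2} factors is why the sphere-packing heuristic gives a rate that is independent of n and matches Theorem 4.2 exactly.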

11) Capacity: From the above we know that the codeword spheres cannot overlap (to ensure no errors), and we also know that all the codeword spheres must fit inside the permissible set (to ensure no detection). Thus if we calculate the number of non-overlapping spheres we may pack into the permissible set, we will have a general idea of the number of codewords we can use.

Since the volume of the permissible set is A_n (nc)^{n/2} and the volume of each codeword sphere is A_n (nσ²)^{n/2}, we can place approximately,

    A_n (nc)^{n/2} / A_n (nσ²)^{n/2} = (c/σ²)^{n/2}

non-overlapping spheres inside the permissible set. Thus, using the center of each sphere as a codeword, we have M_n codewords where,

    M_n = (c/σ²)^{n/2}.

If we consider the capacity as C(W, g) = lim (1/n) log M_n we have,

    C(W, g) = lim_{n→∞} (1/n) log M_n    (196a)
        = lim_{n→∞} (1/n) log (c/σ²)^{n/2}    (196b)
        = (1/2) log (c/σ²),    (196c)

which agrees with the result of Theorem 4.2.

V. Previous Work Revisited

12) Cachin Perfect Security: In Cachin's definition of perfect security the cover-signal distribution and the stego-signal distribution are each required to be independent and identically distributed. This gives the following secure-input set,

    S0 = { X = {X} : lim_{n→∞} (1/n) D(S^n || X^n) = 0 }.    (197)

The i.i.d. property means that D(S^n || X^n) = n D(S || X), so we see that the above is equivalent to,

    S0 = { X = {X} : D(S || X) = 0 }    (198)
       = { X = {X} : p_S = p_X }.    (199)

Since Cachin's definition does not model noise, we may consider it as noiseless and apply Theorem 3.1,

    C(W, g) = sup_{X∈S0} H(X) = H(S).    (200)

This result states that in a system that is perfectly secure (in Cachin's definition) the limit on the amount of information that may be transferred each channel use is equal to the entropy of the source. This is intuitive because in Cachin's definition the output distribution of the encoder is constrained to be equal to the cover distribution.

13) Empirical Distribution Steganalyzer: The empirical distribution steganalyzer is motivated by the fact that the empirical distribution from a stationary memoryless source converges to the actual distribution of that source. Accordingly, if the empirical distribution of the test signal converges to the cover-signal distribution it is considered to be non-steganographic.

Assume that p_S is a discrete distribution over the finite alphabet S. Let a sequence {s^n}_{n=1}^∞, with each s^n ∈ S^n, be used to specify the steganalyzer for a test signal x as,

    g_n(x) = 0,   if P_[x] = P_[s^n],
    g_n(x) = 1,   if P_[x] ≠ P_[s^n],    (201)

where P_[x] is the empirical distribution of x. The permissible set for g_n is equal to the type class of P_[s^n], i.e.,

    P_g^n = T(P_[s^n]) := { x ∈ X^n : P_[x] = P_[s^n] }.    (202)

14) Moulin Steganographic Capacity: Moulin's formulation [2] of the stego-channel is shown in Figure 10. This is somewhat different than the formulation shown in Figure 1; most notable is the presence of distortion constraints. Additionally, there is an absence of a distortion function prior to the steganalyzer. Also, in this model the steganalyzer is fixed as the previously discussed empirical distribution

steganalyzer. The sequence of s^n used to specify the steganalyzer is drawn i.i.d. as S.

Fig. 10. Moulin Stego-channel. [Source S → f_n(m, s) Encoder (distortion D1) → X^n → A^n(y|x) Noise (distortion D2) → Y^n → φ_n(y) Decoder; Detection: P_[x] = P_[s]?]

Fig. 11. Equivalent Stego-channel. [Source S → f_n(m, s) Encoder → X = Y → φ_n(y) Decoder → M̂; Detection: P_[s] = P_[x]?]

In order to have the two formulations coincide a number of simplifications are needed for each model.

For our model,
• The stego-channel is noiseless
• The steganalyzer is the empirical distribution

For Moulin's model,
• Passive adversary (D2 = 0)
• No distortion constraint on the encoder (D1 = ∞)

These changes produce the stego-channel shown in Figure 11.

Theorem 5.1: For the stego-channel shown in Figure 11, the capacities of this work and Moulin's agree. That is,

    C(W, g) = C^{STEG}(∞, 0) = H(S).    (203)

Proof: Since the channel is noiseless we may apply Theorem 3.1,

    C(W, g) = lim inf_{n→∞} (1/n) log |P_g^n|    (204a)
        = lim inf_{n→∞} (1/n) log |T(s^n)|    (204b)
        = H(S).    (204c)

Here we have used the fact that the permissible set for the empirical distribution detection function is the type class in (204b). Additionally, by Varadarajan's Theorem [15], P_[s^n](x) → p_S(x) almost surely (here the convergence is uniform in x as well). This allows for the use of the type class-entropy bound from Theorem 1.1 that provides the final result.

We now show Moulin's capacity is equal to this value. In the case of a passive adversary (D2 = 0), the following is the capacity of the stego-channel [2],

    C^{STEG}(D1, 0) = sup_{Q′∈Q′} H(X|S),    (205)

where a p ∈ Q′ is feasible if,

    Σ_{s,x} p(x|s) p_S(s) d(s, x) ≤ D1,    (206)

and

    Σ_s p(x|s) p_S(s) = p_S(x).    (207)

First we upper-bound the secure capacity as,

    C^{STEG}(∞, 0) = sup_{p(x|s)∈Q′} H(X|S)    (208a)
        ≤ sup_{p(x)∈Q′} H(X)    (208b)
        = H(S),    (208c)

where the final line comes from the requirement that if p ∈ Q′ and p(x|s) = p(x), then p(x) = p_S(x) for all x, to satisfy (207).

For the lower-bound we let p_{X̃S}(x, s) = p_{X̃|S}(x|s) p_S(s) = p_S(x) p_S(s), i.e. X̃ ~ p_S.

This defines a feasible covert-channel, as (206) is trivially satisfied (since D1 = ∞) and (207) is as well since,

    Σ_s p_{X̃|S}(x|s) p_S(s) = Σ_s p_S(x) p_S(s) = p_S(x).    (209)

This gives,

    C^{STEG}(∞, 0) = sup_{p(x|s)∈Q′} H(X|S)    (210a)
        ≥ H(X̃|S)    (210b)
        = H(X̃)    (210c)
        = H(S).    (210d)

Here (210c) is because X̃ and S are independent (p_{X̃S}(x, s) = p_{X̃}(x) p_S(s)).

VI. Conclusions

A framework for evaluating the capacity of steganographic channels under a passive adversary has been introduced. The system considers a noise corrupting the signal before the detection function in order to model real-world distortions such as compression, quantization, etc.

Constraints on the encoder dealing with distortion and a cover-signal are not considered. Instead, the focus is to develop the theory necessary to analyze the interplay between the channel and detection function that results in the steganographic capacity. The method uses an information-spectrum approach. This work presents a theory to shed light onto this important quantity called steganographic capacity.

Appendix I

Theorem 1.1 (Entropy): Let (p_1, p_2, ...) be a sequence of types defined over the finite alphabet X where p_n ∈ P_n. Assume this sequence satisfies the following:

1) p_n → p
2) p_n ≪ p, ∀ n

Then,

    lim inf_{n→∞} (1/n) log |T(p_n)| = H(p).    (211)

Proof: We first show,

    lim inf_{n→∞} (1/n) log |T(p_n)| ≥ H(p).    (212)

A sharpening of Stirling's approximation states that,

    n! = √(2π) n^{n+1/2} e^{−n} e^{λ_n},

with 1/(12n+1) < λ_n < 1/(12n).
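Theorem 1.1 can be probed numerically: for a type p_n with denominator n, |T(p_n)| is a multinomial coefficient. The sketch below (natural logarithms, so H(p) is in nats; the target distribution p = (0.3, 0.7) is an arbitrary example) shows the normalized log type-class size approaching H(p) from below:

```python
import math

def type_class_size(counts):
    """|T(p)| for the type with letter counts `counts`: a multinomial coefficient."""
    n = sum(counts)
    size = math.factorial(n)
    for k in counts:
        size //= math.factorial(k)
    return size

def entropy_nats(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

p = (0.3, 0.7)
H = entropy_nats(p)
rates = []
for n in (10, 100, 1000):
    k = int(0.3 * n)                 # exact type (k/n, (n-k)/n) -> p
    rates.append(math.log(type_class_size((k, n - k))) / n)

# |T(p_n)| <= e^{n H(p_n)}, so the rates sit below H and increase toward it.
assert rates[0] < rates[1] < rates[2] < H
assert H - rates[-1] < 0.01
```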