Quantitative Steganalysis of LSB Embedding in JPEG Domain

Jan Kodovský, Jessica Fridrich September 10, 2010 / ACM MM&Sec ’10

Quantitative Steganalysis of LSB Embedding in JPEG Domain 1 / 17

Motivation

Least Significant Bit (LSB) embedding
– Simplicity, high embedding capacity
– Used in Jsteg, JP Hide&Seek, and other commercial stego software

Steganalysis of LSB embedding in the spatial domain is a mature area
– [Dumitrescu-2002], [Ker-2008]

Our focus
– Transform domain
– JPEG format

Quantitative steganalysis
– Outputs an estimate of the message length

Jsteg

Jsteg: [Upham-1993]
– LSB replacement
– Embedding along a pseudo-random path
– Skipping 0 and 1

[Figure: DCT histogram, coefficient values −7 to 7]

[Figure: DCT histogram, full embedding, coefficient values −7 to 7]

Embedding violates histogram symmetry
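The loss of symmetry can be reproduced on synthetic data. The sketch below is a minimal simulation, not Jsteg itself: `jsteg_embed`, `symmetric_cover`, and the bin counts are hypothetical names and values chosen for illustration. The only properties taken from the slide are LSB replacement, a pseudo-random path, and skipping coefficients 0 and 1.

```python
import random
from collections import Counter

def jsteg_embed(coeffs, bits, seed=0):
    """LSB replacement along a pseudo-random path, skipping coefficients 0 and 1."""
    rng = random.Random(seed)
    stego = list(coeffs)
    usable = [i for i, c in enumerate(stego) if c not in (0, 1)]
    rng.shuffle(usable)                      # pseudo-random embedding path
    for i, bit in zip(usable, bits):
        c = stego[i]
        stego[i] = c - (c & 1) + bit         # stay inside the LSB pair {2k, 2k+1}
    return stego

def symmetric_cover():
    """Synthetic cover with a symmetric histogram (illustrative counts only)."""
    counts = {0: 8000, 1: 4000, 2: 2000, 3: 1000, 4: 500, 5: 250}
    coeffs = []
    for value, n in counts.items():
        coeffs += [value] * n
        if value != 0:
            coeffs += [-value] * n
    return coeffs

cover = symmetric_cover()
rng = random.Random(1)
message = [rng.randint(0, 1) for _ in cover]  # enough bits for full embedding
stego = jsteg_embed(cover, message, seed=2)

h_cover, h_stego = Counter(cover), Counter(stego)
for k in range(-6, 7):
    print(f"{k:3d}  cover={h_cover[k]:5d}  stego={h_stego[k]:5d}")
```

Because −1 shares an LSB pair with −2 while +1 is skipped, the bins at +1 and −1 drift apart after embedding, even though the cover histogram was symmetric.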

Selected Existing Attacks

[Zhang,Ping-2003] – the first quantitative attack
– Employed violation of histogram symmetry

[Yu-2004] – histogram-based attack
– Generalized Cauchy ML fit
– Chi-square test

[Lee-2006], [Lee-2007] – Category attack
– Technically not quantitative

[Westfeld-2007], [Böhme-2008] – adaptation of spatial domain attacks

[Pevný-2009] – support vector regression
– Feature-based non-structural attack
– Currently the most accurate quantitative attack

Our Goals / Challenges

– Improve the accuracy of existing quantitative attacks on Jsteg
– Achieve better performance than the feature-based machine learning approach (SVR)
– Focus on the structure of LSB embedding
– Deliver a theoretically well-founded modular framework
– Explore the applicability of the proposed attacks to different LSB embedding paradigms

Maximum Likelihood

β . . . change rate
x . . . cover feature vector, distributed according to $P_x(x)$
y . . . stego feature vector, obtained from x by Emb(β), with conditional distribution $P(y|x, \beta)$

$$P(y, \beta) = \int_x P(y, x, \beta)\,dx = \int_x P(y|x, \beta)\,P(x, \beta)\,dx = P(\beta) \int_x P(y|x, \beta)\,P_x(x)\,dx$$

$$\hat{\beta} = \arg\max_{\beta \ge 0} P(y|\beta) = \arg\max_{\beta \ge 0} \int_x P(y|x, \beta)\,P_x(x)\,dx$$

Choice of the feature vector x is crucial

Features of Zhang & Ping [Zhang,Ping-2003]

$x = [\,x_1, x_2, x_3\,]$ (cover), $y = [\,x_1^\beta, x_2^\beta, x_3^\beta\,]$ (stego, after Emb(β))

[Figure: DCT histogram, coefficient values −7 to 7, with bins grouped into the features x1, x2, x3]

[Diagram: Emb(β) maps x to y, with branches labeled β and 1 − β]

$$\hat{\beta} = \arg\max_{\beta \ge 0} \int P(y|x, \beta)\,P_x(x)\,dx$$

– $P(y|x, \beta)$ → Binomial distribution → Gaussian approximation
– Embedding invariants: $x_1 + x_2$, $x_3$
– $P_x(x)$ → precover assumption [Ker-2007]

[Diagram: Precover with two branches, each labeled 1/2, leading to $x_1$ and $x_2 + x_3$]
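A closed-form estimator follows from the invariants and the precover relation once the stego features are replaced by their expected values. The sketch below is a hedged illustration: the figure that defines x1, x2, x3 did not survive, so the grouping in `zp_features` is an assumption, chosen only because it reproduces the stated invariants (x1 + x2 and x3) and makes x1 = x2 + x3 for a symmetric cover. Under that assumption, y1 − y2 = (1 − 2β)·x3, which gives the formula in `estimate_change_rate`. It is derived here from those assumptions and may differ in detail from the published estimator. All function names are hypothetical.

```python
def zp_features(hist, max_abs=64):
    """Return (x1, x2, x3) from a DCT histogram given as {value: count}.

    ASSUMPTION: x1 holds the even member of each positive LSB pair and the odd
    member of each negative pair, x2 holds the other members, x3 is the bin at 1.
    """
    x1 = x2 = 0
    for k in range(1, max_abs // 2 + 1):
        x1 += hist.get(2 * k, 0) + hist.get(1 - 2 * k, 0)    # 2, 4, ... and -1, -3, ...
        x2 += hist.get(2 * k + 1, 0) + hist.get(-2 * k, 0)   # 3, 5, ... and -2, -4, ...
    return x1, x2, hist.get(1, 0)

def estimate_change_rate(stego_hist):
    """Closed form beta = (1 - (y1 - y2) / y3) / 2, clipped at 0."""
    y1, y2, y3 = zp_features(stego_hist)
    if y3 == 0:
        raise ValueError("no coefficients equal to 1; estimator undefined")
    return max(0.5 * (1.0 - (y1 - y2) / y3), 0.0)

def expected_stego_hist(cover_hist, beta):
    """Expected histogram when each usable coefficient flips its LSB with prob. beta."""
    stego = dict(cover_hist)
    evens = {k - (k & 1) for k in cover_hist if k not in (0, 1)}
    for e in evens:
        a, b = cover_hist.get(e, 0), cover_hist.get(e + 1, 0)
        stego[e] = (1 - beta) * a + beta * b
        stego[e + 1] = beta * a + (1 - beta) * b
    return stego

# Symmetric synthetic cover histogram (illustrative counts only).
cover_hist = {0: 8000}
for value, n in {1: 4000, 2: 2000, 3: 1000, 4: 500, 5: 250}.items():
    cover_hist[value] = n
    cover_hist[-value] = n

for beta in (0.0, 0.05, 0.1, 0.2):
    est = estimate_change_rate(expected_stego_hist(cover_hist, beta))
    print(f"true beta={beta:.2f}  estimate={est:.4f}")
```

On real images the cover is only approximately symmetric, so the estimate carries an error even at β = 0.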

Performance Evaluation

[Figure: Jsteg, median absolute error (×10⁻², axis ticks 0.2 to 0.8) versus change rate β (0.00 to 0.20); curves: ML - Zhang & Ping, SVR]

SVR
– Cartesian-calibrated Pevný features (548)
– Additional 3,250 images for training

– 3,250 JPEG images – resized and compressed to QF=75
– Performance similar to [Zhang,Ping-2003]
– Assumption: $x_1^\beta$ = expected value → Zhang & Ping's estimator
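The plotted quantity, median absolute error at a fixed change rate, can be computed as in the following minimal sketch. The function name and the sample estimates are hypothetical and are not taken from the experiments.

```python
from statistics import median

def median_absolute_error(estimates, true_beta):
    """Median of |beta_hat - beta| over a set of images embedded at the same beta."""
    return median(abs(b - true_beta) for b in estimates)

# Hypothetical per-image estimates for images embedded with beta = 0.05.
estimates = [0.048, 0.051, 0.055, 0.050, 0.046]
print(median_absolute_error(estimates, 0.05))
```

The median is less sensitive than the mean to the few images on which an estimator fails badly.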

First-Order Statistics

$$x = [\,x_{-2L}, x_{-2L+1}, \ldots, x_{2R}, x_{2R+1}\,]$$

$$\hat{\beta} = \arg\max_{\beta \ge 0} \int P(y|x, \beta)\,P_x(x)\,dx$$

[Figure: DCT histogram, coefficient values −7 to 7]

$P(y|x, \beta)$
– Embedding changes in individual LSB pairs are independent

$$P(y|x, \beta) = P(x_0^\beta | x_0, \beta) \cdot P(x_1^\beta | x_1, \beta) \cdot \prod_k P(x_{2k}^\beta, x_{2k+1}^\beta | x_{2k}, x_{2k+1}, \beta)$$

– Embedding invariants: $x_0$, $x_1$, $x_{2k} + x_{2k+1}$
– Binomial distribution → Gaussian approximation

$P_x(x)$
– DCT coefficients are i.i.d. drawn from the generalized Cauchy distribution

$$f(x) = \frac{p-1}{2s}\left(\left|\frac{x}{s}\right| + 1\right)^{-p}$$

– Parameters p and s are ML estimates, given embedding invariants
– Precover assumption for every LSB pair
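For reference, the generalized Cauchy density can be evaluated directly. The sketch below assumes the parametrization f(x) = (p − 1)/(2s) · (|x/s| + 1)^(−p) with p > 1 and s > 0. The function names and the parameter values p = 3, s = 1 are illustrative choices, and the numerical check only confirms that this form integrates to approximately one.

```python
def generalized_cauchy_pdf(x, p, s):
    """Density (p - 1) / (2 s) * (|x / s| + 1) ** (-p), assuming p > 1 and s > 0."""
    if p <= 1 or s <= 0:
        raise ValueError("requires p > 1 and s > 0")
    return (p - 1) / (2 * s) * (abs(x / s) + 1) ** (-p)

def approx_total_mass(p, s, half_width=1000.0, step=0.01):
    """Midpoint-rule integral of the density over [-half_width, half_width]."""
    n = round(2 * half_width / step)
    return step * sum(
        generalized_cauchy_pdf(-half_width + (i + 0.5) * step, p, s) for i in range(n)
    )

print(generalized_cauchy_pdf(0.0, p=3.0, s=1.0))
print(approx_total_mass(p=3.0, s=1.0))
```

The tails decay polynomially, so a small fraction of the mass lies outside any finite integration window.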

Performance Evaluation

[Figure: Jsteg, median absolute error (×10⁻², axis ticks 0.2 to 0.8) versus change rate β (0.00 to 0.20); curves: ML - Zhang & Ping, ML - First-order, SVR]

Second-Order Statistics

– DCT coefficients are not i.i.d.
– We capture dependencies using adjacency matrix X
– Natural decomposition into k-nodes, k ∈ {1, 2, 4}
– Binomial / multinomial distributions → Gaussian approximations

$$\arg\max_{\beta \ge 0} \int P(Y|X, \beta)\,P_X(X)\,dX$$

– Factorization of P(y|x, β)
– Embedding invariants
– Analytic expression
– Complications arise
– Good parametric model?
– High complexity

[Figure: adjacency matrix X, cells [i, j] for i = −2, …, 3 and j = −2, …, 3]

Zero Message Hypothesis (ZMH)

Alternative heuristic approach
– Penalty function $z(x) \ge 0$ satisfying
    $z(x^\beta) \approx 0$ when β = 0
    $z(x^\beta) > 0$ when β > 0
– z(x) should be a quantitative description of a zero message hypothesis capturing a key cover property violated by embedding
– Assumption: $y = E[x^\beta] = \mathrm{Emb}(x, \beta)$
– Assumption: mapping Emb is invertible ⇒ $x = \mathrm{Emb}^{-1}(y, \beta)$

$$\hat{\beta} = \arg\min_{\beta \ge 0} z(\mathrm{Emb}^{-1}(y, \beta))$$

Comments
– Low computational complexity – one-dimensional search over β
– ZMH-based steganalysis is not a new idea! [RS steganalysis,2001]
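As an illustration of the invertibility assumption, consider a single LSB pair (2k, 2k+1) of the histogram. The worked step below assumes that every usable coefficient has its LSB flipped independently with probability β, so that the expected stego counts are a linear mix of the cover counts. This model is a simplification for illustration.

```latex
\begin{pmatrix} y_{2k} \\ y_{2k+1} \end{pmatrix}
=
\begin{pmatrix} 1-\beta & \beta \\ \beta & 1-\beta \end{pmatrix}
\begin{pmatrix} x_{2k} \\ x_{2k+1} \end{pmatrix}
\quad\Longrightarrow\quad
\begin{pmatrix} x_{2k} \\ x_{2k+1} \end{pmatrix}
=
\frac{1}{1-2\beta}
\begin{pmatrix} 1-\beta & -\beta \\ -\beta & 1-\beta \end{pmatrix}
\begin{pmatrix} y_{2k} \\ y_{2k+1} \end{pmatrix}
```

The determinant of the mixing matrix is 1 − 2β, so under this model the inverse exists for every 0 ≤ β < 1/2 and breaks down at β = 1/2, where both bins of a pair are fully equalized.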

First-Order Statistics (ZMH)

$$x = [\,x_{-2L}, x_{-2L+1}, \ldots, x_{2R-1}, x_{2R}\,]$$

[Figure: DCT histogram, coefficient values −7 to 7]

Penalty function

$$z_{\mathrm{sym}}(x) = \sum_{k>0} w_k (x_k - x_{-k})^2$$

Weights $w_k$ chosen to minimize the estimator variance → least squares steganalysis [Ker-2007]

Final form of the penalty function:

$$z_{\mathrm{sym}}(x) = \sum_{k>0} \frac{(x_k - x_{-k})^2}{x_k + x_{-k}}$$
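The first-order ZMH estimator can be sketched in a few lines: invert the expected embedding for a candidate β, score the result with the symmetry penalty, and keep the β with the smallest score. The code below is an illustration under stated assumptions, not the authors' implementation: it models embedding by its expected effect on LSB pairs (2k, 2k+1) with 0 and 1 skipped, it uses a plain grid search with step 0.001 where any one-dimensional minimizer would do, and all function names and bin counts are hypothetical.

```python
def lsb_pair_evens(hist):
    """Even members of all LSB pairs present in the histogram (0 and 1 are skipped)."""
    return {k - (k & 1) for k in hist if k not in (0, 1)}

def embed_expected(cover_hist, beta):
    """Expected stego histogram Emb(x, beta)."""
    stego = dict(cover_hist)
    for e in lsb_pair_evens(cover_hist):
        a, b = cover_hist.get(e, 0), cover_hist.get(e + 1, 0)
        stego[e] = (1 - beta) * a + beta * b
        stego[e + 1] = beta * a + (1 - beta) * b
    return stego

def invert_embedding(stego_hist, beta):
    """Emb^{-1}(y, beta); valid only for 0 <= beta < 1/2."""
    cover = dict(stego_hist)
    scale = 1.0 / (1.0 - 2.0 * beta)
    for e in lsb_pair_evens(stego_hist):
        a, b = stego_hist.get(e, 0), stego_hist.get(e + 1, 0)
        cover[e] = scale * ((1 - beta) * a - beta * b)
        cover[e + 1] = scale * ((1 - beta) * b - beta * a)
    return cover

def z_sym(hist):
    """Symmetry penalty: sum over k > 0 of (x_k - x_{-k})^2 / (x_k + x_{-k})."""
    total = 0.0
    for k in {abs(v) for v in hist if v != 0}:
        a, b = hist.get(k, 0), hist.get(-k, 0)
        if a + b > 0:                        # skip empty or degenerate bins
            total += (a - b) ** 2 / (a + b)
    return total

def estimate_beta_zmh(stego_hist):
    """One-dimensional grid search over beta in [0, 0.5)."""
    grid = [i / 1000 for i in range(500)]
    return min(grid, key=lambda b: z_sym(invert_embedding(stego_hist, b)))

# Symmetric synthetic cover histogram (illustrative counts only).
cover_hist = {0: 8000}
for value, n in {1: 4000, 2: 2000, 3: 1000, 4: 500, 5: 250}.items():
    cover_hist[value] = n
    cover_hist[-value] = n

stego_hist = embed_expected(cover_hist, 0.1)
print(z_sym(cover_hist), z_sym(stego_hist), estimate_beta_zmh(stego_hist))
```

On a real image the penalty never reaches zero, so the estimate is the β that makes the reconstructed histogram as symmetric as possible.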

Performance Evaluation

[Figure: Jsteg, median absolute error (×10⁻², axis ticks 0.2 to 0.8) versus change rate β (0.00 to 0.20); curves: ML - Zhang & Ping, ML - First-order, ZMH - First-order, SVR]

Second-Order Statistics (ZMH)

Feature vector: adjacency matrix X

ZMH approach
– Decomposition into k-nodes
– Embedding is invertible provided 0 ≤ β < 1/2
– Symmetry about D

$$z_{\mathrm{adj}}(X) = \sum_{i,j} \frac{(x_{i,j} - x_{-j,-i})^2}{x_{i,j} + x_{-j,-i}}$$

[Figure: adjacency matrix X, cells [i, j] for i = −2, …, 3 and j = −2, …, 3, with the symmetry axis D marked]
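The second-order penalty can be evaluated directly from a table of pair counts. The sketch below covers only the penalty: it assumes the adjacency matrix is available as a dictionary mapping (i, j) to a count, and it does not build that matrix from an image or invert the embedding on k-nodes. The function name and the example counts are hypothetical.

```python
def z_adj(X):
    """Sum over (i, j) of (x_ij - x_{-j,-i})^2 / (x_ij + x_{-j,-i}).

    X maps (i, j) -> count; missing cells are treated as zero.
    """
    cells = set(X) | {(-j, -i) for (i, j) in X}   # include mirrored cells
    total = 0.0
    for (i, j) in cells:
        a, b = X.get((i, j), 0), X.get((-j, -i), 0)
        if a + b > 0:                              # skip empty cell pairs
            total += (a - b) ** 2 / (a + b)
    return total

# Hypothetical counts: the first table satisfies x_ij = x_{-j,-i}, the second does not.
symmetric = {(2, 3): 10, (-3, -2): 10, (1, -1): 5}
skewed = {(2, 3): 14, (-3, -2): 6, (1, -1): 5}
print(z_adj(symmetric), z_adj(skewed))
```

Cells of the form (i, −i) are their own mirror images and contribute nothing to the penalty.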

Performance Evaluation

[Figure: Jsteg, median absolute error (×10⁻², axis ticks 0.2 to 0.8) versus change rate β (0.00 to 0.20); curves: ML - Zhang & Ping, ML - First-order, ZMH - First-order, ZMH - Second-order, SVR]

What Else Can You Find in the Paper / Journal Version

– Error analysis of between-image and within-image errors for selected attacks
– Verification of precover assumptions using two different statistical tests
– Discussion & experiments with the symmetrized version of Jsteg
– Conversion of the Category attack [Lee-2006] into a quantitative one through the proposed ZMH framework
– Experiments conducted on two different sources of images
– Results reported in terms of two more security measures: IQR, median bias