Quantitative Steganalysis of LSB Embedding in JPEG Domain
Jan Kodovský, Jessica Fridrich September 10, 2010 / ACM MM&Sec ’10
Motivation
Least Significant Bit (LSB) embedding
– Simplicity, high embedding capacity
– Used in Jsteg, JP Hide&Seek, and other commercial stego software
Steganalysis of LSB embedding in the spatial domain is a mature area
– [Dumitrescu-2002], [Ker-2008]
Our focus
– Transform domain
– JPEG format
Quantitative steganalysis
– Outputs an estimate of the message length
Jsteg [Upham-1993]
– LSB replacement
– Embedding along a pseudo-random path
– Skipping DCT coefficients equal to 0 and 1
[Figure: DCT histogram over coefficient values −7 to 7, shown for the cover and after full embedding]
Embedding violates histogram symmetry
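A minimal Python sketch of Jsteg-style embedding as described above, to show why symmetry breaks. It is not Upham's implementation: the pseudo-random path comes from a seeded `random.Random` standing in for a stego key, and the "DCT coefficients" are rounded Laplacian samples (a hypothetical stand-in). With two's-complement LSBs the pairs are (2, 3), (4, 5), ... but (−2, −1), (−4, −3), ..., so bin −1 is mixed with bin −2 while the skipped bin 1 stays put.

```python
import random
from collections import Counter

def jsteg_like_embed(coeffs, bits, seed=0):
    """Jsteg-style LSB replacement (illustrative sketch, not Upham's code).

    Coefficients equal to 0 or 1 are skipped; the rest are visited along a
    pseudo-random path derived from `seed` and their LSB is overwritten.
    """
    stego = list(coeffs)
    usable = [i for i, c in enumerate(stego) if c not in (0, 1)]
    random.Random(seed).shuffle(usable)  # pseudo-random embedding path
    for bit, i in zip(bits, usable):
        # Python's & acts as two's complement on negatives: -1 & ~1 == -2,
        # so the LSB pairs are ..., (-4, -3), (-2, -1), (2, 3), (4, 5), ...
        stego[i] = (stego[i] & ~1) | bit
    return stego

def symmetric_cover(n, rate=0.6, seed=1):
    """Hypothetical stand-in for quantized DCT coefficients: rounded Laplacian."""
    rng = random.Random(seed)
    return [round(rng.expovariate(rate) - rng.expovariate(rate)) for _ in range(n)]

cover = symmetric_cover(50_000)
n_usable = sum(c not in (0, 1) for c in cover)
msg_rng = random.Random(2)
message = [msg_rng.getrandbits(1) for _ in range(n_usable)]  # full embedding
stego = jsteg_like_embed(cover, message)

h_cover, h_stego = Counter(cover), Counter(stego)
for k in range(-3, 4):
    print(f"{k:>3}  cover {h_cover[k]:>6}  stego {h_stego[k]:>6}")
```

In the printout, h(1) is identical for cover and stego, whereas h(−1) drops toward the average of h(−1) and h(−2).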
Selected Existing Attacks
[Zhang,Ping-2003]
– The first quantitative attack
– Exploits the violation of histogram symmetry
[Yu-2004]
– Histogram-based attack
– Generalized Cauchy ML fit
– Chi-square test
[Lee-2006], [Lee-2007]
– Category attack
– Technically not quantitative
[Westfeld-2007], [Böhme-2008]
– Adaptation of spatial-domain attacks
[Pevný-2009]
– Support vector regression (SVR)
– Feature-based, non-structural attack
– Currently the most accurate quantitative attack
Our Goals / Challenges
– Improve the accuracy of existing quantitative attacks on Jsteg
– Achieve better performance than the feature-based machine learning approach (SVR)
– Focus on the structure of LSB embedding
– Deliver a theoretically well-founded, modular framework
– Explore the applicability of the proposed attacks to different LSB embedding paradigms
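Maximum Likelihood
β ... change rate
The cover feature vector $x \sim P_x(x)$ is mapped by embedding $\mathrm{Emb}(\beta)$ to the stego feature vector $y$ with probability $P(y|x,\beta)$.
$$P(y,\beta) = \int P(y,x,\beta)\,dx = \int P(y|x,\beta)\,P(x,\beta)\,dx = P(\beta)\int P(y|x,\beta)\,P_x(x)\,dx,$$
where the last step uses that the cover does not depend on the change rate, $P(x,\beta) = P_x(x)\,P(\beta)$. Dividing by $P(\beta)$:
$$\hat\beta = \arg\max_{\beta\ge0} P(y|\beta) = \arg\max_{\beta\ge0}\int P(y|x,\beta)\,P_x(x)\,dx$$
Choice of the feature vector x is crucial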
Features of Zhang & Ping [Zhang,Ping-2003]
Cover feature vector $x = [x_1, x_2, x_3]$: three DCT histogram bins marked in the figure
[Figure: DCT histogram over coefficient values −7 to 7]
Embedding $\mathrm{Emb}(\beta)$ produces the stego feature vector $y = [x^\beta_1, x^\beta_2, x^\beta_3]$: a coefficient of the LSB pair $(x_1, x_2)$ is moved to its pair-mate with probability $\beta$ and kept with probability $1-\beta$
$$\hat\beta = \arg\max_{\beta\ge0}\int P(y|x,\beta)\,P_x(x)\,dx$$
– $P(y|x,\beta)$ → binomial distribution → Gaussian approximation
– Embedding invariants: $x_1 + x_2$, $x_3$
– $P_x(x)$ → precover assumption [Ker-2007]
[Precover diagram: branch probabilities 1/2 and 1/2; labels $x_1$, $x_2 + x_3$]
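A toy Python sketch of this ML estimator, under assumptions that are ours rather than the paper's: the LSB pair is Jsteg's (−2, −1), the untouched reference bin is h(1), and in the cover h(−1) and h(1) come from a precover split with probability 1/2, which we approximate as h(−1) | h(1) ~ N(h(1), 2·h(1)). Whether this matches the slide's assignment of x1, x2, x3 is an assumption. Because both the embedding noise and the precover are Gaussian, the integral over x can be done in closed form, leaving a one-dimensional search over β.

```python
import math

def loglik(beta, y_a, n, r):
    """Gaussian-approximated log P(y_a | beta) with the cover bin integrated out.

    y_a : observed stego count h(-1)
    n   : invariant pair total h(-1) + h(-2)
    r   : untouched reference bin h(1)
    Given the cover count a, y_a = a - Bin(a, beta) + Bin(n - a, beta),
    approximately N((1 - 2*beta)*a + beta*n, n*beta*(1 - beta)).
    With the assumed precover a | r ~ N(r, 2r), y_a stays Gaussian.
    """
    mean = (1 - 2 * beta) * r + beta * n
    var = (1 - 2 * beta) ** 2 * 2 * r + n * beta * (1 - beta)
    return -0.5 * math.log(2 * math.pi * var) - (y_a - mean) ** 2 / (2 * var)

def ml_estimate(y_a, n, r, step=1e-4):
    """Grid search for the argmax of loglik over 0 <= beta < 1/2."""
    grid = [i * step for i in range(int(0.5 / step))]
    return max(grid, key=lambda b: loglik(b, y_a, n, r))

# Synthetic check: cover h(-1) = h(1) = 100000, h(-2) = 60000, beta = 0.1,
# with the stego count set to its expected value 0.9 * 100000 + 0.1 * 60000.
beta_hat = ml_estimate(y_a=96_000, n=160_000, r=100_000)
print(beta_hat)
```

Note that the β-dependent variance pulls the estimate slightly away from the true value for small counts; the bias shrinks as the counts grow.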
Performance Evaluation
[Figure: median absolute error (×10⁻²) vs. change rate β from 0 to 0.20 for Jsteg; curves: ML - Zhang & Ping, SVR]
– 3,250 JPEG images, resized and compressed with quality factor QF = 75
– ML - Zhang & Ping performs similarly to the original estimator [Zhang,Ping-2003]
– Assuming $x^\beta_1$ equals its expected value, the ML estimator reduces to Zhang & Ping's estimator
– SVR: Cartesian-calibrated Pevný features (548), trained on an additional 3,250 images
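Replacing the stego count by its expected value turns the maximization into a linear equation in β. A small Python sketch in the same toy setting as before (our assumptions: Jsteg pair (−2, −1), reference bin h(1), cover symmetry E[h(−1)] = h(1)); it solves $h^\beta(-1) = (1-2\beta)\,h(1) + \beta n$ with the invariant $n = h^\beta(-1) + h^\beta(-2)$, giving $\hat\beta = (h(1) - h^\beta(-1))/(2h(1) - n)$. This is our derivation for the toy model; the published form of Zhang & Ping's estimator may be written differently.

```python
def expected_value_estimate(h):
    """Closed-form change-rate estimate from a DCT histogram h (value -> count).

    Illustrative assumptions: Jsteg LSB pair (-2, -1), skipped reference
    bin 1, and cover symmetry E[h(-1)] = h(1).
    """
    y_a, n, r = h[-1], h[-1] + h[-2], h[1]
    denom = 2 * r - n
    if denom == 0:
        raise ValueError("degenerate histogram: 2*h(1) == h(-1) + h(-2)")
    return max(0.0, (r - y_a) / denom)  # clip: the change rate is non-negative

print(expected_value_estimate({-2: 640, -1: 960, 1: 1000}))  # 0.1
```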
First-Order Statistics
Feature vector: DCT histogram $x = [x_{-2L}, x_{-2L+1}, \dots, x_{2R}, x_{2R+1}]$
[Figure: DCT histogram over coefficient values −7 to 7]
$$\hat\beta = \arg\max_{\beta\ge0}\int P(y|x,\beta)\,P_x(x)\,dx$$
Embedding changes in individual LSB pairs are independent:
$$P(y|x,\beta) = P(x^\beta_0|x_0,\beta)\cdot P(x^\beta_1|x_1,\beta)\cdot\prod_{k\neq0} P(x^\beta_{2k}, x^\beta_{2k+1}\,|\,x_{2k}, x_{2k+1}, \beta)$$
Embedding invariants: $x_0$, $x_1$, $x_{2k} + x_{2k+1}$
Binomial distribution → Gaussian approximation
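The factorization above can be sketched in Python as a sum of per-pair Gaussian log-likelihoods. This is only an illustration of $P(y|x,\beta)$ for a known cover histogram; the attack itself still integrates over $x$ with the precover. The synthetic histogram and the grid search at the end are hypothetical, used only to show that the factorized likelihood peaks near the true change rate.

```python
import math

def pair_loglik(y_even, x_even, x_odd, beta):
    """Gaussian approximation of log P(y_{2k} | x_{2k}, x_{2k+1}, beta), 0 < beta < 1.

    y_{2k} = x_{2k} - Bin(x_{2k}, beta) + Bin(x_{2k+1}, beta); the partner count
    y_{2k+1} = x_{2k} + x_{2k+1} - y_{2k} adds no information (invariant sum).
    """
    mean = (1 - beta) * x_even + beta * x_odd
    var = (x_even + x_odd) * beta * (1 - beta)
    return -0.5 * math.log(2 * math.pi * var) - (y_even - mean) ** 2 / (2 * var)

def loglik(y, x, beta, L, R):
    """log P(y | x, beta) as a sum over independent LSB pairs (2k, 2k+1), k != 0.

    The factors for the invariant bins 0 and 1 contribute log 1 = 0
    whenever y_0 = x_0 and y_1 = x_1, so they are omitted.
    """
    return sum(pair_loglik(y[2 * k], x[2 * k], x[2 * k + 1], beta)
               for k in range(-L, R + 1) if k != 0)

# Hypothetical symmetric cover histogram on bins -4..5 (L = R = 2), assumed known
cover = {v: 10000 * math.exp(-abs(v) / 1.5) for v in range(-4, 6)}
stego = dict(cover)
for k in (-2, -1, 1, 2):
    e, o = 2 * k, 2 * k + 1
    stego[e] = 0.8 * cover[e] + 0.2 * cover[o]  # expected counts at beta = 0.2
    stego[o] = 0.2 * cover[e] + 0.8 * cover[o]
grid = [i / 1000 for i in range(1, 500)]
beta_hat = max(grid, key=lambda b: loglik(stego, cover, b, L=2, R=2))
print(beta_hat)
```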
Cover model: DCT coefficients are i.i.d., drawn from the generalized Cauchy distribution
$$f(x) = \frac{p-1}{2s}\left(\frac{|x|}{s} + 1\right)^{-p}$$
– Parameters p and s are ML estimates, given the embedding invariants $x_0$, $x_1$, $x_{2k} + x_{2k+1}$
– Precover assumption for every LSB pair
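A Python sketch of how the generalized Cauchy model yields histogram-bin probabilities. For $x \ge 0$ the tail is $P(X > x) = \tfrac12(1 + x/s)^{1-p}$, so each integer bin has a closed-form probability. The `pair_split` helper reflects our reading of the per-pair precover: given the invariant pair total, a coefficient falls into bin $2k$ with the model probability below. The values p = 2.5, s = 1.0 are hypothetical, not fitted to any image.

```python
def gc_cdf(x, p, s):
    """CDF of f(x) = (p - 1)/(2s) * (|x|/s + 1)^(-p), with p > 1, s > 0."""
    tail = 0.5 * (abs(x) / s + 1) ** (1 - p)  # P(X > |x|), by symmetry
    return tail if x < 0 else 1 - tail

def bin_prob(k, p, s):
    """Probability that a coefficient drawn from f rounds to the integer k."""
    return gc_cdf(k + 0.5, p, s) - gc_cdf(k - 0.5, p, s)

def pair_split(k, p, s):
    """Model probability that a member of the LSB pair (2k, 2k+1) lies in bin 2k."""
    a, b = bin_prob(2 * k, p, s), bin_prob(2 * k + 1, p, s)
    return a / (a + b)

p, s = 2.5, 1.0  # hypothetical parameter values
total = sum(bin_prob(k, p, s) for k in range(-10_000, 10_001))
print(total, pair_split(1, p, s), pair_split(-1, p, s))
```

Because the density decreases with |x|, the even bin dominates the pair (2, 3) while the odd bin −1 dominates the pair (−2, −1).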
Performance Evaluation
[Figure: median absolute error (×10⁻²) vs. change rate β from 0 to 0.20 for Jsteg; curves: ML - Zhang & Ping, ML - First-order, SVR]
Second-Order Statistics
– DCT coefficients are not i.i.d.
– We capture dependencies using the adjacency matrix X
[Figure: adjacency matrix X with cells [i, j], i, j ∈ {−2, −1, 0, 1, 2, 3}]
– Natural decomposition into k-nodes, k ∈ {1, 2, 4}
– Binomial / multinomial distributions → Gaussian approximations
$$\hat\beta = \arg\max_{\beta\ge0}\int P(Y|X,\beta)\,P_x(X)\,dX$$
– Factorization of $P(Y|X,\beta)$ and the embedding invariants: analytic expression
– Complications arise: good parametric model for $P_x$? High complexity
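A Python sketch of the adjacency matrix and its k-node decomposition. Two things here are assumptions: we count horizontally adjacent values in a 2D array (the paper's neighbourhood definition may differ), and we read a k-node as the set of cells that exchange counts under Jsteg embedding, i.e. {i, mate(i)} × {j, mate(j)}, where 0 and 1 have no mate. On the index range [−2, 3] from the slides this gives nodes of sizes 1, 2 and 4.

```python
from collections import Counter

def lsb_mate(v):
    """Jsteg pair-mate of value v, or None for the skipped values 0 and 1."""
    if v in (0, 1):
        return None
    return v ^ 1  # two's-complement XOR: 2 <-> 3, -1 <-> -2, -3 <-> -4, ...

def adjacency_matrix(rows, lo=-2, hi=3):
    """Counts X[i, j] of horizontally adjacent pairs (i, j) with lo <= i, j <= hi."""
    X = Counter()
    for row in rows:
        for u, v in zip(row, row[1:]):
            if lo <= u <= hi and lo <= v <= hi:
                X[u, v] += 1
    return X

def k_node(i, j):
    """Cells of X that can exchange counts with cell (i, j) during embedding."""
    rows = [i] if lsb_mate(i) is None else [i, lsb_mate(i)]
    cols = [j] if lsb_mate(j) is None else [j, lsb_mate(j)]
    return frozenset((a, b) for a in rows for b in cols)

values = range(-2, 4)
nodes = {k_node(i, j) for i in values for j in values}
sizes = Counter(len(node) for node in nodes)
print(sorted(sizes.items()))  # [(1, 4), (2, 8), (4, 4)]
```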
Zero Message Hypothesis (ZMH)
Alternative heuristic approach
– Penalty function $z(x) \ge 0$ satisfying
  $z(x^\beta) \approx 0$ when $\beta = 0$
  $z(x^\beta) > 0$ when $\beta > 0$
– $z(x)$ should be a quantitative description of a zero message hypothesis, capturing a key cover property violated by embedding
– Assumption: $y = E[x^\beta] = \mathrm{Emb}(x, \beta)$
– Assumption: the mapping Emb is invertible ⇒ $x = \mathrm{Emb}^{-1}(y, \beta)$
$$\hat\beta = \arg\min_{\beta\ge0} z\big(\mathrm{Emb}^{-1}(y, \beta)\big)$$
Comments
– Low computational complexity: one-dimensional search over β
– ZMH-based steganalysis is not a new idea! [RS steganalysis, 2001]
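For first-order statistics, the inverse required by the second assumption can be written out per LSB pair. A short derivation, assuming (as for Jsteg) that each coefficient of the pair moves to its pair-mate with probability β:

$$\begin{pmatrix}y_{2k}\\ y_{2k+1}\end{pmatrix} = \begin{pmatrix}1-\beta & \beta\\ \beta & 1-\beta\end{pmatrix}\begin{pmatrix}x_{2k}\\ x_{2k+1}\end{pmatrix}, \qquad \det = (1-\beta)^2 - \beta^2 = 1 - 2\beta,$$

$$\begin{pmatrix}x_{2k}\\ x_{2k+1}\end{pmatrix} = \frac{1}{1-2\beta}\begin{pmatrix}1-\beta & -\beta\\ -\beta & 1-\beta\end{pmatrix}\begin{pmatrix}y_{2k}\\ y_{2k+1}\end{pmatrix}, \qquad 0 \le \beta < \tfrac12.$$

The determinant vanishes only at β = 1/2 (full embedding), so Emb is invertible for 0 ≤ β < 1/2; bins untouched by embedding, such as 0 and 1 for Jsteg, map to themselves.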
First-Order Statistics (ZMH)
$x = [x_{-2L}, x_{-2L+1}, \dots, x_{2R-1}, x_{2R}]$
[Figure: DCT histogram over coefficient values −7 to 7]
Penalty function
$$z_{\mathrm{sym}}(x) = \sum_{k>0} w_k\,(x_k - x_{-k})^2$$
Weights $w_k$ chosen to minimize the estimator variance → least squares steganalysis [Ker-2007]
Final form of the penalty function:
$$z_{\mathrm{sym}}(x) = \sum_{k>0} \frac{(x_k - x_{-k})^2}{x_k + x_{-k}}$$
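A Python sketch of the first-order ZMH estimator: invert the expected LSB-pair mixing for each candidate β and pick the β whose recovered histogram is most symmetric under $z_{\mathrm{sym}}$. The synthetic histogram on bins −8..9 is hypothetical, and returning an infinite penalty when the inversion produces negative counts is our design choice, not something stated on the slides.

```python
import math

def embed_expected(x, beta, pairs):
    """Expected stego histogram E[x^beta] = Emb(x, beta) under LSB-pair mixing."""
    y = dict(x)  # bins outside `pairs` (0 and 1 for Jsteg) are unchanged
    for a, b in pairs:
        y[a] = (1 - beta) * x[a] + beta * x[b]
        y[b] = beta * x[a] + (1 - beta) * x[b]
    return y

def invert_pairs(y, beta, pairs):
    """Emb^{-1}(y, beta), valid for 0 <= beta < 1/2."""
    x = dict(y)
    d = 1 - 2 * beta
    for a, b in pairs:
        x[a] = ((1 - beta) * y[a] - beta * y[b]) / d
        x[b] = ((1 - beta) * y[b] - beta * y[a]) / d
    return x

def z_sym(x, kmax):
    """Penalty sum_{k=1..kmax} (x_k - x_{-k})^2 / (x_k + x_{-k})."""
    total = 0.0
    for k in range(1, kmax + 1):
        if x[k] < 0 or x[-k] < 0 or x[k] + x[-k] <= 0:
            return math.inf  # negative counts: this beta is inconsistent with y
        total += (x[k] - x[-k]) ** 2 / (x[k] + x[-k])
    return total

def zmh_estimate(y, pairs, kmax, step=1e-3, beta_max=0.49):
    """One-dimensional search for argmin_{beta >= 0} z_sym(Emb^{-1}(y, beta))."""
    grid = [i * step for i in range(int(beta_max / step) + 1)]
    return min(grid, key=lambda b: z_sym(invert_pairs(y, b, pairs), kmax))

# Hypothetical symmetric cover histogram on bins -8..9 and Jsteg's LSB pairs there
JSTEG_PAIRS = [(v, v + 1) for v in range(-8, 10, 2) if v != 0]
cover = {v: 10000 * math.exp(-abs(v) / 1.5) for v in range(-8, 10)}
stego = embed_expected(cover, 0.2, JSTEG_PAIRS)
print(zmh_estimate(cover, JSTEG_PAIRS, 8), zmh_estimate(stego, JSTEG_PAIRS, 8))
```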
Performance Evaluation
[Figure: median absolute error (×10⁻²) vs. change rate β from 0 to 0.20 for Jsteg; curves: ML - Zhang & Ping, ML - First-order, ZMH - First-order, SVR]
Second-Order Statistics (ZMH)
Feature vector: adjacency matrix X
[Figure: adjacency matrix X with cells [i, j], i, j ∈ {−2, −1, 0, 1, 2, 3}; the diagonal D is marked]
ZMH approach
– Decomposition into k-nodes
– Embedding is invertible provided 0 ≤ β < 1/2
– Symmetry about D
$$z_{\mathrm{adj}}(X) = \sum_{i,j} \frac{(x_{i,j} - x_{-j,-i})^2}{x_{i,j} + x_{-j,-i}}$$
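A Python sketch of the second-order ZMH estimator on the index range [−2, 3]. Our assumptions: both coefficients of an adjacent pair are changed independently, so $E[Y] = A X A^\top$ with $A$ the Jsteg histogram-mixing matrix (pairs (−2, −1) and (2, 3); 0 and 1 fixed), and $z_{\mathrm{adj}}$ sums only over cells whose reflection $(-j, -i)$ also lies in the range. The synthetic cover matrix is hypothetical; it is built to satisfy $x_{i,j} = x_{-j,-i}$.

```python
import math

VALUES = list(range(-2, 4))               # index range of X on the slides
IDX = {v: n for n, v in enumerate(VALUES)}
PAIRS = [(-2, -1), (2, 3)]                # Jsteg LSB pairs in [-2, 3]; 0 and 1 are skipped

def mixing(beta, inverse=False):
    """Matrix A with A[i'][i] = P(value i becomes i'), or its inverse (beta < 1/2)."""
    M = [[float(r == c) for c in range(len(VALUES))] for r in range(len(VALUES))]
    d = 1 - 2 * beta
    stay, move = ((1 - beta) / d, -beta / d) if inverse else (1 - beta, beta)
    for a, b in PAIRS:
        ia, ib = IDX[a], IDX[b]
        M[ia][ia] = M[ib][ib] = stay
        M[ia][ib] = M[ib][ia] = move
    return M

def matmul(P, Q):
    return [[sum(P[r][k] * Q[k][c] for k in range(len(Q))) for c in range(len(Q[0]))]
            for r in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def embed_expected(X, beta):
    """E[Y] = A X A^T: both coefficients of an adjacent pair change independently."""
    A = mixing(beta)
    return matmul(matmul(A, X), transpose(A))

def invert(Y, beta):
    """Emb^{-1}(Y, beta) = A^{-1} Y A^{-T}."""
    B = mixing(beta, inverse=True)
    return matmul(matmul(B, Y), transpose(B))

def z_adj(X):
    """Sum of (x_ij - x_{-j,-i})^2 / (x_ij + x_{-j,-i}) over cells reflected in range."""
    total = 0.0
    for i in VALUES:
        for j in VALUES:
            if -i not in IDX or -j not in IDX:
                continue  # reflected cell falls outside the matrix
            a, b = X[IDX[i]][IDX[j]], X[IDX[-j]][IDX[-i]]
            if a < 0 or b < 0 or a + b <= 0:
                return math.inf  # negative counts: beta inconsistent with Y
            total += (a - b) ** 2 / (a + b)
    return total

def zmh_adj_estimate(Y, step=1e-3, beta_max=0.49):
    """One-dimensional search for argmin_{beta >= 0} z_adj(Emb^{-1}(Y, beta))."""
    grid = [n * step for n in range(int(beta_max / step) + 1)]
    return min(grid, key=lambda b: z_adj(invert(Y, b)))

# Hypothetical cover adjacency matrix with x[i][j] == x[-j][-i]
cover_adj = [[1000 * math.exp(-(abs(i) + abs(j)) / 1.2) * (1.2 if i * j > 0 else 1.0)
              for j in VALUES] for i in VALUES]
stego_adj = embed_expected(cover_adj, 0.15)
print(zmh_adj_estimate(cover_adj), zmh_adj_estimate(stego_adj))
```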
Performance Evaluation
[Figure: median absolute error (×10⁻²) vs. change rate β from 0 to 0.20 for Jsteg; curves: ML - Zhang & Ping, ML - First-order, ZMH - First-order, ZMH - Second-order, SVR]
What Else Can You Find in the Paper / Journal Version
– Error analysis of between-image and within-image errors for selected attacks
– Verification of the precover assumptions using two different statistical tests
– Discussion of, and experiments with, the symmetrized version of Jsteg
– Conversion of the Category attack [Lee-2006] into a quantitative one through the proposed ZMH framework
– Experiments conducted on two different sources of images
– Results reported in terms of two more accuracy measures: IQR and median bias