Blind Image Steganalysis Based on Statistical Analysis of Empirical Matrix

Xiaochuan Chen 1, Yunhong Wang 2, Tieniu Tan 1, Lei Guo 1
1 National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, P.O. Box 2728, Beijing 100080
2 School of Computer Science and Engineering, Beihang University, 100083
E-mail: {xcchen, wangyh, tnt, lguo}@nlpr.ia.ac.cn

Abstract

In this paper, a novel steganalysis method based on statistical analysis of the empirical matrix (EM) is proposed to detect the presence of hidden messages in an image. The projection histogram (PH) of the EM is used to extract features composed of two parts: the moments of the PH and the moments of the characteristic function of the PH. Features extracted from the prediction-error image [7] are also included to enhance performance, and an SVM is used as the classifier. A test database is constructed, on which a detailed evaluation of the different feature categories and a comparison with prior methods are conducted. Experiments show that the proposed features are more effective than those of prior work, and that our steganalysis method can blindly detect the presence of data hiding for various embedding schemes with high accuracy.

1. Introduction

In recent years, with the development of digital multimedia and network technology, information hiding has received much attention in both theoretical and industrial fields. Information hiding embeds data in a cover medium imperceptibly. Steganography is a typical application of information hiding, whose main purpose is covert communication. Most data hiding schemes can be ascribed to three typical principles: spread spectrum (SS), least significant bit (LSB) and quantization index modulation (QIM) embedding. In contrast to data hiding, steganalysis is the art of detecting the presence of hidden messages. It is driven by the urgent demand of network security to block covert communication of illegal information. There are various steganalysis methods targeting a specific kind of data hiding method, such as the steganalysis of SS data hiding in [5], of QIM data hiding in [11], and of LSB data hiding in [12]. However, a general steganalysis method that can attack steganography blindly, that is, detect hidden data without knowing the embedding method, is more useful in practical applications. The massive variety of data hiding methods makes the design of steganalysis methods that can blindly cope with most embedding methods very challenging. In [2], a blind steganalysis method is proposed in which 2-D features are extracted in the spatial and DCT domains. In [10], Harmsen proposed a steganalysis method which exploits the change of the mass center of the histogram characteristic function after embedding to distinguish stego images from original ones. However, both of these methods extract only a few features, so their performance is not satisfactory. Shi et al. [7] proposed using the multiple-order moments of the characteristic functions of wavelet subbands as high-dimensional features and achieved relatively high performance, but the performance of this method at very low embedding rates is still not satisfactory. This paper proposes a blind steganalysis method based on statistical analysis of the empirical matrix (EM). More specifically, we extract the moments of the projection histogram (PH) of the EM and the moments of the characteristic function of the PH as features, and use an SVM as the classifier. Our method performs significantly better than prior blind steganalysis techniques.

2. Feature Extraction

In our method, we consider the empirical matrix (or co-occurrence matrix) as a raw representation of the statistical characteristics of an image. For a given grayscale image I_{x,y} with N gray levels, the N x N empirical matrix (EM) M_{r,θ} of the image is defined as:

M_{r,\theta}(i,j) = P\big( I_{x_1,y_1} = i,\ I_{x_2,y_2} = j \,\big|\, x_2 = x_1 + r\cos\theta,\ y_2 = y_1 + r\sin\theta \big)    (1)

where r is the step, θ is the direction, and P denotes the normalized probability. Fig. 1 shows a sample image, and its EM M_{1,0} (with r = 1 and θ = 0) is shown in Fig. 2. It can be seen that the EM is highly concentrated along the diagonal i = j; the reason is that neighboring pixels are highly correlated and tend to have the same or close gray values.

Fig. 1. Sample image.  Fig. 2. M_{1,0} of Fig. 1.
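As a concrete illustration of Eq. (1), the EM can be computed by counting gray-level pairs at the pixel offset (r cos θ, r sin θ) and normalizing the counts. The Python/numpy sketch below is our own minimal implementation; the function name and the rounding of the offset to the nearest integer pixel displacement are assumptions, not details taken from the paper.

import numpy as np

def empirical_matrix(img, r, theta, levels=256):
    """Minimal sketch of Eq. (1): normalized co-occurrence (empirical) matrix M_{r,theta}.
    The offset (r*cos(theta), r*sin(theta)) is rounded to the nearest integer pixel
    displacement -- an assumption, not stated in the paper."""
    img = np.asarray(img, dtype=np.int64)
    dx = int(round(r * np.cos(theta)))        # horizontal displacement
    dy = int(round(r * np.sin(theta)))        # vertical displacement
    h, w = img.shape
    ys = np.arange(max(0, -dy), min(h, h - dy))   # rows where both pixels of a pair exist
    xs = np.arange(max(0, -dx), min(w, w - dx))   # columns where both pixels of a pair exist
    p1 = img[np.ix_(ys, xs)].ravel()              # first pixel of each pair
    p2 = img[np.ix_(ys + dy, xs + dx)].ravel()    # second pixel of each pair
    M = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(M, (p1, p2), 1.0)                   # accumulate co-occurrence counts
    return M / M.sum()                            # normalize to a joint probability

For the setting used in the paper's illustration, e.g. empirical_matrix(img, 1, 0.0), this yields the kind of matrix plotted in Fig. 2.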

2.1. Moments of EM Projection Histogram

According to [5], the concentration effect of the EM is weakened after spread spectrum data hiding, which is seen as a slight spreading away from the diagonal line. Our idea is to project the EM along the diagonal (the dashed arrow in Fig. 2), generating the 1-D projection histogram H_{r,θ}:

H_{r,\theta}(k) = \sum_{i} M_{r,\theta}(i, i+k), \quad 1-N \le k \le N-1,\ k \in \mathbb{Z}    (2)

The length of H_{r,θ} is 2N-1. Note that the 1-D projection histogram H_{r,θ} is in fact a histogram of directional derivatives (for the various steps and directions). H_{1,0}, the projection histogram of M_{1,0}, is shown in Fig. 3, and Fig. 4 shows a local zoom-in of H_{1,0} for an original image and for its stego version produced by Cox et al.'s SS method [1]. It can easily be seen that, due to the spreading-away effect on M_{1,0} after data hiding, the projection histogram correspondingly becomes "flattened". Moreover, we observed that not only SS data hiding but also LSB and QIM steganography generally produce such a spreading-away effect on M_{r,θ}.

Fig. 3. H_{1,0} of M_{1,0}.  Fig. 4. Zoom-in of H_{1,0} (original vs. stego).

In order to focus on the variation of H_{r,θ} after data hiding, we use the multiple-order moments of H_{r,θ} as features to detect the data hiding, where the n-th order moment of H_{r,θ} is defined as follows:

mH^{n}_{r,\theta} = \frac{\sum_{k=1-N}^{N-1} k^{n} H_{r,\theta}(k)}{\sum_{k=1-N}^{N-1} H_{r,\theta}(k)} = \sum_{k=1-N}^{N-1} k^{n} H_{r,\theta}(k)    (3)

By this definition, the odd-order moments of H_{r,θ} represent weighted sums of H_{r,θ}(k) - H_{r,θ}(-k), while the even-order moments represent weighted sums of H_{r,θ}(k) + H_{r,θ}(-k). It is expected that the multiple-order moments of H_{r,θ} will reflect the changes imposed on H_{r,θ} by data hiding. In fact, we found that the odd-order moments of H_{r,θ} have a strong tendency to decrease, while the even-order moments tend to increase. The combination of even- and odd-order moments can serve as features with superior discriminability. Fig. 5 shows a plot of mH^1_{1,0}, mH^2_{1,0} and mH^3_{1,0} for a large number of original and stego images (Cox et al.'s SS) in 3-D coordinates, from which we can see that the samples for original and stego images are clustered quite distinctly. This illustrates the effectiveness of the moments of the projection histogram as features.

Fig. 5. Plot of [mH^1_{1,0}, mH^2_{1,0}, mH^3_{1,0}]^T for original and stego images.
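A minimal sketch of Eqs. (2) and (3), building on the hypothetical empirical_matrix above (function names are ours): summing the k-th diagonal of the EM is exactly numpy's trace with an offset.

import numpy as np

def projection_histogram(M):
    """Eq. (2): H_{r,theta}(k) = sum_i M(i, i+k) for k = 1-N .. N-1."""
    N = M.shape[0]
    ks = np.arange(1 - N, N)
    # np.trace with offset k sums the k-th diagonal of M
    return ks, np.array([np.trace(M, offset=int(k)) for k in ks])

def ph_moments(ks, H, orders=(1, 2, 3)):
    """Eq. (3): n-th order moments of the projection histogram.
    The denominator sum(H) equals 1 for a normalized EM, so the two forms of
    Eq. (3) coincide; it is kept here for numerical safety."""
    return [np.sum((ks ** n) * H) / np.sum(H) for n in orders]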

2.2. Moments of Characteristic Function

In most steganalysis methods, a general assumption is that the noise introduced by data hiding is additive and Gaussian distributed. Here we further suppose that the noise introduced by data hiding to the directional derivatives of the image (cf. Section 2.1) conforms to the same assumption. This supposition proved valid for most embedding schemes in our experiments. Under this assumption, we can apply an existing conclusion [10]: the magnitudes of the characteristic function of H_{r,θ} (referred to as the CF in the following) do not increase after data hiding. The CF of H_{r,θ} is simply the DFT of H_{r,θ}, referred to as F_{r,θ}. Fig. 6 shows the plot of F_{1,0} for the H_{1,0} curves shown in Fig. 4. It complies with our assumption, since the curve for the stego image lies below the one for the original image. We utilize the multi-order moments of the characteristic function (CF) [7] of H_{r,θ} as features, defined as:

mF^{n}_{r,\theta} = \frac{\sum_{j=1}^{L/2} f_{j}^{n} F_{r,\theta}(f_{j})}{\sum_{j=1}^{L/2} F_{r,\theta}(f_{j})}    (4)

where F_{r,θ}(f_j) is the component of F_{r,θ} at frequency f_j and L is the DFT sequence length. Consequently, mF_{r,θ} will not increase after data hiding. In [7], the multi-order moments of the CFs of wavelet subbands are used as features. Compared with [7], the truncation errors and precision degradation incurred when calculating histograms of wavelet subbands are avoided in our method, since we calculate histograms of discrete integers instead of floating-point values. Moreover, as H_{r,θ} has a unimodal and smooth profile, its CF decreases regularly and monotonically, so the decrease of the CF moments is more stable and prominent, which enhances the discriminability of the features.

Fig. 6. Magnitude of F_{1,0} (original vs. stego).
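A corresponding sketch of Eq. (4): following the text, we take the magnitudes of the DFT components over the first half of the spectrum. The frequency normalization f_j = j / L used below is our assumption; the paper only states that f_j is the frequency of the j-th component.

import numpy as np

def cf_moments(H, orders=(1, 2, 3)):
    """Sketch of Eq. (4): moments of the characteristic function (DFT) of H.
    Magnitudes are used, as described in the text; f_j = j / L is an assumed
    frequency normalization."""
    L = len(H)
    F = np.abs(np.fft.fft(H))          # characteristic function magnitude
    j = np.arange(1, L // 2 + 1)
    f = j / L                          # assumed normalized frequencies f_1 .. f_{L/2}
    Fj = F[j]
    return [np.sum((f ** n) * Fj) / np.sum(Fj) for n in orders]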

2.3. Feature Vector

In our experiments, we use M_{r,θ} in three directions and with three steps, expressed as {(r,θ) | r ∈ {1, 2, 3}, θ ∈ {0, π/2}} ∪ {(r,θ) | r ∈ {√2, 2√2, 3√2}, θ = π/4}. There are in total 9 empirical matrices. We then generate the projection histogram corresponding to each EM and calculate its first three order moments. Further, we generate the CF of each projection histogram and calculate its first three order moments. This yields a 54-D feature vector per image. Due to the vast diversity of images, features extracted from different images are affected by irrelevant information and become unstable. Following [7], we simply adopt the prediction-error image proposed therein to eliminate such irrelevant information while retaining the data valuable for classification; for brevity, the detailed description of the prediction-error image is omitted here. We repeat the above feature extraction steps on the prediction-error image, yielding another 54-D feature vector. Concatenating the two gives a 108-D feature vector. As indicated by our experiments, including features from more directions, more steps or higher-order moments does not further improve performance, and may even degrade it.
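Assembling the 108-D vector from the sketches above might look like the following. The horizontal-difference "prediction-error" image is only a placeholder of ours, since the paper defers the actual prediction-error image to [7].

import numpy as np

# the nine (r, theta) settings of Section 2.3
SETTINGS = [(r, t) for r in (1, 2, 3) for t in (0.0, np.pi / 2)] + \
           [(r, np.pi / 4) for r in (np.sqrt(2), 2 * np.sqrt(2), 3 * np.sqrt(2))]

def features_54d(img):
    """3 PH moments + 3 CF moments for each of the 9 empirical matrices."""
    feats = []
    for r, theta in SETTINGS:
        M = empirical_matrix(img, r, theta)
        ks, H = projection_histogram(M)
        feats += ph_moments(ks, H) + cf_moments(H)
    return np.array(feats)                      # 9 * 6 = 54 values

def features_108d(img):
    """54-D features of the image plus 54-D of a prediction-error image.
    The horizontal-difference predictor below is a stand-in for the (omitted)
    prediction-error image of [7], not the paper's actual predictor."""
    pred_err = np.abs(np.diff(img.astype(np.int64), axis=1))
    return np.concatenate([features_54d(img), features_54d(pred_err)])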

3. Classifier

In our work, we use Joachims' support vector machine (SVM) implementation SVMlight [6] as the classifier, with a linear kernel, since we found that non-linear kernels did not perform better. This well-known SVM implementation is computationally efficient for large-scale SVM learning. In [7], a neural network with four hidden layers and one output layer is utilized as the classifier, but training such a network is very time consuming. In order to compare the effectiveness of our features with those proposed in [7], we tested both with the same SVM classifier on the same database, and our method performs much better.
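The paper uses SVMlight with a linear kernel; as a rough stand-in (the library choice below is ours, not the authors'), a linear SVM trained on the 108-D features could be sketched as:

import numpy as np
from sklearn.svm import LinearSVC

def train_detector(X_train, y_train):
    """X_train: (n_images, 108) feature matrix; y_train: 1 for stego, 0 for plain.
    A linear kernel is used, as the paper reports no gain from non-linear kernels.
    C=1.0 is an assumed default, not a value tuned in the paper."""
    clf = LinearSVC(C=1.0)
    clf.fit(X_train, y_train)
    return clf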

4. Experiments

4.1. Test Database

The construction of a general and comprehensive image database is fundamental and important in steganalysis research. At present, our image database comprises all 1349 images on CorelDraw version 11 CD#4, and six stego images are generated for each original image using the following six typical data hiding schemes:

· #1: Cox et al.'s non-blind SS [1] (36 dB)
· #2: Huang et al.'s 8x8 DCT block SS [9] (48 dB)
· #3: Piva et al.'s blind SS [3] (56 dB)
· #4: Generic LSB (0.3 bpp, 56 dB)
· #5: Lie et al.'s adaptive LSB [8] (0.3 bpp, 51 dB)
· #6: Generic QIM [4] (0.11 bpp, 47 dB)

To make our tests persuasive, we customized the data hiding methods to embed very small amounts of data; the approximate average embedding rate and PSNR are given in parentheses above. A fairly harsh condition is thus created for steganalysis.
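Purely as an illustration of scheme #4, a generic LSB embedder at a chosen rate (e.g. 0.3 bpp) can be sketched as below. This is textbook LSB replacement with pseudo-random pixel selection, not the exact embedder used to build the database.

import numpy as np

def lsb_embed(img, bits, rate=0.3, seed=0):
    """Generic LSB replacement at roughly `rate` bits per pixel.
    Pixel positions are chosen pseudo-randomly; a generic sketch only."""
    flat = img.astype(np.uint8).ravel().copy()
    rng = np.random.default_rng(seed)
    n = min(len(bits), int(rate * flat.size))
    pos = rng.choice(flat.size, size=n, replace=False)
    flat[pos] = (flat[pos] & 0xFE) | np.asarray(bits[:n], dtype=np.uint8)
    return flat.reshape(img.shape)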

4.2. Test Results

We tested our method on the above database against the six data hiding methods listed above. In the "separate" mode, we randomly select 700 original images and their 700 corresponding stego images as training samples, and the remaining 649 pairs are used for testing. In the "mixed" mode, we randomly select 700 original images and their corresponding 700 x 6 stego images generated with the above six embedding methods as training samples, and the remaining 649 original images and 649 x 6 stego images as test samples. The performance is evaluated with three measures: accuracy (AR), the ratio of correct classifications; false positive rate (FP), the ratio of plain images wrongly classified as stego; and false negative rate (FN), the ratio of stego images wrongly classified as plain. In practical applications, FP should be kept as low as possible while accuracy is enhanced.
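The three measures can be read off a confusion matrix; a minimal sketch (labels: 1 = stego, 0 = plain; names are ours):

import numpy as np

def ar_fp_fn(y_true, y_pred):
    """Accuracy (AR), false positive rate (FP: plain classified as stego),
    and false negative rate (FN: stego classified as plain)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ar = np.mean(y_true == y_pred)
    fp = np.mean(y_pred[y_true == 0] == 1)
    fn = np.mean(y_pred[y_true == 1] == 0)
    return ar, fp, fn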

Firstly, in order to demonstrate the rationality of our feature selection, we train and classify with different categories of features: those of the original image (Orig, 54-D), of the prediction-error image (PE, 54-D), the projection histogram moments (MH, 54-D), the projection histogram CF moments (cfMH, 54-D), and the full set (ALL, 108-D). All test results are averaged over five runs. Table I shows the accuracy for each group of features under our test strategy. MH performs better than cfMH on some data hiding methods and worse on others, PE always performs better than Orig, and ALL performs best in almost all cases. This result explains why we select the 108-D feature set.

Table I. Test results for different groups of features
Schemes   MH      cfMH    Orig    PE      ALL
#1        96.6%   92.7%   90.9%   98.0%   98.1%
#2        97.0%   92.7%   91.6%   98.1%   98.3%
#3        96.8%   59.6%   84.8%   95.5%   96.6%
#4        96.7%   97.8%   92.4%   99.2%   99.4%
#5        96.9%   97.1%   88.4%   99.1%   98.9%
#6        96.9%   99.4%   98.4%   99.8%   99.9%
Mixed     -       -       -       -       98.1%

We then further compare our method with the methods in [2] and [7], where the method of [7] is implemented with our SVM classifier. The test results are shown in Table II. As shown, the method in [2] is not effective on our database due to the harsh testing conditions, and our method performs much better than both methods. In particular, for embedding scheme #3 the method in [7] fails, while our method keeps a high detection rate owing to the good performance of MH, as shown in Table I.

Table II. Comparison with other steganalysis methods
                          Proposed
Schemes   [2]     [7]     AR      FP      FN
#1        61.1%   92.5%   98.1%   1.9%    2.0%
#2        58.2%   92.3%   98.3%   1.7%    1.7%
#3        51.3%   60.1%   96.6%   2.2%    4.7%
#4        58.6%   96.5%   99.4%   0.5%    0.7%
#5        64.0%   97.3%   98.9%   0.6%    1.6%
#6        80.1%   98.6%   99.9%   0.1%    0.2%
Mixed     -       87.2%   98.1%   3.5%    0.3%

5. Conclusion

In this paper, a novel steganalysis method which blindly detects the presence of data hiding is proposed. Our method is based on the statistical analysis of the empirical matrix: we extract features by processing the empirical matrix and use an SVM as the classifier. Our system performs better than existing methods and is effective in the blind detection of a wide range of data hiding schemes. In the near future we will try to extract more features from the EM to improve performance further, and to perform more comprehensive tests on both theoretical and commercial data hiding methods.

6. Acknowledgment

This work was supported by the Natural Science Foundation of China (Grant Nos. 60121302, 60335010), the National Basic Research Program of China (Grant No. 2004CB318110), and the Ministry of Science and Technology of the PRC (Grant No. 2004DFA06900).

7. References

[1] I. J. Cox, J. Kilian, F. T. Leighton, and T. Shamoon, "Secure spread spectrum watermarking for multimedia," IEEE Trans. Image Processing, vol. 6, no. 12, pp. 1673-1687, 1997.
[2] W.-N. Lie and G.-S. Lin, "A feature-based classification technique for blind image steganalysis," IEEE Trans. Multimedia, vol. 7, no. 6, pp. 1007-1020, 2005.
[3] A. Piva, M. Barni, F. Bartolini, and V. Cappellini, "DCT-based watermark recovering without resorting to the uncorrupted original image," Proc. ICIP 1997, vol. 1, pp. 520.
[4] B. Chen and G. W. Wornell, "Digital watermarking and information embedding using dither modulation," Proc. IEEE MMSP 1998, pp. 273-278.
[5] K. Sullivan, U. Madhow, S. Chandrasekaran, and B. S. Manjunath, "Steganalysis of spread spectrum data hiding exploiting cover memory," Proc. SPIE 2005, vol. 5681.
[6] T. Joachims, "Making large-scale SVM learning practical," in Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. Burges, and A. Smola, Eds., MIT Press, 1999.
[7] Y. Q. Shi et al., "Image steganalysis based on moments of characteristic functions using wavelet decomposition, prediction-error image, and neural network," Proc. ICME 2005, pp. 269-272.
[8] W.-N. Lie and L.-C. Chang, "Data hiding in images with adaptive numbers of least significant bits based on human visual system," Proc. IEEE ICIP 1999, pp. 286-290.
[9] J. Huang and Y. Q. Shi, "Adaptive image watermarking scheme based on visual masking," Electronics Letters, vol. 34, no. 8, pp. 748-750, Apr. 1998.
[10] J. Harmsen, "Steganalysis of Additive Noise Modelable Information Hiding," M.S. thesis (advisor: William Pearlman), Rensselaer Polytechnic Institute, NY, Apr. 2003.
[11] K. Sullivan, Z. Bi, U. Madhow, S. Chandrasekaran, and B. S. Manjunath, "Steganalysis of quantization index modulation data hiding," Proc. ICIP 2004, vol. 2, pp. 1165-1168.
[12] X. Y. Yu, T. N. Tan, and Y. H. Wang, "Extended optimization method of LSB steganalysis," Proc. ICIP 2005, vol. 2, pp. 1102-1105.