COMPARISON OF WATERMARKING ALGORITHMS VIA A GA-BASED BENCHMARKING TOOL

V. Conotter1, G. Boato1, C. Fontanari2, and F.G.B. De Natale1

1 Department of Information Engineering and Computer Science, University of Trento
Email: {conotter, boato, denatale}@disi.unitn.it
2 Department of Mathematics, University of Trento
Email: [email protected]

ABSTRACT

In this paper we present the application of a recently developed benchmarking tool to the comparison of watermarking algorithms. We carry out an extensive analysis to assess and compare the robustness of digital image watermarking techniques by considering the perceptual quality of the un-marked images in terms of Weighted PSNR. The benchmarking tool employs genetic algorithms, introduces a novel metric for robustness assessment based on perceptual quality measures, and allows the performances of different techniques to be analyzed and compared. Experimental results show the effectiveness of the proposed approach.

Index Terms— Digital Watermarking, Benchmarking, Performance Comparison

1. INTRODUCTION

In the age of information technology it has become easier and easier to access and redistribute digital multimedia data, thanks to the spread of the Internet as a global means of communication. Similarly, the availability of low-cost digital imaging devices together with editing software makes digital images part of all areas of everyday life. As a result, problems related to the authenticity and integrity of images have arisen, stimulating the scientific community to focus on solutions for these issues. Digital watermarking has been proposed as an effective instrument against piracy, proving content ownership, verifying authenticity and tracking copyright. Except for specific applications, the major requirement for a mark embedded into a cover work is robustness against manipulations, including a great variety of digital and analog processing operations. Designing an efficient watermarking algorithm is therefore extremely challenging, and research is still in progress, producing a great variety of solutions. As a consequence, performance evaluation through benchmarking frameworks has become more and more important to speed up research in the field of digital watermarking and to stimulate continuous improvement of the existing techniques by
identifying the weaknesses and failings of different methods (see for instance [1], [2] and references therein). Several benchmarking tools already exist in the literature, standardizing the process of evaluating a watermarking system on a large set of single attacks. We developed an innovative and flexible tool, presented in [3], suitable for evaluating the robustness of digital watermarking techniques, introducing a novel metric based on the perceptual quality of un-marked images. A set of attacks is chosen depending on the application the algorithm under test is intended for. Genetic Algorithms (GA) then search for the optimal parameters to be assigned to each image processing operator, as well as the order in which the operators have to be applied, so as to remove the watermark from the content while keeping the perceptual quality of the resulting image as high as possible. The perceived quality of the recovered un-marked image, here measured by means of the Weighted Peak Signal to Noise Ratio (WPSNR), is inversely related to the robustness of the method. The major difference from existing benchmarking tools is the possibility of testing the selected algorithm under a combination of attacks, evaluating its performance in terms of the visual degradation perceived by the Human Visual System (HVS). We point out that a combination of several attacks yields a higher-quality un-marked image than the degradation introduced by a single image processing operator used to remove the watermark. On the other hand, taking into account the effect of more than one attack at a time makes the problem nonlinear and multidimensional, so a suitable optimization technique such as GA is needed to converge to an optimal or near-optimal solution. In this paper we employ this benchmarking tool to compare the performances of watermarking algorithms in terms of robustness, as promised as future work in [3]. Indeed, a fair comparison becomes more and more compelling given the growing number of watermarking algorithms in the literature. From a developer's point of view, a benchmark makes it possible to verify the robustness of one's own method with respect to
existing algorithms, thus stimulating research towards better and better solutions. On the other hand, from a user's point of view, a comparison tool is useful for selecting the most appropriate watermarking algorithm for the intended application. Given two different techniques, fairly parameterized in order to guarantee similar initial conditions, the key idea is to compare their robustness indexes, evaluated by means of a novel robustness metric. This metric is inversely related to the maximal perceptual quality achievable while removing the mark, which is exactly what the GA-based benchmarking tool finds. Consequently, the algorithm with the greater robustness index is the more robust one, since a heavier degradation (i.e., a lower perceptual quality of the un-marked image) is required to remove its mark.

2. ROBUSTNESS ASSESSMENT AND COMPARISON

In this work we exploit our GA-based benchmarking tool, presented in [3], which evaluates the robustness of a method in terms of the perceptual quality of the un-marked images. Given a set of attacks, the idea is to find the parameterization that removes the mark embedded in a set of reference images while maximizing the perceptual quality of the processed content. Since traditional quality measures, such as the Mean Square Error (MSE) and the Peak Signal to Noise Ratio (PSNR), do not take into account the modifications perceived by the HVS, we adopt as a metric a modified version of PSNR, called Weighted PSNR (WPSNR) [4]:
$$\mathrm{WPSNR} = 10 \log_{10} \frac{I_{peak}^{2}}{\left\| \mathrm{NVF} \cdot (I - \tilde{I}) \right\|^{2}} \qquad (1)$$

where Ipeak is the peak value of the input image, I and Ĩ denote the image before and after processing, ‖·‖ denotes the root mean squared value over pixels, and NVF (Noise Visibility Function) is a weighting function that accounts for distortions in differently textured areas: its value ranges from zero for highly textured areas, where degradations are less visible, up to one for smooth areas of the content.
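As a concrete illustration, the following minimal Python sketch computes the WPSNR of Eq. (1); the NVF helper uses the local-variance form proposed in [4], and the window size and tuning constant D are illustrative assumptions rather than the exact configuration of the tool.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def nvf(image, window=7, D=75.0):
    # Local-variance NVF (one of the forms in [4]): close to 1 in smooth
    # regions, close to 0 in highly textured ones. D is a tuning constant.
    img = image.astype(np.float64)
    mean = uniform_filter(img, window)
    var = np.maximum(uniform_filter(img ** 2, window) - mean ** 2, 0.0)
    theta = D / max(var.max(), 1e-12)
    return 1.0 / (1.0 + theta * var)

def wpsnr(reference, processed, peak=255.0):
    # Eq. (1): errors are weighted by the NVF, so distortion hidden in
    # textured areas (NVF ~ 0) is penalized less than in flat areas.
    ref = reference.astype(np.float64)
    err = nvf(ref) * (ref - processed.astype(np.float64))
    wmse = np.mean(err ** 2)
    return np.inf if wmse == 0 else 10.0 * np.log10(peak ** 2 / wmse)
```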
Given the function WPSNR that we want to maximize, considering combinations of attacks while avoiding brute-force computation of the best solution calls for a suitable optimization technique. In [3] we exploited GA, which are robust, stochastic search methods modeled on the principles of natural selection and evolution [5]. In our benchmarking application, the GA outputs the best attack pattern to be applied to the input marked image in order to remove the mark while maximizing the WPSNR of the un-marked version of the image. Namely, in our approach, once a set of admissible image processing operators is fixed, the robustness of a method is measured with the following metric:

$$R(q) = \frac{Q}{M(q)} \qquad (2)$$

where Q is a fixed quality threshold, q is the perceptual quality of a watermarked image Iw, and M(q) is the maximal perceptual quality of the un-marked image obtained from Iw by applying the found combination of the selected attacks.
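A compact sketch of how the GA could score a candidate attack chain, and of the index in Eq. (2), is given below; the attack and detector interfaces are hypothetical placeholders, not the actual tool of [3], and wpsnr is the helper defined above.

```python
def attack_chain_quality(params, marked, attacks, mark_present):
    # GA fitness for one candidate: apply each operator with its candidate
    # parameter, in the candidate order; a chain is useful only if the mark
    # no longer survives, and among such chains higher WPSNR wins.
    attacked = marked
    for attack, p in zip(attacks, params):
        attacked = attack(attacked, p)
    if mark_present(attacked):
        return float("-inf")          # mark survived: reject this candidate
    return wpsnr(marked, attacked)    # quality of the un-marked image

def robustness_index(Q, M_q):
    # Eq. (2): R(q) = Q / M(q); R(q) > 1 means that removing the mark forces
    # the image quality below the threshold Q, i.e. the scheme is robust.
    return Q / M_q
```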
Given Q, chosen depending on the application scenario, and the value of M(q), found by running the GA, the robustness index R(q) is evaluated according to Eq. (2). If R(q) is greater than 1, then removing the mark from the given image is possible only by degrading its maximal perceptual quality M(q) below Q. As a consequence, the watermarking algorithm can be declared robust, since a large degradation has to be introduced in the image to remove the mark. On the other hand, the embedded watermark is not robust if M(q) assumes values higher than the threshold Q (i.e., R(q) is less than 1). Note that the choice of the attacks to be performed depends on the application scenario the algorithm has been designed for (e.g., logo or authentication applications), as well as on the threshold Q. As long as two watermarking algorithms are comparable (i.e., they share the same watermark recovery process, either detection or decoding), the introduced metric represents a valuable instrument for measuring differences in their robustness performances. Indeed, in this paper we aim at identifying the more robust of the two techniques under analysis with respect to the perceptual quality of the un-marked image.

Figure 1. Block scheme for watermarking techniques comparison.

As shown in Fig. 1, by running the developed benchmarking framework on two input algorithms we obtain their maximal perceptual qualities, M1(q) and M2(q) respectively, reachable while removing the mark. We underline that the parameters of both input algorithms (e.g., the embedding strength) are selected to guarantee the same perceptual quality q of the marked images in terms of WPSNR. According to Eq. (2), we then evaluate their robustness indexes, R1(q) and R2(q) respectively, which can be exploited to provide a reliable comparison. In particular, if R1(q) is greater than R2(q), algorithm 1 can be declared more robust than algorithm 2 with respect to the selected attacks, since a harder distortion is required to remove the inserted mark (i.e., M1(q) < M2(q)).

3. EXPERIMENTAL ANALYSIS

In this section we set up the robustness comparison of two pairs of well-known algorithms, as a proof of concept of the efficiency of the proposed approach. The algorithms to be analyzed have been chosen on the basis of their scheme characteristics, which make them representative of modern watermarking schemes. The former technique was presented by Barni et al. in [6] and the latter was proposed by Li and Cox in [7]. Both are perceptual-based watermarking algorithms, and the main difference between them lies in the watermark recovery process, allowing watermark detection in the first case and watermark decoding in the second. The comparison is made with analogous schemes characterized by the absence of a perceptual model (or at least by the presence of a more trivial one), in order to highlight the performance improvements of perceptually adaptive watermarking schemes. In the first case, [6] is compared with a related algorithm introduced in the same paper but employing a more trivial perceptual model based on variance, while for [7] a technique without any perceptual model is used for comparison. The parameters for the embedding procedure are carefully selected so that the resulting watermarked images present the same WPSNR. The images are then processed by the proposed GA-based tool, which requires the selection of attacks. In this work we consider combinations of a few (2 or 3) attacks chosen among JPEG2000 compression, JPEG compression, Additive White Gaussian Noise (AWGN), resizing and amplitude scaling (each of them tuned by a single parameter). The selection of the attacks depends on the application the investigated algorithm is intended for. We used common image processing operators whose combination represents a realistic scenario. Notice that this choice is fully arbitrary and application driven, although it affects the computational cost [5]. A sketch of the overall comparison procedure is given below.
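The comparison pipeline of Fig. 1 can be summarized in a few lines of Python; the embed and mark_present methods and the ga_maximize optimizer are hypothetical stand-ins for the actual tool, and the helpers attack_chain_quality and robustness_index are the sketches defined in Section 2.

```python
def compare_algorithms(alg1, alg2, image, mark, q, Q, attacks, ga_maximize):
    # Fig. 1: embed with both schemes at the same marked-image quality q,
    # let the GA find the best attack chain for each (maximal un-marked
    # quality M_i(q)), then compare the robustness indexes of Eq. (2).
    indexes = []
    for alg in (alg1, alg2):
        marked = alg.embed(image, mark, target_wpsnr=q)
        M_q = ga_maximize(lambda params: attack_chain_quality(
            params, marked, attacks, alg.mark_present))
        indexes.append(robustness_index(Q, M_q))
    return indexes  # the larger index identifies the more robust scheme
```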
3.1. Analysis of [6]

This algorithm adopts a perceptual mask in the embedding process in order to achieve high invisibility and robustness. It operates in the wavelet domain and employs a classic additive embedding scheme:

$$\tilde{y}_{\theta}^{0}(i,j) = y_{\theta}^{0}(i,j) + \alpha\, w(i,j)\, x(i,j) \qquad (3)$$

where yθ0 is the θ-subband of the 0-level wavelet decomposition, α is a strength value, w(i,j) is a weighting function and x(i,j) is the watermark to be inserted. The perceptual weighting in [6] is performed as

$$w(i,j) = \Theta(l,\theta)\, \Lambda(l,i,j)\, \Xi(l,i,j) \qquad (4)$$

where l is the level of the wavelet decomposition (set to 0) and (i,j) indicates the pixel position. In Eq. (4) the first term Θ(l,θ) takes into account the dependency of the sensitivity to noise on the orientation and level of the band. The second term Λ(l,i,j) takes into account the local brightness of the image, while the third term Ξ(l,i,j) relates to the texture activity in the neighborhood of a pixel. As suggested in [6], this method is compared with a similar algorithm working in the wavelet domain but employing a different masking technique. In particular, subbands are divided into non-overlapping blocks; for each block the variance is evaluated and used as a weight for the watermark embedding, as follows:

$$w(i,j) = \frac{\sigma^{2}_{B(i,j)}}{K}$$

where the pixel (i,j) belongs to the 8x8 block B(i,j), σ2B(i,j) is the variance of that block, and K is the maximum of the block variances. In this method the watermark is inserted with higher strength in blocks where high texture is present. A sketch of both embedding rules is given below.
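A minimal sketch of the two embedding rules follows, using PyWavelets for the decomposition; the wavelet family, the mark layout (one ±1 array per detail subband), and the weight interface are illustrative assumptions, not the exact configuration of [6].

```python
import numpy as np
import pywt

def embed_additive(image, marks, alpha, weight_fn, wavelet="db2"):
    # Eq. (3): add the mark to the level-0 detail subbands, scaled by the
    # strength alpha and a per-pixel perceptual weight w(i, j).
    approx, (cH, cV, cD) = pywt.dwt2(image.astype(np.float64), wavelet)
    marked = []
    for band, x in zip((cH, cV, cD), marks):   # x: +/-1 array, band-shaped
        marked.append(band + alpha * weight_fn(band) * x)
    return pywt.idwt2((approx, tuple(marked)), wavelet)

def variance_weight(band, block=8):
    # WW-VM-style mask: each 8x8 block is weighted by its variance,
    # normalized by the maximum block variance K, so that textured blocks
    # host a stronger mark.
    h = (band.shape[0] // block) * block
    w = (band.shape[1] // block) * block
    weight = np.zeros_like(band)
    for i in range(0, h, block):
        for j in range(0, w, block):
            weight[i:i+block, j:j+block] = band[i:i+block, j:j+block].var()
    return weight / max(weight.max(), 1e-12)
```

For WW-PM, weight_fn would implement the pixel-wise mask of Eq. (4); passing variance_weight reproduces the variance-based baseline.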
Figure 2. Performance plot for Q = 40 dB under the combination of JPEG2000 compression, addition of WGN, and amplitude scaling attacks.
For the sake of clarity, we denote the main algorithm in [6] as WW-PM, which stands for Wavelet-based Watermarking with Pixel-wise Masking, and the one based on variance masking as WW-VM. In order to evaluate the robustness of the methods described above and compute the values Ri(q) (i = 1, 2) defined in Eq. (2), we proceed, for both algorithms, as follows:
I. tune the embedding strength α so that q = WPSNR(Iw);
II. select the detection threshold so that the false positive probability is lower than a fixed value Pfa;
III. run the GA to determine M(q) and set R(q) as in Eq. (2).
In our simulations the detection threshold is adaptively changed depending on the strength value α, imposing a probability of false alarm Pfa ≤ 10^-6 (refer to [6]). In Fig. 2 the experimental results for the Lena and Baboon images are reported. As can be noted, WW-PM outperforms WW-VM in terms of robustness when tested under the combination of JPEG2000 compression, addition of WGN and amplitude scaling. This result is coherent with the theoretical expectation that exploiting perceptual masking makes it possible to increase the embedding strength appropriately, thus improving robustness. These results are also consistent with the robustness experiments against single attacks reported in [6].

3.2. Analysis of [7]

The algorithm presented in [7] is an important enhancement of traditional quantization index modulation (QIM) methods, overcoming the sensitivity of QIM schemes to volumetric changes by adaptively selecting the quantization step size according to a modified version of Watson's model. The perceptual model is adopted to calculate the maximal distortion allowed for each discrete cosine transform (DCT) coefficient (the so-called "slack") and to adaptively adjust the quantization step size as follows:

$$\Delta_n = G \cdot s_n, \qquad n = 1, 2, \ldots, L \qquad (5)$$

where L is the number of samples, Δn is the quantization step, G is a global constant known in the decoding phase, and sn is the slack calculated via the perceptual model. A simplified sketch of the two quantization strategies is given below.
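To make the quantization concrete, here is a simplified, dither-free QIM sketch in Python; it illustrates the adaptive step of Eq. (5) against a constant step (the plain-QIM baseline discussed next) and deliberately omits the full dither-modulation machinery of [7].

```python
import numpy as np

def qim_embed(coeffs, bits, steps):
    # Quantize each DCT coefficient onto one of two interleaved lattices:
    # multiples of its step encode bit 0, half-step-shifted multiples encode
    # bit 1. steps = G * slacks gives the adaptive rule of Eq. (5); a
    # constant step array gives the plain-QIM baseline.
    coeffs = np.asarray(coeffs, dtype=np.float64)
    steps = np.asarray(steps, dtype=np.float64)
    offset = np.where(np.asarray(bits) == 1, steps / 2.0, 0.0)
    return np.round((coeffs - offset) / steps) * steps + offset

def qim_decode(coeffs, steps):
    # Minimum-distance decoding: pick the lattice closer to each coefficient.
    coeffs = np.asarray(coeffs, dtype=np.float64)
    steps = np.asarray(steps, dtype=np.float64)
    d0 = np.abs(coeffs - np.round(coeffs / steps) * steps)
    shifted = np.round((coeffs - steps / 2.0) / steps) * steps + steps / 2.0
    return (np.abs(coeffs - shifted) < d0).astype(int)
```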
Figure 3. Performance plot for Q = 40 dB under the combination of JPEG2000 compression, addition of WGN, and amplitude scaling attacks.
Figure 4. Performance plot for Q = 40 dB under the combination of addition of WGN and amplitude scaling attacks.
Slacks obtained from Watson's model are multiplied by the global constant G in order to get the final quantization step size for each DCT coefficient. Moreover, G is tuned to empirically control the quality of the watermarked image. The proposed method is compared with the traditional QIM scheme, where no perceptual model is employed to calculate the slack. In this case we have

$$\Delta_n = G \cdot K, \qquad n = 1, 2, \ldots, L \qquad (6)$$

where K is a fixed constant. Consequently, the quantization step is fixed as well. For the sake of clarity, we denote the algorithm in [7] as P-QIM (Perceptual QIM), in contrast with the traditional QIM algorithm it is compared with. In order to compute the values Ri(q) (i = 1, 2) defined in Eq. (2), we proceed for both algorithms as follows:
I. tune the global constant G so that q = WPSNR(Iw);
II. fix a Bit Error Rate (BER) threshold discriminating between marked and un-marked images (here set to 0.2);
III. run the GA to determine M(q) and set R(q) as in Eq. (2).
Firstly, we test the algorithms under the combination of attacks suggested in the reference paper: JPEG compression, AWGN and amplitude scaling. The obtained results show a weakness with respect to JPEG compression for both tested algorithms, which turn out not to be robust in this scenario: the compression attack is able to remove the mark while introducing a minimal degradation in the resulting image. This is not surprising for QIM-based algorithms and, because of space restrictions, we do not report such results here. Nevertheless, these tests allow us to demonstrate the effectiveness of the proposed tool. A further experimental analysis is carried out in order to analyze both algorithms under the effect of JPEG2000 compression instead of classical JPEG. JPEG2000 is expected to become the new standard for image compression; it is therefore interesting to examine robustness with respect to it. In Fig. 3 the results obtained for the Baboon and Lena images are reported. It is clear that, as the quality of the marked image increases, it becomes easier and easier to remove the mark, introducing little degradation
in the image, for both algorithms. In this scenario the traditional QIM turns out to be weaker than P-QIM. This is coherent with theory, since the perceptually adaptive embedding allows the mark to be more robust. Finally, we analyze the algorithms while avoiding compression, applying the combination of AWGN and volumetric (amplitude) scaling attacks. In Fig. 4 the robustness indexes of both algorithms are plotted. It is clear that QIM turns out to be much less robust, as expected. Indeed, as stated in [7], P-QIM has been designed precisely to overcome the sensitivity of traditional QIM techniques to volumetric attacks. The "mark removed" criteria used in step II of the two analyses are sketched below.
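For completeness, the two removal criteria can be written as simple predicates; the detector response is a schematic stand-in for the actual correlation detector of [6].

```python
import numpy as np

def removed_by_detection(detector_response, threshold):
    # Detection-type schemes ([6]): the mark counts as removed when the
    # detector response falls below a threshold chosen so that the false
    # alarm probability satisfies Pfa <= 1e-6.
    return detector_response < threshold

def removed_by_decoding(decoded_bits, embedded_bits, ber_threshold=0.2):
    # Decoding-type schemes ([7]): the mark counts as removed when the bit
    # error rate of the decoded payload exceeds the chosen BER threshold.
    ber = np.mean(np.asarray(decoded_bits) != np.asarray(embedded_bits))
    return ber > ber_threshold
```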
4. CONCLUSIONS

In this paper we propose an additional application of the benchmarking tool developed in [3], namely the comparison of watermarking techniques. We carry out an extensive analysis on two pairs of well-known algorithms, thus demonstrating the effectiveness of our benchmarking tool.

5. REFERENCES

[1] B. Macq, J. Dittmann, and E. J. Delp, "Benchmarking of image watermarking algorithms for digital rights management", Proc. of the IEEE, vol. 92, no. 6, pp. 971-984, June 2004.
[2] B. Michiels and B. Macq, "Benchmarking image watermarking algorithms with OpenWatermark", Proc. of EUSIPCO, 2006.
[3] G. Boato, V. Conotter, F. G. B. De Natale, and C. Fontanari, "Watermarking robustness evaluation based on perceptual quality via genetic algorithms", IEEE Trans. on Information Forensics and Security, vol. 4, no. 2, pp. 207-216, June 2009.
[4] S. Voloshynovskiy, A. Herrigel, N. Baumgaertner, and T. Pun, "A stochastic approach to content adaptive digital image watermarking", Proc. of the International Workshop on Information Hiding, pp. 211-236, 1999.
[5] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, MA, 1989.
[6] M. Barni, F. Bartolini, and A. Piva, "Improved wavelet-based watermarking through pixel-wise masking", IEEE Trans. on Image Processing, vol. 10, pp. 783-791, May 2001.
[7] Q. Li and I. J. Cox, "Using perceptual models to improve fidelity and provide resistance to volumetric scaling for quantization index modulation watermarking", IEEE Trans. on Information Forensics and Security, vol. 2, pp. 127-139, June 2007.