GEOMETRICALLY INVARIANT IMAGE WATERMARKING VIA ROBUST PERCEPTUAL HASHES

Oztan Harmancı¹, Vishal Monga² and M. Kıvanç Mıhçak³
¹University of Rochester, NY, USA; ²Xerox Innovation Group, CA, USA; ³Microsoft Research, WA, USA

ABSTRACT
We consider the problem of blind watermark detection from an image that has undergone a geometric attack. Existing schemes for this problem rely on insertion of periodic marks or templates, on invariant properties of transforms, or on feature-point-based techniques. A major deficiency of these approaches is a security leak (owing to redundancy in the mark), poor watermark detection performance under lossy geometric attacks such as cropping, or both. In this paper, we introduce a novel watermark synchronization paradigm based on perceptual image hashes. We exploit the fact that a perceptual hash acts as a robust and randomized digest of image content, and use it to embed a synchronization mark into the image. Unlike existing synchronization schemes, our synchronization mark is secure and robust against adversarial attacks and hence cannot be easily removed. Experimental results confirm that the proposed algorithm has significantly lower watermark detection error probabilities than existing methods. (See footnote 1.)

1. INTRODUCTION
In many watermarking applications, e.g. content authentication and fingerprinting, robustness of the mark to perceptually insignificant attacks is important to the system. This means that the mark should be detected successfully as long as the watermarked image and the image received at the watermark receiver are perceptually equivalent, even if they are very different bit-wise. Watermarking research has been fairly successful in achieving robustness to common signal processing modifications such as compression, image enhancement, independent noise addition and image smoothing. However, an important subset of perceptually insignificant modifications that continues to plague existing systems is geometric attacks.
These can be further decomposed into two classes: global transformations such as scaling, rotation and translation, and local transformations such as random bending and shearing, e.g. the StirMark attack [1]. In particular, it is known that in classical watermarking schemes the performance of the watermark detector degrades dramatically even under minor geometric manipulations (global or local) of the image. For this reason, significant attention has been devoted in recent years to developing geometrically invariant watermarking schemes. These include periodic insertion of the mark [2], template insertion [3], mark embedding in geometrically invariant domains [4], and content-based watermarking schemes that extract image feature points [5].

However, these methods have shortcomings. Periodic marks leak information to an adversary, which may be used in estimation and subsequent removal of the mark. Attacks have been developed that easily remove templates [6]. Geometrically invariant domains have poor robustness to a large number of common signal processing attacks, and they introduce visually pronounced artifacts in the spatial domain even for a small watermark strength. Although feature-point-based methods provide good robustness to both global and local distortions, they implicitly make very strong assumptions about the feature point detector. In particular, a large fraction of the feature points from the watermarked original image and the received image are required to match exactly (under a model of the geometric distortion) for the mark to be successfully detected, which in practice often proves too optimistic.

Based on the above discussion, we propose using perceptual image hashes to convey information about the geometric attack to the watermark receiver. Recently, perceptual image hashing [7] has been investigated for image search and content-based secure signatures. Such a hash has favorable robustness to common signal processing modifications and local geometric attacks (e.g. StirMark random bending) while being relatively sensitive to global geometric distortions such as rotation, scaling, and large pixel translation. Our watermark synchronization framework modifies the image so that the hash values of pseudo-random regions of the image match independently generated pseudo-random numbers. Watermarking is performed on the modified image. The receiver uses its knowledge of the hash function and of the secret key used to seed the pseudo-random number generator (PRNG) to perform an efficient search over the parameters of a geometric attack model. The optimal parameters so determined are used for attack undoing, i.e. synchronizing the image. Watermark detection is subsequently performed on the synchronized image.

Footnote 1: Work was carried out when O. Harmancı and V. Monga were with Microsoft Research.

2. OVERVIEW OF THE PROPOSED SCHEME
The step-by-step embedder functions are summarized as follows:
1. Divide the image $I$ into $N$ pseudo-random (PR) regions and, for each region, generate a PR number using a secret key $K_H$. Denote the $i$th region as $U_i$ and the corresponding PRNG output as $b_i$, $i = 1, \ldots, N$.
2. From each region $U_i$, compute a hash value $h_i$ using a perceptual hash function $h_{K_H}(\cdot, \cdot)$, i.e. $h_i = h_{K_H}(I, U_i)$.
3. Sort the regions in increasing order of the deviation of their hash values ($h_i$'s) from the respective $b_i$'s. Label the top $M$ regions as $\{U'_i\}_{i=1}^{M}$.
4. Modify the image coefficients in regions $\{U'_i\}_{i=1}^{M}$ to get $\{\tilde{U}_i\}_{i=1}^{M}$, so that $\tilde{h}_i = h_{K_H}(\tilde{I}, \tilde{U}_i) = b_i$, where $\tilde{I}$ is the modified image.
5. Use a second secret key $K_W$ to embed the watermark in the modified image $\tilde{I}$.
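
As an illustration of embedder steps 1 to 3, the following Python sketch draws keyed pseudo-random rectangles, computes a linear hash for each, and keeps the M regions whose hashes are closest to their PRNG targets. It is only a sketch under stated assumptions: Python's `random.Random` stands in for a secure keyed PRNG, the weights are i.i.d. Gaussian rather than a smooth random field, the input is any 2-D list of coefficients, and the names `select_regions`, `min_side`, `max_side` and `sigma` are hypothetical.

```python
import random


def select_regions(coeffs, n_regions, m_top, key, min_side=2, max_side=4, sigma=1.0):
    """Return the m_top candidate regions whose hash is closest to its PRNG target.

    Assumptions (not from the paper): random.Random replaces a secure keyed PRNG,
    and the region weights are i.i.d. Gaussian instead of a smoothed random field.
    """
    rng = random.Random(key)  # the same key reproduces regions, weights and targets
    rows, cols = len(coeffs), len(coeffs[0])
    candidates = []
    for _ in range(n_regions):
        # Step 1: a pseudo-random rectangle U_i and a pseudo-random target b_i.
        height = rng.randint(min_side, max_side)
        width = rng.randint(min_side, max_side)
        top = rng.randint(0, rows - height)
        left = rng.randint(0, cols - width)
        target = rng.gauss(0.0, sigma)
        # Step 2: the hash h_i is a weighted linear combination of the coefficients.
        hash_value = 0.0
        for r in range(top, top + height):
            for c in range(left, left + width):
                hash_value += rng.gauss(0.0, 1.0) * coeffs[r][c]
        candidates.append({
            "rect": (top, left, height, width),
            "hash": hash_value,
            "target": target,
            "deviation": abs(hash_value - target),
        })
    # Step 3: sort by |h_i - b_i| and keep the M best-matching regions.
    candidates.sort(key=lambda region: region["deviation"])
    return candidates[:m_top]


# Toy 8x8 "subband" with a smooth ramp of coefficients.
toy = [[float(r + c) for c in range(8)] for r in range(8)]
chosen = select_regions(toy, n_regions=200, m_top=5, key=1234)
```

Because everything is derived from the key, a receiver holding the same key can regenerate the same candidate rectangles and targets, which is the property the receiver steps rely on.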
The step-by-step receiver functions are as follows:
1. Generate the same $N$ PR regions $\{R_i\}_{i=1}^{N}$ and $N$ PR numbers $\{b_i\}_{i=1}^{N}$ as at the embedder, using the same secret key $K_H$ to seed the PRNG.
2. Search for the optimum geometric synchronization model $G^*$ such that, for the synchronized image, the hash values of the top $M$ regions approximately match the corresponding PRNG outputs $b_i$, $i = 1, \ldots, M$.
3. Apply $(G^*)^{-1}$ to the received image $I_r$ to obtain the "synchronized image" $I_r^*$.
4. Apply watermark decoding to $I_r^*$ using the secret key $K_W$.

Our scheme first geometrically synchronizes the received image and then applies watermark detection. For brevity, we do not discuss the basic watermark embedding and detection here; the reader is referred to [8] for further details. Our choice of this watermarking method is motivated by its known robustness [8] to local geometric distortions. We embed a synchronization mark called the hash distortion compensation (HDC) (step 4 of the embedder). Note from step 2 of the receiver that $G^*$ will correctly model the geometric attack when (most of) the $M$ regions at the receiver are in correspondence with those at the embedder. In particular, the hash values extracted from the regions at the receiver allow synchronization with the corresponding regions at the embedder.

Evidently, the design of the hash function plays an important role in the ability of the receiver to synchronize. First, the entropy of the hash function in the probability space induced by the key should be sufficiently high; this condition ensures that if we try sufficiently many keys, we eventually come across hash values that are sufficiently close to the PRNG outputs. Second, the hash must be robust to common signal processing (or image intensity) distortions while being sensitive to geometric attacks. For example, let $R_\theta \circ I$ represent $I$ rotated by the angle $\theta$. Then it is desired that as $\theta$ is increased, $\| h_{K_H}(I) - h_{K_H}(R_\theta \circ I) \|$ increases as well.
If this increase is perfectly monotonic, the receiver can perform an efficient search over the attack parameter space. A hash function that possesses the desired properties is discussed in the next section.

3. HASHING VIA PSEUDO-RANDOM (PR) IMAGE STATISTICS
We employ the perceptual hash proposed by Venkatesan et al. [7]. The hash values are obtained as PR linear statistics of PR semi-global regions in the DC subband of the wavelet decomposition of the image. First, an $L$-level wavelet decomposition of the input image $I$ is generated. Then, the DC subband is divided into PR, possibly overlapping, rectangular regions $\{R_i\}_{i=1}^{P}$. The hash value of each region is computed as a weighted linear combination of the coefficients in that region, where the weights are chosen from a smoothly varying Gaussian random field.

Let $d(\theta) = \sum_i \| h_{K_H}(I, U_i) - h_{K_H}(I, R_\theta \circ U_i) \|^2$, where the $U_i$'s denote image regions as before and $R_\theta$ is a rotation attack by angle $\theta$. Fig. 1 shows $d(\theta)$ versus $\theta$, where $\theta$ is varied from −10 to 10 degrees. Although the increase is not completely monotonic, the distance between the hash values of the original and attacked images grows with the magnitude of the rotation. We therefore observe that this hash function possesses both sensitivity and graceful degradation with respect to the rotation attack, which is a typical example of a geometric attack.
[Figure 1: plot of d(θ) (vertical axis) versus θ in degrees (horizontal axis, from −10 to 10).]
Fig. 1. Distance between the hash value of the original image and its rotated version as a function of the rotation angle.
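
To make the hash construction of this section concrete, the sketch below builds a smoothed Gaussian weight field, computes region hashes as weighted sums, and evaluates a distance analogous to d(θ). Several things here are assumptions rather than the paper's setup: a repeated 3×3 box average stands in for the smooth random field, a cyclic column shift by k pixels stands in for rotation (to avoid interpolation), the input is a random toy array rather than a wavelet DC subband, and all function names are hypothetical. No claim is made that the resulting curve reproduces Fig. 1.

```python
import random


def smooth_weights(height, width, rng, passes=2):
    """i.i.d. Gaussian weights smoothed by repeated 3x3 box averaging (assumed smoother)."""
    field = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]
    for _ in range(passes):
        smoothed = []
        for r in range(height):
            row = []
            for c in range(width):
                neighbours = [field[rr][cc]
                              for rr in range(max(0, r - 1), min(height, r + 2))
                              for cc in range(max(0, c - 1), min(width, c + 2))]
                row.append(sum(neighbours) / len(neighbours))
            smoothed.append(row)
        field = smoothed
    return field


def region_hash(coeffs, rect, weights):
    """Weighted linear combination of the coefficients inside rect."""
    top, left, height, width = rect
    return sum(weights[r][c] * coeffs[top + r][left + c]
               for r in range(height) for c in range(width))


def shift_columns(coeffs, k):
    """Cyclic horizontal shift by k columns (a simple global geometric distortion)."""
    k %= len(coeffs[0])
    return [row[-k:] + row[:-k] if k else row[:] for row in coeffs]


def hash_distance(coeffs, rects, weight_fields, k):
    """Sum over regions of the squared hash difference between original and shifted data."""
    shifted = shift_columns(coeffs, k)
    return sum((region_hash(coeffs, rect, w) - region_hash(shifted, rect, w)) ** 2
               for rect, w in zip(rects, weight_fields))


rng = random.Random(7)  # stand-in for a keyed secure PRNG
image = [[rng.uniform(0.0, 255.0) for _ in range(16)] for _ in range(16)]
rects = [(0, 0, 8, 8), (4, 4, 8, 8), (8, 2, 6, 10)]  # (top, left, height, width)
fields = [smooth_weights(h, w, rng) for (_, _, h, w) in rects]
distances = [hash_distance(image, rects, fields, k) for k in range(4)]
```

By construction the distance is zero for k = 0 and non-negative otherwise; how quickly it grows with the distortion depends on the image content and on how smooth the weights are.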
4. COMPUTATION OF HASH DISTORTION COMPENSATION
Let $s$ denote the vector representation of the DC subband in the DWT domain of the host image, and let $n$ be the length of $s$. Use the secret key $K_H$ to randomly tile $s$ into $N$ possibly overlapping rectangles, where $N < n$. Denote the index set of the $i$th chosen rectangle by $U_i$. Given each $U_i$, use the secret key $K_H$ to generate PR weights $a_i = \{a_{ij}\}$ for all $j \in U_i$, and let $a_{ij} = 0$ for $j \notin U_i$. For each $U_i$, compute $h_i = \langle a_i, s \rangle$. Generate $\{b_i\}_{i=1}^{N}$ using a secure PRNG, sampled i.i.d. from a zero-mean Gaussian distribution with a suitably chosen variance $\sigma^2$ (see footnote 2). Then sort the $N$ regions in increasing order of $\| h_i - b_i \|$ and pick the top $M$ regions (where $M \ll N$). Denote these regions by $\{U'_i\}_{i=1}^{M}$. These are the regions whose hash values best match the corresponding $b_i$'s generated by the PRNG. In matrix notation, for the selected $M$ regions we may write

$$h' = A s \tag{1}$$

where $A$ is $M \times n$ and contains the PR weights corresponding to the regions $\{U'_i\}_{i=1}^{M}$, $h'$ is the $M \times 1$ vector of hash values corresponding to the same regions, and $s$ is the $n \times 1$ host signal vector. Let $b'_i$ denote the PRNG outputs corresponding to $\{U'_i\}_{i=1}^{M}$. Our goal is to modify the signal $s$ by an additive perturbation $d$ so that

$$b' = A s' \tag{2}$$

where $s' = s + d$ and $b'$ contains the $b'_i$'s corresponding to the $M$ regions. Since $M < n$, there are generally infinitely many solutions to the above system of equations. We are interested in incurring the minimum distortion on $s$. Hence, we solve

$$\min_{d} \| d \| \quad \text{subject to} \quad A s' = b'. \tag{3}$$

The solution to (3) is the well-known minimum norm solution

$$d_{\min} = A^{T} (A A^{T})^{-1} (b' - h') \tag{4}$$

where $A^{T}$ denotes the transpose of $A$.

Note that $d$ is added so that the hash value $h'_i$ from the $i$th region $U'_i$ exactly matches the corresponding PR number $b'_i$, i.e. it compensates for the distortion $\| h'_i - b'_i \|$. Hence, we call $d$ the hash distortion compensation (HDC). Embedding the HDC in this manner ensures that it has robustness properties similar to those of the watermark proposed in [8], and hence that it cannot be removed as easily as simple synchronization streams such as templates.

Footnote 2: Choosing the variance of the $b_i$'s is a delicate issue. The larger the variance, the greater the security, but a larger variance also increases the distortion introduced by HDC embedding. In practice, we chose the variance of the $b_i$'s to be approximately the same as the variance of the $h_i$'s.
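
The minimum norm solution in (4) can be checked numerically on a tiny example. The sketch below is a pure-Python illustration, not the authors' implementation: the helper names `dot`, `solve_linear` and `hash_distortion_compensation` are hypothetical, the 2×4 matrix `A` is a made-up toy, and a plain Gaussian elimination is used where a real system would call a numerical linear algebra library.

```python
def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    size = len(matrix)
    aug = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("A A^T is singular: the selected rows of A are dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - tail) / aug[r][r]
    return x


def hash_distortion_compensation(A, s, b):
    """Minimum-norm d with A (s + d) = b, i.e. d = A^T (A A^T)^{-1} (b - A s)."""
    h = [dot(row, s) for row in A]                                # h' = A s
    gram = [[dot(row_i, row_j) for row_j in A] for row_i in A]    # A A^T
    lam = solve_linear(gram, [bi - hi for bi, hi in zip(b, h)])   # (A A^T)^{-1} (b' - h')
    return [sum(A[i][j] * lam[i] for i in range(len(A))) for j in range(len(s))]


# Toy example: M = 2 regions over a host vector of length n = 4.
A = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]
s = [1.0, 2.0, 3.0, 4.0]
b = [0.0, 0.0]
d = hash_distortion_compensation(A, s, b)
s_marked = [si + di for si, di in zip(s, d)]
```

For this toy input, h' = (4, 6), so d = (−2, −3, −2, −3) and A(s + d) equals b exactly. The perturbation is spread evenly over the coefficients of each region, which is what minimizing the norm of d implies when the weights within a region are equal.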
5. GEOMETRIC ATTACK UNDOING FOR WATERMARK DETECTION
5.1. Geometric Attack Model
We model the geometric attack on the image via an affine transformation $G$. Although an affine transformation has 6 parameters (scaling, rotation and translation along each axis), in this paper, without loss of generality, we assume that rotation and scaling are equal for both axes, and we use 4 variables: rotation ($\theta$), scaling ($r$), and shifts in the x and y directions ($m_1$ and $m_2$). Therefore,

$$G(u) = \begin{bmatrix} r\cos\theta & r\sin\theta \\ -r\sin\theta & r\cos\theta \end{bmatrix} u + \begin{bmatrix} m_1 \\ m_2 \end{bmatrix} \tag{5}$$

Hence, the geometric attack on the image is modeled by a $4 \times 1$ parameter vector $p = \{r, \theta, m_1, m_2\}$. Our search algorithm, described in the next section, can easily be extended to the whole class of global affine transformations specified by six parameters, or even to a general parametrization of locally affine transformations.
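
A minimal sketch of the attack model in (5) and of its inverse follows. It assumes coordinates are given as (x, y) pairs and that θ is in radians; the function names `attack` and `undo_attack` and the shift values in the example are hypothetical, while r = 0.9 and θ = 7 degrees mirror the attack used in the experiments.

```python
import math


def attack(p, u):
    """Apply G(u) from Eq. (5); p = (r, theta, m1, m2) with theta in radians."""
    r, theta, m1, m2 = p
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    x, y = u
    return (r * (cos_t * x + sin_t * y) + m1,
            r * (-sin_t * x + cos_t * y) + m2)


def undo_attack(p, v):
    """Apply the inverse of G: remove the shift, then invert the scaled rotation."""
    r, theta, m1, m2 = p
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    x, y = v[0] - m1, v[1] - m2
    # The inverse of r * [[c, s], [-s, c]] is (1 / r) * [[c, -s], [s, c]].
    return ((cos_t * x - sin_t * y) / r,
            (sin_t * x + cos_t * y) / r)


# Round trip on the corners of a 512 x 512 image (shifts are made-up values).
p = (0.9, math.radians(7.0), 3.0, -2.0)
corners = [(0.0, 0.0), (511.0, 0.0), (0.0, 511.0), (511.0, 511.0)]
recovered = [undo_attack(p, attack(p, u)) for u in corners]
```

Applying `undo_attack` with the true parameters returns every coordinate to its original position up to floating-point error; with wrong parameters the regions at the receiver would no longer line up with those at the embedder.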
[Figure 2: surface plot of the synchronization cost D(p) over the rotation angle θ and the scaling parameter r.]
Fig. 2. Synchronization cost function D(p) as a function of rotation angle θ and scaling parameter r. The actual synchronization is θ = 1 degree, r = 0.909.

5.2. Algorithm for the Attack Model Parameter Search
Synchronization of the received image $I_r$ with the marked image at the embedder then requires a search over the space of the parameter vector $p$ for

$$p^* = \arg\min_{p} D(p) \tag{6}$$

$$D(p) = \sum_{i=1}^{M} \| \hat{h}_i - b_i \| \tag{7}$$

where $\hat{h}_i = h_{K_H}(I_r, G^{-1} \circ \hat{U}_i)$, $G$ denotes the geometric transformation specified by $p$, and $b_i$ denotes the PRNG output corresponding to region $\hat{U}_i$, generated with the same key as at the embedder. The set $\{\hat{U}_i\}_{i=1}^{M}$ is defined as at the embedder: sort the $N$ regions in increasing order of $\| \hat{h}_i - b_i \|$ and pick the top $M$ regions.

The construction of an algorithm that finds $p^*$ is dictated by the behavior of $D(p)$ with respect to $p$. Fig. 2 illustrates this behavior for rotation and scaling when the correct synchronization parameters are $r^* = 0.909$ and $\theta^* = 1$ degree. (Only two dimensions are shown for visualization.) Note that the global minimum is indeed achieved at the true attack parameters. Also note that $D(p)$ has many local minima; hence a simple gradient search is not sufficient. However, it can be seen in Fig. 2 that $D(p)$ is approximately locally convex around the global minimum. Based on this observation, we propose a divide-and-conquer type of gradient search algorithm. In particular, we divide the total attack parameter space into a finite number of regions and perform a gradient search in each region. Let $[p_j^{\min}, p_j^{\max}]$ denote the range of interest for the $j$th affine transform parameter. A step-by-step description of our search algorithm follows:
1. Uniformly partition $[p_j^{\min}, p_j^{\max}]$ into $N_j$ intervals. Hence, the entire parameter space is partitioned into $N_{tot} = \prod_{j=1}^{4} N_j$ regions. Label the regions as $\{V_i\}_{i=1}^{N_{tot}}$.
2. In the $i$th region, employ a steepest descent gradient search based on a random initialization $p_i^0 \in V_i$:

$$p_i^{k+1} = p_i^k - \alpha \nabla_{p_i^k} D(p) \tag{8}$$

where $\nabla_{p_i^k} D(p)$ represents the gradient (see footnote 3) of $D(p)$ evaluated at $p_i^k$, and $\alpha$ is a scalar that adjusts the size of the update taken in the direction opposite to the gradient.
3. Repeat Step 2 for all $V_i$ and collect the resulting local minimizers $p_i^*$.
4. $p^* = \arg\min_{p_i^*} D(p_i^*)$ gives the minimizer.

Footnote 3: In practice, it is important to "condition" the cost function $D(p)$ prior to applying any kind of gradient search. This is typically done by defining a new function $E(p) = D(Wp)$, where $W$ is a diagonal matrix. Note that the minimizer of $E(p)$ corresponds to that of $D(p)$ through the scaling $W$: if $\hat{p}$ minimizes $E$, then $W\hat{p}$ minimizes $D$. The entries of the diagonal matrix $W$ are experimentally optimized to enhance convergence of gradient descent algorithms.

6. EXPERIMENTAL RESULTS
We used grayscale test images of size 512 × 512 from a large natural image database. We apply our additive HDC as well as the watermark embedding algorithm to the DC subband of a 3-level DWT with Daubechies-10 wavelets. For HDC embedding, $A$ is chosen such that the image statistics represent weighted averages of PR rectangles (sub-images) of size randomly varying from 32 to 64 in the DC subband of the 3-level DWT. The weights $a_{ij}$ (i.e. the PR weights for the hashes) are generated as i.i.d. realizations of a zero-mean Gaussian distribution with unit variance; they are then passed through an ideal low-pass filter with cutoff frequency 0.8π. The parameters $M$ and $N$ that were defined in Section 2 are chosen as 100 and 10000, respectively.

Figs. 3(a)-(c) show an original image, its geometrically attacked version, and the synchronized image prior to watermark detection, obtained by applying an "inverse attack" to the image in Fig. 3(b) using the $G^*$ determined by our geometric transformation search algorithm. The attack in Fig. 3(b) is rotation by θ = 7°, scaling by r = 0.9, crop and resize, and JPEG compression with QF = 10. Fig. 3(c) clearly reveals that the geometric attack is accompanied by a loss of data due to cropping. We rely on the natural robustness of the watermarking algorithm in [8] to small cropping attacks.

[Figure 3: three image panels, (a), (b) and (c).]
Fig. 3. (a) Original image, (b) attacked version of (a), (c) synchronized image after attack undoing at the receiver. The attack is rotation by 7°, scaling by 0.9, and JPEG with QF = 10.

We applied this attack to an image database of 100 images (both watermarked and unmarked). Further, 10 different secret keys were used with each image for mark embedding and detection, resulting in 1000 samples. After attack undoing on each received image, we apply blind watermark detection. Based on the detector output, we experimentally quantified the probabilities of "false alarm" ($P_{FA}$) and "miss" ($P_M$). Recall that a false alarm occurs if the detector declares the presence of the watermark although it was not embedded; conversely, a miss occurs if the detector declares that the watermark is not present even though it was embedded. Fig. 4 shows the receiver operating characteristic (ROC), i.e. $P_M$ versus $P_{FA}$ as the detection threshold is varied, for the aforementioned attack. We generate ROC curves for three different algorithms:
1. Algorithm 1: the watermark embedder and detector as in [8], without any attack undoing.
2. Algorithm 2: synchronization based on the output of the watermark detector, i.e. we simply search for the $G^*$ such that, when $(G^*)^{-1}$ is applied to the image, the watermark detector identifies the presence of the mark.
3. Algorithm 3: the proposed watermarking framework with hash-based synchronization.

For a fair comparison, all three algorithms embed the same watermark power. In particular, for Algorithm 3, we used 1/3 of the embedded power for the HDC (the synchronization mark) and 2/3 for the authentication watermark. It is evident from Fig. 4 that both $P_M$ and $P_{FA}$ are orders of magnitude lower for the proposed algorithm. The improvement over Algorithm 1 is expected, since no attack undoing is applied in Algorithm 1 and hence synchronization is far off. A comparison of the ROC curves for Algorithms 2 and 3 reveals the virtues of HDC embedding for synchronization over a naive search over the attack parameters based on the watermark detector output. In particular, for the same miss probability, Algorithm 2 results in a much higher false alarm probability than Algorithm 3. Intuitively, this is because in Algorithm 2 spurious regions may get selected in watermark detection, whereas the hash-based synchronization ensures that, with high probability, a large fraction of the mark detection regions are in correspondence.

[Figure 4: log-log ROC plot titled "0.9 scaling, 7° rotation, JPEG QF = 10", with P_Miss on the vertical axis and P_False Alarm on the horizontal axis, both ranging from 10^-3 to 10^0; curves for Algorithm 1, Algorithm 2 and Algorithm 3.]
Fig. 4. ROC ($P_{FA}$ vs. $P_M$) plots for the attack in Fig. 3(b).

7. CONCLUSION
In the proposed work, we present a watermark synchronization technique under geometric attacks. Our method is significantly different from the existing literature in two respects: 1) we embed geometric alignment information in the form of another mark that has properties similar to the authentication watermark, and 2) we use a randomized perceptual hash function to generate information about the geometric attack at the receiver. We show that geometric synchronization via hashes significantly improves watermark detection error probabilities. Further, because hash values are matched to PRNG outputs, our scheme does not compromise security at the cost of geometric synchronization. This is in contrast to most existing methods, where information leading to geometric synchronization incurs a security leak.

8. REFERENCES
[1] F. A. P. Petitcolas and R. J. Anderson, "Evaluation of copyright marking systems," Proc. IEEE Int. Conf. on Multimedia Systems, pp. 574–579, 1999.
[2] M. Kutter, "Watermarking resistant to translation, rotation and scaling," Proc. SPIE Multimedia Systems and Applications, vol. 3528, pp. 423–431, Nov. 1998.
[3] S. Pereira and T. Pun, "Fast robust template matching for affine resistant watermarking," Int. Workshop on Information Hiding, Lecture Notes in Computer Science, vol. 1768, pp. 200–210, 1999.
[4] J. K. O Ruanaidh and T. Pun, "Rotation, scale and translation invariant spread spectrum image watermarking," Signal Processing: Imag. Comm., vol. 66, no. 3, pp. 303–317, May 1998.
[5] M. Alghoniemy and A. H. Tewfik, "Geometric distortion correction in image watermarking," Proc. SPIE Symp. on Electronic Imaging, pp. 82–89, Jan. 2000.
[6] S. Voloshynovskiy, A. Herrigel, and Y. B. Rystar, "Watermark template attack," Proc. SPIE Annu. Symposium on Electronic Imaging, Jan. 2001.
[7] R. Venkatesan, S. M. Koon, M. H. Jakubowski, and P. Moulin, "Robust image hashing," Proc. IEEE Conf. on Image Processing, pp. 664–666, Sept. 2000.
[8] K. Mihcak, R. Venkatesan, and T. Liu, "Watermarking via optimization algorithms for quantizing randomized semiglobal image statistics," ACM Multimedia Systems Journal, Apr. 2005.