Robust Image Registration and Tampering Localization Exploiting Bag of Features Based Forensic Signature ∗ Sebastiano Battiato, Giovanni Maria Farinella, Enrico Messina, Giovanni Puglisi Dipartimento di Matematica e Informatica, Università degli Studi di Catania, Italy

{battiato, gfarinella, emessina, puglisi}@dmi.unict.it

ABSTRACT

The distribution of digital images with the classic and newest technologies available on the Internet (e.g., emails, social networks, digital repositories) has induced a growing interest in systems able to protect the visual content against malicious manipulations that could be performed during transmission. One of the main problems addressed in this context is the authentication of the image received in a communication. This task is usually performed by localizing the regions of the image which have been tampered with. To this aim the received image should first be registered with the one at the sender by exploiting the information provided by a specific component of the forensic hash associated with the image. In this paper we propose a robust alignment method which makes use of an image signature based on the Bag of Features paradigm. The alignment is based on a voting procedure in the parameter space of the model used to recover the geometric transformations applied to the manipulated image. Experiments show that the proposed approach outperforms state-of-the-art methods by a good margin.

1. INTRODUCTION AND MOTIVATIONS

The importance of digital visual data is reflected by its increasing and systematic use in the communication platforms the Internet offers to today's society. The growing demand for techniques able to protect digital visual data against malicious manipulations is induced by different episodes that call into question the use of visual content as evidence material [3]. Specifically, methods able to establish the validity and authenticity of a received image are needed in the context of Internet communications. To this aim different solutions have recently been proposed in the literature [1, 2, 5, 6, 7]. Most of them share the same basic scheme: i) a hash code based on the visual content is attached to the image to be sent; ii) the hash is analyzed at destination to verify the reliability of the received image. In order to perform tampering localization, the receiver should be able to filter out all the geometric transformations (e.g., rotation, scaling, translation) applied to the tampered image by aligning the received image with the one at the sender [2, 6]. The alignment should be done in a semi-blind way: at destination one can use only the received image and the image hash to deal with the alignment problem, since the reference image is not available. The challenging task of recovering the geometric transformations applied to a received image from its signature motivates this paper. Building on the techniques described in [2] and [6], we propose a new method to detect the geometric manipulations applied to an image starting from the hash computed on the original one. Differently from [2] and [6], we exploit replicated matchings and a voting procedure in the parameter space of the transformation model employed to establish the geometric manipulation (i.e., rotation, scale, translation). As pointed out by the experimental results, the proposed approach obtains the best results with a significant margin in terms of estimation accuracy with respect to [2] and [6].
The effectiveness of the proposed method is also reflected through tampering detection tests in which a block-wise approach based on histograms of oriented gradients has been employed [7]. The remainder of the paper is organized as follows: Section 2 presents the proposed alignment signature and the registration framework. Section 3 reports experiments and discusses the registration and tampering localization results. Finally, conclusions are given in Section 4.

Categories and Subject Descriptors I.4.10 [Image Processing and Computer Vision]: Image Representation; I.4.9 [Image Processing and Computer Vision]: Applications

General Terms Algorithms, Security, Verification

Keywords Image Forensics, Forensic Hash, Bag of Features, Image Registration, Tampering Localization, Image Authentication.

∗Area chair: Tian-Tsong Ng

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MM'11, November 28–December 1, 2011, Scottsdale, Arizona, USA. Copyright 2011 ACM 978-1-4503-0616-4/11/11 ...$10.00.

2. REGISTRATION COMPONENT

As in [2, 6], we adopt a Bag of Features based representation to reduce the dimensionality of the descriptors used as hash component for the alignment. A codebook is generated by clustering the set of SIFT descriptors extracted from training images. The pre-computed codebook is shared between sender and receiver. It should be noted that the codebook is built only once, and then used for all the communications between sender and receiver (i.e., no extra overhead for each communication). The sender extracts SIFT features and sorts them in descending order with respect to their contrast values. Afterward, the top n SIFT are selected, and each is associated with the id label of the closest prototype in the shared codebook. The final signature for the alignment component is created by considering the id label, the dominant direction θ, and the keypoint coordinates x and y of each selected SIFT. The source image and the corresponding hash component for the alignment (hs) are sent to the destination. The system assumes that the image is sent over a network consisting of possibly untrusted nodes, whereas the signature is sent upon request through a trusted authentication server which encrypts the hash in order to guarantee its integrity [5]. The image could be manipulated for malicious purposes during the untrusted communication. Once the image reaches the destination, the receiver generates the related hash signature for registration (hr) by using the same procedure employed by the sender. Then, the entries of the hashes hs and hr are matched by considering their id values. The alignment is hence performed by employing a similarity transformation of the keypoint pairs corresponding to matched hash entries:

x_r = λ x_s cos α − λ y_s sin α + T_x    (1)

y_r = λ x_s sin α + λ y_s cos α + T_y    (2)
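The signature-generation procedure described in this section (contrast-based selection of the top n SIFT, codebook quantization, and (id, θ, x, y) entries) can be sketched as follows. This is a minimal illustration: the keypoint structure, the function name, and the toy codebook are assumptions for the sketch, not the authors' implementation, which would extract real SIFT features and use a shared k-means codebook.

```python
import numpy as np

def build_alignment_hash(keypoints, codebook, n=30):
    """Build the alignment hash component from a list of keypoints.

    Each keypoint is a dict with 'contrast', 'desc' (descriptor vector),
    'theta' (dominant direction), 'x', 'y'.  This format is illustrative;
    a real system would extract SIFT and share a k-means codebook
    between sender and receiver.
    """
    # Sort by contrast (descending) and keep the top-n keypoints.
    top = sorted(keypoints, key=lambda k: k['contrast'], reverse=True)[:n]
    hash_entries = []
    for k in top:
        # Quantize the descriptor to the id of the closest codebook prototype.
        dists = np.linalg.norm(codebook - k['desc'], axis=1)
        entry_id = int(np.argmin(dists))
        # Each entry stores (id, dominant direction, keypoint coordinates).
        hash_entries.append((entry_id, k['theta'], k['x'], k['y']))
    return hash_entries
```

The receiver runs the same routine on the received image, so matching reduces to comparing the id fields of the two entry lists.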

The above transformation models the geometric manipulations applied to the source image during the untrusted communication. The model assumes that a point (x_s, y_s) in the source image I_s is transformed into a point (x_r, y_r) in the image I_r at destination by a combination of rotation (α), scaling (λ) and translation (T_x, T_y). The aim of the alignment phase is the estimation of the quadruple (λ̂, α̂, T̂_x, T̂_y) by exploiting the correspondences ((x_s, y_s), (x_r, y_r)) related to matchings between hs and hr. We propose a cascade approach: an initial estimation of the parameters (α̃, T̃_x, T̃_y) is first accomplished through a voting procedure in the quantized parameter space α × T_x × T_y. Such procedure is performed after filtering outlier matchings by taking into account the differences between dominant orientations of matched entries. The initial estimation is then refined considering only reliable matchings in order to obtain the final parameters (α̂, T̂_x, T̂_y). Afterward, the scaling parameter λ̂ is estimated by means of the parameters (α̂, T̂_x, T̂_y), which have been previously estimated on the reliable information obtained through the filtering described above. The overall estimation procedure is detailed in the following. Moving T_x and T_y to the left side and taking the ratio of (1) and (2), the following equation is obtained:

(x_r − T_x) / (y_r − T_y) = (x_s cos α − y_s sin α) / (x_s sin α + y_s cos α)    (3)

Solving (3) with respect to T_x and T_y we get the formulas to be used in the voting procedure:

T_x = [(x_s cos α − y_s sin α) / (x_s sin α + y_s cos α)] (T_y − y_r) + x_r    (4)

T_y = [(x_s sin α + y_s cos α) / (x_s cos α − y_s sin α)] (T_x − x_r) + y_r    (5)
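As an illustration of the voting step, the sketch below quantizes the α × T_x × T_y space and lets each match vote along the line given by Eq. (4); for brevity the symmetric votes from Eq. (5) are omitted. The grid ranges, bin widths, and function name are illustrative choices for the sketch, not values from the paper (only the ±3.5 degree filter and the 2.5-pixel T step are taken from the text).

```python
import numpy as np

def vote_alignment(matches, t_alpha=3.5,
                   alphas=np.arange(-45.0, 46.0, 1.0),   # quantized rotation (deg)
                   tys=np.arange(-30.0, 31.0, 2.5),      # quantized Ty (pixels)
                   t_step=2.5):                          # Tx bin width (pixels)
    """Rough (alpha, Tx, Ty) estimate by voting with Eq. (4).

    `matches` is a list of ((xs, ys, theta_s), (xr, yr, theta_r)) pairs.
    Only quantized alphas that agree with the per-match rough estimate
    delta_theta = theta_r - theta_s within t_alpha may receive votes;
    each surviving (alpha, Ty) then yields a Tx via Eq. (4), and the
    densest 3D bin wins.
    """
    hist = {}
    for (xs, ys, ths), (xr, yr, thr) in matches:
        dtheta = thr - ths
        for a in alphas:
            if abs(dtheta - a) >= t_alpha:    # dominant-direction outlier filter
                continue
            ar = np.deg2rad(a)
            denom = xs * np.sin(ar) + ys * np.cos(ar)
            if abs(denom) < 1e-9:             # skip degenerate geometry
                continue
            ratio = (xs * np.cos(ar) - ys * np.sin(ar)) / denom
            for ty in tys:
                tx = ratio * (ty - yr) + xr   # Eq. (4)
                key = (a, round(tx / t_step) * t_step, ty)
                hist[key] = hist.get(key, 0) + 1
    return max(hist, key=hist.get)            # densest bin: (alpha~, Tx~, Ty~)
```

Note that λ cancels when forming Eq. (3), so this voting stage is scale-invariant, which is why the scale factor can be left to the final stage of the cascade.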

Each pair of coordinates (x_s, y_s) and (x_r, y_r) can be used together with (4) and (5) to represent two lines in the parameter space α × T_x × T_y. The initial estimation of the parameters (α̃, T̃_x, T̃_y) is hence obtained by considering the densest bin of a 3D histogram in the quantized parameter space α × T_x × T_y. This means that the initial estimation of (α̃, T̃_x, T̃_y) is accomplished in correspondence of the maximum number of intersections between lines generated by matched keypoints. As said before, to discard outliers (i.e., wrong matchings) the information coming from the dominant directions (θ) of the SIFT is used during the voting procedure. In particular Δθ = θ_r − θ_s is a rough estimation of the rotation angle α. Hence, for each fixed triplet (α, T_x, T_y) of the quantized parameter space, the voting procedure considers only the matchings between hs and hr such that |Δθ − α| < t_α. The threshold value t_α is chosen to consider only matchings whose rough estimation Δθ is close to the considered α (e.g., only matchings with a small initial error of ±3.5 degrees). In this way we obtain an initial estimation of the rotation angle α̃ and the translation vector (T̃_x, T̃_y) by taking into account the quantized values used to build the 3D histogram in the parameter space. To refine the initial estimation we exploit the m matchings which have generated the lines intersecting in the selected bin. Specifically, for each pair ((x_{s,i}, y_{s,i}), (x_{r,i}, y_{r,i})) corresponding to the selected bin of the 3D histogram, we consider the following translation vectors:

(T̂_{x,i}, T̃_{y,i}) = ( [(x_{s,i} cos α̃ − y_{s,i} sin α̃) / (x_{s,i} sin α̃ + y_{s,i} cos α̃)] (T̃_y − y_{r,i}) + x_{r,i} , T̃_y )    (6)

(T̃_{x,i}, T̂_{y,i}) = ( T̃_x , [(x_{s,i} sin α̃ + y_{s,i} cos α̃) / (x_{s,i} cos α̃ − y_{s,i} sin α̃)] (T̃_x − x_{r,i}) + y_{r,i} )    (7)

together with the subsequent equations:

x_{r,i} = λ_i x_{s,i} cos α_i − λ_i y_{s,i} sin α_i + T_{x,i}    (8)

y_{r,i} = λ_i x_{s,i} sin α_i + λ_i y_{s,i} cos α_i + T_{y,i}    (9)

The parameter values (α̃, T̃_x, T̃_y) in (6) and (7) are the ones obtained through the voting procedure. Solving (8) and (9) with respect to a_i = λ_i cos α_i and b_i = λ_i sin α_i we obtain:

a_i = (y_{r,i} y_{s,i} + x_{r,i} x_{s,i} − x_{s,i} T_{x,i} − y_{s,i} T_{y,i}) / (x_{s,i}² + y_{s,i}²)    (10)

b_i = (x_{s,i} y_{r,i} − x_{r,i} y_{s,i} + y_{s,i} T_{x,i} − x_{s,i} T_{y,i}) / (x_{s,i}² + y_{s,i}²)    (11)
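Equations (10) and (11) follow from a short linear-algebra step that the text leaves implicit: with a_i = λ_i cos α_i and b_i = λ_i sin α_i, equations (8) and (9) form a 2×2 linear system in a_i and b_i, sketched here in LaTeX:

```latex
% (8)-(9) rewritten in the unknowns a_i, b_i:
x_{r,i} - T_{x,i} = a_i\, x_{s,i} - b_i\, y_{s,i}, \qquad
y_{r,i} - T_{y,i} = b_i\, x_{s,i} + a_i\, y_{s,i}
% The system matrix has determinant x_{s,i}^2 + y_{s,i}^2, so Cramer's rule
% yields (10) and (11) directly; scale and rotation then follow from
% \lambda_i = \sqrt{a_i^2 + b_i^2}, \qquad \tan\alpha_i = b_i / a_i .
```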

Since the ratio b_i / a_i is by definition equal to tan α_i, for each pair of matched keypoints we can estimate α̂_i by exploiting the following formula:

α̂_i = (1/2) arctan( (x_{s,i} y_{r,i} − x_{r,i} y_{s,i} + y_{s,i} T̂_{x,i} − x_{s,i} T̃_{y,i}) / (y_{r,i} y_{s,i} + x_{r,i} x_{s,i} − x_{s,i} T̂_{x,i} − y_{s,i} T̃_{y,i}) )
     + (1/2) arctan( (x_{s,i} y_{r,i} − x_{r,i} y_{s,i} + y_{s,i} T̃_{x,i} − x_{s,i} T̂_{y,i}) / (y_{r,i} y_{s,i} + x_{r,i} x_{s,i} − x_{s,i} T̃_{x,i} − y_{s,i} T̂_{y,i}) )    (12)

Once α̂_i is obtained, equation (13) (derived from (8) and (9) by considering (6) and (7)) is used to estimate λ̂_i:

λ̂_i = (1/4) (x_{r,i} − T̂_{x,i}) / (x_{s,i} cos α̂_i − y_{s,i} sin α̂_i) + (1/4) (y_{r,i} − T̃_{y,i}) / (x_{s,i} sin α̂_i + y_{s,i} cos α̂_i)
     + (1/4) (x_{r,i} − T̃_{x,i}) / (x_{s,i} cos α̂_i − y_{s,i} sin α̂_i) + (1/4) (y_{r,i} − T̂_{y,i}) / (x_{s,i} sin α̂_i + y_{s,i} cos α̂_i)    (13)

The above method produces m quadruples (λ̂_i, α̂_i, T̂_{x,i}, T̂_{y,i}), one for each matching pair ((x_{s,i}, y_{s,i}), (x_{r,i}, y_{r,i})) corresponding to the bin selected with the voting procedure. The final transformation parameters (λ̂, α̂, T̂_x, T̂_y) to be used for the registration are computed by averaging over all the m produced quadruples. It should be noted that some id values may appear more than once in hs and/or in hr. Even if a small number of SIFT are selected during the image hash generation process, conflicts due to replicated ids can arise. For example, it is possible that a selected SIFT has no unique dominant direction; in this case the different directions are coupled with the same descriptor, and hence will be considered more than once by the selection process, which generates many instances of the same id with different dominant directions. Differently from [6], the described approach considers all the possible matchings in order to preserve the useful information. The correct matchings are hence retained, but other wrong pairs could be generated. Since the noise introduced by considering correct and incorrect pairs can badly influence the final estimation results, the presence of possible wrong matchings should be considered during the estimation process. The approach described in this paper deals with the problem of wrong matchings by combining in cascade a filtering strategy based on the SIFT dominant direction (θ) with a robust estimator based on a voting strategy in the parameter space (see further details in [1]). In this way the spatial positions of keypoints and their dominant orientations are jointly considered, and the scale factor is estimated only at the end of the cascade on reliable information.
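The refinement stage (Eqs. (6)-(13) followed by averaging over the m quadruples) can be sketched as below. The function name and input format are assumptions for the sketch, arctan2 replaces the paper's arctan to keep the correct quadrant, and denominators are assumed to be far from zero (degenerate keypoints are filtered upstream by the voting stage).

```python
import numpy as np

def refine_parameters(matches, alpha0, tx0, ty0):
    """Refine a rough (alpha0, tx0, ty0) estimate (degrees / pixels) into the
    final (lambda, alpha, Tx, Ty) by averaging per-match quadruples.

    `matches` holds the ((xs, ys), (xr, yr)) pairs that voted for the
    selected histogram bin.
    """
    a0 = np.deg2rad(alpha0)
    c0, s0 = np.cos(a0), np.sin(a0)
    quads = []
    for (xs, ys), (xr, yr) in matches:
        r1 = (xs * c0 - ys * s0) / (xs * s0 + ys * c0)
        r2 = (xs * s0 + ys * c0) / (xs * c0 - ys * s0)
        tx_hat = r1 * (ty0 - yr) + xr            # Eq. (6)
        ty_hat = r2 * (tx0 - xr) + yr            # Eq. (7)
        # Eq. (12): average two per-match rotation estimates.
        num1 = xs * yr - xr * ys + ys * tx_hat - xs * ty0
        den1 = yr * ys + xr * xs - xs * tx_hat - ys * ty0
        num2 = xs * yr - xr * ys + ys * tx0 - xs * ty_hat
        den2 = yr * ys + xr * xs - xs * tx0 - ys * ty_hat
        alpha_i = 0.5 * (np.arctan2(num1, den1) + np.arctan2(num2, den2))
        ci, si = np.cos(alpha_i), np.sin(alpha_i)
        # Eq. (13): average four per-match scale estimates.
        lam_i = 0.25 * ((xr - tx_hat) / (xs * ci - ys * si)
                        + (yr - ty0) / (xs * si + ys * ci)
                        + (xr - tx0) / (xs * ci - ys * si)
                        + (yr - ty_hat) / (xs * si + ys * ci))
        quads.append((lam_i, np.rad2deg(alpha_i), tx_hat, ty_hat))
    # Final parameters: average over the m quadruples.
    return tuple(np.mean(quads, axis=0))
```

With a rough estimate equal to the true parameters and noise-free matches, the refinement reproduces the true quadruple exactly, which is a convenient sanity check of the formulas.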

3. EXPERIMENTAL RESULTS

This section reports a number of experiments in which the proposed approach has been tested and compared with [2] and [6]. In order to cope with scene variability, the tests have been performed considering a subset of the fifteen scene category benchmark dataset used in [4]. The training set used in the experiments is built through a random selection of 150 images from the aforementioned dataset. Specifically, ten images have been randomly sampled from each scene category. The test set consists of 15750 images generated through the application of different manipulations on the training images (Tab. 1). The registration results obtained employing the proposed approach with alignment hash components of different sizes (i.e., different numbers of SIFT) are reported in Tab. 2. To demonstrate the effectiveness of the proposed approach, and to highlight the contribution of both the replicated matchings and the cascade filtering during the estimation, we have performed comparative tests considering our method, the approach proposed in [6], and the method proposed in [2], which exploits both replicated matchings and a cascade filtering approach.

Table 1: Image manipulations.

Operation | Parameters
Rotation (α) | 3, 5, 10, 30, 45 degrees
Scaling (λ) | factor = 0.5, 0.7, 0.9, 1.2, 1.5
Horizontal Translation (Tx) | 5, 10, 20 pixels
Vertical Translation (Ty) | 5, 10, 20 pixels
Cropping | 19%, 28%, 36% of entire image
Tampering | block size 50x50
Malicious Tampering | block size 50x50
Linear Photometric Transformation (a*I + b) | a = 0.90, 0.95, 1, 1.05, 1.10; b = -10, -5, 0, 5, 10
JPEG Compression | Q = 10
Combinations | various combinations of the above operations

Table 2: Registration results of the proposed approach.

Number of SIFT | 15 | 30 | 45 | 60
Unmatched Images | 3.17% | 1.50% | 0.96% | 0.84%
Mean Error α | 1.3826 | 0.7970 | 0.5158 | 0.4255
Mean Error λ | 0.0421 | 0.0237 | 0.0161 | 0.0134
Mean Error Tx | 1.9704 | 0.9131 | 0.6012 | 0.4866
Mean Error Ty | 1.6857 | 0.8880 | 0.6271 | 0.5540

The approach proposed in [6] has been reimplemented. The RANSAC thresholds used in [6] to perform the geometric parameter estimation have been set to 3.5 degrees for the rotational model and to 0.025 for the scaling one. These threshold values have been obtained through data analysis (inlier and outlier distributions). Although Lu et al. [6] claim that further refinements are performed using the matchings that occur more than once, they do not actually provide any implementation detail. Hence, to further refine the results obtained by Lu et al. [6] without replicated matchings, we have re-considered all the previously discarded matchings belonging to [θ̂ − 3.5, θ̂ + 3.5] and [σ̂ − 0.025, σ̂ + 0.025], with θ̂ and σ̂ coming from the estimation without replicates. In order to perform a fair comparison, the threshold t_α used in our approach to filter the correspondences (see Section 2) has been set to the same value as the threshold employed by RANSAC to estimate the rotational parameter in [6]. The values T_x and T_y needed to evaluate the right side of (4) and (5) have been quantized considering a step of 2.5 pixels. Similar considerations have been applied to properly set the parameters of the approach proposed in [2] in order to guarantee a fair comparison. Specifically, the bin size used in [2] has been obtained by doubling the corresponding RANSAC thresholds: a histogram with bin size of 7 degrees ranging from −180 to 180 degrees has been used for the rotation estimation step, whereas a histogram with bin size equal to 0.05 ranging from 0 to max_σ = 10 was employed to estimate the scale. Finally, a codebook with 1000 visual words has been employed to compare the different approaches. The codebook has been learned through k-means clustering on all the SIFT descriptors extracted from the training images. First, let us examine the typical cases in which registration approaches could fail.
Two cases can be distinguished: i) no matchings are found between the hash built at the sender (hs) and the one computed by the receiver (hr); ii) all the matchings are replicated. The first problem can be mitigated by considering a higher number of features. The second one is solved by allowing replicated matchings (see Section 2). As reported in Tab. 3, by increasing the number of SIFT

points, the number of unmatched images (i.e., image pairs that the algorithm is not able to process because there are no matchings between hs and hr) decreases for all the approaches. In all cases the percentage of images on which our algorithm is able to work is higher than the one obtained by the approach proposed in [6]. Although the percentage of unmatched images obtained employing [2] is lower than the one obtained by the proposed approach, the tests reported in the following reveal that our method strongly outperforms the other two in terms of parameter estimation error and robustness with respect to the different transformations.

Table 3: Comparison with respect to unmatched images.

Number of SIFT | 15 | 30 | 45 | 60
Lu et al. [6] | 7.54% | 2.79% | 1.63% | 1.17%
Battiato et al. [2] | 1.00% | 0.57% | 0.30% | 0.10%
Proposed approach | 3.17% | 1.50% | 0.96% | 0.84%

Figure 1: Tampering detection comparison (ROC curves, True Positive Rate (sensitivity) vs. False Positive Rate (1 − specificity), for Lu et al. [6], Battiato et al. [2], and the proposed approach).

Table 4: Average rotational error.

Number of SIFT | 15 | 30 | 45 | 60
Unmatched Images | 10.81% | 4.10% | 2.17% | 1.66%
Mean Error: Lu et al. [6] | 5.9700 | 6.5225 | 6.7614 | 6.6574
Mean Error: Battiato et al. [2] | 3.0408 | 2.1747 | 2.0613 | 1.6890
Mean Error: Proposed approach | 0.9213 | 0.6525 | 0.3548 | 0.2756

Table 5: Average scaling error.

Number of SIFT | 15 | 30 | 45 | 60
Unmatched Images | 10.81% | 4.10% | 2.17% | 1.66%
Mean Error: Lu et al. [6] | 0.0543 | 0.0512 | 0.0533 | 0.0551
Mean Error: Battiato et al. [2] | 0.0241 | 0.0211 | 0.0182 | 0.0165
Mean Error: Proposed approach | 0.0295 | 0.0191 | 0.0125 | 0.0102

Tab. 4 and Tab. 5 show the results obtained in terms of rotational and scale estimation through the mean error. In order to properly compare the methods, the results have been computed taking into account only the images on which all approaches were able to work (the number of unmatched images is reported in the tables). The proposed approach outperforms [2] and [6], obtaining a considerable gain in terms of rotational accuracy. Moreover, the performance of our approach significantly improves with the increase of the number of extracted feature points (SIFT). On the contrary, the technique in [6] is not able to take advantage of the information coming from an increasing number of extracted SIFT points. The rotational estimation gain obtained by employing our approach instead of the one in [6] is about 5 degrees when exploiting the minimum number of SIFT, and reaches 6 degrees with 60 SIFT. A good gain in terms of performance is also obtained with respect to the scale factor (Tab. 5). Finally, experiments have been done to validate the tampering localization performance after image registration. To this aim, the 1500 images of the test set containing a tampering patch have been considered. As in [7], a uniform quantization of orientation histograms has been used to represent non-overlapping blocks of images. The Euclidean distance is used to discriminate between tampered and non-tampered blocks. The different registration approaches have been compared through ROC curves. The area under the curves indicates a biased estimation of the expected tampering detection accuracy of the different methods. As shown in Fig. 1, our approach strongly outperforms the other techniques.
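The block-wise tampering localization used in this test can be sketched as follows, in the spirit of [7]: each non-overlapping block is represented by a quantized gradient-orientation histogram, and blocks whose histograms differ beyond a threshold (Euclidean distance) are flagged as tampered. The histogram parameters, block size, threshold, and function names are illustrative assumptions, not the exact configuration used in the experiments, and the second image is assumed to be already registered.

```python
import numpy as np

def orientation_histogram(block, bins=8):
    """Magnitude-weighted gradient-orientation histogram of an image block."""
    gy, gx = np.gradient(block.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)                     # orientations in [-pi, pi]
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    return hist

def localize_tampering(reference, registered, block=50, threshold=1.0):
    """Flag blocks whose orientation histograms differ beyond `threshold`.

    A simplified stand-in for the block-wise scheme of [7]; `registered`
    is the received image after alignment with `reference`.
    """
    h, w = reference.shape
    tampered = []
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            hr = orientation_histogram(reference[by:by+block, bx:bx+block])
            ht = orientation_histogram(registered[by:by+block, bx:bx+block])
            if np.linalg.norm(hr - ht) > threshold:   # Euclidean distance
                tampered.append((by, bx))
    return tampered
```

Sweeping the threshold and recording true/false positive rates over a labeled test set yields the ROC curves used for the comparison in Fig. 1.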

4. CONCLUSIONS AND FUTURE WORKS

The assessment of the reliability of an image received through the Internet is an important issue in today's society. This paper addressed the image alignment and tampering detection tasks in the context of distributed forensic systems. A robust image registration component which exploits an image signature based on the Bag of Features paradigm has been introduced. Comparative tests show that the proposed approach outperforms recently proposed techniques by a significant margin in terms of performance.

5. REFERENCES

[1] S. Battiato, G. M. Farinella, E. Messina, and G. Puglisi. A robust forensic hash component for image alignment. In International Conference on Image Analysis and Processing, 2011.
[2] S. Battiato, G. M. Farinella, E. Messina, and G. Puglisi. Understanding geometric manipulations of images through BOVW-based hashing. In IEEE International Workshop on Content Protection & Forensics, 2011.
[3] H. Farid. Digital doctoring: how to tell the real from the fake. Significance, 3(4):162–166, 2006.
[4] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 2169–2178, 2006.
[5] Y.-C. Lin, D. Varodayan, and B. Girod. Image authentication based on distributed source coding. In IEEE International Conference on Image Processing, pages 3–8, 2007.
[6] W. Lu and M. Wu. Multimedia forensic hash based on visual words. In IEEE International Conference on Image Processing, pages 989–992, 2010.
[7] S. Roy and Q. Sun. Robust hash for detecting and localizing image tampering. In IEEE International Conference on Image Processing, pages 117–120, 2007.
