2010 International Conference on Pattern Recognition

A Probabilistic Measure for Signature Verification based on Bayesian Learning

Danjun Pu, Sargur N. Srihari
Center of Excellence for Document Analysis and Recognition (CEDAR)
University at Buffalo, The State University of New York, USA
{danjunpu, srihari}@cedar.buffalo.edu

Abstract

Signature verification is a common task in forensic document analysis. The goal is to decide whether or not a questioned signature belongs to a set of known signatures of an individual. In a typical forgery case a very limited number of known signatures may be available, with as few as four or five knowns [1]. Here we describe a fully Bayesian approach which overcomes the limitation of having too few genuine samples. The algorithm has three steps: Step 1, learn prior distributions of parameters from a population of known signatures; Step 2, determine the posterior distributions of parameters using the genuine samples of a particular person; Step 3, determine the probabilities of the query under both the genuine and forgery classes and the Log Likelihood Ratio (LLR) of the query. Rather than give a hard decision, this method provides the LLR as a probabilistic measure of the decision, and the performance of Bayesian learning is improved especially in the case of limited known samples.

I. Introduction

The most common task in Questioned Document (QD) analysis is that of verifying signatures. The problem concerns the authenticity of a signature (Figure 1): does this questioned signature (Q) match the known, true signatures (K) of this subject? In a typical forgery case a very limited number of known signatures may be available, with as few as four or five knowns [1]. Thus a measure of uncertainty based on the frequentist approach becomes infeasible. In previous studies, parametric approaches such as naive Bayes classifiers have been attempted [2], but they have an absolute requirement for forgery samples from the case. Recently, we described a method [3] based on a distance-based non-parametric Bayesian approach to capture variation among genuine signatures and use it to classify a questioned signature.

[Figure 1: example signature images, panels (a)-(c).]
Fig. 1. Bayesian Signature Verification: (a) small learning set of four known signatures, (b) genuine signature (Qa) with LLR = 9.19, (c) skilled forgery (Qb) with LLR = -2.47.

No attempt has been made to use the fully Bayesian approach where prior distributions of parameters are explicitly used, other than assuming a uniform prior. In the fully Bayesian approach described here, training samples are used to determine prior distributions. The method has three steps: (i) from a population of signatures, means and variances of the distances are determined for the prior distributions; (ii) posterior distributions are obtained using genuine samples of a particular person; (iii) the probabilities of the query under both the genuine and forgery classes and the log likelihood ratio (LLR) of the query are determined. Such an approach has no minimum sample size requirement for genuines/forgeries for the case at hand and also provides a useful probabilistic measure of the decision instead of a binary decision.

The rest of this paper is organized as follows:

Section II describes the extraction of signature features and, from them, the dissimilarities. Section III describes the prior distributions of parameters. The Bayesian approach is introduced in Section IV. Experimental results with several data sets are in Section V. A summary and conclusions are provided in Section VI.

II. Features and Distances

Image Preprocessing. Each signature on paper was scanned at 300 dpi gray-scale and binarized using a gray-scale histogram. Salt-and-pepper noise was removed by median filtering. Slant normalization was performed by extracting the axis of least inertia and rotating the curve until this axis coincides with the horizontal axis [4].

Features. Binary features known as gradient, structural and concavity (or GSC) features [5][6] were used. The image is divided into a 4 × 8 grid from which a 1024-bit GSC feature vector is determined.

Distance. We compute the distance between two signatures by the correlation similarity measurement method [7]. In the population of signatures used to determine priors, each person has n genuine samples and m forgery samples, so we get C(n,2) = n!/(2!(n-2)!) pairs of "genuine vs. genuine" samples and m × n pairs of "genuine vs. forgery" samples. The distance between two genuine samples is called a "genuine distance" and the distance between a genuine and a forgery sample is called a "forgery distance". This calculation maps feature space to distance space.
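The exact dissimilarity formula of [7] is not reproduced in this paper; the following sketch shows one correlation-style distance from the family of binary-vector measures studied in [7], built from the counts of agreeing and disagreeing bit positions. The 8-bit vectors stand in for the 1024-bit GSC vectors.

```python
import math

def correlation_distance(x, y):
    """Correlation-style dissimilarity between two equal-length binary
    vectors, from the counts s11, s00, s10, s01 of positions where the
    bits are (1,1), (0,0), (1,0) and (0,1) respectively. Returns 0.0 for
    identical vectors and 1.0 for complementary vectors."""
    assert len(x) == len(y)
    s11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    s00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    s10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    s01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    denom = math.sqrt((s10 + s11) * (s01 + s00) * (s11 + s01) * (s00 + s10))
    if denom == 0:          # degenerate all-ones or all-zeros vectors
        return 0.5
    return 0.5 - (s11 * s00 - s10 * s01) / (2 * denom)

v = [1, 0, 1, 1, 0, 0, 1, 0]
w = [0, 1, 0, 0, 1, 1, 0, 1]        # bitwise complement of v
print(correlation_distance(v, v))   # identical -> 0.0
print(correlation_distance(v, w))   # complementary -> 1.0
```

Any measure of this family maps a pair of 1024-bit GSC vectors to a single scalar, which is what the rest of the paper operates on.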

III. Prior Distributions

The distributions of genuine and forgery distances are assumed to be Gaussian. This is a good first assumption, although a Gamma distribution for the forgery distances and a Gaussian for the genuine distances might be a better fit [8]. For each distribution, the parameters corresponding to the mean (µ) and variance (σ²) are given Gaussian and Inverse-chi-square distributions respectively.

(i) We want to obtain the prior distributions from a non-informative prior and inference over a large data set D. Following [9], we begin by analyzing the model under the non-informative prior P(µ, σ²) ∝ (σ²)^(-1); after inference on the data D we get the joint posterior distribution P(µ, σ²|D), and marginalizing over σ² and µ respectively gives the conditional posterior P(µ|σ², D) in normal form and P(σ²|µ, D) as an Inverse-chi-square distribution:

  Inv-χ²(σ²|D) ∝ (σ²)^(-(n+1)/2) exp(-(n-1)s² / (2σ²)),

where n is the size of the inference data set D, D̄ is the mean of D, and s² = (1/(n-1)) Σ_{i=1}^{n} (D_i - D̄)².

(ii) On the other hand, when we plot the histograms of the means and variances from our signature database, they are distributed as in Figures 2(a) and 2(b) respectively, which closely resemble the normal and Inverse-chi-square distributions (Figure 2(c)). In conclusion, we make the assumptions in Table I.

[Figure 2: histogram panels (a)-(c).]
Fig. 2. Choice of parameter distributions: (a) histogram of means of genuine and forgery distances, which are Gaussian-like; (b) histogram of variances of genuine and forgery distances, which are Inverse-chi-square-like; (c) Inverse-chi-square distributions.

Tab. I. Summary of assumptions.
  Likelihood fn.:     d_g ~ N(θ_g, σ_g²);      d_f ~ N(θ_f, σ_f²)
  Prior of mean:      θ_g ~ N(µ_g0, τ_g0²);    θ_f ~ N(µ_f0, τ_f0²)
  Prior of variance:  σ_g² ~ Inv-χ²(n, σ̂_g²);  σ_f² ~ Inv-χ²(m, σ̂_f²)
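The prior hyper-parameters of Table I can be estimated from a population of per-writer distance vectors: the Gaussian prior on the mean uses the mean and variance of the per-writer mean distances, and the scale of the Inverse-chi-square prior uses the mean per-writer variance. The sketch below illustrates this procedure on a synthetic population; the population itself and its parameters (0.2, 0.03) are invented for illustration, only the procedure mirrors the paper.

```python
import random
import statistics

random.seed(0)
# Synthetic population: 50 writers, each with 6 genuine pairwise distances
population = [[random.gauss(0.2, 0.03) for _ in range(6)] for _ in range(50)]

writer_means = [statistics.mean(d) for d in population]
writer_vars = [statistics.variance(d) for d in population]

mu_g0 = statistics.mean(writer_means)          # prior mean of theta_g
tau_g0_sq = statistics.variance(writer_means)  # prior variance of theta_g
sigma_g_sq_hat = statistics.mean(writer_vars)  # scale of the Inv-chi^2 prior

print(round(mu_g0, 3), tau_g0_sq > 0, sigma_g_sq_hat > 0)
```

With real data, `population` would hold each writer's genuine (or forgery) distances from Section II, yielding the values µ_g0 = 0.193 etc. reported in Section IV.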

IV. Bayesian Learning Algorithm

The Bayesian Learning Algorithm has three steps.

Step 1: Learn the prior distributions from a population of known signatures. By mapping feature space to distance space, for each individual i we get one genuine distance vector [G_i1, G_i2, ...] and one forgery distance vector [F_i1, F_i2, ...], both assumed i.i.d. and distributed as d_g ~ N(G_ij | θ_g, σ_g²) and d_f ~ N(F_ij | θ_f, σ_f²). The parameters θ_g and θ_f are distributed as N(µ_g0, τ_g0²) and N(µ_f0, τ_f0²) respectively; the parameters σ_g² and σ_f² are distributed as Inv-χ²(n, σ̂_g²) and Inv-χ²(m, σ̂_f²) respectively. Learning from a population of known signatures, the hyper-parameters of the prior distributions are µ_g0 = 0.193, µ_f0 = 0.264, τ_g0² = 0.0014 and τ_f0² = 0.0012, where n and m are the numbers of all the genuine and forgery distances respectively, and σ̂_g² and σ̂_f² are the means of the variances of the genuine and forgery distance vectors respectively.

Step 2: Determine the posterior distributions using the genuine samples of a particular person. The hyper-parameters of the posterior distributions are updated by learning from the training set of a particular person. In this training phase, for a specific person, we enroll s genuine signatures and obtain the t = C(s,2) genuine distances X_g = (X_g1, X_g2, ..., X_gt). We also simulate the forgery distances by adding δ = µ_f0 - µ_g0 to the genuine distances (X_f = X_g + δ). From Bayes' rule, we obtain:

  N(θ_g | µ_g, τ_g²) ∝ N(X_g | θ_g, σ_g²) × N(θ_g | µ_g0, τ_g0²)   (1)
  N(θ_f | µ_f, τ_f²) ∝ N(X_f | θ_f, σ_f²) × N(θ_f | µ_f0, τ_f0²)   (2)

The updated hyper-parameters are:

  µ_g = (t τ_g0² µ̂_g + σ_g² µ_g0) / (t τ_g0² + σ_g²);   τ_g² = τ_g0² σ_g² / (t τ_g0² + σ_g²);
  µ_f = (t τ_f0² µ̂_f + σ_f² µ_f0) / (t τ_f0² + σ_f²);   τ_f² = τ_f0² σ_f² / (t τ_f0² + σ_f²),

where µ̂_g and µ̂_f are the means of X_g and X_f respectively, and t is the number of genuine distances X_g and of simulated forgery distances X_f. The genuine distances from the known set in Figure 1(a), and the forgery distances simulated by adding δ = µ_f0 - µ_g0 to them, are shown in the left part of Table II. Using these learning data, the updated hyper-parameters are µ_g = 0.222, µ_f = 0.293, τ_g² = 0.0019 and τ_f² = 0.0018.

Tab. II. Tables of distances. Left: genuine distances d(K_i, K_j) between all pairs in the learning set [Figure 1(a)] and simulated forgery distances obtained by adding δ = µ_f0 - µ_g0 to the genuine distances. Right: query distances d(K_i, Q_j) between each known signature [Figure 1(a)] and the queries Qa [Figure 1(b)] and Qb [Figure 1(c)].

  Pair     (K1,K2)  (K1,K3)  (K1,K4)  (K2,K3)  (K2,K4)  (K3,K4)
  G. Dis.  0.208    0.264    0.214    0.24     0.183    0.253
  F. Dis.  0.278    0.334    0.284    0.31     0.253    0.323

        Qa      Qb
  K1    0.236   0.225
  K2    0.226   0.266
  K3    0.284   0.295
  K4    0.261   0.257

Step 3: Determine the probabilities of the query under both the genuine and forgery classes and the Log Likelihood Ratio (LLR) of the query. For a questioned signature, the query distances are calculated by comparing it with the known samples; e.g., the query distances t_i (i = 1, 2, 3, 4) in the Qa and Qb columns of the right part of Table II are obtained by comparing the queries in Figures 1(b) and 1(c) with the known samples in Figure 1(a). Treating a query distance t_i as genuine (G) and as forgery (F) respectively, and integrating the distance likelihood function against the posterior distributions of the mean and variance parameters, we obtain p(t_i | G) and p(t_i | F) as the integrals in (3) and (4):

  P(t_i | G) = ∫∫ P(t_i | θ_g, σ_g²) × P(θ_g) × P(σ_g²) dθ_g dσ_g²   (3)
  P(t_i | F) = ∫∫ P(t_i | θ_f, σ_f²) × P(θ_f) × P(σ_f²) dθ_f dσ_f²   (4)

By Laplace approximation, the integrals can be evaluated as in (5) and (6):

  P(t_i | G) = sqrt(A_g / 2π) exp(-A_g (t_i - µ_g)² / 2)   (5)
  P(t_i | F) = sqrt(A_f / 2π) exp(-A_f (t_i - µ_f)² / 2)   (6)

where A_g = (n+5) / (n k_g σ̂_g²); A_f = (m+5) / (m k_f σ̂_f²); k_g = 1 + τ_g²/σ̂_g²; k_f = 1 + τ_f²/σ̂_f².

Based on these probabilities, we get the LLR from a product of s likelihood ratios (where s is the number of query distances, i.e., the number of known samples):

  LLR = log( Π_{i=1}^{s} P(t_i | G) / Π_{i=1}^{s} P(t_i | F) )   (7)

For the query shown in Figure 1(b), which is known to be a genuine signature, the LLR is 9.19; for the query shown in Figure 1(c), which is known to be a forgery, the LLR is -2.47. A positive LLR indicates a genuine decision and a negative LLR indicates a forgery decision. The magnitude of the LLR gives the strength of the decision. For instance, a questioned signature with an LLR of 9.19 is classified as genuine, and the query with an LLR of -2.47 is classified as a forgery.
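The three steps can be sketched end-to-end with the published hyper-parameters and the distances of Table II. The paper does not list numeric values for σ_g², σ_f², n or m, so the values `sigma_g_sq = sigma_f_sq = 0.001` and `n = m = 100` below are assumptions for illustration; the exact LLR values therefore differ from the 9.19 / -2.47 of Figure 1, but the posterior means and decision signs can be checked against the paper.

```python
import math

# Step 1: prior hyper-parameters learned from a population (Section IV)
mu_g0, mu_f0 = 0.193, 0.264
tau_g0_sq, tau_f0_sq = 0.0014, 0.0012
sigma_g_sq = sigma_f_sq = 0.001   # ASSUMED: not listed in the paper
n = m = 100                        # ASSUMED population distance counts

# Genuine distances among the 4 known signatures (Table II, left)
Xg = [0.208, 0.264, 0.214, 0.24, 0.183, 0.253]
delta = mu_f0 - mu_g0
Xf = [d + delta for d in Xg]       # simulated forgery distances
t = len(Xg)                        # t = C(4,2) = 6 pairwise distances

# Step 2: conjugate-normal posterior update of the means (Eqs. 1-2)
mean = lambda xs: sum(xs) / len(xs)
mu_g = (t * tau_g0_sq * mean(Xg) + sigma_g_sq * mu_g0) / (t * tau_g0_sq + sigma_g_sq)
tau_g_sq = tau_g0_sq * sigma_g_sq / (t * tau_g0_sq + sigma_g_sq)
mu_f = (t * tau_f0_sq * mean(Xf) + sigma_f_sq * mu_f0) / (t * tau_f0_sq + sigma_f_sq)
tau_f_sq = tau_f0_sq * sigma_f_sq / (t * tau_f0_sq + sigma_f_sq)

# Step 3: Laplace-approximated class likelihoods (Eqs. 5-6) and LLR (Eq. 7)
k_g = 1 + tau_g_sq / sigma_g_sq
k_f = 1 + tau_f_sq / sigma_f_sq
A_g = (n + 5) / (n * k_g * sigma_g_sq)
A_f = (m + 5) / (m * k_f * sigma_f_sq)

def laplace_pdf(ti, mu, A):
    return math.sqrt(A / (2 * math.pi)) * math.exp(-A * (ti - mu) ** 2 / 2)

def llr(query_distances):
    return sum(math.log(laplace_pdf(ti, mu_g, A_g)) -
               math.log(laplace_pdf(ti, mu_f, A_f))
               for ti in query_distances)

Qa = [0.236, 0.226, 0.284, 0.261]  # query distances of Qa (Table II, right)
Qb = [0.225, 0.266, 0.295, 0.257]  # query distances of Qb

print(round(mu_g, 3), round(mu_f, 3))  # close to the paper's 0.222, 0.293
print(llr(Qa) > 0, llr(Qb) < 0)        # genuine positive, forgery negative
```

Note that the posterior means reproduce the paper's values to within rounding even under the assumed variances, because the conjugate update is dominated by the observed distances when t τ_0² ≫ σ².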

V. Experimental Results

We tested this approach on two signature data sets: S1, fifty-five persons, each with 24 genuine and 24 forgery signatures; S2, two persons, one with 140 genuine, 35 disguised, 90 spurious and 449 forgery signatures, the other with 170 genuine, 21 disguised and 650 forgery signatures.


The experiments have four steps: (i) the hyper-parameters are learned from data set A; (ii) for each writer in data set B, s (s = 1, 2, ..., 8) genuine signatures are randomly chosen as known samples, and all the other genuine signatures and all the forgery signatures of this writer are classified by the Bayesian learning method; (iii) step (ii) is iterated T times for each writer; (iv) the number of correctly classified signatures is counted. We configure A, B and T for four tests as in Table III and report the results in Table IV.

Tab. III. Data sets used in the four experiments.
  Test              1     2      3     4
  Prior (A)         S1    S1+S2  S1    S1+S2
  Posterior (B)     S2    S2     S1    S1
  Iteration # (T)   200   200    40    40
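Steps (ii)-(iv) amount to a repeated random-enrollment accuracy estimate. The sketch below shows that protocol on synthetic distances; `classify` is a stand-in threshold rule, not the Bayesian LLR of Section IV, and the data, threshold 0.25 and distribution parameters are invented for illustration.

```python
import random

random.seed(1)

def classify(distance, threshold=0.25):
    # Stand-in for the LLR decision of Section IV: small distances to the
    # knowns are taken as genuine, large ones as forgery.
    return "genuine" if distance < threshold else "forgery"

def accuracy(genuine_d, forgery_d, s=4, T=40):
    """Steps (ii)-(iv): enroll s knowns at random, classify the remaining
    genuine signatures and all forgeries, repeat T times, count correct."""
    correct = total = 0
    for _ in range(T):
        held_out = random.sample(genuine_d, len(genuine_d) - s)
        for d in held_out:
            correct += classify(d) == "genuine"
        for d in forgery_d:
            correct += classify(d) == "forgery"
        total += len(held_out) + len(forgery_d)
    return correct / total

# Synthetic per-writer distances: 24 genuine and 24 forgery, as in S1
genuine_d = [random.gauss(0.20, 0.03) for _ in range(24)]
forgery_d = [random.gauss(0.29, 0.03) for _ in range(24)]
acc = accuracy(genuine_d, forgery_d)
print(round(acc, 2))
```

In the actual experiments the per-writer accuracies are further averaged over all writers of data set B.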

When we use the different data sets S1 and S1+S2 in step (i) to test data set S2, the results are very similar; the results for data set S1 with S1 and S1+S2 in step (i) are likewise very similar. So we conclude that in this method the testing data set of steps (ii) and (iii) can be used as part of the population data set in step (i). For each of the 55 writers in data set S1, we randomly enrolled K genuine signatures for training and used all the other signatures for testing over T iterations. The accuracy results are shown in the Test 3 and Test 4 columns of Table IV for enrolled number K from 1 to 8. Similarly, the accuracy results for data set S2 are shown in the Test 1 and Test 2 columns. With the same experiment design and data sets as the previous method [3], the results of the new method are better than those of the previous method (shown in columns S1 and S2 of Table IV). When the number of known samples is smaller than 5, only the new method can provide probabilities.

Tab. IV. Performance of the fully Bayesian approach (columns Test 1-4) and of the previous method [3] (columns S1 and S2), with better results for the new method.
  K.#   Test 1   Test 2   Test 3   Test 4   S1      S2
  1     78.7%    80%      77.83%   77.57%   N.A.    N.A.
  2     83.38%   83.56%   84.04%   83.84%   N.A.    N.A.
  3     84.52%   84.92%   86.44%   86.47%   N.A.    N.A.
  4     85.36%   85.88%   87.7%    87.48%   N.A.    N.A.
  5     88%      87.07%   88.24%   88.29%   85.5%   82%
  6     88.3%    87.97%   88.93%   89.13%   85.7%   82.6%
  7     88.27%   88.54%   89.36%   89.19%   85.5%   83.2%
  8     89%      88.45%   89.6%    89.63%   86%     83.7%

VI. Summary and Conclusion

A fully Bayesian approach to signature verification has been presented. The approach allows learning from a small set of genuine signatures, which is common in many forensic cases. The genuine and forgery distance distributions are assumed to be Gaussian; their parameters, the mean and variance, are assumed to have Gaussian and Inverse-chi-square distributions respectively. The hyper-parameters needed to define the prior distributions are learnt from a large training set. The smaller set of genuine signatures from the case at hand is used to infer the posterior distributions, which are then used with the questioned signature. The performance is better than that of the previous method based on the same features. (Note: the experiments do not enroll forgeries from each writer for training in Step 2; if forgeries are available, the performance can be improved further.)

References

[1] S. A. Slyter, Forensic Signature Examination. Springfield, Illinois, USA: Charles C Thomas, 1995.
[2] S. N. Srihari, A. Xu, and M. K. Kalera, "Learning strategies and classification methods for off-line signature," Proc. 7th Int. Workshop on Frontiers in Handwriting Recognition (IWFHR), pp. 161–166, 2004.
[3] S. Srihari, K. Kuzhinjedathu, H. Srinivasan, C. Huang, and D. Pu, "Signature verification using a Bayesian approach," IWCF '08: Proc. 2nd Int. Workshop on Computational Forensics, pp. 192–203, 2008.
[4] B. Horn, Robot Vision. MIT Press, 1986.
[5] B. Zhang and S. N. Srihari, "Analysis of handwriting individuality using handwritten words," Proc. 7th Int. Conf. on Document Analysis and Recognition (ICDAR), Edinburgh, Scotland, pp. 1142–1146, August 2003.
[6] J. Favata and G. Srikantan, "A multiple feature/resolution approach to handprinted digit and character recognition," vol. 7, no. 4, pp. 304–311, 1996.
[7] B. Zhang and S. N. Srihari, "Binary vector dissimilarity measures for handwriting identification," Proc. of SPIE, Document Recognition and Retrieval, pp. 155–166, 2003.
[8] S. N. Srihari, C. Huang, and H. Srinivasan, "On the discriminability of the handwriting of twins," Journal of Forensic Sciences, vol. 53, no. 2, pp. 430–446, March 2008.
[9] A. Gelman, Bayesian Data Analysis, 2nd edition. Texts in Statistical Science, 2004.