Pattern Analysis & Applications (2000) 3:169–181 © 2000 Springer-Verlag London Limited
Signature Verification: Increasing Performance by a Multi-Stage System

C. Sansone and M. Vento
Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli 'Federico II', Napoli, Italy

Abstract: A serial three-stage multi-expert system for facing the problem of signature verification is proposed. The whole decision process is organised into successive stages, each using a very reduced set of features for recognising forgeries and providing information about the reliability of the recognition process. The first expert, adopting only a single global feature, is devoted to the elimination of random and simple forgeries. The second stage receives only those signatures not classified as false by the first stage (i.e. those signatures that are really genuine or forgeries reproduced in a skilled way), and adopts a single specific feature suitable for isolating skilled forgeries. Both of these stages employ suitable criteria for estimating the reliability of the performed classification, so that, in case of uncertainty, the signature is forwarded to a final stage which takes the final decision, taking into account the decisions of the previous stages together with the corresponding reliability estimations. The proposed multi-stage automatic signature verification system has been tested on a database of signatures produced by 49 different writers. The experimental analysis highlights the effectiveness of the approach: the proposed system, employing only two features used in distinct moments of the decision process, performs better than other systems employing larger feature sets (including the features used in the proposed system) and performing classification in a single stage.

Keywords: Classification; Document validation; Multi-expert systems; Reject option; Reliability; Serial combination; Signature verification
Received: 21 May 1999. Received in revised form: 7 February 2000. Accepted: 25 February 2000

1. INTRODUCTION

Several types of document need to be validated by signature, and reliable techniques for signature verification are consequently required. Even if the most recent research is focussed on the digital signature of electronic documents (i.e. an encrypted key code associated with a document in its electronic version, especially designed for preventing manipulation of the file by unauthorised people), a very large number of signed paper documents are still produced daily. Until now, the problem of signature verification on this class of documents has been faced by taking into account three different types of forgeries: random forgeries, produced knowing neither the name of the signer nor the shape of his signature; simple forgeries, produced knowing the name of the signer but without having an example of his signature; and skilled forgeries, produced by people who,
looking at an original instance of the signature, attempt to imitate it as closely as possible (see Fig. 1). It is obvious that the problem of signature verification becomes more and more difficult when passing from random to simple and skilled forgeries, the latter being so difficult a task that even human beings make errors in several cases. In fact, practice in imitating a signature often allows the production of forgeries so similar to the originals that discrimination is practically impossible; in many cases, the distinction is complicated even more by the large variability introduced by some signers when writing their own signatures. In relation to this, studies on signature shape found that North American signatures are typically more stylistic, in contrast to the highly personalised and 'variable in shape' European ones [1]. To date, many different recognition systems have been proposed [2–7]. Among them, solutions for the case of random forgeries are mainly based on the use of global shape descriptors such as the shadow code, initially proposed by Burr [8] and successively enhanced by Sabourin et al [9], who report results with reference to a database of 20 writers. Another approach, which performs well in the case of random forgeries, is based on the gradient operator applied to the signature image. The gradient image
Fig. 1. (a) A genuine signature by ‘Claudio Busillo’ and different types of forgeries, (b) a random forgery of ‘Claudio Busillo’ produced by ‘Marcello Gatti’, (c) a simple forgery and (d) a skilled forgery.
obtained, integrated over the signature in a predefined range of directions, provides a directional probability density function (pdf), which is used as a signature descriptor; the results of the method on a database of 800 random forgeries are reported by Drouhard et al [10]. Other approaches using global shape descriptors, such as shape envelope projections on the coordinate axes, geometric moments, or even more general global features such as area, height and width, have been widely investigated [1,4,11–13]. Sometimes these approaches, although tailored for detecting random forgeries, produce interesting results with simple forgeries. As soon as we consider the problem of simple forgery detection, however, the use of global features often does not allow us to discriminate simple forgeries from the originals, thus limiting the overall performance of the system. In this case, most of the systems known in the literature perform the validation by using more local features, obtained starting from the skeleton of the image or its outline. In particular, some authors consider peculiarities of the writing process, such as the ink distribution at different resolution levels and the high-pressure regions [6,7], or other features characterising the shape of the signature, such as the edges evaluated along different directions. Another way of extracting local features is proposed by Murshed et al [12]: the input signature image is divided into areas of equal size, and a set of features is extracted by evaluating in each area the occurrence of some graphical segments. It is worth pointing out that most of the systems proposed up to now, while performing reasonably well on a single category of forgeries (random, simple or skilled), decrease in performance when working with all the categories of forgeries simultaneously, and generally this decrement is larger than one would expect.
The main reason for this behaviour lies in the difficulty of defining a feature set adequate for working with all the classes of forgery simultaneously. In fact, the use of global features allows us to isolate, as required, the random forgeries and most of the simple forgeries, but often does not allow us to discriminate between a genuine signature and a skilled forgery that is similar in shape to the corresponding original. On the other hand, as we consider more specific features, the systems increase their ability to discriminate between forgeries and originals, allowing the recognition of skilled forgeries, but those genuine signatures drawn with higher shape variations risk being classified as forgeries. In the light of these considerations, it follows that a single-stage system, devised for recognising forgeries of all the categories, should simultaneously use both general and specialised features; this, however, could result in unavoidable errors related to the adjustment of the discriminatory power of the feature set depending on the variability of the genuine signatures. A good solution to this kind of problem can be obtained by considering Multi-Expert Systems (MES), i.e. systems made up of a variety of simple experts, each able to solve a particular problem. The rationale of the multi-expert approach lies in the assumption that, by combining the results of a set of experts according to a combining criterion, it is possible to compensate for the weakness of each single expert while preserving its own strength [14]. Experimental analysis in different areas has demonstrated that the performance of a MES can be better than that of any single expert, especially when using experts that are as complementary as possible, and adopting a combining rule for determining the most likely class a sample should be attributed to, given the class to which it is attributed by each single expert [15–17]. The idea of using a MES has recently been investigated in the signature verification literature; in this area, most of the existing systems refine the decision process by adopting a multi-resolution scheme. Consequently, the set of features is always the same, but is applied to the input image at different resolution levels [6,13,18]. Other approaches are based on a parallel combination of a set of experts, each using a relatively small number of features. To increase the complementarity of the experts, as required for building a good MES, typically some of them employ only global features, while others employ only local features [1,5].
However, even if these approaches allow us to improve performance with respect to a single classification strategy, further improvements can be obtained by splitting the whole classification phase in a serial way and adopting, at each stage, the set of features best suited to the specific decisional process being carried out. In this paper we propose a MES for signature verification based on a combination of two experts organised according
to a serial topology. The whole decision process is subdivided into different stages, each using an adequate set of features. The first expert, adopting only a single global feature, is devoted to the elimination of most random and simple forgeries, even if the generality of the feature employed implies, as an unwanted side effect, that some skilled forgeries will deceive this stage, being classified as genuine. The second stage receives only those signatures not classified as false by the first stage (i.e. those signatures which are really genuine, or forgeries reproduced in a skilled way that have deceived the first stage), and adopts a single specific feature suitable for isolating skilled forgeries. Both of these stages employ suitable criteria for estimating the reliability of the classification performed so that, in case of uncertainty, the signature is forwarded to a final stage which takes the final decision, taking into account the decisions of the previous stages together with the corresponding reliability estimations. Our approach is similar to that proposed by Murshed et al [12], in which a two-stage verification system is proposed. In Murshed et al [12], however, there is no final combination of the decisions made by the two stages, and the thresholds used to decide upon the acceptance or rejection of a signature are fixed a priori. Moreover, the proposed MES adopts novel criteria for evaluating the reliability of the classification decisions carried out by the experts. The MESs proposed to date evaluate the reliability of the classification of an expert on the basis of the recognition rate obtained by that expert during the training phase on the class assigned to the sample. As a consequence, the same reliability value is associated with every decision attributing a sample to the same class, even though it seems reasonable to take into account its dependence on the quality of the specific sample.
The proposed system, at each stage, employs criteria for estimating the reliability of each single recognition act on the basis of information directly derived from the output of the expert. In the light of the reliability evaluation, a stage can refuse the decision, thus making a rejection. In the case of rejection, the successive stage of the system will be involved for further processing. The decision to reject a classification is made according to a threshold on the reliability value. Determination of the optimal threshold is carried out according to a method which determines the best trade-off between a reject and an error, by considering the requirements of the application domain. These requirements are specified by attributing costs to misclassifications, rejects and correct classifications. The proposed approach has been tested on a large database of signatures produced by 49 different writers. The experimental analysis highlights the effectiveness of the approach: the proposed system employs only two features, used in distinct moments of the decision process, and performs better than other systems employing larger feature sets (including the features used in our system) and performing classification in a single stage.
2. THE PROPOSED APPROACH

The proposed Automatic Handwritten Signature Verification System (AHSVS) is a serial multi-expert system, in which each stage is devoted to recognising whether or not the signature belongs to a specific forgery category. Since both random and simple forgeries can be very different from genuine signatures, because in both cases the writer does not know the model of the genuine signature, it seems reasonable to consider random and simple forgeries as one category. Consequently, the proposed system is made up of three stages: the first copes with random and simple forgeries; the second with skilled forgeries; while the final stage intervenes only if the two previous stages were unable to make a decision. An overview of the overall system is given in Fig. 2(a). Each stage is made up of an expert, devoted to the classification of an input sample, and of a forgery decider. The expert is a two-class classifier (genuine or forgery). The forgery decider, on the basis of the output vector provided by the corresponding expert, estimates (by a suitably defined parameter) the reliability of the classification decision, and isolates all those signatures which can be reliably considered as forgeries. Before going into detail about the flow of the decision process (reported in Fig. 2(b)), we briefly explain the notation used: the reliability parameters, whose values range from 0 to 1, are in general indicated with ψ, and the reliability thresholds (formally defined hereafter) with σ. These symbols have a subscript denoting the stage to which they refer. So ψI, ψII and ψc, respectively, denote the reliability evaluated in the first, second and final (combiner) stage, and similarly, σI, σII and σc are the thresholds in the three stages. The recognition process starts by presenting the input signature to the first stage.
If its response is that the signature is a forgery and the reliability ψI associated with this decision is higher than a suitably fixed reliability threshold σI, the system concludes that the signature is a forgery and the process stops. Otherwise the signature is forwarded to the second stage. In a similar way, this stage classifies the signature and computes the corresponding reliability ψII, stopping the process if the signature is classified as a forgery with ψII greater than σII. The signatures forwarded to the third stage are those recognised as genuine by the second stage, no matter what the associated reliability, or those recognised as forgeries, but with a reliability lower than the threshold. The third stage combines the information regarding the decisions taken by the two previous stages, i.e. the class the signature was tentatively attributed to (in the following called a vote) and the reliabilities ψI and ψII associated with each vote. The combiner takes the final decision according to a weighted voting criterion, i.e. by summing the votes for each class, each weighted by the corresponding reliability, and attributing the signature to the class that achieves the highest score. This stage can decide upon a reject if the reliability ψc associated with the winning class (see Section 2.2) is below a threshold σc.
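As an illustration, the decision flow just described can be sketched as follows; the stage classifiers are passed in as callables returning a (vote, reliability) pair, and the combiner reliability is computed as defined in Section 2.2. All names are illustrative, not from the original implementation:

```python
def verify(signature, stage1, stage2, sigma1, sigma2, sigma_c):
    """Serial three-stage decision flow (sketch).

    stage1 and stage2 are callables returning (vote, psi), where vote
    is 'genuine' or 'forgery' and psi in [0, 1] is the reliability
    associated with that decision."""
    vote1, psi1 = stage1(signature)
    if vote1 == 'forgery' and psi1 > sigma1:
        return 'forgery'              # reliably rejected at the first stage

    vote2, psi2 = stage2(signature)
    if vote2 == 'forgery' and psi2 > sigma2:
        return 'forgery'              # reliably rejected at the second stage

    # Third stage: weighted voting over the two previous decisions.
    scores = {'genuine': 0.0, 'forgery': 0.0}
    scores[vote1] += psi1
    scores[vote2] += psi2
    winner = max(scores, key=scores.get)

    # Combiner reliability as defined in Section 2.2.
    if vote1 == vote2:
        chi1, chi2 = min(psi1, psi2), 0.0
    else:
        chi1, chi2 = max(psi1, psi2), min(psi1, psi2)
    psi_c = min(chi1, 1.0 - chi2 / chi1) if chi1 > 0 else 0.0
    return winner if psi_c >= sigma_c else 'reject'
```

With σc close to 1, almost every disagreement between the two stages results in a reject, which matches the conservative intent of the combiner.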
Fig. 2. (a) The architecture of the proposed AHSVS system, and (b) the implemented decisional flow as a function of the classification decisions at the different stages and of the corresponding reliability evaluations.
Section 2.1 describes the experts used in the first and second stages of the system, detailing the architecture of the corresponding classifiers and the features adopted. Section 2.2 illustrates, for each stage of the serial multi-expert system, the criteria used for evaluating the reliability of the classification decisions, and for determining the optimal values of the reject thresholds.

2.1. The Experts in the First and Second Stages
As outlined in the introduction, a key point of our approach is that of splitting the whole decisional process into three successive stages, evaluating in each stage the actual reliability of the classification on the currently considered input signature. According to the rationale of our serial Multi-Expert System, the set of features employed at each stage is peculiar, and especially tailored for efficiently solving a given subtask. For instance, the set of features employed at the first stage should ideally detect all of the random and simple forgeries, even accepting that some skilled forgeries could be misclassified (these are in fact sent to the second stage for further processing). In a similar way, the features in the second stage should ideally allow us, if applied to the signatures coming from the first stage, to detect all of the skilled forgeries. In other words, the features of the first stage should allow us to minimise the percentage of the genuine signatures classified as forgeries, i.e. the False Rejection Rate (FRR) and the percentage of random and simple forgeries classified as genuine, respectively denoted with FAR(random) and FAR(simple). In a similar way, the features in the second stage should minimise the FRR on the incoming signatures (those not classified as forgeries by the first stage) and the percentage of skilled forgeries classified as genuine, denoted with FAR(skilled). Taking into account these requirements, the set of features used at the first and second stages has been selected from among those well known in the literature and extensively used in other AHSVSs. In particular, we have chosen the features to employ in the first and second stages of our serial Multi-Expert System among those proposed by Huang and Yan [6], i.e. the core, the outline, the directional frontiers in eight directions and the high pressure regions. In the first stage of the system we decided to use the outline of the signature. 
This feature, as documented by Huang and Yan [6], is global enough to detect most of the random and simple forgeries, even if a number of skilled forgeries can deceive it. Recall that these latter do not create problems for our system, as they are successively processed in the second stage. The feature adopted in the second stage is the high pressure regions, which have been demonstrated to be effective at detecting skilled forgeries. Figure 3 summarises the process for obtaining the coded feature vector from a signature. As previously clarified, the first and second stages evaluate a binarised transformed image, starting from the grey-scale input signature (Fig. 3(a)), i.e. the outline and the high pressure regions, respectively. As described by Huang and Yan [6], a pixel of the
signature belongs to the outline if its 8-neighbour count is below 8 and its grey level is greater than a threshold θoutline, thus computed:

θoutline = gmin + 0.25 (gmax − gmin)     (1)

where gmin and gmax are, respectively, the minimum and maximum grey level value of the input image. Analogously, a pixel belongs to the high pressure regions if its grey level is greater than another threshold θhpr:

θhpr = gmin + 0.75 (gmax − gmin)     (2)
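As a minimal sketch, Eqs (1) and (2) can be applied to a grey-scale signature image as follows. The helper names are ours; the 8-neighbour count follows the rule stated above, and ink pixels are assumed to have grey levels above the threshold, as in the text:

```python
import numpy as np

def thresholds(img):
    """Grey-level thresholds of Eqs (1) and (2)."""
    g_min, g_max = int(img.min()), int(img.max())
    t_outline = g_min + 0.25 * (g_max - g_min)   # Eq (1)
    t_hpr = g_min + 0.75 * (g_max - g_min)       # Eq (2)
    return t_outline, t_hpr

def outline_mask(img):
    """A pixel is on the outline if its grey level exceeds t_outline
    and its 8-neighbour count of ink pixels is below 8."""
    t_outline, _ = thresholds(img)
    ink = img > t_outline
    # Count ink pixels among the 8 neighbours of each pixel.
    padded = np.pad(ink, 1)
    neigh = sum(np.roll(np.roll(padded, dy, 0), dx, 1)[1:-1, 1:-1]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0))
    return ink & (neigh < 8)

def high_pressure_mask(img):
    """High pressure regions: grey level above t_hpr, Eq (2)."""
    _, t_hpr = thresholds(img)
    return img > t_hpr
```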
Note that the outline is successively projected in four predefined directions, by applying suitable filters detailed by Huang and Yan [6]. In all cases, the coded feature vector is obtained by superimposing a 3×10 grid on the transformed image (see Figs 3(b) and 3(c)). In particular, each component of the feature vector is associated with a square in the grid, and has a value calculated on the basis of the black pixels contained in the corresponding grid square. As pointed out in Figs 3(d) and 3(e), each square of the grid is divided into a core area, in which pixels are weighted by 1, and a peripheral area, in which pixels are weighted by a value proportional to their distance from the border. In conclusion, the first stage uses a feature vector obtained by concatenating the four vectors associated with the projections of the outline along the four directions considered. Taking into account that each image gives rise to 30 components, the feature vector of the first stage is made of 120 components. The feature vector used in the second stage is instead made of only 30 components. Both of the experts in the first and second stages are based on a neural classifier, in particular a Multi-Layer Perceptron network with three layers of neurons. Details of the architecture of the experts and the number of neurons can be found in Table 1.
Table 1. The characteristics of the experts employed in the first and second stages of the system

              First stage expert                     Second stage expert
              ('Random and simple forgeries')        ('Skilled forgeries')

Features      outline                                high pressure regions

Classifier    120-7-2 Multi-Layer Perceptron         30-19-2 Multi-Layer Perceptron
              with Sigmoidal Activation Function,    with Sigmoidal Activation Function,
              trained with the standard              trained with the standard
              Backpropagation Algorithm using a      Backpropagation Algorithm using a
              constant learning rate equal to 0.5.   constant learning rate equal to 0.5.
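For illustration, the two experts of Table 1 can be sketched as follows; only the forward pass is shown, the weight initialisation is arbitrary, and training (standard backpropagation with learning rate 0.5) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

class TwoClassMLP:
    """One-hidden-layer perceptron matching the topologies of Table 1:
    120-7-2 for the first stage, 30-19-2 for the second.  Sketch only;
    the paper trains with standard backpropagation at a constant
    learning rate of 0.5."""

    def __init__(self, n_in, n_hidden, n_out=2):
        self.W1 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(self, x):
        h = self._sigmoid(x @ self.W1 + self.b1)
        return self._sigmoid(h @ self.W2 + self.b2)  # two outputs in (0, 1)

first_stage = TwoClassMLP(120, 7)    # outline features
second_stage = TwoClassMLP(30, 19)   # high-pressure-region features
```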
Fig. 3. The feature extraction and coding process. (a) The original image; (b) the projection of the outline of the input signature in the 0° direction (the other three projections are omitted for the sake of simplicity); the projections in the {0°, 45°, 90°, 135°} directions are used as features in the first stage; (c) the high pressure regions, the feature used in the second stage; (d), (e) results of the feature coding process applied to images (b) and (c), respectively. The black pixels in the core area are weighted by 1, while the peripheral pixels are weighted by a value proportional to their distance from the border.
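The coding process summarised in Fig. 3 can be sketched as follows. The exact weighting law for peripheral pixels is not specified beyond being proportional to the distance from the border, so the linear ramp and the core size used here are assumptions:

```python
import numpy as np

def grid_features(mask, rows=3, cols=10, core=0.5):
    """Code a binarised feature image into a rows*cols-component vector.

    Each grid square yields one component: pixels in the central core
    of the square weigh 1, peripheral pixels weigh proportionally to
    their normalised distance from the square border (assumed linear
    ramp; the paper does not give the exact law)."""
    h, w = mask.shape
    feats = []
    for r in range(rows):
        for c in range(cols):
            cell = mask[r * h // rows:(r + 1) * h // rows,
                        c * w // cols:(c + 1) * w // cols]
            ch, cw = cell.shape
            ys, xs = np.mgrid[0:ch, 0:cw]
            # Normalised distance from the nearest cell border, in [0, 0.5].
            d = np.minimum(np.minimum(ys, ch - 1 - ys) / max(ch - 1, 1),
                           np.minimum(xs, cw - 1 - xs) / max(cw - 1, 1))
            weights = np.where(d >= core / 2, 1.0, d / (core / 2))
            feats.append(float((cell * weights).sum()))
    return np.array(feats)
```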
2.2. Forgeries Decider and Combiner
It is worth recalling that the forgery deciders in the first and second stages of the system are devoted to evaluating the reliability of the classification decisions taken by the corresponding experts on the input signature currently considered. If the classification decision is over a given reliability threshold σ, to be suitably determined during the training of the system, the decision is accepted; otherwise it is rejected. Evaluation of the reliability of the classification decision is carried out by considering the output vector of the experts, on the basis of a set of considerations reported below. The combiner (i.e. the third stage of the system) performs a task very similar to that of the forgery deciders. In fact, with reference to a given signature passing the first two
stages of the system, it estimates the overall reliability of the decision on the basis of the reliability of the decisions of the two experts. The combiner also implements a reject criterion if the estimated reliability is under a given threshold. In the following, we first define the general criteria for evaluating the reliability of a classification decision by looking at the output vector of the expert. Consequently, by applying the method to the first and second stages of the system, we respectively obtain the reliability parameters ψI and ψII. Successively, we report on how to evaluate the reliability of the combiner ψc, starting from the reliabilities ψI and ψII. Finally, we describe the method for evaluating the optimal reject threshold σ. The method is applied to the first, second and combiner stages, using σI, σII and σc, respectively.
With reference to the first point, we consider that the evaluation of the reliability of a classification decision requires the characterisation of those situations in the feature space which can give rise to unreliable classifications, and of how these situations can be inferred by looking at the state of the expert output. The low reliability of a classification can be traced back to one of the following situations: (a) the considered sample is significantly different from those present in the training set, i.e. its representative point is located in a region of the feature space which is far from those associated with the different classes; (b) the point which represents the considered sample in the feature space lies where the regions pertaining to different classes overlap, i.e. where training set samples belonging to more than one class are present. To distinguish between classifications which are unreliable because a sample is of type (a) or (b), let us define two reliability parameters, ψa and ψb, whose values vary in the interval [0,1]. It is assumed that parameter values near to 1 characterise very reliable classifications, while low values correspond to unreliable classifications. The two parameters are associated with each expert, and each parameter is a function of the expert output vector (indeed, of the output of its classification section). In the case of a Multi-Layer Perceptron neural classifier, as shown by Cordella et al [19], the reliability parameters can be defined as:
ψa = Owin  and  ψb = 1 − O2win/Owin     (3)

where Owin is the value of the winning neuron and O2win is the value of the second winning neuron. A parameter ψ providing an inclusive measure of the reliability of a classification can be computed by combining the values of ψa and ψb. The form chosen for ψ is:

ψ = min{ψa, ψb}     (4)
This is certainly a conservative choice, because it implies that, for a classification to be considered unreliable, only one reliability parameter needs to assume a low value, regardless of the value assumed by the other one. As regards evaluation of the decision of the combiner, let us introduce, for simplicity, the two quantities ψ1 and ψ2. If the two stages agree on the guess class, we assume ψ1 = min(ψI, ψII) and ψ2 = 0; otherwise, ψ1 = max(ψI, ψII) and ψ2 = min(ψI, ψII). The reliability of the decision of the combiner ψc is defined as:

ψc = min(ψ1, 1 − (ψ2/ψ1)).
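Equations (3) and (4), together with the combiner reliability, can be sketched as follows (function names are illustrative):

```python
def reliability(output):
    """psi for a single expert from its output vector, Eqs (3)-(4):
    psi_a = O_win, psi_b = 1 - O_2win/O_win, psi = min(psi_a, psi_b)."""
    o = sorted(output, reverse=True)
    o_win, o_2win = o[0], o[1]
    if o_win <= 0:
        return 0.0
    return min(o_win, 1.0 - o_2win / o_win)

def combiner_reliability(psi_1, psi_2, agree):
    """psi_c from the two stage reliabilities, following the text:
    agreement keeps the weaker vote; disagreement penalises psi_c by
    the ratio of the two votes."""
    if agree:
        chi1, chi2 = min(psi_1, psi_2), 0.0
    else:
        chi1, chi2 = max(psi_1, psi_2), min(psi_1, psi_2)
    return min(chi1, 1.0 - chi2 / chi1) if chi1 > 0 else 0.0
```

Note that two equal winning outputs give ψ = 0: a sample lying exactly on a class boundary is treated as fully unreliable.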
From here on, we describe the method for determining the optimal values of the reject thresholds, whose rationale has already been described by Cordella et al [19], but with some restrictions that will be removed in the present paper. The introduction of a reject option aims at rejecting the highest possible percentage of samples which would otherwise be misclassified. However, it is worth noting that, even when adequately used, this criterion introduces a side-effect, whereby some samples that would otherwise have been correctly classified are rejected.

It is therefore necessary to assess the effectiveness of introducing a reject criterion; this cannot be expressed in absolute terms, but depends upon the specific needs of the application domain. In fact, in some contexts it is desirable to reduce the error rate as much as possible, while in other kinds of applications there may be the dual objective. The first type of scenario could be one in which the correction of a misclassified sample has a high cost, for instance an error in an automatic postal delivery system due to an incorrect interpretation of the zipcode. In other applications, it may be desirable to carry out the classification regardless, even at the risk of a high error rate; for instance, when a character classifier is used in applications in which the text must in any case be extensively edited by hand afterwards. It is assumed that an effectiveness function P is defined which, taking into account the requirements of the particular application, evaluates the quality of the classification in terms of recognition, misclassification and reject rates. Under this assumption, the optimal reject threshold value, determining the best trade-off between the reject rate and the misclassification rate, is that for which the function P reaches its absolute maximum. The reject threshold is evaluated on the basis of some statistical distributions characterising the behaviour of the classifier when operating without a reject option; as will be seen, these distributions are computed after the training phase of the classifier has been completed. The requirements of the particular application domain are specified by attributing costs to misclassifications, rejects and correct classifications. In Cordella et al [19] it is assumed that these costs are invariant with the classes; in this paper, the method is generalised to the case in which the cost of an error differs as a function of the actual class.

To operatively define the function P, let us refer to a general classification problem. Suppose that the samples to be classified can be assigned to one of N+1 classes with labels 0, 1, …, N, where 1, …, N are the labels of the real classes and 0 is a fictitious class label indicating rejection of the sample. For each class i = 1, …, N let us call Rii the percentage of samples correctly classified, Rij the percentage of samples erroneously assigned to the class j (with j ≠ i), and Ri0 the percentage of rejected samples. For the same class i, let R0ii and R0ij indicate, respectively, the percentage of samples correctly classified and the percentage of samples erroneously assigned to the class j when the classifier is used at 0-reject. If we assume for P a linear dependence on Rii, Rij and Ri0, its expression is given by:

P = Σi=1..N Cii Rii − Σi=1..N Σj=1..N, j≠i Cij Rij − Σi=1..N Ci0 Ri0     (5)

Since the aim is to measure the improvement obtained by introducing the reject option, the effectiveness is evaluated as the difference with respect to the performance at 0-reject:

P = Σi=1..N Cii (Rii − R0ii) − Σi=1..N Σj=1..N, j≠i Cij (Rij − R0ij) − Σi=1..N Ci0 Ri0     (6)
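For illustration, Eq. (6) can be evaluated as follows, with rates and costs stored in nested mappings indexed by class labels (0 denoting reject); all names are ours:

```python
def effectiveness(R, R0, C, N=2):
    """Effectiveness P of the reject option, Eq (6).

    R[i][j]  : rates with the reject rule (j == 0 means reject),
    R0[i][j] : rates of the same classifier at 0-reject,
    C[i][j]  : cost matrix (C[i][i] is the gain of a correct
               classification, C[i][0] the cost of a reject).
    Classes are labelled 1..N; label 0 is the reject class."""
    gain_lost = sum(C[i][i] * (R[i][i] - R0[i][i]) for i in range(1, N + 1))
    err_saved = sum(C[i][j] * (R[i][j] - R0[i][j])
                    for i in range(1, N + 1)
                    for j in range(1, N + 1) if j != i)
    rej_cost = sum(C[i][0] * R[i][0] for i in range(1, N + 1))
    return gain_lost - err_saved - rej_cost
```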
In other words, P measures the actual effectiveness improvement obtained when the reject option is introduced, independently of the absolute performance of the expert at 0-reject. The quantity Cij denotes the cost of assigning to class j a sample belonging to class i. It is worth noting that, for j = 0, Ci0 indicates the cost of rejecting a sample coming from class i, while when j = i, the cost actually represents the gain associated with a correct classification. Thus, for each class i, the following relation holds:

Cij ≥ Ci0  ∀ j ≠ 0     (7)
Generally, these costs can be assigned by quantitatively estimating the consequences of the classification result in the domain considered: the cost of a misclassification is generally attributed by considering the burden of locating and possibly correcting the error or, if this is impossible, by evaluating the consequent damage. The cost of a reject is that of a new classification using a different technique. Since Rii, Rij and Ri0 depend upon the value of the reject threshold σ, P is also a function of σ. Starting from the results presented by Cordella et al [19], it is possible to show that the following relation holds:
冘冘 N
ᏼ() ⫽
N
j⫽i
冘
N
(Cii ⫹ Ci0)
i⫽1
Dij ()d
0
i⫽1 j⫽1
⫺
冕 冕
(Cij ⫺ Ci0)
Dii()d
(6⬘)
0
where Dii(ψ) and Dij(ψ) (with j ≠ i) are, respectively, the occurrence density curves of correctly classified and misclassified samples for the class i, as a function of the value of ψ. The optimal value σ* of the reject threshold is that for which the function P reaches its maximum value. Therefore, by calculating the derivative of Eq. (6′) with respect to σ and setting it equal to zero, it holds:
Σi=1..N Σj=1..N, j≠i (Cij − Ci0) Dij(σ) − Σi=1..N (Cii + Ci0) Dii(σ) = 0     (8)
In practice, the functions Dij() are not available in their analytical form, and therefore, for evaluating *, they should be experimentally determined in tabular form on a set S of labelled samples. The construction of this set should be
Fig. 4. An example of a form used in the tests, containing 10 genuine signatures by ‘Sannito Paolo’.
carried out so as to ensure that it represents the target domain. The evaluation of the representativeness of a training set, and of its influence on the results of the learning process, is a general problem in pattern recognition [20,21]. The optimal threshold σ* can then be determined by means of an exhaustive search among the tabulated values of P(σ). The computational complexity of this search is Θ(N_s), N_s being the number of samples in the set S: the complexity of obtaining one tabulated value of P(σ) is linear in N_s, and the number of tabulated values of P(σ) is constant with respect to N_s. It is easy to show that, in the case of costs invariant with respect to the classes (i.e. C_ii = C_c, C_ij = C_e, C_i0 = C_r, ∀i, ∀j ≠ i), the results coincide with those reported by Cordella et al [19]. In the case of the AHSVS, the number N of classes is equal to two; we use index 1 to denote the genuine class and index 2 the forgery class. Since it is reasonable to assume that the gain of the system in case of a correct recognition is independent of the actual class, we set C_11 = C_22 = C_c. Analogously, the cost of a reject can be assumed to be independent of the actual class, so C_10 = C_20 = C_r. On the other hand, the cost of an error is not the same when a genuine signature is misrecognised as a forgery as when a forgery is erroneously accepted as genuine. These costs must therefore be kept distinct: they are denoted by C_12 = C_FR and C_21 = C_FA, where FR and FA stand for False Rejection and False Acceptance, respectively.
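As an illustration of the exhaustive search described above, the following sketch tabulates the effectiveness improvement of Eq. (6′) on a discrete set of labelled samples and picks the threshold that maximises it. This is a minimal sketch, not the paper's implementation: all function names are illustrative, and the two-class cost model (class 1 = genuine, class 2 = forgery) follows the assumptions stated in the text.

```python
# Exhaustive search for the optimal reject threshold sigma*, assuming a
# labelled validation set of (reliability, true_class, predicted_class)
# triples. Names and data layout are hypothetical, not from the paper.

def effectiveness_gain(samples, sigma, C_c=1.0, C_r=2.0, C_FR=4.0, C_FA=10.0):
    """Improvement P(sigma) over 0-reject when every sample whose
    reliability falls below sigma is rejected (cf. Eq. (6'))."""
    gain = 0.0
    for reliability, true_cls, pred_cls in samples:
        if reliability >= sigma:
            continue  # sample is not rejected at this threshold
        if true_cls == pred_cls:
            # rejecting a correct sample: lose the gain C_c and pay C_r
            gain -= (C_c + C_r)
        else:
            # rejecting an error: save its cost, but pay the reject cost
            error_cost = C_FR if true_cls == 1 else C_FA  # 1=genuine, 2=forgery
            gain += (error_cost - C_r)
    return gain

def best_threshold(samples, candidate_sigmas):
    # Theta(N_s) per candidate, constant number of tabulated candidates.
    return max(candidate_sigmas, key=lambda s: effectiveness_gain(samples, s))
```

For instance, with samples whose low-reliability entries are mostly errors, the search settles on a threshold that rejects exactly those entries.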
3. EXPERIMENTAL RESULTS

The problem of assessing the performance of a signature verification system is complicated by the unavailability, in the literature, of a large standard database; security and privacy issues have so far been the main reasons for this deficiency. Consequently, to characterise the performance of our approach, we have collected a signature database. On these data we have tested our system and, for comparison, an implementation of another system known in the literature. In particular, a database made up of 1960 signatures produced by 49 writers has been collected. The writers have been selected from inhomogeneous social and cultural contexts, and they differ in sex, age and profession. The resulting database, whose size is comparable with that used by Dimauro et al [5] and larger than those used by many others [4,7,10,12,13], contains 20 genuine signatures, 10 simple forgeries and 10 skilled forgeries for each writer. The skilled forgeries have been produced by writers after a preliminary training phase in which they tried to reproduce each signature about twenty times. Obviously, these forgeries encompass varying skill levels, as would be the case in practice. The signatures have been written on forms containing 10 writing areas of 12×3 cm (see Fig. 4). Each form has been acquired by a flat-bed scanner with a resolution of 300 dpi and 256 grey levels. After a process of resolution reduction,
Fig. 5. A snapshot of the implementation of the proposed AHSVS. In the top window, three genuine signatures, three simple forgeries and three skilled forgeries relating to ‘Claudio Busillo’ are shown in the first, second and third rows. The windows on the left show (from top to bottom) a zoom of the first genuine signature and its transformed images, i.e. the outline and the high pressure regions. The verification window gives the output of the system, including intermediate results and reliability thresholds at each stage.
binarisation and thinning have been applied to the form; the dashed lines have been detected and used as primary separators for extracting each individual signature. On the grey-scale images thus obtained, further processing has been performed with the aim of separating background pixels from foreground pixels. First, a 3×3 mean filter has been applied to reduce the noise due to the acquisition phase. Then, using a thresholding algorithm [22], a binarised signature mask has been obtained. This mask allows the identification of the pixels of the original grey-scale signature image to which the feature extraction phase described in Section 2.1 has to be applied. Before performing the feature extraction, the image has been centred at the grey-level centroid by adding to it the minimum amount of clean border area. The system has been implemented in C++ using wxWindows 1.67, a freeware class library for Windows and Unix distributed on the Internet at http://www.wxwindows.org/. On a PC equipped with an Intel Celeron 300 MHz processor, the overall signature verification process takes about half a second. As the system is still a prototype, it is reasonable to suppose that this time can be further reduced. Figure 5 shows a snapshot of the system. To verify the effectiveness of the system, we report
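The preprocessing chain described above (3×3 mean filtering followed by Otsu thresholding [22]) can be sketched as follows. This is an illustrative reconstruction with hypothetical function names, not the actual C++/wxWindows code of the system.

```python
import numpy as np

# Sketch of the noise-reduction and binarisation steps, assuming an
# 8-bit grey-scale signature image. Function names are illustrative.

def mean_filter_3x3(img):
    """Average each pixel with its 8 neighbours (edges replicated)."""
    padded = np.pad(img.astype(float), 1, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy : 1 + dy + img.shape[0],
                          1 + dx : 1 + dx + img.shape[1]]
    return out / 9.0

def otsu_threshold(img):
    """Grey level maximising the between-class variance (Otsu [22])."""
    hist = np.bincount(img.astype(np.uint8).ravel(), minlength=256).astype(float)
    total = hist.sum()
    mu_total = np.dot(np.arange(256), hist) / total
    best_t, best_var = 0, -1.0
    cum_w, cum_mu = 0.0, 0.0
    for t in range(256):
        cum_w += hist[t]
        cum_mu += t * hist[t]
        w0 = cum_w / total
        if w0 in (0.0, 1.0):
            continue  # one class empty: no valid split at this level
        mu0 = cum_mu / cum_w
        mu1 = (mu_total * total - cum_mu) / (total - cum_w)
        var = w0 * (1 - w0) * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

Pixels at or below the returned threshold would form the binarised signature mask used to select the foreground of the original grey-scale image.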
Table 2. Results in terms of FRR and FAR obtained by the experts constituting the first and second stages, working separately, and by the proposed AHSVS without the reject option in the third stage. The last row reports the percentage relative improvement of the FAR and the FRR obtained using the proposed AHSVS without the reject option, with respect to the best single expert

                                              FRR        FAR
                                                         Random   Simple   Skilled
Outline (first stage expert)                  2.65       0.09     7.14     38.98
High Pressure Regions (second stage expert)   12.04      0.86     12.45    26.12
Proposed AHSVS (without reject option)        5.71       0.03     4.29     20.82
Relative improvement with respect             −115.47%   66.67%   39.92%   20.29%
to the best single expert
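The percentage figures in the last row of Table 2 are plain relative reductions with respect to the best single expert; a small helper (illustrative, not from the paper) reproduces them:

```python
def relative_improvement(best_single_expert, proposed):
    """Percentage reduction of a rate with respect to the best single
    expert; a negative value means the proposed system is worse."""
    return round(100.0 * (best_single_expert - proposed) / best_single_expert, 2)
```

For example, `relative_improvement(0.09, 0.03)` reproduces the 66.67% FAR reduction on random forgeries, while `relative_improvement(2.65, 5.71)` gives the −115.47% FRR entry.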
Fig. 6. (a) Distribution of the FRR values among the 49 considered writers. Each slice is proportional to the percentage of writers falling in the specified range of values: for example, in (a) 65.3% of the writers have an FRR equal to 0% and 16.33% an FRR in the range (0%, 10%). Similarly, (b), (c) and (d) refer to the FAR on random, simple and skilled forgeries, respectively.
Table 3. Results in terms of FRR, FAR and Reject Rate (RR) obtained by the proposed AHSVS on the whole database

Genuine          Random           Simple           Skilled
FRR     RR       FAR     RR      FAR     RR       FAR     RR
2.04    3.67     0.01    0.02    4.29    0.00     19.80   1.22
the experimental results, in terms of FAR and FRR, obtained by considering: (i) the two experts working alone; (ii) the experts working according to the proposed architecture at 0-reject, i.e. without rejections in the third stage (σc = 0); and (iii) the system with the reject option. Results regarding (i) and (ii) are reported in Table 2, and results regarding (iii) in Table 3. Furthermore, Fig. 6 shows how the FAR and FRR values of the 49 writers are distributed within some predefined ranges of values, in order to give a more analytical characterisation of the results obtained. Table 2 highlights that the performance of the two experts working separately is not particularly good: both the FAR of the first stage (outline) on skilled forgeries and the
FRR of the second stage (high pressure regions) are significantly high. Using the experts according to the architecture proposed in Fig. 2, operating at 0-reject (obtained by fixing the threshold σc to zero), allows us to significantly improve the performance on forgeries, as is evident from the last row of Table 2. In fact, the FAR is significantly lower than that of each single expert (about 67% less on random, 40% less on simple and 20% less on skilled forgeries). Notwithstanding this, the FRR of the overall system is more than twice the FRR of the first stage working alone (5.71% vs. 2.65%). This is due to the fact that the FRR of the second stage (high pressure regions) of the whole system
Fig. 7. (a) Twenty genuine signatures by 'Massimiliano Rak', an Italian student of Computer Engineering; note the variability in shape, size, darkness and connectivity. (b) Ten simple forgeries and (c) ten skilled forgeries produced by people looking at the genuine signature and imitating it after 20 trials. Misclassified signatures are enclosed in boxes drawn with a continuous line, while rejected signatures are enclosed in dashed boxes. Signatures marked with a star are further commented upon in Fig. 8.
Fig. 8. (a) A genuine signature misrecognised as a forgery by the second expert, but correctly classified by the combiner; (b) skilled forgeries that deceive the first stage but are correctly classified by the second expert; (c) a skilled forgery that deceives the first two stages but is correctly classified by the combiner.
is very high (12.04%), and this limits the possibility of obtaining good results in terms of the overall FRR. However, the addition of the reject option to the third stage, assuming as cost coefficients C_c = 1, C_r = 2, C_FR = 4 and C_FA = 10, determines a significant performance improvement, as is evident from Table 3. In fact, the FRR becomes lower than that of the best single expert working separately (2.04 vs. 2.65), and the FAR on random and skilled forgeries decreases further. Particularly effective is the result on random forgeries, whose FAR becomes almost zero. In conclusion, the system obtains a relative reduction of 23% in terms of FRR and of 51%, on average, in terms of FAR. To show the behaviour of the proposed system in a
specific case, Figs 7 and 8 report the results with reference to the signatures of 'Massimiliano Rak', one of the writers in the database. Considering his twenty genuine signatures, it can be noted that almost all (18 out of 20) are correctly recognised by all three stages, while one (marked with a star and reported in Fig. 8(a)) is misclassified only by the second stage but correctly recognised as genuine by the combiner; only one signature is definitively misclassified (the one enclosed in a box in Fig. 7). Moreover, all the simple forgeries, as desired, are correctly classified by the first stage, and thus do not pass to the successive stages of the system. Finally, as regards skilled forgeries, five of them are already correctly classified by the first stage. Two of the remaining five deceive the first stage but are correctly classified by the second stage (they are reported in Fig. 8(b)); another one (reported in Fig. 8(c)) deceives both the first and second stages, but is correctly recognised by the combiner; one is rejected, and only one is misclassified. For the sake of completeness, the results obtained with the proposed system are compared with those achieved by Huang and Yan's method [6] operating at a single resolution level, namely 3×10; see Table 4 for details. Note that their system achieves lower performance, as both the FAR and the FRR are higher than in our case.
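The per-signature behaviour just described can be summarised as a simple serial decision flow: each stage returns a decision plus a reliability value, unreliable decisions are forwarded onwards, and the combiner may still reject. The following sketch is a hedged illustration of that flow; the stage functions, thresholds and string labels are hypothetical stand-ins, not the paper's implementation.

```python
# Illustrative three-stage serial decision flow. Each stage maps a
# signature to a (decision, reliability) pair; thr1, thr2 and thr_c are
# the (hypothetical) reliability thresholds of the stages and combiner.

def serial_verify(signature, stage1, stage2, combiner, thr1, thr2, thr_c):
    d1, r1 = stage1(signature)
    if d1 == "forgery" and r1 >= thr1:
        return "forgery"            # random/simple forgeries stop at stage 1
    d2, r2 = stage2(signature)
    if r2 >= thr2:
        return d2                   # confident second-stage decision
    # uncertain: the combiner weighs both decisions and reliabilities
    d_c, r_c = combiner((d1, r1), (d2, r2))
    return d_c if r_c >= thr_c else "reject"
```

With dummy stages this reproduces the qualitative behaviour seen in Figs 7 and 8: confident first-stage rejections stop early, skilled forgeries fall through to the second stage, and borderline genuine signatures are recovered by the combiner.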
4. CONCLUSIONS

This paper has presented an Automatic Handwritten Signature Verification System based on a serial multi-expert architecture made of three different stages. The novel features of the approach can be summarised as follows:

• The use of very simple experts at each stage, adopting single features especially tailored for the task the stage is devoted to. In our system, the first stage, devoted to isolating random and simple forgeries, uses only the outline of the signature as a feature. The second stage, which has the task of detecting skilled forgeries, employs a single but more specific feature, the high pressure regions of the signature.

• Each stage takes a decision accompanied by an evaluation of the reliability of the decision itself. In the light of the reliability evaluation, a stage can refuse the decision, resulting in a rejection; in this case, the successive stage of the system is invoked for further processing.

• The decision to reject a classification is made according to a threshold on the reliability value. The optimal threshold is determined by a method which finds the best trade-off between a reject and an error, considering the requirements of the application domain. These requirements are specified by attributing costs to misclassifications, rejects and correct classifications.

All the aspects mentioned above have been addressed in this paper, detailing the criteria for evaluating the reliability of a classification decision, and discussing how to implement a reject criterion and how to determine the optimal rejection threshold. Experimental results have been presented with reference to a database of 1960 signatures produced by 49 writers. The performance of the system, analytically commented upon in the paper, has confirmed the effectiveness of the approach: the results obtained by using the experts according to the proposed architecture are significantly better than those obtained by considering the experts working separately.

Finally, the performance of our system has been compared with that achieved, on the same database, by the method proposed by Huang and Yan [6] applied at a single resolution level. In this respect, we obtain a reduction of about 69% of the False Rejection Rate and of about 59%, on average, of the False Acceptance Rate (the average has been made with reference to random, simple and skilled forgeries, which reported a decrease of the corresponding FAR of 92%, 46% and 40%, respectively).

Table 4. Results in terms of FRR and FAR obtained on the whole database by the proposed method, and by the expert proposed by Huang and Yan. The last row reports the percentage improvements of the FAR and the FRR obtained by using the proposed AHSVS

                          FRR       FAR
                                    Random   Simple   Skilled
Proposed AHSVS            2.04      0.01     4.29     19.80
Huang and Yan             6.53      0.13     7.96     33.06
Relative improvements     68.76%    92.31%   46.11%   40.11%

References
1. Cardot H, Revenu M, Victorri B, Revillet MJ. A static signature verification system based on a cooperative neural network architecture. International Journal of Pattern Recognition and Artificial Intelligence 1994; 8(3):679–692
2. Plamondon R, Lorette G. Automatic signature verification and writer identification – the state of the art. Pattern Recognition 1989; 22(2):107–131
3. Plamondon R, Leclerc F. Automatic signature verification: the state of the art 1989–1993. International Journal of Pattern Recognition and Artificial Intelligence 1994; 8(3):643–660
4. Bajaj R, Chaudhury S. Signature verification using multiple neural classifiers. Pattern Recognition 1997; 30(1):1–7
5. Dimauro G, Impedovo S, Pirlo G, Salzo A. A multi-expert signature verification system for bankcheck processing. International Journal of Pattern Recognition and Artificial Intelligence 1997; 11(5):827–844
6. Huang K, Yan H. Off-line signature verification based on geometric feature extraction and neural network classification. Pattern Recognition 1997; 30(1):9–17
7. Lee LL, Lizarraga MG, Gomes NR, Koerich AL. A prototype for Brazilian bankcheck recognition. International Journal of Pattern Recognition and Artificial Intelligence 1997; 11(4):549–569
8. Burr DJ. Experiments on neural net recognition of spoken and written text. IEEE Transactions on ASSP 1988; 36(7):1162–1168
9. Sabourin R, Cheriet M, Genest G. An extended-shadow-code based approach for off-line signature verification. Proceedings of the Second International Conference on Document Analysis and Recognition, IEEE Press, 1993: 1–5
10. Drouhard JP, Sabourin R, Godbout M. A neural network approach to off-line signature verification using directional PDF. Pattern Recognition 1996; 29(3):415–424
11. Ammar M. Progress in verification of skillfully simulated handwritten signatures. International Journal of Pattern Recognition and Artificial Intelligence 1991; 5(1–2):337–351
12. Murshed NA, Sabourin R, Bortolozzi F. A cognitive approach to off-line signature verification. International Journal of Pattern Recognition and Artificial Intelligence 1997; 11(5):801–825
13. Qi Y, Hunt BR. Signature verification using global and grid features. Pattern Recognition 1994; 27(12):1621–1629
14. Suen CY, Nadal C, Legault R, Mai TA, Lam L. Computer recognition of unconstrained handwritten numerals. Proceedings of the IEEE 1992; 80(7):1162–1180
15. Rahman AFR, Fairhurst MC. An evaluation of multi-expert configurations for the recognition of handwritten numerals. Pattern Recognition 1998; 31(9):1255–1273
16. Kittler J. Combining classifiers: a theoretical framework. Pattern Analysis and Applications 1999; 1(1):18–27
17. Cordella LP, Foggia P, Sansone C, Tortorella F, Vento M. Reliability parameters to improve combination strategies in multi-expert systems. Pattern Analysis and Applications 1999; 2(3):205–214
18. Qi Y, Hunt BR. A multiresolution approach to computer verification of handwritten signatures. IEEE Transactions on Image Processing 1995; 4(6):870–874
19. Cordella LP, Sansone C, Tortorella F, Vento M, De Stefano C. Neural networks classification reliability. In: Leondes CT (ed), Academic Press Theme Volumes on Neural Network Systems, Techniques and Applications. Academic Press 1998; 5:161–199
20. Fukunaga K. Introduction to Statistical Pattern Recognition, 2nd ed. Academic Press, 1990
21. Guyon I, Makhoul J, Schwartz R, Vapnik V. What size test set gives good error rate estimation? IEEE Transactions on Pattern Analysis and Machine Intelligence 1998; 20(1):52–63
22. Otsu N. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man and Cybernetics 1979; 9(1):62–66
Carlo Sansone was born in Naples, Italy, in 1969. He received a Laurea degree (cum laude) in electronic engineering in 1993 and a PhD degree in electronic and computer engineering in 1997, both from the University of Naples 'Federico II'. Since 1999 he has been Assistant Professor of Computer Science and Neural Programming at the 'Dipartimento di Informatica e Sistemistica' of the University of Naples 'Federico II'. His research interests are in the fields of neural network theory and classification methodologies, with applications in different areas of pattern recognition such as optical character recognition, document processing and signature verification. Carlo Sansone is a member of the International Association for Pattern Recognition (IAPR).
Mario Vento was born in Italy in 1960. In 1984 he received a Laurea degree (cum laude) in electronic engineering, and in 1988 a PhD degree in electronic and computer engineering, both from the University of Naples 'Federico II', Italy. Since 1989 he has been Assistant Professor at the 'Dipartimento di Informatica e Sistemistica' in the Faculty of Engineering of the University of Naples, where he is currently Associate Professor of Computer Science and Artificial Intelligence. His interests cover the areas of artificial intelligence, image analysis, pattern recognition, machine learning and parallel computing in artificial vision. He is especially dedicated to classification techniques, whether statistical, syntactic or structural, and has contributed to neural network theory, statistical learning, exact and inexact graph matching, multi-expert classification and learning methodologies for structural descriptions. He has participated in several projects in the areas of handwritten character recognition, document processing, car plate recognition, signature verification, raster-to-vector conversion of technical drawings, and automatic interpretation of biomedical images. He has authored over 70 research papers in international journals and conference proceedings. Dr Vento is a member of the International Association for Pattern Recognition (IAPR), and of the IAPR Technical Committee on 'Graph Based Representations' (TC15).
Correspondence and offprint requests to: Professor M. Vento, Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli 'Federico II', Via Claudio 21, I-80125 Napoli, Italy. Email:
[email protected]