COMPOSITE SCORE NORMALIZATION FOR FACE VERIFICATION

Vitomir Štruc, Nikola Pavešić
Faculty of Electrical Engineering, University of Ljubljana, Tržaška 25, SI-1000 Ljubljana, Slovenia

Jerneja Žganec Gros
Alpineon Ltd., Ulica Iga Grudna 15, SI-1000 Ljubljana, Slovenia
ABSTRACT

Similarity scores, which form the basis for identity inference in biometric verification systems, typically exhibit statistical variations. These variations are caused by so-called mismatched conditions, i.e., differences between the conditions in which the enrollment and probe samples were acquired, and are common to most application domains of biometric verification, ranging from forensics to smart-home environments. To mitigate these variations, score normalization techniques are usually used; examples include the z-norm, the t-norm, and the zt-norm. In this paper we study two-step normalization techniques, such as the zt-norm, and propose a new way of implementing them. Specifically, we propose to implement the first step of the two-step procedure off-line in a non-parametric manner, while the second step is kept unchanged and, hence, performed parametrically. As shown in our face verification experiments, the proposed composite scheme can improve upon the performance of parametric normalization techniques without the increase in computational complexity incurred by purely non-parametric normalization techniques.

Index Terms— Face verification, score normalization, non-parametric score normalization, composite zt-norm

The work presented in this paper was supported in part by the national research program P2-0250(C) Metrology and Biometric Systems, the postdoctoral project BAMBI (ARRS ID Z2-4214), and by the EU, European Regional Fund, within the framework of the Operational Programme for Strengthening Regional Development Potentials for the Period 2007-2013, contract No. 3211-10-000467 (KC Class).

1. INTRODUCTION

Biometric verification systems typically rely on a matching procedure to measure the similarity between the feature representation extracted from the "live" biometric sample and a template stored in the system's database. The result of this matching procedure is a matching (or similarity) score, which is compared to a decision threshold to make a decision regarding the identity of the "live" biometric sample. However, due to changes in conditions across different enrollment and probe samples, similarity scores typically exhibit statistical variations [1], which negatively affect the recognition performance and render approaches relying on a single, global decision threshold suboptimal. To mitigate this problem, the computed matching score is commonly subjected to a score normalization technique, which in a sense calibrates the score and makes it comparable to a global decision threshold.

In the context of biometric verification, (parametric) normalization techniques, which assume that the matching score is drawn from a Gaussian-shaped distribution and that adjusting this
distribution to zero mean and unit variance successfully alleviates the score-variation problem, have emerged as the most popular. Since different strategies can be adopted to produce the required estimates of the first and second statistical moments of the Gaussian distribution, different normalization techniques have been proposed by different researchers (e.g., [2–5]).

Recently, a new class of normalization techniques, which makes no assumption regarding the shape of the distribution from which the matching score is drawn, was introduced in [6]. Here, the actual shape of the distribution is first estimated from a sample score-population using kernel density estimation (KDE) and the obtained estimate is then used to remap the entire distribution to a predefined one. While the basic idea of adjusting the score distribution to a common one is the same as in the case of parametric techniques, this class of so-called non-parametric score normalization techniques relaxes the assumptions pertaining to the shape of the score distribution and should, therefore, have an edge when it comes to verification performance after normalization. Even though it was shown in [6] that non-parametric score normalization techniques consistently outperform their parametric counterparts, there are still a couple of open issues that need to be addressed: i) the sample score-population, from which the shape of the distribution is estimated, needs to be much larger than in the parametric case, where only two parameters need to be estimated; and ii) the computational load of the remapping is much larger than that of the parametric scaling procedure. Both issues hinder the deployment of non-parametric normalization techniques in verification systems in which operational speed is crucial.

In this paper, we study a special kind of score normalization technique that combines two distinct normalization techniques into a single two-step procedure. An example of such a two-step normalization technique can be found in the popular zt-norm, which combines the so-called z- and t-norms¹ [2, 7, 8]. The zt-norm, which is also the main focus of this paper, is considered to be amongst the most effective score normalization techniques and is regularly used in the fields of speaker and signature verification [2, 3]. Instead of performing the zt-norm in either a purely parametric or a purely non-parametric manner, we propose here to use a composite technique, where the z-norm step, which can be conducted off-line, is performed non-parametrically, and the t-norm step is performed parametrically. Our experimental evaluation, conducted on the Face Recognition Grand Challenge (FRGC) database, shows that the composite normalization technique still outperforms its parametric equivalent, while exhibiting the same computational complexity.

¹ Details on the t-, z-, and zt-norms are given in the remainder.

The rest of the paper is structured as follows. In Section 2 we first briefly review the basics of score normalization, proceed to present the theory of parametric and non-parametric score normalization,
and finally introduce our composite scheme. In Section 3 we evaluate all techniques in face verification experiments on the FRGCv2 database, and we conclude the paper with some final remarks in Section 4.

2. COMPOSITE SCORE NORMALIZATION

2.1. Background on score normalization

Score normalization techniques can be divided into two classes based on the nature of the sample score-population that forms the basis for normalization. If the sample score-population used by the normalization technique is generated from client verification attempts, the technique is considered client-centric; similarly, if it is generated from impostor verification attempts, the technique is considered impostor-centric [2–4, 6]. As the data needed for producing a client-score population is usually not available, most of the existing techniques fall into the latter class, including the popular zero-normalization or z-norm and the test-normalization or t-norm [2].

Formally, score normalization can be described as follows. Let N denote the number of users enrolled in a biometric verification system and let ω_1, ω_2, ..., ω_N denote the corresponding class labels. Score normalization techniques try to define a mapping ψ of the following form [4]:

$$\psi : s \to s', \qquad (1)$$

in such a way that the resulting normalized scores s′ are well calibrated and, thus, comparable to a global decision threshold. In the above equation, s denotes a raw score representing the output of the matching module of a biometric verification system and s′ stands for the normalized version of the score. Impostor-centric score normalization techniques typically define the mapping ψ based on the class-conditional impostor distribution p(s̄|ω_i), where i ∈ {1, 2, ..., N} and the bar implies that we are dealing with impostor scores (s̄) [6].
2.2. Parametric score normalization

Parametric (impostor-centric) score normalization techniques assume that the class-conditional impostor distribution p(s̄|ω_i) takes a Gaussian form, i.e., p(s̄|ω_i) = N(s; μ, σ). Thus, they try to estimate the mean μ and the standard deviation σ of the distribution and then apply the following mapping to normalize an arbitrary score:

$$\psi(s) = s' = \frac{s - \mu}{\sigma}. \qquad (2)$$

As we can see, the goal of these techniques is to normalize the score distribution to N(s′; 0, 1) and thus to ensure that the scores from all verification attempts are drawn from the same distribution. This, in turn, suggests that the scores are well calibrated and can be compared to a single global threshold.

We have already emphasized that the z- and t-norms are amongst the most popular parametric score normalization techniques. Both techniques use Eq. (2) to normalize the score distribution to N(s′; 0, 1), with the difference that the z-norm generates the required sample score-population by comparing the enrolled template of the claimed identity ω_i to a number of impostor probe samples (or z-impostors), while the t-norm generates its sample score-population by comparing the given probe sample to a number of impostor templates (or t-models). From its sample score-population, each type of score normalization computes its own μ and σ [6].
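To make Eq. (2) concrete, the following is a minimal Python sketch of the parametric z- and t-norms; the function names and the synthetic score populations are our own illustrative assumptions, not part of the original method description:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic impostor sample score-populations (assumed given in practice):
# z-impostors: enrolled template of the claimed identity vs. impostor probes,
#              which can be matched off-line at enrollment time.
z_impostor_scores = rng.normal(0.10, 0.20, 500)
# t-models: the "live" probe sample vs. impostor templates, which has to be
#           matched on-line for every verification attempt.
t_impostor_scores = rng.normal(0.15, 0.25, 500)

def param_norm(s, impostor_scores):
    """Eq. (2): map a raw score to zero mean and unit variance with respect
    to the given impostor sample score-population."""
    return (s - impostor_scores.mean()) / impostor_scores.std()

raw_score = 0.42                                  # raw output of the matcher
s_z = param_norm(raw_score, z_impostor_scores)    # z-norm
s_t = param_norm(raw_score, t_impostor_scores)    # t-norm
```

Note that in a deployed system the z-statistics can be precomputed and stored with each template, whereas the t-statistics must be recomputed for every probe; this is why the t-side of the normalization dominates the run-time cost.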
2.3. Non-parametric score normalization

Different from parametric score normalization techniques, non-parametric score normalization techniques do not make any assumptions regarding the shape of the class-conditional impostor distribution p(s̄|ω_i). To be able to relax the Gaussian assumption, the non-parametric normalization techniques need to estimate the probability density function (pdf) of the impostor distribution p(s̄|ω_i), which can be done efficiently using kernel density estimation (KDE) [9]. Once the pdf is estimated, the impostor score distribution can be normalized by mapping it to a predefined shape [6].

The described procedure can be formalized as follows: let ρ be a random variable with the property [10]:

$$\rho = F(s) = \int_{q=-\infty}^{s} p_s(q)\,dq, \qquad (3)$$

where q is a dummy variable for integration. Furthermore, let s′ be another random variable with the property:

$$\rho = G(s') = \int_{x=-\infty}^{s'} p_{s'}(x)\,dx, \qquad (4)$$

where x again denotes a dummy variable for integration. If we assume that the pdf p_s(q) represents our impostor score pdf p(s̄|ω_i) and that p_{s′} represents the pdf of a predefined target distribution, then we can describe the new class of non-parametric score normalization techniques with the following mapping:

$$\psi(s) = G^{-1}(\rho) = G^{-1}(F(s)) = s', \qquad (5)$$

where G(·) and F(·) denote the cumulative distribution functions (CDFs) of their corresponding arguments and G^{-1}(·) represents the inverse of the CDF [6]. Using the above expression, it is possible to implement most of the existing parametric score normalization techniques in a non-parametric fashion; this includes the popular z- and t-norms.
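A minimal Python sketch of Eqs. (3)-(5), using scipy's Gaussian KDE for the pdf estimate and, anticipating Section 3, a log-normal target distribution; the variable names and the synthetic impostor population are our own assumptions:

```python
import numpy as np
from scipy.stats import gaussian_kde, lognorm

rng = np.random.default_rng(0)
impostor_scores = rng.normal(0.10, 0.20, 2000)  # sample score-population

# Eq. (3): F(s), the CDF of the KDE-based estimate of the impostor pdf
kde = gaussian_kde(impostor_scores)
def F(s):
    return kde.integrate_box_1d(-np.inf, s)

# Eq. (4): G, the CDF of a predefined target distribution; here log-normal
# with mu = 0 and sigma = 0.5 (scipy parameterization: s=sigma, scale=e^mu)
target = lognorm(s=0.5, scale=np.exp(0.0))

# Eq. (5): s' = G^{-1}(F(s))
def np_norm(s):
    rho = np.clip(F(s), 1e-9, 1.0 - 1e-9)  # keep rho strictly inside (0, 1)
    return target.ppf(rho)

print(np_norm(0.42))  # normalized version of a raw score of 0.42
```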
2.4. Composite score normalization

Score normalization techniques that operate on different sample score-populations can be combined into two-step normalization techniques, where one technique is applied after the other. This two-step approach is not limited solely to parametric normalization techniques, but applies to non-parametric techniques as well. An example of such a two-step normalization technique can be found in the zt-norm [2], where the t-norm is applied on z-norm normalized scores. There are, however, also other examples, e.g., [11]. Formally, two-step normalization techniques can be defined based on the following mapping:

$$s' = \psi(s'') \quad \text{and} \quad s'' = \phi(s), \qquad (6)$$

where both ψ and φ denote score normalization techniques, as defined in Eq. (1), s″ denotes the normalized version of the input score s after the first normalization, and s′ stands for the final normalized score.

While it is common to combine parametric techniques into two-step procedures, the authors of [6] suggested combining non-parametric techniques. As we have emphasized in Section 1, such an approach is computationally very demanding and is, therefore, often not suitable for deployment in a biometric verification system. To overcome this shortcoming, parametric and non-parametric normalization techniques can be combined into a two-step composite normalization technique, as sketched below.
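In anticipation of the zt-norm instantiation described next, here is a hedged Python sketch of the composite idea: the non-parametric z-step is baked into a per-template look-up table off-line, and only the cheap parametric t-step of Eq. (2) runs at verification time. All names are ours, and z-normalizing the t-population with the claimed target's table is a simplification of the full zt-norm bookkeeping:

```python
import numpy as np
from scipy.stats import gaussian_kde, lognorm

rng = np.random.default_rng(0)

# --- Off-line (enrollment time): non-parametric z-norm step -----------------
z_scores = rng.normal(0.10, 0.20, 2000)   # target template vs. z-impostors
kde = gaussian_kde(z_scores)
target = lognorm(s=0.5, scale=1.0)        # log-normal target (mu=0, sigma=0.5)

# Tabulate Eq. (5) on a grid once and store the result with the template.
grid = np.linspace(z_scores.min() - 0.5, z_scores.max() + 0.5, 1024)
rho = np.clip([kde.integrate_box_1d(-np.inf, g) for g in grid], 1e-9, 1 - 1e-9)
z_lut = target.ppf(rho)

def z_step(s):
    """Run-time z-norm: simple look-up/interpolation, no KDE involved."""
    return np.interp(s, grid, z_lut)

# --- On-line (verification time): parametric t-norm step --------------------
t_scores = rng.normal(0.15, 0.25, 500)    # "live" probe vs. t-models
t_scores_z = z_step(t_scores)             # t-population passes the z-step first
def t_step(s):
    return (s - t_scores_z.mean()) / t_scores_z.std()   # Eq. (2)

s_hzt = t_step(z_step(0.42))              # composite zt-normalized score
```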
In the case of the zt-norm, this means that the first step of the normalization technique, which relates to the sample score-population created with the target template, is conducted in a non-parametric manner, while the second step, which relates to the sample score-population created with the "live" biometric sample, is conducted in a parametric manner. Since the pdf estimation procedure of the non-parametric z-norm step can be conducted off-line and the mapping in Eq. (5) can be implemented in the form of a look-up table, the first step of the composite normalization does not induce any computational burden at run time. The second step of the composite technique is identical to the parametric version of the zt-norm. Note that the proposed composite approach can be applied to any two-step normalization technique where the first step involves template-dependent sample score-populations, and is not limited only to the zt-norm studied in this paper.

3. EXPERIMENTS

3.1. Experimental database and setup

To assess the proposed composite score normalization technique, we make use of the second version of the Face Recognition Grand Challenge database (FRGCv2). Specifically, we use images from the database to conduct the most challenging of the experimental configurations defined within the FRGCv2 experimental protocol, namely, Experiment 4. Here, 12776 images taken in both controlled and uncontrolled conditions are available for training the recognition techniques to be evaluated. Once trained, a probe set of 8014 images taken in uncontrolled conditions and a target (gallery) set of 16028 images taken in controlled conditions are available for the actual evaluation. For the experiments we implement a basic feature extraction technique and use it in conjunction with a distance-based classifier to generate the 8014 × 16028 similarity (or matching score) matrix that forms the foundation for assessing our score normalization techniques². This setting more or less corresponds to a closed-set verification scenario.

It needs to be emphasized specifically at this point that the feature extraction technique, the matching procedure, as well as the achieved (baseline) verification performance are only of minor importance. Instead, we are interested in the relative performance gains obtained by applying different normalization techniques to the computed similarity scores. Thus, we implement a variant of Principal Component Analysis (PCA) [12] and a cosine-based similarity scoring procedure (a minimal sketch is given below). We set the PCA dimensionality to 500 for all experiments.

² Note that considering the entire 8014 × 16028 similarity matrix when producing performance metrics on the FRGCv2 database corresponds to the so-called all vs. all experimental scenario.
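For reference, a minimal sketch of the kind of PCA-plus-cosine pipeline used to produce the score matrix; this is our own illustrative implementation, not necessarily the exact PCA variant of [12] used in the paper:

```python
import numpy as np

def pca_cosine_score_matrix(train, gallery, probes, dim=500):
    """Project images onto `dim` leading PCA components learned on the
    training set and return the probes-by-gallery cosine similarity matrix.
    Inputs are row-wise vectorized face images (illustrative assumption)."""
    mu = train.mean(axis=0)
    # Leading principal components via SVD of the centered training data.
    _, _, Vt = np.linalg.svd(train - mu, full_matrices=False)
    W = Vt[:dim].T
    G = (gallery - mu) @ W
    P = (probes - mu) @ W
    # Cosine similarity: normalize rows to unit length, then take dot products.
    G /= np.linalg.norm(G, axis=1, keepdims=True)
    P /= np.linalg.norm(P, axis=1, keepdims=True)
    return P @ G.T    # e.g., an 8014 x 16028 matrix in the FRGCv2 setting
```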
Table 1. Quantitative comparison of the normalization techniques

Norm.      EER     VER@0.1FAR   VER@1FAR
No norm    0.437   0.024        0.075
Z norm     0.432   0.028        0.084
NZ norm    0.426   0.031        0.099
T norm     0.293   0.037        0.132
NT norm    0.269   0.037        0.159
ZT norm    0.263   0.026        0.146
NZT norm   0.238   0.030        0.180
3.2. Results and discussion

In our first series of verification experiments we implement three popular (parametric) normalization techniques and compare their performance to that of their non-parametric counterparts. Specifically, we implement the t-norm, the z-norm, and the two-step zt-norm, as well as their non-parametric equivalents, which will be denoted as nt-norm, nz-norm and nzt-norm in the remainder of the paper. Note that it would also be possible to conduct a tz-norm procedure (by applying the z-norm on t-normalized scores); however, since this would involve enormous amounts of computation for each test sample at run time, this option is not feasible.

Before we can implement the non-parametric normalization techniques, we have to select a target distribution for Eq. (4). Here, we follow the suggestion of [6] and select the log-normal distribution for the remapping, i.e.,

$$p_{s'}(s') = \frac{1}{s'\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln s' - \mu)^2}{2\sigma^2}\right),$$

with μ = 0 and σ = 0.5.

The results of this first series of experiments are shown in the form of Detection Error Tradeoff (DET) and Receiver Operating Characteristic (ROC) curves in Fig. 1. Here, the pair of graphs on the left shows a comparison of the baseline performance of the PCA technique (denoted as NO NORM) and the performance of the parametric and non-parametric versions of the z-norm. The pair of graphs in the middle shows the same comparison for the t-norm, while the pair on the right compares the performance of the baseline PCA technique and the parametric and non-parametric versions of the zt-norm. The same comparison can also be seen in Table 1, where a few characteristic error rates are tabulated for all assessed techniques. In the table, VER@0.1FAR stands for the verification rate at a false accept rate of 0.1%, VER@1FAR denotes the verification rate at a false accept rate of 1%, and EER stands for the equal error rate, i.e., the operating point at which the false accept and false reject rates are equal.

The first thing to notice from the presented results is that, except for the z-norms, where no significant improvement over the baseline performance was observed, all techniques resulted in significant performance gains. In general, the two-step normalization techniques outperformed the single-step ones, and the non-parametric normalization techniques consistently outperformed their parametric equivalents along most operating points of the DET and ROC curves. It has to be noted, nevertheless, that this improved performance comes at the expense of an increased computational complexity.

Now that the relative ranking of the normalization techniques is established, let us turn our attention to the second series of verification experiments. Here, we assess the performance of the proposed composite two-step zt-norm (denoted as hzt-norm in the figures) and compare it to the remaining two two-step procedures, i.e., the zt-norm and the nzt-norm, as well as to the baseline performance of PCA. The results of this series of experiments are again shown in the form of DET (on the left) and ROC (on the right) curves in Fig. 2. The composite zt-norm technique achieves a VER@0.1FAR of 4.8%, a VER@1FAR of 19.6%, and an EER of 24.4%. In comparison to the parametric two-step normalization technique, the composite procedure produces better verification results along all operating points of the DET and ROC curves. When compared to the non-parametric two-step normalization technique, the proposed composite procedure still results in competitive performance, even performing slightly better at the lower values of the FAR. This, however, is most likely a consequence of the characteristics of the database used in our experiments and not a general trend. All in all, the proposed composite normalization technique is highly competitive in terms of performance when compared to the non-parametric techniques, while exhibiting the same computational complexity as the parametric techniques.
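The operating points reported above can be read off empirical score distributions as follows; this is a small self-contained sketch with synthetic scores (our own code, not the evaluation tooling used in the paper):

```python
import numpy as np

def verification_metrics(genuine, impostor):
    """EER and verification rates (1 - FRR) at fixed FARs, computed from
    genuine and impostor score arrays (higher score = stronger match)."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))          # operating point where FAR ~ FRR
    eer = (far[i] + frr[i]) / 2.0
    def ver_at(target_far):
        return 1.0 - frr[np.argmin(np.abs(far - target_far))]
    return eer, ver_at(0.001), ver_at(0.01)   # EER, VER@0.1FAR, VER@1FAR

rng = np.random.default_rng(0)
eer, ver01, ver1 = verification_metrics(rng.normal(1.0, 0.5, 1000),
                                        rng.normal(0.0, 0.5, 10000))
print(f"EER={eer:.3f}, VER@0.1FAR={ver01:.3f}, VER@1FAR={ver1:.3f}")
```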
[Figure 1: three pairs of DET and ROC curves (axes: False Alarm probability / Miss probability and False Accept Rate / Verification Rate); panels: (a) Comparison NO-, Z-, NZ-norm; (b) Comparison NO-, T-, NT-norm; (c) Comparison NO-, ZT-, NZT-norm.]

Fig. 1. Effect of (parametric and non-parametric) score normalization on the verification performance - DET (left) and ROC (right) curves
[Figure 2: DET and ROC curves comparing NO NORM, ZT NORM, NZT NORM, and HZT NORM.]
Fig. 2. Comparison of the two-step techniques - DET (left) and ROC (right) curves

4. CONCLUSION

We have presented a novel composite scheme for conducting score normalization in face verification systems. The composite technique combines non-parametric and parametric score normalization techniques and makes the computations feasible by performing the non-parametric part off-line and only performing the parametric part at run time. The results of our experiments suggest that the proposed technique still outperforms the common parametric techniques, while exhibiting the same computational complexity.

5. REFERENCES

[1] R. Wallace, M. McLaren, C. McCool, and S. Marcel, "Cross-pollination of normalization techniques from speaker to face authentication using Gaussian mixture models," IEEE Transactions on Information Forensics and Security, vol. 7, no. 2, pp. 553–562, 2012.

[2] R. Auckenthaler, M. Carey, and H. Lloyd-Thomas, "Score normalization for text-independent speaker verification systems," Digital Signal Processing, vol. 10, no. 1–3, pp. 42–54, 2000.
[3] J. Fierrez-Aguilar, J. Ortega-Garcia, and J. Gonzalez-Rodriguez, "Target dependent score normalization techniques and their application to signature verification," IEEE Transactions on Systems, Man and Cybernetics - Part C, vol. 35, no. 3, pp. 418–425, 2005.

[4] N. Poh and J. Kittler, "Incorporating model-specific score distribution in speaker verification systems," IEEE Transactions on Audio, Speech and Language Processing, vol. 16, no. 3, pp. 594–606, 2008.

[5] J. Mariethoz and S. Bengio, "A unified framework for score normalization techniques applied to text-independent speaker verification," IEEE Signal Processing Letters, vol. 12, no. 7, pp. 532–535, 2005.

[6] V. Štruc, J. Žganec Gros, and N. Pavešić, "Non-parametric score normalization for biometric verification systems," in Proc. of ICPR, Tsukuba, Japan, 2012, pp. 2395–2399.

[7] F. Yang, S. Shan, B. Ma, X. Chen, and W. Gao, "Using score normalization techniques to solve the score variation problem in face authentication," Springer LNCS, vol. 3781, pp. 31–38, 2005.

[8] N. Poh and S. Bengio, "F-ratio client-dependent normalisation on biometric authentication tasks," in Proc. of the Int. Conference on Acoustics, Speech and Signal Processing, 2005, vol. 1, pp. 721–724.

[9] T. Ahonen and M. Pietikäinen, "Pixelwise local binary pattern models of faces using kernel density estimation," in Proc. of ICB '09, Alghero, Italy, 2009, vol. 1, pp. 52–61.

[10] R.C. Gonzalez and R.E. Woods, Digital Image Processing, 3rd Ed., Prentice Hall, New Jersey, 2008.

[11] N. Poh, A. Merati, and J. Kittler, "Adaptive client-impostor centric score normalization: A case study in fingerprint verification," in Proc. of BTAS, Washington, USA, 2009, pp. 1–7.

[12] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71–86, 1991.