Personal Identification and Verification: Fusion of Palmprint Representations

C. Poon, D.C.M. Wong and H.C. Shen*
Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong.
{carmenp, csdavid, helens}@cs.ust.hk
Abstract. This paper studies the accuracy and robustness of personal identification and verification systems in which palmprint is the only modality available or utilized. Three different representations of palmprint are fused at the score-level by the sum rule, and at the decision-level by weighted or majority votes. Results show that fusion at the score-level is easier to formulate and justify, and performs better than fusion at the decision-level. On a database of 340 subjects (10 samples/class), 10-fold and 2-fold cross-validation are accurate to 99.8% and 99.2% respectively. When operating as a verification system, it achieves a false acceptance rate of 0.68% while maintaining a false rejection rate of 5%.

Keywords. Biometrics; Multimodal system; Fusion; Palmprints; Personal identification; Personal identity verification
1 Introduction

Amongst the various biometrics used for personal identification or verification, palmprints are considered to have high uniqueness, fast processing time, and low cost [1],[2],[3]. Inkless palmprint images can be captured by CCD cameras [3],[4] or scanning technologies [5]. Typical methods for processing them are: directional line detection [1],[5],[6]; Fourier transform [4]; Karhunen-Loeve transform [7]; wavelet transform [8],[9]; Gabor filters [3]; and feature space dimensionality reduction by Fisher's linear discriminant [2]. However, the palmprint capturing devices currently available require users to lay their hands on a contact surface against pegs that fix the rotation and stretching of the palm. The surface requires frequent cleaning for hygienic, performance and security reasons. To overcome these shortfalls, we propose a contact-less palmprint capturing system that works without pegs (see Fig. 1).

In spite of the success of many single-modality biometric systems, fusion of multiple representations, matchers, or modalities to improve accuracy or flexibility is inevitably the trend of modern security systems [10]. Fusion can be carried out at different stages of an identification or verification process, which typically comprises the steps of: 1) feature extraction, 2) matching score calculation, and 3) decision making. Fusion at each stage is, correspondingly: 1) at the representation-level, where feature vectors are concatenated before being put forward to a classifier; 2) at the score-level, where the scores resulting from each independently classified feature vector are combined to derive the final decision; and 3) at the decision-level, where decisions are made based on each feature vector, from which the final conclusion is drawn.

In this paper, fusion at the score-level and decision-level of three representations of palmprint is studied. Fusing at the representation-level is not applicable in our case, since the varying lengths of our feature vectors make it difficult to scale each feature fairly. By fusing different representations of palmprint, it is hoped that the performance of biometric systems that use palmprint as one of their biometric modalities can be improved.

* Corresponding author: [email protected]

Fig. 1. Palmprint capturing system

Fig. 2. Schematic diagram of image alignment. Gaps between the fingers are used to minimize rotation (θ) and translation errors; the extracted palm region is marked
2 Methodology

The three representations utilize: 1) a 2D Gabor filter, 2) a set of oriented line detectors, and 3) the Haar wavelet transform. To extract features from a local spatial domain, palm images are either divided into a fixed number of overlapping rectangular blocks, or into fixed-size, non-overlapping sectors of elliptical half-rings.

The image of the hand, captured on a uniform, dark background, is identified by a statistically determined threshold. With reference to the gaps between the fingers, the palm is duly rotated and the maximum palm area is selected [11] (see Fig. 2). A shade model, constructed by applying a lowpass filter to the image, is subtracted from the original palm image to minimize the effects of non-uniform illumination.

2.1 2D Gabor Filter

The 2D Gabor filter family is inherited from the receptive field profiles encountered experimentally in cortical simple cells [12]. Its general functional form is:

G(x, y) = \exp\left( -\pi \left[ \frac{(x - x_o)^2}{\alpha^2} + \frac{(y - y_o)^2}{\beta^2} \right] \right) \cdot \exp\left( -2\pi i \left[ u_o (x - x_o) + v_o (y - y_o) \right] \right)    (1)
Previous studies suggest that a Gabor filter oriented at 135° is sufficient and best for capturing the features of a left-hand palmprint [3]. In this study, the spatial frequency f_o = \sqrt{u_o^2 + v_o^2} is set to respond best to lines 10 pixels wide, which corresponds to the width of the majority of palm lines found in our database. (α, β) specify the radial and angular bandwidths in the frequency domain, and are chosen to be [13]:

\alpha = \beta = \frac{3\sqrt{2 \ln 2}}{2\pi f_o}    (2)
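To make Eqs. (1) and (2) concrete, the sketch below builds such a filter in Python. The 15×15 kernel size and the 135° orientation follow the text; the mapping from a 10-pixel line width to the spatial frequency f_o, and centring the filter at the origin, are our illustrative assumptions.

```python
import numpy as np

def gabor_kernel(size=15, theta=np.deg2rad(135), line_width=10):
    """Build a complex 2D Gabor kernel following Eqs. (1)-(2).

    The mapping from line width to f_o (one half-cycle spanning a line)
    is our assumption; the paper only states that the filter is tuned
    to lines about 10 pixels wide.
    """
    f_o = 1.0 / (2 * line_width)          # assumed spatial frequency
    u_o = f_o * np.cos(theta)             # f_o = sqrt(u_o^2 + v_o^2)
    v_o = f_o * np.sin(theta)
    alpha = beta = 3 * np.sqrt(2 * np.log(2)) / (2 * np.pi * f_o)  # Eq. (2)

    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]  # (x_o, y_o) = (0, 0)
    envelope = np.exp(-np.pi * (x**2 / alpha**2 + y**2 / beta**2))
    carrier = np.exp(-2j * np.pi * (u_o * x + v_o * y))
    return envelope * carrier                        # Eq. (1)
```

The palm image would then be convolved with this kernel and the response binarized, as described next.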
After applying the 15×15 (in pixels) Gabor filter, the palm image is converted into a binary image using an adaptive threshold estimated from the histogram. The resultant image is divided into 11×11 overlapping rectangular blocks such that each rectangular block overlaps half of each of its adjacent blocks. The number of pixels detected as a line in each rectangular block forms the feature vector.

2.2 Detection of Directional Lines

A set of line detectors oriented from 0° to 165°, in steps of 15°, is used to extract features from palmprints. For each pixel, the algorithm selects the line detector that best matches the structural orientation in the local vicinity of the pixel-of-interest and stores the resultant magnitude in a separate map. Pixels with magnitude below an adaptive threshold, determined from the histogram of the magnitude map, are excluded, so that only representative features are retained.

The palm image is divided into non-overlapping elliptical half-rings, all centered at the same point, with the two axes of each ring double those of its immediate inner one [11] (see Fig. 3). The innermost ellipse is divided into 3 sectors, while each outer ring has 2 more sectors than its inner layer. For each sector, the percentages of pixels in each of the four orientation ranges, i.e. 0°-45°, …, 135°-180°, are used as features to represent the palmprint. The features are arranged such that those obtained from an inner layer precede those obtained from an outer layer (see Fig. 4). This ensures that when two feature vectors of unequal length are compared, a point-wise comparison is actually comparing the same spatial region of two different palm images.
Fig. 3. Palm image divided into sectors of elliptical half-rings

Fig. 4. Arrangement of the feature vector: features in Region 1 precede those in Region 2, Region 3, and Region 4
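A minimal sketch of the feature extraction in Sect. 2.2, under our own assumptions: the line-detector kernels (a zero-mean, one-pixel-wide line through the kernel centre at each orientation) and the mean-plus-one-standard-deviation threshold stand in for the detectors and histogram-derived threshold used in the paper, and sector_masks is assumed to encode the elliptical half-ring sectors of Fig. 3.

```python
import numpy as np
from scipy.ndimage import convolve

def directional_line_map(palm, n_orientations=12, ksize=9):
    """Per-pixel dominant line orientation (0..165 deg, steps of 15)."""
    angles = np.deg2rad(np.arange(n_orientations) * 15)
    half = ksize // 2
    responses = []
    for a in angles:
        kern = np.zeros((ksize, ksize))
        for t in range(-half, half + 1):            # rasterize a line at angle a
            r = int(round(half - t * np.sin(a)))
            c = int(round(half + t * np.cos(a)))
            kern[r, c] = 1.0
        kern -= kern.mean()                          # zero-mean detector
        responses.append(convolve(palm.astype(float), kern))
    responses = np.stack(responses)                  # (n_orient, H, W)
    magnitude = responses.max(axis=0)
    orientation = responses.argmax(axis=0) * 15      # degrees
    # adaptive threshold from the magnitude map; mean + std is our proxy
    keep = magnitude > magnitude.mean() + magnitude.std()
    return orientation, keep

def sector_features(orientation, keep, sector_masks):
    """Percentage of retained pixels in each 45-degree orientation band,
    per sector (sector_masks: boolean masks, inner layers first)."""
    feats = []
    for m in sector_masks:
        sel = orientation[m & keep]
        total = max(sel.size, 1)
        feats.extend([np.logical_and(sel >= lo, sel < lo + 45).sum() / total
                      for lo in (0, 45, 90, 135)])
    return np.array(feats)
```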
2.3 Haar Wavelet Transform

Palm images are decomposed by a single-level Haar wavelet transform. Smoothing masks are applied before and after the wavelet transform. Since it was found that most of the low-frequency components are attributable to the redness underneath the skin, and should preferably be excluded from the features used for identification, the magnitudes of pixels within one standard deviation are set to zero. The values of the remaining pixels are projected onto a logarithmic scale in order to minimize variations between two images:

I(x_i, y_i) = \begin{cases} 0, & \text{if } |I(x_i, y_i)| \le \operatorname{std}(I(x, y)) \\ \ln\left( |I(x_i, y_i)| - \operatorname{std}(I(x, y)) + 1 \right), & \text{if } |I(x_i, y_i)| > \operatorname{std}(I(x, y)) \end{cases}    (3)

where I(x, y) is the intensity of a detailed image. Each of the detailed images is divided into sectors of non-overlapping elliptical half-rings (see Sect. 2.2). The mean energy level of each sector forms the feature vector, arranged such that features extracted from an inner layer precede those from an outer layer (see Fig. 4).
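Assuming the PyWavelets package for the Haar decomposition, the following sketch applies Eq. (3) to the detailed images and pools the mean energy per sector. The sector masks and the factor-of-2 downsampling used to align them with the half-resolution detail images are our assumptions; the smoothing masks mentioned above are omitted.

```python
import numpy as np
import pywt  # PyWavelets

def haar_features(palm, sector_masks):
    """Single-level Haar decomposition, Eq. (3) log-rescaling of the
    detailed images, then mean energy per elliptical-half-ring sector."""
    _, details = pywt.dwt2(palm.astype(float), 'haar')   # (cH, cV, cD)
    feats = []
    for d in details:
        s = d.std()
        mag = np.abs(d)
        rescaled = np.where(mag <= s, 0.0, np.log(mag - s + 1.0))  # Eq. (3)
        # masks are defined on the full image; downsample by 2 to match dwt2
        for m in sector_masks:
            m2 = m[::2, ::2][:rescaled.shape[0], :rescaled.shape[1]]
            feats.append(rescaled[m2].mean() if m2.any() else 0.0)
    return np.array(feats)
```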
2.4 Distance Measures

For each representation, the score between two feature vectors is calculated as the mean of the absolute differences between them, to accommodate the varying dimensionality. If featureV_k represents the feature vector of N_k elements for the k-th image, the score between the i-th and the j-th images is given as:

\mathrm{Score}(i, j) = \frac{1}{\min(N_i, N_j)} \sum_{n=1}^{\min(N_i, N_j)} \left| \mathrm{featureV}_i(n) - \mathrm{featureV}_j(n) \right|    (4)
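Eq. (4) reduces to a few lines; because features are ordered inner-layer-first (Fig. 4), truncating both vectors to the shorter length compares the same spatial regions of the two palms. A sketch:

```python
import numpy as np

def score(feature_i, feature_j):
    """Eq. (4): mean absolute difference over the overlapping
    (inner-first) portion of two feature vectors of possibly
    unequal length."""
    n = min(len(feature_i), len(feature_j))
    return np.mean(np.abs(np.asarray(feature_i[:n]) - np.asarray(feature_j[:n])))
```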
3 Experiments

3.1 Analysis of Identification Systems

Fusion of the three methods is examined at the score-level and at the decision-level. Since the number of classes far exceeds the number of samples per class, a classifier that is not statistically based is chosen, namely the nearest-neighbor (1-NN). The system is tested with 10-fold and 2-fold cross-validation.

The sum rule is used when fusing at the score-level. The scores resulting from the three representations are first rescaled to a comparable scale by normalizing the genuine score distribution to zero mean and unit variance, based on the distribution estimated from the training set. The final score is the sum of the normalized scores.

The weighted vote scheme is examined when fusing at the decision-level. The classes are ranked in ascending order of scores, and the first three classes with maximum confidence are given weights w1, w2, and w3, respectively. The weight W is chosen randomly, while w2 and w3 are selected accordingly to satisfy:

w_1 = W, \quad \tfrac{1}{2} W < w_2 < W, \quad \tfrac{1}{3} W < w_3 < \tfrac{1}{2} W, \quad \text{and} \quad \tfrac{1}{2} w_2 < w_3 < w_2    (5)
We have chosen the three weights 0.9, 0.5, and 0.31. A slight variation of the weighted vote scheme, in which each representation method is allowed to cast three third-picks (i.e. 5 votes per representation), is also examined.

3.2 Analysis of Verification Systems

The sum rule is used to fuse the three representations at the score-level: the sum of the normalized scores of the three representations is analyzed. To fuse at the decision-level, nine images per class are used for training while the remaining one is reserved for testing. The score threshold at the equal error rate (EER) is obtained from the training set for each representation. A representation method accepts a test sample only if the resultant score is below its threshold. The overall system accepts or rejects the test sample based on the majority vote.
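The following sketch summarizes the two fusion schemes as we read them: z-normalization of each representation's scores by its genuine-score statistics followed by the sum rule, and the majority-vote acceptance used for verification. Function names and the (mean, std) bookkeeping are ours.

```python
import numpy as np

def normalize(scores, genuine_mean, genuine_std):
    """Rescale one representation's scores so that its genuine-score
    distribution (estimated on the training set) has zero mean and
    unit variance."""
    return (np.asarray(scores) - genuine_mean) / genuine_std

def sum_rule(score_lists, genuine_stats):
    """Score-level fusion: sum of the normalized scores.
    genuine_stats holds (mean, std) of genuine scores per
    representation, estimated from training data."""
    return sum(normalize(s, m, sd)
               for s, (m, sd) in zip(score_lists, genuine_stats))

def majority_vote_accept(test_scores, eer_thresholds):
    """Decision-level verification: each representation accepts when its
    score falls below its EER threshold; the claim is accepted on a
    majority of accepts."""
    votes = sum(s < t for s, t in zip(test_scores, eer_thresholds))
    return votes >= 2   # 2 of the 3 representations must agree
```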
4 Results

Hand images are captured at a resolution of 1280×960 pixels with 8-bit color. Ten images each of the left and right hands of 170 individuals were captured to form a database of 340 subjects (right hands are flipped about the vertical axis and stored as separate subjects).

4.1 Performance of Identification Systems

It is found that representing the palmprint by any one of the three representations results in accuracies that differ by less than 1% (see Table 1). On the other hand, classification rates can be improved by as much as 2.2% in 2-fold cross-validation with information fusion (see Table 1). 85% of the incorrectly identified images are the same regardless of the fusion level, suggesting that the two fusion rules presented in this study are correlated.

Table 1. Classification rates of the three algorithms, individually and fused

|         | 2D Gab. Filter | Dir. Lines | Haar Wavelet | Score-Level (Sum Rule) | Decision-Level (3 votes/Rep.) | Decision-Level (5 votes/Rep.) |
| 10-fold | 99.2%          | 99.3%      | 99.6%        | 99.8%                  | 99.7%                         | 99.8%                         |
| 2-fold  | 97.0%          | 97.3%      | 97.9%        | 99.2%                  | 98.8%                         | 99.0%                         |
4.2 Performance of Verification Systems

The Haar wavelet method performs notably better than the other two methodologies when the false rejection rate (FRR) is to be kept below 3% (genuine acceptance rate > 97%) (see Fig. 5). It is the only method that can maintain a 90% genuine acceptance rate when the false acceptance rate (FAR) is at least 0.68% (see Table 2).

Fig. 5. Receiver operating characteristic (ROC) curves (genuine acceptance rate vs. false acceptance rate) for score-level fusion, the Haar wavelet, the 2D Gabor filter, and directional lines

Table 2. EER, FAR, and FRR of the three algorithms, individually and fused (score thresholds in parentheses)

|                 | 2D Gab. Filter | Dir. Lines   | Haar Wavelet | Score-Level (Sum Rule) | Decision-Level |
| EER (threshold) | 4.7% (80.2)    | 6.5% (0.990) | 4.6% (1.91)  | 2.6% (4.22)            | 3%-3.5%        |
| FAR @ FRR=3%    | 8.6% (86.5)    | 19% (1.13)   | 9.8% (2.04)  | 2.0% (4.18)            | –              |
| FAR @ FRR=5%    | 4.3% (79.3)    | 10% (1.04)   | 4.0% (1.89)  | 0.72% (9.05)           | –              |
| FRR @ FAR=1.8%  | 8.8% (72.0)    | 12% (0.877)  | 7.1% (1.79)  | 3.2% (9.16)            | 4.5%           |
| FRR @ FAR=0.68% | 15% (65.2)     | 18% (0.806)  | 10% (1.69)   | 5.1% (9.04)            | –              |
Fusion at the score-level greatly improves the verification accuracy: the EER is reduced sharply to 2.6%. This corresponds to a 95% genuine acceptance rate when the FAR is 0.68% (see Fig. 5). When fusing at the decision-level, the majority vote of the three representations results in a 4.5% FRR at a 1.8% FAR. The EER is estimated to be around 3% to 3.5%. It must be noted that this estimate relies on three independent thresholds, so the true minimum EER is hard to obtain. We therefore suspect that fusion at the decision-level can hardly be justified, and that it is unlikely to perform better than fusion at the score-level.
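For concreteness, the EER values reported above can be reproduced from genuine and impostor score sets by sweeping a single decision threshold; the sketch below is our own formulation (the paper does not specify one) and assumes distance-like scores where lower means a better match.

```python
import numpy as np

def eer(genuine_scores, impostor_scores):
    """Approximate equal error rate and the threshold attaining it,
    for a distance-like score (lower = better match, as in Eq. (4))."""
    thresholds = np.unique(np.concatenate([genuine_scores, impostor_scores]))
    best_gap, best_eer, best_t = np.inf, None, None
    for t in thresholds:
        frr = np.mean(np.asarray(genuine_scores) > t)    # genuine rejected
        far = np.mean(np.asarray(impostor_scores) <= t)  # impostors accepted
        if abs(frr - far) < best_gap:
            best_gap, best_eer, best_t = abs(frr - far), (frr + far) / 2, t
    return best_eer, best_t
```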
5 Conclusion

In this paper, we have shown that fusing different representations of palmprint can result in a more accurate identification or verification system. Fusion at the score-level (by the sum rule) and fusion at the decision-level (by weighted or majority votes) agree with each other, but the former is easier to formulate and justify, and performs better. When the fused system is tested on a database of 340 subjects (10 samples per subject), 10-fold and 2-fold cross-validation result in classification rates of 99.8% and 99.2% respectively. When used as a verification system, it can operate at a false acceptance rate of 0.68% with an acceptable genuine acceptance rate of 95%; the equal error rate of the system is 2.6%.

For future development, we shall explore systems that combine the fusion of multiple representations of palmprints with other biometric modalities, so that optimal accuracy can be obtained while a certain degree of flexibility is retained for users.
Acknowledgement

The research work is supported by the Sino Software Research Institute (SSRI) grant from HKUST, grant number SSRI01/02.EG12.
References

1. Kumar, A., Wong, D.C.M., Shen, H.C., Jain, A.K.: Personal Verification using Palmprint and Hand Geometry Biometric. Proc. Int. Conf. AVBPA, Guildford, UK (2003) 668-678
2. Wu, X.Q., Zhang, D., Wang, K.Q.: Fisherpalms based palmprint recognition. Pattern Recognition Letters, Vol. 24(15) (2003) 2829-2838
3. Zhang, D., Kong, W.K., You, J., Wong, M.: Online Palmprint Identification. IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 25(9) (2003) 1041-1050
4. Li, W., Zhang, D., Xu, Z.: Palmprint identification by Fourier transform. Int. J. Pattern Recognition and Artificial Intelligence, Vol. 16(4) (2002) 417-432
5. Han, C.C., Cheng, H.L., Lin, C.L., Fan, K.C.: Personal authentication using palm-print features. Pattern Recognition, Vol. 36(2) (2003) 371-381
6. Zhang, D., Shu, W.: Two novel characteristics in palmprint verification: datum point invariance and line feature matching. Pattern Recognition, Vol. 32 (1999) 691-702
7. Lu, G., Zhang, D., Wang, K.: Palmprint recognition using eigenpalms features. Pattern Recognition Letters, Vol. 24(10) (2003) 1463-1467
8. Kumar, A., Shen, H.C.: Recognition of palmprints using wavelet-based features. Proc. Int. Conf. System and Cybernetics, SCI-2002, Orlando, Florida, Jul. (2002)
9. Wu, X.Q., Wang, K.Q., Zhang, D.: Wavelet Based Palm print Recognition. Proc. 1st Int. Conf. Machine Learning and Cybernetics, Vol. 3 (2002) 1253-1257
10. Ross, A., Jain, A.K.: Information fusion in biometrics. Pattern Recognition Letters, Vol. 24 (2003) 2115-2125
11. Poon, C., Wong, D.C.M., Shen, H.C.: A New Method in Locating and Segmenting Palmprint into Region-of-Interest. 17th Int. Conf. Pattern Recognition (submitted, Jan. 2004)
12. Daugman, J.G.: Complete Discrete 2-D Gabor Transforms by Neural Networks for Image Analysis and Compression. IEEE Trans. Acoustics, Speech, and Signal Processing, Vol. 36(7) (1988) 1169-1179
13. Kumar, A., Pang, G.: Fabric defect segmentation using multichannel blob detectors. Optical Engineering, Vol. 39(12) (2000) 3176-3190