Analysis of Partial Least Squares for Pose-Invariant Face Recognition

Mika Fischer∗, Hazım Kemal Ekenel∗,†, Rainer Stiefelhagen∗

∗ Karlsruhe Institute of Technology, Institute for Anthropomatics, Karlsruhe, Germany
† Istanbul Technical University, Faculty of Computer and Informatics, Istanbul, Turkey
Abstract
Face recognition across large pose changes is one of the hardest problems for automatic face recognition. Recently, approaches that use partial least squares (PLS) to compute pairwise pose-independent coupled subspaces have achieved good results on this problem. In this paper, we perform a thorough experimental analysis of the PLS approach for pose-invariant face recognition. We find that the use of different alignment methods can have a significant influence on the results. We propose a simple and consistent alignment method that is easily reproducible and uses only a few hand-tuned parameters. Further, we find that block-based approaches outperform those using a holistic face representation. However, we note that the size, positioning and selection of the extracted blocks have a large influence on the performance of PLS-based approaches, with the optimal sizes and selections differing significantly for different feature representations. Finally, we show that local PLS using simple intensity values performs almost as well as more sophisticated feature extraction methods like Gabor features for frontal gallery images; however, Gabor features perform significantly better with non-frontal gallery images. The achieved results exceed the previously reported results for the CMU Multi-PIE dataset on this task, with an average recognition rate of 90.1% when using frontal images as gallery and 82.0% when considering all pose pairs.
1. Introduction
One of the largest remaining problems in automatic face recognition is comparing faces with different poses reliably. The difficulty stems from the fact that a change in the pose of the face causes a highly non-linear transform of the 2D images that are usually used for automatic face recognition. Areas of the face become self-occluded, other areas might appear, and the same areas might have a very different appearance, for instance a frontal versus a profile view of a nose.

Figure 1. Overview of the steps of the different PLS approaches (alignment, block extraction, Gabor features, PLS): holistic PLS with intensity features, block-based PLS with intensity features, and block-based PLS with Gabor features

Many approaches have been proposed for face recognition across pose [10]. A successful approach to solve this problem is to reconstruct a 3D model of the face from the available 2D image [1]. However, this process is computationally expensive and relies on manual initialization, and is therefore not applicable to many real-world applications. Recently, approaches that seek to find pose-independent latent spaces have gained popularity: Prince et al. [6] use tied factor analysis to find a latent identity space for each pair of poses. Recognition is then done in a probabilistic fashion. Better results are reported when local blocks are used instead of the whole face image. Sharma et al. [8] use partial least squares (PLS) to find a latent space for each pair of poses. Recognition is then performed in this latent space using nearest neighbor matching. In [9], the authors extend their approach to generalized multi-view analysis, where they extend linear discriminant analysis (LDA) and marginal Fisher analysis (MFA) to the multi-view case. Li et al. [4] use linear regression to find regression coefficients, using a number of face images in the same pose as regressors. The assumption then is that the coefficients are independent of the pose. Here again, better results are reported when using local blocks instead of the whole face image.
In [3], the same authors employed a similar approach, which replaces the use of regression to find pose-independent feature vectors with the use of partial least squares (PLS), similarly to [8]. However, the authors apply PLS on local blocks of Gabor features instead of only the holistic face image, and report improved results when using them.

The aforementioned previous works indicate that the PLS approach is very promising for pose-invariant face recognition, since very good results can already be achieved with simple features. However, previous papers have only reported results with very specific parameters, which are often not stated in the paper itself, making the results hard to reproduce. This paper seeks to address this problem by performing extensive experiments illuminating the different factors that influence the performance of PLS-based approaches, from alignment to feature extraction.

Our contributions are the following: We first show that the alignment method can have a significant influence on the performance, and we propose a simple and consistent alignment approach that performs well for faces in all poses. We further show that in local approaches, the selection, location and size of the blocks is of high importance; as of yet, no analysis of this factor has been reported. Finally, we find that PLS on local blocks of intensity features performs on the same level as PLS on local blocks of Gabor features for frontal gallery images, while for non-frontal gallery images, Gabor features perform better, sometimes by a large margin. We report results exceeding the previous state-of-the-art results on the Multi-PIE database, with an average recognition rate of 90.1% when using frontal images as gallery and 82.0% when considering all pose pairs.

The remainder of the paper is structured as follows: In Section 2, we briefly introduce PLS. In Section 3, we present our approach and discuss its steps in detail. We present experimental results on the CMU Multi-PIE data set and thoroughly discuss the influence of the different parameters on the recognition performance in Section 4.
2. Partial least squares for pose-invariant face recognition
Partial least squares (PLS) is a statistical technique originally proposed as an alternative to ordinary least squares regression in the field of chemometrics [7]. It maps input vectors (regressors) x_i and corresponding output vectors (responses) y_i into a common vector space in such a way that the covariance between projected input vectors and projected output vectors is maximized. Formally, given a matrix X of input vectors and a matrix Y of output vectors, PLS finds basis vectors w and c so that

[cov(Xw, Yc)]² = max_{|r|=|s|=1} [cov(Xr, Ys)]²    (1)
Figure 2. Illustration of the alignment parameters w, h, y_eyes, y_mouth, d_eyes, and dx_mouth
The projections yield latent scores t and u:

t = Xw and u = Yc    (2)
Non-linear iterative partial least squares (NIPALS) [7] is a well-known method for iteratively computing PLS basis vectors. After N iterations, it yields projection matrices W = (w_1 · · · w_N) and C = (c_1 · · · c_N) containing the iteratively computed basis vectors. PLS was originally used as a regression method to predict output vectors y from input vectors x. However, the latent scores can also be used directly. In the particular case of pose-invariant face recognition, we consider vectors from one pose p_0 as X and vectors of the same faces from a different pose p_1 as Y. We then use PLS to compute projections W and C that maximize the covariance of the latent scores over the data. This property of the projections can then be exploited to recognize face images in different poses by applying face recognition methods on the pose-independent latent vectors x̃ = Wᵀx and ỹ = Cᵀy instead of directly on the pose-dependent input vectors x and y.
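For reference, the following is a minimal NumPy sketch of a NIPALS-style PLS computation as described above. The function name, the stopping criterion and the symmetric ("mode A") deflation step are our assumptions; the GPU implementation used in our experiments may differ in such details.

```python
import numpy as np

def nipals_pls(X, Y, n_components, max_iter=500, tol=1e-10):
    """Compute PLS projection matrices W, C so that XW and YC have maximal
    covariance. X: (n, d_x), Y: (n, d_y), both assumed to be standardized
    per dimension (zero mean, unit variance)."""
    X, Y = X.astype(float).copy(), Y.astype(float).copy()
    W = np.zeros((X.shape[1], n_components))
    C = np.zeros((Y.shape[1], n_components))
    for k in range(n_components):
        u = Y[:, 0]                                   # initial guess for the Y-score
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)
            t = X @ w                                 # X-score
            c = Y.T @ t
            c /= np.linalg.norm(c)
            u_new = Y @ c                             # Y-score
            if np.linalg.norm(u_new - u) < tol * (np.linalg.norm(u_new) + 1e-12):
                u = u_new
                break
            u = u_new
        # Deflate X and Y by the parts explained by the current scores
        # (symmetric deflation; other PLS variants deflate differently).
        X -= np.outer(t, X.T @ t) / (t @ t)
        Y -= np.outer(u, Y.T @ u) / (u @ u)
        W[:, k], C[:, k] = w, c
    return W, C

# Pose-independent latent vectors for matching:
#   x_latent = W.T @ x    and    y_latent = C.T @ y
```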
3. Description of the analyzed approach
In this section we present the different steps of the analyzed PLS approach: alignment, extraction of local blocks, extraction of Gabor features, partial least squares and closed-set identification. An overview of these steps is shown in Figure 1.
3.1. Alignment
Most previous works on face recognition across pose use alignment methods that rely at some point on pose-specific details that have to be supplied by a human. For instance, Sharma et al. [8] use manually specified target points for the facial landmarks in the aligned image, which differ for all poses. Li et al. [3, 4] annotated occluded landmarks (e.g. an eye that is not visible due to the face pose) by estimating the position of the landmark. Such approaches have several disadvantages. In the first case, the alignment is specific to the poses in the database; for different poses, new points have to be specified by hand. In the second case, the alignment relies on annotations of invisible facial landmarks, which cannot be detected automatically and have to be annotated by a human.
Therefore, we propose a consistent and easily reproducible alignment method which works for all pose angles (pan and tilt) and gives a consistent scale and rotation of the face. The method is parametrized by the width w and height h of the aligned image, the vertical positions of the point between the eyes, y_eyes, and of the mouth, y_mouth, as well as the eye distance in the frontal view, d_eyes, and the horizontal offset of the mouth from the point between the eyes in the profile view, dx_mouth. These parameters are illustrated in Figure 2. For a face image that is to be aligned, we require the positions of the visible eyes, s_l.eye and/or s_r.eye, and the mouth center s_mouth in the source image. We then compute a similarity transform T by specifying two point correspondences between the source image and the aligned image. In contrast to previous approaches, the target points are computed automatically based on a simple geometrical model of the head. The first point correspondence is always the mouth center. Let x_center = (w − 1)/2 be the horizontal center of the aligned image, φ be the pan angle and ψ be the tilt angle. The positions in the source and aligned image are computed as follows:

s_1 = s_mouth,   t_1 = (x_center + dx_mouth · sin(φ), y_mouth)ᵀ    (3)

In the case that both eyes are visible, the point between the eyes is used for the second correspondence:

s_2 = (s_l.eye + s_r.eye) / 2,   t_2 = (x_center, y_eyes)ᵀ    (4)

In the case that only one of the eyes is visible, the visible eye s_v.eye is used for the second correspondence:

s_2 = s_v.eye,   t_2 = (x_center ± 0.5 · d_eyes · cos(φ), y_eyes)ᵀ    (5)

with the sign depending on which eye is visible. For tilted faces, this method would scale the faces incorrectly. Therefore, a compensation factor is computed as

f_tilt = 1 − cos(ψ)    (6)
and the distance between y_eyes and y_mouth is shortened by that factor by moving the target points closer together in the vertical direction. This ensures that faces always have the same scale in all poses.
The homogeneous transformation matrix T is computed by solving the system of linear equations given by the two point correspondences s′_1 = T t′_1 and s′_2 = T t′_2, where u′ = (u_x u_y 1)ᵀ denotes the homogeneous coordinates of u, for u ∈ {s_1, s_2, t_1, t_2}. In our experiments, we use the following alignment parameters: w = 104, h = 128, y_eyes = 42, y_mouth = 106, d_eyes = 62, and dx_mouth = 20. Example face images in different poses aligned using these parameters are shown in Figure 3.
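As an illustration, a small sketch of the target-point computation (Eqs. 3 to 6) and of solving for the similarity transform from the two correspondences is given below. How the tilt compensation of Eq. (6) is split between the two target points and the sign convention of Eq. (5) are our assumptions, not details fixed by the text.

```python
import numpy as np

def target_points(phi, psi, both_eyes_visible, left_eye_visible,
                  w=104, y_eyes=42, y_mouth=106, d_eyes=62, dx_mouth=20):
    """Target points t1 (mouth) and t2 (eye point) in the aligned image."""
    x_center = (w - 1) / 2.0
    f_tilt = 1.0 - np.cos(psi)                        # Eq. (6)
    # Shorten the eyes-mouth distance by f_tilt, moving both targets
    # towards each other (the even split is an assumption).
    shift = 0.5 * f_tilt * (y_mouth - y_eyes)
    y_e, y_m = y_eyes + shift, y_mouth - shift
    t1 = np.array([x_center + dx_mouth * np.sin(phi), y_m])              # Eq. (3)
    if both_eyes_visible:
        t2 = np.array([x_center, y_e])                                   # Eq. (4)
    else:
        sign = -1.0 if left_eye_visible else 1.0                         # assumed sign convention
        t2 = np.array([x_center + sign * 0.5 * d_eyes * np.cos(phi), y_e])  # Eq. (5)
    return t1, t2

def similarity_from_correspondences(t_pts, s_pts):
    """Solve for the 2x3 similarity transform T with s' = T t', given two
    target points t_pts and two source points s_pts (2x2 arrays)."""
    (x1, y1), (x2, y2) = t_pts
    A = np.array([[x1, -y1, 1, 0],
                  [y1,  x1, 0, 1],
                  [x2, -y2, 1, 0],
                  [y2,  x2, 0, 1]], dtype=float)
    a, b, tx, ty = np.linalg.solve(A, np.asarray(s_pts, dtype=float).ravel())
    return np.array([[a, -b, tx],
                     [b,  a, ty]])
```

The aligned image can then be produced by warping the source image with this transform (or its inverse, depending on the warping routine used).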
3.2. Gabor features
Gabor features have been successfully used for face recognition, and in particular in conjunction with PLS [3]. We use the Gabor formulation from [5]:

ψ_{µ,ν}(z) = (‖k_{µ,ν}‖² / σ²) · exp(−‖k_{µ,ν}‖² ‖z‖² / (2σ²)) · [exp(i k_{µ,ν} z) − exp(−σ²/2)]    (7)

with

k_{µ,ν} = (k_max / f^ν) · exp(i π µ / 8)    (8)

to compute Gabor filters of five scales and eight orientations with the parameters f = √2, k_max = π/2 and σ = 3π/2, which leads to Gabor filter responses similar to the ones shown in [3]. In order to avoid border effects, we extract the local blocks from Gabor responses which we compute on the whole aligned image, including an additional padding of 64 pixels.
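A possible NumPy construction of the filter bank of Eqs. (7) and (8) is sketched below. The kernel support size and the use of the magnitude of FFT-based convolution responses are assumptions on our part rather than details given in the text.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(mu, nu, size=33, k_max=np.pi / 2, f=np.sqrt(2), sigma=1.5 * np.pi):
    """Gabor kernel psi_{mu,nu}(z) of Eq. (7) with k_{mu,nu} from Eq. (8),
    sampled on a size x size grid (the support size is an assumption)."""
    k = (k_max / f ** nu) * np.exp(1j * np.pi * mu / 8.0)       # Eq. (8)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    z = x + 1j * y                                              # pixel position as complex number
    k2, z2 = np.abs(k) ** 2, np.abs(z) ** 2
    envelope = (k2 / sigma ** 2) * np.exp(-k2 * z2 / (2 * sigma ** 2))
    carrier = np.exp(1j * (np.conj(k) * z).real) - np.exp(-sigma ** 2 / 2.0)
    return envelope * carrier                                   # Eq. (7)

def gabor_responses(image):
    """Responses for 5 scales x 8 orientations; whether the magnitude or the
    complex response is used downstream is an assumption here."""
    return [np.abs(fftconvolve(image, gabor_kernel(mu, nu), mode="same"))
            for nu in range(5) for mu in range(8)]
```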
3.3. Extraction of local blocks
Local face representations have been shown to generally perform much better than holistic representations. Therefore, we extract local blocks around facial landmarks as our local face representation, similar to [3]. Since the only publicly available annotations for the Multi-PIE database include eye centers, nose tips and mouth centers, we use local blocks around those facial landmarks in our experiments. We extract blocks of size w_b × h_b centered on the annotated landmarks in the aligned image.
In non-frontal poses, local blocks extracted around the nose tip and mouth center include more and more of the background. This is not desirable because, in general, information about the background will not help the identification. Therefore, we discard the block of the eye that is further away from the camera in non-frontal poses. Further, instead of just centering the blocks on the facial landmarks, we introduce a horizontal offset ∆_b for the nose and mouth, which moves these blocks further into the face depending on the pan angle of the face, thereby reducing the amount of background pixels in the block. The offset is computed as

∆_b = −f_b · w_b · sin(φ)    (9)

where f_b is a parameter for b ∈ {nose, mouth}. In the experiments, we use f_mouth = 0.35 and f_nose = 0.2, unless stated otherwise. Examples of the block sizes and locations used in this paper are shown in Figure 3.

Figure 3. Examples of aligned face images in various poses as used for the holistic intensity features (first row), blocks used for local intensity features (second row) and blocks used for local Gabor features (third row)
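A sketch of the block extraction with the pan-dependent offset of Eq. (9) might look as follows; the handling of blocks that would reach outside the image is an assumption.

```python
import numpy as np

def extract_block(image, landmark_xy, pan, size=64, f_b=0.35):
    """Cut a size x size block around a landmark in the aligned image,
    shifted horizontally into the face according to Eq. (9)."""
    dx = -f_b * size * np.sin(pan)                    # Eq. (9): Delta_b
    cx = int(round(landmark_xy[0] + dx))
    cy = int(round(landmark_xy[1]))
    half = size // 2
    x0, y0 = cx - half, cy - half
    h, w = image.shape[:2]
    if x0 < 0 or y0 < 0 or x0 + size > w or y0 + size > h:
        return None                                   # block leaves the image (assumed behavior)
    return image[y0:y0 + size, x0:x0 + size]
```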
3.4. Partial least squares
For each pair of poses, we use partial least squares to compute a latent space for all blocks that are present in both poses. We use a custom GPU implementation of the NIPALS algorithm [7] to compute the PLS bases. The input vectors are always transformed so that each input dimension has a mean of zero and a standard deviation of one over the training data. The test data is transformed using the mean and standard deviation computed during training. In case of pose pairs where one pose has a negative pan angle and the other a positive one, the eyes cannot be used at all, because in one pose only the right eye is visible, and in the other pose only the left eye. However, we assume that the eyes are sufficiently symmetric and therefore we train a PLS latent space for opposite eyes if no matching eyes are available in the pose pair.
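Putting the pieces together, per-pose-pair training of one block could look roughly like the sketch below, reusing the `nipals_pls` sketch from Section 2; the dictionary layout and the small epsilon guarding against zero variance are our own choices.

```python
import numpy as np

def train_block_pls(blocks_p0, blocks_p1, n_components=40, eps=1e-8):
    """blocks_p0, blocks_p1: (n_subjects, d) matrices holding the same block
    for the same subjects in the two poses of the pair."""
    mu0, sd0 = blocks_p0.mean(axis=0), blocks_p0.std(axis=0) + eps
    mu1, sd1 = blocks_p1.mean(axis=0), blocks_p1.std(axis=0) + eps
    W, C = nipals_pls((blocks_p0 - mu0) / sd0, (blocks_p1 - mu1) / sd1, n_components)
    # Keep the training statistics so that test data can be standardized
    # with the same mean and standard deviation.
    return {"W": W, "C": C, "mu0": mu0, "sd0": sd0, "mu1": mu1, "sd1": sd1}
```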
3.5. Closed-set identification
We compute the distance between two face images in different poses by first extracting the blocks for both images as described in the previous sections. For those pairs of blocks that have a trained PLS for the pose pair, we project the blocks into the latent space and compute the distance between the latent vectors. The block-wise distances are then averaged to yield the global distance between the two face images. These distances are then used to perform closed-set face identification using the nearest neighbor method. We use the L2-distance in our experiments using intensity values and the cosine distance in our experiments using Gabor features: d_cos(x, y) = 1 − x · y / (‖x‖‖y‖). These distance metrics gave the best results in the experiments.
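For closed-set identification, a minimal sketch of the block-wise latent-space matching could be as follows; names such as `models` and the dictionary structure follow the training sketch above and are our assumptions.

```python
import numpy as np

def cosine_distance(x, y):
    return 1.0 - float(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

def identify(probe_blocks, gallery_blocks, models, dist=cosine_distance):
    """probe_blocks: {block_name: vector} in pose p1;
    gallery_blocks: {subject_id: {block_name: vector}} in pose p0;
    models: {block_name: trained PLS dict} for this pose pair."""
    def face_distance(ref_blocks):
        d = []
        for name, m in models.items():
            if name in probe_blocks and name in ref_blocks:
                x = m["W"].T @ ((ref_blocks[name] - m["mu0"]) / m["sd0"])
                y = m["C"].T @ ((probe_blocks[name] - m["mu1"]) / m["sd1"])
                d.append(dist(x, y))
        return float(np.mean(d))
    # Nearest-neighbor decision over all gallery subjects.
    return min(gallery_blocks, key=lambda sid: face_distance(gallery_blocks[sid]))
```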
4. Experiments
4.1. Data set
We performed experiments on the CMU Multi-PIE [2] database, which is the largest face database containing images of subjects in a large number of controlled poses. The database contains 337 subjects, recorded during four sessions in 15 poses, under 19 different illumination conditions and with several facial expressions. We only use the images with ambient illumination (illumination 0) and neutral expression. Further, we only use the images from session 1, which contains 249 subjects; each subject has a single image per pose. We used the publicly available labels provided by the authors of [8], with a small number of errors corrected manually (the corrected labels are available at http://cvhci.anthropomatik.kit.edu/~mfischer/research/pls-analysis/). For the experiments, the dataset is split into a training set, which is used to estimate the PLS latent spaces, and an evaluation set, which is used to perform closed-set identification. We follow [3, 4] and use the first 100 subjects for training and the next 149 subjects for closed-set identification. We also compare our results to the ones reported in [3, 4], since the experimental conditions are very similar; in particular, the authors also assume that the poses of the faces and the locations of the eyes, nose and mouth are known.
4.2. Metric
We use all images from one pose as gallery set and the images from all other poses as probe set and perform closed-set identification. This yields a correct recognition rate for every pair of poses. For brevity, in most cases only the results for the frontal gallery pose are reported; when a single performance indicator is needed, we use the average of the recognition rates obtained with the frontal pose as gallery.
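The evaluation protocol can be summarized by the following sketch; the function names and data layout (`get_blocks`, the `identify` function from the previous section) are illustrative assumptions.

```python
def recognition_rate(gallery_pose, probe_pose, eval_ids, get_blocks, models):
    """Fraction of probe images of probe_pose that are assigned to the
    correct subject when all images of gallery_pose form the gallery."""
    gallery = {sid: get_blocks(sid, gallery_pose) for sid in eval_ids}
    correct = sum(identify(get_blocks(sid, probe_pose), gallery, models) == sid
                  for sid in eval_ids)
    return correct / len(eval_ids)

# Averaging over all probe poses with the frontal gallery, or over all
# gallery/probe pairs, yields the single-number summaries reported below.
```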
4.3. Baseline: holistic PLS
In order to establish a baseline for the rest of the experiments, we first perform PLS using the whole face image as input. The parameter to optimize in this case is the number of PLS bases used. Since the input dimensionality of 128 × 104 = 13312 is computationally intractable, we downscale the images by different factors before applying PLS. The results for different factors and different numbers of PLS bases are shown in Figure 4. It can be seen that the influence of the number of PLS bases is small after about 30 bases and that downscaling the images reduces the accuracy only moderately.

Figure 4. Results with holistic intensity input vectors for different image sizes (64x52, 32x26, 16x13) and different numbers of PLS basis vectors

Best parameters The best results for the holistic approach are achieved with an image size of 64 × 52 and 50 PLS basis vectors. The results over all poses when using the frontal pose as gallery are shown in Figure 10. The average recognition rate for frontal gallery faces is 83.4%, and the average recognition rate over all pose pairs is 64.7%. Note that these results are significantly better than the results reported in [3] for the holistic case.
4.4. Block-based PLS
For the block-based PLS approach, we first use local blocks of intensity values directly. The additional parameters here are the size of the local blocks and the offset for the mouth block. The nose block was not used, as it consistently reduced the results. We used quadratic blocks of fixed sizes for all landmarks. The results are shown in Figure 5. As can be seen, the effect of the number of PLS basis vectors is even smaller than in the holistic case. Therefore, for the further experiments, we used a fixed number of 40 PLS bases. We also see that for intensity features, the largest block size gives the best results. The blocks are shown in Figure 3.

Figure 5. Results with local intensity block input vectors for different block sizes (64x64, 48x48, 32x32) and different numbers of PLS basis vectors

Figure 6. Effects of the mouth offset parameter (values from 0 to 0.75; pose angles with an asterisk indicate poses with a tilt angle of about 30 degrees)

Effects of mouth block position To show the effects of the position of the mouth block, we varied the mouth offset parameter fmouth (cf. Section 3.3). Since the input dimensionality of 64 × 64 = 4096, while not completely infeasible, severely slowed down the computation of the PLS
bases, we downscaled the 64 × 64 pixel blocks to 32 × 32 pixels, which caused only a very small drop in performance. The effects of the location of the mouth block are shown in Figure 6. It is clear that the location of the mouth block has a significant influence on the results: the best results are achieved when the amount of background pixels in the mouth block is minimized while still keeping the outline of the mouth visible (cf. Figure 3). The differences are larger for large pan angles. For all further experiments, we use a fixed mouth offset parameter fmouth = 0.35, which gave the best overall performance.
Effects of alignment To illustrate the influence of the alignment method on the results, we performed the same experiment using the proposed alignment and the alignment that was used in [9], which uses hand-marked target points for each pose and computes an affine alignment using eye centers, nose tip and mouth center. Local blocks of intensity features were used, with a block size of 64×64 pixels for the proposed alignment and a block size of 90×90 pixels for the
other alignment, because the distance between the eyes and the mouth is 64 pixels in our aligned images and 90 pixels in the other ones. The local blocks were downscaled to 64 × 64 pixels if necessary. The results in Figure 7 show that the alignment can have a significant effect on the results, even when using the same labels and the exact same approach, yielding differences of over 10% for some poses. Unfortunately, we cannot compare against the alignment used in [3], since neither the labels used, nor the code, nor a detailed description of the alignment method are available. However, from the results reported in Figure 10, it seems that tilted faces are not handled optimally. Further, the large difference between our holistic results and the ones reported in [3] suggests a general problem with their alignment. This shows that results obtained using different alignment methods are not easily comparable, which can only be avoided by using reproducible methods and publicly available annotations.

Figure 7. Effects of different alignment methods (proposed alignment vs. the alignment from [9]; pose angles with an asterisk indicate poses with a tilt angle of about 30 degrees)

Best parameters The best results with local blocks of intensity features are achieved with blocks of size 64 × 64 pixels, without downscaling and with 40 PLS basis vectors, while using a mouth offset of fmouth = 0.35 and not including the nose block at all. The results over all poses when using the frontal pose as gallery are shown in Figure 10. The average recognition rate for frontal gallery faces is 89.0%. The results for all pose pairs are shown in Table 1; the average recognition rate over all pose pairs is 73.3%. It can be seen that the results using local blocks significantly outperform the holistic approach. Interestingly, these results are already better for most poses than the results reported in [3] using Gabor features. For large pose angles and tilted poses, the difference in performance is quite large.
4.5. Gabor features
Similarly to Section 4.4, we first investigate the effect of the block size on the performance. To keep the input dimensionality manageable, we downscale all Gabor
responses to 7 × 7 pixels, thereby reducing the dimensionality to 7 × 7 × 5 × 8 = 1960. We also investigate the effect of the nose block on the performance. As can be seen in Figure 8, in contrast to the intensity features, the inclusion of the nose block generally increases the performance with Gabor features. Furthermore, the optimal block size is significantly smaller than in the case of intensity features. The best results are achieved using a block size of 48 × 48 pixels and when including the nose block with an offset factor fnose = 0.2. The locations and sizes of these blocks for the different poses are illustrated in Figure 3.

Figure 8. Effects of the block size (24x24 to 64x64) and nose offset parameter (no nose, 0, 0.2, 0.6) with Gabor features

Effect of Gabor parameters Note that these results are significantly worse than the results using blocks of intensity features in almost all poses. However, so far we only used one fixed set of parameters for the Gabor filters. In particular, the scale parameter kmax of the Gabor filters is highly dependent on the size of the relevant structures in the image. To ensure that we use a reasonable set of parameters, we performed a grid search over the parameters kmax and σ of the Gabor filters. As shown in Figure 9, the parameters kmax = π/2 and σ = 1.5π are indeed not optimal, and the results can be significantly improved by choosing more suitable values.

Figure 9. Influence of the Gabor parameters kmax and σ (curves for σ = 0.5π, 1.0π, 1.5π, 2.0π over π/kmax from 0 to 5)

Best parameters The best results are achieved with a block size of 48 × 48 pixels, the block offset parameters fnose = 0.2 and fmouth = 0.35, the Gabor parameters kmax = π/4.25 and σ = π, and 40 PLS basis vectors. The results over all poses when using the frontal pose as gallery are shown in Figure 10. The average recognition rate for frontal gallery faces is 90.1%. The results for all pose pairs are shown in Table 2; the average recognition rate over all pose pairs is 82.0%. It can be seen that with the optimized Gabor parameters, the representation using Gabor features outperforms the one using intensity features. However, as
can be seen in Figure 10, in some poses intensity features perform slightly better, and in general the performance difference is much smaller than the difference reported in [3]. However, for non-frontal gallery poses, Gabor features perform significantly better than intensity features.
4.6. Comparison of the different approaches
Figure 10 shows the results of the different analyzed approaches with their best parameter sets when the frontal pose is used as gallery. As can be seen, the local approaches clearly outperform the holistic approach. However, the difference between intensity features and Gabor features is quite small for the frontal gallery pose. The results for block-based PLS using intensity and Gabor features for all pairs of poses are shown in Tables 1 and 2, respectively. Gabor features only outperform intensity features significantly when non-frontal poses are used as gallery. Using local Gabor features, we achieve a correct recognition rate of 82.0% when all pose pairs are considered, and 90.1% when only the frontal gallery pose is considered. These results exceed the previously reported results on the Multi-PIE database for this task.
5. Conclusions
In this paper, we have presented an extensive experimental analysis of the PLS approach to pose-invariant face recognition. We found that the alignment of the face images has a large influence on the results, an aspect that is often neglected in works on face recognition across pose changes. We proposed a simple and consistent alignment method that is easily reproducible and works well on face images at all pose angles, with only a small number of parameters. Further, we showed that the tested local approaches outperform the holistic one. However, we found that the size, location and selection of the local blocks is essential for good performance, and that these parameters also differ significantly between feature representations. Finally, we showed that local PLS using simple intensity values performs almost as well as local PLS using Gabor features in the case of frontal gallery faces, while Gabor features outperform intensity features significantly when non-frontal faces are used as gallery. Our best results exceed the previously reported results for the CMU Multi-PIE dataset on this task, reaching an average recognition rate of 90.1% when using frontal faces as gallery images, and 82.0% when considering all possible pose pairs.
6. Acknowledgements
This work was partially funded by OSEO, French State agency for innovation, as part of the Quaero Programme, and by the "Concept for the Future" of Karlsruhe Institute of Technology within the framework of the German Excellence Initiative, and by the German Federal Ministry of Education and Research (BMBF) under contract no. 01ISO9052E.
References
[1] V. Blanz and T. Vetter. Face recognition based on fitting a 3D morphable model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):1063–1074, Sept. 2003.
[2] R. Gross, I. Matthews, J. Cohn, T. Kanade, and S. Baker. Multi-PIE. Image and Vision Computing, 28(5):807–813, May 2010.
[3] A. Li, S. Shan, X. Chen, and W. Gao. Cross-pose face recognition based on partial least squares. Pattern Recognition Letters, 32(15):1948–1955, Nov. 2011.
[4] A. Li, S. Shan, and W. Gao. Coupled bias-variance tradeoff for cross-pose face recognition. IEEE Transactions on Image Processing, 21(1):305–315, Jan. 2012.
[5] C. Liu and H. Wechsler. Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition. IEEE Transactions on Image Processing, 11(4):467–476, 2002.
[6] S. J. D. Prince, J. H. Elder, J. Warrell, and F. M. Felisberti. Tied factor analysis for face recognition across large pose differences. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(6):970–984, June 2008.
[7] R. Rosipal and N. Krämer. Overview and recent advances in partial least squares. In Proc. Subspace, Latent Structure and Feature Selection: Statistical and Optimization Perspectives Workshop, Bohinj, Slovenia, 2005.
[8] A. Sharma and D. W. Jacobs. Bypassing synthesis: PLS for face recognition with pose, low-resolution and sketch. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2011.
[9] A. Sharma, A. Kumar, H. Daume III, and D. W. Jacobs. Generalized multiview analysis: A discriminative latent space. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Providence, USA, 2012.
[10] X. Zhang and Y. Gao. Face recognition across pose: A review. Pattern Recognition, 42(11):2876–2896, Nov. 2009.
Figure 10. Comparison of the analyzed approaches for the frontal gallery pose: holistic PLS, PLS with intensity blocks, PLS with Gabor blocks, holistic PLS from [3], PLS with Gabor blocks from [3], holistic regression from [4], and regression with Gabor blocks from [4] (pose angles with an asterisk indicate poses with a tilt angle of about 30 degrees)
Gallery (ID) \ Probe | -90 | -75 | -60 | -45* | -45 | -30 | -15 | 0 | 15 | 30 | 45 | 45* | 60 | 75 | 90 | Avg.
-90 (110) | - | 77.9 | 61.7 | 43.0 | 47.0 | 43.0 | 38.3 | 40.3 | 34.2 | 30.2 | 32.2 | 25.5 | 39.6 | 39.6 | 38.3 | 42.2
-75 (120) | 85.9 | - | 93.3 | 77.2 | 81.2 | 73.2 | 69.1 | 68.5 | 59.7 | 64.4 | 52.3 | 47.0 | 57.0 | 69.8 | 51.0 | 67.8
-60 (090) | 77.9 | 94.6 | - | 84.6 | 92.6 | 87.9 | 83.2 | 81.2 | 67.8 | 72.5 | 68.5 | 57.7 | 69.8 | 68.5 | 48.3 | 75.4
-45* (081) | 53.7 | 82.6 | 90.6 | - | 87.2 | 85.9 | 81.9 | 85.9 | 68.5 | 63.1 | 61.7 | 75.2 | 57.7 | 57.7 | 38.9 | 70.8
-45 (080) | 66.4 | 87.2 | 94.0 | 88.6 | - | 96.6 | 93.3 | 96.0 | 81.2 | 79.2 | 72.5 | 62.4 | 75.2 | 63.1 | 44.3 | 78.6
-30 (130) | 66.4 | 81.2 | 91.9 | 85.9 | 96.0 | - | 94.6 | 94.0 | 87.2 | 84.6 | 75.8 | 62.4 | 68.5 | 62.4 | 49.0 | 78.6
-15 (140) | 63.1 | 85.2 | 91.9 | 85.2 | 96.6 | 99.3 | - | 99.3 | 91.9 | 92.6 | 82.6 | 69.8 | 72.5 | 68.5 | 53.0 | 82.3
0 (051) | 62.4 | 81.2 | 89.9 | 85.9 | 94.0 | 100 | 100 | - | 100 | 99.3 | 96.6 | 89.9 | 91.3 | 87.9 | 67.1 | 89.0
15 (050) | 45.0 | 70.5 | 81.2 | 64.4 | 88.6 | 90.6 | 96.0 | 100 | - | 100 | 98.0 | 93.3 | 94.6 | 91.3 | 71.1 | 84.6
30 (041) | 39.6 | 64.4 | 76.5 | 61.7 | 83.9 | 90.6 | 85.2 | 99.3 | 99.3 | - | 99.3 | 94.6 | 97.3 | 92.6 | 84.6 | 83.5
45 (190) | 40.3 | 60.4 | 71.1 | 64.4 | 77.9 | 74.5 | 78.5 | 94.6 | 97.3 | 98.0 | - | 96.0 | 96.6 | 94.6 | 80.5 | 80.3
45* (191) | 36.2 | 58.4 | 55.7 | 72.5 | 61.1 | 55.7 | 63.1 | 84.6 | 84.6 | 87.9 | 91.9 | - | 86.6 | 81.9 | 67.1 | 70.5
60 (200) | 37.6 | 63.8 | 69.8 | 57.7 | 74.5 | 67.8 | 68.5 | 82.6 | 88.6 | 94.0 | 96.6 | 89.3 | - | 96.6 | 88.6 | 76.8
75 (010) | 45.6 | 63.8 | 62.4 | 43.6 | 61.7 | 53.0 | 50.3 | 67.8 | 81.2 | 87.2 | 85.2 | 79.2 | 96.6 | - | 97.3 | 69.7
90 (240) | 43.0 | 52.3 | 39.6 | 23.5 | 35.6 | 32.2 | 30.9 | 45.0 | 45.6 | 61.7 | 59.7 | 53.0 | 81.2 | 96.0 | - | 50.0
Average | 54.5 | 73.1 | 76.4 | 67.0 | 77.0 | 75.0 | 73.8 | 81.4 | 77.7 | 79.6 | 76.6 | 71.1 | 77.5 | 76.5 | 62.8 | 73.3

Table 1. Recognition rates in percent using local blocks of intensity features for all gallery (rows) and probe poses (columns). Pose angles with an asterisk indicate poses with a tilt angle of about 30 degrees; Multi-PIE pose identifiers are given in parentheses next to the gallery pose.
Gallery (ID) \ Probe | -90 | -75 | -60 | -45* | -45 | -30 | -15 | 0 | 15 | 30 | 45 | 45* | 60 | 75 | 90 | Avg.
-90 (110) | - | 95.3 | 90.6 | 59.1 | 69.1 | 58.4 | 55.7 | 60.4 | 45.0 | 51.0 | 57.7 | 34.9 | 63.1 | 84.6 | 81.9 | 64.8
-75 (120) | 99.3 | - | 99.3 | 83.2 | 94.6 | 89.3 | 87.9 | 85.9 | 75.8 | 79.2 | 77.2 | 55.7 | 80.5 | 86.6 | 74.5 | 83.5
-60 (090) | 90.6 | 98.0 | - | 93.3 | 98.0 | 96.6 | 94.6 | 91.9 | 82.6 | 83.2 | 80.5 | 61.1 | 87.9 | 85.9 | 72.5 | 86.9
-45* (081) | 57.7 | 81.2 | 89.9 | - | 91.3 | 89.3 | 87.9 | 80.5 | 71.8 | 69.1 | 71.1 | 84.6 | 63.8 | 58.4 | 40.9 | 74.1
-45 (080) | 71.1 | 92.6 | 98.0 | 94.6 | - | 98.7 | 100 | 97.3 | 91.3 | 90.6 | 94.0 | 67.8 | 86.6 | 77.2 | 63.8 | 87.4
-30 (130) | 66.4 | 92.6 | 97.3 | 91.3 | 99.3 | - | 98.7 | 98.7 | 96.6 | 96.0 | 88.6 | 66.4 | 79.2 | 75.8 | 51.7 | 85.6
-15 (140) | 67.1 | 91.3 | 94.0 | 94.0 | 100 | 99.3 | - | 100 | 99.3 | 96.6 | 89.9 | 70.5 | 80.5 | 72.5 | 57.0 | 86.6
0 (051) | 70.5 | 89.3 | 94.6 | 91.3 | 97.3 | 98.7 | 99.3 | - | 100 | 99.3 | 98.7 | 83.9 | 91.9 | 84.6 | 61.7 | 90.1
15 (050) | 59.1 | 81.2 | 87.2 | 75.2 | 93.3 | 97.3 | 99.3 | 100 | - | 100 | 98.7 | 85.2 | 95.3 | 89.3 | 71.1 | 88.0
30 (041) | 62.4 | 80.5 | 89.3 | 75.2 | 91.9 | 96.6 | 95.3 | 98.7 | 99.3 | - | 99.3 | 94.6 | 96.6 | 88.6 | 71.1 | 88.5
45 (190) | 65.1 | 77.9 | 89.9 | 71.8 | 90.6 | 90.6 | 94.6 | 98.0 | 98.7 | 98.7 | - | 96.6 | 98.7 | 97.3 | 73.2 | 88.7
45* (191) | 38.9 | 54.4 | 64.4 | 83.2 | 63.8 | 59.7 | 65.8 | 74.5 | 85.9 | 91.9 | 91.3 | - | 85.9 | 77.9 | 59.7 | 71.2
60 (200) | 65.1 | 81.2 | 86.6 | 63.8 | 85.2 | 75.8 | 73.2 | 86.6 | 92.6 | 96.0 | 98.0 | 93.3 | - | 99.3 | 87.2 | 84.6
75 (010) | 77.9 | 87.2 | 89.9 | 59.7 | 75.8 | 71.1 | 72.5 | 86.6 | 89.9 | 87.9 | 92.6 | 84.6 | 99.3 | - | 99.3 | 83.9
90 (240) | 84.6 | 74.5 | 77.9 | 38.3 | 59.1 | 49.7 | 51.7 | 56.4 | 61.7 | 65.8 | 68.5 | 57.7 | 88.6 | 98.7 | - | 66.6
Average | 69.7 | 84.1 | 89.2 | 76.7 | 86.4 | 83.7 | 84.0 | 86.8 | 85.0 | 86.1 | 86.2 | 74.1 | 85.6 | 84.1 | 69.0 | 82.0

Table 2. Recognition rates in percent using local blocks of Gabor features for all gallery (rows) and probe poses (columns). Pose angles with an asterisk indicate poses with a tilt angle of about 30 degrees; Multi-PIE pose identifiers are given in parentheses next to the gallery pose.