HEAD POSE ESTIMATION USING GABOR EIGENSPACE MODELING* Yucheng Wei*, Ludovic Fradet†, Tieniu Tan* * National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, P. O. Box 2728, Beijing, China, 100080 † Laboratoire d'Automatique Industrielle (LAI), Institut National des Sciences Appliquées de Lyon (INSA de Lyon) E-mails: {ycwei, lfradet, tnt}@nlpr.ia.ac.cn ABSTRACT In this paper, an approach to head pose estimation based on Gabor eigenspace modeling is introduced. A Gabor filter is used to enhance pose information and suppress distracting information such as variable face appearance and changing environmental illumination. We discuss the selection of the optimal Gabor filter orientation for each pose, which leads to more compact pose clusters. We then use a distribution-based pose model (DBPM) to model each pose cluster in the Gabor eigenspace. For each pose cluster, a 2D-distance space is thus established, in which the distance from centroid (DFC) can be used to estimate head pose. Experimental results demonstrate the algorithm's robustness and generalization. We also apply our algorithm to real-scene sequences to detect the human face and estimate its pose; in this way, a user can control an intelligent wheelchair by head poses alone. 1. INTRODUCTION In real face-related applications such as automatic face recognition, intelligent surveillance and perceptual human-computer interfaces, most faces in the images are non-frontal, with varying pose and illumination. Therefore, how to detect faces and estimate head pose reliably is an essential problem in these applications. Much work has been done on this challenging issue in recent years. Existing methods can be categorized into three main types: geometric feature-based approaches [1,2], model-based approaches [3,4,5], and appearance-based subspace analysis [6,7,8].
Treating the whole face as a feature vector in a statistical subspace, the appearance-based subspace method avoids the difficulties of local facial feature detection and face modeling, and has become popular recently. But in the subspace, the distribution of face appearances under variable pose and illumination is a highly non-linear, non-convex and possibly twisted manifold, which is very hard to analyze directly [9]. Hence, how to model such a pose manifold and remove the influences of noise information,
such as illumination and outliers in the image, is a challenging problem in subspace analysis. Murase and Nayar [10] give a parametric description of this nonlinear manifold to estimate pose in a single PCA subspace. Pentland et al. [11] construct view-based subspaces to detect faces and estimate pose. Li et al. [6] use the same idea to estimate head poses in the ISA subspace. There are also approaches that address this problem with kernel-based methods such as SVR [7] and KPCA [8]. In this paper, we present an approach to detect the face and estimate its pose under changing illumination and background, in the context of pose-based control of a robotic wheelchair [12]. The 2D images of a given head pose always differ due to the diversity of individual face shapes, accessories, various lighting conditions and complex backgrounds. Therefore, it is necessary to find a transformation beforehand that enhances the pose feature in the images and eliminates other distracting features as much as possible. In our method, a Gabor filter selected according to each specific pose is used to filter the pose images. We can then construct a Gabor eigenspace and obtain a more representative manifold of head pose clusters in this subspace. We model these pose clusters by the "distribution-based pose model" (DBPM), an extension of the "distribution-based face model" (DBFM) proposed by Sung and Poggio [13]. A 2D-distance space can thus be established for each pose cluster. For an unknown test image, we transform it into the 2D-distance spaces of all poses, where the distances from centroids (DFC) are used to estimate its pose. 2. GABOR CHANNEL SELECTION As is known, Gabor filtering can compensate for changes in illumination direction in face images [14], and Gabor filters with different orientations extract different pose information [16].
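As an illustrative sketch (not the authors' implementation), the Gabor channels and the minimal-inner-distance channel selection used in this section can be written in numpy as follows. The kernel support, the isotropic Gaussian envelope (a simplification of Lee's formulation) and all function names are our own assumptions; the parameters ω₀ = 3.06, σ = 0.8 and the 18 orientations come from the text.

```python
import numpy as np

def gabor_kernel(theta, omega0=3.06, sigma=0.8, size=15):
    """Complex 2-D Gabor kernel at orientation theta (radians).

    omega0, sigma follow the values quoted in the text; the 15x15
    support and the isotropic envelope are sketch assumptions.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the carrier wave points along theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(omega0 ** 2) * (xr ** 2 + yr ** 2) / (8 * sigma ** 2))
    carrier = np.exp(1j * omega0 * xr)
    return (omega0 / (np.sqrt(2 * np.pi) * sigma)) * envelope * carrier

def magnitude_response(image, theta):
    """Magnitude of the Gabor-filtered image (circular FFT convolution)."""
    k = gabor_kernel(theta)
    pad = np.zeros_like(image, dtype=complex)
    pad[:k.shape[0], :k.shape[1]] = k
    return np.abs(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(pad)))

def select_channel(cluster, orientations):
    """Pick the orientation whose filtered cluster has minimal mean
    pairwise Euclidean distance (the criterion of Eq. (1))."""
    best, best_d = None, np.inf
    for th in orientations:
        feats = [magnitude_response(im, th).ravel() for im in cluster]
        n = len(feats)
        d = sum(np.linalg.norm(feats[i] - feats[j])
                for i in range(n) for j in range(i + 1, n)) / (n * (n - 1) / 2)
        if d < best_d:
            best, best_d = th, d
    return best

# The 18 candidate channels described in the text: 0° to 170° in 10° steps.
orientations = np.deg2rad(np.arange(0, 180, 10))
```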
We choose the Gabor filter proposed by Lee [15] (ω₀ = 3.06, σ = 0.8) and focus on how to select the optimal orientation θ of the filter for its corresponding pose. For a cluster of images with the same pose φ, we filter them separately with Gabor filters whose orientation varies from 0° to 170° at 10° intervals and use their magnitude responses. After that, we obtain 18 different clusters of filtered images. We then calculate the mean Euclidean distance within each cluster and select the filter yielding the minimal inner distance as the optimal Gabor channel for this pose. The mean distance in a given cluster is defined as follows [16]:

d(\phi, g_\theta(\cdot)) = \frac{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} d\big(g_\theta(x_\phi^i),\, g_\theta(x_\phi^j)\big)}{N(N-1)/2}, \quad \theta \in \{0^\circ, 10^\circ, \ldots, 170^\circ\}   (1)

where x_\phi^i is the i-th image of a given pose φ, g_θ(·) is the transformed image (θ is the Gabor filter's orientation), N is the number of points in the cluster, and d(g_\theta(x_\phi^i), g_\theta(x_\phi^j)) is the Euclidean distance between two filtered images.

* This work is funded in part by the National 863 Program and NSFC (Grant No. 59825105).

We choose seven poses with pan angle φ varying from 0° to 60° and calculate the mean distance in each pose cluster after the selective Gabor filtering. Results are shown in Fig. 1.

Fig. 1. Mean distances in pose clusters filtered by various Gabor channels

From the results, we can see that the horizontal Gabor filter is optimal for near-frontal poses such as 0°, 10° and 20°, while the vertical Gabor filter is optimal for the other, non-frontal poses. Pose images filtered by the selective Gabor channel form more compact clusters in the high-dimensional space, which is desirable for pose estimation (see Fig. 2). A similar idea has been used in [16], but they choose the ratio of the inner distance to the cross-distance of clusters as the selection criterion, which may yield an optimum with a large inner distance.

Fig. 2. Pose images (φ ∈ [0°, 60°]) and selective Gabor filtered results

3. POSE MODELING IN GABOR EIGENSPACE

Although selective Gabor filters can alleviate some undesirable factors such as lighting changes and flecks or stains in pose images, there still exist pattern variations in the Gabor-filtered appearances due to differences in facial expression, mustache, glasses and face shape. So it is necessary to further model such pose variations. Based on the investigation of the distribution of head poses in eigenspace, we use a distribution-based pose model to represent some intrinsic features of head poses.

3.1. Distribution-based pose model (DBPM)

We sample 1,470 standard pose examples labeled with pan angles from 0° to 60° at 10° intervals (see Section 4 for details). Filtered with each selective Gabor channel and projected into the eigenspace, these images form an interesting manifold (Fig. 3).

Fig. 3. Typical pose distribution in Gabor eigenspace (a 3D eigenspace is used here for illustration)

From the figure, we assume that each pose cluster can be regarded as a multidimensional Gaussian distribution. Sung and Poggio [13] used a modified version of the k-means clustering algorithm to compute each cluster's centroid (μ) and covariance matrix (Σ) and proposed a distribution-based face model. Here we extend this idea to the distribution-based pose model (DBPM), which is used to model the above pose manifold. Given a test pattern point x, we calculate its Mahalanobis distance to a pose cluster's centroid μ_φ and use this distance as the similarity to that pose:

M(x, \mu_\phi) = (x - \mu_\phi)^T \Sigma_\phi^{-1} (x - \mu_\phi)   (2)

where μ_φ is the mean of pose cluster φ, and Σ_φ is the corresponding covariance matrix, which encodes the cluster's shape and directions of elongation.

3.2. 2D-distance space
It is not feasible to calculate the Mahalanobis distance directly in the high-dimensional Gabor eigenspace. Therefore, a two-value distance is used to approximate the full Mahalanobis distance [17]. If we establish a view-based subspace separately for each pose by PCA, the shape of the pose cluster's projection in its own subspace remains Gaussian-like. We can therefore reduce the dimensions and calculate the Mahalanobis distance in each lower-dimensional pose subspace; this is also called the "distance in feature space" (DIFS). The Euclidean distance between the test pattern and its projection in each pose subspace is the "distance from feature space" (DFFS). So the full Mahalanobis distance is divided into a two-value distance vector, DIFS and DFFS [13, 17]:

y_\phi = U_\phi^T (x - \mu_\phi)   (3)

DIFS:

D_1(x, \mu_\phi) = (x - \mu_\phi)^T \Sigma_\phi^{-1} (x - \mu_\phi) = (x - \mu_\phi)^T U_\phi \Lambda_\phi^{-1} U_\phi^T (x - \mu_\phi) = y_\phi^T \Lambda_\phi^{-1} y_\phi = \sum_{i=1}^{N} \frac{y_i^2}{\lambda_i}   (4)

DFFS:

D_2(x, \mu_\phi) = \| x - x_p \|^2 = \| x - U_\phi y_\phi \|^2   (5)

where y_φ is the projection of a test pattern x in the subspace of pose φ, U_φ is the matrix whose i-th column u_φ^i is a unit vector in the direction of the pose cluster's i-th largest eigenvector, and Λ_φ is the diagonal matrix of the corresponding eigenvalues λ_i. Thus we can transform each pose subspace into a 2D-distance space (Fig. 4).
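The per-pose subspace fitting and the two-value distance of Eqs. (3)–(5) can be sketched as follows. This is a minimal numpy sketch under our own naming; the residual in the DFFS is computed on the mean-centered pattern, which we take to be the intended reading of Eq. (5).

```python
import numpy as np

def fit_pose_subspace(X, m):
    """PCA subspace of one pose cluster.

    X: (n, d) matrix of vectorized, Gabor-filtered pose images;
    m: retained dimensions (cf. Table 1). Returns centroid mu,
    eigenvector matrix U (d, m) and eigenvalues lam (m,).
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    # SVD of the centered data yields the covariance eigenvectors.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    U = Vt[:m].T
    lam = (s[:m] ** 2) / (X.shape[0] - 1)
    return mu, U, lam

def two_value_distance(x, mu, U, lam):
    """(DIFS, DFFS) of a test pattern x w.r.t. one pose subspace."""
    y = U.T @ (x - mu)                  # Eq. (3)
    difs = float(np.sum(y ** 2 / lam))  # Eq. (4)
    resid = (x - mu) - U @ y            # reconstruction residual
    dffs = float(resid @ resid)         # Eq. (5), squared norm
    return difs, dffs
```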
Fig. 4. 2D-distance pose spaces (φ = 0°, 60° for instance)

It is apparent that patterns with a similar pose φ cluster together in the corresponding 2D-distance space, while patterns of other poses or non-faces lie farther away from the cluster. Thus, for an unknown pattern, we can calculate the distance from the pose cluster's centroid to evaluate its similarity to this pose.
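The distance-from-centroid decision described above can be sketched as a nearest-centroid rule in the 2D-distance spaces. The function name, the dictionary layout and the `reject_thresh` parameter (standing in for the empirical unknown/non-face threshold, whose value the text does not give) are our own assumptions.

```python
import numpy as np

def estimate_pose(dist_points, centroids, poses, reject_thresh=None):
    """Pick the pose whose 2D-distance-space centroid is nearest.

    dist_points: dict pose -> (DIFS, DFFS) of the test pattern in that
    pose's 2D-distance space; centroids: dict pose -> training cluster
    centroid in the same space. Returns (pose, all DFC values), or
    (None, ...) when the smallest DFC exceeds the rejection threshold.
    """
    dfc = {p: float(np.hypot(dist_points[p][0] - centroids[p][0],
                             dist_points[p][1] - centroids[p][1]))
           for p in poses}
    best = min(dfc, key=dfc.get)
    if reject_thresh is not None and dfc[best] > reject_thresh:
        return None, dfc  # unknown pose or non-face
    return best, dfc
```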
4. HEAD POSE ESTIMATION

We use a database of 2,730 pose images with pan angles ranging from −60° to 60° (right to left). Volunteers wearing a special headpiece equipped with a laser emitter were asked to point at marks placed at 10° intervals on the wall, measured beforehand. In this way, we sampled the standard pose images of 30 persons in our lab. Each pose image was then rotated clockwise and anticlockwise by 5° and shifted left, right, up and down by 2 pixels, giving 210 images for each pan angle φ.

Fig. 5. Some pose examples in our database (−60° to 60°)

We aim to estimate 13 head poses from −60° to 60° in pan at 10° intervals (left poses are negative, right poses are positive). First, we select the proper Gabor filter to transform each pose image and rescale the result to 20 × 20. Then we model each pose cluster by view-based subspaces, choosing the dimension of each subspace by the same criterion as in [13] (see Table 1, 0° to 60° for instance).

Pan (φ):                    0°   10°   20°   30°   40°   50°   60°
Orientation of Gabor (θ):   0°   0°    0°    90°   90°   90°   90°
Dimensions of subspace:     47   45    52    80    65    52    49

Table 1. Orientation of the Gabor channel and number of dimensions for each pose subspace (0° to 60°)

Then we project each pose image only into its corresponding pose subspace and calculate its DFFS and DIFS. Thus 13 pose clusters in 13 2D-distance spaces are acquired (as in Fig. 4), and we calculate the centroids of all the clusters for the subsequent estimation. For a test pose pattern, we project the pattern into each pose subspace and calculate its two-value distance, so the pose image is transformed into 13 points in 13 2D-distance spaces. The Euclidean distance of each point to the corresponding pose cluster centroid, i.e. the distance from centroid (DFC), is used as the criterion to estimate the pose: the pose with the minimal DFC is taken as the estimation result. In the experiments, an interesting phenomenon is observed: pose patterns within the known range (−60° to 60°) can always find a close cluster in some 2D-distance space, while almost all non-face or unknown-pose patterns lie far from the pose cluster in all spaces. Hence an empirical threshold is used to judge whether a pattern is known or unknown. The pose estimation framework is illustrated in Fig. 6.

Fig. 6. Framework of pose estimation (each pose subspace φ_i, with its selective Gabor channel θ_i, yields a DIFS/DFFS pair, from which DFC_φi is computed; the minimal DFC gives the estimated pose, or "unknown" if it exceeds the threshold)

5. EXPERIMENTS
We choose three image sets to test our algorithm. The first set contains 2,730 training images in our pose database. The second contains 200 images cropped by hand from real scene sequences with various illumination and expressions. The third includes 300 pose images selected from FERET database. Estimation results are tabulated in Table 2. Some examples are shown in Fig. 7.
From the results, we can see that our algorithm is robust to various patterns of head pose.

         ±10° Acc.   ±20° Acc.
Set 1    100%        100%
Set 2    89%         94%
Set 3    91%         96%

Table 2. Head pose estimation accuracy
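The tolerance accuracies reported in Table 2 can be computed with a one-liner of this form; the function name and signature are our own sketch, not part of the paper.

```python
def tolerance_accuracy(estimates, truths, tol):
    """Fraction of pan-angle estimates within +/- tol degrees
    of the ground-truth pan angle (e.g. tol = 10 or 20)."""
    hits = sum(1 for e, t in zip(estimates, truths) if abs(e - t) <= tol)
    return hits / len(estimates)
```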
Fig. 7. Pose estimation results from the test sets (estimated pan angles are overlaid on each example)
We also attempt to adapt our algorithm to realize posebased control of an intelligent wheelchair. For a sequence containing a head with changing poses, we want to detect the face and estimate its pose at the same time. In this way, we can control an intelligent wheelchair by user’s head poses (go forward: φ ∈ [−10,10] ; turn left: φ ∈ [−60,−20] ; turn right: φ ∈ [20,60] ). We implemented our algorithm on VC 6.0 + PII 300 (Compaq Presario 1910) and obtained near real-time performance (5-10 f/s). Some key frames with estimation results are illustrated in Fig. 8.
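The pan-angle-to-command mapping quoted above is simple enough to state directly in code. The intervals come from the text; the function name and the "stop" fallback for the uncovered gaps ((−20°, −10°) and (10°, 20°)) are our own assumptions.

```python
def wheelchair_command(pan):
    """Map an estimated pan angle (degrees) to a wheelchair command,
    using the control intervals quoted in the text."""
    if -10 <= pan <= 10:
        return "forward"
    if -60 <= pan <= -20:
        return "turn_left"
    if 20 <= pan <= 60:
        return "turn_right"
    # Angles in the uncovered gaps or outside [-60, 60]: fallback
    # behavior is an assumption, the paper does not specify it.
    return "stop"
```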
Fig. 8. Pose-based control of an intelligent wheelchair (the white box is the initial color-filtered result; the black box is the detection result with pose estimation)

6. CONCLUSIONS
In this paper, we have proposed an approach to head pose estimation based on Gabor eigenspace modeling. By filtering pose images through selective Gabor channels, irrelevant information such as variable face appearance or environmental illumination can be removed to some extent, and more compact pose clusters are acquired in the Gabor eigenspace. We can thus model these clusters with a distribution-based pose model. A combination of DFFS and DIFS is adopted to create a 2D-distance space for each pose cluster, where a test pattern can be estimated by its distance from centroid (DFC). Extensive experimental results, including static-image pose estimation and real-time pose tracking, show the efficacy and robustness of our algorithm. In this way, we can control an intelligent wheelchair by the user's head poses.
REFERENCES [1] A. Gee and R. Cipolla, "Determining the Gaze of Faces in Images," Image and Vision Computing, Vol. 12, No. 10, 1994. [2] Q. Chen, H. Wu et al., "3D Head Pose Estimation without Feature Tracking," Proc. 3rd Int'l Conf. on Automatic Face and Gesture Recognition, Japan, 1998. [3] T. F. Cootes, K. Walker and C. J. Taylor, "View-Based Active Appearance Models," Proc. 3rd Int'l Conf. on Automatic Face and Gesture Recognition, Japan, 1998. [4] N. Krüger, M. Pötzsch and C. von der Malsburg, "Determination of Face Position and Pose With a Learned Representation Based on Labeled Graphs," Image and Vision Computing, Vol. 15, pp. 665-673. [5] M. L. Cascia, S. Sclaroff and V. Athitsos, "Fast, Reliable Head Tracking under Varying Illumination: An Approach Based on Registration of Texture-Mapped 3D Models," IEEE Trans. PAMI, Vol. 21, No. 6, June 1999. [6] S. Li et al., "View-Based Clustering of Object Appearances Based on Independent Subspace Analysis," Proc. 8th IEEE Int'l Conf. on Computer Vision, Vancouver, Canada, July 9-12, 2001. [7] Y. Li, S. Gong and H. Liddell, "Support Vector Regression and Classification Based Multi-View Face Detection and Recognition," Proc. 4th IEEE Int'l Conf. on Face and Gesture Recognition, pp. 300-305, France, 2000. [8] S. Li et al., "Kernel Machine Based Learning for Multi-View Face Detection and Pose Estimation," Proc. 8th IEEE Int'l Conf. on Computer Vision, Vancouver, Canada, July 9-12, 2001. [9] S. Gong, S. McKenna and J. Collins, "An Investigation into Face Pose Distribution," Proc. 2nd IEEE Int'l Conf. on Automatic Face and Gesture Recognition, pp. 265-270, Vermont, USA, Oct. 1996. [10] H. Murase and S. K. Nayar, "Illumination Planning for Object Recognition Using Parametric Eigenspaces," IEEE Trans. PAMI, 16(12), pp. 1219-1227, 1995. [11] A. Pentland, B. Moghaddam, T. Starner, O. Oliyide and M. Turk, "View-Based and Modular Eigenspaces for Face Recognition," Technical Report 245, M.I.T. Media Lab, 1993. [12] Xueen Li, Tieniu Tan and Xiaojian Zhao, "Multi-modal Navigation for an Interactive Wheelchair," Proc. of ICMI'2000, Beijing, China, 2000. [13] K. Sung and T. Poggio, "Example-Based Learning for View-Based Human Face Detection," IEEE Trans. PAMI, Vol. 20, No. 1, pp. 39-51, Jan. 1998. [14] Y. Adini, Y. Moses and S. Ullman, "Face Recognition: The Problem of Compensating for Changes in Illumination Direction," IEEE Trans. PAMI, Vol. 19, No. 7, pp. 721-732, July 1997. [15] T. Lee, "Image Representation Using 2D Gabor Wavelets," IEEE Trans. PAMI, Vol. 18, No. 10, Oct. 1996. [16] J. Sherrah, S. Gong and E. Ong, "Understanding Pose Discrimination in Similarity Space," 10th British Machine Vision Conference, Vol. 2, pp. 523-532, Nottingham, UK, Sept. 1999. [17] B. Moghaddam and A. Pentland, "Probabilistic Visual Learning for Object Representation," IEEE Trans. PAMI, Vol. 19, No. 7, pp. 696-710, July 1997.