Coarse to Fine Face Detection Based on Skin Color Adaption


Hichem Sahbi and Nozha Boujemaa
INRIA Rocquencourt, BP 105, 78153 Le Chesnay, France
{Hichem.Sahbi, Nozha.Boujemaa}@INRIA.fr
http://www-rocq.inria.fr/imedia/

Abstract. In this paper we present a skin color approach for fast and accurate face detection which combines skin color learning and image segmentation. The approach starts from a coarse segmentation which provides regions of homogeneous statistical color distribution. Some of these regions represent parts of human skin; they are selected by minimizing an error between the color distribution of each region and the output of a compression-decompression neural network, which learns the skin color distribution for several populations of different ethnicity. The skin regions found by this ANN are then used to re-estimate the parameters of the Gaussian models with a 2-means fuzzy clustering, in order to adapt these parameters to the context of the input image. A Bayesian framework is used to perform a finer classification and makes the skin and face detection process invariant to scale and lighting conditions. Finally, a face shape based model is used to validate or reject the face hypothesis on each skin region.

1 Introduction

With the increasing emergence of the Internet, more and more data is becoming available on the Web. How can this data be organized so that information can be retrieved accurately and in a reasonable time? Visual information retrieval systems use multiple generic and specific descriptors. Applying specific face descriptors to databases containing human faces is meaningful only if these descriptors are applied to the regions of interest, which means that a face localization process is required.

Several methods for face detection are discussed in the literature. Rowley et al. [1] test for the existence of faces of different sizes and rotations at each image position using a neural network. Osuna et al. [2] use a support vector machine to build a face / no-face classifier by maximizing the margin between the two classes. Leung et al. [3] use a graph-matching method to find probable faces from detected facial features: graphs are generated from the detected features, and the true faces are identified among the candidates by random graph matching. Goshtasby et al. [4] use the chrominance invariant color space (ab) to learn skin color with Gaussian models; face detection is then performed using a template matching process.

M. Tistarelli, J. Bigun, A.K. Jain (Eds.): Biometric Authentication, LNCS 2359, pp. 112–120, 2002. © Springer-Verlag Berlin Heidelberg 2002


Fleuret et al. [5] present a coarse-to-fine face detector which is entirely based on edge configurations. This algorithm visits a hierarchical partition of the face pose space; a detection is declared when a chain of classifiers from the root to one leaf is found. More recently, Viola et al. [6] proposed a face detection method that computes features quickly using an integral image and combines classifiers in a cascade, which allows background regions to be rejected quickly. Their learning approach is purely statistical and is based on AdaBoost¹.

In this paper we present an approach for precise skin and face detection based on the use of color space properties. The approach aims to track the variation in skin color distribution from one person to another. A coarse skin color learning stage over a very large population is performed off-line. This color model is used to select skin regions, and a finer color learning step is then performed using a maximum confidence scheme in order to adapt the parameters of the skin model to the persons present in the scene. In the final stage, a skin/no-skin classification is performed using a Gaussian model and a Bayesian framework. A face shape based model is used to validate the face hypothesis.

2 Skin Region Selection

To make skin color learning better suited to the conditions of the input image, we search for a distribution of pixel colors which is the most likely to come from human skin. An ad-hoc method which examines every subset of the image pixels and measures, for every combination, a distance from a given skin color model is very time consuming. We therefore start with a coarse initial segmentation such as the DFDM method [7]. This segmentation provides connected regions which have a homogeneous local color distribution in the image space. Among these regions $R_i$, the skin parts (denoted $SR_i$) are detected using a distance $E$ given by:

$$E(R_i) = \frac{1}{|R_i|} \sum_{(x,y) \in R_i} \left( \Phi(c_{(x,y)}) - c_{(x,y)} \right)^2 \tag{1}$$

Here $c_{(x,y)}$ is the color of the pixel $(x, y)$ represented in the RGB color space, and $\Phi$ is the output of a neural network trained over a large population of skin colors collected from the World Wide Web. A quadratic or, more generally, a nonlinear function such as a one-hidden-layer neural network is a good choice for a satisfactory approximation of the skin color distribution (Fig. 1). Learning is performed using the traditional back-propagation algorithm [8], and our network performs a nonlinear PCA, since for every color $c_{(x,y)}$ in the training set the difference between the input and the output is minimized. For each candidate skin region, we use a decision rule based on computing an empirical error between the region $R_i$ and the learned model using (1).

¹ Selecting a small number of critical visual features from a learning set (footnote to AdaBoost, Section 1).

Fig. 1. (a) Neural network architecture used for skin color learning. (b) 3D distribution of skin color in the RGB color space.

Our approach does not aim to classify each pixel directly as skin or no-skin according to the ANN alone. Indeed, a decision based on applying the error function $E$ directly to each pixel color can increase the number of false positives and false negatives because of noisy data and lighting variations (cf. Fig. 5(A)). In order to reduce these effects, we learn skin color under the conditions of the input image (differences in lighting and melanin). The goal is therefore to obtain a set of color pixels, which may even be small, on which a second color training process is performed for more accurate classification.

[Fig. 2 diagram labels: Database; N. Network; Off-line skin color learning; Skin Region Detection; Fast Skin Region Detection; Gaussian model parameter learning; Online skin color learning; The finer skin color classification; The Finer Skin Detection Process.]

Fig. 2. The whole diagram of skin region selection. (a) The input image of Clinton. (b) Segmented image using the DFDM. (c) Selection of skin regions, marked in white.

The coarse-to-fine algorithm shown in Fig. 2 is summarized as follows:

1. Learn the neural network weights from a skin color population which is different from the input image (off-line step).
2. For every query image I (on-line step):
   – Perform a coarse segmentation to obtain a collection of candidate regions $R_i$, $i = 0, \dots, L$.
   – Classify each candidate region $R_i$ as a skin or no-skin region by keeping the regions where the error $E(R_i)$ is below a given threshold.
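The region selection step can be illustrated with a minimal sketch. It assumes that a label map from some segmentation is already available and that `phi` is a callable standing in for the trained compression-decompression network; both names, the `threshold` parameter and the reading of the squared term in (1) as a squared Euclidean norm over the RGB channels are assumptions made for illustration, not details given in the paper.

```python
import numpy as np

def region_error(colors, phi):
    """Mean squared reconstruction error of one region, in the spirit of eq. (1).

    colors: (n, 3) array with the RGB values of the region's pixels.
    phi:    hypothetical callable mapping an (n, 3) array to its (n, 3)
            reconstruction (stand-in for the trained network).
    """
    colors = np.asarray(colors, dtype=float)
    residual = phi(colors) - colors
    # Squared Euclidean norm per pixel, averaged over the region.
    return float(np.mean(np.sum(residual ** 2, axis=1)))

def select_skin_regions(image, labels, phi, threshold):
    """Return the labels of the regions whose error is below the threshold."""
    selected = []
    for label in np.unique(labels):
        colors = image[labels == label]  # (n, 3) pixels of this region
        if region_error(colors, phi) < threshold:
            selected.append(int(label))
    return selected
```

Working on whole regions rather than single pixels averages out per-pixel noise, which is the motivation the text gives for not using the network as a direct pixel classifier.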

3 Accurate Face Detection

By considering $K$ and $L-K$ clusters for the skin and noisy regions respectively, we define a decision rule for whether a pixel $(x, y)$ is a skin point given its color observation $c_{(x,y)}$. This decision rule is based on the following condition:

$$P\big(Y(c_{(x,y)}) = 1 \mid c_{(x,y)}\big) > P\big(Y(c_{(x,y)}) = 0 \mid c_{(x,y)}\big) \tag{2}$$

Here $Y(c_{(x,y)}) = 1$ (resp. $Y(c_{(x,y)}) = 0$) denotes the event that the color $c_{(x,y)}$ is a skin (resp. no-skin) color, and $X(c_{(x,y)}) = s_i$ (resp. $X(c_{(x,y)}) = n_i$) denotes the event that the color $c_{(x,y)}$ is a skin color from the region $SR_i$ (resp. a color from the noisy region $N_i$). In what follows, we denote by $c$ the color $c_{(x,y)}$ of the pixel $(x, y)$. The two sides of (2) are given by:

$$P(Y(c) = 1 \mid c) = \sum_{i=1}^{K} \frac{P(c \mid X(c) = s_i)\, P(X(c) = s_i)}{P(c)} \tag{3}$$

$$P(Y(c) = 0 \mid c) = \sum_{i=1}^{L-K} \frac{P(c \mid X(c) = n_i)\, P(X(c) = n_i)}{P(c)} \tag{4}$$
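A minimal sketch of the decision rule (2) follows. It assumes equal priors, so that the priors and $P(c)$ cancel and only the summed class likelihoods of (3) and (4) need to be compared. It also assumes that the noisy clusters are modeled as Gaussians in the same way as the skin clusters, which the paper does not state explicitly. The function names are illustrative.

```python
import numpy as np

def gaussian_pdf(c, mean, cov):
    """Multivariate normal density evaluated at each row of c."""
    c = np.atleast_2d(np.asarray(c, dtype=float))
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    diff = c - mean
    inv = np.linalg.inv(cov)
    # Squared Mahalanobis distance of every row to the mean.
    maha = np.einsum("ni,ij,nj->n", diff, inv, diff)
    norm = np.sqrt((2.0 * np.pi) ** mean.size * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

def is_skin(c, skin_models, noise_models):
    """Boolean array: True where P(Y=1|c) > P(Y=0|c) under equal priors.

    skin_models, noise_models: lists of (mean, cov) pairs, one per cluster.
    """
    skin = sum(gaussian_pdf(c, m, s) for m, s in skin_models)
    noise = sum(gaussian_pdf(c, m, s) for m, s in noise_models)
    return skin > noise
```

With unequal priors, each likelihood term would simply be multiplied by its prior before summing.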

We can set the priors $P(X(c) = s_i)$ and $P(X(c) = n_i)$ to be equal. The density function $P(c \mid X(c) = s_i)$ is modeled as a Gaussian whose parameters are estimated as explained in the following section.

3.1 Accurate Online Training Model

Let $c_1, \dots, c_k, \dots, c_{M_i}$ be a quantization of the colors in a skin region $SR_i$, and $h_1, \dots, h_k, \dots, h_{M_i}$ the related histogram, which gives the color frequencies. The mean $\mu_i$ and the variance-covariance matrix $\Sigma_i$ of the related color distribution are respectively given by:

$$\mu_i = \frac{\sum_{k=1}^{M_i} h_k c_k}{\sum_{k=1}^{M_i} h_k}, \qquad \Sigma_i = \frac{\sum_{k=1}^{M_i} h_k (c_k - \mu_i)(c_k - \mu_i)^T}{\sum_{k=1}^{M_i} h_k}$$

When the parameters of the Gaussian model are estimated, the noisy points in a skin region $SR_i$ degrade the quality of the $\mu_i$ and $\Sigma_i$ estimates; these points are due to the presence of no-skin parts such as hair or glasses. In order to reduce the effect of outliers, we model each skin region as two clusters which contain relevant and noisy skin points respectively. We apply the fuzzy clustering approach [9] to compute a confidence coefficient for each color in $SR_i$. This coefficient is given by:

$$U_{p,c_k} = \frac{1}{\sum_{q=1}^{2} \left[ (d_{p,c_k})^2 / (d_{q,c_k})^2 \right]^{\frac{1}{m-1}}} \tag{5}$$

$$J(U, v) = \sum_{p=1}^{2} \sum_{k=1}^{M_i} (U_{p,c_k})^m (d_{p,c_k})^2 \tag{6}$$

Here $U_{p,c_k}$ expresses the membership of the color $c_k$ in the cluster $p$ ($p$ is either the skin or the noisy cluster), $d_{p,c_k}$ is a simple Mahalanobis distance from the color $c_k$ to the cluster $p$, and $m > 1$ is the fuzziness exponent of the clustering. Following [9], we perform a 2-means fuzzy clustering of the points present in each skin region into noisy and relevant skin points. This is carried out by minimizing the functional (6), which reaches its global minimum when each color $c_k$ is assigned to its relevant (noisy or skin) cluster. This preprocessing step improves the accuracy of the learned parameters of the Gaussian model, which are now modified as follows:

$$\mu_i = \frac{\sum_{k=1}^{M_i} h_k U_{\mathrm{skin},c_k} c_k}{\sum_{k=1}^{M_i} h_k U_{\mathrm{skin},c_k}}, \qquad \Sigma_i = \frac{\sum_{k=1}^{M_i} h_k U_{\mathrm{skin},c_k} (c_k - \mu_i)(c_k - \mu_i)^T}{\sum_{k=1}^{M_i} h_k U_{\mathrm{skin},c_k}} \tag{7}$$
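The membership update and the weighted parameter estimates can be sketched as below. This is only an illustration of single steps, not the full iterative clustering of [9]: the squared distances are taken as given, the default `m = 2.0` is an assumed value, and the function names are hypothetical. Passing `weights = h` gives the plain histogram estimate, while `weights = h * U_skin` gives the membership-weighted estimate.

```python
import numpy as np

def memberships(sq_dists, m=2.0):
    """Fuzzy memberships computed from squared distances.

    sq_dists: (P, M) array of squared distances from each of the P clusters
              to each of the M colors (P = 2 in the paper).
    m:        fuzziness exponent, m > 1 (2.0 is an assumed default).
    """
    d2 = np.maximum(np.asarray(sq_dists, dtype=float), 1e-12)  # avoid 0/0
    # ratio[p, q, k] = (d2[p, k] / d2[q, k]) ** (1 / (m - 1))
    ratio = (d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)

def weighted_gaussian(colors, weights):
    """Weighted mean and covariance of a set of quantized colors.

    colors:  (M, 3) array of quantized colors c_k.
    weights: (M,) array, e.g. h_k or h_k * U_skin[k].
    """
    colors = np.asarray(colors, dtype=float)
    w = np.asarray(weights, dtype=float)
    mean = (w[:, None] * colors).sum(axis=0) / w.sum()
    diff = colors - mean
    outer = diff[:, :, None] * diff[:, None, :]  # (M, 3, 3) outer products
    cov = (w[:, None, None] * outer).sum(axis=0) / w.sum()
    return mean, cov
```

Colors that lie far from the skin cluster receive a small membership and therefore contribute little to the re-estimated mean and covariance.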

The coefficients $U_{\mathrm{skin},c_k}$ are introduced as weighting values to reduce the effect of noise when computing the Gaussian model's parameters.

3.2 Validating the Face Hypothesis

Given a skin region, a shape model is used to decide whether this region is a face or not. We compute two histograms corresponding to the horizontal and vertical sums of gray level information along the X and Y coordinates, as shown in Fig. 3(b). These two histograms are smoothed using a Gaussian filter to eliminate high frequency components. This process is summarized as follows:

– Construct an entropy map using a snapshot descriptor [10] (such as the gray level histogram) on each window $w(x, y)$ of the skin region. Assuming that each descriptor takes values in $c_1, \dots, c_r$, the computed entropy is given by:

$$H(w(x, y)) = -\sum_{i=1}^{r} \mathrm{Pr}_{w(x,y)}(c_i) \log_2 \mathrm{Pr}_{w(x,y)}(c_i) \tag{8}$$

– Compute the Y and X histograms using equation (9):

$$y_i = \sum_{j=1}^{T_x} H(w(j, i)), \qquad x_j = \sum_{i=1}^{T_y} H(w(j, i)) \tag{9}$$

– Perform a progressive filtering to extract the principal y and x coordinates corresponding to the lowest frequencies or the principal variation modes of the X and Y histograms.

A skin region is taken to be a frontal face if the following two conditions are satisfied:

– The horizontal and the vertical histograms each have three local extrema, denoted $x_1, x_2, x_3$ and $y_1, y_2, y_3$ respectively (cf. Fig. 3(b)).
– The points $(x_1, y_1)$, $(x_3, y_1)$, $(x_2, y_2)$ and $(x_2, y_3)$ are likely to be the coordinates of the two eyes, the nose and the mouth respectively. This likelihood is estimated with a learned model: a Gaussian mixture model is used in which each cluster attempts to capture the statistical distribution of the $(x_i, y_j)$ coordinates of the related feature, and the decision is made using the maximum likelihood principle.
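The entropy map, its projections and the extrema count can be sketched as follows. The sketch makes several assumptions that the paper does not specify: non-overlapping square windows of size `win`, a gray level histogram with `bins` bins as the descriptor, a single Gaussian smoothing pass with width `sigma` instead of the progressive filtering, and strict local extrema. All names are illustrative.

```python
import numpy as np

def window_entropy(window, bins=16):
    """Shannon entropy in bits of the gray level histogram of one window."""
    hist, _ = np.histogram(window, bins=bins, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def entropy_projections(gray, win=8, bins=16):
    """Entropy map on non-overlapping windows and its X and Y projections."""
    rows, cols = gray.shape[0] // win, gray.shape[1] // win
    H = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            block = gray[i * win:(i + 1) * win, j * win:(j + 1) * win]
            H[i, j] = window_entropy(block, bins)
    y_hist = H.sum(axis=1)  # y_i: sum over the horizontal index j
    x_hist = H.sum(axis=0)  # x_j: sum over the vertical index i
    return H, x_hist, y_hist

def smooth(signal, sigma=1.0):
    """Gaussian low-pass filtering of a 1-D histogram."""
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(signal, kernel, mode="same")

def local_extrema(signal):
    """Indices of strict interior local maxima and minima."""
    s = np.asarray(signal, dtype=float)
    left, mid, right = s[:-2], s[1:-1], s[2:]
    is_ext = ((mid > left) & (mid > right)) | ((mid < left) & (mid < right))
    return (np.nonzero(is_ext)[0] + 1).tolist()
```

A candidate region would pass the first condition when `local_extrema(smooth(x_hist))` and `local_extrema(smooth(y_hist))` each return three indices; the second condition, based on the learned Gaussian mixture model, is not covered by this sketch.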

4 Experiments

To build our neural network, we collected a set of skin maps from the World Wide Web (Fig. 3(a)). These images were chosen to span a wide range of environmental conditions (blur, noise, etc.), with people of different ethnicity and various skin colors. We tested our algorithm on the database of the French TV channel TF1. Detection performance is estimated using precision-recall curves (cf. equations (10) and (11)) with respect to the acceptance rate $\sigma$, which represents the fraction of skin colors accepted and used (considered as relevant) during the online fuzzy learning step.

$$\text{Precision} = \frac{\text{relevant detected skin pixels}}{\text{detected skin pixels}} \tag{10}$$

$$\text{Recall} = \frac{\text{relevant detected skin pixels}}{\text{all correct skin pixels}} \tag{11}$$
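For reference, the two measures can be computed from a detected skin mask and a ground-truth skin mask as in the short sketch below. The mask representation and the convention of returning 0.0 when a denominator is zero are assumptions made here, not details from the paper.

```python
import numpy as np

def precision_recall(detected, truth):
    """Precision and recall of a binary skin mask against ground truth."""
    detected = np.asarray(detected, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    relevant = np.count_nonzero(detected & truth)  # relevant detected skin pixels
    n_detected = np.count_nonzero(detected)        # detected skin pixels
    n_correct = np.count_nonzero(truth)            # all correct skin pixels
    precision = relevant / n_detected if n_detected else 0.0
    recall = relevant / n_correct if n_correct else 0.0
    return precision, recall
```

Evaluating this pair for several values of the acceptance rate yields one point of the precision-recall curve per value.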

Fig. 3. (a) A sample of skin maps from the WWW used during the off-line learning process. (b) X and Y gray level histogram projections used for frontal face feature detection.

Fig. 4. Recall and precision of skin classification for (1) the direct ANN classification and (2) the coarse-to-fine approach.

According to the results (cf. Fig. 5), even when the segmentation algorithm does not provide a good result, each detected skin region contains a significant part of the skin color distribution, which is sufficient to perform a successful learning process. Figure 4 presents the precision-recall curves for both direct color filtering (using the ANN directly) and the coarse-to-fine approach. This diagram shows a considerable improvement in both precision and recall for our method with respect to using the ANN directly as a skin filter. According to our experiments, the acceptance rate $\sigma$ ranges roughly between 30% and 60%, and in this range an improvement in both precision and recall is obtained with respect to the ANN filter (cf. Fig. 4).

Processing time was also evaluated. For images of 400 × 300 pixels, the face extraction process took 0.8 s on a standard Pentium II at 450 MHz, so face detection is carried out interactively and can be used to bootstrap a face tracking system.

5 Conclusion

A "coarse to fine" method is presented for accurate skin and face detection based on the combination of two coarse approaches. The approach starts from a coarse segmentation which subdivides an image into regions of homogeneous statistical color properties; a neural network skin detector then provides a vote to select regions of interest, on which a second, online training step is performed to improve the skin model parameters. We are currently investigating the use of our skin classifier as an input to an SVM classifier [11] to perform the face validation step. This can be done by applying the SVM function only in the skin regions detected by our algorithm rather than sliding a window over the whole image space. This SVM is considered as a shape model which is able to handle large variations in face pose to decide whether a skin region is a face or not. Combining a fast skin detection with an SVM face detector allows us to build a face localizer which is faster and more accurate than many other existing methods.

Fig. 5. (A) Skin detection using the ANN. (B) Segmentation using the DFDM followed by a skin region selection. (C) Face detection using the coarse-to-fine approach followed by the application of the frontal face shape model.

Acknowledgment: We would like to thank TF1, the French TV channel, for providing us with images for tests.

References

1. H. Rowley, S. Baluja and T. Kanade: Neural network-based face detection. In IEEE Trans. on PAMI. Vol. 20, Num. 1. (1998) 23–38.
2. E. Osuna, R. Freund and F. Girosi: Training support vector machines: an application to face detection. In IEEE CVPR. (1997) 130–136.
3. T. Leung, M.C. Burl and P. Perona: Finding faces in cluttered scenes using random labelled graph matching. In ICCV. (1995).
4. J. Cai and A. Goshtasby: Detecting human faces in color images. Image and Vision Computing. Vol. 18, Num. 1. (2000) 63–75.
5. F. Fleuret and D. Geman: Coarse-to-fine visual selection. In IJCV. Vol. 41, Num. 2. (2001).
6. P. Viola and M. Jones: Robust real-time object detection. In Second International Workshop on Statistical and Computational Theories of Vision: Modeling, Learning, Computing and Sampling. (2001).
7. A. Winter and C. Nastar: Differential feature distribution maps for image segmentation and region queries in image databases. CBAIVL workshop at CVPR. (1999).
8. C.M. Bishop: Neural networks for pattern recognition. Clarendon Press, Oxford. (1995).
9. Rajesh N. Dave: Characterization and detection of noise in clustering. Pattern Recognition. Vol. 12, Num. 11. (1995) 545–561.
10. S. Gilles: Robust description and matching of images. Oxford University. (1998).
11. H. Sahbi, D. Geman and N. Boujemaa: Face detection using coarse-to-fine support vector classifiers. Submitted to the IEEE ICIP. (2002).
