Robust and Precise Eye Detection based on Locally Selective Projection

Ying Zheng1, Zengfu Wang1,2∗

1. Department of Automation, University of Science and Technology of China
2. The Key Laboratory of Biomimetic Sensing and Advanced Robot Technology, Anhui Province, China

Abstract

This paper proposes a robust and precise eye detection method based on a new projection algorithm called locally selective projection (LSP). Along each projection axis, LSP selects a single pixel and uses a function computed in the neighborhood of that pixel as the response. The local selectivity of LSP makes it robust against rotation, illumination changes and occlusion. Moreover, the positions of the selected pixels can be recorded, providing a 2D cue for image analysis. We apply LSP to eye detection: the vertical and horizontal LSPs are first used to produce reliable eye candidates, then an SVM classifier is employed to verify the real eye pairs. Experimental comparison with an AdaBoost detector shows the robustness and accuracy of the proposed method.

1 Introduction

Eyes are the most salient features on human faces, and their precise detection is a crucial step in many face-related applications. One family of eye detection methods is based on the geometric or intensity properties of eyes. This information can either be integrated into an eye template or used to guide a heuristic search. Another family is based on learning. These methods treat eye detection as a classification task and employ state-of-the-art classification methods such as SVM, Gabor wavelet networks and AdaBoost [5, 3]. Although the learning-based methods are more robust, their performance partly depends on the amount and diversity of the training data, and they may not give eye positions as precise as those of search-based methods. Most successful eye detectors fall into a framework that first uses heuristic search to detect potential eye locations and then uses a sophisticated classifier to verify the real eyes.

Projection is a popular method for generating eye candidates [1, 6]. However, the existing projection algorithms do not account for complex conditions such as occlusion, illumination changes and rotation, and thus fail under these circumstances. In this paper, we propose a new projection algorithm and use it for robust and precise eye detection. The proposed projection is called locally selective projection (LSP). Along each projection axis, LSP selects only the pixel whose neighborhood most resembles an eye pupil. The degree of similarity between this pixel and a pupil is used as the projection value. During the LSP search, the positions of the selected pixels can be recorded, providing a 2D cue for further analysis. The local selectivity of LSP makes it robust to rotation, occlusion and illumination changes. By applying vertical and horizontal LSPs to eye images, we obtain reliable eye candidates. An SVM classifier is then used to verify the real eye pairs among the candidates. Compared with an AdaBoost detector, the proposed method achieves higher performance with far fewer training images.

∗ This work is supported by the National Natural Science Foundation of China.

978-1-4244-2175-6/08/$25.00 ©2008 IEEE

2 Locally Selective Projection

Projection is an effective image processing method. The original 2D image is projected along axes to form a 1D sequence: all the pixels along an axis are mapped to a real value (called the response) calculated by a projection function. Three projection functions have been proposed: the integral projection function (the mean), the variance projection function (the variance) [1], and the hybrid projection function (a weighted sum of mean and variance) [6]. Although the projection functions differ, the basic framework of the existing projection algorithms is the same: the response is calculated from all pixels along one projection axis. This framework brings two major problems. First, the algorithms all use cumulative functions, such as summation, to calculate the responses. Occlusion and illumination greatly influence the global appearance of images and thus cause these projections to fail. Second, the calculation of the responses depends on a projection axis, which makes it vulnerable to rotation.

Based on the above analysis, a new projection algorithm called locally selective projection (LSP) is proposed to overcome the deficiencies of traditional projections. For each pixel $(x, y)$ in an image, suppose that there is a function $f_{Loc}(x, y)$ defined on its neighborhood $N(x, y)$: $f_{Loc}(x, y) : N(x, y) \to \mathbb{R}$. We call this function the locally selective function (LSF). Based on $f_{Loc}$, the vertical LSP is defined as:

$$LSP_v(x) = \min_{y_i \in [y_1, y_2]} f_{Loc}(x, y_i), \tag{1}$$

$$P_v(x) = \arg\min_{y_i \in [y_1, y_2]} f_{Loc}(x, y_i), \tag{2}$$

where $[y_1, y_2]$ is the vertical interval of the image, $LSP_v(x)$ is the response of the LSP, and the point $(x, P_v(x))$ is the selected pixel whose $f_{Loc}$ is minimal, $P_v(x)$ being the $y$ coordinate of this pixel. The horizontal LSP can be defined similarly as:

$$LSP_h(y) = \min_{x_i \in [x_1, x_2]} f_{Loc}(x_i, y), \tag{3}$$

$$P_h(y) = \arg\min_{x_i \in [x_1, x_2]} f_{Loc}(x_i, y). \tag{4}$$

In LSP, the minimization operation replaces the traditional cumulative operations. Searching for the minimum is, to some extent, similar to performing a weak classification: the “positive” is identified as the minimum of the LSF. This selectivity is what makes LSP more powerful than traditional projections. Furthermore, designing a specific LSF gives us an opportunity to construct a good detector.
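The contrast between a cumulative projection and the selective projection of Eqs. (1)-(4) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the LSF has already been evaluated at every pixel and stored as a list of rows, and the function names and the toy `lsf_map` are hypothetical.

```python
def integral_projection_v(image):
    """Traditional vertical integral projection: the mean of each column."""
    height = len(image)
    width = len(image[0])
    return [sum(image[y][x] for y in range(height)) / height for x in range(width)]


def vertical_lsp(lsf_map):
    """Vertical LSP, Eqs. (1)-(2): per-column minimum of the LSF and its row index."""
    height = len(lsf_map)
    width = len(lsf_map[0])
    responses, positions = [], []
    for x in range(width):
        column = [lsf_map[y][x] for y in range(height)]
        best_y = min(range(height), key=column.__getitem__)  # first minimum wins ties
        responses.append(column[best_y])
        positions.append(best_y)
    return responses, positions


def horizontal_lsp(lsf_map):
    """Horizontal LSP, Eqs. (3)-(4): per-row minimum of the LSF and its column index."""
    responses, positions = [], []
    for row in lsf_map:
        best_x = min(range(len(row)), key=row.__getitem__)
        responses.append(row[best_x])
        positions.append(best_x)
    return responses, positions


# Toy LSF map (rows are y, columns are x); the small value at (x=2, y=1)
# stands in for a pupil-like neighborhood.
lsf_map = [
    [9.0, 8.0, 7.0, 8.0],
    [9.0, 6.0, 0.5, 7.0],
    [9.0, 8.0, 6.0, 8.0],
]
lsp_v, trace_v = vertical_lsp(lsf_map)
lsp_h, trace_h = horizontal_lsp(lsf_map)
```

In this toy case both trace lines pass through the pixel (2, 1), whereas the integral projection only reports column averages and carries no 2D position.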

3 LSP for Eye Detection

The performance of LSP is largely determined by the LSF used. In this section, we introduce the design of the LSF and show that the designed LSF, integrated with our LSP algorithm, produces robust eye detections. Finally, to further reduce false detections, an SVM is used to validate the real eyes.

3.1 The Design of LSF

In our LSP framework, the ideal LSF output (non-negative) should be small for eyes and large for non-eyes. In other words, the LSF should be designed to measure the similarity between the local region under examination and an eye region. Since the eye image has low intensity and high contrast, we propose to use the LBP histogram distance as our LSF.

The local binary pattern (LBP) [4] is a powerful method for texture description. The operator labels the pixels of an image by thresholding the neighborhood of each pixel with the center value and interpreting the result as a binary number. The size of the neighborhood and the number of sampling points can be arbitrary when circular neighborhoods are used and the pixel values are bilinearly interpolated. Denoting an image filtered by the LBP operator $lbp$ as $f_{lbp}(x, y)$, the LBP histogram $H^{lbp}_N = \{H^{lbp}_{i,N}\}$ obtained in a neighborhood $N$ is calculated as:

$$H^{lbp}_{i,N} = \sum_{(x, y) \in N} I(f_{lbp}(x, y) = i), \quad i = 0, \ldots, n - 1 \tag{5}$$
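The labeling and counting steps can be sketched as follows. This is a simplified illustration under stated assumptions: the 8 neighbors are taken at integer offsets on the 3 × 3 ring (radius 1) rather than on a circle with bilinear interpolation, a neighbor that equals the center is thresholded to 1, and all function names are hypothetical.

```python
# Offsets (dx, dy) of the 8 neighbors on the 3x3 ring.
# Assumption: circular sampling with bilinear interpolation is replaced
# by the nearest integer pixels at radius 1.
OFFSETS = [(-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]


def lbp_code(image, x, y):
    """Threshold the 8 neighbors against the center and pack the bits into one label."""
    center = image[y][x]
    code = 0
    for bit, (dx, dy) in enumerate(OFFSETS):
        if image[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_image(image):
    """Label every interior pixel; border pixels are left as None."""
    height, width = len(image), len(image[0])
    labels = [[None] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            labels[y][x] = lbp_code(image, x, y)
    return labels


def lbp_histogram(labels, x0, y0, x1, y1, n_labels=256):
    """Eq. (5): count each label i inside the window [x0, x1) x [y0, y1)."""
    hist = [0] * n_labels
    for y in range(y0, y1):
        for x in range(x0, x1):
            if labels[y][x] is not None:
                hist[labels[y][x]] += 1
    return hist


# Toy 3x3 patch: three neighbors are brighter than the center pixel.
patch = [
    [1, 9, 1],
    [9, 5, 1],
    [1, 9, 1],
]
labels = lbp_image(patch)
hist = lbp_histogram(labels, 0, 0, 3, 3)
```

With 8 sampling points there are 2^8 = 256 possible labels, which is why `n_labels` defaults to 256 here.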

where $n$ is the number of labels produced by $lbp$, and $I(A) = 1$ if condition $A$ is true, $I(A) = 0$ otherwise. The LBP histogram depicts the appearance distribution over a local region. Using a small training set of eye images, we can calculate a template eye LBP histogram by averaging all the histograms extracted from the neighborhoods of pupils. Based on the template, the LSF is designed as the Chi-square ($\chi^2$) distance between a neighborhood histogram and the template:

$$f^{lbp}_{N(x,y)}(x, y) = \chi^2\left(H^{lbp}_{N(x,y)}, T^{lbp}\right) \tag{6}$$

where $N(x, y)$ stands for the pixel neighborhood, $T^{lbp}$ is the template LBP histogram of eyes, and

$$\chi^2(H, T) = \sum_{i=0}^{n-1} \frac{(H_i - T_i)^2}{H_i + T_i}$$

is the Chi-square distance. By scaling the radius of the neighborhood, we can evaluate the LSF at different scales, but enlarging the neighborhood prolongs the computing time. Instead, we enlarge the radius of the LBP operator, which leaves the computing time constant (the computational complexity of LBP depends on the number of sampling points). In this paper, we choose $lbp_{8,8}$ as the LBP operator, which has 8 sampling points in a neighborhood of 8 pixel radius. By using a neighborhood of 10 × 10, we can actually examine an 18 × 18 region.
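A small sketch of the template construction and the Chi-square distance may help fix the notation. It is illustrative only: the function names are hypothetical, the three-bin histograms are toy data, and skipping bins where both histograms are empty is an assumption added here to avoid division by zero (the text does not say how such bins are handled).

```python
def average_histograms(histograms):
    """Template histogram: bin-wise mean of the training histograms."""
    count = len(histograms)
    return [sum(bins) / count for bins in zip(*histograms)]


def chi_square(h, t):
    """Chi-square distance between histogram h and template t.

    Assumption: bins with h_i + t_i == 0 contribute nothing.
    """
    total = 0.0
    for hi, ti in zip(h, t):
        if hi + ti > 0:
            total += (hi - ti) ** 2 / (hi + ti)
    return total


# Toy example with 3 bins: two "pupil" histograms are averaged into a template.
template = average_histograms([[4, 0, 2], [2, 2, 2]])  # [3.0, 1.0, 2.0]
d_same = chi_square([3, 1, 2], template)  # identical to the template
d_far = chi_square([0, 4, 2], template)   # 9/3 + 9/5 + 0
```

A neighborhood whose histogram matches the template yields a distance of zero, so pupil-like regions show up as minima of the LSF, which is exactly what the minimization in LSP looks for.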

3.2 Eye Candidate Detection via LSP

To accelerate the projection process, we first calculate the LSF for each pixel in the image and store the value as the pixel intensity. This results in a similarity map. Two similarity maps are illustrated in Fig.1(a); low-intensity pixels indicate high similarity. The LSP applied to the similarity map then selects the pixel with minimum intensity along each projection axis.

Fig.1 shows some eye images and the corresponding LSP results, compared with the results of integral projection (IP). Using this figure for illustration, we discuss some advantages of LSP. Fig.1(a) shows the robustness of LSP against rotation. The eye on the right is rotated by 30 degrees from the left one. Unlike the IP results, the vertical and horizontal LSP curves do not change much between the two eyes. This is because the LSF is defined on a neighborhood: it examines the local appearance in all directions, not just the direction along the projection axis. From the two similarity maps, we can also see that the parts corresponding to pupils are approximately circles, which are invariant to rotation. This explains the robustness of LSP against rotation.

Figure 1. LSP curves and trace lines. Panels: (a), (b), (c).

Fig.1(b) shows some eye images and the corresponding vertical and horizontal LSP curves. The red (vertical LSP) and blue (horizontal LSP) lines in Fig.1(c) are the trace lines of LSP: each point on a trace line is a pixel with minimal LSF. Although occlusion (hair), accessories (glasses) and bad lighting (shadow) are present in Fig.1(b), the pupil positions can still be reliably detected as local minima (valleys) in the LSP curves. The corresponding 2D coordinates of these local minima can be looked up from the trace lines and used as eye candidates. We mark the candidates on the trace lines as white dots in Fig.1(c). In most cases, the trace lines of the vertical and horizontal LSPs intersect at real pupils. The eye candidate detection strategy is thus as follows: if two pixels are detected as valleys by the vertical and horizontal LSPs respectively, and they are close to each other (within 3 pixels), then the mean position of the two pixels is used as an eye candidate. If no such intersecting pixels exist, then all the detected valleys are used as candidates.
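The candidate strategy can be sketched as below. This is a rough illustration under several assumptions: a valley is taken to be a strict local minimum at an interior point of the curve, "within 3 pixels" is read as Euclidean distance, and the function names and argument layout (response curves plus trace positions) are hypothetical.

```python
def find_valleys(curve):
    """Indices of strict local minima of a 1D projection curve (interior points only)."""
    return [i for i in range(1, len(curve) - 1)
            if curve[i] < curve[i - 1] and curve[i] < curve[i + 1]]


def eye_candidates(lsp_v, trace_v, lsp_h, trace_h, max_dist=3):
    """Pair valleys of the vertical and horizontal LSPs that lie close together."""
    # A valley at column x of the vertical LSP maps to pixel (x, trace_v[x]).
    points_v = [(x, trace_v[x]) for x in find_valleys(lsp_v)]
    # A valley at row y of the horizontal LSP maps to pixel (trace_h[y], y).
    points_h = [(trace_h[y], y) for y in find_valleys(lsp_h)]
    candidates = []
    for xv, yv in points_v:
        for xh, yh in points_h:
            # Assumption: closeness is measured as Euclidean distance.
            if (xv - xh) ** 2 + (yv - yh) ** 2 <= max_dist ** 2:
                candidates.append(((xv + xh) / 2, (yv + yh) / 2))
    # Fallback from the text: with no intersection, keep every detected valley.
    return candidates if candidates else points_v + points_h


# Both projections see a valley at the same pixel (2, 1).
hits = eye_candidates([9.0, 6.0, 0.5, 7.0, 9.0], [0, 1, 1, 1, 0],
                      [7.0, 0.5, 6.0], [2, 2, 2])

# The valleys are far apart, so every valley is returned instead.
fallback = eye_candidates([5, 1, 5], [0, 0, 0],
                          [5] * 10 + [1, 5], [9] * 12)
```

Real LSP curves are noisy, so a practical valley detector would probably smooth the curve or require a minimum depth; that refinement is left out of this sketch.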

3.3 SVM based Eye Pair Verification

After obtaining the eye candidates, we need to verify the real eyes among them. We use an SVM for this validation. The SVM classifier is trained using face images with normal illumination and frontal pose. A 40 × 20 (width × height) window is centered at each pupil, and the intensities in the window are concatenated to form a positive sample. The negative samples are extracted similarly with windows centered at non-pupil pixels. The trained SVM has the form

$$f(x) = \sum_{i=1}^{N_s} \alpha_i y_i K(s_i, x) + b,$$

where $K(s_i, x)$ is a kernel function. The support vectors $s_i$, the weights $\alpha_i$ and the threshold $b$ are obtained by training. Note that we do not need to threshold the output of the SVM for a hard decision. Instead, we just need a score for each candidate and select the one with the highest score.

For robust detection under rotation, especially in-plane rotation, the left and right eye candidates are considered as eye pairs [3]. For each pair, the original eye region is rotated so that the left and right candidates have the same height. Then two 40 × 20 windows are centered at the left and right eye candidates respectively, and the scores of the sampled windows are obtained from the SVM. The left and right scores are summed to give a final score for the candidate eye pair. The pair with the highest score is selected as the final eyes.
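The scoring and pair selection described above can be sketched as follows. Several parts are assumptions rather than details from the paper: the kernel is not named in the text, so a Gaussian (RBF) kernel is used as a placeholder; `score_pair` is a hypothetical callable standing in for the rotate, crop and score steps; and a left candidate is required to have a smaller x coordinate than its right partner.

```python
import math


def rbf_kernel(s, x, gamma=0.5):
    """Gaussian kernel; the paper does not name its kernel, so this is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(s, x)))


def svm_score(x, support_vectors, alphas, labels, bias, kernel=rbf_kernel):
    """f(x) = sum_i alpha_i * y_i * K(s_i, x) + b, returned as a raw score."""
    return sum(a * y * kernel(s, x)
               for s, a, y in zip(support_vectors, alphas, labels)) + bias


def leveling_angle(left, right):
    """Angle (radians) of the line through both candidates; rotating the region
    by its negative brings the two candidates to the same height."""
    return math.atan2(right[1] - left[1], right[0] - left[0])


def best_eye_pair(candidates, score_pair):
    """Try every (left, right) combination and keep the highest summed score."""
    best, best_score = None, -math.inf
    for left in candidates:
        for right in candidates:
            if left[0] >= right[0]:  # the left candidate must lie to the left
                continue
            score = score_pair(left, right)
            if score > best_score:
                best, best_score = (left, right), score
    return best, best_score


# Toy stand-in for the real scorer: prefer pairs at similar heights.
def toy_score(left, right):
    return -abs(left[1] - right[1])


pair, score = best_eye_pair([(10, 20), (50, 22), (30, 60)], toy_score)
```

In a full pipeline `score_pair` would rotate the eye region by `-leveling_angle(left, right)`, sample the two 40 × 20 windows, and return the sum of their `svm_score` values.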

4 Experiment

The experiment is performed using images from the CAS-PEAL database [2]. After the faces are detected, the windows containing eyes are cropped as the upper part of the faces and resized to 120 × 50. The eye windows contain the eyebrows and part of the nose. The template eye histogram is calculated on a small training set: the first 100 female and 100 male face images in the Normal subset of the database. The pupils are first labeled on the training images and the LBP histograms are obtained on the 10 × 10 neighborhoods of the pupils. Then the histograms are averaged to obtain the template. The same 200 images are also used for training the SVM, so we obtain just 400 positive eye samples. The number of negative samples is 8,000. Compared with most learning-based eye detection methods, few images are used for training. We test our algorithm on three data sets. The first one is the Normal subset of the CAS-PEAL database, containing frontal-pose and normally illuminated face im-

Table 1. Eye detection correct rate Test Set method correct rate% (< pixel)