
ROBUST COLOR EDGE DETECTION THROUGH TENSOR VOTING

Rodrigo Moreno¹, Miguel Angel Garcia², Domenec Puig¹, Carme Julià¹*

¹ Rovira i Virgili University, Intelligent Robotics and Computer Vision Group, Dept. of Computer Science and Mathematics, Av. Països Catalans 26, 43007 Tarragona, Spain
² Autonomous University of Madrid, Dept. of Informatics Engineering, Cra. Colmenar Viejo Km 15, 28049 Madrid, Spain

* This research has been partially supported by the Spanish Ministry of Science and Technology under project DPI2007-66556-C03-03, by the Commissioner for Universities and Research of the Department of Innovation, Universities and Companies of Catalonia's Government, and by the European Social Fund.

ABSTRACT

This paper presents a new method for color edge detection based on the tensor voting framework, a robust perceptual grouping technique used to extract salient information from noisy data. The framework is adapted to encode color information through tensors and to propagate it within a neighborhood by means of a voting process specifically designed for color edge detection, which takes into account perceptual color differences, region uniformity and edginess according to a set of intuitive perceptual criteria. Perceptual color differences are estimated with an optimized version of the CIEDE2000 formula, while uniformity and edginess are estimated through saliency maps obtained from the tensor voting process. Experiments show that the proposed algorithm is more robust than state-of-the-art detectors while attaining a similar precision.

Index Terms— Image edge analysis, tensor voting, CIELAB, CIEDE2000.

1. INTRODUCTION

The performance of many computer vision applications directly depends on the effectiveness of a preliminary edge detection process. The final goal of edge detection is to find "meaningful discontinuities" in a digital image. Although many edge detectors have proven effective (e.g., [1], [2]), their performance degrades on noisy images. This paper proposes a new edge detector that performs similarly to state-of-the-art methods on noiseless images and outperforms them on noisy images. The proposed detector is based on an adaptation of the tensor voting framework (TVF) [3] to edge detection (Section 2). First, an encoding process specifically designed to encode color, uniformity and edginess into tensors is introduced (Section 2.1).



Second, a voting process specifically tailored to the edge detection problem is also presented (Sections 2.2 and 2.3). Although every color channel is processed independently, possible correlations between channels are also taken into account by the proposed method. A comparison of the proposed detector with state-of-the-art methods is presented in Section 3.

2. TENSOR VOTING FRAMEWORK FOR COLOR EDGE DETECTION

The input of the proposed method is the set of pixels of a color image. Thus, positional and color information is available for every input pixel. Positional information is used to determine the neighborhood of every pixel, while color information is used to define the tensors in the encoding step. The next subsections describe the details of the proposed edge detector.

2.1. Encoding of Color Information

Before applying the proposed method, color is converted to the CIELAB space. Every CIELAB channel is then normalized to the range $[0, \pi/2]$. In the first step of the method, the information of every pixel is encoded through three second-order 2D tensors, one for each normalized CIELAB color channel. These tensors are represented by $2 \times 2$ symmetric positive semidefinite matrices that can be graphically depicted as 2D ellipses. There are two extreme cases for the proposed tensors: stick tensors, which are stick-shaped ellipses with a single eigenvalue $\lambda_1$ different from zero, and ball tensors, which are circumference-shaped ellipses whose eigenvalues $\lambda_1$ and $\lambda_2$ are equal.

Three perceptual measures are encoded in the tensors associated with every input pixel, namely: the most likely normalized noiseless color at the pixel (in the specific channel), a measure of local uniformity (how edgeless its neighborhood is), and an estimation of edginess (how likely edges or texture are to be found at the pixel's location). The most likely normalized noiseless color is encoded by the angle $\alpha$ between the $x$ axis, which represents the lowest possible color value in the corresponding channel, and the eigenvector corresponding to the largest eigenvalue. For example, in channel L, a tensor with $\alpha = 0$ encodes black, whereas a tensor with $\alpha = \pi/2$ encodes white. In addition, local uniformity and edginess are encoded by means of the normalized $\hat{s}_1 = (\lambda_1 - \lambda_2)/\lambda_1$ and $\hat{s}_2 = \lambda_2/\lambda_1$ saliencies, respectively.


Fig. 1. Encoding process for channel L. Color, uniformity and edginess are encoded by means of $\alpha$ and the normalized saliencies $\hat{s}_1 = (\lambda_1 - \lambda_2)/\lambda_1$ and $\hat{s}_2 = \lambda_2/\lambda_1$, respectively.

Thus, a pixel located in a completely uniform region is represented by three stick tensors, one per color channel. In contrast, a pixel located at an ideal edge is represented by three ball tensors, one per color channel. Figure 1 shows the graphical interpretation of a tensor for channel L.

Before applying the voting process, it is necessary to initialize the tensors associated with every pixel. The most likely noiseless colors can be initialized with the colors of the input pixels, encoded by means of the angle $\alpha$ between the $x$ axis and the principal eigenvector, as described above. However, since measures of uniformity and edginess are usually unavailable at the beginning of the voting process, the normalized saliency $\hat{s}_1$ is initialized to one and the normalized saliency $\hat{s}_2$ to zero. These initializations allow the method to estimate more appropriate values of the normalized saliencies in the next stages, as described in the next subsection. Hence, the initial color information of a pixel is encoded through three stick tensors oriented along the directions that represent that color in the normalized CIELAB channels:

$$T_c(p) = t_c(p)\, t_c(p)^T,$$

where $T_c(p)$ is the tensor of the $c$-th color channel (L, a and b) at pixel $p$, $t_c(p) = [\cos(C_c(p)),\ \sin(C_c(p))]^T$, and $C_c(p)$ is the normalized value of the $c$-th color channel at $p$.
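The encoding step admits a compact illustration. The following Python snippet is a minimal sketch, not the authors' implementation; it assumes scikit-image for the RGB-to-CIELAB conversion, and the channel bounds used for the normalization (L in [0, 100], a and b in [-128, 127]) are assumptions of this sketch.

```python
# Minimal sketch of the encoding step (Section 2.1). Assumes scikit-image
# for the RGB -> CIELAB conversion; the channel bounds below are an
# assumption of this sketch, not taken from the paper.
import numpy as np
from skimage import color

def encode(rgb):
    """Return per-pixel, per-channel 2x2 stick tensors T_c(p) = t_c t_c^T."""
    lab = color.rgb2lab(rgb)                       # H x W x 3, CIELAB
    lo = np.array([0.0, -128.0, -128.0])           # assumed channel bounds
    hi = np.array([100.0, 127.0, 127.0])
    C = (lab - lo) / (hi - lo) * (np.pi / 2.0)     # angles in [0, pi/2]
    t = np.stack([np.cos(C), np.sin(C)], axis=-1)  # H x W x 3 x 2
    T = t[..., :, None] * t[..., None, :]          # H x W x 3 x 2 x 2
    return T
```

Each tensor produced this way is a pure stick ($\hat{s}_1 = 1$, $\hat{s}_2 = 0$), which matches the initialization described above.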


2.2. Voting Process

The voting process requires three measurements for every pair of pixels $p$ and $q$: the perceptual color difference $\Delta E_{pq}$; the joint uniformity measurement $U_c(p, q)$, used to determine whether both pixels belong to the same region; and the likelihood $\eta_c(p)$ of a pixel being impulse noise. $\Delta E_{pq}$ is calculated through CIEDE2000 [4], while $U_c(p, q) = \hat{s}_{1c}(p)\, \hat{s}_{1c}(q)$, and $\eta_c(p) = \hat{s}_{2c}(p) - \mu_{\hat{s}_{2c}}(p)$ if $p$ is located at a local maximum and zero otherwise, where $\mu_{\hat{s}_{2c}}(p)$ is the mean of $\hat{s}_{2c}$ over the neighborhood of $p$.

In the second step of the method, the tensors associated with every pixel are propagated to their neighbors through a convolution-like process. This step is applied independently to the tensors of every channel (L, a and b). The voting process is carried out by means of specially designed tensorial functions referred to as propagation functions, which take into account not only the information encoded in the tensors but also the local relations between neighbors. The voting process described in [3] cannot be directly applied to edge detection, since a pixel cannot appropriately propagate its information to its neighbors without taking those local relations into account.

Two propagation functions are proposed for edge detection: a stick and a ball propagation function. The stick propagation function is used to propagate the most likely noiseless color of a pixel, while the ball propagation function is used to increase edginess where required. The application of the first function yields stick votes, while the application of the second produces ball votes. Stick votes are used to eliminate noise and to increase edginess where the colors of the voting and voted pixels differ. Ball votes are used to increase the relevance of the most important edges.

A stick vote can be seen as a stick-shaped tensor, $ST_c(p)$, with a strength modulated by three scalar factors. The proposed stick propagation function, $S_c(p, q)$, which allows a pixel $p$ to cast a stick vote on a neighboring pixel $q$ for channel $c$, is given by:

$$S_c(p, q) = GS(p, q)\, \bar{\eta}_c(p)\, SV_c(p, q)\, ST_c(p), \qquad (1)$$

with $ST_c(p)$, $GS(p, q)$, $\bar{\eta}_c(p)$ and $SV_c(p, q)$ defined as follows. First, the tensor $ST_c(p)$ encodes the most likely normalized noiseless color at $p$. Thus, $ST_c(p)$ is defined as the tensorized eigenvector corresponding to the largest eigenvalue of the voting pixel, that is, $ST_c(p) = e_{1c}(p)\, e_{1c}(p)^T$, where $e_{1c}(p)$ is the eigenvector with the largest eigenvalue of the tensor associated with channel $c$ at $p$. Second, the three scalar factors in (1), each ranging between zero and one, are defined as follows. The first factor, $GS(p, q)$, models the influence of the distance between $p$ and $q$ on the vote strength. Thus, $GS(p, q) = G_{\sigma_s}(\|p - q\|)$, where $G_{\sigma_s}(\cdot)$ is a decaying Gaussian function with zero mean and a user-defined standard deviation $\sigma_s$. The second factor, $\bar{\eta}_c(p)$, defined as $\bar{\eta}_c(p) = 1 - \eta_c(p)$, is introduced in order to prevent a pixel $p$ previously classified as impulse noise from propagating its information. The third factor, $SV_c$, takes into account the influence of the perceptual color difference, the uniformity and the noisiness of the voted pixel. This factor is given by:

$$SV_c(p, q) = \bar{\eta}_c(q)\, \overline{SV}_c(p, q) + \eta_c(q), \qquad (2)$$

where $\overline{SV}_c(p, q) = [G_{\sigma_d}(\Delta E_{pq}) + U_c(p, q)]/2$ and $\bar{\eta}_c(q) = 1 - \eta_c(q)$. $\overline{SV}_c(p, q)$ allows a pixel $p$ to cast a stronger stick vote on $q$ either if both pixels belong to the same uniform region, or if the perceptual color difference between them is small. This behavior is achieved by means of the factor $U_c(p, q)$ and the decaying Gaussian function on $\Delta E_{pq}$ with a user-defined standard deviation $\sigma_d$. A normalizing factor of two is used to make $\overline{SV}_c(p, q)$ vary from zero to one. The term $\eta_c(q)$ in (2) makes noisy voted pixels $q$ adopt the color of their voting neighbors $p$, disregarding local uniformity measurements and perceptual color differences between $p$ and $q$. The term $\bar{\eta}_c(q)$ in (2) makes $SV_c$ vary from zero to one. The effect of $\eta_c(q)$ and $\bar{\eta}_c(q)$ on the strength of the stick vote received at a noiseless pixel $q$ is null.

In turn, a ball vote can be seen as a ball-shaped tensor, $BT(p)$, with a strength controlled by the scalar factors $GS(p, q)$, $\bar{\eta}_c(p)$ and $BV_c(p, q)$, each varying between zero and one. The ball propagation function, $B_c(p, q)$, which allows a pixel $p$ to cast a ball vote on a neighboring pixel $q$ for channel $c$, is given by:

$$B_c(p, q) = GS(p, q)\, \bar{\eta}_c(p)\, BV_c(p, q)\, BT(p), \qquad (3)$$

with $BT(p)$, $GS(p, q)$, $\bar{\eta}_c(p)$ and $BV_c(p, q)$ defined as follows. First, the ball tensor, represented by the identity matrix $I$, is the only possible choice for $BT(p)$, since it is the only tensor that complies with the two main design restrictions: a ball vote must be equivalent to casting stick votes for all possible colors under the hypothesis that all of them are equally likely, and the normalized $\hat{s}_1$ saliency must be zero when only ball votes are received at a pixel. Second, $GS(p, q)$ and $\bar{\eta}_c(p)$ are the same factors introduced in (1) for the stick propagation function, and they are included for similar reasons. Finally, the scalar factor $BV_c(p, q)$ is given by:

$$BV_c(p, q) = \frac{\bar{G}_{\sigma_d}(\Delta E_{pq}) + \bar{U}_c(p, q) + G_{\sigma_d}(\Delta E^c_{pq})}{3}, \qquad (4)$$

where $\bar{G}_{\sigma_d}(\cdot) = 1 - G_{\sigma_d}(\cdot)$ and $\bar{U}_c(p, q) = 1 - U_c(p, q)$. $BV_c(p, q)$ models the fact that a pixel $p$ must reinforce the edginess at the voted pixel $q$ either if there is a large perceptual color difference between $p$ and $q$, or if $p$ and $q$ do not lie in a uniform region. This behavior is modeled by means of $\bar{G}_{\sigma_d}(\Delta E_{pq})$ and $\bar{U}_c(p, q)$. The additional term $G_{\sigma_d}(\Delta E^c_{pq})$ is introduced in order to increase the edginess of pixels in which the only noisy channel is $c$, where $\Delta E^c_{pq}$ denotes the perceptual color difference measured only in the specific color channel $c$. The normalizing factor of three in (4) allows the ball propagation function to cast ball votes with a strength between zero and one.

The proposed voting process at every pixel is carried out by adding all the tensors propagated towards it from its neighbors through the above propagation functions. Thus, the total vote received at a pixel $q$ for each color channel $c$ is given by:

$$TV_c(q) = \sum_{p \in \mathrm{neigh}(q)} S_c(p, q) + B_c(p, q).$$

The voting process is applied twice. The first application is used to obtain an initial estimation of the normalized $\hat{s}_1$ and $\hat{s}_2$ saliencies, as they are necessary to calculate $U_c(p, q)$ and $\eta_c(p)$. In this first estimation, only perceptual color differences and spatial distances are taken into account. In the second application, the tensors at every pixel are initialized with the tensors obtained after the first application. After this initialization, (1) and (3) can be applied in their full definition, since all necessary data are available.
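As an illustration of Eqs. (1)-(4), the following single-channel Python sketch accumulates stick and ball votes over a square neighborhood. It is a simplification under stated assumptions, not the authors' implementation: the per-channel absolute difference stands in for both the full CIEDE2000 difference $\Delta E_{pq}$ and the per-channel difference $\Delta E^c_{pq}$, the noise likelihood $\eta$ is taken as precomputed, and the neighborhood radius is an arbitrary choice.

```python
# Simplified single-channel sketch of one voting pass (Eqs. (1)-(4)).
# Assumptions: the per-channel absolute difference stands in for both
# Delta E_pq and Delta E^c_pq; eta (impulse-noise likelihood) is given;
# the pixel's own vote is included for simplicity.
import numpy as np

def G(x, sigma):
    """Decaying Gaussian with zero mean."""
    return np.exp(-(x ** 2) / (2.0 * sigma ** 2))

def voting_pass(T, C, eta, sigma_s=1.3, sigma_d=2.5, radius=3):
    """T: (H,W,2,2) tensors of one channel; C: (H,W) normalized colors;
    eta: (H,W) impulse-noise likelihood. Returns accumulated votes TV."""
    H, W = C.shape
    evals, evecs = np.linalg.eigh(T)                 # ascending eigenvalues
    s1 = (evals[..., 1] - evals[..., 0]) / np.maximum(evals[..., 1], 1e-12)
    e1 = evecs[..., :, 1]                            # principal eigenvector
    ST = e1[..., :, None] * e1[..., None, :]         # stick tensors ST_c(p)
    TV = np.zeros_like(T)
    for py in range(H):
        for px in range(W):
            for qy in range(max(0, py - radius), min(H, py + radius + 1)):
                for qx in range(max(0, px - radius), min(W, px + radius + 1)):
                    GS = G(np.hypot(py - qy, px - qx), sigma_s)
                    dE = abs(C[py, px] - C[qy, qx])  # stand-in for CIEDE2000
                    U = s1[py, px] * s1[qy, qx]      # joint uniformity
                    # Stick vote, Eqs. (1)-(2).
                    SVbar = (G(dE, sigma_d) + U) / 2.0
                    SV = (1 - eta[qy, qx]) * SVbar + eta[qy, qx]
                    TV[qy, qx] += GS * (1 - eta[py, px]) * SV * ST[py, px]
                    # Ball vote, Eqs. (3)-(4); dE plays both difference roles.
                    BV = ((1 - G(dE, sigma_d)) + (1 - U) + G(dE, sigma_d)) / 3.0
                    TV[qy, qx] += GS * (1 - eta[py, px]) * BV * np.eye(2)
    return TV
```

Note that $\hat{s}_1 + \hat{s}_2 = 1$ for the normalized saliencies, so a pixel dominated by ball votes (large $\hat{s}_2$) is read as edgy.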


After applying the voting process described above, it is necessary to obtain the eigenvectors and eigenvalues of $TV_L(p)$, $TV_a(p)$ and $TV_b(p)$ at every pixel $p$ in order to analyze its local perceptual information. The voting results can be interpreted as follows: uniformity increases with the normalized $\hat{s}_1$ saliency, and edginess increases as the normalized $\hat{s}_2$ saliency becomes greater than the normalized $\hat{s}_1$ saliency. Hence, the map of normalized $\hat{s}_2$ saliencies can be used directly as an edginess map. Standard post-processing steps, such as non-maximum suppression, hysteresis or thresholding, can then be applied to the normalized $\hat{s}_2$ saliency map in order to obtain binary edge maps.

The results can be improved by reducing the noise in the image. This denoising step can be achieved by replacing every pixel's color with the most likely normalized noiseless color encoded in its tensors. The method can then be applied iteratively to the denoised images, which improves the final performance of the edge detector.
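This interpretation suggests a direct post-processing sketch: eigendecompose the accumulated tensors, read the edginess map from the normalized $\hat{s}_2$ saliencies, and decode the most likely noiseless color from the angle of the principal eigenvector for the denoising iteration. The channel bounds mirror the assumed ones in the encoding sketch above; this is an illustration, not the authors' implementation.

```python
# Sketch of the post-voting interpretation and the denoising step.
# The CIELAB channel bounds match the (assumed) ones in the encoding sketch.
import numpy as np

def s2_edginess(TV):
    """Normalized s2 saliency map from accumulated (H,W,2,2) tensors."""
    evals = np.linalg.eigvalsh(TV)                   # ascending order
    return evals[..., 0] / np.maximum(evals[..., 1], 1e-12)

def decode_colors(TV_L, TV_a, TV_b):
    """Most likely noiseless CIELAB colors from the principal eigenvectors."""
    lo = np.array([0.0, -128.0, -128.0])             # assumed channel bounds
    hi = np.array([100.0, 127.0, 127.0])
    lab = []
    for c, TV in enumerate((TV_L, TV_a, TV_b)):
        e1 = np.linalg.eigh(TV)[1][..., :, 1]        # principal eigenvector
        # Absolute values resolve the eigenvector sign ambiguity, keeping
        # the decoded angle alpha in [0, pi/2].
        alpha = np.arctan2(np.abs(e1[..., 1]), np.abs(e1[..., 0]))
        lab.append(alpha / (np.pi / 2.0) * (hi[c] - lo[c]) + lo[c])
    return np.stack(lab, axis=-1)
```

The edginess map returned by `s2_edginess` is what the standard post-processing steps (non-maximum suppression, hysteresis, thresholding) would operate on.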

2.3. Parameters of the CIEDE2000 formula

The CIEDE2000 formula [4], which estimates the perceptual color difference $\Delta E_{pq}$ between two pixels $p$ and $q$, has three parameters, $k_L$, $k_C$ and $k_H$, which weight the differences in CIELAB luminance, chroma and hue respectively. They can be adjusted to make the CIEDE2000 formula more suitable for a specific application by taking into account factors such as noise or background luminance, since those factors were not explicitly considered in the definition of the formula. These parameters must be greater than or equal to one. Based on the formulation given in [5], the following equations for these parameters are proposed:

$$k_L = F_{B_L} F_{\eta_L}, \quad k_C = F_{B_C} F_{\eta_C}, \quad k_H = F_{B_h} F_{\eta_h}, \qquad (5)$$

where the $F_{B_m}$ are factors that take into account the influence of the background color on the calculation of color differences for color component $m$ (L, C and h), and the $F_{\eta_m}$ are factors that take into account the influence of noise on the calculation of color differences in component $m$. On the one hand, large color differences in chromatic channels become less perceptually visible as the background luminance decreases. Thus, the influence of the background on the CIEDE2000 formula can be modeled by $F_{B_L} = 1$ and $F_{B_C} = F_{B_h} = 1 + 3\,(1 - Y_B)$, where $Y_B$ is the mean background luminance. On the other hand, large color differences become less perceptually visible as noise increases. The influence of noise on CIEDE2000 can be modeled by means of $F_{\eta_m} = MAD(I)_m - MAD(G)_m$, where $I$ is the image, $G$ is a Gaussian-blurred version of $I$, and $MAD(\cdot)_m$ is the median absolute difference (MAD) calculated on component $m$. $F_{\eta_m}$ is set to 1 in noiseless regions.
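One possible reading of this parameter estimation is sketched below in Python. It assumes SciPy's `gaussian_filter`, interprets MAD as the median absolute deviation of each L, C, h plane, and ignores hue wrap-around; the blur width, the estimate of $Y_B$ as the mean of $L/100$, and the clamping to 1 are all assumptions of this sketch rather than details given in the paper.

```python
# Hedged sketch of Section 2.3: estimating k_L, k_C, k_H (Eq. (5)).
# MAD is read here as the median absolute deviation per component; the blur
# sigma, the Y_B estimate, the clamping to 1 and the neglected hue
# wrap-around are assumptions of this sketch.
import numpy as np
from scipy.ndimage import gaussian_filter

def mad(x):
    """Median absolute deviation from the median."""
    return np.median(np.abs(x - np.median(x)))

def lch_planes(lab):
    """CIELAB image -> luminance, chroma and hue planes (components L, C, h)."""
    L = lab[..., 0]
    C = np.hypot(lab[..., 1], lab[..., 2])
    h = np.degrees(np.arctan2(lab[..., 2], lab[..., 1]))
    return L, C, h

def ciede2000_params(lab, blur_sigma=1.5):
    """Estimate (k_L, k_C, k_H) = F_B * F_eta for a CIELAB image."""
    Y_B = np.clip(np.mean(lab[..., 0]) / 100.0, 0.0, 1.0)  # mean luminance
    F_B = [1.0, 1.0 + 3.0 * (1.0 - Y_B), 1.0 + 3.0 * (1.0 - Y_B)]
    blurred = gaussian_filter(lab, sigma=(blur_sigma, blur_sigma, 0.0))
    F_eta = [max(1.0, mad(im) - mad(bl))                   # >= 1 by design
             for im, bl in zip(lch_planes(lab), lch_planes(blurred))]
    return tuple(fb * fe for fb, fe in zip(F_B, F_eta))
```

The estimated parameters can then be passed to a CIEDE2000 implementation; for instance, `skimage.color.deltaE_ciede2000` accepts `kL`, `kC` and `kH` arguments.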

3. RESULTS

Fig. 2. First row: original image and the edginess maps generated by the LGC, Compass and TVED methods, respectively, for two different images. Second row: noisy versions of the same images and their corresponding edginess maps (LGC, Compass and TVED). PSNR and FOM for the original (FOM_O) and the noisy image (FOM_N) are as follows:

                    LGC           Compass       TVED
First image
  PSNR (dB)         16.74         16.05         21.13
  FOM_O, FOM_N      0.45, 0.38    0.45, 0.43    0.45, 0.43
Second image
  PSNR (dB)         16.93         20.20         22.28
  FOM_O, FOM_N      0.45, 0.43    0.44, 0.40    0.46, 0.44

Fifteen outdoor images from the Berkeley segmentation data set [6] and their corresponding ground truths have been used in the experiments. The methods proposed by Maire et al. [2], referred to as the LGC method, and by Ruzon and Tomasi [1], referred to as the Compass method, have been used in the comparisons, since they are representative of the state of the art in edge detection. The default parameters of the LGC method have been used. The Compass algorithm has been applied with $\sigma = 2$, since the best overall performance of this algorithm is attained with this standard deviation. Five iterations of the proposed method, referred to as TVED, have been run with parameters $\sigma_s = 1.3$ and $\sigma_d = 2.5$. Gaussian noise with a standard deviation of 30 has been added to the images for the robustness analysis in order to simulate very noisy scenarios.

Performance has been evaluated with two metrics: Pratt's Figure of Merit (FOM) [7], which measures precision, and the Peak Signal to Noise Ratio (PSNR), which measures robustness by comparing the edginess maps generated for the noiseless and noisy versions of the same image.

Figure 2 shows the edginess maps detected for two of the tested images¹. LGC generates fewer edges than the other methods, but it misses some important edges and their strength is reduced for the noisy images. The Compass operator generates too many edges, and the number of edges increases with noise. TVED behaves better, since it only detects the most important edges and is less influenced by noise. The PSNR confirms that TVED is the most robust detector, whereas the FOM indicates that the three methods have a similar precision, with TVED being slightly better.

¹ All the images are available at http://deim.urv.cat/~rivi/tved.html
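For reference, both evaluation metrics admit compact implementations. The sketch below uses the customary scaling constant $\alpha = 1/9$ for Pratt's FOM and SciPy's Euclidean distance transform; treating this as the exact FOM variant, and the unit peak value used for the PSNR between edginess maps, are assumptions of this sketch.

```python
# Sketches of the two evaluation metrics used in Section 3. The FOM scaling
# constant alpha = 1/9 and the unit peak for PSNR are assumptions.
import numpy as np
from scipy.ndimage import distance_transform_edt

def pratt_fom(detected, ground_truth, alpha=1.0 / 9.0):
    """detected, ground_truth: boolean edge maps of the same shape."""
    # Distance from every pixel to the nearest ground-truth edge pixel.
    dist = distance_transform_edt(~ground_truth)
    n_max = max(int(detected.sum()), int(ground_truth.sum()))
    if n_max == 0:
        return 1.0
    return float(np.sum(1.0 / (1.0 + alpha * dist[detected] ** 2)) / n_max)

def psnr(map_a, map_b, peak=1.0):
    """PSNR between two edginess maps (assumed to lie in [0, peak])."""
    mse = np.mean((map_a - map_b) ** 2)
    return np.inf if mse == 0.0 else 10.0 * np.log10(peak ** 2 / mse)
```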


4. CONCLUDING REMARKS

A new method for edge detection based on an adaptation of the TVF has been proposed. An optimized version of CIEDE2000, obtained by modifying its original parameters, has been used to measure perceptual color differences in non-controlled environments. Experimental results show that the use of a specific voting process makes the TVF a powerful tool for edge detection. PSNR and FOM have been used to compare the performance of TVED against two of the most representative state-of-the-art edge detectors. TVED has been found to be more robust and slightly more precise than the other algorithms.

5. REFERENCES

[1] M. Ruzon and C. Tomasi, "Edge, junction, and corner detection using color distributions," IEEE Trans. PAMI, vol. 23, no. 11, pp. 1281–1295, 2001.
[2] M. Maire, P. Arbelaez, C. Fowlkes, and J. Malik, "Using contours to detect and localize junctions in natural images," in Proc. CVPR, 2008, pp. 1–8.
[3] G. Medioni, M. S. Lee, and C. K. Tang, A Computational Framework for Feature Extraction and Segmentation, Elsevier Science, 2000.
[4] M. R. Luo, G. Cui, and B. Rigg, "The development of the CIE 2000 colour-difference formula: CIEDE2000," Color Res. and Application, vol. 26, no. 5, pp. 340–350, 2001.
[5] C.-H. Chou and K.-C. Liu, "A fidelity metric for assessing visual quality of color images," in Proc. ICCCN, 2007, pp. 1154–1159.
[6] D. Martin, C. Fowlkes, D. Tal, and J. Malik, "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics," in Proc. ICCV, 2001, pp. II:416–423.
[7] W. K. Pratt, Digital Image Processing: PIKS Scientific Inside, Wiley-Interscience, fourth edition, 2007.