Pattern Recognition Image thresholding by ... - Semantic Scholar

Report 3 Downloads 120 Views
Pattern Recognition 42 (2009) 843 -- 856

Contents lists available at ScienceDirect

Pattern Recognition journal homepage: w w w . e l s e v i e r . c o m / l o c a t e / p r

Image thresholding by variational minimax optimization Baidya Nath Saha, Nilanjan Ray ∗ University of Alberta, Edmonton, Alberta, Canada T6G 2E8

A R T I C L E

I N F O

Article history: Received 23 October 2007 Received in revised form 19 August 2008 Accepted 19 September 2008

Keywords: Minimax optimization Variational calculus Image thresholding

A B S T R A C T

In this paper we introduce an adaptive image thresholding technique via minimax optimization of a novel energy functional that consists of a non-linear convex combination of an edge sensitive data fidelity term and a regularization term. While the proposed data fidelity term requires the threshold surface to intersect the image surface only at places with large image gradient magnitude, the regularization term enforces smoothness in the threshold surface. To the best of our knowledge, all the previously proposed energy functional-based adaptive image thresholding algorithms rely on manually set weighting parameters to achieve a balance between the data fidelity and the regularization terms. In contrast, we use minimax principle to automatically find this weighting parameter value, as well as the threshold surface. Our conscious choice of the energy functional permits a variational formulation within the minimax principle leading to a globally optimum solution. The proposed variational minimax optimization is carried out by an iterative gradient descent with exact line search technique that we experimentally demonstrate to be computationally far more attractive than the Fibonacci search applied to find the minimax solution. Our method shows promising results to preserve edge/texture structures in different benchmark images over other competing methods. We also demonstrate the efficacy of the proposed method for delineating lung boundaries from magnetic resonance imagery (MRI). © 2008 Elsevier Ltd. All rights reserved.

1. Introduction Image thresholding is an image pixel labeling problem. It aims to classify pixels of an image into two classes: foreground and background. Typically, a pixel is classified as foreground if the image intensity at the pixel exceeds a threshold value; otherwise it is classified as a background pixel. Thresholding is a well-known pre-processing step for image segmentation. It is widely used in automatic target recognition, industrial applications of computer visions and medical/biomedical image analysis. In general there are two types of image thresholding techniques available: global and local. In the global thresholding technique a gray-level image is converted into a binary image based on an image intensity value called global threshold which is fixed in the whole image domain whereas in local thresholding technique, threshold value can vary from one pixel location to next. Global thresholding converts an input image I to a binary image G as follows: G(i, j) = 1 for I(i, j)  T, or, G(i, j) = 0 for I(i, j) < T, where T is the threshold, G (i, j) = 1 for foreground and G (i, j) = 0 for background. Whereas, for a local threshold, the threshold T is a function over the image domain,



Corresponding author. E-mail addresses: [email protected], [email protected] (N. Ray).

0031-3203/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2008.09.033

i.e., T = T (x, y). In addition, if in constructing the threshold value/surface the algorithm adapts itself to the image intensity values, then it is called dynamic or adaptive threshold. Thus, in a general setting, thresholding can be expressed as a test operation that tests against a function T of the form [1]: T = T[x, y, h, I], where, I is the input image and h denotes some local property of this point—for example, the average gray level of a neighborhood centered on (x, y). Threshold selection depends on the information available in the gray-level histogram of the image. An image function I(x, y) can be expressed as a product of a reflectance function and an illumination function based on a simple image formation model. If the illumination component of the image is uniform then the gray-level histogram of the image is clearly bimodal, because the gray levels of object pixels are significantly different from the gray levels of the background. It indicates that one mode is populated from object pixels and the other mode is populated from background pixels. Then objects could be easily partitioned by placing a single global threshold at the valley at the histogram. However, in reality bimodality in histograms does not occur very often. Consequently, a fixed threshold level based of the information of the gray-level histogram will fail totally to separate objects from the background. In this scenario we turn our attention to adaptive local threshold surface where threshold value changes over the image domain to fit the spatially changing background and lighting conditions.

844

B.N. Saha, N. Ray / Pattern Recognition 42 (2009) 843 -- 856

Over the years many threshold selection methods have been proposed. Otsu has suggested a global image thresholding technique where the optimal global threshold value is ascertained by maximizing the between-class variance with an exhaustive search [2]. Sahoo et al. [3] claim that Otsu's method is suitable for real world applications with regard to uniformity and shape measures. Though Otsu's method is one of the most popular methods for global thresholding, it does not work well for many real world images where a significant overlap exists in the gray-level histograms between the pixel intensity values of the objects and the background due to uneven and poor illumination. Kittler and Illingworth [4] characterize the image by a Gaussian mixture of foreground and background pixels and address a minimum error Gaussian-density fitting problem. They use either an exhaustive or an iterative search to optimize the average pixel classification error rate. A local thresholding method is superior to the global ones for poorly and unevenly illuminated images. Niblack proposes a local thresholding technique based on the local mean and local standard deviation [5]. The drawback of this algorithm is to determine the size of the neighborhood that is set by the user and it depends on the information available in the images. The window size should be small enough to preserve the local details and at the same time, it should be large enough to suppress noise. One of the well-known local thresholding methods is to fit a plane or biquadratic function to match the background gray-level variations [6] for unevenly illuminated images. A more advanced way is to generate a threshold surface where the threshold level changes dynamically over the image pixel to pixel [7]. Milgram et al. use gradient or edge information to segment images and assumed that different objects may have different thresholds, but each object has a fixed threshold with respect to its background [8]. Yanowitz and Bruckstein [9] have incorporated the concepts of threshold surface and suggested an algorithm where the threshold surface is determined by the interpolation of the intensity values at high magnitudes of image intensity gradient. The computational complexity of the successive over-relaxation method used in Yanowitz and Bruckstein's algorithm is prohibitively expensive: O (N3 ) for an N×N image. Moreover, the technique uses a mean filter in the processing stage to eliminate the noise that reduces the contrast of the image and consequently affects the segmentation results. Also, there is no good algorithm to choose the value for the gradient magnitude threshold. Choosing an improper gradient threshold leads to false object boundary points adversely affecting the subsequent steps of the algorithm. Yan et al. [10] propose a multistage adaptive thresholding where they consider two global thresholds: pixels having gray values lower than the low threshold value are considered as the background, pixels with intensity greater than the high threshold value are considered as objects and the pixels with gray values in between the two threshold values are considered based on local image statistics of mean and variance within a variable neighborhood. The values of two global thresholds are determined using Otsu's multilevel threshold with exhaustive search technique or percentile statistics. The window size of the neighborhood is determined by ad hoc trial and error method. Chan et al. [11] propose a variational formulation of adaptive thresholding technique based on the principle advocated by Yanowitz and Bruckstein [9]. Whereas Yanowitz and Bruckstein's method involves pre- and post-processing steps followed by solving a dense-matrix inversion, Chan et al. [11] reduce the solution by solving a Poisson equation with only one user supplied scalar parameter. The interpolation in Yanowitz and Bruckstein's algorithm does not consider the relationship between the threshold surface and the image surface which causes either black stains on white background

or vice versa. Liu et al. [12] propose an active surface-based adaptive thresholding algorithm by a repulsive external force. In this model, a Gaussian external repulsive force is designed to keep the active surface away from the image surface while the smoothness constraint and the boundary condition on the threshold surface restrain it from moving far away. Shen and Ip [13] use a Hopfield neural network for an active surface paradigm like Liu et al. [12]. Li et al. compare different thresholding algorithms for segmenting aurora oval boundary from spacecraft UV imagery and they show that local adaptive thresholding algorithm show better results than other thresholding techniques [14]. Sezgin and Sankur have made a recent survey on image thresholding [15]. To the best of our knowledge, all the available local thresholding techniques, including the ones using energy functional minimization, have manually tuned parameters that need to be adjusted for differently illuminated images by trial and error method. The values of these parameters vary significantly for different images. To mitigate the effort of tuning parameters, in this paper, we introduce an adaptive local image thresholding method based on energy functional minimization, where the weighting parameters for the data and the regularization terms do not need to be supplied by a user. The proposed energy functional consists of a non-linear convex combination of a data term and a regularization term. The data term encourages the threshold surface to intersect the image surface at high gradient location and the regularization term imposes smoothness on the threshold surface. In order to find out the solution, i.e., the desired threshold surface, we propose a variational minimax (VM) optimization. The VM algorithm consists of two interleaved iterative steps: maximization with respect to the weighting parameter and variational minimization with respect to the threshold surface. We demonstrate that our conscious choice of the energy functional creates a unique saddle point and the VM algorithm finds this saddle point containing the desired solution. We guarantee this unique saddle point by making the energy functional strictly concave with respect to the weighting parameter and strictly convex with respect to the threshold surface. A short version of this work has been communicated in Ref. [16]. It is worth mentioning that Gennert and Yuille have proposed a generic minimax solution for non-convex energy functional by repeated Fibonacci search [17]. Their method consists of multiple minimization computations for different values of the weighting parameter dictated by a Fibonacci sequence. In contrast, the proposed VM method avoids multiple minimization computations by virtue of a deliberate choice of concavo-convex energy functional. As a result, we compare VM method favorably against the Fibonacci search technique with respect to computations (see Section 4.4 for comparisons). The outline of this paper is as follows. In Section 2 our proposed threshold algorithm is described. In Section 3 numerical implementation is furnished in details. Experimental results are presented in Section 4. Section 5 contains directions for future work and conclusions.

2. Proposed method To describe the mathematical principle behind the proposed method, let I(x, y) and T(x, y), respectively denote the image and the threshold functions. Our proposed energy functional consists of two terms: a data term and a regularization term as follows: E(T; ) =

 1 − 2 E1 (T) + E2 (T),

(1)

B.N. Saha, N. Ray / Pattern Recognition 42 (2009) 843 -- 856

where E1 (T) is the data term and E2 (T) is the regularization term. E1 (T) =

1 2

E2 (T) =

1 2

   

h(x, y)(I(x, y) − T(x, y))2 dx dy,

(2)

|∇T(x, y)|2 dx dy,

(3)

The VM algorithm iteratively solves the minimax problem as follows. VM algorithm: Initialize: T ← I While convergence/maximum iterations not reached Compute E1 (T) and E2 (T) via Eqs. (2) and (3) Compute: ∗ ← 

with h(x, y) defined as: |∇I(x, y)|q . h(x, y) = max(|∇I(x, y)|q )

(4)

Here  ∈ [0, 1] is the weighting parameter that controls the contribution of the data term and the regularization term in the energy functional. The first component of the energy functional is a data term, which includes an edge sensitive component, h that is a function (normalized between 0 and 1) of image gradient term. Essentially h is an edge indicator function: presence of edges is indicated by a higher value of h. Minimization of this edge sensitive data term requires the threshold surface to intersect the image surface at high gradient places. The second component of the energy functional is an L2 -norm regularization term. The minimization of the regularization term translates into smoothing the threshold surface. The role of the data term and the regularization terms are somewhat opposing in the following sense. The former would be perfectly content if the image and the threshold surface are the same; however the regularization would be highly dissatisfied because the threshold surface would not be smooth. Technically speaking, if the data term dominates over regularization term then the threshold surface suffers from discontinuity and spurious segmentation. On the contrary, if the regularization term dominates over the data term then it incurs under segmentation. The weight parameter  mediates the dispute between the two terms by preventing one to dominate over the other. The value of the exponent q in Eq. (4) is generally taken as 1 or 2 to describe the gradient strength of an image. However, sometimes it may be useful to find its value by simple cross-validation as described in Section 4. Note that the energy functional E in Eq. (1) is a strictly concave function with respect to  and a strictly convex functional with respect to T (proofs provided in Appendices A and B). To find the optimum threshold surface, we seek the minimax solution: max minT E(T; ). Minimax criterion minimizes the cost E in the worst possible case and hence provides a very conservative solution. When the weights for the two components in the energy functional (1) are unknown and/or hard to learn from training images, minimax principle provides a way to find them out during the online minimization with a never-seen-before test image. In the minimax solution a balance between the contributions of the two terms in the energy functional is automatically achieved. Because of the concavo-convex nature of E with respect to  and T, it is possible to interchange the order of the max and the min, because the duality gap is zero [18]: T ∗ = arg max min E(T; ) = arg min max E(T; ).   T T

(5)

We first differentiate E in Eq. (1) with respect to  and find the maximum value * by equating the derivative to zero: E2 (T)

∗ = 

(E1 (T))2 + (E2 (T))2

.

(6)

Next, we minimize E using gradient descent technique for T keeping this value * for  to be fixed (see Appendix C for a derivation): 

jT (x, y) = 1 − (∗ )2 (h(x, y)(I(x, y) − T(x, y))) + ∗ (∇ 2 T(x, y)). jt

(7)

845

E2 (T)

(E1 (T))2 +(E2 (T))2

Update T via Eq. (7) End while 3. Numerical implementation  Eq. (7) is essentially a heat equation with a source term 1 − (∗ )2 (h(x, y)(I(x, y) − T(x, y))) [19]. A straightforward discrete implementation for Eq. (7) would be an explicit numerical scheme as follows:  t+1 t +  1 − (∗ )2 (h (I − T t )) Ti,j = Ti,j i,j i,j i,j t t t t t ), + Ti−1,j + Ti,j+1 + Ti,j−1 − 4Ti,j + ∗ (Ti+1,j

(8)

where  is the time step length of the iterative numerical scheme and t denotes iteration number. Convergence analysis shows that the step size  should be less than or equal to 1/4 [19] at each iteration. We have implemented steepest descent technique using exact line search algorithm to find the optimum value of step size  at each iteration, instead of using a constant value for  with the explicit scheme (8). The basic structure of the kth iteration is: (k)

Step a: Determine a direction of search Ti,j , where 

(k) (k) Ti,j = 1 − ((k) )2 (hi,j (Ii,j − Ti,j )) (k)

(k)

(k)

(k)

(k)

+ (k) (Ti+1,j + Ti−1,j + Ti,j+1 + Ti,j−1 − 4Ti,j ). Step b: Find (k) by minimizing E(T(k) +(k) T (k) ) with respect to (k) . Step c: Set T(k+1) = T(k) +(k) T(k) . Since E(T(k) + (k) T(k) ) = e1 +(k) e 2 +((k) ) 2 e3 (see Appendix D for details), where  1  (k) { ∇T2 + 1 − ((k) )2 h(I − T)2 }, 2   e2 = {(k) (∇T.∇(T)) − 1 − ((k) )2 h(I − T)(T)}

e1 =

and e3 =

 1 { 1 − ((k) )2 h(T)2 + (k) ∇ T2 }, 2

the exact line search step (k) =−(e2 /2e3 ) minimizes E(T(k) +(k) T(k) ). 4. Experimental results We have demonstrated experimental results into four subsections. In Section 4.1, we have demonstrated the capability of edge and texture preservation of the proposed method compared with other competing methods. Next, in Section 4.2, we demonstrate that the proposed method can be used as a pre-processing step to find the boundaries of the lung cavity from magnetic resonance imagery (MRI). The volume of lung cavity is measured to identify different lung diseases severity including asthma, chronic obstructive

846

B.N. Saha, N. Ray / Pattern Recognition 42 (2009) 843 -- 856

pulmonary disease (COPD), cystic fibrosis, smoking related lung disease and bronchiolitis obliterans [20]. Also, to validate pulmonary drug radiologists need to segment hundreds of MRI slices manually that is time consuming. So, automating this lung segmentation is important. Then, in Section 4.3, we carry out an experiment with a hand-written document image with complex background—poor and uneven illumination. Further, in Section 4.4, we compare the computations of the VM method with the conventional Fibonacci search technique. This comparison shows that the proposed VM algorithm is computationally more attractive for solving minimax optimization problem at hand than conventional techniques that do not exploit the convexity in the energy functional. We have compared the proposed VM method with Liu et al.'s locally adaptive threshold [12], Ray et al.'s parametric active contour [20], Otsu's global thresholding [2] and fuzzy C-means algorithm [21]. The rationale for choosing these methods for comparison is as follows. The proposed VM method is a variational energy minimization technique, so that the choice of Liu et al.'s method is the most obvious as another latest variational method for image thresholing. Also note that the VM method does not need user supplied weighting parameters for the data and the regularization terms. Thus we have chosen two very well known completely parameter free unsupervised methods for image thresholding, viz., Otsu's method and fuzzy C-means clustering. The choice of Ray et al.'s technique for lung segmentation is a domain-specific one. It is one of the latest state-of-the-art image segmentation techniques applied to lung cavity segmentation from MRI slices.

4.1. Preserving texture information To examine the performance of the proposed algorithm on preservation of texture and edges, we apply it on a 512 pixels×512 pixels gray-scale Barbara image (shown in Fig. 1(a)). The result is shown in Fig. 1(b). Here, the value of q is taken as 1. We have implemented Liu et al.'s method (a variational locally adaptive threholding method) on the Barbara image and we have chosen the two tuning parameters of their method  = 16, w = 1 from their article [12]. We have also implemented Otsu's global thresholding and fuzzy C-means [21] on the same image as shown in Fig. 1. We further experiment with some benchmark images: girl (Fig. 2), truck (Fig. 3) and tank (Fig. 4) used in image processing literature. In all the experiments the value of q in VM method is kept fixed at 1; we have also taken the same aforementioned values of the parameters in Liu et al.'s method. From Fig. 2 it is observed that our proposed method can preserve better texture information on girl image (especially top left quadrant of the image shown in Fig. 2(b)) than other competitive methods. Our method also works well on both the well-known truck and tank images. It can segment the wheel of the truck (shown in Fig. 3(b)) and the star sign located on the tank (shown in Fig. 4(b)) from their noisy backgrounds, whereas the other methods fail to retain such texture information. We have performed quantitative evaluation of these results with respect to local texture preservation. Our quantitative evaluation goes exactly in line with the goal of any locally adaptive threholding: to locally classify the image pixels into foreground and background classes correctly, such as in Ref. [2]. Hence, we have computed two local statistics: intra-class variance (Vintra ) and intra-class entropy (Eintra ) for each pixel and if any algorithm can classify the pixels correctly into two classes: foreground and background than any other algorithm then the values of these two statistics will be less than that of others for most of the pixels. These two local statistics are defined as follows. Given a binary image G produced by a threshold on the gray-level image I, a natural measure of local classification is to compute the sum of intra-class variances within

a window of size (2n+1)×(2n+1) centered at each pixel location (i, j) as follows: Vintra (i, j) = af (i, j)

n  x,y=−n

G(i + x, j + y)(I(i + x, j + y) − mf (i, j))2

+ (1 − af (i, j))

n 

(1 − G(i + x, j + y))

x,y=−n

× (I(i + x, j + y) − mb (i, j))2 , where af is the fraction of foreground pixels within the window: af (i, j) =



n 

G(i + x, j + y)

(2n + 1)2

x,y=−n

and mf and mb are, respectively, the foreground and the background class means:  n  G(i + x, j + y)I(i + x, j + y) af (i, j)(2n + 1)2 , mf (i, j) = mb (i, j) =

x,y=−n n 

 (1 − G(i + x, j + y))I(i + x, j + y)

x,y=−n

(1−af (i, j))(2n+1)2 .

In computing these quantities we assumed G (i, j) = 1 for a foreground pixel and G (i, j) = 0 for a background pixel. In our computation of Vintra we have taken a window size of 5×5, i.e., n = 2. Similarly, the sum of intra-class entropies within a window of size (2n+1)×(2n+1) centered at each pixel location (i, j) can be defined as (see Ref. [3]): ⎛ n L  x,y=−n Ik (i + x, j + y)G(i + x, j + y) ⎝ Eintra (i, j) = n L m=0 x,y=−n Im (i + x, j + y)G(i + x, j + y) k=0 ⎛

n



n

⎞⎞

x,y=−n Ik (i + x, j + y)G(i + x, j + y) ⎠⎠ × log ⎝  n L m=0 x,y=−n Im (i + x, j + y)G(i + x, j + y)

+

L  k=0

⎝ ⎛

x,y=−n Ik (i + x, j + y)(1 − G(i + x, j + y)) n L m=0 x,y=−n Im (i + x, j + y)(1 − G(i + x, j + y))

n

⎞⎞

x,y=−n Ik (i + x, j + y)(1−G(i+x, j+y)) ⎠⎠ , × log ⎝  n L m=0 x,y=−n Im (i+x, j+y)(1−G(i+x, j+y))

where Ik (i, j) = 1 if I(i, j) = k else Ik (i, j) = 0. Gray level of image I varies from 0 to L. Note that between two segmentation algorithms A and B, the better one will produce a lower Vintra or Eintra , i.e., a lower sum of intra-class variances and a lower sum of intra-class entropies. We define percentage of win, lose and tie-break/draw for A (with respect to B) as: Win =

# of pixels for which Vintra (Eintra ) for A < Vintra (Eintra ) for B , Total # of pixels (size of image)

Loss =

# of pixels for which Vintra (Eintra ) for A > Vintra (Eintra ) for B , Total # of pixels (size of image)

Draw =

# of pixels for which Vintra (Eintra ) for A = Vintra (Eintra ) for B . Total # of pixels (size of image)

Tables 1(a)–(c) report these quantities between the VM algorithm and other methods. From the results of Table 1a–c it is observed that proposed VM algorithm can retain substantial amount of more local texture information than other competitive methods.

B.N. Saha, N. Ray / Pattern Recognition 42 (2009) 843 -- 856

847

Fig. 1. Barbara image: (a) Barbara image, (b) proposed method, (c) Liu et al.'s method [12], (d) Otsu's method [2] and (e) fuzzy C-means [21].

4.2. Lung boundary extraction We apply the VM algorithm on proton MRI slices to extract the lung boundaries shown in Fig. 5(a). Lung boundary extraction from proton MRI is of significance importance in clinical trials on

some pulmonary diseases, because it helps to compute lung air intake volume that indicates the degree of progress of a pulmonary disease [20]. In this segmentation process we utilize the proposed VM image thresholding as a pre-processing step. First, the value of edge

848

B.N. Saha, N. Ray / Pattern Recognition 42 (2009) 843 -- 856

Fig. 2. Girl image and results of different thresholding techniques: (a) girl image, (b) proposed method, (c) Liu et al.'s method [12], (d) Otsu's method [2] and (e) fuzzy C-means [21].

indicator exponent q has been found experimentally from a small training set. Fig. 6 shows the values of q vs. Pratt's figure of merit (PFOM) [22] for three randomly chosen MRI slices. Initially the PFOM increases as the value of q increases and it stabilizes at q = 8 in Fig. 6. If the value of q is increased without bound, the edge indicator

h(x, y) will tend to zero for most pixel locations (x, y) and consequently the effect of edge sensitive data term on energy functional (1) decreases and the regularization term overpowers the data term. With the negligible data fidelity term the solution turns out to be that of an isotropic heat equation and will essentially be

B.N. Saha, N. Ray / Pattern Recognition 42 (2009) 843 -- 856

849

Fig. 3. Truck image and results of different thresholding techniques: (a) truck image, (b) proposed method, (c) Liu et al.'s method [12], (d) Otsu's method [2] and (e) fuzzy C-means [21].

a global threshold value. Thus we limit the value of the exponent as 8 for MRI slices based on the experimental results shown in Fig. 6. Fig. 6(a) shows PFOM values for different q and Fig. 6(b) shows absolute values of the derivative of PFOM with respect to q for

different q. Fig. 6(b) shows that initially |j(PFOM)/ jq| is unstable but it stabilizes after q = 8. So, we have chosen the value of q = 8 for this experiment. However, for images where detailed texture information is favored to preserve, the value of q is recommended as 1 or 2.

850

B.N. Saha, N. Ray / Pattern Recognition 42 (2009) 843 -- 856

Fig. 4. Tank image and the results of different thresholding techniques: (a) tank image, (b) proposed method, (c) Liu et al.'s method [12], (d) Otsu's method [2] and (e) fuzzy C-means [21].

Medical images where contour extraction is the primary goal, the value of q is recommended to have a larger value, say 4–10. Experiment shown in Fig. 6 is recommended to find out the value of q if ground truth of the images are available. The results of VM

method, Liu et al. [12], Otsu and fuzzy C-means algorithms [21] are shown in Fig. 5(b)–(e), respectively. We have chosen the two tuning parameters of Liu et al.'s method,  = 16 and w = 1 from their article [12].

B.N. Saha, N. Ray / Pattern Recognition 42 (2009) 843 -- 856

851

Table 1 (a) Comparison of proposed variational minimax (VM) method with Liu et al's [12] locally adaptive variational thresholding method VM vs. Liu Truck image (Fig. 3a) (%) Girl image (Fig. 2a) (%) Tank image (Fig. 4a) (%) Barbara image (Fig. 1a) (%) Win Vintra Eintra

57.57 64.39

43.63 41.36

68.61 73.74

42.86 47.77

Loss Vintra Eintra

24.96 17.52

19.81 21.56

22.95 17.47

21.94 16.05

Draw Vintra Eintra

17.47 18.09

36.56 37.08

8.43 8.79

35.19 36.18

(b) comparison of proposed variational minimax (VM) method with fuzzy C-means [21] VM vs. fuzzy C-means Truck image (Fig. 3a) (%) Girl image (Fig. 2a) (%)

Tank image (Fig. 4a) (%)

Barbara image (Fig. 1a) (%)

Win Vintra Eintra

67.05 74.28

56.94 57.18

82.56 89.08

55.59 63.40

Loss Vintra Eintra

19.37 11.83

14.88 14.42

12.36 5.71

19.7 11.43

Draw Vintra Eintra

13.58 13.89

28.18 28.40

5.08 5.21

24.71 25.17

(c) comparison of proposed variational minimax (VM) method with Otsu's global thresholding [2] VM vs. Otsu's global thresholding Truck image (Fig. 3a) (%) Girl image (Fig. 2a) (%)

Tank image (Fig. 4a) (%)

Barbara image (Fig. 1a) (%)

Win Vintra Eintra

66.74 74.19

57.12 57.26

83.35 89.21

55.59 63.39

Loss Vintra Eintra

19.37 11.59

14.99 14.63

11.52 5.55

19.71 11.44

Draw Vintra Eintra

13.89 14.22

27.89 28.11

5.13 5.24

24.70 25.17

Next, to complete the lung boundary extraction, the following morphological operations are performed on the binary image obtained by thresholding. Note that the two lung cavities are contained in the two black connected components located at the middle of the thresholded binary image shown in Fig. 5(b). These two black connected components are collected and their boundaries are found (shown in Fig. 7(a)). Now, we need to discard the unnecessary extended necks, if any, formed at the costophrenic angles, such as one shown here in Fig. 7(a) at the right cavity. Next, we compute the distance transform of the boundary image in Fig. 7(a) and threshold it at 5 to obtain the binary image shown in Fig. 7(b). Our empirical observations have led us to choose the threshold value of 5 here, as the neck width of extraneous portions hardly exceeds 10 pixels, i.e., twice the chosen threshold value 5. Finally we perform watershed segmentation on the original proton image of Fig. 7(a) from two markers provided by the two white connected components from Fig. 7(b). The watershed boundaries are taken as the lung boundaries (shown in Fig. 7(c)). We have computed PFOM to measure the quantitative evaluation of the lung boundary extraction results. PFOM is a dimensionless number between 0 and 1, with 1 for ideal segmentation. To compute PFOM we need the ideal edges that are computed by manually segmented lung boundaries. We perform automated segmentation on three MRI data sets. The segmentation process as explained already consists of VM thresholding followed by morphological post-processing. To compare our segmentation results with other methods, we replace the thresholding step by other binarization algorithms. Table 2 summarizes the average value of

PFOM of all three data sets (a total of 30 slices in the three sets) and it compares the results found by Liu et al.'s method [12], Ray et al.'s active contour method [20], Otsu's method [2] and fuzzy C-means [21]. Results found by our proposed method shows superiority over Ray et al.'s method [20], Otsu [2] and fuzzy C-means method [21] and it is comparable to Liu et al.'s method [12].

4.3. Experiment with a hand-written document image Fig. 8(a) is an example of a hand-written document image with poor and uneven illumination and background. Fig. 8(b)–(e) are result of our proposed VM algorithm for different q values: 1, 2, 4 and 8. Fig. 8(f)–(h) are result of Liu et al.'s algorithm [12] for different parameter values: 8(f)  = 8, w = 5, 8(g)  = 3, w = 3 and 8(h)  = 16, w = 1. Text segmented by our algorithm are readable for all parameter settings and it is clearly visible for q = 4 and 8 (Fig. 8(d) and Fig. 8(e)) where Liu et al.'s algorithm are illegible for different parameter settings. All of these parameters are chosen from their article [12].

4.4. Comparison of VM with Fibonacci search Conventionally a Fibonacci search can be performed to find out the minimax solution [17]: max minT E(T; ). Note that in performing so, one needs to compute multiple minimization of the energy functional with respect to the threshold surface T, one each for each

852

B.N. Saha, N. Ray / Pattern Recognition 42 (2009) 843 -- 856

Fig. 5. (a) Proton MRI, (b) threshold by the proposed method, (c) threshold by Liu et al.'s method [12], (d) threshold by Otsu's method [2] and (e) binarization by fuzzy C-means [21] classification.

value of . In VM algorithm we avoid multiple computation of the minimum T exploiting the fact that for any value of  the proposed energy functional is convex, quadratic. In this section, we show experimentally that the VM method is computationally more

attractive than conventional Fibonacci search applied to find the minimax solution of the proposed energy functional. In the Fibonacci search the value of the weighting parameter  is chosen according to a Fibonacci sequence. Next, as already

B.N. Saha, N. Ray / Pattern Recognition 42 (2009) 843 -- 856

853

Fig. 6. (a) PFOM for different values of q is shown. Three different plots represent PFOM for three different MRI slices. (b) Absolute value of the derivative of PFOM w.r.t q vs. q. Three different plots correspond to that of (a).

Fig. 7. Images of different steps of post-processing operation required to find lung cavity from the thresholded binary image.

Table 2 Comparison of our segmentation method with Ray et al., Otsu and Fuzzy C-means methods using Pratt's figure of merit (PFOM) MRI data sets (series number)

PFOM value (proposed method)

Liu's active surface-based adaptive thresholding algorithm [12]

PFOM value (Ray et al.'s method [20])

PFOM value (Otsu's method [2])

PFOM value (fuzzy C-means [21])

1 2 3

0.8381 0.8395 0.7955

0.8367 0.8373 0.8143

0.7156 0.7120 0.7085

0.5565 0.4672 0.7954

0.5567 0.4607 0.7648

indicated, the energy functional is minimized with respect to the threshold function T for each value of . As  ∈ [0, 1], we have chosen the value of  initially as 0.1, 0.2 and 0.3, respectively. For each value of , minT E(T; )is computed. As the energy functional is concave down with respect to , we need to compute minT E(T; ) for three consecutive values of  in a Fibonacci sequence to reach at the maximum point on the energy curve shown in Fig. 9. There are three cases which might occur: 1. If minT E(T; )at any three consecutive values of  in a Fibonacci sequence are monotonically increasing in order, then the maximum point occurs beyond the largest of the three  values. 2. If minT E(T; ) for three values of  say 1 , 2 and 3 are, respectively, T1 , T2 and T3 , where T1 < T2 > T3 and 1  2  3 , then the maximum point at the curve lies between 1 and 3 . 3. If minT E(T; )at any three consecutive values of  in a Fibonacci sequence are monotonically decreasing in order, then the maximum point occurs before the smallest of the three  values. In case 1 and 3 further search is performed until case 2 is found. In case 2, further search is conducted with a decreased order of 

values (if the values of  were 0.1, 0.2, 0.3 they will now be 0.01, 0.02, 0.03 etc.) and the search is finally stopped when a desired accuracy for  is reached. A typical behavior of the energy functional with respect to  is shown in Fig. 9. We have carried out an experiment on Barbara image shown in Fig. 1(a). Our proposed VM algorithm converges after 129 iterations. The performance of Fibonacci search-based minimax solution is shown in Table 3. Note that the Fibonacci search-based minimax solution depends on two stopping criteria: (a) the criterion for convergence of minT E(T; ) computation at each value of the parameter  using gradient descent technique with line search and (b) the criterion for convergence of . To keep the comparison fair, we stop the VM at the same accuracy level of  as the Fibbonacci search, i.e., the aforementioned criterion (b). The criterion (a) for this experiment is the maximum iteration number or a fixed accuracy level (0.001) for minT E(T; ) computation, whichever is reached earlier. Thus Table 3 has two columns—the first one denoting the maximum iteration number in (a). The second column of Table 3 denotes the total number of iterations before the Fibonacci search stopped. Note that number of VM iterations (129) is almost an order of magnitude less even for the lowest maximum number of iterations. The gap widens

854

B.N. Saha, N. Ray / Pattern Recognition 42 (2009) 843 -- 856

Fig. 8. (a) Real hand-written document image, (b)–(e) threshold by the proposed VM method for q = 1,2,4 and 8, respectively, (f)–(h) threshold by Liu et al's method  = 8, w = 5;  = 3, w = 3 and  = 16, w = 1, respectively.

by orders of magnitude as we increase the maximum number of iterations. 5. Conclusions and future work In this paper we introduce a variational adaptive image thresholding technique where we design a novel energy functional consisting of a data term and a regularization term, encouraging the threshold surface, respectively, to intersect the image surface only at high gradient places and to enforce smoothness in the threshold surface. There is a weighting parameter that makes a balance between the data term and the regularization term and it tries to

contravene the domination of one term over other. The optimal value of the weighting parameter as well as the desired threshold surface is computed through an iterative minimax algorithm. Our algorithm attempts to mitigate the user effort to adjust the values of the weighting parameter. Moreover, it avoids multiple minima computation as in Fibonacci search-based minimax solution method to substantially save on computations. The method shows encouraging results when utilized to find lung boundaries on proton MRIs. The method is also shown to preserve texture and edge better than other competing methods. A few future plans about the proposed VM method are as follows. We would like to concentrate on an automated technique where

B.N. Saha, N. Ray / Pattern Recognition 42 (2009) 843 -- 856

Appendix B. E (T; a) is strictly convex in T

Fibonacci Search 25

In order to prove E (T; ) is strictly convex in T, we need to show E(T + U; ) − E(T; )  E(T; ) holds for ∀U such that T + U ∈ D [23]. By substituting the VM functional, we have

Maximum point

15 T

min E(T;α)

20

E(T + U; ) − E(T; ) =

10

+

5 0

F1 = 0.1 F2 = 0.2 F3 = F1 + F2 = 0.3

0

0.1

0.2

0.3

0.4

0.5 0.6 alpha

855

 

=

0.7

0.8

0.9

1

|∇(T + U)|2 −

 



Table 3 Performance of Fibonacci search-based minimax solution

− =

Maximum # of iterations to find minT E(T; )

Total # of iterations to find minimax solution

100 200 500 1000

809 1109 2009 3509

simultaneous smoothing and thresholding operations of the images can be performed in a cooperative manner so that it can mostly eliminate the work of post-processing. Here, we propose to determine the value of the edge indicator index, q experimentally for the lung boundary extraction applications. Although a very small number of training images is usually good enough for this estimation, in the future, we would like to concentrate on finding the value of q automatically from the data. Further, here we have chosen a particular convex combination of data and regularization term. Our future endeavor will investigate whether other convex combination of data and regularization terms works successfully in different applications. At present, our method is purely an edge-based thresholding technique. We also would want to incorporate the gray-level histogram information in the VM algorithm and get the advantage of both edge and region-based segmentation techniques.



1 − 2

 

h(I − T)2 − 

 

|∇T|2

   1 − 2 (h((I − T)2 + U 2 − 2(I − T)U)) +

Fig. 9. Behavior of the energy functional with respect to .

   1 − 2 h(I − T − U)2

(|∇T|2 + |∇U|2 + 2∇T.∇U)

1 − 2

 

h(I − T)2 − 

 

|∇T|2

   1 − 2 (h(U 2 − 2(I − T)U)) +

 

(|∇U|2 + 2∇T.∇U).

Similarly, we can show E(T + U; ) − E(T; ) =

   1 − 2 (h(x, y)(2 U 2 − 2(I − T)U)) +

 

(2 |∇U|2 + 2∇T.∇U).

Therefore, the Gâteaux variation of VM functional is given by

E(T; ) = lim

→0

= lim →0 +  =

E(T + U; ) − E(T; ) 



1 − 2



1 − 2

 

h(U 2 − 2(I − T)U)

|∇U|2 + 2∇T.∇U  

(h(−2(I − T)U)) + 

  2∇T.∇U.

Thus, we have Acknowledgments The authors express their gratitude to the reviewers for constructive suggestions. This work is supported by NSERC, iCORE and University of Alberta. Appendix A. E is a concave function with respect to a To prove E is concave in , it is required to show that for any

 ∈ [0, 1]:

2 E(, T) < 0. 2 Note that:

E(, T) E (T) = E2 (T) − 1  1 − 2 and

2 E(, T) E1 (T) =− < 0. 2 (1 − 2 )3/2 So E is concave in .

E(T + U; ) − E(T; ) − E(T; )      h(x, y)U 2 +  |∇U|2 . = 1 − 2 Since, both  and we have



1 − 2 and h(x, y) are non-negative ∀ ∈ [0, 1],

E(T + U; ) − E(T; ) − E(T; )  0. The equality holds if and only if U = 0. Hence, E (T; ) is strictly convex in T. Appendix C. Minimization of the energy functional using calculus of variation For a fixed value of , say *, proposed energy functional E becomes,    1 E= h(x, y)(I(x, y) − T(x, y))2 dx dy 1 − (∗ )2 2   1 + ∗ |∇T(x, y)|2 dx dy. 2

856

B.N. Saha, N. Ray / Pattern Recognition 42 (2009) 843 -- 856

Now, Euler equation, Ef = ET − ETx − ETy , where

where

jE , ET = jT

e1 =

jT Tx = , jx

jT Ty = , jy

 ET = − 1 − (∗ )2 h(x, y)(I(x, y) − T(x, y)), ETx =

jE = ∗ Txx , jTx

Txx =

j2 T , jx2

ETy =

jE = ∗ Tyy , jTy

Tyy =

j2 T . jy2

and e3 =

jT(x, y) = − Ef jt

 = 1 − (∗ )2 h(x, y)(I(x, y) − T(x, y)) + ∗ (Txx + Tyy ),  = 1 − (∗ )2 h(x, y)(I(x, y) − T(x, y)) + ∗ (∇ 2 T).

Appendix D. Line search details [24] Let E(T(k) +(k) T(k) ) = E1 +E2 , where

= =

(k)  2

(k)  2

(k)  2 +

∇(T + (k) T)

2

∇T + (k) ∇(T) ∇T2 + (k) (k)

(k) ((k) )2  2

2

 (∇T.∇(T))

∇ T2

and E2 = =

 1 − ((k) )2 h(x, y)  2

(I − T − (k) T)

2

 1 − ((k) )2 h(x, y)    1 ((k) )2  2 2 (k) (I − T) −  (T) . × (I − T)(T) + 2 2

So E1 + E 2 =

 1  (k) { ∇T2 + 1 − ((k) )2 h(x, y)(I − T)2 } 2    {(k) (∇T.∇(T))− 1−((k) )2 h(x, y)(I−T)(T)} + (k) +

 1 { 1 − ((k) )2 h(x, y)(T)2 + (k) ∇ T2 }. 2

So the step size (k) = −(e2 /2e3 ) minimizes E (T(k) +(k) T (k) ).

Using gradient descent technique [23] we have

E1 =

 1  (k) { ∇T2 + 1 − ((k) )2 h(x, y)(I − T)2 }, 2   {(k) (∇T.∇(T)) − 1 − ((k) )2 h(x, y)(I − T)(T)} e2 =

 ((k) )2  { 1 − ((k) )2 h(x, y)(T)2 + (k) ∇ T2 } 2

References [1] R.C. Gonzalez, R.E. Woods, Digital Image Processing, Pearson Prentice-Hall, 2005. [2] N. Otsu, A threshold selection method from gray-level histogram, IEEE Trans. Syst. Man Cybern. SMC-9 (1) (1979) 62–66. [3] P.K. Sahoo, S. Soltani, A.K.C. Wong, Y. Chen, A survey of thresholding techniques, Comput. Vision Graphics Image Process. 41 (1988) 233–260. [4] J. Kittler, J. Illingworth, Minimum error thresholding, Pattern Recognition 19 (1) (1986) 41–47. [5] W. Niblack, An Introduction to Digital Image Processing, Prentice-Hall, Englewood Cliffs, NJ, 1986. [6] R. Jain, R. Kasturi, B.G. Schunk, Machine Vision, McGraw-Hill, New York, 1995. [7] C.K. Chow, T. Kaneko, Automatic boundary detection of the left-ventricle from cineangiograms, Comput. Biomed. Res. 5 (1972) 388–410. [8] D.L. Milgram, A. Rosenfeld, T. Willet, G. Tisdale, Algorithms and hardware technology for image recognition, Final Report to US Army Night Vision Lab., Computer Science Centre, University Maryland, College park, 1978. [9] S.D. Yanowitz, A.M. Bruckstein, A new method for image segmentation, Comput. Vision Graphs Image process. 46 (1989) 82–95. [10] F. Yan, H. Zhang, C.R. Kube, Multistage adaptive thresholding method, Pattern Recognition Lett. 26 (8) (2005) 1183–1191. [11] F.H.Y. Chan, F.K. Lam, H. Zhu, Adaptive thresholding by variational method, IEEE Trans. Image Process. 7 (3) (1998) 468–473. [12] F. Liu, Y. Luo, X. Song, D. Hu, Active surface model-based adaptive thresholding algorithm by repulsive external force, J. Electron. Imaging 12 (2003) 299–306. [13] D. Shen, H.H.S. Ip, A Hopfield neural network for adaptive image segmentation: An active surface paradigm, Pattern Recognition Lett. 18 (1997) 37–48. [14] X. Li, R. Ramachandran, M. He, S. Movva, J. Rushing, S. Graves, W. Lyatsky, A. Tan, G. Germany, Comparing different thresholding algorithms for segmenting auroras, Proc. Int. Conf. Inf. Technol.: Coding Comput. 2 (2004) 594–601. [15] M. Sezgin, B. Sankur, Survey over image thresholding techniques and quantitative performance evaluation, J. Electron. Imaging 13 (2004) 146–165. [16] N. Ray, B. Saha, Edge Sensitive Variational Image Thresholding, accepted at IEEE International Conference on Image Processing, September 2007. [17] M.A. Gennert, A.Y. Yuille, Determining the optimal weights in multiple objective function optimization, in: Proceedings of the Second International Conference on Computer Vision, 1988, pp. 87–89. [18] A. Ruszczynski, Nonlinear Optimization, Princeton University Press, Princeton, NJ, 2006. [19] G. Evans, J. Blackledge, P. Yardley, Numerical Methods for Partial Differential Equations, Springer, Berlin, 2005. [20] N. Ray, S.T. Acton, T. Altes, E.E. de Lange, J.R. Brookeman, Merging parametric active contours within Homogeneous Image Regions for MRI-Based Lung Segmentation, IEEE Trans. Med. Imaging 22 (2003) 189–199. [21] J.C. Bezdek, J. Keller, R. Krishnapuram, N.R. Pal, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, Springer, Berlin, 1999. [22] I.E. Abdou, W.K. Pratt, Quantitative design and evaluation of enhancement/thresholding edge detectors, Proc. IEEE 67 (5) (1979) 753–763. [23] J.L. Troutman, Variational Calculus with Elementary Convexity, Springer, Berlin, 1983. [24] R. Fletcher, Practical Methods of Optimization, second ed., Wiley, New York, 2006.

= e1 + (k) e2 + ((k) )2 e3 ,

About the Author—BAIDYA NATH SAHA received Bachelor of mechanical engineering from Jadavpur University, Kolkata, India, in 2002. He received Master of Technology in quality, reliability and operations research from Indian Statistical Institute, Kolkata, India, in 2004. He received Master of Technology in computing science from Indian Statistical Institute, Kolkata, India, in 2006. Currently, he is a Ph.D. student in the Department of Computing Science, University of Alberta, Canada. He is a recipient of graduate fellowship at Indian Statistical Institute from 2002 to 2006. His research interests include object segmentation and tracking. About the Author—NILANJAN RAY received his bachelor degree in mechanical engineering from Jadavpur University, Calcutta, India, in 1995, master's degree in computer science from the Indian Statistical Institute, Calcutta, in 1997 and Ph.D. in electrical engineering from the University of Virginia, Charlottesville, in May, 2003. He has worked as a post-doctoral fellow at Electrical and Computer Engineering department, University of Virginia. He has worked in Utopia Compression Corpotation, Los Angeles for a year. He joined the department of Computing Science, University of Alberta in July 2006 as assistant professor. He is a recipient of the CIMPA-UNESCO fellowship for image processing in 1999; the graduate fellowship at Indian Statistical Institute from 1995 to 1997; and the best student paper award from IBM Picture Processing Society presented at the IEEE International Conference on Image Processing, Rochester, NY, 2002. His research area is image analysis.