Pattern Recognition 42 (2009) 2527 -- 2540

Single point iterative weighted fuzzy C-means clustering algorithm for remote sensing image segmentation

Jianchao Fan a, Min Han a,*, Jun Wang b

a School of Electronic and Information Engineering, Dalian University of Technology, Dalian 116023, PR China
b Department of Mechanical and Automation Engineering, Faculty of Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong

Article info

Article history: Received 24 October 2008; received in revised form 18 March 2009; accepted 18 April 2009.

Keywords: Clustering; Attribute weights; Center initialization; Fuzzy C-means; Image segmentation

Abstract

In this paper, a remote sensing image segmentation procedure that utilizes a single point iterative weighted fuzzy C-means clustering algorithm is proposed based on prior information. This method addresses the problem that the clustering quality of the fuzzy C-means algorithm is strongly affected by the data distribution and by the random initialization of the clustering centers. After a probability-statistical analysis of the original data, weights for the data attributes are designed to adjust the original samples toward a uniform distribution; these weights are added into the cyclic iteration process, which suits the character of the fuzzy C-means algorithm and improves its precision. Furthermore, appropriate initial clustering centers adjacent to the actual final clustering centers can be found by the proposed single point adjustment method, which speeds up the convergence of the overall iterative process and drastically reduces the computation time. In addition, the modified algorithm is extended from multidimensional data analysis to color image clustering. Comparison experiments on the UCI data sets, the public Berkeley segmentation dataset and actual remote sensing data demonstrate the validity of the proposed algorithm. © 2009 Elsevier Ltd. All rights reserved.

1. Introduction

Wetland mapping derived from remote sensing images is subject to error and uncertainty. Fuzzy classification techniques can deal with spectral and spatial vagueness and can be used to model the uncertainty in remote sensing classification [1]. Several investigations have been undertaken for forecasting changes of various geographic areas and for identifying different types of landform [2]. The success of many segmentation methods rests on a fundamental assumption: spectrally similar data cluster close together, while dissimilar data lie further apart in the feature space [3,4]. Thus, different land cover types can be separated on the basis of their spectral features and the temporal variations of these features. Clustering-analysis-based methods provide a nonparametric, unsupervised approach to the analysis of each kind of image. This classification method is a data-driven, pattern-based approach that summarizes the collective role of topography in differentiating the environment [5].

∗ Corresponding author. Tel.: +86 411 84708719; fax: +86 411 84707847. E-mail addresses: [email protected] (J. Fan), [email protected] (M. Han), [email protected] (J. Wang).

0031-3203/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2009.04.013

ISODATA [6] (iterative self-organizing data analysis technique) and the K-means method [7] are both unsupervised clustering methods that can identify natural classes in a reproducible way, no matter how many attributes are adopted. These efforts fully developed "crisp" landform classification, with which landforms of large areas can be classified rapidly into small, repetitive landform units at various spatial scales. However, the spatial continuum that has long been recognized as an essential characteristic of landforms is poorly represented [8]. Fuzzy theory provides a possible solution to the "crispness" problem of traditional landform classifications. Fuzzy methods allow individual data points to have partial belongings (memberships) to multiple classes, and therefore allow for overlapped classes that transition gradually from one to another [9]. In this way, the spatial continuity existing in the biophysical environment can be represented without relying on crisp boundaries in geographic or attribute space. The fuzzy C-means (FCM) algorithm is a well-known fuzzy clustering algorithm with a complete theoretical foundation [10]. Based on calculated terrain attributes, this clustering method follows an iterative procedure to construct continuous spatial patterns of fuzzy landform class memberships. The spatial co-variation between these memberships and other biophysical properties, which are often continuous in space as well, can then be identified [11]. Many modified classifiers based on the FCM algorithm have been applied to image segmentation. Ahmed et al. [12] combined bias field estimation with the clustering of MRI data.


Fig. 1. The influence of initial centers for clustering: (a) the original remote sensing image, (b) the normal clustering result image and (c) the false clustering result image.

Chuang et al. [13] and Cai et al. [14] incorporated local information to reduce the influence of noise. There is a key challenge for FCM landform classification: it produces a stable solution that can optimally partition the data, but the biophysical meanings of the produced class centers and memberships need further interpretation [15]. There is no guarantee in this process that the uniform weight assignment will correspond to the most meaningful classification, in part because the optimal weight assignment is generally unknown. That is, attribute weights may need to be tuned and evaluated iteratively to produce interpretable data classes that best fit a particular application or best match a specific biophysical pattern. Thus, the data classification of the fuzzy C-means algorithm is sensitive to weight adjustments of the adopted data attributes [16]. Wen-Ya and Isabelle [17] used several kinds of experiments to show that clustering precision is influenced to a great extent by the weights of the data attributes, but did not propose how to design those weights. Nock and Nielsen [18], drawing on boosting analogies, concluded that weighting a clustering algorithm boils down to defining a distribution over all the samples. The method of Hathaway and Hu [19] consists of reducing the original dataset to a smaller one, assigning each selected datum a weight reflecting the number of nearby data, clustering the weighted reduced dataset using a weighted version of the feature or relational data FCM algorithm, and, if desired, extending the reduced-data results back to the original dataset. On the other hand, the initialization step of clustering is very important because different selections of the initial clustering centers can result in different local optima or different partitions, and affect the convergence speed of the whole algorithm [20]. Fig. 1(a) shows an original remote sensing image of the Zhalong wetland with nine different landforms according to field sampling. FCM clustering results using different initial clustering centers are shown in Fig. 1(b) and (c), respectively. From direct observation, Fig. 1(b) is partitioned into nine landforms precisely, and the boundaries of the different classes are clear. However, there are only two classes in Fig. 1(c), which does not represent the correct clustering result; the algorithm has converged to a local minimum. Yager and Filev [21] adopted the mountain method to approximate the initial centers, but its computation is too complex. Pena et al. [22] compared four initialization methods empirically. The color-checker method was proposed by Dae-Won et al. [23] to obtain the initial centers. Therefore, to take these two aspects into account, a novel single point iterative weighted fuzzy C-means algorithm (SWFCM) for multidimensional data clustering is proposed. There is always some prior knowledge in an actual image segmentation task, and it is fully used in the SWFCM classifier to direct the unsupervised FCM algorithm toward the best clustering results. Semi-supervised methods [2,24-26] generally modify the objective function in the iterative process.

In this paper, however, incorporating this known information, weights are set for each attribute to distinguish one class from the others. Furthermore, the single point iterative method is used to quickly adjust the means of the prior knowledge so as to obtain better initial clustering centers close to the true clustering centers. Thus, the well-known drawbacks of the FCM algorithm are overcome with the help of our novel clustering center initialization method and weight design scheme. The remainder of this paper is organized as follows. Section 2 presents some preliminaries on clustering. Section 3 describes the adaptive weight adjustment method for clustering attributes. In Section 4, we discuss the issues associated with the initialization of clustering algorithms and present the proposed initialization technique. In Section 5, the whole framework of remote sensing image segmentation and the accuracy assessment method are given. Section 6 highlights the potential of the proposed approach through experimental examples. Concluding remarks are presented in Section 7.

2. Preliminaries

Clustering aims to discover and organize structure in data sets by identifying and quantifying similarities among individual data patterns [27]. Several clustering algorithms have been proposed in the literature to properly partition data sets [28,29]. The basic idea in most clustering algorithms is to identify a set of center points and then update the pattern memberships to clusters iteratively, so as to achieve a better partition. Generally, the number of clusters C needs to be set in advance before the whole iterative process begins [30]. In actual clustering applications, such as data analysis and image segmentation, some prior knowledge X′ is usually available, but its amount is generally small. If a supervised method were adopted, the learning effect would be poor because of the few training samples. Thus, an unsupervised algorithm is a better choice for this problem. Our main contribution in this article is to show how to use this small amount of prior knowledge to guide the clustering process and improve the accuracy. For standard numerical and image databases such as the UCI machine learning repository and the public Berkeley image database, this prior knowledge is provided with the experimental data. In real remote sensing image segmentation, a little ground-truth knowledge X′ about the different landform classes can be obtained by on-the-spot sampling or from other geobiological information about the study area. If an image without any prior information had to be analyzed by the proposed method, ancillary image processing software could be used to mark small parts of the different classes and obtain X′, which is convenient and simple. Let X′ = {x′_1, x′_2, ..., x′_{N′}} be the prior knowledge matrix including the data of each class, where x′_k = {x_{k,1}, x_{k,2}, ..., x_{k,P}, ClassID}^T ∈ ℝ^{P+1}.

Here N′ is the total number of prior samples over all classes, and (P + 1) is the dimension of X′, comprising the P data attributes and the corresponding category label. The mean μ and standard deviation σ of each data attribute are acquired through a statistical analysis of the first P columns of the matrix X′. According to the category labels, the mean of each class in the prior samples can also be obtained.

In the rest of this section, we briefly describe the fuzzy C-means algorithm [30] from the mathematical point of view. Let X = {x_1, x_2, ..., x_N} be an unlabeled data set of image pixels, with x_k = {x_{k,1}, x_{k,2}, ..., x_{k,P}}^T ∈ ℝ^P, where N is the total number of samples and P is the dimension of the pattern vectors. FCM is an objective-function-based clustering algorithm that divides the dataset into C (C ∈ {2, ..., N − 1}) clusters, represented as fuzzy sets {T_1, T_2, ..., T_C}, so as to minimize the function J_m(U, V):

    J_m(U, V) = \sum_{k=1}^{N} \sum_{i=1}^{C} u_{ik}^{m} (x_k - v_i)^T (x_k - v_i)    (1)

where u_{ik} is the membership degree of datum x_k to the clustering center v_i, and is an element of the membership matrix U = [u_{ik}]_{C×N}. V = {v_1, v_2, ..., v_C} is the vector of centroids of the fuzzy sets {T_1, T_2, ..., T_C}; thus a fuzzy partition can be denoted by the pair (U, V). (x_k − v_i)^T (x_k − v_i) is the squared Euclidean distance from sample x_k to the candidate clustering center v_i. m is a weighting exponent controlling the amount of clustering fuzziness, whose value is usually chosen by the user; most users choose m in the range [1.5, 2.5], with m = 2.0 an overwhelming favorite [31]. The FCM algorithm focuses on minimizing J_m(U, V) subject to the following constraints on U:

    \sum_{i=1}^{C} u_{ik} = 1,  \sum_{k=1}^{N} u_{ik} > 0,  0 \le u_{ik} \le 1  (i = 1, 2, ..., C;  k = 1, 2, ..., N)    (2)

Functions (1) and (2) describe a constrained optimization problem, which can be converted to an unconstrained one by the Lagrange multiplier technique. A solution of the objective function can be obtained via an iterative process in which the membership degrees u_{ik} and the clustering centers v_i are updated via

    u_{ik} = 1 \Big/ \sum_{j=1}^{C} \left( \frac{(x_k - v_i)^T (x_k - v_i)}{(x_k - v_j)^T (x_k - v_j)} \right)^{1/(m-1)},  i = 1, ..., C;  k = 1, ..., N    (3)

    v_i = \frac{\sum_{k=1}^{N} (u_{ik})^m x_k}{\sum_{k=1}^{N} (u_{ik})^m},  i = 1, ..., C    (4)
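Since the rest of the paper modifies exactly this update pair, a minimal NumPy sketch of the plain FCM iteration of Eqs. (1)-(4) is given below as a reference point; the function and parameter names (fcm, max_iter, tol, seed) are illustrative choices of ours, not the authors' code.

import numpy as np

def fcm(X, C, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard FCM on X (N x P); returns memberships U (C x N), centers V (C x P)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # Random center initialization -- exactly the sensitivity the paper targets.
    V = X[rng.choice(N, size=C, replace=False)].astype(float)
    U = np.zeros((C, N))
    for _ in range(max_iter):
        # Squared Euclidean distances d2[i, k] = (x_k - v_i)^T (x_k - v_i).
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2) + 1e-12
        # Membership update, Eq. (3): u_ik = 1 / sum_j (d2_ik / d2_jk)^(1/(m-1)).
        U_new = 1.0 / ((d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))).sum(axis=1)
        # Center update, Eq. (4): memberships raised to m weight the data mean.
        Um = U_new ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:   # stop when U stabilizes
            U = U_new
            break
        U = U_new
    return U, V

Run on IRIS-like data as fcm(X, C=3), this reproduces the randomly initialized baseline that Table 10 later compares against.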

At the beginning of the FCM algorithm, the clustering centers are initialized randomly. Then, in each iteration, the membership matrix is calculated and the clustering centers are updated using Eqs. (3) and (4), respectively. Different initial centers can therefore yield different results, which makes FCM sensitive to the initial setting, as shown in Fig. 1. Pena et al. [22] conducted a series of experiments illustrating that the Kaufman initialization method outperformed the other compared methods, as it made the algorithm more effective and more independent of the initial clustering. There, the initial C clustering centers were obtained by selecting uniformly located instances in the database, so this variant is abbreviated UFCM in this paper; the rest of the instances were selected according to the heuristic rule of choosing the instances that promised to have a higher number of the remaining instances around them. However, the first C instances uniformly located in the database might not be adjacent to the final clustering centers in an actual case.


The initialization scheme proposed by Dae-Won et al. [23] extracts the most vivid and distinguishable colors in the image, referred to as the dominant colors; the color points closest to these dominant colors are selected as the initial centroids in the FCM calculations, abbreviated CFCM. CFCM might mistakenly choose noise points in the image as the initial points, and this method is only suitable for common photographs. For numerical data and particular image segmentation applications, such as remote sensing image analysis, CFCM might not achieve good clustering accuracy. Furthermore, in the objective function J_m(U, V), the data attribute weights are assumed equal and exert the same influence on every clustering center, so important attributes cannot affect the clustering process strongly enough to yield more meaningful clustering results. Moreover, it is well known that the FCM algorithm achieves higher clustering precision when the different classes are distributed uniformly in the sample space; in the general case, however, the sample distribution does not fit this character. Hence, from these two aspects, special attribute weights for the different classes are imperative to overcome the high dependency on the initial sample distribution. Landform classification is likewise sensitive to weight adjustments of the adopted terrain attributes. Undoubtedly, a better match between the landform classification and the real soil map may be produced when appropriate weights are obtained. In the iterative process of remote sensing image clustering, attribute weights determine the significance of each topographic property and influence the attribute distance calculations, which results in corresponding movements of the clustering centers.

3. The proposed scheme of attribute weights design

Several weighting schemes have been proposed. Nock and Nielsen [18] proposed a boosting-analogy method to estimate the probability density of all samples, which needs considerable calculation time; if the data set is huge, as in image segmentation, the efficiency of this method is limited to a great extent. The method of Hathaway and Hu [19] reduces the original dataset to a smaller one, clusters the weighted reduced dataset, and then extends the reduced-data results back to the original dataset. In this paper, some prior knowledge is adopted to analyze the data attribute distribution. Special weights in FCM are then assigned to each data attribute to adjust the original samples toward a uniform distribution, which better fits the clustering character of different classes distributed uniformly in the space. The influence of the data attributes is shown in Fig. 2 with the example of the UCI dataset IRIS. There are three classes in the IRIS dataset, marked with different symbols. From this figure, it is found that attributes 3 and 4 fit the clustering character better than the other two attributes, and in the FCM algorithm, attributes with higher weights should intuitively attract the clustering centers more strongly. Thus, attributes 3 and 4 should be given larger weights to improve the accuracy. Consequently, the remainder of this section investigates a particular weighting scheme that adjusts the samples toward a uniform distribution, together with its possible theoretical benefits for clustering, in order to make FCM more robust and efficient.

For this objective, the prior knowledge X′ is statistically analyzed to acquire the mean μ and standard deviation σ of each data attribute. The temporary vector μ^{(temp)}_{·i} = (μ^{(temp)}_{1i}, μ^{(temp)}_{2i}, ..., μ^{(temp)}_{Ci})^T is obtained by calculating the class means of data attribute i, and the individual means are then rescaled to [0, 1] to avoid the effect of different magnitudes. After that, the mean values are arranged in ascending order as μ̄_{·i} = (μ_{1i}, μ_{2i}, ..., μ_{Ci})^T.


Fig. 2. The influence of the data attributes: (a) the combination of attributes 1 and 2 in the IRIS data and (b) the combination of attributes 3 and 4 in the IRIS data.

Definition 1. For a given class number C and data attribute dimension P, the difference between two adjacent class means of a data attribute is defined as

    \delta_{ji} = \| \mu_{j+1,i} - \mu_{j,i} \|^2    (5)

where δ_{ji} denotes the difference between class j and class (j + 1) for data attribute i (i = 1, 2, ..., P), with the constraint \sum_{j=1}^{C-1} \delta_{ji} = 1.

Definition 2. Let the number of clusters and the dimension of the data attributes be denoted by C and P, respectively. The separation index Difference(i) is defined as

    Difference(i) = \prod_{j=1}^{C-1} \delta_{ji}^{t}    (6)

where t ∈ [0, 1] is a nonlinear parameter aiming to enhance the differences δ_{ji}, because the distinctions are too small after normalization to fully express the separation of a data attribute. Applying the average inequality to formula (6) gives

    \frac{1}{C-1} \sum_{j=1}^{C-1} \delta_{ji}^{t} \ge \left( \prod_{j=1}^{C-1} \delta_{ji}^{t} \right)^{1/(C-1)}    (7)

The product \prod_{j=1}^{C-1} \delta_{ji}^{t} attains its maximal value if and only if all of the items δ_{ji} are the same. In other words, when each vector δ_i is uniformly distributed in the range [0, 1], the separation degree of the classes for data attribute i is better.

Definition 3. The spatial overlapping behavior of the data attributes is analyzed by the overlapping index

    Overlapping(i) = \left( \frac{\sum_{l=1}^{C} \sigma_{li}}{\sum_{l=1}^{C} \sum_{q=1}^{P} \sigma_{lq}} \right)^{-1}    (8)

where \sum_{l=1}^{C} \sigma_{li} is the sum of the class standard deviations for data attribute i, and \sum_{l=1}^{C} \sum_{q=1}^{P} \sigma_{lq} is the summation of the deviations of all data attributes. Intuitively, the lower the overlap between the prototype of a cluster and the prototypes of the other clusters, the higher the weight associated with attribute i, and hence the higher the quality of attribute i for pattern recognition problems.

From Definitions 2 and 3, the weight design principle for every data attribute, i = 1, 2, ..., P, is given as

    w_{ii} = \lambda \, Overlapping(i) + (1 - \lambda) \, Difference(i) = \lambda \left( \frac{\sum_{l=1}^{C} \sigma_{li}}{\sum_{l=1}^{C} \sum_{q=1}^{P} \sigma_{lq}} \right)^{-1} + (1 - \lambda) \prod_{j=1}^{C-1} \delta_{ji}^{t}    (9)

Here λ (0 < λ < 1) denotes a scaling factor whose role is to maintain a balance between the difference index and the overlapping index within the optimization mechanism. w_{ii} is the weight for data attribute i and is an element of the weight matrix W, which is expressed as

    W = \begin{pmatrix} w_{11} & 0 & \cdots & 0 \\ 0 & w_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_{PP} \end{pmatrix}    (10)

To simplify the calculation, the weight matrix W is defined as a diagonal matrix. The elements on the diagonal of W are the weights associated with the attributes; the other elements are 0, which eliminates the correlation between different attributes. Function (9) shows that a high weight associated with an attribute means a low variance of the values of that attribute for the points belonging to a cluster, together with a large distinction between the prototype of the cluster and the prototypes of the other clusters. In other words, the larger the value of the weight, the higher the discrimination capability of the attribute for clustering, which consequently highlights the attributes that are particularly effective for characterizing a cluster. In all, the pseudocode of the whole proposed attribute weight design scheme is as follows:

Start
    Calculate the mean μ and standard deviation σ of each data attribute, and then obtain μ^{(temp)}_{·i} and μ̄_{·i}
    Calculate Difference(i) and Overlapping(i) of the ith data attribute using formulae (6) and (8), respectively
    Obtain the weight of each data attribute using formula (9) and form the weight matrix W
End
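The pseudocode above maps directly onto a few lines of NumPy. The following sketch computes Difference(i), Overlapping(i) and the diagonal of W from labeled prior samples under our reading of Eqs. (5)-(9); the names (attribute_weights, Xp, y), the default t = 0.5 and the epsilon guards are our own assumptions rather than the paper's.

import numpy as np

def attribute_weights(Xp, y, C, t=0.5, lam=0.1):
    """Diagonal weights w_ii from prior samples Xp (N' x P) with labels y."""
    eps = 1e-12
    # Class means per attribute, rescaled to [0, 1] and sorted ascending.
    mu = np.array([Xp[y == c].mean(axis=0) for c in range(C)])          # C x P
    mu = (mu - mu.min(axis=0)) / (mu.max(axis=0) - mu.min(axis=0) + eps)
    mu = np.sort(mu, axis=0)
    # Eq. (5): squared gaps between adjacent class means, normalized so that
    # sum_j delta_ji = 1 for every attribute i.
    delta = np.diff(mu, axis=0) ** 2                                    # (C-1) x P
    delta = delta / (delta.sum(axis=0, keepdims=True) + eps)
    # Eq. (6): separation index as the product of t-enhanced gaps.
    difference = np.prod(delta ** t, axis=0)
    # Eq. (8): overlapping index from the class standard deviations.
    sigma = np.array([Xp[y == c].std(axis=0) for c in range(C)])        # C x P
    overlapping = (sigma.sum(axis=0) / (sigma.sum() + eps)) ** -1
    # Eq. (9): balanced combination; these are the diagonal entries of W.
    return lam * overlapping + (1.0 - lam) * difference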

Table 1
The special parameters calculation results on the IRIS dataset.

             Attribute 1   Attribute 2   Attribute 3   Attribute 4
Difference   0.0052        0.0312        0.1707        0.3405
Overlapping  0.1167        0.0646        0.1552        0.1053
Weights      0.01635       0.0926        0.1692        0.3170

Furthermore, the IRIS dataset is again used as an example to explain the weighting process: 10% of the data is selected as prior knowledge to simulate the situation with few known training samples. The parameters Difference(i), Overlapping(i) and the final weight of each attribute are given in Table 1 for λ = 0.1. Thus, Table 1 leads to the same conclusion as Fig. 2: attributes 3 and 4 should be given larger weights. After the weighting processing, the original data distribution is adjusted to be nearly uniform, which improves the clustering accuracy.

4. The novel initialization method

The clustering initialization procedure aims to establish good starting points for the clustering centers. We choose the initial clustering centers from the prior knowledge X′ = {x′_1, x′_2, ..., x′_{N′}}. Intuitively, however, the clustering result then depends heavily on how the prior knowledge is selected. Therefore, the means of the prior information should be adjusted quickly to acquire better initial clustering centers that are close to the true clustering centers.

Definition 4. Let x′_k = {x_{k,1}, x_{k,2}, ..., x_{k,P}, ClassID}^T ∈ ℝ^{P+1} be a prior sample point and μ̄_{i·} = (μ_{i1}, μ_{i2}, ..., μ_{iP}) the mean of the prior knowledge for class i according to the variable ClassID. The distance between x′_k and μ̄_{i·} for i = 1, 2, ..., C, denoted by ρ(x′_k, μ̄_{i·}), is defined as

    \rho(x'_k, \bar{\mu}_{i\cdot}) = \| x'_k - \bar{\mu}_{i\cdot} \|^2,  x'_k \in X'    (11)

The above equation shows that when ρ(x′_k, μ̄_{i·}) is larger, the prior sample point k is farther away from the center i of the training samples and thus does not belong to class i; conversely, x′_k and μ̄_{i·} are closer. All prior samples are then roughly clustered by the following function:

    x'_k \to Z_j,  j = \arg\min_{1 \le i \le C} \{ \rho(x'_k, \bar{\mu}_{i\cdot}) \},  k = 1, 2, ..., N'    (12)

so that each prior sample x′_k is put into the corresponding data set of Z = {Z_1, Z_2, ..., Z_j, ..., Z_C} = X′. A prior sample x′_k may therefore be ascribed to a category that is not identical to its parameter ClassID, because of prior sample overlap and distribution asymmetry. After the comparison of the distances between x′_k and μ̄_{i·}, the rough initial clustering centers V^{(0)} are calculated by

    V_i^{(0)} = \frac{1}{n_i} \sum_{j=1}^{n_i} x_j^{(i)},  x_j^{(i)} \in Z_i \subset X',  i = 1, 2, ..., C    (13)

where n_i is the number of prior samples in data set Z_i, and the initial clustering centers V_i^{(0)} are the means of the data sets Z_i belonging to the ith class, which is not the same as μ̄_{i·}: the variable μ̄_{i·} is calculated from the known category information, whereas V_i^{(0)} is obtained from the distance comparison results. Furthermore, λ_{ii} denotes the distance between the clustering center i of data set Z_i and a sample belonging to the same set, and λ_{ij} is the distance from this sample to another clustering center j. They are defined as

    \lambda_{ii} = \frac{n_i}{n_i - 1} [x_k^{(i)} - V_i^{(0)}]^T W [x_k^{(i)} - V_i^{(0)}]    (14)

    \lambda_{ij} = \frac{n_j}{n_j + 1} [x_k^{(i)} - V_j^{(0)}]^T W [x_k^{(i)} - V_j^{(0)}]    (15)

Here x_k^{(i)} indicates prior sample point k in class i, for i = 1, 2, ..., C. The numbers of prior samples in classes i and j of data set Z are n_i and n_j, respectively. Assume λ_{il} = min{λ_{ij}} for j = 1, ..., C, j ≠ i. If λ_{il} ≤ λ_{ii}, the sample point x_k^{(i)} is moved from class i to class l, since λ_{il} is the minimum distance from sample x_k^{(i)} to every clustering center except V_i^{(0)}. The corresponding clustering centers V_i^{(0)} and V_l^{(0)} are then tuned by

    V_i^{(0)} = V_i^{(0)} + \frac{1}{n_i - 1} [V_i^{(0)} - x_k^{(i)}]    (16)

    V_l^{(0)} = V_l^{(0)} - \frac{1}{n_l + 1} [V_l^{(0)} - x_k^{(i)}]    (17)

The adjustment proceeds through every prior sample point; once all of the known information X′ has been searched one time, the initial clustering centers V^{(0)} stop tuning. By then, the better initial clustering centers for the whole SWFCM algorithm have been achieved. The convergence of the proposed novel initialization method with the data attribute weights is proved as follows. Let

    JC_i(I) = \sum_{k=1}^{n_i} D_{ki}(I)    (18)

where D_{ki}(I) = [x_k − V_i(I)]^T W [x_k − V_i(I)], and JC_i(I) is the value of the objective function for class i at iteration I. If sample x_k^{(i)} is moved from data set Z_i to Z_j, the objective function of Z_i changes, via Eqs. (16) and (17), as

    JC_i(I+1) = \sum_{k=1}^{n_i - 1} [x_k - V_i(I+1)]^T W [x_k - V_i(I+1)]
              = \sum_{k=1}^{n_i - 1} \sum_{j=1}^{P} \left( x_{kj} - V_{ij}(I) - \frac{1}{n_i - 1} (V_{ij}(I) - x_{kj}^{(i)}) \right)^2 w_{jj}^2
              = \sum_{k=1}^{n_i} D_{ki}(I) - \frac{n_i}{n_i - 1} [x_k^{(i)} - V_i(I)]^T W [x_k^{(i)} - V_i(I)]
              = JC_i(I) - \lambda_{ii}    (19)

Likewise, the objective function of Z_j changes as

    JC_j(I+1) = JC_j(I) + \lambda_{ij}    (20)

Summing Eqs. (19) and (20) gives the change of the overall objective function at each iterative step:

    JC(I+1) = JC(I) - (\lambda_{ii} - \lambda_{ij})    (21)

If λ_{ii} > λ_{ij}, the objective function decreases monotonically until it converges to the optimal point. Thus, the convergence of the proposed method has been proven.
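A compact sketch of the single point adjustment loop of Eqs. (11)-(17) follows, with Xp, y and w (the diagonal of W) as in the previous sketch; the guard that keeps a class from being emptied is our own safety addition rather than part of the published procedure.

import numpy as np

def init_centers(Xp, y, C, w):
    """Single point adjustment of the initial centers, Eqs. (11)-(17)."""
    # Class means of the labeled prior samples, used in Eq. (11).
    mu = np.array([Xp[y == c].mean(axis=0) for c in range(C)])
    # Rough re-assignment of every prior sample to its nearest mean, Eq. (12);
    # assumes every class keeps at least one sample after re-assignment.
    z = ((Xp[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    # Rough initial centers as the means of the re-assigned sets, Eq. (13).
    V = np.array([Xp[z == c].mean(axis=0) for c in range(C)])
    n = np.bincount(z, minlength=C).astype(float)
    for xk, i in zip(Xp, z):
        if n[i] <= 1.0:                                # guard (ours)
            continue
        diff = xk - V                                  # C x P
        dw = np.einsum('cp,p,cp->c', diff, w, diff)    # weighted distances
        lam_ii = n[i] / (n[i] - 1.0) * dw[i]           # Eq. (14)
        lam_ij = n / (n + 1.0) * dw                    # Eq. (15)
        lam_ij[i] = np.inf
        l = int(lam_ij.argmin())
        if lam_ij[l] <= lam_ii:                        # move x_k from i to l
            V[i] = V[i] + (V[i] - xk) / (n[i] - 1.0)   # Eq. (16)
            V[l] = V[l] - (V[l] - xk) / (n[l] + 1.0)   # Eq. (17)
            n[i] -= 1.0
            n[l] += 1.0
    return V

A single pass over the prior samples suffices, matching the statement above that the centers stop tuning once all of X′ has been searched one time.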

At the end of this section, the pseudocode of the proposed novel initialization method for the clustering centers is given as follows:

Start
    Calculate the means μ̄_{i·} according to ClassID, and then compute the distances ρ(x′_k, μ̄_{i·}) using (11)
    Calculate the data set Z and the rough initial centers V^{(0)} using (12) and (13), respectively
    For k = 1 : N′
        Calculate λ_{ii} and λ_{ij} using formulae (14) and (15), respectively, and set λ_{il} = min{λ_{ij}}
        If λ_{il} ≤ λ_{ii}
            Tune the initial centers V^{(0)} through formulae (16) and (17)
        End if
        Increase k
    End for
    Obtain the better initial clustering centers V^{(0)}
End

5. The whole framework of remote sensing image segmentation

With the proposed modified SWFCM algorithm, the data attribute weight matrix W and the initial clustering centers V^{(0)} = {v_1^{(0)}, v_2^{(0)}, ..., v_C^{(0)}}^T, close to the true clustering centers, are obtained. We now give the complete framework of remote sensing image segmentation based on the proposed method, described as follows.

Step 1. Given the clustering number C, initialize the fuzzy exponent m, the maximal number of iterations Loop and the scaling factor λ, and decide the convergence error ε, a small positive constant.

Step 2. Import the RGB values of each pixel of the remote sensing image into the sample set X = {x_1, x_2, ..., x_N}; the number of data attributes of each sample is P.

Step 3. Choose 10% of the known sample information of the remote sensing image as the prior knowledge X′ = {x′_1, x′_2, ..., x′_{N′}}, which is statistically analyzed to acquire the mean μ and standard deviation σ of each attribute; the other 90% of the sample knowledge is used to test the clustering accuracy.

Step 4. Compute the weight matrix W using Eq. (9).

Step 5. For each prior sample point x′_k ∈ X′, compute the distances using Eqs. (14) and (15), and update the initial clustering centers V^{(0)} using Eqs. (16) and (17).

Step 6. After all the prior sample points have been explored, the final initial clustering centers V^{(0)} = {v_1^{(0)}, v_2^{(0)}, ..., v_C^{(0)}}^T are obtained.

The remaining steps, which start the clustering from Step 6, are shown in Fig. 3. In the whole iterative process, the new clustering centers v_i^{(I+1)} and membership degrees u_{ik}^{(I+1)} are computed by functions (22) and (23), respectively. The weight matrix W, which tunes the distribution of the samples through the different data attribute weights, is added to Eq. (23), improving the precision of the clustering algorithm. The objective function is minimized, and the updating functions for the clustering centers and the membership matrix are derived as

    v_i^{(I+1)} = \frac{\sum_{k=1}^{N} (u_{ik}^{(I)})^m x_k}{\sum_{k=1}^{N} (u_{ik}^{(I)})^m},  i = 1, ..., C    (22)

    u_{ik}^{(I+1)} = 1 \Big/ \sum_{j=1}^{C} \left( \frac{[x_k - v_i^{(I)}]^T W [x_k - v_i^{(I)}]}{[x_k - v_j^{(I)}]^T W [x_k - v_j^{(I)}]} \right)^{1/(m-1)}    (23)

What is more, a branch is appended that excludes the situation in which a sample overlaps a clustering center, in order to prevent function (23) from tending to infinity: such a sample is defuzzified directly with u_{ik}^{(I+1)} = 1. When the stopping condition |U^{(I+1)} − U^{(I)}| < ε or I > Loop is reached, the algorithm has converged to the optimal point. In the end, after the defuzzification step, the clustering results are matched with the corresponding landforms.

Fig. 3. The flowchart of the SWFCM algorithm for remote sensing image classification.
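Putting these pieces together, the sketch below is one possible reading of Steps 1-6 combined with the weighted iteration of Eqs. (22) and (23), including the overlap guard described above; it reuses attribute_weights() and init_centers() from the earlier sketches, and all remaining names are our own choices, not the authors' code.

import numpy as np

def swfcm(X, Xp, y, C, m=2.0, t=0.5, lam=0.1, max_iter=100, tol=1e-5):
    """SWFCM sketch: prior-knowledge setup, then Eqs. (22)-(23)."""
    w = attribute_weights(Xp, y, C, t=t, lam=lam)   # diagonal of W, Eq. (9)
    V = init_centers(Xp, y, C, w)                   # better V(0), Eqs. (11)-(17)
    U = np.zeros((C, X.shape[0]))
    for _ in range(max_iter):
        diff = X[None, :, :] - V[:, None, :]
        # Weighted squared distances [x_k - v_i]^T W [x_k - v_i].
        d2 = np.einsum('cnp,p,cnp->cn', diff, w, diff)
        # Overlap guard: a sample sitting on a center gets crisp membership 1.
        hit = d2 < 1e-12
        d2 = np.maximum(d2, 1e-12)
        # Membership update with attribute weights, Eq. (23).
        U_new = 1.0 / ((d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))).sum(axis=1)
        cols = hit.any(axis=0)
        U_new[:, cols] = hit[:, cols].astype(float)
        # Center update, Eq. (22).
        Um = U_new ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:           # |U(I+1) - U(I)| < eps
            U = U_new
            break
        U = U_new
    # Defuzzification: hard labels for the segmentation map.
    return U.argmax(axis=0), V

For an RGB image, X is the H*W x 3 array of pixel values (Step 2), and the returned labels are reshaped back to H x W to form the segmentation map.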



5.1. Accuracy assessment

Most clustering algorithms generate the initial selection randomly, so different selections of the initial clustering centers lead to different clustering partitions. Thus, an evaluation methodology is required to validate each of the fuzzy C-partitions once they are found [32,33]. This quantitative evaluation is the subject of clustering validity indices, which can appraise a partition result precisely. A number of clustering validity indices are available; most of them use only the original data, the clustering centers and the membership matrix. The indices most frequently referred to in the literature are the following. V_PC (partition coefficient) and V_PE (partition entropy) [34,35] are two simple indices computed using only the elements of the membership matrix. Xie and Beni [36] proposed the well-known validity index V_XB, which measures the overall average compactness against the separation of the C-partitions. Fukuyama and Sugeno [37] and Kwon [38] explored the validity indices V_FS and V_K, respectively. The main computation formulas are presented in Table 2.


Table 2
The validity indices of unsupervised clustering algorithms.

Bezdek [30]: V_PC = \frac{1}{n} \sum_{j=1}^{n} \sum_{i=1}^{c} \mu_{ij}^2. Meaning: the higher the sum of the membership values, the more compact each class is.

Bezdek [30]: V_PE = -\frac{1}{n} \sum_{j=1}^{n} \sum_{i=1}^{c} \mu_{ij} \log_a(\mu_{ij}). Meaning: uses the partition entropy concept to indicate the level of separation among the classes.

Fukuyama and Sugeno [37]: V_FS = J_m(U, V : X) - K_m(U, V : X) = \sum_{j=1}^{n} \sum_{i=1}^{c} \mu_{ij}^m \|x_j - v_i\|^2 - \sum_{j=1}^{n} \sum_{i=1}^{c} \mu_{ij}^m \|v_i - \bar{v}\|^2. Meaning: J_m denotes the inner-class compactness, K_m the distinction between the external classes.

Xie and Beni [36]: V_XB = \frac{\sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^2 \|x_j - v_i\|^2}{n \min_{i \ne k} \|v_i - v_k\|^2}. Meaning: uses an arithmetic division format to establish the validity index; the denominator indicates the strength of the separation between clusters.

Kwon [38]: V_K = \frac{\sum_{j=1}^{n} \sum_{i=1}^{c} \mu_{ij}^2 \|x_j - v_i\|^2 + \frac{1}{c} \sum_{i=1}^{c} \|v_i - \bar{v}\|^2}{\min_{i \ne k} \|v_i - v_k\|^2}. Meaning: extended from the V_XB index to eliminate its monotonically decreasing tendency; a punishing function \frac{1}{c} \sum_{i=1}^{c} \|v_i - \bar{v}\|^2 is introduced.
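As an illustration of Table 2, the following sketch evaluates three of the listed indices (V_PC, V_PE and V_XB) from a membership matrix U (c x n), centers V (c x P) and data X (n x P); the clamping of U away from zero and the default logarithm base are our assumptions.

import numpy as np

def v_pc(U):
    """Partition coefficient: mean squared membership (larger is better)."""
    return (U ** 2).sum() / U.shape[1]

def v_pe(U, a=np.e):
    """Partition entropy with logarithm base a (smaller is better)."""
    Uc = np.maximum(U, 1e-12)   # avoid log(0) for crisp memberships
    return -(Uc * np.log(Uc) / np.log(a)).sum() / U.shape[1]

def v_xb(U, V, X):
    """Xie-Beni index: compactness over minimal center separation."""
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # c x n
    compact = ((U ** 2) * d2).sum()
    sep = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    sep[np.diag_indices_from(sep)] = np.inf   # exclude i = k
    return compact / (X.shape[0] * sep.min())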

Table 3
UCI data sets.

                          IRIS   Image   Wine   Pima
Class number              3      7       3      2
Data dimension            4      18      13     8
The total number of data  150    2310    178    768

6. Simulation results

To show the effectiveness of the proposed method, we conduct experiments comparing our weighted version of the clustering algorithm with the initialization scheme against other algorithms. Various data sets are used for the experimental comparisons, but in order to make the comparisons fair, the SWFCM and the other FCM versions we run are initialized with the same parameters. Thus, any difference in the results stems from the differences in the weighting strategies and the initialization scheme. We have tested public UCI machine learning data sets, a synthetic image and a remote sensing image to show the performance reliability of the proposed algorithm, as detailed below. In the UCI data sets and the remote sensing image experiment, all sample attributes and category information are provided. We choose 10% of all samples as the prior knowledge; the other 90% of the samples are used to verify the clustering accuracy. The proportion of prior information is set to 10% to simulate the actual situation with few training samples discussed in this paper. Through our novel SWFCM algorithm, this small part of the known information is put to good use to acquire the attribute weights and initial clustering centers. The effect of this proportion is shown in the following experiments. Sample information for the synthetic image in the Berkeley public database is also provided, but the amount is so small that all of these samples are selected as prior known data. Hence, the results of the natural color image segmentation are verified by visual inspection and other indices such as the number of iterations and the time cost.

6.1. Simulation results on the classic UCI data sets

Four data sets are used to evaluate the performance reliability of the proposed SWFCM algorithm: IRIS, Image segmentation, Wine and Pima from the UCI database. Table 3 presents the details of these four data sets, namely the class number, the data dimension and the total number of data; each class contains the same number of samples. We choose 10% of the samples as the prior knowledge, and the others are test samples. The clustering quality, compared with the standard FCM and the UFCM [22] algorithm, which choose the initial clustering centers randomly and uniformly distributed in the data space, respectively, is given in Table 4. Table 5 presents the corresponding convergence times of the three algorithms.

Table 4
Clustering accuracy of three algorithms on the UCI data sets.

              IRIS (%)   Image (%)   Wine (%)   Pima (%)
Standard FCM  88.67      61.69       96.07      87.19
UFCM [22]     88.67      54.03       96.07      87.19
SWFCM         91.33      73.77       96.63      92.14

Table 5
Convergence time of three algorithms on the UCI data sets (ms).

              IRIS     Image    Wine     Pima
Standard FCM  156      169687   156.25   378.125
UFCM [22]     140.625  107359   140.625  375
SWFCM         46.875   33093    125      437.5

Table 6
Clustering validity indices for the IRIS dataset.

              V_PC    V_PE    V_XB    V_FS     V_K
Standard FCM  0.8957  0.2845  0.6566  31.8589  100.8137
UFCM [22]     0.9000  0.2576  0.1669  26.6838  26.0793
SWFCM         0.9553  0.1234  0.0824  26.2158  12.7859

From these data, the proposed SWFCM algorithm successfully segments the IRIS, Wine and Pima data sets; especially for the large Image data set, the clustering precision is improved by as much as 12.08% over standard FCM and 19.74% over the UFCM algorithm, while the convergence time decreases by a factor of about 5-7. Furthermore, it is found that as the data dimension increases, the accuracy improves more significantly. UFCM adopts a uniform-distribution initialization strategy over the whole data space, regardless of the specific distribution of each class; therefore, although the running time is reduced, the clustering precision sometimes does not increase. The convergence time of the SWFCM algorithm decreases markedly except on the Pima data set. The reason is that only one attribute of Pima fits the FCM clustering character and is endowed with a high weight, so the whole clustering process ignores the effect of the other attributes and depends only on this attribute, requiring more iterations to converge to the optimal point. We then adopt the well-known clustering validity indices V_PC, V_PE, V_XB, V_FS and V_K to test the two versions of FCM and the proposed SWFCM. In order to eliminate the disparities in performance induced by random initialization, standard FCM is run 10 times with different initial centers and the average values are reported. Tables 6-9 give the results for the four data sets. The larger V_PC is, the better the clustering quality; conversely, the smaller V_PE, V_XB, V_K and V_FS are, the better the clustering quality.


Table 7
Clustering validity indices for the IMAGE dataset.

              V_PC    V_PE    V_XB    V_FS       V_K
Standard FCM  0.7126  0.8846  0.4084  1002.7734  946.9180
UFCM [22]     0.6928  0.9106  0.4212  942.5285   976.2681
SWFCM         0.7812  0.6638  0.3939  916.1919   914.1681

Table 8
Clustering validity indices for the WINE dataset.

              V_PC    V_PE    V_XB    V_FS    V_K
Standard FCM  0.7659  0.6287  0.3166  5.1304  56.8213
UFCM [22]     0.7659  0.6287  0.3166  5.1304  56.8213
SWFCM         0.7853  0.5860  0.2987  5.0190  53.6602

Table 9
Clustering validity indices for the PIMA dataset.

              V_PC    V_PE    V_XB    V_FS     V_K
Standard FCM  0.7641  0.5420  0.5173  53.9842  397.5480
UFCM [22]     0.7641  0.5420  0.5173  53.9842  397.5480
SWFCM         0.8270  0.4067  0.2960  10.7514  227.5628

Table 10
The relationship between initial clustering centers and final ones.

              Initial clustering centers                                                   Final clustering centers                                                     Iterative times
Standard FCM  {0.500,0.417,0.661,0.708} {0.694,0.333,0.644,0.542} {0.528,0.375,0.559,0.499}  {0.196,0.591,0.079,0.060} {0.431,0.298,0.567,0.532} {0.703,0.452,0.795,0.827}  30
UFCM [22]     {0,0,0,0} {0.333,0.333,0.333,0.333} {0.667,0.667,0.667,0.667}                  {0.196,0.591,0.079,0.060} {0.431,0.298,0.567,0.532} {0.703,0.452,0.795,0.827}  34
SWFCM         {0.196,0.591,0.079,0.059} {0.460,0.317,0.569,0.525} {0.658,0.424,0.784,0.830}  {0.196,0.590,0.078,0.060} {0.453,0.310,0.559,0.511} {0.655,0.426,0.781,0.827}  18

From the data in these tables, it is found that the proposed SWFCM algorithm has the better clustering performance, which is consistent with the clustering accuracy in Table 4. In other words, the C-partitions after SWFCM segmentation achieve higher inner-class compactness and more significant distinctions between the external classes. In order to explain the relationship between the initial clustering centers and the final ones, the results of the different versions of the FCM algorithm on the IRIS dataset are presented in Table 10. Standard FCM selects the initial centers randomly, and UFCM [22] does not consider the actual sample distribution: its centers are uniformly distributed in [0,1] after standardization. Hence, both methods need more iterations to converge. The proposed SWFCM method, however, chooses initial centers adjacent to the final ones, so the convergence is fast, and the clustering accuracy and the convergence time are also improved greatly, as seen in Tables 4 and 5, respectively. The performance of each algorithm is monitored through the distance between successive membership matrices, error = |U^{(I+1)} − U^{(I)}|, which is a suitable stopping criterion. In order to show the respective effects of the weighting scheme and the initialization method, we also run the fuzzy C-means algorithm with only the proposed clustering center initialization (SFCM) on the four UCI datasets. The convergence error curves of each algorithm are shown in Fig. 4.

Fig. 4(a) and (d) adopt Cartesian coordinates to represent the convergence process, while the vertical coordinates in Fig. 4(b) and (c) are logarithmic to express the difference in final precision. From these curves, it is found that the initial errors of the standard FCM and UFCM algorithms are larger; because their initial clustering centers are not chosen appropriately, the algorithms need more iterations to converge. The SFCM algorithm with the proposed initialization method obtains better clustering centers V^{(0)}, which accelerates the iteration. However, its clustering precision is sometimes the same as that of standard FCM, without any improvement, as shown in Fig. 4(c). With the weighting scheme added, SWFCM not only speeds up the convergence but also improves the precision. Therefore, we can conclude that the initialization method improves the convergence speed, while the weights of the data attributes increase the clustering precision. Furthermore, the parameter λ in formula (9) balances the indices Difference(i) and Overlapping(i). In order to study its effect on the clustering accuracy, an experiment is designed with the UCI datasets: λ is varied from 0.1 to 0.9, and the corresponding clustering accuracies are presented in Table 11. It is found that the best clustering accuracies are usually obtained when λ is in the range [0.1, 0.4], which means the attribute weights depend more on the index Difference(i). In the UCI dataset experiments, we choose ten percent of all samples as the prior knowledge, and the other ninety percent are adopted to verify the clustering accuracy; the proportion of prior information is set to 10% to simulate the situation with few training samples mainly discussed in this paper. In the rest of this part, the proportion of known information is varied to examine its influence on the accuracy, as given in Table 12. From Table 12, the conclusion is that the clustering accuracy improves as the proportion of prior information increases. However, once the proportion reaches a certain level, the precision does not increase any more. In the attribute weight design method and the initialization process, this known knowledge is adopted to acquire the statistics of each class and each attribute, such as the mean and variance. Hence, when X′ is large enough to provide sufficient statistical information representative of all samples, the precision no longer improves, though this saturation proportion differs for each dataset. For the IRIS dataset, the accuracy does not saturate until the proportion reaches 40%, because the total number of samples is small, as shown in Table 3. Correspondingly, the large datasets, such as IMAGE and PIMA, reach their best results with only 10% prior knowledge, while the WINE dataset needs 20% because of its higher dimension. Therefore, the proposed SWFCM algorithm is very robust to the proportion of prior knowledge.

6.2. Common color image segmentation

In this section, natural color images from the famous public Berkeley segmentation dataset and human-segmented benchmark are used to test the reliability of the proposed algorithm. Fig. 5 shows the original color image called "flower", of size 481×321, which contains four dominant colors: red for the petals, yellow for the pistil, dark green for the leaves and black for the background.
It can be found at the following website: http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/segbench/BSDS300/html/dataset/images/color/124084.html. Some modifications have been performed here to suit color image clustering; otherwise, many small parts in the image would influence the segmentation results. In the Berkeley dataset, some pixels of the "flower" image are recorded and classified by humans, but their number is small, so all of this sample information is adopted as prior knowledge X′ to obtain the better attribute weights and initial clustering centers.

Fig. 4. The comparison of convergence error variation for: (a) IRIS dataset, (b) IMAGE dataset, (c) WINE dataset and (d) PIMA dataset.

Table 11
The effect of the parameter λ on the clustering accuracy.

       λ = 0.1 (%)  0.2 (%)  0.3 (%)  0.4 (%)  0.5 (%)  0.6 (%)  0.7 (%)  0.8 (%)  0.9 (%)
IRIS   91.33        91.33    90.67    90.00    89.33    89.33    88.67    88.67    88.67
IMAGE  70.12        73.77    71.34    68.61    60.69    54.03    54.03    54.03    54.03
WINE   96.07        96.63    96.63    96.63    96.07    96.07    96.07    96.07    96.07
PIMA   92.14        92.14    92.14    92.14    87.19    87.19    87.19    87.19    87.19

This known sample information contains the red, green and blue values of each pixel and the corresponding category knowledge. Thus, in this experiment the class number is C = 4 and the attribute number is P = 3. The proposed method is compared with three other unsupervised algorithms: the standard FCM and UFCM are the same as mentioned above, while the main principle of the CFCM [23] algorithm is that the points closest to the dominant colors are selected as the initial clustering centers in the FCM calculation. The clustering results are shown in Fig. 6. Fig. 6(a) shows the clustering result of the standard FCM, in which part of the pistil is misclassified as petal, and the pistil and the leaves are wrongly attributed to the same class. The clustering result based on UFCM, shown in Fig. 6(b), also fails to correctly distinguish the pistil from the leaves, and some parts of the petals are confused with the background owing to the effect of illumination.

Table 12
The effect of the proportion of prior information on the clustering accuracy.

           5%     10%    20%    40%    80%
IRIS (%)   88.67  91.33  91.33  93.33  93.33
IMAGE (%)  71.34  73.77  73.77  73.77  73.77
WINE (%)   96.07  96.63  98.31  98.31  98.31
PIMA (%)   89.84  92.14  92.14  92.14  92.14

Fig. 5. The original natural color image.

The CFCM algorithm (Fig. 6(c)) considers the special character of the image and chooses the vivid colors as the initial centers, which correctly clusters the image into four classes, but the shadow on the petals is over-segmented into the background.


Fig. 6. The clustering result images generated by: (a) the Standard FCM, (b) UFCM [22], (c) CFCM [23] and (d) the proposed SWFCM.

Table 13
The comparison of convergence for the four algorithms.

              Iterative times  Time cost (ms)
Standard FCM  51               1 546 875.5
UFCM [22]     48               143 031.25
CFCM [23]     43               125 437.5
SWFCM         38               118 962.3

In contrast, our proposed SWFCM method, adopting the novel initialization scheme and weighting method, provides better clustering results in Fig. 6(d) than any other version of the FCM algorithm. The whole flower is correctly recognized, the boundary between petals and pistil is clear, and no shadow parts are misclassified. On the other hand, considering the convergence performance of the four algorithms, the comparison results are presented in Table 13. It is found that the proposed SWFCM takes fewer iterations and less time to complete the natural color image clustering task. Therefore, this confirms that appropriate initial clustering centers speed up the segmentation process, and correct data attribute weights significantly improve the image clustering precision.

Fig. 7. Image of the Zhalong Nature Reserve acquired on October 21, 2001.

6.3. Clustering of the remote sensing image

In order to verify the proposed unsupervised fuzzy SWFCM clustering algorithm on remote sensing images, we conduct an experiment on the remote sensing image of the Zhalong Nature Reserve in China acquired on October 21, 2001, shown in Fig. 7. The Zhalong Nature Reserve is a national nature reserve of China, located at 123°47′E-124°47′E and 46°52′N-47°32′N, with a total area of 210,000 ha. Its main purpose is to protect valuable and rare wild animals, plant species and wetland ecosystems. A catastrophic fire broke out in the Zhalong Nature Reserve on August 27, 2001; therefore, the remote sensing image of October 21, 2001 contains a large burned area. There are eight major types of features in the Zhalong Nature Reserve: farmland, grassland, reed swamp, freshwater swamp, salina, water, wasteland and burned area. We select a sub-region from the remote sensing image of the Zhalong Nature Reserve, shown in Fig. 8.

Fig. 8. The reference remote sensing image of Zhalong.

The sub-region consists of 256 scan lines with 256 pixels per line and a pixel size of about 30×30 m. For this geographic area, ground cover information for 19,832 pixels has been carefully recorded and registered as samples by human observers.


Fig. 9. The clustering result images generated by: (a) the standard FCM, (b) UFCM [22], (c) CFCM [23] and (d) the proposed SWFCM.

Table 14
The comparison of producer accuracy.

                  Standard FCM (%)  UFCM [22] (%)  CFCM [23] (%)  SWFCM (%)
Farmland          80.27             85.98          81.32          98.03
Grassland         43.53             54.06          45.44          65.54
Reed swamp        86.82             70.41          83.55          90.72
Freshwater swamp  19.73             88.64          89.89          87.02
Salina            90.80             90.80          90.80          91.62
Water             81.89             89.44          91.76          96.98
Wasteland         61.58             79.49          78.38          77.22
Burned area       95.50             79.47          94.37          100.00

These samples are based on ground reference data and a land-use map and contain the eight different landforms shown in the legend. For each of these image pixels, Landsat TM measurements are available in six spectral bands. Bands 2 and 3 are in the visible part of the electromagnetic spectrum and are of great help in distinguishing the different landforms and their boundaries. TM band 4 is in the infrared, which benefits mapping water and exploring soil humidity. Thus, these three bands are used to compose the reference remote sensing image shown in Fig. 7. After pixel extraction, the image information is imported. There are eight landform classes, so C = 8, and each sample includes three data attributes (P = 3), namely the red, green and blue (RGB) values. We select 10% of the sample information from each kind of landform, and the rest is used to test the clustering accuracy. Thus, 1983 pixels with RGB values and category information are contained in the prior samples X′ to obtain the better weights and initial centers. We compare the proposed SWFCM algorithm with the standard FCM, UFCM and CFCM in this experiment, which have been widely used

Table 15
The comparison of Kappa coefficient and overall accuracy.

              Overall accuracy (%)  Kappa
Standard FCM  77.38                 0.6865
UFCM [22]     83.29                 0.7554
CFCM [23]     81.56                 0.7364
SWFCM         90.92                 0.8412

Table 16
The comparison of convergence for the four algorithms.

              Iterative times  Time cost (s)
Standard FCM  219              483.064
UFCM [22]     108              184.114
CFCM [23]     102              165.788
SWFCM         88               140.413

in the classification of remote sensing images. Fig. 9 shows the four fuzzy classified scenes together with the original map. From the clustering results, based on a visual evaluation, the image generated by the SWFCM classifier is more accurate than the images resulting from the other versions of the FCM classifier. Fig. 9(d) illustrates that the SWFCM algorithm can capture the salina located on the verge of the water, and the farmland, grassland and reed swamp are well attributed to their three classes. However, the standard FCM misclassifies the water area as burned area at the bottom right of Fig. 9(a), owing to improper initial clustering centers. The CFCM classifier wrongly assigns part of the water area to the salina class at the top left, because differences in water depth cause different spectral mappings. To quantify the qualitative evaluation results in Fig. 9, error matrices are used in the accuracy assessment.

Fig. 10. The change curves of the clustering centers of the SWFCM algorithm.

Table 17
The relationship between the initial clustering centers and the final ones.

                 Center 1             Center 2             Center 3           Center 4
Initial centers  {254.1,255.3,255.4}  {148.2,138.3,120.6}  {52.0,75.1,78.3}   {0.5,16.2,18.6}
Final centers    {246.3,253.2,253.5}  {152.1,158.1,169.8}  {72.1,74.9,81.7}   {3.1,31.2,62.2}

                 Center 5             Center 6             Center 7           Center 8
Initial centers  {78.6,62.5,109.4}    {232.0,127.5,219.7}  {0.1,86.3,159.3}   {45.0,37.8,34.7}
Final centers    {102.3,114.4,133.7}  {182.2,191.7,197.3}  {1.75,84.8,150.6}  {39.9,51.5,73.4}

Various statistics that characterize the capabilities of the four fuzzy classifiers are generated from these error matrices. Table 14 presents the comparison of producer accuracy. In addition, a discrete multivariate analysis technique is used to test whether the overall agreement in the different error matrices is significantly different. The measure of agreement called the Kappa coefficient [39] is adopted to assess the significant differences among the four classification algorithms, and is defined as

    Kappa = \frac{n \sum_{k=1}^{q} n_{kk} - \sum_{k=1}^{q} n_{k+} n_{+k}}{n^2 - \sum_{k=1}^{q} n_{k+} n_{+k}}    (24)

where n is the total number of samples, q is the number of classes, n_{kk} denotes the number of correctly classified samples of class k, and n_{k+} and n_{+k} denote, respectively, the number of reference samples in class k and the number of samples assigned to class k. The comparison results of the Kappa coefficient and overall accuracy for the four algorithms are presented in Table 15. Examining the producer's accuracy of the standard FCM classification, we find that only 43.53% and 19.73% accuracy are achieved for grassland and freshwater swamp, respectively. The accuracy for grassland and reed swamp is also very low with the UFCM and CFCM classifiers. In contrast, SWFCM provides a producer's accuracy of 65.54% for grassland and 90.72% for reed swamp. The accuracy assessment shows that the standard FCM classifier achieves an overall classification accuracy of 77.38%, whereas the proposed SWFCM achieves 90.92%, exceeding standard FCM by 13.54%, UFCM by 7.63% and CFCM by 9.36%. The Kappa coefficient can also be used as an accuracy measurement to determine which classifier is better. From Table 15, the Kappa value of the standard FCM algorithm is only 0.6865, and those of UFCM and CFCM are 0.7554 and 0.7364, respectively, while the value for SWFCM is 0.8412. We can therefore conclude that the proposed SWFCM algorithm provides better wetland mapping results than the other FCM variants.

Regarding convergence, the comparison of iterative times and time cost for the four algorithms is presented in Table 16; the proposed SWFCM shows a clear improvement over the other classifiers. Fig. 10 shows the change curves of the clustering centers of the SWFCM algorithm during the remote sensing image clustering, and the corresponding quantitative relationship between the initial and final clustering centers is given in Table 17. The red points mark the initial clustering centers. In Fig. 10, the clustering centers change greatly at the beginning and then become steady after about 30 iterations. The reason is that the proposed initialization method chooses initial clustering centers close to the true centers, so the algorithm converges after only a few iterations and the computation time decreases remarkably.
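As a worked illustration of Eq. (24) and the accuracy figures above, the following minimal sketch computes the overall accuracy, the per-class producer's accuracy, and the Kappa coefficient from an error matrix, under the same assumed row/column convention as the earlier error-matrix snippet.

```python
import numpy as np

def accuracy_statistics(m):
    """Overall accuracy, per-class producer's accuracy, and the
    Kappa coefficient of Eq. (24), from a q-by-q error matrix m
    whose rows index the reference classes (assumed convention)."""
    m = np.asarray(m, dtype=float)
    n = m.sum()                    # total number of samples
    diag = np.diag(m)              # n_kk: correctly classified counts
    n_k_plus = m.sum(axis=1)       # row totals, n_{k+}
    n_plus_k = m.sum(axis=0)       # column totals, n_{+k}
    overall = diag.sum() / n
    producers = diag / n_k_plus    # producer's accuracy per class
    chance = np.dot(n_k_plus, n_plus_k)
    kappa = (n * diag.sum() - chance) / (n ** 2 - chance)
    return overall, producers, kappa
```

For a diagonal (error-free) matrix the function returns an overall accuracy and Kappa of 1, which is a convenient sanity check.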


7. Conclusions

In this paper, a novel single point iterative weighted fuzzy C-means algorithm incorporating prior information is proposed for multidimensional data clustering and image classification. In this method, the weights of the data attributes are set so as to adjust the original samples toward a uniform distribution, which suits the characteristics of the FCM calculation and thus improves precision. Moreover, to accelerate convergence, appropriate initial clustering centers are selected by the single point adjustment algorithm, which also reduces the sensitivity to the choice of prior samples. In addition, in accordance with the characteristics of remote sensing data, the modified algorithm is extended to remote sensing image clustering. The modified scheme is worth adopting for FCM algorithms because it may yield better solutions at little implementation expense. The experimental results on the UCI data sets, the public Berkeley segmentation dataset and the Zhalong wetland remote sensing data clearly indicate that the proposed SWFCM has high classification accuracy and convergence speed, and is suitable for practical image classification problems where only limited prior knowledge is available. In future work, we will continue to seek methods for designing the attribute weights and initial clustering centers without any prior information, so that they can adapt during the iterative process. To summarize the core update, a hedged sketch of a generic attribute-weighted FCM iteration is given below.
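The following is a minimal sketch, not the paper's exact SWFCM: the attribute-weight vector w (which the paper designs from probability statistics of the data) and the single-point-adjusted initial centers are assumed to be supplied by the caller, with a random initialization standing in for the latter.

```python
import numpy as np

def weighted_fcm(X, c, w, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Generic attribute-weighted FCM (a sketch, not the paper's SWFCM).
    X: (n, d) samples; c: number of clusters; w: (d,) attribute weights;
    m: fuzzifier. Returns memberships U (c, n) and centers V (c, d)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Stand-in initialization; the paper's single point adjustment
    # method would supply better-placed initial centers here.
    V = X[rng.choice(n, size=c, replace=False)].astype(float)
    for _ in range(max_iter):
        diff = X[None, :, :] - V[:, None, :]                # (c, n, d)
        D2 = np.einsum('cnd,d->cn', diff ** 2, w) + 1e-12   # weighted squared distances
        U = D2 ** (-1.0 / (m - 1.0))                        # u_ik proportional to d_ik^{-2/(m-1)}
        U /= U.sum(axis=0, keepdims=True)                   # memberships sum to 1 per sample
        Um = U ** m
        V_new = (Um @ X) / Um.sum(axis=1, keepdims=True)    # fuzzy-weighted mean update
        if np.abs(V_new - V).max() < tol:                   # center convergence test
            V = V_new
            break
        V = V_new
    return U, V
```

With uniform weights, w = np.ones(d), the sketch reduces to the standard FCM update.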

Acknowledgments

This research is supported by the project (2007AA04Z158) of the National High Technology Research and Development Program of China (863 Program), the project (60674073) of the National Natural Science Foundation of China, the project (2006BAB14B05) of the National Key Technology R&D Program of China, and the project (2006CB403405) of the National Basic Research Program of China (973 Program).

References

[1] R.L. Cannon, J.V. Dave, J.C. Bezdek, M.M. Trivedi, Segmentation of a thematic mapper image using the fuzzy C-means clustering algorithm, IEEE Transactions on Geoscience and Remote Sensing GE-24 (3) (1986) 400–407.
[2] V.G. Camps, M.T. Bandos, D. Zhou, Semi-supervised graph-based hyperspectral image classification, IEEE Transactions on Geoscience and Remote Sensing 45 (10) (2007) 3044–3054.
[3] X.Z. Wang, Y.D. Wang, L.J. Wang, Improving fuzzy C-means clustering based on feature-weight learning, Pattern Recognition Letters 25 (10) (2004) 1123–1132.
[4] F. Marcelloni, Feature selection based on a modified fuzzy C-means algorithm with supervision, Information Sciences 151 (2003) 201–226.
[5] Y.X. Chen, J.Z. Wang, R. Krovetz, CLUE: clustering-based retrieval of images by unsupervised learning, IEEE Transactions on Image Processing 14 (8) (2005) 1187–1201.
[6] M. Andersson, J. Gudmundsson, C. Levcopoulos, Approximate distance oracles for graphs with dense clusters, Computational Geometry 37 (3) (2007) 142–154.
[7] D.R. Martin, C.C. Fowlkes, J. Malik, Learning to detect natural image boundaries using local brightness, color, and texture cues, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (5) (2004) 530–549.
[8] Y.X. Deng, J.P. Wilson, J. Sheng, Effects of variable attribute weights on landform classification, Earth Surface Processes and Landforms 31 (11) (2006) 1452–1462.
[9] Y.J. Zhang, A survey on evaluation methods for image segmentation, Pattern Recognition 29 (8) (1996) 1335–1346.
[10] Z. Hui, E.F. Jason, A.G. Sally, Image segmentation evaluation: a survey of unsupervised methods, Computer Vision and Image Understanding 110 (2) (2008) 260–280.

[11] A.M. Bensaid, L.O. Hall, J.C. Bezdek, L.P. Clarke, Validity-guided (re)clustering with applications to image segmentation, IEEE Transactions on Fuzzy Systems 4 (2) (1996) 112–123.
[12] M.N. Ahmed, S.M. Yamany, N. Mohamed, A.A. Farag, T. Moriarty, A modified fuzzy C-means algorithm for bias field estimation and segmentation of MRI data, IEEE Transactions on Medical Imaging 21 (3) (2002) 193–199.
[13] K.S. Chuang, H.L. Tzeng, S. Chen, J. Wu, T.J. Chen, Fuzzy C-means clustering with spatial information for image segmentation, Computerized Medical Imaging and Graphics 30 (1) (2006) 9–15.
[14] W.L. Cai, S. Chen, D.Q. Zhang, Fast and robust fuzzy C-means clustering algorithm incorporating local information for image segmentation, Pattern Recognition 40 (3) (2007) 825–838.
[15] Y. Xia, D.G. Feng, T.J. Wang, R.C. Zhao, Y.N. Zhang, Image segmentation by clustering of spatial patterns, Pattern Recognition Letters 28 (12) (2007) 1548–1555.
[16] E.T. George, On the use of the weighted fuzzy C-means in fuzzy modeling, Advances in Engineering Software 36 (5) (2005) 287–300.
[17] C. Wen-Ya, C. Isabelle, Modified fuzzy C-means classification technique for mapping vague wetlands using Landsat ETM+ imagery, Hydrological Processes 20 (17) (2006) 3623–3634.
[18] R. Nock, F. Nielsen, On weighting clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (8) (2006) 1223–1235.
[19] R.J. Hathaway, Y.K. Hu, Density-weighted fuzzy C-means clustering, IEEE Transactions on Fuzzy Systems 17 (1) (2009) 243–252.
[20] F. Höppner, Speeding up fuzzy C-means: using a hierarchical data organisation to control the precision of membership calculation, Fuzzy Sets and Systems 123 (3) (2002) 365–376.
[21] R.R. Yager, D.P. Filev, Approximate clustering via the mountain method, IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics 24 (8) (1994) 1279–1284.
[22] J.M. Pena, J.A. Lozano, P. Larranaga, An empirical comparison of four initialization methods for the K-means algorithm, Pattern Recognition Letters 20 (10) (1999) 1027–1040.
[23] K. Dae-Won, H.L. Wang, L. Doheon, A novel initialization scheme for the fuzzy C-means algorithm for color clustering, Pattern Recognition Letters 25 (2) (2004) 227–237.
[24] W. Pedrycz, J. Waletzky, Fuzzy clustering with partial supervision, IEEE Transactions on Systems, Man and Cybernetics, Part B 27 (5) (1997) 787–795.
[25] A.M. Bensaid, L.O. Hall, J.C. Bezdek, L.P. Clarke, Partially supervised clustering for image segmentation, Pattern Recognition 29 (5) (1996) 859–871.
[26] Z.H. Zhou, M. Li, Semisupervised regression with cotraining-style algorithms, IEEE Transactions on Knowledge and Data Engineering 19 (11) (2007) 1479–1493.
[27] I. Gath, A.B. Geva, Unsupervised optimal fuzzy clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence 11 (7) (1989) 773–781.
[28] D. Chaudhuri, B.B. Chaudhuri, A novel multiseed nonhierarchical data clustering technique, IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics 27 (5) (1997) 871–877.
[29] J. Ledoux, Filtering and the EM-algorithm for the Markovian arrival process, Communications in Statistics—Theory and Methods 36 (14) (2007) 2577–2593.
[30] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
[31] H.J. Sun, S.R. Wang, Q.S. Jiang, FCM-based model selection algorithms for determining the number of clusters, Pattern Recognition 37 (10) (2004) 2027–2037.
[32] M.R. Rezaee, B.P. Lelieveldt, J.H. Reiber, A new clustering validity index for the fuzzy C-mean, Pattern Recognition Letters 19 (3–4) (1998) 237–246.
[33] K. Dae-Won, H.L. Kwang, L. Doheon, Fuzzy clustering validation index based on inter-clustering proximity, Pattern Recognition Letters 24 (15) (2003) 2561–2574.
[34] N.R. Pal, J.C. Bezdek, On cluster validity for the fuzzy C-means model, IEEE Transactions on Fuzzy Systems 3 (3) (1995) 370–379.
[35] N.R. Pal, J.C. Bezdek, Correction to "On cluster validity for the fuzzy C-means model", IEEE Transactions on Fuzzy Systems 5 (1) (1997) 152–153.
[36] X.L. Xie, G. Beni, A validity measure for fuzzy clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence 13 (8) (1991) 841–847.
[37] Y. Fukuyama, M. Sugeno, A new method of choosing the number of clusters for the fuzzy C-means method, in: Proceedings of the Fifth Fuzzy Systems Symposium, 1989, pp. 247–250.
[38] S.H. Kwon, Cluster validity index for fuzzy clustering, Electronics Letters 34 (22) (1998) 2176–2177.
[39] W.B. Tao, H. Jin, Y.M. Zhang, Color image segmentation based on mean shift and normalized cuts, IEEE Transactions on Systems, Man and Cybernetics, Part B 37 (5) (2007) 1382–1389.

About the Author—JIANCHAO FAN was born in the Inner Mongolia Autonomous Region, China, in 1985. He received the B.S. degree from the Department of Automation, Dalian University of Technology, in 2007, where he is currently a graduate student. His current research interests are information management and decision support systems based on 3S technology, neural networks, and predictive control.

About the Author—MIN HAN (M'95–A'03–SM'06) received the B.S. and M.S. degrees from the Department of Electrical Engineering, Dalian University of Technology, Liaoning, China, in 1982 and 1993, respectively, and the M.S. and Ph.D. degrees from Kyushu University, Fukuoka, Japan, in 1996 and 1999, respectively. She is a Professor at the School of Electronic and Information Engineering, Dalian University of Technology. Her current research interests are neural networks and chaos, and their applications to control and identification.


About the Author—JUN WANG (S'89–M'90–SM'93–F'07) received the B.S. degree in electrical engineering and the M.S. degree in systems engineering from Dalian University of Technology, Dalian, China, in 1982 and 1985, respectively, and the Ph.D. degree in systems engineering from Case Western Reserve University, Cleveland, OH, in 1991. He was an Associate Professor with the University of North Dakota, Grand Forks, until 1995. He is currently a Professor with the Department of Automation and Computer-Aided Engineering, Chinese University of Hong Kong, Sha Tin. His current research interests include neural networks and their engineering applications.

Prof. Wang is an Associate Editor of the IEEE Transactions on Neural Networks and IEEE Transactions on Systems, Man, and Cybernetics: Part B. He was the President of the Asia Pacific Neural Network Assembly.