Swarm Based Fuzzy Clustering with Partition Validity
Lawrence O. Hall and Parag M. Kanade
Computer Science & Engineering Dept, University of South Florida, Tampa FL 33620
@csee.usf.edu
Abstract— Swarm based approaches to clustering have been shown to be able to skip local extrema by doing a form of global search. We previously reported on a swarm based approach that uses artificial ants to do fuzzy clustering by optimizing the fuzzy c-means (FCM) criterion. FCM requires that one choose the number of cluster centers (c). If the user of the algorithm is unsure of the number of cluster centers, they can try several different choices and evaluate them with a cluster validity metric. In this work, we use the fuzzy cluster validity metric proposed by Xie and Beni as the criterion for evaluating a partition produced by swarm based clustering. Interestingly, when provided with more clusters than exist in the data, our ant-based approach produces a partition with empty clusters and/or very lightly populated clusters. We used two data sets, Iris and an artificially generated data set, to show that optimizing a cluster validity metric with a swarm based approach can effectively indicate how many clusters there are in the data.
I. INTRODUCTION

Clustering unlabeled data requires either that the algorithm determine the number of clusters or that the user be able to approximately guess the number of clusters in the data. In the absence of knowledge about the number of clusters, partition validity metrics [1], [2] are typically applied to determine how good a partition is for a particular number of clusters. Even when the algorithm automatically determines the number of clusters, partition validity metrics remain useful for comparing partitions with different numbers of clusters produced by different algorithms. Swarm based approaches have been used to produce partitions of clusters [3], [4], [5], [6], [7], [8]. In particular, ant based clustering has been used to produce a fuzzy partition [9]. We thought it would potentially be valuable to use an ant based approach to optimize a fuzzy partition validity metric. In [2] the Xie-Beni [1] validity metric was shown to be quite useful in picking out the "right" number of clusters for a partition, so we chose to optimize this metric. Our hypothesis was that, given a guess of the number of clusters, it would produce a fuzzy partition that was, perhaps, a little more "valid" than the FCM partition for the same number of clusters. It turns out that this approach will determine the number of clusters that exist in the data, if provided with an initial guess of cluster centers that is larger than or equal to the actual number of clusters in the
data. We support this observation with results on two data sets, the well-known Iris data set [10] and an artificially generated data set. The Iris data set was where we first observed the phenomenon: we claimed there were three clusters and the algorithm stubbornly produced two by leaving one cluster empty or nearly empty. In Section 2, we briefly describe the fuzzy c-means clustering algorithm and the Xie-Beni partition validity metric. In Section 3, we discuss ant based clustering using partition validity to evaluate partitions. Section 4 contains experimental results and Section 5 has the conclusions.

II. FUZZY CLUSTERING AND PARTITION VALIDITY
In [11] the authors proposed a reformulation of the optimization criteria used in two common clustering objective functions. The original clustering function minimizes the objective function (1) used in fuzzy c-means clustering to find good clusters for a data partition.

J_m(U, β, X) = Σ_{i=1}^{c} Σ_{k=1}^{n} (U_ik)^m D_ik(x_k, β_i)    (1)

where
  U_ik : membership of the kth object in the ith cluster
  β_i : the ith cluster prototype
  m > 1 : the degree of fuzzification
  c ≥ 2 : number of clusters
  n : number of data points
  D_ik(x_k, β_i) : distance of x_k from the ith cluster center β_i

The reformulation replaces the membership matrix U with the necessary conditions that U must satisfy. In this work the ants move only cluster centers, and hence we do not want the U matrix in the equation. The reformulated version of J_m is denoted R_m; the reformulated fuzzy optimization function is given in (2). R_m depends only on the cluster prototypes and not on the U matrix, whereas J_m depends on both. The U matrix for the reformulated criterion can be easily computed using (3).
The 2005 IEEE International Conference on Fuzzy Systems
R_m(β, X) = Σ_{k=1}^{n} ( Σ_{i=1}^{c} D_ik(x_k, β_i)^{1/(1−m)} )^{1−m}    (2)

U_ik = D_ik(x_k, β_i)^{1/(1−m)} / Σ_{j=1}^{c} D_jk(x_k, β_j)^{1/(1−m)}    (3)
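As a concrete illustration, (2) and (3) can be sketched in a few lines. This is a minimal sketch assuming squared Euclidean distance for D_ik; the function names and the small eps guard against zero distances are our own additions, not part of the paper:

```python
import numpy as np

def reformulated_rm(X, centers, m=2.0, eps=1e-12):
    # Eq. (2): R_m depends only on the cluster prototypes, not on U.
    # D[i, k] = squared Euclidean distance of point k from center i.
    D = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=2) + eps
    inner = (D ** (1.0 / (1.0 - m))).sum(axis=0)   # sum over clusters i
    return float((inner ** (1.0 - m)).sum())       # sum over points k

def memberships(X, centers, m=2.0, eps=1e-12):
    # Eq. (3): recover U from the centers; each column sums to 1.
    D = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=2) + eps
    W = D ** (1.0 / (1.0 - m))
    return W / W.sum(axis=0, keepdims=True)
```

Substituting the U of (3) back into (1) reproduces the value of (2), which is easy to verify numerically.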
The Xie-Beni partition validity metric can be described as [1]:

XB(β, X) = R_m(β, X) / ( n · min_{i≠j} { ||β_i − β_j||^2 } )    (4)

It is clearly tied to the FCM functional, with a strong preference for keeping the smallest distance between any two cluster centroids as large as possible. The smallest XB(β, X) is considered to be the best.

III. FUZZY ANTS CLUSTERING ALGORITHM

The ants coordinate to move cluster centers in feature space in search of optimal cluster centers. Initially the feature values are normalized between 0 and 1. Each ant is assigned to a particular feature of a cluster in a partition. As in [6], the ants never change the feature, cluster or partition assigned to them. After randomly moving the cluster centers for a fixed number of iterations, called an epoch, the quality of the partition is evaluated using the Xie-Beni criterion (4). If the current partition is better than any of the previous partitions in the ant's memory, the ant remembers its location for this partition; otherwise the ant, with a given probability, goes back to a better partition or continues from the current partition. This ensures that the ants do not remember a bad partition and erase a previously known good partition. Even if the ants change good cluster centers to unreasonable ones, they can return to the good cluster centers because each ant has a finite memory in which it keeps the best cluster centers known so far. There are two directions for the random movement of an ant: the positive direction, moving in feature space from 0 toward 1, and the negative direction, moving from 1 toward 0. If during the random movement the ant reaches the end of the feature space, it reverses its direction. After a fixed number of epochs the ants stop. Each ant has a memory of the mem (5 here) best locations for the feature of the particular cluster of the particular partition that it is moving.
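To make the search concrete, the sketch below implements the Xie-Beni evaluation of (4) and a simplified version of the epoch loop just described. It is a single-partition illustration under our own simplifying assumptions (squared Euclidean distances, reflection at the feature-space boundary); the paper's full algorithm runs many partitions in parallel and adds per-ant rest/continue probabilities, which are omitted here:

```python
import random
import numpy as np

def xie_beni(X, centers, m=2.0, eps=1e-12):
    # Eq. (4): reformulated R_m divided by n times the minimal squared
    # separation between any two cluster centers; smaller is better.
    D = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=2) + eps
    rm = float(((D ** (1.0 / (1.0 - m))).sum(axis=0) ** (1.0 - m)).sum())
    c = centers.shape[0]
    sep = min(float(((centers[i] - centers[j]) ** 2).sum())
              for i in range(c) for j in range(c) if i < j)
    return rm / (X.shape[0] * sep + eps)

def ant_search(X, c, epochs=200, iters=50, d_min=0.001, d_max=0.01,
               mem=5, seed=1):
    # One partition's worth of ants: every coordinate of every center
    # takes small random steps for an epoch, then the partition is
    # scored with the Xie-Beni index and compared against memory.
    rng = random.Random(seed)
    d = X.shape[1]
    centers = np.array([[rng.random() for _ in range(d)] for _ in range(c)])
    memory = []                    # (xb, centers), best (smallest xb) first
    resume_p = [0.6, 0.2, 0.1, 0.075, 0.025]
    for _ in range(epochs):
        for _ in range(iters):
            step = np.array([[rng.choice((-1, 1)) * rng.uniform(d_min, d_max)
                              for _ in range(d)] for _ in range(c)])
            centers = np.abs(centers + step)               # reflect at 0
            centers = np.where(centers > 1.0, 2.0 - centers, centers)
        xb = xie_beni(X, centers)
        memory = sorted(memory + [(xb, centers.copy())],
                        key=lambda t: t[0])[:mem]
        if xb > memory[0][0]:
            # no improvement: jump back to a remembered partition,
            # preferring the better ones (probabilities from the paper)
            k = rng.choices(range(len(memory)),
                            weights=resume_p[:len(memory)])[0]
            centers = memory[k][1].copy()
    return memory[0]               # best (xb, centers) found
```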
An ant has a chance to move I times before an evaluation is made (an epoch). It can move a random distance between Dmin and Dmax. It has a probability of resting, Prest (not moving for an epoch), and a probability of continuing in the same direction as it was moving at the start of the epoch, Pcontinue. At the end of an epoch in which it did not find a position better than any in memory, it continues with probability PContinueCurrent. Otherwise there is a fixed set of probabilities determining which of the best locations in memory the search should be resumed from for the next epoch [6]: a probability of 0.6 that the ant goes back to the best known partition, 0.2 that it goes back to the second best known partition, 0.1 that it goes to the third best, 0.075 that it goes to the fourth best, and 0.025 that it goes to the worst (fifth) of the known partitions.

Since objects' memberships in clusters are not explicitly evaluated at each step, a cluster centroid can be placed in feature space such that no object is closer to it than to any other centroid. These are empty clusters, and they indicate that there are fewer true clusters than estimated, as will be shown in what follows. There may also exist clusters with one, two or very few examples assigned to them, which are likely spurious if we expect approximately equal size clusters whose sizes are larger than some threshold, say thirty.

IV. EXPERIMENTS

Two data sets were utilized to experimentally evaluate the ant based clustering algorithm proposed here. The first was the well-known Iris data set. It consists of four continuous valued features, 150 examples, and three classes [10]. Each class consists of 50 examples. However, one of the classes is clearly linearly separable from the other two, and many partition validity metrics will prefer a partition with two classes. Figure 1 shows a projection onto the first 2 principal components of the Iris data, which has been normalized so all feature values are between 0 and 1. The first 2 principal components are strongly correlated with the petal length and petal width features. For this data set, a reasonable argument may be made for two or three clusters.
Fig. 1. Iris Dataset (Normalized) - First 2 Principal Components
The artificial dataset had 2 attributes, 5 classes and 1000 examples. It was generated using a Gaussian distribution and is shown in Figure 2. The classes are slightly unequally sized [12] (248, 132, 217, 192 and 211 examples, respectively).

A. Experimental parameters

The parameters used in the experiments are shown in Table I. Essentially, 30 different partitions were utilized in each epoch. As there is significant randomness in the process, each experiment was run 30 times. Each experiment was tried with the known number of clusters or more. For the Iris data set, we also tried two classes
Fig. 2. Gauss-1 Dataset (Normalized)

TABLE I
PARAMETER VALUES

Parameter              Value
Number of ants         30 partitions
Memory per ant         5
Iterations per epoch   50
Epochs                 1000
Prest                  0.01
Pcontinue              0.75
PContinueCurrent       0.20
Dmin                   0.001
Dmax                   0.01
m                      2

TABLE II
NUMBER OF CLUSTERS SEARCHED FOR AND AVERAGE NUMBER FOUND FOR THE IRIS DATA WITH THE MINIMUM P OVER 30 TRIALS.

Clusters searched   Ave. clusters found   P
3                   2                     0.2
4                   2                     0.3
5                   2                     0.9
6                   2.5                   0.9
because in feature space an argument can be made for this number of classes.

B. Results

We will look at the results from the Iris data set first. When we tried to cluster into three classes, a partition with 50 examples from class 1 and 100 examples from class 2/class 3 was found 10 of 30 times. In the remaining trials, a cluster with one example was found four times, and in the other experiments the cluster containing class 1 had a few examples from another class. So, the results seem to clearly indicate that there are two classes. However, we wanted a repeatable method that could objectively determine how many classes existed. We used a threshold on the number of examples in a cluster. The FCM functional has a bias towards producing approximately equal size clusters; it is not the right functional to use for widely different sized clusters. Hence, we used a threshold which was a percentage of the number of examples each cluster would contain if all clusters were the same size. If a cluster had fewer examples than the threshold, it indicated that there was no cluster there and the cluster should be merged with another. We did not, in these experiments, try to merge the clusters. The equation is

T = (n / c) * P,    (5)

where n is the number of examples, c is the number of clusters
searched for and P is the percentage. Any percentage of 2 or greater will lead to the conclusion that there are only 2 clusters in the Iris data when we search for 3. Results are summarized for different c in Table II.

Next, we searched for four clusters in the Iris data. A partition with 50 examples from class 1 and the other two classes perfectly mixed occurred three times. There was always one empty cluster, and the largest cluster size was 9 in the case where three clusters were found. So, any threshold above 30 percent will lead to the conclusion that there are only two clusters. With five clusters there were typically two or three empty clusters, and the "perfect" partition into two clusters occurred twice. If a percentage of 90 or above is used, the conclusion will be that two clusters exist. This search space is significantly larger and no more epochs were utilized, so we feel the result is a strong one. We also tried six clusters, where there were typically two or three empty clusters. In this case, with a percentage of 90 or above, the average number of classes found was 2.5. There were a number of cases in which the linearly separable class was discovered as one cluster and the other two classes were split into two (e.g., 67/33 or 74/26). Again, in this large search space this seems to be a very reasonable result. One would probably not guess double the number of actual classes. In order to evaluate whether a more complete search might result in the discovery of 2 clusters more often when we initially searched for 6, we changed the number of epochs to 4000 and the number of iterations per epoch to 25. This causes the ants to move less during each epoch and have more opportunities (epochs) to find good partitions. With these parameters and a percentage of 90, just 2 clusters were found in all thirty trials. The examples in the linearly separable class were assigned, by themselves, to one cluster nine times.
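The thresholding procedure of (5) is easy to automate. The sketch below is our own helper (not from the paper), using a crisp nearest-centroid assignment to count cluster sizes; it returns the number of non-spurious clusters for a given P:

```python
import numpy as np

def clusters_found(X, centers, P):
    # Eq. (5): T = (n / c) * P, the minimum size for a "real" cluster
    # if all c clusters were expected to hold equal shares of n examples.
    n, c = X.shape[0], centers.shape[0]
    T = (n / c) * P
    # assign each example to its nearest centroid (crisp assignment)
    D = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    sizes = np.bincount(D.argmin(axis=0), minlength=c)
    # clusters below the threshold are considered empty or spurious
    return int((sizes >= T).sum())
```

For the Iris data with n = 150 and c = 3, T = 50P, so P = 0.2 discards any cluster holding fewer than 10 examples.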
Finally, we report the results when searching for only 2 clusters. In this case, two clusters were always found (for P < 0.65). In 14 of 30 trials a partition with the linearly separable class in one cluster and the other two classes mixed in the second was found. In the other experiments a few extra examples were assigned to the cluster with the linearly separable class, making its size between 51 and 54. So, a very reasonable partition was obtained when searching for two classes.

For the artificial data we did experiments with 5, 6, 7, 8 and 9 clusters. Results are summarized for different c in Table III. The ant based clustering always found five clusters when it was given five to search for. In fact, it found the exact original partition 15 times. When it was incorrect, it had some small
TABLE III
NUMBER OF CLUSTERS SEARCHED FOR AND AVERAGE NUMBER FOUND FOR THE ARTIFICIAL DATA WITH THE MINIMUM P OVER 30 TRIALS.

Clusters searched   Ave. clusters found   P
6                   5                     0.3
7                   5.033                 0.3
8                   5                     0.75
9                   5                     0.8
confusion between class two and class five. A typical partition that did not match the original was (248, 133, 217, 192, 210), in which one example had switched between class 2 and class 5. This seems to be a pretty reasonable clustering result given the larger search space of the ants. When it searched for six classes, it always found five for a percentage of 30 or greater. The sixth cluster typically had between zero and two examples assigned to it. When searching for seven classes, it found five classes for a percentage of 30 or greater 29 times. One time it found six classes; in that case there was an empty cluster and class 4 was split into two clusters. For eight classes, exactly five were found for a percentage of 0.75. Making the percentage larger would occasionally cause 4 to be found when cluster 5 was split exactly into 2 chunks. For nine classes, five classes were always found for a percentage of 80 up to about 90. There might be two or three empty clusters. The other non-clusters were very lightly populated, with fewer than 15 examples closest to their centroids in the usual case. If the percentage got too high, a class split into two would occasionally be missed, resulting in four clusters. For example, with P = 1, T = 111.11 and class 4 is split into two clusters with 107 and 86 examples, respectively.

V. SUMMARY AND DISCUSSION
A swarm based approach to clustering was used to optimize a fuzzy partition validity metric. A group of ants was assigned as a team to produce a partition of the data by positioning cluster centroids. Each ant was assigned to a particular feature of a particular cluster in a particular partition. The assignment was fixed. The ants utilized memory to keep track of the best locations they had visited. Thirty partitions were simultaneously explored. It was found that an overestimate of the number of clusters that exist in the data would result in a best partition with "the optimal" number of clusters. An overestimate of the number of clusters allows the ant based algorithm the freedom to make groups of two or more clusters have approximately the same centroid, thereby reducing the total number of clusters in a partition. The ability to choose a smaller set of clusters than initially hypothesized allows for a better optimized value of the partition validity function. After minimal post-processing to remove spurious clusters, the "natural" substructure of the data, in terms of clusters, was discovered. The Xie-Beni fuzzy clustering validity metric was used to evaluate the goodness of each partition. It was based on
the fuzzy c-means clustering algorithm. A minor modification was made so that a membership matrix did not need to be computed. A threshold was applied to cluster size to eliminate very small clusters, which would not be discovered utilizing the FCM functional with its strong bias towards approximately equal size clusters. Small clusters here mean clusters of 1 to 20 elements, or less than 40% of the expected size of a class (given that we knew the approximate class size). Two data sets, the Iris data and a five cluster artificial data set, were used to evaluate the approach. For both data sets, the number of clusters in the feature space describing the data was discovered even when guessing more than twice as many clusters as existed in the original data set.

There is an open question of how to set the threshold which indicates that a cluster is spurious (too small to be real). There is also the question of what to do with spurious clusters. They could certainly be merged into the closest non-spurious cluster. Alternatively, if the threshold is too high, a cluster that is split into two or more chunks could be left undiscovered, as all of its sub-clusters could be deemed spurious.

The search can be parallelized to make it significantly faster; each ant can certainly move independently. The final partitions produced by the swarm based clustering algorithm typically matched or were quite close to what would be obtained from FCM with the same number of cluster centers, and matched the actual data quite well. Hence, it is a promising approach for finding a partition with the number of clusters actually resident in the data, as long as some heuristic overestimate of the cluster number can be made.

ACKNOWLEDGEMENTS

This research was partially supported by The National Institutes of Health via a bioengineering research partnership under grant number 1 R01 EB00822-01.
REFERENCES
[1] X. Xie and G. Beni, "A validity measure for fuzzy clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.
[2] N. Pal and J. Bezdek, "On cluster validity for the fuzzy c-means model," IEEE Transactions on Fuzzy Systems, vol. 3, no. 3, pp. 370–379, 1995.
[3] S. Ouadfel and M. Batouche, "Unsupervised image segmentation using a colony of cooperating ants," in Biologically Motivated Computer Vision, Second International Workshop, BMCV 2002, Lecture Notes in Computer Science, vol. 2525, 2002, pp. 109–116.
[4] N. Labroche, N. Monmarché, and G. Venturini, "A new clustering algorithm based on the chemical recognition system of ants," in Proceedings of the European Conference on Artificial Intelligence, 2002, pp. 345–349.
[5] N. Monmarché, M. Slimane, and G. Venturini, "On improving clustering in numerical databases with artificial ants," in 5th European Conference on Artificial Life (ECAL'99), Lecture Notes in Artificial Intelligence, D. Floreano, J. Nicoud, and F. Mondada, Eds., vol. 1674. Lausanne, Switzerland: Springer-Verlag, Sep 1999, pp. 626–635.
[6] P. M. Kanade and L. O. Hall, "Fuzzy ants clustering with centroids," in FUZZ-IEEE'04, 2004.
[7] J. Handl, J. Knowles, and M. Dorigo, "On the performance of ant-based clustering," in Design and Application of Hybrid Intelligent Systems, Frontiers in Artificial Intelligence and Applications 104, 2003, pp. 204–213.
[8] J. Handl, J. Knowles, and M. Dorigo, "Strategies for the increased robustness of ant-based clustering," in Self-Organising Applications: Issues, Challenges and Trends, Lecture Notes in Computer Science, vol. 2977, 2003, pp. 90–104.
[9] J. Bezdek, J. Keller, R. Krishnapuram, and N. Pal, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing. Boston, MA: Kluwer, 1999.
[10] C. Blake and C. Merz, "UCI repository of machine learning databases," 1998. [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html
[11] R. J. Hathaway and J. C. Bezdek, "Optimization of clustering criteria by reformulation," IEEE Transactions on Fuzzy Systems, vol. 3, no. 2, pp. 241–245, May 1995.
[12] P. Kanade, "Fuzzy ants as a clustering concept," Master's thesis, University of South Florida, Tampa, FL, 2004.