A Competitive Elliptical Clustering Algorithm

S. De Backer, P. Scheunders
Vision Lab, Department of Physics, University of Antwerp, Groenenborgerlaan 171, 2020 Antwerpen, Belgium

Pattern Recognition Letters, Vol. 20, Nr. 11-13, pp. 1141-1147 (1999)

Abstract

This paper introduces a new learning algorithm for on-line ellipsoidal clustering. The algorithm is based on the competitive clustering scheme, extended by two specific features. Elliptical clustering is accomplished by efficiently incorporating the Mahalanobis distance measure into the learning rules, and underutilization of smaller clusters is avoided by incorporating a frequency-sensitive term. Experiments are conducted to demonstrate the usefulness of the algorithm on artificial datasets as well as on the problem of texture segmentation.

Key words: pattern classification, image segmentation and grouping


1 Introduction

When a d-dimensional data space is to be partitioned, a model of the data distribution is needed. In general, a mixture of normal distributions is assumed. The Maximum Likelihood approach to fitting the model to the data leads to the well-known Expectation-Maximization (EM) technique (Dempster 1977).

When applied as a clustering algorithm, however, it leads to several practical problems. First of all, since a large number of parameters must be estimated, a large number of data points needs to be available; as a consequence, the algorithm is very time-consuming. Another problem is that the covariance matrices can become singular or near-singular, leading to numerical problems when inverting them. To our knowledge, not many studies have investigated the problem of singularities. Some have tried to solve it by appropriate initialization (Jolion 1991). Recently, a renormalization approach was studied, in which the singularities were removed by combining elliptical and spherical distance measures (Mao 1996).

In this paper, a different approach is proposed, making use of Competitive Learning (Kohonen 1995). A direct update rule for the inverse of the covariance matrices is introduced. Initially the algorithm starts with spherical clusters, so that the problem of singularities is automatically solved. A frequency-sensitive modification takes care of the underutilization of the clusters (Scheunders 1999). The competitive learning approach will be shown to be much less time-consuming than the EM algorithm. Moreover, it has the advantage of being adaptive to changing streams of data.

2 Frequency-Sensitive Elliptical Competitive Learning

2.1 The EM algorithm

Assume a distribution of $C$ multivariate $d$-dimensional Gaussians, characterized by their means $\vec{\mu}_k$ and covariance matrices $\Sigma_k$. The log-likelihood function to be maximized is then given by:

l(\vec{\mu}_k, \Sigma_k, \pi_k \mid \vec{y}) = \sum_{k=1}^{C} \sum_{\vec{y}_i \in C_k} \left( \log \pi_k - \frac{d}{2}\log 2\pi - \frac{1}{2}\log|\Sigma_k| - \frac{1}{2}(\vec{y}_i - \vec{\mu}_k)^T \Sigma_k^{-1} (\vec{y}_i - \vec{\mu}_k) \right)    (1)

with $C_k$ the set of points belonging to class $k$. In the following we will assume equal priors $\pi_k$, so that the last two terms are the only relevant ones. A practical way to cluster the distribution is given by the Expectation-Maximization (EM) algorithm, an iterative procedure where the E-step classifies the data points according to:

\vec{y}_i \in C_k \iff k = \arg\min_j \left[ \log|\Sigma_j| + (\vec{y}_i - \vec{\mu}_j)^T \Sigma_j^{-1} (\vec{y}_i - \vec{\mu}_j) \right]    (2)

and the M-step updates the parameters according to:

\vec{\mu}_k = \frac{1}{N_k} \sum_{\vec{y}_i \in C_k} \vec{y}_i    (3)

\Sigma_k = \frac{1}{N_k} \sum_{\vec{y}_i \in C_k} (\vec{y}_i - \vec{\mu}_k)(\vec{y}_i - \vec{\mu}_k)^T    (4)

where $N_k$ is the total number of data points that belong to cluster $k$. The EM algorithm is summarized as follows:

Algorithm 1 The EM algorithm

initialize $\vec{\mu}_k$ randomly
initialize $\Sigma_k$ to be unit matrices
set $t = 1$
while $\exists\, n : C_n(t) \neq C_n(t-1)$ do
  for all $\vec{y}_i$ do
    find class $C_k$ using (2) (E-step)
  end for
  for all $k$ do
    update $\vec{\mu}_k$ and $\Sigma_k$ using (3) and (4) (M-step)
  end for
  $t = t + 1$
end while

A well-known and often-used simplification of this technique is to assume spherically shaped clusters, which reduces the covariance matrices to unit matrices and leads to the well-known k-means algorithm.
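The iteration can be made concrete with a minimal NumPy sketch of Algorithm 1 in its equal-prior, hard-assignment form; the function name cem and all variable names are illustrative, not taken from the paper:

import numpy as np

def cem(Y, C, max_iter=100, seed=0):
    """Batch EM with hard assignments (Algorithm 1). Y has shape (N, d); C is the number of clusters."""
    rng = np.random.default_rng(seed)
    N, d = Y.shape
    means = Y[rng.choice(N, C, replace=False)].astype(float)  # initialize mu_k randomly from the data
    covs = np.stack([np.eye(d)] * C)                          # initialize Sigma_k as unit matrices
    prev = None
    for _ in range(max_iter):
        # E-step: classify each point with eq. (2): log|Sigma_j| + squared Mahalanobis distance
        scores = np.empty((N, C))
        for j in range(C):
            diff = Y - means[j]
            inv = np.linalg.inv(covs[j])
            scores[:, j] = np.linalg.slogdet(covs[j])[1] + np.einsum('ni,ij,nj->n', diff, inv, diff)
        labels = scores.argmin(axis=1)
        if prev is not None and np.array_equal(labels, prev):
            break                                             # stop when no class assignment changes
        prev = labels
        # M-step: update mu_k and Sigma_k with eqs. (3) and (4)
        for k in range(C):
            Yk = Y[labels == k]
            if len(Yk) <= d:
                continue                                      # skip (near-)empty clusters to avoid a singular Sigma_k
            means[k] = Yk.mean(axis=0)
            covs[k] = np.cov(Yk, rowvar=False, bias=True)
    return means, covs, labels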

2.2 Competitive Learning

The EM algorithm is a batch-mode algorithm, i.e. the classification and updating steps are performed globally, using the complete data set. Another method, often applied using neural networks, uses on-line or sequential training: the parameters are updated each time a data point is presented. The on-line version of the k-means algorithm is also known as Competitive Learning (Kohonen 1995). The mean $\vec{\mu}_k$ is then updated by

\vec{\mu}_k(t+1) = \vec{\mu}_k(t) + \alpha(t)\,(\vec{y}_i - \vec{\mu}_k(t))    (5)

where $\alpha$ is a learning parameter, decreasing as a function of $t$. There are several reasons to prefer this sequential approach over the batch one. One iteration step of k-means takes a computation time proportional to the total number of data points. Applying competitive learning during one "epoch" (i.e. as many times as there are data points) takes about the same time. Due to the smaller steps that are taken during adaptation, competitive learning quickly reaches a nearby optimum, while k-means has more possibilities to get stuck in well-separated local optima (Scheunders 1997). Moreover, when a stream of continuously changing data is involved, competitive learning is useful as an adaptive approach.
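As a concrete illustration, one presentation step of this spherical competitive learning can be sketched as follows; a simple 1/t learning-rate schedule is assumed here, since the schedule is not specified above, and the names are illustrative:

import numpy as np

def cl_step(y, means, t, alpha0=0.5):
    """One on-line presentation: move the winning (closest) mean towards the data point y, eq. (5)."""
    alpha = alpha0 / t                                  # decreasing learning parameter (assumed schedule)
    k = int(np.argmin(((means - y) ** 2).sum(axis=1)))  # winner by Euclidean distance (spherical clusters)
    means[k] += alpha * (y - means[k])                  # eq. (5): mu_k(t+1) = mu_k(t) + alpha (y - mu_k(t))
    return k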

2.3 Frequency-Sensitive Competitive Learning

When using competitive learning, instabilities can appear during the first few iterations due to favoring larger clusters over the others (an effect that obviously does not appear in the case of k-means, since all data points are updated simultaneously). This effect, also known in the field of vector quantization, can be removed by using a frequency-sensitive approach (Ahalt 1990). Here, the distance measure is multiplied by $n_k(t)$, the number of times that cluster $k$ has been modified up to the $t$-th iteration. Recently, this technique has been demonstrated to be very efficient for clustering in cases where differently sized clusters appear (Scheunders 1999). The technique will be referred to as Frequency-Sensitive Competitive Learning.
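A minimal sketch of this frequency-sensitive winner selection (spherical case, with the counters assumed to start at one; names are illustrative):

import numpy as np

def fscl_winner(y, means, counts):
    """Select the winner with the distance multiplied by n_k, then increment its counter."""
    d2 = ((means - y) ** 2).sum(axis=1)   # plain squared Euclidean distances
    k = int(np.argmin(counts * d2))       # frequency-sensitive distance: n_k * d^2
    counts[k] += 1                        # n_k(t+1) = n_k(t) + 1 for the modified cluster
    return k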

2.4 Ellipsoidal Clustering

The k-means and competitive learning schemes are derived for spherical classes. In the more general case, the covariance matrices also need to be updated. Therefore, we present the following updating rule:

\Sigma_k(t+1) = \Sigma_k(t) + \alpha\left((\vec{y}_i - \vec{\mu}_k)(\vec{y}_i - \vec{\mu}_k)^T - \Sigma_k(t)\right)
             = (1-\alpha)\,\Sigma_k(t) + \alpha\,(\vec{y}_i - \vec{\mu}_k)(\vec{y}_i - \vec{\mu}_k)^T    (6)

In the same way as in (5), where the position of the cluster moves towards the data point presented, the covariance "moves" towards the covariance represented by that data point. This rule, however, is impractical, since using this information for the classification step (equation (2)) would require an inversion of $\Sigma_k$ after each presentation of a data point (the first term requires the determinant of $\Sigma_k$, but this is just the inverse of the determinant of $\Sigma_k^{-1}$). Therefore we need to find an updating rule for $\Sigma_k^{-1}$ directly. Writing the difference vector $(\vec{y}_i - \vec{\mu}_k)$ as $\vec{m}$ and putting $k = \alpha/(1-\alpha)$, we obtain:

\Sigma_k^{-1}(t+1) = \left[(1-\alpha)\,\Sigma_k(t) + \alpha\,\vec{m}\vec{m}^T\right]^{-1}
                   = \left[I + k\,\Sigma_k^{-1}(t)\,\vec{m}\vec{m}^T\right]^{-1} \frac{\Sigma_k^{-1}(t)}{1-\alpha}
                   = \left[I - k\,\Sigma_k^{-1}(t)\,\vec{m}\vec{m}^T + k^2\,\bigl(\Sigma_k^{-1}(t)\,\vec{m}\vec{m}^T\bigr)^2 - k^3 \ldots\right] \frac{\Sigma_k^{-1}(t)}{1-\alpha}
                   = \frac{\Sigma_k^{-1}(t)}{1-\alpha} - \frac{k}{1-\alpha}\,\Sigma_k^{-1}(t)\,\vec{m}\vec{m}^T\,\Sigma_k^{-1}(t)\,\bigl(1 - k\lambda + k^2\lambda^2 - \ldots\bigr), \qquad \lambda = \vec{m}^T\Sigma_k^{-1}(t)\,\vec{m}
                   = \frac{\Sigma_k^{-1}(t)}{1-\alpha} - \frac{k}{1-\alpha}\,\frac{\bigl(\Sigma_k^{-1}(t)\vec{m}\bigr)\bigl(\Sigma_k^{-1}(t)\vec{m}\bigr)^T}{1 + k\lambda}    (7)
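Update rule (7) is a Sherman-Morrison style rank-one update of the inverse; a quick numerical sketch (NumPy, arbitrary dimension and learning rate, illustrative names) can confirm that it agrees with directly inverting the batch-style update (6):

import numpy as np

rng = np.random.default_rng(1)
d, alpha = 3, 0.05
A = rng.normal(size=(d, d))
Sigma = A @ A.T + np.eye(d)                      # an arbitrary positive-definite covariance Sigma_k(t)
Sigma_inv = np.linalg.inv(Sigma)
m = rng.normal(size=d)                           # difference vector m = y_i - mu_k

# direct route: apply eq. (6) to Sigma, then invert
direct = np.linalg.inv((1 - alpha) * Sigma + alpha * np.outer(m, m))

# update rule (7): work on the inverse only, no matrix inversion needed
k = alpha / (1 - alpha)
v = Sigma_inv @ m
lam = m @ v
updated = Sigma_inv / (1 - alpha) - (k / (1 - alpha)) * np.outer(v, v) / (1 + k * lam)

print(np.allclose(direct, updated))              # True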

In this way, the inverse of the covariance matrices can be updated directly. The competitive learning algorithm, including the frequency-sensitive term and the elliptical clustering modification, will be referred to as Frequency-Sensitive Elliptical Competitive Learning, and is summarized as follows:

Algorithm 2 Frequency-Sensitive Elliptical Competitive Learning
for all $k$ do
  initialize $\vec{\mu}_k$ randomly
  initialize $\Sigma_k$ to be unit matrices
end for
set $t = 1$
for all $\vec{y}_i$ do
  find class $C_{k^*}$ using (2)
  update $\vec{\mu}_{k^*}$ using (5)
  update $\Sigma_{k^*}^{-1}$ using (7)
  $t = t + 1$
  $n_{k^*} = n_{k^*} + 1$
end for
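A complete NumPy sketch of Algorithm 2 follows. It assumes a decreasing learning-rate schedule and applies the frequency-sensitive factor n_k to the full elliptical distance of (2); neither detail is fixed above, so both are implementation choices, and all names are illustrative:

import numpy as np

def fsecl(Y, C, epochs=1, alpha0=0.1, seed=0):
    """Frequency-Sensitive Elliptical Competitive Learning (Algorithm 2), minimal sketch."""
    rng = np.random.default_rng(seed)
    N, d = Y.shape
    means = Y[rng.choice(N, C, replace=False)].astype(float)   # initialize mu_k randomly
    invcovs = np.stack([np.eye(d)] * C)                        # Sigma_k = I, hence Sigma_k^{-1} = I
    counts = np.ones(C)                                        # frequency-sensitive counters n_k
    t = 1
    for _ in range(epochs):
        for y in Y[rng.permutation(N)]:                        # present the data points sequentially
            # winner search: eq. (2) weighted by n_k; log|Sigma_j| = -log|Sigma_j^{-1}|
            scores = np.empty(C)
            for j in range(C):
                diff = y - means[j]
                scores[j] = counts[j] * (-np.linalg.slogdet(invcovs[j])[1] + diff @ invcovs[j] @ diff)
            w = int(np.argmin(scores))
            alpha = alpha0 / (1.0 + 0.001 * t)                 # assumed decreasing schedule
            m = y - means[w]                                   # difference vector (taken before the mean update)
            means[w] += alpha * m                              # eq. (5)
            k = alpha / (1 - alpha)                            # rank-one update of the inverse, eq. (7)
            v = invcovs[w] @ m
            lam = m @ v
            invcovs[w] = invcovs[w] / (1 - alpha) - (k / (1 - alpha)) * np.outer(v, v) / (1 + k * lam)
            counts[w] += 1                                     # n_k update
            t += 1
    return means, invcovs

A final hard classification of the data can then reuse (2) with the returned parameters.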

3 Experiments and discussion

In this experimental section, the proposed algorithm is demonstrated and compared to the batch-mode techniques. Four algorithms are compared: a) the k-means algorithm (CM); b) the EM algorithm (ECM); c) Competitive Learning with the assumption of spherical clusters (CL); and d) the proposed technique (ECL). Two experiments are conducted: one for didactic purposes on an example data set, and one on the problem of texture segmentation.

3.1 Example dataset

As an example, assume 2 Gaussian clusters in $d = 2$, with parameters $\Sigma_1(x,x) = 25$, $\Sigma_1(y,y) = 1$, $\Sigma_2(x,x) = \Sigma_2(y,y) = 1$, and all off-diagonal elements zero (see Figure 1). The mean positions of the clusters have the same x-component, while the y-component is variable and denotes the inter-class distance. This distribution is generated using 1000 sample points for each cluster. The experiment is repeated 100 times, where each time a new set of data points is generated.
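The example data can be generated along these lines (the common x-component of the means, here zero, is an arbitrary choice; names are illustrative):

import numpy as np

rng = np.random.default_rng(0)
n, delta = 1000, 3.0                                   # 1000 samples per cluster, inter-class distance 3
cov1 = np.diag([25.0, 1.0])                            # Sigma_1(x,x) = 25, Sigma_1(y,y) = 1
cov2 = np.eye(2)                                       # Sigma_2(x,x) = Sigma_2(y,y) = 1
y1 = rng.multivariate_normal([0.0, 0.0], cov1, n)      # cluster 1
y2 = rng.multivariate_normal([0.0, delta], cov2, n)    # cluster 2, shifted along y by the inter-class distance
Y = np.vstack([y1, y2])
labels_true = np.repeat([0, 1], n)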

All the methods work well when the inter-class distance is fairly large. When clusters tend to overlap, the spherical description starts to fail. Both ECM and ECL work well in characterizing the two classes, even when they start to overlap. In Figure 1, the inter-class distance is chosen to be 3. The classification performance is about 70 ±1% for both CM and CL, while ECM and ECL attain 93 ±1%. While about 5-10 iterations are needed for ECM, 1 epoch is sufficient for ECL to obtain a reasonably good result.

3.2 Texture Segmentation

Clustering is regularly applied in image processing tasks, where it can be used for segmentation purposes. Texture segmentation is generally believed to be a difficult task. Here, for every pixel of the image, typically a high-dimensional feature vector is calculated. Clustering of the feature space leads to the segmentation results. Generally, the feature space is reduced first. The cluster shapes are generally very irregular. In the experiments, mosaics of five textures, chosen from the Brodatz set (Brodatz 1966), were built. See Figure 2 for an example, with, from left to right and from top to bottom, Brodatz textures D29, D28, D21 and D2, and D24 in the middle. Features were calculated using non-separable Gabor filters, implemented in the Fourier space. From the filtered images, energies were calculated in 8 different orientations, equally distributed between -90 and 90 degrees, and for 8 different frequencies, equally distributed on an octave scale, leading to 64 features per pixel. This high-dimensional space was reduced using principal component analysis (PCA) to dimension d, after which clustering was applied, using C = 5.
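Assuming the 64 Gabor energy features have already been collected into an array features of shape (number of pixels, 64), the reduction and clustering stage could be sketched as follows (the Gabor filtering itself is omitted; fsecl refers to the sketch given after Algorithm 2, and all names are illustrative):

import numpy as np

def reduce_and_cluster(features, d=3, C=5):
    """PCA to d dimensions followed by clustering of the per-pixel feature vectors."""
    X = features - features.mean(axis=0)                # center the 64-dimensional features
    _, _, Vt = np.linalg.svd(X, full_matrices=False)    # principal directions via SVD
    Z = X @ Vt[:d].T                                     # project onto the first d principal components
    means, invcovs = fsecl(Z, C)                         # cluster with C = 5 (ECL sketch above)
    # final pixel labels by eq. (2)
    scores = np.stack([-np.linalg.slogdet(invcovs[k])[1]
                       + np.einsum('ni,ij,nj->n', Z - means[k], invcovs[k], Z - means[k])
                       for k in range(C)], axis=1)
    return scores.argmin(axis=1)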

In Figures 3 and 4, segmentation results are shown for d = 3. In Figure 3, the resulting segmented images are shown using CM and ECM, respectively. In Figure 4, the results of CL and ECL are shown. In Figure 5, segmentation results are plotted as a function of d. Since all algorithms depend on the initial configuration of the parameters, and the competitive approaches also depend on the order in which data points are presented, results differ from one run to another. Therefore, we repeated all experiments over 100 runs, and averages and error bars (1 s.d.) are given. From these results the following can be concluded:

• The results for CL are comparable to CM. When looking at the segmented images, textures D28 and D2 were not distinguishable, and texture D21 was characterized by 2 different clusters.

• A remarkable result was obtained when going from spherical to ellipsoidal clustering. In the case of ECM, the segmentation performance did not improve. Apparently, the task of estimating large numbers of parameters did not succeed, probably due to the noisiness of the calculated features.

• When applying ECL, an improvement of about 10% over CM and CL can be observed. In the images, one can notice that the two clusters characterizing texture D21 were now utilized to characterize the two different textures D28 and D2. This is the only result where all 5 textures were distinguishable.

Remark that applying only the frequency-sensitive term, or only the elliptical modification, did not improve results over CL, but applying both succeeded. Besides the gain in performance, the competitive approach also entails a gain in calculation speed. CM and ECM update all data points simultaneously during one iteration, and several iteration steps were needed to obtain useful results.

The competitive approaches update the parameters one data point at a time. Typically, about one "epoch" (i.e. all data points presented once) sufficed to obtain useful results. In Table 1, typical processing times are shown.

References

Ahalt, S. C., Krishnamurthy, A. K., Chen, P., Melton, D. E., 1990. Competitive learning algorithms for vector quantization. Neural Networks 3, 277-290.
Brodatz, P., 1966. Textures: A Photographic Album for Artists and Designers. Dover Publications, New York.
Dempster, A. P., Laird, N. M., Rubin, D. B., 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B 39, 1-38.
Jolion, J.-M., Meer, P., Bataouche, S., 1991. Robust clustering with applications in computer vision. IEEE Transactions on Pattern Analysis and Machine Intelligence 13, 791-802.
Kohonen, T., 1995. Self-Organizing Maps. Springer-Verlag.
Mao, J., Jain, A. K., 1996. A self-organizing network for hyperellipsoidal clustering (HEC). IEEE Transactions on Neural Networks, 16-29.
Scheunders, P., 1997. A comparison of clustering algorithms applied to color image quantization. Pattern Recognition Letters 18, 1379-1384.
Scheunders, P., De Backer, S., 1999. High-dimensional clustering using frequency sensitive competitive learning. Pattern Recognition 32, 193-202.


List of Figures

1  The classification of 2 clusters in d = 2
2  The composition of textures
3  Segmentation result using the batch-mode algorithms
4  Segmentation result using competitive learning
5  Plot of segmentation performance as a function of d for the different algorithms

List of Tables

1  Average processing times for the different algorithms in the texture experiments

Fig. 1. The classification of 2 clusters in d = 2: (a) ECM, (b) ECL.

Fig. 2. The composition of textures.

Method   # iterations   Time (s)
CM                 24        1.5
ECM                50        23
CL                  1        0.1
ECL                 1        0.3

Table 1. Average processing times for the different algorithms in the texture experiments

Fig. 3. Segmentation result using the batch-mode algorithms: (a) CM, (b) ECM.

Fig. 4. Segmentation result using competitive learning: (a) CL, (b) ECL.

Fig. 5. Plot of segmentation performance (classification result, %) as a function of the dimension d for the different algorithms (CL, ECL, CM, ECM).