2012 9th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2012)

Contextual Hausdorff Dissimilarity for Multi-instance Clustering

Ying Chen
Department of Basic Sciences, Beijing Electronic Science and Technology Institute, Beijing, P.R. China
[email protected]

Ou Wu
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, P.R. China
[email protected]

Abstract—The multi-instance clustering problem is emerging in a variety of applications. A straightforward solution is to adapt classical single-instance clustering algorithms such as k-medoids to this setting. In this adaptation, the essential step is the dissimilarity measurement between multi-instance bags, and traditional distances fail to capture the differences between bags. This paper proposes a new type of bag dissimilarity, namely contextual Hausdorff dissimilarity (CHD), and then introduces a multi-instance clustering algorithm based on it. Experimental results on both synthetic and real-world data sets show that the proposed CHD outperforms the traditional Hausdorff dissimilarity.

Keywords: Instance, Clustering, Hausdorff dissimilarity.

I. INTRODUCTION

Different from classical single-instance clustering (SIC), a growing number of applications need to cluster data consisting of sets of instances, i.e., multi-instance bags. This new clustering task is called multi-instance clustering (MIC). Its emphasis is on bag clusters, which are useful and worthy of study. For example, given a database storing customers' shopping records in a supermarket, each customer can be viewed as a bag consisting of a set of records. If customers are clustered into groups according to their records, the supermarket can provide personalized services to different groups. Fig. 1 illustrates the difference between SIC and MIC: the symbols represent instances, and the ellipses with the symbols they contain represent bags. SIC exploits the patterns of instances, while MIC aims at the patterns of bags.

[1] introduced an expectation-maximization approach that clusters bags by optimizing a likelihood function describing the distribution of bags in each cluster. [2] modified the k-medoids method for MIC by replacing the distance between instances with the distance between bags. Notice that the dissimilarity measure is the essential problem of clustering techniques. In SIC, a number of studies utilize contextual information to obtain better dissimilarity measures between instances. Two typical methods are given in [3] and [4]. [3] studied clustering data with noise; their intuition is that the dissimilarity between two data points is determined both by the points themselves and by their neighbors, and their experimental results demonstrate the superiority of the new dissimilarity definition. [4] proposed a contextual dissimilarity for image retrieval, in which the dissimilarity between two points is also determined by the context of the points. Different from [3], the contextual information used in [4] is not only the neighborhood of each point but also the global information of the data set. The experiments show that their dissimilarity measure can greatly improve the image clustering accuracy.

However, existing research on MIC does not pay much attention to the dissimilarity between bags. Inspired by the studies of contextual information in SIC, we focus on leveraging contextual information in bag dissimilarity measures. In this paper, a new type of bag dissimilarity called contextual Hausdorff dissimilarity (CHD) is defined, and experimental results show its effectiveness.

In the following section, we briefly review the traditional Hausdorff distance. In Section 3, we analyze the contextual information used in dissimilarity measurement in SIC and propose the contextual Hausdorff dissimilarity; a new multi-instance clustering algorithm is then introduced. Section 4 describes our experimental results with some discussion. Section 5 concludes the paper.

Fig. 1. Instance patterns and bag patterns.

II. HAUSDORFF DISSIMILARITY

The Hausdorff distance is a widely used distance between bags. Given two finite instance bags X = {x_1, ..., x_p} and Y = {y_1, ..., y_q}, the Hausdorff distance (HD) is defined as

H(X, Y) = max(h(X, Y), h(Y, X)),  (1)

h(X, Y) = max_{x∈X} min_{y∈Y} ||x − y||,  (2)

where ||x − y|| is the Euclidean distance between x and y. The Hausdorff distance has been applied in various applications.


However, it has been shown to be sensitive to noise. To alleviate this drawback, a number of modified Hausdorff distances have been proposed; [5] proposed the minimum Hausdorff distance (minHD), which achieves satisfactory results in supervised multi-instance learning. Nevertheless, we find that HD and minHD are inappropriate for measuring dissimilarity in MIC, especially when bags overlap. For example, in Fig. 2 there are three colored bags. Let D(R, B) represent the dissimilarity between the red bag and the blue bag, and D(P, B) the dissimilarity between the purple bag and the blue bag. Using HD, D(R, B) is smaller than D(P, B); using minHD, they are equal. Both results contradict our observation that D(R, B) is larger. This contradiction usually happens when bags overlap, and the underlying reason is mainly that HD and minHD do not deal well with the distribution of bags and instances. There are two main ways to address this problem: one is to transform the original data into another space and find a more suitable dissimilarity measurement there; the other is to modify the dissimilarity itself. Since modifying the dissimilarity using contextual information has been successfully applied in SIC, we adopt the second approach.
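For concreteness, the following is a minimal NumPy sketch of the Hausdorff distance in Eqs. (1)-(2) and of minHD, taken here as the smallest distance over all instance pairs; the function names are ours:

```python
import numpy as np

def pairwise_distances(X, Y):
    # Euclidean distances between every instance of bag X and every instance of bag Y.
    return np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)

def directed_hd(X, Y):
    # h(X, Y) of Eq. (2): each x is matched to its nearest y, then the worst match is taken.
    return pairwise_distances(X, Y).min(axis=1).max()

def hd(X, Y):
    # H(X, Y) of Eq. (1): symmetric Hausdorff distance.
    return max(directed_hd(X, Y), directed_hd(Y, X))

def min_hd(X, Y):
    # minHD: the smallest distance between any instance of X and any instance of Y.
    return pairwise_distances(X, Y).min()

# Two toy bags whose instances are 2-D points (one per row).
X = np.array([[0.0, 0.0], [1.0, 0.0]])
Y = np.array([[0.5, 0.0], [5.0, 0.0]])
print(hd(X, Y), min_hd(X, Y))   # 4.0 and 0.5
```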

Fig. 2. Three colored bags. The three circles have the same radii. Both the red bag and the blue bag contain five instances, while the purple bag contains four.

III. CONTEXTUAL HAUSDORFF DISSIMILARITY BASED MULTI-INSTANCE CLUSTERING

This section defines the contextual Hausdorff dissimilarity and then proposes a new multi-instance clustering algorithm. In this paper, a capital letter denotes a bag and a small letter denotes an instance; D represents a dissimilarity and d a distance.

A. Contextual Information

[6] designed an experiment testing the influence of expectation on perception. It demonstrated that the same physical stimulus can be perceived differently in different contexts. This conclusion is not difficult to understand: the exact meaning of a word, for example, usually depends on its context. Contextual information includes not only local information but, more importantly, the global information of the data set. Fig. 3 gives an illustrative example. What does the symbol in row 1, column 3 represent? We perceive '13' if we rely only on the nearest symbols '12' and '14', but 'B' if we understand the global distribution of the whole data.

Fig. 3. Recognition using global contextual information.

Hence, it is necessary to embed the global information of the data into the dissimilarity measurement. The global information here refers to the distribution or structure of a data set. For each point, say x, θ_x is used to represent its global parameter, i.e., the cluster it belongs to. Let y be another instance. Intuitively, the contextual dissimilarity is a function of the data and their global parameters:

D_c(x, y) = f(x, y, θ_x, θ_y).  (3)

To make Eq. (3) computable, we take the global parameter as an additional dimension of the data. Hence, Eq. (3) can be calculated using a mathematical distance:

D_c(x, y) = d([x, θ_x], [y, θ_y]).  (4)

Unfortunately, it is impossible to use θ_x and θ_y directly because they are unknown. Assuming d(x, y) = d(x, z), intuitively, the modified dissimilarity should satisfy:



D_c(x, y) ≤ D_c(x, z), if θ_x = θ_y but θ_x ≠ θ_z,
D_c(x, y) ≥ D_c(x, z), if θ_x ≠ θ_y but θ_x = θ_z.  (5)

Furthermore, the condition θ_x = θ_y can be rewritten as p(θ_x ≠ θ_y) = 0, and θ_x ≠ θ_y as p(θ_x ≠ θ_y) = 1. We find that the following definition meets the rules of Eq. (5):

D_c(a, b) = d(a, b) · p(θ_a ≠ θ_b).  (6)

Indeed, Eq. (6) is more general than Eq. (5): the larger the probability that a pair of points come from different clusters, the larger their dissimilarity, which is reasonable. Compared with Eq. (4), the dissimilarity in Eq. (6) does not require knowing the two parameters in advance, but only the probability that they differ, and the latter is much easier to estimate than the former. Consequently, the contextual information can be used.
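As a toy illustration of Eq. (6), consider two pairs with the same plain distance but different chances of lying in different clusters (the probability values below are made up for the example):

```python
def contextual_dissimilarity(d_ab, p_diff):
    # Eq. (6): scale the plain distance by the probability that a and b
    # belong to different clusters.
    return d_ab * p_diff

print(contextual_dissimilarity(2.0, 0.1))  # probably the same cluster -> 0.2
print(contextual_dissimilarity(2.0, 0.9))  # probably different        -> 1.8
```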


B. Contextual Dissimilarity for Bags

According to Eq. (6), the contextual dissimilarity between bags can be formulated as

D_c(X, Y) = d((x_1, ..., x_p), (y_1, ..., y_q)) × p((θ_{x_1}, ..., θ_{x_p}) ≠ (θ_{y_1}, ..., θ_{y_q})).  (7)

This definition is more complex, and the key problem is how to calculate the probability. Note that what we pursue are the clusters of bags instead of instances. We first cluster the instances to derive the global parameters θ for the instances in Eq. (7), and then plug them into Eq. (7). Although the derived parameters for instances may be imperfect, the clusters of instances can reflect the distribution of instances to some extent, which aids the bag clustering. Suppose k clusters are derived from the instances of all the bags. For each bag, we can obtain a cluster vector of the form (i_1, ..., i_k), where i_j is the number of its instances in the j-th cluster. Let v_X and v_Y be the cluster vectors of X and Y, respectively. Then the probability in Eq. (7) can be calculated as:

p((θ_{x_1}, ..., θ_{x_p}) ≠ (θ_{y_1}, ..., θ_{y_q})) = exp( − (v_X · v_Y) / (||v_X|| · ||v_Y||) ).  (8)

This formula indicates that if two bags' cluster vectors are similar, their dissimilarity is smaller. With d in Eq. (7) taken as the Hausdorff distance, the new dissimilarity is called contextual Hausdorff dissimilarity (CHD); replacing HD with minHD in the same way yields minCHD, which is also evaluated in the experiments.
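A minimal NumPy sketch of Eqs. (7)-(8), with the base distance d taken as the Hausdorff distance of Eq. (1); the helper names are ours, and SciPy's directed_hausdorff is used for compactness:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(X, Y):
    # Symmetric Hausdorff distance H(X, Y) of Eq. (1).
    return max(directed_hausdorff(X, Y)[0], directed_hausdorff(Y, X)[0])

def cluster_vector(instance_labels, k):
    # Cluster vector (i_1, ..., i_k): how many of the bag's instances fall in
    # each of the k instance clusters.
    return np.bincount(instance_labels, minlength=k)

def chd(X, Y, labels_X, labels_Y, k):
    # Contextual Hausdorff dissimilarity: Eq. (7) with the probability of Eq. (8).
    vX = cluster_vector(labels_X, k)
    vY = cluster_vector(labels_Y, k)
    cos = vX @ vY / (np.linalg.norm(vX) * np.linalg.norm(vY))
    p_diff = np.exp(-cos)   # Eq. (8)
    return hausdorff(X, Y) * p_diff
```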

C. Clustering Algorithm

Once the bag dissimilarity is defined, some SIC algorithms can be directly adapted to cluster bags. In this study, k-medoids is adapted to MIC based on CHD. The proposed multi-instance clustering algorithm consists of two stages: the first clusters the instances and calculates the CHD, and the second clusters the bags using k-medoids. The main steps of the proposed MIC algorithm are as follows (a code sketch follows the list).

Input:
• B: unlabeled multi-instance bag set;
• I: instance set extracted from B;
• k1: number of instance clusters;
• k2: number of bag clusters.
Output:
• Groups: clustered bags.
Steps:
1) Use k-means to cluster I into k1 clusters;
2) Construct the cluster vector vX for each bag X;
3) Calculate the dissimilarity for each pair of bags using Eqs. (7) and (8);
4) Use k-medoids to cluster B into k2 groups.
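The following is a compact end-to-end sketch of the two stages, assuming scikit-learn's KMeans for step 1 and a simple medoid-update loop for step 4 (the paper does not prescribe particular implementations, and all function names are ours):

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff
from sklearn.cluster import KMeans

def hausdorff(X, Y):
    return max(directed_hausdorff(X, Y)[0], directed_hausdorff(Y, X)[0])

def chd(X, Y, vX, vY):
    # Eqs. (7)-(8): Hausdorff distance scaled by exp(-cosine of the cluster vectors).
    cos = vX @ vY / (np.linalg.norm(vX) * np.linalg.norm(vY))
    return hausdorff(X, Y) * np.exp(-cos)

def mic_chd(bags, k1, k2, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    # Stage 1, step 1: cluster all instances into k1 clusters.
    instances = np.vstack(bags)
    labels = KMeans(n_clusters=k1, n_init=10, random_state=seed).fit(instances).labels_
    per_bag = np.split(labels, np.cumsum([len(b) for b in bags])[:-1])
    # Step 2: cluster vector for each bag.
    vecs = [np.bincount(l, minlength=k1) for l in per_bag]
    # Step 3: pairwise CHD matrix.
    n = len(bags)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = chd(bags[i], bags[j], vecs[i], vecs[j])
    # Stage 2, step 4: k-medoids on the precomputed dissimilarity matrix.
    medoids = rng.choice(n, size=k2, replace=False)
    for _ in range(n_iter):
        assign = D[:, medoids].argmin(axis=1)        # nearest medoid for each bag
        new_medoids = medoids.copy()
        for c in range(k2):
            members = np.where(assign == c)[0]
            if len(members):
                # New medoid: the member minimizing total dissimilarity within the group.
                new_medoids[c] = members[D[np.ix_(members, members)].sum(axis=1).argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return D[:, medoids].argmin(axis=1)              # bag cluster labels

# Example: three small bags of 2-D instances.
bags = [np.random.rand(4, 2), np.random.rand(3, 2) + 5.0, np.random.rand(5, 2)]
print(mic_chd(bags, k1=2, k2=2))
```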

IV. EXPERIMENTS

Four types of dissimilarity measurements are evaluated in the experiments: the two traditional Hausdorff dissimilarities, HD and minHD, and the two new dissimilarities, CHD and minCHD. We use the precision and entropy defined in [2] to evaluate the clustering results. The higher the precision, the better the clustering results; the lower the entropy, the better the results.

A. Synthetic Data

The instances are generated from three different Gaussian distributions, denoted t1, t2 and t3. The means and variances of t1, t2 and t3 are ([1,1], [1,1]), ([9,1], [1,1]) and ([5,8], [1,1]), respectively. Each bag is produced according to its cluster vector, and bags with the same cluster vector belong to the same cluster. We design three patterns of bags whose cluster vectors are (3, 0, 1), (0, 2, 0) and (1, 0, 3), respectively. When producing a bag, its cluster vector is first selected at random from the three cluster vectors. For example, if the vector is (1, 0, 3), four instances are generated: one from t1 and the other three from t3. In total, we generate 500 bags (1720 instances) of the three patterns.

Our approach needs to pre-define the two parameters k1 and k2. Because the initial cluster centers affect the final results, we run ten rounds of experiments for each dissimilarity measurement with different values of k1 and k2. Table I shows the precision results, while Table II shows the entropy results. Each value is the average of ten runs with randomly chosen initial centers. The value of k1 is set to 3 and 5; the value of k2 is set to 2, 3 and 5. It is easily observed that CHD-based clustering outperforms HD overall; minCHD-based clustering also outperforms minHD consistently.
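The bag-generation procedure just described can be sketched as follows; the distribution parameters come from the text above, while reading the variance [1, 1] as an isotropic unit variance and all variable names are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Means of the three Gaussian components t1, t2, t3; variance [1, 1] is taken
# here as unit variance in each dimension.
means = np.array([[1.0, 1.0], [9.0, 1.0], [5.0, 8.0]])
patterns = [(3, 0, 1), (0, 2, 0), (1, 0, 3)]   # the three bag cluster vectors

def make_bag(cluster_vector):
    # Draw the prescribed number of instances from each Gaussian component.
    parts = [rng.normal(loc=means[j], scale=1.0, size=(n, 2))
             for j, n in enumerate(cluster_vector) if n > 0]
    return np.vstack(parts)

bags, true_labels = [], []
for _ in range(500):
    label = rng.integers(len(patterns))        # pick one of the three patterns
    bags.append(make_bag(patterns[label]))
    true_labels.append(label)
```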

TABLE I. THE PRECISION VALUES ON SYNTHETIC DATA

          k1 = 3            k1 = 5
          CHD     minCHD    CHD     minCHD    HD      minHD
k2 = 2    0.528   0.453     0.483   0.453     0.521   0.417
k2 = 3    0.761   0.506     0.721   0.486     0.739   0.424
k2 = 5    0.907   0.579     0.962   0.512     0.839   0.436

TABLE II. THE ENTROPY VALUES ON SYNTHETIC DATA

          k1 = 3            k1 = 5
          CHD     minCHD    CHD     minCHD    HD      minHD
k2 = 2    0.977   1.794     1.261   1.794     1.058   1.794
k2 = 3    0.492   1.484     0.679   1.540     0.574   1.728
k2 = 5    0.199   1.373     0.127   1.506     0.447   1.706

B. Real-world Data

There are two public UCI data sets [7] for multi-instance learning, MUSK1 and MUSK2. Both are real-world data sets generated in research on drug activity prediction [2]. Table III and Table IV show the precision and entropy, respectively, achieved by the different dissimilarity measures on MUSK1. CHD significantly and consistently outperforms the traditional HD, and minCHD is also comparable to HD. Table V and Table VI show the precision and entropy, respectively, achieved by the different dissimilarity measures on MUSK2. CHD outperforms HD in most cases with respect to precision.


In terms of entropy, CHD is superior to HD consistently. Though minCHD seems inferior to minHD with respect to precision, it outperforms minHD consistently with respect to entropy; hence, minCHD is comparable to minHD on the MUSK2 set. In conclusion, contextual information does help to improve the clustering quality, especially when used with the Hausdorff dissimilarity.

TABLE III. THE PRECISION VALUES ON MUSK1

          k1 = 2            k1 = 4
          CHD     minCHD    CHD     minCHD    HD      minHD
k2 = 2    0.611   0.585     0.543   0.532     0.526   0.554
k2 = 5    0.640   0.672     0.622   0.642     0.609   0.647
k2 = 8    0.682   0.699     0.688   0.701     0.652   0.699

TABLE IV. THE ENTROPY VALUES ON MUSK1

          k1 = 2            k1 = 4
          CHD     minCHD    CHD     minCHD    HD      minHD
k2 = 2    0.947   0.998     0.976   0.986     0.990   0.965
k2 = 5    0.855   0.896     0.874   0.850     0.904   0.854
k2 = 8    0.764   0.777     0.762   0.752     0.829   0.765

TABLE V. THE PRECISION VALUES ON MUSK2

          k1 = 2            k1 = 4
          CHD     minCHD    CHD     minCHD    HD      minHD
k2 = 2    0.618   0.618     0.618   0.624     0.618   0.618
k2 = 5    0.634   0.629     0.639   0.643     0.628   0.659
k2 = 8    0.672   0.686     0.660   0.682     0.663   0.683

TABLE VI. THE ENTROPY VALUES ON MUSK2

          k1 = 2            k1 = 4
          CHD     minCHD    CHD     minCHD    HD      minHD
k2 = 2    0.931   0.936     0.940   0.931     0.959   0.960
k2 = 5    0.875   0.861     0.849   0.846     0.903   0.859
k2 = 8    0.794   0.777     0.786   0.773     0.810   0.786

C. Discussion

CHD integrates both local structure (through HD) and global structure (through contextual information), and achieves better results on both the synthetic data and the real-world data. In addition, CHD is not overly sensitive to the quality of the instance clustering: as the above six tables show, CHD remains better than HD as the number of instance clusters (k1) changes.

V. CONCLUSIONS

We have investigated the utilization of contextual information in dissimilarity measures. We have shown that contextual information can be used as a modifying term for mathematical distances; this term reflects whether the global parameters of two data points are identical, i.e., whether they belong to the same cluster. Based on this, a new bag dissimilarity measure is proposed, and a contextual bag dissimilarity based multi-instance clustering algorithm is introduced. Experimental results on three data sets demonstrate that the proposed contextual bag dissimilarity is superior to the traditional bag dissimilarity.

ACKNOWLEDGMENT

The authors would like to thank many colleagues for their valuable suggestions. This work is supported by the National Natural Science Foundation of China (Grant Nos. 61003115 and 60903147).

REFERENCES

[1] H.P. Kriegel, A. Pryakhin, M. Schubert, An EM-approach for clustering multi-instance objects. Lecture Notes in Computer Science, vol. 3918, pp. 139-148, 2006.
[2] M.L. Zhang, Z.H. Zhou, Multi-instance clustering with applications to multi-instance prediction. Applied Intelligence, vol. 31, no. 1, pp. 47-68, 2009.
[3] D.L. Zhao, Z.C. Lin, X.O. Tang, Contextual distance for data perception. IEEE International Conference on Computer Vision, pp. 1-8, 2007.
[4] H. Jegou, H. Harzallah, C. Schmid, A contextual dissimilarity measure for accurate and efficient image search. IEEE International Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2007.
[5] J. Wang, J. Zucker, Solving the multiple-instance problem: a lazy learning approach. IEEE International Conference on Machine Learning, pp. 1119-1125, 2000.
[6] J.S. Bruner, A.L. Minturn, Perceptual identification and perceptual organization. Journal of General Psychology, vol. 53, pp. 21-28, 1955.
[7] UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/datasets.html
