Pattern Recognition 37 (2004) 161 – 163
Rapid and Brief Communication
www.elsevier.com/locate/patcog
Distance metric learning by knowledge embedding

Yun Gang Zhang^a, Chan Shui Zhang^{a,∗}, David Zhang^b

a State Key Laboratory of Intelligent Technology and Systems, Department of Automation, Tsinghua University, Beijing 100084, China
b Department of Computing, The Hong Kong Polytechnic University, Hong Kong

∗ Corresponding author. Department of Automation, Tsinghua University, Beijing 100084, China. Tel.: +86-10-62782447; fax: +86-10-62786911. E-mail address: [email protected] (C.S. Zhang).
Abstract

This paper presents an algorithm that learns a distance metric from a data set by knowledge embedding and uses the new distance metric to solve nonlinear pattern recognition problems such as clustering.
© 2003 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Distance metric learning; Knowledge embedding; Clustering
1. Introduction

Distance metrics are very important in many learning algorithms. Euclidean distance is a common distance metric in linear spaces; in nonlinear problems it has many limitations, and new distance metrics are used for better performance. This paper presents an algorithm that learns a distance metric from a data set by knowledge embedding and uses the new distance metric to solve nonlinear problems such as clustering. There are three key ideas in the algorithm. The first is using the eigenvalues of local covariance matrices to express knowledge. The second is embedding this knowledge into the input space to form a knowledge-embedded space and using the MNN distance in this space as the new distance metric. The third is using mutual neighborhood graphs to discover knowledge and cluster data.

In the last decade, a number of new distance metric learning methods have been proposed. Kernel-based methods [1] transform data from the input space to a high-dimensional feature space, so that the original nonlinear problems become linear problems in the feature space. However, it is difficult to choose a proper kernel function to perform this task. In our method, the knowledge-embedded space is easily formed. Compared with local principal component analysis
methods [2], our method directly uses all the eigenvalues to represent local knowledge. The advantage is that it contains more information.

2. Distance metric learning by knowledge embedding

Suppose we have a data set $X = \{x_1, \ldots, x_N\}$, $x_i \in R^d$ $(i = 1, \ldots, N)$, where $N$ is the number of points and $d$ is the dimension of the input data.

Definition 1. The set $\omega_i = \{x_i, x_{i1}, \ldots, x_{iK}\}$ is called the local neighborhood of $x_i$, where $K$ is the number of neighbors and $x_{il}$ denotes the $l$th nearest neighbor of $x_i$. For convenience, $x_{i0}$ is used to denote $x_i$, and $\omega_i$ is rewritten as $\omega_i = \{x_{i0}, \ldots, x_{iK}\}$.

Definition 2. The local covariance matrix of $\omega_i$ is
$$S_i = \frac{1}{K+1} \sum_{l=0}^{K} (x_{il} - m_i)(x_{il} - m_i)^T,$$
where
$$m_i = \frac{1}{K+1} \sum_{l=0}^{K} x_{il}$$
is the average of $\omega_i$. The vector of eigenvalues of $S_i$, $\lambda_i = [\lambda_{i1}, \ldots, \lambda_{id}]^T$ $(\lambda_{i1} \geq \cdots \geq \lambda_{id})$, is called the local feature of $x_i$. The information represented by the local feature is called local knowledge.
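As a concrete illustration, here is a minimal NumPy sketch of Definitions 1 and 2; the function name, the brute-force neighbor search, and the array layout are our own choices, not the paper's.

```python
import numpy as np

def local_features(X, K):
    """For each x_i, form the local neighborhood omega_i (x_i plus its K
    nearest neighbors), compute the local covariance matrix S_i, and
    return the descending eigenvalue vector lambda_i (the local feature)."""
    N, d = X.shape
    # Pairwise Euclidean distances in the input space.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    lambdas = np.empty((N, d))
    for i in range(N):
        # Indices of x_i (distance 0) and its K nearest neighbors.
        idx = np.argsort(D[i])[:K + 1]
        omega = X[idx]
        m_i = omega.mean(axis=0)
        # S_i = (1/(K+1)) * sum_l (x_il - m_i)(x_il - m_i)^T
        S_i = (omega - m_i).T @ (omega - m_i) / (K + 1)
        # Eigenvalues of the symmetric S_i, sorted in descending order.
        lambdas[i] = np.linalg.eigvalsh(S_i)[::-1]
    return lambdas
```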
Definition 3. A neighborhood graph is an undirected weighted graph $G = (X, E)$, where $X$ is the set of data points and $E$ is the set of edges between neighbors, with weights $e_{ij}$ representing distances. When mutual neighborhood values (MNV) [3] are used as the weights, the neighborhood graph is called a mutual neighborhood graph and the distance is called the MNN distance. If $x_i$ and $x_j$ are not neighbors, we let $e_{ij} = 0$, indicating that there is no edge between them; otherwise, $e_{ij}$ is the MNV of $x_i$ and $x_j$.
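The paper does not restate the MNV formula from [3]; the sketch below assumes the usual definition: if $x_j$ is the $p$th nearest neighbor of $x_i$ and $x_i$ is the $q$th nearest neighbor of $x_j$, then $\mathrm{MNV}(x_i, x_j) = p + q$, with an edge only between mutual K-nearest neighbors.

```python
import numpy as np

def mutual_neighborhood_graph(X, K):
    """Return a symmetric weight matrix E where e_ij is the MNV of
    mutual K-nearest neighbors and 0 where there is no edge."""
    N = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # rank[i, j] = p means x_j is the p-th nearest neighbor of x_i
    # (rank 0 is x_i itself, since its self-distance is smallest).
    order = np.argsort(D, axis=1)
    rank = np.empty((N, N), dtype=int)
    for i in range(N):
        rank[i, order[i]] = np.arange(N)
    E = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            p, q = rank[i, j], rank[j, i]
            # Edge only if the neighbor relation is mutual within K.
            if 1 <= p <= K and 1 <= q <= K:
                E[i, j] = E[j, i] = p + q
    return E
```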
Fig. 1. (a) [Two-dimensional plot; both axes span −20 to 20.]
Definition 4. The adaptability of $\omega_i$ is defined as
$$a_i = \frac{1}{d} \sum_{j=1}^{d} \frac{\lambda_{ij}}{\bar{\lambda}_{ij}},$$
where $\lambda_{ij}$ is the $j$th element of $\lambda_i$,
$$\bar{\lambda}_{ij} = \frac{1}{K} \sum_{t \in \{i_1, \ldots, i_K\}} \lambda_{tj},$$
$\lambda_t$ is the vector of eigenvalues of $S_t$, and $i_l$ $(l = 1, \ldots, K)$ is the subscript of $x_{il}$, the $l$th nearest neighbor of $x_i$.
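A minimal sketch of Definition 4, reusing the `local_features` output from the earlier sketch; the interface (passing the neighbor index lists explicitly) is illustrative, not the paper's.

```python
import numpy as np

def adaptability(lambdas, neighbors):
    """a_i = (1/d) * sum_j lambda_ij / bar_lambda_ij, where bar_lambda_ij
    averages the j-th eigenvalue over the K nearest neighbors of x_i.
    `neighbors[i]` holds the indices i_1, ..., i_K of those neighbors."""
    N, d = lambdas.shape
    a = np.empty(N)
    for i in range(N):
        bar = lambdas[neighbors[i]].mean(axis=0)  # bar_lambda_i, length d
        a[i] = np.mean(lambdas[i] / bar)
    return a
```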
The first step of our algorithm is mutual neighborhood graph construction in the input space. We extend the neighborhood construction algorithm of ISOMAP [4] and LLE [5] by using the MNN distance instead of Euclidean distance. The MNN distance makes it easier for points with similar density to connect together.

The second step is distance metric learning by knowledge embedding. For an arbitrary $x_i$, its corresponding point in the knowledge-embedded space is
$$y_i = \begin{bmatrix} x_i \\ \alpha \lambda_i \end{bmatrix},$$
where $\alpha$ is a normalization factor, needed because the scales of $x_i$ and $\lambda_i$ are different. Here, local principal component analysis is extended by directly using all the eigenvalues to represent knowledge rather than analyzing local principal components. Then, the MNN distance in the knowledge-embedded space is used as the new distance metric. If two points are close under this distance metric, then in the input space their coordinates are close and their local features are similar. So, the essence of our algorithm can be understood as follows: the dimension of the input space is increased, and the appended dimensions represent the local features of the data. Points that cannot be distinguished in the input space can then be distinguished in the knowledge-embedded space by the added dimensions.

The third step is denoising of knowledge. In neighborhood graphs, there are often edges connecting two points that are not actually neighbors. These are called false edges. In Fig. 1(a), the edges between layers are false edges.
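A sketch of the embedding step, combining the earlier helpers. The paper does not specify $\alpha$ numerically in this section; scaling $\lambda_i$ to match the average magnitude of the coordinates is one plausible, assumed choice.

```python
import numpy as np

def embed(X, K):
    """Map each x_i to y_i = [x_i ; alpha * lambda_i], doubling the
    dimension from d to 2d, then build the mutual neighborhood graph in
    the knowledge-embedded space to obtain the new distance metric."""
    lambdas = local_features(X, K)  # from the earlier sketch
    # Assumed normalization: match the average scales of x_i and lambda_i.
    alpha = np.abs(X).mean() / (np.abs(lambdas).mean() + 1e-12)
    Y = np.hstack([X, alpha * lambdas])
    # The new distance metric is the MNN distance in this embedded space,
    # i.e. the MNV weights of the mutual neighborhood graph built on Y.
    return Y, mutual_neighborhood_graph(Y, K)
```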