A Hybrid Method for Distance Metric Learning Yi-hao Kao, Benjamin Van Roy, Daniel Rubin, Jiajing Xu, Jessica Faruque, and Sandy Napel Stanford University
1 Introduction
We consider the problem of learning a measure of distance among feature vectors, and propose a hybrid method that simultaneously learns from similarity ratings and class labels.
Application: information retrieval.
2 Problem Formulation
Data:
Pair-wise similarity ratings: a set S of quintuplets (o, o', x, x', σ)
• o, o' : object identifiers
• x, x' ∈ R^K : features of each object
• σ ∈ {1, 2, 3} : dissimilar / neutral / similar
Class labels: a set G of triplets (o, x, c)
• c ∈ {1, 2, …, M} : class
Distance metric:
d_r(x, x') = \left( \sum_{k=1}^{K} r_k (x_k - x'_k)^2 \right)^{1/2}
Main question: how do we learn the coefficients r?
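As a concrete sketch of the weighted metric above, the distance can be computed as follows; the function name is illustrative, not from the poster:

```python
import numpy as np

def weighted_distance(r, x, x_prime):
    """Weighted Euclidean distance d_r(x, x') = sqrt(sum_k r_k (x_k - x'_k)^2)."""
    diff = np.asarray(x) - np.asarray(x_prime)
    return float(np.sqrt(np.dot(r, diff ** 2)))

# With r = [1, 1] this reduces to the ordinary Euclidean distance.
d = weighted_distance(np.array([1.0, 1.0]), [0.0, 0.0], [3.0, 4.0])  # d = 5.0
```

Learning r amounts to choosing how much each feature coordinate contributes to this distance.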
3 Conventional Algorithms
Ordinal Regression: We assume
P(\sigma \ge v \mid x, x') = \frac{1}{1 + \exp\left( d_r^2(x, x') - \theta_v \right)}, \quad v \in \{2, 3\},
where σ is the level of similarity, and solve
\max_{r, \theta} \sum_{(x, x', \sigma) \in S} \log P(\sigma \mid x, x')
s.t. r \ge 0, \theta_2 \ge \theta_3.
Convex Optimization: We solve
\min_{r} \sum_{(x, x', \sigma = 3) \in S} d_r^2(x, x')
s.t. \sum_{(x, x', \sigma = 1) \in S} d_r(x, x') \ge 1, \quad r \ge 0.
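The ordinal-regression likelihood can be sketched in pure Python. This is a minimal sketch under the cumulative-logit form P(σ ≥ v) = 1/(1 + exp(d² − θ_v)); the function names and toy data are illustrative, not from the poster:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def similarity_probs(d2, t2, t3):
    """Probabilities of the three similarity levels given squared distance d2
    and thresholds t2 >= t3, so the cumulative probabilities are ordered."""
    p_ge2 = sigmoid(t2 - d2)  # P(sigma >= 2)
    p_ge3 = sigmoid(t3 - d2)  # P(sigma >= 3)
    return (1.0 - p_ge2, p_ge2 - p_ge3, p_ge3)  # P(1), P(2), P(3)

def neg_log_likelihood(ratings, r, t2, t3):
    """Negative log-likelihood of rating data: ratings is a list of
    (x, x_prime, sigma) triples with sigma in {1, 2, 3}."""
    nll = 0.0
    for x, xp, sigma in ratings:
        d2 = sum(rk * (a - b) ** 2 for rk, a, b in zip(r, x, xp))
        nll -= math.log(similarity_probs(d2, t2, t3)[sigma - 1])
    return nll
```

Minimizing this objective over r ≥ 0 and the thresholds recovers the ordinal-regression fit; note that a nearby pair rated "similar" incurs a lower loss than a distant one.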
Neighborhood Component Analysis: We assume a feature vector x^\dagger is assigned class label c^\dagger with probability
P(c^\dagger \mid x^\dagger, G) = \frac{\sum_{(x, c = c^\dagger) \in G} \exp\left( -d_r^2(x^\dagger, x) \right)}{\sum_{(x', c') \in G} \exp\left( -d_r^2(x^\dagger, x') \right)}
and solve
\max_{r \ge 0} \sum_{(x, c) \in G} \log P(c \mid x, G \setminus (x, c)).
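The NCA-style class probability above can be evaluated directly. A minimal numpy sketch, with an illustrative function name and labeled data passed as (x, c) pairs:

```python
import numpy as np

def nca_class_prob(x_dag, labeled, r):
    """P(c | x_dag, G): kernel-weighted class probabilities with weights
    exp(-d_r^2(x_dag, x)), summed per class and normalized over all of G.
    labeled is a list of (x, c) pairs; returns a dict class -> probability."""
    weights = {}
    total = 0.0
    for x, c in labeled:
        w = float(np.exp(-np.dot(r, (np.asarray(x_dag) - np.asarray(x)) ** 2)))
        weights[c] = weights.get(c, 0.0) + w
        total += w
    return {c: w / total for c, w in weights.items()}
```

In the learning objective, each (x, c) in G is scored against G with itself held out, and r is chosen to maximize the total log-probability.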
5 Experiments
Synthetic data: We randomly generate 100 data sets and carry out the above algorithms while varying the amount of training data. Figure 1 plots the resulting normalized discounted cumulative gains (NDCG), based on
\mathrm{DCG}_{10} = \sum_{p=1}^{10} \frac{2^{\sigma(p)} - 1}{\log_2(1 + p)}
where σ(p) is the similarity rating of the p-th retrieved object.
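The evaluation metric can be sketched as follows; normalization by the DCG of the ideal (rating-sorted) ordering yields the NDCG, and the function names are illustrative:

```python
import math

def dcg10(ratings_in_rank_order):
    """DCG_10 = sum_{p=1}^{10} (2^{sigma(p)} - 1) / log2(1 + p),
    where sigma(p) is the rating of the p-th retrieved object."""
    return sum((2 ** s - 1) / math.log2(1 + p)
               for p, s in enumerate(ratings_in_rank_order[:10], start=1))

def ndcg10(ratings_in_rank_order):
    """Normalize by the DCG of the ideal (descending-rating) ordering."""
    ideal = dcg10(sorted(ratings_in_rank_order, reverse=True))
    return dcg10(ratings_in_rank_order) / ideal if ideal > 0 else 0.0
```

A retrieval list already sorted by rating achieves NDCG of 1; any misordering strictly reduces it.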
4 A Hybrid Method
Assumptions: The observed features may not express all relevant information; in other words, there are "missing features." Given objects o, o' with observed feature vectors x, x' ∈ R^K and missing feature vectors z, z' ∈ R^J, the underlying distance metric is given by
D(o, o') = \left( \sum_{k=1}^{K} r_k (x_k - x'_k)^2 + \sum_{j=1}^{J} r^{\perp}_j (z_j - z'_j)^2 \right)^{1/2} = \left( d_r^2(x, x') + d_{r^{\perp}}^2(z, z') \right)^{1/2}
where r ∈ R^K and r^⊥ ∈ R^J. We assume x and z are independent conditioned on c. Given a learning algorithm A that learns conditional class probabilities P(c|x) from class label data, we represent the resulting estimates by a vector u(x) ∈ R^M, defined as
[Figure 1. The average NDCG delivered by OR, CO, NCA, and HYB, over different sizes of the rating data set; here K = 60 and M = 3.]
Real data: Our real data set consists of 30 CT scans of liver lesions. Figure 2(a) gives some sample images; Figure 2(b) plots the NDCG delivered by each algorithm.
u_m(o) = \hat{P}(m \mid x).
Then we have
E\left[ D^2(o, o') \mid x, x', u(o), u(o') \right] = d_r^2(x, x') + u(o)^T Q\, u(o')
where Q ∈ R^{M×M} is defined below.
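This expected squared distance is cheap to evaluate once Q and the class-probability vectors are in hand. A minimal numpy sketch, with illustrative names:

```python
import numpy as np

def expected_sq_distance(x, x_prime, u, u_prime, r, Q):
    """E[D^2(o, o') | x, x', u(o), u(o')] = d_r^2(x, x') + u(o)^T Q u(o'),
    where u(.) are estimated class-probability vectors and Q[c, c'] collects
    the expected missing-feature distances between classes c and c'."""
    d2 = np.dot(r, (np.asarray(x) - np.asarray(x_prime)) ** 2)
    return float(d2 + np.asarray(u) @ np.asarray(Q) @ np.asarray(u_prime))
```

When Q is zero the hybrid distance collapses to the observed-feature metric d_r²; the quadratic form adds the class-driven correction for the missing features.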
[Poster diagram: objects database → retrieval system. Input: a new object o; output: a list of similar objects o(1), …, o(N).]
Q_{c, c'} = E\left[ d_{r^{\perp}}^2(z, z') \mid c, c' \right], \quad 1 \le c, c' \le M.
We can plug the above results into any learning algorithm B that learns the coefficients of a distance metric from feature differences and similarity ratings.
Example: We take A to be a kernel density estimator similar to NCA, and B to be the aforementioned convex optimization:
\min_{r, Q} \sum_{(o, o', x, x', \sigma = 3) \in S} \left[ d_r^2(x, x') + u(o)^T Q\, u(o') \right]
s.t. \sum_{(o, o', x, x', \sigma = 1) \in S} \left[ d_r^2(x, x') + u(o)^T Q\, u(o') \right]^{1/2} \ge 1
r \ge 0, \quad Q \ge 0 and symmetric.
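The objective and normalization constraint of the hybrid program can be evaluated as follows. This is a sketch of the evaluation only (not a solver); the function names and the dict-based lookup of class-probability vectors by object id are illustrative assumptions:

```python
import numpy as np

def hybrid_objective(S3, r, Q, u):
    """Sum over pairs rated similar (sigma = 3) of
    d_r^2(x, x') + u(o)^T Q u(o').  S3 is a list of (o, o', x, x') tuples;
    u maps object id -> estimated class-probability vector."""
    total = 0.0
    for o, op, x, xp in S3:
        d2 = np.dot(r, (np.asarray(x) - np.asarray(xp)) ** 2)
        total += d2 + u[o] @ Q @ u[op]
    return float(total)

def hybrid_constraint_lhs(S1, r, Q, u):
    """Left-hand side of the normalization constraint: sum over pairs rated
    dissimilar (sigma = 1) of sqrt(d_r^2 + u^T Q u'), required to be >= 1."""
    total = 0.0
    for o, op, x, xp in S1:
        d2 = np.dot(r, (np.asarray(x) - np.asarray(xp)) ** 2)
        total += np.sqrt(d2 + u[o] @ Q @ u[op])
    return float(total)
```

A convex solver can then minimize the objective over r and Q subject to this constraint and the sign/symmetry conditions on r and Q.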
T
[Figure 2. (a) Sample images. (b) The average NDCG delivered by OR, CO, NCA, and HYB.]
References
• Bar-Hillel, A., Hertz, T., Shental, N., and Weinshall, D. Learning distance functions using equivalence relations. ICML, 2003.
• Cox, T. and Cox, M. A. A. Multidimensional Scaling. Chapman & Hall/CRC, 2000.
• Frome, A., Singer, Y., and Malik, J. Image retrieval and classification using local distance functions. NIPS, 2006.
• Goldberger, J., Roweis, S., Hinton, G., and Salakhutdinov, R. Neighbourhood components analysis. NIPS, 2005.
• Herbrich, R., Graepel, T., and Obermayer, K. Large margin rank boundaries for ordinal regression. Advances in Large Margin Classifiers, 2000.
• McCullagh, P. and Nelder, J. A. Generalized Linear Models (second edition). London: Chapman & Hall, 1989.
• Schultz, M. and Joachims, T. Learning a distance metric from relative comparisons. NIPS, 2004.
• Weinberger, K. Q. and Saul, L. K. Distance metric learning for large margin nearest neighbor classification. JMLR, 2009.
• Weinberger, K. Q. and Tesauro, G. Metric learning for kernel regression. AISTATS, 2007.
• Weinberger, K. Q., Blitzer, J., and Saul, L. K. Distance metric learning for large margin nearest neighbor classification. NIPS, 2006.
• Xing, E. P., Ng, A. Y., Jordan, M. I., and Russell, S. Distance metric learning, with application to clustering with side-information. NIPS, 2002.