A Hybrid Method for Distance Metric Learning

Yi-hao Kao, Benjamin Van Roy, Daniel Rubin, Jiajing Xu, Jessica Faruque, and Sandy Napel
Stanford University

1 Introduction

We consider the problem of learning a measure of distance among feature vectors, and propose a hybrid method that simultaneously learns from similarity ratings and class labels.

Application: information retrieval. Given a new object o as input, a retrieval system searches an objects database and outputs a list of similar objects o(1), …, o(N).

2 Problem Formulation

Data:
• Pairwise similarity ratings: a set S of quintuplets (o, o', x, x', σ), where
  • o, o' are object identifiers,
  • x, x' ∈ R^K are the feature vectors of the two objects,
  • σ ∈ {1, 2, 3} indicates dissimilar / neutral / similar.
• Class labels: a set G of triplets (o, x, c), where c ∈ {1, 2, …, M} is the class.

Distance metric:
$$d_r(x, x') = \left( \sum_{k=1}^{K} r_k (x_k - x'_k)^2 \right)^{1/2}$$

Main question: how do we learn the coefficients r?
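For concreteness, here is a minimal Python sketch of this weighted metric and the retrieval use case it supports. The helper names (weighted_distance, retrieve) and the toy data are ours for illustration, not part of the original formulation.

```python
import numpy as np

def weighted_distance(x, x_prime, r):
    """d_r(x, x') = sqrt(sum_k r_k * (x_k - x'_k)^2); requires r >= 0 elementwise."""
    diff = x - x_prime
    return np.sqrt(np.dot(r, diff * diff))

def retrieve(query, database, r, n=10):
    """Return the indices of the n database objects closest to query under d_r."""
    dists = [weighted_distance(query, x, r) for x in database]
    return np.argsort(dists)[:n]

# Toy usage: K = 4 features, 100 random objects, uniform weights.
rng = np.random.default_rng(0)
database = rng.normal(size=(100, 4))
print(retrieve(rng.normal(size=4), database, np.ones(4), n=5))
```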

3 Conventional Algorithms

• Ordinal Regression: We assume
$$P(\sigma \ge v \mid x, x') = \frac{1}{1 + \exp\!\left(d_r^2(x, x') - \theta_v\right)},$$
where v ∈ {2, 3} indexes the level of similarity and θ₁, θ₂ are the thresholds for v = 2 and v = 3, and solve
$$\max_{r, \theta} \; \sum_{(x, x', \sigma) \in S} \log P(\sigma \mid x, x') \quad \text{s.t.} \quad r \ge 0, \; \theta_1 \ge \theta_2.$$

• Convex Optimization: We solve
$$\min_{r} \; \sum_{(x, x', \sigma = 3) \in S} d_r^2(x, x') \quad \text{s.t.} \quad \sum_{(x, x', \sigma = 1) \in S} d_r(x, x') \ge 1, \quad r \ge 0.$$
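A minimal sketch of how the ordinal-regression fit could be carried out. The helper names and the use of scipy.optimize.minimize are our choices; for simplicity the threshold ordering is encouraged only through initialization rather than enforced as a hard constraint.

```python
import numpy as np
from scipy.optimize import minimize

def ordinal_nll(params, diffs_sq, sigmas):
    """Negative log-likelihood of the ordinal model
    P(sigma >= v | x, x') = 1 / (1 + exp(d_r^2(x, x') - theta_v)), v in {2, 3},
    with P(sigma >= 1) = 1 and P(sigma >= 4) = 0 by convention."""
    K = diffs_sq.shape[1]
    r, th2, th3 = params[:K], params[K], params[K + 1]
    d2 = diffs_sq @ r                                  # d_r^2 for each rated pair
    p_ge2 = 1.0 / (1.0 + np.exp(d2 - th2))             # P(sigma >= 2)
    p_ge3 = 1.0 / (1.0 + np.exp(d2 - th3))             # P(sigma >= 3)
    probs = np.where(sigmas == 1, 1.0 - p_ge2,
                     np.where(sigmas == 2, p_ge2 - p_ge3, p_ge3))
    return -np.sum(np.log(np.clip(probs, 1e-12, None)))

def fit_ordinal(X, Xp, sigmas):
    """X, Xp: (n, K) features of each rated pair; sigmas: (n,) ratings in {1,2,3}."""
    diffs_sq = (X - Xp) ** 2
    K = X.shape[1]
    x0 = np.concatenate([np.ones(K), [2.0, 1.0]])      # start with theta_2 > theta_3
    bounds = [(0, None)] * K + [(None, None)] * 2      # enforce r >= 0
    res = minimize(ordinal_nll, x0, args=(diffs_sq, sigmas), bounds=bounds)
    return res.x[:K], res.x[K], res.x[K + 1]
```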

• Neighborhood Component Analysis: We assume a feature vector x† is assigned class label c† with probability
$$P(c^\dagger \mid x^\dagger, G) = \frac{\sum_{(x, c = c^\dagger) \in G} \exp\!\left(-d_r^2(x^\dagger, x)\right)}{\sum_{(x', c') \in G} \exp\!\left(-d_r^2(x^\dagger, x')\right)}$$
and solve
$$\max_{r \ge 0} \; \sum_{(x, c) \in G} \log P\!\left(c \mid x, G \setminus (x, c)\right).$$


4 A Hybrid Method

Assumptions:
• The observed features may not express all relevant information; in other words, there are "missing features."
• Given objects o, o' with observed feature vectors x, x' ∈ R^K and missing feature vectors z, z' ∈ R^J, the underlying distance metric is
$$D(o, o') = \left( \sum_{k=1}^{K} r_k (x_k - x'_k)^2 + \sum_{j=1}^{J} r^\perp_j (z_j - z'_j)^2 \right)^{1/2} = \left( d_r^2(x, x') + d_{r^\perp}^2(z, z') \right)^{1/2},$$
where r ∈ R^K and r⊥ ∈ R^J.
• x and z are independent conditioned on c.
• Given a learning algorithm A that learns conditional class probabilities P(c|x) from class label data, we represent the resulting estimates by a vector u(x) ∈ R^M, defined as


$$u_m(o) = \hat{P}(m \mid x).$$
Then we have
$$E\!\left[D^2(o, o') \mid x, x', u(o), u(o')\right] = d_r^2(x, x') + u(o)^T Q\, u(o'),$$
where Q ∈ R^{M×M} is defined as
$$Q_{c, c'} = E\!\left[d_{r^\perp}^2(z, z') \mid c, c'\right], \quad 1 \le c, c' \le M.$$
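As a quick illustration, this decomposition splits the expected squared distance into a term computable from observed features and a term computable from the class-probability estimates. A one-function sketch (names ours):

```python
import numpy as np

def expected_sq_distance(x, x_prime, u, u_prime, r, Q):
    """E[D^2 | x, x', u(o), u(o')] = d_r^2(x, x') + u(o)^T Q u(o')."""
    diff = x - x_prime
    return float(np.dot(r, diff * diff) + u @ Q @ u_prime)
```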

We can plug the above results into any learning algorithm B that learns the coefficients of a distance metric from feature differences and similarity ratings.

Example: We take A to be a kernel density estimator similar to NCA, and B to be the aforementioned convex optimization:
$$\min_{r, Q} \; \sum_{(o, o', x, x', \sigma = 3) \in S} \left( d_r^2(x, x') + u(o)^T Q\, u(o') \right)$$
$$\text{s.t.} \quad \sum_{(o, o', x, x', \sigma = 1) \in S} \left( d_r^2(x, x') + u(o)^T Q\, u(o') \right)^{1/2} \ge 1, \quad r \ge 0, \quad Q \succeq 0 \text{ and symmetric.}$$
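A minimal sketch of this convex program using cvxpy. The helper fit_hybrid and the precomputed pairwise statistics it expects (squared feature differences and the outer products u(o) u(o')^T from algorithm A) are our assumptions, not the authors' implementation. Note the constraint is a sum of concave square roots bounded below by a constant, so the problem remains convex.

```python
import cvxpy as cp
import numpy as np

def fit_hybrid(dx2_sim, uu_sim, dx2_dis, uu_dis, K, M):
    """Sketch of the hybrid convex program.
    dx2_sim: (n_sim, K) squared feature differences for pairs rated sigma = 3
    uu_sim:  (n_sim, M, M) outer products u(o) u(o')^T for those pairs
    dx2_dis, uu_dis: the same quantities for pairs rated sigma = 1
    """
    r = cp.Variable(K, nonneg=True)          # r >= 0
    Q = cp.Variable((M, M), PSD=True)        # PSD implies symmetric

    # d_r^2(x, x') + u(o)^T Q u(o'), affine in (r, Q), for each pair.
    sim_terms = dx2_sim @ r + cp.hstack(
        [cp.sum(cp.multiply(Q, uu)) for uu in uu_sim])
    dis_terms = dx2_dis @ r + cp.hstack(
        [cp.sum(cp.multiply(Q, uu)) for uu in uu_dis])

    prob = cp.Problem(cp.Minimize(cp.sum(sim_terms)),
                      [cp.sum(cp.sqrt(dis_terms)) >= 1])
    prob.solve()
    return r.value, Q.value
```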

5 Experiments

Synthetic data: We randomly generate 100 datasets and carry out the above algorithms while varying the amount of training data. Figure 1 plots the resulting normalized discounted cumulative gains (NDCG), where
$$\mathrm{DCG}_{10} = \sum_{p=1}^{10} \frac{2^{\rho(p)} - 1}{\log_2(1 + p)}$$
and the reported NDCG is DCG_10 normalized by its value under an ideal ordering (see the sketch below).

Figure 1. The average NDCG delivered by OR, CO, NCA, and HYB, over different sizes of the rating data set. Here K = 60 and M = 3.
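A sketch of the NDCG_10 computation, assuming ρ(p) is the similarity rating of the p-th retrieved object and the ideal ordering sorts ratings in decreasing order:

```python
import numpy as np

def ndcg_at_10(ratings):
    """ratings: similarity ratings (sigma in {1,2,3}) of the retrieved objects,
    listed in retrieval order. Returns DCG_10 normalized by the ideal DCG_10."""
    def dcg10(rho):
        rho = np.asarray(rho[:10], dtype=float)
        pos = np.arange(1, len(rho) + 1)
        return np.sum((2.0 ** rho - 1.0) / np.log2(1.0 + pos))
    return dcg10(ratings) / dcg10(sorted(ratings, reverse=True))
```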

Real data: Our real data set consists of 30 CT scans of liver lesions. Figure 2(a) shows some sample images; Figure 2(b) plots the NDCG delivered by each algorithm.

Figure 2. (a) Sample images. (b) The average NDCG delivered by OR, CO, NCA, and HYB.

References
• Bar-Hillel, A., Hertz, T., Shental, N., and Weinshall, D. Learning distance functions using equivalence relations. ICML, 2003.
• Cox, T. and Cox, M. A. A. Multidimensional Scaling. Chapman & Hall/CRC, 2000.
• Frome, A., Singer, Y., and Malik, J. Image retrieval and classification using local distance functions. NIPS, 2006.
• Goldberger, J., Roweis, S., Hinton, G., and Salakhutdinov, R. Neighbourhood components analysis. NIPS, 2005.
• Herbrich, R., Graepel, T., and Obermayer, K. Large margin rank boundaries for ordinal regression. Advances in Large Margin Classifiers, 2000.
• McCullagh, P. and Nelder, J. A. Generalized Linear Models (second edition). London: Chapman & Hall, 1989.
• Schultz, M. and Joachims, T. Learning a distance metric from relative comparisons. NIPS, 2004.
• Weinberger, K. Q. and Saul, L. K. Distance metric learning for large margin nearest neighbor classification. JMLR, 2009.
• Weinberger, K. Q. and Tesauro, G. Metric learning for kernel regression. AISTATS, 2007.
• Weinberger, K. Q., Blitzer, J., and Saul, L. K. Distance metric learning for large margin nearest neighbor classification. NIPS, 2006.
• Xing, E. P., Ng, A. Y., Jordan, M. I., and Russell, S. Distance metric learning, with application to clustering with side-information. NIPS, 2002.