Convex and Spectral Relaxations for Seriation and Ranking Ph.D. Defense Fajwel Fogel

November 18th, 2015, École Polytechnique & Sierra Team, Inria-CNRS

1

Ordering problems

Henri MATISSE, Icare, 1943, Gouache sur papier (collage), Jazz, 1947, Centre Pompidou, Paris

2

1D jigsaw puzzle

3

1D jigsaw puzzle

4

Seriation problem

Seriation: reorder items given pairwise similarities.

High similarity (edges coincide)

5

Ranking problem

Ranking: reorder items given pairwise comparisons.

Pairwise order (head goes before legs)

6

Seriation problem



Pairwise similarity information S_ij on n variables.

Suppose the data has a serial structure, i.e. there is an order π such that S_π(i)π(j) decreases with |i − j| (R-matrix). Recover π?

[Figure: similarity matrices, 160 × 160 — panels "Input" (scrambled) and "Reconstructed" (reordered).]

7

DNA de novo assembly

Seriation has direct applications in DNA de novo assembly.

Genomes are cloned multiple times and randomly cut into shorter reads (∼ 400 bp), which are fully sequenced. Reorder the reads to recover the genome.

Figure from [Commins et al., 2009] 8

Ranking from pairwise comparisons



Given n items and pairwise comparisons

item_i ≻ item_j, for (i, j) ∈ C,

Find a global ranking π of these items

item_π(1) ≻ item_π(2) ≻ . . . ≻ item_π(n).



Many applications
◦ Comparing items is often more intuitive than ranking them directly.
◦ Some datasets naturally come with pairwise comparisons (sport competitions, recommender systems...).

Two main issues
◦ missing comparisons
◦ non-transitive comparisons (e.g., a ≻ b and b ≻ c but c ≻ a).

9

Contributions

The material of this thesis is based on the following publications: 

F. Fogel, I. Waldspurger, A. d’Aspremont, Phase retrieval for imaging problems. To appear in Mathematical Programming Computation.



F. Fogel, R. Jenatton, F. Bach, A. d’Aspremont, Convex relaxations for permutation problems. In Advances in Neural Information Processing Systems, pp. 1016-1024. 2013.



F. Fogel, R. Jenatton, F. Bach, A. d’Aspremont, Convex relaxations for permutation problems. SIAM Journal on Matrix Analysis and Applications (SIMAX), Vol. 36, Issue 4.



F. Fogel, A. d’Aspremont, M. Vojnovic: SerialRank: spectral ranking using seriation. In Advances in Neural Information Processing Systems, pp. 900-908. 2014.



F. Fogel, A. d’Aspremont, M. Vojnovic: Spectral ranking using seriation. Preprint, arXiv:1406.5370 (in submission).

Today: seriation and ranking.

10

Outline



Introduction



Seriation: spectral and convex relaxations



Ranking as a seriation problem



Numerical results

11

The seriation problem

Definition (R-matrix). We say a matrix S ∈ S_n is an R-matrix if S_ij decreases with |i − j|.

Seriation problem.

Pairwise similarity information S_ij on n variables. Suppose the data has a serial structure, i.e., there is an order π such that S^π, obtained by permuting the rows and columns of S, is an R-matrix. Recover π?

12

Seriation: a combinatorial problem

2-SUM. Assign similar items to nearby positions in the reordering, i.e., find a permutation π of items 1 to n that minimizes

Σ_{i,j=1}^n S_ij (π(i) − π(j))².

Theorem [F., Jenatton, Bach, d'Aspremont, 2013] Combinatorial solution. For R-matrices written S = RR^T, 2-SUM ⇐⇒ seriation.

Generalized by [Laurent and Seminaroti, 2015] to products of R and anti-R matrices.



The 2-SUM problem is NP-Complete for generic matrices S [George and Pothen, 1997].
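Although 2-SUM is NP-complete in general, it can be checked by brute force on tiny instances. A minimal sketch (hypothetical toy data, not the thesis code) verifying that the sorted order minimizes the 2-SUM objective for a small R-matrix:

```python
import itertools
import numpy as np

def two_sum(S, perm):
    """2-SUM objective: sum_ij S_ij * (perm[i] - perm[j])**2."""
    p = np.asarray(perm)
    return float(np.sum(S * (p[:, None] - p[None, :]) ** 2))

# Small R-matrix: S_ij = n - |i - j| decreases with |i - j|.
n = 5
idx = np.arange(n)
S = n - np.abs(idx[:, None] - idx[None, :])

# Brute force over all n! permutations (only feasible for tiny n).
best = min(itertools.permutations(range(n)), key=lambda p: two_sum(S, p))
```

As the theorem predicts, `best` comes out as the identity ordering (or its reversal, which gives the same objective value).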

13

A spectral relaxation

Spectral clustering.

Define the Laplacian of S as L_S = diag(S1) − S, so that

(1/2) Σ_{i,j=1}^n S_ij (π(i) − π(j))² = π^T L_S π.

Spectral relaxation of 2-SUM: allow the permutation vector to take non-integer values and solve

f = argmin_{1^T x = 0, ‖x‖_2 = 1} x^T L_S x.

f is the eigenvector associated with the second smallest eigenvalue of the Laplacian, called the Fiedler vector.

Output the permutation π s.t. f_π(1) ≤ ... ≤ f_π(n).
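A minimal numerical sketch of this recipe (assuming NumPy; not the code from the thesis): build L_S, take the eigenvector of the second smallest eigenvalue, and sort by it.

```python
import numpy as np

def spectral_order(S):
    """Order items by the Fiedler vector of L_S = diag(S 1) - S."""
    L = np.diag(S.sum(axis=1)) - S
    _, eigvecs = np.linalg.eigh(L)   # eigenpairs in ascending eigenvalue order
    fiedler = eigvecs[:, 1]          # second smallest: the Fiedler vector
    return np.argsort(fiedler)

# Scramble a strict R-matrix and check the order is recovered (up to reversal).
rng = np.random.default_rng(0)
n = 8
idx = np.arange(n)
S = n - np.abs(idx[:, None] - idx[None, :])
perm = rng.permutation(n)
order = spectral_order(S[np.ix_(perm, perm)])
recovered = perm[order]   # should read 0..n-1 forwards or backwards
```

The sign of an eigenvector is arbitrary, so the recovered order may be reversed; both are valid seriation solutions.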

14

A spectral solution

The Fiedler vector reorders an R-matrix in the noiseless case.

Theorem [Atkins, Boman, Hendrickson, et al., 1998] Spectral seriation. Suppose S ∈ S_n is a pre-R matrix, with a simple Fiedler value whose Fiedler vector f has no repeated values. If Π ∈ P is such that the permuted Fiedler vector Πf is monotonic, then ΠSΠ^T is an R-matrix.

15

Spectral solution

Spectral solution. 

Exact for R-matrices.



Quite robust to small noise. Arguments similar to perturbation results in spectral clustering.



Scales very well, especially when similarity matrix is sparse (as in DNA sequencing and ranking).

Issues. 

What if the data is noisy and outside the spectral perturbation regime? (The spectral solution is only stable when the noise satisfies ‖ΔL‖_2 ≤ (λ_2 − λ_3)/2.)



What if we have additional structural information?

16

2-SUM with permutation matrices

Permutation matrices.



Permutation matrix Π associated to permutation vector π: a {0, 1} matrix s.t. Π_ij = 1 iff π(i) = j. Let g = (1, ..., n)^T; then Πg = π.



Reformulate 2-SUM over the set of permutation matrices P_n:

minimize g^T Π^T L_A Π g
subject to Π ∈ P_n.
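A quick check of this convention (0-indexed for convenience, assuming NumPy):

```python
import numpy as np

def perm_matrix(pi):
    """Permutation matrix with Pi[i, j] = 1 iff pi[i] = j."""
    n = len(pi)
    Pi = np.zeros((n, n))
    Pi[np.arange(n), pi] = 1.0
    return Pi

pi = np.array([2, 0, 3, 1])
Pi = perm_matrix(pi)
g = np.arange(len(pi))   # 0-indexed analogue of g = (1, ..., n)^T
```

With this convention Pi @ g recovers pi, and every row and column of Pi sums to one, i.e. permutation matrices are doubly stochastic.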

17

Convex relaxation over doubly stochastic matrices

Convex relaxation.



The convex hull of the set of permutation matrices P_n is the set of doubly stochastic matrices

D_n = {X ∈ R^{n×n} : X ≥ 0, X1 = 1, X^T 1 = 1}.



P_n = D_n ∩ O_n, i.e. Π is a permutation matrix if and only if Π is both doubly stochastic and orthogonal.



Relax the orthogonality constraint to obtain a convex QP (cf. [Clémençon, Jakubowicz, 2010] for a similar relaxation):

minimize g^T Π^T L_A Π g
subject to Π ∈ D_n.

18

Convex relaxation

A few more things.

minimize Tr(Y^T Π^T L_A Π Y) − µ‖PΠ‖_F²
subject to e_1^T Πg + 1 ≤ e_n^T Πg,
Π1 = 1, Π^T 1 = 1, Π ≥ 0,

in the variable Π ∈ R^{n×n}, where P = I − (1/n)11^T, e_i = (0, . . . , 0, 1, 0, . . . , 0) is the i-th unit vector, and Y ∈ R^{n×p} is a matrix whose columns are small perturbations of g = (1, . . . , n)^T.

19

Convex relaxation

Objective. Minimize Tr(Y^T Π^T L_A Π Y) − µ‖PΠ‖_F².

2-SUM term: Tr(Y^T Π^T L_A Π Y) = Σ_{i=1}^p y_i^T Π^T L_A Π y_i, where the y_i are small perturbations of the vector g = (1, . . . , n)^T.

Orthogonalization penalty: −µ‖PΠ‖_F², where P = I − (1/n)11^T.
◦ Among all DS matrices, rotations (hence permutations) have the highest Frobenius norm.
◦ Setting µ ≤ λ_2(L_A) λ_1(YY^T) keeps the problem a convex QP.

Constraints. 



e_1^T Πg + 1 ≤ e_n^T Πg breaks degeneracies by imposing π(1) + 1 ≤ π(n). Without it, both monotonic solutions are optimal and this degeneracy can significantly deteriorate relaxation performance.

Π1 = 1, Π^T 1 = 1 and Π ≥ 0 keep Π doubly stochastic.

20

Convex relaxation

Solvers.



Sampling permutations. We can generate permutations from a doubly stochastic matrix D:
◦ Sample a monotonic random vector u.

◦ Recover a permutation by reordering Du.
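A sketch of this rounding step (assuming NumPy; for the check below, D is taken to be a permutation matrix itself, so the recovered order is known exactly):

```python
import numpy as np

def sample_permutation(D, rng):
    """Round a doubly stochastic D to a permutation: sort the entries of D u."""
    u = np.sort(rng.uniform(size=D.shape[0]))   # monotonic random vector
    return np.argsort(D @ u)

rng = np.random.default_rng(0)
p = np.array([2, 0, 3, 1])
Pi = np.eye(4)[p]          # a permutation matrix is doubly stochastic
order = sample_permutation(Pi, rng)
```

When D is exactly a permutation matrix, the sampled order inverts it: applying p after order gives back the identity.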



Algorithms. Large QP; projecting on doubly stochastic matrices can be done very efficiently using block coordinate descent on the dual. We use accelerated first-order methods.

21

Semi-supervised seriation

We can add structural constraints to the QP relaxation, reflecting for instance known distances between items, e.g.,

a ≤ π(i) − π(j) ≤ b is written a ≤ e_i^T Πg − e_j^T Πg ≤ b,

i.e., linear constraints in Π. Example: “mate reads” in DNA sequencing (Illumina technology).

22

Convex relaxation

Approximation bounds.

A lot of work on relaxations for orthogonality constraints, e.g. SDPs in [Nemirovski, 2007, Coifman et al., 2008, So, 2011]. All of this could be used here.

Simple idea: Q^T Q = I is a quadratic constraint on Q; lift it. This forms an SDP of dimension O(n⁴), e.g. O(n⁹) complexity for naive IPM implementations.

O(√log n) approximation bounds for some instances of Minimum Linear Arrangement [Even et al., 2000, Feige, 2000, Blum et al., 2000, Rao and Richa, 2005, Feige and Lee, 2007, Charikar et al., 2010]. Usual tradeoff with SDP relaxations: higher complexity but easier to quantify approximation quality.

Our relaxation is a simpler QP. No approximation bounds at this point, however.

23

Outline



Introduction



Seriation: spectral and convex relaxations



Ranking as a seriation problem



Numerical results

24

Ranking from pairwise comparisons



Given n items and pairwise comparisons

item_i ≻ item_j, for (i, j) ∈ C,

Find a global ranking π of these items

item_π(1) ≻ item_π(2) ≻ . . . ≻ item_π(n).

25

Ranking from pairwise comparisons

Classical problem, many algorithms (roughly sorted by increasing complexity). 

Scores. Borda count, Elo rating system (chess), TrueSkill [Herbrich et al., 2006], etc.



Spectral methods. [Dwork et al., 2001, Negahban et al., 2012]



MLE based algorithms. [Bradley and Terry, 1952, Luce, 1959, Herbrich et al., 2006]



Learning to rank. Ranking via least-squares [Jiang et al., 2011]; learning to rank [Schapire et al., 1998, Joachims, 2002, Rajkumar and Agarwal, 2014]

See forthcoming book by Milan Vojnovic on the subject . . .

26

From ranking to seriation

Similarity matrices from pairwise comparisons. 

Given pairwise comparisons C ∈ {−1, 0, 1}^{n×n} with

C_ij = 1 if i is ranked higher than j,
C_ij = 0 if i and j are not compared or in a draw,   (1)
C_ij = −1 if j is ranked higher than i,

define the pairwise similarity matrix S^match as

S^match_ij = Σ_{k=1}^n (1 + C_ik C_jk) / 2.   (2)

S^match_ij counts the number of matching comparisons between i and j with other reference items k. In a tournament setting: players that beat the same players and are beaten by the same players should have a similar ranking.
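A sketch of (2) in the full-information case (assuming NumPy; the diagonal convention C_ii = 1, which makes S^match come out exactly as n − |i − j|, is an assumption here, not spelled out on the slide):

```python
import numpy as np

def s_match(C):
    """S^match_ij = sum_k (1 + C_ik C_jk) / 2 = (n 11^T + C C^T) / 2."""
    n = C.shape[0]
    return (n + C @ C.T) / 2.0

# All comparisons consistent with the ranking 1, ..., n (item i beats item j
# when i < j); we also set C_ii = 1, an assumed convention for this check.
n = 6
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
C = np.where(i <= j, 1, -1)
S = s_match(C)   # a strict R-matrix: S_ij = n - |i - j|
```

With every comparison observed and consistent, S is a strict R-matrix, so spectral seriation on it recovers the ranking.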

27

From ranking to seriation

Proposition [F., d'Aspremont, Vojnovic, 2014] Similarity from preferences. Given all comparisons between items ranked linearly, the similarity matrix S^match is a strict R-matrix and

S^match_ij = n − |i − j| for all i, j = 1, . . . , n.

This means that, given all pairwise comparisons, spectral ordering on S^match will recover the true ranking.

28

Robustness: a graphical argument

[Figure: retrieved rank vs. item (1 to 10) under SerialRank and Point Score; panels: Ranking, Comparison matrix, Similarity matrix.]

With all comparisons given, corrupted entries induce ties in score-based ranking but not in similarity-based ranking.

29

Robustness of SerialRank to corrupted entries

Theorem [F., d'Aspremont, Vojnovic, 2014] Robustness to corrupted entries. Given a comparison matrix for a set of n items with m corrupted comparisons selected uniformly at random from the set of all possible item pairs, the probability of recovery p(n, m) using SerialRank satisfies p(n, m) ≥ 1 − δ, provided that m = O(√(δn)).

30

Sample complexity

Point score [Wauthier et al., 2013]

Ordinal comparisons are sampled independently with fixed probability (same setting as ours).

Rank items according to their point score Σ_{j=1}^n C̃_ij for each item i.

For 0 < µ < 1, sampling Ω(n log n / µ²) comparisons guarantees that

max_j |π̃(j) − π(j)| < µn

with high probability for n large enough.

This bound provides not only sample complexity, but also information on local displacements of the retrieved ranking.

→ similar bounds for SerialRank?

31

Perturbation analysis



Derive analytical expression of Fiedler vector when all comparisons are observed and consistent with an underlying ranking.



Use perturbation results (i.e., Davis-Kahan) in order to bound the perturbation of the Fiedler vector with missing/corrupted comparisons.



Get ℓ_2 and ℓ_∞ approximation bounds for SerialRank in settings where only a few comparisons are available.

32

Perturbation analysis

Setting: Erdős-Rényi graph with ordinal comparisons.

Take w.l.o.g. π = (1, 2, . . . , n).

Associated ordinal comparisons C_ij = 1 iff i ≥ j.

Pairs are sampled i.i.d. with probability q; sampled comparisons are flipped i.i.d. with probability p.

C̃ is the matrix of observed comparisons:

C̃_ij = 0 with probability 1 − q,
C̃_ij = C_ij with probability q(1 − p),
C̃_ij = −C_ij with probability qp.

33

Perturbation analysis: ℓ_2 approximation bound

[F., d'Aspremont, Vojnovic, 2015]

ℓ_2 approximation bound on Fiedler vector. For every µ ∈ (0, 1) and n large enough, sampling

Ω(n log⁴ n / (µ² (2p − 1)⁴))

comparisons guarantees that

‖f̃ − f‖_2 ≤ µ / √(log n)

with probability at least 1 − 2/n.

34

Perturbation analysis: ℓ_∞ approximation bound

Bound local displacements of the ranking. [F., d'Aspremont, Vojnovic, 2015]

ℓ_∞ approximation bound on retrieved ranking. For every µ ∈ (0, 1) and n large enough, sampling

Ω(n√n log⁴ n / (µ² (2p − 1)⁴))

comparisons guarantees that

max_j |π̃(j) − π(j)| ≤ µn

with probability at least 1 − 2/n.

35

Sketch of the proof

ℓ_2 norm perturbation of the Fiedler vector.

◦ Show that the normalized Laplacian L = I − D⁻¹S has a linear Fiedler vector and bound the eigengap (D = diag(S1)).
◦ Bound the perturbation of the degree matrix ‖D̃ − D‖_2 using Bernstein's concentration inequality.
◦ Bound the perturbation of the similarity matrix ‖S̃ − S‖_2 using results from [Achlioptas and McSherry, 2007].
◦ Bound the perturbation of the Laplacian matrix ‖L̃ − L‖_2.
◦ Deduce a bound on the ℓ_2 norm perturbation of the Fiedler vector ‖f̃ − f‖_2 using the Davis-Kahan theorem.

ℓ_∞ norm perturbation of the permutation vector.

◦ Bound the ℓ_∞ norm perturbation of the Fiedler vector ‖f̃ − f‖_∞ using the previous results and additional component-wise bounds.
◦ Use the linearity of the Fiedler vector to derive a bound on the ℓ_∞ norm perturbation of the permutation vector ‖π̃ − π‖_∞.

36

Outline



Introduction



Seriation: spectral and convex relaxations



Ranking as a seriation problem



Numerical results

37

Ranking quality

When true ranking is known. 

Kendall rank correlation coefficient τ ∈ [−1, 1]:

τ = [(number of concordant pairs) − (number of discordant pairs)] / (n(n − 1)/2)
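For concreteness, a direct implementation of τ over two rank vectors (a sketch, not the evaluation code used in the thesis):

```python
from itertools import combinations

def kendall_tau(pi_true, pi_hat):
    """tau = (concordant - discordant) / (n (n - 1) / 2) over item pairs."""
    n = len(pi_true)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # A pair is concordant when both rankings order i and j the same way.
        s = (pi_true[i] - pi_true[j]) * (pi_hat[i] - pi_hat[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Identical rankings give τ = 1 and fully reversed rankings give τ = −1.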

When true ranking is unknown (real datasets). 

Count the fraction of inverted pairs of the ranking relative to the input comparisons P π (j) < π ˆ (i)) C(i,j)