Pushing the Limits of Affine Rank Minimization by Adapting Probabilistic PCA
(Bayesian Affine Rank Minimization)

Bo Xin (Peking University)
David Wipf (Microsoft Research, Beijing)

July 8, 2015

Outline

1. Introduction
2. Bayesian Affine Rank Minimization
3. Experimental Results
4. Conclusions

Section 1: Introduction

Rank Minimization under Affine Constraints

General problem:
\[
\min_{X \in \mathbb{R}^{n \times m}} \operatorname{rank}(X) \quad \text{s.t.} \quad \mathcal{A}(X) = b, \qquad \mathcal{A}: \mathbb{R}^{n \times m} \to \mathbb{R}^p. \tag{1}
\]

Special case, matrix completion:
\[
\min_{X \in \mathbb{R}^{n \times m}} \operatorname{rank}(X) \quad \text{s.t.} \quad X_{ij} = (X_0)_{ij}, \ (i,j) \in \Omega, \ |\Omega| = p. \tag{2}
\]

Both problems are NP-hard.
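As a concrete instance of problem (2), a synthetic matrix completion task can be generated as follows (a minimal numpy sketch; the function name and defaults are illustrative, not from the talk):

```python
import numpy as np

def make_completion_problem(n=150, m=150, r=5, frac_observed=0.5, seed=0):
    """Generate a rank-r ground truth X0 and a uniformly random observation mask Omega."""
    rng = np.random.default_rng(seed)
    # X0 = ML @ MR has rank at most r (and exactly r with probability 1)
    ML = rng.standard_normal((n, r))
    MR = rng.standard_normal((r, m))
    X0 = ML @ MR
    # Observe p = frac_observed * n * m entries chosen uniformly at random
    p = int(frac_observed * n * m)
    idx = rng.choice(n * m, size=p, replace=False)
    Omega = np.zeros(n * m, dtype=bool)
    Omega[idx] = True
    Omega = Omega.reshape(n, m)
    b = X0[Omega]  # the affine measurements A(X0) = b
    return X0, Omega, b

X0, Omega, b = make_completion_problem()
print(np.linalg.matrix_rank(X0), Omega.sum())  # 5 11250
```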

Common Solutions

The target:
\[
\min_{X \in \mathbb{R}^{n \times m}} \operatorname{rank}(X) \quad \text{s.t.} \quad \mathcal{A}(X) = b, \qquad \mathcal{A}: \mathbb{R}^{n \times m} \to \mathbb{R}^p. \tag{3}
\]

Common surrogates:
\[
\min_{X \in \mathbb{R}^{n \times m}} \sum_i f(\sigma_i[X]) \quad \text{s.t.} \quad \mathcal{A}(X) = b. \tag{4}
\]

Special cases:
- f(z) = I[z ≠ 0] → matrix rank.
- f(z) = z → the (commonly applied) convex nuclear norm.
- f(z) = log(z) or f(z) = z^q with q ≤ 1 → non-convex surrogates.
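The surrogate family (4) acts separably on the singular values, so the common choices of f can be compared directly; a small sketch (the helper name is illustrative):

```python
import numpy as np

def rank_surrogate(X, f):
    """Apply a separable penalty f to each singular value of X and sum."""
    s = np.linalg.svd(X, compute_uv=False)
    return sum(f(si) for si in s)

X = np.diag([3.0, 2.0, 0.0])

exact   = rank_surrogate(X, lambda z: float(z > 1e-10))      # f(z) = I[z != 0] -> rank
nuclear = rank_surrogate(X, lambda z: z)                     # f(z) = z -> nuclear norm
gamma = 1e-5
log_pen = rank_surrogate(X, lambda z: np.log(z**2 + gamma))  # non-convex log penalty

print(exact, nuclear)  # 2.0 5.0
```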

Problems

[Figure: Plots of different surrogates for matrix rank in a 1D feasible subspace (nuclear norm vs. IRLS0 with γ = 1, 1e-5, 1e-10). The convex nuclear norm does not retain the correct global minimum. In contrast, although the non-convex \(\sum_i \log(\sigma_i[X]^2 + \gamma)\) penalty exhibits the correct minimum when γ is sufficiently small, it also contains spurious local minima.]

Key issues:
- Nuclear norm: strong assumptions on the measurement process (A) and the underlying matrix (X_0).
- Non-convex surrogates: convergence to local minima (and tuning parameters to set).

As a Consequence

[Figure: Plots from Lu et al. (2014) showing performance on a matrix completion task.]

Remarks:
- Degrees of freedom of an n × m rank-r matrix: r(m + n) - r².
- Theoretical limit: p = r(m + n) - r².
- Specifically, here m = n = 150 and p = 0.5 × 150² = 11250, so the largest recoverable rank is r = 43.
- The nuclear norm fails at r = 24; the current best non-convex surrogate fails at r = 33.
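The counting argument above is easy to check numerically: recovery from p measurements is information-theoretically hopeless once r(m + n) - r² exceeds p. A minimal sketch:

```python
def degrees_of_freedom(r, n, m):
    """Degrees of freedom of an n x m rank-r matrix."""
    return r * (m + n) - r * r

def max_recoverable_rank(p, n, m):
    """Largest r with r*(m+n) - r^2 <= p, i.e. the theoretical limit."""
    r = 0
    while degrees_of_freedom(r + 1, n, m) <= p:
        r += 1
    return r

# The setting from Lu et al. (2014): n = m = 150, half the entries observed
n = m = 150
p = (n * m) // 2          # 11250 measurements
print(max_recoverable_rank(p, n, m))  # 43
```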

Contributions

Key contributions:
- A deceptively simple, parameter-free algorithm.
- Strong empirical performance all the way to the theoretical limit, which has not been demonstrated previously.
- Theoretical analysis explaining these substantial performance gains.

Section 2: Bayesian Affine Rank Minimization

Basic Model

View the problem from a Bayesian perspective:
\[
p(b \mid X; \mathcal{A}, \lambda) \propto \exp\!\left(-\frac{1}{2\lambda}\,\|\mathcal{A}(X) - b\|_2^2\right), \tag{5}
\]
\[
p(X; \Psi, \nu) = \prod_i \mathcal{N}(x_{:i};\, 0,\, \nu_i \Psi) \propto \exp\!\left(-\tfrac{1}{2}\, x^\top \bar{\Psi}^{-1} x\right), \tag{6}
\]
where \(x = \operatorname{vec}[X]\) and \(\bar{\Psi} = \operatorname{diag}[\nu] \otimes \Psi\). The posterior mean is
\[
\hat{x} = \operatorname{vec}[\hat{X}] = \bar{\Psi} A^\top \left(\lambda I + A \bar{\Psi} A^\top\right)^{-1} b. \tag{7}
\]

Choose Ψ and ν by maximizing the likelihood marginalized over X:
\[
\max_{\Psi \in H^+,\, \nu \geq 0} \int p(b \mid X; \mathcal{A}, \lambda)\, p(X; \Psi, \nu)\, dX. \tag{8}
\]

After a -2 log transformation, this leads to a new objective
\[
\mathcal{L}(\Psi, \nu) = b^\top \Sigma_b^{-1} b + \log |\Sigma_b|, \qquad \Sigma_b = A \bar{\Psi} A^\top + \lambda I. \tag{9}
\]
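Given Ψ, ν, and λ, the posterior mean (7) is a single linear solve; a minimal numpy sketch, building \(\bar{\Psi} = \operatorname{diag}[\nu] \otimes \Psi\) explicitly (only viable for small problems, and the function name is illustrative):

```python
import numpy as np

def posterior_mean(A, b, Psi, nu, lam):
    """x_hat = Psi_bar A^T (lam I + A Psi_bar A^T)^{-1} b, Psi_bar = diag(nu) kron Psi."""
    Psi_bar = np.kron(np.diag(nu), Psi)          # (nm x nm) prior covariance of vec[X]
    Sigma_b = A @ Psi_bar @ A.T + lam * np.eye(A.shape[0])
    return Psi_bar @ A.T @ np.linalg.solve(Sigma_b, b)

# tiny smoke test: n = m = 2, p = 3 random measurements of a random x
rng = np.random.default_rng(0)
n = m = 2
A = rng.standard_normal((3, n * m))
x_true = rng.standard_normal(n * m)
b = A @ x_true
x_hat = posterior_mean(A, b, np.eye(n), np.ones(m), lam=1e-10)
print(np.allclose(A @ x_hat, b, atol=1e-5))  # True: the estimate is feasible as lam -> 0
```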

Updating Rules

Construct upper bounds for both terms:
\[
b^\top \Sigma_b^{-1} b \leq \frac{1}{\lambda}\,\|b - A x\|_2^2 + x^\top \bar{\Psi}^{-1} x, \tag{10}
\]
\[
\log |\Sigma_b| \equiv m \log |\Psi| + \log \left|\lambda^{-1} A^\top A + \bar{\Psi}^{-1}\right| + C
\leq m \log |\Psi| + \operatorname{tr}\!\left[\Psi^{-1} \nabla_{\Psi^{-1}}\right] + C, \tag{11}
\]
\[
\nabla_{\Psi^{-1}} = \sum_{i=1}^{m} \left[\Psi - \Psi A_i^\top \left(A \bar{\Psi} A^\top + \lambda I\right)^{-1} A_i \Psi\right]. \tag{12}
\]

Checking the equality conditions gives the updates
\[
\hat{x} = \operatorname{vec}[\hat{X}] = \bar{\Psi} A^\top \left(\lambda I + A \bar{\Psi} A^\top\right)^{-1} b, \tag{13}
\]
\[
\Psi^{\mathrm{opt}} = \arg\min_{\Psi} \operatorname{tr}\!\left[\Psi^{-1}\left(\hat{X}\hat{X}^\top + \nabla_{\Psi^{-1}}\right)\right] + m \log |\Psi|
= \frac{1}{m}\left[\hat{X}\hat{X}^\top + \nabla_{\Psi^{-1}}\right]. \tag{14}
\]
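One pass of the updates (13)-(14) can be sketched as follows. This is a simplification with ν fixed at 1, where A_i denotes the block of columns of A acting on column i of vec[X]; it is an illustrative sketch, not the authors' released code:

```python
import numpy as np

def barm_update(A, b, Psi, lam, n, m):
    """One pass of the (non-symmetric) BARM updates (13)-(14), with nu fixed at 1."""
    Psi_bar = np.kron(np.eye(m), Psi)            # diag[nu] kron Psi, with nu = 1
    Sigma_b = A @ Psi_bar @ A.T + lam * np.eye(A.shape[0])
    Sigma_b_inv = np.linalg.inv(Sigma_b)
    x_hat = Psi_bar @ A.T @ Sigma_b_inv @ b      # posterior mean, eq. (13)
    X_hat = x_hat.reshape(n, m, order="F")       # columns of X stacked in x
    grad = np.zeros((n, n))                      # eq. (12), summed over column blocks A_i
    for i in range(m):
        Ai = A[:, i * n:(i + 1) * n]
        grad += Psi - Psi @ Ai.T @ Sigma_b_inv @ Ai @ Psi
    Psi_new = (X_hat @ X_hat.T + grad) / m       # covariance update, eq. (14)
    return X_hat, Psi_new

# smoke test on a tiny rank-1 completion problem
rng = np.random.default_rng(1)
n = m = 8
X0 = np.outer(rng.standard_normal(n), rng.standard_normal(m))   # rank 1
mask = rng.random(n * m) < 0.6
A = np.eye(n * m)[mask]                          # entry-selection operator
b = A @ X0.flatten(order="F")
Psi = np.eye(n)
for _ in range(20):
    X_hat, Psi = barm_update(A, b, Psi, lam=1e-9, n=n, m=m)
s = np.linalg.svd(X_hat, compute_uv=False)
print(s[:3])  # after enough iterations the spectrum should concentrate at rank 1
```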

Symmetric Improvements

Use a prior that treats rows and columns symmetrically:
\[
\bar{\Psi} = \tfrac{1}{2}\left(\Psi_r \otimes I + I \otimes \Psi_c\right). \tag{15}
\]

The corresponding updating rules:
\[
\hat{x} = \operatorname{vec}[\hat{X}] = \tfrac{1}{2}\left(\bar{\Psi}_r + \bar{\Psi}_c\right) A^\top \left(\lambda I + \tfrac{1}{2}\, A \left(\bar{\Psi}_r + \bar{\Psi}_c\right) A^\top\right)^{-1} b, \tag{16}
\]
\[
\nabla_{\Psi_r^{-1}} = \sum_{i=1}^{m} \left[\Psi_r - \Psi_r A_{r_i}^\top \left(A \bar{\Psi} A^\top + \lambda I\right)^{-1} A_{r_i} \Psi_r\right], \tag{17}
\]
\[
\nabla_{\Psi_c^{-1}} = \sum_{i=1}^{n} \left[\Psi_c - \Psi_c A_{c_i}^\top \left(A \bar{\Psi} A^\top + \lambda I\right)^{-1} A_{c_i} \Psi_c\right], \tag{18}
\]
\[
\Psi_r^{\mathrm{opt}} = \frac{1}{n}\left[\hat{X}^\top \hat{X} + \nabla_{\Psi_r^{-1}}\right], \qquad
\Psi_c^{\mathrm{opt}} = \frac{1}{m}\left[\hat{X}\hat{X}^\top + \nabla_{\Psi_c^{-1}}\right]. \tag{19}
\]

Theoretical Inquiry: Global/Local Minima Analysis

Lemma (Global optimum always retained). Define r as the smallest rank of any feasible solution to b = A vec[X], where A ∈ R^{p×nm} satisfies spark[A] = p + 1. Then if r < p/m, any global minimizer {Ψ*, ν*} of (9) in the limit λ → 0 is such that x* = Ψ̄* A^⊤ (A Ψ̄* A^⊤)† b is feasible and rank[X*] = r, with vec[X*] = x*.

Lemma (Scale invariance). Let Ã = AD, where D = diag[α₁Γ, …, α_m Γ] is a block-diagonal matrix with invertible blocks Γ ∈ R^{n×n} of unit norm, scaled by coefficients α_i > 0. Then {Ψ*, ν*} is a minimizer (global or local) of (9) in the limit λ → 0 iff {Γ^{-1}Ψ*, diag[α]^{-1}ν*} is a minimizer when Ã replaces A. The corresponding estimates of X are likewise in one-to-one correspondence.

Theorem (Local minima smoothed away). Let b = A vec[X], where A is block diagonal with blocks A_i ∈ R^{p_i×n}. Moreover, assume p_i > 1 for all i and that ∩_i null[A_i] = ∅. Then if min_X rank[X] = 1 over the feasible region, any minimizer {Ψ*, ν*} of (9) (global or local) in the limit λ → 0 is such that x* = Ψ̄* A^⊤ (A Ψ̄* A^⊤)† b is feasible and rank[X*] = 1, with vec[X*] = x*. Furthermore, no cost function of the form (4) can satisfy the same result: there can always exist local and/or global minima with rank greater than one.

Illustration

[Figure: Plots of the nuclear norm, IRLS0 (γ = 1, 1e-5, 1e-10), and BARM surrogates for matrix rank in a 1D feasible subspace. The BARM cost function smooths away local minima while simultaneously retaining the correct global optimum.]

Section 3: Experimental Results

Matrix Completion

Task: X_0 = M_L M_R, with M_L ∈ R^{n×r} and M_R ∈ R^{r×m} (n = m = 150) drawn iid N(0, 1). 50% of all entries are then hidden uniformly at random. FoS denotes frequency of success, where success means relative error
\[
\mathrm{REL} = \frac{\|X_0 - \hat{X}\|_F}{\|X_0\|_F} \leq 10^{-3}.
\]

[Figure: FoS versus rank for VSBL, nuclear norm, IRNN_best, IRLS0, and BARM, against the theoretical limit; a reproduction of the task in Lu et al. (2014).]

Remark: BARM significantly outperforms the state of the art and reaches the theoretical limit.
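The success criterion above can be coded directly (a short sketch; the solver itself is left out, since any of the compared methods could produce the estimates):

```python
import numpy as np

def rel_error(X0, X_hat):
    """Relative Frobenius-norm error used for the FoS criterion."""
    return np.linalg.norm(X0 - X_hat) / np.linalg.norm(X0)

def frequency_of_success(trials, threshold=1e-3):
    """FoS: fraction of (X0, X_hat) pairs recovered to relative error <= threshold."""
    return np.mean([rel_error(X0, Xh) <= threshold for X0, Xh in trials])

# toy check: one perfect recovery, one total failure
X0 = np.ones((3, 3))
print(frequency_of_success([(X0, X0.copy()), (X0, np.zeros((3, 3)))]))  # 0.5
```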

Comparisons with Rank-Aware Algorithms

Task: X_0 = M_L M_R, with M_L ∈ R^{n×r} and M_R ∈ R^{r×m} (n = m = 150) drawn iid N(0, 1). For the decaying-spectrum variant, take [U, S, V] = svd(X_0), s = diag[S], rescale s_i ← s_i · (1/i)^{0.8} for all i, and set X_0 = U diag[s] V^⊤. 50% of all entries are then hidden uniformly at random.

[Figure: FoS versus rank comparing NIHT, alternating minimization (Alter), and BARM against the theoretical limit. (a) X iid Gaussian; (b) X with decaying singular values. BARM has no knowledge of the true rank.]
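The decaying-spectrum variant replaces the flat singular spectrum of the Gaussian factor model with one rescaled by (1/i)^{0.8} (my reading of the garbled formula above); a sketch of that construction:

```python
import numpy as np

def decaying_spectrum(X0, power=0.8):
    """Rescale the i-th singular value of X0 by (1/i)^power (i = 1, 2, ...)."""
    U, s, Vt = np.linalg.svd(X0, full_matrices=False)
    i = np.arange(1, len(s) + 1)
    s = s * (1.0 / i) ** power
    return U @ np.diag(s) @ Vt

rng = np.random.default_rng(0)
r, n, m = 5, 150, 150
X0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
X = decaying_spectrum(X0)
s = np.linalg.svd(X, compute_uv=False)
print(np.linalg.matrix_rank(X))  # 5: rescaling keeps the rank, only the decay changes
```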

General Problems

Task: X_0 = M_L M_R, with M_L ∈ R^{n×r} and M_R ∈ R^{r×m} (n = m = 100) drawn iid N(0, 1).
- Uncorrelated A: iid N(0, 1), a p × n² matrix (p = 1000).
- Correlated A: Σ_{i=1}^{p} i^{-1/2} u_i v_i^⊤, where u_i ∈ R^p and v_i ∈ R^{n²} are iid N(0, 1) vectors.

[Figure: REL and FoS versus rank for nuclear norm, IRLS0, and BARM against the theoretical limit. (a) A uncorrelated; (b) A correlated.]

Remark: Over a wide battery of empirical tests, BARM consistently reaches the theoretical limit.
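The correlated operator is a sum of scaled rank-one terms, which makes its spectrum decay and the inverse problem harder. A sketch of both constructions (helper names are illustrative, and small sizes are used so the dense sum stays cheap):

```python
import numpy as np

def uncorrelated_A(p, n, rng):
    """iid N(0, 1) measurement matrix of size p x n^2."""
    return rng.standard_normal((p, n * n))

def correlated_A(p, n, rng):
    """A = sum_i i^{-1/2} u_i v_i^T with iid Gaussian u_i in R^p, v_i in R^{n^2}."""
    A = np.zeros((p, n * n))
    for i in range(1, p + 1):
        u = rng.standard_normal(p)
        v = rng.standard_normal(n * n)
        A += i ** (-0.5) * np.outer(u, v)
    return A

rng = np.random.default_rng(0)
A = correlated_A(p=50, n=10, rng=rng)
s = np.linalg.svd(A, compute_uv=False)
print(A.shape)  # (50, 100); the decaying weights make the spectrum of A decay too
```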

Pushing the Limit: A Closer Look at Failure Cases

[Figure: Singular value averages of failure cases, at p = (m + n)r - r².]

Remarks:
- Solutions of the correct minimal rank are obtained even though X̂ ≠ X_0.
- Define rank success as σ_r[X̂]/σ_{r+1}[X̂] > 10³, where r is the rank of the true low-rank X_0. FoRS denotes frequency of rank success.
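The rank-success criterion is a simple spectral-gap test; a sketch (the function name is illustrative):

```python
import numpy as np

def rank_success(X_hat, r, ratio=1e3):
    """Rank success: sigma_r / sigma_{r+1} > 10^3, i.e. a clear spectral gap at rank r."""
    s = np.linalg.svd(X_hat, compute_uv=False)
    return s[r - 1] / max(s[r], np.finfo(float).tiny) > ratio

# a matrix that is numerically rank 2 but not exactly
X = np.diag([1.0, 0.5, 1e-6, 1e-7])
print(rank_success(X, r=2), rank_success(X, r=3))  # True False
```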

Pushing the Limit: A Closer Look at Failure Cases (cont.)

Table: Further matrix completion comparisons of BARM with IRLS0 by pushing the limits.

      Problem              IRLS0          BARM
FR    n (= m)   r      FoS    FoRS    FoS    FoRS
0.90    100    14       0      0      1.0    1.0
0.95    100    14       0      0      0.8    1.0
0.99    100    14       0      0      0.7    1.0

Remarks:
- BARM failures are converted to successes under the FoRS metric.
- The other algorithms display almost identical behavior under either metric.

Application: Low-Rank Image Rectification

Task: Construct a first-order Taylor series approximation around the current rectified image estimate; this reduces the problem to rank minimization under general affine constraints.

[Figure: Image rectification comparisons using a checkerboard image: (a) nuclear norm (easy), (b) BARM (easy), (c) nuclear norm (hard), (d) BARM (hard). Top: original image with observed region (red box) and estimated transformation (green box). Bottom: rectified image estimates.]

Section 4: Conclusions

Conclusions and Discussions

Discussion:
- The model is justified by technical considerations rather than the legitimacy of its priors.
- Computational complexity: in the worst case, scales linearly in the number of elements of X and quadratically in the number of observations.
- More challenging tests would be interesting and could further demonstrate the robustness of BARM.
- Extension to tensor analysis would be interesting.

Conclusions and Discussions (cont.)

Take-home message: a deceptively simple, parameter-free algorithm with very strong empirical performance and supporting theoretical analysis.

Code: http://idm.pku.edu.cn/staff/boxin/

Thank you!

References

Babacan, S. D., Luessi, M., Molina, R., and Katsaggelos, A. K. Sparse Bayesian methods for low-rank matrix estimation. IEEE Transactions on Signal Processing, 60(8):3964-3977, 2012.

Chandrasekaran, V., Recht, B., Parrilo, P. A., and Willsky, A. S. The convex geometry of linear inverse problems. Foundations of Computational Mathematics, 12(6):805-849, 2012.

Ding, X., He, L., and Carin, L. Bayesian robust principal component analysis. IEEE Transactions on Image Processing, 20(12):3419-3430, 2011.

Hu, Y., Zhang, D., Ye, J., Li, X., and He, X. Fast and accurate matrix completion via truncated nuclear norm regularization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(9):2117-2130, 2013.

Jain, P., Netrapalli, P., and Sanghavi, S. Low-rank matrix completion using alternating minimization. In Proceedings of the 45th Annual ACM Symposium on Theory of Computing, pp. 665-674. ACM, 2013.

Léger, F., Yu, G., and Sapiro, G. Efficient matrix completion with Gaussian models. arXiv preprint arXiv:1010.4050, 2010.

Lu, C., Tang, J., Yan, S., and Lin, Z. Generalized nonconvex nonsmooth low-rank minimization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.

Mohan, K. and Fazel, M. Iterative reweighted algorithms for matrix rank minimization. Journal of Machine Learning Research (JMLR), 13(1):3441-3473, 2012.

Tanner, J. and Wei, K. Normalized iterative hard thresholding for matrix completion. SIAM Journal on Scientific Computing, 35(5):S104-S125, 2013.

Tipping, M. and Bishop, C. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B, 61(3):611-622, 1999.

Wipf, D. Non-convex rank minimization via an empirical Bayesian approach. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI), 2012.