Pushing the Limits of Affine Rank Minimization by Adapting Probabilistic PCA
Bo Xin¹, David Wipf²
¹Peking University, ²Microsoft Research, Beijing
July 8, 2015
Bo Xin and David Wipf
Bayesian Affine Rank Minimization
July 8, 2015
1 / 25
Outline
1. Introduction
2. Bayesian Affine Rank Minimization
3. Experimental Results
4. Conclusions
Section 1: Introduction
Rank Minimization under Affine Constraints

General problem:
$$\min_{X \in \mathbb{R}^{n \times m}} \operatorname{rank}(X) \quad \text{s.t.} \quad \mathcal{A}(X) = b, \qquad \mathcal{A}: \mathbb{R}^{n \times m} \to \mathbb{R}^p \tag{1}$$

Special case, matrix completion:
$$\min_{X \in \mathbb{R}^{n \times m}} \operatorname{rank}(X) \quad \text{s.t.} \quad X_{ij} = (X_0)_{ij}, \ (i,j) \in \Omega, \ |\Omega| = p \tag{2}$$

Both problems are NP-hard.
Common Solutions

The target:
$$\min_{X \in \mathbb{R}^{n \times m}} \operatorname{rank}(X) \quad \text{s.t.} \quad \mathcal{A}(X) = b, \qquad \mathcal{A}: \mathbb{R}^{n \times m} \to \mathbb{R}^p \tag{3}$$

Common surrogates:
$$\min_{X \in \mathbb{R}^{n \times m}} \sum_i f(\sigma_i[X]) \quad \text{s.t.} \quad \mathcal{A}(X) = b \tag{4}$$

Special cases:
- $f(z) = I[z \neq 0]$ → matrix rank.
- $f(z) = z$ → the (commonly applied) convex nuclear norm.
- $f(z) = \log(z)$ or $f(z) = z^q$ with $q \leq 1$ → non-convex penalties.
Problems

[Figure: Plots of the nuclear norm and IRLS0 ($\gamma = 1, 10^{-5}, 10^{-10}$) cost functions over a 1D feasible subspace ($\eta$), with the true (lowest rank) solution marked. The convex nuclear norm does not retain the correct global minimum. In contrast, although the non-convex $\sum_i \log(\sigma_i[X]^2 + \gamma)$ penalty exhibits the correct minimum when $\gamma$ is sufficiently small, it also contains spurious local minima.]
Key issues:
- Nuclear norm: strong assumptions on the measurement process ($\mathcal{A}$) and the underlying matrix ($X_0$).
- Non-convex surrogates: convergence to local minima (and sensitivity to tuning parameters).
As a Consequence

[Figure: Plots from Lu et al. (2014) showing performance on a matrix completion task.]

Remarks:
- Degrees of freedom of a rank-$r$ matrix: $r(m+n) - r^2$. Theoretical limit: $p = r(m+n) - r^2$.
- Specifically, here $m = n = 150$ and $p = 0.5 \times 150^2 = 11250$, so $r_{\text{possible}} = 43$.
- The nuclear norm fails at $r = 24$; the current best non-convex surrogate fails at $r = 33$.
Contributions
Key contributions:
- A deceptively simple, parameter-free algorithm.
- Strong empirical performance up to the theoretical limit, which has never been demonstrated previously.
- Theoretical inquiry into the source of these substantial performance gains.
Section 2: Bayesian Affine Rank Minimization
Basic Model

View it from a Bayesian perspective:
$$p(b \mid X; A, \lambda) \propto \exp\!\left[-\tfrac{1}{2\lambda}\,\|\mathcal{A}(X) - b\|_2^2\right] \tag{5}$$
$$p(X; \Psi, \nu) = \prod_i \mathcal{N}(x_{:i}; 0, \nu_i \Psi) \propto \exp\!\left[-\tfrac{1}{2}\, x^\top \bar{\Psi}^{-1} x\right] \tag{6}$$

The posterior mean estimator:
$$\hat{x} = \operatorname{vec}[\hat{X}] = \bar{\Psi} A^\top \left(\lambda I + A \bar{\Psi} A^\top\right)^{-1} b \tag{7}$$

Choose $\Psi$ by maximizing the likelihood marginalized over $X$:
$$\max_{\Psi \in \mathcal{H}^+,\, \nu \geq 0} \int p(b \mid X; A, \lambda)\, p(X; \Psi, \nu)\, dX \tag{8}$$

After a $-2\log$ transformation, this leads to a new objective
$$\mathcal{L}(\Psi, \nu) = b^\top \Sigma_b^{-1} b + \log|\Sigma_b|, \tag{9}$$
where $\Sigma_b = A \bar{\Psi} A^\top + \lambda I$ and $\bar{\Psi} = \operatorname{diag}[\nu] \otimes \Psi$.
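The posterior mean (7) can be computed directly. A sketch with toy sizes; the variable names and dimensions are illustrative assumptions, not from the paper's code:

```python
import numpy as np

# xhat = Psi_bar A^T (lam I + A Psi_bar A^T)^{-1} b, with
# Psi_bar = diag[nu] (x) Psi as in (6).
rng = np.random.default_rng(1)
n, m, p, lam = 5, 4, 12, 1e-3
A = rng.standard_normal((p, n * m))
b = rng.standard_normal(p)
Psi = np.eye(n)                      # shared column covariance
nu = np.ones(m)                      # per-column scale factors
Psi_bar = np.kron(np.diag(nu), Psi)  # nm x nm prior covariance

G = A @ Psi_bar @ A.T + lam * np.eye(p)          # Sigma_b
xhat = Psi_bar @ A.T @ np.linalg.solve(G, b)      # posterior mean (7)
Xhat = xhat.reshape((n, m), order="F")            # undo column-stacking vec[.]
```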
Updating Rules

Construct upper bounds for both terms:
$$b^\top \Sigma_b^{-1} b \leq \tfrac{1}{\lambda}\|b - Ax\|_2^2 + x^\top \bar{\Psi}^{-1} x \tag{10}$$
$$\log|\Sigma_b| \equiv m \log|\Psi| + \log\left|\lambda^{-1} A^\top A + \bar{\Psi}^{-1}\right| \leq m \log|\Psi| + \operatorname{tr}\left[\Psi^{-1} \nabla_{\Psi^{-1}}\right] + C, \tag{11}$$
$$\nabla_{\Psi^{-1}} = \sum_{i=1}^{m} \left[\Psi - \Psi A_i^\top \left(A \bar{\Psi} A^\top + \lambda I\right)^{-1} A_i \Psi\right] \tag{12}$$

Check the equality conditions:
$$\hat{x} = \operatorname{vec}[\hat{X}] = \bar{\Psi} A^\top \left(\lambda I + A \bar{\Psi} A^\top\right)^{-1} b \tag{13}$$
$$\Psi^{\text{opt}} = \arg\min_{\Psi}\ \operatorname{tr}\left[\Psi^{-1}\left(\hat{X}\hat{X}^\top + \nabla_{\Psi^{-1}}\right)\right] + m \log|\Psi| = \tfrac{1}{m}\left[\hat{X}\hat{X}^\top + \nabla_{\Psi^{-1}}\right] \tag{14}$$
Symmetric Improvements

$$\bar{\Psi} = \tfrac{1}{2}\left(\Psi_r \otimes I + I \otimes \Psi_c\right) \tag{15}$$

The corresponding updating rules:
$$\hat{x} = \operatorname{vec}[\hat{X}] = \tfrac{1}{2}\left(\bar{\Psi}_r + \bar{\Psi}_c\right) A^\top \left[\lambda I + \tfrac{1}{2} A \left(\bar{\Psi}_r + \bar{\Psi}_c\right) A^\top\right]^{-1} b \tag{16}$$
$$\nabla_{\Psi_r^{-1}} = \sum_{i=1}^{m} \left[\Psi_r - \Psi_r A_{ri}^\top \left(A \bar{\Psi} A^\top + \lambda I\right)^{-1} A_{ri} \Psi_r\right] \tag{17}$$
$$\nabla_{\Psi_c^{-1}} = \sum_{i=1}^{n} \left[\Psi_c - \Psi_c A_{ci}^\top \left(A \bar{\Psi} A^\top + \lambda I\right)^{-1} A_{ci} \Psi_c\right] \tag{18}$$
$$\Psi_r^{\text{opt}} = \tfrac{1}{n}\left[\hat{X}^\top \hat{X} + \nabla_{\Psi_r^{-1}}\right], \qquad \Psi_c^{\text{opt}} = \tfrac{1}{m}\left[\hat{X}\hat{X}^\top + \nabla_{\Psi_c^{-1}}\right] \tag{19}$$
Theoretical Inquiry of Global/Local Minima

Lemma (Global optimum always retained). Define $r$ as the smallest rank of any feasible solution to $b = A\operatorname{vec}[X]$, where $A \in \mathbb{R}^{p \times nm}$ satisfies $\operatorname{spark}[A] = p + 1$. Then if $r < p/m$, any global minimizer $\{\Psi^*, \nu^*\}$ of (9) in the limit $\lambda \to 0$ is such that $x^* = \bar{\Psi}^* A^\top (A \bar{\Psi}^* A^\top)^\dagger b$ is feasible and $\operatorname{rank}[X^*] = r$ with $\operatorname{vec}[X^*] = x^*$.

Lemma (Scale invariance). Let $\tilde{A} = AD$, where $D = \operatorname{diag}[\alpha_1 \Gamma, \ldots, \alpha_m \Gamma]$ is a block-diagonal matrix with invertible blocks $\Gamma \in \mathbb{R}^{n \times n}$ of unit norm, scaled by coefficients $\alpha_i > 0$. Then $\{\Psi^*, \nu^*\}$ is a minimizer (global or local) of (9) in the limit $\lambda \to 0$ iff $\{\Gamma^{-1} \Psi^*, \operatorname{diag}[\alpha]^{-1} \nu^*\}$ is a minimizer when $\tilde{A}$ replaces $A$. The corresponding estimates of $X$ are likewise in one-to-one correspondence.

Theorem (Exclusive cases where local minima are smoothed away). Let $b = A\operatorname{vec}[X]$, where $A$ is block diagonal with blocks $A_i \in \mathbb{R}^{p_i \times n}$. Moreover, assume $p_i > 1$ for all $i$ and that $\cap_i \operatorname{null}[A_i] = \emptyset$. Then if $\min_X \operatorname{rank}[X] = 1$ in the feasible region, any minimizer $\{\Psi^*, \nu^*\}$ of (9) (global or local) in the limit $\lambda \to 0$ is such that $x^* = \bar{\Psi}^* A^\top (A \bar{\Psi}^* A^\top)^\dagger b$ is feasible and $\operatorname{rank}[X^*] = 1$ with $\operatorname{vec}[X^*] = x^*$. Furthermore, no cost function of the form (4) can satisfy the same result; in particular, there can always exist local and/or global minima with rank greater than one.
Illustration

[Figure: Plots of the nuclear norm, IRLS0 ($\gamma = 1, 10^{-5}, 10^{-10}$), and BARM cost functions over a 1D feasible subspace ($\eta$), with the true (lowest rank) solution marked. The cost function of BARM smoothes away local minima while simultaneously retaining the correct global optimum.]
Section 3: Experimental Results
Matrix Completion

Task: $X_0 = M_L M_R$, with $M_L \in \mathbb{R}^{n \times r}$ and $M_R \in \mathbb{R}^{r \times m}$ ($n = m = 150$) drawn iid $\mathcal{N}(0, 1)$. 50% of all entries are then hidden uniformly at random. FoS denotes Frequency of Success, where success means the relative error $\text{REL} = \|X_0 - \hat{X}\|_F / \|X_0\|_F \leq 10^{-3}$.

[Figure: FoS versus rank for VSBL, the nuclear norm, IRNN_best, IRLS0, and BARM against the theoretical limit; a reproduction of the task in Lu et al. (2014).]

Remarks: BARM significantly outperforms the state of the art and reaches the theoretical limit.
Comparisons with Rank-Aware Algorithms

Task: $X_0 = M_L M_R$, with $M_L \in \mathbb{R}^{n \times r}$ and $M_R \in \mathbb{R}^{r \times m}$ ($n = m = 150$) drawn iid $\mathcal{N}(0, 1)$. For the decaying-spectrum variant: $[U, S, V] = \operatorname{svd}(X_0)$; $s = \operatorname{diag}[S]$; $\forall i,\ s_i \leftarrow s_i \cdot (1/i^{0.8})$; $X_0 = U \operatorname{diag}[s] V^\top$. 50% of all entries are then hidden uniformly at random.

[Figure: FoS versus rank for NIHT, alternating minimization (Alter), and BARM against the theoretical limit; panel (a) $X$ iid Gaussian, panel (b) $X$ with decaying singular values. BARM has no knowledge of the true rank.]
General Problems

Task: $X_0 = M_L M_R$, with $M_L \in \mathbb{R}^{n \times r}$ and $M_R \in \mathbb{R}^{r \times m}$ ($n = m = 100$) drawn iid $\mathcal{N}(0, 1)$.
- Uncorrelated $A$: iid $\mathcal{N}(0, 1)$, $p \times n^2$ matrix ($p = 1000$).
- Correlated $A$: $\sum_{i=1}^{p} i^{-1/2} u_i v_i^\top$, where $u_i \in \mathbb{R}^p$ and $v_i \in \mathbb{R}^{n^2}$ are iid $\mathcal{N}(0, 1)$ vectors.

[Figure: REL and FoS versus rank for the nuclear norm, IRLS0, and BARM against the theoretical limit; panel (a) $A$ uncorrelated, panel (b) $A$ correlated.]

Remarks: Over a wide battery of empirical tests, our algorithm consistently reaches the theoretical limit.
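The two measurement ensembles can be sketched as follows (toy sizes; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10          # X is n x n, so n^2 = 100 unknowns
p = 40

# Uncorrelated ensemble: iid Gaussian p x n^2 matrix.
A_uncorr = rng.standard_normal((p, n * n))

# Correlated ensemble: sum_i i^{-1/2} u_i v_i^T with
# u_i in R^p and v_i in R^{n^2} iid N(0,1) vectors.
A_corr = sum(i ** -0.5 * np.outer(rng.standard_normal(p), rng.standard_normal(n * n))
             for i in range(1, p + 1))
```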
Pushing the Limit and Delving into Failure Cases

[Figure: Singular value averages of failure cases (at $p = (m+n)r - r^2$).]

Remarks:
- Solutions of the correct minimal rank are obtained even though $\hat{X} \neq X_0$.
- Define rank success as $\sigma_r[\hat{X}] / \sigma_{r+1}[\hat{X}] > 10^3$, where $r$ is the rank of the true low-rank $X_0$. FoRS denotes Frequency of Rank Success.
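The rank-success criterion can be sketched as a small helper; `rank_success` is an illustrative name of my own:

```python
import numpy as np

def rank_success(Xhat, r, ratio=1e3):
    """Success when sigma_r[Xhat] / sigma_{r+1}[Xhat] > ratio;
    FoRS is the frequency of this event over repeated trials."""
    s = np.linalg.svd(Xhat, compute_uv=False)
    return s[r - 1] / max(s[r], np.finfo(float).tiny) > ratio

# A clean rank-2 spectrum passes; a flat spectrum does not.
assert rank_success(np.diag([5.0, 4.0, 1e-8, 1e-9]), r=2)
assert not rank_success(np.diag([5.0, 4.0, 3.0, 2.0]), r=2)
```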
Pushing the Limit and Delving into Failure Cases

Table: Further matrix completion comparisons of BARM with IRLS0 by pushing the limits.

FR     n(=m)   r    IRLS0 FoS   IRLS0 FoRS   BARM FoS   BARM FoRS
0.9    100     14   0           0            1          1
0.95   100     14   0           0            0.8        1
0.99   100     14   0           0            0.7        1

Remarks: BARM failures are converted to successes under the FoRS metric; the other algorithms display almost identical behavior under either metric.
Application: Low-Rank Image Rectification

Task: Construct a first-order Taylor series approximation around the current rectified image estimate; this reduces to rank minimization under general affine constraints.

[Figure: Image rectification comparisons using a checkerboard image: (a) nuclear norm (easy), (b) BARM (easy), (c) nuclear norm (hard), (d) BARM (hard). Top: original image with observed region (red box) and estimated transformation (green box). Bottom: rectified image estimates.]
Section 4: Conclusions
Conclusions and Discussions
Discussions:
- Model justification rests on technical considerations rather than the legitimacy of the priors.
- Computational complexity: in the worst case, scales linearly in the number of elements of $X$ and quadratically in the number of observations.
- More challenging tests would be interesting and could further demonstrate the robustness of BARM.
- Extension to tensor analysis would be interesting.
Conclusions and Discussions
Take-home messages: a deceptively simple, parameter-free algorithm with very strong empirical performance and supporting theoretical analysis. Code: http://idm.pku.edu.cn/staff/boxin/
Thank you!
References

Babacan, S. Derin, Luessi, Martin, Molina, Rafael, and Katsaggelos, Aggelos K. Sparse Bayesian methods for low-rank matrix estimation. IEEE Transactions on Signal Processing, 60(8):3964–3977, 2012.

Chandrasekaran, Venkat, Recht, Benjamin, Parrilo, Pablo A., and Willsky, Alan S. The convex geometry of linear inverse problems. Foundations of Computational Mathematics, 12(6):805–849, 2012.

Ding, Xinghao, He, Lihan, and Carin, Lawrence. Bayesian robust principal component analysis. IEEE Transactions on Image Processing, 20(12):3419–3430, 2011.

Hu, Yao, Zhang, Debing, Ye, Jieping, Li, Xuelong, and He, Xiaofei. Fast and accurate matrix completion via truncated nuclear norm regularization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(9):2117–2130, 2013.

Jain, Prateek, Netrapalli, Praneeth, and Sanghavi, Sujay. Low-rank matrix completion using alternating minimization. In Proceedings of the 45th Annual ACM Symposium on Theory of Computing, pp. 665–674. ACM, 2013.

Léger, Flavien, Yu, Guoshen, and Sapiro, Guillermo. Efficient matrix completion with Gaussian models. arXiv preprint arXiv:1010.4050, 2010.
Lu, Canyi, Tang, Jinhui, Yan, Shuicheng, and Lin, Zhouchen. Generalized nonconvex nonsmooth low-rank minimization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2014.

Mohan, Karthik and Fazel, Maryam. Iterative reweighted algorithms for matrix rank minimization. Journal of Machine Learning Research (JMLR), 13(1):3441–3473, 2012.

Tanner, Jared and Wei, Ke. Normalized iterative hard thresholding for matrix completion. SIAM Journal on Scientific Computing, 35(5):S104–S125, 2013.

Tipping, Michael and Bishop, Christopher. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B, 61(3):611–622, 1999.

Wipf, David. Non-convex rank minimization via an empirical Bayesian approach. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI), 2012.