Accelerated Inexact Soft-Impute for Fast Large-Scale Matrix Completion

Quanming Yao
Department of Computer Science and Engineering
Hong Kong University of Science and Technology, Hong Kong

Joint work with James Kwok
Quanming Yao
AIS-Impute for Matrix Completion
Outline

1. Introduction
2. Related Work
3. Proposed Algorithm
4. Experiments
Motivating Applications

Recommender systems: predict the rating by user i on item j
Motivating Applications
Similarity among users and among items motivates the low-rank assumption
Motivating Applications

Image inpainting: fill in missing pixels
- a natural image can be well approximated by a low-rank matrix
Matrix Completion

    min_X  (1/2) ‖P_Ω(X − O)‖_F² + λ ‖X‖_*

- X ∈ R^{m×n}: low-rank matrix to be recovered (m ≤ n)
- O ∈ R^{m×n}: observed elements
- [P_Ω(A)]_ij = A_ij if Ω_ij = 1, and 0 otherwise
- ‖X‖_*: nuclear norm (sum of X's singular values, non-smooth), ‖X‖_* = Σ_{i=1}^m σ_i(X)

Goal: find X which is low-rank and consistent with the observations.
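As a concrete illustration, the objective above can be evaluated in a few lines of NumPy; the function name `objective` and the boolean `mask` encoding Ω are our own choices, not from the slides:

```python
import numpy as np

def objective(X, O, mask, lam):
    # F(X) = 1/2 * ||P_Omega(X - O)||_F^2 + lam * ||X||_*
    resid = mask * (X - O)                              # P_Omega(X - O): zero outside Omega
    nuclear = np.linalg.svd(X, compute_uv=False).sum()  # sum of singular values
    return 0.5 * np.sum(resid ** 2) + lam * nuclear
```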
Proximal Gradient Descent

    min_x f(x) + λ g(x)

- f(·): convex and smooth
- g(·): convex, can be non-smooth

    x_{t+1} = argmin_x f(x_t) + ⟨x − x_t, ∇f(x_t)⟩ + (1/2)‖x − x_t‖² + λ g(x)
            = argmin_x (1/2)‖x − z_t‖² + λ g(x),   where z_t = x_t − ∇f(x_t)  (proximal step)

- the proximal step often has a simple closed-form solution
- convergence rate: O(1/T), where T is the number of iterations
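A minimal sketch of this scheme, using g(x) = ‖x‖₁ (whose proximal step is soft-thresholding) as a stand-in for a general non-smooth g; the unit step size assumes ∇f is 1-Lipschitz, and all names are ours:

```python
import numpy as np

def soft_threshold(z, lam):
    # closed-form proximal step for g(x) = ||x||_1
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def proximal_gradient(grad_f, x0, lam, step=1.0, iters=200):
    # x_{t+1} = prox_{step*lam*g}(x_t - step * grad_f(x_t))
    x = x0
    for _ in range(iters):
        z = x - step * grad_f(x)   # gradient step on the smooth part
        x = soft_threshold(z, step * lam)
    return x
```

For example, minimizing (1/2)‖x − b‖² + λ‖x‖₁ returns soft_threshold(b, λ) in one step.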
Proximal Gradient Descent - Acceleration
    min_x f(x) + λ g(x)

can be accelerated to O(1/T²) [Nesterov, 2013]:

    y_t = (1 + θ_t) x_t − θ_t x_{t−1}
    z_t = y_t − ∇f(y_t)
    x_{t+1} = argmin_x (1/2)‖x − z_t‖² + λ g(x)

- e.g., θ_t = (t − 1)/(t + 2)
- can be seen as a momentum method with a specified weight
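The accelerated update can be sketched the same way; `prox` stands for the proximal step of λg, and the function names are ours, not from the slides:

```python
import numpy as np

def accel_proximal_gradient(grad_f, prox, x0, iters=200):
    # Nesterov-style acceleration: extrapolate with momentum, then
    # take a gradient step on f and a proximal step on g.
    x_prev, x = x0.copy(), x0.copy()
    for t in range(1, iters + 1):
        theta = (t - 1) / (t + 2)              # momentum weight from the slide
        y = (1 + theta) * x - theta * x_prev   # y_t = (1+theta)x_t - theta x_{t-1}
        z = y - grad_f(y)                      # z_t = y_t - grad f(y_t)
        x_prev, x = x, prox(z)                 # proximal step
    return x
```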
Proximal Gradient Descent for Matrix Completion

    min_X  (1/2)‖P_Ω(X − O)‖_F² + λ‖X‖_*

with f(X) = (1/2)‖P_Ω(X − O)‖_F² (smooth) and g(X) = ‖X‖_* (non-smooth).

Let the SVD of matrix Z be UΣV^T.

Proximal Step for Matrix Completion:

    argmin_X (1/2)‖X − Z‖_F² + λ‖X‖_* = U(Σ − λI)_+ V^T ≡ SVT_λ(Z)

- [(A)_+]_ij = max(A_ij, 0)
- singular value thresholding (SVT): singular values no bigger than λ are shrunk to 0
- acceleration can be used [Ji and Ye, 2009; Toh and Yun, 2010]
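The SVT operator has a direct NumPy transcription. This is a sketch for the dense case only, without the structured multiplications discussed later:

```python
import numpy as np

def svt(Z, lam):
    # singular value thresholding: SVT_lam(Z) = U (Sigma - lam I)_+ V^T
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s = np.maximum(s - lam, 0.0)   # shrink; values <= lam become 0
    return (U * s) @ Vt
```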
Soft-Impute [Mazumder et al., 2010]

    Z_t = P_Ω(O) + P_Ω⊥(X_t),
    X_{t+1} = SVT_λ(Z_t).

- [P_Ω⊥(A)]_ij = A_ij if Ω_ij = 0, and 0 otherwise (complement of P_Ω(A))
- to compute the SVD, the basic operations are matrix multiplications of the form Z_t u and Z_t^T v

Key observation: Z_t is sparse + low-rank. Let X_t = U_t Σ_t V_t^T. For any u ∈ R^n,

    Z_t u = P_Ω(O − X_t) u + U_t Σ_t (V_t^T u)

where the first term is sparse (O(‖Ω‖₁)) and the second is low-rank (O((m + n)k)).

- rank-k SVD takes O(‖Ω‖₁ k + (m + n)k²) time, instead of O(mnk) (similarly for Z_t^T v)
- k is much smaller than m and n; ‖Ω‖₁ is much smaller than mn
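The "sparse + low-rank" multiplication can be sketched as follows; here `S` plays the role of the sparse residual P_Ω(O − X_t) (stored dense for brevity, sparse in practice) and `U, s, V` are the low-rank factors. The function name is ours:

```python
import numpy as np

def z_matvec(S, U, s, V, u):
    # Z u where Z = S + U diag(s) V^T; never forms Z explicitly.
    # S @ u costs O(nnz(S)); the low-rank part costs O((m+n)k).
    return S @ u + U @ (s * (V.T @ u))
```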
Soft-Impute is Proximal Gradient
    Z_t = X_t − ∇f(X_t) = X_t − P_Ω(X_t − O) = P_Ω⊥(X_t) + P_Ω(O)

The proximal-gradient step (left) equals the Soft-Impute update (right).

- Soft-Impute = proximal gradient, so it is possible to use acceleration and obtain the O(1/T²) rate
- previous work suggested that this is not useful:
  - the "sparse + low-rank" structure no longer exists
  - the increase in iteration complexity outweighs the gain in convergence rate
Main Contributions
Acceleration is useful!

1. The "sparse + low-rank" structure can still be used:
   - maintain low iteration complexity
   - improve the convergence rate to O(1/T²)
2. Speed up SVT using the power method:
   - further reduces iteration complexity
   - use of approximation still yields the O(1/T²) convergence rate
“Sparse + Low-Rank” Structure

With acceleration,

    Z_t = P_Ω(O − Y_t) + Y_t = P_Ω(O − Y_t) + (1 + θ_t)X_t − θ_t X_{t−1}

where P_Ω(O − Y_t) is sparse and (1 + θ_t)X_t − θ_t X_{t−1} is the sum of two low-rank matrices. For any u,

    Z_t u = P_Ω(O − Y_t) u + (1 + θ_t) U_t Σ_t V_t^T u − θ_t U_{t−1} Σ_{t−1} V_{t−1}^T u

with costs O(‖Ω‖₁), O((m + n)k), and O((m + n)k) for the three terms.

- rank-k SVD takes O(‖Ω‖₁ k + (m + n)k²) time (same as Soft-Impute)
- but the rate is improved to O(1/T²) (because of acceleration)
Approximate SVT - Motivation

The iterative procedure becomes

    Y_t = (1 + θ_t) X_t − θ_t X_{t−1}
    Z_t = P_Ω(O − Y_t) + Y_t
    X_{t+1} = SVT_λ(Z_t)

Motivations:
- in SVT, only the singular vectors with singular values ≥ λ are needed, yet the partial SVD still has to be solved exactly
- given the iterative nature of proximal gradient descent, warm-starting can be helpful

→ approximate the subspace spanned by those singular vectors using the power method
Power Method

Let the rank-k SVD of Z̃ be U_k Σ_k V_k^T. The power method is a simple but efficient way to approximate the subspace spanned by U_k. It is an iterative algorithm and can be warm-started (using R).

PowerMethod(Z̃, R, ε̃) [Halko et al., 2011]
Require: Z̃ ∈ R^{m×n}, initial R ∈ R^{n×k} for warm-start, tolerance ε̃;
1: initialize Q_0 = QR(Z̃ R);  // QR decomposition of a matrix
2: for j = 0, 1, . . . do
3:   Q_{j+1} = QR(Z̃ (Z̃^T Q_j));
4:   Δ_{j+1} = ‖Q_{j+1} Q_{j+1}^T − Q_j Q_j^T‖_F;
5:   if Δ_{j+1} ≤ ε̃ then break;
6: end for
7: return Q_{j+1};
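A NumPy sketch of PowerMethod, following the pseudocode above (function name and defaults are ours):

```python
import numpy as np

def power_method(Z, R, tol=1e-8, max_iter=100):
    # Approximate the span of the top-k left singular vectors of Z,
    # warm-started from R (n x k); QR keeps the iterates orthonormal.
    Q, _ = np.linalg.qr(Z @ R)
    for _ in range(max_iter):
        Q_new, _ = np.linalg.qr(Z @ (Z.T @ Q))
        # stop when the projector Q Q^T stabilizes
        if np.linalg.norm(Q_new @ Q_new.T - Q @ Q.T) <= tol:
            return Q_new
        Q = Q_new
    return Q
```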
Power Method - Case with k = 1

PowerMethod(Z̃, r)
1: initialize q_0 = Z̃ r;
2: for j = 0, 1, . . . do
3:   q_j = q_j / ‖q_j‖;  // QR becomes normalization of a vector
4:   q_{j+1} = Z̃ (Z̃^T q_j);
5: end for

Let Z̃ = UΣV^T. The recursion gives

    q_j = (Z̃ Z̃^T)^j Z̃ r = U Σ^{2j} U^T Z̃ r ∝ U diag(1, (σ_2/σ_1)^{2j}, . . . , (σ_m/σ_1)^{2j}) U^T Z̃ r

Since lim_{j→∞} (σ_i/σ_1)^{2j} = 0 for i = 2, . . . , m, the power method captures the span of u_1 (the first column of U).
Obtain SVT(Z̃_t) from a Much Smaller SVT

With the obtained Q, an approximate SVT can be constructed as

    X̂_t = Q SVT_λ(Q^T Z̃_t).

- Q^T Z̃_t ∈ R^{k×n}, and thus is much smaller than Z̃_t ∈ R^{m×n}

Approx-SVT(Z̃_t, R, λ, ε̃)
Require: Z̃_t ∈ R^{m×n}, R ∈ R^{n×k}, thresholds λ and ε̃;
1: Q = PowerMethod(Z̃_t, R, ε̃);
2: [U, Σ, V] = SVD(Q^T Z̃_t);
3: U = {u_i | σ_i > λ}, V = {v_i | σ_i > λ}, Σ = (Σ − λI)_+;
4: return QU, Σ and V.

- still O(‖Ω‖₁ k + (m + n)k²), but cheaper than an exact SVD
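A self-contained sketch of Approx-SVT, with the power iteration inlined so the example runs on its own (names, defaults, and the dense-matrix setting are ours):

```python
import numpy as np

def approx_svt(Z, R, lam, tol=1e-8, max_iter=100):
    # Power iterations to capture the leading left subspace of Z,
    # warm-started from R (n x k).
    Q, _ = np.linalg.qr(Z @ R)
    for _ in range(max_iter):
        Q_new, _ = np.linalg.qr(Z @ (Z.T @ Q))
        if np.linalg.norm(Q_new @ Q_new.T - Q @ Q.T) <= tol:
            Q = Q_new
            break
        Q = Q_new
    # SVD of the much smaller k x n matrix Q^T Z, then threshold there:
    # SVT_lam(Z) ~= Q * SVT_lam(Q^T Z).
    U, s, Vt = np.linalg.svd(Q.T @ Z, full_matrices=False)
    keep = s > lam
    return Q @ U[:, keep], s[keep] - lam, Vt[keep].T
```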
Complete Algorithm

Accelerated Inexact Soft-Impute (AIS-Impute)
Require: partially observed matrix O, parameter λ, decay parameter ν ∈ (0, 1), threshold ε;
 1: [U_0, λ_0, V_0] = rank-1 SVD(P_Ω(O));
 2: initialize c = 1, ε̃_0 = ‖P_Ω(O)‖_F, X_0 = X_1 = λ_0 U_0 V_0^T;
 3: for t = 1, 2, . . . do
 4:   λ_t = ν^t (λ_0 − λ) + λ;
 5:   θ_t = (c − 1)/(c + 2);
 6:   Y_t = X_t + θ_t (X_t − X_{t−1});
 7:   Z̃_t = Y_t + P_Ω(O − Y_t);
 8:   ε̃_t = ν^t ε̃_0;
 9:   V_{t−1} = V_{t−1} − V_t (V_t^T V_{t−1}), remove zero columns;
10:   R_t = QR([V_t, V_{t−1}]);
11:   [U_{t+1}, Σ_{t+1}, V_{t+1}] = Approx-SVT(Z̃_t, R_t, λ_t, ε̃_t);
12:   if F(U_{t+1} Σ_{t+1} V_{t+1}^T) > F(U_t Σ_t V_t^T) then c = 1 else c = c + 1;
13: end for
14: return X_{t+1} = U_{t+1} Σ_{t+1} V_{t+1}^T.
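For illustration, the main loop can be sketched with dense matrices, using an exact SVT in place of Approx-SVT so the example stays self-contained. This deliberately drops the structured multiplications and warm-started power method that make the real algorithm fast; all names are ours:

```python
import numpy as np

def ais_impute(O, mask, lam, nu=0.5, iters=50):
    # Dense sketch: continuation on lam_t (step 4), momentum (steps 5-6),
    # sparse+low-rank Z (step 7), adaptive restart (step 12).
    U, s, Vt = np.linalg.svd(mask * O, full_matrices=False)
    lam0 = s[0]                                   # largest singular value
    X_prev = X = lam0 * np.outer(U[:, 0], Vt[0])  # rank-1 initialization
    c = 1

    def F(X):  # objective: 1/2 ||P_Omega(X - O)||_F^2 + lam ||X||_*
        return 0.5 * np.sum((mask * (X - O)) ** 2) + \
               lam * np.linalg.svd(X, compute_uv=False).sum()

    for t in range(1, iters + 1):
        lam_t = nu ** t * (lam0 - lam) + lam      # continuation
        theta = (c - 1) / (c + 2)                 # momentum weight
        Y = X + theta * (X - X_prev)
        Z = Y + mask * (O - Y)
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        X_new = (U * np.maximum(s - lam_t, 0.0)) @ Vt  # exact SVT for simplicity
        c = 1 if F(X_new) > F(X) else c + 1       # adaptive restart
        X_prev, X = X, X_new
    return X
```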
Core steps 5–7 implement the acceleration (momentum and extrapolation).
Core steps 8–11 implement the approximate SVT:
- the last two iterates (V_t and V_{t−1}) are used to warm-start the power method
- the error tolerance ε̃_t of the approximate SVT is decreased at a linear rate
Step 12 performs an adaptive restart: the momentum is reset if F(X) starts to increase.
Step 4 implements the continuation strategy: λ_t is initialized to a large value and then decreased gradually, which allows further speedup.
Error in Approximate SVT

Let h_{λg}(X; Z_t) ≡ (1/2)‖X − Z_t‖_F² + λ g(X). If the power method exits after j iterations, and assuming k ≥ k̂, η_t < 1, and ε̃ ≥ α_t η_t^j √(1 + η_t²), then

    h_{λ‖·‖_*}(X̂_t; Z̃_t) ≤ h_{λ‖·‖_*}(SVT_λ(Z̃_t); Z̃_t) + (η_t β_t γ_t / (1 − η_t)) ε̃,

where X̂_t is the approximate solution and the last term is controlled by ε̃.

- α_t, β_t, γ_t, η_t are constants depending on Z̃_t
- k̂ is the number of singular values > λ; k is the input rank for Approx-SVT
- ε̃ is the tolerance for the power method

The approximation error in Approx-SVT can thus be controlled by ε̃_t.
Convergence of AIS-Impute
Theorem. With controlled approximation error in the SVT step, Algorithm 3 (AIS-Impute) converges to the optimal solution at a rate of O(1/T²).

Since the approximation error ε̃_t of the proximal step (Approx-SVT) decreases to 0 faster than O(1/T²), the convergence rate is the same as with exact SVT.
Synthetic Data
- m × m data matrix O = UV + G
  - U ∈ R^{m×5}, V ∈ R^{5×m}: entries sampled i.i.d. from N(0, 1)
  - G: noise sampled from N(0, 0.05)
- ‖Ω‖₁ = 15 m log(m) random elements of O are observed
  - half for training, half for parameter tuning
- testing on the unobserved (missing) elements
- performance criteria:
  - NMSE = ‖P_Ω⊥(X − X̃)‖_F / ‖P_Ω⊥(X̃)‖_F
  - rank obtained
  - running time
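The data-generation protocol above can be sketched as follows (assuming N(0, 0.05) denotes a standard deviation of 0.05; variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 500, 5
U = rng.standard_normal((m, k))                  # U: m x 5, i.i.d. N(0, 1)
V = rng.standard_normal((k, m))                  # V: 5 x m, i.i.d. N(0, 1)
O = U @ V + rng.normal(0.0, 0.05, size=(m, m))   # rank-5 signal + Gaussian noise

n_obs = int(15 * m * np.log(m))                  # |Omega| = 15 m log(m)
idx = rng.choice(m * m, size=n_obs, replace=False)
mask = np.zeros(m * m, dtype=bool)
mask[idx] = True                                 # observed positions
mask = mask.reshape(m, m)
```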
Synthetic Data - Compared Methods

Compare the proposed AIS-Impute with:
- accelerated proximal gradient algorithm ("APG") [Ji and Ye, 2009; Toh and Yun, 2010]
- Soft-Impute [Mazumder et al., 2010]

Algorithm      Iteration Complexity       Rate       SVT
APG            O(mnk)                     O(1/T²)    exact
Soft-Impute    O(k‖Ω‖₁ + k²(m + n))       O(1/T)     exact
AIS-Impute     O(k‖Ω‖₁ + k²(m + n))       O(1/T²)    approximate

Code can be downloaded from https://github.com/quanmingyao/AIS-impute
Results
               m = 500 (sparsity 18.64%)     m = 1000 (10.36%)
               NMSE     rank   time (sec)    NMSE     rank   time (sec)
APG            0.0183     5       5.1        0.0223     5      45.5
Soft-Impute    0.0183     5       1.3        0.0223     5       4.4
AIS-Impute     0.0183     5       0.3        0.0223     5       1.1

               m = 1500 (7.31%)              m = 2000 (5.70%)
               NMSE     rank   time (sec)    NMSE     rank   time (sec)
APG            0.0251     5     172.7        0.0273     5     483.9
Soft-Impute    0.0251     5      13.3        0.0273     5      18.7
AIS-Impute     0.0251     5       2.0        0.0273     5       2.9
All algorithms are equally good on recovery, while AIS-Impute is the fastest.
Convergence Speeds
[Figures: (a) objective vs. number of iterations; (b) objective vs. time]

W.r.t. the number of iterations:
- APG and AIS-Impute are much faster than Soft-Impute
- AIS-Impute has a slightly higher objective than APG

W.r.t. time:
- APG is the slowest (it does not use "sparse plus low-rank")
- AIS-Impute is the fastest
Recommendation - MovieLens Data
Task: recommend movies based on users' historical ratings.

                  #users    #movies    #ratings
MovieLens-100K       943      1,682       100,000
MovieLens-1M       6,040      3,449       999,714
MovieLens-10M     69,878     10,677    10,000,054

- ratings (from 1 to 5) of different users on movies
- 50% of the observed ratings for training, 25% for validation, and the rest for testing
MovieLens Data - Compared Methods
Besides the proximal algorithms, we also compare with:
- active subspace selection ("active") [Hsieh and Olsen, 2014]
- Frank-Wolfe algorithm ("boost") [Zhang et al., 2012]
- a variant of Soft-Impute ("ALT-Impute") [Hastie et al., 2014]
- second-order trust-region algorithm ("TR") [Mishra et al., 2013]
Objective w.r.t. Time

AIS-Impute is shown in black.

[Figures: (a) MovieLens-100K; (b) MovieLens-10M]

On MovieLens-10M, TR and APG are very slow and thus not shown.
Testing RMSE w.r.t. Time

AIS-Impute is shown in black.

[Figures: (a) MovieLens-100K; (b) MovieLens-10M]
Results
               MovieLens-100K            MovieLens-1M              MovieLens-10M
               RMSE   rank   time        RMSE   rank   time        RMSE   rank   time
active         1.037    70     59.5      0.925   180   1431.4      0.918   217   29681.4
boost          1.038    71     19.5      0.925   178    616.3      0.917   216   13873.9
ALT-Impute     1.037    70     29.1      0.925   179    797.1      0.919   215   17337.3
TR             1.037    71   1911.4      —       —     > 10⁶       —       —     > 10⁶
APG            1.037    70     83.4      0.925   180   2060.3      —       —     > 10⁶
Soft-Impute    1.037    70    337.6      0.925   180   8821.0      —       —     > 10⁶
AIS-Impute     1.037    70      5.8      0.925   179    129.7      0.916   215    2817.5

(time in seconds)
- all algorithms are comparably good at recovering the missing matrix elements
- TR is the slowest
- ALT-Impute has the same convergence rate as Soft-Impute, but is faster in practice
- AIS-Impute is the fastest
Conclusion
AIS-Impute:
- accelerates proximal gradient descent without losing the "sparse plus low-rank" structure
- uses the power method to produce a good approximation to SVT efficiently
- achieves a fast convergence rate with low iteration complexity
- is empirically much faster than the state of the art