Accelerated Inexact Soft-Impute for Fast Large-Scale Matrix Completion


Quanming Yao
Department of Computer Science and Engineering
Hong Kong University of Science and Technology

Joint work with James Kwok


Outline

1. Introduction
2. Related Work
3. Proposed Algorithm
4. Experiments


Motivating Applications

Recommender systems: predict the rating by user i on item j.


Similarity among users and items motivates the low-rank assumption.


Image inpainting: fill in missing pixels.

A natural image can be well approximated by a low-rank matrix.


Matrix Completion

min_X  ½ ‖P_Ω(X − O)‖_F² + λ ‖X‖_*

- X ∈ ℝ^{m×n}: low-rank matrix to be recovered (m ≤ n)
- O ∈ ℝ^{m×n}: observed elements
- [P_Ω(A)]_ij = A_ij if Ω_ij = 1, and 0 otherwise
- ‖X‖_*: nuclear norm (sum of X's singular values, non-smooth), ‖X‖_* = Σ_{i=1}^m σ_i(X)

Goal: find an X that is low-rank and consistent with the observations.


Proximal Gradient Descent

min_x f(x) + λ g(x)

- f(·): convex and smooth
- g(·): convex, can be non-smooth

x_{t+1} = argmin_x f(x_t) + ⟨x − x_t, ∇f(x_t)⟩ + ½ ‖x − x_t‖² + λ g(x)
        = argmin_x ½ ‖x − z_t‖² + λ g(x),   where z_t = x_t − ∇f(x_t)   (proximal step)

- the proximal step often has a simple closed-form solution
- convergence rate: O(1/T), where T is the number of iterations
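As a concrete instance of these updates, here is a minimal NumPy sketch of proximal gradient descent for an ℓ1-regularized least-squares problem, where the proximal step is the closed-form soft-thresholding operator (the toy problem and all variable names are illustrative, not from the talk):

```python
import numpy as np

def soft_threshold(z, lam):
    """Closed-form proximal step for g(x) = ||x||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def proximal_gradient(A, b, lam, n_iters=500):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1; converges at an O(1/T) rate."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of grad f
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        z = x - A.T @ (A @ x - b) / L        # gradient step on the smooth part
        x = soft_threshold(z, lam / L)       # proximal step on the non-smooth part
    return x

# toy problem with a sparse ground truth
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true
x_hat = proximal_gradient(A, b, lam=0.1)
```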


Proximal Gradient Descent - Acceleration

min_x f(x) + λ g(x) can be accelerated to O(1/T²) [Nesterov, 2013]:

y_t = (1 + θ_t) x_t − θ_t x_{t−1}
z_t = y_t − ∇f(y_t)
x_{t+1} = argmin_x ½ ‖x − z_t‖² + λ g(x)

- e.g., θ_t = (t − 1)/(t + 2)
- can be seen as a momentum method with a specific weight schedule
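The accelerated update can be sketched in the same toy setting; note the extrapolation y_t = (1 + θ_t)x_t − θ_t x_{t−1} before the gradient and proximal steps (again an illustrative sketch, not the talk's code):

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def accelerated_proximal_gradient(A, b, lam, n_iters=300):
    """FISTA-style loop: momentum with theta_t = (t-1)/(t+2) gives O(1/T^2)."""
    L = np.linalg.norm(A, 2) ** 2
    x_prev = x = np.zeros(A.shape[1])
    for t in range(1, n_iters + 1):
        theta = (t - 1) / (t + 2)
        y = (1 + theta) * x - theta * x_prev     # extrapolation y_t
        z = y - A.T @ (A @ y - b) / L            # gradient step at y_t
        x_prev, x = x, soft_threshold(z, lam / L)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true
x_hat = accelerated_proximal_gradient(A, b, lam=0.1)
```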


Proximal Gradient Descent for Matrix Completion

min_X  ½ ‖P_Ω(X − O)‖_F² + λ ‖X‖_*,   with f(X) = ½ ‖P_Ω(X − O)‖_F² and g(X) = ‖X‖_*

Let the SVD of the matrix Z be UΣV^⊤.

Proximal Step for Matrix Completion:

argmin_X ½ ‖X − Z‖_F² + λ ‖X‖_* = U (Σ − λI)_+ V^⊤ ≡ SVT_λ(Z)

- [(A)_+]_ij = max(A_ij, 0)
- singular value thresholding (SVT): singular values no bigger than λ are shrunk to 0
- acceleration can be used [Ji and Ye, 2009; Toh and Yun, 2010]
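The SVT operator is direct to implement given a full SVD; a small NumPy sketch (the matrix Z here is an arbitrary toy input):

```python
import numpy as np

def svt(Z, lam):
    """Singular value thresholding: the proximal step of lam*||.||_* at Z."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)      # singular values <= lam become 0
    return U @ np.diag(s_shrunk) @ Vt

rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 4))
X = svt(Z, lam=1.0)
```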


Soft-Impute [Mazumder et al., 2010]

Z_t = P_Ω(O) + P_{Ω⊥}(X_t),   X_{t+1} = SVT_λ(Z_t)

- [P_{Ω⊥}(A)]_ij = A_ij if Ω_ij = 0, and 0 otherwise (complement of P_Ω(A))
- to compute the SVD, the basic operations are matrix multiplications of the form Z_t u and Z_t^⊤ v

Key observation: Z_t is sparse + low-rank. Let X_t = U_t Σ_t V_t^⊤. For any u ∈ ℝ^n,

Z_t u = P_Ω(O − X_t) u  [sparse: O(‖Ω‖_1)]  +  U_t Σ_t (V_t^⊤ u)  [low-rank: O((m + n)k)]

- rank-k SVD takes O(‖Ω‖_1 k + (m + n)k²) time instead of O(mnk) (similarly for Z_t^⊤ v)
- k is much smaller than m and n; ‖Ω‖_1 is much smaller than mn
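The "sparse + low-rank" matrix-vector product can be sketched as follows; S plays the role of P_Ω(O − X_t) and (U, s, V) is the rank-k factorization of X_t (toy shapes and names are illustrative):

```python
import numpy as np
import scipy.sparse as sp

def splr_matvec(S, U, s, V, u):
    """(S + U diag(s) V^T) @ u without forming the dense m x n matrix.
    Cost: O(nnz(S)) for the sparse part + O((m + n) k) for the low-rank part."""
    return S @ u + U @ (s * (V.T @ u))

# toy check against the dense product
rng = np.random.default_rng(0)
m, n, k = 30, 20, 3
S = sp.random(m, n, density=0.1, format="csr", random_state=0)
U = np.linalg.qr(rng.standard_normal((m, k)))[0]
V = np.linalg.qr(rng.standard_normal((n, k)))[0]
s = np.array([3.0, 2.0, 1.0])
u = rng.standard_normal(n)
y = splr_matvec(S, U, s, V, u)
```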


Soft-Impute is Proximal Gradient

Z_t = X_t − ∇f(X_t) = X_t − P_Ω(X_t − O) = P_{Ω⊥}(X_t) + P_Ω(O)
      (proximal gradient)                    (Soft-Impute)

Soft-Impute = proximal gradient, so it should be possible to use acceleration and obtain the O(1/T²) rate. However, previous work suggested that this is not useful:

- the "sparse + low-rank" structure no longer exists
- the increase in iteration complexity outweighs the gain in convergence rate


Main Contributions

Acceleration is useful!

1. The "sparse + low-rank" structure can still be used:
   - maintains low iteration complexity
   - improves the convergence rate to O(1/T²)
2. SVT is sped up using the power method:
   - further reduces the iteration complexity
   - the approximation still yields the O(1/T²) convergence rate


“Sparse + Low-Rank” Structure

With acceleration,

Z_t = P_Ω(O − Y_t) + Y_t = P_Ω(O − Y_t)  [sparse]  + (1 + θ_t) X_t − θ_t X_{t−1}  [sum of two low-rank matrices]

For any u,

Z_t u = P_Ω(O − Y_t) u  [O(‖Ω‖_1)]  + (1 + θ_t) U_t Σ_t V_t^⊤ u  [O((m + n)k)]  − θ_t U_{t−1} Σ_{t−1} V_{t−1}^⊤ u  [O((m + n)k)]

- rank-k SVD takes O(‖Ω‖_1 k + (m + n)k²) time (same as Soft-Impute)
- but the rate is improved to O(1/T²) (because of acceleration)


Approximate SVT - Motivation

The iterative procedure becomes

Y_t = (1 + θ_t) X_t − θ_t X_{t−1}
Z_t = P_Ω(O − Y_t) + Y_t
X_{t+1} = SVT_λ(Z_t)

Motivations:

- in SVT, only the singular vectors with singular values ≥ λ are needed
- yet the partial SVD still has to be solved exactly
- given the iterative nature of proximal gradient descent, warm-starting can be helpful

→ approximate the subspace spanned by those singular vectors using the power method.


Power Method

Let the rank-k SVD of Z̃ be U_k Σ_k V_k^⊤. The power method is a simple but efficient way to approximate the subspace spanned by U_k; it is an iterative algorithm and can be warm-started (using R).

PowerMethod(Z̃, R, ε̃) [Halko et al., 2011]
Require: Z̃ ∈ ℝ^{m×n}, initial R ∈ ℝ^{n×k} for warm-start, tolerance ε̃
1: initialize Q_0 = QR(Z̃ R);   // QR decomposition of a matrix
2: for j = 0, 1, … do
3:   Q_{j+1} = QR(Z̃ (Z̃^⊤ Q_j));
4:   Δ_{j+1} = ‖Q_{j+1} Q_{j+1}^⊤ − Q_j Q_j^⊤‖_F;
5:   if Δ_{j+1} ≤ ε̃ then break;
6: end for
7: return Q_{j+1};
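The listing above translates almost line-for-line into NumPy (the stopping rule compares projectors, as in step 4; the toy matrix below is exactly rank 3, so the iteration converges immediately):

```python
import numpy as np

def power_method(Z, R, tol=1e-6, max_iters=100):
    """Approximate an orthonormal basis Q of the top-k left singular
    subspace of Z, warm-started from R (n x k)."""
    Q, _ = np.linalg.qr(Z @ R)                    # step 1: Q_0 = QR(Z R)
    for _ in range(max_iters):
        Q_new, _ = np.linalg.qr(Z @ (Z.T @ Q))    # step 3
        delta = np.linalg.norm(Q_new @ Q_new.T - Q @ Q.T)   # step 4
        Q = Q_new
        if delta <= tol:                          # step 5
            break
    return Q

rng = np.random.default_rng(0)
Z = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 20))  # exactly rank 3
R = rng.standard_normal((20, 3))
Q = power_method(Z, R)
```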


Power Method - Case with k = 1

PowerMethod(Z̃, r)
1: initialize q_0 = Z̃ r;
2: for j = 0, 1, … do
3:   q_j = q_j / ‖q_j‖;   // QR becomes normalization of a vector
4:   q_{j+1} = Z̃ (Z̃^⊤ q_j);
5: end for

Let Z̃ = UΣV^⊤. The recursion can be written as

q_j ∝ (Z̃ Z̃^⊤)^j Z̃ r = σ_1^{2j} · U diag(1, (σ_2/σ_1)^{2j}, …, (σ_m/σ_1)^{2j}) (U^⊤ Z̃ r)

Since lim_{j→∞} (σ_i/σ_1)^{2j} = 0 for i = 2, …, m, the power method captures the span of u_1 (the first column of U).


Obtain SVT(Z̃_t) from a Much Smaller SVT

With the obtained Q, an approximate SVT can be constructed as

X̂_t = Q SVT_λ(Q^⊤ Z̃_t)

Q^⊤ Z̃_t ∈ ℝ^{k×n} is much smaller than Z̃_t ∈ ℝ^{m×n}.

Approx-SVT(Z̃_t, R, λ, ε̃)
Require: Z̃_t ∈ ℝ^{m×n}, R ∈ ℝ^{n×k}, thresholds λ and ε̃
1: Q = PowerMethod(Z̃_t, R, ε̃);
2: [U, Σ, V] = SVD(Q^⊤ Z̃_t);
3: U = {u_i | σ_i > λ}, V = {v_i | σ_i > λ}, Σ = (Σ − λI)_+;
4: return QU, Σ and V.

Still O(‖Ω‖_1 k + (m + n)k²) time, but much cheaper than an exact SVD in practice.
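A sketch of Approx-SVT on top of such a power-method routine; because the toy Z is exactly rank 3 here, the result matches the exact SVT (names and shapes are illustrative):

```python
import numpy as np

def power_method(Z, R, tol=1e-6, max_iters=100):
    Q, _ = np.linalg.qr(Z @ R)
    for _ in range(max_iters):
        Q_new, _ = np.linalg.qr(Z @ (Z.T @ Q))
        delta = np.linalg.norm(Q_new @ Q_new.T - Q @ Q.T)
        Q = Q_new
        if delta <= tol:
            break
    return Q

def approx_svt(Z, R, lam, tol=1e-6):
    """SVT of Z via an SVD of the much smaller k x n matrix Q^T Z."""
    Q = power_method(Z, R, tol)
    U, s, Vt = np.linalg.svd(Q.T @ Z, full_matrices=False)
    keep = s > lam                               # drop singular values <= lam
    return Q @ U[:, keep], s[keep] - lam, Vt[keep].T

rng = np.random.default_rng(0)
U0, _ = np.linalg.qr(rng.standard_normal((30, 3)))
V0, _ = np.linalg.qr(rng.standard_normal((20, 3)))
Z = U0 @ np.diag([5.0, 4.0, 3.0]) @ V0.T        # known singular values
Uh, sh, Vh = approx_svt(Z, rng.standard_normal((20, 3)), lam=1.0)
X_hat = Uh @ np.diag(sh) @ Vh.T
```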


Complete Algorithm

Accelerated Inexact Soft-Impute (AIS-Impute)
Require: partially observed matrix O, parameter λ, decay parameter ν ∈ (0, 1), threshold ε
1: [U_0, λ_0, V_0] = rank-1 SVD(P_Ω(O));
2: initialize c = 1, ε̃_0 = ‖P_Ω(O)‖_F, X_0 = X_1 = λ_0 U_0 V_0^⊤;
3: for t = 1, 2, … do
4:   λ_t = ν^t (λ_0 − λ) + λ;
5:   θ_t = (c − 1)/(c + 2);
6:   Y_t = X_t + θ_t (X_t − X_{t−1});
7:   Z̃_t = Y_t + P_Ω(O − Y_t);
8:   ε̃_t = ν^t ε̃_0;
9:   V_{t−1} = V_{t−1} − V_t (V_t^⊤ V_{t−1}), remove zero columns;
10:  R_t = QR([V_t, V_{t−1}]);
11:  [U_{t+1}, Σ_{t+1}, V_{t+1}] = Approx-SVT(Z̃_t, R_t, λ_t, ε̃_t);
12:  if F(U_{t+1} Σ_{t+1} V_{t+1}^⊤) > F(U_t Σ_t V_t^⊤) then c = 1 else c = c + 1;
13: end for
14: return X_{t+1} = U_{t+1} Σ_{t+1} V_{t+1}^⊤.


Core steps 5–7: acceleration.


Core steps 8–11: approximate SVT. The V-factors from the last two iterations (V_t and V_{t−1}) are used to warm-start the power method, and the approximation tolerance ε̃_t is decreased linearly.


Step 12: adaptive restart; the algorithm restarts if F(X) starts to increase.


Step 4 (continuation strategy): λ_t is initialized to a large value and then decreased gradually, which allows further speedup.
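The overall loop can be sketched as follows. This is a deliberately simplified dense version: it keeps the continuation on λ_t, the restart-based momentum, and the Z̃_t update, but uses an exact SVT in place of Approx-SVT and dense matrices in place of the "sparse + low-rank" representation, so it illustrates the control flow rather than the paper's complexity guarantees:

```python
import numpy as np

def svt(Z, lam):
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

def ais_impute_sketch(O, mask, lam, nu=0.5, n_iters=300):
    """Simplified AIS-Impute loop: momentum + adaptive restart + continuation.
    mask is the 0/1 matrix Omega; only mask*O is ever read."""
    P = lambda A: mask * A                                  # P_Omega
    def F(X):                                               # objective value
        return 0.5 * np.linalg.norm(P(X - O)) ** 2 \
               + lam * np.linalg.svd(X, compute_uv=False).sum()
    lam0 = np.linalg.svd(P(O), compute_uv=False)[0]         # largest singular value
    X_prev = X = np.zeros_like(O)
    c = 1
    for t in range(1, n_iters + 1):
        lam_t = nu ** t * (lam0 - lam) + lam                # step 4: continuation
        theta = (c - 1) / (c + 2)                           # step 5
        Y = X + theta * (X - X_prev)                        # step 6: acceleration
        Z = Y + P(O - Y)                                    # step 7
        X_new = svt(Z, lam_t)                               # exact SVT stands in for Approx-SVT
        c = 1 if F(X_new) > F(X) else c + 1                 # step 12: adaptive restart
        X_prev, X = X, X_new
    return X

rng = np.random.default_rng(0)
O = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 40))   # rank-2 ground truth
mask = (rng.random((40, 40)) < 0.5).astype(float)                 # 50% observed
X_rec = ais_impute_sketch(O, mask, lam=0.1)
```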


Error in Approximate SVT

Let h_{λg}(X; Z_t) ≡ ½ ‖X − Z_t‖_F² + λ g(X), and suppose the power method exits after j iterations. Assume k ≥ k̂ and η_t < 1, and let ε̃ ≥ α_t η_t^j √(1 + η_t²). Then

h_{λ‖·‖_*}(X̂_t; Z̃_t) ≤ h_{λ‖·‖_*}(SVT_λ(Z̃_t); Z̃_t) + η_t β_t γ_t ε̃ / (1 − η_t)

where X̂_t is the approximate solution, and the last term is controlled by ε̃.

- α_t, β_t, γ_t and η_t are constants that depend on Z̃_t
- k̂ is the number of singular values > λ; k is the input rank for Approx-SVT
- ε̃ is the tolerance of the power method

The approximation error in Approx-SVT can thus be controlled by ε̃_t.


Convergence of AIS-Impute

Theorem. With the approximation error of the SVT controlled in this way, AIS-Impute converges to the optimal solution at a rate of O(1/T²).

Since the approximation error ε̃_t of the proximal step (Approx-SVT) decreases to 0 faster than O(1/T²), the convergence rate is the same as with an exact SVT.


Synthetic Data

- m × m data matrix O = UV + G
  - U ∈ ℝ^{m×5}, V ∈ ℝ^{5×m}: entries sampled i.i.d. from N(0, 1)
  - G: noise sampled from N(0, 0.05)
- ‖Ω‖_1 = 15 m log(m) random elements of O are observed: half for training, half for parameter tuning
- testing on the unobserved (missing) elements
- performance criteria:
  - NMSE = ‖P_{Ω⊥}(X − X̃)‖_F / ‖P_{Ω⊥}(X̃)‖_F
  - rank obtained
  - running time
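The synthetic setup above can be reproduced as follows (the noise scale 0.05 is taken here as the standard deviation of G, which is one reading of N(0, 0.05); function names are illustrative):

```python
import numpy as np

def make_synthetic(m, rank=5, noise=0.05, seed=0):
    """O = UV + G with ||Omega||_1 = 15 m log(m) observed entries."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((rank, m))
    O = U @ V + noise * rng.standard_normal((m, m))
    n_obs = int(15 * m * np.log(m))
    idx = rng.choice(m * m, size=n_obs, replace=False)   # observed positions
    mask = np.zeros(m * m)
    mask[idx] = 1.0
    return O, mask.reshape(m, m)

def nmse(X, X_tilde, mask):
    """NMSE on the unobserved entries, as defined above."""
    P = 1.0 - mask                                       # P_{Omega^perp}
    return np.linalg.norm(P * (X - X_tilde)) / np.linalg.norm(P * X_tilde)

O, mask = make_synthetic(100)
```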


Synthetic Data - Compared Methods

We compare the proposed AIS-Impute with:

- the accelerated proximal gradient algorithm (“APG”) [Ji and Ye, 2009; Toh and Yun, 2010]
- Soft-Impute [Mazumder et al., 2010]

Algorithm      Iteration Complexity       Rate       SVT
APG            O(mnk)                     O(1/T²)    exact
Soft-Impute    O(k‖Ω‖_1 + k²(m + n))      O(1/T)     exact
AIS-Impute     O(k‖Ω‖_1 + k²(m + n))      O(1/T²)    approximate

Code can be downloaded from https://github.com/quanmingyao/AIS-impute


Results

               m = 500 (sparsity 18.64%)        m = 1000 (10.36%)
               NMSE     rank   time (sec)       NMSE     rank   time (sec)
APG            0.0183   5      5.1              0.0223   5      45.5
Soft-Impute    0.0183   5      1.3              0.0223   5      4.4
AIS-Impute     0.0183   5      0.3              0.0223   5      1.1

               m = 1500 (7.31%)                 m = 2000 (5.70%)
               NMSE     rank   time (sec)       NMSE     rank   time (sec)
APG            0.0251   5      172.7            0.0273   5      483.9
Soft-Impute    0.0251   5      13.3             0.0273   5      18.7
AIS-Impute     0.0251   5      2.0              0.0273   5      2.9

All algorithms are equally good on recovery, while AIS-Impute is the fastest.


Convergence Speeds

(a) objective vs. #iterations.   (b) objective vs. time.

- w.r.t. #iterations: APG and AIS-Impute are much faster than Soft-Impute; AIS-Impute has a slightly higher objective than APG
- w.r.t. time: APG is the slowest (it does not use “sparse plus low-rank”); AIS-Impute is the fastest


Recommendation - MovieLens Data

Task: recommend movies based on users’ historical ratings.

                  #users    #movies   #ratings
MovieLens-100K    943       1,682     100,000
MovieLens-1M      6,040     3,449     999,714
MovieLens-10M     69,878    10,677    10,000,054

- ratings (from 1 to 5) of different users on movies
- 50% of the observed ratings for training, 25% for validation, and the rest for testing


MovieLens Data - Compared Methods

Besides the proximal algorithms, we also compare with:

- active subspace selection (“active”) [Hsieh and Olsen, 2014]
- the Frank-Wolfe algorithm (“boost”) [Zhang et al., 2012]
- a variant of Soft-Impute (“ALT-Impute”) [Hastie et al., 2014]
- a second-order trust-region algorithm (“TR”) [Mishra et al., 2013]


Objective w.r.t. Time

AIS-Impute is shown in black.

(a) MovieLens-100K.   (b) MovieLens-10M.

On MovieLens-10M, TR and APG are very slow and thus not shown.


Testing RMSE w.r.t. Time

AIS-Impute is shown in black.

(a) MovieLens-100K.   (b) MovieLens-10M.


Results

               MovieLens-100K           MovieLens-1M             MovieLens-10M
               RMSE    rank  time       RMSE    rank  time       RMSE    rank  time
active         1.037   70    59.5       0.925   180   1431.4     0.918   217   29681.4
boost          1.038   71    19.5       0.925   178   616.3      0.917   216   13873.9
ALT-Impute     1.037   70    29.1       0.925   179   797.1      0.919   215   17337.3
TR             1.037   71    1911.4     —       —     > 10⁶      —       —     > 10⁶
APG            1.037   70    83.4       0.925   180   2060.3     —       —     > 10⁶
Soft-Impute    1.037   70    337.6      0.925   180   8821.0     —       —     > 10⁶
AIS-Impute     1.037   70    5.8        0.925   179   129.7      0.916   215   2817.5

- all algorithms are equally good at recovering the missing matrix elements
- TR is the slowest
- ALT-Impute has the same convergence rate as Soft-Impute, but is faster
- AIS-Impute is the fastest


Conclusion

AIS-Impute accelerates proximal gradient descent without losing the “sparse plus low-rank” structure:

- the power method produces a good approximation to the SVT efficiently
- fast convergence rate + low iteration complexity
- empirically, much faster than the state of the art
