Complexity Analysis of the Lasso Regularization Path


Julien Mairal and Bin Yu

Inria, UC Berkeley

San Diego, SIAM Optimization, May 2014

Julien Mairal, Inria


What this work is about

Another paper about the Lasso/Basis Pursuit [Tibshirani, 1996, Chen et al., 1999]:

$$\min_{\mathbf{w} \in \mathbb{R}^p} \; \frac{1}{2} \|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2 + \lambda \|\mathbf{w}\|_1; \tag{1}$$

the first complexity analysis of the homotopy method [Ritter, 1962, Osborne et al., 2000, Efron et al., 2004] for solving (1).

Some conclusions reminiscent of

the simplex algorithm [Klee and Minty, 1972];

the SVM regularization path [Gärtner, Jaggi, and Maria, 2010].


The Lasso Regularization Path and the Homotopy

Under a uniqueness assumption on the Lasso solution, the regularization path is piecewise linear:

[Figure: coefficient values w1, ..., w5 as functions of λ, for λ between 0 and 4.]

Our Main Results

Theorem - worst case analysis: In the worst case, the regularization path of the Lasso has exactly (3^p + 1)/2 linear segments.

Proposition - approximate analysis: There exists an ε-approximate path with O(1/√ε) linear segments.


Brief Introduction to the Homotopy Algorithm

Piecewise linearity: Under uniqueness assumptions on the Lasso solution, the regularization path λ ↦ w⋆(λ) is continuous and piecewise linear.


Recipe of the homotopy method - main ideas

1. find a trivial solution w⋆(λ∞) = 0 with λ∞ = ‖X⊤y‖∞;

2. compute the direction of the current linear segment of the path;

3. follow the direction of the path by decreasing λ;

4. stop at the next “kink” and go back to 2.
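The four steps above can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the function name `lasso_homotopy`, the tolerance `tol`, and the random data are assumptions made for illustration. The code also assumes a unique solution, an invertible Gram matrix on every active set, and a single variable entering or leaving at each kink. It relies on the standard closed form on a segment with active set J and signs s, namely w_J(λ) = (X_J⊤X_J)^{-1}(X_J⊤y − λs).

```python
import numpy as np

def lasso_homotopy(X, y, tol=1e-9, max_kinks=10_000):
    """Kinks [(lam, w), ...] of the Lasso path, from lam_inf down to lam = 0.

    Sketch only: assumes a unique solution, an invertible Gram matrix on
    every active set, and one variable entering or leaving per kink.
    """
    p = X.shape[1]
    corr = X.T @ y
    j0 = int(np.argmax(np.abs(corr)))
    lam = float(np.abs(corr[j0]))            # step 1: lam_inf = ||X^T y||_inf
    signs = {j0: float(np.sign(corr[j0]))}   # active variables and their signs
    kinks = [(lam, np.zeros(p))]             # w*(lam_inf) = 0
    for _ in range(max_kinks):
        J = sorted(signs)
        XJ = X[:, J]
        s = np.array([signs[j] for j in J])
        G = XJ.T @ XJ
        # step 2: on this segment, w_J(lam) = a - lam * b
        a = np.linalg.solve(G, XJ.T @ y)
        b = np.linalg.solve(G, s)
        # correlations along the segment: X^T (y - X w(lam)) = c0 + lam * d
        c0 = X.T @ (y - XJ @ a)
        d = X.T @ (XJ @ b)
        # step 3: largest lam' < lam at which the active set changes
        next_lam, event = 0.0, None
        for j in range(p):
            if j in signs:
                continue
            for sgn in (1.0, -1.0):          # j enters when c_j(lam') = sgn * lam'
                denom = sgn - d[j]
                if abs(denom) > tol:
                    cand = c0[j] / denom
                    if next_lam < cand < lam * (1.0 - tol):
                        next_lam, event = cand, ("enter", j, sgn)
        for idx, j in enumerate(J):          # j leaves when w_j(lam') = 0
            if abs(b[idx]) > tol:
                cand = a[idx] / b[idx]
                if next_lam < cand < lam * (1.0 - tol):
                    next_lam, event = cand, ("leave", j, 0.0)
        # step 4: record the kink, then update the active set
        w = np.zeros(p)
        w[J] = a - next_lam * b
        lam = next_lam
        kinks.append((lam, w))
        if event is None:                    # no further kink: lam reached 0
            break
        kind, j, sgn = event
        if kind == "enter":
            signs[j] = sgn
        else:
            del signs[j]
    return kinks

# Hypothetical data, for illustration only.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = rng.standard_normal(20)
kinks = lasso_homotopy(X, y)
```

Each returned pair (λ, w) is a kink, and the path between two consecutive kinks is the straight line joining them. The relative tolerance used to skip the kink just visited is a design shortcut: kinks that lie closer together than `tol` would be missed by this sketch.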


Caveats

kinks can be very close to each other;

the direction of the path can involve ill-conditioned matrices;

worst-case exponential complexity (main result of this work).

Worst case analysis

Theorem - worst case analysis: In the worst case, the regularization path of the Lasso has exactly (3^p + 1)/2 linear segments.

[Figure: Regularization path, p=6. Coefficients (log scale) versus kinks.]

Worst case analysis

Consider a Lasso problem (y ∈ R^n, X ∈ R^{n×p}). Define the vector ỹ in R^{n+1} and the matrix X̃ in R^{(n+1)×(p+1)} as follows:

$$\tilde{\mathbf{y}} \triangleq \begin{bmatrix} \mathbf{y} \\ y_{n+1} \end{bmatrix}, \qquad \tilde{\mathbf{X}} \triangleq \begin{bmatrix} \mathbf{X} & 2\alpha\mathbf{y} \\ 0 & \alpha y_{n+1} \end{bmatrix},$$

where y_{n+1} ≠ 0 and 0 < α < λ_1/(2y⊤y + y_{n+1}^2).

Adversarial strategy: If the regularization path of the Lasso (y, X) has k linear segments, the path of (ỹ, X̃) has 3k − 1 linear segments.
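To see why gaining 3k − 1 segments per added variable leads to the (3^p + 1)/2 count, here is a short worked recursion. It assumes that a one-variable problem has k_1 = 2 segments (the zero segment followed by one linear piece); the notation k_p for the number of segments with p variables is introduced here for illustration only.

```latex
k_{p+1} = 3k_p - 1
\;\Longleftrightarrow\;
k_{p+1} - \tfrac{1}{2} = 3\left(k_p - \tfrac{1}{2}\right),
\qquad\text{so}\qquad
k_p = \tfrac{1}{2} + 3^{\,p-1}\left(k_1 - \tfrac{1}{2}\right) = \frac{3^p + 1}{2}
\quad\text{for } k_1 = 2.
```

For p = 6 this gives 365 segments.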


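Worst case analysis

$$\tilde{\mathbf{y}} \triangleq \begin{bmatrix} \mathbf{y} \\ y_{n+1} \end{bmatrix}, \qquad \tilde{\mathbf{X}} \triangleq \begin{bmatrix} \mathbf{X} & 2\alpha\mathbf{y} \\ 0 & \alpha y_{n+1} \end{bmatrix}.$$

Let us denote by {η^1, ..., η^k} the sequence of k sparsity patterns in {−1, 0, 1}^p encountered along the path of the Lasso (y, X). The new sequence of sparsity patterns for (ỹ, X̃) is

$$\Bigg\{ \underbrace{\begin{bmatrix} \boldsymbol{\eta}^1 = 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \boldsymbol{\eta}^2 \\ 0 \end{bmatrix}, \ldots, \begin{bmatrix} \boldsymbol{\eta}^k \\ 0 \end{bmatrix}}_{\text{first } k \text{ patterns}}, \; \underbrace{\begin{bmatrix} \boldsymbol{\eta}^k \\ 1 \end{bmatrix}, \begin{bmatrix} \boldsymbol{\eta}^{k-1} \\ 1 \end{bmatrix}, \ldots, \begin{bmatrix} \boldsymbol{\eta}^1 = 0 \\ 1 \end{bmatrix}}_{\text{middle } k \text{ patterns}}, \; \underbrace{\begin{bmatrix} -\boldsymbol{\eta}^2 \\ 1 \end{bmatrix}, \begin{bmatrix} -\boldsymbol{\eta}^3 \\ 1 \end{bmatrix}, \ldots, \begin{bmatrix} -\boldsymbol{\eta}^k \\ 1 \end{bmatrix}}_{\text{last } k-1 \text{ patterns}} \Bigg\}.$$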
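The pattern rule above is easy to apply mechanically. The sketch below applies it repeatedly and lists the resulting sparsity patterns. The function names and the choice to start from the empty pattern (a problem with no variables) are assumptions made for illustration. The code only manipulates patterns; it does not solve any Lasso problem.

```python
def extend_patterns(patterns):
    """Apply the first/middle/last rule to a list of sparsity patterns.

    Each pattern is a tuple with entries in {-1, 0, 1}; patterns[0] is
    expected to be the all-zero pattern.
    """
    first = [eta + (0,) for eta in patterns]                 # k patterns
    middle = [eta + (1,) for eta in reversed(patterns)]      # k patterns
    last = [tuple(-v for v in eta) + (1,) for eta in patterns[1:]]  # k - 1 patterns
    return first + middle + last

def worst_case_patterns(p):
    """Patterns obtained after applying the rule p times (hypothetical helper)."""
    patterns = [()]          # assumed starting point: the empty pattern, k = 1
    for _ in range(p):
        patterns = extend_patterns(patterns)
    return patterns

for p in range(1, 5):
    print(p, len(worst_case_patterns(p)))
```

Under this starting point the patterns produced are pairwise distinct, which is consistent with each one corresponding to its own linear segment.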


Worst case analysis

We are now in a position to build a pathological path with (3^p + 1)/2 linear segments. Note that this lower-bound complexity is tight.

$$\mathbf{y} \triangleq \begin{bmatrix} 1 \\ 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}, \qquad \mathbf{X} \triangleq \begin{bmatrix} \alpha_1 & 2\alpha_2 & 2\alpha_3 & \ldots & 2\alpha_p \\ 0 & \alpha_2 & 2\alpha_3 & \ldots & 2\alpha_p \\ 0 & 0 & \alpha_3 & \ldots & 2\alpha_p \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & \alpha_p \end{bmatrix}.$$
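The pathological design is what the augmentation step produces when it is applied p − 1 times to a one-variable problem with y_{n+1} = 1 each time. The sketch below builds the matrix both ways and checks that they agree. The function names and the values of α are hypothetical, and the α values are not checked against the bound 0 < α < λ_1/(2y⊤y + y_{n+1}^2), so this only illustrates the structure of (y, X) and not the resulting path.

```python
import numpy as np

def augment(y, X, y_new, alpha):
    """One augmentation step: returns (y_tilde, X_tilde) as defined in the slides."""
    n, p = X.shape
    y_tilde = np.append(y, y_new)
    X_tilde = np.block([
        [X, 2.0 * alpha * y.reshape(-1, 1)],
        [np.zeros((1, p)), np.array([[alpha * y_new]])],
    ])
    return y_tilde, X_tilde

def pathological_design(alphas):
    """Direct construction: alpha_j on the diagonal, 2*alpha_j above it in column j."""
    alphas = np.asarray(alphas, dtype=float)
    p = len(alphas)
    X = np.triu(2.0 * np.ones((p, p)), k=1) * alphas + np.diag(alphas)
    return np.ones(p), X

# Hypothetical alpha values, chosen only to show the structure.
alphas = [1.0, 0.1, 0.01]
y_rec, X_rec = np.ones(1), np.array([[alphas[0]]])
for a in alphas[1:]:
    y_rec, X_rec = augment(y_rec, X_rec, 1.0, a)
y_dir, X_dir = pathological_design(alphas)
print(X_dir)
```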

Approximate Complexity

Refinement of Giesen, Jaggi, and Laue [2010] for the Lasso.

Strong Duality

[Figure: the primal objective f(w) with its minimizer w⋆ and the dual objective g(κ) with its maximizer κ⋆.]

Strong duality means that max_κ g(κ) = min_w f(w).

Approximate Complexity

Duality Gaps

[Figure: the primal objective f(w) and the dual objective g(κ), with the duality gap δ(w̃, κ̃) between f(w̃) and g(κ̃).]

Strong duality means that max_κ g(κ) = min_w f(w).

The duality gap guarantees that 0 ≤ f(w̃) − f(w⋆) ≤ δ(w̃, κ̃).

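Approximate Complexity

$$\min_{\mathbf{w}} \Big\{ f_\lambda(\mathbf{w}) \triangleq \frac{1}{2} \|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2 + \lambda \|\mathbf{w}\|_1 \Big\}, \qquad \text{(primal)}$$

$$\max_{\boldsymbol{\kappa}} \Big\{ g_\lambda(\boldsymbol{\kappa}) \triangleq -\frac{1}{2} \boldsymbol{\kappa}^\top \boldsymbol{\kappa} - \boldsymbol{\kappa}^\top \mathbf{y} \;\; \text{s.t.} \;\; \|\mathbf{X}^\top \boldsymbol{\kappa}\|_\infty \leq \lambda \Big\}. \qquad \text{(dual)}$$

ε-approximate solution: w satisfies APPROX_λ(ε) when there exists a dual variable κ s.t. δ_λ(w, κ) = f_λ(w) − g_λ(κ) ≤ ε f_λ(w).

ε-approximate path: A path P : λ ↦ w(λ) is an approximate path if it always contains ε-approximate solutions. (see Giesen et al. [2010] for generic results on approximate paths)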
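The APPROX_λ(ε) test can be checked numerically once a feasible dual variable is chosen. The sketch below uses one common choice, the residual κ = Xw − y rescaled so that ‖X⊤κ‖∞ ≤ λ. That choice, and the function names, are assumptions made for illustration; the slides do not specify which dual variable is used.

```python
import numpy as np

def primal(X, y, w, lam):
    """f_lambda(w) = 0.5 * ||y - Xw||^2 + lam * ||w||_1."""
    r = y - X @ w
    return 0.5 * r @ r + lam * np.sum(np.abs(w))

def dual(y, kappa):
    """g_lambda(kappa) = -0.5 * kappa^T kappa - kappa^T y (feasibility checked elsewhere)."""
    return -0.5 * kappa @ kappa - kappa @ y

def relative_gap(X, y, w, lam):
    """delta_lambda(w, kappa) / f_lambda(w) for a rescaled-residual dual variable.

    Assumes f_lambda(w) > 0.
    """
    kappa = X @ w - y
    norm = np.max(np.abs(X.T @ kappa))
    if norm > lam:                      # shrink kappa onto the feasible set
        kappa = kappa * (lam / norm)
    f = primal(X, y, w, lam)
    return (f - dual(y, kappa)) / f

def satisfies_approx(X, y, w, lam, eps):
    """True when this particular dual variable certifies APPROX_lambda(eps)."""
    return relative_gap(X, y, w, lam) <= eps
```

Because only one candidate κ is tried, `satisfies_approx` returning False does not prove that w fails APPROX_λ(ε); it only means this certificate was not tight enough.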


Approximate Complexity

Main relation:

$$\mathrm{APPROX}_{\lambda}(0) \implies \mathrm{APPROX}_{\lambda(1-\sqrt{\varepsilon})}(\varepsilon)$$

Key: find an appropriate dual variable κ(w) + simple calculation.

Proposition - approximate analysis: There exists an ε-approximate path with at most

$$\left\lceil \frac{\log(\lambda_\infty/\lambda_1)}{\sqrt{\varepsilon}} \right\rceil$$

segments.
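One way to carry out the "simple calculation" behind the main relation is sketched here. It assumes that the dual variable is the optimal one at λ, rescaled to stay feasible at the smaller parameter. The slides do not spell out this choice, so the symbols θ, λ', κ⋆ and κ' are introduced for illustration.

```latex
\begin{aligned}
&\text{Let } \mathbf{w} = \mathbf{w}^\star(\lambda),\quad
 \boldsymbol{\kappa}^\star = \mathbf{X}\mathbf{w} - \mathbf{y},\quad
 \theta = \sqrt{\varepsilon},\quad
 \lambda' = (1-\theta)\lambda,\quad
 \boldsymbol{\kappa}' = (1-\theta)\boldsymbol{\kappa}^\star. \\
&\text{Feasibility: } \|\mathbf{X}^\top\boldsymbol{\kappa}'\|_\infty
 = (1-\theta)\,\|\mathbf{X}^\top\boldsymbol{\kappa}^\star\|_\infty \le \lambda'. \\
&\text{Strong duality at } \lambda:\quad
 f_\lambda(\mathbf{w}) = g_\lambda(\boldsymbol{\kappa}^\star)
 \;\Longrightarrow\;
 -\boldsymbol{\kappa}^{\star\top}\mathbf{y}
 = \|\boldsymbol{\kappa}^\star\|_2^2 + \lambda\|\mathbf{w}\|_1. \\
&g_{\lambda'}(\boldsymbol{\kappa}')
 = -\tfrac{(1-\theta)^2}{2}\|\boldsymbol{\kappa}^\star\|_2^2
   + (1-\theta)\big(\|\boldsymbol{\kappa}^\star\|_2^2 + \lambda\|\mathbf{w}\|_1\big)
 = \tfrac{1-\theta^2}{2}\|\boldsymbol{\kappa}^\star\|_2^2 + \lambda'\|\mathbf{w}\|_1. \\
&f_{\lambda'}(\mathbf{w})
 = \tfrac{1}{2}\|\boldsymbol{\kappa}^\star\|_2^2 + \lambda'\|\mathbf{w}\|_1. \\
&\delta_{\lambda'}(\mathbf{w}, \boldsymbol{\kappa}')
 = f_{\lambda'}(\mathbf{w}) - g_{\lambda'}(\boldsymbol{\kappa}')
 = \tfrac{\theta^2}{2}\|\boldsymbol{\kappa}^\star\|_2^2
 = \varepsilon \cdot \tfrac{1}{2}\|\boldsymbol{\kappa}^\star\|_2^2
 \le \varepsilon\, f_{\lambda'}(\mathbf{w}).
\end{aligned}
```

So the exact solution at λ remains an ε-approximate solution at λ(1 − √ε), which is the step size behind the segment count in the proposition.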


Approximate Homotopy

Recipe - main ideas/features

Maintain approximate optimality conditions along the path;

Make steps in λ greater than or equal to λ(1 − θ√ε);

When the kinks are too close to each other, make a large step and use a first-order method instead;

Between λ∞ and λ1, the number of iterations is upper-bounded by ⌈log(λ∞/λ1)/(θ√ε)⌉.
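The iteration bound in the last item can be checked against a geometric grid in λ. The sketch below reads the step rule as "each iteration moves λ down to at most λ(1 − θ√ε)", which is an interpretation of the recipe and not something the slides state in this form. The function names and the numerical values are hypothetical.

```python
import math

def lambda_grid(lam_inf, lam_1, eps, theta=1.0):
    """Geometric grid lam_{k+1} = lam_k * (1 - theta * sqrt(eps)), clamped at lam_1."""
    shrink = 1.0 - theta * math.sqrt(eps)
    if not 0.0 < shrink < 1.0:
        raise ValueError("theta * sqrt(eps) must lie strictly between 0 and 1")
    grid = [lam_inf]
    while grid[-1] > lam_1:
        grid.append(max(grid[-1] * shrink, lam_1))
    return grid

def iteration_bound(lam_inf, lam_1, eps, theta=1.0):
    """Upper bound ceil(log(lam_inf / lam_1) / (theta * sqrt(eps))) from the recipe."""
    return math.ceil(math.log(lam_inf / lam_1) / (theta * math.sqrt(eps)))

# Hypothetical values, for illustration only.
grid = lambda_grid(10.0, 0.1, eps=0.01)
bound = iteration_bound(10.0, 0.1, eps=0.01)
print(len(grid) - 1, "steps, bound", bound)
```

The grid never needs more steps than the bound because −log(1 − x) ≥ x for 0 < x < 1, so each step reduces log λ by at least θ√ε.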


A Few Messages to Conclude

Despite its exponential complexity, the homotopy algorithm remains extremely powerful in practice;

the main issue of the homotopy algorithm might be its numerical stability;

when one does not care about precision, the worst-case complexity of the path can be significantly reduced.

Advertisement

SPAMS toolbox (open-source)

C++ interfaced with Matlab, R, Python.

proximal gradient methods for ℓ0, ℓ1, elastic-net, fused-Lasso, group-Lasso, tree group-Lasso, tree-ℓ0, sparse group Lasso, overlapping group Lasso...

...for square, logistic, multi-class logistic loss functions.

handles sparse matrices, provides duality gaps.

fast implementations of OMP and LARS - homotopy.

dictionary learning and matrix factorization (NMF, sparse PCA).

coordinate descent, block coordinate descent algorithms.

fast projections onto some convex sets.

Try it! http://www.di.ens.fr/willow/SPAMS/

References I

S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20:33–61, 1999.

B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32(2):407–499, 2004.

B. Gärtner, M. Jaggi, and C. Maria. An exponential lower bound on the complexity of regularization paths. preprint arXiv:0903.4817v2, 2010.

J. Giesen, M. Jaggi, and S. Laue. Approximating parameterized convex optimization problems. In Algorithms - ESA, Lecture Notes Comp. Sci., 2010.

V. Klee and G. J. Minty. How good is the simplex algorithm? In O. Shisha, editor, Inequalities, volume III, pages 159–175. Academic Press, New York, 1972.

References II

M. R. Osborne, B. Presnell, and B. A. Turlach. On the Lasso and its dual. Journal of Computational and Graphical Statistics, 9(2):319–37, 2000.

K. Ritter. Ein verfahren zur lösung parameterabhängiger, nichtlinearer maximum-probleme. Mathematical Methods of Operations Research, 6(4):149–166, 1962.

R. Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society. Series B, 58(1):267–288, 1996.

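Worst case analysis - Backup Slide

$$\tilde{\mathbf{y}} \triangleq \begin{bmatrix} \mathbf{y} \\ y_{n+1} \end{bmatrix}, \qquad \tilde{\mathbf{X}} \triangleq \begin{bmatrix} \mathbf{X} & 2\alpha\mathbf{y} \\ 0 & \alpha y_{n+1} \end{bmatrix}.$$

Some intuition about the adversarial strategy:

1. the patterns of the new path must be [η^{i⊤}, 0]⊤ or [±η^{i⊤}, 1]⊤;

2. the factor α ensures that the (p + 1)-th variable enters the path late;

3. after the k first kinks, we have y ≈ Xw⋆(λ) and thus

$$\tilde{\mathbf{X}} \begin{bmatrix} \mathbf{w}^\star(\lambda) \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ y_{n+1} \end{bmatrix} \approx \tilde{\mathbf{y}} \approx \tilde{\mathbf{X}} \begin{bmatrix} -\mathbf{w}^\star(\lambda) \\ 1/\alpha \end{bmatrix}.$$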

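Worst case analysis - Backup Slide 2

$$\min_{\tilde{\mathbf{w}} \in \mathbb{R}^p,\, \tilde{w} \in \mathbb{R}} \; \frac{1}{2} \left\| \tilde{\mathbf{y}} - \tilde{\mathbf{X}} \begin{bmatrix} \tilde{\mathbf{w}} \\ \tilde{w} \end{bmatrix} \right\|_2^2 + \lambda \left\| \begin{bmatrix} \tilde{\mathbf{w}} \\ \tilde{w} \end{bmatrix} \right\|_1$$

$$= \min_{\tilde{\mathbf{w}} \in \mathbb{R}^p,\, \tilde{w} \in \mathbb{R}} \; \frac{1}{2} \|(1 - 2\alpha\tilde{w})\mathbf{y} - \mathbf{X}\tilde{\mathbf{w}}\|_2^2 + \frac{1}{2} (y_{n+1} - \alpha y_{n+1} \tilde{w})^2 + \lambda \|\tilde{\mathbf{w}}\|_1 + \lambda |\tilde{w}|.$$

For the optimal scalar w̃⋆, this is equivalent to

$$\min_{\tilde{\mathbf{w}}' \in \mathbb{R}^p} \; \frac{1}{2} \|\mathbf{y} - \mathbf{X}\tilde{\mathbf{w}}'\|_2^2 + \frac{\lambda}{|1 - 2\alpha\tilde{w}^\star|} \|\tilde{\mathbf{w}}'\|_1,$$

and then

$$\tilde{\mathbf{w}}^\star = \begin{cases} (1 - 2\alpha\tilde{w}^\star)\, \mathbf{w}^\star\!\left( \dfrac{\lambda}{|1 - 2\alpha\tilde{w}^\star|} \right) & \text{if } \tilde{w}^\star \neq \dfrac{1}{2\alpha}, \\ 0 & \text{otherwise.} \end{cases}$$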
