Regularized Linear Regression: A Precise Analysis of the Estimation Error

Christos Thrampoulidis, Samet Oymak, Babak Hassibi

Caltech
COLT 2015, Paris, July 6, 2015


Setting

- How to find a good estimate \hat{x} of x_0 from y = A x_0 + z?

- Solve \hat{x} = \arg\min_x \, \mathcal{L}(y - Ax) + \lambda f(x)
    - \mathcal{L}: loss function, convex; e.g., \ell_2, \ell_1, \ell_\infty, \ell_2^2, Huber loss, ...
    - f: regularizer, convex, non-smooth; e.g., \ell_1, \ell_{1,2}, nuclear norms, ...

- Popular instances:
    - \min_x \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda f(x): LASSO, Group-LASSO, Fused-LASSO, ... (see the sketch below)
    - \min_x \|y - Ax\|_2 + \lambda f(x): Square-root LASSO
    - \min_x \|y - Ax\|_1 + \lambda f(x): Least Absolute Deviations (LAD)
    - ...

- \|\hat{x} - x_0\|_2^2 = ?
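As a concrete, runnable illustration of the first instance above (not part of the talk): the LASSO, i.e., the squared-\ell_2 loss with an \ell_1 regularizer, solved with cvxpy. The dimensions, sparsity, noise level, and \lambda below are arbitrary illustrative choices.

# Minimal sketch (not from the talk): one instance of  min_x L(y - Ax) + lambda * f(x)
# with L = (1/2)||.||_2^2 and f = ||.||_1, i.e., the LASSO.
import numpy as np
import cvxpy as cp

m, n, k, sigma = 150, 300, 10, 0.1          # illustrative choices
rng = np.random.default_rng(0)

A = rng.standard_normal((m, n))             # i.i.d. standard normal design, as in the talk
x0 = np.zeros(n)
x0[:k] = rng.standard_normal(k)             # k-sparse ground truth
y = A @ x0 + sigma * rng.standard_normal(m)

lam = 2.0
x = cp.Variable(n)
objective = 0.5 * cp.sum_squares(y - A @ x) + lam * cp.norm1(x)
cp.Problem(cp.Minimize(objective)).solve()

print("squared estimation error:", float(np.sum((x.value - x0) ** 2)))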

Contribution

\hat{x} = \arg\min_x \, \mathcal{L}(y - Ax) + \lambda f(x)

When
- A has entries i.i.d. standard normal,
- in the high-dimensional (proportional) asymptotic regime: m, n \to \infty with m/n \to \delta \in (0, \infty),

we propose a general framework for the precise characterization of the error \|\hat{x} - x_0\|_2^2.

Previous literature:
- [Candes'07, Bickel'12, Belloni'11, Negahban'12, ...]: order-wise results (unknown constants)
- [DBM'11, BM'12, ...]: limited instances: \ell_2^2-LASSO (only separable regularizers), AMP
- [Stojnic'13]: constrained LASSO (\ell_1 regularizer), GMT + convexity: we build on this!

The Tool

Convex Gaussian Min-max Theorem (CGMT): a tight version of a classical Gaussian comparison inequality of Gordon (1988) when combined with convexity; inspired by [Stojnic'13].

"Associates with a (difficult) Primary Optimization a simplified Auxiliary Optimization with the same optimal cost, norm of optimal solution, etc."
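To make the statement concrete, here is a schematic of the pair of problems the CGMT couples (notation is mine; see the paper for the precise constraint sets, regularity conditions, and probability bounds):

\[
\text{(PO)}:\quad \Phi(G) = \min_{w \in \mathcal{S}_w} \max_{u \in \mathcal{S}_u} \; u^T G\, w + \psi(w, u),
\qquad
\text{(AO)}:\quad \phi(g, h) = \min_{w \in \mathcal{S}_w} \max_{u \in \mathcal{S}_u} \; \|w\|_2\, g^T u + \|u\|_2\, h^T w + \psi(w, u),
\]

where G has i.i.d. standard normal entries, g and h are independent standard normal vectors, \mathcal{S}_w and \mathcal{S}_u are compact, and \psi(w, u) is convex in w and concave in u. High-probability statements about the optimal cost of (PO), and hence about the norm of its optimizer, transfer to the far simpler (AO).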

How it all works

\hat{x} = \arg\min_x \, \mathcal{L}(y - Ax) + \lambda f(x)     (RO: Regression Optimization)

- Write the loss through its convex conjugate, \mathcal{L}(v) = \max_u \, u^T v - \mathcal{L}^*(u), turning (RO) into a min-max Primary Optimization (PO).
- Main theorem (CGMT): pass from the (PO) to an Auxiliary Optimization (AO).
- Analyze the (AO): problem specific...
- Conclude: \|\hat{x} - x_0\|_2 \to \dots
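A worked expansion of the first step (my own filling-in of the slide's recipe): substitute the conjugate representation of the loss and change variables to the error vector w = x - x_0, so that y - Ax = z - Aw:

\[
\min_x \; \mathcal{L}(y - Ax) + \lambda f(x)
\;=\;
\min_w \max_u \; u^T z - u^T A\, w - \mathcal{L}^*(u) + \lambda f(x_0 + w).
\]

The Gaussian matrix A now enters only through the bilinear term u^T A w, and the remaining terms are convex in w and concave in u, so this is a (PO) in the sense of the CGMT; the theorem then replaces it with the corresponding (AO), whose analysis yields the limit of \|\hat{x} - x_0\|_2.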

An example: Error vs \lambda

X_0 \in \mathbb{R}^{45 \times 45} is rank 6. Observe y = \mathcal{A}(X_0) + z and solve the matrix LASSO (a simulation sketch follows the figure):

\min_X \, \|y - \mathcal{A}(X)\|_2 + \lambda \|X\|_*

[Figure (low-rank estimation): normalized squared error NSE = \|\hat{X} - X_0\|_F^2 / \sigma^2 versus the regularization parameter \lambda; the analytical prediction matches the simulation, and the plot marks \lambda_{best} and the optimal tuning when the rank r is known.]
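A hedged sketch of how such an experiment could be reproduced with cvxpy (not the talk's actual code): the number of measurements m, the noise level, and the value of \lambda are hypothetical choices, and the measurement map is taken to be i.i.d. Gaussian.

# Minimal sketch (not from the talk): matrix LASSO with a Gaussian measurement map.
import numpy as np
import cvxpy as cp

d, r, sigma = 45, 6, 0.1
m = 1200                                        # hypothetical number of measurements
rng = np.random.default_rng(0)

X0 = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))   # rank-r ground truth
A = rng.standard_normal((m, d * d))             # row i: measurement matrix A_i, flattened column-major
y = A @ X0.flatten(order="F") + sigma * rng.standard_normal(m)   # y_i = <A_i, X0> + z_i

lam = 4.0                                       # one point of the lambda sweep
X = cp.Variable((d, d))
# Apply the measurement map to X column-block by column-block, matching the
# column-major flattening used to generate y above.
AX = sum(A[:, j * d:(j + 1) * d] @ X[:, j] for j in range(d))
objective = cp.norm(y - AX, 2) + lam * cp.norm(X, "nuc")
cp.Problem(cp.Minimize(objective)).solve()

print("NSE:", np.linalg.norm(X.value - X0, "fro") ** 2 / sigma ** 2)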

Last one

x_0 \in \mathbb{R}^{4600} is k-sparse. Observe y = A x_0 + \sigma z and solve the \ell_1-regularized LAD (a simulation sketch follows the figure):

\min_x \, \|y - Ax\|_1 + \lambda \|x\|_1

[Figure: \|\hat{x} - x_0\|_2^2 / \sigma^2 versus the number of measurements m, for sparsity levels k = 36, 60, 84.]
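A hedged simulation sketch for this experiment (not the talk's code): n matches the slide, while m, k, \sigma, and \lambda are illustrative choices.

# Minimal sketch (not from the talk): l1-regularized LAD on a k-sparse signal.
import numpy as np
import cvxpy as cp

n, m, k, sigma = 4600, 400, 36, 0.1
rng = np.random.default_rng(0)

A = rng.standard_normal((m, n))
x0 = np.zeros(n)
x0[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)   # k-sparse ground truth
y = A @ x0 + sigma * rng.standard_normal(m)

lam = 1.0
x = cp.Variable(n)
objective = cp.norm1(y - A @ x) + lam * cp.norm1(x)
cp.Problem(cp.Minimize(objective)).solve()

print("normalized squared error:", float(np.sum((x.value - x0) ** 2) / sigma ** 2))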

[Additional figures: \|\hat{x} - x_0\|_2^2 / \sigma^2 versus \lambda and versus the number of measurements m for noise levels \sigma^2 ranging from 10^{-4} to 1, compared against the prediction of Theorem 3.1, together with the low-rank estimation NSE-versus-\lambda plot (analytical prediction vs. simulation, \lambda_{best} marked).]