Regularized Linear Regression: A Precise Analysis of the Estimation Error

Christos Thrampoulidis, Samet Oymak, Babak Hassibi
Caltech

COLT 2015, Paris, July 06, 2015
Setting

• How to find a good estimate x̂ of x₀ from y = Ax₀ + z?

• Solve x̂ = arg min_x L(y − Ax) + λ f(x)
    – L: convex loss function, e.g., ℓ₂, ℓ₁, ℓ∞, ℓ₂², Huber loss, ...
    – f: convex, non-smooth regularizer, e.g., ℓ₁, ℓ₁,₂, nuclear norms, ...

  Popular instances (a minimal solver sketch follows the slide):
    – min_x (1/2)‖y − Ax‖₂² + λ f(x)    LASSO, Group-LASSO, Fused-LASSO, ...
    – min_x ‖y − Ax‖₂ + λ f(x)          Square-root LASSO
    – min_x ‖y − Ax‖₁ + λ f(x)          Least Absolute Deviations (LAD)
    – ...

• ‖x̂ − x₀‖₂² = ?
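A minimal sketch of these three instances using cvxpy; the dimensions, sparsity, noise level, λ, and the choice of f = ℓ₁ are illustrative assumptions, not the paper's settings:

```python
import numpy as np
import cvxpy as cp

# Illustrative sizes: m measurements, n unknowns, k-sparse ground truth.
m, n, k, sigma, lam = 100, 200, 10, 0.1, 0.1
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))                 # i.i.d. standard normal A
x0 = np.zeros(n)
x0[:k] = rng.standard_normal(k)                 # k-sparse x0
y = A @ x0 + sigma * rng.standard_normal(m)     # y = A x0 + z

x = cp.Variable(n)
instances = {
    "LASSO":      0.5 * cp.sum_squares(y - A @ x) + lam * cp.norm(x, 1),
    "sqrt-LASSO": cp.norm(y - A @ x, 2) + lam * cp.norm(x, 1),
    "LAD":        cp.norm(y - A @ x, 1) + lam * cp.norm(x, 1),
}
for name, objective in instances.items():
    cp.Problem(cp.Minimize(objective)).solve()
    print(f"{name:10s} ||x_hat - x0||_2^2 = {np.sum((x.value - x0) ** 2):.4f}")
```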
Contribution

x̂ = arg min_x L(y − Ax) + λ f(x)

When
• A has entries i.i.d. standard normal
• high-dimensional (proportional) asymptotic regime: m, n → ∞ with m/n → δ ∈ (0, ∞)

We propose
• a general framework for the precise characterization of the error ‖x̂ − x₀‖₂²

Previous literature
• [Candès '07, Bickel '12, Belloni '11, Negahban '12, ...]: order-wise results (unknown constants)
• [DMM '11, BM '12, ...]: limited instances: ℓ₂²-LASSO (only separable regularizers), AMP
• [Stojnic '13]: constrained LASSO (ℓ₁ regularizer), GMT + convexity: we build on this!
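To make the proportional regime concrete, here is a hedged Monte Carlo sketch: fix δ = m/n and the sparsity fraction, grow n, and watch the normalized error settle around a deterministic value. The values of δ, ρ, σ, λ and the square-root-LASSO instance are assumptions for illustration:

```python
import numpy as np
import cvxpy as cp

# Proportional regime: m/n -> delta, k/n -> rho fixed; illustrative values.
delta, rho, sigma, lam = 0.75, 0.1, 0.5, 0.1
rng = np.random.default_rng(1)

for n in (100, 200, 400):
    m, k = int(delta * n), int(rho * n)
    A = rng.standard_normal((m, n))
    x0 = np.zeros(n)
    x0[:k] = rng.standard_normal(k)
    y = A @ x0 + sigma * rng.standard_normal(m)

    x = cp.Variable(n)
    cp.Problem(cp.Minimize(cp.norm(y - A @ x, 2) + lam * cp.norm(x, 1))).solve()
    print(f"n = {n:4d}:  ||x_hat - x0||_2^2 / sigma^2 = "
          f"{np.sum((x.value - x0) ** 2) / sigma**2:.3f}")
```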
The Tool: CGMT

Convex Gaussian Min-max Theorem (CGMT)
• A tight version of a classical Gaussian comparison inequality of Gordon (1988), when combined with convexity; inspired by [Stojnic '13].
• "Associates with a (difficult) Primary Optimization a simplified Auxiliary Optimization with the same optimal cost, norm of optimal solution, etc."
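In symbols, the standard form of the comparison pair (reproduced here up to notation; S_w, S_u are compact sets and ψ is continuous) reads:

```latex
% Primary Optimization (PO): G has i.i.d. N(0,1) entries
\Phi(G) = \min_{w \in S_w} \max_{u \in S_u} \; u^T G w + \psi(w, u)

% Auxiliary Optimization (AO): g ~ N(0, I_m), h ~ N(0, I_n) independent
\phi(g, h) = \min_{w \in S_w} \max_{u \in S_u} \;
  \|w\|_2 \, g^T u + \|u\|_2 \, h^T w + \psi(w, u)
```

When S_w, S_u are additionally convex and ψ(w, u) is convex in w and concave in u, the CGMT upgrades Gordon's one-sided comparison to a two-sided one, so the optimal costs of (PO) and (AO) concentrate around the same deterministic value.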
How it all works

Regression Optimization (RO):   x̂ = arg min_x L(y − Ax) + λ f(x)

      │  rewrite the loss via its convex conjugate: L(v) = max_u { uᵀv − L*(u) }
      ▼
min-max Primary Optimization (PO)

      │  Main theorem (CGMT)
      ▼
Auxiliary Optimization (AO):  analyze the (AO); this step is problem specific...

      ⇒   ‖x̂ − x₀‖₂ → ...

(The (RO) → (PO) step is sketched right after this slide.)
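A sketch of the (RO) → (PO) step, using only the conjugate identity above; writing the error vector as w := x − x₀ is a notational convenience:

```latex
\hat{x} = \arg\min_x \; \mathcal{L}(y - Ax) + \lambda f(x),
\qquad y = A x_0 + z .
```

Since y − Ax = z − Aw, substituting L(v) = max_u { uᵀv − L*(u) } gives

```latex
\min_w \max_u \;\; u^T (z - A w) - \mathcal{L}^*(u) + \lambda f(x_0 + w)
\;=\; \min_w \max_u \;\; u^T (-A) w + \psi(w, u),
```

with ψ(w, u) = uᵀz − L*(u) + λ f(x₀ + w), which is convex in w and concave in u. Since −A has the same distribution as A, this is precisely a (PO) in the sense of the CGMT.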
An example: Error vs λ

X₀ ∈ R^{45×45} is rank 6. Observe y = A(X₀) + z, solve the matrix LASSO:

    min_X ‖y − A(X)‖₂ + λ‖X‖_*

[Figure: low-rank estimation. NSE = ‖X̂ − X₀‖_F² / σ² versus λ; the analytical prediction matches the simulation, and the curve identifies the optimal tuning λ_best, computable if the rank r is known. A reproduction sketch follows the slide.]
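A hedged sketch of this experiment in cvxpy; the number of measurements m, the noise level σ, and λ are assumed values, and the Gaussian map A is realized as a matrix acting on the column-major vectorization of X:

```python
import numpy as np
import cvxpy as cp

# Slide setup: X0 is 45x45 with rank 6; m, sigma, lam are assumed values.
d, r, m, sigma, lam = 45, 6, 1200, 0.1, 3.0
rng = np.random.default_rng(2)
X0 = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))  # rank-6 truth
A = rng.standard_normal((m, d * d))             # Gaussian linear map on vec(X)
y = A @ X0.ravel(order="F") + sigma * rng.standard_normal(m)

X = cp.Variable((d, d))
# cp.vec with order="F" flattens column-major, matching the ravel above.
objective = cp.norm(y - A @ cp.vec(X, order="F"), 2) + lam * cp.normNuc(X)
cp.Problem(cp.Minimize(objective)).solve()

print("NSE = ||X_hat - X0||_F^2 / sigma^2 =",
      np.linalg.norm(X.value - X0, "fro") ** 2 / sigma ** 2)
```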
Last one

x₀ ∈ R^4600 is k-sparse. Observe y = Ax₀ + σz, solve the LAD:

    min_x ‖y − Ax‖₁ + λ‖x‖₁

[Figure: ‖x̂ − x₀‖₂² / σ² versus the number of measurements m (100 to 550), for sparsity levels k = 36, 60, 84; the analytical predictions match the simulations. A scaled-down sweep follows the slide.]
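A sketch of the sweep over m, shrunk from the slide's n = 4600 to n = 460 so it runs quickly; k, σ, and λ are assumed values chosen roughly in proportion to the slide's:

```python
import numpy as np
import cvxpy as cp

# LAD error versus number of measurements m, at a reduced problem size.
n, k, sigma, lam = 460, 6, 0.1, 1.0
rng = np.random.default_rng(3)
x0 = np.zeros(n)
x0[:k] = rng.standard_normal(k)

for m in (100, 200, 300, 400):
    A = rng.standard_normal((m, n))
    y = A @ x0 + sigma * rng.standard_normal(m)
    x = cp.Variable(n)
    cp.Problem(cp.Minimize(cp.norm(y - A @ x, 1) + lam * cp.norm(x, 1))).solve()
    print(f"m = {m:3d}:  ||x_hat - x0||_2^2 / sigma^2 = "
          f"{np.sum((x.value - x0) ** 2) / sigma**2:.3f}")
```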
[Closing composite figure: ‖x̂ − x₀‖₂²/σ² versus λ for noise levels σ² ranging from 10⁻⁴ to 1, compared against the prediction of Theorem 3.1, shown alongside the low-rank estimation panel (NSE versus λ, with λ_best marked) and the error versus number of measurements m panel from the previous slides.]