
Condition Number Analysis of Logistic Regression, and its Implications for First-Order Solution Methods

Robert M. Freund (MIT), joint with Paul Grigas (Berkeley) and Rahul Mazumder (MIT)

ISI Marrakech, July 2017


How can optimization inform statistics (and machine learning)?

Paper in preparation (this talk): Condition Number Analysis of Logistic Regression, and its Implications for First-Order Solution Methods

A “cousin” paper of ours: A New Perspective on Boosting in Linear Regression via Subgradient Optimization and Relatives


Outline

- Optimization primer: some "old" results and new observations for the family of steepest descent algorithms
- Logistic regression perspectives: statistics and machine learning
- A pair of condition numbers for the logistic regression problem:
  - when the sample data is non-separable: a condition number for the degree of non-separability of the dataset, informing the convergence guarantees of the steepest descent family and guarantees on reaching linear convergence (thanks to Bach)
  - when the sample data is separable: a condition number for the degree of separability of the dataset, informing a convergence guarantee to deliver an approximate maximum margin classifier


Primer on Steepest Descent in a Given Norm

Some Old and New Results for Steepest Descent in a Given Norm


Steepest Descent in a Given Norm (SDGN)

$F^* := \min_{x \in \mathbb{R}^p} F(x)$

Let $\|\cdot\|$ be the given norm on the variables $x \in \mathbb{R}^p$.

Steepest Descent in a Given Norm (SDGN):
Initialize at $x^0 \in \mathbb{R}^p$, $k \leftarrow 0$. At iteration $k$:
1. Compute gradient $\nabla F(x^k)$
2. Compute direction: $d^k \leftarrow \arg\max_d \{\nabla F(x^k)^T d : \|d\| \le 1\}$
3. Choose step-size $\alpha_k$
4. Set $x^{k+1} \leftarrow x^k - \alpha_k d^k$
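To make the template concrete, here is a minimal Python sketch of SDGN for the two norms discussed on the next slides; the objective gradient grad_F, the constant step-size, and the iteration budget are illustrative assumptions (the slides leave the step-size rule open until later).

```python
# A minimal sketch of SDGN; grad_F, the constant step-size alpha, and the
# iteration budget are assumptions for illustration.
import numpy as np

def sdgn_direction(g, norm):
    """Solve d = arg max { g^T d : ||d|| <= 1 } for the chosen norm."""
    if norm == "l2":                       # normalized gradient
        return g / np.linalg.norm(g)
    if norm == "l1":                       # signed unit coordinate vector
        j = int(np.argmax(np.abs(g)))
        d = np.zeros_like(g)
        d[j] = np.sign(g[j])
        return d
    raise ValueError(f"unsupported norm: {norm}")

def sdgn(grad_F, x0, norm="l2", alpha=0.1, iters=100):
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        g = grad_F(x)
        if np.linalg.norm(g) < 1e-12:      # (near-)stationary: stop
            break
        x = x - alpha * sdgn_direction(g, norm)   # x^{k+1} <- x^k - alpha_k d^k
    return x
```

With norm="l1" each step changes a single coordinate (the greedy coordinate descent of the next slide); with norm="l2" the step moves along the normalized gradient.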


Greedy Coordinate Descent ≡ ℓ1-Steepest Descent

$F^* := \min_{x \in \mathbb{R}^p} F(x)$

Let $\|\cdot\| = \|\cdot\|_1$.

Steepest Descent method in the ℓ1-norm:
Initialize at $x^0 \in \mathbb{R}^p$, $k \leftarrow 0$. At iteration $k$:
1. Compute gradient $\nabla F(x^k)$
2. Compute direction: $d^k \leftarrow \arg\max_d \{\nabla F(x^k)^T d : \|d\|_1 \le 1\}$
3. Choose step-size $\alpha_k$
4. Set $x^{k+1} \leftarrow x^k - \alpha_k d^k$


Greedy Coordinate Descent ≡ ℓ1-Steepest Descent, cont.

$d^k \in \arg\max_{\|d\|_1 \le 1} \{\nabla F(x^k)^T d\}$

The maximizer over the ℓ1 ball is a signed unit coordinate vector $d^k = \mathrm{sgn}(\nabla F(x^k)_j)\, e_j$ for the coordinate $j$ with the largest $|\nabla F(x^k)_j|$, so each iteration updates a single coordinate: greedy coordinate descent.

[Figure: the ℓ1 ball with the direction $d^k$ at a vertex]


Gradient Descent ≡ ℓ2-Steepest Descent

$F^* := \min_{x \in \mathbb{R}^p} F(x)$

Let $\|\cdot\| = \|\cdot\|_2$.

Steepest Descent method in the ℓ2-norm:
Initialize at $x^0 \in \mathbb{R}^p$, $k \leftarrow 0$. At iteration $k$:
1. Compute gradient $\nabla F(x^k)$
2. Compute direction: $d^k \leftarrow \arg\max_d \{\nabla F(x^k)^T d : \|d\|_2 \le 1\}$
3. Choose step-size $\alpha_k$
4. Set $x^{k+1} \leftarrow x^k - \alpha_k d^k$


Gradient Descent ≡ ℓ2-Steepest Descent, cont.

$d^k \in \arg\max_{\|d\|_2 \le 1} \{\nabla F(x^k)^T d\}$

The maximizer over the ℓ2 ball is the normalized gradient $d^k = \nabla F(x^k)/\|\nabla F(x^k)\|_2$, so the method is ordinary gradient descent.

[Figure: the ℓ2 ball with $d^k$ aligned with $\nabla F(x^k)$]


Computational Guarantees for the Steepest Descent family

$F^* := \min_{x \in \mathbb{R}^p} F(x)$

Assume $F(\cdot)$ is convex and $\nabla F(\cdot)$ is Lipschitz with parameter $L_F$:

$\|\nabla F(x) - \nabla F(y)\|_* \le L_F \|x - y\|$ for all $x, y \in \mathbb{R}^p$,

where $\|\cdot\|_*$ is the usual dual norm.

Two sets of interest:
$S_0 := \{x \in \mathbb{R}^p : F(x) \le F(x^0)\}$ is the level set of the initial point $x^0$
$S^* := \{x \in \mathbb{R}^p : F(x) = F^*\}$ is the set of optimal solutions


Metrics for Evaluating the Steepest Descent family, cont.

$S_0 := \{x \in \mathbb{R}^p : F(x) \le F(x^0)\}$ is the level set of the initial point $x^0$
$S^* := \{x \in \mathbb{R}^p : F(x) = F^*\}$ is the set of optimal solutions

$\mathrm{Dist}_0 := \max_{x \in S_0}\,\min_{x^* \in S^*} \|x - x^*\|$

[Figure: $\mathrm{Dist}_0$ as the farthest distance from the level set $S_0$ to the optimal set $S^*$]

(In high-dimensional machine learning problems, $S^*$ can be very big)


Computational Guarantees for the Steepest Descent family

$\mathrm{Dist}_0 := \max_{x \in S_0}\,\min_{x^* \in S^*} \|x - x^*\|$

Theorem: Objective Function Value Convergence (essentially [Beck and Tetruashvili 2014], [Nesterov 2003])
If the step-sizes are chosen using the rule
$\alpha_k = \frac{\|\nabla F(x^k)\|_*}{L_F}$ for all $k \ge 0$,
then for each $k \ge 0$ the following inequality holds:
$F(x^k) - F^* \le \frac{2 L_F\,(\mathrm{Dist}_0)^2}{\hat{K}_0 + k}$, where $\hat{K}_0 := \frac{2 L_F\,(\mathrm{Dist}_0)^2}{F(x^0) - F^*}$.
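As a small numeric illustration of this guarantee, a sketch that evaluates the right-hand side of the bound; the Lipschitz constant, Dist_0, and the initial optimality gap are assumed known by the user.

```python
# Evaluate the theorem's bound 2 L_F Dist_0^2 / (K0_hat + k); the inputs
# L_F, dist0, and gap0 = F(x^0) - F^* are assumptions supplied by the user.
def sdgn_gap_bound(k, L_F, dist0, gap0):
    K0_hat = 2.0 * L_F * dist0 ** 2 / gap0
    return 2.0 * L_F * dist0 ** 2 / (K0_hat + k)

# With L_F = 1, Dist_0 = 10, gap0 = 1 we get K0_hat = 200, so the guaranteed
# gap is halved to 0.5 after k = 200 iterations:
print(sdgn_gap_bound(200, L_F=1.0, dist0=10.0, gap0=1.0))
```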

Properties of Greedy Coordinate Descent (GCD):
- GCD performs variable selection
- GCD imparts implicit regularization
- Just one tuning parameter (number of iterations)


Implicit Regularization and Variable Selection Properties

Artificial example: n = 1000, p = 100, true model has 5 non-zeros

[Figure: coefficient profiles and training error across iterations (plot not recoverable from the extraction)]

Compare with explicit regularization schemes (ℓ1, ℓ2, etc.)


How Can SDGN Inform Logistic Regression?

Some questions:
- How do the computational guarantees for the Steepest Descent family specialize to the case of Logistic Regression?
- What role does problem structure/conditioning play in these guarantees?
- Can we say anything further about the convergence properties of the Steepest Descent family in the special case of Logistic Regression?


Elementary Properties of the Logistic Loss Function

$L_n^* := \min_\beta \ L_n(\beta) := \frac{1}{n}\sum_{i=1}^n \ln(1 + \exp(-y_i \beta^T x_i))$

Logistic regression "ideally" seeks $\beta$ for which $y_i x_i^T \beta > 0$ for all $i$:
$y_i > 0 \Rightarrow x_i^T \beta > 0$
$y_i < 0 \Rightarrow x_i^T \beta < 0$

[Figure: the logistic loss as a decreasing function of $y\,\beta^T x$]
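A minimal sketch of the logistic loss and its gradient, written directly from the definition above; the data matrix X (n by p, with rows $x_i$) and labels y in $\{-1,+1\}^n$ are assumptions.

```python
# Logistic loss L_n(beta) = (1/n) sum_i ln(1 + exp(-y_i x_i^T beta)) and its
# gradient, computed in a numerically stable way; X, y are assumed data.
import numpy as np

def logistic_loss(beta, X, y):
    margins = y * (X @ beta)                       # y_i * x_i^T beta
    return np.mean(np.logaddexp(0.0, -margins))

def logistic_grad(beta, X, y):
    margins = y * (X @ beta)
    w = np.exp(-np.logaddexp(0.0, margins))        # sigma(-m_i) = 1/(1+e^{m_i})
    return -(X.T @ (y * w)) / len(y)
```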


Geometry of the Data: Non-Separable and Separable Data

[Figure panels: (a) Very Non-Separable Data, (b) Very Separable Data, (c) Mildly Non-Separable Data, (d) Mildly Separable Data]


Separable and Non-Separable Data

Separable Data: The data is separable if there exists $\bar\beta$ for which $y_i \cdot \bar\beta^T x_i > 0$ for all $i = 1, \ldots, n$.

Non-Separable Data: The data is non-separable if it is not separable; namely, every $\beta$ satisfies $y_i \cdot \beta^T x_i \le 0$ for some $i \in \{1, \ldots, n\}$.
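One way to test these definitions computationally: by positive scaling, strict separability is equivalent to feasibility of $y_i x_i^T \beta \ge 1$ for all $i$, which is a linear program. A hedged sketch using scipy; the toy data is invented for illustration.

```python
# LP feasibility test for strict separability: y_i x_i^T beta > 0 for all i
# has a solution iff y_i x_i^T beta >= 1 does (rescale beta).
import numpy as np
from scipy.optimize import linprog

def is_separable(X, y):
    """X: (n, p) feature matrix; y: (n,) labels in {-1, +1}."""
    n, p = X.shape
    A_ub = -(y[:, None] * X)                # -y_i x_i^T beta <= -1
    b_ub = -np.ones(n)
    res = linprog(np.zeros(p), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * p)
    return res.status == 0                  # status 0 = feasible (separable)

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5]])
y = np.array([1.0, 1.0, -1.0])
print(is_separable(X, y))                   # True for this toy dataset
```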


Separable Data

$L_n^* := \min_\beta \ L_n(\beta) := \frac{1}{n}\sum_{i=1}^n \ln(1 + \exp(-y_i \beta^T x_i))$

The data is separable if there exists $\bar\beta$ for which $y_i \cdot \bar\beta^T x_i > 0$ for all $i = 1, \ldots, n$.

If $\bar\beta$ separates the data, then $L_n(\theta\bar\beta) \to 0 \;(= L_n^*)$ as $\theta \to +\infty$.

Perhaps trying to optimize the logistic loss function is unlikely to be effective at finding a "good" linear classifier ....


Separable and Non-Separable Data

[Figure panels: (a) Separable, (b) Non-Separable]


Results in the Non-Separable Case


Non-Separable Data and Problem Behavior/Conditioning

Let us quantify the degree of non-separability of the data.

[Figure panels: (a) Very non-separable data, (b) Mildly non-separable data]

We will relate this to problem behavior/conditioning....


Non-Separability Condition Number DegNSEP*

Definition of the Non-Separability Condition Number DegNSEP*:

$\mathrm{DegNSEP}^* := \min_{\beta \in \mathbb{R}^p} \ \frac{1}{n}\sum_{i=1}^n [y_i \beta^T x_i]^- \quad \text{s.t.} \ \|\beta\| = 1$

DegNSEP* is the least average misclassification error (over all normalized classifiers).

DegNSEP* > 0 if and only if the data is strictly non-separable.
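To illustrate the definition, a sketch that evaluates the average misclassification objective for a candidate $\beta$ and crudely upper-bounds DegNSEP* by random search over the ℓ2 unit sphere; this is only an illustration, since the exact minimization over the sphere is not solved here.

```python
# The DegNSEP* objective (1/n) sum_i [y_i beta^T x_i]^- for a normalized beta,
# plus a crude random-search upper bound on DegNSEP* (an illustration only).
import numpy as np

def avg_misclassification(beta, X, y):
    margins = y * (X @ beta)
    return np.mean(np.maximum(-margins, 0.0))      # [t]^- = max(-t, 0)

def degnsep_upper_bound(X, y, trials=20000, seed=0):
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(trials):
        beta = rng.standard_normal(X.shape[1])
        beta /= np.linalg.norm(beta)               # ||beta||_2 = 1
        best = min(best, avg_misclassification(beta, X, y))
    return best
```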


Non-Separability Measure DegNSEP*

$\mathrm{DegNSEP}^* := \min_{\|\beta\| = 1} \ \frac{1}{n}\sum_{i=1}^n [y_i \beta^T x_i]^-$

[Figure panels: (a) DegNSEP* is large, (b) DegNSEP* is small]


DegNSEP* and Problem Behavior/Conditioning

$L_n^* := \min_\beta \ L_n(\beta) := \frac{1}{n}\sum_{i=1}^n \ln(1 + \exp(-y_i \beta^T x_i))$

$\mathrm{DegNSEP}^* := \min_{\|\beta\| = 1} \ \frac{1}{n}\sum_{i=1}^n [y_i \beta^T x_i]^-$

Theorem: Non-Separability and Sizes of Optimal Solutions
Suppose that the data is non-separable and DegNSEP* > 0. Then
1. the logistic regression problem LR attains its optimum,
2. for every optimal solution $\beta^*$ of LR it holds that $\|\beta^*\| \le \frac{L_n^*}{\mathrm{DegNSEP}^*} \le \frac{\ln(2)}{\mathrm{DegNSEP}^*}$, and
3. for any $\beta$ it holds that $\|\beta\| \le \frac{L_n(\beta)}{\mathrm{DegNSEP}^*}$.


Computational Guarantees for the Steepest Descent family: Non-Separable Case

Theorem: Consider SDGN applied to the Logistic Regression problem with step-sizes $\alpha_k := \frac{4n\,\|\nabla L_n(\beta^k)\|_*}{\|X\|_{\cdot,2}^2}$ for all $k \ge 0$, and suppose that the data is non-separable. Then for each $k \ge 0$ it holds that:

(i) (training error): $L_n(\beta^k) - L_n^* \le \frac{2(\ln 2)^2\,\|X\|_{\cdot,2}^2}{k \cdot n \cdot (\mathrm{DegNSEP}^*)^2}$

(ii) (gradient norm): $\min_{i \in \{0,\ldots,k\}} \|\nabla L_n(\beta^i)\|_* \le \|X\|_{\cdot,2}\sqrt{\frac{\ln(2) - L_n^*}{2n(k+1)}}$

(iii) (regularization): $\|\beta^k\| \le \frac{\sqrt{k}}{\|X\|_{\cdot,2}}\sqrt{8n(\ln(2) - L_n^*)}$
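A sketch of the theorem's method in the ℓ2 case, where the dual norm is also ℓ2 and $\|X\|_{\cdot,2}$ is read as the spectral norm of X (an assumption about the notation); since $d^k = \nabla L_n(\beta^k)/\|\nabla L_n(\beta^k)\|_2$, the step $\alpha_k d^k$ collapses to $(4n/\|X\|^2)\,\nabla L_n(\beta^k)$.

```python
# l2-steepest descent on the logistic loss with the theorem's step-size
# alpha_k = 4n ||grad L_n(beta^k)||_* / ||X||^2; X, y are assumed data.
import numpy as np

def sdgn_logistic(X, y, iters=1000):
    n, p = X.shape
    X_norm_sq = np.linalg.norm(X, 2) ** 2     # spectral norm squared
    beta = np.zeros(p)
    for _ in range(iters):
        m = y * (X @ beta)                    # margins y_i x_i^T beta
        w = np.exp(-np.logaddexp(0.0, m))     # sigma(-m_i), computed stably
        g = -(X.T @ (y * w)) / n              # grad L_n(beta)
        if np.linalg.norm(g) < 1e-12:
            break
        beta -= (4.0 * n / X_norm_sq) * g     # alpha_k * d^k, collapsed
    return beta
```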


Reaching Linear Convergence

Reaching Linear Convergence using Steepest Descent in a Given Norm for Logistic Regression

For logistic regression, does SDGN exhibit linear convergence?


Some Definitions/Notation

Definitions:
- $R := \max_{i \in \{1,\ldots,n\}} \|x_i\|_2$ (maximum ℓ2 norm of the feature vectors)
- $H(\beta^*)$ denotes the Hessian of $L_n(\cdot)$ at an optimal solution $\beta^*$
- $\lambda_{\mathrm{pmin}}(H(\beta^*))$ denotes the smallest non-zero (and hence positive) eigenvalue of $H(\beta^*)$
- $\mathrm{NormRatio} := \max_{\beta \ne 0} \|\beta\|/\|\beta\|_2$


Reaching Linear Convergence of the Steepest Descent family for Logistic Regression

Theorem: Consider SDGN applied to the Logistic Regression problem with step-sizes $\alpha_k := \frac{4n\,\|\nabla L_n(\beta^k)\|_*}{\|X\|_{\cdot,2}^2}$ for all $k \ge 0$, and suppose that the data is non-separable. Define:

$\check{k} := \frac{16(\ln 2)^2\,\|X\|_{\cdot,2}^4\,R^2\,(\mathrm{NormRatio})^2}{9\,n^2\,(\mathrm{DegNSEP}^*)^2\,\lambda_{\mathrm{pmin}}(H(\beta^*))^2}.$

Then for all $k \ge \check{k}$ it holds that:

$L_n(\beta^k) - L_n^* \le \left(L_n(\beta^{\check{k}}) - L_n^*\right)\left(1 - \frac{n\,\lambda_{\mathrm{pmin}}(H(\beta^*))}{\|X\|_{\cdot,2}^2\,(\mathrm{NormRatio})^2}\right)^{k - \check{k}}.$


Reaching Linear Convergence of the Steepest Descent family for Logistic Regression, cont.

Some comments:
- The proof relies on (a slight generalization of) the "generalized self-concordance" property of the logistic loss function due to [Bach 2014].
- Furthermore, we can bound: $\lambda_{\mathrm{pmin}}(H(\beta^*)) \ge \frac{1}{4n}\,\lambda_{\mathrm{pmin}}(X^T X)\,\exp\!\left(-\frac{\ln(2)\,\|X\|_{\cdot,\infty}}{\mathrm{DegNSEP}^*}\right)$
- As compared to results of a similar flavor for other algorithms, here we have an exact characterization of when the linear convergence "kicks in" and also what the rate of linear convergence is guaranteed to be.
- Q: Can we exploit this generalized self-concordance property in other ways? (still ongoing ...)


DegNSEP* and "Perturbation to Separability"

$\mathrm{DegNSEP}^* := \min_{\|\beta\| = 1} \ \frac{1}{n}\sum_{i=1}^n [y_i \beta^T x_i]^-$

Theorem: DegNSEP* is the "Perturbation to Separability"

$\mathrm{DegNSEP}^* = \inf_{\Delta x_1, \ldots, \Delta x_n} \ \frac{1}{n}\sum_{i=1}^n \|\Delta x_i\|_* \quad \text{s.t.} \ (x_i + \Delta x_i, y_i),\ i = 1, \ldots, n \ \text{are separable}$


Illustration of Perturbation to Separability

[Figure]


Results for Some Other Methods


Standard Accelerated Gradient Method (AGM)

P: $F^* := \min_{x \in \mathbb{R}^p} F(x)$

Lipschitz gradient: $\|\nabla F(y) - \nabla F(x)\|_2 \le L\|y - x\|_2$ for all $x, y \in \mathbb{R}^p$

Accelerated Gradient Method (AGM):
Given $x^0 \in \mathbb{R}^p$ and $z^0 := x^0$, and $i \leftarrow 0$. Define step-size parameters $\theta_i \in (0, 1]$ recursively by $\theta_0 := 1$ and $\theta_{i+1}$ satisfying $\frac{1}{\theta_{i+1}^2} - \frac{1}{\theta_{i+1}} = \frac{1}{\theta_i^2}$.
At iteration $k$:
1. Update:
$y^k \leftarrow (1 - \theta_k)x^k + \theta_k z^k$
$x^{k+1} \leftarrow y^k - \frac{1}{L}\nabla F(y^k)$
$z^{k+1} \leftarrow z^k + \frac{1}{\theta_k}(x^{k+1} - y^k)$
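A sketch of AGM exactly as stated above; grad_F, the Lipschitz constant L, and the iteration budget are assumptions supplied by the caller. The theta update below is the closed-form positive root of the stated recursion.

```python
# Accelerated Gradient Method as stated on this slide; grad_F and the
# Lipschitz constant L are assumptions supplied by the caller.
import numpy as np

def agm(grad_F, L, x0, iters=100):
    x = np.asarray(x0, dtype=float).copy()
    z = x.copy()
    theta = 1.0                                    # theta_0 := 1
    for _ in range(iters):
        y = (1.0 - theta) * x + theta * z
        x_next = y - grad_F(y) / L
        z = z + (x_next - y) / theta
        x = x_next
        # positive root in (0, 1] of 1/t^2 - 1/t = 1/theta^2
        theta = 2.0 / (1.0 + np.sqrt(1.0 + 4.0 / theta ** 2))
    return x
```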


Computational Guarantees for the Accelerated Gradient Method (AGM) for Logistic Regression

Theorem: Consider AGM applied to the Logistic Regression problem initiated at $\beta^0 := 0$, and suppose that the data is non-separable. Then for each $k \ge 0$ it holds that:

(training error): $L_n(\beta^k) - L_n^* \le \frac{2(\ln 2)^2\,\|X\|_{2,2}^2}{n \cdot (k+1)^2 \cdot (\mathrm{DegNSEP}^*)^2}$


AGM with Simple Re-Starting (AGM-SRS)

Assume that $0 < F^* := \min_x F(x)$.

Accelerated Gradient Method with Simple Re-Starting (AGM-SRS):
Initialize with $x^0 \in \mathbb{R}^p$. Set $x^{1,0} \leftarrow x^0$, $i \leftarrow 1$.
At outer iteration $i$:
1. Initialize inner iteration: $j \leftarrow 0$.
2. Run inner iterations. At inner iteration $j$:
If $\frac{F(x^{i,j})}{F(x^{i,0})} \ge 0.8$, then: $x^{i,j+1} \leftarrow \mathrm{AGM}(F(\cdot), x^{i,0}, j+1)$, $j \leftarrow j+1$, and go to step 2.
Else $x^{i+1,0} \leftarrow x^{i,j}$, $i \leftarrow i+1$, and go to step 1.

"$x^{i,j} \leftarrow \mathrm{AGM}(F(\cdot), x^{i,0}, j)$" denotes assigning to $x^{i,j}$ the $j$-th iterate of AGM applied with objective function $F(\cdot)$ using the initial point $x^{i,0} \in \mathbb{R}^p$.
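A sketch of AGM-SRS, reusing the agm() sketch from the previous slide. As on the slide, inner iteration j re-runs AGM for j+1 steps from the restart point $x^{i,0}$; a practical implementation would instead keep AGM's internal state and take one additional step. F, grad_F, L, and the iteration budgets are assumptions.

```python
# AGM with Simple Re-Starting: restart AGM from the current point as soon
# as the objective drops below 0.8 of its value at the restart point.
def agm_srs(F, grad_F, L, x0, outer_iters=10, max_inner=500):
    x = x0
    for _ in range(outer_iters):
        f_start = F(x)
        for j in range(max_inner):
            candidate = agm(grad_F, L, x, iters=j + 1)  # j-th inner AGM run
            if F(candidate) < 0.8 * f_start:            # sufficient decrease
                x = candidate                           # restart from here
                break
        else:                                           # plateau never left
            return x
    return x
```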


Computational Guarantee for AGM with Simple Re-Starting for Logistic Regression

Theorem: Consider AGM with Simple Re-Starting applied to the Logistic Regression problem initiated at $\beta^0 := 0$, and suppose that the data is non-separable. Within a total number of computed iterates $k$ that does not exceed

$\frac{5.8\,\|X\|_{2,2}}{\sqrt{n}\cdot \mathrm{DegNSEP}^*} + \frac{8.4\,\|X\|_{2,2}\,\sqrt{L_n^*}}{\sqrt{n}\cdot \mathrm{DegNSEP}^* \cdot \sqrt{\varepsilon}},$

the algorithm will deliver an iterate $\beta^k$ for which $L_n(\beta^k) - L_n^* \le \varepsilon$.


Results in the Separable Case


Separable Data and Problem Behavior/Conditioning

Let us quantify the degree of separability of the data.

[Figure panels: (a) Very separable data, (b) Barely separable data]

We will relate this to problem behavior/conditioning....


Separability Condition Number DegSEP*

Definition of the Separability Condition Number DegSEP*:

$\mathrm{DegSEP}^* := \max_{\beta \in \mathbb{R}^p} \ \min_{i \in \{1,\ldots,n\}} [y_i \beta^T x_i] \quad \text{s.t.} \ \|\beta\| \le 1$

DegSEP* maximizes the minimum classification value $[y_i \beta^T x_i]$ (over all normalized classifiers).

DegSEP* is simply the "maximum margin" in machine learning parlance.

DegSEP* > 0 if and only if the data is separable.
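Since DegSEP* is the maximum margin, it can be computed directly for polyhedral norms; a hedged sketch for the ℓ∞ norm on $\beta$ (an assumption made so the problem is a linear program; for the ℓ2 norm it would be a quadratic program), with invented toy data.

```python
# DegSEP* for the l-infinity norm: max t s.t. y_i x_i^T beta >= t and
# ||beta||_inf <= 1, solved as an LP over the variables (beta, t).
import numpy as np
from scipy.optimize import linprog

def degsep_linf(X, y):
    n, p = X.shape
    c = np.zeros(p + 1)
    c[-1] = -1.0                                             # maximize t
    A_ub = np.hstack([-(y[:, None] * X), np.ones((n, 1))])   # t - y_i x_i^T beta <= 0
    b_ub = np.zeros(n)
    bounds = [(-1.0, 1.0)] * p + [(None, None)]              # |beta_j| <= 1, t free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[-1]                                         # > 0 iff separable

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5]])
y = np.array([1.0, 1.0, -1.0])
print(degsep_linf(X, y))
```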


Separability Measure DegSEP*

$\mathrm{DegSEP}^* := \max_{\|\beta\| \le 1} \ \min_{i \in \{1,\ldots,n\}} [y_i \beta^T x_i]$

[Figure panels: (a) DegSEP* is large, (b) DegSEP* is small]


DegSEP* and Problem Behavior/Conditioning

$L_n^* := \min_\beta \ L_n(\beta) := \frac{1}{n}\sum_{i=1}^n \ln(1 + \exp(-y_i \beta^T x_i))$

$\mathrm{DegSEP}^* := \max_{\|\beta\| \le 1} \ \min_{i \in \{1,\ldots,n\}} [y_i \beta^T x_i]$

Theorem: Separability and Non-Attainment
Suppose that the data is separable. Then DegSEP* > 0, $L_n^* = 0$, and LR does not attain its optimum.

Despite this, it turns out that the Steepest Descent family is reasonably effective at finding an approximate margin maximizer, as we shall shortly see....


Margin Function ρ(β)

$\rho(\beta) := \min_{i \in \{1,\ldots,n\}} [y_i \beta^T x_i]$


Computational Guarantees for the Steepest Descent family: Separable Case

Theorem: Consider SDGN applied to the Logistic Regression problem with step-sizes $\alpha_k := \frac{4n\,\|\nabla L_n(\beta^k)\|_*}{\|X\|_{\cdot,2}^2}$ for all $k \ge 0$, and suppose that the data is separable. Then:

(i) (margin bound): there exists $i \le \left\lceil \frac{3.7\,n\,\|X\|_{\cdot,2}^2}{(\mathrm{DegSEP}^*)^2} \right\rceil$ for which the normalized iterate $\bar\beta^i := \beta^i/\|\beta^i\|$ satisfies $\rho(\bar\beta^i) \ge \frac{0.18 \cdot \mathrm{DegSEP}^*}{n}$.

(ii) (shrinkage): $\|\beta^k\| \le \frac{\sqrt{k}}{\|X\|_{\cdot,2}}\sqrt{8n\ln(2)}$

(iii) (gradient norm): $\min_{i \in \{0,\ldots,k\}} \|\nabla L_n(\beta^i)\|_* \le \|X\|_{\cdot,2}\sqrt{\frac{\ln 2}{2n(k+1)}}$


DegSEP* and "Perturbation to Non-Separability"

$\mathrm{DegSEP}^* := \max_{\|\beta\| \le 1} \ \min_{i \in \{1,\ldots,n\}} [y_i \beta^T x_i]$

Theorem: DegSEP* is the "Perturbation to Non-Separability"

$\mathrm{DegSEP}^* = \inf_{\Delta x_1, \ldots, \Delta x_n} \ \max_{i \in \{1,\ldots,n\}} \|\Delta x_i\|_* \quad \text{s.t.} \ (x_i + \Delta x_i, y_i),\ i = 1, \ldots, n \ \text{are non-separable}$


Illustration of Perturbation to Non-Separability

[Figure]


Other Issues

Some other topics not mentioned (still ongoing):
- Other first-order methods for logistic regression (accelerated gradient descent, randomized methods, etc.)
- The high-dimensional regime p > n: define DegNSEP*_k and DegSEP*_k by restricting $\beta$ to satisfy $\|\beta\|_0 \le k$
- Numerical experiments comparing methods
- Other...


Summary

- Some old and new results for Steepest Descent in a Given Norm (SDGN)
- Analyzing SDGN for Logistic Regression: separable and non-separable cases
- Non-separable case: the condition number DegNSEP*, and computational guarantees for SDGN including reaching linear convergence
- Separable case: the condition number DegSEP*, and computational guarantees for SDGN including computing an approximate maximum margin classifier