Performance Guarantees for Random Fourier Features – Limitations and Merits

Zoltán Szabó

Joint work with Bharath K. Sriperumbudur (PSU)

ML@SITraN, University of Sheffield June 25, 2015


Context

Given:
$$k(x, y) = \int_{\mathbb{R}^d} e^{i\omega^T (x - y)} \, d\Lambda(\omega) = \int_{\mathbb{R}^d} \cos\left(\omega^T (x - y)\right) d\Lambda(\omega).$$

$\hat{k}(x, y)$: Monte-Carlo estimator of $k(x, y)$ using $(\omega_j)_{j=1}^m \overset{\text{i.i.d.}}{\sim} \Lambda$ [Rahimi and Recht, 2007].

Motivation:
Primal form – fast linear solvers.
Kernel function approximation: out-of-sample extension.
Online applications.
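Illustration (not part of the talk): a minimal Python sketch of the Monte-Carlo estimator, assuming the Gaussian kernel $k(x, y) = e^{-\|x - y\|_2^2 / (2 b^2)}$, whose spectral measure $\Lambda$ is $N(0, I / b^2)$. The function names and the bandwidth parameter `bandwidth` are hypothetical.

```python
import numpy as np

def sample_frequencies(m, d, bandwidth=1.0, seed=0):
    """Draw omega_1..omega_m i.i.d. from N(0, I / bandwidth^2), the spectral
    measure of the Gaussian kernel with the given bandwidth (assumed kernel)."""
    rng = np.random.default_rng(seed)
    return rng.normal(scale=1.0 / bandwidth, size=(m, d))

def k_exact(x, y, bandwidth=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * bandwidth ** 2))

def k_hat(x, y, omegas):
    """Monte-Carlo estimate (1/m) * sum_j cos(omega_j^T (x - y))."""
    return np.mean(np.cos(omegas @ (x - y)))

x = np.array([0.3, -0.2])
y = np.array([-0.1, 0.4])
omegas = sample_frequencies(m=20000, d=2)
print(k_exact(x, y), k_hat(x, y, omegas))
```

The two printed values should be close; the gap shrinks roughly like $1/\sqrt{m}$.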

Performance measures

Uniform ($r = \infty$):
$$\|k - \hat{k}\|_S := \sup_{x, y \in S} \left|k(x, y) - \hat{k}(x, y)\right|.$$

$L^r$ ($1 \le r < \infty$):
$$\|k - \hat{k}\|_{L^r(S)} := \left( \int_S \int_S \left|k(x, y) - \hat{k}(x, y)\right|^r dx \, dy \right)^{\frac{1}{r}}.$$
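Illustration (not part of the talk): both error measures can be approximated on a grid. The sketch below assumes $d = 1$, $S = [-1, 1]$ and the unit-bandwidth Gaussian kernel; the helper name `error_norms_on_grid` is hypothetical, and the grid maximum only lower-bounds the true supremum.

```python
import numpy as np

def error_norms_on_grid(k, k_hat, lo, hi, r=2.0, n=101):
    """Grid approximations of the uniform and L^r errors over S x S, S = [lo, hi]."""
    grid = np.linspace(lo, hi, n)
    X, Y = np.meshgrid(grid, grid, indexing="ij")
    err = np.abs(k(X, Y) - k_hat(X, Y))
    uniform = err.max()                                   # max over grid points
    # mean of err^r times vol(S x S) approximates the double integral
    l_r = (np.mean(err ** r) * (hi - lo) ** 2) ** (1.0 / r)
    return uniform, l_r

rng = np.random.default_rng(0)
omegas = rng.normal(size=500)                             # samples from N(0, 1)
k = lambda x, y: np.exp(-0.5 * (x - y) ** 2)
k_hat = lambda x, y: np.mean(np.cos(omegas[:, None, None] * (x - y)), axis=0)
uniform, l_r = error_norms_on_grid(k, k_hat, -1.0, 1.0, r=2.0)
print(uniform, l_r)
```

By construction the grid values satisfy `l_r <= uniform * vol(S)**(2/r)`, mirroring the relation between the two norms.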

Approximation of kernel derivatives

One could also consider $\partial^{p,q} k$.

Motivation [Zhou, 2008, Shi et al., 2010, Rosasco et al., 2010, Rosasco et al., 2013, Ying et al., 2012, Sriperumbudur et al., 2014]: semi-supervised learning with gradient information, nonlinear variable selection, fitting of infinite-dimensional exponential family distributions.

Many of the presented results hold for derivatives ($[p; q] \neq 0$).

Goal

Large deviation inequalities:
$$\Lambda^m\left( \|k - \hat{k}\|_S \le \epsilon \right) \ge f_1(\epsilon, d, m, |S|),$$
$$\Lambda^m\left( \|k - \hat{k}\|_{L^r(S)} \le \epsilon \right) \ge f_2(\epsilon, d, m, |S|).$$

Scaling of $|S|$ and $m$ ensuring a.s. convergence?

Existing results on the approximation quality

Notations: $X_n = O_p(r_n)$ ($O_{a.s.}(r_n)$) denotes boundedness of $\frac{X_n}{r_n}$ in probability (almost surely).

[Rahimi and Recht, 2007]:
$$\|\hat{k} - k\|_S = O_p\left( |S| \sqrt{\frac{\log m}{m}} \right).$$

[Sutherland and Schneider, 2015]: better constants.

Contents

Uniform guarantee (empirical process theory).
Two $L^r$ guarantees (uniform consequence, direct).
Kernel derivatives.

High-level proof

1. Empirical process form:
$$\|k - \hat{k}\|_S = \sup_{g \in G} |\Lambda g - \Lambda_m g| = \|\Lambda - \Lambda_m\|_G.$$

2. $\|\Lambda - \Lambda_m\|_G$ concentrates by its bounded difference property:
$$\|\Lambda - \Lambda_m\|_G \lesssim E_{\omega_{1:m}} \|\Lambda - \Lambda_m\|_G + \frac{1}{\sqrt{m}}.$$

3. $G$ is a uniformly bounded, separable Carathéodory family ⇒
$$E_{\omega_{1:m}} \|\Lambda - \Lambda_m\|_G \lesssim E_{\omega_{1:m}} R(G, \omega_{1:m}).$$

4. Using Dudley's entropy integral:
$$R(G, \omega_{1:m}) \lesssim \frac{1}{\sqrt{m}} \int_0^{|G|_{L^2(\Lambda_m)}} \sqrt{\log N(G, L^2(\Lambda_m), r)} \, dr.$$

5. $G$ is smoothly parameterized by a compact set ⇒
$$\sqrt{\log N(G, L^2(\Lambda_m), r)} \le \sqrt{\log\left( \frac{C(\omega_{1:m})}{r} + 1 \right)} \;\Rightarrow\; E_{\omega_{1:m}} R(G, \omega_{1:m}) \lesssim \frac{1}{\sqrt{m}}.$$

6. Putting together:
$$\|k - \hat{k}\|_S \lesssim \frac{1}{\sqrt{m}} + \frac{1}{\sqrt{m}} = O\left( \sqrt{\frac{\log |S|}{m}} \right).$$

Step-1: empirical process form

Notation: $\Lambda g = \int g(\omega) \, d\Lambda(\omega)$, $\Lambda_m g = \int g(\omega) \, d\Lambda_m(\omega) = \frac{1}{m} \sum_{j=1}^m g(\omega_j)$.

Reformulation of the objective:
$$\sup_{x, y \in S} \left|k(x, y) - \hat{k}(x, y)\right| = \sup_{g \in G} |\Lambda g - \Lambda_m g| =: \|\Lambda - \Lambda_m\|_G,$$
where
$$G = \{g_z : z \in S_\Delta\}, \quad S_\Delta = S - S = \{x - y : x, y \in S\}, \quad g_z : \omega \mapsto \cos\left(\omega^T z\right).$$

Step-2: bounded difference property of $\|\Lambda - \Lambda_m\|_G$

McDiarmid inequality: Let $\omega_1, \ldots, \omega_m \in D$ be independent random variables, and let $f : D^m \to \mathbb{R}$ satisfy the bounded difference property ($\forall r$):
$$\sup_{u_1, \ldots, u_m, u_r' \in D} \left| f(u_1, \ldots, u_m) - f(u_1, \ldots, u_{r-1}, u_r', u_{r+1}, \ldots, u_m) \right| \le c_r.$$
Then for $\forall \beta > 0$
$$P\left( f(\omega_1, \ldots, \omega_m) - E[f(\omega_1, \ldots, \omega_m)] \ge \beta \right) \le e^{-\frac{2\beta^2}{\sum_{r=1}^m c_r^2}}.$$

Our choice: $f(\omega_1, \ldots, \omega_m) := \|\Lambda - \Lambda_m\|_G$.
$$\begin{aligned}
&\left| f(\omega_1, \ldots, \omega_{r-1}, \omega_r, \omega_{r+1}, \ldots, \omega_m) - f(\omega_1, \ldots, \omega_{r-1}, \omega_r', \omega_{r+1}, \ldots, \omega_m) \right| \\
&= \left| \sup_{g \in G} \left| \Lambda g - \frac{1}{m} \sum_{j=1}^m g(\omega_j) \right| - \sup_{g \in G} \left| \Lambda g - \frac{1}{m} \sum_{j=1}^m g(\omega_j) + \frac{1}{m} \left[ g(\omega_r) - g(\omega_r') \right] \right| \right| \\
&\overset{(*)}{\le} \frac{1}{m} \sup_{g \in G} \left| g(\omega_r) - g(\omega_r') \right| \le \frac{1}{m} \sup_{g \in G} \left( |g(\omega_r)| + |g(\omega_r')| \right) \\
&\le \frac{1}{m} \left[ \sup_{g \in G} |g(\omega_r)| + \sup_{g \in G} |g(\omega_r')| \right] \le \frac{1 + 1}{m} = \frac{2}{m}.
\end{aligned}$$

Step-2: (*) = reverse triangle inequality with sup

Lemma: $G$: set of functions, $a, b : G \to \mathbb{R}$ maps; then
$$\left| \sup_{g \in G} |a(g)| - \sup_{g \in G} |a(g) + b(g)| \right| \le \sup_{g \in G} |b(g)|.$$

Proof: combine
$$\sup_{g \in G} |a(g) + b(g)| \le \sup_{g \in G} \left( |a(g)| + |b(g)| \right) \le \sup_{g \in G} |a(g)| + \sup_{g \in G} |b(g)|,$$
$$\sup_{g \in G} |a(g)| = \sup_{g \in G} |a(g) + b(g) - b(g)| \le \sup_{g \in G} |a(g) + b(g)| + \sup_{g \in G} |b(g)|$$
$$\Rightarrow \pm \left[ \sup_{g \in G} |a(g)| - \sup_{g \in G} |a(g) + b(g)| \right] \le \sup_{g \in G} |b(g)|.$$

Our choice: $a(g) = \Lambda g - \frac{1}{m} \sum_{j=1}^m g(\omega_j)$, $b(g) = \frac{1}{m} \left[ g(\omega_r) - g(\omega_r') \right]$.

Step-2

Applying McDiarmid to $f$ ($c_r = \frac{2}{m}$): with probability $1 - e^{-\tau}$
$$\|\Lambda - \Lambda_m\|_G \le \underbrace{E_{\omega_{1:m}} \|\Lambda - \Lambda_m\|_G}_{\text{Step-3: bounding this term}} + \frac{\sqrt{2\tau}}{\sqrt{m}}.$$
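Illustration (not part of the talk): the deviation term follows by solving $e^{-2\beta^2 / \sum_r c_r^2} = e^{-\tau}$ for $\beta$. A minimal sketch, assuming equal constants $c_r = c$; the function name is hypothetical.

```python
import math

def mcdiarmid_deviation(m, tau, c=None):
    """Deviation beta solving exp(-2 beta^2 / (m c^2)) = exp(-tau).

    With the default c = 2/m (the bounded-difference constant derived for
    the sup-norm of Lambda - Lambda_m) this equals sqrt(2 tau / m).
    """
    if c is None:
        c = 2.0 / m
    return math.sqrt(tau * m * c ** 2 / 2.0)

# deviation for m = 10000 features at confidence 1 - exp(-3)
beta = mcdiarmid_deviation(m=10000, tau=3.0)
print(beta)
```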

Step-3: bounding $E_{\omega_{1:m}} \|\Lambda - \Lambda_m\|_G$

$G = \{g_z : z \in S_\Delta\}$ is a separable Carathéodory family, i.e.
1. $\omega \mapsto \cos\left(\omega^T z\right)$: measurable for $\forall z \in S_\Delta$.
2. $z \mapsto \cos\left(\omega^T z\right)$: continuous for $\forall \omega$.
3. $\mathbb{R}^d$ is separable, $S_\Delta \subseteq \mathbb{R}^d$ ⇒ $S_\Delta$: separable.

Thus, by [Steinwart and Christmann, 2008, Prop. 7.10]
$$E_{\omega_{1:m}} \|\Lambda - \Lambda_m\|_G \le 2 E_{\omega_{1:m}} \left[ R(G, \omega_{1:m}) \right], \quad R(G, \omega_{1:m}) := E_\epsilon \sup_{g \in G} \left| \frac{1}{m} \sum_{j=1}^m \epsilon_j g(\omega_j) \right|,$$
using the uniform boundedness of $G$ ($\sup_{g \in G} \|g\|_\infty \le 1$).
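Illustration (not part of the talk): the empirical Rademacher complexity can be estimated by averaging over random sign vectors. The sketch below assumes $d = 1$, replaces the supremum over $S_\Delta$ by a maximum over a finite grid (so it only lower-bounds the true value), and uses hypothetical names.

```python
import numpy as np

def rademacher_complexity(omegas, z_grid, n_draws=200, seed=0):
    """Monte-Carlo estimate of E_eps sup_z |(1/m) sum_j eps_j cos(omega_j^T z)|,
    with the sup restricted to the finite set z_grid."""
    rng = np.random.default_rng(seed)
    m = omegas.shape[0]
    G = np.cos(omegas @ z_grid.T)                     # (m, n_z): g_z(omega_j)
    eps = rng.choice([-1.0, 1.0], size=(n_draws, m))  # Rademacher signs
    sups = np.abs(eps @ G / m).max(axis=1)            # max over z per sign draw
    return sups.mean()

rng = np.random.default_rng(1)
z_grid = np.linspace(-2.0, 2.0, 81)[:, None]          # grid on S_Delta = [-2, 2]
r_small = rademacher_complexity(rng.normal(size=(40, 1)), z_grid)
r_large = rademacher_complexity(rng.normal(size=(4000, 1)), z_grid)
print(r_small, r_large)
```

The estimate for the larger $m$ should be visibly smaller, in line with the $1/\sqrt{m}$ behaviour derived in the following steps.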

Step-4: bounding R

$$R\left(G, (\omega_j)_{j=1}^m\right) \le \frac{8\sqrt{2}}{\sqrt{m}} \int_0^{|G|_{L^2(\Lambda_m)}} \sqrt{\log N(G, L^2(\Lambda_m), r)} \, dr,$$
where
$L^2(\Lambda_m) = L^2(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \Lambda_m)$, $\|g\|_{L^2(\Lambda_m)} = \sqrt{\frac{1}{m} \sum_{j=1}^m g^2(\omega_j)}$,
$|G|_{L^2(\Lambda_m)} = \sup_{g_1, g_2 \in G} \|g_1 - g_2\|_{L^2(\Lambda_m)}$,
$N(G, L^2(\Lambda_m), r)$: $r$-covering number.
$r$-net: $S \subseteq G$ such that for $\forall g \in G$ $\exists s \in S$ with $\|g - s\|_{L^2(\Lambda_m)} \le r$.
$N$: size of the smallest $r$-net of $G$.

Step-5: bound on $|G|_{L^2(\Lambda_m)}$

$$\begin{aligned}
|G|_{L^2(\Lambda_m)} &= \sup_{g_1, g_2 \in G} \|g_1 - g_2\|_{L^2(\Lambda_m)} \le \sup_{g_1, g_2 \in G} \left( \|g_1\|_{L^2(\Lambda_m)} + \|g_2\|_{L^2(\Lambda_m)} \right) \\
&\le \sup_{g_1 \in G} \|g_1\|_{L^2(\Lambda_m)} + \sup_{g_2 \in G} \|g_2\|_{L^2(\Lambda_m)} \overset{*}{\le} 2 \times 1,
\end{aligned}$$
where (*) holds since
$$\sup_{g \in G} \|g\|_{L^2(\Lambda_m)} = \sup_{z \in S_\Delta} \sqrt{\frac{1}{m} \sum_{j=1}^m g_z^2(\omega_j)} = \sup_{z \in S_\Delta} \sqrt{\frac{1}{m} \sum_{j=1}^m \cos^2\left(\omega_j^T z\right)} \le \sup_{z \in S_\Delta} \sqrt{\frac{1}{m} \sum_{j=1}^m 1} = 1.$$

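Step-5: bound on $N(G, L^2(\Lambda_m), r)$

Let $g_{z_1}, g_{z_2} \in G$. We want to bound $\|g_{z_1} - g_{z_2}\|_{L^2(\Lambda_m)}$. One term (mean value theorem, then Cauchy–Schwarz):
$$\begin{aligned}
\left| \cos\left(\omega^T z_1\right) - \cos\left(\omega^T z_2\right) \right| &\le \left\| \nabla_z \cos\left(\omega^T z_c\right) \right\|_2 \|z_1 - z_2\|_2 \\
&= \left\| -\sin\left(\omega^T z_c\right) \omega \right\|_2 \|z_1 - z_2\|_2 \\
&\le \|\omega\|_2 \|z_1 - z_2\|_2,
\end{aligned}$$
where $z_c \in (z_1, z_2)$.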

Smooth parameterization:
$$\|g_{z_1} - g_{z_2}\|_{L^2(\Lambda_m)} \le \sqrt{\frac{1}{m} \sum_{j=1}^m \left( \|\omega_j\|_2 \|z_1 - z_2\|_2 \right)^2} = \|z_1 - z_2\|_2 \underbrace{\sqrt{\frac{1}{m} \sum_{j=1}^m \|\omega_j\|_2^2}}_{=: A}.$$

$r$-net on $(S_\Delta, \|\cdot\|_2)$ ⇒ $r' = Ar$-net on $(G, L^2(\Lambda_m))$.
In other words, $N\left(G, L^2(\Lambda_m), r\right) \le N\left(S_\Delta, \|\cdot\|_2, \frac{r}{A}\right)$.

Note that $S_\Delta \subseteq B_{\|\cdot\|_2}\left(t, \frac{|S_\Delta|}{2}\right)$ for some $t \in \mathbb{R}^d$.
$N\left(B_{\|\cdot\|_2}(s, R), \|\cdot\|_2, \epsilon\right) \le \left( \frac{2R}{\epsilon} + 1 \right)^d$ for $\forall s \in \mathbb{R}^d$.
Thus
$$N\left(G, L^2(\Lambda_m), r\right) \le \left( \frac{2|S|A}{r} + 1 \right)^d$$
by $|S_\Delta| \le 2|S|$ and the compactness of $S_\Delta$.
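Illustration (not part of the talk): the Lipschitz-type bound $\|g_{z_1} - g_{z_2}\|_{L^2(\Lambda_m)} \le A \|z_1 - z_2\|_2$ and the resulting covering-number bound can be checked numerically. The sketch assumes Gaussian frequencies in $d = 3$; all function names are hypothetical.

```python
import numpy as np

def empirical_l2_distance(z1, z2, omegas):
    """||g_z1 - g_z2|| in L^2(Lambda_m) for g_z(omega) = cos(omega^T z)."""
    diff = np.cos(omegas @ z1) - np.cos(omegas @ z2)
    return np.sqrt(np.mean(diff ** 2))

def lipschitz_constant(omegas):
    """A = sqrt((1/m) * sum_j ||omega_j||_2^2)."""
    return np.sqrt(np.mean(np.sum(omegas ** 2, axis=1)))

def log_covering_bound(diam_S, A, r, d):
    """d * log(2 |S| A / r + 1), an upper bound on log N(G, L^2(Lambda_m), r)."""
    return d * np.log(2.0 * diam_S * A / r + 1.0)

rng = np.random.default_rng(0)
omegas = rng.normal(size=(1000, 3))
z1, z2 = rng.uniform(-1, 1, size=3), rng.uniform(-1, 1, size=3)
A = lipschitz_constant(omegas)
lhs = empirical_l2_distance(z1, z2, omegas)
rhs = A * np.linalg.norm(z1 - z2)
print(lhs, rhs, log_covering_bound(diam_S=2.0, A=A, r=0.5, d=3))
```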

Step-5: bound on R

Combining the obtained results
$$R(G, \omega_{1:m}) \le \frac{8\sqrt{2}}{\sqrt{m}} \int_0^{|G|_{L^2(\Lambda_m)}} \sqrt{\log N(G, L^2(\Lambda_m), r)} \, dr,$$
$$|G|_{L^2(\Lambda_m)} \le 2, \qquad \log N\left(G, L^2(\Lambda_m), r\right) \le d \log\left( \frac{2|S|A}{r} + 1 \right),$$
we have ($r \le 2$)
$$R(G, \omega_{1:m}) \le \frac{8\sqrt{2d}}{\sqrt{m}} \int_0^2 \sqrt{\log\left( \frac{2|S|A + 2}{r} \right)} \, dr.$$

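Using $|S|A + 1 \le (|S| + 1)(A + 1)$:
$$\begin{aligned}
R(G, \omega_{1:m}) &\le \frac{8\sqrt{2d}}{\sqrt{m}} \int_0^2 \sqrt{\log\left( \frac{2|S|A + 2}{r} \right)} \, dr \\
&\le \frac{8\sqrt{2d}}{\sqrt{m}} \left[ \int_0^2 \sqrt{\log \frac{2(|S| + 1)}{r}} \, dr + 2\sqrt{\log(A + 1)} \right] \\
&= \frac{16\sqrt{2d}}{\sqrt{m}} \left[ \int_0^1 \sqrt{\log \frac{|S| + 1}{r}} \, dr + \sqrt{\log(A + 1)} \right].
\end{aligned}$$

Applying
$$\int_0^1 \sqrt{\log \frac{a}{r}} \, dr \le \sqrt{\log a} + \frac{1}{2\sqrt{\log a}} \quad (a > 1)$$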
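Illustration (not part of the talk): a numerical sanity check of the integral inequality $\int_0^1 \sqrt{\log(a/r)} \, dr \le \sqrt{\log a} + \frac{1}{2\sqrt{\log a}}$ for a few values of $a$, using a plain midpoint rule. The function names and the chosen values of $a$ are assumptions made for the example; this is a check, not a proof.

```python
import numpy as np

def entropy_integral(a, n=200000):
    """Midpoint-rule approximation of int_0^1 sqrt(log(a / r)) dr for a > 1."""
    r = (np.arange(n) + 0.5) / n          # midpoints of n equal cells in (0, 1)
    return np.mean(np.sqrt(np.log(a / r)))

def closed_form_bound(a):
    """sqrt(log a) + 1 / (2 sqrt(log a))."""
    return np.sqrt(np.log(a)) + 1.0 / (2.0 * np.sqrt(np.log(a)))

for a in (2.0, 10.0, 1000.0):
    print(a, entropy_integral(a), closed_form_bound(a))
```

The gap between the two columns narrows as $a$ grows, which suggests the bound is tight for large $|S|$.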

we get
$$R(G, \omega_{1:m}) \le \frac{16\sqrt{2d}}{\sqrt{m}} \left[ \sqrt{\log(|S| + 1)} + \frac{1}{2\sqrt{\log(|S| + 1)}} + \sqrt{\log(A + 1)} \right]. \quad (1)$$

By the Jensen inequality
$$E_{\omega_{1:m}} \sqrt{\log(A + 1)} \le \sqrt{E_{\omega_{1:m}} \log(A + 1)} \le \sqrt{\log\left( E_{\omega_{1:m}} A + 1 \right)},$$
$$E_{\omega_{1:m}} A \le \sqrt{\frac{1}{m} \sum_{j=1}^m E_{\omega_j}\left[ \|\omega_j\|_2^2 \right]} =: \sigma$$
⇒ $E_{\omega_{1:m}} R(G, \omega_{1:m}) \le$ (1), but with $A \to \sigma$.

Step-6: putting together

Result: $k$ continuous, shift-invariant kernel; for any $\tau > 0$, $S \neq \emptyset$ compact set,
$$\Lambda^m\Bigg( \sup_{x, y \in S} |\hat{k}(x, y) - k(x, y)| \ge \underbrace{\frac{h(d, |S|, \sigma) + \sqrt{2\tau}}{\sqrt{m}}}_{:= \epsilon} \Bigg) \le e^{-\tau},$$
$$h(d, |S|, \sigma) := 32\sqrt{2d \log(|S| + 1)} + 32\sqrt{2d \log(\sigma + 1)} + 16\sqrt{\frac{2d}{\log(|S| + 1)}}.$$

Equivalently
$$\Lambda^m\left( \|\hat{k} - k\|_S \ge \epsilon \right) \le e^{-\frac{\left[ \epsilon\sqrt{m} - h(d, |S|, \sigma) \right]^2}{2}}.$$
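Illustration (not part of the talk): the constant $h(d, |S|, \sigma)$ and the two equivalent forms of the bound can be evaluated directly. The sketch assumes the unit-bandwidth Gaussian kernel in $d = 2$, for which $\sigma = \sqrt{E\|\omega\|_2^2} = \sqrt{d}$; function names are hypothetical, and the tail form is only informative when $\epsilon\sqrt{m} > h$.

```python
import math

def h(d, diam_S, sigma):
    """h(d, |S|, sigma) from the uniform guarantee."""
    log_s = math.log(diam_S + 1.0)
    return (32.0 * math.sqrt(2.0 * d * log_s)
            + 32.0 * math.sqrt(2.0 * d * math.log(sigma + 1.0))
            + 16.0 * math.sqrt(2.0 * d / log_s))

def uniform_error_level(m, tau, d, diam_S, sigma):
    """epsilon exceeded by the sup-error with probability at most exp(-tau)."""
    return (h(d, diam_S, sigma) + math.sqrt(2.0 * tau)) / math.sqrt(m)

def tail_bound(eps, m, d, diam_S, sigma):
    """exp(-[eps sqrt(m) - h]^2 / 2); trivial bound 1 if eps sqrt(m) <= h."""
    gap = eps * math.sqrt(m) - h(d, diam_S, sigma)
    return 1.0 if gap <= 0 else math.exp(-gap ** 2 / 2.0)

d, diam_S, sigma = 2, 1.0, math.sqrt(2.0)   # assumed example values
eps = uniform_error_level(m=10**6, tau=2.0, d=d, diam_S=diam_S, sigma=sigma)
print(eps, tail_bound(eps, 10**6, d, diam_S, sigma))  # second value ~ exp(-2)
```

The constants are large, so the guarantee becomes non-trivial only for fairly large $m$; its main value is the rate in $m$ and $|S|$.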

Discussion (Borel-Cantelli lemma)

A.s. convergence on compact sets: $\hat{k} \xrightarrow{m \to \infty} k$ at rate $\sqrt{\frac{\log |S|}{m}}$.

Growing diameter:
$$\frac{\log |S_m|}{m} \xrightarrow{m \to \infty} 0$$
is enough (i.e., $|S_m| = e^{o(m)}$) ↔ Old: $|S_m| = o\left( \sqrt{m / \log m} \right)$.

Specifically: asymptotically optimal result [Csörgő and Totik, 1983, Theorem 2] (if $\psi$ vanishes at $\infty$); at a faster rate even convergence in probability would fail.

Direct consequence: Lr guarantee (1 < r )

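Idea: Note that
$$\|\hat{k} - k\|_{L^r(S)} = \left( \int_S \int_S |\hat{k}(x, y) - k(x, y)|^r \, dx \, dy \right)^{\frac{1}{r}} \le \|\hat{k} - k\|_{S \times S} \, \mathrm{vol}^{2/r}(S).$$

$\mathrm{vol}(S) \le \mathrm{vol}(B)$, where $B := \left\{ x \in \mathbb{R}^d : \|x\|_2 \le \frac{|S|}{2} \right\}$, $\mathrm{vol}(B) = \frac{\pi^{d/2} |S|^d}{2^d \Gamma\left( \frac{d}{2} + 1 \right)}$.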

Lr large deviation inequality

Under the previous assumptions:
$$\Lambda^m\left( \|\hat{k} - k\|_{L^r(S)} \ge \left( \frac{\pi^{d/2} |S|^d}{2^d \Gamma\left( \frac{d}{2} + 1 \right)} \right)^{2/r} \frac{h(d, |S|, \sigma) + \sqrt{2\tau}}{\sqrt{m}} \right) \le e^{-\tau}.$$

In other words,
$$\|\hat{k} - k\|_{L^r(S)} = O_{a.s.}\left( m^{-1/2} |S|^{2d/r} \sqrt{\log |S|} \right).$$

For $2 \le r$: direct $L^r$ proof ⇒ the $\sqrt{\log(|S|)}$ factor can be discarded.
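Illustration (not part of the talk): the $L^r$ level is the uniform level multiplied by $\mathrm{vol}(B)^{2/r}$. A minimal sketch with hypothetical function names; `uniform_level` stands for the threshold $\frac{h(d, |S|, \sigma) + \sqrt{2\tau}}{\sqrt{m}}$ of the uniform guarantee and is passed in as a plain number.

```python
import math

def ball_volume(d, diam_S):
    """Volume of the Euclidean ball of radius |S|/2 in R^d."""
    return (math.pi ** (d / 2.0) * diam_S ** d
            / (2.0 ** d * math.gamma(d / 2.0 + 1.0)))

def lr_level(uniform_level, d, diam_S, r):
    """Threshold for the L^r error: vol(B)^(2/r) * uniform threshold."""
    return ball_volume(d, diam_S) ** (2.0 / r) * uniform_level

# sanity checks: an interval of length 2 (d = 1) and the unit disc (d = 2)
print(ball_volume(1, 2.0), ball_volume(2, 2.0))
print(lr_level(uniform_level=0.1, d=2, diam_S=2.0, r=2.0))
```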

Kernel derivatives

If $\mathrm{supp}(\Lambda)$ is bounded, the proof for $k$ can be extended ($L^r$ as well), but this excludes the Gaussian kernel.

[Rahimi and Recht, 2007]'s proof: Hoeffding inequality (boundedness!) + Lipschitzness.

Bernstein + Lipschitzness: handles $\partial^{p,q} k$ with moment constraints on $\Lambda$ (example: Gaussian kernel), at slightly worse rates.

Conclusion

Kernel + derivative approximations.
Performance: uniform, $L^r$.
Detailed finite-sample analysis, optimal rates.
Paper (submitted to NIPS):
RFF: http://arxiv.org/abs/1506.02155,
infinite-dimensional exponential family fitting: http://arxiv.org/abs/1506.02564.

Thank you for the attention!

Acknowledgments: This work was supported by the Gatsby Charitable Foundation.


Csörgő, S. and Totik, V. (1983). On how long interval is the empirical characteristic function uniformly consistent? Acta Sci. Math. (Szeged), 45:141–149.

Rahimi, A. and Recht, B. (2007). Random features for large-scale kernel machines. In Neural Information Processing Systems (NIPS), pages 1177–1184.

Rosasco, L., Santoro, M., Mosci, S., Verri, A., and Villa, S. (2010). A regularization approach to nonlinear variable selection. JMLR W&CP – International Conference on Artificial Intelligence and Statistics (AISTATS), 9:653–660.

Rosasco, L., Villa, S., Mosci, S., Santoro, M., and Verri, A. (2013). Nonparametric sparsity and regularization. Journal of Machine Learning Research, 14:1665–1714.

Shi, L., Guo, X., and Zhou, D.-X. (2010). Hermite learning with gradient data. Journal of Computational and Applied Mathematics, 233:3046–3059.

Sriperumbudur, B. K., Fukumizu, K., Gretton, A., Hyvärinen, A., and Kumar, R. (2014). Density estimation in infinite dimensional exponential families. Technical report. http://arxiv.org/pdf/1312.3516.pdf.

Steinwart, I. and Christmann, A. (2008). Support Vector Machines. Springer.

Sutherland, D. and Schneider, J. (2015). On the error of random Fourier features. In Conference on Uncertainty in Artificial Intelligence (UAI).

Ying, Y., Wu, Q., and Campbell, C. (2012). Learning the coordinate gradients. Advances in Computational Mathematics, 37:355–378.

Zhou, D.-X. (2008). Derivative reproducing properties for kernel methods in learning theory. Journal of Computational and Applied Mathematics, 220:456–463.

Support of a measure

Ingredients:
$(X, \tau)$: topological space with a countable basis.
$B = \sigma(\tau)$: sigma-algebra generated by $\tau$.
$\Lambda$: measure on $(X, B)$.

Then $\mathrm{supp}(\Lambda) = X \setminus \bigcup \{A \in \tau : \Lambda(A) = 0\}$, i.e., the complement of the union of all open $\Lambda$-null sets.

Our choice: $X = \mathbb{R}^d$.