Performance Guarantees for Random Fourier Features – Limitations and Merits
Zoltán Szabó
Joint work with Bharath K. Sriperumbudur (PSU)
ML@SITraN, University of Sheffield June 25, 2015
Context
Given: $k(x, y) = \int_{\mathbb{R}^d} e^{i\omega^T (x-y)} \, d\Lambda(\omega) = \int_{\mathbb{R}^d} \cos\left(\omega^T (x-y)\right) d\Lambda(\omega)$.
$\hat{k}(x, y)$: Monte-Carlo estimator of $k(x, y)$ using $(\omega_j)_{j=1}^m \overset{\text{i.i.d.}}{\sim} \Lambda$ [Rahimi and Recht, 2007].
Motivation:
Primal form – fast linear solvers.
Kernel function approximation: out-of-sample extension.
Online applications.
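A minimal NumPy sketch of the Monte-Carlo estimator, under the assumption of a Gaussian kernel $k(x, y) = e^{-\|x - y\|_2^2 / (2 s^2)}$, for which $\Lambda = N(0, s^{-2} I)$. The function names and the bandwidth `s` are illustrative choices, not taken from the slides.

```python
import numpy as np

def rff_features(X, omegas):
    """Map rows of X (n x d) to 2m features [cos(w^T x), sin(w^T x)] / sqrt(m)."""
    proj = X @ omegas.T                      # n x m matrix of omega_j^T x
    m = omegas.shape[0]
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(m)

def gaussian_kernel(X, Y, s=1.0):
    """Exact Gaussian kernel matrix (the assumed target kernel)."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * s ** 2))

rng = np.random.default_rng(0)
d, m, s = 3, 5000, 1.0
omegas = rng.normal(scale=1.0 / s, size=(m, d))   # omega_j i.i.d. ~ N(0, I / s^2)
X = rng.uniform(-1.0, 1.0, size=(20, d))

Z = rff_features(X, omegas)
K_hat = Z @ Z.T                  # equals (1/m) sum_j cos(omega_j^T (x - y))
K = gaussian_kernel(X, X, s)
max_err = np.abs(K_hat - K).max()
print(max_err)
```

The product `Z @ Z.T` is the primal form mentioned above: once the features are computed, a linear solver can work on `Z` directly instead of on the kernel matrix.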
Performance measures
Uniform ($r = \infty$): $\|k - \hat{k}\|_S := \sup_{x,y \in S} |k(x, y) - \hat{k}(x, y)|$.
$L^r$ ($1 \le r < \infty$): $\|k - \hat{k}\|_{L^r(S)} := \left( \int_S \int_S |k(x, y) - \hat{k}(x, y)|^r \, dx \, dy \right)^{1/r}$.
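Both error measures can be approximated on a grid. The sketch below assumes $d = 1$, $S = [-1, 1]$ and the unit-bandwidth Gaussian kernel (so $\Lambda = N(0, 1)$); these choices and the grid size are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_grid, r = 1000, 50, 2
omegas = rng.normal(size=m)                    # d = 1, omega_j ~ N(0, 1) (assumed Lambda)
h = 2.0 / n_grid
x = -1.0 + h * (np.arange(n_grid) + 0.5)       # midpoints of a uniform partition of S = [-1, 1]
diff = x[:, None] - x[None, :]                 # all differences x - y

K = np.exp(-diff ** 2 / 2.0)                   # k(x, y), unit-bandwidth Gaussian kernel
K_hat = np.cos(diff[..., None] * omegas).mean(axis=-1)   # (1/m) sum_j cos(omega_j (x - y))

err = np.abs(K - K_hat)
sup_err = err.max()                            # grid approximation of the uniform error
lr_err = ((err ** r).sum() * h * h) ** (1.0 / r)   # midpoint rule for the L^r error
vol_S = 2.0                                    # vol([-1, 1])
print(sup_err, lr_err, sup_err * vol_S ** (2.0 / r))
```

With the midpoint rule the quadrature weights sum to $\mathrm{vol}(S)^2$, so the discretized $L^r$ error never exceeds the grid sup-error times $\mathrm{vol}(S)^{2/r}$.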
Approximation of kernel derivatives
One could also consider $\partial^{p,q} k$. Motivation [Zhou, 2008, Shi et al., 2010, Rosasco et al., 2010, Rosasco et al., 2013, Ying et al., 2012, Sriperumbudur et al., 2014]: semi-supervised learning with gradient information, nonlinear variable selection, fitting of infinite-dimensional exponential family distributions.
Many of the presented results hold for derivatives ($[p; q] \neq 0$).
Goal
Large deviation inequalities
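$\Lambda^m\left( \|k - \hat{k}\|_S \le \epsilon \right) \ge f_1(\epsilon, d, m, |S|)$,
$\Lambda^m\left( \|k - \hat{k}\|_{L^r} \le \epsilon \right) \ge f_2(\epsilon, d, m, |S|)$.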
Scaling of |S| and m ensuring a.s. convergence?
Existing results on the approximation quality
Notation: $X_n = O_p(r_n)$ ($O_{a.s.}(r_n)$) denotes that $\frac{X_n}{r_n}$ is bounded in probability (almost surely).
[Rahimi and Recht, 2007]: $\|\hat{k} - k\|_S = O_p\left( |S| \sqrt{\frac{\log m}{m}} \right)$.
[Sutherland and Schneider, 2015]: better constants.
Contents
Uniform guarantee (empirical process theory).
Two $L^r$ guarantees (uniform consequence, direct).
Kernel derivatives.
High-level proof
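1. Empirical process form: $\|k - \hat{k}\|_S = \sup_{g \in \mathcal{G}} |\Lambda g - \Lambda_m g| = \|\Lambda - \Lambda_m\|_{\mathcal{G}}$.
2. $\|\Lambda - \Lambda_m\|_{\mathcal{G}}$ concentrates by its bounded difference property: $\|\Lambda - \Lambda_m\|_{\mathcal{G}} \lesssim \mathbb{E}_{\omega_{1:m}} \|\Lambda - \Lambda_m\|_{\mathcal{G}} + \frac{1}{\sqrt{m}}$.
3. $\mathcal{G}$ is a uniformly bounded, separable Carathéodory family $\Rightarrow$ $\mathbb{E}_{\omega_{1:m}} \|\Lambda - \Lambda_m\|_{\mathcal{G}} \lesssim \mathbb{E}_{\omega_{1:m}} R(\mathcal{G}, \omega_{1:m})$.
4. Using Dudley's entropy integral: $R(\mathcal{G}, \omega_{1:m}) \lesssim \frac{1}{\sqrt{m}} \int_0^{|\mathcal{G}|_{L^2(\Lambda_m)}} \sqrt{\log N(\mathcal{G}, L^2(\Lambda_m), r)} \, dr$.
5. $\mathcal{G}$ is smoothly parameterized by a compact set $\Rightarrow$ $\sqrt{\log N(\mathcal{G}, L^2(\Lambda_m), r)} \le \sqrt{\log\left( \frac{C(\omega_{1:m})}{r} + 1 \right)}$ $\Rightarrow$ $\mathbb{E}_{\omega_{1:m}} R(\mathcal{G}, \omega_{1:m}) \lesssim \frac{1}{\sqrt{m}}$.
6. Putting together: $\|k - \hat{k}\|_S \lesssim \frac{1}{\sqrt{m}} + \frac{1}{\sqrt{m}} = O\left( \sqrt{\frac{\log |S|}{m}} \right)$.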
Step-1: empirical process form
Notation: $\Lambda g = \int g(\omega) \, d\Lambda(\omega)$, $\Lambda_m g = \int g(\omega) \, d\Lambda_m(\omega) = \frac{1}{m} \sum_{j=1}^m g(\omega_j)$.
Reformulation of the objective:
$\sup_{x,y \in S} |k(x, y) - \hat{k}(x, y)| = \sup_{g \in \mathcal{G}} |\Lambda g - \Lambda_m g| =: \|\Lambda - \Lambda_m\|_{\mathcal{G}}$,
where $\mathcal{G} = \{g_z : z \in S_\Delta\}$, $S_\Delta = S - S = \{x - y : x, y \in S\}$, $g_z : \omega \mapsto \cos\left(\omega^T z\right)$.
Step-2: bounded difference property of $\|\Lambda - \Lambda_m\|_{\mathcal{G}}$
McDiarmid inequality: Let $\omega_1, \ldots, \omega_m \in D$ be independent random variables, and let $f : D^m \to \mathbb{R}$ satisfy the bounded difference property ($\forall r$):
$\sup_{u_1, \ldots, u_m, u_r' \in D} |f(u_1, \ldots, u_m) - f(u_1, \ldots, u_{r-1}, u_r', u_{r+1}, \ldots, u_m)| \le c_r$.
Then for $\forall \beta > 0$:
$P\left( f(\omega_1, \ldots, \omega_m) - \mathbb{E}[f(\omega_1, \ldots, \omega_m)] \ge \beta \right) \le e^{-\frac{2\beta^2}{\sum_{r=1}^m c_r^2}}$.
Our choice: $f(\omega_1, \ldots, \omega_m) := \|\Lambda - \Lambda_m\|_{\mathcal{G}}$.
$|f(\omega_1, \ldots, \omega_{r-1}, \omega_r, \omega_{r+1}, \ldots, \omega_m) - f(\omega_1, \ldots, \omega_{r-1}, \omega_r', \omega_{r+1}, \ldots, \omega_m)|$
$= \left| \sup_{g \in \mathcal{G}} \left| \Lambda g - \frac{1}{m} \sum_{j=1}^m g(\omega_j) \right| - \sup_{g \in \mathcal{G}} \left| \Lambda g - \frac{1}{m} \sum_{j=1}^m g(\omega_j) + \frac{1}{m} \left[ g(\omega_r) - g(\omega_r') \right] \right| \right|$
$\overset{(*)}{\le} \frac{1}{m} \sup_{g \in \mathcal{G}} |g(\omega_r) - g(\omega_r')| \le \frac{1}{m} \sup_{g \in \mathcal{G}} \left( |g(\omega_r)| + |g(\omega_r')| \right)$
$\le \frac{1}{m} \left[ \sup_{g \in \mathcal{G}} |g(\omega_r)| + \sup_{g \in \mathcal{G}} |g(\omega_r')| \right] \le \frac{1 + 1}{m} = \frac{2}{m}$.
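The bounded difference constant $2/m$ can be checked numerically. The sketch below assumes $d = 1$, $\Lambda = N(0, 1)$ (so $\Lambda g_z = e^{-z^2/2}$) and replaces the supremum over $S_\Delta$ by a maximum over a finite grid; the helper name `sup_deviation` is hypothetical.

```python
import numpy as np

def sup_deviation(omegas, z_grid):
    """f(omega_1..m) = max over the z grid of |Lambda g_z - Lambda_m g_z| (d = 1)."""
    exact = np.exp(-z_grid ** 2 / 2.0)                          # Lambda g_z for Lambda = N(0, 1)
    empirical = np.cos(np.outer(z_grid, omegas)).mean(axis=1)   # Lambda_m g_z
    return np.abs(exact - empirical).max()

rng = np.random.default_rng(2)
m = 200
z_grid = np.linspace(-2.0, 2.0, 401)       # grid on S_Delta = [-2, 2] for S = [-1, 1]
omegas = rng.normal(size=m)
base = sup_deviation(omegas, z_grid)

max_change = 0.0
for r in range(m):
    perturbed = omegas.copy()
    perturbed[r] = rng.normal() * 10.0     # arbitrary replacement omega_r'
    change = abs(base - sup_deviation(perturbed, z_grid))
    max_change = max(max_change, change)

print(max_change, 2.0 / m)
```

Since the argument works for any family of functions bounded by 1, the inequality also holds for the finite sub-family indexed by the grid, so the observed change stays below $2/m$.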
Step-2: (*) = reverse triangle inequality with sup
Lemma: $\mathcal{G}$: set of functions, $a, b : \mathcal{G} \to \mathbb{R}$ maps; then
$\left| \sup_{g \in \mathcal{G}} |a(g)| - \sup_{g \in \mathcal{G}} |a(g) + b(g)| \right| \le \sup_{g \in \mathcal{G}} |b(g)|$.
Proof: combine
$\sup_{g \in \mathcal{G}} |a(g) + b(g)| \le \sup_{g \in \mathcal{G}} \left( |a(g)| + |b(g)| \right) \le \sup_{g \in \mathcal{G}} |a(g)| + \sup_{g \in \mathcal{G}} |b(g)|$,
$\sup_{g \in \mathcal{G}} |a(g)| = \sup_{g \in \mathcal{G}} |a(g) + b(g) - b(g)| \le \sup_{g \in \mathcal{G}} |a(g) + b(g)| + \sup_{g \in \mathcal{G}} |b(g)|$
$\Rightarrow \pm \left[ \sup_{g \in \mathcal{G}} |a(g)| - \sup_{g \in \mathcal{G}} |a(g) + b(g)| \right] \le \sup_{g \in \mathcal{G}} |b(g)|$.
Our choice: $a(g) = \Lambda g - \frac{1}{m} \sum_{j=1}^m g(\omega_j)$, $b(g) = \frac{1}{m} \left[ g(\omega_r) - g(\omega_r') \right]$.
Step-2
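Applying McDiarmid to $f$ ($c_r = \frac{2}{m}$): with probability at least $1 - e^{-\tau}$,
$\|\Lambda - \Lambda_m\|_{\mathcal{G}} \le \underbrace{\mathbb{E}_{\omega_{1:m}} \|\Lambda - \Lambda_m\|_{\mathcal{G}}}_{\text{Step-3: bounding this term}} + \frac{\sqrt{2\tau}}{\sqrt{m}}$.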
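The deviation term $\sqrt{2\tau}/\sqrt{m}$ follows by plugging $c_r = 2/m$ into McDiarmid's inequality and solving for $\beta$. A short worked computation, using only the quantities defined above:

```latex
\sum_{r=1}^m c_r^2 = m \cdot \frac{4}{m^2} = \frac{4}{m}
\;\Rightarrow\;
P\left( f - \mathbb{E} f \ge \beta \right)
  \le e^{-\frac{2\beta^2}{4/m}} = e^{-\frac{m \beta^2}{2}},
\qquad
e^{-\frac{m \beta^2}{2}} = e^{-\tau}
  \iff \beta = \frac{\sqrt{2\tau}}{\sqrt{m}}.
```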
Step-3: bounding $\mathbb{E}_{\omega_1, \ldots, \omega_m} \|\Lambda - \Lambda_m\|_{\mathcal{G}}$
$\mathcal{G} = \{g_z : z \in S_\Delta\}$ is a separable Carathéodory family, i.e.
1. $\omega \mapsto \cos\left(\omega^T z\right)$: measurable for $\forall z \in S_\Delta$.
2. $z \mapsto \cos\left(\omega^T z\right)$: continuous for $\forall \omega$.
3. $\mathbb{R}^d$ is separable, $S_\Delta \subseteq \mathbb{R}^d$ $\Rightarrow$ $S_\Delta$: separable.
Thus, by [Steinwart and Christmann, 2008, Prop. 7.10]
$\mathbb{E}_{\omega_{1:m}} \|\Lambda - \Lambda_m\|_{\mathcal{G}} \le 2 \mathbb{E}_{\omega_{1:m}} \left[ R(\mathcal{G}, \omega_{1:m}) \right]$, where $R(\mathcal{G}, \omega_{1:m}) := \mathbb{E}_\epsilon \sup_{g \in \mathcal{G}} \left| \frac{1}{m} \sum_{j=1}^m \epsilon_j g(\omega_j) \right|$,
using the uniform boundedness of $\mathcal{G}$ ($\sup_{g \in \mathcal{G}} \|g\|_\infty \le 1$).
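The empirical Rademacher average $R(\mathcal{G}, \omega_{1:m})$ can be estimated by Monte Carlo over the signs $\epsilon_j$. The sketch assumes $d = 1$, $\Lambda = N(0, 1)$, and a finite grid in place of $S_\Delta$; the function name and the number of sign draws are illustrative.

```python
import numpy as np

def empirical_rademacher(omegas, z_grid, n_draws=500, seed=0):
    """Monte-Carlo estimate of E_eps max_z |(1/m) sum_j eps_j cos(omega_j z)| on a z grid."""
    rng = np.random.default_rng(seed)
    G = np.cos(np.outer(z_grid, omegas))               # G[i, j] = g_{z_i}(omega_j)
    m = omegas.shape[0]
    eps = rng.choice([-1.0, 1.0], size=(n_draws, m))   # Rademacher signs
    sups = np.abs(eps @ G.T / m).max(axis=1)           # max over the grid, per sign draw
    return sups.mean()

rng = np.random.default_rng(3)
z_grid = np.linspace(-2.0, 2.0, 201)                   # grid on S_Delta = [-2, 2]
rad = {m: empirical_rademacher(rng.normal(size=m), z_grid) for m in (100, 400, 1600)}
print(rad)
```

Quadrupling $m$ should roughly halve the estimate, in line with the $1/\sqrt{m}$ behaviour established in the following steps.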
Step-4: bounding $R$
$R\left(\mathcal{G}, (\omega_j)_{j=1}^m\right) \le \frac{8\sqrt{2}}{\sqrt{m}} \int_0^{|\mathcal{G}|_{L^2(\Lambda_m)}} \sqrt{\log N(\mathcal{G}, L^2(\Lambda_m), r)} \, dr$,
where
$L^2(\Lambda_m) = L^2(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \Lambda_m)$, $\|g\|_{L^2(\Lambda_m)} = \sqrt{\frac{1}{m} \sum_{j=1}^m g^2(\omega_j)}$,
$|\mathcal{G}|_{L^2(\Lambda_m)} = \sup_{g_1, g_2 \in \mathcal{G}} \|g_1 - g_2\|_{L^2(\Lambda_m)}$,
$N(\mathcal{G}, L^2(\Lambda_m), r)$: $r$-covering number.
$r$-net: $S \subseteq \mathcal{G}$, for $\forall g \in \mathcal{G}$ $\exists s \in S$ such that $\|g - s\|_{L^2(\Lambda_m)} \le r$.
$N$: size of the smallest $r$-net of $\mathcal{G}$.
Step-5: bound on $|\mathcal{G}|_{L^2(\Lambda_m)}$
$|\mathcal{G}|_{L^2(\Lambda_m)} = \sup_{g_1, g_2 \in \mathcal{G}} \|g_1 - g_2\|_{L^2(\Lambda_m)} \le \sup_{g_1, g_2 \in \mathcal{G}} \left( \|g_1\|_{L^2(\Lambda_m)} + \|g_2\|_{L^2(\Lambda_m)} \right) \le \sup_{g_1 \in \mathcal{G}} \|g_1\|_{L^2(\Lambda_m)} + \sup_{g_2 \in \mathcal{G}} \|g_2\|_{L^2(\Lambda_m)} \overset{(*)}{\le} 2 \times 1$,
$(*)$: $\sup_{g \in \mathcal{G}} \|g\|_{L^2(\Lambda_m)} = \sup_{z \in S_\Delta} \sqrt{\frac{1}{m} \sum_{j=1}^m g_z^2(\omega_j)} = \sup_{z \in S_\Delta} \sqrt{\frac{1}{m} \sum_{j=1}^m \cos^2\left(\omega_j^T z\right)} \le \sup_{z \in S_\Delta} \sqrt{\frac{1}{m} \sum_{j=1}^m 1} = 1$.
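Step-5: bound on $N(\mathcal{G}, L^2(\Lambda_m), r)$
Let $g_{z_1}, g_{z_2} \in \mathcal{G}$. We want to bound $\|g_{z_1} - g_{z_2}\|_{L^2(\Lambda_m)}$. One term, by the mean value theorem and the Cauchy-Schwarz inequality:
$\left| \cos\left(\omega^T z_1\right) - \cos\left(\omega^T z_2\right) \right| \le \left\| \nabla_z \cos\left(\omega^T z_c\right) \right\|_2 \|z_1 - z_2\|_2 = \left\| -\sin\left(\omega^T z_c\right) \omega \right\|_2 \|z_1 - z_2\|_2 \le \|\omega\|_2 \|z_1 - z_2\|_2$,
where $z_c \in (z_1, z_2)$.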
Smooth parameterization:
$\|g_{z_1} - g_{z_2}\|_{L^2(\Lambda_m)} \le \sqrt{\frac{1}{m} \sum_{j=1}^m \left( \|\omega_j\|_2 \|z_1 - z_2\|_2 \right)^2} = \|z_1 - z_2\|_2 \underbrace{\sqrt{\frac{1}{m} \sum_{j=1}^m \|\omega_j\|_2^2}}_{=: A}$.
$r$-net on $(S_\Delta, \|\cdot\|_2)$ $\Rightarrow$ $r' = Ar$-net on $(\mathcal{G}, L^2(\Lambda_m))$. In other words, $N\left(\mathcal{G}, L^2(\Lambda_m), r\right) \le N\left(S_\Delta, \|\cdot\|_2, \frac{r}{A}\right)$.
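Note that $S_\Delta \subseteq B_{\|\cdot\|_2}\left(t, \frac{|S_\Delta|}{2}\right)$ for some $t \in \mathbb{R}^d$.
$N\left(B_{\|\cdot\|_2}(s, R), \|\cdot\|_2, \epsilon\right) \le \left( \frac{2R}{\epsilon} + 1 \right)^d$ for $\forall s \in \mathbb{R}^d$.
Thus
$N\left(\mathcal{G}, L^2(\Lambda_m), r\right) \le \left( \frac{2|S|A}{r} + 1 \right)^d$
by $|S_\Delta| \le 2|S|$ and the compactness of $S_\Delta$.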
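To illustrate the covering bound, the sketch below builds an $r$-net of a discretized $\mathcal{G}$ greedily in the $L^2(\Lambda_m)$ norm and compares its size with $\left(2|S|A/r + 1\right)^d$. It assumes $d = 1$, $S = [-1, 1]$ and $\Lambda = N(0, 1)$; the greedy construction and the function name are illustrative and give an upper bound on the covering number of the discretized family only.

```python
import numpy as np

def greedy_net_size(G, r):
    """Size of a greedily built r-net of the rows of G w.r.t. sqrt(mean_j g(omega_j)^2)."""
    centers = [0]                                   # the first function is always a center
    for i in range(1, G.shape[0]):
        dists = np.sqrt(((G[centers] - G[i]) ** 2).mean(axis=1))
        if dists.min() > r:                         # g_i is not covered yet: new center
            centers.append(i)
    return len(centers)

rng = np.random.default_rng(4)
m, S_diam = 300, 2.0                                # S = [-1, 1]: |S| = 2, S_Delta = [-2, 2]
omegas = rng.normal(size=m)
z_grid = np.linspace(-S_diam, S_diam, 2001)
G = np.cos(np.outer(z_grid, omegas))                # rows: g_z evaluated at omega_1..m
A = np.sqrt((omegas ** 2).mean())

results = {}
for r in (0.5, 0.2, 0.1):
    size = greedy_net_size(G, r)
    bound = 2.0 * S_diam * A / r + 1.0              # (2|S|A/r + 1)^d with d = 1
    results[r] = (size, bound)
    print(r, size, bound)
```

In one dimension the greedy centers are more than $r/A$ apart in $z$, so their number cannot exceed $2|S|A/r + 1$; in higher dimensions a greedy net may be larger than the minimal one and this comparison is only indicative.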
Step-5: bound on $R$
Combining the obtained results
$R(\mathcal{G}, \omega_{1:m}) \le \frac{8\sqrt{2}}{\sqrt{m}} \int_0^{|\mathcal{G}|_{L^2(\Lambda_m)}} \sqrt{\log N(\mathcal{G}, L^2(\Lambda_m), r)} \, dr$,
$|\mathcal{G}|_{L^2(\Lambda_m)} \le 2$,
$\log N\left(\mathcal{G}, L^2(\Lambda_m), r\right) \le d \log\left( \frac{2|S|A}{r} + 1 \right)$,
we have ($r \le 2$)
$R(\mathcal{G}, \omega_{1:m}) \le \frac{8\sqrt{2d}}{\sqrt{m}} \int_0^2 \sqrt{\log \frac{2|S|A + 2}{r}} \, dr$.
Using $|S|A + 1 \le (|S| + 1)(A + 1)$:
$R(\mathcal{G}, \omega_{1:m}) \le \frac{8\sqrt{2d}}{\sqrt{m}} \int_0^2 \sqrt{\log \frac{2|S|A + 2}{r}} \, dr$
$\le \frac{8\sqrt{2d}}{\sqrt{m}} \left[ \int_0^2 \sqrt{\log \frac{2(|S| + 1)}{r}} \, dr + 2\sqrt{\log(A + 1)} \right]$
$= \frac{16\sqrt{2d}}{\sqrt{m}} \left[ \int_0^1 \sqrt{\log \frac{|S| + 1}{r}} \, dr + \sqrt{\log(A + 1)} \right]$.
Applying $\int_0^1 \sqrt{\log \frac{a}{r}} \, dr \le \sqrt{\log a} + \frac{1}{2\sqrt{\log a}}$ ($a > 1$) we get
$R(\mathcal{G}, \omega_{1:m}) \le \frac{16\sqrt{2d}}{\sqrt{m}} \left[ \sqrt{\log(|S| + 1)} + \frac{1}{2\sqrt{\log(|S| + 1)}} + \sqrt{\log(A + 1)} \right]$. (1)
By the Jensen inequality
$\mathbb{E}_{\omega_{1:m}} \sqrt{\log(A + 1)} \le \sqrt{\mathbb{E}_{\omega_{1:m}} \log(A + 1)} \le \sqrt{\log\left(\mathbb{E}_{\omega_{1:m}} A + 1\right)}$,
$\mathbb{E}_{\omega_{1:m}} A \le \sqrt{\frac{1}{m} \sum_{j=1}^m \mathbb{E}_{\omega_j}\left[ \|\omega_j\|_2^2 \right]} =: \sigma$
$\Rightarrow$ $\mathbb{E}_{\omega_{1:m}} R(\mathcal{G}, \omega_{1:m}) \le$ (1), but with $A \to \sigma$.
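The integral inequality $\int_0^1 \sqrt{\log(a/r)} \, dr \le \sqrt{\log a} + \frac{1}{2\sqrt{\log a}}$ used above can be sanity-checked numerically. The sketch approximates the integral with a midpoint rule; the helper names and the tested values of `a` are illustrative.

```python
import numpy as np

def entropy_integral(a, n=200_000):
    """Midpoint-rule approximation of int_0^1 sqrt(log(a / r)) dr for a > 1."""
    r = (np.arange(n) + 0.5) / n            # midpoints avoid the singularity at r = 0
    return np.sqrt(np.log(a / r)).mean()

def upper_bound(a):
    """Right-hand side sqrt(log a) + 1 / (2 sqrt(log a))."""
    return np.sqrt(np.log(a)) + 1.0 / (2.0 * np.sqrt(np.log(a)))

checks = {a: (entropy_integral(a), upper_bound(a)) for a in (1.5, 2.0, 10.0, 1e3)}
for a, (value, bound) in checks.items():
    print(a, value, bound)
```

The inequality itself follows from $\sqrt{x + y} \le \sqrt{x} + \frac{y}{2\sqrt{x}}$ with $x = \log a$, $y = \log(1/r)$ and $\int_0^1 \log(1/r) \, dr = 1$.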
Step-6: putting together
Result: $k$ continuous, shift-invariant kernel; for any $\tau > 0$, $S \neq \emptyset$ compact set,
$\Lambda^m\left( \sup_{x,y \in S} |\hat{k}(x, y) - k(x, y)| \ge \underbrace{\frac{h(d, |S|, \sigma) + \sqrt{2\tau}}{\sqrt{m}}}_{=: \epsilon} \right) \le e^{-\tau}$,
$h(d, |S|, \sigma) := 32\sqrt{2d \log(|S| + 1)} + 32\sqrt{2d \log(\sigma + 1)} + 16\sqrt{\frac{2d}{\log(|S| + 1)}}$.
Equivalently,
$\Lambda^m\left( \|k - \hat{k}\|_S \ge \epsilon \right) \le e^{-\frac{\left[ \epsilon\sqrt{m} - h(d, |S|, \sigma) \right]^2}{2}}$.
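To get a feel for the constants, the sketch below evaluates $h(d, |S|, \sigma)$ and the resulting deviation level $\epsilon$. It assumes a Gaussian kernel with bandwidth `s`, for which $\mathbb{E}\|\omega\|_2^2 = d/s^2$ and hence $\sigma = \sqrt{d}/s$; the function names and parameter values are illustrative.

```python
import math

def h(d, S_diam, sigma):
    """h(d, |S|, sigma) as defined in the result above."""
    log_s = math.log(S_diam + 1.0)
    return (32.0 * math.sqrt(2.0 * d * log_s)
            + 32.0 * math.sqrt(2.0 * d * math.log(sigma + 1.0))
            + 16.0 * math.sqrt(2.0 * d / log_s))

def epsilon(d, S_diam, sigma, m, tau):
    """Deviation level exceeded with probability at most exp(-tau)."""
    return (h(d, S_diam, sigma) + math.sqrt(2.0 * tau)) / math.sqrt(m)

def tail_bound(d, S_diam, sigma, m, eps):
    """exp(-[eps sqrt(m) - h]^2 / 2); vacuous (returns 1) when eps sqrt(m) < h."""
    gap = eps * math.sqrt(m) - h(d, S_diam, sigma)
    return 1.0 if gap < 0 else math.exp(-gap ** 2 / 2.0)

d, S_diam, s, tau = 2, 2.0, 1.0, 3.0
sigma = math.sqrt(d) / s            # Gaussian-kernel assumption: E||omega||^2 = d / s^2
for m in (10**4, 10**6, 10**8):
    print(m, epsilon(d, S_diam, sigma, m, tau))
```

Plugging `epsilon(...)` back into `tail_bound(...)` returns $e^{-\tau}$, which is the equivalence between the two forms of the inequality.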
Discussion (Borel-Cantelli lemma)
A.s. convergence on compact sets: $\hat{k} \xrightarrow{m \to \infty} k$ at rate $\sqrt{\frac{\log |S|}{m}}$.
Growing diameter: $\frac{\log |S_m|}{m} \xrightarrow{m \to \infty} 0$ is enough (i.e., $|S_m| = e^{o(m)}$) $\leftrightarrow$ Old: $|S_m| = o\left( \sqrt{m / \log m} \right)$.
Specifically: asymptotically optimal result [Csörgő and Totik, 1983, Theorem 2] (if $\psi$ vanishes at $\infty$); if $|S_m|$ grows at a faster rate, even convergence in probability would fail.
Direct consequence: $L^r$ guarantee ($1 < r$)
Idea: Note that
$\|\hat{k} - k\|_{L^r(S)} = \left( \int_S \int_S |\hat{k}(x, y) - k(x, y)|^r \, dx \, dy \right)^{1/r} \le \|\hat{k} - k\|_{S \times S} \operatorname{vol}^{2/r}(S)$.
$\operatorname{vol}(S) \le \operatorname{vol}(B)$, where $B := \left\{ x \in \mathbb{R}^d : \|x\|_2 \le \frac{|S|}{2} \right\}$, $\operatorname{vol}(B) = \frac{\pi^{d/2} |S|^d}{2^d \Gamma\left( \frac{d}{2} + 1 \right)}$.
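The volume factor is easy to evaluate with the standard library. A small sketch (function names are illustrative) computing $\operatorname{vol}(B)$ and the factor $\operatorname{vol}(B)^{2/r}$ that multiplies the uniform error:

```python
import math

def ball_volume(d, diameter):
    """Volume of the Euclidean ball in R^d with radius diameter / 2."""
    return math.pi ** (d / 2.0) * diameter ** d / (2.0 ** d * math.gamma(d / 2.0 + 1.0))

def lr_factor(d, diameter, r):
    """Upper bound vol(B)^(2/r) on the factor vol(S)^(2/r) linking L^r and uniform errors."""
    return ball_volume(d, diameter) ** (2.0 / r)

for d in (1, 2, 3, 10):
    print(d, ball_volume(d, 2.0), lr_factor(d, 2.0, 2))
```

For diameter 2 this reproduces the familiar values: length 2 in one dimension, area $\pi$ in two, and volume $4\pi/3$ in three.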
$L^r$ large deviation inequality
Under the previous assumptions:
$\Lambda^m\left( \|\hat{k} - k\|_{L^r(S)} \ge \left( \frac{\pi^{d/2} |S|^d}{2^d \Gamma\left( \frac{d}{2} + 1 \right)} \right)^{2/r} \frac{h(d, |S|, \sigma) + \sqrt{2\tau}}{\sqrt{m}} \right) \le e^{-\tau}$.
In other words,
$\|\hat{k} - k\|_{L^r(S)} = O_{a.s.}\left( m^{-1/2} |S|^{2d/r} \sqrt{\log |S|} \right)$.
For $2 \le r$: direct $L^r$ proof $\Rightarrow$ the $\sqrt{\log(|S|)}$ factor can be discarded.
Kernel derivatives
If $\operatorname{supp}(\Lambda)$ is bounded, the proof for $k$ can be extended to derivatives ($L^r$ as well); this excludes the Gaussian kernel, whose $\Lambda$ has unbounded support.
[Rahimi and Recht, 2007]'s proof: Hoeffding inequality (boundedness!) + Lipschitzness.
Bernstein + Lipschitzness: handles $\partial^{p,q} k$ with moment constraints on $\Lambda$ (example: Gaussian kernel), at slightly worse rates.
Conclusion
Kernel + derivative approximations.
Performance: uniform, $L^r$.
Detailed finite-sample analysis, optimal rates.
Paper (submitted to NIPS):
RFF: http://arxiv.org/abs/1506.02155,
infinite-dimensional exponential family fitting: http://arxiv.org/abs/1506.02564.
Thank you for the attention!
Acknowledgments: This work was supported by the Gatsby Charitable Foundation.
Csörgő, S. and Totik, V. (1983). On how long interval is the empirical characteristic function uniformly consistent? Acta Sci. Math. (Szeged), 45:141–149.
Rahimi, A. and Recht, B. (2007). Random features for large-scale kernel machines. In Neural Information Processing Systems (NIPS), pages 1177–1184.
Rosasco, L., Santoro, M., Mosci, S., Verri, A., and Villa, S. (2010). A regularization approach to nonlinear variable selection. JMLR W&CP – International Conference on Artificial Intelligence and Statistics (AISTATS), 9:653–660.
Rosasco, L., Villa, S., Mosci, S., Santoro, M., and Verri, A. (2013). Nonparametric sparsity and regularization. Journal of Machine Learning Research, 14:1665–1714.
Shi, L., Guo, X., and Zhou, D.-X. (2010). Hermite learning with gradient data. Journal of Computational and Applied Mathematics, 233:3046–3059.
Sriperumbudur, B. K., Fukumizu, K., Gretton, A., Hyvärinen, A., and Kumar, R. (2014). Density estimation in infinite dimensional exponential families. Technical report. http://arxiv.org/pdf/1312.3516.pdf.
Steinwart, I. and Christmann, A. (2008). Support Vector Machines. Springer.
Sutherland, D. and Schneider, J. (2015). On the error of random Fourier features. In Conference on Uncertainty in Artificial Intelligence (UAI).
Ying, Y., Wu, Q., and Campbell, C. (2012). Learning the coordinate gradients. Advances in Computational Mathematics, 37:355–378.
Zhou, D.-X. (2008). Derivative reproducing properties for kernel methods in learning theory. Journal of Computational and Applied Mathematics, 220:456–463.
Support of a measure
Ingredients:
$(X, \tau)$: topological space with a countable basis.
$\mathcal{B} = \sigma(\tau)$: sigma-algebra generated by $\tau$.
$\Lambda$: measure on $(X, \mathcal{B})$.
Then $\operatorname{supp}(\Lambda) = X \setminus \bigcup \{A \in \tau : \Lambda(A) = 0\}$, i.e., the complement of the union of all open $\Lambda$-null sets.
Our choice: $X = \mathbb{R}^d$.