A Note On Spectral Clustering

arXiv:1509.09188v2 [cs.DM] 8 Jan 2016

Pavel Kolev∗, Kurt Mehlhorn
Max-Planck-Institut für Informatik, Saarbrücken, Germany
{pkolev,mehlhorn}@mpi-inf.mpg.de

Abstract

Let G = (V, E) be an undirected graph, λ_k the kth smallest eigenvalue of the normalized Laplacian matrix L_G of G, and ρ(k) (respectively ρ̂(k)) the smallest value of the maximal conductance over all k disjoint subsets Z_1, ..., Z_k of V (respectively over all k-way partitions of V). Oveis Gharan and Trevisan [3] proved the existence of a k-way partition (P_1, ..., P_k) of V with ρ̂(k) ≤ k·ρ(k). The k-way (approximate) partitioning problem asks to partition a graph into k clusters such that the conductance of each cluster is (approximately) bounded by ρ̂(k). Peng et al. [4] gave the first rigorous analysis of approximation algorithms for the k-way partitioning problem that are based on clustering suitably normalized eigenvectors of L_G with the help of an approximate k-means algorithm. Their analysis relies on the following gap assumption:

    Υ ≜ λ_{k+1} / ρ̂(k) ≥ Ω(k³).

We strengthen the analysis in two directions. First, we improve the approximation guarantee by a factor of Θ(k), and second, we require only the weaker gap assumption

    Ψ ≜ λ_{k+1} / ρ̂_avr(k) ≥ Ω(k³),     (1)

where ρ̂_avr(k) is the minimal average conductance over all k-way partitions achieving ρ̂(k). Furthermore, for graphs G that satisfy the gap assumption (1) with k = ω(1), our improved analysis gives an algorithm that, on input a suitable spectral embedding of V, outputs with constant probability in time O(nk) a k-way partition of V with the same approximation guarantees as in [4]. This speeds up the algorithm in [4] by a factor of O(2^k).

∗ This work has been funded by the Cluster of Excellence “Multimodal Computing and Interaction” within the Excellence Initiative of the German Federal Government.

Contents

1 Introduction
2 Notations
3 Structure of the Paper
4 The Vectors ĝ_i and f_i are Close
  4.1 Analyzing the Columns of Matrix F
  4.2 Analyzing Eigenvectors f_i in Terms of f̂_j
5 Spectral Properties of Matrix B
  5.1 Analyzing the Column Space of Matrix B
  5.2 Analyzing the Row Space of Matrix B
6 The Vectors p^(i) are Well-Spread
7 The Proof of Lemma 1.5
8 The Proof of Lemma 1.6
9 k-means Algorithms
A Parameterized Upper Bound on ρ̂_avr(k)

1 Introduction

Let G = (V, E) be an undirected graph. For any subset of vertices S ⊆ V we denote by μ(S) = ∑_{v∈S} deg(v) the volume of S, and we define the conductance of S by

    φ(S) = |E(S, S̄)| / μ(S).     (2)

The order-k conductance constant ρ(k) is defined by

    ρ(k) = min_{disjoint Z_1,...,Z_k} Φ(Z_1, ..., Z_k),  where  Φ(Z_1, ..., Z_k) = max_{i∈[1:k]} φ(Z_i).     (3)

Lee et al. [1] connected ρ(k) and the kth smallest eigenvalue of the normalized Laplacian matrix L_G through the relation

    λ_k / 2 ≤ ρ(k) ≤ O(k²)·√λ_k.     (4)

Oveis Gharan and Trevisan [3] proved that the order-k partition constant ρ̂(k) satisfies

    ρ̂(k) ≜ min_{partition (P_1,...,P_k)} Φ(P_1, ..., P_k) ≤ k·ρ(k).     (5)
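The conductance quantities above are straightforward to compute. The following is a minimal numpy sketch; the 6-vertex example graph (two triangles joined by the single edge {2, 3}) is our own illustration, not taken from the paper.

```python
import numpy as np

def conductance(A, S):
    """phi(S) = |E(S, S-bar)| / mu(S) for a 0/1 symmetric adjacency matrix A."""
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[np.asarray(S)] = True
    cut = A[mask][:, ~mask].sum()    # edges leaving S
    volume = A[mask].sum()           # mu(S) = sum of degrees inside S
    return cut / volume

def max_conductance(A, parts):
    """Phi(Z_1, ..., Z_k): the worst conductance among the given subsets."""
    return max(conductance(A, Z) for Z in parts)

# Two triangles joined by one edge; the natural 2-way partition
A = np.zeros((6, 6), dtype=int)
for u, v in [(0,1), (1,2), (0,2), (3,4), (4,5), (3,5), (2,3)]:
    A[u, v] = A[v, u] = 1
print(max_conductance(A, [[0, 1, 2], [3, 4, 5]]))  # → 0.14285714285714285 (= 1/7)
```

Here {0, 1, 2} has volume 2 + 2 + 3 = 7 and exactly one crossing edge, so both sides have conductance 1/7.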

Let X = {F(u)}_{u∈V} be a set of n vectors in R^k. The k-means clustering problem asks to partition X into k clusters (X'_1, ..., X'_k) so as to minimize the "k-means cost"

    ∑_{i=1}^k ∑_{u∈X'_i} ‖F(u) − c'_i‖²,  where  c'_i = (1/|X'_i|) ∑_{u∈X'_i} F(u).

We use △_k(X) to denote the optimal k-means cost, achieved by an optimal partition (X_1, ..., X_k) of X. Formally, we write

    △_k(X) ≜ ∑_{i=1}^k ∑_{u∈X_i} ‖F(u) − c_i‖² = min_{partition (X'_1,...,X'_k)} ∑_{i=1}^k ∑_{u∈X'_i} ‖F(u) − c'_i‖².

Ostrovsky et al. [2] proposed a PTAS for the k-means clustering problem restricted to inputs X that satisfy

    △_k(X) ≤ ε² · △_{k−1}(X),  for ε ∈ (0, 6·10^{−7}].     (6)

Theorem 1.1. [2] There is a PTAS for the k-means clustering problem restricted to inputs satisfying (6) that outputs with constant probability in time O(exp{O((1 + ε²)·k/γ)} · nk) a (1 + γ)-optimal solution.

Peng et al. [4] studied approximation schemes for the k-way partitioning problem, which asks to partition the vertices of a graph into sets P_1, ..., P_k that minimize Φ(P_1, ..., P_k). They analyzed spectral clustering algorithms that embed the vertices of G into vectors F(·) ∈ R^k using the first k eigenvectors of the normalized Laplacian matrix L_G, and then partition the resulting vectors via k-means clustering algorithms. More precisely, let f_k be the eigenvector corresponding to λ_k. The vectors f_1, ..., f_n form an orthonormal basis of R^V. The spectral embedding map F : V → R^k is defined by

    F(u) = (1/√d_u) · (f_1(u), ..., f_k(u))ᵀ,  for all vertices u ∈ V.     (7)
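The spectral embedding map (7) can be realized in a few lines of numpy; this is an illustrative sketch only, and the two-triangle example graph is our own.

```python
import numpy as np

def spectral_embedding(A, k):
    """F(u) = (1/sqrt(d_u)) * (f_1(u), ..., f_k(u)) from the first k eigenvectors
    of the normalized Laplacian L_G = I - D^{-1/2} A D^{-1/2} (Equation (7))."""
    d = A.sum(axis=1)
    L = np.eye(len(d)) - A / np.sqrt(np.outer(d, d))  # I - D^{-1/2} A D^{-1/2}
    eigvals, eigvecs = np.linalg.eigh(L)              # eigenvalues ascending
    return eigvecs[:, :k] / np.sqrt(d)[:, None], eigvals

# Two triangles joined by one edge (our own toy example)
A = np.zeros((6, 6))
for u, v in [(0,1), (1,2), (0,2), (3,4), (4,5), (3,5), (2,3)]:
    A[u, v] = A[v, u] = 1
F, lam = spectral_embedding(A, 2)
# lam[0] is 0 for a connected graph, and rows of F belonging to the same
# well-separated cluster are nearly identical points in R^k
```

The rows of F corresponding to one tight cluster collapse to nearly one point, which is exactly what makes the subsequent k-means step work.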

Peng et al. [4] constructed a k-means instance X_V by inserting d_u copies of the vector F(u) into X_V, for every vertex u ∈ V. The optimal cost of a k-means clustering of X_V is defined by

    OPT = min_{X_1,...,X_k} ∑_{i=1}^k ∑_{u∈X_i} d_u · ‖F(u) − c_i‖²,

where (X_1, ..., X_k) ranges over all k-way partitions of V and each vector c_i is the gravity center of the cluster {d_u copies of F(u)}_{u∈X_i}. Moreover, we note that △_k(X_V) = OPT. An approximate k-means clustering algorithm with approximation ratio APR returns a k-way partition (A_1, ..., A_k) with corresponding centers c_1, ..., c_k such that

    Cost({A_i, c_i}_{i=1}^k) = ∑_{i=1}^k ∑_{u∈A_i} d_u · ‖F(u) − c_i‖² ≤ APR · OPT.     (8)
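The degree-weighted k-means cost appearing in (8) — where each vertex u contributes d_u copies of F(u) — can be evaluated as follows. The tiny data set at the end is our own, purely for illustration.

```python
import numpy as np

def kmeans_cost(F, d, parts):
    """Weighted k-means cost: sum_i sum_{u in X_i} d_u * ||F(u) - c_i||^2,
    where c_i is the gravity center of the d_u copies of F(u) in cluster X_i."""
    total = 0.0
    for X in parts:
        idx = np.asarray(X)
        w = d[idx].astype(float)[:, None]
        c = (w * F[idx]).sum(axis=0) / w.sum()   # weighted centroid of the cluster
        total += (w * (F[idx] - c) ** 2).sum()
    return total

# Two 1-d points with weights 1 and 3 in a single cluster:
F = np.array([[0.0], [1.0]])
d = np.array([1, 3])
print(kmeans_cost(F, d, [[0, 1]]))  # center 3/4 → 1*(3/4)^2 + 3*(1/4)^2 = 0.75
```

Splitting the two points into separate clusters drives the cost to zero, which is the behavior △_k versus △_{k−1} comparisons in Section 9 rely on.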

We summarize now the main result of Peng et al. [4].

Theorem 1.2. [4] Let k ≥ 3 and let (P_1, ..., P_k) be a k-way partition of V with Φ(P_1, ..., P_k) = ρ̂(k). Let G be a graph that satisfies the gap assumption

    Υ = λ_{k+1} / ρ̂(k) = 10^5 · k³/δ,     (9)

for some δ ∈ (0, 1/2]. Let (A_1, ..., A_k) be the k-way partition returned by an APR-approximate k-means algorithm applied to X_V. Then the following statements hold (after suitable renumbering of one of the partitions):

    1) μ(A_i △ P_i) ≤ (APR · δ) · μ(P_i)   and   2) φ(A_i) ≤ (1 + APR · 2δ) · φ(P_i) + APR · 2δ.

Our Contribution: We improve the analysis of Theorem 1.2 and show that both approximation guarantees in the conclusion of Theorem 1.2 can be strengthened by a Θ(k)-factor, under a less restrictive gap assumption. Let O be the set of all k-way partitions (P_1, ..., P_k) with Φ(P_1, ..., P_k) = ρ̂(k), i.e., the set of all partitions that achieve the order-k partition constant. Let

    ρ̂_avr(k) = min_{(P_1,...,P_k)∈O} (1/k) ∑_{i=1}^k φ(P_i)

be the minimal average conductance over all k-way partitions in O. Our gap assumption is defined by

    Ψ = λ_{k+1} / ρ̂_avr(k) ≥ Ω(k³).

Moreover, we give in Appendix A a parameterized upper bound on ρ̂_avr(k) that depends on a natural combinatorial parameter and the average conductance of k disjoint subsets achieving ρ(k). For the remainder of this paper we denote by (P_1, ..., P_k) a k-way partition of V that achieves ρ̂_avr(k). We summarize now our main results.

Theorem 1.3 (Main Theorem). Let G be a graph that satisfies for some δ ∈ (0, 1/2] and k ≥ 3 the gap assumption

    Ψ = 20^4 · k³/δ.     (10)

Suppose a k-means clustering algorithm achieving an approximation ratio APR takes as input the spectral embedding X_V and outputs a k-way partition {A_i}_{i=1}^k. Let δ' = 8δ/10^4. Then for every i ∈ [1 : k] the following two statements hold (after suitable renumbering of one of the partitions):

    1) μ(A_i △ P_i) < (APR · δ'/k) · μ(P_i)   and   2) φ(A_i) ≤ (1 + APR · 2δ'/k) · φ(P_i) + APR · 2δ'/k.

Using Theorem 1.3, we show in Section 9 that Theorem 1.1 yields a k-means clustering algorithm that is faster by a factor of O(2^k) and outputs a k-way partition of V with the same approximation guarantees as in Theorem 1.2.

Theorem 1.4. Suppose G is a graph that satisfies for some δ ∈ (0, 1/2] the gap assumption Ψ = 20^4 · k³/δ and k/δ ≥ 10^9. Then there is an algorithm that, on input a spectral embedding {F(u)}_{u∈V} of G, outputs with constant probability in time O(nk) a k-way partition (A_1, ..., A_k) of V that satisfies for all i ∈ [1 : k]

    1) μ(A_i △ P_i) < δ' · μ(P_i)   and   2) φ(A_i) ≤ (1 + 2δ') · φ(P_i) + δ',

where δ' = 8δ/10^4.

The Proof of the Main Theorem: Part 2) follows from Part 1). Indeed,

    μ(A_i) ≥ μ(P_i ∩ A_i) = μ(P_i) − μ(P_i \ A_i) ≥ μ(P_i) − μ(A_i △ P_i) > (1 − APR·δ'/k) · μ(P_i),

and |E(A_i, Ā_i)| ≤ |E(P_i, P̄_i)| + μ(A_i △ P_i), since every edge that is counted in |E(A_i, Ā_i)| but not in |E(P_i, P̄_i)| must have an endpoint in A_i △ P_i. Thus

    φ(A_i) = |E(A_i, Ā_i)| / μ(A_i) ≤ [|E(P_i, P̄_i)| + (APR·δ'/k)·μ(P_i)] / [(1 − APR·δ'/k)·μ(P_i)]
           ≤ (1 + APR·2δ'/k) · φ(P_i) + APR·2δ'/k.

The proof of Part 1) builds upon the following two Lemmas, which we prove in Section 7 and Section 8 respectively.

Lemma 1.5. Under the hypothesis of Theorem 1.3 the following holds. If for every permutation σ : [1 : k] → [1 : k] there exists an index i ∈ [1 : k] such that

    μ(A_i △ P_{σ(i)}) ≥ (APR·δ'/k) · μ(P_{σ(i)}),  where δ' = 8δ/10^4,

then it holds that

    Cost({A_i, c_i}_{i=1}^k) > (2k²/Ψ) · APR.

Lemma 1.6. If Ψ ≥ 4·k^{3/2} then there are vectors {p^(i)}_{i=1}^k such that

    OPT ≤ Cost({P_i, p^(i)}_{i=1}^k) ≤ ∑_{i=1}^k ∑_{u∈P_i} d_u ‖F(u) − p^(i)‖² ≤ (1 + 3k/Ψ) · k²/Ψ.

Substituting these bounds into (8) yields a contradiction, since

    (2k²/Ψ) · APR < Cost({A_i, c_i}_{i=1}^k) ≤ APR · OPT ≤ APR · Cost({P_i, p^(i)}_{i=1}^k) ≤ APR · (1 + 3k/Ψ) · k²/Ψ.

Therefore, there exists a permutation π (the identity after suitable renumbering of one of the partitions) such that μ(A_i △ P_i) < (APR·δ'/k) · μ(P_i) for all i ∈ [1 : k]. This completes the proof of Theorem 1.3.

2 Notations

We use the notation adopted by Peng et al. [4] and restate it below for completeness. Let L_G = I − D^{−1/2} A D^{−1/2} be the normalized Laplacian matrix, where D is the diagonal degree matrix and A is the adjacency matrix. We refer to the kth eigenvalue of L_G by λ_k ≜ λ_k(L_G). The (unit) eigenvector corresponding to λ_k is denoted by f_k.

Let g_i = D^{1/2} χ_{P_i} / ‖D^{1/2} χ_{P_i}‖, where χ_{P_i} is the characteristic vector of a subset P_i ⊆ V. Note that g_i is the normalized characteristic vector of P_i and that ‖D^{1/2} χ_{P_i}‖² = ∑_{v∈P_i} deg(v) = μ(P_i). We will write μ_i instead of μ(P_i). The Rayleigh quotient is defined by, and satisfies,

    R(g_i) ≜ (g_iᵀ L_G g_i) / (g_iᵀ g_i) = (1/μ(P_i)) · χ_{P_i}ᵀ L χ_{P_i} = |E(P_i, P̄_i)| / μ(P_i) = φ(P_i),

where L = D − A is the graph Laplacian matrix.

The eigenvectors {f_i}_{i=1}^n form an orthonormal basis of R^n. Thus each characteristic vector g_i can be expressed as g_i = ∑_{j=1}^n α_j^(i) f_j for all i ∈ [1 : k]. We define its projection onto the first k eigenvectors by f̂_i = ∑_{j=1}^k α_j^(i) f_j.

Peng et al. [4] showed that span({f̂_i}_{i=1}^k) = span({f_i}_{i=1}^k) if the gap parameter Υ is large enough. In Lemma 4.2 we demonstrate that a similar statement holds with the substituted gap parameter Ψ. This implies that each of the first k eigenvectors can be expressed as f_i = ∑_{j=1}^k β_j^(i) f̂_j. Moreover, Peng et al. [4] showed that each vector

    ĝ_i = ∑_{j=1}^k β_j^(i) g_j

approximates the eigenvector f_i for all i ∈ [1 : k], if Υ is large. We prove in Theorem 4.1 that it suffices to have a large gap parameter Ψ. In the proof of Lemma 1.6 we will use the vectors

    p^(i) = (1/√μ(P_i)) · (β_i^(1), ..., β_i^(k))ᵀ.     (11)

For any vertex u ∈ P_i, we have

    p^(i) = [(D^{−1/2} ĝ_1)(u), ..., (D^{−1/2} ĝ_k)(u)].     (12)

Indeed, for any h ∈ [1 : k],

    (D^{−1/2} ĝ_h)(u) = ∑_{1≤j≤k} β_j^(h) · (D^{−1/2} D^{1/2} χ_{P_j})(u) / √μ(P_j) = β_i^(h) / √μ(P_i).

Our analysis builds upon the following two matrices. Let F, B ∈ R^{k×k} be square matrices such that for all indices i, j ∈ [1 : k] we have

    F_{j,i} = α_j^(i)  and  B_{j,i} = β_j^(i).     (13)
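The identity R(g_i) = φ(P_i) above can be checked numerically. This sketch uses our own small example graph (two triangles joined by the edge {2, 3}) with P_1 = {0, 1, 2}, so φ(P_1) = 1/7.

```python
import numpy as np

# Build the normalized Laplacian of the two-triangle example graph
A = np.zeros((6, 6))
for u, v in [(0,1), (1,2), (0,2), (3,4), (4,5), (3,5), (2,3)]:
    A[u, v] = A[v, u] = 1
d = A.sum(axis=1)
L_norm = np.eye(6) - A / np.sqrt(np.outer(d, d))    # L_G = I - D^{-1/2} A D^{-1/2}

chi = np.zeros(6)
chi[[0, 1, 2]] = 1                                  # characteristic vector of P_1
g = np.sqrt(d) * chi
g /= np.linalg.norm(g)                              # g_1 = D^{1/2} chi / ||D^{1/2} chi||
rayleigh = g @ L_norm @ g                           # R(g_1); equals phi(P_1) = 1/7
```

Since g_1 is a unit vector, the Rayleigh quotient reduces to g_1ᵀ L_G g_1, and it reproduces the conductance exactly.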

Figure 1: The relation between the vectors f_i, f̂_i, ĝ_i and g_i. The vectors {f_i}_{i=1}^n are eigenvectors of the normalized Laplacian matrix L_G of a graph G satisfying Ψ ≥ 4·k^{3/2}. The vectors {g_i}_{i=1}^k are the normalized characteristic vectors of an optimal partition {P_i}_{i=1}^k. For each i ∈ [1 : k] the vector f̂_i is the projection of the vector g_i onto span(f_1, ..., f_k). By Lemma 4.3 the vectors f̂_i and g_i are close for i ∈ [1 : k]. By Lemma 4.2 it holds that span(f_1, ..., f_k) = span(f̂_1, ..., f̂_k) when Ψ ≥ 4·k^{3/2}, and thus we can write f_i = ∑_{j=1}^k β_j^(i) f̂_j. Moreover, by Theorem 4.1 the vectors f_i and ĝ_i = ∑_{j=1}^k β_j^(i) g_j are close for i ∈ [1 : k].

3 Structure of the Paper

In Section 4 we show that if Ψ ≥ 4·k^{3/2} then the vectors ĝ_i and f_i are close for all i ∈ [1 : k], i.e.,

    ‖f_i − ĝ_i‖² ≤ (1 + 3k/Ψ) · k/Ψ.

The proof follows [4], but our analysis depends on the less restrictive gap parameter Ψ. In contrast to [4], we exhibit in Section 5 key spectral properties of the matrices BᵀB and BBᵀ. More precisely, we show that they are close to the identity matrix in the following sense: if Ψ ≥ 10^4·k³/ε² and ε ∈ (0, 1), then for all distinct i, j ∈ [1 : k] it holds that

    1 − ε ≤ ⟨B_{i,:}, B_{i,:}⟩ ≤ 1 + ε  and  |⟨B_{i,:}, B_{j,:}⟩| ≤ √ε.     (14)

Peng et al. (c.f. [4, Lemma 4.3]) proved that the squared Euclidean distance between any two distinct estimation center vectors satisfies

    ‖p^(i) − p^(j)‖² ≥ [10³ · k · min{μ(P_i), μ(P_j)}]^{−1}.

In Section 6 we improve their result by a Θ(k)-factor. Our analysis depends on the less restrictive gap assumption Ψ ≥ 20^4·k³ and builds upon (14). We show in Lemma 6.2 that for all distinct i, j ∈ [1 : k] it holds that

    ‖p^(i) − p^(j)‖² ≥ [3 · min{μ(P_i), μ(P_j)}]^{−1}.

We prove Lemma 1.5 in Section 7 and Lemma 1.6 in Section 8. The analysis of these Lemmas builds upon the results from Sections 4 to 6, and it depends on the gap parameter Ψ.

4 The Vectors ĝ_i and f_i are Close

In this section we prove Theorem 4.1. We argue in a similar manner as in [4], but in terms of Ψ instead of Υ. For completeness, we show in Subsection 4.1 that the span of the first k eigenvectors equals the span of the projections of the characteristic vectors of the subsets P_i onto the first k eigenvectors. Then, in Subsection 4.2, by expressing the eigenvectors f_i in terms of the vectors f̂_j, we conclude the proof of Theorem 4.1.

Theorem 4.1. If Ψ ≥ 4·k^{3/2} then the vectors ĝ_i = ∑_{j=1}^k β_j^(i) g_j, i ∈ [1 : k], satisfy

    ‖f_i − ĝ_i‖² ≤ (1 + 3k/Ψ) · k/Ψ.

4.1 Analyzing the Columns of Matrix F

We prove in this subsection the following result, which depends on the gap parameter Ψ.

Lemma 4.2. If Ψ > k^{3/2} then span({f̂_i}_{i=1}^k) = span({f_i}_{i=1}^k), and thus each eigenvector can be expressed as f_i = ∑_{j=1}^k β_j^(i) · f̂_j for every i ∈ [1 : k].

To prove Lemma 4.2 we build upon the following result shown by Peng et al. [4].

Lemma 4.3. [4, Theorem 1.1, Part 1] For P_i ⊂ V let g_i = D^{1/2} χ_{P_i} / ‖D^{1/2} χ_{P_i}‖. Then for any i ∈ [1 : k] it holds that

    ‖g_i − f̂_i‖² = ∑_{j=k+1}^n (α_j^(i))² ≤ R(g_i)/λ_{k+1} = φ(P_i)/λ_{k+1}.

Based on the following two results we prove Lemma 4.2.

Lemma 4.4. For every i ∈ [1 : k] and p ≠ q ∈ [1 : k] it holds that

    1 − φ(P_i)/λ_{k+1} ≤ ‖f̂_i‖² = ‖α^(i)‖² ≤ 1   and   |⟨f̂_p, f̂_q⟩| = |⟨α^(p), α^(q)⟩| ≤ √(φ(P_p)·φ(P_q)) / λ_{k+1}.

Proof. The first part follows from Lemma 4.3 and the chain of inequalities

    1 − φ(P_i)/λ_{k+1} ≤ 1 − ∑_{j=k+1}^n (α_j^(i))² = ∑_{j=1}^k (α_j^(i))² = ‖f̂_i‖² ≤ ∑_{j=1}^n (α_j^(i))² = 1.

We show now the second part. Since {f_i}_{i=1}^n are orthonormal eigenvectors, we have for all p ≠ q that

    ⟨f_p, f_q⟩ = ∑_{l=1}^n α_l^(p) · α_l^(q) = 0.     (15)

We combine (15) and Cauchy–Schwarz to obtain

    |⟨f̂_p, f̂_q⟩| = |∑_{l=1}^k α_l^(p) · α_l^(q)| = |∑_{l=k+1}^n α_l^(p) · α_l^(q)|
                ≤ √(∑_{l=k+1}^n (α_l^(p))²) · √(∑_{l=k+1}^n (α_l^(q))²) ≤ √(φ(P_p)·φ(P_q)) / λ_{k+1}.  □

Lemma 4.5. If Ψ > k^{3/2} then the columns {F_{:,i}}_{i=1}^k are linearly independent.

Proof. We show that the columns of the matrix F are almost orthonormal. Consider the symmetric matrix FᵀF. It is known that ker(FᵀF) = ker(F) and that all eigenvalues of FᵀF are real. We proceed by showing that the smallest eigenvalue λ_min(FᵀF) > 0. This implies ker(F) = {0} and hence yields the statement. By combining the Gershgorin circle theorem, Lemma 4.4 and Cauchy–Schwarz, it holds that

    λ_min(FᵀF) ≥ min_{i∈[1:k]} { (FᵀF)_{ii} − ∑_{j≠i} |(FᵀF)_{ij}| } = min_{i∈[1:k]} { ‖α^(i)‖² − ∑_{j≠i} |⟨α^(i), α^(j)⟩| }
              ≥ 1 − √(φ(P_{i⋆})/λ_{k+1}) · ∑_{j=1}^k √(φ(P_j)/λ_{k+1})
              ≥ 1 − √(k/Ψ) · √k · √(∑_{j=1}^k φ(P_j)/λ_{k+1}) = 1 − k^{3/2}/Ψ > 0,

where i⋆ ∈ [1 : k] is the index that minimizes the expression above, and we used that ∑_{j=1}^k φ(P_j) = k·ρ̂_avr(k), so that ∑_{j=1}^k φ(P_j)/λ_{k+1} = k/Ψ.  □

We present now the proof of Lemma 4.2.

Proof of Lemma 4.2. Let λ be an arbitrary non-zero vector. Notice that

    ∑_{i=1}^k λ_i · f̂_i = ∑_{i=1}^k λ_i ∑_{j=1}^k α_j^(i) f_j = ∑_{j=1}^k (∑_{i=1}^k λ_i α_j^(i)) f_j = ∑_{j=1}^k γ_j f_j,  where γ_j = ⟨F_{j,:}, λ⟩.     (16)

By Lemma 4.5 the columns {F_{:,i}}_{i=1}^k are linearly independent, and since γ = Fλ, at least one component γ_j ≠ 0. Therefore the vectors {f̂_i}_{i=1}^k are linearly independent, and since each f̂_i lies in span({f_i}_{i=1}^k), it follows that span({f̂_i}_{i=1}^k) = span({f_i}_{i=1}^k).  □

4.2 Analyzing Eigenvectors f_i in Terms of f̂_j

To prove Theorem 4.1 we establish the following result.

Lemma 4.6. If Ψ > k^{3/2} then for every i ∈ [1 : k] it holds that

    (1 + 2k/Ψ)^{−1} ≤ ∑_{j=1}^k (β_j^(i))² ≤ (1 − 2k/Ψ)^{−1}.

Proof. We show the upper bound. By Lemma 4.2, f_i = ∑_{j=1}^k β_j^(i) f̂_j for all i ∈ [1 : k], and thus

    1 = ‖f_i‖² = ⟨∑_{a=1}^k β_a^(i) f̂_a, ∑_{b=1}^k β_b^(i) f̂_b⟩ = ∑_{j=1}^k (β_j^(i))² ‖f̂_j‖² + ∑_{a=1}^k ∑_{b≠a} β_a^(i) β_b^(i) ⟨f̂_a, f̂_b⟩ ≥ (1 − 2k/Ψ) · ∑_{j=1}^k (β_j^(i))².     (⋆)

To prove the inequality (⋆) we consider the two terms separately.

By Lemma 4.4, ‖f̂_j‖² ≥ 1 − φ(P_j)/λ_{k+1}. We then apply ∑_i a_i b_i ≤ (∑_i a_i)(∑_i b_i), valid for all non-negative vectors a, b, and obtain

    ∑_{j=1}^k (β_j^(i))² (1 − φ(P_j)/λ_{k+1}) ≥ ∑_{j=1}^k (β_j^(i))² − (∑_{j=1}^k (β_j^(i))²)(∑_{j=1}^k φ(P_j)/λ_{k+1}) = (1 − k/Ψ) · ∑_{j=1}^k (β_j^(i))².

Again by Lemma 4.4 we have |⟨f̂_a, f̂_b⟩| ≤ √(φ(P_a)·φ(P_b))/λ_{k+1}, and by Cauchy–Schwarz it holds that

    ∑_{a=1}^k ∑_{b≠a} β_a^(i) β_b^(i) ⟨f̂_a, f̂_b⟩ ≥ − ∑_{a=1}^k ∑_{b≠a} |β_a^(i)| · |β_b^(i)| · |⟨f̂_a, f̂_b⟩|
      ≥ − (1/λ_{k+1}) ∑_{a=1}^k ∑_{b≠a} |β_a^(i)| √φ(P_a) · |β_b^(i)| √φ(P_b)
      ≥ − (1/λ_{k+1}) (∑_{j=1}^k |β_j^(i)| √φ(P_j))² ≥ − (k/Ψ) · ∑_{j=1}^k (β_j^(i))².

The lower bound follows by analogous arguments.  □

We are now ready to prove Theorem 4.1.

Proof of Theorem 4.1. By Lemma 4.2 we have f_i = ∑_{j=1}^k β_j^(i) f̂_j, and recall that ĝ_i = ∑_{j=1}^k β_j^(i) g_j for all i ∈ [1 : k]. We combine the triangle inequality, Cauchy–Schwarz, Lemma 4.3 and Lemma 4.6 to obtain

    ‖f_i − ĝ_i‖² = ‖∑_{j=1}^k β_j^(i) (f̂_j − g_j)‖² ≤ (∑_{j=1}^k |β_j^(i)| · ‖f̂_j − g_j‖)²
      ≤ (∑_{j=1}^k (β_j^(i))²) · (∑_{j=1}^k ‖f̂_j − g_j‖²) ≤ (1 − 2k/Ψ)^{−1} · (1/λ_{k+1}) ∑_{j=1}^k φ(P_j)
      = (1 − 2k/Ψ)^{−1} · k/Ψ ≤ (1 + 3k/Ψ) · k/Ψ,

where the last inequality uses Ψ ≥ 4·k.  □
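The chain of objects g_i → f̂_i → β → ĝ_i can be instantiated numerically. This sketch uses our own 6-vertex two-triangle example with k = 2; the gap hypothesis is only marginally met on such a tiny graph, so the computed errors are merely illustrative, not tight.

```python
import numpy as np

# Build the normalized Laplacian of the two-triangle example graph
A = np.zeros((6, 6))
for u, v in [(0,1), (1,2), (0,2), (3,4), (4,5), (3,5), (2,3)]:
    A[u, v] = A[v, u] = 1
d = A.sum(axis=1)
L = np.eye(6) - A / np.sqrt(np.outer(d, d))
_, f = np.linalg.eigh(L)                        # columns f_1, ..., f_n

k = 2
G = np.zeros((6, k))                            # columns g_1, g_2
for i, P in enumerate([[0, 1, 2], [3, 4, 5]]):
    chi = np.zeros(6)
    chi[P] = 1
    v = np.sqrt(d) * chi
    G[:, i] = v / np.linalg.norm(v)

Fm = f[:, :k].T @ G                             # F_{j,i} = alpha_j^(i) = <f_j, g_i>
B = np.linalg.solve(Fm, np.eye(k))              # columns beta^(i): f_i = sum_j beta_j^(i) fhat_j
ghat = G @ B                                    # columns ghat_i = sum_j beta_j^(i) g_j
err = np.linalg.norm(f[:, :k] - ghat, axis=0)   # ||f_i - ghat_i||, small under the gap
```

Because B is computed as the exact inverse of F, the identity f_i = ∑_j β_j^(i) f̂_j holds by construction, and err quantifies how well ĝ_i tracks f_i.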

5 Spectral Properties of Matrix B

In this section we bound the inner product of any two rows of the matrix B (c.f. Equation (13)).

Theorem 5.1. If Ψ ≥ 10^4·k³/ε² and ε ∈ (0, 1) then for all distinct i, j ∈ [1 : k] it holds that

    1 − ε ≤ ⟨B_{i,:}, B_{i,:}⟩ ≤ 1 + ε  and  |⟨B_{i,:}, B_{j,:}⟩| ≤ √ε.

The proof is divided into two parts. We show in Lemma 5.4 that 1 − ε ≤ ⟨B_{i,:}, B_{i,:}⟩ ≤ 1 + ε, and we establish the second statement, |⟨B_{i,:}, B_{j,:}⟩| ≤ √ε, in Lemma 5.5.

5.1 Analyzing the Column Space of Matrix B

We show below that the matrix BᵀB is close to the identity matrix.

Lemma 5.2 (Columns). If Ψ ≥ 4·k^{3/2} then for all distinct i, j ∈ [1 : k] it holds that

    1 − 3k/Ψ ≤ ⟨B_{:,i}, B_{:,i}⟩ ≤ 1 + 3k/Ψ  and  |⟨B_{:,i}, B_{:,j}⟩| ≤ 4·√(k/Ψ).

Proof. By Lemma 4.6 it holds that

    1 − 3k/Ψ ≤ ⟨B_{:,i}, B_{:,i}⟩ = ∑_{j=1}^k (β_j^(i))² ≤ 1 + 3k/Ψ.

Recall that ĝ_i = ∑_{j=1}^k β_j^(i) · g_j. Moreover, since the eigenvectors {f_i}_{i=1}^k and the characteristic vectors {g_i}_{i=1}^k are orthonormal, by combining Cauchy–Schwarz and Theorem 4.1 it holds that

    |⟨B_{:,i}, B_{:,j}⟩| = |⟨∑_{a=1}^k β_a^(i) g_a, ∑_{b=1}^k β_b^(j) g_b⟩| = |⟨ĝ_i, ĝ_j⟩|
      = |⟨(ĝ_i − f_i) + f_i, (ĝ_j − f_j) + f_j⟩|
      = |⟨ĝ_i − f_i, ĝ_j − f_j⟩ + ⟨ĝ_i − f_i, f_j⟩ + ⟨f_i, ĝ_j − f_j⟩|
      ≤ ‖ĝ_i − f_i‖ · ‖ĝ_j − f_j‖ + ‖ĝ_i − f_i‖ + ‖ĝ_j − f_j‖
      ≤ (1 + 3k/Ψ) · (k/Ψ) + 2·√((1 + 3k/Ψ) · k/Ψ) ≤ 4·√(k/Ψ).  □

Using a stronger gap assumption, we show that the columns of the matrix B are linearly independent.

Lemma 5.3. If Ψ ≥ 25·k³ then the columns {B_{:,i}}_{i=1}^k are linearly independent.

Proof. Since ker(B) = ker(BᵀB) and BᵀB is an SPSD¹ matrix, it suffices to show that its smallest eigenvalue

    λ(BᵀB) = min_{x≠0} (xᵀBᵀBx)/(xᵀx) > 0.

By Lemma 5.2,

    ∑_{i=1}^k ∑_{j≠i} |x_i| |x_j| |⟨β^(i), β^(j)⟩| ≤ 4·√(k/Ψ) · (∑_{i=1}^k |x_i|)² ≤ ‖x‖² · 4k·√(k/Ψ),

and

    xᵀBᵀBx = ⟨∑_{i=1}^k x_i β^(i), ∑_{j=1}^k x_j β^(j)⟩ = ∑_{i=1}^k x_i² ‖β^(i)‖² + ∑_{i=1}^k ∑_{j≠i} x_i x_j ⟨β^(i), β^(j)⟩
      ≥ (1 − 3k/Ψ) ‖x‖² − ∑_{i=1}^k ∑_{j≠i} |x_i| |x_j| |⟨β^(i), β^(j)⟩| ≥ (1 − 5k·√(k/Ψ)) · ‖x‖².

Therefore λ(BᵀB) > 0, and the statement follows.  □

¹ We denote by SPSD the class of symmetric positive semi-definite matrices.

5.2 Analyzing the Row Space of Matrix B

In this subsection we show that the matrix BBᵀ is close to the identity matrix. We first bound the squared Euclidean norms of the rows of B, i.e., the diagonal entries of BBᵀ.

Lemma 5.4 (Rows). If Ψ ≥ 400·k³/ε² and ε ∈ (0, 1) then for all i ∈ [1 : k] it holds that

    1 − ε ≤ ⟨B_{i,:}, B_{i,:}⟩ ≤ 1 + ε.

Proof. We show that the eigenvalues of the matrix BBᵀ are concentrated around 1. This implies that χ_iᵀ BBᵀ χ_i = ⟨B_{i,:}, B_{i,:}⟩ ∈ [1 ± ε], where χ_i is the ith characteristic vector. By Lemma 5.2 we have

    1 − 23k²/Ψ ≤ (1 − 3k/Ψ)² ≤ (β^(i))ᵀ · BBᵀ · β^(i) = ‖β^(i)‖⁴ + ∑_{j≠i} ⟨β^(j), β^(i)⟩² ≤ (1 + 3k/Ψ)² + 16k²/Ψ ≤ 1 + 23k²/Ψ,

and for all distinct i, j,

    |(β^(i))ᵀ · BBᵀ · β^(j)| = |∑_{l=1}^k ⟨β^(l), β^(i)⟩ · ⟨β^(l), β^(j)⟩| ≤ 8·(1 + 3k/Ψ)·√(k/Ψ) + 16k·(k/Ψ) ≤ 11·√(k/Ψ).

By Lemma 5.3, every vector x ∈ R^k can be expressed as x = ∑_{i=1}^k γ_i β^(i). Then

    xᵀ BBᵀ x = ∑_{i=1}^k γ_i² (β^(i))ᵀ·BBᵀ·β^(i) + ∑_{i=1}^k ∑_{j≠i} γ_i γ_j (β^(i))ᵀ·BBᵀ·β^(j)
      ≥ (1 − 23k²/Ψ − 11k·√(k/Ψ)) · ‖γ‖² ≥ (1 − 14k·√(k/Ψ)) · ‖γ‖²,

and the analogous upper bound holds. Moreover, by Lemma 5.2 we have ∑_{i=1}^k ∑_{j≠i} |γ_i γ_j| · |⟨β^(i), β^(j)⟩| ≤ ‖γ‖² · 4k·√(k/Ψ) and ‖β^(i)‖² ≤ 1 + 3k/Ψ. Thus it holds that

    (1 − 5k·√(k/Ψ)) · ‖γ‖² ≤ xᵀx ≤ (1 + 5k·√(k/Ψ)) · ‖γ‖².

Therefore

    1 − 20k·√(k/Ψ) ≤ λ(BBᵀ) ≤ 1 + 20k·√(k/Ψ),

and since Ψ ≥ 400·k³/ε² we have 20k·√(k/Ψ) ≤ ε, which yields the statement.  □

We have now established the first part of Theorem 5.1. We turn to the second part and restate it in the following Lemma.

Lemma 5.5 (Rows). If Ψ ≥ 10^4·k³/ε² and ε ∈ (0, 1) then for all distinct i, j ∈ [1 : k] it holds that

    |⟨B_{i,:}, B_{j,:}⟩| ≤ √ε.

To prove Lemma 5.5 we establish the following three Lemmas. Before stating them we introduce some notation that is inspired by Lemma 5.2.

Definition 5.6. Let BᵀB = I + E, where E is a symmetric matrix with |E_{ij}| ≤ 4·√(k/Ψ). Then we have

    (BBᵀ)² = B(I + E)Bᵀ = BBᵀ + BEBᵀ.

Lemma 5.7. If Ψ ≥ 40²·k³/ε² and ε ∈ (0, 1) then all eigenvalues of the matrix BEBᵀ satisfy |λ(BEBᵀ)| ≤ ε/5.

Proof. Let z = Bᵀx. We upper bound the quadratic form

    |xᵀ BEBᵀ x| = |zᵀEz| ≤ ∑_{ij} |E_{ij}| |z_i| |z_j| ≤ 4·√(k/Ψ) · (∑_{i=1}^k |z_i|)² ≤ ‖z‖² · 4k·√(k/Ψ).

By Lemma 5.4 we have 1 − ε ≤ λ(BBᵀ) ≤ 1 + ε, and since ‖z‖² = (xᵀBBᵀx / xᵀx)·‖x‖², it follows that

    ‖z‖²/(1 + ε) ≤ ‖x‖² ≤ ‖z‖²/(1 − ε),

and hence

    |λ(BEBᵀ)| ≤ max_{x≠0} |xᵀBEBᵀx| / xᵀx ≤ 4(1 + ε)·k·√(k/Ψ) ≤ ε/5.  □

Lemma 5.8. Suppose {u_i}_{i=1}^k is an orthonormal basis and the square matrix U has u_i as its ith column. Then UᵀU = I = UUᵀ.

Proof. By the definition of U it holds that UᵀU = I. Moreover, the matrix U^{−1} exists and thus Uᵀ = U^{−1}. Therefore we have UUᵀ = I, as claimed.  □

Lemma 5.9. If Ψ ≥ 40²·k³/ε² and ε ∈ (0, 1) then it holds that |(BEBᵀ)_{ij}| ≤ ε/5 for every i, j ∈ [1 : k].

Proof. Notice that BEBᵀ is a symmetric matrix, since E is symmetric. By the spectral theorem there is an orthonormal basis {u_l}_{l=1}^k such that BEBᵀ = ∑_{l=1}^k λ_l(BEBᵀ) · u_l u_lᵀ. Thus, it suffices to bound the expression

    |(BEBᵀ)_{ij}| ≤ ∑_{l=1}^k |λ_l(BEBᵀ)| · |(u_l u_lᵀ)_{ij}|.

By Lemma 5.8 and Cauchy–Schwarz we have

    ∑_{l=1}^k |(u_l)_i| · |(u_l)_j| ≤ ‖U_{i,:}‖ · ‖U_{j,:}‖ = 1.

We now apply Lemma 5.7 to obtain

    ∑_{l=1}^k |λ_l(BEBᵀ)| · |(u_l u_lᵀ)_{ij}| ≤ (ε/5) · ∑_{l=1}^k |(u_l)_i| · |(u_l)_j| ≤ ε/5.  □

We are now ready to prove Lemma 5.5, i.e., |⟨B_{i,:}, B_{j,:}⟩| ≤ √ε for all i ≠ j.

Proof of Lemma 5.5. By Definition 5.6 we have (BBᵀ)² = BBᵀ + BEBᵀ. Observe that the (i, j)th entry of the matrix BBᵀ is equal to the inner product between the ith and jth rows of B, i.e., (BBᵀ)_{ij} = ⟨B_{i,:}, B_{j,:}⟩. Moreover, we have

    [(BBᵀ)²]_{ij} = ∑_{l=1}^k (BBᵀ)_{i,l} (BBᵀ)_{l,j} = ∑_{l=1}^k ⟨B_{i,:}, B_{l,:}⟩ ⟨B_{l,:}, B_{j,:}⟩.

For the entries on the main diagonal it holds that

    ⟨B_{i,:}, B_{i,:}⟩² + ∑_{l≠i} ⟨B_{i,:}, B_{l,:}⟩² = [(BBᵀ)²]_{ii} = [BBᵀ + BEBᵀ]_{ii} = ⟨B_{i,:}, B_{i,:}⟩ + (BEBᵀ)_{ii},

and hence, by applying Lemma 5.4 with ε' = ε/5 and Lemma 5.9 with ε' = ε, we obtain

    ⟨B_{i,:}, B_{j,:}⟩² ≤ ∑_{l≠i} ⟨B_{i,:}, B_{l,:}⟩² ≤ (1 + ε/5) + ε/5 − (1 − ε/5)² ≤ ε.  □



6 The Vectors p^(i) are Well-Spread

Peng et al. (c.f. [4, Lemma 4.3]) showed, for Υ ≥ Ω(k³), that the squared Euclidean distance between any two distinct estimation center vectors (c.f. Equation (11)) is lower bounded by

    ‖p^(i) − p^(j)‖² ≥ [10³ · k · min{μ(P_i), μ(P_j)}]^{−1}.

Under the less restrictive gap assumption Ψ ≥ Ω(k³) we improve [4, Lemma 4.3] by a Θ(k)-factor. Our analysis builds upon Theorem 5.1 and bounds a summation of k terms, instead of applying [4, Lemma 4.2] to a single component. We show first a statement similar to [4, Lemma 4.2] that depends on Ψ.

Lemma 6.1. [4, Lemma 4.2] If Ψ = 20^4·k³/δ for some δ ∈ (0, 1] then for every i ∈ [1 : k] it holds that

    ‖p^(i)‖² ∈ (1/μ(P_i)) · [1 ± √δ/4].

Proof. By definition p^(i) = μ(P_i)^{−1/2} · B_{i,:}, and by Theorem 5.1 we have ‖B_{i,:}‖² ∈ [1 ± √δ/4].  □

We present now our statement.

Lemma 6.2. If Ψ = 20^4·k³/δ for some δ ∈ (0, 1/2] then for any distinct i, j ∈ [1 : k] it holds that

    ‖p^(i) − p^(j)‖² ≥ [2 · min{μ(P_i), μ(P_j)}]^{−1}.

Suppose c_i is the center of a cluster A_i. If ‖c_i − p^(i₁)‖ ≥ ‖c_i − p^(i₂)‖ then it holds that

    ‖c_i − p^(i₁)‖² ≥ (1/4)·‖p^(i₁) − p^(i₂)‖² ≥ [8 · min{μ(P_{i₁}), μ(P_{i₂})}]^{−1}.

Proof. We argue in a similar manner as in [4], but in contrast apply Theorem 5.1 with ε = √δ/4 to obtain

    |⟨p^(i)/‖p^(i)‖, p^(j)/‖p^(j)‖⟩| = |⟨B_{i,:}, B_{j,:}⟩| / (‖B_{i,:}‖·‖B_{j,:}‖) ≤ √ε/(1 − ε) ≤ (2/3)·δ^{1/4}.

W.l.o.g. assume that ‖p^(i)‖ ≥ ‖p^(j)‖. Then by Lemma 6.1 we have

    ‖p^(i)‖² ≥ (1 − √δ/4) · [min{μ(P_i), μ(P_j)}]^{−1}.

Let ‖p^(j)‖ = α·‖p^(i)‖ for some α ∈ (0, 1]. Then

    ‖p^(i) − p^(j)‖² = ‖p^(i)‖² + ‖p^(j)‖² − 2·‖p^(i)‖·‖p^(j)‖·⟨p^(i)/‖p^(i)‖, p^(j)/‖p^(j)‖⟩
      ≥ (α² − (4/3)·δ^{1/4}·α + 1) · ‖p^(i)‖² ≥ [2 · min{μ(P_i), μ(P_j)}]^{−1}.

The second claim follows immediately from the first.  □

7 The Proof of Lemma 1.5

Our main result in this section improves [4, Lemma 4.5] by a Θ(k)-factor. We argue in a similar manner as in [4], but in contrast our result relies on Lemma 6.2 and the gap parameter Ψ. We begin our discussion by restating [4, Lemma 4.6], whose analysis crucially relies on the function σ defined by

    σ(l) = arg max_{j∈[1:k]} μ(A_l ∩ P_j)/μ(P_j).     (17)

Lemma 7.1. [4, Lemma 4.6] Let (P_1, ..., P_k) and (A_1, ..., A_k) be partitions of the vector set. Suppose that for every permutation π : [1 : k] → [1 : k] there is an index i ∈ [1 : k] such that

    μ(A_i △ P_{π(i)}) ≥ 2ε · μ(P_{π(i)}),     (18)

where ε ∈ (0, 1/2) is a parameter. Then one of the following three statements holds:

1. If σ is a permutation and μ(P_{σ(i)} \ A_i) ≥ ε · μ(P_{σ(i)}), then for every index j ≠ i there is a real ε_j ≥ 0 such that

    μ(A_j ∩ P_{σ(j)}) ≥ μ(A_j ∩ P_{σ(i)}) ≥ ε_j · min{μ(P_{σ(j)}), μ(P_{σ(i)})},

and ∑_{j≠i} ε_j ≥ ε.

2. If σ is a permutation and μ(A_i \ P_{σ(i)}) ≥ ε · μ(P_{σ(i)}), then for every j ≠ i there is a real ε_j ≥ 0 such that

    μ(A_i ∩ P_{σ(i)}) ≥ ε_j · μ(P_{σ(i)})  and  μ(A_i ∩ P_{σ(j)}) ≥ ε_j · μ(P_{σ(i)}),

and ∑_{j≠i} ε_j ≥ ε.

3. If σ is not a permutation, then there is an index ℓ ∉ {σ(1), ..., σ(k)}, and for every index j there is a real ε_j ≥ 0 such that

    μ(A_j ∩ P_{σ(j)}) ≥ μ(A_j ∩ P_ℓ) ≥ ε_j · min{μ(P_{σ(j)}), μ(P_ℓ)},

and ∑_{j=1}^k ε_j = 1.

We prove now our main technical result, which yields a lower bound improved by a Θ(k)-factor.

Lemma 7.2. Suppose the hypothesis of Lemma 7.1 is satisfied and Ψ = 20^4·k³/δ for some δ ∈ (0, 1/2]. Then it holds that

    Cost({A_i, c_i}_{i=1}^k) ≥ ε/16 − 2k²/Ψ.

Proof. By definition,

    Cost({A_i, c_i}_{i=1}^k) = ∑_{i=1}^k ∑_{j=1}^k ∑_{u∈A_i∩P_j} d_u ‖F(u) − c_i‖² ≜ Λ.     (19)

Since for all vectors x, y, z ∈ R^k it holds that

    2·(‖x − y‖² + ‖z − y‖²) ≥ (‖x − y‖ + ‖z − y‖)² ≥ ‖x − z‖²,

we have for all indices i, j ∈ [1 : k] that

    ‖F(u) − c_i‖² ≥ (1/2)·‖p^(j) − c_i‖² − ‖F(u) − p^(j)‖².     (20)

Our proof proceeds by considering three cases. Let i ∈ [1 : k] be the index from the hypothesis of Lemma 7.1.

Case 1. Suppose the first conclusion of Lemma 7.1 holds. For every index j ≠ i let p^{γ(j)} = p^{σ(j)} if ‖p^{σ(j)} − c_j‖ ≥ ‖p^{σ(i)} − c_j‖, and p^{γ(j)} = p^{σ(i)} otherwise. Then, by combining (20), Lemma 6.2 and Lemma 1.6, we have

    Λ ≥ (1/2) ∑_{j≠i} ∑_{u∈A_j∩P_{γ(j)}} d_u ‖p^{γ(j)} − c_j‖² − ∑_{j≠i} ∑_{u∈A_j∩P_{γ(j)}} d_u ‖F(u) − p^{γ(j)}‖²
      ≥ (1/16) ∑_{j≠i} μ(A_j ∩ P_{γ(j)}) / min{μ(P_{σ(i)}), μ(P_{σ(j)})} − (1 + 3k/Ψ)·k²/Ψ ≥ ε/16 − 2k²/Ψ.

Case 2. Suppose the second conclusion of Lemma 7.1 holds. Notice that if μ(A_i ∩ P_{σ(i)}) ≤ (1 − ε)·μ(P_{σ(i)}) then μ(P_{σ(i)} \ A_i) ≥ ε·μ(P_{σ(i)}), and thus we can argue as in Case 1. Hence we can assume that

    μ(A_i ∩ P_{σ(i)}) ≥ (1 − ε)·μ(P_{σ(i)}).     (21)

We proceed by analyzing two subcases.

a) If ‖p^{σ(j)} − c_i‖ ≥ ‖p^{σ(i)} − c_i‖ holds for all j ≠ i, then by combining (20), Lemma 6.2 and Lemma 1.6 it follows that

    Λ ≥ (1/2) ∑_{j≠i} ∑_{u∈A_i∩P_{σ(j)}} d_u ‖p^{σ(j)} − c_i‖² − ∑_{j≠i} ∑_{u∈A_i∩P_{σ(j)}} d_u ‖F(u) − p^{σ(j)}‖²
      ≥ (1/16) ∑_{j≠i} μ(A_i ∩ P_{σ(j)}) / min{μ(P_{σ(i)}), μ(P_{σ(j)})} − (1 + 3k/Ψ)·k²/Ψ ≥ ε/16 − 2k²/Ψ.

b) Suppose there is an index j ≠ i such that ‖p^{σ(j)} − c_i‖ < ‖p^{σ(i)} − c_i‖. Then by the triangle inequality combined with Lemma 6.2 we have

    ‖p^{σ(i)} − c_i‖² ≥ (1/4)·‖p^{σ(i)} − p^{σ(j)}‖² ≥ [8 · min{μ(P_{σ(i)}), μ(P_{σ(j)})}]^{−1}.

Thus, by combining (20), (21) and Lemma 1.6 we obtain

    Λ ≥ (1/2) ∑_{u∈A_i∩P_{σ(i)}} d_u ‖p^{σ(i)} − c_i‖² − ∑_{u∈A_i∩P_{σ(i)}} d_u ‖F(u) − p^{σ(i)}‖²
      ≥ (1/16) · μ(A_i ∩ P_{σ(i)}) / min{μ(P_{σ(i)}), μ(P_{σ(j)})} − (1 + 3k/Ψ)·k²/Ψ ≥ (1 − ε)/16 − 2k²/Ψ ≥ ε/16 − 2k²/Ψ.

Case 3. Suppose the third conclusion of Lemma 7.1 holds, i.e., σ is not a permutation. Then there is an index ℓ ∈ [1 : k] \ {σ(1), ..., σ(k)}, and for every index j ∈ [1 : k] we let p^{γ(j)} = p^ℓ if ‖p^ℓ − c_j‖ ≥ ‖p^{σ(j)} − c_j‖, and p^{γ(j)} = p^{σ(j)} otherwise. By combining (20), Lemma 6.2 and Lemma 1.6 it follows that

    Λ ≥ (1/2) ∑_{j=1}^k ∑_{u∈A_j∩P_{γ(j)}} d_u ‖p^{γ(j)} − c_j‖² − ∑_{j=1}^k ∑_{u∈A_j∩P_{γ(j)}} d_u ‖F(u) − p^{γ(j)}‖²
      ≥ (1/16) ∑_{j=1}^k μ(A_j ∩ P_{γ(j)}) / min{μ(P_{σ(j)}), μ(P_ℓ)} − (1 + 3k/Ψ)·k²/Ψ ≥ 1/16 − 2k²/Ψ.  □

Based on Lemma 7.2 we improve [4, Lemma 4.5] by a Θ(k)-factor and condition our analysis on a less restrictive gap assumption that depends on Ψ.

Corollary 7.3. Let (P_1, ..., P_k) and (A_1, ..., A_k) be partitions of the vector set. Suppose that for every permutation π : [1 : k] → [1 : k] there is an index i ∈ [1 : k] such that

    μ(A_i △ P_{π(i)}) ≥ (2ε/k) · μ(P_{π(i)}),     (22)

where ε ∈ (0, 1) is a parameter. If Ψ = 20^4·k³/δ for some δ ∈ (0, 1/2], and ε ≥ 64·APR·k³/Ψ, then

    Cost({A_i, c_i}_{i=1}^k) > (2k²/Ψ) · APR.

Proof. We apply Lemma 7.1 with ε' = ε/k. Then by Lemma 7.2 we have

    Cost({A_i, c_i}_{i=1}^k) ≥ ε/(16k) − 2k²/Ψ,

and the desired result follows, since ε ≥ 64·APR·k³/Ψ.  □

We note that Lemma 1.5 follows directly by applying Corollary 7.3 with ε = 64·APR·k³/Ψ.

8 The Proof of Lemma 1.6

By Theorem 4.1 we have ‖f_j − ĝ_j‖² ≤ (1 + 3k/Ψ) · k/Ψ for every j ∈ [1 : k], and thus

    ∑_{i=1}^k ∑_{u∈P_i} d_u ‖F(u) − p^(i)‖² = ∑_{i=1}^k ∑_{j=1}^k ∑_{u∈P_i} d_u (F(u)_j − p_j^(i))²
      = ∑_{j=1}^k ∑_{i=1}^k ∑_{u∈P_i} (f_j(u) − ĝ_j(u))² = ∑_{j=1}^k ‖f_j − ĝ_j‖² ≤ (1 + 3k/Ψ) · k²/Ψ.

Moreover,

    OPT ≤ ∑_{i=1}^k ∑_{u∈P_i} d_u ‖F(u) − c_i^⋆‖² ≤ ∑_{i=1}^k ∑_{u∈P_i} d_u ‖F(u) − p^(i)‖²,

where the k-way partition (P_1, ..., P_k) achieving ρ̂_avr(k) has corresponding centers c_1^⋆, ..., c_k^⋆. This proves Lemma 1.6.

where the k-way partition (P1 , . . . , Pk ) achieving ρbavr (k) has corresponding centers c⋆1 , . . . , c⋆k .

9 k-means Algorithms

In this Section we prove Theorem 1.4. Our improved analysis (c.f. Theorem 1.3) strengthens the approximation guarantees in Theorem 1.2 by a Θ(k)-factor. This allows us to execute any k-means clustering algorithm with approximation ratio APR = Θ(k) and obtain as a result a k-way partition of V with the same approximation guarantees as in Theorem 1.2. Moreover, by Theorem 1.1 we can run Ostrovsky et al.'s [2] PTAS for k-means clustering in time O(nk). This speeds up the algorithm in [4] by a factor of O(2^k) and preserves the approximation guarantees in Theorem 1.2. For the instances considered in the statement of Theorem 1.4 it suffices to show that

    △_k(X_V) ≤ ε² · △_{k−1}(X_V)     (23)

holds for ε = 6·10^{−7}. We prove (23) by establishing a lower bound on △_{k−1}(X_V).
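The overall pipeline analyzed in this paper — spectral embedding followed by k-means on degree-weighted points — can be sketched end to end as follows. This is a minimal illustrative implementation using plain Lloyd iterations with deterministic farthest-point seeding, not the PTAS of Ostrovsky et al. [2] that Theorem 1.4 requires; the two-triangle example graph is our own.

```python
import numpy as np

def spectral_kmeans(A, k, iters=50):
    """Spectral clustering sketch: embed via the first k eigenvectors of
    L_G = I - D^{-1/2} A D^{-1/2}, weight vertex u by d_u, then run Lloyd-style
    k-means with farthest-point initialization (a stand-in for the approximate
    k-means routines analyzed in the paper)."""
    d = A.sum(axis=1).astype(float)
    L = np.eye(len(d)) - A / np.sqrt(np.outer(d, d))
    _, vecs = np.linalg.eigh(L)
    F = vecs[:, :k] / np.sqrt(d)[:, None]         # spectral embedding F(u)
    centers = F[[0]]                              # deterministic farthest-point init
    while len(centers) < k:
        dist = ((F[:, None, :] - centers[None]) ** 2).sum(-1).min(1)
        centers = np.vstack([centers, F[dist.argmax()]])
    for _ in range(iters):                        # Lloyd iterations, d_u-weighted
        labels = ((F[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for i in range(k):
            w = d[labels == i]
            if w.size:
                centers[i] = (w[:, None] * F[labels == i]).sum(0) / w.sum()
    return labels

# Two triangles joined by one edge: the planted 2-way partition is recovered
A = np.zeros((6, 6))
for u, v in [(0,1), (1,2), (0,2), (3,4), (4,5), (3,5), (2,3)]:
    A[u, v] = A[v, u] = 1
labels = spectral_kmeans(A, 2)
```

On this toy instance the two triangles form tight, well-separated point clouds in the embedding, so even this simple k-means variant recovers the planted partition.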

Lemma 9.1. Suppose Ψ = 20^4 · k³/δ for some δ ∈ (0, 1/2]. Then for δ′ = 2δ/20^4 it holds that

    △_{k−1}(X_V) ≥ 1/16 − δ′/k.                (24)

Before we present the proof of Lemma 9.1 we show that it implies (23). By Lemma 1.6 we have

    △_k(X_V) ≤ 2k²/Ψ = δ′/k,

for δ′ = 2δ/20^4. The statement follows, since for k/δ ≥ 10^9 and ε = 6 · 10^{−7} it holds that

    △_{k−1}(X_V) ≥ 1/16 − δ′/k = 1/16 − (2/20^4) · (δ/k) > (10^{10}/(9 · 2^5)) · (δ/k) = (1/ε²) · (δ′/k) ≥ (1/ε²) · △_k(X_V).
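As a quick numeric sanity check of these constants (our own sketch, taking δ/k at its extreme admissible value 10^{−9} and using the lower bound 1/16 − δ′/k established in the proof):

```python
eps = 6e-7                                   # the constant from (23)
delta_over_k = 1e-9                          # extreme of the assumption k/delta >= 10^9
delta_p_over_k = 2 * delta_over_k / 20**4    # delta'/k with delta' = 2*delta/20^4
lower = 1 / 16 - delta_p_over_k              # lower bound on the (k-1)-means cost
upper = (1 / eps**2) * delta_p_over_k        # (1/eps^2) times the k-means cost bound
# (1/eps^2) * delta'/k equals (10^10 / (9 * 2^5)) * delta/k, up to float rounding
assert abs(upper - (10**10 / (9 * 2**5)) * delta_over_k) < 1e-9
assert lower > upper                         # hence (23) holds at the extreme point
```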

Proof of Lemma 9.1

We argue in a similar manner as in Lemma 7.2 (cf. Case 3). We start by introducing some notation, and then prove Lemma 9.2, which is used later in the proof of Lemma 9.1. We redefine the function σ (cf. Equation (17)) such that for any two partitions (P_1, …, P_k) and (Z_1, …, Z_{k−1}) of V, we define a function σ : [1:k−1] → [1:k] by

    σ(i) = argmax_{j∈[1:k]} μ(Z_i ∩ P_j)/μ(P_j),    for every i ∈ [1:k−1].

The next statement is similar to the third conclusion of Lemma 7.1, but in contrast it lower bounds the overlap (in terms of volume) between any k-way and (k−1)-way partitions of V.

Lemma 9.2. Suppose (P_1, …, P_k) and (Z_1, …, Z_{k−1}) are partitions of V. Then for any index ℓ ∈ [1:k] \ {σ(1), …, σ(k−1)} (there is at least one such ℓ) and for every i ∈ [1:k−1] it holds that

    min{μ(Z_i ∩ P_{σ(i)}), μ(Z_i ∩ P_ℓ)} ≥ τ_i · min{μ(P_ℓ), μ(P_{σ(i)})},

where ∑_{i=1}^{k−1} τ_i = 1 and τ_i ≥ 0.

Proof. By the pigeonhole principle there is an index ℓ ∈ [1:k] such that ℓ ∉ {σ(1), …, σ(k−1)}. Thus, for every i ∈ [1:k−1] we have σ(i) ≠ ℓ and, by the definition of σ,

    μ(Z_i ∩ P_{σ(i)})/μ(P_{σ(i)}) ≥ μ(Z_i ∩ P_ℓ)/μ(P_ℓ) ≕ τ_i,

where ∑_{i=1}^{k−1} τ_i = 1 and τ_i ≥ 0 for all i, since the sets Z_i ∩ P_ℓ partition P_ℓ. Hence the statement follows.



We present now the proof of Lemma 9.1.

Proof of Lemma 9.1. Let (Z_1, …, Z_{k−1}) be a (k−1)-way partition of V with centers c′_1, …, c′_{k−1} that achieves △_{k−1}(X_V), and let (P_1, …, P_k) be a k-way partition of V achieving ρ̂_avr(k). Our goal now is to lower bound the optimal (k−1)-means cost

    △_{k−1}(X_V) = ∑_{i=1}^{k−1} ∑_{j=1}^{k} ∑_{u∈Z_i∩P_j} d_u ‖F(u) − c′_i‖².                (25)

By Lemma 9.2 there is an index ℓ ∈ [1:k] \ {σ(1), …, σ(k−1)}. For i ∈ [1:k−1] let

    p^{γ(i)} = { p^ℓ,       if ‖p^ℓ − c′_i‖ ≥ ‖p^{σ(i)} − c′_i‖;
               { p^{σ(i)},  otherwise.

Then, by combining Lemma 6.2 and Lemma 9.2, we have

    ‖p^{γ(i)} − c′_i‖² ≥ [8 · min{μ(P_ℓ), μ(P_{σ(i)})}]^{−1}  and  μ(Z_i ∩ P_{γ(i)}) ≥ τ_i · min{μ(P_ℓ), μ(P_{σ(i)})},                (26)

where ∑_{i=1}^{k−1} τ_i = 1. We now lower bound the expression in (25). Since

    ‖F(u) − c′_i‖² ≥ (1/2) · ‖p^{γ(i)} − c′_i‖² − ‖F(u) − p^{γ(i)}‖²,

it follows for δ′ = 2δ/20^4 that

    △_{k−1}(X_V) = ∑_{i=1}^{k−1} ∑_{j=1}^{k} ∑_{u∈Z_i∩P_j} d_u ‖F(u) − c′_i‖²
      ≥ ∑_{i=1}^{k−1} ∑_{u∈Z_i∩P_{γ(i)}} d_u ‖F(u) − c′_i‖²
      ≥ (1/2) ∑_{i=1}^{k−1} ∑_{u∈Z_i∩P_{γ(i)}} d_u ‖p^{γ(i)} − c′_i‖² − ∑_{i=1}^{k−1} ∑_{u∈Z_i∩P_{γ(i)}} d_u ‖F(u) − p^{γ(i)}‖²
      ≥ (1/2) ∑_{i=1}^{k−1} μ(Z_i ∩ P_{γ(i)}) / [8 · min{μ(P_{γ(i)}), μ(P_{σ(i)})}] − ∑_{i=1}^{k} ∑_{u∈P_i} d_u ‖F(u) − p^{(i)}‖²
      ≥ 1/16 − δ′/k,

where the last inequality holds due to (26), ∑_{i=1}^{k−1} τ_i = 1, and Lemma 1.6.



References

[1] J. R. Lee, S. Oveis Gharan, and L. Trevisan. Multi-way spectral partitioning and higher-order Cheeger inequalities. In Proceedings of the Forty-Fourth Annual ACM Symposium on Theory of Computing, STOC '12, pages 1117–1130, New York, NY, USA, 2012. ACM. doi: 10.1145/2213977.2214078.

[2] R. Ostrovsky, Y. Rabani, L. J. Schulman, and C. Swamy. The effectiveness of Lloyd-type methods for the k-means problem. J. ACM, 59(6):28:1–28:22, Jan. 2013. doi: 10.1145/2395116.2395117.

[3] S. Oveis Gharan and L. Trevisan. Partitioning into expanders. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5–7, 2014, pages 1256–1266. doi: 10.1137/1.9781611973402.93.

[4] R. Peng, H. Sun, and L. Zanetti. Partitioning well-clustered graphs with k-means and heat kernel. CoRR, abs/1411.2021, 2014. URL http://arxiv.org/abs/1411.2021.

A Parameterized Upper Bound on ρ̂_avr(k)

A k-disjoint tuple Z is a k-tuple (Z_1, …, Z_k) of disjoint subsets of V. A k-way partition (P_1, …, P_k) of V is compatible with a k-disjoint tuple Z if Z_i ⊆ P_i for all i. We then define S_i = P_i \ Z_i and use P_Z to denote the set of all partitions compatible with Z. We use Z_k to denote the set of all k-disjoint tuples Z with ρ(k) = Φ(Z) = Φ(Z_1, …, Z_k); the elements of Z_k are called optimal (k-disjoint) tuples. We denote the set of all partitions compatible with some optimal k-disjoint tuple by

    P_k = ∪_{Z∈Z_k} P_Z.                (27)

Oveis Gharan and Trevisan [3, Lemma 2.5] proved that for every k-disjoint tuple Z ∈ Z_k there is a k-way partition (P_1, …, P_k) ∈ P_Z with

    Φ(P_1, …, P_k) ≤ k · ρ(k).                (28)

Remark A.1. In this section, we assume that every partition (P_1, …, P_k) ∈ P_k satisfies

    Φ(P_1, …, P_k) > ρ(k),                (29)

since otherwise ρ̂(k) = ρ(k).

We refine the analysis in [3] and prove a parameterized upper bound on ρ̂_avr(k) that depends on a natural combinatorial parameter and the average conductance of a k-disjoint tuple Z ∈ Z_k. Before we state our results, we need some notation. We define the order k inter-connection constant of a graph G by

    ρ_P(k) ≜ min_{(P_1,…,P_k)∈P_k} Φ_IC(P_1, …, P_k),                (30)

where

    Φ_IC(P_1, …, P_k) ≜ max_{S_i≠∅} (|E(S_i, V\P_i)| − |E(S_i, Z_i)|) / |E(P_i, V\P_i)|.                (31)
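Spelled out in code (our own sketch, not from [3]; edge counts are over a 0/1 adjacency matrix and all names are illustrative):

```python
def cut_size(adj, S, T):
    """|E(S, T)|: number of edges between disjoint vertex sets S and T."""
    return sum(adj[u][v] for u in S for v in T)

def phi_ic(adj, P, Z):
    """Phi_IC(P_1, ..., P_k) for a partition P compatible with a disjoint tuple Z:
    the maximum of (|E(S_i, V\P_i)| - |E(S_i, Z_i)|) / |E(P_i, V\P_i)|
    over all indices i with S_i = P_i \ Z_i nonempty (assumed to exist)."""
    V = set(range(len(adj)))
    ratios = []
    for Pi, Zi in zip(P, Z):
        Si = set(Pi) - set(Zi)
        if Si:
            boundary = cut_size(adj, set(Pi), V - set(Pi))
            ratios.append(
                (cut_size(adj, Si, V - set(Pi)) - cut_size(adj, Si, set(Zi))) / boundary
            )
    return max(ratios)
```

Note that the numerator can be negative, e.g. when each S_i is better connected to its own Z_i than to the rest of the graph; that is exactly the case excluded by the lower bound of Lemma A.5.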

We will prove in Lemma A.5 that ρ_P(k) ∈ (0, 1 − 1/(k−1)]. Furthermore, let O_P be the set of all k-way partitions (P_1, …, P_k) ∈ P_k with Φ_IC(P_1, …, P_k) = ρ_P(k), i.e., the set of all partitions that achieve the order k inter-connection constant. Let

    ρ̃_avr(k) = min_{(P_1,…,P_k)∈O_P} (1/k) ∑_{i=1}^{k} φ(P_i)                (32)

be the minimal average conductance over all k-way partitions in O_P. By construction it holds that

    ρ̂_avr(k) ≤ ρ̃_avr(k).                (33)

We present now our main result of this section, which upper bounds ρ̃_avr(k).

Theorem A.2. For any graph G there exists a k-way partition (P_1, …, P_k) ∈ O_P compatible with a k-disjoint tuple Z with Φ(Z_1, …, Z_k) = ρ(k) such that for κ_P ≜ [1 − ρ_P(k)]^{−1} ∈ (1, k−1] it holds that

    ρ̃_avr(k) ≤ (κ_P/k) ∑_{i=1}^{k} φ(Z_i)

and, in addition, for every i ∈ [1:k]

    φ(P_i) ≤ κ_P · φ(Z_i).

Our goal now is to prove Theorem A.2. We first establish a few useful lemmas that will be used to prove Lemma A.5 and Theorem A.2. Oveis Gharan and Trevisan [3, Algorithm 2 and Fact 2.4] showed the following.

Fact A.3 ([3]). For any k-disjoint tuple Z, there is a k-way partition (P_1, …, P_k) ∈ P_Z such that:

1. For every i ∈ [1:k], Z_i ⊆ P_i.

2. For every i ∈ [1:k] and every subset ∅ ≠ S ⊆ P_i \ Z_i it holds that |E(S, P_i\S)| ≥ (1/k) · |E(S, V\S)|.

Lemma A.4. For any k-disjoint tuple Z, there exists a k-way partition (P_1, …, P_k) ∈ P_Z that satisfies

    max_{S_i≠∅} (|E(S_i, V\P_i)| − |E(S_i, Z_i)|) / |E(P_i, V\P_i)| ≤ 1 − 1/(k−1).

Proof. By Fact A.3 (applied with S = S_i, so that P_i \ S_i = Z_i) there is a k-way partition (P_1, …, P_k) ∈ P_Z such that for all i it holds that

    |E(S_i, Z_i)| = |E(S_i, P_i\S_i)| ≥ (1/k) · |E(S_i, V\S_i)| = (1/k) · (|E(S_i, V\P_i)| + |E(S_i, Z_i)|),

and hence

    |E(S_i, Z_i)| ≥ (1/(k−1)) · |E(S_i, V\P_i)|.

The claim follows since |E(S_i, V\P_i)| ≤ |E(P_i, V\P_i)| for S_i ⊆ P_i.

Lemma A.5. The order k inter-connection constant of a graph G is bounded by

    0 < ρ_P(k) ≤ 1 − 1/(k−1).



Proof. We prove first the upper bound. By Lemma A.4 there is a k-way partition (P_1, …, P_k) ∈ P_k compatible with a k-disjoint tuple Z ∈ Z_k such that

    max_{S_i≠∅} (|E(S_i, V\P_i)| − |E(S_i, Z_i)|) / |E(P_i, V\P_i)| ≤ 1 − 1/(k−1).

Therefore,

    ρ_P(k) = min_{(P′_1,…,P′_k)∈P_k} Φ_IC(P′_1, …, P′_k) ≤ Φ_IC(P_1, …, P_k) = max_{S_i≠∅} (|E(S_i, V\P_i)| − |E(S_i, Z_i)|) / |E(P_i, V\P_i)| ≤ 1 − 1/(k−1).

We prove now the lower bound. Suppose for contradiction that ρ_P(k) ≤ 0, and let (P_1, …, P_k) ∈ O_P be a partition achieving ρ_P(k). By definition we have

    φ(P_i) = |E(P_i, V\P_i)| / μ(P_i) = (|E(Z_i, V\Z_i)| + |E(S_i, V\P_i)| − |E(S_i, Z_i)|) / μ(P_i) ≤ φ(Z_i) + (|E(S_i, V\P_i)| − |E(S_i, Z_i)|) / μ(P_i),

using μ(Z_i) ≤ μ(P_i). By (30), it holds for any S_i ≠ ∅ that |E(S_i, V\P_i)| − |E(S_i, Z_i)| ≤ ρ_P(k) · |E(P_i, V\P_i)|, and thus

    φ(P_i) { ≤ φ(Z_i) − |ρ_P(k)| · φ(P_i),  if S_i ≠ ∅;
           { = φ(Z_i),                      otherwise.

Hence φ(P_i) ≤ φ(Z_i) for every i, so Φ(P_1, …, P_k) ≤ Φ(Z_1, …, Z_k) = ρ(k). However, this contradicts Φ(P_1, …, P_k) > ρ(k) (Remark A.1), and thus the statement follows.



We are now ready to prove Theorem A.2.

Proof of Theorem A.2. Let (P_1, …, P_k) ∈ O_P be a k-way partition compatible with a k-disjoint tuple Z ∈ Z_k that satisfies Φ(Z_1, …, Z_k) = ρ(k). By Lemma A.5 it holds that

    κ_P ≜ [1 − ρ_P(k)]^{−1} ∈ (1, k−1].                (34)

Arguing in a similar manner as in Lemma A.5 (now with ρ_P(k) > 0), we obtain

    φ(P_i) { ≤ φ(Z_i) + ρ_P(k) · φ(P_i),  if S_i ≠ ∅;
           { = φ(Z_i),                    otherwise.                (35)

By combining (34) and the first conclusion of (35) we have

    φ(P_i) ≤ [1 − ρ_P(k)]^{−1} · φ(Z_i) = κ_P · φ(Z_i).                (36)

The statement follows by combining (32) and (36), since

    ρ̃_avr(k) ≤ (1/k) ∑_{i=1}^{k} φ(P_i) ≤ (κ_P/k) ∑_{i=1}^{k} φ(Z_i).
