A Simple Spectral Algorithm for Recovering Planted Partitions


arXiv:1503.00423v2 [cs.DS] 20 Jul 2015

Sam Cole, Shmuel Friedland, and Lev Reyzin
Department of Mathematics, Statistics, and Computer Science
University of Illinois at Chicago
Chicago, Illinois 60607-7045, USA
{scole3,friedlan,lreyzin}@uic.edu

Abstract. In this paper, we consider the planted partition model, in which n = ks vertices of a random graph are partitioned into k "clusters," each of size s. Edges between vertices in the same cluster and different clusters are included with constant probability p and q, respectively (where 0 ≤ q < p ≤ 1). We give an efficient algorithm that, with high probability, recovers the clustering as long as the cluster sizes are at least Ω(√n). Our algorithm is based on projecting the graph's adjacency matrix onto the space spanned by its largest eigenvalues and using the result to recover one cluster at a time. While certainly not the first to use the spectral approach, our algorithm is arguably the simplest to do so: there is no need to randomly partition the vertices beforehand, and hence there is no messy "cleanup" step at the end. We also use a novel application of the Cauchy integral formula to prove its correctness.

1 Introduction and previous work

In the Erdős–Rényi random graph model [11], graphs G(n, p) on n vertices are generated by including each of the n(n − 1)/2 possible edges independently at random with probability 0 < p < 1. A classical conjecture of Karp [19] states that there is no efficient algorithm for finding cliques of size (1 + ε) log_{1/p} n, though cliques of size at least 2 log_{1/p} n will almost surely exist [6].

Jerrum [18] and Kučera [21] introduced a potentially easier variant called the planted clique problem. In this model, one starts with a random graph, but additionally, edges are added deterministically to an unknown set of s vertices (known as the "plant") to make them form a clique. The goal then is to determine which vertices belong to the planted clique, which should be easier when s becomes large. When s = Ω(√(n log n)), the clique can be found by simply taking the s vertices with the largest degrees [21]. This bound was improved using spectral methods to Ω(√n) by Alon et al. [2] and then others [5, 9, 10, 12, 13, 22]. These methods also handle a generalization of this problem in which edges within the plant are added merely with higher probability rather than deterministically.

A more general version of the problem is to allow for planting multiple disjoint cliques, sometimes called a planted clustering. In the most basic version, known as the planted partition model (also called the stochastic block model), n nodes are partitioned into k disjoint clusters of size s = n/k, which are "planted" in a random graph. Two nodes u and v get an edge with probability p if they are in the same cluster and with probability q if they reside in different clusters (with p > q constant). One interesting case is when p = 1 and q = 1/2. As in the planted clique case, a relatively simple algorithm can recover the clustering when the cluster sizes are Ω(√(n log n))—in this case pairs of vertices with the most common neighbors can be placed in the same cluster [8]. However, when the cluster sizes are only required to be Ω(√n), the problem, as in the planted clique case, becomes more difficult because a simple application of the Azuma-Hoeffding inequality no longer suffices.

In this paper we present a simple spectral algorithm that recovers clusters of size Ω(√n). Our algorithm is not the first to achieve this bound for the planted partition problem [4, 7, 16, 23] (see Appendix B for a comparison with previous work). It is, however, arguably much simpler than other spectral algorithms, e.g. that of Giesen and Mitsche [16], who, to our knowledge, were the first to achieve the √n bound for this problem. Our proof is also of interest because it uses a novel application of Cauchy's integral formula. Our algorithm works for any constants 0 ≤ q < p ≤ 1 and any cluster size s ≥ c√n, where c is a constant depending only on p and q. Our results can also be generalized to the case in which clusters have differing sizes and intra-cluster edge probabilities, but herein we discuss only the case when they are uniform.

Efficient algorithms for planted clustering typically rely on either convex optimization [4, 7, 23] or spectral techniques [16, 22, 24]. The latter, including ours, often involve looking at the projection operator onto the vector space corresponding to the k largest eigenvalues of the adjacency matrix of the randomly generated graph Ĝ and showing that it is "not too far" from the projection operator of the expectation matrix E[Ĝ] onto its own k largest eigenvalues.
A natural approach for identifying all the clusters would be to identify a single cluster, remove it, and recurse on the remaining vertices. This is hard to make work because the randomness of the instance Ĝ is "used up" in the first iteration, and then subsequent iterations cannot be handled independently of the first. Existing spectral approaches bypass these difficulties by randomly splitting the input graph into parts, thus forcing independence in the randomness on the parts [16, 22, 24]. This partitioning trick works at the cost of complicating the algorithm. We, however, are

able to make the natural recursive approach work by "preprocessing the randomness": we show that certain (exponentially many) events all occur simultaneously with high probability, and as long as they all occur our algorithm definitely works.

Ω(√n) cluster size is generally accepted to be the barrier for efficient algorithms for "planted" problems. Evidence for the difficulty of beating the √n barrier dates back to Jerrum [18], who showed a specific Markov chain approach will fail to find smaller cliques. Feige and Krauthgamer [12] showed that Lovász–Schrijver SDP relaxations run into the same barrier, while Feldman et al. [14] show that all "statistical algorithms" also provably fail to efficiently find smaller cliques in a distributional version of the planted clique problem. Recently, Ailon et al. [1] were able to recover planted clusterings in which some of the cluster sizes are o(√n), but their algorithm's success depends on the simultaneous presence of clusters of size Ω(√n log² n).

1.1 Outline

In Section 2 we formally define the planted partition model. In Section 3 we present our algorithm for identifying the clusters. We prove its correctness in Section 7. Sections 4–6 are dedicated to developing the linear algebra tools necessary for the proof: in Sections 4 and 5 we characterize the eigenvalues of the (unknown) expectation matrix G and the randomly generated adjacency matrix Ĝ and their submatrices, which allows us to bound the difference of their projections in Section 6. Showing that the projection operators of G and Ĝ are "close" is the key ingredient in our proof.

2 The planted partition model

Assume that C = {C_1, …, C_k} is an unknown partition of the set [n] := {1, …, n} into k sets of size s = n/k called clusters. For constants 0 ≤ q < p ≤ 1, we define G(n, C, p, q) to be the probability space of graphs with vertex set [n], with edges ij (for i ≠ j) included independently with probability p if i and j are in the same cluster in C and probability q otherwise. Note that the case k = 1 gives the standard Erdős–Rényi model G(n, p) [11].

We will denote as follows the main quantities to consider in this paper.

• Ĝ is the adjacency matrix of a random graph obtained from G(n, C, p, q). This is what the cluster identification algorithm receives as input.

• G := E[Ĝ] + pI_n. This is the expectation of the adjacency matrix Ĝ with p's added to the diagonal (to make it a rank-k matrix and simplify the proofs).

The planted partition problem is to "recover the clusters" given only Ĝ—i.e., to identify the unknown partition C_1, …, C_k (up to a permutation of [k]) or, equivalently, to reproduce G. In this paper we give an algorithm to recover the clusters which is based on the k largest eigenvalues of Ĝ and the corresponding eigenspaces.
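For concreteness, the following is a minimal NumPy sketch (ours, not the paper's) of sampling an adjacency matrix from G(n, C, p, q); the helper name sample_planted_partition and the convention that clusters are consecutive blocks of vertices are illustrative assumptions.

import numpy as np

def sample_planted_partition(k, s, p, q, rng=None):
    """Sample an adjacency matrix from G(n, C, p, q) with k clusters of size s.

    For illustration the clusters are consecutive blocks of vertices; in the model
    itself the partition is unknown to the algorithm.
    """
    rng = np.random.default_rng(rng)
    n = k * s
    labels = np.repeat(np.arange(k), s)                   # cluster label of each vertex
    probs = np.where(labels[:, None] == labels[None, :], p, q)
    upper = np.triu(rng.random((n, n)) < probs, k=1)      # sample each edge ij (i < j) once
    A = (upper | upper.T).astype(int)                     # symmetric 0-1 matrix, zero diagonal
    return A, labels

Later sketches in this document reuse this helper when they need a random instance.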

2.1 Graph and matrix notation

Instead of introducing additional notation, we sometimes refer to a symmetric 0-1 matrix A as a graph. In that case, we are referring to the graph whose adjacency matrix is A. We will use the following notation throughout this paper:


• A_ij – the (i, j)th entry of the matrix A, or the indicator variable for the presence of edge ij in the graph A

• N_A(j) – the neighborhood of vertex j in the graph A, or the support of column j in the matrix A. We will omit the subscript A when the meaning is clear.

• A[S] – the induced subgraph on S ⊆ [n], or the principal submatrix with row and column indices in S

• A_J := A[⋃_{i∈J} C_i] – the induced subgraph on the clusters C_i, i ∈ J, or the principal submatrix with row and column indices in these clusters

• P_l(A) – the orthogonal projection operator onto the subspace of R^n spanned by eigenvectors corresponding to the largest l eigenvalues of A

• 1_S ∈ {0, 1}^n – the indicator vector for a set S ⊆ [n]; i.e., the ith entry is 1 if i ∈ S, 0 otherwise.

In addition, we will sometimes adopt the following notation for convenience:

  l := |J|,  m := ls,  A := G_J,  Â := Ĝ_J,  P := P_l(A),  P̂ := P_l(Â).   (2.1)

We will specify whenever we are doing so.

3 The cluster identification algorithm

The main result of this paper is that Algorithm 1 below recovers clusters of size c√n:

Theorem 1. For sufficiently large n, with probability 1 − o(1), Algorithm 1 correctly recovers planted partitions in which the size of the clusters is ≥ c√n, where c := max{88/(p − q), 72/(p − q)²}.

Algorithm 1 Identify clusters of size s in a graph Ĝ

IdentifyClusters(Ĝ, s)
 1: n ← |V(Ĝ)|
 2: k ← ⌊n/s⌋
 3: if k < 1 then
 4:   return ∅
 5: else
 6:   P̂ ← P_k(Ĝ)
 7:   for j ∈ V(Ĝ) do
 8:     Let P̂_{i_1 j} ≥ … ≥ P̂_{i_{n−1} j} be the entries of column j of P̂ other than P̂_{jj}.
 9:     W_j ← {j, i_1, …, i_{s−1}}, i.e., the indices of the s − 1 greatest entries of column j of P̂, along with j itself
10:   end for
11:   j* ← arg max_{j ∈ V(Ĝ)} ‖P̂ 1_{W_j}‖₂
12:   C ← the s vertices in Ĝ with the most neighbors in W_{j*}
13:   return {C} ∪ IdentifyClusters(Ĝ \ C, s)
14: end if

The overview of Algorithm 1 is as follows. The algorithm gets a random matrix Ĝ generated according to G(n, C, p, q). We first project Ĝ onto the eigenspace corresponding to its largest k eigenvalues. This, we will argue, gives a fairly good approximation of at least one of the clusters, which we can then find and "fix up." Then we remove the cluster and repeat the algorithm.

Note that we ensure that Algorithm 1 works in every iteration w.h.p. by "preprocessing the randomness"; i.e., we show that certain events occur simultaneously on all (exponentially many) subgraphs Ĝ_J w.h.p., and that as long as they all hold Algorithm 1 will definitely succeed. For this reason, the subgraphs G_J and Ĝ_J are the main objects analyzed in the following sections.
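To make the pseudocode concrete, here is a rough NumPy sketch of Algorithm 1 (our own illustrative code, not the paper's implementation): it follows the same steps but uses a loop in place of the recursion, and the helper names top_k_projector and identify_clusters are ours.

import numpy as np

def top_k_projector(M, k):
    """Orthogonal projector onto the span of the eigenvectors of the k largest eigenvalues."""
    _, vecs = np.linalg.eigh(M)                  # eigenvalues in ascending order
    V = vecs[:, -k:]
    return V @ V.T

def identify_clusters(A, s):
    """Sketch of Algorithm 1: peel off one cluster of size s at a time (assumes s >= 2).

    A is a 0-1 symmetric adjacency matrix; returns a list of recovered clusters,
    each as a list of vertex labels of the original graph.
    """
    labels = np.arange(A.shape[0])               # remember original vertex names across iterations
    clusters = []
    while A.shape[0] // s >= 1:
        n, k = A.shape[0], A.shape[0] // s
        P_hat = top_k_projector(A, k)
        best_norm, best_W = -1.0, None
        for j in range(n):
            col = P_hat[:, j].copy()
            col[j] = -np.inf                     # exclude the diagonal entry
            W = np.concatenate(([j], np.argsort(col)[-(s - 1):]))   # j plus the s-1 largest entries
            ind = np.zeros(n)
            ind[W] = 1.0
            norm = np.linalg.norm(P_hat @ ind)   # ||P̂ 1_{W_j}||_2
            if norm > best_norm:
                best_norm, best_W = norm, W
        counts = A[:, best_W].sum(axis=1)        # each vertex's number of neighbors in W_{j*}
        C = np.argsort(counts)[-s:]              # the s vertices with the most neighbors in W_{j*}
        clusters.append(labels[C].tolist())
        keep = np.setdiff1d(np.arange(n), C)
        A, labels = A[np.ix_(keep, keep)], labels[keep]
    return clusters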

4 Eigenvalues of G_J

The following lemma is easily verified:

Lemma 2. Let ∅ ⊂ J ⊆ [k] with |J| = l and m = ls. Then G_J is an m × m matrix of rank l with eigenvalues

  λ_1(G_J) = (p − q)s + qm,
  λ_i(G_J) = (p − q)s for i = 2, …, l,
  λ_i(G_J) = 0 for i = l + 1, …, m.

So we see that the smallest positive eigenvalue is proportional to the size of the clusters. Recall that our main assumption is that s ≥ c√n for some constant c > 0. Let c′ := (p − q)c. Assuming that p > q, we obtain

  λ_k(G) ≥ c′√n.   (4.1)

Note that the number of clusters, k, satisfies the inequality

  k ≤ (1/c)√n.
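A quick numerical check of Lemma 2 (an illustrative sketch with arbitrary small parameters):

import numpy as np

# Check Lemma 2: G_J has eigenvalues (p-q)s + qm, then (p-q)s with multiplicity l-1, then 0.
l, s, p, q = 3, 5, 0.8, 0.3
m = l * s
labels = np.repeat(np.arange(l), s)
G_J = np.where(labels[:, None] == labels[None, :], p, q)   # E[Ĝ_J] + p I_m (p on the diagonal)
eigs = np.sort(np.linalg.eigvalsh(G_J))[::-1]
print(np.round(eigs[:4], 6))   # expect [(p-q)s + qm, (p-q)s, (p-q)s, 0] = [7.0, 2.5, 2.5, 0.0]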

5 Eigenvalues of Ĝ_J

In this section, we prove a separation of the first |J| eigenvalues of Ĝ_J from the remaining ones. We begin by bounding the spectral norm of Ĝ_J − G_J.

Lemma 3. For sufficiently large n, with probability > 1 − 2(2/e)^{√n} the following is true: for all ∅ ⊂ J ⊆ [k],

  ‖Ĝ_J − G_J‖₂ ≤ 8√(|J|s).   (5.1)

Proof. Set X = [x_ij] = Ĝ − E[Ĝ]. Let σ_ij be the standard deviation of x_ij and let σ ≥ σ_ij for i, j ∈ [n]. Hence X satisfies the conditions of Theorem A.2, with K = 1,

  σ = max(√(p(1 − p)), √(q(1 − q))) ≤ 1/2,

and S the set of nonempty unions of clusters, i.e. S = {⋃_{i∈J} C_i : ∅ ⊂ J ⊆ [k]}. Let ∅ ⊂ J ⊆ [k] and m = |J|s. Observe that

  Ĝ_J − G_J = Ĝ_J − E[Ĝ_J] − pI_m = X_J − pI_m
  ⇒ ‖Ĝ_J − G_J‖₂ = ‖X_J − pI_m‖₂ ≤ ‖X_J‖₂ + ‖pI_m‖₂ ≤ ‖X_J‖₂ + 1.

By Theorem A.2 we obtain that for all ∅ ⊂ J ⊆ [k],

  ‖Ĝ_J − G_J‖₂ ≤ 2(σ + 3K)√(|J|s) + 1 ≤ 2(1/2 + 3·1)√(|J|s) + 1 ≤ 8√(|J|s)

with probability > 1 − 2(2/e)^{√n} for n > n₁. □
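As a sanity check (not part of the proof), one can simulate ‖Ĝ_J − G_J‖₂ and compare it with the bound 8√(|J|s); the observed norms sit well below the bound. A self-contained illustrative sketch:

import numpy as np

rng = np.random.default_rng(1)
k, s, p, q = 8, 50, 0.8, 0.3
n = k * s
labels = np.repeat(np.arange(k), s)
G = np.where(labels[:, None] == labels[None, :], p, q)       # E[Ĝ] + p I_n
upper = np.triu(rng.random((n, n)) < G, k=1)                 # edge prob p within, q across clusters
A_hat = (upper | upper.T).astype(float)                      # one draw of Ĝ
for l in (2, 4, 8):
    J = np.flatnonzero(labels < l)                           # union of the first l clusters
    diff = A_hat[np.ix_(J, J)] - G[np.ix_(J, J)]
    print(l, round(np.linalg.norm(diff, 2), 2), "vs bound", round(8 * np.sqrt(l * s), 2))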

We can now use the lemma above to characterize the eigenvalues of Ĝ_J (and G_J) as follows:

Lemma 4. Let ∅ ⊂ J ⊆ [k] with |J| = l and m = ls. Assume Ĝ_J satisfies (5.1). Then the first l eigenvalues of G_J and Ĝ_J are in the interval [(c′ − 8)√m, m] and all other eigenvalues of G_J and Ĝ_J are in the interval [−8√m, 8√m].

Proof. Applying Weyl's inequalities (see, e.g., [17]) to Lemma 3 yields

  |λ_i(Ĝ_J) − λ_i(G_J)| ≤ 8√m,  1 ≤ i ≤ m.

Recall that G_J has exactly l positive eigenvalues and all other eigenvalues are zero. Hence

  |λ_i(Ĝ_J) − λ_i(G_J)| ≤ 8√m for 1 ≤ i ≤ l,   and   |λ_i(Ĝ_J)| ≤ 8√m for i ≥ l + 1.

Using (4.1) we deduce that

  λ_i(Ĝ_J) ≥ (c′ − 8)√m for 1 ≤ i ≤ l,   |λ_i(Ĝ_J)| ≤ 8√m for i > l.

Thus, if we make c′ > 16 we have a separation between the first l eigenvalues and the remaining eigenvalues of both G_J and Ĝ_J. Note that the upper bound of m follows from the fact that for any A ∈ R^{m×m} we have λ_1(A) ≤ max_i Σ_j |A_ij|. □

6 Deviations between the projections P_l(Ĝ_J) and P_l(G_J)

In this section, we will prove bounds on ‖P_l(Ĝ_J) − P_l(G_J)‖₂ and ‖P_l(Ĝ_J) − P_l(G_J)‖_F, where ‖·‖₂ and ‖·‖_F are the spectral and the Frobenius matrix norms, respectively. Note that it is easily verified that P_l(G_J) = P_k(G)_J and

  P_k(G) = (1/s) Σ_{i=1}^{k} 1_{C_i} 1_{C_i}^⊤ = (1/s) H,   (6.1)

where H is the "true" cluster matrix whose (i, j)th entry is 1 if i and j are in the same cluster, 0 otherwise.
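Identity (6.1) is easy to confirm numerically (an illustrative sketch):

import numpy as np

k, s, p, q = 4, 6, 0.7, 0.2
labels = np.repeat(np.arange(k), s)
G = np.where(labels[:, None] == labels[None, :], p, q)       # E[Ĝ] + p I_n
H = (labels[:, None] == labels[None, :]).astype(float)       # "true" cluster matrix
_, vecs = np.linalg.eigh(G)
P_k = vecs[:, -k:] @ vecs[:, -k:].T                          # projector onto the top k eigenvalues
print(np.max(np.abs(P_k - H / s)))                           # ≈ 0 (up to rounding error)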

6.1 A bound on ‖P_l(Ĝ_J) − P_l(G_J)‖₂

As P_l(Ĝ_J) and P_l(G_J) are projection operators, we have

  ‖P_l(Ĝ_J)‖₂ = ‖P_l(G_J)‖₂ = 1,   ‖P_l(Ĝ_J) − P_l(G_J)‖₂ ≤ 2.

In fact, we can make this difference arbitrarily small by increasing c accordingly, as shown in the following lemma.

Lemma 5. Let ∅ ⊂ J ⊆ [k] with |J| = l. Assume Ĝ_J satisfies (5.1). When the cluster sizes satisfy s ≥ c√n for sufficiently large c > 0, we have

  ‖P_l(Ĝ_J) − P_l(G_J)‖₂ ≤ 8/(c′ − 8).   (6.2)

Proof. Let us use (2.1) for notational convenience. Define Γ to be a square with side length 2M ≫ m. Its sides are parallel to the x- and y-axes, and the center of the square is on the x-axis. The left and right sides of the square are on the lines x = c′√m/2 and x = c′√m/2 + 2M, respectively. The upper and lower sides of the square are on the lines y = ±M. Note that by Lemma 4 the interior of Γ contains the l largest eigenvalues of A and Â and the exterior of Γ contains the other m − l eigenvalues of A and Â. To get our estimate (6.2) we let M → ∞.

Recall the Cauchy integral formula for calculating projections [20]:

  P̂ = (1/2πi) ∮_Γ (zI_m − Â)^{−1} dz,   P = (1/2πi) ∮_Γ (zI_m − A)^{−1} dz.

Hence

  P̂ − P = (1/2πi) ∮_Γ (zI_m − Â)^{−1} [(zI_m − A) − (zI_m − Â)] (zI_m − A)^{−1} dz
        = (1/2πi) ∮_Γ (zI_m − Â)^{−1} (Â − A) (zI_m − A)^{−1} dz.

Clearly

  ‖P̂ − P‖₂ ≤ (1/2π) ∮_Γ ‖(zI_m − Â)^{−1} (Â − A) (zI_m − A)^{−1}‖₂ |dz|
           ≤ (1/2π) ∮_Γ ‖(zI_m − Â)^{−1}‖₂ ‖Â − A‖₂ ‖(zI_m − A)^{−1}‖₂ |dz|.   (6.3)

Observe that for each z ∈ C the matrices zI_m − Â and zI_m − A are normal. Hence

  ‖(zI_m − Â)^{−1}‖₂ = 1 / min_{i∈[m]} |z − λ_i(Â)|,   ‖(zI_m − A)^{−1}‖₂ = 1 / min_{i∈[m]} |z − λ_i(A)|.

Let z = c′√m/2 + yi, y ∈ R. That is, z lies on the line x = c′√m/2. It is easy to see the following simple estimates:

  |z − λ_i(Â)| ≥ √( ((c′ − 8)√m/2)² + y² ),   |z − λ_i(A)| ≥ √( (c′√m/2)² + y² )   for z = c′√m/2 + yi.

Also recall from (5.1) that ‖Â − A‖₂ ≤ 8√m. Hence for z = c′√m/2 + yi one has the estimate:

  (1/2π) ∫_{−M}^{M} ‖(zI_m − Â)^{−1}‖₂ ‖Â − A‖₂ ‖(zI_m − A)^{−1}‖₂ |dz|
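Independently of the probabilistic estimates above, the Cauchy integral formula itself is easy to check numerically. The following sketch (ours, using an arbitrary test matrix and a circular rather than square contour) recovers the projector onto the top eigenvalue by discretizing the contour integral:

import numpy as np

rng = np.random.default_rng(0)
n = 30
A = rng.standard_normal((n, n))
A = (A + A.T) / 2
v = np.ones(n) / np.sqrt(n)
A += 30 * np.outer(v, v)                                   # plant one well-separated eigenvalue

vals, vecs = np.linalg.eigh(A)
P_eig = np.outer(vecs[:, -1], vecs[:, -1])                  # projector from the eigendecomposition

# P = (1/2πi) ∮_Γ (zI − A)^{-1} dz, with Γ a circle enclosing only the top eigenvalue.
center, radius = vals[-1], (vals[-1] - vals[-2]) / 2
ts = np.linspace(0.0, 2 * np.pi, 400, endpoint=False)
P_cauchy = np.zeros((n, n), dtype=complex)
for t in ts:
    z = center + radius * np.exp(1j * t)
    dz = 1j * radius * np.exp(1j * t) * (2 * np.pi / len(ts))
    P_cauchy += np.linalg.inv(z * np.eye(n) - A) * dz       # resolvent times the contour increment
P_cauchy /= 2j * np.pi

print(np.linalg.norm(P_cauchy.real - P_eig, 2))             # ≈ 0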


  t_2(W) ≥ t_3(W) ≥ 1,   Σ_{i=1}^{l} t_i(W) = s.

In particular, C_3 ∩ W and C_2 \ W are both nonempty. Now construct W̃ from W by replacing a vertex from C_3 ∩ W with one from C_2 \ W. Clearly |W̃| = s, and t(W̃) = τ, since only t_2 increases and t_2(W) < t_1(W) = τ. But

  ‖P 1_{W̃}‖₂² − ‖P 1_W‖₂² = (1/s) Σ_{i=1}^{l} t_i(W̃)² − (1/s) Σ_{i=1}^{l} t_i(W)²
    = (1/s) [(t_2(W) + 1)² + (t_3(W) − 1)² − t_2(W)² − t_3(W)²]
    = (2/s) (t_2(W) − t_3(W) + 1)
    > 0,

contradicting the maximality of ‖P 1_W‖₂. Thus, W is split between two clusters C_1 and C_2; i.e., W = U ∪ V, where U := W ∩ C_1 and V := W ∩ C_2. So by (7.4) we have

  (1 − 8ε² − 2ε)² s ≤ ‖P 1_W‖₂² = ‖P (1_U + 1_V)‖₂² = |U|²/s + (s − |U|)²/s.

Solving the inequality for |U| yields τ = max{|U|, |V|} ≥ (1 − 3ε)s, provided ε is small enough (again ε ≤ 0.1 is sufficient). This completes the proof. □
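The last identity, ‖P(1_U + 1_V)‖₂² = |U|²/s + (s − |U|)²/s with the projector of the expectation matrix given by (6.1), can be checked directly (an illustrative sketch):

import numpy as np

k, s = 3, 10
labels = np.repeat(np.arange(k), s)
P = (labels[:, None] == labels[None, :]).astype(float) / s    # P = H/s, as in (6.1)
for u in range(s + 1):
    W = np.concatenate((np.arange(u), s + np.arange(s - u)))  # u vertices of C_1 and s-u of C_2
    ind = np.zeros(k * s)
    ind[W] = 1.0
    assert np.isclose(np.linalg.norm(P @ ind) ** 2, (u**2 + (s - u)**2) / s)
print("identity holds for all splits")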


Note that in order for the proof of Lemma 8 to go through we require that ε ≤ 0.1, which can be accomplished by making

  c ≥ 88/(p − q).   (7.5)

Lemma 9. With probability ≥ 1 − n^{3/2} e^{−ε²√n} the following is true: for all i ∈ [k],

  |N_Ĝ(j) ∩ C_i| ≥ (p − ε)s   (7.6)

for all j ∈ C_i, and

  |N_Ĝ(j) ∩ C_i| ≤ (q + ε)s   (7.7)

for all j ∈ [n] \ C_i.

Proof. Fix i ∈ [k]. Let j ∈ C_i. E[|N(j) ∩ C_i|] = p(s − 1), so Hoeffding's inequality yields

  Pr[|N(j) ∩ C_i| ≤ (p − ε)s] ≤ e^{−2(εs − p)²/(s−1)} ≤ e^{−ε²s}

for n (hence s) sufficiently large. Now let j ∉ C_i. Then E[|N(j) ∩ C_i|] = qs, so

  Pr[|N(j) ∩ C_i| ≥ (q + ε)s] ≤ e^{−2ε²s} ≤ e^{−ε²s}.

The lemma follows by taking a union bound over all possible i, j. □

Lemma 10. Let W ⊆ [n] be such that |W| = s and |W ∩ C_i| ≥ (1 − 3ε)s for some i ∈ [k]. Then

a) If j ∈ C_i and j satisfies (7.6), then |N_Ĝ(j) ∩ W| ≥ (p − 4ε)s.

b) If j ∈ [n] \ C_i and j satisfies (7.7), then |N_Ĝ(j) ∩ W| ≤ (q + 4ε)s.

Proof. Assume j ∈ C_i and j satisfies (7.6). As |C_i| = s, we have |C_i \ W| ≤ 3εs. Therefore,

  |N(j) ∩ W| ≥ |N(j) ∩ W ∩ C_i|
            = |N(j) ∩ C_i| − |(N(j) ∩ C_i) \ W|
            ≥ |N(j) ∩ C_i| − |C_i \ W|
            ≥ (p − ε)s − 3εs = (p − 4ε)s.

Part b) follows by a similar argument. □

This lemma gives us a way to differentiate between vertices j ∈ C_i and vertices j ∉ C_i, provided

  p − 4ε ≥ q + 4ε.   (7.8)

By the definition of ε (7.1), this is true if we make

  c ≥ 72/(p − q)².   (7.9)
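A small simulation shows the neighbor-count separation that Lemmas 9 and 10 exploit (a self-contained illustrative sketch):

import numpy as np

rng = np.random.default_rng(2)
k, s, p, q = 5, 100, 0.7, 0.3
n = k * s
labels = np.repeat(np.arange(k), s)
probs = np.where(labels[:, None] == labels[None, :], p, q)
upper = np.triu(rng.random((n, n)) < probs, k=1)
A_hat = (upper | upper.T).astype(int)                       # one draw of Ĝ
C1 = np.flatnonzero(labels == 0)                            # a planted cluster, playing the role of W
counts = A_hat[:, C1].sum(axis=1)                           # each vertex's number of neighbors in C_1
print(counts[labels == 0].min(), "vs", counts[labels != 0].max())   # concentrates near p*s = 70 vs q*s = 30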

7.3 Main proof

Proof of Theorem 1. We will show that if (5.1) holds for all J ⊆ [k], (7.6) holds for all i ∈ [k], j ∈ C_i, and (7.7) holds for all i ∈ [k], j ∉ C_i, then Algorithm 1 definitely identifies all clusters correctly; hence, by Lemmas 4 and 9, Algorithm 1 succeeds with probability ≥ 1 − 2(2/e)^{√n} − n^{3/2} e^{−ε²√n}.

Assume that this is the case. We will prove by induction that Algorithm 1 succeeds in every iteration. For the base case, take the original graph Ĝ = Ĝ_{[k]} considered in the first iteration. By Lemma 7, the column j = j* identified in step 11 satisfies (7.2). Then by Lemma 8 we have |W_{j*} ∩ C_i| ≥ (1 − 3ε)s for some i ∈ [k]. Finally, step 12 correctly identifies C = C_i by Lemma 10 and (7.8).

Now assume that Algorithm 1 succeeds in the first t iterations, i.e., it correctly identifies a cluster and removes it in each of these iterations. Then the graph considered in the (t + 1)st iteration is Ĝ_J for some J ⊂ [k], |J| = l = k − t. Therefore, we can repeat the analysis of the base case with Ĝ_J instead of Ĝ.

Note that Lemma 8 ensures that we always identify a cluster i ∈ J, so we never recover a cluster that has already been recovered. Also note that the definition c := max{88/(p − q), 72/(p − q)²} comes from the requirements (7.5) and (7.9). □
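Putting the pieces together, an end-to-end sanity check using the illustrative sample_planted_partition and identify_clusters sketches from Sections 2 and 3 might look as follows; the parameters are chosen only so that it runs quickly and are far below the constant c required by Theorem 1, so success here is empirical rather than guaranteed.

import numpy as np

rng = np.random.default_rng(3)
k, s, p, q = 4, 60, 0.8, 0.3
A_hat, labels = sample_planted_partition(k, s, p, q, rng)   # helper sketched in Section 2
recovered = identify_clusters(A_hat, s)                     # sketch of Algorithm 1 from Section 3
planted = [set(np.flatnonzero(labels == i)) for i in range(k)]
print(len(recovered) == k and all(set(C) in planted for C in recovered))   # typically True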

Acknowledgements We would like to thank the anonymous reviewers of previous versions of our paper for pointing us to relevant past work and for calling our attention to the fact that the iterations of Algorithm 1 cannot be handled independently, as we attempted to do in v1 of our paper (http://arxiv.org/abs/1503.00423v1). We fix this mistake in this updated version by “preprocessing the randomness” as discussed in Sections 1, 3, and 7.

References

[1] Nir Ailon, Yudong Chen, and Huan Xu. Breaking the small cluster barrier of graph clustering. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 995–1003, 2013.
[2] Noga Alon, Michael Krivelevich, and Benny Sudakov. Finding a large hidden clique in a random graph. Random Struct. Algorithms, 13(3-4):457–466, 1998.
[3] Noga Alon, Michael Krivelevich, and Van H. Vu. On the concentration of eigenvalues of random symmetric matrices. Israel Journal of Mathematics, 131(1):259–267, 2002.
[4] Brendan P. W. Ames. Guaranteed clustering and biclustering via semidefinite programming. Mathematical Programming, 147(1-2):429–465, 2014.
[5] Brendan P. W. Ames and Stephen A. Vavasis. Nuclear norm minimization for the planted clique and biclique problems. Math. Program., 129(1):69–89, 2011.
[6] Béla Bollobás and Paul Erdős. Cliques in random graphs. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 80, pages 419–427. Cambridge Univ Press, 1976.
[7] Yudong Chen, S. Sanghavi, and Huan Xu. Improved graph clustering. IEEE Transactions on Information Theory, 60(10):6440–6455, October 2014.
[8] Yudong Chen and Jiaming Xu. Statistical-computational phase transitions in planted models: The high-dimensional setting. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 244–252, 2014.
[9] Amin Coja-Oghlan. Graph partitioning via adaptive spectral techniques. Combinatorics, Probability and Computing, 19(02):227–284, 2010.
[10] Yael Dekel, Ori Gurel-Gurevich, and Yuval Peres. Finding hidden cliques in linear time with high probability. In Proceedings of ANALCO, pages 67–75, 2011.
[11] Paul Erdős and Alfréd Rényi. On random graphs I. Publicationes Mathematicae (Debrecen), 6:290–297, 1959.
[12] Uriel Feige and R. Krauthgamer. Finding and certifying a large hidden clique in a semirandom graph. Random Struct. Algorithms, 16(2):195–208, 2000.
[13] Uriel Feige and Dorit Ron. Finding hidden cliques in linear time. In Proceedings of AofA, pages 189–204, 2010.
[14] Vitaly Feldman, Elena Grigorescu, Lev Reyzin, Santosh Vempala, and Ying Xiao. Statistical algorithms and a lower bound for detecting planted cliques. In Symposium on Theory of Computing Conference, STOC'13, Palo Alto, CA, USA, June 1-4, 2013, pages 655–664, 2013.
[15] Zoltán Füredi and János Komlós. The eigenvalues of random symmetric matrices. Combinatorica, 1(3):233–241, 1981.
[16] Joachim Giesen and Dieter Mitsche. Reconstructing many partitions using spectral techniques. In Proceedings of the 15th International Symposium on Fundamentals of Computation Theory, 2005.
[17] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 2012.
[18] Mark Jerrum. Large cliques elude the Metropolis process. Random Struct. Algorithms, 3(4):347–360, 1992.
[19] Richard M. Karp. Probabilistic analysis of graph-theoretic algorithms. In Proceedings of Computer Science and Statistics 12th Annual Symposium on the Interface, page 173, 1979.
[20] Tosio Kato. Perturbation Theory for Linear Operators. 1966.
[21] Luděk Kučera. Expected complexity of graph partitioning problems. Discrete Applied Mathematics, 57(2-3):193–212, 1995.
[22] Frank McSherry. Spectral partitioning of random graphs. In FOCS, pages 529–537, 2001.
[23] Samet Oymak and Babak Hassibi. Finding dense clusters via "low rank + sparse" decomposition. arXiv preprint arXiv:1104.5186, 2011.
[24] Van Vu. A simple SVD algorithm for finding hidden partitions. arXiv preprint arXiv:1404.3918, 2014.

A Eigenvalues of random symmetric matrices

First we give a slight generalization of the Füredi–Komlós result [15, Theorem 2]:

Theorem A.1. Let X = [x_ij] ∈ R^{n×n} be a random symmetric matrix where the x_ij are independent random variables for 1 ≤ i ≤ j ≤ n. Assume that there exist K, σ > 0 so that the following conditions hold independent of n:

1. E[x_ij] = 0 for 1 ≤ i ≤ j ≤ n.
2. |x_ij| ≤ K for 1 ≤ i ≤ j ≤ n.
3. E[x_ij²] = σ_ij² ≤ σ² for 1 ≤ i ≤ j ≤ n.

Then

  Pr[‖X‖₂ > 2σ√n + 50Kn^{1/3} log n] < 1/n^{10} for n > n₀.   (A.1)

Proof. We indicate briefly the changes one needs to make in the proof of Theorem 2 in [15]. Denote by E_{n,k} the expected value of the trace of X^k. Observe that the entries of X^k are sums of certain monomials of degree k in the entries x_ij, where 1 ≤ i ≤ j ≤ n. Take a monomial appearing in tr X^k and assume that the variable x_ij appears with power one in this monomial. Since the entries x_ij are independent random variables and since E[x_ij] = 0 for 1 ≤ i ≤ j ≤ n, it follows that the expected value of this monomial is zero. Denote by E_{n,k,p} the sum of the expected values of all monomials appearing in tr X^k in which the number of variables is exactly p − 1.

Assume that k = 2l is even. Suppose first that p > l + 1. Then each monomial in E_{n,2l,p} contains at least one variable appearing with power one. Hence E_{n,2l,p} = 0 for p > l + 1. Assume now that p = l + 1. Consider a monomial with l variables of total degree 2l. If one variable appears with degree at least 3, then there is at least one variable in this monomial appearing with degree one. Hence the expected value of this monomial is zero. So the only nonzero contribution comes from the monomials of the form x²_{i₁j₁} ⋯ x²_{i_l j_l}, where one has l distinct pairs (i₁, j₁), …, (i_l, j_l) with i_q ≤ j_q for q ∈ [l]. The expected value of this monomial is σ²_{i₁j₁} ⋯ σ²_{i_l j_l}. The assumption that σ_ij ≤ σ yields that the expected value of this monomial is at most σ^{2l}. Hence the equality (9a) of [15] yields that

  E_{n,2l,l+1} ≤ E_{n,2l,l+1}(σ) := (1/(l+1)) · (2l choose l) · n(n − 1) ⋯ (n − l) σ^{2l}.

Consider now E_{n,2l,p} with p ≤ l. Recall again that the expected value of a monomial which contains a variable x_ij of degree one is zero. Hence we need to consider only monomials such that each variable x_ij that appears in the monomial has degree d_ij ≥ 2. Note that

  |E[x_ij^{d_ij}]| ≤ K^{d_ij − 2} E[x_ij²] ≤ K^{d_ij − 2} σ_ij² ≤ K^{d_ij − 2} σ².

The above inequality implies the inequality

  Σ_{p ≤ l} E_{n,2l,p} ≤ O(k⁶ n) E_{n,2l,l+1}(σ).

This is an analog of the equality (9b) in [15]. Use the arguments of §3.3 in [15], i.e. Markov's inequality, to deduce (A.1). □
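As a quick empirical illustration of the scale in Theorem A.1 (not a substitute for the proof): for centered ±1 entries (σ = K = 1) the spectral norm of a random symmetric matrix sits near 2σ√n, well within the bound (A.1).

import numpy as np

rng = np.random.default_rng(4)
n = 1000
X = rng.choice([-1.0, 1.0], size=(n, n))
X = np.triu(X, 1)
X = X + X.T                                  # random symmetric matrix, zero diagonal
print(np.linalg.norm(X, 2) / np.sqrt(n))     # ≈ 2 = 2*sigma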

We are able to prove a much stronger result by combining Theorem A.1 with Theorem 1 in [3]. Namely, we prove below that the bound in Theorem A.1 applies simultaneously to 2^{O(√n)} submatrices w.h.p.

Theorem A.2. Let X = [x_ij] ∈ R^{n×n} be a random symmetric matrix where the x_ij are independent random variables for 1 ≤ i ≤ j ≤ n. Assume that the conditions of Theorem A.1 are satisfied. Let S be a collection of subsets of [n], each of size ≥ s = s(n). Let E be the event that for all S ∈ S

  ‖X[S]‖₂ ≤ 2(σ + 3K)√|S|.   (A.2)

Then

  Pr[E] > 1 − 2|S|e^{−s} for n ≥ n₁,   (A.3)

where n₁ depends only on σ and K. In particular, if s ≥ √n and |S| ≤ 2^{√n}, then

  Pr[E] > 1 − 2(2/e)^{√n}.

Proof. Let n₁ be big enough so that s(n) ≥ n₀ when n ≥ n₁, where n₀ is given as in Theorem A.1. Fix S ∈ S and let m = |S|. Then

  Pr[‖X[S]‖₂ > 2σ√m + 50Km^{1/3} log m] < 1/m^{10}   (A.4)

by Theorem A.1. As m ≥ s ≥ n₀, we have

  50m^{1/3} log m ≤ 0.2√m  ⇒  2σ√m + 50Km^{1/3} log m ≤ 2(σ + 0.1K)√m.

Let λ₁(X[S]) ≥ … ≥ λ_m(X[S]) be the eigenvalues of X[S] arranged in nonincreasing order. Let λ₁^S be the median of the random variable λ₁(X[S]). We claim that

  |λ₁^S| ≤ 2(σ + 0.1K)√m.   (A.5)

Indeed,

  Pr[λ₁(X[S]) ≥ 2(σ + 0.1K)√m] ≤ 1/m^{10} ≤ 1/2

by (A.4). Now consider the random matrix −X. It satisfies the assumptions of Theorem A.1. Therefore we have

  Pr[λ_m(−X[S]) ≥ 2(σ + 0.1K)√m] ≤ 1/m^{10} ≤ 1/2.

As λ_m(−X[S]) = −λ₁(X[S]), this is the same as Pr[λ₁(X[S]) ≤ −2(σ + 0.1K)√m] ≤ 1/2. Hence (A.5) follows by the definition of the median.

We are now ready to apply Theorem 1 in [3]. Let Y[S] = (1/K) X[S]. So now each entry of Y[S] is in [−1, 1]. Clearly the median of the random variable λ₁(Y[S]) is λ₁^S / K. By (A.5) and Theorem 1 in [3],

  Pr[λ₁(X[S]) ≥ 2(σ + 3K)√m] ≤ Pr[λ₁(Y[S]) − λ₁^S/K ≥ 5.8√m] ≤ 4e^{−(5.8)² m / 32} ≤ e^{−m}.

Noting that ‖X[S]‖₂ is either λ₁(X[S]) or −λ_m(X[S]) = λ₁(−X[S]), we get

  Pr[‖X[S]‖₂ ≥ 2(σ + 3K)√m] ≤ 2e^{−m} ≤ 2e^{−s}.

Finally, we deduce the inequality (A.3) by taking the union bound over all S ∈ S. □

B Comparison with previous results

The following table compares our work with previous algorithms for recovering planted partitions. Note that some of the algorithms apply to more general planted clustering settings, but here we list only their performance for planted partition with constant edge probabilities.

  Paper                          Minimum cluster size for planted partition   Algorithm type
  McSherry 2001 [22]             Ω(n^{2/3})                                   Spectral
  Giesen & Mitsche 2005 [16]     Ω(√n)                                        Spectral
  Oymak & Hassibi 2011 [23]      Ω(√n)                                        Convex programming
  Ames 2014 [4]                  Ω(√n)                                        Semidefinite programming
  Chen et al. 2014 [7]           Ω(√n)                                        Convex programming
  Vu 2014 [24]                   ω(n^{2/3} (log n)^{1/3})                     Spectral
  Our result                     Ω(√n)                                        Spectral

Thus, we see that while many have succeeded in recovering clusters of size Ω(√n), prior to this paper, only Giesen and Mitsche [16] had done so using a purely spectral approach. While their proof techniques have much in common with our own, our algorithm is arguably much simpler.
