SEMIDEFINITE PROGRAMS ON SPARSE RANDOM GRAPHS

ANDREA MONTANARI AND SUBHABRATA SEN

Abstract. Denote by $A$ the adjacency matrix of an Erdős–Rényi graph with bounded average degree. We consider the problem of maximizing $\langle A - \mathbb{E}\{A\}, X\rangle$ over the set of positive semidefinite matrices $X$ with diagonal entries $X_{ii} = 1$. We prove that, for large (bounded) average degree $\gamma$, the value of this semidefinite program (SDP) is, with high probability, $2n\sqrt{\gamma} + n\,o(\sqrt{\gamma}) + o(n)$. Our proof is based on two tools from different research areas. First, we develop a new 'higher-rank' Grothendieck inequality for symmetric matrices. In the present case, our inequality implies that the value of the above SDP is arbitrarily well approximated by optimizing over rank-$k$ matrices for $k$ large but bounded. Second, we use the interpolation method from spin glass theory to approximate this problem by a second one concerning Wigner random matrices instead of sparse graphs. As an application of our results, we prove new bounds on community detection via SDP that are substantially more accurate than the state of the art.

1. Main results

1.1. Semidefinite programs on sparse random graphs. Let $G = (V, E)$ be a random graph with vertex set $V = [n]$, and let $A_G \in \{0,1\}^{n\times n}$ denote its adjacency matrix. Spectral algorithms have proven extremely successful in analyzing the structure of such graphs under various probabilistic models. Interesting tasks include finding clusters, communities, latent representations, and so on [AKS98, McS01, NJW+02, CO06]. The underlying mathematical justification for these applications can be informally summarized as follows (more precise statements are given below): if $G$ is dense enough, then $(A_G - \mathbb{E}\{A_G\})$ is 'much smaller' than $\mathbb{E}\{A_G\}$. However, it has been repeatedly observed that this principle breaks down for random graphs with bounded average degree [FO05, CO10, KMO10, DKMZ11, KMM+13], and that spectral methods consequently fail in this case.

In order to focus on the simplest non-trivial instance of this phenomenon, assume that $G \sim G(n, \gamma/n)$ is an Erdős–Rényi random graph with edge probability $\gamma/n$. Then, letting $\lambda_{\max}(\,\cdot\,)$ denote the largest eigenvalue, we have¹ $\lambda_{\max}(\mathbb{E}A_G) = \gamma$. On the other hand, with high probability [KS03, Vu05],
\[
\lambda_{\max}(A_G - \mathbb{E}A_G) = \begin{cases}
2\sqrt{\gamma}\,(1 + o(1)) & \text{if } \gamma \gg (\log n)^4,\\[2pt]
\sqrt{\log n/\log\log n}\,(1 + o(1)) & \text{if } \gamma = O(1).
\end{cases} \tag{1.1}
\]
(The same behavior holds for the second-largest eigenvalue $\lambda_2(A_G)$.) In particular, $\lambda_{\max}(A_G - \mathbb{E}A_G) \gg \lambda_{\max}(\mathbb{E}A_G)$ for bounded average degree $\gamma$. This phenomenon is not limited to Erdős–Rényi random graphs, and instead leads to failures of spectral methods for

Date: April 22, 2015.
Key words and phrases. Semidefinite programming, Erdős–Rényi graph, spin models, community detection.
A.M. (Stanford University, [email protected]) was partially supported by NSF grants DMS-1106627 and CCF-1319979. S.S. (Stanford University, [email protected]) was supported by the William R. and Sara Hart Kimball Stanford Graduate Fellowship.
¹For mathematical convenience, we assume $(A_G)_{ii} \in \{0,1\}$ with $\mathbb{P}((A_G)_{ii} = 1) = \gamma/n$.


many probabilistic models in the same regime. The origin of this behavior is also well understood: large eigenvalues correspond to eigenvectors localized close to vertices of high degree in $G$.

A natural approach to overcome this problem is to use semidefinite programming (SDP) instead of spectral methods. Informally, SDP allows one to rule out solutions that are localized on a small subset of vertices. Concretely, we consider the following graph parameters:
\[
M^+(G) \equiv \max\big\{ \langle A_G - \mathbb{E}A_G, X\rangle \;:\; X \succeq 0,\ X_{ii} = 1\ \forall i \in [n] \big\}, \tag{1.2}
\]
\[
M^-(G) \equiv -\min\big\{ \langle A_G - \mathbb{E}A_G, X\rangle \;:\; X \succeq 0,\ X_{ii} = 1\ \forall i \in [n] \big\}. \tag{1.3}
\]
(Throughout, for matrices $A, B$, we let $\langle A, B\rangle = \mathrm{Tr}(AB^{\mathsf{T}})$ denote the standard scalar product.) Equivalently, $M^+(G)$ is the value of the following optimization problem (dropping the subscript $G$ from the adjacency matrix and assuming $G \sim G(n, \gamma/n)$):
\[
\text{maximize} \quad \sum_{i,j=1}^{n} (A_{ij} - \gamma/n)\,\langle u_i, u_j\rangle, \tag{1.4}
\]
\[
\text{subject to} \quad u_i \in \mathbb{R}^n, \quad \|u_i\|_2 = 1 \ \text{ for } i \in [n], \tag{1.5}
\]
and $-M^-(G)$ is the value of the corresponding minimization problem. Our intuition is that the constraints $\|u_i\|_2 = 1$ rule out solutions concentrated on a few high-degree vertices, and hence the SDP should be insensitive to the 'discontinuity' exemplified by Eq. (1.1). Our main result yields a precise formalization of this idea².

Theorem 1.1. Let $G \sim G(n, \gamma/n)$ be an Erdős–Rényi random graph with edge probability $\gamma/n$. Then with high probability we have
\[
\frac{M^+(G)}{n} = 2\sqrt{\gamma} + o_\gamma(\sqrt{\gamma}), \qquad \frac{M^-(G)}{n} = 2\sqrt{\gamma} + o_\gamma(\sqrt{\gamma}). \tag{1.6}
\]

Remark 1.2. A possible interpretation of this result is that, unlike the largest or second-largest eigenvalue, $M^+(G)/n$ behaves 'continuously' for large bounded degrees and remains much smaller than the largest eigenvalue of the expected adjacency matrix³ $\lambda_{\max}(\mathbb{E}A_G)$.

Remark 1.3. The quantity $M^+(G)$ can also be thought of as a relaxation of the problem of maximizing $\sum_{i,j=1}^n A_{ij} u_i u_j$ over $u_i \in \{+1,-1\}$, $\sum_{i=1}^n u_i = 0$ (the $-\gamma/n$ term being a Lagrangian version of the latter constraint). The result of our companion paper [DMS15] implies that this has, with high probability, value $n(\mathsf{P}_*/2)\sqrt{\gamma} + n\,o(\sqrt{\gamma}) + o(n)$ (see [DMS15] for a definition of $\mathsf{P}_*$). We deduce that, with high probability, the SDP relaxation overestimates the optimum by a factor $4/\mathsf{P}_* + o_\gamma(1)$ (where $4/\mathsf{P}_* \approx 5.241$).

Remark 1.4. For the sake of simplicity, we stated Eq. (1.6) in asymptotic form. However, our proof provides quantitative bounds on the error terms. In particular, the $o_\gamma(\sqrt{\gamma})$ term is upper bounded by $C\gamma^{2/5}\log\gamma$, for $C$ a numerical constant.

²Throughout the paper, $O(\cdot)$, $o(\cdot)$, and $\Theta(\cdot)$ refer to the usual $n \to \infty$ asymptotics, while $O_\gamma(\cdot)$, $o_\gamma(\cdot)$ and $\Theta_\gamma(\cdot)$ are used to describe the $\gamma \to \infty$ asymptotic regime. We say that a sequence of events $A_n$ occurs with high probability (w.h.p.) if $\mathbb{P}(A_n) \to 1$ as $n \to \infty$. Finally, for random $\{X_n\}$ and non-random $f : \mathbb{R}_+ \to \mathbb{R}_+$, we say that $X_n = o_\gamma(f(\gamma))$ w.h.p. as $n \to \infty$ if there exists non-random $g(\gamma) = o_\gamma(f(\gamma))$ such that the sequence of events $A_n = \{|X_n| \le g(\gamma)\}$ occurs w.h.p. (as $n \to \infty$).
³Notice in passing that, letting $\overline{A} = \mathbb{E}A_G$, $\lambda_{\max}(\overline{A}) = n^{-1}\max\{\langle \overline{A}, X\rangle : X \succeq 0,\ X_{ii} = 1\ \forall i \in [n]\}$. In other words, it makes sense to compare $M^+(G)/n$ to $\lambda_{\max}(\overline{A})$.
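The dichotomy of Eq. (1.1) is easy to observe in simulation. The following sketch (ours, not part of the paper; all parameter choices are illustrative) samples $A_G - \mathbb{E}A_G$ for a bounded average degree $\gamma$ and checks that its top eigenvalue exceeds the dense-regime value $2\sqrt{\gamma}$:

```python
import numpy as np

def centered_adjacency(n, gamma, rng):
    """A_G - E{A_G} for G ~ G(n, gamma/n), with independent diagonal
    entries as in footnote 1 (so E{A_G} = (gamma/n) 11^T)."""
    p = gamma / n
    upper = np.triu((rng.random((n, n)) < p).astype(float))
    A = upper + np.triu(upper, 1).T   # symmetrize; keep the diagonal once
    return A - p

rng = np.random.default_rng(0)
n, gamma = 2000, 1.0
B = centered_adjacency(n, gamma, rng)
lam_max = np.linalg.eigvalsh(B)[-1]
# For bounded gamma the top eigenvalue is driven by the maximum degree
# (of order log n / log log n), so it sits above 2*sqrt(gamma).
print(lam_max, 2 * np.sqrt(gamma))
```

The eigenvector attaining `lam_max` is localized near a maximum-degree vertex, which is precisely the failure mode of spectral methods that the SDP constraints $X_{ii} = 1$ are meant to suppress.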


1.2. Higher-Rank Grothendieck Inequalities. A useful tool we develop is a Grothendieck-type inequality, which is of independent interest (see [KN12] for background). The inequality is more general than our graph setting and applies to arbitrary symmetric matrices. In analogy with the definitions given in the previous section, we introduce the following quantities, for any symmetric matrix $B \in \mathbb{R}^{n\times n}$:
\[
Q_k(B) \equiv \max\big\{ \langle B, X\rangle \;:\; X \succeq 0,\ \mathrm{rank}(X) \le k,\ X_{ii} = 1\ \forall i \in [n] \big\}, \tag{1.7}
\]
\[
Q(B) \equiv \max\big\{ \langle B, X\rangle \;:\; X \succeq 0,\ X_{ii} = 1\ \forall i \in [n] \big\}. \tag{1.8}
\]
Note that the graph parameters introduced in the previous section can be expressed in terms of these, e.g. $M^+(G) = Q(A_G - \mathbb{E}A_G)$.

Theorem 1.5. For $k \ge 1$, let $g \sim \mathsf{N}(0, I_{k\times k}/k)$ be a vector with i.i.d. centered normal entries of variance $1/k$, and define
\[
\alpha_k \equiv \big(\mathbb{E}\|g\|_2\big)^2. \tag{1.9}
\]
Then, for any symmetric matrix $B$, we have the inequalities
\[
Q(B) \ge Q_k(B) \ge \alpha_k\, Q(B) - (1 - \alpha_k)\, Q(-B), \tag{1.10}
\]
\[
Q_k(B) \ge \Big( 2 - \frac{1}{\alpha_k} \Big) Q(B) + \Big( 1 - \frac{1}{\alpha_k} \Big) Q_k(-B). \tag{1.11}
\]

Remark 1.6. The upper bound in Eq. (1.10) is trivial. It follows from the Cauchy–Schwarz inequality that $\alpha_k \in (0,1)$ for all $k$. Also, $k\|g\|_2^2$ is a chi-squared random variable with $k$ degrees of freedom, and hence
\[
\alpha_k = \frac{2\,\Gamma((k+1)/2)^2}{k\,\Gamma(k/2)^2} = 1 - \frac{1}{2k} + O(1/k^2). \tag{1.12}
\]
Substituting into Eq. (1.10) we get, for all $k \ge k_0$ with $k_0$ a sufficiently large constant, and assuming $Q(B) > 0$,
\[
\Big( 1 - \frac{1}{k} \Big) Q(B) - \frac{1}{k}\, |Q(-B)| \le Q_k(B) \le Q(B). \tag{1.13}
\]
In particular, if $|Q(-B)|$ is of the same order as $Q(B)$, we conclude that $Q_k(B)$ approximates $Q(B)$ with a relative error of order $O(1/k)$.

The classical Grothendieck inequality concerns non-symmetric bilinear forms [Gro96]. A Grothendieck inequality for symmetric matrices was established in [NRT99, Meg01] (see also [AMMN06] for generalizations) and states that, for a constant $C$,
\[
Q_1(B) \ge \frac{1}{C \log n}\, Q(B). \tag{1.14}
\]
Higher-rank Grothendieck inequalities were developed in the setting of general graphs in [Bri10, BdOFV10]. However, constant-factor approximations were not established for the present problem (the complete graph, in the setting of [Bri10]). Constant-factor approximations do exist for $B$ positive semidefinite [BdOFV10]. However, we want to apply Theorem 1.5, among others, to $B = A_G - \mathbb{E}A_G$ with $A_G$ the adjacency matrix of an Erdős–Rényi graph with average degree $\gamma = O(1)$. This matrix is not positive semidefinite, and in a dramatic way (its smallest eigenvalue being approximately $-(\log n/\log\log n)^{1/2}$). On the other hand, Eq. (1.10) can be weakened by using $Q(-B) \le -n\lambda_{\min}(B)$, yielding the Grothendieck inequality of [BdOFV10] for the positive semidefinite matrix $B - \lambda_{\min}(B)\,I$.
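The constant $\alpha_k$ of Eq. (1.9) has the closed form recalled in Eq. (1.12), so the quality of the rank-$k$ approximation can be evaluated directly. A minimal sketch (ours, not from the paper; the function name is our own):

```python
import math

def alpha_k(k):
    """alpha_k = (E||g||_2)^2 for g ~ N(0, I_k / k), i.e.
    2 * Gamma((k+1)/2)^2 / (k * Gamma(k/2)^2), as in Eq. (1.12)."""
    return 2 * math.gamma((k + 1) / 2) ** 2 / (k * math.gamma(k / 2) ** 2)

# alpha_1 = 2/pi (the classical Grothendieck-type constant for rank one);
# alpha_k approaches 1 at rate 1/(2k), matching the expansion in (1.12).
for k in (1, 2, 10, 100):
    print(k, alpha_k(k), 1 - 1 / (2 * k))
```

Note how quickly $\alpha_k \to 1$: already for $k = 10$ the rank-constrained value $Q_k(B)$ is, by Eq. (1.13), within a few percent of $Q(B)$ whenever $|Q(-B)|$ is comparable to $Q(B)$.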


In summary, we could not use the vast literature on Grothendieck-type inequalities to prove our main result, Theorem 1.1, which motivated us to develop Theorem 1.5.

1.3. Application to community detection. Significant attention has been devoted recently to the community detection problem under the so-called 'stochastic block model' in the regime of bounded average degree [DKMZ11, KMM+13, Mas14]. This regime is particularly challenging mathematically because of the heterogeneity of vertex degrees. To be definite, we formalize this as a hypothesis testing problem, whereby we want to determine, with high probability of success, whether the random graph under consideration has a community structure or not. The estimation version of the problem, i.e. the question of determining, approximately, a partition into communities, can be addressed by similar techniques and will be considered in a future publication.

We are given a single graph $G = (V, E)$ over $n$ vertices and have to decide which of the following holds:

Hypothesis 0: $G$ is an Erdős–Rényi random graph with edge probability $(a+b)/(2n)$, i.e. $G \sim G(n, (a+b)/(2n))$. We denote the corresponding distribution over graphs by $\mathbb{P}_0$.

Hypothesis 1: There is a set $S \subseteq [n]$, uniformly random given its size $|S| = n/2$. Conditional on $S$, edges are independent with
\[
\mathbb{P}_1\big( (i,j) \in E \,\big|\, S \big) = \begin{cases}
a/n & \text{if } \{i,j\} \subseteq S \text{ or } \{i,j\} \subseteq [n]\setminus S,\\[2pt]
b/n & \text{if } i \in S,\ j \in [n]\setminus S \text{ or } i \in [n]\setminus S,\ j \in S.
\end{cases} \tag{1.15}
\]
The corresponding distribution over graphs is denoted by $\mathbb{P}_1$.

A statistical test takes as input a graph $G$ and returns $T(G) \in \{0,1\}$, depending on which hypothesis is estimated to hold. We say that the test is successful with high probability if $\mathbb{P}_0(T(G) = 1) + \mathbb{P}_1(T(G) = 0) \to 0$ as $n \to \infty$.

It is convenient to generalize slightly the definition (1.2) by letting, for $\lambda \in \mathbb{R}$ a regularization parameter,
\[
M^+(G; \lambda) \equiv \max\big\{ \langle A_G - \lambda \mathbf{1}\mathbf{1}^{\mathsf{T}}, X\rangle \;:\; X \succeq 0,\ X_{ii} = 1\ \forall i \in [n] \big\}. \tag{1.16}
\]
Theorem 1.1 indicates that, under Hypothesis 0, and letting $\gamma = (a+b)/2$ be the average degree, we have $M^+(G; \gamma/n)/n = 2\sqrt{\gamma} + o_\gamma(\sqrt{\gamma})$. This suggests the following test:
\[
T(G; \delta) = \begin{cases}
1 & \text{if } M^+(G; (a+b)/(2n)) \ge n(1+\delta)\sqrt{2(a+b)},\\[2pt]
0 & \text{otherwise.}
\end{cases} \tag{1.17}
\]

Theorem 1.7. Assume, for some $\varepsilon > 0$,
\[
\frac{a - b}{\sqrt{2(a+b)}} \ge 2 + \varepsilon. \tag{1.18}
\]
Then there exist $\delta_* = \delta_*(\varepsilon) > 0$ and $\gamma_* = \gamma_*(\varepsilon) > 0$ such that the following holds. If $(a+b)/2 \ge \gamma_*$, then the SDP-based test $T(\,\cdot\,; \delta_*)$ succeeds with high probability.

Remark 1.8. Mossel, Neeman, and Sly [MNS12] proved that no test can be successful with high probability if $(a - b) < \sqrt{2(a+b)}$. The above theorem guarantees that SDP is successful, roughly, within a factor 2 of this threshold.

Remark 1.9. The factor 2 on the right-hand side of Eq. (1.18) is not the best possible one. Indeed, a somewhat more involved proof (see Appendix C and Appendix E) yields the condition $(a-b)/\sqrt{2(a+b)} \ge \xi_0 + \varepsilon$ for some $\xi_0 \in (1,2)$ strictly. In fact, we expect the optimal constant $\xi_*$ for which this theorem holds to be $\xi_* = 1$.
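The two hypotheses of Section 1.3 can be sketched as samplers (our illustrative code, not from the paper; all names are ours, and self-loops are omitted for simplicity). Both models have the same expected average degree $(a+b)/2$, which is why naive degree statistics cannot distinguish them and the test (1.17) relies on the SDP value instead:

```python
import numpy as np

def sample_graph(n, a, b, planted, rng):
    """Adjacency matrix under the hypotheses of Section 1.3.
    planted=False: G(n, (a+b)/(2n)).  planted=True: balanced two-community
    block model, within-probability a/n, across-probability b/n (Eq. 1.15)."""
    if planted:
        sigma = rng.permutation(np.r_[np.ones(n // 2), -np.ones(n - n // 2)])
        P = np.where(np.outer(sigma, sigma) > 0, a / n, b / n)
    else:
        sigma = None
        P = np.full((n, n), (a + b) / (2 * n))
    upper = np.triu(rng.random((n, n)) < P, 1)   # strict upper triangle: no loops
    A = upper.astype(float)
    return A + A.T, sigma

rng = np.random.default_rng(2)
n, a, b = 2000, 10.0, 2.0
A0, _ = sample_graph(n, a, b, planted=False, rng=rng)
A1, _ = sample_graph(n, a, b, planted=True, rng=rng)
# Both average degrees are close to (a+b)/2 = 6: indistinguishable by degrees.
print(A0.sum() / n, A1.sum() / n)
```

For these parameters $(a-b)/\sqrt{2(a+b)} = 8/\sqrt{24} \approx 1.63$, i.e. above the impossibility threshold of Remark 1.8 but below the factor-2 guarantee of Theorem 1.7.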


In Appendix C we develop a characterization of the constant $\xi_*$ in terms of the value of an SDP with Gaussian data. We believe that this might give a way to prove $\xi_* = 1$.

Remark 1.10. One might wonder why we consider the large-degree asymptotics $(a+b) \to \infty$ instead of trying to establish a threshold at $(a-b)^2/(2(a+b)) = 1$ for fixed $a, b$. Preliminary non-rigorous calculations [JMRT15] suggest that this is indeed necessary: for fixed $(a+b)$, the SDP threshold does not coincide with the optimal one.

The semidefinite program (1.2) was analyzed mostly in a regime of diverging degrees $a, b = \Theta(\log n)$ [ABH14, HWX14, HWX15]. In that setting SDP recovers exactly the correct partition of vertices, and hence one can apply more standard dual witness techniques to carry out the analysis. The only earlier result that compares to ours was recently proven by Guédon and Vershynin [GV14]. Their work establishes upper bounds on the estimation error of SDP, which apply only under the condition $(a-b)^2 \ge 10^4 (a+b)$. The same paper also provides guarantees for more general community structures. We defer the analysis of estimation error and of more general community structures using our methods to future work.

1.4. Organization of the paper. In Section 2, we state some auxiliary results and prove Theorem 1.1. As mentioned, the proof is based on two tools that come from different research communities. On one hand, we use the Grothendieck-type inequality of Theorem 1.5, whose proof uses Rietz's randomized rounding method and is deferred to Appendix A. On the other, we exploit a connection with statistical physics, and develop an interpolation between sparse graphs and Wigner matrices in Section 3. The interpolation argument requires proving uniform continuity 'at zero temperature', as well as a separate analysis of the Wigner matrix problem, carried out in Appendix B. Finally, Appendices C, D, E contain proofs and complementary results on community detection.

2. Other results and proof of Theorem 1.1

We begin by defining a rank-$k$ version of the parameters $M^\pm(G)$. Namely, for $k$ an integer, we let
\[
M_k^+(G) \equiv \max\big\{ \langle A_G - \mathbb{E}A_G, X\rangle \;:\; X \succeq 0,\ \mathrm{rank}(X) \le k,\ X_{ii} = 1\ \forall i \in [n] \big\}, \tag{2.1}
\]
or equivalently (assuming $G \sim G(n, \gamma/n)$), $M_k^+(G)$ is the value of the optimization problem
\[
\text{maximize} \quad \sum_{i,j=1}^{n} (A_{ij} - \gamma/n)\,\langle u_i, u_j\rangle, \tag{2.2}
\]
\[
\text{subject to} \quad u_i \in \mathbb{R}^k, \quad \|u_i\|_2 = 1 \ \text{ for } i \in [n]. \tag{2.3}
\]
We define $M_k^-(G)$ through the corresponding minimization problem. The basic tool we will employ is an interpolation with Wigner matrices, used together with the Grothendieck-type inequality of Theorem 1.5.

2.1. Interpolation lemmas. For $W \in \mathbb{R}^{n\times n}$ a symmetric random matrix with $(W_{ij})_{i\le j}$ independent, $W_{ij} \sim \mathsf{N}(0, 1/n)$ if $i < j$, and $W_{ii} \sim \mathsf{N}(0, 2/n)$, we define
\[
q_l(k) \equiv \liminf_{n\to\infty} \frac{1}{n}\, \mathbb{E}\, Q_k(W), \qquad q_u(k) \equiv \limsup_{n\to\infty} \frac{1}{n}\, \mathbb{E}\, Q_k(W). \tag{2.4}
\]
The first lemma connects these quantities to the sparse graph setting. For its proof we refer to Section 3.1.
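As an aside, the rank-$k$ program (2.2)–(2.3) is amenable to direct numerical experimentation. The sketch below (ours; a Burer–Monteiro-style projected gradient ascent, which is not a technique used in the paper's proofs, with all parameter choices illustrative) produces a feasible point whose value per vertex is of the same order as the $2\sqrt{\gamma}$ prediction of Theorem 1.1:

```python
import numpy as np

def rank_k_value(B, k, steps=300, lr=0.05, seed=0):
    """Heuristic lower bound on max <B, X> over X >= 0, rank(X) <= k,
    X_ii = 1, via projected gradient ascent on unit vectors u_i in R^k."""
    rng = np.random.default_rng(seed)
    n = B.shape[0]
    U = rng.standard_normal((n, k))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    for _ in range(steps):
        U += lr * (2 * B @ U)                          # gradient of <B, U U^T>
        U /= np.linalg.norm(U, axis=1, keepdims=True)  # project rows back to spheres
    return float(np.sum(B * (U @ U.T)))

rng = np.random.default_rng(1)
n, gamma, k = 1000, 4.0, 12
p = gamma / n
upper = np.triu((rng.random((n, n)) < p).astype(float))
A = upper + np.triu(upper, 1).T
B = A - p                         # A_G - E A_G
val = rank_k_value(B, k) / n
print(val, 2 * np.sqrt(gamma))    # same order of magnitude
```

This only produces a lower bound on $M_k^+(G)$ (the landscape is non-convex), but by Eq. (1.13) the rank-$k$ value is itself close to the full SDP value for moderate $k$.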


Lemma 2.1. Let $G \sim G(n, \gamma/n)$ be an Erdős–Rényi random graph with edge probability $\gamma/n$, $\gamma \ge k$. Letting $q_l(k)$, $q_u(k)$ be defined as above, there exists an absolute constant $C_0$ such that, with high probability,
\[
\frac{M_k^+(G)}{n},\ \frac{M_k^-(G)}{n} \in \Big[ q_l(k)\sqrt{\gamma} - C_0 k^{2/3}\gamma^{1/3}\log(k\gamma),\ \ q_u(k)\sqrt{\gamma} + C_0 k^{2/3}\gamma^{1/3}\log(k\gamma) \Big]. \tag{2.5}
\]

The proof is based on an interpolation method, following the approach developed by Guerra and Toninelli [GT04] in spin glass theory and recently applied to combinatorial problems in our companion paper [DMS15]. The basic idea is to continuously interpolate between the problem defined on a sparse Erdős–Rényi random graph and the analogous problem for a Wigner matrix. Interpolation (or 'smart path') methods have a long history in probability theory, dating back to Lindeberg's method [Lin22]. More specifically, interpolation methods from statistical physics have proven successful in the study of combinatorial problems on random graphs [FL03, FLT03, BGT13, PT04]. However, the type of interpolation used in those papers is different from the one studied here, and is aimed at proving the existence of the $n \to \infty$ limit of certain graph parameters. Here we are instead interested in the large (bounded) degree asymptotics.

The next lemma controls the constants $q_l(k)$, $q_u(k)$ for large $k$. Its proof can be found in Appendix B.

Lemma 2.2. Let $q_l(k)$, $q_u(k)$ be defined as above. Then there exists an integer $k_0$ such that, for all $k \ge k_0$,
\[
2 - \frac{4}{k} \le q_l(k) \le q_u(k) \le 2. \tag{2.6}
\]
(Lemma C.1 proves the additional result that $q_l(k) = q_u(k)$.)

2.2. Proof of Theorem 1.1. We limit ourselves to proving the claim for $M^+(G)$, since the analogous result for $M^-(G)$ follows from a very similar argument. Fix $k$ a large integer. Then, applying Eq. (1.11) to $B = A_G - \mathbb{E}A_G$, we get
\[
M_k^+(G) \le M^+(G) \le \frac{\alpha_k}{2\alpha_k - 1}\, M_k^+(G) + \frac{1 - \alpha_k}{2\alpha_k - 1}\, M_k^-(G). \tag{2.7}
\]
(The first inequality is trivial.) By Lemma 2.1, letting $\psi(k,\gamma) = C_0 k^{2/3}\gamma^{1/3}\log(k\gamma)$, we have
\[
\lim_{n\to\infty} \mathbb{P}\Big\{ q_l(k)\sqrt{\gamma} - \psi(k,\gamma) \le \frac{1}{n}\, M^+(G) \le \frac{q_u(k)\sqrt{\gamma} + \psi(k,\gamma)}{2\alpha_k - 1} \Big\} = 1. \tag{2.8}
\]
Since this holds for any $k$ and $\gamma$, we can take $k = k(\gamma) = \gamma^{1/10} \to \infty$ with $\gamma$. Notice that, with this choice, $\psi(k(\gamma), \gamma) = o(\sqrt{\gamma})$. Using $\alpha_k = 1 - O(1/k)$, cf. Eq. (1.12), the claim follows. Note indeed that the proof yields a more quantitative version of Theorem 1.1, namely $|M^\pm(G)/n - 2\sqrt{\gamma}| \le C\gamma^{2/5}\log\gamma$ with high probability.

3. Interpolation and proof of Lemma 2.1

We consider $M_k^+(G)$ in the proof. The proof for $M_k^-(G)$, being similar, is omitted. Throughout, let $W \in \mathbb{R}^{n\times n}$ be a symmetric random matrix with $(W_{ij})_{i\le j}$ independent, $W_{ij} \sim \mathsf{N}(0, 1/n)$ if $i < j$, $i, j \in [n]$, and $W_{ii} \sim \mathsf{N}(0, 2/n)$ for $i \in [n]$.

We further consider a slightly different (Poisson) random multigraph model $G^{\mathrm{Poiss}}_{n,\gamma}$. Under this model, the number of edges is $\mathrm{Poisson}(n\gamma/2)$, and each edge has endpoints $(I, J)$ that are independent and uniformly random in $\{1, 2, \ldots, n\}$. This model can be coupled to the Erdős–Rényi random


graph in such a way that the two differ in $O(1)$ edges with high probability. It is therefore sufficient to prove our claims under the Poisson model. We denote by $A$ the adjacency matrix of this graph (with every edge counted with its multiplicity).

Let $S^{k-1} = \{u \in \mathbb{R}^k : \|u\|_2 = 1\}$ be the unit sphere in $k$ dimensions. We will consider the following Hamiltonians (measurable functions $(S^{k-1})^n \to \mathbb{R}$):
\[
H^{\mathrm{spar}}(\sigma) = \frac{1}{\sqrt{\gamma}} \sum_{i,j=1}^{n} \Big( A_{ij} - \frac{\gamma}{n} \Big)\, \langle \sigma_i, \sigma_j\rangle, \tag{3.1}
\]
\[
H^{\mathrm{den}}(\sigma) = \sum_{i,j=1}^{n} W_{ij}\, \langle \sigma_i, \sigma_j\rangle, \tag{3.2}
\]
where $\sigma = (\sigma_1, \ldots, \sigma_n)$, $\sigma_i \in S^{k-1}$. We note that $\{H^{\mathrm{den}}(\sigma)\}$ is a Gaussian process with mean 0 and $\mathrm{Cov}(H^{\mathrm{den}}(\sigma), H^{\mathrm{den}}(\sigma')) = \frac{2}{n} \sum_{ij} \langle \sigma_i, \sigma_j\rangle\, \langle \sigma_i', \sigma_j'\rangle$.

We denote by $\mathrm{d}\nu(\,\cdot\,)$ the uniform measure on $(S^{k-1})^n$ (normalized to 1, i.e. $\int \mathrm{d}\nu(\sigma) = 1$). For each $? \in \{\mathrm{spar}, \mathrm{den}\}$ we define the (expected) free energy density
\[
\phi^{?}(\beta) = \frac{1}{n}\, \mathbb{E} \log \Big\{ \int e^{\beta H^{?}(\sigma)}\, \mathrm{d}\nu(\sigma) \Big\}. \tag{3.3}
\]
Our proof of Lemma 2.1 is based on two auxiliary results, Lemmas 3.1 and 3.2. The first bounds the difference between $\phi^{\mathrm{spar}}(\beta)$ and $\phi^{\mathrm{den}}(\beta)$.

Lemma 3.1. With the above definitions, there exists a constant $C$, independent of $n$, $\gamma$ and $\beta$, such that, for all $\beta \le \sqrt{\gamma}/10$,
\[
\big| \phi^{\mathrm{spar}}(\beta) - \phi^{\mathrm{den}}(\beta) \big| \le C\, \frac{\beta^3}{\gamma^{1/2}} + o(1). \tag{3.4}
\]

Proof. We define an interpolating Hamiltonian. Namely, for $t \in [0,1]$, we define $H_t : (S^{k-1})^n \to \mathbb{R}$ by
\[
H_t(\sigma) = \frac{1}{\sqrt{\gamma}} \sum_{i,j=1}^{n} \Big( A_{ij}(t) - \frac{\gamma(1-t)}{n} \Big)\, \langle \sigma_i, \sigma_j\rangle + \sqrt{t} \sum_{i,j=1}^{n} W_{ij}\, \langle \sigma_i, \sigma_j\rangle. \tag{3.5}
\]
Here $A_{ij}(t)$ is the adjacency matrix of a graph with distribution $G^{\mathrm{Poiss}}_{n,\gamma(1-t)}$. We further define the interpolating free energy
\[
\phi(\beta; t) = \frac{1}{n}\, \mathbb{E} \log \Big\{ \int e^{\beta H_t(\sigma)}\, \mathrm{d}\nu(\sigma) \Big\}, \tag{3.6}
\]
and notice that $\phi(\beta; 0) = \phi^{\mathrm{spar}}(\beta)$, $\phi(\beta; 1) = \phi^{\mathrm{den}}(\beta)$, whence
\[
\big| \phi^{\mathrm{spar}}(\beta) - \phi^{\mathrm{den}}(\beta) \big| \le \int_0^1 \Big| \frac{\partial \phi(\beta; t)}{\partial t} \Big|\, \mathrm{d}t. \tag{3.7}
\]
Therefore, it suffices to bound the derivative uniformly in $[0,1]$. To this end, we introduce the interpolating Gibbs measure
\[
\mu_{\beta;t}(\sigma) = \frac{\exp(\beta H_t(\sigma))\, \mathrm{d}\nu(\sigma)}{\int \exp(\beta H_t(\tau))\, \mathrm{d}\nu(\tau)} \tag{3.8}
\]


and obtain that
\[
\frac{\partial \phi(\beta; t)}{\partial t} = \mathrm{I} + \mathrm{II}, \tag{3.9}
\]
where
\[
\mathrm{I} = \beta \frac{\sqrt{\gamma}}{n^2} \sum_{ij} \mathbb{E}[\mu_{\beta;t}(\langle \sigma_i, \sigma_j\rangle)] + \frac{\beta^2}{n^2} \sum_{ij} \mathbb{E}\big[ \mu_{\beta;t}(\langle \sigma_i, \sigma_j\rangle^2) - \mu_{\beta;t}(\langle \sigma_i, \sigma_j\rangle)^2 \big], \tag{3.10}
\]
\[
\mathrm{II} = -\frac{\gamma}{2n^2} \sum_{ij} \mathbb{E}\Big[ \log \mu_{\beta;t}\Big( \exp\Big( \frac{2\beta}{\sqrt{\gamma}}\, \langle \sigma_i, \sigma_j\rangle \Big) \Big) \Big] + o(1). \tag{3.11}
\]
The calculation of these derivatives is relatively straightforward, and we present it in Section A.1. Using a Taylor expansion with $b = 2\beta/\sqrt{\gamma}$, and noting that $|\langle \sigma_i, \sigma_j\rangle| \le 1$ for all $i, j$, one has
\[
\exp(b\langle \sigma_i, \sigma_j\rangle) = 1 + b\langle \sigma_i, \sigma_j\rangle + \frac{b^2}{2}\langle \sigma_i, \sigma_j\rangle^2 + R_{ij}, \tag{3.12}
\]
where $|R_{ij}| \le b^3 e^b/6$. Notice that, since $\beta \le \sqrt{\gamma}/10$, we have $b \le 1/5$. Defining $\psi = b\,\mu_{\beta;t}(\langle \sigma_i, \sigma_j\rangle) + \frac{b^2}{2}\,\mu_{\beta;t}(\langle \sigma_i, \sigma_j\rangle^2) + \mu_{\beta;t}(R_{ij})$, we have $|\psi| \le b + b^2/2 + b^3 e^b/6 \le 1/2$. We can then use the Taylor approximation
\[
\Big| \log(1+\psi) - \psi + \frac{1}{2}\psi^2 \Big| \le C_1 |\psi|^3 \le C_1' b^3. \tag{3.13}
\]
Further substituting $\psi$, we get
\[
\Big| \log \mu_{\beta;t}\big( \exp(b\langle \sigma_i, \sigma_j\rangle) \big) - b\,\mu_{\beta;t}(\langle \sigma_i, \sigma_j\rangle) - \frac{b^2}{2}\,\mu_{\beta;t}(\langle \sigma_i, \sigma_j\rangle^2) + \frac{b^2}{2}\,\mu_{\beta;t}(\langle \sigma_i, \sigma_j\rangle)^2 \Big| \le C_1' b^3. \tag{3.14}
\]
Combining Eqs. (3.9) to (3.11) with Eq. (3.14) completes the proof. □

Next, for $? \in \{\mathrm{spar}, \mathrm{den}\}$, we define the ground state energy density $e_n^? = \mathbb{E}\big[ \frac{1}{n} \sup_\sigma H^?(\sigma) \big]$. The next result bounds the difference between the free energy density and the ground state energy density.

Lemma 3.2. There exist constants $C$, $C_0$ and $c$, independent of $n$, $\gamma$ and $k$, such that, for every $\varepsilon > 0$,
\[
\Big| e_n^{\mathrm{den}} - \frac{1}{\beta}\, \phi^{\mathrm{den}}(\beta) \Big| \le 5\,\varepsilon\sqrt{k} + C\, \frac{k}{\beta}\, \log\frac{k}{c\,\varepsilon} + o(1), \tag{3.15}
\]
\[
\Big| e_n^{\mathrm{spar}} - \frac{1}{\beta}\, \phi^{\mathrm{spar}}(\beta) \Big| \le C_0\, \varepsilon\sqrt{k\gamma} + C\, \frac{k}{\beta}\, \log\frac{k}{c\,\varepsilon} + o(1). \tag{3.16}
\]

Proof. For $? \in \{\mathrm{spar}, \mathrm{den}\}$, define the partition function $Z^?(\beta) = \int \exp(\beta H^?(\sigma))\, \mathrm{d}\nu(\sigma)$. Let $H_*^? = \max_\sigma H^?(\sigma)$ denote the maximum of the Hamiltonian, and denote by $\sigma^?$ a maximizer (i.e. $H^?(\sigma^?) = H_*^?$). Each $\sigma \in (S^{k-1})^n$ will also be thought of as a matrix in $\mathbb{R}^{n\times k}$. We also define the event
\[
\mathcal{E}^? \equiv \big\{ |H^?(\sigma) - H_*^?| \le C^? \sqrt{kn}\, \|\sigma - \sigma^?\|_F \ \ \forall \sigma \in (S^{k-1})^n \big\}, \tag{3.17}
\]
where $\|\cdot\|_F$ is the Frobenius norm of a matrix. We choose $C^{\mathrm{den}} = 5$ and $C^{\mathrm{spar}} = C_0\sqrt{\gamma}$, where $C_0$ is a sufficiently large constant independent of $n$, $\gamma$ and $k$. We claim that, with these choices, $\mathbb{P}(\mathcal{E}^?) = 1 - o(1/n)$ for both $? \in \{\mathrm{den}, \mathrm{spar}\}$. Before proving this claim, let us see how it implies the lemma.


On the event $\mathcal{E}^?$, we have
\[
e^{\beta H_*^?} \ge Z^?(\beta) \ge e^{\beta H_*^?} \int \exp\big( -\beta C^? \sqrt{kn}\, \|\sigma - \sigma^?\|_F \big)\, \mathrm{d}\nu(\sigma). \tag{3.18}
\]
For any $\varepsilon > 0$, we have (here $\mathbb{I}(\,\cdot\,)$ denotes the indicator function)
\[
\begin{aligned}
\int \exp\big( -\beta C^? \sqrt{kn}\, \|\sigma - \sigma^?\|_F \big)\, \mathrm{d}\nu(\sigma)
&\ge \int \exp\big( -\beta C^? \sqrt{kn}\, \|\sigma - \sigma^?\|_F \big)\, \mathbb{I}\big( \max_{i\in[n]} \|\sigma_i - \sigma_i^?\|_2 \le \varepsilon \big)\, \mathrm{d}\nu(\sigma)\\
&\ge \exp\big( -\beta C^? n \varepsilon \sqrt{k} \big)\, \big( V_k(\varepsilon) \big)^n,
\end{aligned} \tag{3.19}
\]
where $V_k(\varepsilon)$ is the volume of the spherical cap $\{y \in S^{k-1} : \|s_* - y\|_2 \le \varepsilon\}$. Combining (3.18) and (3.19), we get
\[
\Big| e_n^? - \frac{1}{\beta}\, \phi^?(\beta) \Big| \le C^? \varepsilon \sqrt{k} + \frac{1}{\beta}\, \log \frac{1}{V_k(\varepsilon)} + \mathbb{E}\Big[ \Big| \frac{H_*^?}{n} - \frac{1}{n\beta}\, \log Z^?(\beta) \Big|\, \mathbb{I}\big( (\mathcal{E}^?)^c \big) \Big]. \tag{3.20}
\]
We show that the last term on the RHS of (3.20) is $o(1)$. By the Cauchy–Schwarz inequality, we have
\[
\begin{aligned}
\mathbb{E}\Big[ \Big| \frac{H_*^?}{n} - \frac{1}{n\beta}\, \log Z^?(\beta) \Big|\, \mathbb{I}\big( (\mathcal{E}^?)^c \big) \Big]
&\le \Big( \mathbb{P}\big[ (\mathcal{E}^?)^c \big]\; \mathbb{E}\Big\{ \Big[ \frac{1}{n} \sup_\sigma H^?(\sigma) - \frac{1}{n} \inf_\sigma H^?(\sigma) \Big]^2 \Big\} \Big)^{1/2}\\
&\le k\, \Big( \mathbb{P}\big[ (\mathcal{E}^?)^c \big]\; \mathbb{E}\big\{ [\lambda_{\max}(A^?) - \lambda_{\min}(A^?)]^2 \big\} \Big)^{1/2},
\end{aligned} \tag{3.21}
\]
where, for the dense case, $A^{\mathrm{den}} = W$, and for the sparse case $A^{\mathrm{spar}} = (A_G - (\gamma/n)\mathbf{1}\mathbf{1}^{\mathsf{T}})/\sqrt{\gamma}$, with $A_G$ the adjacency matrix of $G$. In the first inequality we used the elementary bounds $\beta \inf_\sigma H^?(\sigma) \le \log Z^?(\beta) \le \beta \sup_\sigma H^?(\sigma)$. The second inequality is the obvious spectral relaxation bound.

We next show that the last expression in Eq. (3.21) is $o(1)$ as $n \to \infty$, treating separately the dense and sparse cases (as a consequence of our claim $\mathbb{P}((\mathcal{E}^?)^c) = o(1/n)$, which will be proven below). For the dense case, $\mathbb{P}(\lambda_{\max}(W) - \lambda_{\min}(W) \ge 4 + t) \le c_1 e^{-n c_2 t^2}$ for some constants $c_1, c_2 > 0$ [AGZ09], whence $\mathbb{E}\{[\lambda_{\max}(A^{\mathrm{den}}) - \lambda_{\min}(A^{\mathrm{den}})]^2\} \le 20$, thus yielding the claim. To handle the sparse case, note that by the Efron–Stein inequality [BLM13], $\mathrm{Var}(\lambda_{\max}(A^{\mathrm{spar}}) - \lambda_{\min}(A^{\mathrm{spar}})) = O(n)$, while $\lambda_{\max}(A^{\mathrm{spar}}) - \lambda_{\min}(A^{\mathrm{spar}}) \le C\sqrt{\log n/\log\log n}$ with high probability [KS03]. So the RHS is $o(1)$ under our claim $\mathbb{P}(\mathcal{E}^{\mathrm{spar}}) \ge 1 - o(1/n)$.

Finally, $V_k(\varepsilon) = \frac{1}{2}\, \mathbb{P}\big[ X < \varepsilon^2 - (\varepsilon^4/4) \big]$, where $X \sim \mathrm{Beta}\big( \frac{k-1}{2}, \frac{1}{2} \big)$. Hence
\[
\mathbb{P}\Big[ X < \varepsilon^2 - \frac{\varepsilon^4}{4} \Big] \ge \frac{1}{\mathrm{Beta}\big( \frac{k-1}{2}, \frac{1}{2} \big)} \int_0^{\varepsilon^2 - \varepsilon^4/4} t^{\frac{k-1}{2} - 1}\, \mathrm{d}t \ge \frac{c}{\sqrt{k}}\, \big( \varepsilon^2 - \varepsilon^4/4 \big)^{\frac{k-1}{2}}. \tag{3.22}
\]
Plugging this back into (3.20) gives the required upper bounds.

It remains to prove $\mathbb{P}[\mathcal{E}^?] = 1 - o(1/n)$ for $? \in \{\mathrm{den}, \mathrm{spar}\}$. Using the triangle inequality, we get
\[
|H^?(\sigma) - H_*^?| \le |\langle \sigma - \sigma^?, A^? \sigma\rangle| + |\langle \sigma - \sigma^?, A^? \sigma^?\rangle| \tag{3.23}
\]
\[
\le \|\sigma - \sigma^?\|_F\, \max\big\{ \|A^? \sigma\|_F,\ \|A^? \sigma^?\|_F \big\}, \tag{3.24}
\]
where we recall that, for the dense case, $A^{\mathrm{den}} = W$, and for the sparse case $A^{\mathrm{spar}} = (A_G - (\gamma/n)\mathbf{1}\mathbf{1}^{\mathsf{T}})/\sqrt{\gamma}$, with $A_G$ the adjacency matrix of the Erdős–Rényi graph $G \sim G(n, \gamma/n)$. For any $\sigma \in (S^{k-1})^n$ we have
\[
\|A^? \sigma\|_F^2 \le k \sup_{s \in \mathbb{R}^n : \|s\|_\infty \le 1} \|A^? s\|_2^2, \tag{3.25}
\]


and therefore
\[
\mathbb{P}\big[ (\mathcal{E}^?)^c \big] \le \mathbb{P}\Big[ \sup_{s \in \mathbb{R}^n : \|s\|_\infty \le 1} \|A^? s\|_2 \ge C^? \sqrt{n} \Big]. \tag{3.26}
\]

We conclude the proof by bounding the right-hand side separately in the dense and sparse cases. The dense case is straightforward. Indeed, for $\|s\|_\infty \le 1$,
\[
\|A^{\mathrm{den}} s\|_2^2 \le \lambda_{\max}(A^{\mathrm{den}})^2\, \|s\|_2^2 \le \lambda_{\max}(A^{\mathrm{den}})^2\, n. \tag{3.27}
\]
The claim follows since $\lambda_{\max}(A^{\mathrm{den}}) \le 2 + t$ with probability at least $1 - c_1 e^{-c_2 n t^2}$ for some $c_1, c_2 > 0$ [AGZ09].

For the sparse case, we observe that $s \mapsto \|A^{\mathrm{spar}} s\|_2^2$ is a convex function on $\|s\|_\infty \le 1$, and thus attains its maximum at one of the corners of the hypercube $[-1,1]^n$. In other words, $\sup_{\|s\|_\infty \le 1} \|A^{\mathrm{spar}} s\|_2^2 = \max_{s \in \{\pm 1\}^n} \|A^{\mathrm{spar}} s\|_2^2$. For $s \in \{\pm 1\}^n$, we get
\[
\|A^{\mathrm{spar}} s\|_2^2 \le \frac{2}{\gamma} \Big( \sum_{i=1}^{n} d_i^2 + n\gamma^2 \Big), \tag{3.28}
\]
where $d_i$ is the degree of vertex $i$ in $G$. The desired bound follows since $\sum_{i=1}^{n} d_i^2 \le C_0'' \gamma^2 n$ with probability at least $1 - o(1/n)$ for some constant $C_0''$ large enough. □

3.1. Proof of Lemma 2.1. Using Lemma 3.1 and Lemma 3.2, we can finally prove Lemma 2.1. We compare the 'ground state energies' of the sparse and dense models. By the triangle inequality and Lemmas 3.1 and 3.2, we have
\[
|e_n^{\mathrm{spar}} - e_n^{\mathrm{den}}| \le \Big| e_n^{\mathrm{spar}} - \frac{1}{\beta}\, \phi^{\mathrm{spar}}(\beta) \Big| + \frac{1}{\beta}\, \big| \phi^{\mathrm{spar}}(\beta) - \phi^{\mathrm{den}}(\beta) \big| + \Big| e_n^{\mathrm{den}} - \frac{1}{\beta}\, \phi^{\mathrm{den}}(\beta) \Big| \tag{3.29}
\]
\[
\le C \Big\{ \varepsilon \sqrt{k\gamma} + \frac{k}{\beta}\, \log(k/\varepsilon) + \frac{\beta^2}{\gamma^{1/2}} \Big\} + o(1) \tag{3.30}
\]
\[
\le C' \frac{k^{2/3}}{\gamma^{1/6}}\, \log(k\gamma) + o(1). \tag{3.31}
\]
Here, in the last inequality, we substituted $\varepsilon = k^{1/6}/\gamma^{2/3}$ and $\beta = k^{1/3}\gamma^{1/6}$ (which satisfies the assumptions of Lemma 3.1 for $\gamma \ge k$).

Recall that $\mathbb{E}M_k^+(G)/n = e_n^{\mathrm{spar}}\sqrt{\gamma}$ and $\mathbb{E}Q_k(W)/n = e_n^{\mathrm{den}}$. Hence, to complete the proof, it is sufficient to show that $M_k^+(G) - \mathbb{E}M_k^+(G) = o(n)$ with high probability. In order to prove this, let $Z(A) = M_k^+(G)$ (with $A = (a_{ij})$ the adjacency matrix of $G$), and let $Z^{(i,j)}(A)$ be the random variable obtained by replacing $a_{ij}$ with an independent copy $a_{ij}'$. It is easy to see that $|Z(A) - Z^{(i,j)}(A)| \le |a_{ij} - a_{ij}'|$. An application of the Efron–Stein inequality [BLM13] then yields that, for any $\varepsilon > 0$,
\[
\mathbb{P}\big[ |M_k^+(G) - \mathbb{E}[M_k^+(G)]| \ge n\varepsilon \big] = O(1/n). \tag{3.32}
\]
Combining this with (3.31) completes the proof. □


Appendix A. Proof of Theorem 1.5

As mentioned already, the upper bound in Eq. (1.10) is trivial. The proof of the lower bound follows Rietz's method [Rie74]. Let $X$ be a solution of the problem (1.8), and through its Cholesky decomposition write $X_{ij} = \langle u_i, u_j\rangle$, with $u_i \in \mathbb{R}^n$, $\|u_i\|_2 = 1$. In other words we have, letting $B = (B_{ij})_{i,j\in[n]}$,
\[
Q(B) = \sum_{i,j=1}^{n} B_{ij}\, \langle u_i, u_j\rangle. \tag{A.1}
\]
Let $J \in \mathbb{R}^{k\times n}$ be a matrix with i.i.d. entries $J_{ij} \sim \mathsf{N}(0, 1/k)$. Define $x_i \in \mathbb{R}^k$, for $i \in [n]$, by letting
\[
x_i = \frac{J u_i}{\|J u_i\|_2}. \tag{A.2}
\]
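The rounding step (A.2) is straightforward to implement. The following sketch (ours, not part of the paper; function and variable names are our own) factors a feasible $X$, applies the random map $J$, and normalizes, producing a feasible rank-$\le k$ point:

```python
import numpy as np

def round_to_rank_k(X, k, rng):
    """Rietz-style rounding: factor X = U U^T (X psd with unit diagonal),
    apply a random Gaussian map J in R^{k x n} to each row u_i, and
    normalize as in Eq. (A.2).  Returns X_k = [<x_i, x_j>]_{ij}."""
    n = X.shape[0]
    # Eigendecomposition instead of Cholesky: also valid when X is singular.
    w, V = np.linalg.eigh(X)
    U = V * np.sqrt(np.clip(w, 0, None))   # rows u_i satisfy <u_i, u_j> = X_ij
    J = rng.standard_normal((k, n)) / np.sqrt(k)
    Y = U @ J.T                            # row i equals J u_i
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    return Y @ Y.T                         # rank <= k, diagonal exactly 1

rng = np.random.default_rng(3)
n, k = 50, 5
M = rng.standard_normal((n, n))
X = M @ M.T
X /= np.sqrt(np.outer(np.diag(X), np.diag(X)))  # a feasible X: psd, X_ii = 1
Xk = round_to_rank_k(X, k, rng)
print(np.linalg.matrix_rank(Xk), np.allclose(np.diag(Xk), 1.0))
```

The content of Theorem 1.5 is precisely that, in expectation over $J$, this rounding loses at most the $\alpha_k$ and $(1-\alpha_k)$ factors of Eq. (1.10).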

We next need a technical lemma.

Lemma A.1. Let $u, v \in \mathbb{R}^n$ with $\|u\|_2 = \|v\|_2 = 1$, and let $J \in \mathbb{R}^{k\times n}$ be defined as above. Further, for $w \in \mathbb{R}^n$, let $z(w) \equiv \big( 1 - \alpha_k^{-1/2}\, \|Jw\|_2^{-1} \big)\, Jw$. Then
\[
\mathbb{E}\Big\langle \frac{Ju}{\|Ju\|_2}, \frac{Jv}{\|Jv\|_2} \Big\rangle = \alpha_k\, \langle u, v\rangle + \alpha_k\, \mathbb{E}\langle z(u), z(v)\rangle. \tag{A.3}
\]

Proof. Let $g_1, g_2 \sim \mathsf{N}(0, I_{k\times k}/k)$ be independent vectors (distributed as the first two columns of $J$). Let $a = \langle u, v\rangle$ and $b = \sqrt{1 - a^2}$. Then, by rotation invariance, $(Ju, Jv)$ is distributed as $(g_1, a g_1 + b g_2)$, whence
\[
\mathbb{E}\langle Ju, Jv\rangle = \mathbb{E}\langle g_1, a g_1 + b g_2\rangle = a\, \mathbb{E}(\|g_1\|_2^2) = \langle u, v\rangle, \tag{A.4}
\]
and
\[
\mathbb{E}\Big\langle \frac{Ju}{\|Ju\|_2}, Jv \Big\rangle = \mathbb{E}\Big\langle \frac{g_1}{\|g_1\|_2}, a g_1 + b g_2 \Big\rangle \tag{A.5}
\]
\[
= a\, \mathbb{E}(\|g_1\|_2) = \alpha_k^{1/2}\, \langle u, v\rangle. \tag{A.6}
\]
By expanding the product, we have
\[
\mathbb{E}\langle z(u), z(v)\rangle = \langle u, v\rangle - \alpha_k^{-1/2}\, \mathbb{E}\Big\langle \frac{Ju}{\|Ju\|_2}, Jv \Big\rangle - \alpha_k^{-1/2}\, \mathbb{E}\Big\langle Ju, \frac{Jv}{\|Jv\|_2} \Big\rangle + \frac{1}{\alpha_k}\, \mathbb{E}\Big\langle \frac{Ju}{\|Ju\|_2}, \frac{Jv}{\|Jv\|_2} \Big\rangle \tag{A.7}
\]
\[
= -\langle u, v\rangle + \frac{1}{\alpha_k}\, \mathbb{E}\Big\langle \frac{Ju}{\|Ju\|_2}, \frac{Jv}{\|Jv\|_2} \Big\rangle, \tag{A.8}
\]
which is equivalent to the statement of the lemma. □

Now, by the definition of the $x_i$'s, we have
\[
\mathbb{E}\Big\{ \sum_{i,j=1}^{n} B_{ij}\, \langle x_i, x_j\rangle \Big\} = \sum_{i,j=1}^{n} B_{ij}\, \mathbb{E}\Big\langle \frac{Ju_i}{\|Ju_i\|_2}, \frac{Ju_j}{\|Ju_j\|_2} \Big\rangle \tag{A.9}
\]
\[
= \alpha_k \sum_{i,j=1}^{n} B_{ij}\, \langle u_i, u_j\rangle + \alpha_k \sum_{i,j=1}^{n} B_{ij}\, \mathbb{E}\langle z(u_i), z(u_j)\rangle \tag{A.10}
\]
\[
= \alpha_k\, Q(B) + \alpha_k \sum_{i,j=1}^{n} B_{ij}\, \mathbb{E}\langle z(u_i), z(u_j)\rangle. \tag{A.11}
\]


Now we interpret the $z(u_i)$ as vectors in a Hilbert space with scalar product $\mathbb{E}\langle \cdot, \cdot\rangle$. Further, by Lemma A.1 (applied with $u = v = u_i$), these vectors have norm
\[
\mathbb{E}(\|z(u_i)\|_2^2) = \frac{1}{\alpha_k} - 1. \tag{A.12}
\]
Hence, by the definition of $Q(\,\cdot\,)$, we have
\[
-\sum_{i,j=1}^{n} B_{ij}\, \mathbb{E}\langle z(u_i), z(u_j)\rangle \le \Big( \frac{1}{\alpha_k} - 1 \Big)\, Q(-B). \tag{A.13}
\]
Substituting this in Eq. (A.11), we obtain
\[
Q_k(B) \ge \mathbb{E}\Big\{ \sum_{i,j=1}^{n} B_{ij}\, \langle x_i, x_j\rangle \Big\} \ge \alpha_k\, Q(B) - (1 - \alpha_k)\, Q(-B), \tag{A.14}
\]
which coincides with the claim (1.10). In order to prove Eq. (1.11), we apply Eq. (1.10) to $-B$, thus getting
\[
Q(-B) \le \frac{1}{\alpha_k}\, Q_k(-B) + \frac{1 - \alpha_k}{\alpha_k}\, Q(B). \tag{A.15}
\]
Substituting this in Eq. (1.10), we obtain Eq. (1.11). □

A.1. The interpolation derivatives. In this subsection we carry out the calculation of the derivatives in Eqs. (3.9), (3.10), (3.11). We note that
\[
\phi(\beta; t) = \frac{1}{n} \int \Big[ \log \int \exp(\beta H_t(\sigma))\, \mathrm{d}\nu(\sigma) \Big]\, p(t, A(t))\, \mathrm{d}\rho_0(A(t), W), \tag{A.16}
\]
where $p(t, A(t)) = \prod_{i\le j} p(t, a_{ij}(t))$, and $p(t, a_{ij}(t))$ is the PMF of $\mathrm{Pois}\big( \frac{\gamma(1-t)}{2n} \big)$ if $i = j$ and of $\mathrm{Pois}\big( \frac{\gamma(1-t)}{n} \big)$ if $i < j$. Further, $\rho_0 = \rho_{\mathbb{N}}^{\otimes n(n+1)/2} \otimes \rho_{\mathbb{R}}^{\otimes n(n+1)/2}$, where $\rho_{\mathbb{N}}$ is the counting measure on $\mathbb{N}$ and $\rho_{\mathbb{R}}$ is the standard Gaussian measure on $\mathbb{R}$. Then we have
\[
\frac{\partial \phi(\beta; t)}{\partial t} = M_1 + M_2, \tag{A.17}
\]
\[
M_1 = \frac{\beta}{n}\, \mathbb{E}\Big[ \mu_{\beta,t}\Big( \frac{\partial H_t(\sigma)}{\partial t} \Big) \Big], \tag{A.18}
\]
\[
M_2 = \frac{1}{n} \int \Big[ \log \int \exp(\beta H_t(\sigma))\, \mathrm{d}\nu \Big]\, \frac{\partial p(t, A(t))}{\partial t}\, \mathrm{d}\rho_0. \tag{A.19}
\]
Consider the term $M_1$:
\[
M_1 = \beta \frac{\sqrt{\gamma}}{n^2} \sum_{ij} \mathbb{E}[\mu_{\beta;t}(\langle \sigma_i, \sigma_j\rangle)] + \frac{\beta}{2n\sqrt{t}}\, \mathbb{E}[\mu_{\beta;t}(H^{\mathrm{den}}(\sigma))]. \tag{A.20}
\]
Further,
\[
\mathbb{E}[\mu_{\beta;t}(H^{\mathrm{den}}(\sigma))] = \sum_{i,j} \mathbb{E}\Big[ \int \frac{W_{ij}\, \langle \sigma_i, \sigma_j\rangle\, \exp(\beta H_t(\sigma))}{\int \exp(\beta H_t(\tau))\, \mathrm{d}\nu(\tau)}\, \mathrm{d}\nu(\sigma) \Big]. \tag{A.21}
\]
An application of Gaussian integration by parts yields
\[
\mathbb{E}[\mu_{\beta;t}(H^{\mathrm{den}}(\sigma))] = \frac{2\beta\sqrt{t}}{n} \sum_{i,j} \mathbb{E}\big[ \mu_{\beta;t}(\langle \sigma_i, \sigma_j\rangle^2) - \mu_{\beta;t}(\langle \sigma_i, \sigma_j\rangle)^2 \big]. \tag{A.22}
\]
Plugging this back into (A.20) gives $\mathrm{I}$, as in Eq. (3.10).


Next, let $h_{ij}(a_{ij}(t)) = \mathbb{E}\big[ n^{-1} \log\big[ \int \exp(\beta H_t(\sigma))\, \mathrm{d}\nu \big] \,\big|\, a_{ij}(t) \big]$. The product form of $p(t, A(t))$ implies
\[
M_2 = \sum_{i\le j} \int h_{ij}(a_{ij}(t))\, \frac{\partial p(t, a_{ij}(t))}{\partial t}\, \mathrm{d}\rho_{\mathbb{N}}(a_{ij}(t)). \tag{A.23}
\]
The $ij$-th integral on the RHS of (A.23) is simply $-\frac{\gamma}{2n}\, g'(\lambda)$, where $g(\lambda) = \mathbb{E}[f(X)]$ for $f = h_{ij}$ and $X \sim \mathrm{Pois}(\lambda)$, evaluated at $\lambda = \frac{\gamma(1-t)}{n}$. Differentiating the PMF of the Poisson distribution, one observes that $g'(\lambda) = \mathbb{E}[f(X+1) - f(X)]$. With this observation, we have
\[
M_2 = -\frac{\gamma}{2n} \sum_{i\le j} \mathbb{E}\big[ h_{ij}(a_{ij}(t) + 1) - h_{ij}(a_{ij}(t)) \big]. \tag{A.24}
\]
Adding 1 to $a_{ij}(t)$ corresponds to adding an $(i,j)$ edge in the sparse Hamiltonian $H^{\mathrm{spar}}(\,\cdot\,)$. Thus we have
\[
\mathbb{E}\big[ h_{ij}(a_{ij}(t) + 1) - h_{ij}(a_{ij}(t)) \big] = \frac{1}{n}\, \mathbb{E}\Big[ \log \mu_{\beta;t}\Big( \exp\Big( \frac{\psi_{ij}\beta}{\sqrt{\gamma}}\, \langle \sigma_i, \sigma_j\rangle \Big) \Big) \Big], \tag{A.25}
\]
where $\psi_{ij} = 1$ if $i = j$ and $\psi_{ij} = 2$ otherwise. Plugging this back into (A.24) establishes $\mathrm{II}$, as in Eq. (3.11).

Appendix B. The complete graph: Proof of Lemma 2.2

Throughout this section, $W \in \mathbb{R}^{n\times n}$ is a symmetric random matrix with $(W_{ij})_{i\le j}$ independent, $W_{ij} \sim \mathsf{N}(0, 1/n)$ if $i < j$, $i, j \in [n]$, and $W_{ii} \sim \mathsf{N}(0, 2/n)$ for $i \in [n]$. By Theorem 1.5, applied to $B = W$, we have $Q(W) \ge Q_k(W) \ge \alpha_k\, Q(W) - (1 - \alpha_k)\, Q(-W)$, and hence, using the symmetry of the distribution of $W$,
\[
(2\alpha_k - 1) \liminf_{n\to\infty} \frac{1}{n}\, \mathbb{E}Q(W) \le q_l(k) \le q_u(k) \le \limsup_{n\to\infty} \frac{1}{n}\, \mathbb{E}Q(W). \tag{B.1}
\]
The proof of Lemma 2.2 therefore follows from $\alpha_k = 1 - O(1/k)$, once we prove the following.

Lemma B.1. We have
\[
\lim_{n\to\infty} \frac{1}{n}\, \mathbb{E}Q(W) = 2. \tag{B.2}
\]

Proof. First notice that Q(W ) ≤ nλmax (W ) (the maximum eigenvalue of W ). Hence 1 EQ(W ) ≤ lim Eλmax (W ) = 2 . n→∞ n→∞ n

lim sup

(B.3)

For the last (classical) equality see, for instance, [AGZ09]. We are left to prove a lower bound on $Q(W)$. Fix $\varepsilon > 0$, and let $\varphi_1, \varphi_2, \ldots, \varphi_{n\varepsilon}$ be the eigenvectors of $W$ corresponding to the top $n\varepsilon$ eigenvalues. Denote by $U \in \mathbb{R}^{n\times(n\varepsilon)}$ the matrix whose columns are $\varphi_1, \varphi_2, \ldots, \varphi_{n\varepsilon}$, and let $D \in \mathbb{R}^{n\times n}$ be the diagonal matrix with entries
$$D_{ii} = (UU^{\mathsf{T}})_{ii}. \tag{B.4}$$

Note that, by the invariance of $W$ under rotations, $P = UU^{\mathsf{T}}$ is a projector onto a uniformly random subspace of dimension $n\varepsilon$ in $\mathbb{R}^n$, and $D_{ii} = \langle e_i, P e_i\rangle = \|P e_i\|_2^2$. Inverting the roles of $P$ and $e_i$, we see that $D_{ii}$ is distributed as the squared norm of the first $n\varepsilon$ components of a uniformly random unit vector in $n$ dimensions. Hence
$$D_{ii} \stackrel{\mathrm{d}}{=} \frac{Z_{n\varepsilon}}{Z_{n\varepsilon} + Z_{n(1-\varepsilon)}}, \tag{B.5}$$
where $Z_\ell \sim \chi^2(\ell)$, $\ell \in \{n\varepsilon, n(1-\varepsilon)\}$, denote two independent chi-squared random variables with $\ell$ degrees of freedom. Standard tail bounds on chi-squared random variables, plus a union bound over $i \in [n]$, imply
$$\mathbb{P}\Big(\max_{i\in[n]}\big| D_{ii} - \varepsilon\big| \le C\sqrt{\frac{\log n}{n}}\Big) \ge 1 - \frac{1}{n^{10}}, \tag{B.6}$$
for $C$ a suitable constant. We then define
$$X = D^{-1/2}\, U U^{\mathsf{T}} D^{-1/2}. \tag{B.7}$$
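The construction (B.4)-(B.7) is concrete enough to simulate. The following sketch (parameters $n = 400$, $\varepsilon = 0.1$ and the random seed are illustrative choices, not from the paper) builds $X$ from the top eigenvectors of a GOE-type matrix and checks that it is a feasible SDP point with $D_{ii}$ concentrating near $\varepsilon$:

```python
import numpy as np

# Sketch of the feasible point X = D^{-1/2} U U^T D^{-1/2} of (B.7).
rng = np.random.default_rng(0)
n, eps = 400, 0.1
m = int(n * eps)

A = rng.normal(size=(n, n)) / np.sqrt(n)
W = (A + A.T) / np.sqrt(2.0)                    # off-diagonal variance 1/n
W[np.diag_indices(n)] = rng.normal(scale=np.sqrt(2.0 / n), size=n)  # diagonal variance 2/n

vals, vecs = np.linalg.eigh(W)
U = vecs[:, -m:]                                # top n*eps eigenvectors
d = np.sum(U ** 2, axis=1)                      # D_ii = (U U^T)_ii, ~ eps by (B.5)
Un = U / np.sqrt(d)[:, None]
X = Un @ Un.T                                   # X = D^{-1/2} U U^T D^{-1/2}

feasible = np.allclose(np.diag(X), 1.0) and np.linalg.eigvalsh(X).min() > -1e-8
value = np.trace(W @ X) / n                     # lower bound on Q(W)/n, cf. (B.8)-(B.12)
```

At this moderate size, `value` already lies well inside $(0, 2)$, consistent with the limit $2 - \delta(\varepsilon)$ derived below.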

This is clearly a feasible point of the optimization problem that defines $Q(W)$, cf. Eq. (1.8), i.e. $X \succeq 0$ and $X_{ii} = 1$. Therefore, letting $E = \varepsilon^{1/2} D^{-1/2}$,
$$Q(W) \ge \langle W, X\rangle \tag{B.8}$$
$$= \frac{1}{\varepsilon}\langle W, UU^{\mathsf{T}}\rangle - \frac{1}{\varepsilon}\langle W - EWE,\, UU^{\mathsf{T}}\rangle \tag{B.9}$$
$$\ge \frac{1}{\varepsilon}\sum_{\ell=1}^{n\varepsilon}\lambda_\ell(W) - \frac{1}{\varepsilon}\|W - EWE\|_2\, \|UU^{\mathsf{T}}\|_* \tag{B.10}$$
$$\ge n\lambda_{n\varepsilon}(W) - \frac{1}{\varepsilon}\|W\|_2 (1 + \|E\|_2)\|E - \mathrm{I}\|_2\, \|UU^{\mathsf{T}}\|_*. \tag{B.11}$$
Here $\|Z\|_*$ denotes the nuclear norm of $Z$ (the sum of the absolute values of its eigenvalues), and in the last inequality we used $\|W - EWE\|_2 \le \|W - EW\|_2 + \|EW - EWE\|_2 \le \|W\|_2\|E - \mathrm{I}\|_2 + \|E\|_2\|W\|_2\|E - \mathrm{I}\|_2$. Next, since $UU^{\mathsf{T}}$ is a projector onto $n\varepsilon$ dimensions, we have $\|UU^{\mathsf{T}}\|_* = n\varepsilon$, whence
$$\frac{1}{n} Q(W) \ge \lambda_{n\varepsilon}(W) - \|W\|_2\big(2 + \|E - \mathrm{I}\|_2\big)\|E - \mathrm{I}\|_2. \tag{B.12}$$
By Eq. (B.6), we have $\|E - \mathrm{I}\|_2 \to 0$ almost surely, and by a classical result [AGZ09] the following limits also hold almost surely:
$$\lim_{n\to\infty}\|W\|_2 = 2, \tag{B.13}$$
$$\lim_{n\to\infty}\lambda_{n\varepsilon}(W) = 2 - \delta(\varepsilon), \tag{B.14}$$

where $\delta(\varepsilon) \downarrow 0$ as $\varepsilon \to 0$. Indeed $\delta(\varepsilon)$ can be expressed explicitly in terms of the Wigner semicircle law: for $\varepsilon \in (0,1)$ it is the unique positive solution of the equation
$$\int_{2-\delta}^{2}\frac{\sqrt{4 - x^2}}{2\pi}\,\mathrm{d}x = \varepsilon. \tag{B.15}$$
Substituting in Eq. (B.12), we get, almost surely,
$$\liminf_{n\to\infty}\frac{1}{n} Q(W) \ge 2 - \delta(\varepsilon). \tag{B.16}$$
Note that $n\lambda_{\min}(W) \le Q(W) \le n\lambda_{\max}(W)$ and $\mathbb{E}|\lambda_{\min}(W)|, \mathbb{E}|\lambda_{\max}(W)| < \infty$. Hence, by dominated convergence,
$$\liminf_{n\to\infty}\frac{1}{n}\mathbb{E} Q(W) \ge 2 - \delta(\varepsilon), \tag{B.17}$$
and this implies $\lim_{n\to\infty}\mathbb{E} Q(W)/n = 2$ since $\varepsilon$ can be taken arbitrarily small. □
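Equation (B.15) can be inverted numerically. A sketch (not from the paper) using the closed-form antiderivative $F(x) = x\sqrt{4-x^2}/2 + 2\arcsin(x/2)$ of $\sqrt{4-x^2}$ and a simple bisection:

```python
import math

def semicircle_tail(delta):
    """epsilon(delta) = integral over [2-delta, 2] of sqrt(4-x^2)/(2*pi) dx,
    using F(x) = x*sqrt(4-x^2)/2 + 2*arcsin(x/2) and F(2) = pi."""
    x = 2.0 - delta
    F = x * math.sqrt(max(4.0 - x * x, 0.0)) / 2.0 + 2.0 * math.asin(x / 2.0)
    return (math.pi - F) / (2.0 * math.pi)

def delta_of_eps(eps, tol=1e-12):
    """Invert (B.15) by bisection: the unique positive delta with tail = eps."""
    lo, hi = 0.0, 4.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if semicircle_tail(mid) < eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As a sanity check, the semicircle is symmetric, so $\delta(1/2) = 2$ exactly, and $\delta(\varepsilon) \downarrow 0$ monotonically as $\varepsilon \to 0$.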


Appendix C. Community detection: A Stronger Version of Theorem 1.7

Throughout this section we will assume $n$ even, to avoid unnecessary technicalities. We will also fix, without loss of generality, $S = \{1, 2, \ldots, n/2\}$. Further, $v$ is the vector with $v_i = 1/\sqrt{n}$ for $i \in \{1, \ldots, n/2\}$ and $v_i = -1/\sqrt{n}$ for $i \in \{(n/2)+1, \ldots, n\}$. Finally, throughout this section we let $\gamma \equiv (a+b)/2$.

Notice that Theorem 1.7 follows immediately if we can prove that, assuming Eq. (1.18) with $\varepsilon > 0$, there exists $\delta_*(\varepsilon) > 0$ such that, under Hypothesis 1,
$$\lim_{n\to\infty}\mathbb{P}_1\Big\{\frac{1}{n} M^+(G;\gamma/n) \ge (1 + 2\delta_*)\, 2\sqrt{\gamma} + o(\sqrt{\gamma})\Big\} = 1. \tag{C.1}$$
There exists a very simple proof of this fact. Indeed
$$\frac{1}{n} M^+(G;\gamma/n) \ge \Big\langle v, \Big(A_G - \frac{\gamma}{n}\mathbf{1}\mathbf{1}^{\mathsf{T}}\Big) v\Big\rangle = \frac{2}{n}\Big( E_G(S) + E_G(S^c) - E_G(S, S^c) + \frac{L_G}{2}\Big), \tag{C.2}$$
where $E_G(S)$ and $E_G(S^c)$ are the numbers of edges with both end-points in $S$ and both end-points in $S^c$, respectively (excluding self-loops), and $E_G(S, S^c)$ is the number of edges with one end-point in $S$ and the other in $S^c$. Finally, $L_G$ is the number of self-loops. By the definition of Hypothesis 1, we have $E_G(S) + E_G(S^c) \sim \mathrm{Binom}(n(n-2)/4, a/n)$, $E_G(S, S^c) \sim \mathrm{Binom}(n^2/4, b/n)$, and $L_G \sim \mathrm{Binom}(n, a/n)$. By elementary tail bounds on binomial random variables, we conclude that, with high probability under $\mathbb{P}_1$,
$$\frac{1}{n} M^+(G;\gamma/n) \ge \frac{a-b}{2} - \sqrt{\frac{\log n}{n}}. \tag{C.3}$$
This yields the desired bound (C.1) with $\delta_*(\varepsilon) = \varepsilon/4$.

In the rest of this section, we will establish a characterization of the optimal constant $\xi_*$ that can be substituted for $2$ on the right-hand side of Eq. (1.18) in Theorem 1.7. This characterization is not explicit: it yields $\xi_*$ in terms of the asymptotic value of a sequence of SDPs with random Gaussian data. However, we believe it provides an insightful equivalence and can open the way to determining $\xi_*$.

C.1. Interpolation lemmas. Let us begin by defining a problem with Gaussian data. For $\xi \in \mathbb{R}$, we let $B(\xi) \in \mathbb{R}^{n\times n}$ be a symmetric matrix given by
$$B(\xi) \equiv \xi\, v v^{\mathsf{T}} + W. \tag{C.4}$$

Here $W$ is a symmetric random matrix with $(W_{ij})_{i\le j}$ independent, $W_{ij} \sim \mathsf{N}(0, 1/n)$ if $i < j$, and $W_{ii} \sim \mathsf{N}(0, 2/n)$. We then introduce a generalization and strengthening of the definition in Eq. (2.4).

Lemma C.1. With the above definitions, the following limit exists for all fixed $k \in \mathbb{N}$, $\xi \in \mathbb{R}$:
$$q(\xi; k) \equiv \lim_{n\to\infty}\frac{1}{n}\mathbb{E}\, Q_k(B(\xi)). \tag{C.5}$$

Its proof can be found in Appendix D and uses results from [GT02]. Note that, applying this for $\xi = 0$, we obtain that indeed $q_l(k) = q_u(k) = q(0; k)$. Another straightforward consequence of this lemma and Theorem 1.5 is the following.

Corollary C.2. The following limits exist and coincide:
$$q(\xi) \equiv \lim_{k\to\infty} q(\xi; k) = \lim_{n\to\infty}\frac{1}{n}\mathbb{E}\, Q(B(\xi)). \tag{C.6}$$
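Returning for a moment to the elementary bound (C.2)-(C.3): it is easy to verify by simulation that the feasible point built from $v$ captures the $(a-b)/2$ signal. A sketch (parameters $n$, $a$, $b$ and the seed are illustrative choices, not from the paper; self-loops are omitted for simplicity):

```python
import numpy as np

# Monte Carlo sketch of the lower bound (C.2)-(C.3) for the two-groups block model.
rng = np.random.default_rng(1)
n, a, b = 2000, 25.0, 5.0
gamma = (a + b) / 2.0

g = np.zeros(n, dtype=int)
g[n // 2:] = 1                                   # S = first n/2 vertices
P = np.where(np.equal.outer(g, g), a / n, b / n) # edge probabilities a/n, b/n
A = np.triu(rng.random((n, n)) < P, 1).astype(float)
A = A + A.T                                      # symmetric adjacency matrix

v = np.where(g == 0, 1.0, -1.0) / np.sqrt(n)
lower_bound = v @ A @ v - (gamma / n) * v.sum() ** 2   # <v, (A - (gamma/n) 1 1^T) v>
```

The quantity `lower_bound` concentrates around $(a-b)/2 = 10$ here, in line with (C.3).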


Finally, we have the following interpolation lemma, which generalizes Lemma 2.1 to the stochastic block model.

Lemma C.3. Let $G$ be a random graph distributed according to Hypothesis 1, i.e. according to the two-groups stochastic block model with $n$ vertices and parameters $a, b$. Define $\xi \equiv (a-b)/\sqrt{2(a+b)}$ and $\gamma \equiv (a+b)/2$. Letting $q(\xi; k)$ be defined as above, there exists a constant $C_0 = C_0(\xi)$ such that, with high probability,
$$\frac{1}{n} M_k^+(G;\gamma/n) \in \Big[ q(\xi;k)\sqrt{\gamma} - C_0 k^{2/3}\gamma^{1/3}\log(k\gamma),\; q(\xi;k)\sqrt{\gamma} + C_0 k^{2/3}\gamma^{1/3}\log(k\gamma)\Big]. \tag{C.7}$$

C.2. A stronger form of Theorem 1.7. Lemma C.3, together with Theorem 1.5, implies that, for large $\gamma$, $M^+(G;\gamma/n)/n = q(\xi)\sqrt{\gamma} + o_\gamma(\sqrt{\gamma})$. The following properties of the function $q$ are proven in Appendix D.2.

Lemma C.4. The function $\xi \mapsto q(\xi)$ is non-decreasing in $\xi \in \mathbb{R}_+$. Further, we have the bounds
$$0 \le \xi \le 1 \;\Rightarrow\; q(\xi) = 2, \tag{C.8}$$
$$\xi \ge 1 \;\Rightarrow\; \max(2, \xi) \le q(\xi) \le \xi + \frac{1}{\xi}. \tag{C.9}$$

We therefore define $\xi_* \in [1, 2]$ by
$$\xi_* \equiv \inf\big\{\xi \ge 0 \,:\, q(\xi) > 2\big\}. \tag{C.10}$$

Indeed, as mentioned in Remark 1.9, it is possible to prove by a perturbative argument that $\xi_* < 2$ strictly, cf. Appendix E.

Theorem C.5. Let $G$ be a random graph distributed according to Hypothesis 1, i.e. according to the two-groups stochastic block model with $n$ vertices and parameters $a, b$. Define $\xi \equiv (a-b)/\sqrt{2(a+b)}$ and $\gamma \equiv (a+b)/2$. Then, with high probability, for any fixed $\xi$,
$$\frac{1}{n} M^+(G;\gamma/n) \in \Big[ q(\xi)\sqrt{\gamma} - o_\gamma(\sqrt{\gamma}),\; q(\xi)\sqrt{\gamma} + o_\gamma(\sqrt{\gamma})\Big]. \tag{C.11}$$
In particular, if for some $\varepsilon > 0$,
$$\frac{a-b}{\sqrt{2(a+b)}} \ge \xi_* + \varepsilon, \tag{C.12}$$

then there exist $\delta_* = \delta_*(\varepsilon) > 0$ and $\gamma_* = \gamma_*(\varepsilon) > 0$ such that the following holds. If $(a+b)/2 \ge \gamma_*$, then the SDP-based test $T(\,\cdot\,; \delta_*)$ succeeds with high probability.

Proof. The fact that $T(\,\cdot\,; \delta_*)$ succeeds under condition (C.12) follows immediately from Theorem 1.1 and Eq. (C.11). We can therefore focus on proving the latter. Now, by Theorem 1.5 and Lemma C.3, we have, letting $\psi(\xi, k, \gamma) = C_0 k^{2/3}\gamma^{1/3}\log(k\gamma)$,
$$\lim_{n\to\infty}\mathbb{P}_1\Big\{ q(\xi;k)\sqrt{\gamma} - \psi(\xi,k,\gamma) \le \frac{1}{n} M^+(G;\gamma/n) \le \frac{1}{2\alpha_k - 1}\big( q(\xi;k)\sqrt{\gamma} + \psi(\xi,k,\gamma)\big)\Big\} = 1. \tag{C.13}$$
The conclusion follows by taking $k = k(\gamma) = \gamma^{1/10}$ and using Corollary C.2 alongside Eq. (1.12). □


C.3. Proof of Lemma C.3. The interpolation argument in this case proceeds in two steps: we first compare the sparse problem to a problem on the complete graph; next, we compare two distinct problems on the complete graph. Without loss of generality, we assume that $S = \{1, \ldots, n/2\}$.

Let $H^{\mathrm{spar}}(\sigma) = \frac{1}{\sqrt{\gamma}}\sum_{i,j=1}^{n}\big(A_{ij} - \frac{\gamma}{n}\big)\langle\sigma_i,\sigma_j\rangle$, where $A = (A_{ij})$ is the adjacency matrix of the random graph distributed according to Hypothesis 1 with parameters $a$ and $b$. Next, let $H^{\mathrm{den}}(\sigma) = \sum_{i,j=1}^{n} D_{ij}\langle\sigma_i,\sigma_j\rangle$, where $D = \xi\, v v^{\mathsf{T}} + U$, with $U$ a symmetric Gaussian matrix with mean $0$, $(U_{ij})_{i\le j}$ independent normal random variables, and, for $i \le j$,
$$\mathrm{Var}(U_{ij}) = \begin{cases} a/(n\gamma) & \text{if } \{i,j\} \subseteq S \text{ or } \{i,j\} \subseteq [n]\setminus S, \\ b/(n\gamma) & \text{if } i \in S, j \in [n]\setminus S \text{ or } i \in [n]\setminus S, j \in S. \end{cases} \tag{C.14}$$

The free energy densities corresponding to these Hamiltonians are defined using (3.3) and denoted by $\phi^{\mathrm{spar}}(\beta)$ and $\phi^{\mathrm{den}}(\beta)$, respectively. The proof of Lemma C.3 will use two lemmas. Our first lemma bounds the difference between $\phi^{\mathrm{spar}}(\beta)$ and $\phi^{\mathrm{den}}(\beta)$.

Lemma C.6. There exists $C = C(\xi)$, independent of $n$, $\gamma$ and $k$, such that, if $\beta \le \sqrt{\gamma}/C$, then
$$\big|\phi^{\mathrm{spar}}(\beta) - \phi^{\mathrm{den}}(\beta)\big| \le C\frac{\beta^3}{\sqrt{\gamma}} + o(1). \tag{C.15}$$

Proof. The proof proceeds along lines similar to the proof of Lemma 2.1. We start with a "Poissonized" version of the model in Hypothesis 1, with parameters $a$ and $b$. The graph is formed by adding $\mathrm{Pois}(n\gamma/2)$ edges to $[n]$, with the end-points chosen as follows. The first end-point of the edge is selected uniformly at random in $\{1, \ldots, n\}$. Given the first end-point $i$, the second end-point $j \in [n]$ is chosen with probability $\frac{a}{n\gamma}$ if $\{i,j\} \subseteq S$ or $\{i,j\} \subseteq [n]\setminus S$; otherwise $j$ is chosen with probability $\frac{b}{n\gamma}$. It is easy to see that it suffices to establish the result for this "Poissonized" graph. Indeed, it can be coupled to the original graph in such a way that the two differ, with high probability, in $O(1)$ edges.

Let $A(a,b) = (A_{ij})$ denote the adjacency matrix of the modified graph. For notational simplicity, the arguments will be suppressed when the underlying parameters are clear from the context. We observe that $A(a,b)$ is a symmetric matrix with $(A_{ij})_{i\le j}$ independent, $A_{ij} \sim \mathrm{Pois}(a/n)$ if $i < j$ and $\{i,j\} \subseteq S$ or $\{i,j\} \subseteq [n]\setminus S$, $A_{ii} \sim \mathrm{Pois}(a/2n)$, and $A_{ij} \sim \mathrm{Pois}(b/n)$ otherwise. Next we define the interpolating Hamiltonian
$$H_t(\sigma) = \frac{1}{\sqrt{\gamma}}\sum_{i,j=1}^{n}\Big( A_{ij}(t) - \frac{(1-t)\gamma}{n}\Big)\langle\sigma_i,\sigma_j\rangle + \sum_{i,j=1}^{n} D_{ij}(t)\langle\sigma_i,\sigma_j\rangle, \tag{C.16}$$

where $A(t) = A((1-t)a, (1-t)b)$ and $D(t) = t\,\xi v v^{\mathsf{T}} + \sqrt{t}\, U$. Let
$$\phi(\beta; t) = \frac{1}{n}\mathbb{E}\log\Big\{\int \exp(\beta H_t(\sigma))\,\mathrm{d}\nu(\sigma)\Big\} \tag{C.17}$$

denote the interpolating free energy. Then we see that $\phi(\beta; 0) = \phi^{\mathrm{spar}}(\beta)$ while $\phi(\beta; 1) = \phi^{\mathrm{den}}(\beta)$. As in (3.7), it suffices to uniformly bound the derivative of the interpolating free energy. To this end, we introduce the Gibbs measure $\mu_{\beta;t}(\cdot)$ corresponding to the interpolating Hamiltonian $H_t(\cdot)$. Further, let $S_1$ denote the set of $(i,j)$ pairs such that either $\{i,j\} \subseteq S$ or $\{i,j\} \subseteq [n]\setminus S$.


Setting $v v^{\mathsf{T}} = (\delta_{ij})$, by calculations similar to Section A.1 we have $\frac{\partial\phi(\beta;t)}{\partial t} = \mathrm{I} + \mathrm{II}$, where
$$\mathrm{I} = \frac{\beta\sqrt{\gamma}}{n^2}\sum_{i,j=1}^{n}\mathbb{E}[\mu_{\beta;t}(\langle\sigma_i,\sigma_j\rangle)] + \frac{\beta\xi}{n}\sum_{ij}\delta_{ij}\,\mathbb{E}[\mu_{\beta;t}(\langle\sigma_i,\sigma_j\rangle)] + \frac{\beta}{2n\sqrt{t}}\mathbb{E}[\mu_{\beta;t}(\langle U\sigma,\sigma\rangle)], \tag{C.18}$$
$$\mathrm{II} = -\frac{a}{2n^2}\sum_{S_1}\mathbb{E}\Big[\log\mu_{\beta;t}\Big(\exp\Big(\frac{2\beta}{\sqrt{\gamma}}\langle\sigma_i,\sigma_j\rangle\Big)\Big)\Big] - \frac{b}{2n^2}\sum_{S_1^c}\mathbb{E}\Big[\log\mu_{\beta;t}\Big(\exp\Big(\frac{2\beta}{\sqrt{\gamma}}\langle\sigma_i,\sigma_j\rangle\Big)\Big)\Big] + o(1). \tag{C.19}$$
An application of Gaussian integration by parts as in (A.22) yields
$$\mathbb{E}[\mu_{\beta;t}(\langle U\sigma,\sigma\rangle)] = 2\beta\sqrt{t}\,\Big\{\frac{a}{n\gamma}\sum_{S_1}\mathbb{E}\big[\mu_{\beta;t}(\langle\sigma_i,\sigma_j\rangle^2) - \mu_{\beta;t}(\langle\sigma_i,\sigma_j\rangle)^2\big] + \frac{b}{n\gamma}\sum_{S_1^c}\mathbb{E}\big[\mu_{\beta;t}(\langle\sigma_i,\sigma_j\rangle^2) - \mu_{\beta;t}(\langle\sigma_i,\sigma_j\rangle)^2\big]\Big\}. \tag{C.20}$$
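The Gaussian integration by parts used in (A.22) and (C.20) reduces, in one dimension, to $\mathbb{E}[Z f(Z)] = \mathbb{E}[f'(Z)]$ for $Z \sim \mathsf{N}(0,1)$. A quick Monte Carlo sketch (the test function $\tanh$, the seed, and the sample size are illustrative choices):

```python
import numpy as np

# One-dimensional toy check of Gaussian integration by parts:
# E[Z f(Z)] = E[f'(Z)] for Z ~ N(0,1), with f = tanh, f' = 1 - tanh^2.
rng = np.random.default_rng(5)
z = rng.normal(size=500_000)
lhs = np.mean(z * np.tanh(z))
rhs = np.mean(1.0 - np.tanh(z) ** 2)
```

Both estimates use the same samples, so the gap shrinks faster than either estimator's individual Monte Carlo error.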

The proof is completed by using a Taylor expansion and bounding the derivatives as in the proof of Lemma 3.1. □

We next notice that the $\beta \to \infty$ regularity proved in Lemma 3.2 holds for the random graph $G$ and the random matrix $D$ introduced here as well. The proof of Lemma 3.2 does indeed go through with essentially no change. Proceeding as in the proof of Lemma 2.1, cf. Section 3.1, we obtain the following. (Note that $\max_\sigma H^{\mathrm{spar}}(\sigma) = M_k^+(G;\gamma/n)/\sqrt{\gamma}$.)

Corollary C.7. Let $G$ be a random graph distributed according to Hypothesis 1 with parameters $a$ and $b$, and let $D = \xi\, v v^{\mathsf{T}} + U$ be a random matrix as defined above, with variances of the entries given by Eq. (C.14). Then the following holds with high probability, for a constant $C = C(\xi)$:
$$\Big|\frac{1}{n\sqrt{\gamma}} M_k^+(G;\gamma/n) - \frac{1}{n}\mathbb{E} Q_k(D)\Big| \le C k^{2/3}\gamma^{-1/6}\log(k\gamma). \tag{C.21}$$

Next, define three independent symmetric random matrices $Y_0$, $Y_1$, $Y_2$ with centered Gaussian entries. We choose the variances as follows, for $i \le j$:
$$\mathrm{Var}((Y_0)_{ij}) = \frac{b}{n\gamma}, \tag{C.22}$$
$$\mathrm{Var}((Y_1)_{ij}) = \begin{cases} (a-b)/(n\gamma) & \text{if } \{i,j\} \subseteq S \text{ or } \{i,j\} \subseteq [n]\setminus S, \\ 0 & \text{if } i \in S, j \in [n]\setminus S \text{ or } i \in [n]\setminus S, j \in S. \end{cases} \tag{C.23}$$

Finally, $\mathrm{Var}((Y_2)_{ij}) = (a-b)/(2n\gamma)$ for all $i < j$ and $\mathrm{Var}((Y_2)_{ii}) = a/(n\gamma)$ on the diagonal. With these definitions, we can couple the random matrices $B$ and $D$ defined above by letting
$$B = \xi\, v v^{\mathsf{T}} + Y_0 + Y_2, \tag{C.24}$$
$$D = \xi\, v v^{\mathsf{T}} + Y_0 + Y_1. \tag{C.25}$$
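The variance bookkeeping behind the coupling (C.24)-(C.25) is a one-line arithmetic check: $Y_0 + Y_1$ must reproduce the variances of $U$ in (C.14), and $Y_0 + Y_2$ those of $W$. A sketch (parameter values $a$, $b$ are illustrative; all variances are quoted in units of $1/(n\gamma)$):

```python
# Arithmetic check of the variance decomposition in (C.22)-(C.25).
a, b = 25.0, 5.0
gamma = (a + b) / 2.0

var_Y0 = b                                # Var((Y0)_ij) * n*gamma
var_Y1_in, var_Y1_out = a - b, 0.0        # within / across groups, (C.23)
var_Y2_off, var_Y2_diag = (a - b) / 2.0, a

# D = xi*v*v^T + Y0 + Y1 must match U in (C.14):
assert var_Y0 + var_Y1_in == a            # within-group variance a/(n*gamma)
assert var_Y0 + var_Y1_out == b           # across-group variance b/(n*gamma)

# B = xi*v*v^T + Y0 + Y2 must match W (off-diagonal Var 1/n, diagonal Var 2/n,
# i.e. gamma/(n*gamma) and 2*gamma/(n*gamma)):
assert var_Y0 + var_Y2_off == gamma
assert var_Y0 + var_Y2_diag == 2 * gamma
```

The identities hold for any $a > b > 0$ since $b + (a-b)/2 = (a+b)/2 = \gamma$ and $b + a = 2\gamma$.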


We then have
$$\frac{1}{n}\big|\mathbb{E} Q_k(B) - \mathbb{E} Q_k(D)\big| \le \mathbb{E}\|B - D\|_2 \tag{C.26}$$
$$\le \mathbb{E}\|Y_1\|_2 + \mathbb{E}\|Y_2\|_2 \tag{C.27}$$
$$\le 2\sqrt{\frac{a-b}{\gamma}} + 2\sqrt{\frac{a-b}{2\gamma}} + \frac{2}{n}, \tag{C.28}$$

where the last bound holds by standard estimates on the eigenvalues of GOE matrices [AGZ09]. Recalling that, by definition, $a - b = 2\xi\sqrt{\gamma}$, we conclude that
$$\Big|\frac{\gamma^{1/2}}{n}\mathbb{E} Q_k(B) - \frac{\gamma^{1/2}}{n}\mathbb{E} Q_k(D)\Big| \le 5\,\xi^{1/2}\gamma^{1/4}. \tag{C.29}$$
Substituting in Eq. (C.21), we obtain, for a constant $C = C(\xi)$, with high probability,
$$\Big|\frac{1}{n} M_k^+(G;\gamma/n) - \frac{\gamma^{1/2}}{n}\mathbb{E} Q_k(B)\Big| \le C\gamma^{1/4} + C k^{2/3}\gamma^{1/3}\log(k\gamma). \tag{C.30}$$
Recalling that $n^{-1}\mathbb{E} Q_k(B) \to q(\xi; k)$ by Lemma C.1, we obtain the claim of Lemma C.3.

Appendix D. Auxiliary lemmas for community detection

In this appendix we prove Lemma C.1 and Lemma C.4. Note that, by a change of variables in the optimization problem defining $Q_k(B(\xi))$, we can redefine $B(\xi)$ as per Eq. (C.4) with $v = \mathbf{1}/\sqrt{n}$. To see this, it is sufficient to replace the decision variable $X$ in Eq. (1.7) by $X'$ defined as follows:
$$X'_{ij} = \begin{cases} X_{ij} & \text{if } i, j \in S \text{ or } i, j \in S^c, \\ -X_{ij} & \text{if } i \in S, j \in S^c \text{ or } i \in S^c, j \in S, \end{cases} \tag{D.1}$$
and note that this change leaves the constraints unchanged. Hence, in the proof below we will take
$$B(\xi) = \frac{\xi}{n}\mathbf{1}\mathbf{1}^{\mathsf{T}} + W. \tag{D.2}$$
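The change of variables (D.1) is conjugation by a diagonal sign matrix, $X' = \mathrm{diag}(s)\, X\, \mathrm{diag}(s)$ with $s_i = \pm 1$, which preserves both constraints and matches the objective values. A small numerical sketch (dimensions and seed are illustrative):

```python
import numpy as np

# Check that the sign flip (D.1) preserves feasibility and the objective.
rng = np.random.default_rng(2)
n = 6
s = np.ones(n)
s[n // 2:] = -1.0                          # s_i = +1 on S, -1 on S^c

# A random feasible X: PSD with unit diagonal (a correlation matrix).
G = rng.normal(size=(n, n))
M = G @ G.T
d = np.sqrt(np.diag(M))
X = M / np.outer(d, d)

Xp = np.outer(s, s) * X                    # the change of variables (D.1)

assert np.allclose(np.diag(Xp), 1.0)               # unit diagonal preserved
assert np.linalg.eigvalsh(Xp).min() > -1e-10       # PSD preserved

# <v v^T, X> with v = s/sqrt(n) equals <(1/n) 1 1^T, X'>:
v = s / np.sqrt(n)
one = np.ones(n) / np.sqrt(n)
assert abs(v @ X @ v - one @ Xp @ one) < 1e-12
```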

D.1. Proof of Lemma C.1. We define the following Hamiltonian $H_\xi : (\mathbb{S}^{k-1})^n \to \mathbb{R}$:
$$H_\xi(\sigma) = \sum_{i,j=1}^{n} B(\xi)_{ij}\langle\sigma_i,\sigma_j\rangle \tag{D.3}$$
$$= \frac{\xi}{n}\Big\|\sum_{i=1}^{n}\sigma_i\Big\|_2^2 + \sum_{i,j=1}^{n} W_{ij}\langle\sigma_i,\sigma_j\rangle, \tag{D.4}$$
and the corresponding free energy density
$$\phi_n(\beta; \xi) = \frac{1}{n}\mathbb{E}\log\Big\{\int e^{\beta H_\xi(\sigma)}\,\mathrm{d}\nu(\sigma)\Big\}. \tag{D.5}$$
Using [GT02, Theorem 1], we conclude that the limit
$$\phi(\beta; \xi) = \lim_{n\to\infty}\phi_n(\beta; \xi) \tag{D.6}$$

exists for every $\beta > 0$. Further note that $\lim_{\beta\to\infty}\phi_n(\beta; \xi)/\beta = \mathbb{E} Q_k(B(\xi))/n$. By inspection of the proof of Lemma 3.2, we conclude that a bound analogous to (3.15) holds for the present Hamiltonian as well (the only property used is that $\|B(\xi)\|_2 \le 5 + \xi$ with probability $1 - e^{-cn}$). Hence we get, for a universal constant $C$,
$$\Big|\frac{1}{n}\mathbb{E} Q_k(B(\xi)) - \frac{1}{\beta}\phi_n(\beta; \xi)\Big| \le (5 + \xi)\varepsilon\sqrt{k} + C\frac{k}{\beta}\log\frac{k}{c\varepsilon} + o(1). \tag{D.7}$$
Define
$$q_l(\xi; k) = \liminf_{n\to\infty}\frac{1}{n}\mathbb{E} Q_k(B(\xi)), \qquad q_u(\xi; k) = \limsup_{n\to\infty}\frac{1}{n}\mathbb{E} Q_k(B(\xi)). \tag{D.8}$$
Taking the limit $n \to \infty$ in Eq. (D.7) and using Eq. (D.6), we get
$$\Big| q_l(\xi; k) - \frac{1}{\beta}\phi(\beta; \xi)\Big| \le (5 + \xi)\varepsilon\sqrt{k} + C\frac{k}{\beta}\log\frac{k}{c\varepsilon}, \tag{D.9}$$
$$\Big| q_u(\xi; k) - \frac{1}{\beta}\phi(\beta; \xi)\Big| \le (5 + \xi)\varepsilon\sqrt{k} + C\frac{k}{\beta}\log\frac{k}{c\varepsilon}, \tag{D.10}$$
and hence, by the triangle inequality,
$$\big| q_l(\xi; k) - q_u(\xi; k)\big| \le 2(5 + \xi)\varepsilon\sqrt{k} + 2C\frac{k}{\beta}\log\frac{k}{c\varepsilon}. \tag{D.11}$$
Taking $\varepsilon = 1/\beta$ and letting $\beta \to \infty$, we get $q_l(\xi; k) = q_u(\xi; k)$, and hence the limit $q(\xi; k)$ exists as claimed.

D.2. Proof of Lemma C.4. Note that $Q(\,\cdot\,)$ is non-decreasing in its argument, i.e.
$$B_1 \preceq B_2 \;\Rightarrow\; Q(B_1) \le Q(B_2). \tag{D.12}$$

Indeed, if $X_1$ is an optimizer for the SDP with argument $B_1$, then $Q(B_1) = \langle B_1, X_1\rangle \le \langle B_2, X_1\rangle \le Q(B_2)$ (the first inequality follows from the positive semidefiniteness of $X_1$). In particular, this implies $\mathbb{E} Q(B(\xi_1)) \le \mathbb{E} Q(B(\xi_2))$ whenever $\xi_1 \le \xi_2$. Since, by Corollary C.2, $q(\xi) = \lim_{n\to\infty} n^{-1}\mathbb{E} Q(B(\xi))$, it follows that $\xi \mapsto q(\xi)$ is non-decreasing. Further, since $B(\xi) \succeq B(0)$ for all $\xi \ge 0$, it follows from Lemma 2.2 that $q(\xi) \ge 2$ for all $\xi \ge 0$. Finally, [FP07] yields
$$\lim_{n\to\infty}\lambda_{\max}(B(\xi)) = \begin{cases} 2 & \text{if } \xi \in [0, 1], \\ \xi + 1/\xi & \text{if } \xi > 1. \end{cases} \tag{D.13}$$
(These limits hold almost surely but also in $L^1$, and hence in expectation.) Since $Q(B(\xi)) \le n\lambda_{\max}(B(\xi))$, it follows that $q(\xi) \le 2$ for $\xi \in [0, 1]$ and $q(\xi) \le \xi + 1/\xi$ for $\xi > 1$, thus concluding the proof.

Appendix E. A better bound on ξ∗

In this appendix we prove that the factor $2$ in Theorem 1.7 can be replaced by some $\xi_0 \in [1, 2)$ strictly. Indeed, we prove the following.

Proposition E.1. Let $\xi_*$ be defined as per Eq. (C.10). Then $\xi_* < 2$ strictly.

Proof. Without loss of generality, we can define $B(\xi)$ as in Appendix D, i.e.
$$B(\xi) \equiv \frac{\xi}{n}\mathbf{1}\mathbf{1}^{\mathsf{T}} + W. \tag{E.1}$$
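The eigenvalue transition (D.13) for the matrix (E.1) is readily visible at moderate $n$. A Monte Carlo sketch (the size $n = 1500$, the two values of $\xi$, and the seed are illustrative choices, not from the paper):

```python
import numpy as np

# Sketch of the [FP07] transition for B(xi) = (xi/n) 1 1^T + W.
rng = np.random.default_rng(3)
n = 1500

def lam_max(xi):
    """Top eigenvalue of one sample of B(xi), with W a GOE-type matrix."""
    A = rng.normal(size=(n, n)) / np.sqrt(n)
    W = (A + A.T) / np.sqrt(2.0)
    W[np.diag_indices(n)] = rng.normal(scale=np.sqrt(2.0 / n), size=n)
    B = (xi / n) * np.ones((n, n)) + W
    return np.linalg.eigvalsh(B)[-1]

below = lam_max(0.5)   # xi <= 1: top eigenvalue sticks to the bulk edge 2
above = lam_max(2.0)   # xi > 1: it pops out near xi + 1/xi = 2.5
```

This is exactly why spectral detection of the rank-one signal fails for $\xi \le 1$: the planted component leaves no trace in the top eigenvalue.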


Fixing $\rho > 0$ bounded, we let
$$V(\rho) \equiv \Big\{ i \in [n] \,:\, \sum_{j=1}^{n} W_{ij} \le -\rho\sqrt{(n+1)/n}\Big\}. \tag{E.2}$$

Note that $\sum_{j=1}^{n} W_{ij} \sim \mathsf{N}(0, (n+1)/n)$, whence
$$\mathbb{E}|V(\rho)| = n\,\Phi(-\rho), \tag{E.3}$$
with $\Phi(x) \equiv \int_{-\infty}^{x} e^{-t^2/2}\,\mathrm{d}t/\sqrt{2\pi}$ denoting the Gaussian distribution function. We also have, for $i \ne j$, $i, j \in [n]$, letting $\rho_n \equiv \rho\sqrt{(n+1)/n}$ and $Z, Z', Z_1, Z_2 \sim \mathsf{N}(0,1)$ independent,
$$\mathbb{P}\big( i \in V(\rho), j \in V(\rho)\big) - \mathbb{P}\big( i \in V(\rho)\big)\mathbb{P}\big( j \in V(\rho)\big) = \tag{E.4}$$
$$= \mathbb{P}\big( Z_1 + n^{-1/2} Z \le -\rho_n \,;\, Z_2 + n^{-1/2} Z \le -\rho_n\big) - \Phi(-\rho)^2 \tag{E.5}$$
$$= \mathbb{E}\big\{\Phi(-\rho_n + n^{-1/2} Z)^2\big\} - \Big(\mathbb{E}\big\{\Phi(-\rho_n + n^{-1/2} Z)\big\}\Big)^2 \tag{E.6}$$
$$= \frac{1}{2}\,\mathbb{E}\Big\{\big(\Phi(-\rho_n + n^{-1/2} Z) - \Phi(-\rho_n + n^{-1/2} Z')\big)^2\Big\}. \tag{E.7}$$
Since $\Phi'(x) \le 1/\sqrt{2\pi}$, this implies
$$\mathbb{P}\big( i \in V(\rho), j \in V(\rho)\big) - \mathbb{P}\big( i \in V(\rho)\big)\mathbb{P}\big( j \in V(\rho)\big) \le \frac{1}{4\pi n}\,\mathbb{E}\{(Z - Z')^2\} = \frac{1}{2\pi n}. \tag{E.8}$$
Hence
$$\mathrm{Var}(|V(\rho)|) \le n\Phi(-\rho)(1 - \Phi(-\rho)) + n(n-1)\Big[\mathbb{P}\big(1 \in V(\rho), 2 \in V(\rho)\big) - \mathbb{P}\big(1 \in V(\rho)\big)\mathbb{P}\big(2 \in V(\rho)\big)\Big] \tag{E.9}$$
$$\le \frac{n}{2}. \tag{E.10}$$
In particular, by the Chebyshev inequality,
$$\mathbb{P}\Big(\big| |V(\rho)| - n\Phi(-\rho)\big| \ge n\varepsilon\Big) \le \frac{1}{2n\varepsilon^2}. \tag{E.11}$$
Next, let $\ell = \lambda n$ for some $\lambda \in (0, 1/2)$. For any $R \subseteq [n]$, $|R| = \ell$, we have $\langle\mathbf{1}_R, W\mathbf{1}_R\rangle \sim \mathsf{N}(0, 2\ell^2/n)$. Then
$$\mathbb{P}\Big(\max_{R\subseteq[n], |R|=\ell}\langle\mathbf{1}_R, W\mathbf{1}_R\rangle \ge n t\Big) \le \binom{n}{\ell}\max_{R\subseteq[n], |R|=\ell}\mathbb{P}\big(\langle\mathbf{1}_R, W\mathbf{1}_R\rangle \ge n t\big) \tag{E.12}$$
$$\le e^{n H(\lambda)}\,\Phi\Big(-\sqrt{n^3 t^2/(2\ell^2)}\Big) \le \exp\Big\{ n\Big(\lambda\log(e/\lambda) - \frac{t^2}{4\lambda^2}\Big)\Big\}, \tag{E.13}$$
with $H(\lambda) = -\lambda\log\lambda - (1-\lambda)\log(1-\lambda)$ the entropy function. Note that the exponent is negative for $t > F_*(\lambda) = (4\lambda^2 H(\lambda))^{1/2}$. By a standard calculation, there exists a constant $C = C(\lambda)$ such that
$$\mathbb{E}\Big\{\max_{R\subseteq[n], |R|=\ell}\langle\mathbf{1}_R, W\mathbf{1}_R\rangle\Big\} \le n F_*(\lambda) + C. \tag{E.14}$$

Now define $s \in \{+1, -1\}^n$ by
$$s_i = \begin{cases} +1 & \text{if } i \in V(\rho), \\ -1 & \text{otherwise.} \end{cases} \tag{E.15}$$


We then have
$$Q(B(\xi)) \ge \langle s, B(\xi) s\rangle \tag{E.16}$$
$$= \langle\mathbf{1}, B(\xi)\mathbf{1}\rangle - 4\langle\mathbf{1}_{V(\rho)}, B(\xi)\mathbf{1}\rangle + 4\langle\mathbf{1}_{V(\rho)}, B(\xi)\mathbf{1}_{V(\rho)}\rangle \tag{E.17}$$
$$= \xi\Big( n - 4|V(\rho)| + \frac{4|V(\rho)|^2}{n}\Big) + \langle\mathbf{1}, W\mathbf{1}\rangle - 4\langle\mathbf{1}_{V(\rho)}, W\mathbf{1}\rangle + 4\langle\mathbf{1}_{V(\rho)}, W\mathbf{1}_{V(\rho)}\rangle \tag{E.18}$$
$$\ge \xi\Big( n - 4|V(\rho)| + \frac{4|V(\rho)|^2}{n}\Big) + \langle\mathbf{1}, W\mathbf{1}\rangle + 4\rho\,|V(\rho)| + 4\min_{R\subseteq[n],\,|R|\le|V(\rho)|}\langle\mathbf{1}_R, W\mathbf{1}_R\rangle. \tag{E.19}$$
Using Eqs. (E.11) and (E.13), and letting $\lambda \equiv \Phi(-\rho) \in (0, 1/2)$, we get
$$\frac{1}{n}\mathbb{E} Q(B(\xi)) \ge \xi(1 - 2\lambda)^2 + 4\lambda\,\Phi^{-1}(1 - \lambda) - 4\sqrt{4\lambda^2 H(\lambda)} - o(1), \tag{E.20}$$
and hence, for any $\lambda \in (0, 1/2)$,
$$q(\xi) \ge q_0(\xi; \lambda) \equiv \xi(1 - 2\lambda)^2 + 4\lambda\,\Phi^{-1}(1 - \lambda) - 4\sqrt{4\lambda^2 H(\lambda)}. \tag{E.21}$$

We conclude that
$$\xi_* \le \xi_0 \equiv \inf\big\{\xi \ge 1 \,:\, \exists\lambda \in (0, 1/2) \text{ s.t. } q_0(\xi; \lambda) > 2\big\}. \tag{E.22}$$
We now claim that $\xi_0 < 2$ strictly. Indeed, expanding $q_0(2; \lambda)$ for small $\lambda$ yields
$$q_0(2; \lambda) = 2 + 4\lambda\big(2\log(1/\lambda)\big)^{1/2}\big(1 + o_\lambda(1)\big). \tag{E.23}$$

It follows that there exists $\lambda_0$ such that $q_0(2; \lambda_0) > 2$ strictly. By continuity of the function $\xi \mapsto q_0(\xi; \lambda_0)$, there exists $\varepsilon > 0$ such that $q_0(2 - \varepsilon; \lambda_0) > 2$ as well. This proves our claim. □
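The existence of a $\lambda_0$ with $q_0(2; \lambda_0) > 2$ can also be checked numerically from (E.21). A sketch (the bisection-based $\Phi^{-1}$ and the scan over $\lambda = 10^{-k}$ are implementation choices of this sketch, not from the paper):

```python
import math

def Phi(x):
    """Standard Gaussian CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Phi_inv(p, tol=1e-12):
    """Inverse Gaussian CDF by bisection (adequate for this sketch)."""
    lo, hi = -10.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def H(lam):
    """Entropy function of (E.13), natural logarithm."""
    return -lam * math.log(lam) - (1.0 - lam) * math.log(1.0 - lam)

def q0(xi, lam):
    """Eq. (E.21)."""
    return (xi * (1.0 - 2.0 * lam) ** 2 + 4.0 * lam * Phi_inv(1.0 - lam)
            - 4.0 * math.sqrt(4.0 * lam ** 2 * H(lam)))

# Scan small lambda: q0(2, lambda) rises above 2 (e.g. near lambda ~ 1e-3),
# which witnesses xi_0 < 2.
best = max(q0(2.0, 10.0 ** (-k)) for k in range(2, 12))
```

The maximizing $\lambda$ is tiny, consistent with the expansion (E.23), where the gain $4\lambda(2\log(1/\lambda))^{1/2}$ is positive but small.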


References

[ABH14] Emmanuel Abbe, Afonso S. Bandeira, and Georgina Hall, Exact recovery in the stochastic block model, arXiv:1405.3267 (2014).
[AGZ09] Greg W. Anderson, Alice Guionnet, and Ofer Zeitouni, An introduction to random matrices, Cambridge Studies in Advanced Mathematics, Cambridge University Press, 2009.
[AKS98] Noga Alon, Michael Krivelevich, and Benny Sudakov, Finding a large hidden clique in a random graph, Random Structures and Algorithms 13 (1998), no. 3-4, 457–466.
[AMMN06] Noga Alon, Konstantin Makarychev, Yury Makarychev, and Assaf Naor, Quadratic forms on graphs, Inventiones Mathematicae 163 (2006), no. 3, 499–522.
[BdOFV10] Jop Briët, Fernando Mário de Oliveira Filho, and Frank Vallentin, The positive semidefinite Grothendieck problem with rank constraint, Automata, Languages and Programming, Springer, 2010, pp. 31–42.
[BGT13] Mohsen Bayati, David Gamarnik, and Prasad Tetali, Combinatorial approach to the interpolation method and scaling limits in sparse random graphs, Annals of Probability 41 (2013), no. 6, 4080–4115.
[BLM13] S. Boucheron, G. Lugosi, and P. Massart, Concentration inequalities: A nonasymptotic theory of independence, Oxford University Press, 2013.
[Bri10] Jop Briët, Fernando Mário de Oliveira Filho, and Frank Vallentin, Grothendieck inequalities for semidefinite programs with rank constraint, arXiv:1011.1754 (2010).
[CO06] Amin Coja-Oghlan, A spectral heuristic for bisecting random graphs, Random Structures & Algorithms 29 (2006), no. 3, 351–398.
[CO10] Amin Coja-Oghlan, Graph partitioning via adaptive spectral techniques, Combinatorics, Probability and Computing 19 (2010), no. 2, 227–284.
[DKMZ11] Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová, Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications, Physical Review E 84 (2011), no. 6, 066106.
[DMS15] Amir Dembo, Andrea Montanari, and Subhabrata Sen, Extremal cuts of sparse random graphs, arXiv:1503.03923 (2015).
[FL03] Silvio Franz and Michele Leone, Replica bounds for optimization problems and diluted spin systems, J. Stat. Phys. 111 (2003), 535–564.
[FLT03] Silvio Franz, Michele Leone, and Fabio L. Toninelli, Replica bounds for diluted non-Poissonian spin systems, J. Phys. A 36 (2003), 10967–10985.
[FO05] Uriel Feige and Eran Ofek, Spectral techniques applied to sparse random graphs, Random Structures & Algorithms 27 (2005), no. 2, 251–275.
[FP07] Delphine Féral and Sandrine Péché, The largest eigenvalue of rank one deformation of large Wigner matrices, Communications in Mathematical Physics 272 (2007), no. 1, 185–228.
[Gro96] Alexander Grothendieck, Résumé de la théorie métrique des produits tensoriels topologiques, Resenhas do Instituto de Matemática e Estatística da Universidade de São Paulo 2 (1996), no. 4, 401–481.
[GT02] Francesco Guerra and Fabio L. Toninelli, The infinite volume limit in generalized mean field disordered models, Markov Processes Relat. Fields 9 (2002), 195–207.
[GT04] Francesco Guerra and Fabio L. Toninelli, The high temperature region of the Viana-Bray diluted spin glass models, J. Stat. Phys. 115 (2004), 531–555.
[GV14] Olivier Guédon and Roman Vershynin, Community detection in sparse networks via Grothendieck's inequality, arXiv:1411.4686 (2014).
[HWX14] Bruce Hajek, Yihong Wu, and Jiaming Xu, Achieving exact cluster recovery threshold via semidefinite programming, arXiv:1412.6156 (2014).
[HWX15] Bruce Hajek, Yihong Wu, and Jiaming Xu, Achieving exact cluster recovery threshold via semidefinite programming: Extensions, arXiv:1502.07738 (2015).
[JMRT15] Adel Javanmard, Andrea Montanari, and Federico Ricci-Tersenghi, Phase transitions in semidefinite relaxations: A statistical physics analysis, in preparation, 2015.
[KMM+13] Florent Krzakala, Cristopher Moore, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborová, and Pan Zhang, Spectral redemption in clustering sparse networks, Proceedings of the National Academy of Sciences 110 (2013), no. 52, 20935–20940.
[KMO10] Raghunandan H. Keshavan, Andrea Montanari, and Sewoong Oh, Matrix completion from noisy entries, Journal of Machine Learning Research 11 (2010), 2057–2078.
[KN12] Subhash Khot and Assaf Naor, Grothendieck-type inequalities in combinatorial optimization, Communications on Pure and Applied Mathematics 65 (2012), no. 7, 992–1035.
[KS03] Michael Krivelevich and Benny Sudakov, The largest eigenvalue of sparse random graphs, Combinatorics, Probability and Computing 12 (2003), no. 1, 61–72.
[Lin22] Jarl Waldemar Lindeberg, Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung, Mathematische Zeitschrift 15 (1922), no. 1, 211–225.
[Mas14] Laurent Massoulié, Community detection thresholds and the weak Ramanujan property, Proceedings of the 46th Annual ACM Symposium on Theory of Computing, ACM, 2014, pp. 694–703.
[McS01] Frank McSherry, Spectral partitioning of random graphs, 42nd IEEE Symposium on Foundations of Computer Science, IEEE, 2001, pp. 529–537.
[Meg01] Alexandre Megretski, Relaxations of quadratic programs in operator theory and system analysis, Systems, Approximation, Singular Integral Operators, and Related Topics, Springer, 2001, pp. 365–392.
[MNS12] Elchanan Mossel, Joe Neeman, and Allan Sly, Stochastic block models and reconstruction, arXiv:1202.1499 (2012).
[NJW+02] Andrew Y. Ng, Michael I. Jordan, Yair Weiss, et al., On spectral clustering: Analysis and an algorithm, Advances in Neural Information Processing Systems 2 (2002), 849–856.
[NRT99] Arkadi Nemirovski, Cornelis Roos, and Tamás Terlaky, On maximization of quadratic form over intersection of ellipsoids with common center, Mathematical Programming 86 (1999), no. 3, 463–473.
[PT04] Dmitry Panchenko and Michel Talagrand, Bounds for diluted mean-fields spin glass models, Probability Theory and Related Fields 130 (2004), no. 3, 319–336.
[Rie74] Ronald E. Rietz, A proof of the Grothendieck inequality, Israel Journal of Mathematics 19 (1974), no. 3, 271–276.
[Vu05] Van H. Vu, Spectral norm of random matrices, Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing, ACM, 2005, pp. 423–430.

Department of Electrical Engineering and Department of Statistics, Stanford University

Department of Statistics, Stanford University