The Hidden Subgraph Problem

Hamid Javadi* and Andrea Montanari†

arXiv:1511.05254v1 [math.ST] 17 Nov 2015

November 16, 2015
Abstract

We introduce a statistical model for the problem of finding a subgraph with a specified topology in an otherwise random graph. This task plays an important role in the analysis of social and biological networks, in which small subgraphs with a specific structure have important functional roles. Within our model, a single copy of a subgraph is added ('planted') in an Erdős–Rényi random graph with n vertices and edge probability q0. We ask whether the resulting graph can be distinguished reliably from a pure Erdős–Rényi random graph, and present two types of results. First, we investigate the question from a purely statistical perspective and ask whether there is any test that can distinguish between the two graph models. We provide necessary and sufficient conditions that are essentially tight for subgraphs of size asymptotically smaller than n^{2/5}. Next, we study two polynomial-time algorithms for solving the same problem: a spectral algorithm and a semidefinite programming (SDP) relaxation. For the spectral algorithm, we establish sufficient conditions under which it distinguishes the two graph models with high probability; under the same conditions, the spectral algorithm indeed identifies the hidden subgraph. The spectral algorithm is substantially sub-optimal with respect to the optimal test, and we show that a similar gap is present for the SDP approach. This points at a large gap between statistical and computational limits for this problem.
1 Introduction
'Motifs' play a key role in the analysis of social and biological networks. Quoting from an influential paper in this area [MSOI+02], the term 'motif' broadly refers to "patterns of interconnections occurring in complex networks at numbers that are significantly higher than those in randomized networks." For instance, the authors of [MSOI+02] considered directed graph representations of various types of data: gene regulation networks, neural circuits, food webs, the world wide web, electronic circuits. They identified a number of small subgraphs that are found in atypically large numbers in such networks, and provided interpretations of their functional role. The analysis of motifs in large biological networks was pursued in a number of publications, see e.g. [KIMA04, YLSK+04, KA05, SSR+05, Alo07].

* Department of Electrical Engineering, Stanford University
† Departments of Electrical Engineering and Statistics, Stanford University
The analysis of subgraph frequencies has an even longer history within sociology, in part because sociological theories are predictive of specific subgraph occurrences. We refer to [Gra73] for early insights, and to [WF94, EK10] for recent reviews of this research area.

Despite the practical relevance of this problem, little is known about how to assess the statistical significance of network motifs. (We refer to Section 2.1 for a discussion of related work.) In this paper we introduce a model to address this question in a specific regime, namely when a single atypical subgraph is added to an otherwise random graph.¹ We next provide an informal discussion of our model and results, deferring to the next sections for a formal description.

We formalize the motif detection problem as a binary hypothesis testing problem, whereby we want to distinguish a graph with a hidden atypical subgraph from a purely random graph. Our null model is the Erdős–Rényi random graph over n vertices, whereby each pair of vertices is connected by an edge independently with probability q0. Under the alternative, v(Hn) of the n vertices are selected at random and a specific graph Hn is copied there (for each n, Hn is a fixed graph over v(Hn) vertices). We ask whether there exists a test that distinguishes between the two graph models with high probability (i.e. with error probability converging to zero as n → ∞). We obtain three types of results that, for the sake of simplicity, we describe here assuming that Hn is a regular graph of degree dn:

Statistical limits. It is clear that detecting the subgraph Hn becomes easier when Hn has a larger number of edges. We establish the precise location of the detection threshold, under the assumption that the atypical subgraph is not too large. More precisely, assuming v(Hn) = o(n^{2/5}), we show that no successful test exists for dn ≤ (1 − ε) log n/log(1/q0), while a test that succeeds with high probability exists for dn ≥ (1 + ε) log n/log(1/q0).
Spectral algorithm. Performing an optimal test (e.g. the likelihood ratio test) has complexity of order n^{v(Hn)}, which is non-polynomial. Motivated by this, we introduce a spectral test which requires computing the leading eigenvector of a matrix of dimensions n × n. This has total complexity at most n³. We establish sufficient conditions under which this test succeeds with high probability. The most important condition requires that the leading eigenvalue of the adjacency matrix of Hn (suitably modified) be larger than C√n. In particular, this requires v(Hn) ≥ C√n. It turns out that, under the same condition, a modified spectral algorithm not only detects, but also identifies the hidden subgraph.

Semidefinite programming relaxation. Note that, if the hidden subgraph Hn is dense (e.g. has average degree of the same order as the number of nodes, dn = Θ(v(Hn))), there is a large gap between the fundamental statistical limit (which only requires v(Hn) ≥ C log n) and the detection threshold achieved by the spectral algorithm (which requires v(Hn) ≥ C√n). This motivates the study of more complex algorithmic approaches. We introduce a semidefinite programming relaxation of the subgraph detection problem that should be at least as powerful as the spectral method. However, we prove a negative result: the SDP approach is not successful unless, again, the leading eigenvalue of Hn is of order √n. In other words, a similarly large gap between statistical and algorithmic thresholds exists for SDP as for spectral methods.
¹ The case of a large number of subgraphs will be considered in a forthcoming publication [JM15].
These results suggest that it might be computationally hard to find a hidden subgraph in a large, otherwise random graph, even in regimes in which this task is statistically feasible. While this phenomenon has already been extensively investigated for the hidden clique problem (see, e.g., [AKS98, FR10] and Section 2.1), its generality is nevertheless surprising.

The rest of this paper is organized as follows. In Section 2 we present a formal statement of the problem, and a discussion of related literature. Section 3 presents our results on the fundamental statistical limits of detection. The spectral algorithm and the SDP approach are presented and analyzed in Section 4. Proofs are collected in Sections 5, 6, 7.
2 Problem statement
Let P^0 = (P_{0,ij})_{i,j∈[n]} and P^1 = (P_{1,ij})_{i,j∈[n]} be two (sequences of) symmetric matrices with entries in [0, 1]. (Here and below, [n] = {1, 2, ..., n} denotes the set of the first n integers.) The diagonal values P_{a,ii} are immaterial and will be set to one by definition. We then define the probability laws P_{0,n} and P_{1,n} over the space of graphs with vertex set Vn = [n] as follows. Let σ ∈ Sn be a uniformly random permutation on [n]. Then, for Gn ∼ P_{a,n}, the events (i, j) ∈ E(Gn) are conditionally independent given σ, with

    P_{a,n}( (i, j) ∈ E(Gn) | σ ) = P_{a,σ(i)σ(j)} .   (2.1)

We denote by Gn the space of graphs over vertex set Vn = [n].

Definition 2.1. We say that the two laws P0, P1 are strongly distinguishable if there exists a (sequence of) functions T : Gn → {0, 1} such that

    lim sup_{n→∞} P_{0,n}( T(Gn) = 1 ) = lim sup_{n→∞} P_{1,n}( T(Gn) = 0 ) = 0 .

We say that they are weakly distinguishable if there exists T : Gn → {0, 1} such that

    lim sup_{n→∞} [ P_{0,n}( T(Gn) = 1 ) + P_{1,n}( T(Gn) = 0 ) ] < 1 .
Using the above definitions, we can see that strong and weak distinguishability are equivalent to lim inf_{n→∞} ||P_{0,n} − P_{1,n}||_TV = 1 and lim inf_{n→∞} ||P_{0,n} − P_{1,n}||_TV > 0, respectively.

Rather than considering two general sequences of symmetric matrices P^0_n and P^1_n, we will focus here on a specific example. Namely, P^0_n will correspond to a standard Erdős–Rényi random graph, and P^1_n to an Erdős–Rényi random graph with a planted copy of a small graph Hn. In order to define these formally, fix 0 ≤ q0 ≤ 1, and let Hn = (V(Hn), E(Hn)) be a graph sequence, indexed by n. Let us emphasize that Hn is a non-random graph on v(Hn) ≡ |V(Hn)| vertices.

Given a graph F, we denote by L̃(F, n) the set of labelings of the vertices of F taking values in [n], i.e.

    L̃(F, n) ≡ { φ : V(F) → [n]  s.t.  φ(i) ≠ φ(j) ∀ i ≠ j } .   (2.2)

(In particular, |L̃(G, n)| = n!/(n − v(G))!.) For an edge e = (i, j) ∈ E(G), we let φ(e) be the unordered pair (φ(i), φ(j)), and hence φ(E(G)) = {φ(e) : e ∈ E(G)}.
With this notation, fix Hn and φ0 ∈ L̃(Hn, n), a labeling of its vertices in [n], n ≥ v(Hn). We let

    P_{0,ij} = q0   for all (i, j) ,   (2.3)

    P_{1,ij} = 1 if (i, j) ∈ φ0(E(Hn)), and P_{1,ij} = q0 otherwise.   (2.4)
Throughout the paper, we assume that Hn and q0 are fixed and often drop them as arguments of various functions. We are interested in the large-graph behavior of the above testing problem. An instance of this problem is parametrized by the pair (q0, {Hn}n∈N), where q0 ∈ (0, 1) is a number and {Hn}n∈N is a sequence of graphs. Two questions will be addressed in the next sections:

1. Under which conditions on q0, {Hn}n∈N are the two laws P0, P1 weakly distinguishable? Under which conditions are they strongly distinguishable?

2. Assuming the conditions for distinguishability of P0, P1 are satisfied, under which conditions does there exist a polynomial-time computable test T(·) that distinguishes P0 from P1?
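For intuition, both laws are straightforward to simulate; a minimal numpy sketch (function and variable names are ours, not from the paper):

```python
import numpy as np

def sample_model(n, q0, H_edges=None, seed=0):
    """Sample G_n as an adjacency matrix: under the null, each edge is
    present independently with probability q0; under the alternative, a
    copy of H (edges over {0, ..., k-1}) is additionally planted on a
    uniformly random set of vertices. Returns (A, phi); phi is None
    under the null, else the injective labeling of V(H) into [n]."""
    rng = np.random.default_rng(seed)
    A = np.triu((rng.random((n, n)) < q0).astype(int), 1)
    A = A + A.T                        # symmetric, zero diagonal
    phi = None
    if H_edges is not None:
        k = 1 + max(max(e) for e in H_edges)
        phi = rng.choice(n, size=k, replace=False)   # random injective labeling
        for i, j in H_edges:
            A[phi[i], phi[j]] = A[phi[j], phi[i]] = 1
    return A, phi

# Plant a triangle (H = K_3) in an Erdos-Renyi graph G(20, 0.1):
A, phi = sample_model(20, 0.1, H_edges=[(0, 1), (0, 2), (1, 2)])
```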
2.1 Related work
To the best of our knowledge, the problem of testing random graphs for atypical small subgraphs has been studied, from a statistical perspective, only in specific scenarios. Following the original work on motifs in biological networks [MSOI+02], several papers developed algorithms to sample uniformly random networks with certain given characteristics, e.g. with a given degree sequence [MKI+03, TS07, BD11]. Uniform samples can be used for assessing the significance level of specific subgraph counts in a real network under consideration. Let, for instance, N_H(G) denote the number of copies of a certain small graph H in G. If in a real network of interest we find N_H(G) = t, the probability P0(N_H(G) ≥ t) is used as a significance level for this discovery.

Let us mention two key differences with respect to the problem considered in the present paper. First, we focus on conditions under which the laws P0 and P1 are strongly distinguishable. In particular, under the null model N_H(G) = 0 with very high probability, and Monte Carlo is not effective in assessing significance. Second, and most importantly, the work of [MKI+03, TS07, BD11] implicitly assumes that the subgraph H has bounded size, so that N_H(G) can be computed in time n^{v(H)} by exhaustive search. Here we consider instead large subgraphs Hn, and address the computational challenge of testing for Hn in polynomial time. Let us emphasize that, while we typically assume Hn to have diverging size, in practice it is impossible to perform exhaustive search already for quite small subgraphs. For instance, if n = 10^5 and v(Hn) = 6, exhaustive search requires on the order of 10^30/6! ≥ 10^27 operations.

The specific case in which Hn is the complete graph over k(n) vertices, Hn = K_{k(n)}, is known as the hidden (or 'planted') clique problem, and has been extensively studied within theoretical computer science, see e.g. [Jer92, AKS98, FR10, AV11, DM13].
In particular, recent work [MPW15, DM15, RS15, HKP15] provides evidence towards the claim that there is a large gap between statistically optimal tests and the best polynomial-time computable tests. The present paper suggests that this is indeed a very general phenomenon.
A related question is that of testing whether a graph G contains a subset of nodes that are more tightly connected than the background. We refer to [VAC13, ACV14, HWX14, Mon15, HWX15] for recent work in this direction.
2.2 Notations
Given n ∈ Z, we let [n] = {1, 2, ..., n} denote the set of the first n integers. We write |S| for the cardinality of a set S. We denote by (n)_k = n!/(n − k)! the incomplete factorial. Throughout this paper, we use lowercase boldface (e.g. v = (v1, ..., vn), x = (x1, ..., xn), etc.) to denote vectors and uppercase boldface (e.g. A = (A_{i,j})_{i,j∈[n]}, Y = (Y_{i,j})_{i,j∈[n]}, etc.) to denote matrices. For a vector v ∈ R^n and a set A ⊆ [n], we define v_A ∈ R^n by (v_A)_i = (v)_i for i ∈ A and (v_A)_i = 0 otherwise. Given a square matrix X ∈ R^{n×n}, we denote its trace by Tr(X) = Σ_{i=1}^n X_{i,i}. Given a symmetric matrix M, we denote by λ1(M) ≥ λ2(M) ≥ ... ≥ λn(M) its ordered eigenvalues. We denote by 1_n = (1, 1, ..., 1) ∈ R^n the all-ones vector, and by I_n and J_n = 1_n 1_n^T ∈ R^{n×n} the identity and all-ones matrices, respectively. For a matrix A ∈ R^{m×n}, vec(A) ∈ R^{mn} is the vector whose l-th entry is A_{ij}, where i − 1 and j − 1 are the quotient and remainder in dividing l by n, respectively. Also, e_i ∈ R^n denotes the i-th standard unit vector.

A simple graph is a pair G = (V, E), where V is a vertex set and E is a set of unordered pairs (i, j), i, j ∈ V. We write V(G), E(G) whenever necessary to specify which graph we are referring to. Throughout, we focus on finite graphs. We let v(G) = |V(G)|, e(G) = |E(G)|. For v ∈ V(G), the degree of node v is denoted by deg(v). For H ⊆ G and v ∈ V(G), the number of nodes in H that are connected to v is denoted by deg_H(v). We let Gn be the set of graphs over vertex set V = [n].

We follow the standard big-O notation. Given functions f(n), g(n), we write f(n) = O(g(n)) if there exists a constant C such that f(n) ≤ C g(n); f(n) = Ω(g(n)) if there exists a constant C such that f(n) ≥ g(n)/C; and f(n) = Θ(g(n)) if f(n) = O(g(n)) and f(n) = Ω(g(n)). Further, f(n) = o(g(n)) if, for every C > 0, f(n) ≤ C g(n) for all n large enough, and f(n) = ω(g(n)) if, for every C > 0, f(n) ≥ C g(n) for all n large enough.
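The vec(·) convention above stacks the matrix row by row (with entries of vec(A) indexed from 0); a quick numpy check of this convention (illustration only):

```python
import numpy as np

# vec(A) stacks A row by row: entry l (0-indexed) is A[i, j] with
# i = l // n and j = l % n, i.e. numpy's default (C-order) flattening.
m, n = 2, 3
A = np.arange(m * n).reshape(m, n)
v = A.reshape(-1)                     # vec(A)
assert all(v[l] == A[l // n, l % n] for l in range(m * n))
```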
3 Statistical limits on hypothesis testing
In this section we address the first question stated in Section 2: under which conditions on q0, {Hn}n∈N are the two laws P0, P1 distinguishable? Our results depend on the graph sequence {Hn}n∈N through its maximum density d(Hn). For a graph H, we define

    d(H) ≡ max_{F⊆H} e(F)/v(F) .   (3.1)

The following theorem provides a sufficient condition for the distinguishability of the laws P0, P1.

Theorem 1. Let {Hn}n≥1 be a sequence of non-empty graphs such that v(Hn) = o(n), and for q0 ∈ (0, 1) let P0,n be the null model with edge density q0 and P1,n be the planted model with parameters Hn, q0. If

    lim inf_{n→∞} d(Hn)/log n > 1/log(1/q0) ,   (3.2)
then the two laws P0,n, P1,n are strongly distinguishable.

Remark 3.1. The proof of this theorem also provides an explicit test T : Gn → {0, 1} which has asymptotically vanishing error probability under the assumptions of the theorem. Let k(n) = v(Fn), where Fn ⊆ Hn is the subgraph of Hn with the smallest number of vertices such that e(Fn)/v(Fn) = d(Hn). Then the test developed in the proof requires searching over all subsets of k(n) vertices, which, in most cases, is non-polynomial.

The next theorem provides a condition under which the two laws are indistinguishable.

Theorem 2. Let {Hn}n≥1, q0 ∈ (0, 1), P0,n, P1,n be as in Theorem 1. Then the two models are not weakly distinguishable if

    lim sup_{n→∞} d(Hn)/log n < 1/log(1/q0)   (3.3)

and

    lim sup_{n→∞} v(Hn)/n^{2/5} = 0 .   (3.4)
Note that, under the condition lim sup_{n→∞} v(Hn)/n^{2/5} = 0 (i.e. when the hidden subgraph is 'not too large'), this bound matches the positive result of Theorem 1. In fact, we will prove a stronger indistinguishability result that implies Theorem 2.

Theorem 3. With the definitions of Theorem 2, the following hold.

High density graphs: If d(Hn) = ω(log v(Hn)), then the laws P0, P1 are not weakly distinguishable if

    lim sup_{n→∞} d(Hn)/log n < 1/log(1/q0) .   (3.5)

Low density graphs: If d(Hn) = o(log v(Hn)), then the laws P0, P1 are not weakly distinguishable if

    lim sup_{n→∞} v(Hn)/n^{1/2} = 0 .   (3.6)

Medium density graphs: If d(Hn) = Θ(log v(Hn)), then the laws P0, P1 are not weakly distinguishable if

    lim sup_{n→∞} d(Hn)/log n < 1/log(1/q0) ,   (3.7)
    lim sup_{n→∞} v(Hn)/n^{2/5} = 0 .   (3.8)
We next illustrate these results with a few examples.
Example 3.2. Recall that Km denotes the complete graph over m vertices (hence having degree m − 1). Setting Hn = K_{k(n)}, we recover the hidden clique problem. In this case d(Hn) = (k(n) − 1)/2. Hence, our theorems imply that the two laws are strongly distinguishable if lim inf_{n→∞} k(n)/log n > 2/log(1/q0), and are not weakly distinguishable if lim sup_{n→∞} k(n)/log n < 2/log(1/q0).

Example 3.3. Let Qm be the hypercube graph over 2^m vertices (hence having degree m): this is the graph whose vertices are binary vectors of length m, connected by an edge whenever their Hamming distance is exactly equal to one. Set Hn = Q_{log2 k(n)}. In other words, Hn is a hypercube over k(n) vertices. It is easy to see that d(Hn) = (log2 k(n))/2. This example fits the intermediate density regime in Theorem 3. Let γ(q0) ≡ 2/log2(1/q0). Theorem 1 implies that this graph can be detected provided k(n) ≥ n^{γ(q0)+ε} for some ε > 0 and all n large enough. On the other hand, Theorem 3 implies that it cannot be detected if k(n) ≤ n^{min(γ(q0), 2/5)−ε} for some ε > 0 and all n large enough. We have therefore obtained a tight characterization for γ(q0) ≤ 2/5 or, equivalently, q0 ≤ 1/32.

Example 3.4. Let Hn be a regular tree with degree d(n) and r(n) generations (hence v(Hn) = 1 + d(n)[(d(n) − 1)^{r(n)} − 1]/(d(n) − 2), e(Hn) = v(Hn) − 1). In this case, for any Fn ⊆ Hn, e(Fn) ≤ v(Fn) − 1. Therefore d(Hn) = 1 − 1/v(Hn) < 1, and Theorem 1 cannot guarantee the strong distinguishability of the hypotheses. Furthermore, lim sup_{n→∞} d(Hn)/log v(Hn) = 0, and we are in the low density region. Hence, Theorem 3 implies that the null and planted models are not weakly distinguishable if lim sup_{n→∞} d(n)^{r(n)+1}/n^{1/2} = 0.

Example 3.5. Let C_k^m be the m-th power of the cycle over k vertices. This is the graph with vertex set {1, ..., k}, in which two vertices i, j are connected if |i − j − bk| ≤ m for some b ∈ N. Let Hn = C_{k(n)}^{m(n)}, for two functions m(n), k(n). In this case, v(Hn) = k(n) and, for all i ∈ V(Hn), deg(i) = 2m(n). Therefore, for any Fn ⊆ Hn, e(Fn) ≤ m(n)v(Fn). Since e(Hn) = m(n)k(n), by definition d(Hn) = m(n). Using Theorem 1, the two models are strongly distinguishable if lim inf_{n→∞} m(n)/log n > 1/log(1/q0). In addition, depending on k(n), m(n), we can be in any of the three regions of Theorem 3. If m(n) = ω(log k(n)), we are in the high density region and the laws are not weakly distinguishable if lim sup_{n→∞} m(n)/log n < 1/log(1/q0). If m(n) = o(log k(n)), the two models cannot be weakly distinguished if lim sup_{n→∞} k(n)/n^{1/2} = 0. Finally, in the intermediate regime where m(n) = Θ(log k(n)), if lim sup_{n→∞} k(n)/n^{2/5} = 0 and lim sup_{n→∞} m(n)/log n < 1/log(1/q0), the models are not weakly distinguishable. Note that if lim inf_{n→∞} m(n)/log k(n) ≥ (5/2)/log(1/q0), it is sufficient to have lim sup_{n→∞} m(n)/log n < 1/log(1/q0), since this implies lim sup_{n→∞} k(n)/n^{2/5} = 0. Therefore, we have obtained a tight characterization for this example when lim inf_{n→∞} m(n)/log k(n) ≥ (5/2)/log(1/q0).
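The maximum densities used in the examples above can be checked by exhaustive enumeration of subgraphs, which is feasible for small instances; a pure-Python sketch (illustration only, exponential in v(H)):

```python
from itertools import combinations

def max_density(vertices, edges):
    """d(H): maximum of e(F)/v(F) over nonempty F subseteq H, by
    brute-force enumeration of vertex subsets (small graphs only)."""
    best = 0.0
    for r in range(1, len(vertices) + 1):
        for F in combinations(vertices, r):
            Fset = set(F)
            eF = sum(1 for e in edges if e <= Fset)   # edges inside F
            best = max(best, eF / r)
    return best

# d(K4) = 6/4 = 1.5, matching (k-1)/2 in Example 3.2 with k = 4;
# d(Q3) = 3/2, matching (log2 k)/2 in Example 3.3 with k = 8.
K4 = [frozenset(e) for e in combinations(range(4), 2)]
Q3 = [frozenset({u, v}) for u in range(8) for v in range(u + 1, 8)
      if bin(u ^ v).count("1") == 1]
```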
4 Computationally efficient tests
In this section we propose two computationally tractable algorithms for detecting the planted subgraph in the setting described in Section 2. The first method leverages the spectral properties of the given graph. For this method, we establish sufficient conditions under which the algorithm succeeds with high probability. We then show that a modification of the spectral algorithm can be used to identify the hidden subgraph. The second approach uses an SDP relaxation of the problem, which is a priori more powerful than the spectral approach.
4.1 Spectral algorithm
For p ∈ [0, 1) we denote by A_G^p the shifted adjacency matrix of the graph G, defined as follows:

    (A_G^p)_{ij} = 1 if (i, j) ∈ E(G), and (A_G^p)_{ij} = −p/(1 − p) otherwise.   (4.1)

Further, we denote by A_G = A_G^0 the 0-1 adjacency matrix of G. Recall that λ1(A_G^p) ≥ λ2(A_G^p) ≥ ... ≥ λn(A_G^p) denote the eigenvalues of A_G^p. The spectral test is simply based on the leading eigenvalue:

    T_spec(G) = 1 if λ1(A_G^{q0}) ≥ 2.1 σ(q0) √n, and T_spec(G) = 0 otherwise,   (4.2)

where

    σ(q0) ≡ √( q0/(1 − q0) ) .   (4.3)

Note that this test uses the knowledge of q0, but does not assume knowledge of the planted subgraph Hn.

Theorem 4. Let {Hn}n≥1 be a sequence of non-empty graphs such that v(Hn) = o(n), and for q0 ∈ (0, 1) let P0,n be the null model with edge density q0 and P1,n be the planted model with parameters Hn, q0. Define σ(q0) as per Eq. (4.3). If

    lim inf_{n→∞} λ1(A_{Hn})/√n > 3 σ(q0) ,   (4.4)
then the two laws P0,n, P1,n are strongly distinguishable.

Remark 4.1. The constant 2.1 in Eq. (4.2) can be reduced to 2 + ε for any ε > 0. In addition, we expect that with further work the constant 3 in Eq. (4.4) can be reduced to 1 + ε for any ε > 0. These improvements are not the focus of the present paper.

Algorithm 1: Spectral algorithm for identifying hidden subgraphs in G
Input: graph G, edge probability q0, size of hidden subgraph k = v(H)
Output: estimated support S ⊆ V(G) of the hidden subgraph
Initialize: n = v(G), t = k q0 + 3 √(k q0 log k), S = ∅
1: for i ∈ V(G) do
2:   Set v^(i) ≡ principal eigenvector of (A_G^{q0})_{−i,−i}.
3:   Order the entries of v^(i): |v^(i)_{j(1)}| ≥ |v^(i)_{j(2)}| ≥ ... ≥ |v^(i)_{j(n−1)}|
4:   Set S_i ≡ {j(1), ..., j(k)}
5:   Set d(i) ≡ number of edges between vertex i and vertices in S_i
6:   if d(i) > t then S = S ∪ {i}
7: return S

Can the spectral method be used to identify the hidden subgraph? We start by noting that, even if Hn can be detected, a subset of its nodes might remain un-identified. As an example, let Hn be a graph over k(n) vertices, whereby vertices {1, ..., k(n) − 1} are connected by a clique, and vertex k(n) is connected to the others by a single edge.
Then Example 3.2 implies that Hn can be detected with high probability as soon as k(n) ≥ (1 + ε) log n/log(1/q0). As we will see below, the spectral algorithm detects Hn with high probability if k(n) ≥ 3σ(q0)√n = Θ(√n). However, it is intuitively clear (and not hard to prove) that the degree-one vertex in Hn cannot be identified reliably. With this caveat in mind, Algorithm 1 provides a spectral approach to identifying a subset of the vertices of the hidden subgraph.

In order to characterize the set of 'important' vertices of Hn, we introduce the following notion.

Definition 4.1. Given a graph H = (V(H), E(H)) and c ∈ R_{>0}, we define the c-significant set Sc(H) ⊆ V(H) of H as the following set of vertices:

    Sc(H) := { i ∈ V(H) : deg(i) > c √( v(H) log v(H) ) } .   (4.5)

We also need to assume that the leading eigenvector of H is sufficiently spread out.

Definition 4.2. Let H = (V(H), E(H)) be a graph with adjacency matrix A_H ∈ {0,1}^{n×n}. For ε ∈ (0, 1), we say that H has spectral expansion ε if

    1 − ε = max( λ2(A_H) ; −λn(A_H) ) / λ1(A_H) .   (4.6)

Finally, let v be the leading eigenvector of A_H. We say that H is (ε, µ)-balanced in spectrum if it has spectral expansion ε and

    min_{i∈V(H)} |v_i| ≥ µ/√( v(H) ) .   (4.7)

The following definition helps us present the result on the performance of Algorithm 1.

Definition 4.3. Let H = (V(H), E(H)) be a graph. For any i ∈ V(H), the graph obtained by removing i from H is denoted by H \ i. Then:

1. We say that H is (ε, µ)-strictly balanced in spectrum if for all i ∈ V(H), H \ i is (ε, µ)-balanced in spectrum.

2. We define λ−(H) as

    λ−(H) ≡ min_{i∈V(H)} λ1(A_{H\i}) .   (4.8)
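Algorithm 1 can be transcribed nearly line by line in numpy; a sketch (our own illustration, practical only for small n since it performs one eigendecomposition per vertex):

```python
import numpy as np

def identify_subgraph(A, q0, k):
    """Sketch of Algorithm 1: for each vertex i, take the top-k
    coordinates (in magnitude) of the principal eigenvector of the
    shifted adjacency matrix with row/column i removed as a candidate
    support S_i, and keep i if it has unusually many edges into S_i."""
    n = A.shape[0]
    t = k * q0 + 3 * np.sqrt(k * q0 * np.log(k))     # degree threshold
    B = np.where(A == 1, 1.0, -q0 / (1 - q0))        # shifted adjacency (4.1)
    np.fill_diagonal(B, 0.0)                         # diagonal is immaterial
    S = set()
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        w, V = np.linalg.eigh(B[np.ix_(keep, keep)])
        v = V[:, -1]                                 # principal eigenvector
        top = np.argsort(-np.abs(v))[:k]             # top-k coordinates
        Si = [keep[j] for j in top]
        d_i = sum(A[i, j] for j in Si)               # edges from i into S_i
        if d_i > t:
            S.add(i)
    return S
```

On a toy instance with K_20 planted in an otherwise empty 60-vertex graph, this recovers exactly the clique vertices.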
The next theorem states sufficient conditions under which Algorithm 1 succeeds in identifying the significant set of the planted subgraph.

Theorem 5. Given {Hn}n∈N, q0 ∈ (0, 1), let P1,n be the law of the random graph with edge density q0 and planted subgraph Hn, cf. Section 2, and assume Gn ∼ P1,n. Assume v(Hn) = o(n) and that, for each n, Hn is (ε, µ)-strictly balanced in spectrum for some µ > 0, ε ∈ (0, 1). Let δ be such that

    2δ/(µ²(1 − δ)) < 1 .

Finally, assume that

    lim inf_{n→∞} |λ−(Hn)|/√n > 3σ(q0)/(εδ) ,   (4.9)

where λ−(Hn) is defined as per Eq. (4.8). Let S be the output of Algorithm 1, and set c ≡ 4/((1 − α)(1 − q0)), α ≡ 2δ/(µ²(1 − δ)). Then the following hold with high probability as n, v(Hn) → ∞:

1. S contains all the vertices of Gn that correspond to the c-significant set Sc(Hn) of the planted subgraph Hn.

2. S does not contain any vertex that does not correspond to those of the planted subgraph Hn.

Remark 4.2. Note that if min_{i∈V(Hn)} deg(i) > c √( v(Hn) log v(Hn) ), where c is as in Theorem 5 (i.e. the minimum degree of nodes in the hidden subgraph is 'sufficiently large'), then Sc(Hn) = V(Hn) and, under the assumptions of Theorem 5, Algorithm 1 will find all the nodes of the planted subgraph Hn. However, if Hn contains some 'low degree' vertices, namely deg(i) ≤ c √( v(Hn) log v(Hn) ) for some i ∈ V(Hn), we have Sc(Hn) ⊂ V(Hn). In this case, in order to find all vertices of the planted subgraph H in G, after computing the output S of Algorithm 1 we can select the nodes i ∈ V(Gn) such that deg_S(i) > (1 + ε)q0|S|, for some ε > 0. Note that if |Sc(Hn)| is ω(log n), then for any i ∉ φ0(Hn), deg_S(i) ≤ (1 + ε)q0|S| with high probability. Hence, this procedure will not choose any node i with i ∉ φ0(Hn). Moreover, this procedure will find the planted subgraph Hn if deg_S(i) > (1 + ε)q0|S| for all nodes i ∈ V(Hn).

Note that for any graph H = (V(H), E(H)), λ1(A_H) ≤ v(H). Hence, by definition,

    λ−(H) ≤ v(H \ i) ≤ v(H)/min(q0, 1) .   (4.10)

Hence, the assumptions of Theorem 5 imply in particular

    lim inf_{n→∞} v(Hn)/n^{1/2} > 0 .   (4.11)
We can compare this condition with the one of Theorem 1. If Hn is a dense graph, we expect generically d(Hn) = Θ(v(Hn)), and hence there is a large gap between the condition of Theorem 1 (which guarantees distinguishability) and that of Theorem 5. We illustrate this with a few examples.
Example 4.3. Let Hn = K_{k(n)}. Assume that v(Hn) = k(n) is o(n). We have λ1(A_{Hn}) = k(n) − 1, and Theorem 4 implies that the laws are strongly distinguishable using the spectral test if lim inf_{n→∞} k(n)/n^{1/2} > 3σ(q0). This shows a gap between the performance of the spectral test and the statistical bound of Theorem 1.

In order to express results on identifying the hidden subgraph in this case, first note that for all i ∈ V(Hn), deg(i) = k(n) − 1, and all nodes of Hn are in the c-significant set of Hn for c < √( (k(n) − 2)/log k(n) ), as per Definition 4.1. Assuming that k(n) → ∞ as n → ∞, for any c > 0 all nodes of Hn are in the c-significant set of Hn for large enough n. Also, the leading eigenvector of A_{Hn} is 1_{k(n)}, and the remaining eigenvalues are −1. Setting ε0(k(n)) ≡ 1 − 1/(k(n) − 1), based on Definition 4.2, for each n, Hn is (ε, 1)-balanced in spectrum for ε < ε0(k(n)). Using the fact that for any i ∈ V(Hn), Hn \ i is K_{k(n)−1}, we deduce that for each n, Hn is (ε, 1)-strictly balanced in spectrum for ε < ε0(k(n) − 1), and λ−(Hn) = k(n) − 2. Note that ε0(k(n) − 1) → 1 as n → ∞. Therefore, using Theorem 5, if lim inf_{n→∞} k(n)/n^{1/2} > 9σ(q0), Algorithm 1 can find the planted clique with high probability as n, v(Hn) → ∞.

Example 4.4. Set Hn = Q_{log2 k(n)} as in Example 3.3. Since the hypercube is a regular graph, λ1(A_{Hn}) = log2 k(n), and Theorem 4 implies that the two models can be strongly distinguished using the spectral test if lim inf_{n→∞} log2 k(n)/n^{1/2} > 3σ(q0). However, this never happens, since k ≤ n. Therefore, Theorem 4 cannot guarantee the strong distinguishability of the hypotheses using the spectral test. Similarly, Theorem 5 does not imply the success of Algorithm 1 in finding the planted hypercube.

Example 4.5. Let Hn be a regular tree with degree d(n) and r(n) generations, as in Example 3.4. For a large regular tree, λ1(A_{Hn}) is of order 2√(d(n) − 1) as v(Hn) → ∞. Hence, based on Theorem 4, the two laws are strongly distinguishable using the spectral test if lim inf_{n→∞} d(n)/n > (9/4)σ²(q0). Therefore, v(Hn) cannot be o(n), and Theorem 4 cannot guarantee the strong distinguishability of the two models under any conditions. Recall that Theorem 1, also, could not guarantee strong distinguishability for this example under any conditions.

As a side note, if q0 is known a priori (which is not a practical assumption) and lim inf_{n→∞} d(n)/(nq0) = c > 0, the null and planted models can be distinguished simply by looking at the maximum degree in the graph Gn. In fact, under the null model the maximum degree of Gn is less than or equal to nq0 + Θ(√(nq0 log n)) with high probability. Therefore, the test that rejects the null iff the maximum degree of Gn is greater than or equal to (1 + ε)nq0 strongly distinguishes the two models under this assumption. Subsequently, under this condition, the high-degree nodes can be used to find the planted tree.

In addition, since d(n)/√( v(Hn) log v(Hn) ) → 0 as v(Hn) → ∞ for the sequence of regular trees, Theorem 5 cannot imply the success of Algorithm 1 in finding the planted regular tree. In other words, Algorithm 1 fails in identifying the planted regular tree because it does not contain sufficiently high degree vertices.

Example 4.6. Set Hn = C_{k(n)}^{m(n)} as in Example 3.5. As in the previous examples, since Hn is a sequence of regular graphs, λ1(A_{Hn}) = 2m(n). Therefore, the two models are strongly distinguishable using the spectral test if lim inf_{n→∞} m(n)/n^{1/2} > (3/2)σ(q0), and the gap between the results of Theorems 1 and 4 is similar to Example 4.3.

Assuming that lim inf_{n→∞} m(n)/√( k(n) log k(n) ) = c/2 > 0, Theorem 5 can be used to guarantee the performance of Algorithm 1 in identifying the hidden subgraph. Under this condition, all vertices of Hn are in the c-significant set of Hn for large enough n. Note that A_{Hn} is a circulant matrix; its principal eigenvalue is λ1(A_{Hn}) = 2m(n), the corresponding eigenvector is 1_{k(n)}, and the other eigenvalues are Σ_{i=1}^{m(n)} 2 cos(2πij/k(n)) for j = 1, 2, ..., k(n) − 1. Here we only consider the case in which lim inf_{n→∞} m(n)/k(n) ≥ √(3ε)/(π√2) for some ε > 0. Therefore, λ2(A_{Hn}) ≤ 2m(n) − Σ_{i=1}^{m(n)} 4π²i²/k(n)² = 2m(n)(1 − 2π²m(n)²/(3k(n)²)) + o(m(n)). Hence, for large enough n, λ2(A_{Hn}) ≤ 2m(n)(1 − ε) = λ1(A_{Hn})(1 − ε), and Hn is (ε, 1)-balanced in spectrum. For any i ∈ V(Hn), λ1(A_{Hn \ i}) ≥ 2e(Hn \ i)/(k(n) − 1) = 2m(n)(1 − 1/(k(n) − 1)). Using Cauchy's interlacing theorem, λ2(A_{Hn \ i}) ≤ λ2(A_{Hn}) ≤ 2m(n)(1 − ε) ≤ λ1(A_{Hn \ i})(1 − ε′n), where (1 − ε)/(1 − 1/(k(n) − 1)) = 1 − (ε − ε′n) and ε′n → 0 as n → ∞. In addition, for large enough n, the leading eigenvector v of A_{Hn \ i} satisfies v_i ≥ (1 − δ′n)/√(k(n) − 1) for i = 1, 2, ..., k(n) − 1, where δ′n → 0 as n → ∞. Therefore, for any 0 < ε′ < ε, 0 < µ < 1, Hn is (ε − ε′, µ)-strictly balanced in spectrum for large enough n. In addition, λ−(Hn) ≥ 2m(n)(1 − 1/(k(n) − 1)). Thus, using Theorem 5, if lim inf_{n→∞} m(n)/n^{1/2} > 9σ(q0)/(2ε), Algorithm 1 can find the planted subgraph with high probability as n, k(n) → ∞.
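The spectral test (4.2) underlying these examples is a one-liner on top of an eigensolver; a numpy sketch (setting the immaterial diagonal of the shifted matrix to zero is our own choice):

```python
import numpy as np

def spectral_test(A, q0):
    """T_spec of (4.2): reject the null iff the leading eigenvalue of
    the shifted adjacency matrix A_G^{q0} exceeds 2.1*sigma(q0)*sqrt(n)."""
    n = A.shape[0]
    sigma = np.sqrt(q0 / (1 - q0))                 # sigma(q0), Eq. (4.3)
    B = np.where(A == 1, 1.0, -q0 / (1 - q0))      # shifted adjacency, Eq. (4.1)
    np.fill_diagonal(B, 0.0)                       # diagonal is immaterial
    lam1 = np.linalg.eigvalsh(B)[-1]               # leading eigenvalue
    return int(lam1 >= 2.1 * sigma * np.sqrt(n))
```

On the empty graph the statistic equals q0/(1 − q0), far below the threshold; planting a clique of size well above 2.1 σ(q0)√n pushes the leading eigenvalue above it.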
4.2 SDP relaxation
Since we found (generically) a large gap between the statistical detection threshold and the performance of the spectral method, it is natural to look for more powerful algorithms. In this section we develop an SDP relaxation of the hidden subgraph problem. Recall that we denote by A_G the adjacency matrix of the graph G:

    (A_G)_{ij} = 1 if (i, j) ∈ E(G), and 0 otherwise.   (4.12)

This notation is consistent with the one used in the previous section, with the identification A_G = A_G^0. We want to find a planted copy of a given graph H in the graph G. Let v(H) = k, v(G) = n. We consider therefore the problem

    maximize    Tr(A_H Π^T A_G Π)
    subject to  Π^T Π = I_k ,
                Π ∈ {0, 1}^{n×k} .   (4.13)
This is a non-convex optimization problem known as Quadratic Assignment Problem (QAP) and is well studied in the literature, for example see [Bur13]. We will denote the value of this problem as OPT(G; H). Note indeed that, Π ∈ {0, 1}n×k is feasible if it contains exactly one non-zero entry per column and at most one per row. Call ϕ(i) ∈ [n] the position of the non-zero-entry of column i ∈ [k]. Then e ϕ ∈ L(H, n) is a labeling of the vertices of H, and the objective function can be rewritten as X Tr(AH ΠT AG Π) = 2 (AG )ϕ(i),ϕ(j) (4.14) (i,j)∈E(H)
Hence, if G contains a planted copy of H (e.g. under model G ∼ P1 ), we have OPT(G; H) ≥ 2 e(H). This suggests the following optimization-based test: ( 1 if OPT(G; H) ≥ 2 e(H), TOPT (G) = (4.15) 0 otherwise. 12
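Since OPT(G; H) is defined by a finite maximization over injective labelings, it can be evaluated by brute force on very small instances. The sketch below (the toy graphs and function name are illustrative; this exponential enumeration is of course not a practical algorithm) computes OPT(G; H) as in Eq. (4.14) and applies the test T_OPT:

```python
from itertools import permutations

def opt_qap(n, edges_G, k, edges_H):
    # Brute-force OPT(G; H): maximize over injective phi: [k] -> [n] of
    # 2 * #{(i, j) in E(H) : (phi(i), phi(j)) in E(G)}, as in Eq. (4.14).
    EG = {frozenset(e) for e in edges_G}
    return max(sum(2 for (i, j) in edges_H
                   if frozenset((phi[i], phi[j])) in EG)
               for phi in permutations(range(n), k))

# Toy instance: G on 6 vertices containing a triangle on {0, 1, 2}.
edges_G = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5)]
edges_H = [(0, 1), (1, 2), (0, 2)]        # H = triangle, so 2*e(H) = 6
opt_val = opt_qap(6, edges_G, 3, edges_H)
T_OPT = 1 if opt_val >= 2 * len(edges_H) else 0
```

Here the planted triangle makes the maximum equal to 2e(H) = 6 and the test fires, matching the observation that OPT(G; H) ≥ 2e(H) whenever G contains a copy of H.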
The proof of Theorem 1 suggests that this test is nearly optimal, provided d(H) = e(H)/v(H), i.e. H has no subgraph denser than H itself². Unfortunately, in general, OPT(G; H) is NP-hard even to approximate within a constant factor [SG76]. We will therefore resort to an SDP relaxation of the same problem. The following lemma provides a different formulation of (4.13).

Lemma 4.4. Let Π* be an optimal solution of problem (4.13). Then y* = vec(Π*) is such that Y* = y* y*^T is an optimal solution of the following problem:

maximize Tr((A_G ⊗ A_H)Y)
subject to Y ∈ {0, 1}^{nk×nk},
Y ⪰ 0,
Tr(Y J_{nk}) = k²,
Tr(Y (I_n ⊗ (e_i e_i^T))) = 1 for i = 1, 2, . . . , k,
Tr(Y ((e_j e_j^T) ⊗ I_k)) ≤ 1 for j = 1, 2, . . . , n,
rank(Y) = 1.   (4.16)

We now consider the following SDP relaxation of problem (4.16), which was proposed in [ZKRW98]:

maximize Tr((A_G ⊗ A_H)Y)
subject to Y ⪰ 0,
0 ≤ Y ≤ 1,
Tr(Y J_{nk}) = k²,
Tr(Y (I_n ⊗ (e_i e_i^T))) = 1 for i = 1, 2, . . . , k,
Tr(Y ((e_j e_j^T) ⊗ I_k)) ≤ 1 for j = 1, 2, . . . , n.   (4.17)
We denote the optimal value of problem (4.17) by SDP(G; H). The following theorem states an upper bound on the performance of the hypothesis testing method that rejects the null hypothesis if SDP(G; H) ≥ 2e(H).

Theorem 6. Let {H_n}_{n≥1}, P_{0,n}, P_{1,n} be as in Theorem 1. Consider the hypothesis testing problem in which under the null G_n is generated according to P_{0,n} and under the alternative it is generated according to P_{1,n}. Define σ(q_0) as per Eq. (4.3). If

lim sup_{n→∞} λ_1(A_{H_n})/√n < (1/4)σ(q_0),

then for the method that rejects the null hypothesis if SDP(G; H) ≥ 2e(H), we have P_{0,n}{T(G_n) = 1} → 1 as n → ∞; that is, the SDP-based test fails under the null.

In the next section we will present the proofs of our results.

²If this is not the case, the optimization problem (4.13) can be modified by replacing H by its densest subgraph.
5
Proofs: Statistical limits
We start with the following preliminary lemma.

Lemma 5.1. Let, for each n, Z : G_n → R_+ be such that

Z(G) = E_{0,n}{Z(G_n)} (dP_{1,n}/dP_{0,n})(G).

Further let Z_n = Z(G_n). Then, P_{0,n} and P_{1,n} are strongly distinguishable if and only if, under P_{0,n},

Z_n / E_{0,n}Z_n →^p 0.

They are not weakly distinguishable if and only if, along some subsequence {n_k},

Z_n / E_{0,n}Z_n →^p 1.

The proof is standard and deferred to the Appendix. In order to state the proofs of our results, given a graph G ∈ G_n, we define U_G : L̃(H; n) → N by U_G(ϕ) ≡ |ϕ(E(H)) ∩ E(G)|. For n ≥ v(H), we let N(H; G) ≡ |{ϕ ∈ L̃(H; n) : U_G(ϕ) = e(H)}|. Let P_{0,n}, P_{1,n} be defined as in Section 2, and note that

E_{0,n} N(H_n; G) = (n)_{v(H_n)} q_0^{e(H_n)}.

Further, we can write

(dP_{1,n}/dP_{0,n})(G) = (1/(n)_{v(H_n)}) Σ_{ϕ∈L̃(H;n)} Π_{(i,j)∈E(H_n)} (1/q_0) I{(ϕ(i), ϕ(j)) ∈ E(G)}.

Thus,

(dP_{1,n}/dP_{0,n})(G) = (1/(n)_{v(H_n)}) (1/q_0)^{e(H_n)} Σ_{ϕ∈L̃(H;n)} I{|ϕ(E(H_n)) ∩ E(G)| = |E(H_n)|}.

Therefore,

(dP_{1,n}/dP_{0,n})(G) = (1/(n)_{v(H_n)}) (1/q_0)^{e(H_n)} N(H_n; G) = N(H_n; G) / E_{0,n}{N(H_n; G)}.   (5.1)

Now we can prove Theorems 1 and 3.
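The identity E_{0,n} N(H_n; G) = (n)_{v(H_n)} q_0^{e(H_n)} can be verified exactly on a tiny instance by enumerating all graphs on n vertices. The sketch below does this for n = 5 and H a triangle (these parameter choices are illustrative only):

```python
from itertools import combinations, permutations

def exact_expected_count(n, q0, edges_H, k):
    # Exact E_0[N(H;G)] for G ~ G(n, q0), by enumerating all 2^{C(n,2)} graphs.
    pairs = list(combinations(range(n), 2))
    total = 0.0
    for mask in range(1 << len(pairs)):
        edges = {pairs[b] for b in range(len(pairs)) if (mask >> b) & 1}
        edges |= {(j, i) for (i, j) in edges}
        m = bin(mask).count("1")
        prob = q0 ** m * (1 - q0) ** (len(pairs) - m)
        # N(H;G): injective labelings placing every edge of H on an edge of G.
        copies = sum(all((phi[i], phi[j]) in edges for (i, j) in edges_H)
                     for phi in permutations(range(n), k))
        total += prob * copies
    return total

n, q0 = 5, 0.3
triangle = [(0, 1), (1, 2), (0, 2)]
emp = exact_expected_count(n, q0, triangle, 3)
pred = n * (n - 1) * (n - 2) * q0 ** 3    # (n)_{v(H)} * q0^{e(H)}
```

The exhaustive sum and the closed form agree to machine precision, which is just linearity of expectation over the (n)_{v(H)} labelings.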
5.1
Proof of Theorem 1
Proof. Let H̃_n be a subgraph of H_n that satisfies d(H_n) = e(H̃_n)/v(H̃_n). Using (5.1), we can write

P_{0,n}( (dP_{1,n}/dP_{0,n})(G_n) > 0 ) = P_{0,n}( N(H_n; G_n) > 0 )
≤ P_{0,n}( N(H̃_n; G_n) > 0 )
≤ E_{0,n} N(H̃_n; G_n)
≤ n^{v(H̃_n)} q_0^{e(H̃_n)}
= exp{ v(H̃_n) log n ( 1 − d(H_n) log(1/q_0)/log n ) },

which goes to zero as n → ∞ when (3.2) holds. Therefore, under the assumptions of Theorem 1, under P_{0,n},

(dP_{1,n}/dP_{0,n})(G_n) →^p 0,

and using Lemma 5.1 the proof is complete.
5.2
Proof of Theorem 3
We start by stating the following lemma.

Lemma 5.2. Let {H_n}_{n≥1}, q_0, P_{0,n}, P_{1,n} be as in Theorem 1. Under the assumptions of Theorem 3, for all ε > 0,

lim_{n→∞} P_{0,n}{ N(H_n; G_n) ≤ (1 − ε) E_0 N(H_n; G_n) } = 0.   (5.2)
Proof. Let v(H_n) = k_n, e(H_n) = e_n. Let

X_ϕ(G) = 1 if |ϕ(E(H)) ∩ E(G)| = e_n, and X_ϕ(G) = 0 otherwise.

Note that N(H_n; G_n) = Σ_{ϕ∈L̃(H;n)} X_ϕ(G_n). We have

E_0 X_ϕ = p_1(e_n) = q_0^{e_n}.

We write ϕ_1 ∼ ϕ_2 if |ϕ_1(V(H_n)) ∩ ϕ_2(V(H_n))| ≥ 2. We define e_G(m) = max_{H⊆G, v(H)=⌈m⌉} e(H). Therefore, if |ϕ_1(V(H_n)) ∩ ϕ_2(V(H_n))| = u, we have

E_0 X_{ϕ_1} X_{ϕ_2} ≤ p_2(u, e_n) ≡ q_0^{2e_n − e_{H_n}(u)}.

Define

Δ̄(n, H_n) = Σ_{ϕ_1∼ϕ_2} E_0 X_{ϕ_1} X_{ϕ_2}.

Therefore, we have

Δ̄(n, H_n) = Σ_{u=2}^{k_n} (n)_{2k_n−u} (k_n!)² / (u! ((k_n − u)!)²) p_2(u, e_n).
Now using the fact that for all n,

√(2π) n^{n+1/2} e^{−n} ≤ n! ≤ n^{n+1/2} e^{−n+1},

we get

Δ̄(n, H_n) / (E_0 N(H_n; G_n))² ≤ Σ_{u=2}^{k_n} g(u),

where

g(u) = (2π)^{−3/2} e^{−2k_n+2} k_n^{2k_n+1} q_0^{−e_{H_n}(u)} / ( (n − u)^u (k_n − u)^{2(k_n−u)+1} e^{−2(k_n−u)−u} u^{u+1/2} )   if u ≤ k_n − 1,

g(k_n) = k_n^{k_n+1/2} e^{−k_n+1} q_0^{−e_n} (n − k_n)^{n−k_n+1/2} e^{−n+k_n+1} / ( (2π)^{1/2} n^{n+1/2} e^{−n} )   if u = k_n.   (5.3)
Now using Janson's inequality (see [Jan90, Theorem 1]), for all ε > 0,

P_{0,n}{ N(H_n; G_n) ≤ (1 − ε) E_0 N(H_n; G_n) } ≤ exp( −ε² (E_0 N(H_n; G_n))² / ( 2(Δ̄(n, H_n) + E_0 N(H_n; G_n)) ) ).   (5.4)

Note that

E_0 N(H_n; G_n) = (n!/(n − k_n)!) q_0^{e_n} ≥ ( √(2π) n^{n+1/2} / ( (n − k_n)^{n−k_n+1/2} e^{k_n+1} ) ) q_0^{e_n}.

Taking logarithms we have

log E_0 N(H_n; G_n) ≥ (n + 1/2) log n − (n − k_n + 1/2) log(n − k_n) − k_n + e_n log q_0 + C
≥ k_n ( log(n − k_n) − 1 ) − e_n log(1/q_0) + C
= k_n ( log n + log(1 − k_n/n) − 1 − (e_n/k_n) log(1/q_0) ) + C → ∞   (5.5)

under the assumptions of Theorem 3, as n → ∞. Further, for g(u) defined as in (5.3), if u < k_n,

−(1/u) log g(u) = log n + log(1 − (u/n)) − (e_{H_n}(u)/u) log(1/q_0) + 2(k_n/u) − 2((k_n/u) − 1) − 1 − (2(k_n/u) + (1/u)) log k_n + (2(k_n/u) − 2 + (1/u)) log(k_n − u) + (1 + 1/(2u)) log u + C
= log n − (e_{H_n}(u)/u) log(1/q_0) − 2 log k_n + log u + (1/(2u)) log u + ((2k_n/u) − 2 + (1/u)) log(1 − u/k_n) + C′.
In addition,

−(1/k_n) log g(k_n) = (n/k_n + 1/(2k_n)) log n − (e_n/k_n) log(1/q_0) − (1 + 1/(2k_n)) log k_n − (n/k_n − 1 + 1/(2k_n)) log(n − k_n) + C
= log n − (e_n/k_n) log(1/q_0) − log k_n − (1/(2k_n)) log k_n − (n/k_n − 1 + 1/(2k_n)) log(1 − k_n/n) + C′.

Letting

f(u) = log n − (e_{H_n}(u)/u) log(1/q_0) − 2 log k_n + log u + (1/(2u)) log u + ( (2k_n + 1)/u − 2 ) log(1 − u/k_n)   (5.6)

for 2 ≤ u ≤ k_n − 1, and

f(k_n) = log n − (e_n/k_n) log(1/q_0) − log k_n − (1/(2k_n)) log k_n − (n/k_n − 1 + 1/(2k_n)) log(1 − k_n/n),   (5.7)

using (5.4), (5.5), in order to complete the proof it suffices to show that

Δ̄(n, H_n) / (E_0 N(H_n; G_n))² ≤ Σ_{u=2}^{k_n} g(u) = Σ_{u=2}^{k_n} exp{−u f(u)} + o(1)
≤ k_n exp{−ũ f(ũ)} + o(1) = exp{ −ũ ( f(ũ) − (log k_n)/ũ ) } + o(1) → 0

as n → ∞, where ũ = arg min_{2≤u≤k_n} {u f(u)}. First note that ((2/x) − 2) log(1 − x) ≥ −1 for 0 ≤ x < 1. Hence,

( −2 + 2k_n/u ) log(1 − u/k_n) ≥ −1,

for 2 ≤ u ≤ k_n − 1. Further, since x log(1 − 1/x) is increasing for x > 1, for 2 ≤ u ≤ k_n − 1,

(1/u) log(1 − u/k_n) ≥ (1/(k_n − 1)) log(1 − (k_n − 1)/k_n) ≥ −1,

for large enough k_n. In addition, log u + (1/(2u)) log u ≥ 0. Hence, the last three terms in (5.6) are bounded below. Finally, note that

n/k_n − 1 + 1/(2k_n) ≥ 0,
−(log k_n)/(2k_n) ≥ −1,

for large enough n. Thus, the last two terms in (5.7) are also bounded below. Therefore, for 2 ≤ u ≤ k_n,

f(u) − (log k_n)/u ≥ log n − (e_{H_n}(u)/u) log(1/q_0) − (5/2) log k_n + C,

for some constant C. Hence,

Δ̄(n, H_n) / (E_0 N(H_n; G_n))² → 0

as n → ∞ if

lim sup_{n→∞} d(H_n) log(1/q_0) / log n < 1,
lim sup_{n→∞} v(H_n)/n^{2/5} = 0.
This proves Theorem 2. Now, let u* = arg min_{2≤u≤k_n} f(u). Note that, as above,

f(u) ≥ log n − (e_{H_n}(u)/u) log(1/q_0) − 2 log k_n + C.

Hence, under the assumptions of Theorem 3, f(u*) → ∞. Define

f*(u) = f(u) if u ≤ u*, and f*(u) = f(u*) otherwise.

We have

Σ_{u=2}^{k_n} exp{−u f(u)} ≤ Σ_{u=2}^{k_n} exp{−u f*(u)}
= Σ_{u=2}^{u*} exp{−u f(u)} + C exp{−u* f(u*)}
≤ u* exp{−ũ f(ũ)} + C exp{−u* f(u*)},

where C = (1 − e^{−f(u*)})^{−1} is a constant. Therefore, it suffices that

u* exp{−ũ f(ũ)} + C exp{−u* f(u*)} → 0

as n → ∞. This holds when

u* f(u*) → +∞,   u* exp{−ũ f(ũ)} → 0,

as n → ∞. Note that the first condition above holds since f(u*) → ∞ as n → ∞. Further,

log( u* exp{−ũ f(ũ)} ) = log u* − ũ f(ũ).   (5.8)
Hence, if lim sup_{n→∞} (log u*)/(ũ log n) = 0, then (5.8) tends to −∞ when ũ f(ũ) → ∞ as n → ∞. Therefore, it suffices to have

lim sup_{n→∞} d(H_n) log(1/q_0) / log n < 1,   (5.9)
lim sup_{n→∞} v(H_n)/n^{1/2} = 0.   (5.10)

Thus, if d(H_n) log(1/q_0) ∉ Θ(log v(H_n)), then the lemma holds under (5.9), (5.10). Therefore, (5.2) is satisfied under the assumptions of Theorem 3, and the lemma is proved.

Now we can prove Theorem 3.

Proof of Theorem 3. Using Lemma 5.2, under the assumptions of Theorem 3, for all ε > 0,

lim_{n→∞} P_{0,n}{ N(H_n; G_n)/E_0 N(H_n; G_n) ≤ 1 − ε } = 0.

Hence, letting

α_ε = P_{0,n}{ N(H_n; G_n)/E_0 N(H_n; G_n) ≥ 1 + ε },

for any ε > 0, M > 0 we have

lim_{n→∞} { α_{Mε}(1 + Mε) + (1 − α_{Mε})(1 − ε) } ≤ E_{0,n}{ N(H_n; G_n)/E_0 N(H_n; G_n) } = 1.

Hence, for any ε > 0, M > 0,

lim_{n→∞} α_{Mε} ≤ 1/(M + 1).

Letting M → ∞ with Mε = δ fixed, we deduce that for all δ > 0,

lim_{n→∞} P_{0,n}{ N(H_n; G_n)/E_0 N(H_n; G_n) ≥ 1 + δ } = 0.

Therefore, under P_{0,n},

N(H_n; G_n)/E_0 N(H_n; G_n) →^p 1,

and using (5.1) and Lemma 5.1, Theorem 3 is proved.
6
Proofs: spectral algorithm
We start by stating the following useful theorems from random matrix theory.

Theorem 7 ([Tao12], Corollary 2.3.6). Let X ∈ R^{n×n} be a random symmetric matrix whose entries X_{ij} are independent, zero-mean, uniformly bounded random variables for j ≥ i, and X_{ij} = X_{ji} for j < i. There exist constants c_1, c_2 > 0 such that for all t ≥ c_1,

P( ‖X‖_2 > t√n ) ≤ c_1 exp(−c_2 t n).

Theorem 8 ([Tao12], Theorem 2.3.24). Let X ∈ R^{n×n} be a random symmetric matrix whose entries X_{ij} are i.i.d. copies of a zero-mean random variable with variance 1 and finite fourth moment for j ≥ i, and X_{ij} = X_{ji} for j < i. Then, lim_{n→∞} ‖X‖_2/√n = 2, almost surely.
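The scaling in Theorem 8 can be illustrated numerically. The sketch below builds a symmetric ±1 matrix and estimates ‖X‖_2 by power iteration on a shifted matrix (the shift, matrix size, and iteration count are ad hoc choices; at moderate n the ratio ‖X‖_2/√n is already close to, though somewhat below, 2):

```python
import random

def top_eigenvalue(M, shift, iters=150):
    # Power iteration on M + shift*I; with shift > ||M||_2 the largest
    # eigenvalue of the shifted matrix dominates in absolute value.
    n = len(M)
    v = [1.0 / n ** 0.5] * n
    est = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) + shift * v[i]
             for i in range(n)]
        est = sum(x * x for x in w) ** 0.5
        v = [x / est for x in w]
    return est - shift

random.seed(0)
n = 120
# Symmetric matrix of i.i.d. +/-1 entries (zero mean, variance 1, bounded).
X = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        X[i][j] = X[j][i] = random.choice((-1.0, 1.0))

ratio = top_eigenvalue(X, shift=float(n)) / n ** 0.5   # close to 2 for large n
```

The power-iteration estimate is a lower bound on ‖X‖_2, so the ratio slightly undershoots the asymptotic value 2 at this size.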
6.1
Proof of Theorem 4
First assume that G_n is generated according to the null model P_{0,n}. Then A^{q_0}_{G_n} is a random symmetric matrix with independent entries, where each entry is a zero-mean random variable equal to 1 with probability q_0 and to −q_0/(1 − q_0) with probability 1 − q_0. Using Theorem 8,

λ_1(A^{q_0}_{G_n}) ≤ 2.1 σ(q_0) √n

with high probability as n → ∞. Therefore, lim sup_{n→∞} P_{0,n}{ T_spec(G_n) = 1 } = 0.

Now assume that G_n is generated according to the planted model P_{1,n}, with parameters q_0 and H_n. Hence, A^{q_0}_{G_n} is distributed as Π_n^T A_{H_n} Π_n + E_n, where Π_n ∈ {0,1}^{v(H_n)×n} and (Π_n)_{ij} = 1 if and only if ϕ_{0,n}(i) = j. Further, E_n is a random symmetric matrix with independent entries, where (E_n)_{ij} = 0 if (Π_n^T A_{H_n} Π_n)_{ij} = 1 and, otherwise, (E_n)_{ij} is a zero-mean random variable equal to 1 with probability q_0 and to −q_0/(1 − q_0) with probability 1 − q_0. Let v, ‖v‖_2 = 1, be the principal eigenvector of A_{H_n}. We have

λ_1(A^{q_0}_{G_n}) ≥ ⟨Π_n^T v, A^{q_0}_{G_n} Π_n^T v⟩ = ⟨Π_n^T v, Π_n^T A_{H_n} Π_n Π_n^T v⟩ + ⟨Π_n^T v, E_n Π_n^T v⟩
= ⟨v, A_{H_n} v⟩ + ⟨v, Π_n E_n Π_n^T v⟩.

Therefore,

lim inf_{n→∞} λ_1(A^{q_0}_{G_n})/√n ≥ lim inf_{n→∞} λ_1(A_{H_n})/√n − lim sup_{n→∞} ⟨v, Π_n E_n Π_n^T v⟩/√n
≥ 3σ(q_0) − lim sup_{n→∞} λ_1(Π_n E_n Π_n^T)/√n.

Now, using Theorem 7, λ_1(Π_n E_n Π_n^T) ≤ c√(v(H_n)) for some c and large enough n, almost surely. Therefore, lim sup_{n→∞} λ_1(Π_n E_n Π_n^T)/√n = 0 and, under the alternative,

lim inf_{n→∞} λ_1(A^{q_0}_{G_n})/√n ≥ 2.1 σ(q_0),

almost surely. Hence, lim sup_{n→∞} P_{1,n}{ T_spec(G_n) = 0 } = 0 and the two models are strongly distinguishable using the spectral test.
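The Rayleigh-quotient step of the proof, evaluating ⟨Π_n^T v, A^{q_0}_{G_n} Π_n^T v⟩, can be simulated directly, without computing any eigenvalues. The sketch below takes H_n to be a clique for concreteness (an assumption made only for this illustration; the proof covers general H_n) and compares the quotient under the two models:

```python
import random

def rayleigh_on_set(adj, S, q0):
    # <u, A^{q0} u> for u uniform on S (||u||_2 = 1): a lower bound on
    # lambda_1(A^{q0}).  A^{q0} entries: 1 on edges, -q0/(1-q0) on non-edges.
    k = len(S)
    c = -q0 / (1.0 - q0)
    return sum((1.0 if adj[i][j] else c) for i in S for j in S if i != j) / k

def sample_graph(n, q0, planted_set=None):
    # Erdos-Renyi G(n, q0); optionally force a clique on planted_set.
    adj = [[False] * n for _ in range(n)]
    P = set(planted_set or [])
    for i in range(n):
        for j in range(i + 1, n):
            e = random.random() < q0 or (i in P and j in P)
            adj[i][j] = adj[j][i] = e
    return adj

random.seed(1)
n, k, q0 = 300, 40, 0.5
S = list(range(k))
r_null = rayleigh_on_set(sample_graph(n, q0), S, q0)
r_planted = rayleigh_on_set(sample_graph(n, q0, S), S, q0)
```

Under the planted model the quotient equals k − 1 (every pair in S is an edge), while under the null it is an average of centered entries and stays O(1), which is the separation the spectral test exploits.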
6.2
Proof of Theorem 5
We start by proving some useful lemmas.

Lemma 6.1. Let A_G = Π^T A_H Π + E + Ẽ, where A_G is a symmetric n × n matrix, Π ∈ {0,1}^{k×n} with Π1 = 1, A_H is a k × k symmetric matrix with λ_2(A_H) ≤ (1 − ε)λ_1(A_H), and k = o(n). Further, let E be a random symmetric matrix with independent entries, where each entry is a zero-mean random variable equal to 1 with probability p and to −p/(1 − p) with probability 1 − p. Finally, Ẽ_{ij} = −E_{ij} if (Π^T A_H Π)_{ij} = 1 and Ẽ_{ij} = 0 otherwise. Let v ∈ R^n, x ∈ R^k, ‖v‖_2 = ‖x‖_2 = 1, be the leading eigenvectors of A_G and A_H, respectively. Assume that for some δ ∈ (0, 1),

λ_1(A_H) ≥ (3/(εδ)) √(np/(1 − p)).

Then v = αΠ^T x + z for some α such that α² ≥ 1 − δ, with high probability as n → ∞.
Proof. Let S ⊆ [n] be the set of indices i for which the i-th column of Π is not entirely zero. We denote the complement of this set by S̄. We can write v = Π^T(αx + βy) + v_{S̄} = αΠ^T x + z, where y ∈ R^k is such that y ⊥ x and α² + β² = ‖v_S‖_2². Now, if α² < 1 − δ, then

⟨v, A_G v⟩ = α²⟨x, A_H x⟩ + 2αβ⟨x, A_H y⟩ + β²⟨y, A_H y⟩ + ⟨v, Ev⟩ + ⟨v, Ẽv⟩.

Since x is an eigenvector of A_H and x ⊥ y, we have ⟨x, A_H y⟩ = 0. Now, using Theorems 7 and 8, with high probability as n → ∞,

⟨v, A_G v⟩ ≤ α²λ_1(A_H) + (‖v_S‖_2² − α²)(1 − ε)λ_1(A_H) + (2 + o(1))√(np/(1 − p)) + c√k
≤ (1 − δ)λ_1(A_H) + δ(1 − ε)λ_1(A_H) + (2 + o(1))√(np/(1 − p)) + c√k.

Therefore, if λ_1(A_H) ≥ (3/(εδ))√(np/(1 − p)), then ⟨v, A_G v⟩ < λ_1(A_H). Further, H is a subgraph of G; hence, by the Perron-Frobenius theorem, λ_1(A_H) ≤ λ_1(A_G). Thus v cannot be the leading eigenvector of A_G, and the lemma is proved.

The following lemma is an immediate consequence of the above lemma.

Lemma 6.2. Let {H_n}_{n≥1} be a sequence of graphs that are (ε, μ)-balanced in spectrum for some μ > 0, ε ∈ (0, 1). Further, let ϕ_{0,n} ∈ L̃(H, n) be a labeling of the vertices of H_n in [n], with v(H_n) = o(n). Suppose that G_n is generated according to P_1 as in (2.4). Take v to be the leading eigenvector of A^{q_0}_{G_n}. Let |v_{j(1)}| ≥ |v_{j(2)}| ≥ · · · ≥ |v_{j(n)}| be the entries of v and S′ = {j(1), j(2), . . . , j(v(H_n))}. If

λ_1(A_{H_n}) ≥ (3/(εδ)) √(nq_0/(1 − q_0)),

then

|S′ ∩ ϕ_0(V(H_n))| ≥ ( 1 − 2δ/(μ²(1 − δ)) ) v(H_n),

with high probability as n → ∞.

Proof. Note that A^{q_0}_{G_n} is distributed as Π_n^T A_{H_n} Π_n + E_n + Ẽ_n, where Π_n ∈ {0,1}^{v(H_n)×n} and (Π_n)_{ij} = 1 if and only if ϕ_{0,n}(i) = j. Further, E_n is a random symmetric matrix with independent entries, where each entry is a zero-mean random variable equal to 1 with probability q_0 and to −q_0/(1 − q_0) with probability 1 − q_0. Finally, (Ẽ_n)_{ij} = −(E_n)_{ij} if (Π_n^T A_{H_n} Π_n)_{ij} = 1 and (Ẽ_n)_{ij} = 0 otherwise. Hence, defining x to be the leading eigenvector of A_{H_n} and using Lemma 6.1, v = x̃ + z with x̃ = αΠ_n^T x, α² ≥ 1 − δ, and z ⊥ x̃, with high probability. Let S = ϕ_0(V(H_n)). Using the assumption that H_n is (ε, μ)-balanced in spectrum, for i ∈ S, |x̃_i| ≥ μ√(1 − δ)/√(v(H_n)). Note that for i ∉ S, x̃_i = 0. Therefore, for any i ∈ (S̄ ∩ S′) there exists an index i′ ∈ (S̄′ ∩ S) such that z_i² + z_{i′}² ≥ 2( μ√(1 − δ)/(2√(v(H_n))) )². Hence, letting N be the number of indices in S′ which are not in S, we have

2N ( μ√(1 − δ)/(2√(v(H_n))) )² ≤ ‖z‖_2² = 1 − ‖x̃‖_2² ≤ δ.

Therefore, N ≤ 2δ v(H_n)/(μ²(1 − δ)) and |S′ ∩ ϕ_0(V(H_n))| ≥ ( 1 − 2δ/(μ²(1 − δ)) ) v(H_n) with high probability as n → ∞.
Now we prove Theorem 5.

Proof of Theorem 5. First assume that i ∉ ϕ_0(V(H_n)). Recall that d^{(i)} is the number of edges between vertex i and the vertices in S_i. We have

d^{(i)} = Σ_{l=1}^{v(H_n)} X_l,

where {X_l} is a sequence of i.i.d. Bern(q_0) random variables. Therefore, using Chernoff's bound, with t = v(H_n)q_0 + 3√(v(H_n)q_0 log v(H_n)),

P{ ∃ i ∉ ϕ_0(V(H_n)) : d^{(i)} > t } ≤ n P{ d^{(i)} > t }
≤ n exp( −( (t − E d^{(i)})/E d^{(i)} )² E d^{(i)}/3 )
= n exp( −( (t − v(H_n)q_0)/(v(H_n)q_0) )² v(H_n)q_0/3 )
= n exp( −( 3√(v(H_n)q_0 log v(H_n))/(v(H_n)q_0) )² v(H_n)q_0/3 )
= exp{ log n − 3 log v(H_n) }.

Using (4.11), this goes to zero as n → ∞. Therefore, using the union bound, the output set S of Algorithm 1 will be a subset of ϕ_0(V(H_n)).

Now, assume that i = ϕ_0(ĩ) and ĩ ∈ S_c(H_n), where c is as in Theorem 5. Using Lemma 6.2, |S_i ∩ ϕ_0(V(H_n))| ≥ (1 − α)v(H_n). Note that the set S_i is independent of i. Therefore, we have

d^{(i)} = Σ_{l=1}^{deg(ĩ)} Y_l + Σ_{l=deg(ĩ)+1}^{v(H_n)} X_l,

where {X_l} is a sequence of i.i.d. Bern(q_0) random variables and {Y_l} is a sequence of Bern(1 − α + αq_0) random variables. Note that here E d^{(i)} = v(H_n)q_0 + deg(ĩ)(1 − α − q_0 + αq_0) and, using the definition of the c-significant set, deg(ĩ) ≥ (4/(1 − α − q_0 + αq_0))√(v(H_n) log v(H_n)). Now we can define the process {d_j^{(i)}}_{j=0}^{v(H_n)} as

d_0^{(i)} = E d^{(i)},
d_j^{(i)} = E[ d^{(i)} | Y_1, Y_2, . . . , Y_j ]   for j = 1, 2, . . . , deg(ĩ),
d_j^{(i)} = E[ d^{(i)} | Y_1, Y_2, . . . , Y_{deg(ĩ)}, X_{deg(ĩ)+1}, . . . , X_j ]   for j = deg(ĩ) + 1, . . . , v(H_n).

Using this definition, {d_j^{(i)}}_{j=0}^{v(H_n)} is a martingale with respect to {Y_l}_{l=1}^{deg(ĩ)}, {X_l}_{l=deg(ĩ)+1}^{v(H_n)}. Further, |d_j^{(i)} − d_{j−1}^{(i)}| is uniformly bounded by 1. Note that d_{v(H_n)}^{(i)} = d^{(i)} and d_0^{(i)} = E d^{(i)} ≥ v(H_n)q_0 + 4√(v(H_n) log v(H_n)). Hence, by Azuma's inequality,

P{ ∃ i ∈ ϕ_0(S_c(H_n)) : d^{(i)} ≤ t } ≤ v(H_n) P{ d^{(i)} ≤ t }
≤ exp( log v(H_n) − (2/v(H_n)) ( 4√(v(H_n) log v(H_n)) − 3√(v(H_n) log v(H_n)) )² )
→ 0

as n, v(H_n) → ∞. Thus, by the union bound, the output of Algorithm 1 contains all the nodes in the c-significant set of the planted subgraph H_n in G_n and no nodes which are not in the planted subgraph H_n, with high probability as n, v(H_n) → ∞.
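The Chernoff step in the first half of the proof can be checked against an exact binomial tail: with the threshold t = vq_0 + 3√(vq_0 log v), the bound exp{−((t − μ)/μ)² μ/3} dominates P{Bin(v, q_0) > t}. A sketch (the values of v and q_0 below are arbitrary illustrative choices):

```python
import math

def binom_tail_above(v, q, t):
    # Exact P{ Binomial(v, q) > t }.
    return sum(math.comb(v, s) * q ** s * (1 - q) ** (v - s)
               for s in range(v + 1) if s > t)

v, q0 = 400, 0.2                               # v plays the role of v(H_n)
mu = v * q0
t = mu + 3 * math.sqrt(mu * math.log(v))       # the threshold used in the proof
# Here delta = (t - mu)/mu < 1, so the multiplicative Chernoff bound
# exp(-delta^2 * mu / 3) applies; it evaluates to exactly v^{-3}.
tail = binom_tail_above(v, q0, t)
chernoff = math.exp(-(((t - mu) / mu) ** 2) * mu / 3)
```

The exact tail is strictly below the Chernoff bound, and the bound simplifies to v^{-3}, which is the exp{log n − 3 log v(H_n)} rate (after the union bound over n vertices) used above.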
7
Proofs: SDP relaxation
For simplicity we denote v(Hn ) by kn . We start by proving Lemma 4.4.
7.1
Proof of Lemma 4.4
First note that every feasible Π in (4.13) corresponds uniquely to an injective mapping ϕ from [k] to [n], where ϕ(i) = j if and only if Π_{ji} = 1. Based on this, we have

Tr(A_H Π^T A_G Π) = Σ_{i,j=1}^{k} (A_H)_{ij} (A_G)_{ϕ(i)ϕ(j)} = Tr((A_G ⊗ A_H)Y),

where Y = yy^T and y = vec(Π). Moreover, Y is a rank-one positive semidefinite matrix in {0,1}^{nk×nk}. Also,

Tr(Y J_{nk}) = Σ_{i,j=1}^{nk} Y_{ij} = ( Σ_{i=1}^{nk} y_i )² = k².   (7.1)

In addition, for i = 1, 2, . . . , k,

Tr(Y (I_n ⊗ (e_i e_i^T))) = Σ_{l=0}^{n−1} y_{kl+i} = Σ_{j=1}^{n} Π_{ji} = 1.   (7.2)

Further, for j = 1, 2, . . . , n,

Tr(Y ((e_j e_j^T) ⊗ I_k)) = Σ_{l=1}^{k} y_{(j−1)k+l} = Σ_{i=1}^{k} Π_{ji} ≤ 1.   (7.3)

Therefore, Y* = y*y*^T with y* = vec(Π*) is feasible for problem (4.16), with the same objective value as Π* in (4.13). Conversely, if Y is feasible for problem (4.16), then Y = yy^T, where y ∈ {0,1}^{nk}. Also, using (7.1), y has exactly k entries equal to one and nk − k entries equal to zero. Further, using the first equality in (7.2), we deduce that for i = 1, 2, . . . , k,

Σ_{l=0}^{n−1} y_{kl+i} = 1.

Also, using the first equality in (7.3), for j = 1, 2, . . . , n,

Σ_{l=1}^{k} y_{(j−1)k+l} ≤ 1.

This means that the matrix Π ∈ {0,1}^{n×k} whose j-th row is ( y_{(j−1)k+1}, y_{(j−1)k+2}, . . . , y_{jk} ) has exactly one entry equal to one in each column and at most one in each row. Therefore, Π is feasible for problem (4.13). This completes the proof of Lemma 4.4.
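The computations (7.1)-(7.3) can be checked numerically using Tr(Y M) = y^T M y for the rank-one matrix Y = yy^T. In the sketch below the row-major vectorization matches the indexing y_{(j−1)k+i} = Π_{ji}, and the particular Π is an arbitrary feasible example:

```python
def quad_form(y, M):
    # Tr(Y M) = y^T M y for the rank-one matrix Y = y y^T.
    N = len(y)
    return sum(y[a] * M[a][b] * y[b] for a in range(N) for b in range(N))

n, k = 5, 3
# An arbitrary feasible Pi (exactly one 1 per column, at most one per row).
Pi = [[0, 1, 0],
      [0, 0, 0],
      [1, 0, 0],
      [0, 0, 0],
      [0, 0, 1]]
y = [x for row in Pi for x in row]     # y_{(j-1)k + i} = Pi[j][i] (row-major)
N = n * k

J = [[1] * N for _ in range(N)]                         # all-ones J_{nk}
tr_J = quad_form(y, J)                                  # (7.1): should be k^2
# (7.2) Tr(Y (I_n x e_i e_i^T)): diagonal positions a with a % k == i
col_sums = [quad_form(y, [[1 if (a == b and a % k == i) else 0
                           for b in range(N)] for a in range(N)])
            for i in range(k)]
# (7.3) Tr(Y (e_j e_j^T x I_k)): diagonal positions a with a // k == j
row_sums = [quad_form(y, [[1 if (a == b and a // k == j) else 0
                           for b in range(N)] for a in range(N)])
            for j in range(n)]
```

The column constraints all evaluate to 1 and each row constraint to at most 1, while Tr(Y J) = k², exactly as in the proof.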
7.2
Proof of Theorem 6
The following lemma about the spectrum of a random Erdős-Rényi graph is a consequence of Theorem 8.

Lemma 7.1. Let A ∈ {0,1}^{n×n} be a random symmetric matrix with independent entries such that A_{ij} = 1 with probability p_{ij} and A_{ij} = 0 with probability 1 − p_{ij}, where p_{ij} = p if i ≠ j and p_{ij} = 0 otherwise. Also assume that lim_{n→∞} (log n)/(np) = 0. Then,

lim_{n→∞} λ_1(A)/(np) = 1,
lim sup_{n→∞} −λ_n(A) / ( 2√(np(1 − p)) ) = 1,

almost surely.

Lemma 7.2. Let A ∈ {0,1}^{n×n} be a random symmetric matrix with independent entries such that A_{ij} = 1 with probability p_{ij} and A_{ij} = 0 with probability 1 − p_{ij}, where p_{ij} = p if i ≠ j and p_{ij} = 0 otherwise. Also, assume that p = ω((log n)/n). Let D be the n × n diagonal matrix such that D_{ii} = Σ_{j=1}^{n} A_{ij}, and let L = D − A. Then,

(i) L ⪰ 0.
(ii) L1_n = 0.
(iii) If λ_2(L) is the second smallest eigenvalue of L, then

λ_2(L) = np − 2(1 + o(1))√(np(1 − p)),

almost surely, as n → ∞.

Now we can state the proof of Theorem 6.
Proof of Theorem 6. Using the fact that for any graph G with adjacency matrix A_G ∈ {0,1}^{n×n}, λ_1(A_G) ≥ −λ_n(A_G), it suffices to prove the theorem assuming that

lim sup_{n→∞} 2(λ_1(A_{H_n}) − λ_{k_n}(A_{H_n})) √(1 − q_0) / √(nq_0) = 1 − C < 1.

Assume that G_n is generated randomly according to P_{0,n}. Let SDP(G_n; H_n) be the sequence of optimal values of the (random) convex programs (4.17). Let D_{G_n} be the n × n diagonal matrix such that (D_{G_n})_{ii} = deg(i). In order to prove Theorem 6, we have to show that SDP(G_n; H_n) ≥ 2e(H_n) with high probability. In order to do this, we construct a sequence of matrices Y_n which are feasible for problem (4.17) and satisfy Tr((A_{G_n} ⊗ A_{H_n})Y_n) ≥ 2e(H_n), with high probability as n → ∞. We take

Y_n = a_n (D_{G_n} ⊗ I_{k_n}) + b_n (A_{G_n} ⊗ (A_{H_n} + u_n I_{k_n})) + c_n J_{nk_n}   if λ_{k_n}(A_{H_n}) ≥ (2e(H_n) − k_n²)/k_n,
Y_n = b_n (D_{G_n} − A_{G_n}) ⊗ I_{k_n} + b_n A_{G_n} ⊗ J_{k_n}   otherwise,   (7.4)

where

u_n = −λ_{k_n}(A_{H_n}) − 1 + ( k_n λ_{k_n}(A_{H_n}) + k_n² − 2e(H_n) ) / (n k_n²),
a_n = ( 2e(H_n) + k_n u_n + n k_n² − k_n² ) / ( 2 k_n e(G_n)(n k_n − 1) ),
b_n = 1/(2e(G_n)),
c_n = ( k_n(k_n − 1) − 2e(H_n) − k_n u_n ) / ( n² k_n² − n k_n ).
Now, we show that Y_n is feasible for problem (4.17). First, consider the case where λ_{k_n}(A_{H_n}) ≥ (2e(H_n) − k_n²)/k_n. In this case,

u_n ≤ k_n − 1 − 2e(H_n)/k_n ≤ k_n.   (7.5)

Hence, c_n ≥ 0. Also, a_n ≥ 0 and b_n ≥ 0. Thus, Y_n ≥ 0 entrywise. In addition,

max_{i∈V(G_n)} deg(i) < 2nq_0,   (7.6)
e(G_n) > q_0 n²/4,   (7.7)

with high probability as n → ∞. Thus, using the fact that 2e(H_n) ≤ k_n², n + 1 ≤ 2n and, for large enough n, nk_n − 1 ≥ nk_n/2 and n²k_n² − nk_n ≥ n²k_n²/2,

a_n max_{i∈V(G_n)} deg(i) + c_n ≤ 2nq_0 (2nk_n² + k_n u_n) / ( 2k_n (q_0n²/4)(nk_n/2) ) + ( (k_n² − k_n) − k_n u_n ) / (n²k_n²/2)
≤ 16/n + 8u_n/(n²k_n) + 2/n²,   (7.8)

which is less than 1 for large enough n. Also, similarly, using (7.7),

b_n + c_n ≤ 1/((q_0n²)/2) + k_n²/(n²k_n²/2) = 2/(q_0n²) + 2/n² ≤ 1,   (7.9)

for large enough n. Finally, using (7.5), (7.7),

u_n b_n + c_n ≤ k_n/((q_0n²)/2) + k_n²/(n²k_n²/2) = 2k_n/(q_0n²) + 2/n² ≤ 1,   (7.10)

for large enough n. Therefore, according to the construction of Y_n in (7.4), using equations (7.8), (7.9), (7.10), for large enough n, Y_n ≤ 1 entrywise. Also,

Tr(Y_n J_{nk_n}) = 2a_n e(G_n)k_n + 4b_n e(H_n)e(G_n) + 2b_n k_n u_n e(G_n) + c_n n²k_n² = k_n².

Moreover, for i = 1, 2, . . . , k_n,

Tr(Y_n (I_n ⊗ (e_i e_i^T))) = 2e(G_n)a_n + nc_n = 1.

Finally, for j = 1, 2, . . . , n,

Tr(Y_n ((e_j e_j^T) ⊗ I_{k_n})) ≤ k_n a_n max_{i∈V(G_n)} deg(i) + k_n c_n ≤ 16k_n/n + 8u_n/n² + 2k_n/n² ≤ 1,   (7.11)

for large enough n. The second inequality in (7.11) is by (7.8).

Next, we consider the case in which λ_{k_n}(A_{H_n}) < (2e(H_n) − k_n²)/k_n. In this case, since

max_{i∈V(G_n)} deg(i) / (2e(G_n)) ≤ 2nq_0/(n²q_0/2) ≤ 1

with high probability as n → ∞, we have 0 ≤ Y_n ≤ 1 entrywise. Further, for i = 1, 2, . . . , k_n and j = 1, 2, . . . , n,

Tr(Y_n J_{nk_n}) = 2b_n k_n² e(G_n) = k_n²,
Tr(Y_n (I_n ⊗ (e_i e_i^T))) = 2e(G_n)b_n = 1,
Tr(Y_n ((e_j e_j^T) ⊗ I_{k_n})) ≤ k_n b_n max_{i∈V(G_n)} deg(i) ≤ 2k_n nq_0/(n²q_0/2) = 4k_n/n ≤ 1,

for large enough n, with high probability.

Finally, we have to show that the proposed Y_n is positive semidefinite with high probability. In order to show this, it is sufficient to show that

Ỹ_n = 2e(G_n) ã_n (D_{G_n} − A_{G_n}) ⊗ I_{k_n} + A_{G_n} ⊗ Ã_{H_n} ⪰ 0,
where

Ã_{H_n} = A_{H_n} + (u_n + 2e(G_n)a_n) I_{k_n}   if λ_{k_n}(A_{H_n}) ≥ (2e(H_n) − k_n²)/k_n,
Ã_{H_n} = J_{k_n}   otherwise,

and

ã_n = a_n   if λ_{k_n}(A_{H_n}) ≥ (2e(H_n) − k_n²)/k_n,
ã_n = b_n   otherwise.

If λ_{k_n}(A_{H_n}) < (2e(H_n) − k_n²)/k_n, then Ã_{H_n} = J_{k_n} ⪰ 0. Otherwise,

λ_{k_n}(Ã_{H_n}) = λ_{k_n}(A_{H_n}) + u_n + 2a_n e(G_n),

and, since

u_n + λ_{k_n}(A_{H_n}) = ( k_n λ_{k_n}(A_{H_n}) + k_n² − 2e(H_n) ) / (n k_n²) − 1,
2a_n e(G_n) = ( 2e(H_n) + k_n u_n + n k_n² − k_n² ) / ( k_n(n k_n − 1) ) = 1 − ( k_n λ_{k_n}(A_{H_n}) + k_n² − 2e(H_n) ) / (n k_n²),

the three terms add up to λ_{k_n}(Ã_{H_n}) = 0 ≥ 0.
x ∈ Rk ,
y1 y2 .. .
z⊥ = , y n−1 −y 1 − y 2 − · · · − y n−1
y i ∈ Rk .
Using Lemma 7.2, z k is in the nullspace of (D Gn − AGn ) ⊗ Ikn . Therefore, D E D E ˜ H zk z, Y˜ n z = z k , AGn ⊗ A n D E ˜ H zk + 2 z ⊥ , AGn ⊗ A n D E ˜ H z⊥ . + z ⊥ , 2e(Gn )˜ an (D Gn − AGn ) ⊗ Ikn + AGn ⊗ A n Note that D E D E ˜ H z k = 2e(G) x, A ˜H x , z k , AGn ⊗ A n n D
E n−1 D E X ˜ H zk = ˜ z ⊥ , AGn ⊗ A (deg(i) − deg(n)) y , A x . Hn i n i=1
27
Also using Lemmas 7.1, 7.2 we have D E p ˜ H ) kz ⊥ k2 . an e(Gn ) + λ1 (A z ⊥ , Y˜ n z ⊥ ≥ 2nq0 a ˜n e(Gn ) − 2(1 + o(1)) nq0 (1 − q0 ) 2˜ n Note that if λkn (AHn ) ≥ (2e(Hn ) − kn2 )/kn , −λkn (AHn ) − 1 +
kn λkn (AHn ) + kn2 − 2e(Hn ) ≥ −1. nkn2
Therefore, ˜ H ) = λ1 (AH ) + un + 2an e(Gn ) λ1 (A n n kn λkn (AHn ) + kn2 − 2e(Hn ) nkn2 2 2 2e(Hn ) + kn un + nkn − kn + kn (nkn − 1) ≤ λ1 (AHn ) − λkn (AHn ) + 1. ≤ λ1 (AHn ) − λkn (AHn ) +
Otherwise, note that λ1 (AHn ) ≥
2e(Hn ) . kn
Therefore, if λkn (AHn ) < (2e(Hn ) − kn2 )/kn then ˜ H ). λ1 (AHn ) − λkn (AHn ) ≥ kn ≥ λ1 (A n Thus, ˜ H ) ≤ λ1 (AH ) − λk (AH ) + 1. λ1 (A n n n n √ √ In both cases. Using the fact that, lim supn→∞ (2(λ1 (AHn ) − λkn (AHn )) 1 − q0 )/ nq0 = 1 − C, E D an e(Gn )nq0 kz ⊥ k2 . z ⊥ , Y˜ n z ⊥ ≥ 2C˜ Thus, in order to show positive semidefiniteness of Y˜ n , it suffices to show that ˜H 2e(Gn )A n (deg(1) − deg(n))A ˜H n (deg(2) − deg(n))A ˜H n = .. . ˜H (deg(n − 1) − deg(n))A n
Y 0n
˜H (deg(1) − deg(n))A n 2C˜ an e(Gn )nq0 Ikn 0 .. . 0
˜H (deg(2) − deg(n))A n 0 2C˜ an e(Gn )nq0 Ikn .. . 0
··· ··· ··· .. . ···
˜H (deg(n − 1) − deg(n))A n 0 0 0. .. . 2C˜ an e(Gn )nq0 Ikn
Note that, using the Chernoff bound,

P{ max_{i∈V(G_n)} deg(i) ≥ nq_0 + 2√(nq_0 log n) } ≤ n exp( −( 2√(nq_0 log n)/(nq_0) )² nq_0/2 ) = 1/n → 0,

and

P{ min_{i∈V(G_n)} deg(i) ≤ nq_0 − 2√(nq_0 log n) } ≤ n exp( −( 2√(nq_0 log n)/(nq_0) )² nq_0/3 ) = 1/n^{1/3} → 0,

as n → ∞. Hence, deg(i) − deg(n) ≤ 4√(nq_0 log n) for all i, with high probability as n → ∞. Therefore, taking the Schur complement with respect to the lower-right block and using C > 0, it suffices to show that

2e(G_n)Ã_{H_n} − 16 (2Cã_n e(G_n)nq_0)^{−1} n²q_0 log n Ã²_{H_n} = C′ Ã_{H_n} ( ( Cã_n (e(G_n))²/(4n log n) ) I_{k_n} − Ã_{H_n} ) ⪰ 0,   (7.12)

where C′ > 0. This holds, since Ã_{H_n} ⪰ 0, Cã_n(e(G_n))²/(4n log n) is Θ(n/log n), and

lim_{n→∞} (λ_1(A_{H_n}) − λ_{k_n}(A_{H_n})) log n / n = 0.

Hence, Y_n is feasible for problem (4.17), with high probability as n → ∞. Now, note that

Tr((A_{G_n} ⊗ A_{H_n})Y_n) ≥ 4e(H_n)e(G_n)b_n = 2e(H_n).

Thus, with high probability as n → ∞ under the null, the optimal value of problem (4.17), SDP(G_n; H_n), is greater than or equal to 2e(H_n). Note that the optimal value of (4.17) under the alternative, when there is no noise, is 2e(H_n). Therefore, under the conditions of Theorem 6, for the test based on SDP(G_n; H_n), P_{0,n}{T(G_n) = 1} → 1 as n → ∞, and the proof is complete.
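The cancellation underlying the positive semidefiniteness of Ã_{H_n}, namely λ_{k_n}(A_{H_n}) + u_n + 2a_n e(G_n) = 0 for the stated u_n and a_n, is an algebraic identity in the parameters, and can be checked for arbitrary numerical values (the parameter triples below are arbitrary placeholders, not data from the paper):

```python
def lam_min_tilde(lam_k, k, n, eH, eG):
    # u_n and a_n as defined in the proof of Theorem 6; returns
    # lam_k + u_n + 2 a_n e(G_n), i.e. lambda_{k_n}(tilde A_{H_n}),
    # which should vanish identically in all parameters.
    T = (k * lam_k + k * k - 2 * eH) / (n * k * k)
    u = -lam_k - 1 + T
    a = (2 * eH + k * u + n * k * k - k * k) / (2 * k * eG * (n * k - 1))
    return lam_k + u + 2 * a * eG

residual = max(abs(lam_min_tilde(*p)) for p in
               [(-2.0, 5, 100, 7, 900), (3.5, 8, 50, 20, 400), (0.0, 3, 10, 2, 30)])
```

The residual is zero up to floating-point rounding for every choice of parameters, confirming that the smallest eigenvalue of Ã_{H_n} is exactly 0 in the first case of the construction.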
Acknowledgments H.J. was supported by the William R. Hewlett Stanford Graduate Fellowship. A.M. was partially supported by NSF grants CCF-1319979 and DMS-1106627 and the AFOSR grant FA9550-13-10036.
A
Proof of Lemma 5.1
First assume that, along some subsequence {n_k}, under P_{0,n},

Z_n / E_0 Z_n → 1   (A.1)

in probability. For a test T : G_n → {0,1}, define the risk γ_n(T) as

γ_n(T) = P_{0,n}{T(G_n) = 1} + P_{1,n}{T(G_n) = 0}.

Now, writing Z̄_n = Z_n/E_0 Z_n (so that dP_{1,n} = Z̄_n dP_{0,n}), for any test T we have

γ_n(T) = ∫(1 − T) dP_{1,n} + ∫T dP_{0,n}
= ∫( (1 − T)Z̄_n + T ) dP_{0,n}
≥ ∫( (1 − 1{Z̄_n > 1})Z̄_n + 1{Z̄_n > 1} ) dP_{0,n}.

Using (A.1), the last term goes to 1 along {n_k}. Therefore, along {n_k},

lim inf_{n→∞} inf_T γ_n(T) ≥ 1,

which implies that for all tests T,

lim sup_{n→∞} [ P_{0,n}{T(G_n) = 1} + P_{1,n}{T(G_n) = 0} ] ≥ 1.

Thus, P_{0,n}, P_{1,n} are not weakly distinguishable.

Now, assume that

Z_n / E_0 Z_n → 0   (A.2)

in probability under P_{0,n}. As above, it is easy to see that the test T = 1{Z̄_n > 1} satisfies

lim sup_{n→∞} P_{0,n}{T(G_n) = 1} = lim sup_{n→∞} P_{1,n}{T(G_n) = 0} = 0.

Therefore, in this case, P_{0,n}, P_{1,n} are strongly distinguishable.
References [ACV14]
Ery Arias-Castro and Nicolas Verzelen, Community detection in dense random networks, The Annals of Statistics 42 (2014), no. 3, 940–969.
[AKS98]
Noga Alon, Michael Krivelevich, and Benny Sudakov, Finding a large hidden clique in a random graph, Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms, Society for Industrial and Applied Mathematics, 1998, pp. 594–598.
[Alo07]
Uri Alon, Network motifs: theory and experimental approaches, Nature Reviews Genetics 8 (2007), no. 6, 450–461.
[AV11]
Brendan PW Ames and Stephen A Vavasis, Nuclear norm minimization for the planted clique and biclique problems, Mathematical programming 129 (2011), no. 1, 69–89.
[BD11]
Joseph Blitzstein and Persi Diaconis, A sequential importance sampling algorithm for generating random graphs with prescribed degrees, Internet Mathematics 6 (2011), no. 4, 489–522.
[Bur13]
Rainer E Burkard, Quadratic assignment problems, Springer, 2013.
[DM13]
Yash Deshpande and Andrea Montanari, Finding hidden cliques of size √(N/e) in nearly linear time, Foundations of Computational Mathematics (2013), 1–60.
[DM15]
, Improved sum-of-squares lower bounds for hidden clique and hidden submatrix problems, arXiv:1502.06590 (2015).
[EK10]
David Easley and Jon Kleinberg, Networks, crowds, and markets: Reasoning about a highly connected world, Cambridge University Press, 2010.
[FR10]
Uriel Feige and Dorit Ron, Finding hidden cliques in linear time, DMTCS Proceedings (2010), no. 01, 189–204.
[Gra73]
Mark S Granovetter, The strength of weak ties, American journal of sociology (1973), 1360–1380.
[HKP15]
Samuel B Hopkins, Pravesh K Kothari, and Aaron Potechin, Sos and planted clique: Tight analysis of mpw moments at all degrees and an optimal lower bound at degree four, arXiv:1507.05230 (2015).
[HWX14]
Bruce Hajek, Yihong Wu, and Jiaming Xu, Computational lower bounds for community detection on random graphs, arXiv preprint arXiv:1406.6625 (2014).
[HWX15]
, Information limits for recovering a hidden community, arXiv:1509.07859 (2015).
[Jan90]
Svante Janson, Poisson approximation for large deviations, Random Structures & Algorithms 1 (1990), no. 2, 221–229.
[Jer92]
Mark Jerrum, Large cliques elude the metropolis process, Random Structures & Algorithms 3 (1992), no. 4, 347–359.
[JM15]
Hamid Javadi and Andrea Montanari, The hidden motifs problem, In preparation, 2015.
[KA05]
Nadav Kashtan and Uri Alon, Spontaneous evolution of modularity and network motifs, Proceedings of the National Academy of Sciences of the United States of America 102 (2005), no. 39, 13773–13778.
[KIMA04]
Nadav Kashtan, Shalev Itzkovitz, Ron Milo, and Uri Alon, Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs, Bioinformatics 20 (2004), no. 11, 1746–1758.
[MKI+ 03]
Ron Milo, Nadav Kashtan, Shalev Itzkovitz, Mark E.J. Newman, and Uri Alon, On the uniform generation of random graphs with prescribed degree sequences, arXiv condmat/0312028 (2003).
[Mon15]
Andrea Montanari, Finding one community in a sparse graph, arXiv preprint arXiv:1502.05680 (2015).
[MPW15]
Raghu Meka, Aaron Potechin, and Avi Wigderson, Sum-of-squares lower bounds for planted clique, arXiv:1503.06447 (2015).
[MSOI+ 02] Ron Milo, Shai Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, Dmitri Chklovskii, and Uri Alon, Network motifs: simple building blocks of complex networks, Science 298 (2002), no. 5594, 824–827. [RS15]
Prasad Raghavendra and Tselil Schramm, Tight lower bounds for planted clique in the degree-4 sos program, arXiv:1507.05136 (2015).
[SG76]
Sartaj Sahni and Teofilo Gonzalez, P-complete approximation problems, Journal of the ACM (JACM) 23 (1976), no. 3, 555–565.
[SSR+ 05]
Sen Song, Per Jesper Sjöström, Markus Reigl, Sacha Nelson, and Dmitri B Chklovskii, Highly nonrandom features of synaptic connectivity in local cortical circuits, PLoS Biol 3 (2005), no. 3, e68.
[Tao12]
Terence Tao, Topics in random matrix theory, vol. 132, American Mathematical Soc., 2012.
[TS07]
Thomas Thorne and Michael PH Stumpf, Generating confidence intervals on biological networks, BMC bioinformatics 8 (2007), no. 1, 467.
[VAC13]
Nicolas Verzelen and Ery Arias-Castro, Community detection in sparse random networks, arXiv preprint arXiv:1308.2955 (2013).
[WF94]
Stanley Wasserman and Katherine Faust, Social network analysis: Methods and applications, vol. 8, Cambridge university press, 1994.
[YLSK+ 04] Esti Yeger-Lotem, Shmuel Sattath, Nadav Kashtan, Shalev Itzkovitz, Ron Milo, Ron Y Pinter, Uri Alon, and Hanah Margalit, Network motifs in integrated cellular networks of transcription–regulation and protein–protein interaction, Proceedings of the National Academy of Sciences of the United States of America 101 (2004), no. 16, 5934–5939. [ZKRW98] Qing Zhao, Stefan E Karisch, Franz Rendl, and Henry Wolkowicz, Semidefinite programming relaxations for the quadratic assignment problem, Journal of Combinatorial Optimization 2 (1998), no. 1, 71–109.