A LARGE DEVIATION RESULT ON THE NUMBER OF SMALL SUBGRAPHS OF A RANDOM GRAPH
Van H. Vu
Microsoft Research, Microsoft Corporation
One Microsoft Way, Redmond, WA 98052
vanhavu@microsoft.com
Technical Report MSR-TR-99-90, December 1999
Abstract.
Fix a small graph H and let Y_H denote the number of copies of H in the random graph G(n, p). We investigate the degree of concentration of Y_H around its mean, motivated by the following questions.
• For what λ does Y_H have sub-Gaussian behaviour, namely

Pr(|Y_H − E(Y_H)| ≥ (λ var(Y_H))^{1/2}) ≤ e^{−cλ},

where c is a positive constant?
• What is Pr(Y_H ≥ (1 + ε)E(Y_H)) for a constant ε, i.e., when the upper tail is of order Ω(E(Y_H))?
• Fixing λ = ω(1) in advance, find a reasonably small tail T = T(λ) such that

Pr(|Y_H − E(Y_H)| ≥ T) ≤ e^{−λ}.

We prove a general concentration result which contains a partial answer to each of these questions.
§1 INTRODUCTION

Let H be a graph with k vertices {v_1, ..., v_k} and m edges, and consider a subgraph H′ of a bigger graph G. The density of H is the ratio m/k; H is balanced if its density is not smaller than the density of any of its subgraphs.

Consider a random graph G(n, p) and let t_{ij} denote the random variable representing the edge ij; thus, t_{ij} is a {0, 1} random variable with mean p. We assume
that n → ∞, and the asymptotic notations (ω, Θ, O, etc.) are understood under this assumption. [n] will denote the vertex set {1, 2, ..., n} of G(n, p). Throughout the paper, a, b, c, d, β are positive constants whose values may differ from one occurrence to the next. We write f ≫ g if f = ω(g).

We denote by Y_H the number of subgraphs of G(n, p) isomorphic to H. The study of the distribution of Y_H is a classical topic in the theory of random graphs and there is a vast literature on the subject (see [Bol, ERe, Fri, Jan2, JŁR, KRu, Ruc, Sch] and their references, also [Bol2, Kar, Ruc2] for surveys). A significant part of the research deals with the limit distribution of the normalized version of Y_H. For instance, a typical result is the following, proven by Ruciński [Kar]: If 1/2 ≥ p ≫ n^{−1/m(H)}, where m(H) is the maximal density of a subgraph of H, then
(1.1)    Ỹ_H = (Y_H − E(Y_H)) / var^{1/2}(Y_H) →^D N(0, 1),

where →^D denotes convergence in distribution. This implies that for any positive constant λ the following bound holds:
(1.2)    Pr(|Y_H − E(Y_H)| ≥ (λ var(Y_H))^{1/2}) ≤ exp(−cλ),
where one can set c = 1/2 − o(1). Another important result is due to Janson, Łuczak and Ruciński [JŁR], who establish a sharp bound for the probability Pr(Y_H ≤ (1 − ε)E(Y_H)), for a given non-negative number ε. Let H_l denote the set of all subgraphs of H with exactly l vertices. Define

(1.3)    F(H) = min_{k ≥ l ≥ 2} min_{L ∈ H_l} E(Y_L).
Janson's inequality implies

(1.4)    Pr(Y_H ≤ (1 − ε)E(Y_H)) ≤ exp(−cε²F(H)).
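As a concrete illustration (our own worked instance of (1.4), not from the paper): for H = K_3, the subgraphs on at least two vertices are an edge, a path with three vertices, and the triangle itself, with expectations Θ(n²p), Θ(n³p²) and Θ(n³p³) respectively; hence F(K_3) = min(n²p, n³p², n³p³) = min(n²p, n³p³) and

Pr(Y_{K_3} ≤ (1 − ε)E(Y_{K_3})) ≤ exp(−cε² min(n²p, n³p³)).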
When ε is a positive constant, this bound is optimal (in our sense). Indeed, for a properly chosen c, with probability exp(−cF(H)), G(n, p) contains no copy of H (see [JŁR]). In this paper, we study the following three problems:
• For what λ does Y_H have sub-Gaussian behaviour, in the sense of the bound (1.2)?
• What is Pr(Y_H ≥ (1 + ε)E(Y_H)) for a constant ε, i.e., when the upper tail is of order Ω(E(Y_H))?
• Fixing λ = ω(1) in advance, find a reasonably small tail T = T(λ) such that Pr(|Y_H − E(Y_H)| ≥ T) ≤ e^{−λ}.
In terms of (1.4), one may wonder whether a similar bound

(1.5)    Pr(Y_H ≥ (1 + ε)E(Y_H)) ≤ exp(−c(ε)F(H))

would hold for the upper tail. It does not. For most H, at distance εE(Y_H) from the mean (where ε is a constant), the distribution of Y_H is no longer symmetric.

Example. Let ε = 1 and H be the triangle. Then E(Y_H) = Θ(n³p³). Assume that p = o(1/log n) and n³p³ ≥ n²p. Then the bound on the lower tail is exp(−cn²p) according to (1.4). On the other hand, by Theorem 1.1 below, Pr(Y_H ≥ 2E(Y_H)) ≥ exp(−cn²p² log(1/p)). It is obvious, given the condition on p, that n²p ≫ n²p² log(1/p).

The situation concerning the upper tail is less clear and there was no general bound analogous to (1.4). If H is strictly balanced and E(Y_H) is small (a constant, say), then it is known that Y_H is asymptotically Poisson. However, for the more typical case of large expectation, not much was known prior to this paper. The only results we know of are a few bounds for simple graphs (such as the triangle) derived by ad hoc arguments (see [AS, JŁR2], for instance).

In this paper, we introduce a new method to prove large deviation results for Y_H. Using this method, we shall:
• Prove a general exponential upper bound for the upper tail (Theorem 1.1 and Theorem 2.1). We shall also give a general lower bound which nearly matches the upper bound for an infinite family of graphs.
• Show that (1.2) holds for λ → ∞, when p is sufficiently large. It turns out that if p is sufficiently large then (1.2) holds (with an abstract constant c) for λ as large as a positive constant power of n. For instance, if p = 1/2, then one can set λ = n (see Corollary 2.2).
• Give a general strategy to effectively compute a reasonably small tail T(λ) which satisfies Pr(|Y_H − E(Y_H)| ≥ T) ≤ exp(−λ) for a large λ given in advance (see the paragraph and the example following the proof of Theorem 1.1 in Section 2).

We shall also consider the more general problem of counting rooted extensions and prove a general large deviation result for this problem. This bound extends and considerably strengthens an earlier result of Spencer [Spe] (see Theorem 2.3 and the remarks following it). A nice feature of our method is its generality: one can easily modify it to apply to other random models, such as random hypergraphs or random subgraphs of a given graph, and obtain similar bounds. We shall discuss this possibility in Section 6.

In the rest of this section, we demonstrate a sample result. Let α*(H) denote the fractional independence number of H: assign to each vertex v a non-negative weight a_v; α*(H) is the maximum value of a = \sum_v a_v under the constraints a_u + a_v ≤ 1 for all u ∼ v.
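Since α*(H) is the optimum of a small linear program, it is easy to compute mechanically. The following is a minimal sketch (our own helper, not part of the paper), assuming scipy is available; it solves the LP defining α*(H) directly.

```python
from itertools import combinations
from scipy.optimize import linprog

def fractional_independence_number(num_vertices, edges):
    """alpha*(H): maximize sum_v a_v subject to a_u + a_v <= 1 on each edge,
    0 <= a_v <= 1. linprog minimizes, so we negate the objective."""
    c = [-1.0] * num_vertices
    A_ub = []
    for u, v in edges:
        row = [0.0] * num_vertices
        row[u] = row[v] = 1.0          # constraint a_u + a_v <= 1
        A_ub.append(row)
    res = linprog(c, A_ub=A_ub, b_ub=[1.0] * len(edges), bounds=(0, 1))
    return -res.fun

# A triangle gives 3/2; the star on 5 vertices gives 4 = k - 1.
print(fractional_independence_number(3, list(combinations(range(3), 2))))  # 1.5
print(fractional_independence_number(5, [(0, i) for i in range(1, 5)]))    # 4.0
```

The two test cases match values used later in the paper: α*(K_3) = 3/2 in the triangle example above, and the star shows that α*(H) = k − 1 is attainable, which is the sharpness example for Theorem 1.1 below.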
Theorem 1.1. Fix a positive constant K. For any positive number ε ≤ K, there is a positive constant d = d(K, H) such that

exp(−d E(Y_H)^{1/α*(H)} log(1/p)) ≤ Pr(Y ≥ (1 + ε)E(Y_H)).
If H is balanced and ε²E(Y_H)^{1/(k−1)} = ω(log n), then there is a positive constant c = c(K, H) such that

Pr(Y ≥ (1 + ε)E(Y_H)) ≤ exp(−cε²E(Y_H)^{1/(k−1)}).

The sharpness of the result. For any k, there is a graph H on k vertices such that α*(H) = k − 1. Thus, at this level of generality, the upper bound is more or less best possible, in the sense that k − 1 cannot be replaced by k − 1 − δ for any positive constant δ (the factor log(1/p) is negligible if p is constant or E(Y_H) is a positive constant power of n).

Remarks.
• If ε = o(1), for certain p, one can improve the first statement by replacing d by d′ε^{1/α*(H)}, where d′ is a positive constant depending only on H (see the remark following the proof of the lower bound).
• Notice that for any graph H, α*(H) is at least |V(H)|/2 (assign weight 1/2 to all vertices). Thus Theorem 1.1 implies the following corollary.

Corollary 1.2. Given a balanced graph H and a positive constant K, there are positive constants c and d such that for any ε satisfying K ≥ ε > 0 and ε²E(Y_H)^{1/(k−1)} = ω(log n), we have

exp(−d E(Y_H)^{2/k} log(1/p)) ≤ Pr(Y ≥ (1 + ε)E(Y_H)) ≤ exp(−cε²E(Y_H)^{1/(k−1)}).

• Theorem 1.1 can be used to derive the following type of result, which is frequently used in combinatorial applications: if p is sufficiently large and H is balanced, then a.s. every large subset X of [n] contains not significantly more copies of H than expected. For instance, if p is a constant, then for a subset X of Ω(n) vertices, the expected number of copies of H in X is Ω(n^k). Assume that ε is a small positive constant; by Theorem 1.1, the probability that the number of copies of H in X exceeds its expectation by an ε-fraction is O(exp(−Ω(n^{k/(k−1)}))) = o(exp(−n)). On the other hand, there are only O(exp(n)) ways to choose X. Therefore, a.s., no subset X has too many copies of H, as the display below makes precise.
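To spell out the union bound in the last remark (our own display, constants suppressed): the number of subsets X is at most 2^n = exp(O(n)), so

Pr(some X has too many copies of H) ≤ 2^n · exp(−Ω(n^{k/(k−1)})) = exp(O(n) − Ω(n^{k/(k−1)})) = o(1),

since k/(k − 1) > 1.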
Now let us prove the lower bound, which is the easy part of the theorem. This proof will explain how the rather mysterious quantity α*(H) comes into the picture.

Lemma 1.3. Assume that M ≫ k = |V(H)|. Then using O(M) edges one can build a graph which contains M^{α*(H)} copies of H.

Proof. The following simple construction is the proof. Let a_i, i = 1, ..., k, be the weights of the vertices of H in an optimal solution of the fractional independence LP above. Consider pairwise disjoint sets A_1, ..., A_k where |A_i| = ⌈M^{a_i}⌉. For all i, j with i ∼ j in H, connect all points in A_i with all points in A_j. Since a_i + a_j ≤ 1, the resulting graph has O(M) edges. On the other hand, it has at least M^{\sum_{i=1}^k a_i} = M^{α*(H)} copies of H. □
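For concreteness, here is the construction for H = C_4 (our own worked instance): the optimal weights are a_i = 1/2 for all four vertices, so α*(C_4) = 2. Take four disjoint sets A_1, A_2, A_3, A_4, each of size ⌈M^{1/2}⌉, and join A_i completely to A_{i+1} (indices mod 4). Each of the four joins contributes about M edges, so the graph has O(M) edges, while choosing one point from each A_i yields a copy of C_4, for a total of roughly M² = M^{α*(C_4)} copies.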
Now set M = ((1 + ε)E(Y_H))^{1/α*(H)}. With probability exp(−dE(Y_H)^{1/α*(H)} log(1/p)) one can fix in advance Ω(M) edges. By the previous lemma, fixing so many edges is enough to guarantee (1 + ε)E(Y_H) copies of H. Choosing the constant d properly, the proof of the lower bound is thus complete. □

One can achieve a somewhat better bound for the case ε → 0 by the following trick. Instead of ((1 + ε)E(Y_H))^{1/α*(H)}, set M = (2εE(Y_H))^{1/α*(H)}. By fixing Ω(M) edges, we can guarantee 2εE(Y_H) copies of H. Assume that M = o(n); then there are (1 − o(1))n points left. The random graph on these points contains, with probability 1 − o(1), at least (1 − ε)E(Y_H) copies of H. Thus with probability exp(−d′M) (where d′ is a positive constant that does not depend on ε) the random graph contains at least (1 + ε)E(Y_H) copies of H.

Remark. Alon [Alo] proved that Lemma 1.3 is best possible in the sense that using O(M) edges one cannot build more than O(M^{α*(H)}) copies of H. However, we do not require this direction in our proofs.

The proof of the upper bound is derived from a more general result, stated in the next section (Theorem 2.1). This theorem is the main result of this paper and gives partial answers to all problems mentioned at the beginning of this section.

An open question. It would be very nice (and equally surprising) if one could replace k − 1 in the upper bound of Theorem 1.1 by α*(H). In the following we specify ε = 1.

Question. For which H and p does

exp(−dE(Y_H)^{1/α*(H)}) ≥ Pr(Y ≥ 2E(Y_H)),

or even

exp(−dE(Y_H)^{1/α*(H)} log(1/p)) ≥ Pr(Y ≥ 2E(Y_H))
hold for some positive constant d?

§2 GENERAL RESULTS

Let us start with some notation. For j = 0, set F_0 = 1; if j ≥ 2, let
F_j(Y_H) = min_{l ≥ j} min_{L ∈ H_l} E(Y_L).
For j = 0, 2, ..., k, set E_j(Y_H) = E(Y_H)/F_j(Y_H). E_j(Y_H) can be interpreted as follows: on [n], fix a subgraph H′ of H with at least j vertices and draw all other edges randomly; E_j(Y_H) is (up to a constant factor) the maximum expectation of the number of copies of H containing H′, over all choices of H′.

Here is our main theorem.

Theorem 2.1. There is a positive constant c = c(H) such that the following holds. Assume that E_0, E_2, ..., E_{k−1} and λ satisfy the following conditions:
• E_j ≥ E_j(Y_H), j = 0, 2, ..., k − 1
• λ = ω(log n)
• E_{k−1} ≥ λ
• E_j/E_{j+1} ≥ λ for all k − 2 ≥ j ≥ 2
• E_0/E_2 ≥ λ.
Then

Pr(|Y_H − E(Y_H)| ≥ (λE_0E_2)^{1/2}) ≤ exp(−cλ).

Remark. Notice that we omit E_1 and E_1(Y_H) (the reason will be explained in the paragraph following Lemma 5.3). The theorem holds for an arbitrary graph H, without the assumption that H is balanced.

Now we use Theorem 2.1 to deduce Theorem 1.1 and an extension of (1.2).

Proof of Theorem 1.1. We only need to prove the upper bound, that is,

Pr(Y_H ≥ (1 + ε)E(Y_H)) ≤ exp(−cε²E(Y_H)^{1/(k−1)}).

Set E_0 = n^k p^m, λ = aε²(n^k p^m)^{1/(k−1)}, and E_j = (n^k p^m)^{(k−j)/(k−1)} for all j = 2, 3, ..., k − 1, where a is a small positive constant chosen so that (λE_0E_2)^{1/2} ≤ εE(Y_H) and the last three conditions of Theorem 2.1 are satisfied. Since (n^k p^m)^{1/(k−1)} = Ω(E(Y_H)^{1/(k−1)}) and ε²E(Y_H)^{1/(k−1)} = ω(log n) by assumption, λ = ω(log n). The only condition in Theorem 2.1 which needs verification is that E_j ≥ E_j(Y_H). This is trivial for j = 0, so in the following we assume that j ≥ 2. Assume that F_j(Y_H) is attained at a subgraph with k′ vertices and m′ edges, where k′ ≥ j. By definition, there is a constant b such that
E_j(Y_H) ≤ b n^k p^m/(n^{k′} p^{m′}).

We shall show that

n^k p^m/(n^{k′} p^{m′}) ≤ (n^k p^m)^{(k−j)/k} ≪ (n^k p^m)^{(k−j)/(k−1)}.

The second inequality is trivial, by the fact that n^k p^m = ω(1). To show the first inequality, notice that m′/k′ ≤ m/k since H is balanced. On the other hand, np^{m/k} > 1 and k′ ≥ j. Thus

(np^{m/k})^j ≤ (np^{m′/k′})^{k′},

which is equivalent to the first inequality. Since (λE_0E_2)^{1/2} ≤ εE(Y_H), the statement follows from Theorem 2.1. □
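As a quick check of the choice of parameters above (our own computation):

λE_0E_2 = aε²(n^k p^m)^{1/(k−1)} · (n^k p^m) · (n^k p^m)^{(k−2)/(k−1)} = aε²(n^k p^m)²,

since 1/(k − 1) + 1 + (k − 2)/(k − 1) = 2. Hence (λE_0E_2)^{1/2} = a^{1/2}ε n^k p^m, which is at most εE(Y_H) once a is small enough relative to the constant in E(Y_H) = Θ(n^k p^m).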
For an arbitrary graph H (not necessarily balanced), the optimal choice of the E_j depends strongly on the structure of H (i.e., on the numbers E_j(Y_H)). For instance, if λ is given in advance and we want to guarantee a bound of order exp(−cλ), then one can optimize the tail by minimizing E_0E_2 under the conditions of Theorem 2.1. This gives a partial answer to the last problem posed in the introduction. In the following, we present one example of how to compute the best tail (best with respect to our theorem). The reader is encouraged to work out a few other simple cases.

Example. Consider H = C_4 and p = n^{−1/2+γ} for some positive constant γ. Assume that we need a superpolynomial bound exp(−Ω(log²n)), i.e., λ = Ω(log²n). In the following, F_j stands for F_j(C_4) and so on. By definition, we have (up to constant factors) that F_0 = 1, F_2 = min(n²p, n³p², n⁴p⁴), F_3 = min(n³p², n⁴p⁴), F_4 = n⁴p⁴. Given the order of magnitude of p, it follows that F_2 = n²p and F_3 = n³p². So (again up to constant factors) E_0 = n⁴p⁴, E_2 = n²p³ and E_3 = np². Setting E_j = E_j(C_4), we have that a tail of magnitude (λn⁶p⁷)^{1/2} is enough to guarantee the desired upper bound

Pr(|Y_{C_4} − E(Y_{C_4})| ≥ (λn⁶p⁷)^{1/2}) ≤ exp(−cλ).

On the other hand, the situation changes when p gets smaller. Assume now that p = n^{−4/5}, for instance. Then F_2 = F_3 = F_4 = n⁴p⁴. Therefore, E_2 = E_3 = 1. Thus, the natural choices for the E_j are the following: E_0 = n⁴p⁴, E_2 = λ², E_3 = λ. In this case, the tail is of order (λ³n⁴p⁴)^{1/2} and we have

Pr(|Y_{C_4} − E(Y_{C_4})| ≥ (λ³n⁴p⁴)^{1/2}) ≤ exp(−cλ).

Extension of (1.2).
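The bookkeeping in this example is mechanical and easy to automate. Below is a minimal numerical sketch (our own illustration; the subgraph expectations of C_4 are hard-coded and all constant factors are ignored) that mirrors the two cases above, padding E_2 and E_3 with powers of λ when the ratio conditions of Theorem 2.1 would otherwise fail.

```python
import math

def c4_tail(n, p, lam):
    # Expectations (up to constants) of the relevant subgraphs of C_4 on
    # l = 2, 3, 4 vertices: an edge, a path with 3 vertices, and C_4 itself.
    ey = {2: n**2 * p, 3: n**3 * p**2, 4: n**4 * p**4}
    # F_j = min over subgraphs with at least j vertices; E_j = E(Y)/F_j.
    F = {j: min(ey[l] for l in range(j, 5)) for j in (2, 3, 4)}
    E = {0: ey[4], 2: ey[4] / F[2], 3: ey[4] / F[3]}
    # Pad with powers of lam so that E_3 >= lam and E_2/E_3 >= lam hold,
    # as in the second case (p = n^{-4/5}) of the example above.
    E[3] = max(E[3], lam)
    E[2] = max(E[2], lam * E[3])
    return math.sqrt(lam * E[0] * E[2])   # tail of Theorem 2.1

n = 10**6
print(c4_tail(n, n**-0.4, math.log(n)**2))  # p = n^{-1/2+gamma}, gamma = 0.1
print(c4_tail(n, n**-0.8, math.log(n)**2))  # p = n^{-4/5}
```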
Now we use Theorem 2.1 to extend (1.2) to bigger λ, provided that p is sufficiently large.

Assume that p = ω(n^{−1/(k−1)} log n). Set E_j = aE(Y_H)/(n^j p^{\binom{j}{2}}), where a is a positive constant to be determined later. It follows from the definition of F_j(Y_H) in the first section that

F_j(Y_H) ≥ min_{k ≥ l ≥ j} Θ(n^l p^{\binom{l}{2}}).

Using the assumption on p, one can show that n^l p^{\binom{l}{2}} ≤ n^{l+1} p^{\binom{l+1}{2}} for all l ≤ k − 1. Therefore, F_j(Y_H) = Ω(n^j p^{\binom{j}{2}}). Thus, with sufficiently large a, E_j ≥ E_j(Y_H). Moreover, one can show that E_0E_2 = O(var(Y_H)). Since min(E_{k−1}, E_j/E_{j+1}) ≥ np^{k−1}, it follows that one can set λ as large as bnp^{k−1} for some positive constant b. In particular, if p is a constant (1/2, say), then λ can be as large as Ω(n).
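The monotonicity claim is a one-line computation (our own check): for l ≤ k − 1,

n^{l+1} p^{\binom{l+1}{2}} / (n^l p^{\binom{l}{2}}) = np^l = ω(n^{1−l/(k−1)} (log n)^l) = ω(1),

using p = ω(n^{−1/(k−1)} log n).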
Corollary 2.2. There are positive constants b and c such that the following holds. If p = ω(n^{−1/(k−1)} log n) and ω(log n) = λ ≤ bnp^{k−1}, then

Pr(|Y_H − E(Y_H)| ≥ (λvar(Y_H))^{1/2}) ≤ exp(−cλ).

The condition on p and the upper bound on λ in this corollary are determined by the worst case, when H is a k-clique. For a different graph H, it might be possible to improve both parameters.

Number of rooted subgraphs. Let L be a graph with vertices labeled r_1, ..., r_l, v_1, ..., v_k, where R = {r_1, ..., r_l} is a special subset, called the roots. The v_j are free points, and an edge with at least one free endpoint is called a free edge. The pair (R, L) will be dubbed a rooted graph. Let G be a graph on [n] and identify R with a set of l points in G (to simplify the notation, we also call these points r_1, ..., r_l). In a rooted graph we pay no attention to the edges between the roots. Consider a subgraph L′ of G on {r_1, ..., r_l} ∪ W, where |W| = k. We say that this subgraph is an extension if one can label the vertices of W as w_1, ..., w_k so that
• w_i ∼ w_j if and only if v_i ∼ v_j
• w_i ∼ r_j if and only if v_i ∼ r_j.
In other words, (R, L′) is a copy of (R, L).

Consider G = G(n, p); we denote by Y_{(R,L)} the number of extensions corresponding to a given pair (R, L) and a fixed set of vertices r_1, ..., r_l. If l = 0 (i.e., there is no root), then Y_{(∅,L)} = Y_L is the number of copies of L. Therefore, the problem of bounding the deviation of Y_{(R,L)} is a generalization of our original problem. We denote by E(Y_{(R,L)}) the expectation of Y_{(R,L)} in G(n, p); it is clear that E(Y_{(R,L)}) = Ω(n^k p^m), where k is the number of free vertices and m is the number of free edges.
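A minimal example of an extension count (ours, for orientation): let R = {r} and let L have a single free vertex v_1 adjacent to r. An extension is just a neighbour of r, so Y_{(R,L)} = deg(r) and E(Y_{(R,L)}) = (n − 1)p = Θ(n^k p^m) with k = 1 free vertex and m = 1 free edge.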
For a set A ⊂ V(L)\R, let (R_A, L_A) denote the rooted graph obtained by extending the root set by A. By definition, if (R, L) has k free vertices and m free edges, then (R_A, L_A) has k − |A| free vertices and m − e(A) free edges, where e(A) is the number of free edges of L with both endpoints in R ∪ A. For any 0 ≤ j ≤ k − 1, let

M_j(R, L) = max_{A, |A| ≥ j} E(Y_{(R_A,L_A)}).

If A = V(L)\R, we set M_k(R, L) = E(Y_{(R_A,L_A)}) = 1. The reader may notice that the definition of M_j is a generalization of the definition of E_j.

Theorem 2.3. There is a positive constant c = c((R, L)) such that the following holds. Assume that M_0, ..., M_{k−1} and λ satisfy
• M_j ≥ M_j(R, L), j = 0, 1, ..., k − 1
• λ = ω(log n)
• M_{k−1} ≥ λ
• M_j/M_{j+1} ≥ λ for all j = 0, 1, ..., k − 2.
Then

Pr(|Y_{(R,L)} − E(Y_{(R,L)})| ≥ (λM_0M_1)^{1/2}) ≤ exp(−cλ).

Remark. The reader will notice that this theorem looks quite similar to Theorem 2.1. The difference is that in Theorem 2.3 we need to consider M_1, while in Theorem 2.1, E_1 is omitted (the reason will become clear in the proof; see, for instance, the paragraph following Lemma 5.3). So if we apply Theorem 2.3 to a balanced graph H (seen as a rooted graph with empty root), we obtain the following bound, which is similar to but weaker than Theorem 1.1:

Pr(Y_H ≥ (1 + ε)E(Y_H)) ≤ exp(−cE(Y_H)^{1/k}).

The investigation of the number of rooted subgraphs is motivated by a theorem of Shelah and Spencer on zero-one laws. In [SS], Shelah and Spencer proved the following important result.

Theorem 2.4. If p = n^{−γ} for γ irrational, then G(n, p) satisfies the zero-one law.

We omit the (rather involved) definition of zero-one laws and refer to [SS]. Consider a rooted graph (R, L) with k free vertices and m free edges. The ratio m/k is the density of (R, L). Similarly to the case of graphs, we say that (R, L) is balanced if its density is not smaller than the density of any proper subgraph; if the density of (R, L) is strictly larger than that of any proper subgraph, then we say that (R, L) is strictly balanced. We say that p is safe if the expectation of Y_{(R,L)} in G(n, p) is lower bounded by a positive constant power of n.
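As an elementary sanity check of Theorem 2.3 (our own), consider again the single-edge extension with R = {r} and one free vertex adjacent to r. Here Y_{(R,L)} = deg(r) is binomial with mean (n − 1)p, and M_0(R, L) = Θ(np), M_1(R, L) = 1. The tail (λM_0M_1)^{1/2} = Θ((λnp)^{1/2}) then agrees with the Chernoff bound

Pr(|deg(r) − (n − 1)p| ≥ (λnp)^{1/2}) ≤ 2 exp(−Ω(λ)),

which is valid in the range λ ≤ np.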
A key tool in the proof of Theorem 2.4 is a concentration result on the number of rooted subgraphs. This concentration result was later strengthened by Spencer in another paper [Spe] to the following.

Theorem 2.5. If p is safe and (R, L) is strictly balanced, then for any positive constant ε

Pr(|Y_{(R,L)} − E(Y_{(R,L)})| > εE(Y_{(R,L)})) = o(n^{−r}).

The following corollary of Theorem 2.3 improves Theorem 2.5 by giving a sharper upper bound and weakening the assumption that (R, L) is strictly balanced. Notice that if p is safe then E(Y_{(R,L)})^{1/k} ≫ log n.

Corollary 2.6. If p is safe and (R, L) is balanced, then for any positive constant ε

Pr(|Y_{(R,L)} − E(Y_{(R,L)})| > εE(Y_{(R,L)})) < exp(−c(ε)E(Y_{(R,L)})^{1/k}).

The proof is similar to the proof of Theorem 1.1, so we omit it. A result weaker than Corollary 2.6 (but still stronger than Theorem 2.5) was proven by Kim and the present author in an earlier paper [KV].

The rest of the paper is organized as follows. In the next section, we describe our main tool. In Section 4, we prove Theorem 2.3. This theorem will be used as a lemma in the proof of Theorem 2.1, which follows in Section 5. In the final section, Section 6, we shall discuss the possibility of extending our results to more general models.

§3 A MARTINGALE LEMMA

Consider a function Y = Y(t_1, ..., t_n), where the t_i are i.i.d. {0, 1} random variables. Given t = (t_1, ..., t_n), set

effect_i(t) = E(Y | t_1, ..., t_{i−1}) − E(Y | t_1, ..., t_{i−1}, t_i = 1),

and

W(t) = \sum_i p_i effect_i(t).

The following lemma is a corollary of a lemma in [KV].

Lemma 3.1. Let V and A be two arbitrary positive numbers and let λ satisfy 0 < λ < V/c². Then

Pr(|Y − E(Y)| > (λV)^{1/2}) < 2e^{−λ/4} + Pr(W(t) ≥ V/A) + \sum_{i=1}^n Pr(effect_i(t) ≥ A).
We shall prove Theorems 2.1 and 2.3 by induction combined with Lemma 3.1. The hardest part of the proofs was to state the theorems so that they are both general and suitable for an inductive proof. Despite their somewhat complicated and artificial appearance, the statements of Theorems 2.1 and 2.3 allow fairly simple inductive proofs. On the other hand, we have not found any direct proof of the more pleasant (and weaker) statement of Theorem 1.1. We shall first prove Theorem 2.3, since this theorem will be needed as a lemma in the proof of Theorem 2.1.

§4 PROOF OF THEOREM 2.3

We use induction on the number of free edges of L. If Free(L) = 0, then Y_{(R,L)} takes only one value and the statement is trivial. Assume that the statement holds for all rooted graphs (R′, L′) with at most m − 1 free edges and k free vertices; we show that it also holds for a rooted graph (R, L) with m free edges and k free vertices. Given this graph, we shall prove the following claim.

Claim 4.1. There are positive constants a and b such that the following holds. For a sequence M_0, M_1, ..., M_k and λ satisfying
• M_j ≥ M_j(Y_{(R,L)}) for all j = 0, 1, ..., k − 1
• λ = ω(log n)
• M_{k−1} ≥ λ
• M_j/M_{j+1} ≥ λ for all j = 0, 1, ..., k − 2
we have

Pr(|Y_{(R,L)} − E(Y_{(R,L)})| ≥ (aλM_0M_1)^{1/2}) ≤ 2 exp(−λ/4) + (\binom{n}{2} + 1) exp(−bλ).
Given that λ = ω(log n), Theorem 2.3 follows from this claim by resetting λ = aλ (one may always assume that a ≥ 1). Order the random variables t_{ij} arbitrarily and expose them as in Lemma 3.1 (we have here \binom{n}{2} random variables). Set A and V (in Corollary 3.2) equal to M_1 and M_0, respectively. Claim 4.1 follows from Corollary 3.2 and the following two lemmas.

Lemma 4.2. There are positive constants a, b such that for any edge e

Pr(effect_e ≥ aM_1) ≤ exp(−bλ),

for any λ satisfying the conditions of Claim 4.1.
Lemma 4.3. There are positive constants a, b such that

Pr(W(t) ≥ aM_0) ≤ exp(−bλ),

for any λ satisfying the conditions of Claim 4.1.

These two lemmas will be verified using the induction hypothesis. First, we need a better understanding of the quantities in question (effect_e and W). Let us consider two simple cases. For two non-negative random variables X and Y, we write X = O(Y) if there is a positive constant c such that with probability 1, X ≤ cY.

Example 1. L is a triangle with no root. In this case, Y_L = \sum_{1 ≤ i < j < l ≤ n} t_{ij}t_{il}t_{jl}
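To make effect_e and W(t) concrete, here is a hedged numerical sketch (our own, not from this paper or [KV]) for the triangle example: conditioning on the exposed prefix replaces each unexposed edge by p, so effect_e is (p − 1) times the conditional expected number of triangles through e.

```python
import itertools
import random

def triangle_effects(n, p, t, order):
    """effect_e(t) for Y = number of triangles in G(n, p); t maps each edge
    of K_n to its 0/1 outcome, order is the exposure order of the edges."""
    exposed, effects = set(), {}
    for e in order:
        i, j = e
        s = 0.0
        for l in range(n):          # triangles {i, j, l} through e = ij
            if l in (i, j):
                continue
            f1, f2 = tuple(sorted((i, l))), tuple(sorted((j, l)))
            v1 = t[f1] if f1 in exposed else p   # unexposed edges count as p
            v2 = t[f2] if f2 in exposed else p
            s += v1 * v2
        # E(Y | prefix) - E(Y | prefix, t_e = 1) = (p - 1) * s
        effects[e] = (p - 1) * s
        exposed.add(e)
    return effects

n, p = 8, 0.5
edges = [tuple(e) for e in itertools.combinations(range(n), 2)]
t = {e: int(random.random() < p) for e in edges}
eff = triangle_effects(n, p, t, edges)
W = p * sum(eff.values())   # W(t) = sum_i p_i * effect_i(t), all p_i = p here
```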