Concentration of measure for the number of isolated vertices in the Erdős–Rényi random graph by size bias couplings

Subhankar Ghosh and Larry Goldstein∗

University of Southern California
Abstract

Let Y be a nonnegative random variable with mean µ and let $Y^s$, defined on the same space as Y, have the Y-size biased distribution, that is, the distribution characterized by

\[ E[Yf(Y)] = \mu E[f(Y^s)] \]

for all functions f for which these expectations exist. The size bias coupling of Y to $Y^s$ can be used to obtain the following concentration of measure result when Y counts the number of isolated vertices in an Erdős–Rényi random graph model on n vertices with edge probability p. With σ² denoting the variance of Y,

\[ P\left(\frac{Y-\mu}{\sigma} \ge t\right) \le \inf_{\theta \ge 0} \exp(-\theta t + H(\theta)) \quad \text{where} \quad H(\theta) = \frac{\mu}{2\sigma^2}\int_0^\theta s\gamma_s\,ds, \]

with

\[ \gamma_s = 2e^{2s}\left(1+\frac{pe^s}{1-p}\right)^n + (1-p)^{-n} + 1. \]

Left tail inequalities may be obtained in a similar fashion. When np → c for some constant c ∈ (0, ∞) as n → ∞, the bound is of order at most $e^{-kt}$ for some positive constant k.
1 Introduction and main result
For some n ∈ {1, 2, . . .} and p ∈ (0, 1) let K be the random graph on the vertices V = {1, 2, . . . , n}, with the indicators $X_{vw}$ of the presence of an edge between two unequal vertices v and w being independent Bernoulli random variables with success probability p, and $X_{vv} = 0$ for all v ∈ V. Recall that the degree of a vertex v ∈ V is the number of edges incident on v,

\[ d(v) = \sum_{w \in \mathcal{V}} X_{vw}. \tag{1} \]
Many authors have studied the distribution of

\[ Y = \sum_{v \in \mathcal{V}} \mathbf{1}(d(v) = d) \tag{2} \]

∗ Department of Mathematics, University of Southern California, Los Angeles, CA 90089, USA, [email protected] and [email protected]
2000 Mathematics Subject Classification: Primary 60E15; Secondary 60C05, 60D05.
Keywords: Large deviations; graph degree; size biased couplings
counting the number of vertices v of K with degree d(v) = d for some fixed d. In this paper we derive upper bounds, for fixed n, on the distribution function of the number of isolated vertices of K, that is, (2) for the case d = 0, where Y counts the number of vertices having no incident edges. For general d, with p = p_n depending on n, the asymptotic normality of Y was shown in [7] when $n^{(d+1)/d}p_n \to \infty$ and $np_n \to 0$, or $np_n \to \infty$ and $np_n - \log n - d\log\log n \to -\infty$; see also [10] and [3]. Asymptotic normality of Y when $np_n \to c > 0$ was obtained by [2]. The size bias coupling considered here was used in [6] to study the rate of convergence to the multivariate normal distribution for a vector whose components count the number of vertices of some fixed degrees. In [8], the mean µ and variance σ² of Y for the particular case d = 0 are computed as

\[ \mu = n(1-p)^{n-1} \quad \text{and} \quad \sigma^2 = n(1-p)^{n-1}\left(1 + np(1-p)^{n-2} - (1-p)^{n-2}\right). \tag{3} \]
In the same paper, Kolmogorov distance bounds to the normal were obtained and asymptotic normality shown when $n^2p \to \infty$ and $np - \log n \to -\infty$. O'Connell [9] showed that an asymptotic large deviation principle holds for Y. Raič [11] obtained nonuniform large deviation bounds in some generality, for random variables W with E(W) = 0 and Var(W) = 1, of the form

\[ \frac{P(W \ge t)}{1-\Phi(t)} \le e^{t^3\beta(t)/6}\left(1 + Q(t)\beta(t)\right) \quad \text{for all } t \ge 0, \tag{4} \]

where Φ(t) denotes the distribution function of a standard normal variate and Q(t) is a quadratic in t. Although in general the expression for β(t) is not simple, when W equals Y properly standardized and np → c as n → ∞, (4) holds for all n sufficiently large with

\[ \beta(t) = \frac{C_1}{\sqrt{n}}\exp\left(\frac{C_2 t}{\sqrt{n}} + C_3\left(e^{C_4 t/\sqrt{n}} - 1\right)\right) \]

for some constants C₁, C₂, C₃ and C₄. For t of order $n^{1/2}$, for instance, the function β(t) will be small as n → ∞, allowing an approximation of the deviation probability P(W ≥ t) by the normal, to within some factors. Theorem 1.1 below, by contrast, produces a non-asymptotic, explicit bound; that is, it does not require any relation between n and p and is satisfied for all n. Moreover, by (9) and (7), the bound is of order $e^{-\lambda t^2}$ over some range of t, of worst case order $e^{-\rho t}$ for the right tail, and of order $e^{-\eta t^2}$ for the left tail, where λ, ρ and η are explicit, with the bound holding for all t ∈ ℝ.

Theorem 1.1. For n ∈ {1, 2, . . .} and p ∈ (0, 1) let K denote the random graph on n vertices where each edge is present with probability p, independently of all other edges, and let Y denote the number of isolated vertices in K. Then for all t > 0,

\[ P\left(\frac{Y-\mu}{\sigma} \ge t\right) \le \inf_{\theta \ge 0}\exp(-\theta t + H(\theta)) \quad \text{where} \quad H(\theta) = \frac{\mu}{2\sigma^2}\int_0^\theta s\gamma_s\,ds, \tag{5} \]

with the mean µ and variance σ² of Y given in (3), and

\[ \gamma_s = 2e^{2s}\left(1 + \frac{pe^s}{1-p}\right)^n + \beta + 1 \quad \text{where} \quad \beta = (1-p)^{-n}. \tag{6} \]
For the left tail, for all t > 0,

\[ P\left(\frac{Y-\mu}{\sigma} \le -t\right) \le \exp\left(-\frac{t^2}{2}\,\frac{\sigma^2}{\mu(\beta+1)}\right). \tag{7} \]
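To illustrate how the bound (5) is evaluated in practice, the following is a minimal numerical sketch (ours, not from the paper): it computes µ and σ² from (3) and γ_s from (6), obtains H(θ) by quadrature, and minimizes −θt + H(θ) over a bounded range of θ. The function name, the use of SciPy, and the search interval for θ are all our own choices.

```python
# A minimal numerical sketch of the right tail bound (5); not from the paper.
# Assumes SciPy; the helper name and the search interval for theta are ours.
import math
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

def right_tail_bound(n, p, t):
    mu = n * (1 - p) ** (n - 1)                          # mean of Y, eq. (3)
    sigma2 = mu * (1 + n * p * (1 - p) ** (n - 2) - (1 - p) ** (n - 2))  # variance, eq. (3)
    beta = (1 - p) ** (-n)

    def gamma(s):                                        # gamma_s of eq. (6)
        return 2 * math.exp(2 * s) * (1 + p * math.exp(s) / (1 - p)) ** n + beta + 1

    def H(theta):                                        # H(theta) of eq. (5), by quadrature
        return mu / (2 * sigma2) * quad(lambda s: s * gamma(s), 0, theta)[0]

    res = minimize_scalar(lambda th: -th * t + H(th), bounds=(0.0, 2.0), method="bounded")
    return math.exp(res.fun)                             # exp of the minimized exponent

print(right_tail_bound(n=100, p=0.02, t=2.0))            # bound on P((Y - mu)/sigma >= 2)
```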
Recall that for a nonnegative random variable Y with finite, nonzero mean µ, the size bias distribution of Y is given by the law of a variable $Y^s$ satisfying

\[ E[Yf(Y)] = \mu E[f(Y^s)] \tag{8} \]
for all f for which the expectations above exist. The main tool used in proving Theorem 1.1 is size bias coupling, that is, constructing Y and $Y^s$, having the Y-size biased distribution, on the same space. In [4] and [5], size bias couplings were used to prove concentration of measure inequalities when $|Y^s - Y|$ can be almost surely bounded by a constant independent of the problem size, the number of vertices in the present case. Here, when Y is the number of isolated vertices of K, we consider a coupling of Y to $Y^s$ with the Y-size bias distribution where this boundedness condition is violated. Unlike the theorem used in [4] and [5], which can be applied to a wide variety of situations under a bounded coupling assumption, cases where the coupling is unbounded, such as the one considered here, seem to require application specific treatment, and cannot be handled by a single general result.

Remark 1.1. Useful bounds for the minimization in (5) may be obtained by restricting to θ ∈ [0, θ₀] for some θ₀. In this case, as $\gamma_s$ is an increasing function of s, we have

\[ H(\theta) \le \frac{\mu\gamma_{\theta_0}\theta^2}{4\sigma^2} \quad \text{for } \theta \in [0, \theta_0]. \]
The quadratic $-\theta t + \mu\gamma_{\theta_0}\theta^2/(4\sigma^2)$ in θ is minimized at $\theta = 2t\sigma^2/(\mu\gamma_{\theta_0})$. When this value falls in [0, θ₀] we obtain the first bound in (9), and setting θ = θ₀ yields the second; thus

\[ P\left(\frac{Y-\mu}{\sigma} \ge t\right) \le \begin{cases} \exp\left(-\dfrac{t^2\sigma^2}{\mu\gamma_{\theta_0}}\right) & \text{for } t \in [0, \theta_0\mu\gamma_{\theta_0}/(2\sigma^2)] \\[2mm] \exp\left(-\theta_0 t + \dfrac{\mu\gamma_{\theta_0}\theta_0^2}{4\sigma^2}\right) & \text{for } t \in (\theta_0\mu\gamma_{\theta_0}/(2\sigma^2), \infty). \end{cases} \tag{9} \]
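The two-regime bound (9) needs no quadrature at all; a small sketch of it follows (again ours, with θ₀ = 1 an arbitrary default rather than anything the paper prescribes).

```python
# A sketch of the explicit bound (9) from Remark 1.1; not from the paper.
# theta0 is the free truncation parameter of the remark; its default is arbitrary.
import math

def remark_bound(n, p, t, theta0=1.0):
    mu = n * (1 - p) ** (n - 1)                          # eq. (3)
    sigma2 = mu * (1 + n * p * (1 - p) ** (n - 2) - (1 - p) ** (n - 2))
    beta = (1 - p) ** (-n)
    gamma0 = 2 * math.exp(2 * theta0) * (1 + p * math.exp(theta0) / (1 - p)) ** n + beta + 1
    t_split = theta0 * mu * gamma0 / (2 * sigma2)        # boundary between the two regimes
    if t <= t_split:                                     # Gaussian-type regime of (9)
        return math.exp(-t * t * sigma2 / (mu * gamma0))
    return math.exp(-theta0 * t + mu * gamma0 * theta0 ** 2 / (4 * sigma2))  # linear regime
```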
Though Theorem 1.1 is not an asymptotic result, when np → c as n → ∞, (3) and (6) yield

\[ \frac{\sigma^2}{\mu} \to 1 + ce^{-c} - e^{-c}, \quad \beta \to e^c \quad \text{and} \quad \gamma_s \to 2e^{2s+ce^s} + e^c + 1 \quad \text{as } n \to \infty. \]

Since $\lim_{n\to\infty}\gamma_s$ and $\lim_{n\to\infty}\mu/\sigma^2$ exist, the right tail probability is at most of the exponential order $e^{-\rho t}$ for some ρ > 0. Also, the left tail bound (7) in this asymptotic behaves as

\[ \lim_{n\to\infty}\exp\left(-\frac{t^2}{2}\,\frac{\sigma^2}{\mu(\beta+1)}\right) = \exp\left(-\frac{t^2}{2}\,\frac{1+ce^{-c}-e^{-c}}{e^c+1}\right), \]

that is, as $e^{-\eta t^2}$ with η > 0.

The paper is organized as follows. In Section 2 we review results leading to the construction of size biased couplings for sums of possibly dependent variables, and then apply it in Section 3 to the number Y of isolated vertices; this construction first appeared in [6]. The proof of Theorem 1.1 is also given in Section 3.
2 Construction of size bias couplings
In this section we will review the discussion in [6], which gives a procedure for the construction of size bias couplings when Y is a sum; the method has its roots in the work of Baldi et al. [1]. The construction depends on being able to size bias a collection of nonnegative random variables in a given coordinate, as described in Definition 2.1. Letting F be the distribution of Y, first note that the characterization (8) of the size bias distribution $F^s$ is equivalent to the specification of $F^s$ by its Radon-Nikodym derivative

\[ dF^s(x) = \frac{x}{\mu}\,dF(x). \tag{10} \]
Definition 2.1. Let A be an arbitrary index set and let $X = \{X_\alpha : \alpha \in A\}$ be a collection of nonnegative random variables with finite, nonzero expectations $EX_\alpha = \mu_\alpha$ and joint distribution $dF(\mathbf{x})$. For β ∈ A, we say that $X^\beta = \{X_\alpha^\beta : \alpha \in A\}$ has the X-size bias distribution in coordinate β if $X^\beta$ has joint distribution $dF^\beta(\mathbf{x}) = x_\beta\,dF(\mathbf{x})/\mu_\beta$.

Just as (10) is related to (8), the random vector $X^\beta$ has the X-size bias distribution in coordinate β if and only if

\[ E[X_\beta f(X)] = \mu_\beta E[f(X^\beta)] \]

for all functions f for which these expectations exist.
Letting $f(X) = g(X_\beta)$ for some function g one recovers (8), showing that the β-th coordinate of $X^\beta$, that is, $X_\beta^\beta$, has the $X_\beta$-size bias distribution. The factorization

\[ dF(\mathbf{x}) = P(X \in d\mathbf{x} \mid X_\beta = x)\,P(X_\beta \in dx) \]

of the joint distribution of X suggests the following way to construct X. First generate $X_\beta$, a variable with distribution $P(X_\beta \in dx)$. If $X_\beta = x$, then generate the remaining variates $\{X_\alpha, \alpha \ne \beta\}$ with distribution $P(X \in d\mathbf{x} \mid X_\beta = x)$. Similarly, the factorization

\[ dF^\beta(\mathbf{x}) = x_\beta\,dF(\mathbf{x})/\mu_\beta = P(X \in d\mathbf{x} \mid X_\beta = x)\,x_\beta P(X_\beta \in dx)/\mu_\beta = P(X \in d\mathbf{x} \mid X_\beta = x)\,P(X_\beta^\beta \in dx) \tag{11} \]
suggests that to generate $X^\beta$ with distribution $dF^\beta(\mathbf{x})$, one may first generate a variable $X_\beta^\beta$ with the $X_\beta$-size bias distribution, then, when $X_\beta^\beta = x$, generate the remaining variables according to their original conditional distribution given that the β-th coordinate takes on the value x.

Definition 2.1 and the following special case of a proposition from Section 2 of [6] will be applied in the subsequent constructions; the reader is referred there for the simple proof.

Proposition 2.1. Let A be an arbitrary index set, and let $X = \{X_\alpha, \alpha \in A\}$ be a collection of nonnegative random variables with finite means. Let $Y = \sum_{\beta \in A} X_\beta$ and assume $\mu_A = EY$ is finite and positive. Let $X^\beta$ have the X-size biased distribution in coordinate β as in Definition 2.1. Then if I is a random index, independent of X, taking values in A with distribution

\[ P(I = \beta) = \mu_\beta/\mu_A, \quad \beta \in A, \]

the variable $Y^s = \sum_{\alpha \in A} X_\alpha^I$ has the Y-size biased distribution as in (8).
In our examples we use Proposition 2.1 and the random index I, together with (11), to obtain $Y^s$ by first generating $X_I^I$ with the size bias distribution of $X_I$, then, if I = β and $X_\beta^\beta = x$, generating $\{X_\alpha^\beta : \alpha \in A\setminus\{\beta\}\}$ according to the (original) conditional distribution $P(X_\alpha, \alpha \ne \beta \mid X_\beta = x)$.
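To make the recipe concrete, here is a minimal sketch (ours) in the simplest possible setting, Y a sum of independent Bernoulli indicators: the size bias distribution of a Bernoulli variable is a point mass at 1, and by independence the remaining coordinates keep their original conditional distribution, so the construction reduces to forcing the I-th coordinate to 1.

```python
# A minimal sketch (ours) of Proposition 2.1 for Y a sum of independent
# Bernoulli(p_a) indicators; X^I is X with its I-th coordinate forced to 1.
import random

def coupled_pair(probs):
    x = [1 if random.random() < q else 0 for q in probs]     # the original vector X
    i = random.choices(range(len(probs)), weights=probs)[0]  # P(I = b) = p_b / mu
    xs = list(x)
    xs[i] = 1   # a Bernoulli size biased in its own coordinate is identically 1;
                # by independence, the other coordinates keep their original values
    return sum(x), sum(xs)                                   # the coupled pair (Y, Y^s)
```

Averaging Y f(Y) over many draws and comparing it with µ times the average of f(Y^s) gives an empirical check of (8).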
3 Proof of Theorem 1.1
We now present the proof of Theorem 1.1.

Proof. We first construct a coupling of $Y^s$, having the Y-size bias distribution, to Y. Let K be given, and let Y be the number of isolated vertices in K. To size bias Y, first recall the representation of Y as the sum (2) with d = 0. As the summands in (2) are exchangeable, the distribution of the random index I in Proposition 2.1 is uniform. Hence, choose one of the n vertices of K uniformly. If the chosen vertex, say V, is already isolated, we do nothing and set $K^s = K$, as the remaining variables already have their conditional distribution given that V is isolated. Otherwise, obtain $K^s$ by deleting all the edges connected to V. By Proposition 2.1, the variable $Y^s$ counting the number of isolated vertices of $K^s$ has the Y-size biased distribution.
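This construction is equally easy to simulate; the sketch below (ours; the networkx library is used only for brevity and is no part of the paper) returns a coupled pair (Y, Y^s), and comparing empirical averages of Y f(Y) with µ times the average of f(Y^s) again checks (8).

```python
# A simulation sketch (ours) of the coupling used in the proof: pick V uniformly;
# if V is already isolated set K^s = K, otherwise delete every edge incident to V.
import random
import networkx as nx

def coupled_isolated_counts(n, p):
    K = nx.gnp_random_graph(n, p)                 # the Erdos-Renyi graph K
    Y = sum(1 for v in K if K.degree(v) == 0)     # isolated vertices of K
    V = random.randrange(n)                       # uniformly chosen vertex, as above
    Ks = K.copy()
    Ks.remove_edges_from(list(Ks.edges(V)))       # isolate V if it was not already
    Ys = sum(1 for v in Ks if Ks.degree(v) == 0)  # Y^s, with the Y-size biased law
    return Y, Ys
```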
To derive the needed properties of this coupling, let N(v) be the set of neighbors of v ∈ V, and T the collection of isolated vertices of K; that is, with d(v), the degree of v, given in (1),

\[ N(v) = \{w : X_{vw} = 1\} \quad \text{and} \quad \mathcal{T} = \{v : d(v) = 0\}. \]

Note that Y = |T|. Since all edges incident to the chosen V are removed in order to form $K^s$, any neighbor of V which had degree one thereby becomes isolated, and V also becomes isolated if it was not so earlier. As all other vertices are otherwise unaffected, as far as their being isolated or not, we have

\[ Y^s - Y = d_1(V) + \mathbf{1}(d(V) \ne 0) \quad \text{where} \quad d_1(V) = \sum_{w \in N(V)} \mathbf{1}(d(w) = 1). \tag{12} \]
In particular the coupling is monotone, that is, $Y^s \ge Y$. Since $d_1(V) \le d(V)$, (12) yields

\[ Y^s - Y \le d(V) + 1. \tag{13} \]

Now note that for real x ≠ y, the convexity of the exponential function implies

\[ \frac{e^y - e^x}{y - x} = \int_0^1 e^{ty + (1-t)x}\,dt \le \int_0^1 \left(te^y + (1-t)e^x\right)dt = \frac{e^y + e^x}{2}. \tag{14} \]

Using (14), that the coupling is monotone, and that Y is a function of T, for θ ≥ 0 we have

\[ E(e^{\theta Y^s} - e^{\theta Y}) \le \frac{\theta}{2} E\left((Y^s - Y)(e^{\theta Y^s} + e^{\theta Y})\right) = \frac{\theta}{2} E\left(e^{\theta Y}(Y^s - Y)\left(e^{\theta(Y^s - Y)} + 1\right)\right) = \frac{\theta}{2} E\left\{e^{\theta Y} E\left((Y^s - Y)(e^{\theta(Y^s - Y)} + 1) \mid \mathcal{T}\right)\right\}. \tag{15} \]
Now using that $Y^s = Y$ when V ∈ T, and (13), we have

\[ E\left((Y^s - Y)(e^{\theta(Y^s - Y)} + 1) \mid \mathcal{T}\right) \le E\left((d(V) + 1)(e^{\theta(d(V)+1)} + 1)\mathbf{1}(V \notin \mathcal{T}) \mid \mathcal{T}\right) \le e^\theta E\left(\left(d(V)e^{\theta d(V)} + e^{\theta d(V)} + e^{-\theta} d(V)\right)\mathbf{1}(V \notin \mathcal{T}) \mid \mathcal{T}\right) + 1 \le e^\theta E\left(\left(2d(V)e^{\theta d(V)} + e^{-\theta} d(V)\right)\mathbf{1}(V \notin \mathcal{T}) \mid \mathcal{T}\right) + 1, \tag{16} \]
where in the final inequality we have used that $\mathbf{1}(V \notin \mathcal{T}) \le d(V)\mathbf{1}(V \notin \mathcal{T})$. To derive a bound on the expectation in (16), first note that by conditioning we obtain

\[ P(d(V) = k, \mathbf{1}(V \notin \mathcal{T}) = 1 \mid \mathcal{T}) = P(V \notin \mathcal{T} \mid \mathcal{T})\,P(d(V) = k \mid \mathcal{T}, V \notin \mathcal{T}). \tag{17} \]
Since V is chosen independently of K, the distribution on the right hand side of (17) is binomial, with number of trials equal to the number n − 1 − Y of non-isolated vertices other than V, each trial having success probability p, conditioned to be nonzero; that is,

\[ P(d(V) = k \mid \mathcal{T}, V \notin \mathcal{T}) = \begin{cases} \dbinom{n-1-Y}{k} \dfrac{p^k (1-p)^{n-1-Y-k}}{1 - (1-p)^{n-1-Y}} & \text{for } 1 \le k \le n-1-Y \\[2mm] 0 & \text{otherwise.} \end{cases} \tag{18} \]

Using the conditional distribution (18) and (17), and dropping the factor $P(V \notin \mathcal{T} \mid \mathcal{T})$ in the latter to obtain an inequality, it can easily be verified that the first derivative of the conditional moment generating function of d(V) satisfies

\[ E\left(d(V)e^{\theta d(V)}\mathbf{1}(V \notin \mathcal{T}) \mid \mathcal{T}\right) \le \frac{(n-1-Y)\left(pe^\theta + 1 - p\right)^{n-2-Y} pe^\theta}{1 - (1-p)^{n-1-Y}}. \]
Starting with the first term of (16), by the mean value theorem applied to the function $f(x) = x^{n-1-Y}$, for some ξ ∈ (1 − p, 1) we have

\[ 1 - (1-p)^{n-1-Y} = f(1) - f(1-p) = (n-1-Y)p\xi^{n-2-Y} \ge (n-1-Y)p(1-p)^n. \]

Hence, recalling θ ≥ 0,

\[ E\left(d(V)e^{\theta d(V)}\mathbf{1}(V \notin \mathcal{T}) \mid \mathcal{T}\right) \le \frac{(n-1-Y)(pe^\theta + 1 - p)^n pe^\theta}{1 - (1-p)^{n-1-Y}} \le \frac{(n-1-Y)(pe^\theta + 1 - p)^n pe^\theta}{(n-1-Y)p(1-p)^n} = \alpha_\theta \quad \text{where} \quad \alpha_\theta = e^\theta\left(1 + \frac{pe^\theta}{1-p}\right)^n. \tag{19} \]
Lastly, again applying (17) and (18), we may handle the second term in (16) by the inequality

\[ E\left(d(V)\mathbf{1}(V \notin \mathcal{T}) \mid \mathcal{T}\right) \le \frac{(n-1-Y)p}{1 - (1-p)^{n-1-Y}} \le \frac{(n-1-Y)p}{(n-1-Y)p(1-p)^n} = \beta, \tag{20} \]
with β as in (6). Substituting inequalities (19) and (20) into (16) yields

\[ E\left((Y^s - Y)(e^{\theta(Y^s - Y)} + 1) \mid \mathcal{T}\right) \le \gamma_\theta \quad \text{where} \quad \gamma_\theta = 2e^\theta\alpha_\theta + \beta + 1. \tag{21} \]
Now, by (15) we have that

\[ E(e^{\theta Y^s} - e^{\theta Y}) \le \frac{\theta\gamma_\theta}{2} E(e^{\theta Y}) \quad \text{for all } \theta \ge 0. \tag{22} \]
Letting $m(\theta) = E(e^{\theta Y})$ and using (8) and (22), we obtain

\[ m'(\theta) = E(Y e^{\theta Y}) = \mu E(e^{\theta Y^s}) \le \mu\left(1 + \frac{\theta\gamma_\theta}{2}\right) m(\theta). \tag{23} \]
Standardizing, we set

\[ M(\theta) = E\left(\exp\left(\theta\,\frac{Y-\mu}{\sigma}\right)\right) = e^{-\theta\mu/\sigma} m(\theta/\sigma), \tag{24} \]
and now, by differentiating and applying (23), we obtain

\[ M'(\theta) = \frac{1}{\sigma}e^{-\theta\mu/\sigma}m'(\theta/\sigma) - \frac{\mu}{\sigma}e^{-\theta\mu/\sigma}m(\theta/\sigma) \le \frac{\mu}{\sigma}e^{-\theta\mu/\sigma}\left(1 + \frac{\theta\gamma_\theta}{2\sigma}\right)m(\theta/\sigma) - \frac{\mu}{\sigma}e^{-\theta\mu/\sigma}m(\theta/\sigma) = \frac{\mu\theta\gamma_\theta}{2\sigma^2}e^{-\theta\mu/\sigma}m(\theta/\sigma) = \frac{\mu\theta\gamma_\theta}{2\sigma^2}M(\theta). \]
Since M(0) = 1, integrating M′(s)/M(s) over [0, θ] yields the bound

\[ \log(M(\theta)) \le H(\theta), \quad \text{or} \quad M(\theta) \le \exp(H(\theta)), \quad \text{where} \quad H(\theta) = \frac{\mu}{2\sigma^2}\int_0^\theta s\gamma_s\,ds. \]

Hence, for t ≥ 0,

\[ P\left(\frac{Y-\mu}{\sigma} \ge t\right) \le P\left(\exp\left(\theta\,\frac{Y-\mu}{\sigma}\right) \ge e^{\theta t}\right) \le e^{-\theta t} M(\theta) \le \exp(-\theta t + H(\theta)). \]
As the inequality holds for all θ ≥ 0, it holds for the θ achieving the minimal value, proving (5).

To demonstrate the left tail bound, let θ < 0. Since $Y^s \ge Y$ and θ < 0, using (14) and (13), and recalling that $Y^s = Y$ when V ∈ T, we obtain

\[ E(e^{\theta Y} - e^{\theta Y^s}) \le \frac{|\theta|}{2} E\left((e^{\theta Y^s} + e^{\theta Y})(Y^s - Y)\right) \le |\theta| E\left(e^{\theta Y}(Y^s - Y)\right) = |\theta| E\left(e^{\theta Y} E(Y^s - Y \mid \mathcal{T})\right) \le |\theta| E\left(e^{\theta Y} E\left((d(V) + 1)\mathbf{1}(V \notin \mathcal{T}) \mid \mathcal{T}\right)\right). \]
Now (20) yields

\[ E(e^{\theta Y} - e^{\theta Y^s}) \le (\beta+1)|\theta| E(e^{\theta Y}), \]

and therefore

\[ m'(\theta) = \mu E(e^{\theta Y^s}) \ge \mu\left(1 + (\beta+1)\theta\right) m(\theta). \]

Again with M(θ) as in (24),

\[ M'(\theta) = \frac{1}{\sigma}e^{-\theta\mu/\sigma}m'(\theta/\sigma) - \frac{\mu}{\sigma}e^{-\theta\mu/\sigma}m(\theta/\sigma) \ge \frac{\mu}{\sigma}e^{-\theta\mu/\sigma}\left(1 + (\beta+1)\theta/\sigma\right)m(\theta/\sigma) - \frac{\mu}{\sigma}e^{-\theta\mu/\sigma}m(\theta/\sigma) = \frac{\mu(\beta+1)\theta}{\sigma^2}M(\theta). \]
Dividing by M(θ) and integrating over [θ, 0] yields

\[ \log(M(\theta)) \le \frac{\mu(\beta+1)\theta^2}{2\sigma^2}. \tag{25} \]

The inequality in (25) implies that for all t > 0 and θ < 0,

\[ P\left(\frac{Y-\mu}{\sigma} \le -t\right) \le \exp\left(\theta t + \frac{\mu(\beta+1)\theta^2}{2\sigma^2}\right). \]

Taking $\theta = -t\sigma^2/(\mu(\beta+1))$ we obtain (7).

Acknowledgements: The authors would like to thank an anonymous referee for a number of helpful comments.
References

[1] Baldi, P., Rinott, Y. and Stein, C. (1989). A normal approximation for the number of local maxima of a random function on a graph. In Probability, Statistics and Mathematics, Papers in Honor of Samuel Karlin, T. W. Anderson, K. B. Athreya and D. L. Iglehart, eds., Academic Press, 59-81.

[2] Barbour, A. D., Karoński, M. and Ruciński, A. (1989). A central limit theorem for decomposable random variables with applications to random graphs. J. Combinatorial Theory B, 47, 125-145.

[3] Bollobás, B. (1985). Random Graphs. Academic Press Inc., London.

[4] Ghosh, S. and Goldstein, L. (2009). Concentration of measures via size biased couplings. Probab. Th. Rel. Fields, DOI: 10.1007/s00440-009-0253-3.

[5] Ghosh, S. and Goldstein, L. (2011). Applications of size biased couplings for concentration of measures. Electronic Communications in Probability, 16, 70-83.

[6] Goldstein, L. and Rinott, Y. (1996). Multivariate normal approximations by Stein's method and size bias couplings. Journal of Applied Probability, 33, 1-17.

[7] Karoński, M. and Ruciński, A. (1987). Poisson convergence and semi-induced properties of random graphs. Math. Proc. Cambridge Philos. Soc., 101, 291-300.

[8] Kordecki, W. (1990). Normal approximation and isolated vertices in random graphs. In Random Graphs '87, Karoński, M., Jaworski, J. and Ruciński, A., eds., John Wiley & Sons Ltd., 131-139.

[9] O'Connell, N. (1998). Some large deviation results for sparse random graphs. Probab. Th. Rel. Fields, 110, 277-285.

[10] Palka, Z. (1984). On the number of vertices of given degree in a random graph. J. Graph Theory, 8, 167-170.

[11] Raič, M. (2007). CLT related large deviation bounds based on Stein's method. Adv. Appl. Prob., 39, 731-752.