LIMIT THEOREMS FOR EMPIRICAL DENSITY OF GREATEST COMMON DIVISORS
BEHZAD MEHRDAD AND LINGJIONG ZHU
Abstract. The law of large numbers for the empirical density for the pairs of uniformly distributed integers with a given greatest common divisor is a classic result in number theory. In this paper, we study the large deviations of the empirical density. We will also obtain a sharp rate of convergence to the normal distribution for the central limit theorem. Some generalizations are provided.
1. Introduction
Let $X_1, \ldots, X_n$ be random variables uniformly distributed on $\{1, 2, \ldots, n\}$. It is well known that
\[ \frac{1}{n^2} \sum_{1 \le i,j \le n} 1_{\gcd(X_i,X_j)=\ell} \to \frac{6}{\pi^2 \ell^2}, \qquad \ell \in \mathbb{N}. \tag{1.1} \]
Heuristically, observe that $X_i, X_j \in C_\ell := \{\ell n : n \in \mathbb{N}\}$ with probability $\frac{1}{\ell^2}$ as $n \to \infty$, and that $\{\gcd(X_i,X_j)=\ell\} = \{X_i, X_j \in C_\ell,\ \gcd(X_i/\ell, X_j/\ell)=1\}$. Therefore, we get the desired result by noticing that $\sum_{\ell=1}^{\infty} \frac{1}{\ell^2} = \frac{\pi^2}{6}$. On the other hand, two independent uniformly chosen integers are coprime if and only if they do not have a common prime factor. For any prime number $p$, the probability that a uniformly random integer is divisible by $p$ is $\frac{1}{p}$. Hence, we get an alternative formula,
\[ \frac{1}{n^2} \sum_{1 \le i,j \le n} 1_{\gcd(X_i,X_j)=1} \to \prod_{p \in \mathcal{P}} \left(1 - \frac{1}{p^2}\right) = \frac{1}{\zeta(2)} = \frac{6}{\pi^2}, \tag{1.2} \]
where $\zeta(\cdot)$ is the Riemann zeta function and, throughout this paper, $\mathcal{P}$ denotes the set of all prime numbers in increasing order. The fact that $\mathbb{P}(\gcd(X_i,X_j)=1) \to \frac{6}{\pi^2}$ was first proved by Cesàro [2]. The identity relating the product over primes to $\zeta(2)$ in (1.2) is an example of an Euler product, and the evaluation of $\zeta(2)$ as $\pi^2/6$ is the Basel problem, solved by Leonhard Euler in 1735. For a rigorous proof of (1.2), see e.g. Hardy and Wright [11]. For further details and properties of the distributions, moments and asymptotics for the greatest common divisors, we refer to Cesàro [3], [4], Cohen [5], Diaconis and Erdős [7] and Fernández and Fernández [9], [10].
Since the law of large numbers result is well known, it is natural to study the fluctuations, i.e. the central limit theorem, and the probabilities of rare events, i.e. large deviations. The central limit theorem was recently obtained in Fernández and Fernández [9], and we will provide the sharp rate of convergence to the normal distribution. The large deviations result is the main contribution of this paper. For readers who are interested in probabilistic methods in number theory, we refer to the books by Elliott [8] and Tenenbaum [12].
The paper is organized as follows. In Section 2, we state the main results, i.e. the central limit theorem with the convergence rate to the Gaussian distribution, and the large deviation principle for the empirical density. The proofs for the large deviation principle are given in Section 3, and the proofs for the central limit theorem are given in Section 4.
Date: 25 October 2013. Revised: 25 October 2013.
2000 Mathematics Subject Classification. 60F10, 60F05, 11A05.
Key words and phrases. Greatest common divisors, coprime pairs, central limit theorems, large deviations, rare events.
2. Main Results
2.1. Central Limit Theorem. In this section, we will show a central limit theorem and obtain the sharp rate of convergence to the normal distribution. The method we will use is based on a result by Baldi et al. [1] for Stein's method for central limit theorems.
Theorem 1. Let $Z$ be a standard normal random variable with mean 0 and variance 1. Then
\[ d_{TV}\left( \frac{\sum_{1 \le i,j \le n} 1_{\gcd(X_i,X_j)=1} - n^2 \frac{6}{\pi^2}}{2\sigma^2 n^{3/2}},\ Z \right) \le \frac{C}{n^{1/2}}, \tag{2.1} \]
where $C > 0$ is a universal constant and
\[ \sigma^2 := \prod_{p \in \mathcal{P}} \left(1 - \frac{2}{p^2} + \frac{1}{p^3}\right) - \frac{36}{\pi^4}. \tag{2.2} \]
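Similarly, we have the following result.
Theorem 2. Let $Z$ be a standard normal random variable with mean 0 and variance 1, and let $\ell \in \mathbb{N}$. Then
\[ d_{TV}\left( \frac{\sum_{1 \le i,j \le n} 1_{\gcd(X_i,X_j)=\ell} - n^2 \frac{6}{\ell^2 \pi^2}}{2\sigma^2 n^{3/2}},\ Z \right) \le \frac{C}{n^{1/2}}, \tag{2.3} \]
where $C > 0$ is a universal constant and
\[ \sigma^2 := \frac{1}{\ell^3} \prod_{p \in \mathcal{P}} \left(1 - \frac{2}{p^2} + \frac{1}{p^3}\right) - \frac{36}{\ell^4 \pi^4}. \tag{2.4} \]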
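As a numerical illustration of the limits in (1.1) and of the constants in (2.2) and (2.4), the following minimal sketch may help. It makes two simplifying assumptions that are not part of the paper: the random sample is replaced by the full grid {1, ..., n}^2, and the Euler product is truncated at a finite prime bound. The function names `gcd_density`, `primes_up_to` and `sigma_squared` are hypothetical.

```python
import math


def gcd_density(n, ell):
    """Fraction of pairs (i, j) in {1..n}^2 with gcd(i, j) == ell."""
    hits = sum(1 for i in range(1, n + 1) for j in range(1, n + 1)
               if math.gcd(i, j) == ell)
    return hits / n**2


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit**0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return [p for p in range(2, limit + 1) if sieve[p]]


def sigma_squared(ell=1, prime_limit=10_000):
    """Truncated version of (2.4); ell = 1 gives (2.2)."""
    prod = 1.0
    for p in primes_up_to(prime_limit):
        prod *= 1 - 2 / p**2 + 1 / p**3
    return prod / ell**3 - 36 / (ell**4 * math.pi**4)


if __name__ == "__main__":
    for ell in (1, 2, 3):
        print(ell, gcd_density(400, ell), 6 / (math.pi**2 * ell**2))
    print("sigma^2 (ell = 1), truncated:", sigma_squared())
```

The grid density is a deterministic stand-in for the empirical density; it only illustrates the limiting value, not the fluctuations described by Theorems 1 and 2.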
2.2. Large Deviation Principle. In this section, we study the following probability,
\[ \mathbb{P}\left( \frac{1}{n^2} \sum_{1 \le i,j \le n} 1_{\gcd(X_i,X_j)=\ell} \simeq x \right), \quad \text{as } n \to \infty. \tag{2.5} \]
When $x \ne \frac{1}{\ell^2} \frac{6}{\pi^2}$, this probability goes to zero as $n \to \infty$. Indeed, we will show later that
\[ \mathbb{P}\left( \frac{1}{n^2} \sum_{1 \le i,j \le n} 1_{\gcd(X_i,X_j)=\ell} \simeq x \right) \simeq e^{-n I_\ell(x) + o(n)}, \qquad \ell \in \mathbb{N}. \tag{2.6} \]
In other words, this probability decays exponentially fast as $n \to \infty$. This phenomenon is called large deviations in probability, and the exponent $I_\ell(x)$ is the rate function.
Later in this section, we will specify $I_\ell(x)$ for any $\ell \in \mathbb{N}$. For the moment, let us concentrate on $I_1(x)$, the case in which we consider the number of coprime pairs. Before we proceed, let us introduce the formal definition of large deviations. A sequence $(P_n)_{n \in \mathbb{N}}$ of probability measures on a topological space $X$ satisfies the large deviation principle with rate function $I : X \to \mathbb{R}$ if $I$ is non-negative, lower semicontinuous and, for any measurable set $A$, we have
\[ -\inf_{x \in A^o} I(x) \le \liminf_{n \to \infty} \frac{1}{n} \log P_n(A) \le \limsup_{n \to \infty} \frac{1}{n} \log P_n(A) \le -\inf_{x \in \overline{A}} I(x). \tag{2.7} \]
Here, $A^o$ is the interior of $A$ and $\overline{A}$ is its closure. We refer to Dembo and Zeitouni [6] or Varadhan [13] for general background on large deviations and their applications.
Let $X_i = p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_k^{\alpha_k} q_1^{\beta_1} \cdots q_m^{\beta_m}$, where $p_j$, $1 \le j \le k$, are the first $k$ prime numbers and $q_h$, $1 \le h \le m$, are distinct primes other than the $p_j$. Define $X_i^k$ as the restriction of $X_i$ to its first $k$ primes, i.e. $X_i^k := p_1^{\alpha_1} p_2^{\alpha_2} \cdots p_k^{\alpha_k}$. As a first step towards understanding the event that $X_i$ and $X_j$ are coprime, it is easier to understand the event that $X_i^k$ and $X_j^k$ are coprime. For a given prime $p \in \{p_1, \ldots, p_k\}$, we can ask whether $X_i^k$ and $X_j^k$ are divisible by $p$ or not, for any $1 \le i, j \le n$. This is analogous to selecting two subsets of $\{1, 2, \ldots, k\}$ and asking whether they have empty intersection. To illustrate, let us consider the following example. Choose $n$ subsets $A_i^k$, $1 \le i \le n$, uniformly from the subsets of $\{1, 2, \ldots, k\}$. Then $A_i^k \cap A_j^k = \emptyset$ if and only if $h \notin A_i^k \cap A_j^k$ for every $h \in \{1, 2, \ldots, k\}$, and for each $h$ this occurs with probability $1 - \frac{1}{2^2}$. Therefore, the following law of large numbers holds,
\[ \frac{1}{n^2} \sum_{1 \le i,j \le n} 1_{A_i^k \cap A_j^k = \emptyset} \to \left(1 - \frac{1}{2^2}\right)^k = \frac{3^k}{4^k}. \tag{2.8} \]
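The limit in (2.8) can be checked exactly by enumeration: among the $4^k$ ordered pairs of subsets of $\{1, \ldots, k\}$, exactly $3^k$ are disjoint. A small sketch follows, in which subsets are encoded as bitmasks; this encoding and the name `disjoint_pair_count` are illustrative choices, not taken from the paper.

```python
from itertools import product


def disjoint_pair_count(k):
    """Count ordered pairs (A, B) of subsets of {1..k} with empty intersection.

    A subset is encoded as an integer bitmask in range(2**k).
    Returns (number of disjoint pairs, total number of pairs).
    """
    subsets = range(2**k)
    disjoint = sum(1 for a, b in product(subsets, repeat=2) if (a & b) == 0)
    return disjoint, 4**k


if __name__ == "__main__":
    for k in range(1, 7):
        disjoint, total = disjoint_pair_count(k)
        print(k, disjoint, total, disjoint / total, (3 / 4) ** k)
```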
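Let us denote the $2^k$ subsets of $\{1, 2, \ldots, k\}$ by $B_1, \ldots, B_{2^k}$ and let $\mathcal{P}^k := \{B_1, \ldots, B_{2^k}\}$. Then,
\[ \frac{1}{n^2} \sum_{1 \le i,j \le n} 1_{A_i^k \cap A_j^k = \emptyset} = \iint_{x \in \mathcal{P}^k,\, y \in \mathcal{P}^k} f_k(x,y)\, L_n^A(dx)\, L_n^A(dy), \tag{2.9} \]
where $L_n^A$ is the empirical measure on $\mathcal{P}^k$, i.e.
\[ L_n^A(x) = \frac{1}{n} \sum_{i=1}^{n} \delta_{A_i^k}(x), \tag{2.10} \]
and $f_k(x,y)$ on $\mathcal{P}^k \times \mathcal{P}^k$ is defined as
\[ f_k(x,y) := \begin{cases} 0 & \text{if } x \cap y \ne \emptyset, \\ 1 & \text{otherwise.} \end{cases} \tag{2.11} \]
The sets $A_i^k$ are i.i.d. uniformly distributed on $\mathcal{P}^k$. Sanov's theorem tells us that
\[ \mathbb{P}(L_n^A \simeq \mu) = e^{-n I(\mu) + o(n)}, \tag{2.12} \]
where $\mu = (\mu_j)_{1 \le j \le 2^k} \in \mathcal{M}(\mathcal{P}^k)$, the space of all probability measures on $\mathcal{P}^k$, and the rate function is given by
\[ I(\mu) = \sum_{j=1}^{2^k} \log\left(\frac{\mu_j}{2^{-k}}\right) \mu_j. \tag{2.13} \]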
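By the contraction principle,
\[ \mathbb{P}\left( \frac{1}{n^2} \sum_{1 \le i,j \le n} 1_{A_i^k \cap A_j^k = \emptyset} \simeq x \right) \simeq e^{-n I(x) + o(n)}, \tag{2.14} \]
where
\[ I(x) = \inf_{\iint_{\mathcal{P}^k \times \mathcal{P}^k} f_k(x,y)\, \mu(dx)\, \mu(dy) = x}\ \sum_{j=1}^{2^k} \log\left(\frac{\mu_j}{2^{-k}}\right) \mu_j. \tag{2.15} \]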
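Now let us go back to our original problem, but for truncated integers. We are interested in
\[ \mathbb{P}\left( \frac{1}{n^2} \sum_{1 \le i,j \le n} 1_{\gcd(X_i^k, X_j^k)=1} \simeq x \right) \simeq e^{-n I_1(x) + o(n)}. \tag{2.16} \]
Therefore, let us introduce $\mathcal{P}^k$ again, now written in terms of 0-1 vectors,
\[ \mathcal{P}^k := \{(\alpha_1, \ldots, \alpha_k) : \alpha_j \in \{0, 1\}\}. \tag{2.17} \]
As before, we take $A_i^k$ and $A_j^k$ from $\mathcal{P}^k$. Unlike before, the $A_i^k$ are now not taken uniformly from $\mathcal{P}^k$. Instead, they follow a probability distribution $\eta^k = (\eta_i)_{1 \le i \le 2^k}$, where the $\eta_i$ are taken from the set
\[ \left\{ \left(\frac{1}{p_1}\right)^{\alpha_1} \left(1 - \frac{1}{p_1}\right)^{1-\alpha_1} \cdots \left(\frac{1}{p_k}\right)^{\alpha_k} \left(1 - \frac{1}{p_k}\right)^{1-\alpha_k} : \alpha_j \in \{0,1\},\ j = 1, 2, \ldots, k \right\}. \tag{2.18} \]
Another look at $f_k$ in (2.11) reveals that
\[ f_k(x,y) = \begin{cases} 0 & \text{if } x_i = y_i = 1 \text{ for some } 1 \le i \le k, \\ 1 & \text{otherwise.} \end{cases} \tag{2.19} \]
Hence, we have
\[ I(x) = \inf_{\iint_{\mathcal{P}^k \times \mathcal{P}^k} f_k(x,y)\, \mu(dx)\, \mu(dy) = x}\ \sum_{i=1}^{2^k} \log\left(\frac{\mu_i}{\eta_i}\right) \mu_i. \tag{2.20} \]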
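To make the weights in (2.18) concrete, the sketch below builds $\eta^k$ for the first few primes with exact rational arithmetic, checks that the weights sum to one, and evaluates the mean of $f_k$ from (2.19) under $\eta^k \times \eta^k$. Since the coordinates are independent, that mean should equal the truncated Euler product $\prod_{i \le k} (1 - 1/p_i^2)$. The names `PRIMES`, `eta`, `f_k` and `mean_f` are hypothetical, and the choice of six primes is arbitrary.

```python
from fractions import Fraction
from itertools import product

PRIMES = [2, 3, 5, 7, 11, 13]  # assumption: illustration limited to k <= 6


def eta(alpha):
    """Weight from (2.18) of a 0-1 vector alpha of length k <= len(PRIMES)."""
    weight = Fraction(1)
    for a, p in zip(alpha, PRIMES):
        weight *= Fraction(1, p) if a else 1 - Fraction(1, p)
    return weight


def f_k(x, y):
    """Indicator from (2.19): 0 if x and y share a 1 in some coordinate."""
    return 0 if any(a and b for a, b in zip(x, y)) else 1


def mean_f(k):
    """Double sum of f_k(x, y) eta(x) eta(y) over all pairs of 0-1 vectors."""
    points = list(product((0, 1), repeat=k))
    return sum(f_k(x, y) * eta(x) * eta(y) for x in points for y in points)


if __name__ == "__main__":
    k = 4
    total = sum(eta(a) for a in product((0, 1), repeat=k))
    print("total mass:", total)
    print("mean of f_k:", mean_f(k))
```

This is the law-of-large-numbers value around which the rate function in (2.20) vanishes; it says nothing about the infimum itself.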
There are two issues one needs to understand.
• The first problem is how to let $k \to \infty$. We need to understand the meaning of $\mathcal{P}^k$ and the associated probability measure when $k \to \infty$. Later, we will see that to tackle this problem rigorously, we consider the binary expansions of a real number on $[0, 1]$ and define the probability measures accordingly.
• In an "ideal world", the probability that $X_i$ is divisible by any prime number $p$ is precisely $\frac{1}{p}$. But in our definition, $X_i$ is uniformly distributed on $\{1, 2, \ldots, n\}$ and therefore the probability that it is divisible by $p$ is $\frac{[n/p]}{n}$, where $[x]$ is the integer part of $x$. Later, we will define the probability measure for the "ideal world" as $\tilde{\mathbb{P}}$, as opposed to the original probability measure $\mathbb{P}$.
Let $S := (s_i)_{i \in \mathbb{N}}$ be a sequence of numbers on $[0, 1]$. We define the probability measure $\nu_k^S$ on $[0, 1]$, for $k \in \mathbb{N}$, as follows:
\[ \nu_k^S([0, b]) := \sum_{i=1}^{k} \prod_{j=1}^{i-1} (1 - s_j)(1 - s_i) b_i, \tag{2.21} \]
where $b = 0.b_1 b_2 \ldots$ is the binary expansion of $b$. In other words, if we draw a random variable $U$ according to the measure $\nu_k^S$ and consider the first $k$ digits in the binary expansion of $U$, they are distributed as $k$ Bernoulli random variables with parameters $(s_i)_{i=1}^{k}$. It is easy to see that $\nu_k^S$ converges weakly; let $\nu^S$ denote its weak limit. For example, if $s_i = \frac{1}{2}$ for $i \in \mathbb{N}$, then $\nu^S$ is simply the Lebesgue measure on $[0, 1]$. Let $(p_i)_{i \in \mathbb{N}}$ be the members of $\mathcal{P}$ in increasing order. From now on, we work with $\nu_k$ and $\nu$, for which $s_i$ is $\frac{1}{p_i}$, or
\[ \nu = \nu^P, \quad \text{where } P := \left(\frac{1}{p_i}\right)_{i \in \mathbb{N}} \text{ and } p_i \in \mathcal{P}. \tag{2.22} \]
Let $U_1, \ldots, U_n$ be random variables distributed with probability measure $\nu$. Restrict each $U_i$ to its first $k$ digits in the binary expansion and define it as $U_i^k$. By our construction of $\nu$ and $\nu_k$, we see that the $U_i^k$ are random variables distributed with probability measure $\nu_k$. Let us see the connections amongst $X_i^k$, $A_i^k$ and $U_i^k$. If we view $U_i^k$ as a 0-1 sequence of length $k$, there exists a bijection $\psi_k$ to $\mathcal{P}^k$. Hence, $\psi_k(U_i^k)$ has the same distribution as $A_i^k$, that is, $\eta^k$ as in (2.18). In addition, $X_i^k$ is $\phi_k(U_i^k)$, where the map $\phi_k : [0, 1] \to \mathbb{N}$, for $k \in \mathbb{N}$, is defined as
\[ \phi_k(a) := \prod_{i=1}^{k} p_i^{a_i}, \tag{2.23} \]
where $a = 0.a_1 a_2 a_3 \ldots$ is the binary expansion of $a$ and $p_i$ is the $i$th smallest prime number. Moreover, if we define $f : [0, 1]^2 \to \{0, 1\}$ (an extension of (2.19)) as
\[ f(x,y) = \begin{cases} 0 & \text{if } x \text{ and } y \text{ share a common 1 in their binary expansions,} \\ 1 & \text{otherwise,} \end{cases} \tag{2.24} \]
then it is straightforward to see that
\[ f(U_i^k, U_j^k) = 1_{\gcd(X_i^k, X_j^k)=1} = 1_{A_i^k \cap A_j^k = \emptyset}. \tag{2.25} \]
We also extend $\psi_k$ naturally to $\psi$. In that regard, define $A_i := \psi(U_i)$. So,
\[ f(U_i, U_j) = 1_{A_i \cap A_j = \emptyset}. \tag{2.26} \]
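The identity (2.25) can be verified by brute force on finite digit strings. The sketch below represents the first $k$ binary digits as a tuple of 0s and 1s, which is an illustrative simplification of the map $\phi_k$ on $[0, 1]$; the names `PRIMES`, `phi_k` and `f` are hypothetical.

```python
import math
from itertools import product

PRIMES = [2, 3, 5, 7, 11]  # assumption: illustration with k = 5 digits


def phi_k(bits):
    """Map a tuple of binary digits (a_1, ..., a_k) to prod p_i^{a_i}, as in (2.23)."""
    value = 1
    for bit, p in zip(bits, PRIMES):
        if bit:
            value *= p
    return value


def f(x_bits, y_bits):
    """1 if the two digit strings share no common 1, else 0, as in (2.24)."""
    return 0 if any(a & b for a, b in zip(x_bits, y_bits)) else 1


if __name__ == "__main__":
    k = len(PRIMES)
    ok = all(
        f(x, y) == (1 if math.gcd(phi_k(x), phi_k(y)) == 1 else 0)
        for x in product((0, 1), repeat=k)
        for y in product((0, 1), repeat=k)
    )
    print("identity (2.25) holds on all", 4**k, "pairs:", ok)
```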
Since $\phi_k$, $\psi_k$, and $\psi$ are continuous maps, the $U_i$'s, $A_i$'s and $X_i$'s satisfy large deviations with the same rate if one of them satisfies large deviations. Moreover, the rate function should be an extension of (2.20), and this is the subject of our next and main theorem.
Theorem 3. The probability measures $\mathbb{P}\left( \frac{1}{n^2} \sum_{1 \le i,j \le n} 1_{\gcd(X_i,X_j)=1} \in \cdot \right)$ satisfy a large deviation principle with rate function
\[ I_1(x) = \inf_{\iint_{[0,1]^2} f(x,y)\, \mu(dx)\, \mu(dy) = x}\ \int_{[0,1]} \log\left(\frac{d\mu}{d\nu}\right) d\mu, \tag{2.27} \]
where $\nu$ and $f$ are defined in (2.22) and (2.24), respectively.
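We can also consider the following large deviation problem,
\[ \mathbb{P}\left( \frac{1}{n^2} \sum_{1 \le i,j \le n} 1_{\gcd(X_i,X_j)=\ell} \simeq x \right) \simeq e^{-n I_\ell(x) + o(n)}. \tag{2.28} \]
Write
\[ \ell = q_1^{\beta_1} q_2^{\beta_2} \cdots q_m^{\beta_m}, \tag{2.29} \]
where the $q_i$ are distinct primes and the $\beta_i$ are positive integers for $1 \le i \le m$. For a fixed $\ell$, let $p_1, \ldots, p_k$ be the smallest $k$ primes distinct from $q_1, \ldots, q_m$. Any positive integer can be written as
\[ q_1^{\gamma_1} \cdots q_m^{\gamma_m}\, p_1^{\alpha_1} \cdots p_k^{\alpha_k}, \tag{2.30} \]
where the $\gamma_i$ and $\alpha_j$ are non-negative integers. Any number on $[0, 1]$ can be written as
\[ 0.\gamma_1 \gamma_2 \cdots \gamma_m \alpha_1 \alpha_2 \cdots \alpha_k \cdots, \tag{2.31} \]
where $\gamma_1, \ldots, \gamma_m$ are obtained from the ternary expansion and $\alpha_1, \alpha_2, \ldots$ are obtained from the binary expansion. The interpretation is that if an integer is not divisible by $q_i^{\beta_i}$, then $\gamma_i = 0$. If it is divisible by $q_i^{\beta_i}$ but not by $q_i^{\beta_i+1}$, then $\gamma_i = 1$. Finally, if it is divisible by $q_i^{\beta_i+1}$, then $\gamma_i = 2$. We also have $\alpha_j = 0$ if an integer is not divisible by $p_j$ and $\alpha_j = 1$ otherwise. Restrict to the first $m + k$ digits and define a probability measure $\nu_k$ that takes values
\[ g(q_1) \cdots g(q_m) \left(\frac{1}{p_1}\right)^{\alpha_1} \left(1 - \frac{1}{p_1}\right)^{1-\alpha_1} \cdots \left(\frac{1}{p_k}\right)^{\alpha_k} \left(1 - \frac{1}{p_k}\right)^{1-\alpha_k}, \tag{2.32} \]
where
\[ g(q_i) = \begin{cases} 1 - \dfrac{1}{q_i^{\beta_i}} & \text{if } \gamma_i = 0, \\[2mm] \dfrac{1}{q_i^{\beta_i}} - \dfrac{1}{q_i^{\beta_i+1}} & \text{if } \gamma_i = 1, \\[2mm] \dfrac{1}{q_i^{\beta_i+1}} & \text{if } \gamma_i = 2, \end{cases} \qquad 1 \le i \le m. \tag{2.33} \]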
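As a sanity check on (2.33), the three cases should form a probability distribution for each $q_i$, and they should match the exact proportions of integers with the corresponding divisibility pattern over a full period. The sketch below checks both with rational arithmetic; the names `g`, `gamma_digit` and `digit_frequencies` are hypothetical, and the period length `q**(beta + 1) * 7` is an arbitrary multiple chosen for illustration.

```python
from fractions import Fraction


def g(q, beta, gamma):
    """The weight g(q_i) from (2.33) for digit gamma in {0, 1, 2}."""
    if gamma == 0:
        return 1 - Fraction(1, q**beta)
    if gamma == 1:
        return Fraction(1, q**beta) - Fraction(1, q**(beta + 1))
    if gamma == 2:
        return Fraction(1, q**(beta + 1))
    raise ValueError("gamma must be 0, 1 or 2")


def gamma_digit(m, q, beta):
    """Ternary digit attached to the integer m for the prime power q^beta."""
    if m % q**beta != 0:
        return 0
    if m % q**(beta + 1) != 0:
        return 1
    return 2


def digit_frequencies(q, beta):
    """Exact digit frequencies over 1..N, where N is a multiple of q^(beta+1)."""
    n_max = q**(beta + 1) * 7
    counts = [0, 0, 0]
    for m in range(1, n_max + 1):
        counts[gamma_digit(m, q, beta)] += 1
    return [Fraction(c, n_max) for c in counts]


if __name__ == "__main__":
    for q, beta in [(2, 1), (2, 3), (3, 2), (5, 1)]:
        print(q, beta, [g(q, beta, d) for d in range(3)], digit_frequencies(q, beta))
```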
Let $\nu$ be the weak limit of $\nu_k$. Similar to Theorem 3, we get the following result.
Theorem 4. The probability measures $\mathbb{P}\left( \frac{1}{n^2} \sum_{1 \le i,j \le n} 1_{\gcd(X_i,X_j)=\ell} \in \cdot \right)$ satisfy a large deviation principle with rate function
\[ I_\ell(x) = \inf_{\iint_{[0,1]^2} f(x,y)\, \mu(dx)\, \mu(dy) = x}\ \int_{[0,1]} \log\left(\frac{d\mu}{d\nu}\right) d\mu, \tag{2.34} \]
where $f(x, y) = 1$ if 0 never occurs in the first $m$ digits in the expansions of $x$ and $y$, and $x$ and $y$ do not share a common 1 or 2 in their expansions. Otherwise, $f(x, y) = 0$.
Remark 5. It is interesting to observe that $\frac{6}{\pi^2}$ is also the density of square-free integers. That is because an integer is square-free if and only if it is not divisible by $p^2$ for any prime number $p$. Therefore, we have the law of large numbers, i.e.
\[ \frac{1}{n} \sum_{i=1}^{n} 1_{X_i \text{ is square-free}} \to \prod_{p \in \mathcal{P}} \left(1 - \frac{1}{p^2}\right) = \frac{6}{\pi^2}. \tag{2.35} \]
The central limit theorem is standard,
\[ \frac{\sum_{i=1}^{n} 1_{X_i \text{ is square-free}} - \frac{6n}{\pi^2}}{\sqrt{n}} \to N\left(0,\ \frac{6}{\pi^2} - \frac{36}{\pi^4}\right). \tag{2.36} \]
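The large deviation principle also holds with rate function
\[ I(x) := x \log\left(\frac{x}{6/\pi^2}\right) + (1 - x) \log\left(\frac{1 - x}{1 - 6/\pi^2}\right). \tag{2.37} \]
Remark 6. One can also generalize the result and ask what is the probability that $d$ numbers chosen uniformly at random from $\{1, 2, \ldots, n\}$ have greatest common divisor 1. It is not hard to see that
\[ \frac{1}{n^d} \sum_{1 \le i_1, \ldots, i_d \le n} 1_{\gcd(X_{i_1}, \ldots, X_{i_d})=1} \to \prod_{p \in \mathcal{P}} \left(1 - \frac{1}{p^d}\right) = \frac{1}{\zeta(d)}, \quad \text{as } n \to \infty. \tag{2.38} \]
There are $d^2 n(n-1) \cdots (n - (2d - 2))$ pairs $(i_1, \ldots, i_d)$ and $(j_1, \ldots, j_d)$ so that $|\{i_1, \ldots, i_d\} \cap \{j_1, \ldots, j_d\}| = 1$. It is also easy to see that
\[ \mathbb{P}\left(\gcd(X_1, \ldots, X_d) = \gcd(X_d, \ldots, X_{2d-1}) = 1\right) = \prod_{p \in \mathcal{P}} \left(1 - \frac{2}{p^d} + \frac{1}{p^{2d-1}}\right). \tag{2.39} \]
Therefore, we have the central limit theorem,
\[ \frac{1}{d \cdot n^{\frac{2d-1}{2}}} \left( \sum_{1 \le i_1, \ldots, i_d \le n} 1_{\gcd(X_{i_1}, \ldots, X_{i_d})=1} - n^d \prod_{p \in \mathcal{P}} \left(1 - \frac{1}{p^d}\right) \right) \to N\left(0,\ \prod_{p \in \mathcal{P}} \left(1 - \frac{2}{p^d} + \frac{1}{p^{2d-1}}\right) - \prod_{p \in \mathcal{P}} \left(1 - \frac{1}{p^d}\right)^2 \right). \tag{2.40} \]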
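The limit $1/\zeta(d)$ in (2.38) can be illustrated numerically. The sketch below replaces the random sample by the full grid $\{1, \ldots, n\}^d$ and approximates $\zeta(d)$ by a partial sum; both are simplifying assumptions for illustration, and the names `coprime_tuple_density` and `inv_zeta` are hypothetical.

```python
import math
from functools import reduce
from itertools import product


def coprime_tuple_density(n, d):
    """Fraction of d-tuples in {1..n}^d whose greatest common divisor is 1."""
    hits = sum(1 for t in product(range(1, n + 1), repeat=d)
               if reduce(math.gcd, t) == 1)
    return hits / n**d


def inv_zeta(d, terms=100_000):
    """Approximate 1 / zeta(d) for d >= 2 by truncating the defining series."""
    return 1 / sum(k**-d for k in range(1, terms + 1))


if __name__ == "__main__":
    print("d = 2:", coprime_tuple_density(60, 2), inv_zeta(2))
    print("d = 3:", coprime_tuple_density(30, 3), inv_zeta(3))
```

The grid sizes are kept small because the enumeration costs $n^d$ gcd computations; agreement improves as $n$ grows.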
We also have the large deviation principle for $\mathbb{P}\left( \frac{1}{n^d} \sum_{1 \le i_1, \ldots, i_d \le n} 1_{\gcd(X_{i_1}, \ldots, X_{i_d})=1} \in \cdot \right)$ with the rate function
\[ I(x) = \inf_{\int \cdots \int_{[0,1]^d} f(x_1, x_2, \ldots, x_d)\, \mu(dx_1) \cdots \mu(dx_d) = x}\ \int_{[0,1]} \log\left(\frac{d\mu}{d\nu}\right) d\mu, \tag{2.41} \]
where $\nu$ is the same as in Theorem 3 and
\[ f(x_1, \ldots, x_d) = \begin{cases} 1 & \text{if } x_1, \ldots, x_d \text{ do not share a common 1 in their binary expansions,} \\ 0 & \text{otherwise.} \end{cases} \tag{2.42} \]
3. Proofs of Large Deviation Principle
In this section, we will prove a series of lemmas and theorems on superexponential estimates that are needed in the proof of Theorem 3, before turning to the proof of Theorem 3 itself. Let us give the definitions of $Y_p$, $S(k_1, k_2)$ and $\tilde{\mathbb{P}}$ that will be used repeatedly throughout this section.
Definition 7. For any prime number $p$, we define (3.1)
Yp := #{1 ≤ i ≤ n : Xi is divisible by p}.
Definition 8. For any k1 , k2 ∈ N, let us define (3.2)
S(k1 , k2 ) := {p ∈ P : k1 < p ≤ k2 }.
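Definitions 7 and 8 translate directly into code. The sketch below computes $Y_p$ for a fixed list of integers and the prime set $S(k_1, k_2)$ by trial division, and checks the elementary fact that $Y_p^2$ counts the ordered pairs $(i, j)$ with $p$ dividing $\gcd(X_i, X_j)$. The function names and the sample list are illustrative assumptions.

```python
import math


def is_prime(m):
    """Trial-division primality test, adequate for small illustrative inputs."""
    if m < 2:
        return False
    return all(m % r for r in range(2, math.isqrt(m) + 1))


def S(k1, k2):
    """The set S(k1, k2) of primes p with k1 < p <= k2, as in (3.2)."""
    return [p for p in range(k1 + 1, k2 + 1) if is_prime(p)]


def Y(xs, p):
    """Y_p from (3.1): how many entries of xs are divisible by p."""
    return sum(1 for x in xs if x % p == 0)


def pairs_with_common_factor(xs, p):
    """Number of ordered pairs (i, j) such that p divides gcd(xs[i], xs[j])."""
    return sum(1 for a in xs for b in xs if math.gcd(a, b) % p == 0)


if __name__ == "__main__":
    xs = [6, 10, 15, 7, 12, 30]  # hypothetical sample
    for p in S(1, 7):
        print(p, Y(xs, p), Y(xs, p) ** 2, pairs_with_common_factor(xs, p))
```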
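Definition 9. We define a probability measure $\tilde{\mathbb{P}}$ under which the $X_i$ are i.i.d. and $\tilde{\mathbb{P}}(X_i \text{ is divisible by } p) = \frac{1}{p}$ for $p \in \mathcal{P}$, $p \le n$, and the events $\{X_i \text{ divisible by } p\}$ and $\{X_i \text{ divisible by } q\}$ are independent for distinct $p, q \in \mathcal{P}$, $p, q \le n$.
Lemma 10. Let $Y$ be a Binomial random variable distributed as $B(\alpha, n)$. For any $\lambda \in \mathbb{R}$, let $\lambda_1 := e^{\lambda}$. If $2\alpha\lambda_1^2 < 1$ and $\alpha < \frac{1}{2}$, then, for sufficiently large $n$,
\[ \frac{1}{n} \log \tilde{\mathbb{E}}\left[ e^{\frac{\lambda}{n} Y^2} \right] \le 4\lambda\alpha^2\lambda_1^4 + \frac{\log 4(n+1)}{n}. \tag{3.3} \]
Proof. By the definition of the Binomial distribution,
\[ \tilde{\mathbb{E}}\left[ e^{\frac{\lambda}{n} Y^2} \right] = \sum_{i=0}^{n} \binom{n}{i} \alpha^i (1-\alpha)^{n-i} e^{\frac{\lambda i^2}{n}} \le (n+1) \max_{0 \le i \le n} \binom{n}{i} \alpha^i (1-\alpha)^{n-i} e^{\frac{\lambda i^2}{n}}. \tag{3.4} \]
Using Stirling's formula, for any $n \in \mathbb{N}$,
\[ 1 \le \frac{n!}{\sqrt{2\pi n}\,(n/e)^n} \le \frac{e}{\sqrt{2\pi}}. \tag{3.5} \]
Therefore, we have $\binom{n}{i} \le 4 e^{n H(i/n)}$, where $H(x) := -x \log x - (1-x) \log(1-x)$, $0 \le x \le 1$. Hence,
\[ \frac{1}{n} \log \tilde{\mathbb{E}}\left[ e^{\frac{\lambda}{n} Y^2} \right] \le \frac{\log 4(n+1)}{n} + \max_{0 \le i \le n} \left\{ H\left(\frac{i}{n}\right) + \frac{i}{n} \log(\alpha) + \left(1 - \frac{i}{n}\right) \log(1-\alpha) + \lambda \left(\frac{i}{n}\right)^2 \right\}. \tag{3.6} \]
To find the maximum of
\[ f(x) := H(x) + x \log(\alpha) + (1-x) \log(1-\alpha) + \lambda x^2, \tag{3.7} \]
it is sufficient to look at (3.8)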
it is sufficient to look at (3.8)
0
f (x) = log
α 1−α
The assumptions 2αλ21 < 1 and α
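0,
\[ \frac{1}{n} \log \tilde{\mathbb{P}}\left( \sum_{p \in S(k,n)} Y_p^2 > \epsilon n^2 \right) \le -\frac{\epsilon}{8} \log(k) + 4. \tag{3.12} \]
Therefore, we have the following superexponential estimate,
\[ \limsup_{k \to \infty} \limsup_{n \to \infty} \frac{1}{n} \log \tilde{\mathbb{P}}\left( \sum_{p \in S(k,n)} Y_p^2 > \epsilon n^2 \right) = -\infty. \tag{3.13} \]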
˜ i is divisible by p}. And whether X ˜ i is Proof. Note that Yp = #{1 ≤ i ≤ n : X ˜ divisible by p is independent from Xi being divisible by q for distinct primes p and q. In other words, Yp are independent for distinct primes p ∈ P. By Chebychev’s inequality, h P i X 2 1 1 ˜ ˜ e nλ p∈S(k,n) Yp (3.14) log P Yp2 > n2 ≤ −λ + log E n n p∈S(k,n) h i 1 X ˜ e nλ Yp2 . log E = −λ + n p∈S(k,n)
We choose k ∈ N large enough so that λ1 = eλ < 2 2 2 2 p λ1 < k λ1 < 1. By Lemma 10, we have (3.15)
1 n
X
h i ˜ e nλ Yp2 ≤ log E
p∈S(k,n)
X p∈S(k,n)
√
2k. For k < p ≤ n, we have
log(4(n + 1)) + 4λ n
2 1 λ41 . p
The prime number theorem states that
\[ \lim_{x \to \infty} \frac{\pi(x)}{x / \log(x)} = 1, \tag{3.16} \]
where π(x) denotes the number of primes less than x. Therefore, |k < p ≤ n, p ∈ 2n P| ≤ log n for sufficiently large n. Together with (3.15), for sufficiently large n, we get h P i X 1 1 ˜ e nλ k n2 ≤ −λ + 3 + n k k n2 ≤ 3 − + 3 n 8 k n2 ≤ 4 log log k2 + 4 − 8 S(k1 ,k2 )
Proof. The idea of the proof is to change the measure from $\mathbb{P}$ to $\tilde{\mathbb{P}}$ and apply Lemma 10. First, observe that $F(X_1, \ldots, X_n) = \sum_{p \in S(k_1,k_2)} Y_p^2$ only depends on the events $\{X_i \in E_{p_1, \ldots, p_\ell}\}$, where $i, \ell \in \{1, 2, \ldots, n\}$ and $\{p_1, \ldots, p_\ell\} \subset S(k_1, k_2)$ and
\[ E_{p_1, \ldots, p_\ell} := \{ i \in \{1, 2, \ldots, n\} \mid \mathrm{Prime}(i) \cap S(k_1, k_2) = \{p_1, \ldots, p_\ell\} \}, \tag{3.21} \]
where $\mathrm{Prime}(x) := \{q \in \mathcal{P} : x \text{ is divisible by } q\}$. We will show that the following uniform upper bound holds,
\[ \frac{\mathbb{P}(X_1 \in E_{p_1, \ldots, p_\ell})}{\tilde{\mathbb{P}}(X_1 \in E_{p_1, \ldots, p_\ell})} \le e^{4 \log\log k_2}. \tag{3.22} \]
Before we proceed, let us show that (3.22) and Theorem 11 imply (3.20). Since the $X_i$'s are independent and the $\tilde{X}_i$'s are independent,
\[ \frac{\mathbb{P}\left(X_i \in E_{p_1^i, \ldots, p_\ell^i},\ 1 \le i \le n\right)}{\tilde{\mathbb{P}}\left(X_i \in E_{p_1^i, \ldots, p_\ell^i},\ 1 \le i \le n\right)} \le e^{4 n \log\log k_2}, \tag{3.23} \]
where $\{p_1^i, \ldots, p_\ell^i\} \subset S(k_1, k_2)$ for $1 \le i \le n$. Recall that $F$ only depends on the events $\{X_i \in E_{p_1, \ldots, p_\ell}\}$; therefore,
\[ \frac{1}{n} \log \mathbb{P}(F > \epsilon n^2) \le 4 \log\log k_2 + \frac{1}{n} \log \tilde{\mathbb{P}}(F > \epsilon n^2) \le 4 \log\log k_2 + 4 - \frac{\epsilon \log(k_1)}{8}, \tag{3.24} \]
where we used Theorem 11 at the last step. Now, let us prove (3.22). First, let us give an upper bound for the numerator, that is,
\[ \mathbb{P}(X_1 \in E_{p_1, \ldots, p_\ell}) = \frac{1}{n} \# |E_{p_1, \ldots, p_\ell}| \le \frac{1}{n} \left[ \frac{n}{p_1 \cdots p_\ell} \right] \le \frac{1}{p_1 \cdots p_\ell}, \tag{3.25} \]
where $[x]$ denotes the largest integer less than or equal to $x$ and we used the simple fact that $\frac{[x]}{x} \le 1$ for any positive $x$.
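As for the lower bound for the denominator, we have
\[ \tilde{\mathbb{P}}(X_1 \in E_{p_1, \ldots, p_\ell}) = \prod_{q \in \{p_1, \ldots, p_\ell\}} \frac{1}{q} \prod_{q \in S(k_1,k_2) \setminus \{p_1, \ldots, p_\ell\}} \left(1 - \frac{1}{q}\right) \ge \prod_{q \in \{p_1, \ldots, p_\ell\}} \frac{1}{q} \prod_{q \in S(k_1,k_2)} \left(1 - \frac{1}{q}\right) \ge \prod_{q \in \{p_1, \ldots, p_\ell\}} \frac{1}{q}\ e^{-2 \sum_{q \in S(k_1,k_2)} \frac{1}{q}}, \tag{3.26} \]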
where we used the inequality $1 - x \ge e^{-2x}$ for $0 \le x \le \frac{1}{2}$. Notice that
\[ \lim_{n \to \infty} \left( \sum_{q \in S(1,n)} \frac{1}{q} - \log\log n \right) = M, \tag{3.27} \]
where $M = 0.261497\ldots$ is the Meissel-Mertens constant. Therefore, for sufficiently large $k_2$,
\[ \sum_{q \in S(k_1,k_2)} \frac{1}{q} \le \sum_{q \in S(1,k_2)} \frac{1}{q} \le 2 \log\log k_2. \tag{3.28} \]
Combining (3.25), (3.26) and (3.28), we have proved the upper bound in (3.22).
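The Mertens-type asymptotics in (3.27) and the bound (3.28) are easy to examine numerically. The following sketch sums reciprocals of primes with a simple sieve; the helper names are hypothetical, and the cutoff $10^5$ is an arbitrary illustrative choice rather than the threshold implied by "sufficiently large".

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return [p for p in range(2, limit + 1) if sieve[p]]


def prime_reciprocal_sum(k1, k2):
    """Sum of 1/q over the primes q in S(k1, k2), i.e. k1 < q <= k2."""
    return sum(1 / q for q in primes_up_to(k2) if q > k1)


if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5):
        total = prime_reciprocal_sum(1, n)
        loglog = math.log(math.log(n))
        print(n, total - loglog, total <= 2 * loglog)
```

The printed difference should approach the Meissel-Mertens constant as $n$ grows, and the boolean reports whether the bound in (3.28) holds at that cutoff.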
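Lemma 13. Let $p_j$, $1 \le j \le \ell$, $\ell \in \mathbb{N}$, be the primes such that $S(k_1, k_2) = \{p_1, \ldots, p_\ell\}$ and
\[ m \prod_{1 \le j \le \ell} p_j \le n < (m+1) \prod_{1 \le j \le \ell} p_j, \tag{3.29} \]
where $m \in \mathbb{N}$. Then, there exists a coupling of vectors of random variables $X_i$ and $\tilde{X}_i$ for $1 \le i \le n$, i.e. a measure $\mu$ with marginal distributions the same as $X_i$ and $\tilde{X}_i$, such that
\[ \mu\left( \sum_{q \in S(k_1,k_2)} Y_q^2 - \sum_{q \in S(k_1,k_2)} \tilde{Y}_q^2 \ge \epsilon n^2 \right) \le 2^n \left(\frac{1}{m}\right)^{\frac{\epsilon n}{2 k_2}}. \tag{3.30} \]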
Proof. The main ingredient of the proof is the Chinese Remainder Theorem which states that the set of equations x ≡ a1 mod(p1 ) .. (3.31) . x ≡ a mod(p ) ` ` has a unique solution 1 ≤ x ≤ p1 · · · p` , where 0 ≤ ai < pi , i ∈ {1, 2, . . . , `}. Hence, for each sequence of ai ’s, the set of equations in (3.31) has exactly m solutions for 1 ≤ x ≤ mp1 · · · p` . We denote these solutions by Ri (a1 , . . . , a` ) for ˜ i as i ∈ {1, 2, . . . , m}. Given Xi uniformly distributed on {1, 2, . . . , n}, we define X follows. We generate Bernoulli random variables cj for 1 ≤ j ≤ `, with parameters 1 pj and independent of each other. Now, define ( pc11 · · · pc`` if Xi > mp1 · · · p` ˜ (3.32) Xi = , pb11 · · · pb`` otherwise
where bj is 1 if Xi is divisible by pj and 0 otherwise for 1 ≤ j ≤ `. By the ˜ i is the multiplication of pcj and definition, if we condition on Xi > mp1 · · · p` , X j cj ’s are independent. Now, conditional on Xi ≤ mp1 · · · p` and let Prime(Xi ) = → − {p ∈ P : Xi is divisible by p}. Thus, for a vector b = (bj )`j=1 ∈ {0, 1}` , we have
` Y
˜i = ∆ := µ X
(3.33)
b
pjj |Xi ≤ mp1 · · · p`
j=1
→ − = µ Prime(Xi ) ∩ {p1 , . . . , p` } = S( b ) ,
→ − where S( b ) := {pj |bj = 1, 1 ≤ j ≤ `}. But that is equivalent to (3.34)
∆= =
#{Ri (a1 , . . . , a` )|aj = 0 if and only if bj = 0, 1 ≤ i ≤ m} mp1 · · · p` Q m bj 6=0 (pj − 1)
mp1 · · · p` Y 1 Y 1 1− = . pj pj j:bj 6=0
j:bj =0
Therefore, we get (3.35)
µ Xi =
` Y
b
Y
pj j =
j=1
j:bj =0
1 Y 1 1− . pj pj j:bj 6=0
Let us define ( ˜ i ) ∩ S(k1 , k2 )} 6 {Prime(X ˜ i ) := 1 if {Prime(Xi ) ∩ S(k1 , k2 )} = (3.36) g(Xi , X . 0 otherwise → − → − ˜ , we have P(g(Xi , X ˜ i ) = 1) ≤ 1 since By the definition of the coupling of X and X m ˜ the event g(Xi , Xi ) = 1 implies that Xi > mp1 · · · p` which occurs with probability (3.37)
n − mp1 p2 · · · p` mp1 p2 · · · p` 1 1 ≤1− < . = n (m + 1)p1 · · · p` m+1 m
Now, let us go back to prove the superexponential bound in (3.30). Observe that (3.38) f (X1 , . . . , Xn ) :=
X q∈S(k1 ,k2 )
Yq2 =
X q∈S(k1 ,k2 )
Yq +
X
X
q∈S(k1 ,k2 ) i6=j
Hence,
(3.39)
X 2 2 ˜ ˜ i ) = 1}2k2 n. Yq − Yq ≤ #{i|g(Xi , X q∈S(k1 ,k2 )
1q|gcd(Xi ,Xj ) .
That is because if we change one of Xi ’s, the function f (X1 , . . . , Xn ) changes by at most k2 (n + 1) ≤ 2k2 n. Therefore, X µ (3.40) Yq2 − Y˜q2 ≥ n2 q∈S(k1 ,k2 )
˜ i ) = 1} ≥ n2 ≤ µ 2k2 n#{i|g(Xi , X ˜ = µ #{i|g(Xi , Xi ) = 1} ≥ n . 2k2 ˜ i ) = 1} = Pn 1 Notice that #{i|g(Xi , X ˜ i )=1 is the sum of i.i.d. indicai=1 g(Xi ,X ˜ 1 ) = 1) ≤ 1 . Hence, by Chebychev’s inequality, by tor functions and µ(g(X1 , X m choosing θ = log m > 0, we have ˜ i ) = 1} ≥ n (3.41) µ #{i|g(Xi , X 2k2 h in ≤ E eθ1g(X1 ,X˜ 1 )=1 e−θn 2k2 n θ e + 1 e−θn 2k2 ≤ m
≤ 2n e−(log m)n 2k2 which yields the desired result.
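Theorem 14. For any $\epsilon > 0$, we have the following superexponential estimate,
\[ \limsup_{k \to \infty} \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\left( \sum_{q \in S(k,n)} Y_q^2 > \epsilon n^2 \right) = -\infty. \tag{3.42} \]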
Proof. Let us write X (3.43) Yq2 = q∈S(k,n)
X
Yq2 +
q∈S(k,M1 ) 120
X q∈S(M1 ,M2 )
Yq2 +
X
Yq2 ,
q∈S(M2 ,n)
120
where M1 := [log log n] and M2 := [log n] . By Lemma 12, for the second and third terms in (3.43), we have 2 X 1 n (3.44) log P Yq2 > n 3 q∈S(M1 ,M2 )
≤ 4 log log M2 + 4 −
log M1 8 3
120 120 = 4 log(log([log n] )) + 4 − log([log(log(n))] ) 24 120 = 4 log log + log log n + 4 − 5 log log log n 120 = 4 log log + 4 − log log log n,
and similarly, 1 log P n
(3.45)
X q∈S(M2 ,n)
2 n ≤ − log(log n) + 4. Yq2 > 3
In addition, for the first term in (3.43), by Lemma 13, we get X 2 X n 1 (3.46) Yq2 − Y˜q2 > ≤ log 2 − log µ log M0 , n 6 12M 1 q∈S(k,M1 ) q∈S(k,M1 ) where (3.47)
n
M0 := Q
q∈S(k,M1 ) q n ≥ M1 M1 120 120 = exp log(n) − (log log n) log log log n .
By Theorem 11, (3.48)
1 ˜ log P n
X
p∈S(k,M1 )
2 n ≤ − log(k) + 4. Yp2 ≥ 6 48
Combining (3.44), (3.45), (3.46), (3.47) and (3.48), we get the desired result.
Finally, we are ready to prove Theorem 3.
Proof of Theorem 3. Let $L_n$, $L_n^k$ be the empirical measures of $U_i$, $U_i^k$, i.e.
\[ L_n(x) := \frac{1}{n} \sum_{i=1}^{n} \delta_{U_i}(x), \tag{3.49} \]
and
\[ L_n^k(x) := \frac{1}{n} \sum_{i=1}^{n} \delta_{U_i^k}(x). \tag{3.50} \]
By Sanov's theorem, see Dembo and Zeitouni [6], $L_n$ satisfies a large deviation principle on $\mathcal{M}[0, 1]$, the space of probability measures on $[0, 1]$, equipped with the weak topology and the rate function
\[ I(\mu) = \begin{cases} \int_{[0,1]} \log\left(\frac{d\mu}{d\nu}\right) d\mu & \text{if } \mu \ll \nu, \\ +\infty & \text{otherwise.} \end{cases} \tag{3.51} \]
For $a \in [0, 1]$ and $i \in \mathbb{N}$, we define (3.52)
$\chi_i(a)$ = the $i$th digit in the binary expansion of $a$.
We also redefine $f_k, f : [0, 1]^2 \to \{0, 1\}$ from (2.19) and (2.24), for $k \in \mathbb{N}$, as follows:
\[ f_k(x, y) = 1 - \max_{1 \le i \le k} \chi_i(x)\chi_i(y), \tag{3.53} \]
and
\[ f(x, y) := 1 - \max_{i \in \mathbb{N}} \chi_i(x)\chi_i(y). \]
In other words, $f$ is 1 if $x$ and $y$ do not share a common 1 at the same place in their binary expansions and $f$ is 0 otherwise. A similar interpretation holds for $f_k$. Clearly, $f_k \ge f$ and $\lim_{k \to \infty} f_k(x, y) = f(x, y)$. Again, let $\nu$ be the probability
measure on $[0, 1]$ such that for a random variable $x$ with measure $\nu$, the $\chi_i(x)$ are independent Bernoulli random variables with parameters $\frac{1}{p_i}$, where $p_i$ is the $i$th smallest prime number. Let $A_k := \{\alpha \in [0, 1] \mid \chi_i(\alpha) = 0 \text{ for } i > k\}$ be the set of numbers on $[0, 1]$ with $k$-digit binary expansion. We define
\[ A_\alpha := \{x \in [0, 1] \mid \chi_i(\alpha) = \chi_i(x),\ 1 \le i \le k\}. \tag{3.54} \]
Let $F_k(\mu) := \iint_{[0,1]^2} f_k(x, y)\, d\mu(x)\, d\mu(y)$ and $F(\mu) := \iint_{[0,1]^2} f(x, y)\, d\mu(x)\, d\mu(y)$. We have
\[ F_k(\mu) = \sum_{\alpha, \beta \in A_k} f_k(\alpha, \beta)\, \mu(A_\alpha)\, \mu(A_\beta). \tag{3.55} \]
Hence the map $\mu \mapsto F_k(\mu)$ is continuous, i.e. for $\mu_n \to \mu$ in the weak topology, $F_k(\mu_n) \to F_k(\mu)$. By the contraction principle, $L_n \circ F_k^{-1}$ satisfies a large deviation principle with good rate function
\[ I^{(k)}(x) = \inf_{\iint_{[0,1]^2} f_k(x,y)\, d\mu(x)\, d\mu(y) = x}\ \int_{[0,1]} \log\left(\frac{d\mu}{d\nu}\right) d\mu. \tag{3.56} \]
Moreover, in Theorem 11, we proved that
\[ \limsup_{k \to \infty} \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}\left( \iint_{[0,1]^2} (f_k - f)\, dL_n(x)\, dL_n(y) \ge \delta \right) = -\infty, \tag{3.57} \]
for any $\delta > 0$. Thus the family $\{L_n \circ F_k^{-1}\}$ is an exponentially good approximation of $\{L_n \circ F^{-1}\}$. Now, by Theorem 4.2.16 in Dembo and Zeitouni [6], $L_n \circ F^{-1}$ satisfies a weak large deviation principle with the rate function (3.58)
I1 (x) = sup lim inf
inf
δ>0 k→∞ |w−x|