Hypergraph coloring up to condensation


Peter Ayre
School of Mathematics and Statistics, UNSW Australia, Sydney NSW 2052, Australia

Amin Coja-Oghlan∗
Mathematics Institute, Goethe University, Frankfurt 60325, Germany

Catherine Greenhill†
School of Mathematics and Statistics, UNSW Australia, Sydney NSW 2052, Australia

arXiv:1508.01841v1 [cs.DM] 7 Aug 2015

August 11, 2015

Abstract

Improving a result of Dyer, Frieze and Greenhill [Journal of Combinatorial Theory, Series B, 2015], we determine the q-colorability threshold in random k-uniform hypergraphs up to an additive error of $\ln 2 + \varepsilon_q$, where $\lim_{q\to\infty} \varepsilon_q = 0$. The new lower bound on the threshold matches the “condensation phase transition” predicted by statistical physics considerations [Krzakala et al., PNAS 2007].

Mathematics Subject Classification: 05C80 (primary), 05C15 (secondary)

1 Introduction

Recent work on random constraint satisfaction problems has focused either on the case of binary variables and k-ary constraints (e.g., random k-SAT) or on the case of k-ary variables and binary constraints (e.g., random graph coloring) for some $k \ge 3$. In these two cases substantial progress has been made over the past few years. For instance, the k-SAT threshold has been identified precisely for large enough k [12]. Moreover, in the random hypergraph 2-coloring problem (or equivalently the k-NAESAT problem) the threshold is known up to an error term that tends to 0 rapidly in terms of the size k of the edges [11]. In addition, the best current upper and lower bounds on the k-colorability threshold of the Erdős-Rényi random graph are within a small additive constant [9]. By comparison, little is known about problems in which both the arity of the constraints and the domain of the variables have size greater than two. Although it has been asserted that the techniques developed in recent work should carry over [9], this claim has hardly been put to the test. In fact, although the k-XORSAT threshold (random linear equations with k variables apiece over the field of size 2) has been known for a while [13, 28], the case of equations of length $k \ge 3$ over fields of size greater than two largely remains elusive (apart from [16]).

The present paper deals with one of the most natural examples of a problem with k-ary constraints and q-ary variables with $q, k \ge 3$, namely q-colorability of random k-uniform hypergraphs. To be precise, by a q-coloring of $G = (V, E)$ we mean a map $\sigma : V \to [q]$ such that $|\sigma(e)| > 1$ for all $e \in E$, i.e., no edge is monochromatic. The chromatic number of $G$ is the least $q$ for which a q-coloring exists. The random hypergraph model that we consider is the most natural one, i.e., $G \in \mathcal{G}(n, k, m)$ is a (simple) k-uniform hypergraph on the vertex set $[n] := \{1, 2, 3, \ldots, n\}$ with a set of precisely $m$ edges chosen uniformly at random. For every $q \ge 2$, $k \ge 3$ there exists a (non-uniform) sharp threshold $c_{q,k} = c_{q,k}(n)$ for q-colorability [19]. That is, if $m = m(n)$ is a sequence such that for some fixed $\varepsilon > 0$ we have $m(n) < (1 - \varepsilon) n c_{q,k}(n)$, then $\mathcal{G}(n, k, m)$ is q-colorable w.h.p., whereas w.h.p. the random hypergraph fails to be q-colorable if $m(n) > (1 + \varepsilon) n c_{q,k}(n)$. The best prior bounds on this threshold, obtained by Dyer, Frieze and Greenhill [14, Remark 2.1, (82)], read

\[
(q^{k-1} - 1) \ln q - 1 - \varepsilon_{q,k} \le \liminf_{n\to\infty} c_{q,k}(n) \le \limsup_{n\to\infty} c_{q,k}(n) \le \left(q^{k-1} - 1/2\right) \ln q, \tag{1.1}
\]

where $\lim_{q\to\infty} \varepsilon_{q,k} = 0$ for any fixed $k \ge 3$. Thus, the upper and the lower bound differ by an additive $\frac{1}{2} \ln q + 1 + \varepsilon_{q,k}$, a term that diverges in the limit of large $q$. The main result of this paper provides an improved lower bound that is within an additive $\ln 2$ of the upper bound from (1.1), in the large-$q$ limit.

∗ The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement n. 278857–PTCC.
† Research supported by the Australian Research Council Discovery Project DP140101519.

Theorem 1.1. For each $k \ge 3$ there is a number $q_0 = q_0(k) > 0$ such that for all $q > q_0$ we have

\[
\liminf_{n\to\infty} c_{q,k}(n) \ge (q^{k-1} - 1/2) \ln q - \ln 2 - 1.01 \ln q / q.
\]

The proof of Theorem 1.1 is based on the second moment method. So is [14], which generalises the second moment argument of Achlioptas and Naor [4] from graphs to hypergraphs. The result of Achlioptas and Naor was recently improved by Coja-Oghlan and Vilenchik [9], and in this paper we generalise the argument from that paper to hypergraphs. While numerous details need adjusting, the basic proof strategy that we pursue is similar to the one suggested in [9]. In particular, the improvement over [14] results from studying the second moment of a subtly chosen random variable. While the random variable considered in [14] is just the number of (balanced) q-colorings of the random hypergraph, here we use a random variable that is inspired by ideas from statistical mechanics; we will give a more detailed outline in Section 3 below. Thus, the present paper shows that, indeed, with a fair number of careful modifications the method from [9] can be generalised to hypergraphs.

2 Related work

The quest for the chromatic number of random graphs (i.e., $\mathcal{G}(n, 2, m)$) goes back to the seminal 1960 paper of Erdős and Rényi in which they established the “giant component” phase transition [15]. But it took almost thirty years until a celebrated paper of Bollobás [7] determined the asymptotic value of the chromatic number of dense random graphs. His proof used martingale tail bounds, which were introduced to combinatorics by Shamir and Spencer [29] to investigate the concentration of the chromatic number. Building upon ideas of Matula [27], Łuczak [25] determined the asymptotic value of the chromatic number of the Erdős-Rényi random graph in the case that $m = m(n)$ satisfies $m/n \to \infty$. However, the results from [7, 25] only determine the chromatic number up to a multiplicative error of $1 + o(1)$ as $n \to \infty$, and the resulting error term exceeds the width within which the chromatic number is known to be concentrated. Indeed, in the case that $m = m(n) \le n^{3/2-\Omega(1)}$ it is known that the chromatic number of the random graph is concentrated on two subsequent integers [6, 26]. In the sparse case $m = O(n)$ the precise values of these two integers are implied by the current bounds on the q-colorability threshold [4, 8, 9].

The 2-colorability problem in random hypergraphs, which is essentially equivalent to the random k-NAESAT problem, has also been studied. Achlioptas and Moore [2, 3] showed that the 2-colorability threshold can be approximated within a small additive constant via the second moment method. Furthermore, Coja-Oghlan and Zdeborová [10] established the existence of a further phase transition apart from the threshold for 2-colorability, the “condensation phase transition”. The name derives from an intriguing connection to the statistical mechanics of glasses [21, 23]. Moreover, the argument of Coja-Oghlan and Panagiotou [11] determines the 2-colorability threshold in k-uniform random hypergraphs up to an additive error term $\varepsilon_k$ that tends to 0 exponentially as a function of $k$.

Prior to the aforementioned work of Dyer, Frieze and Greenhill [14] the q-colorability problem in hypergraphs was studied by Krivelevich and Sudakov [22], who also considered other possible notions of colorings. Their results are of a similar nature to Łuczak’s [25] in the case of graphs. That is, they determine the value of the chromatic number up to a multiplicative $1 + o(1)$ factor, with $o(1)$ hiding a term that vanishes as $m/n \to \infty$. The same is true of the results of Kupavskii and Shabanov [24], which partly improve upon [22]. However, the bounds on the q-colorability threshold that can be read out of [22, 24] are less precise than those obtained in [14] (upon which Theorem 1.1 improves).

3 Outline

Throughout, we assume that $n$ is sufficiently large for our error estimates to hold, and that $q > q_0$. Further, we assume that $m = \lceil cn \rceil$ and for ease of notation will often write $cn$ rather than $\lceil cn \rceil$.

The second moment method. The second moment method has become the mainstay for lower-bounding satisfiability thresholds [2, 5, 17]. Suppose that we can construct a non-negative random variable $Z$ on $\mathcal{G}(n, k, cn)$ such that the event $Z(G) > 0$ implies q-colorability, and such that

\[
E[Z^2] = O(E[Z]^2) \quad \text{as } n \to \infty. \tag{3.1}
\]

Then the Paley-Zygmund inequality implies that

\[
\liminf_{n\to\infty} P[Z > 0] \ge \liminf_{n\to\infty} \frac{E[Z]^2}{E[Z^2]} > 0. \tag{3.2}
\]
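For completeness, the step from (3.1) to (3.2) can be spelled out with the Cauchy-Schwarz inequality. This is the standard argument behind the special case of the Paley-Zygmund inequality used here, written under the assumption $E[Z] > 0$; it is a sketch and not a quotation from the references.

```latex
\begin{align*}
E[Z] = E\big[Z \cdot \mathbf{1}\{Z > 0\}\big]
     \le \sqrt{E[Z^2] \, P[Z > 0]}
\quad\Longrightarrow\quad
P[Z > 0] \ge \frac{E[Z]^2}{E[Z^2]} .
\end{align*}
```

In particular, if $E[Z^2] \le C \cdot E[Z]^2$ for all large $n$ and some constant $C$, then $P[Z > 0] \ge 1/C$ for all large $n$.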

Combining (3.2) with the sharp threshold result from [19], which establishes the existence of a sharp threshold sequence $c_{q,k}(n)$, yields $\liminf_{n\to\infty} c_{q,k}(n) \ge c$. Hence, the second moment method can be summarised as follows.

Fact 3.1. If there is a non-negative random variable $Z$ on $\mathcal{G}(n, k, cn)$ such that $Z(G) > 0$ implies q-colorability and such that (3.1) holds, then $\liminf_{n\to\infty} c_{q,k}(n) \ge c$.

Thus, our task is to exhibit a random variable $Z$ on $\mathcal{G}(n, k, cn)$ that satisfies (3.1) for as large a value of $c$ as possible.

Balanced colorings. Certainly the most natural choice for $Z$ seems to be the number $Z_q$ of q-colorings of the random hypergraph. Clearly, $Z_q \ge 0$ and $Z_q(G) > 0$ only if $G$ is q-colorable. However, technically $Z_q$ is a bit unwieldy. Therefore, following Achlioptas and Naor [4], Dyer, Frieze and Greenhill [14] considered a slightly modified random variable. Namely, let us call a map $\sigma : [n] \to [q]$ balanced if $\big| |\sigma^{-1}(i)| - n/q \big| \le \sqrt{n}$ for all $i \in [q]$ and let $Z_{q,\mathrm{bal}}$ be the number of balanced q-colorings of $G$.

Lemma 3.2 ([14]). For any $q, k \ge 3$ and any $c > 0$ we have

\[
\lim_{n\to\infty} \frac{1}{n} \ln E[Z_q] = \lim_{n\to\infty} \frac{1}{n} \ln E[Z_{q,\mathrm{bal}}] = \ln q + c \ln\left(1 - q^{1-k}\right). \tag{3.3}
\]
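The exponent in (3.3) is easy to evaluate numerically. The following is a minimal sketch; the function names and the sample values $q = 5$, $k = 3$ are illustrative choices and not part of the paper. It computes the right-hand side of (3.3) and the density at which it changes sign.

```python
import math


def first_moment_exponent(q: int, k: int, c: float) -> float:
    """Right-hand side of (3.3): ln q + c * ln(1 - q^(1-k))."""
    return math.log(q) + c * math.log1p(-(q ** (1 - k)))


def first_moment_zero(q: int, k: int) -> float:
    """Density c at which the exponent in (3.3) vanishes."""
    return math.log(q) / (-math.log1p(-(q ** (1 - k))))
```

For the sample values, the exponent is positive at $c = (q^{k-1} - 1/2)\ln q - \ln 2$ and its zero lies just below $(q^{k-1} - 1/2)\ln q$, in line with the upper bound in (1.1).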

That (3.3) is true follows from the analysis of [14], which provides a lower bound (where the term ‘balanced coloring’ is reserved for a still stricter type of coloring), and the observation that each non-balanced $\tau : [n] \to [q]$ is less likely to be a proper coloring than a given balanced map. It is easily verified that the r.h.s. of (3.3) is positive if $c < (q^{k-1} - \frac{1}{2}) \ln q - \ln 2$. Hence, for such $c$, both $E[Z_q]$ and $E[Z_{q,\mathrm{bal}}]$ are exponential in $n$. They differ only in their sub-exponential terms. Consequently, we do not give anything away by confining ourselves to balanced colorings only.

In the following we will see why neither $Z_q$ nor $Z_{q,\mathrm{bal}}$ is a good random variable to work with and why neither can be used to prove Theorem 1.1. What we learn will guide us towards constructing a better random variable.

While working out the first moment of $Z_{q,\mathrm{bal}}$ (i.e., the proof of Lemma 3.2) is pretty straightforward, getting a handle on the second moment is not quite so easy. Of course, the second moment of $Z_{q,\mathrm{bal}}$ is nothing but the expected number of pairs of balanced q-colorings. Moreover, the probability that two maps $\sigma, \tau : [n] \to [q]$ simultaneously happen to be q-colorings of $G$ will depend on how “similar” $\sigma, \tau$ are. To gauge similarity, define the overlap of $\sigma, \tau$ as the $q \times q$ matrix $a(\sigma, \tau) = (a_{ij}(\sigma, \tau))_{i,j\in[q]}$ with entries

\[
a_{ij}(\sigma, \tau) = n^{-1} \left|\sigma^{-1}(i) \cap \tau^{-1}(j)\right|.
\]

In words, $a_{ij}(\sigma, \tau)$ is the probability that a random vertex $v \in [n]$ has color $i$ under $\sigma$ and color $j$ under $\tau$. Then we can cast the second moment in terms of the overlap as follows. Let $R = R_{n,q}$ be the set of all overlaps $a(\sigma, \tau)$ of balanced $\sigma, \tau : [n] \to [q]$.

Lemma 3.3 ([14]). Let $\|a\|_k = \big[\sum_{i,j\in[q]} a_{ij}^k\big]^{1/k}$ be the $\ell_k$-norm and define

\[
H(a) = -\sum_{i,j\in[q]} a_{ij} \ln a_{ij}, \qquad E(a) = E_{q,c,k}(a) = c \ln\left[1 - 2q^{1-k} + \|a\|_k^k\right].
\]

Let $F(a) = H(a) + E(a)$. Then

\[
E[Z_{q,\mathrm{bal}}^2] = \Theta\left(n^{(1-k^2)/2}\right) \cdot \sum_{a\in R} \exp\left[nF(a)\right]. \tag{3.4}
\]

The proof of Lemma 3.3 is elementary. First, observe that for a given $a \in R$, the number of $\sigma, \tau$ with overlap $a$ is given by a multinomial coefficient. Applying Stirling’s formula yields the entropy $H(a)$. Second, given $\sigma, \tau$ with overlap $a$, the probability that a random edge chosen uniformly out of all $\binom{n}{k}$ possible edges is monochromatic under either $\sigma$ or $\tau$ is asymptotically equal to $2q^{1-k} - \|a\|_k^k$ by inclusion/exclusion (as $\sigma, \tau$ are balanced). Hence $E(a)$ arises.
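The inclusion/exclusion step in the proof above can be checked by brute force on a toy instance. The sketch below is illustrative only: the helper names are hypothetical, colors are numbered $0, \ldots, q-1$ rather than $1, \ldots, q$, and edges are replaced by ordered k-tuples with repetition allowed, for which the identity $2q^{1-k} - \|a\|_k^k$ is exact when all color classes have size exactly $n/q$ (for genuine k-subsets it holds only asymptotically).

```python
from fractions import Fraction
from itertools import product


def overlap(sigma, tau, q):
    """Overlap matrix: a[i][j] = |sigma^-1(i) ∩ tau^-1(j)| / n."""
    n = len(sigma)
    counts = [[0] * q for _ in range(q)]
    for s, t in zip(sigma, tau):
        counts[s][t] += 1
    return [[Fraction(x, n) for x in row] for row in counts]


def mono_probability_formula(a, q, k):
    """2 q^(1-k) - ||a||_k^k, assuming exactly balanced color classes."""
    norm_kk = sum(x ** k for row in a for x in row)
    return 2 * Fraction(1, q ** (k - 1)) - norm_kk


def mono_probability_bruteforce(sigma, tau, k):
    """Fraction of ordered k-tuples (repetition allowed) that are
    monochromatic under sigma or under tau."""
    n = len(sigma)
    hits = 0
    for tup in product(range(n), repeat=k):
        mono_sigma = len({sigma[v] for v in tup}) == 1
        mono_tau = len({tau[v] for v in tup}) == 1
        if mono_sigma or mono_tau:
            hits += 1
    return Fraction(hits, n ** k)
```

Exact rational arithmetic is used so that the two quantities can be compared for equality rather than up to rounding.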

Clearly, the set $R$ that the sum in (3.4) ranges over is of polynomial size; for instance, the bound $|R| \le n^{q^2}$ is immediate. Furthermore, letting $D$ be the polytope comprising all $a = (a_{ij})_{i,j\in[q]}$ such that

\[
\sum_{j\in[q]} a_{ij} = \sum_{j\in[q]} a_{ji} = 1/q \ \text{ for all } i \in [q], \qquad a_{ij} \ge 0 \ \text{ for all } i, j \in [q],
\]

we see that $R \cap D$ is dense in $D$ as $n \to \infty$. Therefore, (3.4) yields

\[
\lim_{n\to\infty} \frac{1}{n} \ln E[Z_{q,\mathrm{bal}}^2] = \max_{a\in D} F(a).
\]

Further, evaluating the function $F(a)$ from Lemma 3.3 at the “flat” overlap $\bar{a} = (\bar{a}_{ij})$ with $\bar{a}_{ij} = q^{-2}$ for all $i, j \in [q]$, we find

\[
F(\bar{a}) = 2 \left[\ln q + c \ln(1 - q^{1-k})\right].
\]

This term is precisely twice the exponential order of the first moment from (3.3). Consequently, the second moment bound $E[Z_{q,\mathrm{bal}}^2] = O(E[Z_{q,\mathrm{bal}}]^2)$ can hold only if

\[
F(\bar{a}) = \max_{a\in D} F(a). \tag{3.5}
\]
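The value of $F$ at the flat overlap follows from a short computation with the definitions of $H$ and $E$ in Lemma 3.3, spelled out here as a worked step:

```latex
\begin{align*}
H(\bar a) &= -\sum_{i,j\in[q]} q^{-2}\ln q^{-2} = 2\ln q, \\
\|\bar a\|_k^k &= q^2 \cdot q^{-2k} = q^{2-2k}, \\
E(\bar a) &= c\ln\left[1 - 2q^{1-k} + q^{2-2k}\right]
           = c\ln\left[(1-q^{1-k})^2\right] = 2c\ln(1-q^{1-k}), \\
F(\bar a) &= H(\bar a) + E(\bar a) = 2\left[\ln q + c\ln(1-q^{1-k})\right].
\end{align*}
```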

In fact, the Laplace method applied along the lines of [18, Theorem 2.3] shows that the condition (3.5) is both necessary and sufficient for the success of the second moment method. In summary, the second moment argument reduces to the analytic problem of maximising the function $F$ over the polytope $D$.

A relaxation. This maximisation problem is anything but straightforward. Following [4], Dyer, Frieze and Greenhill [14] consider a relaxation. Namely, instead of optimising $F$ over $D$, they consider the (substantially) bigger domain $S$ of all $a = (a_{ij})_{i,j\in[q]}$ such that $\sum_{j=1}^{q} a_{ij} = 1/q$ for all $i \in [q]$ and $a_{ij} \ge 0$ for all $i, j \in [q]$, dropping the constraint that the “column sums” $\sum_j a_{ji}$ equal $1/q$. Clearly, $\max_{a\in D} F(a) \le \max_{a\in S} F(a)$. Furthermore, Dyer, Frieze and Greenhill solve the latter maximisation problem precisely by generalising the techniques from [4], requiring rather lengthy technical arguments. The result is that for $c$ up to the lower bound in (1.1) we indeed have $\max_{a\in S} F(a) = F(\bar{a})$.

But this method does not work up to the density promised by Theorem 1.1. There are two obstacles. First, not far beyond the lower bound in (1.1) the maximum of $F$ over $S$ is attained at a point $a' \in S \setminus D$, i.e., $F(a') > F(\bar{a})$. Thus, relaxing $D$ to the larger domain $S$ gives too much away. Second, there exists a constant $\gamma > \ln 2$ such that for $c = (q^{k-1} - 1/2) \ln q - \gamma$, the value of $F$ attained at

\[
a_{\mathrm{stable}} = (q^{-1} - q^{-k}) \, \mathrm{id} + q^{-k} (q - 1)^{-1} (J - \mathrm{id}) \in D
\]

is strictly greater than $F(\bar{a})$. (Here every entry of $J$ equals 1.) Consequently, even if we could solve the analytic problem of maximising $F$ over the actual domain $D$ it would be insufficient to prove Theorem 1.1.

Tame colorings. The above discussion shows that it is impossible to prove Theorem 1.1 via the second moment method applied to $Z_{q,\mathrm{bal}}$. A similar problem occurs in the case of random graphs ($k = 2$), see [9]. To remedy this problem in the hypergraph case we will generalise the strategy from [9].
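The second obstacle can be observed numerically. The sketch below evaluates $F = H + E$ from Lemma 3.3 at the flat overlap and at $a_{\mathrm{stable}}$; the function names and the sample parameters ($q = 100$, $k = 3$, $\gamma = 0.8$) are illustrative assumptions, chosen only so that $\gamma > \ln 2$ and $q$ is moderately large.

```python
import math


def F(a, c, k):
    """F(a) = H(a) + E(a) from Lemma 3.3 for a q x q overlap matrix a."""
    q = len(a)
    entropy = -sum(x * math.log(x) for row in a for x in row if x > 0)
    norm_kk = sum(x ** k for row in a for x in row)
    energy = c * math.log(1 - 2 * q ** (1 - k) + norm_kk)
    return entropy + energy


def flat_overlap(q):
    """The flat overlap: every entry equals q^-2."""
    return [[q ** -2] * q for _ in range(q)]


def stable_overlap(q, k):
    """a_stable = (1/q - q^-k) id + q^-k (q-1)^-1 (J - id)."""
    diag = 1 / q - q ** -k
    off = q ** -k / (q - 1)
    return [[diag if i == j else off for j in range(q)] for i in range(q)]
```

With the sample parameters and $c = (q^{k-1} - 1/2)\ln q - \gamma$, the value at `stable_overlap` exceeds the value at `flat_overlap`, so (3.5) fails at this density even though the first moment is still exponentially large.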

The key idea is to introduce a random variable $Z_{q,\mathrm{tame}}$ that takes the expected geometry of the set $B(G)$ of all balanced q-colorings of $G \in \mathcal{G}(n, k, cn)$ into account, such that $0 \le Z_{q,\mathrm{tame}} \le Z_{q,\mathrm{bal}}$. According to predictions based on non-rigorous physics considerations [23], the set $B(G)$ has a geometry that is very different from that of a random subset of the cube $[q]^{[n]}$ of the same size. More precisely, for almost all k-uniform hypergraphs $G$ with $cn$ edges, the set $B(G)$ decomposes into well-separated “clusters”, each of which contains an exponential number of colorings. However, the fraction of colorings that any single cluster contains is only an exponentially small fraction of the total number of q-colorings of $G$. Furthermore, while it is possible to walk inside the set $B(G)$ from any coloring to any other coloring in the same cluster by only changing the colors of $O(\ln n)$ vertices at a time, it is impossible to get from one cluster to another without changing the colors of $\Omega(n)$ vertices in a single step.

Now, the basic idea is to let $Z_{q,\mathrm{tame}} = Z_{q,\mathrm{bal}} \cdot \mathbf{1}\{T\}$, where $T$ is the event that the geometry of the set $B(G)$ has the aforementioned properties. To make this rigorous, we define the cluster of a q-coloring $\sigma$ of a hypergraph $G$ as the set

\[
C(G, \sigma) = \left\{ \tau \in B(G) : \min_{i\in[q]} a_{ii}(\sigma, \tau) > q^{-1} (1.01/k)^{1/(k-1)} \right\}.
\]

In words, $C(G, \sigma)$ contains all balanced q-colorings $\tau$ of $G$ where, for each color $i$, at least a $(1.01/k)^{1/(k-1)}$ fraction of all vertices colored $i$ under $\sigma$ retain color $i$ under $\tau$. Call a q-coloring $\sigma$ of $G$ separable if

\[
\forall \tau \in B(G), \ \forall i, j \in [q]: \quad a_{ij}(\sigma, \tau) \notin \left( q^{-1} (1.01/k)^{1/(k-1)}, \ q^{-1} (1 - \kappa) \right), \quad \text{where } \kappa = q^{1-k} \ln^{20} q. \tag{3.6}
\]

Definition 3.4. A q-coloring $\sigma$ of the (fixed) hypergraph $G$ is tame if

T1: $\sigma$ is balanced,

T2: $\sigma$ is separable,

T3: $|C(G, \sigma)| \le E[Z_{q,\mathrm{bal}}]$.
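The overlap conditions behind the cluster $C(G, \sigma)$ and the separability condition (3.6) can be written down directly. The following is a minimal sketch with hypothetical helper names; it only tests the overlap inequalities for a given pair of maps and does not check that $\tau$ is a balanced coloring of a hypergraph. Colors are numbered $0, \ldots, q-1$, and the toy value $q = 2$ used for illustration is far below the $q_0$ assumed in the paper.

```python
import math


def overlap(sigma, tau, q):
    """Overlap matrix: a[i][j] = |sigma^-1(i) ∩ tau^-1(j)| / n."""
    n = len(sigma)
    counts = [[0] * q for _ in range(q)]
    for s, t in zip(sigma, tau):
        counts[s][t] += 1
    return [[x / n for x in row] for row in counts]


def cluster_threshold(q, k):
    """q^-1 (1.01/k)^(1/(k-1)), the lower end of the interval in (3.6)."""
    return (1.01 / k) ** (1.0 / (k - 1)) / q


def in_cluster_region(sigma, tau, q, k):
    """Overlap condition defining C(G, sigma): min_i a_ii exceeds the threshold."""
    a = overlap(sigma, tau, q)
    return min(a[i][i] for i in range(q)) > cluster_threshold(q, k)


def violates_separability(sigma, tau, q, k):
    """True if some a_ij lies in the forbidden open interval of (3.6)."""
    kappa = q ** (1 - k) * math.log(q) ** 20
    lo, hi = cluster_threshold(q, k), (1 - kappa) / q
    a = overlap(sigma, tau, q)
    return any(lo < a[i][j] < hi for i in range(q) for j in range(q))
```

A coloring $\sigma$ is separable when `violates_separability(sigma, tau, q, k)` is false for every balanced coloring $\tau$ of the hypergraph; enumerating those colorings is left out of the sketch.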

Definition 3.4 generalises the concept of “tame graph colorings” from [9, Definition 2.3]. The set of tame colorings of a given hypergraph $G$ decomposes into well-separated clusters. Indeed, the separability condition ensures that the clusters of two tame colorings $\sigma, \tau$ of $G$ are either disjoint or identical. Furthermore, T3 ensures that no cluster size exceeds the expected number of balanced colorings, i.e., the clusters are “small” (well, small enough for our proof to work).

Let $Z_{q,\mathrm{tame}}$ be the number of tame q-colorings of $\mathcal{G}(n, k, cn)$. With the right random variable in place, our task boils down to calculating the first and the second moment. In Section 4 we will prove that the first moment of $Z_{q,\mathrm{tame}}$ is asymptotically equal to the first moment of $Z_{q,\mathrm{bal}}$. For the following two propositions we assume that

\[
(q^{k-1} - 1/2) \ln q - 2 \le c \le (q^{k-1} - 1/2) \ln q - \ln 2 - 1.01 \ln q / q.
\]

Proposition 3.5. There is a number $q_0 > 0$ such that for all $q > q_0$ we have $E[Z_{q,\mathrm{tame}}] \sim E[Z_{q,\mathrm{bal}}]$.

Further, in Section 5 we establish the following bound on the second moment.

Proposition 3.6. There is a number $q_0 > 0$ such that for all $q > q_0$, if $E[Z_{q,\mathrm{tame}}] \sim E[Z_{q,\mathrm{bal}}]$ then we have $E[Z_{q,\mathrm{tame}}^2] = O(E[Z_{q,\mathrm{bal}}]^2)$.

Thus, while moving to tame colorings has no discernible effect on the first moment, Proposition 3.6 shows that the impact on the second moment is dramatic. Indeed, the matrix $a_{\mathrm{stable}}$ shows that $E[Z_{q,\mathrm{bal}}^2] \ge \exp(\Omega(n)) E[Z_{q,\mathrm{bal}}]^2$ for $c$ near the bound in Theorem 1.1, while $E[Z_{q,\mathrm{tame}}^2] = O(E[Z_{q,\mathrm{bal}}]^2)$ for all $c$ up to $(q^{k-1} - 1/2) \ln q - \ln 2 - 1.01 \ln q / q$. Then Theorem 1.1 follows from Fact 3.1 and Propositions 3.5 and 3.6.

Finally, the obvious question is whether the approach taken in this work can be pushed further to actually obtain tight upper and lower bounds on the q-colorability threshold. However, it follows from the proof of Proposition 3.5 that the answer is “no”. More specifically, in Section 4.4 we prove the following.

Corollary 3.7. For any $k \ge 3$ there exists a sequence $(\varepsilon_q)_{q\ge 3}$ such that $\lim_{q\to\infty} \varepsilon_q = 0$ and such that the following is true. For any $c > (q^{k-1} - \frac{1}{2}) \ln q - \ln 2 + \varepsilon_q$ there exists $\delta > 0$ such that

\[
\lim_{n\to\infty} P\left[Z_q < \exp(-\delta n) E[Z_q]\right] = 1. \tag{3.7}
\]

Now, assume for contradiction that there is a random variable $0 \le Z \le Z_q$ with the following properties. First, $Z(G) > 0$ only if $G$ is q-colorable. Second, $\ln E[Z] \sim \ln E[Z_q]$ (cf. Lemma 3.2 and Proposition 3.5). Third, $E[Z^2] = O(E[Z]^2)$. Then the Paley-Zygmund inequality implies that

\[
\lim_{\delta\to 0} \lim_{n\to\infty} P\left[Z_q \ge \exp(-\delta n) E[Z_q]\right] \ge \lim_{\delta\to 0} \lim_{n\to\infty} P\left[Z \ge \exp(-\delta n) E[Z]\right] > 0,
\]

in contradiction to (3.7). Corollary 3.7 is in line with the physics prediction that the actual q-colorability threshold is preceded by another phase transition called condensation [23], beyond which $Z_q \le \exp(-\Omega(n)) E[Z_q]$ w.h.p. In particular, the lower bound of Theorem 1.1 matches this “condensation threshold” up to an error term that tends to 0 in the limit of large $q$.

Notation. We assume throughout that the number of vertices, $n$, is sufficiently large for our estimates to hold. We also assume that the number of colors, $q$, exceeds some large enough constant $q_0 = q_0(k)$. But of course $q, k$ are always assumed to remain fixed as $n \to \infty$.

We use the $O$-notation to refer to the limit $n \to \infty$. For example, $f(n) = O(g(n))$ means that there exist $C > 0$, $n_0 > 0$ such that for all $n > n_0$ we have $|f(n)| \le C \cdot |g(n)|$. In addition, $o(\cdot)$, $\Omega(\cdot)$, $\Theta(\cdot)$ take their usual definitions, except that we assume the expression $\Omega(n)$ is positive (for sufficiently large $n$) whenever we write $\exp(-\Omega(n))$. We write $f(n) \sim g(n)$ if $\lim_{n\to\infty} f(n)/g(n) = 1$. When discussing estimates that hold in the limit of large $q$ we will make this explicit by adding the subscript $q$ to the asymptotic notation. Therefore, $f(q) = O_q(g(q))$ means that there exist $C > 0$, $q_0 > 0$ such that for all $q > q_0$ we have $|f(q)| \le C \cdot |g(q)|$. Furthermore, we will write $f(q) = \tilde{O}_q(g(q))$ to indicate that there exist $C > 0$, $q_0 > 0$ such that for all $q > q_0$ we have $|f(q)| \le (\ln q)^C \cdot |g(q)|$.

4 The first moment

Throughout this section, unless specified otherwise we take $\sigma, \tau : [n] \to [q]$ as balanced maps, and assume that $(q^{k-1} - 1/2) \ln q - 2 \le c \le (q^{k-1} - 1/2) \ln q - \ln 2 - 1.01 \ln q / q$. We frequently make use of the Chernoff bound.

Lemma 4.1 ([20, Theorem 2.1]). Let $\varphi(x) = (1 + x) \ln(1 + x) - x$. Let $X$ be a binomial random variable with mean $\mu > 0$. Then for any $t > 0$

\[
P[X > \mu + t] \le \exp\{-\mu \varphi(t/\mu)\}, \qquad P[X < \mu - t] \le \exp\{-\mu \varphi(-t/\mu)\}.
\]

In particular, for any $t > 1$ we have $P[X > t\mu] \le \exp\{-t\mu \ln(t/e)\}$.
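The bounds of Lemma 4.1 can be compared with exact binomial tails for small parameters. The sketch below is illustrative: the function names are hypothetical, $\varphi$ is extended by $\varphi(-1) = 1$ (its limit as $x \to -1$), and the lower-tail bound is only meant for $0 < t \le \mu$.

```python
import math


def phi(x: float) -> float:
    """phi(x) = (1 + x) ln(1 + x) - x for x > -1, extended by phi(-1) = 1."""
    if x == -1.0:
        return 1.0
    return (1 + x) * math.log1p(x) - x


def upper_tail_bound(mu: float, t: float) -> float:
    """Bound on P[X > mu + t] from Lemma 4.1."""
    return math.exp(-mu * phi(t / mu))


def lower_tail_bound(mu: float, t: float) -> float:
    """Bound on P[X < mu - t] from Lemma 4.1, for 0 < t <= mu."""
    return math.exp(-mu * phi(-t / mu))


def binom_pmf(n: int, p: float, j: int) -> float:
    return math.comb(n, j) * p ** j * (1 - p) ** (n - j)


def exact_upper_tail(n: int, p: float, x: float) -> float:
    """P[X > x] for X ~ Bin(n, p)."""
    return sum(binom_pmf(n, p, j) for j in range(math.floor(x) + 1, n + 1))


def exact_lower_tail(n: int, p: float, x: float) -> float:
    """P[X < x] for X ~ Bin(n, p)."""
    return sum(binom_pmf(n, p, j) for j in range(0, math.ceil(x)))
```

The “in particular” form follows from the first bound with $t - 1$ in place of $t/\mu$, since $\varphi(t - 1) = t \ln t - t + 1 \ge t \ln(t/e)$.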

4.1 The planted model

The aim in this section is to establish Proposition 3.5, the lower bound on the expected number of tame colorings. Let $\sigma : [n] \to [q]$ be a (fixed) balanced map that assigns each vertex a color. It suffices to prove that

\[
P\left[\sigma \text{ is a tame coloring of } G \mid \sigma \text{ is a coloring of } G\right] = 1 - o(1).
\]

Furthermore, the conditional distribution of $G$ given that $\sigma$ is a coloring admits an easy explicit description: the conditional random hypergraph simply consists of $m$ random edges chosen uniformly out of all edges that are not monochromatic under $\sigma$. It will however be convenient to work with a slightly different distribution. Let $G_\sigma \in \mathcal{G}(n, k, cn, \sigma)$ be the hypergraph on $[n]$ obtained by including every edge that is not monochromatic under $\sigma$ with probability

\[
p = \frac{cn}{\binom{n}{k} - q\binom{n/q}{k}} = \frac{c \, k! \cdot (1 + O(1/n))}{n^{k-1} (1 - q^{1-k})} = O\left(n^{1-k}\right) \tag{4.1}
\]

independently. Observe that the expected number of edges equals $cn$. We call $\mathcal{G}(n, k, cn, \sigma)$ the planted coloring model.

Lemma 4.2. Let $\sigma : [n] \to [q]$ be a fixed balanced map. For any event $\mathcal{E}$ we have

\[
P\left[G \in \mathcal{E} \mid \sigma \text{ is a coloring of } G\right] \le O(\sqrt{n}) \, P\left[G_\sigma \in \mathcal{E}\right].
\]

Proof. By Stirling’s formula, the probability that $G_\sigma$ has precisely $m$ edges is $\Theta(n^{-1/2})$. If this event occurs then the conditional distributions of $G_\sigma$ and of $G$ coincide.
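For small $n$ the planted coloring model can be sampled by direct enumeration. The sketch below is illustrative and not taken from the paper: the function names are hypothetical, colors are numbered $0, \ldots, q-1$, and the denominator counts the non-monochromatic k-subsets using the actual class sizes of $\sigma$, which agrees with $\binom{n}{k} - q\binom{n/q}{k}$ in (4.1) when every class has size exactly $n/q$.

```python
import itertools
import math
import random


def planted_edge_probability(sigma, q, k, c):
    """p = cn / (number of k-subsets not monochromatic under sigma), cf. (4.1)."""
    n = len(sigma)
    class_sizes = [sum(1 for s in sigma if s == i) for i in range(q)]
    non_mono = math.comb(n, k) - sum(math.comb(size, k) for size in class_sizes)
    return c * n / non_mono


def sample_planted_hypergraph(sigma, q, k, c, rng):
    """Include each non-monochromatic k-subset independently with probability p."""
    p = planted_edge_probability(sigma, q, k, c)
    edges = []
    for e in itertools.combinations(range(len(sigma)), k):
        if len({sigma[v] for v in e}) > 1 and rng.random() < p:
            edges.append(e)
    return edges
```

By construction $\sigma$ is a proper coloring of every sample, and the expected number of edges is $p$ times the number of admissible k-subsets, which equals $cn$. The enumeration costs $\binom{n}{k}$ steps, so it is only practical for toy sizes.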

Hence, we are left to show that the probability that $\sigma$ fails to be tame in $G_\sigma$ is $o(n^{-1/2})$. Indeed, in Sections 4.2 and 4.3 we will establish the following two statements. In both cases the proofs are by careful generalisation of the arguments from [9] to the hypergraph case.

Lemma 4.3. With probability $1 - \exp(-\Omega(n))$ the planted coloring $\sigma$ is separable in $\mathcal{G}(n, k, cn, \sigma)$.

Lemma 4.4. With probability $1 - \exp(-\Omega(n))$ we have $|C(G_\sigma, \sigma)| \le E[Z_{q,\mathrm{bal}}]$.

Proposition 3.5 is immediate from Lemmas 4.2–4.4.

Much of the analysis in this section will involve random variables defined using the following edge counts. For sets $X_1, X_2, X_3 \subset [n]$ and $\alpha \in [k]$, we let $m_\alpha(X_1, X_2, X_3)$ be the number of edges $e$ of $G_\sigma$ such that there exist $x \in X_1$ and distinct $v_1, \ldots, v_\alpha \in X_2$ with $x, v_1, \ldots, v_\alpha \in e$ and $e \setminus \{x, v_1, \ldots, v_\alpha\} \subseteq X_3$. If $\alpha = k - 1$ then we write $m_{k-1}(X_1, X_2)$ instead of $m_{k-1}(X_1, X_2, X_3)$, since $X_3$ has no effect in this case. For ease of notation, if $X_1 = \{v\}$ we simply write $m_\alpha(v, X_2, X_3)$, or $m_{k-1}(v, X_2)$. We set $V_i = \sigma^{-1}(i)$ to ease the notational burden.

4.2 Separability: proof of Lemma 4.3

Let $\tau : [n] \to [q]$ be a balanced map for which there exist $i, j \in [q]$ such that (3.6) is violated. Of course, we may assume without loss of generality that $i = j = 1$. We aim to show that $\tau$ is unlikely to be a coloring of $G_\sigma$. Clearly, if $\tau$ is a coloring of $G_\sigma$ then $\tau^{-1}(1)$ is an independent set of size about $n/q$ that has a rather substantial intersection with the independent set $\sigma^{-1}(1)$. The following lemma rules this constellation out for a wide range of intersection sizes.

Lemma 4.5. With probability $1 - \exp(-\Omega(n))$ the hypergraph $G_\sigma$ has no independent set $I$ of order $(1 + o(1)) \frac{n}{q}$ such that

\[
n^{-1} |I \cap \sigma^{-1}(1)| \in \left( q^{-1} (1.01/k)^{1/(k-1)}, \ q^{-1} \left(1 - q^{(1.01-k)/2}\right) \right).
\]

Proof. Suppose that $I$ is an independent set with $|I| = \frac{n}{q}(1 + o(1))$ such that $S = I \cap \sigma^{-1}(1)$ contains $|S| = \frac{sn}{q}$ vertices. Then the set

\[
V_0 = V_0(S) := \{v \in V \setminus \sigma^{-1}(1) : m_{k-1}(v, S) = 0\}
\]

contains $I \setminus S$. Observe that

\[
P[m_{k-1}(v, S) = 0] = \exp\left\{ -p \binom{|S|}{k-1} \right\} \cdot \big(1 + O(1/n)\big) = \exp\left\{ -\frac{kc \, (s/q)^{k-1}}{1 - q^{1-k}} \right\} \cdot \big(1 + O(1/n)\big) \le 2q^{-k s^{k-1}}.
\]

For $I$ to exist, we require that $n_0 := |V_0| > (1 - s + o(1)) \frac{n}{q}$.

Since $n_0$ is stochastically dominated by $\mathrm{Bin}(|V \setminus \sigma^{-1}(1)|, 2q^{-k s^{k-1}})$, we have by the Chernoff bound (see Lemma 4.1) that

\[
P\left[ n_0 > (1 - s + o(1)) \frac{n}{q} \right] \le \exp\left\{ -(1 - s + o(1)) \frac{n}{q} \ln\left( \frac{1 - s}{2 q^{1 - k s^{k-1}} e} \right) \right\}.
\]

Hence, by the union bound, the probability that there exists an independent set $I$ of order $\frac{n}{q}(1 + o(1))$ such that $|S| = \frac{sn}{q}$ is at most

\[
\exp\left\{ -\big(1 - s + o(1)\big) \frac{n}{q} \left( \ln\left( \frac{1 - s}{2 q^{1 - k s^{k-1}} e} \right) - 1 + \ln(1 - s) \right) \right\} = \exp\left\{ \big(1 - s + o(1)\big) \frac{n}{q} \ln\left( \frac{2 e^2}{q^{k s^{k-1} - 1} (1 - s)^2} \right) \right\},
\]

which tends to zero if and only if

\[
\sqrt{2} \, e \, q^{(1 - k s^{k-1})/2} < 1 - s. \tag{4.2}
\]

By convexity, the exponential function on the l.h.s. intersects the linear function on the r.h.s. at most twice, and between these two points of intersection the linear function is largest. For sufficiently large $q$, explicit calculation shows that the values $s = (1.01/k)^{1/(k-1)}$ and $s = 1 - q^{(1.01-k)/2}$ satisfy (4.2), completing the proof.

Lemma 4.5 does not quite cover the entire interval of intersections required by (3.6). To rule out the remaining subinterval $(q^{-1}(1 - q^{(1.01-k)/2}), q^{-1}(1 - \kappa))$ we use an expansion argument. The starting point is the observation that most vertices that have color 1 under $\tau$ but not under $\sigma$ are likely to occur in a good number of edges in which all the $k - 1$ other vertices are colored 1 under $\sigma$.

Lemma 4.6. Let $\tau : [n] \to [q]$ be a balanced map such that $a_{11}(\sigma, \tau) \in (q^{-1}(1 - q^{(1.01-k)/2}), q^{-1}(1 - \kappa))$. With probability $1 - \exp(-\Omega(n))$, the random hypergraph $G_\sigma \in \mathcal{G}(n, k, cn, \sigma)$ has the following properties:

(i) The set $Y := \{v \in V \setminus \sigma^{-1}(1) : m_{k-1}(\{v\}, \sigma^{-1}(1)) < 15\}$ has size at most $n\kappa/(3q)$.

(ii) The set $U := \tau^{-1}(1) \setminus (\sigma^{-1}(1) \cup Y)$ satisfies $m_1(U, \sigma^{-1}(1) \setminus \tau^{-1}(1), \sigma^{-1}(1)) \le 5 |\sigma^{-1}(1) \setminus \tau^{-1}(1)|$.

Proof. Suppose that $|\sigma^{-1}(1) \cap \tau^{-1}(1)| = \frac{sn}{q}$ where $s \in (1 - q^{(1.01-k)/2}, 1 - \kappa)$. Fix $v \in V \setminus V_1$. We know that

\[
m_{k-1}(v, V_1) \sim \mathrm{Bin}\left( \binom{|V_1|}{k-1}, p \right).
\]

Therefore

\[
P[m_{k-1}(v, V_1) < 15] \le \sum_{j=0}^{14} \binom{\binom{|V_1|}{k-1}}{j} p^j (1 - p)^{\binom{|V_1|}{k-1} - j} \le (1 - p)^{\binom{|V_1|}{k-1} - 14} \sum_{j=0}^{14} \frac{\left( \binom{|V_1|}{k-1} p \right)^j}{j!}.
\]

Further, if we note that $\binom{|V_1|}{k-1} p > k \ln q$ then

\[
P[m_{k-1}(v, V_1) < 15] \le 3 (k \ln q)^{14} q^{-k}.
\]

As the event $\{m_{k-1}(v, V_1) < 15\}$ occurs independently for all $v \in V \setminus V_1$, the total number $|Y|$ of such vertices is dominated by $\mathrm{Bin}(n(1 - 1/q), 3 (k \ln q)^{14} q^{-k})$. Therefore $E[|Y|] \le n \cdot 3 (k \ln q)^{14} q^{-k}$. Finally, by the Chernoff bound (see Lemma 4.1), $P[|Y| \ge n\kappa/(3q)] \le \exp\{-n\kappa/(3q)\} = \exp\{-\Omega(n)\}$ and so the proof of (i) is complete.

For notational convenience, we write $R = \sigma^{-1}(1) \setminus \tau^{-1}(1)$. Observe that $m_1(U, R, V_1)$ is stochastically dominated by $\mathrm{Bin}\left( |R||U| \binom{|V_1|}{k-2}, p \right)$. Therefore

\[
E[m_1(U, R, V_1)] \le |R||U| \binom{|V_1|}{k-2} p \le |R| \cdot \frac{n(1 - s)}{q} \cdot n^{-1} k^2 q \ln q = |R| \cdot (1 - s) k^2 \ln q.
\]

Finally, as $\kappa \le 1 - s \le q^{(1.01-k)/2}$, part (ii) follows from the Chernoff bound.

Proof of Lemma 4.3. Suppose that $\tau$ is a balanced map such that $a_{11}(\sigma, \tau) > q^{-1}(1.01/k)^{1/(k-1)}$. By Lemma 4.5, we may assume that $a_{11}(\sigma, \tau) > q^{-1}(1 - q^{(1.01-k)/2})$. With $U, Y$ as in Lemma 4.6 we have that $15|U| \le m_1(U, R, V_1) \le 5 |\sigma^{-1}(1) \setminus \tau^{-1}(1)|$, so $|U| \le \frac{1}{3} |\sigma^{-1}(1) \setminus \tau^{-1}(1)|$. Also, since $\tau$ is balanced, we have $\frac{n}{q} \sim |\tau^{-1}(1)| \le |\sigma^{-1}(1) \cap \tau^{-1}(1)| + |U| + |Y|$. Substituting our bound on $|U|$ from above, and using Lemma 4.6, implies that $n a_{11}(\sigma, \tau) = |\sigma^{-1}(1) \cap \tau^{-1}(1)| > n(1 - \kappa)/q$, as required.

4.3 The cluster size: proof of Lemma 4.4

To upper bound the cluster size we will exhibit a large “core” of vertices of $G_\sigma$ that are difficult to recolor. More specifically, the core will consist of vertices $v$ such that for every color $i \ne \sigma(v)$ there are several edges $e$ containing $v$ such that $e \setminus \{v\} \subset V_i$ and such that all vertices of $e$ belong to the core. Therefore, if we attempt to change the color of $v$ to $i \ne \sigma(v)$, then it will be necessary to recolor several other vertices of the core. In other words, recoloring a single vertex in the core leads to an avalanche that will stop only once at least $n q^{-1} (1.01/k)^{1/(k-1)}$ vertices in some color class have been recolored. Hence, the outcome is a coloring that does not belong to $C(G_\sigma, \sigma)$.

Formally, given a fixed balanced map $\sigma$ and fixed hypergraph $G$, the core $V_{\mathrm{core}}$ of $G$ is defined as the largest subset $V' \subseteq [n]$ of vertices such that $m_{k-1}(v, V_i \cap V') \ge 100k$ for all $v \in V'$ and all $i \ne \sigma(v)$. The core is well-defined; for if $V', V''$ are sets with the property, then so is $V' \cup V''$.

Lemma 4.7. With probability $1 - o(n^{-1/2})$ the random hypergraph $G_\sigma$ has the following two properties:

(i) The core of $G_\sigma$ contains at least $(1 - q^{1-k} \ln^{500k} q) n$ vertices.

(ii) If $\tau$ is a balanced coloring of $G_\sigma$ such that $\tau(v) \ne \sigma(v)$ for some $v$ in the core, then $\tau \notin C(G_\sigma, \sigma)$.

We proceed to prove Lemma 4.7. To estimate the size of the core we consider the following process:

CR1 For $i, j \in [q]$ and $i \ne j$, let $W_{ij} = \{v \in V_i : m_{k-1}(v, V_j) < 300k\}$, $W_i = \cup_{j \ne i} W_{ij}$, $W = \cup_i W_i$.

CR2 For $i \ne j$ let $U_{ij} = \{v \in V_i : m_1(v, W_j, V_j) > 100k\}$, and $U = \cup_{i \ne j} U_{ij}$.

CR3 Set $Z^{(0)} = U$ and repeat the following for $\ell \in \mathbb{N}$,

• if there is a $v \in V \setminus Z^{(\ell)}$ such that $m_1(v, Z^{(\ell)}, V_j) > 100k$ for some $j \ne \sigma(v)$ then take one such $v$ and let $Z^{(\ell+1)} = Z^{(\ell)} \cup \{v\}$;

• otherwise, set $Z^{(\ell+1)} = Z^{(\ell)}$.
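The core itself, as opposed to the auxiliary sets of CR1–CR3 that are used to bound its size, can be computed by a standard peeling procedure: start from all vertices and repeatedly discard any vertex that violates the defining condition. The sketch below is an illustration under stated assumptions and not the paper's construction: the function name is hypothetical, colors are numbered $0, \ldots, q-1$, edges are tuples of vertices, and the constant $100k$ is replaced by a free parameter `threshold` so that toy examples are possible.

```python
def core(edges, sigma, q, threshold):
    """Largest vertex set V' such that every v in V' has, for every color
    i != sigma[v], at least `threshold` edges e containing v whose other
    vertices all lie in V' and all have color i."""
    n = len(sigma)
    alive = set(range(n))
    incident = {v: [] for v in range(n)}
    for e in edges:
        for v in e:
            incident[v].append(e)

    def satisfies_condition(v):
        for i in range(q):
            if i == sigma[v]:
                continue
            support = sum(
                1
                for e in incident[v]
                if all(u == v or (u in alive and sigma[u] == i) for u in e)
            )
            if support < threshold:
                return False
        return True

    changed = True
    while changed:
        changed = False
        for v in sorted(alive):
            if not satisfies_condition(v):
                alive.remove(v)
                changed = True
    return alive
```

Peeling never removes a vertex of any set with the defining property, because such a set stays inside `alive` throughout, so the returned set is the largest one. Removing a single supporting edge can empty the core entirely, which mirrors the avalanche effect described above.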

Let $Z = \cup_{\ell \ge 0} Z^{(\ell)}$ be the final set resulting from CR3.

Claim 4.8. The set $V \setminus (W \cup Z)$ is contained within the core.

Proof. For a contradiction, let $v \in V \setminus (V_{\mathrm{core}} \cup W \cup Z)$. Since $W_j \subseteq V_j$, any edge that does not contribute to $m_1(v, W_j, V_j)$ must have empty intersection with $W_j$. Therefore since $m_{k-1}(v, V_j) \ge 300k$ but $m_1(v, W_j, V_j) \le 100k$, we must have that $m_{k-1}(v, V_j \setminus W_j) \ge 200k$. Similarly, since $m_1(v, Z^{(\ell)} \cap V_j, V_j) \le 100k$ we must have that $m_{k-1}(v, V_j \setminus (W_j \cup Z)) \ge 100k$. Furthermore, this statement holds for all $j \ne \sigma(v)$ and all $v \in V \setminus (W \cup Z)$. It follows that the entire set $V \setminus (W \cup Z)$ may be added to the core, contradicting maximality.

We now bound the size of $W$, $U$ and $Z$.

Claim 4.9. The function $Q(q, k) = q^{-k-1} \ln^{400k} q$ is such that $|W_{ij}| \le n \cdot Q(q, k)$ with probability at least $1 - \exp\{-\Omega(n)\}$.

Proof. Fix $v \in V_i$. Due to the independence of edges in $\mathcal{G}(n, k, cn, \sigma)$ we know that $m_{k-1}(v, V_j)$ is distributed binomially with mean $\binom{|V_j|}{k-1} p (1 + o(1)) \ge k \ln q + O_q(q^{-1})$. It follows from Lemma 4.1 that $P(v \in W_{ij}) \le \frac{q}{3} \cdot Q(q, k)$ for $v \in V_i$ and sufficiently large $q$. Therefore $E[|W_{ij}|] \le \frac{n}{3} \cdot Q(q, k)$. Finally, since $|W_{ij}|$ is distributed binomially, a straightforward application of the Chernoff bound shows that $P[|W_{ij}| \ge n \cdot Q(q, k)] \le \exp\{-n \cdot Q(q, k) \ln(3/e)\} = \exp\{-\Omega(n)\}$.

Claim 4.10. We have $|U| \le n/q^{10k}$ with probability at least $1 - \exp\{-\Omega(n)\}$.

Proof. Fix $v \in V_i$. The quantity $m_1(v, W_j, V_j)$ is stochastically dominated by $\mathrm{Bin}\left( |W_j| \binom{|V_j|}{k-2}, p \right)$. Hence, with $Q(q, k)$ as previously, we know that

\[
E\left[ m_1(v, W_j, V_j) \ \Big| \ |W_j| \le n \cdot qQ(q, k) \right] \le nkp \cdot qQ(q, k) \binom{|V_j|}{k-2} = \tilde{O}_q(q^{1-k}).
\]

Applying the Chernoff bound gives $P\left[ v \in U_{ij} \mid |W_j| \le n \cdot qQ(q, k) \right] \le \tilde{O}_q(q^{-19k})$. The independence of edges in $\mathcal{G}(n, k, cn, \sigma)$ then implies that $|U_{ij}|$ is distributed binomially with mean less than $n \cdot O_q(q^{-20k})$. The Chernoff bound implies that

\[
P\left[ |U_{ij}| > n q^{-15k} \ \Big| \ |W_j| \le n \cdot qQ(q, k) \right] \le \exp\{-\Omega(n)\}.
\]

The result follows by Claim 4.9.

Claim 4.11. We have |Z | ≤ n/q 9k with probability at least 1 − exp{−Ω(n)}.


Proof. Claim 4.10 tells us that $|U|\le n/q^{10k}$ with probability $1-\exp\{-\Omega(n)\}$. We will condition on this event. Suppose that $|Z\setminus U|\ge i^*=n/q^{10k}$ and consider the set $Z^{(i^*)}$ obtained after $i^*$ steps of CR3. The construction of $Z^{(i^*)}$ implies that there exist $100k|Z^{(i^*)}\setminus U|$ vertex-edge pairs $(v,e)$ such that $|e\cap Z^{(i^*)}|\ge 2$ and $e\setminus\{v\}\subseteq V_j$ for some $j\in[q]$. Since each edge may appear in at most $k$ vertex-edge pairs, this implies that there are at least $100|Z^{(i^*)}\setminus U|$ such edges. Therefore, there are at least $100i^*=100n/q^{10k}$ edges $e$ such that $|e\cap Z^{(i^*)}|\ge 2$ and $e\setminus\{v\}\subseteq V_j$ for some $j\in[q]$, $v\in e$, despite the set $Z^{(i^*)}$ only being of size at most $2n/q^{10k}$. We prove that with high probability, no such set can exist.

Let $\alpha=q^{-10k}$ and let $T\subset[n]$ be a set of $|T|=\alpha n$ vertices. Let $m_T^*$ be the number of edges $e$ such that $|e\cap T|\ge 2$ and $e\setminus\{v\}\subseteq V_j$ for some $j\in[q]$, $v\in V$. We know that $m_T^*$ is stochastically dominated by $\mathrm{Bin}\big(2\binom{\alpha n}{2}\binom{n/q}{k-2},p\big)$, and so we may observe by the Chernoff bound that
\[\mathbb{P}\big[m_T^*\ge 100\cdot\alpha n\big]\le\exp\{100\alpha n\ln\alpha\}.\]

If we let $N$ be the number of sets $T$ of size $|T|=\alpha n$ such that $m_T^*\ge 100\cdot\alpha n$, then
\[\mathbb{P}[N>0]\le\binom{n}{\alpha n}\exp\{100\alpha n\ln\alpha\}\le\exp\Big\{-n\big(\alpha\ln\alpha+(1-\alpha)\ln(1-\alpha)-100\alpha\ln\alpha\big)\Big\}=\exp\{-\Omega(n)\}.\]
Therefore with probability $1-\exp\{-\Omega(n)\}$ we have $|Z\setminus U|\le n/q^{10k}$, which implies the claim. Lemma 4.7 (i) then follows from Claims 4.8–4.11.
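The union bound above yields $\exp\{-\Omega(n)\}$ only if the bracketed rate $\alpha\ln\alpha+(1-\alpha)\ln(1-\alpha)-100\alpha\ln\alpha$ is positive at $\alpha=q^{-10k}$. The following is a minimal numerical sanity check of that sign, not part of the proof; the function names are illustrative and the sample values of $q$ and $k$ are arbitrary.

```python
import math


def union_bound_rate(alpha: float) -> float:
    """Rate g(alpha) = a ln a + (1 - a) ln(1 - a) - 100 a ln a from the union bound.

    The bound P[N > 0] <= exp(-n * g(alpha)) is useful only when g(alpha) > 0.
    """
    return (alpha * math.log(alpha)
            + (1.0 - alpha) * math.log1p(-alpha)
            - 100.0 * alpha * math.log(alpha))


def alpha_for(q: int, k: int) -> float:
    """The choice alpha = q^(-10k) made in the proof of Claim 4.11."""
    return float(q) ** (-10 * k)


if __name__ == "__main__":
    for q, k in [(3, 3), (10, 3), (5, 4)]:
        a = alpha_for(q, k)
        print(q, k, a, union_bound_rate(a))
```

For small $\alpha$ the rate behaves like $-99\,\alpha\ln\alpha-\alpha$, which is why the factor 100 in the exponent dominates the entropy term coming from $\binom{n}{\alpha n}$.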

To establish (ii) we say that a vertex $v$ is $j$-blocked if there is an edge $e\ni v$ such that $e\setminus\{v\}$ is contained in the core and $e\setminus\{v\}\subset V_j$. We say that a vertex $v$ is $\sigma$-complete if it is $j$-blocked for all $j\ne\sigma(v)$. Note that, as with vertices inside the core, recoloring any $\sigma$-complete vertex will set off a coloring avalanche. Therefore, the cluster size is a result of vertices that fail to have a neighbour in the core (i.e. the cluster size is a function of the number of 1-free vertices).

Claim 4.12. With probability $1-\exp\{-\Omega(n)\}$ the random graph $G_\sigma$ has the following property: if $\tau\in\mathcal{C}(G_\sigma,\sigma)$ then for all $\sigma$-complete vertices $v$ we have $\sigma(v)=\tau(v)$.

Proof. Note that it suffices to prove that $\sigma(v)=\tau(v)$ for all $v$ in the core, since this implies the result for all $\sigma$-complete vertices outside the core as well, by definition of $\sigma$-complete. Recalling Lemma 4.3, we may assume that $\sigma$ is separable in $\mathcal{G}(n,k,cn,\sigma)$. For $i\in[q]$, let
\[\Delta_i^+=\{v\in V_{\mathrm{core}}:\tau(v)=i\ne\sigma(v)\},\qquad\Delta_i^-=\{v\in V_{\mathrm{core}}:\tau(v)\ne i=\sigma(v)\}.\]
Then
\[\sum_{i=1}^q|\Delta_i^+|=|\{v\in V_{\mathrm{core}}:\sigma(v)\ne\tau(v)\}|=\sum_{i=1}^q|\Delta_i^-|.\tag{4.3}\]
The assumption that $\sigma$ is separable implies $\max_{i\in[q]}|\Delta_i^+|\le\frac{n}{q}\kappa(1+o(1))$ and $\max_{i\in[q]}|\Delta_i^-|\le\frac{n}{q}\kappa(1+o(1))$. If we can show that $\{v\in V_{\mathrm{core}}:\sigma(v)\ne\tau(v)\}=\emptyset$ then $\sigma(v)=\tau(v)$ for all $\sigma$-complete vertices.

Take $v\in\Delta_i^+$. Since $v\in V_{\mathrm{core}}$ we know that $m_{k-1}(v,V_i)\ge 100$. Further, since $\tau$ is a coloring, we must have that $m_1(v,\Delta_i^-,V_i)\ge 100$. Let $|\Delta_i^-|=\beta n$ and observe that
\[\mathbb{P}\big[m_1(\Delta_i^+,\Delta_i^-,V_i)\ge\beta n\big]\le\binom{1.01\,\frac{n\kappa}{q}\cdot\beta n\cdot\binom{|V_i|}{k-2}}{\beta n}p^{\beta n}\le 2\left(\frac{2.8\,n\kappa|V_i|^{k-2}p}{q(k-2)!}\right)^{\beta n}\le\big(3k^2\kappa\ln q\big)^{\beta n}=\exp\{-\Omega(n)\},\tag{4.4}\]
by definition of $\kappa$. This implies that for all $i\in[q]$,
\[100|\Delta_i^+|\le m_1(\Delta_i^+,\Delta_i^-,V_i)\le|\Delta_i^-|.\]
Therefore, (4.3) implies that $\Delta_i^+=\Delta_i^-=\emptyset$ for all $i$, as required.

Lemma 4.7 (ii) is immediate from Claim 4.12. The core size guaranteed by Lemma 4.7 is not quite big enough to deduce a good bound on the cluster size (due to the polylogarithmic factor). To remedy this problem, we say that a vertex $v$ is $j$-blocked if it is contained in an edge $e$ such that $e\setminus\{v\}$ is contained in the core and $e\setminus\{v\}\subset V_j$. Further, we say that $v$ is $\alpha$-free if there are at least $\alpha+1$ colors $j$ (including $\sigma(v)$) such that $v$ fails to be $j$-blocked. A careful study of how the vertices outside the core connect to those inside yields the following.

Lemma 4.13. With probability $1-o(n^{-1/2})$ there exists a set $A_W$ of vertices such that there are at most $nq^{1-k}(1+O_q(q^{-2}))$ vertices outside of $A_W$ which are 1-free and there are at most $nq^{-k}(1+O_q(q^{-1}))$ vertices which belong to $A_W$ or are 2-free.

We proceed to prove Lemma 4.13. Let $A_i=\{v\in V\setminus V_i:m_{k-1}(\{v\},V_i)=0\}$, and define
\[A'=\bigcup_{i\in[k]}A_i,\qquad A''=\bigcup_{i\ne j}\big(A_i\cap A_j\big),\]
\[A_Z=\{v\in V:m_1(v,Z\cap V_i,V_i)>0\text{ for some }i\ne\sigma(v)\},\]
\[A_W=\{v\in V:m_{k-1}(v,V_i\setminus W_i)=0\text{ for some }i\ne\sigma(v)\}\setminus A'.\]

By construction, if $v$ is 1-free then $v\in A'\cup A_Z\cup A_W$, and if $v$ is 2-free then $v\in A''\cup A_Z\cup A_W$. Thus, if we can bound the size of these sets then we will obtain the estimates required by Lemma 4.13.

Claim 4.14. We have $|A'|\le n/q^{k-1}$ and $|A''|\le n/q^{2k-2}$ with probability at least $1-\exp\{-\Omega(n)\}$.

Proof. Take $v\in V_j$ and $i\ne j$. Now $\mathbb{P}[m_{k-1}(v,V_i)=0]<\exp\{-k\ln q\}$, and hence $\mathbb{P}[v\in A']\le(q-1)q^{-k}$. It follows that $\mathbb{E}[|A'|]:=\mu<n\cdot(q-1)q^{-k}$. Since $\mathbb{P}[m_{k-1}(v,V_i)=0]>\exp\{-(k+1)\ln q\}$ we must have that $\mu>nq^{-k}$, and so by the Chernoff bound
\[\mathbb{P}\big[|A'|>n/q^{k-1}\big]\le\exp\left\{-nq^{-k}\left[\frac{q}{q-1}\ln\left(\frac{q}{q-1}\right)-\frac{1}{q-1}\right]\right\}=\exp\{-\Omega(n)\}\]

as desired. Further, the argument for $A''$ follows quickly after noting that the edge sets of $m_{k-1}(v,V_i)$ and $m_{k-1}(v,V_j)$ are independent for $i\ne j$.

Claim 4.15. We have $|A_Z|\le n/q^{6k}$ with probability at least $1-\exp\{-\Omega(n)\}$.

Proof. Take $v\in V\setminus V_i$. Since $m_1(v,Z\cap V_i,V_i)$ is stochastically dominated by $\mathrm{Bin}\big(|Z|\binom{|V_i|}{k-2},p\big)$, we have
\[\mathbb{P}\big[m_1(v,Z\cap V_i,V_i)>0\,\big|\,|Z|\le n/q^{9k}\big]\le 1-(1-p)^{\frac{n}{q^{9k}}\binom{|V_i|}{k-2}}\le\frac{ck^2}{n^{k-1}(1-q^{1-k})}\cdot\frac{n}{q^{9k}}\cdot|V_i|^{k-2}\le q^{-7k}.\]
It follows that, conditional on the event $|Z|\le n/q^{9k}$, we know that $|A_Z|$ is distributed binomially with mean less than $n/q^{7k}$. Finally, application of Lemma 4.1 implies that
\[\mathbb{P}\big[|A_Z|>n/q^{6k}\,\big|\,|Z|\le n/q^{9k}\big]\le\exp\{-\Omega(n)\},\]
so the result follows from Claim 4.11.

Claim 4.16. We have |A W | ≤ n/q 2k−3 with probability at least 1 − exp{−Ω(n)}.


Proof. Fix $i\ne j$. We begin by developing probabilistic bounds for $m_{k-1}(V_i,V_j)$ and $m_1(V_i,W_j,V_j)$. As $m_{k-1}(V_i,V_j)\sim\mathrm{Bin}\big(|V_i|\binom{|V_j|}{k-1},p\big)$, we have $\mathbb{E}[m_{k-1}(V_i,V_j)]:=\mu=\frac{n^kp}{q^k(k-1)!}(1+o(1))$, and so by Lemma 4.1
\[\mathbb{P}\left[m_{k-1}(V_i,V_j)>\frac{n^kp}{q^k(k-1)!}\cdot\frac{k-1/2}{k}\right]\ge 1-\exp\left\{-\mu\left[\left(1-\frac{1}{2k}\right)\ln\left(1-\frac{1}{2k}\right)+\frac{1}{2k}\right]\right\}\ge 1-\exp\{-\Omega(n)\}.\]
In what follows, all probabilities are taken conditional on the event that $|W_{ij}|\le n\cdot Q(q,k)$ for all $i\ne j$, where $Q(q,k)=\tilde O_q(q^{-k-1})$ (see Claim 4.9). Since $m_1(V_i,W_j,V_j)$ is dominated by $\mathrm{Bin}\big(|V_i||W_j|\binom{|V_j|}{k-2},p\big)$, we have
\[\mathbb{E}\big[m_1(V_i,W_j,V_j)\big]\le\frac{n^kp\cdot q^{2-k}Q(q,k)}{(k-2)!}\]
and so from Lemma 4.1, it follows that
\[\mathbb{P}\big[m_1(V_i,W_j,V_j)<3n^kp\cdot q^{2-k}Q(q,k)\big]=1-\exp\{-\Omega(n)\}.\]
We will condition on the event $\mathcal{E}$ that
\[T=m_{k-1}(V_i,V_j)>\frac{n^kp(k-1/2)}{q^kk!}\qquad\text{and}\qquad Y=m_1(V_i,W_j,V_j)\le 3n^kp\cdot q^{2-k}Q(q,k).\]

We now appeal to a balls and bins argument. Think of the edges that contribute to $Y$ as yellow balls, those that contribute to $T-Y$ as blue balls, and the vertices of $V_i$ as the bins. Let $Y_{ij}$ be the set of vertices (bins) in $V_i$ that receive at least one ball, but do not receive a blue ball. Let $m\le 10$ be a positive integer. When $T$ balls are thrown amongst $n/q$ bins, the probability that an individual bin receives $m$ balls is given by
\[\binom{T}{m}(q/n)^m(1-q/n)^{T-m}\le 2(Tq/n)^m\exp\{-Tq/n\}\le 2q^{1/2-k}k^m\ln^mq.\]
Given that vertex $v\in V_i$ receives a total of $m$ balls, the probability that all its balls are yellow is equal to the probability that a hypergeometric random variable with parameters $T,Y,m$ takes the value $m$. Therefore, summing over $m\ge 1$ and using the bounds on $Y$, $T$ gives
\begin{align*}
\mathbb{P}\big[v\in Y_{ij}\,\big|\,\mathcal{E}\big]&\le\sum_{m\ge 1}\mathbb{P}\big[\mathrm{Hyp}(T,Y,m)=m\big]\times\mathbb{P}[v\text{ receives }m\text{ balls}]\\
&\le\sum_{1\le m\le 10}\left(\frac{Y}{T}\right)^m3q^{1/2-k}k^m\ln^mq+\sum_{m>10}\left(\frac{2Y}{T}\right)^m\\
&\le 4k!\,q^{5/2-k}Q(q,k)\ln q+q^{10(2-k)}\le q^{7/4-2k}.
\end{align*}

Hence $\mathbb{E}\big[|Y_{ij}|\,\big|\,\mathcal{E}\big]\le n\cdot q^{3/4-2k}$. Finally, write $|Y_{ij}|$ as the sum of indicator variables $X_v$ for $v\in V_i$, where $X_v=1$ if $v\in Y_{ij}$ and $X_v=0$ otherwise. Azuma's inequality [20, Theorem 2.25] implies that
\[\mathbb{P}\big[|Y_{ij}|\le n\cdot q^{1-2k}\,\big|\,\mathcal{E}\big]\ge 1-\exp\{-\Omega(n)\}.\]
Recalling $\mathbb{P}[\mathcal{E}]\ge 1-\exp\{-\Omega(n)\}$ and taking the union bound yields $|A_W|\le nq^{3-2k}$ with probability $1-\exp\{-\Omega(n)\}$. Thus Lemma 4.13 follows from Claims 4.14–4.16.


Proof of Lemma 4.4. We know from Claim 4.12 that for all $\sigma$-complete $v$ and all $\tau\in\mathcal{C}(G_\sigma,\sigma)$ we have $\tau(v)=\sigma(v)$. If we let $F_x$ be the set of $x$-free vertices, then by Lemma 4.13, we may assume that
\[|F_1\setminus A_W|\le\frac{n}{q^{k-1}}+n\cdot O_q\big(q^{-k-1}\big),\qquad|F_2\setminus A_W|=n\cdot O_q\big(q^{-k-1}\big),\qquad|A_W|\le n\cdot q^{-k}.\]
We know that for any $v\in F_x$ there are at least $x+1$ choices for the color of $v$. Since $F_{x+1}\subseteq F_x$ it follows that
\[|\mathcal{C}(G_\sigma,\sigma)|\le 2^{|F_1\setminus F_2|}3^{|F_2\setminus F_3|}\cdots q^{|F_q|}\le 2^{|F_1\setminus(F_2\cup A_W)|}\cdot q^{|F_2\cup A_W|},\]
and so
\[\frac{1}{n}\ln|\mathcal{C}(G_\sigma,\sigma)|\le\frac{\ln 2}{q^{k-1}}+\frac{\ln q}{q^k}+\tilde O_q(q^{-k-1}).\]
Further, if we set
\[c=(q^{k-1}-1/2)\ln q-\ln 2-\frac{1.01\ln q}{q}\]
then
\[\frac{1}{n}\ln\mathbb{E}[Z_{q,\mathrm{bal}}]=\ln q+c\ln(1-q^{1-k})=\frac{\ln 2}{q^{k-1}}+\frac{1.01\ln q}{q^k}+\tilde O_q(q^{-k-1}).\]
Hence by Lemma 4.2, T3 holds in $\mathcal{G}(n,k,cn)$ with probability $1-o(1)$, completing the proof of Lemma 4.4.
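The expansion of $\frac1n\ln\mathbb{E}[Z_{q,\mathrm{bal}}]$ used above can be checked numerically for moderate $q$ and $k$. The sketch below simply evaluates $\ln q+c\ln(1-q^{1-k})$ at the stated value of $c$ and compares it with the two leading terms; the function names and the sample parameters are illustrative assumptions, and the comparison tolerance $q^{-k-1}\ln q$ is only one convenient stand-in for the $\tilde O_q(q^{-k-1})$ error term.

```python
import math


def first_moment_rate(q: int, k: int) -> float:
    """ln q + c ln(1 - q^(1-k)) with c = (q^(k-1) - 1/2) ln q - ln 2 - 1.01 ln q / q."""
    u = float(q) ** (1 - k)
    c = (q ** (k - 1) - 0.5) * math.log(q) - math.log(2) - 1.01 * math.log(q) / q
    return math.log(q) + c * math.log1p(-u)


def leading_terms(q: int, k: int) -> float:
    """The claimed leading behaviour ln 2 / q^(k-1) + 1.01 ln q / q^k."""
    return math.log(2) / q ** (k - 1) + 1.01 * math.log(q) / q ** k


if __name__ == "__main__":
    for q, k in [(10, 3), (20, 4)]:
        exact = first_moment_rate(q, k)
        approx = leading_terms(q, k)
        print(q, k, exact, approx, exact - approx)
```

The residual comes from the second-order terms of $\ln(1-q^{1-k})$, which are of order $q^{2-2k}\ln q$ and hence comfortably below $q^{-k-1}$ up to logarithmic factors once $k\ge 3$.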

4.4 Proof of Corollary 3.7

Here we assume that $c>(q^{k-1}-1/2)\ln q-\ln 2+1/\ln q$. The proof of Corollary 3.7 is similar to the proof of [9, Proposition 2.1]. The starting point is the following observation, which is reminiscent of the "planting trick" from [1]. Call $\sigma:[n]\to[q]$ $\varepsilon$-balanced for some $\varepsilon>0$ if $\max_{i\in[q]}\big||\sigma^{-1}(i)|-n/q\big|<\varepsilon n$.

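Claim 4.17. Suppose there exist $\varepsilon,\varepsilon'>0$ and a sequence $(\mathcal{E}_n)_n$ of events such that for large $n$ and all $\varepsilon$-balanced $\sigma:[n]\to[q]$ we have
\[\mathbb{P}[G_\sigma\in\mathcal{E}_n]\le\exp(-\varepsilon'n)\tag{4.5}\]
while
\[\lim_{n\to\infty}\mathbb{P}[\mathcal{G}\in\mathcal{E}_n]=1.\tag{4.6}\]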

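Then there exists $\delta>0$ such that $Z_q(\mathcal{G})\le\exp(-\delta n)\mathbb{E}[Z_q(\mathcal{G})]$ w.h.p.

Proof. Let $Z_q'(G)$ be the number of $\varepsilon$-balanced $q$-colorings of $G$. By [14, proof of Lemma 2.1] there exists $\alpha>0$ such that
\[\mathbb{E}[Z_q(\mathcal{G})-Z_q'(\mathcal{G})]\le\exp(-\alpha n)\mathbb{E}[Z_q(\mathcal{G})].\tag{4.7}\]
Further, let $Z_q''(\mathcal{G})=Z_q'(\mathcal{G})\mathbf{1}\{\mathcal{G}\in\mathcal{E}_n\}$. Then (4.5) ensures that
\[\mathbb{E}[Z_q''(\mathcal{G})]\le\exp(-\varepsilon'n/2)\mathbb{E}[Z_q'(\mathcal{G})].\tag{4.8}\]
Moreover, let $\mathcal{A}_n=\{Z_q(\mathcal{G})\ge\exp(-\delta n)\mathbb{E}[Z_q(\mathcal{G})]\}$ for a small enough $\delta>0$. Combining (4.7) and (4.8), we obtain
\begin{align*}
\exp(-\delta n)\mathbb{E}[Z_q(\mathcal{G})]\,\mathbb{P}[\mathcal{G}\in\mathcal{A}_n\cap\mathcal{E}_n]&\le\mathbb{E}[Z_q(\mathcal{G})\mathbf{1}\{\mathcal{G}\in\mathcal{A}_n\cap\mathcal{E}_n\}]\\
&\le\mathbb{E}[Z_q''(\mathcal{G})]+\mathbb{E}[Z_q(\mathcal{G})-Z_q'(\mathcal{G})]\le\big(\exp(-\varepsilon'n/2)+\exp(-\alpha n)\big)\mathbb{E}[Z_q(\mathcal{G})].
\end{align*}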

Hence, choosing $\delta>0$ small enough and recalling (4.6), we obtain $\mathbb{P}[\mathcal{A}_n]=o(1)$.

Thus, we are left to exhibit a sequence of events as in Claim 4.17. Given a map $\tau:[n]\to[q]$ and a hypergraph $G$ on $[n]$ let $E_\tau(G)$ be the number of monochromatic edges of $G$ under $\tau$. Further, for $\beta>0$ let
\[Z_{q,\beta}(G)=\sum_\tau\exp(-\beta E_\tau(G)),\]
where the sum ranges over all $\tau:[n]\to[q]$. The function $\beta\mapsto Z_{q,\beta}(G)$ can be viewed as the partition function of a hypergraph variant of the "Potts antiferromagnet" from statistical physics. We consider this random variable because it is concentrated in the following sense.

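Claim 4.18. For any $\varepsilon>0$ there is $\delta>0$ such that for any $\sigma:[n]\to[q]$ we have
\[\mathbb{P}\big[|\ln Z_{q,\beta}(\mathcal{G})-\mathbb{E}\ln Z_{q,\beta}(\mathcal{G})|>\varepsilon n\big]<\exp(-\delta n),\qquad\mathbb{P}\big[|\ln Z_{q,\beta}(G_\sigma)-\mathbb{E}\ln Z_{q,\beta}(G_\sigma)|>\varepsilon n\big]<\exp(-\delta n).\tag{4.9}\]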

Proof. Either adding or removing a single edge alters the value of ln Z q,β by at most β. Therefore, the assertion follows from a standard application of Azuma’s inequality.
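The Lipschitz property used in this proof can be observed directly on a tiny instance by brute force. The sketch below enumerates all maps $\tau:[n]\to[q]$ for a small hypergraph, evaluates $\ln Z_{q,\beta}$, and compares the value before and after adding one edge. The example hypergraph, the parameter values and the function name are illustrative assumptions, and full enumeration is only feasible for very small $n$.

```python
import itertools
import math


def log_partition(n, q, edges, beta):
    """ln Z_{q,beta}(G) = ln sum_tau exp(-beta * #monochromatic edges under tau).

    `edges` is a list of tuples of vertex indices in range(n); all q**n maps
    tau are enumerated, so this is only meant for tiny instances.
    """
    total = 0.0
    for tau in itertools.product(range(q), repeat=n):
        # An edge is monochromatic if all its vertices receive the same color.
        mono = sum(1 for e in edges if len({tau[v] for v in e}) == 1)
        total += math.exp(-beta * mono)
    return math.log(total)


if __name__ == "__main__":
    n, q, beta = 5, 3, 2.0
    edges = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
    before = log_partition(n, q, edges, beta)
    after = log_partition(n, q, edges + [(0, 3, 4)], beta)
    print(before, after, before - after)
```

Adding an edge multiplies each summand by either $1$ or $e^{-\beta}$, so $\ln Z_{q,\beta}$ can drop by at most $\beta$ and cannot increase; this is the bounded-differences condition that Azuma's inequality needs.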

Additionally, we have the following estimate of $\mathbb{E}\ln Z_{q,\beta}(G_\sigma)$.

Claim 4.19. There is $\delta>0$ such that for all $\beta>0$ and all $\delta$-balanced $\sigma$ we have $\mathbb{E}\ln Z_{q,\beta}(G_\sigma)>\delta n+\ln\mathbb{E}[Z_q(\mathcal{G})]$.

Proof. We are going to show that for a small enough $\delta>0$ we have
\[n^{-1}\ln Z_q(G_\sigma)\ge q^{1-k}\ln 2+\tilde O_q(q^{-k})\tag{4.10}\]
w.h.p. Since $Z_{q,\beta}(G_\sigma)\ge Z_q(G_\sigma)$ for all $\beta$ and because (3.3) entails that $n^{-1}\ln\mathbb{E}[Z_q(\mathcal{G})]\le q^{1-k}(\ln 2-\ln^{-1}q)+\tilde O_q(q^{-k})$, the claim follows from (4.10).

To prove (4.10) we let $F_{ij}$ be the set of vertices $v\in V_i$ such that $m_{k-1}(v,V_j)=0$. Further, let $F_{ij}'$ be the set of all $v\in F_{ij}$ such that $m_{k-1}(v,V_h)=0$ for some $h\in[q]\setminus\{i,j\}$. Due to the independence of the edges, $|F_{ij}|$, $|F_{ij}'|$ are binomial random variables. Provided that $\delta$ is small enough, their expected sizes are
\[\mathbb{E}|F_{ij}|=\big(q^{-k-1}+\tilde O_q(q^{-k-2})\big)n,\qquad\mathbb{E}|F_{ij}'|=\tilde O_q(q^{-k-2})n.\]
Hence, the Chernoff bound implies that w.h.p. for all $i,j$ we have
\[|F_{ij}|=\big(q^{-k-1}+\tilde O_q(q^{-k-2})\big)n,\qquad|F_{ij}'|=\tilde O_q(q^{-k-2})n.\tag{4.11}\]

Let $F^\star=\bigcup_{i\ne j}F_{ij}\setminus F_{ij}'$. Further, for every vertex $v\in F^\star$ let $\sigma^\star(v)\in[q]$ be the (unique) color such that $v\in F_{\sigma(v)\sigma^\star(v)}$. Further, let $E^\star$ be the set of edges $e$ of $G_\sigma$ such that there exist $v,w\in e\cap F^\star$ such that
\[\sigma(e\setminus\{v,w\})\subset\{\sigma(v),\sigma(w),\sigma^\star(v),\sigma^\star(w)\}.\]
Given that (4.11) occurs, $|E^\star|$ is stochastically dominated by a binomial random variable with mean $\tilde O_q(q^{-k})n$. Hence, the Chernoff bound implies that w.h.p.
\[|E^\star|=\tilde O_q(q^{-k})n.\tag{4.12}\]
Now, let $F_0$ be the set of all vertices $v\in F^\star$ that do not occur in any $e\in E^\star$. Then by construction any map $\tau:[n]\to[q]$ such that $\tau(v)\in\{\sigma(v),\sigma^\star(v)\}$ for all $v\in F_0$ and $\tau(v)=\sigma(v)$ for all $v\notin F_0$ is a $q$-coloring of $G_\sigma$. Furthermore, there are $2^{|F_0|}$ such $\tau$ and (4.11), (4.12) entail that $|F_0|\ge(q^{1-k}+\tilde O_q(q^{-k}))n$ w.h.p., whence (4.10) follows.

By comparison, $\ln\mathbb{E}[Z_{q,\beta}(\mathcal{G})]$ is upper-bounded as follows.

Claim 4.20. For any $\delta>0$ there is $\beta_0>0$ such that for all $\beta>\beta_0$ we have $\ln\mathbb{E}[Z_{q,\beta}(\mathcal{G})]\le\delta n+\ln\mathbb{E}[Z_q(\mathcal{G})]$.

Proof. Using (3.3) and the fact that monochromatic edges are least likely when $\tau$ is balanced, we obtain
\[\frac{1}{n}\ln\mathbb{E}[Z_q(\mathcal{G})]=\ln q+c\ln\big(1-q^{1-k}\big)+o(1),\qquad\frac{1}{n}\ln\mathbb{E}[Z_{q,\beta}(\mathcal{G})]\le\ln q+c\ln\big(1-q^{1-k}(1-\exp(-\beta))\big).\]
Making $\beta$ sufficiently large and taking logarithms, we obtain the assertion.

Finally, we know from Claims 4.19–4.20 and Jensen's inequality that there exists $\delta>0$ such that $\mathbb{E}\ln Z_{q,\beta}(\mathcal{G})+\delta n\le\mathbb{E}\ln Z_{q,\beta}(G_\sigma)$. However, Claim 4.18 implies that both $\ln Z_{q,\beta}(\mathcal{G})$ and $\ln Z_{q,\beta}(G_\sigma)$ are close to their expectations. Therefore
\[\mathcal{E}_n=\big\{G:|\ln Z_{q,\beta}(G)-\mathbb{E}\ln Z_{q,\beta}(\mathcal{G})|>\varepsilon n\big\}\]
will suffice for Corollary 3.7.


5 The second moment

In this section we prove Proposition 3.6. We keep the notation and the assumptions of Section 3 and Section 4.

5.1 Overview

We reduce the problem of estimating $\mathbb{E}[Z^2_{q,\mathrm{tame}}]$ to that of optimising the function $F(a)$ from Lemma 3.3 over a certain domain $D_{\mathrm{tame}}$. Due to the additional constraints imposed by the "tame" condition, this domain $D_{\mathrm{tame}}$ is a relatively small subset of $D$, which was the domain of optimisation for (3.5). In the end, $\max_{a\in D_{\mathrm{tame}}}F(a)$ will be seen to be significantly smaller than $\max_{a\in D}F(a)$, and additionally, the problem of maximising $F$ over $D_{\mathrm{tame}}$ is technically less demanding.

To define $D_{\mathrm{tame}}$ formally, call $a\in D$ separable if $a_{ij}\notin\big(q^{-1}(1.01/k)^{1/(k-1)},q^{-1}(1-\kappa)\big)$ for all $i,j\in[q]$ (cf. (3.6)). Additionally, we say that $a\in D$ is $s$-stable if there are precisely $s$ pairs $(i,j)$ such that $a_{ij}>q^{-1}(1.01/k)^{1/(k-1)}$. We denote by $D_s$ the set of all $s$-stable $a\in D$, and by D[q−1] = ∪s s}1{ j > s}. Hence, $\bar a(s)$ is a block-diagonal matrix. The upper-left block is the $s\times s$ identity matrix, divided by $q$, and the lower-right block is the $(q-s)\times(q-s)$ matrix with all entries equal to $(q(q-s))^{-1}$. Clearly, $\bar a(s)\in D_{s,\mathrm{tame}}$. The following statement, which we prove in Section 5.3, is the heart of the second moment analysis.

Lemma 5.2. We have $\max_{a\in D_{0,\mathrm{tame}}}F(a)=F(\bar a)$ and $\max_{a\in D_{s,\mathrm{tame}}}F(a)\le F(\bar a(s))+q^{0.999-k}$ for any $1\le s<q$.

5.1.1 Proof of Lemma 3.6

By Lemma 5.1 and Lemma 5.2 it remains only to show that
\[F(\bar a(s))+q^{0.999-k}<F(\bar a)\qquad\text{for all }1\le s<q.\tag{5.1}\]

To see that this is the case, we note that
\[H(\bar a)=2\ln q,\qquad\text{and}\qquad E(\bar a)=-2\ln q+\frac{2\ln 2}{q^{k-1}}+o(q^{1-k})\]
and thus
\[F(\bar a)=\frac{2\ln 2}{q^{k-1}}+o\big(q^{1-k}\big).\tag{5.2}\]

Moreover, if $s\in[q]$ then
\[H(\bar a(s))=\frac{s}{q}\ln q+\frac{q-s}{q}\ln(q(q-s)),\qquad E(\bar a(s))<-2\ln q+\frac{s}{q}\ln q+\tilde O_q(q^{1-k})\tag{5.3}\]
and thus
\begin{align*}
F(\bar a(s))&=\frac{s}{q}\ln q+\frac{q-s}{q}\ln(q(q-s))+c\ln\left(1+\frac{s-2q}{q^k}+\frac{(q-s)^2}{q^k(q-s)^k}\right)\\
&<\ln q+\frac{q-s}{q}\ln(q-s)+o_q(q^{1-k})+\Big[(q^{k-1}-1/2)\ln q-\ln 2\Big]\cdot\left[\frac{s-2q}{q^k}+\frac{(q-s)^2}{q^k(q-s)^k}-\frac{1}{2}\left(\frac{s-2q}{q^k}+\frac{(q-s)^2}{q^k(q-s)^k}\right)^2\right]\\
&=(1-s/q)\ln(1-s/q)+\frac{2\ln 2}{q^{k-1}}-\frac{s\ln 2}{q^k}-\frac{s\ln q}{2q^k}+\frac{\ln q}{q^{k-1}}+\frac{\ln q}{q^{k-1}(1-s/q)^{k-2}}\\
&\qquad-\frac{q^{k-1}\ln q}{2}\left[\left(\frac{s-2q}{q^k}+\frac{(q-s)^2}{q^k(q-s)^k}\right)^2\right]+o_q(q^{1-k}),\tag{5.4}
\end{align*}
whence (5.3) and (5.4) follow. Finally, (5.1) follows from (5.2) and (5.4).
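The comparison between $F(\bar a(s))$ and $F(\bar a)$ can be illustrated numerically. The sketch below assumes $F=H+E$ with $H$ the entropy of the matrix entries and $E(a)=c\ln(1-2q^{1-k}+\|a\|_k^k)$, which is the form suggested by (5.6) and (5.11) (Lemma 3.3 itself is not reproduced in this section), and it takes $c=(q^{k-1}-1/2)\ln q-\ln 2$ as an illustrative density. The function names are hypothetical, and small $q$ only illustrates the ordering $F(\bar a(s))<F(\bar a)$, not the asymptotic slack $q^{0.999-k}$ in (5.1).

```python
import math


def entropy(entries):
    """Shannon entropy -sum x ln x over the nonzero entries."""
    return -sum(x * math.log(x) for x in entries if x > 0)


def F(entries, q, k, c):
    """Assumed form F = H + E with E(a) = c ln(1 - 2 q^(1-k) + ||a||_k^k)."""
    norm_kk = sum(x ** k for x in entries)
    return entropy(entries) + c * math.log(1.0 - 2.0 * float(q) ** (1 - k) + norm_kk)


def a_bar_s(q, s):
    """Nonzero entries of a-bar(s): s diagonal entries 1/q and a
    (q-s) x (q-s) block with entries 1/(q(q-s)); s = 0 gives a-bar."""
    return [1.0 / q] * s + [1.0 / (q * (q - s))] * ((q - s) ** 2)


def illustrative_c(q, k):
    """Illustrative density (an assumption): (q^(k-1) - 1/2) ln q - ln 2."""
    return (q ** (k - 1) - 0.5) * math.log(q) - math.log(2)


if __name__ == "__main__":
    q, k = 10, 3
    c = illustrative_c(q, k)
    print("F(a_bar) =", F(a_bar_s(q, 0), q, k, c))
    for s in range(1, q):
        print(s, F(a_bar_s(q, s), q, k, c))
```

With these choices $F(\bar a)$ is close to $2\ln 2/q^{k-1}$, in line with (5.2), while the values at $\bar a(s)$ are dominated by the negative term $(1-s/q)\ln(1-s/q)$ for intermediate $s$.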

5.2 The Laplace method: proof of Lemma 5.1

We seek to show that there exists some positive constant $C(q)$ such that
\[\mathbb{E}[Z^2_{q,\mathrm{tame}}]\le C(q)\cdot\mathbb{E}[Z_{q,\mathrm{bal}}]^2.\tag{5.5}\]
The expected value of $Z^2_{q,\mathrm{tame}}$ can be written as a sum over pairs of tame colourings. We will break this sum into several components and deal with each separately. First we estimate the contribution resulting from $a$ near $\bar a$ by Taylor-expanding $F$ around $\bar a$.

Lemma 5.3. There exist $C(q)$ and $\eta(q)$ such that with $\mathcal{E}=\{a\in R\cap D_{\mathrm{tame}}:\|a-\bar a\|_2<\eta(q)\}$ we have
\[\mathbb{E}[Z^2_{q,\mathrm{tame}}\cdot\mathbf{1}_{\mathcal{E}}]\le C(q)\cdot\mathbb{E}[Z_{q,\mathrm{bal}}]^2.\]

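Proof. We may parametrise $R\cap D_{\mathrm{tame}}$ as follows: disregard the $(q,q)$ entry and consider each matrix $a$ as a $q^2-1$ dimensional vector. Let
\[\mathcal{L}:[0,1/q]^{q^2-1}\to[0,1/q]^{q^2},\qquad a_{ij}\mapsto\begin{cases}a_{ij}&\text{if }(i,j)\ne(q,q),\\1-\sum_{(i,j)\ne(q,q)}a_{ij}&\text{otherwise.}\end{cases}\]
We compute the Hessian of $F\circ\mathcal{L}=H\circ\mathcal{L}+E\circ\mathcal{L}$. For $(i,j)\ne(a,b)$ we have
\[\frac{\partial}{\partial a_{ij}}\big(H\circ\mathcal{L}(a)\big)\Big|_{a=\bar a}=0,\qquad\frac{\partial^2}{\partial a_{ij}^2}\big(H\circ\mathcal{L}(a)\big)\Big|_{a=\bar a}=-2q^2,\qquad\frac{\partial^2}{\partial a_{ij}\partial a_{ab}}\big(H\circ\mathcal{L}(a)\big)\Big|_{a=\bar a}=-q^2.\]
Further
\[\frac{\partial}{\partial a_{ij}}\|\mathcal{L}(a)\|_k^k\Big|_{a=\bar a}=0,\qquad\frac{\partial^2}{\partial a_{ij}^2}\|\mathcal{L}(a)\|_k^k\Big|_{a=\bar a}=\frac{2k(k-1)}{q^{2k-4}},\qquad\frac{\partial^2}{\partial a_{ij}\partial a_{ab}}\|\mathcal{L}(a)\|_k^k\Big|_{a=\bar a}=\frac{k(k-1)}{q^{2k-4}},\]
and so
\[\frac{\partial}{\partial a_{ij}}\big(E\circ\mathcal{L}(a)\big)\Big|_{a=\bar a}=0,\qquad\frac{\partial^2}{\partial a_{ij}^2}\big(E\circ\mathcal{L}(a)\big)\Big|_{a=\bar a}=\frac{2ck(k-1)}{q^{2k-4}(1-q^{1-k})^2},\qquad\frac{\partial^2}{\partial a_{ij}\partial a_{ab}}\big(E\circ\mathcal{L}(a)\big)\Big|_{a=\bar a}=\frac{ck(k-1)}{q^{2k-4}(1-q^{1-k})^2}.\]
Thus, we have that the first derivative of $F\circ\mathcal{L}$ vanishes at $\bar a$, and that the Hessian is
\[D^2\big(F\circ\mathcal{L}(a)\big)\Big|_{a=\bar a}=-q^2\left(1-\frac{2ck(k-1)}{q^{2(k-1)}(1-q^{1-k})^2}\right)(\mathrm{id}+\mathbf{1})\]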
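The second derivatives of $H\circ\mathcal{L}$ and $\|\mathcal{L}(\cdot)\|_k^k$ at the uniform matrix stated in this proof can be cross-checked with central finite differences. The sketch below is only a numerical sanity check under the assumption that $H$ is the entropy $-\sum x\ln x$ of all $q^2$ entries and that the $(q,q)$ entry is eliminated as $1-\sum(\text{other entries})$; the helper names, the step size and the choice $q=k=3$ are illustrative.

```python
import math


def with_last(free):
    """Full list of q*q entries: the free ones plus the eliminated (q,q) entry."""
    return list(free) + [1.0 - sum(free)]


def H_of_L(free):
    """Entropy of the matrix parametrised by its first q*q - 1 entries."""
    return -sum(x * math.log(x) for x in with_last(free))


def norm_of_L(free, k):
    """k-th power of the k-norm of the parametrised matrix."""
    return sum(x ** k for x in with_last(free))


def second_difference(f, x, i, j, h):
    """Central finite-difference estimate of d^2 f / (dx_i dx_j) at x."""
    def shifted(di, dj):
        y = list(x)
        y[i] += di
        y[j] += dj
        return f(y)
    if i == j:
        return (shifted(h, 0.0) - 2.0 * f(x) + shifted(-h, 0.0)) / (h * h)
    return (shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)


if __name__ == "__main__":
    q, k, h = 3, 3, 1e-4
    x = [1.0 / q ** 2] * (q * q - 1)  # free coordinates of the uniform matrix
    print(second_difference(H_of_L, x, 0, 0, h), -2 * q * q)
    print(second_difference(H_of_L, x, 0, 1, h), -q * q)
    print(second_difference(lambda y: norm_of_L(y, k), x, 0, 0, h),
          2 * k * (k - 1) / q ** (2 * k - 4))
```

The off-diagonal entries are nonzero only because every free coordinate also moves the eliminated $(q,q)$ entry, which is what produces the all-ones matrix in the Hessian.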

where $\mathbf{1}$ is the matrix with all entries equal to one, and $\mathrm{id}$ is the identity matrix. As $\mathrm{id}$ is positive definite, $\mathbf{1}$ is positive semidefinite and $c<q^{k-1}\ln q$, we have that the Hessian is negative definite at $\bar a$. Further, it follows from continuity that there exist some $\tilde\eta,\tilde\xi$ independent of $n$ such that the largest eigenvalue of $D^2\big(F\circ\mathcal{L}\big)$ is smaller than $-\tilde\xi$ for all points $\|a-\bar a\|_2<\tilde\eta$. The linearity of $\mathcal{L}$ implies that there is an $n$-independent $\eta$ such that for all $a$ such that $\|a-\bar a\|_2<\eta$ we have $\|\mathcal{L}^{-1}(a)-\bar a\|_2<\tilde\eta$. Taylor's theorem then implies that there is some $n$-independent $\xi$ such that
\[F\circ\mathcal{L}(a)\le F(\bar a)-\xi\sum_{(i,j)\ne(q,q)}(a_{ij}-1/q)^2\qquad\text{for all }a:\|a-\bar a\|_2<\eta.\]

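Finally then,
\begin{align*}
\mathbb{E}[Z^2_{q,\mathrm{tame}}\cdot\mathbf{1}_{\mathcal{E}}]&=\exp\{nF(\bar a)\}\cdot O(n^{(1-q^2)/2})\cdot\sum_{a\in\mathcal{E}}\exp\Big\{-n\xi\sum_{(i,j)\ne(q,q)}(a_{ij}-1/q)^2\Big\}\\
&\le\exp\{nF(\bar a)\}\cdot O(n^{(1-q^2)/2})\cdot\int_{\mathbb{R}^{q^2-1}}\exp\Big\{-\xi\sum_{(i,j)\ne(q,q)}(z_{ij}-1/q)^2\Big\}\,dz_{ij}\\
&\le\exp\{nF(\bar a)\}\cdot O(n^{(1-q^2)/2})\cdot\left[\int_{-\infty}^{\infty}\exp\big\{-\xi z_{ij}^2\big\}\,dz_{ij}\right]^{q^2-1}=C(q)\cdot\mathbb{E}[Z_{q,\mathrm{bal}}]^2.
\end{align*}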

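There are two remaining cases to consider, namely $a\in D_{[q-1]}\setminus\mathcal{E}$ and $a\in D_q$. We begin with the latter.

Lemma 5.4. There exists a constant $C(q)>0$ such that
\[\mathbb{E}[Z^2_{q,\mathrm{tame}}\cdot\mathbf{1}_{D_q}]\le C(q)\cdot\mathbb{E}[Z_{q,\mathrm{bal}}]^2.\]

Proof. We calculate
\begin{align*}
\mathbb{E}[Z^2_{q,\mathrm{tame}}\cdot\mathbf{1}_{D_q}]&=\sum_{a(\sigma,\tau)\in D_q}\mathbb{P}\big[\sigma,\tau\text{ are tame}\big]=q!\sum_{\sigma,\tau\in D_q}\mathbb{P}\big[\sigma,\tau\text{ are tame and }\tau\in\mathcal{C}(\mathcal{G},\sigma)\big]\\
&=q!\sum_{\sigma,\tau\in D_q}\mathbb{P}\big[\tau\in\mathcal{C}(\mathcal{G},\sigma)\,\big|\,\sigma,\tau\text{ are tame}\big]\cdot\mathbb{P}\big[\sigma,\tau\text{ are tame}\big]\\
&=q!\sum_{\sigma\in D_q}\mathbb{E}\big[|\mathcal{C}(\mathcal{G},\sigma)|\,\big|\,\sigma\text{ is tame}\big]\cdot\mathbb{P}\big[\sigma\text{ is tame}\big]\\
&\le q!\cdot\mathbb{E}[Z_{q,\mathrm{bal}}]\sum_{\sigma\in D_q}\mathbb{P}\big[\sigma\text{ is tame}\big]\qquad[\text{by T3}]\\
&=q!\cdot\mathbb{E}[Z_{q,\mathrm{bal}}]^2,
\end{align*}
as desired.

Lemma 5.5. We have that
\[\mathbb{E}[Z^2_{q,\mathrm{tame}}\cdot\mathbf{1}_{D_{[q-1]}\setminus\mathcal{E}}]\le\mathbb{E}[Z_{q,\mathrm{bal}}]^2.\]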

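Proof. We take $\eta$ as in Lemma 5.3 and set
\[\mathcal{E}'=\{a\in R\cap D_{\mathrm{tame}}:\|a-\bar a\|_2\ge\eta\}.\]
As $\mathcal{E}'$ is compact, the fact that $F(a)<F(\bar a)$ for all $a\in\mathcal{E}'$ additionally implies that there exists some $\gamma$ such that $\max_{a\in\mathcal{E}'}F(a)<F(\bar a)-\gamma$. Then
\begin{align*}
\mathbb{E}[Z^2_{q,\mathrm{tame}}\cdot\mathbf{1}_{D_{[q-1]}\setminus\mathcal{E}}]&\le|\mathcal{E}'|\exp\big\{n(F(\bar a)-\gamma)\big\}\le n^{q^2}\exp\big\{n(F(\bar a)-\gamma)\big\}\\
&\le\exp\big\{n(F(\bar a)-\gamma/2)\big\}\le\mathbb{E}[Z_{q,\mathrm{bal}}]^2\cdot\exp\{-\gamma n/3\}\le\mathbb{E}[Z_{q,\mathrm{bal}}]^2,
\end{align*}
as desired.

Finally, (5.5) follows from combining Lemmas 5.3–5.5.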

5.3 The Maximisation Problem: proof of Lemma 5.2

5.3.1 The strategy

The proof is based on the local variation technique developed in [9]. Roughly speaking, for each $0<s<q$ we will argue that for any arbitrary $a\in D_s$, we can move slightly toward a nicer matrix while increasing $F$. The new matrix that we produce is then regular enough that we may perform calculations and compare it to the point $\bar a(s)$ whose first $s$ diagonal entries are $1/q$, and whose $(i,j)$-entries are equal to $(q(q-s))^{-1}$ for $i,j>s$. As it turns out, $\bar a(s)$ comes close enough to maximising $F$ over $D_s$ (up to a negligible error term in each case). The final step is then to show that these points are still bounded above by the barycentre of $D$ (i.e. by $\bar a$).

Let us take a moment to collect some results that will be used throughout the remainder of this section. In particular, it may come as no surprise that in a local variations argument we make extensive use of derivatives. Taking partials of $F$ we have
\[\left(\frac{\partial}{\partial a_{ix}}-\frac{\partial}{\partial a_{iy}}\right)F(a)=\ln\frac{a_{iy}}{a_{ix}}+\frac{ck\big(a_{ix}^{k-1}-a_{iy}^{k-1}\big)}{1-2/q^{k-1}+\|a\|_k^k},\qquad i,x,y\le q.\tag{5.6}\]

This represents the change in $F$ when we increase $a_{ix}$ at the expense of $a_{iy}$ (see Lemma 5.7, which describes when the above quantity is positive). Further, we will often tackle the changes in entropy and energy separately. Further, we need the following elementary inequalities (cf. [9, Corollary 4.10]).

Fact 5.6. Let $qa\in[0,1]^q$ be such that $\sum_{i=1}^qa_i=1/q$, and define
\[h:[0,1]\to\mathbb{R},\qquad z\mapsto-z\ln z-(1-z)\ln(1-z).\]
Then

(i) for $J\subseteq[q]$ and $r=\sum_{i\in J}qa_i$ we have $H(qa)\le h(r)+r\ln|J|+(1-r)\ln(q-|J|)$, and

(ii) for $J\subseteq\{2,\ldots,q\}$ with $0<|J|<q-1$ and $r=\sum_{i\in J}qa_i$, if $qa_1<1$ then
\[H(qa)\le h(qa_1)+(1-qa_1)h(r/(1-qa_1))+r\ln|J|+(1-r-qa_1)\ln(q-|J|-1).\]
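Part (i) of Fact 5.6 is the usual grouping bound for entropy: splitting the mass of a probability vector into the part on $J$ and the part off $J$ costs $h(r)$, and within each part the entropy is at most the logarithm of the number of coordinates. The sketch below checks this on a small example, writing $p$ for the probability vector $qa$; the function names and the test vectors are illustrative assumptions.

```python
import math


def shannon(p):
    """Shannon entropy -sum p_i ln p_i of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


def h(z):
    """Binary entropy h(z) = -z ln z - (1 - z) ln(1 - z), with h(0) = h(1) = 0."""
    if z <= 0.0 or z >= 1.0:
        return 0.0
    return -z * math.log(z) - (1.0 - z) * math.log(1.0 - z)


def grouping_bound(p, J):
    """Right-hand side of Fact 5.6 (i) for p = q*a and an index set J
    with 0 < |J| < len(p)."""
    r = sum(p[i] for i in J)
    return h(r) + r * math.log(len(J)) + (1.0 - r) * math.log(len(p) - len(J))


if __name__ == "__main__":
    p = [0.5, 0.2, 0.1, 0.1, 0.05, 0.05]
    print(shannon(p), grouping_bound(p, [0, 1]))
    flat = [0.35, 0.35, 0.075, 0.075, 0.075, 0.075]
    print(shannon(flat), grouping_bound(flat, [0, 1]))
```

Equality holds exactly when $p$ is uniform on $J$ and uniform on its complement, which is the situation produced by the flattening steps later in this section.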

The following lemma is the main tool to carry out the local variations argument. Recall that $S$ is the set of all matrices $a=(a_{ij})_{i,j\in[q]}$ with entries $a_{ij}\ge 0$ such that $\sum_ja_{ij}=1/q$ for all $i$.

Lemma 5.7. Suppose $a\in S$. If $i\in[q]$ and $\emptyset\ne J\subseteq[q]$ are such that for some number $3\ln\ln q/\ln q\le\mu\le 1$ we have
\[|J|\ge q^\mu\qquad\text{and}\qquad\max_{j\in J}a_{ij}^{k-1}<\frac{0.995}{kq^{k-1}}\big(\mu-\ln\ln q/\ln q\big),\]
then the matrix $\tilde a\in S$ obtained from $a$ by setting
\[\tilde a_{xy}=\mathbf{1}\{(x,y)\notin\{i\}\times J\}\,a_{xy}+\frac{\mathbf{1}\{(x,y)\in\{i\}\times J\}}{|J|}\sum_{j\in J}a_{ij}\]
is such that $F(a)\le F(\tilde a)$. In fact, the inequality is strict unless $a=\tilde a$.

Proof. Take $i\in[q]$, $J\subset[q]$ as described and $x,y\in J$ such that
\[a_{ix}^{k-1}=\min_{j\in J}a_{ij}^{k-1}<a_{iy}^{k-1}<\frac{0.995}{kq^{k-1}}\big(\mu-\ln\ln q/\ln q\big).\tag{5.7}\]
We will show that (5.6) is positive for the range of $a_{ix}$ and $a_{iy}$ that we have at hand. It will be convenient to make the substitution $\delta_{xy}=a_{iy}^{k-1}-a_{ix}^{k-1}>0$ and instead consider whether
\begin{align*}
(k-1)\left(\frac{\partial}{\partial a_{ix}}-\frac{\partial}{\partial a_{iy}}\right)F(a)&=\ln\left(\left(\frac{a_{iy}}{a_{ix}}\right)^{k-1}\right)-\frac{ck(k-1)\big(a_{iy}^{k-1}-a_{ix}^{k-1}\big)}{1-2/q^{k-1}+\|a\|_k^k}\\
&=\ln\left(1+\frac{\delta_{xy}}{a_{ix}^{k-1}}\right)-\frac{ck(k-1)\delta_{xy}}{1-2/q^{k-1}+\|a\|_k^k}=:\Delta(\delta_{xy})>0.\tag{5.8}
\end{align*}
After noting that $\Delta(0)=0$, it follows from the concavity of $\Delta$ that if $\delta^*>0$ satisfies (5.8) then so does $\delta_{xy}$ for all $0<\delta_{xy}<\delta^*$. Therefore we take
\[\delta^*=\frac{0.999}{kq^{k-1}}\big(\mu-\ln\ln q/\ln q\big)>\max_{x,y\in J}\delta_{xy}=\max_{x,y\in J}\big|a_{iy}^{k-1}-a_{ix}^{k-1}\big|,\]
and observe that, as $a_{ix}\le\frac{1}{|J|}\sum_{j\in J}a_{ij}\le\frac{1}{q|J|}$, we have after taking the exponential of (5.8) that
\begin{align*}
\exp\left\{\frac{ck(k-1)\delta^*}{1-2/q^{k-1}+\|a\|_k^k}\right\}&<\exp\big\{(k-1)\ln q\big(\mu-\ln\ln q/\ln q\big)\big\}=\big(q^\mu/\ln q\big)^{k-1}\le\big(|J|/\ln q\big)^{k-1}\\
&\le\big(qa_{ix}\ln q\big)^{1-k}\le 1+\frac{1.99\ln\ln q}{kq^{k-1}a_{ix}^{k-1}\ln q}\\
&\le 1+\frac{0.995}{kq^{k-1}}\big(\mu-\ln\ln q/\ln q\big)\cdot\frac{1}{a_{ix}^{k-1}}<1+\delta^*/a_{ix}^{k-1}.
\end{align*}

In other words, if we take a row $i$ and a set $J$ of not too few columns such that the largest entry $a_{ij}$, $j\in J$, is not too big, then the function value does not drop if we replace all entries $a_{ij}$, $j\in J$, by their average. Thus, Lemma 5.7 can be used to "flatten" parts of the matrix $a$ without reducing the function value. The proof of Lemma 5.7 is merely by analysing signs of $\partial F/\partial a_{ij}-\partial F/\partial a_{ij'}$ for $j,j'\in J$.

In what follows we will use Lemma 5.7 to show that Lemma 5.2 holds for each $0\le s<q$ separately. Formally, we set out to show that:

Claim 5.8. Suppose that $s=0$. Then for all $a\in D_{0,\mathrm{tame}}\setminus\{\bar a\}$ we have $F(a)<F(\bar a)$.

Claim 5.9. Suppose that $1\le s\le q^{0.999}$. Then for all $a\in D_{s,\mathrm{tame}}$ we have $F(a)<F(\bar a)$.

Claim 5.10. Suppose that $q^{0.999}<s<q-q^{0.49}$. Then for all $a\in D_{s,\mathrm{tame}}$ we have $F(a)<F(\bar a)$.

Claim 5.11. Suppose that $q-q^{0.49}\le s<q$. Then for all $a\in D_{s,\mathrm{tame}}$ we have $F(a)<F(\bar a)$.

Lemma 5.2 is immediate from Claims 5.8–5.11.

5.3.2 Proof of Claim 5.8

We begin with the following consequence of Lemma 5.7.

Claim 5.12. Suppose that $a\in S$ has an entry $a_{ij}\in[1.02/(qk),q^{-1}(1.01/k)^{1/(k-1)}]$. Then the matrix $a'\in S$ with entries $a'_{xy}=\mathbf{1}\{x\ne i\}a_{xy}+\mathbf{1}\{x=i\}q^{-2}$ $(x,y\in[q])$ satisfies $F(a')>F(a)$.

Proof. Without loss of generality we may assume that $a$ maximises $F(a)$ over the set $a\in S$ with respect to $a_{11}\in[1.02/(qk),q^{-1}(1.01/k)^{1/(k-1)}]$. If we apply Lemma 5.7 to the set $J=[q]\setminus\{1\}$ with $\mu=\ln(q-1)/\ln q$ then the maximality of $F(a)$ implies that $a_{1j}=(1-qa_{11})/(q(q-1))$ for $j\ge 2$. Because $a'$ is obtained from $a$ by replacing the first row by $(q^{-2},\ldots,q^{-2})$, the change in entropy comes to
\[H(a')-H(a)=q^{-1}\big(\ln q-H(qa_1)\big)\ge q^{-1}\big(\ln q-\ln 2-(1-1.02/k)\ln q\big)\ge q^{-1}\big((1.02\ln q)/k-\ln 2\big).\tag{5.9}\]
Furthermore,
\[\|a\|_k^k-\|a'\|_k^k=a_{11}^k-q^{1-2k}+(q-1)\left[\frac{1-qa_{11}}{q(q-1)}\right]^k\le q^{-k}(1.01/k)^{k/(k-1)}+4q^{1-2k}.\tag{5.10}\]
The derivative of the function $E$ from Lemma 3.3 satisfies
\[\frac{\partial E(a)}{\partial\|a\|_k^k}=\frac{c}{1-2q^{1-k}+\|a\|_k^k}\le 1.001q^{k-1}\ln q.\tag{5.11}\]

Hence, (5.10) implies that $E(a)-E(a')\le 1.02k^{-k/(k-1)}q^{-1}\ln q$. Combining this bound with (5.9) and assuming that $q\ge q_0$ for a large enough constant $q_0$, we find $F(a')-F(a)=H(a')-H(a)+E(a')-E(a)>0$.

Proof of Claim 5.8. The set $D_{0,\mathrm{tame}}$ is compact. Therefore, the continuous function $F$ attains a maximum at some point $a\in D_{0,\mathrm{tame}}$. Assume for contradiction that $a\ne\bar a$. Then we will construct a sequence of matrices $a[i]$, $i\in[q]$, such that $a[0]=a$, $a[q]=\bar a$, with $F(a[i+1])\ge F(a[i])$ for all $i<q$ and $F(a[0])\ne F(a[q])$, clearly arriving at a contradiction to the maximality of $F(a)$. Specifically, let $a[0]=a$ and obtain $a[i]$ from $a[i-1]$ by letting
\[a_{xy}[i]=\mathbf{1}\{x\ne i\}a_{xy}[i-1]+\mathbf{1}\{x=i\}q^{-2}\qquad\text{for }i,x,y\in[q].\]
This construction ensures that $a[q]=\bar a$. To show that $F(a[i+1])\ge F(a[i])$ we consider two cases.

Case 1: $\max_{j\in[q]}a_{ij}\le 1.02/(qk)$. We apply Lemma 5.7 with $J=[q]$ and $\mu=1$. Since $a_{ij}\le 1.02/(qk)$, the assumption (5.7) is satisfied. Consequently, $F(a[i])\ge F(a[i-1])$, with equality iff $a[i]=a[i-1]$.

Case 2: $\max_{j\in[q]}a_{ij}>1.02/(qk)$. Claim 5.12 shows that $F(a[i])>F(a[i-1])$.

Finally, since $a\ne\bar a$ we have $a[i]\ne a[i-1]$ for some $i\in[q]$, whence $F(\bar a)=F(a[q])>F(a[0])=F(a)$. Note that although we may temporarily leave $D_{0,\mathrm{tame}}$ during this process, we are guaranteed to return to $\bar a\in D_{0,\mathrm{tame}}$.

5.3.3 Proof of Claim 5.9

The strategy of this proof is to compare an arbitrary element of $D_s$ to a matrix that is more evenly distributed (using Lemma 5.7), which we then compare to the barycentre of the face of $D$ (i.e. $\bar a(s)$), which we finally compare to $\bar a$. Let $1\le s\le q^{0.999}$ and take $a\in D_s$. It follows from Claim 5.12 and the separability results that we may assume $qa_{ii}\ge 1-\kappa$ for $i\le s$ with $\kappa=\ln^{20}q/q^{k-1}$, and further, that we may also assume $qa_{ij}<1.02/k$ for all $i\ne j\le s$ and $s<i,j\le q$. Let $q\hat a$ be the singly-stochastic matrix with entries
\[\hat a_{ij}=\begin{cases}a_{ij}&\text{if }i\in[q],\ j\le s,\\\frac{1}{q-s}\sum_{\ell>s}a_{i\ell}&\text{if }i\in[q],\ j>s.\end{cases}\]
Since $q-s=q(1-o_q(1))$ we may apply Lemma 5.7 to $J=[q]\setminus[s]$ for any $i\in[q]$. It follows that $F(a)\le F(\hat a)$. We will now compare $F(\hat a)$ and $F(\bar a(s))$. To this end we must first estimate $F(\hat a)$. We start with the entropy term. As $q\hat a$ is stochastic and $q\hat a_{ii}\ge 1-\kappa$ for $i\le s$, we find that
\[r_i=\sum_{j\ne i}q\hat a_{ij}=1-qa_{ii}\le\kappa\qquad\text{for }i\le s.\]

Further, if we set $r_i=q\sum_{j=1}^s\hat a_{ij}$ for $i>s$ then it follows from the fact that $qa$ is doubly-stochastic that
\[\sum_{i>s}r_i=q\sum_{i>s}\sum_{j=1}^s\hat a_{ij}=q\sum_{i>s}\sum_{j=1}^sa_{ij}\le\kappa s.\]
We know from Fact 5.6 that
\[H(q\hat a_i)\le h(r_i)+r_i\ln(q-1)\le h(\kappa)+\kappa\ln q\qquad\text{for }i\le s,\]
and
\[H(q\hat a_i)\le h(r_i)+r_i\ln s+(1-r_i)\ln(q-s)\le h(r_i)+r_i\ln s+\ln(q-s)\qquad\text{for }i>s.\]
Since $h$ is concave, it follows that
\[\sum_{i>s}H(q\hat a_i)\le(q-s)\ln(q-s)+\sum_{i>s}\big(h(r_i)+r_i\ln s\big)\le(q-s)\ln(q-s)+qh\left(\frac{\kappa s}{q}\right)+\kappa s\ln s.\]
Therefore
\begin{align*}
H(\hat a)=\ln q+\frac{1}{q}\sum_{i=1}^qH(q\hat a_i)&\le\ln q+\frac{s}{q}\big(h(\kappa)+\kappa\ln q\big)+\frac{q-s}{q}\ln(q-s)+h\left(\frac{\kappa s}{q}\right)+\frac{\kappa s}{q}\ln s\\
&\le\ln q+\frac{q-s}{q}\ln(q-s)+o_q(q^{1-k})\qquad[\text{as }s\le q^{0.999}\text{ and }h(\kappa s/q)=\tilde O_q(q^{1-k})]\\
&=H(\bar a(s))+o_q(q^{1-k})\qquad[\text{by (5.3)}].\tag{5.12}
\end{align*}

Next we deal with estimation of the energy term. It will be convenient to break down the problem as follows:
\[\|\hat a\|_k^k=\sum_{i\le s}\sum_{j\le q}\hat a_{ij}^k+\sum_{i>s}\sum_{j>s}\hat a_{ij}^k+\sum_{i>s}\sum_{j\le s}\hat a_{ij}^k.\tag{5.13}\]
As the $k$-norm is maximised when summands are as unequal as possible, we have $\|\hat a_i\|_k^k\le q^{-k}$ for $i\le s$. Further, by the same logic we have
\[\sum_{i>s}\sum_{j>s}\hat a_{ij}^k=(q-s)^2\left(\frac{1}{q-s}\sum_{l>s}a_{il}\right)^k\le(q-s)^{2-k}q^{-k},\]
and
\[\sum_{i>s}\sum_{j\le s}\hat a_{ij}^k\le\left(\sum_{i>s}\sum_{j\le s}\hat a_{ij}\right)^k\le\left(\frac{\kappa s}{q}\right)^k.\]
As $s/q\le q^{-0.001}$ we know
\[\sum_{i>s}\sum_{j\le s}\hat a_{ij}^k\le\left(\frac{\kappa s}{q}\right)^k\le q^{k(1-k)}q^{-\gamma},\]
for some $\gamma>0$. If we combine the above results then we have shown that
\[\|\hat a\|_k^k\le sq^{-k}+(q-s)^{2-k}q^{-k}+q^{k(1-k)}q^{-\gamma}=\|\bar a(s)\|_k^k+q^{k(1-k)}q^{-\gamma},\]
and so
\[E(\hat a)-E(\bar a(s))\le\frac{\partial E(a)}{\partial\|a\|_k^k}\big(\|\hat a\|_k^k-\|\bar a(s)\|_k^k\big)\le q^{-(1-k)^2}q^{-\gamma}\ln q\,(1+o_q(1/q))=o_q(q^{1-k}).\tag{5.14}\]

Therefore it follows from (5.12) and (5.14) that $F(a)\le F(\hat a)\le F(\bar a(s))+o_q(q^{1-k})$. Recalling that $s/q\le q^{-0.001}$, it follows from (5.4) that
\begin{align*}
F(a)\le F(\hat a)&\le F(\bar a(s))+o_q(q^{1-k})\\
&=(1-s/q)\ln(1-s/q)+\frac{2\ln 2}{q^{k-1}}-\frac{s\ln 2}{q^k}-\frac{s\ln q}{2q^k}+\frac{\ln q}{q^{k-1}}+\frac{\ln q}{q^{k-1}(1-s/q)^{k-2}}\\
&\qquad-\frac{q^{k-1}\ln q}{2}\left[\left(\frac{s-2q}{q^k}+\frac{(q-s)^2}{q^k(q-s)^k}\right)^2\right]+o_q(q^{1-k})\\
&=(1-s/q)\ln(1-s/q)+\frac{2\ln 2}{q^{k-1}}+\frac{\ln q}{q^{k-1}}+\frac{\ln q}{q^{k-1}(1-s/q)^{k-2}}-\frac{q^{k-1}\ln q}{2}\left(\frac{s-2q}{q^k}\right)^2+o_q(q^{1-k})\\
&=(1-s/q)\ln(1-s/q)+\frac{2\ln 2}{q^{k-1}}+\frac{\ln q}{q^{k-1}}+\frac{\ln q}{q^{k-1}(1-s/q)^{k-2}}-\frac{2\ln q}{q^{k-1}}+o_q(q^{1-k})\\
&\le-\frac{s}{q}(1-s/q)+\frac{2\ln 2}{q^{k-1}}+o_q(q^{1-k})=F(\bar a)-\frac{s}{q}(1-s/q)+o_q(q^{1-k}).
\end{align*}

As $-\frac{s}{q}(1-s/q)$ is decreasing in $s$, we have shown that $F(a)<F(\bar a)-1/q+1/q^2+o_q(q^{1-k})$. This implies our original assertion.

5.3.4 Proof of Claim 5.10

Let $q^{0.999}<s<q-q^{0.49}$ and take $a\in D_s$. As before, we may assume $qa_{ii}\ge 1-\kappa$ for $i\le s$, and $qa_{ij}<1.02/k$ for all $i\ne j\le s$ and $s<i,j\le q$. Let $q\hat a$ be the singly-stochastic matrix with entries
\[\hat a_{ij}=\begin{cases}a_{ij}&\text{if }i=j\in[s],\\\frac{1}{s-1}\sum_{\ell\in[s]\setminus\{i\}}a_{i\ell}&\text{if }i,j\le s,\ i\ne j,\\\frac{1}{q-s}\sum_{\ell>s}a_{i\ell}&\text{if }j>s,\\\frac{1}{s}\sum_{\ell\le s}a_{i\ell}&\text{if }j\le s<i.\end{cases}\]

Since $s,q-s>q^{0.49}$ we may apply Lemma 5.7 to $J=[q]\setminus[s]$ and $J'=[s]\setminus\{i\}$ for any $i\in[q]$. It follows that $F(a)\le F(\hat a)$. To estimate $F(\hat a)$ we will now define
\[r_i=q\sum_{j>s}a_{ij}=q\sum_{j>s}\hat a_{ij}\quad\text{for }i\le s,\qquad\text{and}\qquad r_i=q\sum_{j\le s}a_{ij}=q\sum_{j\le s}\hat a_{ij}\quad\text{for }i>s,\]
and since $qa$ is doubly stochastic,
\[r=\sum_{i>s}r_i=\sum_{i\le s}r_i\le\sum_{i\le s}(1-qa_{ii})\le\kappa s.\]
Further, we also set
\[t_i=q\sum_{j\in[s]\setminus\{i\}}\hat a_{ij}=q\sum_{j\in[s]\setminus\{i\}}a_{ij}\le 1-qa_{ii}\le\kappa\qquad\text{for }i\le s.\tag{5.15}\]

As before we will now estimate the entropy and the energy term separately. We know from Fact 5.6 (ii) that
\begin{align*}
H(q\hat a_i)&\le h(qa_{ii})+(1-qa_{ii})h(t_i/(1-qa_{ii}))+t_i\ln(s-1)+(1-t_i-qa_{ii})\ln(q-s)\\
&\le h(qa_{ii})+(1-qa_{ii})h(r_i/(1-qa_{ii}))+t_i\ln s+r_i\ln(q-s)\\
&=-qa_{ii}\ln(qa_{ii})-t_i\ln t_i-r_i\ln r_i+t_i\ln s+r_i\ln(q-s)\\
&\le h(t_i)+t_i\ln s+h(r_i)+r_i\ln(q-s),\qquad i\le s,
\end{align*}
where the last line follows as the function $g:x\mapsto-(1-x)\ln(1-x)$ is decreasing with $g'(x)\le 1$ for small $x$. If we set $\tilde H=\frac{1}{q}\sum_{i\le s}(h(t_i)+t_i\ln s)$ then it follows from the concavity of $h$ that
\[\frac{1}{q}\sum_{i\le s}H(q\hat a_i)\le\tilde H+\frac{s}{q}h(r/s)+\frac{r}{q}\ln(q-s).\]

Furthermore, by Fact 5.6 (i) and the concavity of $h$, we have

$$
\frac{1}{q}\sum_{i>s} H(q\hat a_i) \le \frac{q-s}{q}\, h\big(r/(q-s)\big) + \frac{r}{q}\ln s + \frac{q-s-r}{q}\ln(q-s).
$$

Combining these results, it follows that

$$
H(\hat a) \le \ln q + \tilde H + \frac{s}{q}\, h(r/s) + \frac{r}{q}\ln(q-s) + \frac{q-s}{q}\, h\big(r/(q-s)\big) + \frac{r}{q}\ln s + \frac{q-s-r}{q}\ln(q-s).
$$
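The concavity step used above is an instance of Jensen's inequality: $\frac{1}{q}\sum_{i\le s} h(r_i) \le \frac{s}{q}\, h(r/s)$ with $r = \sum_{i\le s} r_i$. The following minimal Python sketch checks this numerically, assuming that $h$ denotes the binary entropy function in nats (consistent with the bound $h(z) \le \ln 2$ used later); the sample values of $r_i$, $q$ and $s$ are arbitrary and purely illustrative.

```python
import math
import random

def h(z):
    """Binary entropy in nats, with the convention h(0) = h(1) = 0."""
    if z <= 0.0 or z >= 1.0:
        return 0.0
    return -z * math.log(z) - (1 - z) * math.log(1 - z)

rng = random.Random(1)
q, s = 50, 30
# Hypothetical small values standing in for r_1, ..., r_s.
r_vals = [rng.uniform(0.0, 0.05) for _ in range(s)]
r = sum(r_vals)

lhs = sum(h(x) for x in r_vals) / q   # (1/q) * sum of h(r_i)
rhs = (s / q) * h(r / s)              # (s/q) * h(average of the r_i)
jensen_holds = lhs <= rhs + 1e-12
```

The same argument, applied to the $q-s$ rows with $i > s$, gives the term $\frac{q-s}{q}\, h\big(r/(q-s)\big)$.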

Further, as $h(x) \le x(1 - \ln x)$, we have

$$
\begin{aligned}
H(\hat a) - \tilde H &\le \ln q + \frac{r}{q}\big[2 - 2\ln r + 2\ln s + \ln(q-s)\big] + \frac{q-s}{q}\ln(q-s)\\
&\le \ln q + \frac{r}{q}\big(2 + 3\ln q\big) + \frac{q-s}{q}\ln(q-s) + O_q(1/q) \qquad [\text{as } -z\ln z \le 1 \text{ for } z \ge 0]\\
&= 2\ln q + \frac{r}{q}\big(2 + 3\ln q\big) + (1 - s/q)\ln(1 - s/q) - \frac{s\ln q}{q} + O_q(1/q).
\end{aligned}
\tag{5.16}
$$

Since $s < q$, we obtain

$$
\tilde H - \frac{2\ln q}{q}\sum_{i\le s} t_i = \frac{1}{q}\sum_{i\le s}\Big(h(t_i) + t_i(\ln s - 2\ln q)\Big) \le \frac{1}{q}\sum_{i\le s}\big(h(t_i) - t_i\ln q\big) \le \frac{1}{q},
\tag{5.17}
$$

where the last inequality follows from noting that $\max_{x\in[0,1]} h(x) - x\ln q \le 1/q$. Thus, by combining (5.16) and (5.17) we have shown

$$
H(\hat a) \le 2\ln q + \frac{r}{q}\big(2 + 3\ln q\big) + (1 - s/q)\ln(1 - s/q) - \frac{s\ln q}{q} + \frac{2\ln q}{q}\sum_{i\le s} t_i + O_q(1/q).
\tag{5.18}
$$
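The three elementary facts used in this entropy estimate can be sanity-checked numerically. The sketch below assumes $h$ is the binary entropy in nats and evaluates everything on a finite grid, so it illustrates the inequalities rather than proving them. The closed form $\ln(1 + 1/q)$ for the maximum of $h(x) - x\ln q$ is not taken from the text; it is what one obtains by setting the derivative $\ln\frac{1-x}{x} - \ln q$ to zero at $x = 1/(1+q)$.

```python
import math

def h(z):
    """Binary entropy in nats, with the convention h(0) = h(1) = 0."""
    if z <= 0.0 or z >= 1.0:
        return 0.0
    return -z * math.log(z) - (1 - z) * math.log(1 - z)

# Grid of points strictly inside (0, 1).
grid = [i / 10000 for i in range(1, 10000)]

# (1) h(x) <= x * (1 - ln x): the largest gap on the grid should not be positive.
gap_entropy = max(h(x) - x * (1 - math.log(x)) for x in grid)

# (2) -z ln z <= 1: on (0, 1) the maximum is 1/e, attained at z = 1/e.
max_zlogz = max(-z * math.log(z) for z in grid)

# (3) max over x of h(x) - x ln q, compared with ln(1 + 1/q) and with 1/q.
def max_h_minus_linear(q):
    return max(h(x) - x * math.log(q) for x in grid)

checks = {
    q: (max_h_minus_linear(q), math.log(1 + 1 / q), 1 / q)
    for q in (3, 10, 100, 1000)
}
```

Each entry of `checks` holds the grid maximum, the candidate closed form, and the bound $1/q$ used in (5.17), in that order.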

Next, we move on to estimating the energy term. As before we firstly estimate $\|a\|_k^k$, then we apply a bound for $\partial E/\partial \|a\|_k^k$ in order to approximate $E(\hat a) - E(\bar a(s))$. Firstly note from (5.15) that for $i \le s$, we have

$$
\hat a_{ii}^k \le q^{-k}(1 - t_i)^k = \frac{1}{q^k} - \frac{k t_i}{q^k} + o_q(q^{1-2k}),
$$

and

$$
\sum_{j\in[s]\setminus\{i\}} \hat a_{ij}^k = (s-1)\left(\frac{t_i/q}{s-1}\right)^k \le \frac{(\kappa/q)^k}{(s-1)^{k-1}} \le (\kappa/q)^k.
$$

Moreover, since $q\hat a$ is stochastic and $q\hat a_{ii} \ge 1 - \kappa$ if $i \le s$, we have

$$
\sum_{j\in[q]\setminus[s]} \hat a_{ij}^k \le (\kappa/q)^k \qquad\text{for } i \le s.
$$

Combining the above equations yields

$$
\sum_{i\le s} \|\hat a_i\|_k^k \le s q^{-k} - \frac{k}{q^k}\sum_{i\le s} t_i + o_q(q^{1-2k}).
$$

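Since $q a_{ii} \ge 1 - \kappa$ for $i \le s$ we have $q a_{ij} \le \kappa$ for $j \le s < i$. By construction, this implies that $q\hat a_{ij} \le \kappa$ for $j \le s < i$. Furthermore, we have that

$$
\sum_{i>s}\sum_{j\le s} \hat a_{ij}^k \le \frac{\kappa^k s}{q^k}
\qquad\text{and}\qquad
\sum_{i>s}\sum_{j>s} \hat a_{ij}^k = (q-s)^2\left(\frac{\sum_{j>s} a_{ij}}{q-s}\right)^k \le q^{-k}(q-s)^{2-k}.
$$

We have shown that

$$
\|\hat a\|_k^k \le s q^{-k} + q^{-k}(q-s)^{2-k} - \frac{k}{q^k}\sum_{i\le s} t_i + o_q(q^{1-2k}) = \|\bar a(s)\|_k^k - \frac{k}{q^k}\sum_{i\le s} t_i + o_q(q^{1-2k}),
$$

and so from (5.11) and (5.3), we have

$$
\begin{aligned}
E(\hat a) &= E(\bar a(s)) + \frac{\partial E(a)}{\partial \|a\|_k^k}\cdot\big(\|\hat a\|_k^k - \|\bar a(s)\|_k^k\big)\\
&\le E(\bar a(s)) - q^{k-1}\ln q\,\big(1 + o_q(1/q)\big)\cdot\left(\frac{k}{q^k}\sum_{i\le s} t_i + o_q(q^{1-2k})\right)\\
&\le E(\bar a(s)) - \frac{k\ln q}{q}\sum_{i\le s} t_i + o_q(q^{1-k})\\
&= -2\ln q + \frac{s}{q}\ln q - \frac{k\ln q}{q}\sum_{i\le s} t_i + O_q(1/q).
\end{aligned}
\tag{5.19}
$$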

Finally then, it follows from (5.18) and (5.19) that

$$
\begin{aligned}
F(a) \le F(\hat a) &\le \frac{r}{q}\big(2 + 3\ln q\big) + (1 - s/q)\ln(1 - s/q) + \frac{(2-k)\ln q}{q}\sum_{i\le s} t_i + O_q(1/q)\\
&= (1 - s/q)\ln(1 - s/q) + O_q(1/q) \le -\frac{s}{q}(1 - s/q) + O_q(1/q).
\end{aligned}
$$

Fortunately, our assumption $q^{0.999} < s < q - q^{0.49}$ ensures that $F(a) < 0 < F(\bar a)$.

5.3.5 Proof of Claim 5.11

Let $q - \sqrt{q} \le s \le q - 1$ and take $a \in D_s$. As before we may assume $q a_{ii} \ge 1 - \kappa$ for $i \in [s]$, and $q a_{ij} < 1.02/k$ for all $i \ne j \le s$ and $s < i, j \le q$. Let $r_i = q\sum_{j\ne i} a_{ij}$. As $qa$ is doubly-stochastic and $q a_{ii} \ge 1 - \kappa$ for $i \le s$, we have

$$
r = \sum_{i\le s} r_i = \sum_{i\le s}\sum_{j\ne i} q a_{ij} = \sum_{i\le s} (1 - q a_{ii}) \le \kappa s.
$$

Further, we let

$$
t_i = \sum_{j>s} q a_{ij}, \qquad\text{and}\qquad t = \sum_{i\le s} t_i.
$$

Since $qa$ is doubly-stochastic we have

$$
t = \sum_{i\le s}\sum_{j>s} q a_{ij} = \sum_{i>s}\sum_{j\le s} q a_{ij}. \tag{5.20}
$$
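Identity (5.20) holds because the first $s$ rows and the first $s$ columns of a doubly stochastic matrix both carry total mass $s$, and they share the top-left $s \times s$ block. A minimal Python sketch of this, using a random convex combination of permutation matrices as a stand-in for $qa$ (the helper `random_doubly_stochastic` and the sizes are hypothetical, chosen only for illustration):

```python
import random

def random_doubly_stochastic(q, terms, rng):
    """Convex combination of `terms` random permutation matrices."""
    weights = [rng.random() for _ in range(terms)]
    total = sum(weights)
    m = [[0.0] * q for _ in range(q)]
    for w in weights:
        perm = list(range(q))
        rng.shuffle(perm)
        for i, j in enumerate(perm):
            m[i][j] += w / total
    return m

rng = random.Random(2)
q, s = 9, 6
m = random_doubly_stochastic(q, terms=20, rng=rng)  # m plays the role of q*a

# Mass in the top-right block (rows <= s, columns > s) ...
t_top_right = sum(m[i][j] for i in range(s) for j in range(s, q))
# ... equals the mass in the bottom-left block (rows > s, columns <= s).
t_bottom_left = sum(m[i][j] for i in range(s, q) for j in range(s))
```

Since the last $q - s$ columns carry total mass $q - s$, the same picture also gives the bound $t \le q - s$ that is used in the entropy estimate.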

The strategy of this proof is to compare $F(a)$ to $F(q^{-1}\,\mathrm{id})$ where $\mathrm{id}$ is the $q \times q$ identity matrix. To this end we firstly estimate the entropy of $a$. We now set $H = \frac{1}{q}\sum_{i\le s} h(q a_{ii})$ and as before apply Fact 5.6 (ii) and the concavity of $h$ to observe that

$$
\begin{aligned}
\frac{1}{q}\sum_{i\le s} H(q a_i) &\le \frac{1}{q}\sum_{i\le s}\Big[h(q a_{ii}) + r_i\, h(t_i/r_i) + t_i\ln(q-s) + (r_i - t_i)\ln s\Big]\\
&\le H + \frac{r}{q}\, h(t/r) + \frac{t}{q}\ln(q-s) + \frac{r-t}{q}\ln s\\
&\le H + \frac{t}{q}\big(1 - \ln t + \ln r\big) + \frac{t}{q}\ln(q-s) + \frac{r-t}{q}\ln s \qquad [\text{as } h(z) \le z(1 - \ln z)].
\end{aligned}
$$

As $-z\ln z \le 1$ for $z > 0$, we have that $-t\ln t \le 1$. Furthermore, as $qa$ is doubly-stochastic we have that $t \le q - s$, and so

$$
\frac{t}{q}\big(1 - \ln t + \ln r\big) \le \frac{q-s}{q}\cdot\big(1 + \tilde O(1/q)\big).
$$

Therefore

$$
\frac{1}{q}\sum_{i\le s} H(q a_i) \le H + \frac{t}{q}\ln(q-s) + \frac{r-t}{q}\ln s + \frac{q-s}{q}\big(1 + \tilde O_q(1/q)\big). \tag{5.21}
$$

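We now move to estimating $H(q a_i)$ for $i > s$. As is by now routine, we apply Fact 5.6 (i) along with the concavity of $h$ and (5.20) to conclude that

$$
\begin{aligned}
\frac{1}{q}\sum_{i>s} H(q a_i) &\le \frac{1}{q}\sum_{i>s}\left[h\Big(\sum_{j\le s} q a_{ij}\Big) + \Big(\sum_{j\le s} q a_{ij}\Big)\ln(s) + \Big(1 - \sum_{j\le s} q a_{ij}\Big)\ln(q-s)\right]\\
&\le \frac{q-s}{q}\, h\!\left(\frac{t}{q-s}\right) + \frac{t}{q}\ln s + \frac{q-s-t}{q}\ln(q-s)\\
&\le \frac{q-s}{q}\ln 2 + \frac{t}{q}\ln s + \frac{q-s-t}{q}\ln(q-s) \qquad [\text{as } h(z) \le \ln 2 \text{ for all } z].
\end{aligned}
\tag{5.22}
$$

Finally then, we have from (5.21) and (5.22) that

$$
\begin{aligned}
H(a) &= \ln q + \frac{1}{q}\sum_{i\le q} H(q a_i)\\
&\le \ln q + H + \frac{r}{q}\ln s + \frac{q-s}{q}\ln 2 + \frac{q-s}{q}\ln(q-s) + \frac{q-s}{q}\big(1 + \tilde O(1/q)\big)\\
&\le \ln q + H + \frac{r}{q}\ln s + \frac{q-s}{q}\ln 2 + \frac{q-s}{2q}\ln q + \frac{q-s}{q}\big(1 + \tilde O(1/q)\big) \qquad [\text{as } q - s \le \sqrt{q}].
\end{aligned}
$$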

Moving on to the energy term, we firstly estimate $\|a\|_k^k$. As the norm is maximised when the summands are widely distributed, we have

$$
\sum_{i\le s} \|a_i\|_k^k \le \kappa^k s + \sum_{i\le s} a_{ii}^k = \sum_{i\le s} a_{ii}^k + o(1/q^{k+1}).
$$

A similar argument then applies to the remaining $q - s$ rows. Recalling Corollary 5.12, it follows that

$$
\sum_{i>s} \|a_i\|_k^k \le (q-s)\left(\frac{1.02}{qk}\right)^k.
$$

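Therefore,

$$
\|a\|_k^k \le (q-s)\left(\frac{1.02}{qk}\right)^k + \sum_{i\le s} a_{ii}^k + o(1/q^{k+1}),
$$

and so

$$
\|a\|_k^k - \big\|q^{-1}\,\mathrm{id}\big\|_k^k \le \frac{q-s}{q^k}\left[\left(\frac{1.02}{k}\right)^k - 1\right] + \frac{1}{q^k}\sum_{i\le s}\big[(q a_{ii})^k - 1\big] + o(1/q^{k+1}).
$$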

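Finally then, we have from (5.11) that

$$
\begin{aligned}
E(a) - E(q^{-1}\,\mathrm{id}) &= \frac{\partial E(a)}{\partial a}\Big(\|a\|_k^k - \big\|q^{-1}\,\mathrm{id}\big\|_k^k\Big)\\
&\le \ln q\cdot\frac{q-s}{q}\left[\left(\frac{1.02}{k}\right)^k - 1\right] + \frac{\ln q}{q}\sum_{i\le s}\big[(q a_{ii})^k - 1\big] + o_q(\ln q/q).
\end{aligned}
$$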

If we combine our estimates for the entropy and energy terms, we have shown that

$$
F(a) - F(q^{-1}\,\mathrm{id}) \le H + \frac{r}{q}\ln s + \frac{\ln q}{q}\sum_{i\le s}\big[(q a_{ii})^k - 1\big] + o_q(\ln q/q) + \ln q\cdot\frac{q-s}{q}\left[\left(\frac{1.02}{k}\right)^k + \frac{2 + \ln 2}{\ln q} - \frac{1}{2}\right].
$$

Since max0