

Hypercontractivity of spherical averages in Hamming space

arXiv:1309.3014v1 [math.PR] 12 Sep 2013

Yury Polyanskiy

Abstract: Consider a linear space of functions on the binary hypercube and a linear operator $T_\delta$ acting by averaging a function over a Hamming sphere of radius $\delta n$. It is shown that such an operator has a dimension-independent bound on the norm $L_p \to L_2$ with $p = 1 + (1-2\delta)^2$. This result evidently parallels a classical estimate of Bonami and Gross for $L_p \to L_q$ norms for the operator of convolution with a Bernoulli noise. The estimate for $T_\delta$ is harder to obtain since the latter is neither a part of a semigroup, nor a tensor power. The result is shown by a detailed study of the eigenvalues of $T_\delta$ and of the $L_p \to L_2$ norms of the Fourier multiplier operators $\Pi_a$ whose symbol is the characteristic function of the Hamming sphere of radius $a$. An application of the result to additive combinatorics is given: any set $A \subset \mathbb{F}_2^n$ with the property that $A + A$ contains a large portion of some Hamming sphere (counted with multiplicity) must have cardinality a constant multiple of $2^n$. It is also demonstrated that this result does not follow from standard spectral gap and semi-definite (Lovász-Delsarte) methods.

I. MAIN RESULT AND DISCUSSION

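Consider a linear space $L$ of functions $f : \mathbb{F}_2^n \to \mathbb{C}$ on the $n$-dimensional Hamming cube. We endow $L$ with the following norms and an inner product:
$$\|f\|_p \triangleq \mathbb{E}^{\frac1p}\left[|f(X)|^p\right], \qquad 1 \le p \le \infty, \tag{1}$$
$$(f, g) \triangleq \mathbb{E}\left[f(X)\bar g(X)\right], \tag{2}$$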

YP is with the Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA 02139 USA. e-mail: [email protected]. The research was supported by the NSF grant CCF-12-53205 and NSF Center for Science of Information (CSoI) under grant agreement CCF-09-39370. September 13, 2013

DRAFT


where $X$ is uniform on $\mathbb{F}_2^n$. For any linear operator $T : L \to L$ we define
$$\|T\|_{p\to q} \triangleq \sup_{f \in L}\frac{\|Tf\|_q}{\|f\|_p}.$$

For the operator
$$N_\delta f(x) \triangleq \mathbb{E}[f(x + Z)], \qquad Z \sim \text{i.i.d. Bern}(\delta),\ x, Z \in \mathbb{F}_2^n,\ 0 \le \delta \le 1, \tag{3}$$
the so-called “hypercontractive” inequality was established by Bonami and Gross [1], [2]:
$$\|N_\delta\|_{p\to q} = 1, \qquad p - 1 \ge (q-1)(1-2\delta)^2,\ p, q \ge 1. \tag{4}$$

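In this paper we analyze the $L_p \to L_2$ norm of an operator $T_\delta$ of averaging over a Hamming sphere $S_{\delta n}$. Specifically, for $x = (x_1, \ldots, x_n) \in \mathbb{F}_2^n$ denote the Hamming weight of $x$ and the Hamming sphere centered at zero as
$$|x| \triangleq |\{j : x_j = 1\}|, \qquad S_j \triangleq \{x : |x| = j\}. \tag{5}$$
The operator $T_\delta$ is defined as follows: for $\delta < 1/2$
$$T_\delta f(x) \triangleq \binom{n}{\lceil\delta n\rceil}^{-1}\sum_{y \in \mathbb{F}_2^n,\ |y| = \lceil\delta n\rceil} f(x+y) \tag{6}$$
and for $\delta \ge 1/2$
$$T_\delta f(x) \triangleq \binom{n}{\lfloor\delta n\rfloor}^{-1}\sum_{y \in \mathbb{F}_2^n,\ |y| = \lfloor\delta n\rfloor} f(x+y).$$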

Apart from rounding issues we may write
$$T_\delta f = \frac{f * 1_{S_{\delta n}}}{|S_{\delta n}|},$$
where $*$ denotes the convolution
$$f * g(x) \triangleq \sum_{y \in \mathbb{F}_2^n} f(x - y)g(y).$$

Our main result:

Theorem 1: Consider the set $F \subset [0, 1] \times [1, 2]$,
$$F = \{(\delta, p) : p \ge 1 + (1-2\delta)^2,\ 0 \le \delta \le 1,\ 1 < p \le 2\}.$$
For every compact subset $K$ of $F$ there exists a constant $C = C(K)$ such that for all $(\delta, p) \in K$, $n \ge 1$ and $f : \mathbb{F}_2^n \to \mathbb{C}$ we have
$$\|T_\delta f\|_2 \le C\|f\|_p. \tag{7}$$
Conversely, for any $(\delta, p) \notin F$ there is $E > 0$ such that
$$\sup_f \frac{\|T_\delta f\|_2}{\|f\|_p} \ge e^{nE + o(n)}, \qquad n \to \infty, \tag{8}$$
with the exception of $\delta = 1/2$, $p = 1$, for which we have
$$\sup_f \frac{\|T_{1/2} f\|_2}{\|f\|_1} = 2^{n/2}\binom{n}{\lfloor n/2\rfloor}^{-\frac12} \sim \left(\frac{\pi n}{2}\right)^{\frac14}. \tag{9}$$
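As a sanity check on (9), here is a brute-force sketch in Python. It is exponential in $n$ and meant for illustration only, and the helper names `sphere`, `spherical_average` and `norm` are ours, not the paper's. Vectors of $\mathbb{F}_2^n$ are encoded as integer bitmasks, so $x + y$ is XOR. $T_\delta$ is implemented as the average over the sphere of radius $r = \lfloor n/2 \rfloor$, and the ratio in (9) is evaluated at $f = 1\{x = 0\}$, the function the proof later shows to attain it.

```python
import itertools
import math


def sphere(n, r):
    """Bitmasks of all y in F_2^n with Hamming weight r."""
    return [sum(1 << i for i in c) for c in itertools.combinations(range(n), r)]


def spherical_average(f, n, r):
    """T f(x) = |S_r|^{-1} * sum_{|y| = r} f(x + y); addition in F_2^n is XOR."""
    s = sphere(n, r)
    return [sum(f[x ^ y] for y in s) / len(s) for x in range(2 ** n)]


def norm(f, p):
    """Normalized norm (1): ||f||_p = E[|f(X)|^p]^(1/p), X uniform on F_2^n."""
    return (sum(abs(v) ** p for v in f) / len(f)) ** (1.0 / p)


for n in range(2, 11):
    point = [1.0] + [0.0] * (2 ** n - 1)                 # f = 1{x = 0}
    ratio = norm(spherical_average(point, n, n // 2), 2) / norm(point, 1)
    closed_form = 2 ** (n / 2) / math.sqrt(math.comb(n, n // 2))
    asymptotic = (math.pi * n / 2) ** 0.25
    print(n, round(ratio, 4), round(closed_form, 4), round(asymptotic, 4))
```

The same helpers reproduce the even-weight example of the remark that follows. If `f` is the indicator of the even-weight vectors, then `norm(spherical_average(f, n, r), 2) / norm(f, p)` equals $2^{1/p - 1/2}$ for every radius `r`.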

Remark: The constants that our proof yields are as follows: for $\delta \le 0.174$ we have $C = \sqrt{2}$, while for larger $\delta$ we can take $C$ to be arbitrarily close to $\sqrt{2}$ for sufficiently large $n$. Note also that the constant cannot be tightened to 1. Indeed, taking $f = 1_{even}$ to be the characteristic function of the set of all even-weight vectors we get
$$\|T_\delta\|_{p\to 2} \ge 2^{\frac1p - \frac12}, \qquad 1 \le p \le 2,\ 0 < \delta < 1,$$
regardless of dimension $n$.

There are a number of applications of hypercontractive inequalities in information theory [3], theoretical computer science [4] and probability [5]. In particular, a very simple argument, cf. [5, Theorem 3.7], shows that a discrete-time finite Markov chain with state space $\mathcal{X}$ which satisfies a hypercontractive inequality mixes in time of order $O(\log\log|\mathcal{X}|)$. For $T_\delta$, this Markov chain is a non-standard random walk on the hypercube $\mathbb{F}_2^n$ which jumps by a distance of exactly $\delta n$ at each step. A simple coupling argument shows that indeed such a random walk must mix in time $O(\log n)$. This gives a probabilistic intuition to Theorem 1.

Our original interest in hypercontractivity was motivated by the remarkably simple solution it yields to a problem that the author attempted to solve using more conventional semi-definite programming (SDP), compare Section IV of [6] and [7]. Here is an application of the new result (Theorem 1) similar in spirit:

Corollary 2: For every $\epsilon \in (0, 1)$ there are constants $C_1, C_2 > 0$ such that, for any dimension $n$, any set $A \subset \mathbb{F}_2^n$ for which the average multiplicity of $A + A$ on at least one Hamming sphere $S_j$ with $\epsilon n \le j \le (1-\epsilon)n$ exceeds $\lambda|A|$ must have cardinality $|A| \ge C_1\lambda^{C_2}2^n$. Formally:

$$\sup_{j \in [\epsilon n, (1-\epsilon)n]}\frac{2^n(1_A * 1_A, 1_{S_j})}{|S_j|\,|A|} \ge \lambda \quad\Longrightarrow\quad |A| \ge C_1\lambda^{C_2}2^n$$

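Remark: It is known that any linear subspace $V \subset \mathbb{F}_2^n$ which contains an $\Omega(1)$-fraction of any $S_{\delta n}$ must have codimension $O(1)$ (as $n \to \infty$). This corollary is a generalization: if a sumset $A + A$ contains a $\lambda$-fraction of any Hamming sphere $S_j$ (counted with multiplicity normalized by $|A|$), then the set must be of cardinality $\Omega(2^n)$.

Proof: For later reference we prove a stronger statement:
$$\left(\phi * \phi, \frac{1_{S_j}}{|S_j|}\right) \ge \lambda\|\phi\|_2^2 \quad\Longrightarrow\quad \frac{\|\phi\|_2^2}{\|\phi\|_1^2} \le \frac{1}{C_1}\lambda^{-C_2}, \tag{10}$$
from which the result follows by taking $\phi = 1_A$. To show (10) denote $\delta = \frac jn$ and consider the chain
$$\begin{aligned}
\lambda\|\phi\|_2^2 &\le \left(\phi * \phi, \frac{1_{S_j}}{|S_j|}\right) && (11)\\
&= (\phi, T_\delta\phi) && (12)\\
&\le \|\phi\|_2\|T_\delta\phi\|_2 && (13)\\
&\le C\|\phi\|_2\|\phi\|_p, \qquad p = 1 + (1-2\epsilon)^2 < 2 && (14)\\
&\le C\|\phi\|_2\|\phi\|_1^{\frac2p - 1}\|\phi\|_2^{2 - \frac2p} && (15)
\end{aligned}$$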

where (13) is Cauchy-Schwarz, (14) is from Theorem 1, and (15) is from log-convexity of $\frac1p \mapsto \|\phi\|_p$. Rearranging terms yields (10). Dividing by $\|\phi\|_2^2$ gives $\lambda \le C(\|\phi\|_1/\|\phi\|_2)^{\frac2p - 1}$, so (10) holds with $C_2 = \frac{2p}{2-p}$ and $C_1 = C^{-C_2}$.

Intuitively, a much more natural approach to proving the Corollary would be to apply the harmonic-analytic method (or linear programming, or SDP) of Delsarte, cf. [8]. Somewhat surprisingly, such a method works, but only for sufficiently large values of $\lambda$. We illustrate this issue briefly below. For $\lambda \in (0, 1]$ let us say that a distribution $\phi$ $\lambda$-approximates deconvolution of an i.i.d. Bern($\delta$) random variable $Z$ if
$$\mathbb{P}[X + X' = Z] \ge \lambda\,\mathbb{P}[X + X' = 0], \tag{16}$$
where $X$, $X'$ and $Z$ are independent and $X$ and $X'$ are distributed according to $\phi$. The idea here is that when $\lambda \to 1$ the distribution of $X + X'$, whose highest peak necessarily occurs at 0, is almost flat on the set where $Z$ concentrates. The goal is to find a maximally non-uniform $\phi$ which becomes a $\lambda$-approximation to $Z$ after self-convolution. Formally:
$$V_n(\lambda) = \max_{\phi \text{ sat. } (16)}\frac{\|\phi\|_2^2}{\|\phi\|_1^2}.$$
Note that for $\phi = \frac{1_A}{|A|}$ this corresponds to minimizing the cardinality $|A|$.

It turns out that regardless of dimension $n$ the value of $V_n(\lambda)$ is bounded by a constant. That is, every set $A$ deconvolving Bern($\delta$) has cardinality at least $c_\lambda \cdot 2^n$. Obtaining this result from the Bonami-Gross hypercontractivity (4) is very simple. To that end define $B_\delta(x) = \delta^{|x|}(1-\delta)^{n-|x|}$ to be the distribution function of i.i.d. Bernoulli noise. Then we have
$$V_n(\lambda) = \max_{\substack{\phi \ge 0\\ (\phi * \phi, B_\delta) \ge \lambda\|\phi\|_2^2}}\frac{(\phi, \phi)}{(\phi, 1)^2}. \tag{17}$$
An argument entirely similar to (13)-(15), invoking Bonami-Gross (4) instead of Theorem 1, demonstrates $V_n(\lambda) \le \lambda^{-s}$ for some $s > 0$ and all dimensions $n$. Note that the problem in (17) is completely “$L_2$”, and thus escaping to $L_p$ space in order to solve it looks somewhat artificial. Indeed, a much more natural approach would be to apply Fourier analysis or an SDP relaxation. The Fourier-analytic approach (or the “spectral gap”) yields the following bound on $V_n(\lambda)$. Since the second-largest eigenvalue of $N_\delta$ equals $1 - 2\delta$, we get $(\phi_0, N_\delta\phi_0) \le (1-2\delta)\|\phi_0\|_2^2$, where $\phi_0 = \phi - (\phi, 1)$. Simple manipulations then imply
$$V_n(\lambda) \le \frac{2\delta}{\lambda - (1-2\delta)}, \qquad \text{if } \lambda > 1 - 2\delta.$$
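The “simple manipulations” can be spelled out as follows. This is a sketch that assumes $\phi$ is real-valued and $\delta \le 1/2$, and writes $\phi = (\phi, 1) + \phi_0$ with $(\phi_0, 1) = 0$. It uses $N_\delta 1 = 1$, the self-adjointness of $N_\delta$, and the identity $(\phi * \phi, B_\delta) = (\phi, N_\delta\phi)$, which follows by expanding both sides:

```latex
\begin{aligned}
\lambda\|\phi\|_2^2 &\le (\phi * \phi, B_\delta) = (\phi, N_\delta\phi)
  = (\phi, 1)^2 + (\phi_0, N_\delta\phi_0) \\
&\le (\phi, 1)^2 + (1 - 2\delta)\left(\|\phi\|_2^2 - (\phi, 1)^2\right), \\
\text{hence}\quad \bigl(\lambda - (1 - 2\delta)\bigr)\|\phi\|_2^2 &\le 2\delta\,(\phi, 1)^2
\quad\Longrightarrow\quad
\frac{(\phi, \phi)}{(\phi, 1)^2} \le \frac{2\delta}{\lambda - (1 - 2\delta)}
\quad \text{when } \lambda > 1 - 2\delta .
\end{aligned}
```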

This proves a correct estimate of $O(1)$, but only for large values of $\lambda$. An improvement of this method comes with the use of an SDP relaxation. The latter is obtained by considering $\psi = \phi * \phi$ and retaining only the non-negative definiteness property of $\psi$. That is, we have the following upper bound:
$$V_n(\lambda) \le SDP(n, \lambda) \triangleq \max_{\substack{\psi \ge 0,\ \psi \succeq 0\\ (\psi, B_\delta) \ge \lambda(\psi, B_0)}} \frac{2^n(\psi, B_0)}{(\psi, 1)},$$
where $B_0(x) = 1\{x = 0\}$ and $\psi \succeq 0$ denotes that $f \mapsto f * \psi$ is a non-negative definite operator.

It can be shown that¹
$$SDP(n, \lambda) = O(1), \qquad \lambda > (1-2\delta)^2,$$
while for smaller values of $\lambda$, $SDP(n, \lambda)$ grows polynomially in $n$. Thus SDP is unable to yield the correct estimate of $V_n(\lambda)$ for the entire range of $\lambda$. This example demonstrates that hypercontractivity may prove useful (in fact, more powerful than SDP) even for questions that are entirely “$L_2$”.

Before delving into the proof of Theorem 1 we mention that traditional comparison techniques for proving hypercontractive and log-Sobolev inequalities, cf. [9], are not effective here. One problem is that our operators $T_\delta$ do not form a semigroup. This may potentially be worked around by applying the discrete-time version of log-Sobolev inequalities developed by Miclo [10]. However, the natural comparison to $N_\delta$ via Miclo's method is unfortunately not useful. The primary reason is that the log-Sobolev constant of $N_\delta$ is of order $\frac1n$ when $\delta \sim \frac1n$, which implies tight hypercontractive estimates, and is very loose otherwise.

Nevertheless, a direct comparison of $T_\delta$ and $N_\delta$ can still yield useful results:

Theorem 3: For any $\delta$ and $p \ge 1 + (q-1)(1-2\delta)^2$ we have $\|T_\delta\|_{p\to q} = O(\sqrt{n})$.

Proof: Assuming without loss of generality that $f \ge 0$, it is easy to see from Stirling's formula that
$$\binom{n}{\delta n}^{-1}\sum_{|y| = \delta n} f(x+y) \le O(\sqrt{n})\sum_{|y| = \delta n} f(x+y)\,\delta^{|y|}(1-\delta)^{n-|y|}.$$
Then, extending the summation to all $y$, we get
$$T_\delta f(x) \le O(\sqrt{n})\,N_\delta f(x) \qquad \forall x \in \mathbb{F}_2^n.$$
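The $O(\sqrt n)$ factor is the ratio $\binom{n}{j}^{-1}/(\delta^j(1-\delta)^{n-j})$ at $\delta = j/n$. By the lower bound in (23) this ratio is at most $\sqrt{8j(n-j)/n} \le \sqrt{2n}$. The small Python sketch below computes the ratio and checks the pointwise domination $T_\delta f \le \text{ratio}\cdot N_\delta f$ by brute force on a small cube, for a random positive $f$. The helper names are illustrative.

```python
import math
import random


def popcount(x):
    return bin(x).count("1")


def sphere_to_bernoulli_ratio(n, j):
    """C(n, j)^(-1) / (d^j (1-d)^(n-j)) with d = j/n, for 1 <= j <= n-1 (computed in logs)."""
    d = j / n
    log_bernoulli = j * math.log(d) + (n - j) * math.log(1 - d)
    return math.exp(-math.log(math.comb(n, j)) - log_bernoulli)


def dominated_pointwise(n, j, seed=0):
    """Check T f(x) <= ratio * N_d f(x) for all x, for a random f > 0 and d = j/n."""
    rng = random.Random(seed)
    size = 2 ** n
    f = [rng.random() + 1e-3 for _ in range(size)]
    d = j / n
    ratio = sphere_to_bernoulli_ratio(n, j)
    weight = [d ** popcount(y) * (1 - d) ** (n - popcount(y)) for y in range(size)]
    sphere = [y for y in range(size) if popcount(y) == j]
    for x in range(size):
        t_f = sum(f[x ^ y] for y in sphere) / len(sphere)
        n_f = sum(weight[y] * f[x ^ y] for y in range(size))
        if t_f > ratio * n_f * (1 + 1e-12):
            return False
    return True


# The ratio behaves like sqrt(n): by (23) it is at most sqrt(8 j (n - j) / n) <= sqrt(2n).
for n in (10, 100, 1000):
    j = 3 * n // 10
    print(n, round(sphere_to_bernoulli_ratio(n, j) / math.sqrt(n), 4))
```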

¹ These observations were made in collaboration with Prof. A. Megretski.


The result then follows from (4). The main part of Theorem 1 is thus in establishing a constant estimate on $\|T_\delta\|_{p\to q}$. Our proof compares the eigenvalues of $T_\delta$ and $N_\delta$, but it also crucially depends on a peculiar relation between the norms of certain Fourier-multiplier operators on $\mathbb{F}_2^n$ and the eigenvalues of $T_\delta$. Those estimates are perhaps of independent interest, as they bound energies in the degree-$a$ components of functions on the hypercube.

Finally, we close our discussion by mentioning the result of Semenov and Shneiberg [11]. One of the most fascinating properties of (4) is that it shows the following “stickiness at 1” of $\|\cdot\|_{p\to q}$ norms. As the operator $N_\delta$ starts to depart from $N_{1/2}$, the norm $\|N_\delta\|_{p\to q}$ remains stuck at 1 for some range of values $\delta \in [\delta_0, \frac12]$ before starting to grow as $\delta < \delta_0$. This distinguishes the measure of dependence $\|\cdot\|_{p\to q}$ from other measures (such as mutual information, or correlation coefficients). Interestingly, a similar effect was observed for Fourier multiplier operators and norms $\|\cdot\|_{p\to p}$ and $\|\cdot\|_{2\to q}$ in [12], [13]. Semenov and Shneiberg showed this in general: if $T$ is any operator with $\|T\|_{p\to q} < \infty$, then in some neighborhood of $\epsilon = 0$ we have
$$\|(1-\epsilon)\mathcal{E} + \epsilon T\|_{p\to q} = 1,$$
provided that $\mathcal{E}\circ T = T\circ\mathcal{E}$, $T1 = 1$ and $(\mathcal{E}f)(x) \triangleq \mathbb{E}[f(X)]$ maps functions to constants. Paired with our Theorem 1, this allows one to establish that many permutation-invariant (or $S_n$-equivariant) operators in Hamming space have $L_p \to L_q$ norm equal to 1.

II. PROOF

A. Auxiliary results: Notation

For $x = (x_1, \ldots, x_n) \in \mathbb{F}_2^n$ define $\bar x = (1 - x_1, \ldots, 1 - x_n)$. For each $j = 1, \ldots, n$ let
$$\chi_j(x_1, \ldots, x_n) \triangleq 1\{x_j = 0\} - 1\{x_j = 1\}.$$
Define the characters, indexed by $v \in \mathbb{F}_2^n$,
$$\chi_v(x) \triangleq \prod_{j : v_j = 1}\chi_j(x) = (-1)^{\langle v, x\rangle},$$
where $\langle v, x\rangle = \sum_{j=1}^n v_j x_j$ is a non-degenerate bilinear form on $\mathbb{F}_2^n$. The Fourier transform of $f : \mathbb{F}_2^n \to \mathbb{C}$ is
$$\hat f(\omega) \triangleq \sum_{x \in \mathbb{F}_2^n}\chi_\omega(x)f(x) = 2^n(f, \chi_\omega), \qquad \omega \in \mathbb{F}_2^n.$$

$L_p$ norms are monotonic,
$$\|f\|_p \le \|f\|_{p_1}, \qquad p \le p_1, \tag{18}$$
and satisfy the Young inequality:
$$\|f * g\|_p \le 2^n\|f\|_q\|g\|_r, \qquad 1 + \frac1p = \frac1q + \frac1r,\ 1 \le p, q, r \le \infty. \tag{19}$$
For the size of Hamming spheres we have
$$|S_{\delta n}| = \binom{n}{\lfloor\delta n\rfloor} = e^{nh(\delta) - \frac12\ln n + O(1)}, \qquad n \to \infty, \tag{20}$$
where the estimate is a consequence of Stirling's formula, $O(1)$ is uniform in $\delta$ on compact subsets of $(0, 1)$, and
$$h(\delta) = -\delta\ln\delta - (1-\delta)\ln(1-\delta). \tag{21}$$
Furthermore, for all $0 \le j \le n$
$$\sqrt{\frac{1}{2n}}\,e^{nh(\frac jn)} \le |S_j| \le e^{nh(\frac jn)} \tag{22}$$
and for $1 \le j \le n - 1$, cf. [14, Exc. 5.8],
$$\sqrt{\frac{n}{8j(n-j)}}\,e^{nh(\frac jn)} \le |S_j| \le \sqrt{\frac{n}{2\pi j(n-j)}}\,e^{nh(\frac jn)}. \tag{23}$$
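Here is a quick numerical check of (22) and (23), sketched in Python with exact binomials and a comparison in log-space. The tolerance `tol` only absorbs floating-point rounding. At $n = 2$, $j = 1$ the lower bound in (23) holds with equality.

```python
import math


def h(t):
    """Binary entropy in nats, (21), with h(0) = h(1) = 0."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log(t) - (1 - t) * math.log(1 - t)


def sphere_bounds_hold(n, tol=1e-9):
    """Compare log |S_j| with the logarithms of the bounds (22) and (23)."""
    for j in range(n + 1):
        log_size = math.log(math.comb(n, j))
        nh = n * h(j / n)
        if not (nh - 0.5 * math.log(2 * n) - tol <= log_size <= nh + tol):        # (22)
            return False
        if 1 <= j <= n - 1:
            lower = nh + 0.5 * math.log(n / (8 * j * (n - j)))                    # (23), left
            upper = nh + 0.5 * math.log(n / (2 * math.pi * j * (n - j)))          # (23), right
            if not (lower - tol <= log_size <= upper + tol):
                return False
    return True


print(all(sphere_bounds_hold(n) for n in range(1, 200)))
```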

B. Auxiliary results: Asymptotics of Krawtchouk polynomials

Krawtchouk polynomials are defined as Fourier transforms of Hamming spheres:
$$K_j(x) \triangleq \widehat{1_{S_j}}(x) = \sum_{k=0}^{j}(-1)^k\binom{|x|}{k}\binom{n - |x|}{j - k}. \tag{24}$$
Since $K_j(x)$ depends on $x$ only through its Hamming weight $|x|$, we will abuse notation and write $K_j(2)$ to mean the value of $K_j$ at a point with weight 2, etc. Some useful properties of $K_j$, cf. [15]:
$$K_j(x) = (-1)^j K_j(n - x) \tag{25}$$
$$K_j(x) = (-1)^x K_{n-j}(x) \tag{26}$$
$$\frac{K_j(x)}{K_j(0)} = \frac{K_x(j)}{K_x(0)} \tag{27}$$
$$K_j(0) = \|K_j\|_2^2 = |S_j| = \binom{n}{j}, \tag{28}$$
$$K_j(x) = \sum_{|v| = j}\chi_v(x). \tag{29}$$
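The definition (24) and the identities (25)-(29) are easy to test in exact integer arithmetic. The Python sketch below (function names are ours) evaluates (24), compares it with the character sum (29) by brute force, and checks (25)-(28). The check of (27) is cross-multiplied to avoid division.

```python
import math


def krawtchouk(n, j, x):
    """K_j(x) from (24), as an exact integer; x is a Hamming weight."""
    return sum((-1) ** k * math.comb(x, k) * math.comb(n - x, j - k)
               for k in range(min(j, x) + 1))


def krawtchouk_bruteforce(n, j, xmask):
    """(29): sum of chi_v(x) = (-1)^<v, x> over all v of weight j."""
    return sum((-1) ** bin(v & xmask).count("1")
               for v in range(2 ** n) if bin(v).count("1") == j)


def identities_hold(n):
    """Exact integer checks of (25)-(28) for all 0 <= j, x <= n."""
    K = [[krawtchouk(n, j, x) for x in range(n + 1)] for j in range(n + 1)]
    for j in range(n + 1):
        if K[j][0] != math.comb(n, j):                                        # (28)
            return False
        # (28): ||K_j||_2^2 = 2^{-n} sum_x C(n, x) K_j(x)^2 = C(n, j)
        if sum(math.comb(n, x) * K[j][x] ** 2 for x in range(n + 1)) != 2 ** n * math.comb(n, j):
            return False
        for x in range(n + 1):
            if K[j][n - x] != (-1) ** j * K[j][x]:                            # (25)
                return False
            if K[j][x] != (-1) ** x * K[n - j][x]:                            # (26)
                return False
            if K[j][x] * K[x][0] != K[x][j] * K[j][0]:                        # (27)
                return False
    return True


print([krawtchouk(6, 2, x) for x in range(7)])   # [15, 5, -1, -3, -1, 5, 15]
print(identities_hold(12))
```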


It is also well known that $K_j(x)$ has $j$ simple real roots. For $j \le n/2$ all of them lie in the interval [15, (71)]
$$\frac n2 - \sqrt{j(n-j)} \le x \le \frac n2 + \sqrt{j(n-j)}.$$
Thus for $j = \delta n$ the first root is located at roughly $n\,\xi_{crit}(\delta)$, where
$$\xi_{crit}(\delta) \triangleq \frac12 - \sqrt{\delta(1-\delta)}.$$

The following gives a convenient non-asymptotic estimate of the magnitude of $K_j(x)$:

Lemma 4: For all $x, j = 0, \ldots, n$ we have
$$|K_j(x)| \le e^{nE_{j/n}(x/n)}, \tag{30}$$
where the function $E_\delta(\xi) = E_{1-\delta}(\xi)$ and for $\delta \in [0, 1/2]$:
$$E_\delta(\xi) = \begin{cases}\frac12\left(h(\delta) + \ln 2 - h(\xi)\right), & \xi_{crit}(\delta) \le \xi \le 1 - \xi_{crit}(\delta),\\ \phi(\xi, \omega), & \xi = \frac12\left(1 - (1-\delta)\omega - \delta\omega^{-1}\right),\end{cases} \tag{31}$$
where in the second case $\omega$ ranges in
$$\omega \in \left[-\sqrt{\frac{\delta}{1-\delta}}, -\frac{\delta}{1-\delta}\right] \cup \left[\frac{\delta}{1-\delta}, \sqrt{\frac{\delta}{1-\delta}}\right]$$
and
$$\phi(\xi, \omega) \triangleq \xi\ln|1-\omega| + (1-\xi)\ln|1+\omega| - \delta\ln|\omega|. \tag{32}$$
Remark: The exponent $E_\delta(\xi)$ was derived in [16] for $\xi \le \xi_{crit}(\delta)$. Subsequently, a refined asymptotic expansion for all $\xi \in [0, 1]$ was found in [17]:
$$K_{\delta n}(\xi n) = \frac{O(1)}{\sqrt{n}}\,e^{nE_\delta(\xi)}, \tag{33}$$
where the $O(1)$ term is $\Theta(1)$ for $\xi \le \xi_{crit}$. For $\xi \in [\xi_{crit}, 1/2]$ the factor $O(1)$ oscillates and may reduce the exponent at a few integer points $x \in [\xi_{crit}n, (1-\xi_{crit})n]$ that are close to one of the roots of $K_j(\cdot)$.

Proof: Following [17]² we have
$$K_j(x) = \frac{1}{2\pi i}\oint_C\left\{(1-z)^x(1+z)^{n-x}z^{-j}\right\}\frac{dz}{z}, \tag{34}$$

² Note that $K_j(\cdot)$ in [17] corresponds to $(-1)^jK_j(\cdot)$ in this paper.


where integration is over an arbitrary circle $C$ with center at $z = 0$. The derivative of the function in braces is zero when
$$n - 2x = (n-j)z + jz^{-1}. \tag{35}$$
Due to (26) it is sufficient to consider $j \le n/2$. Among the two solutions of (35) denote by $\omega$ the unique one with the smallest $|z|$ and $\Im(z) \ge 0$. Set, for convenience,
$$\xi = x/n, \qquad \delta = j/n \in [0, 1/2],$$
and note the following relation between $\omega$ and $\xi$:
$$\omega = \frac{1}{2(1-\delta)}\left(1 - 2\xi - \operatorname{sgn}(1-2\xi)\sqrt{(1-2\xi)^2 - 1 + (1-2\delta)^2}\right), \tag{36}$$
$$1 - 2\xi = (1-\delta)\omega + \frac{\delta}{\omega}. \tag{37}$$
As $\xi$ ranges from 0 to 1 the saddle point $\omega$ traverses the path
$$\omega:\ \frac{\delta}{1-\delta} \to \sqrt{\frac{\delta}{1-\delta}} \to -\sqrt{\frac{\delta}{1-\delta}} \to -\frac{\delta}{1-\delta},$$
where the middle segment is along the arc $e^{i\varphi}\sqrt{\frac{\delta}{1-\delta}}$, $\varphi \in [0, \pi]$. At the corresponding corner points $\xi$ takes the values
$$\xi:\ 0 \to \xi_{crit} \to 1 - \xi_{crit} \to 1.$$
It is more convenient to reparameterize the answer in terms of the location of the saddle point $\omega$. If we take $C$ to be the circle passing through $\omega$, then, as shown in [17, (3.4) and paragraph after (3.19)], the maximum
$$\max_{z \in C}\left|(1-z)^x(1+z)^{n-x}z^{-j}\right|$$
is attained at $z = \omega$ and is equal to $e^{nE_\delta(\xi)}$, where
$$E_\delta(\xi) = \phi(\xi, \omega), \tag{38}$$
and $\xi$ is a function of $\omega$ defined via (37). Thus, upper-bounding the integrand $\{\cdot\}$ in (34) by its maximal value and noting that for any circle
$$\left|\oint_C\frac{dz}{z}\right| \le 2\pi,$$
we conclude that (30) holds.
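Because the contour argument gives (30) for every $n$ and not just asymptotically, the bound can be tested directly. The Python sketch below is an illustration, not part of the proof. It evaluates $E_\delta(\xi)$ from (31)-(32), taking the real saddle point from (36) outside the oscillatory strip. It also uses the symmetries $E_\delta = E_{1-\delta}$ and $\xi \mapsto 1 - \xi$ listed in Lemma 5 below. It skips $j \in \{0, n\}$, where $\omega = 0$ and the logarithm in (32) degenerates.

```python
import math


def krawtchouk(n, j, x):
    """K_j(x) from (24), exact integer arithmetic."""
    return sum((-1) ** k * math.comb(x, k) * math.comb(n - x, j - k)
               for k in range(min(j, x) + 1))


def h(t):
    """Binary entropy in nats, (21)."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log(t) - (1 - t) * math.log(1 - t)


def exponent_E(delta, xi):
    """E_delta(xi) from (31)-(32), for 0 < delta < 1 and 0 <= xi <= 1."""
    if delta > 0.5:
        delta = 1.0 - delta                 # E_delta = E_{1 - delta}
    if xi > 0.5:
        xi = 1.0 - xi                       # symmetry xi -> 1 - xi (Lemma 5, item 1)
    xi_crit = 0.5 - math.sqrt(delta * (1 - delta))
    if xi >= xi_crit:                       # oscillatory region: closed form in (31)
        return 0.5 * (h(delta) + math.log(2) - h(xi))
    # Real saddle point of smallest modulus, from (36) with 1 - 2 xi > 0.
    s = 1 - 2 * xi
    disc = max(0.0, s * s - 4 * delta * (1 - delta))
    omega = (s - math.sqrt(disc)) / (2 * (1 - delta))
    return (xi * math.log(abs(1 - omega)) + (1 - xi) * math.log(abs(1 + omega))
            - delta * math.log(abs(omega)))   # phi(xi, omega) from (32)


def lemma4_holds(n):
    """Check |K_j(x)| <= exp(n E_{j/n}(x/n)) for 1 <= j <= n-1, 0 <= x <= n."""
    for j in range(1, n):
        for x in range(n + 1):
            bound = math.exp(n * exponent_E(j / n, x / n))
            if abs(krawtchouk(n, j, x)) > bound * (1 + 1e-9):
                return False
    return True


print(all(lemma4_holds(n) for n in (10, 17, 30)))
```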

[Figure 1: “Normalized exponents of Krawtchouk polynomials”, curves $K_{\delta n}(\xi n)$ for $\delta = 0.2$ and $\delta = 0.3$; horizontal axis $\xi$, vertical axis $E_\delta(\xi) - h(\delta)$.]

Fig. 1. The exponent of $\frac{K_{\delta n}(\xi n)}{K_j(0)}$ is equal to $E_\delta(\xi) - h(\delta)$. The figure compares these exponents for two values of $\delta$. Asterisks mark the interval $[\xi_{crit}, 1 - \xi_{crit}]$ containing all the roots of $K_{\delta n}(\cdot)$. In this interval $K_{\delta n}(\cdot)$ is oscillatory.

It remains to show the simplified expression in (31) for $\xi \in [\xi_{crit}, 1 - \xi_{crit}]$. To that end, notice that such $\xi$ corresponds to
$$\omega = e^{i\varphi}\sqrt{\frac{\delta}{1-\delta}}, \qquad \varphi \in [0, \pi].$$
Substituting this $\omega$ into (38) we see that (31) is equivalent to
$$\xi\ln\frac{|1-\omega|}{\sqrt{\xi}} + (1-\xi)\ln\frac{|1+\omega|}{\sqrt{1-\xi}} = \frac12\ln\frac{2}{1-\delta}. \tag{39}$$
But for $\omega$ on the arc we have
$$\frac{|1-\omega|}{\sqrt{\xi}} = \frac{|1+\omega|}{\sqrt{1-\xi}} = \sqrt{\frac{2}{1-\delta}},$$
which verifies (39) and (31).

For a visual illustration of some properties summarized in the next lemma see Fig. 1.

Lemma 5: Properties of $E_\delta(\xi)$:
1) $(\delta, \xi) \mapsto E_\delta(\xi)$ is continuous on $[0, 1] \times [0, 1]$ and possesses two symmetries: $E_\delta(\xi) = E_{1-\delta}(\xi)$, $E_\delta(\xi) = E_\delta(1-\xi)$.
2) $E_\delta(0) = E_\delta(1) = h(\delta)$, $E_\delta(1/2) = h(\delta)/2$.
3) $E_{1/2}(\xi) = \ln 2 - h(\xi)/2$.
4) $E_\delta(\xi) = h(\delta) - h(\xi) + E_\xi(\delta)$.
5) $\xi \mapsto E_\delta(\xi)$ is monotonically decreasing on $[0, 1/2]$ and possesses a continuous derivative on $[0, 1]$.
6) $\delta \mapsto E_\delta(\xi)$ is monotonically increasing on $[0, 1/2]$.
7) $\delta \mapsto E_\delta(\xi) - h(\delta)$ is monotonically decreasing on $[0, 1/2]$.
8) For fixed $\delta$ and all $\xi \le \xi_{crit}(\delta)$ we have
$$E_\delta(\xi) \le \xi\ln(1-2\delta) + h(\delta). \tag{40}$$

Proof: None of these properties are used below, so we omit the fairly trivial details.

We will also need a more refined estimate for $K_j(x)$ when $x$ is small:

Lemma 6: For $j \le n/2$ and $0 \le x \le n\,\xi_{crit}(j/n) = n/2 - \sqrt{j(n-j)}$ we have
$$\frac{K_j(x)}{K_j(0)} \le \left(1 - \frac{2j}{n}\right)^x. \tag{41}$$
Remark: With an additional factor $O(\sqrt{n})$ the estimate (41) follows from (40). Lemma 6 establishes the crucial relation between the spectra of the operators $N_\delta$ and $T_\delta$ that powers Theorem 1.

Proof: In the mentioned range of $x$ the polynomial $K_j(x)$ is monotonically decreasing, since $K_j(0) > 0$ and all roots are to the right of $x$. Hence, 0
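0 it holds that
$$x_0 \le n\frac{\theta_1(\theta_1 - \theta)}{1 + \theta_1^2} \iff \frac{n\theta}{n - x_0}\theta_1^{x_0} + \frac{x_0}{n - x_0}\theta_1^{x_0 - 1} \le \theta_1^{x_0 + 1},$$
which concludes the proof of (48) for $x = x_0 + 1$.

At the other extreme, for small values of $j$ we can extend Lemma 6 to the whole range $0 \le x \le \frac n2$:

Lemma 8: There exist $C_1 \ge 1$ and $\delta_0 \in (0, 1)$ such that for all $0 \le j \le \delta_0 n$ we have
$$\left|\frac{K_j(x)}{K_j(0)}\right| \le C_1\cdot\left(1 - \frac{2j}{n}\right)^x, \qquad 0 \le x \le \frac n2.$$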

Remark: The proof given here yields C1 = 1 and δ0 = 0.174.
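Lemmas 6 and 8 compare the eigenvalue $K_j(x)/K_j(0)$ of $T_{j/n}$ on characters of weight $x$ with the eigenvalue $(1 - 2j/n)^x$ of $N_{j/n}$. The Python sketch below works in exact rational arithmetic (`fractions.Fraction`). It checks (41) on small cubes, testing the range condition as $(n-2x)^2 \ge 4j(n-j)$, which is equivalent to $x \le n/2 - \sqrt{j(n-j)}$ when $x \le n/2$. It also prints, for a few $n$, the empirical ratio that the constant $C_1$ of Lemma 8 must dominate for $j \le 0.174n$. These printed values are illustrations only and say nothing about the limit $n \to \infty$.

```python
import math
from fractions import Fraction


def krawtchouk(n, j, x):
    """K_j(x) from (24), exact integer arithmetic."""
    return sum((-1) ** k * math.comb(x, k) * math.comb(n - x, j - k)
               for k in range(min(j, x) + 1))


def eig_T(n, j, x):
    """Eigenvalue of T_{j/n} on characters of weight x: K_j(x) / K_j(0), exactly."""
    return Fraction(krawtchouk(n, j, x), math.comb(n, j))


def eig_N(n, j, x):
    """Eigenvalue of N_{j/n} on characters of weight x: (1 - 2j/n)^x, exactly."""
    return Fraction(n - 2 * j, n) ** x


def lemma6_holds(n):
    """(41) for all j <= n/2 and integers 0 <= x <= n/2 - sqrt(j(n - j))."""
    for j in range(n // 2 + 1):
        for x in range(n // 2 + 1):
            in_range = (n - 2 * x) ** 2 >= 4 * j * (n - j)   # x <= n xi_crit(j/n), no floats
            if in_range and eig_T(n, j, x) > eig_N(n, j, x):
                return False
    return True


def lemma8_ratio(n, delta0=0.174):
    """Largest |K_j(x)/K_j(0)| / (1 - 2j/n)^x over 1 <= j <= delta0 n, 0 <= x <= n/2."""
    worst = Fraction(0)
    for j in range(1, int(delta0 * n) + 1):
        for x in range(n // 2 + 1):
            worst = max(worst, abs(eig_T(n, j, x)) / eig_N(n, j, x))
    return worst


print(all(lemma6_holds(n) for n in range(2, 41)))
print([round(float(lemma8_ratio(n)), 4) for n in (20, 40, 60)])
```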

Proof: For $j = 0$ the inequality is trivial. For $x \le n\,\xi_{crit}(j/n)$ it follows from Lemma 6. Thus, it is sufficient to consider $x \ge n\,\xi_{crit}(j/n)$, $j \ge 1$. Denote $\delta = j/n$. Then from Lemma 4 and (23) we have, for all $n \ge 1$:
$$\frac{|K_{\delta n}(x)|}{K_j(0)(1-2\delta)^x} \le \sqrt{8(1-\delta)}\cdot e^{n(f(\delta) - \frac12 h(\delta))}\sqrt{n\delta},$$
where
$$f(\delta) = \max_{\xi \in [\xi_{crit}(\delta), 1/2]}\frac12\left(\ln 2 - h(\xi)\right) - \xi\ln(1-2\delta).$$
From convexity of the function under maximization, we conclude
$$f(\delta) = \frac{\ln 2}{2} - \frac12\min\left(h(\xi_{crit}(\delta)) + 2\xi_{crit}(\delta)\ln(1-2\delta),\ \ln\left(2(1-2\delta)\right)\right).$$
Taking the derivative at $\delta = 0$ we conclude that for some $\delta_0' > 0$ we have
$$h(\xi_{crit}(\delta)) + 2\xi_{crit}(\delta)\ln(1-2\delta) \le \ln\left(2(1-2\delta)\right), \qquad \forall\delta \in [0, \delta_0'].$$
Consequently, for such $\delta$
$$f(\delta) = \frac12\left(\ln 2 - h(\xi_{crit}(\delta))\right) - \xi_{crit}(\delta)\ln(1-2\delta).$$
Evidently, $f$ is continuously differentiable and $f(\delta) = 2\delta + o(\delta)$ as $\delta \to 0$. Therefore for some $\delta_0 \in (0, \delta_0']$ we must have
$$f(\delta) - \frac12 h(\delta) < 0, \qquad \forall\delta \in (0, \delta_0].$$
The statement of the Lemma then follows with $C_1 = \max(1, \sqrt{8(1-\delta_0)}\,C_1')$, where $C_1'$ is the finite supremum found in the following Lemma.
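The constants $\delta_0'$ and $\delta_0$ in this proof can be located numerically. The Python sketch below makes two arbitrary choices: the grid size and the bisection bracket. It computes $f(\delta)$ by brute-force maximization over $\xi \in [\xi_{crit}(\delta), 1/2]$, finds by bisection where the $\xi_{crit}$ endpoint stops being the maximizer, and checks $f(\delta) - \frac12 h(\delta) < 0$ on a grid below that point. The crossing it finds lies near $0.174$, which appears consistent with the value $\delta_0 = 0.174$ quoted in the remark after Lemma 8.

```python
import math


def h(t):
    """Binary entropy in nats, (21)."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log(t) - (1 - t) * math.log(1 - t)


def xi_crit(d):
    return 0.5 - math.sqrt(d * (1 - d))


def f_bruteforce(d, grid=2000):
    """f(delta) = max over xi in [xi_crit, 1/2] of (ln 2 - h(xi))/2 - xi ln(1 - 2 delta), on a grid."""
    a = xi_crit(d)
    return max(0.5 * (math.log(2) - h(a + (0.5 - a) * i / grid))
               - (a + (0.5 - a) * i / grid) * math.log(1 - 2 * d)
               for i in range(grid + 1))


def switch_gap(d):
    """ln(2(1-2d)) - [h(xi_c) + 2 xi_c ln(1-2d)]; >= 0 iff the xi_crit endpoint attains f."""
    xc = xi_crit(d)
    return math.log(2 * (1 - 2 * d)) - (h(xc) + 2 * xc * math.log(1 - 2 * d))


def bisect(g, lo, hi, iters=60):
    """Sign change of g on [lo, hi], assuming g(lo) > 0 > g(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)


delta0_prime = bisect(switch_gap, 0.01, 0.3)
worst = max(f_bruteforce(k / 1000) - 0.5 * h(k / 1000) for k in range(1, 175))
print(round(delta0_prime, 4), worst < 0)
```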

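Lemma 9: Let $\alpha, \delta_0, C > 0$ and let $f$ be a continuous function on $[0, \delta_0]$ with $f(0) = 0$, derivative (one-sided at 0) bounded by $C$, and satisfying
$$f(\delta) - \alpha h(\delta) < 0, \qquad \forall\delta \in (0, \delta_0], \tag{51}$$
where $h$ is the binary entropy function (21). Then
$$\sup_{n \ge 1}\max_{\delta \in [0, \delta_0]} e^{n(f(\delta) - \alpha h(\delta))}\sqrt{n\delta} < \infty.$$
Proof: Under the conditions of the lemma there exists $0 < \delta_1 < \delta_0$ such that
$$f(\delta) \le \frac{\alpha}{2}h(\delta), \qquad \forall\delta \in [0, \delta_1].$$
Thus we have
$$\begin{aligned}
\max_{\delta \in [0, \delta_1]} n(f(\delta) - \alpha h(\delta)) + \frac12\ln(\delta n) &\le \max_{\delta \in [0, \delta_1]} -\alpha nh(\delta) + \frac12\ln(\delta n) && (52)\\
&\le \max_{\delta \in [0, \delta_1]} \alpha n\delta\ln\delta + \frac12\ln(\delta n). && (53)
\end{aligned}$$
Without loss of generality we may assume $\delta_1 < \frac{e^{-2}}{\alpha}$. In this case, the maximization in (53) is attained at $\delta^* \in (0, \frac{1}{n\alpha})$. Consequently, upper-bounding the first term by zero and the second by $\frac12\ln(\frac{1}{n\alpha}\cdot n)$, we get
$$\max_{\delta \in [0, \delta_1]} \alpha n\delta\ln\delta + \frac12\ln(\delta n) \le -\frac{\ln\alpha}{2}.$$
On the other hand, from (51) and continuity we get
$$\max_{\delta \in [\delta_1, \delta_0]} f(\delta) - \alpha h(\delta) = -C_2 < 0.$$
Therefore, putting both bounds together,
$$\max_{n \ge 1,\ \delta \in [0, \delta_0]} e^{n(f(\delta) - \alpha h(\delta))}\sqrt{n\delta} \le \max\left(\frac{1}{\sqrt{\alpha}},\ \sup_n\sqrt{\delta_0 n}\,e^{-C_2 n}\right) < \infty.$$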

Finally, for illustration purposes we will need the following.

Lemma 10: $L_p$ norms of Krawtchouk polynomials are given asymptotically by the following parametric formula. Let $\omega \in [0, 1]$; then for $p \ge 2$
$$\|K_{\lfloor\delta n\rfloor}\|_p = \exp\left\{n\left(\frac{h(\xi) - \ln 2}{p} + \phi(\xi, \omega)\right) + O(\log n)\right\}, \qquad n \to \infty, \tag{54}$$
$$c = \frac{(1+\omega)^p - (1-\omega)^p}{(1+\omega)^p + (1-\omega)^p}, \tag{55}$$
$$\xi = \frac{1-c}{2} = \frac12\left(1 - (1-\delta)\omega - \delta\omega^{-1}\right), \tag{56}$$
$$\delta = \frac{c\omega - \omega^2}{1 - \omega^2}, \tag{57}$$
and $\phi(\xi, \omega)$ is given by (32). For $p \le 2$ we have
$$\|K_{\lfloor\delta n\rfloor}\|_p = \exp\left\{\frac n2 h(\delta) + O(\log n)\right\}, \qquad n \to \infty.$$
Proof: The lemma is shown by analyzing with exponential precision the expression
$$\|K_j\|_p^p = 2^{-n}\sum_{a=0}^n\binom{n}{a}|K_j(a)|^p.$$
From Lemma 4 it may be shown that for $p \ge 2$ the term exponentially dominating this sum occurs at $a \le n\,\xi_{crit}(j/n)$. Then, for such $a = \xi n$ we know from [16] that $K_{\delta n}(\xi n) = \exp\{nE_\delta(\xi) + O(\log n)\}$. We omit further details as this Lemma is not used in the proof of Theorem 1.

C. Auxiliary results: Norms of Fourier projection operators

The Fourier projection operators $\Pi_a$ are defined as
$$\widehat{\Pi_a f} \triangleq \hat f\cdot 1_{S_a}, \qquad a = 0, 1, \ldots, n, \tag{58}$$
or, equivalently,
$$\Pi_a f = 2^{-n}f * K_a.$$
On the other hand, from Young's inequality (19) we have for any convolution operator:
$$\|\phi * (\cdot)\|_{1\to 2} = 2^n\|\phi\|_2.$$

Thus we have
$$\|\Pi_a\|_{1\to 2} = \sqrt{\binom{n}{a}}. \tag{59}$$
We also note that $\|\Pi_a\|_{p\to q} = \|\Pi_{n-a}\|_{p\to q}$, and thus we only consider $a \le \frac n2$ below. Estimates for other $L_p \to L_2$ norms follow from the Bonami-Gross inequality (4) and complex interpolation:

Lemma 11: For any $1 \le p \le 2$ and $0 \le a \le \frac n2$ we have
$$\|\Pi_a\|_{p\to 2} \le \begin{cases}(p-1)^{-\frac a2}, & p > p^*,\\ \binom{n}{a}^{\frac s2}(p^* - 1)^{-\frac{(1-s)a}{2}}, & \frac1p = \frac{1-s}{p^*} + s,\ 0 \le s \le 1,\end{cases} \tag{60}$$
where $p^* = p^*(a) = 2$ if $\frac{h(\delta)}{\delta} \le 2$, and otherwise $p^* \in (1, 2)$ is the solution of
$$p^* - \ln(p^* - 1) = \delta^{-1}h(\delta), \qquad \delta = \frac an.$$
We also have two weaker bounds
$$\|\Pi_a\|_{p\to 2} \le (p-1)^{-\frac a2}, \tag{61}$$
$$\|\Pi_a\|_{p\to 2} \le \binom{n}{a}^{\frac1p - \frac12}. \tag{62}$$
Remark: The estimate (61) has been the basis of the Kahn-Kalai-Linial results [4], so we refer to (61) as the KKL bound. Note that $p^*(a) = 2$ corresponds to $a > 0.3093n$, and then the bound (60) coincides with (62).

Proof: From Riesz-Thorin interpolation we know that the map $\frac1p \mapsto \|\Pi_a\|_{p\to 2}$ is log-convex. Thus (60) follows from (61) and (62) by convexification (the value of $p^*$ is chosen to minimize the resulting exponent when $a = \delta n$). It is therefore sufficient to prove (61) and (62). The second one again follows from interpolating between (59) and $\|\Pi_a\|_{2\to 2} = 1$. For the first one, notice that for any $\tau$ we have
$$N_\tau\Pi_a = \Pi_a N_\tau = (1-2\tau)^a\Pi_a.$$
Thus from (4) with $(1-2\tau)^2 = p - 1$ we get
$$\|\Pi_a f\|_2 = |1-2\tau|^{-a}\|\Pi_a N_\tau f\|_2 \le |1-2\tau|^{-a}\|N_\tau f\|_2 \le |1-2\tau|^{-a}\|f\|_p.$$
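The Python sketch below shows where the constant $0.3093$ in the remark comes from and how (60) relates to (61) and (62). It computes the leading-order exponents $\frac1n\log$ of the three bounds, with $\binom{n}{a}$ replaced by $e^{nh(a/n)}$. It solves for $p^*$ by bisection (our choice; any root finder would do) and forms the interpolated exponent from $\frac1p = \frac{1-s}{p^*} + s$. The threshold where $p^*$ becomes 2 is the root of $h(\delta) = 2\delta$.

```python
import math


def h(t):
    """Binary entropy in nats, (21)."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log(t) - (1 - t) * math.log(1 - t)


def bisect(g, lo, hi, iters=100):
    """Root of g in [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    lo_positive = g(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def p_star(delta):
    """p*(a) of Lemma 11 for delta = a/n in (0, 1/2]."""
    target = h(delta) / delta
    if target <= 2:
        return 2.0
    # p - ln(p - 1) decreases from +infinity to 2 on (1, 2), so the root is unique.
    return bisect(lambda p: p - math.log(p - 1) - target, 1 + 1e-12, 2.0)


def bound_exponents(delta, p):
    """Leading-order (1/n) log of the bounds (61), (62) and (60), for 1 < p <= 2."""
    kkl = -0.5 * delta * math.log(p - 1)                 # (61)
    chord = (1 / p - 0.5) * h(delta)                     # (62), with C(n, a) ~ e^{n h(a/n)}
    ps = p_star(delta)
    if p > ps:
        interpolated = kkl
    else:
        s = (1 / p - 1 / ps) / (1 - 1 / ps)              # 1/p = (1 - s)/p* + s
        interpolated = 0.5 * s * h(delta) - 0.5 * (1 - s) * delta * math.log(ps - 1)
    return kkl, chord, interpolated


threshold = bisect(lambda d: h(d) - 2 * d, 0.2, 0.45)   # p* = 2 iff h(delta) <= 2 delta
print(round(threshold, 4))
for delta in (0.1, 0.2, 0.3, 0.4):
    print(delta, [round(e, 4) for e in bound_exponents(delta, 1.5)])
```

By construction the interpolated exponent never exceeds the smaller of the other two, which matches the description of (60) as a convexification of (61) and (62).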

[Figure 2: two panels, “Normalized exponent of $\|\Pi_a\|_{p\to 2}$” for $p = 1.5$ and $p = 1.2$; horizontal axis $a/n$, vertical axis $\frac1n\log\|\Pi_a\|_{p\to 2}$; curves: “Upper bound: KKL”, “Upper bound: interpolated”, “Lower bound: symmetric functions”.]

Fig. 2. Exponent of $\|\Pi_a\|_{p\to 2}$ as a function of $a$ for two values of $p$. The two upper bounds correspond to Kahn-Kalai-Linial (61) and the interpolated one (60). The lower bound is given by considering only permutation invariant functions (cf. Lemmas 10 and 12).

To verify the tightness of our bounds we derive a simple lower bound by considering permutation invariant functions:

Lemma 12: For any $a \in \{0, \ldots, n\}$ and any $q, p \ge 1$ we have
$$\|\Pi_a\|_{p\to q} \ge \frac{\|K_a\|_q\|K_a\|_{p'}}{\|K_a\|_2^2},$$
where $p' = \frac{p}{p-1}$ is the Hölder conjugate.

Proof: The lower bound is shown by optimizing over a class of permutation invariant functions
$$f(x) = K_a(x) + \sum_{j \ne a}c_jK_j(x) = K_a(x) + \Phi(x),$$
where $\Phi \perp K_a$. Note that
$$\begin{aligned}
\inf_{\Phi\perp K_a}\|f\|_p &= \inf_{\Phi\perp K_a}\ \sup_{g : \|g\|_{p'} \le 1}(K_a + \Phi, g) && (63)\\
&= \inf_{\Phi\perp K_a}\ \sup_{g\text{-sym.} : \|g\|_{p'} \le 1}(K_a + \Phi, g) && (64)\\
&= \sup_{g\text{-sym.} : \|g\|_{p'} \le 1}\ \inf_{\Phi\perp K_a}(K_a + \Phi, g) && (65)\\
&= \left(K_a, \frac{K_a}{\|K_a\|_{p'}}\right) && (66)\\
&= \frac{\|K_a\|_2^2}{\|K_a\|_{p'}},
\end{aligned}$$
where (63) is by duality $(L_p)^* = L_{p'}$, (64) states the obvious fact that the supremum can be restricted to permutation-symmetric $g$, (65) is by the von Neumann minimax theorem, and (66) holds because the inner infimum can only be finite if $g$ belongs to the one-dimensional subspace spanned by $K_a$, i.e. $g = cK_a$ for a suitable $c$. Since $\Pi_a(K_a + \Phi) = K_a$, we conclude that
$$\|\Pi_a\|_{p\to q} \ge \frac{\|K_a\|_q}{\inf_{\Phi\perp K_a}\|K_a + \Phi\|_p} = \frac{\|K_a\|_q\|K_a\|_{p'}}{\|K_a\|_2^2},$$
as claimed.
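Lemma 12 is easy to evaluate because $K_a$ depends only on Hamming weight, so $\|K_a\|_q^q = 2^{-n}\sum_w\binom{n}{w}|K_a(w)|^q$. The Python sketch below (function names are illustrative) computes the lower bound and compares it with (62). At $p = 1$, $q = 2$ the bound reduces to $\max|K_a|/\|K_a\|_2 = \binom{n}{a}/\sqrt{\binom{n}{a}}$, so it matches (59) exactly.

```python
import math


def krawtchouk(n, j, x):
    """K_j(x) from (24), exact integer arithmetic."""
    return sum((-1) ** k * math.comb(x, k) * math.comb(n - x, j - k)
               for k in range(min(j, x) + 1))


def kraw_norm(n, a, q):
    """Normalized L_q norm of K_a on F_2^n (q = math.inf gives max |K_a|)."""
    values = [abs(krawtchouk(n, a, w)) for w in range(n + 1)]
    if q == math.inf:
        return float(max(values))
    total = sum(math.comb(n, w) * float(v) ** q for w, v in enumerate(values))
    return (total / 2 ** n) ** (1.0 / q)


def lemma12_lower(n, a, p, q=2.0):
    """Lemma 12: ||K_a||_q ||K_a||_{p'} / ||K_a||_2^2, with p' = p / (p - 1)."""
    p_conj = math.inf if p == 1 else p / (p - 1)
    return kraw_norm(n, a, q) * kraw_norm(n, a, p_conj) / kraw_norm(n, a, 2) ** 2


n = 24
for a in (2, 6, 12):
    upper = math.comb(n, a) ** (1 / 1.5 - 0.5)        # (62) with p = 1.5
    print(a, lemma12_lower(n, a, 1.5), upper, math.sqrt(math.comb(n, a)))
```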

In Fig. 2 we compare the upper and lower bounds on $\|\Pi_a\|_{p\to 2}$ as $a$ ranges from 0 to $n/2$ for two values of $p$. We note that the KKL bound (61) is significantly suboptimal for small $p$ and large $a$. For example, for $a > 0.3093n$ the bound (62) is strictly better than KKL.

D. Proof of Theorem 1

Denote the boundary of $F$ as
$$p(\delta) \triangleq 1 + (1-2\delta)^2.$$
Note that every compact subset $K'$ of $F$ is contained in $F \cap \{p \ge p_0\}$ for sufficiently small $p_0$, and in turn in some
$$K = \left(F \cap \{\delta : |1-2\delta| \ge \theta\}\right) \cup \{(\delta, p) : |1-2\delta| \le \theta,\ p \ge p_0\} \tag{67}$$
for sufficiently small $\theta$. In particular, we may choose $\theta$ so small that $p_0 > 1 + \theta^2$. Next note that
$$(f * 1_{S_{n-a}})(x) = (f * 1_{S_a})(\bar x),$$
and thus estimates for $T_\delta$ and $T_{1-\delta}$ coincide asymptotically. Due to this symmetry and thanks to the monotonicity (18) of norms, to prove the theorem it is sufficient to prove the following pair of statements, corresponding to the boundary of $K$:

S1. (critical estimate for $\delta < 1/2$) For each $\delta$ there is $C_\delta$ such that for all $n \ge 1$ and all functions $f$ we have
$$\|T_\delta f\|_2 \le C_\delta\|f\|_{p(\delta)}, \tag{68}$$
and the function $\delta \mapsto C_\delta$ is bounded on each $[0, \Delta]$, $\Delta < 1/2$.

S2. (subcritical estimate around $\delta = 1/2$) For any $p > 1$ and sufficiently small $\theta$ (in particular, $p > 1 + \theta^2$) there is $C$ such that for all $\delta \in [(1-\theta)/2, 1/2]$, $n \ge 1$ and functions $f$ we have
$$\|T_\delta f\|_2 \le C\|f\|_p. \tag{69}$$

[Figure 3: two panels, “Asymptotic spectra of $T_\delta$ and $N_\delta$” for $\delta = 0.1$ and $\delta = 0.25$; horizontal axis $a/n$, vertical axis $(1/n)\log$(Fourier coef.); curves: “Bernoulli noise $N_\delta$”, “Spherical average $T_\delta$” and, in the $\delta = 0.25$ panel, “Fourier projection norm $\|\Pi_a\|_{p(\delta)\to 2}$”.]

Fig. 3. Comparison of exponents of the $a$-th eigenvalue of $T_\delta$ and $N_\delta$. For larger $\delta$ we also show the negative of the exponent of $\|\Pi_a\|_{p(\delta)\to 2}$, $p(\delta) = 1 + (1-2\delta)^2$. As before, asterisks denote the critical value $\xi_{crit}(\delta)$, i.e. the smallest root of the Krawtchouk polynomial $K_{\delta n}(\cdot)$.

First we show S1. In accordance with (24),
$$\|T_\delta f\|_2^2 = \sum_{a=0}^n\left|\frac{K_{\delta n}(a)}{K_{\delta n}(0)}\right|^2\|f_a\|_2^2, \tag{70}$$
where we denoted $f_a = \Pi_a f$. The scheme of our proof is illustrated by Fig. 3:
1) First, we show that the summation in (70) can be truncated to $a \le \frac n2$.
2) Second, we show that for small values of $\delta$ the eigenvalues of $T_\delta$ are upper-bounded by a constant multiple of the eigenvalues of $N_\delta$ defined in (3). This is the content of Lemma 8.
3) Third, for larger values of $\delta$ we show that although the eigenvalues of $T_\delta$ can be exponentially larger than those of $N_\delta$, such eigenvalues correspond to large $a$, for which $\frac{\|f_a\|_2}{\|f\|_p}$ is exponentially smaller.

For the first step note that any $f$ can be written as $f = f_{even} + f_{odd}$, where each of the summands is supported on vectors $x \in \mathbb{F}_2^n$ of even/odd weight. Note that $T_\delta f_{even}$ and $T_\delta f_{odd}$ are also of opposite parity. Thus,
$$\|T_\delta f\|_2^2 = \|T_\delta f_{even}\|_2^2 + \|T_\delta f_{odd}\|_2^2.$$

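On the other hand, we have
$$\begin{aligned}
\left(\|f_{even}\|_p^2 + \|f_{odd}\|_p^2\right)^{\frac12} &\le \left\|\sqrt{f_{even}^2 + f_{odd}^2}\right\|_p && (71)\\
&= \|f\|_p, && (72)
\end{aligned}$$
where (71) is from Minkowski's inequality (in its reverse form for the $L_{p/2}$ quasi-norm, $p/2 \le 1$, applied to $f_{even}^2$ and $f_{odd}^2$) and (72) holds because the supports of $f_{even}$ and $f_{odd}$ are disjoint (we also assume, without loss of generality, that $f \ge 0$). Thus, if (68) is established for both odd and even functions, then (68) follows for all functions with the same constant $C$. Note that for both odd and even functions we have
$$|\hat f(\omega)| = |\pm\hat f(\bar\omega)| = |\hat f(\bar\omega)|,$$
and for any such $f$, from (70) and (25), we get
$$\|T_\delta f\|_2^2 \le 2\sum_{0 \le a \le n/2}\left|\frac{K_{\delta n}(a)}{K_{\delta n}(0)}\right|^2\|f_a\|_2^2. \tag{73}$$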
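The identity (70) says that $T_\delta$ is diagonal in the character basis, with eigenvalue $K_r(|w|)/K_r(0)$ on $\chi_w$, where $r$ is the sphere radius. The brute-force Python sketch below checks (70) and the parity-reduced bound (73) for a random even function. It is for small $n$ only, and computing $\hat f$ with a fast Walsh-Hadamard transform is our own implementation choice.

```python
import math
import random


def popcount(x):
    return bin(x).count("1")


def fwht(values):
    """Unnormalized Walsh-Hadamard transform: out[w] = sum_x (-1)^<w, x> values[x] = f_hat(w)."""
    a = list(values)
    step = 1
    while step < len(a):
        for start in range(0, len(a), 2 * step):
            for i in range(start, start + step):
                a[i], a[i + step] = a[i] + a[i + step], a[i] - a[i + step]
        step *= 2
    return a


def krawtchouk(n, j, x):
    """K_j(x) from (24), exact integer arithmetic."""
    return sum((-1) ** k * math.comb(x, k) * math.comb(n - x, j - k)
               for k in range(min(j, x) + 1))


def spectral_check(n, r, seed=1):
    """Return ||T f||_2^2, the right side of (70), and the right side of (73) for a random even f."""
    rng = random.Random(seed)
    size = 2 ** n
    f = [rng.random() if popcount(x) % 2 == 0 else 0.0 for x in range(size)]
    sphere = [y for y in range(size) if popcount(y) == r]
    t_f = [sum(f[x ^ y] for y in sphere) / len(sphere) for x in range(size)]
    lhs = sum(v * v for v in t_f) / size
    energy = [0.0] * (n + 1)       # energy[a] = ||f_a||_2^2 = sum_{|w|=a} |(f, chi_w)|^2
    for w, coeff in enumerate(fwht(f)):
        energy[popcount(w)] += (coeff / size) ** 2
    eig = [krawtchouk(n, r, a) / math.comb(n, r) for a in range(n + 1)]
    rhs70 = sum(eig[a] ** 2 * energy[a] for a in range(n + 1))
    rhs73 = 2 * sum(eig[a] ** 2 * energy[a] for a in range(n // 2 + 1))
    return lhs, rhs70, rhs73


print(spectral_check(9, 3))
```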

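In the remainder we show that (73) is upper-bounded by $C\|f\|_{p(\delta)}^2$ uniformly in $f$ and $\delta \le \Delta < 1/2$. For all $\delta \in [0, \delta_0]$, from Lemma 8 we have
$$\begin{aligned}
\|T_\delta f\|_2^2 &\le 2C_1^2\sum_{0 \le a \le n/2}(1-2\delta)^{2a}\|f_a\|_2^2 && (74)\\
&\le 2C_1^2\sum_{0 \le a \le n}(1-2\delta)^{2a}\|f_a\|_2^2 && (75)\\
&= 2C_1^2\|N_\delta f\|_2^2 && (76)\\
&\le 2C_1^2\|f\|_{p(\delta)}^2, && (77)
\end{aligned}$$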

where the last step follows from Bonami-Gross (4). For $\delta \in [\delta_0, \Delta]$ we have from Lemma 6
$$\left|\frac{K_{\delta n}(a)}{K_{\delta n}(0)}\right| \le (1-2\delta)^a, \qquad 0 \le a \le n\,\xi_{crit}(\delta). \tag{78}$$
On the other hand, for $a \in [n\,\xi_{crit}(\delta), n/2]$ we have the following estimate:

Lemma 13: Fix arbitrary $0 < \delta_0 < \Delta < 1/2$. Then there exist constants $C_1, C_2 > 0$ such that for all $n \ge 1$, all $j \in [\delta_0 n, \Delta n]$ and all
$$\frac n2 - \sqrt{j(n-j)} \le x \le \frac n2 + \sqrt{j(n-j)}$$
we have
$$\left|\frac{K_j(x)}{K_j(0)}\right|\cdot\|\Pi_x\|_{p(\frac jn)\to 2} \le C_1\sqrt{n}\,e^{-C_2 n}, \tag{79}$$

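where $p(\delta) = 1 + (1-2\delta)^2$. The lemma is proved at the end of the section. Putting together (78) and (79) we get, similarly to (77):
$$\begin{aligned}
\|T_\delta f\|_2^2 &\le 2\|N_\delta f\|_2^2 + 2\|f\|_{p(\delta)}^2\sum_{a \in [n\xi_{crit}(\delta), n/2]}(C_1)^2ne^{-2C_2 n} && (80)\\
&\le 2\|N_\delta f\|_2^2 + 2(C_1)^2\|f\|_{p(\delta)}^2\cdot n^2e^{-2C_2 n} && (81)\\
&\le 2\left(1 + (C_1)^2n^2e^{-2C_2 n}\right)\|f\|_{p(\delta)}^2, && (82)
\end{aligned}$$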

where in the last step we applied (4). Since the constants $C_1$ and $C_2$ depend only on $\delta_0$ and $\Delta$, this finishes the proof of (68) and of statement S1.

We proceed to statement S2. Showing (69) is significantly simpler, since this time $p > p(\delta)$. Take $\theta_1 = \sqrt{p-1} > \theta$ and $\delta_1 = \frac{1-\theta_1}{2}$. Then, for all
$$0 \le a \le n\xi_1 \triangleq n\frac{\theta_1(\theta_1 - \theta)}{1 + \theta_1^2}$$
and all $\delta \in [\frac{1-\theta}{2}, \frac12]$, we have from Lemma 7:
$$\left|\frac{K_j(a)}{K_j(0)}\right| \le (1-2\delta_1)^a.$$
Thus, from (4) we get
$$\sum_{a \in [0, n\xi_1]}\left|\frac{K_j(a)}{K_j(0)}\right|^2\|f_a\|_2^2 \le \|N_{\delta_1}f\|_2^2 \le \|f\|_p^2. \tag{83}$$

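On the other hand, for $a > n\xi_1$ we have, for some $C_1, E > 0$:
$$\left|\frac{K_j(a)}{K_j(0)}\right|\cdot\frac{\|f_a\|_2}{\|f\|_p} \le C_1\sqrt{n}\,e^{-nE}, \qquad \forall a \in \left[n\xi_1, \frac n2\right]. \tag{84}$$
Indeed, from Lemma 4 and (62) the exponent of the left-hand side of (84) is upper-bounded by
$$\frac12\left(\ln 2 - h(\delta)\right) + \left(\frac1p - 1\right)h(\xi), \qquad \xi \triangleq \frac an.$$
The largest value is attained when $\delta = \frac{1-\theta}{2}$ and $\xi = \xi_1$, yielding
$$\frac12\left(\ln 2 - h(\delta)\right) + \left(\frac1p - 1\right)h(\xi) \le \frac12\left(\ln 2 - h\left(\frac{1-\theta}{2}\right)\right) + \left(\frac1p - 1\right)h\left(\frac{\theta_1(\theta_1 - \theta)}{1 + \theta_1^2}\right).$$
Since $p > 1$, the function on the right-hand side becomes negative as $\theta \to 0$. Thus the exponent of the left-hand side in (84) is negative for sufficiently small $\theta$.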
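The last display can be evaluated directly. In the Python sketch below, the choice $p = 1.2$ and the grid of $\theta$ values are arbitrary. The bound is negative once $\theta$ is small but positive for $\theta$ close to $\theta_1 = \sqrt{p-1}$, which is why $\theta$ must be taken sufficiently small.

```python
import math


def h(t):
    """Binary entropy in nats, (21)."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log(t) - (1 - t) * math.log(1 - t)


def s2_exponent_bound(p, theta):
    """Right-hand side of the last display: the exponent bound at delta = (1-theta)/2, xi = xi_1."""
    theta1 = math.sqrt(p - 1)
    assert 0 < theta < theta1, "the argument uses theta < theta_1 = sqrt(p - 1)"
    xi1 = theta1 * (theta1 - theta) / (1 + theta1 ** 2)
    return 0.5 * (math.log(2) - h((1 - theta) / 2)) + (1 / p - 1) * h(xi1)


p = 1.2
for theta in (0.01, 0.1, 0.2, 0.3, 0.4, 0.44):
    print(theta, round(s2_exponent_bound(p, theta), 4))
```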


Estimating the sum in (73) via (83) and (84) we get, similarly to (82),
$$\|T_\delta f\|_2^2 \le 2\left(1 + (C_1)^2n^2e^{-2En}\right)\|f\|_p^2 \qquad \forall\delta \in \left[\frac{1-\theta}{2}, \frac12\right].$$

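This completes the proof of (69) and statement S2.

We proceed to lower bounds on $\|T_\delta\|_{p\to 2}$. To show (8) consider the function
$$f(x) = \prod_{j=1}^n(1 + \epsilon\chi_j(x)) = \sum_{t=0}^n(1+\epsilon)^{n-t}(1-\epsilon)^t1_{S_t}(x) = \sum_{k=0}^n\epsilon^kK_k(x).$$
On one hand,
$$\|f\|_p = \left(\frac{(1+\epsilon)^p}{2} + \frac{(1-\epsilon)^p}{2}\right)^{\frac np} \tag{85}$$
$$= e^{n\left(\frac{p-1}{2}\epsilon^2 + o(\epsilon^2)\right)}, \qquad \epsilon \to 0. \tag{86}$$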
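Here is a brute-force Python sketch of (85) and (86) for the product function $f$ (helper names are ours). It compares the exact $L_p$ norm with (85). It then confirms that the Fourier coefficients $(f, \chi_w) = \epsilon^{|w|}$ match the expansion $f = \sum_k\epsilon^kK_k$. Finally, it evaluates the small-$\epsilon$ ratio behind (86).

```python
import math


def popcount(x):
    return bin(x).count("1")


def fwht(values):
    """Unnormalized Walsh-Hadamard transform: out[w] = sum_x (-1)^<w, x> values[x]."""
    a = list(values)
    step = 1
    while step < len(a):
        for start in range(0, len(a), 2 * step):
            for i in range(start, start + step):
                a[i], a[i + step] = a[i] + a[i + step], a[i] - a[i + step]
        step *= 2
    return a


def product_function(n, eps):
    """f(x) = prod_j (1 + eps chi_j(x)) = (1 + eps)^(n - |x|) (1 - eps)^|x|."""
    return [(1 + eps) ** (n - popcount(x)) * (1 - eps) ** popcount(x) for x in range(2 ** n)]


def norm(f, p):
    """Normalized norm (1)."""
    return (sum(abs(v) ** p for v in f) / len(f)) ** (1.0 / p)


n, eps = 10, 0.3
f = product_function(n, eps)
for p in (1.2, 1.5, 2.0):
    closed_form = ((1 + eps) ** p / 2 + (1 - eps) ** p / 2) ** (n / p)    # (85)
    print(p, norm(f, p), closed_form)

# Fourier coefficients (f, chi_w) = eps^|w|, matching f = sum_k eps^k K_k.
coeffs = [c / 2 ** n for c in fwht(f)]

# Small-eps behaviour (86): (1/n) log ||f||_p is about (p - 1) eps^2 / 2.
p, e = 1.5, 1e-3
print(math.log((1 + e) ** p / 2 + (1 - e) ** p / 2) / p / e ** 2, (p - 1) / 2)
```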

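On the other hand, from Lemma 4 and (33) we have
$$\|T_\delta f\|_2^2 = \sum_{a=0}^n e^{2n\left(E_\delta(\frac an) - h(\delta) + \frac an\ln\epsilon + \frac12h(\frac an)\right) + o(n)}, \tag{87}$$
where we also used
$$\|f_a\|_2 = \epsilon^a\binom{n}{a}^{\frac12} = e^{a\ln\epsilon + \frac n2h(\frac an) + o(n)}.$$
For convenience, set $\xi = \frac an$. Then it is not hard to show from (31) that
$$E_\delta(\xi) - h(\delta) = \xi\ln(1-2\delta) + o(\xi).$$
Then setting $\xi = \epsilon^2(1-2\delta)^2$ we find that
$$E_\delta(\xi) - h(\delta) + \xi\ln\epsilon + \frac12h(\xi) = \frac{(1-2\delta)^2}{2}\epsilon^2 + o(\epsilon^2), \qquad \epsilon \to 0.$$
Thus from (87) and (86) we get
$$\liminf_{n\to\infty}\frac1n\ln\frac{\|T_\delta f\|_2}{\|f\|_p} \ge \frac{(1-2\delta)^2 - (p-1)}{2}\epsilon^2 + o(\epsilon^2).$$
Evidently, for $p < 1 + (1-2\delta)^2$ the norm $\|T_\delta\|_{p\to 2}$ grows exponentially in dimension. Finally, estimate (9) follows from Young's inequality (19):
$$\begin{aligned}
\|T_{1/2}f\|_2 &\le 2^n\|f\|_1\frac{\|1_{S_{n/2}}\|_2}{|S_{n/2}|} && (88)\\
&= 2^n\cdot 2^{-n/2}\binom{n}{\lfloor n/2\rfloor}^{-1/2}\|f\|_1 && (89)\\
&= (1 + o(1))\left(\frac{\pi n}{2}\right)^{\frac14}\|f\|_1. && (90)
\end{aligned}$$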

This upper bound is tight, as $f(x) = 1\{x = 0\}$ shows.

Proof of Lemma 13: Let $\xi = \frac an$ and $\delta = j/n$. Since $\xi$ is restricted to the critical strip of the Krawtchouk polynomial $K_{\delta n}$, from Lemma 4, bound (23) and Lemma 11 it is sufficient to show
$$\max_{\delta_0 \le \delta \le \Delta}\ \max_{\xi : (1-2\xi)^2 + (1-2\delta)^2 \le 1}\ \frac12\left(\ln 2 - h(\xi) - h(\delta)\right) + \pi(p(\delta), \xi) \le -C_2 < 0, \tag{91}$$
where $p(\delta) = 1 + (1-2\delta)^2$ and $\frac1p \mapsto \pi(p, \xi)$ is the convexification of the function (cf. Lemma 11)
$$\frac1p \mapsto \min\left(-\frac{\xi}{2}\ln(p-1),\ \left(\frac1p - \frac12\right)h(\xi)\right). \tag{92}$$

To show (91) we first reparameterize the problem in terms of $p$. Set
$$p_0 = 1 + (1-2\Delta)^2, \tag{93}$$
$$p_1 = 1 + (1-2\delta_0)^2. \tag{94}$$
Then (91) is equivalent to (we also interchange the maxima in $\xi$ and $\delta$):
$$\max_{\xi : (1-2\xi)^2 \le 2 - p_0}\ \max_{p : p_0 \le p \le \min(p_1, 2 - (1-2\xi)^2)}\ \eta(\xi, p) + \frac{\ln 2 - h(\xi)}{2} \le -C_2 < 0, \tag{95}$$
where
$$\eta(\xi, p) \triangleq \pi(p, \xi) - \frac12h\left(\frac{1 - \sqrt{p-1}}{2}\right).$$
By construction, $\frac1p \mapsto \pi(p, \xi)$ is convex. A simple verification shows that the second term $h(\cdots)$ is concave in $\frac1p$. Thus, the maximization over $p$ in (95) is applied to a convex function and must therefore be achieved at one of the boundaries. Consequently, verifying (95) is equivalent to showing the following three strict inequalities, the maximum of which is taken to be $-C_2$:
$\ln 2 - h(\xi)$