Analytical Approach to Parallel Repetition
arXiv:1305.1979v2 [cs.CC] 15 May 2013
Irit Dinur∗
David Steurer†
May 16, 2013
Abstract

We propose an analytical framework for studying parallel repetition, a basic product operation for one-round two-player games. In this framework, we consider a relaxation of the value of a game, val_+, and prove that for projection games, it is both multiplicative (under parallel repetition) and a good approximation for the true value. These two properties imply a parallel repetition bound as

    val(G^{⊗k}) ≈ val_+(G^{⊗k}) = val_+(G)^k ≈ val(G)^k.

Using this framework, we can also give a short proof of the NP-hardness of label cover(1, δ) for all δ > 0, starting from the basic PCP theorem.

We prove the following new results:

– A parallel repetition bound for projection games with low soundness. Previously, it was not known whether parallel repetition decreases the value of such games. This result implies stronger inapproximability bounds for set cover and label cover.
– An improved bound for few parallel repetitions of projection games, showing that Raz's counterexample is tight even for a small number of repetitions.

Our techniques also allow us to bound the value of the direct product of multiple games, namely, a bound on val(G₁ ⊗ · · · ⊗ G_k) for different projection games G₁, . . . , G_k.

Keywords: parallel repetition, one-round two-player games, label cover, set cover, hardness of approximation, copositive programming, operator norms.
∗ Department of Computer Science and Applied Mathematics, Weizmann Institute. Part of this work was done at Microsoft Research New England and the Radcliffe Institute for Advanced Study.
† Computer Science Department, Cornell University. Part of this work was done at Microsoft Research New England.
Contents

1 Introduction
  1.1 Our Contribution
  1.2 Related work
  1.3 Organization
2 Technique
  2.1 Label Cover, Games, and Linear Operators
  2.2 High-Level Proof Sketch
3 Warm-up: Gap Amplification for Label Cover
4 Analytical Setup
5 Parallel Repetition via a Relaxation
  5.1 Relaxed Value
  5.2 Multiplicativity
  5.3 Approximation
  5.4 Proof of Theorem 1.1 and Theorem 1.5
6 Few Repetitions – Proof of Theorem 1.4
7 Inapproximability Results
  7.1 label cover
  7.2 set cover
8 Conclusions
References
A Additional Proofs
1 Introduction

A one-round two-player game G consists of a bipartite graph¹ with vertex sets U, V and edge set E, and a constraint π_{uv} ⊆ Σ × Σ per edge uv ∈ E. The optimization goal is to find assignments f : U → Σ and g : V → Σ that satisfy as many of the constraints as possible (where a constraint π_{uv} is satisfied if (f(u), g(v)) ∈ π_{uv}). The value of the game G is the fraction of constraints satisfied by optimal assignments,

    val(G) = max_{f,g} P_{uv∈E} { (f(u), g(v)) ∈ π_{uv} }.
The term one-round two-player game stems from the following scenario: A referee interacts with two players, Alice and Bob. Alice has a strategy f : U → Σ, and Bob a strategy g : V → Σ. The referee selects a random edge uv ∈ E and sends u as a question to Alice, and v as a question to Bob. Alice responds with f(u) and Bob with g(v). They succeed if their answers satisfy (f(u), g(v)) ∈ π_{uv}. In the k-fold parallel repetition G^{⊗k}, the referee selects k edges u₁v₁, . . . , u_kv_k ∈ E independently at random and sends the question tuple u₁, . . . , u_k to Alice, and v₁, . . . , v_k to Bob. Each player responds with a k-tuple of answers, and they succeed if their answers satisfy each of the k constraints π_{u₁v₁}, . . . , π_{u_kv_k}.

Parallel repetition is a basic product operation on games, and yet its effect on the game value is far from obvious. The celebrated parallel repetition theorem of Raz [Raz98] bounds the value of G^{⊗k} by a function of the value of G that decays exponentially with the number of repetitions. Raz's proof has since been simplified, giving stronger and sometimes tight bounds [Hol09, Rao11]. However, some fundamental questions remained open, for example, bounding the value of G^{⊗k} in terms of the value of G when this value is close to 0. In this work, we establish such a bound, and it allows us to finally prove an optimal NP-hardness of approximating set cover, relying on several previous works [Fei98, MR10, Mos12]. It stands to reason that a better understanding of direct products will lead to further inapproximability results. Some basic open questions include repetitions of multi-player games, quantum games, and direct sums of games.
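To make the definitions concrete, here is a small brute-force sketch (a hypothetical toy game with helper names of our own choosing, not code from the paper) that computes val(G) and val(G^{⊗2}) and checks the two trivial bounds val(G)² ≤ val(G^{⊗2}) ≤ val(G) that any repetition theorem must respect:

```python
import itertools

# A toy projection game (hypothetical example): U = V = {0, 1}, alphabet {0, 1},
# equality constraints on three edges and an inequality constraint on the
# fourth, so that no strategy pair can win all four edges and val(G) = 3/4.
Sigma = (0, 1)
E = [(0, 0), (0, 1), (1, 0), (1, 1)]
pi = {e: {(a, a) for a in Sigma} for e in E}
pi[(1, 1)] = {(0, 1), (1, 0)}  # the "twisted" edge forcing val(G) < 1

def val(E, pi, Sigma):
    """val(G) by brute force over all strategy pairs f: U -> Sigma, g: V -> Sigma."""
    U = sorted({u for u, _ in E})
    V = sorted({v for _, v in E})
    best = 0.0
    for fa in itertools.product(Sigma, repeat=len(U)):
        f = dict(zip(U, fa))
        for ga in itertools.product(Sigma, repeat=len(V)):
            g = dict(zip(V, ga))
            best = max(best, sum((f[u], g[v]) in pi[u, v] for u, v in E) / len(E))
    return best

def repeat(E, pi, Sigma, k):
    """The k-fold repetition: question tuples, answer tuples, and a constraint
    tuple that must be satisfied in every coordinate simultaneously."""
    Sk = tuple(itertools.product(Sigma, repeat=k))
    Ek, pik = [], {}
    for es in itertools.product(E, repeat=k):
        u = tuple(e[0] for e in es)
        v = tuple(e[1] for e in es)
        Ek.append((u, v))
        pik[u, v] = {(a, b) for a in Sk for b in Sk
                     if all((a[i], b[i]) in pi[es[i]] for i in range(k))}
    return Ek, pik, Sk

v1 = val(E, pi, Sigma)                 # val(G) = 3/4 for this game
E2, pi2, S2 = repeat(E, pi, Sigma, k=2)
v2 = val(E2, pi2, S2)
assert v1 ** 2 <= v2 <= v1             # product strategies / marginal strategies
```

The interesting question, which the repetition theorems below answer, is how quickly v2, v3, . . . approach the product bound v1^k.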
1.1 Our Contribution Our main contribution is a new analytical framework for studying parallel repetitions of games. We express games in terms of matrices or linear operators. This correspondence between matrices and games yields interesting analogies between operations on matrices and operations on games, as well as between norms on matrices and values of games. It is very convenient that parallel repetition of a 1
¹ Later, we will also allow non-negative weights on the edges. The definitions and results extend in the expected way.
game directly corresponds to the k-fold tensor power of its matrix. Additionally, the value of a game can be described as a certain norm of the corresponding operator. A somewhat surprising connection is that multiplication of two matrices can be interpreted as a certain serial concatenation of two games. This operation on games seems new and merits further investigation. In our proof, this operation allows us to focus on one coordinate of the repeated game while abstracting the remaining coordinates through appropriate matrix multiplication.

In this framework, we consider a relaxation of the value of a game, val_+(G), and prove that it has the following two properties for projection games:²

    Approximation:     val_+(G) ≈ val(G),
    Multiplicativity:  val_+(G^{⊗k}) = (val_+(G))^k.

The combination of these two properties yields a parallel repetition bound,

    val(G^{⊗k}) ≈ val_+(G^{⊗k}) = val_+(G)^k ≈ val(G)^k.

We also derive a particularly short proof of a parallel repetition bound for a subclass of games, namely expanding projection games. This class of games is rich enough for the main application of parallel repetition: NP-hardness of label cover with perfect completeness and soundness close to 0 (the starting point for most hardness of approximation results). See Section 3.

Next, we list some new results that are obtained by studying parallel repetition in this framework.

Repetition of small-value games. Our first result shows that if the initial game G has value ρ that is possibly sub-constant, then the value of the repeated game still decreases exponentially with the number of repetitions.

Theorem 1.1 (Repetition of games with small value). For any projection game G with val(G) ≤ ρ,

    val(G^{⊗k}) ≤ (2√ρ)^k.

The theorem follows by showing that val_+(G) approximates val(G) even for small values of ρ. This parallel repetition bound allows us to prove NP-hardness for label cover that is better than was previously known (see Theorem 7.4).
One particular corollary is:

Corollary 1.2 (NP-hardness for label cover). For every constant c > 0, given a label cover instance of size n with alphabet size at most n, it is NP-hard to decide if its value is 1 or at most ε = 1/(log n)^c.

² In a projection game, for any two questions u and v to the players and any answer β of Bob, there exists at most one acceptable answer α for Alice. The approximation property of val_+ holds for general games. See Section 5.3.
This hardness result is precisely the missing component that allows us to lift Feige's well-known quasi NP-hardness for ln n-approximating set cover [Fei98] to a proper NP-hardness. (Here, quasi NP-hardness means hardness with respect to quasipolynomial-time reductions.) Our improved hardness for label cover, combined with the previous work of [Fei98] and using the reduction of [Mos12], shows the following hardness result for set cover.

Corollary 1.3 (Tight NP-hardness for approximating set cover). For every α > 0, it is NP-hard to approximate set cover to within (1 − α) ln n, where n is the size of the instance. The reduction runs in time n^{O(1/α)}.

Unlike the previous quasi NP-hardness results for set cover, Corollary 1.3 rules out that approximation ratios of (1 − α) ln n can be achieved in time 2^{n^{o(1)}} (unless NP ⊆ TIME(2^{n^{o(1)}})). We remark that Corollary 1.2 is still far from the known algorithms for label cover, and it is an interesting open question to determine the correct tradeoff between ε and the alphabet size.

Few repetitions. Parallel repetition is usually studied in the case where the number of repetitions k is large compared to 1/ε (where, as usual, val(G) = 1 − ε). We address the question of "few repetitions," when k is any number of repetitions, and in particular possibly much smaller than 1/ε.

Theorem 1.4 (Few repetitions). Let G be a projection game with val(G) = 1 − ε. Then for all k ≪ 1/ε²,

    val(G^{⊗k}) ≤ 1 − Ω(√k · ε).

For small values of k, the previous best parallel repetition bound for projection games was roughly (1 − ε²)^k ≥ 1 − kε² [Rao11]. The above bound is stronger when k ≪ 1/ε². Raz, in his counterexample to strong parallel repetition [Raz08], exhibited a game with value 1 − ε such that the value of the k-fold repeated game is at least 1 − O(√k · ε). Our bound matches Raz's bound even for small values of k, thereby confirming a conjecture of Ryan O'Donnell³ and extending the work of [FKO07], who proved such a bound for the odd-cycle game.
Parallel product of different games. Our techniques allow us to bound the value of G₁ ⊗ · · · ⊗ G_k for different projection games G₁, . . . , G_k.

Theorem 1.5 (Product of different games). Let G₁, . . . , G_k be projection games. Then

    val(G₁ ⊗ · · · ⊗ G_k) ≤ ∏_{i=1}^{k} (2√δ_i / (1 + δ_i)),

where δ_i = val(G_i).
³ Personal communication, 2012.
We note that an obstacle to obtaining such a statement with the previous approaches of [Raz98, Rao11, Hol09] is the fact that their proofs have an inductive step that involves selecting at random a coordinate i ∈ [k] on which the argument proceeds. There is a separate claim showing that most coordinates "will work," but this necessarily ignores a constant fraction of the games G_i. Our analysis, in contrast, involves no such random step, so every coordinate causes the appropriate decrease in the value of the product game. Finally, we note that this statement captures Theorem 1.1 as well as the optimal bound of [Rao11] for projection games with value close to 1.
1.2 Related work

Already in [FL92], Feige and Lovász proposed to study parallel repetition via a relaxation of the game value. Their relaxation is defined as the optimal value of a semidefinite program. While this relaxation is multiplicative, it does not provide a good approximation for the game value.⁴ In particular, the value of this relaxation can be 1 even if the game value is close to 0. The proof that the Feige–Lovász relaxation is multiplicative uses semidefinite programming duality (similar to [Lov79]). In contrast, we prove the multiplicativity of val_+ in a direct way.⁵

For unique two-player games, Barak et al. [BHH+08] introduced a new relaxation, called Hellinger value, and showed that this relaxation provides a good approximation to both the game value and the value of the Feige–Lovász relaxation (see [Ste10a] for improved approximation bounds). These quantitative relationships between game value, Hellinger value, and the Feige–Lovász relaxation lead to counterexamples to "strong parallel repetition," generalizing [Raz08]. The relaxation val_+ is a natural extension of the Hellinger value to projection games (and even general games). Our proof that val_+ satisfies the approximation property for projection games follows the approach of [BHH+08]. The proof is more involved because, unlike for unique games, val_+ is no longer easily expressed in terms of Hellinger distances. Another difference from [BHH+08] is that we need to establish the approximation property also when the game's value is close to 0. This case turns out to be related to Cheeger-type inequalities in the near-perfect expansion regime [Ste10b].
1.3 Organization

In Section 2 we describe the analytic framework in which we study games. In Section 3 we give a relatively simple proof of a parallel repetition theorem for expanding projection games. This proof gives a taste of our techniques in a
⁴ In fact, no polynomial-time computable relaxation can provide a good approximation for the game value unless P = NP, because the game value is NP-hard to approximate.
⁵ The relaxation val_+ can be defined as a convex program. However, it turns out that, unlike for semidefinite programs, the dual objects are not closed under tensor products.
simplified setting, and gives gap amplification for label cover, thereby proving the NP-hardness of label cover(1, δ). In Section 5 we define val_+ and prove the approximation and multiplicativity properties, followed by a proof of Theorem 1.1 and Theorem 1.5. In Section 6 we analyze parallel repetition with few repetitions, proving Theorem 1.4. We prove Corollary 1.2 and related hardness results for label cover and set cover in Section 7.
2 Technique

2.1 Label Cover, Games, and Linear Operators

A one-round two-prover game G is given by a bipartite graph (U, V, E) and an alphabet Σ. Each edge uv ∈ E comes with a constraint π_{uv} ⊆ Σ². We say G is a projection game if every π_{uv} is a projection constraint: each β has a unique α for which (α, β) ∈ π_{uv}. The value of G is defined as

    val(G) = max_{f,g} P_{uv∼E} { (f(u), g(v)) ∈ π_{uv} }.
The computational problem of approximating the value of a two-prover game is called label cover. Concretely, label cover(1, δ) is the problem of distinguishing whether a given projection game G has val(G) = 1 or val(G) ≤ δ.

We identify the two-prover game G with a linear operator from R^{V×Σ} to R^{U×Σ}, given by the matrix

    G((u, α), (v, β)) = 1/d   if uv ∈ E and (α, β) ∈ π_{uv},
                        0     otherwise.

Here, we assume that the vertices u ∈ U have degree d in the constraint graph of G. (We will also assume that the vertices in V all have the same degree d′ in the constraint graph of G.) The action of this operator can be written explicitly as

    G f(u, α) = E_{v | uv∼E} Σ_{β ∈ π_{uv}^{-1}(α)} f(v, β).
We define the label-extended graph of a game to be the bipartite graph whose incidence matrix is the matrix of G. Namely, for each u or v this graph has a "cloud" of vertices indexed by Σ, and there is an edge between (u, α) and (v, β) if and only if (α, β) ∈ π_{uv}.

We identify an assignment for a vertex set W with a vector f ∈ R^{W×Σ} such that f(w, γ) = 1 if w is assigned the label γ ∈ Σ and f(w, γ) = 0 otherwise. Given two assignments f ∈ R^{V×Σ} and g ∈ R^{U×Σ}, the fraction of satisfied constraints is exactly ⟨g, Gf⟩, where we define the inner product with respect to the uniform probability measure on U and the counting measure on Σ. We remark that this setup gives a description of the value of the game as max_{f,g} ⟨g, Gf⟩, which is quite similar to the largest eigenvalue of the matrix, except that the maximum is taken only over positive vectors for which Σ_β f(v, β) = 1 for all v.

Given an assignment f, the norm ‖Gf‖² is a reasonable measure of the maximal success probability of the game using f.

Claim 2.1. Let G be a projection game, viewed as an operator G : R^{V×Σ} → R^{U×Σ}. Then val(G)² ≤ max_f ‖Gf‖² ≤ val(G), where the maximum is over assignments.

Proof. For the second inequality, let f be an assignment for G. Then

    ‖Gf‖² = ⟨Gf, Gf⟩ ≤ max_g ⟨g, Gf⟩ = val(G).
On the other hand, the best choice of g is the one that, for each u ∈ U, chooses an α maximizing Gf(u, α). So if f is a best assignment,

    val(G)² = ⟨g, Gf⟩² ≤ ‖g‖² · ‖Gf‖² = ‖Gf‖². □

Given two games G : R^{V×Σ} → R^{U×Σ} and H : R^{V′×Σ′} → R^{U′×Σ′}, define the product game G ⊗ H : R^{V×V′×Σ×Σ′} → R^{U×U′×Σ×Σ′} by the tensor product of the respective matrices. More explicitly,

    G ⊗ H(u, u′, α, α′, v, v′, β, β′) = G(u, α, v, β) · H(u′, α′, v′, β′).

We define G^{⊗k} = G ⊗ G^{⊗(k−1)} inductively in the obvious way.
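As a concrete sanity check of this correspondence (a sketch on a toy game of our own choosing, not code from the paper), the matrix view reproduces the fraction of satisfied constraints, the sandwich of Claim 2.1, and the tensor structure of the product game:

```python
import itertools
import numpy as np

# Toy projection game (hypothetical example): U = V = {0,1}, Sigma = {0,1},
# three equality edges and one inequality edge, so val(G) = 3/4.
Sigma = (0, 1)
E = [(0, 0), (0, 1), (1, 0), (1, 1)]
pi = {e: {(a, a) for a in Sigma} for e in E}
pi[(1, 1)] = {(0, 1), (1, 0)}
n, s, d = 2, 2, 2                  # |U| = |V| = n, |Sigma| = s, degree d

# G((u,alpha),(v,beta)) = 1/d if uv is an edge and (alpha,beta) in pi_uv
G = np.zeros((n * s, n * s))
for (u, v) in E:
    for (a, b) in pi[(u, v)]:
        G[u * s + a, v * s + b] = 1 / d

def indicator(x):
    """0/1 vector of an assignment x: vertex -> label."""
    f = np.zeros(n * s)
    for w, c in enumerate(x):
        f[w * s + c] = 1.0
    return f

# <g, Gf> (uniform measure on vertices, counting measure on Sigma) equals
# the fraction of constraints satisfied by the assignment pair
f = g = indicator((0, 0))
assert np.isclose(g @ (G @ f) / n, 3 / 4)

# Claim 2.1: val(G)^2 <= max_f ||Gf||^2 <= val(G), with val(G) = 3/4 here
norms = [(G @ indicator(x)) @ (G @ indicator(x)) / n
         for x in itertools.product(Sigma, repeat=n)]
assert (3 / 4) ** 2 <= max(norms) <= 3 / 4

# the product game G ⊗ G is the Kronecker power of the matrix; the
# mixed-product property mirrors playing both copies with product strategies
assert np.allclose(np.kron(G, G) @ np.kron(f, f), np.kron(G @ f, G @ f))
```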
2.2 High-Level Proof Sketch

Given an assignment f, we bound the value of G^{⊗k} on f by considering a sequence of hybrid games. In the i-th hybrid, the players play i − 1 coordinates of F and k − i + 1 coordinates of G, where F stands for a "free" game in which every assignment is accepted with probability 1:

    val(G ⊗ G ⊗ · · · ⊗ G, f) ≤ val(F ⊗ G ⊗ · · · ⊗ G, f) ≤ · · · ≤ val(F ⊗ F ⊗ · · · ⊗ F, f) = 1.

There are several possible choices for F, but the one that works out best is the following game: Bob gets a random question v just like in G, Alice gets an identical question v, and they succeed if Alice answers 1, regardless of Bob's response. Now, if val(G^{⊗k}, f) is not smaller than (1 − ε)^k, there must be some step in which replacing F by G doesn't cause the value of the game to drop by much. Letting H stand for all the other coordinates, this means that

    val(G ⊗ H, f) ≥ (1 − ε) · val(F ⊗ H, f).

The idea is to use this inequality to obtain an assignment for the single-shot game G. Now, we can think of the game G ⊗ H as a two-step process,

    g ←(G ⊗ I)− h ←(I ⊗ H)− f,
corresponding to the factorization G ⊗ H = (G ⊗ I)(I ⊗ H). In words, we first map f to h = (I ⊗ H)f through the operator (I ⊗ H). Then, we map h to g = (G ⊗ I)h = (G ⊗ H)f through the operator (G ⊗ I). We can now focus on h and understand the inequality above to mean that the map from h to (G ⊗ I)h didn't cause much decrease in the value of the game, i.e.,

    (1 − ε) val(F ⊗ I, h) < val(G ⊗ I, h).        (2.1)
Had G ⊗ I above been replaced by G, and h by an integral assignment, we would have had an assignment h for G whose value is larger than 1 − ε. Instead, for each vertex v and label β of G, h gives a vector h_{v,β} ∈ R^{V(H)×Σ(H)}, where V(H), Σ(H) are the vertices and alphabet of H. Naïvely, one could choose a fixed coordinate in these h_{v,β} vectors and look at the partial assignment given for G; in fact, in the simpler case of expanding games, it suffices to do just that. In non-expanding games, a closer look shows that (2.1) describes a weighted average of the success probabilities of these partial assignments, with the weights of different coordinates corresponding to the winning probability in H. Finally, one uses correlated sampling to combine these different partial assignments into a full assignment for G.
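The factorization used in this two-step view is an instance of the mixed-product property of the tensor product; a quick numerical check on random stand-in matrices (toy dimensions of our own choosing):

```python
import numpy as np

# Check G ⊗ H = (G ⊗ I)(I ⊗ H) on random nonnegative matrices standing in
# for the two game operators (any shapes with matching identities work).
rng = np.random.default_rng(0)
G = rng.random((3, 4))   # stand-in for a game operator G
H = rng.random((5, 6))   # stand-in for the "remaining coordinates" H
lhs = np.kron(G, H)
rhs = np.kron(G, np.eye(5)) @ np.kron(np.eye(4), H)
assert np.allclose(lhs, rhs)
```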
3 Warm-up: Gap Amplification for Label Cover

In this section, we prove the following theorem (assuming the PCP Theorem).

Theorem. label cover(1, δ) is NP-hard for all δ > 0.

Concretely, we show a polynomial-time reduction from label cover(1, 1 − ε) to label cover(1, δ) (for all constants ε, δ > 0). Let G be a label cover instance with vertex sets U and V and alphabet Σ. First, convert G into an expanding game. (There is an easy way to do this; see Appendix A.) Then, output the parallel repetition of G for some k = O(log(1/δ)/ε²). This reduction preserves satisfiability: if the original instance has value 1, the resulting instance also has value 1. More interestingly, if the original instance has value at most 1 − ε, then (the contrapositive of) the theorem below shows that the value of the resulting instance is at most (1 − Ω(ε²))^k ≤ δ.

Theorem. Suppose the constraint graph of G has a constant eigenvalue gap. If val(G^{⊗k}) ≥ (1 − η)^k, then val(G) ≥ 1 − O(√η).

Our proof of parallel repetition is different from known proofs [FK00, Raz98, Hol09, Rao11] and arguably simpler. (The technically most involved step is Cheeger's inequality.) This proof also illustrates key ideas for our other results (e.g., amplification in the low-soundness regime).

Let us assume that there is some assignment f ∈ R^{V^k × Σ^k} for which val(G^{⊗k}) ≥ (1 − η)^k. This implies that ‖G^{⊗k} f‖² ≥ (1 − η)^{2k} (see Claim 2.1). The main goal is
to deduce from f an assignment ϕ for G such that ‖Gϕ‖² ≥ 1 − O(√η), which immediately implies that val(G) ≥ 1 − O(√η).

A function g : V × Σ → R is a fractional assignment if for every vertex v ∈ V, there exists at most one label β ∈ Σ with g(v, β) ≠ 0. (In other words, the support of g corresponds to a partial assignment.) Our proof involves two steps. The first lemma shows that a good assignment f for G^{⊗k} implies a good fractional assignment g for G.

Lemma 3.1. If there exists an assignment f ∈ R^{V^k × Σ^k} such that ‖G^{⊗k} f‖² ≥ (1 − η)^k, then there exists a fractional assignment g : V × Σ → R such that ‖Gg‖² ≥ (1 − η)‖g‖².

The second lemma shows that fractional assignments can be "rounded" to partial assignments in a way that approximately preserves the value. For expanding instances, partial assignments can be turned into (total) assignments with roughly the same value.

Lemma 3.2. If there exists a fractional assignment g : V × Σ → R for G with ‖Gg‖² ≥ (1 − η)‖g‖², then there exists a partial assignment ϕ for G with ‖Gϕ‖² ≥ (1 − √(2η))‖ϕ‖². Furthermore, if the constraint graph of G has a constant eigenvalue gap, then ‖Gϕ‖² ≥ 1 − O(√η).

Lemma 3.2 is like a Cheeger inequality for the label-extended graph. It follows by applying a discrete version of Cheeger's inequality (with boundary conditions) after setting up an appropriate graph. Lemma 3.1 is the main ingredient of the proof of the theorem above.

Proof of Lemma 3.1. We will prove Lemma 3.1 by induction on k. Suppose ‖G^{⊗k} f‖² ≥ (1 − η)^k. If k = 1, we are done. Suppose k > 1. Then, we can assume max_{f′} ‖G^{⊗(k−1)} f′‖² ≤ (1 − η)^{k−1}. (Otherwise, we would be done by the induction hypothesis.) We can write G^{⊗k} = G ⊗ H for H = G^{⊗(k−1)}. The game H = G^{⊗(k−1)} is the game with vertex sets U_H = U^{k−1} and V_H = V^{k−1}, and alphabet Σ_H = Σ^{k−1}.

Let F be the following trivial game: Bob gets a random question v ∈ V, and Alice gets an identical question v. The alphabet is Σ, and they succeed if Alice replies 1, regardless of Bob's response. The instance F is trivial in the sense that every assignment for Bob has value 1, as long as Alice always answers 1. In other words, ‖Fg‖² = 1 for all assignments g for Bob. Since ‖(F ⊗ H)f‖² ≤ max_{f′} ‖Hf′‖² ≤ (1 − η)^{k−1} (using the induction hypothesis), we get

    ‖(G ⊗ H)f‖² ≥ (1 − η) · ‖(F ⊗ H)f‖²        (3.1)

and we will argue that this allows us to extract a fractional assignment for G with value 1 − η.
Now let us write G⊗H = (G⊗I)(I⊗H) with I the identity matrix of appropriate dimensions. This matrix product can be interpreted as first mapping f to h =
(I_{V×Σ} ⊗ H)f, and then mapping h to (G ⊗ I_Ω)h. The domain of h is (V × Σ) × Ω for Ω = U_H × Σ_H. Rewriting (3.1), we get

    ‖(G ⊗ I_Ω)h‖² ≥ (1 − η)‖(F ⊗ I_Ω)h‖².

It is now instructive to look at the vector h as a two-dimensional matrix whose rows correspond to V × Σ and whose columns correspond to Ω. Denoting by h_ω : V × Σ → R_{≥0} the column indexed by ω, we get ‖(G ⊗ I)h‖² = E_ω ‖Gh_ω‖² and similarly ‖(F ⊗ I)h‖² = E_ω ‖Fh_ω‖². There must therefore be some ω ∈ Ω such that ‖Gh_ω‖² ≥ (1 − η)‖Fh_ω‖².

It turns out that we can greedily change h_ω to a fractional assignment h⋆ : V × Σ → R_{≥0} with ‖Gh⋆‖² ≥ ‖Gh_ω‖² and ‖h⋆‖² = ‖Fh_ω‖². Consider the polytope {h′ | Fh_ω = Fh′, h′ ≥ 0}. Let h⋆ be a corner of this polytope that maximizes the linear function h ↦ ⟨h, G^T Gh_ω⟩. Since h_ω is contained in the polytope, h⋆ satisfies ⟨h⋆, G^T Gh_ω⟩ ≥ ⟨h_ω, G^T Gh_ω⟩ = ‖Gh_ω‖². Furthermore, since F has rank |V|, all corners of the polytope are fractional assignments. As a fractional assignment, h⋆ also satisfies ‖h⋆‖² = ‖Fh⋆‖² = ‖Fh_ω‖². To verify ‖Gh⋆‖² ≥ ‖Gh_ω‖², we see that

    ‖Gh_ω‖² ≤ ⟨h⋆, G^T Gh_ω⟩ ≤ ‖Gh⋆‖ · ‖Gh_ω‖,

which implies that indeed ‖Gh⋆‖² ≥ (1 − η)‖h⋆‖², as desired. □
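The greedy corner step can be sketched numerically (toy dimensions and names of our own choosing, not the paper's code): within the polytope of nonnegative vectors with the same per-vertex mass as h_ω, move each vertex's whole mass onto the single label maximizing the linear functional h ↦ ⟨h, GᵀGh_ω⟩:

```python
import numpy as np

# Greedy version of the corner argument (sketch under toy assumptions).
rng = np.random.default_rng(0)
nV, nS = 4, 3                            # |V| and |Sigma| (toy sizes)
G = rng.random((nV * nS, nV * nS))       # stand-in for the game operator
h_omega = rng.random(nV * nS)            # a nonnegative column of h
lin = G.T @ (G @ h_omega)                # gradient of h -> <h, G^T G h_omega>
h_star = np.zeros_like(h_omega)
for v in range(nV):
    rows = slice(v * nS, (v + 1) * nS)
    mass = h_omega[rows].sum()           # per-vertex mass, preserved by the move
    h_star[v * nS + int(np.argmax(lin[rows]))] = mass
# h_star is a fractional assignment (at most one label per vertex) that
# dominates h_omega on the linear functional, as in the displayed chain
assert h_star @ lin >= h_omega @ lin - 1e-9
assert all((h_star[v * nS:(v + 1) * nS] > 0).sum() <= 1 for v in range(nV))
```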
Proof of Lemma 3.2. Let g : V × Σ → R be a fractional assignment for G with ‖Gg‖² ≥ (1 − η)‖g‖². Let x ∈ Σ^V be an assignment for V that is compatible with g in the sense that x_v = β if g(v, β) ≠ 0. We are to construct a partial assignment ϕ : V × Σ → {0, 1} such that ‖Gϕ‖² ≥ (1 − √(2η))‖ϕ‖². (Here, a partial assignment means a fractional assignment that is 0/1-valued.)

Let h : V → R be the function h(v) = Σ_β g(v, β). Since g is a fractional assignment, ‖h‖² = ‖g‖². Let us now consider the following graph Q on V: for any two edges (u, v), (u, v′) ∈ E that have a common neighbor u ∈ U, add an edge between v and v′ if x is consistent with the constraints on the two edges, in the sense that π_{u,v}(x_v) = π_{u,v′}(x_{v′}). Since g is a fractional assignment, we can verify that this construction satisfies ⟨h, Qh⟩ = ⟨g, G^T Gg⟩ = ‖Gg‖².

At this point, we apply the following version of Cheeger's inequality (which can be proven in the same way as the usual statement of the inequality): for any graph Q on V with maximum degree d (identified with its adjacency matrix scaled by 1/d), and any function h : V → R, if the Rayleigh quotient of h satisfies ⟨h, Qh⟩ ≥ (1 − η)‖h‖², then there exists a set S ⊆ V with ⟨1_S, Q1_S⟩ ≥ (1 − √(2η))‖1_S‖².

We let ϕ : V × Σ → {0, 1} be the function corresponding to x restricted to the set S. As before, we see that ‖Gϕ‖² = ⟨ϕ, G^T Gϕ⟩ = ⟨1_S, Q1_S⟩ and ‖ϕ‖² = ‖1_S‖². Thus, ‖Gϕ‖² ≥ (1 − √(2η))‖ϕ‖², as desired.

Let G_V be the constraint graph of G^T G, let γ be its spectral gap, and let µ = ‖1_S‖² be the measure of the set S. Since Q is a subgraph of G_V, ⟨1_S, G_V 1_S⟩ ≥ ⟨1_S, Q1_S⟩ ≥ (1 − √(2η))µ. Since G_V has spectral gap γ, we have ⟨1_S, G_V 1_S⟩ ≤ µ² + (1 − γ)(1 − µ)µ = (1 − (1 − µ)γ)µ (by the expander mixing lemma). Combining the two bounds on ⟨1_S, G_V 1_S⟩ gives µ ≥ 1 − √(2η)/γ, which implies that ‖Gϕ‖² ≥ (1 − √(2η))µ ≥ 1 − 2√(2η)/γ. We can extend ϕ to an assignment for G with value at least 1 − 2√(2η)/γ. □
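The arithmetic combining the two bounds on ⟨1_S, G_V 1_S⟩ can be spelled out; the following is a reconstruction consistent with the surrounding inequalities (using γ ≤ 1 in the last step):

```latex
% dividing by \mu > 0 and rearranging:
(1-\sqrt{2\eta})\,\mu \;\le\; \langle \mathbf{1}_S, G_V \mathbf{1}_S\rangle \;\le\; \bigl(1-(1-\mu)\gamma\bigr)\mu
\;\Longrightarrow\; (1-\mu)\gamma \;\le\; \sqrt{2\eta}
\;\Longrightarrow\; \mu \;\ge\; 1-\sqrt{2\eta}/\gamma .
% then, since \gamma \le 1,
\|G\varphi\|^2 \;\ge\; (1-\sqrt{2\eta})\mu
\;\ge\; 1-\sqrt{2\eta}-\sqrt{2\eta}/\gamma
\;\ge\; 1-2\sqrt{2\eta}/\gamma .
```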
4 Analytical Setup

Games as operators. We represent a (non-bipartite) game G with vertex set W and alphabet Σ as a distribution over triples (u, v, π) with u, v ∈ W and π ⊆ Σ². (A triple (u, v, π) encodes the constraint that the label α assigned to u and the label β assigned to v satisfy the relation π, so that (α, β) ∈ π. We refer to this distribution as the constraint distribution.) We say G is bipartite if there exists a bipartition W = U ∪ V such that every triple (u, v, π) in the support of the distribution satisfies u ∈ U and v ∈ V. The value of an assignment x ∈ Σ^W for G is defined as

    val(G; x) := P_{(u,v,π)∼G} { (x_u, x_v) ∈ π }.
The value of G is the maximum value of an assignment, val(G) = max_x val(G; x).

We will assume that our games are regular. For non-bipartite games, regularity means that every vertex of W participates in the same fraction of triples (both as first vertex and as second vertex). In other words, the marginal distribution of both u and v (when we sample (u, v, π) ∼ G) is uniform over W. For bipartite games, regularity means that the marginal distribution of u is uniform over U and the marginal distribution of v is uniform over V. We remark that our results are likely to generalize to non-regular games (by changing measures appropriately). However, we did not verify this case.

We identify G with the following linear operator acting on functions f : W × Σ → R (corresponding to the label-extended graph),

    G f(u, α) = E_{(v,π)∼G|u} Σ_β π(α, β) · f(v, β).

Here, G | u denotes the constraint distribution conditioned on the first vertex being u, and π(·, ·) denotes the 0/1-indicator of the relation π (so that π(α, β) = 1 if (α, β) ∈ π and π(α, β) = 0 otherwise). For bipartite games, we view G as a linear operator mapping functions on V × Σ to functions on U × Σ. We say that a bipartite game is a projection game if every constraint projects from V to U, which means that for every constraint (u, v, π) in G and every label β for v, there exists at most one label α for u such that (α, β) ∈ π.
Norms and inner products. For a vertex set W, we equip the space of real-valued functions on W × Σ with the inner product ⟨f, g⟩ = E_{u∈W} Σ_α f(u, α) g(u, α). (For vertex sets, we use uniform probability measures, and for alphabets, we use counting measures.) We also equip the space with the corresponding norm ‖f‖ = ⟨f, f⟩^{1/2}.

Assignments and fractional assignments. With an assignment x ∈ Σ^W, we associate a function f : W × Σ → {0, 1} such that f(u, α) = 1 if α = x_u and f(u, α) = 0 otherwise. (In other words, f is the 0/1-indicator function of x viewed as a subset of W × Σ.) We will refer to such functions as assignments. An assignment x and the function f that corresponds to it satisfy

    val(G; x) = ⟨f, Gf⟩.

Similarly, if G is a bipartite game with vertex sets U and V, then assignments x ∈ Σ^U and y ∈ Σ^V and the corresponding indicator functions f : U × Σ → {0, 1} and h : V × Σ → {0, 1} satisfy

    val(G; x, y) = ⟨f, Gh⟩.

Cheeger's inequality (with boundary conditions).

Lemma 4.1. Let G be a graph (identified with its random walk matrix) with vertex set V, and let f : V → R be a real-valued function on V. If ⟨f, Gf⟩ ≥ (1 − ε)‖f‖², then there exists a set S ⊆ supp(f) with expansion Φ(S) ≤ √(2ε) (so that ⟨1_S, G1_S⟩ ≥ (1 − √(2ε))‖1_S‖²).
5 Parallel Repetition via a Relaxation

5.1 Relaxed Value

Let G be a game with vertex set W and alphabet Σ. Let Assign_{W,Σ} be the set of normalized fractional assignments,

    Assign_{W,Σ} := { f : W × Σ → R_{≥0} | (Σ_α f(u, α))² = Σ_α f(u, α)² ≤ 1 for all u ∈ W }.

Since f ≥ 0, the constraint (Σ_α f(u, α))² = Σ_α f(u, α)² means that f(u, α) is nonzero for at most one label α ∈ Σ. The constraint Σ_α f(u, α)² ≤ 1 is for normalization. Given a function f ∈ Assign_{W,Σ}, we can extend it to an assignment x ∈ Σ^W such that val(G; x) ≥ ⟨f, Gf⟩. Thus, the value of G is the maximum of the quadratic form f ↦ ⟨f, Gf⟩ over Assign_{W,Σ},

    val(G) = max_{f ∈ Assign_{W,Σ}} ⟨f, Gf⟩.
For a finite measure space Ω, we consider the following higher-dimensional analog of Assign_{W,Σ},

    Assign_{W,Σ}(Ω) := { f : W × Σ × Ω → R_{≥0} | ‖Σ_α f_{u,α}‖² = Σ_α ‖f_{u,α}‖² ≤ 1 for all u ∈ W }.

Here, f_{u,α} : Ω → R_{≥0} is the slice of f such that f_{u,α}(ω) = f(u, α, ω). (We can think of f_{u,α} as an |Ω|-dimensional vector assigned to (u, α).) The constraint ‖Σ_α f_{u,α}‖² = Σ_α ‖f_{u,α}‖² for all u means that each slice f_ω : W × Σ → R_{≥0} is a fractional assignment (so that f_ω(u, α) is nonzero for at most one label α ∈ Σ for every vertex u ∈ W). (In particular, ⟨f_{u,α}, f_{u,α′}⟩ = 0 holds for all u ∈ W and α ≠ α′ ∈ Σ.) We define the relaxed value of G as

    val_+(G) = sup_Ω max_{f ∈ Assign_{W,Σ}(Ω)} ⟨f, (G ⊗ I_Ω) f⟩.

Clearly val_+(G) ≥ val(G), by taking Ω = {1}. The relaxed value is achieved for Ω being the uniform measure on a set of size |Σ|²|W|². (Notice that ⟨f, (G ⊗ I_Ω) f⟩ = E_{ω∼Ω} ⟨f_ω, G f_ω⟩ = Tr(G · E_{ω∼Ω} f_ω f_ω^T). The set of operators that can be expressed as E_{ω∼Ω} f_ω f_ω^T is a convex subset of a d = |W|²·|Σ|²-dimensional vector space. Hence, every such operator can be expressed as a convex combination of at most d operators f₁f₁^T, . . . , f_d f_d^T with f_i : W × Σ → R_{≥0}.)
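The definition unpacks coordinate-wise: ⟨f, (G ⊗ I_Ω) f⟩ is just the Ω-average of the quadratic form evaluated on the slices f_ω. A quick numerical sanity check of this identity (toy sizes of our own choosing; the uniform-measure factor on W × Σ is dropped from both sides, where it cancels):

```python
import numpy as np

# <f, (G ⊗ I_Ω) f> = E_{ω∼Ω} <f_ω, G f_ω>: the relaxation val_+ allows a
# vector (here: |Ω| columns) of fractional assignments instead of a single one.
rng = np.random.default_rng(1)
G = rng.random((6, 6))            # stand-in operator on R^{W x Sigma}
m = 4                             # Ω = uniform measure on m points
F = rng.random((6, m))            # column ω is the slice f_ω
lhs = np.sum(F * (G @ F)) / m     # <f, (G ⊗ I_Ω) f>, flattened form
rhs = np.mean([F[:, w] @ G @ F[:, w] for w in range(m)])
assert np.isclose(lhs, rhs)
```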
5.2 Multiplicativity Theorem 5.1. Let G and H be two bipartite projection games and let G′ = GT G and H′ = HT H be their squares. Then, val+ (G′ ⊗ H′ ) = val+ (G′ ) · val+ (H′ ) . We remark that our proof of the above theorem also works for general bipartite games G and H. However in the general case, the squares GT G and HT H are operators that do not necessarily correspond to games. Let G be a bipartite projection game with vertex sets U and V, and alphabet Σ. Let H be a bipartite projection game with vertex sets UH and VH , and alphabet ΣH . Let G′ = GT G and H′ = HT H be the squares of G and H. An important ingredient of the above lemma is the following alternative characterization +(G′ ) for squared games. It turns out that the condition P P of val 2 2 k β fv,β k = β k fv,β k 6 1 for all v ∈ V can be relaxed to the weaker condition P k β fv,β k2 6 1 for all v ∈ V without changing the optimal value.
Lemma 5.2. Let $f\colon V\times\Sigma\times\Omega\to\mathbb R_{\ge 0}$ be a nonnegative function (for some finite probability space $\Omega$) such that $\|\sum_\beta f_{v,\beta}\|^2\le 1$ for all vertices $v\in V$. Then, $\operatorname{val}_+(G')\ge \langle f, (G'\otimes I_\Omega)f\rangle$.
To illustrate the utility of Lemma 5.2, let us use it to prove Theorem 5.1. We will prove Lemma 5.2 at the end of this subsection.

Proof of Theorem 5.1. We prove the direction $\operatorname{val}_+(G'\otimes H')\le \operatorname{val}_+(G')\cdot\operatorname{val}_+(H')$. (The other direction is clear.) Let $f\in\operatorname{Assign}_{V\times V_H,\Sigma\times\Sigma_H}(\Omega)$ be a function with $\langle f,(G'\otimes H')f\rangle = \rho\cdot\operatorname{val}_+(H')$. We are to prove $\operatorname{val}_+(G')\ge\rho$.
Let $\Omega' = U_H\times\Sigma_H\times\Omega$. Consider the function $h\colon (V\times\Sigma)\times\Omega'\to\mathbb R_{\ge 0}$ with
$$ h = (I_{V\times\Sigma}\otimes H\otimes I_\Omega)\, f \,/\, \operatorname{val}_+(H')^{1/2}\,. $$
We claim that $h$ satisfies $\|\sum_\beta h_{v,\beta}\|^2\le 1$ for all $v\in V$. Assuming this claim, we arrive at the desired conclusion,
$$ \operatorname{val}_+(G') \ge \langle h, (G'\otimes I_{\Omega'})h\rangle = \langle f, (G'\otimes H'\otimes I_\Omega)f\rangle/\operatorname{val}_+(H') = \rho\,. $$
The first step uses Lemma 5.2 and the second step uses that
$$ (I_{V\times\Sigma}\otimes H\otimes I_\Omega)^T (G'\otimes I_{\Omega'}) (I_{V\times\Sigma}\otimes H\otimes I_\Omega) = G'\otimes H'\otimes I_\Omega\,. $$
It remains to verify the claim. For $f_{v,*} = \sum_\beta f_{v,\beta}$, we see that
$$ \big\|\textstyle\sum_\beta h_{v,\beta}\big\|^2 = \|(H\otimes I_\Omega) f_{v,*}\|^2/\operatorname{val}_+(H') = \langle f_{v,*}, (H'\otimes I_\Omega) f_{v,*}\rangle/\operatorname{val}_+(H')\,. $$
So we are to show that $\langle f_{v,*},(H'\otimes I_\Omega)f_{v,*}\rangle\le\operatorname{val}_+(H')$. We see that for all $v\in V$ and $v'\in V_H$,
$$ \Big\|\sum_{\beta'\in\Sigma_H} f_{v,*,v',\beta'}\Big\|^2 = \Big\|\sum_{\beta\in\Sigma,\,\beta'\in\Sigma_H} f_{v,v',\beta,\beta'}\Big\|^2 \qquad\text{(by definition of } f_{v,*}\text{)} $$
$$ \le 1 \qquad\text{(using } f\in\operatorname{Assign}_{V\times V_H,\Sigma\times\Sigma_H}(\Omega)\text{).} $$
Hence, by Lemma 5.2, $f_{v,*}$ indeed satisfies $\langle f_{v,*},(H'\otimes I_\Omega)f_{v,*}\rangle \le \operatorname{val}_+(H')$. $\square$
The proof of Lemma 5.2 uses a simple greedy argument (or, alternatively, a related polytopal argument).

Proof of Lemma 5.2. Let $f\colon V\times\Sigma\times\Omega\to\mathbb R_{\ge 0}$ be a function with $\langle f,(G'\otimes I_\Omega)f\rangle = \rho$ and $\|\sum_\beta f_{v,\beta}\|^2\le 1$ for all $v\in V$. We are to find $f^\star\in\operatorname{Assign}_{V,\Sigma}(\Omega)$ with $\langle f^\star,(G'\otimes I_\Omega)f^\star\rangle\ge\rho$. Consider the following polytope,
$$ P = \Big\{\, h\colon V\times\Sigma\times\Omega\to\mathbb R_{\ge 0} \;\Big|\; \textstyle\sum_\beta h_{v,\beta} = \sum_\beta f_{v,\beta} \ \text{for all } v\in V \,\Big\}\,. $$
All corners of this polytope belong to $\operatorname{Assign}_{V,\Sigma}(\Omega)$. (Points in $P$ that do not belong to $\operatorname{Assign}_{V,\Sigma}(\Omega)$ can be expressed as non-trivial convex combinations within the polytope.) Let $f^\star\in\operatorname{Assign}_{V,\Sigma}(\Omega)$ be a corner of $P$ that maximizes the linear function $h\mapsto \langle h,(G'\otimes I_\Omega)f\rangle$. Since $f\in P$, we get that $\langle f^\star,(G'\otimes I_\Omega)f\rangle\ge\rho$. To verify $\langle f^\star,(G'\otimes I_\Omega)f^\star\rangle\ge\rho$, we see that
$$ \rho \le \langle f^\star,(G'\otimes I_\Omega)f\rangle = \langle (G\otimes I_\Omega)f^\star, (G\otimes I_\Omega)f\rangle \qquad\text{(using } G'\otimes I_\Omega = (G\otimes I_\Omega)^T(G\otimes I_\Omega)\text{)} $$
$$ \le \|(G\otimes I_\Omega)f^\star\|\cdot\|(G\otimes I_\Omega)f\| \qquad\text{(Cauchy–Schwarz)} $$
$$ = \langle f^\star,(G'\otimes I_\Omega)f^\star\rangle^{1/2}\cdot\langle f,(G'\otimes I_\Omega)f\rangle^{1/2} = \sqrt{\langle f^\star,(G'\otimes I_\Omega)f^\star\rangle\cdot\rho}\,. $$
$\square$
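The greedy step in the proof of Lemma 5.2 can be mimicked numerically: at a corner of $P$, for each pair $(v,\omega)$ the whole mass $\sum_\beta f_{v,\beta}(\omega)$ sits on a single label, and choosing that label greedily by the coefficient of the linear objective $h\mapsto\langle h,(G'\otimes I_\Omega)f\rangle$ can only increase the objective. The sketch below is our own illustration (the flattened indexing and array shapes are ours, not the paper's), using a random PSD operator $G' = G^TG$.

```python
import numpy as np

rng = np.random.default_rng(0)
nV, nS, nO = 4, 3, 5                     # vertices, labels, points of Omega

G = rng.random((nV * nS, nV * nS))       # nonnegative operator G
A = G.T @ G                              # its square G' = G^T G (PSD)
f = rng.random((nV, nS, nO))             # nonnegative f : V x Sigma x Omega

def quad(h, g):
    """<h, (G' x I_Omega) g> = sum over omega of h_omega^T G' g_omega."""
    H, Gm = h.reshape(nV * nS, nO), g.reshape(nV * nS, nO)
    return float(np.sum(H * (A @ Gm)))

# Coefficients of the linear objective h -> <h, (G' x I_Omega) f>.
coef = (A @ f.reshape(nV * nS, nO)).reshape(nV, nS, nO)

# Greedy corner of P: per (v, omega), move the whole mass sum_beta f_{v,beta}
# onto the label with the largest coefficient.
fstar = np.zeros_like(f)
mass = f.sum(axis=1)                     # shape (nV, nO)
for v in range(nV):
    for o in range(nO):
        fstar[v, np.argmax(coef[v, :, o]), o] = mass[v, o]

# Since f itself lies in P, the greedy corner can only improve the objective.
assert quad(fstar, f) >= quad(f, f) - 1e-9
```

The Cauchy–Schwarz step of the proof then upgrades this linear-objective guarantee to a guarantee on the quadratic form at $f^\star$ itself.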
5.3 Approximation

Theorem 5.3. Let $G$ be a game with $\operatorname{val}_+(G)\ge\rho$. Then,
$$ \operatorname{val}(G) \ge \frac{1-\sqrt{1-\rho^2}}{1+\sqrt{1-\rho^2}}\,. $$
Contrapositively, if $\operatorname{val}(G)\le\delta$, then
$$ \operatorname{val}_+(G) \le \frac{2\sqrt\delta}{1+\delta}\,. $$
In particular, if $\operatorname{val}(G)$ is small then the above bound becomes $\operatorname{val}_+(G)\le 2\sqrt{\operatorname{val}(G)}$; and if $\operatorname{val}(G)\le 1-\varepsilon$ then $\operatorname{val}_+(G)\le 1-\varepsilon^2/8$. It also follows that if $\operatorname{val}_+(G)\ge\rho$ then $\operatorname{val}(G)\ge\rho^2/4$ (which captures the asymptotic behavior close to $0$), and denoting $\rho = 1-\varepsilon$, we get $\operatorname{val}(G)\ge 1-2\sqrt{2\varepsilon}$ (which captures the asymptotic behavior for $\rho$ close to $1$).

Let $G$ be a game with vertex set $W$ and alphabet $\Sigma$. Let $f\in\operatorname{Assign}_{W,\Sigma}(\Omega)$ be a function with $\langle f,(G\otimes I_\Omega)f\rangle = \rho$ (witnessing $\operatorname{val}_+(G)\ge\rho$).

Lemma 5.4. There exists a $0/1$-valued function $f'\in\operatorname{Assign}_{W,\Sigma}(\Omega\times[0,1])$ such that
$$ \langle f', (G\otimes I)f'\rangle \ge 1-\sqrt{1-\rho^2}\,. $$
(Our definition of $\operatorname{val}_+$ considered only finite measure spaces, but $[0,1]$ is infinite. It would be straightforward to generalize the definition to the infinite setting. We remark that the lemma also holds for a finite measure space instead of $[0,1]$, but the proof would be more cumbersome.)

Proof. We rescale the measure on $\Omega$ and the function $f$ such that $f\le 1$ (without changing $f\in\operatorname{Assign}_{W,\Sigma}(\Omega)$ or $\langle f,(G\otimes I_\Omega)f\rangle$). Choose $f'\colon (W\times\Sigma)\times\Omega\times[0,1]\to\{0,1\}$ as
$$ f'(u,\alpha,\omega,\tau) = \begin{cases} 1 & \text{if } f(u,\alpha,\omega)^2 > \tau\,,\\ 0 & \text{otherwise.}\end{cases} $$
Since the slices $f_\omega$ are fractional assignments, the slices $f'_{\omega,\tau}$ are also fractional assignments. (Actually, they are $0/1$-valued partial assignments.) Therefore, $\|\sum_\alpha f'_{u,\alpha}\|^2 = \sum_\alpha\|f'_{u,\alpha}\|^2$ for all $u\in W$.
It remains to show that $\sum_\alpha\|f'_{u,\alpha}\|^2\le 1$ for all $u\in W$ and $\langle f',(G\otimes I_{\Omega\times[0,1]})f'\rangle\ge\varphi(\rho)$, where $\varphi(x) = 1-(1-x^2)^{1/2}$. Since $\mathbb E_\tau (f'_\tau)^2 = f^2$, we see that $\|f'_{u,\alpha}\|^2 = \|f_{u,\alpha}\|^2$ and therefore $\sum_\alpha\|f'_{u,\alpha}\|^2\le 1$.
Let $\rho_\omega\in[0,1]$ be such that $\langle f_\omega, G f_\omega\rangle = \rho_\omega\|f_\omega\|^2$. We will first show that
$$ \mathbb E_\tau \langle f'_{\omega,\tau}, G f'_{\omega,\tau}\rangle \ge \varphi(\rho_\omega)\,\|f_\omega\|^2 $$
for all $\omega\in\Omega$. (From that bound, we will be able to deduce the desired bound on $\langle f',(G\otimes I_{\Omega\times[0,1]})f'\rangle$ by integrating over $\Omega$ and using the convexity of $\varphi$ on $[0,1]$.) First,
$$ \mathbb E_{\tau\sim[0,1]}\, f'_{\omega,\tau}(u,\alpha)\cdot f'_{\omega,\tau}(v,\beta) = \min\big\{ f_\omega(u,\alpha)^2,\ f_\omega(v,\beta)^2 \big\}\,. \tag{5.1} $$
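Identity (5.1) is simply the fact that, for a shared threshold $\tau$ uniform on $[0,1]$, two threshold indicators fire together exactly when $\tau < \min\{a^2,b^2\}$. A quick numerical self-check (our own, with a fine grid standing in for the integral over $\tau$):

```python
# E_tau[ 1{a^2 > tau} * 1{b^2 > tau} ] = min(a^2, b^2) for tau ~ Uniform[0,1]:
# both indicators are 1 exactly when tau < min(a^2, b^2).
def lhs(a, b, steps=200_000):
    # midpoint-rule approximation of the integral over tau in [0,1]
    return sum(a * a > (t + 0.5) / steps and b * b > (t + 0.5) / steps
               for t in range(steps)) / steps

for a, b in [(0.3, 0.8), (0.9, 0.9), (0.0, 0.7)]:
    assert abs(lhs(a, b) - min(a * a, b * b)) < 1e-3
print("identity (5.1) verified on sample points")
```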
Let $x_\omega\in\Sigma^W$ be an assignment that is consistent with the fractional assignment $f_\omega$ (so that $f_\omega(u,\alpha)=0$ for all $\alpha\neq x_{\omega,u}$). Now, by (5.1), and using that $f_\omega$ is a fractional assignment,
$$ \mathbb E_\tau \langle f'_{\omega,\tau}, G f'_{\omega,\tau}\rangle = \mathbb E_{(u,v,\pi)\sim G}\, \pi(x_{\omega,u},x_{\omega,v})\cdot\min\big\{ f_\omega(u,x_{\omega,u})^2,\ f_\omega(v,x_{\omega,v})^2 \big\}\,. \tag{5.2} $$
Again using that $f_\omega$ is a fractional assignment, we can express $\langle f_\omega, G f_\omega\rangle$ in a similar way,
$$ \langle f_\omega, G f_\omega\rangle = \mathbb E_{(u,v,\pi)\sim G}\, \pi(x_{\omega,u},x_{\omega,v})\cdot f_\omega(u,x_{\omega,u})\, f_\omega(v,x_{\omega,v})\,. \tag{5.3} $$
At this point we will use the following simple inequality (see Corollary 5.7 toward the end of this subsection): Let $A, B, Z$ be jointly-distributed random variables such that $A, B$ take only nonnegative values and $Z$ is $0/1$-valued. Then, $\mathbb E\, Z\min\{A,B\} \ge \varphi(\rho_\omega)\,\mathbb E\,\tfrac12(A+B)$ holds as long as $\mathbb E\, Z\sqrt{AB} \ge \rho_\omega\,\mathbb E\,\tfrac12(A+B)$.
We instantiate the inequality with $Z = \pi(x_u,x_v)$, $A = f_\omega(u,x_u)^2$, and $B = f_\omega(v,x_v)^2$ (with $(u,v,\pi)$ drawn from $G$). With this setup, $\mathbb E\,\tfrac12(A+B) = \|f_\omega\|^2$. Furthermore, $\mathbb E\, Z\min\{A,B\}$ corresponds to the right-hand side of (5.2) and $\mathbb E\, Z\sqrt{AB}$ corresponds to the right-hand side of (5.3). Thus, $\langle f_\omega, Gf_\omega\rangle\ge\rho_\omega\|f_\omega\|^2$ means that the condition above is satisfied, and we get the desired conclusion, $\mathbb E_\tau\langle f'_{\omega,\tau}, Gf'_{\omega,\tau}\rangle \ge \varphi(\rho_\omega)\|f_\omega\|^2$.
Finally,
$$ \langle f', (G\otimes I_{\Omega\times[0,1]})\, f'\rangle = \int_\Omega \langle f'_\omega, (G\otimes I_{[0,1]})\, f'_\omega\rangle\, d\omega \ge \int_\Omega \varphi(\rho_\omega)\cdot\|f_\omega\|^2\, d\omega \ge \varphi(\rho)\,. $$
For the last step, we use the convexity of $\varphi$ and that $\rho = \int_\Omega \rho_\omega\|f_\omega\|^2\, d\omega$ and $\|f\|^2 = \int_\Omega\|f_\omega\|^2\, d\omega\le 1$. $\square$

Let $\Omega' = \Omega\times[0,1]$. The next step is to convert the $0/1$-valued function $f'\in\operatorname{Assign}_{W,\Sigma}(\Omega')$ to an assignment for $G$ via correlated sampling.
Lemma 5.5. There exists an assignment $x$ for $G$ with value at least $\frac{1-\gamma}{1+\gamma}$ for $1-\gamma = \langle f',(G\otimes I_{\Omega'})f'\rangle \ge \varphi(\rho)$.
Proof. We may assume $\|f'_{u,*}\|^2 = 1$ for all $u\in W$ (we can arrange this condition to hold by adding additional points to $\Omega'$, one for each vertex in $W$, and extending $f'$ in a suitable way). Rescale $\Omega'$ to a probability measure. Let $\lambda$ be the scaling factor, so that $\langle f',(G\otimes I_{\Omega'})f'\rangle = (1-\gamma)\lambda$ (after rescaling). Then, $f'$ also satisfies $\|f'_{u,*}\|^2 = \lambda$ for all $u\in W$.
For every $\omega\in\Omega'$, the slice $f'_\omega$ is a partial assignment (in the sense that it uniquely assigns a label to a subset of the vertices). We will construct jointly-distributed random variables $\{X_u\}_{u\in W}$, taking values in $\Sigma$, by combining the partial assignments $f'_\omega$ in a probabilistic way.
Let $\{\omega(n)\}_{n\in\mathbb N}$ be an infinite sequence of independent samples from $\Omega'$ and let $f'^{(n)} = f'_{\omega(n)}$ be the corresponding slices of $f'$. Let $R(u)$ be the smallest number $r$ such that $f'_{\omega(r),u}\neq 0$ (meaning that the partial assignment $f'_{\omega(r)}$ assigns a label to $u$), and let $X_u$ be the (unique) label such that $f'_{\omega(R(u))}(u,X_u)\neq 0$. By a correlated sampling argument (e.g., similar to [BHH+08, Lemma 4.1]), the probability that the random assignment $X$ satisfies an arbitrary constraint $(u,v,\pi)$ is bounded from below by the following quantity: the probability that a random partial assignment $f'_\omega$ satisfies $(u,v,\pi)$ (over the randomness $\omega\sim\Omega'$), divided by the probability that at least one of the vertices $u, v$ is assigned in $f'_\omega$. Formally,
$$ \Pr_X\{\pi(X_u,X_v)=1\} \ \ge\ \frac{\mathbb E_{\omega\sim\Omega'}\, \pi(x_{\omega,u},x_{\omega,v})\,\min\{f'_\omega(u,x_{\omega,u}),\ f'_\omega(v,x_{\omega,v})\}}{\mathbb E_{\omega\sim\Omega'}\, \max\{f'_\omega(u,x_{\omega,u}),\ f'_\omega(v,x_{\omega,v})\}}\,. \tag{5.4} $$
(Here, $x_\omega\in\Sigma^W$ denotes any assignment consistent with the partial assignment $f'_\omega$ for $\omega\in\Omega'$.)
At this point, we will use the following simple inequality (see Lemma 5.8 toward the end of this subsection): Let $A, B, Z$ be jointly-distributed random variables such that $A, B$ are nonnegative-valued and $Z$ is $0/1$-valued. Then, $\mathbb E\, Z\min\{A,B\} \ge \frac{1-\gamma_{u,v,\pi}}{1+\gamma_{u,v,\pi}}\,\mathbb E\max\{A,B\}$ as long as $\mathbb E\, Z\min\{A,B\} \ge (1-\gamma_{u,v,\pi})\,\mathbb E\,\tfrac12(A+B)$.
We instantiate this inequality with $A = f'_\omega(u,x_{\omega,u})$, $B = f'_\omega(v,x_{\omega,v})$, and $Z = \pi(x_{\omega,u},x_{\omega,v})$ for $\omega\sim\Omega'$. The condition of the inequality corresponds to the condition $\mathbb E_\omega\, \pi(x_{\omega,u},x_{\omega,v})\, f'_\omega(u,x_{\omega,u})\, f'_\omega(v,x_{\omega,v}) \ge (1-\gamma_{u,v,\pi})\cdot\tfrac12(\|f'_{u,*}\|^2 + \|f'_{v,*}\|^2)$. (Recall that $f'$ is $0/1$-valued.) The conclusion of the inequality shows that the right-hand side of (5.4) is bounded from below by $\frac{1-\gamma_{u,v,\pi}}{1+\gamma_{u,v,\pi}}$. Hence, by convexity of $\varphi'(x) = (1-x)/(1+x)$,
$$ \mathbb E_X \operatorname{val}(G;X) = \mathbb E_{(u,v,\pi)\sim G}\,\Pr_X\{\pi(X_u,X_v)=1\} \ \ge\ \mathbb E_{(u,v,\pi)\sim G}\, \varphi'(\gamma_{u,v,\pi}) \ \ge\ \varphi'\Big(\mathbb E_{(u,v,\pi)\sim G}\,\gamma_{u,v,\pi}\Big)\,. $$
It remains to compute the expectation of $1-\gamma_{u,v,\pi}$ over $(u,v,\pi)\sim G$:
$$ \mathbb E_{(u,v,\pi)\sim G}\, (1-\gamma_{u,v,\pi}) = \mathbb E_{(u,v,\pi)\sim G}\, \tfrac{2}{\|f'_{u,*}\|^2+\|f'_{v,*}\|^2}\cdot \mathbb E_\omega\, \pi(x_{\omega,u},x_{\omega,v})\, f'_\omega(u,x_{\omega,u})\, f'_\omega(v,x_{\omega,v}) $$
$$ = \tfrac1\lambda\, \mathbb E_{(u,v,\pi)\sim G}\, \mathbb E_\omega\, \pi(x_{\omega,u},x_{\omega,v})\, f'_\omega(u,x_{\omega,u})\, f'_\omega(v,x_{\omega,v}) = \tfrac1\lambda\, \langle f', (G\otimes I_{\Omega'})\, f'\rangle = 1-\gamma\,. $$
$\square$
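The correlated sampling procedure in the proof of Lemma 5.5 can be simulated directly. The toy run below is our own illustration (the tiny space of partial assignments and the equality constraint are made up): two vertices read the same random stream of partial assignments, each taking its label from the first one that assigns it, and the empirical agreement probability respects the lower bound of (5.4).

```python
import random

# A tiny Omega': each element is a partial assignment to vertices "u", "v".
OMEGA = [{"u": 0, "v": 0}, {"u": 0}, {"v": 1}, {"u": 1, "v": 1}, {}]

def sample_pair(rng):
    """Both vertices read the same random stream; each takes the label from
    the first partial assignment that assigns it (correlated sampling)."""
    xu = xv = None
    while xu is None or xv is None:
        w = rng.choice(OMEGA)
        if xu is None and "u" in w:
            xu = w["u"]
        if xv is None and "v" in w:
            xv = w["v"]
    return xu, xv

# Constraint pi: equality of the two labels. Lower bound (5.4):
# Pr[agree] >= E[both assigned and equal] / E[at least one assigned].
num = sum(1 for w in OMEGA if "u" in w and "v" in w and w["u"] == w["v"]) / len(OMEGA)
den = sum(1 for w in OMEGA if "u" in w or "v" in w) / len(OMEGA)
bound = num / den                        # = (2/5) / (4/5) = 0.5

rng = random.Random(0)
n = 100_000
agree = 0
for _ in range(n):
    xu, xv = sample_pair(rng)
    agree += (xu == xv)
agree /= n

assert agree >= bound                    # guarantee of the correlated-sampling lemma
print(round(bound, 3), round(agree, 3))  # empirically well above the bound
```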
Some inequalities. The following lemma shows that if the expected geometric average of two random variables is close to their expected arithmetic average, then the expected minimum of the two variables is also close to the expected arithmetic average. A similar lemma is also used in proofs of Cheeger's inequality.

Lemma 5.6. Let $A, B$ be jointly-distributed random variables, taking nonnegative values. If $\mathbb E\sqrt{AB} = \rho\,\mathbb E\,\tfrac12(A+B)$, then
$$ \mathbb E\min\{A,B\} \ge \varphi(\rho)\cdot\mathbb E\,\tfrac12(A+B)\,, $$
for $\varphi(x) = 1-\sqrt{1-x^2}$.

Proof. Since $\min\{A,B\} = \tfrac12(A+B) - \tfrac12|A-B|$, it is enough to upper bound
$$ \mathbb E\,\tfrac12|A-B| = \mathbb E\,\tfrac12\big|A^{1/2}-B^{1/2}\big|\cdot\big|A^{1/2}+B^{1/2}\big| \le \Big(\mathbb E\,\tfrac12\big(A^{1/2}-B^{1/2}\big)^2\Big)^{1/2}\cdot\Big(\mathbb E\,\tfrac12\big(A^{1/2}+B^{1/2}\big)^2\Big)^{1/2} $$
$$ = \Big(\mathbb E\,\tfrac12(A+B) - \mathbb E\sqrt{AB}\Big)^{1/2}\cdot\Big(\mathbb E\,\tfrac12(A+B) + \mathbb E\sqrt{AB}\Big)^{1/2} = \sqrt{1-\rho^2}\cdot\mathbb E\,\tfrac12(A+B)\,. $$
The second step uses Cauchy–Schwarz. $\square$

The following corollary will be useful for proving our Cheeger-type inequality for two-player games.

Corollary 5.7. Let $A, B$ be as before. Let $Z$ be a $0/1$-valued random variable, jointly distributed with $A$ and $B$. If $\mathbb E\, Z\sqrt{AB} = \rho\,\mathbb E\,\tfrac12(A+B)$, then $\mathbb E\, Z\min\{A,B\} \ge \varphi(\rho)\,\mathbb E\,\tfrac12(A+B)$, for $\varphi$ as before.

Proof. The corollary follows from the convexity of $\varphi$. For notational simplicity, assume $\mathbb E\,\tfrac12(A+B) = 1$ (by scaling). Let $\lambda = \mathbb E\, Z\cdot\tfrac12(A+B)$. Write $\rho$ as a convex combination $\rho = \rho'\cdot\lambda + 0\cdot(1-\lambda)$ of $0$ and a number $\rho'$. Since $\mathbb E\, Z\sqrt{AB} = \rho'\cdot\lambda$, Lemma 5.6 implies $\mathbb E\, Z\min\{A,B\} \ge \varphi(\rho')\cdot\lambda$. The convexity of $\varphi$ implies $\varphi(\rho) \le \lambda\cdot\varphi(\rho') + (1-\lambda)\,\varphi(0) = \lambda\cdot\varphi(\rho')$. $\square$

Lemma 5.8. Let $A, B, Z$ be jointly-distributed random variables as before ($A, B$ taking nonnegative values and $Z$ taking $0/1$ values). If $\mathbb E\, Z\min\{A,B\} = (1-\gamma)\,\mathbb E\,\tfrac12(A+B)$, then
$$ \frac{\mathbb E\, Z\min\{A,B\}}{\mathbb E\max\{A,B\}} \ \ge\ \frac{1-\gamma}{1+\gamma}\,. $$

Proof. For simplicity, assume $\mathbb E\,\tfrac12(A+B) = 1$ (by scaling). Since $\min\{A,B\} = \tfrac12(A+B) - \tfrac12|A-B|$, we get $1-\gamma \le \mathbb E\min\{A,B\} = 1 - \mathbb E\,\tfrac12|A-B|$, which means that $\mathbb E\,\tfrac12|A-B| \le \gamma$. Since $\max\{A,B\} = \tfrac12(A+B) + \tfrac12|A-B|$, it follows that
$$ \frac{\mathbb E\, Z\min\{A,B\}}{\mathbb E\max\{A,B\}} \ge \frac{1-\gamma}{1 + \mathbb E\,\tfrac12|A-B|} \ge \frac{1-\gamma}{1+\gamma}\,. $$
$\square$
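Lemma 5.6 can be sanity-checked on any finite distribution: compute $\rho$ from the geometric and arithmetic averages and verify the bound on the expected minimum. A small self-check of our own (the sample pairs are arbitrary):

```python
import math

pairs = [(0.2, 0.9), (0.5, 0.5), (1.7, 0.1), (0.8, 0.75)]  # equally likely (A,B)

am = sum((a + b) / 2 for a, b in pairs) / len(pairs)       # E 1/2 (A+B)
gm = sum(math.sqrt(a * b) for a, b in pairs) / len(pairs)  # E sqrt(AB)
mn = sum(min(a, b) for a, b in pairs) / len(pairs)         # E min(A,B)

rho = gm / am                           # so that E sqrt(AB) = rho * E 1/2 (A+B)
phi = 1 - math.sqrt(1 - rho ** 2)       # phi(rho) from the lemma
assert mn >= phi * am - 1e-12           # Lemma 5.6
print(round(rho, 3), round(mn, 3), round(phi * am, 3))
```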
5.4 Proof of Theorem 1.1 and Theorem 1.5

We end this section with a quick derivation of Theorem 1.5, which clearly implies Theorem 1.1 by setting $G_i = G$ for all $i$. Let $G_1,\dots,G_k$ be projection games with $\operatorname{val}(G_i)\le\delta_i$. Then,
$$ \operatorname{val}(G_1\otimes\cdots\otimes G_k) \ \le\ \operatorname{val}_+(G_1\otimes\cdots\otimes G_k) \ =\ \prod_{i=1}^k \operatorname{val}_+(G_i) \ \le\ \prod_{i=1}^k \frac{2\sqrt{\delta_i}}{1+\delta_i}\,, $$
where the first inequality is immediate from the fact that $\operatorname{val}_+$ is a relaxation, the equality follows from Theorem 5.1, and the last inequality is by Theorem 5.3.
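Numerically the derivation reads as follows (our own arithmetic sketch): with each factor bounded by $2\sqrt{\delta_i}/(1+\delta_i) < 1$, the bound on the repeated game decays exponentially in $k$.

```python
import math

def val_plus_bound(delta):
    """Theorem 5.3: val(G) <= delta implies val_+(G) <= 2*sqrt(delta)/(1+delta)."""
    return 2 * math.sqrt(delta) / (1 + delta)

deltas = [0.1] * 5                       # five games, each of value at most 0.1
bound = 1.0
for d in deltas:
    bound *= val_plus_bound(d)           # multiplicativity of val_+ (Theorem 5.1)

assert bound <= (2 * math.sqrt(0.1)) ** 5    # the cruder bound (2 sqrt(delta))^k
assert bound < 0.1                           # far smaller than a single game's value
print(bound)
```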
6 Few Repetitions – Proof of Theorem 1.4

The following theorem will allow us to prove a tight bound on the value of projection games after few parallel repetitions (Theorem 1.4).

Theorem 6.1. Let $G$ and $H$ be two bipartite projection games and let $G' = G^TG$ and $H' = H^TH$ be their squares. Suppose $\operatorname{val}(H')\le 1-\gamma$ and $\operatorname{val}(G'\otimes H')\ge 1-\eta-\gamma$ (with $\gamma$ small enough, say $\gamma\le 1/100$). Then,
$$ \operatorname{val}(G') \ \ge\ 1 - O\big(\eta + \sqrt{\gamma\eta}\big)\,. $$
Let us first explain how the bound above improves over the bounds in Section 5. In the notation of the above theorem, Theorem 5.1 implies that $\operatorname{val}_+(G') = \operatorname{val}_+(G'\otimes H')/\operatorname{val}_+(H') \ge 1-O(\eta)$. Thus, $\operatorname{val}(G')\ge 1-O(\sqrt\eta)$ by Theorem 5.3. We see that this bound is worse than the above bound whenever $\gamma$ is close to $0$.
Before proving the theorem, we will show how it implies Theorem 1.4 (the parallel-repetition bound for few repetitions).

Proof of Theorem 1.4. Let us first reformulate Theorem 6.1. (The above formulation reflects our proof strategy for Theorem 6.1, but for our current application the following formulation is more convenient.) Let $G$ and $H$ be two projection games and let $G'$ and $H'$ be their squares. Suppose $\operatorname{val}(G')\le 1-\varepsilon$ and $\operatorname{val}(H')\le 1-t\varepsilon$ for some $\varepsilon>0$ and $1\le t\ll 1/\varepsilon$. Then, Theorem 6.1 shows that
$$ \operatorname{val}(G'\otimes H') \ \le\ 1 - \big(t + \Omega(\tfrac1t)\big)\cdot\varepsilon\,. $$
(Here, we use that for $\gamma = t\varepsilon$ and $\eta = \Omega(1/t)\cdot\varepsilon$, we have $\eta + \sqrt{\eta\gamma} = \Omega(\varepsilon)$.) From this bound, we can prove by induction on $k$ that $\operatorname{val}((G')^{\otimes k}) \le 1 - \Omega(k)^{1/2}\cdot\varepsilon$ for $k\ll 1/\varepsilon^2$. Concretely, let $\{t(k)\}_{k\in\mathbb N}$ be the sequence such that $\operatorname{val}((G')^{\otimes k}) = 1 - t(k)\cdot\varepsilon$. The above formulation of Theorem 6.1 implies $t(k+1)\ge t(k) + \Omega(1/t(k))$ (as long as $t(k)\ll 1/\varepsilon$). Since the sequence increases monotonically, it follows that $t(k+1)^2 \ge t(k)^2 + \Omega(1)$ (by multiplying the recurrence with $t(k+1)$ on both sides). Hence, as long as $t(k)\ll 1/\varepsilon$, we have $t(k)^2 = \Omega(k)$ and $t(k) = \Omega(k)^{1/2}$, as desired. $\square$
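The recurrence argument can be simulated (our own sketch, with the hidden $\Omega(\cdot)$ replaced by an explicit constant $c = 1/2$): iterating $t(k+1) = t(k) + c/t(k)$ indeed grows like $\sqrt k$.

```python
import math

c = 0.5                       # stand-in for the hidden Omega(1/t) constant
t = 1.0
K = 10_000
for _ in range(K - 1):
    t = t + c / t             # t(k+1) = t(k) + c/t(k)

# t(k)^2 increases by 2c + (c/t)^2 >= 1 per step, so t(K) grows like sqrt(K).
assert math.sqrt(K) <= t <= 1.05 * math.sqrt(K)
print(t, math.sqrt(K))
```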
The proof of Theorem 6.1 follows a similar structure as the proofs in Section 5. If $\operatorname{val}(H')\le 1-\gamma$ and $\operatorname{val}(G'\otimes H')\ge 1-\eta-\gamma$, then following the proof of Theorem 5.1 we can construct a function for $G'$ that satisfies certain properties. (In particular, the function certifies $\operatorname{val}_+(G')\ge 1-O(\eta)$.) The challenge is to construct an assignment for $G'$ from this function. This construction is the content of the following two lemmas, Lemma 6.2 and Lemma 6.3. For convenience, these lemmas will be about general games as opposed to squared games.

Lemma 6.2. Let $f\in\operatorname{Assign}_{W,\Sigma}(\Omega)$ be a $[0,1]$-valued function with $\|f_{u,*}\|^2\le 1-\gamma$ for all $u\in W$ and $\|f\|_1 = 1$. Suppose $\langle f,(G\otimes I_\Omega)f\rangle\ge 1-\eta-\gamma$ and that $\gamma$ is small enough (say, $\gamma<1/100$). Then, there exists a $0/1$-valued function $f'\in\operatorname{Assign}_{W,\Sigma}(\Omega')$ with
$$ \langle f',(G\otimes I_{\Omega'})f'\rangle \ \ge\ \big(1 - O(\eta+\sqrt{\eta\gamma})\big)\,\|f'\|^2\,, $$
and $\|f'_{u,*}\|^2\ge 1/3$ for all but an $O(\eta)$ fraction of the vertices in $W$.
Proof. Since $\|f_{u,*}\|^2\le 1-\gamma$ for all $u\in W$, the condition $\langle f,(G\otimes I_\Omega)f\rangle\ge 1-\eta-\gamma$ implies that $\|f_{u,*}\|^2\ge 0.9$ for all but an $O(\eta)$ fraction of the vertices $u\in W$. Let $\Omega' = \Omega\times[1/10,9/10]$ (with the uniform measure on the interval $[1/10,9/10]$). Let $f'\colon W\times\Sigma\times\Omega'\to\{0,1\}$ be such that for all $u\in W$, $\alpha\in\Sigma$, $\omega\in\Omega$, $\tau\in[1/10,9/10]$,
$$ f'_{\omega,\tau}(u,\alpha) = \begin{cases} 1 & \text{if } f_\omega(u,\alpha)^2 > \tau\,,\\ 0 & \text{otherwise.}\end{cases} $$
It follows that $\|f'_{u,*}\|^2\ge 1/3$ for all but $O(\eta)$ vertices $u\in W$ (using that $\|f_{u,*}\|^2\ge 0.9$ for all but $O(\eta)$ vertices). We see that $f_\omega(u,\alpha)^2 - 1/10 \le \mathbb E_\tau f'_{\omega,\tau}(u,\alpha)^2 \le \tfrac{10}{9}\, f_\omega(u,\alpha)^2$. Let $B_\omega\colon W\times\Sigma\to\{0,1\}$ be the indicator function of the event $\{f_\omega(u,\alpha)^2\in[1/10,9/10]\}$. Then,
$$ \mathbb E_\tau \big(f'_{\omega,\tau}(u,\alpha) - f'_{\omega,\tau}(u',\alpha')\big)^2 \ \le\ 2\big(f_\omega(u,\alpha)-f_\omega(u',\alpha')\big)^2 + 2\big(B_\omega(u,\alpha)+B_\omega(u',\alpha')\big)\cdot\big|f_\omega(u,\alpha)^2 - f_\omega(u',\alpha')^2\big|\,. \tag{6.1} $$
To see this, note that the left-hand side is always at most $2|f_\omega(u,\alpha)^2 - f_\omega(u',\alpha')^2|$; hence the inequality holds if $B_\omega(u,\alpha)+B_\omega(u',\alpha')\ge 1$. Otherwise, if $B_\omega(u,\alpha)+B_\omega(u',\alpha') = 0$, the left-hand side is either $0$ or $1$, and in both cases it is bounded by $2(f_\omega(u,\alpha)-f_\omega(u',\alpha'))^2$.
For every $\omega\in\Omega$, let $x_\omega\in\Sigma^W$ be an assignment consistent with the fractional assignment $f_\omega$. Let $h_\omega\colon W\to[0,1]$ be the function $h_\omega(u) = f_\omega(u,x_{\omega,u})$; similarly, let $h'_{\omega,\tau}(u) = f'_{\omega,\tau}(u,x_{\omega,u})$. Let $G_\omega$ be the linear operator on functions on $W$ such that for every $g\colon W\to\mathbb R$,
$$ \langle g, G_\omega g\rangle = \mathbb E_{(u,v,\pi)\sim G}\, \pi(x_{\omega,u},x_{\omega,v})\cdot g(u)\cdot g(v)\,. $$
(As a graph, $G_\omega$ corresponds to the constraint graph of $G$, but with some edges deleted.) Let $L_\omega$ be the corresponding Laplacian, so that for all $g\colon W\to\mathbb R$,
$$ \langle g, L_\omega g\rangle = \mathbb E_{(u,v,\pi)\sim G}\, \pi(x_{\omega,u},x_{\omega,v})\cdot\tfrac12\big(g(u)-g(v)\big)^2\,. $$
With these definitions, $\langle f_\omega, G f_\omega\rangle = \langle h_\omega, G_\omega h_\omega\rangle$ and $\langle f'_{\omega,\tau}, G f'_{\omega,\tau}\rangle = \langle h'_{\omega,\tau}, G_\omega h'_{\omega,\tau}\rangle$. We also use the shorthand $B_\omega(u) = B_\omega(u,x_{\omega,u})$. With this setup, we can relate $\mathbb E_\tau\langle h'_{\omega,\tau}, L_\omega h'_{\omega,\tau}\rangle$ and $\langle h_\omega, L_\omega h_\omega\rangle$:
$$ \mathbb E_\tau\langle h'_{\omega,\tau}, L_\omega h'_{\omega,\tau}\rangle \ \le\ 2\langle h_\omega, L_\omega h_\omega\rangle + \mathbb E_{(u,v,\pi)\sim G}\,\big(B_\omega(u)+B_\omega(v)\big)\cdot\pi(x_{\omega,u},x_{\omega,v})\cdot\big|h_\omega(u)^2-h_\omega(v)^2\big| \qquad\text{(by (6.1))} $$
$$ \le\ 2\langle h_\omega, L_\omega h_\omega\rangle + 10\,\langle h_\omega, L_\omega h_\omega\rangle^{1/2}\cdot\|B_\omega\| \qquad\text{(using Cauchy–Schwarz).} $$
Let $M_\omega$ be the linear operator on functions on $W$ with the following quadratic form,
$$ \langle h_\omega, M_\omega h_\omega\rangle = \mathbb E_{(u,v,\pi)\sim G}\, \big(1-\pi(x_{\omega,u},x_{\omega,v})\big)\cdot\tfrac12\big(h_\omega(u)^2 + h_\omega(v)^2\big)\,. $$
Let $L'_\omega = L_\omega + M_\omega$. The following identity among these operators holds:
$$ \langle h_\omega, G_\omega h_\omega\rangle = \|h_\omega\|^2 - \langle h_\omega, L'_\omega h_\omega\rangle = \|h_\omega\|^2 - \langle h_\omega, L_\omega h_\omega\rangle - \langle h_\omega, M_\omega h_\omega\rangle\,. \tag{6.2} $$
Let $\gamma_\omega = \|h_\omega\|_1 - \|h_\omega\|^2$, and let $\eta_\omega = \langle h_\omega, L'_\omega h_\omega\rangle$. We see that $\mathbb E_\tau\langle h'_{\omega,\tau}, M_\omega h'_{\omega,\tau}\rangle \le 2\langle h_\omega, M_\omega h_\omega\rangle$. Thus, $\mathbb E_\tau\langle h'_{\omega,\tau}, L'_\omega h'_{\omega,\tau}\rangle \le O(\eta_\omega + \sqrt{\eta_\omega}\cdot\|B_\omega\|)$ (using (6.1)). Next, we claim that $\|B_\omega\|^2 = O(\gamma_\omega)$: on the whole domain $W$, we have $B_\omega \le 100(1-h_\omega)h_\omega = 100(h_\omega - h_\omega^2)$, so $\|B_\omega\|^2 \le 100(\|h_\omega\|_1 - \|h_\omega\|^2) = 100\,\gamma_\omega$. Hence,
$$ \mathbb E_\tau\langle h'_{\omega,\tau}, L'_\omega h'_{\omega,\tau}\rangle \ \le\ O\big(\eta_\omega + \sqrt{\eta_\omega\gamma_\omega}\big)\,. \tag{6.3} $$
Using (6.2) and the relation between $G$ and $G_\omega$, we can integrate (6.3) over $\Omega$:
$$ \langle f', (G\otimes I_{\Omega'})f'\rangle = \int_\Omega \mathbb E_\tau\langle h'_{\omega,\tau}, G_\omega h'_{\omega,\tau}\rangle\, d\omega = \|f'\|^2 - O(1)\int_\Omega \big(\eta_\omega + \sqrt{\eta_\omega\gamma_\omega}\big)\, d\omega = \|f'\|^2 - O\big(\eta + \sqrt{\eta\gamma}\big)\,. $$
The last step uses that $\int_\Omega \eta_\omega\, d\omega = \|f\|^2 - \langle f,(G\otimes I_\Omega)f\rangle \le \eta$ and $\int_\Omega \gamma_\omega\, d\omega = \|f\|_1 - \|f\|^2 \le \gamma$, as well as Cauchy–Schwarz. Since $\|f'\|^2 \ge 0.9-\eta-\gamma$ (using that $\|f\|^2 \ge 1-\eta-\gamma$), we see that $\langle f',(G\otimes I_{\Omega'})f'\rangle \ge \big(1-O(\eta+\sqrt{\eta\gamma})\big)\,\|f'\|^2$. $\square$
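Identity (6.2) is a pointwise algebraic fact about the three quadratic forms and can be verified mechanically on a toy constraint graph. In the check below (our own; the edges, $\pi$ values, and vector $h$ are arbitrary), the norm $\|h\|^2$ is taken with respect to the measure induced by a uniformly random edge endpoint, which is the setting in which (6.2) holds.

```python
import numpy as np

rng = np.random.default_rng(1)
n_vertices, n_edges = 6, 12
edges = [(rng.integers(n_vertices), rng.integers(n_vertices), rng.integers(2))
         for _ in range(n_edges)]        # (u, v, pi-value in {0, 1})
h = rng.random(n_vertices)

E = lambda vals: sum(vals) / n_edges     # expectation over a uniform edge

g_form = E([pi * h[u] * h[v] for u, v, pi in edges])                    # <h, G_w h>
lap    = E([pi * 0.5 * (h[u] - h[v]) ** 2 for u, v, pi in edges])       # <h, L_w h>
m_form = E([(1 - pi) * 0.5 * (h[u] ** 2 + h[v] ** 2) for u, v, pi in edges])
norm   = E([0.5 * (h[u] ** 2 + h[v] ** 2) for u, v, pi in edges])       # ||h||^2

# (6.2): <h, G_w h> = ||h||^2 - <h, L_w h> - <h, M_w h>
assert abs(g_form - (norm - lap - m_form)) < 1e-12
print("identity (6.2) holds on the toy instance")
```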
Lemma 6.3. Let $f\in\operatorname{Assign}_{W,\Sigma}(\Omega)$ be a $0/1$-valued function with $\|f_{u,*}\|^2 \ge 1/3$ for all but $O(\varepsilon)$ of the vertices, and $\langle f,(G\otimes I_\Omega)f\rangle = (1-\varepsilon)\|f\|^2$. Then, $\operatorname{val}(G) \ge 1-O(\varepsilon)$.
Proof. Rescale $\Omega$ so that it becomes a probability measure. Let $\lambda$ be the scaling factor, so that $\tfrac13\lambda \le \|f_{u,*}\|^2 \le \lambda$ for all $u\in W$ (after rescaling of $\Omega$). Following the analysis in Lemma 5.5, there exists an assignment for $G$ with value at least $\operatorname{val}(G) \ge (1-\varepsilon')/(1+\varepsilon')$ for
$$ \varepsilon' = \mathbb E_{(u,v,\pi)\sim G}\Big[\, 1 - \tfrac{2}{\|f_{u,*}\|^2+\|f_{v,*}\|^2}\cdot \mathbb E_\omega\, \pi(x_{\omega,u},x_{\omega,v})\cdot f_\omega(u,x_{\omega,u})\, f_\omega(v,x_{\omega,v}) \,\Big]\,. $$
Here, $x_\omega\in\Sigma^W$ is an assignment consistent with $f_\omega$ for all $\omega\in\Omega$. We are to show $\varepsilon' = O(\varepsilon)$. To ease notation, let us define two jointly-distributed random variables, for $(u,v,\pi)\sim G$,
$$ \Delta\cdot\|f\|^2 = \tfrac12\big(\|f_{u,*}\|^2 + \|f_{v,*}\|^2\big)\,, $$
$$ \Gamma\cdot\|f\|^2 = \tfrac12\big(\|f_{u,*}\|^2 + \|f_{v,*}\|^2\big) - \mathbb E_\omega\, \pi(x_{\omega,u},x_{\omega,v})\cdot f_\omega(u,x_{\omega,u})\, f_\omega(v,x_{\omega,v})\,. $$
In expectation, $\mathbb E\,\Gamma = \varepsilon$. Also, $0\le\Gamma\le\Delta$ with probability $1$. Furthermore, $\Pr\{\Delta<1/3\} = O(\varepsilon)$. We can express $\varepsilon'$ in terms of these variables,
$$ \varepsilon' = \mathbb E\big[\,1 - \tfrac1\Delta(\Delta-\Gamma)\,\big] = \mathbb E\,\tfrac1\Delta\,\Gamma\,. $$
Let $B$ (for bad) be the $0/1$-indicator variable of the event $\{\Delta<1/3\}$ (which is the same as $\{1/\Delta > 3\}$). Since this event happens with probability at most $O(\varepsilon)$, we have $\mathbb E\, B = O(\varepsilon)$. Then, we can bound $\varepsilon'$ as
$$ \varepsilon' = \mathbb E\,\tfrac1\Delta\,\Gamma = \mathbb E\Big[\, \underbrace{B\cdot\tfrac1\Delta\Gamma}_{\le\, B} + \underbrace{(1-B)\cdot\tfrac1\Delta\Gamma}_{\le\, 3\Gamma} \,\Big] \le \mathbb E\, B + 3\,\mathbb E\,\Gamma = O(\varepsilon)\,. $$
$\square$
7 Inapproximability Results

7.1 label cover

In this section we prove a new hardness result for label cover (Theorem 7.4 below) and derive Corollary 1.2.

Definition 7.1 (label cover). Let $\varepsilon\colon\mathbb N\to[0,1]$ and let $s\colon\mathbb N\to\mathbb N$ be functions. We define label cover$_s(\varepsilon)$ to be the problem of deciding whether an instance of label cover of size $n$ and alphabet size at most $s(n)$ has value $1$ or value at most $\varepsilon(n)$. When we refer to a reduction from 3SAT to label cover$_s(\varepsilon)$, we mean that satisfiable 3SAT instances are mapped to label cover instances with value $1$, and unsatisfiable 3SAT instances are mapped to label cover instances with value at most $\varepsilon$.

Let us begin by reviewing the known results. We mentioned in Section 3 that the PCP theorem [AS98, ALM+98] implies that label cover$_s(1-\delta)$ is NP-hard for some constant $s$ and (small enough) $\delta>0$. Raz's parallel repetition theorem applied to this instance $k$ times implies that label cover$_{a^k}(\beta^k)$ is NP-hard, for some $\beta<1$ and some constant $a>1$. So, taking $k = O(\log 1/\varepsilon)$ will imply a soundness of $\varepsilon$, with an alphabet of size $s = \operatorname{poly}(1/\varepsilon) = a^k$. (This is proved in Section 3.)

Theorem 7.2 (PCP theorem followed by Raz's theorem). There are absolute constants $a>1$ and $0<\beta<1$ such that for every $k\in\mathbb N$ there is a reduction that takes instances of 3SAT of size $n$ to instances of label cover$_s(\varepsilon)$ of size $n^{O(k)}$, such that $s\le a^k$ and $\varepsilon\le\beta^k$; in particular, $s\le\operatorname{poly}(1/\varepsilon)$.

In fact, one can take $k = \omega(1)$ in the above and get a reduction from the original label cover to label cover$_s(\varepsilon)$, still with $s = \operatorname{poly}(1/\varepsilon) = a^k$, but now the size of the instance grows to be $n^k$. Setting $k = \log\log n$ or $\log n$ is still often considered reasonable, and yields quasi-NP-hardness results, namely, hardness results under quasi-polynomial-time reductions. If we insist on polynomial-time reductions, then the best hardness for label cover with sub-constant value of $\varepsilon$ is due to Moshkovitz and Raz, who proved in [MR10] (see also [DH09]):

Theorem 7.3 (Theorem 10 in [MR10]). There exists a constant $c>0$ such that the following holds. For every $\varepsilon\colon\mathbb N\to[0,1]$ there is a reduction taking 3SAT instances of size $n$ to label cover$_s(\varepsilon)$ instances of size $n^{1+o(1)}\cdot\operatorname{poly}(1/\varepsilon)$ such that $s\le\exp(1/\varepsilon^c)$.

By applying parallel repetition to an instance of label cover from the above theorem, and using the bound of Theorem 1.1, we get:

Theorem 7.4 (New NP-hardness for label cover). For every constant $\alpha>0$ the following holds. For every $\varepsilon\colon\mathbb N\to[0,1]$ there is a reduction taking 3SAT instances of size $n$ to label cover$_s(\varepsilon)$ instances of size $n^{O(1)}\cdot\operatorname{poly}(1/\varepsilon)$ such that $s\le\exp(1/\varepsilon^\alpha)$.

The improvement of this theorem compared to Theorem 7.3 is in the for-all quantifier over $\alpha$.

Proof.
Assume $\varepsilon = o(1)$; otherwise Theorem 7.2 can be applied with $k = O(\log 1/\varepsilon)$ and a much better bound on $s$. The reduction is as follows. Starting with a 3SAT instance of size $n$, let $G$ be the label cover$_{s_1}(\varepsilon_1)$ instance output by the reduction from Theorem 7.3 with $\varepsilon_1 = (\tfrac{3c}{\alpha}\cdot\varepsilon^\alpha)^{1/c}$, and output $G^{\otimes k}$ for $k = 3c/\alpha$. The resulting instance $G^{\otimes k}$ has size $n^{O(1)}$ and alphabet size $s = (s_1)^k$, and its soundness is at most $(4\varepsilon_1)^{k/2} = \big(4\cdot(\tfrac{3c}{\alpha})^{1/c}\,\varepsilon^{\alpha/c}\big)^{3c/2\alpha} \le \varepsilon$ by Theorem 1.1 (and assuming $\varepsilon = o(1)$). Finally, plugging in the bound for $s_1$ and the value of $\varepsilon_1$,
$$ s = (s_1)^k \le \exp(k/\varepsilon_1^c) = \exp(1/\varepsilon^\alpha)\,. $$
$\square$
Finally, Corollary 1.2 follows immediately from the above theorem by taking $\varepsilon = (\log n)^{-c}$ and choosing $\alpha < 1/c$ so that the alphabet size is bounded by $s\le\exp(1/\varepsilon^\alpha)\le n$. We also remark that the resulting instance is regular, due to the regularity of the [MR10] instance.
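The parameter setting in the proof can be double-checked numerically (our own sketch; the illustrative choice $c = 1$, $\alpha = 1$, $\varepsilon = 10^{-4}$ is small enough for the $\varepsilon = o(1)$ slack to kick in):

```python
import math

c, alpha, eps = 1.0, 1.0, 1e-4
k = 3 * c / alpha                                  # number of repetitions
eps1 = (3 * c / alpha * eps ** alpha) ** (1 / c)   # soundness fed into repetition

# Soundness after k repetitions, via Theorem 1.1: (2*sqrt(eps1))^k = (4*eps1)^(k/2).
assert (4 * eps1) ** (k / 2) <= eps

# Alphabet size: s = s1^k <= exp(k / eps1^c) = exp(1 / eps^alpha).
assert math.isclose(k / eps1 ** c, 1 / eps ** alpha, rel_tol=1e-9)
print("soundness and alphabet-size bookkeeping check out")
```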
7.2 set cover

In this section we prove Corollary 1.3. First, some brief background. Feige [Fei98] proved (extending [LY94]) that set cover is hard to approximate to within a factor of $(1-o(1))\ln n$, via a quasi-polynomial-time reduction from 3SAT. His reduction has two components: a multi-prover verification protocol and a set-partitioning gadget. Moshkovitz [Mos12] showed that the multi-prover protocol can be replaced by a label cover instance with an agreement soundness property that she defines, and showed how to obtain such an instance starting with a standard label cover instance. Corollary 1.3 follows by instantiating this chain of reductions starting with a label cover instance from Theorem 7.4. Details follow.

Let $\alpha>0$ and let $G$ be an instance of label cover as per Corollary 1.2, with soundness $\varepsilon < a^2/(\log n)^4$ for $a = O(\alpha^5)$, and with $|\Sigma|\le n$. This follows by setting $c = 5$ in the corollary, and we can also assume that $G$ is regular. Using a reduction of Moshkovitz (Lemma 2.2 in [Mos12]) we construct in polynomial time a label cover instance $G_1$ such that the Alice degree of $G_1$ is $D = \Theta(1/\alpha)$, and such that, setting $\varepsilon_1 = a/(\log n)^2$:
– If $\operatorname{val}(G) = 1$ then $\operatorname{val}(G_1) = 1$.
– If $\operatorname{val}(G) < \varepsilon_1^2 = a^2/(\log n)^4$, then every assignment for $G_1$ has the following agreement soundness property$^6$: for at least a $1-\varepsilon_1$ fraction of the vertices $u\in U_1$, the neighbors of $u$ project to $D$ distinct values (i.e., they completely disagree).

While a hardness result for set cover is proven in [Mos12], it is unfortunately proven under a stronger conjecture than our Corollary 1.2. For completeness, we repeat the argument. Let us first recall the gadget used in [Fei98] (see Definition 3.1 there).

Definition 7.5 (Partition systems). A partition system $B(m, L, k, d)$ has the following properties:
– $m$: There is a ground set $[m]$. (We will use $m = n_1^D$, where $n_1$ is the size of $G_1$.)
– $L$: There is a collection of $L$ distinct partitions, $p_1,\dots,p_L$. (We will use $L = |\Sigma|\le n$.)
– $k$: For $1\le i\le L$, partition $p_i$ is a collection of $k$ disjoint subsets of $[m]$ whose union is $[m]$. (We will use $k = D = O(1/\alpha)$.) Let us denote by $p_i(j)$ the $j$-th set in the partition $p_i$.
– $d$: Any cover of $[m]$ by subsets that appear in pairwise different partitions requires at least $d$ subsets. (We will use $d = k\cdot(1-\tfrac2D)\ln m$.)

$^6$ In [Mos12] the projections go from Alice to Bob, while ours go from Bob to Alice.
Such a gadget is explicitly constructed, in time linear in $m$, in [NSS95] (where it is termed "anti-universal sets"). The intention is that the gadget can be covered in the yes case by $k$ sets, and in the no case only by at least $d = k\cdot(1-\tfrac2D)\ln m$ sets.

Construction of the set cover instance. The set cover instance has ground set $U_1\times[m]$, consisting of $|U_1|$ partition-system gadgets. For every $v\in V_1$ and value $\beta\in\Sigma$ there is a set $S_{v,\beta}$ in the set cover instance. Denoting by $D_V$ the number of neighbors of $v$ in $G_1$, the set $S_{v,\beta}$ is the union of $D_V$ sets, one for each neighbor $u$ of $v$. We arbitrarily enumerate the neighbors of each $u\in U_1$ with numbers from $1$ to $D$, and denote by $j_{uv}\in[D]$ the number on the edge from $u$ to $v$. With this notation, $S_{v,\beta}$ is the union of the sets $\{u\}\times p_\alpha(j_{uv})$, where $\alpha = \pi_{uv}(\beta)$:
$$ S_{v,\beta} = \bigcup_{u\sim v} \{u\}\times p_{\pi_{uv}(\beta)}(j_{uv})\,. $$
Completeness. It is easy to see that if $(f, g)$ is a satisfying assignment for $G_1$, then taking the sets $S_{v,f(v)}$ covers the ground set: the gadget corresponding to $u$ is covered by the $k$ sets in the partition corresponding to $\alpha = g(u)$. In total the set cover has size $|V_1| = D|U_1|$.

Soundness. In the no case, we claim that every set cover has size at least $Z = (1-\tfrac4D)\ln m\cdot|V_1|$. Assume otherwise, and let $s_u$ be the number of sets in the cover that touch $\{u\}\times[m]$. Then $\sum_u s_u < Z\cdot D_V$, so at least a $\tfrac2D$ fraction of the vertices $u\in U_1$ have $s_u < \ell \stackrel{\mathrm{def}}{=} (1-\tfrac2D)D\ln m$; we call such $u$'s good. Define a randomized assignment $f\colon V\to\Sigma$ by selecting for each $v$ a value $\beta$ at random from the set of $\beta$'s for which $S_{v,\beta}$ belongs to the set cover (or, if no such set exists, output a random $\beta$). First, by the property of the gadget, since the set cover must cover each $\{u\}\times[m]$, if $u$ is good there must be two neighbors $v_1, v_2\sim u$ such that the sets $S_{v_1,\beta_1}$ and $S_{v_2,\beta_2}$ are in the cover and such that $\pi_{uv_1}(\beta_1) = \alpha = \pi_{uv_2}(\beta_2)$ (both $p_\alpha(j_{uv_1})\subset S_{v_1,\beta_1}$ and $p_\alpha(j_{uv_2})\subset S_{v_2,\beta_2}$). Next, observe that if $u$ is good then each neighbor $v$ of $u$ has at most $\ell$ values of $\beta$ for which $S_{v,\beta}$ is in the cover, because each such set is counted in $s_u$. So the probability that $f(v_1) = \beta_1$ and $f(v_2) = \beta_2$ is at least $1/\ell^2$, and overall a good $u$ exists with two agreeing neighbors with probability at least $\tfrac2D\cdot\tfrac1{\ell^2} > \varepsilon_1$. This contradicts the agreement soundness property of $G_1$ as long as $\varepsilon_1 < a/(\log n)^2$, for $a < O(1/D^5) = O(\alpha^5)$.
Size. The size of the set cover instance is at most $m\cdot n_1 = n_1^{D+1}$, and $\ln(n_1^{D+1}) = (D+1)\ln n_1 = (1+\tfrac1D)\ln m$. So by choosing the appropriate constant relation between $1/D$ and $\alpha$, we get a factor $(1-\alpha)\ln N$ hardness for approximating a set cover instance of size $N$.
8 Conclusions

For many kinds of games, tight parallel repetition bounds are still open: for example, for general$^7$ (non-projection) two-player games, for entangled games (where the two players share entanglement, see [KV11]), and for games with more than two parties. It is an interesting question whether the analytical approach in this work can give improved parallel repetition bounds for these cases. A more open-ended question is whether analytical approaches can complement or replace information-theoretic approaches in other contexts (for example, in communication complexity), leading to new bounds and simpler proofs.
Acknowledgments We thank Boaz Barak, Ryan O’Donnell, Ran Raz, and Oded Regev for insightful discussions and comments about this work.
$^7$ An earlier version of this manuscript erroneously claimed a reduction from general constraints to projection constraints. We are thankful to Ran Raz for pointing out this error.

References

[ALM+98] S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy, Proof verification and intractability of approximation problems, Journal of the ACM 45 (1998), no. 3, 501–555.

[AS98] S. Arora and S. Safra, Probabilistic checking of proofs: A new characterization of NP, Journal of the ACM 45 (1998), no. 1, 70–122.

[BHH+08] Boaz Barak, Moritz Hardt, Ishay Haviv, Anup Rao, Oded Regev, and David Steurer, Rounding parallel repetitions of unique games, FOCS, 2008, pp. 374–383.

[DH09] Irit Dinur and Prahladh Harsha, Composition of low-error 2-query PCPs using decodable PCPs, FOCS, 2009, pp. 472–481.

[Fei98] U. Feige, A threshold of ln n for approximating set cover, Journal of the ACM 45 (1998), no. 4, 634–652.

[FK00] Uriel Feige and Joe Kilian, Two-prover protocols – low error at affordable rates, SIAM J. Comput. 30 (2000), no. 1, 324–346.

[FKO07] Uriel Feige, Guy Kindler, and Ryan O'Donnell, Understanding parallel repetition requires understanding foams, IEEE Conference on Computational Complexity, 2007, pp. 179–192.

[FL92] Uriel Feige and László Lovász, Two-prover one-round proof systems: Their power and their problems (extended abstract), STOC, 1992, pp. 733–744.

[Hol09] Thomas Holenstein, Parallel repetition: Simplification and the no-signaling case, Theory of Computing 5 (2009), no. 1, 141–172.

[KV11] Julia Kempe and Thomas Vidick, Parallel repetition of entangled games, STOC, 2011, pp. 353–362.

[Lov79] László Lovász, On the Shannon capacity of a graph, IEEE Transactions on Information Theory 25 (1979), no. 1, 1–7.

[LY94] Carsten Lund and Mihalis Yannakakis, On the hardness of approximating minimization problems, Journal of the ACM 41 (1994), no. 5, 960–981.

[Mos12] Dana Moshkovitz, The projection games conjecture and the NP-hardness of ln n-approximating set-cover, APPROX-RANDOM, 2012, pp. 276–287.

[MR10] Dana Moshkovitz and Ran Raz, Two-query PCP with subconstant error, J. ACM 57 (2010), no. 5.

[NSS95] Moni Naor, Leonard J. Schulman, and Aravind Srinivasan, Splitters and near-optimal derandomization, FOCS, 1995, pp. 182–191.

[Rao11] Anup Rao, Parallel repetition in projection games and a concentration bound, SIAM J. Comput. 40 (2011), no. 6, 1871–1891.

[Raz98] Ran Raz, A parallel repetition theorem, SIAM J. Comput. 27 (1998), no. 3, 763–803.

[Raz08] Ran Raz, A counterexample to strong parallel repetition, FOCS, 2008, pp. 369–373.

[Ste10a] David Steurer, Improved rounding for parallel repeated unique games, APPROX-RANDOM, 2010, pp. 724–737.

[Ste10b] David Steurer, Subexponential algorithms for d-to-1 two-prover games and for certifying almost perfect expansion, manuscript, available from the author's website, 2010.
A Additional Proofs

Reduction to expanding games. Let $G = (U, V, E)$ be a projection game. Define the following game $G' = (U\cup\{\lambda\}, V, E')$: Bob gets a random question $v\in V$; with probability half Alice gets a random neighbor $u$ of $v$, and with probability half she gets the null question $\lambda$. The players succeed on a $uv$ question as before, and on a null question only if Alice answers $1$, regardless of Bob's answer. Clearly $\operatorname{val}(G')\ge\operatorname{val}(G)$, and if $\operatorname{val}(G) < 1-\varepsilon$ then $\operatorname{val}(G') < 1-\varepsilon/2$.