A Parallel Repetition Theorem

Ran Raz
[email protected]
Department of Applied Mathematics
Weizmann Institute
Rehovot 76100, ISRAEL

Abstract

We show that a parallel repetition of any two-prover one-round proof system (MIP(2,1)) decreases the probability of error at an exponential rate. No constructive bound was previously known. The constant in the exponent (in our analysis) depends only on the original probability of error and on the total number of possible answers of the two provers. The dependency on the total number of possible answers is logarithmic, which was recently proved to be almost best possible [27].
1 Introduction and Basic Notations

1.1 History and Motivation

We consider two-prover one-round proof systems (MIP(2,1)), as introduced in [12]. In a MIP(2,1) proof system, two computationally unbounded provers (who are not allowed to communicate with each other) try to convince a probabilistic polynomial time verifier that a common input I belongs to a pre-specified language L. The proof proceeds in one round: the verifier generates a pair of questions (x,y), based on I and on a random string r, and sends x to the first prover and y to the second prover. The first prover responds by sending u, and the second prover responds by sending v. Based on I, r, x, y, u, v, the verifier decides whether to accept or reject the conjecture I ∈ L. In what follows, the strategy of the verifier (the way to generate the questions, and to decide whether I ∈ L) is called "a proof system", and the pair of strategies of the provers is called "a protocol". A language L is in MIP(2,1) with probability of error ε if there exists a proof system (of this type), such that: 1) For I ∈ L, there exists a protocol that causes the verifier to always accept. 2) For I ∉ L, for any possible protocol, the verifier accepts with probability smaller than ε. For the exact definitions of MIP(2,1) proof systems and the family of languages MIP(2,1), see [12, 24].
A preliminary version of the paper appeared in STOC 95.
The class of languages MIP(2,1) turned out to be very powerful. In particular, it follows from [7, 36, 25] that NEXPTIME = MIP(2,1) with exponentially small probability of error. MIP(2,1) proof systems have cryptographic applications (see [12, 13, 35, 18]), and have also been used as a starting point to prove that certain optimization problems are hard to approximate (see [22, 4, 5, 25, 6, 10, 37, 8]). For more discussion of the class MIP(2,1) and its applications, see [24] and the references there. Sequential repetition of MIP(2,1) proof systems decreases the probability of error exponentially, but requires multiple rounds. Parallel repetition preserves the number of rounds. At what rate is the probability of error decreased by parallel repetition? At first, it was believed that, as in the sequential case, repeating a proof system k times in parallel decreases the probability of error to ε^k (see [26, 28]). A counterexample to this conjecture was given in [21] (see also [35, 19, 25, 27]). For years it was not even known whether parallel repetition can make the probability of error arbitrarily small. This was recently proved in [43]. No constructive bound, however, was given there for the number of repetitions required to decrease the probability of error below a given bound. In the simpler special case where the provers' questions, x and y, are chosen independently, it has been known that repeating a proof system k times in parallel decreases the probability of error exponentially. The proof was first given in [15], and the bound was further improved in [35, 38, 19, 1]. Another special case, the tree-like-measure case, was recently settled in [44]. The constant in the exponent in all those proofs, however, is (at least) polynomial in the number of possible questions (x,y). We remark that in most interesting cases the questions x, y are not chosen independently, and the measure is not tree-like.
Researchers were interested in analyzing the probability of error of a parallel repetition not only as a mathematical problem, but also because an efficient technique to decrease the probability of error was needed. In the literature there are many results that use different techniques to decrease the probability of error [19, 32, 36, 25, 8, 24, 42]. For certain applications, however, these techniques are insufficient. Parallel repetition was suggested as a technique to decrease the probability of error because it was believed to be very efficient, and because it preserves many canonical properties of the proof system (e.g. zero knowledge). Since the appearance of a preliminary version of this paper in STOC 95 [39], our results were used by [9] to improve many of the hardness results for approximation of optimization problems (see also [2, 3, 24, 11, 30, 31]). They were also used by [41] to prove a new direct sum theorem for probabilistic communication complexity. The reader can find more about the problem of parallel repetition in [19, 25, 24, 20].
1.2 Main Result

We will concentrate on the deterministic case, where the verifier decides whether to accept or reject the conjecture I ∈ L based on I, x, y, u, v only (and not on the random string r). The probabilistic case, where this decision depends also on r, is briefly discussed in Section 7. Following [15, 19, 25], we model the problem as a problem on games:
A game G consists of four finite sets X, Y, U, V, with a probability measure $\mu: X\times Y\to R^+$, and a predicate $Q: X\times Y\times U\times V\to\{0,1\}$. We think of Q also as a set. A protocol for G consists of a function $u: X\to U$, and a function $v: Y\to V$. The value of the protocol is defined as
$$ \sum_{(x,y)\in X\times Y}\mu(x,y)\,Q(x,y,u(x),v(y)), $$
i.e., the μ-probability that Q(x,y,u(x),v(y)) = 1. The value of the game, w(G), is defined to be the maximal value of all protocols for G. The answer-size of the game, s(G), is defined by s(G) = |U||V|. We think of G as a game for two players (provers): Player I receives x ∈ X, and Player II receives y ∈ Y, according to the pair distribution μ(x,y). We think of x, y as the inputs for the game. Each player doesn't know the other player's input. Player I has to give an "answer" u′ ∈ U, and Player II has to give an "answer" v′ ∈ V. The goal of the players is to maximize the μ-probability that Q is satisfied (i.e., the probability that Q(x,y,u′,v′) = 1). We remark that this probability corresponds to the probability of error. The game $G\otimes G$ consists of the sets $X\times X,\ Y\times Y,\ U\times U,\ V\times V$, with the measure
$$ (\mu\otimes\mu)((x_1,x_2),(y_1,y_2)) = \mu(x_1,y_1)\,\mu(x_2,y_2) $$
and the predicate
$$ (Q\otimes Q)((x_1,x_2),(y_1,y_2),(u_1,u_2),(v_1,v_2)) = Q(x_1,y_1,u_1,v_1)\,Q(x_2,y_2,u_2,v_2). $$
In the same way, the game $G^{\otimes k}$ consists of the sets $X^k, Y^k, U^k, V^k$, with the measure
$$ \mu^{\otimes k}(x,y) = \prod_{i=1}^{k}\mu(x_i,y_i) $$
and the predicate
$$ Q^{\otimes k}(x,y,u,v) = \prod_{i=1}^{k}Q(x_i,y_i,u_i,v_i), $$
where $x\in X^k,\ y\in Y^k,\ u\in U^k,\ v\in V^k$. Here and throughout, $z_i$ stands for the i-th coordinate of a vector z. We denote $G^{\otimes k}, Q^{\otimes k}, X^k, Y^k, U^k, V^k, \mu^{\otimes k}$ also by $\bar G, \bar Q, \bar X, \bar Y, \bar U, \bar V, \bar\mu$. Assume w.l.o.g. that s(G) ≥ 2. In this paper we prove the following parallel repetition theorem:
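As a concrete illustration of these definitions (not part of the paper's argument), the following sketch computes the value w(G) of a tiny game, and the value of its two-fold repetition G⊗G, by brute force over all protocols. The XOR game used here (win iff u⊕v = x∧y under the uniform product measure) is a standard small example chosen for illustration; all names are ad hoc.

```python
from itertools import product

# A small XOR game (an illustrative example, not a game from the paper):
# x, y are uniform random bits, the answers u, v are bits, and the players
# win iff u XOR v = x AND y.
def Q(x, y, u, v):
    return 1 if (u ^ v) == (x & y) else 0

def value(X, Y, U, V, mu, pred):
    """Brute-force w(G): maximize over all protocols u: X -> U, v: Y -> V."""
    best = 0.0
    for fu in product(U, repeat=len(X)):        # all functions X -> U
        for fv in product(V, repeat=len(Y)):    # all functions Y -> V
            val = sum(mu[x, y] * pred(x, y, fu[i], fv[j])
                      for i, x in enumerate(X) for j, y in enumerate(Y))
            best = max(best, val)
    return best

bits = (0, 1)
mu = {(x, y): 0.25 for x in bits for y in bits}    # uniform product measure
w1 = value(bits, bits, bits, bits, mu, Q)

# The two-fold repetition G (x) G: inputs and answers are pairs, and the
# product predicate must be satisfied in both coordinates simultaneously.
pairs = tuple(product(bits, repeat=2))
mu2 = {(x, y): mu[x[0], y[0]] * mu[x[1], y[1]] for x in pairs for y in pairs}
def Q2(x, y, u, v):
    return Q(x[0], y[0], u[0], v[0]) * Q(x[1], y[1], u[1], v[1])
w2 = value(pairs, pairs, pairs, pairs, mu2, Q2)

print(w1, w2)
```

For this particular game w(G) = 3/4. Playing an optimal protocol independently in each coordinate already guarantees w(G⊗G) ≥ w(G)², and for a product measure w(G⊗G) ≤ w(G); the search confirms both. As discussed above, for some games w(G⊗G) is strictly larger than w(G)².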
Theorem 1.1

There exists a global function $W: [0,1]\to[0,1]$, with $z < 1 \Rightarrow W(z) < 1$, such that given a game G, with value w(G), and answer-size s(G) ≥ 2:
$$ w(G^{\otimes k}) \le W(w(G))^{k/\log_2(s(G))}. $$
The exact behavior of the function W(z) is not the focus of this paper. We will make here, however, a few comments about the function W(z) implicit in the paper:
1. When z tends to 0, the function W(z) (implicit in this paper) tends to a constant $const_1 > 0$. In fact, for every $0 < z \le 1$, $W(z) > const_1$. It is still not clear whether a tendency of W(z) to 0, when z tends to 0, can be achieved.

2. Obviously, if w(G) is a global constant (e.g. w(G) = 1/2) then W(w(G)) is just a different global constant. E.g., there exists a constant $const_2 < 1$, s.t. for every $0.01 \le z \le 0.99$, $const_1 < W(z) < const_2$.

3. Obviously, when z tends to 1, W(z) also tends to 1. It will be simpler to denote $t = 1-z$, and to analyze the behavior of $[1 - W(1-t)]$ when t tends to 0. It is implicit in this paper that there exists a (small) constant $const_3$ (1/34 seems to be enough), s.t. when t tends to 0, $[1 - W(1-t)]$ can be bounded by $O(t^{const_3})$. E.g., there exists a constant $const_4$, s.t. for every $t \le 0.01$, $[1 - W(1-t)] \le const_4\, t^{const_3}$, i.e.
$$ W(1-t) \ge 1 - const_4\,(1-z)^{const_3}. $$

It was recently shown in [27] that in certain examples the number of repetitions, k, required to decrease the probability of error from w(G) = 1/2 to $w(G^{\otimes k}) \le 1/8$ is
$$ \Omega\!\left(\frac{\log_2(s(G))}{\log\log_2(s(G))}\right). $$
This shows that the factor $\log_2(s(G))$ in Theorem 1.1 is almost best possible. It was observed by [41] that in Theorem 1.1 the term $\log_2(s(G))$ can be replaced with the (possibly smaller) CC(G), or with the (possibly smaller) Δ(G), defined in the following way: For every $(x,y)\in X\times Y$, define $Q_{x,y}: U\times V\to\{0,1\}$ by $Q_{x,y}(u,v) = Q(x,y,u,v)$. Define $CC_{x,y}$ to be the deterministic communication complexity of the function $Q_{x,y}$, and define $\Delta_{x,y}$ to be the exact-cover-number of the same function (for the definitions see [41, 33]). Then define CC(G) to be the maximum, taken over x, y, of $CC_{x,y}$, and define Δ(G) to be the maximum, taken over x, y, of $\Delta_{x,y}$. It is well known (and very easy to prove) that
$$ \Delta(G) \le CC(G) \le \log_2(s(G)). $$
Throughout the paper (Sections 2, 3, 4, 5, 6), X, Y, U, V, Q, μ refer to the game G from Theorem 1.1. Similarly, $k, \bar G, \bar X, \bar Y, \bar U, \bar V, \bar Q, \bar\mu$ refer to Theorem 1.1 as well. Denote, for the rest of the paper, s = s(G) = |U||V|.
1.3 Main Technical Theorem

In what follows, we will sometimes denote the game G by $G_\mu$, and denote its value by w(μ). We will sometimes say that the protocol is a protocol for μ (as X, Y, U, V, Q will be fixed). We will look at the values w(ψ), for different probability measures $\psi: X\times Y\to R^+$.
For a measure $\mu: \Omega\to R^+$, and a set $C\subseteq\Omega$, we use the usual notation
$$ \mu(C) = \sum_{z\in C}\mu(z). $$
For a probability measure $\mu: \Omega\to R^+$, and a set $C\subseteq\Omega$, denote by $\mu_C: \Omega\to R^+$ the probability measure
$$ \mu_C(z) = \begin{cases} 0 & \text{for } z\notin C\\[2pt] \dfrac{\mu(z)}{\mu(C)} & \text{for } z\in C.\end{cases} $$
We will think of $\mu_C$ also as $\mu_C: C\to R^+$. This definition makes sense only if μ(C) > 0. For C with μ(C) = 0, define $\mu_C(z)$ to be identically 0. More generally, the term 0/0 can appear in some places in this paper. Unless said otherwise, 0/0 is defined to be 0. The term 0·z is defined to be 0, even if z = ∞, or z is undefined. This can occur when we take the expectation (or a weighted average) of a variable z which is undefined with probability 0 (or with a weight of 0).

For a set $C\subseteq\bar X\times\bar Y$, define the game $G_C^{\otimes k} = G_C$ to consist of $\bar X, \bar Y, \bar U, \bar V$, with the measure $\bar\mu_C$, and the predicate $\bar Q$. We think of $G_C$ as the restriction of $\bar G$ to the set C. In the proof we will always work with a product set $A = A_X\times A_Y$, where $A_X\subseteq\bar X$, $A_Y\subseteq\bar Y$. Since in most parts of the paper we work with one specific $A = A_X\times A_Y$, it will be convenient to denote (throughout the paper) the measure $\bar\mu_A$ by ν. The game $G_A$ will also be denoted by $G_\nu$, and its value will also be denoted by w(ν). Define the predicate $\bar Q_i: \bar X\times\bar Y\times U\times V\to\{0,1\}$ by
$$ \forall x\in\bar X,\ y\in\bar Y,\ u'\in U,\ v'\in V:\qquad \bar Q_i(x,y,u',v') = Q(x_i,y_i,u',v'). $$
Define the game $G_i^A$ to consist of $\bar X, \bar Y, U, V$ with the measure ν, and the predicate $\bar Q_i$. We also denote $G_i^A$ by $G_i^\nu$, and its value by $w_i(\nu)$. We think of $G_i^\nu$ as the restriction of $G_\nu$ to one coordinate. Notice that for an input $(x,y)\in\bar X\times\bar Y$, the game $G_\nu$ just means playing simultaneously all the games $G_i^\nu$ (on the same input). The following is our main technical theorem. The theorem claims that if $A = A_X\times A_Y$ is large then $w_i(\nu)$ is small for at least one coordinate i.
Theorem 1.2

There exists a global function $W_2: [0,1]\to[0,1]$, with $z < 1 \Rightarrow W_2(z) < 1$, and a global constant $c_0$, such that for all games G: for any k, and any product set $A = A_X\times A_Y\subseteq\bar X\times\bar Y$ (where $A_X\subseteq\bar X$, $A_Y\subseteq\bar Y$), and any $0\le\delta\le 1$, if $-\log_2\bar\mu(A)\le\delta k$ (i.e., $\bar\mu(A)\ge 2^{-\delta k}$) then there exists i with
$$ w_i(\nu) \le W_2(w(G)) + c_0\,\delta^{1/16}. $$
Again, achieving the best function $W_2$ and the best constant $c_0$, and improving the constant 1/16, are not the focus of this paper.
1.4 Some More Notations and Basic Facts
For a measure or function $\psi: \Omega^k\to R$, define $\psi_i$ to be the projection of ψ on the i-th coordinate. Thus, for a ∈ Ω,
$$ \psi_i(a) = \sum_{\{z\in\Omega^k \mid z_i = a\}}\psi(z). $$
In particular, for $\psi: \bar X\times\bar Y\to R$, define $\Omega = X\times Y$, and think of ψ as a measure (or function) $\psi: \Omega^k\to R$. The projection $\psi_i$ is now a measure (or function) $\psi_i: X\times Y\to R$. In particular we will be interested in the projections $\nu_i$. For a probability measure $\psi: X\times Y\to R^+$, we will be interested, from time to time, in the ψ-probability of an element x ∈ X. We denote this probability by ψ(x); thus
$$ \psi(x) = \sum_{y\in Y}\psi(x,y). $$
For simplicity we use the same notation ψ(y) for the ψ-probability of an element y ∈ Y. The difference will be that we use the letters $x, x_i, a, a_i$ for elements of X, and the letters $y, y_i, b, b_i$ for elements of Y. It will always be clear whether the element is an element of X or an element of Y. When X, Y, U, V, Q are fixed, we will be interested in the value of the game, w(ψ), as a function of the measure $\psi: X\times Y\to R^+$. First notice that w is a continuous function. Also, if for $\psi_1, \psi_2$ the $L_1$ distance satisfies $\|\psi_1-\psi_2\|_1\le\epsilon$ then $|w(\psi_1)-w(\psi_2)|\le\epsilon$. This is true since every protocol for one of the measures can be viewed as a protocol for the other one, with value different by at most ε. Thus w has a Lipschitz constant of 1. If $\psi = p\,\psi_1 + (1-p)\,\psi_2$, for $0\le p\le 1$, then
$$ w(\psi) \ge p\,w(\psi_1) + (1-p)\,w(\psi_2). $$
This is true because every protocol for ψ can be viewed as a protocol for $\psi_1$ and $\psi_2$. Thus the function w is concave. If $\psi = \sum_{i=1}^m p_i\,\psi_i$, where $\forall i: 0\le p_i\le 1$, and $\sum_{i=1}^m p_i = 1$, then
$$ w(\psi) \ge \sum_{i=1}^m p_i\,w(\psi_i). $$
For a game $G_\psi$, it is sometimes convenient to consider also probabilistic protocols. In a probabilistic protocol, the "answers" of the players can depend also on a random string. Thus the "answers" are u(x,h), v(y,h) (rather than u(x), v(y)), where h is a random string. The value of the protocol will be the probability, over the inputs (x,y) and over the random strings, that Q(x,y,u(x,h),v(y,h)) = 1. However, since a probabilistic protocol can be viewed as a convex combination of deterministic ones, the value of any probabilistic protocol can be achieved by a deterministic one. Remark: in this paper the logarithm function log is always taken base 2. The natural logarithm is denoted by ln.
1.5 Organization of the Paper

The paper is organized as follows: In Section 2, we show how Theorem 1.2 leads to the proof of Theorem 1.1. In Section 3, we review the definition, and the basic properties, of the informational divergence, a basic tool of information theory. This tool is needed for the proof of Theorem 1.2. Theorem 1.2 is proved in Section 4. The proof of the main lemma is deferred to Section 5, and the proof of another lemma is deferred to Section 6. In Section 7, several generalizations of Theorem 1.1 are briefly discussed. We remark that two "shortcuts" can be taken in the paper: First, in the special case where the measure μ is a product measure (i.e., $\mu = \mu_X\otimes\mu_Y$), the entire argument is much simpler, and the proof follows simply from Section 2 and the beginning of Section 4 (plus simpler versions of several lemmas in Section 3, using entropy instead of informational divergence). Also, Section 6 is not really needed for the proof of the parallel repetition conjecture, but it is needed to improve the constant in the exponent to log(s(G)). A simpler proof which does not use Section 6 can be given, but the constant in the exponent in that proof is much worse.
2 Proof of the Parallel Repetition Theorem

In this section, we show how Theorem 1.2 leads to the proof of Theorem 1.1. Theorem 1.1 claims an upper bound for the value $w(\bar G)$. We will prove even more: we will upper bound the value $w(G_A)$ of the game $G_A$, for any large product set $A = A_X\times A_Y\subseteq\bar X\times\bar Y$.

We will first give some intuition: The proof uses a simple induction on the dimension k. The idea is to assume, by Theorem 1.2, w.l.o.g. an upper bound for $w(G_1^\nu)$. Given a protocol for $G_A$, we partition A into product subsets, according to the behavior of the protocol on the first coordinate. The size of this partition is not too big, and therefore the average size of a subset in the partition is not too small. In many of these subsets the protocol fails to satisfy $\bar Q$, because it fails to satisfy $\bar Q_1$. We can disregard these subsets. In every other subset, the predicate $\bar Q$ can be thought of as a $(k-1)$-dimensional predicate, and we can use induction to upper bound the size of the set of points which satisfy this predicate. By this argument, we will get a recursive bound for $w(G_A)$, as a function of the dimension k and the size $\bar\mu(A)$.

For the game G, define $C_G(k,r)$ to be the maximum, taken over A, of the value $w(G_A)$ of a game $G_A = G_\nu$, where $A = A_X\times A_Y$ is a product set (with $A_X\subseteq\bar X$, $A_Y\subseteq\bar Y$), with $-\log\bar\mu(A)\le r$ (i.e., $\bar\mu(A)\ge 2^{-r}$). Here, k is the dimension, and r ≥ 0 is real. For k = 0, it will be convenient to define $C_G(0,r) = 1$. We will prove an upper bound for $C_G(k,r)$, as a function of w(G). The theorem will follow by taking r = 0. Recall that X, Y, U, V, Q, μ refer to the game G from Theorem 1.1, and that s = s(G). Assume for simplicity that 0 < w(G) < 1 (otherwise the game is trivial). Take $0<\delta<1$ (δ will be determined later on). For $r\le\delta k$, take $A = A_X\times A_Y\subseteq\bar X\times\bar Y$, with $-\log\bar\mu(A)\le r$, which achieves $C_G(k,r)$. Thus
$$ w(G_\nu) = C_G(k,r). $$
By Theorem 1.2, there exists i with
$$ w(G_i^\nu) = w_i(\nu) \le W_2(w(G)) + c_0\,\delta^{1/16}, $$
where $W_2, c_0$ are taken from Theorem 1.2. W.l.o.g. assume that i = 1. Denote
$$ \hat w = W_2(w(G)) + c_0\,\delta^{1/16}. $$
We will later on assume that δ is such that $0<\hat w<1$ and
$$ 2^{-1/2} < \hat w^{\frac{1}{2(\log(s)+\delta)}} < 1 $$
(at this point the reader can ignore these two assumptions). Let $u:\bar X\to\bar U$, $v:\bar Y\to\bar V$ be a protocol for $G_\nu$, achieving the value $w(G_\nu)$. The pair $(x_1,u_1(x))$ is a function of $x\in\bar X$. Partition $A_X$ according to $(x_1,u_1(x))$. Formally, for all $x'\in X,\ u'\in U$ define
$$ A_X(x',u') = \{x\in A_X \mid x_1 = x',\ u_1(x) = u'\}. $$
Then the family $\{A_X(x',u')\}$ is a partition of $A_X$. In the same way, define
$$ A_Y(y',v') = \{y\in A_Y \mid y_1 = y',\ v_1(y) = v'\}. $$
Then the family $\{A_Y(y',v')\}$ is a partition of $A_Y$. For simplicity, denote in all the following $Z = X\times Y\times U\times V$, and $z = (x',y',u',v')\in Z$. For all $z = (x',y',u',v')\in Z$ denote
$$ A(z) = A_X(x',u')\times A_Y(y',v'). $$
Then the family $\{A(z)\}$ is a partition of A, and we have
$$ A = \bigcup_{z\in Z}A(z). $$
For all z ∈ Z, define
$$ B(z) = \big\{(x,y)\in A(z)\ \big|\ \bar Q(x,y,u(x),v(y)) = 1\big\}, $$
i.e., B(z) is just the set of elements of A(z) satisfying $\bar Q$. Notice that for z ∉ Q (i.e., z such that Q(z) = 0) we have $(x,y)\in A(z) \Rightarrow Q(x_1,y_1,u_1(x),v_1(y)) = Q(z) = 0$. Therefore, z ∉ Q implies B(z) = ∅. On the other hand, for z ∈ Q we have that $(x,y)\in A(z) \Rightarrow Q(x_1,y_1,u_1(x),v_1(y)) = Q(z) = 1$. Therefore, for z ∈ Q, and $(x,y)\in A(z)$,
$$ \bar Q(x,y,u(x),v(y)) = \prod_{i=2}^{k}Q(x_i,y_i,u_i(x),v_i(y)). $$
Thus in this case, B(z) is a set of elements satisfying a $(k-1)$-dimensional predicate. This fact enables us to use induction. For all z define
$$ \alpha(z) = \nu(A(z)), \qquad \beta(z) = \frac{\nu(B(z))}{\nu(A(z))}. $$
Then we have:

Claim 2.1
$$ \sum_{z\in Q}\alpha(z) \le \hat w. $$

Proof: $u_1:\bar X\to U$, $v_1:\bar Y\to V$ can be viewed as a protocol for $G_1^\nu$. The value of this protocol is clearly
$$ \sum_{z\in Q}\nu(A(z)) = \sum_{z\in Q}\alpha(z), $$
but this value is at most $w(G_1^\nu)\le\hat w$. □
Claim 2.2

For all $z = (x',y',u',v')\in Q$, with α(z) > 0,
$$ \beta(z) \le C_G\big(k-1,\ r-\log[\alpha(z)/\mu(x',y')]\big). $$

Proof: First notice that if α(z) > 0 then also μ(x′,y′) > 0, thus the logarithm is well defined. For k = 1, the claim is immediate. Assume that k > 1. Ignoring the first coordinate, which is fixed, A(z) can be viewed as a set of dimension k−1. Formally, define $A'(z)\subseteq X^{k-1}\times Y^{k-1}$ by
$$ A'(z) = \big\{(\tilde x,\tilde y)\in X^{k-1}\times Y^{k-1}\ \big|\ ((x',\tilde x),(y',\tilde y))\in A(z)\big\}, $$
where $(x',\tilde x)$ denotes $x\in X^k$ with $x_1 = x'$ and $(x_2,\dots,x_k) = \tilde x$ (and similarly $(y',\tilde y)$). Since for $(x,y)\in A(z)$: $x_1 = x'$ and $y_1 = y'$, we have by definition
$$ \mu^{\otimes k}(A(z)) = \mu(x',y')\cdot\mu^{\otimes(k-1)}(A'(z)). $$
In the same way define
$$ B'(z) = \big\{(\tilde x,\tilde y)\in X^{k-1}\times Y^{k-1}\ \big|\ ((x',\tilde x),(y',\tilde y))\in B(z)\big\}. $$
Then we have
$$ \mu^{\otimes k}(B(z)) = \mu(x',y')\cdot\mu^{\otimes(k-1)}(B'(z)). $$
The last k−1 coordinates of $u:\bar X\to\bar U$, $v:\bar Y\to\bar V$ can be viewed as a protocol for the game $G_{A'(z)}^{\otimes(k-1)}$. Since z ∈ Q, this protocol satisfies $Q^{\otimes(k-1)}$ at the set of elements B′(z). Therefore, the value of this protocol is
$$ \frac{\mu^{\otimes(k-1)}(B'(z))}{\mu^{\otimes(k-1)}(A'(z))} = \frac{\bar\mu(B(z))}{\bar\mu(A(z))} = \frac{\bar\mu(B(z))/\bar\mu(A)}{\bar\mu(A(z))/\bar\mu(A)} = \frac{\nu(B(z))}{\nu(A(z))} = \beta(z), $$
so by the definition of $C_G$ we have
$$ \beta(z) \le C_G\big(k-1,\ -\log\mu^{\otimes(k-1)}(A'(z))\big) = C_G\big(k-1,\ -\log[\bar\mu(A(z))/\mu(x',y')]\big) = C_G\big(k-1,\ -\log[\bar\mu(A)\,\alpha(z)/\mu(x',y')]\big) \le C_G\big(k-1,\ r-\log\alpha(z)+\log\mu(x',y')\big) $$
(recall that by definition $C_G(k',r')$ is monotone in r′). Notice that since $\mu^{\otimes(k-1)}(A'(z))\le 1$, $r-\log\alpha(z)+\log\mu(x',y')\ge 0$, thus $C_G$ is defined. □
Claim 2.3
$$ C_G(k,r) = \sum_{z\in Q}\alpha(z)\,\beta(z). $$

Proof: The protocol u, v satisfies $\bar Q$ at the set of elements $\bigcup_{z\in Z}B(z)$. Therefore
$$ C_G(k,r) = w(G_\nu) = \nu\Big(\bigcup_{z\in Z}B(z)\Big) = \sum_{z\in Z}\nu(B(z)) = \sum_{z\in Z}\nu(A(z))\,\frac{\nu(B(z))}{\nu(A(z))} = \sum_{z\in Z}\alpha(z)\,\beta(z), $$
but z ∉ Q implies β(z) = 0. Thus
$$ C_G(k,r) = \sum_{z\in Q}\alpha(z)\,\beta(z). \qquad\Box $$
Denote
$$ T = \{z\in Q \mid \alpha(z) > 0\}. $$
From Claim 2.2 and Claim 2.3 we have the recursive inequality
$$ C_G(k,r) \le \sum_{z\in T}\alpha(z)\,C_G\big(k-1,\ r-\log[\alpha(z)/\mu(x',y')]\big) \tag{1} $$
where, by Claim 2.1,
$$ \sum_{z\in T}\alpha(z) \le \hat w. $$
We will now assume that δ is such that $0<\hat w<1$ and
$$ 2^{-1/2} < \hat w^{\frac{1}{2(\log(s)+\delta)}} < 1. $$
We will prove by induction on k the inequality:
Claim 2.4
$$ C_G(k,r) \le \hat w^{\frac{\delta k - r}{2(\log(s)+\delta)}}. $$

Proof: For k = 0, the claim is trivial (since $0<\hat w<1$, and r ≥ 0). For k ≥ 1, assume the inequality for k−1, and substitute in inequality (1) to get
$$ C_G(k,r) \le \sum_{z\in T}\alpha(z)\,\hat w^{\frac{\delta(k-1)-r+\log[\alpha(z)/\mu(x',y')]}{2(\log(s)+\delta)}} = \hat w^{\frac{\delta k-r}{2(\log(s)+\delta)}}\,\sum_{z\in T}\alpha(z)\,\hat w^{\frac{-\delta+\log[\alpha(z)/\mu(x',y')]}{2(\log(s)+\delta)}}. $$
Hence, it will be enough to prove
$$ \sum_{z\in T}\alpha(z)\,\hat w^{\frac{-\delta-\log(s)+\log[s\,\alpha(z)/\mu(x',y')]}{2(\log(s)+\delta)}} \le 1. $$
Define
$$ t(z) = \begin{cases} 0 & \text{for } z\notin T\\[4pt] \dfrac{s\,\alpha(z)}{\mu(x',y')} & \text{for } z\in T.\end{cases} $$
Then the inequality is equivalent to
$$ \sum_{z\in T}\frac{\mu(x',y')}{s}\,t(z)\,\hat w^{\frac{\log t(z)}{2(\log(s)+\delta)}} \le \hat w^{\frac{\delta+\log(s)}{2(\log(s)+\delta)}} = \hat w^{1/2}. $$
Define $p(z) = \mu(x',y')/s$, and $f: R^+\setminus\{0\}\to R$ by
$$ f(t) = t\,c^{\log t} = t^{1+\log c}, \qquad\text{where}\qquad c = \hat w^{\frac{1}{2(\log(s)+\delta)}}. $$
In these notations, we have to prove
$$ \sum_{z\in T}p(z)\,f(t(z)) \le \hat w^{1/2}. $$
We assumed $2^{-1/2} < c < 1$. Therefore $-\frac12 < \log c < 0$, thus $f(t) = t\,c^{\log t} = t^{1+\log c}$ is concave. Notice that
$$ \sum_{z\in Z}p(z) = \sum_{U\times V}\ \sum_{X\times Y}\frac{\mu(x',y')}{s} = \sum_{U\times V}\frac{1}{s} = s\cdot\frac{1}{s} = 1. $$
Therefore, we can use Jensen's inequality to conclude
$$ \sum_{z\in T}p(z)\,f(t(z)) = \sum_{z\in T}p(z)\,t(z)^{1+\log c} = \sum_{z\in Z}p(z)\,t(z)^{1+\log c} \le \Big(\sum_{z\in Z}p(z)\,t(z)\Big)^{1+\log c}, $$
but
$$ \sum_{z\in Z}p(z)\,t(z) = \sum_{z\in T}p(z)\,t(z) = \sum_{z\in T}\frac{\mu(x',y')}{s}\cdot\frac{s\,\alpha(z)}{\mu(x',y')} = \sum_{z\in T}\alpha(z) \le \hat w, $$
and since the function $t^{1+\log c}$ is monotone in t, we have
$$ \sum_{z\in T}p(z)\,f(t(z)) \le \Big(\sum_{z\in Z}p(z)\,t(z)\Big)^{1+\log c} \le \hat w^{1+\log c} \le \hat w^{1-\frac12} = \hat w^{\frac12} $$
(where the third inequality uses the assumption $\log c > -1/2$). □
By Claim 2.4, and by $2(\log(s)+\delta) \le 4\log(s)$ (which follows from the assumptions δ < 1 and s ≥ 2), we can now conclude
$$ w(\bar G) = C_G(k,0) \le \hat w^{\frac{\delta k}{2(\log(s)+\delta)}} \le \hat w^{\frac{\delta k}{4\log(s)}} = \big(\hat w^{\delta/4}\big)^{k/\log(s)}, $$
where $\hat w = W_2(w(G)) + c_0\,\delta^{1/16}$, and $0<\delta<1$ satisfies $0<\hat w<1$, and
$$ 2^{-1/2} < \hat w^{\frac{1}{2(\log(s)+\delta)}} < 1. $$
Since $0 < W_2(w(G)) < 1$, and since ŵ is monotone in δ, the conditions are satisfiable: just start from δ = 0, and increase δ until the conditions hold. Thus Theorem 1.1 follows. We will take in Theorem 1.1
$$ W(w(G)) = \inf_\delta\, \hat w^{\delta/4}, $$
where the infimum is taken over all $0<\delta<1$ which satisfy the conditions.
3 Informational Divergence

In this section, we define the informational divergence, a basic tool of information theory, and review some of its basic properties that will be used in the paper. The reader can find excellent treatments of the subject in [29, 17].

Given a finite probability space Ω, with two probability measures θ, ψ, the divergence of θ with respect to ψ is defined by
$$ D(\theta\|\psi) = \sum_{z\in\Omega}\theta(z)\log\frac{\theta(z)}{\psi(z)}. $$
In this definition, $0\log\frac{0}{0}$ is defined to be 0, and for z > 0, $z\log\frac{z}{0}$ is defined to be ∞. Thus $D(\theta\|\psi) < \infty$ if and only if θ is absolutely continuous with respect to ψ (i.e., ψ(z) = 0 implies θ(z) = 0). In the special case where ψ is the uniform distribution, we have
$$ D(\theta\|\psi) = \log_2|\Omega| - H(\theta), $$
where H(θ) is the standard entropy of θ. More generally, the divergence $D(\theta\|\psi)$ can be thought of as the entropy of the measure θ, relative to the measure ψ, as opposed to the standard entropy of θ, which is taken relative to the uniform distribution. $D(\theta\|\psi)$ has many names and notations throughout the literature. In [29] it is also called "relative entropy", and denoted by $H_{\theta\|\psi}(\mathcal Q)$, where $\mathcal Q$ is the partition of Ω into single points. This is a special case of the following definition: For a measurement f on Ω, with a finite alphabet A, let $\mathcal Q$ be the induced partition $\{f^{-1}(a)\}_{a\in A}$. Let $\theta_f, \psi_f$ be the corresponding probability mass functions, i.e., for a ∈ A,
$$ \theta_f(a) = \theta(\{z\in\Omega \mid f(z) = a\}), \qquad \psi_f(a) = \psi(\{z\in\Omega \mid f(z) = a\}). $$
The relative entropy of f, with measure θ, with respect to the measure ψ, is defined by
$$ H_{\theta\|\psi}(f) = H_{\theta\|\psi}(\mathcal Q) = \sum_{a\in A}\theta_f(a)\log\frac{\theta_f(a)}{\psi_f(a)}. $$
In this paper, we prefer to use the notation $D(\theta\|\psi)$. The following lemma, known as the divergence inequality, is probably the most basic property of the informational divergence:
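The definition translates directly into code. The following minimal sketch (base-2 logarithms, following the conventions just stated) also checks the identity $D(\theta\|\psi) = \log_2|\Omega| - H(\theta)$ for uniform ψ; the specific distributions are made-up examples.

```python
import math

def divergence(theta, psi):
    """D(theta||psi) = sum_z theta(z) * log2(theta(z)/psi(z)), with the
    conventions stated above: terms with theta(z) = 0 contribute 0, and the
    divergence is infinite when theta is not absolutely continuous w.r.t. psi."""
    d = 0.0
    for z, t in theta.items():
        if t == 0:
            continue
        if psi.get(z, 0) == 0:
            return math.inf
        d += t * math.log2(t / psi[z])
    return d

def entropy(theta):
    return -sum(t * math.log2(t) for t in theta.values() if t > 0)

# Against the uniform distribution, D(theta||psi) = log2|Omega| - H(theta).
omega = range(8)
uniform = {z: 1 / 8 for z in omega}
theta = {0: 0.5, 1: 0.25, 2: 0.25, **{z: 0.0 for z in range(3, 8)}}
print(divergence(theta, uniform), math.log2(8) - entropy(theta))
```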
Lemma 3.1

For all θ, ψ, we have
$$ D(\theta\|\psi) \ge 0. $$

Proof: See [29, Ch. 2, Theorem 2.3.1]. □
For measures θ, ψ on $\prod_{i=1}^k\Omega_i$, recall that $\theta_i, \psi_i$ are the projections of θ, ψ on the i-th coordinate.

Lemma 3.2

For measures θ, ψ on $\Omega_1\times\Omega_2$, such that $\psi = \psi_1\otimes\psi_2$,
$$ D(\theta\|\psi) \ge D(\theta_1\|\psi_1) + D(\theta_2\|\psi_2). $$

Proof: See [29, Ch. 2, Lemma 2.5.3]. The lemma is stated there as: $M_{XY} = M_X\times M_Y$ implies
$$ H_{P\|M}(X,Y) \ge H_{P\|M}(X) + H_{P\|M}(Y). \qquad\Box $$

The next lemma can be viewed as a generalization of the well known entropy inequality
$$ H(z_1,\dots,z_k) \le H(z_1) + \dots + H(z_k) $$
(for any random variable $\bar z$), and as a generalization of the previous lemma.

Lemma 3.3

For measures θ, ψ on $\prod_{i=1}^k\Omega_i$, such that $\psi = \otimes_{i=1}^k\psi_i$,
$$ D(\theta\|\psi) \ge \sum_{i=1}^k D(\theta_i\|\psi_i). $$

Proof: Immediate from Lemma 3.2. □
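Lemma 3.3 can be sanity-checked numerically: for a product measure ψ and random measures θ on a two-coordinate space, the divergence should dominate the sum of the projected divergences. The measures below are arbitrary test data, not from the paper.

```python
import math
import random

# A numeric check (a sketch, with made-up measures) of Lemma 3.3 for k = 2:
# when psi = psi1 (x) psi2 is a product measure on Omega_1 x Omega_2,
# D(theta||psi) >= D(theta_1||psi_1) + D(theta_2||psi_2) for every theta.
def D(theta, psi):
    return sum(t * math.log2(t / psi[z]) for z, t in theta.items() if t > 0)

def project(m, i):
    out = {}
    for z, p in m.items():
        out[z[i]] = out.get(z[i], 0.0) + p
    return out

psi1 = {0: 0.5, 1: 0.25, 2: 0.25}
psi2 = {0: 0.2, 1: 0.3, 2: 0.5}
psi = {(a, b): psi1[a] * psi2[b] for a in psi1 for b in psi2}

rng = random.Random(1)
ok = True
for _ in range(100):
    raw = {z: rng.random() for z in psi}
    tot = sum(raw.values())
    theta = {z: p / tot for z, p in raw.items()}
    ok &= D(theta, psi) >= D(project(theta, 0), psi1) + D(project(theta, 1), psi2) - 1e-9
print(ok)
```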
For a function $\lambda: \Omega\to R$, denote by $\|\lambda\|_1$ the standard $L_1$ norm of λ, i.e.,
$$ \|\lambda\|_1 = \sum_{z\in\Omega}|\lambda(z)|. $$
If λ is a probability measure then $\|\lambda\|_1 = 1$. In this paper we use $\|\theta-\psi\|_1$ as a distance function between measures (or functions). It is not true that the divergence $D(\theta\|\psi)$ is a distance function. However, it is true that if $D(\theta\|\psi)$ is small then the $L_1$ distance between θ and ψ is also small:

Lemma 3.4

For all θ, ψ, we have
$$ (2\ln 2)\,D(\theta\|\psi) \ge \big(\|\theta-\psi\|_1\big)^2. $$

Proof: See [17, Ch. 3, Exercise 17], and the references there. □
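A quick numerical check of Lemma 3.4 on random pairs of distributions; the sampling scheme below is an arbitrary choice for testing, not from the paper.

```python
import math
import random

# A numeric check (a sketch) of Lemma 3.4 on random distribution pairs:
# (2 ln 2) * D(theta||psi) >= (||theta - psi||_1)^2.
def D(theta, psi):
    return sum(t * math.log2(t / psi[z]) for z, t in theta.items() if t > 0)

rng = random.Random(2)

def rand_dist(n):
    raw = [rng.random() + 1e-6 for _ in range(n)]   # keep psi strictly positive
    s = sum(raw)
    return {z: p / s for z, p in enumerate(raw)}

ok = True
for _ in range(500):
    n = rng.randint(2, 6)
    theta, psi = rand_dist(n), rand_dist(n)
    l1 = sum(abs(theta[z] - psi[z]) for z in theta)
    ok &= 2 * math.log(2) * D(theta, psi) >= l1 ** 2 - 1e-9
print(ok)
```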
The next lemma computes the value of $D(\theta\|\psi)$ in the special case $\theta = \psi_A$, where $A\subseteq\Omega$.

Lemma 3.5

For a measure ψ, and a set $A\subseteq\Omega$, we have
$$ D(\psi_A\|\psi) = -\log\psi(A). $$

Proof:
$$ D(\psi_A\|\psi) = \sum_{z\in\Omega}\psi_A(z)\log\frac{\psi_A(z)}{\psi(z)} = \sum_{z\in A}\psi_A(z)\log\frac{\psi(z)/\psi(A)}{\psi(z)} = \sum_{z\in A}\psi_A(z)\big(-\log\psi(A)\big) = -\log\psi(A). \qquad\Box $$
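Lemma 3.5 can be verified directly on a small example (the measure and the set A below are made up):

```python
import math

# A direct check (a sketch, with a made-up measure) of Lemma 3.5:
# D(psi_A || psi) = -log2 psi(A).
def D(theta, psi):
    return sum(t * math.log2(t / psi[z]) for z, t in theta.items() if t > 0)

psi = {z: (z + 1) / 10 for z in range(4)}     # masses 0.1, 0.2, 0.3, 0.4
A = {1, 3}                                    # psi(A) = 0.2 + 0.4 = 0.6
psi_A = {z: (psi[z] / 0.6 if z in A else 0.0) for z in psi}
print(D(psi_A, psi), -math.log2(0.6))
```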
For a probability measure $\psi: \Omega\to R^+$, where $\Omega = X\times Y$, define $\psi(a,\cdot): Y\to R^+$ to be the probability measure on Y, derived from ψ by fixing x = a. Thus $\psi(a,\cdot)$ is the following probability measure: for all y ∈ Y,
$$ \psi(a,\cdot)(y) = \frac{\psi(a,y)}{\psi(a)}, $$
where $\psi(a) = \sum_{y\in Y}\psi(a,y)$. This definition makes sense only if ψ(a) > 0. Otherwise define $\psi(a,\cdot)$ to be identically 0. In the same way, define the measure $\psi(\cdot,y): X\to R^+$. For the measures $\theta, \psi: X\times Y\to R^+$, we will be interested in the values of $D(\theta(x,\cdot)\|\psi(x,\cdot))$, and $D(\theta(\cdot,y)\|\psi(\cdot,y))$. Define
$$ V_X(\theta\|\psi) = \sum_{x\in X}\theta(x)\,D(\theta(x,\cdot)\|\psi(x,\cdot)) $$
and
$$ V_Y(\theta\|\psi) = \sum_{y\in Y}\theta(y)\,D(\theta(\cdot,y)\|\psi(\cdot,y)). $$
These are denoted in [29] by $H_{\theta\|\psi}(Y|X)$, and $H_{\theta\|\psi}(X|Y)$. In addition, define
$$ V(\theta\|\psi) = \tfrac12\big[V_X(\theta\|\psi) + V_Y(\theta\|\psi)\big]. $$

The notion $V(\theta\|\psi)$ is central in the rest of the paper. In particular we will be interested in cases where $V(\theta\|\psi)$ is small. We saw before that if $D(\theta\|\psi)$ is small then $\|\theta-\psi\|_1$ is also small. Is the same true for $V(\theta\|\psi)$? Taking X = Y = {0,1} and
$$ \psi = \begin{pmatrix} \tfrac12 & 0\\ 0 & \tfrac12 \end{pmatrix}, \qquad \theta = \begin{pmatrix} 1 & 0\\ 0 & 0 \end{pmatrix}, $$
we have $V(\theta\|\psi) = 0$, but still $\|\theta-\psi\|_1 \ge \tfrac12$. Hence, in general it is not the case. A measure $\psi: X\times Y\to R^+$ is called irreducible if there are no non-trivial partitions $X = X_1\cup X_2$, $Y = Y_1\cup Y_2$ such that $\psi(X_1\times Y_2) = \psi(X_2\times Y_1) = 0$. Every measure $\psi: X\times Y\to R^+$ decomposes into its irreducible components. In general, it is true that if $V(\theta\|\psi) = 0$ then θ has the same components as ψ, and behaves like ψ on each one of them, but the θ-probability of each component can be different from the ψ-probability. For irreducible ψ it can be shown that if $V(\theta\|\psi) = 0$ then θ = ψ, and that if ψ is fixed, $V(\theta\|\psi)\to 0$ implies θ → ψ. However, this convergence is not uniform. For example, we can take X = Y = {0,1}, and
$$ \psi = \begin{pmatrix} \tfrac12 & 0\\ \epsilon & \tfrac12-\epsilon \end{pmatrix}, \qquad \theta = \begin{pmatrix} 1 & 0\\ 0 & 0 \end{pmatrix}. $$
In this case ψ is irreducible, and $V(\theta\|\psi) = O(\epsilon)$, but still $\|\theta-\psi\|_1 \ge \tfrac12$. Therefore, we will use in this paper a different characterization of measures θ with small $V(\theta\|\psi)$. This characterization, proved in Lemma 6.2, will intuitively say that in this case there are measures θ′, ψ′, such that θ′ is very close to θ, and ψ′ is very close to ψ, and such that θ′, ψ′ have the same irreducible components, and behave the same on each one of them (but do not necessarily give the same mass to each component). We remark that this characterization is not necessary to prove the parallel repetition conjecture. It is done here only to improve the constants.
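The quantities $V_X$, $V_Y$, $V$ translate directly into code. The sketch below (helper names are ad hoc) computes them on a block-diagonal example of the kind described above: ψ supported on the diagonal of {0,1}×{0,1}, θ concentrated on a single point, so that V(θ‖ψ) = 0 while the L₁ distance does not vanish.

```python
import math

# Computing V_X, V_Y, V = (V_X + V_Y)/2 directly (a sketch, not the paper's
# code): psi is block-diagonal, theta puts all mass on one block, and the
# conditional rows/columns of theta agree with those of psi, so V = 0.
def D(theta, psi):
    return sum(t * math.log2(t / psi[z]) for z, t in theta.items() if t > 0)

def conditional(m, a, axis):
    """m(a, .) for axis=0, or m(., a) for axis=1: fix one coordinate and
    renormalise (identically 0 when the fixed slice has no mass)."""
    row = {k[1 - axis]: p for k, p in m.items() if k[axis] == a}
    tot = sum(row.values())
    return {k: p / tot for k, p in row.items()} if tot > 0 else {}

def V(theta, psi):
    vx = sum(sum(p for (a, _), p in theta.items() if a == x) *
             D(conditional(theta, x, 0), conditional(psi, x, 0))
             for x in {k[0] for k, p in theta.items() if p > 0})
    vy = sum(sum(p for (_, b), p in theta.items() if b == y) *
             D(conditional(theta, y, 1), conditional(psi, y, 1))
             for y in {k[1] for k, p in theta.items() if p > 0})
    return 0.5 * (vx + vy)

psi   = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
theta = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
l1 = sum(abs(theta[k] - psi[k]) for k in psi)
print(V(theta, psi), l1)   # V = 0, yet the L1 distance is 1
```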
4 Proof of the Main Theorem

In this section we give the proof of Theorem 1.2. The proofs of two important lemmas are deferred to Section 5 and Section 6. Given X, Y, U, V, Q, k, define for any probability measure $\varphi: \bar X\times\bar Y\to R^+$ the following games:

1. The game $G_{\varphi_i}$, consisting of X, Y, U, V, Q, with the measure $\varphi_i$. Denote the value of this game by $w(\varphi_i)$.

2. The game $G_\varphi$, consisting of $\bar X, \bar Y, \bar U, \bar V, \bar Q$, with the measure φ. Denote the value of this game by w(φ).

3. The game $G_i^\varphi$, consisting of $\bar X, \bar Y, U, V$, with the predicate $\bar Q_i$, defined by
$$ \forall (x,y,u,v)\in\bar X\times\bar Y\times U\times V:\qquad \bar Q_i(x,y,u,v) = Q(x_i,y_i,u,v), $$
and with the measure φ. Denote the value of this game by $w_i(\varphi)$.

Recall that for any probability measure $\psi: X\times Y\to R^+$, we denote by w(ψ) the value of the game $G_\psi$, consisting of X, Y, U, V, Q, ψ. Theorem 1.2 claims that if $A = A_X\times A_Y$ is large then $w_i(\nu)$ cannot be too large for all coordinates i. If A is large, it is easy to show that for many coordinates i, $\nu_i$ is very close to μ, and therefore $w(\nu_i)$ is very close to w(μ), and is not too large. Hence, it would be enough to show that for some coordinates i, $w_i(\nu)$ is upper bounded by some "well behaved" function of $w(\nu_i)$. For any $\varphi: \bar X\times\bar Y\to R^+$, any protocol for $G_{\varphi_i}$ defines also a corresponding protocol, with the same value, for $G_i^\varphi$. This is true because if the distribution of (x,y) is φ then the distribution of $(x_i,y_i)$ is $\varphi_i$. Thus, given an input (x,y) for the game $G_i^\varphi$, $(x_i,y_i)$ can be used as an input for the protocol for $G_{\varphi_i}$. The output of this protocol can be viewed as an output for the game $G_i^\varphi$. Therefore, we always have
$$ w(\varphi_i) \le w_i(\varphi). $$
The other direction is false, as the other coordinates can give a lot of information on $(x_i,y_i)$. There is an important special case, however, in which $w(\varphi_i) = w_i(\varphi)$.
Lemma 4.1
If there exist 1 : X Y ! R ; 2 : X ! R ; 3 : Y ! R such that for all (x; y) 2 X Y
(x; y) = 1(xi; yi)2(x)3(y) then
w(i ) = wi()
Proof:
We will show that in this case a protocol for $\bar G^i_P$ defines a corresponding protocol, with the same value, for $G_{P^i}$. This will be true because in this special case a pair $(\bar x,\bar y)$ with distribution $P$ can simply be created, by the two players, from a pair $(a_i,b_i)$ with distribution $P^i$.

First notice that $P_1,P_2,P_3$ are not unique, as we can multiply $P_1$, and divide $P_2$, by the same function $f(\bar x_i)$, as long as for all $a\in X$: $f(a)\ne 0$. In the same way, we can multiply $P_1$, and divide $P_3$, by the same function $g(\bar y_i)$, as long as for all $b\in Y$: $g(b)\ne 0$. Therefore we can assume w.l.o.g. that for all $a_i,b_i$ the sums
$$\sum_{\{\bar x\in\bar X \mid \bar x_i=a_i\}} P_2(\bar x) \qquad\text{and}\qquad \sum_{\{\bar y\in\bar Y\mid \bar y_i=b_i\}} P_3(\bar y)$$
are always $0$ or $1$. Also, we can assume w.l.o.g. that if one of them is $0$ then $P_1(a_i,b_i)$ is also $0$. Therefore, in this case
$$P^i(a_i,b_i)=\sum_{\{(\bar x,\bar y) \mid (\bar x_i,\bar y_i)=(a_i,b_i)\}} P_1(a_i,b_i)P_2(\bar x)P_3(\bar y) = P_1(a_i,b_i)\Big(\sum_{\{\bar x\mid \bar x_i=a_i\}}P_2(\bar x)\Big)\Big(\sum_{\{\bar y\mid \bar y_i=b_i\}}P_3(\bar y)\Big)=P_1(a_i,b_i).$$
Notice that now, given $a_i\in X$ with $P^i(a_i)\ne 0$, we have $\sum_{\{\bar x\mid \bar x_i=a_i\}}P_2(\bar x)=1$. Therefore, for all $a_i\in X$ with $P^i(a_i)\ne 0$, we can define a probability distribution on $\bar X$ by
$$\Pr_{a_i}(\bar x)=\begin{cases} P_2(\bar x) & \text{if } \bar x_i=a_i\\ 0 & \text{if } \bar x_i\ne a_i\end{cases}$$
Note that $\Pr_{a_i}(\bar x)$ is exactly the $P$-probability for $\bar x$, given that $\bar x_i=a_i$. In the same way define for $b_i\in Y$, with $P^i(b_i)\ne 0$,
$$\Pr_{b_i}(\bar y)=\begin{cases} P_3(\bar y) & \text{if } \bar y_i=b_i\\ 0 & \text{if } \bar y_i\ne b_i\end{cases}$$
Given $(a_i,b_i)$, with $P^i(a_i,b_i)\ne 0$, choose randomly $\bar x\in\bar X$ according to $\Pr_{a_i}(\bar x)$, and $\bar y\in\bar Y$ according to $\Pr_{b_i}(\bar y)$. If $(a_i,b_i)$ is chosen with probability $P^i(a_i,b_i)$ then $(\bar x,\bar y)$ is chosen with probability
$$P^i(\bar x_i,\bar y_i)\Pr_{\bar x_i}(\bar x)\Pr_{\bar y_i}(\bar y)=P_1(\bar x_i,\bar y_i)P_2(\bar x)P_3(\bar y)=P(\bar x,\bar y).$$
Thus, if $(a_i,b_i)$ is a random variable with distribution $P^i$, then $(\bar x,\bar y)$ is a random variable with distribution $P$.

Players I, II can use a protocol for $\bar G^i_P$ to define a probabilistic protocol for $G_{P^i}$ in the following way: given an input pair $(a_i,b_i)\in X\times Y$ for the game $G_{P^i}$, player I chooses randomly $\bar x\in\bar X$ according to $\Pr_{a_i}(\bar x)$, and player II chooses randomly $\bar y\in\bar Y$ according to $\Pr_{b_i}(\bar y)$. Since $(a_i,b_i)$ is a random variable with distribution $P^i$, the pair $(\bar x,\bar y)$ is a random variable with distribution $P$, and can be used as an input for $\bar G^i_P$. Players I, II can now use the protocol for $\bar G^i_P$ on the input $(\bar x,\bar y)$ as a protocol for $G_{P^i}$ on the input $(a_i,b_i)$. Formally, if $u:\bar X\to U$, $v:\bar Y\to V$ is the protocol for $\bar G^i_P$, and $h$ is the random string used by the players, the protocol for $G_{P^i}$ will be: $u'(a_i,h)=u(\bar x)$, $v'(b_i,h)=v(\bar y)$ (where $\bar x,\bar y$ are created from $a_i,b_i,h$ by the previous procedure). Since
$$\bar Q^i(\bar x,\bar y,u(\bar x),v(\bar y))=Q(\bar x_i,\bar y_i,u(\bar x),v(\bar y))=Q(a_i,b_i,u'(a_i,h),v'(b_i,h)),$$
the probability that $Q(a_i,b_i,u'(a_i,h),v'(b_i,h))=1$ exactly equals the probability that $\bar Q^i(\bar x,\bar y,u(\bar x),v(\bar y))=1$. Since $(\bar x,\bar y)$ is a random variable with distribution $P$, this probability is the value of the original protocol for $\bar G^i_P$.

Thus we proved that a protocol for $\bar G^i_P$ defines a probabilistic protocol with the same value for $G_{P^i}$. It is well known that, since a probabilistic protocol can be viewed as a convex combination of deterministic ones, there must exist a deterministic protocol with at least the same value. □
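The factorization identity at the heart of the proof can be checked numerically. The sketch below (a hypothetical small instance, illustration only) normalizes the slice-sums of $P_2,P_3$ as in the w.l.o.g. step, verifies that the projection $P^i$ then equals $P_1$, and verifies that the two private resampling kernels reproduce $P$ exactly:

```python
from itertools import product

# Numeric check of Lemma 4.1's key identity on a toy instance:
# P(xbar, ybar) = P1(x_i, y_i) * P2(xbar) * P3(ybar), with the slice-sums of
# P2 and P3 normalized to 1, gives P^i = P1, and private resampling recovers P.
X = Y = [0, 1]
k, i = 2, 0
Xbar = list(product(X, repeat=k))
Ybar = list(product(Y, repeat=k))

P2 = {xb: 0.5 for xb in Xbar}          # each slice {xb : xb[i] = a} sums to 1
P3 = {yb: 0.5 for yb in Ybar}
P1 = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

P = {(xb, yb): P1[(xb[i], yb[i])] * P2[xb] * P3[yb] for xb in Xbar for yb in Ybar}

# Projection on coordinate i equals P1 after the normalization.
Pi = {}
for (xb, yb), p in P.items():
    Pi[(xb[i], yb[i])] = Pi.get((xb[i], yb[i]), 0.0) + p
assert all(abs(Pi[ab] - P1[ab]) < 1e-12 for ab in P1)

# The two private resampling kernels Pr_a, Pr_b from the proof.
Pr_a = {a: {xb: (P2[xb] if xb[i] == a else 0.0) for xb in Xbar} for a in X}
Pr_b = {b: {yb: (P3[yb] if yb[i] == b else 0.0) for yb in Ybar} for b in Y}

# Sampling (a_i, b_i) ~ P^i and extending privately reproduces P exactly.
for (xb, yb), p in P.items():
    assert abs(Pi[(xb[i], yb[i])] * Pr_a[xb[i]][xb] * Pr_b[yb[i]][yb] - p) < 1e-12
```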
Since the conditions of Lemma 4.1 do not usually hold, we cannot expect them to hold for the measure $P$. The idea will be to represent $P$ as a convex combination of measures that do satisfy the conditions of Lemma 4.1, and then to deduce bounds for the values $w_i(P)$. We remark that in the simpler case, in which the original measure $\mu$ is a product measure, the measure $P$ does satisfy the conditions of Lemma 4.1. Had we considered only this simpler case, the entire proof would have ended here!

Recall that in all the following $P=\bar\mu_A$, where the set $A$ is taken from Theorem 1.2, and satisfies $-\log\bar\mu(A)\le\delta k$.

The following definition is probably the most important notion in this paper. This definition enables the representation of $P$ as a convex combination of measures with nice properties. A similar idea was used before in [34, 40].

A scheme $M$ of type $\mathcal M_l$ consists of:
1. A partition of the set of coordinates $[k]-\{l\}$ into $I\cup J$.
2. Values $a_i\in X$, for all $i\in I$, and $b_j\in Y$, for all $j\in J$.
(Formally $M$ should be denoted by $M^l_{I,J,a_{i_1},\dots,a_{i_{|I|}},b_{j_1},\dots,b_{j_{|J|}}}$; however, for simplicity we will just use the notation $M$.) We also denote by $M$ the set
$$M=\Big\{(\bar x,\bar y)\in\bar X\times\bar Y \;\Big|\; \forall i\in I: \bar x_i=a_i\,,\ \forall j\in J: \bar y_j=b_j\Big\}$$
We also denote by $\mathcal M_l$ the family of all sets $M$ of type $\mathcal M_l$ (i.e. for all possible $I,J,a_{i_1},\dots,a_{i_{|I|}},b_{j_1},\dots,b_{j_{|J|}}$).

Notice that each $\mathcal M_l$ is a cover of $\bar X\times\bar Y$: each element $(\bar x,\bar y)\in\bar X\times\bar Y$ is covered exactly $2^{k-1}$ times by $\mathcal M_l$. For all $l$, define the following two probability measures $\lambda_l,\eta_l:\mathcal M_l\to\mathbb R^+$ by
$$\lambda_l(M)=\frac{\bar\mu(M)}{2^{k-1}}\;,\qquad \eta_l(M)=\frac{P(M)}{2^{k-1}}=\frac{\bar\mu_A(M)}{2^{k-1}}=\frac{\bar\mu(A\cap M)}{\bar\mu(A)\,2^{k-1}}$$
($\lambda_l,\eta_l$ are obviously probability measures, by the above observation that each element $(\bar x,\bar y)$ is covered exactly $2^{k-1}$ times by $\mathcal M_l$).

Recall that for the set $M$ we have the measures $\bar\mu_M:\bar X\times\bar Y\to\mathbb R^+$ and $P_M:\bar X\times\bar Y\to\mathbb R^+$. For $M$ with $\bar\mu(M)>0$ (respectively $P(M)>0$), these measures are probability measures
(recall that otherwise they are defined to be identically $0$). Recall that $\bar\mu^i_M, P^i_M: X\times Y\to\mathbb R^+$ are the projections of these measures on the $i$-th coordinate. For $M\in\mathcal M_l$, we will be mainly interested in the projections $\bar\mu^l_M, P^l_M$. Note that since $\bar\mu=\mu^k$, the projection $\bar\mu^l_M$ is just $\mu$. The important fact for $P^l_M$ is:

Claim 4.1 For $M\in\mathcal M_l$ (with $\eta_l(M)>0$),
$$w_l(P_M)=w(P^l_M)$$

Proof:
This is a simple application of Lemma 4.1. For $(\bar x,\bar y)\in M$, we have
$$\bar\mu_M(\bar x,\bar y)=\bar\mu(M)^{-1}\bar\mu(\bar x,\bar y)=\bar\mu(M)^{-1}\prod_{i=1}^k\mu(\bar x_i,\bar y_i) =\bar\mu(M)^{-1}\mu(\bar x_l,\bar y_l)\prod_{i\in I}\mu(a_i,\bar y_i)\prod_{j\in J}\mu(\bar x_j,b_j)$$
Define $P_1:X\times Y\to\mathbb R^+$, $P_2:\bar X\to\mathbb R^+$, $P_3:\bar Y\to\mathbb R^+$ by
$$P_1(x_l,y_l)=\bar\mu(M)^{-1}\mu(x_l,y_l)\;,\qquad P_2(\bar x)=\prod_{j\in J}\mu(\bar x_j,b_j)\;,\qquad P_3(\bar y)=\prod_{i\in I}\mu(a_i,\bar y_i)$$
Then for $(\bar x,\bar y)\in M$, we have
$$\bar\mu_M(\bar x,\bar y)=P_1(\bar x_l,\bar y_l)P_2(\bar x)P_3(\bar y)$$
The event $(\bar x,\bar y)\in M\cap A$ can be written as
$$\big(\bar x\in A_X\ \wedge\ \forall i\in I:\bar x_i=a_i\big)\ \bigcap\ \big(\bar y\in A_Y\ \wedge\ \forall j\in J:\bar y_j=b_j\big)$$
Therefore, defining $\chi_2:\bar X\to\mathbb R^+$, $\chi_3:\bar Y\to\mathbb R^+$ by
$$\chi_2(\bar x)=\begin{cases}1 & \text{if } \bar x\in A_X\ \wedge\ \forall i\in I:\bar x_i=a_i\\ 0 & \text{otherwise}\end{cases} \qquad\quad \chi_3(\bar y)=\begin{cases}1 & \text{if } \bar y\in A_Y\ \wedge\ \forall j\in J:\bar y_j=b_j\\ 0 & \text{otherwise}\end{cases}$$
we have for all $(\bar x,\bar y)\in\bar X\times\bar Y$
$$P_M(\bar x,\bar y)=\bar\mu_{A\cap M}(\bar x,\bar y)=(\bar\mu_M)_A(\bar x,\bar y)=\bar\mu_M(A)^{-1}\chi_2(\bar x)\chi_3(\bar y)\,\bar\mu_M(\bar x,\bar y)$$
and since $(\bar x,\bar y)\notin M$ implies $\chi_2(\bar x)\chi_3(\bar y)=0$, we have
$$P_M(\bar x,\bar y)=\bar\mu_M(A)^{-1}\chi_2(\bar x)\chi_3(\bar y)\,P_1(\bar x_l,\bar y_l)P_2(\bar x)P_3(\bar y)$$
Define
$$\tilde P_1(x_l,y_l)=\bar\mu_M(A)^{-1}P_1(x_l,y_l)\;,\qquad \tilde P_2(\bar x)=\chi_2(\bar x)P_2(\bar x)\;,\qquad \tilde P_3(\bar y)=\chi_3(\bar y)P_3(\bar y)$$
Then $P_M(\bar x,\bar y)=\tilde P_1(\bar x_l,\bar y_l)\tilde P_2(\bar x)\tilde P_3(\bar y)$, and the claim follows from Lemma 4.1. □
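The covering count behind the measures $\lambda_l,\eta_l$ (each pair lies in exactly $2^{k-1}$ sets of $\mathcal M_l$) can be verified by enumeration on small, illustrative parameters:

```python
from itertools import product

# Enumeration check (toy parameters k = 3, l = 0, X = Y = {0,1}) that every pair
# (xbar, ybar) lies in exactly 2^(k-1) sets M of type M_l: for each coordinate in
# [k] - {l} we choose its side (I or J), and the fixed value is then forced.
X = Y = [0, 1]
k, l = 3, 0
coords = [c for c in range(k) if c != l]

def schemes():
    for mask in product([0, 1], repeat=len(coords)):      # 0: put c in I, 1: in J
        I = [c for c, m in zip(coords, mask) if m == 0]
        J = [c for c, m in zip(coords, mask) if m == 1]
        for avals in product(X, repeat=len(I)):
            for bvals in product(Y, repeat=len(J)):
                yield dict(zip(I, avals)), dict(zip(J, bvals))

def covers(scheme, xb, yb):
    a, b = scheme
    return all(xb[c] == a[c] for c in a) and all(yb[c] == b[c] for c in b)

for xb in product(X, repeat=k):
    for yb in product(Y, repeat=k):
        assert sum(covers(s, xb, yb) for s in schemes()) == 2 ** (k - 1)
```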
Recall the definitions of the probability measures $\lambda_l,\eta_l$. Since every $(\bar x,\bar y)$ is covered the same number of times by $\mathcal M_l$, for any $l$ the measure $P$ can be written as the convex combination
$$P=\sum_{M\in\mathcal M_l}\eta_l(M)\,P_M \tag{2}$$
We will use the equality (2) in two ways. First note that a protocol for $\bar G^l_P$ is also a protocol for each $\bar G^l_{P_M}$. Therefore
$$w_l(P)\ \le\ \sum_{M\in\mathcal M_l}\eta_l(M)\,w_l(P_M)\ =\ E_{\eta_l}\big(w_l(P_M)\big)$$
(which reflects the convexity of the function $w_l$ in the measure; see the introduction), and by Claim 4.1 we have
$$w_l(P)\ \le\ E_{\eta_l}\big(w(P^l_M)\big) \tag{3}$$
Thus, if we can only upper bound the values $w(P^l_M)$ of the one dimensional measures $P^l_M$, we will have an upper bound for $w_l(P)$. In order to deduce an upper bound for $E_{\eta_l}(w(P^l_M))$, we need some more properties of the family $\{P^l_M\}_{M\in\mathcal M_l}$. First take the projection of equality (2) to get
$$P^l=\sum_{M\in\mathcal M_l}\eta_l(M)\,P^l_M$$
We also need the following lemma:
Lemma 4.2 (Main)
There exists $l_0$ such that
$$E_{\eta_{l_0}}\Big(V\big(P^{l_0}_M \,\big\|\, \mu\big)\Big)\ \le\ \frac{2}{k}\big(-\log\bar\mu(A)\big)$$
and
$$D\big(P^{l_0}\,\big\|\,\mu\big)\ \le\ \frac{2}{k}\big(-\log\bar\mu(A)\big)$$

The proof is given in the next section. In order to understand the intuition behind Lemma 4.2, one should think of the right hand side, $\frac{2}{k}(-\log\bar\mu(A))$, as a very small $\varepsilon$. The lemma just says that there exists $l_0$ s.t. $P^{l_0}$ is very close to $\mu$ (by Lemma 3.4), and s.t. for an average $M$, $V(P^{l_0}_M\|\mu)$ is very small. Thus, fixing $l_0$ from the last lemma, and fixing $\varepsilon=\frac{2}{k}(-\log\bar\mu(A))$, the family of measures $\{P^{l_0}_M\}_{M\in\mathcal M_{l_0}}$ satisfies
$$\sum_{M\in\mathcal M_{l_0}}\eta_{l_0}(M)\,V\big(P^{l_0}_M\,\big\|\,\mu\big)\ \le\ \varepsilon$$
and
$$D\Big(\sum_{M\in\mathcal M_{l_0}}\eta_{l_0}(M)\,P^{l_0}_M\ \Big\|\ \mu\Big)\ =\ D\big(P^{l_0}\,\big\|\,\mu\big)\ \le\ \varepsilon$$
Under these conditions, the following lemma proves an upper bound for
$$\sum_{M\in\mathcal M_{l_0}}\eta_{l_0}(M)\,w\big(P^{l_0}_M\big)\ =\ E_{\eta_{l_0}}\big(w(P^{l_0}_M)\big),$$
i.e. an upper bound for the value $w(P^{l_0}_M)$ for an average $M$.

Lemma 4.3
There exists a global function $W_2:[0,1]\to[0,1]$, with $z<1$ implying $W_2(z)<1$, and a global constant $c_1$, such that the following holds. Let $\{\vartheta_d\}_{d\in D}$ be a family of probability measures on $X\times Y$, and let $\{p_d\}_{d\in D}$ be weights satisfying $\forall d: p_d\ge 0$ and $\sum_{d\in D}p_d=1$. Assume that for some $0\le\varepsilon\le 1$
$$\sum_{d\in D}p_d\,V(\vartheta_d\,\|\,\mu)\ \le\ \varepsilon \qquad\text{and}\qquad D\Big(\sum_{d\in D}p_d\,\vartheta_d\ \Big\|\ \mu\Big)\ \le\ \varepsilon.$$
Then
$$\sum_{d\in D}p_d\,w(\vartheta_d)\ \le\ W_2(w(\mu))\ +\ c_1\varepsilon^{1/16}$$

The proof is given in section 6. Using Lemma 4.3, inequality (3), and the fact that
$$\varepsilon\ \stackrel{\mathrm{def}}{=}\ \frac{2}{k}\big(-\log\bar\mu(A)\big)\ \le\ \frac{2}{k}\cdot\delta k\ =\ 2\delta$$
we can now conclude
$$w_{l_0}(P)\ \le\ E_{\eta_{l_0}}\big(w(P^{l_0}_M)\big)\ \le\ W_2(w(\mu))+c_1(2\delta)^{1/16}$$
which proves Theorem 1.2.
Lemma 4.2, which is the most important lemma in the paper, is proved in the next section. Lemma 4.3 is proved in section 6. We remark that Lemma 4.3 is not really necessary: for small enough $\varepsilon$, and irreducible $\mu$, the fact that $V(\vartheta_d\|\mu)\le\varepsilon$ implies that $\|\vartheta_d-\mu\|_1$ is very small. Therefore, it can be proved easily that $E_{\eta_{l_0}}(w(P^{l_0}_M))$ is roughly bounded by $w(\mu)$, which implies the parallel repetition conjecture for irreducible measures. It turns out that the general case follows easily from the irreducible one. However, as we mentioned in section 3, only for a very small $\varepsilon$ is it the case that $V(\vartheta_d\|\mu)\le\varepsilon$ implies that $\|\vartheta_d-\mu\|_1$ is small. In fact, such an $\varepsilon$ needs to be much smaller than the $\varepsilon$ used here. In particular, since the measure $\mu$ is arbitrary, such an $\varepsilon$ depends on the size of the input set $|X\times Y|$, as opposed to the $\varepsilon$ used here, which depends only on $w(\mu)$ and is independent of other parameters of the game. Thus using Lemma 4.3 improves the constants in the exponent (in our analysis) in the parallel repetition theorem.
5 The Main Lemma

In this section we give the proof of Lemma 4.2. Our proof uses tools and intuition from [40]. The main idea of the proof is to introduce a new type of schemes. A scheme $M$ of type $\mathcal M$ consists of:
1. A partition of the set of coordinates $[k]$ into $I\cup J$.
2. Values $a_i\in X$, for all $i\in I$, and $b_j\in Y$, for all $j\in J$.
Note that this is the same as before, except that here $I\cup J=[k]$, rather than $[k]-\{l\}$. As before, we also denote by $M$ the set
$$M=\Big\{(\bar x,\bar y)\in\bar X\times\bar Y\;\Big|\;\forall i\in I:\bar x_i=a_i\,,\ \forall j\in J:\bar y_j=b_j\Big\}$$
and by $\mathcal M$ the family of all the sets $M$ of type $\mathcal M$. Notice that each element $(\bar x,\bar y)\in\bar X\times\bar Y$ is covered exactly $2^k$ times by the cover $\mathcal M$. As before, define two probability measures $\lambda,\eta:\mathcal M\to\mathbb R^+$ by
$$\lambda(M)=\frac{\bar\mu(M)}{2^k}\;,\qquad \eta(M)=\frac{P(M)}{2^k}=\frac{\bar\mu_A(M)}{2^k}=\frac{\bar\mu(A\cap M)}{\bar\mu(A)\,2^k}$$

As before, we have for the set $M$ the measures $\bar\mu_M:\bar X\times\bar Y\to\mathbb R^+$ and $P_M=\bar\mu_{M\cap A}:\bar X\times\bar Y\to\mathbb R^+$. Recall that $\bar\mu^i_M, P^i_M$ are the projections of these measures on the $i$-th coordinate. For $i\in I$, the value $\bar x_i=a_i$ is fixed. In this case $\bar\mu^i_M, P^i_M$ are concentrated on $\{a_i\}\times Y$, and can also be thought of as measures on $Y$. In the same way, if $j\in J$ then $\bar\mu^j_M, P^j_M$ can be thought of as measures on $X$. Notice also that since $\bar\mu=\mu^k$, we have $\bar\mu_M=\prod_{i=1}^k\bar\mu^i_M$.

Claim 5.1
$$\sum_{i=1}^k E_\eta\Big(D\big(P^i_M\,\big\|\,\bar\mu^i_M\big)\Big)\ \le\ -\log\bar\mu(A)$$
Proof:
The proof follows from the basic properties of the informational divergence. For any scheme $M\in\mathcal M$, with $P(M)>0$, we have by Lemma 3.5,
$$D(P_M\,\|\,\bar\mu_M)=D(\bar\mu_{M\cap A}\,\|\,\bar\mu_M)=D\big((\bar\mu_M)_A\,\big\|\,\bar\mu_M\big)=-\log\bar\mu_M(A)=-\log\frac{\bar\mu(M\cap A)}{\bar\mu(M)}=-\log\frac{\eta(M)\,\bar\mu(A)}{\lambda(M)}$$
Therefore
$$E_\eta\big(D(P_M\,\|\,\bar\mu_M)\big) = -\sum_{M\in\mathcal M}\eta(M)\log\frac{\eta(M)\,\bar\mu(A)}{\lambda(M)} = -\sum_{M\in\mathcal M}\Big[\eta(M)\log\frac{\eta(M)}{\lambda(M)}+\eta(M)\log\bar\mu(A)\Big] = -D(\eta\,\|\,\lambda)-\log\bar\mu(A)\ \le\ -\log\bar\mu(A)$$
(by Lemma 3.1). But by Lemma 3.3, for all $M$ with $P(M)>0$ (recall that $\bar\mu_M$ is a product measure),
$$D(P_M\,\|\,\bar\mu_M)\ \ge\ \sum_{i=1}^k D\big(P^i_M\,\big\|\,\bar\mu^i_M\big)$$
and we can conclude:
$$\sum_{i=1}^k E_\eta\Big(D\big(P^i_M\,\big\|\,\bar\mu^i_M\big)\Big)=E_\eta\Big(\sum_{i=1}^k D\big(P^i_M\,\big\|\,\bar\mu^i_M\big)\Big)\ \le\ E_\eta\big(D(P_M\,\|\,\bar\mu_M)\big)\ \le\ -\log\bar\mu(A) \qquad\Box$$

Fix $l$. In what follows we denote by $M$ a scheme of type $\mathcal M$, and by $M'$ a scheme of type $\mathcal M_l$. Denote by $I,J$ the partition of coordinates corresponding to the scheme $M$, and by $I',J'$ the one corresponding to $M'$. A scheme $M$ agrees with a scheme $M'$ if and only if $I'=I\cap([k]-\{l\})$, $J'=J\cap([k]-\{l\})$, and $M,M'$ agree on the values of $a_i,b_j$ for all $i\in I', j\in J'$. For a scheme $M'$ of type $\mathcal M_l$, denote the set of all schemes $M$ of type $\mathcal M$ agreeing with $M'$ by $N(M')$. Then the family of sets $\{M\}_{M\in N(M')}$ is a cover of the set $M'$,
24
where each element (x; y) 2 M 0 is covered exactly twice (as we have to choose l 2 I or l 2 J , and then x the value of al or bl). Therefore we have X (M 0) = 21 (M ) M 2N (M ) and hence X l(M 0) = (M ) 0
M 2N (M ) 0
l(M 0)
Assume in all the following that > 0. Then (M )=l (M 0) is a probability measure on N (M 0 ). A scheme M can be randomly chosen, according to the distribution , by rst choosing M 0 according to l , and then choosing M 2 N (M 0) with probability (M )=l (M 0). Claim 5.2 For every l, X (M ) l
l V Ml
lM = 0 D M M M 2N (M ) l (M ) 0
0
0
Proof:
The proof follows by looking carefully into the definitions. For $M\in N(M')$,
$$\frac{\eta(M)}{\eta_l(M')}=\frac{P(M)/2^k}{P(M')/2^{k-1}}=\frac12\cdot\frac{P(M)}{P(M')}=\frac12\,P_{M'}(M)$$
If for the scheme $M$ we have $l\in I$, then $P_{M'}(M)$ is just the probability of having $\bar x_l=a_l$ in the set $M'$. By definition this probability is $P^l_{M'}(a_l)$. Therefore in this case
$$\frac{\eta(M)}{\eta_l(M')}=\frac12\,P^l_{M'}(a_l)$$
Also, in this case $P_M$ is derived from $P_{M'}$ by fixing $\bar x_l$ to be $a_l$. Hence $P^l_M=P^l_{M'}(a_l,\cdot)$ (the conditional measure). In the same way, we have $\bar\mu^l_M=\bar\mu^l_{M'}(a_l,\cdot)$. Therefore, for $l\in I$,
$$\frac{\eta(M)}{\eta_l(M')}\,D\big(P^l_M\,\big\|\,\bar\mu^l_M\big)=\frac12\,P^l_{M'}(a_l)\,D\big(P^l_{M'}(a_l,\cdot)\,\big\|\,\bar\mu^l_{M'}(a_l,\cdot)\big)$$
In the same way, for $l\in J$,
$$\frac{\eta(M)}{\eta_l(M')}\,D\big(P^l_M\,\big\|\,\bar\mu^l_M\big)=\frac12\,P^l_{M'}(b_l)\,D\big(P^l_{M'}(\cdot,b_l)\,\big\|\,\bar\mu^l_{M'}(\cdot,b_l)\big)$$
Partitioning $M\in N(M')$ into schemes with $l\in I$, and schemes with $l\in J$, we have
$$\sum_{M\in N(M')}\frac{\eta(M)}{\eta_l(M')}\,D\big(P^l_M\,\big\|\,\bar\mu^l_M\big) = \sum_{a\in X}\frac12\,P^l_{M'}(a)\,D\big(P^l_{M'}(a,\cdot)\,\big\|\,\bar\mu^l_{M'}(a,\cdot)\big) + \sum_{b\in Y}\frac12\,P^l_{M'}(b)\,D\big(P^l_{M'}(\cdot,b)\,\big\|\,\bar\mu^l_{M'}(\cdot,b)\big)$$
$$=\ \frac12\,V_X\big(P^l_{M'}\,\big\|\,\bar\mu^l_{M'}\big)+\frac12\,V_Y\big(P^l_{M'}\,\big\|\,\bar\mu^l_{M'}\big)\ =\ V\big(P^l_{M'}\,\big\|\,\bar\mu^l_{M'}\big) \qquad\Box$$
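The divergence superadditivity used in Claim 5.1 (the step borrowed from Lemma 3.3: against a product reference measure, the divergence of a joint measure dominates the sum of the divergences of its projections) can be sanity-checked numerically on toy data:

```python
import math

# Numeric sanity check of the superadditivity step in Claim 5.1 (Lemma 3.3):
# if the reference measure is a product, mu1 x mu2, then
#   D(P || mu1 x mu2) >= D(P^1 || mu1) + D(P^2 || mu2).
def D(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.40, 0.10, 0.15, 0.35]              # joint on {0,1} x {0,1}, row-major
mu1, mu2 = [0.5, 0.5], [0.6, 0.4]
mu = [a * b for a in mu1 for b in mu2]

P1 = [P[0] + P[1], P[2] + P[3]]           # projection on the first coordinate
P2 = [P[0] + P[2], P[1] + P[3]]           # projection on the second coordinate

assert D(P, mu) >= D(P1, mu1) + D(P2, mu2) - 1e-12
```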
Claim 5.3
$$\sum_{l=1}^k E_{\eta_l}\Big(V\big(P^l_{M'}\,\big\|\,\bar\mu^l_{M'}\big)\Big)\ \le\ -\log\bar\mu(A)$$

Proof:
By Claim 5.2 we have
$$E_\eta\Big(D\big(P^l_M\,\big\|\,\bar\mu^l_M\big)\Big)=\sum_{M\in\mathcal M}\eta(M)\,D\big(P^l_M\,\big\|\,\bar\mu^l_M\big) =\sum_{M'\in\mathcal M_l}\eta_l(M')\sum_{M\in N(M')}\frac{\eta(M)}{\eta_l(M')}\,D\big(P^l_M\,\big\|\,\bar\mu^l_M\big)$$
$$=\sum_{M'\in\mathcal M_l}\eta_l(M')\,V\big(P^l_{M'}\,\big\|\,\bar\mu^l_{M'}\big)=E_{\eta_l}\Big(V\big(P^l_{M'}\,\big\|\,\bar\mu^l_{M'}\big)\Big)$$
Since this is true for all $l$, we can sum and conclude by Claim 5.1 that
$$\sum_{l=1}^k E_{\eta_l}\Big(V\big(P^l_{M'}\,\big\|\,\bar\mu^l_{M'}\big)\Big)=\sum_{l=1}^k E_\eta\Big(D\big(P^l_M\,\big\|\,\bar\mu^l_M\big)\Big)\ \le\ -\log\bar\mu(A) \qquad\Box$$
Since $V(\cdot\,\|\,\cdot)$ is always non-negative (see section 3), we can conclude from the last claim that fewer than $\frac12 k$ coordinates $l$ satisfy
$$E_{\eta_l}\Big(V\big(P^l_{M'}\,\big\|\,\bar\mu^l_{M'}\big)\Big)\ >\ -\frac{2}{k}\log\bar\mu(A)$$
Therefore, more than $\frac12 k$ coordinates $l$ satisfy
$$E_{\eta_l}\Big(V\big(P^l_{M'}\,\big\|\,\bar\mu^l_{M'}\big)\Big)\ \le\ -\frac{2}{k}\log\bar\mu(A)$$
By Lemma 3.3 and Lemma 3.5 we also have
$$\sum_{l=1}^k D\big(P^l\,\big\|\,\bar\mu^l\big)\ \le\ D(P\,\|\,\bar\mu)=D(\bar\mu_A\,\|\,\bar\mu)=-\log\bar\mu(A)$$
and as before, more than $\frac12 k$ coordinates $l$ satisfy
$$D\big(P^l\,\big\|\,\bar\mu^l\big)\ \le\ -\frac{2}{k}\log\bar\mu(A)$$
Therefore, there exists $l_0$ with both
$$E_{\eta_{l_0}}\Big(V\big(P^{l_0}_{M'}\,\big\|\,\bar\mu^{l_0}_{M'}\big)\Big)\ \le\ -\frac{2}{k}\log\bar\mu(A) \qquad\text{and}\qquad D\big(P^{l_0}\,\big\|\,\bar\mu^{l_0}\big)\ \le\ -\frac{2}{k}\log\bar\mu(A)$$
Since $\bar\mu=\mu^k$, we have $\bar\mu^{l_0}_{M'}=\mu$ and $\bar\mu^{l_0}=\mu$. Thus Lemma 4.2 follows. □
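The counting step above is a two-sided Markov argument: each of two lists of nonnegative quantities sums to at most $-\log\bar\mu(A)$, so each has more than $k/2$ entries below twice the average, and the two good sets must intersect. A quick numeric illustration (arbitrary data):

```python
import random

# A numeric illustration (arbitrary data) of the counting step: two lists of
# nonnegative numbers, each summing to B, each have more than k/2 entries that
# are at most 2B/k, so some coordinate l0 is simultaneously good for both.
random.seed(0)
k, B = 100, 7.0

def bounded_list():
    xs = [random.random() for _ in range(k)]
    s = sum(xs)
    return [x * B / s for x in xs]        # nonnegative, sums to exactly B

e1, e2 = bounded_list(), bounded_list()
good1 = {l for l in range(k) if e1[l] <= 2 * B / k}
good2 = {l for l in range(k) if e2[l] <= 2 * B / k}
assert len(good1) > k / 2 and len(good2) > k / 2     # Markov: fewer than k/2 exceed 2B/k
assert good1 & good2                                  # hence a common good coordinate exists
```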
6 Families of Local Protocols

Given $X,Y,U,V,Q$, every probability measure $\vartheta:X\times Y\to\mathbb R^+$ defines a game $G_\vartheta$. In this section, as before, we denote the value of this game by $w(\vartheta)$. The original measure $\mu$ is one measure of this type (with value $w(\mu)$). For some finite set $D$, let $\{\vartheta_d\}_{d\in D}$ be a family of measures of this type. Thus, for all $d\in D$, $\vartheta_d:X\times Y\to\mathbb R^+$ is a probability measure, with a value $w(\vartheta_d)$. Let $p_d\ge 0$ be an arbitrary weight ($\forall d\in D$). We will not always require $\sum_{d\in D}p_d=1$; however, in the cases that we consider, $\sum_{d\in D}p_d$ will be very close to $1$, so $\{p_d\}$ will be "almost" a probability measure on $D$. Define
$$w_d=w(\vartheta_d)\;,\qquad w=\sum_{d\in D}p_d w_d$$
(note that $w$ may be larger than $1$, because we didn't require $\sum_{d\in D}p_d=1$), and the measure $\vartheta:X\times Y\to\mathbb R^+$ by
$$\vartheta=\sum_{d\in D}p_d\vartheta_d$$
In this section we assume that $\vartheta$ is close to $\mu$. We would like to deduce, under some conditions on the measures $\{\vartheta_d\}_{d\in D}$, a lower bound for $w(\mu)$ as a function of $w$, or conversely, an upper bound for $w$ as a function of $w(\mu)$. The conditions on $\{\vartheta_d\}_{d\in D}$ will intuitively say that each measure $\vartheta_d$ "locally" describes the behavior of the original measure $\mu$. The goal of the section is to prove Lemma 4.3, which is stated in section 4.
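Two facts about the value function $\vartheta\mapsto w(\vartheta)$ are used repeatedly below: it is convex (a maximum of functionals that are linear in the measure) and $1$-Lipschitz with respect to the $L_1$ distance. Both can be observed by brute force on a toy game (hypothetical instance, for illustration only):

```python
from itertools import product

# Toy illustration of two facts about theta -> w(theta): convexity and
# 1-Lipschitz continuity with respect to the L1 distance between measures.
B = [0, 1]
Q = lambda x, y, u, v: int((u ^ v) == (x & y))        # hypothetical predicate

def w(theta):
    best = 0.0
    for u in product(B, repeat=2):                    # prover I: one answer per question
        for v in product(B, repeat=2):                # prover II: one answer per question
            best = max(best, sum(p * Q(x, y, u[x], v[y]) for (x, y), p in theta.items()))
    return best

t1 = {(x, y): 0.25 for x in B for y in B}                     # uniform measure
t2 = {(0, 0): 0.7, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.1}     # skewed measure
mix = {xy: 0.5 * t1[xy] + 0.5 * t2[xy] for xy in t1}
l1 = sum(abs(t1[xy] - t2[xy]) for xy in t1)

assert w(mix) <= 0.5 * w(t1) + 0.5 * w(t2) + 1e-12    # convexity
assert abs(w(t1) - w(t2)) <= l1 + 1e-12               # Lipschitz constant 1
```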
6.1 The Basic Lemma

First assume that for all $d\in D$ we have sets $X_d\subseteq X$ and $Y_d\subseteq Y$ such that
$$\vartheta_d=\mu_{X_d\times Y_d}$$
For all $d\in D$, define
$$a_d=\mu(X_d\times Y_d) \qquad\text{and}\qquad r_d=\frac{\mu\big[(X_d\times(Y-Y_d))\cup((X-X_d)\times Y_d)\big]}{\mu(X_d\times Y_d)}$$
Define
$$r=\sum_{d\in D}p_d r_d$$
The next lemma gives our basic lower bound for $w(\mu)$ as a function of $w$:
Assume that for some constants 0 0 1, 0 1 1, we have k ? k1 0, and r 1, and assume (for simplicity) that w 1. Let f : R ! R be any increasing monotone function such that f (z ) 0, for z 0, and such that for all 0 z 1, 0 < 1, we have the inequality 1 (1 ? )z + f (z ? (1 ? )) f (z) Then
w() f (w ? 2p1 ? 0): We remark that 0; 1 should be thought of as small constants. Some examples for functions f , as above, are given in corollaries 6.1,6.2,6.3 below. It is always true that for 0 z 1, f (z) z, (to see this, substitute = 1=2 to get f (z) z=2 + f (2z ? 1)=2 , and by the fact that f is monotone, and z 1 , we get f (z) z=2 + f (z)=2 ).
Proof:
We will first give some intuition. Every measure decomposes into its irreducible components. In the simplest case, where $\delta_0=\delta_1=0$, we have $r_d=0$ for all $d$. Therefore, in this case $X_d\times Y_d$ is a component of the measure $\mu$ (not necessarily irreducible). Since we can further decompose each one of these components, we can assume w.l.o.g. that $X_d\times Y_d$ is irreducible, and therefore that the family $\{\vartheta_d\}_{d\in D}$ describes the decomposition of the measure $\mu$ into irreducible components. Every protocol for $\mu$ defines a protocol for each one of its components. Also, given a protocol for each one of the irreducible components, we can define a protocol for $\mu$ (we define the protocol for $\mu$ on each irreducible component separately). The value of the protocol for $\mu$ is the weighted average of the values of the protocols for the components. Therefore, in the special case $\delta_0=\delta_1=0$, we can simply conclude that $w(\mu)=w$, and the claim follows.

In the general case the intuition is basically the same: we will find $d_0$ such that $X_{d_0}\times Y_{d_0}$ is "almost" a component of $\mu$ (i.e. $d_0$ with small $r_{d_0}$), and such that $w_{d_0}$ is very close to $w$. We will concentrate on the set $(X-X_{d_0})\times(Y-Y_{d_0})$, and continue by induction. Intuitively, we will get a sub-family of $\{\vartheta_d\}_{d\in D}$ that will "almost" describe a decomposition of the measure $\mu$ into "almost" irreducible components.

First note that
$$\|\vartheta\|_1=\Big\|\sum_{d\in D}p_d\vartheta_d\Big\|_1=\sum_{d\in D}p_d\|\vartheta_d\|_1=\sum_{d\in D}p_d$$
Therefore, since $\|\vartheta-\mu\|_1\le\delta_0$ and $\|\mu\|_1=1$, we have by the triangle inequality
$$1-\delta_0\ \le\ \sum_{d\in D}p_d\ \le\ 1+\delta_0$$
Claim 6.1
There exists $d_0\in D$ with
$$r_{d_0}\le\sqrt{\delta_1} \qquad\text{and}\qquad w_{d_0}\ge w-\sqrt{\delta_1}-\delta_0$$

Proof: Denote
$$Z=\{d\in D \mid r_d\ge\sqrt{\delta_1}\}$$
Then
$$\delta_1\ \ge\ r=\sum_{d\in D}p_d r_d\ \ge\ \sum_{d\in Z}p_d r_d\ \ge\ \sqrt{\delta_1}\sum_{d\in Z}p_d$$
Hence
$$\sum_{d\in Z}p_d\ \le\ \sqrt{\delta_1}$$
thus (using $w_d\le 1$),
$$w=\sum_{Z}p_d w_d+\sum_{D-Z}p_d w_d\ \le\ \sqrt{\delta_1}+\sum_{D-Z}p_d w_d$$
and therefore
$$\sum_{D-Z}p_d w_d\ \ge\ w-\sqrt{\delta_1}$$
Let $d_0\in D-Z$ be the element with the maximal $w_d$. Then
$$w_{d_0}\sum_{D-Z}p_d\ \ge\ \sum_{D-Z}p_d w_d\ \ge\ w-\sqrt{\delta_1}$$
and by $\sum_{D-Z}p_d\le\sum_{d\in D}p_d\le 1+\delta_0$ we have
$$w_{d_0}\ \ge\ \frac{w-\sqrt{\delta_1}}{1+\delta_0}\ \ge\ (w-\sqrt{\delta_1})(1-\delta_0)\ \ge\ w-\sqrt{\delta_1}-\delta_0$$
(since $w\le 1$). This completes the proof of Claim 6.1. □

Fix $d_0$ from the last claim. Define
$$X'=X-X_{d_0}\;,\qquad Y'=Y-Y_{d_0}\;,\qquad \beta=\mu(X'\times Y')$$
Assume that $\beta>0$. For all $d\in D$ define
$$X'_d=X_d\cap X'\;,\quad Y'_d=Y_d\cap Y'\;,\quad \vartheta'_d=\mu_{X'_d\times Y'_d}\;,\quad w'_d=w(\vartheta'_d)\;,\quad a'_d=\mu(X'_d\times Y'_d)$$
$$r'_d=\frac{\mu\big[(X'_d\times(Y'-Y'_d))\cup((X'-X'_d)\times Y'_d)\big]}{\mu(X'_d\times Y'_d)}\;,\qquad p'_d=p_d\cdot\frac{1}{\beta}\cdot\frac{a'_d}{a_d}$$
(recall that $\frac{0}{0}$ is defined to be $0$). Notice that $r'_d$ can be unbounded (when $a'_d=0$), but then $p'_d=0$. Define
$$r'=\sum_{D}p'_d r'_d\;,\qquad w'=\sum_{D}p'_d w'_d\;,\qquad \vartheta'=\sum_{D}p'_d\vartheta'_d$$
Clearly,
$$\vartheta'_d(x,y)=\begin{cases}\dfrac{a_d}{a'_d}\,\vartheta_d(x,y) & \text{for } (x,y)\in X'\times Y'\\[2pt] 0 & \text{for } (x,y)\notin X'\times Y'\end{cases}$$
Therefore, by the definitions,
$$\vartheta'(x,y)=\begin{cases}\dfrac{1}{\beta}\,\vartheta(x,y) & \text{for } (x,y)\in X'\times Y'\\[2pt] 0 & \text{for } (x,y)\notin X'\times Y'\end{cases}$$
So
$$\sum_{D}p'_d=\|\vartheta'\|_1=\vartheta(X'\times Y')/\beta$$
We would like to use induction (of the lemma), with the family $\{\vartheta'_d\}_{d\in D}$, to deduce a lower bound for $w(\mu_{X'\times Y'})$. In order to use the lemma, we need bounds for $r'$, for $\|\vartheta'-\mu_{X'\times Y'}\|_1$, and for $w'$. First note that
$$r'_d a'_d=\mu\big[(X'_d\times(Y'-Y'_d))\cup((X'-X'_d)\times Y'_d)\big]\ \le\ \mu\big[(X_d\times(Y-Y_d))\cup((X-X_d)\times Y_d)\big]=r_d a_d$$
Therefore, defining $\delta'_1=\delta_1/\beta$, we have
$$r'=\sum_{D}p'_d r'_d=\frac1\beta\sum_{D}p_d\frac{a'_d}{a_d}r'_d\ \le\ \frac1\beta\sum_{D}p_d r_d=\frac{r}{\beta}\ \le\ \frac{\delta_1}{\beta}=\delta'_1$$
Also, define
$$\delta'_0=\|\vartheta'-\mu_{X'\times Y'}\|_1$$
Then clearly
$$\beta\delta'_0=\|\beta\vartheta'-\beta\mu_{X'\times Y'}\|_1 =\sum_{X'\times Y'}|\vartheta(x,y)-\mu(x,y)|\ \le\ \sum_{X\times Y}|\vartheta(x,y)-\mu(x,y)|=\|\vartheta-\mu\|_1\ \le\ \delta_0$$
so $\delta'_0\le\delta_0/\beta$. Thus we have bounds for $r'$ and $\|\vartheta'-\mu_{X'\times Y'}\|_1$. The bound for $w'$ will follow from the following two claims:
k k1 ? k 0 k1 0 ? 00 + (1 ? )
Proof:
De ne Z = (X Y ) ? (X 0 Y 0), and Z 0 = X 0 Y 0. Then = (Z ) Z + (Z 0) Z , and = (Z )Z + (Z 0)Z . Therefore (4) k k1 = k (Z ) Z k1 + k (Z 0) Z k1 = k (Z ) Z k1 + k 0 k1 In addition, 0 k ? k1 = k (Z ) Z ? (Z )Z k1 + k (Z 0) Z ? (Z 0)Z k1 = k (Z ) Z ? (Z )Z k1 + k 0 ? Z k1 = k (Z ) Z ? (Z )Z k1 +00 Thus, by the triangle inequality 0 ? 00 k (Z ) Z ? (Z )Z k1 k (Z ) Z k1 ? k (Z )Z k1 = k (Z ) Z k1 ?(1 ? ) so, k (Z ) Z k1 0 ? 00 + (1 ? ) The proof follows by substituting this in (4). 0
0
0
0
0
0
2
Claim 6.3 Proof:
w0 (w ? (1 ? ) ? 0 + 00)=
Any protocol for the game with the measure X Y de nes a corresponding protocol for the game with the measure X Y . Take the best protocol for X Y . This protocol for X Y satis es Q on a set of points of -measure wdad. At most -measure of ad ? a0d = (Xd Yd) ? (Xd0 Yd0) is outside Xd0 Yd0. Therefore, at least -measure of wdad ? (ad ? a0d) is inside. Thus, the corresponding protocol for X Y satis es Q on a set of points of measure wdad ? (ad ? a0d) (at least). But this cannot be more than wd0 a0d, thus wd0 a0d wdad ? ad + a0d Therefore 0 1 a 0 0 pd wd = pd ad wd0 pd 1 a1 [wdad ? ad + a0d] = 1 pd wd ? 1 pd + p0d d d and we have X 0 0 1X X X w0 = pd wd pd wd ? 1 pd + p0d D D D D = 1 w ? 1 k k1 + k 0 k1 = 1 [w ? (k k1 ? k 0 k1)] d
0
d
d
d
0
d
d
d
0
d
31
0
d
d
and by Claim 6.2
w0 (w ? (1 ? ) ? 0 + 00)=
2 By Claim 6.3, we can conclude r 1 q0 0 1 0 0 w ? 2 1 ? 0 (w ? (1 ? ) ? 0 + 0) ? 2 ? 00 p p = 1 (w ? (1 ? ) ? 2 p1 ? 0) 1 (w ? (1 ? ) ? 2 1 ? 0) Now, we can use induction, and apply the lemma for X Y , with the family f d0 gd2D, to get a lower bound q 0 0 1 p 0 w(X Y ) f w ? 2 1 ? 0 f (w ? 2 1 ? 0 ? (1 ? )) (since f is monotone). De ne z = w ? 2p1 ? 0 to get w(X Y ) f 1 (z ? (1 ? )) 0
0
0
0
0
0
Recall that wd0 w ? p1 ? p0. Since ad0 + rd0 ad0 + = (X Y ) = 1, we have ad0 = (1 ? )=(1 + rd0 ). But, rd0 1, and therefore
ad0 (1 ? )=(1 + p1) (1 ? )(1 ? p1)
and we have
ad0 wd0 (1 ? )(1 ? p1)(w ? p1 ? 0) (1 ? )(w ? 2p1 ? 0) = (1 ? )z
Recall that Xd0 \ X 0 = ;, and Yd0 \ Y 0 = ;. Given the best protocol for X 0 Y 0 , and the best protocol for X Y , de ne a protocol for that behaves like the rst one on Xd0 Yd0 , and like the other one on X 0 Y 0. This protocol satis es Q on a set of points of -measure (Xd0 Yd0 )w(X 0 Y 0 ) + (X 0 Y 0)w(X Y ) , (at least), and therefore proves that d
0
d
d
0
0
d
0
w() ad0 wd0 + w(X Y ) 1 (1 ? )z + f (z ? (1 ? )) f (z) = f (w ? 2p1 ? 0) 0
0
which proves the lemma. The assumption z 0 ) f (z) 0 is needed for the base case of the induction. We remark that if = 0 the lemma is proved simply from Claim 6.1, or by a continuity argument.
2
32
Corollary 6.1 Under the assumptions of Lemma 6.1, w() f (w ? 2p1 ? 0)
where f is de ned by
Proof:
(1 2 for 0 z 1 f (z) = 20z for z0
We just have to prove the inequality (as in Lemma 6.1) for f . Case a: z 1 ? In this case 1 (1 ? )z + f (z ? (1 ? )) ? f (z) = (1 ? )z + 21 (z ? (1 ? ))2 ? 21 z2 i h = 21 2(1 ? )z + z2 ? 2(1 ? )z + (1 ? )2 ? z2 i h = 21 z2(1 ? ) ? 2z(1 ? )2 + (1 ? )2 i h 1 2? z2 ? 2z(1 ? ) + (1 ? )2 = 1 2? [z ? (1 ? )]2 0
Case b: z 1 ? .
In this case 1 (1 ? )z + f (z ? (1 ? )) ? f (z) = (1 ? )z + 0 ? 21 z2 z2 ? 12 z2 = 12 z2 0
2
It is sometime convenient to denote v() = 1 ? w() ; v = 1 ? w , and g(t) = 1 ? f (1 ? t). In these notations the inequality 1 (1 ? )z + f (z ? (1 ? )) ? f (z) 0 (for 0 z 1, 0 < 1), is equivalent (by setting t = 1 ? z) to (1 ? )t + g(t=) g(t) (for 0 t 1, 0 < 1). Corollary 6.2 Under the assumptions of Lemma 6.1, if g : R ! R is an increasing monotone function such that g(t) 1, for t 1, and such that for all 0 t 1, 0 < 1, we have (1 ? )t + g(t=) ? g(t) 0 then v() g(v + 2p1 + 0) 33
Corollary 6.3 Under the assumptions of Lemma 6.1, q v() 2 v + 2p1 + 0 Proof:
p
Take g(t) = 2 t, then for 0 t 1, 0 < 1
pp
p
(1 ? )t + g(t=) ? g(t) = (1 ? )t + 2 t ? 2 t p pp p p p = (1 + )(1 ? ) t t ? 2(1 ? ) t p p p p = (1 ? ) t[(1 + ) t ? 2] 0
p p
(because (1 + ) t ? 2 2 ? 2 = 0).
2
It will be convenient to de ne the function W1 : R ! R in the following way:
W1(z) = sup f (z) f
where the supremum is taken over all the strictly increasing monotone functions, satisfying the conditions as in Lemma 6.1. Conversely, de ne W2 : R ! R by:
W2(z) = inf f ?1 (z) f where the in mum is taken over the same family of functions. Take, for convenience, 0 = 1 = , and assume for convenience that w 1. Lemma 6.1 can be now restated in the following way: Corollary 6.4 If for a family f d gd2D we have k ? k1 , and r then
p
w() W1(w ? O( )) or conversely
p
w W2(w()) + O( )
By Corollaries 6.1, 6.3 we have 0 < z ) 0 < W1(z) ; zlim W (z) = 1 ; zlim W (z) = 0 ; z < 1 ) W2(z) < 1 !1 1 !0 2 (where the last fact is the important one for us). Also, since f was monotone, W1(z); W2(z) are both monotone. Thus, W1; W2 are \well behaved". In this manuscript we will not investigate their exact behavior. 34
6.2 Characterization of Measures $\vartheta$ with Small $V(\vartheta\,\|\,\mu)$

Lemma 4.3 talks about measures $\vartheta_d$ with small $V_X(\vartheta_d\|\mu)$ and small $V_Y(\vartheta_d\|\mu)$. We would like to prove this lemma by a reduction to Lemma 6.1. In order to do that, we need one more lemma that gives a characterization of measures $\vartheta$ with small $V_X(\vartheta\|\mu)$ and small $V_Y(\vartheta\|\mu)$. Again, $\mu:X\times Y\to\mathbb R^+$ and $\vartheta:X\times Y\to\mathbb R^+$ are probability measures.
Lemma 6.2
If $V_X(\vartheta\|\mu)\le\delta$ and $V_Y(\vartheta\|\mu)\le\delta$ then, for some $m$, there exist partitions $X=\bigcup_{i=1}^m X_i$, $Y=\bigcup_{i=1}^m Y_i$, and positive weights $\{q_i\}_{i=1}^m$ (i.e. for all $i$: $q_i\ge 0$), such that the function $h:X\times Y\to\mathbb R$, defined by
$$h(x,y)=\sum_{i=1}^m q_i\,\mu_{X_i\times Y_i}(x,y)$$
satisfies
$$\|\vartheta-h\|_1\ \le\ O(\delta^{1/8})$$
and such that
$$\sum_{i=1}^m q_i r_i\ \le\ O(\delta^{1/8})$$
where
$$r_i=\frac{\mu\big[(X_i\times(Y-Y_i))\cup((X-X_i)\times Y_i)\big]}{\mu(X_i\times Y_i)}$$
Proof:
Assume that the marginals $\vartheta(x),\vartheta(y),\mu(x),\mu(y)$ take only strictly positive values (otherwise, just add small constants and normalize); this is done only to simplify the notations. Assume also that $\delta$ is small enough ($\delta<2^{-10}$ is enough): for $\delta\ge 2^{-10}$ we have $O(\delta^{1/8})=O(1)$, and thus $h=\mu$ does the job. The first claim describes the basic structure of the measure $\vartheta$:

Claim 6.4
$$\Big\|\vartheta(x,y)-\frac{\vartheta(x)}{\mu(x)}\mu(x,y)\Big\|_1\ \le\ \sqrt{2\ln 2}\,\sqrt\delta \qquad\text{and}\qquad \Big\|\vartheta(x,y)-\frac{\vartheta(y)}{\mu(y)}\mu(x,y)\Big\|_1\ \le\ \sqrt{2\ln 2}\,\sqrt\delta$$

Proof:
We will prove the first inequality; the second can be proved in a similar manner. By Lemma 3.4 we have
$$\sum_{x\in X}\vartheta(x)\big(\|\vartheta(x,\cdot)-\mu(x,\cdot)\|_1\big)^2\ \le\ (2\ln 2)\sum_{x\in X}\vartheta(x)\,D\big(\vartheta(x,\cdot)\,\|\,\mu(x,\cdot)\big)=(2\ln 2)V_X(\vartheta\|\mu)\ \le\ (2\ln 2)\delta$$
(here $\vartheta(x,\cdot)$ and $\mu(x,\cdot)$ denote the conditional distributions on $Y$, given $x$). Since for any random variable $z$: $(E(z))^2\le E(z^2)$, we have
$$\sum_{x\in X}\vartheta(x)\,\|\vartheta(x,\cdot)-\mu(x,\cdot)\|_1\ \le\ \sqrt{2\ln 2}\,\sqrt\delta$$
but
$$\sum_{x\in X}\vartheta(x)\,\|\vartheta(x,\cdot)-\mu(x,\cdot)\|_1=\sum_{x\in X}\vartheta(x)\sum_{y\in Y}\Big|\frac{\vartheta(x,y)}{\vartheta(x)}-\frac{\mu(x,y)}{\mu(x)}\Big| =\sum_{X\times Y}\Big|\vartheta(x,y)-\frac{\vartheta(x)}{\mu(x)}\mu(x,y)\Big| =\Big\|\vartheta(x,y)-\frac{\vartheta(x)}{\mu(x)}\mu(x,y)\Big\|_1$$
and the claim follows. □

Define
$$R_X(x)=\frac{\vartheta(x)}{\mu(x)}\;,\qquad R_Y(y)=\frac{\vartheta(y)}{\mu(y)}\;,\qquad R(x,y)=\frac{R_Y(y)}{R_X(x)}$$
(note that $R(x,y)$ is asymmetric). By our assumption, these values are always well defined and strictly positive. Define
$$h_1(x,y)=R_X(x)\,\mu(x,y)\;,\qquad h_2(x,y)=R_Y(y)\,\mu(x,y)$$
We proved
$$\|\vartheta-h_1\|_1\ \le\ \sqrt{2\ln 2}\,\sqrt\delta\;,\qquad \|\vartheta-h_2\|_1\ \le\ \sqrt{2\ln 2}\,\sqrt\delta$$
Therefore, by the triangle inequality we also have
$$\|h_1\|_1\ \le\ \|\vartheta\|_1+\|\vartheta-h_1\|_1\ \le\ 1+\sqrt{2\ln 2}\,\sqrt\delta$$
Similarly, $\|h_2\|_1\le 1+\sqrt{2\ln 2}\,\sqrt\delta$, and
$$\|h_1-h_2\|_1\ \le\ \|\vartheta-h_1\|_1+\|\vartheta-h_2\|_1\ \le\ 2\sqrt{2\ln 2}\,\sqrt\delta$$
The last inequality shows that most of the measure is concentrated on pairs $(x,y)$ with $R_X(x)$ very close to $R_Y(y)$. In the simplest case, where $\delta=0$, the entire measure is concentrated on pairs with $R_X(x)=R_Y(y)$. In this case we can partition $X$ according to $R_X(x)$, and $Y$ according to $R_Y(y)$, and define the weight of a subset as $R_X(x)$ (or $R_Y(y)$); the lemma follows then from Claim 6.4 and from the last inequality. In the general case the intuition will be the same, but we will have to partition $X$ and $Y$ into subsets according to the value of $R_X(x)$ and $R_Y(y)$, where each subset allows small deviations in the value. Denote
$$\delta_1=2\sqrt{2\ln 2}\,\sqrt\delta$$
and define
$$Z=\big\{(x,y)\in X\times Y\ \big|\ |1-R(x,y)|\ \ge\ \sqrt{\delta_1}\,\big\}$$
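Claim 6.4 can be checked numerically on toy data (illustrative only; divergence measured in bits):

```python
import math

# Numeric check of Claim 6.4: with R_X(x) = theta(x)/mu(x) and
# h1(x,y) = R_X(x)*mu(x,y), one has
#   ||theta - h1||_1 <= sqrt(2 ln 2) * sqrt(V_X(theta || mu)).
pairs = [(x, y) for x in range(2) for y in range(2)]
theta = dict(zip(pairs, [0.3, 0.2, 0.1, 0.4]))
mu = {xy: 0.25 for xy in pairs}

def marg_x(m):
    out = {}
    for (x, y), p in m.items():
        out[x] = out.get(x, 0.0) + p
    return out

tx, mx = marg_x(theta), marg_x(mu)
# V_X(theta||mu) = sum_x theta(x) * D(theta(.|x) || mu(.|x)), in bits
VX = sum(theta[(x, y)] * math.log2((theta[(x, y)] / tx[x]) / (mu[(x, y)] / mx[x]))
         for (x, y) in pairs)
h1 = {(x, y): (tx[x] / mx[x]) * mu[(x, y)] for (x, y) in pairs}
l1 = sum(abs(theta[xy] - h1[xy]) for xy in pairs)
assert l1 <= math.sqrt(2 * math.log(2)) * math.sqrt(VX) + 1e-12
```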
Claim 6.5
$$\sum_{Z}h_1(x,y)\ \le\ \sqrt{\delta_1}\;,\qquad \sum_{Z}h_2(x,y)\ \le\ \sqrt{\delta_1}$$

Proof:
By the definitions, $h_1(x,y)R(x,y)=h_2(x,y)$, and hence
$$\big\|h_1(x,y)\,(1-R(x,y))\big\|_1=\|h_1-h_2\|_1\ \le\ 2\sqrt{2\ln 2}\,\sqrt\delta=\delta_1$$
Therefore,
$$\sqrt{\delta_1}\sum_Z h_1(x,y)\ \le\ \sum_Z h_1(x,y)\,|1-R(x,y)|\ \le\ \sum_{X\times Y}h_1(x,y)\,|1-R(x,y)|\ \le\ \delta_1$$
Thus $\sum_Z h_1(x,y)\le\sqrt{\delta_1}$. The second inequality is proved in the same way. □

Denote
$$\delta_2=\sqrt{-\ln\big(1-\sqrt{\delta_1}\big)}$$
Then, since $\delta<2^{-10}$,
$$\delta_2\ \le\ \sqrt{\ln\big(1+2\sqrt{\delta_1}\big)}\ \le\ \sqrt{2\sqrt{\delta_1}}\ \le\ 2\delta^{1/8}$$
For $(x,y)\notin Z$, we have
$$1-\sqrt{\delta_1}\ \le\ R(x,y)\ \le\ 1+\sqrt{\delta_1}$$
Hence
$$\ln\big(1-\sqrt{\delta_1}\big)\ \le\ \ln R(x,y)\ \le\ \ln\big(1+\sqrt{\delta_1}\big)\ \le\ -\ln\big(1-\sqrt{\delta_1}\big)$$
Thus, $(x,y)\notin Z$ implies
$$-\delta_2^2\ \le\ \ln R(x,y)\ \le\ \delta_2^2$$
Let $r$ be a random variable uniformly distributed in the interval $[0,1]$. For every integer $i$ define
$$X_i(r)=\big\{x\in X\ \big|\ (i-1+r)\,\delta_2\ \le\ \ln R_X(x)\ <\ (i+r)\,\delta_2\big\}$$
$$Y_i(r)=\big\{y\in Y\ \big|\ (i-1+r)\,\delta_2\ \le\ \ln R_Y(y)\ <\ (i+r)\,\delta_2\big\}$$
Then, for all $r$, $\{X_i(r)\}_{i=-\infty}^{\infty}$ is a partition of $X$, and $\{Y_i(r)\}_{i=-\infty}^{\infty}$ is a partition of $Y$. Define
$$\hat Z(r)=\Big\{(x,y)\in X\times Y\ \Big|\ (x,y)\notin\bigcup_{i=-\infty}^{\infty}X_i(r)\times Y_i(r)\Big\}$$
We will prove that for some $r$ these partitions satisfy the lemma.
Claim 6.6
$$(x,y)\notin Z\ \Rightarrow\ \Pr_r\big[(x,y)\in\hat Z(r)\big]\ \le\ \delta_2$$

Proof: $(x,y)\notin Z$ implies
$$|\ln R_X(x)-\ln R_Y(y)|=|\ln R(x,y)|\ \le\ \delta_2^2$$
Assume w.l.o.g. $\ln R_X(x)\ge\ln R_Y(y)$. For a fixed $r$, $(x,y)\in X_i(r)\times Y_i(r)$ for some $i$, unless there exists in the interval $[\ln R_Y(y),\ln R_X(x)]$ a number of the form $(j+r)\delta_2$ (for some integer $j$). The probability for that (over $r$) is at most
$$\frac{\ln R_X(x)-\ln R_Y(y)}{\delta_2}\ \le\ \frac{\delta_2^2}{\delta_2}=\delta_2 \qquad\Box$$

Claim 6.7
$$E_r\Big(\sum_{\hat Z(r)}h_1(x,y)\Big)\ \le\ 3\delta_2\;,\qquad E_r\Big(\sum_{\hat Z(r)}h_2(x,y)\Big)\ \le\ 3\delta_2$$

Proof:
By changing the order of the summations,
$$E_r\Big(\sum_{\hat Z(r)}h_1(x,y)\Big)=\sum_{X\times Y}h_1(x,y)\Pr_r\big[(x,y)\in\hat Z(r)\big]$$
$$=\sum_{X\times Y-Z}h_1(x,y)\Pr_r\big[(x,y)\in\hat Z(r)\big]+\sum_{Z}h_1(x,y)\Pr_r\big[(x,y)\in\hat Z(r)\big]\ \le\ \delta_2\sum_{X\times Y-Z}h_1(x,y)+\sum_Z h_1(x,y)$$
$$\le\ \delta_2\,\|h_1\|_1+\sqrt{\delta_1}\ \le\ \delta_2\big(1+\sqrt{2\ln 2}\,\sqrt\delta\big)+\sqrt{\delta_1}\ \le\ 3\delta_2$$
The second inequality is proved in the same way. □

Since $h_1(x,y),h_2(x,y)$ are always positive, we can conclude from the last claim the existence of $r_0$ with
$$\sum_{\hat Z(r_0)}h_1(x,y)\ \le\ 6\delta_2\;,\qquad \sum_{\hat Z(r_0)}h_2(x,y)\ \le\ 6\delta_2$$
Fix this $r_0$. Define
$$X_i=X_i(r_0)\;,\qquad Y_i=Y_i(r_0)\;,\qquad \hat Z=\hat Z(r_0)$$
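The randomly shifted bucketing of Claim 6.6 can be illustrated by discretizing the shift $r$ (illustrative parameters): a pair whose two log-ratios differ by at most $\delta_2^2$ is separated only when a grid point falls between them, an event of probability at most $\delta_2^2/\delta_2=\delta_2$.

```python
import math

# Discretized illustration of Claim 6.6: grid step d2, uniform shift r in [0,1);
# a pair with gap d2^2 between its log-ratios lands in different buckets with
# probability at most d2^2/d2 = d2.
d2 = 0.1
lnRX = 0.137
lnRY = lnRX - d2 ** 2                      # the extreme allowed gap
N = 100000
bad = 0
for t in range(N):
    r = t / N
    bx = math.floor(lnRX / d2 - r)         # bucket index of x under shift r
    by = math.floor(lnRY / d2 - r)         # bucket index of y under shift r
    bad += (bx != by)
assert bad / N <= d2 + 1e-3                # matches the bound of Claim 6.6
```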
We will prove that these partitions satisfy the lemma. Define $h_3:X\times Y\to\mathbb R$ by
$$h_3(x,y)=\begin{cases}h_1(x,y) & \text{for } (x,y)\notin\hat Z\\ 0 & \text{for } (x,y)\in\hat Z\end{cases}$$
Then clearly
$$\|h_1-h_3\|_1=\sum_{X\times Y}|h_1(x,y)-h_3(x,y)|=\sum_{\hat Z}h_1(x,y)\ \le\ 6\delta_2$$
By the definitions,
$$h_3(x,y)=\sum_{i=-\infty}^{\infty}R_X(x)\,\mu(X_i\times Y_i)\,\mu_{X_i\times Y_i}(x,y)$$
Define
$$q_i=e^{(i-1+r_0)\delta_2}\,\mu(X_i\times Y_i)$$
and
$$h(x,y)=\sum_{i=-\infty}^{\infty}q_i\,\mu_{X_i\times Y_i}(x,y)$$

Claim 6.8
$$\|h_3-h\|_1\ \le\ 2\delta_2$$

Proof: For $x\in X_i$,
$$e^{(i-1+r_0)\delta_2}\ \le\ R_X(x)\ <\ e^{(i+r_0)\delta_2}$$
thus, by the definition of $q_i$,
$$R_X(x)\,\mu(X_i\times Y_i)\ \ge\ q_i\ >\ R_X(x)\,\mu(X_i\times Y_i)\,e^{-\delta_2}\ \ge\ R_X(x)\,\mu(X_i\times Y_i)\,(1-\delta_2)$$
therefore, for $x\in X_i$,
$$0\ \le\ R_X(x)\,\mu(X_i\times Y_i)-q_i\ \le\ R_X(x)\,\mu(X_i\times Y_i)\,\delta_2$$
Since $\mu_{X_i\times Y_i}(x,y)>0$ implies $x\in X_i$, we have for all $x,y$,
$$\big|R_X(x)\,\mu(X_i\times Y_i)-q_i\big|\,\mu_{X_i\times Y_i}(x,y)\ \le\ \delta_2\,R_X(x)\,\mu(X_i\times Y_i)\,\mu_{X_i\times Y_i}(x,y)$$
and therefore
$$\|h_3-h\|_1=\Big\|\sum_{i=-\infty}^{\infty}R_X(x)\,\mu(X_i\times Y_i)\,\mu_{X_i\times Y_i}(x,y)-\sum_{i=-\infty}^{\infty}q_i\,\mu_{X_i\times Y_i}(x,y)\Big\|_1$$
$$\le\ \sum_{X\times Y}\sum_{i=-\infty}^{\infty}\big|R_X(x)\,\mu(X_i\times Y_i)-q_i\big|\,\mu_{X_i\times Y_i}(x,y)\ \le\ \delta_2\,\|h_3\|_1\ \le\ \delta_2\,\|h_1\|_1\ \le\ 2\delta_2 \qquad\Box$$
Claim 6.9
1 X (Y ? Yi )] O(1=8) qi [Xi( X Y )
i=?1 1 X i=?1
i
i
? Xi) Yi] O(1=8) qi [(X (X Y ) i
i
Proof: Notice that (Xi Yi) can be 0, only if qi is also 0. As in the previous claim for x 2 Xi, qi RX (x)(Xi Yi ). Thus X X (Y ? Yi)] = qi (x; y) qi [Xi( RX (x)(x; y) Xi Yi ) (x;y)2X (Y ?Y ) (Xi Yi ) (x;y)2X (Y ?Y ) i
and therefore
i
i
0 1 1 X X [ X i (Y ? Yi )] @ qi (X Y ) i
i=?1
=
X
(x;y)2Z^
i
X
i=?1 (x;y)2Xi (Y ?Yi )
RX (x)(x; y) =
X Z^
i
1 RX (x)(x; y)A
h1(x; y) 62 = O(1=8)
The second inequality is proved in the same way. Claim 6.9, and the inequality before give the proof of Lemma 6.2.
6.3 Proof of Lemma 4.3

Lemma 4.3 will follow as a simple application of Lemma 6.1 and Lemma 6.2. Given $\vartheta$, with
$$V_X(\vartheta\|\mu)\le\delta\;,\qquad V_Y(\vartheta\|\mu)\le\delta$$
take $m$, $\{q_i\}_{i=1}^m$, $\{X_i\}_{i=1}^m$, $\{Y_i\}_{i=1}^m$ from Lemma 6.2. Define
$$\vartheta_i=\mu_{X_i\times Y_i}\;,\qquad w_i=w(\vartheta_i)$$
and
$$r_i=\frac{\mu\big[(X_i\times(Y-Y_i))\cup((X-X_i)\times Y_i)\big]}{\mu(X_i\times Y_i)}$$
as in Lemma 6.1. By Lemma 6.2 we have
$$\sum_{i=1}^m q_i r_i\ \le\ O(\delta^{1/8}) \qquad\text{and}\qquad \Big\|\vartheta-\sum_{i=1}^m q_i\vartheta_i\Big\|_1\ \le\ O(\delta^{1/8})$$
Every protocol for $\vartheta$ also defines a protocol for each one of the $\vartheta_i$-s. Therefore, the last inequality also gives
$$w(\vartheta)\ \le\ \sum_{i=1}^m q_i w_i+O(\delta^{1/8})$$
which reflects the fact that the function $w(\cdot)$ is convex in the measure and has a Lipschitz constant of $1$ (see the introduction).

Lemma 4.3 can be proved now in the following way. Given a family $\{\vartheta_d\}_{d\in D}$ of probability measures on $X\times Y$, and weights $\{p_d\}_{d\in D}$ such that $\sum_{d\in D}p_d=1$ and such that for all $d$: $p_d\ge 0$, define
$$w_d=w(\vartheta_d)\;,\qquad w=\sum_{d\in D}p_d w_d\;,\qquad V_d=V(\vartheta_d\|\mu)$$
Lemma 4.3 assumes
$$\sum_{d\in D}p_d V_d\ \le\ \varepsilon \qquad\text{and}\qquad D\Big(\sum_{d\in D}p_d\vartheta_d\ \Big\|\ \mu\Big)\ \le\ \varepsilon$$
and therefore by Lemma 3.4
$$\Big\|\mu-\sum_{d\in D}p_d\vartheta_d\Big\|_1\ \le\ O(\varepsilon^{1/2})$$
For each measure $\vartheta_d$, we have by the previous discussion $\{\vartheta_{d,i}\}_{i=1}^{m_d}$ and $\{q_{d,i}\}_{i=1}^{m_d}$, such that
$$\sum_{i=1}^{m_d}q_{d,i}\,r_{d,i}\ \le\ O(V_d^{1/8})\;,\qquad \Big\|\vartheta_d-\sum_{i=1}^{m_d}q_{d,i}\vartheta_{d,i}\Big\|_1\ \le\ O(V_d^{1/8})$$
and, as before, we also have
$$w_d\ \le\ \sum_{i=1}^{m_d}q_{d,i}\,w_{d,i}+O(V_d^{1/8})$$
(where $r_{d,i},w_{d,i}$ are defined as before). Define the set $\hat D$ by
$$\hat D=\{(d,i)\ \mid\ d\in D\,,\ 1\le i\le m_d\}$$
For each $\hat d=(d,i)\in\hat D$ define the weight
$$\hat p_{(d,i)}=p_d\,q_{d,i}$$
and the measure
$$\hat\vartheta_{(d,i)}=\vartheta_{d,i}$$
Look at the family of measures $\{\hat\vartheta_{(d,i)}\}_{(d,i)\in\hat D}$. For this family, define as before
$$\hat w_{(d,i)}=w(\hat\vartheta_{(d,i)})\;,\qquad \hat w=\sum_{\hat d\in\hat D}\hat p_{\hat d}\,\hat w_{\hat d}\;,\qquad \hat r_{(d,i)}=r_{d,i}$$
Then, by the concavity of the function $f(z)=z^{1/8}$ (Jensen's inequality), we have
$$\sum_{\hat d\in\hat D}\hat p_{\hat d}\,\hat r_{\hat d}=\sum_{d\in D}p_d\sum_{i=1}^{m_d}q_{d,i}\,r_{d,i}\ \le\ \sum_{d\in D}p_d\,O(V_d^{1/8})\ \le\ O\Big(\Big(\sum_{d\in D}p_d V_d\Big)^{1/8}\Big)\ \le\ O(\varepsilon^{1/8})$$
and
$$\Big\|\mu-\sum_{\hat d\in\hat D}\hat p_{\hat d}\,\hat\vartheta_{\hat d}\Big\|_1\ \le\ \Big\|\mu-\sum_{d\in D}p_d\vartheta_d\Big\|_1+\Big\|\sum_{d\in D}p_d\vartheta_d-\sum_{\hat d\in\hat D}\hat p_{\hat d}\,\hat\vartheta_{\hat d}\Big\|_1$$
$$\le\ O(\varepsilon^{1/2})+\sum_{d\in D}p_d\Big\|\vartheta_d-\sum_{i=1}^{m_d}q_{d,i}\vartheta_{d,i}\Big\|_1\ \le\ O(\varepsilon^{1/2})+O\Big(\Big(\sum_{d\in D}p_d V_d\Big)^{1/8}\Big)\ \le\ O(\varepsilon^{1/8})$$
and, as before, also
$$\hat w=\sum_{\hat d\in\hat D}\hat p_{\hat d}\,\hat w_{\hat d}=\sum_{d\in D}p_d\sum_{i=1}^{m_d}q_{d,i}\,w_{d,i}\ \ge\ \sum_{d\in D}p_d\big(w_d-O(V_d^{1/8})\big)\ \ge\ w-O(\varepsilon^{1/8})$$
Now we can apply Corollary 6.4 for the family $\{\hat\vartheta_{\hat d}\}_{\hat d\in\hat D}$ to get
$$\hat w\ \le\ W_2(w(\mu))+O(\varepsilon^{1/16})$$
and we can conclude
$$w\ \le\ \hat w+O(\varepsilon^{1/8})\ \le\ W_2(w(\mu))+O(\varepsilon^{1/16})$$
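The flattening step above (combining outer weights $p_d$ with inner weights $q_{d,i}$ into a single family indexed by pairs $(d,i)$) can be sketched as follows; the numbers are hypothetical, and for simplicity each inner family sums to exactly $1$, whereas in the proof the inner weights only approximately normalize:

```python
# Toy sketch of the flattening step of section 6.3: hat p_{(d,i)} = p_d * q_{d,i}.
# With exactly normalized inner families, the combined weights still sum to 1.
p = {"d1": 0.6, "d2": 0.4}                      # outer weights (hypothetical)
q = {"d1": [0.5, 0.5], "d2": [0.2, 0.3, 0.5]}   # inner weights per d (hypothetical)
p_hat = {(d, i): p[d] * qi for d in p for i, qi in enumerate(q[d])}
assert abs(sum(p_hat.values()) - 1.0) < 1e-12
```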
7 Conclusions

Theorem 1.1 can be generalized, using the methods introduced herein, in many ways. Let us shortly describe two generalizations that seem to follow. The proofs were not verified as carefully as the rest of the paper, and should be trusted accordingly. Several other generalizations are described in [41].
7.1 Product of Games

Given a game $G$ and a game $G'$, the product game $G \otimes G'$ is defined in the same manner as $G \otimes G$, i.e. if $G$ consists of $X, Y, U, V, \mu, Q$, and $G'$ consists of $X', Y', U', V', \mu', Q'$, then the game $G \otimes G'$ consists of the sets $X \times X'$, $Y \times Y'$, $U \times U'$, $V \times V'$, with the measure
$$\mu \otimes \mu'((x,x'),(y,y')) = \mu(x,y)\,\mu'(x',y')$$
and the predicate
$$Q \otimes Q'((x,x'),(y,y'),(u,u'),(v,v')) = Q(x,y,u,v)\,Q'(x',y',u',v').$$
In the same way, given $k$ games $G_1, \ldots, G_k$, the product $G_1 \otimes \cdots \otimes G_k$ is defined. Since in the entire proof of Theorem 1.1 we did not use the fact that the same game, $G$, is repeated, and since the function $W$ from Theorem 1.1 is global (and in particular does not depend on the game $G$), the following generalization of Theorem 1.1 follows.
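To make the definition concrete, here is a small illustrative sketch (not from the paper): it builds the product of two games and computes game values by exhaustive search. All names are ad hoc, and question pairs are taken to be uniformly distributed for simplicity:

```python
from itertools import product

def value(game):
    """Brute-force value of a two-prover one-round game.
    A game is (X, Y, U, V, pairs, Q): equally likely question pairs
    and a 0/1 predicate Q(x, y, u, v).  Some optimal protocol is
    deterministic, so it suffices to enumerate deterministic ones."""
    X, Y, U, V, pairs, Q = game
    best = 0.0
    for us in product(U, repeat=len(X)):
        for vs in product(V, repeat=len(Y)):
            u, v = dict(zip(X, us)), dict(zip(Y, vs))
            val = sum(Q(x, y, u[x], v[y]) for x, y in pairs) / len(pairs)
            best = max(best, val)
    return best

def tensor(g1, g2):
    """The product game of the text: questions, answers and the
    measure are products, and both predicates must accept."""
    X1, Y1, U1, V1, P1, Q1 = g1
    X2, Y2, U2, V2, P2, Q2 = g2
    pairs = [((x1, x2), (y1, y2)) for x1, y1 in P1 for x2, y2 in P2]
    Q = lambda x, y, u, v: Q1(x[0], y[0], u[0], v[0]) * Q2(x[1], y[1], u[1], v[1])
    prod_set = lambda A, B: [(a, b) for a in A for b in B]
    return (prod_set(X1, X2), prod_set(Y1, Y2),
            prod_set(U1, U2), prod_set(V1, V2), pairs, Q)

# Example: the CHSH game (accept iff u XOR v = x AND y); its value is 3/4.
B = [0, 1]
chsh = (B, B, B, B, [(x, y) for x in B for y in B],
        lambda x, y, u, v: 1 if (u ^ v) == (x & y) else 0)
```

Playing an optimal protocol independently on each coordinate shows $w(G \otimes G') \ge w(G)\,w(G')$, which the brute force confirms; the point of the parallel repetition theorem is that the value nevertheless decays exponentially in the number of coordinates.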
Theorem 7.1 Let $W : [0,1] \to [0,1]$ be the function from Theorem 1.1. Given $k$ games $G_1, \ldots, G_k$, define
$$w = \max[w(G_1), \ldots, w(G_k)]$$
and
$$s = \max[s(G_1), \ldots, s(G_k), 2].$$
Then
$$w(G_1 \otimes \cdots \otimes G_k) \le W(w)^{k / \log_2(s)}.$$
As before, $s = \max[s(G_1), \ldots, s(G_k), 2]$ can be replaced with $s = \max[CC(G_1), \ldots, CC(G_k), 2]$, or with $s = \max[\ell(G_1), \ldots, \ell(G_k), 2]$.
7.2 Probabilistic Predicates

Our second generalization deals with the probabilistic case, where the predicate $Q$ of a game $G$ depends also on the random string $r$ (i.e., $Q$ is probabilistic). W.l.o.g. we can assume that there exists a second random string, $\tilde{r}$, such that $\tilde{r}$ is independent of $r$ (and therefore also of $(x, y, u, v)$), and such that the predicate $Q$ depends on $x, y, u, v, \tilde{r}$ (and not on the random string $r$). As before, the value of a protocol for the game $G$ is defined to be the probability that $Q(x, y, u(x), v(y), \tilde{r}) = 1$, where $(x, y)$ is chosen according to $\mu$. As before, the value of the game, $w(G)$, is defined to be the maximal value of all protocols for $G$.
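As a toy illustration of this definition (names and numbers are ad hoc, not from the paper): the value of a probabilistic game can be computed by brute force over deterministic protocols, scoring each answer tuple $z$ by $q(z) = \Pr_{\tilde{r}}[Q(z, \tilde{r}) = 1]$:

```python
from itertools import product

def prob_value(X, Y, U, V, pairs, q):
    """Brute-force value of a probabilistic game: q(x, y, u, v) is the
    probability, over the second random string, that the predicate
    accepts.  Question pairs in `pairs` are equally likely.  The value
    is linear in each player's randomization, so some deterministic
    protocol is optimal and it suffices to enumerate those."""
    best = 0.0
    for us in product(U, repeat=len(X)):
        for vs in product(V, repeat=len(Y)):
            u, v = dict(zip(X, us)), dict(zip(Y, vs))
            val = sum(q(x, y, u[x], v[y]) for x, y in pairs) / len(pairs)
            best = max(best, val)
    return best

# Toy example: the verifier accepts matching answers with probability
# 0.9 and mismatched answers with probability 0.1.
X = Y = U = V = [0, 1]
pairs = [(x, y) for x in X for y in Y]
q = lambda x, y, u, v: 0.9 if u == v else 0.1
```

In this toy game any pair of constant protocols achieves 0.9 on every question pair, which is clearly optimal, so the brute force returns 0.9.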
Theorem 7.2 Let $W : [0,1] \to [0,1]$ be the function from Theorem 1.1. Given a probabilistic game $G$ (as above), with value $w(G)$ and answer-size $s(G) \ge 2$:
$$w(G^{\otimes k}) \le W(w(G))^{k / \log_2(s(G))}$$
As before, $\log_2(s(G))$ can be replaced with $CC(G)$, or with $\ell(G)$, defined in the following way: Define $G_{\tilde{r}}$ to be the deterministic game obtained by fixing the second random string to $\tilde{r}$. Then define $CC(G)$ to be the maximum, taken over $\tilde{r}$, of $CC(G_{\tilde{r}})$, and define $\ell(G)$ to be the maximum, taken over $\tilde{r}$, of $\ell(G_{\tilde{r}})$.
Sketch of proof:
First we claim that the proof of Theorem 1.2 holds (with minor changes) for probabilistic games as well: In section 4, the fact that the function $w_l$ is concave is used to prove inequality (3) (and the inequality before it). The concavity of the value-function of a game is proved in the introduction for deterministic games. The same argument, however, holds for probabilistic games. Using this fact, one can verify that the entire argument of section 4 holds for probabilistic games as well. Now, section 4 uses Lemma 4.2 and Lemma 4.3. Lemma 4.2 does not depend on the game $G$ at all. The proof of Lemma 4.3 (in section 6) is based on Lemma 6.1 and Lemma 6.2. Lemma 6.2 does not depend on the game $G$ either. Therefore, we just have to verify that the proof of Lemma 6.1 holds for probabilistic games. In the proof of Lemma 6.1 we use the fact that the game is deterministic only in the proof of Claim 6.3. It is not hard to see, however, that a probabilistic version of that proof can be given. Theorem 7.2 is now proved using Theorem 1.2 in the same manner as before (see section 2). The difference is that now, given $z = (x', y', u', v') \in Z$, $Q_1$ is still not determined on the set $A(z)$, because $Q_1$ depends also on the second random string corresponding to the first coordinate (denote this random string by $\tilde{r}_1$). In the deterministic case, we disregarded the set $A(z)$ for every $z$ s.t. $Q$ is not satisfied on $z$. In order to be able to do the same in the probabilistic case, we will have to have a copy of $A(z)$ for every possible $\tilde{r}_1$ (for $\tilde{r}_1 = r'$, denote this copy by $A_{r'}(z)$). We then disregard every copy $A_{r'}(z)$ for every $z, r'$ s.t. $Q$ is not satisfied on $z, r'$. A different way to see how Theorem 7.2 is proved using Theorem 1.2 is to define $q(z)$ to be the probability that $Q(z, \tilde{r}) = 1$. (For a deterministic game, $q(z)$ is always 0 or 1.) It is
not too hard to see that the entire argument of section 2 can be generalized to the case where $0 \le q(z) \le 1$. Alternatively, we can present the proof of Theorem 7.2 in the following way: We can view a probabilistic game $G$ in the following equivalent way: Player I receives (as an input) the pair $(x, \tilde{r})$, and Player II receives the pair $(y, \tilde{r})$. The protocols of the players are restricted so as to depend only on the first input, i.e., $x$ for the first player and, respectively, $y$ for the second player (and not on the second input $\tilde{r}$). We call such a protocol a restricted protocol. The game $G$ is now deterministic, because we can think of the predicate as depending only on the inputs and the answers of the two players. The class of allowed protocols, however, is now a sub-class of all possible protocols. We claim that the entire proof of Theorem 1.1 (including the proof of Theorem 1.2) is correct even if we allow only restricted protocols: In some parts of the proof (e.g., section 4), we start from a protocol, $P$, for a game, and obtain from this protocol many protocols for many other games. These protocols are obtained either by projection (on one coordinate), or by restriction to a product subset. In other parts (e.g., section 6) a protocol $P$ for a game is composed from other protocols for different games. In order to see that the proof of Theorem 1.1 holds even if we allow only restricted protocols, one should verify that if the original protocols are restricted then every protocol obtained by one of these three methods is also restricted. If the new protocol is obtained by projection of a restricted protocol, or by composition of restricted protocols (i.e., by the first method or by the third method), then it is very easy to see that the new protocol is also restricted. If the new protocol is obtained by a restriction of a restricted protocol to a subset, then the new protocol is not necessarily restricted.
If that subset does not depend on $\tilde{r}$, however, then the new protocol is restricted. It is not hard to verify that the subsets used (in the proof) never depend on $\tilde{r}$. Theorem 1.1 is thus correct even if we allow only restricted protocols. Theorem 7.2 follows.
□
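The equivalence between a probabilistic game and its determinization with restricted protocols can also be checked mechanically at toy scale. The sketch below (ad hoc names; uniform question pairs and a uniform second random string are assumptions made for simplicity) enumerates, for the determinized game, exactly those protocols on $(x, \tilde{r})$ and $(y, \tilde{r})$ that ignore $\tilde{r}$, and confirms that the resulting value matches the probabilistic value:

```python
from itertools import product

# A tiny probabilistic game: binary questions and answers, two equally
# likely values of the second random string r~.
X = Y = U = V = [0, 1]
R = [0, 1]                                  # values of r~, uniform
pairs = [(x, y) for x in X for y in Y]
# deterministic predicate on (x, y, u, v, r~)
Q = lambda x, y, u, v, r: 1 if (u ^ v) == (x & y & r) else 0

def prob_value():
    """Value of the probabilistic game: each protocol sees only x
    (resp. y), and the predicate is averaged over r~."""
    best = 0.0
    for us in product(U, repeat=len(X)):
        for vs in product(V, repeat=len(Y)):
            u, v = dict(zip(X, us)), dict(zip(Y, vs))
            val = sum(Q(x, y, u[x], v[y], r)
                      for x, y in pairs for r in R) / (len(pairs) * len(R))
            best = max(best, val)
    return best

def restricted_value():
    """Same game, determinized: Player I gets (x, r~) and Player II
    gets (y, r~) (with the same r~), and only *restricted* protocols,
    i.e. protocols that ignore the r~ part of the input, are allowed."""
    XR = [(x, r) for x in X for r in R]
    YR = [(y, r) for y in Y for r in R]
    dpairs = [((x, r), (y, r)) for x, y in pairs for r in R]
    best = 0.0
    for us in product(U, repeat=len(XR)):
        u = dict(zip(XR, us))
        if any(u[(x, 0)] != u[(x, 1)] for x in X):
            continue                        # depends on r~: not restricted
        for vs in product(V, repeat=len(YR)):
            v = dict(zip(YR, vs))
            if any(v[(y, 0)] != v[(y, 1)] for y in Y):
                continue
            val = sum(Q(xq[0], yq[0], u[xq], v[yq], xq[1])
                      for xq, yq in dpairs) / len(dpairs)
            best = max(best, val)
    return best

assert prob_value() == restricted_value()
```

Restricted protocols for the determinized game correspond one-to-one with protocols for the probabilistic game, and corresponding protocols score identically, so the two maxima coincide.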
Acknowledgments

I would like to thank Uri Feige and Moni Naor for presenting the problem to me, and Uri Feige, Peter Hajnal, Joe Kilian, Dieter Van Melkebeek, Mario Szegedy, Endre Szemeredi, Gabor Tardos, Oleg Verbitsky, and Avi Wigderson, for helpful and encouraging discussions on the subject, and on the content of this paper. I would like to thank the two referees for many helpful comments. This research was carried out while the author was a post-doc at Princeton University and DIMACS.
References

[1] N. Alon, "Probabilistic Methods in Extremal Finite Set Theory", Proc. of the Conference on Extremal Problems for Finite Sets, Hungary, 1991.
[2] S. Arora, "Proof Verification and Hardness of Approximation Problems", Ph.D. Dissertation, UC Berkeley, 1994. Available from http://www.cs.princeton.edu/~arora.
[3] S. Arora, C. Lund, "Hardness of Approximations", to appear in: Approximation Algorithms for NP-hard Problems, D. Hochbaum, ed., PWS Publishing, 1996. Available from http://www.cs.princeton.edu/~arora.
[4] S. Arora, S. Safra, "Probabilistic Checking of Proofs; A New Characterization of NP", FOCS 92, 2-13.
[5] S. Arora, C. Lund, R. Motwani, M. Sudan, M. Szegedy, "Proof verification and intractability of approximation problems", FOCS 92, 14-23.
[6] M. Bellare, "Interactive proofs and approximation", ISTCS 93, 266-274.
[7] L. Babai, L. Fortnow, C. Lund, "Non-Deterministic Exponential Time has Two-Prover Interactive Protocols", FOCS 90, 16-25.
[8] M. Bellare, S. Goldwasser, C. Lund, A. Russell, "Efficient probabilistic checkable proofs and applications to approximation", STOC 93, 294-304.
[9] M. Bellare, O. Goldreich, M. Sudan, "Free Bits and Non-Approximability", FOCS 95.
[10] M. Bellare, P. Rogaway, "The complexity of approximating a nonlinear program", in: Complexity in Numerical Optimization, P. Pardalos, ed., World Scientific, 1993.
[11] M. Bellare, M. Sudan, "Improved non-approximability results", STOC 94.
[12] M. Ben-or, S. Goldwasser, J. Kilian, A. Wigderson, "Multi Prover Interactive Proofs: How to Remove Intractability", STOC 88, 113-131.
[13] M. Ben-or, S. Goldwasser, J. Kilian, A. Wigderson, "Efficient identification schemes using two prover interactive proofs", Crypto 89, 498-506.
[14] J. Cai, A. Condon, R. Lipton, "On Bounded Round Multi-Prover Interactive Proof Systems", Structures 90, 45-54.
[15] J. Cai, A. Condon, R. Lipton, "Playing Games of Incomplete Information", STACS 90.
[16] J. Cai, A. Condon, R. Lipton, "PSPACE is Provable by Two Provers in One Round", Structures 91, 110-115.
[17] I. Csiszar, J. Korner, "Information Theory: Coding Theorems for Discrete Memoryless Systems", Academic Press, New York-San Francisco-London, 1981.
[18] C. Dwork, U. Feige, J. Kilian, M. Naor, S. Safra, "Low Communication, 2-Prover Zero-Knowledge Proofs for NP", Crypto 92, 217-229.
[19] U. Feige, "On the Success Probability of the Two Provers in One Round Proof Systems", Structures 91, 116-123.
[20] U. Feige, "Error Reduction by Parallel Repetition - The State of the Art", Technical Report CS95-32, Weizmann Institute of Science.
[21] L. Fortnow, "Complexity-Theoretic Aspects of Interactive Proof Systems", Ph.D. Thesis, MIT/LCS/TR-447, 1989.
[22] U. Feige, S. Goldwasser, L. Lovasz, S. Safra, M. Szegedy, "Approximating Clique is Almost NP-Complete", FOCS 91, 2-12.
[23] U. Feige, J. Kilian, "Two Prover Protocols - Low Error at Affordable Rates", STOC 94.
[24] U. Feige, J. Kilian, "Impossibility Results for Recycling Random Bits in Two Prover Proof Systems", STOC 95, 457-468.
[25] U. Feige, L. Lovasz, "Two-prover one-round proof systems, their power and their problems", STOC 92, 733-744.
[26] L. Fortnow, J. Rompel, M. Sipser, "On the Power of Multi-Prover Interactive Protocols", Structures 88, 156-161.
[27] U. Feige, O. Verbitsky, "Error Reduction by Parallel Repetition - a Negative Result", 11th Annual IEEE Conference on Computational Complexity, 96.
[28] L. Fortnow, J. Rompel, M. Sipser, "Errata for On the Power of Multi-Prover Interactive Protocols", Structures 90, 318-319.
[29] R.M. Gray, "Entropy and Information Theory", Springer-Verlag, 1990.
[30] J. Hastad, "Testing of the Long Code and Hardness for Clique", STOC 96.
[31] J. Hastad, "Clique is Hard to Approximate within $n^{1-\epsilon}$", manuscript, 1996.
[32] J. Kilian, "Strong Separation Models of Multi Prover Interactive Proofs", DIMACS Workshop on Cryptography, October 1990.
[33] E. Kushilevitz, N. Nisan, "Communication Complexity", Cambridge University Press, to appear.
[34] B. Kalyanasundaram, G. Schnitger, "The probabilistic communication complexity of set intersection", Structures 87, 41-49.
[35] D. Lapidot, A. Shamir, "A One-Round, Two-Prover, Zero-Knowledge Protocol for NP", Crypto 91.
[36] D. Lapidot, A. Shamir, "Fully Parallelized Multi Prover Protocols for NEXP-time", FOCS 91, 13-18.
[37] C. Lund, M. Yannakakis, "On the hardness of approximating minimization problems", STOC 93, 286-293.
[38] D. Peleg, "On the Maximal Number of Ones in Zero-One Matrices with No Forbidden Rectangles", manuscript, 1990.
[39] R. Raz, "A Parallel Repetition Theorem", STOC 95, 447-456.
[40] A.A. Razborov, "On the distributional complexity of disjointness", Theoretical Computer Science 106, 1992, 385-390.
[41] R. Raz, A. Wigderson, "Parallel Repetition and Communication Complexity", manuscript in preparation.
[42] G. Tardos, "Multi-prover encoding schemes, and 3-prover interactive proofs", Structures 94.
[43] O. Verbitsky, "Towards the Parallel Repetition Conjecture", Structures 94.
[44] O. Verbitsky, "The Parallel Repetition Conjecture for Trees is True", manuscript, 1994.