Some Bounds on Communication Complexity of Gap Hamming Distance

arXiv:1511.08854v1 [cs.CC] 27 Nov 2015

Alexander Kozachinskiy∗
Lomonosov Moscow State University, [email protected]



December 1, 2015

Abstract

In this paper we obtain some bounds on the communication complexity of the Gap Hamming Distance problem (GHD^n_{L,U}): Alice and Bob are given binary strings of length n and they are guaranteed that the Hamming distance between their inputs is either ≤ L or ≥ U for some L < U. They have to output 0 if the first inequality holds and 1 if the second inequality holds. We study the communication complexity of GHD^n_{L,U} for probabilistic protocols with one-sided error and for deterministic protocols. Our first result is a protocol which communicates O((s/U)^{1/3} · n log n) bits and has one-sided error probability e^{-s}, provided s ≥ (L + 10/n)^3/U^2. Our second result concerns the deterministic communication complexity of GHD^n_{0,t}. Surprisingly, it can be computed with logarithmic precision:

D(GHD^n_{0,t}) = n − log_2 V_2(n, ⌊t/2⌋) + O(log n),

where V_2(n, r) denotes the size of a Hamming ball of radius r. As an application of this result, for every c < 2 we prove an Ω(n(2 − c)^2/p) lower bound on the space complexity of any c-approximate deterministic p-pass streaming algorithm for computing the number of distinct elements in a data stream of length n with tokens drawn from the universe U = {1, 2, . . . , n}. Previously that lower bound was known for c < 3/2, and for c < 2 but with larger |U|. Finally, we obtain a linear lower bound on the randomized one-sided error communication complexity of GHD^n_{0,t} for t < n/4 (the error is allowed only for pairs of zero Hamming distance).


1 Introduction

1.1 Gap Hamming Distance Problem

Given two strings x = x_1 . . . x_n ∈ {0, 1}^n and y = y_1 . . . y_n ∈ {0, 1}^n, the Hamming distance between x and y is defined as the number of positions where x and y differ:

H(x, y) = |{i ∈ {1, . . . , n} | x_i ≠ y_i}|.

Let L < U ≤ n be integers. In this paper we consider the following communication problem GHD^n_{L,U}, called the Gap Hamming Distance problem:

Definition 1. Let Alice receive an n-bit string x and Bob an n-bit string y such that either H(x, y) ≤ L or H(x, y) ≥ U. They have to output 0 if the first inequality holds and 1 if the second inequality holds. If the promise is not fulfilled, they may output anything.

The Gap Hamming Distance problem is motivated by the problem of approximating the number of distinct elements in a data stream (see [7], [2]). There is the following simple and relatively efficient protocol with shared randomness to solve GHD^n_{L,U}. Alice and Bob pick i ∈ {1, . . . , n} uniformly at random (using shared randomness) and check whether x_i = y_i. They repeat this many times and then perform a kind of majority vote: if x_i ≠ y_i in more than a (L + U)/(2n) fraction of the trials, they output 1, and they output 0 otherwise. It can be shown that O(snU/(U − L)^2) trials are sufficient to make the error probability less than e^{-s}. Hence

R_{e^{-s}}(GHD^n_{L,U}) = O(snU/(U − L)^2).    (1)
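For illustration, here is a minimal Python simulation of this sampling protocol. It is a sketch under our own naming and interface choices: both parties are merged into one function and the messages are not encoded bit by bit.

import random

def ghd_sampling_protocol(x, y, L, U, trials):
    """Shared-randomness sketch for GHD^n_{L,U}: sample shared random indices,
    compare the corresponding bits, and threshold the fraction of disagreements
    at (L + U) / (2n)."""
    n = len(x)
    disagreements = 0
    for _ in range(trials):
        i = random.randrange(n)      # index drawn with shared randomness
        if x[i] != y[i]:             # comparing x_i and y_i costs O(1) communication
            disagreements += 1
    return 1 if disagreements / trials > (L + U) / (2 * n) else 0

By the discussion above, taking trials = O(snU/(U − L)^2) pushes the error probability below e^{-s}.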

Here R_ε(f) denotes the randomized public-coin communication complexity of f with error probability ε.

Previously the Gap Hamming Distance problem was studied in the symmetric case: L = n/2 − γ, U = n/2 + γ. Let GHD^n_γ stand for GHD^n_{L,U} for these specific values of L and U. In this notation the bound (1) becomes O(n^2/γ^2) (for a constant error, say, 1/3). It turns out that this bound is tight:

Theorem 1 ([4]). R_{1/3}(GHD^n_γ) = Θ(min{n^2/γ^2, n}).

The most difficult case is γ = c√n, where c is a constant, in which case the lower bound becomes R_{1/3}(GHD^n_{c√n}) = Ω(n). There are several proofs of this bound [4], [12], [10]. As noted in [4], for other values of γ the bound can be proved via the following reduction:

R_{1/3}(GHD^{n/k}_{γ/k}) ≤ R_{1/3}(GHD^n_γ)

for k > 1. Setting k = Θ(γ^2/n) in this inequality, we can reduce Theorem 1 to its special case.

To the best of our knowledge, GHD has not been studied for L + U ≠ n (except for the simple inequality (1)). This paper establishes new bounds on the communication complexity of GHD with different parameters and in different settings.

1.2 Our Results

In Section 3 we provide the following upper bound on the randomized communication complexity of GHD with one-sided error. Before stating it, let us fix our notation. In this paper R^i_ε(f) stands for the minimal possible depth of a communication protocol with shared randomness which never errs on inputs from f^{-1}(i) and which errs with probability at most ε on inputs from f^{-1}(1 − i) (here f is a partial Boolean function).

Theorem 2. If s ≥ (L + 10/n)^3/U^2, then

R^0_{e^{-s}}(GHD^n_{L,U}) = O((s/U)^{1/3} · n log n).

Let us compare this bound with the upper bound (1) for protocols with two-sided error. For simplicity assume that L < U/2. Then (1) becomes

R_{e^{-s}}(GHD^n_{L,U}) = O((s/U) · n).

Instead of s/U, Theorem 2 has (s/U)^{1/3}, which is bigger than s/U when s < U. As s tends to U, both s/U and (s/U)^{1/3} tend to 1 and both bounds become trivial.

In Section 4 we study the deterministic communication complexity of GHD^n_{0,t}. Namely, we prove the following theorem.

Theorem 3. D(GHD^n_{0,t}) = n − log_2 V_2(n, ⌊t/2⌋) + O(log n),

where V_2(n, r) denotes the size of a Hamming ball of radius r. We use this result to prove the following lower bound on the space complexity of approximating the number of distinct elements in a data stream.

Theorem 4. Assume that 1 < c < 2 and A is a p-pass deterministic streaming algorithm for estimating F_0, the number of distinct elements in a given data stream of size 2n with tokens drawn from the universe U = {1, 2, . . . , 2n}. If A outputs a number E such that F_0 ≤ E < cF_0, then A must use space Ω(n(2 − c)^2/p).

Previously such a bound was known in the case when the size of the universe is a constant factor larger than the size of the data stream. In the case when the size of the universe and the size of the data stream coincide, the bound was known only for c < 3/2.

In Section 5 we prove the following lower bound on the randomized one-sided error communication complexity of GHD.

Theorem 5. For all t ≤ n/4 the following inequality holds:

R^1_{1/2}(GHD^n_{0,t}) ≥ αn / (2(2 − log_2 α)),

where α denotes the maximum possible rate of a linear error-correcting code of length n and distance 2t − 1. Since the maximum possible rate of a linear error-correcting code with relative distance less than 1/2 is positive ([11]), we obtain the following corollary.

Corollary 1. For each q ∈ (0, 1/4) we have R^1_{1/2}(GHD^n_{0,qn}) = Ω_q(n).

2 Preliminaries

2.1 Communication Complexity

Let f : X × Y → {0, 1} be a Boolean function and R an arbitrary random variable whose support is R.

Definition 2. A randomized (public-coin) communication protocol is a rooted binary tree in which each non-leaf vertex is associated either with Alice or with Bob and each leaf is labeled by 0 or 1. For each non-leaf vertex v associated with Alice there is a function f_v : X × R → {0, 1}, and for each non-leaf vertex u associated with Bob there is a function g_u : Y × R → {0, 1}. For each non-leaf vertex, one of its outgoing edges is labeled by 0 and the other one is labeled by 1.

Definition 3. The communication complexity of a protocol π, denoted by CC(π), is defined as the depth of the corresponding binary tree.

A computation according to a protocol runs as follows. Alice is given x ∈ X, Bob is given y ∈ Y. They start at the root of the tree. If they are at a non-leaf vertex v associated with Alice, Alice sends f_v(x, R) to Bob and they move to the son of v along the edge labeled by f_v(x, R). If they are at a non-leaf vertex associated with Bob, they act in a similar way, except that this time it is Bob who sends a bit to Alice. When they reach a leaf, they output the bit which labels this leaf.

Definition 4. We say that a randomized protocol computes f with error probability ε if for every pair of inputs (x, y) ∈ X × Y the protocol outputs f(x, y) with probability at least 1 − ε. The randomized communication complexity of f is defined as

R_ε(f) = min_π CC(π),

where the minimum is over all protocols that compute f with error probability ε.


If for i ∈ {0, 1} we require that the protocol never errs on inputs from f^{-1}(i), then the corresponding notion is called "randomized one-sided error communication complexity" and is denoted by R^i_ε(f). If f is a partial function, then in the definition of computation with error we consider only inputs from the domain of f.

The Gap Hamming Distance problem is the problem of computing the following partial function: for x, y ∈ {0, 1}^n,

GHD^n_{L,U}(x, y) = 0 if H(x, y) ≤ L,  1 if H(x, y) ≥ U,  and undefined if L < H(x, y) < U.

A protocol π is called deterministic if π does not use any randomness.

Definition 5. We say that a deterministic protocol computes f if for every value i ∈ {0, 1} and for every pair of inputs from f^{-1}(i) the protocol outputs i. The deterministic communication complexity of f is defined as

D(f) = min_π CC(π),

where the minimum is over all deterministic protocols that compute f.

2.2 Codes

In Section 5 we will use the notion of error-correcting codes.

Definition 6. A set C ⊂ {0, 1}^n is called an error-correcting code with distance d if

x, y ∈ C, x ≠ y =⇒ H(x, y) ≥ d.

An error-correcting code C is called linear if C is a linear subspace of F_2^n.

Definition 7. The rate of an error-correcting code C ⊂ {0, 1}^n is defined as log_2|C| / n.

There are linear error-correcting codes with positive rate and distance less than n/2:

Proposition 1 (Gilbert–Varshamov Bound, [11]). The maximal possible rate of a linear error-correcting code of distance d, d < n/2, is at least

(n − log_2 V_2(n, d − 1) − 1) / n.

In Section 4 we will use the notion of covering codes.

Definition 8. A set C ⊂ {0, 1}^n is called a covering code of radius r if for every x ∈ {0, 1}^n there exists y ∈ C with H(x, y) ≤ r.

Obviously, the size of a covering code of radius r is at least 2^n / V_2(n, r). There are covering codes of almost optimal size:

Proposition 2 ([5]). There is a covering code in {0, 1}^n of radius r and size at most O(n2^n / V_2(n, r)).

We will also use the fact that the Hamming ball is the largest set among all subsets of {0, 1}^n with a given diameter.

Definition 9. The diameter of a set A ⊂ {0, 1}^n is diam(A) = max_{x,y∈A} H(x, y).

Theorem 6 ([5]). If B ⊂ {0, 1}^n, diam(B) ≤ 2r, and n ≥ 2r + 1, then |B| ≤ V_2(n, r).
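Several bounds in this paper are stated in terms of V_2(n, r). The following small Python helper (ours, not part of the paper) computes it exactly, together with the rate bound of Proposition 1.

from math import comb, log2

def hamming_ball_volume(n, r):
    # V_2(n, r): number of binary strings within Hamming distance r of a fixed string
    return sum(comb(n, i) for i in range(r + 1))

def gilbert_varshamov_rate(n, d):
    # lower bound of Proposition 1 on the rate of a linear code of length n and distance d
    return (n - log2(hamming_ball_volume(n, d - 1)) - 1) / n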

3 Upper Bound on One-Sided Error Communication Complexity of GHD

Consider any x, y ∈ R^t. The scalar product and the length of a vector are defined in the usual way:

⟨x, y⟩ = x_1y_1 + . . . + x_ty_t,    ‖x‖ = √⟨x, x⟩.

Let US(t) denote the uniform distribution on the (t − 1)-dimensional unit sphere.

Proposition 3 ([8]). US(t) is equal to the distribution of the vector

(Z_1, Z_2, . . . , Z_t) / √(Z_1^2 + . . . + Z_t^2),

where Z_1, . . . , Z_t are independent random variables and Z_i ∼ N(0, 1) for each i.

Lemma 1. If Z ∼ US(t), then for each x ∈ R^t we have E⟨x, Z⟩^2 = ‖x‖^2/t.

Proof. Let Z_1, . . . , Z_t be the random variables from Proposition 3. First assume that x = (1, 0, . . . , 0). Then we have

⟨x, Z⟩^2 = Z_1^2 / (Z_1^2 + . . . + Z_t^2).

The random variables

Z_1^2/(Z_1^2 + . . . + Z_t^2),  Z_2^2/(Z_1^2 + . . . + Z_t^2),  . . . ,  Z_t^2/(Z_1^2 + . . . + Z_t^2)

are identically distributed. Hence

1 = E[(Z_1^2 + . . . + Z_t^2)/(Z_1^2 + . . . + Z_t^2)] = t · E[Z_1^2/(Z_1^2 + . . . + Z_t^2)] = t · E⟨x, Z⟩^2.

Thus the lemma is proved for x = e_1 = (1, 0, . . . , 0). Now consider any other x ∈ R^t. If x = 0, the lemma is obvious. Otherwise there exists an orthogonal t × t matrix A such that x/‖x‖ = Ae_1. Consider the vector A^T Z. Proposition 3 implies that the vectors A^T Z and Z are identically distributed. Hence

E⟨x, Z⟩^2 = ‖x‖^2 · E⟨Ae_1, Z⟩^2 = ‖x‖^2 · E⟨e_1, A^T Z⟩^2 = ‖x‖^2 · E⟨e_1, Z⟩^2 = ‖x‖^2/t.
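As a quick numerical sanity check (a sketch that is not part of the argument), one can sample from US(t) by normalizing a Gaussian vector, as in Proposition 3, and estimate E⟨x, Z⟩^2 by averaging:

import numpy as np

def sample_unit_sphere(t, rng):
    # Z ~ US(t): normalize a standard Gaussian vector (Proposition 3)
    z = rng.standard_normal(t)
    return z / np.linalg.norm(z)

rng = np.random.default_rng(0)
t = 50
x = np.arange(t, dtype=float)   # an arbitrary test vector
estimate = np.mean([np.dot(x, sample_unit_sphere(t, rng)) ** 2 for _ in range(100_000)])
print(estimate, np.dot(x, x) / t)   # the two printed values should be close, by Lemma 1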

Now we are able to construct the protocol for Theorem 2.

Proof of Theorem 2. Set b = ⌈4n(s/U)^{1/3}⌉. If b > n, then Theorem 2 states that R^0_{e^{-s}}(GHD^n_{L,U}) is linear in n, which is trivial. Therefore we will assume that b ≤ n. The communication complexity of the protocol will be O(b log n). Set a = ⌈n/b⌉ and x_{n+1} = . . . = x_{ab} = y_{n+1} = . . . = y_{ab} = 0. Note that

ab = ⌈n/b⌉ · b ≥ (n/b) · b = n,    ab = ⌈n/b⌉ · b ≤ (n/b + 1) · b = n + b ≤ 2n.    (2)

Alice and Bob transform their inputs x, y into vectors α, β ∈ R^{ab}, where

α = (x_1, . . . , x_n, x_{n+1}, . . . , x_{ab}),    β = (y_1, . . . , y_n, y_{n+1}, . . . , y_{ab}).

Note that H(x, y) = ‖α − β‖^2. Alice and Bob divide α and β into b blocks of size a:

α_i = (x_{ia−a+1}, . . . , x_{ia}),    β_i = (y_{ia−a+1}, . . . , y_{ia}),    i = 1, . . . , b.

The protocol runs as follows. Alice and Bob sample b independent random vectors U_1, . . . , U_b, each of them according to the distribution US(a). Then Alice computes the b numbers ⟨α_1, U_1⟩, . . . , ⟨α_b, U_b⟩ and sends their approximations to Bob. More specifically, let r_i be the number in {m/n^3 | m ∈ Z} closest to ⟨α_i, U_i⟩. Note that

|r_i − ⟨α_i, U_i⟩| ≤ 1/n^3.    (3)

Alice sends r_1, . . . , r_b to Bob, each number specified by O(log n) bits. Bob computes

T′ = (r_1 − ⟨β_1, U_1⟩)^2 + . . . + (r_b − ⟨β_b, U_b⟩)^2.

If T′ > L + 5/n, then Bob sends 1 to Alice. Otherwise Bob sends 0 to Alice. The communication complexity of the protocol is O(b log n). Now we have to estimate the error probability. We first show that T′ ≤ L + 5/n whenever H(x, y) ≤ L, and thus the protocol does not err in this case. To this end consider the random variable T = ⟨α_1 − β_1, U_1⟩^2 + . . . + ⟨α_b − β_b, U_b⟩^2. Note that

H(x, y) = ‖α − β‖^2 = ‖α_1 − β_1‖^2 + . . . + ‖α_b − β_b‖^2 ≥ ⟨α_1 − β_1, U_1⟩^2 + . . . + ⟨α_b − β_b, U_b⟩^2 = T

(the last inequality is Cauchy–Schwarz, since each U_i is a unit vector).

Let us show that |T′ − T| is at most 5/n. Denote P_i = r_i − ⟨β_i, U_i⟩ and Q_i = ⟨α_i − β_i, U_i⟩. By definition

T′ = P_1^2 + . . . + P_b^2,    T = Q_1^2 + . . . + Q_b^2.

Thus

|T′ − T| ≤ |P_1^2 − Q_1^2| + . . . + |P_b^2 − Q_b^2| = |P_1 − Q_1| · |P_1 + Q_1| + . . . + |P_b − Q_b| · |P_b + Q_b|.

Let us bound |P_i − Q_i| and |P_i + Q_i| separately. By (3), |P_i − Q_i| ≤ 1/n^3. By definition, |P_i + Q_i| is at most

|P_i + Q_i| = |r_i + ⟨α_i, U_i⟩ − 2⟨β_i, U_i⟩| ≤ |r_i| + |⟨α_i, U_i⟩| + 2|⟨β_i, U_i⟩| ≤ 2|⟨α_i, U_i⟩| + 1/n^3 + 2|⟨β_i, U_i⟩|

(again we use that r_i is 1/n^3-close to ⟨α_i, U_i⟩). The coordinates of α_i and β_i are zeros and ones, and there are at most n ones among them. Hence

|⟨α_i, U_i⟩| ≤ ‖α_i‖ ≤ √n,    |⟨β_i, U_i⟩| ≤ ‖β_i‖ ≤ √n,

and therefore |P_i + Q_i| ≤ 4√n + 1/n^3 ≤ 5n. Finally

|T′ − T| ≤ |P_1 − Q_1| · |P_1 + Q_1| + . . . + |P_b − Q_b| · |P_b + Q_b| ≤ b · (1/n^3) · 5n ≤ 5/n,

since b ≤ n.

Assume that H(x, y) = ‖α − β‖^2 ≤ L. Then

T′ ≤ T + 5/n ≤ L + 5/n.

In this case the protocol always outputs 0.

Now assume that H(x, y) = ‖α − β‖^2 ≥ U. We will show that the event T′ ≤ L + 5/n happens with small probability. By Lemma 1 we have that

ET = ‖α_1 − β_1‖^2/a + . . . + ‖α_b − β_b‖^2/a = ‖α − β‖^2/a = H(x, y)/a ≥ U/a.

For each i = 1, . . . , b we have

⟨α_i − β_i, U_i⟩^2 ∈ [0, ‖α_i − β_i‖^2]

with probability 1. To finish the proof we use the Hoeffding inequality:

Proposition 4 ([6]). If random variables χ_1, . . . , χ_m are independent and for each i = 1, . . . , m we have χ_i ∈ [a_i, b_i] with probability 1, then for every positive δ

Pr[χ_1 + . . . + χ_m ≤ E(χ_1 + . . . + χ_m) − δ] ≤ exp(−2δ^2 / ((b_1 − a_1)^2 + . . . + (b_m − a_m)^2)).

The following chain of inequalities finishes the proof:

Pr[T′ ≤ L + 5/n] ≤ Pr[T ≤ L + 10/n]    (4)
≤ Pr[T ≤ ET − ET/2]    (5)
≤ exp(−(ET)^2 / (2(‖α_1 − β_1‖^4 + . . . + ‖α_b − β_b‖^4)))    (6)
≤ exp(−(‖α − β‖^4/a^2) / (2a(‖α_1 − β_1‖^2 + . . . + ‖α_b − β_b‖^2)))    (7)
≤ exp(−U/(2a^3))    (8)
≤ exp{−s}.    (9)

Let us explain it step by step.

• (4) holds because |T − T′| ≤ 5/n.

• For (5): first of all, by definition of b and since s ≥ (L + 10/n)^3/U^2 we have

b^3 ≥ 4^3 sn^3/U ≥ 4^3 n^3 (L + 10/n)^3/U^3 = (4n(L + 10/n)/U)^3,

and hence b ≥ 4n(L + 10/n)/U. Now recall that ET ≥ U/a and by (2) ab ≤ 2n. Therefore

ET/2 ≥ U/(2a) ≥ bU/(4n) ≥ L + 10/n,

and (5) follows.

• (6) holds by the Hoeffding inequality, applied with δ = ET/2 to T = ⟨α_1 − β_1, U_1⟩^2 + . . . + ⟨α_b − β_b, U_b⟩^2.

• (7) holds because ‖α_i − β_i‖^2 ≤ a and hence ‖α_i − β_i‖^4 ≤ a‖α_i − β_i‖^2.

• (8) holds because ‖α_1 − β_1‖^2 + . . . + ‖α_b − β_b‖^2 = ‖α − β‖^2 and ‖α − β‖^2 ≥ U.

• For (9), again recall that a ≤ 2n/b and b^3 ≥ 4^3 sn^3/U. Then

U/(2a^3) ≥ U/(2(2n/b)^3) = Ub^3/(16n^3) ≥ s.
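For concreteness, the following Python sketch simulates the protocol from this proof in a single process. The function name, the interface, and the clamping of b to n are ours; the rounding to multiples of 1/n^3 and the threshold L + 5/n follow the proof, but no bit-level encoding of the messages is performed.

import numpy as np

def ghd_one_sided_protocol(x, y, L, U, s, rng=None):
    # One-sided-error sketch for GHD^n_{L,U}: project each block onto a shared
    # random unit vector, round Alice's projections to multiples of 1/n^3,
    # and let Bob threshold T' at L + 5/n.
    rng = rng or np.random.default_rng()
    n = len(x)
    b = min(n, int(np.ceil(4 * n * (s / U) ** (1 / 3))))   # number of blocks
    a = -(-n // b)                                         # block size a = ceil(n/b)
    alpha = np.concatenate([np.asarray(x, float), np.zeros(a * b - n)])
    beta = np.concatenate([np.asarray(y, float), np.zeros(a * b - n)])
    t_prime = 0.0
    for i in range(b):
        u = rng.standard_normal(a)
        u /= np.linalg.norm(u)                             # U_i ~ US(a), shared randomness
        r_i = round(float(np.dot(alpha[i*a:(i+1)*a], u)) * n**3) / n**3   # Alice's message
        t_prime += (r_i - np.dot(beta[i*a:(i+1)*a], u)) ** 2              # Bob's computation
    return 1 if t_prime > L + 5 / n else 0                 # Bob's answer

On inputs with H(x, y) ≤ L the sketch always returns 0, in line with the one-sided error guarantee above.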

4 Deterministic Communication Complexity of GHD^n_{0,t}

4.1 Proof of Theorem 3

Observe that

D(GHD^n_{0,n}) = 2 = n − log_2 V_2(n, ⌊n/2⌋) + O(log n)

(two bits suffice: Alice sends x_1, and under the promise Bob then knows whether y = x or y is the complement of x).

Hence we can assume that t < n. Let us first prove the lower bound. Consider the protocol π witnessing D(GHD^n_{0,t}). Let π(x, y) denote the leaf of the protocol at which Alice and Bob arrive when Alice has x on input and Bob has y on input. If l is a 0-leaf of the protocol π, consider the set

A_l = {x ∈ {0, 1}^n | π(x, x) = l}.

Note that diam(A_l) ≤ t − 1. Indeed, assume that x, y ∈ A_l and H(x, y) ≥ t. Since the set of inputs reaching a given leaf of a deterministic protocol is a combinatorial rectangle,

π(x, x) = l, π(y, y) = l =⇒ π(x, y) = l.

This contradicts the fact that l is a 0-leaf of π, since on the input (x, y) the protocol must output 1. Observe that

t − 1 = 2(t/2 − 1/2) ≤ 2⌊t/2⌋,    n ≥ t + 1 ≥ 2⌊t/2⌋ + 1.

Hence by Theorem 6

|A_l| ≤ V_2(n, ⌊t/2⌋).

If both parties have the same x ∈ {0, 1}^n on input, they must arrive at some 0-leaf. Hence {A_l | l is a 0-leaf of π} is a covering of {0, 1}^n. This covering has size at most 2^{CC(π)} and each set of the covering has size at most V_2(n, ⌊t/2⌋). Therefore

2^{CC(π)} · V_2(n, ⌊t/2⌋) ≥ 2^n

and

D(GHD^n_{0,t}) = CC(π) ≥ n − log_2 V_2(n, ⌊t/2⌋).

Let us prove the upper bound on D(GHD^n_{0,t}). Let C be a covering code of radius ⌊(t−1)/2⌋ and size at most

O(n2^n / V_2(n, ⌊(t−1)/2⌋)),

which exists by Proposition 2. Alice computes

c = argmin_{z∈C} H(z, x)

and sends c to Bob. Since c ∈ C, this takes at most

log_2|C| + 1 = log_2 O(n2^n / V_2(n, ⌊(t−1)/2⌋)) = n − log_2 V_2(n, ⌊(t−1)/2⌋) + O(log n)

bits. If H(c, y) ≤ ⌊(t−1)/2⌋, Bob sends 0 to Alice. Otherwise, Bob sends 1 to Alice.

Let us prove that the described protocol computes GHD^n_{0,t}. Note that by the definition of c and C we have H(c, x) ≤ ⌊(t−1)/2⌋. Hence if x = y, then H(c, y) ≤ ⌊(t−1)/2⌋. Assume now that H(x, y) ≥ t. Then

H(x, c) + H(y, c) ≥ H(x, y) ≥ t > 2⌊(t−1)/2⌋

and hence

H(y, c) > 2⌊(t−1)/2⌋ − H(x, c) ≥ ⌊(t−1)/2⌋.

Observe that

V_2(n, ⌊t/2⌋) ≤ V_2(n, ⌊(t−1)/2⌋) + C(n, ⌊t/2⌋) ≤ V_2(n, ⌊(t−1)/2⌋) + n · C(n, ⌊t/2⌋ − 1) ≤ (1 + n) · V_2(n, ⌊(t−1)/2⌋),

where C(n, k) denotes the binomial coefficient, C(n, k) = C(n, k − 1) · (n − k + 1)/k ≤ n · C(n, k − 1), and ⌊t/2⌋ − 1 ≤ ⌊(t−1)/2⌋. Therefore the communication complexity of the protocol is at most

n − log_2 V_2(n, ⌊(t−1)/2⌋) + O(log n) ≤ n − log_2 V_2(n, ⌊t/2⌋) + O(log n).
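A schematic version of this two-message protocol is given below. It is a sketch under the assumption that a covering code of radius ⌊(t−1)/2⌋ is available explicitly as a list of n-bit tuples; Proposition 2 only guarantees that such a code of the required size exists.

def hamming(u, v):
    # Hamming distance between two equal-length 0/1 sequences
    return sum(a != b for a, b in zip(u, v))

def deterministic_ghd_protocol(x, y, t, covering_code):
    # Upper-bound protocol from the proof of Theorem 3: Alice sends (the index of)
    # the codeword closest to x, which costs log_2|C| + O(1) bits; Bob answers 0
    # iff y lies within radius floor((t-1)/2) of that codeword.
    r = (t - 1) // 2
    c = min(covering_code, key=lambda z: hamming(z, x))   # Alice's message
    return 0 if hamming(c, y) <= r else 1                 # Bob's answer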

4.2 Application to the Number of Distinct Elements (Proof of Theorem 4)

Let F_0 denote the number of distinct elements in a given data stream of size 2n with tokens drawn from the universe U = {1, 2, . . . , 2n}. We say that a deterministic p-pass streaming algorithm A with memory S for computing F_0 is c-approximate if A outputs a number E such that F_0 ≤ E < cF_0. We claim that for c < 2, A requires Ω(n(2 − c)^2/p) memory. Let us start with the case p = 1.

The first result of that kind was proved in [1]. It states that if |E − F_0| < cF_0, where c = 0.1, then A requires Ω(n) memory. A linear lower bound on memory for a larger c can be obtained by a reduction to the deterministic communication complexity of equality, as is done, for example, in [3]. Indeed, for each α < 1/2 there is an error-correcting code ECC : {0, 1}^k → {0, 1}^n with relative distance at least α and k = Ω_α(n). Assume that Alice has x ∈ {0, 1}^k and Bob has y ∈ {0, 1}^k as their inputs. They want to decide whether x = y. Alice and Bob transform their inputs into two data streams u and v:

u = ⟨u_1, u_2, . . . , u_n⟩,    v = ⟨v_1, v_2, . . . , v_n⟩,

where

u_i = n · ECC(x)_i + i,    v_i = n · ECC(y)_i + i.    (10)

Alice emulates A on u. Then, using S bits, she sends a description of the current state of A to Bob, and Bob emulates A on v, starting from the state he received from Alice. Finally, Bob knows the output of A for the stream that is equal to the concatenation of u and v. Notice that the number of distinct elements F_0 in this concatenation equals n + H(ECC(x), ECC(y)). If A is (1 + α)-approximate (that is, c = 1 + α), then Bob is able to decide whether x = y or not. Indeed, if x = y, then E < cF_0 = (1 + α)n. If x ≠ y, then by the definition of ECC we have that E ≥ F_0 ≥ n + αn = (1 + α)n. As the deterministic communication complexity of the equality predicate on k-bit strings is at least k, a linear lower bound S = Ω(k) = Ω(n) on the space complexity of A for c < 3/2 follows.

In this argument we only needed a linear lower bound on the 1-round communication complexity of the equality predicate, which is trivial. However, for arbitrary p we already need a linear lower bound on the complexity of the equality predicate for 2p-round protocols. The lower bound on the space complexity we obtain by this argument becomes Ω(n/p), as in each round Alice and Bob exchange S bits.

Instead of binary error-correcting codes one can use error-correcting codes over a larger alphabet with relative distance close to 1. The same reduction provides a linear lower bound for c < 2. The point is that the size of the universe increases and the problem becomes harder.

Theorem 3 implies a linear lower bound on the space complexity of A for c < 2 in the case when the size of the universe and the size of the data stream are equal. Indeed, assume that Alice has x ∈ {0, 1}^n and Bob has y ∈ {0, 1}^n as their inputs. Assume also that they are promised that either x = y or H(x, y) ≥ t = ⌈n(c − 1)⌉. Again, Alice and Bob transform their inputs into data streams u and v, but with (10) replaced by

u_i = nx_i + i,    v_i = ny_i + i.

The expression for F_0 becomes F_0 = n + H(x, y). Thus Alice and Bob can solve GHD^n_{0,t} using 2pS bits of communication. Indeed, if x = y, then E < cF_0 = cn ≤ n + t, since by definition t ≥ (c − 1)n. If H(x, y) ≥ t, then E ≥ F_0 ≥ n + t. We conclude by Theorem 3 that pS must be at least

pS = Ω(n − log_2 V_2(n, ⌊t/2⌋)) = Ω(n(1/2 − t/(2n))^2) = Ω(n(2 − c)^2).
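The reduction is easy to state in code. The following sketch (variable names are ours) builds the two streams and checks the identity F_0 = n + H(x, y) on a toy example.

def streams_from_inputs(x, y):
    # Reduction from the proof of Theorem 4: n-bit inputs x, y are mapped to
    # streams over the universe {1, ..., 2n} via u_i = n*x_i + i, v_i = n*y_i + i
    n = len(x)
    u = [n * x[i] + (i + 1) for i in range(n)]
    v = [n * y[i] + (i + 1) for i in range(n)]
    return u, v

x, y = [1, 0, 1, 1], [1, 1, 1, 0]
u, v = streams_from_inputs(x, y)
# the concatenated stream has F_0 = n + H(x, y) distinct elements
assert len(set(u + v)) == len(x) + sum(a != b for a, b in zip(x, y))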

5 The Lower Bound

The lower bound of Theorem 5 will be proved by a reduction to the lower bound of the following lemma by Orlitsky.

Lemma 2 ([9]). Assume that we are given jointly distributed random variables X, Y taking values in {0, 1}^n such that for all x, y ∈ {0, 1}^n:

Pr[X = x, Y = y] > 0,    Pr[X = x] = 2^{-n}.

Assume also that Alice receives the value of X, Bob receives the value of Y, and there is a deterministic protocol π such that at the end of the protocol π Bob outputs Alice's input (the value of X). Then the expected (with respect to the distribution of (X, Y)) length of the protocol π is at least n.

Note that Lemma 2 also holds for randomized protocols. Indeed, if there is a randomized protocol which transmits X from Alice to Bob (who sees Y) without errors, then we can convert it into a deterministic protocol by fixing an optimal choice of randomness. The resulting protocol remains error-free and its expected length does not increase.

Proof of Theorem 5. Let us explain the idea of the proof. Assume that Alice and Bob have a protocol π computing GHD^n_{0,t} which errs only on inputs of zero Hamming distance. For each ε ∈ (0, 1) there exist random variables X, Y taking values in {0, 1}^n whose joint distribution satisfies the following conditions: for all x, y ∈ {0, 1}^n,

Pr[X = x, Y = y] > 0,    Pr[X = x] = 2^{-n},    Pr[X ≠ Y] ≤ ε.

Further, assume that Alice has X on input, Bob has Y on input, and the goal is to transmit X from Alice to Bob. This will be done in two steps. In the first step they run π on (X, Y) (even though H(X, Y) can be between 0 and t). Note that if π outputs 0, then H(X, Y) < t. In the second step Alice and Bob finish the transmission. We will show how to do it in n − αn bits if Alice and Bob are sure that H(X, Y) < t. Otherwise Alice sends X in n bits. By Lemma 2, the two steps together must take at least n bits on average. We will show that the second step takes much less than n bits on average. Therefore the first step must take many bits. As the cost of the first step equals CC(π), we obtain the lower bound on CC(π).

Let us now explain the proof in more detail. Choose s ∈ N such that

α/4 ≤ 2^{-s} ≤ α/2.    (11)

Note that

1/s ≥ 1/(2 − log_2 α).    (12)

Repeating the protocol witnessing R^1_{1/2}(GHD^n_{0,t}) s times, we get

R^1_{2^{-s}}(GHD^n_{0,t}) ≤ sR^1_{1/2}(GHD^n_{0,t}).

Let π be the protocol witnessing R^1_{2^{-s}}(GHD^n_{0,t}) and let C be a linear error-correcting code in F_2^n with distance 2t − 1 and rate α. Alice gets X, Bob gets Y, and Alice wants to transmit X to Bob. Consider the following protocol τ which solves this task:

1. Alice and Bob run π on (X, Y);

2. If π outputs 0, Alice sends the description of the coset [X] ∈ F_2^n/C, using at most n − αn bits. Then Bob outputs

X′ = argmin_{z∈[X]} H(z, Y).

3. If π outputs 1, then Alice sends X to Bob, using n bits, and Bob outputs X′ = X.

Let us show that X = X′. It is clear when π outputs 1. Assume now that π outputs 0. Then H(X, Y) < t, since by definition π never errs when H(X, Y) ≥ t. Note that by the definition of X′

H(X′, X) ≤ H(X′, Y) + H(Y, X) ≤ 2H(Y, X) ≤ 2t − 2.    (13)

Assume now that X′ ≠ X. We have X′ ∈ [X] and hence X′ − X ∈ C. But X′ − X ≠ 0, and hence by the definition of C

H(X′, X) = H(X′ − X, 0) ≥ 2t − 1,

which contradicts (13). Thus it is always true that X′ = X.

By Lemma 2 the expected length of τ is at least n. Let us now bound the expected length of τ from above. The third item in the definition of τ occurs when X ≠ Y, or when X = Y and π makes an error. This happens with probability at most ε + 2^{-s}. Hence the expected length of τ is at most

CC(π) + n − αn + (ε + 2^{-s})n.

Recalling that CC(π) = R^1_{2^{-s}}(GHD^n_{0,t}) ≤ sR^1_{1/2}(GHD^n_{0,t}) and letting ε → 0, we get

sR^1_{1/2}(GHD^n_{0,t}) ≥ n(α − 2^{-s}).

The last inequality transforms to

R^1_{1/2}(GHD^n_{0,t}) ≥ n(α − 2^{-s})/s.

As s satisfies (11) and (12), we obtain

R^1_{1/2}(GHD^n_{0,t}) ≥ αn / (2(2 − log_2 α)).
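Step 2 of the protocol τ can be made concrete with a parity-check matrix: sending the coset [X] amounts to sending the syndrome of X, which takes n − αn bits for a linear code of rate α. The sketch below is ours (the matrix H and the brute-force decoder are illustrative and only feasible for tiny n).

import numpy as np

def coset_message(H, x):
    # Alice's message in step 2 of tau: the syndrome Hx over GF(2); it identifies
    # the coset x + C and takes n - k bits for a code of dimension k
    return H @ x % 2

def decode_from_coset(H, syndrome, y):
    # Bob's step 2: among all strings with the given syndrome, output the one closest to y
    n = H.shape[1]
    best, best_dist = None, n + 1
    for m in range(2 ** n):
        z = np.array([(m >> j) & 1 for j in range(n)])
        if np.array_equal(H @ z % 2, syndrome) and int(np.count_nonzero(z != y)) < best_dist:
            best, best_dist = z, int(np.count_nonzero(z != y))
    return best

If the code has distance 2t − 1 and H(X, Y) < t, the decoded vector equals X, exactly as shown in the proof above.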

References

[1] Alon, N., Matias, Y., and Szegedy, M. The space complexity of approximating the frequency moments. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing (1996), ACM, pp. 20–29.

[2] Brody, J., Chakrabarti, A., Regev, O., Vidick, T., and De Wolf, R. Better gap-hamming lower bounds via better round elimination. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. Springer, 2010, pp. 476–489.

[3] Chakrabarti, A. Data stream algorithms. Computer Science 49, 149.

[4] Chakrabarti, A., and Regev, O. An optimal lower bound on the communication complexity of gap-hamming-distance. SIAM Journal on Computing 41, 5 (2012), 1299–1317.

[5] Cohen, G., Honkala, I., Litsyn, S., and Lobstein, A. Covering codes, vol. 54. Elsevier, 1997.

[6] Hoeffding, W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58, 301 (1963), 13–30.

[7] Indyk, P., and Woodruff, D. Tight lower bounds for the distinct elements problem. In Foundations of Computer Science, 2003. Proceedings. 44th Annual IEEE Symposium on (2003), IEEE, pp. 283–288.

[8] Marsaglia, G. Choosing a point from the surface of a sphere. The Annals of Mathematical Statistics 43, 2 (1972), 645–646.

[9] Orlitsky, A. Average-case interactive communication. Information Theory, IEEE Transactions on 38, 5 (1992), 1534–1547.

[10] Sherstov, A. A. The communication complexity of gap hamming distance. Theory of Computing 8, 1 (2012), 197–208.

[11] Sudan, M. Algorithmic Introduction to Coding Theory: Lecture Notes. 2001.

[12] Vidick, T. A concentration inequality for the overlap of a vector on a large set, with application to the communication complexity of the gap-hamming-distance problem. Chicago Journal of Theoretical Computer Science 1 (2012).
