Solving some discrepancy problems in NC

Sanjeev Mahajan* (LSI Logic, Milpitas, CA 95035, U.S.A.; [email protected])
Edgar A. Ramos† (Max-Planck-Institut für Informatik, Im Stadtwald, 66123 Saarbrücken, Germany; [email protected])
K. V. Subrahmanyam‡ (SPIC Mathematical Institute, 92 G.N. Chetty Road, T. Nagar, Madras 600 017, India; [email protected])

Abstract
We show that several discrepancy-like problems can be solved in NC² nearly achieving the corresponding sequential bounds. For example, given a set system $(X, \mathcal{S})$, where $X$ is a ground set and $\mathcal{S} \subseteq 2^X$, a set $R \subseteq X$ can be computed in NC² so that, for each $S \in \mathcal{S}$, the discrepancy $||R \cap S| - |\bar{R} \cap S||$ is $O(\sqrt{|S| \log |\mathcal{S}|})$. Previous NC algorithms could only achieve $O(\sqrt{|S|^{1+\epsilon} \log |\mathcal{S}|})$, while ours matches the probabilistic bound achieved sequentially by the method of conditional probabilities within a multiplicative factor $1 + o(1)$. Other problems whose NC solution we improve are lattice approximation, $\epsilon$-approximations of range spaces of bounded VC-exponent, sampling in geometric configuration spaces, and approximation of integer linear programs.
1 Introduction
Problem and previous work. Discrepancy is an important concept in combinatorics, see e.g.
[1, 5], and theoretical computer science, see e.g. [27, 23, 9]. It attempts to capture the idea of a good sample from a set. The simplest example, the discrepancy problem, DP, considers a set system $(X, \mathcal{S})$, where $X$ is a ground set and $\mathcal{S} \subseteq 2^X$ is a family of subsets of $X$. One is then interested in a subset $R \subseteq X$ such that for each $S \in \mathcal{S}$ the difference $||R \cap S| - |\bar{R} \cap S||$, called the discrepancy, is small. Using Chernoff-Hoeffding bounds [10, 15, 28, 27, 29], one finds that a random sample $R \subseteq X$, with each $x \in X$ taken into $R$ independently with probability $1/2$, results with nonzero probability in a low-discrepancy set: for each $S \in \mathcal{S}$, $||R \cap S| - |\bar{R} \cap S|| = O(\sqrt{|S| \log m})$ (here $m = |\mathcal{S}|$). This can be made into a deterministic sequential algorithm that computes such a sample $R$ [27]; it uses the so-called method of conditional probabilities. In parallel, several approaches have been used ($k$-wise independence combined with the method of conditional probabilities, and relaxed to biased spaces [6, 23, 25, 7]). However, so far these efforts to compute a sample in parallel have resulted only in $O(\sqrt{|S|^{1+\epsilon} \log m})$ discrepancies. The situation is similar for other discrepancy-like problems, like the lattice approximation problem [27, 23], and some sampling problems in computational geometry [12, 13, 14].

* Work performed while at the Max-Planck-Institut für Informatik, Germany.
† Contact author. The work by this author was started while at DIMACS/Rutgers University, New Brunswick, NJ, USA, supported by a DIMACS postdoctoral fellowship.
‡ Work performed while visiting the Max-Planck-Institut für Informatik, Germany.
¹ In the parallel context, the work is the product of the number of processors and the running time.
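As a quick illustration of the random-sampling bound just quoted (our own sketch, not from the paper; the toy ground set and family are made up), the Chernoff-Hoeffding bound predicts discrepancies around $O(\sqrt{|S| \log m})$:

```python
import random

def random_sample_discrepancy(ground_set, family):
    """Pick R by independent fair coin flips and return, for each set S
    in the family, the discrepancy ||R ∩ S| - |R̄ ∩ S||."""
    R = {x for x in ground_set if random.random() < 0.5}
    return [abs(len(R & S) - len(S - R)) for S in family]

# Toy example: 50 random sets of size 200 from a 1000-element ground set.
X = set(range(1000))
S = [set(random.sample(sorted(X), 200)) for _ in range(50)]
print(max(random_sample_discrepancy(X, S)))
```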
Results. In this paper, we describe NC algorithms (specifically, the algorithms run in time $O(\log^2 n)$ using work¹ $O(n^C)$ for some constant $C$ in the EREW PRAM model) that achieve the probabilistic bounds (achievable sequentially) within a multiplicative factor $1 + o(1)$. The technique we use is to model the sampling by a deterministic finite automaton, DFA, and then fool the resulting probabilistic DFA with a small (polynomial size) probability distribution. The approach is not new; in fact, Karger and Koller [17] show how to do exactly this through the lattice approximation problem, LAP, using the solution for the LAP developed in [23]. However, they apparently did not realize that the LAP itself can be modeled by DFAs, and that as a result this and other discrepancy-like problems can be solved in parallel nearly achieving the probabilistic bounds. We also describe how the work of Nisan [26] on fooling DFAs in the context of pseudorandom generators fits the same general approach. Karger and Koller's approach is stronger in that it fools the DFAs with relative error in the transition probabilities, while Nisan's approach can only do it with absolute error. However, absolute error is sufficient for most applications, and results in a simpler algorithm and better work bounds. The approach is limited in that we can only handle problems in which the goodness of the sample $R$, obtained with individual and independent probabilities $p_x$ for $x \in X$, is determined by a polynomial number of constraints of the form $|c_i - \mu_i| \le \Delta_i$, $i = 1, \ldots, m = O(|X|^C)$, where $c_i = \sum_{x \in X} a_{ix} q_x$ with $q_x = 1$ iff $x \in R$ (the indicator for $R$), $\mu_i = \sum_x a_{ix} p_x$ is the expected value of $c_i$, and $\Delta_i$ is the deviation guaranteed by the probabilistic bounds. The coefficients $a_{ix}$ must be limited to $O(\log |X|)$ bits, so that the possible sums $c_i$ can be represented by a polynomial number of states in a DFA (one DFA for each $i$). A key point, which perhaps explains why our observations had not been noticed before, is that it is sufficient to fool the individual transition probabilities of the DFAs simultaneously, rather than the joint transition probabilities. This is because, in the probabilistic analysis, the probability of obtaining a bad sample is bounded by the sum of the probabilities that each individual constraint does not hold. This limited framework, however, includes the LAP and the DP, and other sampling problems in computational geometry. Also, since the LAP can be used to obtain approximate solutions to integer linear programs [28, 27], this results in improved results in the parallel context. As a result, with no extra effort, we improve on the recent work in [2].

Contents. In section 2, we consider the lattice approximation problem, its solution by rounding and its modeling by DFAs; in section 3, we present the techniques for fooling DFAs and the resulting algorithm for the lattice approximation problem; in section 4, we consider the discrepancy problem and its application to solving the lattice approximation problem; in section 5, we present two applications to computational geometry; and in section 6, we discuss the application to approximating integer linear programs. In Appendix A, we quote the probabilistic bounds that will be needed in the paper, and in Appendices B, C and D, we include some computations omitted in the main body.
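Before the details, here is a minimal illustration (our own sketch, with made-up integer coefficients) of the DFA modeling described above: one DFA per constraint, whose states are the possible partial sums of $c_i$, driven by the indicator bits $q_x$:

```python
from collections import defaultdict

def state_distribution(coeffs, probs):
    """Probabilistic DFA for one constraint c_i = sum_x a_ix * q_x:
    states are partial sums; reading bit q_x with Pr[q_x = 1] = p_x
    splits each state's mass between 'stay' and 'advance by a_ix'."""
    dist = {0: 1.0}                      # start state: empty sum
    for a, p in zip(coeffs, probs):
        nxt = defaultdict(float)
        for s, mass in dist.items():
            nxt[s] += mass * (1 - p)     # q_x = 0: state unchanged
            nxt[s + a] += mass * p       # q_x = 1: add the coefficient
        dist = dict(nxt)
    return dist                          # Pr[c_i = s] for each final sum s

# With O(log|X|)-bit coefficients the number of states stays polynomial;
# the parallel algorithm replaces the independent bits by a small
# distribution that approximately preserves these state probabilities.
print(state_distribution([1, 2, 1], [0.5, 0.5, 0.5]))
```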
2 Lattice approximation

In the lattice approximation problem, LAP, we are given an $m \times n$ matrix $A$ with $a_{ij} \in [0,1]$ and an $n \times 1$ vector $p$ with $p_j \in [0,1]$, and we are to compute an $n \times 1$ vector $q$ with $q_j \in \{0,1\}$, a lattice point, that achieves small discrepancies $\Delta_i = |\sum_{j=1}^n a_{ij}(p_j - q_j)|$. We use the following notation (for details refer to Appendix A): for independent random variables $X_1, \ldots, X_n$ in $[0,1]$, $X = \sum_{i=1}^n X_i$ and $\mu = E[X]$, let $\Delta(\mu, x)$ denote the absolute deviation for which $\Pr(|X - \mu| > \Delta(\mu, x)) \le x$.

A Probabilistic bounds

A.1 Chernoff-Hoeffding bounds

Let $X_1, \ldots, X_n$ be independent random variables in $[0,1]$, with $X = \sum_{i=1}^n X_i$ and $\mu = E[X]$. Then:
(i) for $\epsilon > 0$, $\Pr\{X > (1+\epsilon)\mu\} < B(\epsilon, \mu)$,
(ii) for $\epsilon \in (0,1]$, $\Pr\{X < (1-\epsilon)\mu\} < B(\epsilon, \mu)$.
Let $\Delta(\mu, x)$ be the absolute deviation that results in this bound of the tail probability being $x$. There is a constant $c > 0$ such that (see [27, 2])

$$\Delta(\mu, x) = \begin{cases} \Theta\big(\sqrt{\mu \log(1/x)}\big) & \text{if } \mu \ge c \log(1/x), \\ \Theta\big(\log(1/x) / \log(\log(1/x)/\mu)\big) & \text{otherwise.} \end{cases}$$
A.2 Chernoff-Hoeffding bound for $k$-wise independence
Let $X_1, \ldots, X_n$ be a sequence of $k$-wise independent random variables in $[0,1]$, with $k \ge 2$ even. Let $X = \sum_{i=1}^n X_i$, $\mu = E[X]$, and $\sigma^2[X] = \sum_{i=1}^n \sigma^2[X_i]$. Then [29] (using $\sigma^2[X_i] \le E[X_i]$ for random variables in $[0,1]$),

$$\Pr\{|X - \mu| \ge \Delta\} < \left(\frac{k \max(k, \sigma^2[X])}{e^{2/3} \Delta^2}\right)^{k/2} \le \left(\frac{k \max(k, \mu)}{e^{2/3} \Delta^2}\right)^{k/2}.$$
Let $\Delta_k(\mu, x)$ be the absolute deviation that results in this bound of the tail probability being $x$. Then

$$\Delta_k(\mu, x) = \begin{cases} \Theta\big(\sqrt{k\mu}\,(1/x)^{1/k}\big) & \text{if } \mu \ge k, \\ \Theta\big(k (1/x)^{1/k}\big) & \text{otherwise.} \end{cases}$$

In the case $k = 2$, Chebyshev's inequality gives a more precise bound:

$$\Pr\{|X - \mu| \ge \Delta\} \le \frac{\sigma^2[X]}{\Delta^2} \le \frac{\mu}{\Delta^2}, \qquad \Delta_2(\mu, x) = \sqrt{\mu/x}.$$
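In both cases the deviation is obtained by setting the tail bound equal to $x$ and solving for $\Delta$ (a routine calculation we spell out for concreteness):

$$\left(\frac{k \max(k, \mu)}{e^{2/3} \Delta^2}\right)^{k/2} = x \iff \Delta^2 = \frac{k \max(k, \mu)}{e^{2/3}} \left(\frac{1}{x}\right)^{2/k} \iff \Delta = \Theta\big(\sqrt{k \max(k, \mu)}\,(1/x)^{1/k}\big),$$

which gives the two cases above since $\sqrt{k \max(k, \mu)}$ is $\sqrt{k\mu}$ for $\mu \ge k$ and $k$ for $\mu < k$; similarly, Chebyshev with $\sigma^2[X] \le \mu$ gives $\mu/\Delta^2 = x$, i.e. $\Delta_2(\mu, x) = \sqrt{\mu/x}$.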
B Accumulation of error

B.1 Absolute error
Let $h' = l'' - l$ and $h'' = l' - l''$. Then

$$\sum_{t \in N_{i,l'}} |\Pr_{\tilde{D}}\{st\} - \Pr_F\{st\}| = \sum_{t \in N_{i,l'}} \Big| \sum_{r \in N_{i,l''}} \Pr_{D_1}\{sr\} \Pr_{D_2}\{rt\} - \sum_{r \in N_{i,l''}} \Pr_F\{sr\} \Pr_F\{rt\} \Big|$$

$$= \frac{1}{2} \sum_{t \in N_{i,l'}} \Big| \sum_{r \in N_{i,l''}} (\Pr_{D_1}\{sr\} - \Pr_F\{sr\})(\Pr_{D_2}\{rt\} + \Pr_F\{rt\}) + (\Pr_{D_1}\{sr\} + \Pr_F\{sr\})(\Pr_{D_2}\{rt\} - \Pr_F\{rt\}) \Big|$$

$$\le \epsilon_{h'} + \epsilon_{h''} \le 2\epsilon_{k-1},$$

where the second line uses the identity $ab - cd = \frac{1}{2}[(a-c)(b+d) + (a+c)(b-d)]$, and the last line uses the triangle inequality together with the fact that transition probabilities out of a fixed state sum to at most 1 (so each of the two double sums contributes at most twice the corresponding per-half error).
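The only algebra used above is the product-difference identity $ab - cd = \frac{1}{2}[(a-c)(b+d) + (a+c)(b-d)]$; a quick numeric sanity check (ours, purely illustrative):

```python
import random

# Check ab - cd == ((a-c)(b+d) + (a+c)(b-d)) / 2, the identity that
# splits the error of D1 x D2 into the errors of its two halves.
for _ in range(10_000):
    a, b, c, d = (random.random() for _ in range(4))
    lhs = a * b - c * d
    rhs = ((a - c) * (b + d) + (a + c) * (b - d)) / 2
    assert abs(lhs - rhs) < 1e-12
print("identity holds")
```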
B.2 Relative error
To verify the claim notice that, using the inequality $|ab - 1| \le (1 + |a - 1|)(1 + |b - 1|) - 1$ and $h' = l'' - l$, $h'' = l' - l''$,

$$\left| \frac{\Pr_D\{st\}}{\Pr_F\{st\}} - 1 \right| \le \left(1 + \left| \frac{\Pr_{\tilde{D}}\{st\}}{\Pr_F\{st\}} - 1 \right|\right) \left(1 + \left| \frac{\Pr_D\{st\}}{\Pr_{\tilde{D}}\{st\}} - 1 \right|\right) - 1,$$

and that

$$\left| \frac{\Pr_{\tilde{D}}\{st\}}{\Pr_F\{st\}} - 1 \right| = \frac{1}{\Pr_F\{st\}} \left| \Pr_{\tilde{D}}\{st\} - \Pr_F\{st\} \right| = \frac{1}{\Pr_F\{st\}} \Big| \sum_{r \in N_{i,l''}} \Pr_{D_1}\{sr\} \Pr_{D_2}\{rt\} - \Pr_F\{sr\} \Pr_F\{rt\} \Big|$$

$$= \frac{1}{\Pr_F\{st\}} \Big| \sum_{r \in N_{i,l''}} \left( \frac{\Pr_{D_1}\{sr\} \Pr_{D_2}\{rt\}}{\Pr_F\{sr\} \Pr_F\{rt\}} - 1 \right) \Pr_F\{sr\} \Pr_F\{rt\} \Big| \le \big((1 + \epsilon_{k-1})^2 - 1\big) \frac{1}{\Pr_F\{st\}} \sum_{r \in N_{i,l''}} \Pr_F\{sr\} \Pr_F\{rt\}$$

$$= (1 + \epsilon_{k-1})^2 - 1,$$

where the inequality applies the relative-error guarantee $1 + \epsilon_{k-1}$ to each of the two halves.
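The inequality $|ab - 1| \le (1 + |a-1|)(1 + |b-1|) - 1$ used above follows from $ab - 1 = (a-1)(b-1) + (a-1) + (b-1)$ and the triangle inequality; again a quick numeric check (ours):

```python
import random

# Check |ab - 1| <= (1 + |a-1|)(1 + |b-1|) - 1 on random inputs.
for _ in range(10_000):
    a, b = random.uniform(0, 2), random.uniform(0, 2)
    assert abs(a * b - 1) <= (1 + abs(a - 1)) * (1 + abs(b - 1)) - 1 + 1e-12
print("inequality holds")
```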
C Work and time bounds for the LAP algorithm

C.1 LAP
Let us consider how reduce obtains $D$ from $\tilde{D} = D_1 \times D_2$. The probabilities $\Pr_{\tilde{D}}\{st\}$ that we aim to preserve are computed in time $O(\log(n+m))$ and work $O(E_0^2 m\nu + m\nu)$ (since $E_0^2$ is the size of the support of $\tilde{D}$). Let $P$ be the probability space used in the reduction. Each $p \in P$, which corresponds to a probability distribution on $\mathrm{supp}(\tilde{D})$, is tested to determine a good one (which is guaranteed to exist by the computations of the previous subsection). Thus, for each $p \in P$ and each $i = 1, \ldots, m$ the following is done: for each $w \in \mathrm{supp}(\tilde{D})$, determine the state $t \in N_{i,l'}$ that is reached from $\tilde{s}_i$, and then add for each $t$ all $\Pr_p\{w\}$ for the $w$ that lead to $t$. This gives all the probabilities $\Pr_p\{st\}$, and from this information we can determine a good $p$. The amount of work performed is then

$$E_0^2 m\nu + m\nu + f(E_0^2) E_0 m\nu,$$

where $f(E_0^2)$ is the size of $P$, $E_0$ the size of $\mathrm{supp}(p)$ for $p \in P$, $m$ the number of DFAs and $\nu$ the number of states in $N_{i,l}$. The third term, $f(E_0^2) E_0 m\nu$, dominates. The time required is $O(\log(n+m))$.
The work performed by fool$(l, l')$ then satisfies the recurrence

$$W(h) \le 2W(h/2) + C f(E_0^2) E_0 m\nu,$$

where $h = l' - l$. Then the total work is $O(f(E_0^2) E_0 m\nu n)$. Since the size of a $k$-wise independent probability space ($k$ even) on $x$ variables is $f(x) = O(x^{ck})$, where $c \ge 1/2$, we conclude from the expressions we obtained for $E_0$ that the best choice is $k = 2$. For the case of absolute error, in which the distributions are uniform, $f(x) = O(x)$ is achievable: use hash functions to generate $P$, following the approach of [26]. Let $H$ be a 2-wise independent family of hash functions $h : E_0 \to E_0$. The size of $H$ is $E_0^2$. $P$ is generated from $H$ as follows: for $h \in H$, let $p_h \in P$ be the uniform distribution with $\mathrm{supp}(p_h) = \{w \cdot h(w) : w \in E_0\}$. The 2-wise independence of $H$ implies the 2-wise independence of $P$ and, obviously, the size of $P$ is also $E_0^2$.⁶ This results in a work bound $O(E_0^3 mn)$, and replacing the constraint for $E_0$:

$$W(n) \le C (n^2 \nu^2 m)^3 mn = C n^7 \nu^6 m^4.$$
⁶ For the reader familiar with Nisan's construction, we point out that in his construction all the subproblems generated at the same level of recursion of fool would use the same good hash function $h$. This is important there to obtain a compact representation of the final pseudorandom strings, but here choosing a good $h$ independently in each subproblem results in less work.
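A minimal sketch of this way of generating $P$ (ours; it assumes the standard affine family $h(w) = aw + b \bmod \pi$ over a prime $\pi$, one common way to obtain 2-wise independence):

```python
import random

def two_wise_hash_family(prime):
    """All h(w) = (a*w + b) mod prime: a 2-wise independent family of
    hash functions on {0, ..., prime-1}; its size is prime^2 = E0^2."""
    return [(a, b) for a in range(prime) for b in range(prime)]

def p_h_support(a, b, prime):
    """supp(p_h) = {(w, h(w)) : w in E0}; p_h is uniform on these pairs,
    so each distribution in P has support of size E0, not E0^2."""
    return [(w, (a * w + b) % prime) for w in range(prime)]

# Toy example with E0 = 7 (taken prime so the affine family applies).
H = two_wise_hash_family(7)
a, b = random.choice(H)
print(p_h_support(a, b, 7))
```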
For the case of relative error, for $k = 2$, we have $f(x) = O(x^2)$ using the construction in [16]. For this, it is important to note that the probabilities $q(w)$ can be truncated to $\log n$ bits (by the same argument used for the $p_j$'s). So we obtain

$$W(n) \le C (n^2 \nu^2 m)^5 mn = C n^{11} \nu^{10} m^6.$$

The time satisfies the recurrence $T(h) \le T(h/2) + C \log(n+m)$, so it is $O(\log n \log(n+m)) = O(\log^2 n)$.
C.2 Discrepancy
In the absolute error option, the work bound can be improved by observing that a smaller choice of $\nu$ suffices (with a very small loss in the discrepancy value achieved), considerably less than $\nu = n$ as chosen above. Consider the DFA $M_S$ for $S \in \mathcal{S}$ and let $X(l, l')$ be the subset of $X$ whose indicator bits correspond to the levels between $l$ and $l'$; thus $|X(l, l')| = h$ using the notation from the previous section. Let $R(l, l') = R \cap X(l, l')$. Then $\Pr\{st\}$ should be concentrated on those $t$ near the mean $|R(l, l')|/2$, so that states $t$ corresponding to a deviation larger than

$$\Delta(|R(l, l')|/2, \tilde{\epsilon}/2) \le C\sqrt{|R(l, l')| \log(2/\tilde{\epsilon})} \le C\sqrt{h \log(n+m)}$$

can safely be assigned probability 0 (recall that $\tilde{\epsilon}$ is the absolute error introduced by each level). The error introduced by this is then $\tilde{\epsilon}/2$, so if we also guarantee error $\tilde{\epsilon}/2$ in the reduction from $\tilde{D}$ to $D$, then the error introduced at each level is $\tilde{\epsilon}$ as required. So now $\nu$ is at most $C\sqrt{h \log(n+m)}$. Using the lower bound for $E_0$ from section 3, $E_0 \ge C n^2 m^3 \nu^2$ (with $\delta = 1/2m$), we obtain $E_0(h) \le C n^2 m^3 h \log^2(n+m)$. So the recurrence for the amount of work above becomes

$$W(h) \le 2W(h/2) + E_0^3(h) nm \le 2W(h/2) + C n^7 m^{10} h \log^6 n,$$

whose solution is $W(h) \le C n^7 m^{10} h \log h \log^6 n$. Thus, $W(n) = C n^7 m^{10} n \log^7 n \le C n^8 m^{10} \log^7 n$.
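To illustrate the pruning (our sketch; `nu` plays the role of the band width, in the paper $C\sqrt{h \log(n+m)}$), one can propagate the state distribution of $M_S$ level by level and drop states farther than $\nu$ from the running mean:

```python
from collections import defaultdict

def pruned_count_distribution(num_bits, nu):
    """Distribution of |R(l, l')| under fair indicator bits, keeping only
    states within nu of the running mean; the discarded tail mass is the
    absolute error the pruning introduces."""
    dist = {0: 1.0}
    for level in range(1, num_bits + 1):
        nxt = defaultdict(float)
        for count, mass in dist.items():
            nxt[count] += mass / 2        # indicator bit = 0
            nxt[count + 1] += mass / 2    # indicator bit = 1
        mean = level / 2
        dist = {c: m for c, m in nxt.items() if abs(c - mean) <= nu}
    return dist  # unnormalized: total mass = 1 - (pruned error)

d = pruned_count_distribution(64, nu=16)
print(len(d), sum(d.values()))  # few states kept, negligible mass lost
```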
D Lattice approximation via discrepancy
D.1 Reduction from VBP to DP
Let us assume that each $a_{ij}$ has $L$ fractional bits, and let $a_{ij}^{(k)}$ be the $k$-th most significant one. That is, $a_{ij} = \sum_{k=0}^L a_{ij}^{(k)} 2^{-k}$. Also let $\mu_i^{(k)} = \frac{1}{2} \sum_{j=1}^n a_{ij}^{(k)}$, so that $\mu_i = \sum_{k=0}^L \mu_i^{(k)} 2^{-k}$. Note that $\mu_i^{(k)} \le \mu_i 2^k$. The reduction is to transform the VBP with the $m \times n$ matrix $A$ of coefficients $a_{ij}$ into the DP with an $m(L+1) \times n$ matrix $A'$ obtained by writing in column the $m(L+1)$ bits of the $a_{ij}$. The claim is that a solution to the DP is a solution to the VBP with only a constant factor loss. So let $q$ be a solution to the DP. Then, for each $i$ and $k$,

$$\Big|\mu_i^{(k)} - \sum_{j=1}^n a_{ij}^{(k)} q_j\Big| \le C\sqrt{\mu_i^{(k)} \log(mL)} \le C\sqrt{\mu_i \log(mL)}\, 2^{k/2},$$

and hence

$$\Big|\mu_i - \sum_{j=1}^n a_{ij} q_j\Big| = \Big|\sum_{k=0}^L \mu_i^{(k)} 2^{-k} - \sum_{j=1}^n \sum_{k=0}^L a_{ij}^{(k)} 2^{-k} q_j\Big| \le \sum_{k=0}^L \Big|\mu_i^{(k)} - \sum_{j=1}^n a_{ij}^{(k)} q_j\Big| 2^{-k}$$

$$\le \sum_{k=0}^L C\sqrt{\mu_i \log(mL)}\, 2^{k/2} 2^{-k} = C\alpha\sqrt{\mu_i (\log m + \log L)} = C\alpha\sqrt{\mu_i \log m} \Big(1 + \frac{\log L}{\log m}\Big)^{1/2} \le C'\sqrt{\mu_i \log m},$$

where $\alpha = \sum_{k=0}^L 2^{-k/2} = O(1)$ and the last step uses $\log L = O(\log m)$.
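A small sketch of the matrix transformation behind this reduction (ours; `L` is the number of fractional bits and is assumed given):

```python
def vbp_to_dp_matrix(A, L):
    """Expand an m x n matrix of L-bit fractions a_ij = sum_k a_ij^(k) 2^-k
    into the m(L+1) x n 0/1 matrix A' whose row (i, k) holds the bits
    a_ij^(k); a low-discrepancy q for A' is, up to a constant factor,
    a low-discrepancy q for A."""
    A_prime = []
    for row in A:
        A_prime.extend([(int(a * (1 << L)) >> (L - k)) & 1 for a in row]
                       for k in range(L + 1))
    return A_prime

# Example: a 2 x 3 matrix with L = 3 fractional bits.
print(vbp_to_dp_matrix([[0.5, 0.25, 0.875], [0.125, 1.0, 0.0]], 3))
```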
D.2 Reduction from LAP to VBP
The reduction uses bit-by-bit randomized rounding. Let us assume that each $p_j$ has $L$ fractional bits, and let $p_j^{(k)}$ be the $k$-th most significant one. That is, $p_j = \sum_{k=0}^L p_j^{(k)} 2^{-k}$. The bit-by-bit rounding consists of $L$ stages. Let $p_j^{\{k\}}$ be the rounded version of $p_j$ at the beginning of the $k$-th stage, so $p_j^{\{0\}} = p_j$ and $p_j^{\{L\}} = q_j$, the resulting lattice vector ($p_j^{\{k\}}$ has $L - k$ fractional bits; in particular $q_j$ is 0 or 1). In the $k$-th stage, the $(L-k)$-th significant bit of $p_j^{\{k\}}$, denoted $p_j^{[L-k]}$, is rounded: if nonzero then round up or round down with equal probability, that is,

$$p_j^{\{k+1\}} = p_j^{\{k\}} + \frac{1}{2^{L-k-1}}\Big(q_j^{[L-k]} - \frac{1}{2} p_j^{[L-k]}\Big),$$

where $q_j^{[L-k]}$ is 0 or 1 with equal probability if $p_j^{[L-k]} = 1$, and 0 otherwise. It is argued in [23] that this is equivalent to the original randomized rounding. In the deterministic version, $q_j^{[L-k]}$ is the solution to the VBP with matrix $A$ and vector $p^{[L-k]}$. Let $\mu_i^{[L-k]} = \sum_{j=1}^n a_{ij} p_j^{[L-k]}$. The solution to the VBP satisfies

$$\Delta_i^{[L-k]} = \Big|\sum_{j=1}^n a_{ij} q_j^{[L-k]} - \mu_i^{[L-k]}\Big| \le C\sqrt{\mu_i^{[L-k]} \log m}.$$
Let $\mu_i^{\{k\}} = \sum_{j=1}^n a_{ij} p_j^{\{k\}}$ and note that $\mu_i^{[L-k]} \le 2^{L-k} \mu_i^{\{k\}}$. So

$$\delta_i^{\{k+1\}} = |\mu_i^{\{k+1\}} - \mu_i^{\{k\}}| = \frac{1}{2^{L-k-1}} \Delta_i^{[L-k]} \le \frac{2C}{2^{L-k}} \sqrt{\mu_i^{[L-k]} \log m} \le \frac{2C \sqrt{\mu_i^{\{k\}} \log m}}{2^{(L-k)/2}}.$$

Assuming that $\mu_i^{\{k\}} \le \mu_i + \beta\sqrt{\mu_i \log m}$, and that $\mu_i \ge \log m$, then

$$\delta_i^{\{k+1\}} \le \frac{2C \sqrt{(\mu_i + \beta\sqrt{\mu_i \log m}) \log m}}{2^{(L-k)/2}} \le 2C(1+\beta)^{1/2} \frac{\sqrt{\mu_i \log m}}{2^{(L-k)/2}}.$$

We verify the assumption inductively:

$$\mu_i^{\{k+1\}} \le \mu_i + \sum_{r=1}^{k+1} \delta_i^{\{r\}} \le \mu_i + 2C(1+\beta)^{1/2} \sum_{r=1}^{k+1} \frac{\sqrt{\mu_i \log m}}{2^{(L-r)/2}} \le \mu_i + \beta\sqrt{\mu_i \log m},$$

as long as $\beta \ge 2C(1+\beta)^{1/2} \sum_{r=0}^{\infty} 2^{-r/2}$.
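A sketch of the randomized version of this rounding (ours; the deterministic algorithm of the paper replaces the fair coins $q_j^{[L-k]}$ by a VBP solution at each stage):

```python
import random

def bit_by_bit_round(p, L):
    """Round L-bit fractions p_j to {0,1} one bit per stage: at stage k
    the (L-k)-th bit, if set, is rounded up or down with probability 1/2,
    i.e. p_j <- p_j + 2^-(L-k-1) * (q_j - bit/2) with q_j a fair coin."""
    p = list(p)
    for k in range(L):
        for j in range(len(p)):
            # p_j currently has L-k fractional bits; test its lowest one
            bit = round(p[j] * (1 << (L - k))) & 1
            q = random.randint(0, 1) if bit else 0
            p[j] += (q - bit / 2) / (1 << (L - k - 1))
    return [int(v) for v in p]

# E[output_j] = p_j, exactly as for ordinary randomized rounding.
print(bit_by_bit_round([0.625, 0.25, 0.875, 0.5], L=3))
```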