ANALYSIS OF TWO SIMPLE HEURISTICS ON A RANDOM INSTANCE OF k-SAT

Alan Frieze and Stephen Suen
Department of Mathematics, Carnegie Mellon University, Pittsburgh PA 15213, U.S.A.

May 31, 1995

Abstract

We consider the performance of two algorithms, GUC and SC, studied by Chao and Franco [2], [3], and Chvatal and Reed [4], when applied to a random instance $\omega$ of a boolean formula in conjunctive normal form with $n$ variables and $\lfloor cn \rfloor$ clauses of size $k$ each. For the case where $k = 3$, we obtain the exact limiting probability that GUC succeeds. We also consider the situation when GUC is allowed to have limited backtracking, and we improve an existing threshold for $c$ below which almost all $\omega$ are satisfiable. For $k \ge 4$, we obtain a similar result regarding SC with limited backtracking.
1 Introduction

Given a boolean formula $\omega$ in conjunctive normal form, the satisfiability problem (sat) is to determine whether there is a truth assignment that satisfies $\omega$. Since sat is NP-complete, one is interested in efficient heuristics that perform well "on average," or with high probability. The choice of the probabilistic space is crucial for the significance of such a study. In particular, it is easy to decide sat in probabilistic spaces that generate formulas with large clauses [8]. To circumvent this problem, recent studies have focused on formulas with exactly $k$ literals per clause (the k-sat problem). Of particular interest is the case $k = 3$, since this is the minimal $k$ for which the problem is NP-complete.

Let $V_n$ be a set of $n$ variables. We define a probability space $\Omega^{(k)}_{m,n}$ on the set of all formulae over these variables which have $m = \lfloor cn \rfloor$ clauses with exactly $k$ literals per clause. We let $C_j(V_n)$ be
Supported by NSF grant CCR-9024935
the set of all clauses of size $j$ chosen from $V_n$. We will assume that all variables occurring in a single clause are distinct. We then take $\Omega^{(k)}_{m,n} = C_k(V_n)^m$. This means that we consider the clauses to be ordered and we will consider the literals within clauses to be ordered too. Thus we can think of $\omega$ as a $k \times m$ array where $\omega_{i,j}$ is the $i$'th literal in the clause $C_j$. There is not a lot of difference between this model and other unordered models. We show later in Section 8 that our results can easily be extended to these models.

Experimental evidence [11, 13] strongly suggests that there exists a threshold value, about 4.2, such that formulas are almost surely satisfiable for $c$ below this threshold and almost surely unsatisfiable for $c$ above it. This has not been proven rigorously, but such a threshold (namely $c = 1$) is known to exist for 2-CNF formulas [7, 4].

Most practical algorithms for the satisfiability problem (such as the well-known Davis-Putnam algorithm [6]) work iteratively. At each iteration, the algorithm selects a literal and assigns it the value 1. All clauses containing this literal are erased from the formula, and the complement of the chosen literal is erased from the remaining clauses. Algorithms differ in the way they select the literal for each iteration. The following three rules are the most common ones:
1. The unit clause rule: If a clause contains only one literal, that literal must have the value 1;
2. The pure literal rule: If a formula contains a literal but does not contain its complement, this literal is assigned the value 1;
3. The smallest clause rule: Give value 1 to a (random) literal in a (random) smallest clause.

Broder, Frieze and Upfal [1] analysed an algorithm based entirely on the pure literal rule. They showed that in the $\Omega^{(3)}_{m,n}$ probabilistic space, the pure literal rule alone is sufficient to find, with high probability, a satisfying assignment for a random formula $\omega \in \Omega^{(3)}_{m,n}$, for $c = m/n \le 1.63$. On the other hand, if $c > 1.7$, then the pure literal rule by itself does not suffice. Chao and Franco [2], [3] and Chvatal and Reed [4] analysed two heuristics GUC and SC based on the small clause rule:
begin
  repeat
    choose a literal $x$;
    remove all clauses from $\omega$ that contain $x$ and remove $\bar{x}$ from any remaining clause;
    if a clause becomes empty - HALT, FAILURE;
  until no clauses left;
  HALT, SUCCESS
end
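A minimal executable sketch of this loop, with the literal chosen by the GUC rule (a random literal from a random smallest clause); the function name and clause encoding are ours, not the paper's:

```python
import random

def guc(clauses, rng):
    """Run the GUC heuristic on a list of clauses (sets of non-zero ints,
    where -v denotes the negation of variable v). Returns True on SUCCESS."""
    clauses = [set(c) for c in clauses]
    while clauses:
        smallest = min(len(c) for c in clauses)
        # GUC: random literal from a random clause of smallest size
        clause = rng.choice([c for c in clauses if len(c) == smallest])
        x = rng.choice(sorted(clause))
        nxt = []
        for c in clauses:
            if x in c:
                continue            # clause satisfied: erase it
            c = c - {-x}            # erase the complement of x
            if not c:
                return False        # empty clause produced: FAILURE
            nxt.append(c)
        clauses = nxt
    return True

# x1 := 1 satisfies both clauses, so GUC always succeeds here
assert guc([{1}, {1, 2}], random.Random(1)) is True
# contradictory unit clauses always produce an empty clause
assert guc([{1}, {-1}], random.Random(2)) is False
```

The unit clause rule is implicit: a clause of size one is always a smallest clause, so its literal is picked and set to 1.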
The algorithms GUC and SC differ in how the literal $x$ is chosen. In GUC, $x$ is chosen at random from a randomly selected clause of smallest size. SC (see Chvatal and Reed [4] for a complete description of SC) differs from GUC in that if there are no clauses of size one or two, then $x$ is chosen at random from the set of all free literals. Since at least one clause is satisfied each time GUC assigns a value to a variable, it is intuitively clear that GUC is likely (probabilistically) to perform better than SC. Algorithm SC however has the advantage of being simpler to analyse. The reason for this is that since SC only takes care of clauses of size one and two, there are fewer cases to consider when analysing SC.

The combined results (among other things) in Chao and Franco [2], [3] and Chvatal and Reed [4] can be summarized as follows. For 3-sat, if $c < 2/3$ then SC succeeds with probability tending to 1 [4], and if $c < 2.99$ then the probability that UC (a variant of GUC using only the unit clause rule) succeeds does not tend to zero [2]. For k-sat where $k \ge 4$, if
$$c < \frac{2^{k-1}(k-1)}{3k(k-2)}\left(\frac{k-1}{k-3}\right)^{k-3}$$
then SC succeeds with probability tending to 1 [4], and if $k \le 40$ and
$$c < 0.7725\left(\frac{k-1}{k-2}\right)^{k-2}\frac{2^{k+1}}{k+1} \eqno(1.1)$$
then the probability that GUC succeeds does not tend to zero [3].

Our first theorem gives the precise limiting probability that GUC succeeds when applied to a random instance of 3-sat. Let $c_3 \approx 3.003$ be the solution to the equation
$$3c - 2\log c = 6 - 2\log(2/3),$$
and let
$$f(x) = f_c(x) = \frac{3c}{4}(1 - x^2) + \log x, \qquad x \in (0,1).$$
When $c < c_3$ we have $f(x) < 1$ for all $x \in (0,1)$.
Theorem 1.1 Consider applying GUC to a random instance of 3-sat with $n$ variables and $\lfloor cn \rfloor$ clauses.
(a) Suppose that $c < 2/3$. Then
$$\lim_{n\to\infty} \Pr(\mbox{GUC succeeds}) = 1.$$
(b) Suppose that $2/3 \le c < c_3$. Let $\alpha$ be the unique root of $f(x) = 0$ that is strictly less than 1. Then
$$\lim_{n\to\infty} \Pr(\mbox{GUC succeeds}) = \exp\left(-\int_\alpha^1 \frac{f(x)^2}{4x(1 - f(x))}\, dx\right).$$
(c) If $c \ge c_3$ then
$$\lim_{n\to\infty} \Pr(\mbox{GUC succeeds}) = 0.$$
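The constants in Theorem 1.1 are easy to evaluate numerically with bisection and Simpson's rule (a sketch; helper names are ours):

```python
import math

def bisect(g, lo, hi, it=200):
    """Find a root of g on [lo, hi] assuming a single sign change."""
    for _ in range(it):
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# c3 solves 3c - 2 log c = 6 - 2 log(2/3)
c3 = bisect(lambda c: 3*c - 2*math.log(c) - 6 + 2*math.log(2/3), 2.5, 3.5)

def limit_prob(c):
    """Limiting success probability of Theorem 1.1(b) for 2/3 < c < c3."""
    f = lambda x: 0.75*c*(1 - x*x) + math.log(x)
    alpha = bisect(f, 1e-9, 0.99)     # root of f strictly below 1
    g = lambda x: f(x)**2 / (4*x*(1 - f(x)))
    n = 2000                          # Simpson's rule on [alpha, 1]
    h = (1 - alpha) / n
    s = g(alpha) + g(1.0)
    s += sum((4 if i % 2 else 2) * g(alpha + i*h) for i in range(1, n))
    return math.exp(-s * h / 3)

assert 3.0 < c3 < 3.01
assert 0.0 < limit_prob(2.0) < 1.0
assert limit_prob(2.0) > limit_prob(2.9)  # success probability decreases in c
```

Note that $f(1) = 0$ for every $c$, so $\alpha$ is located as the other sign change of $f$; for $c < c_3$ the integrand is bounded since $f < 1$ on $(0,1)$.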
Chao and Franco [2] report that using GUC in a backtracking algorithm can be quite successful (and possibly be polynomial expected time for certain values of $c$). We describe (in Section 6) a modification of GUC called GUCB that allows a limited amount of backtracking when an empty clause is produced. We obtain the following result by showing that for sufficiently small $c$, the backtracking does not change the state of GUC by a great deal.
Theorem 1.2 Consider GUCB when applied to a random instance of 3-sat with $n$ variables and $\lfloor cn \rfloor$ clauses. If $c < c_3$ then
$$\lim_{n\to\infty} \Pr(\mbox{GUCB succeeds}) = 1.$$

Thus Theorem 1.2 raises the lower threshold for almost sure satisfiability from about $1.63n$ [1] to just above $3n$. On the other hand, the upper threshold giving almost sure unsatisfiability has been reduced to below $5n$ by El Maftouhi and de la Vega [12] and to about $4.758n$ by Kamath, Motwani, Palem and Spirakis [9]. Thus the current gap in our knowledge of the satisfiability or unsatisfiability of random instances of 3-sat is still rather large. Furthermore, even though it is very easy to prove that an instance of 3-sat with $100n$ random clauses is almost surely unsatisfiable, there are no known polynomial time algorithms which can prove this. Chvatal and Szemeredi [5] have proved negative results on this problem.

We next turn our attention to algorithm SC. It is possible to show that the assertions in Theorems 1.1 and 1.2 hold for SC. In fact, our proof of Theorem 1.1 can be extended to obtain the precise limiting probability that SC succeeds when applied to a random instance of k-sat. However, the more interesting question is: for what values of $c$ will SC, with limited backtracking as in GUCB, succeed with probability close to 1? We answer this question with our next result. Assume $k \ge 4$. Let
$$p(x) = 3\binom{k}{3}\frac{c}{2^{k-1}}\, x^2 (1 - x)^{k-3}.$$
It is easy to see that $p(x)$ is unimodal, achieving a maximum of
$$\frac{2kc(k-2)}{2^{k-1}(k-1)}\left(\frac{k-3}{k-1}\right)^{k-3}$$
when $x = 2/(k-1)$. For
$$c > \frac{2^{k-1}(k-1)}{3k(k-2)}\left(\frac{k-1}{k-3}\right)^{k-3},$$
let $0 < \alpha_0 = \alpha_0(c) < \alpha_1 = \alpha_1(c) < 1$ be the two solutions of the equation $p(x) = 2/3$. We prove the following theorem.
Theorem 1.3 Suppose that $k \ge 4$. Let $c_k$ be the maximum value of $c$ such that
$$\frac{(k-1)(k-2)}{4}\left(\alpha_1^2 - \alpha_0^2\right) - \frac{k-3}{2}\left(\alpha_1 - \alpha_0\right) + \ln(\alpha_1/\alpha_0) \le 1.$$
Then when SCB is applied to a random instance of k-sat with $n$ variables and $\lfloor cn \rfloor$ clauses where $c < c_k$, we have
$$\lim_{n\to\infty} \Pr(\mbox{SCB succeeds}) = 1.$$
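In the supercritical range, $\alpha_0$ and $\alpha_1$ can be located numerically by bisecting $p(x) - 2/3$ on either side of the maximiser $x = 2/(k-1)$; the sketch below assumes the form of $p(x)$ reconstructed above:

```python
import math

def p(x, k, c):
    # assumed form: p(x) = 3*C(k,3)*c/2^(k-1) * x^2 * (1-x)^(k-3)
    return 3*math.comb(k, 3)*c/2**(k-1) * x*x * (1-x)**(k-3)

def roots(k, c, it=200):
    """Return (alpha0, alpha1), the two roots of p(x) = 2/3 in (0,1)."""
    g = lambda x: p(x, k, c) - 2/3
    xm = 2/(k-1)                      # p is maximal here
    assert g(xm) > 0, "c below the critical value: no roots"
    def bisect(lo, hi):
        for _ in range(it):
            mid = (lo + hi)/2
            lo, hi = (mid, hi) if g(mid)*g(lo) > 0 else (lo, mid)
        return (lo + hi)/2
    return bisect(1e-12, xm), bisect(xm, 1 - 1e-12)

a0, a1 = roots(4, 5.0)   # k = 4, c = 5.0 (the critical value for k = 4 is 3)
assert 0 < a0 < 2/3 < a1 < 1
assert abs(p(a0, 4, 5.0) - 2/3) < 1e-9 and abs(p(a1, 4, 5.0) - 2/3) < 1e-9
```

Since $p$ is unimodal, each half-interval contains exactly one sign change, so plain bisection suffices.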
Write $c_k = \eta_k 2^k/k$. It is possible to show that as $k \to \infty$, $\eta_k \to \eta$ where $\eta$ can be defined similarly to $\eta_k$. Numerical calculations show that $\eta \approx 1.817$, $\eta_4 \approx 1.3836$, $\eta_5 \approx 1.504$, $\eta_{10} \approx 1.686$, and that $\eta_k$ is increasing in $k$. Theorem 1.3 gives a constant $c_k$ such that almost every formula $\omega$ with $n$ variables and $\lfloor cn \rfloor$ clauses of size $k$, with $c < c_k$, is satisfiable. This improves, by only a constant factor, a similar result in [4]. Also, $c_k$ (for $4 \le k \le 40$) is smaller than the right hand side of (1.1), and we believe that if the limiting probability that GUC succeeds is positive, then GUC with limited backtracking (as described later) succeeds with probability $1 - o(1)$. It is thus very likely that when applied to random instances of k-sat for $k \ge 4$, GUCB has a higher threshold of success than SCB. At present, we can only characterize the critical behaviour of GUC and GUCB, when applied to random instances of k-sat with $k \ge 4$, using a system of $k - 2$ polynomial equations whose properties we have difficulty in penetrating analytically. It seems unlikely that the exact thresholds for GUCB can be rid of the factor $1/k$ (see definition of $c_k$).
2 Proof Strategy

The basis of our proof of Theorem 1.1 is that the intermediate states of GUC (or SC), when applied to a random instance of k-sat, can be represented by a Markov chain which we describe as follows. Consider GUC when applied to a formula $\omega$ chosen at random (with equal probability) from the space $\Omega^{(k)}_{m,n}$ where $m = \lfloor cn \rfloor$. Use $\nu$ to denote the number of variables whose truth values are not yet determined by GUC at an intermediate stage. We call this stage $\nu$ and so GUC starts at stage $n$. For the purpose of analysis, all empty clauses are assumed to be removed by GUC as soon as they are created, and GUC is allowed to run until the set of clauses is exhausted. Hence, GUC succeeds if and only if the number of empty clauses created is zero.

We will assume that $\omega$ is not given to us in its entirety at the start of the algorithm. Instead we will learn about the formula as the algorithm proceeds. This scenario has been aptly named the method of deferred decisions by Knuth, Motwani and Pittel [10]. At stage $\nu$ we will have partially filled in the $k \times m$ matrix $\omega$ and there remain $\nu$ free variables. Some columns, corresponding to satisfied clauses, will be completely filled in. We will refer to these as removed. The remaining columns will be partially filled in. If an entry in a partially filled in column is assigned a literal, then the value of this literal has been assigned false by previous steps of the algorithm. The remaining entries will be left blank. A partially filled in column with $i$ blank entries will correspond to a residual clause of size $i$, $i = 0, 1, \ldots, k$. (A clause of size 0 is an empty clause; previous assignments have assured us that GUC will fail to satisfy this clause.) Let $N_i = N_i(\nu)$, $i = 0, 1, 2, \ldots, k$ be the number of residual clauses of size $i$ remaining at the start of stage $\nu$ of GUC.
To carry out stage $\nu$ we choose a clause $C$ of minimum size. We randomly choose a literal $x$ from the remaining $2\nu$ possibilities. We assign $x$ to one unfilled entry of $C$ and then randomly fill in the remaining positions, subject to the condition that all variables must be distinct. We then go through the partially filled columns of $\omega$. Suppose we have a column $j$ with $\ell$ unfilled entries:
- With probability $1 - \ell/\nu$ we do nothing.
- With probability $\ell/\nu$ we choose one of the unfilled positions of column $j$, position $i$ say.
  - With probability 1/2 we place $x$ in position $i$, randomly fill in the rest of column $j$ and remove it from further consideration, as it corresponds to a satisfied clause.
  - With probability 1/2 we place $\bar{x}$ in position $i$, leaving the remaining positions of column $j$ blank.

The reader can easily convince himself (herself) that at the end of the algorithm the columns have been filled in with random clauses. The important and now obvious property of this process is that conditional on $N_i(\nu)$, $i = 0, 1, 2, \ldots, k$, the remaining clauses are random and independent of previous steps of the algorithm. For future reference we refer to this as complete independence. It follows that $N = (N_0, N_1, \ldots, N_k)$ is a Markov chain.

We next write down the transition probabilities of $N$. Use $B(\mu, p)$ to denote a binomial variable with parameters $\mu$ and $p$ and note that $\nu$ decreases by 1 at each stage. Write $\Delta N_i(\nu) = N_i(\nu - 1) - N_i(\nu)$ as the change from stage $\nu$ to stage $\nu - 1$. Then the $\Delta N_i$ are determined by binomial variables (conditional upon $N(\nu)$). We shall write down the distributions of the $\Delta N_i$ under the different cases where the minimum size of the clauses is $i$. For $i = 1, 2, \ldots, k$, we write $\delta_i((y_1, y_2, \ldots, y_k)) = 1$ if $\min\{j \mid y_j \ne 0,\ 1 \le j \le k\} = i$, and $\delta_i((y_1, y_2, \ldots, y_k)) = 0$ otherwise. Also, $\delta_i(0) = 0$ always.

Consider the stage when GUC has just assigned 1 to a literal $x$ in clause $C$ and is about to remove clauses that contain $x$ and all occurrences of $\bar{x}$ from other clauses. Let $\xi_{j,0}$ be the number of clauses of size $j$ containing the literals $x$ or $\bar{x}$ (but not including $C$). Let $\xi_{j,1}$ be the number of clauses of size $j$ containing the literal $x$ (but not $\bar{x}$, as all variables in a clause are different). It follows that conditional on $N = N(\nu)$, we have for $j = 1, 2, \ldots, k$ that
$$\xi_{j,0}(\nu) = B(N_j - \delta_j(N),\ j/\nu) \quad \mbox{in distribution},$$
$$\xi_{j,1}(\nu) = B(\xi_{j,0},\ 1/2) \quad \mbox{in distribution}.$$
Note that complete independence implies that the variables $\xi_{j,0}$, $j = 1, 2, \ldots, k$, are independent. Then for $j = 0, 1, \ldots, k$,
$$\Delta N_j(\nu) = \xi_{j+1,0}(\nu) - \xi_{j+1,1}(\nu) - \xi_{j,0}(\nu) - \delta_j(N(\nu)),$$
where $\xi_{0,0} = \xi_{k+1,0} = \xi_{k+1,1} = 0$ and $\delta_0 = 0$. Note that if $N_1(\nu) = 0$, then $\Delta N_0(\nu) = 0$ with probability 1.

Note also that if $N_1(\nu) \ge 1$ is given, then a clause of size one (with literal $x$ say) is chosen
at stage $\nu$, and that $\Delta N_0(\nu)$ is distributed as a binomial variable with parameters $N_1(\nu) - 1$ and $1/(2\nu)$. Theorem 1.1(b) is obtained by showing that in the case of 3-sat, the total number of empty clauses created is asymptotically distributed as a Poisson variable with mean $\int_\alpha^1 f(x)^2/(4x(1 - f(x)))\, dx$. Theorem 1.1(a) and (c) are shown using monotonicity arguments.

We shall also require similar statements for SC. Let $N'_j(\nu)$ be the number of size $j$ clauses remaining at stage $\nu$ when SC is applied to a random instance of k-sat with $n$ variables and $m$ clauses. Then similarly to GUC, $N'(\nu)$ is a Markov chain with initial state $N'(n) = (0, \ldots, 0, m)$ and transition probabilities given by
$$\Delta N'_j(\nu) = \begin{cases} \xi'_{j+1,0}(\nu) - \xi'_{j+1,1}(\nu) - \xi'_{j,0}(\nu) - \delta_j(N'(\nu)), & \mbox{if } j = 0, 1, 2, \\ \xi'_{j+1,0}(\nu) - \xi'_{j+1,1}(\nu) - \xi'_{j,0}(\nu), & \mbox{otherwise}, \end{cases}$$
where $\xi'_{0,0} = \xi'_{k+1,0} = \xi'_{k+1,1} = 0$ and for $j = 1, 2, \ldots, k$,
$$\xi'_{j,0}(\nu) = \begin{cases} B(N'_j - \delta_j(N'),\ j/\nu), & \mbox{if } j = 1, 2, \\ B(N'_j,\ j/\nu), & \mbox{otherwise}, \end{cases}$$
$$\xi'_{j,1}(\nu) = B(\xi'_{j,0},\ 1/2).$$
Also, conditional on $N'(\nu)$, the distribution of the number of empty clauses created at stage $\nu$ is binomial with parameters $(N'_1(\nu) - 1)^+$ and $1/(2\nu)$.

The layout of this paper is as follows. We concentrate on showing Theorems 1.1 and 1.2, while we shall only sketch our proof of Theorem 1.3. In the next section, we collect some useful properties of a Markov chain $X_t$ which will be used to approximate $N_1$ in proving Theorem 1.1(b). We shall then prove parts (a) and (c) of Theorem 1.1 in Section 4 by developing monotonicity arguments for comparing different Markov chains. Theorem 1.1(b) is proved in Section 5 by applying the results stated in Section 3. In Section 6, we describe how GUC is allowed to backtrack, and prove Theorem 1.2. In Section 7, we sketch briefly how our proof of Theorem 1.2 can be extended to proving Theorem 1.3. Section 8 briefly discusses other models.
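For 3-sat, the clause-count transitions of this section are directly simulable; the following sketch (names ours) evolves $(N_1, N_2, N_3)$ and tallies the empty clauses created:

```python
import random

def simulate_guc_counts(n, c, rng):
    """Evolve the residual clause counts (N1, N2, N3) for GUC on random
    3-sat via the binomial transitions above; count empty clauses created."""
    binom = lambda m, p: sum(rng.random() < p for _ in range(m))
    N = [0, 0, 0, int(c * n)]        # N[i] = number of residual i-clauses
    empty = 0
    for nu in range(n, 0, -1):
        if N[1] + N[2] + N[3] == 0:
            break
        i = min(j for j in (1, 2, 3) if N[j] > 0)    # smallest clause size
        xi0, xi1 = [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]
        for j in (1, 2, 3):
            xi0[j] = binom(N[j] - (j == i), j / nu)  # hit by x or its negation
            xi1[j] = binom(xi0[j], 0.5)              # ... of these, satisfied
        for j in (1, 2, 3):
            N[j] += xi0[j+1] - xi1[j+1] - xi0[j] - (j == i)
        empty += xi0[1] - xi1[1]     # 1-clauses whose only literal was negated
    return empty

rng = random.Random(12345)
low_c = sum(simulate_guc_counts(150, 0.8, rng) == 0 for _ in range(20))
high_c = sum(simulate_guc_counts(150, 5.0, rng) == 0 for _ in range(20))
assert low_c > high_c                # success is far likelier at small c
```

Success corresponds to a zero count of empty clauses; comparing densities well below and above $c_3$ shows the threshold behaviour already at modest $n$.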
3 A Markov chain

Use $B(m, p)$ to denote a binomial variable with parameters $m$ and $p$, and write $b_j = b_j(m, p)$ for the probability that $B(m, p)$ equals $j$. We assume throughout this section that $mp \le \lambda_0$ for some fixed constant $\lambda_0 < 1$. The big O terms in this section are uniform in $m$ and $p$ (but may depend on $\lambda_0$). We consider a Markov chain $X_t$ with transition probabilities defined as follows. If $X_t = 0$, then $\Delta X_t = X_{t+1} - X_t$ equals $B(m, p)$ in distribution; otherwise $\Delta X_t$ equals $B(m, p) - 1$ in distribution. We assume $X_0 \ge 0$ and so $X = 0$ is a reflecting barrier. As we are interested in bounds that are uniform in $m$ and $p$, we need to consider a Markov chain $Y_t$ which is similar to $X_t$ except that in the one-step transitions of $Y_t$, we have a Poisson variable $P(\lambda)$ in place of $B(m, p)$. It will be clear that the two chains $X_t$ and $Y_t$ are very similar when $mp = \lambda$,
although it is not possible to couple them so that $X_0 = Y_0$ and $X_t \le Y_t$ for all $t \ge 0$. We let $\lambda = mp$ in this section.

We first prove the existence of a steady state distribution, denoted by $\pi$, for our walk. The following existence proof was kindly provided by Boris Pittel. Let $T_i$, $i > 0$, denote the expected number of steps to visit the state 0 if the walk starts at $i$. Then $T_i = \lim_{n\to\infty} T_i^{(n)}$, where $T_0^{(n)} = T_i^{(0)} = 0$ and for $n \ge 1$ and $i \ge 1$,
$$T_i^{(n)} = 1 + \mathbf{E}\left[T^{(n-1)}_{i-1+B(m,p)}\right]$$
is the expected value of $\min\{n,\ \mbox{time to reach 0 from } i\}$. Now, if $mp < 1$ then $\bar{T}_i = \frac{i}{1 - mp}$ satisfies
$$\bar{T}_i = 1 + \mathbf{E}\left[\bar{T}_{i-1+B(m,p)}\right],$$
so by induction $T_i^{(n)} \le \bar{T}_i$, and consequently $T_i \le \bar{T}_i$. Thus $T_0$, the expected time of return to zero, is at most
$$1 + \sum_{j>0} \Pr(B(m,p) = j)\, \bar{T}_j = 1 + \frac{1}{1 - mp}\mathbf{E}[B(m,p)] = 1 + \frac{mp}{1 - mp} = \frac{1}{1 - mp} < \infty.$$
Thus the stationary distribution $\{\pi_i\}$ exists and $\pi_0 = 1/T_0 \ge 1 - mp$. (Note that $\pi_0 = 1 - mp$ from (3.1) below, and so $T_i = \bar{T}_i$, $i \ge 1$.) Note next that $\pi$ satisfies
$$\pi_i = \pi_0 b_i + \sum_{j=1}^{i+1} \pi_j b_{i-j+1}, \qquad \forall i \ge 0.$$
0
= giving
0
1
i0
j 1 (1 ? p + ps)m +
i0
1 (G (s) ? )(1 ? p + ps)m; s X 0
(s ? 1) GX (s) = s(1 ?p + ps)?m ? 1 : 0
As GX (1) = 1, we actually have and
= 1 ? mp
(3:1)
? mp) : GX (s) = s(1(s??p1)(1 + ps)?m ? 1
(3:2)
0
8
Since (1 ? p + ps)m exp(? + s) for all s, we see that (s ? 1)(1 ? ) ; (3:3) GX (s) G(s) = s exp( ? s) ? 1 for all s between 1 and the radius of convergence of G. (It can be checked that G(s) is the probability generating function of the steady state distribution of Yt.) Since , G(s) exists for all s < r, where r > 1 is a constant depending on only. (r is in fact the unique root bigger than 1 of s exp( ? s) = 1.) Thus, (3.3) holds for all s satisfying 1 < s < r. Note also that from (3.2), the mean of the steady state distribution of Xt is X (2 ? p ? mp) : = (m; p) = ii = mp2(1 (3:4) ? mp) i Also, Pr(Xt = 0) = GX (0) = 1 ? mp: (3:5) 1
1
1
1
0
We would like to consider the number of times that Xt returns to 0 in a certain time period. To do this, we need to collect some preliminary results. Suppose X = 1. Let HX be the time elapsed when Xt rst hits 0. (H is de ned accordingly for Yt with Y = 1.) Note that HX = 1 + L + : : : + LB in distribution; where B = B (m; p) in distribution and L ; : : :; LB are independent copies of HX . This last equation follows from the fact that if the rst step of the walk jumps to state B , it takes B independent copies of HX for the walk to get back to the origin because all moves of the walk toward the origin have magnitude 1. Hence, writing MX () = E[exp(HX )], we have MX () = e (1 ? p + pMX ())m: (3:6) By considering the functions f (y) = e (1 ? p + py)m, f (y) = exp( ? + y), f (y) = exp( ? + y) and f (y) = y, and by noting that f (y) f (y) for all and y and that f (y) f (y) for all and y 1, we have MX () M () M (); (3:7) where the rst inequality holds for all < r and the second inequality holds for 0 < r, and r is the radius of convergence of M (), and M () and M () respectively are the smallest roots of M () = exp( ? + M ()); (3.8) M () = exp( ? + M ()): (3.9) (Again, it can be checked that M is the moment generating function for H .) By observing that r is the value of at which the line f (y) = y is a tangent to the curve f (y) = exp( ? + y), we nd that r = ? log ? 1. Further, by considering close to r, 2 d M we see that M () < 1. Also, we shall need to bound M 00() = d2 . From (3.8), we have M 0() = M ()=(1 ? M ()) M 00() = M ()=(1 ? M ()) : 0
0
1
1
1
2
3
1
2
2
3
2
2
2
2
2
2
3
9
Using the fact that M () M () < 1, it follows from the second inequality in (3.7) that for 0 < r, () M 00() (1 ?M M ()) : Also, for 0, we have M 00() (1 ?1 ) (1 ?1) : 2
3
3
3
Thus, for any (1 ? )r (where > 0 is any xed constant), we have M 00() A; (3:10) where A is a xed constant (depending only on ). Note that from (3.6) and (3.8), we have E[HX ] = E[H ] = 1=(1 ? ): (3:11) 2
Consider next that X = 0. For r 1, let r be the time elapsed when Xt rst returns to 0 for the r-th time. We shall obtain a concentration result for r (when r is large). Observe that equals HX in distribution (this is because X has the same distribution when X = 0 or X = 1) and so r is distributed as a sum of r independent copies of HX . Hence, E[r ] = r=(1 ? ). We shall use the inequalities Pr(r A) MX ()r exp(?A); Pr(r A) MX (?)r exp(A); for any > 0. As MX () M () by (3.7), we shall bound M (). Using Taylor's theorem and (3.11), M () = 1 + =(1 ? ) + M 00() =2; for some between 0 and . Using (3.10), we have that as ! 0, M 00() = O(1); which implies that M () = 1 + =(1 ? ) + O( ): Hence, for any A > 0 and small > 0, Pr(r r=(1 ? ) + Ar = ) M ()r exp(?r=(1 ? ) ? Ar = ) exp(O(r ) ? Ar = ): Also, we have for any A > 0 and small > 0, Pr(r r=(1 ? ) ? Ar = ) M (?)r exp(r=(1 ? ) ? Ar = ) exp(O(r ) ? Ar = ): By putting = r? = , we have for any A > 0 and for large r Pr(j r ? r=(1 ? mp) j Ar = ) = O(e?A ): (3:12) We therefore have the following lemma. 0
1
0
1
0
2
2
1 2
1 2
2
1 2
1 2
1 2
2
1 2
1 2
1 2
10
Lemma 3.1 Let r be the time elapsed when Xt rst returns to 0 for the r-th time given that X = 0. Then for any A > 0, we have as r ! 1, Pr(j r ? r=(1 ? ) j Ar = ) = O(e?A ): 0
1 2
Lemma 3.2 Suppose that X = r for any integer r 1. Let Hr = minft j Xt = 0g. Then for any A > 0, Pr(j Hr ? r=(1 ? ) j Ar = ) = O(e?A ): (3:13) 0
1 2
Also, we have for any A > 0 that Pr(9t Hr s:t: Xt r=(1 ? ) + Ar1=2) = O(e?A ):
(3:14)
Proof Simply observe that Hr is distributed as a sum of r independent copies of HX , and so Hr equals r in distribution, which gives (3.13). Equation (3.14) follows from (3.13) and the fact that Xt decreases by at most 1 in each transition. 2 Lemma 3.3 Let NT be the number of times that Xt equals 0 in the time interval [0; T ], given that X0 = O(log 10 T ). Then for any A > 0, we have for any constant A0 > 0 that (3:15) Pr(j NT ? T (1 ? ) j AT 1=2) = O(e?A + T ?A ): 0
Proof
Use H to denote the minimum value of t such that Xt = 0. Using (3.13) with r = O(log T ), we have for any constant A0 > 0 that Pr(H log T ) = O(e? 6 T ) = O(T ?A ): Hence if NT0 is the number of times that Xt = 0 in the interval [0; T ] given that X = 0, then NT0 NT NT0 ? 11 T with probability at least 1 ? O(T ?A ) for any constant A0 > 0. Now Lemma 3.1 implies that as t ! 1, Pr(j Nt0 ? t(1 ? ) j At = ) = O(e?A= ? ) = O(e?A ): The lemma now follows by taking t = T and t = T ? log T . 2 10
11
0
log
0
0
log
1 2
(1
)
11
Lemma 3.4 Suppose that X = 0. With (r with r = 1) as de ned in Lemma 3.1, we have for any A > 0, there exist a constant 2 (0; 1) and a constant C > 0 such that Pr( A) C?A : (3:16) For each t, let Rt = minfk 1 j Xt k = 0g. That is, Rt is the waiting time after time t until the next return to 0. Then for any A > 0, there is a constant 2 (0; 1) such that as T ! 1, Pr(max R A) = O(T?A ); (3:17) tT t 0
1
1
+
and
Pr(max X A) = O(T?A ); tT t 11
(3:18)
Proof
Since equals HX in distribution, we have Pr( A) MX () exp(?A): Inequality (3.16) follows by putting = r=2. To show (3.17), let Si be the time elapsed between the (i ? 1)-th and the i-th return to 0. That is, each Si equals in distribution. Let N be the number of times that Xt = 0 for t 2 [0; T ]. Then N T and (3.17) follows from (3.16) because Pr(max R A) Pr(max S A) = O(T?A ): tT t tT t 1
1
2
1
Inequality (3.18) follows from (3.17) and the fact that Xt decreases by at most 1 in each transition. 2 For the rest of the section, we will require coupling chain Xt with another chain Xt0 having the same transition probability. The coupling is such that if X X 0 then Xt Xt0 for all t 0. This coupling is speci ed by de ning the transition probabilities of the coupled chain (Xt; Xt0) as follows: Xt = Xt0 = B (m; p) ? 1; if Xt > 0 and Xt > 0 = B (m; p) ? 1; if Xt = 0 and Xt > 0 Xt ? 1 = Xt0 0 = B (m; p); if Xt = 0 and Xt = 0 Xt = Xt 0 if Xt > 0 and Xt = 0: Xt = Xt ? 1 = B (m; p) ? 1; 0
0
+1
+1
+1
+1
Lemma 3.5 Suppose that X = O(log T ). Then for any A > 0 and for large T , there is a constant 2 (0; 1) and a constant C > 0 such that Pr(XT A) C?A + O(T ?A ); 10
0
0
for any constant A0 > 0.
Proof
Use H to denote the minimum value of t such that Xt = 0. Note that for t H , it follows from coupling Xt with the steady state chain X^t that the distribution of Xt is stochastically at most the steady state distribution. Hence, Pr(XT A) Pr(XT A j H T )Pr(H T ) + Pr(H > T ) Pr(X^t A)Pr(H T ) + Pr(H > T ): Now from (3.13), we have Pr(H > T ) = O(T ?A ); for any constant A0 > 0. (Note that although Pr(H > T ) should be exponentially small, our bound here will suce for future applications.) To bound Pr(X^t A), we note that according to (3.2) and the comments that followed, the moment generating function M () of X^t is properly de ned for < log r. Hence, similar to proof of (3.16), there are constants 2 (0; 1) and C > 0 such that Pr(X^t A) C?A : 0
1
12
The lemma now follows. 2 For the next lemma, we let Xt denote the chain with initial state X = O(log n) and compare it with the steady state chain X^ after h = blog nc steps. 2
0
9
Lemma 3.6 As n ! 1, Pr(Xh = 0) = Pr(X^ = 0) + o(1); E[Xh] = E[X^ ] + o(1): Proof
We shall show the lemma for the case where X = dlog ne. Let E be the event ^ that X log n. Then from the last equation in the proof of the previous lemma, we have Pr(E ) = Pr(X^t log n) = O(n?A ); (3:19) 0
2
0
2
2
for any contstant A. Next, let H be the waiting time until Xt rst hits 0. Then from (3.13), we have Pr(H > h) = O(n?A ); (3:20) for any constant A. Now in the coupling of Xt and X^t, if E does not occur, Xh must equal X^h if H h. (This is because XH = X^H = 0 on the event E.) Thus Pr(Xh = 0) Pr(E ) + Pr(H > h) + Pr(X^h = 0); and so the rst assertion of the lemma follows. Also, jE[Xh ] ? E[X^h]j E[jXh ? X^h j] in the coupling. Let (D) be the indicator for the event D. Then E[jXh ? X^h j] = E[(E )jXh ? X^h j] + E[(1 ? (E ))(H > h)jXh ? X^h j] E[(E )jXh ? X^h j] + E[(H > h)jXh ? X^hj] E[X^ (E )] + (log n)E[(H > h)] E[X^ ]Pr(E ) + (log n)Pr(H > h); which equals o(1) from (3.19), (3.20) and the fact that E[X^ ] = O(1). 2
0
2
2 0
2 0
2
4 Proof of Theorems 1.1(a) and 1.1(c) We shall rst assume Theorem 1.1(b) and prove Theorem 1.1(c) by a monotonicity argument to show that when c > c , the probability that GUC succeeds is o(1). We rst consider the monotonicity argument. Suppose that we have two random instances of ksat on n variables with m and m^ clauses of size k respectively. Assume m m^ . Let N ( ) = (N ( ); N ( ); N ( ); N ( )) and N^ ( ) = (N^ ( ); N^ ( ); N^ ( ); N^ ( )) denote their 3
0
1
2
3
0
13
1
2
3
respective states in GUC when there are variables whose truth values remain undetermined. We aim to give a coupling of N ( ) and N^ ( ) so that N ( ) N^ ( ). Note that the transition probabilities of N are given at the end of Section 2 and that the transition probabilities of N^ are de ned similarly with replaced with ^ and N with N^ . Note also that N (n) = (0; : : : ; 0; m) and N^ (n) = (0; : : : ; 0; m^ ) and so N (n) N^ (n). We shall show that if N ( ) N^ ( ), then N (t) N^ (t) for t < by coupling arguments.
Lemma 4.1 If N ( ) N^ ( ), then the chains N and N^ can be coupled so that N (t) N^ (t)
for t < .
Let i 1 be the minimum integer such that N^i( ) 6= 0. Now for j 6= i, ^ j (N ( )) = 0 and i(N^ ( )) = 1. Thus, for j 6= i, Nj ( ) ? j (N ( )) N^j ( ) ? j (N^ ( )):
Proof
For j = i, we have N^i( ) 1 and note that if Ni( ) = 0 then Ni( ) ? i(N ( )) = 0 N^i( ) ? 1 = N^i( ) ? i(N^ ( )); and that if Ni( ) 1 then i(N ( )) = 1, from which we have Ni( ) ? i(N ( )) = Ni ( ) ? 1 N^i ( ) ? 1 = N^i( ) ? i(N^ ( )): Therefore, we have for all i = 1; : : : ; k,
Ni( ) ? i(N ( )) N^i ( ) ? i(N^ ( )):
(4:1)
Observe next that for any two binomial variables B = B (; p) and B^ = B (^ ; p) with ^, we can couple B and B^ so that ^ B B; ^ ? B ^ ? B; where the coupling is obtained by identifying B as the sum of the rst Bernoulli variables from the ^ independent Bernoulli variables in B^ . It follows from (4.1) that we may couple N and N^ so that for i = 1; : : : ; k, i; ( ) ^ i; ( ); (4.2) Ni( ) ? i(N ( )) ? i; ( ) N^i( ) ? i(N^ ( )) ? ^ i; ( ): (4.3) It follows similarly from (4.2) that we may couple N and N^ so that for i = 1; : : : ; k, 0
0
0
0
i; ( ) ^ i; ( ): (4:4) Combining (4.3) and (4.4) gives that N ( ? 1) N^ ( ? 1). We can then repeat this coupling for ? 1; ? 2; : : : ; 1 to give the lemma. 2 1
1
14
Proof of Theorem 1.1(c). For c c , we have c > c ? for any > 0. Now for a random instance I of 3-sat with b(c ? )nc clauses and n variables, Theorem 1.1(b) gives that the limit (as n ! 1) of the probability that GUC succeeds when applied to I is arbitrarily 3
3
3
close to 0 for suciently small > 0. Theorem 1.1(c) thus follows from monotonicity. 2 To show Theorem 1.1(a), we apply a result of Chvatal and Reed [4] which can be stated as follows. Suppose that c < 2=3 and consider applying algorithm SC to a random instance of 3-sat with n variables and bcnc clauses. Then the probability that SC succeeds equals 1 ? o(1) as n ! 1. Theorem 1.1(a) now follows from the following lemma.
Lemma 4.2 Consider applying both GUC and SC to a random instance of k-sat with n variables and m clauses. Then
Pr(SC succeeds) Pr(GUC succeeds): Proof Consider applying both SC and GUC to a random instance I of k-sat with n variables and m clauses. Let N ( ) = (N ( ); : : :; Nk ( )) and N 0( ) = (N 0 ( ); : : : ; Nk0 ( )) denote the respective states of I in GUC and SC when there are variables whose truth values remain undetermined. Note that N (n) = N 0(n) initially and that the transition probabilities of N ( ) and N 0( ) are given at the end of Section 2. Note also that if N ( ) N 0( ) then 0j; and j; can be coupled so that 0j; j; . Thus, by following the coupling arguments in proof of Lemma 4.1, we have that if N ( ) N 0( ) then the chains N and N 0 can be coupled so that N (t) N 0(t) for 1 t < . This shows in particular that N ( ) N 0 ( ), and so the lemma follows. 2 0
0
0
0
0
0
0
0
5 Proof of Theorem 1.1(b) Assume c 2 (2=3; c ). Recall that 3
f (x) = fc(x) = 34c (1 ? x ) + log x; x 2 (0; 1); 2
and c is the maximum value of c such that f (x) 1 for all x 2 (0; 1). Let = (c) (for c > 2=3) be the root of the equation f (x) = 0 that is strictly less than 1. Note that is uniquely de ned and that is positive. By investigating the behaviour of f ((1 + )) for small > 0, we see that c < 2=3 and also if 3
2
= + n? :
0 24
0
then
nf ( ) = (n : ): Note that both n and n equal (n). We shall show that if c 2 (2=3; c ), then N ( ) can be approximated by f (=n) as decreases from n to n. We shall also show that if c 0 76
0
0
3
0
15
2
and are within these ranges, then N ( ) can be approximated by c (=n) . (Thus, when = b nc, we see that N ( ) = (n : ) and N ( ) c n). These estimates enable us to nd the limit of the probability that GUC succeeds. In order to minimize subscripts, we write W ( ) = N ( ), Y ( ) = N ( ) and Z ( ) = N ( ). We shall also consider a process X ( ) which runs alongside N ( ), and so we have a Markov chain (N ; W ( ); X ( ); Y ( ); Z ( )). The transition probabilities of (N ; W; Y; Z ) are same as N , but those of X need de ning. For completeness, we write down the one-step transitions of (W ( ); X ( ); Y ( ); Z ( )) below. 0
2
3 0 76
2
3 0
3
1
2
0
3
0
$$\begin{aligned}
Z(\nu-1)&=Z(\nu)-\eta_{3,0}(\nu)-\delta_3((N_0,W,Y,Z)),\\
Y(\nu-1)&=Y(\nu)+\eta_{3,1}(\nu)-\eta_{2,0}(\nu)-\delta_2((N_0,W,Y,Z)),\\
X(\nu-1)&=X(\nu)+\eta_{2,1}(\nu)-\delta_1((N_0,X,Y,Z)),\\
W(\nu-1)&=W(\nu)+\eta_{2,1}(\nu)-\eta_{1,0}(\nu)-\delta_1((N_0,W,Y,Z)),\\
N_0(\nu-1)&=N_0(\nu)+\eta_{1,1}(\nu),
\end{aligned}$$
where (as in Section 2, $\delta_i$ denotes the indicator that the clause chosen at the current stage has size $i$, so that $\delta_1=1$ when $W>0$, $\delta_2=1$ when $W=0$ and $Y>0$, and $\delta_3=1$ when $W=Y=0$ and $Z>0$; for $X$ the indicator $\delta_1$ is evaluated with $X$ in place of $W$)
$$\begin{aligned}
\eta_{3,0}(\nu)&=B\bigl(Z-\delta_3((N_0,W,Y,Z)),\,3/\nu\bigr), &\qquad \eta_{3,1}(\nu)&=B(\eta_{3,0},\,1/2),\\
\eta_{2,0}(\nu)&=B\bigl(Y-\delta_2((N_0,W,Y,Z)),\,2/\nu\bigr), &\qquad \eta_{2,1}(\nu)&=B(\eta_{2,0},\,1/2),\\
\eta_{1,0}(\nu)&=B\bigl(W-\delta_1((N_0,W,Y,Z)),\,1/\nu\bigr), &\qquad \eta_{1,1}(\nu)&=B(\eta_{1,0},\,1/2).
\end{aligned}$$
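The recursion above can be simulated directly. The sketch below (our own code, not from the paper) draws the $\eta$ variables as binomials, treats $\delta_i$ as the minimal-clause-size indicator described above, and checks numerically that $Y(\nu)/\nu$ tracks $f(\nu/n)$:

```python
import numpy as np

def guc_counts(n=20000, c=2.5, nu_stop_frac=0.9, seed=1):
    """Simulate the clause-count recursion (eta/delta transitions) and
    return Y(nu)/nu at nu = nu_stop_frac * n, for comparison with f(nu/n)."""
    rng = np.random.default_rng(seed)
    Z, Y, W = int(c * n), 0, 0
    for nu in range(n, int(nu_stop_frac * n), -1):
        d1 = 1 if W > 0 else 0
        d2 = 1 if W == 0 and Y > 0 else 0
        d3 = 1 if W == 0 and Y == 0 and Z > 0 else 0
        e30 = rng.binomial(max(Z - d3, 0), min(3 / nu, 1.0))
        e31 = rng.binomial(e30, 0.5)
        e20 = rng.binomial(max(Y - d2, 0), min(2 / nu, 1.0))
        e21 = rng.binomial(e20, 0.5)
        e10 = rng.binomial(max(W - d1, 0), min(1 / nu, 1.0))
        Z -= e30 + d3
        Y += e31 - e20 - d2
        W += e21 - e10 - d1
        # empty clauses (eta_{1,1}) are not tracked here
    nu = int(nu_stop_frac * n)
    return Y / nu

c, x = 2.5, 0.9
f_val = 0.75 * c * (1 - x * x) + np.log(x)   # f(x) = (3c/4)(1-x^2) + log x
ratio = guc_counts()
print(abs(ratio - f_val) < 0.05)
```

The drift computation behind this agreement is exactly the one carried out in the proof of Lemma 5.1 below.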
The initial state of the process is $(N_0(n);W(n),X(n),Y(n),Z(n))=(0;0,0,0,\lfloor cn\rfloor)$. As the transitions of $X(\nu)$ ignore the effects of $\eta_{1,0}(\nu)$, we have $W(\nu)\le X(\nu)$ always (which can be checked by considering the cases where $X(\nu)=W(\nu)$ and $X(\nu)>W(\nu)$). We shall see that $X(\nu)$ is a good approximation of $W(\nu)$.

We shall need the following bounds for sums of independent binomial variables. Let $B_1(\sigma_1,p_1),\dots,B_k(\sigma_k,p_k)$ be independent binomial variables. Write $\sigma=\sigma_1+\cdots+\sigma_k$ and $p=\sum_i\sigma_ip_i/\sigma$. Then for $A$ satisfying $0<A<\sigma p/3$,
$$\Pr\bigl(|B_1+\cdots+B_k-\sigma p|\ge 3\sqrt{A\sigma p}\bigr)\le 2\exp(-A). \qquad(5.1)$$
Also, for a binomial variable $B(\sigma,p)$, we have for $u\ge e$,
$$\Pr(B\ge u\sigma p)\le(e/u)^{u\sigma p}. \qquad(5.2)$$
All our subsequent error probabilities regarding sums like $\sum\eta_{3,0}$ are derived from one of the above inequalities. We shall be bounding such sums by sums of independent binomial variables. Although the variables in sums like $\sum\eta_{3,0}$ are usually not independent, it is not difficult to show the stochastic dominance by induction and by conditioning on the outcomes of the partial sums. Also, we say that an event $E$ occurs with high probability (w.h.p. for short) if
$$\Pr(E)=1-O(n^{-A}), \qquad(5.3)$$
for any constant $A>0$. Now the events $E$ usually contain bounds, involving some big $O$ terms, for random variables. In this situation, it will be clear that equations like (5.3) hold for any $A>0$ by choosing sufficiently large constants (which may depend on $A$) in the big $O$ terms. We first prove the following lemma, which will be useful for future inductive proofs. Note that we make no attempt to minimize the powers of $\log n$.
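As a quick sanity check of a bound of the form (5.2), one can compare the empirical tail of a binomial variable with the analytic bound (an illustration with arbitrarily chosen parameters; not part of the argument):

```python
import math
import random

# Empirically check Pr(B >= u*sigma*p) <= (e/u)^(u*sigma*p) for one
# parameter choice; B is Binomial(sigma, p), simulated as coin sums.
sigma, p, u = 200, 0.05, 4.0
threshold = u * sigma * p            # = 40, versus mean sigma*p = 10
bound = (math.e / u) ** threshold    # analytic bound as in (5.2)

rng = random.Random(42)
trials = 5000
hits = 0
for _ in range(trials):
    b = sum(rng.random() < p for _ in range(sigma))
    if b >= threshold:
        hits += 1
print(hits / trials <= bound + 1e-3)
```

The empirical tail frequency is (overwhelmingly) zero here, comfortably below the bound, which is itself tiny for deviations this far above the mean.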
Lemma 5.1  Suppose that $\alpha_0 n\le\nu\le n$. Let $h=\lfloor n^{1/2}\rfloor$, $\nu'=\nu-h$ and $I=\{\nu'+1,\dots,\nu\}$. Suppose that at stage $\nu$,
$$Z(\nu)=c\nu^3/n^2+z(n),\qquad Y(\nu)=\nu f(\nu/n)+y(n),\qquad W(\nu)=w(n)\le\log^2 n,$$
where $z(n)=o(n)$ and $y(n)=o(n^{0.76})$. Then with high probability,
$$Z(\nu')=c\nu'^3/n^2+O(z(n)+n^{1/4}\log n), \qquad(5.4)$$
$$Y(\nu')=\nu'f(\nu'/n)+O(y(n)+z(n)n^{-1/2}+n^{1/4}\log n), \qquad(5.5)$$
$$W(\nu')\le\log^2 n. \qquad(5.6)$$
(The constants in the big $O$ terms are independent of $\nu$.)
When proving the above lemma, we shall obtain the following estimates which will be useful later.

Lemma 5.2  With the hypotheses of Lemma 5.1, we have with high probability that for all $j\in I$,
$$Z(j)=Z(\nu)+O(n^{1/2}), \qquad(5.7)$$
$$Y(j)=Y(\nu)+O(n^{1/2}). \qquad(5.8)$$
Let $\tau$ be the minimum value of $k\ge 0$ such that $W(\nu-k)=0$, and for $j\in I$, let $\tau_j$ be the minimum value of $k\ge 1$ such that $W(j-k)=0$. Then we have with high probability that
$$\tau=O\bigl(w(n)+\sqrt{w(n)}\log n\bigr), \qquad(5.9)$$
$$\tau_j\le\log^2 n,\qquad\text{for }j\le\nu-\tau. \qquad(5.10)$$
Also, we have with high probability that for $j\ge\nu-\tau$,
$$W(j)=O\bigl(w(n)+\sqrt{w(n)}\log n\bigr), \qquad(5.11)$$
and that for $j\le\nu-\tau$,
$$W(j)=O(\log^2 n). \qquad(5.12)$$

Note that (5.9)-(5.12) imply that if $w(n)=O(\log^2 n)$, then we have with high probability that for all $j\in I$,
$$W(j)=O(\log^2 n), \qquad(5.13)$$
$$\tau_j=O(\log^2 n). \qquad(5.14)$$
Proof  We shall prove Lemma 5.1 and point out from where the statements in Lemma 5.2 follow. Note first that since $\alpha_0 n=\Theta(n)$, both $\nu$ and $Z(\nu)$ equal $\Theta(n)$. Define $\nu_{0Z}$ as the number of times that $Y(j)=W(j)=0$ for $j\in I$, and $\nu_{0Y}$ as the number of times that $Y(j)\ne 0$ but $W(j)=0$. Similarly, let $\nu_{0W}$ be the number of times that $W(j)=0$. Therefore, we have
$$Z(\nu')-Z(\nu)=-\sum_{j\in I}\eta_{3,0}(j)-\nu_{0Z}, \qquad(5.15)$$
$$Y(\nu')-Y(\nu)=\sum_{j\in I}\bigl(\eta_{3,1}(j)-\eta_{2,0}(j)\bigr)-\nu_{0Y}. \qquad(5.16)$$
To estimate $U_Z=\sum_{j\in I}\eta_{3,0}(j)$, we note that $\eta_{3,0}(j)$ is bounded above in distribution by a binomial variable with parameters $Z(\nu)$ and $3/\nu'$. Thus it is not difficult to obtain that $\sum_{j\in I}\eta_{3,0}(j)$ is bounded above by a sum of independent binomial variables, each with parameters $Z(\nu)$ and $3/\nu'$. This gives an upper bound (w.h.p.) $U_Z=O(h)$ for the sum of the variables. Since $Z(\nu')\le Z(j)\le Z(\nu)$, we have with high probability that $Z(j)=Z(\nu)-O(h)$, which is (5.7). Hence, with high probability, $\eta_{3,0}(j)$ is bounded below by a binomial variable with parameters $Z(\nu)-U_Z$ and $3/\nu$. Since as $n\to\infty$,
$$\frac{3(Z(\nu)-O(h))}{\nu-O(h)}=\frac{3Z(\nu)}{\nu}+O(n^{-1/2}),$$
we have with high probability that
$$\sum_{j\in I}\eta_{3,0}(j)=\frac{3hZ(\nu)}{\nu}+O(n^{1/4}\log n). \qquad(5.17)$$
Similarly, we have with high probability that
$$\sum_{j\in I}\eta_{3,1}(j)=\frac{3hZ(\nu)}{2\nu}+O(n^{1/4}\log n), \qquad(5.18)$$
which gives us an upper bound $Y(\nu)+O(h)$ for $Y(j)$ where $j\in I$. As each $\eta_{2,0}(j)$ is distributed as a binomial variable with parameters $Y(j)+O(1)=O(n)$ and $2/j=O(1/n)$, we have with high probability that
$$\sum_{j\in I}\eta_{2,0}(j)=O(h).$$
Since $\nu_{0Y}=O(h)$, we therefore have a lower bound $Y(\nu)-O(h)$ for $Y(j)$ where $j\in I$. Thus, we have $Y(j)=Y(\nu)+O(h)$ with high probability (which is (5.8)). Hence, with high probability, each $\eta_{2,0}(j)$ is bounded above and below in distribution by binomial variables with parameters $Y(\nu)+O(h)$ and $2/(\nu+O(h))$. It thus follows that
$$\sum_{j\in I}\eta_{2,0}(j)=\frac{2hY(\nu)}{\nu}+O(n^{1/4}\log n) \qquad(5.19)$$
with high probability. To estimate $\nu_{0Z}$, note first that if $\nu\le n-n^{0.76}$ (but $\nu\ge\alpha_0 n$), then from the hypotheses in the lemma, we have
$$Y(\nu)=\Omega(n^{0.76}).$$
Note that during the entire time interval $I$, the number of size two clauses removed is at most $\sum_{j\in I}\eta_{2,0}(j)+h$, which equals $O(n^{1/2})$ with high probability (using (5.19)). Thus the quantity $Y(j)$, for $j\in I$, is never zero and so when $\nu\le n-n^{0.76}$,
$$\nu_{0Z}=0 \qquad(5.20)$$
with high probability. For the case where $\nu\ge n-n^{0.76}$, we consider a stage $k\in I$ with $k\ge\nu-n^{0.1}$ and write $h'=\nu-k$. Then similarly to (5.18), we have with high probability that
$$\sum_{j=k+1}^{\nu}\eta_{3,1}(j)=\frac{3h'Z(\nu)}{2\nu}(1+o(1))=\frac{3c\nu^2h'}{2n^2}(1+o(1))=\frac{3ch'}{2}(1+o(1)). \qquad(5.21)$$
Also, note that $Y(\nu)=O(n^{0.76})$ (for $\nu\ge n-n^{0.76}$) and so similarly to (5.19), we have that for any fixed $\epsilon>0$,
$$\sum_{j=k+1}^{\nu}\eta_{2,0}(j)\le\epsilon h' \qquad(5.22)$$
with high probability. Now in order for $Y(k)=0$, we must have
$$\sum_{i=k+1}^{\nu}\bigl(\eta_{3,1}(i)-\eta_{2,0}(i)\bigr)\le h',$$
which, since $c>2/3$ and according to (5.21) and (5.22), occurs with probability $O(n^{-A})$, for any constant $A>0$. This shows that with high probability, $Y(k)\ne 0$ for all $k\ge\nu-n^{0.1}$. Thus with high probability, there are at most $n^{0.1}$ times when $Y(j)=0$ (where $j\in I$). Combining this with (5.20), we have with high probability that
$$\nu_{0Z}=O(n^{0.1}). \qquad(5.23)$$
Using (5.15) and (5.17) and (5.23), we have with high probability that
$$Z(\nu')=Z(\nu)-\frac{3hZ(\nu)}{\nu}+O(n^{1/4}\log n)=Z(\nu)\Bigl(1-\frac{3h}{\nu}\Bigr)+O(n^{1/4}\log n)=Z(\nu)\Bigl(\frac{\nu'}{\nu}\Bigr)^3(1+O(1/n))+O(n^{1/4}\log n)=\frac{c\nu'^3}{n^2}+O(z(n)+n^{1/4}\log n).$$
This proves (5.4). Next, we would like to estimate $\nu_{0Y}$ and $\nu_{0W}$. In view of (5.23), we have $\nu_{0Y}=\nu_{0W}-O(n^{0.1})$ with high probability. To estimate $\nu_{0W}$, we consider a process $\{X(j)\mid j\le n\}$ with transition probabilities as defined at the beginning of this section. We also let $X(\nu)=W(\nu)$. Then as observed before, we have $W(j)\le X(j)$ for all $j$. Let $\nu_{0X}$ be the number of times that $X(j)=0$ for $j\in I$, and so $\nu_{0W}\ge\nu_{0X}$ (as $W(j)\le X(j)$). Next, observe that similarly to our proof of (5.19), we have with high probability that $\eta_{2,1}(j)$ (for all $j\in I$) is bounded above and below in distribution by binomial variables with parameters $Y(\nu)+O(h)$ and $1/(\nu+O(h))$. Now according to the hypotheses of the lemma,
$$0<\frac{Y(\nu)+O(h)}{\nu+O(h)}=f(\nu/n)(1+o(1)),$$
which is bounded above by a constant less than 1 (since $c<c_3$). Hence, with high probability, we have that $X(j)$ (for all $j\in I$) is bounded above and below in distribution by the states of two Markov chains similar to the Markov chain described in the previous section. It therefore follows from Lemma 3.3 (by taking $p$ there as $(Y(\nu)+O(h))/(\nu+O(h))$, $T$ there as $h$, and $A$ there as $O(\log h)$) that with high probability,
$$\nu_{0X}=h\Bigl(1-\frac{Y(\nu)+O(h)}{\nu+O(h)}\Bigr)+O(h^{1/2}\log h)=n^{1/2}-\frac{Y(\nu)n^{1/2}}{\nu}+O(n^{1/4}\log n). \qquad(5.24)$$
We shall show next that $\nu_{0W}$ and $\nu_{0X}$ do not differ by much. We do this by finding an estimate for
$$\sum_{j\in I}\bigl(X(j)-W(j)\bigr),$$
which will also be useful later. Let $\nu_0=\max\{k\le\nu\mid X(k)=0\}$ and use $\tau'_j$ to denote the minimum value of $k\ge 1$ such that $X(j-k)=0$. Note that when $X(j)=0$, $W(j)$ is necessarily equal to 0 (as $W\le X$). Hence whenever $\eta_{1,0}(j)\ge 1$, its cumulative effect on $W$ stops when $X$ next gets to 0. Thus,
$$\sum_{j\in I}\bigl(X(j)-W(j)\bigr)\le\sum_{j=\nu_0+1}^{\nu}X(j)+\sum_{j=\nu'+1}^{\nu_0}\eta_{1,0}(j)\,\tau'_j.$$
Recall that as argued above, $X(j)$ behaves like the Markov chain $X_t$ discussed in the previous section. To estimate $\nu_0$, note that if $w(n)=0$, then $\nu_0=\nu$; otherwise we apply (3.13) (with $n$ there as $w(n)$, and $A$ there as $O(\log n)$) to obtain that
$$\nu-\nu_0=O\bigl(w(n)+\sqrt{w(n)}\log n\bigr)$$
holds with high probability. (Since $W(j)\le X(j)$, this gives (5.9).) Similarly, using (3.14), we have with high probability that for all $j$ between $\nu$ and $\nu_0$,
$$X(j)=O\bigl(w(n)+\sqrt{w(n)}\log n\bigr), \qquad(5.25)$$
from which (5.11) follows. Thus, with $w(n)\le\log^2 n$, we have $\nu-\nu_0=O(\log^2 n)$ and $X(j)=O(\log^2 n)$, from which we obtain that
$$\sum_{j=\nu_0+1}^{\nu}X(j)=O(\log^4 n)$$
holds with high probability. Next, for $j$ between $\nu'+1$ and $\nu_0$, we have from (3.18) that with high probability,
$$W(j)\le X(j)\le\log^2 n, \qquad(5.26)$$
and so (5.12) follows. Next we use (3.17) to obtain that with high probability,
$$\tau'_j\le\log^2 n, \qquad(5.27)$$
from which (5.10) follows. (Note that strictly speaking, we have only showed that $X(j)$ can be approximated by a Markov chain $X_t$ defined in the previous section for $j\in I$. This creates a problem when estimating $\tau'_j$ for $j$ "close" to $\nu'$. However, it can be seen easily that our previous approximations for $Z(j)$ and $Y(j)$ work for $j$ between $\nu'-\log^3 n$ and $\nu'$ also. This means that $X(j)$ can be approximated by $X_t$ for all $j$ between $\nu'-\log^3 n$ and $\nu$. As (3.17) gives that $\tau'_{\nu'}=O(\log^2 n)$, inequality (5.27) now follows from (3.17) too.) Note that $\eta_{1,0}(j)$ is a binomial variable with parameters $W(j)+O(1)$ and $1/j$. Thus it follows from (5.26) that $\sum_{j=\nu'+1}^{\nu_0}\eta_{1,0}(j)$ is bounded above by a binomial variable with parameters $O(h\log^2 n)$ and $O(1/n)$. Hence (5.2) gives that
$$\sum_{j=\nu'+1}^{\nu_0}\eta_{1,0}(j)=O(\log n)$$
with high probability. It thus follows from (5.27) that
$$\sum_{j=\nu'+1}^{\nu_0}\eta_{1,0}(j)\,\tau'_j=O(\log^3 n)$$
with high probability. We thus conclude that with high probability,
$$\sum_{j\in I}\bigl(X(j)-W(j)\bigr)=O(\log^4 n). \qquad(5.28)$$
It follows that with high probability, we have
$$\nu_{0W}-\nu_{0X}=O(\log^4 n).$$
This together with (5.24) gives that with high probability,
$$\nu_{0W}=n^{1/2}-\frac{Y(\nu)n^{1/2}}{\nu}+O(n^{1/4}\log n). \qquad(5.29)$$
Hence, combining (5.16), (5.18), (5.19), (5.29) and the fact that $\nu_{0Y}=\nu_{0W}-O(n^{0.1})$, we have with high probability that
$$Y(\nu')-Y(\nu)=\frac{3hZ(\nu)}{2\nu}-\frac{hY(\nu)}{\nu}-n^{1/2}+O(n^{1/4}\log n). \qquad(5.30)$$
It follows from the hypotheses of the lemma that
$$Y(\nu')=Y(\nu)\Bigl(1-\frac{n^{1/2}}{\nu}\Bigr)+\frac{3c\nu^2n^{1/2}}{2n^2}-n^{1/2}+O\bigl(n^{1/4}\log n+z(n)n^{-1/2}\bigr)=f(\nu/n)\bigl(\nu-n^{1/2}\bigr)+\frac{3c\nu^2n^{1/2}}{2n^2}-n^{1/2}+O\bigl(y(n)+n^{1/4}\log n+z(n)n^{-1/2}\bigr).$$
On the other hand,
$$\nu'f(\nu'/n)=\bigl(\nu-n^{1/2}\bigr)f\Bigl(\frac{\nu}{n}\Bigl(1-\frac{n^{1/2}}{\nu}\Bigr)\Bigr)=\bigl(\nu-n^{1/2}\bigr)\Bigl(f(\nu/n)+\frac{n^{1/2}}{n}\Bigl(\frac{3c\nu}{2n}-\frac{n}{\nu}\Bigr)\Bigr)+O(1)=\bigl(\nu-n^{1/2}\bigr)f(\nu/n)+\frac{3c\nu^2n^{1/2}}{2n^2}-n^{1/2}+O(1).$$
This proves (5.5). For (5.6), we use the fact that $W(\nu')\le X(\nu')$. Since, as observed previously, $X(j)$ can be approximated by a Markov chain $X_t$ defined in the previous section, inequality (5.6) follows from Lemma 3.5. $\Box$

We now make use of Lemma 5.1 to show Theorem 1.1(b). Let $h=\lfloor n^{1/2}\rfloor$ as before, and write $n_i=n-ih$ and $I_i=\{n_i+1,\dots,n_{i-1}\}$. Define $J$ as the greatest integer such that $n_J=n-Jh\ge\alpha_0 n$. Note first that by using induction and by applying Lemma 5.1 repeatedly, we have with high probability that for all $i\le J$,
$$Z(n_i)=cn_i^3/n^2+O(in^{1/4}\log n), \qquad(5.31)$$
$$Y(n_i)=n_if(n_i/n)+O(in^{1/4}\log n), \qquad(5.32)$$
$$W(n_i)\le\log^2 n, \qquad(5.33)$$
where the constants in the big $O$ terms are independent of $i$. Note that since $i\le J=O(n^{1/2})$, the error terms in (5.31) and (5.32) are both equal to $O(n^{3/4}\log n)=o(n^{0.76})$. This implies that the values of $Z(n_i),Y(n_i)$ and $W(n_i)$ ($i\le J$) satisfy the hypotheses of Lemma 5.1, and so the induction works by applying Lemma 5.1 repeatedly. In particular, it follows from (5.13) that with high probability,
$$W(\nu)=O(\log^2 n),\qquad\text{for all }\nu\ge n_J. \qquad(5.34)$$
We shall now prove the following two lemmas, from which Theorem 1.1(b) follows immediately.
Lemma 5.3
$$\lim_{n\to\infty}\Pr(\text{GUC does not fail before stage }n_J)=\exp\Bigl(-\int_{\alpha}^{1}\frac{f(x)^2}{4x(1-f(x))}\,dx\Bigr). \qquad(5.35)$$

Lemma 5.4  Suppose that at stage $n_J$,
$$Z(n_J)=cn_J^3/n^2+o(n),\qquad Y(n_J)=n_Jf(n_J/n)+o(n^{0.76}),\qquad W(n_J)\le\log^2 n.$$
Then
$$\lim_{n\to\infty}\Pr(\text{GUC creates an empty clause at and after stage }n_J)=0. \qquad(5.36)$$
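The limiting constant in Lemma 5.3 can be evaluated numerically. The sketch below (the helper names, the bisection root-finder and the midpoint quadrature are our own choices) computes $\alpha(c)$ and $\exp(-\int_\alpha^1 f^2/(4x(1-f))\,dx)$ for a few values of $c\in(2/3,c_3)$:

```python
import math

def f(x, c):
    return 0.75 * c * (1.0 - x * x) + math.log(x)

def alpha(c):
    """Root of f(x)=0 in (0,1) below the interior maximum of f (bisection)."""
    lo, hi = 1e-9, math.sqrt(2.0 / (3.0 * c))  # f<0 near 0, f>0 at its maximum
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid, c) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def success_limit(c, steps=200000):
    """exp(-I) with I = int_alpha^1 f^2/(4x(1-f)) dx, by the midpoint rule."""
    a = alpha(c)
    h = (1.0 - a) / steps
    I = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        fx = f(x, c)
        I += fx * fx / (4.0 * x * (1.0 - fx)) * h
    return math.exp(-I)

ps = []
for c in (1.0, 2.0, 2.5):
    p = success_limit(c)
    ps.append(p)
    print(round(c, 2), 0.0 < p < 1.0)
```

Since $f<1$ on $(0,1)$ for $c<c_3$ and $f$ vanishes at both endpoints $\alpha$ and 1, the integrand is bounded, so the integral is finite and the limiting success probability is strictly between 0 and 1.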
Proof of Lemma 5.3  Let $\lambda_\nu$ be the number of empty clauses created at stage $\nu$ and $\bar\lambda_\nu=\min\{1,\lambda_\nu\}$. Note that conditional on $W(\nu)=w$, $\lambda_\nu$ is distributed as a binomial variable with parameters $(w-1)$ and $1/(2\nu)$. Thus
$$\Pr(\lambda_\nu\ne\bar\lambda_\nu\mid W(\nu)=w)=\Pr(\lambda_\nu\ge 2\mid W(\nu)=w)=O(w^2/\nu^2).$$
So if
$$\lambda=\sum_{\nu=n_J}^{n}\lambda_\nu,\qquad\bar\lambda=\sum_{\nu=n_J}^{n}\bar\lambda_\nu,$$
then (5.34) implies
$$\sum_{\nu=n_J}^{n}\Pr(\lambda_\nu\ne\bar\lambda_\nu)=O\Bigl(\frac{n\log^4 n}{n^2}\Bigr)=o(1).$$
Since our aim is to show that $\lambda$ is asymptotically distributed (as $n\to\infty$) as a Poisson random variable with parameter
$$\int_{\alpha}^{1}\frac{f(x)^2}{4x(1-f(x))}\,dx,$$
we need only show that $\bar\lambda$ is asymptotically Poisson distributed with the right mean. We shall do this by the method of moments. The $r$-th factorial moment of $\bar\lambda$ is
$$\mathbf{E}[\bar\lambda(\bar\lambda-1)\cdots(\bar\lambda-r+1)]=r!\sum_{(i_1,i_2,\dots,i_r)\in S_r}\Pr(E_r), \qquad(5.37)$$
where $S_r=\{(i_1,i_2,\dots,i_r):n_J\le i_1<i_2<\cdots<i_r\le n\}$ and $E_r=\{\bar\lambda_{i_1}=\bar\lambda_{i_2}=\cdots=\bar\lambda_{i_r}=1\}$. We next partition $S_r$ into $S'_r\cup S''_r$ so that $S'_r=\{(i_1,i_2,\dots,i_r)\in S_r:i_r\le n-\log^{10}n\text{ and }i_{k+1}-i_k\ge\log^{10}n,\ k=1,2,\dots,r-1\}$ and $S''_r=S_r\setminus S'_r$. Let us first deal with $S''_r$. We have
$$|S''_r|=O(n^{r-1}\log^{10}n) \qquad(5.38)$$
and we claim that for any $(i_1,i_2,\dots,i_r)\in S_r$,
$$\Pr(E_r)=O\Bigl(\Bigl(\frac{\log^2 n}{n}\Bigr)^r\Bigr). \qquad(5.39)$$
Combining (5.38) and (5.39) we will have
$$\sum_{(i_1,i_2,\dots,i_r)\in S''_r}\Pr(E_r)=O\Bigl(n^{r-1}\log^{10}n\,\Bigl(\frac{\log^2 n}{n}\Bigr)^r\Bigr)=o(1). \qquad(5.40)$$
To prove (5.39) we write
$$\Pr(E_r)=\prod_{t=1}^{r}\Pr(\bar\lambda_{i_t}=1\mid E_{t-1}). \qquad(5.41)$$
We now consider a typical term in the product (5.41):
$$\Pr(\bar\lambda_{i_t}=1\mid E_{t-1})=\sum_{\omega_t\in\mathbf{N}^3}\Pr(\bar\lambda_{i_t}=1\mid A(t,\omega_t),E_{t-1})\,\Pr(A(t,\omega_t)\mid E_{t-1}),$$
where if $\omega_t=(w,y,z)$ then $A(t,\omega_t)=\{W(i_t)=w,\ Y(i_t)=y,\ Z(i_t)=z\}$. Now $E_{t-1}$ refers to events in the history of the algorithm up to the start of stage $i_t$ and so by complete independence
$$\Pr(\bar\lambda_{i_t}=1\mid A(t,\omega_t),E_{t-1})=\Pr(\bar\lambda_{i_t}=1\mid A(t,\omega_t))\le\frac{w}{2i_t}\le\frac{w}{n_J}.$$
So
$$\Pr(\bar\lambda_{i_t}=1\mid E_{t-1})\le\frac{1}{n_J}\sum_{\omega_t}w(\omega_t)\Pr(A(t,\omega_t)\mid E_{t-1})\le\frac{1}{n_J}\sum_{w}w\,\Pr(W(i_t)=w\mid E_{t-1})\le\frac{1}{n_J}\sum_{w}w\,\Pr(W(i_t)=w)/\Pr(E_{t-1})\le\frac{1}{n_J}\Bigl(B\log^2 n+\frac{n\,\Pr(W(i_t)\ge B\log^2 n)}{\Pr(E_{t-1})}\Bigr)\le\frac{B\log^2 n}{n_J}+\frac{n^{1-r_0}}{\Pr(E_{t-1})}$$
from (5.34) ($B$ denotes the hidden constant in (5.34), and $r_0$ is a large constant available through the choice of constants in (5.3)). Now either $\Pr(E_{t-1})\le n^{-r_0/2}$, and we are done since $\Pr(E_r)\le\Pr(E_{t-1})$, or
$$\Pr(\bar\lambda_{i_t}=1\mid E_{t-1})\le\frac{B\log^2 n}{n_J}+n^{1-r_0/2}\le\frac{2B\log^2 n}{n_J}.$$
Substituting in (5.41) gives (5.39), since $n_J=\Theta(n)$. We next find an estimate for $\Pr(E_r)$ when $(i_1,i_2,\dots,i_r)\in S'_r$. Let $h_1=\lfloor\log^9 n\rfloor$. Let
$$\Gamma_t=\{(w,y,z):0\le w\le\log^2 n,\ y=i_tf(i_t/n)+O(n^{0.76}),\ z=ci_t^3/n^2+O(n^{0.76})\}$$
and $\Gamma=\Gamma_1\times\Gamma_2\times\cdots\times\Gamma_r$. For $\omega=(\omega_1,\omega_2,\dots,\omega_r)\in\Gamma$ let $A(\omega)=\cap_{t=1}^{r}A(t,\omega_t)$. Then for $(i_1,i_2,\dots,i_r)\in S'_r$ we have, where $D=\{\exists\,1\le t\le r:(W(i_t),Y(i_t),Z(i_t))\not\in\Gamma_t\}$,
$$\Pr(E_r)=\sum_{\omega\in\Gamma}\Pr(E_r\mid A(\omega))\Pr(A(\omega))+\Pr(E_r\wedge D)=\sum_{\omega\in\Gamma}\prod_{t=1}^{r}\Pr(\bar\lambda_{i_t}=1\mid E_{t-1},A(\omega))\Pr(A(\omega))+\Pr(E_r\wedge D)=\sum_{\omega\in\Gamma}\prod_{t=1}^{r}\Pr(\bar\lambda_{i_t}=1\mid A(t,\omega_t))\Pr(A(\omega))+O(n^{-A}), \qquad(5.42)$$
where the last equation follows from complete independence. We now estimate $\Pr(\bar\lambda_{i_t}=1\mid A(t,\omega_t))$. As argued in our proof of Lemma 5.1, $W(\nu)$ can be approximated by a Markov chain defined in Section 2. Thus using (3.4), (3.5) and Lemma 3.6, we have
$$\mathbf{E}\bigl[W(i_k)-1+\delta_1(W(i_k))\mid A(t,\omega_t)\bigr]=\frac{f(i_k/n)(2-f(i_k/n))}{2(1-f(i_k/n))}-1+(1-f(i_k/n))+o(1)=\frac{f(i_k/n)^2}{2(1-f(i_k/n))}+o(1).$$
Hence,
$$\Pr(\bar\lambda_{i_k}=1\mid A(t,\omega_t))=(1+o(1))\,\frac{f(i_k/n)^2}{4i_k(1-f(i_k/n))}+O(n^{-A}),$$
and the $o(1)$ can be made independent of $(i_1,i_2,\dots,i_r)$. Then applying (5.37), (5.40) and (5.42),
$$\mathbf{E}[\bar\lambda(\bar\lambda-1)\cdots(\bar\lambda-r+1)]=(1+o(1))\,r!\sum_{(i_1,\dots,i_r)\in S'_r}\prod_{k=1}^{r}\frac{f(i_k/n)^2}{4i_k(1-f(i_k/n))}+o(1)=(1+o(1))\,r!\sum_{(i_1,\dots,i_r)\in S_r}\prod_{k=1}^{r}\frac{f(i_k/n)^2}{4i_k(1-f(i_k/n))}+o(1)=(1+o(1))\Bigl(\sum_{i=n_J}^{n}\frac{f(i/n)^2}{4i(1-f(i/n))}\Bigr)^r+o(1).$$
(To obtain the second equation from the first we use the fact that $f(x)^2/(x(1-f(x)))$ is bounded in the range $[\alpha,1]$.) Note that $n_J/n\to\alpha$, and so
$$\sum_{i=n_J}^{n}\frac{f(i/n)^2}{4i(1-f(i/n))}=\int_{\alpha}^{1}\frac{f(x)^2}{4x(1-f(x))}\,dx+o(1).$$
This gives that
$$\mathbf{E}[\bar\lambda(\bar\lambda-1)\cdots(\bar\lambda-r+1)]=(1+o(1))\Bigl(\int_{\alpha}^{1}\frac{f(x)^2}{4x(1-f(x))}\,dx\Bigr)^r+o(1).$$
Thus, for any fixed integer $r\ge 1$,
$$\lim_{n\to\infty}\mathbf{E}[\bar\lambda(\bar\lambda-1)\cdots(\bar\lambda-r+1)]=\Bigl(\int_{\alpha}^{1}\frac{f(x)^2}{4x(1-f(x))}\,dx\Bigr)^r.$$
This means that $\bar\lambda$ (and hence $\lambda$) is asymptotically distributed as a Poisson variable with mean
$$\int_{\alpha}^{1}\frac{f(x)^2}{4x(1-f(x))}\,dx.$$
The lemma now follows. $\Box$

Proof of Lemma 5.4  It is useful to note that, as remarked when we defined $\alpha$, the quantity $c\alpha^2$ is bounded above by a constant less than $2/3$. Note also that from the hypotheses of the lemma, we have $Z(n_J)=c\alpha^3 n(1+o(1))$ and $Y(n_J)=o(n^{0.8})$. We consider a further $h'=\lfloor n^{0.8}\rfloor$ stages after stage $n_J$. We claim that by that stage, GUC will have arrived at a stage $\tilde n$ where $Y(\tilde n)=W(\tilde n)=0$. To see this, it is not difficult to check that in these further $h'$ stages, with high probability,
(I) at most $\frac{3}{2}c\alpha^2 n^{0.8}(1+o(1))$ new clauses of size 2 are created by GUC,

(II) at least $h'$ clauses of minimal size are removed by GUC.
(Note that (I) is similar to (5.18) and can therefore be proved similarly.) Since $c\alpha^2<2/3$ and $Y(n_J)+W(n_J)=o(n^{0.8})$, it is not possible to have (I) and (II) unless some of the clauses of minimal size removed are of size 3. This shows that with high probability, there is a stage $\tilde n\ge n_J-h'$ such that $Y(\tilde n)=W(\tilde n)=0$. Note also that, similarly to (5.17), we have with high probability that between stages $n_J$ and $n_J-h'$, only $O(n^{0.8})$ clauses of size 3 are removed. Thus at stage $\tilde n$, we have with high probability that there are $Z(\tilde n)=c\alpha^3 n(1+o(1))$ clauses of size three remaining, and that there are $\tilde n=\alpha n(1+o(1))$ variables whose truth values remain unassigned. Since the ratio of the number of size three clauses to the number of variables at stage $\tilde n$ is strictly less than $2/3$, we know from part (a) of Theorem 1.1 that the probability that GUC creates an empty clause at and after stage $\tilde n$ is $o(1)$.

It therefore remains to argue that for $\nu$ between $n_0=n_J-h'$ and $n_J$, GUC creates no empty clauses with probability tending to 1 as $n\to\infty$. To do this, note that as in (I) above, we have with high probability that
$$Y(\nu)=O(n^{0.8}),$$
for all $\nu$ between $n_0$ and $n_J$. Since both $n_0$ and $n_J$ equal $\Theta(n)$, we have with high probability that $Y(\nu)/\nu=o(1)$ for all $\nu\in[n_0,n_J]$. As indicated when showing (5.24), we have with high probability that for $\nu\in[n_0,n_J]$, $W(\nu)$ can be bounded above in distribution by a Markov chain $X_t$ defined in the previous section with one-step transitions governed by a binomial variable with parameters $O(n^{0.8})$ and $1/n_0$. Using (3.14) and (3.18) and by following the arguments used in showing (5.11) and (5.12), we have with high probability that for all $\nu\in[n_0,n_J]$, $W(\nu)\le\log^2 n$. This in turn gives that
$$\sum_{\nu=n_0}^{n_J}\frac{W(\nu)}{\nu}=O(n^{-0.2}\log^2 n)$$
with high probability. Since the expected number of empty clauses created at stage $\nu$ equals $O(\mathbf{E}[W(\nu)/\nu])$ (see the definition of $\eta_{1,1}$), the above equation gives that the expected number of empty clauses created at stages $\nu\in[n_0,n_J]$ equals $o(1)$. Hence, as $n\to\infty$,
$$\Pr(\text{GUC creates an empty clause at stage }\nu\in[n_0,n_J])=o(1). \qquad(5.43)$$
This completes our proof of Lemma 5.4. $\Box$
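Lemmas 5.3 and 5.4 together give Theorem 1.1(b). As a rough empirical companion (our own sketch, not the paper's experiment; clauses are unordered sets here, a model variant covered by the remarks in Section 1), one can run GUC on small random 3-sat instances and check that it succeeds most of the time at moderate densities:

```python
import random

def guc(clauses, rng):
    """One run of GUC; clauses is a list of sets of nonzero ints
    (positive = variable, negative = its negation). True on success."""
    clauses = [set(c) for c in clauses]
    while clauses:
        m = min(len(c) for c in clauses)
        chosen = rng.choice([c for c in clauses if len(c) == m])
        lit = rng.choice(sorted(chosen))   # set the chosen literal to 1
        nxt = []
        for c in clauses:
            if lit in c:
                continue                   # satisfied clause erased
            c = c - {-lit}                 # complement literal erased
            if not c:
                return False               # empty clause: GUC fails
            nxt.append(c)
        clauses = nxt
    return True                            # remaining variables set freely

def estimate(n, c, trials, seed=0):
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        cls = []
        for _ in range(int(c * n)):
            vs = rng.sample(range(1, n + 1), 3)
            cls.append({v if rng.random() < 0.5 else -v for v in vs})
        wins += guc(cls, rng)
    return wins / trials

p = estimate(n=150, c=1.0, trials=200)
print(0.3 < p <= 1.0)
```

At $c=1.0$ the limiting success probability from Lemma 5.3 is close to 1, and even at this small $n$ the empirical frequency is well above the loose threshold tested.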
6  GUC with backtracking and proof of Theorem 1.2

Since GUC succeeds with probability $1-o(1)$ when $c<2/3$, we consider only the case where $2/3\le c<c_3$. Note first that empty clauses can only be created by GUC when $N_1(\nu)\ne 0$. As our previous analysis shows, $N_1(\nu)$ behaves like a Markov chain in steady state with a reflecting barrier at 0. Also, given $N_1(\nu)$, the probability that GUC creates an empty clause at stage $\nu$ is $O(N_1(\nu)/\nu)$. By allowing GUC to backtrack when it makes a "mistake", we shall see that a random instance of 3-sat almost certainly has a satisfying truth assignment when $c<c_3$.

Consider applying GUC to a 3-sat problem. With $n_b>n_e$, we use $[n_b,n_e]$ to denote a "run" in which $N_1(\nu)$ is non-zero. That is, a run $[n_b,n_e]$ is such that $N_1(n_b+1)=0$, $N_1(k)>0$ for $n_b\ge k>n_e$, and $N_1(n_e)=0$. We next describe how we allow GUC to backtrack. Recall that $N(\nu)$ is obtained from $N(\nu+1)$ by setting a literal $x_{\nu+1}$ to 1 at stage $\nu+1$ (using $x_\nu$ to denote the literal that is set to 1 at stage $\nu$, and recall that $x_\nu$ is a literal picked randomly from a randomly chosen clause of minimal size). Also, use $S(\nu)$ to denote the set of clauses at stage $\nu$. Suppose that GUC is in a run with $N_1(n'+1)=0$, and $N_1(k)\ge 1$ for $k=n',n'-1,\dots,n''$, where $n''\le n'$ is the present stage. GUC then sets a literal $x_{n''}$ to 1. The backtracking is performed if the setting of $x_{n''}$ to 1 gives rise to the occurrence of two size one clauses $\{y\}$ and $\{\bar y\}$ for some variable $y$. If this occurs, then GUC is allowed a limited backtracking (see also the failure condition (B) later) by resetting the literals $x_{n''},x_{n''+1},\dots,x_{n'+1}$ to 0. We have to update the set of clauses by

(a) removing all clauses that contain $\bar x_k$ ($k=n'+1,n',\dots,n''$) from the set $S(n'+1)$ of clauses,

(b) removing all occurrences of $x_k$ ($k=n'+1,n',\dots,n''$) from clauses in the set $S(n'+1)$.

Hence this new set of clauses becomes $S(n''-1)$, and the algorithm then proceeds as before by choosing a literal $x_{n''-1}$ and setting it to 1 to obtain $S(n''-2)$. Stages $n''-2,n''-3,\dots$ are carried out similarly as before. We call this algorithm GUCB. We say that GUCB fails if:

(A) an empty clause is created in the backtracking when resetting the truth values of some literals to 0, or

(B) it creates an empty clause in a stage after a backtracking and before the next time when the number of size one clauses becomes zero, i.e. two separate occurrences of empty clauses in one run.
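The update rules (a) and (b) can be rendered schematically as follows (a hypothetical representation of our own: clauses as sets of signed integers, with $-x$ standing for $\bar x$). An empty clause produced here is exactly failure condition (A):

```python
def reset_literals(clauses, xs):
    """Reset the literals in xs (previously set to 1) to 0, so that their
    complements become true.  Rule (a): drop clauses containing a
    complement (now satisfied).  Rule (b): erase occurrences of the reset
    literals themselves.  Returns (new_clauses, failed), where failed
    flags an empty clause, i.e. failure condition (A)."""
    xs = set(xs)
    negs = {-x for x in xs}
    out, failed = [], False
    for c in clauses:
        c = set(c)
        if c & negs:          # rule (a): clause satisfied by a complement
            continue
        c -= xs               # rule (b): reset literals erased
        if not c:
            failed = True     # failure condition (A)
        out.append(c)
    return out, failed

S = [{1, 2, 3}, {-1, 4}, {1, -4}, {1}]
new, failed = reset_literals(S, [1])
print(sorted(map(sorted, new)), failed)
```

In this toy example the clause $\{-1,4\}$ is removed by rule (a), $\{1,2,3\}$ and $\{1,-4\}$ shrink by rule (b), and the unit clause $\{1\}$ becomes empty, triggering (A).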
We use $\hat N(\nu)=(\hat N_0(\nu),\hat N_1(\nu),\hat N_2(\nu),\hat N_3(\nu))$ to denote the state of GUCB at stage $\nu$ when applied to a random instance of 3-sat. With $n'$ and $n''$ defined as above, we claim that at stage $n''-1$, the set $S(n''-1)$ of clauses remains uniformly random.

Claim.  If $V_{n''-1}$ is the set of variables whose truth values remain unassigned at stage $n''-1$, then for $i=1,2,3$, a size $i$ clause in $S(n''-1)$ is equally likely to be any clause in $C_i(V_{n''-1})$.

Proof  Let $C$ be a clause of size $s$ in $S(n'+1)$. Note that $s\ge 2$. It is clear that if $C\cap\{x_i,\bar x_i\}=\emptyset$ for all $i=n'+1,n',\dots,n''$, then $C$ is equally likely to be any clause in $C_s(V_{n''-1})$. On the other hand, if $C\cap\{x_i,\bar x_i\}\ne\emptyset$ for some $i=n'+1,n',\dots,n''$, then let $j$ be the greatest value of such $i$'s. If $\bar x_j\in C$, then no sub-clause of $C$ is in $S(n''-1)$ by the definition of $S(n''-1)$. If $x_j\in C$, then $C_1=C-\{x_j\}$ is equally likely to be any clause with size $|C_1|$ made up of variables whose truth values remain unassigned immediately after stage $j$. Now since $C$ contains $x_j$, $C$ is not considered by GUCB until backtracking. During the backtracking, $C$ is removed from $S(n'+1)$ if $C$ contains $\bar x_i$ for some $i=j-1,j-2,\dots,n''$. Otherwise $C_1=C-\{x_{n'+1},x_{n'},\dots,x_{n''}\}$ is in $S(n''-1)$, but then $C_1$ is equally likely to be any clause of size $|C_1|$ made up of variables in $V_{n''-1}$. $\Box$

Hence the behaviour of GUCB can be analysed by considering $\hat N(\nu)$. As before, we shall allow GUCB to continue after empty clauses are created, that is, we allow GUCB to continue even when it fails in cases (A) and (B) above. We shall show that the probability that GUCB fails is $o(1)$. This is done by showing that the effect of backtracking on $\hat N$ is negligible, and that with high probability, there are at most $\log^5 n$ times when GUCB backtracks. Note that we make no attempt to minimize the powers of $\log n$ in this section. To minimize subscripts, we write $\hat W(\nu)$ for $\hat N_1(\nu)$, $\hat Y(\nu)$ for $\hat N_2(\nu)$ and $\hat Z(\nu)$ for $\hat N_3(\nu)$.

Recall that
$$f(x)=\frac{3c}{4}(1-x^2)+\log x,\qquad x\in(0,1).$$
The constant $\alpha$ is defined to be the unique root of $f(x)=0$ within the range $(0,1)$, and $\alpha_0=\alpha+n^{-0.24}$. Also, the integer $J$ is defined as the greatest integer such that $n-Jh\ge\alpha_0 n$, where $h=\lfloor n^{1/2}\rfloor$. We next define some new quantities. Let $b_0=n+1$, $l_0=n+1$ and $f_0=n+1$. For integers $1\le i\le\log^5 n$, if GUCB backtracks at least $i$ times before stage $n_J$, then define $b_i,l_i,f_i$ so that $b_i$ equals the stage number at which GUCB backtracks for the $i$-th time, $l_i$ equals the greatest integer $k\le b_i$ such that $\hat W(k)=0$, and $f_i$ equals the smallest integer $k\ge b_i$ such that $\hat W(k+1)=0$; if GUCB backtracks fewer than $i$ times before stage $n_J$, then define $b_i=b_{i-1}$, $l_i=l_{i-1}$ and $f_i=f_{i-1}$. (That is, the interval from $f_i$ down to $l_i$ is essentially a "run" corresponding to GUCB in which the backtracking takes place at stage $b_i$.) We shall use induction to show that with high probability, we have for all $i\le\log^5 n$ that
$$\hat Z(b_i-1)=cb_i^3/n^2+O(in^{3/4}\log n), \qquad(6.1)$$
3 4
4
5
8
6
+1
Note that the values of Z; Y; W satisfy the hypotheses of Lemma 5.1. Thus, we pply (5.9) and (5.11) to obtain that with high probability, bi ? li = O(log n); (6.6) W (j ) = O(log n); for all j 2 [bi ? 1; li]: (6.7) We therefore have with high probability that lX i W (j ) = O(log n=n): j bi ? j Hence, the expected number of empty clauses created in stages j 2 [bi ? 1 & li + 1] equals O(log n=n) (please refer to comments before (5.43)). Equation (6.4) now follows. Next, we apply Lemma 5.1 to obtain that with high probability Z (n0) = cn0 =n + O(in = log n + n = log n); Y (n0) = n0f (n0 =n) + O(i n = log n + (i + 1)n = log n); W (n0) log n; where n0 = bi ? 1 ? h. These estimates satisfy the hypotheses of Lemma 5.1. Therefore, if n0 n, we may apply Lemma 5.1 repeatedly. Since we need only apply Lemma 5.1 at most O(n = ) times before we go past the stage b nc, we have by using (5.7), (5.8), (5.11), (5.13), (5.10) and (5.14) that with high probability, Z (j ) = cj =n + O((i + 1)n = log n); (6.8) = Y (j ) = jf (j=n) + O((i + 1) n log n); (6.9) W (j ) = O(log n) (6.10) j = O(log n); (6.11) 4
4
+1
=
8
1
8
3
2
3 4 2
1 4
3 4
1 4
2
0
1 2
0
3
2
3 4 2
2
2
29
3 4
for all j 2 [li & nJ ]. Note that if there are at most i backtrackings before stage nJ , then (6.1 - 6.3) remain valid for i + 1. Otherwise, we have li > fi bi nJ by de nitions of fi and bi . Therefore, using the above estimates, we have with high probability that +1
+1
+1
+1
Z (bi ) Y (bi ) Z (fi + 1) Y (fi + 1)
= = = =
+1
+1
+1
+1
cbi bi cfi fi
=n f (bi =n f (fi
+1
+1
3
+1
2
+ O((i + 1)n = log n); =n) + O((i + 1) n = log n); + O((i + 1)n = log n); =n) + O((i + 1) n = log n ): 2
+1
3
2
+1
(6.12) (6.13) (6.14) (6.15)
3 4
3 4
3 4
2
+1
3 4
0
Note that from (6.11), we have with high probability that the length of every \run" equals O(log n) in the entire history when GUC is applied to a random instance I de ned above. Thus, when GUCB backtracks at stage bi , we have with high probability that GUCB need only reset the truth values of v = O(log n) variables. Also, we have with high probability that fi ? bi = O(log n): (6:16) We next show that the backtracking does not change the numbers of size three and size two clauses by much. Note rst that by (6.12 - 6.16), we have 2
+1 2
+1
2
+1
Z (fi + 1) = cbi =n + O((i + 1)n = log n); Y (fi + 1) = bi f (bi =n) + O((i + 1) n = log n): +1
+1
+1
+1
3
2
(6.17) (6.18)
3 4
2
+1
3 4
Recall that in the backtracking at stage bi , GUCB resets the truth values of v = O(log n) variables and obtain the set of clauses at stage bi ? 1 by updating the set S (fi + 1) of clauses at stage fi + 1. We next observe that in the initial set of bcnc (random) clauses of size three, the number of clauses containing a given literal is distributed as B (m; 3=n). Thus, we have with high probability that for any literal x, the number of clauses containing x equals O(log n). Hence, with high probability, the number of size three clauses in S (fi +1) containing (at least) one of the v variables is O(log n). This gives that with high probability, Z^(bi ? 1) ? Z^(fi + 1) = O(log n): (6:19) 2
+1
+1
+1
+1
2
+1
4
+1
4
+1
For size two clauses, we note rst that at most Z^ (bi ? 1) ? Z^ (fi + 1) clauses of size two are added to S (fi + 1). Also, similar to (6.19), we have with high probability that at most O(log n) size two clauses are removed from S (fi + 1) in the backtracking. Therefore, we have with high probability that Y^ (bi ? 1) ? Y^ (fi + 1) = O(log n): (6:20) +1
4
+1
+1
+1
+1
4
+1
Similarly, it is easy to see that with high probability, at most O(log n) clauses of size 1 are created from clauses of size two and size three in S (fi + 1). We thus have with high probability that W^ (bi ? 1) = O(log n): (6:21) The induction proof of (6.1 - 6.3) is now complete by noting that (6.1 - 6.3) follow from (6.17 - 6.21) and the fact that Z^(fi + 1) = Z (fi + 1); Y^ (fi + 1) = Y (fi + 1). 4
+1
4
+1
+1
+1
30
+1
+1
We next would like to show (6.5). Let $I_2=\{b_i,b_i+1,\dots,f_i,f_i+1\}$ and use $V_b$ to denote the set of variables whose truth values remain unassigned immediately before stage $b_i-1$. For $j\in I_2$, use $x_j$ to denote the literal that was set to 1 at stage $j$. (Note that $v=f_i-b_i+2=O(\log^2 n)$.) Now in the backtracking at stage $b_i$, GUCB resets these $v$ literals to 0 and updates the set $S(f_i+1)$ of clauses. For $j\in I_2$, let $S_j^{(i)}$ be the set of clauses of size $i$ in the set $S(j)$ of clauses at stage $j$ containing the literal $x_j$. That is,
$$S_j^{(i)}=\{C\in S(j)\mid x_j\in C\text{ and }|C|=i\}.$$
Note that if $C\in S_j^{(1)}$, then $C$ must come from a clause $C'\in S(f_i+1)$ where $C'$ contains a literal $\bar x_{j'}$ for some $j'\in I_2$ with $j'>j$. Thus, no clause in $\cup_{j\in I_2}S_j^{(1)}$ can become an empty clause during backtracking. Note also that if $C\in S_j^{(i)}$ ($i=2,3$), then the entire clause $C$ is removed from $S(j)$ at stage $j$, and so no sub-clause of $C$ can appear in $S_{j'}^{(2)}\cup S_{j'}^{(3)}$ for any $j'\in I_2$ with $j'<j$. Thus, if $C\in S_j^{(i)}$ ($i=2,3$), then during backtracking, $C-\{x_j\}$ is equally likely to be a size $i-1$ clause chosen from the set
$$C_{i-1}\bigl(V_b\cup\{\hat x_{j'}\mid j'\in I_2\text{ and }j'<j\}\bigr),$$
where $\hat x$ here denotes the variable of the literal $x$. Thus, if $C\in S_j^{(i)}$ ($i=2,3$), then the probability that $C$ becomes an empty clause after the backtracking is $O(v/b_i)=O(\log^2 n/n)$. Note that for a clause $C\in S(f_i+1)$ to become an empty clause after backtracking, the clause $C$ must be contained in $\cup_{j\in I_2}\cup_{i=2,3}S_j^{(i)}$. As argued in (6.19) and (6.20), the size of $\cup_{j\in I_2}\cup_{i=2,3}S_j^{(i)}$ is $O(\log^4 n)$. Hence the probability that an empty clause is created in the backtracking at stage $b_i$ equals $O(\log^6 n/n)$. This proves (6.5). It now follows from (6.4) and (6.5) that
$$\Pr(\text{GUCB creates an empty clause at a stage }j\text{ with }l_i+1\le j\le b_i\text{, for some }i\le\log^5 n)=O(\log^{13}n/n).$$
\Pr(\text{GUCB backtracks at least } \log^5 n \text{ times before stage } n_J) = o(1), \qquad (6.22)

and that

\Pr(\text{GUCB backtracks at and after stage } n_J) = o(1). \qquad (6.23)

To show (6.22), suppose that $l_i$ is given and note that GUCB behaves like GUC after each $l_i$ until the next backtracking at stage $b_{i+1}$. Note that using (6.8) and (6.9), we have with high probability that for $i \le \log^5 n$,

\hat{Z}(l_i) = c l_i^3/n^2 + O(n^{3/4}\log^6 n), \qquad \hat{Y}(l_i) = l_i f(l_i/n) + O(n^{3/4}\log^{11} n).

Also, $\hat{W}(l_i) = 0$. Next, consider applying GUC to a random satisfiability problem $I'$ with $l_i$ variables and $Z'(l_i)$, $Y'(l_i)$ and $W'(l_i)$ clauses of size three, two and one respectively, where
$Z'(l_i) \ge \hat{Z}(l_i)$, $Y'(l_i) \ge \hat{Y}(l_i)$ and $W'(l_i) \ge \hat{W}(l_i)$. Then by the monotonicity argument used in showing Theorem 1.1(c), we have $\hat{W}(j) \le W'(j)$ for $j \ge b_{i+1}$. Thus, if $b'_{i+1}$ is the first stage after $l_i$ at which, when GUC is applied to $I'$, the set of clauses contains two clauses $\{y\}, \{\bar{y}\}$ for some $y$, then it is easy to see that $b_{i+1} \le b'_{i+1}$ in distribution. We apply this idea with $Z', Y', W'$ obtained by applying GUC to a random instance $I'$ of 3-sat with $\lfloor c'n \rfloor$ clauses of size 3, where $c' \in (c, c_3)$. Note that by the definitions of $l_i$ and $c'$, we have $l_i \ge n_J \ge \alpha'_0 n$ (where $\alpha'_0$ is defined as $\alpha_0$ but with $c$ replaced by $c'$). Thus, we apply Lemmas 5.1 and 5.2 to obtain that with high probability, the numbers of size three, size two and size one clauses with respect to $I'$ satisfy, for $i \le \log^5 n$,
Z'(l_i) = c' l_i^3/n^2 + O(n^{3/4}\log^6 n), \quad Y'(l_i) = l_i g(l_i/n) + O(n^{3/4}\log^{11} n), \quad W'(l_i) = O(\log n),

where $g(x) = 3c'(1 - x^2)/4 + \log x$. Let $N'$ be the number of stages before $n_J$ at which, in applying GUC to $I'$, the set of clauses contains two clauses $\{y\}, \{\bar{y}\}$ for some $y$. Since $Z'(l_i) \ge \hat{Z}(l_i)$, $Y'(l_i) \ge \hat{Y}(l_i)$ and $W'(l_i) \ge \hat{W}(l_i)$ with high probability, it follows (by considering the waiting times $b'_{i+1}$ defined above) that
\Pr(\text{GUCB backtracks at least } \log^5 n \text{ times before stage } n_J) \le \Pr(N' \ge \log^5 n) + o(1).

Using (5.13), we see that when GUC is applied to $I'$, we have with high probability that for all $j \ge n_J$, the number $W'(j)$ of size one clauses at stage $j$ is $O(\log n)$. Therefore, the probability that there is a contribution to $N'$ at stage $j$ equals $O(E[W'(j)^2/j])$. Since $W'(j) = O(n)$, we have $E[W'(j)^2] = O(\log^2 n)$, and hence

E[N'] = O(\log^4 n).

It therefore follows that

\Pr(N' \ge \log^5 n) = O(1/\log n).
This shows (6.22). To show (6.23), we have from (6.8)-(6.10) again that with high probability

\hat{Z}(n_J) = c n_J^3/n^2 + O(n^{3/4}\log^6 n), \quad \hat{Y}(n_J) = n_J f(n_J/n) + O(n^{3/4}\log^{11} n), \quad \hat{W}(n_J) = O(\log^4 n).

(Note that in the unlikely event where $n_J \in [b_i - 1, l_i]$ for some $i$, we may apply (6.1)-(6.3) and (6.6)-(6.7) to obtain the above estimates at stage $n_J$.) These values of $\hat{Z}, \hat{Y}, \hat{W}$ satisfy the hypotheses of Lemma 5.4. Thus, we obtain (6.23) from Lemma 5.4. Our proof of Theorem 1.2 is thus complete.
7 Proof of Theorem 1.3

We shall only give a sketch proof here. Consider SC when applied to a random instance of $k$-sat with $n$ variables and $m = \lfloor cn \rfloor$ clauses. We restrict our attention to

c > \frac{k-1}{k(k-2)}\left(\frac{k-1}{k-3}\right)^{k-3} 2^{k-3},

for otherwise SC succeeds with probability $1 - o(1)$ (see Chvátal and Reed [4]). Let $q_i(\tau)$ be the probability that a randomly selected clause from $C_k(V_n)$ is of size $i$ immediately before stage $\tau$. It is not difficult to check that for $i = 3, \ldots, k$,

q_i(\tau) = \binom{k}{i}(\tau/n)^i (1 - \tau/n)^{k-i}\, 2^{-(k-i)}.

Let $N_i'(\tau)$ be the number of size $i$ clauses at stage $\tau$. The above equation implies that with high probability, we have for $i = 3, \ldots, k$ that

N_i'(\tau) = \binom{k}{i}\frac{cn}{2^{k-i}}(\tau/n)^i (1 - \tau/n)^{k-i} + O(n^{1/2}\log n), \qquad (7.1)

whenever $\tau = \Omega(n)$. This gives a fairly accurate estimate for $N_3'(\tau)$ in particular. Fix a (small) constant $\epsilon > 0$. Recall that $\alpha_1$ is the largest root of the equation

p_1(x) = \binom{k}{3} c x^2 (1-x)^{k-3}\, 2^{-(k-3)} = 2/3.
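Though not part of the proof, the expression for $q_i(\tau)$ above can be checked by direct simulation: freeze which $n - \tau$ variables are assigned, give them uniform random values, and classify a uniform random $k$-clause by how many of its literals remain free, discarding satisfied clauses. The parameter values below ($k = 5$, $n = 1000$, $\tau = 600$) are illustrative choices of ours, not from the paper, and the match is only up to $O(1/n)$ effects from sampling clause variables without replacement.

```python
import random
from math import comb

def q(i, k, tau, n):
    # q_i(tau) = C(k,i) (tau/n)^i ((1 - tau/n)/2)^(k-i)
    return comb(k, i) * (tau / n) ** i * ((1 - tau / n) / 2) ** (k - i)

def empirical_q(k=5, n=1000, tau=600, trials=200000, seed=1):
    rng = random.Random(seed)
    counts = [0] * (k + 1)
    for _ in range(trials):
        variables = rng.sample(range(n), k)   # a random k-clause: k distinct variables
        size, satisfied = 0, False
        for v in variables:
            if v < n - tau:                   # variable already assigned ...
                if rng.random() < 0.5:        # ... and its literal happens to be true
                    satisfied = True
                    break
            else:
                size += 1                     # literal still free
        if not satisfied:
            counts[size] += 1
    return [c / trials for c in counts]

emp = empirical_q()
for i in range(3, 6):
    # exact values for these parameters: q_3 = 0.0864, q_4 = 0.1296, q_5 = 0.07776
    print(i, round(emp[i], 4), round(q(i, 5, 600, 1000), 4))
```

The empirical frequencies agree with $q_i(\tau)$ to within sampling error, which is the content of (7.1) once multiplied by the number of clauses $m = \lfloor cn \rfloor$.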
Let $\alpha_1' = \alpha_1 + \epsilon$ and $\alpha_1'' = \alpha_1 - \epsilon$. Note that $N_1'(\tau-1) + N_2'(\tau-1) - N_1'(\tau) - N_2'(\tau)$ is bounded above by

\eta'_{2,3}(\tau), \quad \text{if } N_1'(\tau) + N_2'(\tau) = 0; \qquad \eta'_{2,3}(\tau) - 1, \quad \text{otherwise};

where $\eta'_{2,3}(\tau)$, defined in Section 2, is the number of new size 2 clauses created at stage $\tau$. Since $\eta'_{2,3}(\tau)$ is a binomial variable with parameters $N_3'(\tau)$ and $3/(2\tau)$, and since for $\tau \ge \alpha_1' n$,

\frac{3}{2\tau}N_3'(\tau) = \frac{3}{2}\binom{k}{3} c (\tau/n)^2 (1 - \tau/n)^{k-3}\, 2^{-(k-3)} + O(n^{-1/2}\log n) < 1

with high probability, it follows from Lemma 3.4 (see also the proof of (5.13)) that for $\tau \ge \alpha_1' n$,

N_1'(\tau) + N_2'(\tau) = O(\log^2 n)

with high probability. This gives an upper bound for $N_2'(\tau)$, which in turn gives that with high probability,

\sum_{\alpha_1' n \le \tau \le n} N_1'(\tau) = O(\log^2 n).
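As a concrete illustration of where the critical window sits, the two roots $\alpha_0 < \alpha_1$ of $p_1(x) = 2/3$ can be located numerically; $p_1$ is maximized at $x = 2/(k-1)$, so the roots bracket that point. The values $k = 5$ and $c$ equal to $1.2$ times the lower bound on $c$ assumed at the start of this section are arbitrary illustrative choices of ours, as are the helper names (`c_lower`, `bisect`).

```python
from math import comb

def c_lower(k):
    # density above which p_1 attains the value 2/3 somewhere in (0, 1)
    return (k - 1) / (k * (k - 2)) * ((k - 1) / (k - 3)) ** (k - 3) * 2 ** (k - 3)

def bisect(f, lo, hi, iters=100):
    # simple bisection; assumes f(lo) and f(hi) have opposite signs
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

k = 5
c = 1.2 * c_lower(k)                       # c = 5.12 for k = 5

def p1(x):
    # p_1(x) = C(k,3) c x^2 (1-x)^(k-3) / 2^(k-3)
    return comb(k, 3) * c * x ** 2 * (1 - x) ** (k - 3) / 2 ** (k - 3)

xstar = 2 / (k - 1)                        # p_1 is maximized at x = 2/(k-1)
assert p1(xstar) > 2 / 3                   # so both roots exist for this c
alpha0 = bisect(lambda x: p1(x) - 2 / 3, 1e-9, xstar)      # smallest root
alpha1 = bisect(lambda x: p1(x) - 2 / 3, xstar, 1 - 1e-9)  # largest root
print(round(alpha0, 4), round(alpha1, 4))  # -> 0.3524 0.6476
```

For these values the window is roughly $[0.35n, 0.65n]$: above $\alpha_1' n$ the drift condition $(3/(2\tau))N_3'(\tau) < 1$ used above holds, and clauses of size at most two stay logarithmically few.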
The expected number of empty clauses created before stage $\alpha_1' n$ thus equals $O(\log^2 n/n) = o(1)$. Hence

\lim_{n\to\infty}\Pr(\text{SC fails at or before stage } \alpha_1' n) = 0. \qquad (7.2)

Furthermore, for $\tau$ between $\alpha_1'' n$ and $\alpha_1' n$, it is not difficult to obtain that there is $\delta_1(\epsilon)$ which tends to 0 as $\epsilon \to 0$ such that

N_2'(\tau) \le \delta_1(\epsilon)\, n \qquad (7.3)

with high probability. This gives an upper bound for $N_1'(\tau)$, and it is not difficult to obtain in a similar (but simpler) fashion as in our proof of Theorem 1.1(b) that there is $\delta_2(\epsilon)$, where $\delta_2(\epsilon) \to 0$ as $\epsilon \to 0$, such that

\lim_{n\to\infty}\Pr(\text{SC fails at a stage between } \alpha_1'' n \text{ and } \alpha_1' n) \le \delta_2(\epsilon). \qquad (7.4)
Suppose we allow SC to have limited backtracking (as described for GUC in the previous section). Then in view of (7.2) and (7.4), the theorem follows from the following lemma.
Lemma 7.1 For all small $\epsilon > 0$,

\lim_{n\to\infty}\Pr(\text{SCB fails at or after stage } \alpha_1'' n) = 0.
We do not prove Lemma 7.1. Instead, we give a sketch proof of Lemma 7.2 below. (The hypothesis $c < c_k$ means that Lemmas 7.1 and 7.2 can be proved similarly.)
Lemma 7.2 Let $n_0 = \lfloor \alpha_1'' n \rfloor$ and let $V_0$ be a set of $n_0$ variables. Let $I$ be a random formula with $\hat{N}_i(n_0)$ clauses of size $i$, where for $i = 3, \ldots, k$,

\hat{N}_i(n_0) = \binom{k}{i}\frac{cn}{2^{k-i}}(n_0/n)^i (1 - n_0/n)^{k-i} + O(n^{1/2}\log n)

and $\hat{N}_i(n_0) = 0$ for $i = 0, 1, 2$. Each size $i$ clause in $I$ is chosen at random (with equal probability) and independently from $C_i(V_0)$. Then

\lim_{n\to\infty}\Pr(\text{SCB, applied to } I, \text{ fails at or after stage } n_0) = 0.
This lemma can be proved in a way similar to our proof of Theorem 1.2. The key point is that when SC (without backtracking) is applied to $I$, we can follow our proof of (5.5) to obtain an estimate for the number $N_2'(\tau)$ of clauses of size two. Indeed, if $h = \lfloor n^{1/2} \rfloor$, $n_i = n_0 - ih$, $I_i = \{n_i + 1, \ldots, n_{i-1}\}$ and $J$ is the greatest integer such that $n_0 - Jh \ge \alpha_2 n + n^{3/4}$, where $\alpha_2$ is defined later, then we have with high probability that

N_2'(n_i) = \frac{kcn_i}{2^{k-1}}\left[(1 + (k-2)n_i/n)(1 - n_i/n)^{k-2} - (1 + (k-2)\alpha_1'')(1 - \alpha_1'')^{k-2}\right] + n_i \log\frac{n_i}{\alpha_1'' n} + O(i\, n^{1/2}\log^2 n), \qquad (7.5)
which can be proved using induction and difference equations as in Lemma 5.1. Intuitively, the above equation can be obtained as follows. Let

p_2(x) = \frac{kc}{2^{k-1}}(1 + (k-2)x)(1-x)^{k-2} + \log x.

Note that $p_2(x) - p_2(\alpha_1'')$ is an approximation to $N_2'(\lfloor xn \rfloor)/\lfloor xn \rfloor$ according to (7.5). We define $\alpha_2 < \alpha_1$ as the smallest number such that $p_2(x) - p_2(\alpha_1'') \ge 0$. Note also that

\frac{dp_2}{dx} = \frac{1}{x}\left(1 - \frac{3}{2}\, p_1(x)\right). \qquad (7.6)

Thus $p_2(x)$ is maximized when $x = \alpha_0$, the smallest root of $p_1(x) = 2/3$. Note that

p_2(\alpha_0) - p_2(\alpha_1) = \frac{1}{(k-1)(k-2)}\left(\frac{1}{\alpha_0^2} + \frac{k-3}{\alpha_0} - \frac{1}{\alpha_1^2} - \frac{k-3}{\alpha_1}\right) + \ln(\alpha_0/\alpha_1),

which is less than 1 according to the hypothesis of the theorem. Thus, taking (7.5) as induction hypothesis, we see that $N_2'(n_i)/n_i$ is, with high probability, at most a constant which is less than 1. This means that we can apply the results in Section 2 to approximate $N_1'(\tau)$, and in particular obtain that (see before (3.2))

\Pr(N_1'(\tau) = 0) \ge 1 - N_2'(\tau)/\tau.

This shows that
E[N_2'(\tau-1) - N_2'(\tau)] \le E[\eta'_{2,3}(\tau)] - \Pr(N_1'(\tau) = 0) \le \frac{3}{2\tau}E[N_3'(\tau)] + \frac{1}{\tau}E[N_2'(\tau)] - 1.

Putting $\varphi(x) = E[N_2'(\lfloor xn \rfloor)]/\lfloor xn \rfloor$, we have for small $h > 0$ that
\varphi(x-h) - \varphi(x) = \frac{1}{xn}E[N_2'(\lfloor xn - hn \rfloor)](1 + h/x + O(h^2)) - \frac{1}{xn}E[N_2'(\lfloor xn \rfloor)]
= \frac{1}{xn}\left(E[N_2'(\lfloor xn - hn \rfloor)] - E[N_2'(\lfloor xn \rfloor)]\right) + \frac{h}{x^2 n}E[N_2'(\lfloor xn \rfloor)] + O(h^2)
\le \frac{h}{x}\left(\frac{3}{2}p_1(x) - 1\right) + \frac{h}{x}\varphi(x) + O(h^2).

So $\varphi(x)$ should stay close to the solution of the differential equation (7.6). The induction proof of (7.5) is completed by showing that $N_2'(n_{i+1}) - N_2'(n_i)$ is close to its mean. It can be shown that the Claim in Section 5 remains true for SCB when applied to $I$. That is, the set of clauses after each (limited) backtracking remains uniformly random. Therefore, our proof of (6.22) and the statement before it can be extended to show that

\Pr(\text{SCB, applied to } I, \text{ fails at a stage between } n_J \text{ and } n_0) = o(1). \qquad (7.7)
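The relation between $p_1$, $p_2$ and (7.6) admits a quick numerical cross-check (outside the proof). Below, $k = 5$ and $c = 5.12$ are illustrative values chosen so that $p_1$ attains $2/3$, and $\alpha_0 \approx 0.35241$, $\alpha_1 \approx 0.64759$ are the corresponding roots of $p_1(x) = 2/3$, found numerically; the code confirms that Euler integration of (7.6) reproduces $p_2$, and that $p_2(\alpha_0) - p_2(\alpha_1)$ agrees with the closed-form difference displayed above.

```python
from math import comb, log

k, c = 5, 5.12                            # illustrative values, not from the paper
alpha0, alpha1 = 0.3524118, 0.6475881     # roots of p_1(x) = 2/3 for these k, c

def p1(x):
    return comb(k, 3) * c * x ** 2 * (1 - x) ** (k - 3) / 2 ** (k - 3)

def p2(x):
    return k * c / 2 ** (k - 1) * (1 + (k - 2) * x) * (1 - x) ** (k - 2) + log(x)

# Euler integration of (7.6), dp2/dx = (1 - (3/2) p1(x))/x, from alpha1 down to alpha0
x, p, h = alpha1, p2(alpha1), 1e-6
while x > alpha0:
    p -= h * (1 - 1.5 * p1(x)) / x
    x -= h
assert abs(p - p2(alpha0)) < 1e-3         # ODE solution matches the closed form p2

# the displayed expression for p2(alpha0) - p2(alpha1), using p1(alpha_j) = 2/3
diff = (1 / ((k - 1) * (k - 2))) * (1 / alpha0 ** 2 + (k - 3) / alpha0
                                    - 1 / alpha1 ** 2 - (k - 3) / alpha1) \
       + log(alpha0 / alpha1)
assert abs(diff - (p2(alpha0) - p2(alpha1))) < 1e-4
print(round(diff, 4))                     # about 0.079, comfortably below 1
```

For these parameters the predicted maximum of $N_2'(\tau)/\tau$ is thus only about $0.08$, well below the constant 1 required by the induction above.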
It therefore remains to show that

\Pr(\text{SCB, applied to } I, \text{ backtracks at and after stage } n_J) = o(1). \qquad (7.8)

Proving (7.8) requires a result similar to Lemma 5.4. Since the backtracking in SCB does not change $N_i'(\tau)$ by much, we have in particular estimates for $\hat{N}_i(n_J)$ (similar to those given in (7.1) and (7.5)). Thus, as in the proof of Lemma 5.4, there is (with high probability) $\tilde{n} \le n_J$ such that $\hat{N}_1(\tilde{n}) = \hat{N}_2(\tilde{n}) = 0$ and such that for $i = 3, \ldots, k$ and for $\tau \le \tilde{n}$, $\hat{N}_i(\tau)$ can be approximated by estimates similar to those given in (7.1). Note that for $\tau < \tilde{n}$, $\hat{N}_3(\tau)/\tau$ is less than a constant which is less than $2/3$. Thus, similar to (7.2) and (7.4), we have (7.8).
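To make the quantities of this section concrete, here is a toy simulation of SC without backtracking (pick a random literal from a smallest clause, satisfy it, simplify) on a small random instance, again with the illustrative parameters $k = 5$, $c = 5.12$. It tracks the number of size-two clauses and any empty clauses created; by the discussion above one expects $N_2'(\tau)/\tau$ to stay near $p_2(\tau/n) - p_2(\alpha_1)$, i.e. below roughly $0.08$, with failures rare. This is an illustration only, and tie-breaking details may differ from the paper's SC.

```python
import random
from collections import defaultdict

def sc_run(n=1000, k=5, c=5.12, seed=7):
    """Shortest-clause heuristic; returns (#empty clauses created, max N_2/tau)."""
    rng = random.Random(seed)
    m = int(c * n)
    clauses, occ = [], defaultdict(list)    # occ: variable -> clause indices
    for j in range(m):
        vs = rng.sample(range(n), k)        # k distinct variables
        clauses.append({(v, rng.randrange(2)) for v in vs})  # (var, required value)
        for v in vs:
            occ[v].append(j)
    alive = [True] * m                      # neither satisfied nor empty
    unassigned = set(range(n))
    empties, max_ratio = 0, 0.0
    while unassigned:
        tau = len(unassigned)
        best, n2 = None, 0
        for j in range(m):                  # find a smallest live clause
            if alive[j]:
                s = len(clauses[j])
                n2 += (s == 2)
                if best is None or s < len(clauses[best]):
                    best = j
        if tau >= n // 4:                   # ignore the noisy endgame
            max_ratio = max(max_ratio, n2 / tau)
        if best is not None:
            var, value = rng.choice(sorted(clauses[best]))   # satisfy this literal
        else:
            var, value = next(iter(unassigned)), rng.randrange(2)
        unassigned.discard(var)
        for j in occ[var]:
            if not alive[j]:
                continue
            if (var, value) in clauses[j]:
                alive[j] = False            # clause satisfied
            else:
                clauses[j].discard((var, 1 - value))
                if not clauses[j]:          # empty clause: SC would fail here
                    alive[j] = False
                    empties += 1
    return empties, max_ratio

empties, max_ratio = sc_run()
print(empties, max_ratio)
```

On runs at this size the size-two density indeed stays a small fraction of $\tau$, in line with (7.5); SCB's limited backtracking is what absorbs the occasional empty clause that a plain SC run can produce.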
8 Other Models

We observe that replacing $m = \lfloor cn \rfloor$ by $m = \lfloor (c + o(1))n \rfloor$ yields exactly the same results above.

(a) Suppose we allow $x, \bar{x}$ in the same clause. Remove such clauses as they are always satisfied. With high probability there are $o(n)$ such clauses and what is left is random.

(b) Suppose we do not allow repetition of the same clause. Remove repetitions and argue as in (a).

(c) Suppose clauses are distinct but unordered, as are the literals in a clause. This follows from (b), as each instance in this model gives rise to the same number, $m!(k!)^m$, of instances of Model (b).

(d) If we allow a clause to have a repeated literal, then this is the same as starting with a few clauses of size $k-1$ (with high probability no smaller clauses will occur). Nothing significant will happen, but one has to check that the analysis is essentially unaffected.
Acknowledgement: We thank Boris Pittel for pointing out errors and providing help on an earlier version of this paper.
References

[1] A.Z. Broder, A.M. Frieze and E. Upfal, On the satisfiability and maximum satisfiability of random 3-CNF formulas, Proceedings of the 4th Annual ACM-SIAM Symposium on Discrete Algorithms (1993).

[2] M.T. Chao and J. Franco, Probabilistic analysis of two heuristics for the 3-satisfiability problem, SIAM Journal on Computing 15 (1986) 1106-1118.

[3] M.T. Chao and J. Franco, Probabilistic analysis of a generalization of the unit-clause literal selection heuristics for the k-satisfiability problem, Information Sciences 51 (1990) 289-314.

[4] V. Chvátal and B. Reed, Mick gets some (the odds are on his side), Proceedings of the 33rd IEEE Symposium on Foundations of Computer Science (1992) 620-627.

[5] V. Chvátal and E. Szemerédi, Many hard examples for resolution, Journal of the ACM 35 (1988) 759-768.

[6] M. Davis and H. Putnam, A computing procedure for quantification theory, Journal of the ACM 7 (1960) 201-215.

[7] A. Goerdt, A threshold for unsatisfiability, Proceedings of the 17th International Symposium on Mathematical Foundations of Computer Science, Prague, August 1992.

[8] A. Goldberg, Average case complexity of the satisfiability problem, Proceedings of the 4th Workshop on Automated Deduction (1979) 1-6.

[9] A. Kamath, R. Motwani, K. Palem and P. Spirakis, Why Mick doesn't get any: thresholds for (un)satisfiability, to appear.

[10] D. Knuth, R. Motwani and B. Pittel, Stable husbands, Random Structures and Algorithms 1 (1990) 1-14.

[11] T. Larrabee, Evidence for the satisfiability threshold for random 3CNF formulas.

[12] A. El Maftouhi and W. Fernandez de la Vega, On random 3-SAT, to appear.

[13] D. Mitchell, B. Selman and H. Levesque, Hard and easy distributions of SAT problems, Proceedings of the 10th National Conference on Artificial Intelligence (AAAI-92) (1992) 459-465.