A more rapidly mixing Markov chain for graph colourings

Report 2 Downloads 38 Views
A more rapidly mixing Markov chain for graph colourings Martin Dyer and Catherine Greenhill 15 August 1997 Abstract

We de ne a new Markov chain on (proper) k-colourings of graphs, and relate its convergence properties to the maximum degree  of the graph. The chain is shown to have bounds on convergence time appreciably better than those for the wellknown Jerrum/Salas{Sokal chain in most circumstances. For the case k = 2, we provide a dramatic decrease in running time. We also show improvements whenever the graph is regular, or fewer than 3 colours are used. The results are established using the method of path coupling. We indicate that our analysis is tight by showing that the couplings used are optimal in a sense which we de ne.

1 Introduction Markov chains on the set of proper colourings of graphs have been studied in computer science [9] and statistical physics [13]. In both applications, the rapidity of convergence of the chain is the main focus of interest, though for somewhat di erent reasons. The papers [9, 13] introduced a simple Markov chain, which we shall refer to as the JSS chain, and related its convergence properties to the maximum degree  of the graph. Speci cally, they established (independently) that the chain converges in time polynomial in the size of the graph (i.e. has the rapid mixing property), provided that number of colours k exceeds 2. (Note that the JSS chain is known to converge eventually provided only k   + 2.) Jerrum proved this using the coupling technique while Salas and Sokal used Dobrushin's uniqueness criterion. The two results have di erent merits. Jerrum's result can be extended (but with an  (n2) increase1 in running time) to the case k = 2, whereas Salas and Sokal's extends to the more general situation of the Potts model [2]. The transition procedure of the JSS chain is simple: choose a vertex v and a colour c both uniformly at random and recolour v with c if this results in v being properly coloured. A slight variation of this procedure involves choosing the colour c uniformly at random from the set of colours which could properly recolour v . The contribution of this paper is to de ne and analyse a new Markov chain on colourings. The transition procedure of the new chain can be thought of as an extension of the second version of the JSS chain. An edge, rather than a vertex, is chosen uniformly at random and the endpoints of the edge are (properly) recoloured uniformly at random from the permissible colour pairs. 1

 ( ) is the notation which hides factors of log(n) 

1

We establish the following properties for the new chain. If k = 2 then our bound on convergence time for the new chain is  (n2) faster than that for the JSS chain. Further, we show that the new chain always requires fewer than half the number of steps of the JSS chain whenever 2  k < 3, or the underlying graph is regular. This is somewhat surprising, since one step of the new chain is no more complex than two steps of the Jerrum chain. Our convergence results are obtained using the method of path coupling, introduced in [3]. We give a rather precise analysis of the chain, and provide some evidence that coupling techniques are unlikely to improve upon our analysis of this chain. The reader may be tempted to think that our results simply imply that the JSS chain is actually faster than its known bound. This could be the case, but we stress that there is at present absolutely no evidence for this hypothesis. The improved convergence rate over the JSS chain may be an artefact of the analysis, or may represent a genuinely faster convergence, since the new chain certainly has \more power" than two steps of JSS. For example, we prove that the new chain converges eventually for k   + 1. Although we do not address the question here, it can be shown that the problem of counting proper k-colourings of arbitrary graphs of maximum degree  is #P -hard for any xed k,  such that k   + 1,   3. The proof takes as its starting point the known fact that counting proper k-colourings of arbitrary graphs is #P -hard when k  3 (see [8]). Proofs are presented in [6]. The plan of the paper is as follows. A review of the path coupling method is presented in Section 2. The rapid mixing of the new chain is established in Section 3 and a variation of the chain is de ned which improves the mixing time for nonregular graphs. The mixing times of the JSS chain and the new chain are then compared. Finally, in Section 4 it is shown that the couplings used to prove rapid mixing of the new chain are optimal couplings in a precise sense which we will de ne.

2 The path coupling method In this section some notation is introduced and the path coupling method is reviewed. Let be a nite set and let M be a Markov chain with state space , transition matrix P and unique stationary distribution . In order for a Markov chain to be useful for almost uniform sampling or approximate counting, it must converge quickly towards its stationary distribution  . We make this notion more precise below. If the initial state of the Markov chain is x then the distribution of the chain at time t is given by Pxt (y) = P t (x; y). The total variation distance of the Markov chain from  at time t, with initial state x, is de ned by X dTV (Pxt ;  ) = 21 jP t (x; y ) ?  (y )j: y2

Following Aldous [1], let x (") denote the least value T such that dTV (Pxt ;  )  " for all t  T . The mixing time of M, denoted by  ("), is de ned by  (") = max fx (") : x 2 g. A Markov chain will be said to be rapidly mixing if the mixing time is bounded above by some polynomial in n and log("?1 ), where n is a measure of the size of the elements of . Throughout this paper all logarithms are to base e. 2

There are relatively few methods available to prove that a Markov chain is rapidly mixing. One such method is coupling. A coupling for M is a stochastic process (Xt; Yt ) on  such that each of (Xt ), (Yt ), considered marginally, is a faithful copy of M. The Coupling Lemma (see for example, Aldous [1]) states that the total variation distance of M at time t is bounded above by Prob[Xt 6= Yt ], the probability that the process has not coupled. The following result is used to obtain an upper bound on this probability and hence an upper bound for the mixing time. The proof of the second statement follows that given in [12, Lemma 4].

Theorem 2.1 Let (Xt; Yt) be a coupling for the Markov chain M and let  be any integer valued metric de ned on  . Suppose that there exists  1 such that E[(Xt+1; Yt+1)]  (Xt; Yt)

for all t. Let D be the maximum value that  achieves on  . If < 1 then the mixing time  (") of M satis es D"?1 ) :  (")  log( (1 ? ) If = 1 and there exists > 0 such that

Prob[ (Xt+1; Yt+1 ) 6=  (Xt; Yt )]  for all t, then

2

?1  (")  d eD edlog(" )e:

Proof. Clearly

E[(Xt; Yt )]  t(X0; Y0)  tD;

and, since  is nonnegative and integer valued, Prob[Xt 6= Yt ]  E[ (Xt; Yt )]: If < 1 then the Coupling Lemma implies that  (")  log(D"?1 )= log( ?1 ). The rst statement follows as log( ?1 ) > 1 ? . Suppose now that = 1. Then the process Z (t) de ned by Z (t) = (D ?  (Xt; Yt ))2 ? t is a submartingale, since   E ((Xt+1; Yt+1) ? (Xt; Yt))2  : The di erences Z (t +1) ? Z (t) are bounded. Let T x;y denote the rst time that Xt = Yt where X0 = x and Y0 = y . Then T x;y is a stopping time for Z . The Optional Stopping theorem for submartingales (see for example, [15, Theorem 10.10]) applies and allows us to conclude that 2 E [T x;y ]  (x; y)(2D ? (x; y))  D :



3



Let T = deD2 = e. By Markov's inequality, the probability that T x;y  T is at most e?1 . If we run s independent trials of length T then the probability that we have not coupled by the end of the sT steps is at most e?s . Let t = sT . Now e?s  " if and only if s  log("?1 ). Since s is an integer it follows that t  T dlog("?1 )e: This gives the stated bound on the mixing time  ("). In general, the problem of obtaining an estimate for is quite dicult. The path coupling method, introduced in [3], provides a simpli cation. It involves de ning a coupling (Xt; Yt ) by considering a path, or sequence Xt = Z0 ; Z1 ; : : : ; Zr = Yt between Xt and Yt where the Zi satisfy certain conditions.

Theorem 2.2 Let  be an integer valued metric de ned on  which takes values in f0; : : : ; Dg. Let S be a subset of  such that for all (Xt; Yt) 2  there exists a path

Xt = Z0 ; Z1 ; : : : ; Zr = Yt between Xt and Yt such that (Zl ; Zl+1 ) 2 S for 0  l < r and r?1 X l=0

(Zl; Zl+1) = (Xt; Yt):

De ne a coupling (X; Y ) 7! (X 0; Y 0 ) of the Markov chain M on all pairs (X; Y ) 2 S . Apply this coupling along the given sequence from Xt to Yt to obtain the new sequence Z00 ; Z10 ; : : : ; Zr0 . Then (Xt; Yt) is also a coupling for M, where Xt+1 = Z00 and Yt+1 = Zr0 . Moreover, if there exists  1 such that E [(X 0; Y 0 )]  (X; Y ) for all (X; Y ) 2 S then E [(Xt+1; Yt+1)]  (Xt; Yt):

Proof. Clearly (Xt), (Yt) are faithful copies of M, as they are constructed from inductive application of a coupling of M de ned on elements of S . Now E [(Xt+1; Yt+1)]  E

" r?1 X

l=0 rX ?1 

(Zl0 ; Zl0+1 )

E (Zl0; Zl0+1)

=

l=0 r?1 X



l=0

(Zl; Zl+1)

=  (Xt; Yt ); 4

#



using linearity of expectation and the fact that  is a metric. This proves the theorem.

Remark 2.1 The path coupling technique can greatly simplify the arguments required to prove rapid mixing by coupling. For example, the set S is often taken to be

S = f(X; Y ) 2  : (X; Y ) = 1g : Here one need only de ne and analyse a coupling on pairs which are at distance 1 apart. Some applications of path coupling can be found in [4, 3, 5, 7].

3 A new Markov chain for graph colourings Let G = (V; E ) be a graph and let  be the maximum degree of G. Let k be a positive integer and let C be a set of size k. A map from V to C is called a k-colouring (or simply a colouring when k is xed). A vertex v is said to be properly coloured in the colouring X if v is coloured di erently from all of its neighbours. A colouring X is called proper if every vertex is properly coloured in X . A necessary and sucient condition for the existence of a proper k-colouring of every graph with maximum degree  is k   + 1. Let k (G)  C V be the set of all proper k-colourings of G. We will denote the colour assigned to a vertex v in the colouring X by X (v ). A simple Markov chain with state space k (G) was proposed by Jerrum [9] and by Salas and Sokal [13]. (We refer to this chain as the JSS chain.) If the JSS chain is in state Xt at time t then the state at time t + 1 is determined using the following procedure: choose a vertex v uniformly at random from V and choose a colour c uniformly at random from C . Let X 0 denote the colouring obtained from X by recolouring vertex v with the colour c. If v is properly coloured in X 0 then let Xt+1 = X 0, otherwise let Xt+1 = Xt . In the transition procedure the colour c is chosen uniformly at random from the set C of all colours. This is an example of a so called Metropolis Markov chain (see [10, p. 511]). There is a nonzero probability that the state X 0 will be rejected. We can remove this probability by choosing the colour c uniformly at random from the set Xt (v ) = C n fXt(u) : fu; v g 2 E g :

(1) This set contains just those colours which can be used to colour the vertex v properly in Xt . We refer to this amended chain as the heat bath JSS chain and denote this chain by J ( k (G)). Both the original JSS chain and the heat bath JSS chain are irreducible for k   + 2 and rapidly mixing for k  2. A simple path coupling argument can be used to establish Jerrum's result, that the mixing time J (") of J ( k (G)) satis es  n log(n"?1 ) J (")  kk??2 (2) when k > 2. Path coupling also establishes that J (")  6n3 log("?1 ) when k = 2. In this section we de ne a new Markov chain M1( k (G)) with state space k (G). We will show that M1( k (G)) is irreducible for k   + 1 and rapidly mixing for 5

k  2. The chain apparently mixes most rapidly when the graph G is -regular. By

introducing self-loop edges, the -regular mixing time can be achieved for non-regular graphs if 2  k < 3. At the end of the section, a comparison is made between the mixing times of the new chain and the mixing time of the heat bath JSS chain. It will be shown that the new chain is faster than the heat bath JSS chain if G is -regular or if 2  k  3 ? 1. If k = 2 then the new chain is  (n2) times faster than the JSS chain. We will use the metric known as the Hamming distance. Given X , Y 2 k (G) the Hamming distance H (X; Y ) between X and Y counts the number of vertices coloured di erently in X and Y . We begin by de ning the transitions of the Markov chain M1( k (G)). A transition from state X involves choosing an edge fv; wg uniformly at random from E , then choosing colours c(v ), c(w) uniformly at random from those pairs such that both v and w are properly coloured in the colouring obtained from X by recolouring v with c(v) and w with c(w). To make this more precise, we make some de nitions. For each edge e = fv; wg in E let Sv;w;X be the set of colours de ned by

Sv;w;X = fX (u) : fu; vg 2 E; u 6= wg : Let C e denote the set of all maps which assign colours to both endpoints of e, that is

C e = fc : fv; wg ! C g : We will refer to elements of C e as colour maps. Let AX (e) denote the subset of C e de ned by

AX (e) = fc 2 C e : c(v) 2 C n Sv;w;X ; c(w) 2 C n Sw;v;X ; c(v) 6= c(w)g : Given e = fv; wg 2 E and c 2 C e , denote by Xe!c the colouring obtained from X by

recolouring v by c(v ) and w by c(w). Then both v and w are properly coloured in Xe!c if and only if c 2 AX (e). Suppose that the Markov chain M1( k (G)) is in state Xt at time t. Then the state at time t + 1 is determined by the following procedure: (i) choose an edge e = fv; wg uniformly at random from E , (ii) choose c uniformly at random from AXt (e) and let Xt+1 = Xe!c . The chain M1( k (G)) is clearly aperiodic. We prove below that M1( k (G)) is irreducible for k   + 1. Therefore M1( k (G)) is ergodic when k   + 1. Let NX (e) = jAX (e)j for each edge e = fv; wg 2 E and write w  v if w is a neighbour of v. The transition matrix P of M1( k (G)) has entries 8P ?1 > >Pe2E ( E NX (e)) > > > > < wv ( E NX ( v; w

j j j j

P (X; Y ) = (jE jNX(e))?1 > > > > > > :

f

if X = Y ;

g))?1 if X and Y di er only at the vertex v, if X , Y di er only at both endpoints of the edge e, otherwise.

0 6

for all X; Y 2 k (G). Suppose that X and Y di er just at one or both endpoints of the edge e = fv; wg. Then Sv;w;X = Sv;w;Y and Sw;v;X = Sw;v;Y . Hence AX (e) = AY (e) and NX (e) = NY (e). This implies that the transition matrix P is symmetric. Therefore the stationary distribution of M1 ( k (G)) is the uniform distribution on k (G). Given any colouring X and any edge fv; wg, the sets Sv;w;X , Sw;v;X both have at most ( ? 1) elements. Since we may easily show that

NX (e) = (k ? jSv;w;X j)(k ? jSw;v;X j) ? (k ? jSv;w;X [ Sw;v;X j)

(3)

it follows that NX (e)  (k ? )(k ?  + 1) for all edges e 2 E . Therefore NX (e) is always positive when k   + 1. The next result shows that M1 ( k (G)) is irreducible for k   + 1.

Theorem 3.1 Let G be a graph with maximum degree . The Markov chain M1( k (G)) is irreducible for k   + 1. Proof. The new chain can perform any move of the JSS chain and the latter is known to be irreducible for k  +2. Therefore we need only show that M1 ( k (G)) is irreducible when k =  + 1. Let X and Y be two elements of k (G) where H (X; Y )  1. We show that H (X; Y ) can be reduced by performing a suitable series of transitions of M1( k (G))

with initial state X , relabelling the resulting state by X after each transition (here Y remains xed). Suppose that the colourings disagree at vertex v . Let U be the set of neighbours u of v such that X (u) = Y (v ). Assume that U is nonempty, otherwise H (X; Y ) can be decreased by recolouring v with Y (v). Clearly all vertices in U are non-adjacent and the colourings X , Y disagree at each element of U . Any vertex u 2 U which has fewer than  colours surrounding it can be coloured with some colour other than Y (v ), decreasing jU j without increasing H (X; Y ). Therefore we may assume that each element of U has  di erently coloured neighbours. In particular v is the only neighbour of u coloured X (v ), for all u 2 U . If jU j > 1 then v can be legally recoloured with some colour c(v ) 6= X (v ) without changing H (X; Y ). Then all vertices in U may be recoloured with X (v ) without increasing H (X; Y ). Finally v may be recoloured with Y (v), decreasing H (X; Y ) by 1. Otherwise jU j = 1. If u is the unique element of U then the edge fu; v g may be chosen and the colours of its endpoints may be switched by one move of the chain M1( k (G)). This move decreases H (X; Y ) by at least 1. We now prove that the new chain is rapidly mixing when k  2, using path coupling on pairs at Hamming distance 1 apart. The result is stated in terms of the minimum degree  of the graph G.

Theorem 3.2 Let G be a graph with minimum degree  and maximum degree . The Markov chain M1 ( k (G)) is rapidly mixing for k  2. The mixing time 1(") of M1( k (G)) satis es ? )(k ?  + 1) ?1 1 (")  (k2 ?(k(3 ? 2)k + 2( ? 1)2) jE j log(n" ): 7

Let T (") denote the mixing time of M1( k (G)) in the case that G is -regular. Then ? )(k ?  + 1) ?1 T (")  2(k2 ?(k(3 ? 2)k + 2( ? 1)2) n log(n" ):

Proof. Let (Xt; Yt) be a pair of elements of k (G)  k (G) and let r = H (Xt; Yt). We wish to de ne a path

Xt = Z0 ; Z1; : : : ; Zr = Yt (4) between Xt and Yt of length r such that H (Zi; Zi+1 ) = 1 for 0  i < r. In general we

cannot guarantee that such a path contains only proper colourings. To see this, consider elements Xt; Yt which di er only at both endpoints of a single edge fv; wg such that

Y (v) = X (w); Y (w) = X (v): However, we can always construct such a path with elements Zi in C V . The Markov Chain M1 ( k (G)) can be extended to the set C V of all colourings, since it does not assume that the current state Xt is a proper colouring in the transition procedure. Denote the extended chain by M(C V ). Since NX (e) is positive for any edge e and any colouring X , a transition is always possible at e in X and both the endpoints of the chosen edge will be properly coloured after the transition. Moreover, no properly coloured vertex can become improperly coloured by subsequent transitions. Therefore the improper colourings are transient states, and the stationary distribution  of M(C V ) is given by (

?1 (X ) = j k (G)j if X 2 k (G),

0

otherwise.

The chain M1( k (G)) acts exactly as the chain M(C V ) when the starting state of both chains is a proper colouring. Hence if M(C V ) is rapidly mixing then M1( k (G)) is rapidly mixing with no greater mixing time. We now prove that M(C V ) is rapidly mixing using the path coupling method on pairs at Hamming distance 1 apart. Let X and Y be elements of C V which di er just at the vertex v . The sets Sw;u;X , Sw;u;Y are equal when one of u or w equal v, or when fw; vg 62 E and fu; vg 62 E . We drop the subscripts X , Y for these sets. Let d denote the degree of the vertex v . The only edges which can a ect the Hamming distance between X and Y are the d edges fv; wg and all edges fw; ug where fw; v g 2 E and u 6= v . If the chosen edge is e = fv; wg then AX (e) = AY (e). Moreover these sets are nonempty. Therefore we couple at edges e = fv; wg by choosing the same colour map c uniformly at random from AX (e) and letting X 0 = Xe!c and Y 0 = Ye!c . Here H (X 0; Y 0 ) = 0 for each choice of c. Now suppose that the chosen edge is e = fw; ug where fw; v g 2 E and u 6= v . We proceed as follows: (i) choose cX 2 AX (e) uniformly at random and let cY 2 AY (e) be chosen as described below, 8

(ii) let X 0 = Xe!cX and let Y 0 = Ye!cY . For the remainder of the proof write AX for AX (e) as the edge e is xed: similarly write AY , NX , NY . For ease of notation a colour map c 2 C e will be written as hc(w); c(u)i. Note that 8 > 2 if cX (w) 6= cY (w) and cX (u) 6= cY (u), :

1 otherwise.

There are two con gurations, depending on whether fu; v g 2 E . If fu; v g 62 E then call fw; ug a Con guration 1 edge, while if fu; vg 2 E then call fw; ug a Con guration 2 edge. Let M be de ned by (5) M = (k ? k?+1)(+k2? ) : In the case analysis that follows, we will show that every Con guration 1 edge contributes less than M to E [H (X 0; Y 0 ) ? 1] and every Con guration 2 edge contributes less than 2M . Let d(w) denote the degree of the vertex w and write v  w if w is a neighbour of v. Then the total contribution of all Con guration 1 and Con guration 2 edges is less than X

wv

(d(w) ? 1)M  d( ? 1)M:

(6)

A Con guration 2 edge is allowed to contribute twice as much as a Con guration 1 edge since it is counted twice in the summation given in (6). Before embarking upon the case analysis we must establish a useful inequality. Let fw; ug be any edge such that fw; vg 2 E and u 6= v. We claim that k ? jSu;w;X j + 1  M: (7)

NX

Let a = jSu;w;X j. Then a   ? 1. After some rearranging, this implies that (k ? )(k ? a + 1)  (k ?  + 2)(k ? a ? 1):

(8)

Let b = jSw;u;X j. Then b   ? 1 and jSw;u;X [ Su;w;X j  b. It follows that (k ? a ? 1)(k ?  + 1)  (k ? a ? 1)(k ? b)  (k ? a)(k ? b) ? (k ? jSw;u;X [ Su;w;X j) = NX : Combining this with (8), we see that (k ? )(k ? jSu;w;X j + 1)(k ?  + 1)  (k ?  + 2)NX ; establishing (7). Note that (7) also holds with the roles of w and u reversed, or after replacing X by Y , or both. This will be used throughout the remainder of the proof. 9

Now the di erent cases which arise for Con guration 1 and Con guration 2 edges are identi ed. Then a coupling will be described for each subcase and the contribution of the coupling to E [H (X 0; Y 0 ) ? 1] will be calculated. Suppose rst that fw; ug is a Con guration 1 edge. Here Su;w;X = Su;w;Y so we drop the subscripts X , Y . If X (v ) 2 Sw;u;Y and Y (v ) 2 Sw;u;X then AX = AY . Here we can always use the same colour map c in both X and Y , so H (X 0; Y 0 ) = 1 with probability 1. Otherwise without loss of generality assume that Y (v ) 62 Sw;u;X . There are two cases to consider: for Case 1 suppose that X (v ) 62 Sw;u;Y and for Case 2 suppose that X (v) 2 Sw;u;Y . Within each case there are four subcases: Subcase (i) where X (v) 62 Su;w and Y (v ) 62 Su;w ; Subcase (ii) where X (v ) 62 Su;w and Y (v ) 2 Su;w ; Subcase (iii) where X (v) 2 Su;w and Y (v) 62 Su;w ; Subcase (iv) where X (v) 2 Su;w and Y (v) 2 Su;w . Clearly Subcase 1(ii) and 1(iii) are equivalent, by exchanging X and Y . The reader may verify that the situation for Subcase 2(iii) is exactly as described below for Subcase 2(i). Hence the coupling procedure described below for Subcase 2(i) works in Subcase 2(iii) also. Similarly Subcase 2(iv) may be handled identically to Subcase 2(ii). Therefore it it suces to consider only Subcases 1(i), 1(ii), 1(iv), 2(i), 2(ii) for a Con guration 1 edge. Now suppose that fw; ug is a Con guration 2 edge. If X (v ) 2 Sw;u;Y , Y (v ) 2 Sw;u;X , X (v) 2 Su;w;Y and Y (v) 2 Su;w;X then AX = AY . Here we can always use the same colour map c in both X and Y , so H (X 0; Y 0 ) = 1 with probability 1. Otherwise without loss of generality assume that Y (v ) 62 Sw;u;X . The same two cases arise: Case 1 where X (v) 62 Sw;u;Y and Case 2 where X (v) 2 Sw;u;Y . Again, within each case there are four subcases: Subcase (i) where X (v ) 62 Su;w;Y and Y (v ) 62 Su;w;X ; Subcase (ii) where X (v) 62 Su;w;Y and Y (v) 2 Su;w;X ; Subcase (iii) where X (v) 2 Su;w;Y and Y (v) 62 Su;w;X ; Subcase (iv) where X (v ) 2 Su;w;Y and Y (v ) 2 Su;w;X . In Subcase (iv) notice that Su;w;X = Su;w;Y . Therefore the methods used in Con guration 1, Subcase (iv) may be used here. Again Subcase 1(ii) and 1(iii) are equivalent, by exchanging X and Y . Moreover by exchanging the roles of u and w we see that Subcase 1(ii) and Subcase 2(i) are equivalent. Therefore it suces to consider Subcases 1(i), 2(i), 2(ii), 2(iii) for a Con guration 2 edge. CONFIGURATION 1, SUBCASE 1(i). Here fu; v g 62 E . Moreover X (v ) 62 Sw;u;Y and Y (v ) 62 Sw;u;X , so jSw;u;Y j = jSw;u;X j. Also X (v ) 62 Su;w and Y (v ) 62 Su;w . It follows that NX = NY . The set AY is obtained from AX by replacing the colour map hY (v); X (v)i with the colour map hX (v); Y (v)i and replacing every other colour map of the form hY (v ); c(u)i with hX (v ); c(u)i. Here choose cX 2 AX uniformly at random and let cY = cX unless cX (w) = Y (v ). If cX (w) = Y (v ) and cX (u) 6= X (v ) then let cY = hX (v); cX(u)i. Finally if cX = hY (v); X (v)i then let cY = hX (v); Y (v)i. This de nes a bijection between AX and AY so each element of AY is chosen with equal probability. If cX (w) = Y (v ) then H (X 0; Y 0) = 2 unless cX (u) = X (v ), in which case H (X 0; Y 0) = 3. In all other cases H (X 0; Y 0) = 1. There are exactly k ?jSu;w j?1 elements cX of AX such that cX (w) = Y (v). Therefore the contribution to E [H (X 0; Y 0 ) ? 1] made by this edge is k ? jSu;w j ? 2 + 2 = k ? jSu;w j < M:

NX

NX

10

NX

CONFIGURATION 1, SUBCASE 1(ii). Here fu; v g 62 E . Moreover X (v ) 62 Sw;u;Y and Y (v ) 62 Sw;u;X , so jSw;u;Y j = jSw;u;X j. Also X (v ) 62 Su;w and Y (v ) 2 Su;w . It follows that NX = NY + 1. The set AY can be obtained from AX by deleting the colour map hY (v ); X (v )i and replacing all other colour maps of the form hY (v ); c(u)i by hX (v ); c(u)i. Here choose cX 2 AX uniformly at random and let cY = cX unless cX (w) = Y (v). If cX (w) = Y (v) and cX (u) 6= X (v) then let cY = hX (v); cX(u)i. Finally if cX = hY (v ); X (v )i then choose cY uniformly at random from AY . We now check that each element of AY is equally likely to be the result of this procedure. Let cX be the element of AX chosen uniformly at random in Step (i) and let cY 2 AY . Suppose rst that cY (w) 6= X (v ). Then cY is the result of the coupling procedure if (a) cX = cY , or (b) cX = hY (v ); X (v)i and cY is the element of AY chosen uniformly at random. Therefore the probability that cY is the result of the coupling procedure is Prob [cX = cY ] + Prob [cX = hY (v ); X (v )i] NY ?1 = NY ?1 :

(9)

Now suppose that cY (w) = X (v ). Then cY is chosen if (a) cX = hY (v ); cY (u)i, or (b) cX = hY (v); X (v)i and cY is the element of AY chosen uniformly at random. Replacing cY with hY (v); cY (u)i in (9) we see that cY is chosen with probability NY ?1 in this case also. Therefore each element of AY is chosen with equal probability, as required. Now we calculate the contribution to E [H (X 0; Y 0 ) ? 1]. If cX (w) 6= Y (v ) then H (X 0; Y 0 ) = 1. If cX (w) = Y (v ) and cX (u) 6= X (v ) then H (X 0; Y 0) = 2. There are exactly k ? jSu;w j ? 1 such elements in AX . Finally if cX = hY (v); X (v)i then H (X 0; Y 0) = 2 with probability (k ?jSw;u;Y j? 1)NY ?1 , otherwise H (X 0; Y 0) = 3. Therefore the contribution to E [H (X 0; Y 0 ) ? 1] in this case is k ? jSu;w j ? 1 + 2NY ? (k ? jSw;u;Y j ? 1) < k ? jSu;w j + 1  M:

NX

NX NY

NX

CONFIGURATION 1, SUBCASE 1(iv). Here fu; v g 62 E . Moreover X (v ) 62 Sw;u;Y and Y (v) 62 Sw;u;X , so jSw;u;Y j = jSw;u;X j. Also X (v) 2 Su;w and Y (v) 2 Su;w . It follows that NX = NY . The set AY is obtained from AX by replacing every colour map of the form hY (v ); c(u)i with hX (v ); c(u)i. Here choose cX 2 AX uniformly at random and let cY = cX unless cX (w) = Y (v ). If cX (w) = Y (v ) then let cY (w) = hX (v ); cX (u)i. This de nes a bijection between AX and AY so each element of AY is chosen with equal probability. If cX (w) = Y (v ) then H (X 0; Y 0 ) = 2 and if cX (w) 6= Y (v ) then H (X 0; Y 0) = 1. There are exactly k ?jSu;w j elements cX of AX such that cX (w) = Y (v). Therefore the contribution to E [H (X 0; Y 0 ) ? 1] made by this edge is

k ? jSu;w j < M: NX

CONFIGURATION 1, SUBCASE 2(i). Here fu; v g 62 E . Moreover X (v ) 2 Sw;u;Y and Y (v) 62 Sw;u;X , so jSw;u;X j = jSw;u;Y j ? 1. Also X (v) 62 Su;w and Y (v) 62 Su;w . It follows 11

that NX = NY +(k ?jSu;w j? 1). The set AY may be obtained from AX by deleting the k ? jSu;w j ? 1 colour maps which assign the colour Y (v) to w. Before we can describe the coupling procedure we de ne some notation. The set C n Sw;u;Y is the disjoint union of the subsets C1 and C2, where C1 = C n (Su;w [ Sw;u;Y ) and C2 = Su;w n Sw;u;Y : (10) If b is an element of C1 then there are k ?jSu;w j?1 elements cY 2 AY such that cY (w) = b, while if b 2 C2 then there are k ? jSu;w j such elements. Let Y : C n Sw;u;Y ! R be de ned by

(

?1 if b 2 C1, (11) Y (b) = (k ? jSu;w j ? 1)?N1 Y (k ? jSu;w j)NY if b 2 C2 for all b 2 C n Sw;u;Y . Then Y is a probability measure on C n Sw;u;Y . The coupling procedure can now be described. Choose cX 2 AX uniformly at random and let cY = cX unless cX (w) = Y (v ). If cX (w) = Y (v ) then choose cY (w) 2 C n Sw;u;Y according to the probability distribution Y . If cY (w) 2 C1 then let cY (u) = cX (u) unless cX (u) = cY (w), in which case let cY (u) = Y (v ). If cY (w) 2 C2 then let ( ?1 cY (u) = cX (u) with probability (k ? jSu;w j ??11)(k ? jSu;w j) , Y (v) with probability (k ? jSu;w j) . We now check that each element of AY is chosen with equal probability. Let cX 2 AX be the element chosen uniformly at random in Step (i) and let cY 2 AY . Suppose rst that cY (w) 2 C1 and cY (u) 6= Y (v ). Then cY is the result of the coupling procedure if and only if (a) cX = cY , or (b) cX = hY (v ); cY (u)i and cY (w) is the element of C n Sw;u;Y chosen. Therefore the probability that cY results in this case is Prob [cX = cY ] + Prob [cX = hY (v ); cY (u)i]  Y (cY (w)) ?  = NX ?1 1 + (k ? jSu;w j ? 1)NY ?1 = NY ?1 : Suppose now that cY (w) 2 C1 and cY (u) = Y (v ). Then cY is the result of the coupling procedure if and only if (a) cX = cY , or (b) cX = hY (v ); cY (w)i and cY (w) is the element of C n Sw;u;Y chosen. The probability that cY occurs is given by Prob [cX = cY ] + Prob [cX = hY (v ); cY (w)i]  Y (cY (w)) = NY ?1 : Now let cY 2 AY be such that cY (w) 2 C2 and cY (u) 6= Y (v ). Then cY is the result of the coupling procedure if and only if (a) cX = cY , or (b) cX = hY (v ); cY (u)i, the chosen element of C n Sw;u;Y is cY (w) and cY (u) = cX (u). Therefore probability that cY is chosen is given by Prob [cX = cY ] + Prob [cX = hY (v ); cY (u)i]  Y (cY (w))  (k ? jSu;w j ? 1)(k ? jSu;w j)?1 ?  = NX ?1 1 + (k ? jSu;w j ? 1)NY ?1 = NY ?1 :

12

Finally suppose that cY (w) 2 C2 and cY (u) = Y (v ). Then cY occurs if and only if (a) cX = cY , or (b) cX (w) = Y (v ), the element chosen from C n Sw;u;Y is cY (w) and cY (u) = Y (v). Hence the probability that cY is chosen in this case is given by Prob [cX = cY ] + Prob [cX (w) = Y (v )]  Y (cY (w))  (k ? jSu;w j)?1 ?  = NX ?1 1 + (k ? jSu;w j ? 1)NY ?1 = NY ?1 : Therefore every element of AY is chosen with equal probability. Now we calculate the contribution which is made to E [H (X 0; Y 0) ? 1] using this procedure. If cX (w) 6= Y (v ) then H (X 0; Y 0 ) = 1. Suppose then that cX (w) = Y (v ). Then we choose an element cY (w) from C n Sw;u;Y with probability Y (cY (w)). Suppose rst that cY (w) 2 C1. Then k ? jSu;w j ? 2 elements of AX give H (X 0; Y 0) = 2 and one gives H (X 0; Y 0 ) = 3. If however cY (w) 2 C2 then H (X 0; Y 0 ) = 2 with probability (k ? jSu;w j ? 1)(k ? jSu;w j)?1 and H (X 0; Y 0 ) = 3 with probability (k ? jSu;w j)?1 . Therefore the contribution made to E [H (X 0; Y 0) ? 1] in this case is 

= =

<
hcX (w); X (v)i if cX (u) = Y (v), : hX (v); cX(u)i if cX (w) = Y (v). This de nes a bijection between AX and AY . Therefore each element of AY is equally likely to be chosen. Now H (X 0; Y 0 ) = 2 whenever cX (w) = Y (v ) or cX (u) = Y (v ). Otherwise H (X 0; Y 0 ) = 1. Therefore the contribution to E [H (X 0; Y 0 ) ? 1] in this case is   2(k ? 1) ? (jSw;u;X j + jSu;w;X j)  2 max k ? jSw;u;X j ? 1 ; k ? jSu;w;X j ? 1 < 2M:

NX

NX

NX

CONFIGURATION 2, SUBCASE 2(i). Here fu; v g 2 E . Moreover X (v ) 2 Sw;u;Y and Y (v) 62 Sw;u;X , so jSw;u;X j = jSw;u;Y j ? 1. Also X (v) 62 Su;w;Y and Y (v) 62 Su;w;X . It follows that NX = NY + (k ? jSu;w;X j ? 1). The set AY may be obtained from AX by deleting the k ?jSu;w;X j? 1 colour maps which assign the colour Y (v ) to w and replacing each of the k ?jSw;u;X j? 1 colour maps of the form hc(w); Y (v )i with hc(w); X (v )i. Recall the sets C1 and C2 de ned in (10) and the probability distribution Y de ned in (11). We use a coupling based on that used in Subcase 2(i) for Con guration 1 edges. Choose cX 2 AX uniformly at random. If cX (w) 6= Y (v) then let cY = cX unless cX (u) = Y (v), in which case let cY = hcX (w); X (v )i. If cX (w) = Y (v ) then choose cY (w) 2 C n Sw;u;Y according to the probability distribution Y . If cY (w) 2 C1 then let cY (u) = cX (u) unless cX (u) = cY (w), in which case let cY (u) = Y (v ). If cY (w) 2 C2 then let ( ?1 cY (u) = cX (u) with probability (k ? jSu;w;X j ??11)(k ? jSu;w;X j) , Y (v) with probability (k ? jSu;w;X j) . 15

Then each element of AY is equally likely to be the outcome of this procedure, following the proof of Subcase 2(i) for Con guration 1 edges. The contribution to E [H (X 0; Y 0) ? 1] will now be calculated. If cX (w) 6= Y (v ) and cX (u) 6= Y (v ) then H (X 0; Y 0 ) = 1. If cX (u) = Y (v) then H (X 0; Y 0) = 2. Finally the contribution to E [H (X 0; Y 0 ) ? 1] from elements cX 2 AX such that cX (w) = Y (v ) is given by

k ? jSu;w;X j ? jSu;w;Y n Sw;u;Y j ; NX NX NY following the proof for Subcase 2(i), Con guration 1. Therefore the contribution to E [H (X 0; Y 0) ? 1] in this case is given by 2k ? 1 ? (jSw;u;X j + jSu;w;X j) ? jSu;w;Y n Sw;u;Y j

NX NX NY  < 2 max k ? jNSw;u;X j ; k ? jNSu;w;X j X X < 2M: 

CONFIGURATION 2, SUBCASE 2(ii). Here fu; v g 2 E . Moreover X (v ) 2 Sw;u;Y and Y (v) 62 Sw;u;X , so jSw;u;X j = jSw;u;Y j ? 1. Also X (v) 62 Su;w;Y and Y (v) 2 Su;w;X . It follows that NX = NY + (jSw;u;Y j ? jSu;w;X j). The set AY may be obtained from AX by deleting the k ? jSu;w;X j colour maps which assign the colour Y (v ) to w and adding the k ? jSw;u;Y j colour maps which assign the colour X (v ) to u. First suppose that NX = NY . Then we couple as follows. Fix a bijection  : C n Su;w;X ! C n Sw;u;Y . Let cX be chosen uniformly at random from AX and let cY be de ned by (

if cX (w) 6= Y (v ), cY = cX h(cX (u)); X (v)i if cX (w) = Y (v). This de nes a bijection between AX and AY . Hence each element of AY is equally likely to result. Here H (X 0; Y 0 ) = 1 if cX (w) 6= Y (v ) and H (X 0; Y 0) = 3 if cX (w) = Y (v ). Therefore the contribution to E [H (X 0; Y 0 ) ? 1] in this situation is 2(k ? jSu;w;X j) < 2M: N X

For the remainder of this subcase assume without loss of generality that NX > NY . Before describing the coupling in this case, we must de ne two families of maps. Recall the sets C1 and C2 de ned in (10). For q 2 C1 let q : AY ! R be de ned by 8 > ( Sw;u;Y > > > ? j w;u;Y j) NY ? 1 (k ? jSw;u;Y j + jSw;u;Y n Su;w;Y j ? 1) NY if b(u) = X (v ); b(w) = q , > > > j

:0

otherwise

16

for all b 2 AY . Then q is a probability distribution on AY for all q 2 C1, since (k ? jSw;u;Y j ? 1) (jSw;u;Y j ? jSu;w;X j) + (k ? jSw;u;Y j ? 1) (k ? jSw;u;Y j) + (k ? jSw;u;Y j + jSw;u;Y n Su;w;Y j ? 1) = (k ? jSw;u;Y j ? 1) (k ? jSu;w;Y j ? 1) + (k ? jSw;u;Y j + jSw;u;Y n Su;w;Y j ? 1) = NY + jC1j ? (k ? jSu;w;Y ) + jSw;u;Y n Su;w;Y j = NY : (13)

Now for q 2 Sw;u;X n Su;w;X de ne the map q : AY ! R by 8 > ( Sw;u;Y > > > ? j w;u;Y j) NY ?1 if b(u) = X (v); b(w) 2 C1, (k ? jSw;u;Y j + 1) NY if b(u) = X (v ); b(w) 2 C2, > > > j

:0

otherwise

for all b 2 AY . Then q is a probability distribution on AY for all q 2 Sw;u;X n Su;w;X , since (k ? jSw;u;Y j) (jSw;u;Y j ? jSu;w;X j) + jC1j (k ? jSw;u;Y j) + jC2j (k ? jSw;u;Y j + 1) = (k ? jSw;u;Y j) (k ? jSu;w;Y j ? 1) + jSu;w;Y n Sw;u;Y j = NY + jC1j ? (k ? jSw;u;Y j) + jSu;w;Y n Sw;u;Y j = NY : (14) We may now de ne the coupling when NX > NY . Choose cX 2 AX uniformly at random and let cY = cX unless cX (w) = Y (v ). If cX (w) = Y (v ) and cX (u) 2 C1 then choose cY 2 AY according to the probability distribution cX (u) . If cX (u) 62 C1 then cX (u) 2 Sw;u;X n Su;w;x . Here choose cY 2 AY according to the probability distribution cX (u) . We must prove that each element of AY is equally likely to be chosen by this procedure. Denote by cX the element of AX chosen uniformly at random in Step (i) and let cY 2 AY . Suppose rst that cY (u) 6= X (v ). Then cY is chosen by the coupling procedure if and only if (a) cX = cY , or (b) cX = hY (v ); cY (u)i and cY is the element selected from AY (with probability cY (u) (cY ) if cY (u) 2 C1 and with probability cY (u) (cY ) otherwise). Note that cY (u) (cY ) = (jSw;u;Y j ? jSu;w;X j)NY ?1 if cY (u) 2 C1 and cY (u) (cY ) = (jSw;u;Y j ? jSu;w;X j)NY ?1 if cY (u) 2 Sw;u;X n Su;w;X , so these two probabilities are the same. Therefore the probability that cY is chosen in this case is given by Prob [cX = cY ] + Prob [cX = hY (v ); cY (u)i]  (jSw;u;Y j ? jSu;w;X j) NY ?1 ?  = NX ?1 1 + (jSw;u;Y j ? jSu;w;X j)NY ?1 = NY ?1 : Suppose next that cY (u) = X (v ) and cY (w) 2 C1 . Then cY is the result of the coupling procedure if and only if (a) cX (w) = Y (v ); cX (u) 2 C1 n fcY (w)g and cY is the element chosen from AY (with probability cX (u) (cY )), or (b) cX (w) = Y (v ), cX (u) = cY (w) and cY is the element chosen from AY (with probability cY (w) (cY )), or 17

(c) cX (w) = Y (v ), cX (u) 2 Sw;u;X n Su;w;X and cY is the element chosen from AY (with probability cX (u) (cY )). Therefore the probability that cY is chosen in this case is given by Prob [cX (w) = Y (v ); cX (u) 2 C1 n fcY (w)g]  cX (u) (cY ) + Prob [cX = hY (v ); cY (w)i]  cY (w)(cY ) + Prob [cX (w) = Y (v ); cX (u) 2 Sw;u;X n Su;w;X ]  cX (u) (cY ) = NX ?1 NY ?1 [(jC1j ? 1)(k ? jSw;u;Y j) + (k ? jSw;u;Y j + jSw;u;Y n Su;w;Y j ? 1) + jSw;u;X n Su;w;X j(k ? jSw;u;Y j)] ? 1 ? 1 = NX NY [NY + jC1j ? (k ? jSw;u;Y j) + jSw;u;Y n Su;w;Y j ? 1] = NX ?1 NY ?1 (NY + jSw;u;Y j ? jSu;w;X j) = NY ?1 : Finally suppose that cY (u) = X (v ) and cY (w) 2 C2 . Then cY is chosen by the coupling procedure if and only if (a) cX (w) = Y (v ), cX (u) 2 C1 and cY is the element of AY chosen (with probability cX (u) (cY )), or (b) cX (w) = Y (v ), cX (u) 2 Sw;u;X n Su;w;X and cY is the element of AY chosen (with probability cX (u) (cY )). Therefore the probability that cY is chosen in this case is given by Prob [cX (w) = Y (v ); cX (u) 2 C1 ]  cX (u) (cY ) + Prob [cX (w) = Y (v ); cX (u) 2 Sw;u;X n Su;w;X ]  cX (u) (cY ) = NX ?1 NY ?1 [jC1j(k ? jSw;u;Y j) + jSw;u;X n Su;w;X j(k ? jSw;u;Y j + 1)] = NX ?1 NY ?1 [NY + jC1j ? (k ? jSw;u;Y j) + jSw;u;X n Su;w;X j] = NX ?1 NY ?1 (NY + jSw;u;Y j ? jSu;w;X j) = NY ?1 : Therefore every element of AY has equal probability of being the result of the procedure. We must now calculate the contribution of the coupling to E [H (X 0; Y 0) ? 1]. If cX (w) 6= Y (v) then H (X 0; Y 0) = 1. Suppose now that cX (w) = Y (v) and cX (u) 2 C1 . Then H (X 0; Y 0) = 2 with probability (k ? jSw;u;Y j ? 1)(jSw;u;Y j ? jSu;w;X j)NY ?1 ; otherwise H (X 0; Y 0) = 3. This follows as there are k ? jSw;u;Y j ? 1 elements cY 2 AY with cY (u) = cX (u), and each is chosen with probability (jSw;u;Y j?jSu;w;X j)NY ?1 using the probability distribution cX (u) . If cX (w) = Y (v ) and cX (u) 2 Sw;u;X n Su;w;X then there are k ? jSw;u;Y j elements cY 2 AY such that cY (u) = cX (u). Each is selected with probability (jSw;u;Y j?jSu;w;X j)NY ?1 using the probability distribution cX (u) . Therefore H (X 0; Y 0) = 2 with probability (k ? jSw;u;Y j)(jSw;u;Y j ? jSu;w;X j)NY ?1 ; and H (X 0; Y 0 ) = 3 otherwise. Therefore the contribution of this edge to [H (X 0; Y 0 ) ? 1] 18

is given by

jC1j 2 ? (k ? jSw;u;Y j ? 1)(jSw;u;Y j ? jSu;w;X j)  NX

=

= = = =


(k ? jSu;w;Y j)NY : ? 1 ? 1 (k ? 2jSu;w;Y j + jSw;u;Y j)NY if b = q for all b 2 C n (Sw;u;Y [ fq g). Then q is a probability measure on C n (Sw;u;Y [ fq g) for all q 2 C1, since jC2j (k ? jSu;w;Y j + 1) + (jC1j ? 2) (k ? jSu;w;Y j) + (k ? 2jSu;w;Y j + jSw;u;Y j) = (k ? jSw;u;Y j)(k ? jSu;w;Y j) + jC1j ? (k ? jSw;u;Y j) = NY + jC1j + jC2j ? (k ? jSw;u;Y j) = NY : (18) We de ne q 0 : C n (Su;w;Y [ fq g) ! R for all q 2 C1 by exchanging the roles of w and u, and replacing  ?1 by  in the de nition of q. That is, 8 > (k ? jSw;u;Y j)NY ?1 : ? 1 (k ? 2jSw;u;Y j + jSu;w;Y j)NY if b = q for all b 2 C n (Su;w;Y [ fq g). Then q 0 is a probability measure on C n (Su;w;Y [ fq g) for all q 2 C1, by adapting the proof for q . The coupling may now be de ned. Choose cX 2 AX uniformly at random and let cY = cX unless cX (w) = Y (v) or cX (u) = Y (v). If cX (w) = Y (v) and cX (u) 2 C20 then choose cY (w) 2 C n Sw;u;Y according to the distribution Y and let cY (u) = cX (u). If cX (w) 2 C2 and cX (u) = Y (v) then choose cY (u) 2 C n Su;w;Y according

to the distribution Y 0 and let cY (w) = cX (w). Now suppose that cX (w) = Y (v ) and cX (u) 2 C1. Choose cY (w) 2 C n (Sw;u;Y [fcX (u)g) according to the distribution cX (u) and let cY (u) = cX (u). Finally suppose that cX (w) 2 C1 and cX (u) = Y (v ). Here choose cY (u) 2 C n (Su;w;Y [ fcX (w)g) according to the distribution cX (w) 0 and let cY (w) = cX (w). Let us check that each element of AY is equally likely to be the result of this coupling procedure. Denote by cX the element of AX chosen uniformly at random in Step (i) and let cY 2 AY . Now cY is the result of the coupling if and only if (a) cX = cY , or (b) cX = hcY (w); Y (v)i and cY (u) is the element of C n Su;w;Y chosen (with probability Y 0(cY (u)) if cY (w) 2 C2, or probability cY (w) 0(cY (u)) if cY (w) 2 C1), or (c) cX = hY (v ); cY (u)i and cY (w) is the element of C n Sw;u;Y chosen (with probability Y (cY (w)) if cY (u) 2 C20 , or with probability cY (u) (cY (w)) if cY (u) 2 C1). Suppose rst that cY (w) 2 C2 and cY (u) 2 C20 . Then the probability that cY is the result of the procedure is given by Prob [cX = cY ] + Prob [cX = hcY (w); Y (v )i]  Y 0 (cY (u)) + Prob [cX = hY (v ); cY (u)i]  Y (cY (w))  NX ?1 1 + k ? jNSw;u;Y j + k ? jNSu;w;Y j

NY ?1 ;

Y

Y

20

as required. Suppose next that cY (w) 2 C2 and cY (u) 2 C1. Then the probability that cY is chosen is given by Prob [cX = cY ] + Prob [cX = hcY (w); Y (v )i]  Y 0 (cY (u)) + Prob [cX = hY (v ); cY (u)i]  cY (u) (cY (w))   k ? j S j + 1 k ? j S j ? 1 u;w;Y w;u;Y ? 1 + N 1+ X

NY ?1 :

NY

NY

If cY (w) 2 C1 and cY (u) 2 C20 then essentially the same calculation as the previous one shows that cY is chosen with probability NY ?1 . Finally suppose that cY (w) 2 C1 and cY (u) 2 C1. If cY (u) 6= cY (w) then the probability that cY is chosen is given by Prob [cX = cY ] + Prob [cX = hcY (w); Y (v )i]  cY (w) 0 (cY (u)) + Prob [cX = hY (v ); cY (u)i]  cY (u) (cY (w))   NX ?1 1 + k ? jNSw;u;Y j + k ? jNSu;w;Y j Y Y NY ?1 : However if cY (u) = cY (w) then the probability that cY is chosen is given by Prob [cX = cY ] + Prob [cX = hcY (w); Y (v )i]  cY (w) 0(cY (w) ) + Prob [cX = hY (v ); cY (u)i]  cY (u) (cY (u) ?1)   k ? 2 j S j + j S j k ? 2 j S j + j S j u;w;Y w;u;Y w;u;Y u;w;Y ? 1 + NX 1 + NY NY ? 1 NY :

Therefore every element of AY is equally likely to be the result of the coupling. Finally the contribution of this coupling to E [H (X 0; Y 0 ) ? 1] must be calculated. This is not dicult, as H (X 0; Y 0) = 2 whenever cX (w) = Y (v ) or cX (u) = Y (v ), and H (X 0; Y 0 ) = 1 otherwise. Therefore the contribution of this edge to E [H (X 0; Y 0 ) ? 1] is given by   2(k ? 1) ? (jSw;u;X j + jSu;w;X j)  2 max k ? jSw;u;X j ? 1 ; k ? jSu;w;X j ? 1 < 2M:

NX

NX

NX

This completes the case analysis. We have established that the maximum contribution of any Con guration 1 edge is less than M and the maximum contribution of any Con guration 2 edge is less than 2M . It follows by (6) that the total contribution of all Con guration 1 and Con guration 2 edges to E [H (X 0; Y 0 ) ? 1] is less than d( ? 1)M . The only other edges which a ect the Hamming distance are the d edges fv; wg, where H (X 0; Y 0) = 0 with probability 1. Therefore the expected value of H (X 0; Y 0 ) after one step of the coupling may be bounded above as follows:

E H (X 0; Y 0) ? 1 < jEd j (M ( ? 1) ? 1) ?  d k2 ? (3 ? 2)k + 2( ? 1)2 = ? jE j(k ?  + 1)(k ? ) : 21

(20)

Now the right hand side of (20) is a decreasing function of k which is negative when k  2 and positive when  + 1  k  2 ? 1. Let be de ned by

?   k2 ? (3 ? 2)k + 2( ? 1)2 = 1? (21) jE j(k ? )(k ?  + 1) : Then (20) implies that E [H (X 0; Y 0 )] < < 1 whenever k  2. Hence when k  2 the Markov chain M(C V ) is rapidly mixing with mixing time (k ? )(k ?  + 1) n"?1 ) = ?1 1(")  log( 2 1? (k ? (3 ? 2)k + 2( ? 1)2) jE j log(n" ); by Theorems 2.1 and 2.2. It follows that M1( k (G)) is rapidly mixing for k  2 with the same mixing time. Finally suppose that G is -regular. Then  =  and equation

(21) becomes

?  2 k2 ? (3 ? 2)k + 2( ? 1)2 = 1? : n(k ? )(k ?  + 1) Therefore the mixing time of M1( k (G)) in this case satis es ? )(k ?  + 1) ?1 T (")  2(k2 ?(k(3 ? 2)k + 2( ? 1)2) n log(n" ); as stated.

(22)

Remark 3.1 The proof given in Theorem 3.2 uses the path coupling method to show that the Markov chain M1( k (G)) is rapidly mixing. The fact that this proof is quite technical suggests that a proof of the same result by the standard coupling method might be impossibly complex.

Remark 3.2 Let e = fw; ug 2 E be an edge chosen by the transition procedure of the Markov chain M1 ( k (G)). The set AX (e) used in the transition procedure has O(k2) elements. When k is large it may be too expensive to form the entire set AX (e) simply

in order to choose an element uniformly at random from it. Recall the probability distribution X de ned in (11), Theorem 3.2. Consider the following procedure: (i) choose a colour c(w) from C n Sw;u;X with probability X (c(w)), (ii) choose a colour c(u) from C n (Su;w;X [ fc(w)g) uniformly at random. Now X (b) measures the proportion of elements c 2 AX (e) such that c(w) = b. It follows from (3), (11) that this procedure produces elements of AX (e) uniformly at random. Assuming that the set C is linearly ordered, the set di erence of two sets of size at most k can be formed in O(k) operations using membership arrays. Moreover, linear algorithms exist for selecting elements from sets of size O(k) according to a given probability distribution. See for example, [14]. Therefore this is an ecient implementation of the transition procedure of M1 ( k (G)). 22

Remark 3.3 In Theorem 3.2 the Markov chain M1( k (G)) was shown to be rapidly mixing for k  2, where  is the maximum degree of the graph G. The bound on the mixing time of the chain is proportional to jE j= , where jE j is the number of edges and  is the minimum degree of G. Therefore the chain may be  times more rapidly mixing

for a -regular graph than for a graph with maximum degree  and minimum degree 1. We can improve this situation by embedding the graph G in a -regular multigraph G0 with vertex set V and edge multiset E 0, constructed as follows. Suppose that v is a vertex with degree d(v ) < . If  ? d(v ) is even we add ( ? d(v ))=2 self loop edges fv; vg of weight 2, and if  ? d(v) is odd then we add ( ? d(v) ? 1)=2 self-loop edges of weight 2 and one of weight 1. Edges in E have weight two. Let M2( k (G0)) denote the Markov chain with state space k (G0) and with transitions governed by the following procedure: choose an edge e 2 E 0 with probability proportional to its weight. If e 2 E then perform one transition of M1( k (G)) on the edge e. If e = fv; v g is a self-loop edge then perform one step of the heat-bath JSS chain on the vertex v . Then M2( k (G0)) can be thought of as acting on k (G). The proof of Theorem 3.2 may be extended to prove that M2 ( k (G)) is rapidly mixing for k  2. If 2  k  3 ? 1 then the mixing time is bounded above by (k ? )(k ?  + 1) ?1 2(k2 ? (3 ? 2)k + 2( ? 1)2) n log(n" ): This is the same bound as that given in (22) for the mixing time of M1( k (G)) when the graph G is -regular. If k  3 then the mixing time of M2( k (G)) is bounded above by n log(n"?1 ). This may be reduced to ( + 1)(k ? )(k ?  + 1) n log(n"?1 ) 2(k2 ? (3 ? 2)k + 2( ? 1)2) by altering the transition procedure so that edges are chosen from E 0 uniformly at random. The details are omitted. We conclude this section with a comparison of the mixing time of the new chain with the mixing time of the heat bath JSS chain. We will assume that G is -regular. Note that, by Remark 3.3, the comparison is also valid for non-regular graphs with maximum degree  whenever 2  k  3 ? 1. By Theorem 3.2 the mixing time 1 () of M1( k (G)) satis es ? )(k ?  + 1) ?1 1(")  2(k2 ?(k(3 ? 2)k + 2( ? 1)2) n log(n" ): We shall not compare the new chain directly with the heat bath JSS chain as performing one transition of M1( k (G)) is more complicated than performing one transition of the heat bath JSS chain. However, a transition of M1( k (G)) is at most as complicated as two steps of the heat bath JSS chain. Suppose that k = 2. By de ning a suitable coupling one can show that, for any pair (X; Y ) 2  , the probability that the Hamming distance changes after one step is at least 1=(n) unless (X; Y ) are a socalled stuck pair, one in which no change in the Hamming distance is possible. One may de ne a coupling on stuck pairs so as to guarantee that they are not stuck in the next 23

step. Let 2J (") denote the mixing time of the two-step JSS chain. Then the probability that the Hamming distance changes in two steps is at least 1=(n). Therefore by the second part of Theorem 2.2,

2J (")  3n3 log("?1 ) when k = 2. By Theorem 3.2 the mixing time of M1 ( 2(G)) satis es 1(")  (4+ 1) n log(n"?1 ): The ratio r of these upper bounds is 2 "?1 ) : r = (12+n1)log( log(n"?1 ) Therefore the JSS chain is  (n2) times slower than the new chain when k = 2. Suppose now that k > 2. Again we compare the two-step JSS chain with the new chain. Let r be the ratio of the upper bound on the mixing time of the JSS chain with twice the upper bound on the mixing time of the new chain, as given in (2) and Theorem 3.2. The ratio is ? 2  k ? (3 ? 2)k + 2( ? 1)2 r = (k ? 2)(k ?  + 1) ? 2 + 2 ; = 1 + (k ? k2)( k ?  + 1) that is, always greater than 1. For example, if  = 4 and k = 2 + 1 then the JSS chain is fty percent slower than the new chain even allowing the JSS chain two steps for each step of the new chain.

4 Optimality of the couplings In this section we prove that the couplings used in Theorem 3.2 are optimal, in the sense that they minimise the expected value of H (X 0; Y 0 ). Therefore we cannot hope to improve the analysis of the Markov chain M1( k (G)) using the coupling method applied to Hamming distance. The optimality of the couplings used in Theorem 3.2 will be established as follows: each coupling will be related to a solution of an associated transportation problem, which will be shown to be optimal. This idea is explored further in [5], where the techniques of this section are generalised. First we give a very brief de nition of a transportation problem. For more detail see, for example, [11]. Let m and n be positive integers and let K be an m  n matrix of nonnegative integers. Let a bePa vector P of m positive integers and let b be a vector of n positive integers such that mi=1 ai = nj=1 bj = N . An m  n matrix Z of nonnegative numbers is a solution ofPthe transportation problem P 1  i  m and mi=1 Zi;j = bj for 1  j  n. de ned by a, b and K if nj=1 Zi;j = ai for Pm Pn The cost of this solution is measured by i=1 j =1 Ki;j Zi;j . An optimal solution of this 24

transportation problem is a solution which minimises the cost. The entries of an optimal solution are all integers. The elements of a are called row sums and the elements of b are called column sums. Ecient methods exist for solving transportation problems. One such is the Hungarian method (see for example, [11, Chapter 20]). We now describe how the couplings de ned for M1 ( k (G)) give rise to solutions of a related tranportation problem. Let X and Y be two colourings which di er only at vertex v and let e = fw; ug be the chosen edge, where fw; v g 2 E and u 6= v . A coupling for M1( k (G)) at the edge e de nes a probability distribution f : AX (e)  AY (e) ! R on the joint probability space AX (e)  AY (e). As in Section 3 write AX for AX (e), similarly write AY , NX and NY . Then 



f (cX ; cY ) = Prob X 0 = Xe!cX ; Y 0 = Ye!cY :

P

The map f satis es cY 2AY f (cX ; cY ) = NX ?1 for all cX 2 AX and ?1 cX 2AX f (cX ; cY ) = NY for all cY 2 AY . Let h : AX  AY ! f0; 1; 2g be de ned by P

8 > 2 if cX (w) 6= cY (w) and cX (u) 6= cY (u), : 1 otherwise. Then the expected value of H (X 0; Y 0 ) ? 1 which results from this coupling is given by X

X

cX 2AX cY 2AY

f (cX ; cY )h(cX ; cY ):

Refer to this quantity as the cost of the coupling. Let a be the NX -dimensional vector with each entry equal to NY and let b be the NY -dimensional vector with each entry equal to NX . Let K be the matrix with NX rows and NY columns, corresponding to the elements of AX and AY respectively, such that the (cX ; cY ) entry of K is h(cX ; cY ). Let Z be the NX  NY matrix whose (cX ; cY ) entry equals NX NY f (cX ; cY ). Then Z is a solution of the transportation problem de ned by a, b and K . An optimal solution of this transportation problem corresponds to an optimal coupling at the edge fw; ug. The cost of the optimal solution equals NX NY times the cost of the optimal coupling. We may now prove that the couplings de ned in the proof of Theorem 3.2 are optimal. We do this by tracing the steps which would be performed by the Hungarian method with input corresponding to the given couplings. We follow the description of the Hungarian method given in [11, Chapter 20]. Note that it takes at most two steps of the Hungarian method to establish optimality in each case.

Theorem 4.1 The couplings used in the proof of Theorem 3:2 are optimal. Proof. Let X and Y be two elements of k (G) which di er just at the vertex v. Let e = fw; ug be an edge in G such that u 6= v and fw; vg 2 E . In the proof of Theorem 3.2 we consider two types of edges: if fu; v g 62 E then fw; ug is a Con guration 1 edge, otherwise it is a Con guration 2 edge. Five subcases are identi ed for Con guration 25

Recall the notation established in Theorem 3.2. Fix a linear ordering on the elements of $C$ such that $X(v)$ and $Y(v)$ appear last. Unless otherwise stated, the sets $A_X$ and $A_Y$ are ordered lexicographically with respect to this order. This gives an ordering on the rows and columns of the $N_X \times N_Y$ matrix of costs $K$, with $(c_X, c_Y)$ entry equal to $h(c_X, c_Y)$ for all $(c_X, c_Y) \in A_X \times A_Y$. We now show that each coupling is optimal by tracing the steps which would be performed by the Hungarian method. Throughout the proof the $(i, i)$ entries of a matrix are called diagonal entries, even if the matrix is not square.

CONFIGURATION 1, SUBCASE 1(i). Here $\{u, v\} \notin E$, $X(v) \notin S_{w,u,Y}$, $Y(v) \notin S_{w,u,X}$, $X(v) \notin S_{u,w}$ and $Y(v) \notin S_{u,w}$. In this case $N_X = N_Y$ and the set $A_Y$ is obtained from the set $A_X$ by replacing the colour map $\langle Y(v), X(v)\rangle$ by the colour map $\langle X(v), Y(v)\rangle$ and replacing all other colour maps of the form $\langle Y(v), c(u)\rangle$ by $\langle X(v), c(u)\rangle$. The square matrix $K$ of costs has the following form: the first $N_X - (k - |S_{u,w}| - 1)$ rows have a zero element on the diagonal and all other entries either 1 or 2. The next $k - |S_{u,w}| - 2$ rows have a 1 on the diagonal, a 2 in the last column and every other entry equal to 1 or 2. Each entry of the final row is either 1 or 2, with a 2 in the last column. Let $K^{(1)}$ denote the matrix obtained from $K$ by first subtracting 1 from the last $k - |S_{u,w}| - 1$ rows and then subtracting 1 from the last column. The $(i, i)$ entry of $K^{(1)}$ is zero for $1 \le i \le N_X$. Make each of these zero elements an independent zero with multiplicity $N_X$. This set of $N_X^2$ independent zeros in $K^{(1)}$ gives an optimal solution of the original transportation problem which corresponds exactly to the coupling described for Subcase 1(i), Configuration 1 in the proof of Theorem 3.2.

CONFIGURATION 1, SUBCASE 1(ii). Here $\{u, v\} \notin E$, $X(v) \notin S_{w,u,Y}$, $Y(v) \notin S_{w,u,X}$, $X(v) \notin S_{u,w}$ and $Y(v) \in S_{u,w}$. In this case $N_X = N_Y + 1$ and the set $A_Y$ is obtained from the set $A_X$ by deleting the colour map $\langle Y(v), X(v)\rangle$ and replacing all other colour maps of the form $\langle Y(v), c(u)\rangle$ by $\langle X(v), c(u)\rangle$. The matrix $K$ of costs has the following form: the first $N_X - (k - |S_{u,w}|)$ rows have a zero element on the diagonal and every other entry either 1 or 2. The next $k - |S_{u,w}| - 1$ rows have a 1 on the diagonal and every other entry either 1 or 2. Every entry of the final row is either 1 or 2. Let $K^{(1)}$ be the matrix obtained from $K$ by subtracting 1 from the last $k - |S_{u,w}|$ rows. There is a zero in every row and every column of $K^{(1)}$. In particular, there is a zero in the $(i, i)$ position for $1 \le i \le N_Y$. Make each of these elements an independent zero with multiplicity $N_Y$. Each entry of the last row of $K^{(1)}$ is either 0 or 1, and the entry in the column corresponding to $c_Y$ is zero if and only if $c_Y(u) = X(v)$. Make each of these zero elements an independent zero with multiplicity 1. This puts $k - |S_{w,u,Y}| - 1$ independent zeros in the last row of $K^{(1)}$. A set of covering lines for $K^{(1)}$ is formed by taking all columns which contain a zero entry in the final row, together with all rows which contain a zero entry in any other column. No independent zero lies on the intersection of two covering lines, so the set of independent zeros is maximal for $K^{(1)}$. The minimum uncovered element of $K^{(1)}$ is 1. Let $K^{(2)}$ be the matrix obtained from $K^{(1)}$ by subtracting 1 from every uncovered element and adding 1 to every element which is covered twice. The last row of $K^{(2)}$ consists entirely of zeros. Therefore the set of independent zeros can be increased by adding the remaining elements of the last row as independent zeros with multiplicity 1. This set of $N_X N_Y$ independent zeros in $K^{(2)}$ gives an optimal solution of the original transportation problem which corresponds exactly to the coupling given for Subcase 1(ii), Configuration 1 in the proof of Theorem 3.2.
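To make the reduction steps concrete, here is a small numerical sketch in Python (using NumPy and SciPy; the matrix entries are illustrative only and do not arise from an actual pair of colourings). It carries out the subtractions used to form $K^{(1)}$ in Subcase 1(i) and confirms, against an exact solver, that a nonnegative reduced matrix with zeros on the whole diagonal certifies that the diagonal assignment is optimal.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# A 4x4 cost matrix with the shape described for Configuration 1,
# Subcase 1(i): two rows with 0 on the diagonal, one row with 1 on the
# diagonal and 2 in the last column, and a final row of 1s and 2s
# ending in a 2.  (Illustrative values, not from a real colouring.)
K = np.array([
    [0, 1, 2, 2],
    [2, 0, 1, 2],
    [1, 2, 1, 2],
    [1, 1, 2, 2],
])

# Form K1 as in the proof: subtract 1 from the last two rows, then 1
# from the last column.  Row/column subtractions change the cost of
# every assignment by the same constant, so optima are preserved.
K1 = K.astype(float)
K1[-2:, :] -= 1
K1[:, -1] -= 1

# K1 is nonnegative with a zero diagonal, so the diagonal assignment
# costs 0 in K1 and nothing can cost less: it is optimal.
assert (K1 >= 0).all() and (np.diag(K1) == 0).all()

# Cross-check: an exact solver on the original K agrees with the
# diagonal coupling's cost.
row, col = linear_sum_assignment(K)
assert K[row, col].sum() == np.trace(K)
print("optimal assignment cost:", K[row, col].sum())  # 3
```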

CONFIGURATION 1, SUBCASE 1(iv). Here $\{u, v\} \notin E$, $X(v) \notin S_{w,u,Y}$, $Y(v) \notin S_{w,u,X}$, $X(v) \in S_{u,w}$ and $Y(v) \in S_{u,w}$. In this case $N_X = N_Y$ and the set $A_Y$ is obtained from the set $A_X$ by replacing all colour maps of the form $\langle Y(v), c(u)\rangle$ by $\langle X(v), c(u)\rangle$. The square matrix $K$ of costs has the following form: the first $N_X - (k - |S_{u,w}|)$ rows have a zero element on the diagonal and all other entries either 1 or 2. The final $k - |S_{u,w}|$ rows have a 1 on the diagonal and every other entry equal to 1 or 2. Let $K^{(1)}$ denote the matrix obtained from $K$ by subtracting 1 from the last $k - |S_{u,w}|$ rows. The $(i, i)$ entry of $K^{(1)}$ is zero for $1 \le i \le N_X$. Make each of these zero elements an independent zero with multiplicity $N_X$. This set of $N_X^2$ independent zeros in $K^{(1)}$ gives an optimal solution of the original transportation problem which corresponds exactly to the coupling described for Subcase 1(iv), Configuration 1 in the proof of Theorem 3.2.

CONFIGURATION 1, SUBCASE 2(i). Here $\{u, v\} \notin E$, $X(v) \in S_{w,u,Y}$, $Y(v) \notin S_{w,u,X}$, $X(v) \notin S_{u,w}$ and $Y(v) \notin S_{u,w}$. In this case $N_X = N_Y + (k - |S_{u,w}| - 1)$ and the set $A_Y$ is obtained from the set $A_X$ by deleting the $k - |S_{u,w}| - 1$ colour maps which assign the colour $Y(v)$ to $w$. The matrix $K$ of costs has the following form: the first $N_X - (k - |S_{u,w}| - 1)$ rows have a zero element on the diagonal and every other entry either 1 or 2. Every entry of the last $k - |S_{u,w}| - 1$ rows is either 1 or 2, with every row containing at least one entry equal to 1. Let $K^{(1)}$ be the matrix obtained from $K$ by subtracting 1 from each of the last $k - |S_{u,w}| - 1$ rows. Now $K^{(1)}$ has a zero in every row and every column. In particular, there is a zero in the $(i, i)$ position of $K^{(1)}$ for $1 \le i \le N_Y$. Make each of these zero elements an independent zero with multiplicity $N_Y$. Now consider the row corresponding to $c_X$ where $c_X(w) = Y(v)$. If $c_X(u) \in S_{w,u,Y}$ then this row has a zero entry only in the columns associated with $c_Y$ where $c_Y(u) = c_X(u)$ and $c_Y(w) \in C \setminus S_{w,u,Y}$. If $c_X(u) \notin S_{w,u,Y}$ then $c_X(u) \in C_1$ and the row has zero entries only in the columns associated with $c_Y$ where $c_Y(u) = c_X(u)$ and $c_Y(w) \in C \setminus (S_{w,u,Y} \cup \{c_X(u)\})$. Let each of these zero entries be an independent zero with multiplicity $k - |S_{u,w}| - 1$. This places $N_Y - |C_2|$ independent zeros in the row if $c_X(u) \in S_{w,u,Y}$ and $N_Y - (|C_2| + k - |S_{u,w}| - 1)$ independent zeros in the row if $c_X(u) \in C_1$. A set of covering lines for $K^{(1)}$ is formed by taking all columns with a zero entry in the final $k - |S_{u,w}| - 1$ rows, together with all rows with a zero entry in any other column. No independent zero lies on the intersection of two covering lines, so the set of independent zeros is maximal for $K^{(1)}$. The minimum uncovered element of $K^{(1)}$ is 1. Let $K^{(2)}$ be the matrix formed by subtracting 1 from all uncovered elements of $K^{(1)}$ and adding 1 to all elements of $K^{(1)}$ which are covered twice. The set of independent zeros can now be extended, as all uncovered elements in the last $k - |S_{u,w}| - 1$ rows of $K^{(2)}$ are 0. Consider the row of $K^{(2)}$ associated with the element $c_X$ where $c_X(w) = Y(v)$. Make each of the following elements of this row an independent zero with multiplicity 1: those in a column associated with $c_Y$ where $c_Y(u) = Y(v)$ and $c_Y(w) \in C_2$. If $c_X(u) \in C_1$ then also make the element in the column associated with $c_Y = \langle c_X(u), Y(v)\rangle$ an independent zero with multiplicity $k - |S_{u,w}| - 1$. Every row now has $N_Y$ independent zeros. This set of $N_X N_Y$ independent zeros in $K^{(2)}$ gives an optimal solution of the original transportation problem which corresponds exactly to the coupling for Subcase 2(i), Configuration 1 given in Theorem 3.2.
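The passage from $K^{(1)}$ to $K^{(2)}$ is the standard adjustment step of the Hungarian method. The sketch below is our own illustration (the helper name `adjust` and the toy matrix are not from the paper): it shifts the dual variables by the minimum uncovered entry, which leaves the set of optimal solutions of the underlying transportation problem unchanged.

```python
import numpy as np

def adjust(K1, covered_rows, covered_cols):
    """One Hungarian-method adjustment step, as used to pass from K1 to
    K2 in the proof: subtract the minimum uncovered entry from every
    uncovered element and add it to every element covered twice.  This
    amounts to shifting dual variables, so optima are preserved."""
    K2 = K1.astype(float)
    rows = np.zeros(K1.shape[0], dtype=bool)
    cols = np.zeros(K1.shape[1], dtype=bool)
    rows[list(covered_rows)] = True
    cols[list(covered_cols)] = True
    uncovered = np.outer(~rows, ~cols)
    delta = K2[uncovered].min()   # this minimum equals 1 in the proof
    K2[uncovered] -= delta
    K2[np.outer(rows, cols)] += delta
    return K2

# Toy 3x2 matrix shaped like Subcase 1(ii): zeros on the diagonal of
# the first two rows and a single zero (column 0) in the last row.
K1 = np.array([[0, 1],
               [1, 0],
               [0, 1]])
# Cover the column with a zero in the final row and the row with a
# zero elsewhere; the minimum uncovered entry is 1.
K2 = adjust(K1, covered_rows=[1], covered_cols=[0])
print(K2)  # the last row becomes all zeros, as claimed in the proof
```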

CONFIGURATION 1, SUBCASE 2(ii). Here $\{u, v\} \notin E$, $X(v) \in S_{w,u,Y}$, $Y(v) \notin S_{w,u,X}$, $X(v) \notin S_{u,w}$ and $Y(v) \in S_{u,w}$. In this case $N_X = N_Y + (k - |S_{u,w}|)$ and the set $A_Y$ is obtained from the set $A_X$ by deleting the $k - |S_{u,w}|$ colour maps which assign the colour $Y(v)$ to $w$. The matrix $K$ of costs has the following form: the first $N_X - (k - |S_{u,w}|)$ rows have a zero element on the diagonal and every other entry either 1 or 2. Every entry of the last $k - |S_{u,w}|$ rows is either 1 or 2, with every row containing at least one entry equal to 1. Let $K^{(1)}$ be the matrix obtained from $K$ by subtracting 1 from each of the last $k - |S_{u,w}|$ rows. Now $K^{(1)}$ has a zero in every row and every column. In particular, there is a zero in the $(i, i)$ position for $1 \le i \le N_Y$. Let each of these elements be an independent zero with multiplicity $N_Y$. Recall the probability measure $\mu_Y$ defined in (11). Consider the row of $K^{(1)}$ associated with $c_X$ where $c_X(w) = Y(v)$. If $c_X(u) \in S_{w,u,Y}$ then every element of the row is either 0 or 1, and the entry in the column associated with $c_Y$ is zero if and only if $c_Y(u) = c_X(u)$. Let these zero entries be independent zeros with multiplicity $\mu_Y(c)N_Y$. This places $N_Y$ independent zeros in the row corresponding to $c_X$. Now suppose that $c_X(u) \in C_1$. Then every element in the row associated with $c_X$ is either 0 or 1, and the entry in the column associated with $c_Y$ is zero if and only if $c_Y(u) = c_X(u)$ and $c_Y(w) \in C \setminus (S_{w,u,Y} \cup \{c_X(u)\})$. Let each of these zero elements be an independent zero with multiplicity $k - |S_{u,w}|$. By (12), this gives $N_Y - |S_{w,u,Y} \setminus S_{u,w}|$ independent zeros in the row associated with $c_X$. A set of covering lines for $K^{(1)}$ is formed by taking all columns associated with $c_Y \in A_Y$ where $c_Y(u) \in C_1$ and $c_Y(w) \in C \setminus (S_{w,u,Y} \cup \{c_Y(u)\})$, together with all rows which contain a zero entry in any other column. No independent zero lies on the intersection of two covering lines, so the set of independent zeros is maximal for $K^{(1)}$. The minimum uncovered element of $K^{(1)}$ is 1. Let $K^{(2)}$ be the matrix formed by subtracting 1 from all uncovered elements of $K^{(1)}$ and adding 1 to all elements of $K^{(1)}$ which are covered twice. The set of independent zeros can now be extended, as all uncovered elements in the last $k - |S_{u,w}|$ rows of $K^{(2)}$ are 0. Consider the row of $K^{(2)}$ associated with the element $c_X$ where $c_X(w) = Y(v)$ and $c_X(u) \in C_1$. Make each of the following elements of this row an independent zero with multiplicity 1: those in a column associated with $c_Y$ where $c_Y(w) = c_X(u)$ and $c_Y(u) \in S_{w,u,Y} \setminus S_{u,w}$. There are now $N_Y$ independent zeros in this row. This set of $N_X N_Y$ independent zeros in $K^{(2)}$ gives an optimal solution of the original transportation problem which corresponds exactly to the coupling for Subcase 2(ii), Configuration 1 given in Theorem 3.2.

CONFIGURATION 2, SUBCASE 1(i). Here $\{u, v\} \in E$, $X(v) \notin S_{w,u,Y}$, $Y(v) \notin S_{w,u,X}$, $X(v) \notin S_{u,w,Y}$ and $Y(v) \notin S_{u,w,X}$. In this case $N_X = N_Y$ and the set $A_Y$ is obtained from $A_X$ by replacing each colour map of the form $\langle c(w), Y(v)\rangle$ with the colour map $\langle c(w), X(v)\rangle$, and replacing every colour map of the form $\langle Y(v), c(u)\rangle$ with $\langle X(v), c(u)\rangle$. The matrix $K$ of costs has the following form: the row corresponding to $c_X$ has a 0 on the diagonal and every other entry equal to 1 or 2, unless $c_X(w) = Y(v)$ or $c_X(u) = Y(v)$. For these rows there is a 1 on the diagonal and every other entry is equal to 1 or 2. Let $K^{(1)}$ be the matrix obtained from $K$ by subtracting 1 from each of the rows associated with $c_X$ where $c_X(w) = Y(v)$ or $c_X(u) = Y(v)$. Now $K^{(1)}$ has a zero in every row and every column. In particular, there is a zero in the $(i, i)$ position for $1 \le i \le N_X$. Let each of these elements be an independent zero with multiplicity $N_X$. This set of $N_X^2$ independent zeros in $K^{(1)}$ gives an optimal solution of the original transportation problem which corresponds exactly to the coupling described for Subcase 1(i), Configuration 2 in the proof of Theorem 3.2.
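Each subcase concludes by exhibiting independent zeros with multiplicities whose rows sum to $N_Y$ and whose columns sum to $N_X$. The checker below is a sketch of that certificate (our formulation, not the paper's code): a nonnegative multiplicity matrix supported on zeros of a nonnegative reduced cost matrix, with these marginals, scales to a coupling of the two uniform distributions that is optimal by complementary slackness.

```python
import numpy as np

def is_optimality_certificate(K_reduced, F):
    """K_reduced: the cost matrix after row/column subtractions and
    adjustment steps (so it must be nonnegative).  F: multiplicities of
    the chosen independent zeros.  If F is nonnegative, vanishes off the
    zeros of K_reduced, and has every row summing to N_Y and every
    column summing to N_X, then F / (N_X * N_Y) is a coupling of the
    uniform distributions on A_X and A_Y with zero reduced cost, hence
    an optimal solution of the transportation problem."""
    n_x, n_y = K_reduced.shape
    return bool((K_reduced >= 0).all()
                and (F >= 0).all()
                and (F[K_reduced > 0] == 0).all()
                and (F.sum(axis=1) == n_y).all()
                and (F.sum(axis=0) == n_x).all())

# The adjusted 3x2 matrix from the previous sketch, with a certificate:
# rows sum to N_Y = 2 and columns to N_X = 3.
K2 = np.array([[0, 0], [2, 0], [0, 0]])
F = np.array([[2, 0], [0, 2], [1, 1]])
print(is_optimality_certificate(K2, F))  # True
```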

CONFIGURATION 2, SUBCASE 2(i). Here $\{u, v\} \in E$, $X(v) \in S_{w,u,Y}$, $Y(v) \notin S_{w,u,X}$, $X(v) \notin S_{u,w,Y}$ and $Y(v) \notin S_{u,w,X}$. In this case $N_X = N_Y + (k - |S_{u,w,X}| - 1)$ and the set $A_Y$ may be obtained from $A_X$ by deleting the $k - |S_{u,w,X}| - 1$ colour maps which assign the colour $Y(v)$ to $w$ and replacing the $k - |S_{w,u,X}| - 1$ colour maps of the form $\langle c(w), Y(v)\rangle$ with $\langle c(w), X(v)\rangle$. The matrix $K$ of costs has the following form: the row corresponding to $c_X$ has a 0 on the diagonal and every other entry equal to 1 or 2, unless $c_X(w) = Y(v)$ or $c_X(u) = Y(v)$. If $c_X(u) = Y(v)$ then the row has a 1 on the diagonal and every other entry is equal to 1 or 2. If $c_X(w) = Y(v)$ then every entry in the row is either 1 or 2, with at least one entry equal to 1. Let $K^{(1)}$ be the matrix obtained from $K$ by subtracting 1 from the row corresponding to $c_X$ whenever $c_X(w) = Y(v)$ or $c_X(u) = Y(v)$. Then every row and column of $K^{(1)}$ contains a zero entry. In particular, the $(i, i)$ entry of $K^{(1)}$ is zero for $1 \le i \le N_Y$. Make each of these entries an independent zero with multiplicity $N_Y$. The remainder of the proof follows just as in the proof of Subcase 2(i) for Configuration 1 above, showing that the coupling described in Theorem 3.2 for Subcase 2(i), Configuration 2 is optimal.

CONFIGURATION 2, SUBCASE 2(ii). Here $\{u, v\} \in E$, $X(v) \in S_{w,u,Y}$, $Y(v) \notin S_{w,u,X}$, $X(v) \notin S_{u,w,Y}$ and $Y(v) \in S_{u,w,X}$. In this case $N_X = N_Y + (|S_{w,u,Y}| - |S_{u,w,X}|)$ and the set $A_Y$ may be obtained from $A_X$ by deleting the $k - |S_{u,w,X}|$ colour maps which assign the colour $Y(v)$ to $w$ and adding the $k - |S_{w,u,Y}|$ colour maps which assign the colour $X(v)$ to $u$. We order the elements of $A_X$ lexicographically but order the elements of $A_Y$ as follows. The first $N_Y - (k - |S_{w,u,Y}|)$ elements of $A_Y$ are the elements $\{c_Y : c_Y(u) \neq X(v)\}$, ordered lexicographically. The last $k - |S_{w,u,Y}|$ elements of $A_Y$ are the elements $\{c_Y : c_Y(u) = X(v)\}$, ordered by $c_Y(w)$. We do this so that the $i$th elements of $A_X$ and $A_Y$ agree for $1 \le i \le N_X - (k - |S_{u,w,X}|)$. Then the matrix $K$ of costs has the following form: the first $N_X - (k - |S_{u,w,X}|)$ rows have a 0 on the diagonal and every other entry 1 or 2. Each entry of the last $k - |S_{u,w,X}|$ rows is either 1 or 2, with each row having at least one entry equal to 1, and each entry in the last $k - |S_{w,u,Y}|$ columns equal to 2. Let $K^{(1)}$ be the matrix obtained from $K$ by first subtracting 1 from the last $k - |S_{u,w,X}|$ rows, and then subtracting 1 from the last $k - |S_{w,u,Y}|$ columns. Then there is a zero entry in each row and column of $K^{(1)}$. First suppose that $N_X = N_Y$. The diagonal entry of each row of $K^{(1)}$ equals zero. Let each of these be an independent zero with multiplicity $N_X$. This set of $N_X^2$ independent zeros gives an optimal solution of the original transportation problem which corresponds exactly to the coupling described for Subcase 2(ii), Configuration 2 (when $N_X = N_Y$) in the proof of Theorem 3.2. (Note that the bijection $\sigma : C \setminus S_{u,w,X} \to C \setminus S_{w,u,Y}$ used in the proof of Theorem 3.2 can be defined as follows: if the $i$th element of $A_X$ is $\langle Y(v), c_X(u)\rangle$ and the $i$th element of $A_Y$ is $\langle c_Y(w), X(v)\rangle$ then let $\sigma(c_X(u)) = c_Y(w)$, for $N_X - (k - |S_{u,w,X}|) < i \le N_X$.) Now suppose that $N_X > N_Y$. The following zero elements of $K^{(1)}$ are taken as independent zeros with the stated multiplicities. Let the $(i, i)$ entry be an independent zero with multiplicity $N_Y$ for $1 \le i \le N_X - (k - |S_{u,w,X}|)$. Now consider the row associated with $\langle Y(v), c_X(u)\rangle$. Let each entry in a column associated with $\langle c_Y(w), c_X(u)\rangle$ be an independent zero with multiplicity $|S_{w,u,Y}| - |S_{u,w,X}|$. First suppose that $c_X(u) \in C_1$. Then let each entry in a column associated with $\langle c_Y(w), X(v)\rangle$ where $c_Y(w) \neq c_X(u)$ be an independent zero with multiplicity $k - |S_{w,u,Y}|$, and let the entry in the column associated with $\langle c_X(u), X(v)\rangle$ be an independent zero with multiplicity $k - |S_{w,u,Y}| + |S_{w,u,Y} \setminus S_{u,w,Y}| - 1$. By (13), this places $N_Y$ independent zeros in that row. Finally suppose that $c_X(u) \in S_{u,w,X} \setminus S_{w,u,X}$. Then let each entry in a column associated with $\langle c_Y(w), X(v)\rangle$ be an independent zero. The multiplicity of this entry is $k - |S_{w,u,Y}|$ if $c_Y(w) \in C_1$ and $k - |S_{w,u,Y}| + 1$ if $c_Y(w) \in C_2$. This ensures that there are $N_Y$ independent zeros in the row, by (14). This set of $N_X N_Y$ independent zeros in $K^{(1)}$ gives an optimal solution of the original transportation problem which corresponds exactly to the coupling described for Subcase 2(ii), Configuration 2 (when $N_X > N_Y$) in Theorem 3.2.
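As an independent check, the transportation problem can also be solved directly as a linear program. The sketch below (the helper `optimal_coupling_cost` is ours, built on `scipy.optimize.linprog`) recovers the value $1/6$ for the toy $3 \times 2$ matrix used in the two sketches above, matching the cost of the certified coupling $F/(N_X N_Y)$ given there.

```python
import numpy as np
from scipy.optimize import linprog

def optimal_coupling_cost(K):
    """Minimise sum_{i,j} K[i,j] * f[i,j] over couplings f of the
    uniform distribution on the rows (A_X) and the uniform distribution
    on the columns (A_Y) -- the LP that the Hungarian-method trace in
    the proof solves combinatorially."""
    n_x, n_y = K.shape
    c = np.asarray(K, dtype=float).ravel()  # f flattened row-major
    A_eq, b_eq = [], []
    for i in range(n_x):                    # row marginals: 1 / n_x
        row = np.zeros(n_x * n_y)
        row[i * n_y:(i + 1) * n_y] = 1
        A_eq.append(row)
        b_eq.append(1.0 / n_x)
    for j in range(n_y):                    # column marginals: 1 / n_y
        col = np.zeros(n_x * n_y)
        col[j::n_y] = 1
        A_eq.append(col)
        b_eq.append(1.0 / n_y)
    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return res.fun

K = np.array([[0, 1], [1, 0], [0, 1]])  # toy Subcase 1(ii)-style costs
print(optimal_coupling_cost(K))         # 1/6, matching the certificate
```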

CONFIGURATION 2, SUBCASE 2(iii). Here $\{u, v\} \in E$, $X(v) \in S_{w,u,Y}$, $Y(v) \notin S_{w,u,X}$, $X(v) \in S_{u,w,Y}$ and $Y(v) \notin S_{u,w,X}$. In this case $N_X = N_Y + 2(k - 1) - (|S_{w,u,X}| + |S_{u,w,X}|)$ and the set $A_Y$ may be obtained from $A_X$ by deleting the $k - |S_{u,w,X}| - 1$ colour maps which assign the colour $Y(v)$ to $w$ and deleting the $k - |S_{w,u,X}| - 1$ colour maps which assign the colour $Y(v)$ to $u$. The matrix $K$ of costs has the following form: the row associated with $c_X$ has a 0 on the diagonal and every other entry equal to 1 or 2, unless $c_X(w) = Y(v)$ or $c_X(u) = Y(v)$. If $c_X(w) = Y(v)$ or $c_X(u) = Y(v)$ then every entry of the row is 1 or 2, with at least one entry in each such row equal to 1. Let $K^{(1)}$ be the matrix obtained from $K$ by subtracting 1 from each row corresponding to $c_X$ where $c_X(w) = Y(v)$ or $c_X(u) = Y(v)$. There is a zero in every row and every column of $K^{(1)}$. In particular, the diagonal element of the row corresponding to $c_X$ is zero whenever $c_X(w) \neq Y(v)$ and $c_X(u) \neq Y(v)$. Make each of these entries an independent zero with multiplicity $N_Y$. We now describe how to select independent zeros in the remaining rows. Recall the probability distributions $\mu_Y$, $\mu_Y'$, $q$ and $q'$ defined in (11), (16), (17) and (19) respectively. Consider the row associated with $\langle Y(v), c_X(u)\rangle$. If $c_X(u) \in C_2'$ then let the entry in the column associated with $\langle c_Y(w), c_X(u)\rangle$ be an independent zero with multiplicity $\mu_Y(c_Y(w))N_Y$ for all $c_Y(w) \in C \setminus S_{w,u,Y}$. If $c_X(u) \in C_1$ then let the entry in the column associated with $\langle c_Y(w), c_X(u)\rangle$ be an independent zero with multiplicity $q_{c_X(u)}(c_Y(w))N_Y$ for all $c_Y(w) \in C \setminus (S_{w,u,Y} \cup \{c_X(u)\})$. This places $N_Y$ independent zeros in each of these rows, by (18) and the fact that $\mu_Y$ is a probability distribution. Finally consider the row associated with $\langle c_X(w), Y(v)\rangle$. If $c_X(w) \in C_2$ then let the entry in the column associated with $\langle c_X(w), c_Y(u)\rangle$ be an independent zero with multiplicity $\mu_Y'(c_Y(u))N_Y$ for all $c_Y(u) \in C \setminus S_{u,w,Y}$. If $c_X(w) \in C_1$ then let the entry in the column associated with $\langle c_X(w), c_Y(u)\rangle$ be an independent zero with multiplicity $q'_{c_X(w)}(c_Y(u))N_Y$ for all $c_Y(u) \in C \setminus (S_{u,w,Y} \cup \{c_X(w)\})$. This places $N_Y$ independent zeros in each of these rows, as both $\mu_Y'$ and $q'$ are probability distributions. This set of $N_X N_Y$ independent zeros in $K^{(1)}$ gives an optimal solution of the original transportation problem which corresponds exactly to the coupling for Subcase 2(iii), Configuration 2 given in Theorem 3.2. This completes the proof.
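For reference, the linear-programming fact underlying every subcase above is standard transportation duality (this summary is ours, not the paper's). The transportation problem and its dual are
$$\min\Big\{\sum_{c_X,c_Y} h(c_X,c_Y)\,f(c_X,c_Y) \;:\; f \ge 0,\ \sum_{c_Y} f(c_X,c_Y) = \frac{1}{N_X},\ \sum_{c_X} f(c_X,c_Y) = \frac{1}{N_Y}\Big\}$$
$$= \max\Big\{\frac{1}{N_X}\sum_{c_X} u(c_X) + \frac{1}{N_Y}\sum_{c_Y} v(c_Y) \;:\; u(c_X) + v(c_Y) \le h(c_X,c_Y) \text{ for all } c_X, c_Y\Big\}.$$
The row and column operations above construct feasible dual variables $(u, v)$, and placing all multiplicity on entries with $h(c_X,c_Y) - u(c_X) - v(c_Y) = 0$ is exactly complementary slackness; each coupling therefore attains the dual value and is optimal.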

References

[1] D. Aldous, Random walks on finite groups and rapidly mixing Markov chains, in A. Dold and B. Eckmann, eds., Séminaire de Probabilités XVII 1981/1982, vol. 986 of Springer-Verlag Lecture Notes in Mathematics, Springer-Verlag, New York, 1983, pp. 243–297.
[2] R. J. Baxter, Exactly solved models in statistical mechanics, Academic Press, London, 1982.
[3] R. Bubley and M. Dyer, Path coupling: a technique for proving rapid mixing in Markov chains, in 38th Annual Symposium on Foundations of Computer Science, IEEE, Miami Beach, Florida, 1997, pp. 223–231.
[4] R. Bubley and M. Dyer, Faster random generation of linear extensions, in 9th Annual Symposium on Discrete Algorithms, ACM–SIAM, San Francisco, California, 1998, pp. 350–354.
[5] R. Bubley, M. Dyer and C. Greenhill, Beating the $2\Delta$ bound for approximately counting colourings: a computer-assisted proof of rapid mixing, in 9th Annual Symposium on Discrete Algorithms, ACM–SIAM, San Francisco, California, 1998, pp. 355–363.
[6] R. Bubley, M. Dyer, C. Greenhill and M. Jerrum, On approximately counting colourings of small degree graphs, 1997 (preprint).
[7] M. Dyer and C. Greenhill, On Markov chains for independent sets, 1997 (preprint).
[8] F. Jaeger, D. L. Vertigan and D. Welsh, On the computational complexity of the Jones and Tutte polynomials, Mathematical Proceedings of the Cambridge Philosophical Society, 108 (1990), pp. 35–53.
[9] M. Jerrum, A very simple algorithm for estimating the number of k-colorings of a low-degree graph, Random Structures and Algorithms, 7 (1995), pp. 157–165.
[10] M. Jerrum and A. Sinclair, The Markov chain Monte Carlo method: an approach to approximate counting and integration, in D. Hochbaum, ed., Approximation Algorithms for NP-Hard Problems, PWS Publishing, Boston, 1996, pp. 482–520.
[11] B. Kreko, Linear Programming, Sir Isaac Pitman and Sons Ltd, London, 1968.
[12] M. Luby, D. Randall and A. Sinclair, Markov chain algorithms for planar lattice structures (extended abstract), in 36th Annual Symposium on Foundations of Computer Science, IEEE, Milwaukee, Wisconsin, 1995, pp. 150–159.
[13] J. Salas and A. D. Sokal, Absence of phase transition for antiferromagnetic Potts models via the Dobrushin uniqueness theorem, Journal of Statistical Physics, 86 (1997), no. 3–4, pp. 551–579.

[14] M. D. Vose, A linear algorithm for generating random numbers with a given distribution, IEEE Transactions on Software Engineering, 17 (1991), no. 9, pp. 972–975.
[15] D. Williams, Probability with Martingales, Cambridge University Press, Cambridge, 1991.
