Conductance and Convergence of Markov Chains: A Combinatorial Treatment of Expanders

Milena Mihail*
Harvard University and U.C. Berkeley

(Extended Abstract)
Abstract
We give a direct combinatorial argument to bound the convergence rate of Markov chains in terms of their conductance (these are statements of the nature "random walks on expanders converge fast"). In addition to showing that the linear algebra in previous arguments for such results on time-reversible Markov chains was unnecessary, our direct analysis applies to general irreversible Markov chains.
1 Introduction
Recently there has been considerable interest in rapidly mixing properties of Markov chains, that is, Markov chains which come "close" to their stationary distribution after a "small" number of steps; "small" is to be compared with the number of states. From a theoretical perspective such properties introduce a complexity aspect to discrete probability: in contrast to classical Perron-Frobenius analysis [Se73], the convergence bounds are non-asymptotic [Ald83] [AD87] [Ald88] [Alo86] [Ald87] [SJ87]. (From the algorithmic perspective, rapidly mixing Markov chains on specific combinatorial populations have resulted in remarkable sampling and approximate counting schemes for hard problems [Br86] [JS88] [DLMV88] [DFK89].)

Analyzing the convergence rate of Markov chains is a formal way to reason about the convergence rate of random walks on expanders. So far, the reasoning used to establish the simple fact that "random walks on expanders converge fast" was strongly algebraic (influenced by non-trivial bounds on spectra that were essential in different applications of expanders, e.g. explicit constructions). In particular, the proofs preceding ours were obtained along the following general lines: The distance from stationarity is expressed by some "discrepancy" vector. For undirected graphs the adjacency matrix A of the graph is symmetric and possesses an orthogonal basis of eigenvectors. Consequently, the discrepancy vector can be written in this basis, and (under mild conditions) the second largest eigenvalue λ2 of A provides an effective characterization of the convergence rate [AD87] [Ald87]. More important, expansion implies separation of λ2 from 1 (the discrete analogue of Cheeger's theorem on Riemannian manifolds [Che70]), thus fast convergence is established [Ald87] [SJ87].

In contrast to this typical algebraic treatment of expanders, we reason from a purely combinatorial perspective, in fact from first principles, as follows:

- The convergence rate of a random walk on an expander graph can be viewed as the diffusion of an initial "charge" placed on the vertices of the graph. The charge is diffused along the edges of the expander according to a simple averaging rule: "each edge averages the charges of its two endpoints".

- Expansion is equivalent to "well distributed edges over the entire graph", which suggests a substantial number of edges with significantly different charges at their endpoints. Therefore the averaging is effective, the charge diffusion is rapid, and the convergence is fast.

Aside from providing a simple and straightforward insight for the rapidly mixing properties of expanders, our non-algebraic reasoning resulted in the natural generalization, to directed graphs and arbitrary finite Markov chains, of rapidly mixing statements that were known to hold only for undirected graphs and time-reversible Markov chains. Similar generalizations to arbitrary Markov chains with continuous state space have been obtained independently

* Supported by NSF-CCR86-58143
CH2806-8/89/0000/0526/$01.00 © 1989 IEEE
and in a different literature by Lawler and Sokal [LS88]. It should be pointed out that if the symmetry (resp. time-reversibility) assumption is dropped, then the adjacency matrix A is not guaranteed to possess and provide a basis of eigenvectors for the analysis of the discrepancy vector, and therefore the algebraic reasoning does not carry over (this was e.g. stated both by Aldous [Ald87] and by Sinclair and Jerrum [SJ87]). Finally, for the case of strongly aperiodic and reversible Markov chains our direct combinatorial reasoning improves previously known bounds by a constant factor (strongly aperiodic Markov chains are the ones typically used in the algorithmic context).

The rest of this paper is organized as follows: In Section 2 we discuss regular directed graphs with self-loops, where the statements and the proofs have simple intuitive interpretations. For random walks on such graphs we bound the convergence rate in terms of cutset expansion. The proof proceeds in 4 stages. Stages 1, 2, and 4 are conceptually and technically new. Stage 3 is in spirit and in technique similar to the treatment of eigenvectors in [Alo86] and [SJ87]. In Section 3 we state the results for arbitrary Markov chains and bound the convergence rate in terms of the "conductance". Conductance is an expansion-like property for the graph of ergodic flows of the Markov chain.
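The qualitative statement above, that averaging along well-distributed edges drives rapid convergence, can be watched numerically. The following sketch is our own illustration and not part of the paper; the circulant graph, the offsets, and all variable names are arbitrary choices. It runs a strongly aperiodic random walk on a d-regular graph and records the squared distance from the uniform stationary distribution, which shrinks by a constant factor in every step.

```python
import numpy as np

# Our own illustration (not from the paper): a strongly aperiodic random
# walk on a d-regular circulant digraph. The squared discrepancy
# sum_i (x_i(t) - pi_i)^2 from the uniform stationary distribution
# decreases by a constant factor per step.
n = 64
offsets = [1, 2, 5]                      # arbitrary connection pattern
A = np.zeros((n, n))
for i in range(n):
    for s in offsets:
        A[i, (i + s) % n] = 1            # edge i -> i+s
        A[i, (i - s) % n] = 1            # edge i -> i-s
d = int(A[0].sum())                      # in-degree = out-degree = d = 6
P = 0.5 * np.eye(n) + A / (2 * d)        # self-loops of weight 1/2

pi = np.full(n, 1.0 / n)                 # stationary distribution: uniform
x = np.zeros(n); x[0] = 1.0              # all initial charge on one vertex
norms = []
for t in range(30):
    norms.append(np.sum((x - pi) ** 2))
    x = x @ P
ratios = [norms[t + 1] / norms[t] for t in range(10)]
print(all(r < 1.0 for r in ratios))      # the norm strictly decreases
```

Any connected graph with in-degree equal to out-degree would do here; the circulant pattern is used only because it is easy to construct.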
2 Regular Directed Graphs

Let G(V, E) be a d-regular directed graph, i.e. for every vertex the in-degree is equal to the out-degree and equal to d. Let A be the adjacency matrix of G: a_ij = 1 if ij ∈ E and a_ij = 0 if ij ∉ E. Consider a random walk on G with transition matrix P:

p_ij = 1/2 if i = j,
p_ij = 1/(2d) if i ≠ j and ij ∈ E,
p_ij = 0 otherwise.

Notice that self-loops of weight 1/2 have been added on each vertex. Technically, this takes care of periodicities that may occur in directed graphs, and such random walks are usually called "strongly aperiodic". As we will discuss in Section 3, the strong aperiodicity assumption can be dropped; however, it is instructive to keep it for the time being.

Let x(t) be the distribution of the random walk at time t: x(t) = x(t-1)P = x(0)P^t. Let π = lim_{t→∞} x(t); π is the "stationary distribution" of the random walk (it is an elementary fact that if G is connected then π exists and is unique). Moreover, π satisfies π = πP; it is easy to check that for a random walk on G, π_i = 1/|V|, and the stationary distribution of the random walk is the uniform over V. We are interested in the rate at which x(t) approaches π. Equivalently, we are interested in the rate at which the discrepancy vector

e(t) = x(t) − π

approaches 0. To measure the distance of x(t) from π we use the following norm of the discrepancy vector:

||e(t)|| = Σ_i e_i(t)^2.

In what follows we show that the expansion of G determines the rate at which ||e(t)|| → 0. In particular, consider the following version of cutset expansion:

α = min_{A ⊆ V, 0 < |A| ≤ |V|/2} |C(A)| / (2d|A|),

where C(A) is the cutset of A: C(A) = {ij : i ∈ A, j ∈ V∖A, ij ∈ E}. In Theorem 2.1 we show that after each step the length of the discrepancy vector decreases significantly:

||e(t)|| − ||e(t+1)|| ≥ α^2 ||e(t)||.

Therefore

||e(t+1)|| ≤ (1 − α^2) ||e(t)||,

and the exponential convergence of the random walk follows:

||e(t)|| ≤ (1 − α^2)^t ||e(0)||.

The first results of this flavor were obtained for undirected graphs, and in terms of vertex expansion, by Aldous, who used Alon's bounds on eigenvalues [Alo86] [Ald87]. With respect to the version of cutset expansion defined above, Theorem 2.1 follows from the work of Sinclair and Jerrum for the special case of undirected graphs, with slightly worse constants, and under the same strong aperiodicity assumption: they showed ||e(t)|| ≤ (1 − α^2/2)^t ||e(0)|| [SJ87]. The constant was saved here by considering strong aperiodicity in all stages of the proof; in fact, such a consideration appears necessary to our combinatorial reasoning. Sinclair and Jerrum used strong aperiodicity simply to guarantee a positive spectrum and dominance of the convergence rate by λ2.

We proceed to state the Theorem.

Theorem 2.1 For any initial distribution x(0),

||e(t)|| − ||e(t+1)|| ≥ α^2 ||e(t)||.

PROOF. The proof proceeds in four stages along the intuition described in the introduction.

Stage 1: PROBABILITY CHARGES

The point here is to view the discrepancy e(t) = x(t) − π as a charge distributed over the vertices of the graph G, and further, to realize that the action of one step of the random walk on e(t) obeys a simple rule. In general, a charge f = <f_1, ..., f_n> is an assignment of real values to the vertex set V.
For a charge f, the norm of f is ||f|| = Σ_i f_i^2. A charge has a positive and a negative component, so let f_i^+ = max{f_i, 0} and f_i^- = min{f_i, 0}. A probability charge is a charge e where

Σ_{i∈V} e_i = Σ_{i∈V} e_i^+ + Σ_{i∈V} e_i^- = 0.

Clearly, the discrepancy vector e(t) is a probability charge: Σ_i e_i(t) = Σ_i x_i(t) − Σ_i π_i = 1 − 1 = 0. Moreover, the action of P on e(t) is identical to the action of P on the probability distribution x(t):

e(t+1) = x(t+1) − π = x(t)P − πP = e(t)P.    (1)

Stage 2: AVERAGING ALONG EDGES

We are interested in the decrease ||e(t)|| − ||e(t+1)||. The idea in what follows is to express the action of P on e(t) as an action of the edges of G. In fact, this will be done for an arbitrary charge f. First express ||f|| so that each edge (rather than vertex) of G is assigned a fraction of the charge of its endpoints:

||f|| = Σ_i f_i^2 = (1/2d) Σ_{ij∈E} (f_i^2 + f_j^2).    (2)

Next, describe the action of P on f as an averaging of f along the edges of G. For this, let f' = fP, and notice:

f'_i = (1/2) f_i + (1/2d) Σ_{j∈N^-(i)} f_j = (1/2d) Σ_{j∈N^-(i)} (f_i + f_j),    (3)

where N^-(i) = {j : ji ∈ E}. Now using the fact that the square of the mean is bounded by the mean of the squares we get:

||f'|| = Σ_i (f'_i)^2 ≤ (1/4d) Σ_{ij∈E} (f_i + f_j)^2.

Finally, establish that the averaging suggested by (2) and (3) is effective in reducing the norm of f if the quantities to be averaged are significantly different. This follows immediately from (2) and the bound above:

||f|| − ||f'|| ≥ (1/4d) Σ_{ij∈E} (f_i − f_j)^2.    (4)

Realize that (4) precisely suggests that the net decrease ||f|| − ||f'|| is due to edges of G with significantly different charges at their endpoints.

Stage 3: EFFECTIVE AVERAGING FOR EXPANDERS

In this and the next stage we want to pinpoint the idea that in cutset expanders the edges are "well distributed" all over the graph, therefore any placement of a charge on the vertices is forced to result in a large number of edges with significantly different charges at their endpoints. In this stage we will establish the above for a charge h such that |{i : h_i > 0}| ≤ |V|/2 and |{i : h_i < 0}| ≤ |V|/2 (wlog assume that |V| is odd). For such a charge h we show:

Σ_{ij∈E} (h_i − h_j)^2 ≥ 4dα^2 ||h||.    (5)

Intuitively and technically, (5) will be treated along the lines suggested by Sinclair and Jerrum's manipulation of eigenvectors, which in turn is closely related to Alon's treatment of eigenvectors (at this stage of the proof both of the above references make minimal use of the property that the function of vertices manipulated is an eigenvector). In particular, the reasoning is as follows: If the vertices of G are ordered according to the value of their charge, so that h_1 ≥ h_2 ≥ ..., then edges with significantly different charges at their endpoints become "long edges". If A_k = {1, ..., k}, then every such long edge ij will appear in each one of the cutsets C(A_k), k = i, ..., j−1. In turn, the size of these cutsets depends directly on cutset expansion.
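The identities of Stage 2 are easy to confirm numerically. The following sketch is our own sanity check, not part of the paper; the circulant digraph is an arbitrary example. It verifies the edge form of the norm, ||f|| = (1/2d) Σ_{ij∈E} (f_i^2 + f_j^2), and the averaging bound ||f|| − ||fP|| ≥ (1/4d) Σ_{ij∈E} (f_i − f_j)^2, for a random charge f.

```python
import numpy as np

# Our own numerical sanity check (not from the paper) of Stage 2:
# the edge form of the norm, and the averaging decrease bound, on a
# d-regular circulant digraph with self-loops of weight 1/2.
rng = np.random.default_rng(0)
n, offsets = 10, [1, 2, 4]               # arbitrary d-regular digraph, d = 3
E = [(i, (i + s) % n) for i in range(n) for s in offsets]
d = len(offsets)
P = 0.5 * np.eye(n)
for i, j in E:
    P[i, j] += 1 / (2 * d)               # p_ij = 1/(2d) on each edge

f = rng.normal(size=n)                   # an arbitrary charge

def norm(g):
    return np.sum(g ** 2)

edge_norm = sum(f[i] ** 2 + f[j] ** 2 for i, j in E) / (2 * d)
assert np.isclose(norm(f), edge_norm)    # the edge form of the norm

fp = f @ P                               # one averaging step
edge_term = sum((f[i] - f[j]) ** 2 for i, j in E) / (4 * d)
assert norm(f) - norm(fp) >= edge_term - 1e-12   # the decrease bound
print("stage 2 checks pass")
```

Note that neither check needs f to be a probability charge; both hold for arbitrary real charges, exactly as in the text.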
PROOF OF (5). Clearly,

Σ_{ij∈E} (h_i − h_j)^2 ≥ Σ_{ij∈E} (h_i^+ − h_j^+)^2 + Σ_{ij∈E} (h_i^- − h_j^-)^2,    (6)

so it suffices to establish the bound separately for the positive component h^+ and the negative component h^-. We treat h^+; the argument for h^- is symmetric. Order the vertices so that h_1 ≥ h_2 ≥ ... ≥ h_{|V|}, let m = (|V|+1)/2, and recall that h_m^+ = 0 (because of the special form of h). For the charge h^+, (2) suggests:

Σ_{ij∈E} ((h_i^+)^2 + (h_j^+)^2) = 2d ||h^+||.    (7)

Using (7) and the Cauchy-Schwarz inequality we have:

[Σ_{ij∈E} |(h_i^+)^2 − (h_j^+)^2|]^2 ≤ [Σ_{ij∈E} (h_i^+ − h_j^+)^2][Σ_{ij∈E} (h_i^+ + h_j^+)^2] ≤ 4d ||h^+|| Σ_{ij∈E} (h_i^+ − h_j^+)^2.    (8)

It remains to bound Σ_{ij∈E} |(h_i^+)^2 − (h_j^+)^2| from below. Recall that h_k^+ ≥ h_{k+1}^+, let A_k = {1, ..., k}, and notice that every edge ij with i < j contributes (h_i^+)^2 − (h_j^+)^2 = Σ_{k=i}^{j−1} ((h_k^+)^2 − (h_{k+1}^+)^2), one term for each cutset C(A_k) in which it appears; edges ji with i < j contribute symmetrically to the cutsets C(V∖A_k). Hence:

Σ_{ij∈E} |(h_i^+)^2 − (h_j^+)^2| = Σ_{k=1}^{|V|−1} ((h_k^+)^2 − (h_{k+1}^+)^2)(|C(A_k)| + |C(V∖A_k)|).    (9)

The fact that G is d-regular suggests |C(A_k)| = |C(V∖A_k)|, and (9) becomes:

Σ_{ij∈E} |(h_i^+)^2 − (h_j^+)^2| = 2 Σ_{k=1}^{|V|−1} ((h_k^+)^2 − (h_{k+1}^+)^2) |C(A_k)|.    (10)

Since h_m^+ = 0, all terms with k ≥ m vanish and (10) becomes:

Σ_{ij∈E} |(h_i^+)^2 − (h_j^+)^2| = 2 Σ_{k=1}^{m−1} ((h_k^+)^2 − (h_{k+1}^+)^2) |C(A_k)|.    (11)

Moreover, by the definition of α we know that |C(A_k)| ≥ 2dα|A_k| = 2dαk for k ≤ m−1 ≤ |V|/2; hence (11) becomes:

Σ_{ij∈E} |(h_i^+)^2 − (h_j^+)^2| ≥ 4dα Σ_{k=1}^{m−1} ((h_k^+)^2 − (h_{k+1}^+)^2) k = 4dα Σ_{k=1}^{m−1} (h_k^+)^2 = 4dα ||h^+||,    (12)

where the middle equality is the telescoping Σ_k k(a_k − a_{k+1}) = Σ_k a_k for a sequence with a_m = 0. In view of (8), (12) yields:

Σ_{ij∈E} (h_i^+ − h_j^+)^2 ≥ (4dα ||h^+||)^2 / (4d ||h^+||) = 4dα^2 ||h^+||.    (13)

Clearly, inequality (13) holds if h^+ is replaced by h^-. In view of (6) this completes the proof of (5).

Stage 4: NORMALIZATION

In this final stage we argue that the crucial bound in (5) holds for an arbitrary probability charge e(t). To establish this, consider an ordering of e(t) so that e_k(t) ≥ e_{k+1}(t), and let m be as before: m = (|V|+1)/2. Now let h be the charge such that h_i = e_i(t) − e_m(t), and notice that h_i ≥ 0 for i ≤ m while h_i ≤ 0 for i ≥ m; therefore (5) applies to h. The idea in what follows is that the net decrease of e(t) is at least as large as that of h. In particular, since h_i − h_j = e_i(t) − e_j(t) for all i and j, (4) and (5) suggest:

||e(t)|| − ||e(t+1)|| ≥ (1/4d) Σ_{ij∈E} (e_i(t) − e_j(t))^2 = (1/4d) Σ_{ij∈E} (h_i − h_j)^2 ≥ α^2 ||h||.    (14)

Since e(t) is a probability charge we have Σ_{i=1}^{|V|} e_i(t) = 0, and:

||h|| = Σ_i (e_i(t) − e_m(t))^2 = ||e(t)|| − 2 e_m(t) Σ_i e_i(t) + |V| e_m(t)^2 = ||e(t)|| + |V| e_m(t)^2 ≥ ||e(t)||.    (15)

Finally, (14) and (15) complete the proof of Theorem 2.1:

||e(t)|| − ||e(t+1)|| ≥ α^2 ||e(t)||.    □
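Theorem 2.1 can be checked numerically by brute force on a small example. The sketch below is our own illustration, not part of the paper; the graph, the offsets, and all names are arbitrary choices. It computes α by enumerating every subset of at most half the vertices and verifies the per-step decrease ||e(t)|| − ||e(t+1)|| ≥ α^2 ||e(t)|| (norms squared, as throughout this section).

```python
import numpy as np
from itertools import combinations

# Our own numerical check (not from the paper) of Theorem 2.1 on a small
# d-regular digraph, with alpha = min |C(A)| / (2d|A|) over |A| <= |V|/2.
n = 8
offsets = [1, 3]                          # circulant digraph: i -> i+1, i -> i+3
E = {(i, (i + s) % n) for i in range(n) for s in offsets}
d = len(offsets)                          # in-degree = out-degree = d = 2

P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5                         # strongly aperiodic: self-loop 1/2
    for s in offsets:
        P[i, (i + s) % n] = 1 / (2 * d)

def cutset(A):
    return sum(1 for (i, j) in E if i in A and j not in A)

alpha = min(cutset(set(A)) / (2 * d * len(A))
            for k in range(1, n // 2 + 1)
            for A in combinations(range(n), k))

pi = np.full(n, 1.0 / n)
x = np.zeros(n); x[0] = 1.0
ok = True
for t in range(20):
    e, e_next = x - pi, x @ P - pi
    decrease = (e @ e) - (e_next @ e_next)
    ok = ok and decrease >= alpha ** 2 * (e @ e) - 1e-12
    x = x @ P
print(ok)
```

The enumeration is exponential in |V| and is meant only to make the quantities in the theorem concrete on a toy instance.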
In view of ( E ) , (8) yields the following simple bound:
Remark : In terms of vertex expansion p = minA Ir(A)I/IAl, we can replace stage 3 by Alon’s treatment of eigenvectors [A10861 and show: 11 e‘(t) 11 - I(
529
The proof of Theorem 3.1 is similar to the proof of Thee 7 2 t-1) 112 P2/Sd (3+ P 2 ) 11 Z ( t ) 11. Roughly, Alon’s method considers a subgraph of G with p average ver- orem 2.1 and is omitted here. The key idea is simply tices in r ( A k ) per vertex of Ak (the subgraph induced by to notice that for the graph G p and for a n y i we have: ma-flow trough the expander). then uses the conductance w,, w J , (in-weights=out-weights for all vertices). bound for this subgraph. and finally normalizes by the deFinally, we may drop both the reversibilit,y and strong gree d to get the conductance of the original graph. aperiodicit y conditions. Henceforth P will be an arbitrary transition matrix over state space S, and Z(0). Z(t). and 5 are as before. To bound the convergence rate we introduce 3 General Markov Chains a new quantity, the “merging conductance”. The merging conductance @ > ( A )of a subset A of S is: Consider an irreducible and aperiodic Markov chain over state space S. IS1 = ti (irreducibility and aperiodicity are assumed simply to guarantee unique convergence). Let P = { p , j } denot,e its transition matrix, Z(0) the initial probability distribution, and 5 the unique stationary dis- Realize that the merging conductance of A is a measure tribution. As before. we are concerned with the rate at of the merge of ergodic flows that are conducted by A and which Z(‘(1) approaches x‘: equivalently, the rate a t which S \ A in stationarity. The merging conductance @> of P the discrepancy Z(t) = Z(t)- iT vanishes. We will charac- is terize this rate in terms of expansion-like properties of the graph of the ergodic parameters associated with P . Previously. results of this nature were obtained by Sin- Kotice that the merging conductance of P is simply the clair and Jerrum [SJSi]. Sinclair and Jerrum associated cutset expansion of the weighted graph I P M U f f . where with P the underlying graph of P : G p ( S .I T - ) , where Af is a diagonal matrix with m,, = 7r,-l. 
It can be shown wiJ = z i p t J . G p is the weighted graph of ergodic flows that the merging conductance determines the convergence of P. They further defined the Conductance @ p ( A ) of a rate: subset A of S as: Theorem 3.2 I[ q t ) I[< (1 11 9 0 ) (I
E, =EJ
i@>2)i
The conductance
@p
of P is:
Kotice that the conductance is the weighted edge analogue of cutset expansion. For the case of strongly aperiodic =wJa) (i.e. p a ,2 1/2 for all i) and time-reversible (i.e. uLJ Markov chains Sinclair and Jerrum’s bounds imply:
where throughout this section
The intuition behind this statement is. roughly, the following: Consider some discrepancy < t h a t assigns a large positive charge e: on all vertices j , of some set A and a large negative charge e,+? on all vertices j 2 of S \ A . For each vertex i such that p J 1 ,and p j z l are not negligible. part of the charges e: and e; will be iconducted’ in one step to i where they will “merge’ and “cancel out’. Therefore. the total decrease of the discrepancy should depend on the global distribution of such triples j l . j 2 . i which is: expressed by @ > ( A ) .The proof is left for the final version of the paper. It requires a detailed but otherwise straightforward treatment of the appropriate quantities. Remark 1 : Theorems 2.1, 3.1. and 3.2 could have been also obtained by some algebraic reasoning along the =< z ( ~ ) P Mq,t ) P >= following lines: 11 qt + 1) [I= Z(t)PMPTZ(t)* =< C ( t ) P M P * . q t ) > That is, the action of P on the norm of qt) can be related to the action of P M P T on qt). Kow P M P T is symmetric and possesses an orthogonal basis of eigenvectors. Consequently. its second largest eigenvalue A2 can be shown to bound 11 F(f 1) 11. In turn. Xz can be related to the merging conductance of P . which is the cutset expansion of Li-MWT. In fact. similar reasoning is rather typical in the context of bipartite expanders. Remark 2 : Our work suggests problems for further research is various directions. (a):To improve the bounds of
E,$
For time-reversible Markov chains where wlJ = w,, the matrix M’ is symmetric and possesses an orthogonal basis of eigenvectors, Sinclair and Jerrum‘s analysis makes strong use of the symnmry of M’ and their argument is algebraic. For general non-reversible and strongly aperiodic Markov chains we can show a slightly better bound:
5 30
+
Theorems 2.1.3.1. and 3.2 for special cases. or to show that they are tight. Such an improvement has been obtained by Diaconis [D89] for chains amenable to "canonical-path' arguments for conductance. (b):To explore if the bounds obtained here yield simple algorithms to approximate the expansion ofa graph. like the ones proposed in [Alo86] and [BSS'i]. (c):To extend Aldous's sample-averaging result for arbitrary Markov chains (Proposition 4.1 in [AldSi]. which, by the way. gives a remarkable upper bound 017 random resources). (d):To check how much of the linear algebra used in previous statements concerning expanders was actually necessary.
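The merging conductance can be computed directly from its description as the cutset expansion of the weighted graph W M W^T. The sketch below is our own illustration, not part of the paper; the three-state non-reversible chain is an arbitrary example, and all variable names are our own. It builds W M W^T, enumerates the subsets with stationary mass at most 1/2, and checks that the discrepancy of the chain indeed decays.

```python
import numpy as np
from itertools import combinations

# Our own illustration (not from the paper): the merging conductance as
# the cutset expansion of W M W^T, for a small non-reversible chain.
P = np.array([[0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6],
              [0.5, 0.3, 0.2]])
n = P.shape[0]

# stationary distribution: left eigenvector of P for eigenvalue 1
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()

W = np.diag(pi) @ P                    # ergodic flows w_ij = pi_i p_ij
M = np.diag(1.0 / pi)                  # m_ii = 1 / pi_i
F = W @ M @ W.T                        # the "merged" flow graph W M W^T

def phi(A):
    Ac = [i for i in range(n) if i not in A]
    return F[np.ix_(list(A), Ac)].sum() / pi[list(A)].sum()

merging_conductance = min(phi(A)
                          for k in range(1, n)
                          for A in combinations(range(n), k)
                          if pi[list(A)].sum() <= 0.5)

x = np.array([1.0, 0.0, 0.0])
dists = []
for t in range(20):
    dists.append(np.sum((x - pi) ** 2))
    x = x @ P
print(round(merging_conductance, 4), dists[0] > dists[-1])
```

A pleasant sanity check on the construction: the rows of W M W^T sum to π, so W M W^T is itself a flow matrix with the right marginals, which is what makes its cutset expansion well defined.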
Acknowledgment

I wish to thank Umesh Vazirani for the generous feedback of enlightening ideas that he provided, as usual. I also wish to thank Alistair Sinclair and Madhu Sudan for several important remarks.

References

[AD87] D. Aldous and P. Diaconis, Strong Uniform Times and Finite Random Walks, Advances in Applied Mathematics 8, 1987, 69-97

[Ald83] D. Aldous, Random Walks on Finite Groups and Rapidly Mixing Markov Chains, Séminaire de Probabilités XVII, Lecture Notes in Mathematics, Vol. 986, Springer Verlag, Berlin, 1983

[Ald87] D. Aldous, On the Markov Chain Simulation Method for Uniform Combinatorial Distributions and Simulated Annealing, Probability in Eng. and Inf. Sci. 1, 1987, 33-46

[Ald88] D. Aldous, Random Walks on Exponentially Large Graphs: A Survey, preprint, U.C. Berkeley, Spring 1988

[Alo86] N. Alon, Eigenvalues and Expanders, Combinatorica 6(2), 1986, 83-96

[Br86] A.Z. Broder, How hard is it to marry at random? (On the approximation of the permanent), STOC 1986, 50-58

[BS87] A.Z. Broder and E. Shamir, On the Second Eigenvalue of Random Regular Graphs, FOCS 1987, 286-294

[Che70] J. Cheeger, A Lower Bound for the Smallest Eigenvalue of the Laplacian, Problems in Analysis, Princeton University Press, New Jersey, 1970, 195-199

[D89] P. Diaconis, personal communication

[DFK89] M. Dyer, A. Frieze, and R. Kannan, A Random Polynomial Time Algorithm for Estimating Volumes of Convex Bodies, STOC 1989, 375-381

[DLMV88] P. Dagum, M. Luby, M. Mihail, and U. Vazirani, Polytopes, Permanents, and Graphs with Large Factors, FOCS 1988, 412-421

[JS88] M.R. Jerrum and A. Sinclair, Conductance and the Rapid Mixing Property for Markov Chains: the Approximation of the Permanent Resolved, STOC 1988, 235-243

[LS88] G.F. Lawler and A.D. Sokal, Bounds on the L2 Spectrum for Markov Chains and Markov Processes: A Generalization of Cheeger's Inequality, Transactions of the American Mathematical Society, Vol. 309, No. 2, 1988, pp. 557-580

[Se73] E. Seneta, Non-negative Matrices and Finite Markov Chains, Springer Series in Statistics, New York, 1981

[SJ87] A. Sinclair and M.R. Jerrum, Approximate Counting, Uniform Generation and Rapidly Mixing Markov Chains, Information and Computation (to appear)