Deterministic Random Walks for Rapidly Mixing Chains

Takeharu Shiraga∗, Yukiko Yamauchi∗, Shuji Kijima∗, Masafumi Yamashita∗

arXiv:1311.3749v1 [cs.DM] 15 Nov 2013

November 18, 2013
Abstract

The rotor-router model, also known as the Propp machine, is a deterministic process analogous to a random walk on a graph. Instead of distributing tokens to randomly chosen neighbors, the rotor-router deterministically serves the neighbors in a fixed order. This paper is concerned with a generalized model, the functional-router model. While the rotor-router is an analogue of a random walk consisting of only rational transition probabilities, realized using parallel edges, the functional-router imitates a random walk that may contain irrational transition probabilities. For the functional-router model, we investigate the discrepancy on a single vertex between the number of tokens in the functional-router model and the expected number of tokens in the corresponding random walk. We present an upper bound on the discrepancy in terms of the mixing rate of the corresponding transition matrix. Using this result, we give polynomial-time deterministic samplers for particular #P-complete problems, such as 0-1 knapsack solutions, linear extensions, matchings, etc., for which rapidly mixing chains are known. Our deterministic algorithm provides samples from a "distribution" with point-wise distance at most $\varepsilon$ from the target distribution, in time polynomial in the input size and $\varepsilon^{-1}$.

Key words: rotor-router model, Markov chain Monte Carlo, mixing time, #P-complete.
1 Introduction

The rotor-router model, also known as the Propp machine, is a deterministic process analogous to a random walk on a graph [41, 8, 30]. Instead of distributing tokens to randomly chosen neighbors, the rotor-router model deterministically serves the neighbors in a fixed order by associating to each vertex a "rotor-router" pointing to one of its neighbors. Doerr et al. [6, 10] first called the rotor-router model a deterministic random walk, meaning a "derandomized, hence deterministic, version of a random walk."

Single vertex discrepancy for multiple-walk. Cooper and Spencer [8] are concerned with the model of multiple tokens (multiple-walk) on $\mathbb{Z}^n$, and investigated the discrepancy on a single vertex: they gave a bound $|\chi_v^{(t)} - \mu_v^{(t)}| \le c_n$, where $\chi_v^{(t)}$ (resp. $\mu_v^{(t)}$) denotes the number (resp. expected number) of tokens on vertex $v \in \mathbb{Z}^n$ in a rotor-router model (resp. corresponding random walk) at time $t$, on the condition that $\mu_v^{(0)} = \chi_v^{(0)}$ for any $v$, and $c_n$ is a constant depending only on $n$ but independent of the total number of tokens in the system. Cooper et al. [6] showed $c_1 \simeq 2.29$, and Doerr and Friedrich [10] showed that $c_2$ is about 7.29 or 7.83, depending on the routing rules. On the other hand, Cooper et al. [5] gave an example of $|\chi_v^{(t)} - \mu_v^{(t)}| = \Omega(\sqrt{kt})$ on infinite $k$-regular trees; the example implies that the discrepancy can get arbitrarily large as the total number of tokens increases.

Motivated by a derandomization of Markov chains, Kijima et al. [30] are concerned with multiple-walks on general finite multidigraphs $(V, A)$, and gave a bound $|\chi_v^{(t)} - \mu_v^{(t)}| = O(|V||A|)$ in the case that the

∗Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
[email protected], {yamauchi,kijima,mak}@inf.kyushu-u.ac.jp
corresponding Markov chain is ergodic, reversible, and lazy. They also gave some examples of $|\chi_v^{(t)} - \mu_v^{(t)}| = \Omega(|A|)$. Kajino et al. [29] refined the approach of [30], and gave a bound in terms of the second largest eigenvalue and the eigenvectors of the corresponding Markov chain, for an arbitrary irreducible finite Markov chain, which need not be lazy, reversible, or aperiodic.

For some specific finite graphs, namely hypercubes and tori, bounds in terms of the logarithm of the size of the transition diagram are known. For the $n$-dimensional hypercube, Kijima et al. [30] gave a bound $O(n^3)$, and Kajino et al. [29] improved the bound to $O(n^2)$. Recently, Akbari and Berenbrink [1] gave a bound $O(n^{1.5})$, using results by Friedrich et al. [17]. Akbari and Berenbrink [1] also gave a bound $O(1)$ for constant-dimensional tori. Those analyses highly depend on the structures of the specific graphs, and it is difficult to extend the techniques to other combinatorial graphs. Kijima et al. [30] raised the question of whether there is a deterministic random walk for #P-complete problems, such as 0-1 knapsack solutions, bipartite matchings, etc., such that $|\chi_v^{(t)} - \mu_v^{(t)}|$ is bounded by a polynomial in the input size.

Other topics on deterministic random walk. As a highly related topic, Holroyd and Propp [21] analyzed the "hitting time" of the rotor-router machine with a single token (single-walk) on finite simple graphs, and gave a bound $|\nu_v^{(t)} - t\pi_v| = O(|V||A|)$, where $\nu_v^{(t)}$ denotes the frequency of visits of the token at vertex $v$ in $t$ steps, and $\pi$ denotes the stationary distribution of the corresponding random walk. Friedrich and Sauerwald [18] studied the cover time of a single-walk version of the rotor-router machine for several basic finite graphs such as trees, stars, tori, hypercubes, and complete graphs. Holroyd and Propp [21] also proposed a generalized model called the stack walk, which is, as far as we know, the first model of a deterministic random walk for irrational transition probabilities.
While Holroyd and Propp [21] showed the existence of routers which approximate irrational transition probabilities well, Angel et al. [2] gave a simple routing algorithm based on the "shortest remaining time." Shiraga et al. [43], a preliminary version of this paper, independently investigated a model based on the van der Corput sequence, also motivated by irrational transition probabilities.

As another topic on the rotor-router model, the aggregation model has been investigated [34, 32, 35, 36]. For a random walk, tokens in the Internal Diffusion-Limited Aggregation (IDLA) model on $\mathbb{Z}^n$ asymptotically converge to the Euclidean ball [33], and Jerison, Levine, and Sheffield [26] recently showed that the fluctuations from circularity are $O(\log t)$ after $t$ steps. For the rotor-router model, Levine and Peres [34, 35, 36] showed that tokens in the rotor-router aggregation model also form the Euclidean ball, and showed several bounds on the fluctuations. Kleber [32] gave some computational results.

Doerr, Friedrich, and Sauerwald [12] showed that information spreading by the rotor-router machine is faster than that by a random walk on some specific graphs, namely trees with common depth and common degree of inner vertices, and random graphs with restricted connectivity. Doerr, Friedrich, Künnemann, and Sauerwald [11] gave some computational results for this phenomenon. There is much other research on information spreading by the rotor-router machine on various graphs [3, 9, 13, 14, 15, 23].

Our Results. This paper is concerned with a parallel-walk version of the functional-router model, a generalization of the rotor-router model. While the rotor-router is an analogue of a random walk with rational transition probabilities, the functional-router imitates a random walk that may contain irrational transition probabilities, by means of routing functions defined on the vertices. In the functional-router model, a configuration of $M$ tokens over a finite set $V = \{1, \dots, N\}$ is deterministically updated; let $\chi^{(t)} = (\chi_1^{(t)}, \dots, \chi_N^{(t)}) \in \mathbb{Z}_{\ge 0}^N$ denote the configuration at time $t = 0, 1, 2, \dots$, i.e., $\sum_{v \in V} \chi_v^{(t)} = M$. For comparison, let $\mu^{(0)} = \chi^{(0)}$, and let $\mu^{(t)} = \mu^{(0)} P^t$; then $\mu^{(t)} \in \mathbb{R}_{\ge 0}^N$ denotes the expected configuration of $M$ tokens moving independently according to $P$ for $t$ steps. A main contribution of this paper is to show that $|\chi_v^{(t)} - \mu_v^{(t)}| \le 3(\pi_{\max}/\pi_{\min}) t^* \Delta$ holds for any $v \in V$ at any time $t$ in the case that the corresponding transition matrix $P$ is ergodic and reversible, where
$\pi_{\max}$ and $\pi_{\min}$ are the maximum and minimum values of $\pi$, respectively, $t^*$ is the mixing rate of the corresponding Markov chain, and $\Delta$ is the maximum degree of the transition diagram. This result yields polynomial-time deterministic algorithms for uniform sampling of some #P-complete problems, such as knapsack solutions, linear extensions, matchings, $q$-colorings, etc., for which rapidly mixing chains exist. Thus, our result affirmatively answers the question by Kijima et al. [30]. Setting the number of tokens $M \ge 3\varepsilon^{-1} t^* \Delta$ for an arbitrary $\varepsilon$ ($0 < \varepsilon < 1$), our algorithm provides $M$ samples with a "distribution" $\widetilde{\chi}^{(t)} := \chi^{(t)}/M$, of which the point-wise distance $\|\widetilde{\chi}^{(t)} - \pi\|_\infty$ from the uniform distribution $\pi$ over the target set is at most $\varepsilon$. For instance, our algorithm runs in $O^*(n^{11.1} \varepsilon^{-1})$ time for $n$-dimensional 0-1 knapsack solutions, in $O^*(n^8 \varepsilon^{-1})$ time for linear extensions of an $n$-element poset, in $O^*(m^4 n^4 \varepsilon^{-1})$ time for all matchings in a graph with $n$ vertices and $m$ edges, and in $O^*(c_q n^4 \varepsilon^{-1})$ time for all $q$-colorings in a graph with $n$ vertices, where $c_q$ is a constant depending on $q$ and the maximum degree of the input graph. The $O^*$ notation ignores $\mathrm{poly}(\log(\varepsilon^{-1}), \log m, \log n)$ factors. Note that those orders of magnitude are not optimized, for simplicity of the main arguments. Unfortunately, these running times are the best possible in terms of $\varepsilon^{-1}$ for any deterministic sampler of this type, because of the integrality of the number of tokens.

An example of a random walk containing irrational transition probabilities is the $\beta$-random walk devised by Ikeda et al. [24], which achieves an $O(N^2)$ hitting time and an $O(N^2 \log N)$ cover time for any graph. Other examples arise in Markov chain Monte Carlo (MCMC), such as Gibbs samplers for the Ising model (cf. [44, 42]), reversible Markov chains for queueing networks (cf. [31]), etc.

Recently, derandomization of randomized algorithms has become an interesting and challenging topic.
Gopalan, Klivans, and Meka [19], and Stefankovic, Vempala, and Vigoda [45] gave deterministic approximation algorithms for counting knapsack solutions (cf. [20]), for which randomized approximation algorithms based on MCMC [38] or on rejection sampling via dynamic programming [16] had been known. Cooper, Ilcinkas, Klasing, and Kosowski [7] were concerned with exploration of an anonymous graph using derandomized random walks. This paper is motivated by the development of a general scheme for derandomizing randomized algorithms based on Markov chains, such as MCMC, and the functional-router model proposed in this paper is hopefully a step toward this goal.

Organization. This paper is organized as follows. In Section 2, we briefly review MCMC, as a preliminary to the analysis. In Section 3, we describe the functional-router model and state our main result. In Section 4, we prove the main theorem. In Section 5, we show some constructions of routing functions, and give upper bounds for the corresponding models. In Section 6, we present a deterministic sampling algorithm, and show examples of polynomial-time uniform samplers for #P-complete problems, namely for knapsack solutions, linear extensions, matchings, and $q$-colorings.
2 Preliminaries: Markov Chain Monte Carlo

As a preliminary step toward our deterministic sampling, this section briefly reviews Markov chain Monte Carlo (MCMC). See e.g. [44, 37, 39] for details of MCMC.

Let $V = \{1, \dots, N\}$ be a finite set, and suppose that we wish to sample from $V$ with probability proportional to a given positive vector $f = (f_1, \dots, f_N) \in \mathbb{R}_{\ge 0}^N$; for example, we are concerned with uniform sampling of 0-1 knapsack solutions in Section 6.3, where $V$ denotes the set of 0-1 knapsack solutions and $f_v = 1$ for each $v \in V$. The idea of Markov chain Monte Carlo (MCMC) is to sample from the limit distribution of a Markov chain, which is made equal to the target distribution $f/\|f\|_1$, where $\|f\|_1 = \sum_{v \in V} f_v$ is the normalizing constant.

Let $P \in \mathbb{R}_{\ge 0}^{N \times N}$ be the transition matrix of a Markov chain with state space $V$, where $P_{u,v}$ denotes the transition probability from $u$ to $v$ ($u, v \in V$). A transition matrix $P$ is irreducible if for any $u$ and $v$ in $V$ there is some $t \in \mathbb{Z}_{>0}$ with $P_{u,v}^t > 0$, and is aperiodic if $\mathrm{GCD}\{t \in \mathbb{Z}_{>0} \mid P_{x,x}^t > 0\} = 1$ holds for any $x \in V$, where $P_{u,v}^t$ denotes the $(u, v)$ entry of $P^t$, the $t$-th power of $P$. An irreducible and aperiodic transition matrix is called ergodic. It is well known that for an ergodic $P$ there is a unique stationary distribution $\pi \in \mathbb{R}_{\ge 0}^N$, i.e., $\pi P = \pi$, and the limit distribution is $\pi$, i.e., $\xi P^\infty = \pi$ for any probability distribution $\xi \in \mathbb{R}_{\ge 0}^N$ on $V$.

An ergodic Markov chain defined by a transition matrix $P \in \mathbb{R}_{\ge 0}^{N \times N}$ is reversible if the detailed balance equation

$$f_u P_{u,v} = f_v P_{v,u} \qquad (1)$$
holds for any $u, v \in V$. When $P$ satisfies the detailed balance equation, it is not difficult to see that $f P = f$ holds, meaning that $f/\|f\|_1$ is the limit distribution (see e.g. [37]).

Let $\xi$ and $\zeta$ be distributions on $V$; then the total variation distance $D_{\mathrm{tv}}$ between $\xi$ and $\zeta$ is defined by

$$D_{\mathrm{tv}}(\xi, \zeta) \stackrel{\mathrm{def.}}{=} \max_{A \subseteq V} \sum_{v \in A} (\xi_v - \zeta_v) = \frac{1}{2}\|\xi - \zeta\|_1. \qquad (2)$$
Note that $D_{\mathrm{tv}}(\xi, \zeta) \le 1$, since $\|\xi\|_1$ and $\|\zeta\|_1$ are each equal to one. The mixing time of a Markov chain is defined by

$$\tau(\varepsilon) \stackrel{\mathrm{def.}}{=} \max_{v \in V} \min\left\{ t \in \mathbb{Z}_{\ge 0} \mid D_{\mathrm{tv}}(P_{v,\cdot}^t, \pi) \le \varepsilon \right\} \qquad (3)$$

for any $\varepsilon > 0$, where $P_{v,\cdot}^t$ denotes the $v$-th row vector of $P^t$; i.e., $P_{v,\cdot}^t$ denotes the distribution of the Markov chain at time $t$ starting from the initial state $v \in V$. In other words, the distribution $P_{v,\cdot}^t$ of the Markov chain after $\tau(\varepsilon)$ transitions satisfies $D_{\mathrm{tv}}(P_{v,\cdot}^t, \pi) \le \varepsilon$, meaning that we obtain an approximate sample from the target distribution.

For convenience, let $h(t) \stackrel{\mathrm{def.}}{=} \max_{w \in V} D_{\mathrm{tv}}(P_{w,\cdot}^t, \pi)$ for $t \ge 0$; then it is well known that $h$ satisfies a kind of submultiplicativity. We will use the following proposition in the analysis of our algorithm in Section 4. See [37] or Appendix A for the proof.
Proposition 2.1. For any integers $\ell$ ($\ell \ge 1$) and $k$ ($0 \le k < \tau(\gamma)$),

$$h(\ell \cdot \tau(\gamma) + k) \le \frac{1}{2}(2\gamma)^\ell$$

holds for any $\gamma$ ($0 < \gamma < 1/2$).

Thus, $t^* \stackrel{\mathrm{def.}}{=} \tau(1/4)$, called the mixing rate, is often used as a characterization of $P$.
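To make the definitions concrete, the total variation distance (2) and the mixing time (3) can be computed directly by matrix powering for a tiny chain. The following is an illustrative sketch, not code from the paper; the two-state matrix below is a hypothetical example.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def tv_distance(xi, zeta):
    """Total variation distance: D_tv(xi, zeta) = (1/2) ||xi - zeta||_1."""
    return sum(abs(x - z) for x, z in zip(xi, zeta)) / 2

def mixing_time(P, pi, eps):
    """tau(eps): the smallest t with max_v D_tv(P^t_{v,.}, pi) <= eps."""
    Pt, t = P, 1
    while max(tv_distance(row, pi) for row in Pt) > eps:
        Pt, t = mat_mul(Pt, P), t + 1
    return t

# hypothetical two-state reversible chain with uniform stationary distribution
P = [[Fraction(3, 4), Fraction(1, 4)],
     [Fraction(1, 4), Fraction(3, 4)]]
pi = [Fraction(1, 2), Fraction(1, 2)]
t_star = mixing_time(P, pi, Fraction(1, 4))  # the mixing rate t* = tau(1/4)
```

Exact rational arithmetic (`Fraction`) avoids any ambiguity in the threshold comparison; for this toy chain $D_{\mathrm{tv}}(P_{v,\cdot}^t, \pi) = \frac{1}{2}(\frac{1}{2})^t$, so $t^* = 1$.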
3 Model and Main result

The functional-router model is a deterministic process analogous to a multiple random walk. Roughly speaking, a token on $u$ moves to a neighboring vertex $v$ with probability $P_{u,v}$ in a multiple random walk, whereas a router defined on each vertex $u$ deterministically serves tokens to $v$ at a rate of $P_{u,v}$ in the functional-router model. To convey the idea, we start in Section 3.1 by explaining the rotor-router model (see e.g. [8, 30]), which corresponds to a simple random walk on a graph.
3.1 Rotor-router model

Let $G = (V, E)$ be a simple undirected graph¹, where $V = \{1, \dots, N\}$. Let $N(v)$ denote the neighborhood of $v \in V$. For convenience, let $\delta(v) = |N(v)|$. Let $\chi^{(0)} \in \mathbb{Z}_{\ge 0}^N$ be an initial configuration of tokens, and let $\chi^{(t)} \in \mathbb{Z}_{\ge 0}^N$ denote the configuration of tokens at time $t \in \mathbb{Z}_{\ge 0}$ in the rotor-router model. A configuration $\chi^{(t)}$ is updated by rotor-routers on the vertices, as follows. Without loss of generality, we may assume that an ordering $u_0, \dots, u_{\delta(v)-1}$ is defined on $N(v)$ for each $v \in V$. Then, the rotor-router $\sigma_v \colon \mathbb{Z}_{\ge 0} \to N(v)$ on $v \in V$ is defined by

$$\sigma_v(j) \stackrel{\mathrm{def.}}{=} u_{j \bmod \delta(v)} \quad \text{for } j \in \mathbb{Z}_{\ge 0}. \qquad (4)$$

Let

$$Z_{v,u}^{(t)} \stackrel{\mathrm{def.}}{=} \left|\left\{ j \in \{0, \dots, \chi_v^{(t)} - 1\} \;\middle|\; \sigma_v\!\left(j + \sum_{s=0}^{t-1} \chi_v^{(s)}\right) = u \right\}\right|$$

for $v, u \in V$, where $Z_{v,u}^{(t)}$ denotes the number of tokens served from $v$ to $u$ in the update. Then, $\chi^{(t+1)}$ is defined by

$$\chi_u^{(t+1)} \stackrel{\mathrm{def.}}{=} \sum_{v \in V} Z_{v,u}^{(t)}$$

for each $u \in V$. It is not difficult to see that

$$\frac{\left|\{ j \in \{0, \dots, z-1\} \mid \sigma_v(j) = u \}\right|}{z} \;\xrightarrow{\;z \to \infty\;}\; \frac{1}{\delta(v)}$$

holds, which implies that the "outflow ratio" $\sum_{s=0}^{t} Z_{v,u}^{(s)} / \sum_{s=0}^{t} \chi_v^{(s)}$ of tokens from $v$ to $u$ approaches $1/\delta(v)$ asymptotically as $t$ increases. Thus, the rotor-router hopefully approximates the distribution of tokens of a random walk.
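As a concrete illustration (a toy sketch, not code from the paper), one synchronous rotor-router update under rule (4) can be implemented as follows; the triangle graph below is a hypothetical example.

```python
def rotor_router_step(chi, neighbors, served):
    """One synchronous rotor-router update: vertex v serves its chi[v] tokens
    to neighbors[v] cyclically, starting from the rotor position served[v]
    (the total number of tokens v has served so far, sum_{s<t} chi_v^{(s)})."""
    nxt = {v: 0 for v in neighbors}
    for v, count in chi.items():
        d = len(neighbors[v])
        for j in range(count):
            u = neighbors[v][(served[v] + j) % d]
            nxt[u] += 1
        served[v] += count
    return nxt

# hypothetical example: 7 tokens on vertex 'a' of a triangle a-b-c
neighbors = {'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b']}
chi = {'a': 7, 'b': 0, 'c': 0}
served = {v: 0 for v in neighbors}
chi = rotor_router_step(chi, neighbors, served)
# 'a' serves b, c, b, c, b, c, b: four tokens reach 'b' and three reach 'c'
```

Repeating the step with the persistent `served` counters is exactly what makes the long-run outflow ratio tend to $1/\delta(v)$.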
3.2 Functional-router model

Let $P \in \mathbb{R}_{\ge 0}^{N \times N}$ be a transition matrix of a Markov chain with state space $V = \{1, \dots, N\}$, where $P_{u,v}$ denotes the transition probability from $u$ to $v$ ($u, v \in V$). Note that $P_{u,v}$ may be irrational². In this paper, we assume that $P$ is ergodic and reversible (see Section 2). Let $\mu^{(0)} = (\mu_1^{(0)}, \dots, \mu_N^{(0)}) \in \mathbb{Z}_{\ge 0}^N$ denote an initial configuration of $M$ tokens over $V$, and let $\mu^{(t)} \in \mathbb{R}_{\ge 0}^N$ denote the expected configuration of tokens moving independently according to $P$ at time $t \in \mathbb{Z}_{\ge 0}$, i.e., $\|\mu^{(t)}\|_1 = M$ and $\mu^{(t)} = \mu^{(0)} P^t$. Let $G = (V, E)$ be the transition diagram of $P$, meaning that $E = \{(u, v) \in V^2 \mid P_{u,v} > 0\}$. Note that $E$ may contain self-loop edges, and also note that $|E| \le N^2$ holds. Let $N(v)$ denote the (out-)neighborhood³ of $v \in V$, i.e., $N(v) = \{u \in V \mid P_{v,u} > 0\}$, and let $\delta(v) = |N(v)|$. Note that $v \in N(v)$ if $P_{v,v} > 0$.

Let $\chi^{(0)} = \mu^{(0)}$, and let $\chi^{(t)} \in \mathbb{Z}_{\ge 0}^N$ denote the configuration of tokens at time $t \in \mathbb{Z}_{\ge 0}$ in the functional-router model. A configuration $\chi^{(t)}$ is updated by functional-routers $\sigma_v \colon \mathbb{Z}_{\ge 0} \to N(v)$ defined on each $v \in V$ so as to imitate $P_{v,u}$. To be precise, let

$$I_{v,u}[z, z') \stackrel{\mathrm{def.}}{=} \left|\left\{ j \in \{z, \dots, z'-1\} \mid \sigma_v(j) = u \right\}\right| \qquad (5)$$

¹In Section 5.2, we are concerned with the model on multidigraphs.
²e.g., $P_{u,v} = \sqrt{5}/10$, $\exp(-10)$, $\sin(\pi/6)$, etc. are allowed.
³Since $P$ is reversible, $u \in N(v)$ if and only if $v \in N(u)$, and then we abuse $N(v)$ for the in-neighborhood of $v \in V$.
for $v, u \in V$ and for any $z, z' \in \mathbb{Z}_{\ge 0}$ satisfying $z < z'$, for convenience.

Figure 1: In this example, $V = \{1, 2\}$, $\chi^{(0)} = (7, 0)$, $\chi^{(1)} = (4, 3)$, $\chi^{(2)} = (5, 2)$, and $I_{1,2}[4, 10) = |\{ j \in \{4, \dots, 9\} \mid \sigma_1(j) = 2 \}| = 2$.

Then, $\sigma_v$ is designed so as to minimize $|I_{v,u}[0, z) - P_{v,u} z|$ for $z \in \mathbb{Z}_{\ge 0}$. Some specific routing functions are given in Sections 5.1, 5.2, and 5.3. Let

$$Z_{v,u}^{(t)} \stackrel{\mathrm{def.}}{=} I_{v,u}\!\left[\sum_{s=0}^{t-1} \chi_v^{(s)}, \; \sum_{s=0}^{t} \chi_v^{(s)}\right) \qquad (6)$$

for $v, u \in V$, where $Z_{v,u}^{(t)}$ denotes the number of tokens served from $v$ to $u$ in the update. Then, $\chi^{(t+1)}$ is defined by

$$\chi_u^{(t+1)} \stackrel{\mathrm{def.}}{=} \sum_{v \in V} Z_{v,u}^{(t)} \qquad (7)$$

for each $u \in V$.

In Section 5 we show examples of functional-routers for which the "outflow ratio" $\sum_{s=0}^{t} Z_{v,u}^{(s)} / \sum_{s=0}^{t} \chi_v^{(s)}$ from $v$ to $u$ approaches $P_{v,u}$ asymptotically as $t$ increases, meaning that the functional-router hopefully approximates the distribution of tokens of a random walk.
3.3 Main results

Our goal is to estimate the discrepancy $|\chi_w^{(T)} - \mu_w^{(T)}|$ for $w \in V$ and $T \ge 0$ for the functional-router model described in Section 3.2. Let

$$\Psi_\sigma = \max_{v \in V,\, u \in N(v),\, t \ge 0} \left| Z_{v,u}^{(t)} - \chi_v^{(t)} P_{v,u} \right|, \qquad (8)$$

and then the following is our main theorem.

Theorem 3.1. Let $P \in \mathbb{R}_{\ge 0}^{N \times N}$ be a transition matrix of a reversible and ergodic Markov chain with state space $V$, and let $\pi$ be the stationary distribution of $P$. Then, the configurations $\chi^{(T)}$ and $\mu^{(T)}$ of tokens in a functional-router model and in its corresponding random walk satisfy

$$\left| \chi_w^{(T)} - \mu_w^{(T)} \right| \le \frac{2(1-\gamma)}{1-2\gamma}\, \Psi_\sigma\, \frac{\pi_w}{\pi_{\min}}\, \tau(\gamma)\, \Delta$$

for any $w \in V$, $T \ge 0$, and $\gamma$ ($0 < \gamma < 1/2$).
We remark that

$$\Psi_\sigma \le \max_{\substack{v \in V,\, u \in N(v),\\ z, z' \in \mathbb{Z}_{\ge 0} \text{ s.t. } z' > z}} \left| I_{v,u}[z, z') - (z' - z) P_{v,u} \right| \qquad (9)$$

holds, since

$$Z_{v,u}^{(t)} - \chi_v^{(t)} P_{v,u} = I_{v,u}\!\left[\sum_{s=0}^{t-1} \chi_v^{(s)}, \; \sum_{s=0}^{t} \chi_v^{(s)}\right) - \left( \sum_{s=0}^{t} \chi_v^{(s)} - \sum_{s=0}^{t-1} \chi_v^{(s)} \right) P_{v,u}$$

holds by the definition. We give upper bounds on $\Psi_\sigma$ for some specific routing functions in Sections 5.1, 5.2, and 5.3, using (9).
4 Analysis of the Point-wise Distance

This section proves Theorem 3.1. Our proof technique in this section is similar to that of previous results [8, 30]. To begin with, we establish the following key lemma.

Lemma 4.1. Let $P \in \mathbb{R}_{\ge 0}^{N \times N}$ be a transition matrix of a reversible and ergodic Markov chain with state space $V$, and let $\pi$ be the stationary distribution of $P$. Then, the configurations $\chi^{(T)}$ and $\mu^{(T)}$ of tokens in the algorithm and in the corresponding random walk satisfy

$$\chi_w^{(T)} - \mu_w^{(T)} = \sum_{t=0}^{T-1} \sum_{u \in V} \sum_{v \in N(u)} \left( Z_{v,u}^{(t)} - \chi_v^{(t)} P_{v,u} \right) \left( P_{u,w}^{T-t-1} - \pi_w \right)$$

for any $w \in V$ and for any $T \ge 0$.

Proof. Remark that

$$\chi_w^{(T)} - \mu_w^{(T)} = \left( \chi^{(T)} - \mu^{(0)} P^T \right)_w = \left( \chi^{(T)} P^0 - \chi^{(0)} P^T \right)_w \qquad (10)$$
holds, where the last equality follows from the assumption $\chi^{(0)} = \mu^{(0)}$. It is not difficult to see that

$$\begin{aligned}
\chi^{(T)} P^0 - \chi^{(0)} P^T
&= \left( \chi^{(T)} P^0 - \chi^{(T-1)} P^1 \right) + \left( \chi^{(T-1)} P^1 - \chi^{(T-2)} P^2 \right) + \cdots \\
&\quad + \left( \chi^{(2)} P^{T-2} - \chi^{(1)} P^{T-1} \right) + \left( \chi^{(1)} P^{T-1} - \chi^{(0)} P^T \right) \\
&= \sum_{t=0}^{T-1} \left( \chi^{(t+1)} P^{T-t-1} - \chi^{(t)} P^{T-t} \right)
\end{aligned}$$

holds; thus we have

$$\begin{aligned}
(10) &= \sum_{t=0}^{T-1} \left( \left( \chi^{(t+1)} P^{T-t-1} \right)_w - \left( \chi^{(t)} P^{T-t} \right)_w \right) \\
&= \sum_{t=0}^{T-1} \left( \sum_{u \in V} \chi_u^{(t+1)} P_{u,w}^{T-t-1} - \sum_{u \in V} (\chi^{(t)} P)_u\, P_{u,w}^{T-t-1} \right) \\
&= \sum_{t=0}^{T-1} \sum_{u \in V} \left( \chi_u^{(t+1)} - (\chi^{(t)} P)_u \right) P_{u,w}^{T-t-1}. \qquad (11)
\end{aligned}$$
While $\sum_{u \in V} \left( \chi_u^{(t+1)} - (\chi^{(t)} P)_u \right) P_{u,w}^{T-t-1}$ in (11) may not be 0 in general, remark that

$$\sum_{u \in V} \left( \chi_u^{(t+1)} - (\chi^{(t)} P)_u \right) = \sum_{u \in V} \chi_u^{(t+1)} - \sum_{u \in V} \sum_{v \in V} \chi_v^{(t)} P_{v,u} = \sum_{u \in V} \chi_u^{(t+1)} - \sum_{v \in V} \chi_v^{(t)} \sum_{u \in V} P_{v,u} = M - M = 0$$

holds for any $t \ge 0$. Hence

$$\begin{aligned}
(11) &= \sum_{t=0}^{T-1} \sum_{u \in V} \left( \chi_u^{(t+1)} - (\chi^{(t)} P)_u \right) P_{u,w}^{T-t-1} - \sum_{t=0}^{T-1} \sum_{u \in V} \left( \chi_u^{(t+1)} - (\chi^{(t)} P)_u \right) \pi_w \\
&= \sum_{t=0}^{T-1} \sum_{u \in V} \left( \chi_u^{(t+1)} - (\chi^{(t)} P)_u \right) \left( P_{u,w}^{T-t-1} - \pi_w \right) \qquad (12)
\end{aligned}$$

holds. Since $P$ is reversible, $Z_{v,u}^{(t)} = 0$ for any $v \notin N(u)$, and $\chi_u^{(t+1)} = \sum_{v \in V} Z_{v,u}^{(t)} = \sum_{v \in N(u)} Z_{v,u}^{(t)}$ holds by definition (7). Thus,

$$(12) = \sum_{t=0}^{T-1} \sum_{u \in V} \left( \sum_{v \in N(u)} Z_{v,u}^{(t)} - \sum_{v \in N(u)} \chi_v^{(t)} P_{v,u} \right) \left( P_{u,w}^{T-t-1} - \pi_w \right) = \sum_{t=0}^{T-1} \sum_{u \in V} \sum_{v \in N(u)} \left( Z_{v,u}^{(t)} - \chi_v^{(t)} P_{v,u} \right) \left( P_{u,w}^{T-t-1} - \pi_w \right)$$

holds, and we obtain the claim.
Now we restrict attention to reversible Markov chains and prove Theorem 3.1.

Proof of Theorem 3.1. By Lemma 4.1 and (8), we obtain

$$\begin{aligned}
\left| \chi_w^{(T)} - \mu_w^{(T)} \right|
&\le \sum_{t=0}^{T-1} \sum_{u \in V} \sum_{v \in N(u)} \left| Z_{v,u}^{(t)} - \chi_v^{(t)} P_{v,u} \right| \left| P_{u,w}^{T-t-1} - \pi_w \right| \\
&\le \Psi_\sigma \sum_{t=0}^{T-1} \sum_{u \in V} \sum_{v \in N(u)} \left| P_{u,w}^{T-t-1} - \pi_w \right| \\
&= \Psi_\sigma \sum_{t=0}^{T-1} \sum_{u \in V} \delta(u) \left| P_{u,w}^{t} - \pi_w \right|. \qquad (13)
\end{aligned}$$

Since $P$ is reversible, $P_{u,w}^t = \frac{\pi_w}{\pi_u} P_{w,u}^t$ holds for any $w$ and $u$ in $V$ (see Proposition A.5 in Appendix A). Thus

$$(13) = \Psi_\sigma \sum_{t=0}^{T-1} \sum_{u \in V} \delta(u) \frac{\pi_w}{\pi_u} \left| P_{w,u}^t - \pi_u \right| \le \Psi_\sigma \Delta \frac{\pi_w}{\pi_{\min}} \sum_{t=0}^{T-1} \sum_{u \in V} \left| P_{w,u}^t - \pi_u \right| = 2 \Psi_\sigma \Delta \frac{\pi_w}{\pi_{\min}} \sum_{t=0}^{T-1} D_{\mathrm{tv}}\!\left( P_{w,\cdot}^t, \pi \right), \qquad (14)$$

where the last equality follows from the fact that $\sum_{u \in V} |P_{w,u}^t - \pi_u| = 2 D_{\mathrm{tv}}(P_{w,\cdot}^t, \pi)$, by the definition of the total variation distance (2). By Proposition 2.1, we obtain the following.

Lemma 4.2. For any $v \in V$ and for any $T > 0$,

$$\sum_{t=0}^{T-1} D_{\mathrm{tv}}\!\left( P_{v,\cdot}^t, \pi \right) \le \frac{1-\gamma}{1-2\gamma}\, \tau(\gamma)$$
holds for any $\gamma$ ($0 < \gamma < 1/2$).

Proof. Let $h(t) = \max_{w \in V} D_{\mathrm{tv}}(P_{w,\cdot}^t, \pi)$, for convenience. Then $h(t)$ is at most 1 for any $t \ge 0$, by the definition of the total variation distance (2). By Proposition 2.1,

$$\begin{aligned}
\sum_{t=0}^{T-1} D_{\mathrm{tv}}\!\left( P_{w,\cdot}^t, \pi \right) = \sum_{t=0}^{T-1} h(t) \le \sum_{t=0}^{\infty} h(t)
&= \sum_{k=0}^{\tau(\gamma)-1} h(k) + \sum_{\ell=1}^{\infty} \sum_{k=0}^{\tau(\gamma)-1} h(\ell \cdot \tau(\gamma) + k) \\
&\le \sum_{k=0}^{\tau(\gamma)-1} 1 + \sum_{\ell=1}^{\infty} \sum_{k=0}^{\tau(\gamma)-1} \frac{1}{2} (2\gamma)^\ell \\
&= \tau(\gamma) + \frac{1}{2}\, \tau(\gamma) \sum_{\ell=1}^{\infty} (2\gamma)^\ell = \tau(\gamma) + \frac{\gamma}{1-2\gamma}\, \tau(\gamma) = \frac{1-\gamma}{1-2\gamma}\, \tau(\gamma)
\end{aligned}$$

holds, and we obtain the claim.

Now Theorem 3.1 follows from (14) and Lemma 4.2.
5 Various Models

In Section 4, we gave a general bound on $|\chi_v^{(t)} - \mu_v^{(t)}|$ in terms of $\Psi_\sigma$. This section presents some routing functions, namely greedy routing in Section 5.1, the rotor-router on multidigraphs in Section 5.2, and the functional-router based on the van der Corput sequence in Section 5.3, and gives upper bounds on $|\chi_w^{(T)} - \mu_w^{(T)}|$ for them. Greedy routing (Section 5.1), originally given by Holroyd and Propp under the name stack walk, and the functional-router based on the van der Corput sequence (Section 5.3) are suited to irrational transition probabilities, while the rotor-router on multidigraphs (Section 5.2) was discussed in [30, 29] to deal with rational transition probabilities.
5.1 Greedy routing

This section introduces greedy routing, which is given by Angel et al. [2] and Tijdeman [46]. The functional-router $\sigma_v(i)$ ($i \in \mathbb{Z}_{\ge 0}$) on $v \in V$ of the greedy routing is defined as follows. For $v \in V$ and $i \in \mathbb{Z}_{\ge 0}$, let $L_i(v, u) = (i+1) P_{v,u} - I_{v,u}[0, i)$ for each $u \in N(v)$, and let $T_i(v) = \{ u \in N(v) \mid L_i(v, u) > 0 \}$. Then, let $\sigma_v(i)$ be the $u^* \in T_i(v)$ minimizing the value

$$\frac{1 - L_i(v, u)}{P_{v,u}}$$

over all $u \in T_i(v)$. If there are two or more such $u \in T_i(v)$, then let $u^*$ be the minimum among them in a (prescribed) order. Since $\sigma_v(i) \in T_i(v)$, we can see that $(i+1) P_{v,u} - I_{v,u}[0, i+1) > -1$ holds for any $u$, $v$, and $i$, by induction on $i \in \mathbb{Z}_{\ge 0}$. The following theorem is due to Angel et al. [2] and Tijdeman [46].

Theorem 5.1. [46, 2] For any transition matrix $P$,

$$\left| I_{v,u}[0, z) - z \cdot P_{v,u} \right| \le 1$$

holds for any $v, u \in V$ and any $z \in \mathbb{Z}_{>0}$.

Theorem 5.1 was first given by Tijdeman [46], who in fact gave a slightly better bound $|I_{v,u}[0, z) - z \cdot P_{v,u}| \le 1 - (2(\delta(v) - 1))^{-1}$. Angel et al. [2] rediscovered Theorem 5.1 in the context of deterministic random walks (see also [21]), where they also showed that a similar statement holds even when the transition probabilities are time-inhomogeneous. Theorem 5.1 and (9) imply that

$$\Psi_\sigma \le \max_{\substack{v \in V,\, u \in N(v),\\ z, z' \in \mathbb{Z}_{\ge 0} \text{ s.t. } z' > z}} \left| I_{v,u}[z, z') - (z' - z) P_{v,u} \right| \le 2 \qquad (15)$$

holds for the greedy-routing model. Thus, we immediately obtain the following theorem from Theorem 3.1 and (15).

Theorem 5.2. Let $P \in \mathbb{R}_{\ge 0}^{N \times N}$ be a transition matrix of a reversible and ergodic Markov chain with state space $V$, and let $\pi$ be the stationary distribution of $P$. Then, the configurations $\chi^{(T)}$ and $\mu^{(T)}$ of tokens, respectively, in a greedy-routing model and in its corresponding random walk satisfy

$$\left| \chi_w^{(T)} - \mu_w^{(T)} \right| \le \frac{6 \pi_w}{\pi_{\min}}\, t^* \Delta$$

for any $w \in V$ and $T \ge 0$.
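The greedy rule above can be sketched for a single vertex as follows. This is an illustrative toy (floating-point rather than exact arithmetic, and a hypothetical two-neighbor distribution with an irrational entry), not the paper's code.

```python
import math

def greedy_router(p, steps):
    """Greedy routing for one vertex: p maps each neighbor u to P_{v,u}
    (summing to 1).  Returns the served sequence sigma_v(0), ..., sigma_v(steps-1)."""
    count = {u: 0 for u in p}          # count[u] = I_{v,u}[0, i)
    seq = []
    for i in range(steps):
        # deficits L_i(v,u) = (i+1) p[u] - I_{v,u}[0, i); candidates have L_i > 0
        cand = [u for u in p if (i + 1) * p[u] - count[u] > 0]
        # serve the candidate with the shortest remaining time (1 - L_i)/p[u]
        u_star = min(cand, key=lambda u: (1 - ((i + 1) * p[u] - count[u])) / p[u])
        seq.append(u_star)
        count[u_star] += 1
    return seq

p = {'a': 1 / math.sqrt(2), 'b': 1 - 1 / math.sqrt(2)}  # irrational entries
seq = greedy_router(p, 1000)
# Theorem 5.1 promises |I_{v,u}[0, z) - z * p[u]| <= 1 for every prefix length z
```

The candidate set is never empty, since the deficits $L_i(v,u)$ sum to exactly 1 over $u \in N(v)$.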
5.2 Rotor-router on multidigraphs

The rotor-router model described in Section 3.1 can be considered more generally on digraphs with parallel edges (i.e., multidigraphs). Kijima et al. [30] and Kajino et al. [29] are concerned with the rotor-router model on finite multidigraphs. Suppose that $P$ is a transition matrix with rational entries. For each $v \in V$, let $\bar{\delta}(v) \in \mathbb{Z}_{\ge 0}$ be a common denominator (or the least common denominator) of $P_{v,u}$ for all $u \in N(v)$, meaning that $\bar{\delta}(v) \cdot P_{v,u}$ is an integer for each $u \in N(v)$. We define a rotor-router $\sigma_v(0), \sigma_v(1), \dots, \sigma_v(\bar{\delta}(v) - 1)$ arbitrarily, satisfying

$$\left| \{ j \in [0, \bar{\delta}(v)) \mid \sigma_v(j) = u \} \right| = \bar{\delta}(v) \cdot P_{v,u}$$

for any $v \in V$ and $u \in N(v)$. Then, $\sigma_v(i)$ is defined by

$$\sigma_v(i) = \sigma_v(i \bmod \bar{\delta}(v)) \equiv \sigma_v\!\left( i - \bar{\delta}(v) \cdot \left\lfloor \frac{i}{\bar{\delta}(v)} \right\rfloor \right). \qquad (16)$$

For the rotor-router on a multidigraph, it is not difficult to observe the following.

Observation 5.3. $\Psi_\sigma = \max_v \bar{\delta}(v)$ holds for the rotor-router model on a multidigraph.

By Theorem 3.1 and the above observation, we obtain the following theorem.

Theorem 5.4. Let $P \in \mathbb{Q}_{\ge 0}^{N \times N}$ be a transition matrix of a reversible and ergodic Markov chain with state space $V$, and let $\pi$ be the stationary distribution of $P$. Then, the configurations $\chi^{(T)}$ and $\mu^{(T)}$ of tokens, respectively, in a rotor-router model and in its corresponding random walk satisfy

$$\left| \chi_w^{(T)} - \mu_w^{(T)} \right| \le \frac{3 \pi_w}{\pi_{\min}} \max_v \bar{\delta}(v) \cdot t^* \Delta$$

for any $w \in V$ and $T \ge 0$.
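For a rational row of $P$, a period satisfying the counting condition above can be constructed directly. The following is a toy sketch (the row below is a hypothetical example), not code from the paper.

```python
import math
from fractions import Fraction

def build_rotor(row):
    """Given one row {u: P_{v,u}} with Fraction entries, return (period, dbar):
    dbar is a common denominator and period is a sequence of length dbar in
    which each u appears exactly dbar * P_{v,u} times; the router is then
    sigma_v(i) = period[i % dbar]."""
    dbar = 1
    for p in row.values():
        dbar = dbar * p.denominator // math.gcd(dbar, p.denominator)  # lcm
    period = []
    for u, p in row.items():
        period.extend([u] * int(p * dbar))
    return period, dbar

row = {'a': Fraction(1, 2), 'b': Fraction(1, 3), 'c': Fraction(1, 6)}
period, dbar = build_rotor(row)
# dbar = lcm(2, 3, 6) = 6, and each u appears dbar * P_{v,u} times
```

Any permutation of `period` also satisfies the condition; the construction corresponds to replacing each edge $(v, u)$ with $\bar{\delta}(v) \cdot P_{v,u}$ parallel edges.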
5.3 Routing based on the van der Corput sequence

This section gives a router $\sigma$ based on the van der Corput sequence [48, 40], which is a well-known low-discrepancy sequence.

The van der Corput sequence $\psi \colon \mathbb{Z}_{\ge 0} \to [0, 1)$ is defined as follows. Suppose $i \in \mathbb{Z}_{>0}$ is represented in binary as $i = \sum_{j=0}^{\lfloor \lg i \rfloor} \beta_j(i) \cdot 2^j$ using $\beta_j(i) \in \{0, 1\}$ ($j \in \{0, 1, \dots, \lfloor \lg i \rfloor\}$). Then, we define

$$\psi(i) \stackrel{\mathrm{def.}}{=} \sum_{j=0}^{\lfloor \lg i \rfloor} \beta_j(i) \cdot 2^{-(j+1)} \qquad (17)$$

and $\psi(0) \stackrel{\mathrm{def.}}{=} 0$. For example, $\psi(1) = 1 \times 1/2 = 1/2$, $\psi(2) = 0 \times 1/2 + 1 \times 1/4 = 1/4$, $\psi(3) = 1 \times 1/2 + 1 \times 1/4 = 3/4$, $\psi(4) = 0 \times 1/2 + 0 \times 1/4 + 1 \times 1/8 = 1/8$, $\psi(5) = 1 \times 1/2 + 0 \times 1/4 + 1 \times 1/8 = 5/8$, $\psi(6) = 0 \times 1/2 + 1 \times 1/4 + 1 \times 1/8 = 3/8$, and so on. Clearly, $\psi(i) \in [0, 1)$ holds for any (finite) $i \in \mathbb{Z}_{\ge 0}$.

Now, given $i \in \mathbb{Z}_{>0}$, we define $\sigma_v(i)$ as follows. Without loss of generality, we may assume that an ordering $u_1, \dots, u_{\delta(v)}$ is defined on $N(v)$ for $v \in V$. Then, we define the routing function $\sigma_v \colon \mathbb{Z}_{\ge 0} \to N(v)$ on $v \in V$ such that $\sigma_v(i) = u_k \in N(v)$ satisfies

$$\sum_{j=1}^{k-1} P_{v,u_j} \le \psi(i) < \sum_{j=1}^{k} P_{v,u_j}$$

for $k \in \{1, \dots, \delta(v)\}$, where $\sum_{j=1}^{0} P_{v,u_j} = 0$, for convenience. For the van der Corput sequence, the following theorem, due to van der Corput [48], is known.

Theorem 5.5. [48] For any transition matrix $P$,

$$\left| I_{v,u}[0, z) - z \cdot P_{v,u} \right| \le \lg(z + 1)$$

holds for any $v, u \in V$ and any $z \in \mathbb{Z}_{>0}$.
More sophisticated bounds can be found in [40]. Carefully examining Theorem 5.5, we obtain the following lemma. See Appendix B for the proof.

Lemma 5.6. For any transition matrix $P$,

$$\left| I_{v,u}[z, z') - (z' - z) P_{v,u} \right| \le 2 \lg(z' - z + 1)$$

holds for any $v, u \in V$ and for any $z, z' \in \mathbb{Z}_{\ge 0}$ satisfying $z' > z$.

Lemma 5.6 implies the following.

Lemma 5.7. $\Psi_\sigma \le 2 \lg(M + 1)$ holds for the van der Corput router.

By Theorem 3.1 and Lemma 5.7, we obtain the following.

Theorem 5.8. Let $P \in \mathbb{R}_{\ge 0}^{N \times N}$ be a transition matrix of a reversible and ergodic Markov chain with state space $V$, and let $\pi$ be the stationary distribution of $P$. Then, the configurations $\chi^{(T)}$ and $\mu^{(T)}$ of tokens, respectively, in the functional-router model based on the van der Corput sequence and in its corresponding random walk satisfy

$$\left| \chi_w^{(T)} - \mu_w^{(T)} \right| \le \frac{6 \pi_w}{\pi_{\min}} \lg(M + 1) \cdot t^* \Delta$$

for any $w \in V$ and $T \ge 0$.
Though the bound depends on $\log M$, $|\chi_v^{(t)}/M - \mu_v^{(t)}/M| = O(\log(M)/M)$ holds in terms of $M$, meaning that the discrepancy approaches zero asymptotically as the number of tokens $M$ increases.
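The radical inverse $\psi$ of (17) and the induced router can be sketched as follows. This is illustrative only; the neighbor ordering and probabilities below are hypothetical, and floating point is used where the paper works exactly.

```python
def psi(i):
    """Van der Corput radical inverse in base 2, eq. (17): mirror the binary
    digits of i about the binary point."""
    x, denom = 0.0, 1.0
    while i > 0:
        denom *= 2.0
        x += (i & 1) / denom
        i >>= 1
    return x

def sigma(i, neighbors, probs):
    """Route index i to the neighbor u_k whose subinterval of [0, 1)
    (of length P_{v,u_k}, in the fixed ordering) contains psi(i)."""
    cum = 0.0
    for u, p in zip(neighbors, probs):
        cum += p
        if psi(i) < cum:
            return u
    return neighbors[-1]  # guard against floating-point round-off

# psi(1), psi(2), psi(3), ... = 1/2, 1/4, 3/4, 1/8, 5/8, 3/8, ...
```

Because $\psi$ fills $[0,1)$ in a balanced, dyadic order, each neighbor's subinterval is hit at a rate close to its length, which is what drives the $\lg(z+1)$ discrepancy bound of Theorem 5.5.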
6 Deterministic Sampling

This section presents a deterministic sampling algorithm based on a (version of the) functional-router model, as an application. We explain our algorithm in Section 6.1, and prove our main theorem on it in Section 6.2. Note that the following algorithm is oblivious: while the rotor-router model and the greedy-routing model memorize the configurations of both tokens and routers, our algorithm memorizes the configuration of tokens only. This makes the description of the algorithm simple, compared with other deterministic random walks.

We give detailed descriptions of deterministic sampling algorithms for particular applications, namely 0-1 knapsack solutions (Section 6.3), linear extensions (Section 6.4), matchings (Section 6.5), and $q$-colorings (Section 6.6), where we also discuss the computational complexity of our algorithm for each application. Note that similar (or essentially the same) bounds also hold for the greedy-routing model.
6.1 Sampling algorithm

The algorithm is essentially the greedy-routing model described in Section 5.1, except that the routing functions are reset at (the beginning of) each time step. Let $\chi^{(0)} = \mu^{(0)}$, and let $\chi^{(t)} \in \mathbb{Z}_{\ge 0}^N$ denote the configuration of tokens at time $t \in \mathbb{Z}_{\ge 0}$ in our algorithm. A configuration $\chi^{(t)}$ is updated, imitating $P_{v,u}$, as follows. Without loss of generality, we may assume that an arbitrary ordering $u_1, \dots, u_{\delta(v)}$ is defined on $N(v)$ for each $v \in V$. Then, we define the number of tokens $Z_{v,u_i}^{(t)}$ sent from $v$ to $u_i$ during the time interval from $t$ to $t+1$ by

$$Z_{v,u_i}^{(t)} = \begin{cases} \left\lfloor \chi_v^{(t)} P_{v,u_i} \right\rfloor + 1 & (i \le i^*) \\ \left\lfloor \chi_v^{(t)} P_{v,u_i} \right\rfloor & (\text{otherwise}) \end{cases} \qquad (18)$$

where

$$i^* = \chi_v^{(t)} - \sum_{i=1}^{\delta(v)} \left\lfloor \chi_v^{(t)} P_{v,u_i} \right\rfloor$$

denotes the number of "surplus" tokens. Then, $\chi^{(t+1)}$ is defined by

$$\chi_u^{(t+1)} \stackrel{\mathrm{def.}}{=} \sum_{v \in V} Z_{v,u}^{(t)} \qquad (19)$$

for each $u \in V$. It is not difficult to see the following observation.

Observation 6.1. For the above algorithm,

$$\left| Z_{v,u}^{(t)} - \chi_v^{(t)} P_{v,u} \right| \le 1$$

holds for any $u, v \in V$ and $t \ge 0$.
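The update rule (18) is straightforward to implement. The sketch below is illustrative (not the paper's code), using a hypothetical two-state chain with an irrational entry.

```python
import math

def sampler_step(chi, P):
    """One oblivious update, eq. (18): v sends floor(chi_v * P[v][u]) tokens
    to each neighbor u, and the i* surplus tokens go one each to the first
    i* neighbors in the fixed ordering (here, dict order of P[v])."""
    nxt = {u: 0 for u in chi}
    for v, cnt in chi.items():
        base = {u: math.floor(cnt * p) for u, p in P[v].items()}
        surplus = cnt - sum(base.values())   # i*: number of surplus tokens
        for i, u in enumerate(P[v]):
            nxt[u] += base[u] + (1 if i < surplus else 0)
    return nxt

s = math.sqrt(2) / 2  # irrational transition probability
P = {'a': {'a': 1 - s, 'b': s}, 'b': {'a': s, 'b': 1 - s}}
chi = {'a': 10, 'b': 0}
chi = sampler_step(chi, P)
# 'a' sends floor(10(1-s)) = 2 to 'a' and floor(10s) = 7 to 'b';
# the one surplus token goes to the first neighbor 'a', so chi == {'a': 3, 'b': 7}
```

Note that no per-vertex router state survives between steps, which is exactly the oblivious property claimed above.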
6.2 Upper bound on the point-wise distance

Let $\widetilde{\mu}^{(t)} = \mu^{(t)}/M$, for simplicity; then clearly $\widetilde{\mu}^{(\infty)} = \pi$ holds, since $P$ is ergodic (see Section 2). By the definition of the mixing time, $D_{\mathrm{tv}}(\widetilde{\mu}^{(\tau(\varepsilon))}, \pi) \le \varepsilon$ holds, where $\tau(\varepsilon)$ denotes the mixing time of $P$, meaning that $\widetilde{\mu}$ approximates the target distribution $\pi$ well. Thus, we hope that in our deterministic sampler the "distribution" $\widetilde{\chi}^{(T)} \stackrel{\mathrm{def.}}{=} \chi^{(T)}/M$ approximates the target distribution $\pi$ well. We define the point-wise distance $D_{\mathrm{pw}}(\xi, \zeta)$ between $\xi \in \mathbb{R}_{\ge 0}^N$ and $\zeta \in \mathbb{R}_{\ge 0}^N$ satisfying $\|\xi\|_1 = \|\zeta\|_1 = 1$ by

$$D_{\mathrm{pw}}(\xi, \zeta) \stackrel{\mathrm{def.}}{=} \max_{v \in V} |\xi_v - \zeta_v| = \|\xi - \zeta\|_\infty. \qquad (20)$$

Theorem 6.2. Let $P \in \mathbb{R}_{\ge 0}^{N \times N}$ be a reversible transition matrix with stationary distribution $\pi$. Then

$$D_{\mathrm{pw}}\!\left( \widetilde{\chi}^{(T)}, \widetilde{\mu}^{(T)} \right) \le 3 t^* \Delta \cdot \frac{\pi_{\max}}{\pi_{\min}} \cdot \frac{1}{M}$$

holds for any $T \ge 0$, where $\pi_{\max} = \max\{\pi_v \mid v \in V\}$ and $\pi_{\min} = \min\{\pi_v \mid v \in V\}$.

Proof. We can apply Theorem 3.1 to the algorithm described in Section 6.1, since (19) holds. Noting that $\Psi_\sigma \le 1$ holds by Observation 6.1, we have

$$\left| \chi_w^{(T)} - \mu_w^{(T)} \right| \le \frac{2(1-\gamma)}{1-2\gamma} \frac{\pi_w}{\pi_{\min}} \tau(\gamma) \Delta.$$

Then, taking $\gamma = 1/4$ and dividing both sides by $M$, we obtain the claim.

In the special case that the stationary distribution is uniform, we obtain the following from Theorem 6.2.

Corollary 6.3. Let $P \in \mathbb{R}_{\ge 0}^{N \times N}$ be an ergodic and reversible transition matrix with a uniform stationary distribution $\pi$. Set $M \ge 6 \varepsilon^{-1} t^* \Delta$; then the "distribution" $\widetilde{\chi}^{(T)}$ of the deterministic sampler after $T \ge \tau(\varepsilon/2)$ steps satisfies $D_{\mathrm{pw}}(\widetilde{\chi}^{(T)}, \pi) \le \varepsilon$.
In the following sections, we show examples of polynomial-time deterministic samplers for the uniform sampling of combinatorial objects whose counting problems are known to be #P-complete.
6.3 0-1 knapsack solutions

Given $a \in \mathbb{R}^n_{\ge 0}$ and $b \in \mathbb{R}_{\ge 0}$, the set of the 0-1 knapsack solutions is defined by $\Omega_{\mathrm{Kna}} = \{x \in \{0,1\}^n \mid \sum_{i=1}^n a_i x_i \le b\}$. We define a transition matrix $P_{\mathrm{Kna}} \in \mathbb{R}^{|\Omega_{\mathrm{Kna}}| \times |\Omega_{\mathrm{Kna}}|}$ by

$$P_{\mathrm{Kna}}(x, y) = \begin{cases} 1/2n & (\text{if } y \in N_{\mathrm{Kna}}(x)) \\ 1 - |N_{\mathrm{Kna}}(x)|/2n & (\text{if } y = x) \\ 0 & (\text{otherwise}) \end{cases} \tag{21}$$

for $x, y \in \Omega_{\mathrm{Kna}}$, where $N_{\mathrm{Kna}}(x) = \{y \in \Omega_{\mathrm{Kna}} \mid \|x - y\|_1 = 1\}$. Note that the stationary distribution of $P_{\mathrm{Kna}}$ is the uniform distribution, since $P_{\mathrm{Kna}}$ is symmetric. The following theorem is due to Morris and Sinclair [38].

Theorem 6.4. [38] The mixing time of $P_{\mathrm{Kna}}$ satisfies $\tau(\gamma) = O(n^{9/2+\alpha} \log \gamma^{-1})$ for any $\alpha > 0$ and any $\gamma > 0$.

For the Markov chain defined by $P_{\mathrm{Kna}}$, our deterministic sampler is described as follows. Note that the following implementation is not optimized in its time and space complexity, for simplicity of the arguments.
Algorithm 1.
Step 0. Set $W^0[i] := 0$ for each $i = 1, \ldots, M$. % $W^t[i]$ stores the solution in $\Omega_{\mathrm{Kna}}$ at which token $i$ stays.
Step 1. For ($t = 0$ to $T-1$){
  (a). Set the list $S^{(t)}_x := \{i \in \{1, \ldots, M\} \mid W^t[i] = x\}$ for each $x \in \Omega_{\mathrm{Kna}}$ with $S^{(t)}_x \ne \emptyset$.
  (b). Serve the tokens in $S^{(t)}_x$ to the neighboring vertices according to (18) for each $x \in \Omega_{\mathrm{Kna}}$ satisfying $S^{(t)}_x \ne \emptyset$, and set $W^{t+1}[i]$ to be the solution in $\Omega_{\mathrm{Kna}}$ at which token $i$ arrives.
}
Step 2. Output $W^T[i]$ for each $i = 1, \ldots, M$.
Theorem 6.5. For an arbitrary $\varepsilon$ ($0 < \varepsilon < 1$), set $M := c_1 n^{9/2+\alpha} \varepsilon^{-1}$ and $T := c_2 n^{9/2+\alpha} \log \varepsilon^{-1}$ with appropriate constants $c_1, c_2$ and $\alpha$. Then Algorithm 1 outputs $M$ samples over $\Omega_{\mathrm{Kna}}$ satisfying

$$D_{\mathrm{pw}}\left(\tilde{\chi}^{(T)}, \pi\right) \le \varepsilon \tag{22}$$

where $\pi$ is the uniform distribution over $\Omega_{\mathrm{Kna}}$. The running time of Algorithm 1 is $O(T M \log(M)\, n\, \mathrm{poly}(\log a, \log b)) = O^*(n^{11+2\alpha} \varepsilon^{-1})$, where $O^*$ ignores polylog terms.

Proof. We check Algorithm 1 step by step. Step 0 sets all $M$ tokens on $0 \in \Omega_{\mathrm{Kna}}$, which takes $O(Mn)$ time. Step 1(a) constructs the configuration $\chi^{(t)}$ of the $M$ tokens over $\Omega_{\mathrm{Kna}}$. Note that the number of lists is at most $M$, since Step 1(a) constructs a list only for those $v \in \Omega_{\mathrm{Kna}}$ at which at least one token exists. Step 1(a) takes $O(M \log(M)\, n)$ time, by heapifying $W^t[i]$ ($i = 1, \ldots, M$) with the lexicographic order on $\Omega_{\mathrm{Kna}}$. Step 1(b) updates the configuration according to our deterministic sampling algorithm in Section 6.1. It takes $O(n\, \mathrm{poly}(\log a, \log b))$ time to find all feasible solutions neighboring $x$. Once the algorithm finds all feasible solutions neighboring $x$, it is easy to let every token of $\chi^{(t)}_x$ go to its neighboring vertex according to (18), in $O(n \chi^{(t)}_x)$ time, like the rotor-router. Since Step 1 is repeated $T$ times, we obtain the time complexity $O(T M \log(M)\, n\, \mathrm{poly}(\log a, \log b))$. Now, (22) is clear from Corollary 6.3, since Algorithm 1 is an implementation of the deterministic sampler in Section 6.1.
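As a sanity check of Algorithm 1 at toy scale, the following self-contained sketch runs the deterministic sampler on a tiny knapsack instance. The instance, the values of $M$ and $T$, and all identifiers are our own choices for illustration; the real algorithm uses the heap-based bookkeeping described above.

```python
from math import floor
from itertools import product

a, b, n = [2, 3, 4], 5, 3                          # toy instance (ours)
Omega = [x for x in product((0, 1), repeat=n)
         if sum(ai * xi for ai, xi in zip(a, x)) <= b]

def neighbours(x):
    """Feasible solutions at Hamming distance 1 from x."""
    out = []
    for i in range(n):
        y = list(x); y[i] ^= 1; y = tuple(y)
        if sum(ai * yi for ai, yi in zip(a, y)) <= b:
            out.append(y)
    return out

def row(x):
    """One row of P_Kna, Eq. (21): each feasible flip gets probability 1/(2n)."""
    r = {y: 1 / (2 * n) for y in neighbours(x)}
    r[x] = 1 - len(r) / (2 * n)                    # self-loop absorbs the rest
    return r

M, T = 100_000, 50
chi = {x: 0 for x in Omega}
chi[(0,) * n] = M                                  # Step 0: all tokens on 0
for _ in range(T):                                 # Step 1: functional routing
    nxt = {x: 0 for x in Omega}
    for x, tok in chi.items():
        if tok == 0:
            continue
        r = row(x)
        us = list(r)
        base = [floor(tok * r[u]) for u in us]
        s = tok - sum(base)                        # surplus tokens, Eq. (18)
        for i, u in enumerate(us):
            nxt[u] += base[i] + (1 if i < s else 0)
    chi = nxt

# empirical point-wise distance (20) to the uniform distribution
dpw = max(abs(chi[x] / M - 1 / len(Omega)) for x in Omega)
```

On this instance $|\Omega_{\mathrm{Kna}}| = 5$; the observed point-wise distance stays well below the guarantee of Corollary 6.3 for these $M$ and $T$.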
6.4 Linear extensions of a poset

Let $S = \{1, 2, \ldots, n\}$, and let $Q = (S, \preceq)$ be a partial order. A linear extension of $Q$ is a total order $X = (S, \sqsubseteq)$ which respects $Q$, i.e., for all $i, j \in S$, $i \preceq j$ implies $i \sqsubseteq j$. Let $\Omega_{\mathrm{Lin}}$ denote the set of all linear extensions of $Q$. We define a relation $X \sim_p X'$ ($p \in \{1, \ldots, n-1\}$) for a pair of linear extensions $X, X' \in \Omega_{\mathrm{Lin}}$ satisfying $x_p = x'_{p+1}$, $x_{p+1} = x'_p$, and $x_i = x'_i$ for all $i \ne p, p+1$, i.e.,

$$X = (x_1, x_2, \ldots, x_{p-1}, x_p, x_{p+1}, x_{p+2}, \ldots, x_n),$$
$$X' = (x_1, x_2, \ldots, x_{p-1}, x_{p+1}, x_p, x_{p+2}, \ldots, x_n).$$

Then, we define a transition matrix $P_{\mathrm{Lin}} \in \mathbb{R}^{|\Omega_{\mathrm{Lin}}| \times |\Omega_{\mathrm{Lin}}|}$ by

$$P_{\mathrm{Lin}}(X, X') = \begin{cases} F(p)/2 & (\text{if } X' \sim_p X) \\ 1 - \sum_{Y \in N_{\mathrm{Lin}}(X)} P_{\mathrm{Lin}}(X, Y) & (\text{if } X' = X) \\ 0 & (\text{otherwise}) \end{cases} \tag{23}$$

for $X, X' \in \Omega_{\mathrm{Lin}}$, where $N_{\mathrm{Lin}}(X) = \{Y \in \Omega_{\mathrm{Lin}} \mid X \sim_p Y\ (p \in \{1, \ldots, n-1\})\}$ and

$$F(p) = \frac{p(n-p)}{\frac{1}{6}(n^3 - n)}.$$

Note that $P_{\mathrm{Lin}}$ is ergodic and reversible, and its stationary distribution is uniform on $\Omega_{\mathrm{Lin}}$ [4]. The following theorem is due to Bubley and Dyer [4].
Theorem 6.6. [4] For $P_{\mathrm{Lin}}$,

$$\tau(\gamma) \le \frac{1}{6}(n^3 - n) \ln \frac{n^2}{4\gamma}$$

holds for any $\gamma > 0$.

For the Markov chain defined by $P_{\mathrm{Lin}}$, our deterministic sampler is described as follows.

Algorithm 2.
Step 0. Set $W^0[i] := X'$ for each $i = 1, \ldots, M$. % $X' \in \Omega_{\mathrm{Lin}}$ is a linear extension.
Step 1. For ($t = 0$ to $T-1$){
  (a). Set the list $S^{(t)}_X := \{i \in \{1, \ldots, M\} \mid W^t[i] = X\}$ for each $X \in \Omega_{\mathrm{Lin}}$ with $S^{(t)}_X \ne \emptyset$.
  (b). Serve the tokens in $S^{(t)}_X$ to the neighboring vertices according to (18) for each $X \in \Omega_{\mathrm{Lin}}$ satisfying $S^{(t)}_X \ne \emptyset$, and set $W^{t+1}[i]$ to be the solution in $\Omega_{\mathrm{Lin}}$ at which token $i$ arrives.
}
Step 2. Output $W^T[i]$ for each $i = 1, \ldots, M$.

Theorem 6.7. For an arbitrary $\varepsilon$ ($0 < \varepsilon < 1$), set $M := \left\lceil 6n \cdot \frac{1}{3}(n^3 - n) \ln(n)\, \varepsilon^{-1} \right\rceil$ and $T := \left\lceil \frac{1}{6}(n^3 - n) \ln \frac{n^2}{2\varepsilon} \right\rceil$. Then Algorithm 2 outputs $M$ samples over $\Omega_{\mathrm{Lin}}$ satisfying

$$D_{\mathrm{pw}}\left(\tilde{\chi}^{(T)}, \pi\right) \le \varepsilon \tag{24}$$
where $\pi$ is the uniform distribution over $\Omega_{\mathrm{Lin}}$. The running time of Algorithm 2 is $O(T n M \log(M)) = O^*(n^8 \varepsilon^{-1})$, where $O^*$ ignores polylog terms.

Proof. We check Algorithm 2 step by step. Step 0 sets all $M$ tokens on $X' \in \Omega_{\mathrm{Lin}}$, which takes $O(Mn + n^2)$ time, since it takes $O(n^2)$ time to pick a linear extension $X'$ by topological sorting. Step 1(a) takes $O(nM \log(M))$ time, by heapifying $W^t[i]$ ($i = 1, \ldots, M$) with the lexicographic order on $\Omega_{\mathrm{Lin}}$. Step 1(b) takes $O(n \log n)$ time^4 to find all feasible solutions neighboring $X$. Since Step 1 is repeated $T$ times, we obtain the time complexity $O(T n M \log(M))$. Now, (24) is clear from Corollary 6.3, since Algorithm 2 is an implementation of the deterministic sampler in Section 6.1.
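For intuition, the chain (23) is easy to build exactly on a small poset. The sketch below (our own toy example and naming, using exact rational arithmetic) checks that $P_{\mathrm{Lin}}$ is symmetric, hence has the uniform stationary distribution.

```python
from itertools import permutations
from fractions import Fraction

# toy poset on S = {1,2,3,4} with relations 1 <= 3 and 2 <= 4 (our own example)
n, relations = 4, [(1, 3), (2, 4)]

def is_linear_extension(X):
    return all(X.index(i) < X.index(j) for i, j in relations)

Omega = [X for X in permutations(range(1, n + 1)) if is_linear_extension(X)]

def F(p):
    """F(p) = p(n-p) / ((n^3 - n)/6), the swap weight of Eq. (23), p = 1..n-1."""
    return Fraction(p * (n - p), (n**3 - n) // 6)

def P_Lin(X, Y):
    if X == Y:                          # self-loop absorbs the rest of the row
        return 1 - sum(P_Lin(X, Z) for Z in Omega if Z != X)
    d = [p for p in range(n) if X[p] != Y[p]]
    if len(d) == 2 and d[1] == d[0] + 1 \
            and X[d[0]] == Y[d[1]] and X[d[1]] == Y[d[0]]:
        return F(d[0] + 1) / 2          # X ~_p Y with p = d[0] + 1
    return Fraction(0)

# symmetry gives the uniform stationary distribution; rows sum to one exactly
assert all(P_Lin(X, Y) == P_Lin(Y, X) for X in Omega for Y in Omega)
assert all(sum(P_Lin(X, Y) for Y in Omega) == 1 for X in Omega)
```

Since $\sum_{p=1}^{n-1} F(p) = 1$, the self-loop probability is at least $1/2$, i.e., the chain is lazy.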
6.5 Matchings in a graph

Counting all matchings in a graph, related to the Hosoya index [22], is known to be #P-complete [47]. Jerrum and Sinclair [27] gave a rapidly mixing chain, and this section is concerned with sampling all matchings in a graph. Remark that counting all perfect matchings in a bipartite graph, related to the permanent, is also a well-known #P-complete problem, and Jerrum, Sinclair, and Vigoda [28] gave a celebrated FPRAS for it based on an MCMC method using annealing. To apply our algorithm to sampling perfect matchings, we need some assumptions on the input graph (see, e.g., [44, 27, 28]).

^4 In fact, to obtain this order of magnitude, we need a (simple) data structure, which is constructed in $O(n^3)$ (or $O(n^2 \log n)$) time at Step 0. However, here we omit the details, for simplicity of the argument.

Let $H = (U, F)$ be an undirected graph, where $|U| = n$ and $|F| = m$. A matching in $H$ is a subset $\mathcal{M} \subseteq F$ such that no edges in $\mathcal{M}$ share an endpoint. Let $N_C(\mathcal{M}) = \{e = \{u,v\} \in F \mid e \notin \mathcal{M},\ \text{both } u \text{ and } v \text{ are matched in } \mathcal{M}\}$, and let $N_{\mathrm{Mat}}(\mathcal{M}) = \{e \in F \mid e \notin N_C(\mathcal{M})\}$. Then, for $e = \{u,v\} \in N_{\mathrm{Mat}}(\mathcal{M})$, we define $\mathcal{M}(e)$ by

$$\mathcal{M}(e) = \begin{cases} \mathcal{M} - e & (\text{if } e \in \mathcal{M}) \\ \mathcal{M} + e & (\text{if both } u \text{ and } v \text{ are unmatched in } \mathcal{M}) \\ \mathcal{M} + e - e' & (\text{if exactly one of } u \text{ and } v \text{ is matched in } \mathcal{M}, \text{ and } e' \text{ is its matching edge}). \end{cases}$$

Let $\Omega_{\mathrm{Mat}}$ denote the set of all possible matchings of $H$. Then we define the transition matrix $P_{\mathrm{Mat}} \in \mathbb{R}^{|\Omega_{\mathrm{Mat}}| \times |\Omega_{\mathrm{Mat}}|}$ by

$$P_{\mathrm{Mat}}(\mathcal{M}, \mathcal{M}') = \begin{cases} 1/2m & (\text{if } \mathcal{M}' = \mathcal{M}(e)) \\ 1 - |N_{\mathrm{Mat}}(\mathcal{M})|/2m & (\text{if } \mathcal{M}' = \mathcal{M}) \\ 0 & (\text{otherwise}) \end{cases}$$

for any $\mathcal{M}, \mathcal{M}' \in \Omega_{\mathrm{Mat}}$. Note that $P_{\mathrm{Mat}}$ is ergodic and reversible, and its stationary distribution is uniform on $\Omega_{\mathrm{Mat}}$ [27]. The following theorem is due to Jerrum and Sinclair [27].

Theorem 6.8. [27] For $P_{\mathrm{Mat}}$, $\tau(\gamma) \le 4mn(n \ln n + \ln \gamma^{-1})$ holds for any $\gamma > 0$.

For the Markov chain defined by $P_{\mathrm{Mat}}$, our deterministic sampler is described as follows.

Algorithm 3.
Step 0. Set $W^0[i] := \emptyset$ for each $i = 1, \ldots, M$.
Step 1. For ($t = 0$ to $T-1$){
  (a). Set the list $S^{(t)}_{\mathcal{M}} := \{i \in \{1, \ldots, M\} \mid W^t[i] = \mathcal{M}\}$ for each $\mathcal{M} \in \Omega_{\mathrm{Mat}}$ with $S^{(t)}_{\mathcal{M}} \ne \emptyset$.
  (b). Serve the tokens in $S^{(t)}_{\mathcal{M}}$ to the neighboring vertices according to (18) for each $\mathcal{M} \in \Omega_{\mathrm{Mat}}$ satisfying $S^{(t)}_{\mathcal{M}} \ne \emptyset$, and set $W^{t+1}[i]$ to be the solution in $\Omega_{\mathrm{Mat}}$ at which token $i$ arrives.
}
Step 2. Output $W^T[i]$ for each $i = 1, \ldots, M$.
Theorem 6.9. For an arbitrary $\varepsilon$ ($0 < \varepsilon < 1$), set $M := 24(m+1)mn(n \ln n + \ln 4)\varepsilon^{-1}$ and $T := 4mn(n \ln n + \ln(\varepsilon/2)^{-1})$. Then Algorithm 3 outputs $M$ samples over $\Omega_{\mathrm{Mat}}$ satisfying

$$D_{\mathrm{pw}}\left(\tilde{\chi}^{(T)}, \pi\right) \le \varepsilon \tag{25}$$

where $\pi$ is the uniform distribution over $\Omega_{\mathrm{Mat}}$. The running time of Algorithm 3 is $O(T m M \log(M)) = O^*(m^4 n^4 \varepsilon^{-1})$, where $O^*$ ignores polylog terms.

Proof. We check Algorithm 3 step by step. Step 0 sets all $M$ tokens on $\emptyset \in \Omega_{\mathrm{Mat}}$, which takes $O(Mm)$ time. Step 1(a) takes $O(mM \log(M))$ time, by heapifying $W^t[i]$ ($i = 1, \ldots, M$) with the lexicographic order on $\Omega_{\mathrm{Mat}}$. Step 1(b) takes $O(m^2)$ time to find all feasible solutions neighboring $\mathcal{M}$. Since Step 1 is repeated $T$ times, we obtain the time complexity $O(T m M \log(M))$. Now, (25) is clear from Corollary 6.3, since Algorithm 3 is an implementation of the deterministic sampler in Section 6.1.
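The move $\mathcal{M}(e)$ can be sketched directly; the frozenset encoding and the helper names below are ours.

```python
def matched_edge(M, w):
    """Return the edge of matching M covering vertex w, or None."""
    for f in M:
        if w in f:
            return f
    return None

def move(M, e):
    """The map M(e) of Section 6.5; M is a frozenset of edges, each edge a
    frozenset of two endpoints (our own encoding)."""
    u, v = tuple(e)
    if e in M:
        return M - {e}                      # delete e
    eu, ev = matched_edge(M, u), matched_edge(M, v)
    if eu is None and ev is None:
        return M | {e}                      # add e
    if (eu is None) != (ev is None):
        return (M - {eu or ev}) | {e}       # slide: exchange e with e'
    return M    # both endpoints matched: e lies in N_C(M), no transition

# path 1-2-3: add {1,2} to the empty matching, then slide to {2,3}
e12, e23 = frozenset({1, 2}), frozenset({2, 3})
M0 = frozenset()
M1 = move(M0, e12)
M2 = move(M1, e23)
```

Here `M1` is $\{\{1,2\}\}$ and `M2` is $\{\{2,3\}\}$, since vertex 2 is matched and vertex 3 is free, triggering the slide case.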
6.6 q-coloring of a graph

Let $H = (U, F)$ be an undirected graph, where $|U| = n$ and $|F| = m$, and let $Q = \{1, \ldots, q\}$ be a set of colors. The set of proper $q$-colorings of $H$ is defined by $\Omega_{\mathrm{Col}} = \{c \in Q^n \mid c_v \ne c_w\ (\forall \{v,w\} \in F)\}$. We define the relation $c \sim_v c'$ ($v \in U$) for a pair of colorings $c, c' \in \Omega_{\mathrm{Col}}$ satisfying $c_v \ne c'_v$ and $c_u = c'_u$ for all $u \ne v$. Let $Q(c_v) = Q \setminus c(N(v))$, where $c(A) = \{c_u \mid u \in A\}$ for a set $A \subseteq U$, and we define the transition matrix $P_{\mathrm{Col}} \in \mathbb{R}^{|\Omega_{\mathrm{Col}}| \times |\Omega_{\mathrm{Col}}|}$ by

$$P_{\mathrm{Col}}(c, c') = \begin{cases} 1/(n \cdot |Q(c_v)|) & (\text{if } c' \sim_v c) \\ 1 - \sum_{c'' \in N_{\mathrm{Col}}(c)} P_{\mathrm{Col}}(c, c'') & (\text{if } c' = c) \\ 0 & (\text{otherwise}) \end{cases}$$

for any $c, c' \in \Omega_{\mathrm{Col}}$, where $N_{\mathrm{Col}}(c) = \{c' \in \Omega_{\mathrm{Col}} \mid c' \sim_v c\ (v \in U)\}$. This chain is known as the Glauber dynamics for proper $q$-colorings. Note that $P_{\mathrm{Col}}$ is ergodic and reversible, and its stationary distribution is uniform on $\Omega_{\mathrm{Col}}$ [25]. The following theorem is due to Jerrum [25].

Theorem 6.10. [25] For $P_{\mathrm{Col}}$,

$$\tau(\gamma) \le \frac{q - \Delta_H}{q - 2\Delta_H}\, n \ln \frac{n}{\gamma}$$

holds for any $\gamma > 0$ in case $q > 2\Delta_H$, where $\Delta_H$ denotes the maximum degree of $H$.

For the Markov chain defined by $P_{\mathrm{Col}}$, our deterministic sampler is described as follows.

Algorithm 4.
Step 0. Set $W^0[i] := c'$ for each $i = 1, \ldots, M$. % $c'$ is some coloring in $\Omega_{\mathrm{Col}}$.
Step 1. For ($t = 0$ to $T-1$){
  (a). Set the list $S^{(t)}_c := \{i \in \{1, \ldots, M\} \mid W^t[i] = c\}$ for each $c \in \Omega_{\mathrm{Col}}$ with $S^{(t)}_c \ne \emptyset$.
  (b). Serve the tokens in $S^{(t)}_c$ to the neighboring vertices according to (18) for each $c \in \Omega_{\mathrm{Col}}$ satisfying $S^{(t)}_c \ne \emptyset$, and set $W^{t+1}[i]$ to be the solution in $\Omega_{\mathrm{Col}}$ at which token $i$ arrives.
}
Step 2. Output $W^T[i]$ for each $i = 1, \ldots, M$.

Noting that the maximum degree of the transition diagram of $P_{\mathrm{Col}}$ is at most $nq$, we obtain the following theorem.

Theorem 6.11. For an arbitrary $\varepsilon$ ($0 < \varepsilon < 1$), set $M := \left\lceil 6nq\varepsilon^{-1} \cdot \frac{q - \Delta_H}{q - 2\Delta_H}\, n \ln 4n \right\rceil$ and $T := \left\lceil \frac{q - \Delta_H}{q - 2\Delta_H}\, n \ln \frac{2n}{\varepsilon} \right\rceil$. Then Algorithm 4 outputs $M$ samples over $\Omega_{\mathrm{Col}}$ satisfying

$$D_{\mathrm{pw}}\left(\tilde{\chi}^{(T)}, \pi\right) \le \varepsilon \tag{26}$$

where $\pi$ is the uniform distribution over $\Omega_{\mathrm{Col}}$. The running time of Algorithm 4 is

$$O(T n M \log(M)) = O^*\!\left( \left( \frac{q - \Delta_H}{q - 2\Delta_H} \right)^2 q\, n^4\, \varepsilon^{-1} \right),$$

where $O^*$ ignores polylog terms.

Proof. We check Algorithm 4 step by step. Step 0 sets all $M$ tokens on $c' \in \Omega_{\mathrm{Col}}$, which takes $O(n\Delta_H)$ time. Step 1(a) takes $O(nM \log(M))$ time, by heapifying $W^t[i]$ ($i = 1, \ldots, M$) with the lexicographic order on $\Omega_{\mathrm{Col}}$. Step 1(b) takes $O(nq\Delta_H)$ time to find all feasible solutions neighboring $c$. Since Step 1 is repeated $T$ times, we obtain the time complexity $O(T n M \log(M))$. Now, (26) is clear from Corollary 6.3, since Algorithm 4 is an implementation of the deterministic sampler in Section 6.1.
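For concreteness, one row of the Glauber dynamics $P_{\mathrm{Col}}$ can be computed exactly; the data layout and names below are our own sketch.

```python
from fractions import Fraction

def glauber_row(c, adj, q, n):
    """One row of P_Col: recolour a vertex v with a colour from
    Q(c_v) = colours unused by v's neighbours, each such move having
    probability 1/(n * |Q(c_v)|); the self-loop absorbs the rest."""
    row = {}
    for v in range(n):
        avail = [col for col in range(q) if col not in {c[u] for u in adj[v]}]
        for col in avail:
            if col != c[v]:
                cp = tuple(col if w == v else c[w] for w in range(n))
                row[cp] = row.get(cp, Fraction(0)) + Fraction(1, n * len(avail))
    row[c] = 1 - sum(row.values())
    return row

# triangle K3 with q = 7 > 2 * Delta_H = 4 colours, as Theorem 6.10 requires
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
c = (0, 1, 2)
row = glauber_row(c, adj, q=7, n=3)
```

Here every vertex has $|Q(c_v)| = 5$ available colors, so each of the 12 recoloring moves has probability $1/15$ and the self-loop has probability $1/5$.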
7 Concluding Remarks

This paper was concerned with the functional-router model, a generalization of the rotor-router model, and gave an upper bound of $|\chi^{(t)}_v - \mu^{(t)}_v|$ for the general functional-router model in the case that the corresponding Markov chain is reversible. We also gave better bounds for several specific models, such as the greedy-routing model, the rotor-router model on multidigraphs, and the functional-router model based on the van der Corput sequence. We then proposed a deterministic sampling algorithm, and gave an upper bound on its point-wise distance. Using the deterministic sampling algorithm, we obtained polynomial-time deterministic algorithms for the uniform sampling of 0-1 knapsack solutions, linear extensions, matchings, $q$-colorings, etc. A bound on the point-wise distance independent of $\pi_{\max}/\pi_{\min}$ is future work. The development of deterministic approximation algorithms for #P-hard problems based on the deterministic sampler is a further challenge.
Acknowledgment

The authors would like to thank an anonymous reviewer of the previous version of this paper for bringing the stack-walk of Holroyd and Propp to our attention.
References [1] H. Akbari and Petra Berenbrink, Parallel rotor walks on finite graphs and applications in discrete load balancing, Proceedings of the 25th ACM symposium on Parallelism in algorithms and architectures (SPAA 2013), 186–195. [2] O. Angel, A.E. Holroyd, J. Martin, and J. Propp, Discrete low discrepancy sequences, arXiv:0910.1077. [3] S. Angelopoulos, B. Doerr, A. Huber, and K. Panagiotou, Tight bounds for quasirandom rumor spreading, The Electronic Journal of Combinatorics, 16 (2009), #R102. [4] R. Bubley and M. Dyer, Faster random generation of linear extensions, Discrete Mathematics, 201 (1999), 81–88. [5] J. Cooper, B. Doerr, T. Friedrich, and J. Spencer, Deterministic random walks on regular trees, Random Structures & Algorithms, 37 (2010), 353–366. [6] J. Cooper, B. Doerr, J. Spencer, and G. Tardos, Deterministic random walks on the integers, European Journal of Combinatorics, 28 (2007), 2072–2090. [7] C. Cooper, D. Ilcinkas, R. Klasing, and A. Kosowski, Derandomizing random walks in undirected graphs using locally fair exploration strategies, Distributed Computing, 24 (2011), 91–99. [8] J. Cooper and J. Spencer, Simulating a random walk with constant error, Combinatorics, Probability and Computing, 15 (2006), 815–822. [9] B. Doerr, Introducing quasirandomness to computer science, Efficient Algorithms, volume 5760 of Lecture Notes in Computer Science, Springer Verlag, (2009), 99–111. [10] B. Doerr and T. Friedrich, Deterministic random walks on the two-dimensional grid, Combinatorics, Probability and Computing, 18 (2009), 123–144.
[11] B. Doerr, T. Friedrich, M. Künnemann, and T. Sauerwald, Quasirandom rumor spreading: An experimental analysis, ACM Journal of Experimental Algorithmics, 16 (2011), 3.3:1–3.3:13. [12] B. Doerr, T. Friedrich, and T. Sauerwald, Quasirandom rumor spreading, Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2008), 773–781. [13] B. Doerr, T. Friedrich, and T. Sauerwald, Quasirandom rumor spreading: Expanders, push vs. pull, and robustness, Proceedings of the 36th International Colloquium on Automata, Languages and Programming (ICALP 2009), 366–377. [14] B. Doerr, T. Friedrich, and T. Sauerwald, Quasirandom rumor spreading on expanders, Electronic Notes in Discrete Mathematics, 34 (2009), 243–247. [15] B. Doerr, A. Huber, and A. Levavi, Strong robustness of randomized rumor spreading protocols, Proceedings of the 20th International Symposium on Algorithms and Computation (ISAAC 2009), 812–821. [16] M. Dyer, Approximate counting by dynamic programming, Proceedings of the 35th Annual ACM Symposium on Theory of Computing (STOC 2003), 693–699. [17] T. Friedrich, M. Gairing, and T. Sauerwald, Quasirandom load balancing, SIAM Journal on Computing, 41 (2012), 747–771. [18] T. Friedrich and T. Sauerwald, The cover time of deterministic random walks, The Electronic Journal of Combinatorics, 17 (2010), R167. [19] P. Gopalan, A. Klivans, and R. Meka, Polynomial-time approximation schemes for knapsack and related counting problems using branching programs, arXiv:1008.3187v1, 2010. [20] P. Gopalan, A. Klivans, R. Meka, D. Stefankovic, S. Vempala, and E. Vigoda, An FPTAS for #knapsack and related counting problems, Proceedings of the IEEE 52nd Annual Symposium on Foundations of Computer Science (FOCS 2011), 817–826. [21] A.E. Holroyd and J. Propp, Rotor walks and Markov chains, in M. Lladser, R.S. Maier, M. Mishna, and A. Rechnitzer (eds.), Algorithmic Probability and Combinatorics, The American Mathematical Society, 2010, 105–126. [22] H. Hosoya, Topological index. A newly proposed quantity characterizing the topological nature of structural isomers of saturated hydrocarbons, Bulletin of the Chemical Society of Japan, 44 (1971), 2332–2339. [23] A. Huber and N. Fountoulakis, Quasirandom broadcasting on the complete graph is as fast as randomized broadcasting, Electronic Notes in Discrete Mathematics, 34 (2009), 553–559. [24] S. Ikeda, I. Kubo, and M. Yamashita, The hitting and cover times of random walks on finite graphs using local degree information, Theoretical Computer Science, 410 (2009), 94–100. [25] M. Jerrum, A very simple algorithm for estimating the number of k-colorings of a low-degree graph, Random Structures & Algorithms, 7 (1995), 157–165. [26] D. Jerison, L. Levine, and S. Sheffield, Logarithmic fluctuations for internal DLA, Journal of the American Mathematical Society, 25 (2012), 271–301.
[27] M. Jerrum and A. Sinclair, The Markov chain Monte Carlo method: an approach to approximate counting and integration, in D.S. Hochbaum (ed.), Approximation Algorithms for NP-hard Problems, PWS Publishing, 1996. [28] M. Jerrum, A. Sinclair, and E. Vigoda, A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries, Journal of the ACM, 51 (2004), 671–697. [29] H. Kajino, S. Kijima, and K. Makino, Discrepancy analysis of deterministic random walks on finite irreducible graphs, discussion paper. [30] S. Kijima, K. Koga, and K. Makino, Deterministic random walks on finite graphs, Proceedings of the Meeting on Analytic Algorithmics and Combinatorics (ANALCO 2012), 16–25. [31] S. Kijima and T. Matsui, Approximation algorithm and perfect sampler for closed Jackson networks with single servers, SIAM Journal on Computing, 38 (2008), 1484–1503. [32] M. Kleber, Goldbug variations, The Mathematical Intelligencer, 27 (2005), 55–63. [33] G.F. Lawler, M. Bramson, and D. Griffeath, Internal diffusion limited aggregation, The Annals of Probability, 20 (1992), 2117–2140. [34] L. Levine and Y. Peres, The rotor-router shape is spherical, The Mathematical Intelligencer, 27 (2005), 9–11. [35] L. Levine and Y. Peres, Spherical asymptotics for the rotor-router model in Z^d, Indiana University Mathematics Journal, 57 (2008), 431–450. [36] L. Levine and Y. Peres, Strong spherical asymptotics for rotor-router aggregation and the divisible sandpile, Potential Analysis, 30 (2009), 1–27. [37] D.A. Levin, Y. Peres, and E.L. Wilmer, Markov Chains and Mixing Times, American Mathematical Society, 2008. [38] B. Morris and A. Sinclair, Random walks on truncated cubes and sampling 0-1 knapsack solutions, SIAM Journal on Computing, 34 (2004), 195–226. [39] R. Montenegro and P. Tetali, Mathematical Aspects of Mixing Times in Markov Chains, NOW Publishers, 2006. [40] H. Niederreiter, Quasi-Monte Carlo methods and pseudo-random numbers, Bulletin of the American Mathematical Society, 84 (1978), 957–1042. [41] V. Priezzhev, D. Dhar, A. Dhar, and S. Krishnamurthy, Eulerian walkers as a model of self-organized criticality, Physical Review Letters, 77 (1996), 5079–5082. [42] J. Propp and D. Wilson, Exact sampling with coupled Markov chains and applications to statistical mechanics, Random Structures & Algorithms, 9 (1996), 223–252. [43] T. Shiraga, Y. Yamauchi, S. Kijima, and M. Yamashita, Deterministic random walks for irrational transition probabilities, IPSJ SIG Technical Reports, 2012-AL-142(2), 2012 (in Japanese). [44] A. Sinclair, Algorithms for Random Generation and Counting: A Markov Chain Approach, Birkhäuser, 1993.
[45] D. Stefankovic, S. Vempala, and E. Vigoda, A deterministic polynomial-time approximation scheme for counting knapsack solutions, SIAM Journal on Computing, 41 (2012), 356–366. [46] R. Tijdeman, The chairman assignment problem, Discrete Math. 32 (1980), 323–330. [47] L.G. Valiant, The complexity of enumeration and reliability problems, SIAM Journal on Computing, 8 (1979), 410–421. [48] J. G. van der Corput, Verteilungsfunktionen, Proc. Akad. Amsterdam, 38 (1935), 813–821, 1058–1066.
A Fundamental Properties of Markov Chains and Mixing Times

A.1 Proof of Proposition 2.1

In this section, we show Proposition 2.1 (see, e.g., [39, 37]).

Proposition 2.1. For any integers $\ell$ ($\ell \ge 1$) and $k$ ($0 \le k < \tau(\gamma)$),

$$h(\ell \cdot \tau(\gamma) + k) \le \frac{1}{2}(2\gamma)^\ell$$

holds for any $\gamma$ ($0 < \gamma < 1/2$).

To begin with, we define

$$\bar{h}(t) \stackrel{\mathrm{def.}}{=} \max_{v,w \in V} D_{\mathrm{tv}}\left( P^t_{v,\cdot},\, P^t_{w,\cdot} \right). \tag{27}$$

Then, we show the following.

Lemma A.1. Let $\xi, \zeta \in \mathbb{R}^{|V|}$ be probability distributions. Then,

$$D_{\mathrm{tv}}(\xi P^t, \zeta P^t) \le \bar{h}(t)$$
holds for any $\xi, \zeta \in \mathbb{R}^{|V|}$ and for any $t \ge 0$.

Proof. First, we obtain that

$$D_{\mathrm{tv}}(\xi P^t, \zeta P^t) = \frac{1}{2} \left\| \sum_{v \in V} \xi_v P^t_{v,\cdot} - \sum_{w \in V} \zeta_w P^t_{w,\cdot} \right\|_1 = \frac{1}{2} \left\| \sum_{v \in V} \sum_{w \in V} \xi_v \zeta_w P^t_{v,\cdot} - \sum_{w \in V} \sum_{v \in V} \zeta_w \xi_v P^t_{w,\cdot} \right\|_1 = \frac{1}{2} \left\| \sum_{v \in V} \sum_{w \in V} \xi_v \zeta_w \left( P^t_{v,\cdot} - P^t_{w,\cdot} \right) \right\|_1, \tag{28}$$

where the second equality follows from the fact that $\sum_{u \in V} \xi_u = \sum_{u \in V} \zeta_u = 1$, since $\xi$ and $\zeta$ are probability distributions. Thus,

$$(28) \le \frac{1}{2} \sum_{v \in V} \sum_{w \in V} \xi_v \zeta_w \left\| P^t_{v,\cdot} - P^t_{w,\cdot} \right\|_1 \le \frac{1}{2} \max_{v,w \in V} \left\| P^t_{v,\cdot} - P^t_{w,\cdot} \right\|_1 \sum_{v \in V} \sum_{w \in V} \xi_v \zeta_w = \bar{h}(t)$$

holds, and we obtain the claim.
Lemma A.2.

$$h(t) \le \bar{h}(t) \le 2h(t)$$

holds for any $t \ge 0$.

Proof. Let $e_v \in \mathbb{R}^{|V|}$ denote the $v$-th unit vector. By Lemma A.1,

$$D_{\mathrm{tv}}(P^t_{v,\cdot}, \pi) = D_{\mathrm{tv}}(e_v P^t, \pi P^t) \le \bar{h}(t)$$

holds for any $v \in V$, and we obtain $h(t) \le \bar{h}(t)$. By the definition of the total variation distance,

$$D_{\mathrm{tv}}(P^t_{v,\cdot}, P^t_{w,\cdot}) = \frac{1}{2} \sum_{u \in V} \left| P^t_{v,u} - \pi_u + \pi_u - P^t_{w,u} \right| \le \frac{1}{2} \sum_{u \in V} \left| P^t_{v,u} - \pi_u \right| + \frac{1}{2} \sum_{u \in V} \left| \pi_u - P^t_{w,u} \right| \le 2h(t)$$

holds for any $v, w \in V$. We obtain $\bar{h}(t) \le 2h(t)$.
Lemma A.3. Suppose a vector $\xi \in \mathbb{R}^{|V|}$ satisfies $\sum_{i \in V} \xi_i = 0$. Then,

$$\left\| \xi P^t \right\|_1 \le \|\xi\|_1\, \bar{h}(t)$$

holds for any $t \ge 0$.

Proof. Let $\xi^+, \xi^- \in \mathbb{R}^{|V|}$ be defined by $\xi^+_i = \max\{\xi_i, 0\}$ and $\xi^-_i = \max\{-\xi_i, 0\}$. Then,

$$\xi^+_i - \xi^-_i = \max\{\xi_i, 0\} - \max\{-\xi_i, 0\} = \xi_i \tag{29}$$

holds, thus we obtain

$$\xi = \xi^+ - \xi^-. \tag{30}$$

Since $\sum_{i \in V} \xi_i = 0$,

$$\sum_{i \in V} \xi^+_i = \sum_{i \in V} \xi^-_i \tag{31}$$

holds by (29). By the definition of $\xi^+$ and $\xi^-$, $\xi^+_i + \xi^-_i = \max\{\xi_i, 0\} + \max\{-\xi_i, 0\} = |\xi_i|$ holds, hence

$$\sum_{i \in V} \xi^+_i + \sum_{i \in V} \xi^-_i = \sum_{i \in V} |\xi_i| \tag{32}$$

holds. Thus, by (31) and (32),

$$\sum_{i \in V} \xi^+_i = \sum_{i \in V} \xi^-_i = \frac{1}{2} \sum_{i \in V} |\xi_i| = \frac{1}{2} \|\xi\|_1 \tag{33}$$

holds, hence $\frac{\xi^+}{\frac{1}{2}\|\xi\|_1}$ and $\frac{\xi^-}{\frac{1}{2}\|\xi\|_1}$ are probability distributions. Finally, by Lemma A.1 and (30),

$$\left\| \xi P^t \right\|_1 = \left\| \xi^+ P^t - \xi^- P^t \right\|_1 = \frac{1}{2}\|\xi\|_1 \cdot \left\| \frac{\xi^+}{\frac{1}{2}\|\xi\|_1} P^t - \frac{\xi^-}{\frac{1}{2}\|\xi\|_1} P^t \right\|_1 \le \frac{1}{2}\|\xi\|_1 \cdot 2\bar{h}(t) = \|\xi\|_1\, \bar{h}(t)$$

holds, and we obtain the claim.

Lemma A.4.

$$h(s+t) \le h(s)\,\bar{h}(t) \quad \text{and} \quad \bar{h}(s+t) \le \bar{h}(s)\,\bar{h}(t)$$

hold for any $s, t \ge 0$.

Proof. By Lemma A.3,

$$\frac{1}{2} \left\| e_v P^{s+t} - \pi \right\|_1 = \frac{1}{2} \left\| \left( e_v P^s - \pi \right) P^t \right\|_1 \le \frac{1}{2} \left\| e_v P^s - \pi \right\|_1 \bar{h}(t)$$

holds for any $v \in V$, and we get $h(s+t) \le h(s)\bar{h}(t)$. Similarly,

$$\frac{1}{2} \left\| e_v P^{s+t} - e_w P^{s+t} \right\|_1 = \frac{1}{2} \left\| \left( e_v P^s - e_w P^s \right) P^t \right\|_1 \le \frac{1}{2} \left\| e_v P^s - e_w P^s \right\|_1 \bar{h}(t)$$

holds for any $v, w \in V$, and we get $\bar{h}(s+t) \le \bar{h}(s)\bar{h}(t)$.

Proof of Proposition 2.1. Using Lemma A.4 (and the fact that $\bar{h}(k) \le 1$),

$$h(\ell \cdot \tau(\gamma) + k) \le h(\ell \cdot \tau(\gamma))\, \bar{h}(k) \le h(\ell \cdot \tau(\gamma)) \le h(\tau(\gamma)) \cdot \bar{h}\left( (\ell-1) \cdot \tau(\gamma) \right) \le h(\tau(\gamma)) \cdot \bar{h}(\tau(\gamma))^{\ell-1} \tag{34}$$

holds. By Lemma A.2,

$$(34) \le h(\tau(\gamma)) \cdot \left( 2h(\tau(\gamma)) \right)^{\ell-1} \le \gamma \cdot (2\gamma)^{\ell-1} = \frac{1}{2}(2\gamma)^\ell$$

holds, and we obtain the claim.
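The inequalities of Lemmas A.2 and A.4 are easy to verify in exact arithmetic on a small reversible chain; the lazy walk on a 3-path below is our own illustration, not an example from the paper.

```python
from fractions import Fraction

F = Fraction
# lazy random walk on a 3-path (our own toy chain): reversible and ergodic
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0),    F(1, 2), F(1, 2)]]
pi = [F(1, 4), F(1, 2), F(1, 4)]       # stationary distribution of P

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def tv(p, q):
    return sum(abs(a - b) for a, b in zip(p, q)) / 2

def h(Pt):      # h(t)    = max_v  d_tv(P^t_{v,.}, pi)
    return max(tv(r, pi) for r in Pt)

def hbar(Pt):   # hbar(t) = max_{v,w} d_tv(P^t_{v,.}, P^t_{w,.}), Eq. (27)
    return max(tv(p, q) for p in Pt for q in Pt)

pw = {1: P}
for t in range(2, 7):
    pw[t] = matmul(pw[t - 1], P)       # pw[t] = P^t
# Lemma A.2: h <= hbar <= 2h;  Lemma A.4: h(s+t) <= h(s) hbar(t), and
# hbar is submultiplicative
for s in range(1, 4):
    assert h(pw[s]) <= hbar(pw[s]) <= 2 * h(pw[s])
    for t in range(1, 3):
        assert h(pw[s + t]) <= h(pw[s]) * hbar(pw[t])
        assert hbar(pw[s + t]) <= hbar(pw[s]) * hbar(pw[t])
```

Using `Fraction` keeps every comparison exact, so the checks exercise the lemmas rather than floating-point luck.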
A.2 Supplemental proof of Theorem 3.1

We show the following proposition, appearing in the proof of Theorem 3.1.

Proposition A.5. If $P$ is reversible, then

$$\pi_u P^t_{u,v} = \pi_v P^t_{v,u}$$

holds for any $u, v \in V$ and for any $t \ge 1$.

Proof. We show the claim by induction on $t$. For $t = 1$, the claim is clear by the definition of reversibility. Assuming that $\pi_{u'} P^t_{u',v'} = \pi_{v'} P^t_{v',u'}$ holds for any $u', v' \in V$, we show that $\pi_u P^{t+1}_{u,v} = \pi_v P^{t+1}_{v,u}$ holds for any $u, v \in V$, as follows:

$$\pi_u P^{t+1}_{u,v} = \pi_u \sum_{w \in V} P_{u,w} P^t_{w,v} = \sum_{w \in V} \pi_u P_{u,w} P^t_{w,v} = \sum_{w \in V} \pi_w P_{w,u} P^t_{w,v} = \sum_{w \in V} \pi_w P^t_{w,v} P_{w,u} = \sum_{w \in V} \pi_v P^t_{v,w} P_{w,u} = \pi_v \sum_{w \in V} P^t_{v,w} P_{w,u} = \pi_v P^{t+1}_{v,u},$$

where the third equality follows from reversibility and the fifth from the induction hypothesis. We obtain the claim.
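Proposition A.5 can likewise be checked in exact arithmetic; the 3-state chain below is our own example.

```python
from fractions import Fraction

F = Fraction
# lazy random walk on a 3-path (our own example); reversible w.r.t. pi
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0),    F(1, 2), F(1, 2)]]
pi = [F(1, 4), F(1, 2), F(1, 4)]

# detailed balance for t = 1 is the definition of reversibility ...
assert all(pi[u] * P[u][v] == pi[v] * P[v][u]
           for u in range(3) for v in range(3))

# ... and Proposition A.5 says it propagates to every power P^t
Pt = P
for _ in range(4):                      # t = 2, 3, 4, 5
    Pt = [[sum(Pt[i][k] * P[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    assert all(pi[u] * Pt[u][v] == pi[v] * Pt[v][u]
               for u in range(3) for v in range(3))
```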
B Supplemental Proofs in Section 5.3

This section presents proofs of Theorem 5.5 and Lemma 5.6. To begin with, we remark two lemmas concerning the function $\psi$ defined by (17).

Lemma B.1. For any $i \in \mathbb{Z}_{\ge 0}$ and any $k \in \mathbb{Z}_{\ge 0}$,

$$\psi(i) = \psi\left( i \bmod 2^k \right) + \psi\left( \left\lfloor \frac{i}{2^k} \right\rfloor \right) \cdot \frac{1}{2^k}$$

holds.

Proof. In case $i < 2^k$, the claim is easy since $\psi(i \bmod 2^k) = \psi(i)$ and $\psi(\lfloor i/2^k \rfloor) = \psi(0) = 0$ hold. Suppose that $i \ge 2^k$, and that $i$ is represented in binary as $i = \sum_{j=0}^{\lfloor \lg i \rfloor} \beta_j(i) \cdot 2^j$ using $\beta_j(i) \in \{0,1\}$ ($j \in \{0, 1, \ldots, \lfloor \lg i \rfloor\}$). Then,

$$i \bmod 2^k = \sum_{j=0}^{k-1} \beta_j(i) \cdot 2^j, \quad \text{and} \quad \left\lfloor \frac{i}{2^k} \right\rfloor = \left\lfloor \frac{\sum_{j=0}^{\lfloor \lg i \rfloor} \beta_j(i) \cdot 2^j}{2^k} \right\rfloor = \sum_{j=k}^{\lfloor \lg i \rfloor} \beta_j(i) \cdot 2^{j-k}$$

hold, respectively. Let $l = j - k$ and let $b_l = \beta_{l+k}(i)$ ($l = 0, 1, \ldots, \lfloor \lg i \rfloor - k$), for convenience. Then,

$$\psi\left( i \bmod 2^k \right) + \psi\left( \left\lfloor \frac{i}{2^k} \right\rfloor \right) \cdot \frac{1}{2^k} = \sum_{j=0}^{k-1} \beta_j(i) 2^{-(j+1)} + \psi\left( \sum_{l=0}^{\lfloor \lg i \rfloor - k} b_l \cdot 2^l \right) \cdot \frac{1}{2^k} = \sum_{j=0}^{k-1} \beta_j(i) 2^{-(j+1)} + \frac{1}{2^k} \sum_{l=0}^{\lfloor \lg i \rfloor - k} b_l \cdot 2^{-(l+1)} = \sum_{j=0}^{k-1} \beta_j(i) 2^{-(j+1)} + \sum_{j=k}^{\lfloor \lg i \rfloor} \beta_j(i) \cdot 2^{-(j+1)} = \sum_{j=0}^{\lfloor \lg i \rfloor} \beta_j(i) \cdot 2^{-(j+1)} = \psi(i),$$

and we obtain the claim.

Lemma B.2. For any $k \in \mathbb{Z}_{\ge 0}$ and $\alpha \in \{0, 1, \ldots, 2^k - 1\}$,

$$\psi(2^k + \alpha) = \frac{1}{2^{k+1}} + \psi(\alpha)$$

holds.

Proof. By Lemma B.1,

$$\psi(2^k + \alpha) = \psi\left( (2^k + \alpha) \bmod 2^k \right) + \psi\left( \left\lfloor \frac{2^k + \alpha}{2^k} \right\rfloor \right) \cdot \frac{1}{2^k} = \psi(\alpha) + \frac{\psi(1)}{2^k} = \psi(\alpha) + \frac{1}{2^{k+1}}$$

holds, where the last equality follows from $\psi(1) = 1/2$ by the definition.

Now, we define

$$\Phi[z, z') \stackrel{\mathrm{def.}}{=} \left\{ \psi(i) \mid i \in \{z, z+1, \ldots, z'-1\} \right\} \tag{35}$$

for $z, z' \in \mathbb{Z}_{\ge 0}$ satisfying $z < z'$. For convenience, we define $\Phi[z, z) = \emptyset$. It is not difficult to see that

$$\Phi[0, 2^k) = \left\{ \frac{i}{2^k} \ \middle|\ i \in \{0, 1, \ldots, 2^k - 1\} \right\} \tag{36}$$

holds for any $k \in \mathbb{Z}_{\ge 0}$. In general, we can show the following lemma, using Lemmas B.1 and B.2.

Lemma B.3. For any $z \in \mathbb{Z}_{\ge 0}$ and for any $k \in \mathbb{Z}_{\ge 0}$,

$$\left| \Phi[z, z + 2^k) \cap \left[ \frac{i}{2^k}, \frac{i+1}{2^k} \right) \right| = 1$$

holds for any $i \in \{0, 1, \ldots, 2^k - 1\}$.
Proof. By Lemma B.1,

$$\Phi[z, z + 2^k) = \left\{ \psi(z+i) \mid i \in \{0, 1, \ldots, 2^k - 1\} \right\} = \left\{ \psi\left( (z+i) \bmod 2^k \right) + \frac{\psi\left( \left\lfloor \frac{z+i}{2^k} \right\rfloor \right)}{2^k} \ \middle|\ i \in \{0, 1, \ldots, 2^k - 1\} \right\}$$

holds. Since $0 \le \psi(z') < 1$ holds for any $z' \in \mathbb{Z}_{\ge 0}$,

$$\psi\left( (z+i) \bmod 2^k \right) + \frac{\psi\left( \left\lfloor \frac{z+i}{2^k} \right\rfloor \right)}{2^k} \in \left[ \psi\left( (z+i) \bmod 2^k \right),\ \psi\left( (z+i) \bmod 2^k \right) + \frac{1}{2^k} \right) \tag{37}$$

holds for each $i \in \{0, 1, \ldots, 2^k - 1\}$. The observation (36) implies that

$$\left\{ \psi\left( (z+i) \bmod 2^k \right) \mid i \in \{0, 1, \ldots, 2^k - 1\} \right\} = \left\{ \frac{j}{2^k} \ \middle|\ j \in \{0, 1, \ldots, 2^k - 1\} \right\} \tag{38}$$

holds. Notice that (38) implies $\psi((z+i) \bmod 2^k) \ne \psi((z+i') \bmod 2^k)$ for any distinct $i, i' \in \{0, 1, \ldots, 2^k - 1\}$. Now, the claim is clear by (37) and (38).

Note that Lemma B.3 implies that

$$\Phi[z, z + 2^k) = \left\{ \frac{i}{2^k} + \frac{\theta(i)}{2^k} \ \middle|\ i \in \{0, 1, \ldots, 2^k - 1\} \right\} \tag{39}$$

using appropriate $\theta(i) \in [0, 1)$ for $i = 0, 1, \ldots, 2^k - 1$.

Lemma B.4. Let $z_0, k \in \mathbb{Z}_{\ge 0}$, and let $x, y \in [0, 1)$ satisfy $x < y$. Then,

$$\Big| \left| \Phi[z_0, z_0 + 2^k) \cap [x, y) \right| - 2^k \cdot (y - x) \Big| < 2$$

holds.

Proof. Lemma B.3 and Equation (39) imply that there exist $s, t \in \{0, 1, \ldots, 2^k - 1\}$ such that $s \le t$ and

$$\frac{s}{2^k} + \frac{\theta(s)}{2^k} \le x < \frac{s+1}{2^k} + \frac{\theta(s+1)}{2^k}, \tag{40}$$

$$\frac{t}{2^k} + \frac{\theta(t)}{2^k} \le y < \frac{t+1}{2^k} + \frac{\theta(t+1)}{2^k}, \tag{41}$$

where we assume

$$\frac{-1}{2^k} + \frac{\theta(-1)}{2^k} = 0 \quad \text{and} \quad \frac{2^k}{2^k} + \frac{\theta(2^k)}{2^k} = 1$$

for convenience. Then, it is clear that

$$\Phi[z_0, z_0 + 2^k) \cap [x, y) = \left\{ \frac{i}{2^k} + \frac{\theta(i)}{2^k} \ \middle|\ i \in \{s, s+1, \ldots, t-1\} \right\},$$

where $s = t$ means that $\Phi[z_0, z_0 + 2^k) \cap [x, y) = \emptyset$. By (40) and (41), we obtain

$$2^k \cdot x - \theta(s+1) - 1 < s \le 2^k \cdot x - \theta(s), \qquad 2^k \cdot y - \theta(t+1) - 1 < t \le 2^k \cdot y - \theta(t),$$

respectively, and hence we obtain that

$$2^k \cdot (y - x) - \theta(t+1) - 1 + \theta(s) < t - s < 2^k \cdot (y - x) - \theta(t) + \theta(s+1) + 1.$$

Since $|\Phi[z_0, z_0 + 2^k) \cap [x, y)| = t - s$ and $0 \le \theta(i) < 1$ ($i \in \{0, 1, \ldots, 2^k - 1\}$),

$$2^k \cdot (y - x) - 2 < \left| \Phi[z_0, z_0 + 2^k) \cap [x, y) \right| < 2^k \cdot (y - x) + 2$$

holds, and we obtain the claim.
Using Lemma B.4, we obtain the following.

Lemma B.5. Let $z_0 \in \mathbb{Z}_{\ge 0}$, $z \in \mathbb{Z}_{> 0}$, and let $x, y \in [0, 1)$ satisfy $x < y$. Then,

$$\left| \frac{\left| \Phi[z_0, z_0 + z) \cap [x, y) \right|}{z} - (y - x) \right| < \frac{2\lfloor \lg z \rfloor + 2}{z}$$

holds.

Proof. For simplicity, let

$$\Phi^*[z_0, z_0 + \ell) \stackrel{\mathrm{def.}}{=} \Phi[z_0, z_0 + \ell) \cap [x, y)$$

for any $\ell \in \mathbb{Z}_{> 0}$. Then, notice that

$$\left| \Phi^*[z_0, z_0 + z) \right| = \left| \Phi^*[z_0, z_0 + z') \right| + \left| \Phi^*[z_0 + z', z_0 + z) \right| \tag{42}$$

holds for any $z' \in \mathbb{Z}_{> 0}$ satisfying $z' < z$. Now, suppose $z$ is represented as $z = \sum_{j=0}^{\lfloor \lg z \rfloor} \beta_j(z) \cdot 2^j$ in binary, where $\beta_j(z) \in \{0, 1\}$. Using Lemma B.4 repeatedly with (42), we obtain that

$$\left| \Phi^*[z_0, z_0 + z) \right| = \left| \Phi^*\left[ z_0,\ z_0 + \beta_0(z) \cdot 2^0 \right) \right| + \sum_{k=0}^{\lfloor \lg z \rfloor - 1} \left| \Phi^*\left[ z_0 + \sum_{j=0}^{k} \beta_j(z) \cdot 2^j,\ z_0 + \sum_{j=0}^{k+1} \beta_j(z) \cdot 2^j \right) \right| < \beta_0(z) \cdot 2^0 \cdot (y - x) + 2 + \sum_{k=0}^{\lfloor \lg z \rfloor - 1} \left( \beta_{k+1}(z) \cdot 2^{k+1} \cdot (y - x) + 2 \right) = \sum_{k=0}^{\lfloor \lg z \rfloor} \left( \beta_k(z) \cdot 2^k \cdot (y - x) + 2 \right) = 2(\lfloor \lg z \rfloor + 1) + z \cdot (y - x).$$

In a similar way, we also have

$$\left| \Phi^*[z_0, z_0 + z) \right| > -2(\lfloor \lg z \rfloor + 1) + z \cdot (y - x),$$

and we obtain the claim.

By Lemma B.5, it is not difficult to see that Theorem 5.5 and Lemma 5.6 hold.
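The statements of this appendix are easy to sanity-check numerically. The sketch below (ours; `psi` and `count` are our own helper names) verifies Lemmas B.1 and B.2, the observation (36), and the bound of Lemma B.4 on small inputs in exact arithmetic.

```python
from fractions import Fraction

def psi(i):
    """van der Corput radical inverse: psi(sum_j b_j 2^j) = sum_j b_j 2^-(j+1)."""
    x, f = Fraction(0), Fraction(1, 2)
    while i:
        if i & 1:
            x += f
        f /= 2
        i >>= 1
    return x

# Lemma B.1: psi(i) = psi(i mod 2^k) + psi(floor(i / 2^k)) / 2^k
assert all(psi(i) == psi(i % 2**k) + psi(i // 2**k) / 2**k
           for i in range(200) for k in range(8))
# Lemma B.2: psi(2^k + a) = 1/2^(k+1) + psi(a)
assert all(psi(2**k + a) == Fraction(1, 2**(k + 1)) + psi(a)
           for k in range(6) for a in range(2**k))
# (36): {psi(0), ..., psi(2^k - 1)} = {i / 2^k}
assert {psi(i) for i in range(16)} == {Fraction(i, 16) for i in range(16)}

# Lemma B.4: | #(Phi[z0, z0 + 2^k) cap [x, y)) - 2^k (y - x) | < 2, any start z0
def count(z0, k, x, y):
    return sum(1 for i in range(z0, z0 + 2**k) if x <= psi(i) < y)

pairs = [(Fraction(1, 7), Fraction(5, 7)), (Fraction(0), Fraction(1, 3))]
assert all(abs(count(z0, k, x, y) - 2**k * (y - x)) < 2
           for z0 in range(0, 40, 7) for k in range(1, 7) for x, y in pairs)
```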