Approximate Undirected Transshipment and Shortest Paths via Gradient Descent

Ruben Becker, Andreas Karrenbauer, Sebastian Krinninger, Christoph Lenzen

Max Planck Institute for Informatics, Saarbrücken, Germany

arXiv:1607.05127v1 [cs.DS] 18 Jul 2016
Abstract

We present a method for solving the transshipment problem (also known as uncapacitated minimum cost flow) up to a multiplicative error of 1 + ε in undirected graphs with polynomially bounded integer edge weights using a tailored gradient descent algorithm. An important special case of the transshipment problem is the single-source shortest paths (SSSP) problem. Our gradient descent algorithm takes O(ε^{-3} polylog n) iterations, and in each iteration it needs to solve a variant of the transshipment problem up to a multiplicative error of polylog n. In particular, this allows us to perform a single iteration by computing a solution on a sparse spanner of logarithmic stretch. As a consequence, we improve prior work by obtaining the following results:

1. RAM model: (1+ε)-approximate transshipment in Õ(ε^{-3}(m + n^{1+o(1)})) computational steps (leveraging a recent O(m^{1+o(1)})-step O(1)-approximation due to Sherman [2016]).
2. Multipass Streaming model: (1+ε)-approximate transshipment and SSSP using Õ(n) space and Õ(ε^{-O(1)}) passes.
3. Broadcast Congested Clique model: (1+ε)-approximate transshipment and SSSP using Õ(ε^{-O(1)}) rounds.
4. Broadcast Congest model: (1+ε)-approximate SSSP using Õ(ε^{-O(1)}(√n + D)) rounds, where D is the (hop) diameter of the network.

The previous fastest algorithms for the last three models above leverage sparse hop sets. We bypass the hop set computation; using a spanner is sufficient in our method. The above bounds assume non-negative integer edge weights that are polynomially bounded in n; for general non-negative weights, running times scale with the logarithm of the maximum ratio between non-zero weights.
We use Õ(·) to hide polylogarithmic factors in n.
Contents

1 Introduction
2 Solving the Transshipment Problem
  2.1 Recovering a Primal Solution
  2.2 Single Source Shortest Path
3 Applications
  3.1 Broadcast Congested Clique
  3.2 Broadcast Congest Model
  3.3 Multipass Streaming
References
A Deterministic Spanner Computation in Congested Clique and Multipass Streaming Model
B Primal-Dual Pair
1 Introduction
In the minimum-cost transshipment problem we want to find the cheapest route for sending units of a single good from sources to sinks along the edges of a graph. Equivalently, we want to find the minimum-cost flow in a graph where edges have unlimited capacity. More precisely, the input to the problem is a graph G = (V, E) with n nodes and m edges in which each node v specifies a demand b(v) for the good (negative for sources and positive for sinks), and every edge e specifies a cost w(e) for sending a single unit over e. The goal is to compute a transshipment x : E → R that minimizes the total cost Σ_{e∈E} w(e)·x(e) while keeping the excess at every node equal to its demand, i.e., satisfying the constraint Σ_{e=(u,v)∈E} x(e) − Σ_{e=(v,u)∈E} x(e) = b(v) for every node v. An important special case is the single-source shortest path (SSSP) problem, in which we want to compute the shortest paths from a designated source node s to every other node in the graph. This can be modeled as a minimum-cost transshipment problem by setting the demand of the source to −n + 1 and the demand of every other node to 1. In this paper, we introduce a technique for computing primal and dual (1+ε)-approximate solutions of the minimum-cost transshipment in undirected graphs with non-negative edge weights. We show how the marriage of continuous optimization methods with combinatorial graph sparsification gives rise to highly efficient algorithms with many desirable properties. In particular, we use a tailored constrained gradient descent method for approximating the dual problem that performs a polylogarithmic number of iterations. This method reduces the problem of computing a (1+ε)-approximation to the more relaxed problem of computing, e.g., an O(log n)-approximation. We then exploit that an O(log n)-approximation can be computed very efficiently by solving a variant of the problem on a sparse spanner, and that it is well-known how to compute sparse spanners efficiently.
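To make the conservation constraint concrete, here is a minimal sketch (a toy instance of our own, not from the paper): it builds the SSSP demand vector on a 3-node path and checks that a hand-picked flow satisfies the constraint at every node, with cost equal to the sum of the distances from the source.

```python
# Toy illustration (our own example): flow conservation and cost for the
# transshipment formulation of SSSP on a path graph v0 - v1 - v2.

def excess(n, edges, x):
    """Incoming minus outgoing flow at every node, i.e. the excess that the
    constraint requires to equal the demand b(v)."""
    ex = [0.0] * n
    for (u, v), xe in zip(edges, x):
        ex[u] -= xe  # x(e) units leave the tail u
        ex[v] += xe  # and arrive at the head v
    return ex

n = 3
edges = [(0, 1), (1, 2)]   # a path graph
w = [1.0, 2.0]             # edge weights
s = 0                      # source
b = [1.0] * n
b[s] = -(n - 1)            # demand -n+1 at the source, 1 elsewhere

x = [2.0, 1.0]             # send 2 units over (0,1), 1 unit over (1,2)

assert excess(n, edges, x) == b              # conservation: excess = demand
cost = sum(we * xe for we, xe in zip(w, x))  # objective sum_e w(e) x(e)
print(cost)  # 4.0 = d(v1) + d(v2) = 1 + 3
```

The optimal transshipment for these demands routes every unit along shortest paths, so the objective equals the sum of the distances to the source.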
Techniques from continuous optimization have been key to recent breakthroughs in the combinatorial realm of graph algorithms [DS08, CKM+11, She13, KLO+14, Mad13, LS14]. Our chief approach of performing gradient descent for a norm-minimization problem using a soft-max approximation is inspired by Sherman's approximate max-flow algorithm [She13]. We deviate significantly from this framework by certain problem-specific tweaks to the standard gradient descent algorithm. Besides strengthening Sherman's approximation scheme for the transshipment problem [She16] in the RAM model, our method is widely applicable among a plurality of computational models allowing for concurrency. As a consequence, we also improve on prior results for computing approximate SSSP in the multipass streaming, broadcast congested clique, and broadcast congest² models. We note that an approximate (dual) solution to the transshipment problem merely yields distances to the source that are a (1+ε)-approximation on average. Naturally, one typically is interested in obtaining a (1+ε)-approximate distance to the source for each node. We provide an extension of our algorithm that achieves this per-node guarantee. Interestingly, the generalization to the minimum-cost transshipment problem seems conceptually necessary for our method to be applicable, even if we are only interested in solving SSSP.

Prior Work on SSSP. In the RAM model, the fastest asymptotic running times for computing SSSP are close to optimality, namely O(m + n log n) for weighted directed graphs (Dijkstra's
Also known as the node-congest model.
algorithm with Fibonacci heaps [FT87]) and O(m) for undirected graphs with positive integer weights (Thorup's algorithm [Tho99]). In a big-data world, the sequentiality and global data structures of these algorithms are a severe drawback in many applications: to build scalable systems, algorithms also need to achieve good performance in terms of parallelism, locality, congestion, and storage. This has led researchers to study alternative models of computation that highlight these aspects, namely PRAM, distributed, streaming, etc. It turns out that in many of these models progress on solving the SSSP problem exactly has stalled because of implicit bottlenecks [KR90]. Such barriers can be overcome by allowing a little bit of approximation, and thus the focus has shifted to finding scalable approximation schemes, in particular for unweighted graphs. Our approach significantly differs from previous approaches that compute (1+ε)-approximate SSSP via sparse hop sets [Coh00, MPV+15, HKN16, EN16]. Roughly speaking, a hop set is a (preferably small) set of weighted edges that, when added to the original graph, provides sufficient shortcuts to approximate all pairwise distances using paths with only a small number of edges ("hops"). Progress along these lines was blocked by the unsuccessful search for hop set constructions with better guarantees in terms of both the number of required hops and the size of the hop set. An improvement in only one of these two dimensions is insufficient to provide strictly better approximate SSSP algorithms. Prior to our work, the deterministic (1+o(1))-approximation algorithm of Henzinger et al. [HKN16] gave the best guarantees in both the Congest model and the Congested Clique model of distributed computing, as well as the multipass streaming model. It matches known lower bounds up to a factor of n^{o(1)}. We improve upon this algorithm in all three models and match the lower bounds up to a factor of polylog n.

Our Results.
By implementing our method in specific models of computation, we obtain the following approximation schemes in graphs with non-negative polynomially bounded³ integer edge weights:

1. RAM model: We obtain a randomized algorithm for computing a (1+ε)-approximation to the value of the minimum-cost transshipment using O(ε^{-3}(m + n^{1+o(1)})) time. This is faster than the recent (1+ε)-approximate minimum-cost transshipment algorithm of Sherman [She16] with running time O(ε^{-2} m^{1+o(1)}), except for very sparse graphs. Our algorithm internally uses the algorithm of Sherman to get a constant-factor approximation in each iteration of the gradient descent method. Any (deterministic) approximation algorithm with polylogarithmic approximation ratio and running time O(m polylog n) will imply a (deterministic) (1+ε)-approximation in time O(ε^{-O(1)} m polylog n) by our method.

2. Broadcast Congest model: We obtain a deterministic algorithm that computes a (1+ε)-approximation to the value of the minimum-cost transshipment using Õ(ε^{-3} n) rounds. No non-trivial upper bound was known before in this model. We also obtain a deterministic algorithm for computing (1+ε)-approximate SSSP using Õ(ε^{-O(1)}(√n + D)) rounds. For
For general non-negative weights, running times scale by a multiplicative factor of log R, where R is the maximum ratio between non-zero edge weights.
ε^{-1} ∈ polylog n, this matches the lower bound of Ω(√n/log n + D) [DHK+12] (for any (poly n)-approximation of the distance between two fixed nodes in a weighted undirected graph) up to polylogarithmic factors in n.

3. Broadcast Congested Clique model: We obtain a deterministic algorithm computing a (1+ε)-approximation to the value of the minimum-cost transshipment using Õ(ε^{-3}) rounds. No non-trivial upper bound was known before in this model. We also obtain a deterministic algorithm for computing (1+ε)-approximate SSSP using Õ(ε^{-O(1)}) rounds.

4. Multipass Streaming model: We obtain a deterministic algorithm for computing a (1+ε)-approximation to the value of the minimum-cost transshipment using Õ(ε^{-3}) passes and Õ(n) space. No non-trivial upper bound was known before in this model. We also obtain a deterministic algorithm for computing (1+ε)-approximate SSSP using O(ε^{-O(1)}) passes and O(n log n) space. By setting ε small enough, we can compute distances up to log n exactly in integer-weighted graphs using polylog n passes and O(n log n) space. Thus, up to polylogarithmic factors in n, our result matches a lower bound of n^{1+Ω(1/p)}/poly p space for all algorithms that decide in p passes if the distance between two fixed nodes in an unweighted undirected graph is at most 2(p+1), for any p = O(log n/log log n) [GO13].

Algorithm Overview. We design an approximation scheme for the minimum-cost transshipment problem in an undirected graph with non-negative edge weights based on its dual program. The dual asks to maximize potentials on nodes, weighted by their demands, subject to keeping the stretch of every edge, in terms of the difference of the potentials of its endpoints, at most the weight of the edge. In the special case of SSSP, for an optimal solution these potentials are equal to the distances to the source.
In an equivalent formulation, we can normalize the weighted sum of the potentials to 1 and minimize the maximum stretch over the edges, which in turn means that we minimize the infinity-norm of the vector of edge stretches induced by the potentials. At the heart of our algorithm lies a method for approximately solving this norm-minimization problem by means of gradient descent, a standard iterative method from continuous optimization for minimizing an unconstrained objective function. To comply with this framework, we modify our objective function as follows: we replace the non-differentiable infinity-norm by a differentiable approximation of the infinity-norm called soft-max. This function has a parameter β that controls the accuracy of the approximation of the infinity-norm. In contrast to a standard application of gradient descent, we adaptively control this parameter β to limit the number of iterations. The main idea of gradient descent is to iteratively update the node potentials in the direction of steepest descent, i.e., the direction in which the soft-max objective function locally improves the most. However, since we actually want to solve a constrained problem, we will not follow this direction immediately, as such a step might lead us outside the feasible region. Instead, we follow a direction h that we determine by solving a minimum-cost transshipment problem with the gradient of the current potentials as the demands and the additional constraint that h be orthogonal to the initial demands. This may sound circular at first glance; however, it suffices to solve this intermediate problem up to a polylogarithmic
multiplicative error; the approximation ratio merely affects the number of gradient descent iterations. A logarithmic approximation to the intermediate problem can, e.g., be computed by solving the problem on a (multiplicative) spanner, a subgraph that approximately preserves all distances in the graph. Using this idea, we gain efficiency in many applications, because the spanner is much sparser than the original graph. We stop the algorithm as soon as the value computed for the intermediate problem is small enough. In case a primal solution is needed, it can be extracted from the gradient, where flow conservation is established by routing the (necessarily small) violations approximately, e.g., using a spanner. This approach yields a (1+ε)-approximation of the minimum-cost transshipment. If our goal is to compute (1+ε)-approximate SSSP from a source node s, we use the same algorithm with demand 1 for every node v ≠ s and demand −n+1 for s. However, this does not readily yield (1+ε)-approximate distance estimates for all nodes, as the transshipment problem considers only the average cost of routing one unit of flow from the source to each node. We solve this problem by running the gradient descent algorithm with slightly larger precision and fixing the distance estimates of all nodes that satisfy an individual stopping criterion: essentially, we concurrently test for each node t whether the current potentials would be good enough for stopping the algorithm with b = 1_t − 1_s. This corresponds to the s-t shortest path problem, for which the "average" simply becomes the cost of routing one unit of flow from s to t. We then repeat the algorithm, setting the demands of fixed nodes to 0. We can show that in each repetition we fix enough nodes to reduce the total sum of all distance estimates by a constant fraction and thus only need a logarithmic number of repetitions.

Related Work. Minimum-cost transshipment is a classic problem in combinatorial optimization.
To the best of our knowledge, polynomial-time algorithms have only been formulated for directed graphs with non-negative edge weights. In the following, we assume all weighted graphs to have non-negative edge weights. The fastest current algorithms in the centralized model run in time O(n(m + n log n) log n) [Orl93] and O(n(m + n log n) log B), respectively, where B is the sum of the nodes' demands. Here the term m + n log n comes from SSSP computations in both cases. The weakly polynomial running time was improved to Õ(m√n polylog W) in a recent breakthrough for minimum-cost flow [LS14], where W is the ratio between the largest and the smallest edge weight. Independently of our research, Sherman [She16] very recently obtained a randomized algorithm for computing a (1+ε)-approximate minimum-cost transshipment in weighted undirected graphs with running time O(ε^{-2} m^{1+o(1)}). As opposed to SSSP, we are not aware of any non-trivial algorithms for computing (approximate) minimum-cost transshipment in alternative models of computation, such as distributed and streaming models. In all alternative models considered in this paper (i.e., the Congest model, the Congested Clique, and Multipass Streaming), the current best upper bound for computing (1+o(1))-approximate SSSP on weighted undirected graphs is given by the deterministic algorithm of Henzinger, Krinninger, and Nanongkai [HKN16]. We survey related results for SSSP in the following.

In the Congest model of distributed computing, SSSP on unweighted graphs can be computed exactly in O(D) rounds by distributed breadth-first search [Pel00]. For weighted graphs, the only non-trivial algorithm known is the distributed version of Bellman-Ford [Bel58, For56] using O(n) rounds. In terms of approximation, Elkin [Elk06] showed that computing an
α-approximate SSSP tree requires Ω((n/α)^{1/2}/log n + D) rounds. Together with many other lower bounds, this was strengthened in [DHK+12] by showing that computing a (poly n)-approximation of the distance between two fixed nodes in a weighted undirected graph requires Ω(√n/log n + D) rounds. The lower bounds were countered by two SSSP algorithms: a randomized O(α log α)-approximation using Õ(n^{1/2+1/α} + D) rounds [LPS13] and a randomized (1+o(1))-approximation using Õ(n^{1/2}D^{1/4} + D) rounds [Nan14]. Both results were improved upon in [HKN16] by a deterministic algorithm that computes (1+o(1))-approximate SSSP in n^{1/2+o(1)} + D^{1+o(1)} rounds. This algorithm leverages two ideas. First, it reduces the problem to computing SSSP on an overlay network in which communication is carried out by broadcasting via a global BFS tree. Second, it adds a sparse hop set to the overlay network to speed up an algorithm similar to Bellman-Ford on the overlay network. Hop sets were introduced by Cohen [Coh00] for speeding up approximate SSSP computations in a parallel setting. Roughly speaking, a hop set is a (preferably small) set of weighted edges that, when added to the original graph, provides sufficient shortcuts to approximate all pairwise distances using paths with only a small number of edges ("hops"). A recent improvement over Cohen's hop sets by Elkin and Neiman [EN16] reduces the number of hops to be considered from polylogarithmic to constant and in this way speeds up algorithms for computing approximate SSSP from multiple sources. The Congested Clique model [LPP+05] has seen increasing interest in the past years, as it highlights the aspect of limited bandwidth in distributed computing, yet excludes the possibility of explicit bottlenecks (e.g., a bridge that would limit the flow of information between the parts of the graph it connects to O(log n) bits per round in the Congest model). For weighted graphs, SSSP can again be computed exactly in O(n) rounds.
The first approximation was given by Nanongkai [Nan14] with a randomized algorithm for computing (1+o(1))-approximate SSSP in Õ(√n) rounds. All-pairs shortest paths on the Congested Clique can be computed deterministically in Õ(n^{1/3}) rounds for an exact result and in O(n^{0.158}) rounds for a (1+o(1))-approximation [CKK+15]. The hop set construction of [HKN16] gives a deterministic algorithm for computing (1+o(1))-approximate SSSP in n^{o(1)} rounds. We note that both their and our approach can actually operate in the more restricted Broadcast Congested Clique and Broadcast Congest models, in which in each round each node sends the same O(log n) bits to all other nodes or to all its neighbors, respectively. In the Streaming model, two approaches were known for SSSP before the algorithm of [HKN16]. First, shortest paths up to distance d can be computed using d passes and O(n) space in unweighted graphs by breadth-first search. Second, approximate shortest paths can be computed by first obtaining a sparse spanner and then computing distances on the spanner without additional passes [FKM+05, EZ06, FKM+08, Bas08, Elk11]. This leads, for example, to a randomized (2k−1)-approximate all-pairs shortest paths algorithm using 1 pass and Õ(n^{1+1/k}) space for any integer k ≥ 2 in unweighted undirected graphs, which can be extended to a (1+ε)(2k−1)-approximation in weighted undirected graphs for any ε > 0 at the cost of increasing the space by a factor of O(ε^{-1} log W). In unweighted undirected graphs, the spanner construction of [EZ06] can be used to compute (1+o(1))-approximate SSSP using O(1) passes and O(n^{1+o(1)}) space. The hop set construction of [HKN16] gives a deterministic algorithm for computing (1+o(1))-approximate SSSP in weighted undirected graphs using n^{o(1)} log W passes and n^{1+o(1)} log W space. These upper bounds are complemented by
a lower bound of n^{1+Ω(1/p)}/(poly p) space for all algorithms that decide in p passes if the distance between two fixed nodes in an unweighted undirected graph is at most 2(p+1), for any p = O(log n/log log n) [GO13]. Note that this lower bound in particular applies to all algorithms that provide a (1+ε)-approximation for ε < 1/(2(p+1)) in integer-weighted graphs, as this level of precision requires computing shortest paths exactly for small enough distances.
2 Solving the Transshipment Problem
We consider (1+ε)-approximation algorithms for the undirected shortest transshipment problem without capacity constraints on a (w.l.o.g. connected) graph G = (V, E) with n nodes, m edges, and positive integral edge weights w ∈ Z^m_{≥1}. Note that excluding 0 as an edge weight is a mild restriction, because we can always generate new weights w' with w'_e = 1 + ⌈n/ε⌉·w_e while preserving at least one of the shortest paths between each pair of nodes as well as (1+ε)-approximations. The transshipment problem can be written as a primal/dual pair of linear programs (see Appendix B):

    min{ ||Wx||_1 : Ax = b } = max{ b^T y : ||W^{-1}A^T y||_∞ ≤ 1 },    (1)
where A ∈ R^{n×m} is the node-arc incidence matrix of G with respect to an arbitrary orientation of the edges, W = diag(w) ∈ R^{m×m} is the diagonal matrix of weights, and b ∈ Z^n. For instance, b is equal to 1_t − 1_s for the s-t shortest path problem and 1 − n·1_s for the single-source shortest path problem. In the following, we consider G and b to be fixed, and denote by y* an optimal solution of the dual program, i.e., b^T y* = max{b^T y : ||W^{-1}A^T y||_∞ ≤ 1}. W.l.o.g., we restrict to feasible and non-trivial instances, i.e., b^T 1 = 0 and b ≠ 0. In particular, the optimum objective value of (1) is positive and finite. This implies that ||W^{-1}A^T y*||_∞ = 1. As our first step, we relate the dual program to another linear program that normalizes the objective to 1 and seeks to minimize ||W^{-1}A^T π||_∞ instead:

    min{ ||W^{-1}A^T π||_∞ : b^T π = 1 }.    (2)
In the following, we denote by π* an optimal solution to (2).

Observation 1. Feasible solutions π of (2) that satisfy ||W^{-1}A^T π||_∞ ≠ 0 are mapped to feasible solutions of the dual program in (1) via

    f(π) := π / ||W^{-1}A^T π||_∞.    (3)

Feasible solutions y of the dual program in (1) that satisfy b^T y ≠ 0 are mapped to feasible solutions of (2) via

    g(y) := y / (b^T y).    (4)
Lemma 2. For ε > 0, let π be a solution of (2) within factor 1+ε of the optimum. Then f(π) is feasible for (1) and within factor 1+ε of the optimum. In particular, f(π*) is an optimal solution of (1).

Proof. Recall that b^T y* ≠ 0, i.e., by Observation 1, g(y*) is a feasible solution of (2). Because (1) is bounded, it must hold that ||W^{-1}A^T π*||_∞ ≠ 0, implying the same for π, i.e., f(π) is feasible in (1). It follows that

    b^T f(π) = b^T π / ||W^{-1}A^T π||_∞ = 1/||W^{-1}A^T π||_∞ ≥ 1/((1+ε)·||W^{-1}A^T π*||_∞)
             ≥ 1/((1+ε)·||W^{-1}A^T g(y*)||_∞) = b^T y*/((1+ε)·||W^{-1}A^T y*||_∞) ≥ b^T y*/(1+ε).
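As a quick numerical illustration (toy values of our own), the maps f and g from Observation 1 can be checked to be mutually inverse on the relevant sets: g(f(π)) = π whenever b^T π = 1.

```python
# Toy check of Observation 1: f(pi) = pi / ||W^{-1}A^T pi||_inf is feasible
# for the dual (1), and g maps it back to (2), recovering pi when b^T pi = 1.
edges = [(0, 1), (1, 2)]
w = [1.0, 2.0]
b = [-2.0, 1.0, 1.0]
pi = [0.0, 0.25, 0.75]  # potentials with b^T pi = 1

def max_stretch(y):
    return max(abs(y[u] - y[v]) / we for (u, v), we in zip(edges, w))

f_pi = [yi / max_stretch(pi) for yi in pi]     # map into (1)
assert max_stretch(f_pi) <= 1.0 + 1e-12        # dual feasibility for (1)

bTy = sum(bi * yi for bi, yi in zip(b, f_pi))  # objective value b^T f(pi)
g_f_pi = [yi / bTy for yi in f_pi]             # map back into (2)
assert g_f_pi == pi                            # exact here (powers of two)
```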
In other words, it is sufficient to determine a (1+ε)-approximation to (2). We intend to use gradient descent for determining an approximate solution of (2). However, the objective of (2) is not differentiable, so we change it one more time by replacing the norm by the so-called soft-max (a.k.a. log-sum-exp, or lse for short), which is a suitable approximation for ||·||_∞. It is defined for vectors x ∈ R^d as

    lse_β(x) := (1/β)·ln( Σ_{i∈[d]} (e^{βx_i} + e^{−βx_i}) ),

where β > 0 is a parameter that controls the accuracy of the approximation of the infinity-norm at the expense of smoothness.⁴ This enables us to define a sufficiently smooth and accurate potential function Φ_β(π) := lse_β(W^{-1}A^T π), for some parameter β that will be adapted during the course of the gradient descent. The algorithm takes as arguments the graph G, a demand vector b, a starting solution π that is an α-approximate solution, an initial β that is appropriate for π, and the desired approximation guarantee ε. Before showing the correctness and the running-time guarantee for the algorithm, we state some known facts (cf. [She13]) about lse_β(·). For d-dimensional vectors x, y and positive scalars β, t, it holds that

    ||x||_∞ ≤ lse_β(x)                               (5)
    lse_{tβ}(x/t) = lse_β(x)/t                        (6)
    ∇lse_{tβ}(x/t) = ∇lse_β(x)                        (7)
    ||∇lse_β(x)||_1 ≤ 1                               (8)
    ||∇lse_β(x) − ∇lse_β(y)||_1 ≤ β·||x − y||_∞.      (9)
Convexity, Hölder's inequality⁵, and (8) yield

    lse_β(x) ≤ lse_β(0) + ∇lse_β(x)^T x ≤ ||x||_∞ + ln(2d)/β.    (10)

⁴ Note that for the gradient ∇lse_β(x) of the soft-max, we have that ∇lse_β(x)_i = (e^{βx_i} − e^{−βx_i}) / Σ_{j∈[d]}(e^{βx_j} + e^{−βx_j}) for all i ∈ [d].
⁵ For vectors x, y ∈ R^d, it holds that |x^T y| ≤ ||x||_p·||y||_q for p, q ≥ 0 such that 1/p + 1/q = 1.
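These facts are easy to verify numerically. The sketch below (our own illustration; the dimension, β, and test vector are arbitrary choices) implements lse_β and its gradient as in footnote 4 and checks (5), (8), and (10) on a random vector.

```python
# Numerical sanity check (illustrative) of soft-max facts (5), (8), (10).
import math, random

def lse(beta, x):
    # lse_beta(x) = (1/beta) ln( sum_i e^{beta x_i} + e^{-beta x_i} )
    return math.log(sum(math.exp(beta * xi) + math.exp(-beta * xi) for xi in x)) / beta

def grad_lse(beta, x):
    # footnote 4: (grad lse_beta(x))_i
    #   = (e^{beta x_i} - e^{-beta x_i}) / sum_j (e^{beta x_j} + e^{-beta x_j})
    denom = sum(math.exp(beta * xj) + math.exp(-beta * xj) for xj in x)
    return [(math.exp(beta * xi) - math.exp(-beta * xi)) / denom for xi in x]

random.seed(1)
d, beta = 8, 20.0
x = [random.uniform(-1.0, 1.0) for _ in range(d)]
inf_norm = max(abs(xi) for xi in x)

assert inf_norm <= lse(beta, x)                                 # fact (5)
assert sum(abs(gi) for gi in grad_lse(beta, x)) <= 1.0 + 1e-12  # fact (8)
assert lse(beta, x) <= inf_norm + math.log(2 * d) / beta        # fact (10)
```

Note how the additive error in (10) shrinks as β grows, which is exactly why the algorithm scales β up as the potential decreases.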
Algorithm 1: gradient_ust(G, b, π, β, ε)
    // π is an α-approximate solution with b^T π = 1
    // β is such that εβΦ_β(π) ∈ [4 ln(2m), 5 ln(2m)]
    repeat
        // Invariant: b^T π = 1
        while εβΦ_β(π) < 4 ln(2m) do β ← (5/4)·β
        Determine h̃ with ||W^{-1}A^T h̃||_∞ = 1 and b^T h̃ = 0 such that for δ := ∇Φ_β(π)^T h̃ it holds
            that δ ≥ (1/α)·max{∇Φ_β(π)^T h : ||W^{-1}A^T h||_∞ ≤ 1, b^T h = 0}
        if δ ≥ ε/(4α) then
            π ← π − (δ/(2β))·h̃
    until δ < ε/(4α)
    return π, β
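For intuition, the following sketch (our own simplified illustration, not gradient_ust itself) replaces the oracle direction h̃ by a plain Euclidean projection of the gradient onto {h : b^T h = 0}, and runs with a fixed β and step size on a toy path graph with unit weights, where the optimum of (2) is 1/3. The fixed β, step size, and iteration count are illustrative choices, not the paper's parameters.

```python
# Simplified projected gradient descent on Phi_beta over {pi : b^T pi = 1}
# (illustrative variant: Euclidean projection instead of the oracle step).
import math

edges = [(0, 1), (1, 2)]   # path v0 - v1 - v2, unit weights
w = [1.0, 1.0]
n = 3
b = [-2.0, 1.0, 1.0]       # b = 1 - n*1_s with source v0
beta = 50.0                # kept fixed here; the real algorithm adapts it

def stretches(pi):
    # W^{-1} A^T pi: one signed stretch per edge
    return [(pi[u] - pi[v]) / we for (u, v), we in zip(edges, w)]

def grad_phi(pi):
    # chain rule: grad Phi_beta(pi) = A W^{-1} grad lse_beta(W^{-1} A^T pi)
    s = stretches(pi)
    denom = sum(math.exp(beta * si) + math.exp(-beta * si) for si in s)
    g = [0.0] * n
    for (u, v), si, we in zip(edges, s, w):
        ge = (math.exp(beta * si) - math.exp(-beta * si)) / denom
        g[u] += ge / we
        g[v] -= ge / we
    return g

bb = sum(bi * bi for bi in b)
pi = [bi / bb for bi in b]          # feasible start: b^T pi = 1
for _ in range(12000):
    g = grad_phi(pi)
    coef = sum(gi * bi for gi, bi in zip(g, b)) / bb
    step = [gi - coef * bi for gi, bi in zip(g, b)]  # keep b^T pi = 1
    pi = [p - 0.001 * d for p, d in zip(pi, step)]

max_stretch = max(abs(si) for si in stretches(pi))
# any feasible pi has max_stretch >= 1/OPT = 1/3; soft-max adds <= ln(2m)/beta
print(max_stretch)
```

On this instance 1/max_stretch approaches OPT = 3, the sum of the two shortest-path distances; the remaining gap stems from the soft-max accuracy ln(2m)/β and the finite number of steps.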
Lemma 3 (Correctness). For 0 < ε ≤ 1/2, algorithm gradient_ust returns potentials π ∈ R^n with ||W^{-1}A^T π||_∞ ≤ (1+ε)·||W^{-1}A^T π*||_∞.

Proof. If π = π* + s·1 for some scalar s, the result is trivial. Otherwise, (π − π*)/||W^{-1}A^T(π − π*)||_∞ is a feasible solution of the optimization problem max{∇Φ_β(π)^T h : ||W^{-1}A^T h||_∞ ≤ 1, b^T h = 0}, and thus

    ∇Φ_β(π)^T(π − π*) ≤ δα·||W^{-1}A^T(π − π*)||_∞.    (11)

Due to the convexity of Φ_β, it holds that

    Φ_β(π) ≤ Φ_β(π*) + ∇Φ_β(π)^T(π − π*)
           ≤ ||W^{-1}A^T π*||_∞ + ln(2m)/β + ∇Φ_β(π)^T(π − π*)                    (by (10))
           ≤ ||W^{-1}A^T π*||_∞ + ln(2m)/β + δα·||W^{-1}A^T(π − π*)||_∞.           (by (11))

Furthermore, using the triangle inequality, the fact that δ ≤ ε/(4α) at termination, the fact that 4 ln(2m) ≤ εβΦ_β(π), and ||W^{-1}A^T π||_∞ ≤ Φ_β(π), which follows from (5), this yields

    (1 − ε/2)·Φ_β(π) ≤ (1 + ε/4)·||W^{-1}A^T π*||_∞.

Thus, using (5) once again and ε ≤ 1/2, we obtain the desired result

    ||W^{-1}A^T π||_∞ ≤ Φ_β(π) ≤ (1+ε)·||W^{-1}A^T π*||_∞.
Corollary 4. Consider a run of Algorithm gradient_ust for some 0 < ε ≤ 1/2 and let π be the potentials returned. Then f(π) = π/||W^{-1}A^T π||_∞ is a (1+ε)-approximate solution of (1), i.e., b^T f(π) ≥ b^T y*/(1+ε) and ||W^{-1}A^T f(π)||_∞ ≤ 1.

Proof. As 0 < ε ≤ 1/2, Lemma 3 yields that π is optimal up to factor 1+ε for (2). By Observation 1 and Lemma 2, f(π) = π/||W^{-1}A^T π||_∞ is feasible in (1) and optimal up to factor 1+ε.

We will now bound the running time. This will happen in two steps: (1) we show how the potential function decreases in each iteration (Corollary 6); (2) using this result, we bound the number of iterations that the algorithm needs (Lemma 7). First, note that the gradient of Φ_β(π) takes the form

    ∇Φ_β(π) = AW^{-1}·∇lse_β(W^{-1}A^T π) = AW^{-1}·p_β(π),    (12)

where we denote p_β(π) := ∇lse_β(W^{-1}A^T π) ∈ R^m.⁶

Lemma 5. Given π ∈ R^n, for any h ∈ R^n it holds that

    Φ_β((1 − b^T h)·π + h) − Φ_β(π) ≤ b̃^T h + β·||W^{-1}A^T[I − πb^T]h||²_∞,

where b̃ = [I − bπ^T]·∇Φ_β(π) = ∇Φ_β(π) − (π^T ∇Φ_β(π))·b.

Proof. By convexity of the potential function, we have

    Φ_β((1 − b^T h)·π + h) − Φ_β(π)
      ≤ ∇Φ_β((1 − b^T h)·π + h)^T·(h − (b^T h)·π)
      = ∇Φ_β(π)^T[I − πb^T]h + [∇Φ_β((1 − b^T h)·π + h) − ∇Φ_β(π)]^T·[h − (b^T h)·π]
      = b̃^T h + [p_β((1 − b^T h)·π + h) − p_β(π)]^T·W^{-1}A^T[h − (b^T h)·π].

Using Hölder's inequality, we conclude that

    Φ_β((1 − b^T h)·π + h) − Φ_β(π)
      ≤ b̃^T h + ||p_β((1 − b^T h)·π + h) − p_β(π)||_1·||W^{-1}A^T[I − πb^T]h||_∞
      ≤ b̃^T h + β·||W^{-1}A^T[I − πb^T]h||²_∞,

where the last inequality follows from (9) and (12).
Corollary 6 (Multiplicative Decrement of Φ_β). Given π ∈ R^n and h̃ ∈ R^n such that b^T h̃ = 0 and ||W^{-1}A^T h̃||_∞ ≤ 1, suppose that εβΦ_β(π) ≤ 5 ln(2m). Let δ := ∇Φ_β(π)^T h̃; then for h = −(δ/(2β))·h̃, it holds that

    Φ_β(π − (δ/(2β))·h̃) ≤ (1 − εδ²/(20 ln(2m)))·Φ_β(π).

⁶ We remark that ∇f(x) is short for ∇_x f(x) for any function f, i.e., the gradient is taken w.r.t. the argument of the function following it if the subscript is omitted.

Proof. Applying Lemma 5 with h = −(δ/(2β))·h̃ and using b^T h̃ = 0 and ||W^{-1}A^T h̃||_∞ ≤ 1 yields

    Φ_β(π − (δ/(2β))·h̃) − Φ_β(π) = Φ_β((1 − b^T h)·π + h) − Φ_β(π)
      ≤ −(δ/(2β))·b̃^T h̃ + (δ²/(4β))·||W^{-1}A^T[I − πb^T]h̃||²_∞
      = −(δ/(2β))·∇Φ_β(π)^T[I − πb^T]h̃ + (δ²/(4β))·||W^{-1}A^T[I − πb^T]h̃||²_∞
      ≤ −δ²/(2β) + δ²/(4β) = −δ²/(4β) ≤ −(εδ²/(20 ln(2m)))·Φ_β(π).
Lemma 7 (Number of Iterations). Suppose that 0 < ε ≤ 1/2. Then Algorithm gradient_ust terminates within O(ε^{-3}·α²·log(α)·log(n)) iterations.

Proof. Note that for all x ∈ R^n, ∇_β lse_β(x) ≤ 0, i.e., lse_β is decreasing as a function of β, and thus the while-loop that scales β up does not increase Φ_β(π). Denote by π₀ and β₀ the initial values of π and β, respectively, and by π and β the values at termination. It follows by Corollary 6 that in each iteration the potential decreases by a factor of 1 − εδ²/(20 ln(2m)) ≤ 1 − ε³/(320α² ln(2m)), using δ ≥ ε/(4α) in every iteration but the last. Thus the number of iterations k can be bounded by

    k ≤ log(Φ_{β₀}(π₀)/Φ_β(π)) / log((1 − ε³/(320α² ln(2m)))^{-1}) ≤ 320α² ln(2m)·ε^{-3}·log(Φ_{β₀}(π₀)/Φ_β(π)).

It remains to bound the quotient Φ_{β₀}(π₀)/Φ_β(π). Using that the algorithm chooses β₀ such that 4 ln(2m) ≤ εβ₀Φ_{β₀}(π₀) shows that

    Φ_{β₀}(π₀) = lse_{β₀}(W^{-1}A^T π₀) ≤ α·||W^{-1}A^T π*||_∞ + ln(2m)/β₀ ≤ α·||W^{-1}A^T π*||_∞ + εΦ_{β₀}(π₀)/4,

and thus Φ_{β₀}(π₀) ≤ (α/(1 − ε/4))·||W^{-1}A^T π*||_∞. On the other hand, Φ_β(π) ≥ ||W^{-1}A^T π||_∞ ≥ ||W^{-1}A^T π*||_∞, so Φ_{β₀}(π₀)/Φ_β(π) ≤ α/(1 − ε/4) = O(α). It follows that k = O(ε^{-3}·α²·log α·log n).
The following lemma shows that the additional constraint b^T h = 0, which distinguishes the optimization problem for computing the update in each iteration from the original undirected transshipment problem, has only a limited influence in our setting.

Lemma 8. Given π ∈ R^n with b^T π = 1 such that ||W^{-1}A^T π||_∞ ≤ η·||W^{-1}A^T π*||_∞. For any h̃ with ||W^{-1}A^T h̃||_∞ ≤ 1 and b̃^T h̃ ≥ (1/γ)·max{b̃^T h : ||W^{-1}A^T h||_∞ ≤ 1}, where b̃ = (I − bπ^T)·∇Φ_β(π), we obtain h̃' := (I − πb^T)h̃/(1+η) that satisfies b^T h̃' = 0, ||W^{-1}A^T h̃'||_∞ ≤ 1, and

    ∇Φ_β(π)^T h̃' ≥ (1/((1+η)γ))·max{∇Φ_β(π)^T h : ||W^{-1}A^T h||_∞ ≤ 1, b^T h = 0}.
Proof. The fact that b^T h̃' = 0 follows from b^T(I − πb^T)h̃ = (b − b)^T h̃ = 0, because b^T π = 1. For the bound on ||W^{-1}A^T h̃'||_∞, note that

    (1+η)·||W^{-1}A^T h̃'||_∞ ≤ ||W^{-1}A^T h̃||_∞ + ||W^{-1}A^T π||_∞·|b^T h̃|
      = ||W^{-1}A^T h̃||_∞·[1 + ||W^{-1}A^T π||_∞·|b^T h̃| / ||W^{-1}A^T h̃||_∞]
      ≤ ||W^{-1}A^T h̃||_∞·[1 + η·||W^{-1}A^T π*||_∞·b^T y*]
      ≤ ||W^{-1}A^T h̃||_∞·[1 + η].

Hence, we obtain

    ∇Φ_β(π)^T h̃' = (1+η)^{-1}·b̃^T h̃ ≥ (1/((1+η)γ))·max{b̃^T h : ||W^{-1}A^T h||_∞ ≤ 1}
      ≥ (1/((1+η)γ))·max{b̃^T h : ||W^{-1}A^T h||_∞ ≤ 1, b^T h = 0}
      = (1/((1+η)γ))·max{∇Φ_β(π)^T h : ||W^{-1}A^T h||_∞ ≤ 1, b^T h = 0}.
Note that this lemma implies that an α-approximation for max{b̃^T h : ||W^{-1}A^T h||_∞ ≤ 1} yields a 3α²-approximation for max{∇Φ_β(π)^T h : ||W^{-1}A^T h||_∞ ≤ 1, b^T h = 0} for all π considered by Algorithm 1, because it also yields an α-approximate starting solution and

    ||W^{-1}A^T π||_∞ ≤ Φ_β(π) ≤ Φ_{β₀}(π₀) ≤ ||W^{-1}A^T π₀||_∞ + ln(2m)/β₀ ≤ ||W^{-1}A^T π₀||_∞ + εΦ_{β₀}(π₀)/4,

so that Φ_β(π) ≤ (1 − ε/4)^{-1}·||W^{-1}A^T π₀||_∞ ≤ (1 − ε/4)^{-1}·α·||W^{-1}A^T π*||_∞ ≤ 2α·||W^{-1}A^T π*||_∞, i.e., η ≤ 2α in general. Moreover, η ≤ 1+ε holds for the vector π returned by Algorithm 1. Hence, both optimization problems are equivalent up to a factor of 2+ε for computing the last update and thus also for the evaluation of the termination criterion. Therefore, we may assume w.l.o.g. that even max{b̃^T h : ||W^{-1}A^T h||_∞ ≤ 1} < ε/(4α) while the running time is only affected by a constant factor.
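The projection in Lemma 8 is straightforward to apply; the following toy-sized check (illustrative values of our own; η is just an assumed bound here) confirms that h̃' = (I − πb^T)h̃/(1+η) is orthogonal to the demands whenever b^T π = 1.

```python
# Toy check: h' = (I - pi b^T) h / (1 + eta) satisfies b^T h' = 0 if b^T pi = 1.
b = [-2.0, 1.0, 1.0]
pi = [0.0, 0.25, 0.75]        # b^T pi = 1
h = [0.3, -0.1, 0.4]          # an arbitrary direction
eta = 1.5                     # assumed bound ||W^{-1}A^T pi||_inf <= eta * opt

bTh = sum(bi * hi for bi, hi in zip(b, h))
h_prime = [(hi - pii * bTh) / (1.0 + eta) for hi, pii in zip(h, pi)]
bTh_prime = sum(bi * hpi for bi, hpi in zip(b, h_prime))
assert abs(bTh_prime) < 1e-12  # orthogonal to the demands
```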
2.1 Recovering a Primal Solution
In the following, we show that a routing of the given demand vector can be recovered from the final gradient. Let $y^*$ be an optimum solution and let $y$ be the solution $\pi$ returned by Algorithm 1, normalized by $\|W^{-1}A^T\pi\|_\infty$. Then we know that $b^Ty^* \le (1+\varepsilon)b^Ty$. Let $\nabla\Phi_\beta(y) = AW^{-1}p$ be the gradient of the potential function $\Phi_\beta$ at $y$, where $p = \nabla\operatorname{lse}_\beta(W^{-1}A^Ty)$ is a vector with one entry for each arc. Recall that $\nabla\Phi_{\beta\cdot b^Ty}(y/b^Ty) = \nabla\Phi_\beta(y)$. Moreover, let $\tilde b := [I - by^T/b^Ty]\,AW^{-1}p$.
We consider the primal-dual pair of the undirected shortest transshipment problem with the auxiliary demand vector $\tilde b$:
\[
\max\{\tilde b^Th : \|W^{-1}A^Th\|_\infty \le 1\} = \min\{\|Wx\|_1 : Ax = \tilde b\}.
\]
Suppose that we have primal/dual feasible solutions $\tilde x, \tilde h$ such that $\|W\tilde x\|_1 \le \alpha\tilde b^T\tilde h < \frac{\varepsilon}{4}$, where we may assume the latter inequality w.l.o.g. by taking the $\tilde h$ from the last iteration of Algorithm 1. We define
\[
\bar x := \frac{b^Ty}{p^TW^{-1}A^Ty}\cdot[W^{-1}p - \tilde x]
\]
and observe that $\bar x$ is a primal feasible solution, since
\[
A\bar x = \frac{b^Ty}{y^TAW^{-1}p}\cdot\left[AW^{-1}p - \tilde b\right]
= \frac{b^Ty}{y^TAW^{-1}p}\cdot\left[AW^{-1}p - \left(I - \frac{by^T}{b^Ty}\right)AW^{-1}p\right] = b.
\]
Moreover, the objective value of $\bar x$ is
\[
\|W\bar x\|_1 \le \frac{b^Ty}{p^TW^{-1}A^Ty}\cdot\|p - W\tilde x\|_1
\le \frac{b^Ty}{p^TW^{-1}A^Ty}\cdot(\|p\|_1 + \|W\tilde x\|_1)
\le \frac{1+\varepsilon/4}{1-\varepsilon/4}\cdot b^Ty,
\]
where the last inequality follows from the fact that the one-norm of the gradient of the softmax function is bounded from above by $1$, the convexity of $\Phi_\beta$,^7 and the choice of $\beta$. We obtain the following corollary.

Corollary 9. Assume that we are given an oracle that computes an $\alpha$-approximate dual solution to the undirected transshipment problem in $\tilde O(m)$ time. Then, using Algorithm gradient_ust, we can compute primal and dual solutions $x, y$ satisfying $\|Wx\|_1 \le (1+\varepsilon)b^Ty$ in time $\tilde O(m\alpha^2\varepsilon^{-3})$.

In fact, we showed that the additional constraint $b^Th = 0$ does not make a big difference and that it can be omitted at the expense of an additional $O(\alpha^2)$ factor in the running time. Moreover, if we compute a sparse spanner with $\tilde O(n)$ edges in $\tilde O(m)$ time [BS07] and then use the recent algorithm of Sherman [She16], which computes a $(1+\varepsilon)$-approximate solution in $O(m^{1+o(1)})$ time and hence runs in $O(n^{1+o(1)})$ time on the spanner, we obtain an algorithm solving the undirected transshipment problem within a factor of $(1+\varepsilon)$ of the optimum in $\tilde O(\varepsilon^{-3}(m+n^{1+o(1)}))$ time.
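The recovery step above can be traced on a toy instance. The following self-contained Python sketch (a hypothetical three-node path with made-up potentials $y$ and an arbitrary choice of $\beta$; on a path, an exact routing $\tilde x$ of $\tilde b$ can be read off directly, corresponding to $\alpha = 1$) computes $p$, $\tilde b$, $\tilde x$, and $\bar x$, and checks that $A\bar x = b$:

```python
import math

# Toy instance: a path 0-1-2 with unit weights, arcs oriented (0,1), (1,2).
# The demand b sends one unit from node 0 to each of nodes 1 and 2.
arcs = [(0, 1), (1, 2)]
w = [1.0, 1.0]
b = [-2.0, 1.0, 1.0]
y = [0.0, 0.9, 1.8]          # dual potentials with ||W^{-1} A^T y||_inf <= 1
beta = 10.0

def A_apply(x):              # A x: net divergence of the arc flow x
    out = [0.0] * len(b)
    for i, (u, v) in enumerate(arcs):
        out[u] -= x[i]
        out[v] += x[i]
    return out

def dot(x, z):
    return sum(a * c for a, c in zip(x, z))

s = [(y[v] - y[u]) / w[i] for i, (u, v) in enumerate(arcs)]      # W^{-1} A^T y
Z = sum(math.exp(beta * t) + math.exp(-beta * t) for t in s)
p = [(math.exp(beta * t) - math.exp(-beta * t)) / Z for t in s]  # p = grad lse_beta

g = A_apply([p[i] / w[i] for i in range(len(arcs))])             # A W^{-1} p
bty, gty = dot(b, y), dot(g, y)                                  # b^T y, p^T W^{-1} A^T y
b_tilde = [g[i] - b[i] * gty / bty for i in range(len(b))]       # (I - b y^T/b^T y) A W^{-1} p

# On a path, an exact routing x_tilde of b_tilde is read off directly.
x_tilde = [-b_tilde[0], b_tilde[2]]

# x_bar = (b^T y / p^T W^{-1} A^T y) * (W^{-1} p - x_tilde)
x_bar = [(bty / gty) * (p[i] / w[i] - x_tilde[i]) for i in range(len(arcs))]
print(A_apply(x_bar))        # routes the original demand b
```

The cancellation is exactly the computation $A\bar x = b$ carried out in the feasibility argument above.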
2.2 Single Source Shortest Path
In the special case of single-source shortest paths, we have $b_v = 1$ for all $v \in V \setminus \{s\}$ and $b_s = 1-n$ for the source $s$; this is in fact the combination of $n-1$ shortest $s$-$t$-path problems. Thus, the approximation guarantee from the previous section only holds on average over all sink nodes. In the following, we show how to use Algorithm 1 to obtain the distance from the source within a factor of $1+\varepsilon$ for each node. To this end, we use a little more precision so that we can recognize the nodes for which we know the distance with sufficient accuracy using the tools we proposed above. We then concentrate on the other nodes and adapt the demand vector $b$ accordingly so that $b_v = 0$ for all good nodes $v$, i.e., the nodes for which we know the distance within a factor of $1+\varepsilon$. We iterate until all nodes are good. It remains to show how to recognize good nodes and that a constant fraction of the nodes becomes good in each iteration.

Let $y^* \in \mathbb{Z}^n$ denote the vector of distances from the source and let $y \in \mathbb{R}^n$ be our current estimates of the distances. A node $v \in V$ is called $\varepsilon$-good if $y_v^*/(1+\varepsilon) \le y_v \le y_v^*$. First, we show that we cannot be too close to the optimum when the stopping criterion is not reached yet.

Lemma 10. Let $\rho := \Phi_{\beta\cdot b^Ty}(y/b^Ty)\cdot b^Ty^*$. If the termination criterion is not yet met for $y$, i.e., $\delta > \frac{\varepsilon}{4\alpha}$, then $\rho > 1+\varepsilon'$, where $\varepsilon' := \frac{\varepsilon^3}{320\alpha^2\ln(2m)}$.

^7 More precisely, $\Phi_{\beta b^Ty}(0) \ge \Phi_{\beta b^Ty}(y/b^Ty) + \nabla\Phi_{\beta b^Ty}(y/b^Ty)^T(0 - y/b^Ty)$.
Proof. Let $\pi'$ be the vector obtained by updating $\pi := y/b^Ty$ according to Algorithm 1, i.e., $\pi' = (1-b^Th)\pi + h$, where $h$ is an $\alpha$-approximate solution of the optimization problem $\max\{\nabla\Phi_\beta(\pi)^Th : \|W^{-1}A^Th\|_\infty \le 1, b^Th = 0\}$. Then, using Corollary 6, $\delta > \frac{\varepsilon}{4\alpha}$, and $\|W^{-1}A^T\pi^*\|_\infty = \frac{1}{b^Ty^*}$, we get
\[
(1-\varepsilon')\,\Phi_{\beta\cdot b^Ty}(\pi) > \left(1 - \frac{\varepsilon\delta^2}{20\ln(2m)}\right)\Phi_{\beta\cdot b^Ty}(\pi) \ge \Phi_{\beta\cdot b^Ty}(\pi') \ge \|W^{-1}A^T\pi'\|_\infty \ge \frac{1}{b^Ty^*}.
\]
This implies $\rho > (1-\varepsilon')^{-1} \ge 1+\varepsilon'$.

Next, we show that a $1+\varepsilon'$ on-average guarantee implies that $b^Ty^*$ reduces by at least a constant factor when we set $b_v = 0$ for all $2\varepsilon'$-good nodes.

Lemma 11. Let $b_v \ge 0$ for all $v \in V\setminus\{s\}$, let $y \in \mathbb{R}^n$ with $\|W^{-1}A^Ty\|_\infty = 1$, $y_s = 0$, and $b^Ty \ge \frac{b^Ty^*}{1+\varepsilon'}$, and let $X := \{v \in V\setminus\{s\} : y_v < y_v^*/(1+2\varepsilon')\}$ for some $\varepsilon' \le \frac{1}{4}$. Then $\sum_{v\in X} b_v y_v^* \le \frac{3}{4}\, b^Ty^*$.

Proof. We have that
\[
\varepsilon' \sum_{v\in X} b_v y_v \le \varepsilon' \sum_{v\in X} b_v y_v + (1+\varepsilon')\,b^Ty - b^Ty^*
= (1+2\varepsilon')\sum_{v\in X} b_v y_v + (1+\varepsilon')\sum_{v\notin X} b_v y_v - b^Ty^*
\]
(A) Given $\beta > 0$ and $\pi \in \mathbb{R}^n$ known to every node, compute, and make known to every node, $\Phi_\beta(\pi)$ and $\nabla\Phi_\beta(\pi)$ using a constant number of rounds.
(B) Given an $\alpha$-spanner $G'$ of $G$ and $q, r \in \mathbb{R}^n$ known to every node, compute, at every node, $\tilde h \in \mathbb{R}^n$ such that $\alpha q^T\tilde h \ge \max\{q^Th : \|W^{-1}A^Th\|_\infty \le 1, r^Th = 0\}$, $\|W^{-1}A^T\tilde h\|_\infty \le 1$, and $r^T\tilde h = 0$ using only local computation.

We now describe how to implement the gradient descent algorithm. We maintain the invariant that each node knows $\pi$ and the current value of $\beta$ at the beginning of each iteration of the algorithm.

1. Make $n$, $m$, $b$, and the set of node identifiers known to all nodes in a single round by having each node $v$ broadcast its identifier, $b_v$, and its degree in the input graph.
2. Next, we construct an $\alpha$-spanner $G'$ of $G$ with $\alpha = 2\lceil\log n\rceil - 1$ in $O(\log^2 n)$ rounds using the algorithm of Corollary 23.
3. Locally at every node, compute an $\alpha$-approximate solution $\tilde\pi$ to the min-cost transshipment problem with demands $b$ using (B) by setting $q := b$ and $r := 0$.
4. Compute and make known to every node $\Phi_\beta(\pi)$ using (A) in a constant number of rounds.
5. Locally at every node, check whether $\frac{4\ln(4m)}{\varepsilon\beta} \ge \Phi_\beta(\pi)$. If not, locally at every node scale up $\beta$ by $5/4$ and proceed with Step 3.
6. Compute and make known to every node $\nabla\Phi_\beta(\pi)$ using (A) in a constant number of rounds.
7. Locally at every node, determine $\tilde h$ such that $\alpha\nabla\Phi_\beta(\pi)^T\tilde h \ge \max\{\nabla\Phi_\beta(\pi)^Th : \|W^{-1}A^Th\|_\infty \le 1, b^Th = 0\}$ and $\|W^{-1}A^T\tilde h\|_\infty \le 1$, using (B) by setting $q := \nabla\Phi_\beta(\pi)$ and $r := b$, performing only local computation.
8. Locally at every node, compute $\delta := \nabla\Phi_\beta(\pi)^T\tilde h$ and check whether $\delta \ge \varepsilon/(4\alpha)$. If yes, locally at every node compute $\pi := \pi - \frac{\delta}{2\beta}\tilde h$ and proceed with the next iteration at Step 4.
9. At termination, first compute $\|W^{-1}A^T\pi\|_\infty$ in a single round (every node can determine $|(W^{-1}A^T\pi)_a|$ locally from $\pi$ for each of its incoming arcs and broadcast it, so that every node can determine the maximum among the received values and its own value). Then, locally at every node, compute $(\pi - \pi_s\mathbf{1})/\|W^{-1}A^T\pi\|_\infty$ as the output.
Apart from repetitions of Step 4 for scaling up $\beta$, we need a constant number of rounds per iteration. As there are $\varepsilon^{-3}\operatorname{polylog} n$ iterations by Lemma 7 and we scale up $\beta$ at most $O(\log n)$ times over the course of the algorithm, we use $\varepsilon^{-3}\operatorname{polylog} n$ rounds as promised in Theorem 14. It remains to show how to perform primitives (A) and (B).

Primitive (A). Define the oriented stretch of an arc $a = (u,v)$ under potentials $\pi$ by $s_\pi(a) = (\pi_v - \pi_u)/w(a)$, i.e., $W^{-1}A^T\pi$ is the vector of oriented arc stretches under potential $\pi$. For every node $v$, let $a_v$ be the following sum over its incoming arcs:
\[
a_v = \sum_{a=(u,v)\in A} e^{\beta s_\pi(a)} + e^{-\beta s_\pi(a)}.
\]
Thus,
\[
\Phi_\beta(\pi) = \frac{1}{\beta}\ln\left(\sum_{a\in A} e^{\beta s_\pi(a)} + e^{-\beta s_\pi(a)}\right) = \frac{1}{\beta}\ln\left(\sum_{v\in V} a_v\right)
\]
and, for every node $v$, $\nabla\Phi_\beta(\pi)_v$ equals
\[
\sum_{a=(u,v)\in A} \frac{e^{\beta s_\pi(a)} - e^{-\beta s_\pi(a)}}{w(a)\cdot\sum_{a'\in A}\bigl(e^{\beta s_\pi(a')} + e^{-\beta s_\pi(a')}\bigr)}
- \sum_{a=(v,u)\in A} \frac{e^{\beta s_\pi(a)} - e^{-\beta s_\pi(a)}}{w(a)\cdot\sum_{a'\in A}\bigl(e^{\beta s_\pi(a')} + e^{-\beta s_\pi(a')}\bigr)}
= \sum_{a=(u,v)\in A} \frac{e^{\beta s_\pi(a)} - e^{-\beta s_\pi(a)}}{w(a)\cdot\sum_{v'\in V} a_{v'}}
- \sum_{a=(v,u)\in A} \frac{e^{\beta s_\pi(a)} - e^{-\beta s_\pi(a)}}{w(a)\cdot\sum_{v'\in V} a_{v'}}.
\]
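The quantities used in Primitive (A) can be evaluated directly. The following Python sketch (a made-up three-node example with an arbitrary choice of `beta`) computes the per-node sums $a_v$, the potential $\Phi_\beta(\pi)$, and the gradient $\nabla\Phi_\beta(\pi)$, and verifies that the gradient entries sum to zero (a consequence of the translation invariance of $\Phi_\beta$):

```python
import math

# Toy graph for illustration (arcs, weights, and potentials are made up).
arcs = [(0, 1), (1, 2), (0, 2)]
w = [1.0, 2.0, 2.0]
pi = [0.0, 0.8, 1.6]
beta = 5.0
n = 3

stretch = [(pi[v] - pi[u]) / w[i] for i, (u, v) in enumerate(arcs)]

# a_v: per-node sums over incoming arcs (the values each node broadcasts)
a = [0.0] * n
for i, (u, v) in enumerate(arcs):
    a[v] += math.exp(beta * stretch[i]) + math.exp(-beta * stretch[i])

Z = sum(a)
phi = math.log(Z) / beta          # Phi_beta(pi): a soft-max of the |stretches|

grad = [0.0] * n
for i, (u, v) in enumerate(arcs):
    t = (math.exp(beta * stretch[i]) - math.exp(-beta * stretch[i])) / (w[i] * Z)
    grad[v] += t                  # contribution of an incoming arc
    grad[u] -= t                  # contribution of an outgoing arc
```

Note that `phi` lies between the maximum absolute stretch and that maximum plus $\ln(2m)/\beta$, which is the soft-max property exploited throughout the paper.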
As every node $v$ knows its incident arcs and their respective weights, it can locally compute $a_v$. In one round of communication, all nodes can learn the value $a_v$ of every other node $v$. Once these values are known, each node can locally compute $\Phi_\beta(\pi)$ as well as $\nabla\Phi_\beta(\pi)_v$. Simultaneously for every node $v$, we send the latter value to all its neighbors in one round, so that every node knows $\nabla\Phi_\beta(\pi)_v$ afterwards.

Primitive (B). We use the fact that an $\alpha$-spanner $G'$ of $G$ is known to every node. In particular, every node can locally construct the node-arc incidence matrix $A'$ and the diagonal weight matrix $W'$ of $G'$. Thus, each node can locally compute an optimal solution $h'$ to the linear program $\max\{q^Th : \|(W')^{-1}(A')^Th\|_\infty \le 1, r^Th = 0\}$. We now claim that $\tilde h := h'/\alpha$ has the desired properties.

Lemma 15. Let $q, r \in \mathbb{R}^n$ and let $G'$ be an $\alpha$-spanner of $G$. Let $A$, $A'$, $W$, and $W'$ denote the node-arc incidence matrices and the weight matrices of $G$ and $G'$, respectively. Let $h'$ be an optimal solution to the linear program $\max\{q^Th : \|(W')^{-1}(A')^Th\|_\infty \le 1, r^Th = 0\}$ and set $\tilde h := h'/\alpha$. Then $\alpha q^T\tilde h \ge \max\{q^Th : \|W^{-1}A^Th\|_\infty \le 1, r^Th = 0\}$, $\|W^{-1}A^T\tilde h\|_\infty \le 1$, and $r^T\tilde h = 0$.

Proof. Clearly, $r^T\tilde h = r^Th'/\alpha = 0$. As the spanner is a subgraph, $\max\{q^Th : \|(W')^{-1}(A')^Th\|_\infty \le 1, r^Th = 0\} \ge \max\{q^Th : \|W^{-1}A^Th\|_\infty \le 1, r^Th = 0\}$ and thus $\alpha q^T\tilde h = q^Th' \ge \max\{q^Th : \|W^{-1}A^Th\|_\infty \le 1, r^Th = 0\}$. Let $a = (u,v) \in A$ and denote by $P$ a shortest path between $u$ and $v$ in $G'$. Then the stretch of $a$ in $G$ under $h'$ is
\[
|(W^{-1}A^Th')_a| = \frac{|h'_v - h'_u|}{w(a)}
\le \frac{\sum_{\{u',v'\}\in P} |h'_{v'} - h'_{u'}|}{w(a)}
= \frac{\sum_{a'\in P} w(a')\,|((W')^{-1}(A')^Th')_{a'}|}{w(a)}
\le \frac{\sum_{a'\in P} w(a')}{w(a)} \le \alpha.
\]
Thus, $\|W^{-1}A^Th'\|_\infty \le \alpha$ and consequently $\|W^{-1}A^T\tilde h\|_\infty \le 1$ for $\tilde h = h'/\alpha$.

Implementing Algorithm single_source_shortest_path. After computing an $\alpha$-spanner, we can implement Algorithm single_source_shortest_path using primitives (A) and (B) in a straightforward way. The algorithm internally uses our implementation of gradient_ust with an increased precision of $\varepsilon' = \varepsilon^3/(320\alpha^2\ln(2m))$. We thus arrive at the following theorem.
Theorem 16. For $0 < \varepsilon \in O(1)$, on the broadcast congested clique a deterministic $(1+\varepsilon)$-approximation to single-source shortest paths can be computed in $\varepsilon^{-9}\operatorname{polylog} n$ rounds.
3.2 Broadcast Congest Model
Model. The broadcast congest model differs from the broadcast congested clique in that communication is restricted to edges that are present in the input graph. That is, node $v$ receives the messages sent by node $w$ if and only if $\{v,w\} \in E$. All other aspects of the model are identical to the broadcast congested clique. We stress that this restriction has significant impact, however: Denoting the hop diameter^9 of the input graph by $D$, it is straightforward to show that $\Omega(D)$ rounds are necessary to solve the transshipment problem. Moreover, it has been established that $\Omega(\sqrt{n}/\log n)$ rounds are required even on graphs with $D \in O(\log n)$ [DHK+12]. Both of these bounds apply to randomized algorithms whose approximation ratio is polynomially bounded in $n$.

Solving the Transshipment Problem. A straightforward implementation of our algorithm in this model simply simulates the broadcast congested clique. A folklore method to simulate (global) broadcast is to use "pipelining" on a breadth-first-search (BFS) tree.

Lemma 17 (cf. [Pel00]). Suppose each $v \in V$ holds $m_v \in \mathbb{Z}_{\ge 0}$ messages of $O(\log n)$ bits each, for a total of $M = \sum_{v\in V} m_v$ messages. Then all nodes in the graph can receive these $M$ messages within $O(M+D)$ rounds.

Proof Sketch. It is easy to construct a BFS tree in $O(D)$ rounds (rooted, e.g., at the node with smallest identifier) and obtain an upper bound $d \in [D, 2D]$ by determining the depth of the tree and multiplying it by $2$. By a convergecast, we can determine $M$, where each node in the tree determines the total number of messages in its subtree. We define a total order on the messages via the lexicographical order on node identifier/message pairs.^10 Then nodes flood their messages through the tree, where a flooding operation for a message may be delayed by those for other messages that are smaller w.r.t. the total order. On each path, a flooding operation may be delayed once for each flooding operation for a smaller message. Hence, this operation completes within $O(D+M)$ rounds, and using the knowledge of $d$ and $M$, nodes can safely terminate within $O(d+M) = O(D+M)$ rounds.

We obtain the following corollary to Theorem 14.

Corollary 18. For $0 < \varepsilon \in O(1)$, in the broadcast congest model a deterministic $(1+\varepsilon)$-approximation of (1) can be computed in $\tilde O(\varepsilon^{-3} n)$ rounds.

Proof. Simulate a round of the broadcast congested clique using Lemma 17, i.e., with parameters $M = n$ and $D \le n$. Applying Theorem 14, the claim follows.
^9 That is, the diameter of the unweighted graph $G = (V, E)$.
^10 W.l.o.g., we assume identifier/message pairs to be different.
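The pipelining behind Lemma 17 can be illustrated with a tiny simulation (purely illustrative: a root pushes $M$ ordered messages down a path of depth $D$, and each edge forwards one message per round):

```python
from collections import deque

# Toy pipelined broadcast: the root of a path of depth D holds M messages
# (already in the agreed total order); each edge forwards one message per
# round. The last message reaches depth D after D + M - 1 rounds, matching
# the O(M + D) bound of the lemma.
D, M = 5, 7
queues = [deque(range(M))] + [deque() for _ in range(D)]
rounds = 0
while len(queues[-1]) < M:
    moved = []
    for i in range(D):                 # edge from depth i to depth i + 1
        if queues[i]:
            moved.append((i + 1, queues[i].popleft()))
    for j, msg in moved:               # apply all forwards simultaneously
        queues[j].append(msg)
    rounds += 1
print(rounds)                          # D + M - 1 = 11
```

Each message is delayed once per smaller message ahead of it, so message $k$ arrives at depth $D$ in round $D + k$; this is the delay argument from the proof sketch.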
Special Case: Single-Source Shortest Paths. The near-linear running time bound of Corollary 18 is far from the $\tilde\Omega(\sqrt{n}+D)$ lower bound, which also applies to the single-source shortest paths problem. However, for this problem there is an efficient reduction to a smaller skeleton graph, implying that we can match the lower bound up to polylogarithmic factors. The skeleton graph is given as an overlay network on a subset $V' \subseteq V$ of the nodes, where each node in $V'$ learns its incident edges and their weights.

Theorem 19 ([HKN16]^11). Given any weighted undirected network $G$ and source node $s \in V$, there is an $\tilde O(\varepsilon^{-O(1)}\sqrt{n})$-round deterministic distributed algorithm that computes an overlay network $G' = (V', E')$ with edge weights $w' : E' \to \{1, \dots, \operatorname{poly} n\}$ and some additional information for every node with the following properties.
• $|V'| = \tilde O(\varepsilon^{-1}\sqrt{n})$ and $s \in V'$.
• For $\varepsilon' := \varepsilon/2$, each node $v \in V$ can infer a $(1+\varepsilon)$-approximation of its distance to $s$ from $(1+\varepsilon')$-approximations of the distances between $s$ and each $t \in V'$.

This reduces the problem to a graph of roughly $\sqrt{n}$ nodes, to which we can apply the previous simulation approach.

Corollary 20. For $0 < \varepsilon \in O(1)$, in the broadcast congest model a deterministic $(1+\varepsilon)$-approximation to single-source shortest paths can be computed in $\tilde O(\varepsilon^{-O(1)}(\sqrt{n}+D))$ rounds.

Proof. We apply Theorem 19. Subsequently, we use Lemma 17 to simulate rounds of the broadcast congested clique on the overlay network, taking $\tilde O(\varepsilon^{-O(1)}\sqrt{n}+D)$ rounds per simulated round. Using Corollary 18 for $\varepsilon' \in \Theta(\varepsilon)$, we obtain a $(1+\varepsilon')$-approximation to the distances from each $t \in V'$ to $s$ in the overlay. After broadcasting these distances using Lemma 17 again, all nodes can locally compute a $(1+\varepsilon)$-approximation of their distance to $s$. The total running time is $\tilde O(\varepsilon^{-O(1)}(\sqrt{n}+D))$.
3.3 Multipass Streaming
Model. In the Streaming model the input is presented to the algorithm bit by bit as a “stream” without repetitions and the goal is to design algorithms that use as little space as possible. In the Multipass Streaming model, the algorithm is allowed to make several such passes over the input stream and the goal is to design algorithms that need only a small number of passes (and again little space). For graph algorithms, the usual assumption is that the edges of the input graph are presented to the algorithm in arbitrary order. Implementing Algorithm gradient_ust. In the following we explain how to implement our gradient descent algorithm for approximating the minimum-cost transshipment. 11 All communication of the algorithm in [HKN16] meets the constraint that in each round, each node sends the same message to all neighbors (which is the difference between the broadcast congest and the standard congest model used in [HKN16]).
Theorem 21. For any $0 < \varepsilon \in O(1)$, in the multipass streaming model a deterministic $(1+\varepsilon)$-approximation to the value of a minimum-cost transshipment in an undirected graph with non-negative edge weights can be computed in $\varepsilon^{-3}\operatorname{polylog} n$ passes with $O(n\log^2 n)$ space.

The rest of this subsection is devoted to proving the theorem. In the algorithm we will use space $O(n\log^2 n)$ to permanently store certain information. We will perform all operations of the algorithm within an additional "temporary space" of size $O(n\log^2 n)$, i.e., at any time the sum of the permanent space and the temporary space is $O(n\log^2 n)$. In the following description of the algorithm we assume that we can use the following two algorithmic primitives. We will argue after the algorithm's description that they can be carried out within the stated bounds.

(A) Given stored $\beta > 0$ and $\pi \in \mathbb{R}^n$, compute $\Phi_\beta(\pi)$ and $\nabla\Phi_\beta(\pi)$ using a single pass and $O(n)$ temporary space.

(B) Given a stored $\alpha$-spanner $G'$ of $G$ and stored $q, r \in \mathbb{R}^n$, compute $\tilde h \in \mathbb{R}^n$ such that $\alpha q^T\tilde h \ge \max\{q^Th : \|W^{-1}A^Th\|_\infty \le 1, r^Th = 0\}$, $\|W^{-1}A^T\tilde h\|_\infty \le 1$, and $r^T\tilde h = 0$ internally (i.e., without additional passes) with $O(n\log^2 n)$ temporary space.

We reserve space $O(n\log^2 n)$ to permanently store the following information throughout the algorithm:
• A spanner $G'$ of $G$ of size $O(n\log^2 n)$.
• An $n$-dimensional vector $\pi$ for the solution maintained by the gradient descent algorithm.
• An $n$-dimensional vector $b$ for the input demands.
• An $n$-dimensional vector $s$ for partial sums of edge weights.
• Scalars $\alpha$, $\beta$, $\varepsilon$, $m$, and $n$.

We now describe how to implement the gradient descent algorithm.

1. Compute and store $n$, $m$, $b$ in a single pass with $O(1)$ temporary space.
2. Construct and store an $\alpha$-spanner $G'$ of $G$ of size $O(n\log^2 n)$ with $\alpha = 2\lceil\log n\rceil - 1$ in $O(\log^2 n)$ passes with $O(n\log^2 n)$ temporary space using the algorithm of Corollary 24.
3. Internally, compute and store an $\alpha$-approximate solution $\tilde\pi$ to the min-cost transshipment problem with demands $b$ using (B) by setting $q := b$ and $r := 0$, with $O(n\log^2 n)$ temporary space.
4. Compute and store $\Phi_\beta(\pi)$ using (A) in a single pass with $O(n)$ temporary space.
5. Internally, check whether $\frac{4\ln(4m)}{\varepsilon\beta} \ge \Phi_\beta(\pi)$. If not, scale up $\beta$ by $5/4$ and proceed with Step 3.
6. Compute and store $\nabla\Phi_\beta(\pi)$ using (A) in a single pass with $O(n)$ temporary space.
7. Internally, compute and store $\tilde h$ such that $\alpha\nabla\Phi_\beta(\pi)^T\tilde h \ge \max\{\nabla\Phi_\beta(\pi)^Th : \|W^{-1}A^Th\|_\infty \le 1, b^Th = 0\}$ and $\|W^{-1}A^T\tilde h\|_\infty \le 1$, using (B) by setting $q := \nabla\Phi_\beta(\pi)$ and $r := b$, with $O(n\log^2 n)$ temporary space.
8. Internally, compute $\delta := \nabla\Phi_\beta(\pi)^T\tilde h$ and check whether $\delta \ge \varepsilon/(4\alpha)$ with $O(1)$ temporary space. If yes, internally compute $\pi := \pi - \frac{\delta}{2\beta}\tilde h$ with $O(n\log^2 n)$ temporary space and proceed with the next iteration at Step 4.
9. At termination, first compute $\|W^{-1}A^T\pi\|_\infty$ in a single pass with $O(1)$ temporary space (every time we read an edge, we determine its orientation as an arc $a$, determine $|(W^{-1}A^T\pi)_a|$ from the stored $\pi$, and store it if it is maximal among the corresponding values for previously read arcs). Then internally compute $(\pi - \pi_s\mathbf{1})/\|W^{-1}A^T\pi\|_\infty$ as the output using $O(n)$ temporary space.

We never exceed the promised space bound of $O(n\log^2 n)$. Apart from repetitions of Step 4 for scaling up $\beta$, we need a constant number of passes per iteration. As there are $\varepsilon^{-3}\operatorname{polylog} n$ iterations by Lemma 7 and we scale up $\beta$ at most $O(\log n)$ times over the course of the algorithm, we use $\varepsilon^{-3}\operatorname{polylog} n$ passes as promised in Theorem 21. It remains to show how to perform primitives (A) and (B).

Primitive (A). Define the oriented stretch of an arc $a = (u,v)$ under potentials $\pi$ by $s_\pi(a) = (\pi_v - \pi_u)/w(a)$, i.e., $W^{-1}A^T\pi$ is the vector of oriented arc stretches under potential $\pi$. Let $d$ be the following sum over all arcs:
\[
d = \sum_{a\in A} e^{\beta s_\pi(a)} + e^{-\beta s_\pi(a)},
\]
and for every node $v$, let $b_v$ and $c_v$ be the following sums over its incoming and outgoing arcs, respectively:
\[
b_v = \sum_{a=(u,v)\in A} \frac{e^{\beta s_\pi(a)} - e^{-\beta s_\pi(a)}}{w(a)}, \qquad
c_v = \sum_{a=(v,u)\in A} \frac{e^{\beta s_\pi(a)} - e^{-\beta s_\pi(a)}}{w(a)}.
\]
Now observe that
\[
\Phi_\beta(\pi) = \frac{1}{\beta}\ln d \qquad\text{and}\qquad \nabla\Phi_\beta(\pi)_v = \frac{b_v - c_v}{d}
\]
for every node $v$. Thus, to implement the computation of $\Phi_\beta(\pi)$ and $\nabla\Phi_\beta(\pi)$ in a single pass with $O(n)$ temporary space, it is sufficient to compute $d$, $b_v$, and $c_v$ for every node $v$ in a single pass. For this purpose we keep a variable for each of these values, initialized to $0$ before the pass. Every time we read an edge $e = \{u,v\}$ of weight $w(e)$ from the stream, we first
determine its orientation as an arc, say $a = (u,v)$. We then compute $s_\pi(a) = (\pi_v - \pi_u)/w(e)$, add $e^{\beta s_\pi(a)} + e^{-\beta s_\pi(a)}$ to the variable for $d$, and add $\frac{e^{\beta s_\pi(a)} - e^{-\beta s_\pi(a)}}{w(a)}$ to the variables for $b_v$ and $c_u$, respectively. After the pass is finished, the variables have the desired values and we can internally compute $\Phi_\beta(\pi)$ and $\nabla\Phi_\beta(\pi)$ with $O(n)$ temporary space.

Primitive (B). We use the fact that we have stored an $\alpha$-spanner $G'$ of $G$. We internally construct the node-arc incidence matrix $A'$ and the diagonal weight matrix $W'$ of $G'$ and compute an optimal solution $h'$ to the linear program $\max\{q^Th : \|(W')^{-1}(A')^Th\|_\infty \le 1, r^Th = 0\}$. As centralized computation is free in this streaming model, this can be done with $O(n\log^2 n)$ temporary space, e.g., by enumerating all basic solutions and checking for optimality. By Lemma 15, $\tilde h := h'/\alpha$ has the desired properties.

Implementing Algorithm single_source_shortest_path. After computing an $\alpha$-spanner, we can implement Algorithm single_source_shortest_path using primitives (A) and (B) in a straightforward way. The algorithm internally uses our implementation of gradient_ust with an increased precision of $\varepsilon' = \varepsilon^3/(320\alpha^2\ln(2m))$. We thus arrive at the following theorem.

Theorem 22. For any $0 < \varepsilon \in O(1)$, in the multipass streaming model, deterministic $(1+\varepsilon)$-approximate SSSP in an undirected graph with non-negative edge weights can be computed in $\varepsilon^{-9}\operatorname{polylog} n$ passes with $O(n\log^2 n)$ space.
References

[Bas08]
Surender Baswana. “Streaming algorithm for graph spanners - single pass and constant processing time per edge”. In: Information Processing Letters 106.3 (2008), pp. 110–114 (cit. on pp. 7, 29).
[Bel58]
Richard Bellman. “On a Routing Problem”. In: Quarterly of Applied Mathematics 16.1 (1958), pp. 87–90 (cit. on p. 6).
[BS07]
Surender Baswana and Sandeep Sen. “A Simple and Linear Time Randomized Algorithm for Computing Sparse Spanners in Weighted Graphs”. In: Random Structures & Algorithms 30.4 (2007). Announced at ICALP’03, pp. 532–563 (cit. on pp. 14, 28, 29).
[CKK+ 15]
Keren Censor-Hillel, Petteri Kaski, Janne H. Korhonen, Christoph Lenzen, Ami Paz, and Jukka Suomela. “Algebraic Methods in the Congested Clique”. In: Symposium on Principles of Distributed Computing (PODC). 2015, pp. 143–152 (cit. on p. 7).
[CKM+ 11]
Paul Christiano, Jonathan A. Kelner, Aleksander Madry, Daniel A. Spielman, and Shang-Hua Teng. “Electrical flows, laplacian systems, and faster approximation of maximum flow in undirected graphs”. In: Symposium on Theory of Computing (STOC). 2011, pp. 273–282 (cit. on p. 3).
[Coh00]
Edith Cohen. “Polylog-Time and Near-Linear Work Approximation Scheme for Undirected Shortest Paths”. In: Journal of the ACM 47.1 (2000). Announced at STOC’94, pp. 132–166 (cit. on pp. 4, 7).
24
[DHK+ 12]
Atish Das Sarma, Stephan Holzer, Liah Kor, Amos Korman, Danupon Nanongkai, Gopal Pandurangan, David Peleg, and Roger Wattenhofer. “Distributed Verification and Hardness of Distributed Approximation”. In: SIAM Journal on Computing 41.5 (2012). Announced at STOC’11, pp. 1235–1265 (cit. on pp. 5, 7, 20).
[DS08]
Samuel I. Daitch and Daniel A. Spielman. “Faster approximate lossy generalized flow via interior point algorithms”. In: Symposium on Theory of Computing (STOC). 2008, pp. 451–460 (cit. on p. 3).
[Elk06]
Michael Elkin. “An Unconditional Lower Bound on the Time-Approximation Tradeoff for the Distributed Minimum Spanning Tree Problem”. In: SIAM Journal on Computing 36.2 (2006). Announced at STOC’04, pp. 433–456 (cit. on p. 6).
[Elk11]
Michael Elkin. “Streaming and Fully Dynamic Centralized Algorithms for Constructing and Maintaining Sparse Spanners”. In: ACM Transactions on Algorithms 7.2 (2011). Announced at ICALP’07, 20:1–20:17 (cit. on p. 7).
[EN16]
Michael Elkin and Ofer Neiman. “Hopsets with Constant Hopbound, and Applications to Approximate Shortest Paths”. In: CoRR abs/1605.04538 (2016) (cit. on pp. 4, 7).
[EZ06]
Michael Elkin and Jian Zhang. “Efficient algorithms for constructing (1 + , β)spanners in the distributed and streaming models”. In: Distributed Computing 18.5 (2006), pp. 375–385 (cit. on p. 7).
[FKM+ 05]
Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and Jian Zhang. “On graph problems in a semi-streaming model”. In: Theoretical Computer Science 348.2-3 (2005). Announced at ICALP’04, pp. 207–216 (cit. on p. 7).
[FKM+ 08]
Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and Jian Zhang. “Graph Distances in the Data-Stream Model”. In: SIAM Journal on Computing 38.5 (2008). Announced at SODA’05, pp. 1709–1727 (cit. on p. 7).
[For56]
Lester R. Ford. Network Flow Theory. Tech. rep. P-923. The Rand Corporation, 1956 (cit. on p. 6).
[FT87]
Michael L. Fredman and Robert Endre Tarjan. “Fibonacci heaps and their uses in improved network optimization algorithms”. In: Journal of the ACM 34.3 (1987). Announced at FOCS’84, pp. 596–615 (cit. on p. 4).
[GO13]
Venkatesan Guruswami and Krzysztof Onak. “Superlinear Lower Bounds for Multipass Graph Processing”. In: Conference on Computational Complexity (CCC). 2013, pp. 287–298 (cit. on pp. 5, 8).
[HKN16]
Monika Henzinger, Sebastian Krinninger, and Danupon Nanongkai. “A deterministic almost-tight distributed algorithm for approximating single-source shortest paths”. In: Symposium on Theory of Computing (STOC). 2016, pp. 489–498 (cit. on pp. 4, 6, 7, 21).
25
[KLO+ 14]
Jonathan A. Kelner, Yin Tat Lee, Lorenzo Orecchia, and Aaron Sidford. “An Almost-Linear-Time Algorithm for Approximate Max Flow in Undirected Graphs, and its Multicommodity Generalizations”. In: Symposium on Discrete Algorithms (SODA). 2014, pp. 217–226 (cit. on p. 3).
[KR90]
Richard M. Karp and Vijaya Ramachandran. “Parallel Algorithms for SharedMemory Machines”. In: Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity. The MIT Press, 1990, pp. 869–942 (cit. on p. 4).
[LPP+ 05]
Zvi Lotker, Boaz Patt-Shamir, Elan Pavlov, and David Peleg. “Minimum-Weight Spanning Tree Construction in O(log log n) Communication Rounds”. In: SIAM Journal on Computing 35.1 (2005). Announced at SPAA’03, pp. 120–131 (cit. on p. 7).
[LPS13]
Christoph Lenzen and Boaz Patt-Shamir. “Fast Routing Table Construction Using Small Messages”. In: Symposium on Theory of Computing (STOC). 2013, pp. 381– 390 (cit. on p. 7).
[LS14]
Yin Tat Lee and Aaron Sidford. “Path Finding Methods for Linear Programming: Solving Linear Programs in Õ(√rank) Iterations and Faster Algorithms for Maximum Flow”. In: Symposium on Foundations of Computer Science (FOCS). 2014, pp. 424–433 (cit. on pp. 3, 6).
[Mad13]
Aleksander Madry. “Navigating Central Path with Electrical Flows: From Flows to Matchings, and Back”. In: Symposium on Foundations of Computer Science (FOCS). 2013, pp. 253–262 (cit. on p. 3).
[MPV+ 15]
Gary L. Miller, Richard Peng, Adrian Vladu, and Shen Chen Xu. “Improved Parallel Algorithms for Spanners and Hopsets”. In: Symposium on Parallelism in Algorithms and Architectures (SPAA). 2015, pp. 192–201 (cit. on p. 4).
[Nan14]
Danupon Nanongkai. “Distributed Approximation Algorithms for Weighted Shortest Paths”. In: Symposium on Theory of Computing (STOC). 2014, pp. 565–573 (cit. on p. 7).
[Orl93]
James B. Orlin. “A Faster Strongly Polynomial Minimum Cost Flow Algorithm”. In: Operations Research 41.2 (1993). Announced at STOC’88, pp. 338–350 (cit. on p. 6).
[Pel00]
David Peleg. Distributed Computing: A Locality-Sensitive Approach. Philadelphia, PA: SIAM, 2000 (cit. on pp. 6, 20).
[RTZ05]
Liam Roditty, Mikkel Thorup, and Uri Zwick. “Deterministic Constructions of Approximate Distance Oracles and Spanners”. In: International Colloquium on Automata, Languages and Programming (ICALP). 2005, pp. 261–272 (cit. on p. 29).
[She13]
Jonah Sherman. “Nearly Maximum Flows in Nearly Linear Time”. In: Symposium on Foundations of Computer Science (FOCS). 2013, pp. 263–269 (cit. on pp. 3, 9).
26
[She16]
Jonah Sherman. “Generalized Preconditioning and Network Flow Problems”. In: CoRR abs/1606.07425 (2016) (cit. on pp. 3, 4, 6, 14).
[Tho99]
Mikkel Thorup. “Undirected Single-Source Shortest Paths with Positive Integer Weights in Linear Time”. In: Journal of the ACM 46.3 (1999). Announced at FOCS’97, pp. 362–394 (cit. on p. 4).
27
A Deterministic Spanner Computation in Congested Clique and Multipass Streaming Model
For $k \in \mathbb{Z}_{>0}$, a simple and elegant randomized algorithm computing a $(2k-1)$-spanner with $O(kn^{1+1/k})$ edges in expectation was given by Baswana and Sen [BS07]. For the sake of completeness, we restate it here.

1. Initially, each node is a singleton cluster: $R_1 := \{\{v\} \mid v \in V\}$.
2. For $i = 1, \dots, k-1$ do:
   (a) Each cluster from $R_i$ is marked independently with probability $n^{-1/k}$. $R_{i+1}$ is defined to be the set of clusters marked in phase $i$.
   (b) If $v$ is a node in an unmarked cluster:
      i. Define $Q_v$ to be the set of edges that consists of the lightest edge from $v$ to each cluster in $R_i$ it is adjacent to.
      ii. If $v$ is not adjacent to any marked cluster, all edges in $Q_v$ are added to the spanner.
      iii. Otherwise, let $u$ be the closest neighbor of $v$ in a marked cluster. In this case $v$ adds to the spanner the edge $\{v,u\}$ and all edges $\{v,w\} \in Q_v$ with $w(v,w) < w(v,u)$ (breaking ties by neighbor identifiers). Also, let $X$ be the cluster of $u$. Then $X := X \cup \{v\}$, i.e., $v$ joins the cluster of $u$.
3. Each $v \in V$ adds, for each $X \in R_k$ it is adjacent to, the lightest edge connecting it to $X$.

It is easy to see that the algorithm selects $O(kn^{1+1/k})$ expected edges into the spanner: In each iteration, each node $v$ sorts its adjacent clusters in order of ascending weight of the lightest edge to them and elects, for each cluster up to the first sampled one, the respective lightest edge into the spanner. Because this order is independent of the randomness used in this iteration, $v$ selects $O(n^{1/k})$ edges in expectation and $O(n^{1/k}\log n)$ edges with high probability.^12 The same bound applies to the final step, as $|R_k| \in O(n^{1/k})$ in expectation and $|R_k| \in O(n^{1/k}\log n)$ with high probability. Moreover, this observation provides a straightforward derandomization of the algorithm applicable in our model: Instead of picking $R_{i+1}$ in iteration $i$ randomly, we consider the union $E_i$ over all nodes $v$ of the lightest $O(n^{1/k}\log n)$ edges in $Q_v$. By a union bound, with high probability we can select $R_{i+1}$ such that (i) $|R_{i+1}| \le n^{-1/k}|R_i|$ and (ii) each node selects only $O(n^{1/k}\log n)$ edges into the spanner in this iteration. In particular, such a choice must exist, and it can be computed from $R_i$ and $E_i$ alone.

^12 That is, for any fixed constant choice of $c > 0$, the number of selected edges is bounded by $O(n^{1/k}\log n)$ with probability at least $1 - 1/n^c$.
1. Initially, each node is a singleton cluster: $R_1 := \{\{v\} \mid v \in V\}$.
2. For $i = 1, \dots, k-1$ do for each node $v$:
   (a) Define $Q_v$ to be the set of edges that consists of the lightest edge from $v$ to each cluster in $R_i$ it is adjacent to.
   (b) Broadcast the set $Q'_v$ of the lightest $O(n^{1/k}\log n)$ edges in $Q_v$.
   (c) For $w \in V$, denote by $X_w \in R_i$ the cluster with $w \in X_w$. Locally compute $R_{i+1} \subseteq R_i$ such that (i) $|R_{i+1}| \le n^{-1/k}|R_i|$ and (ii) for each $w \in V$ for which $Q'_w \ne Q_w$, it holds that $X_w \in R_{i+1}$ or $Q'_w$ contains an edge connecting $w$ to some $X \in R_{i+1}$.
   (d) Update clusters and add edges to the spanner as the original algorithm would, but for the computed choice of $R_{i+1}$.
3. Each $v \in V$ adds, for each $X \in R_k$ it is adjacent to, the lightest edge connecting it to $X$.

A slightly stronger bound of $O(kn^{1+1/k})$ on the number of edges in the spanner can be obtained in this framework by using the technique for deterministically finding early hitting sets developed by Roditty, Thorup, and Zwick [RTZ05]. Note that, as argued above, the selection of $R_{i+1}$ in Step 2(c) is always possible and can be done deterministically, provided $R_i$ is known. Because $R_1$ is simply the set of singletons and each node computes $R_{i+1}$ from $R_i$ and the $Q'_w$, this holds by induction. We arrive at the following result.

Corollary 23 (of [BS07]). For $k \in \mathbb{Z}_{>0}$, in the broadcast congested clique a $(2k-1)$-spanner of size $O(kn^{1+1/k}\log n)$ can be deterministically computed and made known to all nodes in $O(kn^{1/k}\log n)$ rounds.

Proof. The bound of $\alpha = 2k-1$ follows from the analysis in [BS07], which does not depend on the choice of the $R_i$. For the round complexity, observe that all computations can be performed locally based on knowing $Q'_w$ for all nodes $w \in V$ in each iteration. As $|Q'_v| \in O(n^{1/k}\log n)$ for each iteration and each $v$, the claim follows.

By similar arguments, we can also get an algorithm in the multipass streaming model with comparable guarantees. We remark that, apart from the property of being deterministic, this was already stated in [Bas08] as a simple consequence of [BS07].

Corollary 24 (of [BS07]). For $k \in \mathbb{Z}_{>0}$, in the multipass streaming model a $(2k-1)$-spanner of size $O(kn^{1+1/k}\log n)$ can be deterministically computed in $k$ passes using $O(kn^{1+1/k}\log n)$ space.
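For concreteness, the randomized construction restated at the beginning of this appendix can be sketched as follows (an illustrative Python implementation of the sampling, joining, and edge-removal rules, exercised here for $k = 2$; the function name and data layout are our own):

```python
import random

def baswana_sen(n, edges, k, seed=1):
    """Sketch of the randomized (2k-1)-spanner of Baswana and Sen.
    `edges` maps (u, v) with u < v to a positive weight; returns spanner keys."""
    rng = random.Random(seed)
    adj = {v: {} for v in range(n)}           # surviving (unresolved) edges
    for (u, v), wt in edges.items():
        adj[u][v] = wt
        adj[v][u] = wt
    cluster = {v: v for v in range(n)}        # R_1: singleton clusters, named by center
    key = lambda u, v: (u, v) if u < v else (v, u)
    spanner = set()

    for _ in range(k - 1):
        marked = {c for c in set(cluster.values()) if rng.random() < n ** (-1.0 / k)}
        new_cluster = {v: c for v, c in cluster.items() if c in marked}
        for v in list(cluster):
            if cluster[v] in marked:
                continue
            # Q_v: lightest edge from v to each adjacent cluster (ties by neighbor id)
            best = {}
            for u, wt in adj[v].items():
                c = cluster[u]
                if c != cluster[v] and (c not in best or (wt, u) < best[c]):
                    best[c] = (wt, u)
            marked_best = {c: e for c, e in best.items() if c in marked}
            if not marked_best:               # no marked neighbor: keep Q_v, retire v
                for wt, u in best.values():
                    spanner.add(key(v, u))
                for u in list(adj[v]):
                    del adj[u][v]
                adj[v].clear()
            else:                             # join the cluster of the closest marked u0
                c0 = min(marked_best, key=lambda c: marked_best[c])
                wt0, u0 = marked_best[c0]
                spanner.add(key(v, u0))
                new_cluster[v] = c0
                for c, (wt, u) in best.items():
                    if wt < wt0 and c != c0:  # strictly lighter edges go to the spanner
                        spanner.add(key(v, u))
                    if wt < wt0 or c == c0:   # resolved: drop all edges from v to c
                        for x in [x for x in adj[v] if cluster[x] == c]:
                            del adj[v][x]
                            del adj[x][v]
        cluster = new_cluster

    for v in range(n):                        # final step: lightest edge per cluster
        best = {}
        for u, wt in adj[v].items():
            c = cluster.get(u)
            if c is not None and c != cluster.get(v) and (c not in best or (wt, u) < best[c]):
                best[c] = (wt, u)
        for wt, u in best.values():
            spanner.add(key(v, u))
    return spanner
```

The spanner is always a subgraph of the input, and for $k = 2$ every original edge $\{u,v\}$ is stretched by at most a factor $3$ in the spanner.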
B Primal-Dual Pair
Lemma 25. The dual of $\min\{\|Wx\|_1 : Ax = b\}$ is $\max\{b^Ty : \|W^{-1}A^Ty\|_\infty \le 1\}$. Moreover, strong duality holds.

Proof. Observe that the minimization problem is in fact a linear program:
\[
\min\{\|Wx\|_1 : Ax = b\} = \min\{w^Tx^+ + w^Tx^- : Ax^+ - Ax^- = b,\ x^+, x^- \ge 0\}.
\]
Thus, a dual exists for which strong duality holds. It is given by
\[
\max\{b^Ty : A^Ty \le w,\ -A^Ty \le w\} = \max\{b^Ty : \|W^{-1}A^Ty\|_\infty \le 1\},
\]
which proves the claim.
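As a tiny numeric illustration of the primal-dual pair (a hypothetical three-node path, not taken from the paper), the following Python sketch evaluates both sides for an optimal primal/dual pair:

```python
# Path 0-1-2 with weights w = (1, 2); demand: send one unit from node 0 to node 2.
# Primal: route the unit along both edges, cost ||W x||_1 = 1 + 2 = 3.
# Dual: potentials y = shortest-path distances from node 0, objective b^T y = 3,
# feasible since every arc satisfies |y_v - y_u| / w(a) <= 1.
arcs = [(0, 1), (1, 2)]
w = [1.0, 2.0]
b = [-1.0, 0.0, 1.0]
x = [1.0, 1.0]                  # primal flow
y = [0.0, 1.0, 3.0]             # dual potentials

primal = sum(w[i] * abs(x[i]) for i in range(2))
dual = sum(b[i] * y[i] for i in range(3))
stretch = max(abs(y[v] - y[u]) / w[i] for i, (u, v) in enumerate(arcs))
print(primal, dual, stretch)    # 3.0 3.0 1.0 -> strong duality attained
```

Equality of the two objectives at a feasible pair certifies optimality of both, which is how the paper's algorithms certify their approximation guarantees.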