arXiv:1507.07402v5 [cs.DS] 7 Apr 2017

Partial resampling to approximate covering integer programs∗


Antares Chen†

David G. Harris‡

Aravind Srinivasan§

Abstract

We consider column-sparse positive covering integer programs, which generalize set cover and which have attracted a long line of research developing (randomized) approximation algorithms. We develop a new rounding scheme based on the Partial Resampling variant of the Lovász Local Lemma developed by Harris & Srinivasan (2013). This achieves an approximation ratio of $1 + \frac{\ln(\Delta_1+1)}{a_{\min}} + O\bigl(\sqrt{\frac{\log(\Delta_1+1)}{a_{\min}}}\bigr)$, where $a_{\min}$ is the minimum covering constraint and $\Delta_1$ is the maximum $\ell_1$-norm of any column of the covering matrix (whose entries are scaled to lie in [0, 1]). When there are additional constraints on the sizes of the variables, we show an approximation ratio of $1 + O\bigl(\frac{\ln(\Delta_1+1)}{a_{\min}\,\epsilon} + \sqrt{\frac{\log(\Delta_1+1)}{a_{\min}}}\bigr)$ to satisfy these size constraints up to multiplicative factor $1+\epsilon$, or an approximation ratio of $\ln\Delta_0 + O(\sqrt{\log\Delta_0})$ to satisfy the size constraints exactly (where $\Delta_0$ is the maximum number of non-zero entries in any column of the covering matrix). We also show nearly-matching inapproximability and integrality-gap lower bounds. These results improve asymptotically, in several different ways, over results shown by Srinivasan (2006) and Kolliopoulos & Young (2005). We show also that our algorithm automatically handles multi-criteria programs, efficiently achieving approximation ratios which are essentially equivalent to the single-criterion case and which apply even when the number of criteria is large.

1 Introduction

We consider positive covering integer programs – or simply covering integer programs (CIPs) – defined as follows (with Z+ denoting the set of non-negative integers). We have solution variables x1, . . . , xn ∈ Z+, and for k = 1, . . . , m, a system of m covering constraints of the form:
$$\sum_i A_{ki} x_i \ge a_k$$

Here Ak is an n-long non-negative vector; by scaling, we can assume that Aki ∈ [0, 1] and ak ≥ 1. We can write this more compactly as Ak · x ≥ ak. We may optionally have constraints on the size of the solution variables, namely, that we require xi ∈ {0, 1, . . . , di}; these are referred to as the

A preliminary version of this paper appeared in the proceedings of the Symposium on Discrete Algorithms (SODA) 2016. † University of California, Berkeley, CA 94720. This work was done while a student at Montgomery Blair High School, Silver Spring, MD 20901. Email: [email protected]. ‡ Department of Applied Mathematics, University of Maryland, College Park, MD 20742. Research supported in part by NSF Awards CNS-1010789 and CCF-1422569. Email: [email protected] § Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742. Research supported in part by NSF Awards CNS-1010789 and CCF-1422569, and by a research award from Adobe, Inc. Email: [email protected]


multiplicity constraints. Finally, we have some linear objective function, represented by a vector C ∈ [0, ∞)^n. Our goal is to minimize C · x, subject to the multiplicity and covering constraints. This generalizes the set-cover problem, which can be viewed as a special case in which ak = 1, Aki ∈ {0, 1}.

Solving set cover or integer programs exactly is NP-hard [11], so a common strategy is to obtain a solution which is approximately optimal. There are at least three ways one may obtain an approximate solution, where OPT denotes the optimal solution-value for the given instance:

1. the solution x may violate the optimality constraint, that is, C · x > OPT;
2. x may violate the multiplicity constraint: i.e., xi > di for some i;
3. x may violate the covering constraints: i.e., Ak · x < ak for some k.

These three criteria are in competition. For our purposes, we will demand that our solution x completely satisfies the covering constraints. We will seek to satisfy the multiplicity constraints and optimality constraint as closely as possible. Our emphasis will be on the optimality constraints, that is, we seek to ensure that C · x ≤ β × OPT where β ≥ 1 is "small". The parameter β, in this context, is referred to as the approximation ratio. More precisely, we will derive a randomized algorithm with the goal of satisfying E[C · x] ≤ β × OPT, where the expectation is taken over our algorithm's randomness.

Many approximation algorithms for set cover and its extensions give approximation ratios as a function of m, the total number of constraints: e.g., it is known that the greedy algorithm has approximation ratio (1 − o(1)) ln m [17]. We often prefer a scale-free approximation ratio, that does not depend on the problem size but only on its structural properties. Two cases that are of particular interest are when the matrix A is row-sparse (a bounded number of variables per constraint) or column-sparse (each variable appears in a bounded number of constraints). We will be concerned solely with the column-sparse setting in this paper. The row-sparse setting, which generalizes problems such as vertex cover, typically leads to very different types of algorithms than the column-sparse setting.

Two common parameters used to measure the column sparsity of such systems are the maximum ℓ0 and ℓ1 norms of the columns; that is,
$$\Delta_0 = \max_i \#\{k : A_{ki} > 0\}, \qquad \Delta_1 = \max_i \sum_k A_{ki}$$
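As a small illustration of these definitions, the following sketch (ours, not from the paper; it assumes the constraint matrix is a dense list of lists with entries already scaled to [0, 1]) computes ∆0 and ∆1 for a given matrix A.

```python
def column_sparsity(A):
    """Return (Delta0, Delta1) for a constraint matrix A with entries in [0, 1].

    Delta0 = maximum number of non-zero entries in any column,
    Delta1 = maximum l1-norm of any column.
    """
    m = len(A)          # number of covering constraints
    n = len(A[0])       # number of variables
    delta0 = max(sum(1 for k in range(m) if A[k][i] > 0) for i in range(n))
    delta1 = max(sum(A[k][i] for k in range(m)) for i in range(n))
    return delta0, delta1

# Example: a set-cover-like instance (0/1 entries), so Delta0 == Delta1 here.
A = [[1, 0, 1],
     [1, 1, 0],
     [0, 1, 1]]
print(column_sparsity(A))  # (2, 2)
```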

Since the entries of A are in [0, 1], we have ∆1 ≤ ∆0; it is also possible that ∆1 ≪ ∆0. Approximation algorithms for column-sparse CIPs typically yield approximation ratios which are a function of ∆0 or ∆1, and possibly other problem parameters as well. These algorithms fall into two main classes. First, there are greedy algorithms: they start by setting x = 0, and then increment xi where i is chosen in some way which "looks best" in a myopic way for the residual problem. These were first developed by [3] for set cover, and later analysis (see [6]) showed that they give essentially optimal approximation ratios for set cover. These were extended to CIP in [7] and [4], showing an approximation ratio of 1 + ln ∆0. These greedy algorithms are often powerful, but

they are somewhat rigid. For instance, it is difficult to adapt these algorithms to take multiplicity constraints into account, or to deal with several objective functions.

A more flexible type of approximation algorithm is based on linear relaxation. This replaces the constraint xi ∈ {0, 1, . . . , di} with the weaker constraint xi ∈ [0, di]. The set of feasible points to this linear relaxation is a polytope, and one can find the exact optimal fractional solution x̂. As this is a relaxation, we have C · x̂ ≤ OPT. It thus suffices to turn the solution x̂ into a random integral solution x satisfying E[C · x] ≤ β(C · x̂). Randomized rounding is often employed to transform solutions to the linear relaxation back to feasible integral solutions. The simplest scheme, first applied to this context by [16], is to simply draw xi as independent Bernoulli(αx̂i), for some α > 1. When this is used, simple analysis using Chernoff bounds shows that Ak · x ≥ ak simultaneously for all k when
$$\alpha \ge 1 + c_0\Bigl(\frac{\log m}{a_k} + \sqrt{\frac{\log m}{a_k}}\Bigr),$$
where c0 > 0 is some absolute constant. Thus, the overall solution C · x is within a factor of $1 + O\bigl(\frac{\log m}{a_{\min}} + \sqrt{\frac{\log m}{a_{\min}}}\bigr)$ from the optimum, where $a_{\min} = \min_k a_k \ge 1$.
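For concreteness, here is a minimal sketch (ours, with an illustrative constant c0) of this basic randomized-rounding scheme: scale the fractional solution by α and round each coordinate independently; with the α above, all covering constraints hold with high probability.

```python
import math
import random

def naive_randomized_rounding(x_hat, A, a, c0=2.0, seed=0):
    """Round a fractional solution x_hat by independent Bernoulli(alpha * x_hat[i]) draws.

    alpha is set per the Chernoff-bound analysis, using a_min = min_k a_k;
    c0 is an illustrative constant (the analysis only needs some absolute constant).
    """
    rng = random.Random(seed)
    m, n = len(A), len(x_hat)
    a_min = min(a)
    alpha = 1 + c0 * (math.log(m) / a_min + math.sqrt(math.log(m) / a_min))
    x = [1 if rng.random() < min(1.0, alpha * x_hat[i]) else 0 for i in range(n)]
    satisfied = all(sum(A[k][i] * x[i] for i in range(n)) >= a[k] for k in range(m))
    return x, satisfied
```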

In [18], Srinivasan gave a scale-free method of randomized rounding (ignoring multiplicity constraints), based on the FKG inequality and some proof ideas behind the Lovász Local Lemma. This gave an approximation ratio of $1 + O\bigl(\frac{\log(\Delta_0+1)}{a_{\min}} + \sqrt{\frac{\log(\Delta_0+1)}{a_{\min}}} + \sqrt{\frac{\log a_{\min}}{a_{\min}}}\bigr)$. The rounding scheme, by itself, only gave a positive (exponentially small) probability of achieving the desired approximation ratio. Srinivasan also gave a polynomial-time derandomization using the method of conditional expectations. The algorithm of Srinivasan can potentially cause a large violation in the multiplicity constraints.

In [12], Kolliopoulos & Young modified the algorithm of [18] to trade off between the approximation ratio and the violation of the multiplicity constraints. For a given parameter ǫ ∈ (0, 1], they gave an algorithm which violates each multiplicity constraint "xi ≤ di" to at most "xi ≤ ⌈(1 + ǫ)di⌉", with an approximation ratio of $O\bigl(1 + \frac{\log(\Delta_0+1)}{a_{\min}\cdot\epsilon^2}\bigr)$. Kolliopoulos & Young also gave a second algorithm which exactly meets the multiplicity constraints and achieves approximation ratio O(log(∆0)).

1.1 Our contributions

In this paper, we give a new randomized rounding scheme, based on the partial resampling variant of the LLL developed in [9] and some proof ideas developed in [8] for systems of correlated constraints. We show the following result:

Theorem 1.1. Suppose we are given a covering system with a fractional solution x̂. Let $\gamma = \frac{\ln(\Delta_1+1)}{a_{\min}}$. Then our randomized algorithm yields a solution x ∈ Z^n_+ satisfying the covering constraints with probability one, and with
$$E[x_i] \le \hat{x}_i\bigl(1 + \gamma + 4\sqrt{\gamma}\bigr)$$
The expected running time of this rounding algorithm is O(mn).

This automatically implies that E[C · x] ≤ βC · x̂ ≤ β × OPT for $\beta = 1 + \gamma + 4\sqrt{\gamma}$. Our algorithm has several advantages over previous techniques.

1. We are able to give approximation ratios in terms of ∆1, the maximum ℓ1-norm of the columns of A. Such bounds are always stronger than those phrased in terms of the corresponding ℓ0 norm.

2. When ∆1 is small, our approximation ratio is asymptotically smaller than that of [18]. In particular, we avoid the $\sqrt{\frac{\log a_{\min}}{a_{\min}}}$ term in our approximation ratio.

3. When ∆1 is large, then our approximation ratio is roughly γ; this is asymptotically optimal (including having the correct coefficient), and improves on [18].

4. This algorithm is quite efficient, essentially as fast as reading in the matrix A.

5. The algorithm is oblivious to the objective function — although it achieves a good approximation factor for any objective C, the algorithm itself does not use C in any way.

We find it interesting that one can "boil down" the parameters ∆1, amin into a single parameter γ, which seems to completely determine the behavior of our algorithm.

Our partial resampling algorithm in its simplest form could significantly violate the multiplicity constraints. By choosing slightly different parameters for our algorithm (but making no changes otherwise), we can ensure that the multiplicity constraints are nearly satisfied, at the cost of a worsened approximation ratio:

Theorem 1.2. Suppose we are given a covering system with a fractional solution x̂. Let $\gamma = \frac{\ln(\Delta_1+1)}{a_{\min}}$. For any given ǫ ∈ (0, 1], our algorithm yields a solution x ∈ Z^n_+ satisfying the covering constraints with probability one, and with
$$x_i \le \lceil \hat{x}_i(1+\epsilon)\rceil, \qquad E[x_i] \le \hat{x}_i\bigl(1 + 4\sqrt{\gamma} + 4\gamma/\epsilon\bigr)$$

This is an asymptotic improvement over the approximation ratio of [12], in three different ways:

1. It depends on the ℓ1-norm of the columns, not the ℓ0 norm;
2. When γ is large, it is smaller by a full factor of 1/ǫ;
3. When γ is small, it gives an approximation ratio which approaches 1, at a rate independent of ǫ.

We have a similar algorithm which respects the multiplicity constraints, while giving a different type of asymptotic guarantee:

Theorem 1.3. There is a polynomial-time algorithm which yields a solution x ∈ Z^n_+ satisfying the covering constraints and multiplicity constraints with probability one and satisfies
$$C \cdot x \le \bigl(\ln\Delta_0 + O(\sqrt{\log\Delta_0})\bigr)\mathrm{OPT}$$

This improves over the corresponding approximation ratio of [12], in that it achieves the optimal leading coefficient.

We also give matching lower bounds on the achievable approximation ratios. The formal statements of these results contain numerous qualifiers and technical conditions.

1. When γ is large, then assuming the Exponential-Time Hypothesis, any polynomial-time algorithm to solve the CIP (ignoring multiplicity constraints) must have approximation ratio γ − O(log γ).

2. When γ is large, then assuming P ≠ NP, any polynomial-time algorithm to solve the CIP while respecting the multiplicity constraints within a 1 + ǫ multiplicative factor, must have approximation ratio Ω(γ/ǫ).

3. When γ is small, then the integrality gap of the CIP is 1 + Ω(γ).

Finally, we give an extension to covering programs with multiple linear criteria. Our extension is much simpler than the algorithm of [18]; we show that even conditional on our solution x satisfying all the covering constraints, not only do we have E[Cl · x] ≤ βCl · x̂ but that in fact the values of Cl · x are concentrated, roughly equivalent to the xi being independently distributed as Bernoulli with probability βx̂i. Thus, for each l there is a very high probability that we have Cl · x ≈ Cl · x̂ and in particular there is a good probability that we have Cl · x ≈ Cl · x̂ simultaneously for all l.

Theorem 1.4 (Informal). Suppose we are given a covering system with a fractional solution x̂ and with r objective functions C1, . . . , Cr, whose entries are in [0, 1] and such that Cℓ · x̂ ≥ Ω(log r) for all ℓ = 1, . . . , r. Let $\gamma = \frac{\ln(\Delta_1+1)}{a_{\min}}$. Then our solution x satisfies the covering constraints with probability one; with probability at least 1/2,
$$\forall \ell \quad C_\ell \cdot x \le \beta(C_\ell\cdot\hat{x}) + O\bigl(\sqrt{\beta(C_\ell\cdot\hat{x})\ln r}\bigr)$$

where $\beta = 1 + \gamma + 4\sqrt{\gamma}$. (A similar result is possible, if we also want to ensure that xi ≤ ⌈x̂i(1 + ǫ)⌉; then the approximation ratio is $1 + 4\sqrt{\gamma} + 4\gamma/\epsilon$.)

This significantly improves on [18], in terms of both the approximation ratio as well as the running time. Roughly speaking, the algorithm of [18] gave an approximation ratio of $O\bigl(1 + \frac{\log(1+\Delta_0)}{a_{\min}}\bigr)$ (worse than the approximation ratio in the single-criterion setting) and a running time of $n^{O(\log r)}$ (polynomial time only when r, the number of objective functions, is constant).

1.2 Outline

In Section 2, we develop a randomized rounding algorithm when the fractional solution satisfies x̂ ∈ [0, 1/α)^n; here α ≥ 1 is a key parameter which we will discuss how to select in later sections. This randomized rounding produces a binary solution vector x ∈ {0, 1}^n, for which E[xi] ≈ αx̂i.

In Section 3, we will develop a deterministic quantization scheme to handle fractional solutions of arbitrary size, using the algorithm of Section 2 as a subroutine. We will show an upper bound on the sizes of the variables xi in terms of the fractional x̂i. We will also show an upper bound on E[xi], which we state in a generalized form without making reference to column-sparsity or other properties of the matrix A.

In Section 4, we consider the case in which we have a lower bound amin on the RHS constraint vectors ak, as well as an upper bound ∆1 on the ℓ1-norm of the columns of A. Based on these

values, we set key parameters of the rounding algorithm, to obtain good approximation ratios as a function of amin , ∆1 . These approximation ratios do not respect multiplicity constraints. In Section 5, we extend these results to take into account the multiplicity constraints as well. We give two types of approximation algorithms: in the first, we allow a violation of multiplicity constraints by a multiplicative factor of 1 + ǫ. In the second, we respect the multiplicity constraints exactly. In Section 6, we construct a variety of lower bounds on achievable approximation ratios. These are based on integrality gaps as well as hardness results. These show that the approximation ratios developed in Section 4 are essentially optimal for most values of ∆1 , amin , ǫ, particularly when ∆1 is large. In Section 7, we show that our randomized rounding scheme obeys a negative correlation property, allowing us to show concentration bounds on the sizes of the objective functions Cl · x. This significantly improves on the algorithm of [18]; we show asymptotically better approximation ratios in many regimes, and we also give a polynomial-time algorithm regardless of the number of objective functions.

1.3 Comparison with the Lovász Local Lemma

One type of rounding scheme that has been used for similar types of integer programs is based on the Lov´asz Local Lemma (LLL); we contrast this with our approach taken here. The LLL, first introduced in [5], is often used to show that a rare combinatorial structure can be randomly sampled from a probability space. In the basic form of randomized rounding, one must ensure that the probability of a “bad-event” (an undesirable configuration of a subset of the variables) — namely, that Ak · x < ak — is on the order of 1/m; this ensures that, with high probability, no bad events occur. This accounts for the term log m in the approximation ratio. The power of the LLL comes from the fact that the probability of a bad-event is not compared with the total number of events, but only with the number of events it affects. Thus, one may hope to show approximation ratios which are independent of m. At a heuristic level, the LLL should be applicable to the CIP problem. We have a series of badevents, one for each covering constraint. Furthermore, because of our assumption that the system is column-sparse, each variable only affects a limited number of these bad-events. Thus, it should be possible to use the LLL to obtain a scale-free approximation ratio. There has been prior work applying the LLL to packing integer programs, such as [13]. One technical problem with the LLL is that it only depends on whether bad-events affect each other, not the degree to which they do so. Bad-events which are only slightly correlated are still considered as dependent by the LLL. Thus, a weakness of the LLL for integer programs with arbitrary coefficients (i.e. allowing Aki ∈ [0, 1]), is that potentially all the entries of Aki could be extremely small yet non-zero, causing every constraint to affect each other by a tiny amount. For this reason, typical applications of the LLL to column-sparse integer programs have been phrased in terms of the l0 column norm ∆0 . For packing problems with no constraint-violation allowed, good approximations parametrized by ∆0 , but not in general by ∆1 , are possible [1].


In [10], Harvey addressed this technical problem by applying a careful, multi-step quantization scheme with iterated applications of the LLL, to discrepancy problems with coefficient matrices where the ℓ1 norm of each column and each row is “small”. The LLL, in its classical form, only shows that there is a small probability of avoiding all the bad-events. Thus, it does not lead to efficient algorithms. In [14], Moser & Tardos solved this longstanding problem by introducing a resampling-based algorithm. This algorithm initially samples all random variables from the underlying probability space, and will continue resampling subsets of variables until no more bad-events occur. Most applications of the LLL, such as [10], would yield polynomial-time algorithms using this framework. In the context of integer programming, the Moser-Tardos algorithm can be extended in ways which go beyond the LLL itself. In [9], Harris & Srinivasan described a variant of the Moser-Tardos algorithm based on “partial resampling”. In this scheme, when one encounters a bad-event, one only resamples a random subset of the variables (where the probability distribution on which variables to resample is carefully chosen). This was applied for “assignment-packing” integer programs with small constraint violation. These bounds, like those of [10], depend on ∆1 . It is possible to formulate the CIP problem in the LLL framework, and to view our algorithm as a variant of the Moser-Tardos algorithm. This would achieve qualitatively similar bounds, albeit with asymptotics which are noticeably worse than the ones we give here. In particular, using the LLL directly, one cannot achieve approximation factors of the form 1 + γ when γ → ∞; one obtains instead an approximation ratio of 1 + cγ where c is some constant strictly larger than one. The case when γ → 0 is more complicated and there the LLL-based approaches appear to be asymptotically weaker by super-constant factors. The algorithm we develop has been extensively modified and specialized to the CIP case. In addition, we have taken advantage of two advanced variants of the MT algorithm developed in [9], [8]. These forms of the MT algorithm can be, in general, stronger than the LLL even nonconstructively. For the most part we will discuss our algorithm in a self-contained way, keeping the comparison with the LLL more as informal motivation than technical guide.

2 The RELAXATION algorithm

We first consider the case when all the values of x̂ are small; this turns out to be the critical case for this problem. In this case, we present an algorithm which we label RELAXATION. Initially, this algorithm draws each xi as an independent Bernoulli trial with probability pi = αx̂i, for some parameter α > 1. This will satisfy many of the covering constraints, but there will still be some left unsatisfied. We loop over all such constraints; whenever a constraint k is unsatisfied, we modify the solution as follows: for each variable i which has xi = 0, we set xi to be an independent Bernoulli random variable with probability σAkiαx̂i. Here σ ∈ [0, 1] is another parameter which we will also discuss how to select.

For the remainder of this Section 2, we assume throughout that σ ∈ [0, 1] and α > 1 are given parameters, and that in addition we have x̂i < 1/α for all i ∈ [n]. This assumption will not be stated explicitly in the remainder.


Algorithm 1 The RELAXATION algorithm
1: function RELAXATION(x̂, A, a, σ, α)
2:   for i from 1, . . . , n do                      ⊲ Initialization
3:     xi ∼ Bernoulli(αx̂i)
4:   while A · x ≱ a do                              ⊲ The covering constraints are not all satisfied
5:     Let k be minimal such that Ak · x < ak
6:     for i from 1, . . . , n do
7:       if xi = 0 then
8:         xi ∼ Bernoulli(σAkiαx̂i)
9:   return x
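The following is a minimal executable sketch of Algorithm 1 (ours, not the authors' code), written directly from the pseudocode above; the helper structure and names are our own.

```python
import random

def relaxation(x_hat, A, a, sigma, alpha, seed=0):
    """Sketch of the RELAXATION algorithm: independent Bernoulli(alpha*x_hat[i])
    initialization, then repeated resampling of the zero variables of the first
    unsatisfied covering constraint."""
    rng = random.Random(seed)
    m, n = len(A), len(x_hat)
    x = [1 if rng.random() < alpha * x_hat[i] else 0 for i in range(n)]
    while True:
        # find the minimal unsatisfied constraint, if any
        k = next((k for k in range(m)
                  if sum(A[k][i] * x[i] for i in range(n)) < a[k]), None)
        if k is None:
            return x
        for i in range(n):
            if x[i] == 0:
                # one-step view: flip with probability sigma * A[k][i] * alpha * x_hat[i]
                if rng.random() < sigma * A[k][i] * alpha * x_hat[i]:
                    x[i] = 1
```

The analysis below prefers the equivalent two-part view of line 8: first put each zero variable i into a resampled set Y with probability σAki, then redraw xi ∼ Bernoulli(αx̂i) for each i ∈ Y.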

Note that this algorithm only increments the variables. Hence, when a constraint k is satisfied, it will remain satisfied until the end of the algorithm. Whenever we encounter an unsatisfied constraint k and draw new values for the variables (lines 6–8), we refer to this as resampling the constraint k.

There is an alternative way of looking at the resampling procedure, which seems counterintuitive but will be crucial for our analysis. Instead of setting each variable xi = 1 with probability σAkiαx̂i, we instead select a subset Y ⊆ [n], where each i currently satisfying xi = 0 goes into Y independently with probability σAki. Then, for each variable i ∈ Y, we draw xi ∼ Bernoulli(pi), where pi = αx̂i. It is clear that this two-part sampling procedure is equivalent to the one-step procedure described in Algorithm 1. In this case, we say that Y is the resampled set for constraint k. If i ∈ Y (for any constraint k) we say that variable i is resampled.

For every variable i, we either have xi = 1 at the initial sampling, or xi first becomes equal to one during some resampling of a constraint k, or xi = 0 at the end of the algorithm. If xi = 1 for the first time at the jth resampling of constraint k, we say i turns at (k, j). If xi = 1 initially, we say that i turns at 0.

In the algorithm as we have described, the first step is to set the variables xi as independent Bernoulli with probability pi. Our analysis, following [9] and [8], is based on an inductive argument, in which we consider what occurs when x is set to some arbitrary value. If A · x ≥ a, then the algorithm is already finished. If not, there will be a series of modifications made to x until it terminates. Given any fixed value of x, we will show upper bounds on the probability of certain future events.

Lemma 2.1. Let Z1, . . . , Zj be subsets of [n]. The probability that the first j resampled sets for constraint k are respectively Z1, . . . , Zj is at most $\prod_{l=1}^j f_k(Z_l)$, where we define
$$f_k(Z) = (1-\sigma)^{-a_k}\prod_{i\in[n]}(1-A_{ki}\sigma)\prod_{i\in Z}\frac{(1-p_i)A_{ki}\sigma}{1-A_{ki}\sigma}$$

Proof. For any integer T ≥ 0, any integer j ≥ 0, any sets Z1, . . . , Zj ⊆ [n] and any vector v ∈ {0, 1}^n, we define the following random process and the following event E(T, j, Z1, . . . , Zj, v): Suppose that instead of drawing x ∼ Bernoulli(αx̂i) as in line 3 of RELAXATION, we set x = v, and we continue the remaining steps of the RELAXATION algorithm (lines 4–8) until done. We say that in this process event E(T, j, Z1, . . . , Zj, v) has occurred if:

1. There are fewer than T total resamplings;
2. There are at least j resamplings of constraint k;
3. The first j resampled sets for constraint k are respectively Z1, . . . , Zj.

We claim now that for any Z1, . . . , Zj, and v ∈ {0, 1}^n, and any integer T ≥ 0, we have
$$P(E(T, j, Z_1, \ldots, Z_j, v)) \le \frac{\prod_{l=1}^j f_k(Z_l)}{\prod_{i\in Z_1\cup\cdots\cup Z_j}(1-p_i)} \qquad (1)$$

(Note that pi < 1 by our assumption x̂i < 1/α, and so the RHS of (1) is always well-defined.)

We shall prove (1) by induction on T. For the base case (T = 0) this is trivially true, because E(T, j, Z1, . . . , Zj, v) is impossible (there must be at least 0 resamplings), and so the LHS of (1) is zero while the RHS is non-negative.

We move on to the induction step. If Av ≥ a, then the RELAXATION algorithm performs no resamplings. Thus, if j ≥ 1, then event E(T, j, Z1, . . . , Zj, v) is impossible and again (1) holds. On the other hand, if j = 0, then the RHS of (1) is equal to one, and again this holds vacuously. So we suppose Av ≱ a; let k′ be minimal such that Ak′ · v < ak′. Then the first step of RELAXATION is to resample constraint k′.

We observe that if vi = 1 for any i ∈ Z1 ∪ · · · ∪ Zj, then the event E(T, j, Z1, . . . , Zj, v) is impossible. The reason for this is that we only resample variables which are equal to zero; thus variable i can never be resampled for the remainder of the RELAXATION algorithm. In particular, we will never have i in any resampled set. Thus, as i ∈ Z1 ∪ · · · ∪ Zj, it is impossible for Z1, . . . , Zj to eventually be the resampled sets for constraint k. So if vi = 1 for any i ∈ Z1 ∪ · · · ∪ Zj then (1) holds vacuously.

Let x′ denote the value of the variables after the first resampling (x′ is a random variable). Then we observe that the remaining steps of the RELAXATION algorithm are equivalent to what would have occurred if we had set x = x′ initially.

Now, suppose that k′ ≠ k. Then after the first resampling, the event E(T, j, Z1, . . . , Zj, v) becomes equivalent to the event E(T − 1, j, Z1, . . . , Zj, x′). Thus, in this case, we have
$$P(E(T, j, Z_1, \ldots, Z_j, v)) = P(E(T-1, j, Z_1, \ldots, Z_j, x')) \le \frac{\prod_{l=1}^j f_k(Z_l)}{\prod_{i\in Z_1\cup\cdots\cup Z_j}(1-p_i)} \qquad \text{(induction hypothesis)}$$

and this shows the induction step as desired. (Note that here we are able to bound the probability of the event E(T − 1, j, Z1, . . . , Zj, x′), even though x′ is a random variable instead of a fixed vector, because our induction hypothesis applies to all vectors v ∈ {0, 1}^n.)

Next, suppose that k = k′. In this case, we observe that the following are necessary events for E(T, j, Z1, . . . , Zj, v):

(B1) Y = Z1, where Y is the first resampled set for constraint k′ = k.


(B2) For any i ∈ Z1 ∩ (Z2 ∪ · · · ∪ Zj), in the first resampling step (which includes variable i), we draw xi = 0.

(B3) E(T − 1, j − 1, Z2, Z3, . . . , Zj, x′).

The condition (B2) follows from the observation, made earlier, that E(T − 1, j − 1, Z2, Z3, . . . , Zj, x′) is impossible if x′i = 1 but i ∈ Z2 ∪ · · · ∪ Zj. Any such i ∈ Z1 must be resampled (due to condition (B1)), and it must be resampled to become equal to zero.

Let us first bound the probability of the condition (B1). As we put each i into Y with probability Akiσ independently, the probability that all i ∈ Z1 go into Y is $\prod_{i\in Z_1} A_{ki}\sigma$. By the same token, if vi = 0, then i avoids going into Y with probability 1 − Akiσ. Therefore, the overall probability of selecting Y = Z1 is given by
$$
\begin{aligned}
P(Y = Z_1) &= \prod_{i\in Z_1} A_{ki}\sigma \prod_{\substack{i\notin Z_1\\ v_i = 0}}(1-A_{ki}\sigma)\\
&= \prod_{i\in Z_1} A_{ki}\sigma \Bigl(\prod_{i\notin Z_1}(1-A_{ki}\sigma)\Bigr)\Bigl(\prod_{\substack{i\notin Z_1\\ v_i = 1}}(1-A_{ki}\sigma)^{-1}\Bigr)\\
&= \Bigl(\prod_{i\in[n]}(1-A_{ki}\sigma)\Bigr)\Bigl(\prod_{i:\, v_i = 1}(1-A_{ki}\sigma)^{-1}\Bigr)\prod_{i\in Z_1}\frac{A_{ki}\sigma}{1-A_{ki}\sigma} \qquad (\text{as } v_i = 0 \text{ for all } i\in Z_1)
\end{aligned}
$$
By definition of k′, we have that Ak · v < ak. By Proposition A.1, we thus have:
$$\prod_{i:\, v_i = 1}(1-A_{ki}\sigma)^{-1} \le (1-\sigma)^{-a_k}$$
further implying:
$$P(Y = Z_1) \le (1-\sigma)^{-a_k}\prod_{i\in[n]}(1-A_{ki}\sigma)\prod_{i\in Z_1}\frac{A_{ki}\sigma}{1-A_{ki}\sigma} \qquad (2)$$

Next, let us consider the probability of (B2), conditional on (B1). Each i ∈ Y is drawn independently as Bernoulli-pi; thus, the total probability of event (B2), conditional on (B1), is at most $\prod_{i\in Z_1\cap(Z_2\cup\cdots\cup Z_j)}(1-p_i)$.

Finally, let us consider the probability of event (B3), conditional on (B1) and (B2). Observe that the event E(T − 1, j − 1, Z2, Z3, . . . , Zj, x′) is conditionally independent of events (B1) and (B2), given x′. By the law of total probability, we have
$$
\begin{aligned}
P(E(T-1, j-1, Z_2, Z_3, \ldots, Z_j, x') \mid (B1), (B2)) &= \sum_{v'\in\{0,1\}^n} P(E(T-1, j-1, Z_2, Z_3, \ldots, Z_j, v'))\,P(x' = v')\\
&\le \sum_{v'\in\{0,1\}^n} \frac{\prod_{l=2}^j f_k(Z_l)}{\prod_{i\in Z_2\cup\cdots\cup Z_j}(1-p_i)}\,P(x' = v') \qquad \text{(induction hypothesis)}\\
&= \frac{\prod_{l=2}^j f_k(Z_l)}{\prod_{i\in Z_2\cup\cdots\cup Z_j}(1-p_i)}
\end{aligned}
$$

Thus, as (B1), (B2), and (B3) are necessary conditions for E(T, j, Z1, . . . , Zj, v), we have
$$
\begin{aligned}
P(E(T, j, Z_1, \ldots, Z_j, v)) &\le (1-\sigma)^{-a_k}\prod_{i\in[n]}(1-A_{ki}\sigma)\prod_{i\in Z_1}\frac{A_{ki}\sigma}{1-A_{ki}\sigma} \times \prod_{i\in Z_1\cap(Z_2\cup\cdots\cup Z_j)}(1-p_i) \times \frac{\prod_{l=2}^j f_k(Z_l)}{\prod_{i\in Z_2\cup\cdots\cup Z_j}(1-p_i)}\\
&= (1-\sigma)^{-a_k}\prod_{i\in[n]}(1-A_{ki}\sigma)\prod_{i\in Z_1}\frac{(1-p_i)A_{ki}\sigma}{1-A_{ki}\sigma} \times \frac{\prod_{l=2}^j f_k(Z_l)}{\prod_{i\in Z_1\cup\cdots\cup Z_j}(1-p_i)}\\
&= f_k(Z_1) \times \frac{\prod_{l=2}^j f_k(Z_l)}{\prod_{i\in Z_1\cup\cdots\cup Z_j}(1-p_i)} = \frac{\prod_{l=1}^j f_k(Z_l)}{\prod_{i\in Z_1\cup\cdots\cup Z_j}(1-p_i)}
\end{aligned}
$$
and the induction claim again holds.

Thus, we have shown that (1) holds for any integer T ≥ 0 and any Z1, . . . , Zj, and v ∈ {0, 1}^n.

Next, for any sets Z1, . . . , Zj and any v ∈ {0, 1}^n, let us define the event E(j, Z1, . . . , Zj, v) to be the event that, if we start the RELAXATION algorithm with x = v, then the first j resampled sets for constraint k are respectively Z1, . . . , Zj; we make no condition on the total number of resamplings. Observe that we have the increasing chain
$$E(0, j, Z_1, \ldots, Z_j, v) \subseteq E(1, j, Z_1, \ldots, Z_j, v) \subseteq E(2, j, Z_1, \ldots, Z_j, v) \subseteq \cdots$$
and $E(j, Z_1, \ldots, Z_j, v) = \bigcup_{T=0}^{\infty} E(T, j, Z_1, \ldots, Z_j, v)$. By countable additivity of the probability measure, we have:
$$P(E(j, Z_1, \ldots, Z_j, v)) = \lim_{T\to\infty} P(E(T, j, Z_1, \ldots, Z_j, v)) \le \lim_{T\to\infty} \frac{\prod_{l=1}^j f_k(Z_l)}{\prod_{i\in Z_1\cup\cdots\cup Z_j}(1-p_i)} = \frac{\prod_{l=1}^j f_k(Z_l)}{\prod_{i\in Z_1\cup\cdots\cup Z_j}(1-p_i)}$$
where the inequality is by (1).

So far, we have computed the probability of having Z1, . . . , Zj be the first j resampled sets for constraint k, given that x is fixed to an arbitrary initial value v. We now can compute the probability that Z1, . . . , Zj are the first j resampled sets for constraint k given that x is drawn as independent Bernoulli-pi.

In the first step of the RELAXATION algorithm, we claim that a necessary event for Z1, . . . , Zj to be the first j resampled sets is to have xi = 0 for each i ∈ Z1 ∪ · · · ∪ Zj; the rationale for this

is equivalent to that for (B2). This event has probability $\prod_{i\in Z_1\cup\cdots\cup Z_j}(1-p_i)$. We must then have the event E(j, Z1, . . . , Zj, x) occur. The probability of E(j, Z1, . . . , Zj, x), conditional on xi = 0 for all i ∈ Z1 ∪ · · · ∪ Zj, is at most $\frac{\prod_{l=1}^j f_k(Z_l)}{\prod_{i\in Z_1\cup\cdots\cup Z_j}(1-p_i)}$ (by a similar argument to that of computing the probability of (B3) conditional on (B1), (B2)). Thus, the overall probability that the first j resampled sets for constraint k are Z1, . . . , Zj is at most
$$P(Z_1, \ldots, Z_j \text{ are the first } j \text{ resampled sets for constraint } k) \le \prod_{i\in Z_1\cup\cdots\cup Z_j}(1-p_i) \times \frac{\prod_{l=1}^j f_k(Z_l)}{\prod_{i\in Z_1\cup\cdots\cup Z_j}(1-p_i)} = \prod_{l=1}^j f_k(Z_l)$$
as desired.

We next compute $\sum_{Z\subseteq[n]} f_k(Z)$; such sums will recur in our calculations.

Proposition 2.2. Suppose $\alpha > \frac{-\ln(1-\sigma)}{\sigma}$. For any constraint k define
$$s_k = (1-\sigma)^{-a_k} e^{-\sigma\alpha A_k\cdot\hat{x}}$$
Then for all k = 1, . . . , m we have $\sum_{Z\subseteq[n]} f_k(Z) \le s_k < 1$.

Z⊆[n]

fk (Z) =

X

(1 − σ)−ak

Z⊆[n]

= (1 − σ)−ak = (1 − σ)−ak = (1 − σ)−ak

Y

(1 − Aki σ)

i∈[n]

Y (1 − pi )Aki σ 1 − Aki σ

i∈Z

Y

X Y (1 − pi )Aki σ (1 − Aki σ) 1 − Aki σ

Y

(1 − Aki σ)

i∈[n]

i∈[n]

Y

Z⊆[n] i∈Z n  Y

1+

i=1

(1 − pi )Aki σ  1 − Aki σ

(1 − Aki pi σ)

i∈[n] P −ak −σ i Aki pi

≤ (1 − σ)

e

−ak −σαAk ·ˆ x

= (1 − σ)

e

Also, noting that x ˆ satisfies the covering constraints (i.e., Ak · x ˆ ≥ ak ), we have that sk = (1 − σ)−ak e−σαAk ·ˆx < (1 − σ)−ak e−σak

− ln(1−σ) σ

Proposition 2.3. For any constraint k and any i ∈ [n], we have X fk (Z) ≤ sk Aki σ Z⊆[n],Z∋i

12

=1

=

j Y l=1

fk (Zl )

Proof. We have: X

Z⊆[n] Z∋i

fk (Z) =

X

(1 − σ)−ak

Z⊆[n] Z∋i

= (1 − σ)−ak

Y

(1 − Akl σ)

l∈[n]

(1 − pi )Aki σ 1 − Aki σ

= (1 − σ)−ak (1 − pi )Aki σ = (1 − σ)−ak (1 − pi )Aki σ

Y (1 − pl )Akl σ 1 − Akl σ

l∈Z

Y

(1 − Akl σ)

Y

(1 − Akl σ)

l∈[n]−{i}

l∈[n]−{i}

Y

X

Y

Z⊆[n],Z∋i l∈Z−{i}

Y



l∈[n]−{i}

1+

(1 − pl )Akl σ 1 − Akl σ

(1 − pl )Akl σ  1 − Akl σ

(1 − Akl pl σ)

l∈[n]−{i} P − l∈[n]−{i} Akl pl σ

≤ (1 − σ)−ak (1 − pi )Aki σe

≤ (1 − σ)−ak (1 − pi )Aki σe−σ(Ak ·p−Aki pˆi )

= (1 − σ)−ak (1 − pi )Aki σeσAki pi e−σα(Ak ·ˆx)

= sk (1 − pi )Aki σeσAki pi

Now note that Aki ≤ 1, σ ≤ 1 and hence (1 − pi )eσAki pi ≤ 1. Thus we have that X fk (Z) ≤ sk Aki σ Z⊆[n] Z∋i

To gain some intuition about this expression sk , note that if we set σ = 1 − 1/α (which is not necessarily the optimal choice for the overall algorithm), then we have sk = αak e−Ak ·ˆx(α−1) and this can be recognized as the Chernoff lower-tail bound. Namely, this is an upper bound on the probability that a sum of independent [0, 1]-random variables, with mean αAk · x ˆ, will become as small as ak . This makes sense: for example at the very first step of the algorithm (before any resamplings are performed), then Ak · x is precisely a sum of independent Bernoulli variables with mean αAk · x ˆ. The event we are measuring (the probability that a constraint k is resampled) is precisely the event that this sum is smaller than ak . Proposition 2.4 gives a bound on what is effectively the running time of the algorithm: . Then the expected Proposition 2.4. Suppose Ak · x ˆ ≥ ak for all k = 1, . . . , m and α > − ln(1−σ) σ P 1 number of resamplings steps made by the algorithm RELAXATION is at most k eσαAk ·ˆx (1−σ) . ak −1 Proof. Consider the probability that there are ≥ l resamplings of constraint k. A necessary condition for this to occur is that there are sets Z1 , . . . , Zl such that Z1 , . . . , Zl are respectively the first

13

l resampled sets for constraint k. Taking a union-bound over Z1 , . . . , Zl , we have: X P (≥ l resamplings of constraint k) ≤ P (Z1 , . . . , Zl are first resampled sets for constraint k) Z1 ,...,Zl ⊆[n]



X

fk (Z1 ) . . . fk (Zl )

(by Lemma 2.1)

Z1 ,...,Zl ⊆[n]

=(

X

fk (Z))l

Z⊆[n]

≤ slk

(by Proposition 2.2)

Thus, the expected number of resamplings of constraint k is at most ∞ X

slk =

l=1

=

1 1/sk − 1

as sk < 1

1

(1 −

σ)ak e−σαAk ·ˆx

We also give a crucial bound on the distribution of the variables xi at the end of the resampling process. Theorem 2.5. Suppose Ak · x ˆ ≥ ak for all k = 1, . . . , m and α >

− ln(1−σ) . σ

Then for any i ∈ [n], the probability that xi = 1 at the conclusion of RELAXATION algorithm is at most   X Aki P (xi = 1) ≤ αˆ xi 1 + σ eσαAk ·ˆx (1 − σ)ak − 1 k

Proof. There are two possible ways to have xi = 1: either i turns at 0 or it turns at (k, j) for some k ∈ [m], j ≥ 1. The former event has probability pi . Suppose that i turns at (k, j). In this case, there must be sets Z1 , . . . , Zj such that: (C1) The first j resampled sets for constraint k are respectively Z1 , . . . , Zj (C2) i ∈ Zj (C3) During the jth resampling of constraint k, we set xi = 1. Now, observe that the probability of (C3), conditional on (C1), (C2), is pi . The reason for this is that event (C3) occurs after (C1), (C2) are already determined. Thus, we can use time-stochasticity to compute the conditional probability. For any fixed k ∈ [m] and any fixed sets Z1 , . . . , Zj , the probability that Z1 , . . . , Zj are the first j resampled sets is at most fk (Z1 ) · · · fk (Zj ) by Lemma 2.1 14

Thus, in total, the probability that events (C1)–(C3) hold for a fixed Z1 , . . . , Zj where i ∈ Zj , is at most pi fk (Z1 ) · · · fk (Zj ). We now take a union bound over all k ∈ [m] and all integers j ≥ 1 and all sets Z1 , . . . , Zj ⊆ [n] with i ∈ Zj . This gives: P (xi = 1) ≤ pi +

m X ∞ X

X

pi fk (Z1 ) . . . fk (Zj )

k=1 j=1 Z1 ,...,Zj ⊆[n] i∈Zj m X X

fk (Z)

= pi (1 +

m X

X

fk (Z)

= pi (1 +

X

fk (Z)

= pi (1 +

k=1 m X

P

∞ X

X

fk (Z1 ) . . . fk (Zj−1 )

∞ X

skj−1 )

by Proposition 2.2

j=1

k=1 Z⊆[n],Z∋i

m X

fk (Z1 ) . . . fk (Zj−1 )

j=1 Z1 ,...,Zj−1 ⊆[n]

k=1 Z⊆[n],Z∋i

m X

X

j=1 Z1 ,...,Zj ⊆[n] Zj =Z

k=1 Z⊆[n],Z∋i

= pi (1 +

∞ X

Z⊆[n],Z∋i fk (Z)

1 − sk

)

as sk < 1

sk Aki σ ) (by Proposition 2.3) 1 − sk k=1   X Aki = αˆ xi 1 + σ eσαAk ·ˆx (1 − σ)ak − 1 ≤ pi (1 +

k

3 Extension to the Case Where x̂i is Large

3.1 Overview

In the previous section, we described the RELAXATION algorithm under the assumption that x̂i < 1/α for all i. This assumption was necessary because each variable i is chosen to be drawn as a Bernoulli random variable with probability pi = αx̂i. In this section, we give a rounding scheme to cover fractional solutions x̂ of unbounded size. We first give an overview of this process.

Our goal is to extend the approximation ratio $\rho_i = \alpha\bigl(1 + \sigma\sum_k \frac{A_{ki}}{e^{\sigma\alpha A_k\cdot\hat{x}}(1-\sigma)^{a_k}-1}\bigr)$ of Section 2. First, note that if we have a variable i, and a solution to the LP with fractional value x̂i, we can sub-divide it into two new variables y1, y2 with fractional values ŷ1, ŷ2 such that ŷ1 + ŷ2 = x̂i. Now, whenever the variable xi appears in the covering system, we replace it by y1 + y2. This process of sub-dividing variables can force all the entries in the fractional solution to be arbitrarily small. We can run the RELAXATION algorithm on this subdivided fractional solution, obtaining an integral solution y1, y2 and hence xi = y1 + y2. Observe that the approximation ratios for the two new variables are both equal to ρi itself. Thus E[xi] = E[y1 + y2] ≤ ρiŷ1 + ρiŷ2 = ρix̂i.

By subdividing the fractional solution, we can always ensure that we obtain the same approximation for the general case (in which x ˆ is unbounded) as in the case in which x ˆ is restricted to entries of bounded size. However, this may violate the multiplicity constraints: in general, if we subdivide a fractional solution x ˆi into yˆ1 , . . . , yˆl , and then set xi = y1 + · · · + yl , then xi could become as large as l. There is another, simpler way to deal with large values x ˆi : for any variable with x ˆi ≥ 1/α, simply set xi = 1. Then, we certainly are guaranteed that E[xi ] ≤ αˆ xi ≤ ρi x ˆi . Let us see what problems this procedure might cause. Consider some variable i with x ˆi = r ≥ 1/α.1 Because we have fixed xi = 1, we may remove this variable from the covering system. When we do so, we obtain a residual problem A′ , a′ , in which the ith column of A is replaced by zero and all the RHS vectors ak are replaced by a′k = ak − Aki . Suppose that variable i appears in constraint k with another variable i′ with Aki = 1. We want to Aki′ bound E[x′i ] in terms of βi′ ; to do so, we want to show that constraint k contributes eσαAk ·ˆx (1−σ) ak −1

to E[x′i ]. Now, in the residual problem, we replace ak with ak −1 and we replace Ak · x ˆ with Ak · x ˆ −r. ′ Thus, constraint k contributes the following to βi′ : Contribution =

Aki′ Aki′ = eσα(Ak ·ˆx−r) (1 − σ)ak −1 − 1 eσα(Ak ·ˆx) (1 − σ)ak e−σαr (1 − σ)−1 − 1

, then this is larger than the original contribution term we wanted Observe that if r > − ln(1−σ) ασ Aki′ to show, namely ρi = eσαAk ·ˆx (1−σ)ak . Thus, there is a critical cut-off value θ = − ln(1−σ) ; when ασ x ˆi > θ, then forcing xi = 1 gives a good approximation ratio for variable i but may have a worse approximation ratio for other variables which interact with it. We can now combine these two methods for handling large entries of x ˆi . For any variable i, we first subdivide variable i into multiple variables yˆ1 , . . . yˆl with fractional value θ, along with one further entry yˆl+1 ∈ [0, θ]. We immediately set y1 , . . . , yl = 1. If yˆl+1 ≥ 1/α, we set yl+1 = 1 as well, otherwise we will apply the RELAXATION algorithm for it. At the end of this procedure, we know y1 + · · · + yˆl ) + ρi yˆl+1 ≤ that xi = y1 + · · · + yl+1 ≤ (l + 1) = ⌈ xˆθi ⌉. We also know that E[xi ] ≤ α(ˆ ρi (ˆ y1 + · · · + yˆl+1 ) = ρi x ˆi . Thus, we get a good approximation ratio and a good bound on the multiplicity of xi .

3.2 The ROUNDING algorithm

For each variable i, let vi = ⌊x̂i/θ⌋, where we define
$$\theta = \frac{-\ln(1-\sigma)}{\alpha\sigma}$$
We define Fi = x̂i − viθ, which we can write as Fi = x̂i mod θ. We also define:
$$\hat{x}'_i = \begin{cases} F_i & \text{if } F_i < 1/\alpha\\ 0 & \text{if } F_i \ge 1/\alpha\end{cases}, \qquad G_i = \begin{cases} 0 & \text{if } F_i < 1/\alpha\\ 1 & \text{if } F_i \ge 1/\alpha\end{cases}$$

To gain intuition, the reader may consider the case in which r > 1. In this case, it is obvious that this is a bad rounding procedure. It is instructive to trace through exactly why it fails badly.


We form the residual problem $a'_k = a_k - \sum_i A_{ki}(G_i + v_i)$. We then run the RELAXATION algorithm on the residual problem, which satisfies the condition that x̂′ ∈ [0, 1/α)^n. This is summarized in Algorithm 2.

Algorithm 2 The ROUNDING algorithm
function ROUNDING(x̂, A, a, σ, α)
  Set θ = −ln(1 − σ)/(ασ)
  for i from 1, . . . , n do
    vi = ⌊x̂i/θ⌋
    Fi = x̂i mod θ
    Gi = 1 if Fi ≥ 1/α, Gi = 0 otherwise
    x̂′i = 0 if Fi ≥ 1/α, x̂′i = Fi otherwise
  for k from 1, . . . , m do
    Set a′k = ak − Σi Aki(Gi + vi)
  Compute x′ = RELAXATION(x̂′, A, a′, σ, α)
  Return x = G + v + x′
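A compact executable sketch of Algorithm 2 (ours; it reuses the `relaxation` sketch given earlier and assumes the same dense-matrix representation):

```python
import math

def rounding(x_hat, A, a, sigma, alpha, seed=0):
    """Sketch of the ROUNDING algorithm: quantize large coordinates of x_hat,
    fix the quantized part to integers, and run RELAXATION on the residual."""
    m, n = len(A), len(x_hat)
    theta = -math.log(1 - sigma) / (alpha * sigma)
    v = [int(x_hat[i] // theta) for i in range(n)]
    F = [x_hat[i] - v[i] * theta for i in range(n)]          # x_hat[i] mod theta
    G = [1 if F[i] >= 1 / alpha else 0 for i in range(n)]
    x_hat_res = [0.0 if F[i] >= 1 / alpha else F[i] for i in range(n)]
    a_res = [a[k] - sum(A[k][i] * (G[i] + v[i]) for i in range(n)) for k in range(m)]
    x_res = relaxation(x_hat_res, A, a_res, sigma, alpha, seed=seed)
    return [G[i] + v[i] + x_res[i] for i in range(n)]
```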

We begin by showing a variety of simple bounds on the variables before and after the quantization steps.

Proposition 3.1. Suppose that x′ ∈ {0, 1}^n satisfies the residual covering constraints, that is, Ak · x′ ≥ a′k for all k = 1, . . . , m. Then the solution vector returned by the ROUNDING algorithm, defined by x = G + v + x′, satisfies the original covering constraints. Namely, Ak · x ≥ ak for all k = 1, . . . , m.

Proof. For each k we have:
$$A_k\cdot x = A_k\cdot(x' + G + v) = A_k\cdot x' + \sum_i A_{ki}(G_i + v_i) \ge a'_k + \sum_i A_{ki}(G_i + v_i) = a_k$$

i

Proposition 3.2. For any i ∈ [n] we have x ˆi − vi θ − Gi θ ≤ x ˆ′i ≤ x ˆi − vi θ − Gi /α Proof. If Gi = 0, then both of the bounds hold with equality. So suppose Gi = 1. In this case, we have 1/α ≤ x ˆi − vi θ ≤ θ. So xi − vi θ − Gi /α ≥ θ − 1/α ≥ 0 and xi − vi θ − Gi θ ≤ θ − θ = 0 as required. Proposition 3.3. For any i, at the end of the procedure ROUNDING, we have   ασ xi ≤ x ˆi − ln(1 − σ)

17

Proof. Note that 1/θ =

ασ − ln(1−σ) .

So we must show that xi ≤ ⌈ˆ xi /θ⌉.

First, suppose x ˆi is not a multiple of θ. Then xi = x′i + Gi + ⌊xi /θ⌋. Note that if Gi = 1, then ′ x ˆi = 0 which implies that x′i = 0. So Gi + vi ≤ 1 and hence xi ≤ 1 + ⌊xi /θ⌋ = ⌈xi /θ⌉. Next, suppose x ˆi is a multiple of θ. Then Gi = x ˆ′i = 0 and so x′i = 0 and we have xi = ⌊xi /θ⌋ = ⌈xi /θ⌉. The next result shows that the quantization steps can only decrease the inflation factor for the RELAXATION algorithm. Proposition 3.4 is the reason for our choice of θ. Proposition 3.4. For any constraint k, we have ′



(1 − σ)ak eσαAk ·ˆx ≥ (1 − σ)ak eσαAk ·ˆx Proof. Let r =

P

i

Aki (Gi + vi ). By definition, we have a′k = ak − r. We also have: X Ak · x ˆ′ = Aki x ˆ′i i



X i

Aki (ˆ xi − vi θ − Gi θ)

(by Proposition 3.2)

= ak − rθ Then ′



(1 − σ)ak eσαAk ·ˆx = (1 − σ)ak −r eσαAk ·ˆx



≥ (1 − σ)ak −r eσα(ak −rθ)

= (1 − σ)−ak e−σαak × ((1 − σ)eθσα )−r

= (1 − σ)−ak e−σαak

We can now show an overall bound on the behavior of the ROUNDING algorithm Theorem 3.5. Suppose that α >

− ln(1−σ) . σ

Then at the end of the ROUNDING algorithm, we have for each variable i   X Aki E[xi ] ≤ αˆ xi 1 + σ eσαak (1 − σ)ak − 1 k

The expected number of resamplings for the RELAXATION algorithm is at most Proof. Define T =1+σ

X k

eσαak (1 18

Aki − σ)ak − 1

P

1 k eσαak (1−σ)ak −1 .

By Theorem 2.5, the probability that x′i = 1 is at most  X x′i 1 + σ P (x′i = 1) ≤ αˆ k



αˆ x′i T

 Aki (1 − σ) eσαAk ·ˆx′ − 1 a′k

(by Proposition 3.4)

So we estimate E[xi ] by: E[xi ] = vi + Gi + E[x′i ] ≤ vi + Gi + αˆ x′i T

≤ vi + Gi + α(ˆ xi − θvi − Gi /α)T ≤ vi (1 − αθ) + αˆ xi T

≤ αˆ xi T

as αθ =

(by Proposition 3.2)

− ln(1 − σ) ≥1 σ

This shows the bound on E[xi ]. The bound on the expected number of resamplings is similar.

4

Bounds in terms of amin , ∆1

So far, we have given bounds on the behavior of the ROUNDING algorithm which are as general as possible. Theorem 3.5 can be applied to systems which have multiple types of variables and constraints. However, we can obtain a simpler bound by reducing these to two simple parameters, namely ∆1, the maximum ℓ1-norm of any column of A, and amin = mink ak. We will first assume that amin ≥ 1, ∆1 ≥ 1. Later, Theorem 4.2 will show that we can always ensure that this holds with a simple pre-processing step.

Theorem 4.1. Suppose we are given a covering system with ∆1 ≥ 1, amin ≥ 1 and with a fractional solution x̂. Let $\gamma = \frac{\ln(\Delta_1+1)}{a_{\min}}$. Then with appropriate choices of σ, α we may run the ROUNDING algorithm on this system to obtain a solution x ∈ Z^n_+ satisfying
$$E[x_i] \le \hat{x}_i\bigl(1+\gamma+4\sqrt{\gamma}\bigr), \qquad x_i \le \left\lceil \hat{x}_i\,\frac{\tfrac12\gamma + \sqrt{\gamma}}{\ln(1+\sqrt{\gamma})}\right\rceil \text{ with probability one}$$
The expected running time of this algorithm is O(mn).

Proof. We set σ = 1 − 1/α, where α > 1 is a parameter to be determined. Now note that we have $\frac{-\ln(1-\sigma)}{\sigma} = \frac{\alpha\ln\alpha}{\alpha-1} < \alpha$.


So we may apply Theorem 3.5; for each i ∈ [n] we have:
$$
\begin{aligned}
E[x_i] &\le \hat{x}_i\,\alpha\Bigl(1 + \sigma\sum_k \frac{A_{ki}}{(1-\sigma)^{a_k}e^{\sigma\alpha a_k}-1}\Bigr)
= \hat{x}_i\,\alpha\Bigl(1 + (1-1/\alpha)\sum_k \frac{A_{ki}}{e^{a_k(\alpha-1)}\alpha^{-a_k}-1}\Bigr)\\
&\le \hat{x}_i\,\alpha\Bigl(1 + (1-1/\alpha)\sum_k \frac{A_{ki}}{e^{a_{\min}(\alpha-1)}\alpha^{-a_{\min}}-1}\Bigr)
\le \hat{x}_i\Bigl(\alpha + (\alpha-1)\frac{\Delta_1}{e^{a_{\min}(\alpha-1)}\alpha^{-a_{\min}}-1}\Bigr)
\end{aligned}
$$
Now substituting $\alpha = 1+\gamma+2\sqrt{\gamma} > 1$ and $a_{\min} = \ln(\Delta_1+1)/\gamma$ gives
$$E[x_i] \le \hat{x}_i\Bigl(1+\gamma+2\sqrt{\gamma} + \frac{(2\sqrt{\gamma}+\gamma)\Delta_1}{(\Delta_1+1)^{\frac{2\sqrt{\gamma}+\gamma-2\ln(1+\sqrt{\gamma})}{\gamma}}-1}\Bigr)$$
Proposition A.2 shows that this is a decreasing function of ∆1. We are assuming amin ≥ 1, which implies that ∆1 ≥ e^γ − 1. We can thus obtain an upper bound by substituting ∆1 = e^γ − 1, yielding
$$E[x_i] \le \hat{x}_i\Bigl(1+\gamma+2\sqrt{\gamma} + \frac{(e^{\gamma}-1)(\gamma+2\sqrt{\gamma})}{\frac{e^{\gamma+2\sqrt{\gamma}}}{(\sqrt{\gamma}+1)^2}-1}\Bigr) \qquad (3)$$
Some simple analysis of the RHS of (3) shows that we have
$$E[x_i] \le \hat{x}_i(1+\gamma+4\sqrt{\gamma})$$
To show the bound on the size of xi, we apply Proposition 3.3, giving us
$$x_i \le \left\lceil \hat{x}_i\,\frac{\alpha\sigma}{-\ln(1-\sigma)}\right\rceil = \left\lceil \hat{x}_i\,\frac{\tfrac12\gamma+\sqrt{\gamma}}{\ln(1+\sqrt{\gamma})}\right\rceil$$
Next, we will analyze the runtime of this procedure. The initial steps of rounding and forming the residual can be done in time O(mn). By Theorem 3.5, the expected number of resampling steps made by the RELAXATION algorithm is at most
$$E[\text{Resampling Steps}] \le \sum_k \frac{1}{e^{a_k(\alpha-1)}\alpha^{-a_k}-1} \le \frac{m}{e^{a_{\min}(\alpha-1)}\alpha^{-a_{\min}}-1} \le \frac{m}{(\Delta_1+1)^{\frac{\gamma+2\sqrt{\gamma}-2\ln(1+\sqrt{\gamma})}{\gamma}}-1} \le \frac{m}{\Delta_1} \le m$$
In each resampling step, we must draw a new random value for all the variables; this can be easily done in time O(n). The algorithm in its entirety is bounded by O(mn) as required.
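To make the parameter choices in Theorem 4.1 concrete, here is a small helper (ours, a sketch of the arithmetic only) that computes them from ∆1 and amin; β is the resulting bound on E[xi]/x̂i.

```python
import math

def theorem41_parameters(delta1, a_min):
    """Parameter choices used by Theorem 4.1.

    gamma = ln(Delta_1 + 1) / a_min, alpha = 1 + gamma + 2*sqrt(gamma),
    sigma = 1 - 1/alpha, and beta = 1 + gamma + 4*sqrt(gamma) bounds E[x_i]/x_hat_i.
    """
    gamma = math.log(delta1 + 1) / a_min
    alpha = 1 + gamma + 2 * math.sqrt(gamma)
    sigma = 1 - 1 / alpha
    theta = -math.log(1 - sigma) / (alpha * sigma)   # quantization step of ROUNDING
    beta = 1 + gamma + 4 * math.sqrt(gamma)
    return {"gamma": gamma, "alpha": alpha, "sigma": sigma, "theta": theta, "beta": beta}

print(theorem41_parameters(delta1=10.0, a_min=1.0))
```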

We now show how to ensure that amin ≥ 1 and ∆1 ≥ 1:

Theorem 4.2. Suppose we are given a covering system A, a with γ = ln(∆1 + 1)/amin. Then, in time O(mn), one can produce a modified system A′, a′ which satisfies the following properties:

1. The integral solutions of A, a are precisely the same as the integral solutions of A′, a′;
2. a′min ≥ 1 and ∆′1 ≥ 1;
3. We have γ′ ≤ γ, where γ′ = ln(∆′1 + 1)/a′min.

Proof. First, suppose that there is some entry Aki with Aki > ak. In this case, set A′ki = ak. Observe that any integral solution to the constraint Ak · x ≥ ak also satisfies A′k · x ≥ ak, and vice-versa. This step can only decrease ∆1 and hence γ′ ≤ γ. After this step, one can assume that Aki ≤ ak for all k, i.

Now suppose there are some constraints with ak ≤ 1. In this case, replace row Ak with A′k = Ak/ak and replace ak with a′k = 1. Because of our assumption that Aki ≤ ak for all k, i, the new row of the matrix still satisfies A′k ∈ [0, 1]^n. This step ensures that a′k ≥ 1 for all k. Also, every column in the matrix is scaled up by at most 1/ak ≤ 1/amin, so we have ∆′1 ≤ ∆1/amin and a′min = 1. We then have
$$\gamma' = \ln(\Delta'_1+1)/a'_{\min} = \ln(\Delta_1/a_{\min}+1) \le \frac{\ln(\Delta_1+1)}{a_{\min}} = \gamma.$$

Finally, suppose that ∆1 ≤ 1. In this case, observe that we must have Aki ≤ ∆1 for all k, i. Thus, we can scale up both A, a by 1/∆1 to obtain A′ = A/∆1, a′ = a/∆1. This gives ∆′1 = 1, a′min = amin/∆1, and
$$\gamma' = \frac{\ln(1+1)}{a_{\min}/\Delta_1} \le \frac{\ln(\Delta_1+1)}{a_{\min}} = \gamma$$

Corollary 4.3. Let $\gamma = \frac{\ln(\Delta_1+1)}{a_{\min}}$. In polynomial time, one may obtain a solution x ∈ Z^n_+ satisfying
$$C\cdot x \le \bigl(1+\gamma+O(\sqrt{\gamma})\bigr)\mathrm{OPT}$$
where OPT is the optimal integral solution to the original CIP.
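A direct translation of this pre-processing into code (our sketch; it copies the instance and applies the three steps of Theorem 4.2 in order):

```python
def preprocess(A, a):
    """Normalize a CIP so that a_min >= 1 and Delta_1 >= 1 (Theorem 4.2 sketch)."""
    m, n = len(A), len(A[0])
    A = [row[:] for row in A]
    a = a[:]
    # Step 1: clip entries larger than the right-hand side.
    for k in range(m):
        for i in range(n):
            A[k][i] = min(A[k][i], a[k])
    # Step 2: rescale constraints with a_k <= 1 so that a_k = 1.
    for k in range(m):
        if a[k] <= 1:
            A[k] = [A[k][i] / a[k] for i in range(n)]
            a[k] = 1.0
    # Step 3: if Delta_1 <= 1, scale the whole system up by 1/Delta_1.
    delta1 = max(sum(A[k][i] for k in range(m)) for i in range(n))
    if delta1 <= 1:
        A = [[A[k][i] / delta1 for i in range(n)] for k in range(m)]
        a = [a[k] / delta1 for k in range(m)]
    return A, a
```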

Proof. First, apply Theorem 4.2 to ensure that ∆1 ≥ 1, amin ≥ 1; the resulting CIP has a parameter $\gamma' = \frac{\ln(\Delta'_1+1)}{a'_{\min}} \le \gamma$. Next, solve the corresponding LP to obtain a fractional solution x̂ ∈ [0, ∞)^n. Finally, apply Theorem 4.1 to obtain an integral solution x such that $E[C\cdot x] \le (1+\gamma'+4\sqrt{\gamma'})\mathrm{OPT} \le (1+\gamma+4\sqrt{\gamma})\mathrm{OPT}$.

Let p denote the probability that $C\cdot x > (1+\gamma+8\sqrt{\gamma})\mathrm{OPT}$. We must run $O(\frac{1}{1-p})$ expected iterations of the algorithm of Theorem 4.1 to ensure that $C\cdot x \le (1+\gamma+O(\sqrt{\gamma}))\mathrm{OPT}$ (in actuality, not merely expectation). As each iteration of Theorem 4.1 runs in polynomial time, it suffices to show that $p \le 1 - (mn)^{-\Omega(1)}$ to show that we have an algorithm running in expected polynomial time. Observe that C · x ≥ 0 with probability one, and so we have
$$E[C\cdot x] \ge p(1+\gamma+8\sqrt{\gamma})\mathrm{OPT} + (1-p)\cdot 0, \qquad E[C\cdot x] \le (1+\gamma+4\sqrt{\gamma})\mathrm{OPT}$$
which implies that
$$p \le \frac{1+\gamma+4\sqrt{\gamma}}{1+\gamma+8\sqrt{\gamma}} \le \frac{1+4\sqrt{m}+m}{1+8\sqrt{m}+m} \qquad (\text{as } \gamma \le \ln(m+1) \le m)$$
$$\le 1 - (nm)^{-\Omega(1)}$$
Thus, after a polynomial number of repetitions, we achieve
$$C\cdot x \le \bigl(1+\gamma+O(\sqrt{\gamma})\bigr)\mathrm{OPT}$$

5 Respecting multiplicity constraints

In Theorem 4.1, we may violate the multiplicity constraints considerably. By adjusting our parameters, we may have better control of the multiplicity constraints. We will describe two algorithms: the first ensures the multiplicity constraints are approximately preserved, and gives an approximation factor in terms of the LP solution. The second preserves the multiplicity constraints exactly, but gives an approximation factor only in terms of the ℓ0 norm of the constraint matrix and the optimal integral solution.

Theorem 5.1. Suppose we are given a covering system with ∆1 ≥ 1, amin ≥ 1 and a fractional solution x̂. Let $\gamma = \frac{\ln(\Delta_1+1)}{a_{\min}}$. Let ǫ ∈ [0, 1] be given. Then, with an appropriate choice of σ, α we may run the ROUNDING algorithm on this system to obtain a solution x ∈ Z^n_+ satisfying

with probability one

The expected run-time is O(mn). Proof. First, √suppose γ ≤ ǫ2 /2. In this case, we apply Theorem 4.1. We are guaranteed that γ/2+ γ xi (1 + ǫ)⌉. We then have xi ≤ ⌈ˆ xi ln(1+√γ) ⌉ and some simple analysis shows that this is at most ⌈ˆ √ √ E[xi ] ≤ 1 + 4 γ + γ ≤ 1 + 4 γ + 4γ/ǫ as desired. 22

Next, suppose γ ≥ ǫ2 /2. We set α = −(1+ǫ)σln(1−σ) , where σ ∈ (0, 1) is a parameter to be determined. Then by Proposition 3.3, we have xi ≤ ⌈ˆ xi (1 + ǫ)⌉ at the end of the ROUNDING algorithm. We clearly have α ≥

− ln(1−σ) σ

and so we apply Theorem 3.5 to estimate E[xi ]:   X Aki E[xi ] ≤ αˆ xi 1 + σ (1 − σ)ak eσαak − 1 k   X Aki = αˆ xi 1 + σ (1 − σ)−ak ǫ − 1 k   X Aki ≤ αˆ xi 1 + σ (1 − σ)−aǫ − 1 k   ∆1 ≤ αˆ xi 1 + σ (1 − σ)−aǫ − 1

Now set σ = 1− e−γ/ǫ ; observe that this is indeed in the range (0, 1). This ensures that (1− σ)−aǫ = ∆1 + 1 and hence we have   1 E[xi ] ≤ x ˆi α(1 + σ) = x ˆi ǫ−1 (2 + γ/ǫ )(1 + ǫ)γ e −1 Some simple calculus shows that this coefficient ǫ−1 (2 + eγ/ǫ1 −1 )(1 + ǫ)γ is at most 1 + ǫ + (2 + 2/ǫ)γ. √ By our assumption that ǫ ∈ [0, 1] and our assumption that ǫ2 /2 ≤ γ, this is at most 1 + 2γ + 4γ/ǫ as desired. The bound on the running time follows the same lines as Theorem 4.1. Corollary 5.2. Let γ =

ln(∆1 +1) amin .

In polynomial time, one may obtain a solution x ∈ Zn+ satisfying

1. xi ≤ di (1 + ǫ) for all i = 1, . . . , n √  2. Cx ≤ 1 + O(γ/ǫ + γ) OPT, where OPT is the optimal integral solution to the original CIP Proof. The proof is similar to Corollary 4.3, except using Theorem 5.1 instead of Theorem 4.1. Next, we show how to exactly preserve multiplicity constraints. We follow here the approach of [12], which in turn builds on an approach of [2]: they construct a stronger linear program via the knapsack-cover (KC) inequalities. This LP has exponential size, but one can approximately optimize over it in polynomial time. One can then round the resulting solution using Theorem 4.1. Although this algorithm is discussed in great detail in [12] and [2], we give a self-contained presentation here to fill in a few technical details which arise. The key to the KC inequalities is to form a residual problem, given that a set of variables X is “pinned” to their maximal values. Definition 5.3 (The pinned-residual problem). Suppose we have a CIP problem with constraint matrix A, RHS vector a, and multiplicity constraints d. Given any X ⊆ [n], we define the pinnedresidual, denoted PR(X), to be a new CIP problem A′ , a′ , d which we obtain as follows. 23

1. For each k = 1, . . . , m, let $v_k = a_k - \sum_{i\in X} A_{ki} d_i$, and set
$$a'_k = \begin{cases} v_k & \text{if } v_k > 1\\ 1 & \text{if } v_k \in (0, 1]\\ 0 & \text{if } v_k \le 0\end{cases}$$

2. For each k, i we set:
$$A'_{ki} = \begin{cases} 0 & \text{if } i \in X\\ 0 & \text{if } i \notin X,\ v_k \le 0\\ \min(1, A_{ki}/v_k) & \text{if } i \notin X,\ v_k \in (0, 1]\\ A_{ki} & \text{if } i \notin X,\ v_k > 1\end{cases}$$
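A literal sketch of this construction in code (ours), mirroring the two numbered rules above:

```python
def pinned_residual(A, a, d, X):
    """Build the pinned-residual problem PR(X) from a CIP (A, a, d).

    X is the set of variable indices pinned to their maximum values d_i.
    Returns (A_prime, a_prime); the multiplicity bounds d are unchanged.
    """
    m, n = len(A), len(A[0])
    A_prime = [[0.0] * n for _ in range(m)]
    a_prime = [0.0] * m
    for k in range(m):
        v_k = a[k] - sum(A[k][i] * d[i] for i in X)
        a_prime[k] = v_k if v_k > 1 else (1.0 if v_k > 0 else 0.0)
        for i in range(n):
            if i in X or v_k <= 0:
                A_prime[k][i] = 0.0
            elif v_k <= 1:
                A_prime[k][i] = min(1.0, A[k][i] / v_k)
            else:
                A_prime[k][i] = A[k][i]
    return A_prime, a_prime
```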

Observe that if any constraint has a′k = 0, then it has effectively disappeared. Also, observe that for i ∈ X, the constraint matrix A′ does not involve variable xi (the column corresponding to i is zero). Hence, we may assume that any solution x to PR(X) has xi = 0 for i ∈ X.

Proposition 5.4 ([12],[2]). For any X ⊆ [n], we have the following:

1. Any integral solution to the original CIP A, a, d also satisfies PR(X).
2. PR(X) has a′min ≥ 1, ∆′1 ≤ ∆0, where ∆0 is the maximum ℓ0-column norm of A.

Theorem 5.5. Given any CIP A, a, d, C, there is an algorithm which runs in expected polynomial time and returns a solution x ∈ Z^n_+ satisfying:

1. xi ≤ di for all i = 1, . . . , n
2. $C\cdot x \le \bigl(\ln(\Delta_0+1) + O(\sqrt{\ln(\Delta_0+1)})\bigr)\mathrm{OPT}$, where OPT is the optimal integral solution to the original CIP

Proof. Let $\gamma_0 = \ln(\Delta_0+1)$ and let $\delta = \frac{\tfrac12\gamma_0 + \sqrt{\gamma_0}}{\ln(1+\sqrt{\gamma_0})}$.

We begin by finding a fractional solution x̂ which minimizes C · x̂, subject to the conditions that x̂i ∈ [0, di] and such that x̂ satisfies PR({i | x̂i ≥ di/δ}). This can be done using the ellipsoid method: given some putative x̂, one can form PR({i | x̂i ≥ di/δ}) and determine which constraint in it, if any, is violated.

Suppose we are given some optimal LP solution x̂ satisfying this condition. By Proposition 5.4, any optimal integral solution satisfies PR(Y) for all Y ⊆ [n], and in particular is a solution to the given LP. Thus, C · x̂ ≤ OPT.

Let X = {i | x̂i ≥ di/δ}. We set xi = di for i ∈ X. For i ∉ X, we run the algorithm of Theorem 4.1 on PR(X) to obtain a random value for each xi. For i ∈ X, we clearly have xi ≤ di. Observe that by Proposition 5.4, PR(X) has γ′ ≤ γ0. So for i ∉ X, we have xi ≤ ⌈δx̂i⌉; this is at most ⌈di⌉ = di by definition of X. So x satisfies the multiplicity constraints.

Next, for i ∈ X we clearly have E[xi] ≤ di ≤ x̂iδ ≤ x̂i(γ0 + O(1)). Also, for i ∉ X, we have $E[x_i] \le \hat{x}_i(1+\gamma'+4\sqrt{\gamma'})$; by Proposition 5.4 this is $\le \hat{x}_i(1+\gamma_0+4\sqrt{\gamma_0})$. Thus, we have that
$$E[C\cdot x] \le (1+\gamma_0+c\sqrt{\gamma_0})\mathrm{OPT}$$
where c > 0 is some constant.

To obtain an integral solution satisfying $C\cdot x \le (1+\gamma_0+O(\sqrt{\gamma_0}))\mathrm{OPT}$, one may repeat this procedure for a polynomial number of trials; the proof of this is similar to Corollary 4.3.

Lower bounds on approximation ratios

In this section, we provide lower bounds on the performance of algorithms to solve covering integer programs. These bounds fall into two categories. First, we show hardness results, namely that there is no polynomial-time algorithm which can achieve significantly better approximation ratios than we do. These are based on Feige’s celebrated result on the inapproximability of set cover [6], which was later improved by Moshkovitz [15]. Next, we show integrality gap constructions. Our rounding algorithm transforms a solution to the LP relaxation to an integral solution; we show that there are some CIP instances for which the optimal integral solution has an objective-function value that is close to our approximation bound times the objective-function value of any optimal fractional solution. This implies that any algorithm which is based on the LP relaxation cannot have an improved approximation ratio. The formal statements of these results contain numerous qualifiers and technical conditions. So, we will summarize our results informally here: 1. Under the Exponential Time Hypothesis (ETH), when γ is large then any polynomial-time algorithm to solve the CIP (ignoring multiplicity constraints) must have approximation ratio γ − O(log γ). Also there is an integrality gap of γ − O(log γ). By contrast, the algorithm of √ Theorem 4.1 achieves approximation ratio γ + O( γ). 2. When γ is large, then to solve CIP while respecting the multiplicity constraints within a 1 + ǫ multiplicative factor, there is an integrality gap of Ω(γ/ǫ). By contrast, the algorithm of Theorem 5.1 achieves approximation ratio O(γ/ǫ). 3. When γ is small, the integrality gap of the CIP is 1 + Ω(γ); by contrast, the algorithm of √ Theorem 4.1 achieves approximation ratio 1 + O( γ). We note that the approximation ratios may depend on many parameters; two possible parameters are ∆1 , amin but there are numerous others including ∆0 , n, m, etc. Lower bounds for the approximation ratio are very difficult to state in the context of these multiparametric approximations. 25

Thus we will formulate these proofs as follows. We suppose we are given some fixed Turing machine M such that, on input of A, a, C, d, it produces an integral solution x which satisfies the covering constraints exactly, which may or may not fully satisfy the multiplicity constraints, and ensures that C · x ≤ βOPT. We say that β is the approximation ratio of M. Typically, we wish to show that β is bounded in terms of certain functions of the input; for example, in the algorithm of Theorem 4.1, we have β ≤ 1 + γ + O(√γ). We will see that certain functional forms are impossible to have as bounds for β. We note that the parameter γ depends on two parameters ∆1, amin. Thus, in order to show these results, we must show hardness across a wide range of the parameters ∆1, amin. By contrast, typical hardness results for set cover only depend on a single parameter (such as ∆1 or n).

6.1 Hardness results

Our hardness results are all reductions from the construction of Feige [6], which was later strengthened by Moshkovitz [15]. These gave the following nearly-tight hardness results for approximating set cover:2

Theorem 6.1 ([15]). Suppose we are given set cover instances on a ground set [n] with optimal solution of value T. Then, assuming the Exponential Time Hypothesis (i.e., that any algorithm for SAT requires time at least 2^{Ω(n)}), there is some constant c > 0, such that no polynomial-time algorithm can find a solution of value ≤ T × (ln n − c ln ln n).

The greedy algorithm for set cover achieves approximation ratio ln n − ln ln n + Θ(1) [17]. Thus, the approximation ratio of Theorem 6.1 is nearly tight (up to a coefficient of ln ln n). We will show that set cover can be reduced to the covering integer program. This will show hardness results for CIP, which closely match the bounds achieved by our algorithms.

Proposition 6.2. Assume the Exponential Time Hypothesis holds, and let A be any algorithm with approximation ratio β, which returns a solution x ∈ Z^n_+ satisfying Ax ≥ a (but ignores the multiplicity constraints). Suppose that β ≤ f(amin, m) where f : [1, ∞) × Z+ → [1, ∞). Then for any integer a ≥ 1 we have

    f(a, m) ≥ (ln m − c ln ln m)/a

for infinitely many integers m > 0.

Proof. Suppose for contradiction that there is some integer a ≥ 1 such that there exists some A with approximation ratio β ≤ f(a, m), where f(a, m) ≤ (ln m − c ln ln m)/a for all but finitely many m and c is the constant term of Theorem 6.1.

Now suppose we are given some set cover instance, with optimum solution v, on some ground set [n]. Let S = {S1, . . . , Sm} ⊆ 2^[n]. Now, for each element k ∈ [n] in the ground set, we have a constraint

    Σ_{i : k ∈ Si} xi ≥ a

and we have an objective function C · x = Σ_i xi; that is, each variable has weight one.

The resulting CIP instance contains n constraints and m variables3, as well as having amin = a. Observe that if we are given a solution S0 to this set cover instance, of weight v, then the corresponding CIP has a solution defined by

    xi = a if Si ∈ S0, and xi = 0 otherwise,

which has weight T = av. Now, we run A on this resulting system, and we obtain a solution x′ of weight at most βT, where β ≤ f(amin, n) = f(a, n). Now construct the solution S0′ to the original set cover instance:

    S0′ = {Si | x′i ≥ 1}

It is not hard to see that S0′ is a valid solution to the original set cover instance and |S0′| ≤ Σ_i x′i. So our algorithm has overall approximation ratio

    |S0′|/v ≤ (Σ_i x′i)/v ≤ βT/v ≤ βav/v ≤ βa ≤ a · (ln n − c ln ln n)/a ≤ ln n − c ln ln n    for all n sufficiently large

2 These results do not follow from the original, conference version of [15]. They follow immediately from a result stated in the full journal version (unpublished), although they are not stated explicitly.

Now note that, when n is bounded by a constant, then one can solve Set Cover optimally. Thus, one can produce an algorithm which has approximation ratio ≤ ln n − c ln ln n for all n > 0. This contradicts Theorem 6.1.

Corollary 6.3. Suppose that g : R → R is any function satisfying lim_{x→∞} g(x)/ln x ≥ c, where c is some sufficiently large constant. Then it is impossible to obtain any CIP algorithm with approximation ratio of the form

    β ≤ γ − g(γ),    where γ = ln(∆1 + 1)/amin,

or

    β ≤ ln ∆0 − g(ln ∆0).

In particular, for γ (respectively ∆0) large, the approximation ratio guarantees of Theorem 4.1 (respectively Theorem 5.5) are optimal up to first-order.

3 It is somewhat confusing that for set cover, the standard terminology uses m for the number of sets, which correspond to variables, and n for the size of the ground set, which corresponds to constraints. For CIP, one uses the opposite terminology: there are m constraints on n variables.

Proof. Suppose that β ≤ γ − g(γ). Then β ≤ f(amin, m) where

    f(a, m) = ln m/a − g(ln m/a)

For a = 1, we have that

    f(a, m) = ln m − g(ln m) ≤ ln m − (c/2) ln ln m = (ln m − (c/2) ln ln m)/a    for m sufficiently large

In particular, we have f(a, m) ≤ (ln m − (c/2) ln ln m)/a for infinitely many m. By Proposition 6.2, this is impossible for c sufficiently large. A similar argument applies to the second bound β ≤ ln ∆0 − g(ln ∆0).
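Returning to Proposition 6.2, the reduction in its proof is mechanical enough to sketch in a few lines of Python (illustrative only; cip_solver stands in for the hypothetical algorithm A):

    def set_cover_via_cip(sets, n_elements, a, cip_solver):
        # sets: list of subsets of {0, ..., n_elements-1}; one CIP variable per set.
        m = len(sets)
        # One covering constraint per ground element k: sum of x_i over sets containing k is >= a.
        A = [[1 if k in sets[i] else 0 for i in range(m)] for k in range(n_elements)]
        rhs = [a] * n_elements
        C = [1] * m                        # every variable has weight one
        x = cip_solver(A, rhs, C)          # integral x with A x >= rhs, cost <= beta * OPT
        # The sets whose variable is at least 1 form a cover of size at most sum_i x_i.
        return [i for i in range(m) if x[i] >= 1]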

6.2 Integrality gaps

We next show a variety of integrality gaps for the CIP. These constructions work as follows: we give a CIP instance, as well as an upper bound T̂ on the weight of the fractional solution and a lower bound T on the weight of any integral solution. This automatically implies that any algorithm which converts a fractional solution into an integral solution, as our algorithm does, must cause the weight to increase by a factor of at least T/T̂.

The main advantage of the complexity-theoretic results is that they hold for a broad class of algorithms, while the integrality gaps hold only for a limited class. However, the integrality gaps have a number of compensating advantages:
1. In some cases, one may wish to compare an integral solution with a fractional solution, where the fractional solution is not derived by solving an LP optimization. For example, one may take as a starting point a “uniform” fractional solution, and measure the discrepancy forced by integrality.
2. The complexity-theoretic proofs depend on strong assumptions (such as P ≠ NP or ETH), which are not likely to be proved soon.
3. The complexity-theoretic proofs are necessarily asymptotic: one cannot show any limits to the approximability of any particular problem (when the problem size is finite, then one can optimize it in “polynomial” time). The integrality gaps, by contrast, allow one to show inapproximability results which apply to specific problem instances.

We show an integrality gap which matches Proposition 6.2 when γ is large. We are also able to show an integrality gap for the regime in which γ → 0.
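As a toy illustration of the fractional-versus-integral comparison (this example is not from the paper), consider the covering instance with three constraints x1 + x2 ≥ 1, x2 + x3 ≥ 1, x1 + x3 ≥ 1 and objective x1 + x2 + x3:

    \[
      \hat{T} = \tfrac{3}{2} \ \text{(at } \hat{x} = (\tfrac12,\tfrac12,\tfrac12)\text{)}, \qquad
      T = 2 \ \text{(any integral solution sets two variables to } 1\text{)}, \qquad
      T/\hat{T} = \tfrac{4}{3}.
    \]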

Proposition 6.4. There is some universal constant c ≥ 0 for which the following holds. Let a ≥ 1, m ≥ 1 be given. There is a covering program with m constraints and amin = a and all the entries of the constraint matrix in {0, 1}, which satisfies the following property. Let T̂ be the optimal value of this covering program subject to the constraints x ∈ R^n_+, and let T be the optimal value of the covering program subject to the constraints x ∈ Z^n_+. Then we have

    T/T̂ ≥ ln m/a − c (log log m)/a ≥ γ − c log γ

Proof. First, we claim that we can assume that m is larger than any desired constant. For, suppose m ≤ m0. Then, for some constant c > 0, we have ln m − c log log m ≤ 1 for all m ≤ m0. We certainly have T/T̂ ≥ 1, so we have T/T̂ ≥ ln m − c log log m ≥ (ln m − c log log m)/a. Likewise, we can assume that ln m ≥ a. We will make both of these simplifications for the remainder of the proof.

We will form the m constraints randomly as follows: we select exactly s positions i1, . . . , is uniformly at random in [n] without replacement, where s = ⌈pn⌉; here n → ∞ and p → 0 as functions of m. We then set A_{k,i1} = · · · = A_{k,is} = 1; all other entries of Ak are set to zero. The RHS vector is always equal to a. The objective function C is defined by C · x = Σ_i xi; that is, each variable is assigned weight one.

We can form a fractional solution x̂ by setting x̂i = a/s. As each constraint contains exactly s entries with coefficient one, this satisfies all the covering constraints. Thus, the optimal fractional solution has value T̂ ≤ na/s ≤ a/p.

Now suppose we fix some integral solution of weight Σ_i xi = t. Let I ⊆ [n] denote the support of x, that is, the values i ∈ [n] such that xi > 0; we have |I| = r ≤ t. In each constraint k, there is a probability of (n−r choose s)/(n choose s) that Aki = 0 for all i ∈ I. If this occurs, then certainly Ak · x = 0 and the covering constraint is violated. Thus, the probability that x satisfies constraint k is at most 1 − (n−r choose s)/(n choose s). As all the constraints are independent, the total probability that x satisfies all m constraints is at most:

    P(x satisfies all constraints and has weight t) ≤ (1 − (n−r choose s)/(n choose s))^m
        ≤ exp(−m (n−t choose s)/(n choose s))
        ≤ exp(−m ((n − s − (t − 1))/n)^t)
        ≤ exp(−m (1 − p − t/n)^t)    as s ≤ pn + 1

We want to ensure that there are no good integral solutions. To upper-bound the probability that there exists such a good x, we take a union-bound over all integral x. In fact, our estimate only depended on specifying the support of x, not the values it takes on there, so we only need to take a union bound over all subsets of [n] of cardinality ≤ t. There are at most Σ_{r=0}^{t} (n choose r) ≤ n^t such sets,

and thus we have

    P(Some x satisfies all constraints) ≤ n^t exp(−m(1 − p − t/n)^t) ≤ exp(t ln n − m(1 − p)^t + mt²/n)

We now set n = mt, and obtain

    P(Some x satisfies all constraints) ≤ exp(t(1 + ln(mt)) − m(1 − p)^t) ≤ exp(t² ln m − m exp(−pt − p²t))    for p sufficiently small

If this expression is smaller than one, then that implies that there is a positive probability that no integral solution of weight at most t exists. Hence, we can ensure that all integral solutions satisfy T > t. Now, some simple analysis shows that this expression is < 1 when p = 1/ln m and t = p^{−1}(ln m − 10 ln ln m) and m is sufficiently large. Thus we have

    T/T̂ ≥ p^{−1}(ln m − 10 ln ln m)/(a/p) ≥ (ln m − O(log log m))/a ≥ ln(∆1 + 1)/a − O(log(log(∆1 + 1)/a))

as we have claimed.
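The random construction in the proof above is easy to instantiate; the following Python sketch (illustrative, and assuming m is large enough that t is positive) generates the instance and the uniform fractional solution:

    import math, random

    def random_gap_instance(m, a):
        # Parameters follow the proof: p = 1/ln m, t = (ln m - 10 ln ln m)/p, n = m*t, s = ceil(p*n).
        p = 1.0 / math.log(m)
        t = int((math.log(m) - 10 * math.log(math.log(m))) / p)
        n = m * t
        s = math.ceil(p * n)
        A = []
        for _ in range(m):
            row = [0] * n
            for i in random.sample(range(n), s):   # s positions chosen without replacement
                row[i] = 1
            A.append(row)
        rhs = [a] * m
        x_frac = [a / s] * n    # each constraint has exactly s ones, so it sums to exactly a
        return A, rhs, x_frac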

This argument can be adjusted to take into account a (1 + ǫ) violation of the multiplicity constraints.

Proposition 6.5. There is a constant c ≥ 0 with the following property. Let a, m be given integer parameters and let ǫ ∈ (0, 1). Then there is a CIP instance on m constraints with amin = a, with some parameter d ≥ 0 such that the fractional solution x̂ ∈ [0, d]^n has objective value T̂, the optimal integral solution in x ∈ {0, 1, . . . , ⌈(1 + ǫ)d⌉}^n has objective value T, and

    T/T̂ ≥ (ln m − c ln ln m)/(aǫ)

Hence, CIPs cannot be approximated within o(γ/ǫ) as long as the multiplicity constraints are respected within a multiplicative factor of (1 + ǫ).

Proof. Let A be the CIP instance constructed in Proposition 6.4 with n variables and m constraints and with amin = 1. By construction, it satisfies T/T̂ ≥ ln m − c ln ln m for some constant c ≥ 0. We form a new CIP instance A′ on n + m variables and m constraints; for each constraint k = 1, . . . , m we form

    (a/(K(1 + ǫ) + 1)) x_{n+k} + Σ_{i=1}^n Aki xi ≥ a

and we have an objective function C · x = Σ_{i=1}^n xi; that is, each variable x1, . . . , xn has weight one, and each variable x_{n+1}, . . . , x_{n+m} has weight zero. We set di = ∞ for i = 1, . . . , n and we set di = K for i = n + 1, . . . , n + m; here K is a large integer parameter, which we will specify shortly. (In particular, for K sufficiently large, all the coefficients in this constraint are in the range [0, 1].) The resulting CIP instance contains m constraints and amin = a.

Now suppose that x̂ is a fractional solution to the original CIP instance. Then let v = a(1 + ǫK)/(1 + (1 + ǫ)K) and consider the fractional solution x̂′ defined by

    x̂′i = v x̂i  if i ≤ n,    x̂′i = K  if n + 1 ≤ i ≤ n + m

Observe that for any constraint k we have

    (a/(K(1 + ǫ) + 1)) x̂′_{n+k} + Σ_{i=1}^n Aki x̂′i = (a/(K(1 + ǫ) + 1)) K + v Σ_{i=1}^n Aki x̂i ≥ aK/(K(1 + ǫ) + 1) + v = a    (as Ak · x̂ ≥ ak = 1)

and so this is a valid LP solution. Thus the fractional objective value is at most T̂′ ≤ Σ_{i=1}^n x̂′i = v T̂.

On the other hand, consider an integral solution x′. As x′_{n+k} ≤ ⌈(1 + ǫ)K⌉, we have that for all k ∈ [m]:

    (a/(K(1 + ǫ) + 1)) (1 + ǫ)K + Σ_{i=1}^n Aki x′i ≥ a

which implies that Σ_{i=1}^n Aki x′i > 0. As all the entries of Aki are in {0, 1}, this implies that Σ_{i=1}^n Aki x′i ≥ 1, and so x′ is an integral solution to the original CIP instance. Thus, its objective value is at least T′ ≥ T, where T is the optimal integral solution to the original A. Thus we have that

    T′/T̂′ ≥ T/(v T̂) ≥ (ln m − c ln ln m)/v

Taking the limit as K → ∞, we see that for any δ > 0 there exists a CIP with integrality gap

    T′/T̂′ ≥ (1 + ǫ)(ln m − c ln ln m)/(aǫ) − δ

In particular, as ǫ > 0, we can select δ sufficiently small so that

    T′/T̂′ ≥ (ln m − c ln ln m)/(aǫ)

In light of this result, we note that Theorem 5.1 has an optimal approximation ratio in terms of γ, ǫ for γ → ∞, up to a constant factor. However, this integrality gap construction does not apply to Theorem 5.5, which uses a stronger LP formulation (the KC constraints). For this reason, Theorem 5.5 is able to achieve an approximation ratio which remains bounded as ǫ → 0.

Proposition 6.4 does not give a useful bound when a > m. Proposition 6.6, which is based on a construction of [19], covers that case:

Proposition 6.6. For any g ∈ (0, 1) and m ≥ 2^{1+14/g}, there is a CIP with m constraints and ln m/amin ≤ g, and which satisfies also the following integrality gap property: Let T̂ be the optimal value of this covering program, subject to the constraints x ∈ R^n_+, and let T be the optimal value of the covering program, subject to the constraints x ∈ Z^n_+. Then we have

    T/T̂ ≥ 1 + g/8 ≥ 1 + Ω(γ)

In particular, it is impossible to guarantee an approximation ratio for LP rounding of the form 1 + o(log m/amin).

Proof. We set n = 2^q − 1 where q = ⌊log2 m⌋. We will view the integers from 1, . . . , n as corresponding to the non-zero binary strings of length q. Thus, if i, i′ ∈ {1, . . . , n}, then we write i · i′ to denote the binary dot-product. Namely, if we write i = i0 + 2i1 + 4i2 + . . . and i′ = i′0 + 2i′1 + 4i′2 + . . . where ij, i′j ∈ {0, 1}, then we define i · i′ = ⊕_{l=0}^{q−1} il i′l.

The covering system is defined as follows: For each k ∈ {1, . . . , n} we have a constraint

    Σ_{i : (k·i)=0} xi ≥ a    where a = (q − 1)/g

The objective function is C · x = Σ_{i=1}^n xi. This has n ≤ m constraints and it satisfies

    amin = a = (q − 1)/g ≥ (log2 m − 2)/g.

Observe that we have m ≥ 2^{1+14/g} ≥ 91.7, and so log2 m − 2 ≥ ln m, and hence we have amin ≥ ln m/g as desired.

We form the fractional solution x̂ by setting x̂i = a/2^{q−1} for i = 1, . . . , n. This shows that the optimal fractional solution has value T̂ ≤ (2^q − 1)a/2^{q−1} ≤ 2a.

Now consider some integral solution x ∈ Z^n_+ with Σ_i xi = T. We can write x as a sum of basis vectors, x = e_{y1} + · · · + e_{yT}, where y1, . . . , yT are not necessarily distinct. Consider the quantity

    V = Σ_k Σ_{1 ≤ i1 < · · · < i_{q−1} ≤ T} [k · y_{i1} = · · · = k · y_{i_{q−1}} = 0]

We use the bounds 2^q − 1 ≤ 2^q and the bound on the factorial √(2π) r^{r+1/2} e^{−r} ≤ r! ≤ e r^{r+1/2} e^{−r}, to obtain the following condition, which implies (5):

    2^{−q} (4 − 3g)^{(−3gq+5g+4q−4)/(4g)} (8 − 3g)^{(3gq−5g−8q+8)/(4g)} (g + 4)^{−((g+4)q+g−4)/(4g)} (g + 8)^{((g+8)q+g−8)/(4g)} > e²/(2π)    (6)

We can increase the RHS of (6) slightly to e to simplify the calculations, and take the logarithm to solve for q. This gives us the following condition, which implies (6):

    q > 1 + [2g(−2 + ln(4 − 3g) − ln(8 − 3g) − ln(g + 4) + ln(g + 8) − 2 ln 2)] / [4g ln 2 − (4 − 3g) ln(4 − 3g) + (8 − 3g) ln(8 − 3g) + (4 + g) ln(4 + g) − (8 + g) ln(8 + g)]    (7)

The RHS of (7) is a function of g alone. Simple but tedious analysis (see Proposition A.5) shows that it is at most 14/g. But note that q = ⌊log2 m⌋ ≥ log2 m − 1; thus, our bound on the size of m guarantees that indeed q > 14/g. So (7) ⇒ (6) ⇒ (5) ⇒ T ≥ (q − 1)(2/g + 1/4). The integrality gap is then given by

    T/T̂ ≥ (q − 1)(2/g + 1/4)/(2a) = (2a + ag/4)/(2a) = 1 + g/8

7 Multi-criteria Programs

One extension of the covering integer program framework is the presence of multiple linear objectives. Suppose now that instead of a single linear objective, we have multiple objectives C1 · x, . . . , Cr · x. We also may have some overall objective function D defined by the following:

    D(x1, . . . , xn) = D(C1 · x, . . . , Cr · x)

For example, we might have D = max_l Cl · x, or we might have D = Σ_l (Cl · x)².

We note that the greedy algorithm, which is powerful for set cover, is not obviously useful in this case. However, depending on the precise form of the function D, it may be possible to solve the fractional relaxation to optimality. For example, if D = max_l Cl · x, then this amounts to a linear program of the form min t subject to Cl · x ≤ t. For our purposes, the algorithm used to solve the fractional relaxation is not relevant. Suppose we are given some solution x̂. We now want to find a solution x such that we have simultaneously Cℓ · x ≈ Cℓ · x̂ for all ℓ. Showing bounds on the expectations alone is not sufficient — it might be the case that E[Cℓ · x] ≤ βCℓ · x̂, but the random variables C1 · x, . . . , Cr · x are negatively correlated.

In [18], Srinivasan gave a construction which provided this type of simultaneous approximation guarantee. This algorithm was based on randomized rounding, which succeeded only with an exponentially small probability. Srinivasan also gave a derandomization of this process, leading to a somewhat efficient algorithm. Some technical difficulties with this algorithm lead to worsened approximation ratios compared to the single-criterion setting, roughly of the order O(1 + log(∆0 + 1)/amin), and running times of the order O(n^{log r}). In particular, this was only polynomial if r was constant.

In this section, we will show that at the end of the ROUNDING algorithm, the values of Cℓ · x are concentrated around their means. This will establish that there is a good probability that we have Cℓ · x ≈ E[Cℓ · x] for all ℓ. Thus, our algorithm automatically gives good approximation ratios for multi-criteria problems; the ratios are essentially the same as for the single-criterion setting, and there is no extra computational burden.

We begin by showing that the values of x produced by the RELAXATION algorithm obey a type of negative correlation property. We will show this via a type of “witness” construction, similar to Lemma 2.1; however, instead of providing a witness for the event that xi = 1, we will provide a witness for the event that simultaneously x_{i1} = · · · = x_{is} = 1. This proof is based on induction similar to Lemma 2.1.

Suppose we are given any set I ⊆ [n], any integers J1, . . . , Jm ≥ 0, and an array of sets Z = ⟨Z_{k,j} | k = 1, . . . , m, j = 1, . . . , Jk⟩. We then define the event E(I, J, Z) to be the following:
1. For each k = 1, . . . , m, the first Jk resampled sets for constraint k are respectively Z_{k,1}, . . . , Z_{k,Jk}
2. Each i ∈ I turns at 0 or some (k, j) where 1 ≤ j ≤ Jk.

We similarly define the event E(I, J, Z, v) for any v ∈ {0, 1}^n to be the event that E(I, J, Z) occurs if we start the RELAXATION algorithm by setting x = v (instead of drawing x as independent Bernoulli-pi), and the event E(T, I, J, Z, v) to be the event that E(I, J, Z, v) occurs and the relaxation algorithm terminates in less than T resamplings. Given any integers J1, . . . , Jm, we define prefix(J) to be the set of all pairs (k, j) where 1 ≤ j ≤ Jk.

Proposition 7.1. Suppose that x̂ ∈ [0, 1/α)^n. Let v ∈ {0, 1}^n, I ⊆ [n], and J, Z be given. Then

    P(E(I, J, Z)) ≤ ∏_{i∈I} pi ∏_{(k,j)∈prefix(J)} fk(Zk,j)

Proof. Define

    D = ∪_{(k,j)∈prefix(J)} Zk,j

We prove by induction on T that for any T ≥ 0 we have

    P(E(T, I, J, Z, v)) ≤ ∏_{i∈I∩D} pi × [∏_{(k,j)∈prefix(J)} fk(Zk,j)] / [∏_{i∈D} (1 − pi)]

A few details of the proof which are identical to Lemma 2.1 are omitted for clarity. Let k be minimal such that Ak · x < ak. If Jl ≥ 1 for any l < k then the event E(T, I, J, Z, v) is impossible and we are done. If Jk = 0, then E(T, I, J, Z, v) is equivalent to E(T − 1, I, J, Z, x′) where x′ is the value of the variables after a resampling; for this we use the induction hypothesis and we are done.

So suppose Jk ≥ 1. Define D′ = ∪_{(j,l)∈prefix(J), (j,l)≠(1,k)} Zj,l. Then the following are necessary events to have E(T − 1, I, J, Z, x′):

(A1) We select Zk,1 as the resampled set for constraint k
(A2) The event E(T − 1, I′, J′, Z′, x′) occurs, where x′ is the value of the variables after resampling, where I′ = I ∩ D′, and J′, Z′ are derived by setting J′k = Jk − 1 and by Z′_{k,1}, . . . , Z′_{k,Jk−1} = Z_{k,2}, . . . , Z_{k,Jk} (and all other entries remain the same)
(A3) For any i ∈ (Zk,1 − D′) ∩ I we resample xi = 1
(A4) For any i ∈ Zk,1 ∩ D′ we resample xi = 0

The rationale for (A3) is that we require i ∈ I to turn at some (j, l) ∈ prefix(J), and in addition Zj,l is the j-th resampled set for constraint l. This would imply that i ∈ Zj,l. However, there is only one such (j, l), namely (j, l) = (1, k). Thus, we are requiring i to become resampled to xi = 1. The rationale for (A4) is the same as in Lemma 2.1: if we resample xi = 1, then xi can never be resampled again. In particular, we cannot have i in any future resampled set. Thus if x′i = 1 but i ∈ Zk,1 ∩ D′, then the event (A2) is impossible.

As in Lemma 2.1, the event (A1) has probability ≤ (1 − σ)^{−ak} ∏_{i∈[n]} (1 − Aki σ) ∏_{i∈Zk,1} (Aki σ)/(1 − Aki σ).
Event (A3), conditional on (A1), has probability ∏_{i∈(Zk,1 − D′)∩I} pi.
Event (A4), conditional on (A1), (A3), has probability ∏_{i∈Zk,1∩D′} (1 − pi).

By the induction hypothesis, event (A2), conditional on (A1), (A3), (A4), has probability

    P((A2)) ≤ ∏_{i∈I′∩D′} pi × [∏_{(k,j)∈prefix(J′)} fk(Z′_{k,j})] / [∏_{i∈D′} (1 − pi)]

Multiplying these probabilities, after some rearrangement, gives us the desired bound on P(E(T, I, J, Z, v)), thus completing the induction.

Next, as in Lemma 2.1, we immediately obtain also

    P(E(I, J, Z, v)) = lim_{T→∞} P(E(T, I, J, Z, v)) ≤ ∏_{i∈I∩D} pi × [∏_{(k,j)∈prefix(J)} fk(Zk,j)] / [∏_{i∈D} (1 − pi)]

Finally, to obtain a bound on P(E(I, J, Z)), we observe that if i ∈ D, then xi must be equal to zero during the initial sampling. Also, if i ∈ I − D, then xi must be equal to one during the initial sampling. This has probability ∏_{i∈I−D} pi ∏_{i∈D} (1 − pi). Conditional on this event, we have P(E(I, J, Z, x)) ≤ ∏_{i∈I∩D} pi × [∏_{(k,j)∈prefix(J)} fk(Zk,j)] / [∏_{i∈D} (1 − pi)]. Thus, multiplying the probabilities together gives us

    P(E(I, J, Z)) ≤ ∏_{i∈I} pi ∏_{(k,j)∈prefix(J)} fk(Zk,j)

as desired.

Proposition 7.2. Let R ⊆ [n]. Suppose that at the end of the RELAXATION algorithm we have xi = 1 for all i ∈ R. Then there is a set R′ ⊆ R and an injective function h : R′ → [m], as well as non-negative integers J1, . . . , Jm and sets Zk,j for j = 1, . . . , Jk, which satisfy the following properties:
(D1) For all i ∈ R′ we have J_{h(i)} ≥ 1 and i ∈ Z_{h(i),J_{h(i)}}
(D2) For all k ∉ h(R′) we have Jk = 0
(D3) Each i ∈ R turns at either 0 or at some (k, j) for j ≤ Jk

Proof. Let S0 ⊆ R denote the set of variables i ∈ R which turn at 0. For each k = 1, . . . , m let Sk ⊆ R denote the variables i ∈ R which turn at constraint k, where each i ∈ Sk turns at (k, Li). Observe that S0, S1, . . . , Sm form a partition of R. Now for each k = 1, . . . , m we define:

    Jk = max_{i∈Sk} Li

We form the set R′ by selecting, for each k ∈ [m] with Sk ≠ ∅, exactly one i ∈ Sk with Li = Jk (there may be more than one, in which case we select i arbitrarily). We define h by mapping this i ∈ Sk to k. Note that we must have i ∈ Z_{h(i),J_{h(i)}}, as we are assuming that Li = Jk where k = h(i). Also, each i ∈ Sk must turn at (k, Li) and Li ≤ Jk, thus (D3) is satisfied.


Theorem 7.3. Suppose x̂ ∈ [0, 1/α)^n and α > −ln(1 − σ)/σ. For any R ⊆ [n], the probability that xi = 1 for all i ∈ R is at most

    P(∧_{i∈R} xi = 1) ≤ ∏_{i∈R} ρi

where, for each i ∈ [n], we define

    ρi = α x̂i (1 + σ Σ_k Aki / ((1 − σ)^{ak} e^{σαAk·x̂} − 1))

Proof. By Proposition 7.2, there must exist R′, h, Zk,j, J satisfying (D1), (D2), (D3). By Proposition 7.1, for any Z, J satisfying (D1), (D2) the probability of satisfying (D3) is at most ∏_{i∈R} pi ∏_{(k,j)∈prefix(J)} fk(Zk,j). Taking a union bound over all such J, Zk,j we have:

    P(∧_{i∈R} xi = 1) ≤ Σ_{R′,h,Z,J satisfying (D1),(D2)} ∏_{i∈R} pi ∏_{(k,j)∈prefix(J)} fk(Zk,j)    (8)

We must enumerate over all R′, h, Z, J satisfying (D1), (D2). Suppose now that R′ and h are fixed. To simplify the notation, let us suppose wlg that R′ = {1, . . . , r}. We now consider the following process to enumerate over Z, J:

1. We select any vector of integers J′ ∈ Z^r_+, and sets Z′_{i,j} where j ≤ J′_i.
2. For each i ∈ R′, we select a set Wi ⊆ [n] with i ∈ Wi.
3. We define J by J_{h(i)} = J′_i + 1 for i = 1, . . . , r, and all other values of J are equal to zero. Also for j = 1, . . . , J′_i we set Z_{h(i),j} = Z′_{i,j}, and finally Z_{h(i),J′_i+1} = Wi.

Now observe that for a fixed R′, h this process enumerates every Z, J satisfying (D1), (D2) exactly once. Furthermore, for any J′, Z′, W, we have

    ∏_{(k,j)∈prefix(J)} fk(Zk,j) = ∏_{i∈R′} f_{h(i)}(Wi) × ∏_{i=1}^{r} ∏_{j=1}^{J′_i} f_{h(i)}(Z′_{i,j})

Thus, summing over possible values for Z′, J′, W we have:

    Σ_{Z,J satisfying (D1),(D2)} ∏_{(k,j)∈prefix(J)} fk(Zk,j)
        = ∏_{i=1}^{r} [ (Σ_{W⊆[n], W∋i} f_{h(i)}(W)) × (Σ_{j′≥0} Σ_{Z_{h(i),1},…,Z_{h(i),j′}} ∏_{l=1}^{j′} f_{h(i)}(Z_{h(i),l})) ]
        ≤ ∏_{i=1}^{r} [ s_{h(i)} A_{h(i),i} σ × Σ_{j′≥0} (s_{h(i)})^{j′} ]    by Propositions 2.2, 2.3
        = ∏_{i∈R′} (s_{h(i)} A_{h(i),i} σ) / (1 − s_{h(i)})

Thus, now we may sum over R′ ⊆ R and injective h : R′ → [m] as:

    Σ_{R′,h,Z,J satisfying (D1),(D2)} ∏_{i∈R} pi ∏_{(k,j)∈prefix(J)} fk(Zk,j)
        ≤ ∏_{i∈R} pi Σ_{R′⊆R} Σ_{injective h : R′→[m]} ∏_{i∈R′} (s_{h(i)} A_{h(i),i} σ) / (1 − s_{h(i)})
        ≤ ∏_{i∈R} pi Σ_{R′⊆R} ∏_{i∈R′} Σ_{k=1}^{m} (sk Ak,i σ) / (1 − sk)
        = ∏_{i∈R} pi (1 + Σ_{k=1}^{m} (sk Ak,i σ) / (1 − sk)) = ∏_{i∈R} ρi

We can now show a concentration phenomenon for C · x. In order to obtain the simplest such bounds, we can make an assumption that the entries of C are in the range [0, 1]. In this case, we can use the Chernoff upper-tail function to give estimates for the concentration of C · x.

Definition 7.4 (The Chernoff upper-tail). For t ≥ µ with δ = δ(µ, t) = t/µ − 1 ≥ 0, the Chernoff upper-tail bound is defined as

    Chernoff-U(µ, t) = (e^δ / (1 + δ)^{1+δ})^µ    (9)

That is to say, Chernoff-U(µ, t) is the Chernoff bound that a sum of [0, 1]-bounded and independent random variables with mean µ will be above t.
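For reference, the quantity in (9) is easy to evaluate; here is a small Python helper (computed in log-space for stability; the numeric example is just an illustration):

    import math

    def chernoff_u(mu, t):
        # Chernoff-U(mu, t) = (e^delta / (1+delta)^(1+delta))^mu with delta = t/mu - 1, for t >= mu.
        if t <= mu:
            return 1.0
        delta = t / mu - 1.0
        return math.exp(mu * (delta - (1.0 + delta) * math.log1p(delta)))

    # Example: chernoff_u(10, 20) is about 0.021, bounding the probability that a sum of
    # independent [0,1]-bounded variables with mean 10 exceeds 20.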

Corollary 7.5. Suppose that all entries of Cl are in the interval [0, 1], and that x̂ ∈ [0, 1/α]^n and α > −ln(1 − σ)/σ. Then, after running the RELAXATION algorithm, the probability of the event Cl · x > t is at most Chernoff-U(Cl · ρ, t).

Proof. The value of Cl · x is a sum of random variables Cli xi which are in the range [0, 1]. These random variables obey a negative-correlation property as shown in Theorem 7.3. This implies that they obey the same upper-tail Chernoff bounds as would a sum of random variables Xi which are independent and satisfy E[Xi] = ρi.

We next need to show concentration for the ROUNDING algorithm.

Theorem 7.6. Suppose that all entries of Cl are in [0, 1]. Then, after the ROUNDING algorithm, the probability of the event Cl · x > t is at most Chernoff-U(Cl · ρ, t).

Proof. Let vi, Gi, x̂′i, a′k, x′ be the variables which occur during the ROUNDING algorithm. We have

    P(Cl · x > t) = P(Cl · (vθ + G + x′) > t) = P(Cl · x′ > t − Cl · (vθ + G)) ≤ Chernoff-U(Cl · ρ′, t − Cl · (vθ + G))

Thus, we have

    P(Cl · x > t) ≤ Chernoff-U( α Σ_i Cli x̂′i (1 + σ Σ_k Aki / ((1 − σ)^{a′k} e^{σαAk·x̂′} − 1)), t − Cl · (vθ + G) )    (10)

By Proposition A.3, Chernoff-U(µ, t) is always an increasing function of µ. So we can show an upper bound for this expression by giving an upper bound for the µ term in (10). We first apply Propositions 3.2, 3.4, which give:

    x̂′i (1 + σ Σ_k Aki / ((1 − σ)^{a′k} e^{σαAk·x̂′} − 1)) ≤ (x̂i − vi θ − Gi/α) T

where we define

    T = 1 + σ Σ_k Aki / ((1 − σ)^{ak} e^{σαak} − 1)

Substituting this upper bound into (10) yields:

    P(Cl · x > t) ≤ Chernoff-U( α Σ_i Cli (x̂i − vi θ − Gi/α) T, t − Cl · (vθ + G) )
        ≤ Chernoff-U( Σ_i Cli (ρi − (vi αθ + Gi)), t − Cl · (vθ + G) )
        ≤ Chernoff-U( (Cl · ρ) − (Cl · (vθ + G)), t − (Cl · (vθ + G)) )
        ≤ Chernoff-U(Cl · ρ, t)    by Proposition A.4

In the column-sparsity setting, we obtain the following result which extends Theorem 5.1:

Corollary 7.7. Suppose we are given a covering system as well as a fractional solution x̂. Let γ = log(∆1 + 1)/amin. Suppose that the entries of C are in [0, 1]. Then, with an appropriate choice of σ, α we may run the ROUNDING algorithm in expected time O(mn) to obtain a solution x ∈ Z^n_+ such that

    P(Cl · x > t) ≤ Chernoff-U(βCl · x̂, t)

for β = 1 + γ + 4√γ.

If one wishes to ensure also that xi ≤ ⌈x̂i(1 + ǫ)⌉ for ǫ ∈ (0, 1), then one can obtain a similar result with an approximation factor β = 1 + 4√γ + 4γ/ǫ.
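One convenient consequence of Corollary 7.7 is that a union bound over the r objectives gives a simultaneous guarantee with no extra machinery. A minimal sketch (using the chernoff_u helper defined after Definition 7.4; β and the slack factor are inputs here, not values fixed by the paper):

    def simultaneous_failure_bound(objective_values_hat, beta, slack):
        # objective_values_hat[l] = C_l . x_hat.  We bound, via a union bound, the probability
        # that some objective l has C_l . x > slack * beta * (C_l . x_hat); slack > 1 is needed
        # for the bound to be nontrivial.
        return sum(chernoff_u(beta * v, slack * beta * v) for v in objective_values_hat)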

8 Acknowledgements

Thanks to Vance Faber for helpful discussions and brainstorming about the integrality gap constructions, and to Dana Moshkovitz for her helpful input on inapproximability. Thanks to the anonymous SODA 2016 and journal reviewers for helpful comments and corrections.

References

[1] Nikhil Bansal, Nitish Korula, Viswanath Nagarajan, and Aravind Srinivasan. Solving packing integer programs via randomized rounding with alterations. Theory of Computing, 8(1):533–565, 2012.
[2] Robert D. Carr, Lisa Fleischer, Vitus J. Leung, and Cynthia A. Phillips. Strengthening integrality gaps for capacitated network design and covering problems. Symposium on Discrete Algorithms (SODA), pages 106–115, 2000.
[3] V. Chvátal. A greedy heuristic for the set-covering problem. Mathematics of Operations Research, 4(3):233–235, 1979.
[4] Gregory Dobson. Worst-case analysis of greedy heuristics for integer programming with nonnegative data. Mathematics of Operations Research, 7(4):515–531, 1982.
[5] Paul Erdős and László Lovász. Problems and results on 3-chromatic hypergraphs and some related questions. Infinite and finite sets, 10:609–627, 1975.
[6] Uriel Feige. A threshold of ln n for approximating set cover. Journal of the ACM, 45(4):634–652, 1998.
[7] Marshall L. Fisher and Laurence A. Wolsey. On the greedy heuristic for continuous covering and packing problems. SIAM Journal on Algebraic Discrete Methods, 3(4):584–591, 1982.
[8] David G. Harris. Lopsidependency in the Moser-Tardos framework: beyond the Lopsided Lovász Local Lemma. In Symposium on Discrete Algorithms (SODA), pages 1792–1808. SIAM, 2015.
[9] David G. Harris and Aravind Srinivasan. The Moser-Tardos framework with partial resampling. In Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on, pages 469–478. IEEE, 2013.
[10] Nick Harvey. A note on the discrepancy of matrices with bounded row and column sums. Discrete Mathematics, 338:517–521, 2015.
[11] Richard M. Karp. Reducibility among combinatorial problems. Springer, 1972.
[12] Stavros Kolliopoulos and Neal Young. Approximation algorithms for covering/packing integer programs. Journal of Computer and System Sciences, 71:495–505, 2005.
[13] Tom Leighton, Chi-Jen Lu, Satish Rao, and Aravind Srinivasan. New algorithmic aspects of the Local Lemma with applications to routing and partitioning. SIAM Journal on Computing, 31(2):626–641, 2001.
[14] Robin A. Moser and Gábor Tardos. A constructive proof of the general Lovász Local Lemma. Journal of the ACM, 57(2):11, 2010.
[15] Dana Moshkovitz. The projection games conjecture and the NP-hardness of ln n-approximating set-cover. Approximation, Randomization, and Combinatorial Optimization: Algorithms and Techniques, pages 276–287, 2012.
[16] Prabhakar Raghavan and Clark D. Thompson. Randomized rounding: a technique for provably good algorithms and algorithmic proofs. Combinatorica, 7(4):365–374, 1987.
[17] Petr Slavik. A tight analysis of the greedy algorithm for set cover. Journal of Algorithms, 25(2):237–254, 1997.
[18] Aravind Srinivasan. An extension of the Lovász Local Lemma, and its applications to integer programming. SIAM Journal on Computing, 36(3):609–634, 2006.
[19] Vijay V. Vazirani. Approximation algorithms. Springer, 2001.

A Some technical lemmas

Proposition A.1. Given a set S, xi ∈ [0, 1], and a ∈ (0, 1), we have

    ∏_{i∈S} (1 − a xi)^{−1} ≤ (1 − a)^{−Σ_{i∈S} xi}

Proof. Using the concavity of the function ∏_{i∈S} (1 − a xi)^{−1}, we can show that, for a fixed value of s = Σ_{i∈S} xi, its maximum occurs when at most one xi is fractional. So suppose s = z + r where z ∈ Z+ and r ∈ (0, 1). Then we have:

    ∏_{i∈S} (1 − a xi)^{−1} = (1 − a)^{−z} (1 − ra)^{−1} ≤ (1 − a)^{−z} (1 − a)^{−r} = (1 − a)^{−Σ_{i∈S} xi}

Proposition A.2. For any γ > 0, define

    f(x) = x / ((x + 1)^{(2√γ + γ − 2 ln(1+√γ))/γ} − 1)

Then f(x) is a decreasing function of x for x > 0.

Proof. At x = 0, both numerator and denominator are equal to zero. So it suffices to show that the denominator grows faster than the numerator, that is, that the derivative of the denominator is always ≥ 1. We compute the derivative of the denominator:

    R = [(γ + 2√γ − 2 ln(√γ + 1))/γ] (x + 1)^{(2√γ − 2 ln(1+√γ))/γ}
      ≥ [(γ + 0)/γ] (x + 1)^0    as y ≥ ln(1 + y) for y > 0
      = 1    as desired

Proposition A.3. For any 0 ≤ µ ≤ µ′ ≤ t we have Chernoff-U(µ, t) ≤ Chernoff-U(µ′, t).

Proof. Observe that Chernoff-U(µ, t) is monotonically increasing in µ in the range µ ∈ [0, t).

Proposition A.4. For any 0 ≤ µ ≤ t and any r ≤ µ, we have Chernoff-U(µ, t) ≤ Chernoff-U(µ − r, t − r).

Proof. Compute the directional derivative of Chernoff-U(µ, t) along the unit vector û = (1, 1).

Proposition A.5. Let f1(g) = 2g(−2 + ln(4 − 3g) − ln(8 − 3g) − ln(g + 4) + ln(g + 8) − 2 ln 2) and let f2(g) = 4g ln 2 − (4 − 3g) ln(4 − 3g) + (8 − 3g) ln(8 − 3g) + (4 + g) ln(4 + g) − (8 + g) ln(8 + g). For any g ∈ (0, 1) we have

    14/g > 1 + f1(g)/f2(g)    (11)

Proof. Let us first consider the denominator f2(g). Note that f2′′(g) is a rational function, and simple algebra shows that its only root is at g = −16/9. As f2′′(0) = −1, this implies that f2′′(g) < 0 for all g ∈ (0, 1). Thus, f2′(g) is decreasing in this range. As f2′(0) = 0, this implies that f2′(g) < 0 for g ∈ (0, 1). As f2(0) = 0, this further implies that f2(g) < 0 for g ∈ (0, 1).

We may thus cross-multiply (11), taking into account the fact that the denominator is negative. Thus to show (11) it suffices to show that h(g) < 0, where we define

    h(g) = (−5g² + 46g − 56) ln(4 − 3g) + (5g² − 50g + 112) ln(8 − 3g) + (g² + 10g + 56) ln(g + 4) + (−g² − 6g − 112) ln(g + 8) + 4g² + 56g ln 2

Simple calculus shows that h′′′(g) is a rational function of g, and it has no roots in the range (0, 1). As h′′′(0) = −75/8, this implies that h′′′(g) < 0 for all g ∈ (0, 1). As h′′(0) = −0.454, this implies that h′′(g) < 0 for all g ∈ (0, 1). As h′(0) = 0, this implies that h′(g) < 0 for all g ∈ (0, 1). As h(0) = 0, this implies that h(g) < 0 for all g ∈ (0, 1).
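Although the proof above is analytic, inequality (11) is also easy to sanity-check numerically; the following Python sketch evaluates the reconstructed f1, f2 on a grid of g values (a spot check, not a proof):

    import math

    def f1(g):
        return 2 * g * (-2 + math.log(4 - 3*g) - math.log(8 - 3*g)
                        - math.log(g + 4) + math.log(g + 8) - 2 * math.log(2))

    def f2(g):
        return (4 * g * math.log(2) - (4 - 3*g) * math.log(4 - 3*g)
                + (8 - 3*g) * math.log(8 - 3*g) + (4 + g) * math.log(4 + g)
                - (8 + g) * math.log(8 + g))

    assert all(14.0 / g > 1.0 + f1(g) / f2(g) for g in (i / 1000.0 for i in range(1, 1000)))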
