Sparse Covers for Sums of Indicators
Constantinos Daskalakis∗
EECS and CSAIL, MIT
[email protected]

Christos Papadimitriou†
Computer Science, U.C. Berkeley
[email protected]

May 7, 2014
Abstract

A Poisson Binomial Distribution of Order $n$ is the discrete probability distribution of the sum of $n$ independent, not necessarily identically distributed, indicator random variables. We show that, for all $n \in \mathbb{N}$ and $\epsilon > 0$, there exists a subset $S_{n,\epsilon}$ of the set $S_n$ of Poisson Binomial distributions of order $n$ such that:
1. for all $D \in S_n$, there exists $D' \in S_{n,\epsilon}$ so that the total variation distance $d_{TV}(D, D') \le \epsilon$;
2. the size of $S_{n,\epsilon}$ is polynomial in $n$ and $(1/\epsilon)^{\log^2 (1/\epsilon)}$;
3. $S_{n,\epsilon}$ can be computed in time quasi-linear in $|S_{n,\epsilon}|$.
We discuss the implications of our construction for approximation algorithms and the computation of approximate Nash equilibria in anonymous games.
1 Introduction
A Poisson Binomial Distribution of Order $n$ is the discrete probability distribution of the sum of $n$ independent indicator random variables. The distribution is parameterized by a vector $(p_i)_{i=1}^n \in [0,1]^n$ of probabilities, and is denoted $PBD(p_1, \ldots, p_n)$. In this paper we establish that the set $S_n$ of all Poisson binomial distributions of order $n$ admits certain useful covers with respect to the total variation distance. Let $\epsilon > 0$ be an approximation parameter. A (proper) $\epsilon$-cover of a class $\mathcal{C}$ of distributions with respect to a metric $\|\cdot\,;\cdot\|$ over $\mathcal{C}$ is a subset $\hat{\mathcal{C}}$ of $\mathcal{C}$ with the property that for every $D \in \mathcal{C}$ there is a $\hat{D} \in \hat{\mathcal{C}}$ such that $\|D; \hat{D}\| \le \epsilon$. Covers are of interest in the design of algorithms, when one is searching a class of distributions to identify an element of the class with some quantitative property, or when optimizing over a class with respect to some objective. If the metric $\|\cdot\,;\cdot\|$ is relevant for the problem at hand, and the cover is discrete, relatively small and easy to construct, then one can provide a useful approximation to the sought distribution by searching the cover, instead of searching all of $\mathcal{C}$.

One situation of this sort arises in the computation of Nash equilibria in an important class of multiplayer games called anonymous games [Mil96, Blo99]. A two-strategy anonymous game has $n$ players with the same two strategies, 0 and 1, available to each player. The utility of each player $i$ is a function $u_i : \{0,1\} \times [n-1] \to [0,1]$; in particular, $u_i$ depends on player $i$'s own strategy, and only on the number (not the identities, hence the term "anonymous") of the other players playing
∗ Supported by a Sloan Foundation fellowship, a Microsoft Research faculty fellowship, and NSF Awards CCF-0953960 (CAREER) and CCF-1101491.
† Supported by NSF grant CCF-0964033 and a Google University Research Award.
strategy 1. The players may use randomized strategies, and we can represent player $i$'s strategy by an indicator random variable $X_i$, indicating whether player $i$ plays strategy 1. When studying games, one is interested in their stable operational state, captured by the Nash equilibrium [Nas50]. This is a collection of randomized strategies for the players of the game such that no player can improve her expected utility by switching to a different randomization, provided the others do not change their randomizations, and assuming that all players randomize independently of each other. In optimizing her strategy in response to the other players' strategies, player $i$ only cares about the aggregate strategy $\sum_{j \ne i} X_j$ of the others, due to the players' anonymity. In fact, if $X_i$ is optimal against $X_{-i} := \sum_{j \ne i} X_j$, then it is also approximately optimal against $X'_{-i} := \sum_{j \ne i} X'_j$, as long as $X_{-i}$ and $X'_{-i}$ are close in total variation distance. Exploiting this observation, the following is established in [DP07, DP09, DP13] using maximum-flow techniques: if there is an $\epsilon$-cover of the set of Poisson binomial distributions with respect to the total variation distance, then an $O(\epsilon)$-approximate Nash equilibrium of any given anonymous game with $n$ players can be found in time essentially the same as it takes to construct the cover. Thus, the existence of an $\epsilon$-cover of $S_n$ which can be constructed in time polynomial in $n$ implies a polynomial-time approximation scheme for the problem of computing Nash equilibria in anonymous games with two strategies per player. In this paper we show that such covers indeed exist. In particular, we show (Theorem 1) that for any $n$ and $\epsilon > 0$ there is an $\epsilon$-cover $S_{n,\epsilon}$ of $S_n$ consisting of $n^2 + n \cdot k^{O(\log^2 k)}$ distributions, where $k \approx \frac{1}{\epsilon}$ is an integer.

Proof Outline. The construction of our cover goes as follows. Starting from an arbitrary Poisson binomial distribution $PBD(p_1, \ldots, p_n) \in S_n$, we arrive at one within total variation distance $\epsilon$ whose probabilities are fractions with denominator $k^2$ or $n$. To do this, we first take care of the $p_i$'s that are close to zero or one through a Poisson approximation, rounding them either to 0 or 1, or to values at least $\frac{1}{k}$ away from 0 and 1. Then we show that there is a way to round all $p_i$'s to multiples of $\frac{1}{k^2}$ or $\frac{1}{n}$, while keeping the total variation distance in check. There are two cases: if most (all but at most $k^3$) of the $p_i$'s are zero or one, then a sequence of Binomial approximations establishes that a rounding of the remaining $p_i$'s into integer multiples of $\frac{1}{k^2}$ exists. Otherwise, we show that a single Binomial distribution whose probability is a multiple of $\frac{1}{n}$ achieves the approximation. So far (this is Theorem 2 in our exposition), we have arrived at an $\epsilon$-cover; however, its cardinality is exponential in $k$. We next show how to sparsify this cover by taking advantage of a theorem about Poisson binomial distributions that may be of interest in its own right: if the first $d$ moments of two Poisson binomial distributions with parameters in $[0,1/2]$ are identical, then their total variation distance is $2^{-\Omega(d)}$ (Theorem 3). This result allows us to sparsify the cover by eliminating elements of the cover that have identical first $d$ moments, for some integer $d = O(\log \frac{1}{\epsilon})$, obtaining a cover whose size is quasi-polynomial in $\frac{1}{\epsilon}$. This concludes the proof of Theorem 1.

Related Work. It is believed that Poisson [Poi37] was the first to study the Poisson binomial distribution, hence its name.
Sometimes the distribution is also referred to as "Poisson's Binomial Distribution." PBDs have many uses in research areas such as survey sampling, case-control studies, and survival analysis; see e.g. [CL97] for a survey of their uses. They are also very important in the design of randomized algorithms [MR95]. In Probability and Statistics there is a broad literature studying various properties of these distributions; see [Wan93] for an introduction to some of this work. Many results provide approximations to the Poisson binomial distribution via simpler distributions. In a well-known result, Le Cam [Cam60] shows that, for any vector $(p_i)_{i=1}^n \in [0,1]^n$,
\[ d_{TV}\!\left( PBD(p_1, \ldots, p_n),\ \mathrm{Poisson}\!\left( \sum_{i=1}^n p_i \right) \right) \le 2 \sum_{i=1}^n p_i^2, \]
where Poisson(λ) is the Poisson distribution with parameter λ. Subsequently many other proofs of this result and similar ones were given using a range of different techniques; [HC60, Che74, DP86, BHJ92] is a sampling of work along these lines, and Steele [Ste94] gives an extensive list of relevant references. Much work has also been done on approximating PBDs by normal distributions (see e.g. [Ber41, Ess42, Mik93, Vol95]) and by Binomial distributions (see e.g. [Ehm91, Soo96, Roo00]). These results provide structural information about PBDs that can be well-approximated via simpler distributions, but fall short of our goal of obtaining approximations of an unknown PBD up to arbitrary accuracy. Indeed, the approximations obtained in the probability literature (such as the Poisson, Normal and Binomial approximations) typically depend on the first few moments of the target PBD, while higher moments are crucial for arbitrary approximation [Roo00]. At the same time, algorithmic applications often require that the approximating distribution is of the same kind as the distribution that is being approximated. E.g., in the anonymous game application discussed above, the parameters of the target PBD correspond to the mixed strategies of players at Nash equilibrium, and the parameters of the approximating PBD represent mixed strategies at approximate Nash equilibrium, while approximating the target PBD via a Poisson or a Normal distribution wouldn’t have meaning in the context of a game. As outlined above, the proof of our main result, Theorem 1, builds on Theorems 2 and 3. A weaker form of Theorem 2 appeared in [Das08] building on techniques from [DP07]. Theorem 3 and a weaker form of Theorem 1 appeared in [DP09]. Theorem 1 appeared in [DDS12] with slightly worse parameters.
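To make Le Cam's bound concrete, here is a small numerical check. This snippet is ours, not part of the paper: the helper names (`pbd_pmf`, `poisson_pmf`) are invented for illustration, and the total variation distance is computed up to an ad hoc truncation of the Poisson support.

```python
import math

def pbd_pmf(ps):
    # pmf of a sum of independent indicators, by iterative convolution
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1 - p)       # indicator is 0
            new[k + 1] += mass * p         # indicator is 1
        pmf = new
    return pmf

def poisson_pmf(lam, n_max):
    return [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(n_max + 1)]

ps = [0.05, 0.10, 0.02, 0.08, 0.05]
N = 40  # truncation point; the Poisson tail beyond N is negligible here
P = pbd_pmf(ps)
P = P + [0.0] * (N + 1 - len(P))
Q = poisson_pmf(sum(ps), N)
tv = 0.5 * sum(abs(a - b) for a, b in zip(P, Q))
print(tv, "<=", 2 * sum(p * p for p in ps))  # Le Cam's bound on the right
```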
2 Definitions
For a positive integer $\ell$, we denote by $[\ell]$ the set $\{1, \ldots, \ell\}$. For a random variable $X$, we denote by $\mathcal{L}(X)$ its distribution. We further need the following definitions.

Total variation distance: For two distributions $P$ and $Q$ supported on a finite set $A$, their total variation distance is defined as
\[ \|P; Q\| := \frac{1}{2} \sum_{\alpha \in A} \big| P(\alpha) - Q(\alpha) \big|. \]
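The definition translates directly into code. This helper is ours, added for concreteness (the paper itself works purely analytically):

```python
def tv_distance(P, Q):
    """Total variation distance between two pmfs given as {outcome: prob} dicts."""
    support = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(a, 0.0) - Q.get(a, 0.0)) for a in support)

# Example: two biased coins
print(tv_distance({0: 0.5, 1: 0.5}, {0: 0.3, 1: 0.7}))  # 0.2
```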
If $X$ and $Y$ are random variables ranging over a finite set, their total variation distance, denoted by $\|X; Y\|$, is defined to equal $\|\mathcal{L}(X); \mathcal{L}(Y)\|$. We sometimes also use $d_{TV}(P, Q)$ and $d_{TV}(X, Y)$ for $\|P; Q\|$ and $\|X; Y\|$ respectively.

Covers: Let $\mathcal{F}$ be a set of probability distributions. A subset $\mathcal{G} \subseteq \mathcal{F}$ is called a (proper) $\epsilon$-cover of $\mathcal{F}$ in total variation distance if, for all $D \in \mathcal{F}$, there exists some $D' \in \mathcal{G}$ such that $\|D; D'\| \le \epsilon$.

Poisson Binomial Distribution: A Poisson binomial distribution of order $n \in \mathbb{N}$ is the discrete probability distribution of the sum $\sum_{i=1}^n X_i$ of $n$ mutually independent Bernoulli random variables $X_1, \ldots, X_n$. We denote the set of all Poisson binomial distributions of order $n$ by $S_n$. A Poisson binomial distribution $D \in S_n$ can be represented uniquely as a vector $(p_i)_{i=1}^n$ satisfying $0 \le p_1 \le p_2 \le \ldots \le p_n \le 1$. To go from $D \in S_n$ to its corresponding vector, we find a collection $X_1, \ldots, X_n$ of mutually independent indicators such that $\sum_{i=1}^n X_i$ is distributed according to $D$ and
$E[X_1] \le \ldots \le E[X_n]$. (Such a collection exists by the definition of a Poisson binomial distribution.) Then we set $p_i = E[X_i]$, for all $i$. We argue that the resulting representation is unique as follows.

Lemma 1. Let $X_1, \ldots, X_n$ be mutually independent indicators with expectations $p_1 \le p_2 \le \ldots \le p_n$ respectively. Similarly, let $Y_1, \ldots, Y_n$ be mutually independent indicators with expectations $q_1 \le \ldots \le q_n$ respectively. The distributions of $\sum_i X_i$ and $\sum_i Y_i$ are different if and only if $(p_1, \ldots, p_n) \ne (q_1, \ldots, q_n)$.

The proof of the lemma is provided in Section 6. We denote by $PBD(p_1, \ldots, p_n)$ the distribution of the sum $\sum_{i=1}^n X_i$ of mutually independent indicators $X_1, \ldots, X_n$ with expectations $p_i = E[X_i]$, for all $i$. Given the above discussion, $PBD(p_1, \ldots, p_n)$ is unique up to permutation of the $p_i$'s.

Translated Poisson Distribution: We say that an integer random variable $Y$ has a translated Poisson distribution with parameters $\mu$ and $\sigma^2$, and write $\mathcal{L}(Y) = TP(\mu, \sigma^2)$, if $\mathcal{L}(Y - \lfloor \mu - \sigma^2 \rfloor) = \mathrm{Poisson}(\sigma^2 + \{\mu - \sigma^2\})$, where $\{\mu - \sigma^2\}$ represents the fractional part of $\mu - \sigma^2$.

Order Notation: Let $f(x)$ and $g(x)$ be two positive functions defined on some infinite subset of $\mathbb{R}_+$. One writes $f(x) = O(g(x))$ if and only if, for sufficiently large values of $x$, $f(x)$ is at most a constant times $g(x)$; that is, $f(x) = O(g(x))$ if and only if there exist positive real numbers $M$ and $x_0$ such that $f(x) \le M g(x)$, for all $x > x_0$. Similarly, we write $f(x) = \Omega(g(x))$ if and only if there exist positive reals $M$ and $x_0$ such that $f(x) \ge M g(x)$, for all $x > x_0$. We are casual in our use of the order notation $O(\cdot)$ and $\Omega(\cdot)$ throughout the paper. Whenever we write $O(f(n))$ or $\Omega(f(n))$ in some bound where $n$ ranges over the integers, we mean that there exists a constant $c > 0$ such that the bound holds true for sufficiently large $n$ if we replace the $O(f(n))$ or $\Omega(f(n))$ in the bound by $c \cdot f(n)$. On the other hand, whenever we write $O(f(1/\epsilon))$ or $\Omega(f(1/\epsilon))$ in some bound where $\epsilon$ ranges over the positive reals, we mean that there exists a constant $c > 0$ such that the bound holds true for sufficiently small $\epsilon$ if we replace the $O(f(1/\epsilon))$ or $\Omega(f(1/\epsilon))$ in the bound with $c \cdot f(1/\epsilon)$.

We conclude with an easy but useful lemma whose proof we postpone to Section 6.

Lemma 2. Let $X_1, \ldots, X_n$ be mutually independent random variables, and let $Y_1, \ldots, Y_n$ be mutually independent random variables. Then
\[ \left\| \sum_{i=1}^n X_i\ ;\ \sum_{i=1}^n Y_i \right\| \le \sum_{i=1}^n \| X_i ; Y_i \|. \]

3 Main Result
Our main result is the existence of an $\epsilon$-cover of the set $S_n$ of Poisson binomial distributions of order $n$ whose size is polynomial in $n$ and quasi-polynomial in $1/\epsilon$.

Theorem 1 ($\epsilon$-cover of Poisson binomial distributions). For all $n$ and $\epsilon > 0$, there exists an $\epsilon$-cover $S_{n,\epsilon}$ of $S_n$ in total variation distance with the following properties:
1. $|S_{n,\epsilon}| \le n^2 + n \cdot \left(\frac{1}{\epsilon}\right)^{O(\log^2 1/\epsilon)}$;
2. $S_{n,\epsilon}$ can be computed in time $O(n^2 \log n) + O(n \log n) \cdot \left(\frac{1}{\epsilon}\right)^{O(\log^2 1/\epsilon)}$.

Moreover, all distributions $PBD(p_1, \ldots, p_n) \in S_{n,\epsilon}$ in the cover satisfy at least one of the following properties, for some positive integer $k = k(\epsilon) = O(1/\epsilon)$:
• ($k$-sparse form) there is some $\ell \le k^3$ such that, for all $i \le \ell$, $p_i \in \left\{ \frac{1}{k^2}, \frac{2}{k^2}, \ldots, \frac{k^2-1}{k^2} \right\}$ and, for all $i > \ell$, $p_i \in \{0, 1\}$; or
• ($(n,k)$-Binomial form) there is some $\ell \in \{1, \ldots, n\}$ and $q \in \left\{ \frac{1}{n}, \frac{2}{n}, \ldots, \frac{n}{n} \right\}$ such that, for all $i \le \ell$, $p_i = q$ and, for all $i > \ell$, $p_i = 0$; moreover, $\ell$ and $q$ satisfy $\ell q \ge k^2$ and $\ell q (1-q) \ge k^2 - k - 1$.

We provide an outline of the proof of Theorem 1 in Section 3.1 and the detailed proof in Section 3.2.
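To illustrate the shape of the cover, the following sketch (our code, not the paper's construction algorithm) enumerates the parameters of all distributions in $(n,k)$-Binomial form; as counted in the proof of Section 3.2, there are at most $n^2$ candidates:

```python
def binomial_form_params(n, k):
    """All (ell, q) with q = j/n satisfying the (n,k)-Binomial-form constraints."""
    out = []
    for ell in range(1, n + 1):
        for j in range(1, n + 1):
            q = j / n
            if ell * q >= k * k and ell * q * (1 - q) >= k * k - k - 1:
                out.append((ell, q))
    return out

print(len(binomial_form_params(n=1000, k=3)))  # at most n^2 pairs survive the filter
```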
3.1 Outline of Proof of Theorem 1, and further results
We start with an outline of the proof of Theorem 1, postponing its complete details to Section 3.2. The proof is obtained in two steps. We first establish the existence of an $\epsilon$-cover whose size is polynomial in $n$ and $(1/\epsilon)^{1/\epsilon^2}$. The existence of this cover is implied by Theorem 2. We then show that this cover can be pruned to size polynomial in $n$ and $(1/\epsilon)^{\log^2(1/\epsilon)}$. To prune it we use Theorem 3, which quantifies how the total variation distance between Poisson binomial distributions depends on the number of their first moments that are equal. We proceed to state the two ingredient Theorems 2 and 3. We start with Theorem 2, whose proof is given in Section 4.

Theorem 2. Let $X_1, \ldots, X_n$ be arbitrary mutually independent indicators, and $k \in \mathbb{N}$. Then there exist mutually independent indicators $Y_1, \ldots, Y_n$ satisfying the following:
1. $\left\| \sum_i X_i\ ;\ \sum_i Y_i \right\| \le 41/k$;
2. at least one of the following is true:
(a) ($k$-sparse form) there exists some $\ell \le k^3$ such that, for all $i \le \ell$, $E[Y_i] \in \left\{ \frac{1}{k^2}, \frac{2}{k^2}, \ldots, \frac{k^2-1}{k^2} \right\}$ and, for all $i > \ell$, $E[Y_i] \in \{0,1\}$; or
(b) ($(n,k)$-Binomial form) there is some $\ell \in \{1, \ldots, n\}$ and $q \in \left\{ \frac{1}{n}, \frac{2}{n}, \ldots, \frac{n}{n} \right\}$ such that, for all $i \le \ell$, $E[Y_i] = q$ and, for all $i > \ell$, $E[Y_i] = 0$; moreover, $\ell$ and $q$ satisfy $\ell q \ge k^2$ and $\ell q(1-q) \ge k^2 - k - 1$.

Theorem 2 implies the existence of an $\epsilon$-cover of $S_n$ in total variation distance whose size is $n^2 + n \cdot (1/\epsilon)^{O(1/\epsilon^2)}$. This cover can be obtained by enumerating over all Poisson binomial distributions of order $n$ that are in $k$-sparse or $(n,k)$-Binomial form, as defined in the statement of the theorem, for $k = \lceil 41/\epsilon \rceil$.

The next step is to sparsify this cover by removing elements to obtain Theorem 1. Note that the term $n \cdot (1/\epsilon)^{O(1/\epsilon^2)}$ in the size of the cover is due to the enumeration over distributions in sparse form. Using Theorem 3 below, we argue that there is a lot of redundancy among those distributions, and that it suffices to include only $n \cdot (1/\epsilon)^{O(\log^2 1/\epsilon)}$ of them in the cover. In particular, Theorem 3 establishes that, if two Poisson binomial distributions have their first $O(\log 1/\epsilon)$ moments equal, then their distance is at most $\epsilon$. So we need to include at most one sparse-form distribution with the same first $O(\log 1/\epsilon)$ moments in our cover. We proceed to state Theorem 3, postponing its proof to Section 5. In Section 3.2 we show how to use Theorems 2 and 3 to obtain Theorem 1.
Theorem 3. Let $P := (p_i)_{i=1}^n \in [0,1/2]^n$ and $Q := (q_i)_{i=1}^n \in [0,1/2]^n$ be two collections of probability values. Let also $X := (X_i)_{i=1}^n$ and $Y := (Y_i)_{i=1}^n$ be two collections of mutually independent indicators with $E[X_i] = p_i$ and $E[Y_i] = q_i$, for all $i \in [n]$. If for some $d \in [n]$ the following condition is satisfied:
\[ (C_d):\quad \sum_{i=1}^n p_i^\ell = \sum_{i=1}^n q_i^\ell, \quad \text{for all } \ell = 1, \ldots, d, \]
then
\[ \left\| \sum_i X_i\ ;\ \sum_i Y_i \right\| \le 13 (d+1)^{1/4}\, 2^{-(d+1)/2}. \tag{1} \]
Remark 1. Condition $(C_d)$ in the statement of Theorem 3 constrains the first $d$ power sums of the expectations of the constituent indicators of two Poisson binomial distributions. To relate these power sums to the moments of these distributions, we can use the theory of symmetric polynomials to arrive at the following condition, equivalent to $(C_d)$:
\[ (V_d):\quad E\left[ \left( \sum_{i=1}^n X_i \right)^{\!\ell}\, \right] = E\left[ \left( \sum_{i=1}^n Y_i \right)^{\!\ell}\, \right], \quad \text{for all } \ell \in [d]. \]
We provide a proof that $(C_d) \Leftrightarrow (V_d)$ in Proposition 1 of Section 6.

Remark 2. In view of Remark 1, Theorem 3 says the following:
“If two sums of independent indicators with expectations in $[0,1/2]$ have equal first $d$ moments, then their total variation distance is $2^{-\Omega(d)}$.”

We note that the bound (1) does not depend on the number of variables $n$, and in particular does not rely on summing a large number of variables. We also note that, since we impose no constraint on the expectations of the indicators, we also impose no constraint on the variance of the resulting Poisson binomial distributions. Hence we cannot use Berry-Esséen-type bounds to bound the total variation distance of the two Poisson binomial distributions by approximating them with Normal distributions. Finally, it is easy to see that Theorem 3 holds if we replace $[0,1/2]$ with $[1/2,1]$; see Corollary 1 in Section 5.
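As a quick numerical illustration of Remark 2 (ours, not part of the proof), take two different parameter vectors in $[0,1/2]$ whose first two power sums agree; the resulting PBDs are already quite close in total variation distance:

```python
def pbd_pmf(ps):
    # pmf of a sum of independent indicators, by iterative convolution
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1 - p)
            new[k + 1] += mass * p
        pmf = new
    return pmf

# Equal first and second power sums: 0.1+0.4+0.4 = 0.2+0.2+0.5 and
# 0.01+0.16+0.16 = 0.04+0.04+0.25, i.e. condition (C_2) holds.
P, Q = [0.1, 0.4, 0.4], [0.2, 0.2, 0.5]
tv = 0.5 * sum(abs(a - b) for a, b in zip(pbd_pmf(P), pbd_pmf(Q)))
print(tv)  # 0.016: small, in line with the 2^{-Omega(d)} behavior
```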
3.2 Proof of Theorem 1
We first argue that Theorem 2 already implies the existence of an $\epsilon$-cover $S'_{n,\epsilon}$ of $S_n$ in total variation distance of size at most $n^2 + n \cdot \left(\frac{1}{\epsilon}\right)^{O(1/\epsilon^2)}$. This cover is obtained by taking the union of all Poisson binomial distributions in $(n,k)$-Binomial form and all Poisson binomial distributions in $k$-sparse form, for $k = \lceil 41/\epsilon \rceil$. The total number of Poisson binomial distributions in $(n,k)$-Binomial form is at most $n^2$, since there are at most $n$ choices for the value of $\ell$ and at most $n$ choices for the value of $q$. The total number of Poisson binomial distributions in $k$-sparse form is at most $(k^3+1) \cdot k^{3k^2} \cdot (n+1) = n \cdot \left(\frac{1}{\epsilon}\right)^{O(1/\epsilon^2)}$, since there are $k^3+1$ choices for $\ell$, at most $k^{3k^2}$ choices of probabilities $p_1 \le p_2 \le \ldots \le p_\ell$ in $\left\{ \frac{1}{k^2}, \frac{2}{k^2}, \ldots, \frac{k^2-1}{k^2} \right\}$, and at most $n+1$ choices for the number of variables indexed by $i > \ell$ that have expectation equal to 1.¹ Notice that enumerating over the above distributions takes time $O(n^2 \log n) + O(n \log n) \cdot \left(\frac{1}{\epsilon}\right)^{O(1/\epsilon^2)}$, as a number in $\{0, \ldots, n\}$ and a probability in $\left\{ \frac{1}{n}, \frac{2}{n}, \ldots, \frac{n}{n} \right\}$ can be represented using $O(\log n)$ bits, while a number in $\{0, \ldots, k^3\}$ and a probability in $\left\{ \frac{1}{k^2}, \frac{2}{k^2}, \ldots, \frac{k^2-1}{k^2} \right\}$ can be represented using $O(\log k) = O(\log 1/\epsilon)$ bits.

¹ Note that imposing the condition $p_1 \le \ldots \le p_\ell$ won't lose us any Poisson binomial distribution in $k$-sparse form, given Lemma 1.

We next show that we can remove from $S'_{n,\epsilon}$ a large number of the sparse-form distributions it contains to obtain a $2\epsilon$-cover of $S_n$. In particular, we shall only keep $n \cdot \left(\frac{1}{\epsilon}\right)^{O(\log^2 1/\epsilon)}$ sparse-form
distributions by appealing to Theorem 3. To explain the pruning we introduce some notation. For a collection $P = (p_i)_{i \in [n]} \in [0,1]^n$ of probability values, we denote by $L_P = \{ i \mid p_i \in (0,1/2] \}$ and by $R_P = \{ i \mid p_i \in (1/2,1) \}$. Theorem 3, Lemma 2 and Lemma 1 imply that if two collections $P = (p_i)_{i \in [n]}$ and $Q = (q_i)_{i \in [n]}$ of probability values satisfy
\[ \sum_{i \in L_P} p_i^t = \sum_{i \in L_Q} q_i^t, \quad \text{for all } t = 1, \ldots, d; \]
\[ \sum_{i \in R_P} p_i^t = \sum_{i \in R_Q} q_i^t, \quad \text{for all } t = 1, \ldots, d; \]
and $(p_i)_{[n] \setminus (L_P \cup R_P)}$ and $(q_i)_{[n] \setminus (L_Q \cup R_Q)}$ are equal up to a permutation, then $d_{TV}(PBD(P), PBD(Q)) \le 2 \cdot 13 (d+1)^{1/4} 2^{-(d+1)/2}$. In particular, for some $d(\epsilon) = O(\log 1/\epsilon)$, this bound becomes at most $\epsilon$.

For a collection $P = (p_i)_{i \in [n]} \in [0,1]^n$, we define its moment profile $m_P$ to be the $(2d(\epsilon)+1)$-dimensional vector
\[ m_P = \left( \sum_{i \in L_P} p_i,\ \sum_{i \in L_P} p_i^2,\ \ldots,\ \sum_{i \in L_P} p_i^{d(\epsilon)};\ \ \sum_{i \in R_P} p_i,\ \ldots,\ \sum_{i \in R_P} p_i^{d(\epsilon)};\ \ |\{ i \mid p_i = 1 \}| \right). \]
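The moment profile translates directly into code. This helper is ours; note that an exact implementation should use rational arithmetic, since profiles are compared for equality:

```python
from fractions import Fraction

def moment_profile(ps, d):
    """The (2d+1)-dimensional moment profile m_P of a parameter collection ps."""
    L = [p for p in ps if 0 < p <= Fraction(1, 2)]
    R = [p for p in ps if Fraction(1, 2) < p < 1]
    return (tuple(sum(p ** t for p in L) for t in range(1, d + 1)) +
            tuple(sum(p ** t for p in R) for t in range(1, d + 1)) +
            (sum(1 for p in ps if p == 1),))

# e.g. with probabilities that are multiples of 1/k^2 for k = 3:
print(moment_profile([Fraction(1, 9), Fraction(4, 9), Fraction(8, 9), 1], d=2))
```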
By the previous discussion, for two collections $P, Q$, if $m_P = m_Q$ then $d_{TV}(PBD(P), PBD(Q)) \le \epsilon$. Given the above, we sparsify $S'_{n,\epsilon}$ as follows: for every possible moment profile that can arise from a Poisson binomial distribution in $k$-sparse form, we keep in our cover a single Poisson binomial distribution with that moment profile. The cover resulting from this sparsification is a $2\epsilon$-cover, since the sparsification loses us an additional $\epsilon$ in total variation distance, as argued above.

We now bound the cardinality of the sparsified cover. The total number of moment profiles of $k$-sparse Poisson binomial distributions is $k^{O(d(\epsilon)^2)} \cdot (n+1)$. Indeed, consider a Poisson binomial distribution $PBD(P = (p_i)_{i \in [n]})$ in $k$-sparse form. There are at most $k^3+1$ choices for $|L_P|$, at most $k^3+1$ choices for $|R_P|$, and at most $n+1$ choices for $|\{ i \mid p_i = 1 \}|$. We also claim that the total number of possible vectors
\[ \left( \sum_{i \in L_P} p_i,\ \sum_{i \in L_P} p_i^2,\ \ldots,\ \sum_{i \in L_P} p_i^{d(\epsilon)} \right) \]
is $k^{O(d(\epsilon)^2)}$. Indeed, if $|L_P| = 0$ there is just one such vector, namely the all-zero vector. If $|L_P| > 0$ then, for all $t = 1, \ldots, d(\epsilon)$, $\sum_{i \in L_P} p_i^t \in (0, |L_P|]$ and it must be an integer multiple of $1/k^{2t}$. So the total number of possible values of $\sum_{i \in L_P} p_i^t$ is at most $k^{2t} |L_P| \le k^{2t} k^3$, and the total number of possible vectors is at most
\[ \prod_{t=1}^{d(\epsilon)} k^{2t} k^3 \le k^{O(d(\epsilon)^2)}. \]
The same upper bound applies to the total number of possible vectors $\left( \sum_{i \in R_P} p_i,\ \sum_{i \in R_P} p_i^2,\ \ldots,\ \sum_{i \in R_P} p_i^{d(\epsilon)} \right)$.
The moment profiles we enumerated over are a superset of the moment profiles of $k$-sparse Poisson binomial distributions; we call them compatible moment profiles. We argued that there are at most $k^{O(d(\epsilon)^2)} \cdot (n+1)$ compatible moment profiles, so the total number of Poisson binomial distributions in $k$-sparse form that we keep in the cover is at most $k^{O(d(\epsilon)^2)} \cdot (n+1) = n \cdot \left(\frac{1}{\epsilon}\right)^{O(\log^2 1/\epsilon)}$. The number of Poisson binomial distributions in $(n,k)$-Binomial form is the same as before, i.e. at most $n^2$, as we did not eliminate any of them. So the size of the sparsified cover is $n^2 + n \cdot \left(\frac{1}{\epsilon}\right)^{O(\log^2 1/\epsilon)}$.

To finish the proof, it remains to argue that we don't actually need to first compute $S'_{n,\epsilon}$ and then sparsify it to obtain our cover, but can produce it directly in time $O(n^2 \log n) + O(n \log n) \cdot \left(\frac{1}{\epsilon}\right)^{O(\log^2 1/\epsilon)}$. We claim that, given a moment profile $m$ that is compatible with a $k$-sparse Poisson binomial distribution, we can compute some $PBD(P = (p_i)_i)$ in $k$-sparse form such that $m_P = m$, if such a distribution exists, in time $O(n \log n) \left(\frac{1}{\epsilon}\right)^{O(\log^2 1/\epsilon)}$. This follows from Claim 1 of Section 6.² So our algorithm enumerates over all moment profiles that are compatible with a $k$-sparse Poisson binomial distribution and, for each profile, invokes Claim 1 to find a Poisson binomial distribution with that moment profile, if such a distribution exists, adding it to the cover if it does. It then enumerates over all Poisson binomial distributions in $(n,k)$-Binomial form and adds them to the cover as well. The overall running time is as promised.
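The following toy implementation is ours: it brute-forces the sparse forms instead of invoking Claim 1, so it is exponential and usable only for very small $n$ and $k$, but it mirrors the sparsification step faithfully: enumerate $k$-sparse forms and keep one representative per moment profile.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

def sparse_part_of_cover(n, k, d):
    grid = [Fraction(j, k * k) for j in range(1, k * k)]  # multiples of 1/k^2
    seen, cover = set(), []
    for ell in range(0, min(k ** 3, n) + 1):
        for ones in range(0, n - ell + 1):
            for body in combinations_with_replacement(grid, ell):
                low = tuple(sum(p ** t for p in body if p <= Fraction(1, 2))
                            for t in range(1, d + 1))
                high = tuple(sum(p ** t for p in body if p > Fraction(1, 2))
                             for t in range(1, d + 1))
                profile = (low, high, ones)
                if profile not in seen:  # keep one PBD per moment profile
                    seen.add(profile)
                    cover.append(list(body) + [1] * ones + [0] * (n - ell - ones))
    return cover

print(len(sparse_part_of_cover(n=4, k=2, d=2)))
```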
4 Proof of Theorem 2
We begin with a sketch of the argument. We obtain the variables $Y_1, \ldots, Y_n$ in two steps. We first massage the given variables $X_1, \ldots, X_n$ to obtain variables $Z_1, \ldots, Z_n$ such that
\[ \left\| \sum_i X_i\ ;\ \sum_i Z_i \right\| \le 7/k; \tag{2} \]
\[ \text{and} \quad E[Z_i] \notin \left( 0, \frac{1}{k} \right) \cup \left( 1 - \frac{1}{k}, 1 \right). \tag{3} \]
These variables do not necessarily satisfy Properties 2a or 2b in the statement of Theorem 2, but they allow us to define variables $Y_1, \ldots, Y_n$ which do satisfy these properties and, moreover,
\[ \left\| \sum_i Z_i\ ;\ \sum_i Y_i \right\| \le 34/k. \tag{4} \]
(2), (4) and the triangle inequality imply $\left\| \sum_i X_i ; \sum_i Y_i \right\| \le \frac{41}{k}$, concluding the proof of Theorem 2.
Let us call Stage 1 the process of determining the $Z_i$'s and Stage 2 the process of determining the $Y_i$'s. The two stages are described briefly below, and in detail in Sections 4.1 and 4.2 respectively.

² A naive application of Claim 1 results in running time $O(n^3 \log n) \cdot \left(\frac{1}{\epsilon}\right)^{O(\log^2 1/\epsilon)}$. We can improve this as follows: for all possible values $|L_P|, |R_P|$ such that $|L_P| + |R_P| \le \min(k^3, n - m_{2d(\epsilon)+1})$, we invoke Claim 1 with $\tilde{n} = 1$, $\delta = d(\epsilon)$, $B = k^3$, $n_0 = n_1 = 0$, $n_s = |L_P|$, $n_b = |R_P|$, and moments $\mu_\ell = m_\ell$, for $\ell = 1, \ldots, d(\epsilon)$, and $\mu'_\ell = m_{d(\epsilon)+\ell}$, for $\ell = 1, \ldots, d(\epsilon)$. If for some pair $|L_P|, |R_P|$ the algorithm succeeds in finding probabilities matching the provided moments, we set $m_{2d(\epsilon)+1}$ of the remaining probabilities equal to 1 and the rest to 0. Otherwise, we output "fail."
For convenience, we use the following notation: for $i = 1, \ldots, n$, $p_i = E[X_i]$ will denote the expectation of the given indicator $X_i$, $p'_i = E[Z_i]$ the expectation of the intermediate indicator $Z_i$, and $q_i = E[Y_i]$ the expectation of the target indicator $Y_i$.

Stage 1: Recall that our goal in this stage is to define a Poisson binomial distribution $\sum_i Z_i$ whose constituent indicators have no expectation in $T_k := (0, \frac{1}{k}) \cup (1 - \frac{1}{k}, 1)$. The expectations $(p'_i = E[Z_i])_i$ are defined in terms of the corresponding $(p_i)_i$ as follows. For all $i$, if $p_i \notin T_k$ we set $p'_i = p_i$. Then, if $L_k$ is the set of indices $i$ such that $p_i \in (0, 1/k)$, we choose any $(p'_i)_{i \in L_k}$ so as to satisfy $\left| \sum_{i \in L_k} p_i - \sum_{i \in L_k} p'_i \right| \le 1/k$ and $p'_i \in \{0, 1/k\}$, for all $i \in L_k$. Similarly, if $H_k$ is the set of indices $i$ such that $p_i \in (1 - 1/k, 1)$, we choose any $(p'_i)_{i \in H_k}$ so as to satisfy $\left| \sum_{i \in H_k} p_i - \sum_{i \in H_k} p'_i \right| \le 1/k$ and $p'_i \in \{1 - 1/k, 1\}$, for all $i \in H_k$. Using results on Poisson approximation [BHJ92], we argue that the resulting Poisson binomial distribution $\sum_i Z_i$ satisfies (2). The details are given in Section 4.1.

Stage 2: The definition of $(q_i)_i$ depends on the number $m$ of $p'_i$'s which are not 0 or 1. The case $m \le k^3$ corresponds to Case 2a in the statement of Theorem 2, while $m > k^3$ corresponds to Case 2b.

• Case $m \le k^3$: First, we set $q_i = p'_i$ if $p'_i \in \{0,1\}$. Then we argue that each $p'_i$, $i \in M := \{ i \mid p'_i \notin \{0,1\} \}$, can be rounded to some $q_i$ which is an integer multiple of $1/k^2$, so that (4) holds. Notice that, if we were allowed to use multiples of $1/k^4$, this would be immediate via an application of Lemma 2:
\[ \left\| \sum_i Z_i\ ;\ \sum_i Y_i \right\| \le \sum_{i \in M} |p'_i - q_i|. \]
We improve the required accuracy to $1/k^2$ via a series of Binomial approximations to the Poisson binomial distribution [Ehm91]. The details are in Section 4.2.1.
• Case $m > k^3$: Using the translated Poisson approximation of the Poisson binomial distribution [Röl07], we show that $\sum_i Z_i$ can be approximated by a Binomial distribution $B(m', q)$, where $m' \le n$ and $q$ is an integer multiple of $\frac{1}{n}$. In particular, we show that an appropriate choice of $m'$ and $q$ implies (4), if we set $m'$ of the $q_i$'s equal to $q$ and the remaining equal to 0. The details are in Section 4.2.2.
4.1 Details of Stage 1
Recall that $L_k := \{ i \mid i \in [n] \wedge p_i \in (0, 1/k) \}$ and $H_k := \{ i \mid i \in [n] \wedge p_i \in (1 - 1/k, 1) \}$. We define $(p'_i)_i$ formally as follows. First, we set $p'_i = p_i$, for all $i \in [n] \setminus (L_k \cup H_k)$. It follows that
\[ \left\| \sum_{i \in [n] \setminus (L_k \cup H_k)} X_i\ ;\ \sum_{i \in [n] \setminus (L_k \cup H_k)} Z_i \right\| = 0. \tag{5} \]
Next, we define the probabilities $p'_i$, $i \in L_k$, using the following procedure:
1. Set $r = \left\lfloor \frac{\sum_{i \in L_k} p_i}{1/k} \right\rfloor$; and let $L'_k \subseteq L_k$ be an arbitrary subset of cardinality $|L'_k| = r$.
2. Set $p'_i = \frac{1}{k}$, for all $i \in L'_k$, and $p'_i = 0$, for all $i \in L_k \setminus L'_k$.
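A sketch of the Stage 1 rounding in code. This is ours, with invented names; the floor in `r` matches the property $\left| \sum_{i \in L_k} p_i - r/k \right| \le 1/k$ used below, and the high group is handled by the mirror argument on $1 - p_i$:

```python
import math

def stage1_round(ps, k):
    """Round all p_i out of (0,1/k) and (1-1/k,1), moving each group sum by < 1/k."""
    qs = list(ps)
    low = [i for i, p in enumerate(ps) if 0 < p < 1 / k]
    r = math.floor(k * sum(ps[i] for i in low))       # r/k approximates the low sum
    for rank, i in enumerate(low):
        qs[i] = 1 / k if rank < r else 0.0
    high = [i for i, p in enumerate(ps) if 1 - 1 / k < p < 1]
    s = math.floor(k * sum(1 - ps[i] for i in high))  # mirror argument on 1 - p_i
    for rank, i in enumerate(high):
        qs[i] = 1 - 1 / k if rank < s else 1.0
    return qs

print(stage1_round([0.01, 0.03, 0.5, 0.99], k=10))  # [0.0, 0.0, 0.5, 1.0]
```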
We bound the total variation distance $\left\| \sum_{i \in L_k} X_i\ ;\ \sum_{i \in L_k} Z_i \right\|$ using the Poisson approximation to the Poisson binomial distribution:

Theorem 4 ([BHJ92]). Let $J_1, \ldots, J_n$ be mutually independent indicators with $E[J_i] = t_i$. Then
\[ \left\| \sum_{i=1}^n J_i\ ;\ \mathrm{Poisson}\!\left( \sum_{i=1}^n t_i \right) \right\| \le \frac{\sum_{i=1}^n t_i^2}{\sum_{i=1}^n t_i}. \]

Theorem 4 implies
\[ \left\| \sum_{i \in L_k} X_i\ ;\ \mathrm{Poisson}\!\left( \sum_{i \in L_k} p_i \right) \right\| \le \frac{\sum_{i \in L_k} p_i^2}{\sum_{i \in L_k} p_i} \le \frac{\frac{1}{k} \sum_{i \in L_k} p_i}{\sum_{i \in L_k} p_i} = 1/k. \]
Similarly, $\left\| \sum_{i \in L_k} Z_i\ ;\ \mathrm{Poisson}\!\left( \sum_{i \in L_k} p'_i \right) \right\| \le 1/k$. Finally, we use Lemma 3 (given below and proved in Section 6) to bound the distance
\[ \left\| \mathrm{Poisson}\!\left( \sum_{i \in L_k} p_i \right)\ ;\ \mathrm{Poisson}\!\left( \sum_{i \in L_k} p'_i \right) \right\| \le \frac{1}{2} \left( e^{1/k} - e^{-1/k} \right) \le \frac{1.5}{k}, \]
where we used that $\left| \sum_{i \in L_k} p_i - \sum_{i \in L_k} p'_i \right| \le 1/k$. Using the triangle inequality, the above imply
\[ \left\| \sum_{i \in L_k} X_i\ ;\ \sum_{i \in L_k} Z_i \right\| \le \frac{3.5}{k}. \tag{6} \]
Lemma 3 (Variation Distance of Poisson Distributions). Let $\lambda_1, \lambda_2 > 0$. Then
\[ \| \mathrm{Poisson}(\lambda_1)\ ;\ \mathrm{Poisson}(\lambda_2) \| \le \frac{1}{2} \left( e^{|\lambda_1 - \lambda_2|} - e^{-|\lambda_1 - \lambda_2|} \right). \]
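A quick numerical sanity check of Lemma 3 (ours, with an ad hoc truncation of the infinite supports):

```python
import math

def poisson_pmf(lam, n_max):
    return [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(n_max + 1)]

l1, l2 = 3.0, 3.2
N = 60  # truncation; both tails beyond N are negligible for these parameters
tv = 0.5 * sum(abs(a - b) for a, b in zip(poisson_pmf(l1, N), poisson_pmf(l2, N)))
bound = 0.5 * (math.exp(abs(l1 - l2)) - math.exp(-abs(l1 - l2)))
print(tv, "<=", bound)  # roughly 0.04 <= 0.20
```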
We follow a similar rounding scheme to define $(p'_i)_{i \in H_k}$ from $(p_i)_{i \in H_k}$. That is, we round some of the $p_i$'s to $1 - 1/k$ and some of them to 1, so that $\left| \sum_{i \in H_k} p_i - \sum_{i \in H_k} p'_i \right| \le 1/k$. As a result, we get (to see this, repeat the argument employed above on the variables $1 - X_i$ and $1 - Z_i$, $i \in H_k$)
\[ \left\| \sum_{i \in H_k} X_i\ ;\ \sum_{i \in H_k} Z_i \right\| \le \frac{3.5}{k}. \tag{7} \]
Using (5), (6), (7) and Lemma 2, we get (2).
4.2 Details of Stage 2

Recall that $M := \{ i \mid p'_i \notin \{0,1\} \}$ and $m := |M|$. Depending on whether $m \le k^3$ or $m > k^3$, we follow different strategies to define $(q_i)_i$.
4.2.1 The Case m ≤ k³

First we set $q_i = p'_i$, for all $i \in [n] \setminus M$. It follows that
\[ \left\| \sum_{i \in [n] \setminus M} Z_i\ ;\ \sum_{i \in [n] \setminus M} Y_i \right\| = 0. \tag{8} \]
For the definition of $(q_i)_{i \in M}$, we make use of the following approximation, shown by Ehm using Stein's method.

Theorem 5 ([Ehm91]). Let $J_1, \ldots, J_n$ be mutually independent indicators with $E[J_i] = t_i$, and $\bar{t} = \frac{\sum_i t_i}{n}$. Then
\[ \left\| \sum_{i=1}^n J_i\ ;\ B(n, \bar{t}) \right\| \le \frac{\sum_{i=1}^n (t_i - \bar{t})^2}{(n+1)\, \bar{t} (1 - \bar{t})}. \]
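Again as a numerical illustration (ours, not from the paper), Ehm's bound for a small cluster of nearby probabilities:

```python
from math import comb

def pbd_pmf(ps):
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1 - p)
            new[k + 1] += mass * p
        pmf = new
    return pmf

def binom_pmf(n, p):
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

ts = [0.30, 0.32, 0.34, 0.36]
tbar = sum(ts) / len(ts)
tv = 0.5 * sum(abs(a - b) for a, b in zip(pbd_pmf(ts), binom_pmf(len(ts), tbar)))
bound = sum((t - tbar) ** 2 for t in ts) / ((len(ts) + 1) * tbar * (1 - tbar))
print(tv, "<=", bound)  # Ehm's bound on the right
```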
We partition $M$ as $M = M_l \sqcup M_h$, where $M_l = \{ i \in M \mid p'_i \le 1/2 \}$. We describe below a procedure for defining $(q_i)_{i \in M_l}$ so that the following hold:
1. $\left\| \sum_{i \in M_l} Z_i\ ;\ \sum_{i \in M_l} Y_i \right\| \le 17/k$;
2. for all $i \in M_l$, $q_i$ is an integer multiple of $1/k^2$.

To define $(q_i)_{i \in M_h}$, we apply the same procedure to $(1 - p'_i)_{i \in M_h}$ to obtain $(1 - q_i)_{i \in M_h}$. Assuming the correctness of our procedure for probabilities $\le 1/2$, the following then also hold:
1. $\left\| \sum_{i \in M_h} Z_i\ ;\ \sum_{i \in M_h} Y_i \right\| \le 17/k$;
2. for all $i \in M_h$, $q_i$ is an integer multiple of $1/k^2$.
Using Lemma 2, the above bounds imply
\[ \left\| \sum_{i \in M} Z_i\ ;\ \sum_{i \in M} Y_i \right\| \le \left\| \sum_{i \in M_l} Z_i\ ;\ \sum_{i \in M_l} Y_i \right\| + \left\| \sum_{i \in M_h} Z_i\ ;\ \sum_{i \in M_h} Y_i \right\| \le 34/k. \tag{9} \]
Now that we have (9), using (8) and Lemma 2 we get (4).
So we only need to define $(q_i)_{i \in M_l}$. To do this, we partition $M_l$ as $M_l = M_{l,1} \sqcup M_{l,2} \sqcup \ldots \sqcup M_{l,k-1}$, where for all $j$:
\[ M_{l,j} = \left\{ i \in M_l\ \middle|\ p'_i \in \left[ \frac{1}{k} + \frac{(j-1)j}{2} \frac{1}{k^2},\ \frac{1}{k} + \frac{(j+1)j}{2} \frac{1}{k^2} \right) \right\}. \]
(Notice that the length of the interval defining $M_{l,j}$ is $j \frac{1}{k^2}$.) Now, for each $j = 1, \ldots, k-1$ such that $M_{l,j} \ne \emptyset$, we define $(q_i)_{i \in M_{l,j}}$ via the following procedure (a code sketch follows the correctness claims below):
1. Set $p_{j,\min} := \frac{1}{k} + \frac{(j-1)j}{2} \frac{1}{k^2}$, $p_{j,\max} := \frac{1}{k} + \frac{(j+1)j}{2} \frac{1}{k^2}$, $n_j = |M_{l,j}|$, and $\bar{p}_j = \frac{\sum_{i \in M_{l,j}} p'_i}{n_j}$.
2. Set $r = \left\lfloor \frac{n_j (\bar{p}_j - p_{j,\min})}{j \cdot 1/k^2} \right\rfloor$; let $M'_{l,j} \subseteq M_{l,j}$ be an arbitrary subset of cardinality $r$.
3. Set $q_i = p_{j,\max}$, for all $i \in M'_{l,j}$;
4. for an arbitrary index $i^*_j \in M_{l,j} \setminus M'_{l,j}$, set $q_{i^*_j} = n_j \bar{p}_j - (r\, p_{j,\max} + (n_j - r - 1) p_{j,\min})$;
5. finally, set $q_i = p_{j,\min}$, for all $i \in M_{l,j} \setminus M'_{l,j} \setminus \{i^*_j\}$.
It is easy to see that
1. $\sum_{i \in M_{l,j}} q_i \equiv \sum_{i \in M_{l,j}} p'_i = n_j \bar{p}_j$;
2. for all $i \in M_{l,j} \setminus \{i^*_j\}$, $q_i$ is an integer multiple of $1/k^2$.
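In code, one round of the above procedure might look as follows. This sketch is ours; it assumes the inputs lie in the $j$-th interval $[p_{j,\min}, p_{j,\max})$, so that $r < n_j$ and step 4 always has an index to use:

```python
import math

def round_group(ps, k, j):
    """Round probabilities in the j-th interval to p_min/p_max (multiples of
    1/k^2), preserving the exact group sum via a single leftover index i*."""
    p_min = 1 / k + (j - 1) * j / (2 * k * k)
    p_max = 1 / k + (j + 1) * j / (2 * k * k)
    nj, pbar = len(ps), sum(ps) / len(ps)
    r = math.floor(nj * (pbar - p_min) / (j / (k * k)))
    leftover = nj * pbar - (r * p_max + (nj - r - 1) * p_min)
    return [p_max] * r + [leftover] + [p_min] * (nj - r - 1)

qs = round_group([0.113, 0.117, 0.118], k=10, j=2)
print(qs, sum(qs))  # the sum matches 0.113 + 0.117 + 0.118 (up to float error)
```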
Moreover, Theorem 5 implies:
\[ \left\| \sum_{i \in M_{l,j}} Z_i\ ;\ B(n_j, \bar{p}_j) \right\| \le \frac{\sum_{i \in M_{l,j}} (p'_i - \bar{p}_j)^2}{(n_j + 1)\, \bar{p}_j (1 - \bar{p}_j)} \le \begin{cases} \dfrac{n_j \left( j \frac{1}{k^2} \right)^2}{(n_j + 1)\, p_{j,\min} (1 - p_{j,\min})}, & \text{when } j < k-1 \\[2ex] \dfrac{n_j \left( j \frac{1}{k^2} \right)^2}{(n_j + 1)\, p_{j,\max} (1 - p_{j,\max})}, & \text{when } j = k-1 \end{cases} \ \le\ \frac{8}{k^2}. \]
A similar derivation gives $\left\| \sum_{i \in M_{l,j}} Y_i\ ;\ B(n_j, \bar{p}_j) \right\| \le \frac{8}{k^2}$. So, by the triangle inequality:
\[ \left\| \sum_{i \in M_{l,j}} Z_i\ ;\ \sum_{i \in M_{l,j}} Y_i \right\| \le \frac{16}{k^2}. \tag{10} \]
As (10) holds for all $j = 1, \ldots, k-1$, an application of Lemma 2 gives:
\[ \left\| \sum_{i \in M_l} Z_i\ ;\ \sum_{i \in M_l} Y_i \right\| \le \sum_{j=1}^{k-1} \left\| \sum_{i \in M_{l,j}} Z_i\ ;\ \sum_{i \in M_{l,j}} Y_i \right\| \le \frac{16}{k}. \]
Moreover, the $q_i$'s defined above are integer multiples of $1/k^2$, except maybe for $q_{i^*_1}, \ldots, q_{i^*_{k-1}}$. But we can round these to their closest multiples of $1/k^2$, increasing $\left\| \sum_{i \in M_l} Z_i\ ;\ \sum_{i \in M_l} Y_i \right\|$ by at most $1/k$.

4.2.2 The Case m > k³
Let $t = |\{ i \mid p'_i = 1 \}|$. We show that the random variable $\sum_i Z_i$ is within total variation distance $9/k$ from the Binomial distribution $B(m', q)$, where
\[ m' := \left\lceil \frac{\left( \sum_{i \in M} p'_i + t \right)^2}{\sum_{i \in M} p'^2_i + t} \right\rceil \quad \text{and} \quad q := \frac{\ell^*}{n}, \]
where $\ell^*$ satisfies:
• $\left( \sum_{i \in M} p'_i + t \right)^2 \le \left( \sum_{i \in M} p'^2_i + t \right) (m + t)$, by the Cauchy-Schwarz inequality; and
• $\frac{\sum_{i \in M} p'_i + t}{m'} \in \left[ \frac{\ell^* - 1}{n}, \frac{\ell^*}{n} \right]$. Notice that:
\[ \frac{\sum_{i \in M} p'_i + t}{\frac{\left( \sum_{i \in M} p'_i + t \right)^2}{\sum_{i \in M} p'^2_i + t}} = \frac{\sum_{i \in M} p'^2_i + t}{\sum_{i \in M} p'_i + t} \le 1. \]
So $m' \le m + t \le n$, and there exists some $\ell^* \in \{1, \ldots, n\}$ so that $\frac{\sum_{i \in M} p'_i + t}{m'} \in \left[ \frac{\ell^* - 1}{n}, \frac{\ell^*}{n} \right]$.
For fixed $m'$ and $q$, we set $q_i = q$, for all $i \le m'$, and $q_i = 0$, for all $i > m'$, and compare the distributions of $\sum_{i \in M} Z_i$ and $\sum_{i \in M} Y_i$. For convenience we define
\[ \mu := E\left[ \sum_{i \in M} Z_i \right] \quad \text{and} \quad \mu' := E\left[ \sum_{i \in M} Y_i \right], \]
\[ \sigma^2 := \mathrm{Var}\left[ \sum_{i \in M} Z_i \right] \quad \text{and} \quad \sigma'^2 := \mathrm{Var}\left[ \sum_{i \in M} Y_i \right]. \]
The following lemma compares the values $\mu, \mu', \sigma, \sigma'$.

Lemma 4. The following hold:
\[ \mu \le \mu' \le \mu + 1, \tag{11} \]
\[ \sigma^2 - 1 \le \sigma'^2 \le \sigma^2 + 2, \tag{12} \]
\[ \mu \ge k^2, \tag{13} \]
\[ \sigma^2 \ge k^2 \left( 1 - \frac{1}{k} \right). \tag{14} \]
The proof of Lemma 4 is given in Section 6. To compare $\sum_{i \in M} Z_i$ and $\sum_{i \in M} Y_i$, we approximate both by translated Poisson distributions, making use of the following theorem, due to Röllin [Röl07].
Theorem 6 ([Röl07]). Let $J_1, \ldots, J_n$ be a sequence of independent random indicators with $E[J_i] = p_i$. Then
\[ \left\| \sum_{i=1}^n J_i\ ;\ TP(\mu, \sigma^2) \right\| \le \frac{\sqrt{\sum_{i=1}^n p_i^3 (1 - p_i)} + 2}{\sum_{i=1}^n p_i (1 - p_i)}, \]
where $\mu = \sum_{i=1}^n p_i$ and $\sigma^2 = \sum_{i=1}^n p_i (1 - p_i)$.
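For reference, the translated Poisson pmf from the definition in Section 2, in code (ours, added for concreteness):

```python
import math

def translated_poisson_pmf(mu, sigma2, m):
    """pmf of TP(mu, sigma^2) at integer m: Poisson(sigma^2 + frac(mu - sigma^2)),
    shifted by floor(mu - sigma^2)."""
    shift = math.floor(mu - sigma2)
    lam = sigma2 + (mu - sigma2 - shift)  # sigma^2 plus the fractional part
    k = m - shift
    return 0.0 if k < 0 else math.exp(-lam) * lam ** k / math.factorial(k)

# Example: a TP with mean 10 and variance ~9; the pmf sums to ~1
print(sum(translated_poisson_pmf(10.0, 9.0, m) for m in range(0, 60)))
```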
Theorem 6 implies that
\[ \left\| \sum_i Z_i\ ;\ TP(\mu, \sigma^2) \right\| \le \frac{\sqrt{\sum_i p'^3_i (1 - p'_i)} + 2}{\sum_i p'_i (1 - p'_i)} \le \frac{\sqrt{\sum_i p'_i (1 - p'_i)} + 2}{\sum_i p'_i (1 - p'_i)} \]
\[ \le \frac{1}{\sqrt{\sum_i p'_i (1 - p'_i)}} + \frac{2}{\sum_i p'_i (1 - p'_i)} = \frac{1}{\sigma} + \frac{2}{\sigma^2} \le \frac{1}{k \sqrt{1 - 1/k}} + \frac{2}{k^2 \left( 1 - \frac{1}{k} \right)} \qquad \text{(using (14))} \]
\[ \le \frac{3}{k}, \]
where for the last inequality we assumed $k \ge 3$, but the bound of $3/k$ clearly also holds for $k = 1, 2$. Similarly,
\[ \left\| \sum_i Y_i\ ;\ TP(\mu', \sigma'^2) \right\| \le \frac{1}{\sigma'} + \frac{2}{\sigma'^2} \le \frac{1}{\sqrt{k^2 \left( 1 - \frac{1}{k} \right) - 1}} + \frac{2}{k^2 \left( 1 - \frac{1}{k} \right) - 1} \qquad \text{(using (12),(14))} \]
\[ \le \frac{3}{k}, \]
where for the last inequality we assumed $k \ge 3$, but the bound of $3/k$ clearly also holds for $k = 1, 2$. By the triangle inequality we then have that
\[ \left\| \sum_i Z_i\ ;\ \sum_i Y_i \right\| \le \left\| \sum_i Z_i\ ;\ TP(\mu, \sigma^2) \right\| + \left\| \sum_i Y_i\ ;\ TP(\mu', \sigma'^2) \right\| + \left\| TP(\mu, \sigma^2)\ ;\ TP(\mu', \sigma'^2) \right\| \]
\[ \le 6/k + \left\| TP(\mu, \sigma^2)\ ;\ TP(\mu', \sigma'^2) \right\|. \tag{15} \]
It remains to bound the total variation distance between the two translated Poisson distributions. We make use of the following lemma.

Lemma 5 ([BL06]). Let $\mu_1, \mu_2 \in \mathbb{R}$ and $\sigma_1^2, \sigma_2^2 \in \mathbb{R}_+ \setminus \{0\}$ be such that $\lfloor \mu_1 - \sigma_1^2 \rfloor \le \lfloor \mu_2 - \sigma_2^2 \rfloor$. Then
\[ \left\| TP(\mu_1, \sigma_1^2)\ ;\ TP(\mu_2, \sigma_2^2) \right\| \le \frac{|\mu_1 - \mu_2|}{\sigma_1} + \frac{|\sigma_1^2 - \sigma_2^2| + 1}{\sigma_1^2}. \]

Lemma 5 implies
\[ \left\| TP(\mu, \sigma^2)\ ;\ TP(\mu', \sigma'^2) \right\| \le \frac{|\mu - \mu'|}{\min(\sigma, \sigma')} + \frac{|\sigma^2 - \sigma'^2| + 1}{\min(\sigma^2, \sigma'^2)} \]
\[ \le \frac{1}{\sqrt{k^2 \left( 1 - \frac{1}{k} \right) - 1}} + \frac{3}{k^2 \left( 1 - \frac{1}{k} \right) - 1} \qquad \text{(using Lemma 4)} \]
\[ \le 3/k, \tag{16} \]
where for the last inequality we assumed $k > 3$, but the bound clearly also holds for $k = 1, 2, 3$. Using (15) and (16) we get
\[ \left\| \sum_i Z_i\ ;\ \sum_i Y_i \right\| \le 9/k, \tag{17} \]
which implies (4).
5 Proof of Theorem 3
The rough idea for the proof of Theorem 3 is this. First we express $PBD(p_1, \ldots, p_n)$ as a weighted sum of the Binomial distribution $B(n, p)$ at $p = \bar{p} = \sum_i p_i / n$ and its first $n$ derivatives with respect to $p$, also at the value $p = \bar{p}$. (These derivatives correspond to finite signed measures.) We notice that the coefficients of the first $d$ terms of this expansion are symmetric polynomials in $p_1, \ldots, p_n$ of degree at most $d$. Hence, from the theory of symmetric polynomials, each of these coefficients can be written as a function of the power-sum symmetric polynomials $\sum_i p_i^\ell$ for $\ell = 1, \ldots, d$. So, if two Poisson binomial distributions satisfy Condition $(C_d)$, the first $d$ terms of their expansions are exactly identical, and the total variation distance of the distributions depends only on the remaining terms of the expansion (those corresponding to higher derivatives of the Binomial distribution). The proof is concluded by showing that the joint contribution of these terms to the total variation distance can be bounded by $2^{-\Omega(d)}$. We proceed to provide the details.

Proof of Theorem 3: For the purposes of this proof, we denote by $B_{n,p}(m)$ the probability assigned by the Binomial distribution $B(n, p)$ to the integer $m$. The following theorem of Roos [Roo00] specifies
an expansion of the Poisson binomial distribution as a weighted sum of a finite number of signed measures: the Binomial distribution $B(n, p)$ (for an arbitrary choice of $p$) and its first $n$ derivatives with respect to the parameter $p$, at the chosen value of $p$. Namely,

Theorem 7 ([Roo00]). Let $P := (p_i)_{i=1}^n \in [0,1]^n$, let $X_1, \ldots, X_n$ be mutually independent indicators with expectations $p_1, \ldots, p_n$, and let $X = \sum_i X_i$. Then, for all $m \in \{0, \ldots, n\}$ and $p \in [0,1]$,
\[ \Pr[X = m] = \sum_{\ell=0}^n \alpha_\ell(P, p) \cdot \delta^\ell B_{n,p}(m), \tag{18} \]
where for the purposes of the above expression:
• $\alpha_0(P, p) := 1$ and, for $\ell \in [n]$:
\[ \alpha_\ell(P, p) := \sum_{1 \le k(1) < \cdots < k(\ell) \le n}\ \prod_{r=1}^{\ell} \left( p_{k(r)} - p \right); \]