Journal of Combinatorial Theory, Series A 77, 279-303 (1997), Article No. TA972747
Probabilistic Bounds on the Coefficients of Polynomials with Only Real Zeros*

Jim Pitman
Department of Statistics, University of California, 367 Evans Hall #3860, Berkeley, California 94720-3860

Communicated by the Managing Editors
Received March 22, 1996
The work of Harper and subsequent authors has shown that finite sequences $(a_0, \ldots, a_n)$ arising from combinatorial problems are often such that the polynomial $A(z) := \sum_{k=0}^{n} a_k z^k$ has only real zeros. Basic examples include rows from the arrays of binomial coefficients, Stirling numbers of the first and second kinds, and Eulerian numbers. Assuming that the $a_k$ are nonnegative, that $A(1) > 0$, and that $A(z)$ is not constant, it is known that $A(z)$ has only real zeros iff the normalized sequence $(a_0/A(1), \ldots, a_n/A(1))$ is the probability distribution of the number of successes in $n$ independent trials for some sequence of success probabilities. Such sequences $(a_0, \ldots, a_n)$ are also known to be characterized by total positivity of the infinite matrix $(a_{i-j})$ indexed by nonnegative integers $i$ and $j$. This paper reviews inequalities and approximations for such sequences, called Pólya frequency sequences, which follow from their probabilistic representation. In combinatorial examples these inequalities yield a number of improvements of known estimates. © 1997 Academic Press
1. INTRODUCTION

The work of Harper [58] and subsequent authors [60, 94, 62, 128, 119, 15, 16] has shown that finite sequences $(a_0, \ldots, a_n)$ arising from combinatorial problems are often such that the polynomial $A(z) := \sum_{k=0}^{n} a_k z^k$ has only real zeros. Typically $a_k = a_{nk}$ is the number of elements $\omega$ of some finite set $\Omega_n$ such that $S_n(\omega) = k$, for some function $S_n : \Omega_n \to \{0, 1, \ldots, n\}$. The normalized sequence $(a_0/A(1), \ldots, a_n/A(1))$ then describes the probability distribution of $S_n(\omega)$ for $\omega$ picked uniformly at random from $\Omega_n$. See Section 4 for examples and further references, and [39] for definitions of probabilistic terms.

* Research supported in part by NSF Grant MCS9404345. E-mail: pitman@stat.berkeley.edu.
0097-3165/97 $25.00. Copyright © 1997 by Academic Press. All rights of reproduction in any form reserved.
A sequence of real numbers $(a_k)_{k \in K}$ indexed by a subset $K$ of the nonnegative integers is called a Pólya frequency sequence of order $r$, abbreviated $PF_r$, if the infinite matrix $M := (a_{i-j})_{i,j = 0, 1, 2, \ldots}$, where $a_k = 0$ for $k \notin K$, is totally positive of order $r$. That is to say, for each $1 \le s \le r$, each $s \times s$ minor of $M$ has a nonnegative determinant. The sequence $(a_k)$ is called a Pólya frequency (PF) sequence if it is $PF_r$ for every $r = 1, 2, \ldots$. See Karlin [74] and Ando [4] for background on total positivity, and Brenti [16, 17] for recent combinatorial developments of this concept.

Proposition 1 [81, 108]. Let $(a_0, \ldots, a_n)$ be a sequence of nonnegative real numbers with associated polynomial $A(z) := \sum_{k=0}^{n} a_k z^k$ such that $A(1) > 0$. The following conditions are equivalent:

(i) the polynomial $A(z)$ is either constant or has only real zeros;

(ii) $(a_0, \ldots, a_n)$ is a PF sequence;

(iii) the normalized sequence $(a_0/A(1), \ldots, a_n/A(1))$ is the distribution of the number $S_n$ of successes in $n$ independent trials with probability $p_i$ of success on the $i$th trial, for some sequence of probabilities $0 \le p_i \le 1$. The roots of $A(z)$ are then given by $-(1 - p_i)/p_i$ for $i$ with $p_i > 0$.

The equivalence (i) $\Leftrightarrow$ (ii) is due to Aissen, Schoenberg, and Whitney [1, 108]. This equivalence is a special case of the more general representation of totally positive infinite sequences due to Edrei [33], which is recalled in Section 5. See also Chapter 8 of Karlin [74], Theorem 1.2, and Corollary 3.1. The equivalence (i) $\Leftrightarrow$ (iii), due to P. Lévy [80, 81], follows easily from the interpretation of $A(z)/A(1)$ as a probability-generating function.

Call an array of real numbers $(a_{nk}) = (a_{nk},\ 0 \le k \le n,\ n = 1, 2, \ldots)$ a PF array iff $(a_{nk},\ 0 \le k \le n)$ is a PF sequence for every $n = 1, 2, 3, \ldots$. Basic examples of PF arrays are provided by the binomial coefficients, the Stirling numbers of the first and second kinds, and the Eulerian numbers. Harper [58] and others have exploited the implication (i) $\Rightarrow$ (iii) to deduce normal approximations for the $n$th row of a PF array from the normal approximation to the distribution of $S_n$ as in (iii). Normal approximations have also been obtained for sequences of combinatorially defined distributions satisfying other conditions [20, 40, 41, 46]. But results in the probability and statistics literature, reviewed in Section 2, show that PF sequences satisfy a number of useful inequalities which do not hold for just any sequence that is approximately normal. As shown in Section 4, even for the two Stirling arrays, which have been extensively studied, the probabilistic bounds yield some improvements of known estimates.
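The correspondence in Proposition 1 between condition (iii) and the roots of $A(z)$ can be illustrated numerically. The following is a minimal sketch, not part of the original paper: the function names `success_count_pmf` and `evaluate` and the probabilities in `ps` are chosen here purely for illustration. It multiplies out $\prod_i (1 - p_i + p_i z)$ in exact rational arithmetic and checks that the resulting polynomial vanishes at each $-(1 - p_i)/p_i$.

```python
from fractions import Fraction

def success_count_pmf(ps):
    """Coefficients of prod_i (1 - p_i + p_i z); entry k is P(S_n = k)."""
    pmf = [Fraction(1)]
    for p in ps:
        nxt = [Fraction(0)] * (len(pmf) + 1)
        for k, a in enumerate(pmf):
            nxt[k] += a * (1 - p)      # failure on this trial
            nxt[k + 1] += a * p        # success on this trial
        pmf = nxt
    return pmf

def evaluate(coeffs, z):
    """Horner evaluation of sum_k coeffs[k] * z**k."""
    total = Fraction(0)
    for a in reversed(coeffs):
        total = total * z + a
    return total

# Illustrative success probabilities (an assumption, not from the paper).
ps = [Fraction(1, 2), Fraction(1, 3), Fraction(3, 4)]
pmf = success_count_pmf(ps)

# Roots predicted by Proposition 1: -(1 - p_i)/p_i for each p_i > 0.
roots = [-(1 - p) / p for p in ps if p > 0]
values_at_roots = [evaluate(pmf, r) for r in roots]
```

With these inputs the predicted roots are $-1$, $-2$, and $-1/3$, all real and nonpositive, as the proposition requires for a polynomial with nonnegative coefficients.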
The notion of a $PF_r$ sequence was developed early in this century by Fekete, Pólya, Schoenberg, and others. See [74] for a survey of this development. Polynomials with real coefficients and only real zeros were the subject of intensive study in the 19th and early 20th centuries, by Lagrange, Laguerre, Pólya, and many other authors. Much information about such polynomials can be found in [92, 101]. See also [68] and papers cited there.

As observed by Schoenberg [108], a sequence of nonnegative reals $(a_k)$ is $PF_2$ iff it is log-concave ($a_k^2 \ge a_{k-1} a_{k+1}$) and has no internal zeros ($i < j < k$ and $a_i a_k > 0 \Rightarrow a_j > 0$). Sequences with these properties, and the weaker property of unimodality, have been extensively studied in probability and statistics [76, 75, 130, 19, 122, 109, 56], and in combinatorics and other fields [116]. The $PF_r$ property for $r \ge 3$ is harder to describe intuitively. But see Brenti [16] for recent combinatorial interpretations of total positivity.

According to Newton's inequality [57, p. 52], if a polynomial $\sum_{k=0}^{n} a_k z^k$ with real coefficients has only real roots, and in particular if $(a_0, \ldots, a_n)$ is a PF sequence, then

$$a_k^2 \ge a_{k-1} a_{k+1} \left(1 + \frac{1}{k}\right) \left(1 + \frac{1}{n-k}\right) \tag{1}$$
which is stronger than the log-concavity implied by the $PF_2$ condition. Lévy [80, 81] noted that (1) is a constraint on the probabilities $a_k := P(S_n = k)$ for $S_n$ the number of successes in $n$ independent trials. Lévy also observed that (1) cannot be improved: given nonnegative $a_{k-1}$, $a_k$, $a_{k+1}$ satisfying (1) for some $1 \le k \le n$, there exists a PF sequence $(a_0, \ldots, a_n)$ with these terms at places $k-1$, $k$, $k+1$. As shown by Samuels [107], further applications of Newton's inequality imply that for each $r = 1, 2, \ldots$ the sequence of $r$th-order differences derived from a finite PF sequence has at most $r$ strict sign changes.
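Newton's inequality (1) is easy to test on a given coefficient sequence. The sketch below is illustrative only; the function name `satisfies_newton` is chosen here and does not come from the paper. It checks (1) at every interior index $1 \le k \le n-1$ using exact rational arithmetic. Note that passing this check is a necessary condition for real-rootedness, not a sufficient one.

```python
from fractions import Fraction
from math import comb

def satisfies_newton(a):
    """Check a_k^2 >= a_{k-1} a_{k+1} (1 + 1/k)(1 + 1/(n-k)) for 1 <= k <= n-1."""
    n = len(a) - 1
    for k in range(1, n):
        factor = (1 + Fraction(1, k)) * (1 + Fraction(1, n - k))
        if a[k] ** 2 < a[k - 1] * a[k + 1] * factor:
            return False
    return True

# Row n = 6 of the binomial array: (1) holds with equality at every k.
binomial_row = [comb(6, k) for k in range(7)]

# Stirling numbers of the second kind S(4, k), k = 0..4.
stirling2_row = [0, 1, 7, 6, 1]

# 1 + z + z^2 is log-concave (1 >= 1*1) but has complex zeros, so (1) must fail.
log_concave_not_pf = [1, 1, 1]

results = (
    satisfies_newton(binomial_row),
    satisfies_newton(stirling2_row),
    satisfies_newton(log_concave_not_pf),
)
```

The third sequence shows concretely that (1) is strictly stronger than log-concavity: for $n = 2$, $k = 1$ the right side of (1) is $4$ while $a_1^2 = 1$.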
2. REVIEW OF PROBABILISTIC RESULTS

Let $(a_0, \ldots, a_n)$ be a frequency sequence, that is, a sequence of nonnegative real numbers. Let $A'(z)$ and $A''(z)$ denote the first two derivatives of the polynomial $A(z) = \sum_i a_i z^i$. Abbreviate $A = A(1)$, $A' = A'(1)$, $A'' = A''(1)$, and assume throughout that $A(1) > 0$. Let $P$ denote the probability distribution on $\{0, 1, \ldots, n\}$ defined by normalization of $(a_0, \ldots, a_n)$. So, for example, $P(k) := a_k/A$ and, for an interval $[b, c]$,

$$P[b, c] := \sum_{b \le j \le c} P(j) = \frac{1}{A} \sum_{b \le j \le c} a_j. \tag{2}$$
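Let $\mu$ and $\sigma$ denote the mean and standard deviation of $P$. That is,

$$\mu := \frac{1}{A} \sum_{k=0}^{n} k a_k = \frac{A'}{A} \tag{3}$$

$$\sigma^2 := \frac{1}{A} \sum_{k=0}^{n} (k - \mu)^2 a_k = \frac{A''}{A} + \frac{A'}{A} - \left(\frac{A'}{A}\right)^2. \tag{4}$$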
In probabilistic language, if $S$ is a random variable with distribution $P$, then $S$ has expectation $\mu$ and variance $\sigma^2$. If $(a_0, \ldots, a_n)$ is a PF sequence then call $P$ a PF distribution. Say a random variable $X$ has Bernoulli($p$) distribution if $X$ assumes the values 0 and 1 with probabilities $P(X = 1) = p$ and $P(X = 0) = 1 - p$. According to Proposition 1, a probability distribution $P$ on $\{0, 1, \ldots, n\}$ is a PF distribution iff $P$ is the distribution of a sum of independent variables $S_n := X_1 + \cdots + X_n$, where $X_i$ has Bernoulli($p_i$) distribution. So for a PF distribution $P$ there are the following standard probabilistic expressions [39], which can also be checked algebraically using the fact that the $-(1 - p_i)/p_i$ are the roots of $A(z)$:

$$\mu = \sum_{i=1}^{n} p_i, \qquad \sigma^2 = \sum_{i=1}^{n} p_i (1 - p_i). \tag{5}$$
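The agreement between the derivative formulas (3) and (4) and the probabilistic expressions (5) can be confirmed on a small example. The sketch below is an illustration under stated assumptions: the helper names and the probabilities in `ps` are chosen here, not taken from the paper. It computes $A$, $A'$, $A''$ at $z = 1$ directly from the coefficients and compares the results with $\sum p_i$ and $\sum p_i(1 - p_i)$.

```python
from fractions import Fraction

def success_count_pmf(ps):
    """Coefficients of prod_i (1 - p_i + p_i z), computed by repeated convolution."""
    pmf = [Fraction(1)]
    for p in ps:
        nxt = [Fraction(0)] * (len(pmf) + 1)
        for k, a in enumerate(pmf):
            nxt[k] += a * (1 - p)
            nxt[k + 1] += a * p
        pmf = nxt
    return pmf

def moments_from_coefficients(a):
    """Mean and variance via (3) and (4), using A(1), A'(1), A''(1)."""
    A = sum(a)                                            # A(1)
    A1 = sum(k * ak for k, ak in enumerate(a))            # A'(1)
    A2 = sum(k * (k - 1) * ak for k, ak in enumerate(a))  # A''(1)
    mu = Fraction(A1) / A
    var = Fraction(A2) / A + mu - mu ** 2
    return mu, var

# Illustrative success probabilities (an assumption, not from the paper).
ps = [Fraction(1, 2), Fraction(1, 3), Fraction(3, 4)]
mu, var = moments_from_coefficients(success_count_pmf(ps))

# Expressions (5).
mu_direct = sum(ps)
var_direct = sum(p * (1 - p) for p in ps)
```

The function accepts unnormalized sequences as well, since $A$ cancels in (3) and (4); for instance the binomial row `[1, 4, 6, 4, 1]` gives mean 2 and variance 1.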
History and Terminology. What is called here a PF distribution is called in the statistics literature the distribution of the number of successes in independent trials. Such trials with two possible outcomes, success and failure, and varying probabilities of success, are known as Poisson trials or Poisson-binomial trials. The distribution of the number of successes $S_n$ is sometimes called a Poisson-binomial distribution, but that term has also acquired other meanings. Study of the distribution of $S_n$ dates back to the 1837 monograph of Poisson [99]. Chebyshev [24] established bounds for tail probabilities and the law of large numbers for the distribution of $S_n$. The work of subsequent authors, reviewed below, has provided sharper bounds for tail probabilities, precise estimates for location of the mode and median, and error bounds for normal and Poisson approximations.

The binomial $(n, p)$ array. The array of binomial coefficients is a PF array due to the factorization of the associated polynomials

$$\sum_{k=0}^{n} \binom{n}{k} z^k = (1 + z)^n. \tag{6}$$
Replace $z$ by $pz/q$ in (6), where $q := 1 - p$, and normalize to obtain the polynomial associated with the Binomial($n$, $p$) distribution. The corresponding PF array with parameter $0 \le p \le 1$, which describes the distribution of the number of successes in $n$ independent trials with constant success probability $p$, may be presented as:

$$P_{n,p}(k) := \binom{n}{k} p^k (1 - p)^{n-k} \qquad (0 \le k \le n). \tag{7}$$
Hoeffding's Inequalities [64]. Let $P$ be a PF distribution on $\{0, 1, \ldots, n\}$ and let $P_{n,p}$ as in (7) denote the binomial $(n, p)$ distribution with the same mean as $P$, that is, with $p = \mu/n$. Then for all integers $b$ and $d$ with $0 \le b \le \mu - 1$ and $\mu + 1 \le d \le n$,

$$P[0, b] \le P_{n,p}[0, b], \qquad P[d, n] \le P_{n,p}[d, n]. \tag{8}$$

Also, for every convex function $g$, there is the inequality

$$\sum_{j=0}^{n} g(j) P(j) \le \sum_{j=0}^{n} g(j) P_{n,p}(j). \tag{9}$$
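Hoeffding's inequalities (8) and (9) can be checked directly on a small PF distribution. The following is a minimal sketch under stated assumptions: the function names and the probabilities in `ps` are illustrative choices made here, and exact rational arithmetic is used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction
from math import comb, floor, ceil

def success_count_pmf(ps):
    """PF distribution: coefficients of prod_i (1 - p_i + p_i z)."""
    pmf = [Fraction(1)]
    for p in ps:
        nxt = [Fraction(0)] * (len(pmf) + 1)
        for k, a in enumerate(pmf):
            nxt[k] += a * (1 - p)
            nxt[k + 1] += a * p
        pmf = nxt
    return pmf

def binomial_pmf(n, p):
    """Binomial(n, p) probabilities as in (7)."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def hoeffding_tails_hold(ps):
    """Check (8) for every admissible integer b and d."""
    n, mu = len(ps), sum(ps)
    P, B = success_count_pmf(ps), binomial_pmf(n, mu / n)
    lower = all(sum(P[:b + 1]) <= sum(B[:b + 1])
                for b in range(0, floor(mu - 1) + 1))
    upper = all(sum(P[d:]) <= sum(B[d:])
                for d in range(ceil(mu + 1), n + 1))
    return lower and upper

def convex_gap(ps, g):
    """Right side of (9) minus left side; nonnegative when g is convex."""
    n, mu = len(ps), sum(ps)
    P, B = success_count_pmf(ps), binomial_pmf(n, mu / n)
    return sum(g(j) * (B[j] - P[j]) for j in range(n + 1))

# Illustrative probabilities with mean mu = 2 (an assumption, not from the paper).
ps = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10), Fraction(1, 2)]
tails_ok = hoeffding_tails_hold(ps)
gap = convex_gap(ps, lambda j: (j - 2) ** 2)   # convex g; gap is a variance difference
```

With $g(j) = (j - \mu)^2$ the gap is the binomial variance $np(1-p) = 1$ minus the PF variance $17/25$, a concrete instance of the binomial distribution having the larger spread.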
These inequalities make very precise sense of the following idea: amongst all PF distributions on $\{0, 1, \ldots, n\}$ with a given mean $\mu$, the binomial $(n, p)$ distribution for $p = \mu/n$ is the one that is "most spread out." See Gleser [47] for refinements and Marshall and Olkin [84] for a survey of related inequalities. Hoeffding showed also that for an arbitrary real-valued function $g$ any PF distribution $P$ that maximizes the sum on the left side of (9) over all PF distributions on $\{0, 1, \ldots, n\}$ is necessarily a shifted binomial distribution. That is to say, $P(k) = P_{m,p}(k - h)$ for all $h \le k \le h + m$, for some integers $h$ and $m$ with $0 \le h \le h + m \le n$ and some $p$ with $0 \le p \le 1$. For $g(j) = 1(j \ge k)$ this result dates back to Chebyshev [24], who combined it with bounds on binomial probabilities to obtain a weak law of large numbers.

Large Deviation Bounds. Good bounds for binomial tail probabilities were obtained by Okamoto [93] using the method of Chernoff [25]. Combined with Hoeffding's inequality (8), these bounds show that every PF distribution $P$ on $\{0, 1, \ldots, n\}$ is subject to

$$P[b, n] \le \left(\frac{\mu}{b}\right)^{b} \left(\frac{n - \mu}{n - b}\right)^{n-b} \qquad \text{for } \mu + 1 \le b \le n. \tag{10}$$
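The large deviation bound (10) depends on the distribution only through $\mu$ and $n$, which makes it simple to compute. The sketch below is illustrative: the function names and the probabilities in `ps` are assumptions made here, and the factor with exponent $n - b$ is read as 1 when $b = n$. It compares the bound with the exact upper tail of a small PF distribution.

```python
from fractions import Fraction
from math import ceil

def success_count_pmf(ps):
    """PF distribution: coefficients of prod_i (1 - p_i + p_i z)."""
    pmf = [Fraction(1)]
    for p in ps:
        nxt = [Fraction(0)] * (len(pmf) + 1)
        for k, a in enumerate(pmf):
            nxt[k] += a * (1 - p)
            nxt[k + 1] += a * p
        pmf = nxt
    return pmf

def tail_bound(mu, b, n):
    """Right side of (10); the second factor is taken to be 1 when b == n."""
    first = (Fraction(mu) / b) ** b
    if b == n:
        return first
    return first * (Fraction(n - mu) / (n - b)) ** (n - b)

# Illustrative probabilities with mean mu = 2 (an assumption, not from the paper).
ps = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10), Fraction(1, 2)]
n, mu = len(ps), sum(ps)
pmf = success_count_pmf(ps)

# (b, exact tail P[b, n], bound from (10)) for each integer b with mu + 1 <= b <= n.
checks = [(b, sum(pmf[b:]), tail_bound(mu, b, n))
          for b in range(ceil(mu + 1), n + 1)]
```

For this example the exact tail $P[3, 4] = 109/400$ sits well below the bound $16/27$; the bound is not tight here, but it requires no knowledge of the individual $p_i$.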
By an obvious reversal, the same function of $(\mu, b, n)$ is an upper bound on $P[0, b]$ for $0 \le b \le \mu - 1$. Numerous other bounds for binomial probabilities are known [39, 83, 112, 66, 14, 86, 69], any of which can be used to bound the tails of a PF sequence via (8). Appendix A of [3] derives the following simpler bounds for all PF distributions $P$, which are adequate for many purposes. For all $c > 0$,

$$P[0, \mu - c] \le \exp\left(-\frac{c^2}{2\mu}\right), \qquad P[\mu + c, n] \le \exp\left(-\frac{c^2}{2\mu} + \frac{c^3}{2\mu^2}\right). \tag{11}$$

See [64, 63, 44, 39, 3, 69] for variations, refinements, and generalizations of these inequalities and references to earlier results. If both the variance $\sigma^2$ and the mean $\mu$ are known or can be bounded, further tail bounds are available for a PF distribution which are sharper than either the above estimates or Chebyshev's inequality [12, 69].

Quite a different kind of bound, discovered by Nicholas Bernoulli for binomial probabilities around 1713, is presented in Section 16.3 of Hald [54]. This bound generalizes as follows to any $PF_2$ distribution on the integers, derived as in (2) by normalization of a summable $PF_2$ sequence $(a_k)$: for integers $b \le m \le c$ with $a_m > 0$,

$$P[b, c] \ge 1 - \max(a_b, a_c)/a_m. \tag{12}$$
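Bound (12) uses only three terms of the unnormalized sequence, so it can be evaluated without computing the normalizing constant. The sketch below is an illustration; the function name `bernoulli_lower_bound` and the choice of the binomial row $n = 10$ are assumptions made here. It evaluates the bound and compares it with the exact interval probability.

```python
from fractions import Fraction
from math import comb

def bernoulli_lower_bound(a, b, m, c):
    """Right side of (12): 1 - max(a_b, a_c) / a_m, for b <= m <= c and a_m > 0."""
    if not (b <= m <= c) or a[m] <= 0:
        raise ValueError("need b <= m <= c and a[m] > 0")
    return 1 - Fraction(max(a[b], a[c])) / a[m]

# Row n = 10 of the binomial array, a PF (hence PF_2) sequence with mode m = 5.
a = [comb(10, k) for k in range(11)]
b, m, c = 3, 5, 7

bound = bernoulli_lower_bound(a, b, m, c)       # uses a_3, a_5, a_7 only
exact = Fraction(sum(a[b:c + 1]), sum(a))       # P[3, 7], needs the full sum
```

Here the bound is $1 - 120/252 = 11/21$, against an exact value of $57/64$.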
The bound is nontrivial only if both $a_b$ and $a_c$ are less than $a_m$, so the best choice of $m$ is a mode of the distribution, as discussed in the next paragraph. Note that the bound can be computed without knowing the constant of normalization $A := \sum_k a_k$ when $P$ is defined via (2). Let $[i, j] = \sum_{i \le k \le j} a_k$. By choosing $b$, $c$, and $m$ so that $\max(a_b, a_c)/a_m$ [...] $0$ and $a_{k+1} > 0$, the more precise version of Darroch's rule stated above gives

$$a_k \ge \theta a_{k+1} \quad \text{if} \quad \mu(\theta) \le k + \frac{1}{k+2} \tag{17}$$

$$a_k \le \theta a_{k+1} \quad \text{if} \quad \mu(\theta) \ge k + 1 - \frac{1}{n-k+1}. \tag{18}$$
Let $l$ be the least $k$ and $r$ the greatest $k$ such that $a_k > 0$. For $(a_k)$ with $l$ [...] $0$, hence that $\mu(\theta)$ is continuous and strictly increasing from $l$ to $r$ as $\theta$ increases from 0 to $\infty$. For $0 < x$ [...] $0$. See [36, 32, 5, 8, 37]. Bender [10, Example 5.1] shows how the normal approximation (26) in this case yields the leading term of an asymptotic expansion for the Stirling numbers of the first kind due to Moser and Wyman [89]. There is no shortage of asymptotic approximations for these Stirling numbers [88, 124, 131, 67, 125], but little in the way of easily computable bounds. Consider, for instance, the problem of computing the ratio

$$r(n, k) := \genfrac{[}{]}{0pt}{}{n}{k} \genfrac{[}{]}{0pt}{}{n}{k+1}^{-1} \tag{33}$$

for large $n$ and $k$. According to (20), $\theta(n, k)$