A CONSTRAINT ON EXTENSIBLE QUADRATURE RULES By Art B. Owen
Technical Report No. 2014-06 May 2014
Department of Statistics STANFORD UNIVERSITY Stanford, California 94305-4065
This research was supported in part by National Science Foundation grant DMS 0906056.
A constraint on extensible quadrature rules

Art B. Owen
Stanford University

April 2014

Abstract

When the worst case integration error in a family of functions decays as n^{-α} for some α > 1 and simple averages along an extensible sequence match that rate at a set of sample sizes n_1 < n_2 < ··· < ∞, then these sample sizes must grow at least geometrically. More precisely, n_{k+1}/n_k ≥ ρ must hold for a value 1 < ρ < 2 that increases with α. This result always rules out arithmetic sequences but never rules out sample size doubling. The same constraint holds in a root mean square setting.
1 Introduction
For both Monte Carlo (MC) and quasi-Monte Carlo (QMC) sampling, Sobol' (1998) recommends that the number n of sample points should be increased geometrically, not arithmetically. Specifically, he recommends using a sequence like n_1, 2n_1, 4n_1, etc., instead of n_1, 2n_1, 3n_1 and so on. For either MC or QMC, the estimate of an integral is a simple unweighted average of the integrand at n points. In the case of Monte Carlo sampling with its slow n^{-1/2} convergence rate, if n is too small to get a good answer then taking n + k sample points for k ≪ n is unlikely to bring a meaningful improvement in accuracy. Sobol' (1993) studies the correlations among Monte Carlo estimates along sample sizes n_k = 2^{k-1} n_1.

In quasi-Monte Carlo sampling, much better convergence rates are sometimes possible, depending particularly on the smoothness and dimension of the problem space. Novak and Wozniakowski (2010) provide a comprehensive treatise on error rates for numerical integration. In favorable cases, a small change in n might make a meaningful reduction in the error bound. What we show here is that rate-optimal sample sizes are widely spaced in those favorable cases.

Sobol' (1998) showed that a better rate than 1/n cannot hold uniformly for all n. Hickernell et al. (2012) extended this finding to arithmetic sequences of sample sizes. They also showed that unequally weighted averages of function evaluations can attain a better than 1/n rate at all values of n. Suppose for instance that the first n sample points are partitioned into blocks of n_j points, for j = 1, ..., J, where the estimate from block j has error O(n_j^{-α}). Then weighting those within-block estimates proportionally to n_j^a, with a ≥ α, attains an error O((J/n)^α). One can commonly arrange J = O(log n) and then the error is o(n^{-α+ε}) for any ε > 0.

It remains interesting to consider equal weight rules. They avoid cumbersome weighting of points. Also, there are quasi-Monte Carlo methods, such as higher order digital nets (Dick, 2011), that are simultaneously rate optimal for more than one class of functions, each with its own rate. In such settings we might want to use the same weights for multiple integrands, but no single weighting might serve them all best. Finally, if the constraints we find here on equal weight rules are unpalatable for some given problem, that provides motivation to switch to an unequally weighted rule.

An outline of this note is as follows. Section 2 presents the insight from the appendix of Sobol' (1998) and the extension by Hickernell et al. (2012). If a QMC rule has worst case error o(1/n) holding for all n, then the points x_i must have some very strange limit properties and the class of functions involved is odd enough that we could do very well using only one point x_n for very large n. A generalization of that argument shows that an o(1/n) rate along an arithmetic sequence of sample sizes raises similar problems. Section 3 shows that if quadrature error in a class F of functions has a worst case lower bound m n^{-α} for α > 1 and a specific sequence x_i is rate optimal at sample sizes n_1 < n_2 < ···, then necessarily n_{k+1}/n_k ≥ ρ for some constant 1 < ρ < 2 depending on α and on how close the sequence comes to having the optimal constant. Section 4 considers root mean squared error for sequences of sample points incorporating some randomness. The same constraints hold in this setting as in the worst case setting.
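The block-weighting argument above can be checked with a small numeric sketch. The block sizes, the rate α, and the unit error constant below are illustrative assumptions rather than values from any particular rule; the sketch compares the error bound of the plain pooled average with the bound for weights proportional to n_j^a with a = α.

```python
# Numeric check of the block-weighting bound: blocks j = 1..J hold
# n_j = 2^(j-1) points and (by assumption) the block estimates have
# worst-case errors e_j = n_j^(-alpha), with constant taken as 1.

alpha = 2.0
J = 10
sizes = [2 ** (j - 1) for j in range(1, J + 1)]
n = sum(sizes)                       # total number of points, 2^J - 1
errs = [nj ** (-alpha) for nj in sizes]

# Equal weighting of points: block j contributes with weight n_j / n.
equal_bound = sum((nj / n) * ej for nj, ej in zip(sizes, errs))

# Weighting block j proportionally to n_j^a with a = alpha.
wts = [nj ** alpha for nj in sizes]
weighted_bound = sum(w * e for w, e in zip(wts, errs)) / sum(wts)

print(equal_bound, weighted_bound, (J / n) ** alpha)
```

In this configuration the weighted bound falls below (J/n)^α, while the pooled equal-weight average is held near the 1/n rate by the small early blocks.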
2 Ruling out arithmetic sequences
It is not reasonable to expect a QMC rule to have errors of size o(1/n) for all values of n ≥ N_0 for some N_0 > 0. Intuitively, adding a single point makes a change of order 1/n to the estimated integral. Therefore two consecutive values of the integral estimate are ordinarily an order of magnitude farther apart from each other than they could be by the triangle inequality, with the true integral value making the third corner of the triangle. This idea is made precise in the appendix of Sobol' (1998), as we outline here.

Let μ = μ(f) = ∫ f(x) dx and μ̂_n = μ̂_n(f) = (1/n) Σ_{i=1}^n f(x_i). The integral is over [0,1]^d and x_i ∈ [0,1]^d for i ≥ 1. Let

\[ \eta_n = \eta_n(f) = \frac{1}{n}\sum_{i=1}^{n} f(x_i) - \mu \]

and suppose that the points x_i are chosen such that

\[ \sup_{f\in\mathcal{F}} |\eta_n(f)| \le \epsilon(n), \]
where ε(n) = o(1/n), and F is a class of integrands. Sobol' considered ε(n) = O(n^{-α}) for some α > 1. The classes F that we study are usually balls with respect to a seminorm, such as the standard deviation for Monte Carlo and the total variation in the sense of Hardy and Krause for quasi-Monte Carlo. Sobol' observed that

\[ |f(x_{n+1}) - \mu| = |(n+1)\eta_{n+1} - n\eta_n| \le (n+1)\epsilon(n+1) + n\epsilon(n) \to 0 \]

as n → ∞. As a result lim_{n→∞} f(x_n) = μ(f) for all f ∈ F. The set F cannot be very rich in this case. As Sobol' noted, for d = 1, if F contains x it cannot also contain x². If we had such a sequence x_i and a class F we might simply estimate μ by f(x_n) for one extremely large n, perhaps the largest one for which we can compute x_n. Alternatively, when the f ∈ F are all known to be integrable and anti-symmetric functions on [0,1]^d we can take x_1 = (1/2, ..., 1/2) and have zero error. This is the most favorable case for antithetic sampling (Hammersley and Morton, 1956). MC and QMC methods are ordinarily designed for more general purpose use, and so such special settings are of limited importance.

Hickernell et al. (2012) extend Sobol's argument to show that we should not expect ε(nk) = o(1/n) as n → ∞ for any integer k ≥ 1. We would then have a class of functions F with

\[ \lim_{n\to\infty} \frac{1}{k}\sum_{i=1}^{k} f(x_{nk+i}) = \mu(f) \]

for all f ∈ F. That is a very limited class, and once again, we could solve the problem uniformly over that class simply by taking the k points x_{nk+1}, ..., x_{(n+1)k} for some very large n.
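Sobol's observation rests on the telescoping identity f(x_{n+1}) = (n+1)μ̂_{n+1} − nμ̂_n: two consecutive equal-weight averages determine the newest function value exactly. A minimal sketch of the identity, with an arbitrary integrand and uniform random points standing in for a QMC sequence:

```python
import random

# Check f(x_{n+1}) = (n+1)*muhat_{n+1} - n*muhat_n.  If both averages
# were within eps(n) = o(1/n) of mu, this identity would force the
# single value f(x_{n+1}) itself to approach mu.
random.seed(1)
f = lambda x: x * x
xs = [random.random() for _ in range(101)]
n = 100

muhat_n = sum(f(x) for x in xs[:n]) / n
muhat_n1 = sum(f(x) for x in xs[:n + 1]) / (n + 1)
recovered = (n + 1) * muhat_n1 - n * muhat_n

assert abs(recovered - f(xs[n])) < 1e-12
```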
3 Geometric spacing for the worst case setting

The class F of real-valued functions on [0,1]^d has a superlinear worst case lower bound if

\[ \sup_{f\in\mathcal{F}} \Bigl| \frac{1}{n}\sum_{i=1}^{n} f(x_i) - \mu(f) \Bigr| > m n^{-\alpha} \tag{1} \]

holds for some α > 1, m > 0, all n ≥ 1 and all x_i ∈ [0,1]^d. There is a uniformly rate optimal sequence for this class if, for some x_i ∈ [0,1]^d and a sequence of sample sizes n_1 < n_2 < ···,

\[ \sup_{f\in\mathcal{F}} \Bigl| \frac{1}{n}\sum_{i=1}^{n} f(x_i) - \mu(f) \Bigr| \le M n^{-\alpha}, \qquad \forall n \in \{n_1, n_2, \ldots\} \tag{2} \]

holds, where m ≤ M < ∞.
Theorem 1. Let F have a worst case lower bound given by (1) with α > 1 and 0 < m ≤ M < ∞. Suppose that there also exists a uniformly rate optimal sequence x_i satisfying (2). If ρ = ρ_k = n_{k+1}/n_k, then

\[ \rho \ge 1 + \Bigl( \frac{m}{M}\,(1+\rho^{1-\alpha})^{-1} \Bigr)^{1/(\alpha-1)} \ge 1 + \Bigl( \frac{m}{2M} \Bigr)^{1/(\alpha-1)}. \tag{3} \]
Proof. From the lower bound (1), applied to the points x_{n_k+1}, ..., x_{n_{k+1}}, there is an f ∈ F with

\[
\begin{aligned}
m (n_{k+1}-n_k)^{-\alpha} &\le \Bigl| \frac{1}{n_{k+1}-n_k} \sum_{i=n_k+1}^{n_{k+1}} (f(x_i)-\mu) \Bigr| \\
&= (n_{k+1}-n_k)^{-1} \Bigl| \sum_{i=1}^{n_{k+1}} (f(x_i)-\mu) - \sum_{i=1}^{n_k} (f(x_i)-\mu) \Bigr| \\
&\le M (n_{k+1}-n_k)^{-1} \bigl( n_{k+1}^{1-\alpha} + n_k^{1-\alpha} \bigr),
\end{aligned}
\tag{4}
\]

where we have applied the upper bounds from (2). Writing ρ = n_{k+1}/n_k and rearranging (4) yields the first inequality in (3). For the final bound, note that ρ > 1 by construction, so ρ^{1-α} ≤ 1 and hence (1+ρ^{1-α})^{-1} ≥ 1/2; substituting this into the middle expression of (3) completes the proof.

A rate optimal sequence n_k must be spread out at least geometrically, with a ratio n_{k+1}/n_k ≥ ρ where ρ is now the smallest number satisfying the first inequality in (3). This ρ depends on m/M and on α. By definition ρ > 1. Also, we can find by inspection that ρ = 2 satisfies the first inequality in (3), and so Theorem 1 never rules out a doubling of the sample size. The lower bound on the critical extension factor n_{k+1}/n_k always satisfies 1 < ρ < 2. Figure 1 shows this extension bound as a function of α for various levels of the ratio m/M. The values there were computed via Brent's algorithm (Brent, 1973). The case m/M = 1 is of special interest. It describes a rate-optimal rule that also attains the optimal constant. That case has the highest bound on the extension factor.

If the rate n^{-α} is generalized to n^{-α} log(n)^β for α > 1 and β > 0, then a geometric spacing is still necessary. For any 1 < γ < α and large enough n_k it is necessary to have ρ_k = n_{k+1}/n_k ≥ 1 + (m/(2M))^{1/(γ-1)}.
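The first inequality in (3) bounds ρ only implicitly, since ρ appears on both sides. The smallest admissible ρ is the fixed point of ρ = 1 + ((m/M)(1 + ρ^{1-α})^{-1})^{1/(α-1)}. The paper computes it with Brent's algorithm; the sketch below uses plain bisection instead, which suffices here because the root always lies in the bracket [1, 2].

```python
def rho_bound(alpha, m_over_M, tol=1e-12):
    """Smallest rho with rho = 1 + (m/M * (1 + rho**(1-alpha))**-1)**(1/(alpha-1)).

    Bisection on [1, 2]: g(1) < 0 and g(2) > 0 whenever alpha > 1 and
    0 < m/M <= 1, so the root is bracketed (Theorem 1 gives 1 < rho < 2).
    """
    def g(rho):
        return rho - 1 - (m_over_M / (1 + rho ** (1 - alpha))) ** (1 / (alpha - 1))

    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(rho_bound(2.0, 1.0))
```

For α = 2 and m = M the fixed point equation reduces to ρ² − ρ − 1 = 0, so the bound is the golden ratio, approximately 1.618.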
[Figure 1: For worst case rates n^{-α}, lower bounds on the extension factor n_{k+1}/n_k in a uniformly rate-optimal integration sequence, plotted against the rate α (from 1 to 5) for m/M = 1, 0.5, 0.1 and 0.01. The bound lies in (1, 2) and increases with both α and m/M.]

4 The root mean square error setting

The class F of real-valued functions on [0,1]^d has a superlinear root mean square lower bound if

\[ \sup_{f\in\mathcal{F}} \Bigl( E\Bigl( \frac{1}{n}\sum_{i=1}^{n} f(x_i) - \mu(f) \Bigr)^{2} \Bigr)^{1/2} > m n^{-\alpha} \tag{5} \]

holds for some α > 1, m > 0, all n ≥ 1 and any random x_i ∈ [0,1]^d. Here E(·) denotes expectation with respect to the randomness in the x_i. The sequence of random x_i ∈ [0,1]^d is uniformly rate-optimal for this class if there is a sequence of sample sizes n_1 < n_2 < ··· for which

\[ \sup_{f\in\mathcal{F}} \Bigl( E\Bigl( \frac{1}{n}\sum_{i=1}^{n} f(x_i) - \mu(f) \Bigr)^{2} \Bigr)^{1/2} \le M n^{-\alpha}, \qquad \forall n \in \{n_1, n_2, \ldots\} \tag{6} \]

holds, where m ≤ M < ∞.

The same theorem holds for the root mean square error case as holds for the worst case. It is not necessary to assume that any of the f(x_i) are unbiased or to make any assumption about the correlation structure among the f(x_i). We only need to square one of the identities in Theorem 1 and then use standard moment inequalities from probability theory.

Theorem 2. Let F have a root mean square lower bound given by (5) with α > 1 and 0 < m ≤ M < ∞. Suppose that there also exists a uniformly rate optimal sequence of random x_i satisfying (6). If ρ = ρ_k = n_{k+1}/n_k, then

\[ \rho \ge 1 + \Bigl( \frac{m}{M}\,(1+\rho^{1-\alpha})^{-1} \Bigr)^{1/(\alpha-1)} \ge 1 + \Bigl( \frac{m}{2M} \Bigr)^{1/(\alpha-1)}. \tag{7} \]
Proof. To shorten some expressions, let Δ_k = n_{k+1} − n_k and recall that η_n = (1/n) Σ_{i=1}^n (f(x_i) − μ(f)). We begin with the identity

\[ \frac{1}{\Delta_k} \sum_{i=n_k+1}^{n_{k+1}} (f(x_i) - \mu) = \frac{1}{\Delta_k} \bigl( n_{k+1}\eta_{n_{k+1}} - n_k \eta_{n_k} \bigr). \tag{8} \]

By (5), there is an f ∈ F for which the expected square of the left side of (8) is no smaller than m²/Δ_k^{2α}. The expected square of the right side of (8) is

\[
\begin{aligned}
&\frac{1}{\Delta_k^2} \Bigl[ n_{k+1}^2 E(\eta_{n_{k+1}}^2) - 2 n_{k+1} n_k E(\eta_{n_{k+1}}\eta_{n_k}) + n_k^2 E(\eta_{n_k}^2) \Bigr] \\
&\quad\le \frac{1}{\Delta_k^2} \Bigl[ n_{k+1}^2 E(\eta_{n_{k+1}}^2) + 2 n_{k+1} n_k \sqrt{E(\eta_{n_{k+1}}^2)\,E(\eta_{n_k}^2)} + n_k^2 E(\eta_{n_k}^2) \Bigr] \\
&\quad\le \frac{M^2}{\Delta_k^2} \Bigl[ n_{k+1}^{2(1-\alpha)} + 2 n_{k+1}^{1-\alpha} n_k^{1-\alpha} + n_k^{2(1-\alpha)} \Bigr].
\end{aligned}
\]

As a result,

\[ \frac{m^2}{\Delta_k^{2\alpha}} \le \frac{M^2}{\Delta_k^2} \bigl[ n_{k+1}^{1-\alpha} + n_k^{1-\alpha} \bigr]^2. \tag{9} \]

Taking the square root of both sides of (9) and rearranging yields (4), from which the theorem follows just as it did for the worst case analysis.
5 Discussion

We have found a constraint on the spacings of a rate-optimal equal-weight extensible MC or QMC sequence. The constraint only applies when the convergence is better than O(1/n). We can use that constraint in reverse as follows. Suppose that a rate-optimal sequence n_k includes sample sizes with n_{k+1}/n_k = ρ ∈ (1, 2) and attains the error rate O(n^{-α}) for α > 1. Then from (3) we obtain

\[ \Bigl( \frac{M}{m} \Bigr)^{1/(\alpha-1)} \ge \frac{(1+\rho^{1-\alpha})^{-1/(\alpha-1)}}{\rho - 1}. \]

As we approach an arithmetic progression by letting ρ ↓ 1, the inefficiency in the constant factor increases without bound. The constraint only applies to rate-optimal sequences. In particular it does not apply to an extensible sequence that may be inefficient by a logarithmic factor.
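The reversed constraint can be read as a minimal inefficiency: raising both sides to the power α − 1 gives M/m ≥ (1 + ρ^{1-α})^{-1} (ρ − 1)^{1-α}. A short sketch tabulating this lower bound as ρ decreases toward 1 (the choice α = 2 is arbitrary):

```python
def min_inefficiency(alpha, rho):
    """Lower bound on M/m forced by extension factor rho at rate n**(-alpha):
    M/m >= (1 + rho**(1 - alpha))**(-1) * (rho - 1)**(1 - alpha)."""
    return (1 + rho ** (1 - alpha)) ** (-1) * (rho - 1) ** (1 - alpha)

alpha = 2.0
for rho in (1.9, 1.5, 1.1, 1.01):
    print(rho, min_inefficiency(alpha, rho))
```

The required constant-factor inefficiency grows without bound as ρ ↓ 1, which is the sense in which near-arithmetic spacings are ruled out for rate-optimal equal-weight rules.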
Acknowledgments

I thank Alex Kreinin and Sergei Kucherenko for pointing out to me the elegant argument in the Appendix of Sobol' (1998), and Erich Novak for sharing his slides from MCQMC 2014. This work was supported by grant DMS-0906056 of the U.S. National Science Foundation.
References

Brent, R. (1973). Algorithms for Minimization without Derivatives. Prentice-Hall, Englewood Cliffs, NJ.

Dick, J. (2011). Higher order scrambled digital nets achieve the optimal rate of the root mean square error for smooth integrands. The Annals of Statistics, 39(3):1372–1398.

Hammersley, J. M. and Morton, K. W. (1956). A new Monte Carlo technique: antithetic variates. Mathematical Proceedings of the Cambridge Philosophical Society, 52(3):449–475.

Hickernell, F. J., Kritzer, P., Kuo, F. Y., and Nuyens, D. (2012). Weighted compound integration rules with higher order convergence for all n. Numerical Algorithms, 59(2):161–183.

Novak, E. and Wozniakowski, H. (2010). Tractability of Multivariate Problems, Volume II: Standard Information for Functionals. European Mathematical Society, Zurich.

Sobol', I. M. (1993). Asymmetric convergence of approximations of the Monte Carlo method. Computational Mathematics and Mathematical Physics, 33(10):1391–1396.

Sobol', I. M. (1998). On quasi-Monte Carlo integrations. Mathematics and Computers in Simulation, 47:103–112.