On Efficient Estimation of the Variance of the Randomised Quasi Monte Carlo Estimate

John Haslett, Chaitanya Joshi and Rountina Vrousai Geveli
Department of Statistics, Trinity College Dublin

June 20, 2011

Abstract

A new method to estimate the variance of an RQMC estimate is proposed. We expect this method to be computationally inexpensive and to produce quite accurate estimates for a wide range of integrands. This is 'proof of concept' work and is therefore at an early stage.

1 Introduction

Let U_1, ..., U_n denote the random variables corresponding to the first n members of a d-dimensional scrambled sequence, i.e. each U_i ∼ U(0,1)^d, for i = 1, ..., n. Let f : [0,1)^d → ℝ be a square Lebesgue-integrable function, i.e. f ∈ L². Then a randomised Quasi-Monte Carlo (RQMC) estimator of the integral

    I = ∫_{[0,1)^d} f(u) du    (1)

is given by

    Î = (1/n) ∑_{i=1}^{n} f(u_i),    (2)

where u_1, ..., u_n denote the realisations of the n random variables U_1, ..., U_n.
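As a concrete illustration, the estimator in Equation (2) can be sketched as follows. The scrambled Sobol' sequence (via SciPy) and the example integrand are our assumptions for illustration; the paper does not fix a particular scrambled sequence.

```python
# Sketch of the RQMC estimator of Eq. (2). The scrambled Sobol' sequence
# (scipy.stats.qmc) is an illustrative choice of randomised sequence.
import numpy as np
from scipy.stats import qmc


def rqmc_estimate(f, d, n, seed=None):
    """Return I_hat = (1/n) * sum_i f(u_i) for one scrambled sequence."""
    u = qmc.Sobol(d=d, scramble=True, seed=seed).random(n)  # n points in [0, 1)^d
    return float(np.mean(f(u)))


# Hypothetical example integrand: f(u) = sum_j u_j, with true integral d/2.
f_sum = lambda u: u.sum(axis=1)
I_hat = rqmc_estimate(f_sum, d=5, n=128, seed=0)  # close to 2.5
```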

It can be shown that for any f ∈ L² such an estimator is unbiased, i.e. E(Î) = I (see, e.g., Owen (1995), Lemieux (2009)), and that its variance is o(n⁻¹), and never much worse than the Monte Carlo variance (see Owen (1997a)). In fact, under the stronger assumption that f is sufficiently smooth, Owen (1997b) has shown that the variance is o(n⁻³ (log n)^{s−1}), a much faster rate of convergence than for the MC estimator. Further insight into the efficiency of the RQMC method (over MC) when the dimension d is large can be gained by performing the ANOVA decomposition of the function f. This is obtained by a multiresolution analysis (wavelet decomposition) of f in L²[0,1)^d. Owen (1997a) has shown that when the function f is approximately a superposition of low-dimensional and somewhat coarse functions, RQMC using scrambled nets can provide a big improvement in efficiency over the MC method even for large d. Thus, it is important to note that for functions which do not possess such a structure, RQMC might be only as efficient as MC, especially when d is large.

2 Variance Estimation

In practice, n is always finite, and therefore the order-of-convergence results are not very helpful in determining the uncertainty associated with an RQMC estimate for a given n. Because a scrambled sequence U_1, ..., U_n has a very strong and non-decaying dependency structure, estimating the variance of the estimate Î is not as straightforward as it is for an MC estimate.

2.1 Existing Methods

It is still possible, in more than one way, to estimate the variance of the RQMC estimator. One way is to sample M independent copies of U_1, ..., U_n and thus produce M independent copies of Î, which can then be used to estimate the variance of the estimator. Owen (1997a) proposed an alternative method, which essentially involves considering a scrambled sequence to be 'internally replicated'; if certain conditions are met, it is then possible to obtain a conservative estimate of the variance. In the same paper, an upper bound on the variance is also determined, and it is shown that for any f ∈ L² the RQMC variance can never be greater than the MC variance times a small constant.

Once an estimate of V(Î) is obtained using one of the above methods, the central limit theorem (CLT) for scrambled sequences proved by Loh (2003) can be used to provide an uncertainty bound for the estimate Î.
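The first, replication-based method can be sketched as follows; the scrambled Sobol' points and the sum integrand are illustrative assumptions on our part.

```python
# Sketch of variance estimation by M independent scrambled replications:
# the M copies of I_hat are i.i.d., so their sample variance estimates V(I_hat).
import numpy as np
from scipy.stats import qmc


def replicated_rqmc(f, d, n, M, seed=0):
    estimates = np.array([
        np.mean(f(qmc.Sobol(d=d, scramble=True, seed=seed + m).random(n)))
        for m in range(M)
    ])
    # Point estimate, and an unbiased estimate of V(I_hat) for one replicate.
    return estimates.mean(), estimates.var(ddof=1)


f_sum = lambda u: u.sum(axis=1)   # hypothetical integrand, true integral d/2
I_hat, V_hat = replicated_rqmc(f_sum, d=5, n=128, M=50)
```

Note the trade-off discussed below: the total budget n × M is split between the accuracy of the point estimate and that of the variance estimate.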

2.2 Motivation for a New Method

Of the three methods for estimating V(Î) mentioned above, the first might be computationally quite expensive; when computational power is limited, the user may have to decide whether to choose a larger n and get a more accurate point estimate Î with a less accurate uncertainty estimate, or instead choose a larger M and thus get a more accurate uncertainty estimate around a less accurate point estimate. Lemieux (2009) discusses this issue. The other two methods provide only conservative estimates and may therefore over-estimate the true uncertainty. The authors therefore feel there is a need for a method which is computationally less expensive than the first, but provides a more accurate estimate of the uncertainty than the other two. We describe our proposed method next.

3 New Method to Estimate V(Î)

ˆ is then For the ease of notation, we denote f (ui ) as fi . The variance V (I) given by n � 1 ˆ = V (1 V (I) fi ) = 2 1t V1, (3) n n i=1

where V denotes the n × n covariance matrix of f_1, ..., f_n, and 1 denotes the n-dimensional vector of 1's. Typically V is not sparse; in fact it has a very complex structure with many strongly negative elements. Therefore, it is not straightforward to estimate V using a single set of observations f_1, ..., f_n.

Remark: Note that this structure of V, with its many strongly negative elements, also explains why V(Î) for RQMC is much smaller (for many f's) than V(Î) for MC, for which V is a diagonal matrix.

The principal feature of our method is that we translate the non-standard problem of estimating V into a 'more standard' model-fitting problem. This is done using the following three steps.


3.1 Step 1: Ordering the f_i's

First we order the f_i's. Let f^(1), ..., f^(n) denote their order statistics. Then

    V(Î) = V((1/n) ∑_{i=1}^{n} f^(i)) = V((1/n) ∑_{i=1}^{n} (f^(i) − µ_i)) = V((1/n) ∑_{i=1}^{n} d_i)    (4)
         = (1/n²) 1ᵗ V_d 1,    (5)

where µ_i = E[f^(i)], d_i measures the discrepancy between the observed i-th order statistic f^(i) and its expected value, and V_d denotes the n × n covariance matrix of d_1, ..., d_n. Ordering the f's has two important implications.

• Smaller variances: The elements of the matrix V_d are much smaller than the elements of V, while still (as expected) satisfying 1ᵗ V 1 = 1ᵗ V_d 1. For the examples described later, we have observed that the elements of V_d are an order of magnitude smaller than the elements of V. The important implication is that confidence intervals obtained using V_d will be tighter than those obtained using V.

• Rapidly decaying correlation structure: We have observed (for the examples described later) that V_d has a very different structure from V. The corresponding correlation matrix has a rapidly decaying structure, somewhat similar to the correlation structure observed for Markov processes.
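The identity 1ᵗV1 = 1ᵗV_d1 in the first bullet can be checked empirically. The sketch below (scrambled Sobol' points and the sum integrand are assumptions) estimates both matrices over M replications; sorting each replicate permutes f_1, ..., f_n but leaves the sum, and hence V(Î), unchanged.

```python
# Empirical check that ordering changes V but not 1' V 1 (Step 1).
import numpy as np
from scipy.stats import qmc

n, d, M = 128, 5, 500
F = np.empty((M, n))
for m in range(M):
    u = qmc.Sobol(d=d, scramble=True, seed=m).random(n)
    F[m] = u.sum(axis=1)                        # f_1, ..., f_n for replicate m

V = np.cov(F, rowvar=False)                     # covariance of the unordered f's
V_d = np.cov(np.sort(F, axis=1), rowvar=False)  # covariance of the order statistics

ones = np.ones(n)
total_unordered = ones @ V @ ones
total_ordered = ones @ V_d @ ones               # equal up to floating point
```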

3.2 Step 2: Estimating V_d

The rapidly decaying structure of the correlation matrix suggests that it might be possible to use the variance estimation methods from the MCMC literature. For example, V(Î) can be estimated by approximating the spectral density of f at frequency 0, and numerous approaches for doing this have been proposed. However, this approach is valid only if f^(1), ..., f^(n) can be considered realisations from a covariance-stationary process. From the examples, we can also see that it is reasonable to assume that most of the d_i's have the same variance σ_d² (although this assumption is not strictly true). Therefore, V_d can be expressed as

    V_d ≈ σ_d² R_d,    (6)

where R_d denotes the n × n correlation matrix of d_1, ..., d_n. Therefore, using Equations (5) and (6), V(Î) can be expressed as

    V(Î) ≈ (σ_d²/n²) 1ᵗ R_d 1,    (7)

where σ_d² can be estimated using the observed d_i's, and 1ᵗ R_d 1 can be estimated using the autocorrelation function of the d_i's. If ρ̂_d(h) denotes the sample estimate of the lag-h autocorrelation of the d_i's, then 1ᵗ R_d 1 can be estimated as

    1ᵗ R̂_d 1 = 2 ∑_{h=0}^{n−1} (n − h) ρ̂_d(h) − n ρ̂_d(0).    (8)
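A minimal sketch of the estimator in Equation (8), assuming the d_i's are given; the standard biased sample autocorrelation is used for ρ̂_d(h).

```python
# Estimate 1' R_d 1 from sample autocorrelations, as in Eq. (8):
# 1' R_d 1 = 2 * sum_{h=0}^{n-1} (n - h) * rho_hat(h)  -  n * rho_hat(0).
import numpy as np


def quad_form_ones(d_vals):
    x = np.asarray(d_vals, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    n = len(x)
    total = 0.0
    for h in range(n):
        rho_h = np.sum(x[:n - h] * x[h:]) / denom   # sample lag-h autocorrelation
        # h = 0 appears once; every other lag appears twice (+h and -h).
        total += (2.0 if h > 0 else 1.0) * (n - h) * rho_h
    return total
```

For the deterministic alternating sequence (1, −1, 1, −1), for instance, the estimate works out to exactly 1.0, matching a direct summation of the entries of R̂_d.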

3.3 Step 3: Estimating the d_i's

The difficulty in estimating the variance using Equation (7) is that the µ_i's will typically be unknown, and therefore so will the d_i's. We get around this problem by estimating the µ_i's by fitting a smooth curve to the ordered data f^(1), ..., f^(n). Once these estimates µ̂_i are obtained, we can compute d̂_i = µ̂_i − f^(i), and hence σ̂_d². We can then obtain 1ᵗ R̂_d 1 using Equation (8). V(Î) can thus be estimated as

    V̂(Î) = (σ̂_d²/n²) 1ᵗ R̂_d 1.    (9)
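Putting Steps 1-3 together on a single sample gives the following sketch. A smoothing spline (SciPy's UnivariateSpline) stands in for the paper's b-spline fit, and the smoothing level is an arbitrary assumption; the scrambled Sobol' points and sum integrand are also illustrative.

```python
# Steps 1-3 on one RQMC sample: sort, smooth, form d_i, then apply Eq. (9).
import numpy as np
from scipy.interpolate import UnivariateSpline
from scipy.stats import qmc

n, d = 128, 5
u = qmc.Sobol(d=d, scramble=True, seed=1).random(n)
f_sorted = np.sort(u.sum(axis=1))        # Step 1: order statistics f^(1..n)

# Step 3: smooth fit to the ordered values as an estimate of mu_i = E[f^(i)].
x = np.arange(n)
mu_hat = UnivariateSpline(x, f_sorted, s=0.01 * n)(x)  # smoothing level assumed

d_hat = f_sorted - mu_hat                # estimated discrepancies d_i
sigma2_d = d_hat.var(ddof=1)             # assumed-common variance of the d_i's

# Step 2 / Eq. (8): 1' R_d 1 from the sample autocorrelations of d_hat.
xc = d_hat - d_hat.mean()
denom = np.sum(xc * xc)
quad = sum((2.0 if h else 1.0) * (n - h) * np.sum(xc[:n - h] * xc[h:]) / denom
           for h in range(n))

V_hat = sigma2_d * quad / n**2           # Eq. (9): estimate of V(I_hat)
```

As the paper stresses, the result is sensitive to how smooth the fit is; a spline that interpolates the data would make the d̂_i's, and hence σ̂_d², collapse towards zero.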

We now illustrate this approach using a couple of simple examples.

4 Examples

4.1 f with a symmetric distribution: f(u) = ∑_{j=1}^{d} u_{.j}

As described earlier, u ∈ [0,1)^d. Let u_{.j} denote the j-th component of u, for j = 1, ..., d. The integrand we are concerned with is simply the sum of all the components of u. First, by simulation, we illustrate how the ordering changes the dependency structure between the f_i's. Consider n = 128 and d = 5. The correlation between f_1, ..., f_n was obtained by simulating M = 1000 such datasets, each of size n (using M independent replications of scrambled sequences). The correlation between the ordered variables f^(1), ..., f^(n) was obtained after ordering each of the M datasets. Figure 1 shows both correlation structures.


Figure 1: Correlation structure between the unordered f's (left) and the ordered f's (right). The colour bar on the left provides the scale. As expected, the diagonals are red in both cases.

We now consider just one of these M datasets (say the first) as our observed data on f_1, ..., f_n. Figure 2 shows the unordered and ordered values of this dataset. The red line shows the expected values of the f^(i)'s, i.e. the µ_i's. These µ_i's were calculated empirically from the M repetitions of the data. It is thus possible to obtain M realisations of the true d_1, ..., d_n, and hence true values of the autocorrelation function ρ_d and the correlation matrix R_d (which is in fact the same as the correlation of f^(1), ..., f^(n) shown in Figure 1).

As discussed in Section 3.1, the effects of ordering can be seen in Figures 3 and 4. Figure 3 not only shows that the variances post ordering are an order of magnitude smaller, but also that most of the d_i's (except the first and last few) have the same variance; therefore (for the purposes of estimation) it might be reasonable to assume that all d_i's have constant variance σ_d². Figure 4 notes the difference in the dependence structure pre and post ordering using the autocorrelations, and thus complements Figure 1.

Once the data have been ordered, they can be considered a realisation of a monotone but bounded function. As mentioned earlier, it is now possible to approximate the µ_i's by fitting a smooth curve to the ordered f^(i)'s. A b-spline curve was fitted to the data in this case, and the fit can be seen in Figure 5. It is important that this fit is smooth enough that it follows the (latent) red line more closely than the data itself.
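The empirical µ_i's described above (the red line in Figure 2) can be reproduced in outline as follows; the scrambled Sobol' points are again our assumption.

```python
# Approximate mu_i = E[f^(i)] by averaging the sorted values over M
# independent scrambled replications, as the text describes.
import numpy as np
from scipy.stats import qmc

n, d, M = 128, 5, 1000
sorted_reps = np.empty((M, n))
for m in range(M):
    u = qmc.Sobol(d=d, scramble=True, seed=m).random(n)
    sorted_reps[m] = np.sort(u.sum(axis=1))

mu = sorted_reps.mean(axis=0)    # empirical mu_i; nondecreasing in i
```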

Figure 2: Unordered values of f (left) and ordered values of f along with µi (red line) (right)

Figure 3: Variance for unordered f : V (fi ) (left) and variance for the ordered f : V (f i ) = V (di ) (right)


Figure 4: Autocorrelation for unordered values of f (left) and for d (ordered values of f ) (right)

This spline fit was then used as an estimate of the µ_i's. An estimate of V(Î) was then obtained as described in Equation (9), and it matched closely the true value obtained by empirical evaluation of Equation (5).

4.2 f with a skewed distribution: f(u) = ∏_{j=1}^{d} u_{.j}

As described earlier, u ∈ [0,1)^d, and let u_{.j} denote the j-th component of u, for j = 1, ..., d. The integrand we now consider is simply the product of all the components of u. Again, by simulation, we first illustrate how the ordering changes the dependency structure between the f_i's. Consider n = 128 and d = 5. The correlation between f_1, ..., f_n was obtained by simulating M = 1000 such datasets, each of size n. The correlation between the ordered variables f^(1), ..., f^(n) was obtained after ordering each of the M datasets. Figure 6 shows both correlation structures.

We now consider just one of these M datasets (say the first) as our observed data on f_1, ..., f_n. Figure 7 shows the unordered and ordered values of this dataset. The red line shows the expected values of the f^(i)'s, i.e. the µ_i's. These were calculated empirically from the M repetitions of the data.
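For reference, the product integrand and its skewness can be seen in a few lines; the scrambled Sobol' sequence is again an illustrative assumption.

```python
# The integrand of this example: f(u) = prod_j u_j, whose integral over
# [0,1)^d is (1/2)^d. Its distribution is strongly right-skewed, unlike
# the symmetric sum of Section 4.1.
import numpy as np
from scipy.stats import qmc

n, d = 128, 5
u = qmc.Sobol(d=d, scramble=True, seed=2).random(n)
f_prod = u.prod(axis=1)

I_hat = f_prod.mean()                        # near (1/2)^5 = 0.03125
skewed = np.median(f_prod) < f_prod.mean()   # median well below the mean
```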

Figure 5: Left: Ordered values of f along with µi (red line) and Right: also along with the b-spline fit (blue line)

Figure 6: Correlation structure between the unordered f's (left) and the ordered f's (right). The colour bar on the left provides the scale. As expected, the diagonals are red in both cases.


Figure 7: Unordered values of f (left) and ordered values of f along with µi (red line) (right)

As in the previous example, the effects of ordering can be seen in Figures 8 and 9. However, the difference between the variances of the ordered and unordered f's is less dramatic in Figure 8 than in Figure 3. It will be interesting to see which distributional properties of f induce this difference.

Once the data have been ordered, they can be considered a realisation of a monotone but bounded function. As in the earlier example, we approximate the µ_i's by fitting a smooth curve to the ordered f^(i)'s. A b-spline curve was fitted to the data in this case, and the fit can be seen in Figure 10. As in the earlier example, this spline fit was then used as an estimate of the µ_i's. An estimate of V(Î) was then obtained as described in Equation (9), and it matched closely the true value obtained by empirical evaluation of Equation (5).


Figure 8: Variance for unordered f : V (fi ) (left) and variance for the ordered f : V (f i ) = V (di ) (right)

Figure 9: Autocorrelation for unordered values of f (left) and for d (ordered values of f ) (right)


Figure 10: Left: Ordered values of f along with µi (red line) and Right: also along with the b-spline fit (blue line)

5 Challenges and Future Work

It is the authors' understanding that there is a need for a method which is computationally cheaper, easy to implement, and yet provides an accurate estimate of the uncertainty. The authors are currently in pursuit of an application (or an integrand) which could really benefit from the development of such a method.

The authors are intrigued by the change in the dependency structure that is induced by ordering, and would like to know whether there is a theoretical explanation for it. Although functional ANOVA can help in identifying integrands for which RQMC works efficiently or otherwise, the authors feel that a similar explanation should also be possible by looking at the distributional properties of the integrands (such as being long-tailed, for example). This is also the motivation behind the choice of the two examples discussed.

If this method is to be developed further, the main challenge will be to develop guidance on how smooth is smooth enough for accurate estimation of the variance, and to investigate whether this guidance is integrand-dependent. It is also important that such a method be easy to implement, widely applicable and computationally inexpensive.

References

Lemieux, C. (2009). Monte Carlo and Quasi-Monte Carlo Sampling. New York: Springer.

Loh, W.-L. (2003). On the asymptotic distribution of scrambled net quadrature. The Annals of Statistics 31, 1282–1324.

Owen, A. B. (1995). Randomly permuted (t, m, s)-nets and (t, s)-sequences. In: Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing. Springer, New York.

Owen, A. B. (1997a). Monte Carlo variance of scrambled net quadrature. SIAM Journal on Numerical Analysis 34, 1884–1910.

Owen, A. B. (1997b). Scrambled net variance for integrals of smooth functions. The Annals of Statistics 25, 1541–1562.
