Homogeneity testing

13th International Meeting on Statistical Climatology

Homogeneity testing revisited
Pierre Masselot, Fateh Chebana, Taha B.M.J. Ouarda

June 2016, Canmore

Contents
1. Introduction
2. The Hosking-Wallis homogeneity test
3. Proposed improvements
4. Simulation study
5. Conclusion


Regional frequency analysis
Frequency analysis (FA):
→ Fits a parametric probability distribution to data
→ Estimates the frequency of extreme events from this distribution

Regional frequency analysis (RFA):
→ Estimates extreme events for ungauged target sites
→ Transfers hydrological information from gauged sites in a region to the ungauged target site
→ Usually requires the homogeneity of the region


Homogeneity testing
→ Required preliminary step of RFA
→ Tests for the homogeneity of a region

Most used test: the Hosking-Wallis (HW) homogeneity test (Hosking and Wallis 1993)


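The Hosking-Wallis Homogeneity test
1. Compute the test statistic 𝑉

$$V = \left( \frac{\sum_j n_j \left( \tau_2^{(j)} - \bar{\tau}_2 \right)^2}{\sum_j n_j} \right)^{1/2}$$

→ 𝜏2 : L-scale, computed at each site 𝑗 (with 𝑛𝑗 observations) and averaged over the region
→ 𝑉 is the weighted standard deviation of the at-site scale measures
→ The larger 𝑉 is, the less homogeneous the region is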
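As an illustration of step 1, here is a minimal Python sketch of the 𝑉 statistic. It rests on assumptions that are not in the slides: the at-site scale measure is taken to be the sample L-CV (second L-moment divided by the first), as in Hosking and Wallis (1993), the regional average is weighted by record length, and the function names are hypothetical.

```python
import math

def sample_l_moments(x):
    """Return the first two sample L-moments (l1, l2) of one site."""
    xs = sorted(x)
    n = len(xs)
    b0 = sum(xs) / n
    # Unbiased estimator of the first probability-weighted moment.
    b1 = sum(i * xs[i] for i in range(n)) / (n * (n - 1))
    return b0, 2.0 * b1 - b0

def l_cv(x):
    """Sample L-CV (assumed scale measure): l2 / l1."""
    l1, l2 = sample_l_moments(x)
    return l2 / l1

def v_statistic(sites, measure=l_cv):
    """Record-length-weighted dispersion of the at-site scale measures."""
    weights = [len(site) for site in sites]
    total = sum(weights)
    taus = [measure(site) for site in sites]
    tau_bar = sum(w * t for w, t in zip(weights, taus)) / total
    return math.sqrt(sum(w * (t - tau_bar) ** 2 for w, t in zip(weights, taus)) / total)

# Hypothetical region with three sites of unequal record length.
region = [[12.0, 15.0, 9.0, 20.0], [30.0, 28.0, 35.0], [5.0, 7.0, 6.0, 9.0, 4.0]]
print(v_statistic(region))
```

Passing a different `measure` function would swap in another at-site scale measure without changing the rest of the computation.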


2. Simulate a set of 𝑁𝑠𝑖𝑚 homogeneous regions
→ All sites are simulated from a 4-parameter Kappa distribution
→ Very general distribution
→ Contains the Gumbel and GEV distributions, among others, as special cases
→ The parameters are estimated on observed data
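The simulation step can be sketched in Python by inverse-transform sampling from the four-parameter kappa quantile function, x(F) = ξ + (α/k)[1 − ((1 − F^h)/h)^k]. This is only a sketch under stated assumptions: the parameter values are taken as given (fitting them to the observed data is omitted), the limiting cases k = 0 and h = 0 are not handled, and the function names are hypothetical.

```python
import random

def kappa_quantile(f, xi, alpha, k, h):
    """Quantile of the four-parameter kappa distribution.

    Valid for 0 < f < 1 with k != 0 and h != 0 (limits not handled here).
    """
    return xi + (alpha / k) * (1.0 - ((1.0 - f ** h) / h) ** k)

def uniform_open(rng):
    """Uniform draw strictly inside (0, 1), avoiding the endpoint 0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

def simulate_kappa_region(record_lengths, xi, alpha, k, h, rng):
    """Simulate one homogeneous region: every site shares the same kappa parameters."""
    return [
        [kappa_quantile(uniform_open(rng), xi, alpha, k, h) for _ in range(n)]
        for n in record_lengths
    ]

# Illustrative (not fitted) parameter values.
rng = random.Random(42)
region = simulate_kappa_region([20, 35, 28], xi=100.0, alpha=20.0, k=-0.1, h=0.5, rng=rng)
print([len(site) for site in region])
```

Each simulated region keeps the record lengths of the observed sites, so that its 𝑉 value is comparable with the observed one.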


3. Compute the 𝑉 statistic for all the 𝑁𝑠𝑖𝑚 regions
→ Estimation of the 𝑉 distribution for homogeneous regions
→ 𝜇𝑠𝑖𝑚 : mean of the 𝑉 distribution
→ 𝜎𝑠𝑖𝑚 : standard deviation of the 𝑉 distribution


4. Compute the heterogeneity measure

$$H = \frac{V - \mu_{sim}}{\sigma_{sim}}$$

→ 𝐻 < 1: homogeneous
→ 1 ≤ 𝐻 < 2: possibly heterogeneous
→ 𝐻 ≥ 2: heterogeneous
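The heterogeneity measure is a simple standardization, which the following Python sketch illustrates. The function names are hypothetical, the sample standard deviation (n − 1 denominator) is an assumption, and the class labels follow the wording of Hosking and Wallis (1993).

```python
import statistics

def heterogeneity_measure(v_obs, v_sim):
    """H = (V - mu_sim) / sigma_sim, with mu and sigma taken from the simulated V values."""
    mu_sim = statistics.mean(v_sim)
    sigma_sim = statistics.stdev(v_sim)  # sample standard deviation (assumption)
    return (v_obs - mu_sim) / sigma_sim

def classify_region(h):
    """Apply the usual thresholds at 1 and 2."""
    if h < 1.0:
        return "acceptably homogeneous"
    if h < 2.0:
        return "possibly heterogeneous"
    return "definitely heterogeneous"

# Toy values for illustration only.
h = heterogeneity_measure(4.5, [1.0, 2.0, 3.0])
print(h, classify_region(h))
```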


Drawbacks and suggested solutions
Recall the four steps of the HW test:
1. Compute the test statistic 𝑉
2. Simulate a set of 𝑁𝑠𝑖𝑚 homogeneous regions
3. Compute the 𝑉 statistic for all the 𝑁𝑠𝑖𝑚 regions
4. Compute the heterogeneity measure
→ Their drawbacks become more problematic in the multivariate setting


Step 2, simulating homogeneous regions: the use of a parametric distribution creates uncertainty
→ Relevant only if the data follow the distribution
→ Requires the estimation of 4 parameters
→ Issue more important in the multivariate case
Proposition: simulate the regions through nonparametric procedures


Step 4, computing the heterogeneity measure:
→ Rejection threshold not well justified
Proposition: compute a p-value from the 𝑉 distribution


Step 2: nonparametric procedure
Simulate homogeneous regions from the empirical distribution
→ i.e. from the pooling of all sites' data: $x^{tot} = \bigcup_j x^{(j)}$

• M: Permutation methods (Fisher 1935; Pitman 1937)
→ Randomly reassign the observations between sites
→ Same as sampling the sites' observations without replacement from 𝑥𝑡𝑜𝑡
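The permutation procedure (M) can be sketched in a few lines of Python: pool all observations, shuffle the pool, and cut it back into sites with the original record lengths. The function name is hypothetical and this is an illustration, not the authors' R code.

```python
import random

def permute_region(sites, rng):
    """Reassign the pooled observations to sites at random (sampling without replacement)."""
    pooled = [x for site in sites for x in site]
    rng.shuffle(pooled)
    simulated, start = [], 0
    for site in sites:
        # Each simulated site keeps the record length of the original site.
        simulated.append(pooled[start:start + len(site)])
        start += len(site)
    return simulated

rng = random.Random(0)
region = [[12.0, 15.0, 9.0], [30.0, 28.0], [5.0, 7.0, 6.0, 9.0]]
print(permute_region(region, rng))
```

Every observation appears exactly once in the simulated region, so only its allocation to sites changes.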


• B: Bootstrap (Efron 1979)
→ Sampling with replacement from 𝑥𝑡𝑜𝑡
→ Allows testing more general hypotheses than permutations
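A minimal Python sketch of the bootstrap procedure (B) follows: each simulated site is drawn with replacement from the pooled sample, keeping the original record lengths. The function name is hypothetical.

```python
import random

def bootstrap_region(sites, rng):
    """Draw every simulated site with replacement from the pooled observations."""
    pooled = [x for site in sites for x in site]
    return [rng.choices(pooled, k=len(site)) for site in sites]

rng = random.Random(0)
region = [[12.0, 15.0, 9.0], [30.0, 28.0], [5.0, 7.0, 6.0, 9.0]]
print(bootstrap_region(region, rng))
```

Unlike a permutation, a bootstrap region may contain some observations several times and omit others.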


• Y: Pólya resampling (Lo 1988)
→ Bootstrap without the assumption that the empirical distribution represents the true distribution of 𝑥𝑡𝑜𝑡
→ Each time an observation 𝑥𝑖 is drawn, it is put back and a copy of it is added to 𝑥𝑡𝑜𝑡
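Pólya resampling (Y) can be pictured as an urn that grows while it is sampled. The Python sketch below assumes the standard Pólya urn scheme, in which a drawn observation stays in the urn and one extra copy of it is added. The function name is hypothetical.

```python
import random

def polya_region(sites, rng):
    """Simulate a region by Pólya urn sampling from the pooled observations."""
    urn = [x for site in sites for x in site]  # a fresh list, the input is not modified
    simulated = []
    for site in sites:
        draws = []
        for _ in range(len(site)):
            x = rng.choice(urn)  # the drawn value is not removed from the urn
            urn.append(x)        # one extra copy makes it more likely to be drawn again
            draws.append(x)
        simulated.append(draws)
    return simulated

rng = random.Random(0)
region = [[12.0, 15.0, 9.0], [30.0, 28.0], [5.0, 7.0, 6.0, 9.0]]
print(polya_region(region, rng))
```

Because the urn is reinforced at each draw, the simulated regions vary more than plain bootstrap regions do.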


Step 4: rejection threshold
After step 3, we get a set of simulated 𝑉(𝑏) (𝑏 = 1, …, 𝑁𝑠𝑖𝑚)
→ Represents the distribution of 𝑉 under homogeneity
A p-value uses the whole 𝑉(𝑏) distribution:

$$p\text{-value} = \frac{\#\left( V^{(b)} > V^{obs} \right)}{N_{sim}}$$

Compare the p-value to a chosen significance level 𝛼
→ If p-value < 𝛼: reject the hypothesis of homogeneity
→ If p-value ≥ 𝛼: do not reject the hypothesis of homogeneity
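The p-value and the decision rule translate directly into code. In this Python sketch the function names are hypothetical and the simulated 𝑉 values are assumed to come from any of the simulation procedures.

```python
def p_value(v_obs, v_sim):
    """Proportion of simulated V values strictly greater than the observed one."""
    return sum(1 for v in v_sim if v > v_obs) / len(v_sim)

def reject_homogeneity(v_obs, v_sim, alpha=0.05):
    """Reject the hypothesis of homogeneity when the p-value falls below alpha."""
    return p_value(v_obs, v_sim) < alpha

# Toy values for illustration only.
v_sim = [0.010, 0.020, 0.030, 0.040]
print(p_value(0.025, v_sim), reject_homogeneity(0.025, v_sim))
```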


Selected results: type I error (simulations)
Regions simulated: homogeneous
→ Rejection rate should be close to the significance level 𝛼 = 5%
→ HW and Y tests: type I error lower than 𝛼
→ M and B tests: type I error closer to 𝛼
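To make the notion of an empirical type I error concrete, the following Python sketch estimates the rejection rate of the permutation (M) test on simulated homogeneous regions. It is a toy setup and not the simulation design of the study: the gamma parent distribution, the region sizes, the L-CV scale measure and all function names are assumptions made for illustration.

```python
import math
import random

def l_cv(x):
    """Sample L-CV of one site (assumed scale measure)."""
    xs = sorted(x)
    n = len(xs)
    b0 = sum(xs) / n
    b1 = sum(i * xs[i] for i in range(n)) / (n * (n - 1))
    return (2.0 * b1 - b0) / b0

def v_statistic(sites):
    """Record-length-weighted dispersion of the at-site L-CVs."""
    total = sum(len(s) for s in sites)
    taus = [l_cv(s) for s in sites]
    tau_bar = sum(len(s) * t for s, t in zip(sites, taus)) / total
    return math.sqrt(sum(len(s) * (t - tau_bar) ** 2 for s, t in zip(sites, taus)) / total)

def permutation_p_value(sites, n_sim, rng):
    """p-value of the observed V against V values from permuted regions."""
    v_obs = v_statistic(sites)
    pooled = [x for s in sites for x in s]
    exceed = 0
    for _ in range(n_sim):
        rng.shuffle(pooled)
        sim, start = [], 0
        for s in sites:
            sim.append(pooled[start:start + len(s)])
            start += len(s)
        if v_statistic(sim) > v_obs:
            exceed += 1
    return exceed / n_sim

def type_one_error(n_regions, n_sites, n_obs, n_sim, alpha, rng):
    """Share of homogeneous regions for which homogeneity is rejected."""
    rejections = 0
    for _ in range(n_regions):
        # All sites share one parent distribution, so the region is homogeneous.
        region = [[rng.gammavariate(4.0, 25.0) for _ in range(n_obs)]
                  for _ in range(n_sites)]
        if permutation_p_value(region, n_sim, rng) < alpha:
            rejections += 1
    return rejections / n_regions
```

Calling `type_one_error(100, 5, 20, 99, 0.05, random.Random(0))` returns the observed rejection rate, which can then be compared with 𝛼. Replacing the homogeneous generator with sites drawn from different parents would turn the same loop into a power estimate.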


Selected results: power (simulations)
Regions simulated: heterogeneous
→ We want the power to be as high as possible
→ Y test has low power
→ M and B tests outperform the HW test


Conclusion
In order to improve the HW test we propose:
→ To use nonparametric procedures to simulate the 𝑉 distribution
→ To compute a p-value to decide whether or not to reject homogeneity

This leads to:
→ A wider applicability of the test
→ A simplification of the test procedure
→ An increase in the power of the test (M and B procedures only)

Among the three procedures, M and B are more powerful than Y


THANK YOU
R code can be provided and will be online soon

References
• Chebana, F., & Ouarda, T. B. M. J. (2007). Multivariate L-moment homogeneity test. Water Resources Research, 43(8), W08406.
• Efron, B. (1979). Bootstrap methods: another look at the jackknife. The Annals of Statistics, 7(1), 1-26.
• Hosking, J., & Wallis, J. (1993). Some statistics useful in regional frequency analysis. Water Resources Research, 29(2), 271-281.
• Lo, A. Y. (1988). A Bayesian bootstrap for a finite population. The Annals of Statistics, 16(4), 1684-1695.
• Masselot, P., Chebana, F., & Ouarda, T. B. M. J. (2016). Fast and direct nonparametric procedures in the L-moment homogeneity test. Stochastic Environmental Research and Risk. In press.
• Pitman, E. J. G. (1937). Significance tests which may be applied to samples from any populations. Supplement to the Journal of the Royal Statistical Society, 4(1), 119-130.
