13th International Meeting on Statistical Climatology
Homogeneity testing revisited
Pierre Masselot, Fateh Chebana, Taha B.M.J. Ouarda
June 2016, Canmore
Contents
1. Introduction
2. The Hosking-Wallis homogeneity test
3. Proposed improvements
4. Simulation study
5. Conclusion
Regional frequency analysis
Frequency analysis (FA):
→ Fits a parametric probability distribution to data
→ Estimates the frequency of extreme events from this distribution
Regional frequency analysis (RFA):
→ Estimates extreme events at ungauged target sites
→ Transfers hydrological information from gauged sites in a region to the ungauged target site
→ Usually requires the region to be homogeneous
Homogeneity testing
Required preliminary step of RFA: test for the homogeneity of a region
Most used test: the Hosking-Wallis (HW) homogeneity test (Hosking and Wallis 1993)
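The Hosking-Wallis homogeneity test
1. Compute the test statistic $V$
$$V = \left( \frac{\sum_j n_j \left( \tau_2^{(j)} - \bar{\tau}_2 \right)^2}{\sum_j n_j} \right)^{1/2}$$
→ $\tau_2^{(j)}$: L-scale measure at site $j$; $n_j$: record length at site $j$; $\bar{\tau}_2$: regional average of the $\tau_2^{(j)}$
→ $V$ is the weighted standard deviation of the at-site scale measures (the square root of their weighted variance)
→ The larger $V$ is, the less homogeneous the region is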
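As an illustration of step 1, the sketch below computes $V$ for a small region. It assumes that $\tau_2$ is the sample L-CV (second L-moment divided by the mean), estimated from unbiased probability-weighted moments as in Hosking and Wallis (1993), and that the regional average is weighted by record length. The function names `sample_lcv` and `v_statistic` and the data in `region` are made up for the example.

```python
import math

def sample_lcv(x):
    """Sample L-CV t = l2 / l1, from unbiased probability-weighted moments."""
    xs = sorted(x)
    n = len(xs)
    b0 = sum(xs) / n
    # b1 weights the order statistic of 0-based rank i by i / (n - 1)
    b1 = sum(i * v for i, v in enumerate(xs)) / (n * (n - 1))
    l1 = b0
    l2 = 2.0 * b1 - b0
    return l2 / l1

def v_statistic(region):
    """Weighted standard deviation of the at-site L-CVs (weights: record lengths)."""
    n = [len(site) for site in region]
    t = [sample_lcv(site) for site in region]
    # Assumption: the regional average is the record-length-weighted mean
    t_bar = sum(nj * tj for nj, tj in zip(n, t)) / sum(n)
    num = sum(nj * (tj - t_bar) ** 2 for nj, tj in zip(n, t))
    return math.sqrt(num / sum(n))

# Hypothetical region: one list of observations per site
region = [
    [10.0, 12.0, 15.0, 11.0],
    [20.0, 24.0, 30.0, 22.0],
    [5.0, 9.0, 14.0, 7.0, 6.0],
]
v_obs = v_statistic(region)
```

The first two sites differ only by a scale factor, so they share the same L-CV; the third site is what makes `v_obs` positive here.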
The Hosking-Wallis homogeneity test
1. Compute the test statistic $V$
2. Simulate a set of $N_{sim}$ homogeneous regions
→ All sites are simulated from a 4-parameter Kappa distribution
→ Very general distribution
→ Contains the Normal, Gumbel and GEV distributions as special cases
→ The parameters are estimated from the observed data
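The parametric simulation step can be sketched with inverse-transform sampling from the Kappa quantile function $x(F) = \xi + \frac{\alpha}{k}\left[1 - \left(\frac{1 - F^h}{h}\right)^k\right]$, valid for $k \neq 0$ and $h \neq 0$. This is only a sketch: fitting the four parameters to the observed data is omitted, and the parameter values, site lengths, and function names below are arbitrary assumptions chosen for illustration.

```python
import random

def kappa_quantile(f, xi, alpha, k, h):
    """Quantile function of the 4-parameter Kappa distribution (k != 0, h != 0)."""
    return xi + (alpha / k) * (1.0 - ((1.0 - f ** h) / h) ** k)

def simulate_region(site_lengths, xi, alpha, k, h, rng):
    """One homogeneous region: every site is drawn from the same Kappa distribution."""
    return [
        [kappa_quantile(rng.random(), xi, alpha, k, h) for _ in range(n)]
        for n in site_lengths
    ]

rng = random.Random(1)  # fixed seed so the sketch is reproducible
# Hypothetical parameter values; in the HW test they are estimated from the data
sim = simulate_region([30, 25, 40], xi=1.0, alpha=0.3, k=0.1, h=0.5, rng=rng)
```

Repeating the call to `simulate_region` $N_{sim}$ times gives the set of simulated regions used in the following steps.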
The Hosking-Wallis homogeneity test
1. Compute the test statistic $V$
2. Simulate a set of $N_{sim}$ homogeneous regions
3. Compute the $V$ statistic for each of the $N_{sim}$ regions
→ Gives an estimate of the distribution of $V$ for homogeneous regions
→ $\mu_{sim}$: mean of the simulated $V$ values
→ $\sigma_{sim}$: standard deviation of the simulated $V$ values
The Hosking-Wallis homogeneity test
1. Compute the test statistic $V$
2. Simulate a set of $N_{sim}$ homogeneous regions
3. Compute the $V$ statistic for each of the $N_{sim}$ regions
4. Compute the heterogeneity measure
$$H = \frac{V - \mu_{sim}}{\sigma_{sim}}$$
→ $H < 1$: homogeneous
→ $1 \le H < 2$: possibly homogeneous
→ $H \ge 2$: heterogeneous
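Steps 3 and 4 reduce to standardizing the observed $V$ against the simulated values. The sketch below assumes the simulated statistics are already available in a list; the numbers in `v_sim` and the observed value are invented for illustration, and the use of the sample standard deviation (rather than the population one) is an assumption.

```python
import statistics

def heterogeneity_measure(v_obs, v_sim):
    """H = (V - mu_sim) / sigma_sim, from a list of simulated V values."""
    mu_sim = statistics.mean(v_sim)
    sigma_sim = statistics.stdev(v_sim)  # sample standard deviation (assumption)
    return (v_obs - mu_sim) / sigma_sim

def classify(h):
    """Map H to the three categories listed on the slide."""
    if h < 1:
        return "homogeneous"
    if h < 2:
        return "possibly homogeneous"
    return "heterogeneous"

# Hypothetical simulated statistics and observed value
v_sim = [0.010, 0.012, 0.014, 0.016, 0.018]
h = heterogeneity_measure(0.020, v_sim)
label = classify(h)
```

With these made-up numbers, `h` is about 1.90, so the region falls in the intermediate category.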
Drawbacks and suggested solutions
1. Compute the test statistic $V$
2. Simulate a set of $N_{sim}$ homogeneous regions
3. Compute the $V$ statistic for each of the $N_{sim}$ regions
4. Compute the heterogeneity measure
The drawbacks of this procedure become more problematic in the multivariate setting
Drawbacks and suggested solutions
1. Compute the test statistic $V$
2. Simulate a set of $N_{sim}$ homogeneous regions
The use of a parametric distribution creates uncertainty
→ Relevant only if the data follow the distribution
→ Requires the estimation of 4 parameters
→ The issue is more serious in the multivariate case
Proposition: simulate the regions through nonparametric procedures
Drawbacks and suggested solutions
1. Compute the test statistic $V$
2. Simulate a set of $N_{sim}$ homogeneous regions
3. Compute the $V$ statistic for each of the $N_{sim}$ regions
4. Compute the heterogeneity measure
→ The rejection thresholds are not well justified
Proposition: compute a p-value from the $V$ distribution
Step 2: nonparametric procedure
Simulate homogeneous regions from the empirical distribution
→ i.e. from the pooled data of all sites
$$x^{tot} = \bigcup_j x^{(j)}$$
M: Permutation methods (Fisher 1935; Pitman 1937)
→ Randomly reassign the observations between sites
→ Equivalent to sampling each site's data without replacement from $x^{tot}$
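A minimal sketch of the permutation procedure (M): pool the observations, shuffle them, and cut the shuffled pool back into sites of the original lengths. The function name `permute_region` and the data are hypothetical.

```python
import random

def permute_region(region, rng):
    """Randomly reassign the pooled observations to sites of unchanged length."""
    pooled = [x for site in region for x in site]
    rng.shuffle(pooled)  # in-place random permutation of the pooled sample
    out, start = [], 0
    for site in region:
        out.append(pooled[start:start + len(site)])
        start += len(site)
    return out

rng = random.Random(42)  # fixed seed so the sketch is reproducible
region = [[10.0, 12.0, 15.0], [20.0, 24.0, 30.0, 22.0], [5.0, 9.0]]
sim = permute_region(region, rng)
```

Every observation appears exactly once in `sim`, which is what distinguishes this procedure from sampling with replacement.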
Step 2: nonparametric procedure
Simulate homogeneous regions from the empirical distribution
→ i.e. from the pooled data of all sites
$$x^{tot} = \bigcup_j x^{(j)}$$
M: Permutation methods (Fisher 1935; Pitman 1937)
B: Bootstrap (Efron 1979)
→ Sampling with replacement from $x^{tot}$
→ Allows testing more general hypotheses than permutations
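The bootstrap procedure (B) can be sketched the same way, except that each site is filled by drawing with replacement from the pooled sample. The function name `bootstrap_region` and the data are hypothetical.

```python
import random

def bootstrap_region(region, rng):
    """Fill each site by sampling with replacement from the pooled observations."""
    pooled = [x for site in region for x in site]
    return [rng.choices(pooled, k=len(site)) for site in region]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
region = [[10.0, 12.0, 15.0], [20.0, 24.0, 30.0, 22.0], [5.0, 9.0]]
sim = bootstrap_region(region, rng)
```

Unlike a permutation, a bootstrap region may contain some observations several times and others not at all.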
Step 2: nonparametric procedure
Simulate homogeneous regions from the empirical distribution
→ i.e. from the pooled data of all sites
$$x^{tot} = \bigcup_j x^{(j)}$$
M: Permutation methods (Fisher 1935; Pitman 1937)
B: Bootstrap (Efron 1979)
Y: Pólya resampling (Lo 1988)
→ Bootstrap without the assumption that the empirical distribution represents the true distribution of $x^{tot}$
→ Each time an observation $x_i$ is drawn, a new copy of it is added to $x^{tot}$
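One way to read the Pólya procedure (Y) is as an urn scheme: each drawn observation stays in the pool and one extra copy of it is added, so values drawn early become more likely later. The sketch below follows that reading. Whether the urn is shared across sites or restarted for each site is not stated in the slides; sharing a single urn is an assumption here, as is the function name `polya_region`.

```python
import random

def polya_region(region, rng):
    """Polya-urn resampling from the pooled observations (one urn for all sites)."""
    urn = [x for site in region for x in site]  # a copy, so region is left untouched
    out = []
    for site in region:
        draws = []
        for _ in range(len(site)):
            x = rng.choice(urn)
            draws.append(x)
            urn.append(x)  # the drawn value stays in the urn and a copy is added
        out.append(draws)
    return out

rng = random.Random(42)  # fixed seed so the sketch is reproducible
region = [[10.0, 12.0, 15.0], [20.0, 24.0, 30.0, 22.0], [5.0, 9.0]]
sim = polya_region(region, rng)
```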
Step 4: rejection threshold
After step 3, we have a set of simulated $V^{(b)}$ ($b = 1, \dots, N_{sim}$)
→ Represents the distribution of $V$ under homogeneity
A p-value uses the whole $V^{(b)}$ distribution
$$p\text{-value} = \frac{\#\left( V^{(b)} > V^{obs} \right)}{N_{sim}}$$
Compare the p-value to a chosen significance level $\alpha$
→ If p-value < $\alpha$: reject the hypothesis of homogeneity
→ If p-value ≥ $\alpha$: do not reject the hypothesis of homogeneity
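The p-value is simply the fraction of simulated statistics that exceed the observed one. A minimal sketch, with invented values for `v_sim` and the observed statistic:

```python
def p_value(v_obs, v_sim):
    """Fraction of simulated V values strictly greater than the observed V."""
    return sum(v > v_obs for v in v_sim) / len(v_sim)

# Hypothetical simulated statistics and observed value
v_sim = [0.010, 0.012, 0.014, 0.016, 0.018]
alpha = 0.05
p = p_value(0.015, v_sim)  # two of the five simulated values exceed 0.015
reject = p < alpha         # reject homogeneity only if p is below alpha
```

In practice $N_{sim}$ would be much larger than five, since the p-value can only take multiples of $1/N_{sim}$.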
Selected results: type I error (simulations)
Regions simulated: homogeneous
→ The rejection rate must be close to the significance level $\alpha = 5\%$
→ HW and Y tests: underestimated type I error
→ M and B tests: type I error closer to $\alpha$
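In a simulation study of this kind, the empirical rejection rate is the share of simulated regions for which the test rejects homogeneity. The sketch below shows that computation only; the p-values are placeholder numbers, not results from the study, and the function name `rejection_rate` is hypothetical.

```python
def rejection_rate(p_values, alpha=0.05):
    """Share of tests that reject homogeneity at significance level alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)

# Placeholder p-values, one per simulated region (not results from the study)
p_values = [0.01, 0.20, 0.50, 0.03, 0.70, 0.90, 0.40, 0.60, 0.08, 0.30]
rate = rejection_rate(p_values)  # two of the ten p-values are below 0.05
```

Computed on homogeneous regions, this rate estimates the type I error; computed on heterogeneous regions, it estimates the power.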
Selected results: power (simulations)
Regions simulated: heterogeneous
→ We want the power to be as high as possible
→ The Y test has low power
→ The M and B tests outperform the HW test
Conclusion
To improve the HW test, we propose:
→ Using nonparametric procedures to simulate the $V$ distribution
→ Computing a p-value to decide whether or not to reject homogeneity
This leads to:
→ A wider applicability of the test
→ A simplification of the test procedure
→ An increase in the power of the test (M and B procedures only)
Among the three procedures, M and B are more powerful than Y
THANK YOU
R codes can be provided and will be online soon
References
Chebana, F., & Ouarda, T. B. M. J. (2007). Multivariate L-moment homogeneity test. Water Resources Research, 43(8), W08406.
Efron, B. (1979). Bootstrap methods: another look at the jackknife. The Annals of Statistics, 7(1), 1-26.
Hosking, J. R. M., & Wallis, J. R. (1993). Some statistics useful in regional frequency analysis. Water Resources Research, 29(2), 271-281.
Lo, A. Y. (1988). A Bayesian bootstrap for a finite population. The Annals of Statistics, 16(4), 1684-1695.
Masselot, P., Chebana, F., & Ouarda, T. B. M. J. (2016). Fast and direct nonparametric procedures in the L-moment homogeneity test. Stochastic Environmental Research and Risk Assessment. In press.
Pitman, E. J. G. (1937). Significance tests which may be applied to samples from any populations. Supplement to the Journal of the Royal Statistical Society, 4(1), 119-130.