Royal Netherlands Meteorological Institute Ministry of Infrastructure and the Environment
Threshold selection for regional peaks-over-threshold data
Martin Roth
Joint work with A. Buishand and G. Jongbloed
Area of Interest
[Figure: maps of the area of interest; axes: lon, lat.]
Waterboard Vallei & Veluwe
21 daily precipitation series for 1951–2009
Buishand et al. (2013): Homogeneity of precipitation series in the Netherlands and their trends in the past century. Int. J. Climatol.
M. Roth, IMSC 2016: Regional Threshold Selection
Why Threshold Selection?
Two main approaches for extreme value analysis
- Block maxima – one value per year ⇒ Generalized extreme value distribution
- Peaks-over-threshold – all peak values above a threshold ⇒ Generalized Pareto distribution (GPD)
Trade-off situation
- Low thresholds lead to bias in the analysis of the excesses
- High thresholds result in high parameter estimation uncertainty
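The two approaches differ mainly in which observations they keep. The following is a minimal sketch, assuming a plain list of daily values. The helper names `block_maxima`, `empirical_quantile` and `excesses` are hypothetical, the quantile rule is one simple choice among several, and no declustering of peaks is attempted.

```python
import math

def block_maxima(values, block_size):
    """Maximum of each complete block of `block_size` consecutive values."""
    n_blocks = len(values) // block_size
    return [max(values[i * block_size:(i + 1) * block_size])
            for i in range(n_blocks)]

def empirical_quantile(values, tau):
    """Order statistic x_(k) with k = ceil(tau * n) (a simple, assumed rule)."""
    ordered = sorted(values)
    k = max(1, math.ceil(tau * len(ordered)))
    return ordered[k - 1]

def excesses(values, tau):
    """Threshold at non-exceedance probability tau and the excesses above it."""
    u = empirical_quantile(values, tau)
    return u, [x - u for x in values if x > u]

# Toy data, for illustration only: the integers 1..20 in blocks of 5.
toy = list(range(1, 21))
print(block_maxima(toy, 5))
print(excesses(toy, 0.75))
```

Raising `tau` leaves fewer excesses for the GPD fit (more estimation uncertainty), while lowering it admits values from the bulk of the distribution (more bias).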
Preliminaries
At-site methods for threshold selection
- Visual inspection: mean excess plot, threshold stability plot
- Goodness-of-fit tests: Anderson–Darling or Kolmogorov–Smirnov statistic
- Many more ... [1]
Regional frequency analysis
- Reduces parameter estimation uncertainty
- Often uses a regionally constant exceedance probability (e.g. 5%)
- But inference on the threshold level is made only on an at-site basis
[1] Scarrott and MacDonald (2012): A review of extreme value threshold estimation and uncertainty quantification. REVSTAT
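The mean excess plot listed above rests on a standard property: for a GPD with shape ξ < 1, the mean excess over a threshold u is linear in u, so an approximately linear stretch of the plot suggests where the GPD tail starts. The following is a minimal sketch of the quantity being plotted. The function name `mean_excess` and the toy data are assumptions for illustration.

```python
def mean_excess(values, thresholds):
    """Return (u, mean excess, number of excesses) for each threshold u."""
    rows = []
    for u in thresholds:
        exc = [x - u for x in values if x > u]
        if exc:  # skip thresholds at or above the sample maximum
            rows.append((u, sum(exc) / len(exc), len(exc)))
    return rows

# Toy data, for illustration only.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
for u, e, n in mean_excess(data, [2.0, 3.0]):
    print(f"u={u:.1f}  mean excess={e:.2f}  n={n}")
```

The count of excesses is returned alongside the mean because the plot becomes very noisy where only a few values remain above u.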
Simulation: Smooth Marginal Distribution
Any continuous distribution F on [0, ∞) can be written as
\[ F(x) = 1 - \exp\left( -\int_0^x h(u)\,du \right), \]
where h is the hazard rate of the distribution. Define
\[ h(x) := \eta\left(\frac{x-u}{\varepsilon}\right) h_1(x) + \left[ 1 - \eta\left(\frac{x-u}{\varepsilon}\right) \right] h_2(x), \]
with h_1 the hazard rate of some bulk distribution, h_2 the hazard rate of the GPD, and η some smooth transition function.
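One way to simulate from such a distribution is to invert the cumulative hazard H(x) = ∫ h: if E is standard exponential, then X = H^{-1}(E) has distribution F. The sketch below is an illustration under stated assumptions, not the authors' implementation. The exponential bulk, the cosine-shaped η (equal to 1 for arguments ≤ −1 and 0 for arguments ≥ 0, so the tail is exactly GPD above u), and all numerical values are hypothetical.

```python
import bisect
import math
import random

# All numbers below are hypothetical illustration values.
LAM = 0.2              # rate of an exponential bulk, so h1(x) = LAM
U, EPS = 10.0, 4.0     # the transition runs over [U - EPS, U]
SIGMA, XI = 6.0, 0.1   # GPD scale and shape (needs XI != 0, SIGMA > XI * EPS)

def eta(z):
    """Assumed smooth weight: 1 for z <= -1, 0 for z >= 0, cosine ramp between."""
    if z <= -1.0:
        return 1.0
    if z >= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * (z + 1.0)))

def hazard(x):
    """h(x) = eta * h1(x) + (1 - eta) * h2(x)."""
    w = eta((x - U) / EPS)
    if w == 1.0:
        return LAM
    h2 = 1.0 / (SIGMA + XI * (x - U))  # GPD hazard for a tail starting at U
    return w * LAM + (1.0 - w) * h2

def make_sampler(n_grid=2000):
    """Tabulate H on [0, U] (trapezoid rule); above U the inverse is analytic."""
    step = U / n_grid
    xs = [i * step for i in range(n_grid + 1)]
    cum = [0.0]
    for a, b in zip(xs, xs[1:]):
        cum.append(cum[-1] + 0.5 * step * (hazard(a) + hazard(b)))
    cum_u = cum[-1]  # H(U)

    def draw(rng):
        e = rng.expovariate(1.0)
        if e >= cum_u:  # GPD tail: solve H(x) = e in closed form
            return U + SIGMA / XI * math.expm1(XI * (e - cum_u))
        j = bisect.bisect_right(cum, e)  # cum[j - 1] <= e < cum[j]
        frac = (e - cum[j - 1]) / (cum[j] - cum[j - 1])
        return xs[j - 1] + frac * step

    return draw, cum_u

draw, cum_u = make_sampler()
rng = random.Random(1)
sample = [draw(rng) for _ in range(20000)]
share_above = sum(x > U for x in sample) / len(sample)
print(share_above, math.exp(-cum_u))  # empirical vs. exact P(X > U)
```

Because the hazard equals the GPD hazard for every x ≥ u, the excesses above u follow the GPD exactly, while thresholds below u mix bulk and tail behaviour.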
Threshold Stability Plot
[Figure: estimated shape parameter ξ versus τ.]
τ is the non-exceedance probability. The vertical line indicates the start of the GPD tail and the horizontal line shows the true shape parameter of the simulated data.
Threshold Stability Plot – Averaged
[Figure: estimated shape parameter ξ versus τ, averaged.]
τ is the non-exceedance probability. The vertical line indicates the start of the GPD tail and the horizontal line shows the true shape parameter of the simulated data.
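An averaged threshold stability curve can be sketched by fitting the GPD at each site for a grid of τ and averaging the shape estimates. The sketch below rests on several assumptions: it uses probability-weighted-moment estimates (swapped in here because they have a closed form, not because the slides name them), averages over sites with equal weights, and simulates independent sites from a pure GPD. The names `gpd_pwm`, `shape_at` and `averaged_shape` are hypothetical.

```python
import math
import random

def gpd_pwm(excesses):
    """Probability-weighted-moment estimates (sigma, xi) of the GPD."""
    x = sorted(excesses)
    n = len(x)
    b0 = sum(x) / n
    # Unbiased estimate of E[X * (1 - F(X))]; smallest value gets weight 1.
    b1 = sum((n - i) / (n - 1) * v for i, v in enumerate(x, start=1)) / n
    denom = b0 - 2.0 * b1
    return 2.0 * b0 * b1 / denom, 2.0 - b0 / denom

def shape_at(values, tau):
    """Shape estimate from the excesses over the empirical tau-quantile."""
    ordered = sorted(values)
    u = ordered[math.ceil(tau * len(ordered)) - 1]
    return gpd_pwm([v - u for v in ordered if v > u])[1]

def averaged_shape(sites, taus):
    """Equal-weight site average of the shape estimate for each tau."""
    return [(tau, sum(shape_at(s, tau) for s in sites) / len(sites))
            for tau in taus]

# Hypothetical demo: 16 independent sites, each a pure GPD sample.
rng = random.Random(0)
true_sigma, true_xi = 5.0, 0.1
sites = [[true_sigma * math.expm1(true_xi * rng.expovariate(1.0)) / true_xi
          for _ in range(10000)] for _ in range(16)]
for tau, xi_bar in averaged_shape(sites, [0.900, 0.925, 0.950, 0.975]):
    print(f"tau={tau:.3f}  mean shape={xi_bar:.3f}")
```

Averaging reduces the scatter of the individual curves, which is what makes the onset of stability easier to see than in a single at-site plot.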
GOF Test and Automatic Selection
[Figure: selected τ versus number of sites (1, 2, 4, 8, 16, 16c).]
Violin plots of the selected threshold based on the lowest value of τ for which the average KS statistic is not significant at the 5% level.
Simulation: Spatial Dependence
Copula approach
- Normal copula ⇒ weak tail dependence
- Gumbel copula ⇒ strong tail dependence
Quantile-based measure of tail dependence:
\[ l_u(\tau) := P\left( X_1 > F_1^{-1}(\tau) \mid X_2 > F_2^{-1}(\tau) \right) \]
The copula parameter can be chosen such that l_u(0.9) is the same for both copulas.
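For the bivariate Gumbel copula the measure has a closed form, since C(τ, τ) = τ^(2^(1/θ)), which gives l_u(τ) = (1 − 2τ + τ^(2^(1/θ))) / (1 − τ). The sketch below uses this to find the parameter θ that yields a target l_u(0.9). The function names and the bisection bounds are assumptions, and the normal copula, which would need bivariate normal probabilities, is not covered.

```python
import math

def gumbel_lu(theta, tau):
    """l_u(tau) for a bivariate Gumbel copula with parameter theta >= 1."""
    diag = tau ** (2.0 ** (1.0 / theta))  # C(tau, tau)
    return (1.0 - 2.0 * tau + diag) / (1.0 - tau)

def gumbel_theta_for(target, tau=0.9, lo=1.0, hi=100.0, tol=1e-10):
    """Bisection for theta such that gumbel_lu(theta, tau) equals target."""
    if not gumbel_lu(lo, tau) <= target <= gumbel_lu(hi, tau):
        raise ValueError("target outside the attainable range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gumbel_lu(mid, tau) < target:  # l_u increases with theta
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta = gumbel_theta_for(0.5)
print(theta, gumbel_lu(theta, 0.9))
```

At θ = 1 the copula is the independence copula and l_u(τ) reduces to 1 − τ, which is a convenient sanity check.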
Effect of Spatial Dependence
[Figure: AEE versus l_u(0.9).]
Averaged Euclidean error of the 5- (dots) and 50-year (triangles) return level for simulated data (red: Gumbel copula, blue: normal copula).
At-site KS Statistic
[Figure: D(τ) versus τ.]
The blue line gives the KS statistic and the red line the 95% critical values.
Regional KS Statistic
[Figure: D(τ) versus τ.]
The red line gives the regionally averaged KS statistic; the three other lines give the 95% critical values based on the normal (blue), distance-dependent normal (purple), and Gumbel (green) copulas.
Effect on Return Levels
[Figure: return level (mm) versus τ.]
Average return level as a function of the non-exceedance probability of the selected threshold for return periods of 5 (red), 50 (green) and 500 (blue) years.
Conclusion
Use RFA principles also for threshold selection!
GOF-based approach not restricted to the GPD.
Outlook
- Inspect the role of the selection criterion
- Extend the approach to incorporate trends
Roth, M., G. Jongbloed, and T. A. Buishand (2016), Threshold selection for regional peaks-over-threshold data. Journal of Applied Statistics, 43:1291–1309.
Kolmogorov–Smirnov statistic
(1) Fix τ0 and the corresponding threshold u.
(2) Estimate the parameters based on the n excesses above u.
(3a) Simulate n values from the corresponding GPD.
(3b) Sample ñ from B(T, 1 − τ0). Simulate ñ values from the corresponding GPD. This scales to the regional setting.
(4) Calculate the KS statistic for the simulated data.
(5) Repeat steps 3 and 4 a thousand times and take the 0.95-quantile of the bootstrapped statistic as the critical value for D_n at τ0.
(6) Repeat the procedure for every τ in a reasonable range.
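Steps (2) to (5) can be sketched for a single site and a single τ0 as follows. This is an illustration under stated assumptions, not the authors' code: it uses variant (3a) only, fits the GPD by probability-weighted moments (a closed-form stand-in for whichever estimator is actually used), and re-estimates the parameters on every bootstrap sample before computing the KS distance. All function names are hypothetical.

```python
import math
import random

def gpd_pwm(excesses):
    """Probability-weighted-moment estimates (sigma, xi) of the GPD."""
    x = sorted(excesses)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum((n - i) / (n - 1) * v for i, v in enumerate(x, start=1)) / n
    denom = b0 - 2.0 * b1
    return 2.0 * b0 * b1 / denom, 2.0 - b0 / denom

def gpd_cdf(x, sigma, xi):
    if abs(xi) < 1e-9:  # exponential limit
        return 1.0 - math.exp(-x / sigma)
    base = 1.0 + xi * x / sigma
    return 1.0 if base <= 0.0 else 1.0 - base ** (-1.0 / xi)

def gpd_draw(rng, sigma, xi):
    e = rng.expovariate(1.0)  # inverse-CDF sampling via E = -log(1 - U)
    return sigma * e if abs(xi) < 1e-9 else sigma * math.expm1(xi * e) / xi

def ks_statistic(excesses):
    """Step (2) and the KS distance between the sample and its fitted GPD."""
    x = sorted(excesses)
    n = len(x)
    sigma, xi = gpd_pwm(x)
    d = 0.0
    for i, v in enumerate(x, start=1):
        f = gpd_cdf(v, sigma, xi)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d, sigma, xi

def ks_critical_value(excesses, n_boot=1000, level=0.95, seed=0):
    """Steps (3a) to (5): parametric bootstrap of the KS statistic."""
    rng = random.Random(seed)
    d_obs, sigma, xi = ks_statistic(excesses)
    n = len(excesses)
    boot = sorted(
        ks_statistic([gpd_draw(rng, sigma, xi) for _ in range(n)])[0]
        for _ in range(n_boot)
    )
    return d_obs, boot[math.ceil(level * n_boot) - 1]

# Hypothetical demo data: 300 excesses drawn from a GPD.
demo_rng = random.Random(42)
demo = [gpd_draw(demo_rng, 5.0, 0.1) for _ in range(300)]
d_obs, crit = ks_critical_value(demo, n_boot=200)
print(d_obs, crit)
```

Step (6) would wrap this in a loop over τ, recomputing the threshold and the excesses each time. A regional version would average the statistic over sites and simulate dependent sites, which this sketch does not attempt.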
Averaged Euclidean Error (AEE)
\[ \mathrm{AEE}(\tau) := \frac{1}{B} \sum_{i=1}^{B} \lVert \hat{r}_i(\tau) - r \rVert_2 , \]
where \lVert \cdot \rVert_2 denotes the Euclidean (or l2) norm, B gives the number of bootstrap samples and r specifies some return level.
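The definition translates directly into code. The sketch below assumes that each estimate r̂_i(τ) and the reference r are vectors of equal length (for example one return level per site). The function name `aee` and the toy numbers are hypothetical.

```python
import math

def aee(estimates, truth):
    """Mean Euclidean distance between each bootstrap estimate and `truth`."""
    dists = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(r_hat, truth)))
        for r_hat in estimates
    ]
    return sum(dists) / len(dists)

# Toy example with B = 2 bootstrap estimates at two sites.
print(aee([[3.0, 4.0], [0.0, 0.0]], [0.0, 0.0]))  # distances 5 and 0 -> 2.5
```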
Effect of Bulk–Tail Transition
[Figure: AEE versus τ.]
AEE in the 5-year return level as a function of the probability τ for the simulated data (blue: bulk and tail are different, red: similar). The dashed horizontal lines indicate the AEE of the selected threshold.