Roth IMSC Threshold

Royal Netherlands Meteorological Institute Ministry of Infrastructure and the Environment

Threshold selection for regional peaks-over-threshold data

Martin Roth
Joint work with A. Buishand and G. Jongbloed

Area of Interest

[Figure: two map panels with axes lon and lat; an overview (lon 4 to 8, lat 51 to 54) and a close-up (lon 5.0 to 6.5, lat 51.75 to 52.75).]

Waterboard Vallei & Veluwe
21 daily precipitation series for 1951–2009

Buishand et al. (2013): Homogeneity of precipitation series in the Netherlands and their trends in the past century. Int. J. Climatol.

M. Roth

IMSC 2016: Regional Threshold Selection

Why Threshold Selection?

Two main approaches for extreme value analysis
- Block maxima: one value per year ⇒ generalized extreme value distribution
- Peaks-over-threshold: all peak values above a threshold ⇒ generalized Pareto distribution (GPD)

Trade-off situation
- Low thresholds lead to bias in the analysis of the excesses
- High thresholds result in high parameter estimation uncertainty
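The two approaches and the trade-off can be illustrated with a minimal sketch. It assumes a hypothetical daily series (exponential amounts standing in for precipitation) and treats every exceedance as a peak, so no declustering is applied; the helper `excesses` is an illustrative name, not part of the talk.

```python
import random

random.seed(1)
YEARS, DAYS = 30, 365
# Hypothetical daily series: exponential amounts stand in for precipitation.
series = [[random.expovariate(0.2) for _ in range(DAYS)] for _ in range(YEARS)]

# Block maxima: exactly one value per year.
annual_maxima = [max(year) for year in series]

def excesses(values, tau):
    """Return the empirical tau-quantile (the threshold) and the excesses above it."""
    ordered = sorted(values)
    threshold = ordered[int(tau * len(ordered))]
    return threshold, [v - threshold for v in ordered if v > threshold]

# Peaks-over-threshold: everything above the threshold (no declustering here).
pooled = [v for year in series for v in year]
counts = {}
for tau in (0.90, 0.95, 0.99):
    u, exc = excesses(pooled, tau)
    counts[tau] = len(exc)
    print(f"tau={tau:.2f}  threshold={u:6.2f}  excesses={len(exc)}")
```

Raising τ leaves fewer excesses for estimation, which is the uncertainty side of the trade-off; the bias side depends on how well the GPD describes the data above a low threshold.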


Preliminaries

At-site methods for threshold selection
- Visual inspection: mean excess plot, threshold stability plot
- Goodness-of-fit tests: Anderson–Darling or Kolmogorov–Smirnov statistic
- Many more ... [1]

Regional frequency analysis
- Reduces parameter estimation uncertainty
- Often regional constant exceedance probability (e.g. 5%)
- But inference on threshold level only on at-site basis

[1] Scarrott and MacDonald (2012): A review of extreme value threshold estimation and uncertainty quantification. REVSTAT
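As a rough illustration of two of the ingredients above, the following sketch computes the empirical mean excess (the quantity shown in a mean excess plot) and applies a regionally constant exceedance probability of 5%. The three sites, their scales and the helper names are assumptions made for the example.

```python
import random

def mean_excess(values, threshold):
    """Empirical mean excess: average of (x - threshold) over all x > threshold."""
    exc = [v - threshold for v in values if v > threshold]
    return sum(exc) / len(exc)

def quantile(values, tau):
    """Simple empirical quantile (order statistic), enough for a sketch."""
    ordered = sorted(values)
    return ordered[min(int(tau * len(ordered)), len(ordered) - 1)]

random.seed(2)
# Three hypothetical sites: exponential data with different scales.
sites = {name: [random.expovariate(1.0 / scale) for _ in range(20000)]
         for name, scale in (("A", 4.0), ("B", 5.0), ("C", 6.0))}

# Regional constant exceedance probability (5%): the probability is shared,
# but the threshold level itself differs from site to site.
thresholds = {name: quantile(x, 0.95) for name, x in sites.items()}
me = {name: mean_excess(x, thresholds[name]) for name, x in sites.items()}

for name in sites:
    print(f"site {name}: threshold={thresholds[name]:.2f}  mean excess={me[name]:.2f}")
```

For exponential data the mean excess is constant in the threshold and equal to the scale, so each printed mean excess should be close to the scale of its site.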


Simulation: Smooth Marginal Distribution

Any continuous distribution F on [0, ∞) can be written as

\[ F(x) = 1 - \exp\left( -\int_0^x h(u)\, du \right), \]

where h is the hazard rate of the distribution. Define

\[ h(x) := \eta\left( \frac{x-u}{\varepsilon} \right) h_1(x) + \left[ 1 - \eta\left( \frac{x-u}{\varepsilon} \right) \right] h_2(x), \]

with h_1 the hazard rate of some bulk distribution, h_2 the hazard rate of the GPD, and η some smooth transition function.
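A minimal sketch of how one might simulate from such a blended hazard: draw E from a unit exponential distribution and solve H(x) = E, where H is the cumulative hazard. Everything specific here is an assumption for illustration: the exponential bulk, the parameter values, the logistic form of η (falling from 1 to 0, so that the weight moves from h_1 to h_2), and the reading of u and ε as transition location and width.

```python
import math
import random

U, EPS = 10.0, 1.0        # assumed transition location and width
BETA = 4.0                # assumed bulk: exponential distribution with mean BETA
SIGMA, XI = 6.0, 0.1      # assumed GPD scale and shape

def eta(z):
    """Smooth weight falling from 1 to 0; equal to 1 / (1 + exp(z))."""
    return 0.5 * (1.0 - math.tanh(0.5 * z))

def hazard(x):
    h1 = 1.0 / BETA                 # hazard of the exponential bulk (constant)
    h2 = 1.0 / (SIGMA + XI * x)     # hazard of a GPD located at 0
    w = eta((x - U) / EPS)
    return w * h1 + (1.0 - w) * h2

def sample(rng, dx=0.02):
    """Solve H(x) = E numerically, with E ~ Exp(1) and H the cumulative hazard."""
    target = rng.expovariate(1.0)
    x, cum = 0.0, 0.0
    while True:
        step = 0.5 * (hazard(x) + hazard(x + dx)) * dx    # trapezoidal rule
        if cum + step >= target:
            return x + dx * (target - cum) / step         # linear interpolation
        x, cum = x + dx, cum + step

rng = random.Random(3)
draws = [sample(rng) for _ in range(1000)]
print(f"mean={sum(draws) / len(draws):.2f}  max={max(draws):.2f}")
```

Well below the transition the draws behave like the bulk distribution, and well above it the hazard is that of the GPD, so the tail is exactly GPD only in the limit of a sharp transition.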


Threshold Stability Plot

[Figure: estimated shape parameter ξ (vertical axis, -0.25 to 0.50) against τ (horizontal axis, 0.900 to 0.975).]

τ is the non-exceedance probability. The vertical line indicates the start of the GPD tail and the horizontal line shows the true shape parameter of the simulated data.



Threshold Stability Plot – Averaged

[Figure: estimated shape parameter ξ (vertical axis, -0.25 to 0.50) against τ (horizontal axis, 0.900 to 0.975).]

τ is the non-exceedance probability. The vertical line indicates the start of the GPD tail and the horizontal line shows the true shape parameter of the simulated data.
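The numbers behind an at-site and an averaged threshold stability plot could be produced along the following lines. This is a sketch under assumptions: the 16 sites are simulated independently from one GPD, and the shape is estimated with probability-weighted moments, which is swapped in here because it has a closed form; the estimator used for the plots is not stated on the slide.

```python
import random

def gpd_sample(rng, n, sigma, xi):
    """Draw n GPD values by inverting the CDF (xi != 0 assumed)."""
    return [sigma / xi * ((1.0 - rng.random()) ** -xi - 1.0) for _ in range(n)]

def pwm_shape(excesses):
    """GPD shape estimate from probability-weighted moments (assumed estimator)."""
    x = sorted(excesses)
    n = len(x)
    a0 = sum(x) / n
    a1 = sum((n - 1 - i) / (n - 1) * v for i, v in enumerate(x)) / n
    return 2.0 - a0 / (a0 - 2.0 * a1)

def stability(values, taus):
    """Shape estimate from the excesses over the tau-quantile, for each tau."""
    ordered = sorted(values)
    out = []
    for tau in taus:
        u = ordered[int(tau * len(ordered))]
        out.append(pwm_shape([v - u for v in ordered if v > u]))
    return out

rng = random.Random(4)
XI_TRUE = 0.1
taus = [0.900, 0.925, 0.950, 0.975]
# 16 hypothetical sites with a common shape parameter.
sites = [gpd_sample(rng, 5000, 5.0, XI_TRUE) for _ in range(16)]
curves = [stability(x, taus) for x in sites]
averaged = [sum(c[j] for c in curves) / len(curves) for j in range(len(taus))]

for tau, one_site, avg in zip(taus, curves[0], averaged):
    print(f"tau={tau:.3f}  site 1: {one_site:+.3f}  averaged: {avg:+.3f}")
```

Averaging over sites damps the sampling noise of the individual curves, which is what makes the averaged plot easier to read than a single at-site plot.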


GOF Test and Automatic Selection

[Figure: selected τ (vertical axis, 0.900 to 1.000) against the number of sites (horizontal axis: 1, 2, 4, 8, 16, 16c).]

Violin plots of the selected threshold based on the lowest value of τ for which the average KS statistic is not significant at the 5% level.
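The selection rule in the caption can be written down in a few lines. The function name and the numbers below are hypothetical and serve only to show the rule; they are not values from the talk.

```python
def select_tau(taus, statistic, critical):
    """Lowest tau whose statistic does not exceed its critical value, else None."""
    for tau, d, c in sorted(zip(taus, statistic, critical)):
        if d <= c:
            return tau
    return None

# Hypothetical values, for illustration only.
taus = [0.900, 0.925, 0.950, 0.975]
d_avg = [0.090, 0.060, 0.035, 0.040]     # regionally averaged KS statistic
d_crit = [0.050, 0.048, 0.046, 0.055]    # bootstrapped 0.95-quantiles
chosen = select_tau(taus, d_avg, d_crit)
print(chosen)
```

With these inputs the statistic first drops below its critical value at τ = 0.95, so that value is selected; if no τ in the range passes, the sketch returns `None` rather than guessing.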


Simulation: Spatial Dependence

Copula approach
- Normal copula ⇒ weak tail dependence
- Gumbel copula ⇒ strong tail dependence

Quantile-based measure of tail dependence

\[ l_u(\tau) := P\left( X_1 > F_1^{-1}(\tau) \,\middle|\, X_2 > F_2^{-1}(\tau) \right) \]

The copula parameter can be chosen such that l_u(0.9) is the same for both copulas.
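An empirical version of l_u(τ) can be sketched as follows, here for pairs simulated from a normal copula with standard normal margins. The sample size, the correlations and the function names are assumptions for the example; the Gumbel copula is not covered by this sketch.

```python
import math
import random

def normal_pairs(rng, n, rho):
    """Pairs with standard normal margins and correlation rho (normal copula)."""
    s = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((z1, rho * z1 + s * z2))
    return pairs

def tail_dependence(pairs, tau):
    """Empirical l_u(tau) = P(X1 > q1 | X2 > q2), with q the empirical tau-quantiles."""
    k = int(tau * len(pairs))
    q1 = sorted(p[0] for p in pairs)[k]
    q2 = sorted(p[1] for p in pairs)[k]
    both = sum(1 for a, b in pairs if a > q1 and b > q2)
    cond = sum(1 for _, b in pairs if b > q2)
    return both / cond

rng = random.Random(5)
l_u = {rho: tail_dependence(normal_pairs(rng, 20000, rho), 0.9)
       for rho in (0.0, 0.5, 0.9)}
for rho, value in l_u.items():
    print(f"rho={rho:.1f}  l_u(0.9)={value:.3f}")
```

For independent components the measure is close to 1 − τ, and it grows with the correlation, so a copula parameter matching a target l_u(0.9) could be found with a simple root search over that parameter.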


Effect of Spatial Dependence

[Figure: AEE (vertical axis, 0 to 30) against l_u(0.9) (horizontal axis: 0.1, 0.25, 0.5, 0.75, 0.9).]

Averaged Euclidean error of the 5-year (dots) and 50-year (triangles) return level for simulated data (red: Gumbel copula, blue: normal copula).


At-site KS Statistic

[Figure: D(τ) (vertical axis, 0.05 to 0.20) against τ (horizontal axis, 0.900 to 0.975).]

The blue line gives the KS statistic and the red line the 95% critical values.


Regional KS Statistic

[Figure: D(τ) (vertical axis, 0.04 to 0.12) against τ (horizontal axis, 0.900 to 0.975).]

The red line gives the regionally averaged KS statistic; the three other lines give the 95% critical values based on the normal copula (blue), the distance-dependent normal copula (purple), and the Gumbel copula (green).


Effect on Return Levels

[Figure: return level in mm (vertical axis, 40 to 120) against τ (horizontal axis, 0.900 to 0.975).]

Average return level as a function of the non-exceedance probability of the selected threshold for return periods of 5 (red), 50 (green) and 500 (blue) years.
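For orientation, a return level under a GPD tail is commonly computed from the threshold, the GPD parameters and the expected number of exceedances in the return period. The sketch below uses that standard peaks-over-threshold formula; the parameter values are hypothetical and the slide does not state how its return levels were computed.

```python
import math

def return_level(u, sigma, xi, tau, period_years, obs_per_year=365.25):
    """Level exceeded on average once per `period_years`, given a GPD tail above u."""
    # Expected number of exceedances of u within the return period.
    m = period_years * obs_per_year * (1.0 - tau)
    if abs(xi) < 1e-9:                       # exponential limit of the GPD
        return u + sigma * math.log(m)
    return u + sigma / xi * (m ** xi - 1.0)

# Hypothetical threshold (mm), GPD parameters and non-exceedance probability.
levels = {T: return_level(u=20.0, sigma=7.0, xi=0.1, tau=0.95, period_years=T)
          for T in (5, 50, 500)}
for T, level in levels.items():
    print(f"{T:>3}-year return level: {level:.1f} mm")
```

Because u, σ and ξ are all re-estimated when τ changes, the return level is a function of the selected threshold, which is what the figure displays.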


Conclusion

Use RFA principles also for threshold selection!
The GOF-based approach is not restricted to the GPD.

Outlook
- Inspect the role of the selection criterion
- Extend the approach to incorporate trends

Roth, M., G. Jongbloed, and T. A. Buishand (2016), Threshold selection for regional peaks-over-threshold data. Journal of Applied Statistics, 43:1291–1309.


Kolmogorov–Smirnov statistic

(1) Fix τ0 and the corresponding threshold u.
(2) Estimate the parameters based on the n excesses above u.
(3a) Simulate n values from the corresponding GPD.
(3b) Sample ñ from B(T, 1 − τ0). Simulate ñ values from the corresponding GPD. Scales to the regional setting.
(4) Calculate the KS statistic for the simulated data.
(5) Repeat steps 3 and 4 a thousand times and take the 0.95-quantile of the bootstrapped statistic as critical value for D_n at τ0.
(6) Repeat the procedure for every τ in a reasonable range.
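The at-site variant of this procedure (steps 1, 2, 3a, 4, 5 and 6) might look as follows. Several choices are assumptions of the sketch rather than statements from the slide: parameters are estimated with probability-weighted moments instead of whatever estimator the authors used, each simulated sample is refitted before its KS statistic is computed, the input series is simulated, and only 200 bootstrap repetitions are run in the demo to keep it fast.

```python
import math
import random

def gpd_cdf(x, sigma, xi):
    if abs(xi) < 1e-9:
        return 1.0 - math.exp(-x / sigma)
    base = 1.0 + xi * x / sigma
    return 1.0 if base <= 0.0 else 1.0 - base ** (-1.0 / xi)

def gpd_sample(rng, n, sigma, xi):
    out = []
    for _ in range(n):
        p = 1.0 - rng.random()                       # survival probability in (0, 1]
        out.append(-sigma * math.log(p) if abs(xi) < 1e-9
                   else sigma / xi * (p ** -xi - 1.0))
    return out

def pwm_fit(excesses):
    """(sigma, xi) from probability-weighted moments (assumed estimator)."""
    x = sorted(excesses)
    n = len(x)
    a0 = sum(x) / n
    a1 = sum((n - 1 - i) / (n - 1) * v for i, v in enumerate(x)) / n
    return 2.0 * a0 * a1 / (a0 - 2.0 * a1), 2.0 - a0 / (a0 - 2.0 * a1)

def ks_statistic(excesses, sigma, xi):
    x = sorted(excesses)
    n = len(x)
    return max(max((i + 1) / n - gpd_cdf(v, sigma, xi),
                   gpd_cdf(v, sigma, xi) - i / n) for i, v in enumerate(x))

def critical_value(rng, n, sigma, xi, reps=1000, level=0.95):
    stats = []
    for _ in range(reps):                                 # step 5
        sim = gpd_sample(rng, n, sigma, xi)               # step 3a
        stats.append(ks_statistic(sim, *pwm_fit(sim)))    # step 4 (refit: assumption)
    stats.sort()
    return stats[round(level * reps) - 1]

rng = random.Random(6)
ordered = sorted(gpd_sample(rng, 4000, 5.0, 0.1))   # hypothetical at-site series
results = {}
for tau0 in (0.90, 0.95):                           # step 6, over a short range
    u = ordered[int(tau0 * len(ordered))]           # step 1
    exc = [v - u for v in ordered if v > u]
    sigma, xi = pwm_fit(exc)                        # step 2
    d_n = ks_statistic(exc, sigma, xi)
    results[tau0] = (d_n, critical_value(rng, len(exc), sigma, xi, reps=200))
    print(f"tau0={tau0:.2f}  D_n={results[tau0][0]:.4f}  critical={results[tau0][1]:.4f}")
```

Step 3b is not implemented here; it would replace the fixed sample size n by a binomial draw before simulating.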


Averaged Euclidean Error (AEE)

\[ \mathrm{AEE}(\tau) := \frac{1}{B} \sum_{i=1}^{B} \left\| \hat{r}_i(\tau) - r \right\|_2 , \]

where ||.||_2 denotes the Euclidean (or l2) norm, B gives the number of bootstrap samples and r specifies some return level.
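The definition translates directly into code. In this sketch each estimate and the reference r are taken to be vectors with one return level per site, which is an assumption about the regional setting; the numbers are made up so that the result can be checked by hand.

```python
import math

def aee(estimates, truth):
    """Mean Euclidean distance between each bootstrap estimate and the reference."""
    return sum(math.dist(r_hat, truth) for r_hat in estimates) / len(estimates)

# Hypothetical two-site example with B = 2 bootstrap samples.
truth = (50.0, 60.0)
estimates = [(53.0, 64.0), (50.0, 60.0)]
print(aee(estimates, truth))   # distances are 5 and 0, so the average is 2.5
```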


Effect of Bulk–Tail Transition

[Figure: AEE (vertical axis, 5 to 13) against τ (horizontal axis, 0.900 to 0.975).]

AEE in the 5-year return level as a function of the probability τ for the simulated data (blue: bulk and tail are different, red: bulk and tail are similar). The dashed horizontal lines indicate the AEE of the selected threshold.
