High-dimensional peaks-over-threshold modelling for spatial extreme rainfall

Raphaël de Fondeville
Chair of Statistics, EPFL
IMSC - Canmore, AB, Canada
June 6-10, 2016

High dimensional peaks-over-threshold modelling

1 / 13

Motivation

Two types of extreme rainfall in Florida:

[Figure: two maps of rainfall over the study region (axes: lon, lat; colour scale: Rainfall), titled "Local rainfall" and "Spatially dispersed rainfall".]

- How can these different types of events be characterized?
- Is it possible to model and generate similar extreme events?
- Can we extrapolate to events with unobserved intensities?

⇒ This talk aims to illustrate how peaks-over-threshold modelling addresses these challenges and to explore techniques for inference in high dimensions.

Risk functional and R-exceedances

- Let $\{X(s)\}_{s \in S}$ be a stochastic process with sample paths in the space of nonnegative continuous functions $C_+(S)$, where $S$ is a compact subset of $\mathbb{R}^d$.

Risk functional
A continuous nonnegative functional $R : C_+(S) \to [0, +\infty)$, for which there exists $\alpha > 0$ such that
$$R(ax) = a^{\alpha} R(x), \qquad a > 0, \quad x \in C_+(S),$$
is called a risk functional.

- An R-exceedance is an event $\{R(x) > u_n\}$ where the threshold $u_n$ is such that
$$P\{R(X) > u_n\} \to 0, \qquad n \to \infty.$$
- Possible risk functionals are
  - $\sup_{s \in S} X(s)$ for events where at least one location exceeds a threshold;
  - $\sum_{t=1}^{T} \int_S X_t(s)\,ds$ for spatio-temporal accumulation;
  - $\int_S X(s)^2\,ds$ when the risk is determined by the energy inside a system.
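As an illustration of the homogeneity condition $R(ax) = a^{\alpha} R(x)$, the following minimal sketch evaluates the three example functionals on a field discretized over grid cells. The function names, the toy field values and the cell area are assumptions made for this example only; the integrals are approximated by Riemann sums.

```python
def risk_sup(x):
    """sup_s X(s), approximated by the maximum over grid cells."""
    return max(x)

def risk_accumulation(frames, cell_area):
    """sum_t int_S X_t(s) ds, approximated by a Riemann sum over cells."""
    return sum(cell_area * sum(frame) for frame in frames)

def risk_energy(x, cell_area):
    """int_S X(s)^2 ds, approximated by a Riemann sum over cells."""
    return cell_area * sum(v * v for v in x)

# Toy field on four grid cells of area 4 (hypothetical values).
field = [0.1, 0.0, 0.7, 0.3]
a = 3.0
scaled = [a * v for v in field]

# Ratios R(ax) / R(x): a^1 for sup and accumulation, a^2 for energy.
sup_ratio = risk_sup(scaled) / risk_sup(field)
acc_ratio = risk_accumulation([scaled], 4.0) / risk_accumulation([field], 4.0)
energy_ratio = risk_energy(scaled, 4.0) / risk_energy(field, 4.0)
```

The supremum and the accumulation are homogeneous of order $\alpha = 1$, whereas the energy functional has $\alpha = 2$.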

Pareto processes

- For any risk functional $R$, if
$$P\left\{ u^{-1} X \in \cdot \mid R(X) > u \right\} \to P(P \in \cdot), \qquad u \to \infty,$$
then $P$ is an R-Pareto process (Dombry and Ribatet, 2015).

R-Pareto process
An R-Pareto process may be constructed as
$$P = U \frac{Q}{R(Q)},$$
where
- $U$ is a univariate Pareto random variable with $P(U > r) = 1/r^{\beta}$, $r > 1$;
- $Q$ is a stochastic process with sample paths in $S_{\mathrm{ang}}^{+} = \{x \in C_+(S) : \|x\|_{\infty} = 1\}$ and probability measure $\sigma_{\mathrm{ang}}$.

- $\beta$ is the tail index of the Pareto process.
- $\sigma_{\mathrm{ang}}$ is called the angular or spectral measure.
- There is an equivalence between the distribution of maxima, i.e., max-stable processes, and R-exceedances.
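The construction $P = U\,Q / R(Q)$ can be sketched on a finite grid as follows. This is only an illustration: the sampler `toy_angular_sampler` is a hypothetical stand-in for the angular measure $\sigma_{\mathrm{ang}}$ (it is not a Brown–Resnick sampler), and the risk functional is assumed to be homogeneous of order 1.

```python
import random

def risk_sup(x):
    """Supremum risk functional on a discretized field."""
    return max(x)

def toy_angular_sampler(rng, n_cells=5):
    """Hypothetical stand-in for sigma_ang: positive vector with sup-norm 1."""
    raw = [rng.random() + 1e-9 for _ in range(n_cells)]
    m = max(raw)
    return [v / m for v in raw]

def simulate_r_pareto(sample_q, risk, beta, rng):
    """Draw one realisation of P = U * Q / R(Q)."""
    # Inverse transform: if V is uniform on (0, 1], then U = V^(-1/beta)
    # satisfies P(U > r) = r^(-beta) for r > 1.
    v = 1.0 - rng.random()
    u = v ** (-1.0 / beta)
    q = sample_q(rng)
    r_q = risk(q)
    return u, [u * qi / r_q for qi in q]

rng = random.Random(1)
u, p = simulate_r_pareto(toy_angular_sampler, risk_sup, beta=1.0, rng=rng)
```

By homogeneity, $R(P) = U$ for every draw, so the risk of a simulated event is Pareto distributed and always exceeds 1.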

Brown–Resnick model

- For the Brown–Resnick model, the tail index $\beta = 1$ and the random vector $Q$ follows a log-Gaussian distribution with stationary increments.
- The probability measure $\sigma_{\mathrm{ang}}$ is fully characterized by the semi-variogram
$$\gamma_\theta(s - s') = \frac{1}{2} E\left[ \{X(s) - X(s')\}^2 \right], \qquad s, s' \in S,$$
where $\theta$ denotes the parameters of the model (Kabluchko et al., 2009).
- The $\ell$-dimensional intensity function of the associated Pareto process is
$$\lambda^R_\theta(x) = \frac{|\Sigma_\theta|^{-1/2}}{x_1^2 x_2 \cdots x_\ell (2\pi)^{(\ell-1)/2}} \exp\left( -\frac{1}{2} \tilde{x}^T \Sigma_\theta^{-1} \tilde{x} \right), \qquad x \in \mathbb{R}_+^\ell \setminus \{0\},$$
where $\tilde{x}$ is the $(\ell - 1)$-dimensional vector
$$\{\log(x_j / x_1) + \gamma_\theta(s_j - s_1) : j = 2, \dots, \ell\}^T,$$
and $\Sigma_\theta$ is the $(\ell - 1) \times (\ell - 1)$ matrix
$$\{\gamma_\theta(s_i - s_1) + \gamma_\theta(s_j - s_1) - \gamma_\theta(s_i - s_j)\}_{i,j \in \{2, \dots, \ell\}}.$$
- To derive a density function for a region of exceedances, such as $A_R(u) = \{x \in \mathbb{R}_+^\ell : R(x/u) > 1\}$, the intensity $\lambda^R_{\theta,u}$ must be rescaled by
$$\Lambda_\theta\{A_R(u)\} = \int_{A_R(u)} \lambda^R_\theta(x)\,dx, \qquad u \in \mathbb{R}_+^\ell.$$
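The log-intensity of the Brown–Resnick Pareto process can be evaluated directly from the semi-variogram. The sketch below uses only the Python standard library and a small hand-written Cholesky factorization; the isotropic power variogram `power_variogram`, the site coordinates and the values of `x` are assumptions for the example, and for thousands of grid cells a numerical linear algebra library would be the natural choice.

```python
import math

def power_variogram(h, tau=10.0, kappa=1.0):
    """Hypothetical isotropic semi-variogram gamma(h) = (|h| / tau)^kappa."""
    return (math.hypot(h[0], h[1]) / tau) ** kappa

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def log_intensity(x, sites, gamma):
    """Log of the l-dimensional Brown-Resnick intensity at x."""
    ell = len(x)

    def diff(p, q):
        return (p[0] - q[0], p[1] - q[1])

    g1 = [gamma(diff(s, sites[0])) for s in sites]      # gamma(s_j - s_1)
    # x_tilde_j = log(x_j / x_1) + gamma(s_j - s_1), j = 2, ..., l
    xt = [math.log(x[j] / x[0]) + g1[j] for j in range(1, ell)]
    # Sigma_ij = gamma(s_i - s_1) + gamma(s_j - s_1) - gamma(s_i - s_j)
    sigma = [[g1[i] + g1[j] - gamma(diff(sites[i], sites[j]))
              for j in range(1, ell)] for i in range(1, ell)]
    low = cholesky(sigma)
    # Solve L z = x_tilde, so that x_tilde^T Sigma^{-1} x_tilde = |z|^2.
    z = []
    for i in range(ell - 1):
        z.append((xt[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(ell - 1))
    return (-0.5 * log_det
            - 2.0 * math.log(x[0])
            - sum(math.log(v) for v in x[1:])
            - 0.5 * (ell - 1) * math.log(2.0 * math.pi)
            - 0.5 * sum(v * v for v in z))

sites = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]   # made-up coordinates
x = [1.5, 0.8, 2.0]                            # made-up positive values
value = log_intensity(x, sites, power_variogram)
```

A useful sanity check is that the intensity is homogeneous of order $-(\ell + 1)$: multiplying `x` by $t$ shifts the log-intensity by $-(\ell + 1)\log t$, since $\tilde{x}$ depends only on ratios.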

The gradient score

- With hundreds of grid cells, the computation of the $\ell$-dimensional integral $\Lambda_\theta\{A_R(u)\}$ must be simplified using particular risk functionals or simply avoided.
- An adaptation of the gradient score (Hyvärinen, 2007) allows statistical inference using the gradient of the log-density,
$$\delta_w(\lambda^R_{\theta,u}, x) = \sum_{i=1}^{\ell} \left( 2 w_i(x) \frac{\partial w_i(x)}{\partial x_i} \frac{\partial \log \lambda^R_{\theta,u}(x)}{\partial x_i} + w_i(x)^2 \left[ \frac{\partial^2 \log \lambda^R_{\theta,u}(x)}{\partial x_i^2} + \frac{1}{2} \left\{ \frac{\partial \log \lambda^R_{\theta,u}(x)}{\partial x_i} \right\}^2 \right] \right),$$
where $w : A_R(u) \to \mathbb{R}_+^\ell$ is a weighting function.
- The weighting function $w$ must be differentiable on $A_R(u)$ and vanish on the boundaries of $A_R(u)$.
- $\delta_w(\lambda^R_{\theta,u}, x)$ is strictly proper, i.e., the estimator
$$\hat\theta^R_{\delta,u}\{x^1, \dots, x^n\} = \arg\max_{\theta \in \Theta} \sum_{m=1}^{n} 1\left\{ R\left( \frac{x^m}{u} \right) > 1 \right\} \delta_w(\lambda^R_{\theta,u}, x^m), \qquad (1)$$
where $1\{\cdot\}$ is the indicator function, is consistent and asymptotically normal.
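The score $\delta_w$ only involves first and second derivatives of the log-density, so the normalizing integral never appears. The following sketch evaluates the formula with central finite differences for an arbitrary log-density and weighting function. The function names are hypothetical, and the toy log-density and weights are chosen only because the result can be checked by hand; in particular, the toy weights are not meant to satisfy the boundary condition on $A_R(u)$.

```python
def gradient_score(log_density, weight, x, h=1e-3):
    """Evaluate delta_w(lambda, x) using central finite differences."""
    total = 0.0
    f0 = log_density(x)
    w = weight(x)
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        f_up, f_down = log_density(up), log_density(down)
        d1 = (f_up - f_down) / (2.0 * h)              # d log(lambda) / dx_i
        d2 = (f_up - 2.0 * f0 + f_down) / (h * h)     # d^2 log(lambda) / dx_i^2
        dw = (weight(up)[i] - weight(down)[i]) / (2.0 * h)   # dw_i / dx_i
        total += 2.0 * w[i] * dw * d1 + w[i] ** 2 * (d2 + 0.5 * d1 ** 2)
    return total

def toy_log_density(x):
    """Toy example: log-density -|x|^2 / 2, up to an additive constant."""
    return -0.5 * sum(v * v for v in x)

def toy_weight(x):
    """Toy example: w_i(x) = x_i."""
    return list(x)

# Closed form for this toy case: sum_i (x_i^4 / 2 - 3 x_i^2).
score = gradient_score(toy_log_density, toy_weight, [1.0, 2.0])
```

Additive constants in the log-density cancel in every difference, which is why an unnormalized intensity can be used in place of the density.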

Extreme rainfall over Florida

- 15-minute radar rainfall measurements over Florida from 1994 to 2010 with 2 km resolution.
- We focus on a 120 km × 120 km square south-west of Orlando and on the wet season, i.e., June to September.
- The region was selected for its homogeneity and orography by Buhl and Klüppelberg (2016).

[Figure: map of the study region (axes: lon, lat). Caption: Radar rainfall measurement grid (2 km × 2 km) over East Florida.]

Marginal modelling

- Generalized Pareto distributions are fitted at each location $s_i$ using exceedances over the 99th percentile.
- A model with common shape parameter $\xi_0 = 0.124$ is retained.
- Margins are then transformed to unit Fréchet:
$$X^*(s_i) = -1 / \log \tilde{F}_i\{X(s_i)\},$$
where
$$\tilde{F}_i\{X(s_i)\} = \begin{cases} \hat{F}_i\{X(s_i)\}, & X(s_i) \le q_{99}(s_i), \\ 1 - G_{\{\xi_0, \hat\sigma(s_i), q_{99}(s_i)\}}\{X(s_i)\}, & X(s_i) > q_{99}(s_i), \end{cases} \qquad (2)$$
and
  - $\hat{F}_i$ is the empirical cumulative distribution function at location $s_i$;
  - $G_{\{\xi_0, \hat\sigma(s_i), q_{99}(s_i)\}}$ is the distribution function of a generalized Pareto random variable with shape $\xi_0$, scale $\hat\sigma(s_i)$ and location $q_{99}(s_i)$.
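A minimal sketch of the marginal transformation at a single site is given below. It makes several assumptions that are not taken from the talk: the tail piece is written as $1 - 0.01\,\{1 - G(x)\}$, i.e. the generalized Pareto survival function weighted by the exceedance probability, which is one common convention and may differ from the exact form used in equation (2); the empirical distribution function uses the denominator $n + 1$; and the data and the scale parameter `sigma` are made up. Only the shape $\xi_0 = 0.124$ comes from the slide.

```python
import bisect
import math
import random

def gpd_survival(y, xi, sigma, mu):
    """1 - G(y) for a generalized Pareto law with shape xi > 0 and y >= mu."""
    return (1.0 + xi * (y - mu) / sigma) ** (-1.0 / xi)

def to_unit_frechet(sample, xi, sigma, p=0.99):
    """Map the observations at one site to the unit Frechet scale."""
    n = len(sample)
    ordered = sorted(sample)
    k = int(p * n)              # index of the threshold order statistic
    q = ordered[k - 1]          # empirical threshold, playing the role of q99
    out = []
    for y in sample:
        if y <= q:
            # Empirical CDF with denominator n + 1, strictly inside (0, 1).
            f = bisect.bisect_right(ordered, y) / (n + 1.0)
        else:
            # Assumed convention: exceedance probability times GPD survival.
            f = 1.0 - (1.0 - p) * gpd_survival(y, xi, sigma, q)
        out.append(-1.0 / math.log(f))
    return out

rng = random.Random(0)
rain = [rng.expovariate(1.0) for _ in range(1000)]   # made-up data, one site
frechet = to_unit_frechet(rain, xi=0.124, sigma=0.5)
```

The transformation is increasing, so the ordering of events at a site is preserved while the margins are put on a common heavy-tailed scale.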

Risk functionals and weighting function

- We define two risk functionals
$$R_{\max}(X^*) = \left[ \sum_{i=1}^{\ell} \{X^*(s_i)\}^{20} \right]^{1/20}, \qquad R_{\mathrm{sum}}(X^*) = \left[ \sum_{i=1}^{\ell} \{X^*(s_i)\}^{\xi_0} \right]^{1/\xi_0},$$
where $\ell = 3600$ is the number of grid cells and $\xi_0$ is the shape parameter of the marginal model.
- $R_{\max}$ is a continuous and differentiable approximation of $\max_{i=1,\dots,\ell} X^*(s_i)$ which satisfies the requirements for the gradient score.
- $R_{\mathrm{sum}}$ selects events with large spatial cover. The power $\xi_0$ approximately transforms the data $X^*$ back to a scale where summing observations has a physical meaning.
- For gradient score inference, we use the weighting function
$$w_i(x) = x_i \left[ 1 - e^{-\{R(x/u) - 1\}} \right], \qquad i = 1, \dots, \ell, \quad x \in A_R(u),$$
which vanishes on the boundary $R(x/u) = 1$ of $A_R(u)$.
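Both risk functionals are power sums, so one helper covers them. The sketch below is an illustration under stated assumptions: the exponent of the weighting function is read as $-\{R(x/u) - 1\}$, so that the weights vanish when $R(x/u) = 1$; the function names and the toy values are hypothetical; and the sum is rescaled by the largest component, which uses the order-1 homogeneity of the functional to avoid overflow when raising large unit Fréchet values to the power 20.

```python
import math

def risk_power(x, p):
    """(sum_i x_i^p)^(1/p) for positive x; p = 20 for R_max, p = xi_0 for R_sum."""
    m = max(x)
    # Factor out the largest value so that no term exceeds 1 before the power.
    return m * sum((v / m) ** p for v in x) ** (1.0 / p)

def weights(x, u, p):
    """w_i(x) = x_i * (1 - exp(-(R(x/u) - 1))), intended for R(x/u) > 1."""
    r = risk_power([v / u for v in x], p)
    return [v * (1.0 - math.exp(-(r - 1.0))) for v in x]

x_star = [1.0, 2.0, 50.0, 3.0]          # made-up unit Frechet values
r_max = risk_power(x_star, 20)          # close to max(x_star)
r_sum = risk_power(x_star, 0.124)       # xi_0 = 0.124 from the marginal model

w_inside = weights(x_star, u=10.0, p=20)       # R(x/u) > 1
w_boundary = weights(x_star, u=r_max, p=20)    # R(x/u) = 1
```

With the exponent 20 the functional is dominated by the largest cell, whereas the small exponent $\xi_0$ lets every cell contribute, which is why $R_{\mathrm{sum}}$ favours spatially widespread events.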

Spatial model and parameter estimates

- Non-separable semi-variogram model
$$\gamma(s_i, s_j) = \left\| \frac{\Omega (s_i - s_j)}{\tau} \right\|^{\kappa}, \qquad s_i, s_j \in [0, 120]^2, \quad i, j \in \{1, \dots, 3600\},$$
with $0 < \kappa \le 2$, $\tau > 0$ and anisotropy matrix
$$\Omega = \begin{pmatrix} \cos\eta & -\sin\eta \\ a \sin\eta & a \cos\eta \end{pmatrix}, \qquad \eta \in \left( -\frac{\pi}{2}, \frac{\pi}{2} \right], \quad a > 1.$$
- Fitted parameters obtained for both risk functionals with exceedances of $R_{\max}(X^*)$ and $R_{\mathrm{sum}}(X^*)$ over the 0.99 quantile:

         $\kappa$          $\tau$            $\eta$            $a$
  Rmax   $1.192_{0.02}$    $9.06_{0.19}$     $0.08_{0.61}$     $1.008_{0.005}$
  Rsum   $0.326_{0.007}$   $46.67_{0.018}$   $-0.30_{0.10}$    $1.064_{0.017}$

  - The $R_{\max}$ estimates give a fairly smooth model with a small scale; they capture high quantiles and induce a model similar to that in Buhl and Klüppelberg (2016).
  - For $R_{\mathrm{sum}}$, the semi-variogram is rougher but with a much larger scale, which is consistent with large-scale events.
  - Anisotropy does not seem significant.
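To make the fitted models concrete, the semi-variogram can be evaluated at the reported point estimates. This is a minimal sketch that reads the anisotropy matrix as $\Omega = \begin{pmatrix} \cos\eta & -\sin\eta \\ a\sin\eta & a\cos\eta \end{pmatrix}$ and the norm as Euclidean; the function name and the pair of locations are chosen for illustration.

```python
import math

def semi_variogram(si, sj, kappa, tau, eta, a):
    """gamma(s_i, s_j) = || Omega (s_i - s_j) / tau ||^kappa."""
    hx, hy = si[0] - sj[0], si[1] - sj[1]
    # Omega = [[cos(eta), -sin(eta)], [a sin(eta), a cos(eta)]]
    ox = math.cos(eta) * hx - math.sin(eta) * hy
    oy = a * (math.sin(eta) * hx + math.cos(eta) * hy)
    return (math.hypot(ox, oy) / tau) ** kappa

# Point estimates reported for the two risk functionals.
fit_max = dict(kappa=1.192, tau=9.06, eta=0.08, a=1.008)
fit_sum = dict(kappa=0.326, tau=46.67, eta=-0.30, a=1.064)

# Two locations 20 units apart on the [0, 120]^2 grid.
g_max = semi_variogram((0.0, 0.0), (20.0, 0.0), **fit_max)
g_sum = semi_variogram((0.0, 0.0), (20.0, 0.0), **fit_sum)
```

At this separation the $R_{\max}$ fit gives a larger semi-variogram value than the $R_{\mathrm{sum}}$ fit, i.e. weaker dependence, in line with the small-scale versus large-scale interpretation of the two models.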

Simulated extreme rainfall

[Figure: grid of rainfall maps (axes: lon, lat; colour scale: Rainfall), in two groups of columns labelled Rsum and Rmax, with rows labelled Observations and Simulations.]

15-minute cumulated rainfall (inches): observed (first row) and simulated (second and third rows) for the risk functionals Rsum (left) and Rmax (right) with intensity equivalent to the 0.99 quantile.

Discussion

- Pareto processes generalize the notion of exceedance over a high threshold.
- Risk functionals allow for broad and complex definitions of risk.
- The gradient score can be used to perform statistical inference in high dimensions. For the Brown–Resnick model, the number of locations is limited only by the need to invert the covariance matrix.
- We develop an extreme rainfall generator over Florida for two types of risks: local heavy rain, and spatially widespread rainfall.
- This model provides reasonable simulations, especially given its simplicity.
- The marginal model should be improved to better handle dry locations.
- With 15-minute measurements, a spatio-temporal model should be considered to account for temporal dependence.

Bibliography

Buhl, S. and Klüppelberg, C. (2016). Anisotropic Brown–Resnick Space-time Processes: Estimation and Model Assessment. Extremes, to appear.
Dombry, C. and Ribatet, M. (2015). Functional Regular Variations, Pareto Processes and Peaks Over Thresholds. Statistics and Its Interface, 8(1):9–17.
Hyvärinen, A. (2007). Some Extensions of Score Matching. Computational Statistics & Data Analysis, 51(5):2499–2512.
Kabluchko, Z., Schlather, M., and de Haan, L. (2009). Stationary Max-stable Fields Associated to Negative Definite Functions. Annals of Probability, 37(5):2042–2065.