Projection pursuit regression (Canmore 2016)

13th International Meeting on Statistical Climatology

Projection pursuit regression in regional flood frequency analysis
Martin Durocher, Fateh Chebana, Taha B.M.J. Ouarda

June 8, 2016

Outline
• Context: RFA and nonlinearity
• Existing methods and their drawbacks
• PPR model
• Application
• Conclusion

Frequency analysis of extreme hydrological events
• Designing infrastructure
• Preventing floods
• At-site analysis (when data are available)
  – Fitting a distribution
  – Predicting flood quantiles
• Regional frequency analysis (RFA)
  – Estimates extreme events at ungauged target sites
  – Transfers hydrological information from gauged sites in a region to the ungauged target site
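The slides do not say which distribution is fitted at each gauged site. As a minimal sketch, assuming a Gumbel (EV1) distribution fitted by the method of moments and a short, made-up series of annual maximum flows, the two at-site steps (fit a distribution, then predict the 100-year quantile) could look like this. The function names are hypothetical.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def gumbel_mom(annual_maxima):
    """Fit a Gumbel (EV1) distribution by the method of moments.

    Assumption: Gumbel is used for illustration only.
    Returns (location mu, scale beta)."""
    s = stdev(annual_maxima)                       # sample standard deviation
    beta = math.sqrt(6.0) * s / math.pi            # Gumbel variance = pi^2 beta^2 / 6
    mu = mean(annual_maxima) - EULER_GAMMA * beta  # Gumbel mean = mu + gamma * beta
    return mu, beta

def gumbel_quantile(mu, beta, return_period):
    """Quantile x_T with non-exceedance probability p = 1 - 1/T."""
    p = 1.0 - 1.0 / return_period
    return mu - beta * math.log(-math.log(p))  # inverse of F(x) = exp(-exp(-(x - mu)/beta))

# Hypothetical annual maximum flows (m^3/s) at one gauged site
flows = [212.0, 340.5, 198.3, 275.0, 410.2, 260.7, 305.9, 231.4, 288.8, 365.1]
mu, beta = gumbel_mom(flows)
q100 = gumbel_quantile(mu, beta, 100)
print(f"mu = {mu:.1f}, beta = {beta:.1f}, Q100 = {q100:.1f}")
```

RFA exists precisely because such a record is missing at the target site, so the quantile has to be transferred from gauged sites instead.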

Nonlinearity and RFA
• Nonlinear models may be better justified, since hydrological processes are naturally nonlinear
• In RFA:
  – Generalized additive models (GAMs) have recently been investigated
  – Approaches based on machine learning (data-driven algorithms), such as artificial neural networks (ANNs)

Drawbacks of existing models in RFA
• ANN calibration requires a large dataset
• ANN does not lead to explicit regression equations (lack of interpretability)
• ANN calibration is not an easy task: problems may occur if proper guidelines are not followed
• ANN involves numerical difficulties and subjective choices
• GAM estimates a separate nonlinear function for each input basin characteristic
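For reference, a GAM in this setting has the standard additive form below, where y_i is the (possibly transformed) flood quantile at site i and x_{ij} its j-th basin characteristic. The notation is ours, not the slides'.

```latex
y_i = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij}) + \varepsilon_i, \qquad i = 1, \dots, n
```

If all four characteristics of the case study are used (p = 4), the GAM therefore has four smooth functions f_1, ..., f_4 to estimate.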

• Objective: develop a simple methodology with better interpretability

• Projection pursuit regression (PPR) [Friedman and Stuetzle, 1981], building on projection pursuit [Friedman and Tukey, 1974]
  – Simple to use
  – Parsimonious
  – Interpretable
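In the standard formulation of Friedman and Stuetzle, PPR models the response through M ridge functions g_m, each applied to a projection z_{im} of the basin characteristics onto a unit direction α_m; these projections are the intermediate predictors. The notation below is ours.

```latex
y_i = \beta_0 + \sum_{m=1}^{M} g_m\left(z_{im}\right) + \varepsilon_i,
\qquad z_{im} = \boldsymbol{\alpha}_m^{\top} \mathbf{x}_i,
\qquad \lVert \boldsymbol{\alpha}_m \rVert = 1
```

With M = 1 or 2, the model needs at most two smooth functions whatever the number of characteristics, whereas a GAM needs one per characteristic.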

Projection pursuit regression
• PPR uses intermediate predictors (linear combinations of the basin characteristics) instead of the basin characteristics themselves
• A particularity of PPR is that the smooth functions and the intermediate predictors are estimated jointly
• In special cases, PPR can be parsimonious enough to provide a meaningful structure
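The joint estimation of the direction and the smooth function can be sketched for a single term (M = 1). This is a minimal illustration under stated assumptions, not the authors' implementation: it swaps Friedman's supersmoother for a cubic polynomial fitted with numpy.polyfit, and it updates the direction by Gauss-Newton steps with step halving. The names fit_ppr_one_term and predict, and the synthetic data, are hypothetical.

```python
import numpy as np

def fit_ppr_one_term(X, y, degree=3, n_iter=100, tol=1e-10):
    """One-term PPR sketch: y ~ b0 + g(a^T x), with g a polynomial (stand-in smoother)."""
    b0 = y.mean()
    r = y - b0  # centred response

    def smooth(a):
        # Re-estimate the ridge function g on the intermediate predictor z = X a
        z = X @ a
        coef = np.polyfit(z, r, degree)
        rss = np.sum((r - np.polyval(coef, z)) ** 2)
        return coef, rss

    # Start from the least-squares (linear) direction
    a = np.linalg.lstsq(X, r, rcond=None)[0]
    a /= np.linalg.norm(a)
    coef, rss = smooth(a)

    for _ in range(n_iter):
        z = X @ a
        g = np.polyval(coef, z)
        dg = np.polyval(np.polyder(coef), z)  # g'(z)
        # Gauss-Newton linearization: r - g(X a) ~ diag(g'(z)) X delta
        J = dg[:, None] * X
        delta = np.linalg.lstsq(J, r - g, rcond=None)[0]
        step = 1.0
        while step > 1e-6:
            a_new = a + step * delta
            a_new /= np.linalg.norm(a_new)     # keep ||a|| = 1
            coef_new, rss_new = smooth(a_new)  # joint update: new direction, new g
            if rss_new < rss:
                break
            step /= 2.0
        else:
            break  # no step reduced the RSS: stop iterating
        improvement = rss - rss_new
        a, coef, rss = a_new, coef_new, rss_new
        if improvement < tol * (1.0 + rss):
            break
    return b0, a, coef

def predict(model, X):
    b0, a, coef = model
    return b0 + np.polyval(coef, X @ a)

# Synthetic example: 4 standardized predictors driven by one true direction
rng = np.random.default_rng(42)
X = rng.standard_normal((151, 4))
a_true = np.array([0.8, 0.6, 0.0, 0.0])
z_true = X @ a_true
y = 1.0 + z_true + 0.5 * z_true**2 + 0.05 * rng.standard_normal(151)

model = fit_ppr_one_term(X, y)
print("estimated direction:", np.round(model[1], 3))
```

The estimated direction should lie close to a_true (up to sign). The fitted model is an explicit equation, b0 + g(a^T x) with g a cubic, which illustrates why PPR can yield interpretable regression equations even though g does not act on any single basin characteristic.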

Case study
• Data
  – 151 hydrometric stations in southern Quebec
  – Specific flood quantiles with a 100-year return period
  – Basin characteristics (selected by forward stepwise selection)
    • Drainage area (DA)
    • Percentage of the basin occupied by lakes (PL)
    • Mean annual solid precipitation (SP)
    • Degree-days below 0 °C (DZ)
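The slides state only that the basin characteristics were chosen by forward stepwise selection. A minimal sketch, assuming ordinary least squares with an intercept and AIC as the entry criterion (both are our assumptions), on hypothetical synthetic data with placeholder names x1 to x5:

```python
import numpy as np

def ols_aic(X, y):
    """Gaussian AIC (up to a constant) of an OLS fit with intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    k = A.shape[1]  # number of estimated coefficients
    return n * np.log(rss / n) + 2 * k

def forward_stepwise(X, y, names):
    """Add, one at a time, the candidate that lowers AIC most; stop when none does."""
    selected, remaining = [], list(range(X.shape[1]))
    best_aic = ols_aic(np.empty((len(y), 0)), y)  # intercept-only model
    while remaining:
        scores = [(ols_aic(X[:, selected + [j]], y), j) for j in remaining]
        aic, j = min(scores)
        if aic >= best_aic:
            break  # no candidate improves AIC
        best_aic = aic
        selected.append(j)
        remaining.remove(j)
    return [names[j] for j in selected]

# Hypothetical data: only x1 and x3 drive the response
rng = np.random.default_rng(1)
names = ["x1", "x2", "x3", "x4", "x5"]
X = rng.standard_normal((151, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.5 * rng.standard_normal(151)
result = forward_stepwise(X, y, names)
print(result)
```

Variables enter in order of how much they reduce the criterion, so x1 is picked before x3 here; the criterion actually used for the Quebec data is not given in the slides.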

• Predictions at ungauged locations


• Model description


• Model description (cont.)


Performance comparison

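The comparison results themselves are not recoverable from this text. In RFA, performance at ungauged sites is often estimated by leave-one-out (jackknife) cross-validation: each station in turn is removed, the model is refitted, and the removed station is predicted as if it were ungauged. A sketch assuming generic fit/predict callables, RMSE and relative bias as criteria (our choice, not necessarily the slides'), and a hypothetical log-linear regression as the model:

```python
import numpy as np

def loo_predictions(X, y, fit, predict):
    """Leave-one-out: refit without site i, then predict site i as if ungauged."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = fit(X[keep], y[keep])
        preds[i] = predict(model, X[i:i + 1])[0]
    return preds

def rmse(obs, pred):
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def rel_bias(obs, pred):
    return float(np.mean((pred - obs) / obs))

# Hypothetical baseline model: linear regression on the log quantile
def fit_ols(X, y):
    A = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(A, np.log(y), rcond=None)[0]

def predict_ols(beta, X):
    return np.exp(np.column_stack([np.ones(len(X)), X]) @ beta)

# Synthetic stand-in for 151 stations with 4 characteristics
rng = np.random.default_rng(0)
X = rng.standard_normal((151, 4))
y = np.exp(1.0 + X @ np.array([0.5, -0.3, 0.2, 0.0]) + 0.1 * rng.standard_normal(151))
pred = loo_predictions(X, y, fit_ols, predict_ols)
print(f"LOO RMSE = {rmse(y, pred):.3f}, relative bias = {rel_bias(y, pred):.3f}")
```

Any model exposing the same fit/predict pair (a GAM, an ANN, or a PPR fit) can be passed to loo_predictions, so all candidates are scored on identical held-out sites.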

Conclusions
• PPR offers a compromise between GAM and ANN
• Advantages
  – Model with few directions (1 or 2)
  – Most parsimonious of the compared models
  – Predictive performance similar to GAM
  – Explicit regression equations
• Drawbacks
  – No direct relation to individual basin characteristics
  – Non-optimal smoothing

Thank you
