13th International Meeting on Statistical Climatology
Projection pursuit regression in regional flood frequency analysis Martin Durocher, Fateh Chebana, Taha B.M.J. Ouarda
June 8 2016
Outline • Context: RFA and nonlinearity • Existing methods and their drawbacks • PPR model • Application
• Conclusion
Frequency analysis of extreme hydrological events • Designing infrastructure • Preventing floods • At-site analysis (when data available) – Fitting a distribution – Predicting flood quantiles
• Regional frequency analysis (RFA) – Estimates extreme events for ungauged target sites – Transfers hydrological information from gauged sites in a region to the ungauged target site
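The at-site steps (fitting a distribution, then predicting a flood quantile) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the choice of the GEV distribution, the synthetic annual maxima, and the variable names are all assumptions made for the example.

```python
import numpy as np
from scipy.stats import genextreme

# Synthetic annual maximum flows (assumption: stand-in for a gauged record).
rng = np.random.default_rng(42)
annual_maxima = genextreme.rvs(c=-0.1, loc=100.0, scale=30.0, size=60,
                               random_state=rng)

# Step 1: fit a distribution (here a GEV, by maximum likelihood).
shape, loc, scale = genextreme.fit(annual_maxima)

# Step 2: predict the flood quantile for a given return period T,
# i.e. the value exceeded with probability 1/T in any given year.
return_period = 100
q100 = genextreme.ppf(1.0 - 1.0 / return_period, shape, loc=loc, scale=scale)

print(f"Estimated {return_period}-year flood quantile: {q100:.1f}")
```

At an ungauged site no such record exists, which is why RFA instead predicts the quantile from basin characteristics.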
Nonlinearity and RFA • Nonlinear models may be better justified because hydrological processes are naturally nonlinear
• In RFA: – Generalized additive models (GAMs) have recently been investigated – Machine learning (data-driven) approaches such as artificial neural networks (ANNs)
Drawbacks of existing models in RFA • ANN calibration requires a large dataset • ANNs do not lead to explicit regression equations (lack of interpretability) • ANN calibration is not an easy task, and problems may occur if proper guidelines are not followed • ANNs involve numerical difficulties and subjective choices
• GAMs estimate a separate nonlinear function for each input basin characteristic
• Objective: Develop a simple methodology that is easier to interpret
• Projection Pursuit Regression (PPR) [Friedman and Tukey, 1974] – Simple to use – Parsimonious – Interpretable
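For reference, the PPR model is commonly written in the general form below. The notation is an assumption chosen for this note and does not come from the slides: y is the response (here a flood quantile), x the vector of basin characteristics, α_k a unit-length direction, g_k a smooth function, and m the number of terms.

```latex
y = \beta_0 + \sum_{k=1}^{m} g_k\!\left(\alpha_k^{\top} x\right) + \varepsilon,
\qquad \lVert \alpha_k \rVert = 1
```

Each projection α_k^T x is a single derived predictor, so a model with small m has only a few one-dimensional smooth functions to inspect.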
Projection Pursuit Regression • PPR uses intermediate predictors (linear combinations of the basin characteristics) instead of the basin characteristics themselves • A distinctive feature of PPR is that the smooth functions and the intermediate predictors are estimated jointly • In special cases, PPR can be parsimonious enough to provide a meaningful structure
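The joint estimation of a direction and its smooth function can be illustrated with a single-term sketch. This is a simplified stand-in, not the algorithm used in the study: a cubic polynomial replaces the nonparametric smoother, a generic Nelder-Mead search replaces the usual PPR fitting routine, the data are synthetic, and all function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def unit(a):
    """Rescale a vector to unit length (directions are identified up to scale)."""
    return a / np.linalg.norm(a)


def ridge_loss(a, X, y, degree):
    """Mean squared error after refitting the smooth function g for direction a."""
    z = X @ unit(a)                      # intermediate predictor
    coef = np.polyfit(z, y, degree)      # polynomial stand-in for the smoother
    return np.mean((y - np.polyval(coef, z)) ** 2)


def fit_single_term_ppr(X, y, degree=3):
    """Estimate one direction alpha and its polynomial g together."""
    # Start from the ordinary least-squares direction.
    a0 = np.linalg.lstsq(X, y - y.mean(), rcond=None)[0]
    res = minimize(ridge_loss, a0, args=(X, y, degree), method="Nelder-Mead")
    alpha = unit(res.x)
    coef = np.polyfit(X @ alpha, y, degree)
    return alpha, coef


# Synthetic example (assumption): four predictors, one true direction.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
a_true = unit(np.array([1.0, -0.5, 0.3, 0.2]))
z_true = X @ a_true
y = z_true + 0.5 * z_true**3 + rng.normal(scale=0.1, size=200)

alpha, coef = fit_single_term_ppr(X, y)
print("estimated direction:", np.round(alpha, 2))
print("alignment with true direction:", round(float(abs(alpha @ a_true)), 3))
```

Because g is refitted every time the direction changes, neither component is estimated in isolation, which is the sense in which the two are estimated jointly.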
Case study • Data – 151 hydrometric stations in southern Quebec – Specific flood quantiles for the 100-year return period – Basin characteristics (forward stepwise selection) • Drainage area (DA) • % of basin occupied by lakes (PL) • Mean annual solid precipitation (SP) • Degree-days below 0 °C (DZ)
• Predictions at ungauged locations
• Model description
• Model description (cont.)
Performance comparison
Conclusions • Compromise between GAM and ANN • Advantages – Model with few directions (1 or 2)
– Most parsimonious model – Predictive performance similar to GAM – Explicit regression equations
• Drawbacks – No direct relation with the basin characteristics