obrien fastKDE presentation

A fast and objective multidimensional kernel density estimation method: fastKDE
Travis A. O'Brien (1,2), N.R. Cavanaugh (1), K. Kashinath (1), W.D. Collins (1,3), J.P. O'Brien (1,4)
(1) Climate & Ecosystems Science Division, Lawrence Berkeley Lab
(2) Dept. of Land, Air and Water Resources, UC Davis
(3) Dept. of Earth and Planetary Sciences, UC Berkeley
(4) Dept. of Earth and Planetary Sciences, UC Santa Cruz

13th IMSC, 8 June 2016 CLIMATE & ECOSYSTEMS SCIENCE DIVISION • LAWRENCE BERKELEY NATIONAL LABORATORY

Joint PDFs in Climate Evaluating joint PDFs in NARCCAP (Lee et al. 2015, Clim. Dyn.)

Comparison of cloud height vs optical depth histograms (Marchand et al. 2010, JGR-Atmospheres)

Joint temp-precip return intervals for CA (AghaKouchak et al. 2015, GRL)

CASCADE SFA • CESD • LAWRENCE BERKELEY NATIONAL LABORATORY

Classes of current joint PDF methods
• Parametric:
  – Fit parameters to an assumed PDF model
    • Maximum likelihood methods
    • Bayesian parameter estimation
• Non-parametric:
  – Directly construct PDF estimate from data
    • Histogram
    • Kernel density estimation
    • Bayesian non-parametrics
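
The two classes can be contrasted in a few lines. The sketch below is only an illustration on synthetic 1D data: the parametric route assumes a normal model and uses its maximum likelihood estimates, while the non-parametric route builds a histogram density. The sample size, bin count, and variable names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=5000)  # synthetic stand-in data

# Parametric: assume a normal PDF and fit its two parameters.
# For a normal model the maximum likelihood estimates are the sample
# mean and the (ddof=0) sample standard deviation.
mu_hat = data.mean()
sigma_hat = data.std()

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Non-parametric: a histogram normalized to integrate to one.
# The bin count (40) is a user-specified parameter.
density, edges = np.histogram(data, bins=40, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
hist_area = (density * np.diff(edges)).sum()

# Largest disagreement between the two estimates at the bin centers.
max_gap = np.max(np.abs(density - normal_pdf(centers, mu_hat, sigma_hat)))
print(mu_hat, sigma_hat, hist_area, max_gap)
```

The parametric fit is only as good as the assumed model; the histogram makes no such assumption but depends on the chosen bin count.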

Serious drawbacks with existing methods
• Parametric:
  – Requires an assumption about the underlying PDF, and
  – May not be appropriate for exploratory analysis
• Non-parametric:
  – Requires user-specified parameters (bin width, kernel bandwidth), or
  – Requires assumptions about the underlying PDF shape, or
  – Requires computationally expensive fitting methods

What is kernel density estimation?

Bandwidth analogous to histogram bin width
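
To make the bandwidth analogy concrete, here is a minimal sketch of a 1D KDE, assuming a Gaussian kernel and two hand-picked bandwidths. The function name `kde_gaussian_1d` and the bandwidth values are illustrative assumptions, not part of fastKDE, which chooses the kernel from the data instead.

```python
import numpy as np

def kde_gaussian_1d(samples, x, bandwidth):
    """Average of Gaussian kernels of width `bandwidth` centered on each sample."""
    z = (x[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

rng = np.random.default_rng(42)
samples = rng.normal(size=500)
x = np.linspace(-8, 8, 1601)
dx = x[1] - x[0]

narrow = kde_gaussian_1d(samples, x, bandwidth=0.05)  # like very fine bins
wide = kde_gaussian_1d(samples, x, bandwidth=1.0)     # like very coarse bins

# Both estimates integrate to (approximately) one on the grid.
area_narrow = narrow.sum() * dx
area_wide = wide.sum() * dx

# Total variation as a crude roughness measure: the narrow bandwidth
# gives a noisy, spiky estimate; the wide one gives a smooth estimate.
wiggle_narrow = np.abs(np.diff(narrow)).sum()
wiggle_wide = np.abs(np.diff(wide)).sum()
print(area_narrow, area_wide, wiggle_narrow, wiggle_wide)
```

As with histogram bins, too small a bandwidth fits noise and too large a bandwidth smears out real structure.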

KDE in Fourier Space Discrete Fourier Transform

Empirical characteristic function:
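
In its standard form, the empirical characteristic function of samples x_1, ..., x_N is C(t) = (1/N) * sum_j exp(i t x_j), and by the convolution theorem the Fourier transform of a KDE is the kernel's transform multiplied by C(t). The sketch below checks that identity numerically, assuming a Gaussian kernel, a fixed bandwidth `h`, and the exp(+i t x) sign convention; all three are assumptions for illustration. A plain quadrature is used here for clarity; in practice a fast Fourier transform on a regular grid would play that role.

```python
import numpy as np

def ecf(samples, t):
    """Empirical characteristic function C(t) = mean_j exp(i * t * x_j)."""
    return np.exp(1j * np.outer(t, samples)).mean(axis=1)

def kde_gaussian_1d(samples, x, h):
    """Direct (real-space) Gaussian KDE with bandwidth h."""
    z = (x[:, None] - samples[None, :]) / h
    return (np.exp(-0.5 * z ** 2) / (h * np.sqrt(2 * np.pi))).mean(axis=1)

rng = np.random.default_rng(0)
samples = rng.normal(size=200)
h = 0.3                                   # assumed fixed bandwidth
x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]
t = np.array([0.0, 0.5, 1.0, 2.0])        # a few test frequencies

# Left side: Fourier transform of the real-space KDE, by quadrature.
f_hat = kde_gaussian_1d(samples, x, h)
ft_numeric = (f_hat[None, :] * np.exp(1j * t[:, None] * x[None, :])).sum(axis=1) * dx

# Right side: transform of the Gaussian kernel times the ECF.
ft_product = np.exp(-0.5 * (h * t) ** 2) * ecf(samples, t)

max_diff = np.max(np.abs(ft_numeric - ft_product))
print(max_diff)
```

Because the kernel enters only as a multiplicative filter on C(t), choosing a kernel amounts to choosing which frequencies of the ECF to keep.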

A recent method for choosing the kernel
• Bernacchia and Pigolotti (2011, J. Roy. Stat. Soc. B; BP11) derive an 'optimal kernel' (width and shape) based solely on data:
  – Low error;
  – No user-specified parameters;
  – Very slow;
  – Only works in 1D.
• O'Brien et al. (2014, Comp. Stat. Data Anal.; OB14):
  – Present method for 100x speedup of BP11;
  – Quite fast;
  – Only works in 1D.

We extended the BP11 & OB14 methods to multiple dimensions
O’Brien, T. A., Kashinath, K., Cavanaugh, N. R., Collins, W. D., & O’Brien, J. P. (2016). A fast and objective multidimensional kernel density estimation method: fastKDE. Comp. Stat. & Data Anal., 101, 148–160. doi:10.1016/j.csda.2016.02.014

[Figure, two panels. "Error": error drops like N^-1. "Speed": R ks package vs. fastKDE, annotated "4 orders of magnitude".]

(a) The integrated, squared error of KDE estimates, as a function of the number of data samples, on samples from several distributions: 1, 2, and 3 dimensional normal distributions, as well as a non-trivial mixture of standard normal distributions. The solid lines depict the mean of 30 ensemble members, and the shaded swaths depict the 5–95 percentile range. The gray dashed line shows the theoretical convergence rate N^-1 for reference. (b) As in (a), but for the time required to perform the KDE. The gray dashed line shows N^1 for reference.

What can we do with this? Consider typical climate fields plotted as a scatterplot:

A fastKDE on those data yields the following PDF:

What can we do with this? A side note: these data were constructed such that noise masks the underlying x/y relationship. The fastKDE gives a hint of its existence, but...

Underlying relationship (dashed line)

We can directly calculate conditional probabilities. Both directly estimated w/ fastKDE (takes < 1 second to compute).

[Figure legend: true PDF (colors), fastKDE (black)]
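
Once a joint PDF is available on a regular grid, a conditional PDF follows from p(y|x) = p(x, y) / p(x). The sketch below shows that step only. As an assumption for illustration, an analytic bivariate normal with correlation `rho` stands in for a gridded fastKDE estimate, and the density threshold used to mask poorly sampled x values is arbitrary.

```python
import numpy as np

x = np.linspace(-5, 5, 201)
y = np.linspace(-5, 5, 201)
dy = y[1] - y[0]
X, Y = np.meshgrid(x, y, indexing="ij")   # joint[i, j] = p(x[i], y[j])

rho = 0.6  # assumed correlation of the stand-in joint PDF
joint = np.exp(-(X ** 2 - 2 * rho * X * Y + Y ** 2) / (2 * (1 - rho ** 2)))
joint /= 2 * np.pi * np.sqrt(1 - rho ** 2)

# Marginal p(x): integrate the joint PDF over y.
marginal_x = joint.sum(axis=1) * dy

# Conditional p(y | x): divide each x-slice by the marginal.
# Slices where p(x) is tiny are masked, since the ratio is unreliable there.
valid = marginal_x > 1e-4
conditional = np.full_like(joint, np.nan)
conditional[valid] = joint[valid] / marginal_x[valid, None]

# Conditional mean E[y | x]; for this stand-in PDF it should equal rho * x.
cond_mean = (conditional * y[None, :]).sum(axis=1) * dy
i = np.argmin(np.abs(x - 1.0))
print(x[i], cond_mean[i])
```

With an estimated joint PDF in place of the analytic one, the same division yields the conditional estimate; the masking threshold then matters more, because estimates in sparsely sampled regions are noisy.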

Application to climate data: CA mean temperature and precipitation, conditioned on global mean temperature anomaly.

Application to climate data:
CA precipitation CDF has clearly changed with time.
CA temperature CDF systematically shifts with time.
Joint relationship reveals a transition in the P-T relationship.

fastKDE
• Quickly and objectively estimates PDFs
• Converges rapidly to true PDF as points are added
• Is being used on multiple problems in CASCADE
• Is available publicly (free for academic use) at bit.ly/1KbvyI9 or via pip: pip install fastkde

Thank you!
This research was supported by the Director, Office of Science, Office of Biological and Environmental Research of the U.S. Department of Energy Regional and Global Climate Modeling Program (RGCM) and used resources of the National Energy Research Scientific Computing Center (NERSC), also supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

fastKDE works on complex distributions

• Sampled from a non-trivial mixture of normal distributions.
• (Distribution generated by adding a normal distribution for each black pixel in a monochrome image.)
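
One way to build and sample such a mixture is sketched below. The 3x3 image, the component width `sigma`, and the equal component weights are all hypothetical choices for illustration; the slides do not state the image or the widths actually used.

```python
import numpy as np

# Hypothetical monochrome image: True marks a black pixel.
image = np.array([[1, 0, 1],
                  [0, 1, 0],
                  [1, 0, 1]], dtype=bool)

# One mixture component per black pixel, centered on the pixel (x = column, y = row).
rows, cols = np.nonzero(image)
centers = np.column_stack([cols, rows]).astype(float)

sigma = 0.25      # assumed width of every component
n_samples = 10_000
rng = np.random.default_rng(1)

# Sample the mixture: pick a component uniformly at random,
# then add isotropic normal noise around its center.
which = rng.integers(len(centers), size=n_samples)
samples = centers[which] + sigma * rng.normal(size=(n_samples, 2))
print(samples.shape, samples.mean(axis=0))
```

A real image would give many more components, which is what makes the resulting PDF a demanding test case for a density estimator.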

A non-trivial mixture of normal distributions: (a) the true PDF, (b) a fastKDE estimate on 10^3 samples, (c) a fastKDE estimate on 10^6 samples.

Convergence: R vs fastKDE

Issues with KDE

Multidimensional selection of frequencies
