A fast and objective multidimensional kernel density estimation method: fastKDE
Travis A. O'Brien (1,2), N.R. Cavanaugh (1), K. Kashinath (1), W.D. Collins (1,3), J.P. O'Brien (1,4)
(1) Climate & Ecosystems Science Division, Lawrence Berkeley Lab; (2) Dept. of Land, Air and Water Resources, UC Davis; (3) Dept. of Earth and Planetary Sciences, UC Berkeley; (4) Dept. of Earth and Planetary Sciences, UC Santa Cruz
13th IMSC, 8 June 2016
CLIMATE & ECOSYSTEMS SCIENCE DIVISION • LAWRENCE BERKELEY NATIONAL LABORATORY
Joint PDFs in Climate
Evaluating joint PDFs in NARCCAP (Lee et al. 2015, Clim. Dyn.)
Comparison of cloud height vs optical depth histograms (Marchand et al. 2010, JGR-Atmospheres)
Joint temp-precip return intervals for CA (AghaKouchak et al. 2015, GRL)
CASCADE SFA • CESD • LAWRENCE BERKELEY NATIONAL LABORATORY
Classes of current joint PDF methods
• Parametric:
  – Fit parameters to an assumed PDF model
    • Maximum likelihood methods
    • Bayesian parameter estimation
• Non-parametric:
  – Directly construct a PDF estimate from data
    • Histogram
    • Kernel density estimation
    • Bayesian non-parametrics
Serious drawbacks with existing methods
• Parametric:
  – Requires an assumption about the underlying PDF, and
  – May not be appropriate for exploratory analysis
• Non-parametric:
  – Requires user-specified parameters (bin width, kernel bandwidth), or
  – Requires assumptions about the underlying PDF shape, or
  – Requires computationally expensive fitting methods
What is kernel density estimation?
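In its standard one-dimensional form (notation chosen here, not taken from the slide), a KDE places a kernel K, a non-negative function that integrates to one, on each of the N data points x_j, stretches it by the bandwidth h, and averages:

```latex
\hat{f}(x) = \frac{1}{N h} \sum_{j=1}^{N} K\!\left(\frac{x - x_j}{h}\right)
```

Because K integrates to one, so does the estimate f̂ for any h > 0.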
Bandwidth analogous to histogram bin width
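To make the bandwidth/bin-width analogy concrete, here is a minimal direct-sum KDE in Python with NumPy. The Gaussian kernel, the fixed bandwidths, and the function name gaussian_kde_1d are illustrative assumptions; fastKDE does not use a fixed Gaussian kernel.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Direct-sum Gaussian KDE evaluated on `grid`.

    Each sample contributes a Gaussian bump of standard deviation `bandwidth`;
    the estimate is the average of those bumps.
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distances, shape (len(grid), len(samples))
    u = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * u**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return bumps.mean(axis=1)


rng = np.random.default_rng(0)
data = rng.standard_normal(500)
grid = np.linspace(-6.0, 6.0, 1201)
dx = grid[1] - grid[0]

# Small h: spiky and noisy (like narrow bins); large h: over-smoothed (like wide bins)
estimates = {h: gaussian_kde_1d(data, grid, h) for h in (0.05, 0.3, 1.5)}
for h, pdf in estimates.items():
    print(f"h = {h}: integral = {pdf.sum() * dx:.3f}, peak = {pdf.max():.3f}")
```

Every choice of h gives a valid PDF; what changes is how much structure survives, which is exactly the trade-off a histogram's bin width controls.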
KDE in Fourier Space
Discrete Fourier Transform
Empirical characteristic function:
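Written out in standard notation (symbols chosen here), for N samples x_j the empirical characteristic function C(t) is the sample average of exp(i t x_j). By the convolution theorem, the Fourier transform of a KDE is C(t) multiplied by the transform ψ̂(t) of the scaled kernel K_h(u) = K(u/h)/h, so the estimate is recovered by an inverse transform:

```latex
\mathcal{C}(t) = \frac{1}{N}\sum_{j=1}^{N} e^{\,i t x_j},
\qquad
\hat{\psi}(t) = \int_{-\infty}^{\infty} K_h(u)\, e^{\,i t u}\, du,
\qquad
\hat{f}(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat{\psi}(t)\,\mathcal{C}(t)\, e^{-i t x}\, dt
```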
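The Fourier-space view can be checked numerically. The sketch below computes the same Gaussian KDE two ways: directly, and as the inverse Fourier transform of (kernel transform) × (empirical characteristic function). It assumes a fixed Gaussian kernel of width h and a plain Riemann sum over a truncated frequency grid; the function names are hypothetical, and this is not how fastKDE evaluates its estimate.

```python
import numpy as np


def ecf(samples, t):
    """Empirical characteristic function C(t) = mean_j exp(i t x_j)."""
    return np.exp(1j * np.outer(t, samples)).mean(axis=1)


def kde_via_fourier(samples, x, bandwidth, t_max=40.0, n_t=4001):
    """Gaussian KDE as the inverse transform of (kernel transform) * ECF."""
    t = np.linspace(-t_max, t_max, n_t)
    dt = t[1] - t[0]
    psi_hat = np.exp(-0.5 * (bandwidth * t) ** 2)  # transform of a Gaussian kernel of width h
    phi = psi_hat * ecf(samples, t)
    # f(x) = (1 / 2 pi) * integral of phi(t) exp(-i t x) dt, as a Riemann sum
    integrand = phi[None, :] * np.exp(-1j * np.outer(x, t))
    return (integrand.sum(axis=1) * dt / (2.0 * np.pi)).real


def kde_direct(samples, x, bandwidth):
    """The same Gaussian KDE as a direct sum over samples."""
    u = (x[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * u**2).mean(axis=1) / (bandwidth * np.sqrt(2.0 * np.pi))


rng = np.random.default_rng(1)
data = rng.standard_normal(200)
x = np.linspace(-4.0, 4.0, 81)
h = 0.3
f_fourier = kde_via_fourier(data, x, h)
f_direct = kde_direct(data, x, h)
print(np.max(np.abs(f_fourier - f_direct)))  # should be near round-off level
```

The kernel only enters through its transform, which is what makes it possible to choose the kernel's shape in Fourier space.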
A recent method for choosing the kernel
• Bernacchia and Pigolotti (2011, J. Roy. Stat. Soc. B; BP11) derive an 'optimal kernel' (width and shape) based solely on the data:
  – Low error;
  – No user-specified parameters;
  – Very slow;
  – Only works in 1D.
• O'Brien et al. (2014, Comp. Stat. Data Anal.; OB14)
  – Present a method for a 100x speedup of BP11;
  – Quite fast;
  – Only works in 1D.
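The core idea of a data-driven kernel can be sketched in 1D. Below, C(t) is the empirical characteristic function, and the kernel in Fourier space is built from |C(t)|² itself. The kernel formula and threshold follow BP11 as we understand it, and the "contiguous band around t = 0" frequency filter is a simplifying assumption; the direct sums make it slow, and the function name self_consistent_pdf is hypothetical. It is not the fastKDE implementation.

```python
import numpy as np


def self_consistent_pdf(samples, x, t_max=20.0, n_t=2001):
    """Minimal 1D sketch of a BP11-style self-consistent density estimate.

    Assumed form: in Fourier space the kernel is
        kappa(t) = N / (2(N-1)) * [1 + sqrt(1 - C_min^2 / |C(t)|^2)],
        C_min^2 = 4(N-1) / N^2,
    applied only on the contiguous band of frequencies around t = 0 where
    |C(t)|^2 >= C_min^2; all other frequencies are set to zero.
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    t = np.linspace(-t_max, t_max, n_t)  # odd n_t puts t = 0 at the center
    dt = t[1] - t[0]

    # Empirical characteristic function C(t) = mean_j exp(i t x_j)
    ecf = np.exp(1j * np.outer(t, samples)).mean(axis=1)
    power = np.abs(ecf) ** 2
    c_min_sq = 4.0 * (n - 1) / n**2

    # Walk outward from t = 0 and stop at the first sub-threshold frequency
    center = n_t // 2
    keep = np.zeros(n_t, dtype=bool)
    for direction in (1, -1):
        i = center
        while 0 <= i < n_t and power[i] >= c_min_sq:
            keep[i] = True
            i += direction

    kappa = np.zeros(n_t)
    kappa[keep] = n / (2.0 * (n - 1)) * (1.0 + np.sqrt(1.0 - c_min_sq / power[keep]))
    phi = kappa * ecf  # estimated characteristic function

    # Inverse transform: f(x) = (1 / 2 pi) * integral of phi(t) exp(-i t x) dt
    return (np.exp(-1j * np.outer(x, t)) @ phi).real * dt / (2.0 * np.pi)


rng = np.random.default_rng(2)
data = rng.standard_normal(1000)
x = np.linspace(-6.0, 6.0, 601)
pdf = self_consistent_pdf(data, x)
print(pdf.sum() * (x[1] - x[0]), pdf.max())  # integral near 1, peak near 0.4
```

At t = 0, C(0) = 1 and the assumed kappa(0) evaluates to exactly 1, so the estimate integrates to one without any tuning parameter.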
We extended the BP11 & OB14 methods to multiple dimensions
O’Brien, T. A., Kashinath, K., Cavanaugh, N. R., Collins, W. D., & O’Brien, J. P. (2016). A fast and objective multidimensional kernel density estimation method: fastKDE. Comp. Stat. & Data Anal., 101, 148–160. doi:10.1016/j.csda.2016.02.014
Error: drops like N⁻¹
Speed: fastKDE vs. the R ks package, a difference of 4 orders of magnitude
(a) The integrated squared error of KDE estimates as a function of the number of data samples, for samples from several distributions: 1-, 2-, and 3-dimensional normal distributions, as well as a non-trivial mixture of standard normal distributions. The solid lines depict the mean of 30 ensemble members, and the shaded swaths depict the 5–95 percentile range. The gray dashed line shows the theoretical convergence rate N⁻¹ for reference. (b) As in (a), but for the time required to perform the KDE. The gray dashed line shows N¹ for reference.
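The integrated squared error in panel (a) is ISE = ∫ (f̂(x) − f(x))² dx. The sketch below computes it on a grid for a conventional fixed-kernel Gaussian KDE, using Silverman's rule-of-thumb bandwidth as an illustrative stand-in (not fastKDE's kernel). For such second-order kernels with optimally chosen bandwidth, theory gives a rate of about N^(-4/5), slower than the N⁻¹ reference line; the helper names are hypothetical.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Direct-sum Gaussian KDE evaluated on `grid`."""
    u = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * u**2).mean(axis=1) / (bandwidth * np.sqrt(2.0 * np.pi))


def integrated_squared_error(estimate, truth, dx):
    """ISE = integral of (f_hat - f)^2 dx, as a Riemann sum on a uniform grid."""
    return np.sum((estimate - truth) ** 2) * dx


rng = np.random.default_rng(3)
grid = np.linspace(-6.0, 6.0, 601)
dx = grid[1] - grid[0]
truth = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)  # standard normal PDF

errors = {}
for n in (100, 1000, 10000):
    data = rng.standard_normal(n)
    # Silverman's rule-of-thumb bandwidth: an illustrative choice, not fastKDE's
    h = 1.06 * data.std() * n ** (-0.2)
    errors[n] = integrated_squared_error(gaussian_kde_1d(data, grid, h), truth, dx)
    print(f"N = {n:>5}: ISE = {errors[n]:.2e}")
```

A single realization per N is noisy; the slide's curves average 30 ensemble members for that reason.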
What can we do with this?
Consider typical climate fields plotted as a scatterplot:
Applying fastKDE to those data yields the following PDF:
What can we do with this?
A side note: these data were constructed so that noise masks the underlying x/y relationship. The fastKDE estimate hints at its existence, but...
Underlying relationship (dashed line)
We can directly calculate conditional probabilities. Both are estimated directly with fastKDE.
True PDF (colors)
fastKDE (black)
(takes < 1 second to compute)
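Going from a joint PDF to a conditional PDF is a division by the marginal: p(y | x) = p(x, y) / p(x), with p(x) = ∫ p(x, y) dy. A minimal NumPy sketch on a gridded joint PDF follows; the joint PDF here comes from a plain Gaussian product-kernel KDE with a fixed bandwidth, used only as a stand-in for fastKDE, and conditional_pdf is a hypothetical name.

```python
import numpy as np


def conditional_pdf(joint, y_axis, min_marginal=1e-12):
    """Return p(y | x) = p(x, y) / p(x) for a joint PDF on a uniform grid.

    `joint` has shape (len(y_axis), len(x_axis)).  The marginal p(x) is the
    integral of the joint PDF over y; columns where p(x) is effectively zero
    are returned as NaN, since the conditional is undefined there.
    """
    dy = y_axis[1] - y_axis[0]
    marginal_x = joint.sum(axis=0) * dy
    cond = np.full(joint.shape, np.nan)
    ok = marginal_x > min_marginal
    cond[:, ok] = joint[:, ok] / marginal_x[ok]
    return cond


# Synthetic example: y depends linearly on x plus noise
rng = np.random.default_rng(4)
x_s = rng.standard_normal(2000)
y_s = 0.8 * x_s + 0.6 * rng.standard_normal(2000)

x_axis = np.linspace(-4.0, 4.0, 81)
y_axis = np.linspace(-4.0, 4.0, 81)
h = 0.3  # fixed bandwidth for this stand-in estimator

# Gaussian product-kernel KDE of the joint PDF (a stand-in for fastKDE)
kx = np.exp(-0.5 * ((x_axis[:, None] - x_s[None, :]) / h) ** 2)
ky = np.exp(-0.5 * ((y_axis[:, None] - y_s[None, :]) / h) ** 2)
joint = ky @ kx.T / (x_s.size * 2.0 * np.pi * h**2)  # rows: y, columns: x

cond = conditional_pdf(joint, y_axis)
dy = y_axis[1] - y_axis[0]
i_x1 = 50  # x_axis[50] is 1.0
print("E[y | x = 1] ~", np.sum(y_axis * cond[:, i_x1]) * dy)
```

Each column of the result integrates to one over y, so it can be read directly as the distribution of y at that value of x.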
Application to climate data:
Application to climate data: CA mean temperature and precipitation, conditioned on global mean temperature anomaly.
Application to climate data:
The CA precipitation CDF has clearly changed over time. The CA temperature CDF shifts systematically over time.
The joint distribution reveals a transition in the P-T relationship.
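Comparing CDFs between periods only requires integrating each estimated PDF. A minimal sketch, assuming the PDFs are available on a uniform grid that covers essentially all of the probability mass; the two Gaussian curves are synthetic stand-ins, not the CA data, and cdf_from_pdf is a hypothetical name.

```python
import numpy as np


def cdf_from_pdf(pdf, axis_values):
    """Cumulative trapezoidal integral of a gridded 1D PDF.

    Renormalizes so the final value is exactly 1, which assumes the grid
    covers essentially all of the probability mass.
    """
    increments = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(axis_values)
    cdf = np.concatenate([[0.0], np.cumsum(increments)])
    return cdf / cdf[-1]


# Synthetic stand-ins for an "early" and a "late" period (not the CA data)
x = np.linspace(-8.0, 8.0, 1601)
early = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
late = np.exp(-0.5 * (x - 1.0) ** 2) / np.sqrt(2.0 * np.pi)  # shifted by +1

cdf_early = cdf_from_pdf(early, x)
cdf_late = cdf_from_pdf(late, x)

# A systematic shift shows up as a horizontal offset between the CDFs
median_shift = np.interp(0.5, cdf_late, x) - np.interp(0.5, cdf_early, x)
print(f"median shift = {median_shift:.3f}")
```

The same interpolation gives any other quantile, so changes in the tails can be compared as easily as changes in the median.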
fastKDE
• Quickly and objectively estimates PDFs
• Converges rapidly to the true PDF as points are added
• Is being used on multiple problems in CASCADE
• Is publicly available (free for academic use) at bit.ly/1KbvyI9 or via pip: pip install fastkde
Thank you!
This research was supported by the Director, Office of Science, Office of Biological and Environmental Research of the U.S. Department of Energy Regional and Global Climate Modeling Program (RGCM) and used resources of the National Energy Research Scientific Computing Center (NERSC), also supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
fastKDE works on complex distributions
• Sampled from a non-trivial mixture of normal distributions.
• (Distribution generated by adding a normal distribution for each black pixel in a monochrome image.)
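Sampling from such an image-defined mixture is straightforward: pick a black pixel at random, then add normal noise around it. The sketch assumes equal weights and one shared isotropic width sigma for every component (the slide does not give these details); the function name and the tiny "T" image are hypothetical.

```python
import numpy as np


def sample_pixel_mixture(mask, n_samples, sigma, rng):
    """Sample an equal-weight mixture of isotropic 2D normals, one component
    centered on each True ("black") pixel of `mask`.

    Returns an (n_samples, 2) array of (column, row) coordinates in pixel units.
    """
    rows, cols = np.nonzero(mask)
    centers = np.column_stack([cols, rows]).astype(float)
    which = rng.integers(0, len(centers), size=n_samples)  # component choice
    return centers[which] + sigma * rng.standard_normal((n_samples, 2))


# A tiny hand-drawn "image": a T shape on a 20 x 20 canvas
mask = np.zeros((20, 20), dtype=bool)
mask[5, 5:15] = True    # horizontal bar
mask[5:15, 10] = True   # vertical stem

rng = np.random.default_rng(5)
samples = sample_pixel_mixture(mask, 10_000, sigma=0.5, rng=rng)
print(samples.shape, samples.mean(axis=0))
```

Because the true PDF is known exactly (a sum of Gaussians), such samples make a demanding test case for any density estimator.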
fastKDE works on complex distributions
[Figure panels: (a) True PDF; (b) fastKDE; (c) fastKDE]
A non-trivial mixture of normal distributions: (a) the true PDF, (b) a fastKDE estimate on 10⁶ samples, (c) a fastKDE estimate on 10³ samples.
Convergence: R vs fastKDE
Issues with KDE
Multidimensional selection of frequencies
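One plausible way to extend a "keep the contiguous band around t = 0" rule to several dimensions is to keep the connected region of above-threshold frequencies that contains the origin. This is offered as an assumption, not as fastKDE's documented procedure. A breadth-first flood fill does this for any number of dimensions; contiguous_above_threshold and the synthetic power array are hypothetical.

```python
import numpy as np
from collections import deque


def contiguous_above_threshold(power, threshold, origin):
    """Boolean mask of grid points connected (face-adjacent) to `origin`
    through points where power >= threshold.  Works in any dimension."""
    above = power >= threshold
    keep = np.zeros(power.shape, dtype=bool)
    if not above[origin]:
        return keep
    keep[origin] = True
    queue = deque([origin])
    while queue:
        idx = queue.popleft()
        for axis in range(power.ndim):
            for step in (-1, 1):
                nbr = list(idx)
                nbr[axis] += step
                nbr = tuple(nbr)
                # Bounds check first so negative indices never wrap around
                if 0 <= nbr[axis] < power.shape[axis] and above[nbr] and not keep[nbr]:
                    keep[nbr] = True
                    queue.append(nbr)
    return keep


t = np.linspace(-5.0, 5.0, 101)
tx, ty = np.meshgrid(t, t, indexing="ij")
power = np.exp(-(tx**2 + ty**2))  # synthetic |C(t)|^2, peaked at t = 0
power[90, 90] = 0.5               # isolated above-threshold "noise" frequency
mask = contiguous_above_threshold(power, 0.01, origin=(50, 50))
print(mask.sum(), mask[90, 90])
```

The isolated spike passes the threshold test but is excluded because it is not connected to the origin, which is the kind of noisy high-frequency contribution such a rule is meant to reject.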