Computational geometry for multivariate statistics John P. Nolan American University Washington, DC, USA
CASD 2015 George Mason University 21 October 2015
Nolan (American U)
Computational geometry
21 Oct 2015
1 / 30
Outline
1
Introduction
2
Computational geometry Multivariate histograms
3
Generalized spherical distributions
4
Multivariate EVDs
Nolan (American U)
Computational geometry
21 Oct 2015
2 / 30
ARO Multidisciplinary University Research Initiative (MURI) 5 year grant on Multivariate Heavy Tailed Phenomena Richard Davis - Columbia University (Statistics) Weibo Gong - UMass Amherst (ECE) John Nolan - American University (Math) Sidney Resnick - Cornell University (ORIE) Gennady Samorodnitsky - Cornell University (ORIE) Ness Shroff - Ohio State University R. Srikant - University of Illinois at Urbana-Champaign (ECE) Don Towsley - UMass Amherst (CS) Zhi-Li Zhang - University of Minnesota
Nolan (American U)
Computational geometry
21 Oct 2015
3 / 30
Some topics the MURI group has worked on
Growth of social networks - preferential attachment model leads to joint heavy tailed model for (in-degree, out-degree) Random walks on large graphs Analysis of communication networks with heavy tailed traffic Resource allocation in cloud computing Reducing power consumption on mobile devices Multivariate extreme value distributions - calculations and estimation Threshold selection in heavy tailed inference Dimensionality reduction for heavy tailed data - robust PCA and ICA
Nolan (American U)
Computational geometry
21 Oct 2015
4 / 30
Outline
1
Introduction
2
Computational geometry Multivariate histograms
3
Generalized spherical distributions
4
Multivariate EVDs
Nolan (American U)
Computational geometry
21 Oct 2015
5 / 30
There is a need for non-traditional models for multivariate data. Working in dimension d > 2 requires new tools. grids and meshes on non-rectangular shapes numerical integration over surfaces simulate from a shape
Nolan (American U)
Computational geometry
21 Oct 2015
6 / 30
There is a need for non-traditional models for multivariate data. Working in dimension d > 2 requires new tools. grids and meshes on non-rectangular shapes numerical integration over surfaces simulate from a shape R software packages mvmesh - MultiVariate Meshes (CRAN) SphericalCubature (CRAN) Simplicial Cubature (CRAN) gensphere (manuscript submitted) mvevd - MultiVariate Extreme Value Distributions (in progress)
Nolan (American U)
Computational geometry
21 Oct 2015
6 / 30
mvmesh Rectangular grids are straightforward in any dimensions
Grid points evenly spaced, easy to determine which cell a point is in. Nolan (American U)
Computational geometry
21 Oct 2015
7 / 30
Other shapes are not well described by rectangular grids
Points are not evenly spread, faces have different numbers of vertices. Nolan (American U)
Computational geometry
21 Oct 2015
8 / 30
Simplices - equal area subdivisions
This and following shapes can be generated in any dimension d ≥ 2. Nolan (American U)
Computational geometry
21 Oct 2015
9 / 30
Balls/spheres - approximate equal area subdivisions
Nolan (American U)
Computational geometry
21 Oct 2015
10 / 30
Tubes- approximate equal area subdivisions
Nolan (American U)
Computational geometry
21 Oct 2015
11 / 30
Rectangular histograms
Nolan (American U)
Computational geometry
21 Oct 2015
12 / 30
Rectangular histograms
Nolan (American U)
Computational geometry
21 Oct 2015
12 / 30
Histograms of non-rectangular regions
Nolan (American U)
Computational geometry
21 Oct 2015
13 / 30
Histograms of non-rectangular regions
Nolan (American U)
Computational geometry
21 Oct 2015
13 / 30
Directional histogram d = 2 - count how many in each “direction”
1.5 1.0
40 30
●
●
● ●
● ●
0.0
● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ●
0
0.5
20 10
● ●
0
threshold= 0 2.0
●
50
60
mix of 5000 light tailed 100 heavy tailed data values
20
40
60
0.0
1.5
2.0
2.0 1.5 0.0
0.5
1.0
1.5 1.0 0.5 0.0 0.0
Nolan (American U)
1.0
threshold= 4
2.0
threshold= 1
0.5
0.5
1.0
1.5
2.0
0.0
Computational geometry
0.5
1.0
1.5
2.0
21 Oct 2015
14 / 30
Generalize to d ≥ 3?
triangulate/tessellate the sphere each simplex on sphere determines a cone starting at the origin loop through data points, seeing which cone each falls in plot histogram
Nolan (American U)
Computational geometry
21 Oct 2015
15 / 30
Directional histogram d = 3, positive data
Nolan (American U)
Computational geometry
21 Oct 2015
16 / 30
Directional histogram d = 4 n=100000 dimension= 4 −+++
+−++
−−++
++−+
−+−+
+−−+
−−−+
+++−
−++−
+−+−
−−+−
++−−
−+−−
+−−−
−−−−
100 0
50
counts
150
++++
0
200
400
600
800
1000
cone
Radially symmetric data in R4 Nolan (American U)
Computational geometry
21 Oct 2015
17 / 30
Directional histogram d = 4 350
n=100000 dimension= 4 −+++
+−++
−−++
++−+
−+−+
+−−+
−−−+
+++−
−++−
+−+−
−−+−
++−−
−+−−
+−−−
−−−−
counts
0
50
100
150
200
250
300
++++
0
200
400
600
800
1000
cone
All octants where 3rd component is negative are empty. Nolan (American U)
Computational geometry
21 Oct 2015
18 / 30
Outline
1
Introduction
2
Computational geometry Multivariate histograms
3
Generalized spherical distributions
4
Multivariate EVDs
Nolan (American U)
Computational geometry
21 Oct 2015
19 / 30
Generalized spherical distributions Distributions with level sets that are all scaled versions of a star shaped region. Flexible scheme for building up nonstandard star shaped contours.
Nolan (American U)
Computational geometry
21 Oct 2015
20 / 30
Generalized spherical distributions
2 0 −2
−2
0
c(−2, 2)
2
Distributions with level sets that are all scaled versions of a star shaped region. Flexible scheme for building up nonstandard star shaped contours.
−2
0
2
−2
0
2
0
● ● ● ●● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ●● ● ● ●● ●● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ●● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●●● ● ●● ● ● ● ●●●● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ●
−2
−2
0
c(−2, 2)
2
c(−2, 2)
2
c(−2, 2)
−2
0
2
−2
0
2
A tessellation based on the added ‘bumps’ is automatically generated and used in simulating from the contour. Process requires the arclength/ surface area of the contour. Nolan (American U)
Computational geometry
21 Oct 2015
20 / 30
10
Add a radial component to get a distribution: X = RZ, where Z is uniform w.r.t. (d − 1)-dimensional surface area on contour. Here R ∼ Γ(2, 1)
●
● ● ●
● ● ●
● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●●●● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ●●● ● ● ●● ● ●● ● ● ●● ● ●●● ●● ● ● ●● ●●● ●● ●● ● ●● ● ● ●● ●●●● ● ● ● ●●● ●● ● ● ●● ● ● ● ●● ● ● ● ● ●● ● ●● ●● ●●● ● ● ●● ● ● ●● ● ●●●● ●● ●● ● ●●●● ●● ●●●●● ● ●● ● ●●●●● ● ● ● ●●● ● ● ●●● ● ● ●● ●● ● ● ● ● ●●● ● ●● ●● ● ● ●● ● ● ●● ● ●●● ● ●●●● ● ● ●● ● ●● ● ● ●● ●● ● ● ●●●●●● ● ● ● ●●● ●● ● ●● ●● ● ●●● ● ● ● ●●● ●● ● ● ●● ● ● ● ●● ●● ● ● ●● ●● ● ●● ● ● ●● ● ● ● ● ● ● ●● ●●● ●● ● ● ● ● ●● ● ● ● ● ●● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ●● ●● ● ●● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ●● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ●● ● ● ●● ●● ●● ● ● ● ● ●● ●● ● ● ● ● ● ● ●● ●●● ● ● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ●● ● ● ● ● ●● ●● ● ● ●●● ● ●● ●● ● ● ● ● ●● ● ●●●● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ●●● ● ●●●● ● ● ●● ●● ● ● ●● ● ●● ●● ● ● ● ● ● ●● ● ●●● ●● ●● ● ● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ●●● ● ● ● ●●●● ● ●● ● ● ● ● ●● ● ● ●● ● ●●●●● ● ● ● ● ●● ●● ●● ● ●● ●● ● ● ● ●●● ●●● ● ● ●●● ● ● ●● ● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ●● ● ● ●● ●●● ● ● ● ● ● ●● ● ●●● ● ● ●● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ●●●● ●●● ●●● ● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●
5
●
●
●
●
●● ● ●● ●
● ●●
●
x
−10
● ●
y
−5
●
●
●
z
●
●
0
●
●
−10
−5
0
5
10
Sample of X = RZ Nolan (American U)
density surface Computational geometry
21 Oct 2015
21 / 30
Many contour shapes possible
Nolan (American U)
Computational geometry
21 Oct 2015
22 / 30
Choice of R determines radial behavior
y
y
z
z x
x
y
y
z
z x
x
In all cases, contour is a diamond. (a) R ∼Uniform(0,1) (b) R ∼ Γ(2, 1) (c) R = |Y| where Y is 2D isotropic stable (d) R ∼ Γ(5, 1) Nolan (American U)
Computational geometry
21 Oct 2015
23 / 30
3D example - contour
Nolan (American U)
Computational geometry
21 Oct 2015
24 / 30
uniform sample from contour
Nolan (American U)
Computational geometry
21 Oct 2015
25 / 30
sample from distribution X with R ∼ Γ(2, 1)
Nolan (American U)
Computational geometry
21 Oct 2015
26 / 30
Flexible shapes
Specified some letters in 3D, can sample from this word proportional to arclength
Nolan (American U)
Computational geometry
21 Oct 2015
27 / 30
Flexible shapes
Specified some letters in 3D, can sample from this word proportional to arclength
Nolan (American U)
Computational geometry
21 Oct 2015
27 / 30
Outline
1
Introduction
2
Computational geometry Multivariate histograms
3
Generalized spherical distributions
4
Multivariate EVDs
Nolan (American U)
Computational geometry
21 Oct 2015
28 / 30
Multivariate Fr´echet Distributions de Haan and Resnick (1977): X max stable, centered with shape index ξ, is characterized by the angular measure H on the unit simplex W+ . The spread of mass by H determines the joint structure. Define the scale function ! Z d _ u ξ wi H(dw). σ ξ (u) = W+
i=1
(If the components of X are normalized and ξ = 1, then this is the tail dependence function `(u).)
Nolan (American U)
Computational geometry
21 Oct 2015
29 / 30
Multivariate Fr´echet Distributions de Haan and Resnick (1977): X max stable, centered with shape index ξ, is characterized by the angular measure H on the unit simplex W+ . The spread of mass by H determines the joint structure. Define the scale function ! Z d _ u ξ wi H(dw). σ ξ (u) = W+
i=1
(If the components of X are normalized and ξ = 1, then this is the tail dependence function `(u).) The scale function determines the joint distribution: G (x) = P(X ≤ x) = exp −σ ξ (x−1 ) . Observation: need to (a) describe different types of measures and (b) integrate over a surface
Nolan (American U)
Computational geometry
21 Oct 2015
29 / 30
R package mvevd, d ≥ 2 Define classes of mvevds: discrete H, generalized logistic, Dirichlet mixture, piecewise constant and linear angular measures (computational geometry) Compute scale functions σ(u) for above classes (integrate over simplices, computational geometry) Fitting mvevd data with any of the above classes (max projections) Exact simulation from these classes (Dirichlet mix - Dombry, Engelke & Oesting (EVA 2015), Dieker and Mikosch (2015)) Compute cdf G (x) = P(X ≤ x) = exp(−σ ξ (x−1 )), (µ = 0, x ≥ 0). Computation of density g (x) when known (partitions) Computation of H(S) for a simplex S to estimate tail probabilities in the direction S. (computational geometry & integrate over simplices)
Nolan (American U)
Computational geometry
21 Oct 2015
30 / 30