Computational geometry for multivariate statistics

Report 1 Downloads 203 Views
Computational geometry for multivariate statistics John P. Nolan American University Washington, DC, USA

CASD 2015 George Mason University 21 October 2015

Nolan (American U)

Computational geometry

21 Oct 2015

1 / 30

Outline

1

Introduction

2

Computational geometry Multivariate histograms

3

Generalized spherical distributions

4

Multivariate EVDs

Nolan (American U)

Computational geometry

21 Oct 2015

2 / 30

ARO Multidisciplinary University Research Initiative (MURI) 5 year grant on Multivariate Heavy Tailed Phenomena Richard Davis - Columbia University (Statistics) Weibo Gong - UMass Amherst (ECE) John Nolan - American University (Math) Sidney Resnick - Cornell University (ORIE) Gennady Samorodnitsky - Cornell University (ORIE) Ness Shroff - Ohio State University R. Srikant - University of Illinois at Urbana-Champaign (ECE) Don Towsley - UMass Amherst (CS) Zhi-Li Zhang - University of Minnesota

Nolan (American U)

Computational geometry

21 Oct 2015

3 / 30

Some topics the MURI group has worked on

Growth of social networks - preferential attachment model leads to joint heavy tailed model for (in-degree, out-degree) Random walks on large graphs Analysis of communication networks with heavy tailed traffic Resource allocation in cloud computing Reducing power consumption on mobile devices Multivariate extreme value distributions - calculations and estimation Threshold selection in heavy tailed inference Dimensionality reduction for heavy tailed data - robust PCA and ICA

Nolan (American U)

Computational geometry

21 Oct 2015

4 / 30

Outline

1

Introduction

2

Computational geometry Multivariate histograms

3

Generalized spherical distributions

4

Multivariate EVDs

Nolan (American U)

Computational geometry

21 Oct 2015

5 / 30

There is a need for non-traditional models for multivariate data. Working in dimension d > 2 requires new tools. grids and meshes on non-rectangular shapes numerical integration over surfaces simulate from a shape

Nolan (American U)

Computational geometry

21 Oct 2015

6 / 30

There is a need for non-traditional models for multivariate data. Working in dimension d > 2 requires new tools. grids and meshes on non-rectangular shapes numerical integration over surfaces simulate from a shape R software packages mvmesh - MultiVariate Meshes (CRAN) SphericalCubature (CRAN) Simplicial Cubature (CRAN) gensphere (manuscript submitted) mvevd - MultiVariate Extreme Value Distributions (in progress)

Nolan (American U)

Computational geometry

21 Oct 2015

6 / 30

mvmesh Rectangular grids are straightforward in any dimensions

Grid points evenly spaced, easy to determine which cell a point is in. Nolan (American U)

Computational geometry

21 Oct 2015

7 / 30

Other shapes are not well described by rectangular grids

Points are not evenly spread, faces have different numbers of vertices. Nolan (American U)

Computational geometry

21 Oct 2015

8 / 30

Simplices - equal area subdivisions

This and following shapes can be generated in any dimension d ≥ 2. Nolan (American U)

Computational geometry

21 Oct 2015

9 / 30

Balls/spheres - approximate equal area subdivisions

Nolan (American U)

Computational geometry

21 Oct 2015

10 / 30

Tubes- approximate equal area subdivisions

Nolan (American U)

Computational geometry

21 Oct 2015

11 / 30

Rectangular histograms

Nolan (American U)

Computational geometry

21 Oct 2015

12 / 30

Rectangular histograms

Nolan (American U)

Computational geometry

21 Oct 2015

12 / 30

Histograms of non-rectangular regions

Nolan (American U)

Computational geometry

21 Oct 2015

13 / 30

Histograms of non-rectangular regions

Nolan (American U)

Computational geometry

21 Oct 2015

13 / 30

Directional histogram d = 2 - count how many in each “direction”

1.5 1.0

40 30





● ●

● ●

0.0

● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ●

0

0.5

20 10

● ●

0

threshold= 0 2.0



50

60

mix of 5000 light tailed 100 heavy tailed data values

20

40

60

0.0

1.5

2.0

2.0 1.5 0.0

0.5

1.0

1.5 1.0 0.5 0.0 0.0

Nolan (American U)

1.0

threshold= 4

2.0

threshold= 1

0.5

0.5

1.0

1.5

2.0

0.0

Computational geometry

0.5

1.0

1.5

2.0

21 Oct 2015

14 / 30

Generalize to d ≥ 3?

triangulate/tessellate the sphere each simplex on sphere determines a cone starting at the origin loop through data points, seeing which cone each falls in plot histogram

Nolan (American U)

Computational geometry

21 Oct 2015

15 / 30

Directional histogram d = 3, positive data

Nolan (American U)

Computational geometry

21 Oct 2015

16 / 30

Directional histogram d = 4 n=100000 dimension= 4 −+++

+−++

−−++

++−+

−+−+

+−−+

−−−+

+++−

−++−

+−+−

−−+−

++−−

−+−−

+−−−

−−−−

100 0

50

counts

150

++++

0

200

400

600

800

1000

cone

Radially symmetric data in R4 Nolan (American U)

Computational geometry

21 Oct 2015

17 / 30

Directional histogram d = 4 350

n=100000 dimension= 4 −+++

+−++

−−++

++−+

−+−+

+−−+

−−−+

+++−

−++−

+−+−

−−+−

++−−

−+−−

+−−−

−−−−

counts

0

50

100

150

200

250

300

++++

0

200

400

600

800

1000

cone

All octants where 3rd component is negative are empty. Nolan (American U)

Computational geometry

21 Oct 2015

18 / 30

Outline

1

Introduction

2

Computational geometry Multivariate histograms

3

Generalized spherical distributions

4

Multivariate EVDs

Nolan (American U)

Computational geometry

21 Oct 2015

19 / 30

Generalized spherical distributions Distributions with level sets that are all scaled versions of a star shaped region. Flexible scheme for building up nonstandard star shaped contours.

Nolan (American U)

Computational geometry

21 Oct 2015

20 / 30

Generalized spherical distributions

2 0 −2

−2

0

c(−2, 2)

2

Distributions with level sets that are all scaled versions of a star shaped region. Flexible scheme for building up nonstandard star shaped contours.

−2

0

2

−2

0

2

0

● ● ● ●● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ●● ● ● ●● ●● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ●● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●●● ● ●● ● ● ● ●●●● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ●

−2

−2

0

c(−2, 2)

2

c(−2, 2)

2

c(−2, 2)

−2

0

2

−2

0

2

A tessellation based on the added ‘bumps’ is automatically generated and used in simulating from the contour. Process requires the arclength/ surface area of the contour. Nolan (American U)

Computational geometry

21 Oct 2015

20 / 30

10

Add a radial component to get a distribution: X = RZ, where Z is uniform w.r.t. (d − 1)-dimensional surface area on contour. Here R ∼ Γ(2, 1)



● ● ●

● ● ●

● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●●●● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ●●● ● ● ●● ● ●● ● ● ●● ● ●●● ●● ● ● ●● ●●● ●● ●● ● ●● ● ● ●● ●●●● ● ● ● ●●● ●● ● ● ●● ● ● ● ●● ● ● ● ● ●● ● ●● ●● ●●● ● ● ●● ● ● ●● ● ●●●● ●● ●● ● ●●●● ●● ●●●●● ● ●● ● ●●●●● ● ● ● ●●● ● ● ●●● ● ● ●● ●● ● ● ● ● ●●● ● ●● ●● ● ● ●● ● ● ●● ● ●●● ● ●●●● ● ● ●● ● ●● ● ● ●● ●● ● ● ●●●●●● ● ● ● ●●● ●● ● ●● ●● ● ●●● ● ● ● ●●● ●● ● ● ●● ● ● ● ●● ●● ● ● ●● ●● ● ●● ● ● ●● ● ● ● ● ● ● ●● ●●● ●● ● ● ● ● ●● ● ● ● ● ●● ● ● ●● ●●● ● ●● ● ● ● ● ● ● ●● ●● ● ●● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ●● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ●● ● ● ●● ●● ●● ● ● ● ● ●● ●● ● ● ● ● ● ● ●● ●●● ● ● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ●● ● ● ● ● ●● ●● ● ● ●●● ● ●● ●● ● ● ● ● ●● ● ●●●● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ● ● ●● ● ●●● ● ●●●● ● ● ●● ●● ● ● ●● ● ●● ●● ● ● ● ● ● ●● ● ●●● ●● ●● ● ● ● ● ●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ●●● ● ● ● ●●●● ● ●● ● ● ● ● ●● ● ● ●● ● ●●●●● ● ● ● ● ●● ●● ●● ● ●● ●● ● ● ● ●●● ●●● ● ● ●●● ● ● ●● ● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ●● ● ● ●● ●●● ● ● ● ● ● ●● ● ●●● ● ● ●● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ●● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ●●●● ●●● ●●● ● ● ● ●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●

5









●● ● ●● ●

● ●●



x

−10

● ●

y

−5







z





0





−10

−5

0

5

10

Sample of X = RZ Nolan (American U)

density surface Computational geometry

21 Oct 2015

21 / 30

Many contour shapes possible

Nolan (American U)

Computational geometry

21 Oct 2015

22 / 30

Choice of R determines radial behavior

y

y

z

z x

x

y

y

z

z x

x

In all cases, contour is a diamond. (a) R ∼Uniform(0,1) (b) R ∼ Γ(2, 1) (c) R = |Y| where Y is 2D isotropic stable (d) R ∼ Γ(5, 1) Nolan (American U)

Computational geometry

21 Oct 2015

23 / 30

3D example - contour

Nolan (American U)

Computational geometry

21 Oct 2015

24 / 30

uniform sample from contour

Nolan (American U)

Computational geometry

21 Oct 2015

25 / 30

sample from distribution X with R ∼ Γ(2, 1)

Nolan (American U)

Computational geometry

21 Oct 2015

26 / 30

Flexible shapes

Specified some letters in 3D, can sample from this word proportional to arclength

Nolan (American U)

Computational geometry

21 Oct 2015

27 / 30

Flexible shapes

Specified some letters in 3D, can sample from this word proportional to arclength

Nolan (American U)

Computational geometry

21 Oct 2015

27 / 30

Outline

1

Introduction

2

Computational geometry Multivariate histograms

3

Generalized spherical distributions

4

Multivariate EVDs

Nolan (American U)

Computational geometry

21 Oct 2015

28 / 30

Multivariate Fr´echet Distributions de Haan and Resnick (1977): X max stable, centered with shape index ξ, is characterized by the angular measure H on the unit simplex W+ . The spread of mass by H determines the joint structure. Define the scale function ! Z d _ u ξ wi H(dw). σ ξ (u) = W+

i=1

(If the components of X are normalized and ξ = 1, then this is the tail dependence function `(u).)

Nolan (American U)

Computational geometry

21 Oct 2015

29 / 30

Multivariate Fr´echet Distributions de Haan and Resnick (1977): X max stable, centered with shape index ξ, is characterized by the angular measure H on the unit simplex W+ . The spread of mass by H determines the joint structure. Define the scale function ! Z d _ u ξ wi H(dw). σ ξ (u) = W+

i=1

(If the components of X are normalized and ξ = 1, then this is the tail dependence function `(u).) The scale function determines the joint distribution:   G (x) = P(X ≤ x) = exp −σ ξ (x−1 ) . Observation: need to (a) describe different types of measures and (b) integrate over a surface

Nolan (American U)

Computational geometry

21 Oct 2015

29 / 30

R package mvevd, d ≥ 2 Define classes of mvevds: discrete H, generalized logistic, Dirichlet mixture, piecewise constant and linear angular measures (computational geometry) Compute scale functions σ(u) for above classes (integrate over simplices, computational geometry) Fitting mvevd data with any of the above classes (max projections) Exact simulation from these classes (Dirichlet mix - Dombry, Engelke & Oesting (EVA 2015), Dieker and Mikosch (2015)) Compute cdf G (x) = P(X ≤ x) = exp(−σ ξ (x−1 )), (µ = 0, x ≥ 0). Computation of density g (x) when known (partitions) Computation of H(S) for a simplex S to estimate tail probabilities in the direction S. (computational geometry & integrate over simplices)

Nolan (American U)

Computational geometry

21 Oct 2015

30 / 30