Generative Statistical Modeling for Dynamic and Distributed Data

EAGER-DynamicData: Generative Statistical Modeling for Dynamic and Distributed Data

Jia Li
Department of Statistics, The Pennsylvania State University
Email: [email protected]

Generative Modeling for Classification and Clustering

- (X, Y) ∼ P(x, y), X ∈ R^d, Y ∈ M = {1, 2, ..., M}.
- Goal: predict Y from X based on training data T = {(x_i, y_i), i = 1, ..., n}.
- Generative modeling:
  - Estimate P(x | y = j) by f_j(x), and the prior π_j = P(Y = j), j = 1, ..., M.
  - Bayes formula (optimal under 0-1 loss):

    P(Y = j | X = x) = π_j f_j(x) / Σ_{j'=1}^{M} π_{j'} f_{j'}(x)

- Examples:
  - Gaussian mixture model (GMM): the (X_i, Y_i)'s are i.i.d.
  - Hidden Markov model (HMM): (X_i, Y_i), i = 1, ..., n, is a stochastic process.
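The Bayes rule above can be sketched in a few lines: estimate a density f_j per class and a prior π_j, then predict the class maximizing π_j f_j(x). A minimal illustration with Gaussian class-conditional densities; the function name and the toy parameters are mine, chosen only for demonstration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def bayes_predict(x, priors, class_densities):
    """Generative classification: with estimated densities f_j(x) = P(x | y = j)
    and priors pi_j, predict via the Bayes rule (optimal under 0-1 loss),
    i.e., the class maximizing pi_j * f_j(x)."""
    scores = np.array([pi * f(x) for pi, f in zip(priors, class_densities)])
    return int(np.argmax(scores)) + 1  # classes labeled 1, ..., M

# Toy two-class example with hypothetical Gaussian class-conditional densities.
priors = [0.5, 0.5]
f1 = lambda x: multivariate_normal.pdf(x, mean=[0, 0], cov=np.eye(2))
f2 = lambda x: multivariate_normal.pdf(x, mean=[3, 3], cov=np.eye(2))
print(bayes_predict(np.array([2.5, 3.1]), priors, [f1, f2]))  # -> 2
```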


Multiscale Statistical Modeling

[Figure: Multiscale statistical modeling — principles vs. feasibility. Sites 1 through K each fit a local model (GMM or HMM) to their own data. The local models (Model 1, Model 2, ..., Model K) are sent to a central processor, which performs model clustering (building a mixture) and model fusion, relying on metrics for probability measures, discrete distributions, and optimization and geometric methods.]

Wasserstein Distance

- Let the probability measures for X and Z be γ_1 and γ_2:

  D(γ_1, γ_2) ≜ min_{ζ ∈ Γ(γ_1, γ_2)} (E_ζ ‖X − Z‖^p)^{1/p},

  where ‖·‖ is the L^p distance. We let p = 2.
- Advantages:
  - True metric
  - Handles distributions with different supports
  - Captures relationships between support points (cross-terms)
- Challenge: computational complexity


Wasserstein Distance between Discrete Distributions

- β_l = {(v_l^(1), p_l^(1)), (v_l^(2), p_l^(2)), ..., (v_l^(m_l), p_l^(m_l))}, v_l^(k) ∈ R^d, k = 1, ..., m_l, l = 1, 2.

  D^2(β_1, β_2) = min_{w_{i,j}} Σ_{i=1}^{m_1} Σ_{j=1}^{m_2} w_{i,j} ‖v_1^(i) − v_2^(j)‖^2

  subject to

  Σ_{j=1}^{m_2} w_{i,j} = p_1^(i), i = 1, ..., m_1;
  Σ_{i=1}^{m_1} w_{i,j} = p_2^(j), j = 1, ..., m_2;
  w_{i,j} ≥ 0, i = 1, ..., m_1, j = 1, ..., m_2.
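The transport problem above is a linear program and can be solved directly with an off-the-shelf LP solver. A minimal sketch using `scipy.optimize.linprog` (the function name is mine; for serious use, dedicated OT solvers are far faster):

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein2_discrete(v1, p1, v2, p2):
    """Squared 2-Wasserstein distance between discrete distributions
    beta_l = {(v_l^(k), p_l^(k))}, solved as a transport LP over w_{i,j}."""
    m1, m2 = len(p1), len(p2)
    # Cost matrix: squared Euclidean distances between support points.
    cost = ((v1[:, None, :] - v2[None, :, :]) ** 2).sum(axis=2)
    # Equality constraints: row sums of w equal p1, column sums equal p2.
    A_eq = []
    for i in range(m1):
        row = np.zeros((m1, m2)); row[i, :] = 1.0
        A_eq.append(row.ravel())
    for j in range(m2):
        col = np.zeros((m1, m2)); col[:, j] = 1.0
        A_eq.append(col.ravel())
    b_eq = np.concatenate([p1, p2])
    res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun

# Two point masses at distance 1 apart: W2^2 = 1.
v1, p1 = np.array([[0.0]]), np.array([1.0])
v2, p2 = np.array([[1.0]]), np.array([1.0])
print(wasserstein2_discrete(v1, p1, v2, p2))  # -> 1.0
```

The LP has m_1 m_2 variables, which is the computational bottleneck noted earlier.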


Wasserstein Distance between Gaussians

- For two Gaussians φ(μ_1, Σ_1) and φ(μ_2, Σ_2):

  D^2(φ(μ_1, Σ_1), φ(μ_2, Σ_2)) = ‖μ_1 − μ_2‖^2 + tr(Σ_1) + tr(Σ_2) − 2 tr[(Σ_1^{1/2} Σ_2 Σ_1^{1/2})^{1/2}]
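The closed form translates directly to code with a matrix square root. A minimal sketch (function name is mine):

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2_sq(mu1, S1, mu2, S2):
    """Closed-form squared 2-Wasserstein distance between N(mu1, S1) and
    N(mu2, S2): ||mu1-mu2||^2 + tr(S1) + tr(S2) - 2 tr[(S1^.5 S2 S1^.5)^.5]."""
    root_S1 = sqrtm(S1)
    cross = sqrtm(root_S1 @ S2 @ root_S1)
    # sqrtm may return a complex array with negligible imaginary part.
    return (np.sum((mu1 - mu2) ** 2) + np.trace(S1) + np.trace(S2)
            - 2.0 * np.trace(cross.real))

# Identical covariances: the distance reduces to ||mu1 - mu2||^2.
mu1, mu2 = np.zeros(2), np.array([3.0, 4.0])
S = np.eye(2)
print(gaussian_w2_sq(mu1, S, mu2, S))  # ≈ 25.0
```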


Minimized Aggregated Wasserstein (MAW)

- GMMs M_l, l = 1, 2: f_l(x) = Σ_{j=1}^{M_l} a_{l,j} φ(x | μ_{l,j}, Σ_{l,j}).

  D̃^2(M_1, M_2) = min_{w_{i,j}} Σ_{i=1}^{M_1} Σ_{j=1}^{M_2} w_{i,j} D^2(φ(x | μ_{1,i}, Σ_{1,i}), φ(x | μ_{2,j}, Σ_{2,j}))

  subject to

  Σ_{j=1}^{M_2} w_{i,j} = a_{1,i}, i = 1, ..., M_1;
  Σ_{i=1}^{M_1} w_{i,j} = a_{2,j}, j = 1, ..., M_2;
  w_{i,j} ≥ 0, i = 1, ..., M_1, j = 1, ..., M_2.

- A semimetric, and an upper bound for the Wasserstein distance.
- Closely related to sampling methods; convenient for trading accuracy against computation.
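MAW combines the two previous ingredients: the closed-form Gaussian distance supplies the ground cost, and the transport LP is solved over the mixture weights. A self-contained sketch (function names are mine):

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import linprog

def gauss_w2_sq(mu1, S1, mu2, S2):
    """Closed-form squared W2 distance between two Gaussians."""
    r = sqrtm(S1)
    return (np.sum((mu1 - mu2) ** 2) + np.trace(S1) + np.trace(S2)
            - 2.0 * np.trace(sqrtm(r @ S2 @ r).real))

def maw_sq(a1, mus1, Ss1, a2, mus2, Ss2):
    """MAW^2 between two GMMs: optimal transport between the mixture weights
    a1, a2 with the component-pairwise Gaussian W2^2 as ground cost."""
    M1, M2 = len(a1), len(a2)
    cost = np.array([[gauss_w2_sq(mus1[i], Ss1[i], mus2[j], Ss2[j])
                      for j in range(M2)] for i in range(M1)])
    A_eq = []
    for i in range(M1):
        r = np.zeros((M1, M2)); r[i, :] = 1.0; A_eq.append(r.ravel())
    for j in range(M2):
        c = np.zeros((M1, M2)); c[:, j] = 1.0; A_eq.append(c.ravel())
    res = linprog(cost.ravel(), A_eq=np.array(A_eq),
                  b_eq=np.concatenate([a1, a2]), bounds=(0, None),
                  method="highs")
    return res.fun

# Two single-component GMMs: MAW^2 reduces to the Gaussian W2^2.
d = maw_sq(np.array([1.0]), [np.zeros(2)], [np.eye(2)],
           np.array([1.0]), [np.array([3.0, 4.0])], [np.eye(2)])
print(d)  # ≈ 25.0 = ||mu1 - mu2||^2
```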


Extension to HMM

- Distance between two HMMs comprises two parts:
  - MAW between the two marginal GMMs.
  - Comparison between the transition probability matrices via registration:
    - Soft matching of the states in the two HMMs based on the two marginal GMMs: W = (w_{i,j}), i = 1, ..., M_1, j = 1, ..., M_2.
    - Row- or column-wise normalized matrix: W_r or W_c.
    - State transition probability matrices P_l of size M_l × M_l, l = 1, 2.
    - P_1 ↔ W_r P_2 W_c^t, P_2 ↔ W_c^t P_1 W_r.
    - After registration, compute MAW between the GMMs conditioned on the previous state.
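The registration step can be sketched as follows. The mappings P_1 ↔ W_r P_2 W_c^t and P_2 ↔ W_c^t P_1 W_r come from the slide; the specific row/column normalization by simple division is my assumption, and the function name is mine.

```python
import numpy as np

def register_transitions(W, P1, P2):
    """Map each HMM's transition matrix into the other's state space via the
    soft state matching W (w_{i,j} from the MAW coupling of the marginal
    GMMs). W_r is row-normalized, W_c column-normalized."""
    Wr = W / W.sum(axis=1, keepdims=True)  # M1 x M2, rows sum to 1
    Wc = W / W.sum(axis=0, keepdims=True)  # M1 x M2, columns sum to 1
    P2_in_1 = Wr @ P2 @ Wc.T               # compare against P1
    P1_in_2 = Wc.T @ P1 @ Wr               # compare against P2
    return P2_in_1, P1_in_2

# With a one-to-one matching (W proportional to the identity), each
# transition matrix is recovered exactly; for generic W the registered
# matrices remain row-stochastic.
```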


Experiments on HMM

- CMU 3D Motion Capture Data: 1.5 GB; 7 motion types (walk, jump, etc.); 62-dimensional time series. http://mocap.cs.cmu.edu
- Confusion matrices from nearest-neighbor classification using sensor data from different body locations.


Model Fusion for Discrete Distributions

- Wasserstein barycenter problem:
  - Find a centroid distribution that minimizes the total squared Wasserstein distance to a given set of distributions.
  - The objective function is not in closed form.
  - Optimization techniques:
    - Ad hoc divide and conquer
    - Subgradient descent, ADMM, Bregman ADMM

1. Y. Zhang, J. Z. Wang, J. Li, "Parallel massive clustering of discrete distributions," ACM Transactions on Multimedia Computing, Communications and Applications, 11(4):1-24, 2015.
2. J. Ye, P. Wu, J. Z. Wang, and J. Li, "Accelerated discrete distribution clustering under Wasserstein distance," arXiv.org, http://arxiv.org/abs/1510.00012, 2015.
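To make the barycenter problem concrete, here is a simplified alternating scheme with the barycenter weights held fixed: solve the transport to each input distribution, then move each barycenter support point to the transport-weighted mean of its matched points (the centroid update for squared Euclidean cost). This is my illustrative sketch, not the algorithms of the papers above; function names are mine.

```python
import numpy as np
from scipy.optimize import linprog

def transport_plan(cost, p, q):
    """Optimal transport plan between discrete weight vectors p, q."""
    m, n = cost.shape
    A_eq = []
    for i in range(m):
        r = np.zeros((m, n)); r[i, :] = 1.0; A_eq.append(r.ravel())
    for j in range(n):
        c = np.zeros((m, n)); c[:, j] = 1.0; A_eq.append(c.ravel())
    res = linprog(cost.ravel(), A_eq=np.array(A_eq),
                  b_eq=np.concatenate([p, q]), bounds=(0, None),
                  method="highs")
    return res.x.reshape(m, n)

def barycenter_fixed_weights(supports, weights, x0, p0, iters=10):
    """Alternate between solving the transports and updating the barycenter
    support points as transport-weighted means of the matched inputs."""
    x = x0.copy()
    for _ in range(iters):
        num = np.zeros_like(x); den = np.zeros(len(x))
        for v, q in zip(supports, weights):
            cost = ((x[:, None, :] - v[None, :, :]) ** 2).sum(axis=2)
            W = transport_plan(cost, p0, q)
            num += W @ v
            den += W.sum(axis=1)
        x = num / den[:, None]
    return x

# Barycenter of two point masses at 0 and 2 lands at the midpoint 1.
x = barycenter_fixed_weights([np.array([[0.0]]), np.array([[2.0]])],
                             [np.array([1.0]), np.array([1.0])],
                             np.array([[0.5]]), np.array([1.0]))
print(x)  # -> [[1.]]
```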


Model Fusion for GMM based on Geometry

- M_l: f_l(x) = Σ_{j=1}^{M_l} a_{l,j} φ(x | μ_{l,j}, Σ_{l,j}), l = 1, ..., L.
- Average model M̄:

  f̄(x) = Σ_{l=1}^{L} Σ_{j=1}^{M_l} (1/L) a_{l,j} φ(x | μ_{l,j}, Σ_{l,j})

- Find modes by the Modal EM (MEM) algorithm. Let f(x) = Σ_{j=1}^{M} π_j φ(x | μ_j, Σ_j).

  1. E-step: p_j = π_j φ(x^(r) | μ_j, Σ_j) / f(x^(r)), j = 1, ..., M.
  2. M-step: x^(r+1) = (Σ_{j=1}^{M} p_j Σ_j^{-1})^{-1} (Σ_{j=1}^{M} p_j Σ_j^{-1} μ_j).
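The two Modal EM steps iterate a hill-climbing update from a starting point to a local mode of the mixture. A minimal sketch (function name and toy mixture are mine):

```python
import numpy as np
from scipy.stats import multivariate_normal

def modal_em(x0, pis, mus, Sigmas, iters=200):
    """Modal EM: ascend from x0 to a local mode of the Gaussian mixture
    f(x) = sum_j pi_j phi(x | mu_j, Sigma_j)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        # E-step: component responsibilities p_j at the current point.
        dens = np.array([pi * multivariate_normal.pdf(x, mu, S)
                         for pi, mu, S in zip(pis, mus, Sigmas)])
        p = dens / dens.sum()
        # M-step: precision-weighted average of the component means.
        A = sum(pj * np.linalg.inv(S) for pj, S in zip(p, Sigmas))
        b = sum(pj * np.linalg.inv(S) @ mu for pj, mu, S in zip(p, mus, Sigmas))
        x = np.linalg.solve(A, b)
    return x

# Well-separated two-component mixture: starting near one component mean
# converges to the mode near that mean.
pis = [0.5, 0.5]
mus = [np.array([0.0]), np.array([10.0])]
Sigmas = [np.eye(1), np.eye(1)]
print(modal_em(np.array([1.0]), pis, mus, Sigmas))  # close to [0.0]
```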


- Ridgeline: low-dimensional summary of geometry
  - A 1-D curve connecting the modes of two unimodal clusters.
  - Passes through all the critical points of the mixed density of the two clusters, including modes, antimodes (local minima), and saddle points.
  - Let g_1(x) and g_2(x) be the densities for two clusters. The ridgeline between the two clusters is

    L = {x(α) : (1 − α)∇ log g_1(x) + α∇ log g_2(x) = 0, 0 ≤ α ≤ 1}.

  - The REM algorithm solves for L.
- Agglomerative clustering of components based on geometric characteristics.

1. J. Li, S. Ray, and B. G. Lindsay, "A nonparametric statistical approach to clustering via mode identification," Journal of Machine Learning Research, 8(8):1687-1723, 2007.
2. H. Lee and J. Li, "Variable selection for clustering by separability based on ridgelines," Journal of Computational and Graphical Statistics, 21(2):315-337, 2012.
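For two Gaussian components the ridgeline condition can be solved in closed form: since ∇ log φ(x | μ, Σ) = −Σ^{-1}(x − μ), setting (1 − α)∇ log g_1 + α∇ log g_2 = 0 gives x(α) = [(1 − α)Σ_1^{-1} + αΣ_2^{-1}]^{-1}[(1 − α)Σ_1^{-1}μ_1 + αΣ_2^{-1}μ_2]. A sketch of tracing the curve (function name is mine):

```python
import numpy as np

def gaussian_ridgeline(mu1, S1, mu2, S2, num=51):
    """Ridgeline between two Gaussian densities: for each alpha in [0, 1],
    solve [(1-a) S1^-1 + a S2^-1] x = (1-a) S1^-1 mu1 + a S2^-1 mu2."""
    P1, P2 = np.linalg.inv(S1), np.linalg.inv(S2)
    pts = []
    for a in np.linspace(0.0, 1.0, num):
        A = (1 - a) * P1 + a * P2
        b = (1 - a) * P1 @ mu1 + a * P2 @ mu2
        pts.append(np.linalg.solve(A, b))
    return np.array(pts)

# The endpoints of the curve are the two component means; with equal
# spherical covariances the ridgeline is the straight segment between them.
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
curve = gaussian_ridgeline(mu1, np.eye(2), mu2, np.eye(2))
print(curve[0], curve[-1])  # -> [0. 0.] [2. 1.]
```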


Thank you!