EAGER-DynamicData: Generative Statistical Modeling for Dynamic and Distributed Data
Jia Li
Department of Statistics, The Pennsylvania State University
Email: [email protected]
Generative Modeling for Classification and Clustering

- (X, Y) ~ P(x, y), X ∈ R^d, Y ∈ M = {1, 2, ..., M}.
- Goal: predict Y from X based on training data T = {(x_i, y_i), i = 1, ..., n}.
- Generative modeling:
  - Estimate P(x | y = j) by f_j(x) and the prior π_j = P(Y = j), j = 1, ..., M.
  - Bayes formula (optimal under 0-1 loss):

      P(Y = j | X = x) = π_j f_j(x) / Σ_{j'=1}^{M} π_{j'} f_{j'}(x)

- Examples:
  - Gaussian mixture model (GMM): the (X_i, Y_i)'s are i.i.d.
  - Hidden Markov model (HMM): (X_i, Y_i), i = 1, ..., n, is a stochastic process.
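The Bayes formula above can be sketched directly. This is a minimal illustration with univariate Gaussian class conditionals; the class means, variances, and priors are made-up assumptions, not the models from the talk.

```python
# Sketch: generative classification via the Bayes formula
# P(Y=j | X=x) = pi_j f_j(x) / sum_j' pi_j' f_j'(x).
# All parameters below are illustrative assumptions.
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density, used as the class-conditional f_j."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def bayes_posterior(x, priors, mus, sigma2s):
    """Posterior class probabilities P(Y=j | X=x)."""
    likelihoods = np.array([gaussian_pdf(x, m, s) for m, s in zip(mus, sigma2s)])
    unnorm = np.array(priors) * likelihoods
    return unnorm / unnorm.sum()

# Two classes centered at -1 and +1 with equal priors.
post = bayes_posterior(0.9, priors=[0.5, 0.5], mus=[-1.0, 1.0], sigma2s=[1.0, 1.0])
pred = int(np.argmax(post))  # Bayes rule under 0-1 loss: pick the argmax
```

Under 0-1 loss, predicting the posterior argmax is the optimal rule, which is why the generative route (estimate f_j and π_j, then invert with Bayes) yields a classifier.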
Multiscale Statistical Modeling

Principles vs. Feasibility

[Diagram: data at Sites 1 through K are each modeled locally (GMM or HMM); a
central processor performs model clustering (building a mixture) and model
fusion, using optimization and geometric methods for discrete distributions
and a metric for probability measures.]
Wasserstein Distance

- Let the probability measures for X and Z be γ_1, γ_2:

    D(γ_1, γ_2) ≜ min_{ζ ∈ Γ(γ_1, γ_2)} (E ||X − Z||^p)^{1/p},

  where || · || is the L_p distance. We take p = 2.
- Advantages:
  - True metric
  - Handles distributions with different supports
  - Accounts for relationships between support points (cross-terms)
- Challenge: computational complexity
Wasserstein Distance between Discrete Distributions

- β_l = {(v_l^(1), p_l^(1)), (v_l^(2), p_l^(2)), ..., (v_l^(m_l), p_l^(m_l))},
  v_l^(k) ∈ R^d, k = 1, ..., m_l, l = 1, 2.

    D^2(β_1, β_2) = min_{w_ij} Σ_{i=1}^{m_1} Σ_{j=1}^{m_2} w_ij ||v_1^(i) − v_2^(j)||^2

  subject to

    Σ_{j=1}^{m_2} w_ij = p_1^(i), i = 1, ..., m_1;
    Σ_{i=1}^{m_1} w_ij = p_2^(j), j = 1, ..., m_2;
    w_ij ≥ 0, i = 1, ..., m_1, j = 1, ..., m_2.
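The transport problem above is a linear program and can be solved directly with an off-the-shelf LP solver. A minimal sketch using `scipy.optimize.linprog`; the support points and weights are made-up toy inputs.

```python
# Sketch: squared 2-Wasserstein distance between discrete distributions,
# solved as the transport LP with scipy.optimize.linprog.
import numpy as np
from scipy.optimize import linprog

def wasserstein2_discrete(v1, p1, v2, p2):
    """D^2(beta1, beta2): min sum_ij w_ij ||v1_i - v2_j||^2 over couplings."""
    m1, m2 = len(p1), len(p2)
    # Cost c_ij = ||v1_i - v2_j||^2, flattened row-major to match w_ij.
    cost = ((v1[:, None, :] - v2[None, :, :]) ** 2).sum(-1).ravel()
    # Row sums: sum_j w_ij = p1_i; column sums: sum_i w_ij = p2_j.
    A_eq = np.zeros((m1 + m2, m1 * m2))
    for i in range(m1):
        A_eq[i, i * m2:(i + 1) * m2] = 1.0
    for j in range(m2):
        A_eq[m1 + j, j::m2] = 1.0
    b_eq = np.concatenate([p1, p2])
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

v1 = np.array([[0.0], [1.0]]); p1 = np.array([0.5, 0.5])
d2_same = wasserstein2_discrete(v1, p1, v1, p1)         # identical -> 0
d2_shift = wasserstein2_discrete(v1, p1, v1 + 1.0, p1)  # shift by 1 -> 1
```

The LP has m_1 m_2 variables, which is exactly the computational burden the previous slide flags: for large supports, specialized solvers are needed.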
Wasserstein Distance between Gaussians

- Two Gaussians φ(μ_1, Σ_1) and φ(μ_2, Σ_2):

    D^2(φ(μ_1, Σ_1), φ(μ_2, Σ_2)) = ||μ_1 − μ_2||^2 + tr(Σ_1) + tr(Σ_2) − 2 tr[(Σ_1^{1/2} Σ_2 Σ_1^{1/2})^{1/2}]
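The closed form above translates directly into a few lines with `scipy.linalg.sqrtm`; the example parameters are made up for illustration.

```python
# Sketch: closed-form squared 2-Wasserstein distance between Gaussians.
import numpy as np
from scipy.linalg import sqrtm

def wasserstein2_gaussians(mu1, S1, mu2, S2):
    """||mu1-mu2||^2 + tr(S1) + tr(S2) - 2 tr[(S1^{1/2} S2 S1^{1/2})^{1/2}]."""
    r1 = sqrtm(S1)
    cross = sqrtm(r1 @ S2 @ r1)
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(S1) + np.trace(S2)
                 - 2.0 * np.trace(cross).real)

mu1, S1 = np.array([0.0, 0.0]), np.eye(2)
mu2, S2 = np.array([3.0, 4.0]), np.eye(2)
d2 = wasserstein2_gaussians(mu1, S1, mu2, S2)
# Equal covariances: the trace terms cancel and d2 = ||mu1 - mu2||^2 = 25.
```

Note the covariance term vanishes when Σ_1 = Σ_2, so the distance reduces to the squared mean difference.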
Minimized Aggregated Wasserstein (MAW)

- GMMs M_l, l = 1, 2: f_l(x) = Σ_{j=1}^{M_l} a_{l,j} φ(x | μ_{l,j}, Σ_{l,j}).

    D̃^2(M_1, M_2) = min_{w_ij} Σ_{i=1}^{M_1} Σ_{j=1}^{M_2} w_ij D^2(φ(x | μ_{1,i}, Σ_{1,i}), φ(x | μ_{2,j}, Σ_{2,j}))

  subject to

    Σ_{j=1}^{M_2} w_ij = a_{1,i}, i = 1, ..., M_1;
    Σ_{i=1}^{M_1} w_ij = a_{2,j}, j = 1, ..., M_2;
    w_ij ≥ 0, i = 1, ..., M_1, j = 1, ..., M_2.

- A semimetric, and an upper bound on the Wasserstein distance
- Closely related to sampling methods; convenient for trade-offs
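MAW combines the two previous building blocks: the closed-form Gaussian distance supplies the ground cost, and a small transport LP is solved over the mixture weights. A sketch; the GMM parameters are illustrative assumptions.

```python
# Sketch of the MAW computation: transport LP over mixture weights with
# the closed-form Gaussian Wasserstein distance as ground cost.
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import linprog

def gauss_w2(mu1, S1, mu2, S2):
    """Squared 2-Wasserstein distance between two Gaussian components."""
    r1 = sqrtm(S1)
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(S1) + np.trace(S2)
                 - 2.0 * np.trace(sqrtm(r1 @ S2 @ r1)).real)

def maw2(a1, comps1, a2, comps2):
    """D-tilde^2(M1, M2): LP over couplings of the component weights."""
    M1, M2 = len(a1), len(a2)
    cost = np.array([[gauss_w2(m1, S1, m2, S2) for (m2, S2) in comps2]
                     for (m1, S1) in comps1]).ravel()
    A_eq = np.zeros((M1 + M2, M1 * M2))
    for i in range(M1):
        A_eq[i, i * M2:(i + 1) * M2] = 1.0   # sum_j w_ij = a1_i
    for j in range(M2):
        A_eq[M1 + j, j::M2] = 1.0            # sum_i w_ij = a2_j
    res = linprog(cost, A_eq=A_eq, b_eq=np.concatenate([a1, a2]),
                  bounds=(0, None))
    return res.fun

I2 = np.eye(2)
comps = [(np.zeros(2), I2), (np.full(2, 4.0), I2)]
d2 = maw2([0.5, 0.5], comps, [0.5, 0.5], comps)  # identical GMMs
```

The LP here has only M_1 M_2 variables (components, not support points), which is the source of MAW's computational advantage over the exact Wasserstein distance between GMMs.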
Extension to HMM

- The distance between two HMMs comprises two parts:
  - MAW between the two marginal GMMs
  - Comparison between the transition probability matrices via registration:
    - Soft matching of the states in the two HMMs based on the two marginal
      GMMs: W = (w_ij), i = 1, ..., M_1, j = 1, ..., M_2.
    - Row- and column-wise normalized matrices: W_r and W_c.
    - State transition probability matrices P_l of size M_l × M_l, l = 1, 2.
    - P_1 ↔ W_r P_2 W_c^t and P_2 ↔ W_c^t P_1 W_r.
    - After registration, compute MAW between the GMMs conditioned on the
      previous state.
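The registration step can be sketched in a few lines of matrix algebra, following the P_1 ↔ W_r P_2 W_c^t mapping above. The matching and transition matrices below are toy assumptions; the slide does not specify the discrepancy measure, so a Frobenius norm is used purely for illustration.

```python
# Sketch: registering HMM 2's transition matrix into HMM 1's state space.
import numpy as np

W = np.array([[0.4, 0.1],
              [0.1, 0.4]])             # soft matching of states (assumed)
Wr = W / W.sum(axis=1, keepdims=True)  # row-wise normalized
Wc = W / W.sum(axis=0, keepdims=True)  # column-wise normalized

P2 = np.array([[0.9, 0.1],
               [0.2, 0.8]])            # transition matrix of HMM 2 (assumed)
P2_in_1 = Wr @ P2 @ Wc.T               # registered to HMM 1's states

P1 = np.array([[0.85, 0.15],
               [0.25, 0.75]])          # transition matrix of HMM 1 (assumed)
diff = np.linalg.norm(P1 - P2_in_1)    # illustrative discrepancy measure
```

Since W_r and W_c^t are both row-stochastic, the registered matrix W_r P_2 W_c^t is again a valid transition matrix, so it can be compared with P_1 entry-wise.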
Experiments on HMM

- CMU 3D Motion Data: 1.5 GB; 7 motions (walk, jump, etc.); 62-dimensional time series. http://mocap.cs.cmu.edu
- Confusion matrices based on nearest-neighbor classification using data from different sensor locations
Model Fusion for Discrete Distributions

- Wasserstein barycenter problem:
  - Find a centroid distribution that minimizes the total squared Wasserstein
    distance to a given set of distributions.
  - The objective function is not in closed form.
  - Optimization techniques:
    - Ad hoc divide and conquer
    - Subgradient descent, ADMM, Bregman ADMM

1. Y. Zhang, J. Z. Wang, J. Li, "Parallel massive clustering of discrete distributions," ACM Transactions on Multimedia Computing, Communications and Applications, 11(4):1-24, 2015.
2. J. Ye, P. Wu, J. Z. Wang, and J. Li, "Accelerated discrete distribution clustering under Wasserstein distance," arXiv.org, http://arxiv.org/abs/1510.00012, 2015.
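To make the barycenter objective concrete, here is the one special case that does have a simple solution: in 1-D, for distributions with the same number of equally weighted atoms, the W_2 barycenter is obtained by averaging sorted atoms (quantile averaging). This is only an illustration; the general problem requires the iterative methods cited above, and the data are made up.

```python
# Sketch: W2 barycenter of 1-D empirical distributions with equal-weight
# atoms via quantile (order-statistic) averaging.
import numpy as np

def barycenter_1d(samples):
    """Each row of `samples` is one distribution with m equal-weight atoms."""
    sorted_atoms = np.sort(np.asarray(samples), axis=1)
    return sorted_atoms.mean(axis=0)  # atom-wise average of quantiles

dists = [[0.0, 1.0, 2.0],
         [2.0, 3.0, 4.0]]
bary = barycenter_1d(dists)  # the "midway" distribution [1.0, 2.0, 3.0]
```

In higher dimensions the optimal couplings and the centroid's support must be optimized jointly, which is why the objective has no closed form.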
Model Fusion for GMM based on Geometry

- M_l: f_l(x) = Σ_{j=1}^{M_l} a_{l,j} φ(x | μ_{l,j}, Σ_{l,j}), l = 1, ..., L.
- Average model M̄:

    f̄(x) = (1/L) Σ_{l=1}^{L} Σ_{j=1}^{M_l} a_{l,j} φ(x | μ_{l,j}, Σ_{l,j})

- Find modes by the MEM (Modal EM) algorithm. Let f(x) = Σ_{j=1}^{M} π_j φ(x | μ_j, Σ_j).

  1. E-step: p_j = π_j φ(x^(r) | μ_j, Σ_j) / f(x^(r)), j = 1, ..., M.
  2. M-step: x^(r+1) = (Σ_{j=1}^{M} p_j Σ_j^{-1})^{-1} (Σ_{j=1}^{M} p_j Σ_j^{-1} μ_j).
- Ridgeline: a low-dimensional summary of geometry
  - A 1-D curve connecting the modes of two unimodal clusters.
  - Passes through all the critical points of the mixed density of the two
    clusters, including modes, antimodes (local minima), and saddle points.
  - Let g_1(x) and g_2(x) be the densities of the two clusters. The ridgeline
    between the two clusters is

      L = {x(α) : (1 − α)∇ log g_1(x) + α∇ log g_2(x) = 0, 0 ≤ α ≤ 1}.

  - The REM algorithm solves for L.
- Agglomerative clustering of components based on geometric characteristics.

1. J. Li, S. Ray, and B. G. Lindsay, "A nonparametric statistical approach to clustering via mode identification," Journal of Machine Learning Research, 8(8):1687-1723, 2007.
2. H. Lee and J. Li, "Variable selection for clustering by separability based on ridgelines," Journal of Computational and Graphical Statistics, 21(2):315-337, 2012.
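For two Gaussian clusters the ridgeline equation can be solved in closed form: setting (1 − α)∇log g_1 + α∇log g_2 = 0 gives x(α) = [(1 − α)Σ_1^{-1} + αΣ_2^{-1}]^{-1} [(1 − α)Σ_1^{-1}μ_1 + αΣ_2^{-1}μ_2]. A sketch of this special case; the cluster parameters are illustrative.

```python
# Sketch: closed-form ridgeline between two Gaussian clusters.
import numpy as np

def ridgeline_point(alpha, mu1, S1, mu2, S2):
    """One point x(alpha) on the ridgeline between Gaussians g1 and g2."""
    P1, P2 = np.linalg.inv(S1), np.linalg.inv(S2)
    A = (1 - alpha) * P1 + alpha * P2
    b = (1 - alpha) * P1 @ mu1 + alpha * P2 @ mu2
    return np.linalg.solve(A, b)

mu1, S1 = np.array([0.0, 0.0]), np.eye(2)
mu2, S2 = np.array([4.0, 0.0]), np.eye(2)
curve = np.array([ridgeline_point(a, mu1, S1, mu2, S2)
                  for a in np.linspace(0.0, 1.0, 5)])
# Endpoints are the two means; with equal covariances the curve is the
# straight segment between them.
```

Evaluating the mixed density along this curve exposes the antimode between the two modes, which is the separability information the agglomerative clustering uses.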
Thank you!