Introduction
Variational EM Solution
Model Interpretation
Experiments and Results
Robust Visual Mining of Data with Error Information
J. Sun (1, 2), A. Kabán (1), S. Raychaudhury (2)
(1) School of Computer Science, The University of Birmingham
(2) School of Physics and Astronomy, The University of Birmingham
ECML/PKDD, Warsaw, 2007
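The Problem We Studied and the Proposed Model
Data visualization and outlier detection
The problem: robustly visualizing a data set with known measurement errors, and showing outliers in the low-dimensional space.
The proposed model: for each data point $t_n$ with measurement-error covariance $S_n$, assume the error model
$$p(t_n \mid w_n) = \mathcal{N}(t_n \mid w_n, S_n)$$
and a robust GTM for the latent, error-free data $w_n$:
$$p(w_n) = \frac{1}{K}\sum_{k=1}^{K} \mathrm{St}\big(w_n;\, W\phi(x_k),\, \sigma,\, \nu_k\big)$$
The likelihood of the proposed model is then
$$p(t_n; W, \sigma, \nu) = \frac{1}{K}\sum_{k=1}^{K} \iint \mathcal{N}(t_n \mid w_n, S_n)\, \mathcal{N}\!\left(w_n \,\middle|\, W\phi(x_k), \frac{1}{u_n\sigma} I\right) \mathcal{G}\!\left(u_n \,\middle|\, \frac{\nu_k}{2}, \frac{\nu_k}{2}\right) du_n\, dw_n$$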
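The hierarchy behind this likelihood can be made concrete by sampling from it top-down. The NumPy sketch below follows these steps: pick a latent point $k$ uniformly, draw the scale $u$ from the Gamma prior, draw the error-free point $w$, and then add the known measurement noise. It assumes that $\mathcal{G}(u \mid a, b)$ uses a shape/rate parameterization, so that $E[u] = 1$. The function name `sample_model` and the array layout are illustrative only and do not come from the slides.

```python
import numpy as np


def sample_model(W, Phi, sigma, nu, S, rng):
    """Draw one (k, u, w, t) tuple from the hierarchical model (a sketch).

    W     : (D, M) weight matrix, so W @ Phi[k] plays the role of W phi(x_k)
    Phi   : (K, M) basis activations, Phi[k, j] = phi_j(x_k)
    sigma : inverse variance of the latent (error-free) data
    nu    : (K,) degrees of freedom nu_k
    S     : (D, D) known measurement-error covariance of this point
    rng   : numpy.random.Generator
    """
    K = Phi.shape[0]
    D = W.shape[0]
    k = rng.integers(K)  # uniform mixing weight 1/K over latent points x_k
    # u ~ G(nu_k/2, nu_k/2) read as shape/rate, so scale = 2/nu_k and E[u] = 1
    u = rng.gamma(shape=nu[k] / 2.0, scale=2.0 / nu[k])
    centre = W @ Phi[k]  # W phi(x_k)
    # Error-free point; integrating u out gives St(w; W phi(x_k), sigma, nu_k)
    w = rng.multivariate_normal(centre, np.eye(D) / (u * sigma))
    # Observed point: error-free point plus known measurement noise
    t = rng.multivariate_normal(w, S)
    return k, u, w, t
```

For example, `sample_model(W, Phi, 10.0, np.full(K, 3.0), 0.01 * np.eye(D), np.random.default_rng(0))` draws one observation. Repeated draws with small `nu` produce the heavy-tailed scatter that the Student-t prior is meant to absorb.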
Tree-structured variational EM
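Variational E-step
We use the tree-structured variational distribution
$$q(w, u, z = k) = q(k)\, q(w \mid k)\, q(u \mid k)$$
Because the priors are conjugate, the auxiliary distributions can be obtained analytically. Here $t$ and $S$ denote a single data point and its error covariance.
$$q(w \mid k) = \mathcal{N}\big(w \mid \langle w \rangle_k, \Sigma_{w|k}\big); \qquad q(u \mid k) = \mathcal{G}(u \mid a_k, b_k) \tag{1}$$
$$q(k) = \frac{\exp(A_{t,k})}{\sum_{k'} \exp(A_{t,k'})} \tag{2}$$
where
$$\Sigma_{w|k} = \left(\sigma \langle u \rangle_{u|k} I + S^{-1}\right)^{-1}; \qquad \langle w \rangle_k = \Sigma_{w|k}\left(\langle u \rangle_{u|k}\, \sigma W\phi(x_k) + S^{-1} t\right);$$
$$a_k = \frac{\nu_k + D}{2}; \qquad b_k = \frac{\nu_k + C_k}{2}; \qquad C_k = \sigma\left(\|\langle w \rangle_k - W\phi(x_k)\|^2 + \mathrm{Tr}\,\Sigma_{w|k}\right);$$
$$A_{t,k} = \Big\langle \log p(t, w, u, k) - \log\big(q(u \mid k)\, q(w \mid k)\, q(k)\big) \Big\rangle_{w,u|k}$$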
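The updates in (1) and (2) can be sketched for one data point $t$ with error covariance $S$. The factors are circular: $\Sigma_{w|k}$ depends on $\langle u \rangle_{u|k} = a_k / b_k$, and $b_k$ depends in turn on $\Sigma_{w|k}$ and $\langle w \rangle_k$. For that reason, this sketch alternates the updates until $\langle u \rangle$ stops changing.

This iteration schedule, the starting value $\langle u \rangle = 1$ (the prior mean), and the function names are our assumptions, not necessarily the authors' choices. $A_{t,k}$ is taken as already computed, because its full expansion is not given on the slide.

```python
import numpy as np


def estep_component(t, S, centre, sigma, nu_k, n_iter=100, tol=1e-10):
    """q(w|k) and q(u|k) for one data point t and one latent point k.

    centre is W phi(x_k). Returns (w_mean, Sigma_w, a_k, b_k).
    """
    D = t.shape[0]
    S_inv = np.linalg.inv(S)
    a_k = (nu_k + D) / 2.0  # shape of q(u|k); does not change
    u_mean = 1.0  # assumed starting point: prior mean of u
    for _ in range(n_iter):
        # Sigma_{w|k} = (sigma <u> I + S^{-1})^{-1}
        Sigma_w = np.linalg.inv(sigma * u_mean * np.eye(D) + S_inv)
        # <w>_k = Sigma_{w|k} (<u> sigma W phi(x_k) + S^{-1} t)
        w_mean = Sigma_w @ (u_mean * sigma * centre + S_inv @ t)
        # C_k = sigma (||<w>_k - W phi(x_k)||^2 + Tr Sigma_{w|k})
        C_k = sigma * (np.sum((w_mean - centre) ** 2) + np.trace(Sigma_w))
        b_k = (nu_k + C_k) / 2.0  # rate of q(u|k)
        new_u = a_k / b_k  # <u>_{u|k} under a shape/rate Gamma
        done = abs(new_u - u_mean) < tol
        u_mean = new_u
        if done:
            break
    return w_mean, Sigma_w, a_k, b_k


def responsibilities(A):
    """Equation (2): q(k) = exp(A_k) / sum_k' exp(A_k'), computed stably."""
    A = np.asarray(A, dtype=float)
    z = np.exp(A - A.max())  # subtracting the max avoids overflow
    return z / z.sum()
```

The posterior mean $\langle w \rangle_k$ is a precision-weighted compromise between the observed $t$ and the centre $W\phi(x_k)$. Points with large errors $S$ are therefore pulled further towards the manifold.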
Tree-structured variational EM
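Variational M-step
To estimate $W$, we solve the following equation in matrix notation:
$$\Phi^{T} G \Phi W^{T} = \Phi^{T} A$$
where $\Phi$ is a $K \times M$ matrix with elements $\Phi_{ij} = \phi_j(x_i)$; $A$ is a $K \times D$ matrix whose $(k, i)$ element is $\sum_n q(z_n = k)\langle u_n \rangle_k \langle w_n \rangle_{ki}$; and $G$ is a diagonal $K \times K$ matrix with elements
$$G_{kk} = \sum_n q(z_n = k)\langle u_n \rangle_k .$$
We can re-estimate the inverse variance $\sigma$ as
$$\frac{1}{\sigma} = \frac{1}{ND}\sum_n \sum_k q(z_n = k)\langle u_n \rangle_k \left\langle \|w_n - W\phi(x_k)\|^2 \right\rangle_{w_n|k} .$$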
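Both M-step updates can be sketched in NumPy. The sketch assumes that the E-step quantities are stored as dense arrays indexed by $(n, k)$, with one posterior covariance $\Sigma_{w_n|k}$ per point, since each point has its own $S_n$. It also uses the identity $\langle \|w_n - W\phi(x_k)\|^2 \rangle = \|\langle w_n \rangle_k - W\phi(x_k)\|^2 + \mathrm{Tr}\,\Sigma_{w_n|k}$.

The name `mstep` and the array layout are illustrative. The linear solve requires $\Phi^T G \Phi$ to be invertible, for example $K \ge M$ with no empty components. A practical implementation might add a small regularizer, but that is not part of the slide.

```python
import numpy as np


def mstep(Phi, q, u_mean, w_mean, Sigma_w):
    """Return the updated (W, sigma).

    Phi     : (K, M)       Phi[k, j] = phi_j(x_k)
    q       : (N, K)       q(z_n = k)
    u_mean  : (N, K)       <u_n>_k
    w_mean  : (N, K, D)    <w_n>_k
    Sigma_w : (N, K, D, D) Sigma_{w_n|k}
    """
    N, K, D = w_mean.shape
    qu = q * u_mean  # q(z_n = k) <u_n>_k
    G = np.diag(qu.sum(axis=0))  # (K, K) diagonal, G_kk = sum_n qu[n, k]
    A = np.einsum("nk,nkd->kd", qu, w_mean)  # (K, D)
    # Solve Phi^T G Phi W^T = Phi^T A for W^T, which has shape (M, D)
    W = np.linalg.solve(Phi.T @ G @ Phi, Phi.T @ A).T  # (D, M)
    centres = Phi @ W.T  # (K, D); row k is W phi(x_k)
    sq = np.sum((w_mean - centres[None]) ** 2, axis=2)  # ||<w_n>_k - W phi(x_k)||^2
    tr = np.trace(Sigma_w, axis1=2, axis2=3)  # Tr Sigma_{w_n|k}
    inv_sigma = np.sum(qu * (sq + tr)) / (N * D)  # 1/sigma
    return W, 1.0 / inv_sigma
```

The structure mirrors the standard GTM weight update. The only difference is that each responsibility is scaled by $\langle u_n \rangle_k$, so points the E-step judges atypical (small $\langle u \rangle$) contribute less to $W$ and $\sigma$.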
Outlier detection criterion
To detect outliers, the posterior expectation of $u$ is of interest:
$$e \equiv \sum_k q(k)\, \frac{\nu_k + D}{\nu_k + \sigma\left(\|\langle w \rangle_k - W\phi(x_k)\|^2 + \mathrm{Tr}\,\Sigma_{w|k}\right)} \tag{3}$$
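A data point is considered an outlier that is not explained by its measurement errors if its $e$ value is sufficiently small or, equivalently, if the value
$$v \equiv \sum_k q(k)\, \sigma\left(\|\langle w \rangle_k - W\phi(x_k)\|^2 + \mathrm{Tr}\,\Sigma_{w|k}\right) \tag{4}$$
is sufficiently large.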
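Criteria (3) and (4) can be evaluated for all points at once from the E-step output. The array layout matches an $(n, k)$ indexing and is an assumption. The slides only say "sufficiently small" or "sufficiently large", so the helper `flag_top`, which flags a fixed fraction of the points with the largest $v$, is a hypothetical thresholding rule added for illustration.

```python
import numpy as np


def outlier_scores(q, w_mean, Sigma_w, centres, sigma, nu):
    """Return (e, v) per data point, following equations (3) and (4).

    q       : (N, K)       q(z_n = k)
    w_mean  : (N, K, D)    <w_n>_k
    Sigma_w : (N, K, D, D) Sigma_{w_n|k}
    centres : (K, D)       row k is W phi(x_k)
    nu      : (K,)         degrees of freedom nu_k
    """
    D = w_mean.shape[2]
    # C[n, k] = sigma (||<w_n>_k - W phi(x_k)||^2 + Tr Sigma_{w_n|k})
    C = sigma * (np.sum((w_mean - centres[None]) ** 2, axis=2)
                 + np.trace(Sigma_w, axis1=2, axis2=3))
    e = np.sum(q * (nu + D) / (nu + C), axis=1)  # equation (3): posterior mean of u
    v = np.sum(q * C, axis=1)  # equation (4)
    return e, v


def flag_top(v, frac=0.05):
    """Hypothetical rule: flag the fraction `frac` of points with the largest v."""
    return v >= np.quantile(v, 1.0 - frac)
```

Because $\langle w \rangle_k$ and $\Sigma_{w|k}$ already account for $S_n$, a point with large but well-documented errors is not flagged merely for lying far from the manifold.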
Illustrative experiments on synthetic data
Simulation results on synthetic data
[Figure with two panels: a 3D plot and a 2D plot, each showing the outliers inferred by the proposed algorithm.]
Figure: Synthetic data sets with cluster structure and outliers.
Visualizing high-redshift quasars
Figure: 3D visualization produced by the proposed algorithm on a subset of the SDSS quasar catalogue.