Introduction

Variational EM Solution

Model Interpretation

Experiments and Results

Robust Visual Mining of Data with Error Information

J. Sun (1,2), A. Kabán (1), S. Raychaudhury (2)

(1) School of Computer Science, The University of Birmingham
(2) School of Physics and Astronomy, The University of Birmingham

ECML/PKDD, WARSAW, 2007

The Problem that We Studied and The Proposed Model

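Data visualization and outlier detection

The problem: robustly visualizing a data set with known measurement errors, and showing outliers in the low-dimensional space.

The proposed model: for each data point $t_n$ with measurement error $S_n$, assume the error model

$$p(t_n \mid w_n) = \mathcal{N}(t_n \mid w_n, S_n)$$

Robust GTM for the latent, error-free data $w_n$:

$$p(w_n) = \frac{1}{K} \sum_k \mathrm{St}(w_n; W, \sigma, \nu_k)$$

The likelihood of the proposed model:

$$p(t_n; W, \sigma, \nu) = \frac{1}{K} \sum_k \iint \mathcal{N}(t_n \mid w_n, S_n)\, \mathcal{N}\!\left(w_n \,\Big|\, W\phi(x_k), \frac{1}{u_n \sigma}\right) \mathcal{G}\!\left(u_n \,\Big|\, \frac{\nu_k}{2}, \frac{\nu_k}{2}\right) du_n\, dw_n$$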
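The generative reading of this likelihood can be sketched as ancestral sampling: pick a component, draw the scale $u_n$, draw the error-free point $w_n$, then add measurement noise. The function name, the `centres` array (standing in for the rows $W\phi(x_k)$), the rate parameterization of the Gamma density, and the reading of the scalar variance $1/(u_n\sigma)$ as isotropic are all assumptions made for illustration, not details confirmed by the slides.

```python
import numpy as np

def sample_noisy_point(centres, sigma, nu, S, rng):
    """Draw one (t, w, u, k) from the model, under the assumptions stated above.

    centres : (K, D) array, row k plays the role of W phi(x_k)  (hypothetical name)
    sigma   : scalar inverse variance
    nu      : (K,) degrees of freedom nu_k
    S       : (D, D) known measurement-error covariance of this point
    rng     : numpy.random.Generator
    """
    K, D = centres.shape
    k = rng.integers(K)                       # uniform mixing weights 1/K
    # G(u | nu/2, nu/2) read as shape/rate; NumPy expects scale = 1/rate
    u = rng.gamma(shape=nu[k] / 2.0, scale=2.0 / nu[k])
    # error-free latent point: isotropic Gaussian with precision u * sigma
    w = centres[k] + rng.standard_normal(D) / np.sqrt(u * sigma)
    # observed point: add measurement noise with covariance S
    t = rng.multivariate_normal(w, S)
    return t, w, u, k
```

Small draws of `u` inflate the variance of `w`, which is how the Student-t components produce occasional far-away points even before measurement noise is added.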

Tree-structured variational EM

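Variational E-step

We choose the tree-structured variational distribution

$$q(w, u, z = k) = q(k)\, q(w \mid k)\, q(u \mid k)$$

The auxiliary distribution can be obtained analytically thanks to the conjugacy of the priors used:

$$q(w \mid k) = \mathcal{N}(w \mid \langle w \rangle_k, \Sigma_{w|k}); \qquad q(u \mid k) = \mathcal{G}(u \mid a_k, b_k) \tag{1}$$

$$q(k) = \frac{\exp(A_{t,k})}{\sum_{k'} \exp(A_{t,k'})} \tag{2}$$

where

$$\Sigma_{w|k} = \left(\sigma \langle u \rangle_{u|k} + S^{-1}\right)^{-1}$$

$$\langle w \rangle_k = \Sigma_{w|k} \left(\langle u \rangle_{u|k}\, \sigma W\phi(x_k) + S^{-1} t\right)$$

$$a_k = \frac{\nu_k + D}{2}; \qquad b_k = \frac{\nu_k + C_k}{2}$$

$$C_k = \sigma \left(\lVert \langle w \rangle_k - W\phi(x_k) \rVert^2 + \mathrm{Tr}\, \Sigma_{w|k}\right)$$

$$A_{t,k} = \left\langle \log p(t, w, u, k) - \log\big(q(u \mid k)\, q(w \mid k)\, q(k)\big) \right\rangle_{w,u|k}$$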
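Because $\Sigma_{w|k}$ and $\langle w \rangle_k$ depend on $\langle u \rangle_{u|k}$, while $b_k$ depends on them through $C_k$, the updates for one data point and one component can be iterated as a fixed point. The sketch below is one possible way to do that, not the authors' implementation. It assumes that $\mathcal{G}(u \mid a, b)$ uses the rate parameterization (so $\langle u \rangle_{u|k} = a_k / b_k$), that the scalar term $\sigma \langle u \rangle_{u|k}$ multiplies the identity, and that a fixed iteration count is enough. The names `e_step_component`, `responsibilities`, `m_k` and `n_iter` are hypothetical. The scores $A_{t,k}$ are taken as given inputs, since their closed form is not shown here.

```python
import numpy as np

def e_step_component(t, S, m_k, sigma, nu_k, n_iter=50):
    """Fixed-point updates for q(w|k) and q(u|k) of a single data point.

    t    : (D,) observation
    S    : (D, D) measurement-error covariance of t
    m_k  : (D,) component centre, standing in for W phi(x_k)
    """
    D = t.shape[0]
    S_inv = np.linalg.inv(S)
    a_k = 0.5 * (nu_k + D)             # constant across iterations
    u_mean = 1.0                       # initial guess for <u>_{u|k}
    for _ in range(n_iter):
        # Sigma_{w|k} = (sigma <u> I + S^{-1})^{-1}  (isotropic reading, an assumption)
        Sigma = np.linalg.inv(sigma * u_mean * np.eye(D) + S_inv)
        w_mean = Sigma @ (u_mean * sigma * m_k + S_inv @ t)
        C_k = sigma * (np.sum((w_mean - m_k) ** 2) + np.trace(Sigma))
        b_k = 0.5 * (nu_k + C_k)
        u_mean = a_k / b_k             # mean of a Gamma with shape a_k and rate b_k
    return w_mean, Sigma, a_k, b_k

def responsibilities(A):
    """q(k) as in Eq. (2): a softmax over A_{t,k}, shifted for numerical stability."""
    A = np.asarray(A, dtype=float)
    z = np.exp(A - A.max())
    return z / z.sum()
```

Subtracting `A.max()` before exponentiating leaves the ratio in Eq. (2) unchanged but avoids overflow when the scores are large.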

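Variational M-step

To estimate $W$, we need to solve the following equation in matrix notation:

$$\Phi^T G \Phi W^T = \Phi^T A$$

where $\Phi$ is a $K \times M$ matrix with elements $\Phi_{ij} = \phi_j(x_i)$; $A$ is a $K \times D$ matrix whose $(k, i)$ element is $\sum_n q(z_n = k) \langle u_n \rangle_k \langle w_n \rangle_{ki}$; and $G$ is a diagonal $K \times K$ matrix with elements

$$G_{kk} = \sum_n q(z_n = k) \langle u_n \rangle_k$$

We can re-estimate the inverse variance $\sigma$ as:

$$\frac{1}{\sigma} = \frac{1}{ND} \sum_n \sum_k q(z_n = k) \langle u_n \rangle_k \left\langle \lVert w_n - W\phi(x_k) \rVert^2 \right\rangle_{w_n|k}$$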
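The two M-step updates amount to one weighted least-squares solve followed by a weighted average of expected squared distances. A minimal NumPy sketch follows, under these assumptions: the E-step quantities are stored as dense arrays with the shapes given in the docstring, $\Phi^T G \Phi$ is invertible (no regularization is applied), and the expectation in the $\sigma$ update is expanded as $\lVert \langle w_n \rangle_k - W\phi(x_k) \rVert^2 + \mathrm{Tr}\, \Sigma_{w_n|k}$. The function name and array layout are illustrative, not taken from the slides.

```python
import numpy as np

def m_step(Phi, q, u_mean, w_mean, Sigma):
    """One M-step update of W and sigma.

    Phi    : (K, M) basis matrix, Phi[k, j] = phi_j(x_k)
    q      : (N, K) responsibilities q(z_n = k)
    u_mean : (N, K) posterior means <u_n>_k
    w_mean : (N, K, D) posterior means <w_n>_k
    Sigma  : (N, K, D, D) posterior covariances of w_n given k
    """
    N, K, D = w_mean.shape
    R = q * u_mean                                # weights q(z_n = k) <u_n>_k
    G = np.diag(R.sum(axis=0))                    # (K, K) diagonal matrix
    A = np.einsum("nk,nkd->kd", R, w_mean)        # (K, D)
    # Solve Phi^T G Phi W^T = Phi^T A for W^T, which has shape (M, D).
    W_T = np.linalg.solve(Phi.T @ G @ Phi, Phi.T @ A)
    centres = Phi @ W_T                           # row k is W phi(x_k)
    # Expected squared distance: squared distance of the mean plus the trace term.
    sq = (np.sum((w_mean - centres[None]) ** 2, axis=2)
          + np.trace(Sigma, axis1=2, axis2=3))
    inv_sigma = np.sum(R * sq) / (N * D)
    return W_T.T, 1.0 / inv_sigma
```

In this sketch the new $W$ is used when re-estimating $\sigma$; whether the original algorithm uses the old or the updated $W$ at that point is not stated in the slides.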

Outlier detection criterion

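To detect outliers, the posterior expectation of $u$ is of interest:

$$e \equiv \sum_k q(k)\, \frac{\nu_k + D}{\nu_k + \sigma \left(\lVert \langle w \rangle_k - W\phi(x_k) \rVert^2 + \mathrm{Tr}\, \Sigma_{w|k}\right)} \tag{3}$$

A data point is considered an outlier that is not due to measurement errors if its $e$ value is sufficiently small or, equivalently, if the value

$$v \equiv \sum_k q(k)\, \sigma \left(\lVert \langle w \rangle_k - W\phi(x_k) \rVert^2 + \mathrm{Tr}\, \Sigma_{w|k}\right) \tag{4}$$

is sufficiently large.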
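Both scores are responsibility-weighted sums over components, so they can be computed in a few lines once the E-step quantities for a data point are available. The sketch below assumes those quantities are already stored as arrays. The function name and argument layout are hypothetical, and no threshold is applied because the slides do not give one.

```python
import numpy as np

def outlier_scores(q, w_mean, Sigma, centres, sigma, nu):
    """Scores e (Eq. 3) and v (Eq. 4) for a single data point.

    q       : (K,) responsibilities q(k)
    w_mean  : (K, D) posterior means <w>_k
    Sigma   : (K, D, D) posterior covariances of w given k
    centres : (K, D) array whose row k stands in for W phi(x_k)
    sigma   : scalar inverse variance
    nu      : (K,) degrees of freedom nu_k
    """
    D = w_mean.shape[1]
    # Per-component term sigma * (||<w>_k - W phi(x_k)||^2 + Tr Sigma_{w|k})
    C = sigma * (np.sum((w_mean - centres) ** 2, axis=1)
                 + np.trace(Sigma, axis1=1, axis2=2))
    e = np.sum(q * (nu + D) / (nu + C))   # small e suggests an outlier
    v = np.sum(q * C)                     # large v suggests an outlier
    return e, v
```

Ranking points by `v` in descending order, or by `e` in ascending order, gives a candidate list of outliers to inspect.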

Illustrative experiments on synthetic data

Simulation results on synthetic data

[Figure panels: 3D plot with inferred outliers by the proposed algorithm; 2D plot with inferred outliers by the proposed algorithm]

Figure: Synthetic data sets with cluster structure and outliers.

Visualising high-redshift quasars

Visualizing the high-redshift quasars

Figure: The 3D plot of the proposed algorithm on the subset of SDSS quasar catalogue.