A Deterministic Analysis of Noisy Sparse Subspace Clustering for Dimensionality-reduced Data
Yining Wang, Yu-Xiang Wang and Aarti Singh
Carnegie Mellon University, Machine Learning Department

Subspace clustering: clustering data points into a union of low-dimensional subspaces

Sparse Subspace Clustering (SSC, Elhamifar & Vidal 2007): state-of-the-art subspace clustering algorithm based on $\ell_1$ self-expression

Step 1. Instance-level $\ell_1$ self-regression:
$$c_i = \arg\min_{c \in \mathbb{R}^{N-1}} \|x_i - X_{-i} c\|_2^2 + \lambda \|c\|_1$$

Step 2. Build similarity graph $G \in \mathbb{R}^{N \times N}$ by taking $G_{ij} = |c_{ij}| + |c_{ji}|$
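Steps 1 and 2 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `lasso_ista` and `ssc_affinity` are hypothetical, the Lasso problem is solved with plain proximal gradient (ISTA) as an assumed solver, data points are assumed to be the columns of `X`, and the toy data (two orthogonal 2-dimensional subspaces of $\mathbb{R}^4$) is made up for the example.

```python
import numpy as np

def lasso_ista(A, b, lam, n_iter=500):
    """Minimize ||b - A c||_2^2 + lam * ||c||_1 by proximal gradient (ISTA)."""
    c = np.zeros(A.shape[1])
    # The gradient of ||b - A c||^2 is Lipschitz with constant 2 * sigma_max(A)^2.
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ c - b)
        z = c - step * grad
        # Soft-thresholding is the proximal operator of the l1 penalty.
        c = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return c

def ssc_affinity(X, lam):
    """X is d x N with one data point per column; returns the N x N graph G."""
    N = X.shape[1]
    C = np.zeros((N, N))
    for i in range(N):
        others = np.delete(np.arange(N), i)   # indices of X_{-i}
        C[others, i] = lasso_ista(X[:, others], X[:, i], lam)
    return np.abs(C) + np.abs(C).T            # G_ij = |c_ij| + |c_ji|

# Toy data (assumption): 5 unit vectors in span(e1, e2) and 5 in span(e3, e4).
angles = 0.3 + 0.5 * np.arange(5)
block = np.vstack([np.cos(angles), np.sin(angles)])
X = np.zeros((4, 10))
X[:2, :5] = block
X[2:, 5:] = block
G = ssc_affinity(X, lam=0.1)
```

Because the two toy subspaces are orthogonal, the off-diagonal blocks of `G` stay exactly zero here; for general subspaces that is precisely what the analysis has to establish.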

Deterministic analysis of Noisy Sparse Subspace Clustering under dimension reduction

Subspace incoherence: for subspace $S_\ell$ define
$$\mu_\ell = \max_{x \in X \setminus X^{(\ell)}} \left\| V^{(\ell)\top} x \right\|_\infty,$$
where $V^{(\ell)} = \{\mathrm{normalize}(P_{S_\ell}[v(x_i^{(\ell)})])\}$ and $v(x)$ is the optimal solution to the dual problem
$$\max_{\nu \in \mathbb{R}^d} \ \langle \nu, x \rangle - \tfrac{\lambda}{2} \|\nu\|_2^2 \quad \text{s.t.} \quad \|X^\top \nu\|_\infty \le 1.$$

Mathematically: given $x_1, \cdots, x_N \in \mathbb{R}^d$, find linear subspaces $S_1, \cdots, S_L$ of dimension $r \ll d$ such that each $x_i$ approximately lies in some $S_k$.

Applications: motion segmentation

Inradius: $\rho_\ell$, characterizing the inner-subspace data distribution

Step 3. Spectral clustering on similarity graph $G$

Question: will SSC still succeed if the ambient data dimension $d$ is reduced to $p \ll d$ by linear dimensionality reduction, $\tilde{X} = \Psi X$ with $\Psi \in \mathbb{R}^{p \times d}$?

Motivation: computational efficiency, compressed measurement, missing data, data privacy, etc.
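Step 3 can be illustrated with a generic normalized spectral clustering routine. The sketch below is an assumption about one common variant (symmetric normalized Laplacian, row-normalized eigenvectors, a small k-means with farthest-point initialization); the poster does not say which variant the authors use, and the name `spectral_clustering` and the toy graph are hypothetical.

```python
import numpy as np

def spectral_clustering(G, n_clusters, n_iter=50):
    """Cluster the nodes of a symmetric nonnegative affinity matrix G."""
    deg = G.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    # Symmetric normalized Laplacian: I - D^{-1/2} G D^{-1/2}.
    L_sym = np.eye(len(G)) - d_inv_sqrt[:, None] * G * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)            # eigenvalues in ascending order
    U = vecs[:, :n_clusters]                   # eigenvectors of the smallest ones
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)

    # Farthest-point initialization keeps this small k-means deterministic.
    centers = [U[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)

    for _ in range(n_iter):
        sq_dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq_dist, axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = U[labels == k].mean(axis=0)
    return labels

# Toy graph (assumption): two fully connected groups with no edges between them.
G = np.zeros((10, 10))
G[:5, :5] = 1.0
G[5:, 5:] = 1.0
np.fill_diagonal(G, 0.0)
labels = spectral_clustering(G, n_clusters=2)
```

When `G` has no edges between groups, as in this toy graph, the embedded points of each group coincide and the k-means step separates them directly.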

... and many more: face clustering, network hop counting, social graph mining, recommendation systems ...

Method: Gaussian projection, Fast Johnson-Lindenstrauss transform (FJLT), uniform row sampling, sketching, etc.
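Two of the listed methods are simple enough to write down as explicit matrices $\Psi$. The sketch below is illustrative only: the function names are hypothetical, the scalings ($N(0, 1/p)$ entries for the Gaussian projection, a factor $\sqrt{d/p}$ for row sampling) are the usual choices that preserve squared norms in expectation and are assumptions rather than details taken from the poster, and the data is random.

```python
import numpy as np

def gaussian_projection(p, d, rng):
    """Dense p x d matrix with i.i.d. N(0, 1/p) entries."""
    return rng.standard_normal((p, d)) / np.sqrt(p)

def uniform_row_sampling(p, d, rng):
    """Keep p of the d coordinates, chosen uniformly without replacement."""
    rows = rng.choice(d, size=p, replace=False)
    Psi = np.zeros((p, d))
    Psi[np.arange(p), rows] = np.sqrt(d / p)   # rescale to preserve E||Psi x||^2
    return Psi

rng = np.random.default_rng(0)
d, p, N = 50, 10, 8
X = rng.standard_normal((d, N))                # hypothetical data, one point per column
X_gauss = gaussian_projection(p, d, rng) @ X   # p x N compressed data
X_rows = uniform_row_sampling(p, d, rng) @ X   # p x N subsampled data
```

Row sampling never forms a dense product in practice (it is just indexing), which is why it is attractive for missing-data settings; the explicit matrix is used here only to match the form $\Psi X$.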

Property: subspace embedding property
$$\Pr\left[\forall x \in S,\ \|\Psi x\|_2^2 \in (1 \pm \epsilon)\,\|x\|_2^2\right] \ge 1 - \delta$$
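For a fixed $\Psi$ and a fixed subspace, the smallest $\epsilon$ satisfying the property can be computed exactly: if $U$ is an orthonormal basis of $S$, every $x = U\alpha$ has $\|x\|_2 = \|\alpha\|_2$, so $\|\Psi x\|_2^2 / \|x\|_2^2$ ranges between the squared extreme singular values of $\Psi U$. The sketch below checks this for one Gaussian projection; the function name `embedding_distortion` and the dimensions are hypothetical choices for illustration.

```python
import numpy as np

def embedding_distortion(Psi, U):
    """Smallest eps such that ||Psi x||^2 is within (1 +/- eps) ||x||^2 on span(U).

    U must have orthonormal columns spanning the subspace.
    """
    s = np.linalg.svd(Psi @ U, compute_uv=False)
    return max(abs(s.max() ** 2 - 1.0), abs(s.min() ** 2 - 1.0))

rng = np.random.default_rng(0)
d, p, r = 1000, 400, 5                               # ambient, reduced, subspace dims
U, _ = np.linalg.qr(rng.standard_normal((d, r)))     # orthonormal basis of a random S
Psi = rng.standard_normal((p, d)) / np.sqrt(p)       # Gaussian projection
eps = embedding_distortion(Psi, U)

# Any vector in the subspace must respect the computed distortion.
x = U @ rng.standard_normal(r)
ratio = np.linalg.norm(Psi @ x) ** 2 / np.linalg.norm(x) ** 2
assert 1.0 - eps - 1e-9 <= ratio <= 1.0 + eps + 1e-9
```

Repeating the draw of `Psi` many times and recording how often `eps` exceeds a target value gives an empirical estimate of the failure probability $\delta$.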

No false connection: $(x_i, x_j) \in E(G) \implies x_i, x_j$ belong to the same cluster (subspace).
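Given ground-truth cluster labels, the no-false-connection criterion is a direct check on the similarity graph. The following is a minimal sketch; the function name `has_false_connection` and the tolerance parameter `tol` (useful when a numerical solver leaves tiny nonzero weights) are assumptions for illustration.

```python
import numpy as np

def has_false_connection(G, labels, tol=0.0):
    """Return True if some edge of G joins points with different labels."""
    G = np.asarray(G, dtype=float)
    labels = np.asarray(labels)
    different = labels[:, None] != labels[None, :]   # pairs from different clusters
    return bool(np.any(np.abs(G[different]) > tol))

labels = [0, 0, 1]
G_ok = np.array([[0.0, 0.5, 0.0],
                 [0.5, 0.0, 0.0],
                 [0.0, 0.0, 0.0]])
G_bad = G_ok.copy()
G_bad[0, 2] = G_bad[2, 0] = 0.1                      # edge across clusters

print(has_false_connection(G_ok, labels))
print(has_false_connection(G_bad, labels))
```

Note that the criterion says nothing about whether points of the same cluster are actually connected; it only rules out edges across clusters.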

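Main Theorem

Let $\eta$ be the level of adversarial noise, $\epsilon$ be the parameter in the subspace embedding property, and $\Delta = \min_\ell (\rho_\ell - \mu_\ell)$ be the geometric gap. Then $G$ has no false connections with high probability if $\epsilon$ is at most a threshold depending on $\Delta$, $\rho$, $\lambda$ and $\eta$.

[The exact bound is garbled in the extracted poster text. Legible fragments: a minimum over three terms with denominators $3$, $4(\cdot + \rho)$ and $8$, the latter two with numerators $\Delta$ and $\lambda$; a constant $c$; and noise terms involving $5\eta$, $\rho$ and $3\eta$.]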