A Deterministic Analysis of Noisy Sparse Subspace Clustering for Dimensionality-reduced Data Yining Wang, Yu-Xiang Wang and Aarti Singh Carnegie Mellon University, Machine Learning Department
Subspace clustering: clustering data points into union of low-dimensional subspaces
Sparse Subspace Clustering (SSC, Elhamifar & Vidal 2007): state-of-the-art subspace clustering algorithm based on โ1 self-expression Step 1. Instance-level โ1 self-regression 2 ๐๐ = argmin๐โ๐
๐โ1 ๐ฅ๐ โ ๐๐โ๐ 2 + ๐ ๐ 1 ๐ร๐ Step 2. Build similarity graph ๐บ โ ๐
by taking ๐บ๐๐ = ๐๐๐ + ๐๐๐
Deterministic analysis of Noisy Sparse Subspace Clustering under dimension reduction Subspace incoherence: for subspace ๐โ define ๐ โ ๐โ = max(โ) ๐ ๐ฅ โ โ {normalize(๐๐_โ [๐ฃ(๐ฅ๐ ๐ฅโ๐\X
โ
where ๐ = )])} and ๐ฃ(๐ฅ) is the optimal solution to dual problem 2 ๐ max๐ ๐, ๐ฅ + 0.5๐ ๐ 2 , ๐ . ๐ก. ๐ ๐ โ โค 1 ๐โ๐
Mathematically: given ๐ฅ1 , โฏ , ๐ฅ๐ โ ๐ ๐
, find linear subspaces ๐1 , โฏ , ๐๐ฟ of dimension ๐ โช ๐ such that each ๐ฅ๐ approximately lies in some ๐๐ Applications: motion segmentation
Inradius: ๐โ characterizing inner-subspace data distribution
Step 3. Spectral clustering on similarity graph ๐บ Question: will SSC still succeed if the ambient data dimension ๐
is reduced to ๐ โช ๐
by linear dimensionality reduction? ๐ร๐
๐ฟ = ๐ฟ๐, ๐ฟโ๐น Motivation: computational efficiency, compressed measurement, missing data, data privacy, etc.
โฆ and many more: face clustering, network hop counting, social graph mining, recommendation systems โฆ
Method: Gaussian projection, Fast JohnsonLindenstrauss transform (FJLT), uniform row sampling, sketching, etc.
Property: subspace embedding property 2 2 Pr โ๐ โ ๐บ, ฮจ๐ฅ 2 โ 1 ยฑ ๐ ๐ฅ 2 โฅ 1 โ ๐ฟ
No false connection: ๐ฅ๐ , ๐ฅ๐ โ ๐ธ ๐บ โน ๐ฅ๐ , ๐ฅ๐ belong to the same cluster (subspace).
Main Theorem Let ๐ผ be the level of adversarial noise, ๐ be the parameter in subspace embedding property and ๐ซ = ๐ฆ๐ข๐งโ (๐โ โ ๐โ ) be the geometric gap. Then ๐ฎ has no false connections with high probability if ๐ โค ๐ฆ๐ข๐ง
๐ ๐ซ ๐ , , ๐ ๐(๐+๐) ๐
๐๐
๐ ๐ซ
โ
๐๐ผ๐ ๐
โ ๐๐ผ