Streaming Sparse Principal Component Analysis
Wenzhuo Yang, Huan Xu
National University of Singapore
Introduction

Standard principal component analysis (PCA):
• Perform the spectral decomposition of the sample covariance matrix.
• Select the eigenvectors corresponding to the largest eigenvalues.

Weaknesses of PCA:
• The output may lack interpretability.
• In the high-dimensional regime where p ≫ n, PCA is not consistent.
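The two steps above can be sketched in a few lines of NumPy; this is a generic illustration of standard PCA, not the authors' method:

```python
import numpy as np

def standard_pca(X, k):
    """Standard PCA: spectral decomposition of the sample covariance matrix.

    X: (n, p) array of n samples in p dimensions; k: number of PCs to keep.
    Returns the (p, k) matrix of eigenvectors of the k largest eigenvalues.
    """
    Xc = X - X.mean(axis=0)               # center the data
    S = Xc.T @ Xc / X.shape[0]            # (p, p) sample covariance: O(p^2) storage
    w, V = np.linalg.eigh(S)              # spectral decomposition: O(p^3) time
    return V[:, np.argsort(w)[::-1][:k]]  # eigenvectors of the top-k eigenvalues
```

The explicit p×p covariance and its eigendecomposition are exactly the storage and computation bottlenecks that motivate the streaming approach later in the talk.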
ICML 2015
Introduction

To address these issues, previous works focus on sparse PCA, in which only a few attributes of the resulting PCs are non-zero:
• A regression-type formulation based on the elastic net (Zou et al., 2006).
• A convex semidefinite program formulation (d'Aspremont et al., 2007).
• The TPower method and the iterative thresholding method (Yuan & Zhang, 2013; Ma, 2013).
• The Fantope projection and selection (FPS) method (Vu et al., 2013).
Introduction

The difficulty of sparse PCA: these methods are hard to apply to large-scale data.
• They either explicitly compute the sample covariance matrix or store all the samples: O(p·min{n, p}) storage.
• The computational cost may become prohibitive when the dimensionality is high: O(p³) operations.

For non-sparse PCA, streaming methods exist, but each has limitations:
• Online PCA (Warmuth & Kuzmin, 2008): O(p²) storage
• Incremental PCA (Brand, 2002): no theoretical guarantees
• Stochastic power method (Arora et al., 2012): no theoretical guarantees
• Streaming PCA (Mitliagkas et al., 2013): inconsistent when p ≫ n
Introduction

For sparse PCA, e.g., online sparse PCA based on the online learning algorithm for sparse coding (Mairal et al., 2010):
• Memory complexity: O(pk)
• Computational complexity: high (the elastic net must be solved in each iteration)

How to design a computation- and memory-efficient sparse PCA method remains unsolved…
Introduction

Another important issue: the sub-Gaussianity assumption.
• Many sparse PCA methods are theoretically analyzed under the spike model, e.g., Amini & Wainwright, 2009; Vu et al., 2013; Yuan & Zhang, 2013; Vu & Lei, 2012; Shen et al., 2013; Mitliagkas et al., 2013; Cai et al., 2014.
• The spike model requires sub-Gaussian data and noise, and hence cannot model heavy-tailed distributions.

To relax this assumption:
• The semiparametric transelliptical and elliptical families are used to model data (Han & Liu, 2013): transelliptical component analysis (TCA) and elliptical component analysis (ECA).
• These require O(p³) computation and O(p²) memory.
Introduction

Our contributions:
• We propose two variants of sparse PCA:
  – Spike model: streaming sparse PCA
  – Elliptical model: streaming sparse ECA
• Our theoretical analysis shows that both algorithms have:
  – Memory complexity: O(pk)
  – Computational complexity: O(npk log p)
  – Sample complexity: Θ(s log p)
Problem Setup

Streaming data model:
• One receives sample x_t at time t, and x_t vanishes after it is collected unless it is stored in memory.
Spike model:
• Sample x_t is generated according to x_t = A z_t + e_t, where
  – z_t is a sample of the standard Gaussian N(0, I_k);
  – e_t is a sample of the Gaussian N(0, σ² I_p);
  – A ∈ R^{p×k} is a deterministic but unknown matrix.
• The covariance matrix is Σ = A Aᵀ + σ² I_p.
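A minimal sampler for the spike model, for illustration only (the loading matrix A and noise level sigma are caller-supplied assumptions):

```python
import numpy as np

def spike_sample(A, sigma, rng):
    """Draw one sample x_t = A z_t + e_t from the spike model.

    A: (p, k) deterministic loading matrix, z_t ~ N(0, I_k),
    e_t ~ N(0, sigma^2 I_p), so the population covariance is
    Sigma = A A^T + sigma^2 I_p.
    """
    p, k = A.shape
    z = rng.standard_normal(k)          # z_t ~ N(0, I_k)
    e = sigma * rng.standard_normal(p)  # e_t ~ N(0, sigma^2 I_p)
    return A @ z + e
```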
Problem Setup

Elliptical model:
• Sample x_t is generated according to x_t = μ + ξ_t A u_t, where
  – u_t is a sample of a uniform random vector on the unit sphere;
  – ξ_t is a sample of a scalar random variable with unknown distribution;
  – A ∈ R^{p×q} is a deterministic matrix satisfying A Aᵀ = Σ.

The sparse setting:
• The projection matrix Π = U_s U_sᵀ satisfies ‖diag(Π)‖₀ ≤ s.
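A sampler sketch for the elliptical model; the Student-t radius below is an illustrative assumption, since the model allows ξ_t to have an arbitrary, possibly heavy-tailed, distribution:

```python
import numpy as np

def elliptical_sample(mu, A, rng):
    """Draw one sample x_t = mu + xi_t * A u_t under the elliptical model."""
    u = rng.standard_normal(A.shape[1])
    u /= np.linalg.norm(u)       # u_t: uniform direction on the unit sphere
    xi = rng.standard_t(df=3)    # xi_t: heavy-tailed scalar (illustrative choice)
    return mu + xi * (A @ u)
```

Because ξ_t can be heavy-tailed, samples from this model need not be sub-Gaussian, which is exactly the regime where the spike-model analysis breaks down.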
Algorithm

Basic idea:
• Block-wise stochastic power method: update the estimated PCs once a block of samples is received.
• The "row truncation" operator: maintain the sparsity of the estimated PCs.
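The row-truncation operator can be sketched as follows: keep the gamma rows of the iterate with the largest ℓ2 norms and zero out the rest (gamma is the truncation parameter, chosen at least as large as the sparsity s):

```python
import numpy as np

def row_truncate(S, gamma):
    """Keep the gamma rows of S with the largest l2 norms; zero the rest.

    Zeroing whole rows keeps the same small set of attributes active across
    all k estimated components simultaneously (row sparsity).
    """
    norms = np.linalg.norm(S, axis=1)       # per-row l2 norm: O(pk)
    keep = np.argsort(norms)[::-1][:gamma]  # top-gamma rows: O(p log p)
    out = np.zeros_like(S)
    out[keep] = S[keep]
    return out
```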
Algorithm

Streaming sparse PCA (for the spike model):
• Block size: B = Θ(s log p).
• Block update: S_{τ+1} = F_{τ+1} Q_τ, where F_{τ+1} is the empirical covariance of block τ+1.
• Per-block costs: O(pkB) for the block update; O(pk + p log p) for row truncation; O(pk²) for orthonormalization.
• Overall computational complexity: O(npk log p).
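Putting the pieces together, a minimal sketch of the block-wise loop, assuming a random orthonormal initialization and the block structure above (the paper's Algorithm 2 differs in its initialization and exact constants):

```python
import numpy as np

def streaming_sparse_pca(stream, p, k, gamma, B, T, rng):
    """Block-wise stochastic power method with row truncation (sketch).

    stream: iterator yielding p-dimensional samples; B: block size; T: blocks.
    Only the (p, k) iterate Q and a (p, k) accumulator are kept: O(pk) memory.
    """
    Q = np.linalg.qr(rng.standard_normal((p, k)))[0]  # random orthonormal init
    for _ in range(T):
        S = np.zeros((p, k))
        for _ in range(B):                   # one block: accumulate F_block @ Q
            x = next(stream)                 # the sample vanishes after this step
            S += np.outer(x, x @ Q) / B      # rank-1 update: O(pk) per sample
        norms = np.linalg.norm(S, axis=1)    # row truncation: keep top-gamma rows
        S[np.argsort(norms)[:p - gamma]] = 0.0
        Q, _ = np.linalg.qr(S)               # re-orthonormalize: O(pk^2)
    return Q
```

Note that the full covariance F_{τ+1} is never formed: F_{τ+1} Q_τ is accumulated one rank-1 term at a time, which is what keeps the memory at O(pk).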
Algorithm

The iterative deflation method handles, e.g., the case where the leading k PCs are all sparse but their supports are nearly disjoint.
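The deflation step itself is standard and can be sketched as follows (a generic illustration, not the paper's exact procedure): after a set of components has been extracted, project it out of every incoming sample before estimating the next component.

```python
import numpy as np

def deflate(x, Q_found):
    """Remove already-extracted components from a new sample.

    Q_found: (p, j) orthonormal basis of the PCs found so far; the next PC is
    estimated on the deflated stream x - Q_found Q_found^T x.
    """
    return x - Q_found @ (Q_found.T @ x)
```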
Algorithm

Advantages compared with streaming PCA and TPower:
1. Streaming sparse PCA is consistent in the high-dimensional regime, where streaming PCA is inconsistent.
2. TPower requires O(p·min{n, p}) storage, but our method only requires O(pk) storage.
3. When the leading k PCs are row sparse, our method can extract them simultaneously, but TPower can only extract them one by one.
Algorithm

For elliptically distributed data, ECA (Han & Liu, 2013) utilizes the multivariate Kendall's tau statistic K. The eigenspace of K is identical to that of Σ.

Consider the following estimator of K: the empirical covariance matrix of the samples

y_i = (x_{2i−1} − x_{2i}) / ‖x_{2i−1} − x_{2i}‖₂.
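The estimator above can be sketched as: pair up consecutive samples, project their differences onto the unit sphere, and take the empirical covariance of the normalized differences (an illustrative batch version; the sample count is assumed even):

```python
import numpy as np

def kendall_tau_estimate(X):
    """Multivariate Kendall's tau estimate from paired samples.

    X: (2m, p) array; y_i = (x_{2i-1} - x_{2i}) / ||x_{2i-1} - x_{2i}||_2.
    Returns the (p, p) empirical covariance of the y_i, whose eigenspace
    matches that of Sigma under the elliptical model.
    """
    D = X[0::2] - X[1::2]                             # (m, p) paired differences
    Y = D / np.linalg.norm(D, axis=1, keepdims=True)  # project onto unit sphere
    return Y.T @ Y / Y.shape[0]
```

Normalizing each difference discards the heavy-tailed radius ξ, which is why this statistic remains well-behaved where the sample covariance does not.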
Algorithm

Streaming sparse ECA (for the elliptical model): identical to streaming sparse PCA, except that each incoming pair of raw samples is first converted into a normalized-difference sample y_t, which the algorithm then processes in place of x_t.
Performance Guarantees

The main theorem for streaming sparse PCA:

Theorem 1 (informal; exact constants are in the paper): Fix parameters δ > 0, 0 < ρ < 1, and γ ≥ s, and define the gap quantity η ≜ ((k+1)λ_{k+1} + 2δλ_k)/λ_k together with a truncation factor f(s, γ, δ). Suppose the initial solution Q_0 is "good", i.e., ‖U_{s,⊥}ᵀ Q_0‖₂² is bounded away from 1 in terms of η, ρ and (k+1)f(s, γ, δ). Then, as long as the iteration number T and the block size B are large enough, with T scaling as log(1/ε) and B scaling as (λ₁²/(ε²λ_k²))·(s + 2γ log p + log T), with probability at least 1 − p^{−10} the output Q_T of Algorithm 2 satisfies ‖U_{s,⊥}ᵀ Q_T‖₂ ≤ ε.
Theoretical Guarantees

Remarks on Theorem 1:
1) The algorithm succeeds as long as λ_k > (k+1)λ_{k+1}, since then there exists a δ such that η < 1.
2) A smaller η leads to faster convergence and fewer required samples.
3) A more accurate initial solution is required when η is larger.
4) The algorithm can succeed with block size B = Θ(s log p + log T) when s ≤ γ ≤ 2s.
Theoretical Guarantees

The main theorem for streaming sparse ECA:

Theorem 2 (informal): Under the elliptical model, the same guarantee holds with the spectrum of Σ replaced by that of the multivariate Kendall's tau matrix K. For parameters δ > 0, 0 < ρ < 1, and γ ≥ s, with η and f(s, γ, δ) defined as in Theorem 1 via the eigenvalues λ_i(K), a "good" initial solution Q_0, a large enough iteration number T, and a block size B scaling as ((1 + λ₁(K))²/(ε² λ_k(K)²))·(s + 2γ log p + log T), with probability at least 1 − p^{−10} the output Q_T of Algorithm 2 satisfies ‖U_{s,⊥}ᵀ Q_T‖₂ ≤ ε.
Experimental Results Comparison between streaming sparse PCA, streaming PCA, FPS and online sparse PCA:
The samples are generated under the spike model.
Experimental Results Comparison between streaming sparse PCA and streaming PCA:
The samples are generated under the spike model.
Experimental Results Comparison between ECA, streaming sparse ECA, streaming sparse PCA and streaming PCA:
ξ follows (Left) the chi distribution and (Right) the F distribution.
Experimental Results

Real-world datasets: (Left) NIPS dataset and (Right) NYTimes dataset. Parameters B and γ in streaming sparse PCA are set to 300 and 500, respectively.
Large scale sparse PCA (Zhang & El Ghaoui, 2011)