One-Bit Principal Subspace Estimation
Yuejie Chi, Departments of ECE and BMI, The Ohio State University. GlobalSIP 2014, Atlanta, GA, December 8, 2014.
Page 1
Acknowledgement ● Part of this work was accomplished during the Air Force Summer Faculty Fellowship Program; stimulating discussions with Drs. Lee Seversky, Lauren Huie, and Matt Berger are greatly appreciated. ● Our research is supported by a Google Faculty Research Award, NSF CCF-1422966, and the AFRL Summer Extension Grant.
Page 2
Signals Lying in a Low-Dimensional Subspace ● The signal of interest in many problems can be modeled as lying in a low-dimensional subspace parameterized by some unknown parameters.
● Examples:
– multi-channel data in MRI and EEG
– DOA estimation in sensor array processing
– spatial-temporal observations from large-scale networks
● We are interested in estimating the covariance matrix or the principal components (the top eigenvectors of the covariance matrix) of the data.
Page 3
High-Dimensional Streaming Data Acquisition
Data generation at an unprecedented rate: data samples are
● distributed across multiple locations;
● generated online on the fly and can only be accessed once.
Limited processing power at sensor platforms:
● storage-limited: cannot store the whole data set;
● power-hungry: minimize the number of observations.
Page 4
Covariance Sketching Observation: the covariance structure can be recovered without measuring the whole data stream.
Proposed Approach: distributed online data sketching and aggregation to recover the covariance structure or principal components.
● access each data sample via linear or quadratic (energy) sketches;
● aggregate the sketches into linear observations of the covariance matrix.
Page 5
Quadratic Sketching for Covariance Estimation
Consider a data stream $\{x_t\}$, possibly observed in a distributed fashion across $m$ sensors.
Quadratic Sketching: For each sensor $i = 1, \ldots, m$:
● randomly select a sketching vector $a_i \in \mathbb{R}^n$ with i.i.d. sub-Gaussian entries;
● sketch an arbitrary substream indexed by $\{\ell^i_t\}_{t=1}^{T}$ with energy measurements $|\langle a_i, x_{\ell^i_t}\rangle|^2$, and aggregate them into the average energy measurement
$$ y_{i,T} = \frac{1}{T}\sum_{t=1}^{T} \left|\langle a_i, x_{\ell^i_t}\rangle\right|^2 = \frac{T-1}{T}\, y_{i,T-1} + \frac{1}{T} \left|\langle a_i, x_{\ell^i_T}\rangle\right|^2 . $$
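A minimal numpy sketch of this running-average update, using a synthetic white-noise substream purely for illustration (the stream model, seed, and sizes are assumptions, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 8, 500
a = rng.standard_normal(n)                     # sketching vector a_i

y = 0.0                                        # running average y_{i,0}
for t in range(1, T + 1):
    x = rng.standard_normal(n)                 # next substream sample x_{ell_t}
    y = (t - 1) / t * y + abs(a @ x) ** 2 / t  # y_{i,t} = ((t-1)/t) y_{i,t-1} + |<a_i,x>|^2 / t

# for this white stream (Sigma = I), y approximates a^T Sigma a = ||a||^2
print(y, a @ a)
```

Each sensor thus keeps a single scalar, updated in O(n) time per sample.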
Page 6
Covariance Estimation with Rank-One Measurements
● Quadratic Measurement Model:
$$ y_{i,T} = a_i^H \Sigma_T a_i := a_i^H \Sigma a_i + \eta_i , $$
where $\eta_i = a_i^H (\Sigma_T - \Sigma) a_i$ is the additive noise.
● More generally, we assume the following measurement model:
$$ z_i = a_i^H \Sigma a_i + \eta_i , \quad i = 1, \ldots, m ; $$
or more concisely, $z = \mathcal{A}(\Sigma) + \eta$, with the noise $\eta = [\eta_1, \ldots, \eta_m]$ bounded deterministically as $\|\eta\|_1 \le \epsilon$.
– The measurements are quadratic in $a_i$ but linear in the rank-one matrix $a_i a_i^H$;
– we can solve for $\Sigma$ via least-squares estimation if $m \ge n^2$ (the size of $\Sigma$), as in the sketch below;
– the number of measurements $m$ can be further reduced by exploiting the low-dimensional structure of $\Sigma$.
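To make the linearity concrete, here is a small numpy demonstration (sizes and random data are illustrative assumptions) that each measurement equals $\langle a_i a_i^T, \Sigma\rangle$ and that plain least squares recovers $\Sigma$ once $m \ge n^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
m = n * n                                   # m >= n^2 permits least-squares recovery
A = rng.standard_normal((m, n))             # sketching vectors a_i as rows

G = rng.standard_normal((n, n))
Sigma = G @ G.T                             # a generic PSD covariance
z = np.einsum('mi,ij,mj->m', A, Sigma, A)   # z_i = a_i^T Sigma a_i (noiseless)

# each z_i is linear in Sigma: z_i = <vec(a_i a_i^T), vec(Sigma)>
M = np.stack([np.outer(a, a).ravel() for a in A])
# lstsq returns the minimum-norm solution; since the null space of M consists
# of antisymmetric matrices, this is exactly the symmetric Sigma
vec_ls, *_ = np.linalg.lstsq(M, z, rcond=None)
print(np.linalg.norm(vec_ls.reshape(n, n) - Sigma))   # ~0: Sigma recovered
```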
Page 7
Other Applications of Quadratic Sensing
● Energy measurements are often more reliable in high-frequency applications, e.g., for estimating the power spectral density.
● Quadratic measurements arise in practical applications such as phase retrieval and phase space tomography.
Figure 1: A typical setup for structured illuminations in diffraction imaging using a phase mask.
E. J. Candès, Y. C. Eldar, T. Strohmer and V. Voroninski, "Phase retrieval via matrix completion," SIAM J. on Imaging Sciences.
L. Tian, J. Lee, S. Oh, and G. Barbastathis, "Experimental compressive phase space tomography," Opt. Express.
Page 8
Low-Rank Covariance Estimation via Convex Relaxation
● Low-Rankness: We specialize to a low-rank covariance matrix, which arises when a small number of components accounts for most of the variability in the data.
● Seek the covariance matrix satisfying the observations with the minimal rank:
$$ \hat{\Sigma} = \mathop{\mathrm{argmin}}_{\Sigma \succeq 0} \ \mathrm{rank}(\Sigma) \quad \text{s.t.} \quad \|z - \mathcal{A}(\Sigma)\|_1 \le \epsilon . $$
● However, this is non-convex and NP-hard. We therefore replace the rank by the trace, its tightest convex relaxation, and minimize over all positive semidefinite matrices compatible with the measurements:
$$ \hat{\Sigma} = \mathop{\mathrm{argmin}}_{\Sigma \succeq 0} \ \mathrm{Tr}(\Sigma) \quad \text{s.t.} \quad \|z - \mathcal{A}(\Sigma)\|_1 \le \epsilon . $$
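This semidefinite program is straightforward to prototype; below is a compact cvxpy sketch on synthetic noiseless data (the problem sizes, tolerance, and random data are illustrative assumptions):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
n, r, m, eps = 20, 2, 120, 1e-3            # m ~ nr measurements, far fewer than n^2

U = rng.standard_normal((n, r))
Sigma = U @ U.T                            # ground-truth rank-r covariance
A = rng.standard_normal((m, n))
z = np.einsum('mi,ij,mj->m', A, Sigma, A)  # quadratic sketches z_i = a_i^T Sigma a_i

S = cp.Variable((n, n), PSD=True)          # decision variable, constrained Sigma >= 0
meas = cp.hstack([cp.sum(cp.multiply(np.outer(a, a), S)) for a in A])
prob = cp.Problem(cp.Minimize(cp.trace(S)), [cp.norm(z - meas, 1) <= eps])
prob.solve()
print(np.linalg.norm(S.value - Sigma, 'fro') / np.linalg.norm(Sigma, 'fro'))
```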
Page 9
Near-Optimal Covariance Estimation
Theorem 1 (Chen, Chi and Goldsmith, 2013). Consider the sub-Gaussian sampling model. Then with probability exceeding $1 - \exp(-c_1 m)$, the solution $\hat{\Sigma}$ satisfies
$$ \|\hat{\Sigma} - \Sigma\|_F \le \underbrace{C_1 \frac{\|\Sigma - \Sigma_r\|_*}{\sqrt{r}}}_{\text{due to imperfect structure}} + \underbrace{C_2 \frac{\epsilon}{m}}_{\text{due to noise}} , $$
where $\Sigma_r$ is the best rank-$r$ approximation of $\Sigma$, provided that $m > c_0 n r$, where $c_0$, $c_1$, $C_1$ and $C_2$ are universal constants.
● Exact recovery with $\Theta(nr)$ measurements;
● Universal for all low-rank matrices;
● Robust against approximate low-rankness and bounded noise;
● Results hold for i.i.d. bilinear measurements $a_i^T \Sigma b_i$ and are extendable to other covariance structures such as Toeplitz and sparse ones.

Figure: Phase transition of exact recovery as a function of the rank ratio r/n (vertical axis) and the measurement ratio m/(n·n) (horizontal axis), with the information-theoretic limit overlaid.
Y. Chen, Y. Chi, and A. J. Goldsmith, “Exact and Stable Covariance Estimation from Quadratic Sampling via Convex Programming,” IEEE Trans. on Information Theory, Dec. 2013, in revision.
Page 10
From One Sketch to One Bit ● So far, one real-valued sketch per data instance suffices to recover the covariance matrix via SDP, with a near-optimal number of measurements. ● In networked sensing, especially in bandwidth-constrained environments, each sensor may only be allowed to transmit a single bit to the fusion center.
● Luo, Ribeiro and Giannakis have considered estimating a scalar parameter from a collection of binary measurements [Luo 2005, Ribeiro-Giannakis 2006]. ● This work: estimating the principal subspace from binary measurements.
Z.-Q. Luo, "Universal decentralized estimation in a bandwidth constrained sensor network," IEEE Trans. on Info. Theory, 2005.
A. Ribeiro and G. B. Giannakis, "Bandwidth-constrained distributed estimation for wireless sensor networks - Part I: Gaussian case," IEEE Transactions on Signal Processing, 2006.
Page 11
One-Bit PCA: Recovering Subspace From Bits
Assumption: Let $\Sigma = \mathbb{E}[x_t x_t^H] = U U^H$ be a rank-$r$ matrix with $U \in \mathbb{C}^{n \times r}$.
One-Bit PCA: For each sensor $i = 1, \ldots, m$:
● randomly select two sketching vectors $a_i, b_i \in \mathbb{C}^n$ with i.i.d. Gaussian entries;
● sketch an arbitrary substream indexed by $\{\ell^i_t\}_{t=1}^{T}$ with two energy measurements $|\langle a_i, x_{\ell^i_t}\rangle|^2$ and $|\langle b_i, x_{\ell^i_t}\rangle|^2$, and transmit a single bit indicating the outcome of the energy comparison to the fusion center:
$$ y_{i,T} = \mathrm{sign}\left( \frac{1}{T}\sum_{t=1}^{T} \left|\langle a_i, x_{\ell^i_t}\rangle\right|^2 - \frac{1}{T}\sum_{t=1}^{T} \left|\langle b_i, x_{\ell^i_t}\rangle\right|^2 \right) . $$
● Estimation: The fusion center recovers the principal components $\hat{U} \in \mathbb{C}^{n \times r}$ by computing the top $r$ eigenvectors of the surrogate matrix
$$ J_m = \frac{1}{m} \sum_{i=1}^{m} y_{i,T} \left( a_i a_i^H - b_i b_i^H \right) . $$
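The whole pipeline fits in a few lines of numpy; the sketch below uses real-valued Gaussian data and exact bits (i.e., it assumes T is large enough that every comparison matches the true covariance, as quantified on the next slide) with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, m = 40, 3, 2000

U, _ = np.linalg.qr(rng.standard_normal((n, r)))
Sigma = U @ U.T                                 # rank-r covariance Sigma = U U^H

A = rng.standard_normal((m, n))                 # sketching vectors a_i
B = rng.standard_normal((m, n))                 # sketching vectors b_i
# one bit per sensor: sign of the energy comparison <Sigma, a a^H - b b^H>
y = np.sign(np.einsum('mi,ij,mj->m', A, Sigma, A)
            - np.einsum('mi,ij,mj->m', B, Sigma, B))

# surrogate matrix J_m = (1/m) sum_i y_i (a_i a_i^H - b_i b_i^H)
J = ((A.T * y) @ A - (B.T * y) @ B) / m
w, V = np.linalg.eigh(J)                        # eigenvalues in ascending order
U_hat = V[:, -r:]                               # top-r eigenvectors

# subspace error via projectors (invariant to the rotation Q of Theorem 3)
print(np.linalg.norm(U_hat @ U_hat.T - U @ U.T, 'fro'))
```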
Page 12
Bit Comparisons are Robust
● With finite samples, the numerical energy difference measures the sample covariance $\Sigma_T$:
$$ z_{i,T} = \frac{1}{T}\sum_{t=1}^{T} \left|\langle a_i, x_{\ell^i_t}\rangle\right|^2 - \frac{1}{T}\sum_{t=1}^{T} \left|\langle b_i, x_{\ell^i_t}\rangle\right|^2 = \left\langle \Sigma_T , \, a_i a_i^H - b_i b_i^H \right\rangle . $$
The discrepancy $z_{i,T} - z_i = \langle \Sigma_T - \Sigma, \, a_i a_i^H - b_i b_i^H \rangle \neq 0$.
● The ordinal energy difference, in contrast, measures the exact covariance $\Sigma$ with high probability as soon as $T$ is not too small:
$$ y_{i,T} = \mathrm{sign}\left( \left\langle \Sigma_T , \, a_i a_i^H - b_i b_i^H \right\rangle \right) = \mathrm{sign}\left( \left\langle \Sigma , \, a_i a_i^H - b_i b_i^H \right\rangle \right) = y_i . $$
Theorem 2 (Chi 2014). Let $x_t \sim \mathcal{CN}(0, \Sigma)$ and let $0 < \delta \le 1$. Then with probability at least $1 - \delta$, all bit measurements are exact, provided that the number of samples observed by each sensor satisfies $T > c \, \frac{\mathrm{Tr}(\Sigma)}{\|\Sigma\|_F} \log\left(\frac{m}{\delta}\right)$ for some sufficiently large constant $c$.
Y. Chi, "One-Bit Principal Subspace Estimation," IEEE Global Conference on Signal and Information Processing (GlobalSIP), Atlanta, GA, Dec. 2014.
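This robustness is easy to probe empirically; the sketch below (real-valued Gaussians and the sizes are assumptions for illustration) counts how many bits computed from the sample covariance agree with the exact bits as T grows:

```python
import numpy as np

rng = np.random.default_rng(4)
n, r, m = 20, 2, 500
U = rng.standard_normal((n, r))
Sigma = U @ U.T
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))

def bits(S):
    """One bit per sensor from the energy comparison <S, a a^T - b b^T>."""
    return np.sign(np.einsum('mi,ij,mj->m', A, S, A)
                   - np.einsum('mi,ij,mj->m', B, S, B))

y_exact = bits(Sigma)
for T in (10, 100, 1000, 10000):
    X = rng.standard_normal((T, r)) @ U.T   # x_t = U g_t with g_t ~ N(0, I_r)
    y_T = bits(X.T @ X / T)                 # bits from the sample covariance Sigma_T
    print(T, np.mean(y_T == y_exact))       # fraction of matching bits -> 1
```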
Page 13
One-Bit PCA: Why does it work?
Consider a rank-one example $\Sigma = \theta \theta^H$ with eigenvector $\theta \in \mathbb{C}^n$:
● Each bit $y_i = \mathrm{sign}\left( |\langle a_i, \theta\rangle|^2 - |\langle b_i, \theta\rangle|^2 \right)$ selects the halfspace toward whichever of $a_i$ and $b_i$ makes the smaller angle with $\theta$.
● With enough bit measurements, we can trap the eigenvector $\theta$ accurately, up to a sign difference.
Page 14
One-Bit PCA: Performance Guarantee
Conditioned on the bit measurements being exact, the principal subspace of $J_m$ agrees with $U$ with high probability once $m$ is sufficiently large.
Theorem 3 (Chi 2014). Denote by $U \in \mathbb{C}^{n \times r}$ the principal subspace of $\Sigma$ and by $\hat{U}$ the principal subspace of $J_m = \frac{1}{m} \sum_{i=1}^{m} y_i \left( a_i a_i^H - b_i b_i^H \right)$. Let $0 < \delta < 1$. Then with probability at least $1 - \delta$, there exists an $r \times r$ orthogonal matrix $Q$ such that
$$ \|\hat{U} - U Q\|_F \le c_1 \sqrt{\frac{n r^2}{m} \log\left(\frac{2n}{\delta}\right)} $$
for all rank-$r$ matrices $\Sigma$, where $c_1$ is some absolute constant.
● The subspace estimate is accurate as soon as $m = \Theta(n r^2 \log n)$, which is near-optimal since the subspace requires at least $nr$ measurements.
Y. Chi, "One-Bit Principal Subspace Estimation," IEEE Global Conference on Signal and Information Processing (GlobalSIP), Atlanta, GA, Dec. 2014.
Page 15
How many bits do we need?
● We generate the covariance matrix as $\Sigma = X X^T$, where $X \in \mathbb{R}^{n \times 3}$ is composed of standard Gaussian entries. The sketching vectors $a_i$ and $b_i$ are also generated with standard Gaussian entries.
● The estimate $\hat{X}$ is obtained by computing the top eigenvectors of $J_m$.
● The error metric is the normalized MSE $\| P_{\hat{X}}^{\perp} X \|_F^2 / \| X \|_F^2$, where $P_{\hat{X}}^{\perp}$ denotes the projection onto the orthogonal complement of the range of $\hat{X}$ (see the sketch below).
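A small numpy helper for this error metric (the function name is an illustrative assumption):

```python
import numpy as np

def nmse(X_hat, X):
    """Normalized error || P_perp(X_hat) X ||_F^2 / || X ||_F^2."""
    Q, _ = np.linalg.qr(X_hat)          # orthonormal basis for range(X_hat)
    resid = X - Q @ (Q.T @ X)           # component of X in the orthogonal complement
    return np.linalg.norm(resid) ** 2 / np.linalg.norm(X) ** 2
```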
Figure: NMSE versus the number of bit measurements (0 to 4000) for n = 40, 100, and 200.
Page 16
Online DOA Estimation with One-Bit Measurements
● The estimate of the principal subspace can be updated sequentially as new bits arrive, via an incremental SVD approach [Brand 2002] (see the sketch below).
● The covariance matrix $\Sigma$ is a low-rank Toeplitz PSD matrix with $n = 40$ and $r = 3$. The set of modes is $\mathcal{F} = \{0.1, 0.7, 0.725\}$ (note that the last two modes are separated by the Rayleigh limit $1/n$), each with variance $\sigma^2 = 1$. ESPRIT is applied to estimate 5 modes using the subspace estimate.

Figure: Estimated mode locations versus the number of bit measurements (0 to 4000). About 1000 bits are sufficient to distinguish two close modes separated by the Rayleigh limit.
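A minimal sketch of the sequential update (the helper name is an assumption; for clarity, a full eigendecomposition stands in for the incremental SVD of [Brand 2002]):

```python
import numpy as np

def absorb_bit(J, a, b, y, m, r):
    """Fold the m-th bit into the running surrogate matrix and re-estimate.

    Implements J_m = ((m-1)/m) J_{m-1} + (y_m/m)(a a^T - b b^T), then returns
    the top-r eigenvectors of J_m as the current subspace estimate.
    """
    J = (m - 1) / m * J + y / m * (np.outer(a, a) - np.outer(b, b))
    _, V = np.linalg.eigh(J)            # eigenvalues in ascending order
    return J, V[:, -r:]
```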
M. Brand, “Incremental singular value decomposition of uncertain data with missing values”, ECCV 2002.
Page 17
Summary and Future Work
● Recovering low-dimensional covariance structures, rather than the data instances themselves, allows a further reduction in the sampling rate and opens up a new set of problems.
● Open issues:
– task-specific trade-offs between sample complexity, communication overhead, noise level and estimation accuracy;
– applying covariance sketching to applications such as network anomaly detection;
– relationships to one-bit compressed sensing and one-bit matrix completion.
Page 18
Related Publications
Available at www.ece.osu.edu/~chi/.
● Y. Chen, Y. Chi and A. J. Goldsmith, "Universal and Robust Covariance Estimation via Convex Programming," ISIT 2014.
● Y. Chen, Y. Chi and A. J. Goldsmith, "Estimation of Simultaneously Structured Covariance Matrices from Quadratic Measurements," ICASSP 2014.
● Y. Chen, Y. Chi, and A. J. Goldsmith, "Exact and Stable Covariance Estimation from Quadratic Sampling via Convex Programming," in revision for IEEE Trans. on Information Theory. Available at arXiv:1310.0807.
● Y. Chi, "One-Bit Principal Subspace Estimation," GlobalSIP 2014.
Thanks!
Page 19