
Largest-Eigenvalue-Theory for Incremental Principal Component Analysis

Shuicheng Yan

Xiaoou Tang

Dept. of Information Engineering The Chinese University of Hong Kong Shatin, Hong Kong SAR Email: [email protected]

Dept. of Information Engineering The Chinese University of Hong Kong Shatin, Hong Kong SAR Email: [email protected]

Abstract— In this paper, we present a novel algorithm for incremental principal component analysis. Based on the Largest-Eigenvalue-Theory, i.e., the fact that the eigenvector associated with the largest eigenvalue of a symmetric matrix can be estimated iteratively from any non-zero initial value, we propose an iterative algorithm, referred to as LET-IPCA, to incrementally update the eigenvectors corresponding to the leading eigenvalues. LET-IPCA is covariance-matrix free and seamlessly connects the estimations of the leading eigenvectors by cooperatively preserving the most dominating information, as opposed to the state-of-the-art algorithm CCIPCA, in which the estimation of each eigenvector is independent of the others. Experiments on both the MNIST digits database and the CMU PIE face database show that the proposed algorithm is superior to CCIPCA in both convergence speed and accuracy.

I. INTRODUCTION

In recent years, subspace analysis techniques [3] have gained much attention in computer vision. An image can be represented as a point in the image space; even though the dimension of the image space is normally very high, the embedded dimension is much lower. Before applying any classification technique, it is therefore beneficial to first perform dimensionality reduction and project an image into a low-dimensional representation space, for reasons of both learnability and computational efficiency. In particular, Principal Component Analysis (PCA) [7] has been applied to face recognition and many other problems with impressive results.

PCA is an unsupervised subspace learning algorithm. It aims to find the geometrical structure of a data set and project the data along the directions with maximal variances. The original PCA is batch-based and assumes that all the data are available in advance and are given together at once. However, it can no longer satisfy the demands of applications in which the data are received incrementally. Furthermore, when the dimension of the data is high, both the computation and storage complexities grow dramatically. Thus, an incremental method is highly desirable for computing an adaptive subspace for data that arrive sequentially.

Many algorithms [4] have been proposed for Incremental Principal Component Analysis (IPCA), and the state-of-the-art algorithm is Candid Covariance-free Incremental Principal Component Analysis (CCIPCA) [8].

However, though CCIPCA statistically converges, the higher-order eigenvectors converge very slowly, since their convergence relies on the previous eigenvectors having been well estimated. Moreover, in each step, when an eigenvector is updated, the dominating information removed from it should be used to update the next eigenvector; yet CCIPCA omits this information and only uses the newly arrived datum to update each eigenvector.

In this paper, based on the Largest-Eigenvalue-Theory, i.e., the fact that the eigenvector corresponding to the largest eigenvalue can be iteratively estimated, we theoretically derive a method for estimating the higher-order eigenvectors, and then propose a novel algorithm for incremental principal component analysis. This new algorithm seamlessly connects the estimations of the different eigenvectors, and in each step aims at preserving the most dominating information in the estimated leading eigenvectors. A brief review of PCA and IPCA and the detailed algorithm are presented in the following sections.

II. INCREMENTAL PRINCIPAL COMPONENT ANALYSIS

Many real problems suffer from the curse of dimensionality. One approach to cope with the excessive dimensionality of the data space is to reduce the dimensionality by linearly combining features, i.e., linear subspace methods. Linear subspace methods are particularly attractive because they are simple to compute and analytically tractable. Among them, Principal Component Analysis (PCA) is the most popular and has been widely used in the literature.

A. Principal Component Analysis

Assume a data set given as vectors x_1, x_2, ..., x_N with zero mean, where x_n ∈ R^m. Principal Component Analysis (PCA) seeks an orthogonal projection matrix W ∈ R^{m×K} (K ≪ m) that best represents the data in a least-square-error sense; the objective function of PCA is

$$W = \arg\min_{W} \sum_{n=1}^{N} \|W W^T x_n - x_n\|^2 \qquad (1)$$
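To make the objective in Eqn. (1) concrete, the following minimal sketch (in NumPy; the function and variable names are ours, not from the paper) evaluates the reconstruction error for a given projection matrix W.

```python
import numpy as np

def pca_reconstruction_error(X, W):
    """Sum of ||W W^T x_n - x_n||^2 over all samples, as in Eqn. (1).

    X : (N, m) array whose rows are the zero-mean samples x_n^T.
    W : (m, K) projection matrix with orthonormal columns.
    """
    X_hat = X @ W @ W.T  # reconstruct each sample from its K-dimensional projection
    return float(np.sum((X_hat - X) ** 2))
```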

The matrix W can be directly computed by the traditional eigenvalue decomposition approach:

$$C W_k = \lambda_k W_k, \qquad C = \sum_{n=1}^{N} x_n x_n^T \qquad (2)$$
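A minimal batch-PCA sketch following Eqn. (2): build the scatter matrix C and keep the eigenvectors of its K largest eigenvalues. This is illustrative code under our own naming, not the authors' implementation.

```python
import numpy as np

def batch_pca(X, K):
    """Top-K principal eigenvectors/eigenvalues of C = sum_n x_n x_n^T (Eqn. (2)).

    X : (N, m) array of zero-mean samples; returns W of shape (m, K) and the K eigenvalues.
    """
    C = X.T @ X                           # m x m scatter matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns ascending eigenvalues for a symmetric matrix
    top = np.argsort(eigvals)[::-1][:K]   # indices of the K largest eigenvalues
    return eigvecs[:, top], eigvals[top]
```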

PCA has achieved great success in many applications owing to its computational efficiency and effectiveness in representation.

B. Incremental Principal Component Analysis

Traditional PCA is a batch algorithm, that is, all the data must be provided together. However, there is an increasing demand for incremental learning algorithms: 1) in many applications the data are received incrementally and may be effectively infinite, in which case it is inefficient to rerun PCA whenever a new datum arrives; and 2) when the feature dimension and the sample number are both large, the computation and storage complexities make batch PCA impractical.

Many efforts have been devoted to incremental PCA. Basically, there are two types of algorithms: covariance-matrix based and covariance-matrix free. The state-of-the-art algorithm, called Candid Covariance-free Incremental Principal Component Analysis (CCIPCA), was proposed by Weng et al. [8] and is summarized below:

CCIPCA Algorithm. Compute the first K principal eigenvectors, v_n^1, v_n^2, ..., v_n^K, from x_n, n = 1, 2, ...
For n = 1, 2, ..., do the following steps:
  1. x_n^1 = x_n.
  2. For k = 1, 2, ..., min{K, n} do:
     a. If k = n, initialize v_n^k = x_n^k.
     b. Otherwise,

$$v_n^k = \frac{n-1-l}{n} v_{n-1}^k + \frac{1+l}{n} x_n^k x_n^{kT} \frac{v_{n-1}^k}{\|v_{n-1}^k\|} \qquad (3)$$

$$x_n^{k+1} = x_n^k - x_n^{kT} \frac{v_n^k}{\|v_n^k\|} \frac{v_n^k}{\|v_n^k\|} \qquad (4)$$

End

In the CCIPCA algorithm, v_n^k is the k-th eigenvector derived from the first n samples, x_n^k denotes the n-th sample after subtracting its projections onto the first k-1 eigenvectors, and l is the amnesic parameter. CCIPCA has been reported to be much superior to other traditional algorithms, such as GHA [5], for incremental principal component analysis. However, though CCIPCA statistically converges, the higher-order eigenvectors converge very slowly, since their convergence relies on the assumption that the previous eigenvectors are already well estimated. Moreover, in each step, when v_{n-1}^k is updated into v_n^k, the dominating information removed from v_{n-1}^k should be used to update the next eigenvector, yet CCIPCA omits it and only uses the newly arrived datum to update each eigenvector.

In the following, we propose a novel algorithm for incremental principal component analysis based on the Largest-Eigenvalue-Theory, i.e., the Power Method. The new algorithm is sound in theory and does not suffer from the above issues.
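For reference, one step of the CCIPCA update in Eqns. (3)-(4) above can be sketched as follows; this is our own transcription with illustrative names and a default amnesic parameter, not the authors' code.

```python
import numpy as np

def ccipca_step(V, x, n, K, l=2.0):
    """Process the n-th zero-mean sample x (n is 1-based) with CCIPCA, Eqns. (3)-(4).

    V : list of current eigenvector estimates [v^1, ..., v^k]; it grows until it holds K vectors.
    l : the amnesic parameter.
    """
    residual = x.astype(float).copy()                  # x_n^1 = x_n
    for k, v in enumerate(V):
        # Eqn. (3): amnesic average of the old estimate and the new sample's energy.
        V[k] = ((n - 1 - l) / n) * v \
               + ((1 + l) / n) * (residual @ v) / np.linalg.norm(v) * residual
        u = V[k] / np.linalg.norm(V[k])
        # Eqn. (4): remove the component of the sample along the updated direction.
        residual = residual - (residual @ u) * u
    if len(V) < K and n == len(V) + 1:                 # step 2a: v_n^n is initialized with the residual
        V.append(residual.copy())
    return V
```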

III. INCREMENTAL EIGENVECTORS BASED ON LARGEST-EIGENVALUE-THEORY

Before describing our new algorithm for incremental principal component analysis, we introduce the Largest-Eigenvalue-Theory [1].

Largest-Eigenvalue-Theory (Power Method). Let A be an m × m real symmetric matrix. The eigenvector corresponding to the largest eigenvalue λ can be estimated by an iterative algorithm with an arbitrary non-zero initial vector v(0) as

$$v(t+1) = A \frac{v(t)}{\|v(t)\|}, \qquad \lambda(t+1) = \|v(t+1)\| \qquad (5)$$
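Eqn. (5) in code: a minimal power-iteration sketch for a real symmetric matrix (illustrative only; a fixed iteration count stands in for a convergence test).

```python
import numpy as np

def power_iteration(A, num_iters=100, v0=None):
    """Largest eigenvalue/eigenvector of a symmetric matrix A via Eqn. (5)."""
    rng = np.random.default_rng(0)
    v = rng.standard_normal(A.shape[0]) if v0 is None else v0.astype(float)
    for _ in range(num_iters):
        v = A @ (v / np.linalg.norm(v))  # v(t+1) = A v(t) / ||v(t)||
    lam = np.linalg.norm(v)              # lambda(t+1) = ||v(t+1)||
    return v / lam, lam                  # unit eigenvector and eigenvalue estimate
```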

From the Largest-Eigenvalue-Theory we can easily prove the following corollary.

Corollary. Assume that the first k-1 eigenvectors and eigenvalues {v^j, λ_j}_{j=1}^{k-1} are known. The k-th largest eigenvalue and its eigenvector can be estimated by an iterative algorithm with an arbitrary non-zero initial vector v^k(0) as

$$v^k(t+1) = A^k \frac{v^k(t)}{\|v^k(t)\|}, \qquad A^k = A - \sum_{j=1}^{k-1} \lambda_j v^j v^{jT} \qquad (6)$$

$$\lambda^k(t+1) = \|v^k(t+1)\| \qquad (7)$$
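Similarly, the corollary's deflation step in Eqns. (6)-(7) can be sketched as below (again a sketch under our own naming, not the authors' implementation).

```python
import numpy as np

def next_eigenpair(A, known_pairs, num_iters=100):
    """Estimate the k-th eigenpair of A given the leading k-1 pairs, Eqns. (6)-(7).

    known_pairs : list of (lambda_j, v_j) with v_j unit-norm, for j = 1, ..., k-1.
    """
    A_k = A.copy()
    for lam_j, v_j in known_pairs:
        A_k -= lam_j * np.outer(v_j, v_j)  # A^k = A - sum_j lambda_j v^j v^jT
    rng = np.random.default_rng(0)
    v = rng.standard_normal(A.shape[0])
    for _ in range(num_iters):
        v = A_k @ (v / np.linalg.norm(v))  # Eqn. (6)
    lam = np.linalg.norm(v)                # Eqn. (7)
    return lam, v / lam
```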

Let A = A_n be the estimated covariance matrix when the n-th datum x_n arrives, i.e., $A_n = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^T$. Denote by v_n^k(t) the estimate of the k-th eigenvector at the t-th iteration from the first n samples, and by v_n^k the final estimate of the k-th eigenvector from the first n samples. Based on the Largest-Eigenvalue-Theory and the corollary, we have the following representation:

$$v_n^k(t+1) = \Big(A_n - \sum_{j=1}^{k-1} \lambda_n^j v_n^j v_n^{jT}\Big) \frac{v_n^k(t)}{\|v_n^k(t)\|} \qquad (8)$$

Moreover, we have

$$A_n = \frac{n-1}{n} A_{n-1} + \frac{1}{n} x_n x_n^T \qquad (9)$$

$$A_n = \frac{n-1}{n} A_{n-1}^k + \frac{n-1}{n} \sum_{j=1}^{k-1} \lambda_{n-1}^j v_{n-1}^j v_{n-1}^{jT} + \frac{1}{n} x_n x_n^T \qquad (10)$$

Let v_n^k(1) = v_{n-1}^k and approximate v_n^k(t)/\|v_n^k(t)\| with v_{n-1}^k/\|v_{n-1}^k\| for the iterative multiplication by the matrix A_{n-1}^k; then

$$v_n^k(t+1) = \Big( \frac{n-1-l}{n} \sum_{j=1}^{k-1} \lambda_{n-1}^j v_{n-1}^j v_{n-1}^{jT} - \sum_{j=1}^{k-1} \lambda_n^j v_n^j v_n^{jT} + \frac{1+l}{n} x_n x_n^T \Big) \frac{v_n^k(t)}{\|v_n^k(t)\|} + \frac{n-1-l}{n} \lambda_{n-1}^k v_{n-1}^k \qquad (11)$$

where l is the amnesic parameter. For simplicity, the initial value of the k-th eigenvector is set to $v_k^k(1) = x_k - \sum_{j=1}^{k-1} v_k^j v_k^{jT} x_k$, i.e., x_k minus its projections onto the first k-1 estimated eigenvectors. Note that the update involves only dot products of vectors, so each iteration is fast. The whole algorithm, which we call Largest-Eigenvalue-Theory based Incremental Principal Component Analysis (LET-IPCA), is summarized below:

LET-IPCA Algorithm. Compute the first K principal eigenvectors, v_n^1, v_n^2, ..., v_n^K, from x_n, n = 1, 2, ...
For n = 1, 2, ..., do the following steps:
  For k = 1, 2, ..., min{K, n} do:
    a. If k = n, initialize $v_n^k(1) = x_k - \sum_{j=1}^{k-1} v_k^j v_k^{jT} x_k$; else, set $v_n^k(1) = v_{n-1}^k$.
    b. For t = 2, ..., T_max, update the k-th eigenvector as in Eqn. (11). If $\|v_n^k(t) - v_n^k(t-1)\| \le \epsilon$, break.
    End
    $v_n^k = \frac{v_n^k(t)}{\|v_n^k(t)\|}$, $\lambda_n^k = \|v_n^k(t)\|$
  End
End
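The following is a minimal sketch of how we read one LET-IPCA step (Eqn. (11) inside the loops above). The variable names, the default amnesic parameter, and the handling of the k = n case are our own choices, so treat this as an illustration rather than the authors' implementation.

```python
import numpy as np

def let_ipca_step(V_prev, lam_prev, x, n, K, l=0.0, t_max=3, eps=1e-6):
    """Process the n-th zero-mean sample x (n is 1-based) with LET-IPCA, Eqn. (11).

    V_prev   : (m, k_prev) array of unit-norm eigenvector estimates from the first n-1 samples.
    lam_prev : (k_prev,) array of the corresponding eigenvalue estimates.
    Returns the updated (V, lam). No m x m matrix is formed; only dot products are used.
    """
    x = x.astype(float)
    k_prev = V_prev.shape[1]
    V_new, lam_new = [], []
    for k in range(min(K, n)):                    # k = 1, ..., min{K, n} in the paper
        if k < k_prev:
            v = V_prev[:, k].copy()               # v_n^k(1) = v_{n-1}^k
        else:                                     # k == n: start from x minus its projections
            v = x.copy()
            for u in V_new:
                v -= (u @ x) * u
        for _ in range(t_max):
            v_old = v
            u = v / np.linalg.norm(v)
            # Eqn. (11), accumulated term by term with dot products only.
            v = ((1 + l) / n) * (x @ u) * x
            for j, uj in enumerate(V_new):        # subtract directions already re-estimated at step n
                v -= lam_new[j] * (uj @ u) * uj
            for j in range(min(k, k_prev)):       # add back the old leading directions
                v += ((n - 1 - l) / n) * lam_prev[j] * (V_prev[:, j] @ u) * V_prev[:, j]
            if k < k_prev:                        # the approximated A_{n-1}^k contribution
                v += ((n - 1 - l) / n) * lam_prev[k] * V_prev[:, k]
            if np.linalg.norm(v - v_old) <= eps:
                break
        lam_new.append(np.linalg.norm(v))
        V_new.append(v / lam_new[-1])
    return np.column_stack(V_new), np.array(lam_new)
```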

LET-IPCA seamlessly integrates the estimations of the different eigenvectors in a unified framework. In each step, the dominating information removed from the former eigenvectors is automatically transferred to the latter ones, so all of the dominating information is preserved; in contrast, CCIPCA simply discards the removed information. Therefore, the latter eigenvectors from LET-IPCA converge much faster and are more accurate than those from CCIPCA. Moreover, the iterative process in each step aims at finding the eigenvectors and eigenvalues of the current covariance matrix and makes no assumption that the data are independent and identically distributed (i.i.d.), which makes LET-IPCA more general for incremental subspace learning.

IV. EXPERIMENTS

We conduct two sets of experiments to compare LET-IPCA with the state-of-the-art algorithm, CCIPCA. One set of experiments was conducted on the MNIST [2] database of handwritten digits, and the other on the CMU PIE [6] face database. In both experiments, T_max = 3 and the currently estimated mean is subtracted from each sample. We systematically compare the accuracies of the two algorithms, where the accuracy is measured as the correlation between the iteratively estimated eigenvector and the corresponding eigenvector from batch PCA. As shown in Figures 1 and 2, we compare the accuracies of the first 10 eigenvectors at each step, i.e., whenever a new datum arrives.
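The accuracy measure just described can be computed as below; taking the absolute value of the normalized dot product (since an eigenvector's sign is arbitrary) is our assumption about how the correlation is evaluated.

```python
import numpy as np

def eigenvector_accuracy(v_est, v_batch):
    """Correlation between an incrementally estimated eigenvector and the batch-PCA one."""
    v_est = v_est / np.linalg.norm(v_est)
    v_batch = v_batch / np.linalg.norm(v_batch)
    return float(abs(v_est @ v_batch))  # 1.0 means the two directions coincide
```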

A. MNIST digits database

The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. The digit images have been normalized to 28 × 28 pixels. In our experiments, we used the first 5,000 samples of the test set. Figure 1 shows encouraging results compared with CCIPCA: 1) the final accuracies of the 7th-10th eigenvectors of CCIPCA are below 0.83, while the corresponding accuracies from LET-IPCA are all about 0.2 higher; and 2) once the sample number exceeds 1,000, the results of LET-IPCA are consistently better than those of CCIPCA.

B. CMU PIE face database

The CMU PIE (Pose, Illumination and Expression) database contains more than 40,000 facial images of 68 persons. In our experiments, we use fifteen persons with five near-frontal poses, taking their images from the illumination directory and the expression directory. There are about 120 images for each person. As shown in Figure 2, LET-IPCA shows great superiority over CCIPCA: 1) the first five eigenvectors of LET-IPCA converge much faster than those of CCIPCA; and 2) the final accuracies of the 6th-10th eigenvectors from LET-IPCA are much higher than those of CCIPCA: in two cases the CCIPCA accuracies fall below 0.7, while all the LET-IPCA results are above 0.82. Note that, although each LET-IPCA step (when a new datum arrives) is slower than a CCIPCA step because of the multiple iterations, LET-IPCA converges very fast, and fewer samples are required than with CCIPCA to obtain good estimates of the eigenvectors.

V. CONCLUSION AND DISCUSSIONS

In this paper, based on the Largest-Eigenvalue-Theory, we proposed a novel algorithm, called LET-IPCA, for incremental principal component analysis. LET-IPCA makes no assumption on the data sampling strategy; although it is derived from the covariance matrix, it is covariance-matrix free and does not need to reconstruct the covariance matrix when a new datum arrives. LET-IPCA preserves the dominating information well when computing the higher-order eigenvectors, and is much superior to CCIPCA in both convergence speed and accuracy.

ACKNOWLEDGEMENT

The work described in this paper was fully supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region. The work was done while both authors were with The Chinese University of Hong Kong.

REFERENCES

[1] G. Golub and C. Van Loan. Matrix Computations, The Johns Hopkins University Press, Baltimore, MD, 1989.
[2] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based learning applied to document recognition", Proceedings of the IEEE, 86(11):2278-2324, November 1998.
[3] E. Oja. Subspace Methods of Pattern Recognition, Research Studies Press, Letchworth, UK, 1983.
[4] E. Oja and J. Karhunen. "On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix", Journal of Mathematical Analysis and Applications, vol. 106, pp. 69-84, 1985.
[5] T. Sanger. "Optimal unsupervised learning in a single-layer linear feedforward neural network", IEEE Trans. Neural Networks, vol. 2, pp. 459-473, 1989.
[6] T. Sim, S. Baker, and M. Bsat. "The CMU Pose, Illumination, and Expression (PIE) Database of Human Faces", Tech. Report CMU-RI-TR-01-02, Robotics Institute, Carnegie Mellon University, January 2001.
[7] M. Turk and A. Pentland. "Eigenfaces for recognition", Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[8] J. Weng, Y. Zhang, and W. Hwang. "Candid Covariance-free Incremental Principal Component Analysis", IEEE Trans. Pattern Analysis and Machine Intelligence, 25(8):1034-1040, 2004.


[Figure 1: correlation (accuracy) versus sample sequence number for the first 10 eigenvectors; panel (a) CCIPCA and panel (b) LET-IPCA on the MNIST digits database.]

Fig. 1. The correlations, represented by dot products, of the first 10 eigenvectors from batch PCA and those computed by (a) CCIPCA with amnesic parameter l = 4, and (b) LET-IPCA with amnesic parameter l = 4, on the MNIST digits database.

[Figure 2: correlation (accuracy) versus sample sequence number for the first 10 eigenvectors; panel (a) CCIPCA and panel (b) LET-IPCA on the CMU PIE face database.]

Fig. 2. The correlations, represented by dot products, of the first 10 eigenvectors from batch PCA and those computed by (a) CCIPCA with amnesic parameter l = 2, and (b) LET-IPCA with amnesic parameter l = 2, on the CMU PIE face database.
