Rayleigh-Ritz Approximation for the Linear Response Eigenvalue Problem

Lei-Hong Zhang∗    Jungong Xue†    Ren-Cang Li‡

Technical Report 2013-12, http://www.uta.edu/math/preprint/

November 24, 2013
Abstract. Large-scale eigenvalue computation is about approximating certain invariant subspaces associated with the part of the spectrum of interest; the desired eigenvalues are then extracted by projecting the problem, through the approximate invariant subspaces, into a much smaller eigenvalue problem. In the case of the linear response eigenvalue problem (aka the random phase approximation eigenvalue problem), it is the pair of deflating subspaces associated with the first few smallest positive eigenvalues that needs to be computed. This paper is concerned with the relationships between the approximation accuracy of a pair of approximate deflating subspaces and that of the approximate eigenvalues extracted by the pair. Lower and upper bounds on the eigenvalue approximation errors are obtained in terms of the canonical angles between the exact and computed pairs of deflating subspaces. These bounds can also be interpreted as lower/upper bounds on the canonical angles in terms of the eigenvalue approximation errors. They are useful in analyzing numerical solutions of linear response eigenvalue problems.

Key words. Linear response eigenvalue problem, eigenvalue approximation, Rayleigh-Ritz approximation, canonical angles, deflating subspace, error bounds

AMS subject classifications. 65L15, 65F15, 81Q15, 15A18, 15A42

∗Department of Applied Mathematics, Shanghai University of Finance and Economics, 777 Guoding Road, Shanghai 200433, People's Republic of China. Email: [email protected]. Supported in part by the National Natural Science Foundation of China NSFC-11101257, NSFC-11371102, and the Basic Academic Discipline Program, the 11th five year plan of 211 Project for Shanghai University of Finance and Economics. Part of this work was done while this author was a visiting scholar at the Department of Mathematics, University of Texas at Arlington, from February 2013 to January 2014.

†School of Mathematical Science, Fudan University, Shanghai 200433, P. R. China. Email: [email protected]. Supported in part by the National Science Foundation of China Grant 11371105 and the Laboratory of Mathematics for Nonlinear Science, Fudan University.

‡Department of Mathematics, University of Texas at Arlington, P.O. Box 19408, Arlington, TX 76019-0408, USA. Email: [email protected]. Supported in part by NSF grants DMS-1115834 and DMS-1317330, a Research Gift Grant from Intel Corporation, and by the Laboratory of Mathematics for Nonlinear Science, Fudan University, which this author visited in October 2013.
1 Introduction
In computational quantum chemistry and physics, the so-called random phase approximation (RPA) describes the excitation states (energies) of physical systems in the study of the collective motion of many-particle systems [2, 20, 21]; it has applications in silicon nanoparticles and nanoscale materials and in the analysis of interstellar clouds [2, 3]. One important problem in RPA is to compute a few eigenpairs associated with the smallest positive eigenvalues of the eigenvalue problem

\[
H w := \begin{bmatrix} A & B\\ -B & -A \end{bmatrix} \begin{bmatrix} u\\ v\end{bmatrix} = \lambda \begin{bmatrix} u\\ v\end{bmatrix}, \tag{1.1}
\]

where A, B ∈ R^{n×n} are both symmetric and

\[
\begin{bmatrix} A & B\\ B & A \end{bmatrix} \ \text{is positive definite.} \tag{1.2}
\]

The matrix H is a special Hamiltonian matrix: all of its eigenvalues are real and appear in pairs {λ, −λ}. The eigenvalue problem (1.1) is referred to as the Linear Response Eigenvalue Problem (LREP) in the literature of computational quantum chemistry and physics; several minimization principles were recently established for it and, as a result, CG-type optimization algorithms were proposed to solve (1.1) [2, 3, 15]. Using the symmetric orthogonal matrix

\[
J = \frac{1}{\sqrt 2} \begin{bmatrix} I_n & I_n\\ I_n & -I_n\end{bmatrix}, \tag{1.3}
\]

which satisfies J^T J = J^2 = I_{2n}, we can transform H by similarity to [2, 3]

\[
J^T H J = \begin{bmatrix} 0 & A-B\\ A+B & 0\end{bmatrix} =: \begin{bmatrix} 0 & K\\ M & 0\end{bmatrix} =: \mathscr{H}, \tag{1.4}
\]

and thus, by the change of variables

\[
z := \begin{bmatrix} y\\ x \end{bmatrix} = J \begin{bmatrix} u\\ v\end{bmatrix}, \qquad \begin{bmatrix} u\\ v\end{bmatrix} = J \begin{bmatrix} y\\ x\end{bmatrix}, \tag{1.5}
\]

the eigenvalue problem (1.1) is equivalent to

\[
\mathscr{H} z := \begin{bmatrix} 0 & K\\ M & 0\end{bmatrix} \begin{bmatrix} y\\ x\end{bmatrix} = \lambda \begin{bmatrix} y\\ x\end{bmatrix}, \tag{1.6}
\]

which is still referred to as the linear response eigenvalue problem (LREP) [2, 16, 23], and will be in this paper too. The condition (1.2) implies that both K and M are symmetric positive definite [2]. Denote by

\[
-\lambda_n \le \cdots \le -\lambda_1 \le \lambda_1 \le \cdots \le \lambda_n
\]

the ordered eigenvalues of ℋ (and of H as well). In practice, some k smallest positive eigenvalues λ_1 ≤ ··· ≤ λ_k are of interest. The classical Thouless minimization principle, stated in terms of (1.6), is

\[
\lambda_1 = \min_{x,\,y} \frac{x^T K x + y^T M y}{2\,|x^T y|}.
\]

In [2], a subspace version of this characterization of the k smallest positive eigenvalues λ_1 ≤ ··· ≤ λ_k was obtained¹:

\[
\sum_{i=1}^{k} \lambda_i = \frac{1}{2} \inf_{U^T V = I_k} \operatorname{trace}(U^T K U + V^T M V), \tag{1.7}
\]

where U, V ∈ R^{n×k}.
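The transformation (1.4) and the ± pairing of the spectrum are easy to confirm numerically. The following sketch is only an illustration (plain NumPy with random matrices; nothing here comes from the paper itself): it builds H from symmetric A and B satisfying (1.2), forms J, and checks (1.4) and the eigenvalue pairing.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

def rand_spd(n, rng):
    X = rng.standard_normal((n, n))
    return X @ X.T + n * np.eye(n)

# Choosing K = A - B and M = A + B positive definite is equivalent to
# condition (1.2) on [[A, B], [B, A]].
K = rand_spd(n, rng)
M = rand_spd(n, rng)
A, B = (M + K) / 2, (M - K) / 2

H = np.block([[A, B], [-B, -A]])                      # the Hamiltonian in (1.1)
J = np.block([[np.eye(n), np.eye(n)],
              [np.eye(n), -np.eye(n)]]) / np.sqrt(2)  # the orthogonal J in (1.3)

# Similarity (1.4): J^T H J = [[0, K], [M, 0]].
scriptH = J.T @ H @ J
Z = np.zeros((n, n))
assert np.allclose(scriptH, np.block([[Z, K], [M, Z]]))

# All eigenvalues are real and come in {lambda, -lambda} pairs.
ev = np.sort(np.linalg.eigvals(H).real)
assert np.allclose(ev, -ev[::-1])
```

The positive half of `ev` gives the λ_i of interest; the similarity with J costs nothing numerically since J is orthogonal.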
An important notion for the LREP [2] is that of a pair of deflating subspaces {U, V}, by which we mean that U and V are subspaces of R^n satisfying KU ⊆ V and MV ⊆ U; more on this in section 2.2. It generalizes the concept of an invariant subspace (eigenspace) of the standard eigenvalue problem, taking into account the special structure of the LREP (1.6). This notion is not only vital for theoretical properties such as the subspace version (1.7) of Thouless' minimization principle and the Cauchy-like interlacing inequalities [3], but also fundamental for several rather efficient algorithms, e.g., the Locally Optimal Block Preconditioned 4-D Conjugate Gradient Method (LOBP4DCG) [3], the block Chebyshev-Davidson method [19], and the generalized Lanczos methods [18, 22, 23]. Each of these algorithms generates a sequence of approximate deflating subspace pairs {Uj, Vj} that hopefully converge to, or contain subspaces near, the pair of deflating subspaces {U, V} of interest. It is therefore important to establish relationships between the accuracy of the eigenvalue approximations and the distances from the exact deflating subspaces to their approximations. Analogously to the error estimates for Rayleigh-Ritz approximations in the standard symmetric eigenvalue problem [7, 9, 10, 14, 25], in this paper we establish error bounds for the eigenvalues computed from a pair of approximate deflating subspaces and, conversely, error bounds for the approximate deflating subspaces in terms of the eigenvalue approximation errors. These estimates are useful in analyzing certain iterative methods.

The rest of the paper is organized as follows. In section 2, we first provide some basic concepts about angles between subspaces, basic properties of the LREP, and the notion of a pair of deflating subspaces.
Section 3 contains our main results: (i) lower and upper bounds on the eigenvalue approximation errors in terms of the canonical angles between the exact and approximate pairs of deflating subspaces, and (ii) lower and upper bounds on the canonical angles in terms of the eigenvalue approximation errors. In section 4, we discuss possible extensions of the main results of section 3 to the more general linear response eigenvalue problem. Finally, conclusions are drawn in section 5.

¹The "inf" in (1.7) allows for the case when one of K or M is singular. When both are positive definite, as in the original LREP, the minimum is attainable and "inf" can be replaced by "min".

Notation. R^{n×k} is the set of all n × k real matrices, R^n = R^{n×1}, and R = R^1. I_n (or simply I if its dimension is clear from the context) is the n × n identity matrix, and e_j is its jth column. All vectors are column vectors and are in boldface. For a matrix Z, Z^T denotes its transpose; R(Z) is its column space, spanned by its column vectors; ∥Z∥_2, ∥Z∥_F, and ∥Z∥_ui are the spectral norm, the Frobenius norm, and a general unitarily invariant norm, respectively; Z's submatrices Z_{(k:ℓ,i:j)}, Z_{(k:ℓ,:)}, and Z_{(:,i:j)} consist of the intersections of rows k to ℓ and columns i to j, rows k to ℓ, and columns i to j, respectively. The trace of a square matrix Z is trace(Z) and its eigenvalue set is eig(Z). For real symmetric matrices Z and W, Z ≽ W (resp. Z ≻ W) means that Z − W is positive semidefinite (resp. positive definite).
2 Preliminary results
A matrix norm ∥·∥ is called a unitarily invariant norm on C^{m×n} (the set of all m × n complex matrices) if it is a matrix norm with the following two properties:

1. ∥XZY∥ = ∥Z∥ for all unitary matrices X and Y of apt sizes and Z ∈ C^{m×n};

2. ∥Z∥ = ∥Z∥_2, the spectral norm of Z, if rank(Z) = 1.

Two commonly used unitarily invariant norms are the spectral norm ∥Z∥_2 and the Frobenius norm ∥Z∥_F. In what follows, we use ∥·∥_ui for a general unitarily invariant norm.
2.1 Canonical angles
For two subspaces X and Y of R^n, suppose

\[
k := \dim(X) \le \dim(Y) =: \ell. \tag{2.1}
\]

The canonical angles θ_i(X, Y) can be defined recursively [8] for i = 1, 2, ..., k by

\[
\cos\theta_i(X, Y) = \max_{x \in X}\ \max_{y \in Y}\ \langle x, y\rangle =: \langle x_i, y_i\rangle \tag{2.2}
\]

subject to

\[
\|x\|_2 = \|y\|_2 = 1, \qquad \langle x, x_j\rangle = \langle y, y_j\rangle = 0 \quad\text{for } j = 1, 2, \ldots, i-1, \tag{2.3}
\]

where the standard inner product ⟨x, y⟩ is defined by ⟨x, y⟩ := x^T y for x, y ∈ R^n.
If X ∈ R^{n×k} and Y ∈ R^{n×ℓ} are orthonormal basis matrices of X and Y, respectively, i.e.,

\[
X^T X = I_k,\ \ X = \mathcal{R}(X) \qquad\text{and}\qquad Y^T Y = I_\ell,\ \ Y = \mathcal{R}(Y),
\]

and σ_j for 1 ≤ j ≤ k, in ascending order σ_1 ≤ ··· ≤ σ_k, are the singular values of Y^T X, then the k canonical angles θ_j(X, Y) from² X to Y are given by

\[
0 \le \theta_j(X, Y) := \arccos\sigma_j \le \frac{\pi}{2} \qquad\text{for } 1 \le j \le k. \tag{2.4}
\]

They are in descending order, i.e., θ_1(X, Y) ≥ ··· ≥ θ_k(X, Y). Set

\[
\Theta(X, Y) = \operatorname{diag}\bigl(\theta_1(X, Y), \ldots, \theta_k(X, Y)\bigr). \tag{2.5}
\]

The angles so defined are independent of the orthonormal basis matrices X and Y, which are not unique. The angle θ(X, Y) between X and Y is defined to be

\[
\theta(X, Y) := \max_i \arccos\sigma_i = \arccos\bigl(\min_i \sigma_i\bigr) = \theta_1(X, Y). \tag{2.6}
\]
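The SVD characterization (2.4) is also how canonical angles are computed in practice. A small NumPy sketch (random subspaces, purely illustrative) computes the angles and cross-checks them against the complementary-subspace identity used later in this section:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, l = 8, 2, 3

# Orthonormal basis matrices X, Y of random subspaces with dim X <= dim Y.
X, _ = np.linalg.qr(rng.standard_normal((n, k)))
Y, _ = np.linalg.qr(rng.standard_normal((n, l)))

# (2.4): the cosines of the canonical angles are the singular values of Y^T X.
cosines = np.linalg.svd(Y.T @ X, compute_uv=False)   # descending
theta = np.arccos(np.clip(cosines, -1.0, 1.0))       # ascending angles

# Cross-check: the sines equal the singular values of Yperp^T X, where the
# columns of Yperp span the orthogonal complement of R(Y).
Q, _ = np.linalg.qr(np.hstack([Y, rng.standard_normal((n, n - l))]))
Yperp = Q[:, l:]
sines = np.linalg.svd(Yperp.T @ X, compute_uv=False)
assert np.allclose(np.sort(sines), np.sort(np.sin(theta)))
```

The `clip` guards against cosines marginally exceeding 1 in floating point before `arccos` is applied.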
When k = 1, i.e., X is a vector, there is only one canonical angle from X to Y, and so θ(X, Y) = Θ(X, Y). In what follows, we sometimes place a vector or matrix in one or both arguments of θ_j(·,·), θ(·,·), and Θ(·,·), e.g., θ(X, Y), with the understanding that the argument stands for the subspace spanned by the vector or by the columns of the matrix. Denote by G^n_k the set of all subspaces of dimension k in R^n, known as the Grassmann manifold. For any unitarily invariant norm ∥·∥_ui, ∥sin Θ(·,·)∥_ui defines a distance metric on G^n_k [17, p.94]. Let Y⊥ be an orthonormal basis matrix of the orthogonal complement of Y in R^n. It can be proved that the k singular values of X^T Y⊥ are the sin θ_i(X, Y), and thus³

\[
\|\sin\Theta(X, Y)\|_{ui} = \|X^T Y_\perp\|_{ui}.
\]

In particular, we have ∥sin Θ(X, Y)∥_2 = sin θ(X, Y) = sin θ_1(X, Y). Given a symmetric positive definite matrix W ∈ R^{n×n}, the W-inner product is defined by ⟨x, y⟩_W = x^T W y for x, y ∈ R^n. Replacing the standard inner product ⟨x, y⟩ in (2.2) and (2.3) by the W-inner product ⟨x, y⟩_W leads to the W-canonical angles between X and Y, which will be denoted by θ_{W;i}(X, Y), and we set

\[
\Theta_W(X, Y) = \operatorname{diag}\bigl(\theta_{W;1}(X, Y), \ldots, \theta_{W;k}(X, Y)\bigr).
\]

The W-canonical angles can also be stated in terms of the standard canonical angles (i.e., those for W = I_n) through a linear transformation of the subspaces involved.

²If k = ℓ, we may say that these angles are between X and Y.

³We assume that any ∥·∥_ui we use is generic with respect to matrix sizes, in the sense that it applies to matrices of all sizes. Examples include the spectral norm ∥·∥_2 and the Frobenius norm ∥·∥_F.
In fact, let W = C C^T, where C ∈ R^{n×n} is nonsingular. Such a decomposition is not unique, but for our purpose any one of them suffices. Then [8, Theorem 4.2]

\[
\theta_{W;i}(X, Y) = \theta_i(C^T X, C^T Y), \quad \Theta_W(X, Y) = \Theta(C^T X, C^T Y), \quad \theta_W(X, Y) = \theta(C^T X, C^T Y). \tag{2.7}
\]

An important implication of (2.7) is

\[
\|\sin\Theta_W(X, Y)\|_{ui} = \|\sin\Theta(C^T X, C^T Y)\|_{ui}, \tag{2.8}
\]

and ∥sin Θ_W(·,·)∥_ui is a distance metric on G^n_k.
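Identity (2.7) turns W-canonical angles into ordinary ones after a change of basis by any factor C of W. The sketch below (illustrative NumPy; the Cholesky and eigendecomposition factors are two admissible choices of C) confirms that the computed angles do not depend on which factor is used:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 2

W = rng.standard_normal((n, n)); W = W @ W.T + n * np.eye(n)  # SPD weight
Cfac = np.linalg.cholesky(W)          # one choice of C with W = C C^T

def principal_angles(X, Y):
    QX, _ = np.linalg.qr(X)
    QY, _ = np.linalg.qr(Y)
    s = np.linalg.svd(QY.T @ QX, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

X = rng.standard_normal((n, k))
Y = rng.standard_normal((n, k))

# (2.7): W-canonical angles = standard canonical angles of C^T X and C^T Y.
thetaW = principal_angles(Cfac.T @ X, Cfac.T @ Y)

# Another factor: W = C2 C2^T with C2 = V * sqrt(d) from W = V diag(d) V^T.
d, V = np.linalg.eigh(W)
C2 = V * np.sqrt(d)
assert np.allclose(thetaW, principal_angles(C2.T @ X, C2.T @ Y))
```

Any two factors differ by an orthogonal matrix on the right, which is exactly why the angles agree.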
2.2 Basic properties of LREP and pair of deflating subspaces
Throughout the rest of this paper, K, M ∈ R^{n×n} are symmetric and at least positive semidefinite. To facilitate our discussion, we collect several necessary properties of the LREP in Theorem 2.1; the reader is referred to [2, Section 2] for proofs and more.

Theorem 2.1. Suppose that M is definite. Then the following statements are true:

(i) There exists a nonsingular Φ ∈ R^{n×n} such that

\[
K = \Psi \Lambda^2 \Psi^T \quad\text{and}\quad M = \Phi\Phi^T, \tag{2.9}
\]

where Λ = diag(λ_1, λ_2, ..., λ_n) and Ψ = Φ^{-T}.

(ii) If K is also definite, then all λ_i > 0 and ℋ is diagonalizable:

\[
\mathscr{H} \begin{bmatrix} \Psi\Lambda & \Psi\Lambda\\ \Phi & -\Phi \end{bmatrix} = \begin{bmatrix} \Psi\Lambda & \Psi\Lambda\\ \Phi & -\Phi\end{bmatrix} \begin{bmatrix} \Lambda & \\ & -\Lambda \end{bmatrix}. \tag{2.10}
\]

(iii) The eigen-decompositions of KM and MK are

\[
(KM)\Psi = \Psi\Lambda^2 \quad\text{and}\quad (MK)\Phi = \Phi\Lambda^2, \tag{2.11}
\]

respectively.

Property (2.11) follows directly from (2.9), and it implies that we can alternatively solve the LREP via either product eigenvalue problem in (2.11) for the k smallest positive eigenvalues and their associated eigenvectors. Given two k-dimensional subspaces U ⊆ R^n and V ⊆ R^n, the pair {U, V} is called a pair of deflating subspaces of {K, M} if

\[
K\,U \subseteq V \quad\text{and}\quad M\,V \subseteq U. \tag{2.12}
\]
This definition is essentially the same as existing ones for the product eigenvalue problem [4, 6, 11]. Let U ∈ R^{n×k} and V ∈ R^{n×k} be basis matrices of U and V, respectively. Then (2.12) implies that there exist K_R ∈ R^{k×k} and M_R ∈ R^{k×k} such that

\[
K U = V K_R \quad\text{and}\quad M V = U M_R,
\]

and vice versa. Equivalently,

\[
\mathscr{H} \begin{bmatrix} V & \\ & U \end{bmatrix} = \begin{bmatrix} V & \\ & U\end{bmatrix} \begin{bmatrix} & K_R\\ M_R & \end{bmatrix}.
\]

This implies that V ⊕ U is an invariant subspace of ℋ ([2, Theorem 2.4]). Conversely, if R([V; U]) is an invariant subspace of ℋ, i.e.,

\[
\begin{bmatrix} K U\\ M V \end{bmatrix} = \mathscr{H} \begin{bmatrix} V\\ U \end{bmatrix} = \begin{bmatrix} V\\ U\end{bmatrix} D = \begin{bmatrix} V D\\ U D\end{bmatrix} \quad\text{for some } D \in \mathbb{R}^{k\times k},
\]

then {R(U), R(V)} is a pair of deflating subspaces of {K, M}. The eigenvalues of D are part of those of ℋ, i.e., eig(D) ⊆ eig(ℋ), and we are interested in the pair of deflating subspaces for which eig(D) = {λ_1, ..., λ_k}. Given a pair of deflating subspaces {R(U), R(V)}, partial eigenpairs of ℋ can be obtained by solving a smaller eigenvalue problem ([2, Theorem 2.5]): if

\[
H_R \hat z := \begin{bmatrix} & K_R\\ M_R & \end{bmatrix} \begin{bmatrix} \hat y\\ \hat x \end{bmatrix} = \lambda \begin{bmatrix} \hat y\\ \hat x\end{bmatrix} =: \lambda \hat z, \tag{2.13}
\]

then (λ, [V ŷ; U x̂]) is an eigenpair of ℋ. The matrix H_R is the restriction of ℋ onto V ⊕ U with respect to the basis matrices V and U. If one of K and M is definite, then U^T V is nonsingular [2, Lemma 2.7]. In that case, decompose U^T V as U^T V = W_1^T W_2, where W_1, W_2 ∈ R^{k×k} are nonsingular. Choosing the new basis matrices U W_1^{-1} and V W_2^{-1} for R(U) and R(V), respectively, turns the restriction of ℋ onto V ⊕ U into

\[
H_{SR} = \begin{bmatrix} & W_1^{-T} U^T K U W_1^{-1}\\ W_2^{-T} V^T M V W_2^{-1} & \end{bmatrix}, \tag{2.14}
\]

whose off-diagonal blocks are both symmetric and positive (semi)definite. Roughly speaking, calculating the k smallest positive eigenvalues of the LREP (1.6) amounts to finding the pair of deflating subspaces {R(Φ_1), R(Ψ_1)}, where

\[
\Phi_1 = \Phi_{(:,1:k)} \quad\text{and}\quad \Psi_1 = \Psi_{(:,1:k)}. \tag{2.15}
\]
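The exact pair {R(Φ₁), R(Ψ₁)} and the restriction (2.13) can be constructed and verified directly from the product eigenvalue problem (2.11). A minimal NumPy sketch (random SPD K and M; an illustration, not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 6, 2
X = rng.standard_normal((n, n)); K = X @ X.T + np.eye(n)
Y = rng.standard_normal((n, n)); M = Y @ Y.T + np.eye(n)

# Columns of Phi are eigenvectors of M K (cf. (2.11)); Psi = Phi^{-T}.
w, Phi = np.linalg.eig(M @ K)
order = np.argsort(w.real)
Phi = Phi[:, order].real
Psi = np.linalg.inv(Phi).T

U, V = Phi[:, :k], Psi[:, :k]          # basis matrices of the pair {U, V}

# (2.12) in matrix form: K U = V K_R and M V = U M_R for some K_R, M_R.
K_R = np.linalg.lstsq(V, K @ U, rcond=None)[0]
M_R = np.linalg.lstsq(U, M @ V, rcond=None)[0]
assert np.allclose(V @ K_R, K @ U)
assert np.allclose(U @ M_R, M @ V)

# The restriction H_R of (2.13) reproduces the k smallest positive
# eigenvalues: mu_i^2 = eig(K_R M_R) match the smallest eig(M K).
mu2 = np.sort(np.linalg.eigvals(K_R @ M_R).real)
lam2 = np.sort(w.real)[:k]
assert np.allclose(mu2, lam2)
```

The least-squares solves simply recover K_R and M_R from the exact containments KU ⊆ V and MV ⊆ U, so the residuals vanish to rounding.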
To this end, a sequence of approximate deflating subspace pairs {U_j, V_j} (the dimensions of U_j and V_j can be larger than k) is usually generated so as to converge to, or to contain subspaces near, {R(Φ_1), R(Ψ_1)}. For example:

• for the first Lanczos-type algorithm in [18], U_j and V_j are the Krylov subspaces generated by initial vectors u_0 ∈ R^n and v_0 ∈ R^n ([18, Lemma 3.1]):

\[
U_j = \mathcal{K}_j(MK, u_0) \quad\text{and}\quad V_j = \mathcal{K}_j(KM, v_0), \tag{2.16}
\]

and the basis matrices U_j and V_j of U_j and V_j, respectively, obey the relations ([18, Theorem 3.1])

\[
U_j^T V_j = I_j, \qquad K U_j = V_j T_j + \beta_j v_{j+1} e_j^T, \qquad M V_j = U_j D_j, \tag{2.17}
\]

where v_i is the ith column of V_j, β_j ∈ R, and T_j, D_j ∈ R^{j×j};

• in [3], U_j and V_j are constructed by the Locally Optimal Block Preconditioned 4-D Conjugate Gradient Method (LOBP4DCG), based on the minimization principle (1.7);

• in [19], U_j and V_j are generated by a block Chebyshev-Davidson type method, in which a block Chebyshev filter is used to refine and expand the subspaces.

Suppose {R(U), R(V)} is a pair of approximate deflating subspaces with U^T V nonsingular. It is proved ([2, Theorem 2.9]) that the eigenvalues of H_SR defined by (2.14) are invariant under different choices of the basis matrices U and V. In particular, if U^T V = I_k, then H_SR becomes

\[
H_{SR} = \begin{bmatrix} & U^T K U\\ V^T M V & \end{bmatrix}. \tag{2.18}
\]

The eigenpairs of H_SR are shown to give the best possible approximate eigenpairs, the so-called Ritz pairs, for ℋ (see [3] for more details). Given an approximation {R(U), R(V)} to the pair of deflating subspaces {R(Φ_1), R(Ψ_1)}, we will investigate how good the Ritz pairs are as approximations to the exact eigenpairs of ℋ. In measuring the difference between {R(U), R(V)} and {R(Φ_1), R(Ψ_1)}, we have three choices:

\[
\sin\Theta_{M^{-1}}(U, \Phi_1) \ \text{and}\ \sin\Theta_{K^{-1}}(V, \Psi_1), \quad\text{or} \tag{2.19a}
\]
\[
\sin\Theta_{M^{-1}}(U, \Phi_1) \ \text{and}\ \sin\Theta_{M^{-1}}(U, MV), \quad\text{or} \tag{2.19b}
\]
\[
\sin\Theta_{K^{-1}}(V, \Psi_1) \ \text{and}\ \sin\Theta_{K^{-1}}(V, KU). \tag{2.19c}
\]
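Computing the Ritz values from (2.18) takes only a normalization U^T V = I_k and one small eigenvalue problem. The following NumPy sketch (random SPD K, M and a random approximate pair, for illustration only) also checks the interlacing property λ_i ≤ µ_i quoted later from [2, Theorem 4.1]:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 8, 2
X = rng.standard_normal((n, n)); K = X @ X.T + np.eye(n)
Y = rng.standard_normal((n, n)); M = Y @ Y.T + np.eye(n)

# Any approximate pair with U^T V nonsingular; here random for illustration.
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))

# Renormalize so that U^T V = I_k (fold the inverse into V).
V = V @ np.linalg.inv(U.T @ V)
assert np.allclose(U.T @ V, np.eye(k))

# H_SR of (2.18); its eigenvalues come in +/- mu_i pairs (the Ritz values).
Zk = np.zeros((k, k))
H_SR = np.block([[Zk, U.T @ K @ U], [V.T @ M @ V, Zk]])
mu = np.sort(np.linalg.eigvals(H_SR).real)[k:]

# lambda_i <= mu_i for i = 1, ..., k ([2, Theorem 4.1]).
lam = np.sqrt(np.sort(np.linalg.eigvals(M @ K).real))[:k]
assert np.all(lam <= mu + 1e-10)
```

Both off-diagonal blocks of `H_SR` are symmetric positive definite here, so the µ_i are real and positive.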
While (2.19a) seems the most natural, (2.19b) and (2.19c) are more convenient for our purpose. In the case K ≻ 0 and M ≻ 0, they are equivalent, as the following theorem shows.

Theorem 2.2. Suppose K ≻ 0 and M ≻ 0. If both sines in any one of (2.19) are zero, then {R(U), R(V)} is the pair of deflating subspaces and the corresponding H_SR has eigenvalues ±λ_i for 1 ≤ i ≤ k.

Proof. We use (2.19a) as an example. Let Λ_1 = Λ_{(1:k,1:k)}. Suppose sin Θ_{M^{-1}}(U, Φ_1) = 0 and sin Θ_{K^{-1}}(V, Ψ_1) = 0. Then there exist nonsingular Q_1, Q_2 ∈ R^{k×k} such that

\[
U = \Phi_1 Q_1, \qquad V = \Psi_1 Q_2. \tag{2.20}
\]

Then K U = Ψ Λ^2 Ψ^T Φ_1 Q_1 = Ψ_1 Λ_1^2 Q_1 = V (Q_2^{-1} \Lambda_1^2 Q_1) and M V = Φ Φ^T Ψ_1 Q_2 = Φ_1 Q_2 = U (Q_1^{-1} Q_2), which imply that {R(U), R(V)} is a pair of deflating subspaces. It remains to show that the corresponding H_SR has eigenvalues ±λ_i for 1 ≤ i ≤ k. As different choices of basis matrices for R(U) and R(V) do not change the eigenvalues of H_SR ([2, Theorem 2.9]), we may assume, without loss of generality, that Q_1 = Q_2 = I_k in (2.20), for which

\[
H_{SR} = \begin{bmatrix} & \Lambda_1^2\\ I_k & \end{bmatrix},
\]

whose eigenvalues are ±λ_i for 1 ≤ i ≤ k. ∎
3 Main Results
Suppose one of K and M is definite. Let {R(U), R(V)} be a pair of approximate deflating subspaces intended to approximate {R(Φ_1), R(Ψ_1)}, where Φ_1 and Ψ_1 are given by (2.15), U, V ∈ R^{n×k}, and U^T V = I_k. With this pair, the restriction H_SR in (2.18) is constructed. Since H_SR has the same structure as ℋ in (1.6), its eigenvalues are

\[
-\mu_k \le \cdots \le -\mu_1 \le \mu_1 \le \cdots \le \mu_k. \tag{3.1}
\]

We are interested in bounding:

1. the errors in the µ_i as approximations to the λ_i, in terms of the error in {R(U), R(V)} as an approximation to {R(Φ_1), R(Ψ_1)}; and conversely,

2. the error in {R(U), R(V)} as an approximation to {R(Φ_1), R(Ψ_1)}, in terms of the errors in the µ_i as approximations to the λ_i.

Define

\[
\delta_k := \sum_{i=1}^{k} (\mu_i^2 - \lambda_i^2). \tag{3.2}
\]

We know that 0 ≤ λ_i ≤ µ_i (see [2, Theorem 4.1]), so δ_k is a measure of the total error in all the µ_i as approximations to the λ_i for 1 ≤ i ≤ k, and it is what we will use for that purpose. We have already discussed measuring the approximation accuracy of deflating subspaces by one of (2.19a)–(2.19c).

Theorem 3.1. Assume that M is definite. Let {R(U), R(V)} be an approximate pair to {R(Φ_1), R(Ψ_1)}, where U, V ∈ R^{n×k} and rank(U^T V) = k, and let the eigenvalues of H_SR be given by (3.1). Then, for δ_k given by (3.2),

\[
(\lambda_{k+1}^2 - \lambda_k^2)\, \|\sin\Theta_{M^{-1}}(U, \Phi_1)\|_F^2 \ \le\ \delta_k \ \le\ \sum_{i=1}^{k} \lambda_i^2 \tan^2\theta_{M^{-1}}(U, MV) + \frac{\lambda_n^2 - \lambda_1^2}{\cos^2\theta_{M^{-1}}(U, MV)}\, \|\sin\Theta_{M^{-1}}(U, \Phi_1)\|_F^2. \tag{3.3}
\]

As a result,

\[
\|\sin\Theta_{M^{-1}}(U, \Phi_1)\|_F \le \sqrt{\frac{\delta_k}{\lambda_{k+1}^2 - \lambda_k^2}}. \tag{3.4}
\]
Proof. As different choices of basis matrices for R(U) and R(V) do not change the eigenvalues of H_SR ([2, Theorem 2.9]), we may assume, without loss of generality, that U^T V = I_k. Define

\[
\widetilde U := \Psi^T U = \Phi^{-1} U \quad\text{and}\quad \widetilde V := \Phi^T V = \Psi^{-1} V, \tag{3.5}
\]

which satisfy Ũ^T Ṽ = U^T Ψ Φ^T V = U^T V = I_k. It can be verified that

\[
\Theta_{M^{-1}}(U, \Phi_1) = \Theta(\Psi^T U, \Psi^T \Phi_1) = \Theta(\widetilde U, I_{(:,1:k)}). \tag{3.6}
\]

Partition Ũ and Λ as

\[
\widetilde U = \begin{bmatrix} \widetilde U_1\\ \widetilde U_2\end{bmatrix}\!\begin{matrix} \scriptstyle k\\ \scriptstyle n-k\end{matrix}\,, \qquad \Lambda = \begin{bmatrix} \Lambda_1 & \\ & \Lambda_2 \end{bmatrix}\!\begin{matrix} \scriptstyle k\\ \scriptstyle n-k \end{matrix}\,. \tag{3.7}
\]

According to Theorem 2.1, the µ_i² for i = 1, ..., k are the eigenvalues of the product matrix (U^T K U)(V^T M V). Therefore,

\[
\begin{aligned}
\sum_{i=1}^{k} \mu_i^2 &= \operatorname{trace}\bigl((U^T K U)(V^T M V)\bigr)\\
&= \operatorname{trace}\bigl((U^T \Psi \Lambda^2 \Psi^T U)(V^T \Phi \Phi^T V)\bigr)\\
&= \operatorname{trace}\bigl((\widetilde U^T \Lambda^2 \widetilde U)(\widetilde V^T \widetilde V)\bigr)\\
&= \operatorname{trace}(\Lambda_1^2 \widetilde U_1 \widetilde V^T \widetilde V \widetilde U_1^T) + \operatorname{trace}(\Lambda_2^2 \widetilde U_2 \widetilde V^T \widetilde V \widetilde U_2^T). 
\end{aligned} \tag{3.8}
\]
As pointed out above, the eigenvalues of H_SR are unchanged under different choices of basis matrices for R(U) and R(V); we next choose specific basis matrices to simplify (3.8). Let the QR decompositions of Ũ and Ṽ be

\[
\widetilde U = Q_1 R_1 \quad\text{and}\quad \widetilde V = Q_2 R_2,
\]

respectively, where Q_1 and Q_2 are n-by-k with orthonormal columns. By [17, p.40], there exist orthogonal matrices P ∈ R^{n×n} and S_i ∈ R^{k×k} such that:

• for 2k ≤ n,

\[
P Q_1 S_1 = \begin{bmatrix} I_k \\ 0 \\ 0\end{bmatrix}\!\begin{matrix} \scriptstyle k\\ \scriptstyle k\\ \scriptstyle n-2k\end{matrix}\,, \qquad P Q_2 S_2 = \begin{bmatrix} \Gamma \\ \Sigma \\ 0 \end{bmatrix}\!\begin{matrix} \scriptstyle k\\ \scriptstyle k\\ \scriptstyle n-2k \end{matrix}\,,
\]

where Γ = diag(γ_1, ..., γ_k) with 0 ≤ γ_1 ≤ ··· ≤ γ_k, Σ = diag(σ_1, ..., σ_k) with σ_1 ≥ ··· ≥ σ_k ≥ 0, and γ_i = cos θ_i(Ũ, Ṽ), σ_i = sin θ_i(Ũ, Ṽ) for 1 ≤ i ≤ k;

• for 2k > n,

\[
P Q_1 S_1 = \begin{bmatrix} I_{n-k} & 0\\ 0 & I_{2k-n}\\ 0 & 0 \end{bmatrix}, \qquad
P Q_2 S_2 = \begin{bmatrix} \Gamma & 0\\ 0 & I_{2k-n}\\ \Sigma & 0 \end{bmatrix},
\]

where Γ = diag(γ_1, ..., γ_{n−k}) with 0 ≤ γ_1 ≤ ··· ≤ γ_{n−k}, Σ = diag(σ_1, ..., σ_{n−k}) with σ_1 ≥ ··· ≥ σ_{n−k} ≥ 0, γ_i = cos θ_i(Ũ, Ṽ) and σ_i = sin θ_i(Ũ, Ṽ) for 1 ≤ i ≤ n−k, and θ_i(Ũ, Ṽ) = 0 for n−k+1 ≤ i ≤ k.

Now we reset

\[
\widetilde U = P^T \begin{bmatrix} \Gamma^{-1}\\ 0\\ 0\end{bmatrix}, \quad \widetilde V = P^T \begin{bmatrix} \Gamma\\ \Sigma\\ 0\end{bmatrix} \qquad\text{if } 2k \le n, \tag{3.9a}
\]
\[
\widetilde U = P^T \begin{bmatrix} \Gamma^{-1} & 0\\ 0 & I_{2k-n}\\ 0 & 0\end{bmatrix}, \quad \widetilde V = P^T \begin{bmatrix} \Gamma & 0\\ 0 & I_{2k-n}\\ \Sigma & 0\end{bmatrix} \qquad\text{if } 2k > n, \tag{3.9b}
\]

which essentially changes the basis matrices for R(U) and R(V) from U and V to U R_1^{-1} S_1 and V R_2^{-1} S_2, respectively. Moreover, the new Ũ and Ṽ in (3.9) satisfy Ũ^T Ṽ = I_k, Ṽ^T Ṽ = I_k, and

\[
\|\widetilde U\|_2 = \|\Gamma^{-1}\|_2 = \frac{1}{\cos\theta(\widetilde U, \widetilde V)} = \frac{1}{\cos\theta(\Psi^T U, \Phi^T V)} = \frac{1}{\cos\theta_{M^{-1}}(U, MV)}. \tag{3.10}
\]

Thus, from (3.8), we have

\[
\sum_{i=1}^{k} \mu_i^2 = \operatorname{trace}(\Lambda_1^2 \widetilde U_1 \widetilde U_1^T) + \operatorname{trace}(\Lambda_2^2 \widetilde U_2 \widetilde U_2^T). \tag{3.11}
\]
Partition

\[
P = \begin{bmatrix} P_{11} & P_{12}\\ P_{21} & P_{22}\end{bmatrix}
\]

with P_{11} ∈ R^{k×k} and P_{22} ∈ R^{(n−k)×(n−k)}. By (3.9), R(Ũ) = R((P^T)_{(:,1:k)}), and therefore

\[
\Theta_{M^{-1}}(U, \Phi_1) = \Theta(\widetilde U, I_{(:,1:k)}) = \Theta\left( \begin{bmatrix} P_{11}^T\\ P_{12}^T \end{bmatrix}, \begin{bmatrix} I_k\\ 0\end{bmatrix}\right). \tag{3.12}
\]

On the other hand, from (3.9) and with Ũ partitioned as in (3.7), we have

\[
\widetilde U_1 = \begin{cases} P_{11}^T \Gamma^{-1}, & \text{if } 2k \le n,\\[2pt] P_{11}^T \begin{bmatrix}\Gamma^{-1} & \\ & I_{2k-n}\end{bmatrix}, & \text{if } 2k > n,\end{cases}
\qquad
\widetilde U_2 = \begin{cases} P_{12}^T \Gamma^{-1}, & \text{if } 2k \le n,\\[2pt] P_{12}^T \begin{bmatrix}\Gamma^{-1} & \\ & I_{2k-n}\end{bmatrix}, & \text{if } 2k > n,\end{cases} \tag{3.13}
\]

from which we can verify that, for i = 1, 2,

\[
P_{1i}^T P_{1i} \preceq \widetilde U_i \widetilde U_i^T \preceq \frac{1}{\gamma_1^2}\, P_{1i}^T P_{1i}, \qquad\text{where } \gamma_1 := \cos\theta_{M^{-1}}(U, MV).
\]

Therefore

\[
\operatorname{trace}(\Lambda_1^2 P_{11}^T P_{11}) + \operatorname{trace}(\Lambda_2^2 P_{12}^T P_{12}) \ \le\ \sum_{i=1}^{k}\mu_i^2 \ \le\ \frac{1}{\gamma_1^2}\Bigl[ \operatorname{trace}(\Lambda_1^2 P_{11}^T P_{11}) + \operatorname{trace}(\Lambda_2^2 P_{12}^T P_{12})\Bigr]. \tag{3.14}
\]
We claim that

\[
(\lambda_{k+1}^2 - \lambda_k^2)\, \|\sin\Theta_{M^{-1}}(U, \Phi_1)\|_F^2 \ \le\ \operatorname{trace}(\Lambda_1^2 P_{11}^T P_{11}) + \operatorname{trace}(\Lambda_2^2 P_{12}^T P_{12}) - \sum_{i=1}^{k} \lambda_i^2 \tag{3.15a}
\]
\[
\le\ (\lambda_n^2 - \lambda_1^2)\, \|\sin\Theta_{M^{-1}}(U, \Phi_1)\|_F^2. \tag{3.15b}
\]
To prove (3.15), note that

\[
I_k - P_{11}^T P_{11} \succeq 0,\quad P_{12}^T P_{12} \succeq 0,\quad \lambda_1^2 I_k \preceq \Lambda_1^2 \preceq \lambda_k^2 I_k,\quad \lambda_{k+1}^2 I_{n-k} \preceq \Lambda_2^2 \preceq \lambda_n^2 I_{n-k},
\]

and therefore

\[
\lambda_1^2 (I_k - P_{11}^T P_{11}) \preceq (I_k - P_{11}^T P_{11})^{1/2} \Lambda_1^2 (I_k - P_{11}^T P_{11})^{1/2} \preceq \lambda_k^2 (I_k - P_{11}^T P_{11}),
\]
\[
\lambda_{k+1}^2\, P_{12}^T P_{12} \preceq (P_{12}^T P_{12})^{1/2} \Lambda_2^2 (P_{12}^T P_{12})^{1/2} \preceq \lambda_n^2\, P_{12}^T P_{12}.
\]

So, for the quantity in the middle of (3.15),

\[
\begin{aligned}
\operatorname{trace}(\Lambda_1^2 P_{11}^T P_{11}) + \operatorname{trace}(\Lambda_2^2 P_{12}^T P_{12}) - \sum_{i=1}^{k} \lambda_i^2
&= -\operatorname{trace}\bigl((I_k - P_{11}^T P_{11})\Lambda_1^2\bigr) + \operatorname{trace}(P_{12}\Lambda_2^2 P_{12}^T)\\
&\ge \lambda_{k+1}^2 \operatorname{trace}(P_{12}P_{12}^T) - \lambda_k^2 \operatorname{trace}(I_k - P_{11}^T P_{11})\\
&= \lambda_{k+1}^2 \operatorname{trace}(P_{12}P_{12}^T) - \lambda_k^2 \bigl[k - \operatorname{trace}(P_{11}^T P_{11})\bigr]\\
&= \lambda_{k+1}^2 \operatorname{trace}(P_{12}P_{12}^T) - \lambda_k^2 \operatorname{trace}(P_{12}P_{12}^T)\\
&= (\lambda_{k+1}^2 - \lambda_k^2)\, \|P_{12}\|_F^2\\
&= (\lambda_{k+1}^2 - \lambda_k^2)\, \|\sin\Theta_{M^{-1}}(U, \Phi_1)\|_F^2,
\end{aligned}
\]

where the fourth line uses trace(P_{11}^T P_{11}) = trace(P_{11}P_{11}^T) and P_{11}P_{11}^T + P_{12}P_{12}^T = I_k (the rows of [P_{11}, P_{12}] are orthonormal since P is orthogonal). This proves (3.15a) and thus the first inequality in (3.3). Similarly,

\[
-\operatorname{trace}\bigl((I_k - P_{11}^T P_{11})\Lambda_1^2\bigr) + \operatorname{trace}(P_{12}\Lambda_2^2 P_{12}^T) \le (\lambda_n^2 - \lambda_1^2)\, \|\sin\Theta_{M^{-1}}(U, \Phi_1)\|_F^2,
\]

which proves (3.15b). Noticing γ_1 = cos θ_{M^{-1}}(U, MV), we have from (3.14) and (3.15b) that

\[
\begin{aligned}
\sum_{i=1}^{k} \mu_i^2 - \sum_{i=1}^{k} \lambda_i^2
&\le \frac{\sum_{i=1}^{k} \lambda_i^2 + (\lambda_n^2-\lambda_1^2)\|\sin\Theta_{M^{-1}}(U,\Phi_1)\|_F^2}{\gamma_1^2} - \sum_{i=1}^{k} \lambda_i^2\\
&= \frac{(\lambda_n^2-\lambda_1^2)\|\sin\Theta_{M^{-1}}(U,\Phi_1)\|_F^2}{\gamma_1^2} + \frac{1-\gamma_1^2}{\gamma_1^2}\sum_{i=1}^{k} \lambda_i^2\\
&= \frac{(\lambda_n^2-\lambda_1^2)\|\sin\Theta_{M^{-1}}(U,\Phi_1)\|_F^2}{\cos^2\theta_{M^{-1}}(U, MV)} + \tan^2\theta_{M^{-1}}(U, MV)\, \sum_{i=1}^{k} \lambda_i^2,
\end{aligned}
\]

which proves the second inequality in (3.3). ∎

Remark 3.1. We make several remarks on Theorem 3.1.

(1) Consider first the special case k = 1, for which the lower and upper bounds in Theorem 3.1 reduce to

\[
(\lambda_2^2 - \lambda_1^2) \sin^2\theta_{M^{-1}}(u, \Phi_1) \ \le\ \delta_1 \ \le\ \lambda_1^2 \tan^2\theta_{M^{-1}}(u, Mv) + \frac{\lambda_n^2 - \lambda_1^2}{\cos^2\theta_{M^{-1}}(u, Mv)}\, \sin^2\theta_{M^{-1}}(u, \Phi_1), \tag{3.16}
\]

where we have written u for U and v for V since k = 1. It is interesting to note that in this case the upper bound in (3.16) is sharp, in the sense that when

\[
\lambda_1 \le \lambda_2 = \cdots = \lambda_n, \tag{3.17}
\]

it becomes an equality. This can be seen as follows. Suppose (3.17) holds. Then (3.8) becomes

\[
\mu_1^2 = \lambda_1^2 (\tilde v^T \tilde v)(\tilde u_1^T \tilde u_1) + \lambda_n^2 (\tilde v^T \tilde v)(\tilde u_2^T \tilde u_2), \tag{3.18}
\]

where ũ = [ũ_1^T, ũ_2^T]^T and ṽ are defined as in (3.5) and (3.7). Therefore, noting ũ^T ũ = ũ_1^T ũ_1 + ũ_2^T ũ_2, we have from (3.18) that

\[
\mu_1^2 = \lambda_1^2 (\tilde v^T \tilde v)(\tilde u^T \tilde u) + (\lambda_n^2 - \lambda_1^2)(\tilde v^T \tilde v)(\tilde u_2^T \tilde u_2). \tag{3.19}
\]

Moreover, for k = 1,

\[
(\tilde v^T \tilde v)(\tilde u^T \tilde u) = \frac{1}{\cos^2\theta_{M^{-1}}(u, Mv)}, \qquad (\tilde v^T \tilde v)(\tilde u_2^T \tilde u_2) = \left[ \frac{\sin\theta_{M^{-1}}(u, \Phi_1)}{\cos\theta_{M^{-1}}(u, Mv)}\right]^2,
\]

which, together with (3.19), imply that the second inequality in (3.16) is an equality.

(2) When M = I_n, the LREP reduces to the symmetric eigenvalue problem and Φ = Ψ is orthogonal. Therefore only one of R(U) and R(V) is needed, i.e., U = V, which leads to tan θ_{M^{-1}}(U, MV) = 0 and cos θ_{M^{-1}}(U, MV) = 1, and (3.3) becomes the well-known bound

\[
(\lambda_{k+1}^2 - \lambda_k^2)\|\sin\Theta(U, \Phi_1)\|_F^2 \ \le\ \sum_{i=1}^{k} (\mu_i^2 - \lambda_i^2) \ \le\ (\lambda_n^2 - \lambda_1^2)\|\sin\Theta(U, \Phi_1)\|_F^2 \tag{3.20}
\]

for the symmetric eigenvalue problem [7, 9, 10, 14, 25].
By comparing the lower and upper bounds for δk in Theorem 3.1, one may argue that an unsatisfactory part in the lower bound for δk is that a term in the order of ∥ sin ΘM −1 (U, M V )∥2F is absent because it would be reasonable to expect that ΘM −1 (U, M V ) = 0 if δk = 0. However this is false as demonstrated by Example 3.1 below. Example 3.1. Let K = diag(0, 0, 1) and M = I3 . The eigenvalues of H are ±0, ±0, and ±1, i.e., λ1 = λ2 = 0 and λ3 = 1, and Φ = Ψ = In in (2.9). Consider k = 2 and 1 0 1 0 U = 0 1 and V = 0 1 . (3.21) 0 0 0 1 It can be verified that T
T
U V = I2 , U KU = 0
[ ] 1 0 and V M V = , 0 2 T
which implies that µ1 = µ2 = 0 and thus δk = 0. However, ΘM −1 (U, Φ1 ) = 0,
π ΘM −1 (U, M V ) = diag( , 0), 4
1 (λ23 − λ22 )∥ sin ΘM −1 (U, M V )∥2F = (λ23 − λ22 )∥ sin ΘM −1 (U, M V )∥22 = . 2 This phenomenon that δk = 0 but ∥ sin ΘM −1 (U, M V )∥F ̸= 0 is caused by the indefiniteness of K, and in Theorem 3.3, we will establish a lower bound (see (3.36)) of δk using ∥ sin ΘM −1 (U, M V )∥2F under the assumption that both K and M are definite. So far, we have considered K ≽ 0 and M ≻ 0. It is not difficult to state a version for K ≻ 0 and M ≽ 0, by swapping the roles of K and M . In fact, when K ≻ 0 and M ≽ 0, we have, instead of (2.9), b 2Φ bT , K = ΨbΨbT , M = ΦΛ (3.22) b = Ψb− T . Let (cf. (2.15)) where Ψb ∈ Rn×n is nonsingular, Λ = diag(λ1 , λ2 , . . . , λn ) and Φ b1 = Φ b(:,1:k) Φ
and Ψb1 = Ψb(:,1:k) .
Remark 3.2. In the case when K ≻ 0 and M ≻ 0, if λk < λk+1 , then b1 ) R(Φ1 ) = R(Φ
and
R(Ψ1 ) = R(Ψb1 ).
In fact, for the case H is diagonalizable (see Theorem 2.1) and ] [ [ ] Ψ1 Λ 1 Ψb1 , b1 Λ1 Φ1 Φ
14
(3.23)
are two different basis matrices for the eigenspace of H associated with the eigenvalues λi for 1 ≤ i ≤ k which are different from the rest of the eigenvalues of H, where Λ1 = Λ(1:k,1:k) . So the eigenspace is unique and thus there exists a nonsingular Q ∈ Rk×k such that ] [ ] [ Ψ1 Λ 1 Ψb1 = b Q Φ1 Φ1 Λ1 b1 ) and R(Ψ1 ) = R(Ψb1 ). An implication of this remark is that which implies R(Φ1 ) = R(Φ b1 and Ψ1 from Ψb1 . there is no need to distinguish Φ1 from Φ Theorem 3.2. Suppose that K ≻ 0 and M ≽ 0. Let {R(U ), R(V )} be an approximate pair b1 ), R(Ψb1 )}, where U, V ∈ Rn×k satisfying rank(U T V ) = k, and let the nonnegative to {R(Φ eigenvalues of HSR given by (2.14). Then for the δk given by (3.2), it holds that (λ2k+1 − λ2k )∥ sin ΘK −1 (V, Ψb1 )∥2F ≤ δk ≤
k ∑
λ2i · tan2 θK −1 (V, KU )
i=1
+
λ2n − λ21 ∥ sin ΘK −1 (V, Ψb1 )∥2F . cos2 θK −1 (V, KU )
As a result,
√ ∥ sin ΘK −1 (V, Ψb1 )∥F ≤
δk . − λ2k
λ2k+1
(3.24)
(3.25)
Suppose K ≻ 0 and M ≻ 0. The inequalities (3.4) and (3.25) implies that if δk = 0 and if the gap ηk := λ2k+1 − λ2k > 0, (3.26) then the two sines in (2.19a) are zeros. By Theorem 2.2, all sines in (2.19) are zeros in the case, but it does not lead to quantitative upper bounds on some of the angles, most notably on sin ΘM −1 (U, M V ) and sin ΘK −1 (V, KU ). Lemma 3.1. Suppose K ≻ 0 and M ≻ 0, and let U, V ∈ Rn×k satisfying rank(U T V ) = k. Then for any unitarily invariant norm ∥ · ∥ui ∥sin ΘM −1 (U, M V )∥ui ≤ ∥sin ΘM −1 (U, Φ1 )∥ui + κ ∥sin ΘK −1 (V, Ψ1 )∥ui ,
(3.27)
∥sin ΘK −1 (V, KU )∥ui ≤ ∥sin ΘK −1 (V, Ψ1 )∥ui + κ ∥sin ΘM −1 (U, Φ1 )∥ui ,
(3.28)
where κ = λn /λ1 . Proof. Recall (3.5) and (3.6). Noting K −1 = ΦΛ−2 ΦT , we also have ΘK −1 (V, Ψ1 ) = Θ(Λ−1 ΦT V, Λ−1 ΦT Ψ1 ) = Θ(Λ−1 Ve , I(:,1:k) ), e , ΦT V ) = Θ(U e , Ve ), ΘM −1 (U, M V ) = Θ(Ψ T U, Ψ T M V ) = Θ(U
(3.29b)
e ). ΘK −1 (V, KU ) = Θ(Λ−1 ΦT V, Λ−1 ΦT KU ) = Θ(Λ−1 Ve , ΛU
(3.29c)
15
(3.29a)
e and Λ as in (3.7) and Ve accordingly as Partition U [ Ve =
k n−k
k
Ve1 Ve2
] .
(3.30)
For any unitarily invariant norm ∥ · ∥ui , we have
e , Ve ) e , I(:,1:k) )
sin Θ(U
≤ sin Θ(U
+ sin Θ(I(:,1:k) , Ve ) , ui ui
ui
−1 e −1 e e e )
sin Θ(Λ V , ΛU ) ≤ sin Θ(Λ V , I(:,1:k) ) + sin Θ(I(:,1:k) , ΛU
. ui
ui
ui
(3.31a) (3.31b)
We need to relate Θ(Λ−1 Ve , I(:,1:k) ) to Θ(Ve , I(:,1:k) ). Without loss of generality, we may normalize V from the right so that Ve T Ve = Ik . We know that sin Θ(I(:,1:k) , Ve ) and sin Θ(Λ−1 Ve , I(:,1:k) ) consist of the singular values of Ve2 ,
e e T −2 e −1/2 Λ−1 2 V2 (V Λ V )
respectively. Denote their singular values by α1 ≥ · · · ≥ αk and β1 ≥ · · · ≥ βk , respectively. It can be verified that
e e T −1 λ21 Λ−1 2 V2 V2 Λ2
λ21 Ik ≼ (Ve T Λ−2 Ve )−1 ≼ λ2n Ik , ≼ Λ−1 Ve2 (Ve T Λ−2 Ve )−1 Ve2T Λ−1 ≼ λ2n Λ−1 Ve2 Ve2T Λ−1 , 2
2
2
2
2 e eT e (1/λ2n )Ve2T Ve2 ≼ Ve2T Λ−2 2 V2 ≼ (1/λk+1 )V2 V2 .
These matrix inequalities imply λ1 λn αi ≤ βi ≤ αi λn λk+1 which yields
λ1 λn
sin Θ(Ve , I(:,1:k) ) ≤ sin Θ(Λ−1 Ve , I(:,1:k) ) ≤
sin Θ(Ve , I(:,1:k) ) . (3.32) λn λk+1 ui ui ui Combine (3.6), (3.29b), (3.31a), and (3.32) to get (3.27). e ) to Θ(I(:,1:k) , U e ). Without loss of generality, we may norWe now relate Θ(I(:,1:k) , ΛU e TU e = Ik . We know sin Θ(I(:,1:k) , U e ) and sin Θ(I(:,1:k) , ΛU e) malize U from the right so that U consist of the singular values of e2 , U
e2 (U e T Λ2 U e )−1/2 Λ2 U
respectively. Denote their singular values by α ˆ1 ≥ · · · ≥ α ˆ k and βˆ1 ≥ · · · ≥ βˆk , respectively. It can be verified that e T Λ2 U e )−1 ≼ (1/λ21 )Ik , (1/λ2n )Ik ≼ (U 16
e2 U e T Λ2 ≼ Λ2 U e2 (U e T Λ2 U e )−1 U e T Λ2 ≼ (1/λ2 )Λ2 U e2 U e T Λ2 , (1/λ2n )Λ2 U 2 2 1 2 2 Te T 2e 2 eT e e e λ U2 U2 ≼ U2 Λ2 U2 ≼ λn U2 U2 . k+1
These matrix inequalities implies λk+1 λn α ˆ i ≤ βˆi ≤ α ˆi λn λ1 which yields
λk+1 λn
e e e
sin Θ(I(:,1:k) , U ) ≤ sin Θ(I(:,1:k) , ΛU ) ≤
sin Θ(I(:,1:k) , U ) . λn λ ui ui ui 1
(3.33)
Combine (3.29a), (3.29c), (3.31b), and (3.33) to get (3.28). Theorem 3.3. Add to the conditions of Theorem 3.1 and Theorem 3.2 that K ≻ 0 and M ≻ 0. If ηk = λ2k+1 − λ2k > 0, then √ ∥ sin ΘM −1 (U, M V )∥F ≤ (κ + 1) √ ∥ sin ΘK −1 (V, KU )∥F ≤ (κ + 1)
δk , ηk
(3.34)
δk , ηk
(3.35)
where κ = λn /λ1 . Remark 3.3. While (3.4), (3.25), (3.34), and (3.35) are naturally interpreted as providing upper bounds on various canonical angles between interested subspaces, each of them can also be understood to yield a lower bound on δk , e.g., by (3.34), we have δk ≥
ηk ∥ sin ΘM −1 (U, M V )∥2F . (1 + κ)2
(3.36)
We omit the rest. Upper bounds on δk come from the second inequalities in (3.3) and (3.24).
4
Extension to the more general case
In this section, we discuss how to extend our results in section 3 to the following more general linear response eigenvalue problem: [ ][ ] [ ][ ] [ ] K y E+ y y Hzz := =λ =: λE , (4.1) M x E− x x T = E ∈ Rn×n are nonsingular and K and M have the same property as before, where E+ − i.e., K, M ∈ Rn×n are symmetric and at least positive semidefinite. It is a generalized eigenvalue problem for the matrix pencil H − λE and has been discussed in [1, 5, 12, 13].
Decompose E₋ᵀ = E₊ as

    E₋ᵀ = E₊ = C Dᵀ,

where C, D ∈ ℝ^{n×n} are nonsingular. The eigenvalue problem (4.1) can be equivalently transformed to [1]

    H̃ z̃ := [0  K̃; M̃  0] [ỹ; x̃] = λ [ỹ; x̃],    (4.2)

returning in form to the standard LREP (1.6), where

    K̃ = C⁻¹ K C⁻ᵀ,   M̃ = D⁻¹ M D⁻ᵀ,   [ỹ; x̃] = Γᵀ [y; x],   and   Γ = [D  0; 0  C].    (4.3)
Although the transformation (4.3) equivalently converts the general case to the original form (1.6), it is of significance only in theory, because in practice K, M, and E± may not be available explicitly; they may be accessible only through matrix-vector multiplications. By Theorem 2.1, there exist nonsingular Φ̃, Ψ̃ ∈ ℝ^{n×n} such that

    K̃ = Ψ̃ Λ² Ψ̃ᵀ,   M̃ = Φ̃ Φ̃ᵀ,   Φ̃ᵀ Ψ̃ = I_n,    (4.4)

where Λ = diag(λ₁, λ₂, ..., λ_n) and −λ_n ≤ ⋯ ≤ −λ₁ ≤ λ₁ ≤ ⋯ ≤ λ_n are the eigenvalues of (4.1) (cf. (4.2) and (4.3)). Let

    Φ = C⁻ᵀ Φ̃,   Ψ = D⁻ᵀ Ψ̃.

The decompositions in (4.4) become

    K Φ = E₊ Ψ Λ²,   M Ψ = E₋ Φ,   Φᵀ E₊ Ψ = I_n.    (4.5)
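As a quick numerical sanity check of the structure in (4.4) (our own illustration, not from the paper; the prescribed Λ and the randomly shifted Φ̃ below are arbitrary choices), one can build K̃ and M̃ from the decomposition and confirm that H̃ in (4.2) has eigenvalues ±λᵢ:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
lam = np.array([1.0, 2.0, 3.0])                        # prescribed Lambda (assumption)
Phi_t = rng.standard_normal((n, n)) + 3.0 * np.eye(n)  # a generic nonsingular Phi~
Psi_t = np.linalg.inv(Phi_t).T                         # enforces Phi~^T Psi~ = I_n
K_t = Psi_t @ np.diag(lam**2) @ Psi_t.T                # K~ = Psi~ Lambda^2 Psi~^T
M_t = Phi_t @ Phi_t.T                                  # M~ = Phi~ Phi~^T
H_t = np.block([[np.zeros((n, n)), K_t],
                [M_t, np.zeros((n, n))]])              # H~ of (4.2)
eigs = np.sort(np.linalg.eigvals(H_t).real)            # should be -3,-2,-1,1,2,3
```

The eigenvalues are real because K̃ M̃ is similar to the symmetric positive definite matrix M̃^{1/2} K̃ M̃^{1/2}.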
The notion of the pair of deflating subspaces for H − λE was also introduced: for two k-dimensional subspaces R(U) and R(V) of ℝⁿ, we call {R(U), R(V)} a pair of deflating subspaces of H − λE if

    K R(U) ⊆ E₊ R(V)   and   M R(V) ⊆ E₋ R(U).

This implies that there exist K_R, M_R ∈ ℝ^{k×k} such that

    K U = E₊ V K_R   and   M V = E₋ U M_R,

or equivalently,

    H [V  0; 0  U] = E [V  0; 0  U] H_R,   where   H_R := [0  K_R; M_R  0].

Furthermore, it is shown [1, Theorem 2.3] that if (λ, [y; x]) is an eigenpair of H_R, then (λ, [V y; U x]) is an eigenpair of (4.1), and accordingly, (λ, [Dᵀ V y; Cᵀ U x]) is an eigenpair of (4.2). Recall (4.5), and let

    Φ₁ = Φ_(:,1:k),   Ψ₁ = Ψ_(:,1:k),   Λ₁ = Λ_(1:k,1:k).

Then

    K Φ₁ = E₊ Ψ₁ Λ₁²,   M Ψ₁ = E₋ Φ₁.

So {R(Φ₁), R(Ψ₁)} is a pair of deflating subspaces of H − λE. The associated

    H_R = [0  Λ₁²; I_k  0]

has eigenvalues ±λᵢ for 1 ≤ i ≤ k. Now, let {R(U), R(V)} be an approximate pair to {R(Φ₁), R(Ψ₁)}. It is shown that the k smallest positive eigenvalues of

    H_SR := [0  W₁⁻ᵀ Uᵀ K U W₁⁻¹; W₂⁻ᵀ Vᵀ M V W₂⁻¹  0]    (4.6)

are the best approximations to λ₁ ≤ ⋯ ≤ λ_k by the pair {R(U), R(V)} in the sense specified there [1, Theorem 2.5], where the nonsingular W₁, W₂ ∈ ℝ^{k×k} come from decomposing Uᵀ E₊ V as Uᵀ E₊ V = W₁ᵀ W₂.

Theorem 4.1. Assume that M is definite. Let {R(U), R(V)} be an approximate pair to the pair of deflating subspaces {R(Φ₁), R(Ψ₁)} of H − λE, where U, V ∈ ℝ^{n×k} and rank(Uᵀ E₊ V) = k. Denote the eigenvalues of H_SR in (4.6) by −μ_k ≤ ⋯ ≤ −μ₁ ≤ μ₁ ≤ ⋯ ≤ μ_k, and let δ_k = Σ_{i=1}^k (μᵢ² − λᵢ²). Then
    (λ²_{k+1} − λ²_k) ‖ sin Θ_{M⁻¹}(E₋U, E₋Φ₁) ‖²_F ≤ δ_k
        ≤ Σ_{i=1}^k λᵢ² · tan²θ_{M⁻¹}(E₋U, MV) + (λ_n² − λ₁²)/cos²θ_{M⁻¹}(E₋U, MV) · ‖ sin Θ_{M⁻¹}(E₋U, E₋Φ₁) ‖²_F .    (4.7)

As a result,

    ‖ sin Θ_{M⁻¹}(E₋U, E₋Φ₁) ‖_F ≤ √( δ_k / (λ²_{k+1} − λ²_k) ).    (4.8)
Proof. By the equivalence between (4.1) and (4.2), we can apply Theorem 3.1 to (4.2) to get

    (λ²_{k+1} − λ²_k) ‖ sin Θ_{M̃⁻¹}(CᵀU, CᵀΦ₁) ‖²_F ≤ δ_k
        ≤ Σ_{i=1}^k λᵢ² · tan²θ_{M̃⁻¹}(CᵀU, M̃ DᵀV) + (λ_n² − λ₁²)/cos²θ_{M̃⁻¹}(CᵀU, M̃ DᵀV) · ‖ sin Θ_{M̃⁻¹}(CᵀU, CᵀΦ₁) ‖²_F .

Because

    Θ_{M̃⁻¹}(CᵀU, CᵀΦ₁) = Θ_{Dᵀ Y Yᵀ D}(CᵀU, CᵀΦ₁)
        = Θ(Yᵀ D CᵀU, Yᵀ D CᵀΦ₁) = Θ(Yᵀ E₋U, Yᵀ E₋Φ₁) = Θ_{M⁻¹}(E₋U, E₋Φ₁),    (4.9)
    Θ_{M̃⁻¹}(CᵀU, M̃ DᵀV) = Θ(Yᵀ D CᵀU, Yᵀ D M̃ DᵀV)
        = Θ(Yᵀ E₋U, Yᵀ M V) = Θ_{M⁻¹}(E₋U, MV),    (4.10)

the assertion follows.

Remark 4.1. By a similar argument, Theorems 3.2 and 3.3 can also be generalized. We omit the details.
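To illustrate the structured projection (4.6) computationally, here is a hedged sketch (our own, not from the paper): `structured_rayleigh_ritz` is a hypothetical helper, and the particular choice W₁ = I_k, W₂ = Uᵀ E₊ V is only one admissible factorization Uᵀ E₊ V = W₁ᵀ W₂, an assumption of this sketch.

```python
import numpy as np

def structured_rayleigh_ritz(K, M, Eplus, U, V):
    """Positive eigenvalues mu_1 <= ... <= mu_k of H_SR in (4.6).
    Illustrative factorization (an assumption of this sketch):
    W1 = I_k and W2 = U^T E+ V, so that U^T E+ V = W1^T W2."""
    G = U.T @ Eplus @ V
    k = G.shape[0]
    W1, W2 = np.eye(k), G
    A = np.linalg.solve(W1.T, U.T @ K @ U) @ np.linalg.inv(W1)  # W1^{-T} U^T K U W1^{-1}
    B = np.linalg.solve(W2.T, V.T @ M @ V) @ np.linalg.inv(W2)  # W2^{-T} V^T M V W2^{-1}
    # H_SR = [0 A; B 0] has eigenvalues +/- mu_i with mu_i^2 = eig(A B)
    mu2 = np.linalg.eigvals(A @ B).real
    return np.sort(np.sqrt(np.maximum(mu2, 0.0)))
```

When {R(U), R(V)} is an exact pair of deflating subspaces, e.g., U = Φ₁ and V = Ψ₁, the returned μᵢ reproduce λ₁, ..., λ_k.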
5 Conclusions
As an important notion in the linear response eigenvalue problem, the pair of deflating subspaces for {K, M} is not only a vital object in analyzing theoretical properties such as the subspace version of the Thouless minimization principle and the Cauchy-like interlacing inequalities, but also leads to very efficient algorithms (e.g., the 4D-search LOBPCG [3], a block Chebyshev-Davidson method [19], as well as the generalized Lanczos method [18]). Relatedly, estimating the accuracy of approximations to the pair of deflating subspaces {R(Φ₁), R(Ψ₁)} also becomes important. In this paper, following the Rayleigh-Ritz approximation theory for Ritz values and Ritz vectors of the symmetric eigenvalue problem, we have established similar approximation results for the eigenvalues λ₁ ≤ ⋯ ≤ λ_k as well as for the pair of deflating subspaces {R(Φ₁), R(Ψ₁)}. These estimates not only reveal some properties of the LREP but are also useful in analyzing certain iterative methods.
A  A slightly sharper bound for δ_k
In this appendix, we present a slightly sharper but more involved bound for δ_k than that of Theorem 3.1. The following well-known von Neumann trace inequality [24] is used.

Lemma A.1. If Z, W are complex n × n matrices with singular values

    ρ₁ ≤ ⋯ ≤ ρ_n,   ϱ₁ ≤ ⋯ ≤ ϱ_n,

respectively, then

    | trace(ZW) | ≤ Σ_{i=1}^n ρᵢ ϱᵢ .

As a straightforward consequence of von Neumann's trace inequality, if Z and W are positive semidefinite, then

    trace(ZW) ≤ ‖Z‖₂ · trace(W).    (A.1)
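A quick numerical check of Lemma A.1 and (A.1) (our own illustration; the random positive semidefinite test matrices below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((4, 4)); Z = Z @ Z.T   # symmetric positive semidefinite
W = rng.standard_normal((4, 4)); W = W @ W.T
# Lemma A.1: |trace(ZW)| <= sum of like-sorted singular-value products
rho = np.sort(np.linalg.svd(Z, compute_uv=False))     # ascending
varrho = np.sort(np.linalg.svd(W, compute_uv=False))  # ascending
von_neumann_bound = float(np.sum(rho * varrho))
# (A.1): for positive semidefinite Z, W, trace(ZW) <= ||Z||_2 * trace(W)
lhs = float(np.trace(Z @ W))
rhs = float(np.linalg.norm(Z, 2) * np.trace(W))
```

Both bounds hold for any positive semidefinite Z and W, independent of the random draw.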
Theorem A.1. Assume that M is definite. Let {R(U), R(V)} be any approximate pair to {R(Φ₁), R(Ψ₁)} with U, V ∈ ℝ^{n×k} and rank(UᵀV) = k, and denote the eigenvalues of H_SR in (2.14) by −μ_k ≤ ⋯ ≤ −μ₁ ≤ μ₁ ≤ ⋯ ≤ μ_k. Then

    ψ₁ + ψ₂ ≤ Σ_{i=1}^k (μᵢ² − λᵢ²) ≤ χ₁ + χ₂,    (A.2)
where

    χ₁ = Σ_{i=1}^k λᵢ² · tan²θ_{M⁻¹}(U, MV),    (A.3)

    χ₂ = min{ (k λ_n² − Σ_{i=1}^k λᵢ²) [ sin θ_{M⁻¹}(U, Φ₁) / cos θ_{M⁻¹}(U, MV) ]² ,
              (λ_n² − λ₁²) [ ‖ sin Θ_{M⁻¹}(U, Φ₁) ‖_F / cos θ_{M⁻¹}(U, MV) ]² },    (A.4)
and

    ψ₁ = Σ_{i=1}^k λᵢ² · tan²θ_{M⁻¹;k}(U, MV),    (A.5)

    ψ₂ = max{ (k λ²_{k+1} − Σ_{i=1}^k λᵢ²) [ sin θ_{M⁻¹;k}(U, Φ₁) / cos θ_{M⁻¹;k}(U, MV) ]² ,
              (λ²_{k+1} − λ²_k) [ ‖ sin Θ_{M⁻¹}(U, Φ₁) ‖_F / cos θ_{M⁻¹;k}(U, MV) ]² }.    (A.6)

Proof. Follow the proof of Theorem 3.1 up to (3.13) to get (3.11) and
    ‖ sin Θ_{M⁻¹}(U, Φ₁) ‖_ui = ‖ sin Θ(Ũ, I_(:,1:k)) ‖_ui = ‖P₁₂‖_ui,    (A.7a)
    ‖ cos Θ_{M⁻¹}(U, Φ₁) ‖_ui = ‖ cos Θ(Ũ, I_(:,1:k)) ‖_ui = ‖P₁₁‖_ui,    (A.7b)

for any unitarily invariant norm ‖·‖_ui. The singular values of P₁₁ are

    νᵢ = cos θ_{M⁻¹;i}(U, Φ₁)   for 1 ≤ i ≤ k.    (A.8)
Note 0 ≤ ν₁ ≤ ⋯ ≤ ν_k. We now bound the two terms on the right-hand side of (3.11) from above. First, by the von Neumann trace inequality in Lemma A.1, we have

    trace(Λ₁² Ũ₁ Ũ₁ᵀ) ≤ Σ_{i=1}^k λᵢ² ζᵢ²,    (A.9)

where 0 ≤ ζ₁ ≤ ⋯ ≤ ζ_k are the singular values of Ũ₁. Note that ζᵢ² is also the i-th smallest eigenvalue of Ũ₁ᵀ Ũ₁, i.e.,

    ζᵢ² = λᵢ(Ũ₁ᵀ Ũ₁)
        = { λᵢ(Γ⁻¹ P₁₁ᵀ P₁₁ Γ⁻¹),                                      if 2k ≤ n,
          { λᵢ([Γ⁻¹  0; 0  I_{2k−n}] P₁₁ᵀ P₁₁ [Γ⁻¹  0; 0  I_{2k−n}]),  if 2k > n,
        ≤ νᵢ² ‖Γ⁻¹‖₂²,    (A.10)

where λᵢ(·) denotes the i-th smallest eigenvalue of a symmetric matrix. Thus, by (A.9), (A.8), (A.10), and (3.10), we have

    trace(Λ₁² Ũ₁ Ũ₁ᵀ) ≤ Σ_{i=1}^k λᵢ² cos²θ_{M⁻¹;i}(U, Φ₁) / cos²θ_{M⁻¹}(U, MV)
                      = Σ_{i=1}^k λᵢ² [1 − sin²θ_{M⁻¹;i}(U, Φ₁)] / cos²θ_{M⁻¹}(U, MV).    (A.11)
Analogously, for trace(Λ₂² Ũ₂ Ũ₂ᵀ), we have by (A.1)

    trace(Λ₂² Ũ₂ Ũ₂ᵀ) ≤ λ_n² · trace(Ũ₂ᵀ Ũ₂)
        = { λ_n² · trace(Γ⁻¹ P₁₂ᵀ P₁₂ Γ⁻¹),                                      if 2k ≤ n,
          { λ_n² · trace([Γ⁻¹  0; 0  I_{2k−n}] P₁₂ᵀ P₁₂ [Γ⁻¹  0; 0  I_{2k−n}]),  if 2k > n,
        ≤ λ_n² ‖P₁₂‖²_F · ‖Γ⁻¹‖₂²
        = λ_n² · Σ_{i=1}^k sin²θ_{M⁻¹;i}(U, Φ₁) / cos²θ_{M⁻¹}(U, MV),    (A.12)
where the last equality is from (A.7) and (3.10). Combining (A.11), (A.12), and (3.11), we finally have

    Σ_{i=1}^k (μᵢ² − λᵢ²)
        ≤ Σ_{i=1}^k λᵢ² [1 − sin²θ_{M⁻¹;i}(U, Φ₁) − cos²θ_{M⁻¹}(U, MV)] / cos²θ_{M⁻¹}(U, MV)
          + λ_n² · Σ_{i=1}^k sin²θ_{M⁻¹;i}(U, Φ₁) / cos²θ_{M⁻¹}(U, MV)
        = Σ_{i=1}^k λᵢ² · tan²θ_{M⁻¹}(U, MV) + Σ_{i=1}^k (λ_n² − λᵢ²) sin²θ_{M⁻¹;i}(U, Φ₁) / cos²θ_{M⁻¹}(U, MV)
        ≤ Σ_{i=1}^k λᵢ² · tan²θ_{M⁻¹}(U, MV)
          + { Σ_{i=1}^k (λ_n² − λᵢ²) · [ sin θ_{M⁻¹}(U, Φ₁) / cos θ_{M⁻¹}(U, MV) ]² ,
            { (λ_n² − λ₁²) · [ ‖ sin Θ_{M⁻¹}(U, Φ₁) ‖_F / cos θ_{M⁻¹}(U, MV) ]² ,
which yields the upper bound in (A.2).

Now, we prove the lower bound for δ_k based on (3.11). For the first term on the right-hand side of (3.11), by [2, Lemma A.2], we have

    trace(Λ₁² Ũ₁ Ũ₁ᵀ) ≥ Σ_{i=1}^k λᵢ² ζ²_{k−i+1}.    (A.13)

Note that ζ²_{k−i+1} is also the i-th largest eigenvalue of Ũ₁ᵀ Ũ₁, i.e.,

    ζ²_{k−i+1} = λ_{k−i+1}(Ũ₁ᵀ Ũ₁)
        = { λ_{k−i+1}(Γ⁻¹ P₁₁ᵀ P₁₁ Γ⁻¹),                                      if 2k ≤ n,
          { λ_{k−i+1}([Γ⁻¹  0; 0  I_{2k−n}] P₁₁ᵀ P₁₁ [Γ⁻¹  0; 0  I_{2k−n}]),  if 2k > n,
        ≥ { ν²_{k−i+1} / ‖Γ‖₂²,   if 2k ≤ n,
          { ν²_{k−i+1},           if 2k > n.    (A.14)

From (3.9), for 2k ≤ n, we know that

    ‖Γ‖₂ = cos θ_{M⁻¹;k}(U, MV),    (A.15)

and moreover, by (A.8),

    ν_{k−i+1} = cos θ_{M⁻¹;k−i+1}(U, Φ₁).    (A.16)

Thus, by (A.14), (A.15), and (A.16), (A.13) leads to

    trace(Λ₁² Ũ₁ Ũ₁ᵀ) ≥ { Σ_{i=1}^k λᵢ² cos²θ_{M⁻¹;k−i+1}(U, Φ₁) / cos²θ_{M⁻¹;k}(U, MV),   if 2k ≤ n,
                        { Σ_{i=1}^k λᵢ² cos²θ_{M⁻¹;k−i+1}(U, Φ₁),                           if 2k > n.    (A.17)

On the other hand, for the second term in (3.11), we have

    trace(Λ₂² Ũ₂ Ũ₂ᵀ) ≥ λ²_{k+1} · trace(Ũ₂ᵀ Ũ₂)
        = { λ²_{k+1} · trace(Γ⁻¹ P₁₂ᵀ P₁₂ Γ⁻¹),                                      if 2k ≤ n,
          { λ²_{k+1} · trace([Γ⁻¹  0; 0  I_{2k−n}] P₁₂ᵀ P₁₂ [Γ⁻¹  0; 0  I_{2k−n}]),  if 2k > n,
        ≥ { λ²_{k+1} · ‖P₁₂‖²_F / ‖Γ‖₂²,   if 2k ≤ n,
          { λ²_{k+1} · ‖P₁₂‖²_F,           if 2k > n,
        = { λ²_{k+1} · Σ_{i=1}^k sin²θ_{M⁻¹;k−i+1}(U, Φ₁) / cos²θ_{M⁻¹;k}(U, MV),   if 2k ≤ n,
          { λ²_{k+1} · Σ_{i=1}^k sin²θ_{M⁻¹;k−i+1}(U, Φ₁),                          if 2k > n,    (A.18)
where the last equality is based on (A.15) and (A.7). Consequently, by combining (A.18) and (A.17), we have, for 2k ≤ n,

    δ_k ≥ Σ_{i=1}^k λᵢ² [cos²θ_{M⁻¹;k−i+1}(U, Φ₁) − cos²θ_{M⁻¹;k}(U, MV)] / cos²θ_{M⁻¹;k}(U, MV)
          + λ²_{k+1} · Σ_{i=1}^k sin²θ_{M⁻¹;k−i+1}(U, Φ₁) / cos²θ_{M⁻¹;k}(U, MV)
        = Σ_{i=1}^k λᵢ² [sin²θ_{M⁻¹;k}(U, MV) − sin²θ_{M⁻¹;k−i+1}(U, Φ₁)] / cos²θ_{M⁻¹;k}(U, MV)
          + λ²_{k+1} · Σ_{i=1}^k sin²θ_{M⁻¹;k−i+1}(U, Φ₁) / cos²θ_{M⁻¹;k}(U, MV)
        = Σ_{i=1}^k λᵢ² · tan²θ_{M⁻¹;k}(U, MV) + Σ_{i=1}^k (λ²_{k+1} − λᵢ²) sin²θ_{M⁻¹;k−i+1}(U, Φ₁) / cos²θ_{M⁻¹;k}(U, MV)
        ≥ Σ_{i=1}^k λᵢ² · tan²θ_{M⁻¹;k}(U, MV)
          + { [Σ_{i=1}^k (λ²_{k+1} − λᵢ²)] sin²θ_{M⁻¹;k}(U, Φ₁) / cos²θ_{M⁻¹;k}(U, MV),
            { (λ²_{k+1} − λ²_k) / cos²θ_{M⁻¹;k}(U, MV) · ‖ sin Θ_{M⁻¹}(U, Φ₁) ‖²_F,
which yields the lower bound of δ_k in (A.2) for the case 2k ≤ n. Lastly, if 2k > n, from (A.18) and (A.17), it follows that

    δ_k ≥ Σ_{i=1}^k λᵢ² [cos²θ_{M⁻¹;k−i+1}(U, Φ₁) − 1] + λ²_{k+1} · Σ_{i=1}^k sin²θ_{M⁻¹;k−i+1}(U, Φ₁)
        = Σ_{i=1}^k (λ²_{k+1} − λᵢ²) sin²θ_{M⁻¹;k−i+1}(U, Φ₁)
        ≥ (λ²_{k+1} − λ²_k) · ‖ sin Θ_{M⁻¹}(U, Φ₁) ‖²_F .

When 2k > n, rank(Uᵀ Φ₂) ≤ n − k < k, and thus θ_{M⁻¹;k}(U, Φ₁) = 0, which implies that

    ψ₁ + ψ₂ = (λ²_{k+1} − λ²_k) · ‖ sin Θ_{M⁻¹}(U, Φ₁) ‖²_F .

In other words, the lower bounds for the cases 2k ≤ n and 2k > n can be combined as in (A.2), and the proof is completed.

We finally remark that, as we did in Theorem 3.2, the above theorem can also be applied with the roles of K and M swapped to yield a counterpart of Theorem 3.2. The details are omitted.
References

[1] Z. Bai and R.-C. Li. Minimization principle for linear response eigenvalue problem, III: General case. Technical Report 2013-01, Department of Mathematics, University of Texas at Arlington, May 2013. Available at http://www.uta.edu/math/preprint/.
[2] Zhaojun Bai and Ren-Cang Li. Minimization principle for linear response eigenvalue problem, I: Theory. SIAM J. Matrix Anal. Appl., 33(4):1075–1100, 2012.
[3] Zhaojun Bai and Ren-Cang Li. Minimization principles for the linear response eigenvalue problem II: Computation. SIAM J. Matrix Anal. Appl., 34(2):392–416, 2013.
[4] P. Benner, V. Mehrmann, and H. Xu. Perturbation analysis for the eigenvalue problem of a formal product of matrices. BIT, 42(1):1–43, 2002.
[5] U. Flaschka, W.-W. Lin, and J.-L. Wu. A KQZ algorithm for solving linear-response eigenvalue equations. Linear Algebra Appl., 165:93–123, 1992.
[6] R. Granat, B. Kågström, and D. Kressner. Computing periodic deflating subspaces associated with a specified set of eigenvalues. BIT, 43(1):1–18, 2003.
[7] A. V. Knyazev. Sharp a priori error estimates of the Rayleigh-Ritz method without assumptions of fixed sign or compactness. Math. Notes, 38:998–1002, 1986.
[8] Andrew V. Knyazev and Merico E. Argentati. Principal angles between subspaces in an A-based scalar product: Algorithms and perturbation estimates. SIAM J. Matrix Anal. Appl., 23(6):2008–2040, 2002.
[9] J. Kovač-Striko and K. Veselić. Some remarks on the spectra of Hermitian matrices. Linear Algebra Appl., 145:221–229, 1991.
[10] Ren-Cang Li. Accuracy of computed eigenvectors via optimizing a Rayleigh quotient. BIT, 44(3):585–593, 2004.
[11] W.-W. Lin, P. van Dooren, and Q.-F. Xu. Equivalent characterizations of periodical invariant subspaces. NCTS Preprints Series 1998-8, National Center for Theoretical Sciences, Math. Division, National Tsing Hua University, Hsinchu, Taiwan, 1998.
[12] J. Olsen, H. J. Aa. Jensen, and P. Jørgensen. Solution of the large matrix equations which occur in response theory. J. Comput. Phys., 74(2):265–282, 1988.
[13] J. Olsen and P. Jørgensen. Linear and nonlinear response functions for an exact state and for an MCSCF state. J. Chem. Phys., 82(7):3235–3264, 1985.
[14] E. Ovtchinnikov. Cluster robust error estimates for the Rayleigh-Ritz approximation I: Estimates for invariant subspaces. Linear Algebra Appl., 415(1):167–187, 2006.
[15] D. Rocca, Z. Bai, R.-C. Li, and G. Galli. A block variational procedure for the iterative diagonalization of non-Hermitian random-phase approximation matrices. J. Chem. Phys., 136:034111, 2012.
[16] Dario Rocca. Time-Dependent Density Functional Perturbation Theory: New Algorithms with Applications to Molecular Spectra. PhD thesis, The International School for Advanced Studies, Trieste, Italy, 2007.
[17] G. W. Stewart and Ji-Guang Sun. Matrix Perturbation Theory. Academic Press, Boston, 1990.
[18] Zhongming Teng and Ren-Cang Li. Convergence analysis of Lanczos-type methods for the linear response eigenvalue problem. J. Comput. Appl. Math., 247:17–33, 2013.
[19] Zhongming Teng, Yunkai Zhou, and Ren-Cang Li. A block Chebyshev-Davidson method for linear response eigenvalue problems. Technical Report 2013-11, Department of Mathematics, University of Texas at Arlington, September 2013. Available at http://www.uta.edu/math/preprint/, submitted.
[20] D. J. Thouless. Vibrational states of nuclei in the random phase approximation. Nuclear Physics, 22(1):78–95, 1961.
[21] D. J. Thouless. The Quantum Mechanics of Many-Body Systems. Academic Press, 1972.
[22] E. V. Tsiper. Variational procedure and generalized Lanczos recursion for small-amplitude classical oscillations. JETP Letters, 70(11):751–755, 1999.
[23] E. V. Tsiper. A classical mechanics technique for quantum linear response. J. Phys. B: At. Mol. Opt. Phys., 34(12):L401–L407, 2001.
[24] John von Neumann. Some matrix-inequalities and metrization of matrix-space. Tomsk Univ. Rev., 1:286–300, 1937.
[25] Hans F. Weinberger. Variational Methods for Eigenvalue Approximation, volume 15 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, 1974.