Convergence of the Block Lanczos Method for Eigenvalue Clusters

Ren-Cang Li Lei-Hong Zhang

Technical Report 2013-03 http://www.uta.edu/math/preprint/

Convergence of the Block Lanczos Method for Eigenvalue Clusters

Ren-Cang Li*        Lei-Hong Zhang†

April 26, 2013; revised January 7, 2014

Abstract

The Lanczos method is often used to solve a large scale symmetric matrix eigenvalue problem. It is well-known that the single-vector Lanczos method can only find one copy of any multiple eigenvalue and encounters slow convergence towards clustered eigenvalues. On the other hand, the block Lanczos method can compute all or some of the copies of a multiple eigenvalue and, with a suitable block size, also compute clustered eigenvalues much faster. The existing convergence theory due to Saad for the block Lanczos method, however, does not fully reflect this phenomenon, since the theory was established to bound approximation errors in each individual approximate eigenpair. Here, it is argued that in the presence of an eigenvalue cluster, the entire approximate eigenspace associated with the cluster should be considered as a whole, instead of each individual approximate eigenvector, and likewise for approximating clusters of eigenvalues. In this paper, we obtain error bounds on approximating eigenspaces and eigenvalue clusters. Our bounds are much sharper than the existing ones and expose the true rates of convergence of the block Lanczos method towards eigenvalue clusters. Furthermore, their sharpness is independent of the closeness of eigenvalues within a cluster. Numerical examples are presented to support our claims. Also a possible extension to the generalized eigenvalue problem is outlined.

Key words. Block Lanczos method, Krylov subspace, clustered eigenvalues, eigenspace, rate of convergence, Chebyshev polynomial

AMS subject classifications. 65F15

* Department of Mathematics, University of Texas at Arlington, P.O. Box 19408, Arlington, TX 76019-0408, USA. ([email protected].) Supported in part by NSF grants DMS-1115834 and DMS-1317330, and a Research Gift Grant from Intel Corporation.
† Department of Applied Mathematics, Shanghai University of Finance and Economics, 777 Guoding Road, Shanghai 200433, People's Republic of China. Supported in part by the National Natural Science Foundation of China NSFC-11101257, NSFC-11371102, and the Basic Academic Discipline Program, the 11th five year plan of 211 Project for Shanghai University of Finance and Economics. Part of this work was done while this author was a visiting scholar at the Department of Mathematics, University of Texas at Arlington, from February 2013 to January 2014.


1  Introduction

The Lanczos method [15] is widely used for finding a small number of extreme eigenvalues and their associated eigenvectors of a symmetric matrix (or Hermitian matrix in the complex case). It requires only matrix-vector products to extract enough information to compute the desired solutions, and thus is very attractive in practice when the matrix is sparse and its size is too large to be handled by, e.g., the QR algorithm [8, 20], or when the matrix does not exist explicitly but is available only in the form of being capable of generating matrix-vector multiplications. Let A be an N × N Hermitian matrix. Given an initial vector v_0, the single-vector Lanczos method begins by recursively computing an orthonormal basis {q_1, q_2, ..., q_n} of the nth Krylov subspace of A on v_0:

    K_n(A, v_0) = span{v_0, A v_0, ..., A^{n-1} v_0}                                   (1.1)

and at the same time the projection of A onto K_n(A, v_0): T_n = Q_n^H A Q_n, where Q_n = [q_1, q_2, ..., q_n] and usually n ≪ N. Afterwards some of the eigenpairs (λ̃, w) of T_n: T_n w = λ̃ w, especially the extreme ones, are used to construct approximate eigenpairs (λ̃, Q_n w) of A. The number λ̃ is called a Ritz value and Q_n w a Ritz vector. This procedure of computing approximate eigenpairs is not limited to Krylov subspaces but in general works for any given subspace; it is called the Rayleigh-Ritz procedure.

The single-vector Lanczos method has difficulty in computing all copies of a multiple eigenvalue of A. In fact, only one copy of the eigenvalue can be found. To compute all copies, a block Lanczos method with a block size that is no smaller than the multiplicity of the eigenvalue, or some variation of the Lanczos method with deflating strategies, has to be used. The single-vector Lanczos method also has difficulty in handling eigenvalue clusters: slow convergence to each individual eigenvalue in the cluster. The closer the eigenvalues in the cluster are, the slower the convergence will be. A block version should perform better. This is indeed true and well-known. There are a few block versions, e.g., the ones introduced by Golub and Underwood [9], Cullum and Donath [5], and, more recently, by Ye [24] for an adaptive block Lanczos method (see also Cullum and Willoughby [6], Golub and van Loan [10]). The basic idea is to use an N × n_b matrix V_0, instead of a single vector v_0, and accordingly an orthonormal basis of the nth Krylov subspace of A on V_0:

    K_n(A, V_0) = span{V_0, A V_0, ..., A^{n-1} V_0}                                   (1.2)

will be generated, as well as the projection of A onto K_n(A, V_0). Afterwards the same Rayleigh-Ritz procedure is applied to compute approximate eigenpairs of A. There has been a wealth of development, in both theory and implementation, for the Lanczos methods, mostly for the single-vector version. The most complete reference up to 1998 is Parlett [20]. This paper is concerned with the convergence theory of the block Lanczos method. Related past works include Kaniel [12], Paige [19],

Saad [21], Li [18], as well as the potential-theoretic approach of Kuijlaars [13, 14] who, from a very different perspective, investigated which eigenvalues are found first according to the eigenvalue distribution as N → ∞, and what their associated convergence rates are as n goes to ∞ while n/N stays fixed. Results from these papers are all about the convergence of a single eigenvalue and eigenvector, even in the analysis of Saad on the block Lanczos method. The focus in this paper is, however, on the convergence of a cluster of eigenvalues, including multiple eigenvalues, and their associated eigenspace for the Hermitian eigenvalue problem. Our results distinguish themselves from those of Saad [21] in that they bound the errors in the approximate eigenpairs belonging to an eigenvalue cluster together, rather than separately for each individual eigenpair. The consequence is much sharper bounds, as our later numerical examples will demonstrate. One of the key steps in analyzing the convergence of Lanczos methods is to pick (sub)optimal polynomials to minimize error bounds. For any eigenpair other than the first one, it is standard practice, as in [21], that each chosen polynomial has a factor to annihilate vector components in all preceding eigenvector directions, resulting in a "bulky" factor in the form of a product involving all previous eigenvalues/Ritz values in the error bound. The factor can be big and is likely an artifact of the analyzing technique. We also propose a new kind of error bound that does not have such a "bulky" factor, but requires, as a tradeoff, knowledge of the distance from the eigenspace of interest to a Krylov subspace K_i of lower order.

The rest of this paper is organized as follows. Section 2 collects some necessary results on unitarily invariant norms and canonical angles between subspaces for later use. Section 3 presents the (simplest) block Lanczos method; its convergence analysis, resulting in error bounds of the eigenspace/eigenvalue-cluster type, is carried out in section 4 for eigenspaces and in section 5 for eigenvalues. In section 6, we perform a brief theoretical comparison between our results and related results derived from those of Saad [21], and point out when Saad's bounds will overestimate the true rate of convergence. Numerical examples are given in section 7 to support our comparison analysis. Section 8 establishes more bounds, based on the knowledge of Krylov subspaces of lower orders. In section 9, we outline a possible extension of our results to the generalized eigenvalue problem A − λM solved by a block Lanczos method. Finally, we present our conclusion in section 10.

Throughout this paper, A is an N × N Hermitian matrix, and it has

    eigenvalues:               λ_1 ≥ λ_2 ≥ ··· ≥ λ_N,  with  Λ = diag(λ_1, λ_2, ..., λ_N),
    orthonormal eigenvectors:  u_1, u_2, ..., u_N,      with  U = [u_1, u_2, ..., u_N],
    eigen-decomposition:       A = U Λ U^H  and  U^H U = I_N.                              (1.3)

C^{n×m} is the set of all n × m complex matrices, C^n = C^{n×1}, and C = C^1. P_k is the set of polynomials of degree no bigger than k. I_n (or simply I if its dimension is clear from the context) is the n × n identity matrix, and e_j is its jth column. The superscript "·^H" takes the complex conjugate transpose of a matrix or vector. We shall also adopt MATLAB-like conventions to access the entries of vectors and matrices. Let i : j be the set of integers from i to j inclusive. For a vector u and a matrix X, u_(j) is u's jth entry and X_(i,j) is X's (i,j)th entry; X's submatrices X_(k:ℓ,i:j), X_(k:ℓ,:), and X_(:,i:j) consist of the intersections of rows k to ℓ with columns i to j, rows k to ℓ, and columns i to j, respectively. R(X) is the column space of X, i.e., the subspace spanned by the columns of X, and eig(X) denotes the set of all eigenvalues of a square matrix X. For matrices or scalars X_i, both diag(X_1, ..., X_k) and X_1 ⊕ ··· ⊕ X_k denote the same block diagonal matrix with ith diagonal block X_i.

2  Preliminaries

2.1  Unitarily invariant norm

A matrix norm |||·||| is called a unitarily invariant norm on C^{m×n} if it is a matrix norm and has the following two properties [1, 22]:

1. |||X^H B Y||| = |||B||| for all unitary matrices X and Y of apt sizes and B ∈ C^{m×n};
2. |||B||| = ‖B‖_2, the spectral norm of B, if rank(B) = 1.

Two commonly used unitarily invariant norms are

    the spectral norm:   ‖B‖_2 = max_j σ_j,
    the Frobenius norm:  ‖B‖_F = sqrt( Σ_j σ_j² ),

where σ_1, σ_2, ..., σ_{min{m,n}} are the singular values of B. The trace norm

    |||B|||_trace = Σ_j σ_j

is a unitarily invariant norm, too. In what follows, |||·||| denotes a general unitarily invariant norm. In this article, for convenience, any |||·||| we use is generic to matrix sizes in the sense that it applies to matrices of all sizes. Examples include the matrix spectral norm ‖·‖_2, the Frobenius norm ‖·‖_F, and the trace norm. One important property of unitarily invariant norms is

    |||X Y Z||| ≤ ‖X‖_2 · |||Y||| · ‖Z‖_2

for any matrices X, Y, and Z of compatible sizes.

Lemma 2.1. Let H and M be two Hermitian matrices, and let S be a matrix of compatible size, as determined by the Sylvester equation H Y − Y M = S. If eig(H) ∩ eig(M) = ∅, then the equation has a unique solution Y, and moreover

    |||Y||| ≤ (c/η) |||S|||,

where η = min |μ − ω| over all μ ∈ eig(M) and ω ∈ eig(H), and the constant c lies between 1 and π/2; it is 1 for the Frobenius norm, or if either eig(H) is in a closed interval that contains no eigenvalue of M or vice versa.

This lemma, for the Frobenius norm and for the case when either eig(H) is in a closed interval that contains no eigenvalue of M or vice versa, is essentially in [7] (see also [22]); it is due to [2, 3] for the most general case: eig(H) ∩ eig(M) = ∅ and any unitarily invariant norm.
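The bound in Lemma 2.1 is easy to check numerically. The following Python sketch (an illustration added to this text, not part of the original report) solves H Y − Y M = S with scipy.linalg.solve_sylvester and compares ‖Y‖_F with ‖S‖_F/η; the spectra of H and M are kept in disjoint intervals so that c = 1 applies for the Frobenius norm.

```python
# Hedged numerical check of Lemma 2.1 (Frobenius norm, c = 1); not from the paper.
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(0)

def rand_hermitian(n, shift=0.0):
    """Random Hermitian matrix with its spectrum shifted by `shift`."""
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (G + G.conj().T) / 2 + shift * np.eye(n)

H = rand_hermitian(8, shift=+10.0)   # eig(H) clustered around +10
M = rand_hermitian(5, shift=-10.0)   # eig(M) clustered around -10
S = rng.standard_normal((8, 5)) + 1j * rng.standard_normal((8, 5))

# H Y - Y M = S  is  solve_sylvester(A, B, Q)  with  A = H, B = -M, Q = S.
Y = solve_sylvester(H, -M, S)

eta = np.abs(np.linalg.eigvalsh(H)[:, None] - np.linalg.eigvalsh(M)[None, :]).min()
lhs = np.linalg.norm(Y, 'fro')
rhs = np.linalg.norm(S, 'fro') / eta
print(f"||Y||_F = {lhs:.3e}  <=  ||S||_F / eta = {rhs:.3e}: {lhs <= rhs}")
```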

2.2  Angles between subspaces

Consider two subspaces X and Y of C^N and suppose

    k := dim(X) ≤ dim(Y) =: ℓ.                                                          (2.1)

Let X ∈ C^{N×k} and Y ∈ C^{N×ℓ} be orthonormal basis matrices of X and Y, respectively, i.e.,

    X^H X = I_k, X = R(X),   and   Y^H Y = I_ℓ, Y = R(Y),

and denote by σ_j for 1 ≤ j ≤ k, in ascending order, i.e., σ_1 ≤ ··· ≤ σ_k, the singular values of Y^H X. The k canonical angles θ_j(X, Y) from¹ X to Y are defined by

    0 ≤ θ_j(X, Y) := arccos σ_j ≤ π/2   for 1 ≤ j ≤ k.                                  (2.2)

They are in descending order, i.e., θ_1(X, Y) ≥ ··· ≥ θ_k(X, Y). Set

    Θ(X, Y) = diag(θ_1(X, Y), ..., θ_k(X, Y)).                                          (2.3)
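For concreteness, the canonical angles just defined can be computed from the singular values of Y^H X exactly as in (2.2)-(2.3); the short Python sketch below is an illustration added to this text, not code from the report.

```python
# Hedged sketch: canonical angles from span(X) to span(Y) via (2.2); not from the paper.
import numpy as np

def canonical_angles(X, Y):
    """Canonical angles (descending) from span(X) to span(Y), with dim span(X) <= dim span(Y)."""
    Qx, _ = np.linalg.qr(X)                 # orthonormal basis of span(X)
    Qy, _ = np.linalg.qr(Y)                 # orthonormal basis of span(Y)
    sigma = np.linalg.svd(Qy.conj().T @ Qx, compute_uv=False)
    sigma = np.clip(sigma, -1.0, 1.0)       # guard against rounding slightly above 1
    return np.sort(np.arccos(sigma))[::-1]  # theta_1 >= ... >= theta_k

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
Y = rng.standard_normal((50, 7))
print(canonical_angles(X, Y))               # Theta(X, Y) = diag(theta_j) as in (2.3)
```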

It can be seen that the angles so defined are independent of the orthonormal basis matrices X and Y, which are not unique. A different way to define these angles is through the orthogonal projections onto X and Y [23]. When k = 1, i.e., X is a vector, there is only one canonical angle from X to Y, and so we will simply write θ(X, Y). In what follows, we sometimes place a vector or matrix in one or both arguments of θ_j(·,·), θ(·,·), and Θ(·,·), with the understanding that the argument stands for the subspace spanned by the vector or by the columns of the matrix.

Proposition 2.1. Let X and Y be two subspaces in C^N satisfying (2.1).

(a) For any Ŷ ⊆ Y with dim(Ŷ) = dim(X) = k, we have θ_j(X, Y) ≤ θ_j(X, Ŷ) for 1 ≤ j ≤ k.

(b) There exist an orthonormal basis {x_1, x_2, ..., x_k} for X and an orthonormal basis {y_1, y_2, ..., y_ℓ} for Y such that

    θ_j(X, Y) = θ_j(x_j, y_j)  for 1 ≤ j ≤ k,   and   x_i^H y_j = 0  for 1 ≤ i ≤ k, k+1 ≤ j ≤ ℓ.

In particular, Θ(X, Y) = Θ(X, Ŷ), where Ŷ = span{y_1, y_2, ..., y_k}, and the subspace Ŷ is unique if θ_1(X, Y) < π/2.

________________
¹ If k = ℓ, we may say that these angles are between X and Y.


Proof. Let X P CN ˆk and Y P CN ˆℓ be orthonormal basis matrices of X and Y, respecp Ď Y with dimpYq p “ dimpXq “ k, then there is W P Cℓˆk with orthonormal tively. If Y p Since cos θj pX, Yq p for p columns such that Y “ Y W is an orthonormal basis matrix of Y. 1 ď j ď k are the singular values of X H Y W which are no bigger than the singular values p ď cos θj pX, Yq individually, or equivalently, θj pX, Yq p ě θj pX, Yq of X H Y , i.e., cos θj pX, Yq for 1 ď j ď k. This is item (a). For item (b), let X H Y “ V ΣW H be the SVD of X H Y , where Σ “ diagpσ1 , σ2 , . . . , σk q ‘ 0kˆpℓ´kq . Then XV and Y W are orthonormal basis matrices of X and Y, respectively, and their columns, denoted by xi and yj , respectively, satisfy the specified requirements in the theorem. If also θ1 pX, Yq ă π{2, then all σi ą 0 and the first k columns of W spans p is unique for each given basis matrix Y . We have to prove RpY H Xq which is unique; so Y p is independent of the choosing of Y . Let Yr be another orthonormal basis matrix of that Y Y. Then Yr “ Y Z for some ℓ ˆ ℓ unitary matrix Z. Following the above construction for p we will have a new Y pnew “ RpYr W Ăp:,1:kq q, where W Ă is from the SVD X H Yr “ Vr Σ W ĂH. Y, Notice X H Yr “ X H Y Z “ V ΣpW H Zq which is yet another SVD of X H Yr . Thus the columns of pZ H W qp:,1:kq “ Z H Wp:,1:kq span Ăp:,1:kq . Hence W Ăp:,1:kq “ the column space of Yr H X which is also spanned by the columns of W Z H Wp:,1:kq M for some nonsingular matrix M , and Ăp:,1:kq “ Y Wp:,1:kq M Yr W pnew “ RpYr W p as expected. Ăp:,1:kq q “ RpY Wp:,1:kq q “ Y, which implies Y Proposition 2.2. Let X and Y be two subspaces in CN satisfying (2.1), and let X P CN ˆk be an orthonormal basis matrix of X, i.e., X H X “ Ik . Then max sin θpXp:,jq , Yq ď ~sin ΘpX, Yq~ ď

1ďjďk

k ÿ

sin θpXp:,jq , Yq,

(2.4)

j“1

g f k fÿ max sin θpXp:,jq , Yq ď }sin ΘpX, Yq}F “ e sin2 θpXp:,jq , Yq ,

1ďjďk

(2.5)

j“1

~sin ΘpX, Yq~ ď ~tan ΘpX, Yq~ ď a

~sin ΘpX, Yq~ 1 ´ sin2 θ1 pX, Yq

.

(2.6)

Proof. Let YK P CN ˆpN ´ℓq be an orthonormal basis matrix of the orthogonal complement H of Y in CN . We observe that sin θj pX, Yq for 1 ď j ď k are the singular values ‌ ‌ of X YK and ‌ H ‌ ‌ ‌ thus ~sin ΘpX, Yq~ “ ‌X YK ‌. Observe also sin θpXp:,jq , Yq “ ‌X H YK ‌ “ }X H YK }2 . p:,jq

Therefore k ‌ ‌ ÿ ‌ ‌ ‌ H ‌ H max ‌Xp:,jq YK ‌ ď ~sin ΘpX, Yq~ “ ‌X H YK ‌ ď }Xp:,jq YK }2 ,

1ďjďk

j“1


p:,jq

g f k › › ›2 › fÿ › › H › H H Y › . max ›Xp:,jq YK › ď }sin ΘpX, Yq}F “ }X YK }F “ e ›Xp:,jq K›

1ďjďk

F

j“1

F

They yield both (2.4) and (2.5). For (2.6), we notice sin θj pX, Yq ď tan θj pX, Yq “

sin θj pX, Yq sin θj pX, Yq ď cos θj pX, Yq cos θ1 pX, Yq

for 1 ď j ď k. Proposition 2.3. Let X and Y be two subspaces in CN with equal dimension: dimpXq “ dimpYq “ k, and let X P CN ˆk be an orthonormal basis matrix of X, i.e., X H X “ Ik , and Y be a (not necessarily orthonormal) basis matrix of Y such that each column of Y is a unit vector, i.e., }Yp:,jq }2 “ 1 for all j. Then } sin ΘpX, Yq}2F ď }pY H Y q´1 }2

k ÿ

sin2 θpXp:,jq , Yp:,jq q.

(2.7)

j“1

Proof. Since sin2 θj pX, Yq for 1 ď j ď k are the eigenvalues of ” ıH ” ı Ik ´ X H Y pY H Y q´1{2 X H Y pY H Y q´1{2 “ ‰ “ pY H Y q´1{2 Y H Y ´ pX H Y qH pX H Y q pY H Y q´1{2 , we have } sin ΘpX, Yq}2F “

k ÿ

sin2 θj pX, Yq

j“1

‌ ” ıH ” ı‌ ‌ ‌ H H ´1{2 H H ´1{2 ‌ ‌ “ ‌Ik ´ X Y pY Y q X Y pY Y q ‌ trace ‌ ‌ ď }pY H Y q´1{2 }2 ‌Y H Y ´ pX H Y qH pX H Y q‌trace }pY H Y q´1{2 }2 ` ˘ “ }pY H Y q´1 }2 trace Y H Y ´ pX H Y qH pX H Y q k ” ı ÿ H “ }pY H Y q´1 }2 1 ´ Yp:,jq XX H Yp:,jq j“1 H

´1

ď }pY Y q

k ” ı ÿ H H }2 1 ´ Yp:,jq Xp:,jq Xp:,jq Yp:,jq j“1

“ }pY H Y q´1 }2

k ÿ

sin2 θpXp:,jq , Yp:,jq q,

j“1

as was to be shown. In obtaining (2.8), we have used H H H Yp:,jq XX H Yp:,jq “ }X H Yp:,jq }22 ě Yp:,jq Xp:,jq Xp:,jq Yp:,jq H Y H because Xp:,jq p:,jq is the jth entry of the vector X Yp:,jq .


(2.8)

Remark 2.1. The inequality (2.7) is about controlling the subspace angle ΘpX, Yq by the individual angles between corresponding basis vectors. These individual angles depend on the selection of the basis vectors as well as their labelling. By Proposition 2.1(2), it is possible to find basis vectors for both X and Y and match them perfectly such that θj pX, Yq collectively is the same as all individual angles between corresponding basis vectors. But, on the other hand, it is possible that for two close subspaces in the sense that ΘpX, Yq is tiny there are unfortunately chosen and labelled basis vectors to make one or more individual angles between corresponding basis vectors near or even π{2. In fact, this can happen even when X “ Y. Therefore in general the collection tθpXp:,jq , Yp:,jq q, 1 ď j ď ku cannot be controlled by ΘpX, Yq without additional information. Proposition 2.4. Let X and Y be two subspaces in CN with equal dimension: dimpXq “ dimpYq “ k. Suppose θ1 pX, Yq ă π{2. p Ď Y of dimension k1 “ dimpYq p ď k, there is a unique X p Ď X of dimension (a) For any Y p p k1 such that PY X “ Y, where PY is the orthogonal projection onto Y. Moreover p Yq p ď θj pX, Yq θj`k´k1 pX, Yq ď θj pX,

for 1 ď j ď k1

(2.9)

which implies p Yq~ p ď ~sin ΘpX, Yq~ . ~ sin ΘpX, (b) For any set ty1 , y2 , . . . , yk1 u of orthonormal vectors in Y, there is a set tx1 , x2 , . . . , xk1 u of linearly independent vectors such that PY xj “ yj for 1 ď j ď k1 . Moreover (2.9) p “ spantx1 , x2 , . . . , xk u and Y p “ spanty1 , y2 , . . . , yk u. holds for X 1 1 Proof. Let X P CN ˆk and Y P CN ˆk be orthonormal basis matrices of X and Y, respecp Ď Y can be represented by its orthonormal basis matrix Y Yp , where Yp P Ckˆk1 tively. Y p Ď X with the desired property. X p Ď X can satisfies Yp H Yp “ Ik1 . We need to find a X p p be represented by its basis matrix (not necessary orthonormal) X X, where X P Ckˆk1 is p“Y p is the same as nonsingular and to be determined. The equation PY X p “ Y Yp Y Y HX X

ô

p “ Yp Y HX X

ô

p “ pY H Xq´1 Yp X

(2.10)

p “ because θ1 pX, Yq ă π{2 implies Y H X is nonsingular. This proves the existence of X p is independent of how p Following the argument, one can also prove that this X RpX Xq. the orthonormal basis matrices X and Y are chosen, and thus unique. To prove (2.9), we p Yq p for 1 ď j ď k1 are the singular values of note that σ ˆj :“ cos θj pX, p p H pX Xqs p ´1{2 “ Yp H Y H X Xr p X p H Xs p ´1{2 pY Yp qH pX XqrpX Xq ” ı´1{2 “ Yp H pY H Xq´ H pY H Xq´1 Yp . So the eigenvalues of Yp H pY H Xq´ H pY H Xq´1 Yp are σ ˆj´2 for 1 ď j ď k1 . On the other hand, σj :“ cos θj pX, Yq for 1 ď j ď k are the singular values of Y H X. So the eigenvalues of 8

pY H Xq´ H pY H Xq´1 are σj´2 for 1 ď j ď k. Use the Cauchy interlacing inequalities [20] to conclude that ´2 σj´2 ě σ ˆj´2 ě σj`k´k for 1 ď j ď k1 1 which yield (2.9). This proves item (a). To prove item (b), we pick the orthonormal basis matrix Y above in such a way p “ that its first k1 columns are y1 , y2 , . . . , yk1 . In (2.10), let Yp “ re1 , e2 , . . . , ek1 s, i.e., Y H ´1 p “ pY Xq Yp . Then rx1 , x2 , . . . , xk s :“ X X p gives what spanty1 , y2 , . . . , yk1 u, and let X 1 we need because of (2.10). p in the case of k1 “ 1 is Remark 2.2. The part of Proposition 2.4 on the existence of X essentially taken from [21, Lemma 4]. Remark 2.3. The canonical angles are defined under the standard inner product xx, yy “ xH y in CN . In a straightforward way, they can be defined under any given M -inner product xx, yyM “ xH M y, where M P CN ˆN is Hermitian and positive definite. We will call these angles the M -canonical angles. All results we have proved in this section are valid in slightly different forms for the M -canonical angles. Details are omitted.
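The construction in (2.10) is also easy to try numerically. The Python sketch below (an added illustration; all names are hypothetical) forms the basis matrix X (Y^H X)^{-1} Ŷ of Proposition 2.4 and checks that the orthogonal projection P_Y maps it onto Y Ŷ.

```python
# Hedged sketch of the construction (2.10) used in Proposition 2.4; not from the paper.
import numpy as np

rng = np.random.default_rng(2)
N, k, k1 = 40, 5, 3

X, _ = np.linalg.qr(rng.standard_normal((N, k)))      # orthonormal basis of the subspace X
Y, _ = np.linalg.qr(rng.standard_normal((N, k)))      # orthonormal basis of the subspace Y
Yhat, _ = np.linalg.qr(rng.standard_normal((k, k1)))  # Y @ Yhat spans a k1-dimensional subspace of Y

# (2.10): the basis matrix of the sought subspace of X is X (Y^H X)^{-1} Yhat,
# well defined whenever theta_1(X, Y) < pi/2 so that Y^H X is nonsingular.
Xhat = X @ np.linalg.solve(Y.conj().T @ X, Yhat)

P_Y = Y @ Y.conj().T                                   # orthogonal projection onto Y
print(np.allclose(P_Y @ Xhat, Y @ Yhat))               # P_Y X_hat = Y Yhat: True
```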

3  Block Lanczos method

Given V_0 ∈ C^{N×n_b} with rank(V_0) = n_b, the block Lanczos process [5, 9] of Algorithm 3.1 is the simplest version, and it will generate an orthonormal basis of the Krylov subspace K_n(A, V_0) as well as a projection of A onto the Krylov subspace. It is simplest because we assume all Z at Lines 4 and 8 in Algorithm 3.1 have full column rank n_b for all j. Then V_j ∈ C^{N×n_b}, and

    K_n := K_n(A, V_0) = R(V_1) ⊕ ··· ⊕ R(V_n),                                         (3.1)

the direct sum of R(V_j) for j = 1, 2, ..., n. A fundamental relation of the process is

    A Q_n = Q_n T_n + [0_{N×(n-1)n_b}, V_{n+1} B_n],                                    (3.2)

where

    Q_n = [V_1, V_2, ..., V_n] ∈ C^{N×nn_b},                                            (3.3a)

and

            ⎡ A_1    B_1^H                                  ⎤
            ⎢ B_1    A_2      B_2^H                         ⎥
    T_n  =  ⎢         ⋱        ⋱        ⋱                   ⎥  ∈ C^{nn_b×nn_b}.          (3.3b)
            ⎢               B_{n-2}   A_{n-1}   B_{n-1}^H   ⎥
            ⎣                         B_{n-1}   A_n         ⎦

T_n = Q_n^H A Q_n is the so-called Rayleigh quotient matrix with respect to K_n, and it is the projection of A onto K_n, too. Let

    Π_n = Q_n Q_n^H                                                                     (3.4)


Algorithm 3.1  Simple Block Lanczos Process

Given Hermitian A ∈ C^{N×N} and V_0 ∈ C^{N×n_b} with rank(V_0) = n_b, this generic block Lanczos process performs a partial tridiagonal reduction on A.

 1: perform orthogonalization on the given V_0 ∈ C^{N×n_b} (rank(V_0) = n_b) to obtain V_0 = V_1 B_0 (e.g., via modified Gram-Schmidt), where V_1 ∈ C^{N×n_b} satisfies V_1^H V_1 = I_{n_b} and B_0 ∈ C^{n_b×n_b};
 2: Z = A V_1, A_1 = V_1^H Z;
 3: Z = Z − V_1 A_1;
 4: perform orthogonalization on Z to obtain Z = V_2 B_1, where V_2 ∈ C^{N×n_b} satisfies V_2^H V_2 = I_{n_b} and B_1 ∈ C^{n_b×n_b};
 5: for j = 2 to n do
 6:   Z = A V_j, A_j = V_j^H Z;
 7:   Z = Z − V_j A_j − V_{j-1} B_{j-1}^H;
 8:   perform orthogonalization on Z to obtain Z = V_{j+1} B_j, where V_{j+1} ∈ C^{N×n_b} satisfies V_{j+1}^H V_{j+1} = I_{n_b} and B_j ∈ C^{n_b×n_b};
 9: end for

The matrix Π_n = Q_n Q_n^H in (3.4) is the orthogonal projection onto K_n. In particular, Π_1 = Q_1 Q_1^H = V_1 V_1^H is the orthogonal projection onto R(V_0) = R(V_1). Basically the block Lanczos method is this block Lanczos process followed by solving the eigenvalue problem for T_n to obtain approximate eigenpairs for A: any eigenpair (λ̃_j, w_j) of T_n gives an approximate eigenpair (λ̃_j, Q_n w_j) for A. The number λ̃_j is called a Ritz value and ũ_j := Q_n w_j a Ritz vector. We introduce the following notation for T_n that will be used in the next two sections:

    eigenvalues (also Ritz values):  λ̃_1 ≥ λ̃_2 ≥ ··· ≥ λ̃_{nn_b},  and  Ω = diag(λ̃_1, λ̃_2, ..., λ̃_{nn_b}),
    orthonormal eigenvectors:        w_1, w_2, ..., w_{nn_b},       and  W = [w_1, w_2, ..., w_{nn_b}],
    eigen-decomposition:             T_n W = W Ω  and  W^H W = I_{nn_b},
    Ritz vectors:                    ũ_j = Q_n w_j for 1 ≤ j ≤ nn_b, and  Ũ = [ũ_1, ũ_2, ..., ũ_{nn_b}].        (3.5)

The dependency of λ̃_j, w_j, W on n is suppressed for convenience. As in Saad [21], there is no loss of generality in assuming that all eigenvalues of A are of multiplicity not exceeding n_b. In fact, let P_j be the orthogonal projections onto the eigenspaces corresponding to the distinct eigenvalues of A. Then

    𝒰 := ⊕_j R(P_j V_0)

is an invariant subspace of A, and A|_𝒰, the restriction of A onto 𝒰, has the same distinct eigenvalues as A, and the multiplicity of any distinct eigenvalue of A|_𝒰 is no bigger than n_b. Since K_n(A, V_0) ⊆ 𝒰, what the block Lanczos method does is essentially to approximate some of the eigenpairs of A|_𝒰. When n_b = 1, Algorithm 3.1 reduces to the single-vector Lanczos process.
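For readers who want to experiment, the following Python sketch mimics Algorithm 3.1 together with the Rayleigh-Ritz step; it is an added illustration (the authors' own experiments use MATLAB), it assumes the generic full-rank case, and it performs full reorthogonalization as in the experiments of section 7.

```python
# Hedged sketch of Algorithm 3.1 plus the Rayleigh-Ritz step; assumes every block Z keeps
# full column rank (the "simplest version" of the text). Not the paper's MATLAB code.
import numpy as np

def block_lanczos(A, V0, n):
    """Return Q_n (N x n*nb) with orthonormal columns and T_n = Q_n^H A Q_n."""
    V, _ = np.linalg.qr(V0)                    # V_1 B_0 = V_0
    blocks = [V]
    for _ in range(1, n):
        Z = A @ blocks[-1]
        for W in blocks:                       # full reorthogonalization against all previous blocks
            Z = Z - W @ (W.conj().T @ Z)
        V, _ = np.linalg.qr(Z)                 # V_{j+1} B_j = Z; assumes rank(Z) = nb
        blocks.append(V)
    Q = np.concatenate(blocks, axis=1)
    T = Q.conj().T @ A @ Q
    return Q, T

def ritz_pairs(A, V0, n):
    """Ritz values (descending) and Ritz vectors u~_j = Q_n w_j from K_n(A, V0)."""
    Q, T = block_lanczos(A, V0, n)
    lam, W = np.linalg.eigh((T + T.conj().T) / 2)
    return lam[::-1], Q @ W[:, ::-1]

if __name__ == "__main__":
    N, nb, n = 600, 3, 20
    A = np.diag(np.concatenate(([3.5, 3.0, 2.5], 1 - 5 * np.arange(4, N + 1) / N)))
    ritz_vals, ritz_vecs = ritz_pairs(A, np.random.default_rng(0).standard_normal((N, nb)), n)
    print(ritz_vals[:3])                       # approximations to the clustered eigenvalues
```

With full reorthogonalization the computed Q_n stays numerically orthonormal, which is what "simulating exact arithmetic" refers to in section 7.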

4  Convergence of eigenspaces

Recall Π_n in (3.4), and in particular, Π_1 = Q_1 Q_1^H = V_1 V_1^H. The matrix X_{i,k,ℓ} defined below obviously depends on n_b as well. This dependency is suppressed because n_b is reserved throughout this article. For the rest of this and the next section, each of i, k, and ℓ will also be reserved for one assignment only: we are considering the ith to (i+n_b−1)st eigenpairs of A, among which the kth to ℓth eigenvalues may form a cluster, as in

    λ_N  ···  λ_{i+n_b−1}  ···  λ_ℓ  ···  λ_k  ···  λ_i  ···  λ_1
                           (cluster)

where

    1 ≤ i < n,   i ≤ k ≤ ℓ ≤ i + n_b − 1.

Recall (1.3). We are interested in bounding

1. the canonical angles from the invariant subspace R(U_(:,k:ℓ)) to the Krylov subspace K_n ≡ K_n(A, V_0),
2. the canonical angles between the invariant subspace R(U_(:,k:ℓ)) and span{ũ_k, ..., ũ_ℓ} (which we call a Ritz subspace),
3. the differences between the eigenvalues λ_j and the Ritz values λ̃_j for k ≤ j ≤ ℓ.

In doing so, we will use the jth Chebyshev polynomial of the first kind T_j(t):

    T_j(t) = cos(j arccos t)                                              for |t| ≤ 1,    (4.1)
           = (1/2) [ (t + sqrt(t²−1))^j + (t + sqrt(t²−1))^{−j} ]         for t ≥ 1.      (4.2)

It frequently shows up in numerical analysis and computations because of its numerous nice properties, for example |T_j(t)| ≤ 1 for |t| ≤ 1 while |T_j(t)| grows extremely fast² for |t| > 1. We will also need [17]

    | T_j( (t+1)/(1−t) ) | = | T_j( (1+t)/(t−1) ) | = (1/2) [ Δ_t^j + Δ_t^{−j} ]   for 1 ≠ t > 0,    (4.3)

where

    Δ_t := (√t + 1) / |√t − 1|   for t > 0.                                                          (4.4)

________________
² In fact, a result due to Chebyshev himself says that if p(t) is a polynomial of degree no bigger than j and |p(t)| ≤ 1 for −1 ≤ t ≤ 1, then |p(t)| ≤ |T_j(t)| for any t outside [−1, 1] [4, p.65].
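As an added illustration (not from the report), the growth of T_j(t) for t ≥ 1 in (4.2) and the identity (4.3)-(4.4) can be evaluated directly:

```python
# Hedged sketch: T_j(t) for t >= 1 via (4.2), and the identity (4.3)-(4.4); not from the paper.
import numpy as np

def cheb(j, t):
    """T_j(t) for t >= 1, using the explicit form (4.2)."""
    s = t + np.sqrt(t * t - 1.0)
    return 0.5 * (s**j + s**(-j))

def cheb_shifted(j, t):
    """|T_j((1+t)/(t-1))| = (Delta_t^j + Delta_t^{-j}) / 2 for t > 0, t != 1, as in (4.3)-(4.4)."""
    delta = (np.sqrt(t) + 1.0) / abs(np.sqrt(t) - 1.0)
    return 0.5 * (delta**j + delta**(-j))

j, t = 17, 1.8                              # e.g. j = n - i and t = kappa_{i,l,nb} > 1
print(cheb(j, t))                           # the error-reduction denominator T_{n-i}(kappa)
print(cheb_shifted(j, (t + 1) / (t - 1)))   # same value, since s -> (s+1)/(s-1) is an involution
```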


In the rest of this section and the entire next section, we will always assume

    rank(V_0^H U_(:,i:i+n_b−1)) = rank(V_1^H U_(:,i:i+n_b−1)) = n_b,                     (4.5)

and X_{i,k,ℓ} ∈ C^{N×(ℓ−k+1)} will have the same assignment, to be given in a moment. Consider an application of Proposition 2.4(b) with

    k_1 = ℓ − k + 1,   X = R(V_0) = R(V_1),   Y = R(U_(:,i:i+n_b−1)),   [y_1, y_2, ..., y_{k_1}] = [u_k, u_{k+1}, ..., u_ℓ].

The application yields a unique X_{i,k,ℓ} := [x_1, x_2, ..., x_{k_1}] such that R(X_{i,k,ℓ}) ⊆ R(V_0) and

    U_(:,i:i+n_b−1) U_(:,i:i+n_b−1)^H X_{i,k,ℓ} = U_(:,k:ℓ) ≡ [u_k, u_{k+1}, ..., u_ℓ].   (4.6)

Moreover, by Proposition 2.4(a),

    |||sin Θ(U_(:,k:ℓ), X_{i,k,ℓ})||| ≤ |||sin Θ(U_(:,i:i+n_b−1), V_0)|||,                (4.7)
    |||tan Θ(U_(:,k:ℓ), X_{i,k,ℓ})||| ≤ |||tan Θ(U_(:,i:i+n_b−1), V_0)|||.                (4.8)

They show that the chosen R(X_{i,k,ℓ}) has a significant component in the eigenspace R(U_(:,k:ℓ)) of interest if the initial R(V_0) has a significant component in the eigenspace R(U_(:,i:i+n_b−1)). The idea of picking such an X_{i,k,ℓ} is essentially borrowed from [21, Lemma 4] (see Remark 2.2).

Theorem 4.1. For any unitarily invariant norm |||·|||, we have

    |||tan Θ(U_(:,k:ℓ), K_n)||| ≤ [ ξ_{i,k} / T_{n−i}(κ_{i,ℓ,n_b}) ] |||tan Θ(U_(:,k:ℓ), X_{i,k,ℓ})|||,          (4.9)
    |||sin Θ(U_(:,k:ℓ), Ũ_(:,k:ℓ))||| ≤ γ |||sin Θ(U_(:,k:ℓ), K_n)|||                                            (4.10)
                                      ≤ [ γ ξ_{i,k} / T_{n−i}(κ_{i,ℓ,n_b}) ] |||tan Θ(U_(:,k:ℓ), X_{i,k,ℓ})|||,  (4.11)

where X_{i,k,ℓ} is defined by (4.6), and³

    ξ_{i,k} = ∏_{j=1}^{i−1} (λ_j − λ_N)/(λ_j − λ_k),   δ_{i,ℓ,n_b} = (λ_ℓ − λ_{i+n_b})/(λ_ℓ − λ_N),   κ_{i,ℓ,n_b} = (1 + δ_{i,ℓ,n_b})/(1 − δ_{i,ℓ,n_b}),   (4.12)

    γ = 1 + (c/η) ‖Π_n A (I − Π_n)‖_2,                                                                            (4.13)

and the constant c lies between 1 and π/2; it is⁴ 1 for the Frobenius norm or if λ̃_{k−1} > λ_k, and

    η = min_{k ≤ j ≤ ℓ; p < k or p > ℓ} |λ_j − λ̃_p|.                                                             (4.14)

For the Frobenius norm, γ in (4.13) can be improved to

    γ = sqrt( 1 + [ (1/η) ‖Π_n A (I − Π_n)‖_2 ]² ).                                                               (4.15)

________________
³ By convention, ∏_{j=1}^{0} (···) ≡ 1.
⁴ A by-product of this is that c = 1 if k = ℓ.

Proof. Write i´1 i´1

U“



nb

N ´nb ´i`1

U1 U2

U3

i´1



,

Λ“

nb

» –

nb

N ´nb ´i`1



Λ1 Λ2

N ´nb ´i`1

fl,

(4.16a)

Λ3

ˇ2 “ Up:,k:ℓq “ pU2 qp:,k´i`1:ℓ´i`1q “ ruk , . . . , uℓ s, U Λˇ2 “ Λpk:ℓ,k:ℓq “ pΛ2 qpk´i`1:ℓ´i`1,k´i`1:ℓ´i`1q “ diagpλk , . . . , λℓ q.

(4.16b) (4.16c)

For convenience, let’s drop the subscripts to Xi,k,ℓ because i, k, ℓ do not change. We have X “ U U H X “ U1 U1H X ` U2 U2H X ` U3 U3H X ˇ2 U ˇ2H X ` U3 U3H X “ U1 U1H X ` U

(4.17)

by (4.6). Let X0 “ XpX H Xq´1{2 which has orthonomal columns. We know that Rpf pAqX0 q Ă Kn for any f P Pn´1 , since RpX0 q “ RpXq Ď RpV0 q. By (4.17), ˇ2 f pΛˇ2 qU ˇ H X0 ` U3 f pΛ3 qU H X0 . Y :“ f pAqX0 “ U1 f pΛ1 qU1H X0 ` U 2 3

(4.18)

ˇ H X “ Iℓ´k`1 and thus U ˇ H X0 is nonsingular. Now if also f pΛˇ2 q is nonsingular By (4.6), U 2 2 (which is true for the selected f later), then ` H ˘´1 ` H ˘´1 ˇ X0 ˇ X0 Y U rf pΛˇ2 qs´1 “ U1 f pΛ1 qU1H X0 U rf pΛˇ2 qs´1 2 2 ` H ˘´1 ˇ2 ` U3 f pΛ3 qU3H X0 U ˇ 2 X0 `U rf pΛˇ2 qs´1 , (4.19) and consequently by Proposition 2.1 ‌ ‌ ‌ ‌ ˇ2 , Kn q‌ ď ‌tan ΘpU ˇ2 , Y q‌ ‌tan ΘpU ‌« ff‌ ` H ˘´1 ‌ ˇ X0 ˇ2 qs´1 ‌ rf p Λ ‌ f pΛ1 qU1H X0 U ‌ ` 2H ˘´1 “‌ ‌ ´1 ˇ X0 ˇ ‌ f pΛ3 qU3H X0 U ‌ rf pΛ2 qs 2 ‌„ ‌ « ff ` H ˘´1 ȷ ‌ ‌ ˇ X0 U1H X0 U ‌ f pΛ1 q 2 ˇ2 qs´1 ‌ ` ˘ “‌ rf p Λ ‌ ´1 ˇ H X0 f pΛ3 q U3H X0 U ‌ ‌ 2 ‌ ‌ 1 ˇ2 , X0 q‌ . ˆ ‌tan ΘpU ď max |f pλj q| ˆ max 1ďjďi´1 kďjďℓ |f pλj q|

(4.20)

i`nb ďjďN

We need to pick an f P Pn´1 to make the right-hand side of (4.20) as small as we can. To this end for the case i “ 1, we choose ˆ ˙N 2t ´ pλnb `1 ` λN q f ptq “ Tn´1 Tn´1 pκ1,ℓ,nb q (4.21) λnb `1 ´ λN 13

for which min f pλj q “ f pλℓ q “ 1,

max

nb `1ďjďN

kďjďℓ

|f pλj q| ď

1 . Tn´1 pκ1,ℓ,nb q

(4.22)

This, together with (4.20), concludes the proof of (4.9) for i “ 1. In general for i ą 1, we shall consider polynomials of form f ptq “ pλ1 ´ tq ¨ ¨ ¨ pλi´1 ´ tq ˆ gptq, and search a g P Pn´i such that

max

(4.23)

|gpλj q| is made as small as we can while

i`nb ďjďN

min |gpλj q| “ gpλℓ q “ 1. To this end, we choose

kďjďℓ

gptq “ Tn´i

ˆ

2t ´ pλi`nb ` λN q λi`nb ´ λN

˙N

Tn´i pκi,ℓ,nb q,

for which min gpλj q “ gpλℓ q “ 1,

max

i`nb ďjďN

kďjďℓ

|gpλj q| ď

(4.24)

1 . Tn´i pκi,ℓ,nb q

(4.25)

This, together with (4.20) and (4.23), concludes the proof of (4.9) for i ą 1. Next we prove (4.10) with an argument influenced by [11]. Recall (4.16). Let QK P CN ˆpN ´nnb q such that rQn , QK s is unitary, and write ˇ2 “ Qn Z ` QK ZK , U Hˇ ˇ where Z “ QH n U2 , ZK “ QK U2 . Then ‌ ‌ ˇ2 , Kn q‌ “ ~Z~ , ‌cos ΘpU

(4.26)

‌ ‌ ˇ2 , Kn q‌ “ ~ZK ~ . ‌sin ΘpU

(4.27)

ˇ2 “ U ˇ2 Λˇ2 , QH Keeping in mind that AU n AQn “ Tn , and Tn W “ W Ω from (3.5), we have

ñ ñ

Hˇ Hˇ ˇ QH n A rQn , QK srQn , QK s U2 “ Qn U2 Λ2 „ ȷ Z H rTn , Qn AQK s “ Z Λˇ2 ZK Tn Z ´ Z Λˇ2 “ ´QH AQK ZK .

(4.28)

n

Similarly to (4.16), partition W and Ω as k´1 k´1

W “



W1

ℓ´k`1

W2

nnb ´ℓ

W3

k´1

‰ ,

Ω“

ℓ´k`1

» –

ℓ´k`1

nnb ´ℓ



Ω1 Ω2

nnb ´ℓ

fl, Ω3

and set W1,3 :“ rW1 , W3 s and Ω1,3 :“ Ω1 ‘ Ω3 . Multiply (4.28) by W H from the left to get ΩW H Z ´ W H Z Λˇ2 “ ´W H QH n AQK ZK , and thus we have H H H Ω1,3 W1,3 Z ´ W1,3 Z Λˇ2 “ ´W1,3 QH n AQK ZK .

14

(4.29)

By Lemma 2.1, we conclude that ‌ H ‌ c ‌ H H ‌ ‌W1,3 Z ‌ ď ‌W1,3 Qn AQK ZK ‌ ď c }Πn ApI ´ Πn q}2 ~ZK ~ . η η

(4.30)

r1,3 “ Qn W1,3 . It can be verified that Let U H r1,3 qH pQn Zq “ pU r1,3 qH pU ˇ2 ´ QK ZK q “ pU r1,3 qH U ˇ2 W1,3 Z “ pU

by (4.26). Therefore ‌ ‌ ‌ ‌ ‌ ‌ r Hˇ ‌ ˇ2 , U rp:,k:ℓq q‌ , Q s U ‌sin ΘpU ‌ “ ‌rU 1,3 2‌ K ‌ ‌ ‌ ‌ r Hˇ ‌ ‌ Hˇ ‌ ‌ ď ‌pU 1,3 q U2 ‌ ` QK U2 ‌ H ‌ “ ‌W1,3 Z ‌ ` ~ZK ~

(4.31)

which, for the Frobenius norm, can be improved to an identity b ˇ2 , U rp:,k:ℓq q}F “ }W H Z}2 ` }ZK }2 . } sin ΘpU 1,3 F F The inequality (4.10) is now a consequence of (4.27), (4.30), and (4.31). Remark 4.1. Although the appearance of three integers, i, k, and ℓ makes Theorem 4.1 awkward and more complicated than simply taking i “ k or i ` nb ´ 1 “ ℓ, it provides the flexibility when it comes to apply (4.9) with balanced ξi,k (which should be made as small as possible) and δi,ℓ,nb (which should be made as large as possible). In fact, for given k and ℓ, both ξi,k and δi,ℓ,nb increase with i. But the right-hand side of (4.9) increases as ξi,k increases and decreases (rapidly) as δi,ℓ,nb increases. So we would like to make ξi,k as small as we can and δi,ℓ,nb as large as we can. In particular, if k ď ℓ ď nb , one can always pick i “ 1 so that (4.9) gets used with ξ1,k “ 1; but then if δ1,ℓ,nb is tiny, (4.9) is better used with some i ą 1. A general guideline is to make sure tλj uℓj“k is a cluster and the rest of λj are relatively far away.
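To get a feel for the size of the factor ξ_{i,k}/T_{n−i}(κ_{i,ℓ,n_b}) multiplying |||tan Θ(U_(:,k:ℓ), X_{i,k,ℓ})||| in (4.9), it can be tabulated for a concrete spectrum; the Python sketch below is an added illustration (using the spectrum that appears later in Example 7.1), not part of the report.

```python
# Hedged sketch: the factor xi_{i,k} / T_{n-i}(kappa_{i,l,nb}) appearing in (4.9); not from the paper.
import numpy as np

def cheb(j, t):                        # T_j(t) for t >= 1, cf. (4.2)
    s = t + np.sqrt(t * t - 1.0)
    return 0.5 * (s**j + s**(-j))

def theorem41_factor(lam, i, k, ell, nb, n):
    """xi_{i,k} / T_{n-i}(kappa_{i,l,nb}) with lam sorted descending and 1-based i, k, ell."""
    lam = np.asarray(lam, dtype=float)
    N = lam.size
    xi = np.prod((lam[:i-1] - lam[N-1]) / (lam[:i-1] - lam[k-1]))   # xi_{i,k}; empty product = 1
    delta = (lam[ell-1] - lam[i+nb-1]) / (lam[ell-1] - lam[N-1])    # delta_{i,l,nb} of (4.12)
    kappa = (1 + delta) / (1 - delta)                               # kappa_{i,l,nb} of (4.12)
    return xi / cheb(n - i, kappa)

N, n, nb = 600, 20, 3
lam = np.concatenate(([3.5, 3.0, 2.5], 1 - 5 * np.arange(4, N + 1) / N))
print(theorem41_factor(lam, i=1, k=1, ell=3, nb=nb, n=n))   # factor in front of |||tan Theta(U, X_{1,1,3})|||
```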

5  Convergence of eigenvalues

In this section, we will bound the differences between the eigenvalues λ_j and the Ritz values λ̃_j for k ≤ j ≤ ℓ.

Theorem 5.1. Let k = i. For any unitarily invariant norm, we have

    |||diag(λ_i − λ̃_i, λ_{i+1} − λ̃_{i+1}, ..., λ_ℓ − λ̃_ℓ)||| ≤ (λ_i − λ_N) [ ζ_i / T_{n−i}(κ_{i,ℓ,n_b}) ]² |||tan Θ(U_(:,k:ℓ), X_{i,k,ℓ})|||²,   (5.1)

where κ_{i,ℓ,n_b} is the same as the one in (4.12), and

    ζ_i = max_{i+1 ≤ p ≤ N} ∏_{j=1}^{i−1} |λ̃_j − λ_p| / |λ̃_j − λ_i|.                    (5.2)

In particular, if also λ̃_{i−1} ≥ λ_i, then

    ζ_i = ∏_{j=1}^{i−1} |λ̃_j − λ_N| / |λ̃_j − λ_i|.                                       (5.3)

Proof. Upon shifting A by λi I to A´λi IN , we may assume λi “ 0. Doing so doesn’t change the Krylov subspace Kn pA, V0 q “ Kn pA ´ λi I, V0 q and doesn’t change any eigenvector and any Ritz vector of A, but it does shift all eigenvalues and Ritz values of A by the same ˜ j remain unchanged. Suppose amount, i.e., λi , and thus all differences λp ´ λj and λp ´ λ λi “ 0 and thus λi´1 ě λi “ 0 ě λi`1 . Recall (3.5), and adopt the proof of Theorem 4.1 up to (4.19). Take f as ˜ 1 ´ tq ¨ ¨ ¨ pλ ˜ i´1 ´ tq ˆ gptq, f ptq “ pλ

(5.4)

where g P Pn´i . We claim Y H Qn wj “ 0 for 1 ď j ď i ´ 1. This is because Y can be ˜ j IqYp for some matrix Yp P CN ˆpℓ´i`1q with RpYp q Ď Kn which represented by Y “ pA ´ λ ˜ j IqQn Yˇ further implies Yp “ Qn Yˇ for some matrix Yˇ P Cnnb ˆpℓ´i`1q . Thus Y “ pA ´ λ and ˜ ˜ ˇH Y H Qn wj “ Yˇ H QH n pA ´ λj IqQn wj “ Y pTn ´ λj Iqwj “ 0. Set

` H ˘´1 ˇ X0 ˇ2 ` U3 R3 , Y0 “ Y U rf pΛˇ2 qs´1 “ U1 R1 ` U (5.5) 2 ` H ˘´1 ˇ X0 where Rj “ f pΛj qUjH X0 U rf pΛˇ2 qs´1 . 2 Let Z “ Y0 pY0H Y0 q´1{2 , and denote by µ1 ě ¨ ¨ ¨ ě µℓ´i`1 the eigenvalues of Z H AZ which depends on f in (5.4) to be determined for best error bounds. Note Z H Qn rw1 , . . . , wi´1 s “ 0,

RpZq Ď Kn ,

Z H Z “ Iℓ´i`1 .

(5.6)

p because RpZq Ď Kn , where Z p has orthonormal columns. Apply Cauchy’s Write Z “ Qn Z H H H ˜ i`j´1 ě µj for p and QH AQn to get λ p interlace inequalities to Z AZ “ Z pQn AQn qZ n 1 ď j ď ℓ ´ i ` 1, and thus ˜ i`j´1 ď λi`j´1 ´ µj 0 ď λi`j´1 ´ λ

for 1 ď j ď ℓ ´ i ` 1.

(5.7)

In particular, this implies µj ď λi`j´1 ď λi ď 0 and consequently Y0H AY0 is negative semidefinite. Therefore for any nonzero vector y P Cℓ´i`1 , 0 ě y H Y0H AY0 y “ y H R1H Λ1 R1 y ` y H Λˇ2 y ` y H R3H Λ3 R3 y ě y H Λˇ2 y ` y H RH Λ3 R3 y, 3

16

y H Y0H Y0 y “ y H R1H R1 y ` y H y ` y H R3H R3 y ě y H y, where we have used y H R1H Λ1 R1 y ě 0, y H R1H R1 y ě 0, and y H R3H R3 y ě 0. Therefore 0ě

y H Y0H AY0 y y H Y0H AY0 y y H pΛˇ2 ` R3H Λ3 R3 qy ě ě . yHy yHy y H Y0H Y0 y

(5.8)

Denote by µ ˆ1 ě ¨ ¨ ¨ ě µ ˆℓ´i`1 the eigenvalues of Λˇ2 ` R3H Λ3 R3 . By (5.8), we know µj ě µ ˆj for 1 ď j ď ℓ ´ i ` 1 which, together with (5.7), lead to ˜ i`j´1 ď λi`j´1 ´ µ 0 ď λi`j´1 ´ λ ˆj

for 1 ď j ď ℓ ´ i ` 1.

(5.9)

Hence for any unitarily invariant norm [16, 22] ‌ ‌ ‌ ˜ i , λi`1 ´ λ ˜ i`1 , . . . , λℓ ´ λ ˜ ℓ q‌ ˆ1 , λi`1 ´ µ ˆ 2 , . . . , λℓ ´ µ ˆℓ´i`1 q~ ‌diagpλi ´ λ ‌ ď ~diagpλi ´ µ ‌ H ‌ ď ‌R3 Λ3 R3 ‌ ‌ ‌ ď pλi ´ λN q ‌R3H R3 ‌ , (5.10) where the last inequality is true because 1) R3H Λ3 R3 is negative semi-definite, 2) we shifted A by λi I, and 3) the jth largest eigenvalue of R3H pλi I´Λ3 qR3 which is positive semi-definite is bounded by the jth largest eigenvalue of pλi ´ λN qR3H R3 . Denote by σj (in descending order) for 1 ď j ď ℓ ´ i ` 1 the singular values of ` H ˘´1 ˇ X0 R3 “ f pΛ3 qU3H X0 U rf pΛˇ2 qs´1 , and by σ ˆj (in descending order) for 1 ď j ď ℓ´i`1 2 ` ˘´1 H H ˇ the singular values ˆj is less than or equal to the jth largest « of U3 `X0 U2˘X0ff . Then σ ´1 ˇ H X0 U1H X0 U 2 ˇ2 , Xq. We have ` ˘´1 , which is tan θj pU singular value of H ˇ U3 X0 U2H X0 1 ˆσ ˆj |f pλj q| 1 ˇ2 , Xq. ď max |f pλj q| ˆ max ˆ tan θj pU i`nb ďjďN iďjďℓ |f pλj q|

σj ď

max

i`nb ďjďN

|f pλj q| ˆ max

iďjďℓ

(5.11)

What remains to be done is to pick f P Pn´1 to make the right-hand side of (5.11) as small as we can. For the case of i “ 1, we choose f as in (4.21), and then (4.22) holds. Finally combining (5.10) and (5.11) with (4.22) leads to (5.1) for the case i “ 1. In general for i ą 1, we take f ptq as in (5.4) with gptq given by (4.24) satisfying (4.25), together again with (5.10) and (5.11), to conclude the proof. Remark 5.1. In Theorem 5.1, k “ i always, unlike lin Theorem 4.1, because we need the first equation in (5.6) for our proof to work.


6  A comparison with the existing results

The existing results related to our results in the previous two sections include those by Kaniel [12], Paige [19], and Saad [21] (see also Parlett [20]). The most complete and strongest ones are in Saad [21]. In comparing ours with Saad’s results, the major difference is that ours are of the eigenspace/eigenvalue cluster type while Saad’s results are of the single-vector/eigenvalue type. When specialized to an individual eigenvector/eigenvalue, our results reduce to those of Saad. Specifically, for k “ ℓ “ i the inequality (4.9) becomes Saad [21, Theorem 5] and Theorem 5.1 becomes Saad [21, Theorem 6]. Certain parts of our proofs bear similarities to Saad’s proofs for the block case, but there are subtleties in our proofs that cannot be handled in a straightforward way following Saad’s proofs. It is well-known [7] that eigenvectors associated with eigenvalues in a tight cluster are sensitive to perturbations/rounding errors while the whole invariant subspace associated with the cluster is not. Therefore it is natural to treat the entire invariant subspace as a whole, instead of each individual eigenvectors in the invariant subspace separately. In what follows, we will perform a brief theoretical comparison between our results and those of Saad, and point out when Saad’s bounds may be too large. Numerical examples in the next section support this comparison. As mentioned, Saad’s bounds are of the single-vector/eigenvalue type. So a direct comparison cannot be done. But it is possible to derive some bounds for eigenspaces/eigenvalue clusters from Saad’s bounds, except that these derived bounds are less elegant and (likely) less sharp (which we will demonstrate numerically in section 7). One possible derivation based on [21, Theorem 5] may be as follows. By Proposition 2.2, ℓ ÿ ‌ ‌ max sin θpuj , Kn q ď ‌sin ΘpUp:,k:ℓq , Kn q‌ ď sin θpuj , Kn q,

kďjďℓ

(6.1)

j“k

g f ℓ fÿ sin2 θpuj , Kn q . max sin θpuj , Kn q ď } sin ΘpUp:,k:ℓq , Kn q}F ď e

kďjďℓ

(6.2)

j“k

These inequalities imply that the largest sin θpuj , Kn q is comparable to sin ΘpUp:,k:ℓq , Kn q, and thus finding good bounds for all sin θpuj , Kn q is comparably equivalent to finding a good bound for sin ΘpUp:,k:ℓq , Kn q. The right-most sides of (6.1) and (6.2) can be bounded, using [21, Theorem 5] (i.e., the inequality (4.9) for the case k “ ℓ “ i). In the notation of Theorem 4.1, we have, for k ď j ď ℓ, ξj,j tan θpuj , Kn q ď tan θpuj , Xj,j,j q (6.3) Tn´j pκj,j,nb q and use sin θ ď tan θ to get ℓ ‌ ‌ ÿ ‌sin ΘpUp:,k:ℓq , Kn q‌ ď

ξj,j tan θpuj , Xj,j,j q, T pκ q j“k n´j j,j,nb 18

(6.4)

g f ℓ „ fÿ } sin ΘpUp:,k:ℓq , Kn q}F ď e j“k

ȷ2 ξj,j tan θpuj , Xj,j,j q , Tn´j pκj,j,nb q

(6.5)

where Xj,j,j P CN are as defined right before Theorem 4.1, and κj,j,nb are as defined in Theorem 4.1. If θ1 pUp:,k:ℓq , Kn q is not too close to π{2 which we will assume, the left-hand sides of (4.9), (6.4), and (6.5) are comparable by Proposition 2.2, but there isn’t any easy way to compare their right-hand sides. Nevertheless, we argue that the right-hand side of (4.9) is preferable. First it is much simpler; Second, it is potentially much sharper for two reasons: 1. One or more ξj,j for k ď j ď ℓ in (6.4) and (6.5) may be much bigger than ξi,k in (4.9). 2. By Proposition 2.3, ΘpUp:,k:ℓq , rXk,k,k , . . . , Xℓ,ℓ,ℓ sq can be bounded in terms of the angles in tθpuj , Xj,j,j q, k ď j ď ℓu but not the other way around, i.e., together tθpuj , Xj,j,j q, k ď j ď ℓu cannot be bounded by something in terms of ΘpUp:,k:ℓq , rXk,k,k , . . . , Xℓ,ℓ,ℓ sq in general as we argued in Remark 2.1. For bounding errors between eigenvectors uj and Ritz vectors u ˜j , the following inequality was established in [21, Theorem 3] (which is also true for the block Lanczos method as commented there [21, p.703]): g ˜ ¸2 f f }Π ApI ´ Π q} n n 2 sin θpuj , u ˜j q ď χj sin θpuj , Kn q with χj “ e1 ` . (6.6) ˜p| minp‰j |λj ´ λ This inequality can grossly overestimate sin θpuj , u ˜j q even with the “exact” (i.e., computed) sin θpuj , Kn q for the ones associated with a cluster of eigenvalues due to possibly extremely ˜ p |, not to mention after using (6.3) to bound sin θpuj , Kn q. By tiny gap minp‰j |λj ´ λ Proposition 2.3, we have g f ℓ fÿ r } sin ΘpUp:,k:ℓq , Up:,k:ℓq q}F ď e sin2 θpuj , u ˜j q j“k

g f ℓ fÿ ďe χ2j sin2 θpuj , Kn q

(6.7)

j“k

g f ℓ „ fÿ ďe j“k

ȷ2 χj ξj,j tan θpuj , Xj,j,j q . Tn´j pκj,j,nb q

(6.8)

Unlike previously, where finding good bounds for all sin θ(u_j, K_n) and finding a good bound for sin Θ(U_(:,k:ℓ), K_n) are comparably equivalent tasks due to (6.1) and (6.2), any bound derived from bounds for all sin² θ(u_j, ũ_j) inherently is likely to overestimate ‖sin Θ(U_(:,k:ℓ), Ũ_(:,k:ℓ))‖²_F by a great deal, because there is no simple way to bound Σ_{j=k}^{ℓ} sin² θ(u_j, ũ_j) in terms of ‖sin Θ(U_(:,k:ℓ), Ũ_(:,k:ℓ))‖²_F; i.e., the former may already be much bigger than the latter, as we argued in Remark 2.1. So we anticipate the bounds of (6.7) and (6.8) to be bad when the λ_j with k ≤ j ≤ ℓ form a tight cluster.

Saad [21, Theorem 6] provides a bound on λ_j − λ̃_j individually. The theorem is the same as our Theorem 5.1 for the case k = ℓ = i. Concerning the eigenvalue cluster {λ_k, ..., λ_ℓ}, Saad's theorem gives

    sqrt( Σ_{j=k}^{ℓ} (λ_j − λ̃_j)² ) ≤ sqrt( Σ_{j=k}^{ℓ} (λ_j − λ_N)² [ ζ_j tan θ(u_j, X_{j,j,j}) / T_{n−j}(κ_{j,j,n_b}) ]⁴ ).     (6.9)

This bound, too, could be much bigger than the one of (5.1), because one or more ζ_j with k = i ≤ j ≤ ℓ are much bigger than ζ_i.

7  Numerical examples

In this section, we shall numerically test the effectiveness of our upper bounds on the convergence of the block Lanczos method in the case of a cluster. In particular, we will demonstrate that our upper bounds are preferable to those of the single-vector/eigenvalue type of Saad [21], especially in the case of a tight cluster. Specifically, (a) the subspace angle Θ(U_(:,k:ℓ), X_{i,k,ℓ}) used in our bounds is more reliable than the individual angles in {θ(u_j, X_{j,j,j}), k ≤ j ≤ ℓ} taken together for bounding Θ(U_(:,k:ℓ), K_n) (see the remarks in section 6), and (b) our upper bounds are much sharper than those derived from Saad's bounds in the presence of tight clusters of eigenvalues. We point out that the worst individual bound of (6.3) or (6.6), or for λ_j − λ̃_j, is of the same magnitude as the derived bound of (6.5) or (6.8) or (6.9), respectively. So if a derived bound is bad, the worst corresponding individual bound cannot be much better. For this reason, we shall focus on comparing our bounds to the derived bounds of section 6. Our first example below is intended to illustrate the first point (a), while the second example supports the second point (b). The third example is essentially taken from [21], again to show the effectiveness of our upper bounds. We implemented the block Lanczos method within MATLAB, with full reorthogonalization to simulate the block Lanczos method in exact arithmetic. This is the best one can do in a floating point environment. In our tests, without loss of generality, we take A = diag(λ_1, ..., λ_N) with special eigenvalue distributions to be specified later. Although we stated our theorems in unitarily invariant norms for generality, our numerical results are presented in terms of the Frobenius norm for easy understanding (and thus, for Theorem 4.1, γ by (4.15) is used). No breakdown was encountered during all Lanczos iterations.
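A minimal reproduction of this setup might look as follows; this Python sketch is an added illustration (the authors' code is in MATLAB): it builds the diagonal test matrix, assembles an orthonormal basis of K_n(A, V_0) block by block with full reorthogonalization, and evaluates the observed ‖sin Θ(U_(:,k:ℓ), K_n)‖_F, i.e., the quantity ϵ_3 of (7.1c) below.

```python
# Hedged sketch of the experimental setup of this section (Python instead of the authors' MATLAB).
import numpy as np

def sin_theta_F(U_part, Q):
    """||sin Theta(span(U_part), span(Q))||_F from the cosines: singular values of Q^H U_part."""
    s = np.linalg.svd(Q.conj().T @ U_part, compute_uv=False)
    s = np.clip(s, 0.0, 1.0)
    return np.sqrt(np.sum(1.0 - s**2))            # sum of sin^2 of the canonical angles

rng = np.random.default_rng(0)
N, nb, n, k, ell = 600, 3, 20, 1, 3
lam = np.concatenate(([3.5, 3.0, 2.5], 1 - 5 * np.arange(4, N + 1) / N))   # Example 7.1 spectrum
A = np.diag(lam)                                  # eigenvectors are e_1, ..., e_N
U_cluster = np.eye(N)[:, k-1:ell]                 # U_(:,k:l) = [e_1, e_2, e_3]

V0 = rng.standard_normal((N, nb))
K = [np.linalg.qr(V0)[0]]                         # orthonormal basis of K_n(A, V0), block by block
for _ in range(n - 1):
    Z = A @ K[-1]
    for W in K:                                   # full reorthogonalization
        Z = Z - W @ (W.conj().T @ Z)
    K.append(np.linalg.qr(Z)[0])
Qn = np.concatenate(K, axis=1)

print(sin_theta_F(U_cluster, Qn))                 # observed epsilon_3 = ||sin Theta(U_(:,k:l), K_n)||_F
```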

Table 7.1: Example 7.1: N = 600, n = 20, i = k = 1, ℓ = 3, n_b = 3, and V_0 as in (7.2)

    quantity                                observed        our bound               Saad bound
    sqrt(Σ_{j=k}^{ℓ} |λ̃_j − λ_j|²)          1.9 × 10^-14    4.4 × 10^-14  (5.1)     4.2 × 10^-3   (6.9)
    ‖sin Θ(U_(:,k:ℓ), Ũ_(:,k:ℓ))‖_F         3.5 × 10^-8     1.3 × 10^-7   (4.11)    6.8 × 10^-2   (6.8)
    ‖sin Θ(U_(:,k:ℓ), K_n)‖_F               3.3 × 10^-8     1.0 × 10^-7   (4.9)     2.5 × 10^-2   (6.5)

We will measure the following errors (in all examples, k = i = 1 and ℓ = n_b = 3):

    ϵ_1 = sqrt( Σ_{j=k}^{ℓ} |λ̃_j − λ_j|² ),                    (7.1a)
    ϵ_2 = ‖sin Θ(U_(:,k:ℓ), Ũ_(:,k:ℓ))‖_F,                      (7.1b)
    ϵ_3 = ‖sin Θ(U_(:,k:ℓ), K_n)‖_F,                            (7.1c)
    ϵ_4 = ‖tan Θ(U_(:,k:ℓ), K_n)‖_F,                            (7.1d)

for their numerically observed values (considered as "exact"), the bounds of Theorems 4.1 and 5.1, and the derived bounds of (6.9), (6.8) and (6.5), considered as "Saad's bounds" for comparison purposes. Rigorously speaking, these are not Saad's bounds. For very tiny θ_1(U_(:,k:ℓ), K_n), ϵ_3 ≈ ϵ_4 since

    ϵ_3 ≤ ϵ_4 ≤ ϵ_3 / sqrt(1 − sin² θ_1(U_(:,k:ℓ), K_n)).

Indeed, in the examples that follow, the difference between ϵ_3 and ϵ_4 is so tiny that we can safely ignore it. Therefore we will be focusing on ϵ_1, ϵ_2, and ϵ_3, but not ϵ_4.

Example 7.1. We take N = 600, the number of Lanczos steps n = 20, and

    λ_1 = 3.5,  λ_2 = 3,  λ_3 = 2.5,   λ_j = 1 − 5j/N,  j = 4, ..., N,

and set i = k = 1, ℓ = 3 and n_b = 3. There are two eigenvalue clusters: {λ_1, λ_2, λ_3} and {λ_4, ..., λ_N}. We are seeking approximations related to the first cluster {λ_1, λ_2, λ_3}. The gap 0.5 between eigenvalues within the first cluster is there to ensure that our investigation of point (a) will not be affected too much by the eigenvalue closeness within the cluster. The effect of the closeness is, however, the subject of Example 7.2. Our first test run is with a special V_0 given by

           ⎡ 1            0             0            ⎤
           ⎢ 0            1             0            ⎥
           ⎢ 0            0             1            ⎥
    V_0 =  ⎢ 1/N          sin 1         cos 1        ⎥ .                                 (7.2)
           ⎢ ⋮            ⋮             ⋮            ⎥
           ⎣ (N−n_b)/N    sin(N−n_b)    cos(N−n_b)   ⎦

Table 7.2: Example 7.1: averages over 20 random V_0

    quantity                                observed        our bound               Saad bound
    sqrt(Σ_{j=k}^{ℓ} |λ̃_j − λ_j|²)          1.2 × 10^-14    2.5 × 10^-13  (5.1)     8.2 × 10^-8   (6.9)
    ‖sin Θ(U_(:,k:ℓ), Ũ_(:,k:ℓ))‖_F         5.9 × 10^-8     2.5 × 10^-7   (4.11)    2.7 × 10^-4   (6.8)
    ‖sin Θ(U_(:,k:ℓ), K_n)‖_F               5.6 × 10^-8     2.0 × 10^-7   (4.9)     9.9 × 10^-5   (6.5)

Table 7.1 reports ϵ_1, ϵ_2, and ϵ_3 and their bounds. Our bounds are very sharp – comparable to the observed ones, but the ones by (6.9), (6.8) and (6.5) overestimate the true errors too much to be of much use. Looking at (6.5) carefully, we find two contributing factors that make the resulting bound too big. The first contributing factor is the constants

    ξ_{1,1} = 1,   ξ_{2,2} = 15,   ξ_{3,3} = 105.                                        (7.3)

The second contributing factor is the angles θ(u_j, X_{j,j,j}) for k ≤ j ≤ ℓ:

    j                              1         2         3
    θ_j(U_(:,k:ℓ), X_{i,k,ℓ})      1.51299   1.51298   1.49976
    θ(u_j, X_{j,j,j})              1.49976   1.57066   1.57069

What we see is that the canonical angles θ_j(U_(:,k:ℓ), X_{i,k,ℓ}) are bounded away from π/2 but the last two of the angles θ(u_j, X_{j,j,j}) are nearly π/2 = 1.57080. This of course has something to do with the particular initial V_0 in (7.2). But given that the exact eigenvectors are e_1, e_2, e_3, this V_0 should not be considered a deliberate choice made simply to make our bounds look good. Similar reasons explain why the upper bounds of (6.9) and (6.8) are poor as well. In fact, now the first contributing factor is

    ζ_1 = 1,   ζ_2 = 15 − 1.5 × 10^-13 ≈ 15,   ζ_3 = 105 − 3.8 × 10^-12 ≈ 105,           (7.4)
    χ_1 ≈ χ_2 ≈ χ_3 ≈ 2.6860,                                                            (7.5)

and then again the set of angles θ(u_j, X_{j,j,j}) for k ≤ j ≤ ℓ is the second contributing factor. Next we test random initial V_0 as generated by randn(N, n_b) in MATLAB. Table 7.2 reports the averages of the same errors/bounds as reported before over 20 test runs. The bounds of (6.9), (6.8) and (6.5) are much better now, but still about 1000 times bigger than ours. The following table displays the canonical angles θ_j(U_(:,k:ℓ), X_{i,k,ℓ}) as well as θ(u_j, X_{j,j,j}):

    j                              1        2        3
    θ_j(U_(:,k:ℓ), X_{i,k,ℓ})      1.5497   1.5183   1.4712
    θ(u_j, X_{j,j,j})              1.5481   1.5359   1.5281


Table 7.3: Example 7.2: averages over 20 random V_0

  sqrt(Σ_{j=k}^{ℓ} |λ̃_j − λ_j|²):
    δ        observed         bound (5.1)      Saad bound (6.9)
    10^-1    5.6 × 10^-15     2.6 × 10^-13     2.6 × 10^-6
    10^-2    1.8 × 10^-13     5.4 × 10^-12     8.9 × 10^-3
    10^-3    6.1 × 10^-15     2.5 × 10^-14     6.1 × 10^+2
    10^-4    2.3 × 10^-14     5.1 × 10^-14     1.7 × 10^+6
    10^-5    4.0 × 10^-15     3.7 × 10^-14     6.2 × 10^+10

  ‖sin Θ(U_(:,k:ℓ), Ũ_(:,k:ℓ))‖_F:
    δ        observed         bound (4.11)     Saad bound (6.8)
    10^-1    7.3 × 10^-8      3.8 × 10^-7      8.9 × 10^-3
    10^-2    4.1 × 10^-7      1.5 × 10^-6      4.9 × 10^+0
    10^-3    3.4 × 10^-8      1.2 × 10^-7      4.1 × 10^+3
    10^-4    4.7 × 10^-8      1.6 × 10^-7      6.7 × 10^+6
    10^-5    4.9 × 10^-8      1.5 × 10^-7      1.3 × 10^+10

  ‖sin Θ(U_(:,k:ℓ), K_n)‖_F:
    δ        observed         bound (4.9)      Saad bound (6.5)
    10^-1    6.6 × 10^-8      2.2 × 10^-7      6.7 × 10^-4
    10^-2    3.8 × 10^-7      3.8 × 10^-7      3.8 × 10^-2
    10^-3    3.2 × 10^-8      7.2 × 10^-8      3.2 × 10^+1
    10^-4    4.3 × 10^-8      1.0 × 10^-7      5.3 × 10^+2
    10^-5    4.5 × 10^-8      9.7 × 10^-8      1.0 × 10^+5

It shows that the randomness in V_0 makes none of θ(u_j, X_{j,j,j}) for k ≤ j ≤ ℓ as close to π/2 as V_0 in (7.2) does. In fact, they are about at the same level as the canonical angles θ_j(U_(:,k:ℓ), X_{i,k,ℓ}). Therefore, the sole contributing factor that makes the bounds of (6.9), (6.8), and (6.5) worse than ours is the ξ's and ζ's in (7.3) and (7.4).

Example 7.2. Let N = 1000, n = 25, and

    λ_1 = 2 + δ,  λ_2 = 2,  λ_3 = 2 − δ,   λ_j = 1 − 5j/N,  j = 4, ..., N,

and again set i = k = 1, ℓ = 3, n_b = 3. We will adjust the parameter δ > 0 to control the tightness among eigenvalues within the cluster {λ_1, λ_2, λ_3} and to see how it affects the upper bounds and the convergence rate of the block Lanczos method. We will demonstrate that our bounds, which are of the eigenspace/eigenvalue-cluster type, are insensitive to δ and barely change as δ goes to 0, while "Saad's bounds" are very sensitive and quickly become useless as δ goes to 0. We randomly generate initial V_0 and investigate the average errors/bounds over 20 test runs. Since the randomness will reduce the difference in contributions by Θ(U_(:,k:ℓ), X_{i,k,ℓ}) and by θ(u_j, X_{j,j,j}), as we have seen in Example 7.1, the gap δ within the cluster and the gap between the cluster and the rest of the eigenvalues are the only contributing factors for the approximation errors ϵ_i. Table 7.3 reports the averages of the errors defined in (7.1) and the averages of their bounds of Theorems 4.1 and 5.1 and of "Saad's bounds" of (6.9), (6.8) and (6.5) over 20 test runs. From the table, we observe that (i) all of our bounds, which are of the eigenspace/eigenvalue-cluster type, are rather sharp – comparable to the observed ("exact") errors, and moreover, they seem to be independent of the parameter δ as it becomes tinier and tinier; (ii) as δ gets tinier and tinier, the "Saad's bounds" of (6.9), (6.8), and (6.5) increase dramatically and quickly contain no useful information for δ = 10^-3 or smaller. To explain observation (ii), we first note that the ξ_{j,j} in (6.5) are given by

    ξ_{1,1} = 1,   ξ_{2,2} = 1 + 6/δ,   and   ξ_{3,3} = 6/δ + 36/δ².

[Figure 7.1 about here: two panels plotted against δ on logarithmic scales. Left panel, "Observed ϵ_5, observed ϵ_2, and our bound on ϵ_2": the observed ϵ_5, the observed ϵ_2, and the bound (4.11) on ϵ_2. Right panel, "Observed ϵ_5 and its Saad bounds": the observed ϵ_5 with its bounds β_1 and β_2.]

Figure 7.1: Observed ϵ_5 deteriorates as δ goes to 0, while ϵ_2 seems to remain unchanged in magnitude. "Saad's bounds" on ϵ_5 are not included in the left plot in order not to obscure the radical difference between ϵ_2 and ϵ_5 for tiny δ. They grow uncontrollably to ∞ as δ goes to 0.

Therefore, the "Saad's bound" of (6.5) is about O(δ^-2) for tiny δ. Similarly, since ξ_{j,j} and ζ_j are almost of the same level, it can be seen from (6.9) that the "Saad's bound" for ϵ_1 is about O(δ^-4). For (6.8), when n is moderate, χ_j is about O(δ^-1), and therefore the "Saad's bound" for ϵ_2 is about O(δ^-3) for tiny δ. We argued in section 6 that Saad's bound on θ(u_j, ũ_j) can be poor in a tight cluster of eigenvalues. Practically, it is also unreasonable to expect it to go as tiny as the machine's unit roundoff u as the number of Lanczos steps increases. For this example, by the Davis-Kahan sin θ theorem [7] (see also [22]), we should expect, for 1 ≤ j ≤ 3,

    (observed) sin θ(u_j, ũ_j) = O(Lanczos approximation error) + O(u/δ),

where O(u/δ) is due to the machine's roundoff and can dominate the Lanczos approximation error after a certain number of Lanczos steps. To illustrate this point, we plot, in Figure 7.1,

    ϵ_5 = sqrt( Σ_{j=k}^{ℓ} sin² θ(u_j, ũ_j) )

as δ varies from 10^-1 down to 10^-11. Also plotted are its two upper bounds β_1 and β_2 of (6.7) and (6.8):

    ϵ_5 ≤ β_1 := sqrt( Σ_{j=k}^{ℓ} χ_j² sin² θ(u_j, K_n) )
        ≤ β_2 := sqrt( Σ_{j=k}^{ℓ} [ χ_j ξ_{j,j} tan θ(u_j, X_{j,j,j}) / T_{n−j}(κ_{j,j,n_b}) ]² ),

Table 7.4: Example 7.3: N = 900, n = 12, i = k = 1, ℓ = 3, n_b = 3

    quantity                                observed        our bound               Saad bound
    sqrt(Σ_{j=k}^{ℓ} |λ̃_j − λ_j|²)          9.4 × 10^-10    1.5 × 10^-8   (5.1)     4.8 × 10^-4   (6.9)
    ‖sin Θ(U_(:,k:ℓ), Ũ_(:,k:ℓ))‖_F         3.9 × 10^-5     1.3 × 10^-4   (4.11)    3.0 × 10^-2   (6.8)
    ‖sin Θ(U_(:,k:ℓ), K_n)‖_F               3.7 × 10^-5     1.1 × 10^-4   (4.9)     1.9 × 10^-2   (6.5)

as well as the observed values of ϵ_2 = ‖sin Θ(U_(:,k:ℓ), Ũ_(:,k:ℓ))‖_F defined in (7.1b) and its upper bound of (4.11). From the figure, we see that the observed ϵ_5 is about 10^-7 for δ ≥ 10^-7 and then starts to deteriorate, from about 10^-7 up to 10^-3, as δ goes down to 10^-8 or smaller. At the same time, the magnitudes of ϵ_2 and its bound of (4.11) remain unchanged, around 10^-7. This supports our point that one should measure the convergence of the entire invariant subspace corresponding to tightly clustered eigenvalues, rather than each individual eigenvector within the subspace.

Example 7.3. Our last example is from [21]:

    λ_1 = 2,  λ_2 = 1.6,  λ_3 = 1.4,   λ_j = 1 − (j−3)/N,  j = 4, ..., N,

           ⎡ 1   1   1   1   1   1   ··· ⎤ H
    V_0 =  ⎢ 1   0  −1   1   0  −1   ··· ⎥  .
           ⎣ 1  −2   1   1  −2   1   ··· ⎦

Since the first three eigenvalues are in a cluster, we take i = k = 1, ℓ = 3 and n_b = 3. Saad [21] tested this problem for N = 60 and n = 12. We ran our code with various N, including N = 60, and saw little variation in the observed errors and their bounds. What we report in Table 7.4 is for N = 900 and n = 12. The table shows that our bounds agree very well with the corresponding observed values but "Saad's bounds" are much bigger, roughly the square roots of the observed values and our bounds. This is due mainly to the small gap between λ_2 and λ_3, since we observed that

    θ_j(U_(:,k:ℓ), X_{i,k,ℓ}) ≈ θ(u_j, X_{j,j,j}) ≈ 1.51303   for j = 1, 2, 3.

8  More bounds

The numerical examples in the previous section indicate that ξi,k in Theorem 4.1 and ζi in Theorem 5.1 can worsen the error bounds, especially in the case of tightly clustered eigenvalues. However, both can be made 1 if only the first nb eigenpairs are considered (see Remark 4.1). To use Theorems 4.1 or 5.1 for eigenvalues/eigenvectors involving eigenpairs beyond the first nb ones, we have to have ξi,k or ζi in the mix. Thus potentially the resulting bounds may overestimate too much to be indicative. Another situation in which the bounds of Theorems 4.1 or 5.1 would overestimate the actual quantities (by too much) is when there are clustered eigenvalues with cluster sizes bigger than nb , because then δi,ℓ,nb is very tiny for any choices of i, k, and ℓ. Ye [24] 25

proposed an adaptive strategy to use variable nb aiming at updating nb adaptively so that it is bigger than or equal to the biggest cluster size of interest. In what follows, we propose more error bounds with ξi,k and ζi always 1. However, it requires the knowledge of the canonical angles from the interested eigenspace to Ki pA, V0 q, where i ă n is a positive integer. Roughly speaking, the new results show that if the eigenspace is not too far from Ki pA, V0 q for some i ă n (in the sense that no canonical angle is too near π{2), the canonical angles from the eigenspace to Kn pA, V0 q are reduced by a factor purely depending upon the optimal polynomial reduction. To proceed, we let i ă n. Now we are considering the 1st to pinb qth eigenpairs of A among which the kth to ℓth eigenvalues may form a cluster as in λtN

λintb

λℓ λk

λt1

(cluster)

where 1 ≤ k < ℓ ≤ inb and 1 ≤ i < n. Suppose

    θ_1(U_(:,1:inb), Q_i) = θ_1(U_(:,1:inb), K_i(A, V0)) < π/2,  i.e.,  rank(Q_i^H U_(:,1:inb)) = inb.    (8.1)

Consider an application of Proposition 2.4(b) with

    k_1 = ℓ − k + 1,   X = R(Q_i),   Y = R(U_(:,1:inb)),   [y_1, y_2, ..., y_{k_1}] = [u_k, u_{k+1}, ..., u_ℓ].

The application yields a unique Z_{i,k,ℓ} := [x_1, x_2, ..., x_{k_1}] such that R(Z_{i,k,ℓ}) ⊆ R(Q_i) and

    U_(:,1:inb) U_(:,1:inb)^H Z_{i,k,ℓ} = U_(:,k:ℓ) ≡ [u_k, u_{k+1}, ..., u_ℓ].    (8.2)

Moreover,

    |||sin Θ(U_(:,k:ℓ), Z_{i,k,ℓ})||| ≤ |||sin Θ(U_(:,1:inb), Q_i)|||,    (8.3)
    |||tan Θ(U_(:,k:ℓ), Z_{i,k,ℓ})||| ≤ |||tan Θ(U_(:,1:inb), Q_i)|||.    (8.4)

Finally, we observe that

    K_{n−i+1}(A, Q_i) = K_n(A, V0).    (8.5)
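The identity (8.5) is easy to confirm numerically. The following small sketch (ours; random data, full column rank assumed at every step, and the helper name is ours) builds orthonormal bases of K_n(A, V0) and of K_{n−i+1}(A, Q_i) blockwise with reorthogonalization and checks that all principal angles between the two subspaces vanish up to roundoff.

    import numpy as np
    from scipy.linalg import orth, subspace_angles

    def krylov_basis(A, V, m):
        # Orthonormal basis of K_m(A, V), built block by block with full
        # reorthogonalization (two passes) so that rank deficiencies are deflated.
        Q = orth(V)
        basis = [Q]
        for _ in range(m - 1):
            W = A @ Q
            for _ in range(2):                     # reorthogonalize twice for safety
                for B in basis:
                    W = W - B @ (B.T @ W)
            Q = orth(W)
            basis.append(Q)
        return np.hstack(basis)

    rng = np.random.default_rng(0)
    N, nb, n, i = 200, 3, 8, 3
    A = rng.standard_normal((N, N)); A = (A + A.T) / 2   # random real symmetric A
    V0 = rng.standard_normal((N, nb))

    Qi = krylov_basis(A, V0, i)                  # orthonormal basis of K_i(A, V0)
    KA = krylov_basis(A, V0, n)                  # basis of K_n(A, V0)
    KB = krylov_basis(A, Qi, n - i + 1)          # basis of K_{n-i+1}(A, Q_i)
    print(KA.shape[1], KB.shape[1])              # both n*nb = 24 generically
    print(np.max(subspace_angles(KA, KB)))       # ~ 0 up to roundoff, confirming (8.5)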

The rest is a straightforward application of Theorems 4.1 and 5.1 to K_{n−i+1}(A, Q_i).

Theorem 8.1. For any unitarily invariant norm |||·|||, we have

    |||tan Θ(U_(:,k:ℓ), K_n)||| ≤ [1 / T_{n−i}(κ̂_{i,ℓ,nb})] |||tan Θ(U_(:,k:ℓ), Z_{i,k,ℓ})|||,    (8.6)

    |||sin Θ(U_(:,k:ℓ), Ũ_(:,k:ℓ))||| ≤ γ |||sin Θ(U_(:,k:ℓ), K_n)|||    (8.7)
                                      ≤ [γ / T_{n−i}(κ̂_{i,ℓ,nb})] |||tan Θ(U_(:,k:ℓ), Z_{i,k,ℓ})|||,    (8.8)

where Z_{i,k,ℓ} is defined by (8.2),

    δ̂_{i,ℓ,nb} = (λ_ℓ − λ_{inb+1}) / (λ_ℓ − λ_N),   κ̂_{i,ℓ,nb} = (1 + δ̂_{i,ℓ,nb}) / (1 − δ̂_{i,ℓ,nb}),    (8.9)

and γ takes the same form as in (4.13) with, again, the constant c lying between 1 and π/2 and being 1 for the Frobenius norm or if λ̃_{k−1} > λ_k, and with η being the same as the one in (4.14). For the Frobenius norm, γ can be improved to the one in (4.15).

Theorem 8.2. Let k = 1. For any unitarily invariant norm, we have

    |||diag(λ_1 − λ̃_1, λ_2 − λ̃_2, ..., λ_ℓ − λ̃_ℓ)||| ≤ (λ_1 − λ_N) [1 / T_{n−i}(κ̂_{i,ℓ,nb})]² |||tan Θ(U_(:,1:ℓ), Z_{i,1,ℓ})|||²,    (8.10)

where κ̂_{i,ℓ,nb} is the same as the one in (8.9).
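To give a feel for the reduction factor appearing in (8.6)-(8.8) and, squared, in (8.10), the following small helper (ours, not from the paper) evaluates δ̂_{i,ℓ,nb} and κ̂_{i,ℓ,nb} of (8.9) together with 1/T_{n−i}(κ̂_{i,ℓ,nb}), using the hyperbolic form T_m(x) = cosh(m·arccosh(x)), which is valid for x ≥ 1.

    import numpy as np

    def cheb_reduction(lam, n, i, ell, nb):
        # lam: eigenvalues sorted lam[0] >= lam[1] >= ... >= lam[-1];
        # i, ell, nb are the 1-based indices of the text.
        lam = np.asarray(lam, dtype=float)
        delta_hat = (lam[ell - 1] - lam[i * nb]) / (lam[ell - 1] - lam[-1])   # (8.9)
        kappa_hat = (1.0 + delta_hat) / (1.0 - delta_hat)                     # (8.9)
        Tm = np.cosh((n - i) * np.arccosh(kappa_hat))    # Chebyshev T_{n-i}(kappa_hat)
        return delta_hat, kappa_hat, 1.0 / Tm

    # e.g., the spectrum of Example 7.3 with n = 12, i = 1, ell = 3, nb = 3:
    N = 900
    lam = np.r_[2.0, 1.6, 1.4, 1.0 - (np.arange(4, N + 1) - 3) / N]
    print(cheb_reduction(lam, n=12, i=1, ell=3, nb=3))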

9 Generalized eigenvalue problem

Consider the generalized eigenvalue problem for A − λM, where A and M are N × N Hermitian and M is positive definite. Notice that this eigenvalue problem is equivalent to the standard Hermitian eigenvalue problem for

    Â := M^{−1/2} A M^{−1/2}.    (9.1)

A (block) Lanczos process for A − λM can essentially be obtained by symbolically writing out the process for Â and then translating it into one without the matrix square root [20]. Take Algorithm 3.1 as an example, and adopt the convention that each notational symbol for Â always carries a hat. The key mathematical step of Algorithm 3.1 applied to Â is the equation

    V̂_{j+1} B̂_j = Â V̂_j − V̂_j Â_j − V̂_{j−1} B̂_{j−1}^H,    (9.2)

implemented at Lines 4 to 8 there, where V̂_j has orthonormal columns. Define

    V_j := M^{−1/2} V̂_j,   A_j := Â_j,   B_j := B̂_j.    (9.3)

Then V_j has M-orthonormal columns, i.e., V_j^H M V_j = I_{nb}. Multiply (9.2) by M^{1/2} from the left to get

    M V_{j+1} B_j = A V_j − M V_j A_j − M V_{j−1} B_{j−1}^H,    (9.4)

which will be the key equation that yields the following generic block Lanczos process (Algorithm 9.1) for A − λM.

Algorithm 9.1 Simple Block Lanczos Process for A − λM
Given a Hermitian pencil A − λM ∈ C^{N×N} with positive definite M and V0 ∈ C^{N×nb} with rank(V0) = nb, this generic block Lanczos process performs a partial tridiagonal reduction on A − λM.
 1: perform M-orthogonalization on the given V0 ∈ C^{N×nb} (rank(V0) = nb) to obtain V0 = V1 B0 (e.g., via modified Gram-Schmidt), where V1 ∈ C^{N×nb} satisfies V1^H M V1 = I_{nb} and B0 ∈ C^{nb×nb};
 2: Y = A V1, A1 = V1^H Y;
 3: Y = Y − M V1 A1; solve M Z = Y for Z;
 4: perform M-orthogonalization on Z to obtain Z = V2 B1, where V2 ∈ C^{N×nb} satisfies V2^H M V2 = I_{nb} and B1 ∈ C^{nb×nb};
 5: for j = 2 to n do
 6:   Y = A V_j, A_j = V_j^H Y;
 7:   Y = Y − M V_j A_j − M V_{j−1} B_{j−1}^H, and solve M Z = Y for Z;
 8:   perform M-orthogonalization on Z to obtain Z = V_{j+1} B_j, where V_{j+1} ∈ C^{N×nb} satisfies V_{j+1}^H M V_{j+1} = I_{nb} and B_j ∈ C^{nb×nb};
 9: end for

In the generic situation, i.e., when all Z in Lines 4 and 8 have full column rank nb, this process produces

    Q_n = [V_1, V_2, ..., V_n] ∈ C^{N×nnb},   Q_n^H M Q_n = I_{nnb},    (9.5a)

and

    T_n = Q_n^H A Q_n = [ A_1    B_1^H
                          B_1    A_2      B_2^H
                                 ⋱        ⋱        ⋱
                                          B_{n−2}  A_{n−1}  B_{n−1}^H
                                                   B_{n−1}  A_n      ] ∈ C^{nnb×nnb},    (9.5b)

which are related by the fundamental equation

    A Q_n = M Q_n T_n + [0_{N×(n−1)nb}, M V_{n+1} B_n].    (9.6)
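For concreteness, here is a minimal dense NumPy transcription of Algorithm 9.1 (our own sketch, not the paper's code). It assumes the generic case in which every Z has full column rank nb, so each M-orthogonalization can be done via a Cholesky factorization of the small Gram matrix Z^H M Z; no reorthogonalization or breakdown handling is attempted.

    import numpy as np

    def m_orth(Z, M):
        # Thin "M-QR": return V, B with Z = V B and V^H M V = I.
        # Assumes Z has full column rank, so the Cholesky factor exists.
        G = Z.conj().T @ (M @ Z)
        R = np.linalg.cholesky(G).conj().T          # G = R^H R, R upper triangular
        V = np.linalg.solve(R.T, Z.T).T             # V = Z R^{-1}
        return V, R

    def block_lanczos_pencil(A, M, V0, n):
        # Generic block Lanczos process of Algorithm 9.1 for the pencil A - lambda*M.
        # Returns Q_n and the block tridiagonal T_n of (9.5a)-(9.5b).
        nb = V0.shape[1]
        V1, _ = m_orth(V0, M)                       # line 1: V0 = V1 B0
        Y = A @ V1                                  # line 2
        As = [V1.conj().T @ Y]
        Y = Y - M @ (V1 @ As[0])                    # line 3
        Z = np.linalg.solve(M, Y)                   # solve M Z = Y
        V2, B1 = m_orth(Z, M)                       # line 4: Z = V2 B1
        Vs, Bs = [V1, V2], [B1]
        for j in range(2, n + 1):                   # lines 5-9
            Vj, Vprev = Vs[-1], Vs[-2]
            Y = A @ Vj                              # line 6
            As.append(Vj.conj().T @ Y)
            Y = Y - M @ (Vj @ As[-1]) - M @ (Vprev @ Bs[-1].conj().T)   # line 7
            Z = np.linalg.solve(M, Y)
            Vnext, Bj = m_orth(Z, M)                # line 8: Z = V_{j+1} B_j
            Vs.append(Vnext); Bs.append(Bj)
        Qn = np.hstack(Vs[:n])                      # Q_n = [V_1, ..., V_n]
        Tn = np.zeros((n * nb, n * nb), dtype=As[0].dtype)
        for j in range(n):                          # assemble block tridiagonal (9.5b)
            sj = slice(j * nb, (j + 1) * nb)
            Tn[sj, sj] = As[j]
            if j + 1 < n:
                sk = slice((j + 1) * nb, (j + 2) * nb)
                Tn[sk, sj] = Bs[j]
                Tn[sj, sk] = Bs[j].conj().T
        return Qn, Tn

On random Hermitian test data with M positive definite, one can check Q_n^H M Q_n ≈ I and the fundamental relation (9.6) up to roundoff, which is a convenient sanity test for any implementation along these lines.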

As for the standard eigenvalue problem, the block Lanczos method for A − λM is this Lanczos process followed by solving the eigenvalue problem for the Hermitian matrix T_n to obtain approximate eigenpairs for A − λM: any eigenpair (λ̃_j, w_j) of T_n gives an approximate eigenpair (λ̃_j, Q_n w_j) for A − λM. These λ̃_j are again called Ritz values and ũ_j = Q_n w_j Ritz vectors. We will use the same notation as detailed in (3.5) for T_n.

Let V̂_0 = M^{1/2} V_0. We have

    K_n(Â, V̂_0) = K_n(M^{−1/2} A M^{−1/2}, M^{1/2} V_0) = M^{1/2} K_n(M^{−1} A, V_0),

and thus

    K_n(M^{−1} A, V_0) = M^{−1/2} K_n(Â, V̂_0) = M^{−1/2} [R(V̂_1) ⊕ ··· ⊕ R(V̂_n)] = R(V_1) ⊕ ··· ⊕ R(V_n).    (9.7)

Correspondingly, the M-orthogonal projection onto K_n(M^{−1} A, V_0) is Π_n = Q_n Q_n^H M. In particular, Π_1 = V_1 V_1^H M.

Our previous approximation theorems in Sections 4 and 5 apply directly to Â and V̂_0. The resulting theorems can then be translated into theorems for A − λM upon using the relationships in (9.3) and (9.7). We introduce the following notation for A − λM:

    eigenvalues:                  λ_1 ≥ λ_2 ≥ ··· ≥ λ_N, and Λ = diag(λ_1, λ_2, ..., λ_N),
    M-orthonormal eigenvectors:   u_1, u_2, ..., u_N, and U = [u_1, u_2, ..., u_N],    (9.8)
    eigen-decomposition:          U^H A U = Λ and U^H M U = I_N.

Let û_j = M^{1/2} u_j and Û = M^{1/2} U. Then for Â:

    eigenvalues:               λ_1 ≥ λ_2 ≥ ··· ≥ λ_N, and Λ = diag(λ_1, λ_2, ..., λ_N),
    orthonormal eigenvectors:  û_1, û_2, ..., û_N, and Û = [û_1, û_2, ..., û_N],
    eigen-decomposition:       Â = Û Λ Û^H and Û^H Û = I_N.

Next we have the following angle relationship: for any Y, Ŷ ∈ C^{N×ℓ_1} and Z, Ẑ ∈ C^{N×ℓ_2} related by Ŷ = M^{1/2} Y and Ẑ = M^{1/2} Z,

    Θ(Ŷ, Ẑ) = Θ(M^{1/2} Y, M^{1/2} Z) =: Θ_M(Y, Z),    (9.9)

which defines the M-canonical angles from the one of R(Y) and R(Z) with the lower dimension to the other.

To proceed, we need an extension of Proposition 2.4 for the M-inner product. In fact, the proposition remains valid with these modifications: replace 1) all canonical angles by the corresponding M-canonical angles, 2) orthonormal by M-orthonormal, and 3) the orthogonal projection P_Y by the M-orthogonal projection Y Y^H M, where Y is an M-orthonormal basis matrix of Y.

Consider an application of the modified Proposition 2.4(b) for the M-inner product with

    k_1 = ℓ − k + 1,   X = R(V_0) = R(V_1),   Y = R(U_(:,i:i+nb−1)),   [y_1, y_2, ..., y_{k_1}] = [u_k, u_{k+1}, ..., u_ℓ],

assuming

    rank(V_0^H M U_(:,i:i+nb−1)) = rank(V_1^H M U_(:,i:i+nb−1)) = nb.    (9.10)

The application yields a unique X_{i,k,ℓ} := [x_1, x_2, ..., x_{k_1}] such that R(X_{i,k,ℓ}) ⊆ R(V_0) and

    U_(:,i:i+nb−1) U_(:,i:i+nb−1)^H M X_{i,k,ℓ} = U_(:,k:ℓ) ≡ [u_k, u_{k+1}, ..., u_ℓ].    (9.11)

Moreover,

    |||sin Θ_M(U_(:,k:ℓ), X_{i,k,ℓ})||| ≤ |||sin Θ_M(U_(:,i:i+nb−1), V_0)|||,    (9.12)
    |||tan Θ_M(U_(:,k:ℓ), X_{i,k,ℓ})||| ≤ |||tan Θ_M(U_(:,i:i+nb−1), V_0)|||.    (9.13)
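The M-canonical angles in (9.12)-(9.16) can be computed without forming M^{1/2}: if M = R^H R is a Cholesky factorization, then R = W M^{1/2} for some unitary W, so the canonical angles between R·Y and R·Z coincide with those between M^{1/2} Y and M^{1/2} Z. A small helper along these lines (ours; SciPy's subspace_angles performs the ordinary canonical-angle computation) might look as follows.

    import numpy as np
    from scipy.linalg import cholesky, orth, subspace_angles

    def m_canonical_angles(Y, Z, M):
        # M-canonical angles Theta_M(Y, Z) of (9.9), in radians, largest first.
        R = cholesky(M)                              # upper triangular, M = R^H R
        return subspace_angles(orth(R @ Y), orth(R @ Z))

Passing, for instance, U_(:,k:ℓ) together with X_{i,k,ℓ}, or together with a basis of K_n(M^{−1}A, V_0) such as Q_n, gives the angle arrays whose sines and tangents appear on the two sides of (9.14)-(9.16).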

In the rest of this section, we will always assume (9.10), and X_{i,k,ℓ} ∈ C^{N×(ℓ−k+1)} will have the same assignment as given here. In the two theorems below, ξ_{i,k}, κ_{i,ℓ,nb}, η, and ζ_i take the same forms as in (4.12), (4.14), (5.2), and (5.3), respectively, but with the eigenvalues λ_j in (9.8) for the pencil A − λM and with the Ritz values λ̃_j being the eigenvalues of the T_n produced here by Algorithm 9.1.

Theorem 9.1. For any unitarily invariant norm, we have

    |||tan Θ_M(U_(:,k:ℓ), K_n)||| ≤ [ξ_{i,k} / T_{n−i}(κ_{i,ℓ,nb})] |||tan Θ_M(U_(:,k:ℓ), X_{i,k,ℓ})|||,    (9.14)

    |||sin Θ_M(U_(:,k:ℓ), Ũ_(:,k:ℓ))||| ≤ γ |||sin Θ_M(U_(:,k:ℓ), K_n)|||    (9.15)
                                        ≤ [γ ξ_{i,k} / T_{n−i}(κ_{i,ℓ,nb})] |||tan Θ_M(U_(:,k:ℓ), X_{i,k,ℓ})|||,    (9.16)

where

    γ = 1 + (c/η) ‖M^{−1/2} Π_n^H A (I − Π_n) M^{−1/2}‖_2,    (9.17)

and the constant c lies between 1 and π/2, and is 1 for the Frobenius norm or if λ̃_{k−1} > λ_k. For the Frobenius norm, γ can be improved to

    γ = sqrt( 1 + [ (1/η) ‖M^{−1/2} Π_n^H A (I − Π_n) M^{−1/2}‖_2 ]² ).    (9.18)

Proof. Only the factor ‖M^{−1/2} Π_n^H A (I − Π_n) M^{−1/2}‖_2 in (9.17) and (9.18) needs a justification, since the rest is rather obvious. The inequality (9.15) is derived from (4.10) applied to Â = M^{−1/2} A M^{−1/2}, resulting in a factor ‖Π̂_n Â (I − Π̂_n)‖_2. Since Π̂_n = Q̂_n Q̂_n^H = M^{1/2} Q_n Q_n^H M^{1/2}, we have

    Π̂_n Â (I − Π̂_n) = M^{1/2} Q_n Q_n^H M^{1/2} · M^{−1/2} A M^{−1/2} (I − M^{1/2} Q_n Q_n^H M^{1/2})
                     = M^{−1/2} M Q_n Q_n^H A (I − Q_n Q_n^H M) M^{−1/2}
                     = M^{−1/2} Π_n^H A (I − Π_n) M^{−1/2},

which contributes the factor ‖M^{−1/2} Π_n^H A (I − Π_n) M^{−1/2}‖_2 in (9.17) and (9.18).

Theorem 9.2. Let k = i. For any unitarily invariant norm, we have

    |||diag(λ_i − λ̃_i, λ_{i+1} − λ̃_{i+1}, ..., λ_ℓ − λ̃_ℓ)||| ≤ (λ_i − λ_N) [ζ_i / T_{n−i}(κ_{i,ℓ,nb})]² |||tan Θ_M(U_(:,k:ℓ), X_{i,k,ℓ})|||².    (9.19)

Similarly, we can derive two more theorems as the extensions of Theorems 8.1 and 8.2. We omit the detailed statements.

10 Conclusions

We have established a new convergence theory for solving the large-scale Hermitian eigenvalue problem by the block Lanczos method, from the perspective of bounding approximation errors in the entire eigenspace associated with all eigenvalues in a tight cluster, in contrast to bounding errors in each individual approximate eigenvector as was done in Saad [21]. In a way, this is a natural approach to take because the block Lanczos method is known to be capable of computing multiple/clustered eigenvalues much faster than the single-vector Lanczos method (which will miss all but one copy of each multiple eigenvalue). The outcome is three error bounds on

1. the canonical angles from the eigenspace to the generated Krylov subspace,
2. the canonical angles between the eigenspace and its Ritz approximate subspace,
3. the total differences between the eigenvalues in the cluster and their corresponding Ritz values.

These bounds are much sharper than the existing ones and expose true rates of convergence of the block Lanczos method towards eigenvalue clusters, as illustrated by numerical examples. Furthermore, their sharpness is independent of the closeness of the eigenvalues within a cluster.

As is well known, the (block) Lanczos method favors the eigenvalues at both ends of the spectrum. So far, we have focused only on the convergence of the few largest eigenvalues and their associated eigenspaces, but, as is usually done, applying what we have established to the eigenvalue problem for −A leads to the corresponding convergence results for the few smallest eigenvalues and their associated eigenspaces. All results are stated in terms of unitarily invariant norms for generality, but specializing them to the spectral norm and the Frobenius norm is sufficient for all practical purposes.

References

[1] R. Bhatia. Matrix Analysis. Graduate Texts in Mathematics, vol. 169. Springer, New York, 1996.
[2] R. Bhatia, C. Davis, and P. Koosis. An extremal problem in Fourier analysis with applications to operator theory. J. Funct. Anal., 82:138-150, 1989.
[3] R. Bhatia, C. Davis, and A. McIntosh. Perturbation of spectral subspaces and solution of linear operator equations. Linear Algebra Appl., 52-53:45-67, 1983.
[4] E. W. Cheney. Introduction to Approximation Theory. Chelsea Publishing Company, New York, 2nd edition, 1982.
[5] Jane K. Cullum and W. E. Donath. A block Lanczos algorithm for computing the q algebraically largest eigenvalues and a corresponding eigenspace of large, sparse, real symmetric matrices. In 1974 IEEE Conference on Decision and Control including the 13th Symposium on Adaptive Processes, volume 13, pages 505-509, 1974.


[6] Jane K. Cullum and Ralph A. Willoughby. Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. I: Theory. SIAM, Philadelphia, 2002.
[7] C. Davis and W. Kahan. The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal., 7:1-46, 1970.
[8] J. Demmel. Applied Numerical Linear Algebra. SIAM, Philadelphia, PA, 1997.
[9] G. H. Golub and R. R. Underwood. The block Lanczos method for computing eigenvalues. In J. R. Rice, editor, Mathematical Software III, pages 361-377. Academic Press, New York, 1977.
[10] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, Maryland, 3rd edition, 1996.
[11] Zhongxiao Jia and G. W. Stewart. An analysis of the Rayleigh-Ritz method for approximating eigenspaces. Math. Comp., 70:637-647, 2001.
[12] S. Kaniel. Estimates for some computational techniques in linear algebra. Math. Comp., 20(95):369-378, July 1966.
[13] A. B. J. Kuijlaars. Which eigenvalues are found by the Lanczos method? SIAM J. Matrix Anal. Appl., 22(1):306-321, 2000.
[14] A. B. J. Kuijlaars. Convergence analysis of Krylov subspace iterations with methods from potential theory. SIAM Rev., 48(1):3-40, 2006.
[15] C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Nat. Bur. Standards, 45:255-282, 1950.
[16] Ren-Cang Li. Matrix perturbation theory. In L. Hogben, R. Brualdi, A. Greenbaum, and R. Mathias, editors, Handbook of Linear Algebra, Chapter 15. CRC Press, Boca Raton, FL, 2006.
[17] Ren-Cang Li. On Meinardus' examples for the conjugate gradient method. Math. Comp., 77(261):335-352, 2008. Electronically published on September 17, 2007.
[18] Ren-Cang Li. Sharpness in rates of convergence for the symmetric Lanczos method. Math. Comp., 79(269):419-435, 2010.
[19] C. C. Paige. The Computation of Eigenvalues and Eigenvectors of Very Large Sparse Matrices. PhD thesis, London University, London, England, 1971.
[20] B. N. Parlett. The Symmetric Eigenvalue Problem. SIAM, Philadelphia, 1998. This SIAM edition is an unabridged, corrected reproduction of the work first published by Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1980.
[21] Y. Saad. On the rates of convergence of the Lanczos and the block-Lanczos methods. SIAM J. Numer. Anal., 17(5):687-706, October 1980.
[22] G. W. Stewart and Ji-Guang Sun. Matrix Perturbation Theory. Academic Press, Boston, 1990.
[23] P.-Å. Wedin. On angles between subspaces. In B. Kågström and A. Ruhe, editors, Matrix Pencils, pages 263-285, New York, 1983. Springer.
[24] Qiang Ye. An adaptive block Lanczos algorithm. Numerical Algorithms, 12:97-110, 1996.
