Numerical Linear Algebra with Applications
Numer. Linear Algebra Appl. 2015; 22:371–392
Published online 11 November 2014 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/nla.1961

A convergence analysis for a sweeping preconditioner for block tridiagonal systems of linear equations

Hakan Bağcı¹, Joseph E. Pasciak²,*,† and Kostyantyn Y. Sirenko¹

¹ Division of Computer, Electrical, and Mathematical Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
² Department of Mathematics, Texas A&M University, College Station, TX 77843-3368, USA

SUMMARY

We study sweeping preconditioners for symmetric and positive definite block tridiagonal systems of linear equations. The algorithm provides an approximate inverse that can be used directly or in a preconditioned iterative scheme. These algorithms are based on replacing the Schur complements appearing in a block Gaussian elimination direct solve by hierarchical matrix approximations with reduced off-diagonal ranks. This involves developing low rank hierarchical approximations to inverses. We first provide a convergence analysis for the algorithm for reduced rank hierarchical inverse approximation. These results are then used to prove convergence and preconditioning estimates for the resulting sweeping preconditioner.

Received 1 March 2013; Revised 22 September 2014; Accepted 5 October 2014

KEY WORDS: Helmholtz equation; perfectly matched layer; Cartesian PML; sweeping preconditioner

1. INTRODUCTION

In this paper, we provide a convergence analysis for a so-called sweeping preconditioner for block tridiagonal systems of linear equations. The preconditioners are obtained by replacing the Schur complements appearing in a block Gaussian elimination algorithm by other, more conveniently defined, matrices. This paper uses hierarchical matrix constructions to provide the replacements. Although the algorithm is not new (cf. [1–3]), there has never been a complete analysis of its convergence. The sweeping preconditioner is sequential, while the hierarchical matrix approximations are recursive, so that, in both cases, their convergence is closely related to their stability.

We note that there is an immense literature providing techniques for efficiently solving discrete systems which arise from finite element and finite difference approximation, and the debate over the best method is unending. The method studied in this paper has been computationally demonstrated [1] to be an extremely effective technique for some fairly difficult systems arising from scattering problems with the far field condition coming from a PML approximation. Indeed, Engquist and Ying [1] observed very accurate results for this algorithm with only one or two iterations. This justifies studying the method from a theoretical point of view and developing a rigorous convergence theory. We shall only consider the case of discrete systems which are symmetric and positive definite. This restriction in the theory results from a technical problem bounding the growth of norms of the Schur complements appearing in the approximate inverse procedure.

To more concretely describe the method, we consider a general block tridiagonal matrix $B$ of dimension $\hat N \times \hat N$ with an $N \times N$ array of blocks $\{B_{ij}\}$. Then $B$ can be factored as $B = LSU$ where $S$ is

*Correspondence to: Joseph E. Pasciak, Department of Mathematics, Texas A&M University, College Station, TX 77843-3368, USA.
†E-mail: [email protected]


the block diagonal matrix with diagonal blocks $S_1 = B_{11}$ and $S_j = B_{jj} - B_{j,j-1} S_{j-1}^{-1} B_{j-1,j}$ for $j = 2, 3, \ldots, N$. The blocks of $L$ and $U$ are respectively given by
$$L_{ij} = \begin{cases} I & \text{if } i = j, \\ B_{ij} S_j^{-1} & \text{if } i = j + 1, \\ 0 & \text{otherwise} \end{cases} \qquad \text{and} \qquad U_{ij} = \begin{cases} I & \text{if } i = j, \\ S_i^{-1} B_{ij} & \text{if } j = i + 1, \\ 0 & \text{otherwise.} \end{cases}$$
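Before turning to the approximation, it may help to see this exact factorization in code. The sketch below is ours, not from the paper: it uses dense NumPy blocks and exact Schur complements, precisely the quantities that the sweeping preconditioner will replace by hierarchical approximations; the function names are ours.

```python
import numpy as np

def factor_block_tridiag(diag, lower, upper):
    """Compute the Schur complements S_j of the exact factorization B = L S U.

    diag[j] = B_{j,j}; lower[j] = B_{j+1,j}; upper[j] = B_{j,j+1} (0-based lists
    of dense blocks).  L and U are recovered from the S_j as in the text.
    """
    S = [diag[0]]
    for j in range(1, len(diag)):
        # S_j = B_{jj} - B_{j,j-1} S_{j-1}^{-1} B_{j-1,j}
        S.append(diag[j] - lower[j - 1] @ np.linalg.solve(S[j - 1], upper[j - 1]))
    return S

def solve_block_tridiag(S, lower, upper, f):
    """Solve B x = f: forward sweep (L), block solves (S), backward sweep (U)."""
    N = len(S)
    g = [f[0]]
    for j in range(1, N):                        # forward: g = L^{-1} f
        g.append(f[j] - lower[j - 1] @ np.linalg.solve(S[j - 1], g[j - 1]))
    x = [np.linalg.solve(S[N - 1], g[N - 1])]    # backward: solve S U x = g
    for j in range(N - 2, -1, -1):
        x.insert(0, np.linalg.solve(S[j], g[j] - upper[j] @ x[0]))
    return x
```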

The sweeping preconditioner $\tilde B = \tilde L \tilde S \tilde U$ is obtained by replacing each Schur complement $S_j$ in this factorization by a more conveniently computed approximation $\tilde S_j$ (with $\tilde L$ and $\tilde U$ defined as $L$ and $U$ with $\tilde S_j$ in place of $S_j$); the $\tilde S_j$ should then come close to satisfying the Schur complement recursion
$$\tilde S_1 = B_{11}, \qquad \tilde S_j = B_{jj} - B_{j,j-1} \tilde S_{j-1}^{-1} B_{j-1,j}, \quad j = 2, \ldots, N. \qquad (2.2)$$
This is qualified in the following condition.

Condition S. There is a constant $\epsilon$ such that $\|B_{11} - \tilde S_1\| \le \epsilon$ and
$$\|B_{jj} - \tilde S_j - B_{j,j-1} \tilde S_{j-1}^{-1} B_{j-1,j}\| \le \epsilon \quad \text{for } j = 2, \ldots, N.$$

The following theorem is an easy consequence.

Theorem 2.3
Suppose that $\tilde S_j$, for $j = 1, \ldots, N-1$, is nonsingular and that Condition S holds. Then
$$\|B - \tilde B\| \le \epsilon.$$

Proof
Expanding $\tilde B$, we find that it has the same off-diagonal blocks as $B$ while its diagonal blocks are given by $\tilde B_{11} = \tilde S_1$ and $\tilde B_{jj} = \tilde S_j + B_{j,j-1} \tilde S_{j-1}^{-1} B_{j-1,j}$, for $j = 2, \ldots, N$. It follows that
$$B - \tilde B = E \qquad (2.3)$$
where $E$ is the block diagonal matrix with diagonal blocks given by $E_{11} = B_{11} - \tilde S_1$ and
$$E_{jj} = B_{jj} - \tilde B_{jj} = B_{jj} - \tilde S_j - B_{j,j-1} \tilde S_{j-1}^{-1} B_{j-1,j}, \quad \text{for } j = 2, \ldots, N. \qquad (2.4)$$
Note that $E_{jj}$ is exactly the quantity appearing in Condition S. For $x \in \mathbb{R}^{\hat N}$,
$$|Bx - \tilde Bx|^2 = \sum_{i=1}^N |E_{ii} x_i|^2.$$
Here, $x_i$ denotes the part of $x$ associated with the $i$'th block. Thus, Condition S implies
$$|Bx - \tilde Bx|^2 \le \epsilon^2 \sum_{i=1}^N |x_i|^2 = \epsilon^2 |x|^2.$$
This completes the proof of the theorem. □

Although the proof of the previous theorem appears rather trivial, it does provide significant insight into how sweeping preconditioners should be designed. One might think of the preconditioning algorithm as providing approximations to the Schur complement operators $\{S_j\}$, $j = 1, 2, \ldots, N$. Taking this point of view would lead one to try an inductive analysis relating the error $S_j - \tilde S_j$ to that of $S_{j-1} - \tilde S_{j-1}$. Such an analysis suggests that the errors in the algorithm accumulate in a multiplicative fashion, and such errors would be very difficult to control. The theorem and its proof show that this is the wrong point of view. The previous theorem shows that we need to develop $\tilde S_j$ so that Condition S holds. In this case, the error analysis is not multiplicative, as the error $E_{jj}$ resulting from replacing $S_j$ with $\tilde S_j$ in the preconditioner only depends on how well $\tilde S_j$ solves (2.2). Thus, even if $S_{j-1}$ and $\tilde S_{j-1}$ are not close, we still obtain a small error $E_{jj}$ provided that Condition S is fulfilled. We then have the following two corollaries.


Corollary 2.4
Assume that $\tilde S_i$ is symmetric for $i = 1, \ldots, N$ and nonsingular for $i = 1, \ldots, N-1$. Assume further that $B$ is symmetric and positive definite and that Condition S holds with
$$\tilde\epsilon := \epsilon\|B^{-1}\| < 1. \qquad (2.5)$$
Then $\tilde B$ is symmetric and positive definite and satisfies
$$(1 - \tilde\epsilon)(Bx, x) \le (\tilde B x, x) \le (1 + \tilde\epsilon)(Bx, x) \quad \text{for all } x \in \mathbb{R}^{\hat N}. \qquad (2.6)$$
Here, we have used $(\cdot, \cdot)$ to denote the dot product on $\mathbb{R}^{\hat N}$.

Corollary 2.5
Assume that $B$ is invertible and that $\tilde S_j$, for $j = 1, 2, \ldots, N-1$, is nonsingular. Suppose that $\epsilon$ in Condition S satisfies (2.5). Then $\tilde B$ is nonsingular and
$$\|I - \tilde B^{-1} B\| \le \frac{\tilde\epsilon}{1 - \tilde\epsilon}. \qquad (2.7)$$

Remark 2.6
The previous corollary implies the convergence of $\tilde B^{-1}$ to $B^{-1}$. Indeed, (2.7) implies that for $y \in \mathbb{R}^{\hat N}$,
$$|(B^{-1} - \tilde B^{-1})y| = |(I - \tilde B^{-1}B)B^{-1}y| \le \frac{\tilde\epsilon}{1 - \tilde\epsilon}\|B^{-1}\|\,|y|,$$
that is,
$$\|B^{-1} - \tilde B^{-1}\| \le \frac{\tilde\epsilon}{1 - \tilde\epsilon}\|B^{-1}\|.$$

Remark 2.7
The inequalities of Corollary 2.4 imply bounds on the condition number of $\tilde B^{-1}B$, that is,
$$K \equiv K(\tilde B^{-1}B) \le \frac{1 + \tilde\epsilon}{1 - \tilde\epsilon}.$$
This implies rapid convergence of the preconditioned conjugate gradient algorithm, as the $j$'th step satisfies the well-known convergence bound
$$\|e_j\|_B \le 2\delta^j\|e_0\|_B, \qquad \delta = \frac{\sqrt{K} - 1}{\sqrt{K} + 1}.$$
Here, $e_j$ is the error after $j$ steps and $\|\cdot\|_B$ is the matrix norm induced by the vector norm $\|x\|_B := (Bx, x)^{1/2}$.

Remark 2.8
Corollary 2.5 implies bounds on the preconditioned Richardson method: for $f, x_0 \in \mathbb{R}^{\hat N}$, set
$$x_{j+1} = x_j + \tilde B^{-1}(f - Bx_j), \quad \text{for } j = 0, 1, \ldots.$$
If $x$ is the unique solution of $Bx = f$, then $e_j = x - x_j$ satisfies
$$e_{j+1} = (I - \tilde B^{-1}B)e_j \qquad (2.8)$$
and Corollary 2.5 implies that
$$|e_j| \le \left(\frac{\tilde\epsilon}{1 - \tilde\epsilon}\right)^j |e_0|.$$

Proof of Corollary 2.4
By Theorem 2.3, for $x \in \mathbb{R}^{\hat N}$,
$$|(\tilde Bx, x) - (Bx, x)| \le \epsilon|x|^2 \le \tilde\epsilon\,(Bx, x).$$
Corollary 2.4 immediately follows. □

Proof of Corollary 2.5
We note that
$$B^{-1}\tilde B = I - (I - B^{-1}\tilde B). \qquad (2.9)$$
By Theorem 2.3,
$$\|I - B^{-1}\tilde B\| = \|B^{-1}(B - \tilde B)\| \le \epsilon\|B^{-1}\| = \tilde\epsilon.$$
A Neumann series argument with (2.9) implies that $B^{-1}\tilde B$ is invertible when $\tilde\epsilon < 1$, with the norm of its inverse bounded by $(1 - \tilde\epsilon)^{-1}$. This immediately implies that $\tilde B = B(B^{-1}\tilde B)$ is invertible and
$$\|\tilde B^{-1}\| \le \|(B^{-1}\tilde B)^{-1}\|\,\|B^{-1}\| \le \frac{\|B^{-1}\|}{1 - \tilde\epsilon}.$$
Thus,
$$\|I - \tilde B^{-1}B\| = \|\tilde B^{-1}(\tilde B - B)\| \le \frac{\tilde\epsilon}{1 - \tilde\epsilon}.$$
This completes the proof of the corollary. □
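Remark 2.8 translates directly into a few lines of code. The sketch below is ours; apply_B and apply_Btilde_inv are hypothetical callables standing in for $B$ and for whatever realization of $\tilde B^{-1}$ (for example, the sweeping preconditioner) is at hand.

```python
import numpy as np

def preconditioned_richardson(apply_B, apply_Btilde_inv, f, x0,
                              tol=1e-12, max_iter=50):
    """Richardson iteration x_{j+1} = x_j + B~^{-1} (f - B x_j) of Remark 2.8.

    When (2.5) holds, the error contracts by at least eps~/(1 - eps~) per step.
    """
    x = x0.copy()
    for _ in range(max_iter):
        r = f - apply_B(x)                   # residual f - B x_j
        if np.linalg.norm(r) <= tol * np.linalg.norm(f):
            break
        x += apply_Btilde_inv(r)             # preconditioned correction
    return x
```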

3. H-MATRICES

In this section, we consider the so-called H(K)-matrix approximations. These approximations involve partitioning the matrix into blocks and approximating the off-diagonal blocks by low rank Rk-matrices. More general discussions concerning Rk-matrices and H(K)-matrix representations can be found in [11–13].

3.1. Rk-matrices

The material here is not new and can be found in numerous places, for example, [11]. An $n \times m$ matrix $M$ has a rank $k$ representation (i.e., $M$ is an Rk-matrix) if it can be written $M = AB^t$ where $A$ and $B$ are, respectively, $n \times k$ and $m \times k$ matrices. It is clear that the rank of such a matrix $M$ is at most $k$. The fact that any $n \times m$ matrix $M$ of rank $k$ can be represented as an Rk-matrix is a consequence of the existence of the singular value decomposition of $M$. We note that we shall only use Rk-matrices to represent or approximate $n \times m$ blocks when $k < \min(n, m)$, as less memory and computation is involved in the full matrix representation when $k \ge \min(n, m)$.

The sum of two Rk'-matrices with the same $m, n, k'$ indices and $k' < \min(n, m)$ is in general not an Rk'-matrix, but it can easily be written as an Rk-matrix with $k = 2k'$ (and the same $m, n$). To control the ranks in our algorithms, we need to approximate Rk-matrices by Rk'-matrices with $k' < k$. This approximation procedure is called truncation and is described below (cf. [11]).

As a first case, suppose that $k' < k$ and we are given an Rk-matrix $M = AB^t$ of rank $k \le \min(n, m)$. Our goal is to define an Rk'-matrix $M'$ approximating $M$. Using the QR-decomposition, we write $A = Q_A R_A$ and $B = Q_B R_B$, where $Q_A, R_A, Q_B, R_B$ are, respectively, $n \times k$, $k \times k$, $m \times k$ and $k \times k$ matrices, and $Q_A, Q_B$ have orthonormal columns. These factorizations exist as $k \le \min(n, m)$ (cf. Section 5.2 of [17]). Subsequently, we compute the singular value decomposition
$$R_A R_B^t = U \Sigma V^t.$$


The matrix $\Sigma$ is diagonal and contains the singular values of $R_A R_B^t$ in decreasing order. The rank $k'$ approximation to $M$ is

$$M' = (Q_A (U\Sigma)_{k'})(Q_B V_{k'})^t := A'(B')^t. \qquad (3.1)$$
Here, $(U\Sigma)_{k'}$ and $V_{k'}$ denote the $k \times k'$ matrices consisting of the first $k'$ columns of $U\Sigma$ and $V$, respectively. We shall denote the truncation operator by $\tilde T_{k'}$, that is, $M' = \tilde T_{k'}(M)$. We further set $\delta(\tilde T_{k'}, M) = \Sigma_{k'+1,k'+1}$, the largest discarded singular value.

The second case where truncation is required is when $k' < \min(n, m)$ and $k \ge \min(n, m)$. In this case, we are given an $n \times m$ matrix $M$ stored as a full matrix. If $n = \min(n, m)$, $M = I_n(M^t)^t$ is an Rn-representation of $M$, and we apply the previous truncation procedure to it. Here, $I_n$ denotes the $n \times n$ identity matrix. Alternatively, if $m < n$, $M = M I_m^t$ is an Rm-representation of $M$, and the previous truncation procedure can again be applied.

We note that the error can be written
$$M - M' = Q_A U \Sigma_e V^t Q_B^t$$
where $\Sigma_e$ is the diagonal matrix with entries $(0, 0, \ldots, 0, \Sigma_{k'+1,k'+1}, \ldots, \Sigma_{k,k})$. In the previous equation, $Q_A$, $Q_B$, $U$ and $V$ all have orthonormal columns and so $\|Q_A\| = \|U\| = \|V^t\| = \|Q_B^t\| = 1$. Because $\|\Sigma_e\| = \Sigma_{k'+1,k'+1}$,
$$\|M - M'\| \le \Sigma_{k'+1,k'+1} = \delta(\tilde T_{k'}, M). \qquad (3.2)$$
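The two-sided QR/SVD truncation above is short enough to state in full. The following is our NumPy transcription (the function name is ours); it returns the factors of $M' = \tilde T_{k'}(M)$ together with the discarded singular value $\delta(\tilde T_{k'}, M)$ of (3.2).

```python
import numpy as np

def truncate_rk(A, B, k_new):
    """Truncate the Rk-matrix M = A @ B.T (A: n x k, B: m x k) to rank k_new.

    Returns (A2, B2, delta) with M' = A2 @ B2.T and ||M - M'||_2 <= delta,
    delta being the largest discarded singular value Sigma_{k'+1,k'+1}.
    """
    QA, RA = np.linalg.qr(A)                  # A = Q_A R_A
    QB, RB = np.linalg.qr(B)                  # B = Q_B R_B
    U, sigma, Vt = np.linalg.svd(RA @ RB.T)   # R_A R_B^t = U Sigma V^t
    A2 = QA @ (U[:, :k_new] * sigma[:k_new])  # Q_A (U Sigma)_{k'}
    B2 = QB @ Vt[:k_new, :].T                 # Q_B V_{k'}
    delta = sigma[k_new] if k_new < sigma.size else 0.0
    return A2, B2, delta
```

For the second case ($k \ge \min(n, m)$), one first forms the trivial factorization $M = I_n(M^t)^t$ or $M = M I_m^t$ and calls the same routine.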

Remark 3.1
An implementation of the previous truncation requires $6k^2(n+m) + 23k^3$ floating point operations; see Lemma 1.3 in [11].

3.2. H(K)-matrices

A general discussion of hierarchical matrix approximation can be found in [11–13]. In this paper, we only consider a simple example of such a representation/approximation for $n \times n$ matrices. These representations will be defined from a binary tree involving the index set $\{1, 2, \ldots, n\}$. This tree is set up as follows: the root of the tree is labeled with the index set $\{1, \ldots, n\}$. Its two sons come from partitioning the entries as equally as possible, for example, into $1, \ldots, \lfloor n/2 \rfloor$ and $\lfloor n/2 \rfloor + 1, \ldots, n$. We repeat this procedure until each of the sons (the leaves) contains at most two indices in its label; see Figure 2. We consider the root of the tree to be level zero; level one nodes are the sons of the root, level two nodes are the sons of level one nodes, and so on. The nodes of the last level of the tree, which have no sons, are called the leaves of the tree.

This tree structure gives rise to a block partitioning of an $n \times n$ matrix where each tree node contributes sub-blocks as follows. Any tree node on any level except the last is associated with the two blocks which connect the indices of its sons. Each leaf is associated with the diagonal block involving its indices. Thus, for example, the level one node of Figure 2 labeled $\{1, \ldots, 6\}$ has sons $\tau_1 = \{1, 2, 3\}$ and $\tau_2 = \{4, 5, 6\}$ and is thus associated with the two blocks in the matrix partition with indices $\{(i, j) : i \in \tau_1, j \in \tau_2\}$ and $\{(i, j) : i \in \tau_2, j \in \tau_1\}$. Figure 3 shows the block partitioning of the matrix corresponding to the tree structure given in Figure 2.

Figure 2. A binary tree with n = 13.


Figure 3. The block structure corresponding to Figure 2.

In particular, a block labeled $j$ in Figure 3 comes from a node of level $j$ in Figure 2. The level one node $\{1, \ldots, 6\}$ gives rise to the two blocks labeled 1 in the upper left-hand quarter of the matrix in Figure 3.

An H(K)-matrix is a matrix such that the off-diagonal blocks in the previous block decomposition have restricted ranks, represented by Rk-matrices. To simplify the discussion, we shall restrict to the case where the ranks of the off-diagonal blocks are determined by a single positive integer $K$. Let $\tau$ be a node with sons $\tau_1$ and $\tau_2$ and let $\#\tau_i$ denote the number of indices in $\tau_i$. There are two blocks in the block partitioning associated with $\tau$, one above and one below the diagonal. As these blocks are of dimension $\#\tau_1 \times \#\tau_2$ and $\#\tau_2 \times \#\tau_1$, they are represented by Rk-matrices of rank $K$ when $K < \min(\#\tau_1, \#\tau_2)$ and by full matrices otherwise. The diagonal blocks associated with the leaves are stored as full matrices. H(K)-matrices with the same block structure and the same $K$ are referred to as being of the same format.

The cost of applying an H(K)-matrix to a vector is essentially equal to its storage. For an H(K)-matrix of dimension $n \times n$, the number of indices for a block associated with a non-leaf node of level $\ell$ is essentially $n/2^{\ell+1}$. As there are $2^{\ell+1}$ such blocks, the memory requirement for all blocks associated with non-leaf nodes of level $\ell$ is $O(nK)$. Thus, the storage requirements for an H(K)-matrix of dimension $n \times n$ are bounded by $O(n \log_2(n) K)$; see also Lemma 2.5 in [11]. This means that the use of H(K)-matrices can be effective if good approximations can be obtained with moderately small off-diagonal ranks.

The truncation procedure for Rk-matrices immediately implies a truncation procedure for H(K)-matrices sharing the same tree. Indeed, given an H(K)-matrix $M$ associated with $K$ and $1 \le K' < K$, we develop a truncated H(K')-matrix approximation $M'$ as follows. Let $\tau$, $\tau_1$ and $\tau_2$ be as above and, when $\tau$ is not a leaf, let $b_i$, $i = 1, 2$, be the associated off-diagonal blocks in $M$.

(a) If $\tau$ is a leaf or $K' \ge \min(\#\tau_1, \#\tau_2)$, its blocks are copied unchanged into $M'$.
(b) Otherwise, the block $b_i'$ in the H(K')-matrix corresponding to $b_i$ is set to $b_i' = \tilde T_{K'}(b_i)$, for $i = 1, 2$.

The matrix so constructed is denoted by $T_{K'}(M)$. We define
$$\delta(T_{K'}, M) = \max_\tau \max\{\delta(\tilde T_{K'}, b_1), \delta(\tilde T_{K'}, b_2)\}$$
where the outer maximum is taken over all nodes $\tau$ satisfying (b) above.
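Both the tree and the blockwise truncation $T_{K'}$ are straightforward to realize. The sketch below is ours (names and data layout are assumptions, not the paper's): it builds the cluster tree and visits exactly the off-diagonal blocks on which $\tilde T_{K'}$ acts, here on a dense stand-in matrix; the rank cut itself could be done with truncate_rk from Section 3.1, and the size test of case (a) can live inside truncate_block.

```python
def build_tree(lo, hi, leaf_size=2):
    """Binary cluster tree over {lo, ..., hi-1}; leaves hold <= leaf_size indices."""
    if hi - lo <= leaf_size:
        return (lo, hi, None, None)            # a leaf: (first, last+1, no sons)
    mid = lo + (hi - lo) // 2                  # split as equally as possible
    return (lo, hi, build_tree(lo, mid, leaf_size), build_tree(mid, hi, leaf_size))

def truncate_hmatrix(M, node, truncate_block):
    """Apply the blockwise truncation T_{K'} in place on a dense stand-in M.

    truncate_block(block) returns the rank-K' approximation of a dense
    off-diagonal block (case (a) blocks may simply be returned unchanged).
    """
    lo, hi, left, right = node
    if left is None:                           # leaf: diagonal block kept full
        return
    l0, l1, r0, r1 = left[0], left[1], right[0], right[1]
    M[l0:l1, r0:r1] = truncate_block(M[l0:l1, r0:r1])  # block above the diagonal
    M[r0:r1, l0:l1] = truncate_block(M[r0:r1, l0:l1])  # block below the diagonal
    truncate_hmatrix(M, left, truncate_block)
    truncate_hmatrix(M, right, truncate_block)
```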


In the sequel, we shall need the bounds for the error associated with the truncation operator $T_K$ given in the following lemma. We provide a proof for completeness. A similar lemma involving the Frobenius norm error was given in [11].

Lemma 3.2
Let $M$ be an $n \times n$ H(K)-matrix associated with a tree whose leaves are at level $J$. If $K' < K$, then
$$\|M - T_{K'}(M)\| \le J\,\delta(T_{K'}, M).$$

Proof
For $x, y \in \mathbb{R}^n$,
$$((M - T_{K'}(M))x, y) = \sum_{j=0}^{J-1} \sum_{\tau \text{ of level } j} \left[\left((b_1 - \tilde T_{K'}(b_1))x_{S_1(\tau)}, y_{S_2(\tau)}\right) + \left((b_2 - \tilde T_{K'}(b_2))x_{S_2(\tau)}, y_{S_1(\tau)}\right)\right].$$
Here, $\tau$ is a node of the binary tree satisfying (b) above, $b_1$ and $b_2$ denote the associated blocks in $M$, and $S_i(\tau)$, $i = 1, 2$, denotes the index set associated with the $i$'th son of $\tau$. Also, $x_{S_i(\tau)}$ and $y_{S_i(\tau)}$ denote the vectors formed by extracting the values of $x$ and $y$ corresponding to the index set $S_i(\tau)$. Now applying (3.2) gives
$$|((M - T_{K'}(M))x, y)| \le \delta(T_{K'}, M) \sum_{j=0}^{J-1} \sum_{\tau \text{ of level } j} \left(|x_{S_1(\tau)}|\,|y_{S_2(\tau)}| + |x_{S_2(\tau)}|\,|y_{S_1(\tau)}|\right)$$
$$\le \delta(T_{K'}, M) \sum_{j=0}^{J-1} \left(\sum_{\tau \text{ of level } j} \left(|x_{S_1(\tau)}|^2 + |x_{S_2(\tau)}|^2\right)\right)^{1/2} \left(\sum_{\tau \text{ of level } j} \left(|y_{S_2(\tau)}|^2 + |y_{S_1(\tau)}|^2\right)\right)^{1/2}$$
$$= \delta(T_{K'}, M)\,J\,|x|\,|y|. \qquad \square$$

Remark 3.3
The cost of evaluating $T_K$ on a $j \times j$ matrix is $O(j(k^2 \log_2 j + k^3))$; this follows readily from Remark 3.1 (see also Lemma 2.9 in [11]).

4. THE APPROXIMATIONS $\tilde S_j$

In this section, we use H(K)-matrices to define the $\tilde S_j$ of Section 2. Following the two examples of Section 2, we only consider the case when the blocks $B_{jj}$ are tridiagonal with possible connections between the first and last unknowns of the block. This implies that the off-diagonal blocks of $B_{jj}$ are of low rank, that is, $B_{jj}$ can be stored as an H(K)-matrix. Indeed, the off-diagonal blocks are of rank 1 in Example 2.1 and at most rank 2 in Example 2.2. We shall see that if $S_{j-1}^{-1}$ is an H(K)-matrix, then H(K) truncation can be used to obtain an H(K) approximation to $B_{jj} - B_{j,j-1} S_{j-1}^{-1} B_{j-1,j}$. Unfortunately, we can only give a complete analysis for symmetric and positive definite matrices, although the technique appears to work well in more general situations, as illustrated in our last numerical computation and those cited previously.


In the trivial case when the binary tree associated with the matrix $A$ consists only of the root node, which has no sons (the height of the tree is zero), the inverse is computed directly. For nontrivial trees, we first write $A \in$ H(K) in block $2 \times 2$ form,
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad (4.1)$$
where the blocks correspond to the indices in the sons of the root. The left son indices correspond to the upper left block, while the right son indices correspond to the lower right block (Figure 2). Note that $A_{11}$ and $A_{22}$ are H(K)-matrices defined with respect to the sub-trees associated with the left and right sons, respectively. The block form of the inverse of $A$ is given by
$$A^{-1} = \begin{pmatrix} A_{11}^{-1} + A_{11}^{-1} A_{12} S^{-1} A_{21} A_{11}^{-1} & -A_{11}^{-1} A_{12} S^{-1} \\ -S^{-1} A_{21} A_{11}^{-1} & S^{-1} \end{pmatrix}. \qquad (4.2)$$
Here, $S = A_{22} - A_{21} A_{11}^{-1} A_{12}$ is the Schur complement. Our approximate inverse will be defined recursively with respect to the height of the tree. The diagonal blocks (leaves of the tree) are solved by direct methods. The approximate inverse will be denoted $I(A)$ for a matrix $A \in$ H(K). Motivated by (4.2), we set
$$I(A) = \begin{pmatrix} T_K(W) & -I(A_{11}) A_{12}\, I(T_K(R)) \\ -I(T_K(R))\, A_{21} I(A_{11}) & I(T_K(R)) \end{pmatrix}. \qquad (4.3)$$

Here, $R = A_{22} - A_{21} I(A_{11}) A_{12}$, and $W = I(A_{11}) + I(A_{11}) A_{12}\, I(T_K(R))\, A_{21} I(A_{11})$. This approximation to the inverse appears in [6, 11, 12]. The definition of $I(A)$ utilizes the truncation operator $T_K$ to control the rank and/or accuracy of the approximation. In the implementation of this algorithm, one recursively computes the explicit H(K)-representation of $I(A)$. For simplicity, we consider the case when the off-diagonal ranks are equal and denoted by $k$. The algorithm is given as follows (a condensed sketch in code appears after the list):

Algorithm for computing $I(A)$.
1. If the height of the tree associated with $A$ is zero, then compute the inverse by direct methods.
2. Otherwise,
   (a) Compute $G = I(A_{11})$ by applying this algorithm recursively.
   (b) Compute an Rk-representation of $A_{21} I(A_{11}) A_{12}$. This involves $K$ matrix-vector evaluations with the matrix $I(A_{11})$.
   (c) Compute the truncated H(K)-representation $T_K(R)$ of $R$ and apply the algorithm recursively to define $I(T_K(R))$. This is the 2,2 block of the approximate inverse.
   (d) Compute the Rk-representation of $G A_{12}\, I(T_K(R))\, A_{21} G$. This involves $K$ matrix-vector evaluations with $I(T_K(R))$ and $2K$ with $G$.
   (e) Compute the truncated H(K)-representation $T_K(W)$. This is the 1,1 block of the approximate inverse.
   (f) Compute the Rk-representations of the off-diagonal blocks of the inverse (similar to (b)) to define the 1,2 and 2,1 blocks of the approximate inverse.
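The following is our condensed transcription of the recursion. It operates on dense blocks: trunc stands in for $T_K$ (for instance, an SVD cut of the off-diagonal ranks), and no Rk bookkeeping is done, so it reproduces the arithmetic of steps (a)–(f) but not the complexity of the H(K) implementation.

```python
import numpy as np

def approx_inverse(A, trunc, leaf_size=2):
    """Approximate inverse I(A) of (4.3), with A symmetric positive definite.

    trunc(M) plays the role of T_K; with trunc = lambda M: M this reduces
    to exact block inversion via (4.2).
    """
    n = A.shape[0]
    if n <= leaf_size:                              # height-zero tree: invert directly
        return np.linalg.inv(A)
    m = n // 2                                      # split as in (4.1)
    A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
    G = approx_inverse(A11, trunc, leaf_size)       # (a) G = I(A_11)
    R = A22 - A21 @ G @ A12                         # (b) approximate Schur complement
    Rt = approx_inverse(trunc(R), trunc, leaf_size) # (c) I(T_K(R)): the 2,2 block
    W = G + G @ A12 @ Rt @ A21 @ G                  # (d)
    I11 = trunc(W)                                  # (e) T_K(W): the 1,1 block
    I12 = -G @ A12 @ Rt                             # (f) off-diagonal block; by
    return np.block([[I11, I12], [I12.T, Rt]])      #     symmetry, I21 = I12^t
```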

Remark 4.1
We consider applying $I$ to a hierarchical matrix associated with a tree of height $j$ representing a square matrix of dimension $n$ and denote the computational cost by $CC(j, n)$. The cost of (a) is $CC(j-1, n/2)$. The cost of multiplying a hierarchical matrix of dimension $m$ and height $\ell \le \log_2(m)$ times a vector is $O(\ell K m)$, so the cost of (b) is $O(j K^2 n)$. By Remark 3.3, the cost of (c) is bounded by $CC(j-1, n/2) + O(j K^2 n + K^3 n)$. Similarly, the combined cost of (d), (e), and (f) is $O(j K^2 n + K^3 n)$, and hence,
$$CC(j, n) \le 2\,CC(j-1, n/2) + O(j K^2 n + K^3 n).$$
A simple induction gives
$$CC(j, n) \le O(j^2 K^2 n + j K^3 n).$$
As $A$ is symmetric, we use symmetric Rk blocks. In fact, we only store the upper block, as the transpose only involves reversing the roles of the two factors of the Rk representation.

It is clear from the definition that the accuracy achieved by the previous algorithm depends on the truncation estimates, that is, $\delta(T_K, R)$ and $\delta(T_K, W)$. Because the algorithm is recursive, there will be similar errors from the blocks coming from all nodes of the tree except the leaves. It is simple to monitor these errors in the implementation, and we denote their maximum by $\delta_{\max}$.

The definition of $I$ involves recursive approximation of the inverses of the block matrices $A_{11}$ and $T_K(R)$. We shall need coercivity and norm bounds for both matrices. The norm bound $\|A_{11}\| \le \|A\|$ is trivial. Additional bounds are given in the following lemma.

Lemma 4.2
Suppose that an $m \times m$ matrix $A$ is symmetric and positive definite and satisfies
$$\alpha |x|^2 \le (Ax, x) \quad \text{for all } x \in \mathbb{R}^m. \qquad (4.4)$$

Let $A$ be partitioned as in (4.1). Then
$$\alpha |x|^2 \le (A_{11} x, x) \quad \text{for all } x \in \mathbb{R}^{\#A_{11}}. \qquad (4.5)$$
Here, we set $\#A$ to be the number of rows of a square matrix $A$. Suppose that $R = A_{22} - A_{21} G A_{12}$ with a symmetric and positive definite matrix $G$ satisfying $\|G^{-1} - A_{11}\| \le \alpha$. Then,
$$\alpha |x|^2 \le (Rx, x) \quad \text{for all } x \in \mathbb{R}^{\#A_{22}} \qquad (4.6)$$
and $\|R\| \le \|A_{22}\|$.

Proof
Let $x$ be in $\mathbb{R}^{\#A_{11}}$ and let $\bar x$ extend $x$ by zero to $\mathbb{R}^{\#A}$. Then
$$(A_{11} x, x) = (A \bar x, \bar x) \ge \alpha |\bar x|^2 = \alpha |x|^2.$$
To estimate $R$, we start with $x$ in $\mathbb{R}^{\#A_{22}}$ and extend it to the vector
$$\tilde y = \begin{pmatrix} -G A_{12} x \\ x \end{pmatrix}.$$
Define
$$A_G = \begin{pmatrix} G^{-1} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}.$$
Then
$$(Rx, x) = (A_G \tilde y, \tilde y) = ((G^{-1} - A_{11}) G A_{12} x, G A_{12} x) + (A \tilde y, \tilde y) \ge -\|G^{-1} - A_{11}\|\,|G A_{12} x|^2 + \alpha(|x|^2 + |G A_{12} x|^2) \ge \alpha |x|^2.$$
This verifies the coercivity of $R$. Because $(Rx, x) > 0$, $(G A_{12} x, A_{12} x) \le (A_{22} x, x)$ and hence,
$$|(Rx, x)| \le (A_{22} x, x) \le \|A_{22}\|\,|x|^2.$$
The inequality $\|R\| \le \|A_{22}\|$ follows because $R$ is symmetric and positive definite. This completes the proof of the lemma. □

We then have the following theorem.

Theorem 4.3
Suppose that $A$ is an H(K)-matrix which is symmetric and positive definite. Let $J$ be the level of the leaves (height) of the tree associated with $A$. Then, there is a constant $\epsilon_0 = \epsilon_0(J, A)$ such that if $\delta_{\max} \le \epsilon_0$, $I(A)$ (with all block truncations appearing in the algorithm bounded by $\delta_{\max}$) is well defined, symmetric and positive definite and satisfies
$$\|I(A)^{-1} - A\| \le \frac{3}{2}(\|A\| + \alpha/2 + 1)^2 J^2 \delta_{\max}. \qquad (4.7)$$

Here, $\alpha$ denotes the smallest eigenvalue of $A$.

Proof
The analysis must deal with the fact that there is truncation involved each time we do the recursion using (4.3). Because of this truncation, the coercivity constants and norm bounds for the Schur complement term, that is, $I(T_K(R))$, degenerate. The proof is by induction on $j$, the level of the leaves (height) of the tree associated with $A$. For $j = 0$ or $j = 1$, there is nothing to prove because $I(A) = A^{-1}$. Set $a_J = \|A\|$ and define
$$C_j = \frac{3}{2}(a_J + \alpha/2 + 1)^2 j^2, \qquad \beta_J = 0, \qquad \beta_j = \sum_{\ell=j}^{J-1} \ell,$$
$$\alpha_j = \alpha - \beta_j \delta_{\max}, \qquad a_j = a_J + \beta_j \delta_{\max}.$$
Finally, set
$$\epsilon_0 = \min\{J^{-2}\alpha, \ (3(a_J + \alpha/2 + 1)^2 J^2)^{-1}\min\{\alpha, 1\}\}. \qquad (4.8)$$
The previous definitions and $\delta_{\max} \le \epsilon_0$ imply, for example, that
$$\alpha_j \ge \alpha/2 \quad \text{and} \quad a_j \le a_J + \alpha/2, \qquad \text{for } j = 0, \ldots, J.$$
Assume that $j < J$ and $A$ is a symmetric and positive definite H(K)-matrix satisfying
$$\alpha_j |x|^2 \le (Ax, x) \quad \text{for all } x \in \mathbb{R}^{\#A} \qquad (4.9)$$
and
$$\|A\| \le a_j. \qquad (4.10)$$
The inductive assumption is that under these conditions, $I(A)$ is well defined, symmetric, and positive definite and satisfies
$$\|I(A)^{-1} - A\| \le C_j \delta_{\max}.$$


We now suppose that $A$ is a symmetric and positive definite H(K)-matrix associated with a tree of height $j+1$ satisfying (4.9) and (4.10) with $\alpha_j$ and $a_j$ replaced by $\alpha_{j+1}$ and $a_{j+1}$, respectively. We further assume that the inductive hypothesis holds for matrices associated with trees of height $j$. We now partition $A$ as in (4.1) and set $G = I(A_{11})$, $R = A_{22} - A_{21} G A_{12}$, $\tilde R = I(T_K(R))$, and $W = G + G A_{12} \tilde R A_{21} G$. Then,
$$I(A) = \begin{pmatrix} W & -G A_{12} \tilde R \\ -\tilde R A_{21} G & \tilde R \end{pmatrix} + \begin{pmatrix} T_K(W) - W & 0 \\ 0 & 0 \end{pmatrix}. \qquad (4.11)$$
By Lemma 4.2, $A_{11}$ is symmetric and positive definite and satisfies (4.9) and (4.10) with $\alpha_{j+1}$ and $a_{j+1}$. Hence, because $\alpha_j < \alpha_{j+1}$ and $a_{j+1} < a_j$, the inductive hypothesis implies that $G$ is nonsingular and satisfies
$$\|G^{-1} - A_{11}\| \le C_j \delta_{\max}. \qquad (4.12)$$
Applying (4.8) gives
$$\|G^{-1} - A_{11}\| \le C_j \epsilon_0 \le \frac{3}{2}(a_J + \alpha/2 + 1)^2 J^2 \epsilon_0 \le \alpha/2 \le \alpha_{j+1}. \qquad (4.13)$$
Lemma 4.2 implies
$$\alpha_{j+1} |x|^2 \le (Rx, x) \quad \text{for all } x \in \mathbb{R}^{\#A_{22}} \qquad \text{and} \qquad \|R\| \le a_{j+1}.$$
Lemma 3.2 then gives
$$(T_K(R)x, x) = (Rx, x) + ((T_K(R) - R)x, x) \ge (\alpha_{j+1} - j\delta_{\max})|x|^2 = \alpha_j |x|^2.$$
Similarly,
$$\|T_K(R)\| \le \|R\| + j\delta_{\max} \le a_{j+1} + j\delta_{\max} = a_j.$$
This shows that $T_K(R)$ satisfies the inductive hypothesis and so $\tilde R = I(T_K(R))$ is nonsingular and satisfies
$$\|\tilde R^{-1} - T_K(R)\| \le C_j \delta_{\max}. \qquad (4.14)$$
A simple calculation shows that the inverse of the first matrix on the right hand side of (4.11) is given by
$$\tilde A = \begin{pmatrix} G^{-1} & A_{12} \\ A_{21} & \tilde R^{-1} + A_{21} G A_{12} \end{pmatrix}.$$
Thus, we have
$$I(A) = \tilde A^{-1} + \begin{pmatrix} T_K(W) - W & 0 \\ 0 & 0 \end{pmatrix} \equiv \tilde A^{-1} - \tilde E = \tilde A^{-1}(I - \tilde A \tilde E). \qquad (4.15)$$
By Lemma 3.2,
$$\|\tilde A \tilde E\| \le j\delta_{\max}\|\tilde A\|.$$
We apply a Neumann series argument to show that $I(A)$ is nonsingular. To do this, we first need a bound on $\|\tilde A\|$. We note that
$$\tilde A = A + \begin{pmatrix} G^{-1} - A_{11} & 0 \\ 0 & \tilde R^{-1} - R \end{pmatrix}.$$


By the triangle inequality and (4.14),
$$\|\tilde R^{-1} - R\| \le \|\tilde R^{-1} - T_K(R)\| + \|T_K(R) - R\| \le (C_j + j)\delta_{\max}.$$
Thus, applying (4.12) gives
$$\|\tilde A - A\| \le (C_j + j)\delta_{\max} \qquad (4.16)$$
and hence,
$$\|\tilde A\| \le \|A\| + (C_j + j)\delta_{\max}.$$
It follows that
$$\|\tilde A \tilde E\| \le j\delta_{\max}\|\tilde A\| \le j\delta_{\max}(a_{j+1} + (C_j + j)\delta_{\max}) \equiv \gamma.$$
Note that (4.8) implies that
$$\gamma \le j\epsilon_0(a_J + \alpha/2 + (C_j + j)\epsilon_0) \le j\epsilon_0(a_J + \alpha/2 + 1) \le \frac{1}{2} \qquad (4.17)$$
and we can apply the Neumann series argument to invert $I - \tilde A \tilde E$. Thus, $I(A)$ is nonsingular and its inverse is given by
$$I(A)^{-1} = (I - \tilde A \tilde E)^{-1}\tilde A = \tilde A + \left(\tilde A \tilde E + (\tilde A \tilde E)^2 + (\tilde A \tilde E)^3 + \cdots\right)\tilde A.$$
It follows that
$$\|I(A)^{-1} - \tilde A\| \le \frac{\gamma}{1 - \gamma}\|\tilde A\|.$$
This combined with (4.16) gives
$$\|I(A)^{-1} - A\| \le \frac{\gamma}{1 - \gamma}\|\tilde A\| + (C_j + j)\delta_{\max}.$$
As in (4.17), $\|\tilde A\| \le a_J + \alpha/2 + 1$ and $\gamma \le j\delta_{\max}(a_J + \alpha/2 + 1)$. Thus,
$$\|I(A)^{-1} - A\| \le 2j(a_J + \alpha/2 + 1)^2\delta_{\max} + \left(\frac{3}{2}(a_J + \alpha/2 + 1)^2 j^2 + j\right)\delta_{\max} \le (a_J + \alpha/2 + 1)^2\left(3j + \frac{3}{2}j^2\right)\delta_{\max} \le C_{j+1}\delta_{\max}.$$
This completes the proof of the theorem. □

Remark 4.4
Suppose that we only have bounds for $\alpha$ and $\|A\|$, for example, $\alpha \ge \tilde\alpha/2$ and $\|A\| \le a_J$. Then
$$\tilde\epsilon_0 = \min\{J^{-2}\tilde\alpha/2, \ (3(2a_J + 1)^2 J^2)^{-1}\min(\tilde\alpha/2, 1)\} \le \epsilon_0. \qquad (4.18)$$
The previous theorem implies that for $\delta_{\max} < \tilde\epsilon_0$,
$$\|I(A)^{-1} - A\| \le \frac{3}{2}(2a_J + 1)^2 J^2 \delta_{\max}.$$

Remark 4.5
It is clear from the estimate of the theorem that it is natural to scale $A$ so that $\|A\| = O(1)$. The algorithm involves truncation of $R$ and $W$. Note that $R$ scales like $A$ while $W$ scales like $A^{-1}$. It is clear that the error in the truncation $T_K(\sigma M)$ scales like $\sigma$ (if $M$ and $K$ are fixed). With $\|A\| = O(1)$, ill conditioning in the matrix $A$ manifests itself in a large $\|W\|$ and larger truncation errors associated with $T_K(W)$.


5. ANALYSIS OF THE SWEEPING PRECONDITIONER

In this section, we define the operators $\tilde S_j \in$ H(K) used in the abstract sweeping preconditioner of Section 2. We assume that $B_{jj} \in$ H(K) for $j = 1, \ldots, N$. This holds for Example 2.1 for $K \ge 1$ and for Example 2.2 for $K \ge 2$, as discussed in Section 3. In addition, we assume that we have an algorithm for generating a symmetric H(K)-matrix $V_j$ approximating $B_{j,j-1}\tilde S_{j-1}^{-1}B_{j-1,j}$ when $\tilde S_{j-1}^{-1} \in$ H(K), satisfying (for $j = 2, \ldots, N$)
$$\|V_j - B_{j,j-1}\tilde S_{j-1}^{-1}B_{j-1,j}\| \le M_P \delta_{\max}. \qquad (5.1)$$
Here, $M_P$ is a constant which depends on the approximation. This approximation is not necessary for Example 2.1, as $B_{j,j-1}\tilde S_{j-1}^{-1}B_{j-1,j}$ is an H(K)-matrix with the same format as $\tilde S_{j-1}^{-1}$ and can be easily computed. In contrast, $B_{j,j-1}\tilde S_{j-1}^{-1}B_{j-1,j}$ in the case of Example 2.2 is more complicated, and an algorithm for approximating it in H(K) is discussed later.

The explicit algorithm for generating $\tilde S_j^{-1}$, $j = 1, \ldots, N$, is defined by setting
$$W_1 = B_{11} \quad \text{and} \quad \tilde S_1^{-1} = I(W_1) \qquad (5.2)$$
and, for $j = 2, \ldots, N$,
$$W_j = B_{jj} - V_j \quad \text{and} \quad \tilde S_j^{-1} = I(T_K(W_j)). \qquad (5.3)$$
The previous algorithm does not break down provided that $T_K(W_j)$ is positive definite. We assume that $\#W_j$ gives rise to a hierarchical tree of height $J_j$ and set $J = \max\{J_j\}$, $j = 1, \ldots, N$. For both examples, $J_j = J$ for every $j$.

In the case of Example 2.1, assuming that the same ordering is used on each column, the application of $B_{j,j-1}$ and $B_{j-1,j}$ reduces to multiplication by diagonal matrices. In this case, $B_{j,j-1}\tilde S_{j-1}^{-1}B_{j-1,j}$ is an H(K)-matrix with the same off-diagonal ranks as $\tilde S_{j-1}^{-1}$. The representation of the product can be easily computed in $O(JKN)$ operations. No approximation is necessary, that is, we take $V_j = B_{j,j-1}\tilde S_{j-1}^{-1}B_{j-1,j}$.
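In outline, (5.2) and (5.3) amount to a single sweep over the blocks. The sketch below is ours (names are assumptions): approx_inverse and trunc are the routines sketched in Section 4, and make_V supplies $V_j$ (exact for Example 2.1, or built from the product approximations discussed later for Example 2.2).

```python
def sweeping_setup(diag, lower, upper, approx_inverse, trunc, make_V):
    """Build the approximate Schur complement inverses S~_j^{-1} of (5.2)-(5.3).

    diag[j] = B_{j,j}; lower[j-1] = B_{j,j-1}; upper[j-1] = B_{j-1,j}.
    make_V(L, Sinv, U) approximates L @ Sinv @ U.
    """
    Sinv = [approx_inverse(diag[0], trunc)]            # S~_1^{-1} = I(W_1), W_1 = B_11
    for j in range(1, len(diag)):
        Vj = make_V(lower[j - 1], Sinv[j - 1], upper[j - 1])
        Wj = diag[j] - Vj                              # W_j = B_jj - V_j
        Sinv.append(approx_inverse(trunc(Wj), trunc))  # S~_j^{-1} = I(T_K(W_j))
    return Sinv
```

The preconditioner $\tilde B^{-1}$ is then applied with the forward/backward sweeps of Section 1, with $\tilde S_j^{-1}$ in place of the exact block solves.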

We then have the following theorem.

Theorem 5.1
Assume that $B$ is symmetric and positive definite. There is a constant $\epsilon_2 := \epsilon_2(J, B)$ such that if $\delta_{\max} < \epsilon_2$, then the algorithm (5.2) and (5.3) does not break down and defines symmetric and positive definite matrices $\tilde S_j^{-1} = I(T_K(W_j))$ satisfying Condition S with
$$\epsilon = \left(\frac{3}{2}(2\|B\| + \tilde\alpha + 1)^2 J^2 + J + M_P\right)\delta_{\max} \equiv C_B \delta_{\max}. \qquad (5.4)$$
Here, $\tilde\alpha$ denotes the smallest eigenvalue of $B$.

Proof
We set
$$\epsilon_2 = \min\{\tilde\epsilon_0, \ C_B^{-1}\tilde\alpha, \ (J + M_P)^{-1}\tilde\alpha/2\} \qquad (5.5)$$
where $\tilde\epsilon_0$ is given by (4.18) with $a_J = \|B\| + \tilde\alpha$. We shall show that $T_K(W_j)$ satisfies
$$(\tilde\alpha - (J + M_P)\delta_{\max})|x|^2 \le (T_K(W_j)x, x) \quad \text{for all } x \in \mathbb{R}^{\#W_j} \qquad (5.6)$$
and
$$\|T_K(W_j)\| \le \|B\| + (J + M_P)\delta_{\max}. \qquad (5.7)$$


We note that (5.5) implies that $\tilde\alpha - (J + M_P)\delta_{\max} \ge \tilde\alpha/2$ and $\|B\| + (J + M_P)\delta_{\max} \le \|B\| + \tilde\alpha/2$. If (5.6) and (5.7) hold, then (5.5) and Remark 4.4 imply that $\tilde S_j^{-1} = I(T_K(W_j))$ is symmetric and positive definite and its inverse satisfies
$$\|\tilde S_j - T_K(W_j)\| \le \frac{3}{2}(2\|B\| + \tilde\alpha + 1)^2 J^2 \delta_{\max}.$$
Thus, by Lemma 3.2, (5.1) and the triangle inequality,
$$\|\tilde S_j + B_{j,j-1}\tilde S_{j-1}^{-1}B_{j-1,j} - B_{jj}\| \le \|\tilde S_j - W_j\| + \|B_{j,j-1}\tilde S_{j-1}^{-1}B_{j-1,j} - V_j\|$$
$$\le \|\tilde S_j - T_K(W_j)\| + \|T_K(W_j) - W_j\| + M_P\delta_{\max} \le C_B\delta_{\max}. \qquad (5.8)$$
Note that this is just Condition S with $\epsilon$ given by (5.4).

We shall complete the proof of this theorem by showing that (5.6) and (5.7) hold by induction on $j$. For $j = 1$, because $W_1 = B_{11}$, we clearly have $\tilde\alpha|x|^2 \le (W_1 x, x)$ for all $x \in \mathbb{R}^{\#B_{11}}$ and $\|W_1\| \le \|B\|$. The inequalities (5.6) and (5.7) then follow from Lemma 3.2 and the triangle inequality.

We now assume that (5.6) and (5.7) hold, and hence $\tilde S_j^{-1}$ is well defined, symmetric and positive definite and Condition S holds, for all $j \le \ell$ with $\ell < N$. For $j = 1, \ldots, \ell$, let $\tilde L_j$, $B_j$, and $\tilde B_j$ denote the $j$'th principal minor of $\tilde L$, $B$, and $\tilde B$, respectively.‡ Applying Theorem 2.3 gives that
$$\|\tilde B_\ell - B_\ell\| \le C_B \delta_{\max}. \qquad (5.9)$$
Now, for $j = 2, \ldots, \ell+1$, set
$$\tilde W_j = B_{j,j} - B_{j,j-1}\tilde S_{j-1}^{-1}B_{j-1,j}.$$
Note that $\tilde S_1^{-1}$, $\tilde W_2$ and the blocks of $B_2$ are related in exactly the same way that $G$, $R$ and the blocks of $A$ are related in Lemma 4.2. For $\ell > 1$, we partition $B_{\ell+1}$ as
$$B_{\ell+1} = \begin{pmatrix} B_\ell & \bar B_{\ell,\ell+1} \\ \bar B_{\ell+1,\ell} & B_{\ell+1,\ell+1} \end{pmatrix}. \qquad (5.10)$$
Now, $\tilde B_\ell = \tilde L_\ell \bar S_\ell \tilde L_\ell^t$ where $\bar S_\ell$ denotes the block diagonal matrix with diagonal blocks $\{\tilde S_j\}$, and hence $\tilde B_\ell^{-1} = \tilde L_\ell^{-t}\bar S_\ell^{-1}\tilde L_\ell^{-1}$. As $\tilde L_\ell^{-1}$ and $\tilde L_\ell^{-t}$ are, respectively, lower and upper block triangular matrices with identity diagonal blocks, it follows that $\tilde S_\ell^{-1}$ is the $\ell,\ell$ block of $\tilde B_\ell^{-1}$. Thus,
$$\tilde W_{\ell+1} = B_{\ell+1,\ell+1} - B_{\ell+1,\ell}\tilde S_\ell^{-1}B_{\ell,\ell+1} = B_{\ell+1,\ell+1} - \bar B_{\ell+1,\ell}\tilde B_\ell^{-1}\bar B_{\ell,\ell+1}.$$
This means that $\tilde B_\ell^{-1}$, $\tilde W_{\ell+1}$ and the blocks of $B_{\ell+1}$ in (5.10) are related in exactly the same way that $G$, $R$ and the blocks of $A$ are related in Lemma 4.2. Thus, it follows from Lemma 4.2, (5.9) and (5.5) that $\|\tilde W_{\ell+1}\| \le \|B_{\ell+1}\| \le \|B\|$ and
$$\tilde\alpha|x|^2 \le (\tilde W_{\ell+1}x, x) \quad \text{for all } x \in \mathbb{R}^{\#\tilde W_{\ell+1}}.$$
This holds both when $\ell = 1$ and $\ell > 1$. Lemma 3.2, (5.1), and the triangle inequality then imply (5.6) and (5.7) for $\ell + 1$ because
$$\|T_K(W_{\ell+1}) - \tilde W_{\ell+1}\| = \|(B_{\ell+1,\ell}\tilde S_\ell^{-1}B_{\ell,\ell+1} - V_{\ell+1}) + (T_K(W_{\ell+1}) - W_{\ell+1})\| \le (J + M_P)\delta_{\max}.$$
This completes the proof of the theorem. □

‡ The matrices $\tilde L_j$ and $\tilde B_j$ are well defined even though the full matrices $\tilde L$ and $\tilde B$ may not be at this point of the argument.

The following corollary is an immediate consequence of the previous theorem and Corollaries 2.4 and 2.5.

Corollary 5.2
Under the hypothesis of the previous theorem,
$$(1 - C_B\delta_{\max}\|B^{-1}\|)(Bx, x) \le (\tilde Bx, x) \le (1 + C_B\delta_{\max}\|B^{-1}\|)(Bx, x) \quad \text{for all } x \in \mathbb{R}^{\#B}$$
and
$$\|I - \tilde B^{-1}B\| \le \frac{C_B\delta_{\max}\|B^{-1}\|}{1 - C_B\delta_{\max}\|B^{-1}\|}$$
provided that $C_B\delta_{\max}\|B^{-1}\| < 1$.

We next consider the case of Example 2.2. Note that the number of unknowns in each block is a constant, which we denote by $N$. We use the same hierarchical structure for each group so that the hierarchical blocks of $B_{jj}$ and $\tilde S_{j-1}^{-1}$ align in a natural way. The problem is that, in contrast to Example 2.1, the ranks of the sub-blocks of $B_{j,j-1}\tilde S_{j-1}^{-1}B_{j-1,j}$ are larger than those of $\tilde S_{j-1}^{-1}$ or $B_{jj}$. Thus, we need an algorithm for truncating the product. Because of the assumptions on the unknown ordering, the $N \times N$ matrices $B_{j-1,j}$ and $B_{j,j-1}$ are tridiagonal except for nonzero entries at the corner positions $(1, N)$ and $(N, 1)$. This means that the off-diagonal hierarchical blocks have rank at most 2.

We first consider the problem of approximating $CD$ where $C$ is an H(K)-matrix and $D$ is a sparse matrix. The resulting approximation is to be an H(K)-matrix with the same structure as $C$. The approximation is defined recursively and denoted by $\mathcal{L}$. If $C$ is associated with a tree of height zero, then we compute $CD$ exactly, that is, $\mathcal{L}(C, D) = CD$. For the general case, an H(K)-matrix $C$ associated with a tree of height $j$ has a $2 \times 2$ block structure
$$C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}$$
with $C_{11}$ and $C_{22}$ being matrices associated with sub-trees of height $j - 1$. We partition $D$ with the same block sizes and recursively define
$$\mathcal{L}(C, D) = T_K\begin{pmatrix} \mathcal{L}(C_{11}, D_{11}) + C_{12}D_{21} & C_{11}D_{12} + C_{12}D_{22} \\ C_{21}D_{11} + C_{22}D_{21} & C_{21}D_{12} + \mathcal{L}(C_{22}, D_{22}) \end{pmatrix}.$$

Numer. Linear Algebra Appl. 2015; 22:371–392 DOI: 10.1002/nla

389

ANALYSIS OF A SWEEPING PRECONDITIONER
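A dense-block transcription of this recursion may clarify the structure. It is our sketch (names are ours): trunc again stands in for $T_K$, and the exact products in the corner terms would be Rk-times-sparse operations in a genuine H(K) implementation.

```python
import numpy as np

def hmat_times_sparse(C, D, trunc, leaf_size=2):
    """Approximate C @ D following the recursive definition of L(C, D)."""
    n = C.shape[0]
    if n <= leaf_size:                       # height-zero tree: exact product
        return C @ D
    m = n // 2
    C11, C12, C21, C22 = C[:m, :m], C[:m, m:], C[m:, :m], C[m:, m:]
    D11, D12, D21, D22 = D[:m, :m], D[:m, m:], D[m:, :m], D[m:, m:]
    P = np.block([
        [hmat_times_sparse(C11, D11, trunc, leaf_size) + C12 @ D21,
         C11 @ D12 + C12 @ D22],
        [C21 @ D11 + C22 @ D21,
         C21 @ D12 + hmat_times_sparse(C22, D22, trunc, leaf_size)],
    ])
    return trunc(P)                          # T_K applied to the assembled blocks
```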

Remark 5.3
We chose the situation where the number of unknowns in a group remains constant for simplicity. The case when the number of unknowns (and hence the hierarchical structure) changes from group to group requires a more complicated truncation algorithm. Such algorithms have been proposed in, for example, [6, 9, 13].

In an analogous fashion, we can develop an approximation to the product $DC$ with, again, $D$ being a sparse matrix and $C$ being an H(K)-matrix. The resulting approximation is denoted $\tilde{\mathcal{L}}(D, C)$. We now set $V_j$ by
$$V_j = \tilde{\mathcal{L}}\left(B_{j,j-1}, \ \mathcal{L}\left(\tilde S_{j-1}^{-1}, B_{j-1,j}\right)\right). \qquad (5.11)$$
We then have the following proposition.

Proposition 5.4
Let $B$ come from Example 2.2. For $V_j$ defined by (5.11),
$$\|V_j - B_{j,j-1}\tilde S_{j-1}^{-1}B_{j-1,j}\| \le \frac{(1 + \|B\|)\,j(j+1)}{2}\,\delta_{\max},$$
that is, (5.1) holds with $M_P = (1 + \|B\|)j(j+1)/2$.

Proof
Let $C$ and $D$ be as discussed previously. Then,
$$CD - \mathcal{L}(C, D) = \begin{pmatrix} C_{11}D_{11} - \mathcal{L}(C_{11}, D_{11}) & 0 \\ 0 & C_{22}D_{22} - \mathcal{L}(C_{22}, D_{22}) \end{pmatrix} + A - T_K(A)$$
where
$$A = \begin{pmatrix} \mathcal{L}(C_{11}, D_{11}) + C_{12}D_{21} & C_{11}D_{12} + C_{12}D_{22} \\ C_{21}D_{11} + C_{22}D_{21} & C_{21}D_{12} + \mathcal{L}(C_{22}, D_{22}) \end{pmatrix}.$$
Lemma 3.2 immediately implies that $\|A - T_K(A)\| \le j\delta_{\max}$ and thus
$$\|CD - \mathcal{L}(C, D)\| \le \max\{\|C_{11}D_{11} - \mathcal{L}(C_{11}, D_{11})\|, \ \|C_{22}D_{22} - \mathcal{L}(C_{22}, D_{22})\|\} + j\delta_{\max}.$$
An easy induction then shows that
$$\|CD - \mathcal{L}(C, D)\| \le \frac{j(j+1)}{2}\delta_{\max}. \qquad (5.12)$$
The same argument shows that
$$\|DC - \tilde{\mathcal{L}}(D, C)\| \le \frac{j(j+1)}{2}\delta_{\max}. \qquad (5.13)$$
Finally, the triangle inequality implies that
$$\|V_j - B_{j,j-1}\tilde S_{j-1}^{-1}B_{j-1,j}\| = \left\|\tilde{\mathcal{L}}\left(B_{j,j-1}, \mathcal{L}(\tilde S_{j-1}^{-1}, B_{j-1,j})\right) - B_{j,j-1}\tilde S_{j-1}^{-1}B_{j-1,j}\right\|$$
$$\le \left\|\tilde{\mathcal{L}}\left(B_{j,j-1}, \mathcal{L}(\tilde S_{j-1}^{-1}, B_{j-1,j})\right) - B_{j,j-1}\mathcal{L}(\tilde S_{j-1}^{-1}, B_{j-1,j})\right\| + \left\|B_{j,j-1}\left(\mathcal{L}(\tilde S_{j-1}^{-1}, B_{j-1,j}) - \tilde S_{j-1}^{-1}B_{j-1,j}\right)\right\|$$
and the proposition follows from (5.12) and (5.13). □


6. NUMERICAL RESULTS

We report the results of numerical experiments involving the sweeping preconditioner applied to model second order elliptic problems in this section. Specifically, we consider either Example 2.1 or Example 2.2 with $q(x)$ equal to a constant and $a(x)$ constant or variable. In both examples, the matrix $B$ is scaled so that $\|B\| = O(1)$. We consider both the definite case, $q \ge 0$, and the indefinite case when $q < -\lambda_{\min}$. Here, $\lambda_{\min}$ denotes the smallest eigenvalue of the operator with $q = 0$.

It follows from Theorem 2.3 that the sweeping preconditioner can be effective without approximating the Schur complements of the original tridiagonal matrix. However, if the truncations are sufficiently accurate, the blocks $\{\tilde S_i\}$ should be close to $\{S_i\}$. To test this conjecture, we considered the five-point stencil of Example 2.1 with $q = 0$ and $a(x) = 1$ and varied $N = h^{-1}$ between 16 and 256. We set the off-diagonal rank by $K = \log_2(N)$ and computed
$$E(N) = \max_{i=1,\ldots,N} \|S_N(e_i) - \tilde S_N(e_i)\|_\infty.$$

Here, $e_i$ is the $i$'th standard basis vector for $\mathbb{R}^N$, and $\|\cdot\|_\infty$ denotes the maximum norm. We observed that $E(N)$ decreased monotonically as $N$ increased, starting at $E(16) = 1.7\times10^{-8}$ and ending up at $E(256) = 2.9\times10^{-9}$.

In the first two tables, we report the behavior of the sweeping preconditioner as a function of the off-diagonal rank $K$ for the definite case and the five-point stencil of Example 2.1 with $a = 1$ and $q = 0$ (excluding the last column of Table II). In Table I, we fix $K = 2$ and report the condition number of the preconditioned system $\tilde B^{-1}B$ and $\delta_{\max}$. Fixing $K = 2$ appears to result in a limit value of $\delta_{\max} \approx 0.003$. As the number of levels becomes larger, it is quite likely that the threshold $\delta_{\max} < \epsilon_2$ is not satisfied and hence little can be concluded from the theory. It is not surprising that the method degenerates in this case. Table II shows the dramatic improvement when $K = \log_2(N)$. In this case, we report $\gamma_h = \|I - \tilde B^{-1}B\|_{\tilde B}$, where $\|w\|_{\tilde B} \equiv (\tilde Bw, w)^{1/2}$, and $\delta_{\max}$. Note that the condition number of $\tilde B^{-1}B$ is bounded by $(1 + \gamma_h)/(1 - \gamma_h)$ and hence would be very close to one. In this case, the work associated with the construction of the preconditioner is $O((\log_2(N))^4 N^2)$, and the cost of its evaluation on a vector is $O((\log_2(N))^2 N^2)$, by the discussion preceding Lemma 3.2 and Remark 4.1, respectively.

Table I. $K = 2$, five-point stencil.

| $N = h^{-1}$ | Cond($\tilde B^{-1}B$) | $\delta_{\max}$ |
|---|---|---|
| 16 | 1.008 | 1.58e-3 |
| 32 | 1.07 | 2.69e-3 |
| 64 | 1.46 | 2.98e-3 |
| 128 | 3.24 | 3.0e-3 |
| 256 | 10.2 | 3.0e-3 |
| 512 | 49.4 | 3.0e-3 |
| 1024 | 150 | 3.0e-3 |

Table II. $K = \log_2(N)$, five-point stencil.

| $N = h^{-1}$ | $K$ | $\|I - \tilde B^{-1}B\|_{\tilde B}$ | $\delta_{\max}$ | $\|I - \tilde B^{-1}B\|_{\tilde B}$ with (6.1) |
|---|---|---|---|---|
| 16 | 4 | 2.16e-7 | 3.73e-7 | 1.73e-7 |
| 32 | 5 | 4.46e-7 | 2.75e-7 | 4.11e-7 |
| 64 | 6 | 9.66e-7 | 1.97e-7 | 9.44e-7 |
| 128 | 7 | 2.03e-6 | 1.34e-7 | 2.03e-6 |
| 256 | 8 | 4.13e-6 | 8.72e-8 | 4.17e-6 |
| 512 | 9 | 8.19e-6 | 5.53e-8 | 8.29e-6 |
| 1024 | 10 | 1.60e-5 | 3.43e-8 | 1.63e-5 |


Table III. $K = \lceil \log_2(N) - 1 + (\log_2(N))^{3/2}/8 \rceil$, five-point stencil.

| $N = h^{-1}$ | $K$ | $\|I - \tilde B^{-1}B\|_{\tilde B}$ | $\delta_{\max}$ |
|---|---|---|---|
| 16 | 4 | 2.2e-7 | 3.73e-7 |
| 32 | 5 | 4.5e-7 | 2.75e-7 |
| 64 | 7 | 3.3e-8 | 9.28e-9 |
| 128 | 8 | 1.3e-7 | 1.12e-8 |
| 256 | 10 | 3.7e-8 | 1.3e-9 |
| 512 | 11 | 1.4e-7 | 1.46e-9 |
| 1024 | 13 | 6.9e-8 | 2.74e-10 |

Note that the number of grid points is $\hat N = N^2$, so the case of $N = 1024$ corresponds to over a million grid points.

As mentioned in the introduction, our motivation for studying the sweeping preconditioner is its remarkable behavior on scattering problems with the far field condition approximated using a PML layer [1]. The PML scattering problem is variable coefficient, complex valued, non-Hermitian, and indefinite. This method worked well in [9] and also in numerical experiments which we performed on a similar PML scattering problem on a domain similar to Example 2.2. In our case, we used a layer that resulted in a jumping coefficient at the PML interface. We will not provide details of those computations as they do not fit the theory presented here. However, we report results for a jumping-variable coefficient problem that fits our theory in the last column of Table II. Specifically, we consider $q = 0$ and
$$a(x, y) = \begin{cases} 2 + x(1-x) - y(1-y) & \text{when } y < 1/2, \\ 4 + 2x(1-x) - 2y(1-y) & \text{when } y > 1/2, \end{cases} \qquad (6.1)$$
along with the natural transmission conditions,
$$u^-(x, 1/2) = u^+(x, 1/2) \quad \text{and} \quad a^-(x, 1/2)\,u_y^-(x, 1/2) = a^+(x, 1/2)\,u_y^+(x, 1/2).$$
Here, $\pm$ denotes the limits at $y = 1/2$ from above and below. The sweeping preconditioner uses the operator coefficient information in a strong way. The results for the variable coefficient problem presented here appear as good as those for the constant coefficient problem.

Table II appears to suggest that the sweeping preconditioner with $K = \log_2(N)$ results in an approximation that deteriorates like $O(h^{-1})$. This deterioration can be avoided with a modest increase of $K$. We report similar results in Table III but set $K = \lceil \log_2(N) - 1 + (\log_2(N))^{3/2}/8 \rceil$. This modest increase in $K = K(N)$ leads to approximation errors which remain bounded as $h$ decreases, at least in the range of $h$ tested. The work estimates for a problem with $\hat N = N^2$ unknowns in this case are $O((\log_2(N))^{5/2} N^2)$ for evaluation and $O((\log_2(N))^5 N^2)$ for construction. These estimates do suggest that the overhead of the method may be somewhat high except for very large $N$; for example, when $\log_2(N) = 10$, $(\log_2(N))^5 > N$ so $(\log_2(N))^5 N^2 > N^3 = \hat N^{3/2}$.

Next, we consider the nine-point stencil on the domain of Example 2.2. In this case, the number of unknowns is approximately $\hat N \approx N^2/6$. Because the off-diagonal blocks were no longer diagonal, we used $\mathcal{L}$ and $\tilde{\mathcal{L}}$ to approximate $V_j$ in (5.1). We illustrate three cases: $q = 60$, $q = 0$, and $q = -60$. The first two cases correspond to positive definite problems. The last case corresponds to an indefinite problem. Indeed, a direct computation shows that for $u = (r-1)(2-r)$, $\|\nabla u\|_\Omega^2/\|u\|_\Omega^2 = 10 \ge \lambda_{\min}$, and hence the problem is indefinite for any $q \le -10$. Table IV reports $\|I - \tilde B^{-1}B\|_{\tilde B}$ for different values of $q$ and $K = \log_2(N)$. These results illustrate the robustness of the preconditioner for more complicated stencils and indefinite problems. Note that even in the worst case, two steps of the preconditioned Richardson iteration are sufficient for convergence. The work estimates for the computation of the preconditioner and its evaluation are of the same order as those for Table II.


Table IV. $K = \log_2(N)$, nine-point stencil.

| $N \approx h^{-1}$ | $q = 60$: $\|I - \tilde B^{-1}B\|_{\tilde B}$ | $q = 60$: $\delta_{\max}$ | $q = 0$: $\|I - \tilde B^{-1}B\|_{\tilde B}$ | $q = 0$: $\delta_{\max}$ | $q = -60$: $\|I - \tilde B^{-1}B\|_{\tilde B}$ | $q = -60$: $\delta_{\max}$ |
|---|---|---|---|---|---|---|
| 16 | 6.6e-7 | 5.1e-7 | 6.5e-6 | 5.8e-6 | 4.5e-5 | 8.9e-5 |
| 32 | 1.1e-6 | 3.2e-7 | 5.7e-6 | 1.8e-6 | 5.6e-5 | 3.0e-5 |
| 64 | 2.1e-6 | 2.0e-7 | 7.2e-6 | 7.7e-7 | 3.2e-5 | 5.7e-6 |
| 128 | 3.8e-6 | 1.3e-7 | 1.1e-5 | 3.8e-7 | 3.2e-5 | 1.8e-6 |
| 256 | 7.1e-6 | 8.5e-8 | 1.7e-5 | 2.1e-7 | 4.0e-5 | 7.6e-7 |
| 512 | 1.3e-5 | 5.6e-8 | 2.9e-5 | 1.2e-7 | 5.6e-5 | 3.6e-7 |
| 1024 | 2.4e-5 | 3.8e-8 | 5.2e-5 | 1.8e-7 | 8.6e-5 | 1.8e-7 |
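For reference, the block tridiagonal system behind Tables I–III can be assembled in a few lines. This is our sketch of the standard five-point discretization of $-\Delta u + qu = f$ on the unit square (the paper additionally scales $B$ so that $\|B\| = O(1)$, and boundary conventions may shift the grid dimension by one).

```python
import numpy as np

def five_point_blocks(N, q=0.0):
    """Block tridiagonal form of the five-point stencil on an N x N grid.

    Returns (diag, lower, upper) with diag[j] = B_{j,j} tridiagonal and
    lower[j] = B_{j+1,j}, upper[j] = B_{j,j+1} diagonal (here -I/h^2),
    which is the structure assumed for Example 2.1.
    """
    h = 1.0 / N
    T = (4.0 * np.eye(N)
         - np.diag(np.ones(N - 1), 1)
         - np.diag(np.ones(N - 1), -1)) / h**2 + q * np.eye(N)
    E = -np.eye(N) / h**2
    diag = [T.copy() for _ in range(N)]
    lower = [E.copy() for _ in range(N - 1)]
    upper = [E.copy() for _ in range(N - 1)]
    return diag, lower, upper
```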

ACKNOWLEDGEMENTS

This work was supported in part by the National Science Foundation through grant DMS-0609544. It was also supported in part by award number KUS-C1-016-04 made by King Abdullah University of Science and Technology (KAUST).

REFERENCES

1. Engquist B, Ying L. Sweeping preconditioner for the Helmholtz equation: hierarchical matrix representation. Communications on Pure and Applied Mathematics 2011; 64(5):697–735.
2. Martinsson PG. A fast direct solver for a class of elliptic partial differential equations. Journal of Scientific Computing 2009; 38(3):316–330.
3. Xia J, Chandrasekaran S, Gu M, Li XS. Superfast multifrontal method for large structured linear systems of equations. SIAM Journal on Matrix Analysis and Applications 2009; 31(3):1382–1411.
4. Concus P, Golub GH, Meurant G. Block preconditioning for the conjugate gradient method. SIAM Journal on Scientific and Statistical Computing 1985; 6(1):220–252.
5. Concus P, Golub GH, Meurant G. Corrigendum: "Block preconditioning for the conjugate gradient method". SIAM Journal on Scientific and Statistical Computing 1985; 6(3):791.
6. Bebendorf M. Efficient inversion of the Galerkin matrix of general second-order elliptic operators with nonsmooth coefficients. Mathematics of Computation 2005; 74(251):1179–1199.
7. Bebendorf M, Hackbusch W. Existence of H-matrix approximants to the inverse FE-matrix of elliptic operators with $L^\infty$-coefficients. Numerische Mathematik 2003; 95(1):1–28.
8. Chandrasekaran S, Dewilde P, Gu M, Somasunderam N. On the numerical rank of the off-diagonal blocks of Schur complements of discretized elliptic PDEs. SIAM Journal on Matrix Analysis and Applications 2010; 31(5):2261–2290.
9. Bebendorf M. Approximate inverse preconditioning of finite element discretizations of elliptic operators with nonsmooth coefficients. SIAM Journal on Matrix Analysis and Applications 2006; 27(4):909–929.
10. Chandrasekaran S, Dewilde P, Gu M, Pals T, Sun X, van der Veen AJ, White D. Some fast algorithms for sequentially semiseparable representations. SIAM Journal on Matrix Analysis and Applications 2005; 27(2):341–364.
11. Grasedyck L, Hackbusch W. Construction and arithmetics of H-matrices. Computing 2003; 70(4):295–334.
12. Hackbusch W. A sparse matrix arithmetic based on H-matrices. I. Introduction to H-matrices. Computing 1999; 62(2):89–108.
13. Hackbusch W, Khoromskij BN. A sparse H-matrix arithmetic. II. Application to multi-dimensional problems. Computing 2000; 64(1):21–47.
14. Grasedyck L. Theorie und Anwendungen Hierarchischer Matrizen. PhD thesis, Christian-Albrechts-Universität, Kiel, Germany, July 2001.
15. Strikwerda JC. Finite Difference Schemes and Partial Differential Equations. The Wadsworth & Brooks/Cole Mathematics Series. Wadsworth & Brooks/Cole Advanced Books & Software: Pacific Grove, CA, 1989.
16. Ciarlet PG. Basic error estimates for elliptic problems. In Handbook of Numerical Analysis, vol. II. North-Holland: Amsterdam, 1991; 17–351.
17. Golub GH, Van Loan CF. Matrix Computations, volume 3 of Johns Hopkins Series in the Mathematical Sciences. Johns Hopkins University Press: Baltimore, MD, 1983.
