OPTIMAL ORDER PRECONDITIONING OF FINITE DIFFERENCE MATRICES

YVAN NOTAY
Abstract. A new multilevel preconditioner is proposed for the iterative solution of linear systems whose coefficient matrix is a symmetric M-matrix arising from the discretization of a second order elliptic PDE. It is based on a recursive block incomplete factorization of the matrix partitioned in a two-by-two block form, in which the submatrix related to the fine grid nodes is approximated by a MILU factorization, and the Schur complement is computed from a diagonal approximation of the same submatrix. A general algebraic analysis proves optimal order convergence under mild assumptions related to the quality of the approximation of the Schur complement on the one hand, and of the fine grid submatrix on the other hand. This analysis requires neither a red-black ordering of the fine grid nodes nor a transformation of the matrices to hierarchical form. Considering more specifically 5-point finite difference approximations of 2D problems, we prove that the spectrum of the preconditioned system is contained in the interval $[0.46\,,\,3]$, independently of possible (large) jumps or anisotropy in the PDE coefficients, as long as the latter are piecewise constant on the coarsest mesh. Numerical results illustrate the efficiency and the robustness of the proposed method.

Key words. iterative methods for linear systems, acceleration of convergence, preconditioning.

AMS subject classifications. 65F10, 65B99, 65N20.
1. Introduction. This paper deals with the iterative solution of large sparse symmetric positive (nonnegative) definite linear systems

(1.1)    $A\,u \;=\; b$

whose coefficient matrix is an M-matrix arising from the discretization of a second order elliptic PDE. In such cases, the conjugate gradient method combined with a suitable preconditioning is a choice method (e.g. [4, 5, 16, 35, 24]). The preconditioner is here a symmetric positive definite matrix $B$ such that solving a system with $B$ is easy, whereas the (spectral) condition number of the preconditioned system,

$\kappa\!\left(B^{-1}A\right) \;=\; \frac{\lambda_{\max}\!\left(B^{-1}A\right)}{\lambda_{\min}\!\left(B^{-1}A\right)}\;,$

on which the convergence rate depends, has to be as small as possible¹. In particular, a preconditioner has an optimal order of computational complexity when solving a system with $B$ requires $O(n)$ operations, where $n$ is the number of unknowns, and $\kappa(B^{-1}A)$ is bounded independently of the grid size and possibly other problem dependent parameters.

In this paper, we propose a method that originates from the algebraic multilevel preconditionings developed in [2, 10, 11, 12, 25, 26, 27, 38]. Like there, the key ingredient is the recursive use of a two-level preconditioner based on a block incomplete factorization of the system matrix partitioned in a two-by-two block form

(1.2)    $A \;=\; \left(\begin{array}{cc} A_{11} & A_{12} \\ A_{21} & A_{22} \end{array}\right),$

----
Service de Metrologie Nucleaire, Universite Libre de Bruxelles (C.P. 165), 50, Av. F.D. Roosevelt, B-1050 Brussels, Belgium. Supported by the "Fonds National de la Recherche Scientifique", Chercheur qualifie. email: [email protected]
¹ Throughout this paper, $\lambda_{\max}(C)$ and $\lambda_{\min}(C)$ denote respectively the largest and the smallest eigenvalue of $C$.
where the first block of unknowns corresponds to the fine grid nodes and the second block of unknowns to the coarse grid nodes. More precisely, we consider

(1.3)    $B^{(2)} \;=\; \left(\begin{array}{cc} P & \\ A_{21} & S \end{array}\right) \left(\begin{array}{cc} I & P^{-1}A_{12} \\ & I \end{array}\right)$

as basic two-level preconditioner, where $P$ is a modified ILU (MILU) factorization of $A_{11}$ (e.g. [4, 17, 20, 21, 22]), and where $S = A_{22} - A_{21}\,K\,A_{12}$ for some diagonal matrix $K$ (see below). Note that $S$ computed in this way does not differ much from the coarse grid matrices used in previous methods of this type; see §3 below. The main novelty is the use of a MILU factorization to approximate the top left block $A_{11}$, together with the fact that our method applies directly to usual finite difference or finite element matrices with a standard multilevel partitioning (say $h$-$2h$) of the unknowns. Indeed, most previous algebraic multilevel methods originate from the hierarchical basis multigrid method [8, 11, 12, 13, 14, 15, 36, 37, 38, 39], and use as main theoretical tool the strengthened CBS constant [13, 14, 23]. They therefore essentially apply to finite element matrices computed with the hierarchical finite element basis functions, and require a basis transformation to work with standard finite difference or finite element matrices. On the other hand, some methods have been developed that apply directly to these matrices [6, 7, 10, 25, 26, 27, 34]. However, all of them either consider a recursive red-black like partitioning of the unknowns, or use a red-black ordering of the fine grid nodes, which amounts to the same. The theoretical analysis is then facilitated, the successive top left blocks $A_{11}$ being diagonal, thus inverted exactly. Nevertheless, this extensive use of red-black orderings is somewhat surprising, since the resulting methods are not that easy to implement and perform poorly in anisotropic cases [3, 10]. In a recent paper [33], we developed an analysis which in some sense explains this unexpected situation. Indeed, we bring to light that using standard techniques to approximate $A_{11}^{-1}$ has a dramatic effect on the convergence rate, leading to a condition number that may grow up to $O(h^{-4})$.
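In outline, one application of the two-level preconditioner (1.3) requires one solve with $P$, one solve with $S$, and one correction solve with $P$. This can be sketched on a dense toy system as follows (all sizes and values are illustrative, and exact dense solves stand in for the MILU factorization of $A_{11}$; only the structure of the sweep reflects (1.3) and (2.1)):

```python
# Sketch of the two-level preconditioner (1.3): B^(2) = [[P, 0], [A21, S]] [[I, P^{-1}A12], [0, I]],
# with S = A22 - A21 K A12 and K the diagonal matrix (2.1) built from the row-sums of A11.
# Exact dense solves stand in for the MILU factorization P of A11; all values are illustrative.

def solve(M, b):
    # Dense Gaussian elimination with partial pivoting (helper for the sketch).
    n = len(b)
    T = [row[:] + [b[i]] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(T[i][k]))
        T[k], T[p] = T[p], T[k]
        for i in range(k + 1, n):
            f = T[i][k] / T[k][k]
            for j in range(k, n + 1):
                T[i][j] -= f * T[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (T[i][n] - sum(T[i][j] * x[j] for j in range(i + 1, n))) / T[i][i]
    return x

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def coarse_matrix(A11, A12, A21, A22):
    # S = A22 - A21 K A12 with K_ii = 1/(A11 e)_i when the row-sum is nonzero, cf. (2.1).
    rs = [sum(row) for row in A11]
    K = [1.0 / s if s != 0 else 0.0 for s in rs]
    return [[A22[i][j] - sum(A21[i][k] * K[k] * A12[k][j] for k in range(len(K)))
             for j in range(len(A22))] for i in range(len(A22))]

def apply_two_level(P, S, A12, A21, r1, r2):
    # Forward block sweep: y1 = P^{-1} r1, then y2 = S^{-1} (r2 - A21 y1) ...
    y1 = solve(P, r1)
    y2 = solve(S, [b - c for b, c in zip(r2, matvec(A21, y1))])
    # ... then backward correction: z1 = y1 - P^{-1} A12 y2.
    z1 = [a - c for a, c in zip(y1, solve(P, matvec(A12, y2)))]
    return z1, y2
```

Multiplying out the two factors of (1.3) shows $B^{(2)} = \left(\begin{array}{cc} P & A_{12} \\ A_{21} & S + A_{21}P^{-1}A_{12} \end{array}\right)$, so the sweep above inverts $B^{(2)}$ exactly.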
Thus, with general approximations to $A_{11}^{-1}$, preconditioners of the form (1.3) are not spectrally equivalent to $A$. However, we also showed that these problems are prevented when $P$ satisfies some given algebraic requirements, which are in particular met when it is computed from a modified ILU factorization of $A_{11}$.

The purpose of the present paper is to exploit these basic results to construct a multilevel preconditioner. As in most previous works quoted above, this means that we approximate the inverse of the coarse grid matrix $S$ (needed to apply the two-level preconditioner) by a matrix polynomial that involves the preconditioner $B^{(S)}$ of the coarser level, and that this approach is followed recursively until an exact factorization is possible at negligible cost. Our algebraic analysis developed in §2, combined with the Axelsson-Vassilevski analysis of shifted Chebyshev polynomials [2, 11, 12], then proves that the condition number is independent of the number of levels if, at every level,

(1.4)    $\lambda_{\max}\!\left(P^{-1}A_{11}\right)\,\lambda_{\max}\!\left(S^{-1}S_A\right) \;<\; m^2\;,$

where $m$ is the degree of the polynomials used and where $S_A = A_{22} - A_{21}A_{11}^{-1}A_{12}$ is the Schur complement of $A$.

On the other hand, it is well known that the computational cost involved by such a multilevel preconditioner remains $O(n)$ as long as $m$ is smaller than a lower bound
(valid at every level) on the coarsening ratio, that is, on the ratio between the number of unknowns on a given level and the number of unknowns on the next coarser level. The method has therefore an optimal order of computational complexity as soon as there exists an integer $m$ satisfying (1.4) that is smaller than this coarsening ratio. We stress that this analysis is purely algebraic and requires only that $A$ is a symmetric M-matrix with nonnegative row-sum and $P$ a MILU factorization of $A_{11}$. In particular, we do not make use of any form of strengthened CBS inequality, nor introduce any assumption on the structure of $A_{11}$ or on the sparsity pattern of $P$. This makes our results essentially different from previous ones, even if, in the special case when $A_{11}$ is ordered red-black, one could reach similar conclusions on the basis of some alternative analyses [6, 7, 25, 26, 27].

In §3, we consider more particularly the application of our method to 5-point finite difference approximations of PDEs of the type

(1.5)    $-\,\partial_x\!\left(a_x\,\partial_x u\right) \;-\; \partial_y\!\left(a_y\,\partial_y u\right) \;=\; f \qquad \mbox{in } \Omega$
with suitable (Dirichlet or Neumann) boundary conditions on $\partial\Omega$. Considering $P$ computed by the standard MILU(0) algorithm [20] with a natural ordering of the concerned nodes, we prove that, at every level,

(1.6)    $\lambda_{\max}\!\left(P^{-1}A_{11}\right) \;\le\; \frac{3}{2} \qquad \mbox{and} \qquad \lambda_{\max}\!\left(S^{-1}S_A\right) \;\le\; 2\;,$

assuming only that the coefficients $a_x$, $a_y$ are piecewise constant on the coarsest mesh. The product in (1.4) is thus at most 3, and optimal convergence is achieved with simple second degree polynomials. The resulting preconditioning method is then particularly cheap, each conjugate gradient iteration being less than two times more costly than with a mere ILU(0) like preconditioner. We stress that the results (1.6) hold independently of the ratio $a_x/a_y$, i.e. the coefficients may be arbitrarily anisotropic. The efficiency of the method is numerically illustrated in §4.

2. Algebraic analysis. We analyze here from a purely algebraic point of view the multilevel method derived from the two-level scheme presented in the introduction. About the latter, we still have to specify the matrix $K$ according to which we compute the approximate Schur complement (or coarse grid matrix) $S = A_{22} - A_{21}\,K\,A_{12}$. In fact, we use the diagonal matrix defined by
(2.1)    $K_{ii} \;=\; \left\{\begin{array}{cl} \left(A_{11}\,e_1\right)_i^{-1} & \mbox{if } \left(A_{11}\,e_1\right)_i \ne 0\;, \\[1mm] 0 & \mbox{otherwise}\;, \end{array}\right.$
where $e_1 = (1\;1\;\ldots\;1)^T$ is the vector with all components equal to unity.

As stated in the introduction, the only needed assumption is that $A$ is a nonsingular symmetric M-matrix (i.e. a Stieltjes matrix) with nonnegative row-sum. In this context, note that $(A_{11}e_1)_i = 0$ for some $i$ implies $(A_{12})_{ij} = 0$ for all $j$. Hence, letting $\Delta$ be the diagonal matrix with the same row-sum as $A_{11}$, $S$ is nothing but the Schur complement of

$\widetilde{A} \;=\; \left(\begin{array}{cc} \Delta & A_{12} \\ A_{21} & A_{22} \end{array}\right)$

in which one has deleted the zero lines and columns. $\widetilde{A}$ is a Stieltjes matrix since it has nonpositive offdiagonal entries and the same row-sum as $A$. $S$ is therefore itself a Stieltjes matrix, the Schur complement of an M-matrix being an M-matrix [4, Theorem 6.10]. Further, it is easy to see that $S$ has
also nonnegative row-sum. Indeed, (2.1) implies $K\,A_{11}\,e_1 \le e_1$, whence, noting that $A_{21} \le 0$, $S\,e_2 = A_{22}\,e_2 + A_{21}\,K\,A_{11}\,e_1 - A_{21}\,K\,(A\,e)_1 \ge (A\,e)_2$. The assumptions made on $A$ are thus automatically satisfied by $S$, showing that a recursive use of the preconditioning technique is possible, at least from a theoretical point of view.

As usual with this kind of method, such a recursive use means that the preconditioner on a given level is defined only implicitly as a function of the preconditioner on the next coarser level. In our case it writes

(2.2)    $B \;=\; \left(\begin{array}{cc} P & \\ A_{21} & M^{-1} \end{array}\right) \left(\begin{array}{cc} I & P^{-1}A_{12} \\ & I \end{array}\right),$

where $M$ is an approximate inverse of the coarse grid matrix defined on the basis of its preconditioner $B^{(S)}$, that is,

(2.3)    $M \;=\; P_m\!\left(\left(B^{(S)}\right)^{-1} S\right)\, S^{-1}\;,$

where $P_m$ is a polynomial such that $P_m(0) = 0$. In Theorem 2.1 below, we assume that the maximal eigenvalue of $M\,S$ is not larger than 1, which means in practice that these polynomials are to be scaled in such a way that $\max_{\lambda\in\sigma\left(\left(B^{(S)}\right)^{-1}S\right)} P_m(\lambda) \le 1$. Using shifted Chebyshev polynomials of the first kind² leads then to

(2.4)    $P_m(\lambda) \;=\; \frac{\displaystyle T_m\!\left(\frac{\beta^{(S)}+\alpha^{(S)}}{\beta^{(S)}-\alpha^{(S)}}\right) - T_m\!\left(\frac{\beta^{(S)}+\alpha^{(S)}-2\lambda}{\beta^{(S)}-\alpha^{(S)}}\right)}{\displaystyle T_m\!\left(\frac{\beta^{(S)}+\alpha^{(S)}}{\beta^{(S)}-\alpha^{(S)}}\right) + 1}\;,$
where $\alpha^{(S)}$ and $\beta^{(S)}$ are respectively lower and upper bounds on the eigenvalues of $\left(B^{(S)}\right)^{-1} S$ (in practice, such estimates are deduced from a recursive application of the theorem; see below).

In Theorem 2.1, we also use some technical assumptions on $P$, but we stress that they are all automatically satisfied when $P$ is a MILU factorization of $A_{11}$. To make things clear in this respect, let us recall that a modified ILU factorization is an ILU factorization for which the diagonal entries of the triangular factors are computed in such a way that $P$ has the same row-sum as the factorized matrix, i.e. $P\,e_1 = A_{11}\,e_1$. Further, $P^{-1} \ge 0$ holds because these triangular factors are M-matrices, whereas $\lambda_{\min}\!\left(P^{-1}A_{11}\right) \ge 1$ is a well known result (see e.g. [5, 17] for a proof).

Theorem 2.1. Let

$A \;=\; \left(\begin{array}{cc} A_{11} & A_{12} \\ A_{21} & A_{22} \end{array}\right)$

be a Stieltjes matrix with nonnegative row-sum and let

(2.5)    $B \;=\; \left(\begin{array}{cc} P & \\ A_{21} & M^{-1} \end{array}\right) \left(\begin{array}{cc} I & P^{-1}A_{12} \\ & I \end{array}\right)$

be such that $P$ and $M$ are symmetric positive definite and satisfy

(2.6)    $P\,e_1 \;=\; A_{11}\,e_1\;,$
(2.7)    $P^{-1} \;\ge\; 0\;,$
(2.8)    $\lambda_{\min}\!\left(P^{-1}A_{11}\right) \;\ge\; 1\;,$
(2.9)    $\lambda_{\max}\!\left(M\,S\right) \;\le\; 1\;,$

² $T_0(x) = 1$, $T_1(x) = x$, $T_m(x) = 2\,x\,T_{m-1}(x) - T_{m-2}(x)$ $(m = 2, 3, \ldots)$.
where

(2.10)    $S \;=\; A_{22} - A_{21}\,K\,A_{12}$

for the diagonal matrix $K$ defined by (2.1). Then:

(2.11)    $\lambda_{\min}\!\left(B^{-1}A\right) \;\ge\; \lambda_{\min}\!\left(M\,S\right)\;,$
(2.12)    $\lambda_{\max}\!\left(B^{-1}A\right) \;\le\; \lambda_{\max}\!\left(P^{-1}A_{11}\right)\,\lambda_{\max}\!\left(S^{-1}S_A\right)\;,$
where $S_A = A_{22} - A_{21}A_{11}^{-1}A_{12}$ is the Schur complement of $A$.

Proof. We first prove for further reference that

(2.13)    $v_2^T\,A_{21}\,P^{-1}A_{12}\,v_2 \;\le\; v_2^T\,A_{21}\,K\,A_{12}\,v_2 \qquad \forall v_2\;,$

which in particular holds when

(2.14)    $z_1^T\,\Delta\,P^{-1}\Delta\,z_1 \;\le\; z_1^T\,\Delta\,K\,\Delta\,z_1 \qquad \forall z_1\;,$

where $\Delta$ is the diagonal matrix with the same row-sum as $A_{11}$ (since $A$ has nonnegative row-sum, $\Delta_{ii} = 0$ implies $(A_{12})_{ij} = 0$ for all $j$, so that $A_{12}\,v_2$ always belongs to the range of $\Delta$). Now, $\Delta\,K\,\Delta = \Delta$ by (2.1), whereas $\Delta\,P^{-1}\Delta\,e_1 = \Delta\,P^{-1}A_{11}\,e_1 = \Delta\,e_1$ by (2.6). Thus, $\Delta\left(K - P^{-1}\right)\Delta$ has nonpositive offdiagonal entries and zero row-sum, i.e. is a symmetric M-matrix, therefore nonnegative definite, proving (2.14) and therefore (2.13).

We now prove the upper bound (2.12). In this view, let $\mu = \lambda_{\max}\!\left(P^{-1}A_{11}\right)$, $\nu = \lambda_{\max}\!\left(S^{-1}S_A\right)$, and consider

$\mu\nu\,B - A \;=\; \left(\begin{array}{cc} \mu\nu\,P - A_{11} & (\mu\nu - 1)\,A_{12} \\ (\mu\nu - 1)\,A_{21} & \mu\nu\left(M^{-1} + A_{21}P^{-1}A_{12}\right) - A_{22} \end{array}\right).$

Since $\mu\nu \ge \mu$, $\mu\nu\,P - A_{11}$ is nonnegative definite, and (2.12) may be proved by checking the nonnegative definiteness of the Schur complement

$S_{\mu\nu B - A} \;=\; \mu\nu\,M^{-1} - A_{22} + A_{21}\left(\mu\nu\,P^{-1} - (\mu\nu - 1)^2\left(\mu\nu\,P - A_{11}\right)^{-1}\right)A_{12}\;.$

Noting that the definitions of $\nu$ and $S$ imply, together with (2.9), $v_2^T\,M^{-1}v_2 \ge v_2^T\,S\,v_2 \ge \nu^{-1}\,v_2^T\,S_A\,v_2$ for all $v_2$, whereas $v_2^T\,A_{22}\,v_2 \ge v_2^T\,A_{21}A_{11}^{-1}A_{12}\,v_2$ because $S_A$ is nonnegative definite, one gets

$v_2^T\,S_{\mu\nu B - A}\,v_2 \;\ge\; (\mu - 1)\,v_2^T\,A_{22}\,v_2 \;+\; v_2^T\,A_{21}\left(\mu\nu\,P^{-1} - \mu\,A_{11}^{-1} - (\mu\nu - 1)^2\left(\mu\nu\,P - A_{11}\right)^{-1}\right)A_{12}\,v_2\;,$

and, by congruence with $P^{1/2}$, the nonnegative definiteness of the right-hand side reduces to the sign of a rational function of the matrix $P^{-1/2}A_{11}P^{-1/2}$.
Since $P^{-1/2}A_{11}P^{-1/2}$ is symmetric positive definite with spectrum contained in $[1, \mu]$ by (2.8), this sign is checked by an elementary computation, which proves (2.12).

We apply a similar reasoning to prove the lower bound (2.11): let $\gamma = \lambda_{\min}\!\left(M\,S\right)$ and consider

$A - \gamma\,B \;=\; \left(\begin{array}{cc} A_{11} - \gamma\,P & (1 - \gamma)\,A_{12} \\ (1 - \gamma)\,A_{21} & A_{22} - \gamma\left(M^{-1} + A_{21}P^{-1}A_{12}\right) \end{array}\right).$

Since $\gamma \le 1$, $A_{11} - \gamma\,P$ is nonnegative definite by (2.8), and (2.11) may be proved by checking the nonnegative definiteness of the Schur complement

$S_{A - \gamma B} \;=\; A_{22} - \gamma\,M^{-1} - A_{21}\left(\gamma\,P^{-1} + (1 - \gamma)^2\left(A_{11} - \gamma\,P\right)^{-1}\right)A_{12}\;.$

Now, by the definition of $\gamma$, $\gamma\,v_2^T\,M^{-1}v_2 \le v_2^T\,S\,v_2$ $\forall v_2$, whereas $v_2^T\left(A_{22} - S\right)v_2 = v_2^T\,A_{21}\,K\,A_{12}\,v_2 \ge v_2^T\,A_{21}\,P^{-1}A_{12}\,v_2$ by (2.13). Therefore,

$v_2^T\,S_{A - \gamma B}\,v_2 \;\ge\; (1 - \gamma)\;v_2^T\,A_{21}\left(P^{-1} - (1 - \gamma)\left(A_{11} - \gamma\,P\right)^{-1}\right)A_{12}\,v_2\;,$

while $P^{-1} - (1 - \gamma)\left(A_{11} - \gamma\,P\right)^{-1}$ is nonnegative definite if and only if

$(1 - \gamma)\;v_1^T\,P\,v_1 \;\le\; v_1^T\left(A_{11} - \gamma\,P\right)v_1 \qquad \forall v_1\;,$

which is true by (2.8). $\Box$

Assume now that an upper bound $\lambda_{\max}\!\left(P^{-1}A_{11}\right)\lambda_{\max}\!\left(S^{-1}S_A\right) \le \kappa$ holds at every level. Theorem 2.1 then shows that $\beta^{(S)} = \kappa$ is valid, i.e. the polynomial (2.4) used at a given level is properly defined as soon as a lower bound $\alpha^{(S)}$ is known. This is true at the bottom level, since $S$ is inverted exactly on the coarsest mesh. From there and from the lower bound (2.11), the recursive definition (bottom to top) of the polynomials is straightforward. Moreover, applying the analysis of these polynomials as developed in [2, 10, 11, 12], it can be seen that $\lambda_{\min}\!\left(B^{-1}A\right)$ remains bounded away from 0 independently of the number of levels as long as $\kappa < m^2$. This is formally proved in the following theorem. When $\kappa < 4$, we consider also a somewhat simpler strategy that uses the same polynomial at every level.

Theorem 2.2. Let $A^{(\ell)}$ be a Stieltjes matrix with nonnegative row-sum. For $k = \ell, \ell-1, \ldots, 1$, partition $A^{(k)}$ in a $2 \times 2$ block form

$A^{(k)} \;=\; \left(\begin{array}{cc} A_{11}^{(k)} & A_{12}^{(k)} \\ A_{21}^{(k)} & A_{22}^{(k)} \end{array}\right)$

and set

$A^{(k-1)} \;=\; S^{(k)} \;=\; A_{22}^{(k)} - A_{21}^{(k)}\,K^{(k)}A_{12}^{(k)}\;,$
where $K^{(k)}$ is the diagonal matrix defined by (2.1). Let, for $k = 1, \ldots, \ell$,

$B^{(k)} \;=\; \left(\begin{array}{cc} P^{(k)} & \\ A_{21}^{(k)} & \left(M^{(k)}\right)^{-1} \end{array}\right) \left(\begin{array}{cc} I & \left(P^{(k)}\right)^{-1}A_{12}^{(k)} \\ & I \end{array}\right)$

be such that $P^{(k)}$ is symmetric positive definite and satisfies (2.6), (2.7), (2.8), whereas

$M^{(1)} \;=\; \left(A^{(0)}\right)^{-1}\;, \qquad M^{(k)} \;=\; P_m^{(k)}\!\left(\left(B^{(k-1)}\right)^{-1}A^{(k-1)}\right)\left(A^{(k-1)}\right)^{-1}\;, \qquad k = 2, \ldots, \ell\;,$

where $P_m^{(k)}$ is a polynomial such that $P_m^{(k)}(0) = 0$. Let $\kappa \ge 1$ be such that, for all $1 \le k \le \ell$,

(2.15)    $\lambda_{\max}\!\left(\left(P^{(k)}\right)^{-1}A_{11}^{(k)}\right)\,\lambda_{\max}\!\left(\left(S^{(k)}\right)^{-1}S_A^{(k)}\right) \;\le\; \kappa\;,$

where $S_A^{(k)} = A_{22}^{(k)} - A_{21}^{(k)}\left(A_{11}^{(k)}\right)^{-1}A_{12}^{(k)}$ is the Schur complement of $A^{(k)}$. The following holds.

Case 1. If, for $k = 2, \ldots, \ell$,

(2.16)    $P_m^{(k)}(\lambda) \;=\; \frac{\displaystyle T_m\!\left(\frac{\kappa+\gamma_{k-1}}{\kappa-\gamma_{k-1}}\right) - T_m\!\left(\frac{\kappa+\gamma_{k-1}-2\lambda}{\kappa-\gamma_{k-1}}\right)}{\displaystyle T_m\!\left(\frac{\kappa+\gamma_{k-1}}{\kappa-\gamma_{k-1}}\right) + 1}\;,$

where $\gamma_k$ is defined from the recursion

$\gamma_1 \;=\; 1\;, \qquad \gamma_k \;=\; \frac{\displaystyle T_m\!\left(\frac{\kappa+\gamma_{k-1}}{\kappa-\gamma_{k-1}}\right) - 1}{\displaystyle T_m\!\left(\frac{\kappa+\gamma_{k-1}}{\kappa-\gamma_{k-1}}\right) + 1}\;, \qquad k = 2, \ldots, \ell\;,$

then $M^{(k)}$ and $B^{(k)}$ are symmetric and positive definite for all $1 \le k \le \ell$ and

(2.17)    $\lambda_{\min}\!\left(B^{-1}A\right) \;\ge\; \gamma_\ell\;,$
(2.18)    $\lambda_{\max}\!\left(B^{-1}A\right) \;\le\; \lambda_{\max}\!\left(\left(P^{(\ell)}\right)^{-1}A_{11}^{(\ell)}\right)\,\lambda_{\max}\!\left(\left(S^{(\ell)}\right)^{-1}S_A^{(\ell)}\right)\;.$

Moreover, if $\kappa < m^2$, there exists a lower bound $\gamma$ independent of $\ell$ such that

(2.19)    $\gamma_\ell \;>\; \gamma\;.$

Case 2. If $\kappa < 4$ and if, for $k = 2, \ldots, \ell$,

(2.20)    $P_m^{(k)}(\lambda) \;=\; P_2(\lambda) \;=\; \frac{\lambda\left(2\sqrt{\widetilde{\kappa}} - \lambda\right)}{\widetilde{\kappa}}\;,$

where $\widetilde{\kappa}$ is such that $1 \le \widetilde{\kappa} \le \kappa$ and $2\sqrt{\widetilde{\kappa}} > \kappa$, then $M^{(k)}$ and $B^{(k)}$ are symmetric and positive definite for all $1 \le k \le \ell$ and

(2.21)    $\lambda_{\min}\!\left(B^{-1}A\right) \;\ge\; \gamma\;,$
(2.22)    $\lambda_{\max}\!\left(B^{-1}A\right) \;\le\; \lambda_{\max}\!\left(\left(P^{(\ell)}\right)^{-1}A_{11}^{(\ell)}\right)\,\lambda_{\max}\!\left(\left(S^{(\ell)}\right)^{-1}S_A^{(\ell)}\right)\;,$
where

(2.23)    $\gamma \;=\; \frac{\left(2\sqrt{\widetilde{\kappa}} - \kappa\right)\kappa}{\widetilde{\kappa}}\;.$
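The quantities entering Theorem 2.2 are elementary to evaluate; the sketch below checks numerically the normalization of the polynomials (2.16) and the values taken by the bound (2.23) for the values of $\kappa$ and $\widetilde{\kappa}$ used later in the paper (the sampling grid is arbitrary):

```python
# Numerical spot check of (2.16) and (2.23).  cheb implements the Chebyshev
# recursion of footnote 2; kappa = 3 and kappa_t = 2.66 are the values of Section 3.
from math import sqrt

def cheb(m, x):
    t0, t1 = 1.0, x
    if m == 0:
        return t0
    for _ in range(m - 1):
        t0, t1 = t1, 2.0 * x * t1 - t0
    return t1

def P_cheb(lam, m, gamma_prev, kappa):
    # The polynomial (2.16): vanishes at 0, reaches 1 at its maximum on [gamma_prev, kappa].
    b = (kappa + gamma_prev) / (kappa - gamma_prev)
    t = (kappa + gamma_prev - 2.0 * lam) / (kappa - gamma_prev)
    return (cheb(m, b) - cheb(m, t)) / (cheb(m, b) + 1.0)

def gamma_bound(kappa_t, kappa):
    # The lower bound (2.23) of Case 2.
    return (2.0 * sqrt(kappa_t) - kappa) * kappa / kappa_t
```

For instance, with $m = 2$, $\gamma_{k-1} = 1$ and $\kappa = 3$, the minimum of (2.16) over $[1, 3]$ is $(T_2(2)-1)/(T_2(2)+1) = 3/4$, while (2.23) gives $\gamma \approx 0.46$ for $\widetilde{\kappa} = \kappa = 3$, $\gamma \approx 0.60$ for $\widetilde{\kappa} = \kappa = 2.66$ and $\gamma \approx 0.30$ for $\widetilde{\kappa} = 2.66$, $\kappa = 3$, in agreement with the values quoted in §3.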
Proof. Case 1: since $\max_{\lambda\in[\gamma_{k-1},\,\kappa]} P_m^{(k)}(\lambda) = 1$ and $\min_{\lambda\in[\gamma_{k-1},\,\kappa]} P_m^{(k)}(\lambda) = \gamma_k$, the nonnegative definiteness of $M^{(k)}$, $B^{(k)}$ and the bounds (2.17), (2.18) straightforwardly follow from a recursive application of Theorem 2.1. On the other hand, letting

$f(x) \;=\; \frac{\displaystyle T_m\!\left(\frac{\kappa+x}{\kappa-x}\right) - 1}{\displaystyle T_m\!\left(\frac{\kappa+x}{\kappa-x}\right) + 1}\;,$

one has, since $T_m(1) = 1$, $T_m'(1) = m^2$ and $T_m(y) \ge 1$ for $y \ge 1$: $f(0) = 0$, $f'(0) = \kappa^{-1}m^2 > 1$ and $f(1) \le 1$. Hence, there exists $\gamma \in\, ]0, 1]$ for which $\gamma = f(\gamma)$. Further, since $T_m(y)$ is increasing for $y \ge 1$, $f(x)$ is increasing for $x \in [0, 1]$, showing that $f(x) > f(\gamma) = \gamma$ for all $x > \gamma$, whence (2.19).

Case 2: firstly, $\max_\lambda P_2(\lambda) = 1$ and $\min_{\lambda\in[\gamma,\,\kappa]} P_2(\lambda) = \min\left(P_2(\gamma),\, P_2(\kappa)\right)$. Now, $P_2(\kappa) = \gamma$ by (2.23), and

$P_2(\gamma) - \gamma \;=\; \widetilde{\kappa}^{-1}\gamma\left(2\sqrt{\widetilde{\kappa}} - \widetilde{\kappa} - \gamma\right) \;=\; \widetilde{\kappa}^{-1}\gamma\left(P_2\!\left(\widetilde{\kappa}\right) - P_2(\kappa)\right) \;\ge\; 0\;,$

$P_2(\lambda)$ being a decreasing function of $\lambda$ for $\lambda \ge \sqrt{\widetilde{\kappa}}$ and $\widetilde{\kappa} \le \kappa$. Hence, $\min_{\lambda\in[\gamma,\,\kappa]} P_2(\lambda) = \gamma$, and the required results follow from a recursive application of Theorem 2.1. $\Box$

Concerning the choice (2.20) (Case 2), the lower bound $\gamma$ increases with $\widetilde{\kappa}$ and is thus maximal for $\widetilde{\kappa} = \kappa$, in which case $\gamma = 2\sqrt{\kappa} - \kappa$. Then, as long as $\kappa$ is not too close to 4 (i.e. $\gamma$ not too close to 0), this choice is as efficient as the one based on (2.16) (Case 1) with $m = 2$, since $\lim_{\ell\to\infty}\gamma_\ell = 2\sqrt{\kappa} - \kappa$, whereas $P_2^{(k)}$ converges to $P_2$ given by (2.20). On the other hand, we allow $\widetilde{\kappa} < \kappa$ in (2.20) to cover the case when $\lambda_{\max}\!\left(P^{-1}A_{11}\right)$ or $\lambda_{\max}\!\left(S^{-1}S_A\right)$ is slightly underestimated. One sees with (2.23) that this has no dramatic effect as long as $2\sqrt{\widetilde{\kappa}} - \kappa$ is not too small.

3. Auxiliary bounds. To apply the above results, one needs to derive upper bounds on $\lambda_{\max}\!\left(P^{-1}A_{11}\right)$ and $\lambda_{\max}\!\left(S^{-1}S_A\right)$. About $\lambda_{\max}\!\left(P^{-1}A_{11}\right)$, we note that $A_{11}$ has condition number $O(1)$ [1, 8, 14], leading us to expect a nice value, MILU preconditioning being on the other hand very efficient when the factorized matrix has much diagonal dominance. We further note that an upper bound is always computable from the triangular factors on the basis of e.g. [17, Theorem 3.1] or [4, Theorem 10.5].
Concerning $\lambda_{\max}\!\left(S^{-1}S_A\right)$, a general means to obtain an upper bound follows from the observation that the coarse grid discretization matrix $A_c$ satisfies $v_2^T\,S_A\,v_2 \le v_2^T\,A_c\,v_2$ for all $v_2$ [36]. Hence, to prove $\lambda_{\max}\!\left(S^{-1}S_A\right) \le \eta$, it suffices to check the nonnegative definiteness of $S - \eta^{-1}A_c$, which holds when the matrix

$\widetilde{A} \;-\; \left(\begin{array}{cc} 0 & 0 \\ 0 & \eta^{-1}A_c \end{array}\right)$

is itself nonnegative definite ($S - \eta^{-1}A_c$ being its Schur complement). In finite element applications, this can be checked at the level of coarse grid elements, resulting in upper bounds independent of the grid size and of jumps in the coefficients that occur across the edges of the coarse mesh.

Now, to proceed further, one needs to select a particular class of applications and introduce some assumptions on the variations of the coefficients. In the following, we consider the 5-point finite difference approximation of PDEs of the form (1.5), using
the point mesh box integration scheme [29] (which, for this type of PDE, is equivalent to a linear finite element scheme with right triangles). We also assume that the domain $\Omega$ is the union of coarse rectangles on which the PDE coefficients $a_x$, $a_y$ are piecewise constant, and that these coarse rectangles (which need not be very coarse) induce a Cartesian grid (whose boundary is not necessarily regular) with sufficiently few nodes to allow an exact factorization of the related matrices. Successive uniform refinements ($h \to \frac{h}{2}$) of this initial grid define the node sets corresponding to the successive levels, until one reaches the finest grid, on which the discretization is actually performed. Note that the coefficients may present arbitrary jumps across the edges of the coarse rectangles, whereas the ratio $a_x/a_y$ may be arbitrarily small or large, as well as the aspect ratio $h_x/h_y$ of each coarse rectangle. We also stress that the above assumptions are introduced to obtain a rigorous and complete analysis, resulting in eigenvalue bounds whose values do not depend further on any additional constant or parameter.

In this context, the analysis of $\lambda_{\max}\!\left(S^{-1}S_A\right)$ is particularly straightforward. Indeed, a close inspection of the concerned matrices, which we leave to the reader (see also [19]), shows that, under the given assumptions,

(3.1)    $S \;=\; \frac{1}{2}\,A_c\;,$

whence $\lambda_{\max}\!\left(S^{-1}S_A\right) \le 2$. This result moreover applies at every level of the recursion that defines the preconditioner, the considered matrix being by (3.1) always equal to the discretization matrix times some scaling factor. Note that the same argument (3.1) has been previously used in [25, 26, 27] to prove a similar result for the method considered there, which incidentally uses the same coarse grid matrices, although they are derived in another way.

About $\lambda_{\max}\!\left(P^{-1}A_{11}\right)$, we show below that $\lambda_{\max}\!\left(P^{-1}A_{11}\right) \le \frac{3}{2}$ when $P$ is computed by the standard MILU(0) algorithm [20] with a natural ordering of the concerned nodes. Here again, it is readily seen from (3.1) that the proof applies at every level of the recursion. On the other hand, we further conjecture that the actual value is still better in practice, since we never observed in our numerical experiments that $\lambda_{\max}\!\left(P^{-1}A_{11}\right)$ could be larger than $\frac{4}{3}$.

Combining both results, we can thus apply the method defined in §2, using at every level the polynomial (2.20) with $\widetilde{\kappa} = \kappa = 3$. The resulting lower bound on $\lambda_{\min}\!\left(B^{-1}A\right)$ is $\gamma = 0.46$, i.e. all eigenvalues are contained in the interval $[0.46\,,\,3]$. On the basis of the above conjecture, we even obtain slightly better results with $\widetilde{\kappa} = 2.66$, for which, assuming thus $\kappa = \widetilde{\kappa}$, one has $\gamma = 0.60$. In case the conjecture does not hold, note that (2.23) with $\widetilde{\kappa} = 2.66$ and $\kappa = 3$ still gives $\gamma = 0.30$ as lower bound on $\lambda_{\min}\!\left(B^{-1}A\right)$.

Proof that $\lambda_{\max}\!\left(P^{-1}A_{11}\right) \le \frac{3}{2}$. We first note that the proposition holds if and only if $A_{11} - 3\,R$ is nonnegative definite, where $R = A_{11} - P$ is the error matrix, which has zero row-sum by (2.6). A typical finite difference grid is depicted in Fig. 1, where the nonzero offdiagonal entries in $A_{11}$ and $R$ are represented by respectively straight and dotted lines. One sees that a MILU(0) factorization of $A_{11}$ leads to the rejection of all fill-in, i.e., letting $A_{11} = D - L - L^T$ where $D$ is diagonal and $L$ strictly lower triangular, one has $P = \left(P_p - L\right)P_p^{-1}\left(P_p - L^T\right)$ where $P_p$ is diagonal. One can further observe that, to the natural decomposition $A_{11} = \sum_\ell A_{11}^{(\ell)}$, one may associate a decomposition $R = \sum_\ell R^{(\ell)}$, where a typical elementary matrix $A_{11}^{(\ell)}$
Fig. 1. Typical finite difference grid, where the nodes marked with filled and open symbols are respectively the coarse and the fine grid nodes; nonzero offdiagonal entries in $A_{11}$ are represented by straight lines and nonzero offdiagonal entries in $R$ by dotted lines.
Fig. 2. Elementary box of the finite difference grid; offdiagonal entries in $A_{11}^{(\ell)}$ and in the corresponding $R^{(\ell)}$ are represented as in Fig. 1.
and the corresponding $R^{(\ell)}$ are shown in Fig. 2. Because $a_x$ and $a_y$ are piecewise constant on the coarse grid, one has, using the local numbering indicated in Fig. 2,

$A_{11}^{(\ell)} \;=\; \left(\begin{array}{ccccc} a_x^{(\ell)}+a_y^{(\ell)} & 0 & -a_y^{(\ell)} & 0 & 0 \\ 0 & a_x^{(\ell)}+a_y^{(\ell)} & -a_x^{(\ell)} & 0 & 0 \\ -a_y^{(\ell)} & -a_x^{(\ell)} & 2\left(a_x^{(\ell)}+a_y^{(\ell)}\right) & -a_x^{(\ell)} & -a_y^{(\ell)} \\ 0 & 0 & -a_x^{(\ell)} & a_x^{(\ell)}+a_y^{(\ell)} & 0 \\ 0 & 0 & -a_y^{(\ell)} & 0 & a_x^{(\ell)}+a_y^{(\ell)} \end{array}\right)$

and

(3.2)    $R^{(\ell)} \;=\; \frac{a_x^{(\ell)}a_y^{(\ell)}}{p_3^{(\ell)}} \left(\begin{array}{ccccc} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & -1 & 1 \end{array}\right),$

where $p_3^{(\ell)}$ is the diagonal entry of $P_p$ at the node with local number 3. Since the nodes with local numbers 1 and 2 are eliminated exactly, there holds

$p_3^{(\ell)} \;=\; 2\left(a_x^{(\ell)}+a_y^{(\ell)}\right) - \frac{\left(a_y^{(\ell)}\right)^2}{p_1^{(\ell)}} - \frac{\left(a_x^{(\ell)}\right)^2}{p_2^{(\ell)}}\;.$
On the other hand, (2.6) implies (see e.g. [17]) $P_p\,e_1 \ge L^T e_1 + A_{11}\,e_1$, whence $p_1^{(\ell)} \ge a_x^{(\ell)} + a_y^{(\ell)}$, $p_2^{(\ell)} \ge a_x^{(\ell)} + a_y^{(\ell)}$. Therefore,

$p_3^{(\ell)} \;\ge\; 2\left(a_x^{(\ell)}+a_y^{(\ell)}\right) - \frac{\left(a_x^{(\ell)}\right)^2 + \left(a_y^{(\ell)}\right)^2}{a_x^{(\ell)}+a_y^{(\ell)}} \;=\; \frac{\left(a_x^{(\ell)}\right)^2 + \left(a_y^{(\ell)}\right)^2 + 4\,a_x^{(\ell)}a_y^{(\ell)}}{a_x^{(\ell)}+a_y^{(\ell)}}\;,$

and using this lower bound instead of the exact value in (3.2), one gets a matrix $\widetilde{R}^{(\ell)}$ that is not smaller than $R^{(\ell)}$ in the positive definite sense. Now, with $r = a_x^{(\ell)}/a_y^{(\ell)}$, one has

$A_{11}^{(\ell)} - 3\,\widetilde{R}^{(\ell)} \;=\; a_y^{(\ell)} \left(\begin{array}{ccccc} 1+r & 0 & -1 & 0 & 0 \\ 0 & 1+r & -r & 0 & 0 \\ -1 & -r & 2\,(1+r) & -r & -1 \\ 0 & 0 & -r & \frac{(1+r)(1+r+r^2)}{1+4r+r^2} & \frac{3\,r\,(1+r)}{1+4r+r^2} \\ 0 & 0 & -1 & \frac{3\,r\,(1+r)}{1+4r+r^2} & \frac{(1+r)(1+r+r^2)}{1+4r+r^2} \end{array}\right).$
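The nonnegative definiteness of the matrix above can be spot-checked by computing its pivots numerically; a sketch (in units $a_y^{(\ell)} = 1$; the sample values of $r$ in the usage note are arbitrary):

```python
# Gaussian elimination pivots of A11^(l) - 3 R~^(l) (units a_y = 1, r = a_x/a_y).
# The matrix entries follow the display above.

def pivots(r):
    q = 1.0 + 4.0 * r + r * r
    c = (1.0 + r) * (1.0 + r + r * r) / q   # modified diagonal entries at nodes 4 and 5
    d = 3.0 * r * (1.0 + r) / q             # tripled rejected fill-in entry
    M = [[1.0 + r, 0.0, -1.0, 0.0, 0.0],
         [0.0, 1.0 + r, -r, 0.0, 0.0],
         [-1.0, -r, 2.0 * (1.0 + r), -r, -1.0],
         [0.0, 0.0, -r, c, d],
         [0.0, 0.0, -1.0, d, c]]
    piv = []
    for k in range(5):
        piv.append(M[k][k])
        for i in range(k + 1, 5):
            f = M[i][k] / M[k][k]
            for j in range(k, 5):
                M[i][j] -= f * M[k][j]
    return piv
```

For instance, $r = 2$ gives the pivots $(3,\, 3,\, 13/3,\, 9/13,\, 2/13)$, while $r = 1$ makes the last pivot vanish, in agreement with the closed formulas given below.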
We leave it to the reader to check that the successive pivots of an exact LU factorization of the latter matrix are $u_{11} = u_{22} = a_y^{(\ell)}\,(1+r)$, $u_{33} = a_y^{(\ell)}\,(1+r)^{-1}(1+4r+r^2)$, $u_{44} = a_y^{(\ell)}\,(1+4r+r^2)^{-1}(1+r)^2$ and $u_{55} = a_y^{(\ell)}\,(1+4r+r^2)^{-1}\,r\,(1-r)^2$. Since the first four are positive and the last one nonnegative for any positive $r$, this completes the proof of the required result. $\Box$

4. Numerical results. We have tested our multilevel preconditioner on linear systems resulting from the 5-point finite difference discretization of the PDE (1.5) in $\Omega = (0,1) \times (0,1)$ with the boundary conditions

$u \;=\; 0 \quad \mbox{on } \Gamma_0\;, \qquad \frac{\partial u}{\partial n} \;=\; 0 \quad \mbox{on } \Gamma_1 = \partial\Omega \setminus \Gamma_0\;.$

We use the point mesh box integration scheme [29] with uniform mesh size $h$ in both directions, except for the problem Stretched, which has been retrieved from [18]. In each experiment, the reported numbers of iterations (# it.) and numbers of basic ($+$ or $*$) floating point operations (# flops) are those necessary to reduce the relative residual error $\|r_k\|/\|b\|$ below $10^{-7}$ when using the preconditioned conjugate gradient algorithm with the zero vector as initial approximation. The reported flop counts exclude preprocessing, that is, the operations needed to compute the successive incomplete factorizations of $A_{11}$ and approximate Schur complements. In each case, the multilevel preconditioner is defined as described in §2 and §3, using the polynomials (2.20) with $\widetilde{\kappa} = 2.66$, and factorizing exactly the approximate Schur complement $S$ when the number of concerned nodes is not larger than 81.

We first consider the model Poisson problem ($a_x = a_y = f \equiv 1$; $\Gamma_0 = \partial\Omega$). The results are reported in Table 1 for the multilevel method, which is further compared in Fig. 3 with some classical (block) ILU type preconditioners³.

³ These are defined with respect to a natural ordering of the unknowns; the block preconditioners BILU, MBILU have been computed as in [9, 28], i.e. are identical to respectively IN(1) and MINV(1) in [21].
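The iteration just described is the standard preconditioned conjugate gradient algorithm with a relative residual stopping test and zero initial guess; a minimal sketch (the small tridiagonal test system and the plain diagonal preconditioner are illustrative only, not one of the preconditioners studied here):

```python
# Preconditioned conjugate gradients with stopping test ||r_k|| / ||b|| < tol
# and zero initial guess, as used in the experiments (generic prec_solve).
from math import sqrt

def pcg(A, b, prec_solve, tol=1e-7, maxit=200):
    n = len(b)
    x = [0.0] * n
    r = b[:]                      # r = b - A x with x = 0
    z = prec_solve(r)
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    nb = sqrt(sum(bi * bi for bi in b))
    for it in range(1, maxit + 1):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if sqrt(sum(ri * ri for ri in r)) / nb < tol:
            return x, it
        z = prec_solve(r)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, maxit
```

In the experiments below, the multilevel preconditioner of §2 plays the role of `prec_solve`.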
    h^{-1}   lambda_min   lambda_max   kappa   # it.   # flops / n
      32        .86          1.97       2.30     11        502.
      64        .85          2.00       2.34     12        596.
     128        .85          2.00       2.35     12        624.
     256        .85          2.00       2.36     12        639.
     512        .85          2.00       2.36     12        649.
    1024        .85          2.00       2.36     12        654.

Table 1: results for the Poisson problem.
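The iteration counts in Table 1 are consistent with the classical conjugate gradient estimate based on the spectral condition number; a sketch (indicative only, since the classical bound controls the $A$-norm of the error while the experiments monitor the relative residual):

```python
# Classical CG estimate: ||e_k||_A <= 2 ((sqrt(kappa)-1)/(sqrt(kappa)+1))^k ||e_0||_A,
# solved for the smallest k achieving a given reduction.  kappa = 2.36 is the
# measured value reported in Table 1.
from math import sqrt, log, ceil

def cg_iteration_estimate(kappa, tol):
    rho = (sqrt(kappa) - 1.0) / (sqrt(kappa) + 1.0)
    return ceil(log(tol / 2.0) / log(rho))
```

With kappa = 2.36 and tol = 1e-7 this gives 11 iterations, close to the 12 reported in Table 1.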
[Figure: # flops / n (0 to 5000) versus $h^{-1}$ (32 to 1024) for ILU(0), MILU(0), BILU, MBILU and the multilevel method.]

Fig. 3. Poisson problem: comparison of the multilevel method with (block) ILU preconditioning.
One sees that the multilevel method exhibits grid independent convergence and already outmatches standard preconditioning techniques for fairly moderate problem sizes. We relate this to the fact that our method is rather cheap for such a high quality preconditioning. A thorough inspection indeed proves that each solve $B\,x = y$ does not cost more than $36\,n$ flops. This means that the cost of each preconditioned conjugate gradient iteration is only about $55\,n$ flops. By comparison, it is $29\,n$ flops with an ILU(0) like preconditioner and $33\,n$ flops with a block ILU like preconditioner (see e.g. [9]).

We next consider the following test problems, where $d$ is a positive parameter.

Problem 1

$a_x \;=\; a_y \;=\; \left\{\begin{array}{cl} d & \mbox{in } \left(\frac{1}{4},\frac{3}{4}\right) \times \left(\frac{1}{4},\frac{3}{4}\right)\;, \\[1mm] 1 & \mbox{elsewhere}\;, \end{array}\right.$
$f \;=\; \left\{\begin{array}{cl} 100 & \mbox{in } \left(\frac{1}{4},\frac{3}{4}\right) \times \left(\frac{1}{4},\frac{3}{4}\right)\;, \\[1mm] 0 & \mbox{elsewhere}\;, \end{array}\right.$

$\Gamma_0 \;=\; \left\{(x,y) \;|\; 0 \le x \le 1\,,\; y = 0\right\}.$

Problem 2

$a_x \;=\; \left\{\begin{array}{cl} d & \mbox{in } \left(\frac{1}{4},\frac{1}{2}\right) \times \left(\frac{1}{4},\frac{1}{2}\right)\;, \\[1mm] 1 & \mbox{elsewhere}\;, \end{array}\right. \qquad a_y \;=\; \left\{\begin{array}{cl} d & \mbox{in } \left(\frac{1}{2},\frac{3}{4}\right) \times \left(\frac{1}{2},\frac{3}{4}\right)\;, \\[1mm] 1 & \mbox{elsewhere}\;, \end{array}\right.$

$f \;=\; \left\{\begin{array}{cl} 1 & \mbox{in } \left(\frac{1}{4},\frac{3}{4}\right) \times \left(\frac{1}{4},\frac{3}{4}\right)\;, \\[1mm] 0 & \mbox{elsewhere}\;, \end{array}\right.$

$\Gamma_0 \;=\; \left\{(x,y) \;|\; 0 \le x \le 1\,,\; y = 0\right\}.$

Problem Stretched [18]. This problem solves the Laplace equation with Neumann boundary conditions ($\Gamma_1 = \partial\Omega$) on a non uniform Cartesian $(M+1) \times (M+1)$ grid, $M = 128, 256, 512$. In the neighborhood of the boundaries, the grid is refined in such a way that the ratio of maximum mesh size to minimum mesh size is equal to 1000, and the ratio of subsequent mesh sizes is kept constant. The right hand side is $b = A\,u$, where $u$ is the function $u(x,y) = \left(x\,y\,(1-x)(1-y)\right)^2 e^{x^2 y}$ evaluated at the grid points.

Problem Discontinuous [18].

$a_x \;=\; a_y \;=\; \left\{\begin{array}{cl} 10^5 & \mbox{in } (0.3\,,\,0.8) \times (0.3\,,\,0.8)\;, \\[1mm] 1 & \mbox{elsewhere}\;, \end{array}\right.$

$\Gamma_1 = \partial\Omega$; $h^{-1} = 400$; same right hand side as in Problem Stretched.

The last two problems are especially interesting because they do not result from successive refinement steps applied to a problem defined on a coarse grid, i.e. the assumptions used in §3 are not satisfied exactly. However, the multilevel method is applicable to any symmetric M-matrix as soon as one provides some multilevel partitioning of the unknowns. For the problem Stretched, it was natural to consider the same hierarchy of nodes as on the topologically equivalent $(N+1) \times (N+1)$ grid that results when using a uniform mesh size $h = \frac{1}{N}$. For the problem Discontinuous, we used the hierarchy of nodes that is induced by considering the actual $401 \times 401$ grid as part of a larger $513 \times 513$ grid; the node at $(0,0)$ is then on the coarsest grid, but the nodes on the lines $x = 0.3$ and $y = 0.3$ are fine grid nodes after three levels of recursion.

Note that, due to the Neumann boundary conditions, the solution of the last two problems is determined only up to a constant, hence the coefficient matrix is singular. The multilevel preconditioner is then singular too, but admissible because it has the same kernel as $A$ [30]. In practice, only the last pivot of the LU factorization of the Schur complement on the coarsest level is zero. It suffices then to exchange it for an
Problem 1:

    h^{-1} = 128
       d      lambda_min   lambda_max   kappa   # it.   # flops / n
     10^-4       .85          1.99       2.34     13        692.
     10^-2       .85          2.00       2.35     13        692.
       1         .85          2.00       2.35     12        639.
     10^2        .85          2.00       2.34     13        692.
     10^4        .86          2.00       2.33     13        692.

    h^{-1} = 512
       d      lambda_min   lambda_max   kappa   # it.   # flops / n
     10^-4       .85          2.01       2.36     13        709.
     10^-2       .85          2.01       2.36     13        709.
       1         .85          2.01       2.36     12        655.
     10^2        .85          2.01       2.35     13        709.
     10^4        .85          2.01       2.35     13        709.

Problem Discontinuous:

              lambda_min   lambda_max   kappa   # it.   # flops / n
                 .85          2.01       2.36     11        623.

Table 2: results for Problem 1 and Problem Discontinuous.
arbitrary positive value to use the preconditioner trouble free as a regular one [31]. Besides, all direction vectors in the preconditioned conjugate gradient algorithm are projected onto the range of $A$. This improves the stability at the price of two more flops per unknown and per iteration [32].

The results are reported in Tables 2 and 3. Comparing with the results for the model problem, the method appears nearly insensitive to jumps in the PDE coefficients. Looking more particularly at the results for the problem Discontinuous, it seems that this robustness even extends to problems where these jumps do not only occur across the edges of the coarsest mesh. Considering, on the other hand, the results for Problem 2 and Problem Stretched, the method appears slightly more sensitive to anisotropy or grid stretching. However, it remains of optimal order, as proved theoretically for problems like Problem 2. Further, we obtain practically grid independent convergence for the problem Stretched, which has to be considered as very difficult in this respect [18].

REFERENCES

[1] O. Axelsson, On multigrid methods of the two-level type, in Multigrid Methods, W. Hackbusch and U. Trottenberg, eds., Lecture Notes in Mathematics No. 960, Berlin Heidelberg New York, 1981, Springer-Verlag, pp. 352-367.
[2] O. Axelsson, The method of diagonal compensation of reduced matrix entries and multilevel iteration, J. Comput. Appl. Math., 38 (1991), pp. 31-43.
[3] O. Axelsson, On algebraic multilevel iteration methods for selfadjoint elliptic problems with anisotropy, in Fascicolo Speciale, Numerical Methods, Rend. Sem. Mat. Univers. Politecn. Torino, 1991, pp. 31-61.
[4] O. Axelsson, Iterative Solution Methods, Cambridge University Press, Cambridge, 1994.
[5] O. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value Problems. Theory and Computation, Academic Press, New York, 1984.
Table 3. Results for Problem 2 and Problem Stretched.

Problem 2:

                 d      λmin   λmax    κ    # it.   # flops/n
  h^-1 = 128   10^-4    .73    2.00   2.76   14       745.
               10^-3    .73    2.00   2.74   14       745.
               10^-2    .78    2.00   2.55   13       692.
               10^-1    .85    2.05   2.41   12       639.
               1        .85    2.00   2.35   12       639.
               10^1     .85    2.09   2.45   13       692.
               10^2     .79    2.18   2.77   15       799.
               10^3     .74    2.22   3.02   16       852.
               10^4     .72    2.23   3.08   17       905.
  h^-1 = 512   10^-4    .68    2.01   2.96   15       818.
               10^-3    .72    2.01   2.80   14       764.
               10^-2    .78    2.20   2.81   14       764.
               10^-1    .85    2.09   2.46   13       709.
               1        .85    2.01   2.36   12       655.
               10^1     .85    2.10   2.46   13       709.
               10^2     .79    2.20   2.80   15       818.
               10^3     .72    2.29   3.17   17       928.
               10^4     .68    2.31   3.40   18       982.

Problem Stretched:

    h^-1    λmin   λmax    κ    # it.   # flops/n
    128     .73    2.16   2.94   13       723.
    256     .72    2.20   3.04   13       733.
    512     .71    2.26   3.18   13       738.
[6] O. Axelsson and V. Eijkhout, Analysis of recursive 5-point/9-point factorization method, in Preconditioned Conjugate Gradient Methods, O. Axelsson and L. Kolotilina, eds., Lecture Notes in Mathematics No. 1457, Springer-Verlag, 1990, pp. 154-173.
[7] O. Axelsson and V. Eijkhout, The nested recursive two-level factorization for nine-point difference matrices, SIAM J. Sci. Stat. Comput., 12 (1991), pp. 1373-1400.
[8] O. Axelsson and I. Gustafsson, Preconditioning and two-level multigrid methods of arbitrary degree of approximation, Math. Comp., 40 (1983), pp. 214-242.
[9] O. Axelsson and G. Lindskog, On the eigenvalue distribution of a class of preconditioning methods, Numer. Math., 48 (1986), pp. 479-498.
[10] O. Axelsson and M. Neytcheva, Algebraic multilevel iterations for Stieltjes matrices, Numer. Lin. Alg. Appl., 1 (1994), pp. 213-236.
[11] O. Axelsson and P. S. Vassilevski, Algebraic multilevel preconditioning methods, I, Numer. Math., 56 (1989), pp. 157-177.
[12] O. Axelsson and P. S. Vassilevski, Algebraic multilevel preconditioning methods, II, SIAM J. Numer. Anal., 27 (1990), pp. 1569-1590.
[13] R. E. Bank, Hierarchical bases and the finite element method, Acta Numerica, 5 (1996), pp. 1-43.
[14] R. E. Bank and T. F. Dupont, Analysis of a two-level scheme for solving finite element equations, Tech. Rep. CNA-159, Center for Numerical Analysis, The University of Texas at Austin, Texas, USA, 1980.
[15] R. E. Bank, T. F. Dupont, and H. Yserentant, The hierarchical basis multigrid method, Numer. Math., 52 (1988), pp. 427-458.
[16] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM Publications, Philadelphia, 1993.
[17] R. Beauwens, Upper eigenvalue bounds for pencils of matrices, Lin. Alg. Appl., 62 (1984), pp. 87-104.
[18] E. Botta, K. Dekker, Y. Notay, A. van der Ploeg, C. Vuik, F. Wubs, and P. de Zeeuw, How fast the Laplace equation was solved in 1995, Appl. Numer. Math., 24 (1997), pp. 439-455.
[19] D. Braess, The convergence rate of a multigrid method with Gauss-Seidel relaxation for the Poisson equation, Math. Comp., 42 (1984), pp. 505-519.
[20] T. F. Chan and H. A. van der Vorst, Approximate and incomplete factorizations, in Parallel Numerical Algorithms, D. E. Keyes, A. Sameh, and V. Venkatakrishnan, eds., ICASE/LaRC Interdisciplinary Series in Science and Engineering, Volume 4, Kluwer Academic, Dordrecht, 1997, pp. 167-202.
[21] P. Concus, G. H. Golub, and G. Meurant, Block preconditioning for the conjugate gradient method, SIAM J. Sci. Statist. Comput., 6 (1985), pp. 220-252.
[22] T. Dupont, R. P. Kendall, and H. H. Rachford, An approximate factorization procedure for solving self-adjoint elliptic difference equations, SIAM J. Numer. Anal., 5 (1968), pp. 559-573.
[23] V. Eijkhout and P. Vassilevski, The role of the strengthened C.B.S. inequality in multilevel methods, SIAM Review, 33 (1991), pp. 405-419.
[24] G. H. Golub and C. F. van Loan, Matrix Computations, third ed., The Johns Hopkins University Press, Baltimore, Maryland, 1996.
[25] Y. A. Kuznetsov, Algebraic multigrid domain decomposition methods, Sov. J. Numer. Anal. Math. Modeling, 4 (1989), pp. 361-392.
[26] Y. A. Kuznetsov, Multi-level domain decomposition methods, Appl. Numer. Math., 6 (1989), pp. 303-314.
[27] Y. A. Kuznetsov, Multigrid domain decomposition methods, in Third International Symposium on Domain Decomposition Methods for Partial Differential Equations, T. F. Chan, R. Glowinski, J. Periaux, and O. B. Widlund, eds., SIAM, Philadelphia, 1990, pp. 211-227.
[28] M. M. Magolu, Ordering strategies for modified block incomplete factorization, SIAM J. Sci. Comput., 16 (1995), pp. 378-399.
[29] S. Nakamura, Computational Methods in Engineering and Science, John Wiley & Sons, New York, 1977.
[30] Y. Notay, Polynomial acceleration of iterative schemes associated with subproper splittings, J. Comput. Appl. Math., 24 (1988), pp. 153-167.
[31] Y. Notay, Incomplete factorization of singular linear systems, BIT, 29 (1989), pp. 682-702.
[32] Y. Notay, Solving positive (semi)definite linear systems by preconditioned iterative methods, in Preconditioned Conjugate Gradient Methods, O. Axelsson and L. Kolotilina, eds., Lecture Notes in Mathematics No. 1457, Springer-Verlag, Berlin Heidelberg New York, 1990, pp. 105-125.
[33] Y. Notay, Using approximate inverses in algebraic multilevel methods, Numer. Math., 80 (1998), pp. 397-417.
[34] A. van der Ploeg, E. Botta, and F. Wubs, Nested grids ILU-decomposition (NGILU), J. Comput. Appl. Math., 66 (1996), pp. 515-526.
[35] H. A. van der Vorst and T. F. Chan, Linear system solvers: sparse iterative solvers, in Parallel Numerical Algorithms, D. E. Keyes, A. Sameh, and V. Venkatakrishnan, eds., ICASE/LaRC Interdisciplinary Series in Science and Engineering, Volume 4, Kluwer Academic, Dordrecht, 1997, pp. 91-118.
[36] P. Vassilevski, Nearly optimal iterative methods for solving finite element elliptic equations based on the multilevel splitting of the matrix, Tech. Rep. 1989-09, Institute for Scientific Computation, University of Wyoming, Laramie, USA, 1989.
[37] P. Vassilevski, Hybrid V-cycle algebraic multilevel preconditioners, Math. Comp., 58 (1992), pp. 489-512.
[38] P. Vassilevski, On two ways of stabilizing the hierarchical basis multilevel methods, SIAM Review, 39 (1997), pp. 18-53.
[39] H. Yserentant, On the multi-level splitting of finite element spaces, Numer. Math., 49 (1986), pp. 379-412.
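Remark. The stabilization used in the numerical experiments of Section 5 (projecting every direction vector of the preconditioned conjugate gradient algorithm onto the range of A, cf. [31, 32]) can be sketched as follows. This is an illustrative reconstruction, not the author's code: the one-dimensional Neumann Laplacian, the Jacobi preconditioner, and the names pcg_projected and z_null in the usage example are assumptions made for the sake of a self-contained sketch. Note that the projection costs one dot product plus one axpy, i.e. about two extra flops per unknown per iteration, consistent with the count quoted in the text.

```python
import numpy as np

def pcg_projected(A, b, apply_prec, z_null, tol=1e-10, maxit=1000):
    """Preconditioned CG for a consistent singular SPSD system A u = b.

    Each direction vector is projected onto range(A) = null(A)^perp
    (A symmetric).  z_null is a unit vector spanning null(A), assumed
    one-dimensional here (e.g. the normalized constant vector for a
    pure Neumann problem).
    """
    def project(v):
        # Remove the null-space component: one dot product + one axpy,
        # i.e. roughly two flops per unknown.
        return v - (z_null @ v) * z_null

    u = np.zeros_like(b)
    r = project(b)                 # b is assumed (made) consistent
    z = apply_prec(r)
    p = project(z)
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        u += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = apply_prec(r)
        rz_new = r @ z
        p = project(z + (rz_new / rz) * p)
        rz = rz_new
    return u

# Usage sketch: 1D Neumann Laplacian (singular, null space = constants)
# with a Jacobi preconditioner standing in for the multilevel one.
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = 1.0
A[-1, -1] = 1.0
z_null = np.ones(n) / np.sqrt(n)
rng = np.random.default_rng(0)
b = rng.standard_normal(n)
b -= b.mean()                      # make the right-hand side consistent
u = pcg_projected(A, b, lambda r: r / np.diag(A), z_null)
```

Since the initial residual and all directions lie in range(A), the iterates never pick up a null-space component, which is what makes the singular system harmless to the regular CG recurrences.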