
Sine Transform Based Preconditioners for Elliptic Problems

Raymond H. Chan*    Chiu-Kwong Wong†

November 25, 1993

Abstract

We consider applying the preconditioned conjugate gradient (PCG) method to solve linear systems $Ax = b$ where the matrix $A$ comes from the discretization of second-order elliptic operators. Let $A = (L + \Sigma)\Sigma^{-1}(L^t + \Sigma)$ denote the block Cholesky factorization of $A$ with lower block triangular matrix $L$ and block diagonal matrix $\Sigma$. We propose a preconditioner $M = (\hat L + \hat\Sigma)\hat\Sigma^{-1}(\hat L^t + \hat\Sigma)$ with block diagonal matrix $\hat\Sigma$ and lower block triangular matrix $\hat L$. The diagonal blocks of $\hat\Sigma$ and the subdiagonal blocks of $\hat L$ are respectively the optimal sine transform approximations to the diagonal blocks of $\Sigma$ and the subdiagonal blocks of $L$. We show that for 2-dimensional domains, the construction cost of $M$ and the cost for each iteration of the PCG algorithm are of order $O(n^2\log n)$. Furthermore, for rectangular regions, we show that the condition number of the preconditioned system $M^{-1}A$ is of order $O(1)$; in contrast, the condition numbers of the systems preconditioned by the MILU and MINV methods are of order $O(n)$. We also show that $M$ can be obtained from $A$ by taking the optimal sine transform approximation of each sub-block of $A$; thus the construction of $M$ is similar to that of Level-1 circulant preconditioners. Our numerical results on 2-dimensional square and L-shaped domains show that our method converges faster than the MILU and MINV methods. Extension to higher-dimensional domains is also discussed.

Key Words. Sine transform, preconditioned conjugate gradient method, elliptic partial differential equation.

AMS(MOS) Subject Classifications. 65F10, 65N22

*Department of Mathematics, The Chinese University of Hong Kong, Shatin, Hong Kong. Research supported in part by HKRGC grant no. CUHK 178/93E.
†Department of Mathematics, University of Hong Kong, Pokfulam Road, Hong Kong.


1 Introduction

In this paper, we study the numerical solution of second-order elliptic equations by iterative methods. After discretization by the standard finite-difference method, such problems reduce to the solution of linear systems of the form $Ax = b$ where $A$ is usually symmetric and positive definite. One of the most popular iterative methods for solving such systems is the conjugate gradient (CG) method; see Axelsson and Barker [1, p.18]. In general, the convergence rate of the CG method depends on the condition number $\kappa(A)$ of $A$: the smaller $\kappa(A)$ is, the faster the convergence of the method will be. If $\kappa(A)$ is large, we can apply the preconditioned conjugate gradient (PCG) method, i.e. we apply the CG method to the preconditioned system $M^{-1}Ax = M^{-1}b$. The matrix $M$, called a preconditioner for the matrix $A$, is chosen with two criteria in mind: $Mr = d$ is easy to solve for any vector $d$, and the spectrum of $M^{-1}A$ is clustered and/or $M^{-1}A$ is well-conditioned compared to $A$.
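To make this setup concrete, the sketch below (ours, not from the paper) solves a 1-dimensional model system with scipy's conjugate gradient, once plain and once with a simple Jacobi (diagonal) preconditioner supplied through a LinearOperator; the Jacobi choice is only a placeholder for the preconditioners discussed in this paper.

```python
# Minimal PCG illustration: CG on Ax = b without and with a preconditioner M.
# The Jacobi (diagonal) M here is a stand-in; any M with a cheap solve works.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 32
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")  # tridiag[-1,2,-1]
b = np.random.rand(n)

# PCG needs the action of M^{-1}, supplied as a LinearOperator.
Minv = spla.LinearOperator((n, n), matvec=lambda v: v / A.diagonal())

x_cg,  info_cg  = spla.cg(A, b)            # plain CG
x_pcg, info_pcg = spla.cg(A, b, M=Minv)    # PCG: CG applied to M^{-1}Ax = M^{-1}b
assert info_cg == 0 and info_pcg == 0      # info == 0 means converged
```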

One of the most successful classes of preconditioners for elliptic problems is the class of incomplete and modified incomplete LU factorizations (ILU, MILU); see, for instance, Axelsson and Barker [1, p.337] and Dupont et al. [8]. The ILU method computes an approximate LU factorization $M$ of $A$ based on Gaussian elimination, in which the fill-in at the $(i,j)$th position is dropped whenever the $(i,j)$th entry of $A$ is zero. In the MILU method, the dropped fill-ins are added back to the diagonal, plus an additional term of order $1/n^2$, where $1/n$ is the mesh-size. For matrices $A$ arising from the discretization of second-order elliptic problems, usually $\kappa(A) = O(n^2)$. However, it has been proved in Dupont et al. [8] that the condition numbers $\kappa(M^{-1}A)$ of the preconditioned systems for the ILU and the MILU methods are bounded by $O(n^2)$ and $O(n)$ respectively.

Besides the ILU-type preconditioners, incomplete block Cholesky factorizations (INV, MINV) form another popular class of block preconditioners for solving 2-dimensional elliptic problems. The motivation behind these preconditioners comes from the complete block Cholesky decomposition of $A$. For any tridiagonal matrix $D_1$ and nonsingular tridiagonal matrix $D_2$, the INV preconditioner approximates the Schur complement $D_1 - D_2^{-1}$ by the band matrix $D_1 - B_3(D_2^{-1})$. Here $B_3(D_2^{-1})$ denotes the tridiagonal matrix whose diagonals are identical to the three main diagonals of $D_2^{-1}$. In the MINV method, the dropped bands are added back to the main diagonal. Numerical experiments in [7] indicate that the condition numbers $\kappa(M^{-1}A)$ for the INV and the MINV methods are bounded by $O(n^2)$ and $O(n)$ respectively.

In [3], R. Chan and T. Chan proposed another class of preconditioners, based on averaging the coefficients of $A$ to form a circulant approximation. Part of the motivation is to exploit the fast inversion of circulant systems via the Fast Fourier Transform (FFT). They proved that circulant preconditioners can be chosen so that $\kappa(M^{-1}A) = O(n)$, just as for the MILU and MINV type preconditioners.

In this paper, we propose a class of block preconditioners based on the idea behind the INV preconditioner. However, we use matrices that can be diagonalized by the discrete sine transform, instead of band matrices as in the INV method, to approximate the Schur complement $D_1 - D_2^{-1}$. For a given matrix $K$, the optimal sine transform approximation to $K$ is the minimizer of $\|B - K\|_F$ over the set of matrices $B$ that can be diagonalized by the discrete sine transform; here $\|\cdot\|_F$ denotes the Frobenius norm. The minimizer, denoted by $s(K)$, will be used in constructing our preconditioners. The motivation for choosing sine transform approximations rather than circulant ones is that the optimal sine transform approximation reproduces exactly the discrete Laplacian with Dirichlet boundary conditions. Therefore we expect our preconditioners to remain good approximations for elliptic operators that are small perturbations of the Laplacian. Our theoretical and numerical results in Sections 3 and 5 verify this claim.

The construction of our block preconditioner $M$ is similar to that of INV: both use easily invertible matrices to approximate the Schur complements. However, for elliptic problems on 2-dimensional rectangular domains, we will show that our $M$ can also be constructed from the matrix $A$ by taking the optimal sine transform approximation of each $n$-by-$n$ block of $A$. Thus the construction of $M$ is also similar to the so-called Level-1 circulant approximation of $A$ as defined in T. Chan and Olkin [6]. We will see that the construction of $M$ and the matrix-vector multiplication $M^{-1}v$, for any vector $v$, can both be done in $O(n^2\log n)$ operations. Furthermore, we will show that the condition number of the preconditioned system $M^{-1}A$ is of order $O(1)$. Thus $M$ is an efficient preconditioner. We note that both the construction (based on averaging the coefficients of the elliptic operator) and the inversion (using Fast Sine Transforms) of our preconditioner are highly parallelizable. Finally, we remark that our construction extends easily to 2-dimensional irregular domains and to higher-dimensional regular domains. Our numerical results on 2-dimensional rectangular and L-shaped domains show that our preconditioners perform better than the MILU and MINV preconditioners.

The outline of the paper is as follows. In the next section, we describe the optimal sine transform approximation for general matrices. In §3, we use the optimal sine transform approximation to construct a block preconditioner on a 2-dimensional rectangular domain. We will show that the INV approach coincides with the Level-1 approach, and we will also analyze the spectral condition number of the preconditioned system. In §4, we extend the definition of our preconditioner to 2-dimensional irregular domains and to higher-dimensional regular domains. Finally, numerical experiments are given in §5.


2 Sine Transform Approximations for General Matrices

In this section we recall some of the results on approximating a given matrix by matrices that can be diagonalized by the discrete sine transform. Let $S_n$ be the n-by-n discrete sine transform matrix. Its $(i,j)$th entry is given by
$$\sqrt{\frac{2}{n+1}}\,\sin\Big(\frac{ij\pi}{n+1}\Big), \qquad 1 \le i, j \le n.$$
We note that $S_n$ is symmetric and orthogonal, and the matrix-vector multiplication $S_n v$ can be computed in $O(n\log n)$ operations for any $n$-vector $v$; see Yip and Rao [9]. Let $\mathcal{B}_{n\times n}$ be the vector space containing all the n-by-n matrices that can be diagonalized by $S_n$. Given $B \in \mathcal{B}_{n\times n}$, we now show that the product $B^{-1}v$ for any vector $v$ can be computed in $O(n\log n)$ operations. We first establish the relationship between the first column of $B$ and its eigenvalues. Since $B = S_n \Lambda S_n$ for some diagonal matrix $\Lambda$, if $e_n = (1, 0, \ldots, 0)^t$ and $1_n = (1, 1, \ldots, 1)^t$, then we have
$$D^{-1} S_n B e_n = \Lambda 1_n, \qquad (1)$$
where $D$ is the diagonal matrix whose diagonal is equal to $S_n e_n$. Thus, by exploiting the Fast Sine Transform, the matrix $\Lambda$, and hence the matrix-vector product $B^{-1}v = S_n \Lambda^{-1} S_n v$, can be computed in $O(n\log n)$ operations.
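For concreteness, the following sketch (our code; it assumes scipy, whose orthonormal DST-I coincides with $S_n$) recovers the eigenvalues of a matrix $B \in \mathcal{B}_{n\times n}$ from its first column via (1) and then applies $B^{-1}$ in $O(n\log n)$ operations:

```python
# Solve Bx = v for B in the sine algebra, given only the first column of B.
# Eigenvalues come from (1): Lambda 1_n = D^{-1} S_n B e_n, D = diag(S_n e_n).
import numpy as np
from scipy.fft import dst

def S(v):                          # S_n v: the orthonormal DST-I equals S_n
    return dst(v, type=1, norm="ortho")

def solve_sine_algebra(first_col, v):
    n = len(v)
    e1 = np.zeros(n); e1[0] = 1.0
    lam = S(first_col) / S(e1)     # eigenvalues of B, as in (1)
    return S(S(v) / lam)           # B^{-1} v = S_n Lambda^{-1} S_n v

# Sanity check on tridiag[-1, 2, -1], which lies in B_{n x n}.
n = 8
B = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
v = np.random.rand(n)
assert np.allclose(B @ solve_sine_algebra(B[:, 0], v), v)
```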

Given an n-by-n symmetric matrix $A_n$, we are interested in finding a matrix $B \in \mathcal{B}_{n\times n}$ which minimizes $\|A_n - B\|_F$ in the Frobenius norm $\|\cdot\|_F$. We denote the minimizer by $s(A_n)$ and call it the optimal sine transform approximation to $A_n$. The following lemma gives some basic properties of $s(A_n)$.

Lemma 1 Let $A_n$ be an n-by-n symmetric matrix and $s(A_n)$ be the minimizer of $\|B_n - A_n\|_F$ over all $B_n \in \mathcal{B}_{n\times n}$. Then $s(A_n)$ is uniquely determined by $A_n$ and is given by
$$s(A_n) = S_n\,\delta(S_n A_n S_n)\,S_n, \qquad (2)$$
where $\delta(S_n A_n S_n)$ denotes the diagonal matrix whose diagonal is equal to the diagonal of the matrix $S_n A_n S_n$. Furthermore,
$$\lambda_{\min}(A_n) \le \lambda_{\min}(s(A_n)) \le \lambda_{\max}(s(A_n)) \le \lambda_{\max}(A_n).$$
In particular, if $A_n$ is positive definite, then $s(A_n)$ is also positive definite.
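A direct dense realization of (2), together with a check of the eigenvalue bounds of Lemma 1, is sketched below; as written it costs $O(n^3)$ and is meant only for validation (the fast construction is the subject of Lemma 2 below):

```python
# s(A_n) = S_n diag(S_n A_n S_n) S_n, eq. (2), formed densely for checking.
import numpy as np

def sine_matrix(n):
    i = np.arange(1, n + 1)
    return np.sqrt(2.0/(n + 1)) * np.sin(np.outer(i, i)*np.pi/(n + 1))

def s(A):
    Sn = sine_matrix(len(A))
    return Sn @ np.diag(np.diag(Sn @ A @ Sn)) @ Sn

n = 10
A = np.random.rand(n, n); A = (A + A.T)/2      # random symmetric test matrix
eig_A, eig_sA = np.linalg.eigvalsh(A), np.linalg.eigvalsh(s(A))
assert eig_A[0] - 1e-12 <= eig_sA[0]           # lambda_min(A) <= lambda_min(s(A))
assert eig_sA[-1] <= eig_A[-1] + 1e-12         # lambda_max(s(A)) <= lambda_max(A)
```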


Proof: Follows directly from Chan and Jin [4, Lemma 1].

We note that forming $s(A_n)$ by computing all the diagonal entries of $S_n A_n S_n$ as in (2) requires $O(n^2\log n)$ operations. Chan, Ng and Wong [5] gave another approach to constructing $s(A_n)$ which reduces the cost to $O(n^2)$ operations. Before we describe how the matrix $s(A_n)$ is formed, we need the following definitions and notations.

Definition 1 Let $Q_i$, $i = 1, \ldots, n$, be the n-by-n matrices with $(h,k)$th entry given by
$$Q_i(h,k) = \begin{cases} 1 & \text{if } |h-k| = i-1, \\ -1 & \text{if } |h+k-n-1| = n-i+2, \\ 0 & \text{otherwise.} \end{cases}$$

Boman and Koltracht [2] showed that $\{Q_i\}_{i=1}^n$ is a basis for $\mathcal{B}_{n\times n}$. We note that each $Q_i$ is a sparse matrix with at most $2n$ non-zero entries. Let $z = (z_1, \ldots, z_n)^t$ be an n-vector. Define
$$\sigma(z) \equiv (z_3, z_4, \ldots, z_n, 0, 0)^t.$$
Define $T_n(z)$ to be the n-by-n symmetric Toeplitz matrix with $z$ as its first column, and $H_n(z)$ to be the n-by-n Hankel matrix with $z$ as its first column and $(z_n, \ldots, z_1)^t$ as its last column. Let $P_n$ be the n-by-n permutation matrix with $(i,j)$th entry given by
$$[P_n]_{i,j} = \begin{cases} 1 & \text{if } 1 \le i \le \lceil \frac n2 \rceil \text{ and } j = 2i-1, \\ 1 & \text{if } \lceil \frac n2 \rceil < i \le n \text{ and } j = 2i - 2\lceil \frac n2 \rceil, \\ 0 & \text{otherwise.} \end{cases}$$
With the help of the above notations, we can give an explicit formula for the entries of the minimizer $s(A_n)$.

Lemma 2 (Chan, Ng and Wong [5]) Let $A_n$ be an n-by-n symmetric matrix and $s(A_n)$ be the minimizer of $\|B_n - A_n\|_F$ over all $B_n \in \mathcal{B}_{n\times n}$. Define for any $m$ the m-vectors $e_m = (1, 0, \ldots, 0)^t$ and $1_m = (1, \ldots, 1)^t$, and the m-by-m matrix $U_m = 1_m 1_m^t$. Then $s(A_n) = T_n(z) - H_n(\sigma(z))$ with
$$z = \frac{1}{2(n+1)}\, P_n^t \begin{pmatrix} U_{\frac n2} + I_{\frac n2} + e_{\frac n2} e_{\frac n2}^t & 0 \\ 0 & 2U_{\frac n2} + I_{\frac n2} \end{pmatrix} P_n r_n$$
if n is even, and
$$z = \frac{1}{2(n+1)}\, P_n^t \begin{pmatrix} 2U_{\frac{n+1}{2}} + I_{\frac{n+1}{2}} + e_{\frac{n+1}{2}} e_{\frac{n+1}{2}}^t & 0 \\ 0 & U_{\frac{n-1}{2}} + I_{\frac{n-1}{2}} \end{pmatrix} P_n r_n$$
if n is odd. Here
$$r_n = \big(1_n^t (Q_1 \circ A_n) 1_n,\; 1_n^t (Q_2 \circ A_n) 1_n,\; \ldots,\; 1_n^t (Q_n \circ A_n) 1_n\big)^t \qquad (3)$$
and $\circ$ is the Hadamard product.

We remark that if $A_n$ has no special structure, then $r_n$ can be computed in $O(n^2)$ operations, because the $Q_i$ are sparse with only $O(n)$ non-zero entries each. However, we show below that if $A_n$ is a band matrix, then the cost can be reduced further.

Corollary 1 The construction cost of the optimal sine transform approximation $s(A_n)$ is of order $O(\ell n)$ if $A_n$ is an n-by-n band matrix with band width $\ell$.

Proof: By the definition of $Q_i$ in Definition 1, $Q_i \circ A_n$ has at most $2\ell$ non-zero entries for $i = 3, \ldots, n$. When $i = 1, 2$, the number of non-zero entries of $Q_i \circ A_n$ is less than $2n$. Therefore, the cost of forming $r_n$ in (3) is bounded by $O(\ell n)$.

It is well known that the discrete sine transform matrix $S_n$ diagonalizes the set of tridiagonal Toeplitz matrices. By the definition of $s(A_n)$ in (2), the optimal sine transform approximation is exact on this set, in particular on the 1-dimensional discrete Laplacian $\mathrm{tridiag}[-1, 2, -1]$. Therefore, the system $\mathrm{tridiag}[-1, 2, -1]\,x = b$ can be solved in exactly one iteration by the PCG method with $s(A_n)$ as preconditioner, and the condition number of the preconditioned system is $O(1)$. We therefore expect our preconditioner to remain a good approximation for elliptic operators that are small perturbations of the Laplacian. In contrast, we remark that the condition number of the system preconditioned by the optimal circulant preconditioner is of order $O(n^{3/2})$; see R. Chan and T. Chan [3].
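As a quick numerical check of this exactness claim (reusing sine_matrix and s from the sketch after Lemma 1):

```python
# The sine approximation reproduces tridiagonal Toeplitz matrices exactly.
import numpy as np

n = 16
T = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1-D discrete Laplacian
assert np.allclose(s(T), T)                          # s(T) = T, so PCG needs one step
```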

In the following sections, we will use the optimal sine transform approximation to construct block preconditioners for elliptic problems in higher-dimensional domains.

3 Two-Dimensional Rectangular Domains

In this section, we apply the optimal sine transform approximation to construct a block preconditioner on the 2-dimensional rectangular domain. The construction is given in §3.1 and the analysis of the spectral condition number of the preconditioned system is given in §3.2.

3.1 Construction of Preconditioners

Consider the 2-dimensional elliptic problem
$$-(a(x,y)\,u_x)_x - (b(x,y)\,u_y)_y = f(x,y) \qquad (4)$$

on the unit square $[0,1]\times[0,1]$ with Dirichlet boundary conditions. Assume that the coefficient functions $a(x,y)$, $b(x,y)$ satisfy
$$0 < c_{\min} \le a(x,y),\, b(x,y) \le c_{\max} \qquad (5)$$
for some constants $c_{\min}$ and $c_{\max}$. Let the domain be discretized by a uniform grid with n internal grid points in each coordinate direction. With the usual 5-point centered differencing, the resulting discretization matrix $A$ is an $n^2$-by-$n^2$ symmetric positive definite matrix of the form
$$A = \begin{pmatrix} D_1 & A_2 & & & \\ A_2 & D_2 & A_3 & & \\ & \ddots & \ddots & \ddots & \\ & & \ddots & \ddots & A_n \\ & & & A_n & D_n \end{pmatrix}. \qquad (6)$$
Here the $D_i$ are symmetric tridiagonal matrices for $1 \le i \le n$ and the $A_i$ are diagonal matrices for $2 \le i \le n$. Let
$$L = \begin{pmatrix} 0 & & & & \\ A_2 & 0 & & & \\ & A_3 & \ddots & & \\ & & \ddots & \ddots & \\ & & & A_n & 0 \end{pmatrix} \qquad (7)$$
be the block lower triangular matrix. Then the block Cholesky factorization of $A$ can be written as
$$A = (\Sigma + L)\,\Sigma^{-1}\,(\Sigma + L^t),$$
where $\Sigma$ is a symmetric block diagonal matrix with diagonal blocks $\Sigma_i$ satisfying
$$\Sigma_1 = D_1, \qquad \Sigma_i = D_i - A_i \Sigma_{i-1}^{-1} A_i, \quad 2 \le i \le n. \qquad (8)$$
Because of the work and storage required in large problems for computing the Schur complements $\Sigma_i$, carrying out the complete block Cholesky factorization is not an efficient way of solving the system $Ax = b$. Concus, Golub and Meurant [7] therefore considered sparse approximations to the matrices $\Sigma_i$. Their idea is to approximate $\Sigma_i$ by band matrices consisting of the three main diagonals of $\Sigma_i$. More precisely, their preconditioner $M$ is defined as
$$M = (\Delta + L)\,\Delta^{-1}\,(\Delta + L^t),$$
where $\Delta$ is a symmetric block diagonal matrix with diagonal blocks $\Delta_i$ satisfying
$$\Delta_1 = D_1, \qquad \Delta_i = D_i - B_3(A_i \Delta_{i-1}^{-1} A_i), \quad 2 \le i \le n. \qquad (9)$$
Here $B_3(A_i \Delta_{i-1}^{-1} A_i)$ is the tridiagonal matrix consisting of the three main diagonals of $A_i \Delta_{i-1}^{-1} A_i$. The preconditioner $M$ is called the INV preconditioner in Concus et al. [7], and each $\Delta_i$ is proved to be positive definite by showing that the minimum row sums of the matrices $\Delta_i + A_{i+1}$ are greater than zero. Hence, the factorization process (9) can be carried out for $2 \le i \le n$. It was also proved that the inverse of an n-by-n symmetric tridiagonal matrix is determined by two n-vectors which can be computed in $O(n)$ operations, and hence forming $B_3(A_i \Delta_{i-1}^{-1} A_i)$ needs only $O(n)$ operations. As a result, forming the INV preconditioner $M$ and computing $M^{-1}v$ can be done in $O(n^2)$ operations.
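A small dense sketch of the INV recursion (9) follows; the helper names are ours, and the explicit inverses are only for illustration, since the algorithm of [7] instead exploits the two-vector representation of tridiagonal inverses to reach $O(n)$ work per block:

```python
# INV blocks (9): Delta_1 = D_1, Delta_i = D_i - B3(A_i Delta_{i-1}^{-1} A_i).
import numpy as np

def B3(M):
    """Tridiagonal part of M: keep the three main diagonals, zero the rest."""
    return np.triu(np.tril(M, 1), -1)

def inv_blocks(D, A):
    """D, A: lists of n-by-n blocks from (6); A[0] is unused."""
    Delta = [D[0]]
    for i in range(1, len(D)):
        Delta.append(D[i] - B3(A[i] @ np.linalg.inv(Delta[-1]) @ A[i]))
    return Delta
```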

In this paper, we use the optimal sine transform approximation to approximate each $\Sigma_i$. We follow the approach of Concus et al. [7] and propose a block preconditioner $s_B(A)$ of the form
$$s_B(A) = (\hat\Sigma + \hat L)\,\hat\Sigma^{-1}\,(\hat\Sigma + \hat L^t). \qquad (10)$$
Here
$$\hat L \equiv \begin{pmatrix} 0 & & & & \\ s(A_2) & 0 & & & \\ & s(A_3) & \ddots & & \\ & & \ddots & \ddots & \\ & & & s(A_n) & 0 \end{pmatrix} \qquad (11)$$
is a block lower triangular matrix which approximates $L$, and
$$\hat\Sigma = \begin{pmatrix} \hat\Sigma_1 & & & \\ & \hat\Sigma_2 & & \\ & & \ddots & \\ & & & \hat\Sigma_n \end{pmatrix}$$
is a block diagonal matrix with diagonal blocks $\hat\Sigma_i$ satisfying
$$\hat\Sigma_1 = s(D_1), \qquad \hat\Sigma_i = s(D_i) - s(A_i)\,\hat\Sigma_{i-1}^{-1}\,s(A_i), \quad 2 \le i \le n. \qquad (12)$$
In order to ensure that the above factorization can be carried out, we show in the next theorem that each $\hat\Sigma_i$ is positive definite. In addition, the theorem also proves that the preconditioner $s_B(A)$ actually comes from the matrix $A$ by taking the optimal sine transform approximations of each n-by-n block of $A$. Thus the construction of our preconditioner is similar to that of the Level-1 circulant preconditioners proposed by T. Chan and Olkin [6].

Theorem 1 Let $A$ be an $n^2$-by-$n^2$ symmetric positive definite matrix of the form given in (6). Define
$$T = \begin{pmatrix} s(D_1) & s(A_2) & & & \\ s(A_2) & s(D_2) & s(A_3) & & \\ & \ddots & \ddots & \ddots & \\ & & \ddots & \ddots & s(A_n) \\ & & & s(A_n) & s(D_n) \end{pmatrix}, \qquad (13)$$
where $s(\cdot)$ is the optimal sine transform approximation. Then $T$ is positive definite, and the block Cholesky factorization of $T$ is given by $(\hat\Sigma + \hat L)\hat\Sigma^{-1}(\hat\Sigma + \hat L^t)$ where $\hat L$ and $\hat\Sigma$ are given in (11) and (12) respectively. In particular, $T = s_B(A)$, and hence $s_B(A)$ and the $\hat\Sigma_i$ are positive definite.

Proof: The positive definiteness of $T$ follows from [4, Theorem 1]. Similar to (8), the block Cholesky factorization of $T$ is given by
$$T = (\hat\Delta + \hat L)\,\hat\Delta^{-1}\,(\hat\Delta + \hat L^t),$$
where $\hat\Delta$ is a symmetric block diagonal matrix with diagonal blocks $\hat\Delta_i$ satisfying
$$\hat\Delta_1 = s(D_1), \qquad \hat\Delta_i = s(D_i) - s(A_i)\,\hat\Delta_{i-1}^{-1}\,s(A_i), \quad 2 \le i \le n.$$
Comparing this with the definition of $\hat\Sigma$ in (12), we see that $\hat\Delta = \hat\Sigma$ and hence the block preconditioner $s_B(A)$ is identical to $T$.

In the following, we apply the recursion formula (12) to show that both the preconditioner $s_B(A)$ and the solution of the linear system $s_B(A)x = b$ can be obtained in $O(n^2\log n)$ operations. Since the matrices $\hat\Sigma_i$, $s(D_i)$ and $s(A_i)$ all belong to $\mathcal{B}_{n\times n}$, we let $\lambda_i$, $d_i$ and $a_i$ be their corresponding eigenvalue matrices. Specifically, we have
$$\hat\Sigma_i = S_n \lambda_i S_n, \qquad s(D_i) = S_n d_i S_n \qquad \text{and} \qquad s(A_i) = S_n a_i S_n. \qquad (14)$$
As $D_i$ and $A_i$ are band matrices, by Corollary 1, forming $s(D_i)$ and $s(A_i)$ costs $O(n)$ operations. By (1), $d_i$ and $a_i$ can be computed in $O(n\log n)$ operations. Using (12), we have the following equality, which relates the eigenvalues of the matrices $\hat\Sigma_i$ and $\hat\Sigma_{i-1}$:
$$\lambda_i = d_i - a_i \lambda_{i-1}^{-1} a_i.$$
Therefore, $\lambda_i$ can be obtained from $\lambda_{i-1}$ in $O(n\log n)$ steps. Hence, forming $\hat\Sigma$ and the preconditioner $s_B(A)$ requires only $O(n^2\log n)$ operations. Finally, in solving the system $s_B(A)x = b$, we are required to multiply each $\hat\Sigma_i$ by some vector and to solve systems with coefficient matrices $\hat\Sigma_i$. By the equalities in (14), it is easy to see that the system $s_B(A)x = b$ can also be solved in $O(n^2\log n)$ operations. We note that, by Theorem 1, the preconditioner $s_B(A)$ can also be constructed by the approach used in constructing Level-1 circulant preconditioners in T. Chan and Olkin [6]; the cost will be the same.
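The whole procedure can be sketched as follows (our helper names; d[i] and a[i] denote the eigenvalue vectors of $s(D_i)$ and $s(A_i)$, obtainable from their first columns via (1), with a[0] unused). The recursion is (12) read in the sine basis, and the solve is block forward/backward substitution for $(\hat\Sigma + \hat L)\hat\Sigma^{-1}(\hat\Sigma + \hat L^t)x = b$:

```python
# Build and apply s_B(A)^{-1} entirely in the sine basis, O(n^2 log n) total.
import numpy as np
from scipy.fft import dst

def S(v):
    return dst(v, type=1, norm="ortho")        # orthonormal DST-I = S_n

def schur_eigs(d, a):
    """lam[i] = eigenvalues of Sig_i via (12): lam_i = d_i - a_i^2/lam_{i-1}."""
    lam = [d[0]]
    for i in range(1, len(d)):
        lam.append(d[i] - a[i]**2 / lam[-1])   # entrywise on eigenvalue vectors
    return lam

def apply_sB_inverse(lam, a, b):
    """Solve s_B(A) x = b; b is an (n, n) array whose i-th row is block b_i."""
    n = len(b)
    bt = np.array([S(bi) for bi in b])         # one fast sine transform per block
    w = np.empty_like(bt)
    w[0] = bt[0] / lam[0]                      # forward: (Sig + L) w = b
    for i in range(1, n):
        w[i] = (bt[i] - a[i]*w[i-1]) / lam[i]
    x = np.empty_like(bt)
    x[-1] = w[-1]                              # backward: (Sig + L^t) x = Sig w
    for i in range(n - 2, -1, -1):
        x[i] = w[i] - a[i+1]*x[i+1] / lam[i]
    return np.array([S(xi) for xi in x])       # transform back
```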

3.2 Convergence Analysis of the Preconditioners

In this subsection, we show that the condition numbers $\kappa(s_B(A)^{-1}A)$ of the preconditioned systems are bounded by a constant which is independent of the size of the matrix $A$. Hence, the convergence rate of the conjugate gradient method, when applied to the preconditioned systems, is linear; see Axelsson and Barker [1, p.26]. Before we present our proof, let us introduce the following notation. Let $A_{nn}$ be any $n^2$-by-$n^2$ matrix partitioned as
$$A_{nn} = \begin{pmatrix} A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\ A_{2,1} & A_{2,2} & \cdots & A_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n,1} & A_{n,2} & \cdots & A_{n,n} \end{pmatrix}.$$
Here the $A_{i,j}$ are square matrices of order n. We define $\delta_B(A_{nn})$ to be the matrix
$$\delta_B(A_{nn}) \equiv \begin{pmatrix} \delta(A_{1,1}) & \delta(A_{1,2}) & \cdots & \delta(A_{1,n}) \\ \delta(A_{2,1}) & \delta(A_{2,2}) & \cdots & \delta(A_{2,n}) \\ \vdots & \vdots & \ddots & \vdots \\ \delta(A_{n,1}) & \delta(A_{n,2}) & \cdots & \delta(A_{n,n}) \end{pmatrix},$$
where each block $\delta(A_{i,j})$ is the diagonal matrix of order n whose diagonal is equal to the diagonal of the matrix $A_{i,j}$. The following lemma relates the eigenvalues of the matrices $A_{nn}$ and $\delta_B(A_{nn})$ and is useful in our analysis of the convergence rate.

Lemma 3 (Chan and Jin [4]) Given any $n^2$-by-$n^2$ symmetric matrix $A_{nn}$, we have
$$\lambda_{\min}(A_{nn}) \le \lambda_{\min}(\delta_B(A_{nn})) \le \lambda_{\max}(\delta_B(A_{nn})) \le \lambda_{\max}(A_{nn}).$$
In particular, if $A_{nn}$ is positive definite, then $\delta_B(A_{nn})$ is also positive definite.

Using the $\delta_B(\cdot)$ notation, we can give another formula for $s_B(A)$.

Lemma 4 Let $A$ be an $n^2$-by-$n^2$ symmetric positive definite matrix of the form given in (6). Then
$$s_B(A) = (I \otimes S_n)\,\delta_B\big((I \otimes S_n)\,A\,(I \otimes S_n)\big)\,(I \otimes S_n). \qquad (15)$$

Proof: We first observe that $\delta_B((I \otimes S_n)A(I \otimes S_n))$ is equal to
$$\begin{pmatrix} \delta(S_n D_1 S_n) & \delta(S_n A_2 S_n) & & & \\ \delta(S_n A_2 S_n) & \delta(S_n D_2 S_n) & \delta(S_n A_3 S_n) & & \\ & \ddots & \ddots & \ddots & \\ & & \ddots & \ddots & \delta(S_n A_n S_n) \\ & & & \delta(S_n A_n S_n) & \delta(S_n D_n S_n) \end{pmatrix}.$$
The lemma then follows by using (2) and (13).

With the help of these two lemmas, we prove the main theorem in this section.

Theorem 2 Let $A$ be the 5-point discretization matrix of (4) on the unit square satisfying conditions (5). If $s_B(A)$ is the preconditioner defined in (10), we have
$$\kappa\big(s_B(A)^{-1}A\big) \le \Big(\frac{c_{\max}}{c_{\min}}\Big)^2.$$

Proof: Let $A_L$ be the 5-point discretization matrix of the Laplace operator on the unit square, i.e.
$$A_L = \mathrm{tridiag}[-1, 2, -1] \otimes I_n + I_n \otimes \mathrm{tridiag}[-1, 2, -1].$$
Then
$$c_{\min} A_L \le A \le c_{\max} A_L; \qquad (16)$$
see [3, (4.2)]. Multiplying on the left and on the right of the matrices in the above inequality by $(I \otimes S_n)$ and applying Lemma 3, we have
$$c_{\min}(I\otimes S_n)A_L(I\otimes S_n) = c_{\min}\,\delta_B\big((I\otimes S_n)A_L(I\otimes S_n)\big) \le \delta_B\big((I\otimes S_n)A(I\otimes S_n)\big) \le c_{\max}\,\delta_B\big((I\otimes S_n)A_L(I\otimes S_n)\big) = c_{\max}(I\otimes S_n)A_L(I\otimes S_n).$$
Multiplying again on the left and on the right of the above matrices by $(I \otimes S_n)$, and noting (15), we then have
$$c_{\min} A_L \le (I\otimes S_n)\,\delta_B\big((I\otimes S_n)A(I\otimes S_n)\big)\,(I\otimes S_n) = s_B(A) \le c_{\max} A_L. \qquad (17)$$
This shows that $s_B(A) - c_{\min}A_L$ and $c_{\max}A_L - s_B(A)$ are positive semidefinite matrices. Combining (16) and (17), for any $x \ne 0$ we have
$$0 < \frac{c_{\min}\, x^t A_L x}{c_{\max}\, x^t A_L x} \le \frac{x^t A x}{x^t s_B(A) x} \le \frac{c_{\max}\, x^t A_L x}{c_{\min}\, x^t A_L x}.$$
Hence the theorem is proved.
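Theorem 2 is easy to check numerically on a small grid. The sketch below (our code, reusing sine_matrix from the Lemma 1 sketch) forms $s_B(A)$ densely via Lemma 4 and compares the generalized eigenvalue spread with the bound; the coefficient choice $a = b = 1 + (x+y)/2$, giving $c_{\min} = 1$ and $c_{\max} = 2$, is an illustrative assumption, not a test problem from the paper:

```python
# Check kappa(s_B(A)^{-1} A) <= (cmax/cmin)^2 on a small 5-point matrix.
import numpy as np
from scipy.linalg import eigh

def five_point(n, coef):
    """5-point matrix of -(a u_x)_x - (a u_y)_y on the unit square, h = 1/(n+1)."""
    h = 1.0/(n + 1); A = np.zeros((n*n, n*n))
    for i in range(n):                             # i: grid line, j: point on it
        for j in range(n):
            x, y = (j + 1)*h, (i + 1)*h
            for di, dj in [(0, 1), (0, -1), (1, 0), (-1, 0)]:
                c = coef(x + dj*h/2, y + di*h/2)   # coefficient at edge midpoint
                A[i*n + j, i*n + j] += c
                if 0 <= i + di < n and 0 <= j + dj < n:
                    A[i*n + j, (i + di)*n + (j + dj)] -= c
    return A

def sB(A, n):                                      # Lemma 4, formed densely
    G = np.kron(np.eye(n), sine_matrix(n))         # I (x) S_n
    B = G @ A @ G
    for i in range(n):                             # delta_B: keep block diagonals
        for j in range(n):
            blk = B[i*n:(i+1)*n, j*n:(j+1)*n]
            B[i*n:(i+1)*n, j*n:(j+1)*n] = np.diag(np.diag(blk))
    return G @ B @ G

n = 8
A = five_point(n, lambda x, y: 1.0 + (x + y)/2)    # cmin = 1, cmax = 2
lam = eigh(A, sB(A, n), eigvals_only=True)         # spectrum of s_B(A)^{-1} A
assert lam.max()/lam.min() <= 4.0 + 1e-8           # (cmax/cmin)^2 = 4
```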

4 Extension to Other Domains

In this section, we apply the optimal sine transform approximation to construct preconditioners for 2-dimensional irregular domains and higher-dimensional regular domains. They are discussed in §4.1 and §4.2 respectively.

4.1 Two-dimensional Irregular Domains

For ease of presentation, we consider irregular domains that are unions of rectangular domains. In this case, the matrix $A$ still has the form given in (6), but the diagonal submatrices $D_i$ of $A$ are of different sizes and the submatrices $A_i$ may not be square. We note that the number of submatrices $A_i$ that are not square is proportional to the number of rectangular regions used in forming the given domain. For an L-shaped domain, there is only one $A_i$ that is not square; for a T-shaped domain, the number is two. When the $A_i$ are square, we can carry out the construction of the preconditioner just as we did in §3. Therefore let us concentrate on the sub-blocks of $A$ where the $A_i$ are not square. In particular, let us consider $A$ of the form
$$A = \begin{pmatrix} D_1 & A_2^t \\ A_2 & D_2 \end{pmatrix},$$
where $D_1$ and $D_2$ are respectively $n_1$-by-$n_1$ and $n_2$-by-$n_2$ symmetric tridiagonal matrices and $A_2$ is an $n_2$-by-$n_1$ rectangular matrix with $(A_2)_{ij} = 0$ for $i \ne j$. Without loss of generality we assume that $n_1 > n_2$.

The block preconditioner $s_B(A)$ in this case will still be of the form given by (10), except that the $\hat\Sigma_i$, which are approximations to the $\Sigma_i$, are defined as follows:
$$\hat\Sigma_1 = s(D_1), \qquad \hat\Sigma_2 = s(D_2) - s(A_2 E_{n_2,n_1}^t)\; s(E_{n_2,n_1}\hat\Sigma_1^{-1}E_{n_2,n_1}^t)\; s(E_{n_2,n_1} A_2^t),$$
where $E_{n_2,n_1}$ is an $n_2$-by-$n_1$ matrix such that $(E_{n_2,n_1})_{i,j} = \delta_{ij}$, the Kronecker delta. We note that the matrix $E_{n_2,n_1}\hat\Sigma_1^{-1}E_{n_2,n_1}^t$ is the $n_2$-by-$n_2$ principal submatrix of $\hat\Sigma_1^{-1}$. Hence it is a dense matrix without any special algebraic structure, and constructing the optimal sine transform approximation to this matrix requires $O(n_2^2)$ operations according to (3). Thus we conclude that, for a given irregular domain which is the union of $m$ rectangular regions, the construction cost of the preconditioner $s_B(A)$ is bounded by $O(mn^2) + O(n^2\log n)$, where $n$ is the size of the largest diagonal block of $A$. For an L-shaped or T-shaped domain, the cost is still bounded by $O(n^2\log n)$. Once $s_B(A)$ is formed, the cost of solving the system $s_B(A)y = v$ is the same as in the rectangular case, i.e. it is bounded by $O(n^2\log n)$.
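For illustration only, the non-square step can be written as below (sigma2 is a hypothetical helper name; s is the dense sketch of (2) from Section 2, and the explicit inverse stands in for the FST-based computation):

```python
# Sig_2 = s(D_2) - s(A_2 E^t) s(E Sig_1^{-1} E^t) s(E A_2^t) for n_1 > n_2.
import numpy as np

def sigma2(D2, A2, Sig1):
    n2, n1 = A2.shape
    E = np.eye(n2, n1)                     # E_{n2,n1}: (E)_{ij} = delta_{ij}
    P = np.linalg.inv(Sig1)[:n2, :n2]      # E Sig_1^{-1} E^t: dense principal
    return s(D2) - s(A2 @ E.T) @ s(P) @ s(E @ A2.T)   # submatrix, then s(.)
```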

4.2 Higher Dimensional Rectangular Domains

In this section, we extend the construction of our preconditioner to the 3-dimensional cubic domain $[0,1]^3$. The approach can easily be generalized to higher-dimensional regular domains. By applying the usual 7-point centered differencing with n internal grid nodes in each coordinate direction, the resulting discretization matrix $A$ is an $n^3$-by-$n^3$ symmetric positive definite matrix of the form given in (6), with the $A_i$ and $D_i$ being $n^2$-by-$n^2$ diagonal matrices and $n^2$-by-$n^2$ block tridiagonal matrices respectively. If we let $L$ be the matrix defined in (7), then the block Cholesky factorization of $A$ is given by
$$A = (\Sigma + L)\,\Sigma^{-1}\,(\Sigma + L^t),$$
where $\Sigma$ is a symmetric block diagonal matrix with each $n^2$-by-$n^2$ diagonal block $\Sigma_i$ satisfying (8). To emulate the approach of constructing the INV preconditioner, namely using an easily invertible matrix to approximate the Schur complement $\Sigma_i$, we introduce the sine transform analogue of the so-called Level-2 circulant approximation; see T. Chan and Olkin [6]. We need the following notations first. For any $n^2$-by-$n^2$ block matrix $A_{nn}$, we denote by $(A_{nn})_{i,j;k,l}$ the $(i,j)$th entry of the $(k,l)$th block of $A_{nn}$. Let $P$ be the permutation matrix that satisfies
$$(P^t A_{nn} P)_{i,j;k,l} = (A_{nn})_{k,l;i,j}, \qquad 1 \le i, j \le n, \quad 1 \le k, l \le n.$$
Then we define an approximation $s_2(A_{nn})$ to $A_{nn}$ by
$$s_2(A_{nn}) = P\, s_B(P^t s_B(A_{nn}) P)\, P^t. \qquad (18)$$

By Theorem 3 in [4], the approximation $s_2(A_{nn})$ can be diagonalized by $S_n \otimes S_n$. Specifically, we have
$$s_2(A_{nn}) = (S_n \otimes S_n)\,\delta\big((S_n \otimes S_n)\,A_{nn}\,(S_n \otimes S_n)\big)\,(S_n \otimes S_n).$$
Using this equality, we can relate the eigenvalues of $s_2(A_{nn})$ to its first column as in (1). Hence, the inverse of $s_2(A_{nn})$ can easily be computed. Now we apply the approximation $s_2(\cdot)$ to each $\Sigma_i$ and define the block preconditioner $s_3(A)$ for the matrix $A$ as
$$s_3(A) = (\tilde\Sigma + \tilde L)\,\tilde\Sigma^{-1}\,(\tilde\Sigma + \tilde L^t),$$
where
$$\tilde L = \begin{pmatrix} 0 & & & & \\ s_2(A_2) & 0 & & & \\ & s_2(A_3) & \ddots & & \\ & & \ddots & \ddots & \\ & & & s_2(A_n) & 0 \end{pmatrix}$$
is a block lower triangular matrix which approximates $L$, and
$$\tilde\Sigma = \begin{pmatrix} \tilde\Sigma_1 & & & \\ & \tilde\Sigma_2 & & \\ & & \ddots & \\ & & & \tilde\Sigma_n \end{pmatrix}$$
is a block diagonal matrix with diagonal blocks $\tilde\Sigma_i$ of order $n^2$ satisfying
$$\tilde\Sigma_1 = s_2(D_1), \qquad \tilde\Sigma_i = s_2(D_i) - s_2(A_i)\,\tilde\Sigma_{i-1}^{-1}\,s_2(A_i), \quad 2 \le i \le n. \qquad (19)$$
Similar to Theorem 1, we can prove that the preconditioner $s_3(A)$ can be obtained from $A$ by taking the $s_2(\cdot)$ approximation of each $n^2$-by-$n^2$ block of $A$. Hence, by Theorem 1 in [4], $s_3(A)$ is positive definite. Also, since the matrices $A_i$ are diagonal and the $D_i$ have the same graph structure as 2-dimensional discretization matrices, we see by (18) that obtaining each $s_2(A_i)$ and $s_2(D_i)$ requires $O(n^2\log n)$ operations. Hence, by (19), the construction cost of the preconditioner $s_3(A)$ is of order $O(n^3\log n)$.
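Since $s_2(A_{nn})$ is diagonalized by $S_n \otimes S_n$, its eigenvalues again follow from the first column, now with a 2-dimensional fast sine transform; a minimal sketch (our helper names, assuming scipy's N-dimensional DST) of the resulting solve is:

```python
# Solve s2(A) x = v using the 2-D analogue of (1): eigenvalues from column one.
import numpy as np
from scipy.fft import dstn

def S2(V):                                   # (S_n (x) S_n) vec(V), V is n-by-n
    return dstn(V, type=1, norm="ortho")

def solve_level2(first_col, v, n):
    E = np.zeros((n, n)); E[0, 0] = 1.0      # first unit vector, reshaped
    lam = S2(first_col.reshape(n, n)) / S2(E)
    return S2(S2(v.reshape(n, n)) / lam).ravel()
```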

5 Numerical Experiments

In this section, we compare the performance of our method with the MILU and MINV type preconditioners. The equation we use is
$$\frac{\partial}{\partial x}\Big[(1 + e^{x+y})\,\frac{\partial u}{\partial x}\Big] + \frac{\partial}{\partial y}\Big[(1 + \epsilon\sin(2\pi(x+y)))\,\frac{\partial u}{\partial y}\Big] = f(x,y). \qquad (20)$$
Here $\epsilon$ is a parameter controlling the variation of the coefficient functions. We discretize the equation using the standard 5-point scheme. The initial guess and the right-hand side are chosen to be random vectors and are the same for all methods. Computations are done in double precision on a VAX 6420, and the iterations are stopped when the residual vector $r_k$ at the k-th iteration satisfies $\|r_k\|_2 / \|r_0\|_2 < 10^{-6}$.

We note that the application of the preconditioner $s_B(A)$ requires $O(n^2\log n)$ flops, which is slightly more expensive than the $O(n^2)$ flops required by the MILU and MINV preconditioners. However, we remark that the Fast Sine Transform is easier to parallelize than tridiagonal solvers. Since the preconditioner $s_B(A)$ is based on averaging the coefficient functions over the grid points, its performance will deteriorate as the variation $\epsilon$ in the coefficients increases. To alleviate this potential problem, we first symmetrically scale $A$ by its diagonal before applying the preconditioner. This technique has also proven very useful when used in conjunction with other kinds of preconditioners. In our experiments, we apply diagonal scaling to all methods.

The following tables show the numbers of iterations required for convergence for equation (20) for different choices of $\epsilon$ and preconditioners. The notation I means that no preconditioner is used, and the parameter n is equal to $1/h$ where $h$ is the mesh-size. Tables 1a-1b and 2a-2b give the results on the unit square and on the L-shaped domain $[0,\frac12]\times[0,1] \cup [\frac12,1]\times[0,\frac12]$ respectively. We see that the performance of the preconditioner $s_B(A)$ is better than that of MILU and MINV. However, the MILU and MINV methods are less sensitive to changes in $\epsilon$. We also observe that the number of iterations for the preconditioner $s_B(A)$ does not grow when n is large.

TABLE 1a: Numbers of Iterations for the Unit Square.

                 ε = 0.0                  ε = 0.01
    n      I   sB(A)  MINV  MILU     I   sB(A)  MINV  MILU
    4      9     1     4     7      12     3     4     7
    8     23     1     5     9      25     3     5     9
    16    43     1     7    13      47     3     7    13
    32    84     1    11    19      90     3    11    20
    64   165     1    16    28     186     3    16    28
    128  318     1    24    41     363     3    24    41

TABLE 1b: Numbers of Iterations for the Unit Square.

                 ε = 0.1                  ε = 1.0
    n      I   sB(A)  MINV  MILU     I   sB(A)  MINV  MILU
    4     13     3     3     7      15     5     3     6
    8     26     4     5     9      29     7     4     9
    16    46     5     8    13      54     9     6    14
    32    97     5    11    20     107    11    10    20
    64   189     5    16    28     209    12    15    28
    128  379     5    24    41     419    13    22    41

TABLE 2a: Numbers of Iterations for the L-shaped Domain.

                 ε = 0.0                  ε = 0.01
    n      I   sB(A)  MINV  MILU     I   sB(A)  MINV  MILU
    8     21     3     4     9      22     3     4     9
    16    42     3     6    12      40     3     6    12
    32    77     4     9    17      80     4     9    17
    64   155     4    14    25     155     4    14    25
    128  306     4    21    36     311     4    21    36

TABLE 2b: Numbers of Iterations for the L-shaped Domain.

                 ε = 0.1                  ε = 1.0
    n      I   sB(A)  MINV  MILU     I   sB(A)  MINV  MILU
    8     22     4     4     9      24     7     4     9
    16    42     5     6    12      45     9     6    13
    32    82     5     9    18      86    10     8    18
    64   162     6    14    25     169    12    12    26
    128  322     7    21    36     338    14    19    37

Acknowledgment: The authors wish to thank Professors Tony Chan and Vassilevski for their helpful discussions on this work.


References

[1] O. Axelsson and V. Barker, Finite Element Solution of Boundary Value Problems: Theory and Computation, Academic Press, Orlando, FL, 1983.
[2] E. Boman and I. Koltracht, Fast Transform Based Preconditioners for Toeplitz Equations, preprint, August 1993.
[3] R. Chan and T. Chan, Circulant Preconditioners for Elliptic Problems, J. Numer. Linear Algebra Appl., 1 (1992), pp. 77–101.
[4] R. Chan and X. Jin, A Family of Block Preconditioners for Block Systems, SIAM J. Sci. Statist. Comput., 13 (1992), pp. 1218–1235.
[5] R. Chan, K. Ng, and C. Wong, Sine Transform Based Preconditioners for Symmetric Toeplitz Systems, submitted.
[6] T. Chan and J. Olkin, Circulant Preconditioners for Toeplitz-block Matrices, Numer. Algorithms, to appear.
[7] P. Concus, G. Golub, and G. Meurant, Block Preconditioning for the Conjugate Gradient Method, SIAM J. Sci. Statist. Comput., 6 (1985), pp. 220–252.
[8] T. Dupont, R. Kendall, and H. Rachford, An Approximate Factorization Procedure for Solving Self-Adjoint Elliptic Difference Equations, SIAM J. Numer. Anal., 5 (1968), pp. 559–573.
[9] P. Yip and K. Rao, Fast Decimation-in-time Algorithms for a Family of Discrete Sine and Cosine Transforms, Circuits Systems Signal Process., 3 (1984), pp. 387–408.
