Applied Mathematics and Computation 171 (2005) 203–224 www.elsevier.com/locate/amc
Wavelet based preconditioners for sparse linear systems

B.V. Rathish Kumar *, Mani Mehra

Department of Mathematics, Indian Institute of Technology, Kanpur 208 016, India
Abstract

A class of efficient preconditioners based on the Daubechies family of wavelets is introduced for sparse, unsymmetric linear systems that arise in the numerical solution of partial differential equations (PDEs) in a wide variety of scientific and engineering disciplines. Complete and incomplete discrete wavelet transforms, in conjunction with row and column permutations, are used in the construction of these preconditioners. Under these wavelet transforms the transformed matrix is permuted to band form. The efficiency of our preconditioners with several Krylov subspace methods is illustrated by solving matrices from the Harwell-Boeing collection and the Tim Davis collection. Matrices arising from the solution of the regularized Burgers equation and of free convection in a porous enclosure are also tested. Our results indicate that the preconditioner based on the Incomplete Discrete Haar Wavelet Transform is both cheaper to construct and gives good convergence.

© 2005 Elsevier Inc. All rights reserved.

Keywords: Preconditioning; Sparse matrices; Wavelet transform; Krylov subspace solvers
* Corresponding author.
E-mail addresses: [email protected] (B.V. Rathish Kumar), [email protected] (M. Mehra).
0096-3003/$ - see front matter © 2005 Elsevier Inc. All rights reserved.
doi:10.1016/j.amc.2005.01.060
1. Introduction

Consider solving the large and sparse linear system

A x = b    (1.1)
that arises when numerical methods such as finite differences, finite elements, etc., are used to solve partial differential equations (PDEs). Krylov subspace iterative methods such as the Conjugate Gradient method (CG), the Biconjugate Gradient method (BiCG), Conjugate Gradient Squared (CGS), Biconjugate Gradient Stabilized (Bi-CGSTAB) and the restarted Generalized Minimal Residual method (GMRES(k)) are widely employed in solving (1.1). It is well known that the rate of convergence of such methods is strongly influenced by the spectral properties of A. It is therefore natural to try to transform the original system into one having the same solution but more favorable spectral properties. A preconditioner is a matrix that can be used to accomplish such a transformation. If M is a non-singular matrix which approximates A (M \approx A), the transformed linear system

M^{-1} A x = M^{-1} b    (1.2)
will have the same solution as system (1.1), but the convergence rate of iterative methods applied to (1.2) may be much higher. Further, it is desirable that M^{-1} v can be computed easily for any vector v. A large number of different preconditioning strategies have been developed; a brief survey is provided in [1]. Recently there has been increased interest in using wavelet based preconditioners with Krylov subspace methods (KSMs) for the linear system (1.1). Chan et al. [2] have used the discrete wavelet transform (DWT; usually the complete discrete wavelet transform (cDWT) is referred to simply as DWT) for improving the performance of the sparse approximate inverse preconditioners proposed by Grote and Huckle [3]. They dealt with matrices with smooth inverses resulting from the 1-D Laplacian operator and from linear PDEs with variable coefficients. Chen [4] used the DWT with permutations to construct preconditioners for dense systems arising from boundary element analysis (BEA) of linear PDEs. Ford et al. [5,6] have considered dense matrices with local non-smoothness and have shown that wavelet compression can be used to design preconditioners for such dense systems after isolating the local non-smoothness. All these studies largely focus on algorithms for constructing DWT based preconditioners for solving dense symmetric/unsymmetric linear systems arising in BEA of linear PDEs by the GMRES method. The DWT transforms (1.1) to

W A W^T \tilde{x} = W b, \qquad \tilde{x} = W x.    (1.3)

Since the wavelet transform W is orthogonal, the eigenvalues of \tilde{A} = W A W^T are the same as those of A. This implies that the condition number of \tilde{A} is unaffected by the DWT. The Incomplete Discrete Wavelet Transform (iDWT) W_1, which is
approximately orthogonal, is cheaper and easier to construct. W_1 may serve as an alternative to W to project A into wavelet space. The standard DWT leads to a preconditioner with non-zero entries dispersed throughout the matrix; the cost of applying such a preconditioner is often too high for practical purposes. By using cDWT and iDWT with permutations one can improve the positioning of the non-zero entries and obtain a preconditioner that is cheaper to apply and still gives good convergence.

In this study we propose Haar and Daubechies wavelet based preconditioners for non-symmetric large sparse linear systems resulting from non-linear PDE analysis. Further, in our algorithms we use discrete wavelet transforms with permutations based on both cDWT and iDWT. While it is generally agreed that the construction of an efficient general purpose preconditioner is not possible, there is still considerable interest in developing methods which perform well on a wide range of problems. The algorithms proposed in this study are tested on a variety of matrices resulting from finite difference and finite element analysis of non-linear PDEs and on matrices from the Harwell-Boeing collection. The preconditioners are also tested with five different Krylov subspace methods.

The paper is organized as follows. In Section 2 we give a quick overview of sparse and band preconditioners. In Section 3 we summarize some basics of wavelets and present the cDWT and iDWT based on the pyramid algorithm. In Section 4 we introduce our new concept of the Incomplete Discrete Wavelet Transform with permutation (iDWTPer) and describe the properties of the banded wavelet preconditioner using iDWTPer. Implementation details and the results of numerical experiments are discussed in Section 5. Finally, we draw some conclusions in Section 6.
2. Sparse and band preconditioner

A simple preconditioner for a sparse matrix is the banded matrix constructed by setting to zero all the entries of the matrix outside a chosen diagonal band. The wider the band, the more accurately the preconditioner approximates the original matrix. However, matrices with large bandwidth (l) are expensive to store and demand an enormous amount of computation at each iterative step involving such a matrix as a preconditioner. So it is important to choose l in such a way that it balances these two conflicting considerations. To construct a preconditioner for the linear system (1.1) we consider the following splitting of A:

A = D + C,    (2.1)

where D is a band matrix with wrap-around boundaries. For the case l = 3, the matrix D is simply
\begin{bmatrix}
A_{1,1} & A_{1,2} &         &         &           & A_{1,n}   \\
A_{2,1} & A_{2,2} & A_{2,3} &         &           &           \\
        & A_{3,2} & \ddots  & \ddots  &           &           \\
        &         & \ddots  & \ddots  & \ddots    &           \\
        &         &         & \ddots  & \ddots    & A_{n-1,n} \\
A_{n,1} &         &         &         & A_{n,n-1} & A_{n,n}
\end{bmatrix}    (2.2)
The associated preconditioned system

(I + D^{-1} C) x = D^{-1} b    (2.3)

has the preconditioner M = D. Clearly, with the above choice of l = 3, M^{-1} will not approximate A^{-1} well in all cases. It is our experience that merely increasing l alone is not sufficient to improve this preconditioner, as the improvements are only marginal. Here we shall discuss a way, based on cDWT and iDWT, to derive a new and improved preconditioner.
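To make the band splitting concrete, here is a minimal Python/SciPy sketch (not the authors' MATLAB code): it keeps only a diagonal band of a sparse test matrix as D and applies D^{-1} through a sparse LU factorization as the preconditioner inside restarted GMRES. The test matrix, bandwidth and tolerance are illustrative choices, and the rtol keyword assumes SciPy >= 1.12.

```python
# Minimal sketch of the band preconditioner of Section 2 (illustrative only).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def band_part(A, l):
    """Keep entries A[i, j] with |i - j| <= (l - 1) // 2 (no wrap-around)."""
    A = A.tocoo()
    half = (l - 1) // 2
    keep = np.abs(A.row - A.col) <= half
    return sp.csc_matrix((A.data[keep], (A.row[keep], A.col[keep])), shape=A.shape)

n = 1024
# Unsymmetric, diagonally dominant stand-in for a matrix such as sherman1.
A = sp.diags([2.5 * np.ones(n), -1.0 * np.ones(n - 1), -0.6 * np.ones(n - 1)],
             [0, -1, 1], format="csr") + 0.01 * sp.random(n, n, density=0.002, random_state=1)
b = A @ np.ones(n)

D = band_part(A, l=3)                                        # splitting A = D + C
M = spla.LinearOperator((n, n), matvec=spla.splu(D).solve)   # M^{-1} v = D^{-1} v

x, info = spla.gmres(A, b, rtol=1e-6, restart=25, M=M)       # SciPy >= 1.12 for rtol
print(info, np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```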
3. Wavelet preliminaries

Multiresolution analysis (MRA) is the theory that was used by Ingrid Daubechies to show that for any non-negative integer n there exists an orthogonal wavelet with compact support such that all the derivatives up to order n exist. Here we use compactly supported wavelets like Haar, D4 and D6 (DN stands for the Daubechies wavelet of order N). MRA describes a sequence of nested approximation spaces V_j in L^2(\mathbb{R}) such that the closure of their union equals L^2(\mathbb{R}), giving

\{0\} \subset \cdots \subset V_{-1} \subset V_0 \subset V_1 \subset \cdots \subset L^2(\mathbb{R}).    (3.1)

The orthogonality of scaling functions and wavelets, together with the dyadic coupling between MRA spaces, leads to a relation between scaling function coefficients and wavelet coefficients on different scales. This yields a fast and accurate algorithm due to Mallat [8], known as the pyramid algorithm. A straightforward implementation of the pyramid algorithm leads to difficulties in handling boundary points, because the procedure requires several data values which are defined outside the boundaries and these are not known. Based on this we will define two notions, namely cDWT and iDWT. Both cDWT and iDWT are defined by filter coefficients a_0, a_1, \ldots, a_{D-1}, where D is the order of the wavelet transform. The filter coefficients b_0, b_1, \ldots, b_{D-1} are derived from the a_i by the relation

b_i = (-1)^i a_{D-1-i}.    (3.2)
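To make relation (3.2) concrete, the short sketch below builds the high-pass filter b from the D4 low-pass filter a and checks that a is normalized and that the zeroth and first moments of b vanish. The explicit D4 coefficient values (normalized so that their squares sum to one) are standard but are not listed in the paper, so they are stated here as an assumption.

```python
# Sketch: quadrature-mirror relation b_i = (-1)^i a_{D-1-i} for the Daubechies D4 filter.
# The numerical values of a are the standard D4 coefficients (not listed in the paper).
import numpy as np

s3 = np.sqrt(3.0)
a = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4.0 * np.sqrt(2.0))
D = len(a)
b = np.array([(-1) ** i * a[D - 1 - i] for i in range(D)])    # relation (3.2)

print("sum a_k^2 =", np.sum(a**2))                  # 1.0  (orthonormality)
print("sum b_k   =", np.sum(b))                     # 0.0  (zeroth moment vanishes)
print("sum k*b_k =", np.sum(np.arange(D) * b))      # 0.0  (first moment vanishes for D4)
```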
For f \in L^2(\mathbb{R}), f_j(x) denotes the projection of f onto the space V_j; in terms of the scaling functions \phi_{j,k} it is defined by

f_j(x) = P_{V_j} f(x) = \sum_k s_k^j \phi_{j,k}.    (3.3)

The projection also has a formulation in terms of the scaling function and the wavelet function \psi_{j,k}, where j is the resolution level and k the translation:

P_{V_L} f(x) = \sum_k s_k^r \phi_{r,k}(x) + \sum_{j=r}^{L-1} \sum_k d_k^j \psi_{j,k}(x),    (3.4)

where r is the lowest and L the highest resolution level. Thus we obtain the relations

s_l^{j-1} = \sum_{k=0}^{D-1} a_k s_{2l+k}^j, \qquad d_l^{j-1} = \sum_{k=0}^{D-1} b_k s_{2l+k}^j.    (3.5)
3.1. Complete Discrete Wavelet Transform (cDWT)

If the function f is periodic, we also have periodicity in the scaling and wavelet coefficients:

s_k^j = s_{k+2^j p}^j, \qquad d_k^j = d_{k+2^j p}^j, \qquad p \in \mathbb{Z}.    (3.6)

Hence it is enough to consider 2^j coefficients of either type at level j. Thus the pyramid algorithm for the periodic wavelet transform is defined by

s_l^{j-1} = \sum_{k=0}^{D-1} a_k s_{\langle 2l+k \rangle_{2^j}}^j, \qquad d_l^{j-1} = \sum_{k=0}^{D-1} b_k s_{\langle 2l+k \rangle_{2^j}}^j, \qquad l = 0, 1, \ldots, 2^{j-1} - 1.    (3.7)
For a given vector s from the vector space \mathbb{R}^n one may construct an infinite periodic sequence of period n and use it as the coefficients of a scaling function f_L(x) in some fixed subspace V_L of L^2 (L an integer). Hereafter we refer to the periodic wavelet transform as cDWT. The transform s \mapsto w, w = W s, is then implemented by the pyramid algorithm (3.7). Denote by s = s^{(L)} a column vector of A at wavelet level L. The pyramid algorithm transforms the vector s^{(L)} to w defined as

w = \left[ (s^{(r)})^T \; (f^{(r)})^T \; (f^{(r+1)})^T \cdots (f^{(L-1)})^T \right]^T    (3.8)
in a level by level manner, as described below:

s^{(L)} \to s^{(L-1)} \to s^{(L-2)} \to \cdots \to s^{(r+1)} \to s^{(r)}
   \searrow f^{(L-1)}   \searrow f^{(L-2)}   \cdots   \searrow f^{(r)}

where s^{(j)} and f^{(j)} are of length 2^j. Suppose \psi denotes the wavelet function. Then we expect the new vector w to be nearly sparse, because the usual moment conditions

\int_{-\infty}^{\infty} x^p \psi(x)\, dx = 0 \quad \text{for } p = 0, 1, \ldots, D/2 - 1    (3.9)
are equivalent to the vector moment conditions

\sum_{k=0}^{D-1} k^p b_k = 0 \quad \text{for } p = 0, 1, \ldots, D/2 - 1.    (3.10)
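As a small numerical illustration of this compression property (a sketch, not taken from the paper), one level of the periodic pyramid step (3.7) with the D4 filters applied to samples of a smooth periodic function leaves detail coefficients that are orders of magnitude smaller than the scaling coefficients:

```python
# Sketch: one level of the periodic (cDWT) pyramid step (3.7) with D4 filters,
# showing that the detail coefficients of a smooth periodic vector are nearly zero.
import numpy as np

s3 = np.sqrt(3.0)
a = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4.0 * np.sqrt(2.0))
b = np.array([(-1) ** i * a[len(a) - 1 - i] for i in range(len(a))])

def cdwt_step(s, a, b):
    """One level of (3.7): periodic (wrap-around) indexing of s_{<2l+k>}."""
    n, D = len(s), len(a)
    idx = (2 * np.arange(n // 2)[:, None] + np.arange(D)[None, :]) % n
    return s[idx] @ a, s[idx] @ b          # scaling and detail coefficients

x = np.linspace(0.0, 1.0, 256, endpoint=False)
s = np.sin(2 * np.pi * x) + 0.5 * np.cos(4 * np.pi * x)
s_low, d = cdwt_step(s, a, b)
print("max |s^(j-1)| =", np.abs(s_low).max(), " max |d^(j-1)| =", np.abs(d).max())
```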
Here, the larger D is, the better the compression in w, but the support of the wavelet is larger as well. Since the periodized wavelet transform satisfies the orthonormality relation W W^T = I, we call it the complete DWT (cDWT).

3.2. Incomplete Discrete Wavelet Transform (iDWT)

Unlike the cDWT, our iDWT does not require a periodic extension of the data, since the function f need not be periodic for the iDWT to be applied. The iDWT simply assumes that all values outside the boundaries are equal to zero, as shown in Fig. 1. Further, it is much simpler to implement than its complete counterpart (cDWT).
Fig. 1. The pyramid algorithm and boundary points.
The pyramid algorithm for the iDWT is

s_l^{j-1} = \sum_{k=0}^{D-1} a_k s_{2l+k}^j, \qquad d_l^{j-1} = \sum_{k=0}^{D-1} b_k s_{2l+k}^j,    (3.11)
where the points outside the boundary are taken to be zero. The transform s \mapsto w, w = W_1 s, is implemented by (3.11) and satisfies the orthonormality relation only approximately, W_1 W_1^T \approx I. We can show the power of this iDWT on a Calderon-Zygmund matrix. Consider the Calderon-Zygmund type matrix A = (a_{ij}), where

a_{ij} = \begin{cases} 3/2 & \text{if } i = j, \\ \dfrac{1}{|i-j|} & \text{otherwise.} \end{cases}    (3.12)

As shown in Fig. 2, it is smooth, with entries decreasing in magnitude away from the diagonal. When a cDWT is applied to A, the resultant matrix \tilde{A} = W A W^T after thresholding has a weak finger pattern, with most of the largest entries confined to a narrow diagonal band. The matrix \tilde{A}_1 = W_1 A W_1^T obtained by applying the iDWT to A is also shown in Fig. 2, and it is very similar to \tilde{A}.
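The following sketch (an illustration written for this discussion, not the authors' code) builds the Calderon-Zygmund matrix (3.12), applies to both rows and columns a one-level D4 transform that simply drops filter taps falling outside the boundary, in the spirit of (3.11), and reports how many entries of the result exceed a small threshold as well as how far W_1 W_1^T is from the identity. The threshold is an arbitrary illustrative choice.

```python
# Sketch: one-level D4 transform with zero extension (iDWT flavour, eq. (3.11))
# applied to the Calderon-Zygmund matrix (3.12). Illustration only; the paper's
# experiments use several levels and the permuted form of Section 4.
import numpy as np

n = 128
i, j = np.indices((n, n))
A = np.where(i == j, 1.5, 1.0 / np.maximum(np.abs(i - j), 1))      # matrix (3.12)

s3 = np.sqrt(3.0)
a = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4.0 * np.sqrt(2.0))
b = np.array([(-1) ** k * a[len(a) - 1 - k] for k in range(len(a))])

W1 = np.zeros((n, n))                 # one-level transform; taps outside [0, n) dropped
for l in range(n // 2):
    for k in range(len(a)):
        col = 2 * l + k
        if col < n:
            W1[l, col] = a[k]                 # low-pass (scaling) rows
            W1[n // 2 + l, col] = b[k]        # high-pass (detail) rows

A1 = W1 @ A @ W1.T
tol_A, tol_A1 = 1e-2 * np.abs(A).max(), 1e-2 * np.abs(A1).max()    # illustrative thresholds
print("A : entries above threshold =", np.count_nonzero(np.abs(A) > tol_A), "of", n * n)
print("A1: entries above threshold =", np.count_nonzero(np.abs(A1) > tol_A1), "of", n * n)
print("||W1 W1^T - I||_max =", np.abs(W1 @ W1.T - np.eye(n)).max())  # approximately orthogonal
```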
4. DWT-based preconditioners

4.1. The wavelet transform with permutation (DWTPer)

A discrete wavelet transform with permutation based on the cDWT is defined in [4]. It is equivalent to a wavelet transform followed by permutations of rows and columns. We will refer to this wavelet transform with permutation as DWTPer (or cDWTPer). This transform has the effect of preserving the general structure (by which we mean the 'shape' of the areas of singularity) of a matrix after a wavelet transform has been applied. The precise effect of applying DWTPer to a diagonal matrix is established in [4].

4.1.1. Incomplete Discrete Wavelet Transform with permutation (iDWTPer)

Here we propose the new notion of an Incomplete Discrete Wavelet Transform with permutation, iDWTPer. Denote by s = s^{(L)} a column vector of A at the wavelet level L. Then the pyramid algorithm based on the iDWT transforms the vector s^{(L)} to

w = \left[ (s^{(r)})^T \; (f^{(r)})^T \; (f^{(r+1)})^T \cdots (f^{(L-1)})^T \right]^T.    (4.1)
Fig. 2. Mesh plots of the matrix. Top center: original matrix; bottom left: its cDWTPer; bottom right: its iDWTPer.
Assume n = 2^L and let r be the integer such that 2^r < D and 2^{r+1} \geq D; thus r = 0 for D = 2 (Haar wavelets) and r = 1 for D = 4 (Daubechies order 4 wavelets). In matrix form w is expressed as

w = \widetilde{P}_{r+1} \widetilde{W}_{1,r+1} \cdots \widetilde{P}_{L-1} \widetilde{W}_{1,L-1} \widetilde{P}_L \widetilde{W}_{1,L} \, s^L \equiv W_1 s^L,    (4.2)
where

\widetilde{P}_m = \begin{pmatrix} P_m & \\ & I_m \end{pmatrix}_{n \times n},    (4.3)

with P_m a permutation matrix of size 2^m = 2^L - k_m, that is, P_m = I(1, 3, \ldots, 2^m - 1, 2, 4, \ldots, 2^m), and where \widetilde{W}_{1,m} = \begin{pmatrix} W_{1,m} & \\ & I_m \end{pmatrix}_{n \times n}, with W_{1,m} an orthogonal (sparse) matrix of size 2^m = 2^L - k_m and I_m an identity matrix of size k_m. The one-level transformation matrix W_{1,m} is a compact block matrix. For example, for the Daubechies wavelets of order D = 4 (two vanishing moments), it is

W_{1,m} = \begin{pmatrix}
a_0 & a_1 & a_2 & a_3 &        &        &     &     \\
b_0 & b_1 & b_2 & b_3 &        &        &     &     \\
    &     & a_0 & a_1 & a_2    & a_3    &     &     \\
    &     & b_0 & b_1 & b_2    & b_3    &     &     \\
    &     &     &     & \ddots & \ddots &     &     \\
a_2 & a_3 &     &     &        &        & a_0 & a_1 \\
b_2 & b_3 &     &     &        &        & b_0 & b_1
\end{pmatrix}.
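To see what the permutation P_m does (an interpretation of the definition above, sketched here rather than taken from the paper), its index form gathers the interleaved output of a one-level step into the stacked form [s | d], while its transpose restores the interleaved ordering that DWTPer/iDWTPer keep in order to preserve a band shape:

```python
# Sketch: the permutation P_m = I(1, 3, ..., 2^m - 1, 2, 4, ..., 2^m) in 0-based
# index form and its effect on a one-level transform output.
import numpy as np

m = 3
size = 2 ** m
perm = np.concatenate([np.arange(0, size, 2), np.arange(1, size, 2)])  # odd positions first (1-based)

# A one-level W_{1,m} applied to a vector produces interleaved coefficients
# [s_0, d_0, s_1, d_1, ...]; gathering with perm stacks them as [s | d].
interleaved = np.array(["s0", "d0", "s1", "d1", "s2", "d2", "s3", "d3"])
print(interleaved[perm])            # ['s0' 's1' 's2' 's3' 'd0' 'd1' 'd2' 'd3']

# The inverse (transposed) permutation restores the interleaved, structure-preserving order.
inv = np.argsort(perm)
print(interleaved[perm][inv])       # back to ['s0' 'd0' 's1' 'd1' ...]
```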
We now define a new one-level iDWTPer matrix (similar to W_{1,m}):

\widehat{W}_{1,m} = \begin{pmatrix}
a_0  & \phi & a_1  & \phi & a_2  & \phi & \cdots & a_{D-1} &      &      &      &      \\
\phi & I    & \phi & \phi & \phi & \phi & \cdots & \phi    &      &      &      &      \\
b_0  & \phi & b_1  & \phi & b_2  & \phi & \cdots & b_{D-1} &      &      &      &      \\
\phi & \phi & \phi & I    & \phi & \phi & \cdots & \phi    &      &      &      &      \\
     &      &      &      & \ddots &    &        &         & \ddots &    &      &      \\
     &      &      &      &        &    &        & a_0     & \phi & a_1  & \phi &      \\
     &      &      &      &        &    &        & \phi    & I    & \phi & \phi &      \\
     &      &      &      &        &    &        & b_0     & \phi & b_1  & \phi &      \\
     &      &      &      &        &    &        & \phi    & \phi & \phi & I    &
\end{pmatrix}_{n \times n}    (4.4)

Here I is an identity matrix of size 2^{L-m} - 1 and the \phi's are zero blocks. For m = L, both I and \phi are of size 0, i.e. \widehat{W}_{1,L} = \widetilde{W}_{1,L} = W_{1,L}. Further, this iDWTPer for a vector s^L \in \mathbb{R}^n can be defined by
\widehat{w} = \widehat{W}_1 s^{(L)} \quad \text{with} \quad \widehat{W}_1 = \widehat{W}_{1,r+1} \widehat{W}_{1,r+2} \cdots \widehat{W}_{1,L},    (4.5)

based on (L + 1 - r) levels. For a matrix A_{n \times n}, the iDWTPer would give

\widehat{A}_1 = \widehat{W}_1 A \widehat{W}_1^T.

Now, to relate \widehat{A}_1 to \widetilde{A}_1 from a standard iDWT, we define a permutation matrix

P = \widetilde{P}_L^T \widetilde{P}_{L-1}^T \cdots \widetilde{P}_{r+2}^T \widetilde{P}_{r+1}^T,    (4.6)

where the matrices \widetilde{P}_k come from (4.3). Firstly, by induction, we can prove the following:

\widehat{W}_{1,k} = \Big( \prod_{l=1}^{L-k} \widetilde{P}_{k+l} \Big)^T \widetilde{W}_{1,k} \Big( \prod_{l=1}^{L-k} \widetilde{P}_{k+l} \Big) \quad \text{for } k = r+1, r+2, \ldots, L,

that is,

\widehat{W}_{1,L} = \widetilde{W}_{1,L},
\widehat{W}_{1,L-1} = \widetilde{P}_L^T \widetilde{W}_{1,L-1} \widetilde{P}_L,
\;\;\vdots
\widehat{W}_{1,r+1} = \widetilde{P}_L^T \widetilde{P}_{L-1}^T \cdots \widetilde{P}_{r+2}^T \widetilde{W}_{1,r+1} \widetilde{P}_{r+2} \cdots \widetilde{P}_{L-1} \widetilde{P}_L.

Now we can verify that

P W_1 = (\widetilde{P}_L^T \widetilde{P}_{L-1}^T \cdots \widetilde{P}_{r+1}^T)(\widetilde{P}_{r+1} \widetilde{W}_{1,r+1} \cdots \widetilde{P}_L \widetilde{W}_{1,L})
      = \widehat{W}_{1,r+1} (\widetilde{P}_L^T \widetilde{P}_{L-1}^T \cdots \widetilde{P}_{r+2}^T)(\widetilde{P}_{r+2} \widetilde{W}_{1,r+2} \cdots \widetilde{P}_L \widetilde{W}_{1,L})
      \;\;\vdots
      = \widehat{W}_{1,r+1} \widehat{W}_{1,r+2} \cdots \widehat{W}_{1,L-2} \, \widetilde{P}_L^T \widetilde{W}_{1,L-1} \widetilde{P}_L \widetilde{W}_{1,L}
      = \widehat{W}_{1,r+1} \widehat{W}_{1,r+2} \cdots \widehat{W}_{1,L}
      = \widehat{W}_1.

Therefore \widehat{W}_1 = P W_1 and \widehat{A}_1 = P \widetilde{A}_1 P^T. From this relation we can conclude that the iDWTPer can be implemented in a level by level manner, either directly using the \widehat{W}_{1,m} (via \widehat{W}_1) or indirectly using the \widetilde{P}_m (via P) after an iDWT, and we obtain the same result.

The cost in terms of flops of performing a cDWT may be calculated in a straightforward manner as in [7]. Since DWTPer simply involves a permutation of the DWT, the flop count for DWTPer is identical. The cost of applying
a block size N, order D, level l iDWTPer to an n \times n matrix is less than 8DnN(1 - 2^{1-l}).

4.2. Banded wavelet preconditioner using cDWTPer and iDWTPer

As far as preconditioning is concerned, to solve (1.1) we propose the following algorithm:

1. Apply an iDWTPer to Ax = b to obtain \widehat{A}_1 \widehat{x} = \widehat{b}.
2. Select a suitable band form M of \widehat{A}_1.
3. Use M as a preconditioner to solve \widehat{A}_1 \widehat{x} = \widehat{b}.

For cDWTPer, replace \widehat{A}_1 by \widehat{A} in the above algorithm. As described in Section 2, the band size of M determines the cost of a preconditioning step. Therefore, we shall explore the possibility of constructing an effective preconditioner based on a suitable band. Let LEV denote the actual number of wavelet levels used (1 \leq LEV \leq (L + 1 - r)). With the help of this DWTPer we can transform a band matrix A into another band matrix \widehat{A}. We show in Fig. 3 the original matrix and the cDWTPer and iDWTPer of a matrix taken from the Saylor group (petroleum reservoir simulation matrices) with D = 4, LEV = 3 and n = 216. The bandwidth of the new matrices under cDWTPer and iDWTPer is increased and satisfies the overestimate given by Theorem 4.1.

Here we introduce the definition of a band matrix. Band(A, a, b, k): a block band matrix A_{n \times n} with blocks of size k \times k is called Band(A, a, b, k) if its lower block bandwidth is a and its upper block bandwidth is b. When k = 1, Band(A, a, b, 1) = Band(A, a, b). The strategy we take to start the preconditioning step is to partition A = D + C, where D is a Band(D, a, a) for some integer a. First apply DWTPer with LEV \leq (L + 1 - r) wavelet levels, to give

\widehat{A} \widehat{x} = (\widehat{D} + \widehat{C}) \widehat{x} = \widehat{b}.

Now \widehat{D} is also a band matrix, and it is at most a Band(\widehat{D}, k, k) matrix with k as predicted by the following theorem proved in [4].

Theorem 4.1. Let A be a band(a, b) matrix. Then the new DWT of l levels, based on Daubechies' wavelets of order D, transforms A into \widehat{A}, which is at most a band(k_1, k_2) matrix with

k_1 - a = k_2 - b = D(2^{l-1} - 1).    (4.7)

For example, with D = 4 and l = 3 levels, the bandwidth can grow by at most 4(2^2 - 1) = 12 on each side. Let B denote the Band(\widehat{A}, k, k) part of the matrix \widehat{A}. Then select a preconditioner of bandwidth l such that l \leq k and M = Band(B, l, l).
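As a rough end-to-end illustration of steps 1-3 (a sketch under simplifying assumptions, not the authors' MATLAB implementation), the code below applies a multi-level Haar transform that keeps coefficients interleaved in place, extracts a band of the transformed matrix using the overestimate of Theorem 4.1, and uses that band (through a sparse LU factorization) as the preconditioner for GMRES on the transformed system. Matrix size, level count and bandwidths are placeholder choices, and rtol assumes SciPy >= 1.12.

```python
# Rough sketch of the banded wavelet preconditioner (steps 1-3 above) with a Haar
# transform kept in interleaved (DWTPer-style) ordering. Dense matrices and a
# synthetic banded test problem are used purely for illustration.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def haar_dwtper(n, levels):
    """Multi-level Haar transform leaving coefficients interleaved in place.
    For the Haar wavelet, cDWT and iDWT coincide when 2**levels divides n."""
    W, r = np.eye(n), 1.0 / np.sqrt(2.0)
    for j in range(1, levels + 1):
        step, block, half = np.eye(n), 2 ** j, 2 ** (j - 1)
        for p in range(0, n - block + 1, block):
            step[p, p], step[p, p + half] = r, r                 # new scaling coeff. stays at p
            step[p + half, p], step[p + half, p + half] = r, -r  # detail coeff. goes to p + half
        W = step @ W
    return W

def band_of(B, k):
    """Keep entries with |i - j| <= k (step 2: band form M of the transformed matrix)."""
    i, j = np.indices(B.shape)
    return np.where(np.abs(i - j) <= k, B, 0.0)

n, levels, a = 256, 3, 2                       # size, wavelet levels, band of A
rng = np.random.default_rng(1)
A = np.eye(n) * 4.0
for off in range(1, a + 1):                    # unsymmetric banded test matrix
    A += np.diag(rng.uniform(-1, 0, n - off), off) + np.diag(rng.uniform(0, 1, n - off), -off)
b = A @ np.ones(n)

W = haar_dwtper(n, levels)                     # step 1: transform the system
A_hat, b_hat = W @ A @ W.T, W @ b
k = a + 2 * (2 ** (levels - 1) - 1)            # Theorem 4.1 overestimate for D = 2
M_band = band_of(A_hat, k)                     # step 2 (a narrower band is cheaper to apply)
M = spla.LinearOperator((n, n), matvec=spla.splu(sp.csc_matrix(M_band)).solve)

x_hat, info = spla.gmres(A_hat, b_hat, rtol=1e-6, restart=25, M=M)   # step 3
print(info, np.linalg.norm(np.ones(n) - W.T @ x_hat))                # recover x = W^T x_hat
```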
Fig. 3. Level 3 Daubechies 4 transforms of the matrix pores1. Top center: original matrix; bottom left: cDWTPer; bottom right: iDWTPer.
5. Numerical experiments

In this section we present the results of numerical experiments on a few non-linear problems and on a range of matrices from the Harwell-Boeing collection [9] and the Tim Davis collection. The right hand side of each of the linear systems from the Harwell-Boeing collection was computed from the solution vector x of all ones, the choice used, e.g., in [10]. All numerical experiments were carried out in double precision using a MATLAB implementation. The initial guess was always x_0 = 0 and the stopping criterion is

\frac{\| b - A x \|_2}{\| b \|_2} \leq 10^{-6}.    (5.1)
In our experiments we were primarily concerned with testing the effectiveness of the preconditioners, so we did not attempt to optimize the choice of tolerance. For some small problems like IBM32 and pores1 we have tested the preconditioners with a tolerance of 10^{-8}; by default we have taken 10^{-6}. We demonstrate the effectiveness of the preconditioner for a few standard KSMs, namely CGM, BICG, Bi-CGSTAB, CGS and GMRES(k). To begin with we present the results for all these solvers and subsequently focus on the popular solvers GMRES and Bi-CGSTAB. Initially matrices such as sherman1, saylr3, pores1 and IBM32 from the Harwell-Boeing collection are used. The matrices encountered in finite difference analysis of the unsteady Burgers equation are also considered. The numbers of iterations for both unpreconditioned and preconditioned iterative methods are given in the respective tables; a '-' entry denotes slow convergence or no convergence. In our numerical experiments we have tested four different preconditioners, namely cDWTPer-haar (complete DWT with permutation based on the Haar wavelet), cDWTPer-D4 (based on the compactly supported wavelet of order four), cDWTPer-D6 (based on the compactly supported wavelet of order six) and iDWTPer-haar (incomplete DWT with permutation based on the Haar wavelet). The convergence results without preconditioning are listed under M = I. We also considered the use of a standard DWT-based preconditioner formed by setting to zero all entries whose magnitude is less than a chosen threshold value (as done in [11]), and found that such a preconditioner becomes singular for non-linear problems. All the numerical experiments are carried out using MATLAB installed on a Sun E250 workstation with a SunW UltraSPARC II 400 MHz dual processor, using double precision arithmetic under the Solaris 8 operating system.

To begin with, we tested four Krylov subspace solvers on the matrices given below.

Sherman1: This matrix arises in oil reservoir simulation on a 10 x 10 x 10 grid, using a seven-point finite-difference approximation with NC equations and unknowns per gridblock. Here the size is n = 1000, nz = 23094 (where nz is the number of non-zero entries) and NC = 1. The outcome of the numerical experiments is provided in Table 1. From Table 1 one can notice that our wavelet transform based preconditioners are effective with all four iterative solvers for unsymmetric matrices; in most of the cases the iDWTPer-haar based preconditioner is found to be effective in accelerating the convergence rate of the iterative solver. In Fig. 4 the logarithm of the relative residual norm is plotted against the number of iterations to depict the convergence history of the preconditioned KSMs.

Saylr3: This Saylor petroleum engineering reservoir simulation matrix arises in 3D reservoir simulation. Here n = 1000 and nz = 375. On this matrix we tested our preconditioner with different bandwidths (l). We found that the choice l = 5 leads to a singular preconditioner. Results corresponding to l = 10 are
Table 1
Convergence results for sherman1: unpreconditioned (M = I) and preconditioned

                  Bi-CGSTAB    GMRES(25)    BICG    CGS
M = I                255         >1000       365     334
DWTPer-haar          105           159       105     102
DWTPer-D4            164           379       128     164
DWTPer-D6            171           748       124     182
iDWTPer-haar          99           133        78      87
Fig. 4. Convergence behavior of the sherman1 problem. Top left: with Bi-CGSTAB; top right: GMRES(25); bottom left: BICG; bottom right: CGS.
presented in Table 2. The level 3 Daubechies 4 transform of the matrix saylr3 and the convergence history plots are provided in Fig. 5.
Table 2
Convergence results for saylr3: unpreconditioned (M = I) and preconditioned

                  Bi-CGSTAB    GMRES(25)    BICG    CGS
M = I                244          1000       366     332
DWTPer-haar          110           160        83      94
DWTPer-D4            171           385       120     155
DWTPer-D6            168           747       115     180
iDWTPer-haar          84           132        77      90
Fig. 5. Level 3 Daubechies 4 transforms of the matrix saylr3. Top left: original matrix; top right: iDWTPer. Convergence behaviour of the saylr3 problem. Bottom left: with Bi-CGSTAB; bottom right: with GMRES(25).
Pores1: This matrix is extracted from the PORES package for reservoir simulation. Its size is n = 30 and nz = 180. On this matrix we show results for two
Fig. 6. Convergence behavior of the pores1 problem. Top left: with Bi-CGSTAB; top right: GMRES(25); bottom left: BICG; bottom right: CGS.
different bandwidths, l = 3 and l = 5. The convergence history is provided in Fig. 6. Convergence results for the unpreconditioned case (top) and for l = 5 (middle) and l = 3 (bottom) are presented in Table 3. In this case DWTPer-D6 does not work effectively.

IBM32: Size n = 32 and nz = 126. For this matrix we show results in Table 4 for two different bandwidths, l = 5 and l = 10: unpreconditioned (top), l = 10 (middle) and l = 5 (bottom). Fig. 7 carries the convergence details.

Regularized Burgers equation: In this case A comes from the finite difference analysis of the regularized Burgers equation, defined with \nu > 0 by

\partial u(x,t)/\partial t - u(x,t)\, \partial u(x,t)/\partial x = \nu\, \partial^2 u(x,t)/\partial x^2 \quad \text{for } t > 0 \text{ and } 0 < x < 1,
u(x,0) = u_0(x).    (5.2)
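For illustration only (the paper does not give its discretization), the sketch below assembles the unsymmetric sparse matrix of one semi-implicit finite-difference time step for an equation of the form u_t - u u_x = nu u_xx, with the advection coefficient lagged at the previous time level; grid size, time step and viscosity are placeholder values. At every such step a system with this matrix is handed to the preconditioned Krylov solver.

```python
# Sketch: unsymmetric sparse matrix from one semi-implicit (lagged-coefficient)
# backward-Euler step for u_t - u*u_x = nu*u_xx with u = 0 at x = 0 and x = 1.
# The discretization is an assumption made for illustration, not the paper's scheme.
import numpy as np
import scipy.sparse as sp

n, nu, dt = 1024, 1.0e-2, 1.0e-3
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)                  # interior grid points
u_old = np.sin(np.pi * x)                       # placeholder previous time level

e = np.ones(n)
D1 = sp.diags([-e[:-1], e[:-1]], [-1, 1]) / (2.0 * h)          # central first derivative
D2 = sp.diags([e[:-1], -2.0 * e, e[:-1]], [-1, 0, 1]) / h**2   # second derivative

# (u_new - u_old)/dt - u_old * D1 u_new = nu * D2 u_new   =>   A u_new = u_old
A = (sp.eye(n) - dt * sp.diags(u_old) @ D1 - dt * nu * D2).tocsr()
rhs = u_old.copy()
print(A.shape, A.nnz, "unsymmetric:", abs(A - A.T).max() > 0)
```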
Table 3
Convergence results for pores1: unpreconditioned (M = I), l = 5 (middle block) and l = 3 (bottom block); a '-' entry denotes slow or no convergence

                       Bi-CGSTAB    GMRES(25)    BICG    CGS
M = I                     221          167         77     150
l = 5:
  DWTPer-haar              96           25         51      96
  DWTPer-D4                61           23         41      75
  DWTPer-D6               143            -         89     136
  iDWTPer-haar             22           20         22      22
l = 3:
  DWTPer-haar             198           25         51       -
  DWTPer-D4               198           24        113     360
  DWTPer-D6              1162            -        113       -
  iDWTPer-haar             22           20         22      20
Fig. 7. Level 3 Daubechies 4 transforms of the matrix IBM32. Top left: original matrix; top right: cDWTPer; bottom left: iDWTPer. Convergence behaviour of the IBM32 problem, bottom right: with Bi-CGSTAB.
Table 4
Convergence results for IBM32: unpreconditioned (M = I), l = 10 (middle block) and l = 5 (bottom block); a '-' entry denotes slow or no convergence

                       Bi-CGSTAB    GMRES(25)    BICG    CGS
M = I                      88            -         34      47
l = 10:
  DWTPer-haar              40           74         33      36
  DWTPer-D4                37           70         34      31
  DWTPer-D6                25           57         34      34
  iDWTPer-haar             33           24         26      30
l = 5:
  DWTPer-haar              53          147         38      47
  DWTPer-D4                62          134         38      46
  DWTPer-D6                57          119         34      37
  iDWTPer-haar             52           90         38      44
Fig. 8 shows the solution at time t = 0.18 (without and with preconditioner), where the time step is \Delta t = 10^{-3} and the size is n = 1024. Related problems arise in many branches of science and engineering, particularly fluid mechanics and petroleum reservoir simulation. Finite difference analysis of this transient non-linear PDE model calls the KSMs repeatedly until the steady state solution is attained. Our preconditioners are found to be effective for all the KSM calls. The convergence details corresponding to the matrix obtained in the last stage of the simulation are provided in Table 5. Here we show our results only for the DWTPer-haar, DWTPer-D4 and iDWTPer-haar preconditioners.

We now continue to demonstrate the effectiveness of the preconditioners for two standard and more popular solvers, Bi-CGSTAB and GMRES(k), on matrices from the Tim Davis and Harwell-Boeing collections. The effect of the preconditioners on a non-linear problem modeling natural convection in a porous enclosure is also studied. We recall that Bi-CGSTAB requires two matrix-vector multiplications per iteration, whereas GMRES(k) requires only one matrix-vector multiplication per iteration. The symbol 'o' denotes convergence with GMRES(k), k = 1. For Bi-CGSTAB, matrices like sherman4, pores2, saylr1 and sherman1 from the Harwell-Boeing collection are used. With GMRES(k), matrices like sherman4, sherman1 and saylr1 are used. In addition, matrices related to the simulation of flow in a lid driven cavity (DRICAV) and to a finite element computation of the Navier-Stokes equations from FIDAP are also considered; these two matrices are taken from the Tim Davis collection. Results corresponding to these tests are provided in Tables 6 and 7. It is reported in [3] that it is difficult to obtain convergence for the matrix pores2. For pores2 our method works for bandwidth 20 with Bi-CGSTAB; with GMRES(k), however, we do not obtain convergence. For sherman2, GMRES(25) reduced the relative residual below 10^{-5} after four and seven steps, but never reached 10^{-8}. This may be due to the very large condition number of sherman2, so we have tried a tolerance of 10^{-5}.

Convection in a porous enclosure: To conclude this series of numerical experiments, we considered the problem of convection in a porous enclosure. The free
Fig. 8. Solution of the Burgers equation at time t = 0.18. Top: without preconditioner; bottom: with preconditioner.
convection of heat from a hot vertical wall in a fluid-saturated porous enclosure with insulated top and bottom walls is governed by

\nabla^2 \psi = \frac{\partial T}{\partial y},
\frac{\partial \psi}{\partial y} \frac{\partial T}{\partial x} - \frac{\partial \psi}{\partial x} \frac{\partial T}{\partial y} = \frac{1}{Ra} \nabla^2 T.    (5.3)
Table 5
Convergence results for the Burgers equation: unpreconditioned (M = I) and preconditioned

                  Bi-CGSTAB    GMRES(25)    BICG    CGS
M = I                 29           78         70      61
DWTPer-haar           17           68         25      37
DWTPer-D4              8           14         15      25
iDWTPer-haar          17           66         46      46
Table 6
Convergence results with Bi-CGSTAB: unpreconditioned (M = I) and preconditioned

Matrix        M = I    DWTPer-D4    DWTPer-haar    iDWTPer-haar
Sherman4        84         51            50              49
pores2        1903       1574          1400            2334
saylr1         200        157           197             105
Sherman1      1595        700           300              48
FIDAP         1335        227           112             106
Table 7
Convergence results with GMRES(k): unpreconditioned (M = I) and preconditioned; a '-' entry denotes slow or no convergence

Matrix        M = I    DWTPer-D4    DWTPer-haar    iDWTPer-haar
Sherman4       361        228           228             229
Sherman2       194        152           486              48
Saylr1         156         83           150             142
DRICAV           -        118            47              47
The other vertical wall is maintained at ambient temperature and \psi is taken to be zero on all the walls. The linear system resulting from the finite element analysis of the coupled non-linear PDEs is solved iteratively by GMRES(k) and Bi-CGSTAB. The solution to the non-linear system (5.3) is obtained to an accuracy of 10^{-4} in the relative error of the field variables in seventeen global iterations. At each global iteration a call is made to GMRES(k)/Bi-CGSTAB. Results corresponding to the tenth global iteration are provided in Table 8; similar results regarding the efficiency of the preconditioner are seen at every global iteration. In this case our preconditioner is found not to be effective with the Bi-CGSTAB solver.

The efficiency of our preconditioner is also tested with the CGM solver. Harwell-Boeing matrices related to dynamic analysis in structural engineering, such as BCSSTK01, BCSSTK04 and BCSSTK27, are considered. Also the matrix Plat362, arising in the finite difference analysis of Platzman's oceanographic model and known as a difficult sparse matrix, has been considered for testing. In Table 9 results pertaining to these numerical experiments are provided.
Table 8
Convergence results for the convection problem: unpreconditioned (M = I) and preconditioned

                  GMRES(25)    Bi-CGSTAB
M = I                177           60
DWTPer-D4            148           58
DWTPer-haar          146           56
iDWTPer-haar         146           56
Table 9
Convergence results with CGM: unpreconditioned (M = I) and preconditioned

Matrix        M = I    DWTPer-haar    iDWTPer-haar
BCSSTK01        78          55              51
BCSSTK04       324         123             107
BCSSTK27       575         212             200
Plat362        427         242             230
6. Conclusion

The notion of iDWTPer has been proposed. Using iDWTPer/cDWTPer, a given banded linear system, when projected into wavelet space, retains a banded form. Taking advantage of the compression properties of wavelet transforms, iDWTPer/cDWTPer preconditioners based on the Daubechies family of wavelets have been designed. The proposed class of preconditioners is found to be efficient in accelerating the rate of convergence of iterative solvers like Bi-CGSTAB, GMRES(k), BICG, CGS and CGM. They have been successfully tested on several unsymmetric sparse linear systems from both the Harwell-Boeing and Tim Davis matrix collections. Further, they have been tested on a few non-linear problems, including problems from a CFD context. Overall, the preconditioner based on the Incomplete Discrete Haar Wavelet Transform is relatively more effective in most test cases.
References

[1] M. Benzi, M. Tuma, A sparse approximate inverse preconditioner for nonsymmetric linear systems, SIAM J. Sci. Comput. 19 (1998) 141-183.
[2] T.F. Chan, W.P. Tang, W.L. Wan, Wavelet sparse approximate inverse preconditioners, BIT 37 (1997) 644-660.
[3] M. Grote, T. Huckle, Parallel preconditioning with sparse approximate inverses, SIAM J. Sci. Comput. 18 (3) (1997) 838-853.
[4] K. Chen, Discrete wavelet transforms accelerated sparse preconditioners for dense boundary element systems, Electron. Trans. Numer. Anal. 8 (1999) 138-153.
[5] J. Ford, K. Chen, L. Scales, A new wavelet transform preconditioner for iterative solution of elastohydrodynamic lubrication problems, Int. J. Comput. Math. 75 (2000) 497-513.
[6] J. Ford, K. Chen, Wavelet-based preconditioners for dense matrices with non-smooth local features, J. Numer. Math. 41 (2) (2001) 282-307.
[7] O.M. Nielsen, Wavelets in Scientific Computing, Ph.D. thesis, Technical University of Denmark, Lyngby, 1998.
[8] S.G. Mallat, A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans. PAMI 11 (7) (1989) 674-693.
[9] I.S. Duff, R. Grimes, J.G. Lewis, Users' Guide for the Harwell-Boeing Sparse Matrix Collection, Technical Report RAL-92-086, Rutherford Appleton Laboratory, Chilton, UK, 1992.
[10] Z. Zlatev, Computational Methods for General Sparse Matrices, Kluwer, Dordrecht, The Netherlands, 1991.
[11] D. Miller, Wavelet transforms and linear algebra, M.Sc. dissertation, University of Liverpool, UK, 1995.