Applied Mathematics and Computation 171 (2005) 203–224 www.elsevier.com/locate/amc
Wavelet based preconditioners for sparse linear systems

B.V. Rathish Kumar *, Mani Mehra

Department of Mathematics, Indian Institute of Technology, Kanpur 208 016, India
Abstract

A class of efficient preconditioners based on the Daubechies family of wavelets is introduced for sparse, unsymmetric linear systems that arise in the numerical solution of partial differential equations (PDEs) in a wide variety of scientific and engineering disciplines. Complete and incomplete discrete wavelet transforms, in conjunction with row and column permutations, are used in the construction of these preconditioners. Under these wavelet transforms the transformed matrix is permuted to band form. The efficiency of our preconditioners with several Krylov subspace methods is illustrated by solving matrices from the Harwell-Boeing collection and the Tim Davis collection. Matrices arising from the solution of the regularized Burgers equation and of free convection in a porous enclosure are also tested. Our results indicate that the preconditioner based on the Incomplete Discrete Haar Wavelet Transform is both cheaper to construct and gives good convergence.

© 2005 Elsevier Inc. All rights reserved.

Keywords: Preconditioning; Sparse matrices; Wavelet transform; Krylov subspace solvers
* Corresponding author.
E-mail addresses: [email protected] (B.V. Rathish Kumar), [email protected] (M. Mehra).
0096-3003/$ - see front matter © 2005 Elsevier Inc. All rights reserved.
doi:10.1016/j.amc.2005.01.060
1. Introduction

Consider solving the large and sparse linear system

A x = b    (1.1)
that arises when numerical methods such as finite differences, finite elements, etc., are used to solve partial differential equations (PDEs). Krylov subspace iterative methods such as the Conjugate Gradient method (CG), the Biconjugate Gradient method (BiCG), Conjugate Gradient Squared (CGS), Biconjugate Gradient Stabilized (Bi-CGSTAB) and the restarted Generalized Minimal Residual method (GMRES(k)) are widely employed in solving (1.1). It is well known that the rate of convergence of such methods is strongly influenced by the spectral properties of A. It is therefore natural to try to transform the original system into one having the same solution but more favorable spectral properties. A preconditioner is a matrix that can be used to accomplish such a transformation. If M is a non-singular matrix which approximates A (M \approx A), the transformed linear system

M^{-1} A x = M^{-1} b    (1.2)
will have the same solution as system (1.1), but the convergence rate of iterative methods applied to (1.2) may be much higher. Further, it is desirable that M^{-1} v can be computed easily for any vector v. A large number of different preconditioning strategies have been developed; a brief survey is provided in [1]. Recently there has been increased interest in using wavelet based preconditioners with Krylov subspace methods (KSMs) for the linear system (1.1). Chan et al. [2] have used the discrete wavelet transform (DWT; usually the complete discrete wavelet transform (cDWT) is referred to simply as DWT) for improving the performance of the sparse approximate inverse preconditioners proposed by Grote and Huckle [3]. They dealt with matrices with smooth inverses resulting from the 1-D Laplacian operator and from linear PDEs with variable coefficients. Chen [4] used the DWT with permutations to construct preconditioners for dense systems arising from boundary element analysis (BEA) of linear PDEs. Ford et al. [5,6] have considered dense matrices with local non-smoothness and have shown that wavelet compression can be used to design preconditioners for such dense systems after isolating the local non-smoothness. All these studies largely focus on algorithms for constructing DWT based preconditioners for solving dense symmetric/unsymmetric linear systems arising in BEA of linear PDEs by the GMRES method. The DWT transforms (1.1) to

W A W^T \tilde{x} = W b, \qquad \tilde{x} = W x.    (1.3)

Since the wavelet transform W is orthogonal, the eigenvalues of \tilde{A} = W A W^T are the same as those of A. This implies that the condition number of \tilde{A} is unaffected by the DWT. The Incomplete Discrete Wavelet Transform (iDWT) W_1, which is
approximately orthogonal, is cheaper and easier to construct. W_1 may serve as an alternative to W to project A into wavelet space. The standard DWT leads to a preconditioner with non-zero entries dispersed throughout the matrix; the cost of applying such a preconditioner is often too high for practical purposes. By using cDWT and iDWT with permutations one can improve the positioning of the non-zero entries and obtain a preconditioner that is cheaper to apply and still gives good convergence.

In this study we propose Haar and Daubechies wavelet based preconditioners for non-symmetric large sparse linear systems resulting from non-linear PDE analysis. Further, in our algorithms we use discrete wavelet transforms with permutations based on both cDWT and iDWT. While it is generally agreed that the construction of an efficient general purpose preconditioner is not possible, there is still considerable interest in developing methods which perform well on a wide range of problems. The algorithms proposed in this study are tested on a variety of matrices resulting from finite difference and finite element analysis of non-linear PDEs and on matrices from the Harwell-Boeing collection. The preconditioners are also tested with five different Krylov subspace methods.

The paper is organized as follows. In Section 2 we give a quick overview of sparse and band preconditioners. In Section 3 we summarize some basics of wavelets and present the cDWT and iDWT based on the pyramid algorithm. In Section 4 we introduce our new concept of the Incomplete Discrete Wavelet Transform with permutation (iDWTPer) and describe the properties of the banded wavelet preconditioner using iDWTPer. Implementation details and the results of numerical experiments are discussed in Section 5. Finally, we draw some conclusions in Section 6.
2. Sparse and band preconditioner

A simple preconditioner for a sparse matrix is the banded matrix constructed by setting to zero all the entries of the matrix outside a chosen diagonal band. The wider the band, the more accurately the preconditioner approximates the original matrix. However, matrices with large bandwidth (l) are expensive to store and demand an enormous amount of computation at each iterative step involving such a matrix as a preconditioner. So it is important to choose l in such a way that it balances these two conflicting considerations. To construct a preconditioner for the linear system (1.1) we consider the following splitting of A:

A = D + C,    (2.1)

where D is a band matrix with wrap-around boundaries. For the case l = 3, the matrix D is simply
\begin{bmatrix}
A_{1,1} & A_{1,2} &         &         &           & A_{1,n}   \\
A_{2,1} & A_{2,2} & A_{2,3} &         &           &           \\
        & A_{3,2} & \ddots  & \ddots  &           &           \\
        &         & \ddots  & \ddots  & \ddots    &           \\
        &         &         & \ddots  & \ddots    & A_{n-1,n} \\
A_{n,1} &         &         &         & A_{n,n-1} & A_{n,n}
\end{bmatrix}    (2.2)
The associated preconditioned system

(I + D^{-1} C) x = D^{-1} b    (2.3)

has the preconditioner M = D. Clearly, with the above choice of l = 3, M^{-1} will not approximate A^{-1} well in all cases. It is our experience that merely increasing l alone is not sufficient to improve this preconditioner, as the improvements are only marginal. Here we shall discuss a way, based on cDWT and iDWT, to derive a new and improved preconditioner.
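To make the band splitting concrete, here is a minimal Python/SciPy sketch (not the authors' MATLAB code): it keeps only a diagonal band of a sparse test matrix as D and applies D^{-1} through a sparse LU factorization as the preconditioner inside restarted GMRES. The test matrix, bandwidth and tolerance are illustrative choices, and the rtol keyword assumes SciPy >= 1.12.

```python
# Minimal sketch of the band preconditioner of Section 2 (illustrative only).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def band_part(A, l):
    """Keep entries A[i, j] with |i - j| <= (l - 1) // 2 (no wrap-around)."""
    A = A.tocoo()
    half = (l - 1) // 2
    keep = np.abs(A.row - A.col) <= half
    return sp.csc_matrix((A.data[keep], (A.row[keep], A.col[keep])), shape=A.shape)

n = 1024
# Unsymmetric, diagonally dominant stand-in for a matrix such as sherman1.
A = sp.diags([2.5 * np.ones(n), -1.0 * np.ones(n - 1), -0.6 * np.ones(n - 1)],
             [0, -1, 1], format="csr") + 0.01 * sp.random(n, n, density=0.002, random_state=1)
b = A @ np.ones(n)

D = band_part(A, l=3)                                        # splitting A = D + C
M = spla.LinearOperator((n, n), matvec=spla.splu(D).solve)   # M^{-1} v = D^{-1} v

x, info = spla.gmres(A, b, rtol=1e-6, restart=25, M=M)       # SciPy >= 1.12 for rtol
print(info, np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```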
3. Wavelet preliminaries

Multiresolution analysis (MRA) is the theory that was used by Ingrid Daubechies to show that for any non-negative integer n there exists an orthogonal wavelet with compact support such that all the derivatives up to order n exist. Here we use compactly supported wavelets like Haar, D4 and D6 (DN stands for the Daubechies wavelet of order N). MRA describes a sequence of nested approximation spaces V_j in L^2(\mathbb{R}) such that the closure of their union equals L^2(\mathbb{R}), giving

\{0\} \subset \cdots \subset V_{-1} \subset V_0 \subset V_1 \subset \cdots \subset L^2(\mathbb{R}).    (3.1)

The orthogonality of scaling functions and wavelets, together with the dyadic coupling between MRA spaces, leads to a relation between scaling function coefficients and wavelet coefficients on different scales. This yields a fast and accurate algorithm due to Mallat [8], known as the pyramid algorithm. A straightforward implementation of the pyramid algorithm leads to difficulties in handling boundary points, because the procedure requires several data values which are defined outside the boundaries and these are not known. Based on this we will define two notions, namely cDWT and iDWT. Both cDWT and iDWT are defined by filter coefficients a_0, a_1, \ldots, a_{D-1}, where D is the order of the wavelet transform. The filter coefficients b_0, b_1, \ldots, b_{D-1} are derived from the a_i by the relation

b_i = (-1)^i a_{D-1-i}.    (3.2)
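To make relation (3.2) concrete, the short sketch below builds the high-pass filter b from the D4 low-pass filter a and checks that a is normalized and that the zeroth and first moments of b vanish. The explicit D4 coefficient values (normalized so that their squares sum to one) are standard but are not listed in the paper, so they are stated here as an assumption.

```python
# Sketch: quadrature-mirror relation b_i = (-1)^i a_{D-1-i} for the Daubechies D4 filter.
# The numerical values of a are the standard D4 coefficients (not listed in the paper).
import numpy as np

s3 = np.sqrt(3.0)
a = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4.0 * np.sqrt(2.0))
D = len(a)
b = np.array([(-1) ** i * a[D - 1 - i] for i in range(D)])    # relation (3.2)

print("sum a_k^2 =", np.sum(a**2))                  # 1.0  (orthonormality)
print("sum b_k   =", np.sum(b))                     # 0.0  (zeroth moment vanishes)
print("sum k*b_k =", np.sum(np.arange(D) * b))      # 0.0  (first moment vanishes for D4)
```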
For f \in L^2(\mathbb{R}), f_j(x) denotes the projection of f onto the space V_j; in terms of the scaling functions \phi_{j,k} it is defined by

f_j(x) = P_{V_j} f(x) = \sum_k s_k^j \phi_{j,k}.    (3.3)

The projection also has a formulation in terms of the scaling function and the wavelet function \psi_{j,k}, where j is the resolution level and k the translation:

P_{V_L} f(x) = \sum_k s_k^r \phi_{r,k}(x) + \sum_{j=r}^{L-1} \sum_k d_k^j \psi_{j,k}(x),    (3.4)

where r is the lowest and L the highest resolution level. Thus we obtain the relations

s_l^{j-1} = \sum_{k=0}^{D-1} a_k s_{2l+k}^j, \qquad d_l^{j-1} = \sum_{k=0}^{D-1} b_k s_{2l+k}^j.    (3.5)
3.1. Complete Discrete Wavelet Transform (cDWT)

If the function f is periodic, we also have periodicity in the scaling and wavelet coefficients:

s_k^j = s_{k+2^j p}^j, \qquad d_k^j = d_{k+2^j p}^j, \qquad p \in \mathbb{Z}.    (3.6)

Hence it is enough to consider 2^j coefficients of either type at level j. Thus the pyramid algorithm for the periodic wavelet transform is defined by

s_l^{j-1} = \sum_{k=0}^{D-1} a_k s_{\langle 2l+k \rangle_{2^j}}^j, \qquad d_l^{j-1} = \sum_{k=0}^{D-1} b_k s_{\langle 2l+k \rangle_{2^j}}^j, \qquad l = 0, 1, \ldots, 2^{j-1} - 1.    (3.7)
For a given vector s from the vector space \mathbb{R}^n one may construct an infinite periodic sequence of period n and use it as the coefficients of a scaling function f_L(x) in some fixed subspace V_L of L^2 (L an integer). Hereafter we refer to the periodic wavelet transform as cDWT. The transform s \mapsto w, w = W s, is then implemented by the pyramid algorithm (3.7). Denote by s = s^{(L)} a column vector of A at wavelet level L. The pyramid algorithm transforms the vector s^{(L)} to w defined as

w = \left[ (s^{(r)})^T \; (f^{(r)})^T \; (f^{(r+1)})^T \cdots (f^{(L-1)})^T \right]^T    (3.8)
in a level by level manner, as described below:

s^{(L)} \to s^{(L-1)} \to s^{(L-2)} \to \cdots \to s^{(r+1)} \to s^{(r)}
   \searrow f^{(L-1)}   \searrow f^{(L-2)}   \cdots   \searrow f^{(r)}

where s^{(j)} and f^{(j)} are of length 2^j. Suppose \psi denotes the wavelet function. Then we expect the new vector w to be nearly sparse, because the usual moment conditions

\int_{-\infty}^{\infty} x^p \psi(x)\, dx = 0 \quad \text{for } p = 0, 1, \ldots, D/2 - 1    (3.9)
are equivalent to the vector moment conditions

\sum_{k=0}^{D-1} k^p b_k = 0 \quad \text{for } p = 0, 1, \ldots, D/2 - 1.    (3.10)
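As a small numerical illustration of this compression property (a sketch, not taken from the paper), one level of the periodic pyramid step (3.7) with the D4 filters applied to samples of a smooth periodic function leaves detail coefficients that are orders of magnitude smaller than the scaling coefficients:

```python
# Sketch: one level of the periodic (cDWT) pyramid step (3.7) with D4 filters,
# showing that the detail coefficients of a smooth periodic vector are nearly zero.
import numpy as np

s3 = np.sqrt(3.0)
a = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4.0 * np.sqrt(2.0))
b = np.array([(-1) ** i * a[len(a) - 1 - i] for i in range(len(a))])

def cdwt_step(s, a, b):
    """One level of (3.7): periodic (wrap-around) indexing of s_{<2l+k>}."""
    n, D = len(s), len(a)
    idx = (2 * np.arange(n // 2)[:, None] + np.arange(D)[None, :]) % n
    return s[idx] @ a, s[idx] @ b          # scaling and detail coefficients

x = np.linspace(0.0, 1.0, 256, endpoint=False)
s = np.sin(2 * np.pi * x) + 0.5 * np.cos(4 * np.pi * x)
s_low, d = cdwt_step(s, a, b)
print("max |s^(j-1)| =", np.abs(s_low).max(), " max |d^(j-1)| =", np.abs(d).max())
```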
Here, the larger D is, the better the compression in w, but the support of the wavelet is larger as well. Since the periodized wavelet transform satisfies the orthonormality relation W W^T = I, we call it the complete DWT (cDWT).

3.2. Incomplete Discrete Wavelet Transform (iDWT)

Unlike the cDWT, our iDWT does not require a periodic extension of the data, since the function f need not be periodic for the iDWT to be applied. The iDWT simply assumes that all values outside the boundaries are equal to zero, as shown in Fig. 1. Further, it is much simpler to implement than its complete counterpart (cDWT).
Fig. 1. The pyramid algorithm and boundary points.
The pyramid algorithm for the iDWT is

s_l^{j-1} = \sum_{k=0}^{D-1} a_k s_{2l+k}^j, \qquad d_l^{j-1} = \sum_{k=0}^{D-1} b_k s_{2l+k}^j,    (3.11)
where the points outside the boundary are taken to be zero. The transform s \mapsto w, w = W_1 s, is implemented by (3.11) and satisfies the orthonormality relation only approximately, W_1 W_1^T \approx I. We can show the power of this iDWT on a Calderon-Zygmund matrix. Consider the Calderon-Zygmund type matrix A = (a_{ij}), where

a_{ij} = \begin{cases} 3/2 & \text{if } i = j, \\ \dfrac{1}{|i-j|} & \text{otherwise.} \end{cases}    (3.12)

As shown in Fig. 2, it is smooth, with entries decreasing in magnitude away from the diagonal. When a cDWT is applied to A, the resultant matrix \tilde{A} = W A W^T after thresholding has a weak finger pattern, with most of the largest entries confined to a narrow diagonal band. The matrix \tilde{A}_1 = W_1 A W_1^T obtained by applying the iDWT to A is also shown in Fig. 2, and it is very similar to \tilde{A}.
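The following sketch (an illustration written for this discussion, not the authors' code) builds the Calderon-Zygmund matrix (3.12), applies to both rows and columns a one-level D4 transform that simply drops filter taps falling outside the boundary, in the spirit of (3.11), and reports how many entries of the result exceed a small threshold as well as how far W_1 W_1^T is from the identity. The threshold is an arbitrary illustrative choice.

```python
# Sketch: one-level D4 transform with zero extension (iDWT flavour, eq. (3.11))
# applied to the Calderon-Zygmund matrix (3.12). Illustration only; the paper's
# experiments use several levels and the permuted form of Section 4.
import numpy as np

n = 128
i, j = np.indices((n, n))
A = np.where(i == j, 1.5, 1.0 / np.maximum(np.abs(i - j), 1))      # matrix (3.12)

s3 = np.sqrt(3.0)
a = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4.0 * np.sqrt(2.0))
b = np.array([(-1) ** k * a[len(a) - 1 - k] for k in range(len(a))])

W1 = np.zeros((n, n))                 # one-level transform; taps outside [0, n) dropped
for l in range(n // 2):
    for k in range(len(a)):
        col = 2 * l + k
        if col < n:
            W1[l, col] = a[k]                 # low-pass (scaling) rows
            W1[n // 2 + l, col] = b[k]        # high-pass (detail) rows

A1 = W1 @ A @ W1.T
tol_A, tol_A1 = 1e-2 * np.abs(A).max(), 1e-2 * np.abs(A1).max()    # illustrative thresholds
print("A : entries above threshold =", np.count_nonzero(np.abs(A) > tol_A), "of", n * n)
print("A1: entries above threshold =", np.count_nonzero(np.abs(A1) > tol_A1), "of", n * n)
print("||W1 W1^T - I||_max =", np.abs(W1 @ W1.T - np.eye(n)).max())  # approximately orthogonal
```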
4. DWT-based preconditioners

4.1. The wavelet transform with permutation (DWTPer)

A discrete wavelet transform with permutation based on the cDWT is defined in [4]. It is equivalent to a wavelet transform followed by permutations of rows and columns. We will refer to this wavelet transform with permutation as DWTPer (or cDWTPer). This transform has the effect of preserving the general structure (by which we mean the 'shape' of the areas of singularity) of a matrix after a wavelet transform has been applied. The precise effect of applying DWTPer to a diagonal matrix is established in [4].

4.1.1. Incomplete Discrete Wavelet Transform with permutation (iDWTPer)

Here we propose the new notion of an Incomplete Discrete Wavelet Transform with permutation, iDWTPer. Denote by s = s^{(L)} a column vector of A at the wavelet level L. Then the pyramid algorithm based on the iDWT transforms the vector s^{(L)} to

w = \left[ (s^{(r)})^T \; (f^{(r)})^T \; (f^{(r+1)})^T \cdots (f^{(L-1)})^T \right]^T.    (4.1)
Fig. 2. Mesh plots of the matrix. Top center: original matrix; bottom left: its cDWTPer; bottom right: its iDWTPer.
Assume n = 2^L and let r be the integer such that 2^r < D and 2^{r+1} \geq D; thus r = 0 for D = 2 (Haar wavelets) and r = 1 for D = 4 (Daubechies order 4 wavelets). In matrix form w is expressed as

w = \widetilde{P}_{r+1} \widetilde{W}_{1,r+1} \cdots \widetilde{P}_{L-1} \widetilde{W}_{1,L-1} \widetilde{P}_L \widetilde{W}_{1,L} \, s^L \equiv W_1 s^L,    (4.2)
where

\widetilde{P}_m = \begin{pmatrix} P_m & \\ & I_m \end{pmatrix}_{n \times n},    (4.3)

with P_m a permutation matrix of size 2^m = 2^L - k_m, that is, P_m = I(1, 3, \ldots, 2^m - 1, 2, 4, \ldots, 2^m), and where \widetilde{W}_{1,m} = \begin{pmatrix} W_{1,m} & \\ & I_m \end{pmatrix}_{n \times n}, with W_{1,m} an orthogonal (sparse) matrix of size 2^m = 2^L - k_m and I_m an identity matrix of size k_m. The one-level transformation matrix W_{1,m} is a compact block matrix. For example, for the Daubechies wavelets of order D = 4 (two vanishing moments), it is

W_{1,m} = \begin{pmatrix}
a_0 & a_1 & a_2 & a_3 &        &        &     &     \\
b_0 & b_1 & b_2 & b_3 &        &        &     &     \\
    &     & a_0 & a_1 & a_2    & a_3    &     &     \\
    &     & b_0 & b_1 & b_2    & b_3    &     &     \\
    &     &     &     & \ddots & \ddots &     &     \\
a_2 & a_3 &     &     &        &        & a_0 & a_1 \\
b_2 & b_3 &     &     &        &        & b_0 & b_1
\end{pmatrix}.
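To see what the permutation P_m does (an interpretation of the definition above, sketched here rather than taken from the paper), its index form gathers the interleaved output of a one-level step into the stacked form [s | d], while its transpose restores the interleaved ordering that DWTPer/iDWTPer keep in order to preserve a band shape:

```python
# Sketch: the permutation P_m = I(1, 3, ..., 2^m - 1, 2, 4, ..., 2^m) in 0-based
# index form and its effect on a one-level transform output.
import numpy as np

m = 3
size = 2 ** m
perm = np.concatenate([np.arange(0, size, 2), np.arange(1, size, 2)])  # odd positions first (1-based)

# A one-level W_{1,m} applied to a vector produces interleaved coefficients
# [s_0, d_0, s_1, d_1, ...]; gathering with perm stacks them as [s | d].
interleaved = np.array(["s0", "d0", "s1", "d1", "s2", "d2", "s3", "d3"])
print(interleaved[perm])            # ['s0' 's1' 's2' 's3' 'd0' 'd1' 'd2' 'd3']

# The inverse (transposed) permutation restores the interleaved, structure-preserving order.
inv = np.argsort(perm)
print(interleaved[perm][inv])       # back to ['s0' 'd0' 's1' 'd1' ...]
```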
We now define a new one-level iDWTPer matrix (similar to W_{1,m}):

\widehat{W}_{1,m} = \begin{pmatrix}
a_0  & \phi & a_1  & \phi & a_2  & \phi & \cdots & a_{D-1} &      &      &      &      \\
\phi & I    & \phi & \phi & \phi & \phi & \cdots & \phi    &      &      &      &      \\
b_0  & \phi & b_1  & \phi & b_2  & \phi & \cdots & b_{D-1} &      &      &      &      \\
\phi & \phi & \phi & I    & \phi & \phi & \cdots & \phi    &      &      &      &      \\
     &      &      &      & \ddots &    &        &         & \ddots &    &      &      \\
     &      &      &      &        &    &        & a_0     & \phi & a_1  & \phi &      \\
     &      &      &      &        &    &        & \phi    & I    & \phi & \phi &      \\
     &      &      &      &        &    &        & b_0     & \phi & b_1  & \phi &      \\
     &      &      &      &        &    &        & \phi    & \phi & \phi & I    &
\end{pmatrix}_{n \times n}    (4.4)

Here I is an identity matrix of size 2^{L-m} - 1 and the \phi's are zero blocks. For m = L, both I and \phi are of size 0, i.e. \widehat{W}_{1,L} = \widetilde{W}_{1,L} = W_{1,L}. Further, this iDWTPer for a vector s^L \in \mathbb{R}^n can be defined by
\widehat{w} = \widehat{W}_1 s^{(L)} \quad \text{with} \quad \widehat{W}_1 = \widehat{W}_{1,r+1} \widehat{W}_{1,r+2} \cdots \widehat{W}_{1,L},    (4.5)

based on (L + 1 - r) levels. For a matrix A_{n \times n}, the iDWTPer would give

\widehat{A}_1 = \widehat{W}_1 A \widehat{W}_1^T.

Now, to relate \widehat{A}_1 to \widetilde{A}_1 from a standard iDWT, we define a permutation matrix

P = \widetilde{P}_L^T \widetilde{P}_{L-1}^T \cdots \widetilde{P}_{r+2}^T \widetilde{P}_{r+1}^T,    (4.6)

where the matrices \widetilde{P}_k come from (4.3). Firstly, by induction, we can prove the following:

\widehat{W}_{1,k} = \Big( \prod_{l=1}^{L-k} \widetilde{P}_{k+l} \Big)^T \widetilde{W}_{1,k} \Big( \prod_{l=1}^{L-k} \widetilde{P}_{k+l} \Big) \quad \text{for } k = r+1, r+2, \ldots, L,

that is,

\widehat{W}_{1,L} = \widetilde{W}_{1,L},
\widehat{W}_{1,L-1} = \widetilde{P}_L^T \widetilde{W}_{1,L-1} \widetilde{P}_L,
\;\;\vdots
\widehat{W}_{1,r+1} = \widetilde{P}_L^T \widetilde{P}_{L-1}^T \cdots \widetilde{P}_{r+2}^T \widetilde{W}_{1,r+1} \widetilde{P}_{r+2} \cdots \widetilde{P}_{L-1} \widetilde{P}_L.

Now we can verify that

P W_1 = (\widetilde{P}_L^T \widetilde{P}_{L-1}^T \cdots \widetilde{P}_{r+1}^T)(\widetilde{P}_{r+1} \widetilde{W}_{1,r+1} \cdots \widetilde{P}_L \widetilde{W}_{1,L})
      = \widehat{W}_{1,r+1} (\widetilde{P}_L^T \widetilde{P}_{L-1}^T \cdots \widetilde{P}_{r+2}^T)(\widetilde{P}_{r+2} \widetilde{W}_{1,r+2} \cdots \widetilde{P}_L \widetilde{W}_{1,L})
      \;\;\vdots
      = \widehat{W}_{1,r+1} \widehat{W}_{1,r+2} \cdots \widehat{W}_{1,L-2} \, \widetilde{P}_L^T \widetilde{W}_{1,L-1} \widetilde{P}_L \widetilde{W}_{1,L}
      = \widehat{W}_{1,r+1} \widehat{W}_{1,r+2} \cdots \widehat{W}_{1,L}
      = \widehat{W}_1.

Therefore \widehat{W}_1 = P W_1 and \widehat{A}_1 = P \widetilde{A}_1 P^T. From this relation we can conclude that the iDWTPer can be implemented in a level by level manner, either directly using the \widehat{W}_{1,m} (via \widehat{W}_1) or indirectly using the \widetilde{P}_m (via P) after an iDWT, and we obtain the same result.

The cost in terms of flops of performing a cDWT may be calculated in a straightforward manner as in [7]. Since DWTPer simply involves a permutation of the DWT, the flop count for DWTPer is identical. The cost of applying
a block size N, order D, level l iDWTPer to an n \times n matrix is less than 8DnN(1 - 2^{1-l}).

4.2. Banded wavelet preconditioner using cDWTPer and iDWTPer

As far as preconditioning is concerned, to solve (1.1) we propose the following algorithm:

1. Apply an iDWTPer to Ax = b to obtain \widehat{A}_1 \widehat{x} = \widehat{b}.
2. Select a suitable band form M of \widehat{A}_1.
3. Use M as a preconditioner to solve \widehat{A}_1 \widehat{x} = \widehat{b}.

For cDWTPer, replace \widehat{A}_1 by \widehat{A} in the above algorithm. As described in Section 2, the band size of M determines the cost of a preconditioning step. Therefore, we shall explore the possibility of constructing an effective preconditioner based on a suitable band. Let LEV denote the actual number of wavelet levels used (1 \leq LEV \leq (L + 1 - r)). With the help of this DWTPer we can transform a band matrix A into another band matrix \widehat{A}. We show in Fig. 3 the original matrix and the cDWTPer and iDWTPer of a matrix taken from the Saylor group (petroleum reservoir simulation matrices) with D = 4, LEV = 3 and n = 216. The bandwidth of the new matrices under cDWTPer and iDWTPer is increased and satisfies the overestimate given by Theorem 4.1.

Here we introduce the definition of a band matrix. Band(A, a, b, k): a block band matrix A_{n \times n} with blocks of size k \times k is called Band(A, a, b, k) if its lower block bandwidth is a and its upper block bandwidth is b. When k = 1, Band(A, a, b, 1) = Band(A, a, b). The strategy we take to start the preconditioning step is to partition A = D + C, where D is a Band(D, a, a) for some integer a. First apply DWTPer with LEV \leq (L + 1 - r) wavelet levels, to give

\widehat{A} \widehat{x} = (\widehat{D} + \widehat{C}) \widehat{x} = \widehat{b}.

Now \widehat{D} is also a band matrix, and it is at most a Band(\widehat{D}, k, k) matrix with k as predicted by the following theorem proved in [4].

Theorem 4.1. Let A be a band(a, b) matrix. Then the new DWT of l levels, based on Daubechies' wavelets of order D, transforms A into \widehat{A}, which is at most a band(k_1, k_2) matrix with

k_1 - a = k_2 - b = D(2^{l-1} - 1).    (4.7)

For example, with D = 4 and l = 3 levels, the bandwidth can grow by at most 4(2^2 - 1) = 12 on each side. Let B denote the Band(\widehat{A}, k, k) part of the matrix \widehat{A}. Then select a preconditioner of bandwidth l such that l \leq k and M = Band(B, l, l).
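As a rough end-to-end illustration of steps 1-3 (a sketch under simplifying assumptions, not the authors' MATLAB implementation), the code below applies a multi-level Haar transform that keeps coefficients interleaved in place, extracts a band of the transformed matrix using the overestimate of Theorem 4.1, and uses that band (through a sparse LU factorization) as the preconditioner for GMRES on the transformed system. Matrix size, level count and bandwidths are placeholder choices, and rtol assumes SciPy >= 1.12.

```python
# Rough sketch of the banded wavelet preconditioner (steps 1-3 above) with a Haar
# transform kept in interleaved (DWTPer-style) ordering. Dense matrices and a
# synthetic banded test problem are used purely for illustration.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def haar_dwtper(n, levels):
    """Multi-level Haar transform leaving coefficients interleaved in place.
    For the Haar wavelet, cDWT and iDWT coincide when 2**levels divides n."""
    W, r = np.eye(n), 1.0 / np.sqrt(2.0)
    for j in range(1, levels + 1):
        step, block, half = np.eye(n), 2 ** j, 2 ** (j - 1)
        for p in range(0, n - block + 1, block):
            step[p, p], step[p, p + half] = r, r                 # new scaling coeff. stays at p
            step[p + half, p], step[p + half, p + half] = r, -r  # detail coeff. goes to p + half
        W = step @ W
    return W

def band_of(B, k):
    """Keep entries with |i - j| <= k (step 2: band form M of the transformed matrix)."""
    i, j = np.indices(B.shape)
    return np.where(np.abs(i - j) <= k, B, 0.0)

n, levels, a = 256, 3, 2                       # size, wavelet levels, band of A
rng = np.random.default_rng(1)
A = np.eye(n) * 4.0
for off in range(1, a + 1):                    # unsymmetric banded test matrix
    A += np.diag(rng.uniform(-1, 0, n - off), off) + np.diag(rng.uniform(0, 1, n - off), -off)
b = A @ np.ones(n)

W = haar_dwtper(n, levels)                     # step 1: transform the system
A_hat, b_hat = W @ A @ W.T, W @ b
k = a + 2 * (2 ** (levels - 1) - 1)            # Theorem 4.1 overestimate for D = 2
M_band = band_of(A_hat, k)                     # step 2 (a narrower band is cheaper to apply)
M = spla.LinearOperator((n, n), matvec=spla.splu(sp.csc_matrix(M_band)).solve)

x_hat, info = spla.gmres(A_hat, b_hat, rtol=1e-6, restart=25, M=M)   # step 3
print(info, np.linalg.norm(np.ones(n) - W.T @ x_hat))                # recover x = W^T x_hat
```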
Fig. 3. Level 3 Daubechies 4 transforms of the matrix pores1. Top center: original matrix; bottom left: cDWTPer; bottom right: iDWTPer.
5. Numerical experiments

In this section we present the results of numerical experiments on a few non-linear problems and on a range of matrices from the Harwell-Boeing collection [9] and the Tim Davis collection. The right hand side of each of the linear systems from the Harwell-Boeing collection was computed from the solution vector x of all ones, the choice used, e.g., in [10]. All numerical experiments were carried out in double precision using a MATLAB implementation. The initial guess was always x_0 = 0 and the stopping criterion is

\frac{\| b - A x \|_2}{\| b \|_2} \leq 10^{-6}.    (5.1)
In our experiments we were primarily concerned with testing the effectiveness of the preconditioners, so we did not attempt to optimize the choice of tolerance. For some small problems like IBM32 and pores1 we have tested the preconditioners with a tolerance of 10^{-8}; by default we have taken 10^{-6}. We demonstrate the effectiveness of the preconditioner for a few standard KSMs, namely CGM, BICG, Bi-CGSTAB, CGS and GMRES(k). To begin with we present the results for all these solvers and subsequently focus on the popular solvers GMRES and Bi-CGSTAB. Initially matrices such as sherman1, saylr3, pores1 and IBM32 from the Harwell-Boeing collection are used. The matrices encountered in finite difference analysis of the unsteady Burgers equation are also considered. The numbers of iterations for both unpreconditioned and preconditioned iterative methods are given in the respective tables; a '-' entry denotes slow convergence or no convergence. In our numerical experiments we have tested four different preconditioners, namely cDWTPer-haar (complete DWT with permutation based on the Haar wavelet), cDWTPer-D4 (based on the compactly supported wavelet of order four), cDWTPer-D6 (based on the compactly supported wavelet of order six) and iDWTPer-haar (incomplete DWT with permutation based on the Haar wavelet). The convergence results without preconditioning are listed under M = I. We also considered the use of a standard DWT-based preconditioner formed by setting to zero all entries whose magnitude is less than a chosen threshold value (as done in [11]), and found that such a preconditioner becomes singular for non-linear problems. All the numerical experiments are carried out using MATLAB installed on a Sun E250 workstation with a SunW UltraSPARC II 400 MHz dual processor, using double precision arithmetic under the Solaris 8 operating system.

To begin with, we tested four Krylov subspace solvers on the matrices given below.

Sherman1: This matrix arises in oil reservoir simulation on a 10 x 10 x 10 grid, using a seven-point finite-difference approximation with NC equations and unknowns per gridblock. Here the size is n = 1000, nz = 23094 (where nz is the number of non-zero entries) and NC = 1. The outcome of the numerical experiments is provided in Table 1. From Table 1 one can notice that our wavelet transform based preconditioners are effective with all four iterative solvers for unsymmetric matrices; in most of the cases the iDWTPer-haar based preconditioner is found to be effective in accelerating the convergence rate of the iterative solver. In Fig. 4 the logarithm of the relative residual norm is plotted against the number of iterations to depict the convergence history of the preconditioned KSMs.

Saylr3: This Saylor petroleum engineering reservoir simulation matrix arises in 3D reservoir simulation. Here n = 1000 and nz = 375. On this matrix we tested our preconditioner with different bandwidths (l). We found that the choice l = 5 leads to a singular preconditioner. Results corresponding to l = 10 are
Table 1
Convergence results for sherman1: unpreconditioned (M = I) and preconditioned

                  Bi-CGSTAB    GMRES(25)    BICG    CGS
M = I                255         >1000       365     334
DWTPer-haar          105           159       105     102
DWTPer-D4            164           379       128     164
DWTPer-D6            171           748       124     182
iDWTPer-haar          99           133        78      87
Fig. 4. Convergence behavior of the sherman1 problem. Top left: with Bi-CGSTAB; top right: GMRES(25); bottom left: BICG; bottom right: CGS.
presented in Table 2. The level 3 Daubechies 4 transform of the matrix saylr3 and the convergence history plots are provided in Fig. 5.
Table 2
Convergence results for saylr3: unpreconditioned (M = I) and preconditioned

                  Bi-CGSTAB    GMRES(25)    BICG    CGS
M = I                244          1000       366     332
DWTPer-haar          110           160        83      94
DWTPer-D4            171           385       120     155
DWTPer-D6            168           747       115     180
iDWTPer-haar          84           132        77      90
Fig. 5. Level 3 Daubechies 4 transforms of the matrix saylr3. Top left: original matrix; top right: iDWTPer. Convergence behaviour of the saylr3 problem. Bottom left: with Bi-CGSTAB; bottom right: with GMRES(25).
Pores1: This matrix is extracted from the PORES package for reservoir simulation. Its size is n = 30 and nz = 180. On this matrix we show results for two
Fig. 6. Convergence behavior of the pores1 problem. Top left: with Bi-CGSTAB; top right: GMRES(25); bottom left: BICG; bottom right: CGS.
different bandwidths, l = 3 and l = 5. The convergence history is provided in Fig. 6. Convergence results for the unpreconditioned case (top) and for l = 5 (middle) and l = 3 (bottom) are presented in Table 3. In this case DWTPer-D6 does not work effectively.

IBM32: Size n = 32 and nz = 126. For this matrix we show results in Table 4 for two different bandwidths, l = 5 and l = 10: unpreconditioned (top), l = 10 (middle) and l = 5 (bottom). Fig. 7 carries the convergence details.

Regularized Burgers equation: In this case A comes from the finite difference analysis of the regularized Burgers equation, defined with \nu > 0 by

\partial u(x,t)/\partial t - u(x,t)\, \partial u(x,t)/\partial x = \nu\, \partial^2 u(x,t)/\partial x^2 \quad \text{for } t > 0 \text{ and } 0 < x < 1,
u(x,0) = u_0(x).    (5.2)
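For illustration only (the paper does not give its discretization), the sketch below assembles the unsymmetric sparse matrix of one semi-implicit finite-difference time step for an equation of the form u_t - u u_x = nu u_xx, with the advection coefficient lagged at the previous time level; grid size, time step and viscosity are placeholder values. At every such step a system with this matrix is handed to the preconditioned Krylov solver.

```python
# Sketch: unsymmetric sparse matrix from one semi-implicit (lagged-coefficient)
# backward-Euler step for u_t - u*u_x = nu*u_xx with u = 0 at x = 0 and x = 1.
# The discretization is an assumption made for illustration, not the paper's scheme.
import numpy as np
import scipy.sparse as sp

n, nu, dt = 1024, 1.0e-2, 1.0e-3
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)                  # interior grid points
u_old = np.sin(np.pi * x)                       # placeholder previous time level

e = np.ones(n)
D1 = sp.diags([-e[:-1], e[:-1]], [-1, 1]) / (2.0 * h)          # central first derivative
D2 = sp.diags([e[:-1], -2.0 * e, e[:-1]], [-1, 0, 1]) / h**2   # second derivative

# (u_new - u_old)/dt - u_old * D1 u_new = nu * D2 u_new   =>   A u_new = u_old
A = (sp.eye(n) - dt * sp.diags(u_old) @ D1 - dt * nu * D2).tocsr()
rhs = u_old.copy()
print(A.shape, A.nnz, "unsymmetric:", abs(A - A.T).max() > 0)
```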
Table 3
Convergence results for pores1: unpreconditioned (M = I), l = 5 (middle block) and l = 3 (bottom block); a '-' entry denotes slow or no convergence

                       Bi-CGSTAB    GMRES(25)    BICG    CGS
M = I                     221          167         77     150
l = 5:
  DWTPer-haar              96           25         51      96
  DWTPer-D4                61           23         41      75
  DWTPer-D6               143            -         89     136
  iDWTPer-haar             22           20         22      22
l = 3:
  DWTPer-haar             198           25         51       -
  DWTPer-D4               198           24        113     360
  DWTPer-D6              1162            -        113       -
  iDWTPer-haar             22           20         22      20
Fig. 7. Level 3 Daubechies 4 transforms of the matrix IBM32. Top left: original matrix; top right: cDWTPer; bottom left: iDWTPer. Convergence behaviour of the IBM32 problem, bottom right: with Bi-CGSTAB.
Table 4
Convergence results for IBM32: unpreconditioned (M = I), l = 10 (middle block) and l = 5 (bottom block); a '-' entry denotes slow or no convergence

                       Bi-CGSTAB    GMRES(25)    BICG    CGS
M = I                      88            -         34      47
l = 10:
  DWTPer-haar              40           74         33      36
  DWTPer-D4                37           70         34      31
  DWTPer-D6                25           57         34      34
  iDWTPer-haar             33           24         26      30
l = 5:
  DWTPer-haar              53          147         38      47
  DWTPer-D4                62          134         38      46
  DWTPer-D6                57          119         34      37
  iDWTPer-haar             52           90         38      44
Fig. 8 shows the solution at time t = 0.18 (without and with preconditioner), where the time step is \Delta t = 10^{-3} and the size is n = 1024. Related problems arise in many branches of science and engineering, particularly fluid mechanics and petroleum reservoir simulation. Finite difference analysis of this transient non-linear PDE model calls the KSMs repeatedly until the steady state solution is attained. Our preconditioners are found to be effective for all the KSM calls. The convergence details corresponding to the matrix obtained in the last stage of the simulation are provided in Table 5. Here we show our results only for the DWTPer-haar, DWTPer-D4 and iDWTPer-haar preconditioners.

We now continue to demonstrate the effectiveness of the preconditioners for two standard and more popular solvers, Bi-CGSTAB and GMRES(k), on matrices from the Tim Davis and Harwell-Boeing collections. The effect of the preconditioners on a non-linear problem modeling natural convection in a porous enclosure is also studied. We recall that Bi-CGSTAB requires two matrix-vector multiplications per iteration, whereas GMRES(k) requires only one matrix-vector multiplication per iteration. The symbol 'o' denotes convergence with GMRES(k), k = 1. For Bi-CGSTAB, matrices like sherman4, pores2, saylr1 and sherman1 from the Harwell-Boeing collection are used. With GMRES(k), matrices like sherman4, sherman1 and saylr1 are used. In addition, matrices related to the simulation of flow in a lid driven cavity (DRICAV) and to a finite element computation of the Navier-Stokes equations from FIDAP are also considered; these two matrices are taken from the Tim Davis collection. Results corresponding to these tests are provided in Tables 6 and 7. It is reported in [3] that it is difficult to obtain convergence for the matrix pores2. For pores2 our method works for bandwidth 20 with Bi-CGSTAB; with GMRES(k), however, we do not obtain convergence. For sherman2, GMRES(25) reduced the relative residual below 10^{-5} after four and seven steps, but never reached 10^{-8}. This may be due to the very large condition number of sherman2, so we have tried a tolerance of 10^{-5}.

Convection in a porous enclosure: To conclude this series of numerical experiments, we considered the problem of convection in a porous enclosure. The free
Fig. 8. Solution of the Burgers equation at time t = 0.18. Top: without preconditioner; bottom: with preconditioner.
convection of heat from a hot vertical wall in a fluid-saturated porous enclosure with insulated top and bottom walls is governed by

\nabla^2 \psi = \frac{\partial T}{\partial y},
\frac{\partial \psi}{\partial y} \frac{\partial T}{\partial x} - \frac{\partial \psi}{\partial x} \frac{\partial T}{\partial y} = \frac{1}{Ra} \nabla^2 T.    (5.3)
Table 5
Convergence results for the Burgers equation: unpreconditioned (M = I) and preconditioned

                  Bi-CGSTAB    GMRES(25)    BICG    CGS
M = I                 29           78         70      61
DWTPer-haar           17           68         25      37
DWTPer-D4              8           14         15      25
iDWTPer-haar          17           66         46      46
Table 6
Convergence results with Bi-CGSTAB: unpreconditioned (M = I) and preconditioned

Matrix        M = I    DWTPer-D4    DWTPer-haar    iDWTPer-haar
Sherman4        84         51            50              49
pores2        1903       1574          1400            2334
saylr1         200        157           197             105
Sherman1      1595        700           300              48
FIDAP         1335        227           112             106
Table 7
Convergence results with GMRES(k): unpreconditioned (M = I) and preconditioned; a '-' entry denotes slow or no convergence

Matrix        M = I    DWTPer-D4    DWTPer-haar    iDWTPer-haar
Sherman4       361        228           228             229
Sherman2       194        152           486              48
Saylr1         156         83           150             142
DRICAV           -        118            47              47
The other vertical wall is maintained at ambient temperature and \psi is taken to be zero on all the walls. The linear system resulting from the finite element analysis of the coupled non-linear PDEs is solved iteratively by GMRES(k) and Bi-CGSTAB. The solution to the non-linear system (5.3) is obtained to an accuracy of 10^{-4} in the relative error of the field variables in seventeen global iterations. At each global iteration a call is made to GMRES(k)/Bi-CGSTAB. Results corresponding to the tenth global iteration are provided in Table 8; similar results regarding the efficiency of the preconditioner are seen at every global iteration. In this case our preconditioner is found not to be effective with the Bi-CGSTAB solver.

The efficiency of our preconditioner is also tested with the CGM solver. Harwell-Boeing matrices related to dynamic analysis in structural engineering, such as BCSSTK01, BCSSTK04 and BCSSTK27, are considered. Also the matrix Plat362, arising in the finite difference analysis of Platzman's oceanographic model and known as a difficult sparse matrix, has been considered for testing. In Table 9 results pertaining to these numerical experiments are provided.
Table 8
Convergence results for the convection problem: unpreconditioned (M = I) and preconditioned

                  GMRES(25)    Bi-CGSTAB
M = I                177           60
DWTPer-D4            148           58
DWTPer-haar          146           56
iDWTPer-haar         146           56
Table 9
Convergence results with CGM: unpreconditioned (M = I) and preconditioned

Matrix        M = I    DWTPer-haar    iDWTPer-haar
BCSSTK01        78          55              51
BCSSTK04       324         123             107
BCSSTK27       575         212             200
Plat362        427         242             230
6. Conclusion

The notion of iDWTPer has been proposed. Using iDWTPer/cDWTPer, a given banded linear system, when projected into wavelet space, retains a banded form. Taking advantage of the compression properties of wavelet transforms, iDWTPer/cDWTPer preconditioners based on the Daubechies family of wavelets have been designed. The proposed class of preconditioners is found to be efficient in accelerating the rate of convergence of iterative solvers like Bi-CGSTAB, GMRES(k), BICG, CGS and CGM. They have been successfully tested on several unsymmetric sparse linear systems from both the Harwell-Boeing and Tim Davis matrix collections. Further, they have been tested on a few non-linear problems, including problems from a CFD context. Overall, the preconditioner based on the Incomplete Discrete Haar Wavelet Transform is relatively more effective in most test cases.
References

[1] M. Benzi, M. Tuma, A sparse approximate inverse preconditioner for nonsymmetric linear systems, SIAM J. Sci. Comput. 19 (1998) 141-183.
[2] T.F. Chan, W.P. Tang, W.L. Wan, Wavelet sparse approximate inverse preconditioners, BIT 37 (1997) 644-660.
[3] M. Grote, T. Huckle, Parallel preconditioning with sparse approximate inverses, SIAM J. Sci. Comput. 18 (3) (1997) 838-853.
[4] K. Chen, Discrete wavelet transforms accelerated sparse preconditioners for dense boundary element systems, Electron. Trans. Numer. Anal. 8 (1999) 138-153.
[5] J. Ford, K. Chen, L. Scales, A new wavelet transform preconditioner for iterative solution of elastohydrodynamic lubrication problems, Int. J. Comput. Math. 75 (2000) 497-513.
[6] J. Ford, K. Chen, Wavelet-based preconditioners for dense matrices with non-smooth local features, J. Numer. Math. 41 (2) (2001) 282-307.
[7] O.M. Nielsen, Wavelets in Scientific Computing, Ph.D. thesis, Technical University of Denmark, Lyngby, 1998.
[8] S.G. Mallat, A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans. PAMI 11 (7) (1989) 674-693.
[9] I.S. Duff, R. Grimes, J.G. Lewis, Users' Guide for the Harwell-Boeing Sparse Matrix Collection, Technical Report RAL-92-086, Rutherford Appleton Laboratory, Chilton, UK, 1992.
[10] Z. Zlatev, Computational Methods for General Sparse Matrices, Kluwer, Dordrecht, The Netherlands, 1991.
[11] D. Miller, Wavelet transforms and linear algebra, M.Sc. dissertation, University of Liverpool, UK, 1995.