Transpose{free Lanczos{type algorithms for ... - Semantic Scholar

Report 3 Downloads 11 Views
Transpose{free Lanczos{type algorithms for nonsymmetric linear systems C. Brezinski



M. Redivo{Zaglia

y

Abstract

The method of Lanczos for solving systems of linear equations is implemented by using recurrence relationships between formal orthogonal polynomials. A drawback is that the computation of the coecients of these recurrence relationships usually requires the use of the transpose of the matrix of the system. Due to the indirect addressing, this is a costly operation. In this paper, a new procedure for computing these coecients is proposed. It is based on the recursive computation of the products of polynomials appearing in their expressions and it does involve the transpose of the matrix. Moreover, our approach allows to implement simultaneously and at a low price a Lanczos{type product method such as the CGS or the BiCGSTAB.

Lanczos method [22] for solving a system of linear equations is based on formal orthogonal polynomials. Such polynomials satisfy, under certain assumptions discussed below, some recurrence relationships. However, the computation of the coecients of these recurrence relationships involves products of the form AT v where v is an arbitrary vector. When the matrix is large and sparse, such products are dicult to compute due to the indirect addressing needed by the structure of A. In some cases (such as the solution of systems of nonlinear equations), although products Av can be computed, A is not available and it is even impossible to compute products AT v. So, it was interesting to derive transpose{free algorithms for implementing the method of Lanczos. In [4], a transpose{free Lanczos/Orthodir algorithm was presented. It was based on a very simple idea known for a long time (see, for example, [2]): the coecients of the recurrence relationships of the formal orthogonal polynomials can be expressed as ratios of products of Hankel determinants and they can be computed by the qd{algorithm of Rutishauser [24]. It was also mentioned in [4] that the same technique could be used for avoiding the use of AT in the other algorithms for the implementation of Lanczos method. All these transpose{free algorithms were tested on numerical examples and the quantities qkn and ekn of the qd{algorithm were computed by several algorithms such as the ", the , the rs and the qd{algorithms and by the recurrence relationship of the Hankel determinants [7]. However, all these procedures were found to be highly numerically unstable and, so, an open question is still to nd a stable way of computing products Aiv and implementing the qd{algorithm. ( )

( )

Laboratoire d'Analyse Numerique et d'Optimisation, Universite des Sciences et Technologies de Lille, 59655{ Villeneuve d'Ascq Cedex, France. E{mail: [email protected] y Dipartimento di Elettronica e Informatica, Universit a degli Studi di Padova, via Gradenigo 6/a, 35131 { Padova, Italy. E{mail: [email protected] 

1

Let us also remind that, as showed in [2, pp.184{189], the topological "{algorithm is a transpose{free method for the implementation of Lanczos method. However, it requires too much storage and is intractable [17]. In this paper we will propose a di erent approach to this problem. It is based on the recursive computation of the products of polynomials appearing in the expressions of the coecients of the recurrence relationships and it does involve the transpose of the matrix. Thus, we are able to produce a transpose{free version of any algorithm for implementing Lanczos method. In this framework, we recover the algorithm of Chan, de Pillis and van der Vorst [13] as a particular case. Usually, the price to pay for a transpose{free algorithm is an additional matrix{vector product but, as we will see, the products of polynomials to be computed are, in many cases, related to some Lanczos{type product methods (also called CGM methods [3]). Thus, our approach allows to implement simultaneously, and at almost no extra cost, a Lanczos{type product method such as the CGS or the BiCGSTAB. Such a coupled implementation is, in fact, cheaper than using the two algorithms separately. Moreover, using the hybrid procedure [5] is quite easy. As we will see, it is possible to implement Lanczos method and a Lanczos{type product method just by adding some instructions (without any additional matrix{vector product) in the code of the product method. Although some new algorithms for the implementation of Lanczos{type product methods are obtained in the sequel, our purpose is to derive transpose{free algorithms for Lanczos method itself. Thus, in passing, known algorithms for product methods will be recovered such as those given in [13], [19] and [23]. Our derivation of the algorithms is based on the interpretation of Lanczos method in terms of formal orthogonal polynomials (FOP). Such polynomials were already known to Lanczos [21], but, although some considerations and algorithms were given in [2] and in [18], their systematical use for obtaining recursive algorithms for implementing Lanczos method seems to have been fully exploited in [12] for the rst time. The method of Lanczos will be presented in Section 1. Section 2 is devoted to formal orthogonal polynomials. Recursive algorithms for the implementation of Lanczos method are given in Section 3 and their transpose{free versions are discussed in Section 4. Numerical experiments end the paper.

1 Lanczos method

Let us consider the p  p system of linear equations Ax = b. The method of Lanczos [22] consists in constructing the sequence of iterates xk 2 IRp de ned by

xk ? x 2 Kk (A; r ) = span(r ; Ar ; : : :; Ak? r ) rk = b ? Axk ? Kk (AT ; y) = span(y; AT y; : : :; AT ?1 y) where x is an arbitrary vector, y an almost arbitrary one and r = b ? Ax . These two conditions de ne entirely the vector xk . Indeed, the rst one yields xk ? x = ?a r ? a Ar ?    ? ak Ak? r : Multiplying both sides by A, adding and subtracting b in the left hand side gives rk = r + a Ar +    + ak Ak r = Pk (A)r 0

0

0

1

0

k

0

0

0

1 0

0

2

1

2

1

0

0

0

0

0

0

0

with

Pk () = 1 + a  +    + ak k : The orthogonality conditions for rk give (AT y; rk) = (y; Airk ) = (y; AiPk (A)r ) = 0; for i = 0; : : : ; k ? 1: (1) Thus, the coecients a ; : : : ; ak are solution of the system a (y; Ai r ) +    + ak (y; Ai kr ) = ?(y; Air ); i = 0; : : : ; k ? 1: Obviously, in practice, solving such systems for increasing values of k is only feasible if their solutions can be obtained recursively, that is, in other words, if the polynomials Pk can be computed recursively. Such a computation is possible since, in fact, the polynomials Pk form 1

i

0

1

1

+1

+

0

0

0

a family of formal orthogonal polynomials as will now be explained.

2 Formal orthogonal polynomials

Let us de ne the linear functional c on the space of polynomials by c(i ) = ci = (y; Air ); i = 0; 1; : : : : So, if p is an arbitrary polynomial, we have c(p) = (y; p(A)r ). Thus, the orthogonality conditions (1) can be written as c(i Pk ) = 0; for i = 0; : : : ; k ? 1: (2) A family of polynomials fPk g satisfying these conditions for all k is called the family of formal orthogonal polynomials (FOP) with respect to the linear functional c [2]. They are de ned up to a multiplying factor which, in our case, is chosen so that Pk (0) = 1 as seen in Section 1. From (2), c(pPk ) = 0 for any polynomial p of degree strictly less than k. The orthogonality conditions (2) can also be written as c(UiPk ) = 0; for i = 0; : : :; k ? 1 where Ui is an arbitrary polynomial of exact degree i. It is well{known that, in the usual case (that is when c can be represented by an integral on the real line with respect to some positive measure), the orthogonal polynomials Pk exist for all k and satisfy a three{term recurrence relationship. Such a recurrence relationship still hold for FOP if, for all k , Pk exists and has degree k exactly. This is true if and only if, 8k , Hk 6= 0 where cn    cn k? ... : Hkn = ... cn k?    cn k? The polynomials Pk can also be recursively computed via their adjacent family that is the family of polynomials fPk g orthogonal with respect to the linear functional c de ned by c (i ) = c(i ) = ci : For these polynomials, let us choose the multiplying factor so that, 8k, the coecient of k in Pk is equal to 1. In that case, the polynomials are said to be monic. Since they are monic, the adjacent polynomials Pk exist under the same condition Hk 6= 0 (see, for example, [9, 12]). If, for some values of k, Hk = 0, then a division by zero (breakdown) occurs in the computation of the coecients of the recurrence relationships. Breakdowns are also possible even if the 0

0

(1)

+

1

( )

+

1

+2

2

(1)

(1)

(1)

+1

+1

(1)

(1)

(1)

(1)

3

preceding condition is satis ed. In the case of a breakdown, the computations can be continued by jumping over the polynomials for which a division by zero arises (see, for example, [8, 9]). In this paper, we will assume that no breakdown occurs, that is, 8k, Pk and Pk exist, have the exact degree k, and can be computed by the recurrence relationship under consideration. There are many possible ways of computing recursively the two families of FOP de ned above, see [2]. Each of these recurrence relationships (or each pair of them) leads to a di erent algorithm for the implementation of Lanczos method. In the sequel, the coecients of the recurrence relationships will be computed by using the orthogonality conditions with respect to two arbitrary families of polynomials, fUk g and fVk g, such that, 8k, Uk and Vk have the exact degree k. We will now present the three main recurrences for computing these FOP in their most general form, that is without specifying, for the moment, the families fUk g and fVk g. Other recurrences could be found in [12]. Since the polynomials Pk form a family of FOP, they satisfy a three{term recurrence relationship Pk () = ( k  + k )Pk () ? k Pk? () (3) for k = 0; 1; : : :, with P () = 1 and P? () = 0. The coecients are given as the solution of the 3  3 system 9 k c(Uk? Pk ) ? k c(Uk? Pk? ) = 0 >= k c(Uk Pk ) + k c(Uk Pk ) ? k c(Uk Pk? ) = 0 > (4) ; k ? k = 1: The recurrence relationship (3) can also be written as " ! #  k k + Pk () = k Pk () ? Pk? () (1)

+1

+1

0

+1

+1

1

1

+1

1

+1

+1

+1

1

1

+1

1

+1

+1

+1

+1

+1

+1

k+1

k+1

1

that is, setting k = k = k ; k = ? k = k and k = ? k +1

+1

+1

+1

+1

+1

+1

+1

Pk () = ?k [( ? k ) Pk () ? k Pk? ()] +1

+1

+1

+1

1

(5)

and the system (4) becomes

k

k k

+1

+1 +1

c(Uk? Pk ) c(Uk? Pk? ) [c(Uk Pk ) ? k c(Uk Pk? )] =c(Uk Pk ) 1

+ :

= = =

1

1

1

+1

k+1

1

(6)

k+1

Implementing the method of Lanczos via these relations is known as the Lanczos/Orthores algorithm. Let us now set Qk () = (?1)k Hk Pk (): (0)

Hk

(1)

(1)

As explained, for example, in [12], the coecients of k in Qk and Pk are equal and Qk is proportional to Pk . We have the following relations between these two families of FOP ) Pk () = Pk () ? 0k Qk () (7) Qk () = Pk () + k Qk () (1)

+1

+1

+1

+1

4

+1

with P () = Q () = 1 and the coecients are given by 0

0

0k k

= c(Uk Pk )=c(Uk Qk ) = ?c(Vk Pk )=c(Vk Qk ):

+1 +1

(8)

+1

Implementing the method of Lanczos via these relations is known as the Lanczos/Orthomin algorithm. Instead of using the polynomials Qk , we can use the polynomials Pk and compute them by their three{term recurrence relationship, that is ) Pk () = Pk () ? k Pk () (9) Pk () = ( + k )Pk () ? k Pk? () (1)

+1 (1) +1

+1

(1)

+1 (1)

+1

(1) 1

with P () = P () = 1 and P? () = 0. The coecients are given by 0

(1) 0

(1) 1

k

k k

+1

+1 +1

= c(Uk Pk )=c(Uk Pk ) = c( Vk? Pk )=c(Vk? Pk? ) = [ k c(Vk Pk? ) ? c( Vk Pk )]=c(Vk Pk ): (1)

2

(1)

1

(1) 1

+1

(1) 1

1

2

(1)

(10)

(1)

Implementing the method of Lanczos via these relations is known as the Lanczos/Orthodir algorithm.

3 Implementation of Lanczos method Let us now use the recurrence relationships given in the previous Section for implementing the method of Lanczos. Other algorithms, based on other recurrences for formal orthogonal polynomials, are given in [12] and studied in more details in [1]. We have rk = Pk (A)r and we set zk = Pk (A)r . Replacing the variable  by the matrix A in these recurrence relationships and multiplying by r leads to various recursive algorithms for computing the sequences (xk ) and (rk ). (1)

0

0

0

3.1 Lanczos/Orthores

Using the relation (5), we obtain the following algorithm

rk xk

+1 +1

= ?k (Ark ? k rk ? k rk? ) = k (rk + k xk + k xk? ) +1

+1

+1

+1

+1

+1

1

1

with

k = (y; AUk? (A)rk )=(y; Uk? (A)rk? )

k = [(y; AUk(A)rk ) ? k (y; Uk (A)rk? )]=(y; Uk(A)rk ) k = 1=( k + k ): For the choice Uk  Pk , this algorithm is called Lanczos/Orthores [28] or BIORES [18]. +1

1

1

+1

+1

+1

+1

+1

5

1

1

3.2 Lanczos/Orthomin

Setting pk = Qk (A)r , the relations (7) lead to rk = rk ? 0k Apk xk = xk + 0k pk pk = r k + k pk with p = r = b ? Ax and 0k = (y; Uk (A)rk )=(y; AUk(A)pk ) k = ?(y; AVk(A)rk )=(y; AVk (A)pk ): This algorithm is due to Vinsome [27]. For the choice Uk  Vk  Pk , it is called Lanczos/Orthomin [28] and is equivalent to the bi{conjugate gradient (BCG) of Lanczos [21, 22] which was written under an algorithmic form by Fletcher [14]. This algorithm is also known under the name of BIOMIN [18]. 0

+1

+1

+1

+1

+1

0

0

+1

+1

0

+1

+1

+1

3.3 Lanczos/Orthodir

We immediately obtain from the relations (9) rk = rk ? k Azk xk = xk + k zk zk = (A + k I )zk ? k zk? with z = r = b ? Ax and z? = 0. Using the de nition of the linear functional c, we have k = (y; Uk (A)rk )=(y; AUk(A)zk )

k = (y; A Vk? (A)zk )=(y; AVk? (A)zk? ) k = [ k (y; AVk (A)zk? ) ? (y; A Vk (A)zk)]=(y; AVk (A)zk): 0

0

0

+1

+1

+1

+1

+1

+1

+1

1

1

+1

2

+1

+1

1

1

+1

1

2

1

For the choice Uk  Vk  Pk this algorithm for implementing the method of Lanczos is known under the name of Lanczos/Orthodir [28] and of BIODIR [18]. All the preceding formulae for the coecients require the computation of products of the form T A v, where v is some vector. Indeed, let us explain this point in the case where Uk () = k . For example, for computing k in Lanczos/Orthodir, we need (y; Akrk ) and (y; Ak zk ). Since the vectors rk and zk depend on k, the computation of Ak rk and Ak zk requires many matrix{vector products and the algorithm becomes untractable in practice. Thus, we will replace the preceding +1 T T scalar products by (A y; rk ) and (A y; zk) respectively that is by (yk ; rk ) and (yk ; zk ) where 8i, yi = AT yi? with y = y. Thus, the price to pay for having a cheap algorithm is using AT , an operation often quite dicult because of the indirect addressing required by the structure of A when the system is large and sparse. The same drawback arises in the computation of the coecients of the other recursive algorithms for implementing the method of Lanczos and again AT has to be used. (1)

+1

+1

k

+1

k

+1

1

0

4 Transpose{free algorithms

We will now discuss how to avoid the use of AT (or A in the case of a complex system) in the computation of the coecients of the recurrence relationships of the orthogonal polynomials given 6

in the preceding Section. The idea consists in computing recursively the products of polynomials appearing in these coecients. Such relations are obtained by multiplying together the recurrence relationships of the corresponding polynomials. Obviously, there are several ways for doing these computations, each of them leading to a di erent algorithm with a di erent numerical behavior. It is not our purpose herein to give all these algorithms but only some of them selected for their properties. Our approach is the most general one since Lanczos method can be implemented by any recurrence relationship(s) (see [12] and [1]) and the polynomials Uk and Vk can also be computed by arbitrary recurrences. Obviously, in this case, that is when the polynomials Uk and Vk are not speci ed, the algorithms obtained require many matrix{vector products but most of them will disappear in the particular cases treated. In what follows, we will restrict ourselves to the three main algorithms for implementing the Lanczos method, namely Lanczos/Orthores, Lanczos/Orthomin and Lanczos/Orthodir, and choose, for Uk and/or Vk , the powers of , Pk (which corresponds to the CGS) and the polynomials corresponding to the BiCGSTAB. As we will see, some choices for these polynomials lead to a coupled implementation of a Lanczos{ type product method at almost no extra cost. Such an idea was already implicitly contained in [12], where determinantal formulae for the residuals and the iterates of all Lanczos{type product methods could be found (see also [11]). A Lanczos{type product method is a method where the residual has the form r~k = Wk (A)Pk (A)r (11) where Wk is an arbitrary polynomial satisfying Wk (0) = 1. This condition is mandatory for being able to recover the vector x~k de ned by r~k = b ? Ax~k without using A? . Thus, for the choice Wk () = k , r~k is not the residual of a Lanczos{type product method. Among the most well{known Lanczos{type product methods are the CGS of Sonneveld [25], the BiCGSTAB of van der Vorst [26] and its variants [19]. Other methods of this type are given in [6] (see [3] for a preliminary version). Even if the polynomials Uk and Vk can be arbitrarily chosen, some choices are more interesting than others. In any case, let us remark that all the algorithms simplify if 8k, Uk  Vk . The simplest choice for Uk and/or Vk is, of course, k . In that case, the recurrence relationship becomes Uk () = Uk () with U () = 1. However, this choice does not lead to a Lanczos{type product method. Another choice is to take Pk and/or Pk to obtain the residual of the CGS. When we choose Uk  Vk , satisfying the recurrence relationship Uk () = (1 ? k )Uk () with U () = 1, and k chosen to minimize kr~k k, we obtain the residual of the BiCGSTAB. It is also possible to assume that Uk and/or Vk satisfy other recurrence relationships as we shall see in [16] and [13]. We shall assume that the polynomials Uk and Vk satisfy three{term recurrence relationships of the form Uk () = (ak  + bk )Uk () ? dk Uk? () (12) 0 0 0 Vk () = (ak  + bk )Vk () ? dk Vk? () for k = 0; 1; : : :, with U? () = V? () = 0, U and V arbitrary nonzero constants, and 8k; ak 6= 0 and a0k 6= 0. This choice includes, as particular cases, Uk () = Uk () and Uk () = (1 ? k )Uk (). 0

1

+1

0

(1)

+1

0

+1

+1

1

+1

+1

+1

+1

+1

+1

+1

+1

+1

1

0

1

1

0

+1

+1

7

+1

If we assume that 8k, dk 6= 0 and/or d0k 6= 0 then, by Shohat{Favard's theorem, the polynomials Uk and/or Vk satisfying (12) are formal orthogonal polynomials with respect to some linear functional. One can take, for example, the Chebyshev polynomials. Due to their minimisation property, this choice could lead to more stable algorithms. One could also, perhaps, take the Faber polynomials corresponding to the spectrum of A. All these questions remain to be studied. Although the polynomials Wk arising in the de nition of the Lanczos{type product methods could also be chosen arbitrarily, taking Wk  Uk or Vk directly leads to product methods. Taking for Wk a polynomial di erent from Uk or Vk may lead to more complicated algorithms. A di erent choice will be showed in Section 4.3.3. On the other hand, for the choice Wk  Uk (resp. Wk  Vk ), the condition Wk (0) = 1 imposes that bk ? dk = 1 (resp. b0k ? d0k = 1) in (12). +1

+1

+1

4.1 Transpose{free Lanczos/Orthores

+1

Let us compute the products involved in (6). In the following tables, the left columns contain the products computed and the right ones the corresponding recurrence relationships. Uk Pk Uk Pk = ?k [( ? k )Uk Pk ? k Uk Pk? ] Uk Pk Uk Pk = (ak  + bk )Uk Pk ? dk Uk? Pk Uk? Pk Uk Pk = ?k [( ? k )Uk Pk ? k Uk Pk? ] Uk Pk? Uk Pk? = (ak  + bk )Uk Pk? ? dk Uk? Pk? Let us set r~k = Uk (A)Pk (A)r u~k = Uk (A)Pk? (A)r s~k = Uk? (A)Pk (A)r q~k = Uk (A)Pk? (A)r : The preceding recurrence relationships give r~k = ?k (Au~k ? k u~k ? k q~k ) u~k = (ak A + bk I )~rk ? dk s~k (13) s~k = ?k [(A ? k I )~rk ? k u~k ] q~k = (ak A + bk I )~uk ? dk r~k? : The products involved in (6) can be computed in a di erent way thus leading to a di erent algorithm. We have Uk Pk Uk Pk = (ak  + bk )Uk Pk ? dk Uk? Pk Uk Pk Uk Pk = ?k [( ? k )Uk Pk ? k Uk Pk? ] Uk Pk? Uk Pk = (ak  + bk )Uk Pk ? dk Uk? Pk Uk? Pk Uk? Pk = ?k [( ? k )Uk? Pk ? k Uk? Pk? ] Let us set r~k = Uk (A)Pk (A)r u~k = Uk (A)Pk? (A)r s~k = Uk? (A)Pk (A)r q~k0 = Uk? (A)Pk (A)r : Thus, we obtain r~k = (ak A + bk I )~sk ? dk q~k0 u~k = (ak A + bk I )~rk ? dk s~k (14) s~k = ?k [(A ? k I )~rk ? k u~k ] q~k0 = ?k [(A ? k I )~sk ? k r~k? ] +1

+1

+1

+1

+1

1

+1

+1

+1

1

+1

+1

+1

+1

+1

+1

1

+1

+1

+1

+1

1

+1

+1

1

1

1

+1

1

1

+1

1

+1

0

1

0

1

0

+1

+1

+1

+1

+1

+1

+1

1

+1

+1

1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

1

+1

+1

+1

+1

0

+1

+1

+1

1

+1

+1

+1

+1

1

+1

+1

+1

+1

+1

1

+1

0

1

0

1

0

1

+1

+1

+1

+1

+1

+1

+1

+1

0

+1

+1

+1

+1

+1

+1

+1

+1

+1

8

1

1

1

1

1

with, from (6),

k

+1

k k

+1

+1

y; As~k) = ((y; r~k? ) ( y; Ar~k) ? k (y; u~k ) = (y; r~k ) = 1=( k + k ): 1

(15)

+1

+1

+1

This algorithm was also given by Ressel and Gutknecht [23] but without any reference to its possible use as a transpose{free algorithm for the implementation of Lanczos method. The iterates and the corresponding residuals of Lanczos method are given by

rk xk

+1 +1

= ?k (Ark ? k rk ? k rk? ) = k (rk + k xk + k xk? ): +1

+1

+1

+1

+1

+1

1

1

Let us now examine in more detail some particular choices of the auxiliary polynomials Uk . If Uk (0) = 1, then, as explained above, r~k is the residual of a Lanczos{type product method which can be implemented simultaneously at almost no extra cost.

4.1.1 TF Lanczos/Orthores and powers For the choice Uk = Uk , it holds 8k; ak = 1, bk = dk = 0. The relations (13) simplify +1

+1

+1

and the vector s~k is not used. We have (since q~k = Au~k ) +1

+1

+1

u~k = Ar~k q~k = Au~k r~k = ?k (~qk ? k u~k ? k q~k ) with (since r~k = As~k and u~k = Ar~k ) k = (y; r~k )=(y; r~k? )

k = (y; u~k )(y;? r~k) (y; u~k) k and k computed as in (15). +1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

1

+1

+1

+1

We have the following TF Lanczos/Orthores algorithm

Algorithm TFres1 (A; b; x ; y) 0

initializations r b ?Ax 0

0

r~ = r r? = x? = r~? = u~ = q~ = 0 for k = 0; 1; 2; : : : until convergence do u~k Ar~k if k > 0 then k (y; r~k)=(y; r~k? ) 0

0

1

1

1

0

0

+1

+1

else  0 end if

1

1

9

+1

(y; u~k ) ? k (y; u~k ) (y; r~k ) 1=( k + k ) ?k (Ark ? k rk ? k rk? ) k (rk + k xk + k xk? ) if krk k  " then stop. q~k Au~k r~k ?k (~qk ? k u~k ? k q~k )

k k rk xk

+1

+1

+1

TF TF

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

1

+1

1

+1

+1

+1

+1

+1

end for

+1

+1

+1

+1

This algorithm needs 3 matrix{vector products and 2 inner products. When using the relations (14), the vector q~k0 is not needed and a quite similar algorithm is obtained. We have u~k = Ar~k s~k = ?k (~uk ? k r~k ? k u~k ) r~k = As~k and we obtain another TF Lanczos/Orthores algorithm named TFres2. It consists in replacing, in the previous pseudo{code, the last two lines by s~k ?k (~uk ? k r~k ? k u~k ) r~k As~k Again this algorithm needs 3 matrix{vector products and 2 inner products. +1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

4.1.2 TF Lanczos/Orthores and BiCGSTAB For the choice Uk = (1 ? k )Uk , we have ak = ?k ; bk = 1 and dk = 0. The +1

+1

+1

+1

+1

+1

parameter k is chosen to minimize the Euclidean norm of r~k and, thus, r~k is the residual of the BiCGSTAB of van der Vorst [26] implemented via the recurrence relationship (5). From (14), we have s~k = ?k (Ar~k ? k r~k ? k u~k ) (16) r~k = s~k ? k As~k u~k = r~k ? k Ar~k and q~k0 is not necessary. Since c(Uk? Pk ) = ?c(Uk Pk )=k , that is (y; As~k ) = ?(y; r~k )=k , and since u~k = r~k? ? k Ar~k? , from (15) we have y; r~k ) k = ?  ((y; r~k? ) k

k = (y; Ar~k) ? k [((y;y;r~r~k?) ) ? k (y; Ar~k? )] : +1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

1

1

1

+1

1

+1

+1

1

1

k

We have (~rk ; r~k ) = (~sk ; s~k ) ? 2 k (~sk ; As~k ) + k (As~k ; As~k ): Thus, the value of k minimizing kr~k k is given by k = (~sk ; As~k )=(As~k ; As~k ): +1

+1

+1

+1

+1

+1

+1

+1

2 +1

+1 2

+1

+1

+1

10

+1

+1

+1

+1

Since r~k = b ? Ax~k , and k ( k + k ) = 1, we have, from (16), +1

b ? Ax~k

+1

+1

+1

?k (Ar~k ? k r~k ? k u~k ) ? k As~k ?k (Ar~k ? k r~k ? k r~k? + k k Ar~k? ) ? k As~k ?k [Ar~k ? k (b ? Ax~k) ? k (b ? Ax~k? ) + k k Ar~k? ] ? k As~k k ( k + k )b ? k [Ar~k ? k Ax~k ? k Ax~k? + k k Ar~k? ] ?k As~k :

= = = =

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

Thus

1

+1

+1

1

+1

+1

+1

+1

+1

+1

1

+1

+1

+1

+1

1

1

+1

+1

+1

1

+1

x~k = k [~rk + k x~k + k (~xk? + k r~k? )] + k s~k : +1

+1

+1

+1

1

1

+1

+1

The TF Lanczos/Orthores algorithm coupled with the BiCGSTAB algorithm is the following

Algorithm TFresBiCGSTAB (A; b; x ; y) 0

initializations r b ?Ax 0

0

r~ = r x~ = x r? = x? = r~? = x~? = u~ = 0  =0 for k = 0; 1; 2; : : : until convergence do if k > 0 then k ?  ((y;y; rr~~k ) ) 0

0

0

0

1

1

1

1

0

0

+1

k?1

k

else  0 end if 1

(y; Ar~k) ? k [(y; r~k? ) ? k (y; Ar~k? )] (y; r~k) 1=( k + k ) ?k (Ark ? k rk ? k rk? ) k (rk + k xk + k xk? ) ?k (Ar~k ? k r~k ? k u~k ) (~sk ; As~k ) (As~k ; As~k ) s~k ? k As~k k [~rk + k x~k + k (~xk? + k r~k? )] + k s~k if krk k  " or kr~k k  " then stop. u~k r~k ? k Ar~k

k k rk xk s~k k r~k x~k

+1

+1

+1

TF TF

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

1

1

+1

+1

+1

+1

+1

+1

+1

+1

+1

1

+1

+1

+1

1

+1

+1

+1

+1

+1

+1

1

1

+1

+1

+1

+1

+1

end for

This algorithm needs 3 matrix{vector products and 4 inner products. Let us mention that, using (13) instead of (14), leads to a more complicated expression for kr~k k and, thus, for k . +1

+1

4.1.3 TF Lanczos/Orthores and CGS For the choice Uk  Pk , we have ak = ?k ; bk = k k and dk = ?k k . +1

+1

+1

+1

+1

+1

+1

+1

+1

+1

r~k is the residual of the CGS of Sonneveld [25], implemented via the recurrence relationship

(5).

11

The expressions for k and k can be simpli ed and, as showed in [12], we have k = ? 1 cc(P(Pk )) (17) k k?

k = c(Pk )=c(Pk ): With this choice, s~k = u~k and the relations (13) and (14) coincide. We have u~k = ?k (Ar~k ? k r~k ? k u~k ) q~k0 = ?k (Au~k ? k u~k ? k r~k? ) (18) 0 r~k = ?k (Au~k ? k u~k ? k q~k ) : From (17) we obtain k = ? 1 (y;(y;r~r~k ) ) k k?

k = (y; Ar~k)=(y; r~k ): The recurrence relationship for x~k cannot be obtained directly from (18). If we set u~k = b ? Ay~k , we have y~k = k (~rk + k x~k + k y~k ) and, since k ( k + k ) = 1 and q~k0 = b ? k A (~uk + k y~k + k x~k? ), we obtain from the relation of r~k given in (18) x~k = k [~uk + k y~k + k k (~uk + k y~k + k x~k? )] : The TF Lanczos/Orthores algorithm coupled with the CGS algorithm is the following Algorithm TFresCGS (A; b; x ; y) +1

+1

2

+1

2

+1

+1

+1

1

2

+1

+1

+1

+1

+1

+1

2

+1

+1

+1

+1

+1

1

+1

1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

1

+1

+1

+1

initializations r b ?Ax 0

+1

+1

+1

+1

+1

+1

+1

0

0

r~ = r x~ = x r? = x? = r~? = x~? = u~ = y~ = q~0 = 0 for k = 0; 1; 2; : : : until convergence do if k > 0 then k ?(y; r~k)= [k (y; r~k? )] 0

0

0

0

1

1

1

0

+1

else



1

1

end if

0

0

1

0

k (y; Ar~k)=(y; r~k ) k 1=( k + k ) rk ?k (Ark ? k rk ? k rk? ) xk k (rk + k xk + k xk? ) u~k ?k (Ar~k ? k r~k ? k u~k ) q~k0 ?k (Au~k ? k u~k ? k r~k? ) r~k ?k (Au~k ? k u~k ? k q~k0 ) y~k k (~rk + k x~k + k y~k ) x~k k [~uk + k y~k + k k (~uk + k y~k + k x~k? )] if krk k  " or kr~k k  " then stop. +1

TF TF

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

1

1

+1

+1

+1

+1

end for

+1

+1

+1

+1

+1

+1

1

+1

+1

+1

+1

+1

+1

+1

+1

+1

1

+1

This algorithm needs 3 matrix{vector products and 2 inner products. 12

1

4.2 Transpose{free Lanczos/Orthomin

Let us now compute the products involved in (8). In the following table we put all the needed products. Uk Pk Uk Pk = (ak  + bk )Uk Pk ? dk Uk? Pk Uk Pk Uk Pk = Uk Pk ? 0k Uk Qk Uk? Pk Uk? Pk = Uk? Pk ? 0k Uk? Qk Uk Qk Uk Qk = Uk Pk + k Uk Qk Uk Qk Uk Qk = (ak  + bk )Uk Qk ? dk Uk? Qk Uk? Qk Uk Qk = Uk Pk + k Uk Qk Vk Pk Vk Pk = (a0k  + b0k )Vk Pk ? d0k Vk? Pk Vk Pk Vk Pk = Vk Pk ? 0k Vk Qk Vk? Pk Vk? Pk = Vk? Pk ? 0k Vk? Qk Vk Qk Vk Qk = Vk Pk + k Vk Qk Vk Qk Vk Qk = (a0k  + b0k )Vk Qk ? d0k Vk? Qk Vk? Qk Vk Qk = Vk Pk + k Vk Qk Let us set r~k = Uk (A)Pk (A)r r^k = Vk (A)Pk (A)r s~k = Uk? (A)Pk (A)r s^k = Vk? (A)Pk (A)r q~k0 = Uk? (A)Pk (A)r q^k0 = Vk? (A)Pk (A)r z~k = Uk (A)Qk(A)r z^k = Vk (A)Qk(A)r v~k = Uk (A)Qk? (A)r v^k = Vk (A)Qk? (A)r p~k = Uk? (A)Qk(A)r p^k = Vk? (A)Qk(A)r : The preceding recurrence relationships give r~k = (ak A + bk I )~sk ? dk q~k0 s~k = r~k ? 0k Az~k q~k0 = s~k ? 0k Ap~k (19) z~k = r~k + k v~k v~k = (ak A + bk I )~zk ? dk p~k p~k = s~k + k z~k r^k = (a0k A + b0k I )^sk ? d0k q^k0 s^k = r^k ? 0k Az^k q^k0 = s^k ? 0k Ap^k z^k = r^k + k v^k v^k = (a0k A + b0k I )^zk ? d0k p^k p^k = s^k + k z^k : Replacing q~k0 and q^k0 by their expressions in r~k and in r^k , respectively allows to suppress them from the relations and we obtain r~k = (ak A + bk I )~sk ? dk (~sk ? 0k Ap~k ) (20) 0 0 0 0 r^k = (ak A + bk I )^sk ? dk (^sk ? k Ap^k ): We have rk = rk ? 0k Apk xk = xk + 0k pk pk = r k + k pk +1

+1

1

+1

+1

+1

+1

+1

+1

1

+1

+1

+1

+1

+1

+1

+1

+1

1

+1

+1

+1

+1

+1

+1

+1

+1

1

1

+1

+1

+1

1

+1

1

+1

+1

+1

+1

1

+1

+1

+1

+1

+1

1

+1

+1

+1

+1

+1

+1

1

1

1

+1

+1

+1

+1

1

+1

+1

+1

+1

+1

1

+1

+1

0

1

0

0

1

+1

1

0

0

1

+1

0

1

0

1

1

0

+1

1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

13

+1

0

0

+1

+1

0

0

with p = r = b ? Ax and, from (8), 0

0

0

0k

(y; r~k ) = (y; Az~k) ( y; = ? (y;AAs^zk^ ) ) :

+1

k

(21)

+1

+1

k

As in the Transpose{free Lanczos/Orthores, we can compute the products in a di erent way, thus leading to a di erent algorithm. Uk Pk Uk Pk = Uk Pk ? 0k Uk Qk Uk Pk Uk Pk = (ak  + bk )Uk Pk ? dk Uk? Pk Vk Pk Vk Pk = Vk Pk ? 0k Vk Qk Vk Pk Vk Pk = (a0k  + b0k )Vk Pk ? d0k Vk? Pk All the others products needed have already been computed. Let us set u~k = Uk (A)Pk? (A)r ; u^k = Vk (A)Pk? (A)r : We obtain the following relations r~k = u~k ? 0k Av~k s~k = r~k ? 0k Az~k u~k = (ak A + bk I )~rk ? dk s~k (22) z~k = r~k + k v~k v~k = (ak A + bk I )~zk ? dk p~k p~k = s~k + k z~k r^k = u^k ? 0k Av^k s^k = r^k ? 0k Az^k u^k = (a0k A + b0k I )^rk ? d0k s^k z^k = r^k + k v^k v^k = (a0k A + b0k I )^zk ? d0k p^k p^k = s^k + k z^k : Replacing u~k and u^k by their expressions in r~k and r^k respectively allows to suppress them from the relations and we obtain r~k = (ak A + bk I )~rk ? dk s~k ? k Av~k (23) 0 0 0 0 r^k = (ak A + bk I )^rk ? dk s^k ? k Av^k : The same idea, but with polynomials Uk and/or Vk satisfying recurrence relationships di erent from (12), is studied in [16] (see also [15]). Let us now look at some particular choices of Uk and Vk in more detail. We will always take Uk  Vk and, so, the tilde and the hat vectors coincide. +1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

1

+1

1

+1

1

+1

+1

1

+1

0

+1

0

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

4.2.1 TF Lanczos/Orthomin and powers For the choice Uk  Vk , Uk = Uk , we have ak = 1, bk = dk = 0. +1

+1

+1

+1

+1

+1

Using the relations (19) and (20), the vector p~k is not used and, since v~k = Az~k, we obtain s~k = r~k ? 0k Az~k r~k = As~k z~k = r~k + k Az~k +1

+1

+1

+1

+1

+1

+1

14

+1

with (since r~k = As~k ) +1

0

+1

k = ?(y; r~k )=(y; Az~k) +1

+1

and k computed as in (21). The TF Lanczos/Orthomin algorithm is the following Algorithm TFmin (A; b; x ; y) +1

0

initializations r b ?Ax p = r~ = z~ = r for k = 0; 1; 2; : : : until convergence do 0

0

0

TF TF

0

0k rk xk

0

0

(y; r~k )=(y; Az~k) rk ? 0k Apk xk + 0k pk if krk k  " then stop. s~k r~k ? 0k Az~k r~k As~k k ?(y; r~k )=(y; Az~k) z~k r~k + k Az~k pk r k + k pk +1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

TF

+1

+1

+1

+1

+1

+1

+1

end for

This algorithm needs 3 matrix{vector products and 2 inner products. When, instead, we consider the relations (22) and (23), we obtain another algorithm, where v~k = Az~k and r~k = Ar~k ? 0k Av~k , but it needs an additional matrix{vector product. +1

+1

+1

+1

4.2.2 TF Lanczos/Orthomin and BiCGSTAB For the choice Uk  Vk , Uk = (1 ? k )Uk , we have ak = ?k ; bk = 1 and dk = 0. +1

+1

+1

+1

+1

+1

+1

+1

In this case, and if k is chosen as above, r~k is the residual of the BiCGSTAB as proposed by van der Vorst in [26] which makes use of the relationship (7). The relations (19) and (20) simplify and, by replacing v~k by its expression in z~k , become s~k = r~k ? 0k Az~k r~k = s~k ? k As~k (24) z~k = r~k + k (~zk ? k Az~k ) with, since c(Uk Pk ) = ?c(Uk Pk )=k , that is (y; As~k ) = ?(y; r~k )=k , ~k ) k =  (y;(ry; Az~k) k and 0k computed as in (21). Since the relation for r~k is the same as (16), we again have k = (~sk ; As~k )=(As~k ; As~k ): De ning x~k as r~k = b ? Ax~k , from (24) we have b ? Ax~k = s~k ? k As~k = r~k ? 0k Az~k ? k As~k = b ? Ax~k ? 0k Az~k ? k As~k : +1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

15

+1

+1

+1

+1

Thus

x~k = x~k + 0k z~k + k s~k : +1

+1

+1

+1

The TF Lanczos/Orthomin algorithm coupled with the BiCGSTAB algorithm is the following

Algorithm TFminBiCGSTAB (A; b; x ; y) 0

initializations r b ?Ax 0

0

p = r~ = z~ = r x~ = x for k = 0; 1; 2; : : : until convergence do 0k (y; r~k )=(y; Az~k) TF rk rk ? 0k Apk TF xk xk + 0k pk s~k r~k ? 0k Az~k (~sk ; As~k ) k (As~k ; As~k ) r~k s~k ? k As~k x~k x~k + 0k z~k + k s~k if krk k  " or kr~k k  " then stop. (y; r~k ) k k (y; Az~k) z~k r~k + k (~zk ? k Az~k ) TF pk r k + k pk 0

0

0

0

0

0

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

end for

+1

This algorithm needs 3 matrix{vector products and 4 inner products. Taking k = ?1 in this algorithm, we recover the TEA2/Orthomin given in [12]. Conversely, the TEA2/Orthomin leads to a transpose{free algorithm for implementing Lanczos method. Similarly, a TEA2/Orthores and a TEA2/Orthodir algorithms could be derived. +1

4.2.3 TF Lanczos/Orthomin (BCG) and CGS For the choice Uk  Vk and Uk = Pk , rk is the residual of the BCG algorithm and r~k +1

+1

+1

+1

is exactly the residual of the CGS proposed by Sonneveld in [25]. Again, with this choice, the expressions for 0k and k can be simpli ed and, as showed in [12], we have +1

+1

0k k

+1 +1

= c(Pk )=c(Qk ) = c(Pk )=c(Pk ): 2

2

2 +1

(25)

2

However, in this case, we cannot directly use the relations given in Section 4.2 because Pk does not satisfy a recurrence relationship of the form (12). Thus we have to compute the products involved in (25) as follows Pk Pk = Pk Pk ? 0k Pk Qk Pk Pk Pk Pk = Pk ? 0k Pk Qk Pk Qk Pk Qk = Pk + k Pk Qk Pk Qk Pk Qk = Pk Qk ? 0k Qk Qk Qk = Pk Qk + k Qk Qk Qk Qk Qk Qk = Pk Qk + k Qk +1

2

2 +1

+1

+1 2

+1 +1

+1 2

+1 2 +1

+1

2 +1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

16

+1

2

+1

+1

2

+1

Let us set

r~k u~k z~k v~k t~k w~k

= Pk (A)r = Pk (A)Pk? (A)r = Pk (A)Qk (A)r = Pk (A)Qk? (A)r = Qk (A)r = Qk? (A)Qk (A)r : The preceding recurrence relationships give

r~k u~k z~k v~k t~k w~k

2

0

1

0

0

1

2

1

= = = = = =

+1 +1 +1 +1 +1 +1

Replacing u~k by its expression in r~k them from the relations and we obtain +1

+1

0

0

0

u~k ? 0k Av~k r~k ? 0k Az~k r~k + k v~k z~k ? 0k At~k z~k + k w~k v~k + k t~k : and w~k by its expression in t~k allows to suppress +1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

r~k = r~k ? 0k A(~zk + v~k ) t~k = z~k + k (~vk + k t~k ): Thus, only the vectors r~k ; v~k ; z~k and ~tk are needed. From (25), we obtain 0k = (y; r~k )=(y; At~k) k = (y; r~k )=(y; r~k ) +1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

and, from (26), it immediately follows

x~k

+1

= x~k + 0k (~zk + v~k ): +1

+1

The TF Lanczos/Orthomin algorithm coupled with the CGS algorithm is the following

Algorithm TFminCGS (A; b; x ; y) 0

initializations r b ?Ax 0

0

p = r~ = z~ = ~t = r x~ = x for k = 0; 1; 2; : : : until convergence do (y; r~k ) 0k (y; At~k) TF rk rk ? 0k Apk TF xk xk + 0k pk v~k z~k ? 0k At~k r~k r~k ? 0k A(~zk + v~k ) x~k x~k + 0k (~zk + v~k ) if krk k  " or kr~k k  " then stop. (y; r~k ) k (y; r~ ) 0

0

0

0

0

0

0

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

k

17

(26)

TF

z~k t~k

+1

+1

pk

+1

end for

r~k + k v~k z~k + k (~vk + k t~k ) r k + k pk +1

+1

+1

+1

+1

+1

+1

+1

+1

This algorithm needs 3 matrix{vector products and 2 inner products and it is identical to the TFBiCG algorithm obtained by Chan, de Pillis and van der Vorst in [13].

4.3 Transpose{free Lanczos/Orthodir

Let us compute the products involved in (10). There are several possibilities which are as follows Uk Pk = (ak  + bk )Uk Pk ? dk Uk? Pk or Uk Pk Uk Pk = Uk Pk ? k Uk Pk Uk Pk Uk Pk = Uk Pk ? k Uk Pk Uk Pk Uk Pk = (ak  + bk )Uk Pk ? dk Uk? Pk Uk? Pk Uk? Pk = Uk? Pk ? k Uk? Pk Uk? Pk Uk Pk = ( + k )Uk Pk ? k Uk Pk? Uk Pk Uk Pk = (ak  + bk )Uk Pk ? dk Uk? Pk Uk Pk = (ak  + bk )Uk Pk ? dk Uk? Pk or Uk Pk Uk Pk = ( + k )Uk Pk ? k Uk Pk? Uk? Pk Uk? Pk = ( + k )Uk? Pk ? k Uk? Pk? Uk Pk? Uk Pk? = (ak  + bk )Uk Pk? ? dk Uk? Pk? Vk? Pk Vk Pk = ( + k )Vk Pk ? k Vk Pk? Vk Pk? Vk Pk = (a0k  + b0k )Vk Pk ? d0k Vk? Pk Vk Pk = (a0k  + b0k )Vk Pk ? d0k Vk? Pk or Vk Pk Vk Pk = ( + k )Vk Pk ? k Vk Pk? Vk? Pk Vk? Pk = ( + k )Vk? Pk ? k Vk? Pk? Vk Pk? Vk Pk? = (a0k  + b0k )Vk Pk? ? d0k Vk? Pk? Let us set r~k = Uk (A)Pk (A)r s~k = Uk? (A)Pk (A)r u~k = Uk (A)Pk? (A)r q~k0 = Uk? (A)Pk (A)r p~k = Uk? (A)Pk (A)r p^k = Vk? (A)Pk (A)r v~k = Uk (A)Pk? (A)r v^k = Vk (A)Pk? (A)r z~k = Uk (A)Pk (A)r z^k = Vk (A)Pk (A)r 0 p^0k = Vk? (A)Pk (A)r p~k = Uk? (A)Pk (A)r v^k0 = Vk (A)Pk? (A)r : v~k0 = Uk (A)Pk? (A)r +1

+1

+1

+1

+1

+1

+1 (1)

1

(1)

+1

(1)

+1

+1

(1) +1 (1) 1 (1)

1 (1) 1 (1)

1

+1

1

+1

(1) +1 (1) 1

+1

+1

+1

1

+1

1

+1

(1)

+1

+1

+1

+1

1

+1

+1

+1

+1 (1)

(1)

(1) 1

(1)

+1

(1) +1

1

(1) +1 1 (1) 1 1 (1) 1 1

(1) 1

1

+1

(1) +1

+1

+1

1

+1

(1)

+1 (1)

+1

+1

+1

(1) 1

+1

+1

1

(1) +1

+1 (1)

1

(1) 1

+1 (1)

+1

+1

(1) +1 (1) +1 (1) 1

+1 (1)

1

+1

+1

(1) +1 (1) +1 (1) +1 +1

+1

+1 (1)

+1

(1)

+1

+1

1

(1) +1 (1) +1 (1) 1

+1

+1 (1)

+1

+1

1 +1 (1) +1 (1) +1 (1) +1 +1

+1

+1

+1

+1

1

1

+1

+1

(1)

(1) +1

1

(1) +1 1 (1) +1 1 1 (1) 1 +1 1 +1

0

1

0

1

1

1

1

+1

0

+1 (1)

(1) 1 (1)

(1) +1 (1) 1

0

0

1

0

0

0

1

0

+1

18

(1)

(1) 1 (1)

(1) +1 (1) 1

0

0

0

0

0

By combining all the possibilities, we obtain 8 di erent algorithms that can be schematized as follows r~k = (ak A + bk I )~sk ? dk q~k0 r~k = u~k ? k Av~k q~k0 = s~k ? k Ap~k u~k = (ak A + bk I )~rk ? dk s~k s~k = r~k ? k Az~k p~k = (A + k I )~zk ? k v~k v~k = (ak A + bk I )~zk ? dk p~k z~k = (ak A + bk I )~pk ? dk p~0k z~k = (A + k I )~vk ? k v~k0 v~k0 = (ak A + bk I )~vk ? dk z~k? p~0k = (A + k I )~pk ? k z~k? p^k = (A + k I )^zk ? k v^k v^k = (ak A + bk I )^zk ? dk p^k z^k = (ak A + bk I )^pk ? dk p^0k z^k = (A + k I )^vk ? k v^k0 p^0k = (A + k I )^pk ? k z^k? v^k0 = (ak A + bk I )^vk ? dk z^k? : We have rk = rk ? k Azk xk = xk + k zk zk = (A + k I )zk ? k zk? with z = r = b ? Ax and z? = 0, and from (10) (y; r~k) k = (y; Az~k)

k = ((y;y;AAz^ p^k )) (27) k? k = k (y; A(y;v^kA) ?z^ )(y; A z^k ) : k Again, let us look at some particular choices of Uk and Vk in more detail. We will always take 8k; Uk  Vk , and hence the hat vectors coincide with the tilde vectors. +1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

0

0

+1

+1

+1

+1

+1

+1

+1

0

+1

+1

1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

1

+1

+1

+1

+1

+1

+1

+1

1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

1

1

1

+1

2

+1

1

2

+1

+1

4.3.1 TF Lanczos/Orthodir and powers For the choice Vk  Uk , and Uk = Uk , we have ak = 1, bk = dk = 0. +1

+1

+1

+1

+1

+1

Choosing, among all the possible preceding algorithms, those with the minimal number of matrix{vector products, we obtain two di erent algorithms. The rst one is based on the following relations v~k = Az~k r~k = Ar~k ? k Av~k z~k = (A + k I )~vk ? k Av~k: Since (y; A p~k ) = (y; Az~k) = (y; v~k ) and (y; A z~k ) = (y; Av~k ), we have k = (y; r~k )=(y; v~k )

k = (y; v~k )=(y; v~k ) ~k) ? (y; Av~k ) : k = k (y; A(vy; v~ ) +1

+1

+1

+1

2

+1 2

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

k+1

The rst TF Lanczos/Orthodir algorithm is the following 19

initializations r b ?Ax 0

Algorithm TFdir1 (A; b; x ; y) 0

0

z = r~ = z~ = r z? = v~ = 0 for k = 0; 1; 2; : : : until convergence do v~k Az~k k (y; r~k )=(y; v~k ) TF rk rk ? k Azk TF xk xk + k zk if krk k  " then stop. r~k Ar~k ? k Av~k if k > 0 then

k (y; v~k )=(y; v~k) 0

0

1

0

0

0

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

else

0 end if 1

TF

k z~k zk

k (y; Av~k) ? (y; Av~k ) (y; v~k ) Av~k + k v~k ? k Av~k Azk + k zk ? k zk? +1

+1

+1

+1

+1

+1

+1

+1

+1

+1

end for

+1

+1

1

This algorithm needs 4 matrix{vector products and 3 inner products. Setting s~k = r~k ? k v~k , r~k can also be computed by r~k = As~k . This algorithm, named TFdir2, di ers from the previous one only for the computation of r~k . It needs the same number of matrix{vector products and of inner products and seems to be slightly more stable than the previous one. +1

+1

+1

+1

+1

+1

+1

4.3.2 TF Lanczos/Orthodir and BiCGSTAB For the choice Uk  Vk , Uk = (1 ? k )Uk , we have ak = ?k ; bk = 1 and dk = 0. +1

+1

+1

+1

+1

+1

+1

+1

Choosing the algorithm needing the minimal computational cost, we use the following relations s~k = r~k ? k Az~k r~k = s~k ? k As~k (28) v~k = z~k ? k Az~k z~k = Av~k + k v~k ? k v~k0 v~k0 = v~k ? k Av~k: (29) Substituting v~k0 in z~k , we have z~k = Av~k + k v~k ? k (~vk ? k Av~k ): Since c( Uk? Pk ) = ?c(Uk Pk )=k , that is (y; A p~k ) = ?(y; Az~k)=k and c( Uk Pk ) = [c(Uk Pk ) ? c(Uk Pk )]=k , that is (y; A z~k ) = [(y; Az~k) ? (y; Av~k )]=k , we have, from (27), y; Az~k)

k = ?  ((y; Az~k? ) k

 k = k k (y; Av~k) ?(y;(y;AAz~z~)k) + (y; Av~k ) k k +1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

2

(1)

1

+1

(1)

+1

+1

+1

+1

(1)

(1)

2

2

+1

+1

+1

+1

2

+1

1

+1

+1

+1

+1

20

+1

(1)

and k computed as in (27). In this case, as in TFminBiCGSTAB, we choose k minimizing kr~k k , that is +1

+1

+1 2

k = (~sk ; As~k )=(As~k ; As~k ) +1

and

+1

+1

+1

+1

x~k = x~k + k z~k + k s~k : +1

+1

+1

+1

A rst TF Lanczos/Orthodir algorithm coupled with the BiCGSTAB is

Algorithm TFdirBiCGSTAB1 (A; b; x ; y) 0

initializations r b ?Ax 0

0

z = r~ = z~ = r x~ = x z? = z~? = v~ = 0  =0 for k = 0; 1; 2; : : : until convergence do k (y; r~k )=(y; Az~k) TF rk rk ? k Azk TF xk xk + k zk s~k r~k ? k Az~k (~sk ; As~k ) k (As~k ; As~k ) r~k s~k ? k As~k x~k x~k + k z~k + k s~k if krk k  " or kr~k k  " then stop. v~k z~k ? k Az~k if k > 0 then

k ?  ((y;y; AAzz~~k) ) 0

0

0

0

0

0

1

1

0

0

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

k?1

k

else

0 end if 1

TF

k z~k zk

+1

+1 +1

end for

k k (y; Av~k) ? (y; Az~k) + (y; Av~k ) k (y; Az~k) Av~k + k v~k ? k (~vk ? k Av~k ) Azk + k zk ? k zk? +1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

1

This algorithm needs 4 matrix{vector products and 5 inner products. The products Av~k, which are needed in this algorithm, can be computed recursively. Indeed, setting w~k = Av~k, we have w~k = Az~k ? k A z~k : In this way, we can use directly for k the relation (27) and we obtain a variant of the previous algorithm that needs again 4 matrix{vector products but now 6 inner products. +1

+1

+1

21

2

initializations r b ?Ax 0

Algorithm TFdirBiCGSTAB2 (A; b; x ; y) 0

0

z = r~ = z~ = r x~ = x z? = z~? = v~ = w~ = 0  =0 for k = 0; 1; 2; : : : until convergence do k (y; r~k )=(y; Az~k) TF rk rk ? k Azk TF xk xk + k zk s~k r~k ? k Az~k (~sk ; As~k ) k (As~k ; As~k ) r~k s~k ? k As~k x~k x~k + k z~k + k s~k if krk k  " or kr~k k  " then stop. if k > 0 then

k ?  ((y;y; AAzz~~k) ) 0

0

0

0

0

0

1

1

0

0

0

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

k?1

k

else

0 end if 1

k v~k w~k z~k zk

+1

+1

TF

+1

+1

+1

end for

k (y; w~k) ? (y; A z~k ) (y; Az~k) z~k ? k Az~k Az~k ? k A z~k w~k + k v~k ? k (~vk ? k w~k ) Azk + k zk ? k zk? 2

+1

+1

+1

+1

+1

2

+1

+1

+1

+1

+1

1

4.3.3 TF Lanczos/Orthodir and CGS For the choice Vk  Pk , we have a0k = 1; b0k = k and d0k = k . With this choice, (1) +1

+1

+1

+1

+1

+1

+1

v^k = p^k , the two expressions for z^k coincide and v^k0 = p^0k . The expressions for k and k can +1

be simpli ed and, as showed in [12], we have

k k

2

2

= c(Pk )=c2 (Pk? ) 2 = ?c( Pk )=c(Pk ) (1)

+1

2

+1

(1)

(1) 1

(1)

+1

(30)

that is

k k

= (y; Az^k)=(y; Az^k? ) = ?(y; A z^k )=(y; Az^k): Now, for the choice Uk = Pk , r~k is the residual of the CGS implemented via the recurrence relationships (10) and s~k = u~k . However, as in the TFminCGS, we cannot use directly the algorithms given in Section 4.3 since Pk is not computed by a relation of the same form as (12). Since k = c(Pk )=c(Pk Pk ) = (y; r~k )=(y; Az~k), we need to compute the following products of polynomials +1

+1

+1

2

1

2

+1

+1

(1)

+1

22

Pk Pk Pk Pk Pk

Pk Pk Pk Pk

2

2 +1

+1 +1

(1)

Pk Pk

or

(1)

+1 +1 +1

= Pk Pk ? k Pk Pk Pk = Pk ? k Pk Pk 2 Pk = Pk Pk ? k Pk Pk = ( + k )Pk Pk ? k Pk Pk? +1

+1

2

+1 (1)

(1)

(1)

+1

(1) +1

(1)

+1 (1)

+1

+1

(1)

+1

+1

(1) 1

Pk Pk = Pk Pk ? k Pk Pk Pk Pk? = Pk Pk? ? k Pk Pk? Pk Pk = ( + k )Pk Pk ? k Pk Pk? (1) +1 (1) +1 1 (1) +1

(1) +1

+1

Pk Pk? Pk Pk

(1) +1 1 (1) +1

(1) 1

2

(1)

+1 (1)

+1

(1)

+1

(1) +1

(1) 1

(1) 1

+1

In addition, we need the products Pk ; Pk Pk? (and consequentely for computing these products, also Pk? Pk ), but they correspond respectively to z^k , v^k and v^k0 . Thus, nally, we have the two following algorithms (1)

(1) 1

(1)

(1) 1

(1) +1

u~k ? k Av~k r~k ? k Az~k z~k ? k Az^k Az^k + k z^k ? k v^k (A + k I )^vk ? k v^k0 (A + k I )^vk ? k z^k? z~k ? k v~k0 z~k = p~k ? k Av^k v~k0 p~k = (A + k I )~zk ? k v~k : Replacing u~k by its expression in r~k , v~k0 by its expression in z~k (or p~k in the second form of z~k ) and v^k0 by its expression in z^k allows to suppress them from the relations, and we r~k u~k v~k v^k z^k v^k0 = (A + k I )~vk = v~k ? k Av^k +1

+1

+1

+1

+1

+1

+1

= = = = = =

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

1

+1

+1

+1

+1

obtain

+1

+1

+1

+1

+1

+1

+1

+1

+1

r~k z^k

+1 +1

= r~k ? k A(~zk + v~k ) = (A + k I )(^vk ? k v^k ) + k z^k? +1

(31)

+1

+1

+1

2 +1

+1

1

= (A + k I )~vk ? k (~vk ? k Av^k ) or equivalently z~k = (A + k I )~zk ? k v~k ? k Av^k : Summing up the two relations (32), we have

z~k

z~k

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

9 > = > ;

(32)

= 1=2(A + k I )(~zk + v~k ) ? k v~k ? k =2A(^vk ? k v^k ):

+1

+1

+1

+1

+1

2

+1

+1

Since c(Pk? Pk ) = 0 and c( Pk Pk? ) = c(Pk ), we have (y; Ap~k ) = 0, (y; A v^k ) = (y; Az^k). Thus, from z~k = p~k ? k Av^k , it follows k = ?  ((y;y;r~Ak z)^ ) : k k Since v~k = z~k ? k Az^k and (y; v~k ) = 0, it is also possible to compute the coecient k directly from this relation and we have k = ((y;y;Az~z^k )) : k It seems that, using this relation, the algorithm is more stable. 1

(1)

2

(1)

(1) 1

(1)

2

+1

+1

+1

+1

+1

+1

23

From (31), we immediately obtain

x~k

= x~k + k (~zk + v~k ):

+1

+1

+1

In this algorithm, the products Av^k which are needed can be computed recursively. Indeed, setting w^k = Av^k, we have w^k = A z^k + k Az^k ? k w^k : We also set z~ = z~k + v~k , v^ = v^k ? k v^k and w^ = w^k ? k w^k . The rst TF Lanczos/Orthodir algorithm coupled with the CGS algorithm is the following. 2

+1

+1

+1

+1

+1

+1

+1

+1

Algorithm TFdirCGS1 (A; b; x ; y) 0

initializations r b ?Ax 0

0

z = r~ = z~ = z^ = r x~ = x z? = z^? = v~ = v^ = w^ = 0  = ?1 for k = 0; 1; 2; : : : until convergence do (y; z~k ) k ?  ((y;y;r~Ak )z^ ) or k (y; Az^k) k k TF rk rk ? k Azk TF xk xk + k zk v~k z~k ? k Az^k z~ z~k + v~k r~k r~k ? k Az~ x~k x~k + k z~ if krk k  " or kr~k k  " then stop. k ?(y; A z^k )=(y; Az^k) if k > 0 then

k (y; Az^k)=(y; Az^k? ) 0

0

0

0

0

0

0

1

1

0

0

0

0

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

2

+1

+1

1

else

0 end if 1

v^k Az^k + k z^k ? k v^k v^ v^k ? k v^k w^k = A z^k + k Az^k ? k w^k w^ w^k ? k w^k z^k w^ + k v^ + k z^k? 1 (A + I ) z~ ? v~ ? k w^ z~k k k k 2 2 zk (A + k I ) zk ? k zk? +1

+1

+1

+1

2

+1

+1

TF

+1

+1

end for

+1

+1

+1

+1

+1

+1

2 +1

+1

1

+1

+1

+1

+1

1

This algorithm will be denoted by 1.a with the rst expression for k , and by 1.b with the second one. +1

Now if we choose Uk  Vk  Pk , we have ak = 1; bk = k and dk = k , the hat vectors coincide with the tilde vectors, v~k = p~k and the two expression for z~k coincide. The +1

+1

(1) +1

+1

24

+1

+1

+1

+1

algorithms become r~k = (A + k I )~sk ? k q~k0 r~k = u~k ? k Av~k q~k0 = s~k ? k Av~k u~k = (A + k I )~rk ? k s~k s~k = r~k ? k Az~k (33) v~k = Az~k + k z~k ? k v~k z~k = (A + k I )~vk ? k v~k0 v~k0 = (A + k I )~vk ? k z~k? : As above, the expressions for k and k simplify and, from (30), we have

k = (y; Az~k)=(y; Az~k? ) k = ?(y; A z~k )=(y; Az~k): and always k = (y; r~k )=(y; Az~k): The vectors r~k obtained by this algorithm are not the residuals of a product method since Pk (0) 6= 1. For obtaining the residual of the CGS we must compute the product Pk (which corresponds to Wk  Pk in (11)). We have +1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

1

+1

+1

1

2

+1

+1

(1)

2

Pk = Pk ? 2k Pk Pk + k  Pk 2

2 +1

(1)

+1

2

+1

2

(1)

2

2

where the products Pk Pk and Pk correspond to the vectors r~k and z~k . Setting rk = Pk (A)r we have the additional relation (34) rk = rk ? 2k Ar~k + k A z~k Substituting, in the rst relation for r~k given by (33), the expression of q~k0 , we obtain r~k = As~k + k s~k ? k (~sk ? k Av~k): Thus we need to compute recursively the products As~k and Av~k. Indeed, setting s~0k = As~k , we have s~0k = Ar~k ? k A z~k and setting w~k = Av~k, we have w~k = A z~k + k Az~k ? k w~k : Substituting v~k0 in z~k we have z~k = Av~k + k v~k ? k (Av~k + k v~k ? k z~k? ): From (34) we have rk = rk ? k Ar~k ? k A(~rk ? k Az~k ) = rk ? k Ar~k ? k As~k : Thus xk = xk + k r~k + k s~k : Thus we obtain another transpose-free Lanczos/Orthodir, coupled with the GCS. (1)

(1)

2

+1

0

2

+1

+1

2

+1

+1

+1

+1

+1

+1

+1

2

+1

+1

+1

2

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

25

+1

+1

+1

+1

+1

1

Algorithm TFdirCGS2 (A; b; x ; y) 0

initializations r b ?Ax 0

0

z = r = r~ = z~ = r x = x z? = z~? = v~ = s~ = w~ = 0 for k = 0; 1; 2; : : : until convergence do (y; r~k) k (y; Az~k) TF rk rk ? k Azk TF xk xk + k zk s~k r~k ? k Az~k Ar~k ? k A z~k s~0k rk rk ? k Ar~k ? k s~0k xk xk + k r~k + k s~k if krk k  " or krk k  " then stop. k ?(y; A z~k )=(y; Az~k) if k > 0 then

k (y; Az~k)=(y; Az~k? ) 0

0

0

0

0

0

0

1

1

0

0

0

+1

+1

+1

+1

+1

+1

+1

2

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

2

+1

+1

+1

1

else

0 end if 1

r~k s~0k + k s~k ? k (~sk ? k w~k ) v~k Az~k + k z~k ? k v~k w~k = A z~k + k Az~k ? k w~k z~k w~k + k v~k ? k (w~k + k v~k ? k z~k? ) zk (A + k I ) zk ? k zk? +1

+1

+1

+1

+1

+1

2

+1

TF

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

+1

end for

+1

+1

+1

+1

1

1

Now, if we consider the second relation for r~k , given by (33), and we substitute in it the expression given for u~k , we have r~k = Ar~k + k r~k ? k s~k ? k Av~k : From (34), we directly have xk = xk + 2k r~k ? k Az~k: Thus we obtain a third transpose-free Lanczos/Orthodir, coupled with the GCS. +1

+1

+1

+1

+1

+1

+1

+1

2

+1

Algorithm TFdirCGS3 (A; b; x ; y) 0

initializations r b ?Ax 0

0

z = r = r~ = z~ = r x = x z? = z~? = v~ = s~ = w~ = 0 for k = 0; 1; 2; : : : until convergence do (y; r~k) k (y; Az~ ) 0

0

0

0

0

0

0

1

+1

1

0

0

0

k

26

+1

TF TF

rk ? k Azk xk + k zk rk ? 2k Ar~k + k A z~k xk + 2k r~k ? k Az~k if krk k  " or krk k  " then stop. k ?(y; A z~k )=(y; Az~k) if k > 0 then

k (y; Az~k)=(y; Az~k? )

rk xk rk xk

+1

+1

+1

+1

+1

+1

2

+1

+1

+1

+1

+1

else

1

end if

2

+1

2

+1

1

0

w~k = A z~k + k Az~k ? k w~k r~k Ar~k + k r~k ? k s~k ? k w~k v~k Az~k + k z~k ? k v~k z~k w~k + k v~k ? k (w~k + k v~k ? k z~k? ) s~k r~k ? k Az~k zk (A + k I ) zk ? k zk? +1

2

+1

+1

+1

+1

+1

+1

+1

+1

+1

TF

2

+1

+1

+1

+1

end for

+1

+1

+1

+1

+1

+1

+1

1

+1

+1

+1

1

All the algorithms of this Section need 4 matrix{vector products and 3 inner products.

5 Numerical results We will now present the results obtained, using the Matlab versions of the algorithms, with various systems, all of dimension 200. The solution was always chosen randomly and the right hand side b was computed accordingly. All the algorithms were started from x = 0, with y = r . The curves represent, in a logarithmic scale, the norm of the residuals. 0

5.1 Example 1

Let us consider the matrix

0 2 B B 0 B B B1 A = BB B B B @

1 2 0 ...

1 2 ... ...

1 ... ... 1

... ... 0

1 CC CC CC CC : C 1C A 2

0

0

Its condition number is 2:91. The conclusions of the numerical results are the following 1. For Lanczos method, the results with Uk and Vk = k are not as good as with the other choices (see Figure 1). 2. In the algorithm TFdirCGS1, the computation of k by the second formula (TFdirCGS1.b) leads to better results either for the TF Lanczos method (see Figure 2) or for the CGS itself. The results of TFdirCGS1.b are quite similar to the results obtained by the other variants (TFdirCGS2 and TFdirCGS3). +1

27

10 11

norm of the residual

10 8

10 5

10 2

10 -1

10 -4

0

20

40

60

80

100

120

140

160

180

200

iteration

Figure 1: Example 1. TF Lanczos/Orthores TFres1 (solid line), TF Lanczos/Orthomin (dashed line), TF Lanczos/Orthodir TFdir2 (dotdash line), all with powers 10 1 10 0

norm of the residual

10 -1 10 -2 10 -3 10 -4 10 -5 10 -6 10 -7

0

10

20

30

40

50

60

iteration

Figure 2: Example 1. TF Lanczos/Orthodir coupled with the CGS, Algorithms TFdirCGS1.a (solid line) and TFdirCGS1.b (dashed line)

28

10 1 10 0 10 -1

norm of the residual

10 -2 10 -3 10 -4 10 -5 10 -6 10 -7 10 -8 10 -9

0

10

20

30

40

50

60

iteration

Figure 3: Example 1. TF Lanczos/Orthores (solid line), TF Lanczos/Orthomin (solid line), TF Lanczos/Orthodir TFdirCGS1.b (dotdash line), all coupled with the CGS 10 1 10 0 10 -1

norm of the residual

10 -2 10 -3 10 -4 10 -5 10 -6 10 -7 10 -8 10 -9

0

10

20

30

40

50

60

iteration

Figure 4: Example 1. Lanczos/Orthores and TF Lanczos/Orthores coupled with the CGS

29

3. The best algorithms for implementing the TF Lanczos method with the choice of Uk and Vk corresponding to the CGS seem to be Orthores and Orthomin (see Figure 3 where the curves coincide). Orthodir seems to be always less precise. 4. Comparing the algorithms using AT with the various implementations of the method of Lanczos via CGS shows that the results for Orthores (see Figure 4) and also for Orthomin coincide while, for Orthodir, the transpose{free algorithm is not as good as the algorithm using the transpose (see Figure 5). 10 4

norm of the residual

10 1

10 -2

10 -5

10 -8

10 -11

0

20

40

60

80

100

120

140

160

180

200

iteration

Figure 5: Example 1. Lanczos/Orthodir (solid line) and TF Lanczos/Orthodir TFdirCGS1.b coupled with the CGS (dashed line)

5.2 Example 2

Let us now consider the matrix redhe of the MATLAB Test Matrix Toolbox [20]. Its condition number is 458:8. For this example, the conclusions are the following 1. The 3 di erent implementations of the BiCGSTAB give quite similar results but the recurrence relationships of Orthomin are a little bit better at the end of the iterations (see Figure 6). 2. The transpose{free Lanczos method coupled with CGS based on Orthores and Orthomin give almost the same results but Orthodir (TFdirCGS1.b) is unstable (see Figure 7). The same conclusions hold for the implementations of the CGS itself. 3. The implementations of Lanczos method making use of AT give quite di erent results (see Figure 8, for Orthomin). 4. The iterative residuals of the BiCGSTAB (that is those computed recursively) obtained by our algorithm TFresBiCGSTAB and those obtained by the BIOStab of [23] are almost the 30

Figure 6: Example 2. BiCGSTAB obtained with TF Lanczos/Orthores (solid line), TF Lanczos/Orthomin (dashed line) and TF Lanczos/Orthodir (dot-dash line). [Plot: norm of the residual versus iteration.]

Figure 7: Example 2. TF Lanczos/Orthores (solid line), TF Lanczos/Orthomin (dashed line), TF Lanczos/Orthodir TFdirCGS1.b (dot-dash line), all coupled with the CGS. [Plot: norm of the residual versus iteration.]


Figure 8: Example 2. Lanczos/Orthomin (solid line) and TF Lanczos/Orthomin coupled with the CGS (dashed line). [Plot: norm of the residual versus iteration.]

They reach approximately 10^{-14} at iteration 26. However, the actual residuals (that is, those computed by the formula r_k = b - Ax_k) stagnate around 10^{-2} after iteration 15 for both methods. On the other hand, using our algorithm TFminBiCGSTAB leads to iterative and actual residuals for the BiCGSTAB which are almost identical and which reach the value 10^{-14} at iteration 20 (see Figure 9).
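As announced above, here is a small MATLAB sketch of the kind of check behind point 4. It builds the Redheffer matrix (in the Test Matrix Toolbox [20] this is redheff(n); in current MATLAB it should be available as gallery('redheff',n), which is assumed here) and compares the recursively updated residual with the true residual b - A*x along a textbook BiCGSTAB iteration [26]. This is not the paper's TFminBiCGSTAB; it only illustrates how the two residual histories are monitored.

n  = 200;
A  = gallery('redheff', n);            % assumed equivalent to redheff(n) of [20]
xs = rand(n,1);  b = A*xs;             % random solution, consistent right-hand side
x  = zeros(n,1);
r  = b - A*x;  rt = r;                 % rt plays the role of y = r0
p  = zeros(n,1);  v = zeros(n,1);
rho = 1;  alpha = 1;  omega = 1;
maxit = 35;
iter_res = zeros(maxit,1);  true_res = zeros(maxit,1);
for k = 1:maxit                        % standard BiCGSTAB recurrences [26]
    rho_new = rt'*r;
    beta  = (rho_new/rho)*(alpha/omega);
    p     = r + beta*(p - omega*v);
    v     = A*p;
    alpha = rho_new/(rt'*v);
    s     = r - alpha*v;
    t     = A*s;
    omega = (t'*s)/(t'*t);
    x     = x + alpha*p + omega*s;
    r     = s - omega*t;               % recursively updated ("iterative") residual
    rho   = rho_new;
    iter_res(k) = norm(r);
    true_res(k) = norm(b - A*x);       % actual residual
end
semilogy(1:maxit, iter_res, '-', 1:maxit, true_res, '--')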

6 Conclusions

In this paper, we have shown how to implement the method of Lanczos for solving nonsymmetric linear systems without using the transpose of the matrix. Our approach is based on the recursive computation of the products of polynomials appearing in the recurrence relationships of the formal orthogonal polynomials underlying the method of Lanczos. In fact, there are many such transpose-free algorithms. In some cases, these algorithms allow a coupled implementation of a Lanczos-type product method at almost no extra cost, which consists, in fact, in adding a few instructions (with no additional matrix-vector product) to the code of the product method. In this paper, we gave some of these algorithms and performed a few numerical experiments. It seems that the most stable algorithms are those based on the recurrence relationships of Lanczos/Orthomin. The least stable algorithms are those programmed via Lanczos/Orthodir and, moreover, they need an extra matrix-vector product. Transpose-free versions of the other algorithms given in [1] for implementing the method of Lanczos can be obtained similarly. They remain to be studied. The treatment of breakdowns and near-breakdowns in all these algorithms still has to be done, following the procedures described in [8, 9, 6]. It will also consist in adding some instructions to the codes of the corresponding product method. This will be the subject of a future work. Transpose-free versions of the new look-ahead algorithms described in [10] are also under consideration.
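To make the remark about coupled implementations concrete, here is a textbook CGS loop [25], assuming A, b, an initial guess x and maxit are given; the comment only marks where the few extra instructions of a coupled transpose-free Lanczos implementation would be inserted. The actual coupled updates are those of Section 4 and are not reproduced here.

r  = b - A*x;  rt = r;                 % rt plays the role of y
u  = r;  p = r;
rho = rt'*r;
for k = 1:maxit                        % standard CGS recurrences [25]
    v     = A*p;
    alpha = rho/(rt'*v);
    q     = u - alpha*v;
    w     = u + q;
    Aw    = A*w;
    x     = x + alpha*w;               % CGS iterate
    r     = r - alpha*Aw;              % CGS residual
    % <-- a coupled transpose-free Lanczos iterate would be updated here,
    %     reusing v and Aw, with no additional matrix-vector product
    rho_new = rt'*r;
    beta  = rho_new/rho;
    u     = r + beta*q;
    p     = u + beta*(q + beta*p);
    rho   = rho_new;
end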

Figure 9: Example 2. Iterative (solid line) and actual (dashed line) residuals for the BiCGSTAB with TFminBiCGSTAB. [Plot: norm of the residual versus iteration.]

References

[1] C. Baheux, New implementations of Lanczos method, J. Comput. Appl. Math., 57 (1995) 3-15.

[2] C. Brezinski, Padé-type Approximation and General Orthogonal Polynomials, ISNM vol. 50, Birkhäuser, Basel, 1980.

[3] C. Brezinski, CGM: a whole class of Lanczos-type solvers for linear systems, Note ANO-253, Laboratoire d'Analyse Numérique et d'Optimisation, Université des Sciences et Technologies de Lille, November 1991.

[4] C. Brezinski, A transpose-free Lanczos/Orthodir algorithm for linear systems, C.R. Acad. Sci. Paris, Sér. I, 324 (1997) 349-354.

[5] C. Brezinski, M. Redivo-Zaglia, Hybrid procedures for solving linear systems, Numer. Math., 67 (1994) 1-19.

[6] C. Brezinski, M. Redivo-Zaglia, Look-ahead in Bi-CGSTAB and other product methods for linear systems, BIT, 35 (1995) 169-201.

[7] C. Brezinski, M. Redivo-Zaglia, Transpose-free implementations of Lanczos's method for nonsymmetric linear systems, Note ANO-372, Laboratoire d'Analyse Numérique et d'Optimisation, Université des Sciences et Technologies de Lille, 1997.

[8] C. Brezinski, M. Redivo-Zaglia, H. Sadok, Avoiding breakdown and near-breakdown in Lanczos type algorithms, Numerical Algorithms, 1 (1991) 261-284.

[9] C. Brezinski, M. Redivo-Zaglia, H. Sadok, A breakdown-free Lanczos type algorithm for solving linear systems, Numer. Math., 63 (1992) 29-38.

[10] C. Brezinski, M. Redivo-Zaglia, H. Sadok, New look-ahead Lanczos-type algorithms for linear systems, submitted.

[11] C. Brezinski, H. Sadok, Some vector sequence transformations with applications to systems of equations, Numerical Algorithms, 3 (1992) 75-80.

[12] C. Brezinski, H. Sadok, Lanczos-type algorithms for solving systems of linear equations, Appl. Numer. Math., 11 (1993) 443-473.

[13] T.F. Chan, L. de Pillis, H. van der Vorst, A transpose-free squared Lanczos algorithm and application to solving nonsymmetric linear systems, Numerical Algorithms, to appear.

[14] R. Fletcher, Conjugate gradient methods for indefinite systems, in Numerical Analysis, Dundee 1975, G.A. Watson ed., LNM vol. 506, Springer, Berlin, 1976, pp. 73-89.

[15] D.R. Fokkema, Subspace Methods for Linear, Nonlinear, and Eigen Problems, Thesis, University of Utrecht, 1996.

[16] D.R. Fokkema, G.L.G. Sleijpen, H.A. van der Vorst, Generalized conjugate gradient squared, J. Comput. Appl. Math., 71 (1996) 125-146.

[17] W. Gander, G.H. Golub, D. Gruntz, Solving linear equations by extrapolation, in Supercomputing, J.S. Kovalik ed., Springer-Verlag, Berlin, 1989, pp. 279-293.

[18] M.H. Gutknecht, The unsymmetric Lanczos algorithms and their relations to Padé approximation, continued fractions, and the qd algorithm, in Proceedings of the Copper Mountain Conference on Iterative Methods, Copper Mountain, Colorado, April 1-5, 1990, vol. 2, unpublished.

[19] M.H. Gutknecht, Variants of BiCGStab for matrices with complex spectrum, SIAM J. Sci. Comput., 14 (1993) 1020-1033.

[20] N.J. Higham, The Test Matrix Toolbox for MATLAB (Version 3.0), Numerical Analysis Report No. 276, Department of Mathematics, The University of Manchester, September 1995.

[21] C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, J. Res. Natl. Bur. Stand., 45 (1950) 255-282.

[22] C. Lanczos, Solution of systems of linear equations by minimized iterations, J. Res. Natl. Bur. Stand., 49 (1952) 33-53.

[23] K.J. Ressel, M.H. Gutknecht, QMR-smoothing for Lanczos-type product methods based on three-term recurrences, to appear.

[24] H. Rutishauser, Der Quotienten-Differenzen-Algorithmus, Z. Angew. Math. Phys., 5 (1954) 233-251.

[25] P. Sonneveld, CGS, a fast Lanczos-type solver for nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 10 (1989) 35-52.

[26] H.A. van der Vorst, Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 13 (1992) 631-644.

[27] P.K.W. Vinsome, Orthomin, an iterative method for solving sparse sets of simultaneous linear equations, in Proceedings 4th Symposium on Reservoir Simulation, Society of Petroleum Engineers of AIME, 1976, pp. 149-159.

[28] D.M. Young, K.C. Jea, Generalized conjugate-gradient acceleration of nonsymmetrizable iterative methods, Linear Algebra Appl., 34 (1980) 159-194.
