INEXACT PRECONDITIONED CONJUGATE GRADIENT METHOD WITH INNER-OUTER ITERATION

Gene H. Golub*
Qiang Ye†
Abstract
An important variation of preconditioned conjugate gradient algorithms is inexact preconditioner implemented with inner-outer iterations [5], where the preconditioner is solved by an inner iteration to a prescribed precision. In this paper, we formulate an inexact preconditioned conjugate gradient algorithm for a symmetric positive definite system and analyze its convergence property. We establish a linear convergence result using a local relation of residual norms. We also analyze the algorithm using a global equation and show that the algorithm may have the superlinear convergence property when the inner iteration is solved to high accuracy. The analysis is in agreement with observed numerical behaviour of the algorithm. In particular, it suggests a heuristic choice of the stopping threshold for the inner iteration. Numerical examples are given to show the effectiveness of this choice and to compare with the convergence bound.
1 Introduction

Iterative methods for solving linear systems are usually combined with a preconditioner that can be easily solved. For some practical problems, however, a natural and efficient choice of preconditioner may be one that cannot be solved easily by a direct method and thus may itself require an iterative method (called the inner iteration) to solve the preconditioned equations. There also exist cases where the matrix operator contains inverses of some other matrices, an explicit form of which is not available. Then the matrix-vector products can only be obtained approximately through an inner iteration. The linear systems arising from saddle point problems [3] are one such example. For these types of problems, the original iterative method will be called the outer iteration, and the iterative method used for solving the preconditioner or forming the matrix-vector products is called the inner iteration.

A critical question in the use of inner-outer iterations is to what precision the preconditioner should be solved, i.e., what stopping threshold should be used in the inner iteration. Clearly, a very high precision will render the outer iteration close to the exact case, while a very low one could make the outer iteration irrelevant.
* Address: Scientific Computing and Computational Mathematics Program, Department of Computer Science, Stanford University, Stanford, CA 94305, USA. E-mail: [email protected]. Research supported in part by National Science Foundation Grant DMS-9403899.
† Address: Department of Applied Mathematics, University of Manitoba, Winnipeg, Manitoba, Canada R3T 2N2. E-mail: [email protected]. Research supported by Natural Sciences and Engineering Research Council of Canada.
An optimal choice would allow the stopping threshold to be as large as possible, in order to reduce the cost of the inner iteration, while maintaining the convergence characteristic of the outer iteration. In other words, we wish to combine the inner and outer iterations so that the total number of operations is minimized.

An answer to the above question requires understanding how the accuracy of the inner iteration affects the convergence of the outer iteration. This has been studied by Golub and Overton [5, 6] for the Chebyshev iteration and the Richardson iteration, by Munthe-Kaas [8] for preconditioned steepest descent algorithms, by Elman and Golub [3] for the Uzawa algorithm for saddle point problems, and by Giladi, Golub and Keller [4] for the Chebyshev iteration with a varying threshold. Golub and Overton also observed an interesting phenomenon for the preconditioned conjugate gradient algorithm (as the outer iteration): the convergence of CG could be maintained for very large stopping thresholds in the inner iteration, yet the convergence rate may be extremely sensitive to a change of the threshold at a certain point. Given that the known convergence properties of CG depend strongly on a global minimization property, the phenomenon found in [5, 6] seems very surprising, and it makes the inexact preconditioned conjugate gradient method an attractive option for implementing preconditioners. However, there has been no theoretical analysis to explain these interesting phenomena, and the extreme sensitivity to the threshold makes the method hard to implement in practice. The present paper is an effort in this direction.

We shall formulate and analyze an inexact preconditioned conjugate gradient method for a symmetric positive definite system. By establishing a local relation between consecutive residual norms, we prove a linear convergence property with a bound on the rate, and we illustrate that the convergence rate is relatively insensitive to the change of threshold up to a certain point. In particular, the result is used to arrive at a heuristic choice of the stopping threshold. We also show, using a global relation as in [9], that the algorithm may have the superlinear convergence property when the global orthogonality is nearly preserved, which usually occurs with smaller thresholds and shorter iterations.

The paper is organized as follows. In section 2, we present the inexact preconditioned conjugate gradient algorithm and some of its properties, together with two numerical examples illustrating its numerical behaviour. We then give in section 3 a local analysis, showing the linear convergence property, and in section 4 a global analysis, showing the superlinear convergence property. Finally, we give some numerical examples in section 5 to illustrate our results.

We shall use standard notation of numerical analysis. The $M$-norm $\|\cdot\|_M$ and the $M$-inner product $\langle\cdot,\cdot\rangle_M$ (for $M>0$) are defined by $\|v\|_M = \sqrt{v^T M v}$ and $\langle u, v\rangle_M = u^T M v$ respectively. $\mathrm{cond}(A)$ denotes the spectral condition number of a matrix $A$, and $A^+$ denotes the Moore-Penrose generalized inverse of $A$.
2 PCG with inner-outer Iterations

We consider the preconditioned conjugate gradient (PCG) algorithm for solving $Ax = b$ with a preconditioner $M$, where both $A$ and $M$ are symmetric positive definite. At each step of the PCG iteration, a search direction $p_n$ is found by first solving the preconditioned system $Mz_n = r_n$. If a direct method is not available for solving $M$, an iterative method, possibly (though not necessarily) CG itself, can be used to solve it. In this case, we find $z_n$ by the inner iteration such that $Mz_n = r_n + e_n$ with the stopping criterion
$$ \|e_n\|_{M^{-1}} \le \eta\,\|r_n\|_{M^{-1}} \qquad \text{(RES)} $$
where $\eta \in [0,1)$ is the stopping threshold of the inner iteration. Here we have used the $M^{-1}$-norm for theoretical convenience. In practical computations, one can replace (RES) by the following criterion in terms of the 2-norm:
$$ \|e_n\|_2 \le \hat\eta\,\|r_n\|_2, \qquad \hat\eta = \eta/\mathrm{cond}(M). $$
Because $z_n$ is only used to define the search direction $p_n$ in PCG, only its direction has any significance. Therefore, we also propose the following direction based stopping criterion
$$ \frac{|z_n^T r_n|}{\|z_n\|_M\,\|r_n\|_{M^{-1}}} = \frac{|(Mz_n)^T M^{-1} r_n|}{\|Mz_n\|_{M^{-1}}\,\|r_n\|_{M^{-1}}} \ge \cos\theta, \qquad \text{(ANG)} $$
i.e., the acute angle between $Mz_n$ and $r_n$ in the $M^{-1}$-inner product (or the acute angle between $z_n$ and $M^{-1}r_n$ in the $M$-inner product) is at most $\theta$.

Remark 1: It can easily be proved (using a triangular relation in the $M^{-1}$-inner product) that if (RES) is satisfied, then
$$ \frac{|(Mz_n)^T M^{-1} r_n|}{\|Mz_n\|_{M^{-1}}\,\|r_n\|_{M^{-1}}} \ge \sqrt{1-\eta^2}, $$
i.e., (ANG) is satisfied with $\sin\theta = \eta$. However, the converse implication is not necessarily true. Therefore the criterion (ANG) is less restrictive than (RES); namely, (ANG) may be satisfied before (RES) is in the inner iteration. However, our numerical tests suggest that there is usually little difference between the two. Our introduction of (ANG) and the use of the $M^{-1}$-inner product are primarily for theoretical convenience.

Remark 2: If the inner iteration is carried out by CG, the residual $e_n$ is orthogonal to $Mz_n$ in the $M^{-1}$-inner product. Then again by a triangular relation, we have
$$ \frac{|(Mz_n)^T M^{-1} r_n|}{\|Mz_n\|_{M^{-1}}\,\|r_n\|_{M^{-1}}} = \sqrt{1 - \left(\frac{\|e_n\|_{M^{-1}}}{\|r_n\|_{M^{-1}}}\right)^2}.
$$
So in this case, (RES) and (ANG) are equivalent. It can also be proved that the CG inner iteration minimizes the residual $\|e_n\|_{M^{-1}}$ as well as the angle in (ANG) over the Krylov subspace concerned.

Now, we formulate the inexact preconditioned conjugate gradient algorithm (IPCG) as follows.

Inexact Preconditioned Conjugate Gradient Algorithm (IPCG):
  Input $x_1$ and $\eta$ (or $\theta$); initialize $r_1 = b - Ax_1$, $p_0 = r_0 = 0$, $\beta_1 = 0$.
  For $n = 1, 2, \dots$
      Solve $Mz_n = r_n + e_n$ with (RES) (or (ANG)) satisfied;
      if $n > 1$: $\beta_n = \dfrac{z_n^T(r_n - r_{n-1})}{z_{n-1}^T r_{n-1}}$; end if
      $p_n = z_n + \beta_n p_{n-1}$
      $\alpha_n = \dfrac{z_n^T r_n}{p_n^T A p_n}$   (equivalently $\alpha_n = \dfrac{p_n^T r_n}{p_n^T A p_n}$)
      $r_{n+1} = r_n - \alpha_n A p_n$
      $x_{n+1} = x_n + \alpha_n p_n$
  end for
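To make the algorithm concrete, the following is a minimal NumPy sketch (not the authors' implementation): the inner solve uses plain CG on $M$ stopped by a 2-norm relative-residual test, which plays the role of (RES), and the outer loop uses the $\beta_n$ and $\alpha_n$ formulas of IPCG above. The matrix sizes, tolerances and the outer stopping test are illustrative assumptions.

```python
# A minimal sketch of IPCG: the preconditioner solve M z = r is replaced by an
# inner CG run stopped by a relative-residual test (a 2-norm analogue of RES).
import numpy as np

def inner_cg(M, r, eta, max_inner=1000):
    """Approximately solve M z = r so that ||r - M z||_2 <= eta * ||r||_2."""
    z = np.zeros_like(r)
    s = r.copy()                       # inner residual r - M z
    p = s.copy()
    tol = eta * np.linalg.norm(r)
    for _ in range(max_inner):
        if np.linalg.norm(s) <= tol:
            break
        Mp = M @ p
        a = (s @ s) / (p @ Mp)
        z += a * p
        s_new = s - a * Mp
        b = (s_new @ s_new) / (s @ s)
        p = s_new + b * p
        s = s_new
    return z

def ipcg(A, b, M, eta, tol=1e-8, max_outer=500):
    """Inexact PCG: outer CG on A with inexact inner solves of M z = r."""
    x = np.zeros_like(b)
    r = b - A @ x
    p_old = np.zeros_like(b)
    zr_old = None                      # z_{n-1}^T r_{n-1}
    r_old = np.zeros_like(b)
    res = [np.linalg.norm(r)]
    for _ in range(max_outer):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = inner_cg(M, r, eta)
        # beta_n = z_n^T (r_n - r_{n-1}) / (z_{n-1}^T r_{n-1}),  beta_1 = 0
        beta = 0.0 if zr_old is None else (z @ (r - r_old)) / zr_old
        p = z + beta * p_old
        alpha = (z @ r) / (p @ (A @ p))
        zr_old, r_old = z @ r, r.copy()
        x = x + alpha * p
        r = r - alpha * (A @ p)
        p_old = p
        res.append(np.linalg.norm(r))
    return x, res

# tiny usage example on a random SPD system with a diagonal preconditioner
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q = rng.standard_normal((200, 200))
    A = Q @ Q.T + 200 * np.eye(200)
    M = np.diag(np.diag(A))
    b = rng.standard_normal(200)
    x, res = ipcg(A, b, M, eta=0.1)
    print(len(res) - 1, res[-1] / res[0])
```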
Clearly, if $\eta$ (or $\theta$) $= 0$, the above algorithm is the usual PCG. With $\eta$ (or $\theta$) $> 0$, we allow the preconditioner equation $Mz_n = r_n$ to be solved only approximately.

Remark 3: As is well known, there are several different formulations for $\alpha_n$ and $\beta_n$ in the algorithm, which are all equivalent in the exact PCG. For the inexact case, this may no longer be true. Our particular formulation here implies that some local orthogonality properties are maintained even with inexact $z_n$, as given in the next lemma. Our numerical tests also show that it indeed leads to a more stable algorithm. For example, if $\beta_{n+1}$ is computed by the old form $z_{n+1}^T r_{n+1}/z_n^T r_n$, the algorithm may not converge for larger $\eta$.
Lemma 1 The sequences generated by Algorithm IPCG satisfy the following local orthogonality:
$$ p_{n-1}^T r_{n+1} = 0, \quad p_n^T r_{n+1} = 0, \quad z_n^T r_{n+1} = 0, \quad p_n^T A p_{n+1} = 0. \qquad (1) $$

Proof First,
$$ \beta_{n+1} = -\alpha_n\,\frac{z_{n+1}^T A p_n}{z_n^T r_n} = -\frac{z_{n+1}^T A p_n}{p_n^T A p_n}. \qquad (2) $$
Then
$$ p_n^T A p_{n+1} = p_n^T A z_{n+1} + \beta_{n+1}\,p_n^T A p_n = 0. $$
Thus $z_n^T A p_n = (p_n - \beta_n p_{n-1})^T A p_n = p_n^T A p_n$ (with $\beta_1 = 0$) and
$$ z_n^T r_{n+1} = z_n^T r_n - \alpha_n z_n^T A p_n = 0. $$
Now, supposing $p_{n-1}^T r_n = 0$, we have
$$ p_n^T r_{n+1} = p_n^T r_n - \alpha_n p_n^T A p_n = z_n^T r_n - \alpha_n p_n^T A p_n = 0 $$
and
$$ p_{n-1}^T r_{n+1} = p_{n-1}^T r_n - \alpha_n p_{n-1}^T A p_n = 0. $$
Thus the lemma follows from $p_1^T r_2 = z_1^T r_2 = 0$ by an induction argument.
By eliminating $p_n$ in the recurrence, IPCG can be written with two consecutive steps as a second order recurrence
$$ r_{n+1} = r_{n-1} + \omega_n\big(\gamma_n A z_n + r_{n-1} - r_n\big), $$
a unified form that includes the Chebyshev and second order Richardson iterations (cf. [2]). From the local orthogonality, we obtain the following local minimization property.
Proposition 1
$$ \|r_{n+1}\|_{A^{-1}} = \min_t \|r_n - tAp_n\|_{A^{-1}} < \|r_n\|_{A^{-1}} $$
and
$$ \|r_{n+1}\|_{A^{-1}} = \min_{s,t} \|r_n - tAz_n + s(r_{n-1} - r_n)\|_{A^{-1}}. $$

Proof By the orthogonality, $\|r_n - tAp_n\|_{A^{-1}}^2 = \|r_{n+1}\|_{A^{-1}}^2 + (\alpha_n - t)^2\|Ap_n\|_{A^{-1}}^2 \ge \|r_{n+1}\|_{A^{-1}}^2$. It is easy to check that $z_n^T r_n \ne 0$ for either (RES) or (ANG). So $\alpha_n \ne 0$ and thus we have the strict inequality. For the second relation, we just need to show that $r_{n+1} \perp_{A^{-1}} Az_n$ and $r_{n-1} - r_n$. This follows from $r_{n+1}^T z_n = 0$ and $r_{n+1}^T A^{-1}(r_{n-1} - r_n) = r_{n+1}^T(\alpha_{n-1} p_{n-1}) = 0$.

Recall that the steepest descent method constructs $r_{n+1}^{sd} = r_n - t_n A r_n$ from $r_n$ such that $\|r_{n+1}^{sd}\|_{A^{-1}} = \min_t\|r_n - tAr_n\|_{A^{-1}}$. A bound on $\|r_{n+1}^{sd}\|_{A^{-1}}$ can be obtained from the Kantorovich inequality. An inexact preconditioned version of the steepest descent method [8] constructs $r_{n+1}^{sd} = r_n - t_n A z_n$ from $r_n$ with $t_n = z_n^T r_n/z_n^T A z_n$, so that
$$ \|r_{n+1}^{sd}\|_{A^{-1}}^2 = \min_t \|r_n - tAz_n\|_{A^{-1}}^2 = \|r_n\|_{A^{-1}}^2\left(1 - \frac{(z_n^T r_n)^2}{z_n^T A z_n\; r_n^T A^{-1} r_n}\right). \qquad (3) $$
Then the second relation of the above proposition shows that $\|r_{n+1}\|_{A^{-1}} \le \min_t\|r_n - tAz_n\|_{A^{-1}}$ (by choosing $s = 0$) and thus $\|r_{n+1}\|_{A^{-1}} \le \|r_{n+1}^{sd}\|_{A^{-1}}$.
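Identity (3) is straightforward to verify numerically. The following sketch (an illustrative setup, not from the paper) performs one inexact preconditioned steepest-descent step along a perturbed direction $z$ and checks both (3) and the resulting orthogonality $r_{n+1}^{sd}\,^T z = 0$.

```python
# Sketch: numerical check of identity (3) for one inexact preconditioned
# steepest-descent step.  The test matrix, the perturbed direction z and the
# perturbation size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n = 50
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)            # SPD test matrix
Ainv = np.linalg.inv(A)

r = rng.standard_normal(n)             # current residual
z = r + 0.2 * np.linalg.norm(r) * rng.standard_normal(n) / np.sqrt(n)  # inexact direction

def norm_Ainv(v):
    return np.sqrt(v @ (Ainv @ v))

t = (z @ r) / (z @ (A @ z))            # optimal step along z in the A^{-1}-norm
r_sd = r - t * (A @ z)

lhs = norm_Ainv(r_sd) ** 2
rhs = norm_Ainv(r) ** 2 * (1 - (z @ r) ** 2 / ((z @ (A @ z)) * (r @ (Ainv @ r))))
print(abs(lhs - rhs))                  # ~ machine precision
print(abs(r_sd @ z))                   # the new residual is orthogonal to z
```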
2.1 A Numerical Example
To motivate our discussion in the next two sections, we now present two numerical examples to illustrate the typical convergence behaviour of IPCG. We simulate IPCG by artificially perturbing $r_n$, i.e., applying PCG with $M = I$ and $z_{n+1} = r_{n+1} + \delta\|r_{n+1}\|\,f_n/\|f_n\|$, where $f_n$ is a pseudo-random vector with entries uniformly distributed in $(-0.5, 0.5)$ and $\delta$ is a perturbation parameter. Figure 1 shows the convergence curves for two diagonal matrices, $A = \mathrm{diag}[1, 1/2, 1/3, \dots, 1/10000]$ ($10000\times 10000$) and $A = \mathrm{diag}[1, \sqrt2, \sqrt3, \dots, \sqrt{1000}]$ ($1000\times 1000$), with a random right-hand side.

We first note that the $A^{-1}$-norm of the residuals decreases monotonically (cf. Proposition 1) and convergence occurs for quite large $\delta$. We further observe from the second example that for smaller $\delta$ ($=10^{-5}$) the superlinear convergence property of exact CG is recovered; in fact, these curves follow the unperturbed case very closely. When $\delta$ is larger, the convergence tends to be linear, with a rate depending on $\delta$. Interestingly, the convergence rate is insensitive to the change of magnitude of $\delta$ until a certain point ($0.1$ in the first example and $0.4$ in the second), around which it becomes extremely sensitive to a relatively small change of $\delta$.

Our explanation and analysis of the above observations fall into two categories, although there is no clear boundary between the two. For smaller $\delta$, some global properties (e.g., the orthogonality among the $r_n$) are expected to be preserved, which leads to a global near-minimization property and thus the superlinear convergence. For larger $\delta$, the global minimization property will be lost. However, we will show that a certain local relation from $r_n$ to $r_{n+1}$ is not destroyed, which turns out to preserve a linear convergence property. We consider these two categories separately in the next two sections.
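The perturbed-CG simulation just described is easy to reproduce; a sketch is given below (illustrative sizes, seed and perturbation levels; the second test matrix $\mathrm{diag}[1,\sqrt2,\dots,\sqrt{1000}]$ is used).

```python
# Sketch of the perturbed-CG experiment above: M = I and
# z_{n+1} = r_{n+1} + delta * ||r_{n+1}|| * f_n / ||f_n||.
import numpy as np

def perturbed_cg(diag_A, delta, n_steps, seed=0):
    rng = np.random.default_rng(seed)
    A = diag_A                              # store only the diagonal of A
    b = rng.standard_normal(A.size)
    r = b.copy()
    p = np.zeros_like(b)
    zr_old, r_old = None, np.zeros_like(b)
    hist = []
    for _ in range(n_steps):
        hist.append(np.sqrt(r @ (r / A)))   # ||r||_{A^{-1}} for diagonal A
        f = rng.uniform(-0.5, 0.5, A.size)
        z = r + delta * np.linalg.norm(r) * f / np.linalg.norm(f)
        beta = 0.0 if zr_old is None else (z @ (r - r_old)) / zr_old
        p = z + beta * p
        alpha = (z @ r) / (p @ (A * p))
        zr_old, r_old = z @ r, r.copy()
        r = r - alpha * (A * p)
    return hist

A2 = np.sqrt(np.arange(1, 1001, dtype=float))   # diag[1, sqrt(2), ..., sqrt(1000)]
for delta in [0.0, 1e-5, 0.1, 0.2, 0.4]:
    hist = perturbed_cg(A2, delta, 80)
    print(delta, hist[-1] / hist[0])
```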
3 Linear Convergence of IPCG

In this section we present a local relation between consecutive residual norms, which leads to a linear convergence bound for IPCG. Our basic idea is to relate the reduction factor of IPCG,
$$ \mu_n = \frac{r_{n+1}^T A^{-1} r_{n+1}}{r_n^T A^{-1} r_n}, \qquad (4) $$
[Figure 1: Convergence curves for various perturbations (solid (bottom): $\delta = 0$; $+$: $\delta = 10^{-5}$; dotted: $\delta = 0.1$; dashed: $\delta = 0.2$; dash-dotted: $\delta = 0.4$; solid (top): $\delta = 0.6$). Left panel: $A = \mathrm{diag}\{i\}$; right panel: $A = \mathrm{diag}\{\sqrt i\}$. Both panels plot the $A^{-1}$-norm of the residuals against the number of iterations.]
to the reduction factor of the steepest descent method along the inexact preconditioned gradient direction $z_n$ (see (3) of section 2),
$$ \nu_n = \min_t \frac{\|r_n - tAz_n\|_{A^{-1}}^2}{\|r_n\|_{A^{-1}}^2} = 1 - \frac{(z_n^T r_n)^2}{z_n^T A z_n\; r_n^T A^{-1} r_n}. \qquad (5) $$
From Proposition 1, we have $\mu_n \le \nu_n < 1$.
Theorem 1 Let $\mu_n$ and $\nu_n$ be defined as in (4) and (5) for IPCG. Then for $n \ge 2$ we have
(a) $\displaystyle \mu_n = 1 - \frac{1-\nu_n}{\,1 - (1-\nu_n)(1-\delta_n)^2/(\mu_{n-1}^{-1}-1)\,}$, where $\displaystyle \delta_n = \frac{z_n^T r_{n-1}}{z_n^T r_n}$;
(b) $\displaystyle \frac{r_{n+1}^T A^{-1} r_{n+1}}{r_1^T A^{-1} r_1} = \Big(1 - \frac{1}{g_1}\Big)\Big(1 - \frac{1}{g_2}\Big)\cdots\Big(1 - \frac{1}{g_n}\Big)$, where $g_k$ is defined by
$$ g_1 = \frac{1}{1-\nu_1} \qquad\text{and}\qquad g_k = \frac{1}{1-\nu_k} + (1-\delta_k)^2(1 - g_{k-1}). $$
Proof From $r_{n+1}^T p_n = 0$, we have
$$ r_n^T A^{-1} r_n = r_{n+1}^T A^{-1} r_{n+1} + \alpha_n^2\, p_n^T A p_n. \qquad (6) $$
Substituting $\alpha_n = \frac{z_n^T r_n}{p_n^T A p_n}$ in, we obtain
$$ r_{n+1}^T A^{-1} r_{n+1} = r_n^T A^{-1} r_n - \frac{(z_n^T r_n)^2}{p_n^T A p_n}. \qquad (7) $$
Thus
$$ \mu_n = 1 - \frac{(z_n^T r_n)^2}{r_n^T A^{-1} r_n\; p_n^T A p_n}. \qquad (8) $$
Now, using $z_n = p_n - \beta_n p_{n-1}$ and $p_n^T A p_{n-1} = 0$, we have $z_n^T A z_n = p_n^T A p_n + \beta_n^2\, p_{n-1}^T A p_{n-1}$. Noting that $\beta_n = -\frac{z_n^T A p_{n-1}}{p_{n-1}^T A p_{n-1}}$ from (2), we have
$$ p_n^T A p_n = z_n^T A z_n - \frac{(z_n^T A p_{n-1})^2}{p_{n-1}^T A p_{n-1}}
 = z_n^T A z_n - \frac{[\alpha_{n-1}^{-1} z_n^T (r_{n-1} - r_n)]^2}{\alpha_{n-1}^{-2}\,(r_{n-1}^T A^{-1} r_{n-1} - r_n^T A^{-1} r_n)} $$
$$ \quad = z_n^T A z_n\left(1 - \frac{(z_n^T r_{n-1} - z_n^T r_n)^2}{z_n^T A z_n\; r_n^T A^{-1} r_n\,(\mu_{n-1}^{-1}-1)}\right)
 = z_n^T A z_n\left(1 - \frac{(z_n^T r_n)^2(1-\delta_n)^2}{z_n^T A z_n\; r_n^T A^{-1} r_n\,(\mu_{n-1}^{-1}-1)}\right)
 = z_n^T A z_n\left(1 - \frac{(1-\nu_n)(1-\delta_n)^2}{\mu_{n-1}^{-1}-1}\right), $$
where $\delta_n = z_n^T r_{n-1}/z_n^T r_n$, and we have used $Ap_{n-1} = \alpha_{n-1}^{-1}(r_{n-1} - r_n)$ and (6). Now substituting the above and (5) into (8), we obtain part (a).

For part (b), let $g_k = \frac{1}{1-\mu_k}$. From (a) and the identity $\frac{1}{\mu_{k-1}^{-1}-1} = g_{k-1}-1$,
$$ g_k = \frac{1}{1-\mu_k} = \frac{1 - (1-\nu_k)(1-\delta_k)^2/(\mu_{k-1}^{-1}-1)}{1-\nu_k} = \frac{1}{1-\nu_k} - (1-\delta_k)^2(g_{k-1}-1) = \frac{1}{1-\nu_k} + (1-\delta_k)^2(1 - g_{k-1}). $$
Since $\mu_1 = \nu_1$ (the first step of IPCG amounts to one step of the steepest descent method), $g_1 = \frac{1}{1-\mu_1} = \frac{1}{1-\nu_1}$ by the definition. Now (b) follows from $r_{n+1}^T A^{-1} r_{n+1}/r_1^T A^{-1} r_1 = \mu_n\mu_{n-1}\cdots\mu_1$ and $\mu_k = 1 - \frac{1}{g_k}$.

In the exact case, $\delta_n = 0$ by the orthogonality and the above equations are simplified. In the inexact case, using the local orthogonality (Lemma 1), $\delta_n$ can be bounded. We shall consider the stopping criterion (ANG) only in this section. Since (RES) implies (ANG) with $\sin\theta = \eta$, all results here apply to the (RES) case as well by simply replacing $\sin\theta$ with $\eta$. First we need the following lemma, which is geometrically clear. (The acute angle $\theta \in [0, \pi/2]$ between two vectors $u, v$ in an inner product $\langle\cdot,\cdot\rangle$ is defined by $\cos\theta = \frac{|\langle u, v\rangle|}{\langle u, u\rangle^{1/2}\langle v, v\rangle^{1/2}}$.)
Lemma 2 Let $\theta_1$ (resp. $\theta_2$) be the acute angle between $u$ and $u_1$ (resp. $v$ and $v_1$). If $\langle u, v\rangle = 0$ and $\theta_1, \theta_2 \le \pi/4$, then the acute angle between $u_1$ and $v_1$ is at least $\pi/2 - \theta_1 - \theta_2$, i.e.,
$$ \frac{|\langle u_1, v_1\rangle|}{\langle u_1, u_1\rangle^{1/2}\langle v_1, v_1\rangle^{1/2}} \le \sin(\theta_1 + \theta_2). $$

Proof We can assume that $u, u_1, v$ and $v_1$ are all unit vectors. Then we write $u_1 = u\cos\theta_1 + u^\perp\sin\theta_1$ and $v_1 = v\cos\theta_2 + v^\perp\sin\theta_2$, where $u^\perp$ (resp. $v^\perp$) is a unit vector orthogonal to $u$ (resp. $v$). Let $a = |\langle u, v^\perp\rangle|$ and $b = |\langle v, u^\perp\rangle|$. Then
$$ |\langle u^\perp, v^\perp\rangle| = |\langle u^\perp - \langle u^\perp, v\rangle v,\; v^\perp - \langle v^\perp, u\rangle u\rangle| \le \|u^\perp - \langle u^\perp, v\rangle v\|\,\|v^\perp - \langle v^\perp, u\rangle u\| = \sqrt{1 - b^2}\,\sqrt{1 - a^2}, $$
where we note that $u^\perp - \langle u^\perp, v\rangle v$ is orthogonal to $v$ and $v^\perp - \langle v^\perp, u\rangle u$ is orthogonal to $u$ ($\|\cdot\|$ is the norm associated with $\langle\cdot,\cdot\rangle$). Thus
$$ |\langle u_1, v_1\rangle| = |\langle u, v^\perp\rangle\cos\theta_1\sin\theta_2 + \langle v, u^\perp\rangle\sin\theta_1\cos\theta_2 + \langle u^\perp, v^\perp\rangle\sin\theta_1\sin\theta_2| $$
$$ \le a\cos\theta_1\sin\theta_2 + b\sin\theta_1\cos\theta_2 + \sqrt{1-a^2}\sqrt{1-b^2}\,\sin\theta_1\sin\theta_2 \le a\cos\theta_1\sin\theta_2 + \sqrt{(\sin\theta_1\cos\theta_2)^2 + (1-a^2)(\sin\theta_1\sin\theta_2)^2} $$
$$ \le \cos\theta_1\sin\theta_2 + \sin\theta_1\cos\theta_2 = \sin(\theta_1 + \theta_2), $$
where we note that the second to last expression is an increasing function of $a \in [0,1]$ for $\theta_1, \theta_2 \le \pi/4$, and therefore is bounded by its value at $a = 1$.

Lemma 3 If $z_n$ satisfies (ANG) with $\theta \le \pi/4$, then
$$ |\delta_n| = \left|\frac{z_n^T r_{n-1}}{z_n^T r_n}\right| \le 2\sin\theta\,\sqrt{\mathrm{cond}(M^{-1}A)\,\mu_{n-1}^{-1}}. $$

Proof First, $(Mz_{n-1})^T M^{-1} r_n = z_{n-1}^T r_n = 0$. By applying the above lemma, with the inner product defined by $M^{-1}$, to the pairs $Mz_{n-1}, r_{n-1}$ and $r_n, Mz_n$, which satisfy (ANG), we obtain
$$ \frac{|(Mz_n)^T M^{-1} r_{n-1}|}{\|Mz_n\|_{M^{-1}}\,\|r_{n-1}\|_{M^{-1}}} \le \sin(2\theta), $$
i.e., $|z_n^T r_{n-1}| \le \sin(2\theta)\,\|z_n\|_M\,\|r_{n-1}\|_{M^{-1}}$. On the other hand, $|z_n^T r_n| \ge \cos\theta\,\|z_n\|_M\,\|r_n\|_{M^{-1}}$. So $|\delta_n| \le 2\sin\theta\,\|r_{n-1}\|_{M^{-1}}/\|r_n\|_{M^{-1}}$. Furthermore, it is easy to check that, for any vector $v$,
$$ \lambda_{\min}(M^{-1}A)\,\|v\|_{A^{-1}}^2 \le \|v\|_{M^{-1}}^2 \le \lambda_{\max}(M^{-1}A)\,\|v\|_{A^{-1}}^2, $$
where $\lambda_{\min}$ and $\lambda_{\max}$ denote respectively the minimal and the maximal eigenvalues. Thus,
$$ \frac{\|r_{n-1}\|_{M^{-1}}}{\|r_n\|_{M^{-1}}} \le \frac{\sqrt{\lambda_{\max}(M^{-1}A)}\,\|r_{n-1}\|_{A^{-1}}}{\sqrt{\lambda_{\min}(M^{-1}A)}\,\|r_n\|_{A^{-1}}} = \sqrt{\mathrm{cond}(M^{-1}A)\,\mu_{n-1}^{-1}}, $$
which completes the proof of the lemma.
We now present a bound for $\nu_n$ that has been derived by Munthe-Kaas [8]. We shall use one of the variations of the bounds in [8, Theorem 5], and we repeat some arguments of [8] below. First, the following lemma is a generalization of [1, Corollary 4].

Lemma 4 [8, Lemma 2] Suppose $p, q \in \mathbf{R}^n$ are such that $\frac{|p^T q|}{\|p\|\,\|q\|} \ge \cos\theta$, $0 \le \theta \le \frac{\pi}{2}$. Then
$$ \frac{(p^T q)^2}{(q^T W^{-1} q)(p^T W p)} \ge \left(\frac{2}{\kappa'^{1/2} + \kappa'^{-1/2}}\right)^2, $$
where $W$ is a symmetric positive definite matrix and $\kappa' = \mathrm{cond}(W)\,\frac{1+\sin\theta}{1-\sin\theta}$.

Now, let $p = M^{1/2} z_n$, $q = M^{-1/2} r_n$ and $W = M^{-1/2} A M^{-1/2}$. Then
$$ \frac{|p^T q|}{\|p\|\,\|q\|} = \frac{|z_n^T r_n|}{\|z_n\|_M\,\|r_n\|_{M^{-1}}} \ge \cos\theta. $$
So, applying the above lemma to $p, q$ and $W$, we obtain
$$ \nu_n = 1 - \frac{(z_n^T r_n)^2}{r_n^T A^{-1} r_n\; z_n^T A z_n} = 1 - \frac{(p^T q)^2}{(q^T W^{-1} q)(p^T W p)} \le 1 - \left(\frac{2}{\kappa'^{1/2} + \kappa'^{-1/2}}\right)^2 = \left(\frac{\kappa'-1}{\kappa'+1}\right)^2. $$

Lemma 5 Let $\kappa' = \mathrm{cond}(M^{-1}A)\,\frac{1+\sin\theta}{1-\sin\theta}$. Then we have
$$ \nu_n = 1 - \frac{(z_n^T r_n)^2}{r_n^T A^{-1} r_n\; z_n^T A z_n} \le \left(\frac{\kappa'-1}{\kappa'+1}\right)^2. $$
We now present a linear convergence bound for IPCG.
Theorem 2 If $\bar\eta := 2\sin\theta\,\sqrt{\mathrm{cond}(M^{-1}A)} < 1$, then IPCG converges and, for even $n$,
$$ \frac{\|r_{n+1}\|_{A^{-1}}}{\|r_1\|_{A^{-1}}} \le (K_\theta)^n, \qquad (9) $$
where
$$ (K_\theta)^2 = 1 - \frac{2(1-\bar\eta)(1-\bar\nu)}{2-\bar\nu}, \qquad \bar\nu = \left(\frac{\kappa'-1}{\kappa'+1}\right)^2 \quad\text{and}\quad \kappa' = \mathrm{cond}(M^{-1}A)\,\frac{1+\sin\theta}{1-\sin\theta}. $$

Proof We consider two consecutive steps of IPCG. By $\bar\eta < 1$, we have $\sin\theta < 1/2$, i.e., $\theta < \pi/6$, so Lemmas 3 and 5 apply. From Theorem 1,
$$ g_k = \frac{1}{1-\nu_k} - \frac{(1-\delta_k)^2}{\mu_{k-1}^{-1}-1} = \frac{1}{1-\nu_k} - (1 - 2\delta_k + \delta_k^2)(g_{k-1}-1). $$
Bounding $\nu_k \le \bar\nu$ by Lemma 5 and $|\delta_k| \le \bar\eta\,\mu_{k-1}^{-1/2}$ by Lemma 3, and writing $g_{k-1} - 1 = \mu_{k-1} g_{k-1}$, a direct calculation gives $(1-\delta_k)^2(g_{k-1}-1) \ge (1-\bar\eta)^2 g_{k-1} - 1$, so that
$$ g_k \le \frac{1}{1-\bar\nu} + 1 - (1-\bar\eta)^2 g_{k-1}, \qquad\text{i.e.}\qquad (1-\bar\eta)^2 g_n + g_{n+1} \le \frac{1}{1-\bar\nu} + 1 = \frac{2-\bar\nu}{1-\bar\nu}. $$
Then, by the arithmetic-geometric mean inequality, $(1-\bar\eta)^2 g_n g_{n+1} \le \frac14\big(\frac{2-\bar\nu}{1-\bar\nu}\big)^2$, and hence
$$ \mu_n\mu_{n+1} = \Big(1 - \frac{1}{g_n}\Big)\Big(1 - \frac{1}{g_{n+1}}\Big) \le \Big(1 - \frac{1}{\sqrt{g_n g_{n+1}}}\Big)^2 \le \Big(1 - \frac{2(1-\bar\eta)(1-\bar\nu)}{2-\bar\nu}\Big)^2 = (K_\theta)^4. $$
Since $0 < \bar\nu < 1$ and $\bar\eta < 1$, we have $0 < (K_\theta)^2 < 1$. So $\sqrt{\mu_n\mu_{n+1}} \le (K_\theta)^2$, and for even $n$ the bound (9) follows from
$$ \frac{\|r_{n+1}\|_{A^{-1}}}{\|r_1\|_{A^{-1}}} = \sqrt{\mu_1\mu_2}\,\sqrt{\mu_3\mu_4}\cdots\sqrt{\mu_{n-1}\mu_n}, $$
which also shows that IPCG converges. Finally, note that $1/(1-\bar\nu) = (\kappa'+1)^2/(4\kappa')$, so that $(K_\theta)^2 = 1 - 8(1-\bar\eta)\kappa'/(\kappa'^2+6\kappa'+1)$; for large $\kappa'$ this gives $K_\theta \approx 1 - 4(1-\bar\eta)/\kappa'$, and substituting $\bar\eta = 2\sin\theta\sqrt{\mathrm{cond}(M^{-1}A)}$ shows that the dependence of the bound on the inner threshold enters at order $\tan\theta/\sqrt{\kappa'}$.
In terms of the stopping criterion (RES), the same result holds with $\sin\theta$ replaced by $\eta$ (see Remark 1). Therefore, if $\eta = \sin\theta < 1/(2\sqrt{\mathrm{cond}(M^{-1}A)})$, IPCG will converge with a rate depending on $\kappa'$ (the standard PCG convergence rate). In particular, the bound indicates that the IPCG convergence rate is relatively insensitive to the magnitude of $\eta$ for smaller $\eta$ but increases sharply at a certain point (see the rate curves in Fig. 2). However, the bound on the convergence rate here tends to be pessimistic and it does not recover the classical bound for the case $\eta = 0$. Nevertheless, it does seem to reflect the trend of how the rate changes as $\eta$ changes.

To compare the bound with the actual numerical results, we consider an example similar to the one in section 2. Namely, we consider a diagonal matrix whose eigenvalues are linearly distributed on $[1, \kappa]$, and apply PCG with the same kind of random perturbation. We carry out IPCG and compute the actual convergence rate by $(\|r_k\|_{A^{-1}}/\|r_{k-20}\|_{A^{-1}})^{1/20}$ for $\eta$ ranging from $10^{-6}$ to $1$. In Figure 2 the graphs of the bound and the actual computed rate are plotted (for the cases $\kappa = 10$ and $100$). By comparing the actual convergence rate and its bound, we observe that the bound, as a worst case bound, follows the trend of the actual convergence rate curve quite closely. The bound reaches $1$ at
$$ \eta_0 = \sin\theta_0 = \frac{1}{2\sqrt{\mathrm{cond}(M^{-1}A)}}. \qquad (10) $$
In particular, $\eta_0$ seems to be a good estimate of the point at which the actual rate starts to increase significantly (i.e., where the slope is greater than 1).
[Figure 2: Actual convergence rate and its bound versus the stopping threshold $\eta = \sin\theta$ (from $10^{-6}$ to $10^0$). Left panel: cond $= 10$; right panel: cond $= 100$. Each panel compares the actual rate with the bound.]
Note that when the slope is less than 1, any increase in the rate may be compensated by a comparable or greater increase in $\eta$. We therefore advocate a value around $\eta_0$ as a heuristic choice of the stopping threshold for the inner iterations. Our numerical examples in Section 5 confirm that this is indeed a reasonable strategy in balancing the numbers of inner and outer iterations.
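The heuristic (10) is simple to apply in practice. The short sketch below (illustrative; the residual history could come, e.g., from the perturbed-CG sketch in section 2.1) computes $\eta_0$ and the windowed rate estimate $(\|r_k\|_{A^{-1}}/\|r_{k-20}\|_{A^{-1}})^{1/20}$ used for Figure 2.

```python
# Sketch: the heuristic threshold (10) and the windowed rate estimate used
# for Figure 2.  cond_MA and the residual-norm history are illustrative inputs.
import numpy as np

def eta_0(cond_MA):
    """Heuristic inner-iteration threshold of equation (10)."""
    return 1.0 / (2.0 * np.sqrt(cond_MA))

def observed_rate(hist, window=20):
    """(||r_k|| / ||r_{k-window}||)^(1/window) from a residual-norm history."""
    return (hist[-1] / hist[-1 - window]) ** (1.0 / window)

print(eta_0(10.0), eta_0(100.0))   # approximately 0.158 and 0.05
```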
4 Superlinear Convergence of IPCG

The bound in the previous section demonstrates linear convergence of IPCG. We observed in section 2 that for smaller $\eta$, IPCG may actually enjoy the superlinear convergence property of exact CG. Here we explain this phenomenon by the method of [9], i.e., by considering a global equation that is approximately satisfied by IPCG. We remark that a global property is necessary in examining superlinear convergence. Let
$$ R_n = [r_1, \dots, r_n], \quad Z_n = [z_1, \dots, z_n], \quad E_n = [e_1, \dots, e_n] \quad\text{and}\quad P_n = [p_1, \dots, p_n]. $$
From $r_{n+1} = r_n - \alpha_n A p_n$, $Mz_{n+1} = r_{n+1} + e_{n+1}$ and $p_{n+1} = z_{n+1} + \beta_{n+1} p_n$ respectively, we obtain the following matrix equations for IPCG:
$$ AP_n = R_n L_n \Lambda_n^{-1} - \alpha_n^{-1} r_{n+1} e_n^T, \qquad MZ_n = R_n + E_n \qquad\text{and}\qquad Z_n = P_n U_n, \qquad (11) $$
where $\Lambda_n = \mathrm{diag}[\alpha_1, \dots, \alpha_n]$,
$$ L_n = \begin{pmatrix} 1 & & & \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{pmatrix}
\qquad\text{and}\qquad
U_n = \begin{pmatrix} 1 & -\beta_2 & & \\ & 1 & \ddots & \\ & & \ddots & -\beta_n \\ & & & 1 \end{pmatrix}. \qquad (12) $$
Combining the equations in (11), we obtain the following equation:
$$ AM^{-1} R_n = R_n \hat T_n - \alpha_n^{-1} r_{n+1} e_n^T - AM^{-1} E_n. \qquad (13) $$
Note that $\hat T_n = L_n \Lambda_n^{-1} U_n$ is a tridiagonal matrix such that $e_n^T \hat T_n^{-1} e_1 = e_n^T U_n^{-1} \Lambda_n L_n^{-1} e_1 = e_n^T \Lambda_n v = \alpha_n$ (with $v = [1\;1\;\cdots\;1]^T$). Therefore, the inexact case satisfies an equation similar to the exact case, with the error term $AM^{-1}E_n$.

We rewrite (13) in a scaled form as in [9, Eq. (8)]. Let $D_n = \mathrm{diag}\{\|r_1\|, \dots, \|r_n\|\}$ and
$$ \hat R_n = [\hat r_1, \dots, \hat r_n] = R_n D_n^{-1} = [r_1/\|r_1\|, \dots, r_n/\|r_n\|]. $$
Then, from (13),
$$ AM^{-1}\hat R_n = \hat R_n T_n - \frac{1}{\hat\alpha_n}\,\frac{r_{n+1}}{\|r_1\|}\,e_n^T - AM^{-1}\hat E_n, \qquad (14) $$
where $T_n = D_n \hat T_n D_n^{-1}$, $\hat\alpha_n = \|r_n\|\alpha_n/\|r_1\| = e_n^T T_n^{-1} e_1$ and $\hat E_n = [e_1/\|r_1\|, \dots, e_n/\|r_n\|]$. By the stopping criteria, $\|e_i\|/\|r_i\| \le \eta\sqrt{\mathrm{cond}(M)}$, and therefore $\|\hat E_n\| \le \eta\sqrt{n\,\mathrm{cond}(M)}$. Now, applying the same argument as in the proof of [9, Theorem 3.5] to (14), we obtain the following theorem (the details are omitted here).

Theorem 3 Assume $r_1, \dots, r_n, r_{n+1}$ are linearly independent and let $V_0^T = [I_n\; 0]\hat R_{n+1}^+ \in \mathbf{R}^{n\times N}$ (i.e., the matrix consisting of the first $n$ rows of $\hat R_{n+1}^+$). Then
$$ \|r_{n+1}\|_{A^{-1}} \le (1 + K_n + M_n)\min_{p\in\mathcal P_n,\;p(0)=1}\big\|p\big(AM^{-1}(I + \hat E_n\hat R_n^+)\big)\,r_1\big\|_{A^{-1}}, \qquad (15) $$
where
$$ K_n = \left\|\frac{r_{n+1}^T M^{-1}\hat R_n}{\|r_{n+1}\|_{A^{-1}}}\,T_n^{-1} V_0^T A\right\|_{A^{-1}}, \qquad M_n = \left\|\hat E_n T_n^{-1} V_0^T\right\|_{A^{-1}}. $$

To interpret the above result, we note that $\|r_{n+1}^T M^{-1}\hat R_n\|/\|r_{n+1}\|_{A^{-1}}$ is a measure of the $M^{-1}$-orthogonality among the residual vectors. If $\eta = 0$, then the residuals of PCG are orthogonal with respect to $M^{-1}$ and hence $K_n = 0$ above. If $\eta > 0$ but is not too large, the loss of $M^{-1}$-orthogonality among the residuals may be gradual, and it will take a run of modest length before $K_n \gg O(1)$. Also, $M_n$ and $\hat E_n\hat R_n^+$ are of magnitude $\eta$. Therefore, in this regime, $\|r_{n+1}\|_{A^{-1}} = O(1)\min_{p\in\mathcal P_n,\,p(0)=1}\|p(AM^{-1}(I+\hat E_n\hat R_n^+))r_1\|_{A^{-1}}$. Note that $\min_{p\in\mathcal P_k,\,p(0)=1}\|p(AM^{-1}(I+\hat E_n\hat R_n^+))r_1\|_{A^{-1}}$ decreases superlinearly because of the annihilation of the extreme spectrum (see [10]). See [9] for some artificially perturbed numerical examples.

In summary, if the global $M^{-1}$-orthogonality among the residual vectors is nearly maintained up to a certain step, the residual of IPCG is very close to that of exact PCG up to that point and thus may display the superlinear convergence property.
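Relation (13) is straightforward to check numerically. The sketch below (illustrative setup: random SPD $A$, diagonal $M$, and perturbed inner solves) runs a few IPCG steps, assembles $R_n$, $E_n$, $\Lambda_n$, $L_n$, $U_n$ as defined above, and verifies that both sides of (13) agree to rounding error.

```python
# Sketch: numerical check of the matrix relation (13) for a few IPCG steps.
# The test matrices and the perturbation of the inner solve are illustrative.
import numpy as np

rng = np.random.default_rng(3)
N, n_steps = 40, 6
Q = rng.standard_normal((N, N))
A = Q @ Q.T + N * np.eye(N)
M = np.diag(np.diag(A))                      # simple SPD preconditioner
b = rng.standard_normal(N)

r = b.copy()
p = np.zeros(N)
zr_old, r_old = None, np.zeros(N)
R, E, alphas, betas = [], [], [], []
for k in range(n_steps):
    R.append(r.copy())
    e = 0.05 * np.linalg.norm(r) * rng.standard_normal(N) / np.sqrt(N)
    z = np.linalg.solve(M, r + e)            # inexact solve: M z = r + e
    E.append(e)
    beta = 0.0 if zr_old is None else (z @ (r - r_old)) / zr_old
    betas.append(beta)
    p = z + beta * p
    alpha = (z @ r) / (p @ (A @ p))
    alphas.append(alpha)
    zr_old, r_old = z @ r, r.copy()
    r = r - alpha * (A @ p)                  # after the loop, r = r_{n+1}

Rn, En = np.column_stack(R), np.column_stack(E)
Lam = np.diag(alphas)
Ln = np.eye(n_steps) - np.diag(np.ones(n_steps - 1), -1)
Un = np.eye(n_steps) - np.diag(betas[1:], 1)
Tn_hat = Ln @ np.linalg.inv(Lam) @ Un
lhs = A @ np.linalg.solve(M, Rn)             # A M^{-1} R_n
rhs = Rn @ Tn_hat - np.outer(r / alphas[-1], np.eye(n_steps)[-1]) \
      - A @ np.linalg.solve(M, En)
print(np.linalg.norm(lhs - rhs))             # ~ machine precision
```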
5 Numerical Examples

In this section, we present numerical examples of inner-outer iterations, testing various choices of the stopping threshold $\eta$ as compared with $\eta_0 = 0.5/\sqrt{\kappa}$ of (10), where $\kappa = \mathrm{cond}(M^{-1}A)$. For this purpose, we shall consider $\eta = d/\sqrt{\kappa}$ for $d$ ranging from $0.01$ to $5$, as well as $\eta = 1$. We consider
$$ -\nabla\cdot(a(x,y)\nabla u) + \lambda(x,y)\,u = f(x,y) \quad\text{on } R = (0,1)\times(0,1) \qquad (16) $$
with the homogeneous Dirichlet boundary condition $u = 0$ on $\partial R$. Using the uniform five-point finite difference scheme with step size $h = \frac{1}{N+1}$, we obtain an $n\times n$ (with $n = N^2$) linear system $(A + D)u_n = f_n$, where $A$ is the discretization of $-\nabla\cdot(a(x,y)\nabla)$ and is an $N\times N$ block tridiagonal matrix, and $D$ is the diagonal matrix arising from $\lambda(x,y)u$. We consider solving this equation using the block Jacobi preconditioner $M$ (i.e., $M$ is the block diagonal part of $A$) or using the discrete Laplacian $L$ (i.e., $L$ is the discretization of $-\Delta$) as the preconditioner. For the purpose of testing inner-outer iterations, an iterative method (i.e., SOR, CG or preconditioned CG) is used for the preconditioners. We denote the resulting methods by CG-SOR, CG-CG and CG-PCG respectively. We compare the number of outer ($n_{outer}$) and total inner ($n_{inner}$) iterations required to reduce the $A^{-1}$-norm of the residual by $10^{-8}$. $N = 30, 50$ and $100$ will be used in our tests.

In the first test, $a(x,y) = \exp(y^2)$, $\lambda(x,y) = 0$ and $f(x,y) = x^2 y$, and the discrete Laplacian preconditioner $L$ is used. SOR (with the optimal parameter [11]), CG and PCG with the modified incomplete Cholesky factorization are used in the inner iteration to solve $Mz = r$. The results are listed in Tables 1, 2 and 3.
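For readers who wish to reproduce a setup of this kind, the following sketch (illustrative, not the authors' code) assembles the five-point discretization of (16) on the $N\times N$ interior grid and extracts the block Jacobi and discrete Laplacian preconditioners. The coefficient sampling at edge midpoints and the grid size are placeholder choices.

```python
# Sketch: five-point finite-difference discretization of (16) on the N x N
# interior grid, plus the block Jacobi and discrete Laplacian preconditioners.
import numpy as np
import scipy.sparse as sp

def assemble(N, a, lam):
    h = 1.0 / (N + 1)
    idx = lambda i, j: i * N + j            # row-major ordering of grid points
    A = sp.lil_matrix((N * N, N * N))
    D = sp.lil_matrix((N * N, N * N))
    for i in range(N):
        for j in range(N):
            x, y = (i + 1) * h, (j + 1) * h
            k = idx(i, j)
            # coefficient a sampled at the four edge midpoints
            aE, aW = a(x + h / 2, y), a(x - h / 2, y)
            aN, aS = a(x, y + h / 2), a(x, y - h / 2)
            A[k, k] = (aE + aW + aN + aS) / h**2
            if i + 1 < N: A[k, idx(i + 1, j)] = -aE / h**2
            if i - 1 >= 0: A[k, idx(i - 1, j)] = -aW / h**2
            if j + 1 < N: A[k, idx(i, j + 1)] = -aN / h**2
            if j - 1 >= 0: A[k, idx(i, j - 1)] = -aS / h**2
            D[k, k] = lam(x, y)
    return A.tocsr(), D.tocsr()

N = 30
A, D = assemble(N, a=lambda x, y: np.exp(y**2), lam=lambda x, y: 0.0)
L, _ = assemble(N, a=lambda x, y: 1.0, lam=lambda x, y: 0.0)   # discrete Laplacian

# block Jacobi preconditioner: the N tridiagonal diagonal blocks of A
blocks = [A[k * N:(k + 1) * N, k * N:(k + 1) * N] for k in range(N)]
M_block_jacobi = sp.block_diag(blocks, format="csr")
```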
           N=30 (κ=2.58)            N=50 (κ=2.64)            N=100 (κ=2.68)
  d        η       n_outer n_inner  η       n_outer n_inner  η       n_outer n_inner
  η=1      1.0     151     151      1.0     272     272      1.0     603     603
  *        0.9     137     151      0.9     202     259      0.9     233     549
  1        0.62    41      144      0.62    43      235      0.61    46      476
  0.8      0.50    29      148      0.49    31      235      0.49    32      511
  0.5 (η₀) 0.31    21      193      0.31    21      313      0.31    21      639
  0.3      0.19    17      224      0.18    17      368      0.18    17      720
  0.1      0.062   14      297      0.062   14      494      0.061   14      977
  0.05     0.031   13      353      0.031   13      580      0.031   14      1215
  0.01     0.0062  13      530      0.0062  13      868      0.0061  14      1400

  * : η = d/√κ > 1 for d ≥ 2; η = 0.9 is used instead.

Table 1: Iteration Counts for CG-SOR with discrete Laplacian preconditioner L

In the first rows of the tables, we also list the results for $\eta = 1$, in which case one step of inner iteration is carried out for each outer iteration. In this way, it is closely related to those cases with $\eta$ very close to $1$. Interestingly, however, this extreme case is usually equivalent to applying CG directly to the original matrix $A$, either unpreconditioned or with a preconditioner. For example, if the inner iteration is CG itself, one step of inner CG produces an inner solution $z_n$ in the same direction as $r_n$, and thus the outer iteration is exactly CG applied to $A$ (see the Appendix for a detailed discussion of other inner solvers). This is why convergence still occurs in these extreme cases; our analysis, which is based on $Mz_n = r_n + e_n$ only, does not cover this case. However, if $z_n$ is chosen only to satisfy
           N=30 (κ=2.58)            N=50 (κ=2.64)            N=100 (κ=2.68)
  d        η       n_outer n_inner  η       n_outer n_inner  η       n_outer n_inner
  η=1      1.0     136     136      1.0     232     232      1.0     478     478
  *        0.9     135     339      0.9     150     590      0.9     156     1131
  1        0.62    36      255      0.62    37      441      0.61    37      885
  0.8      0.50    26      248      0.49    27      425      0.49    27      813
  0.5 (η₀) 0.31    18      225      0.31    19      390      0.31    19      746
  0.3      0.19    15      271      0.19    15      441      0.18    16      926
  0.1      0.062   13      322      0.062   14      545      0.061   14      1069
  0.05     0.031   13      357      0.031   13      585      0.031   14      1206
  0.01     0.0062  13      458      0.0062  13      733      0.0061  13      1466

Table 2: Iteration Counts for CG-CG with discrete Laplacian preconditioner L
           N=30 (κ=2.58)            N=50 (κ=2.64)            N=100 (κ=2.68)
  d        η       n_outer n_inner  η       n_outer n_inner  η       n_outer n_inner
  η=1      1.0     29      29       1.0     38      38       1.0     54      54
  *        0.9     29      29       0.9     38      38       0.9     54      54
  1        0.62    27      42       0.62    28      55       0.61    34      87
  0.8      0.50    21      42       0.49    25      58       0.49    27      88
  0.5 (η₀) 0.31    17      47       0.31    20      62       0.31    19      91
  0.3      0.19    16      50       0.18    16      64       0.18    17      101
  0.1      0.062   14      69       0.062   14      84       0.061   14      116
  0.05     0.031   13      70       0.031   14      97       0.031   14      139
  0.01     0.0062  13      96       0.0062  13      123      0.0061  13      177

Table 3: Iteration Counts for CG-PCG with discrete Laplacian preconditioner L
$Mz_n = r_n + e_n$, but not in this particular way, convergence is not expected. The relatively small iteration counts are due to the fact that the original system is not too ill-conditioned.

In comparing the performance of different $\eta$ with respect to the outer iteration counts, it appears that $\eta_0$ lies right around the point at which the outer iteration count starts to increase significantly. This confirms our convergence analysis for the outer iteration. For the total inner iteration counts, the performance for larger $\eta$ seems to be irregular among different inner solvers, and this can be attributed to the different convergence characteristics in the extreme case $\eta = 1$ for different inner solvers (see the Appendix). In particular, we observe that for larger $\eta$, CG-CG (or CG-PCG) performs better than CG-SOR. This phenomenon was also observed in [3]. Overall, $\eta_0$ seems to be a reasonable choice in balancing the numbers of inner and outer iterations.

In the above example, $\kappa$ and thus $\eta_0$ remain nearly constant for different $N$. In the second test, we use the block Jacobi preconditioner $M$ and $a(x,y) = 1$, $\lambda(x,y) = 1 + \cos(1000\pi(x+y))$ and $f(x,y) = x^2 y$. Then $\kappa = 184.1$, $500.4$ and $1965.9$, and $\eta_0 = 0.037$, $0.022$ and $0.011$ for $N = 30$, $50$ and $100$ respectively. Both SOR and CG are used in the inner iteration. The results are listed in Tables 4 and 5 in the Appendix. Similar behaviour was observed.
6 Conclusion

We have formulated and analyzed an inexact preconditioned conjugate gradient method. The method is proved to be convergent for fairly large thresholds in the inner iterations. A linear convergence bound, though pessimistic, is obtained, which leads to a heuristic choice of the stopping threshold in the inner iteration. Numerical tests demonstrate the efficiency of this choice.

It still remains an unsolved problem to choose an optimal $\eta$ that minimizes the total amount of work (see [4]), although $\eta_0$ here provides a first approximation. Solving such a problem demands a sharper bound for the outer iteration and an analysis of the near-extreme threshold cases. It is not clear whether a better bound could be obtained from the approach of the present paper. There seem to be more properties of IPCG awaiting discovery. For example, better bounds for the steepest descent reduction factor $\nu_n$ may exist for IPCG, which in turn would lead to improvements of the results here.
References

[1] F.L. Bauer and A.S. Householder, Some inequalities involving the Euclidean condition of a matrix, Numer. Math. 2 (1960), 308-311.
[2] P. Concus, G.H. Golub and D.P. O'Leary, A generalized conjugate gradient method for the numerical solution of elliptic partial differential equations, in Sparse Matrix Computations, eds. J.R. Bunch and D.J. Rose, Academic Press, New York, NY, 1976.
[3] H.C. Elman and G.H. Golub, Inexact and preconditioned Uzawa algorithms for saddle point problems, SIAM J. Numer. Anal. 31 (1994), 1645-1661.
[4] E. Giladi, G.H. Golub and J.B. Keller, Inner and outer iterations for the Chebyshev algorithm, Stanford SCCM Technical Report 95-12, Stanford University, Stanford, CA, October 1995.
[5] G.H. Golub and M.L. Overton, Convergence of a two-stage Richardson iterative procedure for solving systems of linear equations, in Numerical Analysis, Lecture Notes in Mathematics 912, ed. G.A. Watson, Springer, New York-Heidelberg-Berlin, pp. 128-139.
[6] G.H. Golub and M.L. Overton, The convergence of inexact Chebyshev and Richardson iterative methods for solving linear systems, Numer. Math. 53 (1988), 571-593.
[7] G.H. Golub and C.F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, 1983.
[8] H. Munthe-Kaas, The convergence rate of inexact preconditioned steepest descent algorithm for solving linear systems, Technical Report, Department of Computer Science, Stanford University, Stanford, CA, March 1987.
[9] C.H. Tong and Q. Ye, Analysis of the finite precision bi-conjugate gradient algorithm for nonsymmetric linear systems, Stanford SCCM Technical Report 95-11, Stanford University, Stanford, CA, October 1995.
[10] A. van der Sluis and H.A. van der Vorst, The rate of convergence of conjugate gradients, Numer. Math. 48 (1986), 543-560.
[11] D.M. Young, Iterative Solution of Large Linear Systems, Academic Press, New York, 1971.
7 Appendix

This appendix contains a detailed discussion of IPCG in the extreme case $\eta = 1$, and the numerical results for the second example in section 5.

If the inner iteration is SOR, let $S_\omega$ be the iteration matrix. Then applying one step of SOR to solve $Mz_n = r_n$ produces the inner solution $z_n = S_\omega r_n$. Write $\hat M = S_\omega^{-1}$. Then $z_n$ satisfies $\hat M z_n = r_n$ exactly. Therefore, the outer iteration is indeed just the standard preconditioned CG for $A$ with $\hat M = S_\omega^{-1}$ as the preconditioner. This explains the convergence. Note that $\hat M$ may be a bad preconditioner.

If the inner iteration is PCG preconditioned by $M_0$ (in our first example, $M_0$ was the modified incomplete Cholesky factorization), then applying the first step of PCG to solve $Mz_n = r_n$ produces the inner solution $z_n = \hat\alpha_1\hat p_1 = \hat\alpha_1 M_0^{-1} r_n$, where $\hat p_1 = M_0^{-1} r_n$ is the first search direction in the inner PCG and $\hat\alpha_1$ is the first coefficient. However, constructing $x_n$ and $r_n$ in IPCG depends on the direction of $p_n$, which depends on that of $z_n$ only. Thus, with $z_n = \hat\alpha_1 M_0^{-1} r_n$, the outer iteration is equivalent to the one generated by $z_n = M_0^{-1} r_n$. Therefore, IPCG in this extreme case is equivalent to the standard preconditioned CG for $A$ with $M_0$ as the preconditioner. Again, this is the cause of convergence in the extreme case. When $\eta$ is very close to one, it takes a relatively constant number of inner iterations to obtain $z_n$, so similar considerations remain valid.

Finally, the following are the results of the second numerical example.
           N=30 (κ=184.1)            N=50 (κ=500.4)            N=100 (κ=1965.9)
  d        η        n_outer n_inner  η        n_outer n_inner  η        n_outer n_inner
  η=1      1.0      911     911      1.0      2448    2448     1.0      9580    9580
  5        0.37     911     911      0.22     633     1266     0.11     2277    4554
  3        0.22     264     528      0.13     633     1266     0.067    714     2142
  1        0.073    264     528      0.044    244     732      0.023    714     2142
  0.8      0.059    122     366      0.035    244     732      0.018    338     1352
  0.5 (η₀) 0.037    122     366      0.022    244     732      0.011    338     1352
  0.3      0.022    122     366      0.013    154     616      0.0067   338     1352
  0.1      0.0073   93      372      0.0044   138     690      0.0023   271     1355
  0.05     0.0037   90      450      0.0022   138     690      0.0011   267     1602
  0.01     0.00073  85      510      0.00044  137     822      0.00023  267     1869

Table 4: Iteration Counts for CG-SOR with block Jacobi preconditioner M
           N=30 (κ=184.1)            N=50 (κ=500.4)            N=100 (κ=1965.9)
  d        η        n_outer n_inner  η        n_outer n_inner  η        n_outer n_inner
  η=1      1.0      90      90       1.0      149     149      1.0      295     295
  5        0.37     132     259      0.22     154     306      0.11     299     597
  3        0.22     93      185      0.13     153     305      0.067    272     712
  1        0.073    83      166      0.044    136     356      0.023    268     803
  0.8      0.059    83      166      0.035    136     397      0.018    268     803
  0.5 (η₀) 0.037    84      247      0.022    137     411      0.011    288     1052
  0.3      0.022    84      247      0.013    137     496      0.0067   267     1067
  0.1      0.0073   84      336      0.0044   136     594      0.0023   267     1329
  0.05     0.0037   84      388      0.0022   136     678      0.0011   267     1483
  0.01     0.00073  84      494      0.00044  136     815      0.00023  267     1767

Table 5: Iteration Counts for CG-CG with block Jacobi preconditioner M