CONVERGENCE ANALYSIS OF PROJECTION METHODS FOR THE NUMERICAL SOLUTION OF LARGE LYAPUNOV EQUATIONS∗

V. SIMONCINI† AND V. DRUSKIN‡



Abstract. The numerical solution of large-scale continuous-time Lyapunov matrix equations is of great importance in many application areas. Assuming that the coefficient matrix is positive definite, but not necessarily symmetric, in this paper we analyze the convergence of projection type methods for approximating the solution matrix. Under suitable hypotheses on the coefficient matrix, we provide new asymptotic estimates for the error matrix when a Galerkin method is used in a Krylov subspace. Numerical experiments confirm the good behavior of our upper bounds when linear convergence of the solver is observed.

1. The problem. We are interested in the approximate solution of the following Lyapunov matrix equation:

    $AX + XA^\top = BB^\top$,        (1.1)

with $A$ a real matrix of large dimension and $B$ a real tall matrix. Here $A^\top$ indicates the transpose of $A$. We assume that the $n \times n$ matrix $A$ is either symmetric and positive definite, or nonsymmetric with positive definite symmetric part, that is, $(A + A^\top)/2$ is positive definite. In the following we mostly deal with the case of $B$ having a single column, that is, $B = b$, and we assume that $b$ has unit Euclidean norm, $\|b\| = 1$. Nonetheless, our results can be extended to the multiple vector case. This problem arises in a large variety of applications, such as signal processing and system and control theory. The symmetric solution $X$ carries important information on the stability and energy of an associated dynamical linear system and on the feasibility of order reduction techniques [2], [6], [8]. The analytic solution of (1.1) can be written as

    $X = \int_0^\infty e^{-tA} BB^\top e^{-tA^\top}\, dt = \int_0^\infty x x^\top\, dt$,        (1.2)

where we have set $x = x(t) = e^{-tA}B$. Let $\alpha_{\min}$ be the smallest eigenvalue of the symmetric part of $A$, $\alpha_{\min} = \lambda_{\min}((A + A^\top)/2) > 0$. Then it can be shown that $\|x\| \le \exp(-t\alpha_{\min})\|B\|$; see, e.g., [8, Lemma 3.2.1]. Projection type methods seek an approximate solution $X_m$ in a subspace of $\mathbb{R}^n$, by requiring that the residual $BB^\top - (AX_m + X_m A^\top)$ be orthogonal to this subspace. A particularly effective choice of approximation space is given by (here for $B = b$) the Krylov subspace $K_m(A, b) = \mathrm{span}\{b, Ab, \ldots, A^{m-1}b\}$, of dimension $m \le n$ [22], [23], [31]; we also refer to the richer bibliographic account collected in [2], [9], while we point to [33] for recent algorithmic progress within the Krylov subspace context. Abundant experimental evidence over the years has shown that the use of the space $K_m(A,b)$ often allows one to obtain a satisfactorily accurate approximation $X_m$ in a space of much lower dimension than $n$. A particularly attractive feature is that $X_m$ may be written as a low-rank matrix, $X_m = U_m U_m^\top$, with $U_m$ of low column rank, so that only the matrix $U_m$ needs to be stored.

∗ Version of April 5, 2008.
† Dipartimento di Matematica, Università di Bologna, Piazza di Porta S. Donato 5, I-40127 Bologna, Italy, and CIRSA, Ravenna, Italy ([email protected]).
‡ Scientific Advisor, Ph.D., 1 Hampshire St., Cambridge, MA 02139, USA ([email protected]).

To the best of our knowledge, no asymptotic convergence analysis of this Galerkin method is available in the literature. The aim of this paper is to fill this gap. We also refer to [30] for a priori estimates on the residual norm when solving the Sylvester equation with projection type methods; therein, the role of $\alpha_{\min}$ is also emphasized, although the bound derived in [30, Proposition 4.1] for the residual norm is of greater value as a non-stagnation condition for the procedure than as an estimate of the actual convergence behavior. To derive our error estimates, we shall use the integral representation (1.2) for both $X$ and $X_m$, and explicitly bound the norm of the error matrix $X - X_m$; we refer to [31] for early considerations in this direction. Our approach is highly inspired by, and fully relies on, the papers [13], [24], where general estimates for the error in approximating matrix operators by polynomial methods are derived. We provide explicit estimates when $A$ is symmetric, and when $A$ is nonsymmetric with its field of values (or spectrum) contained in certain not necessarily convex sets of $\mathbb{C}^+$. We also show that the convergence of the Galerkin method is closely related to that of Galerkin methods for solving the linear system $(A + \alpha_{\min} I)d = b$. Our estimates are asymptotic, and thus linear; that is, they do not capture the possibly superlinear convergence behavior of the method that is sometimes observed [29]. In the linear system setting, superlinear behavior is due to the fact that Krylov based methods tend to adapt to the (discrete) spectrum of $A$, accelerating convergence as spectral information is gained while the space is enlarged.
For symmetric $A$, recent results completely describe the behavior of Krylov subspace solvers in the presence of superlinear convergence [4], [5]; see also [34] for a discussion and more references. Throughout the paper we assume exact arithmetic.
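Before turning to the projection method, the objects introduced so far can be checked numerically on a toy problem. For a diagonal $A = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, the solution of (1.1) is available entrywise, $X_{ij} = b_i b_j/(\lambda_i + \lambda_j)$, which also follows from (1.2) by integrating $e^{-t(\lambda_i + \lambda_j)}$. The sketch below (our own illustrative NumPy code, not from the paper; matrix sizes and tolerances are arbitrary) verifies both representations, the second one by a truncated trapezoidal quadrature of (1.2).

```python
import numpy as np

n = 6
lam = np.linspace(1.0, 10.0, n)          # eigenvalues of the diagonal SPD A
A = np.diag(lam)
b = np.ones(n) / np.sqrt(n)              # unit-norm vector, as assumed in the paper

# Closed-form solution of (1.1) for diagonal A: X_ij = b_i b_j / (lam_i + lam_j)
X = np.outer(b, b) / (lam[:, None] + lam[None, :])

# Residual of the Lyapunov equation (1.1): A X + X A^T - b b^T
residual = np.linalg.norm(A @ X + X @ A.T - np.outer(b, b))

# Integral representation (1.2), truncated at T = 20 (e^{-2t} decay makes the
# tail negligible) and discretized with trapezoidal weights:
ts = np.linspace(0.0, 20.0, 4001)
dt = ts[1] - ts[0]
w = np.full(ts.size, dt); w[0] = w[-1] = dt / 2.0
xs = np.exp(-np.outer(ts, lam)) * b      # row t holds x(t)^T = (e^{-tA} b)^T
Xquad = (xs.T * w) @ xs                  # sum_t w_t x(t) x(t)^T

print(residual)                          # machine-precision small
print(np.linalg.norm(X - Xquad))         # small quadrature error
```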

2. Numerical solution and preliminary considerations. Given the Krylov subspace $K_m(A,b)$ and a matrix $V_m$ whose orthonormal columns span $K_m(A,b)$, with $b = V_m e_1$, we seek an approximation of the form $X_m = V_m Y_m V_m^\top$. Here and in the following, $e_i$ denotes the $i$th column of the identity matrix of given dimension. Imposing that the residual $R_m = bb^\top - (AX_m + X_m A^\top)$ be orthogonal to the given space (the Galerkin condition) yields the equation $V_m^\top R_m V_m = 0$, that is,

    $T_m Y_m + Y_m T_m^\top = e_1 e_1^\top$,

where $T_m = V_m^\top A V_m$; see, e.g., [2], [31]. The $m \times m$ matrix $Y_m$ can thus be computed by solving the resulting small-size Lyapunov equation. The matrix $X_m$ can be equivalently written in integral form. Indeed, let $x_m = x_m(t) = V_m e^{-tT_m} e_1$ be the so-called Krylov approximation to $x = x(t)$ in $K_m(A,b)$. Then $X_m$ can be written as

    $X_m = V_m \left( \int_0^\infty e^{-tT_m} e_1 e_1^\top e^{-tT_m^\top}\, dt \right) V_m^\top
         = \int_0^\infty V_m e^{-tT_m} e_1 e_1^\top e^{-tT_m^\top} V_m^\top\, dt
         = \int_0^\infty x_m x_m^\top\, dt$.
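As an illustration (our own sketch, not the authors' code), the whole projection procedure fits in a few lines of Python/NumPy: run the Arnoldi recursion (2.3) below to obtain $V_m$ and $T_m$, solve the projected equation $T_m Y_m + Y_m T_m^\top = e_1 e_1^\top$ by Kronecker vectorization, and assemble $X_m = V_m Y_m V_m^\top$. All test matrices and tolerances are illustrative.

```python
import numpy as np

def lyap_small(T, C):
    """Solve T Y + Y T^T = C by Kronecker vectorization (column-major vec)."""
    m = T.shape[0]
    K = np.kron(np.eye(m), T) + np.kron(T, np.eye(m))
    return np.linalg.solve(K, C.flatten(order="F")).reshape((m, m), order="F")

def galerkin_lyapunov(A, b, m):
    """Galerkin approximation X_m = V_m Y_m V_m^T in K_m(A, b), ||b|| = 1."""
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(m):                       # Arnoldi with modified Gram-Schmidt
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:              # happy breakdown: K_j is invariant
            break
        V[:, j + 1] = w / H[j + 1, j]
    Vm, Tm = V[:, :m], H[:m, :m]             # T_m = V_m^T A V_m
    C = np.zeros((m, m)); C[0, 0] = 1.0      # right-hand side e_1 e_1^T
    Ym = lyap_small(Tm, C)
    # A low-rank factor U_m with X_m = U_m U_m^T could be obtained from a
    # Cholesky/eigenvalue factorization of the (PSD) matrix Y_m.
    return Vm @ Ym @ Vm.T

# Small experiment: nonsymmetric A with positive definite symmetric part
rng = np.random.default_rng(0)
n = 50
A = 2 * np.eye(n) + 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n); b /= np.linalg.norm(b)
Xm = galerkin_lyapunov(A, b, 20)
res = np.linalg.norm(A @ Xm + Xm @ A.T - np.outer(b, b))
print(res)                                   # small residual for m = 20
```

For $m = n$ (and a generic $b$) the Krylov space is the whole of $\mathbb{R}^n$ and $X_m$ coincides with the exact solution, which gives a simple correctness check.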

We are interested in finding a priori bounds for the 2-norm of the error matrix, that is, for $\|X - X_m\|$, where the 2-norm is the matrix norm induced by the Euclidean vector norm. We start by observing that $\|X - X_m\| = \| \int_0^\infty (xx^\top - x_m x_m^\top)\, dt \|$, and that

    $\|xx^\top - x_m x_m^\top\| = \|x(x - x_m)^\top + (x - x_m)x_m^\top\| \le (\|x\| + \|x_m\|)\, \|x - x_m\|$.

It holds that $\lambda_{\min}((T_m + T_m^\top)/2) \ge \alpha_{\min}$. Using $\|x_m\| \le \exp(-t\lambda_{\min}((T_m + T_m^\top)/2)) \le \exp(-t\alpha_{\min})$, we have

    $\|X - X_m\| \le \int_0^\infty \|xx^\top - x_m x_m^\top\|\, dt
                 \le \int_0^\infty (\|x\| + \|x_m\|)\, \|x - x_m\|\, dt
                 \le 2 \int_0^\infty e^{-t\alpha_{\min}} \|x - x_m\|\, dt$.        (2.1)

We notice that

    $e^{-t\alpha_{\min}} \|x - x_m\| = \| \exp(-t(A + \alpha_{\min} I)) b - V_m \exp(-t(T_m + \alpha_{\min} I)) e_1 \| =: \|\hat{x} - \hat{x}_m\|$,

which is the error in the approximation of the exponential of the shifted matrix $A + \alpha_{\min} I$ with the Krylov subspace solution. Therefore,

    $\|X - X_m\| \le 2 \int_0^\infty \|\hat{x} - \hat{x}_m\|\, dt$.        (2.2)
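The equality above is just the shift property of the matrix exponential: since $\alpha_{\min} I$ commutes with $A$ (and with $T_m$), $\exp(-t(A + \alpha_{\min} I)) = e^{-t\alpha_{\min}} \exp(-tA)$. A quick numerical check of this property, with illustrative values of our choosing:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = 2 * np.eye(4) + 0.3 * rng.standard_normal((4, 4))
b = rng.standard_normal(4); b /= np.linalg.norm(b)
t, alpha = 0.7, 1.5

lhs = np.exp(-t * alpha) * (expm(-t * A) @ b)     # e^{-t alpha} x(t)
rhs = expm(-t * (A + alpha * np.eye(4))) @ b      # exponential of the shifted matrix
print(np.linalg.norm(lhs - rhs))                  # machine-precision agreement
```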

In the following we will bound $\|X - X_m\|$ by judiciously integrating an upper bound of the integrand function. In fact, estimates for the error norm $\|\hat{x} - \hat{x}_m\|$ are available in the literature, which show superlinear convergence of the Krylov approximation $x_m$ to the exponential vector $x$; see, e.g., [12], [39], [36], [21]. However, these bounds are not appropriate when used in the generalized integral above. The matrix $V_m = [v_1, \ldots, v_m]$ can be generated one vector at a time, by means of the following Arnoldi recursion:

    $A V_m = V_m T_m + v_{m+1} t_{m+1,m} e_m^\top, \qquad v_1 = b/\|b\|$,        (2.3)

where $V_{m+1} = [V_m, v_{m+1}]$ has orthonormal columns and spans $K_{m+1}(A,b)$. In general, $T_m$ is upper Hessenberg; it is symmetric, and thus tridiagonal, when $A$ is itself symmetric. We conclude this section with a technical lemma, whose proof is included for completeness; see, e.g., [24] for a similar result in finite precision arithmetic.

Lemma 2.1. Let $P_k$ be a polynomial of degree at most $k$. Let $f(z) = \sum_{k=0}^\infty f_k P_k(z)$ be a convergent series expansion of the analytic function $f$, and assume that the expansions of $f(A)$ and of $f(T_m)$ are also well defined. Then

    $\|f(A)b - V_m f(T_m) e_1\| \le \sum_{k=m}^\infty |f_k| \left( \|P_k(A)\| + \|P_k(T_m)\| \right)$.

Proof. We have

    $f(A)b - V_m f(T_m) e_1 = \sum_{k=0}^{m-1} f_k \left( P_k(A)b - V_m P_k(T_m) e_1 \right) + \sum_{k=m}^{\infty} f_k \left( P_k(A)b - V_m P_k(T_m) e_1 \right)$.

Using the Arnoldi relation and the fact that $T_m$ is upper Hessenberg, $A^k V_m e_1 = V_m T_m^k e_1$ for $k = 1, \ldots, m-1$, and thus $P_k(A)b = P_k(A) V_m e_1 = V_m P_k(T_m) e_1$, $k = 1, \ldots, m-1$, so that

    $f(A)b - V_m f(T_m) e_1 = \sum_{k=m}^{\infty} f_k \left( P_k(A)b - V_m P_k(T_m) e_1 \right)$.

Taking norms, the result follows.

3. The symmetric case. In the symmetric case, we show that the asymptotic convergence rate of the Krylov subspace solver is the same as that of the Conjugate Gradient method applied to the shifted system $(A + \alpha_{\min} I)x = b$, where $\alpha_{\min} = \lambda_{\min}$, the smallest eigenvalue of the positive definite matrix $A$ [18]; see also section 5.

Proposition 3.1. Let $A$ be symmetric and positive definite, and let $\lambda_{\min}$ be the smallest eigenvalue of $A$. Let $\hat\lambda_{\min}, \hat\lambda_{\max}$ be the extreme eigenvalues of $A + \lambda_{\min} I$, and $\hat\kappa = \hat\lambda_{\max}/\hat\lambda_{\min}$. Then

    $\|X - X_m\| \le \frac{4(\sqrt{\hat\kappa} + 1)}{\hat\lambda_{\min} \sqrt{\hat\kappa}} \left( \frac{\sqrt{\hat\kappa} - 1}{\sqrt{\hat\kappa} + 1} \right)^m$.        (3.1)
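The convergence factor in (3.1) is exactly the classical CG rate for the shifted matrix $A + \lambda_{\min} I$, whose condition number is $\hat\kappa = (\lambda_{\max} + \lambda_{\min})/(2\lambda_{\min})$. A small arithmetic illustration (spectral values of our choosing):

```python
import numpy as np

lam_min, lam_max = 1.0, 9.0
# Extreme eigenvalues of the shifted matrix A + lam_min I
lhat_min, lhat_max = 2 * lam_min, lam_max + lam_min
kappa_hat = lhat_max / lhat_min                 # condition number of the shift
rate = (np.sqrt(kappa_hat) - 1) / (np.sqrt(kappa_hat) + 1)
print(kappa_hat, rate)                          # 5.0 and rate ~ 0.382
```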

Proof. Using (2.1) we are left to estimate $\int_0^\infty e^{-t\alpha_{\min}} \|x - x_m\|\, dt$. Let $\lambda_{\max}$ be the largest eigenvalue of $A$. Formula (4.2) in [12] shows that both $x$ and $x_m$ may be written as Chebyshev series, e.g.,(1)

    $x = 2 \exp\!\left(-t\,\frac{\lambda_{\max} + \lambda_{\min}}{2}\right) \sum_{k=0}^{\infty}{}' \, I_k\!\left(t\,\frac{\lambda_{\max} - \lambda_{\min}}{2}\right) T_k(A_0)\, b$,

where $I_k$ is the Bessel function of an imaginary argument, or modified Bessel function, $T_k$ is the Chebyshev polynomial of degree $k$, and $A_0 = \frac{\lambda_{\max} + \lambda_{\min}}{\lambda_{\max} - \lambda_{\min}} I - \frac{2}{\lambda_{\max} - \lambda_{\min}} A$, so that $\|T_k(A_0)\| \le 1$ holds; see also [1, formula (9.6.34)]. Since polynomials of degree up to $k-1$ are exactly represented in the Krylov subspace of dimension $k$ (see [12] and also Lemma 2.1), it thus follows that

    $\|x - x_m\| \le 4 \exp\!\left(-t\,\frac{\lambda_{\max} + \lambda_{\min}}{2}\right) \sum_{k=m}^{\infty} I_k\!\left(t\,\frac{\lambda_{\max} - \lambda_{\min}}{2}\right)$.

Therefore, setting $p = (3\lambda_{\min} + \lambda_{\max})/(\lambda_{\max} - \lambda_{\min}) = (\hat\kappa + 1)/(\hat\kappa - 1)$ and $\rho = p + \sqrt{p^2 - 1}$, we have

(1) The prime in the series indicates that the first term is divided by two.


    $\|X - X_m\| \le 2 \int_0^\infty \|\hat{x} - \hat{x}_m\|\, dt
                 \le 8 \sum_{k=m}^{\infty} \int_0^\infty \exp\!\left(-t\left(\tfrac{3}{2}\lambda_{\min} + \tfrac{1}{2}\lambda_{\max}\right)\right) I_k\!\left(t\,\frac{\lambda_{\max} - \lambda_{\min}}{2}\right) dt$
    $\phantom{\|X - X_m\|} = \frac{8}{\sqrt{\left(\tfrac{3}{2}\lambda_{\min} + \tfrac{1}{2}\lambda_{\max}\right)^2 - \tfrac{(\lambda_{\max} - \lambda_{\min})^2}{4}}} \sum_{k=m}^{\infty} \left( \frac{1}{p + \sqrt{p^2 - 1}} \right)^{\!k}$        (3.2)
    $\phantom{\|X - X_m\|} = \frac{8(\hat\kappa + 1)}{\sqrt{\hat\kappa}\,(3\lambda_{\min} + \lambda_{\max})} \sum_{k=m}^{\infty} \left( \frac{1}{p + \sqrt{p^2 - 1}} \right)^{\!k}
                           = \frac{4(\hat\kappa + 1)}{\sqrt{\hat\kappa}\,(3\lambda_{\min} + \lambda_{\max})}\, \frac{2\rho}{\rho - 1}\, \frac{1}{\rho^m}$.

Rearranging the last expression, and using $3\lambda_{\min} + \lambda_{\max} = \hat\lambda_{\min}(\hat\kappa + 1)$ together with $\rho = (\sqrt{\hat\kappa} + 1)/(\sqrt{\hat\kappa} - 1)$, the bound (3.1) follows.

To get (3.2) we used the following integral formula for Bessel functions in [19, Formula (6.611.4)]:

    $\int_0^\infty e^{-\alpha t} I_\nu(\beta t)\, dt = \frac{\beta^\nu}{\sqrt{\alpha^2 - \beta^2}\left(\alpha + \sqrt{\alpha^2 - \beta^2}\right)^{\nu}}, \qquad \nu > -1, \quad \alpha > |\beta|$.

4. The nonsymmetric case. Let $\Omega$ be a compact set of $\mathbb{C}^+$, and let $\phi$ be the conformal mapping of the exterior of the closed unit disk $D(0,1)$ onto the exterior of $\Omega$, with $\phi(\infty) = \infty$ and $\phi'(\infty) = \mathrm{cap}(\Omega) > 0$; see, e.g., [35, section 1.2]. Here $D(0,r)$ denotes the disk of center $0$ and radius $r$. Let $\psi$ denote the inverse of $\phi$. The principal (polynomial) part of the Laurent series of $\phi^k$ is the Faber polynomial $\Phi_k$, of exact degree $k$. Under these hypotheses, it was recently shown by Beckermann [3] that $\|\Phi_k(M)\| \le 2$ for any matrix $M$ whose field of values is contained in the convex and compact set $\Omega$. Assume that $f(\lambda) = \exp(-\lambda t)$ is regular in $\Omega = \psi(D(0, r_2))$, and let

    $\exp(-\lambda t) = \sum_{k=0}^{\infty} f_k \Phi_k(\lambda)$

be the expansion of $\exp(-\lambda t)$ in Faber series in $\Omega$, with $1 < r_2 < \infty$. For $1 < r < r_2$, the expansion coefficients are given as

    $f_k = \frac{1}{2\pi i} \int_{|\tau| = r} \frac{\exp(-t\psi(\tau))}{\tau^{k+1}}\, d\tau, \qquad |f_k| \le \frac{1}{r^k} \sup_{|\tau| = r} |\exp(-t\psi(\tau))|$,        (4.1)

see, e.g., [35, sec. 2.1.3], [37]. Note that $f_k = f_k(t)$.

4.1. Field of values contained in an ellipse. The case in which the field of values is contained in an ellipse is a particularly natural generalization of the symmetric case.

Proposition 4.1. Assume the field of values of the real matrix $A$ is contained in the ellipse $E \subset \mathbb{C}^+$ of center $(c, 0)$, foci $(c \pm d, 0)$, and semi-axes $a_1$ and $a_2$, so that $d = \sqrt{a_1^2 - a_2^2}$. Then

    $\|X - X_m\| \le \frac{8}{\sqrt{(\alpha_{\min} + c)^2 - d^2}}\, \frac{r_2}{r_2 - 1} \left( \frac{1}{r_2} \right)^{\!m}$,

where $r_2 = \frac{c + \alpha_{\min}}{2r} + \frac{1}{2r}\sqrt{(c + \alpha_{\min})^2 - d^2}$ and $r = \frac{a_1 + a_2}{2}$.

Proof. For $\lambda \in E$, and setting $\hat{r} = 2r/d$, we have $\Phi_k(\lambda) = 2\hat{r}^{-k} T_k\!\left(\frac{\lambda - c}{d}\right)$ (see, e.g., [38], [17]); therefore we can explicitly write the Faber series on $E$ via Chebyshev ones as

    $e^{-\lambda t} = 2\exp(-tc) \sum_{k=0}^{\infty}{}' \, I_k(td)\, T_k\!\left(\frac{\lambda - c}{d}\right) = \exp(-tc) \sum_{k=0}^{\infty}{}' \, I_k(td)\, \hat{r}^{k} \Phi_k(\lambda)$.

Using Lemma 2.1, the bounds $\|\Phi_k(T_m)\| \le 2$, $\|\Phi_k(A)\| \le 2$ obtained in [3], and the same integral formula for Bessel functions as in the proof of Proposition 3.1, we obtain

    $\|X - X_m\| \le 2 \int_0^\infty \|\hat{x} - \hat{x}_m\|\, dt
                 \le 8 \sum_{k=m}^{\infty} \int_0^\infty e^{(-\alpha_{\min} - c)t}\, I_k(td)\, \hat{r}^{k}\, dt$
    $\phantom{\|X - X_m\|} = \frac{8}{\sqrt{(\alpha_{\min} + c)^2 - d^2}} \sum_{k=m}^{\infty} \left( \frac{1}{r_2} \right)^{\!k}
                           = \frac{8}{\sqrt{(\alpha_{\min} + c)^2 - d^2}}\, \frac{r_2}{r_2 - 1} \left( \frac{1}{r_2} \right)^{\!m}$.
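When the mapping $\psi$ is available in closed form, the coefficients in (4.1) can be approximated by the trapezoidal rule on the circle $|\tau| = r$, which is spectrally accurate for periodic analytic integrands. The sketch below (our own, with illustrative parameter values) does this for an ellipse-type mapping $\psi(\tau) = \tilde{c} + r\tau + (d/2)^2/(r\tau)$ of the kind used in section 4.2; by Cauchy's theorem the computed $f_k$ must not depend on the chosen radius, which gives a cheap consistency check of the implementation.

```python
import numpy as np

def faber_coeffs(psi, t, radius, kmax, npts=512):
    """f_k = (1/2 pi i) \oint_{|tau|=radius} exp(-t psi(tau)) tau^{-(k+1)} dtau,
    k = 0..kmax, via the trapezoidal rule (dtau = i tau dtheta on the circle)."""
    theta = 2 * np.pi * np.arange(npts) / npts
    tau = radius * np.exp(1j * theta)
    g = np.exp(-t * psi(tau))
    return np.array([np.mean(g * tau ** (-k)) for k in range(kmax + 1)])

# Ellipse-type mapping with center c + alpha and semi-axes r +/- (d/2)^2 / r
c, alpha, r, d = 5.0, 1.0, 1.5, np.sqrt(3.0)
psi = lambda tau: c + alpha + r * tau + (d / 2) ** 2 / (r * tau)

f1 = faber_coeffs(psi, t=1.0, radius=1.3, kmax=8)
f2 = faber_coeffs(psi, t=1.0, radius=2.0, kmax=8)
print(np.max(np.abs(f1 - f2)))        # contour independence: near machine precision
```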

We show the quality of the estimate with a few numerical examples.

Example 4.2. We consider a 400 × 400 (normal) diagonal matrix $A$ whose eigenvalues are $\lambda = c + a_1\cos\theta + i\, a_2\sin\theta$, with $\theta$ uniformly distributed in $[0, 2\pi]$, $c = 20$, and semi-axes $a_1 = 10$ and $a_2 = 2$, so that the eigenvalues lie on an elliptic curve with center $c$ and focal distance $d = \sqrt{a_1^2 - a_2^2} = \sqrt{96}$. Here $\alpha_{\min} \approx 10.001$, yielding $1/r_2 \approx 0.2056$ for Proposition 4.1. The vector $b$ is the vector of all ones, normalized to have unit norm. In Figure 4.1 we report the error associated with the Krylov subspace approximation of the Lyapunov solution, and the estimate of Proposition 4.1. The agreement is impressive, as should be expected, since the spectrum lies exactly on the elliptic curve and the matrix is normal, so that the field of values coincides with the convex hull of the spectrum.


Fig. 4.1. Example 4.2. True error and its estimate of Proposition 4.1 for the Krylov subspace solver of the Lyapunov equation.
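The rate quoted in Example 4.2 can be reproduced directly from the formula for $r_2$ in Proposition 4.1, using the values $c = 20$, $a_1 = 10$, $a_2 = 2$, and $\alpha_{\min} \approx 10.001$ reported above:

```python
import numpy as np

c, a1, a2 = 20.0, 10.0, 2.0
alpha_min = 10.001                      # value reported in Example 4.2
d = np.sqrt(a1**2 - a2**2)              # focal distance, sqrt(96)
r = (a1 + a2) / 2.0

# r_2 from Proposition 4.1
r2 = (c + alpha_min) / (2 * r) + np.sqrt((c + alpha_min) ** 2 - d**2) / (2 * r)
print(1.0 / r2)                         # ~ 0.2056, the rate reported in the example
```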

Example 4.3. We next consider the 400 × 400 matrix $A$ stemming from the centered finite difference discretization of the operator $L(u) = -\Delta u + 40(x + y)u_x + 200u$ in the unit square, with Dirichlet boundary conditions. The spectrum of $A$, together with its field of values (computed with the Matlab function fv.m in [20]) and a surrounding ellipse, are shown in the left plot of Figure 4.2. Here $\alpha_{\min} = 0.4533$. The ellipse has parameters $c = 4.4535$, $a_1 = c - \alpha_{\min}$, $a_2 = 3.7$, with $a_1, a_2$ the semi-axes' lengths, and focal distance $d = \sqrt{a_1^2 - a_2^2} \approx 1.52$, yielding $1/r_2 \approx 0.8044$. The right plot of Figure 4.2 shows the convergence history of the Krylov solver, together with the asymptotic factor $(1/r_2)^m$ of Proposition 4.1. The initial asymptotic convergence rate is reasonably well captured by the estimate.


Fig. 4.2. Example 4.3. Left plot: Spectrum of A, field of values (thin solid curve), and smallest computed elliptic curve including the field of values (thick solid curve). Right plot: True error and its asymptotic factor in the estimate of Proposition 4.1 for the Krylov subspace solver of the Lyapunov equation.


Fig. 4.3. Example 4.4. Left plot: real spectrum, field of values (thin solid curve), and smallest computed elliptic curve including the field of values (thick solid curve). Right plot: True error and its estimate of Proposition 4.1 for the Krylov subspace solver of the Lyapunov equation.

Example 4.4. We consider the 400 × 400 bidiagonal matrix $A$ with uniformly distributed diagonal elements in the interval $[10, 110]$ and unit upper diagonal. In this case $\alpha_{\min} = 9.4692$. The vector $b$ is the normalized vector of all ones. Our numerical computation, reported in the left plot of Figure 4.3, showed that the field of values of $A$ (computed once again with fv.m [20]) is contained in an ellipse with center $c = 60$, semi-axes $a_1 = 50.8$, $a_2 = 4.2$, and focal distance $d = \sqrt{a_1^2 - a_2^2} \approx 50.62$, yielding $1/r_2 = 0.4699$. The right plot of Figure 4.3 shows the convergence history

of the Krylov solver, together with the asymptotic factor in Proposition 4.1. Once again, the asymptotic rate is a good estimate of the actual convergence rate. Even more accurate bounds for this example might be obtained by using more appropriate conformal mappings than the ellipse. It may be possible to include the field of values in a rectangle, for which the mapping $\psi$ could be numerically estimated [14], [16]; see also Example 4.9.

4.2. Field of values contained in a more general region. For a more general region, we employ the general expansion in Faber series. We proceed as follows. Using Lemma 2.1, we write

    $\|\hat{x} - \hat{x}_m\| \le \sum_{k=m}^{\infty} |f_k| \left( \|\Phi_k(A + \alpha_{\min} I)\| + \|\Phi_k(T_m + \alpha_{\min} I)\| \right)$.

If we consider a convex set containing the field of values of $A + \alpha_{\min} I$, the result in [3] allows us to write $\|\Phi_k(A + \alpha_{\min} I)\| \le 2$ and $\|\Phi_k(T_m + \alpha_{\min} I)\| \le 2$, so that

    $\|\hat{x} - \hat{x}_m\| \le 4 \sum_{k=m}^{\infty} |f_k|$,

and we can conclude by using (4.1), once appropriate estimates for the sup function and for $r_2$ are identified. More precisely, if $M = M(t) > 0$ is such that $|f_k| \le M r_2^{-k}$ for all $k$, and $\int_0^\infty M\, dt$ converges, then

    $\|X - X_m\| \le 8 \left( \int_0^\infty M\, dt \right) \frac{r_2}{r_2 - 1} \left( \frac{1}{r_2} \right)^{\!m}$.

In the next few corollaries we derive results of this type, with a choice of $r_2$ such that the generalized integral converges. In case we wish to work only with a set containing the spectrum, but not necessarily the field of values of $A + \alpha_{\min} I$, we can relax the convexity assumption and bound the norm of the Faber polynomials in $A$ differently, at the price of keeping the condition number of the eigenvector matrix in the convergence estimate. This case will be analyzed at the end of this section, and one example will be given around Corollary 4.10. We start by considering once again the case when the field of values is contained in an ellipse, for which the result is qualitatively the same as that in Proposition 4.1. The reason for reproducing the result in the case of the ellipse is precisely to appreciate the limited loss of accuracy in the bound when the more general approach is used, and to show the calculations explicitly in the case of an easy-to-handle mapping.

Corollary 4.5. Assume the field of values of the real matrix $A$ is contained in an ellipse $E \subset \mathbb{C}^+$ of center $(c, 0)$ and semi-axes $a_1$ and $a_2$, $a_1 > a_2$. Let $\alpha_{\min} = \lambda_{\min}((A + A^\top)/2)$. Then for $\varepsilon$ satisfying $0 < \varepsilon \le 2\alpha_{\min}$,

    $\|X - X_m\| \le \frac{8}{\varepsilon}\, \frac{r_2}{r_2 - 1} \left( \frac{1}{r_2} \right)^{\!m}$,

where

    $r_2 = \frac{c + \alpha_{\min} - \varepsilon}{2r} + \frac{1}{2r}\sqrt{(c + \alpha_{\min} - \varepsilon)^2 - d^2}, \qquad r = \frac{a_1 + a_2}{2}, \qquad d = \sqrt{a_1^2 - a_2^2}$.


Proof. Let $\alpha = \alpha_{\min}$ and let $\hat{E}$ be the ellipse containing the field of values of $A + \alpha I$. We consider the mapping whose boundary image of the unit disk is $\partial\hat{E}$, namely $\psi(\tau) = c + \alpha + r\tau + \frac{(d/2)^2}{r\tau}$, with $\tau = e^{i\theta}$, so that $\psi(|\tau| = 1) = \partial\hat{E}$. For $\varepsilon > 0$, we define $r_2 := |\psi^{-1}(\varepsilon)|$, so that

    $\exp(-t\varepsilon) = \max_{|\tau| = r_2} |\exp(-t\psi(\tau))|$,

and for $1 < \hat{r} < r_2$,

    $\frac{1}{2\pi} \int_0^{2\pi} |f(\psi(\hat{r} e^{i\theta}))|\, d\theta \le \exp(-t\varepsilon) =: M(t)$.        (4.2)

Since $\hat{E}$ is convex, it follows that $\|\Phi_k(A + \alpha I)\| \le 2$ for $k = 0, 1, \ldots$; see [3]. The same holds for $\|\Phi_k(T_m + \alpha I)\|$, since the field of values of $T_m + \alpha I$ is included in that of $A + \alpha I$. Therefore, Lemma 2.1 ensures that

    $\|\hat{x} - \hat{x}_m\| \le \sum_{k=m}^{\infty} |f_k| \left( \|\Phi_k(A + \alpha I)\| + \|\Phi_k(T_m + \alpha I)\| \right) \le 4 \exp(-t\varepsilon)\, \frac{r_2}{r_2 - 1} \left( \frac{1}{r_2} \right)^{\!m}$.

Finally, using $\int_0^\infty \exp(-t\varepsilon)\, dt = \varepsilon^{-1}$,

    $\|X - X_m\| \le 2\int_0^\infty \|\hat{x} - \hat{x}_m\|\, dt \le \frac{8}{\varepsilon}\, \frac{r_2}{r_2 - 1} \left( \frac{1}{r_2} \right)^{\!m}$,

which completes the proof.

The ideal result for $\|x - x_m\|$ would take $r_2$ equal to $|\psi^{-1}(0)|$, and not $|\psi^{-1}(\varepsilon)|$, in the proof. However, this would make $M$ in (4.2) equal to one, and the generalized integral would not converge. The result above can be compared with the sharper one in Proposition 4.1. In practice, however, the asymptotic result is not affected by the use of $\varepsilon$, since it is sufficient to take $\varepsilon$ small compared to $\alpha_{\min}$, and the same asymptotic rate as in Proposition 4.1 is recovered; only the multiplicative factor increases. Therefore, setting $r_{2,0} = |\psi^{-1}(0)|$, the result above shows that

    $\|X - X_m\| = O\!\left( \left( \frac{1}{r_{2,0}} \right)^{\!m} \right)$.        (4.3)

The following mapping is a modified version of the external mapping used, for instance, in [21]:

    $\psi(\tau) = \gamma_1 - \gamma_2 \left(1 - \frac{1}{\tau}\right)^{2-\theta} \tau, \qquad \tau = \sigma e^{i\omega}, \quad |\tau| \ge 1$,        (4.4)

for $0 < \theta < 1$ and $\gamma_1, \gamma_2 \in \mathbb{R}^+$. The function $\psi$ maps the exterior of the disk $D(0,1)$ onto a wedge-shaped convex set $\Omega$ in $\mathbb{C}^+$. The following result holds.

Corollary 4.6. Let $\hat\Omega \subset \mathbb{C}^+$ be the wedge-shaped set which is the image through $\hat\psi$ of the disk $D(0,1)$, where $\hat\psi$ is as in (4.4). Assume the field of values of the matrix $A + \alpha_{\min} I$, with $\alpha_{\min} = \lambda_{\min}((A + A^\top)/2)$, is contained in $\hat\Omega$. For $0 < \varepsilon < 2\alpha_{\min}$, let $r_2 = |\hat\psi^{-1}(\varepsilon)|$. Then

    $\|X - X_m\| \le \frac{8}{\varepsilon}\, \frac{r_2}{r_2 - 1} \left( \frac{1}{r_2} \right)^{\!m}$.


Proof. The proof follows the same steps as that of Corollary 4.5.

Example 4.7. We consider the 400 × 400 (normal) diagonal matrix $A$ whose eigenvalues lie on the curve $\psi(\tau) = 2 - 2(1 - 1/\tau)^{2-\omega}\tau$, for $\tau$ on the boundary of $D(0,1)$, with $\omega = 0.3$ (see the left plot of Figure 4.4). Here $\alpha_{\min} = 1.9627$. The image of the mapping $\hat\psi(\tau) = \alpha_{\min} + \psi(\tau)$ thus contains the spectrum of $A + \alpha_{\min} I$. Numerical computation yields $r_{2,0} = |\hat\psi^{-1}(0)| \approx 3.5063$. The vector $b$ is the normalized vector of all ones. The right plot of Figure 4.4 shows the convergence history of the Krylov solver, together with the asymptotic factor $(1/r_{2,0})^m$ in the estimate of Corollary 4.6. The linear asymptotic convergence is fully captured by the estimate.


Fig. 4.4. Example 4.7. Left plot: Spectrum of A. Right plot: True error and the asymptotic factor (1/r_{2,0})^m related to Corollary 4.6 for the Krylov subspace solver of the Lyapunov equation.
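The value $r_{2,0} = |\hat\psi^{-1}(0)| \approx 3.5063$ quoted in Example 4.7 can be recovered by a one-dimensional search: for this mapping the relevant preimage of $0$ lies on the positive real axis, so a simple bisection on $\hat\psi(s) = \alpha_{\min} + 2 - 2(1 - 1/s)^{2-\omega}s$ for real $s > 1$ suffices. A sketch (our own; the bracket was chosen by inspection of the sign of $\hat\psi$):

```python
alpha_min, omega = 1.9627, 0.3
psi_hat = lambda s: alpha_min + 2.0 - 2.0 * (1.0 - 1.0 / s) ** (2.0 - omega) * s

# psi_hat is positive near s = 1.5 and negative at s = 10: bisect for the root
lo, hi = 1.5, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if psi_hat(mid) > 0:
        lo = mid
    else:
        hi = mid
r20 = 0.5 * (lo + hi)
print(r20, 1.0 / r20)                   # r_{2,0} ~ 3.5063, rate 1/r_{2,0} ~ 0.2852
```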

In our next examples we numerically determine a contour bounding the field of values of the coefficient matrix. Indeed, more general mappings than in the examples above can be obtained and numerically approximated within the class of Schwarz-Christoffel conformal mappings [10]. In all cases, the vector $b$ was taken to be the normalized vector of all ones.

Example 4.8. We consider the 200 × 200 Toeplitz matrix $A = \mathrm{Toeplitz}(-1, -1, 2, 0.1)$. In this case, the asymptotic convergence rate was determined numerically. To this end, we used the Schwarz-Christoffel mapping toolbox [11] in Matlab to compute a conformal mapping whose image approximates the boundary of the field of values of $A$ (cf. left plot of Figure 4.5). A polygon with a few vertices approximating $\partial F(A + \alpha_{\min} I)$ was obtained with fv.m, and this was then injected into the Schwarz-Christoffel inverse mapping function to construct the sought-after mapping and the value of $r_{2,0}$ according to (4.3). The asymptotic rate was determined to be $1/r_{2,0} \approx 0.8859$. The right plot in Figure 4.5 shows the extremely good agreement between the true error and the asymptotic rate, for this numerically determined mapping.


Fig. 4.5. Example 4.8. Left: spectrum (’x’) and approximated field of values (solid thick blue). Right: true convergence rate and asymptotic estimate (1/r2,0 )m .

Example 4.9. We consider once again the matrix of Example 4.4, and use the Schwarz-Christoffel mapping toolbox to generate a sharper estimate of the polygon including the field of values. This provides a refined numerical mapping, and a more accurate convergence rate. The polygon approximating the field of values of $A + \alpha_{\min} I$ is shown in the left plot of Figure 4.6, while the history of the error norm and the estimate for the numerically computed value $1/r_{2,0} \approx 0.4445$ (cf. (4.3)) are reported in the right plot of Figure 4.6. The estimated convergence rate is clearly higher, that is, $1/r_{2,0}$ is smaller, than the value computed with the ellipse, which was $1/r_2 \approx 0.4699$.


Fig. 4.6. Example 4.9. Left: spectrum (’x’) and approximated field of values (solid thick blue). Right: true convergence rate and asymptotic estimate (1/r2,0 )m .

The following mapping was analyzed in [26]; it is associated with a non-convex domain, and the specialized case of an annular sector is discussed, for instance, in [7]. Given a set $\Omega$, assume that $\partial\Omega$ is an analytic Jordan curve. If $\Omega$ is of bounded (or finite) boundary rotation, then

    $\max_{z \in \Omega} |\Phi_k(z)| \le \frac{V(\Omega)}{\pi}$,

where $V(\Omega)$ is the boundary rotation of $\Omega$, defined as the total variation of the angle between the positive real axis and the tangent of $\partial\Omega$. In particular, this bound is scale-invariant, so that it also holds that $V(s\Omega) = V(\Omega)$ [26]. These important properties ensure that for a diagonalizable matrix $A$, $\|\Phi_k(A + \alpha_{\min} I)\|$ is bounded independently of $k$ on a non-convex set with bounded boundary rotation. Indeed, letting $A = Q\Lambda Q^{-1}$ be the spectral decomposition of $A$, then $\|\Phi_k(A + \alpha_{\min} I)\| \le \kappa(Q)\, \|\Phi_k(\Lambda + \alpha_{\min} I)\|$, where $\kappa(Q) = \|Q\|\, \|Q^{-1}\|$, and the estimate above can be applied.

Corollary 4.10. Assume that $A$ is diagonalizable, and let $A = Q\Lambda Q^{-1}$ be its spectral decomposition. Assume the spectrum of $A + \alpha_{\min} I$ is contained in the set $s\Omega \subset \mathbb{C}^+$, with $s > 0$, whose boundary is the "bratwurst" image for $|\tau| = 1$ of

    $\psi(\tau) = \frac{(\rho\tau - \lambda N)(\rho\tau - \lambda M)}{(N - M)\rho\tau + \lambda(NM - 1)} \in \mathbb{C}^+$,

where $\tau \in D(0, r)$, $r \ge 1$, while $N, M, \rho$, and $\lambda$ are given and such that $\psi(D(0,1)) \subset \mathbb{C}^+$. Then, for $0 < \varepsilon < \min_{|\tau|=1} \ldots$

\ldots we can bound $\|ZZ^\top - Z_m Z_m^\top\|$, for instance, as

    $\|ZZ^\top - Z_m Z_m^\top\| \le \sum_{k=1}^{s} \left\| z^{(k)} (z^{(k)})^\top - z_m^{(k)} (z_m^{(k)})^\top \right\|$,

where $Z = [z^{(1)}, \ldots, z^{(s)}]$ and $Z_m = [z_m^{(1)}, \ldots, z_m^{(s)}]$. The results of the previous sections can thus be applied to each term in the sum. Refined bounds may possibly be obtained by using the theory of matrix polynomials, but this is beyond the scope of this work; see, e.g., [32]. We also observe that our convergence results can be generalized to the case of accelerated methods, such as that described in [33], by using the theoretical matrix function framework described in [13].

Acknowledgments. We are deeply indebted to Leonid Knizhnerman for several insightful comments which helped improve a previous version of this manuscript. We thank the referee whose criticism also helped us improve the manuscript.

REFERENCES

[1] M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions. Dover, 1965.
[2] A. C. Antoulas. Approximation of Large-Scale Dynamical Systems. Advances in Design and Control. SIAM, Philadelphia, 2005.
[3] B. Beckermann. Image numérique, GMRES et polynômes de Faber. C. R. Acad. Sci. Paris, Ser. I, 340:855–860, 2005.
[4] B. Beckermann and A. B. J. Kuijlaars. Superlinear convergence of conjugate gradients. SIAM J. Numer. Anal., 39(1):300–329, 2001.
[5] B. Beckermann and A. B. J. Kuijlaars. Superlinear CG convergence for special right-hand sides. Electronic Transactions on Numerical Analysis, 14:1–19, 2002.
[6] P. Benner. Control Theory, chapter 57 of Handbook of Linear Algebra. Chapman & Hall/CRC, 2006.
[7] J. P. Coleman and N. J. Myers. The Faber polynomials for annular sectors. Math. Comput., 64(209):181–203, 1995.
[8] M. J. Corless and A. E. Frazho. Linear Systems and Control - An Operator Perspective. Pure and Applied Mathematics. Marcel Dekker, New York - Basel, 2003.
[9] B. N. Datta. Krylov subspace methods for large-scale matrix problems in control. Future Generation Computer Systems, 19(7):1253–1263, 2003.
[10] T. A. Driscoll and L. N. Trefethen. Schwarz-Christoffel Mapping. Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge, 2002.
[11] T. Driscoll. Algorithm 756: A MATLAB toolbox for Schwarz-Christoffel mapping. ACM Transactions on Mathematical Software, 22(2):168–186, 1996.
[12] V. Druskin and L. Knizhnerman. Two polynomial methods of calculating functions of symmetric matrices. U.S.S.R. Comput. Math. Math. Phys., 29:112–121, 1989.
[13] V. Druskin and L. Knizhnerman. Extended Krylov subspaces: approximation of the matrix square root and related functions. SIAM J. Matrix Anal. Appl., 19(3):755–771, 1998.
[14] M. Eiermann. On semiiterative methods generated by Faber polynomials. Numer. Math., 56:139–156, 1989.
[15] M. Eiermann. Field of values and iterative methods. Lin. Alg. Appl., 180:167–197, 1993.
[16] S. W. Ellacott. Computation of Faber series with application to numerical polynomial approximation in the complex plane. Mathematics of Computation, 40(162):575–587, 1983.
[17] K. O. Geddes. Near-minimax polynomial approximation in an elliptical region. SIAM J. Numer. Anal., 15(6):1225–1233, 1978.
[18] G. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, 3rd edition, 1996.
[19] I. S. Gradshteyn and I. M. Ryzhik. Table of Integrals, Series, and Products. Corrected and enlarged edition. Academic Press, San Diego, California, 1980.
[20] N. J. Higham. The Matrix Computation Toolbox. http://www.ma.man.ac.uk/~higham/mctoolbox.


[21] M. Hochbruck and C. Lubich. On Krylov subspace approximations to the matrix exponential operator. SIAM J. Numer. Anal., 34(5):1911–1925, 1997.
[22] I. M. Jaimoukha and E. M. Kasenally. Krylov subspace methods for solving large Lyapunov equations. SIAM J. Numer. Anal., 31(1):227–251, 1994.
[23] K. Jbilou and A. J. Riquet. Projection methods for large Lyapunov matrix equations. Linear Algebra and its Applications, 415(2-3):344–358, 2006.
[24] L. Knizhnerman. Calculus of functions of unsymmetric matrices using Arnoldi's method. Comput. Math. Math. Phys., 31:1–9, 1991.
[25] T. Koch and J. Liesen. The conformal 'bratwurst' maps and associated Faber polynomials. Numer. Math., 86:173–191, 2000.
[26] J. Liesen. Construction and Analysis of Polynomial Iterative Methods for Non-Hermitian Systems of Linear Equations. PhD thesis, Fakultät für Mathematik der Universität Bielefeld, Nov. 1998.
[27] T. A. Manteuffel. The Tchebychev iteration for nonsymmetric linear systems. Numer. Math., 28:307–327, 1977.
[28] Matrix Market. A visual repository of test data for use in comparative studies of algorithms for numerical linear algebra, Mathematical and Computational Sciences Division, National Institute of Standards and Technology. Online at http://math.nist.gov/MatrixMarket.
[29] O. Nevanlinna. Convergence of Iterations for Linear Equations. Birkhäuser, Basel, 1993.
[30] M. Robbé and M. Sadkane. A convergence analysis of GMRES and FOM for Sylvester equations. Numerical Algorithms, 30:71–89, 2002.
[31] Y. Saad. Numerical solution of large Lyapunov equations. In M. A. Kaashoek, J. H. van Schuppen, and A. C. Ran, editors, Signal Processing, Scattering, Operator Theory, and Numerical Methods. Proceedings of the international symposium MTNS-89, vol. III, pages 503–511, Boston, 1990. Birkhäuser.
[32] Y. Saad. Iterative Methods for Sparse Linear Systems. The PWS Publishing Company, 1996.
[33] V. Simoncini. A new iterative method for solving large-scale Lyapunov matrix equations. SIAM J. Sci. Comput., 29(3):1268–1288, 2007.
[34] V. Simoncini and D. B. Szyld. On the occurrence of superlinear convergence of exact and inexact Krylov subspace methods. SIAM Review, 47:247–272, 2005.
[35] V. I. Smirnov and N. A. Lebedev. Functions of a Complex Variable, Constructive Theory. M.I.T. Press, Cambridge, Massachusetts, 1968.
[36] D. E. Stewart and T. S. Leyk. Error estimates for Krylov subspace approximations of matrix exponentials. J. Computational and Applied Mathematics, 72:359–369, 1996.
[37] P. K. Suetin. Fundamental properties of Faber polynomials. Russ. Math. Surv., 19(4):121–149, 1964.
[38] P. K. Suetin. Series of Faber Polynomials (Analytical Methods and Special Functions). Gordon and Breach Science Publishers, Amsterdam, 1998. Translated by E. V. Pankratiev.
[39] H. Tal-Ezer. Spectral methods in time for parabolic problems. SIAM J. Numer. Anal., 26:1–11, 1989.