An error analysis of Galerkin projection methods for linear systems with tensor product structure

Bernhard Beckermann (Laboratoire Painlevé UMR 8524 (ANO-EDP), UFR Mathématiques – M3, UST Lille, F-59655 Villeneuve d'Ascq CEDEX, France, [email protected])
Daniel Kressner (ANCHP, MATHICSE, EPF Lausanne, Switzerland, [email protected])
Christine Tobler (ANCHP, MATHICSE, EPF Lausanne, Switzerland, [email protected])

July 2, 2013
Abstract. Recent results on the convergence of a Galerkin projection method for the Sylvester equation are extended to more general linear systems with tensor product structure. In the Hermitian positive definite case, explicit convergence bounds are derived for Galerkin projection based on tensor products of rational Krylov subspaces. The results can be used to optimize the choice of shifts for these methods. Numerical experiments demonstrate that the convergence rates predicted by our bounds appear to be tight.
Keywords: Linear system, Kronecker product structure, Sylvester equation, tensor projection, Galerkin projection, rational Krylov subspaces.

MSC classification: 15A24, 65F10.
1 Introduction
This paper is concerned with Galerkin methods on tensor product subspaces for particularly structured large-scale linear systems. These structures are motivated by the Sylvester equation
\[A_1 X + X A_2^T = C, \tag{1}\]
with coefficient matrices $A_1 \in \mathbb{C}^{n_1 \times n_1}$, $A_2 \in \mathbb{C}^{n_2 \times n_2}$, a right-hand side matrix $C \in \mathbb{C}^{n_1 \times n_2}$ and the solution matrix $X \in \mathbb{C}^{n_1 \times n_2}$. Using Kronecker products, the matrix equation (1) can be reformulated as a linear system
\[(I_{n_2} \otimes A_1 + A_2 \otimes I_{n_1})x = c, \tag{2}\]
where $c = \mathrm{vec}(C)$ and $x = \mathrm{vec}(X)$. Sylvester equations arise in the context of numerical methods for eigenvalue problems [9, 13], algebraic Riccati equations [5], and model reduction [1]. In the last two cases, the right-hand side $C$ often has low rank. Given orthonormal bases $V_1 \in \mathbb{C}^{n_1 \times k_1}$ and $V_2 \in \mathbb{C}^{n_2 \times k_2}$, the Galerkin method on the tensor product subspace $\mathrm{span}(V_1) \otimes \mathrm{span}(V_2)$ constructs an approximation to (2) as $\tilde x = (V_2 \otimes V_1)y$, where $y$ is chosen such that
\[(I_{k_2} \otimes \tilde A_1 + \tilde A_2 \otimes I_{k_1})y = (V_2^* \otimes V_1^*)c, \tag{3}\]
with $\tilde A_1 = V_1^* A_1 V_1$ and $\tilde A_2 = V_2^* A_2 V_2$. A number of numerical methods for solving Sylvester and Lyapunov equations can be seen as special cases of such a Galerkin approach. This includes methods based on Krylov subspaces [17, 24], extended Krylov subspaces [26], and rational Krylov subspaces [6, 18].

The convergence of the Galerkin method on tensor products of (rational) Krylov subspaces for Lyapunov and Sylvester equations has been analysed in [2, 10, 21, 20, 27, 28]. For the extensions considered in this paper the framework developed in [2] appears to be most suitable. It is based on a decomposition of the residual into a sum of three orthogonal vectors, as follows:
\[r := c - (I \otimes A_1 + A_2 \otimes I)(V_2 \otimes V_1)y = (V_2 \otimes I)r_1 + (I \otimes V_1)r_2 + \hat c, \tag{4}\]
where
\begin{align*}
r_1 &= (V_2^* \otimes I)c - (I \otimes A_1 + \tilde A_2 \otimes I)(I \otimes V_1)y,\\
r_2 &= (I \otimes V_1^*)c - (I \otimes \tilde A_1 + A_2 \otimes I)(V_2 \otimes I)y,\\
\hat c &= \big((I - V_2 V_2^*) \otimes (I - V_1 V_1^*)\big) c.
\end{align*}
The usual choices for $V_1$ and $V_2$ yield $\hat c = 0$. The partial residuals $r_1$ and $r_2$ can be analysed separately and more easily than $r$ as a whole. In particular, it becomes relatively straightforward to derive convergence bounds based on the fields of values of $A_1$ and $A_2$.

A natural extension of (2) is given by the linear system
\[\Big( \sum_{\mu=1}^{d} I_{n_d} \otimes \cdots \otimes I_{n_{\mu+1}} \otimes A_\mu \otimes I_{n_{\mu-1}} \otimes \cdots \otimes I_{n_1} \Big) x = c, \tag{5}\]
with coefficient matrices $A_\mu \in \mathbb{C}^{n_\mu \times n_\mu}$. Such linear systems arise, for example, from the discretization of a separable $d$-dimensional PDE with tensorized finite elements [14, 21]. Moreover, methods for (5) can be used as preconditioners in iterative methods for more general linear systems [19, 4] and eigenvalue problems [22]. Note that the solution vector $x \in \mathbb{R}^{n_1 n_2 \cdots n_d}$ quickly grows in size as $d$ increases. For large $d$, this growth will exclude the application of any standard linear solver to (5). In fact, for a general right-hand side $c$, it is questionable whether the solution of (5) can be approached at all for large $d$. However, in the special case when $c$ can be written as a Kronecker product (or as a short sum of Kronecker products), a number of algorithms have recently been developed that are capable of dealing even with $d = 50$ and larger. A method that approximates $x$ by a sum of Kronecker products of vectors was proposed in [14], based on the approximation of the scalar inverse function by a sum of exponentials. A Galerkin method on the tensor subspace spanned by $V_d \otimes \cdots \otimes V_1$ for orthonormal bases $V_\mu \in \mathbb{C}^{n_\mu \times k_\mu}$ was proposed in [21]. This approach leads to a smaller linear system of size $k_1 \cdots k_d$, which is solved by the method from [14]. More recently, an ADI-like method, which applies low-rank tensor approximations in each ADI iteration, was proposed in [23].

In this paper, we provide an error analysis of such a Galerkin method based on the ideas of Beckermann [2]. Compared to the results from [21], our analysis allows for more elegant and improved convergence results when using standard or extended Krylov subspaces. It also gives some insight into a good choice of shifts when using rational Krylov subspaces.

The rest of this paper is organized as follows. In Section 2, a decomposition of the residual into $d + 1$ orthogonal vectors is given for the case of projections onto arbitrary subspaces. In Section 3, this result is made more concrete for the case of rational Krylov subspaces, and a bound is given for such subspaces, based only on the fields of values of $A_\mu$. Section 4 gives bounds on the residuals for three specific cases of rational Krylov subspaces. In Section 5, numerical experiments are given to confirm these bounds. A minimal computational sketch of the two-dimensional Galerkin approach (1)–(3) closes this introduction.
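The following is a minimal sketch of the Galerkin approach (1)–(3) in Python, assuming NumPy and SciPy are available; the test matrices, the polynomial Krylov choice of $V_1, V_2$, and all sizes are illustrative assumptions, not data from this paper. For rational Krylov subspaces, the basis construction would involve solves with shifted matrices instead of multiplications with $A_\mu$.

\begin{verbatim}
import numpy as np
from scipy.linalg import solve_sylvester

# Minimal sketch of the Galerkin method (3) for the Sylvester equation
# A1 X + X A2^T = C with a rank-one right-hand side C = c1 c2^T.

def krylov_basis(A, b, k):
    """Orthonormal basis of span{b, A b, ..., A^(k-1) b} (Arnoldi-style)."""
    V = np.zeros((len(b), k))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(1, k):
        w = A @ V[:, j - 1]
        w -= V[:, :j] @ (V[:, :j].T @ w)   # Gram-Schmidt orthogonalization
        w -= V[:, :j] @ (V[:, :j].T @ w)   # reorthogonalize for stability
        V[:, j] = w / np.linalg.norm(w)
    return V

rng = np.random.default_rng(0)
n1, n2, k = 100, 80, 10
# Hermitian positive definite test matrices (assumption for this sketch)
A1 = np.diag(np.linspace(1.0, 50.0, n1))
A2 = np.diag(np.linspace(1.0, 30.0, n2))
c1, c2 = rng.standard_normal(n1), rng.standard_normal(n2)

V1, V2 = krylov_basis(A1, c1, k), krylov_basis(A2, c2, k)

# Projected (small) Sylvester equation, cf. (3):
A1t, A2t = V1.T @ A1 @ V1, V2.T @ A2 @ V2
Ct = np.outer(V1.T @ c1, V2.T @ c2)
Y = solve_sylvester(A1t, A2t.T, Ct)        # solves A1t Y + Y A2t^T = Ct

Xt = V1 @ Y @ V2.T                         # approximate solution of (1)
R = np.outer(c1, c2) - A1 @ Xt - Xt @ A2.T
print("residual norm:", np.linalg.norm(R))
\end{verbatim}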
2 Galerkin projection on tensor product subspaces
To study Galerkin projection methods on tensor product subspaces for the solution of (5), we define the (huge) system matrix $\mathcal A \in \mathbb{C}^{n_1 \cdots n_d \times n_1 \cdots n_d}$ as
\[\mathcal A := \sum_{\mu=1}^{d} I_{n_d} \otimes \cdots \otimes I_{n_{\mu+1}} \otimes A_\mu \otimes I_{n_{\mu-1}} \otimes \cdots \otimes I_{n_1}. \tag{6}\]
Throughout the rest of this paper, we assume that $\mathcal A$ is invertible. The matrix $\mathcal A$ will never be constructed explicitly.

Given matrices $V_\mu \in \mathbb{R}^{n_\mu \times k_\mu}$, $\mu = 1, \ldots, d$, with orthonormal columns, the Galerkin projection method computes an approximation $\tilde x = \mathcal V y$, where $\mathcal V = V_d \otimes V_{d-1} \otimes \cdots \otimes V_1$ and $y \in \mathbb{R}^{k_1 k_2 \cdots k_d}$ is the solution of
\[\mathcal V^* \mathcal A \mathcal V y = \mathcal V^* c, \tag{7}\]
provided that $\mathcal V^* \mathcal A \mathcal V$ is invertible. Note that the projected matrix $\mathcal V^* \mathcal A \mathcal V$ has the same Kronecker product structure as $\mathcal A$:
\[\mathcal V^* \mathcal A \mathcal V = \sum_{\mu=1}^{d} I_{k_d} \otimes \cdots \otimes I_{k_{\mu+1}} \otimes \tilde A_\mu \otimes I_{k_{\mu-1}} \otimes \cdots \otimes I_{k_1},\]
where $\tilde A_\mu = V_\mu^* A_\mu V_\mu$. It is assumed that (7) is uniquely solvable, which is always the case if each $A_\mu + A_\mu^*$ is Hermitian positive definite. An equivalent characterization of $\tilde x \in \mathrm{span}(\mathcal V)$ is given by the Galerkin orthogonality condition
\[r \equiv r(\mathcal V, A_1, \ldots, A_d, c) := c - \mathcal A \tilde x \perp \mathrm{span}(\mathcal V). \tag{8}\]
In this section, we will study general properties of the approximate solution $\tilde x$ without making any further assumptions on the choice of $V_\mu$. For this purpose, we will require the following notation:
\begin{align*}
\mathcal V_\mu &:= I_{k_d} \otimes \cdots \otimes I_{k_{\mu+1}} \otimes V_\mu \otimes I_{k_{\mu-1}} \otimes \cdots \otimes I_{k_1},\\
\overline{\mathcal V}_\mu &:= V_d \otimes \cdots \otimes V_{\mu+1} \otimes I_{n_\mu} \otimes V_{\mu-1} \otimes \cdots \otimes V_1.
\end{align*}
In particular, this implies $\mathcal V = \overline{\mathcal V}_\mu \mathcal V_\mu$ for every $\mu = 1, \ldots, d$. The following proposition reveals a useful relation between the orthogonal projections $\overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^*$ and $\mathcal V \mathcal V^*$, using the fact that these projectors commute.

Proposition 2.1. Consider $d \ge 2$ matrices $V_1 \in \mathbb{R}^{n_1 \times k_1}, \ldots, V_d \in \mathbb{R}^{n_d \times k_d}$ having orthonormal columns. Then the following equality holds:
\[\prod_{\mu=1}^{d} (I - \overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^*) = I - \sum_{\mu=1}^{d} \overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^* + (d-1)\mathcal V \mathcal V^*. \tag{9}\]
Proof. By direct expansion, we obtain
\[\prod_{\mu=1}^{d} (I - \overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^*) = \sum_{i \in \{0,1\}^d} \prod_{\mu=1}^{d} (-\overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^*)^{i_\mu} = \sum_{s=0}^{d} \sum_{\substack{i \in \{0,1\}^d\\ |i| = s}} \prod_{\mu=1}^{d} (-\overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^*)^{i_\mu},\]
where $|i| := i_1 + \cdots + i_d$. We now separate the terms for $s = 0$ and $s = 1$ from the sum and make use of the fact that $\overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^* \overline{\mathcal V}_\nu \overline{\mathcal V}_\nu^* = \mathcal V \mathcal V^*$ for any $\mu \ne \nu$ as well as $\overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^* \mathcal V \mathcal V^* = \mathcal V \mathcal V^*$. This leads to
\[\prod_{\mu=1}^{d} (I - \overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^*) = I - \sum_{\mu=1}^{d} \overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^* + \theta\, \mathcal V \mathcal V^*,\]
with
\[\theta = \sum_{k=2}^{d} (-1)^k \binom{d}{k} = -1 + d + \sum_{k=0}^{d} (-1)^k \binom{d}{k} = -1 + d + (1-1)^d = d - 1,\]
which concludes the proof.
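The identity (9) is easy to check numerically. The following small sketch (NumPy; the sizes are arbitrary) verifies it for $d = 3$ with random orthonormal factors:

\begin{verbatim}
import numpy as np

# Numerical check of identity (9) for d = 3 with random orthonormal V_mu.
rng = np.random.default_rng(1)
d, n, k = 3, [4, 5, 3], [2, 3, 2]
V = [np.linalg.qr(rng.standard_normal((n[j], k[j])))[0] for j in range(d)]

def kron_all(factors):
    out = factors[0]
    for f in factors[1:]:
        out = np.kron(out, f)
    return out

N = int(np.prod(n))
# Vbar_mu = V_d x ... x I_{n_mu} x ... x V_1 (Kronecker factors ordered d,...,1)
Vbar = [kron_all([V[j] if j != mu else np.eye(n[mu])
                  for j in reversed(range(d))]) for mu in range(d)]
Vcal = kron_all([V[j] for j in reversed(range(d))])

P = [Vb @ Vb.T for Vb in Vbar]
lhs = np.eye(N)
for Pm in P:
    lhs = lhs @ (np.eye(N) - Pm)
rhs = np.eye(N) - sum(P) + (d - 1) * (Vcal @ Vcal.T)
print(np.allclose(lhs, rhs))   # True
\end{verbatim}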
Based on the result of Proposition 2.1, we will now represent the residual as a sum of orthogonal terms, which can then be bounded individually. An essential part of this representation are the terms $\overline{\mathcal V}_\mu^* r$, $\mu = 1, \ldots, d$, which we will examine in some more detail before stating the main technical results. Consider the term
\[\overline{\mathcal V}_\mu^* r = \overline{\mathcal V}_\mu^* (c - \mathcal A \mathcal V y) = \overline{\mathcal V}_\mu^* c - \overline{\mathcal V}_\mu^* \mathcal A \overline{\mathcal V}_\mu \mathcal V_\mu y,\]
where we have used $\mathcal V = \overline{\mathcal V}_\mu \mathcal V_\mu$ in the last equality. Analogous to $\mathcal V^* \mathcal A \mathcal V$, the matrix $\overline{\mathcal V}_\mu^* \mathcal A \overline{\mathcal V}_\mu$ inherits the Kronecker structure from $\mathcal A$, see (6), but the coefficient matrices $A_\nu$ are replaced by
\[\tilde A_\nu := V_\nu^* A_\nu V_\nu, \qquad \text{for } \nu \in \{1, \ldots, d\} \setminus \{\mu\}. \tag{10}\]
This allows us to write $\overline{\mathcal V}_\mu^* r \equiv \overline{\mathcal V}_\mu^* r(\mathcal V, A_1, \ldots, A_d, c)$ as
\[\overline{\mathcal V}_\mu^* r(\mathcal V, A_1, \ldots, A_d, c) = r(\mathcal V_\mu, \tilde A_1, \ldots, \tilde A_{\mu-1}, A_\mu, \tilde A_{\mu+1}, \ldots, \tilde A_d, \overline{\mathcal V}_\mu^* c),\]
where the latter coincides with the residual of the reduced system $\overline{\mathcal V}_\mu^* \mathcal A \overline{\mathcal V}_\mu \mathcal V_\mu y = \overline{\mathcal V}_\mu^* c$.
Proposition 2.2. With the notation introduced above, the following statements hold.

(a) The residual $r(\mathcal V, A_1, \ldots, A_d, c) = c - \mathcal A \tilde x$ can be represented as
\[r(\mathcal V, A_1, \ldots, A_d, c) = \sum_{\mu=1}^{d} \overline{\mathcal V}_\mu\, r(\mathcal V_\mu, \tilde A_1, \ldots, \tilde A_{\mu-1}, A_\mu, \tilde A_{\mu+1}, \ldots, \tilde A_d, \overline{\mathcal V}_\mu^* c) + \hat c, \tag{11}\]
where the remainder term $\hat c := \prod_{\mu=1}^{d} (I - \overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^*)\, c$ vanishes for $c \in \mathrm{span}(\mathcal V)$.

(b) The vectors $\hat c$ and $\overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^* r = \overline{\mathcal V}_\mu\, r(\mathcal V_\mu, \tilde A_1, \ldots, \tilde A_{\mu-1}, A_\mu, \tilde A_{\mu+1}, \ldots, \tilde A_d, \overline{\mathcal V}_\mu^* c)$ for $\mu = 1, \ldots, d$ are mutually orthogonal. In particular, this implies
\[\|r(\mathcal V, A_1, \ldots, A_d, c)\|_2^2 = \sum_{\mu=1}^{d} \big\| r(\mathcal V_\mu, \tilde A_1, \ldots, \tilde A_{\mu-1}, A_\mu, \tilde A_{\mu+1}, \ldots, \tilde A_d, \overline{\mathcal V}_\mu^* c) \big\|_2^2 + \|\hat c\|_2^2.\]
Proof. (a) Multiplying both sides of the equality (9) with the residual $r \equiv r(\mathcal V, A_1, \ldots, A_d, c)$ leads to
\[\prod_{\mu=1}^{d} (I - \overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^*)\, r = r - \sum_{\mu=1}^{d} \overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^* r + (d-1)\mathcal V \mathcal V^* r = r - \sum_{\mu=1}^{d} \overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^* r,\]
where we used $\mathcal V^* r = 0$ from the Galerkin orthogonality condition (8). It therefore remains to show that $\prod_{\mu=1}^{d} (I - \overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^*)\, r = \prod_{\mu=1}^{d} (I - \overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^*)\, c = \hat c$, or equivalently, that
\[\prod_{\mu=1}^{d} (I - \overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^*)\, \mathcal A \mathcal V y = 0. \tag{12}\]
Inserting $\mathcal A = \sum_{\nu=1}^{d} \mathcal A_\nu$ with $\mathcal A_\nu := I_{n_d} \otimes \cdots \otimes I_{n_{\nu+1}} \otimes A_\nu \otimes I_{n_{\nu-1}} \otimes \cdots \otimes I_{n_1}$ leads to
\[\prod_{\mu=1}^{d} (I - \overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^*)\, \mathcal A \mathcal V y = \sum_{\nu=1}^{d} \Big( \prod_{\substack{\mu=1\\ \mu \ne \nu}}^{d} (I - \overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^*) \Big) (I - \overline{\mathcal V}_\nu \overline{\mathcal V}_\nu^*)\, \mathcal A_\nu \overline{\mathcal V}_\nu \mathcal V_\nu y.\]
Because $\mathcal A_\nu$ and $\overline{\mathcal V}_\nu$ commute, we have $(I - \overline{\mathcal V}_\nu \overline{\mathcal V}_\nu^*)\, \mathcal A_\nu \overline{\mathcal V}_\nu = (I - \overline{\mathcal V}_\nu \overline{\mathcal V}_\nu^*)\, \overline{\mathcal V}_\nu \mathcal A_\nu = 0$. This shows (12).

(b) The orthogonality relations follow from (a),
\[\big\langle \overline{\mathcal V}_\nu \overline{\mathcal V}_\nu^* r,\ \overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^* r \big\rangle = r^* \overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^* \overline{\mathcal V}_\nu \overline{\mathcal V}_\nu^* r = r^* \mathcal V \mathcal V^* r = \|\mathcal V^* r\|_2^2 = 0, \qquad \text{for } \mu \ne \nu,\]
and
\[\big\langle \overline{\mathcal V}_\nu \overline{\mathcal V}_\nu^* r,\ \hat c \big\rangle = \Big\langle \overline{\mathcal V}_\nu \overline{\mathcal V}_\nu^* r,\ \prod_{\mu=1}^{d} (I - \overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^*)\, c \Big\rangle = c^* \Big( \prod_{\substack{\mu=1\\ \mu \ne \nu}}^{d} (I - \overline{\mathcal V}_\mu \overline{\mathcal V}_\mu^*) \Big) (I - \overline{\mathcal V}_\nu \overline{\mathcal V}_\nu^*)\, \overline{\mathcal V}_\nu \overline{\mathcal V}_\nu^* r = 0\]
for any $\nu = 1, \ldots, d$.
The partial residuals $\overline{\mathcal V}_\mu^* r = r(\mathcal V_\mu, \tilde A_1, \ldots, \tilde A_{\mu-1}, A_\mu, \tilde A_{\mu+1}, \ldots, \tilde A_d, \overline{\mathcal V}_\mu^* c)$ contributing to the overall residual in Proposition 2.2 (b) can also be interpreted more compactly as the residual of a two-dimensional problem. To see this, let us consider the case $\mu = 1$. Then $\overline{\mathcal V}_1^* r = \overline{\mathcal V}_1^* c - \overline{\mathcal V}_1^* \mathcal A \overline{\mathcal V}_1 \mathcal V_1 y$, with the reduced matrix
\[\overline{\mathcal V}_1^* \mathcal A \overline{\mathcal V}_1 = I_{k_d} \otimes \cdots \otimes I_{k_2} \otimes A_1 + \underbrace{\Big( \sum_{\mu=2}^{d} I_{k_d} \otimes \cdots \otimes I_{k_{\mu+1}} \otimes \tilde A_\mu \otimes I_{k_{\mu-1}} \otimes \cdots \otimes I_{k_2} \Big)}_{=:B_1} \otimes\, I_{n_1}.\]
Defining $c^{(1)} := \overline{\mathcal V}_1^* c = (V_d^* \otimes \cdots \otimes V_2^* \otimes I)c$, we thus arrive at the compact formula
\[\overline{\mathcal V}_1^* r = c^{(1)} - (I \otimes A_1 + B_1 \otimes I)(I \otimes V_1)y = r(I \otimes V_1, A_1, B_1, c^{(1)}) =: r^{(1)}, \tag{13}\]
with suitable sizes of the identity matrices. Note that the matrix $I \otimes A_1 + B_1 \otimes I$ represents the Sylvester operator $X \mapsto A_1 X + X B_1^T$. As a consequence of (13), $r^{(1)}$ can be interpreted as the residual obtained from approximating the solution to the linear system $(I \otimes A_1 + B_1 \otimes I)x = c^{(1)}$ (which corresponds to a Sylvester equation) by applying Galerkin projection with the “1D projector” $I \otimes V_1$. As implicitly demonstrated in [2] for Sylvester equations, the convergence of such 1D projections is significantly easier to study than the convergence of the projection as a whole.

For general $\mu$, a similar formula can be shown for $\overline{\mathcal V}_\mu^* r$. Let us define the two-dimensional residual
\[r^{(\mu)} := r(I \otimes V_\mu, A_\mu, B_\mu, c^{(\mu)}) = c^{(\mu)} - (I \otimes A_\mu + B_\mu \otimes I)(I \otimes V_\mu)P^{(\mu)} y, \tag{14}\]
where $B_\mu$ is now a matrix of size $k_1 \cdots k_{\mu-1} k_{\mu+1} \cdots k_d$. Moreover, both the right-hand side and the residual undergo a permutation with an appropriately chosen permutation matrix $P^{(\mu)}$:
\[r^{(\mu)} = P^{(\mu)} \big( \overline{\mathcal V}_\mu^* r \big), \qquad c^{(\mu)} = P^{(\mu)} \big( \overline{\mathcal V}_\mu^* c \big). \tag{15}\]
Since a permutation does not change the norm of a vector, the following result follows directly from Proposition 2.2 (b).

Proposition 2.3. The norm of the residual satisfies
\[\|r(\mathcal V, A_1, \ldots, A_d, c)\|_2^2 = \sum_{\mu=1}^{d} \|r^{(\mu)}\|_2^2 + \|\hat c\|_2^2, \tag{16}\]
with $r^{(\mu)} \equiv r(I \otimes V_\mu, A_\mu, B_\mu, c^{(\mu)})$.

In Section 3, we will derive bounds for the case that the columns of each $V_\mu$ form the basis of a rational Krylov subspace with $A_\mu$. To compute these bounds, it is helpful to reformulate (14) in terms of contour integrals in $\mathbb{C}$.

Proposition 2.4. For the linear system (5), consider a fixed integer $\mu \in \{1, \ldots, d\}$ and let $A_\mu$, $B_\mu$, $c^{(\mu)}$, $V_\mu \in \mathbb{R}^{n_\mu \times k_\mu}$ (with $k_\mu < n_\mu$) be defined as explained above. Suppose that the linear systems
\[(I \otimes A_\mu + B_\mu \otimes I)x^{(\mu)} = c^{(\mu)} \tag{17}\]
and
\[(I \otimes \tilde A_\mu + B_\mu \otimes I)y^{(\mu)} = (I \otimes V_\mu^*)c^{(\mu)}, \qquad \text{with } \tilde A_\mu = V_\mu^* A_\mu V_\mu,\]
admit unique solutions. Then the residual $r^{(\mu)} = c^{(\mu)} - (I \otimes A_\mu + B_\mu \otimes I)\tilde x_\mu$ of (17) for the approximate solution $\tilde x_\mu = (I \otimes V_\mu) y^{(\mu)}$ can be written as
\[r^{(\mu)} = \int_{\Gamma_{B_\mu}} (zI + B_\mu)^{-1} \otimes \Big[ I - (zI - A_\mu)V_\mu (zI - \tilde A_\mu)^{-1} V_\mu^* \Big] c^{(\mu)}\, \frac{dz}{2\pi i},\]
with a compact curve $\Gamma_{B_\mu}$ which encircles the spectrum of $-B_\mu$ once, but does not encircle the spectrum of $A_\mu$ and $\tilde A_\mu$.

Proof. For a compact curve $\Gamma_A$ encircling the spectrum of $A$ once but not encircling the spectrum of $-B$, where $A$, $B$ are general square matrices, the unique solution of $(I \otimes A + B \otimes I)x = c$ can be written as
\[x = \int_{\Gamma_A} (zI + B)^{-1} \otimes (zI - A)^{-1} c\, \frac{dz}{2\pi i}. \tag{18}\]
This is seen by inserting (18) into $(I \otimes A + B \otimes I)x = c$:
\begin{align*}
(I \otimes A + B \otimes I)\, x &= \big[ I \otimes (A - zI) + (zI + B) \otimes I \big]\, x\\
&= -\int_{\Gamma_A} (zI + B)^{-1} \otimes I\, c\, \frac{dz}{2\pi i} + \int_{\Gamma_A} I \otimes (zI - A)^{-1} c\, \frac{dz}{2\pi i}\\
&= 0 + c.
\end{align*}
Choosing a contour $\Gamma$ encircling once the spectrum of both $A_\mu$ and $\tilde A_\mu$ but not that of $-B_\mu$, we obtain integral representations of the solution vectors $x^{(\mu)}$ and $y^{(\mu)}$. Hence the 1D projection error is
\[x^{(\mu)} - (I \otimes V_\mu)y^{(\mu)} = \int_{\Gamma} (zI + B_\mu)^{-1} \otimes \Big[ (zI - A_\mu)^{-1} - V_\mu (zI - \tilde A_\mu)^{-1} V_\mu^* \Big] c^{(\mu)}\, \frac{dz}{2\pi i},\]
together with the residual
\begin{align*}
r^{(\mu)} &= (I \otimes A_\mu + B_\mu \otimes I)\big( x^{(\mu)} - (I \otimes V_\mu)y^{(\mu)} \big)\\
&= \big[ -I \otimes (zI - A_\mu) + (zI + B_\mu) \otimes I \big]\big( x^{(\mu)} - (I \otimes V_\mu)y^{(\mu)} \big)\\
&= -\int_{\Gamma} (zI + B_\mu)^{-1} \otimes \Big[ I - (zI - A_\mu)V_\mu (zI - \tilde A_\mu)^{-1} V_\mu^* \Big] c^{(\mu)}\, \frac{dz}{2\pi i}\\
&\quad + \int_{\Gamma} I \otimes \Big[ (zI - A_\mu)^{-1} - V_\mu (zI - \tilde A_\mu)^{-1} V_\mu^* \Big] c^{(\mu)}\, \frac{dz}{2\pi i}.
\end{align*}
Notice that both integrands have the same expansion $(I - V_\mu V_\mu^*)c^{(\mu)}/z + \mathcal O(1/z^2)$ at $\infty$. By the Cauchy formula, we can switch to a contour integral along $\Gamma_{B_\mu}$, changing the sign of both integrals. Note that the second integral then vanishes.
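Before specializing to rational Krylov subspaces, the decomposition (16) can be confirmed numerically. The following sketch (NumPy; tiny illustrative sizes, since the matrix (6) is formed explicitly, which is only feasible for such toy dimensions) solves the projected system (7) and compares the two sides of (16):

\begin{verbatim}
import numpy as np

# Numerical check of the residual decomposition (16) for d = 3.
rng = np.random.default_rng(2)
d, n, k = 3, [6, 5, 4], [3, 2, 2]

def hpd(m):                                    # random HPD matrix
    M = rng.standard_normal((m, m))
    return M @ M.T + m * np.eye(m)

A_mu = [hpd(m) for m in n]
V_mu = [np.linalg.qr(rng.standard_normal((n[j], k[j])))[0] for j in range(d)]

def kron_all(mats):
    out = mats[0]
    for M in mats[1:]:
        out = np.kron(out, M)
    return out

A = sum(kron_all([A_mu[j] if j == mu else np.eye(n[j])
                  for j in reversed(range(d))]) for mu in range(d))   # (6)
V = kron_all([V_mu[j] for j in reversed(range(d))])
c = rng.standard_normal(A.shape[0])

y = np.linalg.solve(V.T @ A @ V, V.T @ c)                             # (7)
r = c - A @ (V @ y)

Vbar = [kron_all([V_mu[j] if j != mu else np.eye(n[mu])
                  for j in reversed(range(d))]) for mu in range(d)]
chat = c.copy()
for Vb in Vbar:                                 # chat = prod_mu (I - P_mu) c
    chat -= Vb @ (Vb.T @ chat)

lhs = sum(np.linalg.norm(Vb.T @ r)**2 for Vb in Vbar) + np.linalg.norm(chat)**2
print(np.isclose(lhs, np.linalg.norm(r)**2))    # True, cf. (16)
\end{verbatim}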
3 Rational Krylov subspace projection
In this section, we will concentrate on the projection onto rational Krylov subspaces. Specifically, we will assume that $c$ can be written as the Kronecker product of $d$ vectors:
\[c = c_d \otimes c_{d-1} \otimes \cdots \otimes c_1, \qquad c_\mu \in \mathbb{R}^{n_\mu}. \tag{19}\]
This is a rather strong assumption that is rarely satisfied in applications. However, for moderate $d$ and a vector $c$ obtained from the discretization of a smooth $d$-variate function, it is possible to approximate $c$ by a short sum of vectors having the form (19). By superposition, we can reduce this to the situation (19). Alternatively, one could use a method based on block Krylov subspaces.

For a right-hand side of the form (19), the term $c^{(\mu)}$ from the previous section, see (14)–(15), becomes $c^{(\mu)} = \bar c_\mu \otimes c_\mu$, with $\bar c_\mu = \tilde c_d \otimes \cdots \otimes \tilde c_{\mu+1} \otimes \tilde c_{\mu-1} \otimes \cdots \otimes \tilde c_1$ and $\tilde c_\nu = V_\nu^* c_\nu$. As a consequence, our integral formula of Proposition 2.4 for the partial residual $r^{(\mu)}$ involves the expression
\[\Big[ I - (zI - A_\mu)V_\mu (zI - \tilde A_\mu)^{-1} V_\mu^* \Big] c_\mu,\]
which coincides with the residual of the OR (Orthogonal Residual) method for the shifted system $(zI - A_\mu)x = c_\mu$. Provided that $z$ is not an element of the field of values $W(A_\mu)$, one knows how to relate this quantity to the corresponding minimal residual following, e.g., the techniques of [12, Thm 6.2.6]. It will therefore be convenient in what follows to suppose that $0 \notin W(\mathcal A)$. We will also make use of the fact that
\[W(\mathcal A) = W(A_1) + \cdots + W(A_d) = W(A_\mu) - W(-B_\mu),\]
see, for instance, [10, Proof of Thm 4.2]. In particular, this shows that the solvability conditions of Proposition 2.4 are satisfied.

Let the columns of the matrix $V_\mu$ represent an orthonormal basis of the rational Krylov subspace
\[\mathcal K_{k_\mu}^{(\mu)}(A_\mu, c_\mu) := \{ R_\mu(A_\mu)c_\mu : R_\mu \in \mathcal P_{k_\mu - 1}/Q_\mu \}\]
for $\mu = 1, \ldots, d$. Here, $\mathcal P_r$ denotes the set of polynomials of degree at most $r$, and $Q_\mu \in \mathcal P_{k_\mu}$ is a fixed polynomial defined as
\[Q_\mu(z) = \prod_{\substack{i=1\\ z_{\mu,i} \ne \infty}}^{k_\mu} (z - z_{\mu,i}).\]
For example, with $k_\mu = 2$ and $z_{\mu,1} = \infty$, $z_{\mu,2} \in \mathbb{R}$, the associated Krylov subspace is given by $\mathcal K_2^{(\mu)}(A_\mu, c_\mu) = \mathrm{span}\{c_\mu, (A_\mu - z_{\mu,2}I)^{-1}c_\mu\}$. We will further fix the first shift to $z_{\mu,1} = \infty$ for each $\mu = 1, \ldots, d$, which ensures that $c_\mu \in \mathcal R(V_\mu)$. It follows that $\hat c = 0$ and we simply have
\[\|r\|_2^2 = \|r^{(1)}\|_2^2 + \|r^{(2)}\|_2^2 + \cdots + \|r^{(d)}\|_2^2\]
from Proposition 2.2. The norm of each partial residual $r^{(\mu)}$ can be seen as the solution of a minimization problem, as described in [2]. The following theorem summarizes these results.

Theorem 3.1. Suppose that $0 \notin W(\mathcal A)$. Then the partial residual $r^{(\mu)}$ defined in (14) satisfies
\[\|r^{(\mu)}\|_2 = \min_{R_\mu \in \mathcal P_{k_\mu}/Q_\mu} \Big[ \|R_\mu(A_\mu)c_\mu\|_2 + g_0 \|R_\mu(\tilde A_\mu)\tilde c_\mu\|_2 \Big] \|R_\mu^{-1}(-B_\mu)\bar c_\mu\|_2, \tag{20}\]
with the constant $g_0$ defined as $g_0 = \|\mathcal A\|_2/\mathrm{dist}(0, W(\mathcal A))$.

Proof. The proof will proceed as follows. In a first step, we prove that, for all $R_\mu \in \mathcal P_{k_\mu}/Q_\mu$,
\[\|r^{(\mu)}\|_2 \le \|R_\mu(A_\mu)c_\mu\|_2 \|R_\mu^{-1}(-B_\mu)\bar c_\mu\|_2 + g_0 \|R_\mu(\tilde A_\mu)\tilde c_\mu\|_2 \|R_\mu^{-1}(-B_\mu)\bar c_\mu\|_2. \tag{21}\]
In a second step, we will describe a function $R_\mu^G \in \mathcal P_{k_\mu}/Q_\mu$ for which equality holds in the above statement.

Using the exactness property for rational Krylov subspaces [2, Lemma 3.2], the following representation has been derived in [2, Lemma 3.3] for the error of the OR method applied to shifted systems:
\[(zI - A_\mu)^{-1}c_\mu - V_\mu(zI - \tilde A_\mu)^{-1}V_\mu^* c_\mu = \frac{R_\mu(A_\mu)}{R_\mu(z)}(zI - A_\mu)^{-1}c_\mu - V_\mu \frac{R_\mu(\tilde A_\mu)}{R_\mu(z)}(zI - \tilde A_\mu)^{-1}V_\mu^* c_\mu\]
for any $R_\mu \in \mathcal P_{k_\mu}/Q_\mu$. Inserting this relation into the integral representation of $r^{(\mu)}$ from Proposition 2.4 yields
\begin{align*}
r^{(\mu)} &= \int_{\Gamma_{B_\mu}} (zI + B_\mu)^{-1}\bar c_\mu \otimes (zI - A_\mu)\frac{R_\mu(A_\mu)}{R_\mu(z)}(zI - A_\mu)^{-1}c_\mu\, \frac{dz}{2\pi i}\\
&\quad - \int_{\Gamma_{B_\mu}} (zI + B_\mu)^{-1}\bar c_\mu \otimes (zI - A_\mu)V_\mu \frac{R_\mu(\tilde A_\mu)}{R_\mu(z)}(zI - \tilde A_\mu)^{-1}V_\mu^* c_\mu\, \frac{dz}{2\pi i}.
\end{align*}
We will call the two integral terms $s_1$ and $s_2$, and start by considering $s_1$. Using the fact that all the terms containing $A_\mu$ commute, we find
\[s_1 = \int_{\Gamma_{B_\mu}} (zI + B_\mu)^{-1}\frac{1}{R_\mu(z)}\bar c_\mu\, \frac{dz}{2\pi i} \otimes R_\mu(A_\mu)c_\mu = R_\mu^{-1}(-B_\mu)\bar c_\mu \otimes R_\mu(A_\mu)c_\mu\]
and thus $r^{(\mu)} = R_\mu^{-1}(-B_\mu)\bar c_\mu \otimes R_\mu(A_\mu)c_\mu - s_2$. It has been shown in [2, p. 2443] that
\[s_2 = (I \otimes A_\mu + B_\mu \otimes I)(I \otimes V_\mu)(I \otimes \tilde A_\mu + B_\mu \otimes I)^{-1}\, R_\mu^{-1}(-B_\mu)\bar c_\mu \otimes R_\mu(\tilde A_\mu)\tilde c_\mu.\]
Using that $\|I \otimes A_\mu + B_\mu \otimes I\|_2 = \|\mathcal A\|_2$ and
\[\|(I \otimes \tilde A_\mu + B_\mu \otimes I)^{-1}\|_2 \le \frac{1}{\mathrm{dist}(0, W(I \otimes \tilde A_\mu + B_\mu \otimes I))} \le \frac{1}{\mathrm{dist}(0, W(\mathcal A))},\]
we conclude that
\[\|r^{(\mu)}\|_2 \le \|R_\mu^{-1}(-B_\mu)\bar c_\mu \otimes R_\mu(A_\mu)c_\mu\|_2 + g_0 \|R_\mu^{-1}(-B_\mu)\bar c_\mu \otimes R_\mu(\tilde A_\mu)\tilde c_\mu\|_2,\]
as claimed in the first part of the statement. To address the second part, we define the rational function
\[R_\mu^G(z) := \frac{\det(zI - \tilde A_\mu)}{Q_\mu(z)}, \qquad \mu = 1, \ldots, d.\]
Note that $R_\mu^G(\tilde A_\mu) = 0$, implying $s_2 = 0$ for this choice of $R_\mu$. Therefore
\[r^{(\mu)} = (R_\mu^G)^{-1}(-B_\mu)\bar c_\mu \otimes R_\mu^G(A_\mu)c_\mu.\]
This shows equality in (21) for $R_\mu = R_\mu^G$ and therefore completes the proof.

Up to this point, an exact representation of the residual norm was given. Now, we aim to give a bound on the residual, involving only the fields of values $W(A_\mu)$ for the matrices $A_1, \ldots, A_d$. First, we note that
\begin{align}
\|r^{(\mu)}\|_2 &= \min_{R_\mu \in \mathcal P_{k_\mu}/Q_\mu} \Big[ \|R_\mu(A_\mu)c_\mu\|_2 + g_0 \|R_\mu(\tilde A_\mu)\tilde c_\mu\|_2 \Big] \|R_\mu^{-1}(-B_\mu)\bar c_\mu\|_2 \notag\\
&\le \|c\|_2 \min_{R_\mu \in \mathcal P_{k_\mu}/Q_\mu} \Big[ \|R_\mu(A_\mu)\|_2 + g_0 \|R_\mu(\tilde A_\mu)\|_2 \Big] \|R_\mu^{-1}(-B_\mu)\|_2 \notag\\
&\le C_{\mathrm{Crouzeix}} \|c\|_2 \min_{R_\mu \in \mathcal P_{k_\mu}/Q_\mu} \max_{z \in W(B_\mu)} \Big[ \|R_\mu(A_\mu)\|_2 + g_0 \|R_\mu(\tilde A_\mu)\|_2 \Big] |R_\mu^{-1}(-z)|. \tag{22}
\end{align}
Here, the constant $C_{\mathrm{Crouzeix}}$ is such that $\|f(A)\|_2 \le C_{\mathrm{Crouzeix}} \|f\|_{L^\infty(W(A))}$ for any matrix $A$ and any function $f$ analytic in $W(A)$. Recently, it was proven that $C_{\mathrm{Crouzeix}} \le 11.08$, and it has been conjectured that $C_{\mathrm{Crouzeix}} = 2$ [8].

Let us now define the Green's function $g_{A_\mu}(\cdot, \zeta)$ of $\mathbb{C} \setminus W(A_\mu)$, $\mu = 1, \ldots, d$, with pole $\zeta \in \mathbb{C}$, and set
\[u_\mu(z) := \exp\Big( -\sum_{j=1}^{k_\mu} g_{A_\mu}(z, z_{\mu,j}) \Big), \qquad \mu = 1, \ldots, d. \tag{23}\]
Note that $u_\mu(z)$ can be given explicitly for the case when $A_\mu$ is Hermitian positive definite, and thus $W(A_\mu) = [\alpha_\mu, \beta_\mu]$ with $0 < \alpha_\mu < \beta_\mu$. It then takes the form
\[u_\mu(z) = \prod_{j=1}^{k_\mu} \left| \frac{\sqrt{\dfrac{z - \beta_\mu}{z - \alpha_\mu} \cdot \dfrac{z_{\mu,j} - \alpha_\mu}{z_{\mu,j} - \beta_\mu}} - 1}{\sqrt{\dfrac{z - \beta_\mu}{z - \alpha_\mu} \cdot \dfrac{\bar z_{\mu,j} - \alpha_\mu}{\bar z_{\mu,j} - \beta_\mu}} + 1} \right|. \tag{24}\]
The following theorem is a straightforward extension of [2, Theorem 2.3] from $d = 2$ to general $d$.

Theorem 3.2. Suppose that $0 \notin W(\mathcal A)$ and define
\[\gamma_\mu := \max\Big\{ u_\mu(-z) : z \in \sum_{\substack{\nu=1\\ \nu \ne \mu}}^{d} W(A_\nu) \Big\}, \qquad \mu = 1, \ldots, d. \tag{25}\]
Then, the residual for the rational Galerkin method described above is bounded by
\[\|r\|_2 \le 2\, C_{\mathrm{Crouzeix}} (1 + g_0) \|c\|_2 \sqrt{\sum_{\mu=1}^{d} \Big( \frac{\gamma_\mu}{1 - \gamma_\mu} \Big)^2}. \tag{26}\]
For the case of Hermitian positive definite matrices $A_\mu$, a tighter bound is given by
\[\|r\|_2 \le 2 \sqrt{\frac{\lambda_{\max}(\mathcal A)}{\lambda_{\min}(\mathcal A)}}\, \|c\|_2 \sqrt{\sum_{\mu=1}^{d} \gamma_\mu^2}, \tag{27}\]
provided that each set of shifts $\{z_{\mu,1}, \ldots, z_{\mu,k_\mu}\}$, $\mu = 1, \ldots, d$, is closed under complex conjugation.

Proof. According to [2, Theorem 3.4] with the convex set $E = W(A_\mu) \supset W(\tilde A_\mu)$, there exists a function $R^\# \in \mathcal P_{k_\mu}/Q_\mu$ such that
\[\|R^\#(A_\mu)\|_2 \le 2, \qquad \|R^\#(\tilde A_\mu)\|_2 \le 2, \qquad \frac{1}{|R^\#(z)|} \le \frac{u_\mu(z)}{1 - u_\mu(z)} \quad \forall z \notin W(A_\mu).\]
Applying this result to (22) directly leads to
\[\|r^{(\mu)}\|_2 \le C_{\mathrm{Crouzeix}} \|c\|_2\, 2(1 + g_0) \max_{z \in W(B_\mu)} \frac{u_\mu(-z)}{1 - u_\mu(-z)}.\]
Note that $t/(1-t)$ is monotonically increasing for $t \in [0, 1)$, and that $u_\mu(z) \in [0, 1)$ because $g_{A_\mu}(z, \xi) \ge 0$ for all $z, \xi$ and $g_{A_\mu}(z, \xi) > 0$ for $z, \xi \notin W(A_\mu)$. This directly leads to (26). For the tighter bound in the case of Hermitian positive definite matrices, we refer to the proof of Theorem 2.3 on pages 2447–2448 of [2].

Remark 3.3. For $\mathcal A$ being Hermitian positive definite, error bounds in the energy norm $\|x - \tilde x\|_{\mathcal A}$ have been given in [21] for the case of standard Krylov subspaces, that is, $z_{\mu,j} \equiv \infty$.
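In the Hermitian positive definite case, the quantities (24) and (25) are straightforward to evaluate. The following sketch (NumPy; intervals and shift pattern are illustrative assumptions) computes $\gamma_\mu$ by sampling $u_\mu(-z)$ over the Minkowski sum of the remaining fields of values, which for intervals is again an interval:

\begin{verbatim}
import numpy as np

# Evaluating u_mu from (24) and gamma_mu from (25) in the HPD case.
# After the simplification used in Section 4, each factor of (24) equals
# |w - theta_j| / (w + theta_j) with w = sqrt((z-beta)/(z-alpha)) and
# theta_j = sqrt((z_j-beta)/(z_j-alpha)); a shift z_j = inf gives theta_j = 1.
def u_mu(z, alpha, beta, shifts):
    w = np.sqrt((z - beta) / (z - alpha))     # real for z < 0 < alpha
    val = 1.0
    for zj in shifts:
        t = 1.0 if np.isinf(zj) else np.sqrt((zj - beta) / (zj - alpha))
        val *= abs((w - t) / (w + t))
    return val

# Illustrative data: W(A_mu) = [1, 100], remaining fields of values summing
# to [2, 150], extended Krylov shifts {inf, 0}, k_mu = 10.
alpha, beta = 1.0, 100.0
shifts = [np.inf, 0.0] * 5
zs = np.linspace(2.0, 150.0, 2001)
gamma = max(u_mu(-z, alpha, beta, shifts) for z in zs)   # (25)
print("gamma_mu ~", gamma)
\end{verbatim}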
4 Application to specific examples
In this section, we consider several concrete choices of rational Krylov subspaces, and calculate the convergence bounds resulting from Theorem 3.2. We will focus on Hermitian positive definite matrices $A_\mu$, $\mu = 1, \ldots, d$, and consider the following three choices of subspaces:

(i) standard Krylov subspaces (all shifts $z_{\mu,j} = \infty$);

(ii) extended Krylov subspaces ($z_{\mu,j} \in \{0, \infty\}$ alternatingly), also called Krylov plus inverse Krylov (KPIK);

(iii) modified extended Krylov subspaces ($z_{\mu,j} \in \{\sigma, \infty\}$ alternatingly, with $\sigma \in \mathbb{R}$).

Theorem 3.2 applies to all these cases and in the following we will only specify the parameter $\gamma_\mu$ that appears in the residual bound (27). Recall that, for Hermitian positive definite $A_\mu$, the field of values $W(A_\mu) = [\alpha_\mu, \beta_\mu]$ coincides with the convex hull of the spectrum, and hence $0 < \alpha_\mu < \beta_\mu$. Our convergence bounds will be expressed in terms of
\[\kappa_\mu := \frac{\beta_\mu}{\alpha_\mu}, \qquad \kappa_{L,\mu} := \frac{\lambda_{\max}(\mathcal A)}{\lambda_{\max}(\mathcal A) - \beta_\mu + \alpha_\mu}, \qquad \kappa_{R,\mu} := 1 + \frac{\beta_\mu - \alpha_\mu}{\lambda_{\min}(\mathcal A)}. \tag{28}\]
In what follows, the quantities $\kappa_{L,\mu}$ and $\kappa_{R,\mu}$ will be referred to as effective condition numbers. It is immediate to check that the inequalities
\[1 < \kappa_{L,\mu} < \kappa_{R,\mu} < \min\Big\{ \kappa_\mu, \frac{\lambda_{\max}(\mathcal A)}{\lambda_{\min}(\mathcal A)} \Big\} \tag{29}\]
hold for $d \ge 2$. Using the substitution $f = \sqrt{\frac{z - \beta_\mu}{z - \alpha_\mu}}$ and combining (25) with (24), we obtain in the Hermitian positive definite case the simplified formula
\[\gamma_\mu = \max_{f \in [\sqrt{\kappa_{L,\mu}}, \sqrt{\kappa_{R,\mu}}]} \prod_{j=1}^{k_\mu} \Big| \frac{f - \theta_{\mu,j}}{f + \theta_{\mu,j}} \Big|, \qquad \theta_{\mu,j} := \sqrt{\frac{z_{\mu,j} - \beta_\mu}{z_{\mu,j} - \alpha_\mu}}, \tag{30}\]
from which it becomes clear that it is sufficient to restrict our attention to poles on the negative real axis $z_{\mu,j} \in [-\infty, 0]$ or, equivalently, $\theta_{\mu,j} \in [1, \sqrt{\kappa_\mu}]$.
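The extremal problem (30) is one-dimensional and can be evaluated by direct sampling. A sketch, with illustrative interval data for $d$ equal blocks (NumPy assumed):

\begin{verbatim}
import numpy as np

# Evaluating (28) and the one-dimensional extremal problem (30).
d, alpha, beta = 3, 1.0, 100.0            # illustrative data, equal blocks
lmin, lmax = d * alpha, d * beta          # lambda_min(A), lambda_max(A)

kappa   = beta / alpha                    # (28)
kappa_L = lmax / (lmax - beta + alpha)
kappa_R = 1.0 + (beta - alpha) / lmin

def gamma_mu(thetas):
    """gamma_mu of (30): maximize the Blaschke-type product over f."""
    f = np.linspace(np.sqrt(kappa_L), np.sqrt(kappa_R), 4001)
    prod = np.ones_like(f)
    for t in thetas:
        prod *= np.abs((f - t) / (f + t))
    return prod.max()

k = 10
print("standard:", gamma_mu([1.0] * k))                         # z = inf
print("extended:", gamma_mu([1.0, np.sqrt(kappa)] * (k // 2)))  # z in {inf, 0}
\end{verbatim}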
We start by giving a bound for standard Krylov subspaces.

Corollary 4.1. Let $A_\mu$ be Hermitian positive definite matrices with $W(A_\mu) = [\alpha_\mu, \beta_\mu]$ for $\mu = 1, \ldots, d$. Applying the Galerkin projection method with standard Krylov subspaces
\[\mathrm{span}\{c_\mu, A_\mu c_\mu, \ldots, A_\mu^{k_\mu - 1} c_\mu\},\]
the convergence factor $\gamma_\mu$ satisfies
\[\gamma_\mu = \bigg( \frac{\sqrt{\kappa_{R,\mu}} - 1}{\sqrt{\kappa_{R,\mu}} + 1} \bigg)^{k_\mu}, \qquad \text{with } \kappa_{R,\mu} = 1 + \frac{\beta_\mu - \alpha_\mu}{\lambda_{\min}(\mathcal A)}.\]

Proof. The choice of standard Krylov subspaces corresponds to $z_{\mu,j} \equiv \infty$, and thus $\theta_{\mu,j} = 1$ for $j = 1, \ldots, k_\mu$. Inserting this into (30) and taking into account that $(1, \infty) \ni f \mapsto (f-1)/(f+1)$ is positive and increasing, the assertion follows.

Note that the convergence factor of Corollary 4.1 matches the one obtained in [21, Corollary 4.4]. However, for the case of extended Krylov subspaces, the approach from this paper gives a substantially better factor compared to [21, Lemma 6.1], especially for $d > 2$.

Corollary 4.2. Let $A_\mu$ be Hermitian positive definite matrices with $W(A_\mu) = [\alpha_\mu, \beta_\mu]$ for $\mu = 1, \ldots, d$. Applying the Galerkin projection method with extended Krylov subspaces
\[\mathrm{span}\{c_\mu, A_\mu^{-1} c_\mu, A_\mu c_\mu, \ldots, A_\mu^{k_\mu/2 - 1} c_\mu, A_\mu^{-k_\mu/2} c_\mu\}\]
for even $k_\mu$, the convergence factor $\gamma_\mu$ satisfies
\[\gamma_\mu \le \bigg( \frac{\sqrt[4]{\kappa_\mu} - 1}{\sqrt[4]{\kappa_\mu} + 1} \bigg)^{k_\mu}, \qquad \text{with } \kappa_\mu = \frac{\beta_\mu}{\alpha_\mu}.\]
More specifically, we have the equalities
\[\gamma_\mu = \begin{cases} \bigg( \dfrac{\sqrt{\kappa_{L,\mu}} - 1}{\sqrt{\kappa_{L,\mu}} + 1} \cdot \dfrac{\sqrt{\kappa_\mu/\kappa_{L,\mu}} - 1}{\sqrt{\kappa_\mu/\kappa_{L,\mu}} + 1} \bigg)^{k_\mu/2}, & \text{for } \lambda_{\max}(\mathcal A) < \beta_\mu + \sqrt{\beta_\mu \alpha_\mu},\\[2mm] \bigg( \dfrac{\sqrt{\kappa_{R,\mu}} - 1}{\sqrt{\kappa_{R,\mu}} + 1} \cdot \dfrac{\sqrt{\kappa_\mu/\kappa_{R,\mu}} - 1}{\sqrt{\kappa_\mu/\kappa_{R,\mu}} + 1} \bigg)^{k_\mu/2}, & \text{for } \lambda_{\min}(\mathcal A) > \alpha_\mu + \sqrt{\beta_\mu \alpha_\mu},\\[2mm] \bigg( \dfrac{\sqrt[4]{\kappa_\mu} - 1}{\sqrt[4]{\kappa_\mu} + 1} \bigg)^{k_\mu}, & \text{otherwise.} \end{cases}\]

Proof. The choice of extended Krylov subspaces corresponds to $z_{\mu,j} = \infty$ (hence, $\theta_{\mu,j} = 1$) for odd $j$, and $z_{\mu,j} = 0$ (hence, $\theta_{\mu,j} = \sqrt{\beta_\mu/\alpha_\mu} = \sqrt{\kappa_\mu}$) for even $j$. We need to find
\[\gamma_\mu^{2/k_\mu} = \max_{f \in [\sqrt{\kappa_{L,\mu}}, \sqrt{\kappa_{R,\mu}}]} g(f), \qquad g(f) = \frac{f - 1}{f + 1} \cdot \frac{\sqrt{\kappa_\mu} - f}{f + \sqrt{\kappa_\mu}},\]
where we have used the inequalities (29). Simple elementary calculus shows that the maximum of $g$ on the interval $[1, \sqrt{\kappa_\mu}]$ is attained at $f^* = \sqrt[4]{\kappa_\mu}$, and that
\[\gamma_\mu^{2/k_\mu} = \max_{f \in [\sqrt{\kappa_{L,\mu}}, \sqrt{\kappa_{R,\mu}}]} g(f) \le \max_{f \in [1, \sqrt{\kappa_\mu}]} g(f) = g(f^*) = \bigg( \frac{\sqrt[4]{\kappa_\mu} - 1}{\sqrt[4]{\kappa_\mu} + 1} \bigg)^2.\]
This shows the first statement. If $\sqrt[4]{\kappa_\mu} \in [\sqrt{\kappa_{L,\mu}}, \sqrt{\kappa_{R,\mu}}]$, this inequality becomes an equality. Otherwise, the maximum is given by $g(\sqrt{\kappa_{L,\mu}})$ if $f^* < \sqrt{\kappa_{L,\mu}}$ and by $g(\sqrt{\kappa_{R,\mu}})$ if $f^* > \sqrt{\kappa_{R,\mu}}$, as claimed in the second statement.
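A quick comparison of the per-step factors predicted by Corollaries 4.1 and 4.2 (illustrative data; NumPy assumed):

\begin{verbatim}
import numpy as np

# Per-step convergence factors of Corollaries 4.1 (standard Krylov)
# and 4.2 (extended Krylov, first bound). Values are illustrative.
d, alpha, beta = 3, 1.0, 1.0e4
lmin = d * alpha
kappa   = beta / alpha
kappa_R = 1.0 + (beta - alpha) / lmin

rho_std = (np.sqrt(kappa_R) - 1) / (np.sqrt(kappa_R) + 1)   # Cor. 4.1
rho_ext = (kappa**0.25 - 1) / (kappa**0.25 + 1)             # Cor. 4.2 (bound)
print(f"standard: {rho_std:.4f}, extended (bound): {rho_ext:.4f}")
\end{verbatim}

Note that the standard-Krylov factor involves the effective condition number $\kappa_{R,\mu}$, which improves with growing $d$, whereas the extended-Krylov bound depends only on $\kappa_\mu$; cf. Remark 4.4 below.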
Finally, we get the following bound for a rational Krylov subspace method with two shifts, $\infty$ and $\sigma \in \mathbb{R} \setminus W(A_\mu)$, the latter possibly depending on $\mu$. This is a generalization of extended Krylov subspaces, where $\sigma = 0$, and we will see that it allows for faster convergence, while requiring the same number of linear system solves in the method.

Corollary 4.3. Let $A_\mu$ be Hermitian positive definite matrices with $W(A_\mu) = [\alpha_\mu, \beta_\mu]$ for $\mu = 1, \ldots, d$. Applying the Galerkin projection method with rational Krylov subspaces of the form
\[\mathrm{span}\{c_\mu, (A_\mu - \sigma I)^{-1} c_\mu, A_\mu c_\mu, \ldots, A_\mu^{k_\mu/2 - 1} c_\mu, (A_\mu - \sigma I)^{-k_\mu/2} c_\mu\}\]
for even $k_\mu$, the convergence factor $\gamma_\mu$ satisfies
\[\gamma_\mu \le \max\Bigg\{ \bigg( \frac{\sqrt\theta - 1}{\sqrt\theta + 1} \bigg)^{k_\mu},\ \bigg( \frac{\sqrt{\kappa_{R,\mu}} - 1}{\sqrt{\kappa_{R,\mu}} + 1} \cdot \frac{\sqrt{\kappa_{R,\mu}} - \theta}{\sqrt{\kappa_{R,\mu}} + \theta} \bigg)^{k_\mu/2} \Bigg\}, \tag{31}\]
with $\theta = \sqrt{\frac{\sigma - \beta_\mu}{\sigma - \alpha_\mu}}$ and $\kappa_{R,\mu} = 1 + \frac{\beta_\mu - \alpha_\mu}{\lambda_{\min}(\mathcal A)}$. The shift minimizing this bound is given by
\[\sigma_{\mathrm{opt}} = \alpha_\mu (\theta_{\mathrm{opt}}^2 - \kappa_\mu)/(\theta_{\mathrm{opt}}^2 - 1), \tag{32}\]
with $\theta_{\mathrm{opt}} := s^{-1}(\sqrt{\kappa_{R,\mu}})$ for $s(\theta) := \big( (\theta + 1)^2 + (\theta - 1)\sqrt{\theta^2 + 6\theta + 1} \big)/(4\sqrt\theta)$. When using this shift $\sigma_{\mathrm{opt}}$, we have
\[\gamma_\mu \le \Bigg( \frac{\sqrt[6]{4\kappa_{R,\mu}} - 1}{\sqrt[6]{4\kappa_{R,\mu}} + 1} \Bigg)^{k_\mu}. \tag{33}\]

Proof. The choice of rational Krylov subspaces corresponds to $z_{\mu,j} = \infty$ (hence, $\theta_{\mu,j} = 1$) for odd $j$, and $z_{\mu,j} = \sigma$ (hence, $\theta_{\mu,j} = \sqrt{\frac{\sigma - \beta_\mu}{\sigma - \alpha_\mu}} =: \theta$) for even $j$. We need to find
\[\gamma_\mu^{2/k_\mu} = \max_{f \in [\sqrt{\kappa_{L,\mu}}, \sqrt{\kappa_{R,\mu}}]} g_\theta(f), \qquad g_\theta(f) = \bigg| \frac{f - 1}{f + 1} \cdot \frac{f - \theta}{f + \theta} \bigg|.\]
The function $g_\theta(f)$ is continuously differentiable in all points except $1$ and $\theta$, see Figure 1. It has a unique local maximum at $\sqrt\theta$. Combined with (29), this shows the bound (31):
\[\gamma_\mu^{2/k_\mu} = \max_{f \in [\sqrt{\kappa_{L,\mu}}, \sqrt{\kappa_{R,\mu}}]} g_\theta(f) \le \max_{f \in [1, \sqrt{\kappa_{R,\mu}}]} g_\theta(f) = \max\big\{ g_\theta(\sqrt\theta),\, g_\theta(\sqrt{\kappa_{R,\mu}}) \big\}.\]
To prove (32), we need to find $\theta_{\mathrm{opt}} \in (0, \infty)$ which minimizes the function $h(\theta)$ given by
\[h(\theta) := \max\big\{ g_\theta(\sqrt\theta),\, g_\theta(\sqrt{\kappa_{R,\mu}}) \big\} = \max\{h_1(\theta), h_2(\theta)\}\]
with
\[h_1(\theta) := \bigg( \frac{\sqrt\theta - 1}{\sqrt\theta + 1} \bigg)^2, \qquad h_2(\theta) := \frac{\sqrt{\kappa_{R,\mu}} - 1}{\sqrt{\kappa_{R,\mu}} + 1} \cdot \frac{\sqrt{\kappa_{R,\mu}} - \theta}{\sqrt{\kappa_{R,\mu}} + \theta}.\]
The function $h_1(\theta)$ is zero in $1$, and monotonically increases with $|\theta - 1|$. Similarly, $h_2(\theta)$ is zero in $\sqrt{\kappa_{R,\mu}}$, and monotonically increases with $|\theta - \sqrt{\kappa_{R,\mu}}|$. As both $h_1$ and $h_2$ are monotonically decreasing on $(0, 1]$, we clearly have $\theta_{\mathrm{opt}} \ge 1$. Similarly, we find that $\theta_{\mathrm{opt}} \le \sqrt{\kappa_{R,\mu}}$. Therefore, $\theta_{\mathrm{opt}}$ is uniquely defined by the relation
\[h_1(\theta_{\mathrm{opt}}) = h_2(\theta_{\mathrm{opt}}), \qquad \theta_{\mathrm{opt}} \in [1, \sqrt{\kappa_{R,\mu}}].\]
[Figure 1: Function $g_\theta(f)$ from the proof of Corollary 4.3 for $\theta = 4$.]

Inserting both functions leads to the condition $\sqrt{\kappa_{R,\mu}} = s(\theta_{\mathrm{opt}})$, with the bijective function $s : [1, \sqrt{\kappa_{R,\mu}}] \to [1, s(\sqrt{\kappa_{R,\mu}})]$ defined in the statement of the corollary. To show (33), it remains to prove
\[\gamma_\mu^{2/k_\mu} = \Bigg( \frac{\sqrt{\theta_{\mathrm{opt}}} - 1}{\sqrt{\theta_{\mathrm{opt}}} + 1} \Bigg)^2 \le \Bigg( \frac{\sqrt[6]{4\kappa_{R,\mu}} - 1}{\sqrt[6]{4\kappa_{R,\mu}} + 1} \Bigg)^2,\]
which is the case if and only if $\sqrt{\theta_{\mathrm{opt}}} \le \sqrt[6]{4\kappa_{R,\mu}}$, or, equivalently, $\theta_{\mathrm{opt}}^{3/2}/2 \le \sqrt{\kappa_{R,\mu}} = s(\theta_{\mathrm{opt}})$. This is easily seen from
\[2\theta^2 \le (\theta + 1)^2 + (\theta - 1)^2 \le (\theta + 1)^2 + (\theta - 1)\sqrt{\theta^2 + 6\theta + 1}, \qquad \forall\, \theta \ge 1.\]
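Computing $\sigma_{\mathrm{opt}}$ from (32) requires inverting $s$, which can be done with any scalar root finder. A sketch assuming SciPy's brentq and illustrative interval data:

\begin{verbatim}
import numpy as np
from scipy.optimize import brentq

# Optimal shift of Corollary 4.3: solve s(theta) = sqrt(kappa_R) on
# [1, sqrt(kappa_R)], then evaluate (32). Data are illustrative.
d, alpha, beta = 3, 1.0, 1.0e4
lmin = d * alpha
kappa   = beta / alpha
kappa_R = 1.0 + (beta - alpha) / lmin

def s(theta):
    return ((theta + 1)**2
            + (theta - 1) * np.sqrt(theta**2 + 6*theta + 1)) / (4 * np.sqrt(theta))

theta_opt = brentq(lambda t: s(t) - np.sqrt(kappa_R), 1.0, np.sqrt(kappa_R))
sigma_opt = alpha * (theta_opt**2 - kappa) / (theta_opt**2 - 1)   # (32)

rate  = (np.sqrt(theta_opt) - 1) / (np.sqrt(theta_opt) + 1)       # per step
bound = ((4 * kappa_R)**(1/6) - 1) / ((4 * kappa_R)**(1/6) + 1)   # (33)
print(f"theta_opt={theta_opt:.4f}, sigma_opt={sigma_opt:.4f}")
print(f"rate={rate:.4f} <= bound={bound:.4f}")
\end{verbatim}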
Remark 4.4. The convergence behavior tends to improve when $d$, the number of dimensions, increases. We illustrate this by considering the case $A_\mu \equiv A$ with $W(A) = [\alpha, \beta]$ and $k_\mu \equiv k$. Then the bound obtained from Corollary 4.1 for standard Krylov subspaces becomes
\[\|r\|_2 \le 2\sqrt{d}\, \sqrt{\frac{\lambda_{\max}(\mathcal A)}{\lambda_{\min}(\mathcal A)}}\, \|c\|_2 \bigg( \frac{\sqrt{\kappa_R} - 1}{\sqrt{\kappa_R} + 1} \bigg)^k,\]
with $\kappa_R = 1 + \frac{\beta - \alpha}{d\alpha} < \kappa = \beta/\alpha$. Clearly, the effective condition number $\kappa_R$ decreases as $d$ increases. A similar statement holds for the bound obtained from Corollary 4.3 for rational Krylov subspaces with shifts $\infty$ and $\sigma_{\mathrm{opt}}$. However, for the case of extended Krylov subspaces, the bound generally only improves if the condition $\lambda_{\min}(\mathcal A) > \alpha_\mu + \sqrt{\beta_\mu \alpha_\mu}$ holds, that is, if $\sqrt\kappa < d - 1$. Since it is unlikely that such a condition is satisfied, it follows that choosing an optimal shift is essential for achieving improved results in higher dimensions.

Remark 4.5. For different but similar Blaschke-type rational functions, the quantity $\gamma_k$ of (30) has been determined in [3, Section 6] for various configurations of poles, including those discussed in this section.

If it is affordable to solve shifted systems with $A_\mu$ for several different shifts, one could imagine choosing $z_{\mu,1} = \infty$ and then cyclic shifts
\[z_{\mu,2} = \sigma_1,\ z_{\mu,3} = \sigma_2,\ \ldots,\ z_{\mu,p+1} = \sigma_p,\ z_{\mu,p+2} = \sigma_1,\ z_{\mu,p+3} = \sigma_2,\ \ldots,\ z_{\mu,2p+1} = \sigma_p,\ \ldots\]
for some fixed $p$. Note that $\sigma_1, \ldots, \sigma_p$ will depend on $\mu$. The so-called ADI optimal shifts are obtained by solving the third Zolotarev problem
\[\min_{\theta_{\mu,2}, \ldots, \theta_{\mu,p+1}}\ \max_{f \in [\sqrt{\kappa_{L,\mu}}, \sqrt{\kappa_{R,\mu}}]}\ \prod_{j=2}^{p+1} \Big| \frac{f - \theta_{\mu,j}}{f + \theta_{\mu,j}} \Big|,\]
see, e.g., [25, Sec. 2.1] and [11]. For instance, in the case $p = 1$ we get the convergence rate $\big( \sqrt[4]{\kappa_{R,\mu}/\kappa_{L,\mu}} - 1 \big)/\big( \sqrt[4]{\kappa_{R,\mu}/\kappa_{L,\mu}} + 1 \big)$ for the parameter $\theta_{\mu,2} = \sqrt[4]{\kappa_{R,\mu}\, \kappa_{L,\mu}}$.
5 Numerical Experiments
In this section, numerical experiments are presented to illustrate the theoretical results on the convergence behavior obtained in Section 4 for Hermitian positive definite matrices.

Remark 5.1. The implementation of the Galerkin projection method requires the solution of the linear system
\[\Big( \sum_{\mu=1}^{d} I_{k_d} \otimes \cdots \otimes I_{k_{\mu+1}} \otimes \tilde A_\mu \otimes I_{k_{\mu-1}} \otimes \cdots \otimes I_{k_1} \Big) y = \tilde c_d \otimes \cdots \otimes \tilde c_1,\]
at least once in the final step of the method. This system has size $k_1 k_2 \cdots k_d$, which makes a direct approach infeasible for larger $d$, even when $k_\mu \ll n_\mu$. Instead, we use an approximate solution based on exponential sums:
\[\tilde y = \sum_{j=1}^{R} \omega_j \exp(-\alpha_j \tilde A_d)\tilde c_d \otimes \cdots \otimes \exp(-\alpha_j \tilde A_1)\tilde c_1,\]
see [14, 21]. The vector $\tilde y$ is only stored implicitly through the vectors $\exp(-\alpha_j \tilde A_\mu)\tilde c_\mu$. The coefficients $\alpha_j, \omega_j$, $j = 1, \ldots, R$, are chosen such that the exponential sum $\sum_{j=1}^{R} \omega_j e^{-\alpha_j \xi}$ approximates $1/\xi$ [7, 15, 16]. As the approximation error decreases exponentially with $R$, even moderate values of $R$ lead to an accuracy at the level of about $10^{-8}$, which is used in all experiments below. A small computational sketch of this exponential sum approach is given after Experiment 1 below.

Experiment 1: Standard Krylov subspaces. In the first example, we set all matrices $A_\mu \equiv A$, where $A \in \mathbb{R}^{200 \times 200}$ is the standard finite difference discretization of the one-dimensional Laplace operator on $[0, 1]$ with homogeneous Dirichlet boundary conditions. The right-hand side is set to $c = b \otimes b \otimes \cdots \otimes b$, where the entries of $b$ are random numbers from $N(0, 1)$ and $b$ is normalized such that $\|b\|_2 = 1$. The obtained results are shown in the left plot of Figure 2. For $d = 2$, observed and predicted convergence rates match quite well. However, as $d$ increases, the observed convergence becomes faster than predicted by Corollary 4.1. This phenomenon seems to depend strongly on the choice of $b$. To demonstrate this, let
\[b = U D^{-1} e / \|U D^{-1} e\|_2, \tag{34}\]
where $A = U D U^T$ is the eigenvalue decomposition of $A$ and $e = (1, \ldots, 1)^T$. Then the results shown in the right plot of Figure 2 reveal that the observed and predicted convergence rates match very well even for large $d$. Moreover, the convergence rate does not improve visibly as $d$ increases, which confirms the observation made in Remark 4.4.
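The exponential sum approach of Remark 5.1 is easy to reproduce on a small example. In the following sketch the coefficients $\alpha_j, \omega_j$ are generated by sinc quadrature applied to $1/\xi = \int_0^\infty e^{-\xi t}\,dt$ (substitution $t = e^s$); this is an assumption made for the sketch only, as the optimized coefficients of [7, 15, 16] reach the same accuracy with smaller $R$. All matrices and sizes are illustrative:

\begin{verbatim}
import numpy as np

# Exponential-sum solver from Remark 5.1 on a small example.
h, R = 0.35, 120
s = h * np.arange(-R // 2, R // 2)
alphas, omegas = np.exp(s), h * np.exp(s)  # 1/xi ~ sum_j omega_j e^(-alpha_j xi)

rng = np.random.default_rng(3)
d, k = 3, 6
At, ct, eig = [], [], []
for _ in range(d):
    M = rng.standard_normal((k, k))
    A = M @ M.T + k * np.eye(k)            # small HPD matrix (projected A_mu)
    At.append(A)
    ct.append(rng.standard_normal(k))
    eig.append(np.linalg.eigh(A))          # for evaluating exp(-a A) c

def expv(mu, a):                           # exp(-a * At[mu]) @ ct[mu]
    lam, Q = eig[mu]
    return Q @ (np.exp(-a * lam) * (Q.T @ ct[mu]))

y = np.zeros(k**d)
for a, w in zip(alphas, omegas):
    t = expv(d - 1, a)
    for m in reversed(range(d - 1)):
        t = np.kron(t, expv(m, a))
    y += w * t

# Dense reference solve of the Kronecker-sum system (only for tiny sizes)
K = sum(np.kron(np.kron(np.eye(k**(d - 1 - m)), At[m]), np.eye(k**m))
        for m in range(d))
c = ct[d - 1]
for m in reversed(range(d - 1)):
    c = np.kron(c, ct[m])
print("rel. error:", np.linalg.norm(y - np.linalg.solve(K, c)) / np.linalg.norm(y))
\end{verbatim}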
[Figure 2: Norm of residuals for standard Krylov subspace method, for d = 2, 5, 10, 50, 100. Solid lines correspond to computed residuals and dashed lines correspond to the bound obtained from Corollary 4.1. Left: Randomly chosen right-hand side. Right: Particularly constructed right-hand side (34).]

Experiment 2: Extended Krylov subspaces. We repeat Experiment 1 for extended Krylov subspaces. The results for a random right-hand side are shown in the left plot of Figure 3. Even for $d = 2$, the observed convergence is much faster than predicted by Corollary 4.2. Moreover, this phenomenon does not disappear with the right-hand side (34). Inspired by a numerical experiment in [20, Example 4.2], we also consider a diagonal matrix $A \in \mathbb{R}^{n \times n}$, $n = 10^4$, with diagonal elements
\[a_{jj} = \frac{1}{2\sqrt\kappa}\bigg[ (\kappa + 1) + (\kappa - 1)\cos\Big( \frac{\pi(j-1)}{n-1} \Big) \bigg], \qquad j = 1, \ldots, n, \tag{35}\]
where $\kappa = 2500 = \kappa(A)$. The right-hand side vector is set to $c = b \otimes b \otimes \cdots \otimes b$, with $b = A^{-1}e$. The obtained results are shown in the right plot of Figure 3. Again, the observed and predicted convergence rates match very well even for large $d$.

Experiment 3: Rational Krylov subspaces with shifts $\infty$ and $\sigma_{\mathrm{opt}}$. We repeat Experiment 2 for rational Krylov subspaces with shifts $\infty$ and $\sigma_{\mathrm{opt}}$ defined in Corollary 4.3. The results are displayed in Figure 4. The observed and predicted convergence rates match well. However, in contrast to Experiment 2, the convergence rates improve as $d$ increases.

Experiment 4: Rational Krylov subspaces with optimal ADI shifts. In this experiment, we apply rational Krylov subspaces with optimal ADI shifts $\sigma_1, \ldots, \sigma_p$ to the matrix (35) and the corresponding right-hand side. The obtained results for $p = 1$ and $p = 3$ optimal ADI shifts are shown in the left and the right plot of Figure 5, respectively.

[Figure 3: Norm of residuals for extended Krylov subspace method, for d = 2, 5, 10, 50, 100. Solid lines correspond to computed residuals and dashed lines correspond to the bound obtained from Corollary 4.2. Left: Discrete Laplace. Right: Diagonal matrix (35).]

[Figure 4: Norm of residuals for rational Krylov subspace method with shifts $\infty$ and $\sigma_{\mathrm{opt}}$, for d = 2, 5, 10, 50, 100. Solid lines correspond to computed residuals and dashed lines correspond to the bound obtained from Corollary 4.3. Left: Discrete Laplace. Right: Diagonal matrix (35).]

[Figure 5: Norm of residuals for rational Krylov subspace method with optimal ADI shifts, for d = 2, 5, 10, 50, 100. Solid lines correspond to computed residuals and dashed lines correspond to the bound from Theorem 3.2. Left: One shift. Right: Three shifts.]
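As a small aside, the diagonal test matrix (35) used in Experiments 2–4 can be set up in a few lines (NumPy), and one can check that its condition number is exactly $\kappa$:

\begin{verbatim}
import numpy as np

# Diagonal test matrix (35): Chebyshev-like distribution of the spectrum,
# scaled so that cond(A) = kappa exactly.
n, kappa = 10**4, 2500.0
j = np.arange(1, n + 1)
a = ((kappa + 1) + (kappa - 1) * np.cos(np.pi * (j - 1) / (n - 1))) \
    / (2 * np.sqrt(kappa))
print(a.max() / a.min())   # = kappa (up to rounding)
print(a.max(), a.min())    # sqrt(kappa) and 1/sqrt(kappa)
\end{verbatim}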
6 Conclusions
In this paper, we have provided an analysis of Galerkin projection onto tensor products of subspaces for linear systems that can be regarded as $d$-dimensional analogues of the Sylvester equation. The orthogonal decomposition of the residual derived in Proposition 2.2 is the key observation and allows us to decompose the residual into a sum of residuals for simpler 1D projections, see Proposition 2.3. When applied to polynomial and rational Krylov subspaces, this decomposition allows us to derive a priori error estimates via extremal problems for univariate rational functions. This contrasts with the analysis in [21], which involves multivariate approximation problems. Numerical experiments demonstrate that the convergence rates derived in this paper are sharp. Moreover, our bounds allow for a better understanding of the dependence of the convergence rates on the shifts in the rational Krylov subspaces. In turn, this can be used to optimize the choice of shifts. Interestingly, the convergence of rational Krylov subspace methods with optimized shifts improves as the dimension $d$ increases.
References

[1] A. C. Antoulas. Approximation of Large-Scale Dynamical Systems. SIAM Publications, Philadelphia, PA, 2005.

[2] B. Beckermann. An error analysis for rational Galerkin projection applied to the Sylvester equation. SIAM J. Numer. Anal., 49(6):2430–2450, 2011.

[3] B. Beckermann and L. Reichel. Error estimates and evaluation of matrix functions via the Faber transform. SIAM J. Numer. Anal., 47(5):3849–3883, 2009.

[4] P. Benner and T. Breiten. Low rank methods for a class of generalized Lyapunov equations and related issues. Technical report, MPI Magdeburg, 2012. Available from http://www.mpi-magdeburg.mpg.de/preprints/.

[5] P. Benner, R. Byers, E. S. Quintana-Ortí, and G. Quintana-Ortí. Solving algebraic Riccati equations on parallel computers using Newton's method with exact line search. Parallel Comput., 26(10):1345–1368, 2000.

[6] P. Benner, R.-C. Li, and N. Truhar. On the ADI method for Sylvester equations. J. Comput. Appl. Math., 233(4):1035–1045, 2009.

[7] D. Braess and W. Hackbusch. Approximation of 1/x by exponential sums in [1, ∞). IMA J. Numer. Anal., 25(4):685–697, 2005.

[8] M. Crouzeix. Numerical range and functional calculus in Hilbert space. J. Funct. Anal., 244(2):668–690, 2007.

[9] J. W. Demmel. Three methods for refining estimates of invariant subspaces. Computing, 38:43–57, 1987.

[10] V. Druskin, L. Knizhnerman, and V. Simoncini. Analysis of the rational Krylov subspace and ADI methods for solving the Lyapunov equation. SIAM J. Numer. Anal., 49(5):1875–1898, 2011.

[11] N. S. Ellner and E. L. Wachspress. Alternating direction implicit iteration for systems with complex spectra. SIAM J. Numer. Anal., 28(3):859–870, 1991.

[12] O. G. Ernst. Minimal and orthogonal residual methods and their generalizations for solving linear operator equations. Habilitation thesis, TU Bergakademie Freiberg, 2001.

[13] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, third edition, 1996.

[14] L. Grasedyck. Existence and computation of low Kronecker-rank approximations for large linear systems of tensor product structure. Computing, 72(3-4):247–265, 2004.

[15] W. Hackbusch. Approximation of 1/x by exponential sums. Available from http://www.mis.mpg.de/scicomp/EXP_SUM/1_x/tabelle. Retrieved August 2008.

[16] W. Hackbusch. Entwicklungen nach Exponentialsummen. Technical Report, Max-Planck-Institut für Mathematik in den Naturwissenschaften, 2009. Revised version. See http://www.mis.mpg.de/preprints/tr/report-0405.pdf.

[17] I. M. Jaimoukha and E. M. Kasenally. Krylov subspace methods for solving large Lyapunov equations. SIAM J. Numer. Anal., 31:227–251, 1994.

[18] K. Jbilou. ADI preconditioned Krylov methods for large Lyapunov matrix equations. Linear Algebra Appl., 432(10):2473–2485, 2010.

[19] B. N. Khoromskij and Ch. Schwab. Tensor-structured Galerkin approximation of parametric and stochastic elliptic PDEs. SIAM J. Sci. Comput., 33(1):364–385, 2011.

[20] L. Knizhnerman and V. Simoncini. Convergence analysis of the extended Krylov subspace method for the Lyapunov equation. Numer. Math., 118(3):567–586, 2011.

[21] D. Kressner and C. Tobler. Krylov subspace methods for linear systems with tensor product structure. SIAM J. Matrix Anal. Appl., 31(4):1688–1714, 2010.

[22] D. Kressner and C. Tobler. Preconditioned low-rank methods for high-dimensional elliptic PDE eigenvalue problems. Comput. Methods Appl. Math., 11(3):363–381, 2011.

[23] T. Mach and J. Saak. Towards an ADI iteration for tensor structured equations. Technical report, MPI Magdeburg, 2012. Available from www.mpi-magdeburg.mpg.de/preprints/2011/12/.

[24] Y. Saad. Numerical solution of large Lyapunov equations. In Signal processing, scattering and operator theory, and numerical methods (Amsterdam, 1989), volume 5 of Progr. Systems Control Theory, pages 503–511. Birkhäuser Boston, Boston, MA, 1990.

[25] J. Sabino. Solution of Large-Scale Lyapunov Equations via the Block Modified Smith Method. PhD thesis, Rice University, 2006.

[26] V. Simoncini. A new iterative method for solving large-scale Lyapunov matrix equations. SIAM J. Sci. Comput., 29(3):1268–1288, 2007.

[27] V. Simoncini and V. Druskin. Convergence analysis of projection methods for the numerical solution of large Lyapunov equations. SIAM J. Numer. Anal., 47:828–843, 2009.

[28] N. Truhar, Z. Tomljanović, and R.-C. Li. Analysis of the solution of the Sylvester equation using low-rank ADI with exact shifts. Systems Control Lett., 59(3-4):248–257, 2010.