Theoretically supported scalable FETI for numerical solution of ...

Report 2 Downloads 48 Views
Theoretically supported scalable FETI for numerical solution of variational inequalities Zdenˇek Dost´al and David Hor´ak



Abstract The FETI method with a natural coarse grid is combined with recently proposed optimal algorithms for the solution of bound and/or equality constrained quadratic programming problems in order to develop a scalable solver for elliptic boundary variational inequalities such as those describing equilibrium of a system of bodies in mutual contact. A discretized model problem is first reduced by the duality theory of convex optimization to the quadratic programming problem with bound and equality constraints. The latter is then modified by means of orthogonal projectors to the natural coarse grid introduced by Farhat, Mandel and Roux. Finally the classical results on linear scalability for linear problems are extended to boundary variational inequalities. The results are validated by numerical experiments. The experiments also confirm that the algorithm enjoys the same parallel scalability as its linear counterpart.

Keywords: Domain decomposition, variational inequality, scalability, parallel algorithms, FETI Acknowledgment: This research is supported by grant No. 101/04/1145 of the Grant Agency of the Czech Republic and by projects of the Ministry of Education No. 1ET400300415 and ME641.

1

Introduction

The FETI (Finite Element Tearing and Interconnecting) domain decomposition method was originally proposed by Farhat and Roux [26] for parallel solving of the linear problems described by elliptic partial differential equations. Its key ingredient ∗

ˇ FEI VSB-Technical University of Ostrava, CZ-70833 Ostrava, Czech Republic

1

is decomposition of the spatial domain into non-overlapping subdomains that are ”glued” by Lagrange multipliers, so that, after eliminating the primal variables, the original problem is reduced to a small, relatively well conditioned, typically equality constrained quadratic programming problem that is solved iteratively. The time that is necessary for both the elimination and iterations can be reduced nearly proportionally to the number of the processors, so that the algorithm enjoys parallel scalability. Observing that the equality constraints may be used to define so called ”natural coarse grid”, Farhat, Mandel and Roux [25] modified the basic FETI algorithm so that they were able to adapt the results by Bramble, Pasciak and Schatz [5] to prove its numerical scalability, i.e. asymptotically linear complexity. The comprehensive review of the mathematical results related to the FETI methods may be found in the monograph by Tosseli and Widlund [38]. If the FETI procedure is applied to an elliptic variational inequality, the resulting quadratic programming problem has not only the equality constraints, but also the non-negativity constraints. Even though the latter is a considerable complication as compared with linear problems, it seems that the FETI procedure should be even more powerful for the solution of variational inequalities than for the linear problems. The reason is that FETI not only reduces the original problem to a smaller and better conditioned one, but it also replaces for free all the inequalities by the bound constraints. Promising experimental results by Dureisseix and Farhat [22] supported this claim and even indicated numerical scalability of their method. Similar results were achieved also for the FETI–DP (Dual–Primal) method introduced by Farhat et al. [24]. The FETI–DP method is very similar to the original FETI, the only difference is that it enforces the continuity of displacements at corners on primal level. A new Lagrange multipliers algorithm, FETI–C, based on FETI–DP and on active set strategies with additional planning steps and preconditioning, was introduced by Farhat et al. [1, 22]. Its scalability was demonstrated experimentally. Another approach yielding an experimental evidence of scalability was proposed by Dost´al, Friedlander, Gomes, Santos and Hor´ak [12, 13, 14]. The algorithm combined FETI with a special variant of the augmented Lagrangian method [10]. Scalability was later proved for an algorithm that enforced the equality constraints by the optimal dual penalty [15, 16] and solved the resulting bound constrained problem by recent in a sense optimal algorithms [7, 21]. Using the same algorithms, Dost´al, Hor´ak and Stefanica then proved numerical scalability for a FETI–DP algorithm applied to the coercive problems discretized by means of either nodal [18] or mortar [19] Lagrange multipliers. Most recently, the scalability results were proved also for semicoercive problems [20]. The rate of convergence was given in terms of the effective condition number of the dual Schur complement of the stiffness matrix which was proved to be bounded by CH 2 /h2 , where C is a constant independent of the discretization and decomposition parameters h and H, respectively. The estimates did not assume any preconditioning. Indeed, numerical experiments by the present authors, V. Vondr´ak and M. Lesoinne indicated that the performance of our FETI–DP based algorithms may be considerably improved by preconditioning. 2

It should be noted that the effort to develop scalable solvers for variational inequalities was not restricted to FETI. For example, using ideas related to Mandel [34], Kornhuber, Krause, Sander and Wohlmuth [30, 31, 40, 32, 33] gave an experimental evidence of numerical scalability of the algorithm based on monotone multigrid. Probably the first theoretical results concerning development of scalable algorithms were proved by Sch¨oberl [36, 37]. In this paper, we use the FETI method with a natural coarse grid to develop a scalable algorithm for numerical solution of both coercive and semicoercive variational inequalities. The rate of convergence is again given in terms of the effective condition number of the dual Schur complement of the stiffness matrix, but this time it is bounded by CH/h. The paper is organized as follows. After describing a model problem, we briefly review the FETI methodology [12] that turns the variational inequality into the well conditioned quadratic programming problem with bound and equality constraints. Then we review our algorithms for solution of the resulting bound and equality constrained quadratic programming problem whose rate of convergence may be expressed in terms of bounds on the spectrum of the dual Schur complement matrix [21, 8, 9]. Finally we present the main results about optimality of our method and give results of numerical experiments with parallel implementation of the algorithm in PETSc [3].

2

Model problem

For the sake of simplicity, we shall reduce our analysis to a simple model problem, but our reasoning is valid also in more general cases, including contact problems of 2D and 3D elasticity, provided that the conditions exploited in the proof of the results by Farhat, Mandel and Roux [25] are satisfied. Let Ω = Ω1 ∪ Ω2 , Ω1 = (0, 1) × (0, 1) and Ω2 = (1, 2) × (0, 1) denote open domains with boundaries Γ1 , Γ2 and their parts Γiu , Γif , Γic formed by the sides of Ωi , i = 1, 2 as in Figure 1a or Figure 1b. Let H 1 (Ωi ), i = 1, 2 denote the Sobolev space of the first order in the space L2 (Ωi ) of the functions on Ωi whose squares are integrable in the sense of Lebesgue. Let © ª V i = v i ∈ H 1 (Ωi ) : v i = 0 on Γiu denote the closed subspaces of H 1 (Ωi ), i = 1, 2, and let © ª V =V1×V2 and K = (v 1 , v 2 ) ∈ V : v 2 − v 1 ≥ 0 on Γc denote the closed subspace and the closed convex subset of H = H 1 (Ω1 ) × H 1 (Ω2 ), respectively. The relations on the boundaries are in terms of traces. On H we shall

3

define a symmetric bilinear form ¶ 2 Z µ X ∂ui ∂v i ∂ui ∂v i a(u, v) = + dΩ ∂x ∂x ∂y ∂y i Ω i=1 and a linear form `(v) =

2 Z X Ωi

i=1

where f i ∈ L2 (Ωi ),   −1 0 f (x, y) =  −3

f i v i dΩ,

i = 1, 2 are the restrictions of for (x, y) ∈ (0, 1) × [0.75, 1) for (x, y) ∈ (0, 1) × [0, 0.75) and (x, y) ∈ (1, 2) × [0.25, 1) for (x, y) ∈ (1, 2) × [0, 0.25)

for coercive problem and   −3 for (x, y) ∈ (0, 1) × [0.75, 1) 0 for (x, y) ∈ (0, 1) × [0, 0.75) and (x, y) ∈ (1, 2) × [0.25, 1) f (x, y) =  −1 for (x, y) ∈ (1, 2) × [0, 0.25) for semicoercive problem. Thus we can define a problem to find 1 min q(u) = a(u, u) − `(u) subject to u ∈ K. 2 −1

f

-3

−3





0.75

0.25

0.25

0.75

Ω1 1

1

Γu

1

Γc

2

1

2

Γu

Γu

Fig. 1a: Coercive model problem

0.75

Ω1 1

Γf

2

0.25

1

Γf

-1

2

0.25 0.75

f

(2.1)

1

1

Γf

Γc

2

Γf

Fig. 1b: Semicoercive model problem

Figure 1: Model problems We shall consider two variants of the Dirichlet data. In the first case, both the membranes are fixed on the outer edges as in Figure 1a, so that Γ1u = {(0, y) ∈ IR2 : y ∈ [0, 1]}, Γ2u = {(2, y) ∈ IR2 : y ∈ [0, 1]}. 4

Since the Dirichlet conditions are prescribed on parts Γiu , i = 1, 2 of the boundaries of the both membranes with positive measure, the quadratic form a is coercive which guarantees both existence and uniqueness of the solution [28, 27]. In the second case, only the left membrane is fixed on the outer edge and the right membrane has no prescribed displacement as in Figure 1b, so that Γ1u = {(0, y) ∈ IR2 : y ∈ [0, 1]}, Γ2u = ∅. Even though a is in this case only semidefinite, the form q is still coercive due to the choice of f so that it has again the unique solution [28, 27]. More details about this particular model problem may be found in [12]. The solution of the model problem may be interpreted as the displacement of two membranes under the traction f . The left edge of the right membrane is not allowed to penetrate below the right edge of the left membrane.

3

Domain decomposition and discretization

In our definition of the problem, we have so far used only the natural decomposition of the spatial domain Ω into Ω1 and Ω2 . However, to enable efficient application of the domain decomposition methods, we can optionally decompose each Ωi into subdomains Ωi1 , . . . , Ωip , p > 1 as in Figure 2. The continuity in Ω1 and Ω2 of the global solution assembled from the local solutions uij will be enforced by the ”gluing” conditions uij (x) = uik (x) that should be satisfied for any x on the interface Γij,ik of Ωij and Ωik . After modifying appropriately the definition of problem (2.1), introducing regular grids in the subdomains Ωij that match across the interfaces Γij,kl , indexing contiguously the nodes and entries of corresponding vectors in the subdomains, and using the Lagrangian finite element discretization, we get the discretized version of problem (2.1) with auxiliary domain decomposition that reads 1 min u> Ku − f > u s.t. BI u ≤ 0 and BE u = 0. 2

(3.1)

In (3.1), K denotes a block diagonal positive semidefinite stiffness matrix, the full rank matrices BI and BE describe the discretized non-penetration and gluing conditions, respectively, and f represents the discrete analog of the linear term `(u). The rows of BE and BI are filled with zeros except 1 and -1 in positions that correspond to the nodes with the same coordinates on the artificial or contact boundaries, respectively. In particular, if bi denotes a row of BE or BI , then bi will not have more than four nonzero entries, and for any displacement vector u, bi u will denote the difference or jump between the displacements on each side of the boundary. Some more details may be found in [12]. Our next step is to simplify the problem, in particular to replace the general inequality constraints BI u ≤ 0 by the nonnegativity constraints using the duality 5

1,2

H

1,4









1,1

h

2,2

1,3

λE

2,4





2,1

2,3

Ω λI

λE

Ω λE

Figure 2: Domain decomposition and discretization theory. To this end, let us introduce the Lagrangian associated with problem (3.1) by 1 > L(u, λI , λE ) = u> Ku − f > u + λ> (3.2) I BI u + λE BE u, 2 where λI and λE are the Lagrange multipliers associated with inequalities and equalities, respectively. Introducing the notation · ¸ · ¸ λI BI λ= and B = , λE BE we can observe that B is a full rank matrix and write the Lagrangian briefly as 1 L(u, λ) = u> Ku − f > u + λ> Bu. 2 It is well known [4] that (3.1) is equivalent to the saddle point problem Find (u, λ) so that L(u, λ) = sup inf L(u, λ). λI ≥0 u

(3.3)

For fixed λ, the Lagrange function L(·, λ) is convex in the first variable and the minimizer u of L(·, λ) satisfies Ku − f + B > λ = 0.

(3.4)

Equation (3.4) has a solution iff f − B > λ ∈ ImK,

(3.5)

which can be expressed more conveniently by means of a matrix R whose columns span the null space of K as R> (f − B > λ) = 0. (3.6) 6

The matrix R may be formed directly so that each floating subdomain is assigned to a column of R with ones in positions of the nodal variables that belong to the subdomain and zeros elsewhere. It may be checked that R> B > is a full rank matrix. The matrix R may also be extracted from K [23]. Now assume that λ satisfies (3.5) and denote by K † any matrix that satisfies KK † K = K.

(3.7)

Let us note that a generalized inverse K † that satisfies (3.7) may be evaluated at the cost comparable with the Cholesky decomposition of regularized K [23]. It may be verified directly that if u solves (3.4), then there is a vector α such that u = K † (f − B > λ) + Rα.

(3.8)

After substituting expression (3.8) into problem (3.3) and changing signs, we shall get the minimization problem to find min Θ(λ) s.t. λI ≥ 0 and R> (f − B > λ) = 0

(3.9)

where

1 (3.10) Θ(λ) = λ> BK † B > λ − λ> BK † f. 2 Once the solution λ of (3.9) is known, the vector u that solves (3.1) can be evaluated by (3.8) and the formula [12] e > BR) e −1 R> B e > BK e † (f − B > λ), α = −(R> B

(3.11)

e = [B e > , B > ]> , and the matrix B eI is formed by the rows bi of BI that where B I E correspond to the positive components of the solution λ characterized by λi > 0.

4

Natural coarse grid

Even though the problem (3.9) is much more suitable for computations than (3.1) and was used to efficient solving of the discretized variational inequalities [11], further improvement may be achieved by adapting some simple observations and the results of Farhat, Mandel and Roux [25]. Let us denote de = BK † f, ee = R> f

F = BK † B > , e = R> B > , G

e and let T denote a regular matrix that defines orthonormalization of the rows of G so that the matrix e G = TG 7

has orthonormal rows. After denoting e = T ee, problem (3.9) reads min

1 > λ F λ − λ> de s.t λI ≥ 0 and Gλ = e. 2

(4.1)

Next we shall transform the problem of minimization on the subset of the affine space to that on the subset of the vector space by looking for the solution of (4.1) in e where Gλ e = e. The following Lemma shows that we can even the form λ = µ + λ, e such that λ eI = 0. find λ Lemma 4.1. Let B be such that the negative entries of BI are in the columns eI ≥ 0 that correspond to the nodes in the floating subdomain Ω2 . Then there is λ e = ee. such that Gλ Proof: See [15] (coercive problem) or [16] (semicoercive problem). ¤ e so that To carry out the transformation, denote λ = µ + λ, 1 > 1 e + λ F λ − λ> de = µ> F µ − µ> (de − F λ) 2 2

1 e> e e> e λ Fλ − λ d 2

and the problem (4.1) is, after returning to the old notation, equivalent to min

1 > eI λ F λ − λ> d s.t Gλ = 0 and λI ≥ −λ 2

(4.2)

e and λ eI ≥ 0. with d = de − F λ Our final step is based on observation that the problem (4.2) is equivalent to min

1 > eI , λ (P F P + ρQ)λ − λ> P d s.t Gλ = 0 and λI ≥ −λ 2

(4.3)

where ρ is arbitrary positive constant and Q = G> G

and

P =I −Q

denote the orthogonal projectors on the image space of G> and on the kernel of G, respectively. The regularization term is introduced in order to simplify the reference to the results of quadratic programming that assume regularity of the Hessian matrix of the quadratic form. The problem (4.3) turns out to be a suitable starting point for development of an efficient algorithm for variational inequalities due to the classical estimates of the extreme eigenvalues. To formulate them, we shall denote by αmin (A) 8

and αmax (A) the smallest and the largest eigenvalue of a given symmetric matrix A, respectively. Theorem 4.2. There are constants C1 > 0 and C2 > 0 independent of the discretization parameter h and the decomposition parameter H such that αmin (P F P |ImP ) ≥ C1 and αmax (P F P |ImP ) ≤ ||P F P || ≤ C2 Proof: See Theorem 3.2 of Farhat, Mandel and Roux [25].

H . h

¤

Note: The statement of Theorem 3.2 of Farhat, Mandel and Roux [25] gives only an upper bound on the spectral condition number κ(P F P |ImP ). However, the reasoning that precedes and substantiates their estimate proves both bounds of (4).

5

Optimal solvers to bound and equality constrained problems

We shall now briefly review our in a sense optimal algorithms for the solution of the bound and equality constrained problem (4.3). They combine our semimonotonic augmented Lagrangian method [8] which generates approximations for Lagrange multipliers for the equality constraints in the outer loop with the working set algorithm for bound constrained auxiliary problems in the inner loop [21]. If a new Lagrange multiplier vector µ is used for the equality constraints, the augmented Lagrangian for problem (4.3) can be written as L(λ, µ, ρ) =

1 > λ (P F P + ρQ)λ − λ> P d + µ> Gλ + ρλ> Qλ. 2

The gradient of L(λ, µ, ρ) is given by g(λ, µ, ρ) = (P F P + ρQ)λ − P d + GT (µ + ρGλ). e The Let I denote the set of the indices of the bound constrained entries of λ ≥ −λ. P P projected gradient g = g (λ, µ, ρ) of L at λ is given componentwise by ( ei or i ∈ gi for λi > −λ /I giP = − e gi for λi = −λi and i ∈ I where gi− = min{gi , 0}. Our algorithm is a variant of that proposed by Conn, Gould and Toint [6] for identifying stationary points of more general problems. Its modification by Dost´al, Friedlander and Santos [10] was used by Dost´al and Hor´ak to 9

develop a scalable FETI based algorithm, as shown experimentally in [14]. The key to proving optimality results is to combine the adaptive precision control of auxiliary problems in Step 1 with the new update rule for the penalty parameter ρ in Step 4. All the necessary parameters are listed in Step 0, and typical values of these parameters for our model problem are given in brackets. Algorithm 5.1. Semi-monotonic augmented Lagrangian method for bound and equality constrained problems (SMALBE). Step 0. {Initialization of parameters} Given η > 0 [η = kP dk], β > 1 [β = 10], M > 0 [M = 1], ρ0 > 0 [ρ0 = 100], and µ0 [µ0 = 0] , set k = 0. Step 1. {Inner iteration with adaptive precision control.} eI Find λk such that λkI ≥ −λ P k k ||g (λ , µ , ρk )|| ≤ min{M kGλk k, η}. Step 2. {Stopping criterion.} If ||g P (λk , µk , ρk )|| and ||Gλk || are sufficiently small, then λk is the solution. end if. Step 3. {Update of the Lagrange multipliers.} µk+1 = µk + ρk Gλk Step 4. {Update the penalty parameter.} If k > 0 and L(λk , µk , ρk ) < L(λk−1 , µk−1 , ρk−1 ) + ρk kGλk k2 /2 then ρk+1 = βρk else ρk+1 = ρk end if. Step 5. Increase k and return to Step 1. Step 1 may be implemented by any algorithm for minimization of the augmented fI which guarantees convergence Lagrangian L with respect to λ subject to λI ≥ −λ of the projected gradient to zero. More about the properties and implementation of SMALBE algorithm may be found in [8]. The unique feature of the SMALBE algorithm is its capability to find an approximate solution of problem (4.3) in a number of steps which is uniformly bounded in terms of the bounds on the spectrum of P F P + ρQ [8]. To get bound on the number of matrix multiplication, it is necessary to have algorithm which can solve the problem eI minimize L(λ, µ, ρ) subject to λI ≥ −λ (5.1) with the rate of convergence in terms of the bounds on the spectrum of the Hessian matrix of L. To describe such algorithm, let us recall that the unique solution λ = λ(µ, ρ) of

10

(5.1) satisfies the Karush-Kuhn-Tucker conditions g P (λ, µ, ρ) = 0.

(5.2)

Let A(λ) and F(λ) denote the active set and free set of indices of λ, respectively, i.e., A(λ) = {i ∈ I : λi = −λei }

and F(λ) = {i : λi > −λei or i ∈ / I}.

To enable an alternative reference to the KKT conditions [4], let us define the free gradient ϕ(λ) and the chopped gradient β(λ) by ½ ½ gi (λ) for i ∈ F(λ) 0 for i ∈ F(λ) ϕi (λ) = and βi (λ) = − 0 for i ∈ A(λ) gi (λ) for i ∈ A(λ) so that the KKT conditions are satisfied if and only if the projected gradient g P (λ) = ϕ(λ) + β(λ) is equal to zero. We call λ feasible if λi ≥ −λei for i ∈ I. The projector P to the set of feasible vectors is defined for any λ by P (λ)i = max{λi , −λei } for i ∈ I,

P (λ)i = λi for i ∈ / I.

Let A denote the Hessian of L with respect to λ. The expansion step is defined by ¡ ¢ λk+1 = P λk − αϕ(λk ) (5.3) with the steplength α ∈ (0, kAk−1 ]. This step may expand the current active set. To describe it without P , let ϕ(λ) e be the reduced free gradient for any feasible λ, with entries ϕ ei = ϕ ei (λ) = min{λi /α, ϕi } for i ∈ I, ϕ ei = ϕi for i ∈ E such that If the inequality

P (λ − αϕ(λ)) = λ − αϕ(λ). e

(5.4)

||β(λk )||2 ≤ Γ2 ϕ(λ e k )> ϕ(λk )

(5.5)

k

holds, then we call the iterate λ strictly proportional. The test (5.5) is used to decide which component of the projected gradient g P (λk ) will be reduced in the next step. The proportioning step is defined by λk+1 = λk − αcg β(λk ). The steplength αcg is chosen to minimize L(λk − αβ(λk ), µk , ρk ) with respect to α, i.e., β(λk )> g(λk ) αcg = . β(λk )> Aβ(λk ) 11

The purpose of the proportioning step is to remove indexes from the active set. The conjugate gradient step is defined by λk+1 = λk − αcg pk

(5.6)

where pk is the conjugate gradient direction [2] which is constructed recurrently. The recurrence starts (or restarts) with ps = ϕ(λs ) whenever λs is generated by the expansion step or the proportioning step. If pk is known, then pk+1 is given by the formulae [2] ϕ(λk )> Apk . (5.7) pk+1 = ϕ(λk ) − γpk , γ = (pk )> Apk The conjugate gradient steps are used to carry out the minimization in the face WJ = {λ : λi = 0 for i ∈ J } given by J = A(λs ) efficiently. The algorithm that we use may now be described as follows. Algorithm 5.2. Modified proportioning with reduced gradient projections (MPRGP). ei for i ∈ I, α ∈ (0, kAk−1 ], and Γ > 0 be Let λ0 be an n-vector such that λi ≥ −λ given. For k ≥ 0 and λk known, choose λk+1 by the following rules: Step 1. If g P (λk ) = 0, set λk+1 = λk . Step 2. If λk is strictly proportional and g P (λk ) 6= 0, try to generate λk+1 by the conjugate gradient step. If λk+1 ≥ 0 for i ∈ I, then accept it, else generate λk+1 by i the expansion step. Step 3. If λk is not strictly proportional, define λk+1 by proportioning. The MPRGP algorithm has linear rate of convergence in terms of the spectral condition number of the Hessian A of L [21]. More about the properties and implementation of SMALBE algorithm may be found in [21] and [9].

6

Optimality

To show that Algorithm 5.1 with the inner loop implemented by Algorithm 5.2 is optimal for the solution of problem (or a class of problems) (4.3), we shall introduce new notation that complies with that used in [9]. We shall use T = {(H, h) ∈ IR2 : H ≤ 1, 2h ≤ H and H/h ∈ IN } as the set of indices. Given a constant C ≥ 2, we shall define a subset TC of T by TC = {(H, h) ∈ IR2 : H ≤ 1, 2h ≤ H, H/h ∈ IN and H/h ≤ C}.

12

For any t ∈ T , we shall define At = P F P + ρQ, bt = P d eI and `t,E = −∞ Ct = G, `t,I = −λ by the vectors and matrices generated with the discretization and decomposition parameters H and h, respectively, so that the problem (4.3) is equivalent to the problem minimize Θt (λt ) s.t. Ct λt = 0 and λt ≥ `t (6.1) > with Θt (λ) = 12 λ> At λ − b> t λ. Using these definitions, Lemma 4.1 and GG = I, we obtain kCt k ≤ 1 and k`+ (6.2) t k = 0,

where for any vector v with the entries vi , v + denotes the vector with the entries vi+ = max{vi , 0}. Moreover, it follows by Theorem 4.2 that for any C ≥ 2 there are C constants aC max > amin > 0 such that C aC min ≤ αmin (At ) ≤ αmax (At ) ≤ amax

(6.3)

for any t ∈ TC . Moreover, there are positive constants C1 and C2 such that aC min ≥ C1 and aC ≤ C C. In particular, it follows that the assumptions of Theorem 5 (i.e. 2 max the inequalities (6.2) and (6.3)) of [9] are satisfied for any set of indices TC , C ≥ 2 and we have the following result: Theorem 6.1 Let C ≥ 2 denote a given constant, let {λkt }, {µkt } and {ρt,k } be generated by Algorithm 5.1 (SMALBE) for (6.1) with kbt k ≥ ηt > 0, β > 1, M > 0, ρt,0 = ρ0 > 0, and µ0t = 0. Let s ≥ 0 denote the smallest integer such that β s ρ0 ≥ M 2 /amin and assume that Step 1 of Algorithm 5.1 is implemented by means of Algorithm 5.2 (MPRGP) with parameters Γ > 0 and α ∈ (0, (amax + β s ρ0 )−1 ], so k,1 k,l k that it generates the iterates λk,0 t , λt , . . . , λt = λt for the solution of (6.1) starting k,0 k−1 −1 from λt = λt with λt = 0, where l = lt,k is the first index satisfying

or

k,l k kg P (λk,l t , µt , ρt,k )k ≤ M kCt λt k

(6.4)

k −1 kg P (λk,l }. t , µt , ρt,k )k ≤ ²kbt k min{1, M

(6.5)

Then for any t ∈ TC and problem (6.1), Algorithm 5.1 generates an approximate solution λkt t which satisfies M −1 kg P (λkt t , µkt t , ρt,kt )k ≤ kCt λkt t k ≤ ²kbt k

(6.6)

at O(1) matrix-vector multiplications by the Hessian of the augmented Lagrangian Lt for (6.1) and ρt,k ≤ β s ρ0 . 13

7

Numerical experiments

In this section we report some results of numerical solution of the semicoercive model problem of Section 2 in order to illustrate the performance of the algorithm, in particular its numerical and parallel scalability. To this end, we have implemented Algorithm 5.1 with the solution of auxiliary bound constraints by Algorithm 5.2 in C exploiting PETSc [3] to solve problem (4.3) with varying decomposition and discretization parameters.

0

0

−0.1

−0.2 −0.2

−0.4 −0.3

−0.6 −0.4

−0.8 1

−0.5

−1 1

−0.6

−1.2

0.8

0.5

0.6 −0.7 0

0.4 0.2

0.4

0.6

0.8

1

−1.4

0.2 1.2

1.4

1.6

1.8

2

0

0

Fig. 3a: Coercive problem

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

0 2

Fig. 3b: Semicoercive problem

Figure 3: Solution of model problems The experiments were run on the Lomond 18-processor Sun HPC 6500 Ultra SPARC-II based SMP system with 400 MHz, 18 GB of shared memory, 90 GB disc space, nominal peak performance 14.4 GFlops, 16 kB level 1 and 8 MB level 2 cache in EPCC Edinburgh, and on the Turing Cray T3E 1200, 788 applications processors, each 1.2 GFlops with 256 MB, 209 GB memory, 28 command processors, 2TB disk space, high-speed network with low latency in the University of Manchester. All the computations were carried out with parameters e 1 Bf }, ² = 10−4 . M = 1, ρ0 = 10, Γ = 1, λ0 = max{−λ, 2 The results of computations are summarized in Tables 1 – 3. Table 1 illustrates numerical scalability of Algorithm 5.1. In particular, for varying decompositions and discretization parameters, the upper row of each field of the table gives the corresponding primal dimension/ dual dimension/ times in seconds, while the number in the lower row gives a number of the conjugate gradient iterations that were necessary for the solution of the problem to the given precision. We can see that the number of the conjugate gradient iterations for a given ratio H/h (in rows) varies very moderately. 14

Table 1: Performance for varying decomposition and discretization H H/h \ procs

1 2

1/2 8

1/4 16

1/8 16

128

33282/129/41.95

133128/1287/89.50

532512/6687/74.9

2130048/29823/421.5

28

59

36

47

8450/65/2.04

33800/647/4.14

135200/3359/7.10

540800/14975/53.48

22

47

33

43

2178/33/0.20

8712/327/0.50

34848/1695/1.48

139392/7551/11.66

17

33

30

37

578/17/0.04

2312/167/0.18

9248/863/0.68

36992/3839/4.30

13

29

26

32

8

162/9/0.03

648/87/0.10

2592/447/0.39

10365/1983/2.06

10

20

23

27

4

50/5/0.01

200/47/0.04

800/239/0.28

3200/1055/1.30

7

19

22

25

64 32 16

Table 2 indicates that the algorithm presented enjoys high parallel scalability. The results for the largest problems are in Table 3. Table 2: Parallel scalability for 128 subdomains processors time[sec]

8

1 2907.13

2 1022.03

4 462.4

8 165.8

16 32 68.06 51.40

64 85.47

128 232.00

Comments and conclusion

We have presented the scalability results related to application of the augmented Lagrangians with the FETI based domain decomposition method using the natural coarse grid to the solution of variational inequalities by recently developed algorithms for the solution of special QP problems. In particular, we have shown that the solution of the discretized elliptic variational inequality to a prescribed precision may be found in a number of matrix vector multiplications bounded independently of the discretization parameter provided the ration of the decomposition and the discretization parameters is kept bounded. Numerical experiments with the model variational inequality are in agreement with the theory and indicate that the algorithm may 15

Table 3: Highlights h 1/1024 1/2048

H

prim. dim. 1/8 2130048 1/8 8454272

dual. num. of dim. subdom. 29823 128 59519 128

procs 32 of Turing 16 of Lomond

out. iter. 2 2

cg. time iter. [sec] 47 193.6 65 3959.0

be efficient. The results remain valid also for the solution of frictionless 2D and 3D contact problems of elasticity and may be adapted to the solution of problems with Coulomb friction as indicated in [17]. The solution of auxiliary linear problems in the inner loop may be improved by standard preconditioners [29, 35, 38] and may be adapted to the mortar discretization [39]. We shall discuss these topics elsewhere.

References [1] P. Avery, G. Rebel, M. Lesoinne and C. Farhat, A umerically scalable dualprimal substructuring method for the solution of contact problems. I: the frictionless case. Comput. Methods Appl. Mech. Eng. 193, 23–26, (2004) 2403–2426. [2] O. Axelsson, Iterative Solution Methods. Cambridge University Press, Cambridge, 1994. [3] S. Balay , W. Gropp , L.C. McInnes and B. Smith, PETSc 2.0 Users Manual. Argonne National Laboratory, http://www.mcs.anl.gov/petsc/. [4] D. P. Bertsekas, Nonlinear Optimization. Athena Scientific, Belmont 1999. [5] J.H. Bramble, J. E. Pasciak, A. H. Schatz, The construction of preconditioners for elliptic problems by substructuring I, Mathematics of Comput. 47, 103-134 (1986). [6] A. R. Conn, N. I. M. Gould and Ph. L. Toint, A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM Journal on Numerical Analysis 28 (1991) 545–572. [7] Z. Dost´al, A proportioning based algorithm for bound constrained quadratic programming with the rate of convergence. Numerical Algorithms 34, 2–4(2003) 293–302. [8] Z. Dost´al Inexact semi-monotonic augmented Lagrangians with optimal feasibility convergence for quadratic programming with simple bounds and equality constraints. SIAM J. Numerical Analysis 43,1 (2005) 96–115. 16

[9] Z. Dost´al, An optimal algorithm for bound and equality constrained quadratic programming problems with bounded spectrum, submitted. [10] Z. Dost´al, A. Friedlander, S. A. Santos, Augmented Lagrangians with adaptive precision control for quadratic programming with simple bounds and equality constraints. SIAM Journal on Optimization 13,4(2003)1120–1140. [11] Z. Dost´al, A. Friedlander, S. A. Santos, Solution of contact problems of elasticity by FETI domain decomposition. Contemporary Mathematics 218 (1998) 82–93. [12] Z. Dost´al, F. A. M. Gomes, S. A. Santos, Duality based domain decomposition with natural coarse space for variational inequalities. Journal of Computational and Applied Mathematics 126, 1–2 (2000) 397–415. [13] Z. Dost´al, F. A. M. Gomes, S. A. Santos, Solution of contact problems by FETI domain decomposition with natural coarse space projection. Computer Methods in Applied Mechanics and Engineering 190, 13–14 (2000) 1611–1627. [14] Z. Dost´al Z and D. Hor´ak, Scalability and FETI based algorithm for large discretized variational inequalities. Mathematics and Computers in Simulation 61, (3–6) (2003) 347–357. [15] Z. Dost´al and D. Hor´ak, Scalable FETI with Optimal Dual Penalty for a Variational Inequality, Numerical Linear Algebra and Applications 11, 5–6 (2004) 455 – 472. [16] Z. Dost´al and D. Hor´ak, Scalable FETI with Optimal Dual Penalty for Semicoercive Variational Inequalities, Contemporary Mathematics 329 (2003)79-88. [17] Z. Dost´al, D. Hor´ak, R. Kuˇcera, V. Vondr´ak, J. Haslinger, J. Dobi´aˇs ans S. Pt´ak, FETI based algorithms for contact problems: scalability, large displacements and 3D Coulomb friction, Computer Methods in Applied Mechanics and Engineering 194, 2-5 (2005) 395-409. [18] Z. Dost´al, D. Hor´ak and D. Stefanica, A scalable FETI-DP algorithm for coercive variational inequalities. IMACS Journal Applied Numerical Mathematics 54,3–4 (2005) 378–390. [19] Z. Dost´al, D. Hor´ak and D. Stefanica, A Scalable FETI–DP Algorithm with Non–penetration Mortar Conditions on Contact Interface, submitted. [20] Z. Dost´al, D. Hor´ak and D. Stefanica, A Scalable FETI–DP Algorithm for Semicoercive Variational Inequalities, submitted. [21] Z. Dost´al Z and J. Sch¨oberl, Minimizing quadratic functions over non-negative cone with the rate of convergence and finite termination. Comput. Optimiz. and Applications 30,1 (2005)23–44.

17

[22] Dureisseix D and C. Farhat, A numerically scalable domain decomposition method for solution of frictionless contact problems. International Journal for Numerical Methods in Engineering 50, 12 (2001) 2643–2666. [23] C. Farhat and M. G´erardin, On the general solution by a direct method of a large scale singular system of linear equations: application to the analysis of floating structures. International Journal for Numerical Methods in Engineering 41,4 (1998) 675–696. [24] C. Farhat, M. Lesoinne, P. LeTallec, K. Piersonand D. Rixen, FETI-DP: A dual–prime unified FETI method. I: A faster alternative to the two–level FETI method. Int. J. Numer. Methods Eng. 50, No.7 (2001)1523–1544. [25] C. Farhat, J. Mandel and F.-X.Roux, Optimal convergence properties of the FETI domain decomposition method. Computer Methods in Applied Mechanics and Engineering 115, (1994) 365-385. [26] C. Farhat and F.-X. Roux, An unconventional domain decomposition method for an efficient parallel solution of large-scale finite element systems. SIAM Journal on Scientific Computing 13 (1992) 379-396. [27] R. Glowinski, Variational Inequalities, Springer Verlag, Berlin 1980. [28] I. Hlav´aˇcek, J. Haslinger, J. Neˇcas and J. Lov´ıˇsek, Solution of Variational Inequalities in Mechanics. Springer Verlag: Berlin, 1988. [29] A. Klawonn and O. B. Widlund FETI and Neumann-Neumann iterative substructuring mathods: connections and New results. Communications on Pure and Applied Mathematics Vol. LIV (2001) 57-90. [30] Kornhuber R. Adaptive monotone multigrid methods for nonlinear variational problems. Teubner-Verlag: Stuttgart, 1997. [31] Kornhuber R., Krause R. Adaptive multigrid methods for Signorini’s problem in linear elasticity. Computer Visualization in Science 4, 1, (2001) 9–20. [32] R. Krause and O. Sander, Fast solving of contact problems on complicated geometries. In Kornhuber, Ralf (ed.) et al., Domain decomposition methods in science and engineering. Selected papers of the 15th international conference on domain decomposition, Berlin, Germany, July 21-25, 2003. Berlin: Springer. Lecture Notes in Computational Science and Engineering 40, (2005) 495–502. [33] R. H. Krause, B. I. Wohlmuth, A Dirichlet-Neumann type algorithm for contact problems with friction. Comput. Vis. Sci. 5, 3 (2002)139–148. ´ [34] Mandel J. Etude alg´ebrique d’une m´ethode multigrille pour quelques probl`emes de fronti`ere libre (French). Comptes Rendus de l’Academie des Sciences Sr. I 298 (1984) 469–472. 18

[35] Mandel J, Tezaur R. Convergence of substructuring method with Lagrange multipliers. Numerische Mathematik 73 (1996) 473–487. [36] J. Sch¨oberl, Solving the Signorini problem on the basis of domain decomposition techniques. Computing 60,4 (1998) 323–344. [37] J. Sch¨oberl, Efficient contact solvers based on domain decomposition techniques. Computers&Mathematics 42 (1998) 1217–1228. [38] A. Toselli and O. B. Widlund, Domain Decomposition Methods–Algorithms and Theory, Springer Series on Computational Mathematics 34, Springer-Verlag, Berlin 2005. [39] Wohlmuth B. I., Discretization Methods and Iterative Solvers Based on Domain Decomposition, Springer , Berlin 2001. [40] B. I. Wohlmuth and R. Krause, Monotone methods on nonmatching grids for nonlinear contact problems. SIAM J. Sci. Comput. 25, No.1, 324-347 (2003).

19