Comput Optim Appl (2012) 53:425–452 DOI 10.1007/s10589-012-9467-x

Active-set Newton methods for mathematical programs with vanishing constraints

A.F. Izmailov · A.L. Pogosyan

Received: 14 February 2011 / Published online: 7 March 2012 © Springer Science+Business Media, LLC 2012

Abstract Mathematical programs with vanishing constraints constitute a new class of difficult optimization problems with important applications in optimal topology design of mechanical structures. Vanishing constraints usually violate standard constraint qualifications, which gives rise to serious difficulties in the theoretical and numerical treatment of these problems. In this work, we suggest several globalization strategies for the active-set Newton-type methods developed earlier by the authors for this problem class, preserving the superlinear convergence rate of these methods under weak assumptions. Preliminary numerical results demonstrate that our approach is rather promising and competitive with respect to the existing alternatives.

Keywords Mathematical program with vanishing constraints · Constraint qualification · Optimality condition · Active-set method · Sequential quadratic programming · Semismooth Newton method · Global convergence

1 Introduction

We consider the mathematical program with vanishing constraints (MPVC)

    minimize f(x)
    subject to h(x) = 0,  g(x) ≤ 0,
               H_i(x) ≥ 0,  G_i(x)H_i(x) ≤ 0,  i = 1, ..., s,   (1.1)

This research is supported by the Russian Foundation for Basic Research Grant 10-01-00251.

A.F. Izmailov · A.L. Pogosyan (✉)
VMK Faculty, OR Department, Moscow State University, MSU, Uchebniy Korpus 2, Leninskiye Gory, 119991 Moscow, Russia
e-mail: [email protected]

A.F. Izmailov
e-mail: [email protected]


where f : R^n → R is a smooth function, h : R^n → R^l, g : R^n → R^m and G, H : R^n → R^s are smooth mappings. The name of this problem class, originally introduced in [1], results from the following fact: if for some index i ∈ {1, ..., s} and some x ∈ R^n it holds that H_i(x) = 0, then the corresponding last constraint in (1.1) holds automatically, i.e., "vanishes". Alternatively, if H_i(x) > 0, then the last constraint takes the form of the inequality G_i(x) ≤ 0. MPVC appears to be a convenient modelling framework for problems of optimal topology design of truss structures; see [1, 2] for these applications. However, vanishing constraints usually violate standard constraint qualifications, which gives rise to serious difficulties in the theoretical and numerical treatment of these problems (to be discussed in Sect. 2).

In the present work we develop globalization strategies for the active-set Newton-type methods suggested for MPVC in [18, 19], preserving the superlinear convergence rate of these methods under weak assumptions. In [18] it was pointed out that the active-set methods are more appropriate for globalization than, e.g., the piecewise sequential quadratic programming algorithm (also developed in [18]). The globalization strategies discussed below are of a hybrid nature: the active-set method is combined with some outer-phase algorithm with good global convergence properties, which is otherwise rather arbitrary and, in particular, unrelated to the active-set local phase. Similar constructions were used in [20] for a different class of difficult optimization problems, namely, for mathematical programs with complementarity constraints (MPCC); see [23, 25] for detailed studies and applications of this problem class. We will also employ some ideas from the global algorithm in [4], where, however, they were used in a completely different context.

The rest of the paper is organized as follows. In Sect. 2, we recall some needed notions and facts from MPVC theory. In Sect. 3, we discuss the main idea of the active-set methods for MPVC and the corresponding local convergence results, and we further improve some estimates obtained in [18, 19]. The main Sect. 4 is concerned with globalization of convergence of the active-set methods. Numerical results are presented in Sect. 5.

Our notation is fairly standard. For u, v ∈ R^q, by ⟨u, v⟩ we denote the Euclidean inner product of these vectors, and ‖·‖ stands for the associated norm. For a set U ⊂ R^q and for u ∈ R^q, we use the notation dist(u, U) = inf_{v∈U} ‖v − u‖. If u ∈ R^q and I ⊂ {1, ..., q}, then u_I stands for the subvector of u with components u_i, i ∈ I. By ker A we mean the kernel (null space) of an r × q-matrix A, i.e., ker A = {u ∈ R^q | Au = 0}. For a function f : R^q → R, by f'(u) we denote its gradient at u ∈ R^q. Similarly, for a mapping F : R^q → R^r, by F'(u) we denote the Jacobian of F at u, i.e., the matrix whose rows are the gradients of the components of F at u. Finally, we say that F is locally Lipschitz-continuous with respect to the point ū ∈ R^q if there exists ℓ > 0 such that ‖F(u) − F(ū)‖ ≤ ℓ‖u − ū‖ for all u ∈ R^q close enough to ū.
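To see the vanishing-constraint mechanism and the resulting degeneracy in the simplest possible setting, consider the following toy instance (our illustration; it is not one of the test problems of Sect. 5): n = 2, l = m = 0, s = 1, and

    minimize (x_1 − 1)² + x_2²  subject to  H_1(x) = x_2 ≥ 0,  G_1(x)H_1(x) = x_1 x_2 ≤ 0.

Wherever x_2 > 0 the product constraint acts as x_1 ≤ 0, while on the half-line x_2 = 0 it vanishes and x_1 is unrestricted; the global solution is therefore x̄ = (1, 0). At x̄ both constraints are active, and their gradients (written for −H_1(x) ≤ 0 and G_1(x)H_1(x) ≤ 0) are (0, −1) and (x̄_2, x̄_1) = (0, 1), so no direction ξ satisfies ⟨(0, −1), ξ⟩ < 0 and ⟨(0, 1), ξ⟩ < 0 simultaneously: MFCQ is violated at the solution.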

2 Preliminaries

Let x̄ ∈ R^n be a feasible point of problem (1.1). Following [1], define the index sets


    I_g = I_g(x̄) = {i = 1, ..., m | g_i(x̄) = 0},
    I_+ = I_+(x̄) = {i = 1, ..., s | H_i(x̄) > 0},
    I_0 = I_0(x̄) = {i = 1, ..., s | H_i(x̄) = 0},

and further partition I_+ into the subsets

    I_{+0} = I_{+0}(x̄) = {i ∈ I_+ | G_i(x̄) = 0},   I_{+−} = I_{+−}(x̄) = {i ∈ I_+ | G_i(x̄) < 0},

and I_0 into the subsets

    I_{0+} = I_{0+}(x̄) = {i ∈ I_0 | G_i(x̄) > 0},   I_{00} = I_{00}(x̄) = {i ∈ I_0 | G_i(x̄) = 0},   I_{0−} = I_{0−}(x̄) = {i ∈ I_0 | G_i(x̄) < 0}.

The tightened nonlinear program (TNLP) associated with x̄ is

    minimize f(x) subject to h(x) = 0, g(x) ≤ 0, H_{I_0}(x) = 0, H_{I_+}(x) ≥ 0, G_{I_{+0}}(x) ≤ 0.   (2.2)

A feasible point x̄ of problem (1.1) is referred to as weakly stationary if it is a stationary point of TNLP (2.2), that is, if there exists μ = (μ^h, μ^g, μ^H, μ^G) ∈ R^l × R^m × R^s × R^s such that

    f'(x̄) + (h'(x̄))^T μ^h + (g'(x̄))^T μ^g − (H'(x̄))^T μ^H + (G'(x̄))^T μ^G = 0,   (2.3)
    μ^g_{I_g} ≥ 0,  μ^g_{{1,...,m}\I_g} = 0,  μ^H_{I_+} = 0,  μ^G_{I_{+0}} ≥ 0,  μ^G_{{1,...,s}\I_{+0}} = 0.   (2.4)

A weakly stationary point x̄ is referred to as strongly stationary if, in addition,

    μ^H_{I_{00}∪I_{0−}} ≥ 0.   (2.5)

A multiplier μ satisfying (2.3)–(2.5) is called an MPVC-multiplier associated with x̄, and the set of all such multipliers is denoted by M = M(x̄).

Let I stand for the set of all partitions (I_1, I_2) of the set I_{00}, and with each (I_1, I_2) ∈ I associate the branch problem

    minimize f(x) subject to h(x) = 0, g(x) ≤ 0, H_{I_{0+}∪I_2}(x) = 0, H_{I_{0−}∪I_1}(x) ≥ 0, G_{I_{+0}∪I_1}(x) ≤ 0.   (2.6)

Near x̄, the feasible set of (1.1) coincides with the union of the feasible sets of the branch problems (2.6) over all (I_1, I_2) ∈ I, and x̄ is referred to as B-stationary if it is a stationary point of every branch problem (2.6). Finally, we say that the piecewise MFCQ holds at x̄ if the usual Mangasarian–Fromovitz constraint qualification holds at x̄ for every branch problem (2.6), that is, if for every (I_1, I_2) ∈ I the gradients h'_j(x̄), j = 1, ..., l, H'_i(x̄), i ∈ I_{0+} ∪ I_2, are linearly independent, and there exists ξ̄ ∈ R^n such that

    h'(x̄)ξ̄ = 0,  H'_{I_{0+}∪I_2}(x̄)ξ̄ = 0,  g'_{I_g}(x̄)ξ̄ < 0,  H'_{I_{0−}∪I_1}(x̄)ξ̄ > 0,  G'_{I_{+0}∪I_1}(x̄)ξ̄ < 0.

We say that a weakly stationary point x̄ of MPVC (1.1) satisfies the MPVC-strict Mangasarian–Fromovitz constraint qualification (MPVC-SMFCQ) if the usual strict Mangasarian–Fromovitz constraint qualification holds at x̄ for TNLP (2.2), which means that the Lagrange multiplier μ̄ of TNLP (2.2) associated with x̄ is unique. An equivalent form of this condition in terms of the derivatives of the constraints can be found in [18].
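For a given feasible point, the index sets defined above are immediate to compute. The following is a small sketch (ours, not part of the original development), using exact comparisons and hence meant for analytic examples rather than floating-point data:

```python
def classify_indices(Gx, Hx):
    """Partition {0, ..., s-1} into the index sets I_+0, I_+-, I_0+, I_00, I_0-
    of Sect. 2, given the values G(x̄), H(x̄) at a feasible point x̄."""
    sets = {"I+0": [], "I+-": [], "I0+": [], "I00": [], "I0-": []}
    for i, (gi, hi) in enumerate(zip(Gx, Hx)):
        if hi > 0:
            # feasibility forces G_i(x̄) <= 0 whenever H_i(x̄) > 0
            sets["I+0" if gi == 0 else "I+-"].append(i)
        else:  # hi == 0 at a feasible point
            sets["I0+" if gi > 0 else ("I00" if gi == 0 else "I0-")].append(i)
    return sets
```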


The next theorem can be proved by an evident modification of the argument in [18, Theorem 1], where the "usual" constraints h(x) = 0 and g(x) ≤ 0 were dropped from the problem for simplicity.

Theorem 2.1 Let a function f and mappings g and G be differentiable at x̄ ∈ R^n, and let mappings h and H be differentiable near x̄, with their derivatives being continuous at x̄. Then the following assertions hold:
1. If x̄ is a local solution of problem (1.1) satisfying the piecewise MFCQ, then x̄ is a B-stationary point of problem (1.1), and hence, a weakly stationary point of this problem.
2. If a weakly stationary point x̄ of problem (1.1) satisfies MPVC-SMFCQ, and μ̄ is the unique Lagrange multiplier for TNLP (2.2) associated with x̄ (i.e., μ̄ = (μ̄^h, μ̄^g, μ̄^H, μ̄^G) satisfies (2.3), (2.4)), then x̄ is a strongly stationary point of problem (1.1), and μ̄ is the unique MPVC-multiplier associated with x̄ (and in particular, it additionally satisfies (2.5)), as well as the unique Lagrange multiplier for the branch problem (2.6) for any partition (I_1, I_2) ∈ I.

Define the usual Lagrangian of problem (1.1):

    L(x, λ) = f(x) + ⟨λ^h, h(x)⟩ + ⟨λ^g, g(x)⟩ − ⟨λ^H, H(x)⟩ + Σ_{i=1}^s λ^{GH}_i G_i(x)H_i(x),

where x ∈ R^n and λ = (λ^h, λ^g, λ^H, λ^{GH}) ∈ R^l × R^m × R^s × R^s.
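In code the Lagrangian is a one-liner; the following is a minimal sketch, assuming a hypothetical callable interface in which each of f, h, g, G, H returns the corresponding value at x as a NumPy array (or scalar for f):

```python
import numpy as np

def mpvc_lagrangian(x, lam, f, h, g, G, H):
    """Usual MPVC Lagrangian L(x, lam) of problem (1.1).

    lam = (lam_h, lam_g, lam_H, lam_GH); the interface is ours, for illustration."""
    lam_h, lam_g, lam_H, lam_GH = lam
    return (f(x) + lam_h @ h(x) + lam_g @ g(x)
            - lam_H @ H(x) + lam_GH @ (G(x) * H(x)))
```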

In [1, Remark 1] it was demonstrated that strong stationarity of a feasible point x̄ of MPVC (1.1) is in fact equivalent to the usual stationarity, that is, to the existence of a Lagrange multiplier λ = (λ^h, λ^g, λ^H, λ^{GH}) ∈ R^l × R^m × R^s × R^s satisfying the Karush–Kuhn–Tucker (KKT) conditions

    ∂L/∂x(x̄, λ) = 0,   (2.7)

    λ^g_{I_g} ≥ 0,  λ^g_{{1,...,m}\I_g} = 0,  λ^H_{I_0} ≥ 0,  λ^H_{I_+} = 0,  λ^{GH}_{I_{+0}∪I_0} ≥ 0,  λ^{GH}_{I_{+−}} = 0.   (2.8)

More precisely, the following fact is valid (see [21, Proposition 2.1]), where by Λ = Λ(x̄) we mean the set of Lagrange multipliers associated with x̄ (i.e., of λ = (λ^h, λ^g, λ^H, λ^{GH}) ∈ R^l × R^m × R^s × R^s satisfying (2.7), (2.8)).

Proposition 2.1 Let a function f and mappings h, g, H and G be differentiable at a feasible point x̄ of problem (1.1). Then x̄ is a stationary point of problem (1.1) if and only if it is a strongly stationary point of this problem. Moreover, for any fixed λ = (λ^h, λ^g, λ^H, λ^{GH}) ∈ Λ, and for μ = (μ^h, μ^g, μ^H, μ^G) ∈ R^l × R^m × R^s × R^s defined by the equalities

    μ^h = λ^h,   μ^g = λ^g,   (2.9)
    μ^H_i = λ^H_i = 0, i ∈ I_+,   μ^H_i = λ^H_i, i ∈ I_{00},
    μ^H_i = λ^H_i − λ^{GH}_i G_i(x̄), i ∈ I_{0+} ∪ I_{0−},   (2.10)
    μ^G_i = λ^{GH}_i H_i(x̄), i ∈ I_{+0},   μ^G_i = λ^{GH}_i H_i(x̄) = 0, i ∈ I_{+−} ∪ I_0,   (2.11)

it holds that μ ∈ M.
Conversely, for any μ = (μ^h, μ^g, μ^H, μ^G) ∈ M and λ = (λ^h, λ^g, λ^H, λ^{GH}) ∈ R^l × R^m × R^s × R^s satisfying the relations (2.9)–(2.11) and

    λ^{GH}_i ≥ max{0, −μ^H_i / G_i(x̄)}, i ∈ I_{0+},   λ^{GH}_i ≥ 0, i ∈ I_{00},   0 ≤ λ^{GH}_i ≤ −μ^H_i / G_i(x̄), i ∈ I_{0−},   (2.12)

it holds that λ = (λ^h, λ^g, λ^H, λ^{GH}) ∈ Λ.

Therefore, despite the violation of MFCQ for MPVC (which is unavoidable when the lower-level strict complementarity condition is violated), the traditional notion of stationarity remains fully adequate for this problem class.

Assuming twice differentiability of the input data of problem (1.1), below we will make use of the standard second-order sufficient optimality condition (SOSC), which consists of saying that there exists λ̄ ∈ Λ satisfying

    ⟨∂²L/∂x²(x̄, λ̄)ξ, ξ⟩ > 0  ∀ξ ∈ C \ {0},   (2.13)

where

    C = C(x̄) = {ξ ∈ R^n | h'(x̄)ξ = 0, g'_{I_g}(x̄)ξ ≤ 0, H'_{I_{0+}}(x̄)ξ = 0, H'_{I_{00}∪I_{0−}}(x̄)ξ ≥ 0, G'_{I_{+0}}(x̄)ξ ≤ 0, ⟨f'(x̄), ξ⟩ ≤ 0}

is the standard critical cone of problem (1.1) at x̄.
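The conversion of Proposition 2.1 from a KKT multiplier λ to the MPVC-multiplier μ is a direct transcription of (2.9)–(2.11); a sketch follows (the interface, with index sets passed as lists, is ours):

```python
import numpy as np

def mpvc_multiplier_from_kkt(lam_h, lam_g, lam_H, lam_GH, Gx, Hx,
                             I_plus0, I_0plus, I_00, I_0minus):
    """Map a KKT multiplier lam into the MPVC-multiplier mu via (2.9)-(2.11).

    Gx, Hx are the values G(x̄), H(x̄); index sets are lists of indices."""
    mu_h, mu_g = lam_h.copy(), lam_g.copy()        # (2.9)
    mu_H = np.zeros_like(lam_H)
    mu_G = np.zeros_like(lam_GH)
    for i in I_00:                                 # (2.10): mu^H_i = lam^H_i on I_00
        mu_H[i] = lam_H[i]
    for i in I_0plus + I_0minus:                   # (2.10): correction by lam^GH_i G_i(x̄)
        mu_H[i] = lam_H[i] - lam_GH[i] * Gx[i]
    for i in I_plus0:                              # (2.11): mu^G_i = lam^GH_i H_i(x̄) on I_+0
        mu_G[i] = lam_GH[i] * Hx[i]
    # mu^H vanishes on I_+, and mu^G vanishes on I_+- and I_0 (H_i(x̄) = 0 there)
    return mu_h, mu_g, mu_H, mu_G
```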

3 Active-set Newton method

The active-set Newton methods for MPVC developed in [18, 19] rely on the same basic idea as the one employed in [20] for MPCC. Specifically, the idea is to identify TNLP (2.2) associated with the solution we are trying to find, and to apply the sequential quadratic programming (SQP) method to the identified TNLP. Identification of TNLP (2.2) consists in identification of the index sets I_{+0}, I_{0+}, I_{00} and I_{0−}, and this can be done under very weak assumptions, and without any considerable computational costs, by means of the procedure suggested in [10], and by the error bound estimating the distance to the set {x̄} × Λ (see (4.7) below) following from [12, Lemma 2] and [11, Theorem 2].


Specifically, the KKT system

    ∂L/∂x(x, λ) = 0,  h(x) = 0,   (3.1)
    λ^g ≥ 0,  g(x) ≤ 0,  ⟨λ^g, g(x)⟩ = 0,   (3.2)
    λ^H ≥ 0,  H(x) ≥ 0,  ⟨λ^H, H(x)⟩ = 0,   (3.3)
    λ^{GH} ≥ 0,  G_i(x)H_i(x) ≤ 0, i = 1, ..., s,  Σ_{i=1}^s λ^{GH}_i G_i(x)H_i(x) = 0,   (3.4)

characterizing stationary points and associated Lagrange multipliers of the original problem (1.1), can be written in the form of the equation Φ(x, λ) = 0, where the residual mapping Φ : R^n × (R^l × R^m × R^s × R^s) → R^n × R^l × R^m × R^s × R^s is defined by the equality

    Φ(x, λ) = ( ∂L/∂x(x, λ),
                h(x),
                (ρ(λ^g_1, −g_1(x)), ..., ρ(λ^g_m, −g_m(x))),
                (ρ(λ^H_1, H_1(x)), ..., ρ(λ^H_s, H_s(x))),
                (ρ(λ^{GH}_1, −G_1(x)H_1(x)), ..., ρ(λ^{GH}_s, −G_s(x)H_s(x))) ),

with ρ : R × R → R being one of the so-called complementarity functions. The most commonly used choices of ρ are the following two options:

• The natural residual function ρ(a, b) = min{a, b}.
• The Fischer–Burmeister function ρ(a, b) = √(a² + b²) − a − b.

The corresponding two instances of the residual mapping Φ will be denoted by Φ_NR and Φ_FB, respectively. It is well known that Φ_NR and Φ_FB are semismooth, and in particular, locally Lipschitzian. Moreover, according to [26], the norms of these mappings are equivalent in terms of their growth rate, which implies that any result relying on an error bound involving one of these residuals remains valid with the other residual. In what follows we write Φ for either Φ_NR or Φ_FB when both choices are appropriate.

Fix any θ ∈ (0, 1), and for any x^k ∈ R^n and any λ^k ∈ R^l × R^m × R^s × R^s define the index sets

    I_+(x^k, λ^k) = {i = 1, ..., s | H_i(x^k) > ‖Φ(x^k, λ^k)‖^θ},
    I_0(x^k, λ^k) = {1, ..., s} \ I_+(x^k, λ^k),
    I_{+−}(x^k, λ^k) = {i ∈ I_+(x^k, λ^k) | G_i(x^k) < −‖Φ(x^k, λ^k)‖^θ},
    I_{+0}(x^k, λ^k) = I_+(x^k, λ^k) \ I_{+−}(x^k, λ^k),   (3.5)
    I_{0+}(x^k, λ^k) = {i ∈ I_0(x^k, λ^k) | G_i(x^k) > ‖Φ(x^k, λ^k)‖^θ},
    I_{0−}(x^k, λ^k) = {i ∈ I_0(x^k, λ^k) | G_i(x^k) < −‖Φ(x^k, λ^k)‖^θ},
    I_{00}(x^k, λ^k) = I_0(x^k, λ^k) \ (I_{0+}(x^k, λ^k) ∪ I_{0−}(x^k, λ^k)).
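The identification procedure (3.5) is purely combinatorial once the residual norm is available; a minimal sketch, assuming Gx, Hx hold G(x^k), H(x^k) and res_norm holds ‖Φ(x^k, λ^k)‖ (variable names are ours):

```python
def identify_index_sets(Gx, Hx, res_norm, theta=0.5):
    """Identification step (3.5): estimate I_+0, I_+-, I_0+, I_00, I_0-
    from the value and the theta-power of the KKT residual norm."""
    tol = res_norm ** theta
    I_plus = {i for i in range(len(Hx)) if Hx[i] > tol}
    I_0 = set(range(len(Hx))) - I_plus
    I_plusminus = {i for i in I_plus if Gx[i] < -tol}
    I_plus0 = I_plus - I_plusminus
    I_0plus = {i for i in I_0 if Gx[i] > tol}
    I_0minus = {i for i in I_0 if Gx[i] < -tol}
    I_00 = I_0 - I_0plus - I_0minus
    return I_plus0, I_plusminus, I_0plus, I_00, I_0minus
```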


According to [18, Proposition 2], if SOSC (2.13) holds at x̄ for a Lagrange multiplier λ̄ ∈ Λ, then these sets locally (for (x^k, λ^k) close to (x̄, λ̄)) coincide with the index sets I_+, I_0, I_{+−}, I_{+0}, I_{0+}, I_{0−} and I_{00}, respectively.

In what follows we assume that the function f and the mappings h, g, G and H are twice differentiable. The algorithm we present next generalizes [19, Algorithm 2] to the case when the usual equality and inequality constraints can be present.

Algorithm 3.1

Preliminary step. Fix θ ∈ (0, 1). Set k = 0 and choose x^0 ∈ R^n and λ^0 = ((λ^h)^0, (λ^g)^0, (λ^H)^0, (λ^{GH})^0) ∈ R^l × R^m × R^s × R^s.

Identification step. Define the index sets I_{+0} = I_{+0}(x^k, λ^k), I_{0+} = I_{0+}(x^k, λ^k), I_{00} = I_{00}(x^k, λ^k) and I_{0−} = I_{0−}(x^k, λ^k) according to (3.5).

Initialization step. If k = 0, define (μ^h)^0, (μ^g)^0, (μ^H_{I_0})^0 and (μ^G_{I_{+0}∪I_{00}})^0 according to the formulas suggested by Proposition 2.1:

    (μ^h)^0 = (λ^h)^0,   (μ^g)^0 = (λ^g)^0,   (3.6)
    (μ^H_i)^0 = (λ^H_i)^0 − (λ^{GH}_i)^0 G_i(x^0), i ∈ I_{0+} ∪ I_{0−},   (μ^H_i)^0 = (λ^H_i)^0, i ∈ I_{00},   (3.7)
    (μ^G_i)^0 = (λ^{GH}_i)^0 H_i(x^0), i ∈ I_{+0},   (μ^G_i)^0 = 0, i ∈ I_{00},   (3.8)

and set

    (μ^H_{I_+})^0 = 0,   (μ^G_{I_{+−}∪I_{0+}∪I_{0−}})^0 = 0.   (3.9)

SQP step. Compute x^{k+1} ∈ R^n as a stationary point of the quadratic program

    minimize ⟨f'(x^k), x − x^k⟩ + (1/2)⟨∂²L/∂x²(x^k, μ^k)(x − x^k), x − x^k⟩
    subject to h(x^k) + h'(x^k)(x − x^k) = 0,  g(x^k) + g'(x^k)(x − x^k) ≤ 0,
               −H_{I_{0+}∪I_{00}}(x^k) − H'_{I_{0+}∪I_{00}}(x^k)(x − x^k) = 0,   (3.10)
               −H_{I_{0−}}(x^k) − H'_{I_{0−}}(x^k)(x − x^k) ≤ 0,
               G_{I_{+0}∪I_{00}}(x^k) + G'_{I_{+0}∪I_{00}}(x^k)(x − x^k) ≤ 0,

and let ((μ^h)^{k+1}, (μ^g)^{k+1}, (μ^H_{I_{0+}∪I_{00}})^{k+1}, (μ^H_{I_{0−}})^{k+1}, (μ^G_{I_{+0}∪I_{00}})^{k+1}) ∈ R^l × R^m_+ × R^{|I_{0+}∪I_{00}|} × R_+^{|I_{0−}|} × R_+^{|I_{+0}∪I_{00}|} be a Lagrange multiplier associated with x^{k+1}. Define the entire μ^{k+1} = ((μ^h)^{k+1}, (μ^g)^{k+1}, (μ^H)^{k+1}, (μ^G)^{k+1}) by setting the remaining components equal to zero:

    (μ^H_{I_+})^{k+1} = 0,   (μ^G_{I_{+−}∪I_{0+}∪I_{0−}})^{k+1} = 0.   (3.11)

Set

    ν_i^{k+1} = max{0, −(μ^H_i)^{k+1} / G_i(x^{k+1})},  i ∈ I_{0+} ∪ I_{0−},   (3.12)

and define λ^{k+1} = ((λ^h)^{k+1}, (λ^g)^{k+1}, (λ^H)^{k+1}, (λ^{GH})^{k+1}) as follows:

    (λ^h)^{k+1} = (μ^h)^{k+1},   (λ^g)^{k+1} = (μ^g)^{k+1},   (3.13)
    (λ^H_{I_+})^{k+1} = 0,   (λ^H_{I_{00}})^{k+1} = (μ^H_{I_{00}})^{k+1},   (3.14)
    (λ^{GH}_i)^{k+1} = max{ν_i^{k+1}, (λ^{GH}_i)^k},  i ∈ I_{0+},   (3.15)
    (λ^{GH}_i)^{k+1} = 0 if (λ^{GH}_i)^k ≤ 0;  (λ^{GH}_i)^{k+1} = (λ^{GH}_i)^k if 0 < (λ^{GH}_i)^k < ν_i^{k+1};  (λ^{GH}_i)^{k+1} = ν_i^{k+1} if (λ^{GH}_i)^k ≥ ν_i^{k+1},  i ∈ I_{0−},   (3.16)
    (λ^H_i)^{k+1} = (μ^H_i)^{k+1} + (λ^{GH}_i)^{k+1} G_i(x^{k+1}),  i ∈ I_{0+} ∪ I_{0−},   (3.17)
    (λ^{GH}_i)^{k+1} = (μ^G_i)^{k+1} / H_i(x^{k+1}),  i ∈ I_{+0},   (3.18)
    (λ^{GH}_i)^{k+1} = max{0, (λ^{GH}_i)^k}, i ∈ I_{00},   (λ^{GH}_{I_{+−}})^{k+1} = 0.   (3.19)

Increase k by 1 and go to Identification step.

For a strongly stationary point x̄ of problem (1.1) satisfying MPVC-SMFCQ, and for the unique MPVC-multiplier μ̄ associated with x̄, set

    ν̄_i = max{0, −μ̄^H_i / G_i(x̄)} if i ∈ I_{0+};   ν̄_i = −μ̄^H_i / G_i(x̄) if i ∈ I_{0−}.   (3.20)

From (3.12) and (3.20) we derive that for any i ∈ I_{0+} ∪ I_{0−}

    |ν_i^{k+1} − ν̄_i| = O(‖(x^{k+1} − x̄, μ^{k+1} − μ̄)‖).   (3.21)
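The safeguarded dual update (3.12), (3.15)–(3.19) can be sketched as follows (index sets are Python sets; the names are ours):

```python
import numpy as np

def dual_update(mu_H, mu_G, lam_GH_old, Gx1, Hx1,
                I_0plus, I_0minus, I_plus0, I_00):
    """Dual update of Algorithm 3.1: mu_H, mu_G are the QP multipliers at
    x^{k+1}, lam_GH_old is (lam^GH)^k, and Gx1, Hx1 are G(x^{k+1}), H(x^{k+1})."""
    lam_GH = np.zeros_like(lam_GH_old)          # lam^GH vanishes on I_+- by (3.19)
    lam_H = mu_H.copy()                         # (3.14): zero on I_+, mu^H on I_00
    for i in I_0plus | I_0minus:
        nu = max(0.0, -mu_H[i] / Gx1[i])        # (3.12)
        if i in I_0plus:
            lam_GH[i] = max(nu, lam_GH_old[i])  # (3.15)
        else:
            # (3.16) is exactly the clipping of (lam^GH_i)^k to the box [0, nu]
            lam_GH[i] = min(max(lam_GH_old[i], 0.0), nu)
        lam_H[i] = mu_H[i] + lam_GH[i] * Gx1[i]  # (3.17)
    for i in I_plus0:
        lam_GH[i] = mu_G[i] / Hx1[i]            # (3.18)
    for i in I_00:
        lam_GH[i] = max(0.0, lam_GH_old[i])     # (3.19)
    return lam_H, lam_GH
```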

Suppose now that for some k ∈ {0, 1, ...}, Algorithm 3.1 generated points x^k, λ^k and μ^k. Let λ̄^k = ((λ̄^h)^k, (λ̄^g)^k, (λ̄^H)^k, (λ̄^{GH})^k) ∈ Λ be the Euclidean projection of λ^k onto the polyhedral set Λ, i.e.,

    ‖λ^k − λ̄^k‖ = dist(λ^k, Λ).   (3.22)

Then for any λ̄ ∈ Λ it holds that ‖λ̄^k − λ̄‖ ≤ ‖λ^k − λ̄‖, and in particular, λ̄^k tends to λ̄ as λ^k tends to λ̄.

Furthermore, define λ̃^k = ((λ̃^h)^k, (λ̃^g)^k, (λ̃^H)^k, (λ̃^{GH})^k) ∈ Λ as follows. Let (λ̃^{GH}_{I_0})^k be the Euclidean projection of (λ^{GH}_{I_0})^k onto the box defined by the constraints (2.12), and let the other components of λ̃^k be defined according to formulas (2.9)–(2.11) in Proposition 2.1 (implying the inclusion λ̃^k ∈ Λ), namely

    (λ̃^h)^k = μ̄^h,   (λ̃^g)^k = μ̄^g,
    (λ̃^H_{I_+})^k = μ̄^H_{I_+} = 0,   (λ̃^H_{I_{00}})^k = μ̄^H_{I_{00}},
    (λ̃^{GH}_i)^k = ν̄_i if (λ^{GH}_i)^k < ν̄_i,   (λ̃^{GH}_i)^k = (λ^{GH}_i)^k if (λ^{GH}_i)^k ≥ ν̄_i,   i ∈ I_{0+},
    (λ̃^{GH}_i)^k = (λ^{GH}_i)^k if (λ^{GH}_i)^k ≤ ν̄_i,   (λ̃^{GH}_i)^k = ν̄_i if (λ^{GH}_i)^k > ν̄_i,   i ∈ I_{0−},
    (λ̃^H_i)^k = μ̄^H_i + (λ̃^{GH}_i)^k G_i(x̄),   i ∈ I_{0+} ∪ I_{0−},
    (λ̃^{GH}_i)^k = μ̄^G_i / H_i(x̄),   i ∈ I_{+0},
    (λ̃^{GH}_{I_{+−}})^k = μ̄^G_{I_{+−}} = 0,   (λ̃^{GH}_{I_{00}})^k = (λ^{GH}_{I_{00}})^k

(recall that (λ^{GH}_{I_{00}∪I_{0−}})^k ≥ 0 according to (3.16), (3.19)). Employing (2.4), (2.5), (2.9)–(2.11), (3.6)–(3.9), (3.11), (3.13), (3.14), (3.17), (3.18) and the fact that λ̄^k ∈ Λ, we derive the estimate

    ‖μ^k − μ̄‖_1 = Σ_{i=1}^l |(λ^h_i)^k − (λ̄^h_i)^k| + Σ_{i=1}^m |(λ^g_i)^k − (λ̄^g_i)^k|
      + Σ_{i∈I_{0+}∪I_{0−}} |(λ^H_i)^k − (λ̄^H_i)^k − (λ^{GH}_i)^k G_i(x^k) + (λ̄^{GH}_i)^k G_i(x̄)|
      + Σ_{i∈I_{00}} |(λ^H_i)^k − (λ̄^H_i)^k| + Σ_{i∈I_{+0}} |(λ^{GH}_i)^k H_i(x^k) − (λ̄^{GH}_i)^k H_i(x̄)|
    = O(‖x^k − x̄‖) + O(‖(λ^h)^k − (λ̄^h)^k‖) + O(‖(λ^g)^k − (λ̄^g)^k‖)
      + O(‖(λ^H_{I_{00}})^k − (λ̄^H_{I_{00}})^k‖) + O(‖(λ^{GH}_{I_{+0}})^k − (λ̄^{GH}_{I_{+0}})^k‖)
      + O(‖(λ^H_{I_{0+}∪I_{0−}})^k − (λ̄^H_{I_{0+}∪I_{0−}})^k‖) + O(‖(λ^{GH}_{I_{0+}∪I_{0−}})^k − (λ̄^{GH}_{I_{0+}∪I_{0−}})^k‖)
    = O(‖(x^k − x̄, λ^k − λ̄^k)‖).   (3.23)

In the sequel we shall need the following assumptions.

Assumption 1 The function f and the mappings h, g, H and G are twice differentiable near a strongly stationary point x̄ ∈ R^n of problem (1.1), with their second derivatives being Lipschitz-continuous with respect to x̄.

Assumption 2 The point x̄ satisfies MPVC-SMFCQ, and μ̄ = (μ̄^h, μ̄^g, μ̄^H, μ̄^G) is the unique MPVC-multiplier associated with x̄.

Assumption 3 SOSC (2.13) holds for some λ̄ = (λ̄^h, λ̄^g, λ̄^H, λ̄^{GH}) ∈ Λ.

Under Assumptions 1–3, if (x^k, λ^k) is close enough to (x̄, λ̄), and if Algorithm 3.1 computes the point (x^{k+1}, ((μ^h)^{k+1}, (μ^g)^{k+1}, (μ^H_{I_{0+}∪I_{00}})^{k+1}, (μ^H_{I_{0−}})^{k+1}, (μ^G_{I_{+0}∪I_{00}})^{k+1})) as a stationary point–Lagrange multiplier pair of the quadratic program (3.10) closest to (x^k, ((μ^h)^k, (μ^g)^k, (μ^H_{I_{0+}∪I_{00}})^k, (μ^H_{I_{0−}})^k, (μ^G_{I_{+0}∪I_{00}})^k)), then, according to [18, Theorem 4] (which in its turn employs the SQP local convergence analysis from [3]), this algorithm generates x^{k+1} and μ^{k+1} satisfying the following estimates:


    ‖x^{k+1} − x̄‖ = O(‖(x^k − x̄, μ^k − μ̄)‖²) = O(‖(x^k − x̄, λ^k − λ̄^k)‖²),   (3.24)
    ‖μ^{k+1} − μ̄‖ = O(‖(x^k − x̄, μ^k − μ̄)‖²) = O(‖(x^k − x̄, λ^k − λ̄^k)‖²),   (3.25)

where the last equalities in both (3.24) and (3.25) are by (3.23). Observe that the estimates (3.21), (3.24) and (3.25) further imply that for any i ∈ I_{0+} ∪ I_{0−}

    |ν_i^{k+1} − ν̄_i| = O(‖(x^k − x̄, λ^k − λ̄^k)‖²).   (3.26)

Lemma 3.1 Under Assumptions 1–3, suppose that Algorithm 3.1 generated points x^k, λ^k and μ^k such that (x^k, λ^k) is close enough to (x̄, λ̄).
Then Algorithm 3.1, in which the point (x^{k+1}, ((μ^h)^{k+1}, (μ^g)^{k+1}, (μ^H_{I_{0+}∪I_{00}})^{k+1}, (μ^H_{I_{0−}})^{k+1}, (μ^G_{I_{+0}∪I_{00}})^{k+1})) is computed as a stationary point–Lagrange multiplier pair of the quadratic program (3.10) closest to (x^k, ((μ^h)^k, (μ^g)^k, (μ^H_{I_{0+}∪I_{00}})^k, (μ^H_{I_{0−}})^k, (μ^G_{I_{+0}∪I_{00}})^k)), generates points x^{k+1}, μ^{k+1} and λ^{k+1} such that the following estimate is valid:

    ‖λ^{k+1} − λ̄^{k+1}‖ = O(‖(x^k − x̄, λ^k − λ̄^k)‖²).   (3.27)

Proof The inclusion λ̃^{k+1} ∈ Λ evidently implies the estimate

    dist(λ^{k+1}, Λ) ≤ ‖λ^{k+1} − λ̃^{k+1}‖.

Therefore, in order to prove (3.27), it suffices to establish the estimate

    ‖λ^{k+1} − λ̃^{k+1}‖ = O(‖(x^k − x̄, λ^k − λ̄^k)‖²).   (3.28)

From (3.13), (3.14), (3.18), (3.19), (3.24), (3.25) and the definition of λ̃^{k+1}, we derive the estimates

    ‖(λ^h)^{k+1} − (λ̃^h)^{k+1}‖ = ‖(μ^h)^{k+1} − μ̄^h‖ = O(‖(x^k − x̄, λ^k − λ̄^k)‖²),
    ‖(λ^g)^{k+1} − (λ̃^g)^{k+1}‖ = ‖(μ^g)^{k+1} − μ̄^g‖ = O(‖(x^k − x̄, λ^k − λ̄^k)‖²),
    ‖(λ^H_{I_+})^{k+1} − (λ̃^H_{I_+})^{k+1}‖ = 0,
    ‖(λ^H_{I_{00}})^{k+1} − (λ̃^H_{I_{00}})^{k+1}‖ = ‖(μ^H_{I_{00}})^{k+1} − μ̄^H_{I_{00}}‖ = O(‖(x^k − x̄, λ^k − λ̄^k)‖²),
    |(λ^{GH}_i)^{k+1} − (λ̃^{GH}_i)^{k+1}| = |(μ^G_i)^{k+1} / H_i(x^{k+1}) − μ̄^G_i / H_i(x̄)| = O(‖(x^k − x̄, λ^k − λ̄^k)‖²),  i ∈ I_{+0},
    ‖(λ^{GH}_{I_{+−}∪I_{00}})^{k+1} − (λ̃^{GH}_{I_{+−}∪I_{00}})^{k+1}‖ = 0,

and in order to establish (3.28), it remains to consider the components of the dual estimates corresponding to i ∈ I_{0+} ∪ I_{0−}.
For i ∈ I_{0+}, according to (3.15), we need to consider the following two cases.
Case 1. Let (λ^{GH}_i)^{k+1} = ν_i^{k+1}. Then by the definition of λ̃^{k+1} we obtain the following conclusions:


• If ν_i^{k+1} < ν̄_i, then the equality (λ̃^{GH}_i)^{k+1} = ν̄_i and estimate (3.26) imply the estimate

    |(λ^{GH}_i)^{k+1} − (λ̃^{GH}_i)^{k+1}| = |ν_i^{k+1} − ν̄_i| = O(‖(x^k − x̄, λ^k − λ̄^k)‖²).   (3.29)

• If ν_i^{k+1} ≥ ν̄_i, then

    (λ^{GH}_i)^{k+1} = (λ̃^{GH}_i)^{k+1}.   (3.30)

Case 2. Let (λ^{GH}_i)^{k+1} = (λ^{GH}_i)^k. Then, again by the definition of λ̃^{k+1}, we derive the following conclusions:
• If (λ^{GH}_i)^k < ν̄_i, then according to (3.15) it holds that ν_i^{k+1} ≤ (λ^{GH}_i)^k < ν̄_i, and hence, the equality (λ̃^{GH}_i)^{k+1} = ν̄_i and estimate (3.26) imply the estimate

    |(λ^{GH}_i)^{k+1} − (λ̃^{GH}_i)^{k+1}| = O(|ν_i^{k+1} − ν̄_i|) = O(‖(x^k − x̄, λ^k − λ̄^k)‖²).   (3.31)

• If (λ^{GH}_i)^k ≥ ν̄_i, then (3.30) holds again.

For i ∈ I_{0−}, according to (3.16), we need to consider the following three cases.
Case 1. Let (λ^{GH}_i)^{k+1} = 0. Then by the definition of λ̃^{k+1} it holds that (λ̃^{GH}_i)^{k+1} = 0, and hence, the equality (3.30) is valid.
Case 2. Let (λ^{GH}_i)^{k+1} = (λ^{GH}_i)^k. Then by the definition of λ̃^{k+1} we have:
• If (λ^{GH}_i)^k ≤ ν̄_i, then the equality (3.30) is valid.
• If (λ^{GH}_i)^k > ν̄_i, then according to (3.16) it holds that ν̄_i < (λ^{GH}_i)^k < ν_i^{k+1}, and hence, the equality (λ̃^{GH}_i)^{k+1} = ν̄_i and estimate (3.26) imply the estimate (3.31).
Case 3. Let (λ^{GH}_i)^{k+1} = ν_i^{k+1}. Then by the definition of λ̃^{k+1} we have:
• If ν_i^{k+1} > ν̄_i, then the equality (λ̃^{GH}_i)^{k+1} = ν̄_i and estimate (3.26) imply estimate (3.29).
• If ν_i^{k+1} ≤ ν̄_i, then the equality (3.30) is valid.

Finally, for i ∈ I_{0+} ∪ I_{0−}, and in any of the cases considered above, by the relations (3.17), (3.24), (3.25) and either (3.29) or (3.30) or (3.31), we derive the estimate

    |(λ^H_i)^{k+1} − (λ̃^H_i)^{k+1}| = |(μ^H_i)^{k+1} − μ̄^H_i + (λ^{GH}_i)^{k+1} G_i(x^{k+1}) − (λ̃^{GH}_i)^{k+1} G_i(x̄)|
    = O(‖x^{k+1} − x̄‖) + O(‖μ^{k+1} − μ̄‖) + O(|(λ^{GH}_i)^{k+1} − (λ̃^{GH}_i)^{k+1}|)
    = O(‖(x^k − x̄, λ^k − λ̄^k)‖²).   (3.32)

Combining (3.29), (3.30) and (3.31) (in the corresponding cases) with (3.32), we complete the proof of the needed estimate (3.28). □

Theorem 3.1 Under Assumptions 1–3, for any starting point (x^0, λ^0) close enough to (x̄, λ̄), Algorithm 3.1, in which (x^{k+1}, ((μ^h)^{k+1}, (μ^g)^{k+1}, (μ^H_{I_{0+}∪I_{00}})^{k+1}, (μ^H_{I_{0−}})^{k+1}, (μ^G_{I_{+0}∪I_{00}})^{k+1})) is computed as a stationary point–Lagrange multiplier pair of the quadratic program (3.10) closest to (x^k, ((μ^h)^k,


(μ^g)^k, (μ^H_{I_{0+}∪I_{00}})^k, (μ^H_{I_{0−}})^k, (μ^G_{I_{+0}∪I_{00}})^k)), generates an iterative sequence {(x^k, λ^k, μ^k)} which converges to (x̄, λ^*, μ̄) with some λ^* = ((λ^h)^*, (λ^g)^*, (λ^H)^*, (λ^{GH})^*) ∈ Λ, and the rate of convergence of the sequences {(x^k, λ^k)} and {(x^k, μ^k)} is quadratic. Furthermore, considering λ^* as a function of (x^0, λ^0), the following asymptotic relation is valid:

    λ^* → λ̄  as (x^0, λ^0) → (x̄, λ̄).   (3.33)

Proof Generalizing the argument in [19, Theorem 2] to the problem setting (1.1) allowing for "usual" equality and inequality constraints, we readily obtain the conclusion concerning {(x^k, μ^k)}: this sequence converges to (x̄, μ̄), and the rate of convergence is quadratic. Moreover, one of the key points of the proof in [19, Theorem 2] consists of showing that

    sup{‖λ^k − λ̄‖ | k = 0, 1, ...} → 0  as (x^0, λ^0) → (x̄, λ̄),   (3.34)

where the left-hand side is considered as a function of (x^0, λ^0). In particular, {λ^k} is bounded, and hence has an accumulation point λ^*, and (3.34) implies (3.33).

It remains to show that the entire sequence {λ^k} converges to λ^*, that λ^* ∈ Λ, and to estimate the convergence rate.

According to (3.15), (3.16) and (3.19), for any i ∈ I_0 the sequence {(λ^{GH}_i)^k} is monotone: for i ∈ I_{0+} ∪ I_{00} this sequence is nondecreasing, while for i ∈ I_{0−} it is nonincreasing (possibly starting from k = 1, since in the case of (λ^{GH}_i)^0 ≤ 0, (3.16) implies the equality (λ^{GH}_i)^k = 0 for all k = 1, 2, ...). Therefore, for any i ∈ I_0, the sequence {(λ^{GH}_i)^k} converges to its limit (λ^{GH}_i)^*. Combining this with convergence of {(x^k, μ^k)} to (x̄, μ̄), and passing onto the limit in (3.13), (3.14), (3.17) and (3.18), we derive that the entire sequence {λ^k} converges to λ^*, and that λ = λ^* satisfies (2.9), (2.10) and (2.11) with μ = μ̄. Taking into account estimate (3.21), we also conclude that for any i ∈ I_{0+} ∪ I_{0−} the sequence {ν_i^k} converges to ν̄_i, and therefore, from (3.15), (3.16), (3.19) and (3.20), we derive that λ = λ^* satisfies (2.12) with μ = μ̄. Therefore, by Proposition 2.1, λ^* ∈ Λ.

We proceed with the convergence rate estimate. From (2.9)–(2.11) with λ = λ^* and μ = μ̄, from (3.13), (3.14), (3.18), and from (3.24), (3.25) we derive the estimate

    ‖((λ^h)^{k+1} − (λ^h)^*, (λ^g)^{k+1} − (λ^g)^*, (λ^H_{I_+∪I_{00}})^{k+1} − (λ^H_{I_+∪I_{00}})^*, (λ^{GH}_{I_+})^{k+1} − (λ^{GH}_{I_+})^*)‖
    = O(‖(x^{k+1} − x̄, μ^{k+1} − μ̄)‖) = O(‖(x^k − x̄, λ^k − λ̄^k)‖²) = O(‖(x^k − x̄, λ^k − λ^*)‖²),

where the last equality is by (3.22).

Furthermore, for any i ∈ I_0, employing monotonicity of {(λ^{GH}_i)^k}, convergence of {ν_i^k} to ν̄_i, (3.20), (3.15), (3.16) and (3.19), one can easily check the following: if


for some k it holds that λ^{GH}_i = (λ^{GH}_i)^k satisfies the i-th relation in (2.12) with μ = μ̄, then (λ^{GH}_i)^k = (λ^{GH}_i)^* for all k large enough.

Consider now any index i ∈ I_0 such that the i-th relation in (2.12) with λ^{GH}_i = (λ^{GH}_i)^k and μ = μ̄ does not hold for any k. Note that by (3.19) this cannot occur for i ∈ I_{00}, while by (3.16) this cannot occur with the nonnegativity condition in the i-th relation in (2.12) for i ∈ I_{0−}. Therefore, it remains to consider the following two cases.

Consider any index i ∈ I_{0+} such that (λ^{GH}_i)^k < ν̄_i for all k. As discussed above in this proof, the sequence {(λ^{GH}_i)^k} is nondecreasing and converges to (λ^{GH}_i)^* satisfying (λ^{GH}_i)^* ≥ ν̄_i, and hence, it actually holds that (λ^{GH}_i)^* = ν̄_i. Now, for any k, from (3.15) we obtain that either (λ^{GH}_i)^{k+1} = ν_i^{k+1}, or ν_i^{k+1} < (λ^{GH}_i)^{k+1} = (λ^{GH}_i)^k < ν̄_i, and in both cases, employing (3.26), we derive the estimate

    |(λ^{GH}_i)^{k+1} − (λ^{GH}_i)^*| ≤ |ν_i^{k+1} − ν̄_i| = O(‖(x^k − x̄, λ^k − λ̄^k)‖²) = O(‖(x^k − x̄, λ^k − λ^*)‖²),

where the last equality is again by (3.22).

The case of i ∈ I_{0−} such that (λ^{GH}_i)^k > ν̄_i for all k is considered similarly, employing the fact that the corresponding sequence {(λ^{GH}_i)^k} is nonincreasing.

Finally, from (2.10) with λ = λ^* and μ = μ̄, from (3.17), from (3.24) and (3.25), and from the obtained estimates for |(λ^{GH}_i)^{k+1} − (λ^{GH}_i)^*|, for i ∈ I_{0+} ∪ I_{0−} we derive the estimate

    |(λ^H_i)^{k+1} − (λ^H_i)^*| = |(μ^H_i)^{k+1} − μ̄^H_i + (λ^{GH}_i)^{k+1} G_i(x^{k+1}) − (λ^{GH}_i)^* G_i(x̄)|
    = O(‖(x^{k+1} − x̄, μ^{k+1} − μ̄)‖) + O(‖(x^k − x̄, λ^k − λ^*)‖²)
    = O(‖(x^k − x̄, λ^k − λ̄^k)‖²) + O(‖(x^k − x̄, λ^k − λ^*)‖²)
    = O(‖(x^k − x̄, λ^k − λ^*)‖²),

where the last equality is again by (3.22).

Combining the obtained estimates, we get the quadratic rate of convergence of {(x^k, λ^k)} to (x̄, λ^*). □

The local active-set method developed in this section is intended for finding strongly stationary points of MPVC. Observe that there exist some reasonable weaker stationarity concepts for MPVC, such as M-stationarity and T-stationarity (see [17]), and developing methods for finding M-stationary or T-stationary points can be of interest. However, the authors are currently not aware of any methods with established local superlinear convergence to M-stationary or T-stationary points.

4 Globalization of convergence

Hybrid globalization strategies suggested below for the local Algorithm 3.1 employ a generic outer-phase method, which in principle can be any method with reasonable


global convergence properties. Its role is to drive the iterative sequence towards a stationary point of problem (1.1) and an associated Lagrange multiplier. Each step of the outer-phase method is followed by an attempt to switch to the steps of Algorithm 3.1, which are accepted if the linear decrease test for a residual of the KKT system (3.1)–(3.4) of problem (1.1) turns out to be satisfied. Under reasonable assumptions we show that the active-set steps are eventually necessarily accepted, and the globalized methods exhibit the high convergence rate of Algorithm 3.1.

In the algorithm to be proposed in Sect. 4.1, the residual at the trial point is compared with the current residual, and the cases of possible violation of the linear decrease test on later iterations are treated by means of backup safeguards. Thus, some iterates computed by Algorithm 3.1 can be eventually disregarded, and the overall efficiency of the algorithm certainly depends on the actual amount of these useless computations. Alternatively, in Sect. 4.2, the residual at the trial point is compared with the record (rather than current) residual, and the record is updated only when the linear decrease test turns out to be satisfied, or when the step of the outer-phase method happens to decrease the residual with respect to the record. This strategy allows one to avoid using the backup safeguards. Finally, in Sect. 4.3, we consider a specific outer-phase method, namely, the semismooth Newton method (SNM) for the Fischer–Burmeister function-based reformulation of the KKT system (3.1)–(3.4) equipped with the natural linesearch procedure. The resulting algorithm generates monotone sequences of residual values, and it fits both generic algorithmic frameworks from Sects. 4.1 and 4.2.

4.1 Hybrid globalization with backup safeguards

We next specify our first conceptual globalized algorithm.

Algorithm 4.1

Preliminary step. Fix θ, q ∈ (0, 1). Set k = 0 and choose x^0 ∈ R^n and λ^0 = ((λ^h)^0, (λ^g)^0, (λ^H)^0, (λ^{GH})^0) ∈ R^l × R^m × R^s × R^s.

Identification step. Define the index sets I_{+0} = I_{+0}(x^k, λ^k), I_{0+} = I_{0+}(x^k, λ^k), I_{00} = I_{00}(x^k, λ^k) and I_{0−} = I_{0−}(x^k, λ^k) according to (3.5). If k = 0, or if some of these index sets differ from their counterparts computed for the previous k, or if I_+ ∪ I_0 ≠ {1, ..., s}, or if I_{+0} ∪ I_{+−} ≠ I_+, or if I_{0+} ∪ I_{00} ∪ I_{0−} ≠ I_0, go to Outer-phase step.

Active-set step. If (x^k, λ^k) was generated by the Outer-phase step, set k̃ = k, store (x^k̃, λ^k̃) and define μ^k = ((μ^h)^k, (μ^g)^k, (μ^H)^k, (μ^G)^k) according to the formulas

    (μ^h)^k = (λ^h)^k,   (μ^g)^k = (λ^g)^k,   (4.1)
    (μ^H_i)^k = (λ^H_i)^k − (λ^{GH}_i)^k G_i(x^k), i ∈ I_{0+} ∪ I_{0−},   (μ^H_i)^k = (λ^H_i)^k, i ∈ I_{00},   (4.2)
    (μ^G_i)^k = (λ^{GH}_i)^k H_i(x^k), i ∈ I_{+0},   (μ^G_i)^k = 0, i ∈ I_{00},   (4.3)
    (μ^H_{I_+})^k = 0,   (μ^G_{I_{+−}∪I_{0+}∪I_{0−}})^k = 0.   (4.4)


Compute x^{k+1} ∈ R^n as a stationary point of the quadratic program (3.10), and let ((μ^h)^{k+1}, (μ^g)^{k+1}, (μ^H_{I_{0+}∪I_{00}})^{k+1}, (μ^H_{I_{0−}})^{k+1}, (μ^G_{I_{+0}∪I_{00}})^{k+1}) ∈ R^l × R^m_+ × R^{|I_{0+}∪I_{00}|} × R_+^{|I_{0−}|} × R_+^{|I_{+0}∪I_{00}|} be a Lagrange multiplier associated with x^{k+1}. If x^{k+1} does not exist, or if there exists i ∈ I_{0+} ∪ I_{0−} such that G_i(x^{k+1}) = 0, or if there exists i ∈ I_{+0} such that H_i(x^{k+1}) = 0, go to Outer-phase step. Otherwise define the entire μ^{k+1} = ((μ^h)^{k+1}, (μ^g)^{k+1}, (μ^H)^{k+1}, (μ^G)^{k+1}) using (3.11), and then define λ^{k+1} = ((λ^h)^{k+1}, (λ^g)^{k+1}, (λ^H)^{k+1}, (λ^{GH})^{k+1}) according to (3.12)–(3.19). If the condition

    ‖Φ(x^{k+1}, λ^{k+1})‖ ≤ q ‖Φ(x^k, λ^k)‖   (4.5)

is satisfied, increase k by 1 and go to Identification step.

Outer-phase step. If the point (x^k, λ^k) was generated by the Active-set step, set k = k̃, (x^k, λ^k) = (x^k̃, λ^k̃). Compute (x^{k+1}, λ^{k+1}) by the iteration of the outer-phase method, increase k by 1 and go to Identification step.

Theorem 4.1 Let a function f and mappings h, g, H and G be twice differentiable on R^n. Let {(x^k, λ^k)} be an iterative sequence generated by Algorithm 4.1. If all the iterates in this sequence with k large enough are generated by the Active-set step of the algorithm, then

    ‖Φ(x^k, λ^k)‖ → 0  as k → ∞,   (4.6)

and in particular, the primal part of any accumulation point of {(x^k, λ^k)} is a strongly stationary point of (1.1), while the dual part is an associated Lagrange multiplier. Otherwise all the iterates are generated by the Outer-phase steps.

Evidently, if the last assertion of Theorem 4.1 holds, Algorithm 4.1 inherits the global convergence properties of the outer-phase method.

Proof If all the iterates in the sequence {(x^k, λ^k)} with k large enough are generated by the Active-set step of Algorithm 4.1, then (4.5) is fulfilled for all k large enough, which further implies (4.6).

It remains to consider the case when the subsequence {(x^{k_j}, λ^{k_j})} of all iterates generated by Outer-phase steps of Algorithm 4.1 is infinite. By the construction of the algorithm, this subsequence starts with the point (x^1, λ^1). Take any two neighboring points (x^{k_j}, λ^{k_j}) and (x^{k_{j+1}}, λ^{k_{j+1}}) of this subsequence. Then the points (x^{k_j+1}, λ^{k_j+1}), (x^{k_j+2}, λ^{k_j+2}), ..., (x^{k_{j+1}−1}, λ^{k_{j+1}−1}) are generated by the Active-set step, while the attempt to generate the point (x^{k_{j+1}}, λ^{k_{j+1}}) by the Active-set step is rejected. Therefore, all these intermediate points are disregarded, and the algorithm proceeds with the Outer-phase step, setting the current value of the iteration counter equal to k̃ = k_j, and the current iterate equal to (x^k̃, λ^k̃) = (x^{k_j}, λ^{k_j}). Since the next (after (x^{k_j}, λ^{k_j})) iterate generated by the Outer-phase step is (x^{k_{j+1}}, λ^{k_{j+1}}), it holds that k_{j+1} = k_j + 1, and this is true for all j. We thus conclude that the subsequence {(x^{k_j}, λ^{k_j})} coincides with the entire sequence {(x^k, λ^k)}, k = 1, 2, ..., and the needed assertion follows. □
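Before turning to the rate of convergence, the overall control flow of Algorithm 4.1 can be summarized by the following sketch (ours; it omits the identification consistency checks, which in particular force the very first step to be an outer-phase step, and `active_set_step`, `outer_phase_step` and `kkt_residual` are assumed to implement the SQP step (3.10) with the dual update (3.12)–(3.19), the outer-phase iteration, and ‖Φ(·, ·)‖, respectively):

```python
def hybrid_globalization(x, lam, kkt_residual, active_set_step, outer_phase_step,
                         q=0.5, tol=1e-6, max_iter=500):
    """Skeleton of the hybrid strategy of Algorithm 4.1: try an active-set step,
    accept it only under the linear decrease test (4.5), and otherwise fall back
    to the stored backup point and the outer-phase method."""
    from_outer = True
    backup = (x, lam)
    for _ in range(max_iter):
        if kkt_residual(x, lam) < tol:
            break                            # stopping test in the spirit of (5.1)
        if from_outer:
            backup = (x, lam)                # store (x^k~, lam^k~) on entering the phase
        trial = active_set_step(x, lam)      # None if the QP step fails or degenerates
        if trial is not None and kkt_residual(*trial) <= q * kkt_residual(x, lam):
            x, lam = trial                   # linear decrease test (4.5) passed
            from_outer = False
        else:
            x, lam = outer_phase_step(*backup)  # backup safeguard: discard trial iterates
            from_outer = True
    return x, lam
```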


We next establish the quadratic rate of convergence of Algorithm 4.1.

Theorem 4.2 Under Assumptions 1–3, let (x̄, λ̄, μ̄) be an accumulation point of a sequence {(x^k, λ^k, μ^k)} generated by Algorithm 4.1, in which (x^{k+1}, ((μ^h)^{k+1}, (μ^g)^{k+1}, (μ^H_{I_{0+}∪I_{00}})^{k+1}, (μ^H_{I_{0−}})^{k+1}, (μ^G_{I_{+0}∪I_{00}})^{k+1})) is computed as a stationary point–Lagrange multiplier pair of the quadratic program (3.10) closest to (x^k, ((μ^h)^k, (μ^g)^k, (μ^H_{I_{0+}∪I_{00}})^k, (μ^H_{I_{0−}})^k, (μ^G_{I_{+0}∪I_{00}})^k)).
Then the entire sequence {(x^k, λ^k, μ^k)} converges to (x̄, λ̄, μ̄), and the rate of convergence is quadratic.

Proof Consider (x^k, λ^k) close enough to (x̄, λ̄), and let (x^{k+1}, λ^{k+1}) be computed by the Active-set step of Algorithm 4.1 with (x^{k+1}, ((μ^h)^{k+1}, (μ^g)^{k+1}, (μ^H_{I_{0+}∪I_{00}})^{k+1}, (μ^H_{I_{0−}})^{k+1}, (μ^G_{I_{+0}∪I_{00}})^{k+1})) being a stationary point–Lagrange multiplier pair of the quadratic program (3.10) closest to (x^k, ((μ^h)^k, (μ^g)^k, (μ^H_{I_{0+}∪I_{00}})^k, (μ^H_{I_{0−}})^k, (μ^G_{I_{+0}∪I_{00}})^k)) (according to Theorem 3.1, this point is well-defined). From [12, Lemma 2] and [11, Theorem 2] it follows that

    ‖x^k − x̄‖ + ‖λ^k − λ̄^k‖ = O(‖Φ(x^k, λ^k)‖).   (4.7)

Employing Lemma 3.1, we derive the relations

    ‖Φ(x^{k+1}, λ^{k+1})‖ = ‖Φ(x^{k+1}, λ^{k+1}) − Φ(x̄, λ̄^{k+1})‖
    = O(‖(x^{k+1} − x̄, λ^{k+1} − λ̄^{k+1})‖)
    = O(‖(x^k − x̄, λ^k − λ̄^k)‖²)
    = O(‖Φ(x^k, λ^k)‖²),   (4.8)

where the first equality is by the inclusion λ̄^{k+1} ∈ Λ, the second equality is by Lipschitz-continuity of the mapping Φ, the third equality is by (3.24) and (3.27), and the last equality is by (4.7).

Evidently, (4.8) implies (4.5) for any fixed q ∈ (0, 1), provided (x^k, λ^k) is close enough to (x̄, λ̄). This implies that (x^{k+1}, λ^{k+1}) will be accepted by Algorithm 4.1, which will be further working identically to the (local) Algorithm 3.1. The needed result now follows from Theorem 3.1. □

4.2 Hybrid globalization without backup safeguards

The alternative hybrid globalization strategy is described as follows.

Algorithm 4.2

Preliminary step. Fix θ, q ∈ (0, 1). Set k = 0 and choose x^0 ∈ R^n and λ^0 = ((λ^h)^0, (λ^g)^0, (λ^H)^0, (λ^{GH})^0) ∈ R^l × R^m × R^s × R^s. Put φ_rec = ‖Φ(x^0, λ^0)‖.

Identification step is identical to that of Algorithm 4.1.

Active-set step. If (x^k, λ^k) was generated by the Outer-phase step, define μ^k = ((μ^h)^k, (μ^g)^k, (μ^H)^k, (μ^G)^k) according to (4.1)–(4.4).
Compute x^{k+1} ∈ R^n as a stationary point of the quadratic program (3.10), and let ((μ^h)^{k+1}, (μ^g)^{k+1}, (μ^H_{I_{0+}∪I_{00}})^{k+1}, (μ^H_{I_{0−}})^{k+1}, (μ^G_{I_{+0}∪I_{00}})^{k+1}) ∈ R^l × R^m_+ × R^{|I_{0+}∪I_{00}|} × R_+^{|I_{0−}|} × R_+^{|I_{+0}∪I_{00}|} be a Lagrange multiplier associated with x^{k+1}. If x^{k+1} does not exist, or if there exists i ∈ I_{0+} ∪ I_{0−} such that G_i(x^{k+1}) = 0, or if there exists i ∈ I_{+0} such that H_i(x^{k+1}) = 0, go to Outer-phase step. Otherwise, define the entire μ^{k+1} = ((μ^h)^{k+1}, (μ^g)^{k+1}, (μ^H)^{k+1}, (μ^G)^{k+1}) using (3.11), and then define λ^{k+1} = ((λ^h)^{k+1}, (λ^g)^{k+1}, (λ^H)^{k+1}, (λ^{GH})^{k+1}) according to (3.12)–(3.19). If the condition

    ‖Φ(x^{k+1}, λ^{k+1})‖ ≤ q φ_rec   (4.9)

is satisfied, put φ_rec = ‖Φ(x^{k+1}, λ^{k+1})‖, increase k by 1 and go to Identification step.

Outer-phase step. Compute (x^{k+1}, λ^{k+1}) by the iteration of the outer-phase method. If

    ‖Φ(x^{k+1}, λ^{k+1})‖ ≤ φ_rec,   (4.10)

put φ_rec = ‖Φ(x^{k+1}, λ^{k+1})‖. Increase k by 1 and go to Identification step.
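In code, the only state carried across iterations beyond (x^k, λ^k) is the scalar record; the two acceptance tests reduce to the following sketch (ours; `trial_res` stands for ‖Φ(x^{k+1}, λ^{k+1})‖):

```python
def accept_active_set(trial_res, phi_rec, q=0.5):
    """Test (4.9): an active-set trial point is accepted, and the record is
    updated, only if its residual is at most q times the record value."""
    return trial_res <= q * phi_rec

def update_record_outer(trial_res, phi_rec):
    """Test (4.10): after an outer-phase step the record is replaced by the new
    residual whenever the latter does not exceed it; it is never increased."""
    return min(trial_res, phi_rec)
```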


Global convergence properties of this conceptual algorithm are again quite transparent.

Theorem 4.3 Let a function f and mappings h, g, H and G be twice differentiable on R^n. Let {(x^k, λ^k)} be an iterative sequence generated by Algorithm 4.2. If the iterates (x^{k_j}, λ^{k_j}) in this sequence are generated by the Active-set step of the algorithm for infinitely many j, then

    ‖Φ(x^{k_j}, λ^{k_j})‖ → 0  as j → ∞,   (4.11)

and in particular, the primal part of any accumulation point of {(x^{k_j}, λ^{k_j})} is a strongly stationary point of (1.1), while the dual part is an associated Lagrange multiplier. Otherwise, all the iterates with sufficiently large numbers are generated by the Outer-phase steps.

Again, if the last assertion of Theorem 4.3 holds, Algorithm 4.2 inherits the global convergence properties of the outer-phase method.

Proof Suppose first that there exists an infinite subsequence {(x^{k_j}, λ^{k_j})} of iterates generated by the Active-set step. By the construction of the algorithm, all these iterates satisfy the acceptance test (4.9) (for k = k_j − 1), and the record value φ_rec is decreased by the factor q infinitely many times (and never increased). Therefore, there exists a sequence {φ_{k_j}} of nonnegative reals, convergent to zero and such that

    ‖Φ(x^{k_j}, λ^{k_j})‖ ≤ q φ_{k_j}

holds for all j. This implies (4.11).
Since any iterate in {(x^k, λ^k)} is generated either by the Active-set step or by the Outer-phase step of Algorithm 4.2, the only alternative to the existence of an infinite subsequence generated by Active-set steps is that all the iterates with sufficiently large numbers are generated by the Outer-phase steps. This completes the proof. □


Theorem 4.4 Under Assumptions 1–3, let a sequence {(x^k, λ^k, μ^k)} be generated by Algorithm 4.2, in which (x^{k+1}, ((μ^h)^{k+1}, (μ^g)^{k+1}, (μ^H_{I_{0+}∪I_{00}})^{k+1}, (μ^H_{I_{0−}})^{k+1}, (μ^G_{I_{+0}∪I_{00}})^{k+1})) is computed as a stationary point–Lagrange multiplier pair of the quadratic program (3.10) closest to (x^k, ((μ^h)^k, (μ^g)^k, (μ^H_{I_{0+}∪I_{00}})^k, (μ^H_{I_{0−}})^k, (μ^G_{I_{+0}∪I_{00}})^k)), and let this sequence be convergent to (x̄, λ̄, μ̄).
Then the rate of convergence is quadratic.

Proof If the record value φ_rec changes infinitely many times, we can consider (x^k, λ^k) close enough to (x̄, λ̄) and such that φ_rec = ‖Φ(x^k, λ^k)‖. Then we argue identically to the proof of Theorem 4.2, and we conclude that (x^{k+1}, λ^{k+1}) computed by the Active-set step of Algorithm 4.2 will be accepted, and the algorithm will be further working identically to the (local) Algorithm 3.1. The needed result then follows from Theorem 3.1.
Alternatively, assuming that the record value φ_rec > 0 is constant from some point on (which of course subsumes that Active-set steps are never accepted from some point on), we get a contradiction: {‖Φ(x^k, λ^k)‖} tends to 0 (since {(x^k, λ^k)} tends to (x̄, λ̄), and Φ(x̄, λ̄) = 0), and therefore, (4.10) will necessarily hold as a strict inequality for a sufficiently large k, whatever the fixed value φ_rec might be, and the record would have to be changed. □

The results obtained in this section for Algorithm 4.2 are somewhat weaker than their counterparts for Algorithm 4.1. However, these results still constitute reasonable convergence and rate of convergence properties. In particular, Theorem 4.3 implies that in the case of infinitely many accepted Active-set steps, the run of Algorithm 4.2 will be successful, provided the stopping criterion consists of driving the KKT residual below a given tolerance.

4.3 Hybrid globalization by the semismooth Newton method

A natural (at least theoretically) candidate for the outer-phase method in Algorithms 4.1 and 4.2 is the semismooth Newton method (SNM) for the equation Φ_FB(x, λ) = 0, globalized by means of a linesearch for the merit function

    φ_FB(x, λ) = (1/2)‖Φ_FB(x, λ)‖²,

where x ∈ R^n, λ ∈ R^l × R^m × R^s × R^s. This outer-phase method is in the spirit of the algorithm for complementarity problems in [6], or more precisely, of its variant considered in [22, Algorithm 4.1]. This choice is theoretically attractive for the following reason: using it in Algorithms 4.1 and 4.2, where one must in this case take Φ = Φ_FB, ensures monotonicity of the sequence {‖Φ_FB(x^k, λ^k)‖}, and hence, no backup safeguards are needed in this version of the algorithm: as will be demonstrated below, global convergence can be proved without such safeguards. Recall (see [8, Proposition 9.1.4]) that twice continuous differentiability of f, h, g, H and G implies continuous differentiability of φ_FB, and


    φ'_FB(x, λ) = J^T Φ_FB(x, λ)  ∀J ∈ ∂Φ_FB(x, λ), ∀x ∈ R^n, ∀λ ∈ R^l × R^m × R^s × R^s,   (4.12)

where ∂Φ_FB(x, λ) stands for Clarke's generalized Jacobian of Φ_FB at (x, λ) [5].

Algorithm 4.3

Preliminary step. Fix θ, q, ε, τ ∈ (0, 1) and M, γ > 0. Set k = 0 and choose x^0 ∈ R^n and λ^0 = ((λ^h)^0, (λ^g)^0, (λ^H)^0, (λ^{GH})^0) ∈ R^l × R^m × R^s × R^s.

Identification step is identical to that of Algorithm 4.1.

Active-set step. If (x^k, λ^k) was generated by the Outer-phase step, define μ^k = ((μ^h)^k, (μ^g)^k, (μ^H)^k, (μ^G)^k) according to (4.1)–(4.4).
Compute x^{k+1} ∈ R^n as a stationary point of the quadratic program (3.10), and let ((μ^h)^{k+1}, (μ^g)^{k+1}, (μ^H_{I_{0+}∪I_{00}})^{k+1}, (μ^H_{I_{0−}})^{k+1}, (μ^G_{I_{+0}∪I_{00}})^{k+1}) ∈ R^l × R^m_+ × R^{|I_{0+}∪I_{00}|} × R_+^{|I_{0−}|} × R_+^{|I_{+0}∪I_{00}|} be a Lagrange multiplier associated with x^{k+1}. If x^{k+1} does not exist, or if there exists i ∈ I_{0+} ∪ I_{0−} such that G_i(x^{k+1}) = 0, or if there exists i ∈ I_{+0} such that H_i(x^{k+1}) = 0, go to Outer-phase step. Otherwise, define the entire μ^{k+1} = ((μ^h)^{k+1}, (μ^g)^{k+1}, (μ^H)^{k+1}, (μ^G)^{k+1}) using (3.11), and then define λ^{k+1} = ((λ^h)^{k+1}, (λ^g)^{k+1}, (λ^H)^{k+1}, (λ^{GH})^{k+1}) according to (3.12)–(3.19). If the condition (4.5) with Φ = Φ_FB is satisfied, increase k by 1 and go to Identification step.

Outer-phase step.
SNM direction computation. Compute some J_k ∈ ∂Φ_FB(x^k, λ^k). Compute d^k ∈ R^n × (R^l × R^m × R^s × R^s) as a solution of the linear system

    J_k d^k = −Φ_FB(x^k, λ^k).   (4.13)

If d^k is well-defined and

    ‖d^k‖ ≤ max{M, 1/‖Φ_FB(x^k, λ^k)‖^γ},   (4.14)

go to Linesearch step.
Gradient direction computation. Set d^k = −φ'_FB(x^k, λ^k).
Linesearch step. Compute the stepsize parameter α_k according to the Armijo rule: set α_k = τ^j, where j is the smallest nonnegative integer such that

    φ_FB((x^k, λ^k) + τ^j d^k) ≤ φ_FB(x^k, λ^k) + ε τ^j ⟨φ'_FB(x^k, λ^k), d^k⟩.   (4.15)

Set (x^{k+1}, λ^{k+1}) = (x^k, λ^k) + α_k d^k, increase k by 1 and go to Identification step.
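The Outer-phase step admits the following compact sketch (our illustration, not the authors' Matlab code; `Phi` and `Phi_jac` are assumed to evaluate Φ_FB and some element of its generalized Jacobian at the stacked iterate z = (x, λ), with φ'_FB obtained via (4.12)):

```python
import numpy as np

def snm_outer_step(z, Phi, Phi_jac, eps=1e-4, tau=0.5, M=1e5, gamma=1.0,
                   min_step=1e-17):
    """One outer-phase step of Algorithm 4.3: SNM direction (4.13) with the
    safeguard (4.14), gradient-direction fallback, and Armijo linesearch (4.15)."""
    r = Phi(z)
    J = Phi_jac(z)                      # some J_k in the generalized Jacobian
    grad = J.T @ r                      # phi'_FB(z) by (4.12)
    try:
        d = np.linalg.solve(J, -r)      # SNM direction (4.13)
        if np.linalg.norm(d) > max(M, 1.0 / np.linalg.norm(r) ** gamma):
            d = -grad                   # test (4.14) failed: gradient direction
    except np.linalg.LinAlgError:
        d = -grad                       # inconsistent system: gradient direction
    phi0, slope = 0.5 * (r @ r), grad @ d
    alpha = 1.0
    while 0.5 * np.sum(Phi(z + alpha * d) ** 2) > phi0 + eps * alpha * slope:
        alpha *= tau                    # Armijo backtracking (4.15)
        if alpha < min_step:
            raise RuntimeError("linesearch failed")  # declared a failure in Sect. 5
    return z + alpha * d
```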

In test (4.14), the parameters M and γ should be chosen large in order to allow for the acceptance of more Newton directions. This test can actually be checked within the inner solver for the system (4.13), and its role is the same as detecting inconsistency of this system. In such cases, the algorithm resorts to the gradient direction and proceeds.

Theorem 4.5 Let a function f and mappings h, g, H and G be twice continuously differentiable on R^n. Let {(x^k, λ^k)} be an iterative sequence generated by Algorithm 4.3.
Then any accumulation point (x̄, λ̄) of this sequence satisfies the equality

    φ'_FB(x̄, λ̄) = 0.   (4.16)


Moreover, if there exists an infinite subsequence of {(x^k, λ^k)} such that all the iterates in this subsequence are generated by the Active-set step of the algorithm, then (4.6) holds. In that case, the primal part of any accumulation point of {(x^k, λ^k)} is a strongly stationary point of (1.1), while the dual part is an associated Lagrange multiplier.

Proof If the iterates (x^{k_j}, λ^{k_j}) in the sequence {(x^k, λ^k)} are generated by the Active-set step of Algorithm 4.3 for infinitely many j, then (4.5) holds for k = k_j − 1, and hence, the nonincreasing sequence {‖Φ_FB(x^k, λ^k)‖} tends to 0. Therefore, any accumulation point (x̄, λ̄) of {(x^k, λ^k)} satisfies Φ_FB(x̄, λ̄) = 0, and in particular, it is a global minimizer of φ_FB, which evidently implies (4.16).
It remains to consider the case when (x^k, λ^k) is generated by the Outer-phase step of Algorithm 4.3 for all k large enough. Then one just needs to literally repeat the proof in [22, Theorem 4.1]. □

Remark 4.1 According to (4.12), for any J ∈ ∂Φ_FB(x̄, λ̄) relation (4.16) is equivalent to the equality

    J^T Φ_FB(x̄, λ̄) = 0.   (4.17)

In particular, if ∂Φ_FB(x̄, λ̄) contains at least one nonsingular matrix, then (4.16) implies the equality Φ_FB(x̄, λ̄) = 0, which in its turn means that x̄ is a strongly stationary point of MPVC (1.1), while λ̄ is an associated Lagrange multiplier. Sufficient conditions for the existence of a nonsingular matrix in the generalized Jacobian were derived in [9, Corollary 3.14 and Proposition 3.15]. For brevity, define the mapping g̃ : R^n → R^m × R^s × R^s,

    g̃(x) = (g(x), −H(x), G_1(x)H_1(x), ..., G_s(x)H_s(x)),

set μ̃ = (λ̄^g, λ̄^H, λ̄^{GH}), and define the index sets

    A = A(x̄, μ̃) = {i = 1, ..., m + 2s | g̃_i(x̄) = 0, μ̃_i ≥ 0},
    A_+ = A_+(x̄, μ̃) = {i ∈ A | μ̃_i > 0}.

The first condition reads as follows:

    rank ( h'(x̄) ; g̃'_{A_+}(x̄) ) = l + |A_+|,

and

    ⟨∂²L/∂x²(x̄, λ̄)ξ, ξ⟩ > 0  ∀ξ ∈ (ker h'(x̄) ∩ ker g̃'_{A_+}(x̄)) \ {0}.   (4.18)

The second condition is a slight modification of the first one:  

    rank ( h'(x̄) ; g̃'_A(x̄) ) = l + |A|,

and

    ⟨∂²L/∂x²(x̄, λ̄)ξ, ξ⟩ > 0  ∀ξ ∈ (ker h'(x̄) ∩ ker g̃'_A(x̄)) \ {0}.   (4.19)


Observe, however, that if for some index i ∈ {1, ..., s} it holds that H_i(x̄) = 0, λ̄^H_i ≥ 0 and λ̄^{GH}_i ≥ 0, then (4.19) cannot hold, since the gradients H'_i(x̄) and (G_i(x)H_i(x))'|_{x=x̄} = G_i(x̄)H'_i(x̄) are linearly dependent. Moreover, if λ̄^H_i > 0 and λ̄^{GH}_i > 0, then (4.18) cannot hold as well.

The needed conclusion remains valid under the weaker assumption that for any v ∈ R^n × (R^l × R^m × R^s × R^s) there exist J ∈ ∂Φ_FB(x̄, λ̄) and u ∈ R^n × (R^l × R^m × R^s × R^s) such that Ju = v (that is, there is no need to have a single such matrix for all v), which is further implied by the assumption that the directional derivative Φ'_FB((x̄, λ̄); ·) is a surjective mapping (see [8, Proposition 7.1.17]). Indeed, taking v = Φ_FB(x̄, λ̄), for the corresponding J and u we derive from (4.17)

    0 = ⟨J^T Φ_FB(x̄, λ̄), u⟩ = ⟨Φ_FB(x̄, λ̄), Ju⟩ = ⟨Φ_FB(x̄, λ̄), Φ_FB(x̄, λ̄)⟩ = ‖Φ_FB(x̄, λ̄)‖²,

implying that Φ_FB(x̄, λ̄) = 0.

Finally, let us mention the following set of conditions, which also implies that a pair (x̄, λ̄) satisfying (4.16) necessarily satisfies the equality Φ_FB(x̄, λ̄) = 0, as established in [9, Theorem 4.3]: the matrix ∂²L/∂x²(x̄, λ̄) is positive semidefinite on ker h'(x̄),

    ⟨∂²L/∂x²(x̄, λ̄)ξ, ξ⟩ > 0  ∀ξ ∈ (ker h'(x̄) ∩ ker g̃'(x̄)) \ {0},

and rank h'(x̄) = l, or h is affine and the feasible set is nonempty. (In [9, Theorem 4.3] it is assumed that ∂²L/∂x²(x̄, λ̄) is positive semidefinite on the entire R^n, but the proof there suggests that this assumption can be weakened as above.)

The proof of the quadratic convergence rate literally repeats the proof of Theorem 4.2 with Algorithm 4.1 substituted by Algorithm 4.3.

Theorem 4.6 Under Assumptions 1–3, let (x̄, λ̄, μ̄) be an accumulation point of a sequence {(x^k, λ^k, μ^k)} generated by Algorithm 4.3, in which (x^{k+1}, ((μ^h)^{k+1}, (μ^g)^{k+1}, (μ^H_{I_{0+}∪I_{00}})^{k+1}, (μ^H_{I_{0−}})^{k+1}, (μ^G_{I_{+0}∪I_{00}})^{k+1})) is computed as a stationary point–Lagrange multiplier pair of the quadratic program (3.10) closest to (x^k, ((μ^h)^k, (μ^g)^k, (μ^H_{I_{0+}∪I_{00}})^k, (μ^H_{I_{0−}})^k, (μ^G_{I_{+0}∪I_{00}})^k)).
Then the entire sequence {(x^k, λ^k, μ^k)} converges to (x̄, λ̄, μ̄), and the rate of convergence is quadratic.

We complete this section with the following observation. Apart from the evident relations of Algorithm 4.3 with Algorithm 4.1, the former can also be regarded as a particular case of Algorithm 4.2 with the specific outer-phase method. Indeed, each Outer-phase step of Algorithm 4.3 reduces the value of φ_FB, and hence, the sequence {‖Φ_FB(x^k, λ^k)‖} in this algorithm is necessarily nonincreasing. This implies that for each k, (4.10) will necessarily hold for the record value φ_rec defined as in Algorithm 4.2, and φ_rec will always be equal to the current residual ‖Φ_FB(x^k, λ^k)‖. Therefore, (4.9) will always be the same as (4.5).


5 Numerical results

In this section we present our preliminary numerical experience with Algorithm 4.3. The first part of our experiments is intended to demonstrate the use of the Active-set step in this algorithm, while the second part compares Algorithm 4.3 with some alternatives.

We employed the set of 23 test problems taken from all available publications on MPVC. All these problems are very small academic examples, with the only exception of the "Ten-bar Truss" problem taken from [2, Sect. 6.2]. The latter problem arises from truss topology design, and it has 18 variables, 8 bilinear equality constraints, 21 linear inequality constraints, and 10 nonlinear inequality constraints modelling the vanishing stress restrictions.

We use the following abbreviations for the algorithms under consideration:

• SNM-AS is Algorithm 4.3 as it is.
• SNM is Algorithm 4.3 with the Active-set step skipped (i.e., this is the SNM for the Fischer–Burmeister function-based reformulation of the KKT system (3.1)–(3.4) equipped with the linesearch procedure).
• SNM-AS-SR1 is the quasi-Newton version of Algorithm 4.3 where the true Hessians of the Lagrangians are replaced by SR1 approximations (see, e.g., [24, p. 144]).
• SNM-AS-BFGS is the same as SNM-AS-SR1 but with SR1 approximations replaced by BFGS approximations (see, e.g., [24, p. 140]).
• SQP-BFGS is the SQP algorithm applied directly to MPVC (1.1), with BFGS approximations of the Hessian of the Lagrangian complemented by Powell's modification [24, pp. 536, 537], and equipped with a linesearch for the l1-penalty function. Essentially this is a simple Matlab implementation of [24, Algorithm 18.3] with generic parameters, without any tools for tackling possible infeasibility of subproblems, and without any tools for avoiding the Maratos effect.

Observe that the use of BFGS or SR1 approximations of the Hessian of the Lagrangian is not theoretically justified in Algorithm 4.3, since the direction d^k computed at the Outer-phase step of this algorithm employing such approximations may fail to be a descent direction for the merit function φ_FB at (x^k, λ^k). However, these quasi-Newton versions of Algorithm 4.3 turn out to be not much less robust than our simple implementation of SQP-BFGS.

The parameters of Algorithm 4.3 were chosen as follows: θ = 0.5, q = 0.5, ε = 10^{−4}, τ = 0.5, M = 10^5 and γ = 1. We used the stopping criterion of the form

    ‖Φ_FB(x^k, λ^k)‖ < 10^{−6}.   (5.1)

At the same time, the Identification step of Algorithm 4.3 was performed with Φ = Φ_NR. Failure was declared when this criterion was not achieved after 500 iterations, or when in the process of backtracking the trial value of the stepsize parameter became smaller than 10^{−17}, or when the method in question failed to make the current step, for whatever reason.

All computations were performed in the Matlab environment. QP subproblems were solved by the active-set algorithm implemented in the built-in Matlab QP solver quadprog.


Fig. 1 Iterations

For each test problem, we performed 100 runs of each algorithm being tested from the same sample of randomly generated starting points. Primal starting points were generated in a cubic neighborhood around the solution, with the edge of the cube equal to 20. Dual starting points for all algorithms were generated the same way as primal ones but around 0, and with an additional nonnegativity restriction for the components corresponding to inequality constraints. In the case of a successful run, convergence to the optimal objective function value was declared when the absolute value of the difference between the latter and the objective function value achieved at the last iterate was less than 10^{−4}.

Figure 1 reports on the average major and minor iteration counts for SNM and SNM-AS over successful runs, in the form of a so-called performance profile. For SNM-AS, by major iterations we mean the number of quadprog runs at the Active-set step added to the number of linear systems solved at the Outer-phase step. For SNM, by major iterations we mean the number of linear systems solved. By the number of minor iterations for all algorithms we essentially mean the number of linear systems solved (the linear programming problem solved within quadprog in order to identify a feasible starting point was also counted as one minor iteration). Therefore, for SNM the minor and major iteration counts are the same. However, the Active-set step of Algorithm 4.3 generally has QP subproblems with inequality constraints, and the number of minor iterations performed by quadprog in SNM-AS is generally larger than the number of major iterations.

For each algorithm, we plot the function F : [1, +∞) → [0, 1] defined as follows. For a given algorithm, let k_p stand for the result (in Fig. 1, the average numbers of major and minor iterations per one successful run), and let s_p stand for the portion of successful runs for the p-th problem in the test set. If s_p = 0, we put k_p = +∞, since failure is regarded as infinitely many times worse than any other result. Let r_p be the best result k_p over all algorithms being compared. Then we set

    F(τ) = (1/P) Σ_{p∈R(τ)} s_p,


Fig. 2 Evaluations of constraints

Fig. 3 Evaluations of derivatives and convergence to optimal value

where P is the number of problems in the test set (in our case, P = 23), while R(τ) consists of the test problems for which the result is no more than τ times worse than the best result:

    R(τ) = {p = 1, ..., P | k_p ≤ τ r_p},  τ ∈ [1, ∞).

Such performance profiles generalize the original construction suggested in [7] to the case of multiple runs for each test problem. The value F(1) characterizes the pure relative efficiency of the algorithm: it corresponds to the portion of problems for which the algorithm demonstrated the best result. The value F(τ) for large τ characterizes the pure robustness of the algorithm: it corresponds to the probability of a successful run.

In Figs. 2 and 3(a), the "result" of each algorithm for a given test problem is the average number of evaluations of constraints and derivatives per one successful run, respectively. In Fig. 3(b), the "result" is the inverse of the number of cases of convergence to the optimal objective function value. Note that this result equals +∞ when the given algorithm never converged to the optimal objective function value for a given problem, and this adds to the cases of failure.
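The generalized profile is easy to reproduce; a sketch follows (our interface: k[a][p] is the result of algorithm a on problem p, s[a][p] the portion of successful runs):

```python
import numpy as np

def performance_profile(k, s, taus):
    """Generalized performance profile F(tau) for multiple runs per problem,
    as described above. Returns one list of F values per algorithm."""
    k, s = np.asarray(k, dtype=float), np.asarray(s, dtype=float)
    k[s == 0.0] = np.inf                  # failure: infinitely worse than any result
    r = k.min(axis=0)                     # best result r_p over all algorithms
    P = k.shape[1]
    profiles = []
    for a in range(k.shape[0]):
        # F(tau) = (1/P) * sum of s_p over problems with k_p <= tau * r_p
        profiles.append([s[a, k[a] <= tau * r].sum() / P for tau in taus])
    return profiles
```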


Fig. 4 Iterations

Fig. 5 Evaluations of constraints

All Figs. 1–3 demonstrate similar behavior: the Active-set step evidently improves both robustness and efficiency of the algorithm in question, and even improves its ability to converge to a solution (rather than to a non-optimal stationary point).

Figures 4–6 were plotted similarly to Figs. 1–3, but for the algorithms SNM-AS, SNM-AS-SR1, SNM-AS-BFGS and SQP-BFGS. For SNM-AS-SR1 and SNM-AS-BFGS, major iterations are understood in the same way as for SNM-AS. However, by major iterations for SQP-BFGS we mean the number of quadprog runs. One can see from Fig. 4 that the efficiency of SNM-AS and SQP-BFGS in terms of major iteration counts is similar, while both seriously outperform the other algorithms. At the same time, SNM-AS outperforms the other algorithms in efficiency in terms of minor iteration counts, while SNM-AS-SR1 and SNM-AS-BFGS (whose iterations are free from computing true second derivatives) behave similarly to SQP-BFGS. SNM-AS also outperforms the other algorithms in robustness. However, Figs. 5 and 6 demonstrate that SQP-BFGS outperforms the other algorithms in terms of evaluations of constraints and derivatives, and slightly outperforms them in terms of convergence to the optimal objective function value.


Fig. 6 Evaluations of derivatives and convergence to optimal value

Finally, note that there were runs in which some algorithms achieved the stopping criterion (5.1) even in the absence of strongly stationary accumulation points, with the dual iterates tending to infinity. The problem taken from [16, Example 6.1] has only weakly stationary points, and all of them actually solve the problem. For this problem, all runs of SNM-AS and SNM failed; however, SNM-AS-SR1 successfully terminated at a weakly stationary point for 47% of runs, SNM-AS-BFGS for 7% of runs, and SQP-BFGS for 99% of runs.

To summarize, the preliminary numerical experience presented in this section demonstrates that the approach of this paper is quite promising, and the resulting algorithms can be competitive with respect to their reasonable alternatives.

Acknowledgements The authors thank the anonymous referees for useful comments. In particular, Remark 4.1 is in response to a question raised by one of the referees.

References

1. Achtziger, W., Kanzow, C.: Mathematical programs with vanishing constraints: optimality conditions and constraint qualifications. Math. Program. 114, 69–99 (2007)
2. Achtziger, W., Hoheisel, T., Kanzow, C.: A smoothing-regularization approach to mathematical programs with vanishing constraints. Preprint 284, Institute of Mathematics, University of Würzburg (2008)
3. Bonnans, J.F.: Local analysis of Newton-type methods for variational inequalities and nonlinear programming. Appl. Math. Optim. 29, 161–186 (1994)
4. Chen, Y.D., Gao, Y., Liu, Y.-J.: An inexact SQP Newton method for convex SC1 minimization problems. J. Optim. Theory Appl. 146, 33–49 (2010)
5. Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley, New York (1983)
6. De Luca, T., Facchinei, F., Kanzow, C.: A theoretical and numerical comparison of some semismooth algorithms for complementarity problems. Comput. Optim. Appl. 16, 173–205 (2000)
7. Dolan, E., Moré, J.: Benchmarking optimization software with performance profiles. Math. Program. 91, 201–213 (2002)
8. Facchinei, F., Pang, J.-S.: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer, New York (2003)
9. Facchinei, F., Fischer, A., Kanzow, C.: Regularity properties of a semismooth reformulation of variational inequalities. SIAM J. Optim. 8, 850–869 (1998)
10. Facchinei, F., Fischer, A., Kanzow, C.: On the accurate identification of active constraints. SIAM J. Optim. 9, 14–32 (1999)


11. Fischer, A.: Local behaviour of an iterative framework for generalized equations with nonisolated solutions. Math. Program. 94, 91–124 (2002)
12. Hager, W.W., Gowda, M.S.: Stability in the presence of degeneracy and error estimation. Math. Program. 85, 181–192 (1999)
13. Hoheisel, T., Kanzow, C.: First- and second-order optimality conditions for mathematical programs with vanishing constraints. Appl. Math. 52, 495–514 (2007)
14. Hoheisel, T., Kanzow, C.: Stationarity conditions for mathematical programs with vanishing constraints using weak constraint qualifications. J. Math. Anal. Appl. 337, 292–310 (2008)
15. Hoheisel, T., Kanzow, C.: On the Abadie and Guignard constraint qualifications for mathematical programs with vanishing constraints. Optimization 58, 431–448 (2009)
16. Hoheisel, T., Kanzow, C., Outrata, J.: Exact penalty results for mathematical programs with vanishing constraints. Nonlinear Anal. 72, 2514–2526 (2010)
17. Hoheisel, T., Kanzow, C., Schwartz, A.: Convergence of a local regularization approach for mathematical programs with complementarity or vanishing constraints. Optim. Methods Softw. doi:10.1080/10556788.2010.535170
18. Izmailov, A.F., Pogosyan, A.L.: Optimality conditions and Newton-type methods for mathematical programs with vanishing constraints. Comput. Math. Math. Phys. 49, 1128–1140 (2009)
19. Izmailov, A.F., Pogosyan, A.L.: On active-set methods for mathematical programs with vanishing constraints. In: Bereznyov, V.A. (ed.) Theoretical and Applied Problems of Nonlinear Analysis, pp. 18–49. CCAS, Moscow (2009) (in Russian)
20. Izmailov, A.F., Solodov, M.V.: An active-set Newton method for mathematical programs with complementarity constraints. SIAM J. Optim. 19, 1003–1027 (2008)
21. Izmailov, A.F., Solodov, M.V.: Mathematical programs with vanishing constraints: optimality conditions, sensitivity, and a relaxation method. J. Optim. Theory Appl. 142, 501–532 (2009)
22. Izmailov, A.F., Pogosyan, A.L., Solodov, M.V.: Semismooth Newton method for the lifted reformulation of mathematical programs with complementarity constraints. Comput. Optim. Appl. 51, 199–221 (2012)
23. Luo, Z.-Q., Pang, J.-S., Ralph, D.: Mathematical Programs with Equilibrium Constraints. Cambridge Univ. Press, Cambridge (1996)
24. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, New York (2006)
25. Outrata, J.V., Kocvara, M., Zowe, J.: Nonsmooth Approach to Optimization Problems with Equilibrium Constraints: Theory, Applications and Numerical Results. Kluwer Academic, Boston (1998)
26. Tseng, P.: Growth behavior of a class of merit functions for the nonlinear complementarity problem. J. Optim. Theory Appl. 89, 17–37 (1996)