CRITICAL SOLUTIONS OF NONLINEAR EQUATIONS: LOCAL ATTRACTION FOR NEWTON-TYPE METHODS∗

A. F. Izmailov^{1,2}, A. S. Kurennoy^3, and M. V. Solodov^4

March 28, 2016

ABSTRACT

We show that if the equation mapping is 2-regular at a solution in some nonzero direction in the null space of its Jacobian (in which case this solution is critical; in particular, the local Lipschitzian error bound does not hold), then this direction defines a star-like domain with nonempty interior from which the iterates generated by a certain class of Newton-type methods necessarily converge to the solution in question. This is despite the solution being degenerate, and possibly non-isolated (so that there are other solutions nearby). In this sense, Newtonian iterates are attracted to the specific (critical) solution. Those results are related to the ones due to A. Griewank for the basic Newton method but are also applicable, for example, to some methods developed specially for tackling the case of potentially non-isolated solutions, including the Levenberg–Marquardt and the LP-Newton methods for equations, and the stabilized sequential quadratic programming for optimization.
Key words: Newton method, critical solutions, 2-regularity, Levenberg–Marquardt method, linear-programming-Newton method, stabilized sequential quadratic programming.

AMS subject classifications: 90C33, 65K10, 49J53.

∗ This research is supported by the Russian Foundation for Basic Research Grant 14-01-00113, by the Russian Science Foundation Grant 15-11-10021, by CNPq Grants 303724/2015-3 and PVE 401119/2014-9, by FAPERJ, and by VolkswagenStiftung Grant 115540.
1 VMK Faculty, OR Department, Lomonosov Moscow State University (MSU), Uchebniy Korpus 2, Leninskiye Gory, 119991 Moscow, Russia.
2 Peoples’ Friendship University of Russia, Miklukho-Maklaya Str. 6, 117198 Moscow, Russia. Email:
[email protected]
3 Department of Mathematics, Physics and Computer Sciences, Derzhavin Tambov State University (TSU), Internationalnaya 33, 392000 Tambov, Russia. Email:
[email protected]
4 IMPA – Instituto de Matemática Pura e Aplicada, Estrada Dona Castorina 110, Jardim Botânico, Rio de Janeiro, RJ 22460-320, Brazil. Email:
[email protected]

1 Introduction
Consider a nonlinear equation
\[
\Phi(u) = 0, \tag{1.1}
\]
where the mapping $\Phi : \mathbb{R}^p \to \mathbb{R}^p$ is smooth enough (precise smoothness assumptions will be stated later as needed). This paper is concerned with convergence properties of Newton-type methods for solving the equation (1.1) when it has a singular solution $\bar u$ (i.e., the matrix $\Phi'(\bar u)$ is singular). Of particular interest is the difficult case when $\bar u$ may be a non-isolated solution of (1.1). Note that if $\bar u$ is a non-isolated solution, it is necessarily singular.

To describe the class of methods in question, we define the perturbed Newton method (pNM) framework for equation (1.1) as follows. For the given iterate $u^k \in \mathbb{R}^p$, the next iterate is $u^{k+1} = u^k + v^k$, with $v^k$ satisfying the following linear equation in $v$:
\[
\Phi(u^k) + (\Phi'(u^k) + \Omega(u^k)) v = \omega(u^k). \tag{1.2}
\]
In (1.2), the mappings $\Omega : \mathbb{R}^p \to \mathbb{R}^{p \times p}$ and $\omega : \mathbb{R}^p \to \mathbb{R}^p$ are certain perturbation terms which may have different roles, and individually or collectively define specific methods within the general pNM framework. In particular, if $\Omega \equiv 0$ and $\omega \equiv 0$, then (1.2) reduces to the iteration system of the basic Newton method. The most common setting would be for $\Omega$ to characterize a perturbation of the iteration matrix of the basic Newton method (i.e., the difference between the matrix a given method actually employs and the exact Jacobian $\Phi'(u^k)$), and for $\omega$ to account for possible inexactness in solving the corresponding linear system of equations. An example of this setting is the stabilized Newton–Lagrange method (stabilized sequential quadratic programming for equality-constrained optimization), considered in Section 3.3 below. However, we emphasize that our framework is not restricted to this situation. In particular, subproblems of a given method need not even be systems of linear equations, as long as they can be related to (1.2) a posteriori. One example is the linear-programming-Newton (LP-Newton) method discussed in Section 3.2 below, which solves linear programs, and for which the perturbation term $\omega$ is implicit (i.e., it does not have an explicit analytical formula, but its properties are known).

In this respect, we also comment that the way we shall employ the perturbation mappings $\Omega$ and $\omega$ is somewhat unusual, in the following sense. They may describe a given method not necessarily on a whole neighborhood of a solution of interest, but possibly only in some relevant star-like domain of convergence; see the discussion of the Levenberg–Marquardt method in Section 3.1 and of the LP-Newton method in Section 3.2. This, however, is exactly what is needed in the presented convergence analysis, as it is shown that the generated iterates do in fact stay within the domain in question and, within this set, the $\Omega$ and $\omega$ that we construct do adequately represent the given algorithms.

Finally, we note that the specific methods that we consider in this paper have been designed to tackle the difficult cases when (1.1) has degenerate/non-isolated solutions, and in this sense the perturbation terms in (1.2) that describe these methods can be regarded as “structural”, i.e., introduced intentionally for improving convergence properties in the degenerate cases. The assumptions imposed on $\Omega$ and $\omega$ are only related to their “size”, which allows $\omega$ to naturally cover precision control when the subproblems are solved approximately. However, as already commented, the use of $\omega$ can also be quite different.
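As a purely illustrative sketch (not part of the analysis), one pNM step (1.2) can be implemented in a few lines; the mapping and the perturbation terms below are placeholders chosen only for the demonstration, with $\Omega \equiv 0$ and $\omega \equiv 0$ recovering the basic Newton method.

```python
import numpy as np

def pnm_step(Phi, dPhi, u, Omega=None, omega=None):
    """One pNM step: solve Phi(u) + (Phi'(u) + Omega(u)) v = omega(u)
    for v and return u + v. With Omega and omega absent (taken as zero),
    this is exactly the basic Newton method."""
    p = u.size
    Om = Omega(u) if Omega is not None else np.zeros((p, p))
    om = omega(u) if omega is not None else np.zeros(p)
    v = np.linalg.solve(dPhi(u) + Om, om - Phi(u))
    return u + v

# Basic Newton (Omega = omega = 0) on a regular toy equation u^2 = (1, 4):
Phi = lambda u: u**2 - np.array([1.0, 4.0])
dPhi = lambda u: np.diag(2.0 * u)
u = np.array([3.0, 3.0])
for _ in range(20):
    u = pnm_step(Phi, dPhi, u)
# u is now close to the solution (1, 2)
```

Passing a nonzero `Omega` or `omega` to `pnm_step` then models the structural perturbations discussed above.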
In the analysis of the LP-Newton method in Section 3.2, $\omega$ is implicit and is not related to solving subproblems approximately.

Our convergence results assume a certain 2-regularity property of the solution of (1.1), which implies that this solution is “critical” in the sense of [14]. We next state the relevant definitions, and discuss the relations between those concepts. Recall first that if $\Phi$ is differentiable at a solution $\bar u$ of the equation (1.1), then it holds that $T_{\Phi^{-1}(0)}(\bar u) \subset \ker \Phi'(\bar u)$, where $T_{\Phi^{-1}(0)}(\bar u)$ is the contingent (tangent) cone to the solution set of (1.1) at $\bar u$. The following notion was introduced in [14].

Definition 1.1 Assuming that $\Phi$ is differentiable at a solution $\bar u$ of the equation (1.1), this solution is referred to as noncritical if the set $\Phi^{-1}(0)$ is Clarke-regular at $\bar u$, and
\[
T_{\Phi^{-1}(0)}(\bar u) = \ker \Phi'(\bar u). \tag{1.3}
\]
Otherwise, the solution $\bar u$ is referred to as critical.

As demonstrated in [14], if $\Phi$ satisfies some mild and natural smoothness assumptions, then noncriticality of $\bar u$ is equivalent to the local Lipschitzian error bound on the distance to the solution set in terms of the natural residual of the equation (1.1): $\operatorname{dist}(u, \Phi^{-1}(0)) = O(\|\Phi(u)\|)$ holds as $u \in \mathbb{R}^p$ tends to $\bar u$. Moreover, it is also equivalent to the upper-Lipschitzian stability with respect to right-hand side perturbations of (1.1): any solution $u(w)$ of the perturbed equation $\Phi(u) = w$, close enough to $\bar u$, satisfies $\operatorname{dist}(u(w), \Phi^{-1}(0)) = O(\|w\|)$ as $w \in \mathbb{R}^p$ tends to 0. Accordingly, criticality of $\bar u$ means the absence of the properties above.

The interest in critical/noncritical solutions of nonlinear equations originated from the study of special Lagrange multipliers in equality-constrained optimization, also called critical [12, 17, 18, 13, 21], [20, Chapter 7]. For the relations between critical solutions of equations and critical multipliers in optimization, see [14]. It had been demonstrated that critical Lagrange multipliers tend to attract dual sequences generated by a number of Newton-type methods for optimization [17, 18], [20, Chapter 7]. In this paper, we show that critical solutions of nonlinear equations also serve as attractors, in this case for methods described by the pNM framework (1.2).

The notion of 2-regularity is a useful tool in nonlinear analysis and optimization theory; see, e.g., the book [1], as well as [2, 15, 16, 8, 9] for some applications. The essence of the construction is that when a mapping $\Phi$ is irregular at $\bar u$ (i.e., $\Phi'(\bar u)$ is singular), first-order information is insufficient to adequately represent $\Phi$ around $\bar u$, and so second-order information has to come into play. To this end, we have the following.
Definition 1.2 Assuming that $\Phi$ is twice differentiable at $\bar u \in \mathbb{R}^p$, $\Phi$ is said to be 2-regular at the point $\bar u$ in the direction $v \in \mathbb{R}^p$ if the $p \times p$ matrix
\[
\Psi(\bar u; v) = \Phi'(\bar u) + \Pi \Phi''(\bar u)[v] \tag{1.4}
\]
is nonsingular, where $\Pi$ is the projector in $\mathbb{R}^p$ onto an arbitrary fixed complementary subspace of $\operatorname{im} \Phi'(\bar u)$, along this subspace.

In our convergence analysis of Newton-type methods described by (1.2), we shall assume that $\Phi$ is 2-regular at a solution $\bar u$ of (1.1) in some direction $v \in \ker \Phi'(\bar u) \setminus \{0\}$. It turns out that this implies that the solution $\bar u$ is necessarily critical in the sense of Definition 1.1. We show this next.

Let $v \in \ker \Phi'(\bar u) \setminus \{0\}$. Suppose $\bar u$ is a noncritical solution. Then $v \in T_{\Phi^{-1}(0)}(\bar u)$, by (1.3). Thus, there exist a sequence $\{t_k\}$ of positive reals and a sequence $\{r^k\} \subset \mathbb{R}^p$ such that $\{t_k\} \to 0$, $r^k = o(t_k)$, and for all $k$ it holds that
\[
0 = \Phi(\bar u + t_k v + r^k) = \Phi'(\bar u) r^k + \frac{1}{2} t_k^2 \Phi''(\bar u)[v, v] + o(t_k^2).
\]
Therefore,
\[
t_k^2 \Phi''(\bar u)[v, v] + o(t_k^2) \in \operatorname{im} \Phi'(\bar u),
\]
which implies that $\Phi''(\bar u)[v, v] \in \operatorname{im} \Phi'(\bar u)$. Hence, $\Pi \Phi''(\bar u)[v, v] = 0$. Since also $\Phi'(\bar u) v = 0$, from (1.4) we conclude that $v \in \ker \Psi(\bar u; v)$. As $v \ne 0$, this contradicts the nonsingularity of $\Psi(\bar u; v)$. Thus, $\bar u$ must be a critical solution.

In Section 2, we shall prove that if $\Phi$ is 2-regular at a (critical) solution $\bar u$ of (1.1) in some direction $v \in \ker \Phi'(\bar u) \setminus \{0\}$, then $v$ defines a domain star-like with respect to $\bar u$, with nonempty interior, from which the iterates that satisfy the pNM framework (1.2) necessarily converge to $\bar u$. In this sense, the iterates are “attracted” specifically to $\bar u$, even though there may be other nearby solutions. These results are related to [10], where the pure Newton method was considered (i.e., (1.2) with $\Omega \equiv 0$ and $\omega \equiv 0$). In Section 3, we demonstrate how the general results for the pNM framework (1.2) apply to some specific Newton-type methods. These include the classical Levenberg–Marquardt method [23, Chapter 10.2] and the LP-Newton method [5] for nonlinear equations, and the stabilized Newton–Lagrange method for optimization (or stabilized sequential quadratic programming) [25, 11, 6, 19]; see also [20, Chapter 7].
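For a concrete instance, the matrix $\Psi(\bar u; v)$ from (1.4) is easy to assemble numerically. The sketch below (an illustration, not part of the analysis) uses the mapping of Example 3.1 in Section 3, $\Phi(x, \lambda) = (2x(1+\lambda), x^2)$, at its fully degenerate critical solution $\bar u = (0, -1)$: there $\Phi'(\bar u) = 0$, so $\operatorname{im} \Phi'(\bar u) = \{0\}$ and $\Pi$ may be taken as the identity.

```python
import numpy as np

# Phi from Example 3.1 below: Phi(x, lam) = (2x(1+lam), x**2), ubar = (0, -1).
# At ubar the Jacobian vanishes (full degeneracy), so the projector Pi onto a
# complement of im Phi'(ubar) = {0} is the identity.
J = np.zeros((2, 2))                      # Phi'(ubar)
H1 = np.array([[0.0, 2.0], [2.0, 0.0]])  # Hessian of 2x(1 + lam)
H2 = np.array([[2.0, 0.0], [0.0, 0.0]])  # Hessian of x**2

def Psi(v):
    """Psi(ubar; v) = Phi'(ubar) + Pi * Phi''(ubar)[v], here with Pi = I;
    row i of Phi''(ubar)[v] is (H_i v)^T."""
    return J + np.stack([H1 @ v, H2 @ v])

# det Psi(ubar; v) = -4 v_x**2, so Phi is 2-regular at ubar exactly in the
# directions with v_x != 0, i.e. everywhere off a null set of the sphere.
d1 = np.linalg.det(Psi(np.array([1.0, 0.0])))   # -4.0: 2-regular
d2 = np.linalg.det(Psi(np.array([0.0, 1.0])))   #  0.0: not 2-regular
```

The exceptional directions form the zero set of a homogeneous polynomial, in line with the discussion of convergence domains in Section 2.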
We finish this section with some words about our notation. Throughout, $\langle \cdot, \cdot \rangle$ is the Euclidean inner product, and unless specified otherwise, $\| \cdot \|$ is the Euclidean norm, where the space is always clear from the context. Then the unit sphere is $S = \{u \mid \|u\| = 1\}$. For a linear operator (a matrix) $A$, we denote by $\ker A$ its null space, and by $\operatorname{im} A$ its image (range) space. The notation $I$ stands for the identity matrix. A set $U$ is called star-like with respect to $u \in U$ if $t \hat u + (1 - t) u \in U$ for all $\hat u \in U$ and all $t \in [0, 1]$.
2 Local convergence to a critical solution
Results of this section are related to those in [10], where the basic Newton method was considered (i.e., pNM (1.2) with $\Omega \equiv 0$ and $\omega \equiv 0$). To the best of our knowledge, this is the only directly related reference, as it also allows for non-isolated solutions. Other literature on the basic Newton method for singular equations uses assumptions that imply that a solution, though possibly singular, must be isolated.

Let $\bar u$ be a solution of the equation (1.1). Then every $u \in \mathbb{R}^p$ is uniquely decomposed into the sum $u = u_1 + u_2$, with $u_1 \in (\ker \Phi'(\bar u))^\perp$ and $u_2 \in \ker \Phi'(\bar u)$. The corresponding notation will be used throughout the rest of the paper.

We start with the following counterpart of [10, Lemma 4.1], which establishes that, under appropriate assumptions, the pNM subproblem (1.2) has the unique solution for all $u^k$ close enough to $\bar u$ in a certain star-like domain.

Lemma 2.1 Let $\Phi : \mathbb{R}^p \to \mathbb{R}^p$ be twice differentiable near $\bar u \in \mathbb{R}^p$, with its second derivative Lipschitz-continuous with respect to $\bar u$, that is, $\Phi''(u) - \Phi''(\bar u) = O(\|u - \bar u\|)$ as $u \to \bar u$. Let $\bar u$ be a solution of equation (1.1), and assume that $\Phi$ is 2-regular at $\bar u$ in a direction $\bar v \in S$. Let $\Pi$ stand for the orthogonal projector onto $(\operatorname{im} \Phi'(\bar u))^\perp$. Let $\Omega : \mathbb{R}^p \to \mathbb{R}^{p \times p}$ satisfy the following properties:
\[
\Omega(u) = O(\|u - \bar u\|) \tag{2.1}
\]
as $u \to \bar u$, and for every $\Delta > 0$ there exist $\varepsilon > 0$ and $\delta > 0$ such that for every $u \in \mathbb{R}^p \setminus \{\bar u\}$ satisfying
\[
\|u - \bar u\| \le \varepsilon, \qquad \left\| \frac{u - \bar u}{\|u - \bar u\|} - \bar v \right\| \le \delta, \tag{2.2}
\]
it holds that
\[
\|\Pi \Omega(u)\| \le \Delta \|u - \bar u\|. \tag{2.3}
\]
Let $\omega : \mathbb{R}^p \to \mathbb{R}^p$ satisfy
\[
\omega(u) = O(\|u - \bar u\|^2) \tag{2.4}
\]
as $u \to \bar u$.

Then there exist $\bar\varepsilon = \bar\varepsilon(\bar v) > 0$ and $\bar\delta = \bar\delta(\bar v) > 0$ such that for every $u \in \mathbb{R}^p \setminus \{\bar u\}$ satisfying
\[
\|u - \bar u\| \le \bar\varepsilon, \qquad \left\| \frac{u - \bar u}{\|u - \bar u\|} - \bar v \right\| \le \bar\delta, \tag{2.5}
\]
the equation (1.2) with $u^k = u$ has the unique solution $v$, satisfying
\[
v_1 + u_1 - \bar u_1 = O(\|u - \bar u\|^2), \tag{2.6}
\]
\[
u_2 + v_2 - \bar u_2 = \frac{1}{2}(u_2 - \bar u_2) + O(\|u_1 - \bar u_1\|) + O(\|u - \bar u\|^{-1} \|\Pi \omega(u)\|) + O(\|\Pi \Omega(u)\|) + O(\|u - \bar u\|^2) \tag{2.7}
\]
as $u \to \bar u$.
Proof. Under the stated smoothness assumptions, without loss of generality we can assume that $\bar u = 0$ and
\[
\Phi(u) = Au + \frac{1}{2} B[u, u] + R(u), \tag{2.8}
\]
where $A = \Phi'(0) \in \mathbb{R}^{p \times p}$, $B = \Phi''(0)$ is a symmetric bilinear mapping from $\mathbb{R}^p \times \mathbb{R}^p$ to $\mathbb{R}^p$, and the mapping $R : \mathbb{R}^p \to \mathbb{R}^p$ is differentiable near 0, with
\[
R(u) = O(\|u\|^3), \qquad R'(u) = O(\|u\|^2) \tag{2.9}
\]
as $u \to 0$.

Multiplying (1.2) by $(I - \Pi)$ and by $\Pi$, we decompose (1.2) into the following two equations:
\[
(A + (I - \Pi)(B[u] + R'(u) + \Omega(u))) v_1 = -A u_1 - (I - \Pi)\left( \frac{1}{2} B[u, u] + R(u) - \omega(u) \right) - (I - \Pi)(B[u] + R'(u) + \Omega(u)) v_2, \tag{2.10}
\]
and
\[
\Pi (B[u] + R'(u) + \Omega(u))(v_1 + v_2) = -\Pi \left( \frac{1}{2} B[u, u] + R(u) - \omega(u) \right). \tag{2.11}
\]

Let $\bar\varepsilon > 0$ and $\bar\delta > 0$ be arbitrary and fixed for now. From this point on, we consider only those $u \in \mathbb{R}^p \setminus \{0\}$ that satisfy (2.5). Define the family of linear operators $\mathcal{A}(u) : (\ker A)^\perp \to \operatorname{im} A$ as the restriction of $A + (I - \Pi)(B[u] + R'(u) + \Omega(u))$ to $(\ker A)^\perp$. Let $\bar{\mathcal{A}} : (\ker A)^\perp \to \operatorname{im} A$ be the restriction of $A$ to $(\ker A)^\perp$. Then, taking into account (2.4) and (2.9), the equality (2.10) can be written as
\[
\mathcal{A}(u) v_1 = -\bar{\mathcal{A}} u_1 - (I - \Pi)(B[u] + R'(u) + \Omega(u)) v_2 + O(\|u\|^2) \tag{2.12}
\]
as $u \to 0$. Evidently, $\bar{\mathcal{A}}$ is invertible, and according to (2.1) and (2.9), $\mathcal{A}(u) = \bar{\mathcal{A}} + O(\|u\|)$. The latter implies that $\mathcal{A}(u)$ is invertible, provided $\bar\varepsilon > 0$ is small enough, and
\[
(\mathcal{A}(u))^{-1} = (\bar{\mathcal{A}})^{-1} + O(\|u\|)
\]
as $u \to 0$ (see, e.g., [20, Lemma A.6]). Therefore, (2.12) can be written as
\[
v_1 = -u_1 + \mathcal{M}(u) v_2 + O(\|u\|^2), \tag{2.13}
\]
where
\[
\mathcal{M}(u) : \ker A \to (\ker A)^\perp, \qquad \mathcal{M}(u) = -(\mathcal{A}(u))^{-1} (I - \Pi)(B[u] + R'(u) + \Omega(u)) = O(\|u\|) \tag{2.14}
\]
as $u \to 0$.

Substituting (2.13) into (2.11), and taking into account (2.9), we obtain that
\[
\Pi (B[u] + R'(u) + \Omega(u))(I + \mathcal{M}(u)) v_2 = -\Pi \left( \frac{1}{2} B[u, u] - \omega(u) \right) + \Pi (B[u] + \Omega(u)) u_1 + O(\|u\|^3). \tag{2.15}
\]

Define the family of linear operators $\mathcal{B}(u) : \ker A \to (\operatorname{im} A)^\perp$ as the restriction of $\Pi (B[u] + R'(u) + \Omega(u))(I + \mathcal{M}(u))$ to $\ker A$. Let $\bar{\mathcal{B}}(u) : \ker A \to (\operatorname{im} A)^\perp$ be the restriction of $\Pi B[u]$ to $\ker A$. Then (2.15) can be written as
\[
\mathcal{B}(u) v_2 = -\frac{1}{2} \bar{\mathcal{B}}(u) u_2 + \Pi \left( \left( \frac{1}{2} B[u] + \Omega(u) \right) u_1 + \omega(u) \right) + O(\|u\|^3) \tag{2.16}
\]
as $u \to 0$.

The 2-regularity of $\Phi$ at 0 in the direction $\bar v$ means precisely that $\bar{\mathcal{B}}(\bar v)$ is nonsingular. Then, possibly after reducing $\bar\delta > 0$, by [20, Lemma A.6] we obtain the existence of $C > 0$ such that for every $u \in \mathbb{R}^p \setminus \{0\}$ satisfying the second relation in (2.5), $\bar{\mathcal{B}}(u/\|u\|)$ is invertible, and
\[
\left\| \left( \bar{\mathcal{B}}(\|u\|^{-1} u) \right)^{-1} \right\| \le C. \tag{2.17}
\]
According to (2.9), (2.1), (2.14), it holds that
\[
\mathcal{B}(\|u\|^{-1} u) = \bar{\mathcal{B}}(\|u\|^{-1} u) + \|u\|^{-1} \Pi \Omega(u) + O(\|u\|).
\]
Choosing $\Delta > 0$ small enough, and further reducing $\bar\varepsilon > 0$ and $\bar\delta > 0$ if necessary, by (2.3) and (2.17), and by [20, Lemma A.6], we now obtain that $\mathcal{B}(\|u\|^{-1} u)$ is invertible, and
\[
\left( \mathcal{B}(\|u\|^{-1} u) \right)^{-1} = \left( \bar{\mathcal{B}}(\|u\|^{-1} u) \right)^{-1} + O(\|u\|^{-1} \|\Pi \Omega(u)\|) + O(\|u\|).
\]
Employing again (2.1), we further conclude that
\[
(\mathcal{B}(u))^{-1} = (\bar{\mathcal{B}}(u))^{-1} + O(\|u\|^{-2} \|\Pi \Omega(u)\|) + O(1) = O(\|u\|^{-1}).
\]
Therefore, (2.16) is uniquely solvable, and its unique solution has the form
\[
v_2 = -\frac{1}{2} u_2 + (\bar{\mathcal{B}}(u))^{-1} \Pi \left( \frac{1}{2} B[u, u_1] + \omega(u) \right) + O(\|\Pi \Omega(u)\|) + O(\|u\|^2) = O(\|u\|) \tag{2.18}
\]
as $u \to 0$, where the last estimate employs (2.1) and (2.4). Substituting (2.18) into (2.13), and employing (2.14) again, we finally obtain that
\[
v_1 = -u_1 + O(\|u\|^2) \tag{2.19}
\]
as $u \to 0$. From (2.18) and (2.19), we have the needed estimates (2.6) and (2.7).
Remark 2.1 From the proof of Lemma 2.1 it can be seen that under the assumptions of this lemma (removing the assumptions on $\omega$, which are not needed for the following), the values $\bar\varepsilon = \bar\varepsilon(\bar v) > 0$ and $\bar\delta = \bar\delta(\bar v) > 0$ can be chosen in such a way that for every $u \in \mathbb{R}^p \setminus \{\bar u\}$ satisfying (2.5), the matrix $\Phi'(u) + \Omega(u)$ is invertible, and $(\Phi'(u) + \Omega(u))^{-1} = O(\|u - \bar u\|^{-1})$ as $u \to \bar u$. When $\Omega \equiv 0$, this result is a particular case of [10, Lemma 3.1].

We proceed to establish convergence of the iterates satisfying the pNM framework (1.2), from any starting point in the relevant domain. This result is a generalization of [10, Lemma 5.1].

Theorem 2.1 Let $\Phi : \mathbb{R}^p \to \mathbb{R}^p$ be twice differentiable near $\bar u \in \mathbb{R}^p$, with its second derivative Lipschitz-continuous with respect to $\bar u$, that is, $\Phi''(u) - \Phi''(\bar u) = O(\|u - \bar u\|)$ as $u \to \bar u$. Let $\bar u$ be a solution of equation (1.1), and assume that $\Phi$ is 2-regular at $\bar u$ in a direction $\bar v \in \ker \Phi'(\bar u) \cap S$. Let $\Omega : \mathbb{R}^p \to \mathbb{R}^{p \times p}$ and $\omega : \mathbb{R}^p \to \mathbb{R}^p$ satisfy the estimates (2.1), (2.4), as well as
\[
\Pi \Omega(u) = O(\|u_1 - \bar u_1\|) + O(\|u - \bar u\|^2) \tag{2.20}
\]
and
\[
\Pi \omega(u) = O(\|u - \bar u\| \|u_1 - \bar u_1\|) + O(\|u - \bar u\|^3) \tag{2.21}
\]
as $u \to \bar u$.

Then for every $\bar\varepsilon > 0$ and $\bar\delta > 0$, there exist $\varepsilon = \varepsilon(\bar v) > 0$ and $\delta = \delta(\bar v) > 0$ such that any starting point $u^0 \in \mathbb{R}^p \setminus \{\bar u\}$ satisfying
\[
\|u^0 - \bar u\| \le \varepsilon, \qquad \left\| \frac{u^0 - \bar u}{\|u^0 - \bar u\|} - \bar v \right\| \le \delta \tag{2.22}
\]
uniquely defines the sequence $\{u^k\} \subset \mathbb{R}^p$ such that for each $k$ it holds that $v^k = u^{k+1} - u^k$ solves (1.2), $u_2^k \ne \bar u_2$, the point $u = u^k$ satisfies (2.5), the sequence $\{u^k\}$ converges to $\bar u$, the sequence $\{\|u^k - \bar u\|\}$ converges to zero monotonically,
\[
\frac{\|u_1^{k+1} - \bar u_1\|}{\|u_2^{k+1} - \bar u_2\|} = O(\|u^k - \bar u\|) \tag{2.23}
\]
as $k \to \infty$, and
\[
\lim_{k \to \infty} \frac{\|u_2^{k+1} - \bar u_2\|}{\|u_2^k - \bar u_2\|} = \frac{1}{2}. \tag{2.24}
\]
Proof. We again assume that $\bar u = 0$, and $\Phi$ is given by (2.8) with $R$ satisfying (2.9). Considering that $\bar v_1 = 0$, observe first that if $u \in \mathbb{R}^p \setminus \{0\}$ satisfies the second condition in (2.2) with some $\delta \in (0, 1)$, then
\[
\frac{\|u_1\|}{\|u\|} = \left\| \frac{u_1}{\|u\|} - \bar v_1 \right\| \le \left\| \frac{u}{\|u\|} - \bar v \right\| \le \delta.
\]
This implies that
\[
\|u_1\| \le \delta \|u\|, \tag{2.25}
\]
and hence, $\|u\| \le \|u_1\| + \|u_2\| \le \delta \|u\| + \|u_2\|$, so that
\[
(1 - \delta) \|u\| \le \|u_2\|. \tag{2.26}
\]
Then
\[
\left\| \frac{u_2}{\|u_2\|} - \bar v \right\|
\le \left\| \frac{u_2}{\|u\|} - \bar v_2 \right\| + \left\| \frac{u_2}{\|u_2\|} - \frac{u_2}{\|u\|} \right\|
\le \left\| \frac{u}{\|u\|} - \bar v \right\| + \frac{\|u\| - \|u_2\|}{\|u\|}
\le \delta + 1 - \frac{\|u_2\|}{\|u\|}
\le 2\delta, \tag{2.27}
\]
where the second inequality employs the fact that $u_2/\|u\|$ is the metric projection of $u/\|u\|$ onto $\ker A$.

From (2.20), (2.21), and (2.25), it evidently follows that $\Omega$ satisfies the assumptions of Lemma 2.1. Therefore, according to this lemma, there exist $\bar\varepsilon > 0$ and $\bar\delta > 0$ such that for every $u \in \mathbb{R}^p \setminus \{\bar u\}$ satisfying (2.5), the equation (1.2) with $u^k = u$ has the unique solution $v$, satisfying
\[
u_1 + v_1 = O(\|u\|^2)
\]
and
\[
u_2 + v_2 = \frac{1}{2} u_2 + O(\|u_1\|) + O(\|u\|^2)
\]
as $u \to 0$, where (2.20) and (2.21) were again taken into account. In what follows, we use these values $\bar\varepsilon$ and $\bar\delta$, with the understanding that if we prove the needed assertion for these values, it will also be valid for any larger values. At the same time, if the assertion of Lemma 2.1 holds with these values, it certainly also holds with smaller values. Therefore, there exists $C > 0$ such that
\[
\|u_1 + v_1\| \le C \|u\|^2, \tag{2.28}
\]
\[
\left\| u_2 + v_2 - \frac{1}{2} u_2 \right\| \le \left\| u + v - \frac{1}{2} u_2 \right\| \le C(\|u_1\| + \|u\|^2), \tag{2.29}
\]
and hence,
\[
\frac{1}{2} \|u_2\| - C(\|u_1\| + \|u\|^2) \le \|u_2 + v_2\| \le \|u + v\| \le \frac{1}{2} \|u_2\| + C(\|u_1\| + \|u\|^2). \tag{2.30}
\]
From (2.5), (2.26) with $\delta = \bar\delta$, and (2.30), we further derive that
\[
\left( \frac{1 - \bar\delta}{2} - C(\bar\delta + \bar\varepsilon) \right) \|u\| \le \|u_2 + v_2\| \le \|u + v\| \le \left( \frac{1}{2} + C(\bar\delta + \bar\varepsilon) \right) \|u\|.
\]
Reducing $\bar\varepsilon > 0$ and $\bar\delta > 0$ if necessary, so that
\[
\frac{\bar\delta}{2} + C(\bar\delta + \bar\varepsilon) < \frac{1}{2},
\]
and setting
\[
q_- = \frac{1 - \bar\delta}{2} - C(\bar\delta + \bar\varepsilon), \qquad q_+ = \frac{1}{2} + C(\bar\delta + \bar\varepsilon),
\]
we then obtain that
\[
q_- \|u\| \le \|u_2 + v_2\| \le \|u + v\| \le q_+ \|u\|, \tag{2.31}
\]
where
\[
0 < q_- < q_+ < 1. \tag{2.32}
\]

By (2.29), the right inequality in (2.30), and the left inequality in (2.31), we have that
\[
\left\| \frac{u + v}{\|u + v\|} - \frac{u_2}{\|u_2\|} \right\|
= \frac{\left\| (u + v) \|u_2\| - u_2 \|u + v\| \right\|}{\|u_2\| \|u + v\|}
\le \frac{2C(\|u_1\| + \|u\|^2)}{q_- \|u\|}
= \frac{2C}{q_-} \left( \frac{\|u_1\|}{\|u\|} + \|u\| \right), \tag{2.33}
\]
\[
\left\| \frac{u_2 + v_2}{\|u_2 + v_2\|} - \frac{u_2}{\|u_2\|} \right\|
= \frac{\left\| (u_2 + v_2) \|u_2\| - u_2 \|u_2 + v_2\| \right\|}{\|u_2\| \|u_2 + v_2\|}
\le \frac{2C(\|u_1\| + \|u\|^2)}{q_- \|u\|}
= \frac{2C}{q_-} \left( \frac{\|u_1\|}{\|u\|} + \|u\| \right). \tag{2.34}
\]

Now choose $\varepsilon \in (0, \bar\varepsilon]$ and $\delta \in (0, \bar\delta]$ satisfying
\[
2\delta + \frac{2C}{q_-} \left( \delta + \left( \frac{C}{q_-} + 1 \right) \frac{\varepsilon}{1 - q_+} \right) \le \bar\delta, \tag{2.35}
\]
and assume that (2.22) holds for $u^0 \in \mathbb{R}^p \setminus \{0\}$. Suppose that $u^j \in \mathbb{R}^p \setminus \{0\}$ satisfy (1.2) and (2.5) with $u = u^j$ for all $j = 1, \dots, k$. Then by the choice of $\bar\varepsilon > 0$ and $\bar\delta > 0$, there exists the unique $u^{k+1}$ satisfying (1.2), and by (2.25), (2.27), (2.28), (2.31), (2.32), (2.33), (2.34), it holds that
\[
0 < \|u^{k+1}\| \le q_+ \|u^k\| \le \dots \le q_+^{k+1} \|u^0\| \le q_+^{k+1} \varepsilon \le \varepsilon \le \bar\varepsilon,
\]
and
\begin{align*}
\left\| \frac{u^{k+1}}{\|u^{k+1}\|} - \bar v \right\|
&\le \left\| \frac{u_2^k}{\|u_2^k\|} - \bar v \right\| + \left\| \frac{u^{k+1}}{\|u^{k+1}\|} - \frac{u_2^k}{\|u_2^k\|} \right\| \\
&\le \left\| \frac{u_2^{k-1}}{\|u_2^{k-1}\|} - \bar v \right\| + \left\| \frac{u_2^k}{\|u_2^k\|} - \frac{u_2^{k-1}}{\|u_2^{k-1}\|} \right\| + \left\| \frac{u^{k+1}}{\|u^{k+1}\|} - \frac{u_2^k}{\|u_2^k\|} \right\| \\
&\le \dots \\
&\le \left\| \frac{u_2^0}{\|u_2^0\|} - \bar v \right\| + \sum_{j=1}^k \left\| \frac{u_2^j}{\|u_2^j\|} - \frac{u_2^{j-1}}{\|u_2^{j-1}\|} \right\| + \left\| \frac{u^{k+1}}{\|u^{k+1}\|} - \frac{u_2^k}{\|u_2^k\|} \right\| \\
&\le 2\delta + \sum_{j=1}^k \frac{2C}{q_-} \left( \frac{\|u_1^{j-1}\|}{\|u^{j-1}\|} + \|u^{j-1}\| \right) + \frac{2C}{q_-} \left( \frac{\|u_1^k\|}{\|u^k\|} + \|u^k\| \right) \\
&\le 2\delta + \frac{2C}{q_-} \sum_{j=0}^k \left( \frac{\|u_1^j\|}{\|u^j\|} + \|u^j\| \right) \\
&\le 2\delta + \frac{2C}{q_-} \left( \frac{\|u_1^0\|}{\|u^0\|} + \sum_{j=1}^k \frac{C \|u^{j-1}\|}{q_-} + \sum_{j=0}^k \|u^j\| \right) \\
&\le 2\delta + \frac{2C}{q_-} \left( \delta + \frac{C}{q_-} \sum_{j=1}^k q_+^{j-1} \varepsilon + \sum_{j=0}^k q_+^j \varepsilon \right) \\
&\le 2\delta + \frac{2C}{q_-} \left( \delta + \frac{C}{q_-} \frac{\varepsilon}{1 - q_+} + \frac{\varepsilon}{1 - q_+} \right) \\
&= 2\delta + \frac{2C}{q_-} \left( \delta + \left( \frac{C}{q_-} + 1 \right) \frac{\varepsilon}{1 - q_+} \right) \\
&\le \bar\delta,
\end{align*}
where the last inequality is by (2.35). Therefore, (2.5) holds with $u = u^{k+1}$.

We have thus established that there exists the unique sequence $\{u^k\} \subset \mathbb{R}^p$ such that for each $k$ the point $u^k$ satisfies (1.2) and (2.5) with $u = u^k$. By (2.31) and (2.32), it then follows that $u^k \ne \bar u$ for all $k$, and $\{u^k\}$ converges to 0. According to (2.28) and (2.31), it holds that
\[
\frac{\|u_1^{k+1}\|}{\|u_2^{k+1}\|} \le \frac{C}{q_-} \|u^k\|
\]
for all $k$. This yields (2.23). Furthermore, according to (2.30),
\[
\frac{1}{2} - C \frac{\|u_1^k\| + \|u^k\|^2}{\|u_2^k\|} \le \frac{\|u_2^{k+1}\|}{\|u_2^k\|} \le \frac{1}{2} + C \frac{\|u_1^k\| + \|u^k\|^2}{\|u_2^k\|},
\]
where by (2.23) both sides tend to $1/2$ as $k \to \infty$. This gives (2.24).
Remark 2.2 In Theorem 2.1, by the monotonicity of the sequence $\{\|u^k - \bar u\|\}$, for every $k$ large enough it holds that $\|u_2^{k+1} - \bar u_2\| \le \|u^{k+1} - \bar u\| \le \|u^k - \bar u\|$. Therefore, (2.23) implies the estimates
\[
\frac{\|u_1^{k+1} - \bar u_1\|}{\|u^{k+1} - \bar u\|} = O(\|u^k - \bar u\|) \quad \text{and} \quad \|u_1^{k+1} - \bar u_1\| = O(\|u^k - \bar u\|^2) \tag{2.36}
\]
as $k \to \infty$. Furthermore,
\[
\frac{\|u_2^{k+1} - \bar u_2\|}{\|u_1^k - \bar u_1\| + \|u_2^k - \bar u_2\|} \le \frac{\|u_2^{k+1} - \bar u_2\|}{\|u^k - \bar u\|} \le \frac{\|u_2^{k+1} - \bar u_2\|}{\|u_2^k - \bar u_2\|},
\]
where by (2.23) and (2.24) both sides tend to $1/2$ as $k \to \infty$. Therefore,
\[
\lim_{k \to \infty} \frac{\|u_2^{k+1} - \bar u_2\|}{\|u^k - \bar u\|} = \frac{1}{2}. \tag{2.37}
\]
Finally,
\[
\frac{\|u_2^{k+1} - \bar u_2\| - \|u_1^{k+1} - \bar u_1\|}{\|u^k - \bar u\|} \le \frac{\|u^{k+1} - \bar u\|}{\|u^k - \bar u\|} \le \frac{\|u_1^{k+1} - \bar u_1\| + \|u_2^{k+1} - \bar u_2\|}{\|u^k - \bar u\|},
\]
where by (2.36) and (2.37) both sides tend to $1/2$ as $k \to \infty$. Therefore,
\[
\lim_{k \to \infty} \frac{\|u^{k+1} - \bar u\|}{\|u^k - \bar u\|} = \frac{1}{2}.
\]
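The limiting ratio $1/2$ is easy to observe numerically. The sketch below is an illustration only; the mapping is a made-up toy example, not taken from the paper. It runs the basic Newton method ($\Omega \equiv 0$, $\omega \equiv 0$) on $\Phi(x, y) = (x, y^2)$, whose unique solution $\bar u = 0$ is critical, with $\ker \Phi'(0)$ spanned by $(0, 1)$, a direction of 2-regularity.

```python
import numpy as np

# Toy singular mapping (hypothetical, not from the paper):
# Phi(x, y) = (x, y**2) has the unique solution ubar = 0, which is critical;
# ker Phi'(0) is the y-axis, and Phi is 2-regular at 0 in the direction (0, 1).
Phi = lambda u: np.array([u[0], u[1]**2])
dPhi = lambda u: np.array([[1.0, 0.0], [0.0, 2.0 * u[1]]])

u = np.array([0.1, 1.0])   # starting point near the direction (0, 1)
rates = []
for _ in range(8):
    u_new = u + np.linalg.solve(dPhi(u), -Phi(u))
    rates.append(np.linalg.norm(u_new) / np.linalg.norm(u))
    u = u_new
# The ratio ||u^{k+1} - ubar|| / ||u^k - ubar|| settles at 1/2: here the
# u_1-component is annihilated in one step, and the kernel component halves.
```

This matches the linear rate $1/2$ derived in Remark 2.2, despite the singularity of the Jacobian along the whole sequence.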
Theorem 2.1 establishes the existence of a set with nonempty interior, which is star-like with respect to $\bar u$, and such that any sequence satisfying the pNM relation (1.2) and initialized at any point of this set converges linearly to $\bar u$. Moreover, if $\Phi$ is 2-regular at $\bar u$ in at least one direction $\bar v \in \ker \Phi'(\bar u)$, then the set of such $\bar v$ is open and dense in $\ker \Phi'(\bar u) \cap S$: its complement is the null set of the nontrivial homogeneous polynomial $\det \bar{\mathcal{B}}(\cdot)$ considered on $\ker \Phi'(\bar u) \cap S$. The union of the convergence domains coming with all such $\bar v$ is also a star-like convergence domain with nonempty interior. In the case when $\Phi'(\bar u) = 0$ (full degeneracy), this domain is quite large. In particular, it is “asymptotically dense”: the only excluded directions are those in which $\Phi$ is not 2-regular at $\bar u$, and these form the null set of a nontrivial homogeneous polynomial. Beyond the case of full degeneracy, the convergence domain given by Theorem 2.1 is at least not “asymptotically thin”, though it is also not “asymptotically dense”.

For the (unperturbed) Newton method, the existence of an “asymptotically dense” star-like domain of convergence was established in [10, Theorem 6.1]. Specifically, it was demonstrated that one Newton step from a point $u^0$ in this domain leads to the convergence domain coming with the appropriate $\bar v = \pi(u^0)/\|\pi(u^0)\|$, where
\[
\pi(u^0) = \frac{1}{2}(u_2^0 - \bar u_2) + \frac{1}{2} (\bar{\mathcal{B}}(u^0 - \bar u))^{-1} \Pi B[u^0 - \bar u, u_1^0 - \bar u_1];
\]
see (2.18) above. Deriving a result like this for the pNM scheme (1.2) is hardly possible in general, at least without rather restrictive assumptions on the perturbation terms. Perhaps results along those lines can be derived for specific methods, rather than for the general pNM framework, but such developments are not known at this time.
3 Applications to some specific algorithms
In this section we show how our general results for the pNM framework (1.2) can be applied to some specific methods. In particular, we consider the following algorithms, all developed for tackling the difficult case of non-isolated solutions: the Levenberg–Marquardt method and the LP-Newton method for equations, and the stabilized Newton–Lagrange method for optimization.

We start with observing that the assumptions (2.1), (2.4), (2.20) and (2.21) on the perturbation terms in Theorem 2.1 hold automatically if
\[
\Omega(u) = O(\|\Phi(u)\|), \qquad \omega(u) = O(\|u - \bar u\| \|\Phi(u)\|) \tag{3.1}
\]
as $u \to \bar u$. Indeed, this readily follows from the relation
\[
\Phi(u) = \Phi'(\bar u)(u_1 - \bar u_1) + O(\|u - \bar u\|^2)
\]
as $u \to \bar u$.
3.1 Levenberg–Marquardt method
An iteration of the classical Levenberg–Marquardt method [23, Chapter 10.2] consists in solving the following subproblem:
\[
\text{minimize } \frac{1}{2} \|\Phi(u^k) + \Phi'(u^k) v\|^2 + \frac{1}{2} \sigma(u^k) \|v\|^2, \qquad v \in \mathbb{R}^p, \tag{3.2}
\]
where $u^k \in \mathbb{R}^p$ is the current iterate and $\sigma : \mathbb{R}^p \to \mathbb{R}_+$ defines the regularization parameter. In [26], it was established that under the Lipschitzian error bound condition (i.e., when initialized near a noncritical solution $\bar u$ of (1.1)), the method described by (3.2) with $\sigma(u) = \|\Phi(u)\|^2$ generates a sequence which is quadratically convergent to a (nearby) solution of (1.1). For analysis under the Lipschitzian error bound condition of a rather general framework that includes the Levenberg–Marquardt method, see [4, 7]. Our interest here is the case of critical solutions, when the error bound does not hold.

First, note that the unique (if $\sigma(u^k) > 0$) minimizer of the convex quadratic function in (3.2) is characterized by the linear system
\[
(\Phi'(u^k))^{\mathrm T} \Phi(u^k) + \left( (\Phi'(u^k))^{\mathrm T} \Phi'(u^k) + \sigma(u^k) I \right) v = 0. \tag{3.3}
\]
We next show how (3.3) can be embedded into the pNM framework (1.2). In particular, we construct the perturbation terms for which the conditions (3.1) hold, and hence Theorem 2.1 is applicable, and which correspond to (3.3) on the relevant domain of convergence.

Define $\bar\varepsilon = \bar\varepsilon(\bar v) > 0$ and $\bar\delta = \bar\delta(\bar v) > 0$ according to Remark 2.1, where we set $\Omega \equiv 0$. Define the set
\[
K = K(\bar v) = \{u \in \mathbb{R}^p \setminus \{\bar u\} \mid \text{(2.5) holds}\}. \tag{3.4}
\]
Then $\Phi'(u)$ is invertible for all $u \in K$, and $(\Phi'(u))^{-1} = O(\|u - \bar u\|^{-1})$ as $u \to \bar u$ (this can also be concluded directly from [10, Lemma 3.1], with an appropriate choice of $\bar\varepsilon > 0$ and $\bar\delta > 0$). Set
\[
\Omega(u) = 0, \quad \omega(u) = 0, \qquad u \in \mathbb{R}^p \setminus K. \tag{3.5}
\]
Of course, these vanishing perturbation terms for $u \notin K$ have no relation to (3.3). The point is that we next show that, with the appropriate definitions for $u \in K$, we obtain $\Omega$ and $\omega$ satisfying (3.1). Then Theorem 2.1 ensures that if the starting point satisfies (2.22) with appropriate $\varepsilon > 0$ and $\delta > 0$, the subsequent iterates are well defined and remain in the set $K$. Finally, in $K$, the constructed $\Omega$ and $\omega$ do correspond to (3.3).

Let $u \in K$. Multiplying both sides of (3.3) considered with $u^k = u$ by the matrix $((\Phi'(u^k))^{\mathrm T})^{-1} = ((\Phi'(u^k))^{-1})^{\mathrm T}$, we obtain that
\[
\Phi(u^k) + \left( \Phi'(u^k) + \sigma(u^k) ((\Phi'(u^k))^{-1})^{\mathrm T} \right) v = 0,
\]
which is the pNM iteration system (1.2) with the perturbation terms given by
\[
\Omega(u) = \sigma(u) ((\Phi'(u))^{-1})^{\mathrm T}, \quad \omega(u) = 0, \qquad u \in K.
\]
Then $\Omega(u) = O(\|u - \bar u\|^{-1} \sigma(u))$. Therefore, the needed first estimate in (3.1) would hold if
\[
\sigma(u) = O(\|u - \bar u\| \|\Phi(u)\|) \tag{3.6}
\]
as $u \in K$ tends to $\bar u$.

Let $\sigma(u) = \|\Phi(u)\|^\tau$ with $\tau > 0$. Then (3.6) takes the form $\|\Phi(u)\|^{\tau - 1} = O(\|u - \bar u\|)$, and since $\Phi(u) = O(\|u - \bar u\|)$, the last estimate is satisfied as $u \to \bar u$ if $\tau \ge 2$. Moreover, in the case when $\Phi'(\bar u) = 0$ (full singularity) it holds that $\Phi(u) = O(\|u - \bar u\|^2)$ as $u \to \bar u$, and hence, in this case, the appropriate values are all $\tau \ge 3/2$.

The construction is complete. As the exhibited $\Omega$ and $\omega$ satisfy (3.1) as $u \to \bar u$ (regardless of whether $u$ stays in $K$ or not), Theorem 2.1 is applicable. In particular, it guarantees that for appropriate starting points all the iterates stay in $K$. In this set, the perturbation terms define the Levenberg–Marquardt iterations (3.3). Thus, Theorem 2.1 implies the following.

Corollary 3.1 Let $\Phi : \mathbb{R}^p \to \mathbb{R}^p$ be twice differentiable near $\bar u \in \mathbb{R}^p$, with its second derivative Lipschitz-continuous with respect to $\bar u$. Let $\bar u$ be a solution of equation (1.1), and assume that $\Phi$ is 2-regular at $\bar u$ in a direction $\bar v \in \ker \Phi'(\bar u) \cap S$.

Then for any $\tau \ge 2$, there exist $\varepsilon = \varepsilon(\bar v) > 0$ and $\delta = \delta(\bar v) > 0$ such that any starting point $u^0 \in \mathbb{R}^p \setminus \{\bar u\}$ satisfying (2.22) uniquely defines the sequence $\{u^k\} \subset \mathbb{R}^p$ such that for each $k$ it holds that $v^k = u^{k+1} - u^k$ solves (3.2) with $\sigma(u) = \|\Phi(u)\|^\tau$, $u_2^k \ne \bar u_2$, the sequence $\{u^k\}$ converges to $\bar u$, the sequence $\{\|u^k - \bar u\|\}$ converges to zero monotonically, and (2.23) and (2.24) hold. Moreover, if $\Phi'(\bar u) = 0$, then the same assertion is valid with any $\tau \ge 3/2$.

The following example is taken from the DEGEN test collection [3].
Figure 1: Levenberg–Marquardt method with $\tau = 1$ for Example 3.1. (a) Iterative sequences; (b) domain of attraction to the critical solution.

Example 3.1 (DEGEN 20101) Consider the equality-constrained optimization problem
\[
\text{minimize } x^2 \quad \text{subject to } x^2 = 0.
\]
Stationary points and associated Lagrange multipliers of this problem are characterized by the Lagrange optimality system, which has the form of a nonlinear equation with $\Phi : \mathbb{R}^2 \to \mathbb{R}^2$,
\[
\Phi(u) = (2x(1 + \lambda), x^2),
\]
where $u = (x, \lambda)$. The unique feasible point (hence, the unique solution, and the unique stationary point) of this problem is $\bar x = 0$, and the set of associated Lagrange multipliers is the entire $\mathbb{R}$. Therefore, the solution set of the Lagrange system (i.e., the primal-dual solution set) is $\{\bar x\} \times \mathbb{R}$. The unique critical solution is $\bar u = (\bar x, \bar\lambda)$ with $\bar\lambda = -1$, the one for which $\Phi'(\bar u) = 0$ (full singularity).

In Figures 1 and 2, the vertical green line corresponds to the primal-dual solution set. These figures show some iterative sequences generated by the Levenberg–Marquardt method, and the domains from which convergence to the critical solution was detected. Zooming in, and taking smaller areas for starting points, does not significantly change the picture in Figure 1, corresponding to $\tau = 1$. At the same time, such manipulations with Figures 2c and 2d put in evidence that for $\tau = 3/2$, the domain of convergence is in fact asymptotically dense, which agrees with Theorem 2.1.
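The criticality of $\bar u = (0, -1)$ in Example 3.1 can also be checked directly, via the error bound characterization discussed after Definition 1.1. The small numeric illustration below (not from the paper) shows that, approaching $\bar u$ along $\lambda = -1$, the ratio $\operatorname{dist}(u, \Phi^{-1}(0))/\|\Phi(u)\|$ is unbounded, while near a noncritical solution such as $(0, 0)$ it stays bounded.

```python
import math

# Phi from Example 3.1: Phi(x, lam) = (2x(1 + lam), x**2).
# The solution set is {0} x R, so dist(u, solution set) is simply |x|.
def residual(x, lam):
    return math.hypot(2 * x * (1 + lam), x**2)

# Approach the critical solution (0, -1) along lam = -1: the ratio
# dist / ||Phi|| equals 1/|x| and blows up, so the error bound fails.
ratios_critical = [abs(x) / residual(x, -1.0) for x in (1e-1, 1e-2, 1e-3)]

# Approach the noncritical solution (0, 0): the ratio stays near 1/2.
ratios_noncritical = [abs(x) / residual(x, 0.0) for x in (1e-1, 1e-2, 1e-3)]
```

This is exactly the degeneracy that makes the choice of $\tau$ in Corollary 3.1 delicate.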
Figure 2: Levenberg–Marquardt method with $\tau = 3/2$ for Example 3.1. (a), (c) Iterative sequences; (b), (d) domains of attraction to the critical solution (panels (c) and (d) show a zoomed-in region near $\bar u = (0, -1)$).
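The attraction visible in Figure 2 can be reproduced with a few lines of code. The sketch below is illustrative only (the starting point and iteration count are arbitrary choices): it implements the Levenberg–Marquardt iteration (3.3) with $\sigma(u) = \|\Phi(u)\|^\tau$, $\tau = 3/2$, on Example 3.1.

```python
import numpy as np

# Example 3.1: Phi(x, lam) = (2x(1 + lam), x**2); the primal-dual solution
# set is {0} x R, and (0, -1) is the unique critical solution.
Phi = lambda u: np.array([2 * u[0] * (1 + u[1]), u[0]**2])
dPhi = lambda u: np.array([[2 * (1 + u[1]), 2 * u[0]], [2 * u[0], 0.0]])

def lm_step(u, tau):
    """Levenberg-Marquardt step (3.3) with sigma(u) = ||Phi(u)||**tau."""
    J, F = dPhi(u), Phi(u)
    sigma = np.linalg.norm(F)**tau
    v = np.linalg.solve(J.T @ J + sigma * np.eye(2), -J.T @ F)
    return u + v

u = np.array([0.2, -0.8])      # start off the solution set
for _ in range(50):
    u = lm_step(u, tau=1.5)    # tau = 3/2 suffices under full singularity
# The iterates approach the critical solution (0, -1), rather than one of
# the nearby noncritical solutions (0, lam) with lam != -1.
```

The starting point lies roughly along the direction $(1, 1)/\sqrt{2}$ from $\bar u$, which is one of the 2-regular directions for this example, consistent with Corollary 3.1.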
3.2 LP-Newton method

The LP-Newton method was introduced in [5]. The iteration subproblem of this method has the form
\[
\begin{array}{ll}
\text{minimize} & \gamma \\
\text{subject to} & \|\Phi(u^k) + \Phi'(u^k) v\| \le \gamma \|\Phi(u^k)\|^2, \\
& \|v\| \le \gamma \|\Phi(u^k)\|, \qquad (v, \gamma) \in \mathbb{R}^p \times \mathbb{R}.
\end{array} \tag{3.7}
\]
If the $l_1$-norm is used, this is a linear programming problem (hence the name). As demonstrated in [4, 5] (see also [7]), local convergence properties of the LP-Newton method (under the error bound condition, i.e., near noncritical solutions) are the same as for the Levenberg–Marquardt algorithm. Again, our setting is rather that of critical solutions.

The subproblem (3.7) always has a solution if $\Phi(u^k) \ne 0$ (naturally, if $\Phi(u^k) = 0$ the
method stops). The construction to place (3.7) within the pNM framework (1.2) is as follows. By the second constraint in (3.7), the equality (1.2) holds for Ω ≡ 0 and some ω(·) satisfying

‖ω(u)‖ ≤ γ(u)‖Φ(u)‖²,   (3.8)

where γ(u) is the optimal value of the subproblem (3.7) with u^k = u. Note that ω(·) is defined implicitly, i.e., there is no analytic expression for it. However, ω(·) would satisfy (3.1), and thus the assumptions of Theorem 2.1, if

γ(u) = O(‖Φ(u)‖⁻¹ ‖u − ū‖)   (3.9)

as u → ū. Indeed, from the first constraint in (3.7) we then obtain that

‖Φ(u) + Φ′(u)v‖ ≤ γ(u)‖Φ(u)‖² = O(‖u − ū‖ ‖Φ(u)‖),

which implies the second estimate in (3.1).

We thus have to establish (3.9) on the relevant set. In this respect, the construction is similar to the one in Section 3.1 above. Define ε̄ = ε̄(v̄) > 0 and δ̄ = δ̄(v̄) > 0 according to Lemma 2.1 applied with Ω ≡ 0 and ω ≡ 0, and define the set K according to (3.4). Then the step v(u) of the (unperturbed) Newton method from any point u ∈ K exists, is uniquely defined, and, by (2.6) and (2.7), satisfies v(u) = O(‖u − ū‖). Define Ω and ω on R^p \ K by (3.5). Let u ∈ K. Then the point (v, γ) = (v(u), ‖v(u)‖/‖Φ(u)‖) is feasible in (3.7), and hence,

γ(u) ≤ ‖Φ(u)‖⁻¹ ‖v(u)‖ = O(‖Φ(u)‖⁻¹ ‖u − ū‖)

as u ∈ K tends to ū.

Observe that here the values of ω(·) are defined in an a posteriori manner, after v^k is computed. For this reason, Theorem 2.1 cannot yield uniqueness of the iterative sequence: the next iterate can be defined by any ω(·) satisfying (3.8) for u = u^k, and different choices of appropriate ω(·) may give rise to different next iterates. But all the other conclusions of Theorem 2.1 remain valid. As in Section 3.1, we note that the constructed Ω and ω satisfy (3.1) as u → ū, and therefore, Theorem 2.1 ensures that for appropriate starting points all the iterates stay in K, where the perturbation terms correspond to (3.7). We thus obtain the following.

Corollary 3.2 Under the assumptions of Corollary 3.1, there exist ε = ε(v̄) > 0 and δ = δ(v̄) > 0 such that for any starting point u⁰ ∈ R^p \ {ū} satisfying (2.22) the following assertions are valid:

(a) There exists a sequence {u^k} ⊂ R^p such that for each k the pair (v^k, γ_{k+1}) with v^k = u^{k+1} − u^k and some γ_{k+1} solves (3.7).

(b) For any such sequence, u^k ≠ ū for each k, the sequence {u^k} converges to ū, the sequence {‖u^k − ū‖} converges to zero monotonically, and (2.23) and (2.24) hold.
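With the l∞-norm, (3.7) is an ordinary linear program in the variables (v, γ). The Python sketch below (using scipy.optimize.linprog, an assumed dependency) runs the resulting LP-Newton iteration on Example 3.1; the starting point, iteration cap, and stopping tolerance are illustrative choices, not part of the method's specification.

```python
import numpy as np
from scipy.optimize import linprog

# Example 3.1: Phi(u) = (2x(1 + lam), x^2), u = (x, lam).
def Phi(u):
    x, lam = u
    return np.array([2.0 * x * (1.0 + lam), x ** 2])

def Phi_prime(u):
    x, lam = u
    return np.array([[2.0 * (1.0 + lam), 2.0 * x],
                     [2.0 * x,           0.0]])

def lp_newton_step(u):
    """Solve subproblem (3.7) with the l_inf-norm as an LP in z = (v, gamma)."""
    F, J = Phi(u), Phi_prime(u)
    nF = np.linalg.norm(F, np.inf)
    p = len(F)
    ones = np.ones((p, 1))
    # |(F + J v)_i| <= gamma ||F||^2  and  |v_i| <= gamma ||F||, as A z <= b
    A = np.block([[ J,         -nF ** 2 * ones],
                  [-J,         -nF ** 2 * ones],
                  [ np.eye(p), -nF * ones],
                  [-np.eye(p), -nF * ones]])
    b = np.concatenate([-F, F, np.zeros(p), np.zeros(p)])
    c = np.zeros(p + 1)
    c[-1] = 1.0                                   # minimize gamma
    res = linprog(c, A_ub=A, b_ub=b,
                  bounds=[(None, None)] * p + [(0.0, None)])
    return res.x[:p]

u = np.array([0.1, -0.5])
for _ in range(20):
    if np.linalg.norm(Phi(u), np.inf) < 1e-6:     # Phi(u^k) = 0: the method stops
        break
    u = u + lp_newton_step(u)
```

Note that γ is kept as an explicit LP variable; the optimal value γ(u) of each subproblem is available as res.fun if one wants to monitor the estimate (3.9) along the iterations.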
Figure 3 shows the same information for the LP-Newton method as Figure 2 shows for the Levenberg–Marquardt algorithm, with the same conclusions.
[Figure 3 comprises four panels in the (x, λ) plane: (a) iterative sequences; (b) domain of attraction to the critical solution; (c) iterative sequences, zoomed in near ū; (d) domain of attraction to the critical solution, zoomed in near ū.]
Figure 3: LP-Newton method for Example 3.1.
3.3 Equality-constrained optimization and the stabilized Newton–Lagrange method
We next turn our attention to the origin of the critical solutions issues, namely, to the equality-constrained optimization problem

minimize f(x)  subject to  h(x) = 0,   (3.10)

where f : R^n → R and h : R^n → R^l are smooth. The Lagrangian L : R^n × R^l → R of this problem is given by

L(x, λ) = f(x) + ⟨λ, h(x)⟩.

Then stationary points and associated Lagrange multipliers of (3.10) are characterized by the Lagrange optimality system

∂L/∂x (x, λ) = 0,   h(x) = 0,
with respect to x ∈ R^n and λ ∈ R^l. The Lagrange optimality system is a special case of the nonlinear equation (1.1), corresponding to setting p = q = n + l, u = (x, λ),

Φ(u) = (∂L/∂x (x, λ), h(x)).   (3.11)

The stabilized Newton–Lagrange method (or stabilized sequential quadratic programming) was developed for solving the Lagrange optimality system (or the optimization problem) when the multipliers associated to a stationary point need not be unique [25, 11, 6, 19]; see also [20, Chapter 7]. For the current iterate u^k = (x^k, λ^k) ∈ R^n × R^l, the iteration subproblem of this method is given by

minimize ⟨f′(x^k), ξ⟩ + (1/2)⟨∂²L/∂x² (x^k, λ^k) ξ, ξ⟩ + (σ(u^k)/2)‖η‖²
subject to h(x^k) + h′(x^k)ξ − σ(u^k)η = 0,

where the minimization is in the variables (ξ, η) ∈ R^n × R^l, and σ : R^n × R^l → R₊ defines the stabilization parameter. Equivalently, the following linear system (characterizing stationary points and associated Lagrange multipliers of this subproblem) is solved:

∂L/∂x (x^k, λ^k) + ∂²L/∂x² (x^k, λ^k) ξ + (h′(x^k))ᵀ η = 0,
h(x^k) + h′(x^k)ξ − σ(u^k)η = 0.   (3.12)

With a solution (ξ^k, η^k) of (3.12) at hand, the next iterate is given by u^{k+1} = (x^k + ξ^k, λ^k + η^k). Note that if σ ≡ 0, then (3.12) becomes the usual Newton–Lagrange method, i.e., the basic Newton method applied to the Lagrange optimality system.

For a given Lagrange multiplier λ̄ associated with a stationary point x̄ of problem (3.10), define the linear subspace

Q(x̄, λ̄) = { ξ ∈ ker h′(x̄) | ∂²L/∂x² (x̄, λ̄) ξ ∈ im (h′(x̄))ᵀ }.

Recall that the multiplier λ̄ is called critical if Q(x̄, λ̄) ≠ {0}; see [12, 17]. Otherwise λ̄ is noncritical. As demonstrated in [19], if initialized near a primal-dual solution with a noncritical dual part, the stabilized Newton–Lagrange method with σ(u) = ‖Φ(u)‖, where Φ is given by (3.11), generates a sequence which is superlinearly convergent to a (nearby) solution. Again, of current interest is the critical case.

Evidently, the iteration (3.12) fits the pNM framework (1.2) for Φ defined in (3.11), taking

Ω(u) = [ 0  0 ; 0  −σ(u)I ],   ω ≡ 0.
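On Example 3.1 (f(x) = x², h(x) = x², n = l = 1), one pass of the linear system (3.12) takes a few lines of Python. The sketch below is illustrative only: the choice σ(u) = ‖Φ(u)‖^τ and the starting point are assumptions of the sketch, and, as Table 1 indicates, whether the dual sequence ends up at the critical multiplier λ̄ = −1 depends on where the iteration starts.

```python
import numpy as np

# Example 3.1 in the form (3.10): f(x) = x^2, h(x) = x^2 (n = l = 1),
# so dL/dx = 2x(1 + lam), d^2L/dx^2 = 2(1 + lam), h'(x) = 2x.
def ssqp_step(x, lam, tau=1.0):
    """One stabilized Newton-Lagrange step, i.e., system (3.12) with
    sigma(u) = ||Phi(u)||^tau."""
    Lx = 2.0 * x * (1.0 + lam)
    Lxx = 2.0 * (1.0 + lam)
    h, hp = x ** 2, 2.0 * x
    sigma = np.hypot(Lx, h) ** tau       # ||Phi(u)||^tau
    M = np.array([[Lxx, hp],
                  [hp, -sigma]])
    xi, eta = np.linalg.solve(M, [-Lx, -h])
    return x + xi, lam + eta

x, lam = 0.5, -0.5
for _ in range(60):
    if np.hypot(2.0 * x * (1.0 + lam), x ** 2) < 1e-14:   # Phi(u) = 0: stop
        break
    x, lam = ssqp_step(x, lam)
```

Setting sigma = 0.0 in ssqp_step recovers the basic Newton–Lagrange iteration, which is how the N-L column of Table 1 was produced in principle (the table itself is from the paper's experiments, not from this sketch).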
Table 1: Cases of convergence to λ̄ = −1 in Example 3.1 (%)

  ε     | L-M τ=1 | L-M τ=3/2 | L-M τ=2 | LP-N | sN-L | N-L
  1     |   38    |    44     |   56    |  54  |  42  |  83
  0.5   |   34    |    52     |   68    |  55  |  57  |  86
  0.25  |   36    |    57     |   79    |  58  |  73  |  86
  0.1   |   38    |    68     |   86    |  73  |  83  |  90
  0.01  |   38    |    84     |   86    |  86  |  88  |  86
These perturbations satisfy the assumptions in Theorem 2.1 if, e.g., σ(u) = ‖Φ(u)‖^τ with τ ≥ 1. We thus conclude the following.

Corollary 3.3 Let f : R^n → R and h : R^n → R^l be three times differentiable near a stationary point x̄ ∈ R^n of problem (3.10), with their third derivatives Lipschitz-continuous with respect to x, and let λ̄ ∈ R^l be a Lagrange multiplier associated to x̄. Assume that the mapping Φ defined in (3.11) is 2-regular at ū = (x̄, λ̄) in a direction v̄ ∈ ker Φ′(ū) ∩ S. Then for any τ ≥ 1, there exist ε = ε(v̄) > 0 and δ = δ(v̄) > 0 such that any starting point u⁰ = (x⁰, λ⁰) ∈ (R^n × R^l) \ {ū} satisfying (2.22) uniquely defines a sequence {(x^k, λ^k)} ⊂ R^n × R^l such that for each k the pair (ξ^k, η^k) = (x^{k+1} − x^k, λ^{k+1} − λ^k) solves (3.12) with σ(u) = ‖Φ(u)‖^τ; moreover, u^k ≠ ū for each k, the sequence {u^k} converges to ū, the sequence {‖u^k − ū‖} converges to zero monotonically, and (2.23) and (2.24) hold.

It is worth mentioning that the above results and discussion can be extended to variational problems (of which optimization is a special case); see [6], [20, Chapter 7].

We proceed with some further considerations. A multiplier is said to be critical of order one if dim Q(x̄, λ̄) = 1. The following was established in [14].

Proposition 3.1 Let f : R^n → R and h : R^n → R^l be three times differentiable at a stationary point x̄ ∈ R^n of problem (3.10), and let λ̄ ∈ R^l be a Lagrange multiplier associated to x̄. Let Q(x̄, λ̄) be spanned by some ξ̄ ∈ R^n \ {0}, i.e., λ̄ is a critical multiplier of order one.

If rank h′(x̄) = l − 1, then ker Φ′(ū) contains elements of the form v = (ξ̄, η) with some η ∈ R^l, and Φ is 2-regular at ū in every such direction if and only if h″(x̄)[ξ̄, ξ̄] ∉ im h′(x̄).

If rank h′(x̄) ≤ l − 2, then Φ cannot be 2-regular at ū in any direction v ∈ ker Φ′(ū).

If h′(x̄) = 0, and l ≥ 2 or h″(x̄)[ξ̄, ξ̄] = 0, then Φ cannot be 2-regular at ū in any direction v ∈ ker Φ′(ū).
In the last two cases specified in the proposition above, Theorem 2.1 is not applicable; these cases require special investigation. In the last case, when l ≥ 2 but h″(x̄)[ξ̄, ξ̄] ≠ 0, allowing for non-isolated critical multipliers, the effect of attraction of the basic Newton–Lagrange method to such multipliers was studied in [22] for fully quadratic problems.

On the other hand, if, e.g., l = 1, h′(x̄) = 0, and h″(x̄)[ξ̄, ξ̄] ≠ 0, then Theorem 2.1 is applicable with v̄ = (ξ̄, η) for every η ∈ R. Taking here η = 0 recovers the results in [24] for
Figure 4: Critical multipliers in Example 3.2.

the basic Newton–Lagrange method and the fully quadratic case. Moreover, this is exactly the situation that we have in Example 3.1. Table 1 reports the percentage of detected cases of dual convergence to the unique critical multiplier λ̄ = −1 in Example 3.1, for the algorithms discussed above, depending on the size ε of the region of starting points around (x̄, λ̄). In the table, L-M refers to the Levenberg–Marquardt method (with different values of the power τ that defines the regularization parameter), LP-N to the LP-Newton method, N-L to the Newton–Lagrange method (i.e., (3.12) with σ ≡ 0), and sN-L to the stabilized Newton–Lagrange method.

The case when dim Q(x̄, λ̄) ≥ 2 (i.e., when λ̄ is critical of order higher than one) opens wide possibilities for 2-regularity in the needed directions, and such solutions are often especially attractive for Newton-type iterates.

Example 3.2 (DEGEN 20302) Consider the equality-constrained optimization problem

minimize x₁² − x₂² + 2x₃²
subject to −(1/2)x₁² + x₂² − (1/2)x₃² = 0,   x₁x₃ = 0.
Then x̄ = 0 is the unique solution, h′(x̄) = 0, and the set of associated Lagrange multipliers is the entire R². Critical multipliers are those satisfying λ₁ = 1 or (λ₁ − 3)² − λ₂² = 1. In Figure 4, the critical multipliers form the vertical straight line and the two branches of the hyperbola.
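Since h′(x̄) = 0 in this example, Q(x̄, λ) reduces to the null space of the Hessian of the Lagrangian at x̄, so criticality of a multiplier, and its order, can be checked numerically. The Python sketch below hard-codes that Hessian, as reconstructed from the problem data above (its determinant vanishes exactly on the two curves just described, which serves as a consistency check), and computes dim Q(x̄, λ) via singular value decomposition.

```python
import numpy as np

def hessL(lam1, lam2):
    """Hessian at x = 0 of the Lagrangian of Example 3.2:
    L = x1^2 - x2^2 + 2 x3^2 + lam1 (-x1^2/2 + x2^2 - x3^2/2) + lam2 x1 x3."""
    return np.array([[2.0 - lam1, 0.0,              lam2],
                     [0.0,        2.0 * lam1 - 2.0, 0.0],
                     [lam2,       0.0,              4.0 - lam1]])

def dimQ(lam1, lam2, tol=1e-9):
    # h'(0) = 0, so Q(x_bar, lam) = ker hessL(lam);
    # its dimension is the number of (numerically) zero singular values.
    s = np.linalg.svd(hessL(lam1, lam2), compute_uv=False)
    return int(np.sum(s < tol))
```

Evaluating dimQ gives 0 off the two curves, 1 on the line λ₁ = 1 and on the hyperbola, and 2 at their intersection points (1, ±√3), i.e., exactly the two multipliers that are critical of order higher than one.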
Table 2: Cases of convergence to λ̄ = (1, √3) in Example 3.2 (%)

  ε     | L-M τ=1 | L-M τ=3/2 | L-M τ=2 | LP-N | sN-L | N-L
  1     |   21    |    26     |   31    |  42  |   4  |  65
  0.5   |   21    |    27     |   37    |  45  |   7  |  75
  0.25  |   23    |    29     |   47    |  48  |  15  |  82
  0.1   |   23    |    33     |   63    |  55  |  26  |  91
  0.01  |   24    |    60     |   89    |  82  |  59  |  97
According to Proposition 3.1, for all critical multipliers λ̄ except λ̄ = (1, ±√3), which are the two intersection points of the vertical line and the hyperbola, 2-regularity cannot hold for Φ defined in (3.11) at (x̄, λ̄) in any direction v ∈ ker Φ′(ū). For λ̄ = (1, ±√3), one can directly check that the mapping Φ is indeed 2-regular at ū = (x̄, λ̄) in some directions v ∈ ker Φ′(ū).

Table 2 reports the percentage of detected cases of dual convergence to λ̄ = (1, √3), for the algorithms discussed above, depending on the size ε of the region of starting points around (x̄, λ̄).
References

[1] A.V. Arutyunov. Optimality Conditions: Abnormal and Degenerate Problems. Kluwer Academic Publishers, Dordrecht, 2000.
[2] E.R. Avakov. Extremum conditions for smooth problems with equality-type constraints. USSR Comput. Math. Math. Phys. 25 (1985), 24–32.
[3] DEGEN. http://w3.impa.br/~optim/DEGEN collection.zip.
[4] F. Facchinei, A. Fischer, and M. Herrich. A family of Newton methods for nonsmooth constrained systems with nonisolated solutions. Math. Methods Oper. Res. 77 (2013), 433–443.
[5] F. Facchinei, A. Fischer, and M. Herrich. An LP-Newton method: Nonsmooth equations, KKT systems, and nonisolated solutions. Math. Program. 146 (2014), 1–36.
[6] D. Fernández and M. Solodov. Stabilized sequential quadratic programming for optimization and a stabilized Newton-type method for variational problems. Math. Program. 125 (2010), 47–73.
[7] A. Fischer, M. Herrich, A.F. Izmailov, and M.V. Solodov. Convergence conditions for Newton-type methods applied to complementarity systems with nonisolated solutions. Comput. Optim. Appl. 63 (2016), 425–459.
[8] H. Gfrerer and B.S. Mordukhovich. Complete characterizations of tilt stability in nonlinear programming under weakest qualification conditions. SIAM J. Optim. 25 (2015), 2081–2119.
[9] H. Gfrerer and J.V. Outrata. On computation of limiting coderivatives of the normal-cone mapping to inequality systems and their applications. Optimization. DOI 10.1080/02331934.2015.1066372.
[10] A. Griewank. Starlike domains of convergence for Newton's method at singularities. Numer. Math. 35 (1980), 95–111.
[11] W.W. Hager. Stabilized sequential quadratic programming. Comput. Optim. Appl. 12 (1999), 253–273.
[12] A.F. Izmailov. On the analytical and numerical stability of critical Lagrange multipliers. Comput. Math. Math. Phys. 45 (2005), 930–946.
[13] A.F. Izmailov, A.S. Kurennoy, and M.V. Solodov. A note on upper Lipschitz stability, error bounds, and critical multipliers for Lipschitz-continuous KKT systems. Math. Program. 142 (2013), 591–604.
[14] A.F. Izmailov, A.S. Kurennoy, and M.V. Solodov. Critical solutions of nonlinear equations: Stability issues. Submitted November 2015 (revised March 2016). http://www.cs.wisc.edu/~solodov/iks15Eq-critical.pdf.
[15] A.F. Izmailov and M.V. Solodov. Error bounds for 2-regular mappings with Lipschitzian derivatives and their applications. Math. Program. 89 (2001), 413–435.
[16] A.F. Izmailov and M.V. Solodov. The theory of 2-regularity for mappings with Lipschitzian derivatives and its applications to optimality conditions. Math. Oper. Res. 27 (2002), 614–635.
[17] A.F. Izmailov and M.V. Solodov. On attraction of Newton-type iterates to multipliers violating second-order sufficiency conditions. Math. Program. 117 (2009), 271–304.
[18] A.F. Izmailov and M.V. Solodov. On attraction of linearly constrained Lagrangian methods and of stabilized and quasi-Newton SQP methods to critical multipliers. Math. Program. 126 (2011), 231–257.
[19] A.F. Izmailov and M.V. Solodov. Stabilized SQP revisited. Math. Program. 122 (2012), 93–120.
[20] A.F. Izmailov and M.V. Solodov. Newton-Type Methods for Optimization and Variational Problems. Springer Series in Operations Research and Financial Engineering. Springer, 2014.
[21] A.F. Izmailov and M.V. Solodov. Critical Lagrange multipliers: what we currently know about them, how they spoil our lives, and what we can do about it. TOP 23 (2015), 1–26. Rejoinder of the discussion: TOP 23 (2015), 48–52.
[22] A.F. Izmailov and E.I. Uskov. Attraction of Newton method to critical Lagrange multipliers: fully quadratic case. Math. Program. 152 (2015), 33–73.
[23] J. Nocedal and S.J. Wright. Numerical Optimization. 2nd edition. Springer-Verlag, New York, Berlin, Heidelberg, 2006.
[24] E.I. Uskov. On the attraction of Newton method to critical Lagrange multipliers. Comput. Math. Math. Phys. 53 (2013), 1099–1112.
[25] S.J. Wright. Superlinear convergence of a stabilized SQP method to a degenerate solution. Comput. Optim. Appl. 11 (1998), 253–275.
[26] N. Yamashita and M. Fukushima. On the rate of convergence of the Levenberg–Marquardt method. Computing Suppl. 15 (2001), 237–249.