CONVERGENCE TO SECOND ORDER STATIONARY POINTS IN INEQUALITY CONSTRAINED OPTIMIZATION
Francisco Facchinei and Stefano Lucidi
Università di Roma "La Sapienza", Dipartimento di Informatica e Sistemistica, Via Buonarroti 12, 00185 Roma, Italy
e-mail (Facchinei): [email protected]
e-mail (Lucidi): [email protected]

Abstract: We propose a new algorithm for the nonlinear inequality constrained minimization problem, and prove that it generates a sequence converging to points satisfying the KKT second order necessary conditions for optimality. The algorithm is a line search algorithm using directions of negative curvature, and it can be viewed as a nontrivial extension of corresponding known techniques from unconstrained to constrained problems. The main tools employed in the definition and in the analysis of the algorithm are a differentiable exact penalty function and results from the theory of $LC^1$ functions.

Key Words: Inequality constrained optimization, KKT second order necessary conditions, penalty function, $LC^1$ function, negative curvature direction.
1 Introduction

We are concerned with the inequality constrained minimization problem
$$ \mathrm{(P)} \qquad \min\ f(x) \quad \mbox{s.t.} \quad g(x) \le 0, $$
where $f: \mathbb{R}^n \to \mathbb{R}$ and $g: \mathbb{R}^n \to \mathbb{R}^m$ are three times continuously differentiable. Our aim is to develop an algorithm that generates sequences converging to points $\bar x$ satisfying, together with a suitable multiplier $\bar\lambda \in \mathbb{R}^m$, both the KKT first order necessary optimality conditions
$$ \nabla f(\bar x) + \nabla g(\bar x)\bar\lambda = 0, \qquad \bar\lambda \ge 0, \qquad g(\bar x) \le 0, \qquad \bar\lambda^T g(\bar x) = 0, \eqno(1) $$
and the KKT second order necessary optimality conditions
$$ \nabla g_i(\bar x)^T z = 0,\ \forall i : g_i(\bar x) = 0 \;\Longrightarrow\; z^T \Big( \nabla^2 f(\bar x) + \sum_{i=1}^m \bar\lambda_i \nabla^2 g_i(\bar x) \Big) z \ge 0. \eqno(2) $$
In the sequel we will call a point $\bar x$ satisfying (1) a first order stationary point (or just stationary point), while a point satisfying both (1) and (2) will be termed a second order stationary point. In the unconstrained case conditions (1) and (2) boil down to $\nabla f(\bar x) = 0$ and $\nabla^2 f(\bar x) \succeq 0$ respectively. Standard algorithms for unconstrained minimization usually generate sequences converging to first order stationary points. In a landmark paper [21] (see also [23, 22, 16, 20] and references therein for subsequent developments), McCormick showed that, by using directions of negative curvature in an Armijo-type line search procedure, it is possible to guarantee convergence to second order stationary points. From a theoretical point of view this is a very strong result, since it makes it much more likely that the limit points of the sequence generated by the algorithm are local minimizers and not just saddle points. Furthermore, from a practical point of view, the use of negative curvature directions turns out to be very helpful in the minimization of problems with large non-convex regions [16, 20]. Convergence to second order stationary points was later established also for trust-region algorithms [25, 24], and this constitutes one of the main reasons for the popularity of this class of methods. Trust-region algorithms have been extended to equality constrained and box constrained problems so as to maintain convergence to second order stationary points [2, 8, 13, 5, 25, 24, 26], while negative curvature line search algorithms have been proposed for the linearly inequality constrained case [22, 17]. However, as far as we are aware, no algorithm for the solution of the more complex nonlinearly inequality constrained minimization Problem (P) exists which generates sequences converging to second order stationary points.
The main purpose of this paper is to fill this gap by presenting a negative curvature line search algorithm which enjoys this property. The basic idea behind our approach can be easily explained as follows. (a) Reduce the constrained minimization Problem (P) to an equivalent unconstrained minimization problem by using a differentiable exact penalty function. (b) Apply a negative curvature line search algorithm to the minimization of the penalty function. Although appealingly simple, this approach requires us to tackle some difficulties to make it viable. First of all we have to establish a connection between the unconstrained stationary points of the penalty function provided by the unconstrained minimization algorithm and the constrained second order stationary points of Problem (P). Secondly, we must cope with the fact that differentiable exact penalty functions, although once continuously differentiable, are never twice continuously differentiable everywhere, so that we cannot use an off-the-shelf negative curvature algorithm for their minimization. Furthermore, even at points where the second order derivatives exist, their explicit evaluation would require the use of the third order derivatives of the functions $f$ and $g$, which we are not willing to calculate. To overcome these difficulties we develop a negative curvature algorithm for the unconstrained minimization of the penalty function which is based on the theory of $LC^1$ functions and on generalized Hessians. We show that by using a suitable approximation to elements of the generalized Hessian of the penalty function we can guarantee that the unconstrained minimization of the penalty function yields an unconstrained stationary point where a matrix which approximates an element of the generalized Hessian is positive semidefinite. This suffices to ensure that the point so found is also a second order stationary point of Problem (P). We believe that the algorithm proposed in this paper is of interest because, for the first time, we are able to prove convergence to second order stationary points for general inequality constrained problems. We do so by a fairly natural extension of negative curvature algorithms from unconstrained to constrained problems; we note that the computation of the negative curvature direction can be performed in a manner analogous to, and at the same cost as, the unconstrained case. We also remark that we never require the strict complementarity assumption to establish our results. Finally, we think that the use of some nontrivial nonsmooth analysis results to analyze the behavior of smooth algorithms is a novel feature that could be fruitfully applied also in other cases.

The paper is organized as follows. In the next section we recall a few known facts about $LC^1$ functions and generalized Hessians and about the asymptotic identification of active constraints. Furthermore, we also introduce the penalty function along with some of its relevant properties. In Section 3 we introduce and analyze the algorithm. In the fourth section we give some hints on the practical realization of the algorithm. Finally, in the last section we outline possible improvements and make some remarks.

We finally review the notation used in this paper. The gradient of a function $h: \mathbb{R}^n \to \mathbb{R}$ is indicated by $\nabla h$, while its Hessian matrix is denoted by $\nabla^2 h$. If $H: \mathbb{R}^n \to \mathbb{R}^m$ then the matrix $\nabla H$ is given by $\nabla H = (\nabla H_1 \cdots \nabla H_m)$. If $I$ is an index set, with $I \subseteq \{1, \ldots, m\}$, $H_I$ is the vector obtained by considering the components of $H$ in $I$. We indicate by $\|\cdot\|$ the Euclidean norm and the corresponding matrix norm. If $S$ is a subset of $\mathbb{R}^n$, $\mathrm{co}\,S$ denotes its convex hull. If $A$ is a square matrix, $\lambda_{\min}(A)$ denotes its smallest eigenvalue. The Lagrangian of Problem (P) is $L(x, \lambda) = f(x) + g(x)^T \lambda$. We indicate by $L(x, \lambda(x))$ the Lagrangian of Problem (P) evaluated at $\lambda = \lambda(x)$. Analogously, we indicate by $\nabla_x L(x, \lambda(x))$ ($\nabla^2_{xx} L(x, \lambda(x))$) the gradient (Hessian) of the Lagrangian with respect to $x$ evaluated at $\lambda = \lambda(x)$.
2 Background material

In this section we review some results on the differentiability of functions and on the identification of active constraints. We also recall the definition and some basic facts about a differentiable exact penalty function for Problem (P), and we establish some related new results.
2.1 $LC^1$ functions

Definition 2.1 A function $h: \mathbb{R}^n \to \mathbb{R}$ is said to be an $LC^1$ function on an open set $O$ if
- $h$ is continuously differentiable on $O$,
- $\nabla h$ is locally Lipschitz on $O$.

$LC^1$ functions were first systematically studied in [18], where the definition of the generalized Hessian and the theorem reported below were also given. The gradient of $h$ is locally Lipschitz on $O$; $\nabla h$ is therefore differentiable almost everywhere in $O$, so that its generalized Jacobian in Clarke's sense [3] can be defined. This is precisely the generalized Hessian of $h$, whose definition is as follows.

Definition 2.2 Let $h: \mathbb{R}^n \to \mathbb{R}$ be an $LC^1$ function on the open set $O$ and let $x$ belong to $O$. We define the generalized Hessian of $h$ at $x$ to be the set $\partial^2 h(x)$ of matrices defined as
$$ \partial^2 h(x) := \mathrm{co}\,\{ H \in \mathbb{R}^{n \times n} : \exists\, x_k \to x \ \mbox{with $\nabla h$ differentiable at $x_k$ and} \ \nabla^2 h(x_k) \to H \}. $$
Note that $\partial^2 h(x)$ is a nonempty, convex, compact set of symmetric matrices. Furthermore, the point-to-set map $x \mapsto \partial^2 h(x)$ is bounded on bounded sets [18]. For $LC^1$ functions a second-order Taylor-like expansion is possible. This is the main result on $LC^1$ functions we shall need in this paper.

Theorem 2.1 Let $h: \mathbb{R}^n \to \mathbb{R}$ be an $LC^1$ function on the open set $O$ and let $x$ and $y$ be two points such that the segment $[x, y]$ is contained in $O$. Then
$$ h(y) = h(x) + \nabla h(x)^T (y - x) + \tfrac{1}{2} (y - x)^T H (y - x), $$
where $H \in \partial^2 h(z)$ for some $z \in (x, y)$.
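As a toy illustration of Definition 2.2 (our own example, not taken from [18]), consider the one-dimensional $LC^1$ function $h(x) = \max(0, x)^2$: its gradient $2\max(0, x)$ is Lipschitz but not differentiable at $x = 0$, and $\partial^2 h(0) = \mathrm{co}\{0, 2\} = [0, 2]$. A minimal numerical sketch recovering the two limiting "Hessians":

```python
def grad_h(x):
    """Gradient of h(x) = max(0, x)**2: locally Lipschitz, not differentiable at 0."""
    return 2.0 * max(0.0, x)

# Limiting Hessians at 0: differentiate grad_h where it is smooth, on both sides.
step = 1e-9
left = (grad_h(-1e-6 + step) - grad_h(-1e-6)) / step   # -> 0.0
right = (grad_h(1e-6 + step) - grad_h(1e-6)) / step    # -> 2.0
print(left, right)  # the generalized Hessian at 0 is co{0, 2} = [0, 2]
```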
2.2 Identification of active constraints

In this section we recall some results on the identification of the active constraints at a stationary point $\bar x$ of the nonlinear program (P). We refer the interested reader to [14] and to references therein for a detailed discussion of this issue. Here we recall only some results in order to stress the fact that, in a neighborhood of a first order stationary point, it is possible, under mild assumptions, to correctly identify those constraints that are active at the solution. First of all we need some terminology. Given a stationary point $\bar x$ with a corresponding multiplier $\bar\lambda$, which we suppose to be unique, we denote by $I_0(\bar x)$ the set of active constraints
$$ I_0(\bar x) := \{ i \mid g_i(\bar x) = 0 \}, $$
while $I_+(\bar x)$ denotes the index set of strongly active constraints
$$ I_+(\bar x) := \{ i \in I_0(\bar x) \mid \bar\lambda_i > 0 \}. $$
Our aim is to construct a rule which is able to assign to every point $x$ an estimate $A(x)$ so that $A(x) = I_0(\bar x)$ holds if $x$ lies in a suitably small neighborhood of the stationary point $\bar x$. Usually estimates of this kind are obtained by comparing the values of $g_i(x)$ with the value of an estimate of the multiplier $\bar\lambda$. For example, it can be easily shown that the set
$$ I(x) := \{ i \mid g_i(x) \ge -c\,\lambda_i(x) \}, $$
where $c$ is a positive constant and $\lambda: \mathbb{R}^n \to \mathbb{R}^m$ is a multiplier function (i.e. a continuous function such that $\lambda(\bar x) = \bar\lambda$), coincides with the set $I_0(\bar x)$ for all $x$ in a sufficiently small neighborhood of a stationary point $\bar x$ which satisfies the strict complementarity condition (see the next section for an example of multiplier function). If this condition is violated, then only the inclusions
$$ I_+(\bar x) \subseteq I(x) \subseteq I_0(\bar x) \eqno(3) $$
hold [14]. If the stationary point $\bar x$ does not satisfy strict complementarity, the situation is therefore more complex, and only recently has it been shown that it is nevertheless possible to correctly estimate the set $I_0(\bar x)$ [14]. We will not go into details here; we only point out that the identification of the active constraints when strict complementarity does not hold is possible, under very mild assumptions, in a simple way. The identification rule takes the following form:
$$ A(x) := \{ i \mid g_i(x) \ge -\delta(x) \}, \eqno(4) $$
where $\delta(x)$ is a function that can take different forms according to the assumptions made on the stationary point $\bar x$. For example, if in a neighborhood of $\bar x$ both $f$ and $g$ are analytic (an assumption which is met in most practical cases), one can define $\delta(x)$ as
$$ \delta(x) = \begin{cases} 0 & \mbox{if } r(x) = 0, \\ -1/\log(r(x)) & \mbox{if } r(x) \in (0, 0.9), \\ -1/\log(0.9) & \mbox{if } r(x) \ge 0.9, \end{cases} $$
where
$$ r(x) = \|\nabla f(x) + \nabla g(x)\lambda(x)\| + |\lambda(x)^T g(x)| + \|\min[0, \lambda(x)]\| + \|\max[0, g(x)]\|. \eqno(5) $$
With this choice the set $A(x)$ defined by (4) will coincide with $I_0(\bar x)$ in a suitable neighborhood of $\bar x$.
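To fix ideas, here is a minimal sketch of the rule (4)-(5). The function names, the callable interface and the exact-zero test are our own conventions: `lam` is any multiplier function (for instance the one introduced in Section 2.3) and `Jg` returns the $m \times n$ Jacobian of $g$.

```python
import numpy as np

def residual(x, grad_f, g, Jg, lam):
    """r(x) of (5): stationarity, complementarity and feasibility residual."""
    l, gv = lam(x), g(x)
    r = np.linalg.norm(grad_f(x) + Jg(x).T @ l)   # ||grad f + grad g . lambda||
    r += abs(l @ gv)                              # |lambda^T g|
    r += np.linalg.norm(np.minimum(0.0, l))       # negative multiplier part
    r += np.linalg.norm(np.maximum(0.0, gv))      # constraint violation
    return r

def active_set_estimate(x, grad_f, g, Jg, lam):
    """A(x) of (4), with delta(x) as suggested for analytic data."""
    r = residual(x, grad_f, g, Jg, lam)
    if r == 0.0:
        delta = 0.0
    elif r < 0.9:
        delta = -1.0 / np.log(r)    # log(r) < 0, so delta > 0
    else:
        delta = -1.0 / np.log(0.9)
    return {i for i, gi in enumerate(g(x)) if gi >= -delta}
```

In exact arithmetic $r(\bar x) = 0$ precisely at a KKT point, which is what makes $\delta(x)$ shrink, and the estimate tighten, as $x$ approaches $\bar x$.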
2.3 Penalty function
In this section we consider a differentiable penalty function for Problem (P); we recall some relevant known facts and prove some new results which are related to the differentiability issues dealt with in Section 2.1. In order to define the differentiable penalty function and to guarantee some of the properties that will be needed, we make the following two assumptions.

Assumption A. For any $x \in \mathbb{R}^n$, the gradients $\nabla g_i(x)$, $i \in I_0(x)$, are linearly independent.

Assumption B. For any $x \in \mathbb{R}^n$, the following implication holds:
$$ \sum_{i:\, g_i(x) \ge 0} g_i(x) \nabla g_i(x) = 0 \;\Longrightarrow\; g_i(x) = 0, \ \forall i : g_i(x) \ge 0. $$

Assumptions A and B, together with Assumption C, which will be stated in Section 3, are the only assumptions used to establish the results of this paper. These assumptions, or assumptions similar to them, are frequently encountered in the analysis of constrained minimization algorithms. However, we point out that they can be considerably relaxed; this will be discussed in Section 5. We chose to use this particular set of assumptions in order to simplify the analysis and to concentrate on the issues related to the main topic of the paper, i.e. convergence to second order stationary points. We start by defining a multiplier function
$$ \lambda(x) := -M(x)^{-1} \nabla g(x)^T \nabla f(x), \eqno(6) $$
where $M(x)$ is the $m \times m$ matrix defined by
$$ M(x) = \nabla g(x)^T \nabla g(x) + G^2(x), \eqno(7) $$
and $G(x) := \mathrm{diag}(g_i(x))$. The main property of this function is that it is continuously differentiable (see below) and, if $\bar x$ is a first order stationary point, then $\lambda(\bar x)$ is the corresponding multiplier (which, by Assumption A, is unique).
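As an illustration, a minimal numerical sketch of (6)-(7); the function name and interface are ours, and $\nabla g(x)$ is represented through the standard $m \times n$ Jacobian `Jg` (so that $\nabla g(x) = $ `Jg.T`):

```python
import numpy as np

def multiplier_function(grad_f, Jg, g):
    """lambda(x) = -M(x)^{-1} grad g(x)^T grad f(x), with
    M(x) = grad g(x)^T grad g(x) + G(x)^2 as in (6)-(7).

    grad_f : gradient of f at x, shape (n,)
    Jg     : Jacobian of g at x, shape (m, n); grad g(x) = Jg.T
    g      : constraint values g(x), shape (m,)
    """
    Dg = Jg.T                              # n x m, columns are grad g_i(x)
    M = Dg.T @ Dg + np.diag(g ** 2)        # m x m; nonsingular under Assumption A
    return np.linalg.solve(M, -Dg.T @ grad_f)
```

Note that $M(x)$ is built from first order information only, so evaluating $\lambda(x)$ costs a single $m \times m$ linear solve.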
Using this function we can define the following penalty function
$$ Z(x; \varepsilon) := f(x) + \sum_{i=1}^m \Big( \lambda_i(x) \max\big[g_i(x), -\tfrac{\varepsilon}{2}\lambda_i(x)\big] + \frac{1}{\varepsilon} \max\big[g_i(x), -\tfrac{\varepsilon}{2}\lambda_i(x)\big]^2 \Big), \eqno(8) $$
where $\varepsilon > 0$ is the so-called penalty parameter.

Theorem 2.2 The following properties hold:
(a) For every $\varepsilon$, the penalty function $Z$ is continuously differentiable on $\mathbb{R}^n$ and its gradient is given by
$$ \nabla Z(x; \varepsilon) = \nabla_x L(x, \lambda(x)) + \sum_{i=1}^m \max\big[g_i(x), -\tfrac{\varepsilon}{2}\lambda_i(x)\big] \Big( \frac{2}{\varepsilon}\nabla g_i(x) + \nabla\lambda_i(x) \Big), $$
where
$$ \nabla\lambda(x)^T = -M(x)^{-1} \Big[ \nabla g(x)^T \nabla^2_{xx} L(x, \lambda(x)) + \sum_{i=1}^m e_i \nabla_x L(x, \lambda(x))^T \nabla^2 g_i(x) + 2\Lambda(x) G(x) \nabla g(x)^T \Big], $$
$e_i$ is the $i$-th column of the $m \times m$ identity matrix and $\Lambda(x) := \mathrm{diag}(\lambda_i(x))$.
(b) For every $\varepsilon$, the function $Z$ is an $LC^1$ function on $\mathbb{R}^n$.
(c) Let $\bar x$ be a first order stationary point for Problem (P) and let $\varepsilon$ be given. Then there exists a neighborhood $\Omega$ of $\bar x$ such that, for every $x \in \Omega$, the following overestimate of the generalized Hessian of $Z$ evaluated at $x$ holds:
$$ \partial^2 Z(x; \varepsilon) \subseteq \tilde\partial^2 Z(x; \varepsilon) := \mathrm{co}\{ H(x; \varepsilon; A),\ A \in \mathcal{A} \}, $$
where
$$ H(x; \varepsilon; A) = \nabla^2_{xx} L(x, \lambda(x)) + \nabla\lambda_A(x)\nabla g_A(x)^T + \nabla g_A(x)\nabla\lambda_A(x)^T + \frac{2}{\varepsilon}\nabla g_A(x)\nabla g_A(x)^T - \frac{\varepsilon}{2}\nabla\lambda_N(x)\nabla\lambda_N(x)^T + K_A(x), $$
$\mathcal{A} := \{ A : A = I_+(\bar x) \cup J,\ J \subseteq I_0(\bar x) \setminus I_+(\bar x) \}$, $N = \{1, \ldots, m\} \setminus A$, and $K_A(x)$ is a matrix for which we can write $\|K_A(x)\| \le b(x)$ for a nonnegative continuous function $b$ such that $b(\bar x) = 0$. In particular, the following overestimate holds at $\bar x$:
$$ \partial^2 Z(\bar x; \varepsilon) \subseteq \tilde\partial^2 Z(\bar x; \varepsilon) := \mathrm{co}\{ H(\bar x; \varepsilon; A),\ A \in \mathcal{A} \}, $$
where
$$ H(\bar x; \varepsilon; A) = \nabla^2_{xx} L(\bar x, \lambda(\bar x)) + \nabla\lambda_A(\bar x)\nabla g_A(\bar x)^T + \nabla g_A(\bar x)\nabla\lambda_A(\bar x)^T + \frac{2}{\varepsilon}\nabla g_A(\bar x)\nabla g_A(\bar x)^T - \frac{\varepsilon}{2}\nabla\lambda_N(\bar x)\nabla\lambda_N(\bar x)^T. $$
Proof. Point (a) is proved in [11].
Point (b) follows from the expression of the gradient given in (a), taking into account the differentiability assumptions.

The proof of point (c) can be derived from the very definition of the generalized Hessian in the following way. Let $\bar x$ be a stationary point of Problem (P). Consider a point $x$ in a neighborhood of $\bar x$ and sequences of points $\{x_k\}$ converging to $x$ with $\nabla Z$ differentiable at $x_k$. This will happen either if (a) for no $i$, $g_i(x_k) = -\frac{\varepsilon}{2}\lambda_i(x_k)$, or if (b) for all $i$ for which $g_i(x_k) = -\frac{\varepsilon}{2}\lambda_i(x_k)$ we also have $\nabla g_i(x_k) = -\frac{\varepsilon}{2}\nabla\lambda_i(x_k)$. Set $\hat I(x) = \{ i : g_i(x) > -\frac{\varepsilon}{2}\lambda_i(x) \}$. By recalling the expression of the gradient given previously we can write
$$ \nabla Z(x_k; \varepsilon) = \nabla_x L(x_k, \lambda(x_k)) + \sum_{i \in \hat I(x_k)} g_i(x_k) \Big( \frac{2}{\varepsilon}\nabla g_i(x_k) + \nabla\lambda_i(x_k) \Big) - \sum_{i \in \check I(x_k)} \lambda_i(x_k) \Big( \nabla g_i(x_k) + \frac{\varepsilon}{2}\nabla\lambda_i(x_k) \Big), $$
where $\check I(x_k) := \{1, \ldots, m\} \setminus \hat I(x_k)$. It is now easy to see that, both in case (a) and in case (b), the Hessian of $Z(\cdot; \varepsilon)$ at $x_k$ can be obtained by differentiating this expression, and this gives
$$ \nabla^2 Z(x_k; \varepsilon) = \nabla^2_{xx} L(x_k, \lambda(x_k)) + \nabla\lambda_{\hat I(x_k)}(x_k)\nabla g_{\hat I(x_k)}(x_k)^T + \nabla g_{\hat I(x_k)}(x_k)\nabla\lambda_{\hat I(x_k)}(x_k)^T + \frac{2}{\varepsilon}\nabla g_{\hat I(x_k)}(x_k)\nabla g_{\hat I(x_k)}(x_k)^T - \frac{\varepsilon}{2}\nabla\lambda_{\check I(x_k)}(x_k)\nabla\lambda_{\check I(x_k)}(x_k)^T + K_{\hat I(x_k)}(x_k), $$
where $K_{\hat I(x_k)}(x_k)$ represents the sum of terms always containing as a factor either $g_{\hat I(x_k)}(x_k)$ or $\lambda_{\check I(x_k)}(x_k)$. Taking into account the definition of $\partial^2 Z(x; \varepsilon)$ and that, as discussed in the previous section, if $x$ is suitably close to $\bar x$,
$$ I_+(\bar x) \subseteq \hat I(x_k) \subseteq I_0(\bar x), $$
we have that both $g_{\hat I(x_k)}(x_k)$ and $\lambda_{\check I(x_k)}(x_k)$ go to 0 as $x_k \to \bar x$. The assertion of point (c) now follows from these facts and the definition of $\mathcal{A}$. $\Box$

The following theorem gives a sufficient condition, in terms of matrices in the overestimate $\tilde\partial^2 Z(\bar x; \varepsilon)$, for a stationary point of Problem (P) to be a second order stationary point.

Theorem 2.3 Let $\bar x$ be a first order stationary point of Problem (P) and let $\varepsilon$ be given. If a matrix $\bar H$ exists in $\tilde\partial^2 Z(\bar x; \varepsilon)$ which is positive semidefinite, then $\bar x$ is a second order stationary point of Problem (P).

Proof. Let $\bar H \in \tilde\partial^2 Z(\bar x; \varepsilon)$ be positive semidefinite and suppose by contradiction that $\bar x$ does not satisfy the KKT second order necessary conditions (2). Then a vector $z$ exists such that
$$ \nabla g_{I_0}(\bar x)^T z = 0 \quad \mbox{and} \quad z^T \nabla^2_{xx} L(\bar x, \lambda(\bar x))\, z < 0 \eqno(9) $$
(recall that $\lambda(\bar x)$ equals the multiplier associated with $\bar x$). On the other hand, by Theorem 2.2 (c) and by Carathéodory's theorem, we also have that, for some integer $t \le n^2 + 1$,
$$ \bar H = \sum_{i=1}^t \beta_i H(\bar x; \varepsilon; A_i), $$
where, for each $i$, $\beta_i \ge 0$, $\sum_{i=1}^t \beta_i = 1$ and $A_i \in \mathcal{A}$. Since, for each $i$, $A_i \in \mathcal{A}$, so that $A_i \subseteq I_0(\bar x)$ and hence $\nabla g_{A_i}(\bar x)^T z = 0$, we can write, taking into account the definition of $H(\bar x; \varepsilon; A_i)$ and (9),
$$ z^T H(\bar x; \varepsilon; A_i) z = z^T \nabla^2_{xx} L(\bar x, \lambda(\bar x))\, z - \frac{\varepsilon}{2}\, z^T \nabla\lambda_{N_i}(\bar x)\nabla\lambda_{N_i}(\bar x)^T z < 0, $$
from which
$$ z^T \bar H z < 0 $$
immediately follows. But this contradicts the assumption that $\bar H$ is positive semidefinite, and the proof is complete. $\Box$
This result will turn out to be fundamental to our approach, since our algorithm converges to first order stationary points where at least one element of $\tilde\partial^2 Z(\bar x; \varepsilon)$ is positive semidefinite. In the remaining part of this section we consider some technical results about penalty functions that will be used later on and that help illustrate the relation between the function $Z$ and Problem (P).

Proposition 2.4 (a) Let $\varepsilon > 0$ and $x \in \mathbb{R}^n$ be given. If $x$ is an unconstrained stationary point of $Z$ and $\max[g(x), -\frac{\varepsilon}{2}\lambda(x)] = 0$, then $x$ is a first order stationary point of Problem (P). (b) Conversely, if $x$ is a first order stationary point of Problem (P), then, for every positive $\varepsilon$, $\nabla Z(x; \varepsilon) = 0$ and $\max[g(x), -\frac{\varepsilon}{2}\lambda(x)] = 0$.
Proof. See [11]. $\Box$

Proposition 2.5 Let $D \subset \mathbb{R}^n$ be a compact set. Then there exists an $\hat\varepsilon > 0$ such that, for every $x \in D$ and for every $\varepsilon \in (0, \hat\varepsilon]$, we have
$$ \|\nabla Z(x; \varepsilon)\| \ge \|\max[g(x), -\tfrac{\varepsilon}{2}\lambda(x)]\|. $$

Proof. Let $D_1 \subset \mathbb{R}^n$ be a compact subset such that $D \subset \mathrm{int}\,D_1$. In the proof of Proposition 14 in [12] it is shown that if $\bar x$ is a feasible point in $D$, and therefore in $\mathrm{int}\,D_1$, we can find positive $\rho(\bar x)$, $\varepsilon(\bar x)$ and $\sigma_0(\bar x)$ such that
$$ \|\nabla g(x)^T \nabla Z(x; \varepsilon)\| \ge \frac{\sigma_0(\bar x)}{\varepsilon}\, \|\max[g(x), -\tfrac{\varepsilon}{2}\lambda(x)]\|, \quad \forall\, \varepsilon \in (0, \varepsilon(\bar x)],\ \forall\, x : \|x - \bar x\| \le \rho(\bar x). \eqno(10) $$
More precisely, (10) derives from formula (24) in [12] and the discussion which follows that formula. Indicate by $\bar M$ the maximum of $\|\nabla g(x)\|$ for $x : \|x - \bar x\| \le \rho(\bar x)$; note that, by Assumption A, $\bar M > 0$. Then, recalling that $\|\nabla g(x)^T \nabla Z(x; \varepsilon)\| \le \|\nabla g(x)\|\,\|\nabla Z(x; \varepsilon)\|$, we can easily deduce from (10) that
$$ \|\nabla Z(x; \varepsilon)\| \ge \frac{\sigma(\bar x)}{\varepsilon}\, \|\max[g(x), -\tfrac{\varepsilon}{2}\lambda(x)]\|, \quad \forall\, \varepsilon \in (0, \varepsilon(\bar x)],\ \forall\, x : \|x - \bar x\| \le \rho(\bar x), \eqno(11) $$
where we set $\sigma(\bar x) = \sigma_0(\bar x)/\bar M$.

Suppose now that the proposition is not true. Then sequences $\{x_k\}$ and $\{\varepsilon_k\}$ exist such that
$$ x_k \in D, \qquad \{x_k\} \to \hat x, \qquad \{\varepsilon_k\} \to 0, $$
and
$$ \varepsilon_k \|\nabla Z(x_k; \varepsilon_k)\| < \varepsilon_k \|\max[g(x_k), -\tfrac{\varepsilon_k}{2}\lambda(x_k)]\|. \eqno(12) $$
Since $\varepsilon_k \|\max[g(x_k), -\tfrac{\varepsilon_k}{2}\lambda(x_k)]\| \to 0$, (12) gives $\varepsilon_k \|\nabla Z(x_k; \varepsilon_k)\| \to 0$ which, recalling the expression of $\nabla Z$, gives
$$ \sum_{i=1}^m \max[g_i(\hat x), 0]\, \nabla g_i(\hat x) = 0. $$
Thus, by Assumption B, $\hat x$ is feasible. But then we get a contradiction between (12) and (11), and this concludes the proof. $\Box$

Note that Propositions 2.4 and 2.5 imply that, given a compact set $D$, if $\varepsilon$ is sufficiently small, then every stationary point of Problem (P) in $D$ is an unconstrained stationary point of $Z$ and, vice versa, every unconstrained stationary point of $Z$ in $D$ is a stationary point of Problem (P). We refer the interested reader to the review paper [9] and references therein for a more detailed discussion of the properties of differentiable exact penalty functions.
3 Convergence to second order stationary points

In this section we consider a line search algorithm for the minimization of $Z(x; \varepsilon)$ which yields second order stationary points of Problem (P). For the sake of clarity we break the exposition into three parts. In Section 3.1 we first consider a line search algorithm (Algorithm M) which converges, for a fixed value of the penalty parameter $\varepsilon$, to an unconstrained stationary point of the penalty function $Z$. By Proposition 2.4 we know that if the penalty parameter were sufficiently small, we would have thus obtained a first order stationary point of Problem (P). Therefore, in Section 3.2, we introduce an algorithm (Algorithm SOC) where Algorithm M is embedded in a simple updating scheme for the penalty parameter based on Proposition 2.5. We show that after a finite number of reductions the penalty parameter stays fixed and every limit point of Algorithm SOC is a first order stationary point of Problem (P). Finally, in Section 3.3 we refine the analysis of Algorithm SOC and show that every limit point is actually a second order stationary point of Problem (P). In order to establish the results of Sections 3.1, 3.2 and 3.3 we assume that the directions used in Algorithm M satisfy certain conditions. In Section 4 we will illustrate possible ways of generating directions which fulfill these conditions. In order to simplify the analysis we shall assume, from now on, that the following assumption is satisfied.

Assumption C. The sequence $\{x_k\}$ of points generated by the algorithms considered below is bounded.
3.1 Convergence for fixed $\varepsilon$ to unconstrained stationary points of $Z$: Algorithm M

We first consider a line search algorithm for the unconstrained minimization of the penalty function $Z$ which generates, for a fixed value of the penalty parameter, sequences converging to unconstrained stationary points of the penalty function. Throughout this section, $\varepsilon$ is understood to be a fixed positive constant. The algorithm generates a sequence $\{x_k\}$ according to the following rule:
Algorithm M
$$ x_{k+1} = x_k + \alpha_k^2 s_k + \alpha_k^{t(x_k)} d_k, $$
where
$$ t(x_k) = 1 + \min\{0.5,\ \|\nabla Z(x_k; \varepsilon)\|\} \eqno(13) $$
and where $\alpha_k$ is computed by the Linesearch procedure below.

Linesearch procedure

Data: $x_k$, $s_k$, $d_k$, $\nabla Z(x_k; \varepsilon)$, $H_k \in \mathbb{R}^{n \times n}$, $\gamma \in (0, \frac{1}{2})$, $0 < \theta_1 < \theta_2 < 1$.
Step 1: Set $\alpha = 1$.
Step 2: If
$$ Z(x_k + \alpha^2 s_k + \alpha^{t(x_k)} d_k; \varepsilon) \le Z(x_k; \varepsilon) + \gamma \Big( \alpha^2 \nabla Z(x_k; \varepsilon)^T s_k + \tfrac{1}{2} \alpha^{2t(x_k)} d_k^T H_k d_k \Big), $$
set $\alpha_k = \alpha$ and stop.
Step 3: Choose $\theta \in [\theta_1, \theta_2]$, set $\alpha = \theta\alpha$, and go to Step 2.
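A direct transcription of the procedure in Python (a sketch, assuming a callable `Z_eval` that evaluates $Z(\cdot; \varepsilon)$; for simplicity the backtracking factor is kept fixed at $\theta = 0.5 \in [\theta_1, \theta_2]$):

```python
import numpy as np

def linesearch(Z_eval, x, s, d, gradZ, H, gamma=1e-4, theta=0.5):
    """Linesearch procedure of Section 3.1 (sketch).

    Accepts the first alpha = theta^j, j = 0, 1, ..., such that
      Z(x + a^2 s + a^t d) <= Z(x) + gamma*(a^2 gradZ.s + 0.5*a^(2t) d.H.d).
    """
    t = 1.0 + min(0.5, np.linalg.norm(gradZ))   # exponent t(x_k) of (13)
    z0 = Z_eval(x)
    slope = gradZ @ s            # <= 0 by Condition 1
    curv = d @ (H @ d)           # <= 0 by Condition 2
    alpha = 1.0
    while True:
        trial = x + alpha ** 2 * s + alpha ** t * d
        if Z_eval(trial) <= z0 + gamma * (alpha ** 2 * slope
                                          + 0.5 * alpha ** (2 * t) * curv):
            return alpha
        alpha *= theta           # any theta in [theta_1, theta_2] is admissible
```

By Proposition 3.1 below, the loop terminates whenever $x_k$ is not an unconstrained stationary point of $Z$, or is a first order stationary point of (P) with $H_k$ not positive semidefinite; a production code would additionally cap the number of backtracks.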
We assume that the matrices $H_k$ depend on the sequence $\{x_k\}$ and that the directions $s_k$, $d_k$ and the matrices $H_k$ are bounded and satisfy the following conditions:

Condition 1. The directions $s_k$ are such that $\nabla Z(x_k; \varepsilon)^T s_k \le 0$ and
$$ \nabla Z(x_k; \varepsilon)^T s_k \to 0 \;\Longrightarrow\; \nabla Z(x_k; \varepsilon) \to 0 \ \mbox{and} \ s_k \to 0. $$

Condition 2. The directions $d_k$ are such that $\nabla Z(x_k; \varepsilon)^T d_k \le 0$ and, together with the matrices $H_k$, they satisfy
$$ d_k^T H_k d_k < 0 \ \mbox{if $H_k$ is not positive semidefinite}, \qquad d_k^T H_k d_k = 0 \ \mbox{otherwise}, $$
$$ d_k^T H_k d_k \to 0 \;\Longrightarrow\; d_k \to 0. $$

Condition 3. Let $\{x_k\}$ and $\{u_k\}$ be sequences converging to a first order stationary point $\bar x$ of Problem (P). Then, for every sequence of matrices $\{Q_k\}$ with $Q_k \in \partial^2 Z(u_k; \varepsilon)$, we have
$$ d_k^T (Q_k - H_k) d_k \le \nu_k, $$
where $\{\nu_k\}$ is a sequence of numbers converging to 0.
Algorithm M resembles classical line search algorithms using negative curvature directions to force convergence to second order stationary points in unconstrained minimization. The only apparent difference is the exponent $t(x_k)$ defined by (13), while in the corresponding unconstrained algorithms one usually has $t(x_k) = 1$ for every $k$. We need this change in order to be able to tackle the fact that the penalty function is not everywhere twice continuously differentiable (see, for example, the proof of Proposition 3.1). We also assume that the directions $s_k$, $d_k$ and the matrices $H_k$ satisfy Conditions 1-3. Conditions 1 and 2 are fairly standard and similar to those employed in the unconstrained case. Condition 3, on the sequence of matrices $H_k$, is, again, related to the nondifferentiability of the gradient of $Z$. In fact, the matrix $H_k$ is supposed to convey some second order information on the penalty function; therefore Condition 3 imposes a certain relation between the matrices $H_k$ and the generalized Hessians of $Z$. Note that if the function $Z$ were twice continuously differentiable, the choice $H_k = \nabla^2 Z(x_k; \varepsilon)$ would satisfy Condition 3 by continuity. The following proposition shows that the linesearch procedure described above is well defined in all the cases that, as we shall see, are of interest to us.

Proposition 3.1 The linesearch procedure is well defined, namely at each iteration the test of Step 2 is satisfied for every sufficiently small $\alpha$, if the point $x_k$ either (a) is not an unconstrained stationary point of the function $Z$, or (b) is a first order stationary point of Problem (P) and $H_k$ is not positive semidefinite.

Proof. Assume by contradiction that the assertion of the proposition is false. Then there exists a sequence $\{\alpha_j\}$ such that $\alpha_j \to 0$ for $j \to \infty$ and
$$ Z(x_k + \alpha_j^2 s_k + \alpha_j^{t(x_k)} d_k; \varepsilon) > Z(x_k; \varepsilon) + \gamma \Big( \alpha_j^2 \nabla Z(x_k; \varepsilon)^T s_k + \tfrac{1}{2} \alpha_j^{2t(x_k)} d_k^T H_k d_k \Big), \eqno(14) $$
and either condition (a) or condition (b) holds. By Theorem 2.1, and taking into account that $\nabla Z(x_k; \varepsilon)^T d_k \le 0$ by Condition 2, we can find a point $u_k = x_k + \omega_k (\alpha_j^2 s_k + \alpha_j^{t(x_k)} d_k)$ with $\omega_k \in (0, 1)$ such that
$$ Z(x_k + \alpha_j^2 s_k + \alpha_j^{t(x_k)} d_k; \varepsilon) \le Z(x_k; \varepsilon) + \alpha_j^2 \nabla Z(x_k; \varepsilon)^T s_k + \tfrac{1}{2} \big[ \alpha_j^2 s_k + \alpha_j^{t(x_k)} d_k \big]^T Q(u_k) \big[ \alpha_j^2 s_k + \alpha_j^{t(x_k)} d_k \big], \eqno(15) $$
where $Q(u_k)$ is a symmetric matrix belonging to $\partial^2 Z(u_k; \varepsilon)$. Therefore, by (14) and (15), we have
$$ (\gamma - 1) \Big( \alpha_j^2 \nabla Z(x_k; \varepsilon)^T s_k + \tfrac{1}{2} \alpha_j^{2t(x_k)} d_k^T H_k d_k \Big) \le \tfrac{1}{2} \alpha_j^{2t(x_k)} d_k^T [Q(u_k) - H_k] d_k + \tfrac{1}{2} \alpha_j^4 s_k^T Q(u_k) s_k + \alpha_j^{2 + t(x_k)} d_k^T Q(u_k) s_k. \eqno(16) $$

Now we consider two cases. If condition (a) holds, then $\|\nabla Z(x_k; \varepsilon)\| \ne 0$. We have that $\nabla Z(x_k; \varepsilon)^T s_k < 0$ (by Condition 1) and $t(x_k) > 1$ (by (13)). Dividing both sides of (16) by $\alpha_j^2$, taking into account that $\gamma - 1 < 0$, passing to the limit for $j \to \infty$, and recalling that the sequence $\{Q(u_k)\}$ is bounded, we obtain the contradiction
$$ \nabla Z(x_k; \varepsilon)^T s_k \ge 0. $$

If condition (b) holds, then $x_k$ is a first order stationary point of Problem (P) and $H_k$ is not positive semidefinite. We have, by Proposition 2.4 (b), that $\nabla Z(x_k; \varepsilon) = 0$, so that $\nabla Z(x_k; \varepsilon)^T s_k = 0$ and $t(x_k) = 1$ (by (13)). Dividing both sides of (16) by $\alpha_j^{2t(x_k)}$, passing to the limit for $j \to \infty$, recalling that the sequence $\{Q(u_k)\}$ is bounded, and recalling that Condition 3 implies
$$ d_k^T [Q(u_k) - H_k] d_k \le \nu_k, $$
with $\{\nu_k\} \to 0$, we obtain from (16)
$$ d_k^T H_k d_k \ge 0, $$
which, recalling that $H_k$ is not positive semidefinite, contradicts Condition 2. $\Box$

Proposition 3.1 shows that Algorithm M can fail to produce a new point only if, for some $k$, $\nabla Z(x_k; \varepsilon) = 0$. Supposing that this trivial case does not occur, the next theorem illustrates the behaviour of an infinite sequence generated by Algorithm M.

Theorem 3.2 Let $\{x_k\}$ be an infinite sequence produced by Algorithm M. Then every limit point $\bar x$ of $\{x_k\}$ is such that $\nabla Z(\bar x; \varepsilon) = 0$.
Proof. Since the sequence $\{Z(x_k; \varepsilon)\}$ is monotonically decreasing, $Z$ is continuous and $\{x_k\}$ is bounded by Assumption C, it follows that $\{Z(x_k; \varepsilon)\}$ converges. Hence
$$ \lim_{k \to \infty} \big( Z(x_{k+1}; \varepsilon) - Z(x_k; \varepsilon) \big) = 0. \eqno(17) $$
Then, by recalling the acceptability criterion of the line search, Condition 1 and Condition 2, we have
$$ Z(x_{k+1}; \varepsilon) - Z(x_k; \varepsilon) \le \gamma \Big( \alpha_k^2 \nabla Z(x_k; \varepsilon)^T s_k + \tfrac{1}{2} \alpha_k^{2t(x_k)} d_k^T H_k d_k \Big) \le 0. \eqno(18) $$
Therefore (17), (18), Condition 1 and Condition 2 yield
$$ \alpha_k^2 \nabla Z(x_k; \varepsilon)^T s_k \to 0, \eqno(19) $$
$$ \alpha_k^{2t(x_k)} d_k^T H_k d_k \to 0. \eqno(20) $$
The boundedness of $s_k$ and $d_k$, Condition 1, Condition 2, (19) and (20) imply in turn
$$ \alpha_k^2 s_k \to 0, \eqno(21) $$
$$ \alpha_k^{t(x_k)} d_k \to 0. \eqno(22) $$
Suppose now, by contradiction, that there exists a converging subsequence $\{x_k\}_{K_1}$ whose limit point $\bar x$ is not a stationary point. For simplicity and without loss of generality we rename the subsequence $\{x_k\}_{K_1}$ as $\{x_k\}$. Condition 1, (19) and $\nabla Z(\bar x; \varepsilon) \ne 0$ imply
$$ \alpha_k \to 0. \eqno(23) $$
By (23) there exists an index $\hat k$ such that, for all $k \ge \hat k$,
$$ Z\Big( x_k + \Big(\frac{\alpha_k}{\theta_k}\Big)^2 s_k + \Big(\frac{\alpha_k}{\theta_k}\Big)^{t(x_k)} d_k; \varepsilon \Big) > Z(x_k; \varepsilon) + \gamma \bigg( \Big(\frac{\alpha_k}{\theta_k}\Big)^2 \nabla Z(x_k; \varepsilon)^T s_k + \frac{1}{2} \Big(\frac{\alpha_k}{\theta_k}\Big)^{2t(x_k)} d_k^T H_k d_k \bigg) \eqno(24) $$
for some $\theta_k \in [\theta_1, \theta_2]$. By Theorem 2.1, and taking into account that $\nabla Z(x_k; \varepsilon)^T d_k \le 0$ by Condition 2, we can find, for any $k \ge \hat k$, a point $u_k = x_k + \omega_k \big( (\alpha_k/\theta_k)^2 s_k + (\alpha_k/\theta_k)^{t(x_k)} d_k \big)$ with $\omega_k \in (0, 1)$ such that
$$ Z\Big( x_k + \Big(\frac{\alpha_k}{\theta_k}\Big)^2 s_k + \Big(\frac{\alpha_k}{\theta_k}\Big)^{t(x_k)} d_k; \varepsilon \Big) \le Z(x_k; \varepsilon) + \Big(\frac{\alpha_k}{\theta_k}\Big)^2 \nabla Z(x_k; \varepsilon)^T s_k $$
$$ \qquad + \frac{1}{2} \bigg[ \Big(\frac{\alpha_k}{\theta_k}\Big)^2 s_k + \Big(\frac{\alpha_k}{\theta_k}\Big)^{t(x_k)} d_k \bigg]^T Q(u_k) \bigg[ \Big(\frac{\alpha_k}{\theta_k}\Big)^2 s_k + \Big(\frac{\alpha_k}{\theta_k}\Big)^{t(x_k)} d_k \bigg] \eqno(25) $$
with $Q(u_k) \in \partial^2 Z(u_k; \varepsilon)$. From (24) and (25), rearranging exactly as in the proof of Proposition 3.1, it follows that
$$ (\gamma - 1) \bigg( \Big(\frac{\alpha_k}{\theta_k}\Big)^2 \nabla Z(x_k; \varepsilon)^T s_k + \frac{1}{2} \Big(\frac{\alpha_k}{\theta_k}\Big)^{2t(x_k)} d_k^T H_k d_k \bigg) \le \frac{1}{2} \Big(\frac{\alpha_k}{\theta_k}\Big)^{2t(x_k)} d_k^T [Q(u_k) - H_k] d_k + \frac{1}{2} \Big(\frac{\alpha_k}{\theta_k}\Big)^4 s_k^T Q(u_k) s_k + \Big(\frac{\alpha_k}{\theta_k}\Big)^{2 + t(x_k)} d_k^T Q(u_k) s_k. \eqno(26) $$
Dividing both sides by $(\alpha_k/\theta_k)^2$ and performing simple manipulations (the term $(\gamma - 1)\frac{1}{2}(\alpha_k/\theta_k)^{2(t(x_k)-1)} d_k^T H_k d_k$, which is nonnegative, is dropped from the left-hand side), we obtain
$$ (\gamma - 1)\, \nabla Z(x_k; \varepsilon)^T s_k \le \frac{1}{2} \Big(\frac{\alpha_k}{\theta_k}\Big)^{2(t(x_k) - 1)} d_k^T [Q(u_k) - H_k] d_k + \frac{1}{2} \Big(\frac{\alpha_k}{\theta_k}\Big)^2 s_k^T Q(u_k) s_k + \Big(\frac{\alpha_k}{\theta_k}\Big)^{t(x_k)} d_k^T Q(u_k) s_k. \eqno(27) $$
By (21) and (22) we have $u_k \to \bar x$. Since $\gamma - 1 < 0$, $\nabla Z(x_k; \varepsilon)^T s_k \le 0$ by Condition 1, $t(x_k) \to t(\bar x) > 1$ because $\nabla Z(\bar x; \varepsilon) \ne 0$, and since the sequence $\{Q(u_k)\}$ is bounded, we obtain from (27)
$$ \lim_{k \to \infty} (\gamma - 1)\, \nabla Z(x_k; \varepsilon)^T s_k = 0. $$
Condition 1 now implies $\nabla Z(\bar x; \varepsilon) = 0$, which contradicts the fact that the subsequence $\{x_k\}$ does not converge to an unconstrained stationary point, and this proves the theorem. $\Box$
In the next sections, given $x_k$, we denote by $\mathcal{M}(x_k)$ the new point produced by Algorithm M described above.
3.2 Updating $\varepsilon$ to guarantee convergence to stationary points of Problem (P): Algorithm SOC

In this section we show that it is possible to update in a simple way the value of the penalty parameter while minimizing the penalty function $Z$ by Algorithm M, so that every limit point of the sequence of points generated is a first order stationary point of Problem (P). This is accomplished by Algorithm SOC below. In the next section we shall show that, under some additional conditions, the limit points generated by Algorithm SOC are actually also second order stationary points of Problem (P). This motivates the name SOC, which stands for Second Order Convergence.
Algorithm SOC

Step 0: Select $x_0$ and $\varepsilon_0$. Set $k = 0$.
Step 1: If $\nabla Z(x_k; \varepsilon_k) = 0$ go to Step 2; else go to Step 3.
Step 2: If $\max[g(x_k), -\frac{\varepsilon_k}{2}\lambda(x_k)] = 0$ and $H_k \succeq 0$, stop; else, if $\max[g(x_k), -\frac{\varepsilon_k}{2}\lambda(x_k)] = 0$ and $H_k \not\succeq 0$, go to Step 4; otherwise, if $\max[g(x_k), -\frac{\varepsilon_k}{2}\lambda(x_k)] \ne 0$, go to Step 5.
Step 3: If $\|\nabla Z(x_k; \varepsilon_k)\| \ge \|\max[g(x_k), -\frac{\varepsilon_k}{2}\lambda(x_k)]\|$ go to Step 4; else go to Step 5.
Step 4: Compute $x_{k+1} = \mathcal{M}(x_k)$, set $\varepsilon_{k+1} = \varepsilon_k$, $k = k + 1$ and go to Step 1.
Step 5: Set $x_{k+1} = x_k$, $\varepsilon_{k+1} = \frac{1}{2}\varepsilon_k$, $k = k + 1$ and go to Step 1.

Algorithm SOC is related to similar schemes already proposed in the literature (see, e.g., the review paper [9] and references therein). The core step is Step 3, where, at each iteration, the decision of whether to update $\varepsilon$ is taken. This step is obviously motivated by Proposition 2.5 and Proposition 2.4 (a).
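Schematically, Algorithm SOC can be rendered as follows. This is a sketch with our own naming: `step_M` implements the map $\mathcal{M}(\cdot)$, `viol` returns $\|\max[g(x), -\frac{\varepsilon}{2}\lambda(x)]\|$, `make_H` builds $H_k$, and `is_psd` tests positive semidefiniteness; the exact zero tests of Steps 1 and 2 would be replaced by tolerances in practice.

```python
import numpy as np

def algorithm_soc(x, eps, grad_Z, step_M, viol, make_H, is_psd, max_iter=1000):
    """Sketch of Algorithm SOC."""
    for _ in range(max_iter):
        gZ = grad_Z(x, eps)
        v = viol(x, eps)                    # ||max[g(x), -(eps/2) lambda(x)]||
        if np.linalg.norm(gZ) == 0.0:       # Step 1 -> Step 2
            if v == 0.0 and is_psd(make_H(x, eps)):
                return x, eps               # stop: candidate second order point
            go_to_step4 = (v == 0.0)        # H_k not psd -> Step 4
        else:                               # Step 1 -> Step 3
            go_to_step4 = np.linalg.norm(gZ) >= v
        if go_to_step4:
            x = step_M(x, eps)              # Step 4: one iteration of Algorithm M
        else:
            eps *= 0.5                      # Step 5: halve the penalty parameter
    return x, eps
```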
Theorem 3.3 Algorithm SOC is well defined. Furthermore, let $\{x_k\}$ and $\{\varepsilon_k\}$ be the sequences produced by Algorithm SOC. Then either the algorithm terminates after $p$ iterations at a first order stationary point $x_p$ of Problem (P), or there exist an index $\bar k$ and an $\bar\varepsilon > 0$ such that, for every $k \ge \bar k$, $\varepsilon_k = \bar\varepsilon$, and every limit point of the sequence $\{x_k\}$ is a first order stationary point of Problem (P).

Proof. The algorithm is well defined because every time we reach Step 4, Proposition 3.1 ensures that $\mathcal{M}(x_k)$ is well defined. If the algorithm stops after a finite number of iterations then, by the instructions of Steps 1 and 2, we have $\nabla Z(x_p; \varepsilon_p) = 0$ and $\max[g(x_p), -\frac{\varepsilon_p}{2}\lambda(x_p)] = 0$. The thesis then follows by Proposition 2.4 (a). Therefore, assume that an infinite sequence of points is generated. Assumption C and Proposition 2.5 guarantee that $\varepsilon$ is updated only a finite number of times. So, for all $k$ large enough, $\varepsilon_k = \bar\varepsilon$ and Algorithm SOC reduces to the application of Algorithm M to $Z(x; \bar\varepsilon)$. Then, by Theorem 3.2, every limit point $\bar x$ of $\{x_k\}$ is such that $\nabla Z(\bar x; \bar\varepsilon) = 0$. Since the test at Step 3 is eventually always satisfied, this implies, in turn, that $\max[g(\bar x), -\frac{\bar\varepsilon}{2}\lambda(\bar x)] = 0$. The thesis now follows by Proposition 2.4 (a). $\Box$
3.3 Algorithm SOC: Second order convergence
In this section we prove that, under suitable additional conditions, every limit point of the sequence $\{x_k\}$ generated by Algorithm SOC actually satisfies the KKT second order necessary conditions. To establish this result we need the two further conditions below.

Condition 4. Let $\{x_k\}$ be a sequence converging to a first order stationary point of Problem (P). Then the directions $d_k$ and the matrices $H_k$ satisfy
$$ d_k^T H_k d_k \to 0 \;\Longrightarrow\; \min[0, \lambda_{\min}(H_k)] \to 0. \eqno(28) $$

Condition 5. Let $\{x_k\}$ be a sequence converging to a first order stationary point $\bar x$ of Problem (P), and let $\varepsilon > 0$ be given. Then
$$ \mathrm{dist}\big( H_k \mid \tilde\partial^2 Z(\bar x; \varepsilon) \big) \to 0. \eqno(29) $$

Condition 4 mimics similar standard conditions in the unconstrained case, where $H_k$ is the Hessian of the objective function. Roughly speaking, it requires the direction $d_k$ to be a sufficiently good approximation of an eigenvector corresponding to the smallest eigenvalue of $H_k$. Condition 5, similarly to Condition 3, imposes a connection between the matrices $H_k$ and the generalized Hessian of $Z$. The following theorem establishes the main result of this paper.

Theorem 3.4 Let $\{x_k\}$ be the sequence produced by Algorithm SOC. Then either the algorithm terminates at a second order stationary point $x_p$ of Problem (P), or it produces an infinite sequence $\{x_k\}$ such that every limit point $\bar x$ of $\{x_k\}$ is a second order stationary point of Problem (P).
Proof. If Algorithm SOC terminates after a finite number of iterations we have, by Theorem 3.3, that $x_p$ is a first order stationary point of Problem (P). On the other hand, by the instructions of Step 2 and by Condition 5, we have that $H_p$ is positive semidefinite and belongs to $\tilde\partial^2 Z(x_p; \varepsilon_p)$. Therefore the assertion follows from Theorem 2.3.

We then pass to the case in which an infinite sequence is generated. We already know, by Theorem 3.3, that every limit point of the sequence is a first order stationary point of Problem (P). We also know that eventually $\varepsilon_k$ is no longer updated, so that $\varepsilon_k = \bar\varepsilon$. Then, by Theorem 2.3, it suffices to show that $\tilde\partial^2 Z(\bar x; \bar\varepsilon)$ contains a positive semidefinite element. Suppose the contrary, and let $\{x_k\}$ converge to $\bar x$. Reasoning as at the beginning of the proof of Theorem 3.2, we have that (21) and (22) still hold. Then we can assume, renumbering if necessary, that
$$ \{\alpha_k\} \to 0. \eqno(30) $$
In fact, if this were not the case, (22), Condition 4 and Condition 5 would imply the contradiction that $\{H_k\}$ tends to a positive semidefinite element of $\tilde\partial^2 Z(\bar x; \bar\varepsilon)$. Then, by (30) and by repeating the arguments used in the proof of Theorem 3.2, we have that there exists an index $\hat k$ such that, for all $k \ge \hat k$, (26) holds. From (26) we get, recalling Condition 3,
$$ (\gamma - 1) \bigg( \Big(\frac{\alpha_k}{\theta_k}\Big)^2 \nabla Z(x_k; \bar\varepsilon)^T s_k + \frac{1}{2} \Big(\frac{\alpha_k}{\theta_k}\Big)^{2t(x_k)} d_k^T H_k d_k \bigg) \le \frac{1}{2} \Big(\frac{\alpha_k}{\theta_k}\Big)^{2t(x_k)} \nu_k + \frac{1}{2} \Big(\frac{\alpha_k}{\theta_k}\Big)^4 s_k^T Q(u_k) s_k + \Big(\frac{\alpha_k}{\theta_k}\Big)^{2 + t(x_k)} d_k^T Q(u_k) s_k, $$
which, taking into account that, by Condition 1, $\nabla Z(x_k; \bar\varepsilon)^T s_k \le 0$, and the fact that $\gamma - 1 < 0$, yields
$$ (\gamma - 1)\, \frac{1}{2} \Big(\frac{\alpha_k}{\theta_k}\Big)^{2t(x_k)} d_k^T H_k d_k \le \frac{1}{2} \Big(\frac{\alpha_k}{\theta_k}\Big)^{2t(x_k)} \nu_k + \frac{1}{2} \Big(\frac{\alpha_k}{\theta_k}\Big)^4 s_k^T Q(u_k) s_k + \Big(\frac{\alpha_k}{\theta_k}\Big)^{2 + t(x_k)} d_k^T Q(u_k) s_k. $$
Now, dividing both sides by $(\alpha_k/\theta_k)^{2t(x_k)}$, we have
$$ (\gamma - 1)\, \frac{1}{2}\, d_k^T H_k d_k \le \frac{1}{2} \nu_k + \frac{1}{2} \Big(\frac{\alpha_k}{\theta_k}\Big)^{4 - 2t(x_k)} s_k^T Q(u_k) s_k + \Big(\frac{\alpha_k}{\theta_k}\Big)^{2 - t(x_k)} d_k^T Q(u_k) s_k. \eqno(31) $$
By (21) and (22) we have $u_k \to \bar x$, so that $\{Q(u_k)\}$ is bounded, while by Condition 5 we have, renumbering if necessary, $H_k \to H \in \tilde\partial^2 Z(\bar x; \bar\varepsilon)$. Since $\gamma - 1 < 0$, $d_k^T H_k d_k \le 0$ by Condition 2, and since $4 - 2t(x_k) \ge 1$ and $2 - t(x_k) \ge \frac{1}{2}$ by (13), (31) implies
$$ \lim_{k \to \infty} (\gamma - 1)\, d_k^T H_k d_k = 0, $$
and hence, by recalling Condition 4, we have that $\lambda_{\min}(H) \ge 0$. This contradicts the fact that the subsequence $\{x_k\}$ converges to a KKT point where every element of $\tilde\partial^2 Z(\bar x; \bar\varepsilon)$ is not positive semidefinite, and the proof is complete. $\Box$
4 Practical realization

In this section we show how to calculate directions $s_k$, $d_k$ and matrices $H_k$ satisfying Conditions 1-5 required in the previous sections. Let the matrix $H_k$ be defined as
$$ H_k = H(x_k) = \nabla^2_{xx} L(x_k, \lambda(x_k)) + \nabla\lambda_{A(x_k)}(x_k)\nabla g_{A(x_k)}(x_k)^T + \nabla g_{A(x_k)}(x_k)\nabla\lambda_{A(x_k)}(x_k)^T + \frac{2}{\varepsilon}\nabla g_{A(x_k)}(x_k)\nabla g_{A(x_k)}(x_k)^T - \frac{\varepsilon}{2}\nabla\lambda_{N(x_k)}(x_k)\nabla\lambda_{N(x_k)}(x_k)^T, \eqno(32) $$
where $N(x) = \{1, 2, \ldots, m\} \setminus A(x)$ and $A(x)$ is any estimate of the active set with the property that, in a neighborhood of a stationary point $\bar x$, $A(x) = I_0(\bar x)$. In Section 2.2 we discussed in more detail some possible choices for $A(x)$ and gave adequate references. Note also that, at a stationary point $\bar x$, the matrix $H(\bar x)$ belongs to $\tilde\partial^2 Z(\bar x; \varepsilon)$. Given this matrix we have a wide range of choices for $s_k$ and $d_k$.

($s_k$) (a) A theoretically sound option is to take $s_k = -\nabla Z(x_k; \varepsilon)$.
(b) A more practical choice, however, could be to take $s_k$ as the solution of the linear system
$$ (H_k + D_k) s_k = -\nabla Z(x_k; \varepsilon), $$
where $D_k$ is a diagonal matrix chosen so that $H_k + D_k$ is positive definite and the smallest eigenvalue of the sequence $\{H_k + D_k\}$ is bounded away from 0. The matrix $D_k$ should be 0 if the matrix $H_k$ is positive definite with smallest eigenvalue greater than a (small) positive threshold value. Methods for automatically constructing the matrix $D_k$ while solving the system $H_k s_k = -\nabla Z(x_k; \varepsilon)$ are well known and widely used in the unconstrained case.
(c) Another possible choice is to take $s_k$ as the direction employed in [10, 1, 15].

($d_k$) (a) $d_k$ can be chosen as an eigenvector associated with the smallest eigenvalue of the matrix $H_k$, with the sign possibly changed in order to ensure $\nabla Z(x_k; \varepsilon)^T d_k \le 0$.
(b) Suitable approximations of the direction of point (a), calculated as indicated, for example, in [23] and [20], could also be employed.

The design of an algorithmically effective choice for $s_k$ and $d_k$ is beyond the scope of this paper. Here we only wanted to illustrate that a wide range of options is available; further choices are certainly possible (a schematic implementation of option (b) for $s_k$ and option (a) for $d_k$ is sketched at the end of this section). In the sequel, for the sake of concreteness, we shall assume that both $s_k$ and $d_k$ are chosen according to the options (a) listed above. With these choices, and since we are supposing that $\{x_k\}$ remains in a bounded set, it is easy to see that the sequences $\{H_k\}$, $\{s_k\}$ and $\{d_k\}$ are also bounded. It is also standard to show that Conditions 1, 2 and 4 are satisfied. Furthermore, if we recall that, in a neighborhood of a stationary point $\bar x$ of Problem (P), $A(x) = I_0(\bar x)$, it is easy to see that, by the very definition of $\tilde\partial^2 Z(\bar x; \varepsilon)$, Condition 5 is also met by our choice for the matrix $H_k$. In the next proposition we show that the more cumbersome Condition 3 is satisfied as well.
Proposition 4.1 The sequence of matrices defined by (32) satisfies Condition 3.

Proof. Let sequences $\{x_k\}$ and $\{u_k\}$ converging to a stationary point $\bar x$ of Problem (P) be given, and let $\{Q_k\}$ be any sequence such that $Q_k \in \partial^2 Z(u_k; \varepsilon)$ for every $k$. By Theorem 2.2 (c) we know that we can assume, without loss of generality, that eventually, for $k$ sufficiently large, the matrix $Q_k$ has the following form, for some integer $t \le n^2 + 1$:
$$ Q_k = \Theta_k + \sum_{i=1}^t \beta_i^k H(u_k; \varepsilon; A_i^k), $$
where $\{\Theta_k\}$ is a sequence converging to 0 (the terms $K_{A_i^k}(u_k)$, whose norms vanish as $u_k \to \bar x$, are absorbed in $\Theta_k$) and where, for each $i$ and each $k$, $\beta_i^k \ge 0$, $\sum_{i=1}^t \beta_i^k = 1$ and $A_i^k \in \mathcal{A}$. Since, for each $i$, $A_i^k \in \mathcal{A}$, then $A_i^k \subseteq A(x_k) = I_0(\bar x)$ for $k$ sufficiently large. We also recall that if $A$ and $B$ are two $s \times r$ matrices we can write
$$ A B^T = \sum_{j=1}^r a_j b_j^T, $$
where $a_j$ and $b_j$ are the $j$-th columns of $A$ and $B$ respectively. By employing a Taylor expansion, and recalling that $\sum_{i=1}^t \beta_i^k = 1$, we can write
$$ Q_k - H_k = \Theta_k + \nabla^2_{xx} L(u_k, \lambda(u_k)) - \nabla^2_{xx} L(x_k, \lambda(x_k)) $$
$$ \quad + \sum_{i=1}^t \beta_i^k \Big( \nabla\lambda_{A_i^k}(u_k)\nabla g_{A_i^k}(u_k)^T + \nabla g_{A_i^k}(u_k)\nabla\lambda_{A_i^k}(u_k)^T + \frac{2}{\varepsilon}\nabla g_{A_i^k}(u_k)\nabla g_{A_i^k}(u_k)^T - \frac{\varepsilon}{2}\nabla\lambda_{N_i^k}(u_k)\nabla\lambda_{N_i^k}(u_k)^T \Big) $$
$$ \quad - \sum_{i=1}^t \beta_i^k \Big( \nabla\lambda_{I_0}(x_k)\nabla g_{I_0}(x_k)^T + \nabla g_{I_0}(x_k)\nabla\lambda_{I_0}(x_k)^T + \frac{2}{\varepsilon}\nabla g_{I_0}(x_k)\nabla g_{I_0}(x_k)^T - \frac{\varepsilon}{2}\nabla\lambda_{I_-}(x_k)\nabla\lambda_{I_-}(x_k)^T \Big), $$
where $I_- = \{1, \ldots, m\} \setminus I_0$. Replacing $u_k$ by $x_k$ up to $O(\|x_k - u_k\|)$ terms and grouping the rank-one terms columnwise, note that $A_i^k \subseteq I_0$, so the differences of the first three groups of terms reduce to sums, with a minus sign, over $D_i^k := I_0 \setminus A_i^k$; moreover $N_i^k = I_- \cup D_i^k$, so the difference of the last group reduces to a sum over $D_i^k$ as well. Hence
$$ Q_k - H_k = \Theta_k + O(\|x_k - u_k\|) - \sum_{i=1}^t \beta_i^k \Big( \nabla\lambda_{D_i^k}(x_k)\nabla g_{D_i^k}(x_k)^T + \nabla g_{D_i^k}(x_k)\nabla\lambda_{D_i^k}(x_k)^T + \frac{2}{\varepsilon}\nabla g_{D_i^k}(x_k)\nabla g_{D_i^k}(x_k)^T + \frac{\varepsilon}{2}\nabla\lambda_{D_i^k}(x_k)\nabla\lambda_{D_i^k}(x_k)^T \Big). $$
Now, if we take into account the previous formula and we set
$$ a_i^k = \nabla\lambda_{D_i^k}(x_k)^T d_k, \qquad b_i^k = \nabla g_{D_i^k}(x_k)^T d_k, $$
we can write
$$ d_k^T (Q_k - H_k) d_k = d_k^T \big[ \Theta_k + O(\|x_k - u_k\|) \big] d_k - \sum_{i=1}^t \beta_i^k \Big( \frac{2}{\varepsilon}\|b_i^k\|^2 + 2\,(a_i^k)^T b_i^k + \frac{\varepsilon}{2}\|a_i^k\|^2 \Big) $$
$$ \qquad = d_k^T \big[ \Theta_k + O(\|x_k - u_k\|) \big] d_k - \frac{2}{\varepsilon} \sum_{i=1}^t \beta_i^k \Big\| b_i^k + \frac{\varepsilon}{2} a_i^k \Big\|^2. $$
From this relation the thesis of the proposition readily follows by setting
$$ \nu_k = d_k^T \big[ \Theta_k + O(\|x_k - u_k\|) \big] d_k. \qquad \Box $$
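To conclude the section, here is a minimal sketch of the computations behind option (b) for $s_k$ and option (a) for $d_k$. This is our own simplification: the modified Cholesky factorization mentioned above is replaced by an explicit eigenvalue-based diagonal shift, which is adequate for small dense problems only.

```python
import numpy as np

def directions(H, grad_Z, tau=1e-8):
    """Sketch: s_k by option (b), d_k by option (a) of Section 4."""
    eigvals, eigvecs = np.linalg.eigh(H)     # H is symmetric
    lam_min = eigvals[0]

    # s_k: solve (H + D) s = -grad Z with a diagonal shift making H + D
    # positive definite with eigenvalues >= tau (a crude stand-in for the
    # modified Cholesky factorizations used in unconstrained codes).
    shift = max(0.0, tau - lam_min)
    s = np.linalg.solve(H + shift * np.eye(H.shape[0]), -grad_Z)

    # d_k: unit eigenvector of the smallest eigenvalue, with the sign chosen
    # so that grad Z^T d <= 0; d_k = 0 when H is positive semidefinite,
    # in accordance with Condition 2.
    if lam_min < 0.0:
        d = eigvecs[:, 0]
        if grad_Z @ d > 0.0:
            d = -d
    else:
        d = np.zeros_like(grad_Z)
    return s, d
```

With the unit eigenvector choice, $d_k^T H_k d_k = \lambda_{\min}(H_k) < 0$ exactly when $H_k$ is not positive semidefinite, matching Conditions 2 and 4.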
5 Remarks and conclusions

We have presented a negative curvature line search algorithm for the minimization of a nonlinear function subject to nonlinear inequality constraints. The main novel feature of this method is that every limit point of the sequence it generates satisfies both the KKT first and second order necessary optimality conditions. The main tools employed to obtain this result are a continuously differentiable penalty function and some results from the theory of $LC^1$ functions.

For the sake of simplicity we did not include equality constraints in our analysis, but they can be easily handled. All the results of this paper go through if one also considers equality constraints; it is sufficient to use an analogue of the penalty function $Z$ in which equality constraints are included, see [12].

Another point which deserves attention is the set of Assumptions A, B and C that we employ. These assumptions are mainly dictated by the penalty function considered; however, they can be relaxed if a more sophisticated choice is made for the penalty function. We chose to use the (relatively) simple function $Z$ to concentrate on the main issues related to second order convergence; however, if the continuously differentiable function proposed in [7] is employed instead of $Z$, we can improve on Assumptions A, B and C. For example, Assumption A can be relaxed to: for any feasible $x$, the gradients $\nabla g_i(x)$, $i \in I_0(x)$, are linearly independent. More significantly, Assumptions B and C can also be considerably relaxed, but to illustrate this point we should introduce some technical notation, and we prefer to omit this here and refer the reader to [7] for more details. We only point out that Assumption C can be replaced by natural and mild assumptions on the problem data which guarantee that the level sets of the penalty function are compact.
References

[1] D.P. Bertsekas: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York, 1982.
[2] R.H. Byrd, R.B. Schnabel and G.A. Shultz: A trust region algorithm for nonlinearly constrained optimization. SIAM Journal on Numerical Analysis 24, pp. 1152-1170, 1987.
[3] F.H. Clarke: Optimization and Nonsmooth Analysis. John Wiley & Sons, New York, 1983. (Reprinted by SIAM, Philadelphia, PA, 1990.)
[4] T.F. Coleman and Y. Li: An interior trust region approach for nonlinear minimization subject to bounds. SIAM Journal on Optimization 6, pp. 418-445, 1996.
[5] T.F. Coleman and W. Yuan: A new trust-region algorithm for equality constrained optimization. Technical Report TR95-1477, Department of Computer Science, Cornell University, USA, 1995.
[6] A.R. Conn, N.I.M. Gould and Ph.L. Toint: Global convergence of a class of trust region algorithms for optimization with simple bounds. SIAM Journal on Numerical Analysis 25, pp. 433-460, 1988.
[7] G. Contaldi, G. Di Pillo and S. Lucidi: A continuously differentiable exact penalty function for nonlinear programming problems with unbounded feasible set. Operations Research Letters 14, pp. 153-161, 1993.
[8] J.E. Dennis and L.N. Vicente: On the convergence theory of trust-region-based algorithms for equality-constrained optimization. Technical Report TR94-36 (revised April 1996), Department of Computational and Applied Mathematics, Rice University, Houston, Texas, USA, 1994.
[9] G. Di Pillo: Exact penalty methods. In "Algorithms for Continuous Optimization", E. Spedicato editor, NATO ASI Series Vol. 434, Kluwer Academic Publishers, Dordrecht, pp. 209-253, 1994.
[10] G. Di Pillo and L. Grippo: A class of continuously differentiable exact penalty function algorithms for nonlinear programming problems. In "System Modelling and Optimization", P. Thoft-Christensen ed., Springer Verlag, Berlin, pp. 246-256, 1984.
[11] G. Di Pillo and L. Grippo: A continuously differentiable exact penalty function for nonlinear programming problems with inequality constraints. SIAM Journal on Control and Optimization 23, pp. 72-84, 1985.
[12] G. Di Pillo and L. Grippo: Exact penalty functions in constrained optimization. SIAM Journal on Control and Optimization 27, pp. 1333-1360, 1989.
[13] M. El-Alem: Convergence to a second-order point for a trust-region algorithm with a nonmonotonic penalty parameter for constrained optimization. Technical Report TR95-28, Department of Computational and Applied Mathematics, Rice University, Houston, Texas, USA, 1995.
[14] F. Facchinei, A. Fischer and C. Kanzow: On the accurate identification of active constraints. DIS Working Paper 22.96, Università di Roma "La Sapienza", Roma, Italy, 1996.
[15] F. Facchinei and S. Lucidi: Globally and quadratically convergent exact penalty based methods for inequality constrained problems. Forthcoming.
[16] M.C. Ferris, S. Lucidi and M. Roma: Nonmonotone curvilinear line search methods for unconstrained optimization. Computational Optimization and Applications 6, pp. 117-136, 1996.
[17] A. Forsgren and W. Murray: Newton methods for large-scale linear inequality constrained minimization. To appear in SIAM Journal on Optimization.
[18] J.-B. Hiriart-Urruty, J.J. Strodiot and V.H. Nguyen: Generalized Hessian matrix and second-order optimality conditions for problems with $C^{1,1}$ data. Applied Mathematics and Optimization 11, pp. 43-56, 1984.
[19] S. Lucidi: New results on a continuously differentiable exact penalty function. SIAM Journal on Optimization 2, pp. 558-574, 1992.
[20] S. Lucidi, F. Rochetich and M. Roma: Curvilinear stabilization techniques for truncated Newton methods in large scale unconstrained optimization: The complete results. DIS Technical Report 02.95, Università di Roma "La Sapienza", Roma, Italy, 1995.
[21] G.P. McCormick: A modification of Armijo's step-size rule for negative curvature. Mathematical Programming 13, pp. 111-115, 1977.
[22] G.P. McCormick: Nonlinear Programming: Theory, Algorithms and Applications. John Wiley & Sons, New York, 1983.
[23] J.J. Moré and D.C. Sorensen: On the use of directions of negative curvature in a modified Newton method. Mathematical Programming 16, pp. 1-20, 1979.
[24] J.J. Moré and D.C. Sorensen: Computing a trust region step. SIAM Journal on Scientific and Statistical Computing 4, pp. 553-572, 1983.
[25] D.C. Sorensen: Newton's method with a model trust region modification. SIAM Journal on Numerical Analysis 19, pp. 409-426, 1982.
[26] L.N. Vicente: Trust-region interior-point algorithms for a class of nonlinear programming problems. PhD Thesis, Rice University, Houston, Texas, USA, March 1996.