A QQP-Minimization Method for Semidefinite and Smooth Nonconvex Programs
Florian Jarre
Institut für Angewandte Mathematik und Statistik
Universität Würzburg, Am Hubland
D-97074 Würzburg, Federal Republic of Germany
Draft, August 13, 1998
Abstract. In several applications, semidefinite programs with nonlinear equality constraints arise.
We give two such examples to emphasize the importance of this class of problems. We then propose a new solution method that also applies to smooth nonconvex programs. The method combines ideas of a predictor corrector interior-point method, of the SQP method, and of trust region methods. In particular, we believe that the new method combines the advantages of these approaches: the generality and robustness of trust region methods, the local convergence of the SQP method, and the data-independence of interior-point methods. Some convergence results are given, and some very preliminary numerical experiments suggest a high robustness of the proposed method. AMS 1991 subject classification. Primary: 90C. Key words. Predictor corrector method, SQP method, trust region method, semidefinite program.
1. Introduction

This work was motivated by two applications from semidefinite programming with nonlinear equality constraints, as outlined in Section 2. To solve such problems we propose an interior-point method in which the usual linear systems of the Newton step and of the predictor step are replaced by quadratically constrained quadratic programs. These QQPs are of a special structure and can be solved directly. In addition, the QQPs allow for a special line search which makes the algorithm applicable to nonconvex programs. In the introduction we motivate the concept of our method; we then describe two semidefinite applications in Section 2. In Section 3 we propose a method for smooth unconstrained nonconvex minimization that is based on merging trust region and line search methods. This method is then generalized to constrained nonlinear programs in Section 4.
1.1 The use of orthogonal transformations
When we summarize optimization history in a few words, it might be as follows: Before 1984, interior-point methods were regarded as slow and numerically unstable. Instead, SQP methods with BFGS updates for the Hessian of the Lagrangian were (and still are) regarded as very effective. In particular, if, before 1984, someone had told us he was using Newton's method on a logarithmic barrier function to solve a linear program, the answer would have been "This must be too expensive!". However, computer technology has changed radically since the first experiments with interior-point methods were carried out in 1968 (see [5]). Karmarkar in 1984 [12] (and his implementations in the years after 1984) showed that even for linear programs the use of exact second derivatives may result in a competitive algorithm. This observation carries over to QPs, SDPs, truss topology design problems, and other convex problems. In the meantime computer technology has again improved dramatically, and we may ask whether methods that were previously thought of as "too expensive" may now be worth reconsidering. In particular, the method proposed below relies heavily on orthogonal decompositions (QR-factorizations and certain UDUᵀ-factorizations) instead of the cheaper Cholesky and LR-factorizations. For dense matrices the computational effort for a certain UDUᵀ-factorization is roughly four times higher than the cost of computing a Cholesky decomposition; this factor of four is independent of the dimension. However, for semidefinite problems the cost of forming the Hessian is often much higher than the cost of computing the factorization, so that the more expensive factorization does not affect the overall computation time very much. In return for the higher computation time we believe we obtain a stable and very general method, as we explain in Sections 3 and 4.
1.2 The use of a primal method
Our method mainly works with the primal variables; occasionally, dual variables are computed from the primal iterate. The decision to consider a primal method is due to the following observations: For convex semidefinite programs, the use of a primal-dual algorithm is complicated by the fact that the search directions derived from the standard primal-dual system are nonsymmetric, and special symmetrization operators are introduced. For pure primal methods, the search direction is automatically symmetric. Moreover, as pointed out by Rendl [19], in the presence of a positive duality gap the implementation of primal-dual methods is typically rather difficult. For nonconvex programs we have to be prepared to encounter further problems when implementing a primal-dual approach. Even for quadratically constrained convex programs, the efficient use of a primal-dual algorithm is not simple; our numerical experience in [10] indicates that the independent update of primal and dual variables in an implementation using the Mehrotra corrector seems to require a sophisticated line search in both spaces in order to be numerically efficient. In particular, the conflicting objectives of the primal-dual line search are far more complicated than for a pure primal method. Finally, the preliminary implementation of a pure primal predictor corrector method for convex semidefinite programs in [9] does not seem to be much less efficient than current primal-dual implementations. The recent work by S. Wright [27] shows that, in theory as well, pure primal methods are locally superlinearly convergent under suitable nondegeneracy assumptions. The use of an interior-point approach as the main concept is motivated by the fact that for semidefinite programs the only efficient methods known to date are interior-point methods. We believe that the approach will also be efficient for other types of nonconvex programs.
1.3 The use of a predictor corrector strategy
Predictor corrector strategies in interior point methods have been introduced to find a way to balance between two conflicting goals. In interior point methods, the search direction pointing towards an optimal solution (affine scaling direction) is only defined when the iterate is "interior". The optimal solution, however, is not "interior". A balance is needed to stay sufficiently interior and to move sufficiently fast towards optimality. For simple problems (interior point methods for linear programs), it is possible and also efficient in an implementation to balance both goals in one search direction. However, when the balance of the two goals is not well chosen within this one search direction, the method may either become very slow (too short steps, too much centering), or converge to a non optimal point (too long steps, too little centering). The predictor corrector method solves for the first goal (moving away from the boundary by doing centering steps) without worsening the second goal (remaining nearly optimal by keeping a penalty parameter constant). It then moves towards the second goal (towards optimality) while moderately violating the first goal (approaching the boundary again). Then the cycle is repeated. Since there is only one goal in a centering step, the objective for a line search in a centering step is clearly defined. In addition, the corrector (centering) step offers a criterion for adaptively adjusting the step length of the predictor step (towards optimality), namely to make the predictor step as long as possible without inducing too many corrector steps thereafter. Thus, especially for complicated problems, predictor corrector methods offer a way of reliably balancing two goals without being overly slow. In the present paper, the interplay of both steps is essential for our convergence analysis.
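As a toy illustration of this balance (ours, not from the paper): for the one-dimensional barrier problem min c·x − μ log x, the corrector recenters for a fixed μ, and the predictor simply reduces μ; all names and constants below are illustrative.

```python
def center(x, mu, c, steps=20):
    # Corrector: safeguarded Newton steps on phi(x) = c*x - mu*log(x).
    # The single goal is centering: the minimizer is x*(mu) = mu / c.
    for _ in range(steps):
        step = (c - mu / x) * x * x / mu         # Newton step: grad / hess
        x = x - step if x - step > 0 else x / 2  # safeguard: stay interior
    return x

def predictor_corrector(c=2.0, mu=1.0, shrink=0.5, iters=30):
    # Alternate centering (corrector) with a reduction of mu (predictor,
    # here without a tangent step); x tracks x*(mu) -> 0, the optimizer.
    x = 1.0
    for _ in range(iters):
        x = center(x, mu, c)   # move away from the boundary, mu fixed
        mu *= shrink           # move towards optimality
    return x
```

The corrector has one clearly defined objective (minimize the barrier for fixed μ), which is exactly the separation of goals described above.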
1.4 Related work
We point out that the need to generalize interior-point methods to nonconvex programs has been recognized earlier by several authors, see for example Argaez and Tapia [1], Byrd, Gilbert, and Nocedal [2], Coleman and Li [3], Conn, Gould, and Toint [4], Forsgren and Gill [6], Gay, Overton, and Wright [7], Sargent and Zhang [8], Urban, Tits, and Lawrence [25], Vanderbei and Shanno [26]. After we completed the present paper we learned of related work by Kruk and Wolkowicz [13]. They consider a so-called SQ²P algorithm which allows for more complicated trust region subproblems (with several quadratic constraints) than the ones considered in the present paper. In particular, the main features of our approach,
- having very simple trust region problems that offer the possibility of using the trust region radius for a line search,
- not being forced to deal with the so-called "hard case" of the trust region problem,
- operating without the need for a merit function,
are different from the approach in [13].
1.5 Notation
We use the following notation. For a symmetric matrix H we write H ≻ 0 (H ⪰ 0) if H is positive definite (positive semidefinite). For a smooth function f: ℝⁿ → ℝ we denote the transpose of the gradient of f at x by ∇f(x) (a column vector), and the Hessian of f at x by ∇²f(x). When F: ℝⁿ → ℝᵏ is smooth then ∇F(x) is the transpose of the Jacobian of F at x. The identity matrix is denoted by I, and by S^{k×k} we denote the space of real symmetric k×k matrices. We also use the standard order notation k = O(ε) (and k = Ω(ε)) to express that the quantity k = k(ε) satisfies lim sup_{ε→∞} k/ε < ∞ (and lim inf_{ε→∞} k/ε > 0), or that lim sup_{ε→0} k/ε < ∞ (and lim inf_{ε→0} k/ε > 0). It will be clear from the context which of the limits applies.
2. Two applications

We briefly outline two applications of semidefinite programs with nonconvex constraints. We believe that, in particular, the second example may offer broad possibilities for future research. The first application arises from an estimation problem in econometrics in which an unknown smooth function with negative curvature is to be found. In the particular example, explicit formulas for the Hessian H depending on several parameters are given, and also some nonlinear equality constraints for the parameters. The goal is to estimate the parameters in a certain optimal way. The common (!) technique in this field is to require that the negative of the Hessian possesses a Cholesky factorization, and to include this as an equality constraint of the problem. Then some standard nonlinear optimization software package is applied to this problem. The observation is, of course, that the method is very slow. The equation −H = LLᵀ with a lower triangular matrix L has a nonunique solution L if H is singular, and in particular, the Jacobian used in (almost) any optimization method will be singular at the optimal solution if the constraint −H ⪰ 0 is active. It is obvious that an algorithm based on the function −log det(−H) will not suffer from this systematic rank deficiency, provided it is suitably modified for nonconvex problems. This modification is the goal of the present article. First, however, we would like to outline a very promising application:
2.1 The Kriging Model

This application is based on an excellent talk by D. Jones from General Motors [11]. In many engineering applications a design is given, and there is a small number of parameters in the design that need to be chosen in an optimal way. Often the design is so complicated (the radiator of a car, for example) that a reliable numerical simulation is not possible. How then to choose certain parameters (material and thickness of the lamellae) in an optimal way? The typical approach is to build a prototype and test it. It is clear that one function evaluation (construction of a prototype) is more costly than a day of CRAY time, for example. So the function evaluations need to be planned in a very efficient way, and the computation time for determining the x-coordinate (parameter values) of the next function evaluation is negligible. In this framework the Kriging approach proved to be highly effective. We try to summarize some of its main ideas; a more accurate and comprehensive reference is the paper [20]. The principal idea of the Kriging approach is to find a good approximation (model) of the unknown function f based on only a few function evaluations f(xⁱ). In addition to the approximation, some measure of "trustworthiness" of the model is computed. This means that at points where the model is less trustworthy (where the xⁱ lie sparsely), the function f might differ more from the model than at points where the xⁱ lie densely. This information is used to compute an "anticipated lower bound" for f. At the interpolation points xⁱ, the anticipated lower bound coincides with f(xⁱ), and in between the xⁱ it is given by the value of the model minus the "highest anticipated deviation".

The search for the global minimum of f is then performed by minimizing the anticipated lower bound. (This way we "anticipate" that we will not miss the true minimizer.) Given the model, no further function evaluations of f are needed for the minimization of the anticipated lower bound; the minimization is therefore "cheap" compared to the function evaluations. The minimizer of the anticipated lower bound then serves as x-coordinate for the next function evaluation f(x), and the process is repeated. Note that the same ideas can also be used when the function values of f are not known exactly (biased by some measurement error, for example). In the above approach, the open question is how to construct a good model, and how to determine its trustworthiness. We will not discuss the question of how to determine an approximate global minimizer of the anticipated lower bound. We mention that the approach suggested below does not guarantee in a mathematical sense that the true minimizer of f will eventually be found, i.e. that the measure of trustworthiness of the model is correct. In practice, however, Jones reported very impressive results. We now outline how a model can be found: At the beginning we assume that the values of the unknown function f are given at certain points x¹, …, x^{N+1} ∈ ℝᴰ, and denote the values by f(xⁱ) = yᵢ. We search for a "best possible" approximation of the unknown function f based on the known values yᵢ. We start with some basis function b(t) of a real variable t such that b(t²) is smooth and tends to zero as t → ∞; for example b(t) = e^{−|t|}. (Another choice of b might be a suitable transformation of a cubic spline function like s(√|t|), where s is a B-spline symmetric about the origin.) We then define individual basis functions by

 bᵢ(x) := b((x − xⁱ)ᵀH(x − xⁱ))   (1)

where H is a positive definite matrix.
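In code, the family (1) can be set up directly; the following is a minimal sketch (the names are ours), assuming b(t) = e^{−|t|} and a fixed positive definite H:

```python
import numpy as np

def make_basis(points, H, b=lambda t: np.exp(-np.abs(t))):
    # Basis functions b_i(x) = b((x - x_i)^T H (x - x_i)) as in (1).
    def basis(xi):
        return lambda x: b((x - xi) @ H @ (x - xi))
    return [basis(xi) for xi in points]
```

Each bᵢ peaks at its center with bᵢ(xⁱ) = b(0) = 1 and decays with the H-weighted squared distance from xⁱ, so the shape of the basis is governed entirely by the matrix H that is to be estimated.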
We treat H as an arbitrary positive definite matrix whose entries are to be chosen in an optimal way as follows. Out of the N+1 data points x¹, …, x^{N+1} we choose N points and let l be the index of the remaining point. Let μ_l be the average of the N function values (without y_l), i.e.

 μ_l = (Σ_{i≠l} yᵢ)/N.

Then find the interpolating function

 φ_l(x) = μ_l + Σ_{i≠l} αᵢbᵢ(x)   (2)

where the αᵢ are real numbers such that φ_l(xʲ) = yⱼ for all j ≠ l. We then consider φ_l as a rather good approximation of f, with the exception that we did not use all the available information to compute φ_l (since we did not use x^l, y_l). Computing the parameters αᵢ for φ_l can be done by solving a linear system of equations. By construction, the function φ_l interpolates f at all points except (x^l, y_l). Let e_l := φ_l(x^l) − y_l denote the interpolation error at x^l. The number e_l will later be used to evaluate the trustworthiness of the final model. First, however, we find a positive definite matrix H such that

 max_{l∈{1,…,N+1}} |e_l|   (3)

is minimized.
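The leave-one-out construction of φ_l and the error e_l can be sketched as follows (a direct dense solve; the function names are ours, and the radial form (1), (2) is assumed):

```python
import numpy as np

def loo_error(points, values, H, l, b=lambda t: np.exp(-np.abs(t))):
    # Fit phi_l(x) = mu_l + sum_{i != l} alpha_i b_i(x) to all data except
    # (x_l, y_l); return the interpolation error e_l = phi_l(x_l) - y_l.
    d2 = lambda u, v: (u - v) @ H @ (u - v)   # H-weighted squared distance
    idx = [i for i in range(len(points)) if i != l]
    mu_l = sum(values[i] for i in idx) / len(idx)
    # Interpolation conditions phi_l(x_j) = y_j, j != l: a linear system.
    B = np.array([[b(d2(points[j], points[i])) for i in idx] for j in idx])
    alpha = np.linalg.solve(B, np.array([values[j] - mu_l for j in idx]))
    phi_l = mu_l + sum(a * b(d2(points[l], points[i]))
                       for a, i in zip(alpha, idx))
    return phi_l - values[l]
```

Solving for the αᵢ is the linear system of equations mentioned in the text; in an implementation one would reuse factorizations across the N+1 leave-one-out fits rather than solve each system from scratch.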
is minimized. Problem (3) can equivalently be written as minimize s:t : H 0 and ? 0 is solved, and then a predictor step is performed to reduce . Here, the role of is twofold. It is used as a bound for the residual of the equation F2 (x) = 0, and as a weight to the logarithmic barrier terms of the inequalities. (For convenience we divided by in the de nition of , so that in (12), seems to be a weight for the objective function rather than for the barrier terms.) To initialize the algorithm we set x(0) := x(0) ; := kF (x(0) )k + 1 krF (x(0) )k; := 1: 0
2
m?p
2
0
(We assume 0 > 0.) Given x(k) and = k , we consider the barrier problem minimize (x; ) s:t: F2 (x) = F2 (x(k) ):
(14)
The barrier subproblem (14) does not aim at reducing ‖F₂(x)‖, and in fact we will replace the condition F₂(x) = F₂(x^{(k)}) by a weaker condition

 ‖F₂(x)‖ ≤ 2μ_kρ₀,   (15)

which in the case k = 0, for example, allows an increase of ‖F₂(x)‖ by a factor of more than 2.
4.3 Solving the barrier subproblem ("centering step")
The barrier subproblem will be solved with a generalization of the method from the previous section, namely as follows. The KKT conditions for (14) are

 ∇ₓΦ(x,μ) + ∇F₂(x)y = 0   (16)

and F₂(x) = F₂(x^{(k)}). Given an iterate x we find the "best" y associated to x by solving a linear least squares problem to approximate (16). (The QR-factorization of ∇F₂(x) will be needed below, so that y can be obtained at little extra cost.⁵) We then linearize (16) to obtain a system of linear equations for the corrections Δx and Δy:

 ∇ₓΦ(x,μ) + ∇ₓ²Φ(x,μ)Δx + ∇F₂(x)(y + Δy) + Σ_{j=p+1}^m yⱼ∇²fⱼ(x)Δx = 0.   (17)

⁵ When x^{(i)} ≠ x^{(0)} for 1 ≤ i ≤ p, the function Φ(x,μ) := cᵀx/μ − log det A(x) − Σ_{i=1}^p log(−fᵢ(x + λ(x^{(i)} − x^{(0)}))) has a strictly feasible point x = x^{(0)} for λ₀ = 1.
The following notation will be used to describe the solution method for the barrier problems:

 H := ∇ₓ²Φ(x,μ) + Σ_{j=p+1}^m yⱼ∇²fⱼ(x),  g := ∇ₓΦ(x,μ),  A := ∇F₂(x)ᵀ,  b := −F₂(x).   (18)
Consider for a moment the following "simple problem" for approximating the solution of equation (17) subject to a trust region constraint ‖s‖₂² ≤ 2r² and the condition F₂(x + s) = F₂(x):

 minimize { gᵀs + ½sᵀHs  |  As = 0,  sᵀs ≤ 2r² }.   (19)

In using the Hessian H of the Lagrangian of (14), this problem is very similar to an SQP subproblem, but it can be solved directly (as we will see) from a QR-factorization of Aᵀ and therefore allows a continuous adjustment of the trust region radius r just as in Section 3.1. In view of the similarity to the SQP method, and since we solve a simple quadratically constrained QP to determine the search direction, we call this method the "QQP method". We see that the solution s of (19) satisfies the same descent properties for Φ as the solution s of (5) or (6) does with respect to f. However, due to its nonlinearity, ‖F₂‖ may grow unbounded when iterating a search step based on (19). To maintain a convergence result in the sense of Proposition 1, i.e. to keep ‖F₂‖ small, we fix some number β ∈ (0,1]. We might then consider replacing the condition As = 0 in (19) by a condition of the form As = σb for some σ ∈ [0,β]. Whenever A has linearly independent rows and r is large enough, the problem with this modification still has a solution. For smaller values of r, or when A has linearly dependent rows, the system (19) with As = σb may not have a solution. We therefore propose to use the following "full problem" in place of the "simple problem" (19):

 minimize { gᵀs + ½sᵀHs  |  ‖As − b‖ ≤ θ(r)‖b‖,  sᵀs ≤ 2r² }.   (20)

Here, θ(r) ≤ 1 is a number that is monotonically decreasing in r and will be specified below, where we also define an approximate solution s(r) (or s(λ)) of (20). The step length along the curved line s(r) is chosen so as to minimize Φ(·,μ) subject to the constraint (15), i.e. we compute

 r̄ := argmin_r Φ(x + s(r), μ)  s.t.  ‖F₂(x + s(r))‖ ≤ 2μ_kρ₀.   (21)

We hope, of course, that the constraint (15) will not become active, and that the step length r̄ is solely determined by minimizing Φ. To this end we set σ = β in (20) if ‖F₂(x)‖ ≥ ρ₀μ, σ = 0 if ‖F₂(x)‖ ≤ 0.5ρ₀μ, and leave σ at its value from the previous iteration if 0.5ρ₀μ < ‖F₂(x)‖ < ρ₀μ. Note that values β < 1 are meaningful if steps of length more than one are taken along the Newton direction.
4.4 Computing the search step for the "full problem"
Here, we outline a simple approach that enables us to solve (20). Recall the definitions of H, g, A, b in (18). A KKT point for the "simple problem" (19) with the condition As = b is given by the system

 ( H + λI  Aᵀ ) ( s )   ( −g )
 (   A      0 ) ( y ) = (  b ).   (22)

Let

 Aᵀ = Q ( R ; 0 )  with QᵀQ = I ∈ ℝⁿˣⁿ   (23)

and with a full-row-rank upper triangular matrix R be a QR-factorization of the transpose of the Jacobian of F₂. Here, R ∈ ℝ^{q×(m−p)} with q ≤ m−p, and 0 ∈ ℝ^{(n−q)×(m−p)}. When A has full rank then q = m−p, else q < m−p. Note that we referred to this QR-factorization after equation (16), where a multiplier y was to be found. We divide Q = (Q₁, Q₂) into the first q columns Q₁ and the remaining n−q columns Q₂ that form a basis of the null space of A. It is obvious that (22) is then equivalent to

 ( QᵀHQ + λI  (R;0) ) ( Qᵀs )   ( −Qᵀg )
 ( (Rᵀ 0)       0   ) (  y  ) = (   b  ).   (24)

The solution of (24) can be computed as follows. Let

 s̃ := Qᵀs = (s̃₁; s̃₂)   (25)

be partitioned analogously to the partition of Q (i.e. s̃₁ = Q₁ᵀs ∈ ℝ^q), and likewise for g̃ := Qᵀg. Let H̃ := QᵀHQ be divided into four blocks H̃ᵢⱼ = QᵢᵀHQⱼ for i,j ∈ {1,2}. By I₁ we denote the q×q identity matrix, and by I₂ the (n−q)×(n−q) identity matrix. With this notation we obtain in the case q = m−p

 s̃₁ = R^{−T}b,   (26)
 s̃₂ = (H̃₂₂ + λI₂)⁻¹(−g̃₂ − H̃₂₁s̃₁),   (27)
 y = R⁻¹(−g̃₁ − (H̃₁₁ + λI₁)s̃₁ − H̃₁₂s̃₂).   (28)

(For q < m−p the right-hand side of (26) needs to be changed to a least squares solution, and y in (28) is no longer unique since R⁻¹ does not exist.) When R is poorly conditioned, s̃₁ may become very large, and the above search step will have poor descent properties for Φ. Since the right-hand side b is only intended to keep ‖F₂‖ from growing too large, we relax the corresponding set of equations and replace (26) by
 s̃₁ = (RRᵀ + λ^{3/2}I₁)⁻¹Rb.   (29)

(There is no need to use the same number λ in (29) and (27); instead, one might consider using parameters λ₁ in (29) and λ₂ in (27). See also the comment regarding (55) in the proof of Proposition 2.) Note that s̃₁ in (29) is a KKT point, and hence an optimal solution, of the convex problem

 minimize ‖Rᵀs₁ − b‖²  s.t.  ‖s₁‖² ≤ r̂²

for some r̂ > 0. Thus, the above solution s = Qs̃ indeed satisfies the bound ‖As − b‖ ≤ θ(r)‖b‖ of problem (20) for some θ(λ) = θ(r) < 1. Moreover, for q = m−p it follows that

 ‖Rᵀs̃₁ − b‖ = ‖(Rᵀ(RRᵀ + λ^{3/2}I₁)⁻¹R − I₁)b‖ = λ^{3/2}‖(RᵀR + λ^{3/2}I₁)⁻¹b‖

and thus ‖Rᵀs̃₁ − b‖ → 0 as λ → 0. In the previous equation we used

 Rᵀ(RRᵀ + λ^{3/2}I₁)⁻¹R − I₁ = −λ^{3/2}R⁻¹(RRᵀ + λ^{3/2}I₁)⁻¹R = −λ^{3/2}(RᵀR + λ^{3/2}I₁)⁻¹.   (30)

(Multiply the first expression with the inverse of the middle expression to verify the first identity.) However, similar to the case of (6), the solution s given by (29), (27) might not be the optimal solution of (20). Nevertheless, in the proof of Proposition 2 it is shown that for large σ the search direction s(λ) reduces both Φ as well as ‖F₂‖, provided that ∇Φ ∉ range(Aᵀ) and F₂ ≠ 0. For the systems (29) and (27) the UDUᵀ-decomposition of Section 3 can be applied again.⁶
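The computations (23)-(29) condense into a few lines of numerical linear algebra. The following sketch uses our own naming, assumes A has full row rank (q = m−p), and uses a single λ in both (29) and (27):

```python
import numpy as np

def qqp_step(H, g, A, b, lam):
    # Corrector step built from (29) and (27):
    #   s1 = (R R^T + lam^{3/2} I1)^{-1} R b      (relaxed version of (26))
    #   s2 = (H22 + lam I2)^{-1} (-g2 - H21 s1)   (null-space step)
    # where A^T = Q [R; 0] is a full QR-factorization as in (23).
    n = H.shape[0]
    q = A.shape[0]                      # full row rank assumed: q = m - p
    Q, Rfull = np.linalg.qr(A.T, mode='complete')
    R = Rfull[:q, :]                    # q x q upper triangular
    Ht, gt = Q.T @ H @ Q, Q.T @ g       # transformed Hessian and gradient
    s1 = np.linalg.solve(R @ R.T + lam**1.5 * np.eye(q), R @ b)
    s2 = np.linalg.solve(Ht[q:, q:] + lam * np.eye(n - q),
                         -gt[q:] - Ht[q:, :q] @ s1)
    return Q @ np.concatenate([s1, s2])
```

For small λ the step nearly satisfies As = b, in accordance with (30), while a large λ shrinks the whole step, acting as a trust region multiplier.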
Proposition 2 Let μ > 0 be fixed, and let x^{(k)} satisfy condition (15) and be feasible for Φ(·,μ). If the above procedure (29), (27), and (21) to minimize Φ(·,μ) for the fixed value μ > 0 is iterated, then each accumulation point x* of the sequence of iterates is either

1. a stationary point in the sense that

 ‖F₂(x*)‖ ≤ 2μρ₀,  F₁(x*) < 0,  A(x*) ≻ 0,

and there is a vector y* with ∇Φ(x*,μ) + ∇F₂(x*)y* = 0,

2. or a singular point such that ∇F₂(x*) is rank deficient.

Proof: See the appendix.

When the second case happens we believe the method will not necessarily stagnate; we postpone the analysis of this case to later research, when numerical experiments allow a better conclusion about how to specify certain options and modifications of this method. The stationary point of Φ(·,μ_k) found at iteration k will be denoted by x^{(k+)}, and the residual of the equality constraints by

 ρ_k := ‖F₂(x^{(k+)})‖ ≤ 2μ_kρ₀   (31)

(because of (15)).
⁶ A modification to make this approach more efficient for the case when there are only very few equality constraints (m−p ≪ n) might be based on the decomposition A = [A₁, A₂] = L̂R̂ = L̂[R̂₁, R̂₂], where L̂ and R̂₁ are square lower and upper triangular matrices. Let this factorization be performed with full pivoting (in particular, the columns of A are permuted such that A₁ is nonsingular and "hopefully" ‖A₁⁻¹A₂‖ is moderate). Simple block matrix operations then transform (22) to a linear system of the form

 (H₂₂ + λI₂₂ − H₂₁A₁⁻¹A₂ − A₂ᵀA₁⁻ᵀH₁₂ + A₂ᵀA₁⁻ᵀ(H₁₁ + λI₁₁)A₁⁻¹A₂) s₂ = rhs

(in place of the system involving the matrix H̃₂₂ + λI₂ in (27)). When neglecting the term A₂ᵀA₁⁻ᵀ(λI₁₁)A₁⁻¹A₂ in the above expression, the ideas of Section 3 can still be applied, but the trust region is now with respect to a different norm in place of the Euclidean norm.
4.5 The predictor step
In principle, after an approximate local minimizer x^{(k+)} of the barrier problem has been computed, one could simply reduce the barrier parameter μ and continue using Newton's method. When the objective function is linear and there are no residuals in the equality constraints, this is in fact a sensible idea, since on the "central path" the Newton direction then coincides with the tangent to the central path. However, when the iterate is not close to the central path, the Newton direction is a poor search direction. Moreover, in the presence of equality constraints, the update of the bound (15) requires special attention when reducing μ without changing x. We propose a special predictor step. For the derivation of the predictor step (but not for the convergence analysis) we assume that the solution of the barrier subproblem is locally unique, and that it follows a smooth path with respect to the parameter μ. As observed before, this path may have bifurcation points. The predictor step is not intended to trace such bifurcations, but rather to "hop" over a bifurcation and to leave it to the centering step to find a point on one of the branches of the path. The predictor step is also intended to render the bound (15) inactive, i.e. if (15) was active in the previous centering step, then ‖F₂(x)‖ is reduced slightly more than μ. To this end let

 σ = max{1, ρ_k/(μ_kρ₀)}

be fixed. (We obtain larger values of σ when the norm ρ_k of the residual of F₂ in (31) is large.) We compute the tangent to the path defined by

 ∇Φ(x(ν),ν) + ∇F₂(x(ν))y(ν) = (ν/μ_k)∇Φ(x^{(k+)},μ_k),   (32)
 F₂(x(ν)) = (ν/μ_k)F₂(x^{(k+)}).   (33)

Straightforward differentiation with respect to ν yields the system

 ( H  Aᵀ ) ( ẋ(μ_k) )   (      c/μ_k²      )
 ( A   0 ) ( ẏ(μ_k) ) = ( F₂(x^{(k+)})/μ_k )   (34)

for computing the tangent ẋ at ν = μ_k, where we use the same notation as in (18). When x^{(k+)} satisfies the second order sufficiency conditions for optimality of the barrier subproblem and x^{(k+)} is nondegenerate, then this system has a unique solution. Given the solution ẋ we may choose a step length α_k along ẋ subject to the condition that

 x^{(k+)} − αẋ(μ_k)   (35)

is feasible for Φ(·,μ) for all α ∈ [0, α_k]. Then the next iterate is defined as x^{(k+1)} := x^{(k+)} − α_kẋ(μ_k) such that

 ‖F₂(x^{(k+1)})‖ ≤ ((μ_k − α_k)/μ_k) max{(3/2)ρ₀μ_k, ‖F₂(x^{(k+)})‖}.   (36)

The use of the tangent given by (34), however, is restricted, as the system (32), (33) may not have a solution for all ν ∈ (0,1] when problem (9) is nonconvex. We therefore add appropriate modifications below. For our convergence analysis we assume that the recipe below is used throughout.
Let c̃ := Qᵀc and x̃ = Qᵀx. Define

 Φₓ^{lin}(s,μ) := cᵀ(x+s)/μ − log det Aₓ^{lin}(s) − Σ_{i=1}^p log(−fᵢ(x) − ∇fᵢ(x)ᵀs)   (37)

as the barrier function of problem (9) linearized at x. Here, Aₓ^{lin}(s) is the linearization of A at x. Define a positive definite approximation H⁺ := I + ∇ₛ²Φₓ^{lin}(s,μ) to H. (When H is sufficiently positive definite, we may use H in place of H⁺.) The above definition of Φ^{lin} and hence of H⁺ is not unique; if we treat the constraint −det A(x) < 0 like the remaining constraints fᵢ(x) < 0, we obtain a different matrix H⁺. In this case also the convergence result in Proposition 3 will be weaker. For the convergence analysis we require that Φ^{lin} is a self-concordant barrier function (which is automatically true if we define Φ^{lin} as in (37)). We propose to reduce ‖F₂‖ by exploiting the barrier property of Φ^{lin}:

 minimize ‖As − b‖  s.t.  sᵀH⁺s ≤ r².   (38)
Thus, the second block row of (34) will be relaxed to the solution of (38), and by introducing again a multiplier we obtain the following system

 ( H⁺  Aᵀ ) ( μẋ )   (        c/(1+σ)        )
 ( A    0 ) ( μẏ ) = ( −Rᵀ(RRᵀ + H̃⁺₁₁)⁻¹Rb ),   (39)

where we relaxed the first block row in an analogous manner and multiplied the right-hand side by μ. As before for H we also denote H̃⁺ = QᵀH⁺Q, and we partition Qᵀẋ analogously to (25). The solution of (39) is given by

 μ(Qᵀẋ)₁ = −(RRᵀ + H̃⁺₁₁)⁻¹Rb,   (40)
 μ(Qᵀẋ)₂ = (H̃⁺₂₂)⁻¹(c̃₂/(1+σ) − H̃⁺₂₁ μ(Qᵀẋ)₁).   (41)

In (39), (40), and (41), ẋ is multiplied by μ, so that a step of length one along "−μẋ" gives an approximation to the solution of (9). Again, different values of σ in (40) and (41) may be efficient in an implementation. System (40) may be solved by the technique of Section 3, after transforming either H̃⁺₁₁ or RRᵀ to the identity matrix; alternatively, an approach as in (59) can be used. As for the search step s in the trust region problem, ẋ also depends on the choice of σ; occasionally we will write ẋ_σ to emphasize this dependence. For σ > 0 or H ≠ H⁺, the quantity ẋ is not the tangent to the curve defined in (32), (33). For simplicity we use the notation "tangent" and "ẋ" also for σ > 0 without elaborating to which (other) curve ẋ is a tangent. The step lengths α_k and ν are determined in two different cases. When σ > 1, define

 ν = ν(α) = μ_k (1 − α‖F₂(x^{(k+)} + αs(μ_k))‖ / max{(3/2)ρ₀μ_k, ‖F₂(x^{(k+)})‖})   (42)

and find the largest α > 0 subject to the condition

 Φ(x^{(k+)} − tẋ, ·) is defined for t ∈ [0, (2−δ)α].   (43)
When σ = 1 (and we do not use (35), (36)) we define ν = μ_k/(1+α) and minimize ν subject to

 ‖F₂(x^{(k+)} − αẋ)‖ ≤ (ν/μ_k) max{(3/2)ρ₀μ_k, ‖F₂(x^{(k+)})‖}   (44)

and (43). (Note that (42) implies (44).) If the step length for the predictor step lies below a certain preset threshold and if σ > 1, we will replace the right-hand side term involving c̃₂ in (41) by 0 and use the rules (44), (43). This has the effect that the only purpose of the predictor step is to reduce ‖F₂‖ while staying feasible with respect to Φ, and the goal of heading towards optimality is "suspended". (In this case one might limit the reduction of μ to at most, say, 50%; if μ is reduced substantially while cᵀx is not being "updated", it is likely that many corrector steps will be needed for the next centering.) Proposition 3 below states that the prediction can stagnate only if the iterates get "trapped" near a local "infeasible stationary point".
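The linear algebra of the tangent computation (40), (41) can be sketched in a few lines; this is a sketch under our own naming, assuming A has full row rank and H⁺ positive definite:

```python
import numpy as np

def predictor_direction(Hp, c, A, b, sigma):
    # mu * xdot in the coordinates of Q, cf. (40), (41):
    #   mu*xd1 = -(R R^T + Hp11)^{-1} R b
    #   mu*xd2 = (Hp22)^{-1} (c2/(1+sigma) - Hp21 * mu*xd1)
    n = Hp.shape[0]
    q = A.shape[0]                      # full row rank assumed
    Q, Rfull = np.linalg.qr(A.T, mode='complete')
    R = Rfull[:q, :]
    Hpt, ct = Q.T @ Hp @ Q, Q.T @ c     # transformed H+ and objective
    xd1 = -np.linalg.solve(R @ R.T + Hpt[:q, :q], R @ b)
    xd2 = np.linalg.solve(Hpt[q:, q:], ct[q:] / (1.0 + sigma)
                          - Hpt[q:, :q] @ xd1)
    return Q @ np.concatenate([xd1, xd2])
```

Note the structural similarity to the corrector step: both reduce to one regularized range-space solve and one null-space solve after the same QR-factorization of Aᵀ, which is the reason the factorization can be shared between the two steps.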
4.6 The overall algorithm

We now summarize the derivations of the previous sections and state the framework of a prototype algorithm. For brevity we omit the modification (35), (36) in this section and refer to Section 4.5 for a description of this acceleration. We also do not address the choice of different values of λ for the directions s₁ and s₂ (see e.g. the comments following (29)). Let us recall the problem and the notation:

 minimize cᵀx  s.t.  A(x) ⪰ 0,  fᵢ(x) ≤ 0 for 1 ≤ i ≤ p,  fⱼ(x) = 0 for p+1 ≤ j ≤ m.

We assume that x^{(0)} is given such that A(x^{(0)}) ≻ 0 and fᵢ(x^{(0)}) < 0. As in (13), let F₁ be the vector formed by the fᵢ, and F₂ by the fⱼ. The logarithmic barrier function Φ for the inequalities is defined in (12), and the barrier function Φ^{lin} for the linearized problem is given in (37). For a given triple x, y, μ we denote as in (18)

 H = H(x,y) = ∇ₓ²Φ(x,μ) + Σ_{j=p+1}^m yⱼ∇²fⱼ(x),
 g = g(x,μ) = ∇ₓΦ(x,μ),
 A = A(x) = ∇F₂(x)ᵀ,  b = b(x) = −F₂(x).

Note that H and the matrix H⁺ given by H⁺ = ∇²Φ^{lin}(x,μ) do not depend on μ. Whenever we use the letters H, g, A, or b without their arguments, we imply that these quantities are evaluated at a "current" point which is given by the context. We also assume that Q = Q(x) is given by Aᵀ = Q(R; 0) as in (23). Here, R ∈ ℝ^{q×(m−p)}. (Note that this notation hides the fact that Q is not unique.) We partition Q = (Q₁, Q₂) into the first q columns Q₁ and the remaining n−q columns Q₂.
Algorithm 1 (Conceptual Algorithm) Initialization: Choose ; 2 (0; 1) (e.g. = 0:2; = 0:1). Let x = x(0) ; 0 = 1 + kF2 (x(0) )k; = 0 = 1; 0 = 1; = : For k = 0; 1; 2; : : : do (outer loop) 1. Do until convergence (Corrector steps, inner loop) (a) Given x, compute a least squares solution y to g + AT y = 0, i.e. y = R?1 QT1 g if A has full row rank. Let g~2 = QT2 g. Update = if kF2 (x)k 0 and = 0 if kF2 (x)k 0:50 . (b) Let s~ = s~() be given by
$$\tilde s_1 = (RR^T + \lambda^{3/2} I_1)^{-1} R b, \qquad \tilde s_2 = (\tilde H_{22} + \lambda I_2)^{-1}(-\tilde g_2 - \tilde H_{21}\tilde s_1),$$
and $s(\lambda) = Q\tilde s$.
(c) Find the step length $\lambda^*$ that minimizes $\varphi(x + s(\lambda), \mu)$ subject to the constraint
$$\|F_2(x + s(\lambda))\| \le 2\delta_0.$$
(d) Set $x := x + s(\lambda^*)$.
2. (Predictor step) Denote the result of the corrector steps by $x = x^{(k)}_+$, and set
$$\delta_k = \|F_2(x^{(k)}_+)\| \ \text{(only for } k \ge 1\text{)}, \qquad \sigma = \max\{1,\ \delta_k/(\gamma\delta_0)\}, \qquad \hat c_2 = Q_2^T c / \sigma.$$
3. Compute a least-squares solution $y$ of $g + A^T y = 0$.
4. Let $\hat s = \hat s(\lambda)$ be given by
$$\hat s_1 = (RR^T + \lambda \tilde H_{11}^+)^{-1} R b,$$
$$\hat s_2 = \begin{cases} (\tilde H_{22}^+ + \lambda I_2)^{-1}\big(-\tilde H_{21}^+ \hat s_1\big) & \text{if } \lambda \le \lambda_k \text{ and } \sigma > 1,\\ (\tilde H_{22}^+ + \lambda I_2)^{-1}\big(-\hat c_2/(1+\lambda) - \tilde H_{21}^+ \hat s_1\big) & \text{if } \lambda > \lambda_k \text{ or } \sigma = 1,\end{cases}$$
and $s(\lambda) = Q\hat s$.
5. When $\sigma > 1$ define
$$\mu = \mu(\lambda) = \mu_k \Big( 1 - \frac{\|F_2(x^{(k)}_+ + s(\lambda))\|}{\max\{\frac{2}{3}\delta_0\sigma,\ \|F_2(x^{(k)}_+)\|\}} \Big) \qquad (45)$$
and find $\lambda > 0$ that minimizes $\mu$ subject to the condition
$$\varphi(x^{(k)}_+ + t\,s(\lambda)) \ \text{is defined for } t \in [0,\ 2 - \beta]. \qquad (46)$$
When $\sigma = 1$ define $\mu = \mu_k/(1+\lambda)$ and minimize $\mu$ subject to (46) and
$$\|F_2(x^{(k)}_+ + s(\lambda))\| \le \max\{\tfrac{2}{3}\delta_0,\ \|F_2(x^{(k)}_+)\|\}. \qquad (47)$$
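The linear algebra of corrector step (b) of Algorithm 1 can be sketched as follows. This is my own toy illustration: $H$, $g$, $A$, $b$ are random stand-ins, the blocks $\tilde H_{ij}$ follow the notation above, and the barrier machinery is omitted.

```python
import numpy as np

# Corrector step (b): two regularized solves in the transformed coordinates,
# parameterized by the step-length / regularization parameter lam.
rng = np.random.default_rng(1)
n, q = 8, 3
A = rng.standard_normal((q, n))          # stand-in for grad F2(x)^T
b = rng.standard_normal(q)               # stand-in for -F2(x)
M = rng.standard_normal((n, n))
H = M @ M.T + np.eye(n)                  # stand-in for the Hessian H
g = rng.standard_normal(n)               # stand-in for the barrier gradient

Q, Rf = np.linalg.qr(A.T, mode="complete")
R = Rf[:q, :]
Ht = Q.T @ H @ Q                         # H tilde = Q^T H Q
gt = Q.T @ g
H21, H22 = Ht[q:, :q], Ht[q:, q:]

def corrector_step(lam):
    s1 = np.linalg.solve(R @ R.T + lam**1.5 * np.eye(q), R @ b)
    s2 = np.linalg.solve(H22 + lam * np.eye(n - q), -gt[q:] - H21 @ s1)
    return Q @ np.concatenate([s1, s2])

s = corrector_step(0.5)
# The linearized infeasibility ||F2 + A s|| = ||-b + A s|| shrinks:
assert np.linalg.norm(-b + A @ s) < np.linalg.norm(b)
# As lam grows the step vanishes, which is what the line search over lam exploits:
assert np.linalg.norm(corrector_step(1e6)) < 1e-3
```

Note that both solves reuse the same factorizations for every value of lam with only the diagonal shift changing, which is what makes the line search over lam cheap.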
In Proposition 3 we present a few simple convergence results for Algorithm 1. As before, to simplify the notation we treat the inequality $f_0(x) := -\det A(x) < 0$ like the other $f_i$ $(i \in \{1, \ldots, p\})$.
Proposition 3 Suppose that at each outer iteration $k$ the barrier subproblem converges to some point $x^{(k)}_+$. Suppose that $\mu_k \to \mu^* > 0$. Then, for any accumulation point $x^*$ of the iterates $x^{(k)}_+$, either the implication

if $z \in \mathbb{R}^n$ satisfies $F_2(x^*)^T \nabla F_2(x^*)^T z < 0$, then there exists $i \in \{0, \ldots, p\}$ with $f_i(x^*) = 0$ and $\nabla f_i(x^*)^T z \ge 0$

holds, or $x^*$ is a singular point such that $\nabla F_2(x^*)$ is rank deficient. If $\mu_k \to 0$, then any accumulation point $x^*$ of the $x^{(k)}_+$ is either a feasible stationary point of (9), in the sense that there exist multipliers $z \ge 0$ and $y$ such that
$$c + \nabla F_1(x^*) z + \nabla F_2(x^*) y = 0 \qquad \text{and} \qquad z^T F_1(x^*) = 0,$$
or $x^*$ is a feasible singular point.
Currently, we cannot provide a complete proof of Proposition 3; for a partial discussion see the appendix. The condition stated for the case $\mu^* > 0$ is also discussed there. As for Proposition 2, we believe that the algorithm will in general not stagnate at singular points. When the second-order sufficient conditions for (9) are satisfied, Algorithm 1 is locally (near the optimal solution of (9)) very similar to the one analyzed in [27, 28], and the local superlinear convergence property shown in [27] carries over to the algorithm of the present paper. The details of this claim are lengthy and remain to be verified as well.
5. Conclusion
We propose an algorithm for nonconvex programming that combines features of trust region methods, SQP methods, and interior-point methods. The method does not use a separate merit function: for the barrier subproblems the "merit function" is the same (barrier) function whose linearization was used to compute the search step. Therefore, "incompatibility effects" like the Maratos effect are not expected. Likewise, the predictor step along the tangent of the central path is only restricted by feasibility conditions, and when the optimal solution is strictly complementary, the path behaves (under mild regularity assumptions) locally like a linear function. In this case we may expect the predictor step to be effective as well. Summarizing, we believe that the algorithm will prove to be very robust in an actual implementation.
Acknowledgements
The author would like to thank Prof. Franz Rendl for many stimulating discussions. We also thank several colleagues who pointed us to related work. The references provided by Prof. M. Powell, and his hint to tridiagonal systems, have very much improved the present paper. We also thank Prof. Philippe Toint for pointing to the reference [21], and Prof. S. Wright for his reference to [13].
6. Appendix, Proofs of the Propositions
In the previous sections we simply assumed that the functions defining the optimization problems are smooth. To make this assumption precise, we assume that these functions are $C^3$-smooth on all of $\mathbb{R}^n$. (Most results also hold under weaker assumptions.)
6.1 Proof of Proposition 1
By definition of the line-search function $l$ in (7), we may assume that the method generates an infinite sequence of iterates $x^j$ with $x^{j+1} = x^j + s(\lambda_j)$ and $f(x^{j+1}) < f(x^j)$. Assume, by contradiction, that $x^*$ is an accumulation point of this sequence with $\nabla f(x^*) \ne 0$. We will show that $\nabla f(x^*) \ne 0$ implies that the step $s$ of (26) and (29) satisfies
$$\|s\| \le O(1/\lambda) \qquad \text{and} \qquad -g^T s \ge \Omega(1/\lambda). \qquad (48)$$
From this we will deduce a certain decrease of $f$ along $x + s(\lambda)$ and derive the contradiction. Since $\nabla f(x^*) \ne 0$ and by continuity, there exist some $\delta > 0$ and some $\epsilon > 0$ such that $\delta \le 2\|\nabla f(x)\|$ for all $x$ with $\|x - x^*\| \le \epsilon$. Further, there exists some $M > 0$ such that $\|\nabla^2 f(x)\| \le M$ for all $x$ with $\|x - x^*\| \le \epsilon$. We assume w.l.o.g. that $M \ge 4\|\nabla f(x)\|/(3\epsilon)$ for these $x$. Now let $x^k$ be an iterate close enough to $x^*$ such that $f(x^k) \le f(x^*) + \delta^2/(30M)$ and $\|x^k - x^*\| \le \epsilon/2$. Let $\lambda := 4M$, $s := -(\nabla^2 f(x^k) + \lambda I)^{-1}\nabla f(x^k)$, and $\gamma := \|\nabla f(x^k)\| \ge \delta/2$. Then,
$$\|s\| \le \frac{\gamma}{\lambda - M} = \frac{\gamma}{3M} \qquad \text{and} \qquad -\nabla f(x^k)^T s \ge \frac{\gamma^2}{\lambda + M} = \frac{\gamma^2}{5M}, \qquad (49)$$
as mentioned in (48). Note that $\|s\| \le \epsilon/2$ since $M \ge 4\gamma/(3\epsilon)$. Since $\|\nabla^2 f(x)\| \le M$ we may therefore estimate
$$f(x^k + s) \le f(x^k) + \nabla f(x^k)^T s + \frac{M}{2}\|s\|^2 \le f(x^*) + \frac{\delta^2}{30M} - \frac{\gamma^2}{5M} + \frac{\gamma^2}{18M} < f(x^*), \qquad (50)$$
where the last inequality uses $\gamma \ge \delta/2$, so that $-\gamma^2/(5M) + \gamma^2/(18M) \le -13\delta^2/(360M) < -\delta^2/(30M)$. Next, we show that $f(x^k + s(\lambda))$ is strictly monotone in $\lambda$ for $\lambda \in [4M, \infty)$, and thus, by the line search, the iterate $x^{k+1}$ will also be chosen such that $f(x^{k+1}) \le f(x^k + s(\lambda)) < f(x^*)$. By monotonicity of the $f(x^j)$, $f(x^j) \le f(x^{k+1}) < f(x^*)$ for $j \ge k+1$, and this guarantees that $x^*$ is not an accumulation point of the $x^j$. This was to be shown. The monotonicity of $l$ for $\lambda \in [4M, \infty)$ follows from its definition (7) and its derivative (8),
$$l'(\lambda) = \nabla f(x^k + s(\lambda))^T\, U (D + \lambda I)^{-2} U^T \nabla f(x^k), \qquad (51)$$
where $UDU^T$ is the Schur decomposition of $\nabla^2 f(x^k)$ (i.e. here $D$ is diagonal, in contrast to the decomposition in Section 3). The elements of $D$ are bounded in absolute value by $M$, and thus, for $\lambda \ge 4M$,
$$(\lambda + M)^{-2} I \preceq (D + \lambda I)^{-2} \preceq (\lambda - M)^{-2} I \quad \text{with} \quad (\lambda - M)^{-2} \le \tfrac{25}{9}(\lambda + M)^{-2}; \qquad (52)$$
at $\lambda = 4M$ these bounds are $(5M)^{-2} I$ and $(3M)^{-2} I$. Further, since $\|s(\lambda)\|$ is monotone in $\lambda$,
$$\|\nabla f(x^k + s(\lambda)) - \nabla f(x^k)\| \le M \|s(\lambda)\| \le \gamma/3. \qquad (53)$$
Inserting (52) and (53) in (51), it follows that
$$l'(\lambda) \ge \frac{\gamma^2}{(\lambda + M)^2} - \frac{1}{3}\cdot\frac{\gamma^2}{(\lambda - M)^2} > 0$$
for $\lambda \ge 4M$; at $\lambda = 4M$ the right-hand side equals $\gamma^2/(25M^2) - \frac{1}{3}\,\gamma^2/(9M^2)$.

We now prove the local superlinear convergence. Let $x^*$ be an accumulation point of the $x^j$ and assume that $x^*$ is a local minimizer of $f$ with a positive definite Hessian matrix. Since Newton's method converges locally quadratically to $x^*$, it suffices to show that $\lambda \to 0$ whenever $x^j \to x^*$. To simplify the notation let $H = \nabla^2 f(x^k)$ and $g = \nabla f(x^k)$. Then,
$$\nabla f(x^k + s(\lambda)) = g - Hs(\lambda) + O(\|s(\lambda)\|^2).$$
With $s = s(\lambda) = (H + \lambda I)^{-1} g$ it follows from (51) that
$$l'(\lambda) = \big(g - Hs + O(\|s(\lambda)\|^2)\big)^T (H + \lambda I)^{-2} g = s^T \lambda I\,(H + \lambda I)^{-2} g + O(\|s\|^2)\,\|(H + \lambda I)^{-1}\|\,\|(H + \lambda I)^{-1} g\|$$
$$= s^T \lambda I\,(H + \lambda I)^{-1} s + O(\|s\|^3)\,\|(H + \lambda I)^{-1}\| \ \ge\ \|s\|^2 \Big( \frac{\lambda}{\lambda + M} + O(\|s\|) \Big).$$
Here, the last line follows since $\|(H + \lambda I)^{-1}\|$ is bounded for all $\lambda > 0$, and the eigenvalues of $\lambda(H + \lambda I)^{-1}$ lie in the interval $[\lambda/(M + \lambda),\, 1]$, where $M$ is an upper bound for the eigenvalues of $H$. Since $s = s(\lambda)$ is always shorter than or equal to the Newton step $s(0)$, the derivative $l'(\lambda)$ is positive as long as $\lambda \ge \Omega(\|s\|) = \Omega(\|x^j - x^*\|)$. In particular, when $\|x^j - x^*\| \to 0$ there exist numbers $\lambda_j$ with $\lambda_j \to 0$ such that $l'(\lambda)$ is positive for all values $\lambda > \lambda_j$. This completes the proof.
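The role of the parameter $\lambda$ in this proof, a single regularized-Newton family $s(\lambda) = -(\nabla^2 f(x^k) + \lambda I)^{-1}\nabla f(x^k)$ searched over $\lambda$ instead of over a step length, can be seen on a toy indefinite quadratic (my own example, not from the paper):

```python
import numpy as np

# Line-search family s(lam) = -(H + lam*I)^{-1} g on a nonconvex quadratic.
B = np.array([[1.0, 0.0], [0.0, -2.0]])   # indefinite Hessian: f is nonconvex
f = lambda x: 0.5 * x @ B @ x
x = np.array([0.1, 1.0])
g, H = B @ x, B

def step(lam):
    return -np.linalg.solve(H + lam * np.eye(2), g)

# For a quadratic, the pure Newton step (lam = 0) lands exactly on the
# stationary point, here the saddle at the origin, where f is larger:
assert np.allclose(x + step(0.0), np.zeros(2))
assert f(np.zeros(2)) > f(x)

# For lam larger than 2 = |smallest eigenvalue of H| the matrix H + lam*I is
# positive definite; minimizing l(lam) = f(x + s(lam)) over such lam exploits
# the negative curvature and strictly reduces f:
lams = np.linspace(2.1, 50.0, 500)
vals = [f(x + step(lam)) for lam in lams]
assert min(vals) < f(x)
```

The grid search over lams is only for illustration; the proof above works with the exact minimizer of $l(\lambda)$.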
6.2 Proof of Proposition 2
We now use again the notation summarized at the beginning of Section 4.6. In addition, since $\mu$ is kept fixed in Proposition 2, we omit it in this proof and write, for example, $\varphi(x)$ instead of $\varphi(x, \mu)$. As in the proof of Proposition 1, we may assume that the method generates an infinite sequence of iterates $x^j$ with $x^{j+1} = x^j + s(\lambda_j)$ and $\varphi(x^{j+1}) < \varphi(x^j)$. Let $x^*$ be an accumulation point of this sequence. By the barrier property of $\varphi$, $x^*$ must lie in the interior of the domain of $\varphi$. First we assume by contradiction that $\|F_2(x^*)\| < 2\delta_0$ but $x^*$ is neither a stationary point nor a singular point. Then $A(x)$ has full row rank in a neighborhood of $x^*$, and there exist numbers $\delta > 0$ and $\epsilon > 0$ such that
$$\min_z \|g(x) + A(x)^T z\| \in [\delta,\, 2\delta] \qquad (54)$$
whenever $\|x - x^*\| \le \epsilon$. In particular, $x$ is in the domain of $\varphi$ for all $x$ with $\|x - x^*\| \le \epsilon$. Moreover, since $A(x)$ has full row rank, $R = R(x)$ is a square matrix ($q = m - p$), and we may assume w.l.o.g. that there exists some finite number $M > 0$ such that $\|R\| \le M$ and $\|R^{-1}\| \le M$ for $\|x - x^*\| \le \epsilon$. (Reduce $\epsilon$ if necessary.) Likewise, we may further assume that $\|H(x)\| \le M$, $\|\nabla^2\varphi(x)\| \le M$, and $\|b\| \le M$ for $\|x - x^*\| \le \epsilon$. (In contrast to the proof of Proposition 1, here $H \ne \nabla^2\varphi(x)$, but this relation was not used in the first part of that proof.) We now derive the same contradiction as in the proof of Proposition 1. Let $x$ with $\|x - x^*\| \le \epsilon$ be given and
$$\gamma := \min_z \|g(x) + A(x)^T z\| = \|\tilde g_2(x)\| \in [\delta,\, 2\delta].$$
(Here, $z = -R^{-1}\tilde g_1$.) Recall that
$$\tilde s_1 = (RR^T + \lambda^{3/2} I_1)^{-1} R b, \qquad \tilde s_2 = (\tilde H_{22} + \lambda I_2)^{-1}(-\tilde g_2 - \tilde H_{21}\tilde s_1). \qquad (55)$$
It follows that $\|s\| = O(1/\lambda)$ as in (48). Likewise,
$$-g^T s = -\tilde g^T \tilde s = -\tilde g_1^T (RR^T + \lambda^{3/2} I_1)^{-1} R b + \tilde g_2^T (\tilde H_{22} + \lambda I_2)^{-1}(\tilde g_2 + \tilde H_{21}\tilde s_1),$$
where the first term is of order $O(M^3 \lambda^{-3/2})$ in absolute value, while the second term is at least $\gamma^2/(M + \lambda) - O(\lambda^{-3/2})$; hence $-g^T s$ is at least $\Omega(1/\lambda)$ as in (48). We omit the repetition of the proof of Proposition 1 in terms of the new constants in these bounds. Note that the term $\lambda^{3/2}$ in (55) can be reduced to $\lambda$ without affecting this proof as long as $\tilde g_1^T (RR^T + \lambda I_1)^{-1} R b \le 0$. It is natural to conjecture local superlinear convergence for the case that strong second-order conditions for optimality hold; we do not elaborate on the proof of this conjecture.

Let us now assume that $\|F_2(x^*)\| = 2\delta_0$. In this case the constraint $\|F_2\| \le 2\delta_0$ is active for all sufficiently large $j$. To apply the above proof it suffices to show that a step $s$ of length $\|s\| = O(1/\lambda)$ can be taken without violating the condition $\|F_2(x + s)\| \le \|F_2(x)\|$. Note that (29) implies $As = \bar b$ where
$$\bar b := R^T (RR^T + \lambda^{3/2} I_1)^{-1} R b. \qquad (56)$$
Thus, since $\|\nabla^2 F_2(x)\| \le M$,
$$\|F_2(x + s)\| \le \|F_2(x) + As\| + M\|s\|^2 = \|{-b} + R^T (RR^T + \lambda^{3/2} I_1)^{-1} R b\| + M\|s\|^2.$$
Using (30) this can be bounded by
$$\|b\|\,\|\lambda^{3/2}(R^T R + \lambda^{3/2} I_1)^{-1}\| + M\|s\|^2.$$
Now, the eigenvalues of $R^T R$ are at least $1/M^2$ by our assumptions. Hence,
$$\|\lambda^{3/2}(R^T R + \lambda^{3/2} I_1)^{-1}\| \le \frac{\lambda^{3/2}}{\lambda^{3/2} + M^{-2}} = 1 - \frac{M^{-2}}{M^{-2} + \lambda^{3/2}} = 1 - \Omega(\lambda^{-3/2}).$$
Summarizing,
$$\|F_2(x + s)\| \le \|F_2(x)\|\,(1 - \Omega(\lambda^{-3/2})) + M\|s\|^2. \qquad (57)$$
This implies that a step $s$ of length $\|s\| = O(1/\lambda)$ does not violate the condition $\|F_2(x + s)\| \le \|F_2(x)\|$.
6.3 Discussion of Proposition 3
Let $x^*$ be an accumulation point of the sequence $x^{(k)}_+$. Then, by continuity, $x^*$ is either a singular point or the limit of stationary points in the sense of Proposition 2. Thus, when $\mu_k \to 0$, the claim of Proposition 3 follows in a straightforward way from Proposition 2 and the barrier property of $\varphi$.

Let us now consider the case that $\mu^* > 0$. The condition stated in Proposition 3 implies that there is no direction $z$ along which the infeasibility with respect to $F_2$ is locally reduced while all active inequalities are improved at the same time. If Proposition 3 does not hold, then there is some $z \in \mathbb{R}^n$ with $F_2(x^*)^T \nabla F_2(x^*)^T z < 0$ and $\nabla f_i(x^*)^T z < 0$ for all $i$ with $f_i(x^*) = 0$. In particular, this implies $\nabla f_i(x^*) \ne 0$ for all active inequalities. Observe that the assumption $\nabla f_0(x^*) \ne 0$ is typically violated when $A(x^*)$ has the eigenvalue zero with multiplicity greater than one. When $\nabla f_i(x^*) = 0$ for some active inequality, there may (in most cases) exist some number $l > 1$ such that a generalized gradient of the constraint $-(-f_i(x))^{1/l} \le 0$ is nonzero at $x = x^*$. The logarithmic barrier function for this constraint is the same up to a factor $l$, and the analysis below still applies if $H^+$ is defined accordingly. In particular, for the inequalities it suffices to assume that the $f_i$ are smooth where they are negative and have a continuous extension to the boundary of this domain.

Assume that $x^*$ is not a singular point. Observe that $\mu = \mu_j$ converges along any subsequence of iterates $\tilde x^{(j)}$ that converge to $x^*$. We assume that $\sigma_j \to \sigma^* > 1$ as $k \to \infty$. (The simpler case $\sigma^* = 1$ can be treated analogously; note that (45) implies (47).) Then (since $\lambda_k \to 0$) it follows from (39) that the predictor step $s$ is defined by
$$\begin{pmatrix} H^+ & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} s \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ R^T (RR^T + \lambda \tilde H_{11}^+)^{-1} R b \end{pmatrix}.$$
If $F_1(x^*) < 0$, the eigenvalues of $H^+$ and $\tilde H_{11}^+$ are bounded near $x^*$ (and greater than or equal to 1), and thus, for large $\lambda$, it follows that $\|y\| = O(1/\lambda)$ and $\|s\| = O(1/\lambda)$.
The same arguments as in (57) imply that the norm of $F_2$ is reduced by a factor $1 - \Omega(1/\lambda)$ by the above step. This implies a value of $\mu(\lambda)$ of size $\mu_k(1 - \Omega(1/\lambda))$ in (45). Minimizing $\mu$ in (45) thus drives $\lambda \to 0$, and the size of the step must be limited by (46), i.e. $\varphi(x^{(k)}_+ + t\,s(\lambda))$ is infeasible for some $t \in [0, 2]$. This means some of the $f_i$ must be active at $x^*$. For simplicity of notation we assume that all constraints $f_i$, $0 \le i \le p$, are active at $x^*$. (The inactive constraints contribute perturbations to the analysis below that do not change the arguments based on the magnitude of certain active quantities.) The points $x^{(k)}_+$ satisfy
$$c + \sum_{i=0}^{p} \frac{\mu\,\nabla f_i(x^{(k)}_+)}{-f_i(x^{(k)}_+)} + \sum_{j=p+1}^{m} y_j \nabla f_j(x^{(k)}_+) = 0.$$
Multiplying this relation by $\min_i \{-f_i(x^{(k)}_+)\}$ and taking the limit as $k \to \infty$ (along a subsequence of the iterates if necessary), we obtain from $\mu^* > 0$ (and since some $f_i \to 0$) that there are vectors $\hat y \ge 0$ and $\bar y$ such that $\hat y \ne 0$ and
$$\sum_{i=0}^{p} \hat y_i \nabla f_i(x^*) + \sum_{j=p+1}^{m} \bar y_j \nabla f_j(x^*) = 0.$$
From Farkas' lemma it follows that there does not exist a vector $z$ such that $\nabla f_i(x^*)^T z < 0$ for all $i$ and $Az = 0$. This statement is weaker than the one made in Proposition 3.
We outline why we believe that Proposition 3 is correct in the stronger form stated in the text; a rigorous proof is deferred to the final version of this paper (hopefully). Since $x^{(k)}_+$ is a stationary point of $\varphi$, it is a minimizer of $\varphi_{\rm lin}$. Observe that by the barrier property of $\varphi_{\rm lin}$, and since all active constraints have nonzero gradients near $x^*$, it follows from the ellipsoidal approximation of the level sets (cf. Theorem 2.1.1 in [16]) that a step of length $\hat r = 1$ in (38) is feasible for $\varphi_{\rm lin}$, and thus a step of length $\hat r = \Omega(1)$ is feasible for $\varphi$ if $x^{(k)}_+$ is close enough to $x^*$. This implies that $\lambda = O(1)$ is possible with respect to (46). The predictor steps are just affine-scaling steps for the linearized barrier function $\varphi_{\rm lin}$ and the convex quadratic objective function $\|F_2(x) + As\|$. By (46), the step lengths approach their upper limit when $\mu \to 0$. The corrector steps reduce the value of the barrier function, but they may not change $F_2$ very much. In [24], global convergence of the affine-scaling method with step lengths up to $\frac{2}{3}$ is proved for linear programs. In our case, the objective function of the predictor step ($\|F_2\|$) is nonlinear, and typically (when problem (9) is not ill posed) there is a solution with $\|F_2\| = 0$ in the interior of the domain of $\varphi$. Thus, the situation is different from the one in [24]. If the result of [24] generalizes to the concept used here, then it follows that $x^*$ minimizes the linearization of $\|F_2\|$ over the linearized domain of problem (9), which is the statement made in Proposition 3. The articles [14, 23] show that even for linear programs the affine-scaling algorithm may fail when the steps are very long. The counterexample given in [14, 23] can be prevented by a modification as follows: since there is typically a solution of the equation $F_2 = 0$ that lies in the interior of the domain of $\varphi$, it is meaningful to consider a "centering component" for the predictor step in the following form.
Let $g := \nabla\varphi(x, \mu)$ be the gradient of $\varphi$, and define
$$h := \frac{g}{\sqrt{g^T H g}}.$$
If $x^{(k)}_+$ is a stationary point of $\varphi$, then $\nabla\varphi(x, \mu)\big|_{x = x^{(k)}_+} = \nabla_s \varphi_{\rm lin}(s, \mu)\big|_{s=0} \in \operatorname{range}(A^T)$, and $\varphi$ is nearly constant when moving a short step along the null space of $A$. Along the null space of $A$ also $\|F_2\|$ is nearly constant for short steps. Thus (for small $r$) the "important components" of the predictor step lie in the range space of $A^T$. The vector $-h$ is a centering component in the range space of $A^T$. In place of (38) one may consider the problem
$$\text{minimize } \|As - b\| \quad \text{s.t.} \quad \big(s + \tfrac{r^2}{2} h\big)^T H^+ \big(s + \tfrac{r^2}{2} h\big) \le r^2. \qquad (58)$$
Note that for small $r > 0$ the point $s = 0$ lies in the interior of the trust region of (58), and thus the solution of (58) is a descent direction for $\|F_2\|$. Problem (58) has the effect that the trust region is shifted towards the direction $-h$ as $r$ grows. For a fixed value of $r$, the KKT conditions of the convex problem (58) can be solved by techniques similar to those in Section 3. The main computational effort lies in solving several systems of the form $(A^T A + \lambda H)s = \mathrm{rhs}$. Let $H = LL^T$ be the Cholesky factorization of $H$. By the Sherman-Morrison-Woodbury formula, the inverse of $\lambda H + A^T A$ is then given by
$$(\lambda LL^T + A^T A)^{-1} = \big(L(\lambda I + L^{-1} A^T A L^{-T}) L^T\big)^{-1} = L^{-T}\Big(\frac{1}{\lambda} I - \frac{1}{\lambda^2}\, L^{-1} A^T \big(I + \tfrac{1}{\lambda} A L^{-T} L^{-1} A^T\big)^{-1} A L^{-T}\Big) L^{-1}. \qquad (59)$$
For the case $n = 2m$, the computation of $L^{-1}$ and of the tridiagonal $UDU^T$ decomposition of $A L^{-T} L^{-1} A^T$ can be carried out with $n^3 + O(n^2)$ multiplications.
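The Sherman-Morrison-Woodbury identity of (59) is easy to verify numerically. The sketch below is my own check with random stand-in data; the form $(\lambda LL^T + A^T A)^{-1}$ with $H = LL^T$ is my reading of the scanned formula:

```python
import numpy as np

# Verify (lam*L*L^T + A^T*A)^{-1}
#   = L^{-T} ( I/lam - (1/lam^2) L^{-1}A^T (I + (1/lam) A L^{-T}L^{-1}A^T)^{-1} A L^{-T} ) L^{-1}
rng = np.random.default_rng(2)
n, m = 6, 3
A = rng.standard_normal((m, n))
G = rng.standard_normal((n, n))
H = G @ G.T + np.eye(n)                  # symmetric positive definite stand-in
L = np.linalg.cholesky(H)                # H = L L^T
lam = 0.7
Linv = np.linalg.inv(L)
B = A @ Linv.T                           # B = A L^{-T}

lhs = np.linalg.inv(lam * H + A.T @ A)
inner = np.linalg.inv(np.eye(m) + B @ B.T / lam)
rhs = Linv.T @ (np.eye(n) / lam - (Linv @ A.T) @ inner @ B / lam**2) @ Linv
assert np.allclose(lhs, rhs)
```

Only the small $m \times m$ system `inner` changes with the data on the right, which is why repeated solves with different $\lambda$-shifted right-hand sides are cheap once $L^{-1}$ is available.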
References
[1] M. Argaez and R.A. Tapia, "On the global convergence of a modified augmented Lagrangian linesearch interior-point Newton method for nonlinear programming", Technical Report 95-38, Dept. of Comp. and Appl. Math., Rice University, (1997).
[2] R.H. Byrd, J.C. Gilbert, and J. Nocedal, "A trust region method based on interior-point techniques for nonlinear programming", Technical Report OTC 96/02, Argonne National Laboratories, (1996).
[3] T.F. Coleman and Y.Y. Li, "A primal dual trust region algorithm for nonconvex programming using a L1 penalty function", Report, Dept. of Computer Science, Cornell University (1998?).
[4] A.R. Conn, N. Gould, and Ph.L. Toint, "A primal-dual algorithm for minimizing a nonconvex function subject to bounds and nonlinear constraints", Report RC 20639, IBM T.J. Watson Center, Yorktown Heights, NY, (1996).
[5] A.V. Fiacco and G.P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques (Wiley, New York, 1968).
[6] A. Forsgren and P.E. Gill, "Primal-dual interior methods for nonconvex nonlinear programming", Technical Report NA 96-3, Dept. of Mathematics, University of California, San Diego, (1996).
[7] D.M. Gay, M.L. Overton, and M.H. Wright, "A primal-dual interior method for nonconvex nonlinear programming", Technical Report 97-4-08, Bell Laboratories, Murray Hill, NJ, (1997).
[8] R.W.H. Sargent and X. Zhang, "An interior-point algorithm for general nonlinear complementarity problems and nonlinear programs", Report, Centre for Process Systems Engineering, Imperial College, London (1998?).
[9] F. Jarre, "An Interior-Point Method for Minimizing the Maximum Eigenvalue of a Linear Combination of Matrices", SIAM Journal on Control and Optimization 31 (1993) 1360-1377.
[10] F. Jarre, M. Kocvara, and J. Zowe, "Optimal Truss Design by Interior-Point Methods", to appear in: SIAM Journal on Optimization (1998).
[11] D.
Jones (General Motors R&D Center), "Kriging, The Cadillac of Nonlinear Surface Response Methodologies", 18th IFIP TC7 Conference, Detroit, Michigan, (1997).
[12] N.K. Karmarkar, "A new polynomial-time algorithm for linear programming", Combinatorica 4 (1984) 373-395.
[13] S. Kruk and H. Wolkowicz, "SQ^2P, sequential quadratic constrained quadratic programming", Research Report CORR 97-01, Dept. of Combinatorics and Optimization, University of Waterloo, Canada, revised Aug. 7, 1998.
[14] W.F. Mascarenhas, "The affine scaling algorithm fails for stepsize 0.999", Technical Report, Universidade Estadual de Campinas, Campinas S.P., Brazil, (1993).
[15] J.J. Moré and D.C. Sorensen, "On the use of directions of negative curvature in a modified Newton method", Mathematical Programming 16 (1979) 1-20.
[16] Yu.E. Nesterov and A.S. Nemirovskii, Interior Point Polynomial Methods in Convex Programming: Theory and Applications (SIAM, Philadelphia, 1994).
[17] M.J.D. Powell, "The use of band matrices for second derivative approximations in trust region algorithms", Report NA1997/12, Cambridge University, (1997).
[18] M.J.D. Powell, "Trust region calculations revisited", presented at the 17th Biennial Conference on Numerical Analysis (Dundee), Report DAMTP 1997/NA18, Cambridge University, (1997).
[19] F. Rendl, personal communication.
[20] J. Sacks, W.J. Welch, T.J. Mitchell, and H.P. Wynn, "Design and Analysis of Computer Experiments", Statistical Science 4 (1989) 409-435.
[21] G.A. Shultz, R.B. Schnabel, and R.H. Byrd, "A family of trust-region based algorithms for unconstrained optimization with strong global convergence properties", SIAM J. on Numerical Analysis 22(1) (1985) 47-67.
[22] G. Sonnevend, "An `analytical centre' for polyhedrons and new classes of global algorithms for linear (smooth, convex) programming", in: System Modelling and Optimization (Budapest, 1985), Lecture Notes in Control and Information Sciences No. 84 (Springer, Berlin, 1986) 866-875.
[23] T. Terlaky and T. Tsuchiya, "A note on Mascarenhas' counterexample about global convergence of the affine scaling algorithm", Research Memorandum No. 596, The Institute of Statistical Mathematics, Tokyo, Japan, (1996).
[24] T. Tsuchiya and M. Muramatsu, "Global Convergence of a Long-Step Affine Scaling Algorithm for Degenerate Linear Programming Problems", SIAM Journal on Optimization 5(3) (1995) 525-551.
[25] T. Urban, A.L. Tits, and C.T. Lawrence, "A primal-dual interior-point method for nonconvex optimization with multiple logarithmic barrier parameters and with strong convergence properties", Report, University of Maryland, College Park (1998).
[26] R.J.
Vanderbei and D.F. Shanno, "An interior point algorithm for nonconvex nonlinear programming", Report SOR-97-21, Dept. of Statistics and OR, Princeton University (1997).
[27] S.J. Wright, "On the Convergence of the Newton/Log-Barrier Method", Preprint ANL/MCS-P681-0897, Mathematics and Computer Science Division, Argonne National Laboratory, (1997).
[28] S.J. Wright and F. Jarre, "The Role of Linear Objective Functions in Barrier Methods", Preprint MCS-P485-1294, MCS Division, Argonne National Laboratory (revised August 1997).