SIAM J. OPTIMIZATION, Vol. 1, No. 4, pp. 000-000, Month 1992

(c) 1992 Society for Industrial and Applied Mathematics

AN INTERIOR POINT ALGORITHM FOR LINEARLY CONSTRAINED OPTIMIZATION*

STEPHEN J. WRIGHT†

* Received by the editors July xx, 1990; accepted for publication (in revised form) September xx, 1991. This research was partly performed at the University of Queensland, Australia, whose support is gratefully acknowledged. The work was also supported by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy, under Contract W-31-109-Eng-38.
† Argonne National Laboratory, 9700 South Cass Avenue, Argonne, Illinois 60439.

Abstract. We describe an algorithm for optimization of a smooth function subject to general linear constraints. An algorithm of the gradient projection class is used, with the important feature that the "projection" at each iteration is performed using a primal-dual interior point method for convex quadratic programming. Convergence properties can be maintained even if the projection is done inexactly in a well-defined way. Higher-order derivative information on the manifold defined by the apparently active constraints can be used to increase the rate of local convergence.

Key words. potential reduction algorithm, gradient projection algorithm, linearly constrained optimization

AMS(MOS) subject classifications. 65K10, 90C30

1. Introduction. We address the problem

(1)   $\min_x f(x)$ subject to $A^T x \le b$,

where $x \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$, and $f$ is assumed throughout to be twice continuously differentiable on the level set

$L = \{x \mid A^T x \le b,\ f(x) \le f(x^0)\},$

where $x^0$ is some given initial choice for $x$. Recent literature on this problem can for the most part be divided into two main classes. On the one hand, there are the "active set" approaches such as sequential quadratic programming, which are most suitable when the constraints $A^T x \le b$ lack any special structure such as separability. In these algorithms a model of $f$ (for example, the quadratic approximation $f(x) + \nabla f(x)^T d + (1/2) d^T \nabla^2 f(x) d$) is formed at each "outer" iteration and minimized over some subset of the feasible region. The algorithm tends to move along edges and faces of the boundary of the feasible set, changing its set of currently active constraints by at most one element on each "inner" iteration. A second class of methods, known as "gradient projection" methods, allows more substantial changes to the active set at each iteration by choosing a direction $g$ (for example, $\nabla f(x)$ or some scaled version of it) and searching along the piecewise linear path $P(x - \alpha g)$, where $\alpha > 0$ and $P$ is the projection onto the feasible set. These methods are best suited to the case in which the projection $P(\cdot)$ is easy to perform, for example, when the feasible region is a box whose sides are parallel to the principal coordinate axes.

In this paper, our aim is to describe an algorithm of the gradient projection class, in which we allow the projections to be performed inexactly. We focus on the case of Euclidean norm projections, which can be solved by using interior-point methods for convex quadratic programming problems. In this way, general polyhedral feasible regions can be handled.
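For the box-constrained case just mentioned, both the projection and the piecewise linear path are trivial to compute. The following minimal Python sketch is illustrative only; the box bounds and the sample data are assumptions made for the example.

```python
import numpy as np

def project_box(x, lo, hi):
    # Euclidean projection onto the box {x : lo <= x <= hi},
    # computed componentwise.
    return np.clip(x, lo, hi)

def projected_path(x, g, alphas, lo, hi):
    # Points on the piecewise linear path alpha -> P(x - alpha * g).
    return [project_box(x - a * g, lo, hi) for a in alphas]

# Example: a 2-D box; for large alpha the path bends along the boundary.
lo, hi = np.zeros(2), np.ones(2)
x = np.array([0.9, 0.5])        # feasible starting point
g = np.array([1.0, -0.25])      # search direction (e.g., a scaled gradient)
for a in (0.25, 0.5, 1.0):
    print(a, project_box(x - a * g, lo, hi))
```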

We thus hope to combine the much-vaunted advantages of interior-point methods with the desirable properties of gradient projection algorithms, most notably, rapid identification of the final active set. In addition, we allow second-derivative information to be used in the definition of $g$ (as is also done by Dunn [4, 3] and Gafni and Bertsekas [5]) to speed up the asymptotic convergence rate after the correct active set has been identified. The "inexactness" in the projection is quantified by a duality gap, which is updated at each iteration of the projection subproblem. The global convergence analysis in section 4 is not tied to the use of an interior-point method for the projection; any algorithm (including an active set method) that allows a duality gap to be calculated for each iterate may be used.

The point $x^*$ is a critical point for (1) if there are scalars $y_i^* \ge 0$ such that

$-\nabla f(x^*) = \sum_{i \in A^*} y_i^* a_i,$

where the $a_i$ are columns of $A$, and $A^* = \{i = 1, \dots, m \mid a_i^T x^* = b_i\}$. Equivalently,

(2)   $-\nabla f(x^*) \in N(x^*; X),$

where $X$ is the feasible set $\{x \mid A^T x \le b\}$, and $N(x; X)$ is the normal cone to $X$ at $x$, defined by

$N(x; X) = \{v \mid v^T (u - x) \le 0 \text{ for all } u \in X\}.$

In the next section, we specify the algorithm. The interior-point method that may be used to perform the projection is discussed in section 3. The global and local convergence properties of the algorithm are analyzed in section 4 and section 5, respectively. In the remainder of the paper, the following notational conventions will be used:
- $\|x\| = (x^T x)^{1/2}$ (the Euclidean norm), unless otherwise specified.
- $P_Y(x)$ denotes the Euclidean projection of the vector $x$ onto the convex set $Y \subseteq \mathbb{R}^n$, that is, $P_Y(x) = \arg\min_{z \in Y} \|z - x\|$. If the subscript is omitted from $P$, projection onto $X$ is assumed.
- $\mathrm{int}\, Y$ denotes the interior of $Y$, and $\partial Y$ denotes its boundary.
- When $x$ is a vector, relations such as $x > 0$ are meant to apply componentwise.
- Subscripts on vectors and matrices denote components, while superscripts are used to distinguish different iterates. Subscripts on scalars denote iteration numbers.
- When $\{\gamma_k\}$ and $\{\delta_k\}$ are non-negative sequences, the notation $\gamma_k = O(\delta_k)$ means that there is a constant $s$ such that $\gamma_k \le s\, \delta_k$ for all $k$ sufficiently large. $\gamma_k = o(\delta_k)$ means that there is a non-negative sequence $\{s_k\}$ converging to zero such that $\gamma_k \le s_k \delta_k$ for all $k$ sufficiently large.
- The sequence $\{v^k\}$ is said to converge Q-quadratically to $v^*$ if $\|v^{k+1} - v^*\| = O(\|v^k - v^*\|^2)$. It is said to converge R-quadratically if there is a sequence $\{\gamma_k\}$ that converges Q-quadratically to zero such that $\|v^k - v^*\| \le \gamma_k$ for all $k$.

- If $\{v^k\}$ and $\{\bar v^k\}$ are two sequences of vectors, the notation "$v^k \to \bar v^k$" means that $\lim_{k \to \infty} \|v^k - \bar v^k\| = 0$.
- In sections 4 and 5, we introduce constants denoted by $C$ and $\bar C$ with a subscript. In all cases these represent strictly positive constants, even where not stated explicitly.

2. The Algorithm. We start this section by giving an outline of the major operations at each iteration of the basic algorithm. Then we state a formal outline and conclude by mentioning possible variations.

The algorithm first defines an "almost active" set of constraints at each iterate $x^k$. It partitions the negative gradient into two orthogonal components (which are orthogonal to and tangent to the manifold defined by the almost active set, respectively) and then scales the tangent component by a matrix with suitable positive definiteness properties (possibly an inverse reduced Hessian or a quasi-Newton approximation to it). A projected Armijo-like line search is then performed along the resulting direction.

The activity tolerance at the point $x^k$ is $\epsilon_k$, where for the moment we require only that $\epsilon_k \ge 0$. The almost active set $I^k$ is defined by

(3)   $I^k = \{i = 1, \dots, m \mid a_i^T x^k \ge b_i - \epsilon_k \|a_i\|\}.$

We use $T^k$ to denote the tangent manifold corresponding to this set:

(4)   $T^k = \{z \mid a_i^T z = 0, \text{ all } i \in I^k\}.$

The negative gradient is then decomposed using $T^k$ by setting

(5)   $d^k_- = P_{T^k}(-\nabla f(x^k)), \qquad d^k_+ = -[\nabla f(x^k) + d^k_-].$

The tangent component $d^k_-$ is modified by setting

(6)   $\tilde d^k = D^k d^k_-,$

where $D^k$ is a matrix such that $P_{T^k} \cdot D^k \cdot P_{T^k} = D^k$ and

(7)   $\lambda_1 z^T z \le z^T D^k z \le \lambda_2 z^T z$, all $z \in T^k$,

where $\lambda_1$ and $\lambda_2$ are positive constants. The search direction is assembled as

(8)   $g^k = -(\tilde d^k + d^k_+).$

A projected Armijo search is carried out along the path

$x^k(\alpha) = P(x^k - \alpha g^k),$

where the values $\alpha = 1, \beta, \beta^2, \beta^3, \dots$ (where $\beta \in (0,1)$ is some constant) are tried. For each such value of $\alpha$, the projection is calculated with the algorithm described in the next section. This algorithm generates a sequence of feasible approximations to $x^k(\alpha)$, which we denote by $x^{kj}(\alpha)$. For each such estimate, the algorithm produces a duality gap $\Delta_{kj}(\alpha)$. Defining the more convenient quantity

$\kappa_{kj}(\alpha) = \sqrt{2 \Delta_{kj}(\alpha)},$

we can obtain upper and lower bounds on the distance from $x^k - \alpha g^k$ to $X$, that is,

$\|x^{kj}(\alpha) - (x^k - \alpha g^k)\|^2 - \kappa_{kj}(\alpha)^2 \le \|x^k(\alpha) - (x^k - \alpha g^k)\|^2 \le \|x^{kj}(\alpha) - (x^k - \alpha g^k)\|^2.$
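The quantities in (5)-(8) can be computed from an orthonormal basis of the subspace $T^k$. The following Python sketch is illustrative only: it obtains the basis from a singular value decomposition and makes the simplifying assumption $D^k = P_{T^k}$, which satisfies (7) with $\lambda_1 = \lambda_2 = 1$.

```python
import numpy as np

def tangent_basis(A_active):
    # Orthonormal basis Z for T = {z : A_active^T z = 0},
    # taken from the trailing right singular vectors of A_active^T.
    U, s, Vt = np.linalg.svd(A_active.T)
    rank = int(np.sum(s > 1e-12))
    return Vt[rank:].T          # columns span the null space

def search_direction(grad, A_active):
    # Decomposition (5) and direction (8), with D^k = P_{T^k} (illustrative).
    Z = tangent_basis(A_active)
    d_minus = Z @ (Z.T @ (-grad))     # tangent component of -grad
    d_plus = -(grad + d_minus)        # remaining (normal) component
    d_tilde = d_minus                 # D^k acts as the identity on T^k
    g = -(d_tilde + d_plus)
    return d_minus, d_plus, g
```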

These "inner iterations" are stopped at a value of $j$ for which $\kappa_{kj}(\alpha)$ becomes sufficiently small according to the following criteria:

(9)   $\kappa_{kj}(\alpha) \le \alpha^{\tau/2} \max\left( \frac{\|x^{kj}(\alpha) - (x^k + \alpha \tilde d^k)\|}{\alpha},\ \|d^k_-\| \right)^2$

and

(10)   $\kappa_{kj}(\alpha) \le C_1 \alpha^{\tau/2}.$

Here $\tau$ and $C_1$ are constants that satisfy the conditions

$\tau > 2, \qquad C_1 < 1.$

We denote the final computed $\kappa_{kj}(\alpha)$ by $\kappa_k(\alpha)$, and the corresponding $x^{kj}(\alpha)$ by $x^k(\alpha, \kappa_k(\alpha))$. The step is then accepted if the following "sufficient decrease" test is satisfied:

(11)   $f(x^k) - f(x^k(\alpha, \kappa_k(\alpha))) \ge \eta \left[ \alpha\, (d^k_-)^T D^k d^k_- + \frac{\|x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)\|^2}{\alpha} \right],$

where $\eta \in (0,1)$ is a constant.

The algorithm can be summarized as follows:

Step 1: Choose $\epsilon_k$. Compute $I^k$ from (3), and $g^k$ according to (5)-(8).

Step 2: For $\alpha = \beta^p$, $p = 0, 1, 2, \dots$ (in sequence), approximately calculate $x^k(\alpha) = P(x^k - \alpha g^k)$, terminating when $x^k(\alpha, \kappa_k(\alpha)) = x^{kj}(\alpha)$ and its associated $\kappa_k(\alpha) = \kappa_{kj}(\alpha)$ are found that satisfy (9),(10). If the test (11) is passed for this value of $\alpha$, set $\alpha_k = \beta^p$, $x^{k+1} = x^k(\alpha, \kappa_k(\alpha))$, $k \leftarrow k + 1$, and go to the next iteration. Otherwise, increase $p$ by 1, and try the next $\alpha = \beta^p$.

In its "exact" form (i.e., $\kappa_k(\alpha) \equiv 0$), and when $D^k$ is defined as the inverse reduced Hessian or a quasi-Newton approximation to it, the step $g^k$ is the same as that obtained by specializing the algorithm of Dunn [4] to the linearly constrained case. The calculation of $g^k$ is somewhat different in Gafni and Bertsekas [5]. They define an "almost tangent cone" at $x^k$ by

$C^k = \{z \mid a_i^T z \le 0, \text{ all } i \in I^k\},$

and then define $d^k_-$ as the projection of $-\nabla f(x^k)$ onto this cone. Additionally, the conditions on $D^k$ are slightly different, and $\tilde d^k$ is the projection of $D^k d^k_-$ onto $C^k$. Our reason for following Dunn [4] and using the simpler decomposition relative to $T^k$ is our assumption that projection onto the subspace $T^k$ can be done exactly and cheaply. This is not unreasonable; the cost would normally be comparable to one iteration of the interior-point algorithm used for the projection onto $X$. Projection onto $C^k$ may, on the other hand, be as expensive as projection onto $X$. Still, there are intuitive reasons for preferring $C^k$ to $T^k$, and it would be of interest to see whether the extra cost per iteration (and the extra algorithmic complexity of doing the projection onto $C^k$ inexactly) is justified.
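The following Python sketch shows the shape of Step 2 under simplifying assumptions. The projection oracle `project` is a hypothetical stand-in for the interior-point algorithm of section 3: it is assumed to return a feasible estimate together with a value $\kappa$ already satisfying (9) and (10), and $D^k$ is taken to be the identity on $T^k$.

```python
import numpy as np

def armijo_outer_step(x, f, project, g, d_minus, d_tilde,
                      eta=0.1, beta=0.5, max_backtracks=30):
    # Step 2: try alpha = beta^p until the sufficient decrease test (11)
    # is passed.  `project(t)` must return (x_alpha, kappa) with x_alpha
    # feasible; kappa is unused here because (9)-(10) are assumed to be
    # enforced inside the oracle.
    alpha = 1.0
    for _ in range(max_backtracks):
        x_alpha, kappa = project(x - alpha * g)
        diff = x_alpha - (x + alpha * d_tilde)
        rhs = eta * (alpha * np.dot(d_minus, d_minus)   # D^k = I on T^k
                     + np.dot(diff, diff) / alpha)
        if f(x) - f(x_alpha) >= rhs:                    # test (11)
            return x_alpha, alpha
        alpha *= beta
    raise RuntimeError("line search failed")
```

For a box, an exact oracle is available: `project = lambda t: (np.clip(t, lo, hi), 0.0)`.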

The steplength rule (11) reduces to the one proposed by Gafni and Bertsekas [5] (and also used by Dunn [3]) when $\kappa_k(\alpha) \equiv 0$. Another obvious possibility, to which we will return briefly in section 5, is

(12)   $f(x^k) - f(x^k(\alpha, \kappa_k(\alpha))) \ge \eta \left\{ \alpha\, (d^k_-)^T D^k d^k_- + \nabla f(x^k)^T [x^k + \alpha \tilde d^k - x^k(\alpha, \kappa_k(\alpha))] \right\}.$

3. Projection onto X. Projection onto the polyhedral set $X$ can be achieved by solving a convex quadratic program or, equivalently, a linear complementarity problem (LCP). In this section, we formulate the problem and outline a primal-dual potential reduction algorithm for solving it. The discussion will be brief, since other papers such as [6, 7, 10, 11] can be consulted for details about motivation, analysis, and implementation issues for this class of interior-point algorithms.

Throughout the remainder of the paper, we use the following assumptions:

(A) The feasible set $X$ has an interior in $\mathbb{R}^n$.

(B) At the solution $z^* = P(t)$ of the projection subproblem, the set of vectors $\{a_i \mid a_i^T z^* = b_i\}$ is linearly independent.

The (unique) vector $P(t)$ is obtained by solving

$\min \tfrac{1}{2} \|z - t\|^2$ s.t. $A^T z \le b,$

or, equivalently,

(13)   $\min \tfrac{1}{2} \|z - t\|^2$ s.t. $A^T z + \sigma = b$, $\sigma \ge 0.$

Introducing Lagrange multipliers $y$ for the constraints, we find that (13) is equivalent to the (mixed) LCP

(14)   $\begin{bmatrix} 0 \\ \sigma \end{bmatrix} = \begin{bmatrix} I & A \\ -A^T & 0 \end{bmatrix} \begin{bmatrix} z \\ y \end{bmatrix} + \begin{bmatrix} -t \\ b \end{bmatrix}, \qquad \sigma \ge 0,\ y \ge 0,\ \sigma^T y = 0.$

The coefficient matrix in (14) is clearly positive semi-definite. The progress of the interior-point algorithm can be gauged by using the potential function defined by

(15)   $\phi(\sigma, y) = \rho \log(\sigma^T y) - \sum_{i=1}^m \log(\sigma_i y_i),$

where $\rho \ge m + \sqrt{m}$. In Kojima, Mizuno, and Yoshise [7], the step from iterate $j$ to iterate $j+1$ is obtained by solving the linear system

(16)   $\begin{bmatrix} 0 \\ \Delta\sigma \end{bmatrix} = \begin{bmatrix} I & A \\ -A^T & 0 \end{bmatrix} \begin{bmatrix} \Delta z \\ \Delta y \end{bmatrix},$

together with

(17)   $\sigma_i^j \Delta y_i + y_i^j \Delta\sigma_i = -\sigma_i^j y_i^j + \frac{\mu_j}{\zeta_j}, \qquad i = 1, \dots, m,$

where $\mu_j = \sum_{i=1}^m \sigma_i^j y_i^j$ and the value of $\zeta_j$ is discussed below. A steplength $\alpha_j$ is chosen such that

(18)   $\alpha_j \frac{|\Delta\sigma_i|}{\sigma_i^j} \le \theta, \qquad \alpha_j \frac{|\Delta y_i|}{y_i^j} \le \theta, \qquad i = 1, \dots, m,$

for some $\theta \in (0,1)$. Trivial modifications of the results of Kojima, Mizuno, and Yoshise [7] indicate that for the choices $\zeta_j \equiv \rho = m + \sqrt{m}$ and $\theta = 0.4$, we have that

(19)   $\phi(\sigma^j + \alpha_j \Delta\sigma,\ y^j + \alpha_j \Delta y) \le \phi(\sigma^j, y^j) - 0.2.$

When some iterate $(z^j, \sigma^j, y^j)$ satisfies $\phi(\sigma^j, y^j) \le -O(\sqrt{m}\, L)$, it can easily be shown that $(\sigma^j)^T y^j \le 2^{-O(L)}$. This suggests that, provided the initial point $(z^0, \sigma^0, y^0)$ satisfies $\phi(\sigma^0, y^0) = O(\sqrt{m}\, L)$, convergence to a point with duality gap less than $2^{-O(L)}$ can be achieved in $O(\sqrt{m}\, L)$ iterations. (For purposes of the complexity analysis, $L$ is taken to be the "size" of the problem.)

Although this choice of $\zeta_j$ yields the best complexity result to date, it has been observed that, in practice, larger values of $\zeta_j$ lead to fewer iterations. In Han, Pardalos, and Ye [6], the choice $\zeta_j \equiv m^2$ is made for convex quadratic programs. In the context of linear programs, Zhang, Dennis, and Tapia [11] observe that it is even desirable to let $\zeta_j$ grow unboundedly large as the solution is approached. (The steps produced by (16),(17) are then very close to being Newton steps for the nonlinear equations formed by the equalities in (14).) Ye et al. [10] have shown that such "large" choices of $\zeta_j$ are not incompatible with obtaining reductions in the potential function. In practical implementations, the line search parameter $\alpha_j$ is also chosen differently. In Han, Pardalos, and Ye [6], the following choice appeared to give good experimental results:

$\alpha_j = 0.99\, \bar\alpha_j, \qquad \bar\alpha_j = \min_{i=1,\dots,m} \min\left( \frac{\sigma_i^j}{[-\Delta\sigma_i]_+},\ \frac{y_i^j}{[-\Delta y_i]_+} \right),$

where $[t]_+ = \max(t, 0)$ and $c/0 = +\infty$; that is, $\bar\alpha_j$ is the largest steplength for which $\sigma^j + \alpha \Delta\sigma$ and $y^j + \alpha \Delta y$ remain non-negative.

The analysis below also uses the following assumption on the final iterate of each projection subproblem:

(C) There is a constant $\bar\rho \ge 1$ such that the final iterate $(z, \sigma, \sigma_{m+1}, y, y_{m+1})$ generated by the projection algorithm, each time it is called, satisfies

$\sigma_i y_i \ge \left[ \sum_{l=1}^{m+1} \sigma_l y_l / (m+1) \right] / \bar\rho.$

Although this assumption conflicts to some extent with the desire for fast asymptotic convergence of the interior-point method, Zhang, Dennis, and Tapia [11, Theorem 3.1] observed that, at least in the case of linear programming that they consider, it appears to hold in practice.
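A minimal numpy sketch of one iteration of the type (16)-(17) for the LCP (14) is given below. Eliminating $\Delta z = -A \Delta y$ and $\Delta\sigma = A^T A\, \Delta y$ reduces (17) to an $m \times m$ linear system. The dense solve, the 0.99 ratio test, and the omission of the augmented variables $\sigma_{m+1}, y_{m+1}$ are simplifying assumptions made for illustration.

```python
import numpy as np

def potential(sigma, y, rho):
    # Potential function (15).
    return rho * np.log(sigma @ y) - np.sum(np.log(sigma * y))

def pr_step(z, sigma, y, A, zeta):
    # One step of (16)-(17) for the projection LCP (14), assuming
    # sigma > 0, y > 0 and sigma = b - A^T z at entry.
    mu = sigma @ y
    M = np.diag(sigma) + np.diag(y) @ (A.T @ A)
    rhs = -sigma * y + mu / zeta
    dy = np.linalg.solve(M, rhs)          # centering system from (17)
    dz = -A @ dy                          # first block of (16)
    dsigma = A.T @ A @ dy                 # second block of (16)
    # 0.99 of the largest step (capped at 1) keeping (sigma, y) positive.
    ratios = [1.0]
    for v, dv in ((sigma, dsigma), (y, dy)):
        neg = dv < 0
        if np.any(neg):
            ratios.append(np.min(-v[neg] / dv[neg]))
    alpha = 0.99 * min(ratios)
    return z + alpha * dz, sigma + alpha * dsigma, y + alpha * dy
```

A caller can monitor the potential (15) across calls to `pr_step`; with $\zeta = m + \sqrt{m}$ the decrease (19) is the theoretical guarantee, while larger $\zeta$ is common in practice.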

4. Global Convergence. In this section we prove that all accumulation points of the algorithm of section 2 are critical. The result depends crucially on the following lemma, which bounds the distance between $x^k(\alpha, 0)$ and $x^k(\alpha, \kappa_k(\alpha))$ in terms of $\kappa_k(\alpha)$.

Lemma 4.1. Suppose that (A) holds and that (B) holds at $z^* = x^k(\alpha, 0)$. Then

$\|x^k(\alpha, 0) - x^k(\alpha, \kappa_k(\alpha))\| \le \kappa_k(\alpha).$

Proof. Setting $t = x^k - \alpha g^k$, we obtain

$\|t - x^k(\alpha, 0)\|^2 \ge \|t - x^k(\alpha, \kappa_k(\alpha))\|^2 - \kappa_k(\alpha)^2$
$\Rightarrow \kappa_k(\alpha)^2 \ge 2 [t - x^k(\alpha, 0)]^T [x^k(\alpha, 0) - x^k(\alpha, \kappa_k(\alpha))] + \|x^k(\alpha, 0) - x^k(\alpha, \kappa_k(\alpha))\|^2.$

Now since $t - x^k(\alpha, 0) \in N(x^k(\alpha, 0); X)$ and $x^k(\alpha, \kappa_k(\alpha)) \in X$, the first term on the right-hand side above is non-negative and can be omitted from the inequality. The result follows.

Under appropriate nondegeneracy assumptions, application of the implicit function theorem to a subset of the equalities in (13) (or (20)) would suggest that, locally, a stronger bound of $O(\Delta_k(\alpha)) = O(\kappa_k(\alpha)^2)$ might be obtained. In fact, some of the local convergence analysis in section 5 relies on just this observation. In general, however, given a point $x^k$ and a search direction $g^k$, there are usually values of $\alpha$ such that the solution of (13) (or (21)) for $t = x^k - \alpha g^k$ is degenerate. Our result in Lemma 4.1 is similar to, but more specific than, the bound that would be obtained by applying the analysis of Mangasarian and Shiau [8] to (13).

We state without proof the following well-known result, which actually applies for any closed convex $X \subseteq \mathbb{R}^n$.

Lemma 4.2. For any $x \in X$ and $z \in \mathbb{R}^n$,
a) $\|P(x + \alpha z) - x\| / \alpha$ is a nonincreasing function of $\alpha > 0$,
b) $\|P(x + \alpha z) - x\| / \alpha \le \|z\|$.

Before proving the main result (Theorem 4.5), we show that the conditions (9),(10) ensure that the projection is computed exactly when $x^k$ is critical (Lemma 4.3) and, in a technical result, show that the algorithm produces descent at a noncritical point (Lemma 4.4).

Lemma 4.3. Suppose that (A) holds and that (B) holds at $z^* = x^k$. When $x^k$ is critical, then $\kappa_k(\alpha) = 0$ for all $\alpha \in [0,1]$, and $x^k(\alpha, \kappa_k(\alpha)) = x^k$ for all $\alpha \in [0,1]$.

Proof. Clearly the result is true for $\alpha = 0$. For the remainder of the proof, we assume that $\alpha \in (0,1]$. All vectors in the subspace $T^k$ are orthogonal to $N(x^k; X)$. Hence by (2) and (5), $d^k_- = \tilde d^k = 0$ and $d^k_+ = -\nabla f(x^k)$. Also, by (2),

$x^k(\alpha, 0) = P(x^k - \alpha \nabla f(x^k)) = x^k,$

and so

$\|x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)\| \le \|x^k(\alpha, 0) - x^k\| + \|x^k(\alpha, \kappa_k(\alpha)) - x^k(\alpha, 0)\| \le \kappa_k(\alpha).$

Substituting this expression in (9), we have

$\kappa_k(\alpha) \le \alpha^{\tau/2} \frac{\kappa_k(\alpha)^2}{\alpha^2}$

and hence

(22)   $\kappa_k(\alpha) [1 - \alpha^{\tau/2 - 2} \kappa_k(\alpha)] \le 0.$

From (10) and the fact that $C_1 < 1$,

$1 - \alpha^{\tau/2 - 2} \kappa_k(\alpha) \ge 1 - C_1 \alpha^{\tau - 2} > 1 - \alpha^{\tau - 2} \ge 0,$

since $\alpha \in (0,1]$ and $\tau > 2$. Since $\kappa_k(\alpha) \ge 0$, the inequality (22) can hold only if $\kappa_k(\alpha) = 0$. Thus, the first statement is proved. Proof of the second statement follows immediately.

Lemma 4.4. Suppose $I^k$ is defined by (3), where $\epsilon_k$ is any positive number. Suppose that (A) holds and that (B) holds for all $z^* = x^k(\alpha, 0)$ for $\alpha \in [0,1]$. Then, given any $\bar\eta \in (0,1)$, there exists an $\bar\alpha(\bar\eta) \in (0, \epsilon_k / \|g^k\|)$ such that

(23)   $\nabla f(x^k)^T [x^k - x^k(\alpha, \kappa_k(\alpha))] \ge \bar\eta \left[ \alpha\, (d^k_-)^T D^k d^k_- + \frac{1}{\alpha} \|x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)\|^2 \right]$

for all $\alpha \in (0, \bar\alpha(\bar\eta)]$. Hence, provided $x^k$ is not critical, there is an $\hat\alpha \in (0, \bar\alpha]$ such that $f(x^k) > f(x^k(\alpha, \kappa_k(\alpha)))$ for all $\alpha \in (0, \hat\alpha]$.

Proof.

(24)   $\nabla f(x^k)^T [x^k - x^k(\alpha, \kappa_k(\alpha))] = \nabla f(x^k)^T [x^k - x^k(\alpha, 0)] + \nabla f(x^k)^T [x^k(\alpha, 0) - x^k(\alpha, \kappa_k(\alpha))],$

and for $\alpha \in (0, \epsilon_k / \|g^k\|)$, it can be proved by using a similar argument to that in [5, Proposition 1(b)] that

$\nabla f(x^k)^T [x^k - x^k(\alpha, 0)] \ge \alpha\, (d^k_-)^T D^k d^k_- + \frac{1}{\alpha} \|x^k(\alpha, 0) - (x^k + \alpha \tilde d^k)\|^2.$

By the smoothness assumptions on $f$, there is a constant $B$ such that

$\|\nabla f(x)\| \le B$ for all $x \in L$.

Since all $x^k \in L$, we have, using Lemma 4.1, that

(25)   $\nabla f(x^k)^T [x^k(\alpha, 0) - x^k(\alpha, \kappa_k(\alpha))] \ge -B \kappa_k(\alpha).$

Now

(26)   $\|x^k(\alpha, 0) - (x^k + \alpha \tilde d^k)\|^2 = \|x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)\|^2 + 2 [x^k(\alpha, 0) - x^k(\alpha, \kappa_k(\alpha))]^T [x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)] + \|x^k(\alpha, 0) - x^k(\alpha, \kappa_k(\alpha))\|^2 \ge \|x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)\|^2 - 2 \kappa_k(\alpha) \|x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)\|,$

and so from (24)-(26),

(27)   $\nabla f(x^k)^T [x^k - x^k(\alpha, \kappa_k(\alpha))] \ge \alpha\, (d^k_-)^T D^k d^k_- + \frac{1}{\alpha} \|x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)\|^2 - \frac{2}{\alpha} \kappa_k(\alpha) \|x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)\| - B \kappa_k(\alpha).$

Now,

(28)   $\frac{1}{\alpha} \|x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)\| \le \frac{1}{\alpha} \|x^k(\alpha, \kappa_k(\alpha)) - x^k(\alpha, 0)\| + \frac{1}{\alpha} \|x^k(\alpha, 0) - (x^k + \alpha \tilde d^k)\| \le \frac{1}{\alpha} \kappa_k(\alpha) + \frac{1}{\alpha} \|P(x^k + \alpha \tilde d^k + \alpha d^k_+) - (x^k + \alpha \tilde d^k)\|.$

The following simple argument shows that $x^k + \alpha \tilde d^k \in X$ for $\alpha \in (0, \epsilon_k / \|g^k\|)$:

$i \notin I^k \Rightarrow a_i^T [x^k + \alpha \tilde d^k] \le b_i - \epsilon_k \|a_i\| + \alpha \|g^k\| \|a_i\| < b_i;$
$i \in I^k \Rightarrow a_i^T [x^k + \alpha \tilde d^k] = a_i^T x^k \le b_i.$

Hence by Lemma 4.2(b), (28) becomes

(29)   $\frac{1}{\alpha} \|x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)\| \le \frac{1}{\alpha} \kappa_k(\alpha) + \|d^k_+\| \le \frac{1}{\alpha} \kappa_k(\alpha) + B.$

Hence (27) becomes

(30)   $\nabla f(x^k)^T [x^k - x^k(\alpha, \kappa_k(\alpha))] \ge \alpha\, (d^k_-)^T D^k d^k_- + \frac{1}{\alpha} \|x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)\|^2 - \frac{2}{\alpha} \kappa_k(\alpha)^2 - 3 B \kappa_k(\alpha).$

We now consider two cases. First, suppose that

$\frac{1}{\alpha} \|x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)\| \ge \|d^k_-\|.$

Then from (9) it follows that

$\kappa_k(\alpha) \le \alpha^{\tau/2 - 2} \|x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)\|^2.$

Using this, together with (10) and the fact that $C_1 < 1$, we have from (30) that

(31)   $\nabla f(x^k)^T [x^k - x^k(\alpha, \kappa_k(\alpha))] \ge \alpha\, (d^k_-)^T D^k d^k_- + \frac{1}{\alpha} \left[ 1 - 2 \alpha^{\tau - 2} - 3 B \alpha^{\tau/2 - 1} \right] \|x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)\|^2.$

The inequality (23) will be satisfied provided

(32)   $1 - 2 \alpha^{\tau - 2} - 3 B \alpha^{\tau/2 - 1} \ge \bar\eta.$

Setting $\omega = \alpha^{\tau/2 - 1}$, we find that the quadratic $2 \omega^2 + (3B) \omega + (\bar\eta - 1)$ has one positive root. Hence we can find an $\bar\alpha_1 > 0$ such that the required inequality will be satisfied for all $\alpha \in (0, \bar\alpha_1]$.

For the second case, assume that

$\frac{1}{\alpha} \|x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)\| \le \|d^k_-\|.$

Then from (9),

$\kappa_k(\alpha) \le \alpha^{\tau/2} \|d^k_-\|^2,$

and so from (30),

(33)   $\nabla f(x^k)^T [x^k - x^k(\alpha, \kappa_k(\alpha))] \ge \alpha\, (d^k_-)^T D^k d^k_- + \frac{1}{\alpha} \|x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)\|^2 - 2 C_1 \alpha^{\tau - 1} \|d^k_-\|^2 - 3 B \alpha^{\tau/2} \|d^k_-\|^2.$

From (7) it follows that

$\|d^k_-\|^2 \le \frac{1}{\lambda_1} (d^k_-)^T D^k d^k_-,$

and so, using $C_1 < 1$, we have

(34)   $\nabla f(x^k)^T [x^k - x^k(\alpha, \kappa_k(\alpha))] \ge \alpha \left[ 1 - \frac{2}{\lambda_1} \alpha^{\tau - 2} - \frac{3B}{\lambda_1} \alpha^{\tau/2 - 1} \right] (d^k_-)^T D^k d^k_- + \frac{1}{\alpha} \|x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)\|^2.$

For (23), it is sufficient that

$1 - \frac{2}{\lambda_1} \alpha^{\tau - 2} - \frac{3B}{\lambda_1} \alpha^{\tau/2 - 1} \ge \bar\eta.$

A similar argument to that above shows that a positive value $\bar\alpha_2$ can be found so that this inequality is satisfied for $\alpha \in (0, \bar\alpha_2]$. Hence the first part of the result follows by setting

$\bar\alpha(\bar\eta) = \min(1,\ \epsilon_k / (2 \|g^k\|),\ \bar\alpha_1,\ \bar\alpha_2).$

The second part of the result (i.e., that $f(x^k) > f(x^k(\alpha, \kappa_k(\alpha)))$ for sufficiently small $\alpha$) is obtained by modifying the argument of Gafni and Bertsekas [5, Proposition 1(b)]. By the mean value theorem, we can find a point $\xi^k(\alpha)$ on the line joining $x^k$ to $x^k(\alpha, \kappa_k(\alpha))$ such that

$f(x^k) - f(x^k(\alpha, \kappa_k(\alpha))) = \nabla f(\xi^k(\alpha))^T [x^k - x^k(\alpha, \kappa_k(\alpha))].$

Hence from (23), for $\alpha \in (0, \bar\alpha]$,

(35)   $\frac{1}{\alpha} [f(x^k) - f(x^k(\alpha, \kappa_k(\alpha)))] \ge \bar\eta \left[ (d^k_-)^T D^k d^k_- + \frac{1}{\alpha^2} \|x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)\|^2 \right] + \frac{1}{\alpha} [\nabla f(\xi^k(\alpha)) - \nabla f(x^k)]^T [x^k - x^k(\alpha, \kappa_k(\alpha))].$

Again, writing $x^k - x^k(\alpha, \kappa_k(\alpha)) = x^k - x^k(\alpha, 0) + x^k(\alpha, 0) - x^k(\alpha, \kappa_k(\alpha))$ and using

(36)   $\|x^k - x^k(\alpha, 0)\|^2 \le \alpha^2 \|g^k\|^2 = \alpha^2 (\|\tilde d^k\|^2 + \|d^k_+\|^2) \le \alpha^2 (\lambda_2^2 + 1) \|\nabla f(x^k)\|^2 \le \alpha^2 (\lambda_2^2 + 1) B^2,$

we have

$\frac{1}{\alpha} [\nabla f(\xi^k(\alpha)) - \nabla f(x^k)]^T [x^k - x^k(\alpha, \kappa_k(\alpha))] \ge -\|\nabla f(\xi^k(\alpha)) - \nabla f(x^k)\| \left( B \sqrt{\lambda_2^2 + 1} + \frac{1}{\alpha} \kappa_k(\alpha) \right) \ge -\|\nabla f(\xi^k(\alpha)) - \nabla f(x^k)\| \left( B \sqrt{\lambda_2^2 + 1} + C_1 \alpha^{\tau/2 - 1} \right) = O(\alpha).$

When $d^k_- \ne 0$, it follows from (35) that

$\liminf_{\alpha \to 0} \frac{1}{\alpha} [f(x^k) - f(x^k(\alpha, \kappa_k(\alpha)))] \ge \bar\eta\, (d^k_-)^T D^k d^k_- > 0.$

On the other hand, when $d^k_- = 0$,

(37)   $\frac{1}{\alpha} [f(x^k) - f(x^k(\alpha, \kappa_k(\alpha)))] \ge \frac{\bar\eta}{\alpha^2} \|x^k(\alpha, \kappa_k(\alpha)) - x^k\|^2 + O(\alpha) \ge \frac{\bar\eta}{\alpha^2} \|x^k(\alpha, 0) - x^k\|^2 - \frac{2 \bar\eta}{\alpha^2} \kappa_k(\alpha) \|x^k(\alpha, 0) - x^k\| + O(\alpha).$

A straightforward application of Lemma 4.2(a) shows that

$\frac{1}{\alpha} \|x^k(\alpha, 0) - x^k\| \ge \|x^k(1, 0) - x^k\|.$

Also, from Lemma 4.2(b), we have for $\alpha \in (0, \bar\alpha)$ that

$\frac{1}{\alpha} \|x^k(\alpha, 0) - x^k\| \le \|d^k_+\| \le B.$

Using these inequalities, together with (10), we have from (37) that

$\frac{1}{\alpha} [f(x^k) - f(x^k(\alpha, \kappa_k(\alpha)))] \ge \bar\eta \|x^k(1, 0) - x^k\|^2 - \frac{2 B \bar\eta}{\alpha} \kappa_k(\alpha) + O(\alpha) \ge \bar\eta \|x^k(1, 0) - x^k\|^2 - 2 B \bar\eta C_1 \alpha^{\tau/2 - 1} + O(\alpha).$

Taking the limit, we have

$\liminf_{\alpha \to 0} \frac{1}{\alpha} [f(x^k) - f(x^k(\alpha, \kappa_k(\alpha)))] \ge \bar\eta \|x^k(1, 0) - x^k\|^2 > 0.$

In either case, there is an $\hat\alpha \le \bar\alpha$ with the desired property.

For the main result of this section, we need to be more specific about the choice of $\epsilon_k$. We now assume that

(38)   $\epsilon_k = \min(\bar\epsilon,\ \hat c_k \hat\psi(x^k)),$

where there is a constant $\hat B$ such that $\hat c_k \in [1, \hat B]$, and $\hat\psi(x)$ is a continuous function of $x$ that is zero only when $x$ is critical.

Theorem 4.5. Suppose that $\epsilon_k$ satisfies condition (38), that (A) holds, and that (B) holds for $x^k(\alpha, 0)$, for all $\alpha \in [0,1]$ and all $k$ sufficiently large. Then every accumulation point of the sequence $\{x^k\}$ generated by the algorithm is critical.

Proof. The proof is quite similar to the proof of Proposition 2 of Gafni and Bertsekas [5]. Some modifications are necessary because of the inexactness in $x^k(\alpha)$ and because of the need for the quantity $\bar\eta$ in Lemma 4.4. We include most of the details here, and refer the reader to [5] for the remainder.

Suppose for contradiction that there is a noncritical point $x^*$ and a subsequence $K$ such that $\lim_{k \in K} x^k = x^*$. If $\alpha_k$ denotes the steplength used in the step from $x^k$ to $x^{k+1}$, (11) implies that

(39)   $\lim_{k \in K} \alpha_k\, (d^k_-)^T D^k d^k_- = 0$

lim 1 kxk ( k ; k ( k )) (xk + k d~k )k2 = 0:

(40)

k2K k

Taking a subsequence if necessary, assume that lim =  k2K k

for some  2 [0; 1]. 2K 0 and so Two cases arise. First assume that  > 0. Then from (39), dk k! k 2K k 2K d~k ! 0 and dk+ ! rf(xk ). Also from (40), (41) lim kxk ( k ; k ( k )) (xk + k d~k )k = 0; k2K and so from (9), lim  ( ) = 0: k2K k k Using this limit together with (41), we get x (  ; 0) = P(x  rf(x )) = x ; which implies that x is critical. For the second case, take  = 0. Then for k 2 K suciently large, the test (11) will fail at least once, thus, using the notation k = k ; we have that (42) f(xk ) f(xk ( k ; k ( k )) o n <  k dkT Dk dk + 1 kxk ( k ; k ( k )) (xk + k d~k )k2 : k

Since, by (38), k is bounded away from zero, and since it follows from (36) that kgk k is bounded above, we have (43) lim kinf  =kgk k > 0: 2K k Hence, setting  = ( + 1)=2, Lemma 4.4 can be applied to nd an  > 0 such that (23) holds for 2 (0; ]. Moreover, closer examination of the proof of Lemma 4.4 shows that, because of (43), the value of  can be chosen independently of xk , for k suciently large. Now since limk2K k = 0, we have for k suciently large that rf(xk )T [xk xk ( k ; k ( k ))] o n k 2 kT k k 1 k k (44)  +1 2 k d D d + kx ( k ; k ( k )) (x + k d~ )k : k

Using the mean value theorem, and combining (42) and (44), we have o n 1  dkT Dk dk + 1 kxk ( ; k ( )) (xk + d~k )k2 k k k k 2

(45)

k

 rf(xk )T [xk xk ( k ; k ( k ))] f(xk ) + f(xk ( k ; k ( k )) = [rf(xk ) rf( k )]T [xk xk ( k ; k ( k ))]

for some $\xi^k$ on the line joining $x^k$ to $x^k(\bar\alpha_k, \kappa_k(\bar\alpha_k))$. Note that

$\frac{1}{\bar\alpha_k} \|x^k - x^k(\bar\alpha_k, \kappa_k(\bar\alpha_k))\| \le \frac{1}{\bar\alpha_k} \|x^k - x^k(\bar\alpha_k, 0)\| + \frac{1}{\bar\alpha_k} \kappa_k(\bar\alpha_k) \le \|g^k\| + C_1 \bar\alpha_k^{\tau/2 - 1},$

which is bounded because of (36). Hence the right-hand side of (45) is $o(\bar\alpha_k)$, and dividing both sides of (45) by $\bar\alpha_k$ we have that

(46)   $\lim_{k \in K} (d^k_-)^T D^k d^k_- = 0,$

(47)   $\lim_{k \in K} \frac{1}{(\bar\alpha_k)^2} \|x^k(\bar\alpha_k, \kappa_k(\bar\alpha_k)) - (x^k + \bar\alpha_k \tilde d^k)\|^2 = 0.$

From (46), $\lim_{k \in K} d^k_- = 0$ and so $\lim_{k \in K} \tilde d^k = 0$. Since in addition $\tilde d^k \in T^k$, we have that $x^k + \bar\alpha_k \tilde d^k \in X$ for $k$ sufficiently large. Lemma 4.2(a) can be applied to show that

(48)   $\frac{1}{\bar\alpha_k} \|x^k(\bar\alpha_k, 0) - (x^k + \bar\alpha_k \tilde d^k)\| = \frac{1}{\bar\alpha_k} \|P((x^k + \bar\alpha_k \tilde d^k) + \bar\alpha_k d^k_+) - (x^k + \bar\alpha_k \tilde d^k)\| \ge \|P((x^k + \bar\alpha_k \tilde d^k) + d^k_+) - (x^k + \bar\alpha_k \tilde d^k)\|.$

Meanwhile Lemma 4.2(b) implies that

(49)   $\frac{1}{\bar\alpha_k} \|x^k(\bar\alpha_k, 0) - (x^k + \bar\alpha_k \tilde d^k)\| \le \|d^k_+\| \le B.$

Taking the sequence in (47), and using Lemma 4.1, (10), (48), and (49), we have

(50)   $\frac{1}{(\bar\alpha_k)^2} \|x^k(\bar\alpha_k, \kappa_k(\bar\alpha_k)) - (x^k + \bar\alpha_k \tilde d^k)\|^2 \ge \frac{1}{(\bar\alpha_k)^2} \|x^k(\bar\alpha_k, 0) - (x^k + \bar\alpha_k \tilde d^k)\|^2 - \frac{2 \kappa_k(\bar\alpha_k)}{\bar\alpha_k} \cdot \frac{1}{\bar\alpha_k} \|x^k(\bar\alpha_k, 0) - (x^k + \bar\alpha_k \tilde d^k)\| \ge \|P((x^k + \bar\alpha_k \tilde d^k) + d^k_+) - (x^k + \bar\alpha_k \tilde d^k)\|^2 - 2 B C_1 (\bar\alpha_k)^{\tau/2 - 1}.$

Since the second term in this expression approaches zero, it follows from (47) and (50) that, in the limit,

$P(x^* - \nabla f(x^*)) = x^*,$

and so $x^*$ is critical, again giving a contradiction.

5. Local Convergence. For the exact algorithm, the local convergence analysis is quite simple because when convergence occurs to a local minimum that satisfies the "standard" assumptions, the iterates eventually all lie on the manifold defined by the constraints that are active at the solution. This does not occur in our case, where the iterates remain in the interior of $X$. We thus need to ensure that the distance of the iterates to the active manifold is decreasing sufficiently quickly so as not to interfere with the (rapid) convergence in the tangent direction. Fortunately, some inherent properties of the path following projection algorithm prove to be useful here.

In this section we prove R-quadratic convergence of an algorithm in which $D^k$ is the inverse of a reduced Hessian. Much of the analysis is devoted to showing that steplengths

of $\alpha_k = 1$ are used for all sufficiently large $k$. We start by defining a scheme for choosing $\epsilon_k$, then state an active set identification result. Eventual unit steplength is established in a sequence of lemmas and Theorem 5.6. We conclude with the main rate-of-convergence result in Theorem 5.7. In addition to the assumptions made in the preceding sections, we use the following:

(D) $x^*$ is a strict local minimum that is nondegenerate, that is,

$-\nabla f(x^*) \in \mathrm{ri}\, N(x^*; X),$

where $\mathrm{ri}\, N(x^*; X)$ is the interior of $N(x^*; X)$ relative to the affine hull of $N(x^*; X)$.

(E) $\epsilon_k$ is defined as

$\epsilon_k = \min(\bar\epsilon,\ \hat c_k \psi(x^k)),$

where $\bar\epsilon > 0$ is a positive constant,

$\psi(x) = \|x - P(x - \nabla f(x))\|,$

and $\hat c_k \in [1, \hat B]$ for some $\hat B < \infty$. ($\hat c_k$ is a "random" quantity and need not be a function of $x^k$.)

If Assumption (B) also holds at $x^*$, then Assumption (D) implies that there are unique scalars $y_i^* > 0$ such that

(51)   $-\nabla f(x^*) = \sum_{i \in A^*} y_i^* a_i,$

where $A^*$ is as defined in section 1. For later reference we introduce the notation

$\bar A = [a_i]_{i \in A^*}, \qquad \bar A \in \mathbb{R}^{n \times r},\ r \le n.$

Orthonormal matrices $Z \in \mathbb{R}^{n \times (n - r)}$ and $Y \in \mathbb{R}^{n \times r}$ can be defined such that $Z^T \bar A = 0$ and $Z^T Y = 0$.

Our relaxed definition of $\epsilon_k$ is motivated by the fact that calculation of $x - P(x - \nabla f(x))$ involves a projection onto $X$ and hence will be carried out inexactly by the algorithm of section 3. The following scheme can be used:

Algorithm to calculate $\epsilon_k$:

Step 1: Given some constant $\hat C \in (0,1)$, apply the projection algorithm of section 3 to find $P(x^k - \nabla f(x^k))$, terminating when the duality gap $\Delta_P = \kappa_P^2 / 2$ satisfies the inequality

$\kappa_P \le (1 - \hat C) \|\hat x^k - x^k\|,$

where $\hat x^k$ is the latest estimate of the solution.

Step 2: Set $\epsilon_k = \min(\bar\epsilon,\ 2 \|\hat x^k - x^k\|)$.

With the notation $\bar x^k = P(x^k - \nabla f(x^k))$, Lemma 4.1 and the conditions on $\kappa_P$ can be used to show that

$\hat C \le 1 - \frac{\kappa_P}{\|\hat x^k - x^k\|} \le \frac{\|\bar x^k - x^k\|}{\|\hat x^k - x^k\|} \le 1 + \frac{\kappa_P}{\|\hat x^k - x^k\|} \le 2 - \hat C.$

Hence

$2 \|\hat x^k - x^k\| = \hat c_k \|\bar x^k - x^k\|,$

where

$\hat c_k \in \left[ \frac{2}{2 - \hat C},\ \frac{2}{\hat C} \right],$

and so the requirements of Assumption (E) are satisfied. From this definition of $\epsilon_k$, the following active set identification result can be proved:

Lemma 5.1. Suppose that assumptions (A), (D), and (E) hold and that (B) holds for $x^k(\alpha, 0)$, for $\alpha \in [0,1]$ and all $k$ sufficiently large. Assuming that $x^*$ is a limit point of the sequence $\{x^k\}$, we have $\lim_{k \to \infty} x^k = x^*$ and $I^k = A^*$ for all $k$ sufficiently large.

Proof. The result follows from Lemma B.1 of Gafni and Bertsekas [5]; trivial modifications are required because of our relaxed definition of $\epsilon_k$. The Assumption (B) in [5] corresponds to our Assumption (D) (see Theorem 2.8 in Burke and Moré [1]).

We next show that the steplengths do not vanish as $k \to \infty$.

Lemma 5.2. Under the assumptions of Lemma 5.1, there is $\hat\alpha > 0$ such that $\alpha_k \ge \hat\alpha$ for all $k$ sufficiently large.

Proof. From Lemma 5.1 we have that for $k$ sufficiently large, $I^k = A^*$. Since $d^k_- = P_{T^k}(-\nabla f(x^k)) \to 0$, it follows that $\|g^k\| \to \|\nabla f(x^*)\| \le B$. Now in Lemma 4.4, we are free to set $\epsilon_k$ uniformly equal to a constant $\tilde\epsilon > 0$ which is chosen so that

$i \notin I^k = A^* \Rightarrow a_i^T x^k \le b_i - 2 \tilde\epsilon \|a_i\|$

for all sufficiently large $k$. Hence $\epsilon_k / \|g^k\|$ is bounded away from zero. Now, given any $\bar\eta \in (\eta, 1)$, we can apply Lemma 4.4 to find an $\bar\alpha(\bar\eta) > 0$ such that for all $\alpha \in (0, \bar\alpha(\bar\eta)]$,

$\nabla f(x^k)^T [x^k - x^k(\alpha, \kappa_k(\alpha))] \ge \bar\eta \left[ \alpha\, (d^k_-)^T D^k d^k_- + \frac{1}{\alpha} \|x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)\|^2 \right].$

If we use $L$ as an upper bound on $\|\nabla^2 f(x)\|$ for $x$ in some neighborhood of $x^*$, it follows exactly as in Gafni and Bertsekas [5] that

$f(x^k) - f(x^k(\alpha, \kappa_k(\alpha))) \ge \alpha \left( \bar\eta - \frac{\alpha L}{2} \right) (d^k_-)^T D^k d^k_- + \left( \bar\eta - \alpha L \right) \frac{1}{\alpha} \|x^k(\alpha, \kappa_k(\alpha)) - (x^k + \alpha \tilde d^k)\|^2.$

If we choose

$\tilde\alpha = \sup_{\bar\eta \in (\eta, 1)} \min\left( \bar\alpha(\bar\eta),\ \frac{2 (\bar\eta - \eta)}{L},\ \frac{\bar\eta - \eta}{L} \right),$

it follows from the line search mechanism (11) that

$\alpha_k \ge \hat\alpha \overset{\mathrm{def}}{=} \beta \tilde\alpha,$

and the result follows since clearly $\hat\alpha > 0$.

The next result follows easily from Lemma 5.2, Lemma B.2 of [5], and the analysis of Dunn [2].

Lemma 5.3. Under the assumptions of Lemma 5.1, we have for all $k$ sufficiently large that

$x^k(\alpha, 0) = x^* + Z v^k(\alpha) \in x^* + T^k, \qquad \text{for all } \alpha \in [\alpha_k, 1],$

where $v^k(\alpha) \in \mathbb{R}^{n - r}$. Also,

(52)   $(x^k + \alpha (\tilde d^k + d^k_+)) - x^k(\alpha, 0) = \bar A y^k(\alpha) \in N(x^*; X)$

for all $\alpha \in [\alpha_k, 1]$, where $y^k(\alpha) \in \mathbb{R}^r$ has $y_i^k(\alpha) \ge C_2 \alpha$ for $i = 1, \dots, r$ and some constant $C_2 > 0$.

Proof. We prove only the last statement concerning the lower bound on $y^k(\alpha)$. Since $d^k_- \to 0$ and $d^k_+ \to -\nabla f(x^*)$, we can combine (51) and (52) to obtain

(53)   $x^k(\alpha, 0) - (x^k + \alpha \bar A y^*) + \bar A y^k(\alpha) \to 0,$

where $y^* = \{y_i^*\}_{i \in A^*}$. Since $x^{k+1} - x^k \to 0$, we have from (9) that $\kappa_k(\alpha_k) \to 0$. Hence

$0 \le \|x^k(\alpha_k, 0) - x^k\| \le \|x^k(\alpha_k, \kappa_k(\alpha_k)) - x^k(\alpha_k, 0)\| + \|x^{k+1} - x^k\| \le \kappa_k(\alpha_k) + \|x^{k+1} - x^k\| \to 0.$

Now by Lemma 4.2, and since $\alpha \in [\alpha_k, 1]$,

$\frac{1}{\alpha} \|x^k(\alpha, 0) - x^k\| \le \frac{1}{\alpha_k} \|x^k(\alpha_k, 0) - x^k\|.$

Since $\alpha_k \ge \hat\alpha$, it follows from this inequality that $x^k(\alpha, 0) - x^k \to 0$. Hence from (53), using the full rank of $\bar A$, we have that

$y^k(\alpha) - \alpha y^* \to 0.$

Since $y^* > 0$ and $\alpha \ge \alpha_k \ge \hat\alpha$, the result follows.

Corollary 5.4. Under the assumptions of Lemma 5.1, we have for all $k$ sufficiently large that

$Z^T s^{k+} = 0,$

where

$s^{k+} = x^k(1, 0) - (x^k + \tilde d^k).$

In addition, $a_i^T (x^k + s^{k+}) = b_i$ for $i \in A^*$.

Proof. The statement $Z^T s^{k+} = 0$ follows from the second expression in Lemma 5.3 by setting $\alpha = 1$ and noting that $Z^T d^k_+ = 0$. For the second part, note that

$x^k + s^{k+} = x^k(1, 0) - \tilde d^k \in x^* + T^k,$

from the first expression in Lemma 5.3 and the fact that $\tilde d^k \in T^k$. Hence $a_i^T (x^k + s^{k+}) = a_i^T x^* = b_i$, as required.

For the remainder of this section we use the following notational conventions:
- $\bar\mu_k = \kappa_k(\alpha_k)^2 / 2$ is the final duality gap for the step from $x^k$ to $x^{k+1}$;
- The error in the approximate unit step is separated into two components:
$x^k(1, \kappa_k(1)) - x^k(1, 0) = e^k = \tilde e^k + e^{k+},$
where $\tilde e^k = P_{T^k}(e^k) = Z Z^T e^k$ and $e^{k+} = Y Y^T e^k$;
- $\kappa_k$ denotes $\kappa_k(1)$.

A technical result is needed before establishing eventual unit steps.

Lemma 5.5. Suppose that Assumption (C) and the assumptions of Lemma 5.1 hold and that the "special case" in the projection algorithm (i.e., $x^k - \alpha_k g^k \in X$) occurs only finitely often. Then for $k$ sufficiently large, there are positive constants $C_3$, $C_4$, and $C_5$ such that

$(d^k_+)^T s^{k+} \ge C_3\, \bar\mu_{k-1} \|d^k_+\|, \qquad \|s^{k+}\| \le C_4\, \bar\mu_{k-1}.$

Further, if $\alpha_k = 1$, we have that

$\|e^{k+}\| \le C_5\, \bar\mu_k, \qquad |(d^k_-)^T \tilde e^k| \le \kappa_k \|d^k_-\|.$

Proof. Assume $k$ is large enough that the "special case" never occurs after iteration $k - 1$. Assume further that $\alpha_{k-1}$ and all subsequent steplengths are bounded below by $\hat\alpha$, as in Lemma 5.2. Recall that $x^k = x^{k-1}(\alpha_{k-1}, \kappa_{k-1}(\alpha_{k-1}))$. Let $\sigma^{k-1}$ and $y^{k-1}$ be the final values of the $\sigma$ and $y$ variables in the projection algorithm of section 3 which was used to compute $x^k$. We start by finding bounds on elements of $\sigma^{k-1}$ in terms of $\bar\mu_{k-1}$; these are needed for the first three inequalities.

As discussed in the proof of Lemma 5.3, $\kappa_k(\alpha_k) \to 0$; that is, the projection subproblem is solved more and more accurately. Recall that the matrix equation in (20) holds at every iteration of the projection algorithm. The first part of this equation yields that

(54)   $x^k - (x^{k-1} + \alpha_{k-1} (\tilde d^{k-1} + d^{(k-1)+})) + A y^{k-1} + q\, y_{m+1}^{k-1} = 0.$

From the second part of the equation and the choice of $\tilde\epsilon$ in the proof of Lemma 5.2, for $i \notin A^*$:

$\sigma_i^{k-1} = b_i - a_i^T x^k \ge 2 \tilde\epsilon \|a_i\| > 0.$

Since $\bar\mu_{k-1} = \kappa_{k-1}(\alpha_{k-1})^2 / 2 = \sum_{i=1}^{m+1} \sigma_i^{k-1} y_i^{k-1}$, we have for $i \notin A^*$ that

(55)   $0 < y_i^{k-1} \le \frac{\bar\mu_{k-1}}{\sigma_i^{k-1}} \le \frac{\bar\mu_{k-1}}{2 \tilde\epsilon \|a_i\|} \to 0.$

Since we have assumed that $\sigma_{m+1}$ is bounded away from zero for all projection subproblems,

(56)   $y_{m+1}^{k-1} \to 0.$

Now, using $k - 1$ instead of $k$ in (52) and setting $\alpha = \alpha_{k-1}$,

(57)   $x^{k-1}(\alpha_{k-1}, 0) - (x^{k-1} + \alpha_{k-1} (\tilde d^{k-1} + d^{(k-1)+})) + \bar A y^{k-1}(\alpha_{k-1}) = 0.$

Comparing (54) with (57), we have

$x^{k-1}(\alpha_{k-1}, 0) - x^k = A y^{k-1} + q\, y_{m+1}^{k-1} - \bar A y^{k-1}(\alpha_{k-1}).$

Using (55)-(56), the full rank of $\bar A$, and noting that $\|x^k - x^{k-1}(\alpha_{k-1}, 0)\| \le \kappa_{k-1}(\alpha_{k-1}) \to 0$, we have that for some constant $C_2 > 0$,

$y_i^{k-1} - y_i^{k-1}(\alpha_{k-1}) \to 0, \qquad \text{where } y_i^{k-1}(\alpha_{k-1}) \ge C_2\, \alpha_{k-1} \text{ for } i \in A^*.$

Hence for $k$ sufficiently large, with $\alpha_{k-1} \ge \hat\alpha$, there is a constant $C_{2L} > 0$ such that $y_i^{k-1} \ge C_{2L}$ for $i \in A^*$. Also, by full rank of $\bar A$ and boundedness of $\nabla f$, there is a $C_{2U} > 0$ such that $y_i^{k-1} \le C_{2U}$. By Assumption (C), we have for $i \in A^*$ that

$\sigma_i^{k-1} \ge \frac{\bar\mu_{k-1}}{(m+1)\, \bar\rho\, y_i^{k-1}} \ge \frac{\bar\mu_{k-1}}{(m+1)\, \bar\rho\, C_{2U}}.$

Also,

$\sigma_i^{k-1} \le \frac{\bar\mu_{k-1}}{y_i^{k-1}} \le \frac{\bar\mu_{k-1}}{C_{2L}}.$

From these last two expressions, we can define positive constants $C_{7L}$ and $C_{7U}$ such that

(58)   $C_{7L}\, \bar\mu_{k-1} \le \sigma_i^{k-1} \le C_{7U}\, \bar\mu_{k-1} \qquad \text{for } i \in A^*.$

For the first result, note from $d^k_+ \to -\nabla f(x^*)$ and $Z^T d^k_+ = 0$ that $d^k_+ = \bar A t^k$, where $t^k \to y^*$. Hence $t^k > 0$ for $k$ sufficiently large. From Corollary 5.4, (20), and (58), we have for $i \in A^*$ that

$a_i^T s^{k+} = b_i - a_i^T x^k = \sigma_i^{k-1} \ge C_{7L}\, \bar\mu_{k-1}.$

Hence, noting that $\|t^k\| \ge \|d^k_+\| / \|\bar A\|$, we have

$(d^k_+)^T s^{k+} = (\bar A t^k)^T s^{k+} = (t^k)^T (\bar A^T s^{k+}) \ge C_{7L}\, \bar\mu_{k-1} \|t^k\| \ge C_3\, \bar\mu_{k-1} \|d^k_+\|,$

for $C_3 = C_{7L} / \|\bar A\|$, giving the first result.

For the second result, we have from Lemma 5.3, Corollary 5.4, and (20) that, for some $u^k \in \mathbb{R}^r$,

$s^{k+} = \bar A u^k \qquad \text{and} \qquad \bar A^T s^{k+} = [\sigma_i^{k-1}]_{i \in A^*}.$

Hence

$\bar A^T \bar A u^k = [\sigma_i^{k-1}]_{i \in A^*},$

which, by full rank of $\bar A$, boundedness of the $\sigma_i$, and (58), gives

$\|u^k\| \le C_8\, \bar\mu_{k-1}$

for some constant $C_8 > 0$. Since $\|s^{k+}\| \le \|\bar A\| \|u^k\|$, the result follows by setting $C_4 = C_8 \|\bar A\|$.

For the third inequality, we again use (20) and Lemma 5.3 to deduce that, for $i \in A^*$,

$\sigma_i^k = b_i - a_i^T x^k(1, \kappa_k) = a_i^T [x^k(1, 0) - x^k(1, \kappa_k)] = -a_i^T e^{k+}.$

Now $e^{k+} = \bar A v^k$ for some $v^k$, so an identical argument to that of the preceding paragraph can be used to give the result. The fourth inequality follows simply from

$|(d^k_-)^T \tilde e^k| \le \|d^k_-\| \|\tilde e^k\| \le \|d^k_-\| \|e^k\| \le \kappa_k \|d^k_-\|.$

Theorem 5.6. Suppose that Assumptions (A), (C), (D), and (E) hold and that Assumption (B) holds in a neighborhood of $x^*$. Suppose that $Z^T \nabla^2 f(x^*) Z$ is positive definite and that for $k$ sufficiently large, the tangent component of the step is given by

$\tilde d^k = Z (Z^T \nabla^2 f(x^k) Z)^{-1} Z^T d^k_-.$

Suppose there is a non-negative sequence $\{\nu_k\}$ such that $\lim_{k \to \infty} \nu_k = 0$ and that, in addition to (9),(10), the sequence $\{\kappa_k\}$ satisfies

(59)   $\kappa_k \|d^k_-\| \le \nu_k\, \bar\mu_{k-1}, \qquad \kappa_k^2 \le \nu_k\, \bar\mu_{k-1}.$

Assume that $\eta < 0.5$ in (11). Then $\alpha_k = 1$ for all sufficiently large $k$.

Proof. First, we consider the special case of $x^* \in \mathrm{int}\, X$, for which we have $\nabla f(x^*) = 0$. By Lemma 5.1, the two-metric gradient projection method reduces to Newton's method when $k$ is sufficiently large. Consequently, $d^k_- = -\nabla f(x^k)$, and $\tilde d^k = -(\nabla^2 f(x^k))^{-1} \nabla f(x^k)$. By Lemma 5.3, $x^k(1, 0) = x^k + \tilde d^k \in \mathrm{int}\, X$ for $k$ sufficiently large. Correspondingly, exactness of the projection yields $\kappa_k = \bar\mu_k \equiv 0$ for all such $k$. Now

$f(x^k) - f(x^k(1, \kappa_k)) = f(x^k) - f(x^k + \tilde d^k) = -\nabla f(x^k)^T \tilde d^k - \tfrac{1}{2} (\tilde d^k)^T \nabla^2 f(x^k) \tilde d^k + o(\|\tilde d^k\|^2) = \tfrac{1}{2} (d^k_-)^T (\nabla^2 f(x^k))^{-1} d^k_- + o(\|d^k_-\|^2).$

The second term on the right-hand side of (11) is zero, so for $k$ sufficiently large, (11) is satisfied for $\alpha = 1$.

In the remaining case, $x^* \in \partial X$; thus, by Assumption (D), $\nabla f(x^*) \ne 0$. Moreover, the "special case" does not occur in the projection algorithm for sufficiently large $k$ (this follows directly from (52), which states in particular that $(x^k - \alpha_k g^k) - P(x^k - \alpha_k g^k) \ne 0$). Since

$-\nabla f(x^k) = d^k_- + d^k_+ \qquad \text{and} \qquad x^k(1, \kappa_k) = x^k + \tilde d^k + s^{k+} + e^k,$

we have that

$f(x^k) - f(x^k(1, \kappa_k)) = \nabla f(x^k)^T [x^k - x^k(1, \kappa_k)] - \tfrac{1}{2} [x^k - x^k(1, \kappa_k)]^T \nabla^2 f(x^k) [x^k - x^k(1, \kappa_k)] + o(\|x^k - x^k(1, \kappa_k)\|^2) = [d^k_- + d^k_+]^T [\tilde d^k + s^{k+} + e^k] - \tfrac{1}{2} [\tilde d^k + (s^{k+} + e^k)]^T \nabla^2 f(x^k) [\tilde d^k + (s^{k+} + e^k)] + o(\|x^k - x^k(1, \kappa_k)\|^2).$

It can be easily shown that $(\tilde d^k)^T \nabla^2 f(x^k) \tilde d^k = (d^k_-)^T \tilde d^k$, and so, after some rearrangement,

(60)   $f(x^k) - f(x^k(1, \kappa_k)) = \tfrac{1}{2} (d^k_-)^T \tilde d^k + (s^{k+} + e^k)^T (s^{k+} + e^k) + (d^k_+)^T s^{k+} - \nabla f(x^k)^T e^k - \tfrac{1}{2} [s^{k+} + e^k]^T [\nabla^2 f(x^k) + 2I] [s^{k+} + e^k] - [s^{k+} + e^k]^T \nabla^2 f(x^k) \tilde d^k + o(\|x^k - x^k(1, \kappa_k)\|^2).$

Now Lemma 5.5 can be used to deduce the following inequalities:

$(d^k_+)^T s^{k+} \ge C_3\, \bar\mu_{k-1} \|d^k_+\| \ge \bar C_3\, \bar\mu_{k-1},$

for some $\bar C_3 > 0$, since $\|d^k_+\| \to \|\nabla f(x^*)\| \ne 0$;

$-\nabla f(x^k)^T e^k \ge -|(d^k_-)^T \tilde e^k| - |(d^k_+)^T e^{k+}| \ge -\kappa_k \|d^k_-\| - C_5 B\, \bar\mu_k;$

$\tfrac{1}{2} [s^{k+} + e^k]^T [\nabla^2 f(x^k) + 2I] [s^{k+} + e^k] \le C_{11} \|s^{k+} + e^k\|^2 \le C_{12}\, \bar\mu_{k-1}^2 + C_{13}\, \bar\mu_{k-1} \kappa_k + C_{14}\, \kappa_k^2;$

$\left| [s^{k+} + e^k]^T \nabla^2 f(x^k) \tilde d^k \right| \le C_{15} \|d^k_-\| (\bar\mu_{k-1} + \kappa_k).$

By substituting in (60), and using (59) together with $\bar\mu_k = \kappa_k^2 / 2$, we obtain

$f(x^k) - f(x^k(1, \kappa_k)) \ge \left\{ \tfrac{1}{2} (d^k_-)^T \tilde d^k + (s^{k+} + e^k)^T (s^{k+} + e^k) \right\} + \bar\mu_{k-1} \left[ \bar C_3 - \nu_k - C_5 B \nu_k / 2 - C_{12}\, \bar\mu_{k-1} - C_{13}\, \kappa_k - C_{14}\, \nu_k - C_{15} \|d^k_-\| - C_{15}\, \nu_k \right] + o(\|\tilde d^k + s^{k+} + e^k\|^2).$

As $k \to \infty$, the term in square brackets approaches $\bar C_3 > 0$; that is, it is positive for sufficiently large $k$. It is easy to see that the final $o(\|\tilde d^k + s^{k+} + e^k\|^2)$ term is eventually dominated by the term in curly brackets. Hence, since $\eta \in (0, \tfrac{1}{2})$, we have for $k$ sufficiently large that

$f(x^k) - f(x^k(1, \kappa_k)) \ge \eta \left\{ (d^k_-)^T \tilde d^k + \|x^k(1, \kappa_k) - (x^k + \tilde d^k)\|^2 \right\},$

and so $\alpha = 1$ passes the acceptance test (11) and $x^k(1, \kappa_k)$ will be accepted as the new iterate.

The conditions (59) should be imposed only in the final stages of the algorithm, when there is a suspicion that the active manifold has been identified. Otherwise, it could happen that at some early iterate, $x^k - \alpha g^k \in \mathrm{int}\, X$, in which case the projection is performed exactly ($\kappa_k = \bar\mu_k = 0$) and, because of (59), exact projections would be demanded at all subsequent iterations. A similar result to Theorem 5.6 can be stated for the alternative acceptance test (12), and it can be proved in almost identical fashion.
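For the choice of $D^k$ in Theorem 5.6, the tangent step is a reduced Newton step. A minimal numpy sketch follows; it is illustrative only, with the null-space basis computed by SVD, and it takes for granted the positive definiteness of the reduced Hessian assumed in the theorem.

```python
import numpy as np

def reduced_newton_step(hess, grad, A_active):
    # Tangent step of Theorem 5.6: d_tilde = Z (Z^T H Z)^{-1} Z^T d_minus,
    # where Z is an orthonormal null-space basis of A_active^T.
    U, s, Vt = np.linalg.svd(A_active.T)
    rank = int(np.sum(s > 1e-12))
    Z = Vt[rank:].T                          # basis of T^k (assumed nonempty)
    d_minus = Z @ (Z.T @ (-grad))            # (5): tangent part of -grad
    H_red = Z.T @ hess @ Z                   # reduced Hessian Z^T H Z
    d_tilde = Z @ np.linalg.solve(H_red, Z.T @ d_minus)
    return d_tilde, d_minus
```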

We can now prove the final result.

Theorem 5.7. Suppose that the assumptions of Theorem 5.6 hold and that the sequence $\{\bar\mu_k\}$ converges Q-quadratically to zero; that is, there is a constant $C_{10}$ such that

(61)   $\bar\mu_k \le C_{10}\, \bar\mu_{k-1}^2.$

Then the rate of local convergence of the algorithm is R-quadratic.

Proof. In the case $x^* \in \mathrm{int}\, X$, we actually obtain Q-quadratic convergence, since the algorithm eventually reduces to Newton's method. We therefore focus on the case of $x^* \in \partial X$.

By setting $\nu_k$ equal to a suitable multiple of $\max(\|d^k_-\|, \kappa_k)$, it is easy to see that (61) implies (59), and so Theorem 5.6 applies. By the definition of $s^{k+}$,

$x^k + \tilde d^k - x^k(1, 0) = -s^{k+}.$

Multiplying through by $Z^T$, and using the definition of $\tilde d^k$, we obtain

(62)   $Z^T (x^k - x^k(1, 0)) - (Z^T \nabla^2 f(x^k) Z)^{-1} Z^T \nabla f(x^k) = 0.$

By optimality of $x^*$, $Z^T \nabla f(x^*) = 0$, so by Taylor series expansion, and since $Z Z^T + Y Y^T = I$,

(63)   $Z^T \nabla f(x^k) - Z^T \nabla^2 f(x^k) (x^k - x^*) = O(\|x^k - x^*\|^2)$
$\Rightarrow Z^T \nabla f(x^k) - Z^T \nabla^2 f(x^k) Z Z^T (x^k - x^*) = Z^T \nabla^2 f(x^k) Y Y^T (x^k - x^*) + O(\|x^k - x^*\|^2).$

Multiplying (63) by $(Z^T \nabla^2 f(x^k) Z)^{-1}$, and adding to (62), we have

(64)   $\|Z^T (x^* - x^k(1, 0))\| = O(\|Y^T (x^k - x^*)\|) + O(\|x^k - x^*\|^2).$

Recall that $x^k = x^{k-1}(1, \kappa_{k-1})$ and that, by Lemma 5.3,

(65)   $Y^T (x^{k-1}(1, 0) - x^*) = 0$

for all sufficiently large $k$. Hence, using the third inequality in Lemma 5.5, we have

(66)   $\|Y^T (x^k - x^*)\| = \|Y^T (x^{k-1}(1, \kappa_{k-1}) - x^{k-1}(1, 0))\| = \|e^{(k-1)+}\| \le C_{20}\, \bar\mu_{k-1}.$

From (64)-(66),

(67)   $\|x^* - x^{k+1}\| \le \|x^* - x^k(1, 0)\| + \|e^k\| \le C_{20}\, \bar\mu_{k-1} + \kappa_k + O(\|x^k - x^*\|^2).$

Now $\kappa_k = (2 \bar\mu_k)^{1/2} \le (2 C_{10})^{1/2}\, \bar\mu_{k-1}$ by (61), and so we can choose a constant $C_{21} \ge 1$ such that

(68)   $\|x^* - x^{k+1}\| \le C_{21} \max(\bar\mu_{k-1},\ \|x^* - x^k\|^2).$

Given any $\delta < C_{21}^{-1}$, we can choose an integer $\bar k$ sufficiently large that

$\bar\mu_{k-1} \le \delta^2 \qquad \text{and} \qquad \|x^k - x^*\| \le \delta \qquad \text{for all } k \ge \bar k.$

An inductive argument based on (68) then shows that

$\|x^{\bar k + j} - x^*\| \le \delta_j, \qquad \text{where } \delta_0 = \delta,\ \delta_{j+1} = C_{21} \delta_j^2.$

Clearly the sequence $\{\delta_j\}$ converges Q-quadratically to zero, so the result follows.

Results similar to Theorems 5.6 and 5.7 could be proved for other choices of $\tilde d^k$, for example, where $\tilde d^k$ is a quasi-Newton or inexact Newton step rather than the reduced Newton step. These would be of practical importance in applications in which it is difficult to compute or factor the reduced Hessian.

Finally, we note that it may be efficient to include a second "local" phase in the basic algorithm of section 2. When it appears that the active constraint set has been identified, the current iterate could be projected onto the appropriate manifold (placing it on $\partial X$). Standard methods for equality-constrained nonlinear programming could then be applied to identify the minimum on this manifold. However, it is likely that the basic algorithm would also be quite efficient in this situation because, as the final few iterates are close together, a good starting point for the projection would be readily available.

Acknowledgements. I thank David Stewart for helpful discussions during the preparation of this paper, and the referees and editor for their comments on the first draft.

REFERENCES

[1] J. V. Burke and J. J. Moré, On the identification of active constraints, SIAM Journal on Numerical Analysis, 25 (1988), pp. 1197-1211.
[2] J. C. Dunn, On the convergence of projected gradient processes to singular critical points, Journal of Optimization Theory and Applications, 55 (1987), pp. 203-216.
[3] J. C. Dunn, Gradient projection methods for systems optimization problems, Control and Dynamic Systems, 29 (1988), pp. 135-195.
[4] J. C. Dunn, A projected Newton method for minimization problems with nonlinear inequality constraints, Numerische Mathematik, 53 (1988), pp. 377-409.
[5] E. M. Gafni and D. P. Bertsekas, Two-metric projection methods for constrained optimization, SIAM Journal on Control and Optimization, 22 (1984), pp. 936-964.
[6] C.-G. Han, P. Pardalos, and Y. Ye, Computational aspects of an interior point algorithm for quadratic programming problems with box constraints, in Large-Scale Numerical Optimization, T. F. Coleman and Y. Li, eds., SIAM Publications, Philadelphia, PA, 1990, pp. 92-112.
[7] M. Kojima, S. Mizuno, and A. Yoshise, An $O(\sqrt{n}\,L)$ iteration potential reduction algorithm for linear complementarity problems, Research Report B-217, Department of Information Sciences, Tokyo Institute of Technology, 1988.
[8] O. L. Mangasarian and T.-H. Shiau, Error bounds for monotone linear complementarity problems, Mathematical Programming, 36 (1986), pp. 81-89.
[9] S. J. Wright, Interior point methods for optimal control of discrete-time systems, Tech. Rep. MCS-P229-0491, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, April 1991.
[10] Y. Ye, K. O. Kortanek, J. A. Kaliski, and S. Huang, Near-boundary behavior of primal-dual potential reduction algorithms for linear programming, Working Paper 90-9, College of Business Administration, The University of Iowa, 1990.

[11] Y. Zhang, J. E. Dennis, and R. A. Tapia, On the superlinear and quadratic convergence of primal-dual interior point linear programming algorithms, Tech. Rep. TR90-6, Department of Mathematical Sciences, Rice University, 1990.