A SMOOTHING METHOD FOR MATHEMATICAL PROGRAMS WITH EQUILIBRIUM CONSTRAINTS
Francisco Facchinei^{a,1}, Houyuan Jiang^{b,2} and Liqun Qi^{b,2}

^a Università di Roma "La Sapienza", Dipartimento di Informatica e Sistemistica, Via Buonarroti 12, I-00185 Roma, Italy. e-mail: [email protected]

^b School of Mathematics, Department of Applied Mathematics, University of New South Wales, Sydney, NSW 2052, Australia. e-mail (Jiang): [email protected]; e-mail (Qi): [email protected]

Abstract: The mathematical program with equilibrium constraints (MPEC) is an optimization problem with variational inequality constraints. MPEC problems include bilevel programming problems as a particular case and have a wide range of applications. MPEC problems with strongly monotone variational inequalities are considered in this paper. They are transformed into an equivalent one-level nonsmooth optimization problem. Then, a sequence of smooth, regular problems that progressively approximate the nonsmooth problem and that can be solved by standard available software for constrained optimization is introduced. It is shown that the solutions (stationary points) of the approximate problems converge to a solution (stationary point) of the original MPEC problem. Numerical results showing the viability of the approach are reported.
Key Words: Equilibrium constraints, variational inequality problems, strong monotonicity, optimality conditions, global convergence.
^1 This paper was written while this author was visiting the University of New South Wales. The visit was partially funded by the Australian Research Council and by MURST National Program "Metodi di Ottimizzazione nella Ricerca Operativa". ^2 These authors are supported by the Australian Research Council.
1 Introduction

The mathematical program with equilibrium constraints (MPEC) is an optimization problem whose constraints include variational inequalities. Such problems play an important role, for example, in the design of transportation networks [19, 28], in economic modelling [37], and in shape optimization [20]. The MPEC problem also includes, as a special case, the so-called bilevel programming problem, where some variables are restricted to be in the solution set of a parametric convex optimization problem. An extensive bibliography on this topic and its applications can be found in [38]. Further material on the MPEC problem and its applications can also be found in the two volumes [2, 26].

We are interested in algorithms for solving the MPEC problem, which is known to be a very difficult problem, being nonsmooth and nonconvex even under very favourable assumptions. Furthermore, the computation of the (generalized) gradients of the constraints can be extremely difficult, if at all possible. Few successful numerical methods have been proposed to date. Among them, penalty techniques have been mostly used, see [1, 18, 20, 22, 25]. Some heuristic methods have been suggested, see, e.g., [15, 36]. Recently, Outrata [31] and Outrata and Zowe [32] converted the MPEC into an unconstrained nonsmooth Lipschitz optimization problem; one may then use well developed nonsmooth optimization methods for solving the MPEC. For different methods of solving bilevel programming problems, the interested reader is referred to [38]. Some more recent references include [14, 35].

In this paper, under the same set of assumptions used in [32], we propose a new algorithm for the solution of MPEC problems and study its convergence properties. Furthermore, we also develop necessary optimality conditions which are directly relevant to the algorithm considered. Our strategy can be outlined as follows. By using the KKT conditions for the variational inequality constraints, we reformulate the MPEC problem as a one-level nonsmoothly constrained optimization problem. Then, to avoid the difficulty of the nonsmooth constraints, we solve a sequence of smooth, regular problems which progressively approximate the nonsmooth problem. We prove that the sequence of points so generated is bounded and that every accumulation point is a solution of the MPEC problem. The approach we propose appears to be new, even when specialized to bilevel programs. In spite of its conceptual and practical simplicity, the numerical results reported in this paper seem to indicate that it is promising from the computational point of view.

The remainder of the paper is organized as follows. In the next section we present some basic assumptions, under which the MPEC is reformulated as an equivalent one-level nonsmooth optimization problem. In Section 3 a smoothing technique is introduced by which this nonsmooth optimization problem is approximated by a sequence of smooth optimization problems. Some important properties of the smoothed problems are also investigated. Section 4 is devoted to studying optimality conditions, which play an important role in subsequent developments. In Section 5 we propose three different algorithms. Global convergence of these algorithms is established under the assumptions stated in Section 2. Numerical results are presented in Section 6 for both academic and practical problems. We conclude the paper with some remarks in the final section. A complete description of the test problems is reported in the Appendix.
2 Problem Formulation and Preliminaries

We consider the following mathematical program with equilibrium constraints:

    min f(x, y)
    s.t. x ∈ X_ad,                                             (1)
         y ∈ S(x),

where f : ℝ^{n+m} → ℝ is continuously differentiable, X_ad ⊆ ℝ^n and, for each x ∈ X_ad and for a continuously differentiable function F : ℝ^{n+m} → ℝ^m, S(x) is the solution set of the variational inequality (VI) defined by the pair (F(x, ·), C(x)), i.e., y ∈ S(x) if and only if y ∈ C(x) and

    (v − y)^T F(x, y) ≥ 0   for all v ∈ C(x).                  (2)

We assume that C(x) is defined by
    C(x) := {y ∈ ℝ^m : g_i(x, y) ≥ 0, i = 1, …, l}             (3)
with g : ℝ^{n+m} → ℝ^l twice continuously differentiable and concave in the second variable. We indicate by I(x, y) the set of active constraints, i.e.,
    I(x, y) := {i : g_i(x, y) = 0}.

We make the following blanket assumptions (see also [32]):

A1 C(x) ≠ ∅ for all x ∈ A, where A is an open set containing X_ad.

A2 C(x) is uniformly compact on A, i.e., there exists an open bounded set B ⊂ ℝ^m such that C(x) ⊆ B for all x ∈ A.

A3 F is uniformly strongly monotone with respect to y on A × B, i.e., there exists a constant α > 0 such that for all (x, y) ∈ A × B and d ∈ ℝ^m,

    d^T ∇_y F(x, y) d ≥ α‖d‖².

A4 X_ad ⊆ ℝ^n is compact.

A5 At each x ∈ X_ad and y ∈ S(x), the partial gradients ∇_y g_i(x, y), i ∈ I(x, y), are linearly independent.

We point out some simple consequences of these assumptions. By A1, A2 and A3, for every x ∈ X_ad there exists one and only one solution of the lower-level variational inequality (2). Furthermore, by A5, every solution must satisfy the KKT conditions
    F(x, y) − ∇_y g(x, y)λ = 0,
    g(x, y) ≥ 0,  λ ≥ 0,  λ^T g(x, y) = 0,                     (4)

where λ ∈ ℝ^l is a multiplier which, by Assumption A5, is uniquely determined. Note that Assumption A1, together with the above observation, implies that the feasible region of Problem (1) is nonempty. We have seen that to each x we can associate the unique solution of the lower-level variational inequality and the corresponding multiplier. On the other hand, if (x, y, λ) satisfies the KKT conditions (4), then y belongs to S(x) by the concavity of the g_i with respect to the second argument [19]. Therefore, under our assumptions, the inner VI problem (2) and its KKT conditions (4) are equivalent, so that the MPEC problem (1) can be reformulated as the following standard nonlinear programming problem:

    min f(x, y)
    s.t. x ∈ X_ad,
         F(x, y) − ∇_y g(x, y)λ = 0,                           (5)
         g(x, y) ≥ 0,  λ ≥ 0,  λ^T g(x, y) = 0.
This is a one-level smooth reformulation of Problem (1), and it could be tempting to use this reformulation to solve Problem (1). This approach has actually been followed by some authors, in the case of bilevel programs, see [3, 5, 6, 13]. However, it is easy to see that, in general, Problem (5) does not satisfy any standard constraint qualification (see also the example at the end of Section 4). Furthermore, the complementarity-type constraints are very complicated and difficult to handle, see [14]. We are then forced to further reformulate the MPEC problem (1). To this end we consider the following nonsmooth equivalent reformulation of Problem (5):

    min f(x, y)
    s.t. x ∈ X_ad,
         F(x, y) − ∇_y g(x, y)λ = 0,                           (6)
         g(x, y) − z = 0,                                      (7)
         −2 min(λ, z) = 0,                                     (8)
where z ∈ ℝ^l and the min operator is applied componentwise to the vectors λ and z. Note that −2 min(λ_i, z_i) = 0 if and only if λ_i ≥ 0, z_i ≥ 0 and λ_i z_i = 0, so that, with z = g(x, y), constraint (8) encodes exactly the complementarity conditions appearing in (5). If we introduce the function H_0 : ℝ^{n+m+l+l} → ℝ^{m+l+l}, defined as

    H_0(w) := H_0(x, y, z, λ) := ( F(x, y) − ∇_y g(x, y)λ
                                   g(x, y) − z
                                   −2 min(λ, z) ),
we can rewrite the above problem more compactly as

    (P)   min f(x, y)
          s.t. x ∈ X_ad,  H_0(x, y, z, λ) = 0.
The reason for the introduction of the multiplicative factor −2 before the min operator will become apparent shortly. With respect to Problem (5) we have added a new variable z which, at feasible points, is always equal to g(x, y). This variable is not strictly necessary, but will facilitate some of the proofs. Problem (P) is equivalent to Problem (5) which, in turn, is equivalent to the MPEC problem (1). This is made precise in the following proposition.
Proposition 2.1 (x*, y*) is a global (local) solution of the MPEC problem (1) if and only if there exists a vector (z*, λ*) such that (x*, y*, z*, λ*) is a global (local) solution of Problem (P).
Proof. The "only if" part. If (x*, y*) is a local minimizer of the MPEC problem (1), then y* ∈ S(x*) and there exists a vector (z*, λ*) such that (x*, y*, z*, λ*) is a feasible point of Problem (P) by Assumption A5. Suppose (x*, y*, z*, λ*) is not a local minimizer of Problem (P), i.e., there exists a sequence of points {(x^k, y^k, z^k, λ^k)}, feasible to Problem (P), such that (x^k, y^k, z^k, λ^k) → (x*, y*, z*, λ*) as k → ∞ and f(x^k, y^k) < f(x*, y*). Since y^k ∈ S(x^k), i.e., (x^k, y^k) is a feasible point of the MPEC (1), this contradicts the fact that (x*, y*) is a local minimizer of the MPEC (1).

Let now (x*, y*) be a global minimizer of the MPEC (1). By what we have just proved, (x*, y*, z*, λ*) is a local minimizer of Problem (P) for some vector (z*, λ*). Suppose now that (x*, y*, z*, λ*) is not a global minimizer of Problem (P). Then there exists a feasible point (x̄, ȳ, z̄, λ̄) such that f(x̄, ȳ) < f(x*, y*), which, since (x̄, ȳ) is feasible to the MPEC problem (1), contradicts the global optimality of (x*, y*).

The "if" part. Suppose now that (x*, y*, z*, λ*) is a local minimizer of Problem (P). It follows that y* ∈ S(x*) and (x*, y*) is a feasible point of the MPEC problem (1). Suppose (x*, y*) is not a local minimizer of the MPEC problem (1). Then there exists a sequence {(x^k, y^k)} of points feasible to the MPEC problem (1) such that (x^k, y^k) → (x*, y*) as k → ∞ and, for every k,

    f(x^k, y^k) < f(x*, y*).                                   (9)

Assumption A5 implies the existence of a unique vector (z^k, λ^k) such that (x^k, y^k, z^k, λ^k) is a feasible point of Problem (P) for each k. In view of the continuity of g and of (7), we also obviously have that z^k → z*. To conclude the proof we then only need to show that λ^k → λ*. In fact, if we prove this, we have that (x^k, y^k, z^k, λ^k) → (x*, y*, z*, λ*) as k → ∞, so that (9) implies that (x*, y*, z*, λ*) is not a local minimizer of Problem (P), which is a contradiction.

We then proceed to prove that λ^k → λ*. Suppose, by contradiction, that this is not true. We consider two cases: (i) {λ^k} is bounded, (ii) {λ^k} is unbounded.

(i) In this case we can assume, without loss of generality, that λ^k → λ̄ ≠ λ*. By continuity, (x*, y*, z*, λ̄) is then feasible to Problem (P) and (x*, y*, z*, λ̄) ≠ (x*, y*, z*, λ*). But this is impossible by Assumption A5.

(ii) Since (x^k, y^k, z^k, λ^k) is feasible, we have by (6), and for every k,

    F(x^k, y^k) − ∇_y g(x^k, y^k)λ^k = 0.

Dividing by ‖λ^k‖ and passing to the limit, we obtain, assuming without loss of generality that λ^k/‖λ^k‖ → λ̃ ≠ 0,

    ∇_y g(x*, y*)λ̃ = 0.                                        (10)

Since by continuity we eventually have I(x^k, y^k) ⊆ I(x*, y*), we see that, by (7) and (8), λ̃_i = 0 if i ∉ I(x*, y*). But then (10) contradicts Assumption A5 and the proof of the "if" part in the case of local minimum points is complete.

Let now (x*, y*, z*, λ*) be a global minimizer of Problem (P). Since (x*, y*) is feasible to the MPEC problem (1), if (x*, y*) is not a global solution of the MPEC problem (1), there exists a feasible point (x̂, ŷ) such that

    f(x̂, ŷ) < f(x*, y*).

But since we can find a (ẑ, λ̂) such that (x̂, ŷ, ẑ, λ̂) is feasible to Problem (P), this contradicts the global optimality of (x*, y*, z*, λ*). □
Remark: We stress that, at this stage, we do not know whether the MPEC problem or Problem (P) admits a global solution, even if it would not be difficult to establish such a result. We have just proved that if one of the two problems has a global solution, then so does the other. Actually, in Section 4 we shall see, as a result of a more general analysis, that these problems do have global solutions.

Our strategy will now be to solve a sequence of smooth, regular one-level problems which progressively approximate Problem (P). In the next section we introduce the smoothing technique and study some basic features of the smoothed problems. In Section 4 we digress to study optimality conditions, both for Problem (P) and for its smooth perturbations, which are relevant to our approach. In Section 5, instead, we formally state the algorithms and study their convergence properties.
3 Smoothing Problem (P)

Let μ ∈ ℝ be a parameter. Define the function φ_μ : ℝ² → ℝ by

    φ_μ(a, b) := √((a − b)² + 4μ²) − (a + b).

The most important property of this function is stated in the following result [23, Lemma 3.1].
Proposition 3.2 For every μ we have

    φ_μ(a, b) = 0  ⟺  a ≥ 0, b ≥ 0, ab = μ².

Note also that for μ = 0 we have φ_0(a, b) = −2 min(a, b), while, for every μ ≠ 0, φ_μ(a, b) is a C^∞ function. Furthermore, for every (a, b), lim_{μ→0} φ_μ(a, b) = −2 min(a, b). The function φ_μ(a, b) is therefore a smooth perturbation of the min function.

Let us introduce the nonlinear operator H_μ : ℝ^{n+m+l+l} → ℝ^{m+l+l},

    H_μ(w) := H_μ(x, y, z, λ) := ( F(x, y) − ∇_y g(x, y)λ
                                   g(x, y) − z
                                   Φ_μ(λ, z) ),

where

    Φ_μ(λ, z) := (φ_μ(λ_1, z_1), …, φ_μ(λ_l, z_l))^T ∈ ℝ^l.

Then, for every μ ≠ 0, one may define the optimization problem

    (P_μ)   min f(x, y)
            s.t. x ∈ X_ad,  H_μ(x, y, z, λ) = 0.
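To make the behaviour of φ_μ concrete, the following minimal sketch (ours, not part of the original development; plain Python with NumPy, with function names and test values of our choosing) evaluates φ_μ and its partial derivatives for decreasing μ, illustrating Proposition 3.2, the convergence φ_μ(a, b) → −2 min(a, b), and the limiting gradient behaviour exploited later in Section 4:

```python
import numpy as np

def phi(a, b, mu):
    # phi_mu(a, b) = sqrt((a - b)^2 + 4 mu^2) - (a + b)
    return np.sqrt((a - b) ** 2 + 4.0 * mu ** 2) - (a + b)

def grad_phi(a, b, mu):
    # partial derivatives of phi_mu for mu != 0 (cf. formula (19) in Section 4)
    r = np.sqrt((a - b) ** 2 + 4.0 * mu ** 2)
    return (a - b) / r - 1.0, (b - a) / r - 1.0

a, b = 0.3, 0.7
for mu in [1.0, 1e-1, 1e-2, 1e-4]:
    print(mu, phi(a, b, mu), grad_phi(a, b, mu))
# As mu -> 0: phi(a, b, mu) -> -2*min(a, b) = -0.6 and, since a < b,
# grad_phi(a, b, mu) -> (-2, 0), an element of -2 * d(min)(a, b).

# Proposition 3.2: phi_mu(a, b) = 0 exactly when a, b >= 0 and a*b = mu^2.
mu = 1e-2
print(phi(mu ** 2 / b, b, mu))  # ~0 up to rounding, since (mu^2/b)*b = mu^2
```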
Problem (P_μ) may be viewed as a perturbation of Problem (P) with the parameter μ. For any μ ≠ 0, (P_μ) is a smooth optimization problem, although (P) is nonsmooth in general. It is clear that (P_0) coincides with (P). We denote by F_μ ⊆ ℝ^{n+m+l+l} the feasible set of Problem (P_μ). Before continuing we stress that most of the time we view μ as a parameter, and this explains the notation adopted, where μ is a subscript. However, in some cases, especially in the proofs, we shall view μ as an independent variable. Hence, from this point of view, the function H_μ depends on the five variables (x, y, z, λ, μ).

We note that the perturbation proposed here is certainly not the only possible smooth perturbation of the min operator [33, 34]. However, this particular perturbation enjoys several useful properties that we now study. The first result readily follows from the results in [24].

Proposition 3.3 For every μ and for every feasible point of Problem (P_μ), all the generalized Jacobians of H_μ with respect to the variables (y, z, λ) are nonsingular.

Proof. We note that the function φ_μ introduced in this section is just the negative of the function used in [24]. It is then easy to check that this change does not affect at all the nonsingularity properties studied in [24]. Therefore, for each fixed x, the proposition is an immediate consequence of [24, Theorem 3.5] (when μ ≠ 0) and of [24, Lemma 3.12] (when μ = 0). □

The most interesting result that we want to prove in this section is that the sets F_μ are uniformly compact as long as μ is bounded. To this end we first recall that a locally Lipschitz function is said to be regular if (at every point) the directional derivative exists and is equal to its Clarke directional derivative. Moreover, a vector function is said to be regular if each of its components is regular. The following result holds.

Lemma 3.4 For every μ, the function H_μ(x, y, z, λ) (as a function of the variables μ, x, y, z, λ) is locally Lipschitz and regular.

Proof. Since all the remaining components of the function H_μ(x, y, z, λ) are continuously differentiable, we only need to show that φ_μ(z_i, λ_i) is locally Lipschitz and regular. Since

    φ_μ(z_i, λ_i) = √((z_i − λ_i)² + 4μ²) + (−z_i − λ_i),

and the sum of regular functions is regular, we only have to check that √((z_i − λ_i)² + 4μ²) is locally Lipschitz and regular. In turn, this latter function can be seen as the composition of the convex, and hence locally Lipschitz and regular, function h_1 : ℝ² → ℝ, with h_1(t_1, t_2) = √(t_1² + t_2²), and of the differentiable function h_2 : ℝ³ → ℝ², with h_2(z_i, λ_i, μ) = (z_i − λ_i, 2μ). Then √((z_i − λ_i)² + 4μ²) is regular by [11, Theorem 2.3.9 (iii)]. □

Using this lemma we are now able to apply the implicit function theorem for locally Lipschitz functions, thus obtaining the following result.

Lemma 3.5 Let (μ̄, x̄, ȳ, z̄, λ̄) be such that H_μ̄(x̄, ȳ, z̄, λ̄) = 0. Then there exist a neighborhood Ω of (μ̄, x̄) and a continuous function (y, z, λ) : Ω → ℝ^{m+l+l} such that, for each (μ, x) ∈ Ω,

    H_μ(x, y(x, μ), z(x, μ), λ(x, μ)) = 0.
Proof. This lemma is nothing else but the application of the implicit function theorem for Lipschitz equations, see [11, Corollary to Theorem 7.1.1], to the system H_μ(x, y, z, λ) = 0. According to [11, Corollary to Theorem 7.1.1], all we have to do is to check that the projection of ∂_{μ,x,y,z,λ} H_μ(x, y, z, λ) on the space of the y, z and λ variables is comprised of nonsingular matrices.³ To this end, let Π_{y,z,λ} denote the projection operator over the (y, z, λ) space. By [11, Proposition 2.6.2 (e)] and the definition of the projection operator, we have

    Π_{y,z,λ}[∂H_μ(x, y, z, λ)] ⊆ [ Π_{y,z,λ}[∂(H_μ)_1(x, y, z, λ)] ; … ; Π_{y,z,λ}[∂(H_μ)_{m+2l}(x, y, z, λ)] ],

where the right-hand side denotes the row-wise stacking of the component sets. It follows from the regularity of H_μ, established in Lemma 3.4, and from [11, Proposition 2.3.15] that

    [ Π_{y,z,λ}[∂(H_μ)_1(x, y, z, λ)] ; … ; Π_{y,z,λ}[∂(H_μ)_{m+2l}(x, y, z, λ)] ] ⊆ [ ∂_{y,z,λ}(H_μ)_1(x, y, z, λ) ; … ; ∂_{y,z,λ}(H_μ)_{m+2l}(x, y, z, λ) ].

Notice that, from the very special structure of the function Φ_μ, we have

    [ ∂_{y,z,λ}(H_μ)_1(x, y, z, λ) ; … ; ∂_{y,z,λ}(H_μ)_{m+2l}(x, y, z, λ) ] = ∂_{y,z,λ} H_μ(x, y, z, λ),

see [24]. Consequently,

    Π_{y,z,λ}[∂H_μ(x, y, z, λ)] ⊆ ∂_{y,z,λ} H_μ(x, y, z, λ).

On the other hand, by Proposition 3.3, all the generalized Jacobians in ∂_{y,z,λ} H_μ(x, y, z, λ) are nonsingular. Then the assertion on the nonsingularity of the projection follows from the above formula. □

The following lemma is interesting in its own right and will be used several times in the sequel.

Lemma 3.6 Let x in X_ad be given. Then, for every μ there exists a unique point in F_μ such that its x-part is equal to x. We denote this point by w_μ = (x, y_μ(x), z_μ(x), λ_μ(x)). Furthermore, w_μ is continuous as a function of μ.

³ In the preceding formulas ∂(H_μ)_i is a set of row vectors, representing the generalized gradient of the i-th component of H_μ. Note that in the rest of the paper, as usual, the generalized gradient of a function is a set of column vectors.
Proof. Let x in X_ad be given. We first prove the existence and uniqueness assertion. If μ = 0, the existence and uniqueness follow by Assumptions A1, A2, A3 and A5, as already discussed in Section 2. If μ ≠ 0, the assertion follows by [24, Theorem 3.15], taking into account that, as already explained in the proof of Proposition 3.3, the differences between the functions used in this paper and in [24] do not influence the properties proved in [24].

We then turn to the continuity property. Let x in X_ad and μ̄ be given. By the first part of the lemma, there exists a (unique) point w_μ̄ = (x, y_μ̄(x), z_μ̄(x), λ_μ̄(x)) such that H_μ̄(w_μ̄) = 0. Then, by Lemma 3.5, a neighborhood Ω of (μ̄, x) and a continuous function (y, z, λ) : Ω → ℝ^{m+l+l} exist such that, for each (μ, x) ∈ Ω,

    H_μ(x, y(x, μ), z(x, μ), λ(x, μ)) = 0.

Since, by the uniqueness proved in the first part of the lemma,

    y_μ(x) = y(x, μ),  z_μ(x) = z(x, μ),  λ_μ(x) = λ(x, μ),

the continuity assertion is also proved. □
We are now ready to prove a result which will turn out to be crucial. It basically asserts that the feasible regions of the perturbed problems (P_μ) are uniformly compact.

Theorem 3.7 For every μ̃ > 0, there exists a compact set D(μ̃) ⊂ ℝ^{n+m+l+l} such that F_μ ⊆ D(μ̃) for every μ ∈ [−μ̃, μ̃].
Proof. We first note that, for every μ ∈ ℝ and for every point w = (x, y, z, λ) ∈ F_μ we have, by Assumption A4, that x belongs to the compact set X_ad. We also have, by z = g(x, y) and by Proposition 3.2, that g(x, y) ≥ 0, which, in view of Assumption A2, implies that y belongs to the bounded set B. This, in turn, taking into account the continuity of the function g and (7), implies that z belongs to a compact set. We now prove that λ is also bounded. Suppose that this is not true. Then there exist sequences {μ_k} and {(x^k, y^k, z^k, λ^k)} such that μ_k ∈ [−μ̃, μ̃] and (x^k, y^k, z^k, λ^k) ∈ F_{μ_k} for all k and lim_{k→∞} ‖λ^k‖ = ∞. By the first part of the proof we can then assume, without loss of generality, that μ_k → μ̄ ∈ [−μ̃, μ̃] and (x^k, y^k, z^k) → (x̄, ȳ, z̄). Now, by Lemma 3.6 there exists a unique point w̄ = (x̄, y_μ̄(x̄), z_μ̄(x̄), λ_μ̄(x̄)) such that w̄ belongs to F_μ̄ and the x-part of w̄ is x̄. In particular, the point w̄ is such that H_μ̄(w̄) = 0. On the other hand, by Lemma 3.5, there exists a continuous function (y, z, λ) : Ω → ℝ^{m+l+l} from a neighborhood Ω of (μ̄, x̄) such that H_μ(x, y(x, μ), z(x, μ), λ(x, μ)) = 0 when (μ, x) belongs to Ω. But again by Lemma 3.6, we shall eventually have

    y^k = y(x^k, μ_k),  z^k = z(x^k, μ_k),  λ^k = λ(x^k, μ_k).

Since the function λ(·, ·) is continuous, we have a contradiction to the unboundedness of λ^k. □
Corollary 3.8 For every μ, F_μ is nonempty and compact.

Proof. The assertion on the nonemptiness readily follows from Lemma 3.6. Furthermore, F_μ is closed by the continuity of the function H_μ and bounded by Theorem 3.7. □
4 Optimality Conditions

In this section we consider optimality conditions for the Problems (P_μ). These conditions are important in the analysis that will be carried out in the next section and show that the perturbed problems (P_μ) always satisfy the KKT conditions and are therefore fairly regular, as opposed to Problem (5). Furthermore, in the case of μ = 0 we shall obtain optimality conditions for the MPEC problem. There are only a few papers where optimality conditions for MPEC problems are dealt with, see, e.g., [27]. Our main aim in deriving optimality conditions for the MPEC problem is the construction of reliable stopping criteria for the algorithms to be described in the next section.

In what follows we shall consider in a unified way both the case μ = 0, which corresponds to a nondifferentiable problem, and the case μ ≠ 0, which gives rise to differentiable problems. Therefore we shall write, for example, ∂φ_μ(z_i, λ_i). This expression will just be −2∂min(z_i, λ_i) if μ = 0, while it will give ∇φ_μ(z_i, λ_i) if μ ≠ 0. Unless stated otherwise, from now on all the (generalized) gradients are taken with respect to the variables (x, y, z, λ). So, for example, ∇f is a vector in ℝ^{n+m+l+l}. Finally, we shall indicate by d(·|X_ad × ℝ^{m+l+l}) the Euclidean distance function to the set X_ad × ℝ^{m+l+l}.

We are in the position to apply freely the well established theory of necessary optimality conditions for Lipschitz programs [11]. Accordingly, a necessary condition for a point w = (x, y, z, λ) to be optimal for Problem (P_μ) is that the point be stationary, i.e., that 0 ≠ (ζ, β, γ, δ) ∈ ℝ^{1+m+l+l} and s ∈ ∂d(w|X_ad × ℝ^{m+l+l}) exist such that

    0 ∈ ζ∇f(x, y) + ∇L(x, y, λ)β + ∇(g(x, y) − z)γ + Σ_{i=1}^{l} ∂φ_μ(z_i, λ_i)δ_i + M‖(ζ, β, γ, δ)‖ s,        (11)

where M is a Lipschitz constant for (f, H_μ) around w and where we set L(x, y, λ) := F(x, y) − ∇_y g(x, y)λ. These conditions are known as Fritz John (FJ) conditions. If ζ ≠ 0, and hence, without loss of generality, if ζ = 1, (11) becomes

    0 ∈ ∇f(x, y) + ∇L(x, y, λ)β + ∇(g(x, y) − z)γ + Σ_{i=1}^{l} ∂φ_μ(z_i, λ_i)δ_i + M‖(1, β, γ, δ)‖ s,          (12)
which are the Karush-Kuhn-Tucker (KKT) conditions. It is well known that the presence of FJ points which are not KKT points can cause severe problems to numerical optimization procedures. Luckily, in our case every stationary point is a KKT point.

Theorem 4.9 For every μ, if w is a stationary point for Problem (P_μ), then ζ in (11) is always different from 0 and w is a KKT point of Problem (P_μ).

Proof. Suppose by contradiction that w satisfies the FJ conditions (11) with ζ = 0. Then we can write

    0 ∈ ∇L(x, y, λ)β + ∇(g(x, y) − z)γ + Σ_{i=1}^{l} ∂φ_μ(z_i, λ_i)δ_i + M‖(β, γ, δ)‖ s,        (13)

where (β, γ, δ) ≠ 0 and s ∈ ∂d(w|X_ad × ℝ^{m+l+l}). We shall now concentrate on the last m + l + l equations in the FJ conditions (11). We first observe that, if we partition the vector s into (s_x, s_y, s_z, s_λ), we have that (s_y, s_z, s_λ) = 0, since this latter subvector, by well known properties of the distance function, belongs to the normal cone to ℝ^{m+l+l}, which is {0_{m+l+l}}. Taking this fact into account, the last m + l + l equations of (13) give

    0 ∈ Π_{y,z,λ}[ ∇L(x, y, λ)β + ∇(g(x, y) − z)γ + Σ_{i=1}^{l} ∂φ_μ(z_i, λ_i)δ_i ] ⊆ Π_{y,z,λ}[ ∂H_μ(w)^T ](β; γ; δ),        (14)

where Π_{y,z,λ} is the projector over the (y, z, λ) space. Similarly to Lemma 3.4, we have that H_μ, as a function of (x, y, z, λ), is regular, so Π_{y,z,λ}[∂H_μ(w)] is contained in the generalized Jacobian of H_μ with respect to (y, z, λ) by [11, Proposition 2.3.15]. But then, recalling that (β, γ, δ) ≠ 0, (14) contradicts Proposition 3.3. □

In the next section we shall also need the following result which, roughly speaking, states a continuity property for KKT points of the problems (P_μ).
Proposition 4.10 Let {w^k} and {μ_k} → 0 be two sequences such that, for every k, w^k is a KKT point for Problem (P_{μ_k}) with multipliers (β^k, γ^k, δ^k) and for some vector s^k ∈ ∂d(w^k|X_ad × ℝ^{m+l+l}), and such that μ_k ≠ 0 for all k. Suppose that {w^k} → w*, {(β^k, γ^k, δ^k)} → (β*, γ*, δ*), and {s^k} → s*. Then w*, with the multipliers (β*, γ*, δ*) and with the vector s*, satisfies the KKT conditions for Problem (P_0).
Proof. We only need to check that

    0 ∈ ∇f(x*, y*) + ∇L(x*, y*, λ*)β* + ∇(g(x*, y*) − z*)γ* − Σ_{i=1}^{l} 2∂min(z_i*, λ_i*)δ_i* + M‖(1, β*, γ*, δ*)‖ s*,        (15)

with s* ∈ ∂d(w*|X_ad × ℝ^{m+l+l}). By assumption we have that

    0 = ∇f(x^k, y^k) + ∇L(x^k, y^k, λ^k)β^k + ∇(g(x^k, y^k) − z^k)γ^k + Σ_{i=1}^{l} ∇φ_{μ_k}(z_i^k, λ_i^k)δ_i^k + M‖(1, β^k, γ^k, δ^k)‖ s^k.        (16)

Taking into account that, by the upper semicontinuity of the subdifferential mapping, s* ∈ ∂d(w*|X_ad × ℝ^{m+l+l}), and taking also into account the continuity assumptions, it is easy to see that, if

    lim_{k→∞} ∇φ_{μ_k}(z_i^k, λ_i^k) ∈ −2∂min(z_i*, λ_i*),  i = 1, …, l,        (17)

then (15) follows from (16) by simple continuity arguments. We then check (17). If we indicate by e_j the j-th column of the (n + m + l + l)-dimensional identity matrix, we recall that

    −2∂min(z_i*, λ_i*) = {−2e_{n+m+i}}                       if z_i* < λ_i*,
    −2∂min(z_i*, λ_i*) = {−2e_{n+m+l+i}}                     if z_i* > λ_i*,        (18)
    −2∂min(z_i*, λ_i*) = −2 conv{e_{n+m+i}, e_{n+m+l+i}}     if z_i* = λ_i*.

On the other hand, we also have

    ∇φ_{μ_k}(z_i^k, λ_i^k) = ( (z_i^k − λ_i^k)/√((z_i^k − λ_i^k)² + 4μ_k²) − 1 ) e_{n+m+i} + ( (λ_i^k − z_i^k)/√((z_i^k − λ_i^k)² + 4μ_k²) − 1 ) e_{n+m+l+i}.        (19)

Now, passing to the limit in (19) and comparing with (18), we obtain (17) and the proof is complete. □

The properties proved in this section can be contrasted with the regularity properties of the reformulation (5). To this end we consider the following simple example, with n = 1, m = 1, l = 2:

    f(x, y) = 2x − y,   X_ad = {x : 0 ≤ x ≤ 1},   F(x, y) = y − x,   C(x) = {y : 0 ≤ y ≤ 1}.
For this problem, reformulation (5) becomes

    min 2x − y
    s.t. x ≥ 0,  1 − x ≥ 0,
         y ≥ 0,  1 − y ≥ 0,
         λ_1 ≥ 0,  λ_2 ≥ 0,
         yλ_1 = 0,  (1 − y)λ_2 = 0,
         y − x − λ_1 + λ_2 = 0.

It can be easily verified that the feasible region of this problem is given by points of the form (t, t, 0, 0), for all t ∈ [0, 1], and therefore that the unique minimum point is given by x* = (0, 0, 0, 0). On the other hand, if we write down the necessary optimality conditions, we see that every feasible point is a FJ point and that x* is not a KKT point.
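The case analysis behind the description of the feasible region is short; for the reader's convenience we spell it out (this verification is ours, using only the constraints listed above):

$$
\begin{aligned}
&0 < y < 1:\ \ y\lambda_1 = (1-y)\lambda_2 = 0 \ \Rightarrow\ \lambda_1 = \lambda_2 = 0 \ \Rightarrow\ y = x;\\
&y = 0:\ \ \lambda_2 = 0,\ \text{so } x = -\lambda_1 \le 0;\ \text{with } x \ge 0 \text{ this forces } x = \lambda_1 = 0;\\
&y = 1:\ \ \lambda_1 = 0,\ \text{so } \lambda_2 = x - 1 \le 0;\ \text{with } \lambda_2 \ge 0 \text{ this forces } \lambda_2 = 0,\ x = 1.
\end{aligned}
$$

In every case (x, y, λ_1, λ_2) = (t, t, 0, 0) with t ∈ [0, 1], on which the objective 2x − y equals t and is minimized at t = 0.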
5 Algorithms

In this section we study the behaviour of a sequence {w^k}, where w^k is a "solution" of Problem (P_{μ_k}) and {μ_k} → 0. We shall consider both the case in which "solution" means global solution, which is the most interesting case from the theoretical point of view, and the case in which "solution" is just a (possibly approximate) stationary point of Problem (P_{μ_k}), which is the most important case from the practical point of view.

Algorithm G.
Step 0: Let {μ_k} be any sequence of nonzero numbers with lim_{k→∞} μ_k = 0. Choose w^0 := (x^0, y^0, z^0, λ^0) ∈ ℝ^{n+m+l+l}, and set k := 1.
Step 1: Find a global solution w^k of Problem (P_{μ_k}).
Step 2: Set k := k + 1, and go to Step 1.

Here "G" stands for "Global". By Corollary 3.8 the algorithm above is well defined, i.e., the global solution to Problem (P_{μ_k}) required at Step 1 exists. The following theorem shows that every limit point of the sequence {w^k} generated by the algorithm is a global solution of Problem (P).
Theorem 5.11 The sequence {w^k} generated by Algorithm G is contained in a compact set, and each of its limit points is a global solution of Problem (P).
Proof. The fact that the sequence {w^k} is contained in a compact set immediately follows from Theorem 3.7 with μ̃ = sup_k |μ_k|. We then pass to the second part of the theorem. Assume, without loss of generality, that {w^k} → w* = (x*, y*, z*, λ*) and suppose, by contradiction, that w* is not a global solution of Problem (P). We first note that, since each w^k is feasible to Problem (P_{μ_k}) and μ_k → 0, then, by continuity, we have that w* ∈ F_0. If w* is not a global solution of Problem (P), then there exists a feasible point w̄ = (x̄, ȳ, z̄, λ̄) such that

    f(x̄, ȳ) < f(x*, y*).        (20)

We shall now obtain a contradiction by showing that w^k is not a global solution of Problem (P_{μ_k}) for k large enough. By Lemma 3.6 we know that for every μ there exists a unique point w̄_μ = (x̄, y_μ(x̄), z_μ(x̄), λ_μ(x̄)) in F_μ such that its x-part is equal to x̄. We also know that w̄_μ tends to w̄ when μ tends to 0. But then, since w^k tends to w*, by continuity and taking into account (20), we obtain that eventually f(x̄, y_{μ_k}(x̄)) < f(x^k, y^k), which contradicts the fact that w^k is a global solution of Problem (P_{μ_k}). □

Theorem 5.11 is obviously of great theoretical interest; however, if we use standard optimization methods to solve the sequence of Problems (P_{μ_k}), we are only able to find stationary points of Problem (P_{μ_k}). It is then interesting to analyze the following more practical algorithmic scheme, where we suppose only that at each step we are able to find a stationary point of (P_{μ_k}).
Algorithm S.
Step 0: Let {μ_k} be any sequence of nonzero numbers with lim_{k→∞} μ_k = 0. Choose w^0 := (x^0, y^0, z^0, λ^0) ∈ ℝ^{n+m+l+l}, and set k := 1.
Step 1: Find a KKT point w^k of Problem (P_{μ_k}).
Step 2: Set k := k + 1, and go to Step 1.

Here "S" stands for "Stationary". By Theorem 4.9, every stationary point of the Problems (P_{μ_k}) is actually a KKT point. Since, by Corollary 3.8, every Problem (P_{μ_k}) admits at least one stationary point, the task at Step 1 is well posed. We can prove the following theorem.

Theorem 5.12 The sequence {w^k} generated by Algorithm S is contained in a compact set, and each of its limit points is a KKT point of Problem (P).

Proof. The fact that the sequence {w^k} generated by Algorithm S is contained in a compact set can be proven as in Theorem 5.11, so we turn to the second part of the theorem. Since w^k is a KKT point of Problem (P_{μ_k}), there exist, for each k, multipliers (β^k, γ^k, δ^k) and a vector s^k ∈ ∂d(w^k|X_ad × ℝ^{m+l+l}) such that

    0 = ∇f(x^k, y^k) + ∇L(x^k, y^k, λ^k)β^k + ∇(g(x^k, y^k) − z^k)γ^k + Σ_{i=1}^{l} ∇φ_{μ_k}(z_i^k, λ_i^k)δ_i^k + M‖(1, β^k, γ^k, δ^k)‖ s^k,        (21)

where M is the sup of the Lipschitz constants of the functions (f, H_μ) on the compact set D(μ̃) (see Theorem 3.7), with μ̃ = sup_k |μ_k| and μ ∈ [−μ̃, μ̃]. We now consider two cases: ‖(β^k, γ^k, δ^k)‖ bounded and ‖(β^k, γ^k, δ^k)‖ unbounded. We first examine the second case and show that it cannot occur. We can divide (21) by ‖(1, β^k, γ^k, δ^k)‖. If we indicate by (β̄^k, γ̄^k, δ̄^k) the normalized vector of multipliers, we can assume, without loss of generality, that (β̄^k, γ̄^k, δ̄^k) converges to (β̄, γ̄, δ̄) ≠ 0 and that w^k converges to a point w̄ feasible to Problem (P). Since s^k is in the subdifferential of a distance function, its norm is bounded by 1, so that, taking also into account the upper semicontinuity of subdifferentials, we can assume that s^k converges to an s̄ belonging to ∂d(w̄|X_ad × ℝ^{m+l+l}). Furthermore, by continuity, ∇f(x^k, y^k) → ∇f(x̄, ȳ), ∇L(x^k, y^k, λ^k) → ∇L(x̄, ȳ, λ̄) and ∇(g(x^k, y^k) − z^k) → ∇(g(x̄, ȳ) − z̄). Finally, reasoning as in the proof of Proposition 4.10, it is easy to see that we can also assume, without loss of generality, that ∇φ_{μ_k}(z_i^k, λ_i^k) converges to an element of −2∂min(z̄_i, λ̄_i). Hence, passing to the limit in the normalized (21) we obtain

    0 ∈ ∇L(x̄, ȳ, λ̄)β̄ + ∇(g(x̄, ȳ) − z̄)γ̄ − Σ_{i=1}^{l} 2∂min(z̄_i, λ̄_i)δ̄_i + M‖(β̄, γ̄, δ̄)‖ s̄,

so that w̄ is a FJ point with 0 multiplier for the objective function. But this contradicts Theorem 4.9, so this case cannot occur and ‖(β^k, γ^k, δ^k)‖ is bounded.

So, let us examine the case in which ‖(β^k, γ^k, δ^k)‖ is bounded. Then, without loss of generality, we can assume that (1, β^k, γ^k, δ^k) converges to (1, β*, γ*, δ*) and that w^k converges to a point w* feasible to Problem (P). Analogously to the previous case, we can also assume, without loss of generality, that s^k converges to an s* ∈ ∂d(w*|X_ad × ℝ^{m+l+l}). The desired result then follows from Proposition 4.10. □

In the proof of the theorem we have also shown that the multipliers generated when solving the Problems (P_{μ_k}) are bounded. Since this is a very significant property from the numerical point of view, showing that the Problems (P_{μ_k}) are numerically stable as μ_k goes to 0, we state this result as a corollary.
Corollary 5.13 Let (β^k, γ^k, δ^k) be the sequence of KKT multipliers associated with the solution w^k calculated at Step 1 of Algorithm S. Then there exists a constant N such that ‖(β^k, γ^k, δ^k)‖ ≤ N for every k.
Although Algorithm S is much more realistic than Algorithm G, since computing a KKT point is an affordable task, we have not yet considered the stopping criterion problem. Furthermore, we would also like to be able to consider the case in which we only find an approximate stationary point of Problem (P_{μ_k}). We address these issues in the following algorithm.
Algorithm AS.
Step 0: Let {μ_k} be any sequence of nonzero numbers with lim_{k→∞} μ_k = 0. Choose a stopping tolerance T ≥ 0 and a sequence {ε_k} of nonnegative numbers. Choose w^0 := (x^0, y^0, z^0, λ^0) ∈ ℝ^{n+m+l+l}, and set k := 1.
Step 1: Find a point w^k, multipliers (β^k, γ^k, δ^k) and a vector s^k ∈ ℝ^{n+m+l+l} such that

(a) ‖∇f(x^k, y^k) + ∇L(x^k, y^k, λ^k)β^k + ∇(g(x^k, y^k) − z^k)γ^k + Σ_{i=1}^{l} ∇φ_{μ_k}(z_i^k, λ_i^k)δ_i^k + M‖(1, β^k, γ^k, δ^k)‖ s^k‖ ≤ ε_k;
(b) d(w^k|F_{μ_k}) ≤ ε_k;
(c) d(s^k|∂d(w^k|X_ad × ℝ^{m+l+l})) ≤ ε_k.

Step 2: If max{(μ_k)², ε_k, d(w^k|F_0)} ≤ T, STOP.
Step 3: Set k := k + 1, and go to Step 1.

Here "AS" stands for "Approximate Stationary". A few comments are in order. In Step 1 we look for an approximate KKT point of Problem (P_{μ_k}). The definition of approximate KKT point of Problem (P_{μ_k}) is given by the fairly natural conditions (a), (b) and (c). Note that (a) and (c) are just an approximation of the KKT conditions, while (b) requires the point to be "not too far from feasibility". Much of the formal complication is due to the fact that we have used the abstract constraint notation x ∈ X_ad, with the consequent necessity of using the subdifferential of the distance function in the KKT conditions. However, in most practical applications the set X_ad is defined by a finite number of equalities and inequalities, so that the optimality conditions can be expressed in terms of the gradients of these constraints and of their associated multipliers. In this event, point (c) becomes superfluous and can be incorporated directly into point (a). The test at point (a) is standard, and is usually part of any practical stopping test in commercial codes for the solution of smooth constrained optimization problems. The test at point (b), instead, is conceptually hard to check. Assuming that the set X_ad is defined by a system of inequalities, i.e., X_ad := {x ∈ ℝ^n : h(x) ≥ 0} with h : ℝ^n → ℝ^s, test (b) is usually replaced, in practical algorithms, by the test ‖H_{μ_k}(w^k)‖ + ‖min(h(x^k), 0)‖ ≤ ε_k. We finally remark that, for simplicity, we have assumed that the same tolerance ε_k is used in the tests (a), (b) and (c), but we can obviously use three different parameters.

We now come to the stopping criterion for the main algorithm in Step 2. We recall that, in general, finding theoretically sound stopping criteria for nondifferentiable problems is not an easy task and usually involves the calculation of (an approximation of) the ε-subdifferential of the functions involved. In our scheme we can easily bypass these difficulties using the fact that the sequence of smooth problems we (approximately) solve progressively approximates the nonsmooth problem (P_0). The test at Step 2 requires that the quantities (μ_k)², ε_k and d(w^k|F_0) be smaller than T, which should be thought of as a small positive quantity. Requiring that d(w^k|F_0) be smaller than T obviously amounts to requiring "almost" feasibility. Instead, the reason for requiring that both (μ_k)² and ε_k be smaller than T is connected to the satisfaction of the KKT conditions (15) for Problem (P_0). In particular, thanks to (a), (c) and (17), we see that the smaller T is, the closer (15) is to being satisfied. We call the conditions expressed by the test at Step 2 the T-approximate KKT conditions for Problem (P). In particular, we see that if T = 0 and we set ε_k = 0 for every k, Algorithm AS reduces to Algorithm S. More generally, the following theorem can be proved.
Theorem 5.14 Suppose that the sequence of numbers ε_k tends to 0. Then the sequence {w^k} generated by Algorithm AS is contained in a compact set. Furthermore,

(a) if T > 0 then the algorithm terminates after a finite number of steps with a point satisfying the T-approximate KKT conditions for Problem (P);

(b) if T = 0 every limit point of the sequence {w^k} is a KKT point of Problem (P).

Proof. Let μ̃ = sup_k |μ_k| and ε̃ = sup_k ε_k. The sequence {w^k} belongs to the set D(μ̃) + {w : ‖w‖ ≤ ε̃}, which is obviously compact (see Theorem 3.7). We then pass to examine point (a). By the assumptions made, we know that both the sequences {μ_k} and {ε_k} tend to 0. So to prove this point it is sufficient to show that d(w^k|F_0) also goes to 0. To this end, let us indicate by w̄^k (one of) the closest points to w^k in the closed set F_{μ_k}. Since the sequence {w̄^k} belongs to the compact set D(μ̃), we can assume, without loss of generality, that it converges to a point w̄. By continuity of the constraints of the Problems (P_μ), we also have w̄ ∈ F_0. We can now write

    ‖w^k − w̄‖ ≤ ‖w^k − w̄^k‖ + ‖w̄^k − w̄‖,

which, taking into account that ‖w^k − w̄^k‖ is smaller than ε_k and the fact that w̄^k converges to w̄, shows that ‖w^k − w̄‖ converges to 0, and hence that the distance d(w^k|F_0) also goes to 0.

Let us examine point (b). If T = 0 the algorithm will generate an infinite sequence of points {w^k}. The assertion of the theorem can be proved in a way very similar to that used in the proof of Theorem 5.12, taking into account that {ε_k} goes to 0 and that, as shown in the first part of the proof, {w^k} can be assumed to converge, without loss of generality, to a point w̄ in the feasible set of Problem (P). □
6 Numerical Results

In this section we report some preliminary numerical results obtained with the approach described in the previous sections. Algorithm AS was coded in Fortran 77 and all the tests were run on a DEC 5000 workstation. After a few preliminary tests, the following choices were made and used on all the test problems. The perturbation parameter μ is initially set to 0.0001 and reduced by a factor of 100 at each outer iteration. All the nonlinear optimization problems which are required to be solved by the algorithm are solved by the routine E04UCF in the NAG library [30]. This routine implements a sequential quadratic programming algorithm; the default parameters were always used. In particular, this implies that the nonlinear optimization subproblems are solved exactly, i.e., that ε_k = 0 (up to machine precision). The starting point for subproblem (P_{μ_{k+1}}) is w^k. The stopping parameter T is set to 10^{−6}.
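Since E04UCF is proprietary, the following minimal sketch (ours; Python, with SciPy's SLSQP routine playing the role of the SQP solver, and with user-supplied callbacks f and H for the objective and the smoothed constraints) illustrates the outer loop of Algorithm AS with the parameter choices just described:

```python
import numpy as np
from scipy.optimize import minimize

def solve_mpec(f, H, w0, bounds, T=1e-6, mu0=1e-4, shrink=100.0, max_outer=20):
    """Outer loop of Algorithm AS (sketch). w = (x, y, z, lam) is the full
    variable; H(w, mu) returns the m + 2l smoothed equality constraints;
    bounds encodes x in X_ad (here assumed to be a box)."""
    w, mu = np.asarray(w0, dtype=float), mu0
    for _ in range(max_outer):
        res = minimize(f, w, method="SLSQP", bounds=bounds,
                       constraints=[{"type": "eq", "fun": lambda v: H(v, mu)}])
        w = res.x                   # warm start for the next subproblem
        # Stopping test of Step 2: mu_k^2 and the residual of the
        # nonsmooth system H_0 (i.e. mu = 0) must both fall below T.
        if max(mu ** 2, np.linalg.norm(H(w, 0.0))) <= T:
            break
        mu /= shrink                # mu reduced by a factor of 100
    return w
```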
| Problem | Start | n | m | l | K | f* |
|---|---|---|---|---|---|---|
| 1 | (a) | 1 | 6 | 4 | 2 | 3.207701 |
| 1 | (b) | 1 | 6 | 4 | 2 | 3.207701 |
| 2 | (a) | 1 | 6 | 4 | 1 | 3.449404 |
| 2 | (b) | 1 | 6 | 4 | 1 | 3.449404 |
| 3 | (a) | 1 | 6 | 4 | 1 | 4.604254 |
| 3 | (b) | 1 | 6 | 4 | 1 | 4.604254 |
| 4 | (a) | 1 | 6 | 4 | 1 | 6.592684 |
| 4 | (b) | 1 | 6 | 4 | 1 | 6.592684 |
| 5 | (a) | 2 | 2 | 2 | 2 | -1 |
| 6 | (a) | 1 | 1 | 1 | 1 | -3266.667 |
| 7 | (a) | 2 | 2 | 6 | 3 | 4.999375 |
| 8 (L=150, γ=1.0) | (a) | 1 | 4 | 8 | 1 | -343.3453 |
| 8 (L=150, γ=1.1) | (a) | 1 | 4 | 8 | 1 | -203.1551 |
| 8 (L=150, γ=1.3) | (a) | 1 | 4 | 8 | 1 | -68.13565 |
| 8 (L=150, γ=1.5) | (a) | 1 | 4 | 8 | 1 | -19.15407 |
| 8 (L=150, γ=1.7) | (a) | 1 | 4 | 8 | 1 | -3.161181 |
| 8 (L=50, γ=1.0) | (a) | 1 | 4 | 8 | 1 | -346.8932 |
| 8 (L=40, γ=1.1) | (a) | 1 | 4 | 8 | 1 | -224.0372 |
| 8 (L=30, γ=1.3) | (a) | 1 | 4 | 8 | 1 | -80.78597 |
| 8 (L=25, γ=1.5) | (a) | 1 | 4 | 8 | 1 | -22.83712 |
| 8 (L=20, γ=1.7) | (a) | 1 | 4 | 8 | 1 | -5.349136 |
| 9 | (a) | 2 | 2 | 2 | 1 | 0.1345863E-17 |
| 9 | (b) | 2 | 2 | 2 | 1 | 0.1469359E-22 |
| 9 | (c) | 2 | 2 | 2 | 1 | 0.1009552E-17 |
| 9 | (d) | 2 | 2 | 2 | 1 | 0.3635465E-26 |
| 9 | (e) | 2 | 2 | 2 | 1 | 0.4715810E-25 |
| 10 | (a) | 4 | 4 | 12 | 1 | -6600.000 |
| 11 | (a) | 2 | 6 | 4 | 1 | -12.67871 |

Table 1: Results for the test problems
We remark that we only want to show the viability of the approach proposed. An optimal choice of the parameters is outside the scope of this paper; however, it may be interesting to add that, at least in our limited experience, the behaviour of the algorithm does not seem to depend critically on the choices outlined above. We ran the algorithm on 11 test problems used in [1, 4, 31, 32]. A full analytical description of the test problems, along with the starting points, the optimal solutions found and bibliographical references to the sources of the problems, is given in the Appendix. In Table 1 we report, for each problem, the values of n, m and l (the "dimensions" of the problem), the number K of outer iterations needed to reach the prescribed accuracy and the optimal objective function value f* found. In some cases more than one starting point is used; this is indicated in the column Start, and the corresponding starting points are given in the Appendix. From the results reported in Table 1 we see that the algorithm is capable of solving all the problems considered with a limited amount of work. A comparison with the data reported in [1, 4, 31, 32] shows that our algorithm reaches at least the same accuracy in a number of outer iterations which is usually smaller (and never larger) than the number of nonsmooth function evaluations reported in [1, 4, 31, 32]. It should be remarked that in Problem 7, in some instances of Problem 9, in Problem 10 and in Problem 11, convergence occurred to a point different from the one reported in the sources. For Problem 10, our algorithm converges to the optimal solution given in [1], while the algorithm in [1] only gave approximate solutions.
| Problem | Start | f | ∇f | H | ∇H | KIN |
|---|---|---|---|---|---|---|
| 1 | (a) | 83 | 81 | 850 | 810 | 27 |
| 1 | (b) | 56 | 54 | 580 | 540 | 26 |
| 2 | (a) | 57 | 56 | 580 | 560 | 20 |
| 2 | (b) | 30 | 29 | 310 | 290 | 19 |
| 3 | (a) | 28 | 27 | 290 | 270 | 17 |
| 3 | (b) | 30 | 29 | 310 | 290 | 15 |
| 4 | (a) | 37 | 36 | 380 | 360 | 14 |
| 4 | (b) | 18 | 17 | 190 | 170 | 12 |
| 5 | (a) | 35 | 33 | 222 | 198 | 19 |
| 6 | (a) | 8 | 7 | 18 | 14 | 5 |
| 7 | (a) | 81 | 78 | 672 | 624 | 28 |
| 8 (L=150, γ=1.0) | (a) | 9 | 8 | 120 | 96 | 7 |
| 8 (L=150, γ=1.1) | (a) | 9 | 8 | 120 | 96 | 7 |
| 8 (L=150, γ=1.3) | (a) | 15 | 14 | 192 | 168 | 10 |
| 8 (L=150, γ=1.5) | (a) | 10 | 9 | 132 | 108 | 7 |
| 8 (L=150, γ=1.7) | (a) | 11 | 10 | 144 | 120 | 8 |
| 8 (L=50, γ=1.0) | (a) | 7 | 6 | 96 | 72 | 5 |
| 8 (L=40, γ=1.1) | (a) | 9 | 8 | 120 | 96 | 6 |
| 8 (L=30, γ=1.3) | (a) | 8 | 7 | 108 | 84 | 6 |
| 8 (L=25, γ=1.5) | (a) | 21 | 20 | 264 | 240 | 18 |
| 8 (L=20, γ=1.7) | (a) | 23 | 22 | 288 | 264 | 20 |
| 9 | (a) | 9 | 8 | 40 | 32 | 7 |
| 9 | (b) | 12 | 11 | 52 | 44 | 9 |
| 9 | (c) | 10 | 9 | 44 | 36 | 7 |
| 9 | (d) | 9 | 8 | 40 | 32 | 7 |
| 9 | (e) | 9 | 8 | 40 | 32 | 7 |
| 10 | (a) | 89 | 88 | 1440 | 1408 | 51 |
| 11 | (a) | 11 | 10 | 120 | 100 | 7 |

Table 2: More data on the test results

We note that each function evaluation for the method reported in [31] and [32] requires, among other things, the solution of a variational inequality problem and, very roughly speaking, this can be considered equivalent to the solution of a nonlinear optimization problem, i.e., to one outer iteration of our algorithm. However, using only the data reported in [31] and [32], it is not possible to compare the efficiency of our scheme with those of [31] and [32], which are based on bundle methods. To make future comparisons possible, we report in Table 2 some more data which give a more detailed picture of the computational effort required by our algorithm. More specifically, for each problem we report the total number of objective function evaluations (f), the total number of objective function gradient evaluations (∇f), and the cumulative number of evaluations of the components (H_μ)_i of H_μ (column H) and of their gradients (column ∇H). However, the evaluations of H_i = g_i(x, y) − z_i and of their gradients are not counted if g_i is linear (hence H_i is also linear). We remark that each evaluation of a component of H_μ is counted separately, and the same happens for their gradients. Note also that in all the problems considered the set X_ad is described by very simple constraints; we have not included the number of evaluations of these constraints, which gives a negligible contribution to the computational burden. Finally, under the heading KIN we have reported the total number of inner iterations required by the recursive quadratic programming solver. Each inner iteration corresponds, roughly speaking, to the solution of a convex quadratic program.

We remark that the data in Tables 1 and 2 give a fairly complete picture of the work needed by our algorithm; in our scheme there are no other significant time consuming tasks besides those described so far. With regard to the number of inner iterations needed, it may be of interest to note the following facts. At each outer iteration the Problems (P_{μ_k}) become closer and closer to the nondifferentiable problem (P), since μ_k goes to 0. It could then be expected that the resolution of the Problems (P_{μ_k}) becomes progressively more difficult. However, since the sequence of points w^k is converging to w*, the starting point employed in the solution of each subproblem is closer and closer to the solution of the subproblems themselves, thus compensating for the increasing difficulty of the problem. This is confirmed by the fact that the number of inner iterations required by the sequential quadratic programming algorithm to solve the nonlinear programs decreases at each outer iteration. The inner iteration counts are: 22, 5 for Problem 1 (a); 21, 5 for Problem 1 (b); 14, 5 for Problem 5; and 14, 8, 6 for Problem 7.
7 Conclusions

We introduced a new method for the solution of MPEC problems with strongly monotone variational inequalities as side constraints. The method is based on approximating a nonsmooth one-level minimization reformulation of the MPEC problem by a sequence of smooth, regular problems. We showed that every sequence of global (stationary) solutions so generated converges to global (stationary) solutions of the MPEC problem. The numerical results reported seem to indicate that the method introduced in this paper can be numerically efficient. The new approach seems quite different from existing proposals for the same class of problems, and even when specialized to bilevel programming problems it does not appear to be related to existing algorithms for this latter class of problems, see [38]. On the other hand, smoothing techniques have received renewed attention in recent years, especially in the area of complementarity problems [7, 8, 9, 10, 16], and have shown themselves to be a powerful tool in the solution of complex mathematical problems. We think that, besides its theoretical properties, the new algorithm has one distinct practical advantage: it only requires the use of standard software for the solution of smooth constrained optimization problems; no other complex operations are necessary. Very efficient and robust methods for the solution of smooth constrained optimization problems, along with many commercial implementations, are available nowadays, also for the large-scale case. Therefore, we hope that the new method can constitute a first step towards the development of simple, but efficient and practical, algorithms for the solution of real-world MPEC problems.
Appendix

In this appendix we report the analytical description of the test problems. For each problem we give the dimensions (n, m and l), the objective function f, the set X_ad, the function F defining the variational inequality, the constraints g defining the set C(x), the starting point(s) x^0 and the optimal solution(s) x* found by the algorithm. Note that both for the starting point and for the solution point we only report the x variables, since the y variables are then uniquely determined by x.
Problem 1. This problem is taken from [31]. The original problem is a bilevel program. Since the lower level program is convex, it is equivalent to its optimality conditions, which can be reformulated as a variational inequality problem as done in [31].

n = 1, m = 6, l = 4

f(x, y) = ½(y_1 − 3)² + ½(y_2 − 4)²

X_ad = [0, 10]

F(x, y) = ( (1 + 0.2x)y_1 − (3 + 1.333x) − 0.333y_3 + 2y_1y_4 − y_5
            (1 + 0.1x)y_2 − x + y_3 + 2y_2y_4 − y_6
            0.333y_1 − y_2 + 1 − 0.1x
            9 + 0.1x − y_1² − y_2²
            y_1
            y_2 )

g_1(x, y) = y_3,  g_2(x, y) = y_4,  g_3(x, y) = y_5,  g_4(x, y) = y_6

(a) x^0 = 0,  (b) x^0 = 10
(a) x* = 4.06041,  (b) x* = 4.06041
Problem 2. The data for this problem are the same as those for Problem 1 with the exception of the objective function, which is given by

f(x, y) = ½(y_1 − 3)² + ½(y_2 − 4)² + ½(y_3 − 1)².

Consequently, the optimal solutions found are also different:

(a) x* = 5.15361,  (b) x* = 5.15361.
Problem 3. The data for this problem are the same as those for Problem 1 with the exception of the objective function, which is given by

f(x, y) = ½(y_1 − 3)² + ½(y_2 − 4)² + 5y_4².

Consequently, the optimal solutions found are also different:

(a) x* = 2.38942,  (b) x* = 2.38942.
Problem 4. The data for this problem are the same as those for Problem 1 with the exception of the objective function, which is given by

f(x, y) = ½x² + ½(y_1 − 3)² + ½(y_2 − 4)² + ½(y_3 − 1)² + ½(y_4 − 1)².

Consequently, the optimal solutions found are also different:

(a) x* = 1.37313,  (b) x* = 1.37313.
Problem 5. This example was tested in [15, 32] and goes back to [12].

n = 2, m = 2, l = 2

f(x, y) = x_1² − 2x_1 + x_2² − 2x_2 + y_1² + y_2²

X_ad = [0, 2] × [0, 2]

F(x, y) = ( 2y_1 − 2x_1 , 2y_2 − 2x_2 )

g_1(x, y) = 0.25 − (y_1 − 1)²,  g_2(x, y) = 0.25 − (y_2 − 1)²

(a) x^0 = (0, 0)
(a) x* = (0.50005, 0.50005)
Problem 6. This problem is a Stackelberg leader-follower game described in [21] and was tested in [15, 32].

n = 1, m = 1, l = 1

f(x, y) = 0.5x² + 0.5xy − 95x

X_ad = [0, 200]

F(x, y) = 2y + 0.5x − 100

g_1(x, y) = y

(a) x^0 = 0
(a) x* = 93.33333
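As a concrete illustration of the whole scheme on this simplest test problem, the following self-contained script (ours; SciPy's SLSQP in place of E04UCF) solves the smoothed reformulation (P_μ) of Problem 6 for decreasing μ; the variable is w = (x, y, z, λ) and the computed x should approach 93.33333:

```python
import numpy as np
from scipy.optimize import minimize

def phi(a, b, mu):                     # smoothed complementarity function
    return np.sqrt((a - b) ** 2 + 4 * mu ** 2) - (a + b)

def H(w, mu):                          # smoothed KKT system of the inner VI
    x, y, z, lam = w
    return [2 * y + 0.5 * x - 100 - lam,   # F(x, y) - grad_y g(x, y) * lam
            y - z,                         # g(x, y) - z, with g(x, y) = y
            phi(lam, z, mu)]               # phi_mu(lam, z)

f = lambda w: 0.5 * w[0] ** 2 + 0.5 * w[0] * w[1] - 95 * w[0]
bounds = [(0, 200), (None, None), (None, None), (None, None)]

w = np.array([0.0, 50.0, 50.0, 0.0])   # x^0 = 0; y, z, lam from the inner VI
for mu in [1e-4, 1e-6, 1e-8]:
    res = minimize(f, w, method="SLSQP", bounds=bounds,
                   constraints=[{"type": "eq", "fun": lambda v: H(v, mu)}])
    w = res.x
print(w[0])                            # approximately 93.33333
```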
Problem 7. This example has been used in [22, 32]. It does not belong to the class of problems investigated in this paper, but by using an exterior penalty it can be cast into our format. We set the penalty parameter R = 100 as done in [32].

n = 2, m = 2, l = 6

f(x, y) = 2x_1 + 2x_2 − 3y_1 − 3y_2 − 60 + R[max{0, x_1 + x_2 + y_1 − 2y_2 − 40}]²

X_ad = [0, 50] × [0, 50]

F(x, y) = ( 2y_1 − 2x_1 + 40 , 2y_2 − 2x_2 + 40 )

g_1(x, y) = y_1 + 10,  g_2(x, y) = −y_1 + 20,  g_3(x, y) = y_2 + 10,  g_4(x, y) = −y_2 + 20,
g_5(x, y) = x_1 − 2y_1 − 10,  g_6(x, y) = x_2 − 2y_2 − 10

(a) x^0 = (50, 50)
(a) x* = (25.00125, 30.00000)
Problem 8. This is a Cournot-Nash equilibrium problem, which is fully described in [29, 32].

n = 1, m = 4, l = 8

f(x, y) = r_1(x) − x p(x + y_1 + y_2 + y_3 + y_4), where r_i : ℝ → ℝ (i = 1, 2, …, 5) is defined by

    r_i(v) = c_i v + (β_i/(β_i + 1)) K_i^{−1/β_i} v^{(1+β_i)/β_i},

p : ℝ → ℝ is defined by

    p(Q) = 5000^{1/γ} Q^{−1/γ},

c_i, β_i, K_i, i = 1, 2, …, 5, are the positive parameters given in Table 3, γ is a positive parameter, and Q = x + y_1 + y_2 + y_3 + y_4.

X_ad = [0, L], where L is a given parameter

F(x, y) = ( ∇r_2(y_1) − p(Q) − y_1∇p(Q) , …, ∇r_5(y_4) − p(Q) − y_4∇p(Q) )

g_1(x, y) = y_1,  g_2(x, y) = L − y_1,  g_3(x, y) = y_2,  g_4(x, y) = L − y_2,
g_5(x, y) = y_3,  g_6(x, y) = L − y_3,  g_7(x, y) = y_4,  g_8(x, y) = L − y_4

| i | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| c_i | 10 | 8 | 6 | 4 | 2 |
| K_i | 5 | 5 | 5 | 5 | 5 |
| β_i | 1.2 | 1.1 | 1.0 | 0.9 | 0.8 |

Table 3: Parameter specification for the Cournot-Nash equilibrium problem

We considered many instances of this problem, corresponding to different values of the parameters L and γ. We report below the starting point and the optimal solution found for each choice of the parameter values.

L = 150, γ = 1.0: (a) x^0 = 75, x* = 55.55129
L = 150, γ = 1.1: (a) x^0 = 75, x* = 42.53825
L = 150, γ = 1.3: (a) x^0 = 75, x* = 24.14506
L = 150, γ = 1.5: (a) x^0 = 75, x* = 12.37270
L = 150, γ = 1.7: (a) x^0 = 75, x* = 4.75356
L = 50, γ = 1.0: (a) x^0 = 25, x* = 50.00000
L = 40, γ = 1.1: (a) x^0 = 20, x* = 39.79144
L = 30, γ = 1.3: (a) x^0 = 15, x* = 24.25713
L = 25, γ = 1.5: (a) x^0 = 12.5, x* = 13.01966
L = 20, γ = 1.7: (a) x^0 = 10, x* = 6.00234
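To clarify how these data assemble into the pair (f, F), here is a small sketch (ours; the parameter symbols β_i and γ follow the reconstruction used above):

```python
import numpy as np

c    = np.array([10.0, 8.0, 6.0, 4.0, 2.0])   # c_1, ..., c_5 from Table 3
K    = np.full(5, 5.0)                        # K_i = 5 for all i
beta = np.array([1.2, 1.1, 1.0, 0.9, 0.8])    # beta_1, ..., beta_5

def p(Q, gamma):        # inverse demand: p(Q) = 5000^(1/gamma) Q^(-1/gamma)
    return 5000.0 ** (1.0 / gamma) * Q ** (-1.0 / gamma)

def dp(Q, gamma):       # derivative p'(Q)
    return -(1.0 / gamma) * 5000.0 ** (1.0 / gamma) * Q ** (-1.0 / gamma - 1.0)

def dr(v, i):           # derivative r_i'(v) = c_i + K_i^(-1/beta_i) v^(1/beta_i)
    return c[i] + K[i] ** (-1.0 / beta[i]) * v ** (1.0 / beta[i])

def f(x, y, gamma):     # leader's objective: r_1(x) - x p(Q)
    Q = x + y.sum()
    r1 = c[0] * x + beta[0] / (beta[0] + 1) * K[0] ** (-1 / beta[0]) \
         * x ** ((1 + beta[0]) / beta[0])
    return r1 - x * p(Q, gamma)

def F(x, y, gamma):     # VI map: marginal conditions of followers 2, ..., 5
    Q = x + y.sum()
    return np.array([dr(y[i], i + 1) - p(Q, gamma) - y[i] * dp(Q, gamma)
                     for i in range(4)])
```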
Problem 9. This is a monotone quasi-variational inequality problem resulting from generalized Nash equilibria, see [17, 32].

n = 2, m = 2, l = 2

f(x, y) = ½((x_1 − y_1)² + (x_2 − y_2)²)

X_ad = [0, 10] × [0, 10]

F(x, y) = ( −34 + 2y_1 + (8/3)y_2 , −24.25 + 1.25y_1 + 2y_2 )

g_1(x, y) = −x_2 − y_1 + 15,  g_2(x, y) = −x_1 − y_2 + 15

(a) x^0 = (0, 0),  (b) x^0 = (5, 5),  (c) x^0 = (10, 10),  (d) x^0 = (10, 0),  (e) x^0 = (0, 10)
(a)-(e) x* = (5, 9)
Problem 10. This is a bilevel programming problem tested in [1, 4]. The lower level program is a convex problem and, therefore, it is equivalent to a variational inequality problem.

n = 4, m = 4, l = 12

f(x, y) = −(200 − y_1 − y_3)(y_1 + y_3) − (160 − y_2 − y_4)(y_2 + y_4)

X_ad = ([0, 10] × [0, 5] × [0, 15] × [0, 20]) ∩ {(x_1, x_2, x_3, x_4) : x_1 + x_2 + x_3 + x_4 ≤ 40}

F(x, y) = ( y_1 − 4 , y_2 − 13 , y_3 − 35 , y_4 − 2 )

g_1(x, y) = x_1 − 0.4y_1 − 0.7y_2,  g_2(x, y) = x_2 − 0.6y_1 − 0.3y_2,
g_3(x, y) = y_1,  g_4(x, y) = −y_1 + 20,  g_5(x, y) = y_2,  g_6(x, y) = −y_2 + 20,
g_7(x, y) = x_3 − 0.4y_3 − 0.7y_4,  g_8(x, y) = x_4 − 0.6y_3 − 0.3y_4,
g_9(x, y) = y_3,  g_10(x, y) = −y_3 + 40,  g_11(x, y) = y_4,  g_12(x, y) = −y_4 + 40

(a) x^0 = (5, 5, 15, 15)
(a) x* = (7.00000, 3.00000, 12.00000, 18.00000)
Problem 11. This is also a bilevel programming problem, tested in [4]. Analogously to the previous problem, the optimality conditions of the lower level program were rewritten as a variational inequality problem.

n = 2, m = 6, l = 4

f(x, y) = −x_1² − 3x_2 − 4y_1 + y_2²

X_ad = {(x_1, x_2) : x_1² + 2x_2 ≤ 4, x_1 ≥ 0, x_2 ≥ 0}

F(x, y) = ( 2y_1 + 2y_3 − 3y_4 − y_5
            −5 − y_3 + 4y_4 − y_6
            x_1² − 2x_1 + x_2² − 2y_1 + y_2 + 3
            x_2 + 3y_1 − 4y_2 − 4
            y_1
            y_2 )

g_1(x, y) = y_3,  g_2(x, y) = y_4,  g_3(x, y) = y_5,  g_4(x, y) = y_6

(a) x^0 = (0, 2)
(a) x* = (0.00000, 2.00000)
References

[1] E. Aiyoshi and K. Shimizu, Hierarchical decentralized systems and its new solution by a barrier method, IEEE Transactions on Systems, Man, and Cybernetics SMC-11 (1981) 444-449.

[2] G. Anandalingam and T.L. Friesz, eds., Hierarchical Optimization, Annals of Operations Research, 1992.

[3] J.F. Bard, An algorithm for solving the general bilevel program, Mathematics of Operations Research 8 (1983) 260-272.

[4] J.F. Bard, Convex two-level optimization, Mathematical Programming 40 (1988) 15-27.

[5] J.F. Bard and J.E. Falk, An explicit solution to the multi-level programming problem, Computers and Operations Research 9 (1982) 77-100.

[6] W.F. Bialas and M.H. Karwan, On two-level optimization, IEEE Transactions on Automatic Control AC-27 (1982) 211-214.

[7] S.C. Billups, S.P. Dirkse and M.C. Ferris, A comparison of algorithms for large scale mixed complementarity problems, UCD/CCM Report 67, Center for Computational Mathematics, University of Colorado at Denver, Denver, 1995.

[8] B. Chen and P.T. Harker, A class of smooth approximations to nonlinear complementarity problems, Preprint, Department of Management and Systems, Washington State University, Pullman, 1995.

[9] C. Chen and O.L. Mangasarian, Smoothing methods for convex inequalities and linear complementarity problems, Mathematical Programming 71 (1995) 51-69.

[10] C. Chen and O.L. Mangasarian, A class of smoothing functions for nonlinear and mixed complementarity problems, Computational Optimization and Applications 5 (1996) 97-138.

[11] F.H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983.

[12] A.H. DeSilva, Sensitivity formulas for nonlinear factorable programming and their application to the solution of an implicitly defined optimization model of US crude oil production, Dissertation, George Washington University, Washington, D.C., 1978.

[13] T.A. Edmunds and J.F. Bard, Algorithms for nonlinear bilevel mathematical programs, IEEE Transactions on Systems, Man and Cybernetics 21 (1991) 83-89.

[14] J.E. Falk and J. Liu, On bilevel programming, Part I: general nonlinear cases, Mathematical Programming 70 (1995) 47-72.

[15] T.L. Friesz, R.L. Tobin, H.-J. Cho and N.J. Mehta, Sensitivity analysis based heuristic algorithms for mathematical programs with variational inequality constraints, Mathematical Programming 48 (1990) 265-284.
[16] S.A. Gabriel and J.J. Moré, Smoothing of mixed complementarity problems, Preprint MCS-P541-0995, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, 1995.

[17] P.T. Harker, Generalized Nash games and quasi-variational inequalities, European Journal of Operational Research 54 (1991) 81-94.

[18] P.T. Harker and S.C. Choi, A penalty function approach for mathematical programs with variational inequality constraints, WP 87-08-08, University of Pennsylvania, Pennsylvania, 1987.

[19] P.T. Harker and J.-S. Pang, Finite-dimensional variational inequality and nonlinear complementarity problems: A survey of theory, algorithms and applications, Mathematical Programming 48 (1990) 161-220.

[20] J. Haslinger and P. Neittaanmäki, Finite Element Approximation for Optimal Shape Design: Theory and Applications, Wiley, Chichester, 1988.

[21] J.M. Henderson and R.E. Quandt, Microeconomic Theory, 3rd ed., McGraw-Hill, New York, 1980.

[22] Y. Ishizuka and E. Aiyoshi, Double penalty method for bilevel optimization problems, in: G. Anandalingam and T.L. Friesz, eds., Hierarchical Optimization, Annals of Operations Research 34 (1992) 73-88.

[23] C. Kanzow, Some noninterior continuation methods for linear complementarity problems, SIAM Journal on Matrix Analysis and Applications, to appear.

[24] C. Kanzow and H. Jiang, A continuation method for (strongly) monotone variational inequalities, Applied Mathematics Report AMR 94/32, School of Mathematics, University of New South Wales, Sydney, Australia, October 1994 (revised January 1996).

[25] P. Loridan and J. Morgan, Regularizations for two-level optimization problems, in: Advances in Optimization, Springer-Verlag (1992) 239-255.

[26] Z.-Q. Luo, J.-S. Pang and D. Ralph, Mathematical Programs with Equilibrium Constraints, Cambridge University Press, to appear.

[27] Z.-Q. Luo, J.-S. Pang, D. Ralph and S.-Q. Wu, Exact penalization and stationarity conditions of mathematical programs with equilibrium constraints, Mathematical Programming, to appear.

[28] P. Marcotte, Network design problem with congestion effects: A case of bilevel programming, Mathematical Programming 34 (1986) 142-162.

[29] F.H. Murphy, H.D. Sherali and A.L. Soyster, A mathematical programming approach for determining oligopolistic market equilibrium, Mathematical Programming 24 (1982) 92-106.

[30] NAG Fortran Library: Mark 16, NAG Ltd., Oxford, 1993.

[31] J.V. Outrata, On optimization problems with variational inequality constraints, SIAM Journal on Optimization 4 (1994) 340-357.

[32] J.V. Outrata and J. Zowe, A numerical approach to optimization problems with variational inequality constraints, Mathematical Programming 68 (1995) 105-130.

[33] L. Qi, Trust region algorithms for solving nonsmooth equations, SIAM Journal on Optimization 5 (1995) 219-230.
[34] L. Qi and X. Chen, A globally convergent successive approximation method for severely nonsmooth equations, SIAM Journal on Control and Optimization 33 (1995) 402-418.

[35] D. Ralph, Sequential quadratic programming for mathematical programs with linear complementarity constraints, in: R.L. May and A.K. Easton, eds., CTAC95 Computational Techniques and Applications, World Scientific, 1996.

[36] C. Suwansirikul, T.L. Friesz and R.L. Tobin, Equilibrium decomposed optimization: A heuristic for the continuous equilibrium network design problem, Transportation Science 21 (1987) 254-263.

[37] R.L. Tobin, Uniqueness results and algorithms for Stackelberg-Cournot-Nash equilibria, in: G. Anandalingam and T.L. Friesz, eds., Hierarchical Optimization, Annals of Operations Research 34 (1992) 21-36.

[38] L.N. Vicente and P. Calamai, Bilevel and multilevel programming: A bibliography review, Journal of Global Optimization 5 (1994) 291-306.