Convergence of a Recurrent Neural Network for Nonconvex Optimization Based on an Augmented Lagrangian Function

Xiaolin Hu and Jun Wang
Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, China
{xlhu, jwang}@mae.cuhk.edu.hk
Abstract. In the paper, a recurrent neural network based on an augmented Lagrangian function is proposed for seeking local minima of nonconvex optimization problems with inequality constraints. First, each equilibrium point of the neural network corresponds to a Karush-Kuhn-Tucker (KKT) point of the problem. Second, by appropriately choosing a control parameter, the neural network is asymptotically stable at those local minima satisfying some mild conditions. The latter property of the neural network is ensured by the convexification capability of the augmented Lagrangian function. The proposed scheme is inspired by many existing neural networks in the literature and can be regarded as an extension or improved version of them. A simulation example is discussed to illustrate the results.
1 Introduction
In the past two decades, recurrent neural networks for solving optimization problems have attracted much attention since the early work of Hopfield and Tank [1], [2]. The theory, methodology, and applications of these neural networks have been widely investigated, and many elegant neural networks have been proposed for solving various optimization and related problems. For example, Rodríguez-Vázquez et al. proposed the switched-capacitor neural network for nonlinear programming [3]; Wang proposed the deterministic annealing neural network [4,5] for convex optimization; Bouzerdoum and Pattison invented a neural network for quadratic programming with bound constraints [6]; Liang and Si studied a neural network for solving linear variational inequalities [7]; Xia and Wang et al. developed several neural networks for solving optimization problems and variational inequalities [8,9,10,11,12,13,14]. However, most of these neural networks can solve convex programs only. In contrast, little progress has been made on nonconvex optimization in the neural network community. This is mainly due to the difficulty in characterizing the global optimality of nonconvex optimization problems by means of explicit equations. From the optimization literature, it is known that under fairly mild conditions an optimum of the problem must be a Karush-Kuhn-Tucker (KKT) point, while the KKT points are easier to characterize. In terms of developing neural networks for global optimization, it seems far-reaching to find global optima at the very beginning; a more attainable goal at present is to design neural networks that seek local optima with the aid of the KKT conditions. In [15] a recurrent neural network based on an augmented Lagrangian function was proposed for solving nonlinear optimization problems with equality constraints. The neural network was shown to be (locally) asymptotically stable at KKT points that correspond to local optima under mild conditions. Recently, a neural network based on another augmented Lagrangian function was proposed in [16] for seeking local optima of nonconvex optimization problems with inequality constraints, and it was also proved asymptotically stable at local optima. Unfortunately, the equilibrium set of that neural network does not coincide with the KKT point set; in other words, the neural network may converge to some non-KKT points. In this paper, a new recurrent neural network is proposed for solving inequality-constrained optimization problems, based on an augmented Lagrangian function similar to that in [16]. It will be shown that each equilibrium corresponds to a KKT point, and that the neural network is locally convergent to local optima under some mild conditions.

D. Liu et al. (Eds.): ISNN 2007, Part III, LNCS 4493, pp. 194–203, 2007. © Springer-Verlag Berlin Heidelberg 2007
2 Problem Formulation and Preliminaries
Consider the following constrained optimization problem:

    min  f(x)
    s.t. g(x) ≤ 0,                                            (1)

where f : Rn → R and g(x) = [g1(x), ..., gm(x)]T is an m-dimensional vector-valued function of n variables. In the paper, the functions f, g1, ..., gm are assumed to be twice differentiable. If all functions f(x) and gj(x) are convex over Rn, the problem is called a convex optimization problem; otherwise, it is called a nonconvex optimization problem. Equation (1) represents a wide variety of optimization problems. For example, it is well known that if a problem has an equality constraint h(x) = 0, then this constraint can be expressed as h(x) ≤ 0 and −h(x) ≤ 0.

Throughout the paper, the following notations are used. Rn+ stands for the nonnegative quadrant of the n-dimensional real space Rn. I = {1, ..., n}, J = {1, ..., m}. If u ∈ Rn, then u+ = (u1+, u2+, ..., un+)T where ui+ = max(ui, 0); u² = (u1², ..., un²)T; Γ(u) = diag(u1, ..., un). A square matrix A > 0 (A ≥ 0) means that A is positive definite (positive semidefinite).

Definition 1. A solution x satisfying the constraints in (1) is called a feasible solution. A feasible solution x is said to be a regular point if the gradients ∇gj(x), ∀j ∈ {j ∈ J | gj(x) = 0}, are linearly independent.

Definition 2. A point x∗ is said to be a strict minimum of the problem in (1) if f(x∗) < f(x), ∀x ∈ K(x∗; ε) ∩ S, where K(x∗; ε) is a neighborhood of x∗ with radius ε > 0 and S is the feasible region of the problem.
It is well known [17] that if x is a local minimum as well as a regular point of problem (1), there exists a unique vector y ∈ Rm such that the following Karush-Kuhn-Tucker (KKT) conditions hold:

    ∇f(x) + ∇g(x)y = 0,    y ≥ 0,    g(x) ≤ 0,    yT g(x) = 0,

where ∇g(x) = (∇g1(x), ..., ∇gm(x)). The above KKT conditions can be equivalently put into the following projection formulation:

    ∇f(x) + ∇g(x)y = 0,
    (y + αg(x))+ = y,                                         (2)

where α > 0. In the sequel, we denote the KKT point set of (1), or equivalently the solution set of (2), by Ω∗. Define the Lagrangian function associated with problem (1):

    L(x, y) = f(x) + Σ_{j=1}^m yj gj(x).                      (3)
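As a quick numerical illustration (my own, not from the paper), the projection formulation (2) can be checked on a one-dimensional toy problem, min (x − 2)² s.t. x − 1 ≤ 0, whose unique KKT pair is (x, y) = (1, 2); the problem and function names below are illustrative assumptions:

```python
# Residual of the projection form (2) of the KKT conditions for the toy problem
#   min (x - 2)^2  s.t.  g(x) = x - 1 <= 0     (KKT pair: x* = 1, y* = 2)
def kkt_residual(x, y, alpha=1.0):
    grad_f = 2.0 * (x - 2.0)          # f'(x)
    g = x - 1.0                       # g(x)
    grad_g = 1.0                      # g'(x)
    r1 = grad_f + grad_g * y                  # first equation of (2)
    r2 = max(y + alpha * g, 0.0) - y          # (y + alpha*g(x))+ - y
    return abs(r1) + abs(r2)

print(kkt_residual(1.0, 2.0))   # ~0: (1, 2) solves (2)
print(kkt_residual(1.5, 0.0))   # nonzero away from the KKT point
```

The residual vanishes exactly at the KKT pair and is strictly positive elsewhere, mirroring the claim that (2) characterizes Ω∗.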
Lemma 1 (Second-order sufficiency conditions [17]). Suppose that x∗ is a feasible and regular point of problem (1). If there exists y∗ ∈ Rm such that (x∗, y∗) is a KKT point pair and the Hessian matrix ∇2xx L(x∗, y∗) is positive definite on the tangent subspace M(x∗) = {d ∈ Rn | dT ∇gj(x∗) = 0, d ≠ 0, ∀j ∈ J(x∗)}, where J(x∗) is defined by J(x∗) = {j ∈ J | yj∗ > 0}, then x∗ is a strict minimum of problem (1).

Now consider the following augmented Lagrangian function associated with problem (1), which differs slightly from the one considered in [16]:

    Lc(x, y) = L(x, y) + (c/2) Σ_{j=1}^m (yj gj(x))²,         (4)

where L(x, y) is the regular Lagrangian function defined in (3) and c > 0 is a scalar. Let Ωe denote the solution set of the following equations:

    ∇x Lc(x, y) = 0,
    (y + αg(x))+ = y,                                         (5)

where α > 0. We have the following theorem.

Theorem 1. Ω∗ = Ωe.

Proof. It suffices to prove that under the condition

    (y + αg(x))+ = y                                          (6)
the first equation in (2) and the first equation in (5) are identical. Equation (6) gives yj gj(x) = 0, ∀j ∈ J. Then, by writing the first equation in (5) as

    ∇x L(x, y) + c Σ_{j∈J} yj² gj(x) ∇gj(x) = 0,

we can readily see that it is identical to ∇x L(x, y) = 0. The proof is completed.
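The mechanism behind Theorem 1 can be checked numerically: wherever (6) holds, each yj gj(x) vanishes, so the augmentation term in ∇x Lc drops out and ∇x Lc = ∇x L. A tiny sketch on the assumed toy constraint g(x) = x − 1 (not from the paper):

```python
# Theorem 1 rests on: (y + alpha*g(x))+ = y forces y*g(x) = 0, which kills the
# augmentation term c*y^2*g(x)*g'(x) in grad_x Lc. Toy check with f = (x-2)^2,
# g(x) = x - 1 (illustrative example, not from the paper).
def grad_L(x, y):               # grad_x L = f'(x) + y*g'(x)
    return 2.0 * (x - 2.0) + y

def grad_Lc(x, y, c=10.0):      # grad_x Lc = grad_x L + c*y^2*g(x)*g'(x)
    return grad_L(x, y) + c * y**2 * (x - 1.0)

# (x, y) = (1, 2) satisfies (6) since g(1) = 0 (active constraint):
print(grad_L(1.0, 2.0) == grad_Lc(1.0, 2.0))   # True
# y = 0 with g(x) < 0 also satisfies (6) (inactive constraint):
print(grad_L(0.5, 0.0) == grad_Lc(0.5, 0.0))   # True
```

Away from the set defined by (6), the two gradients generally differ, which is exactly why the convexifying term can reshape the landscape without moving the equilibria.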
3 Neural Network Model and Analysis
Consider a recurrent neural network with its dynamic behavior governed by

    d [ x ]   [ −α(∇f(x) + ∇g(x)y + c∇g(x)Γ(y²)g(x)) ]
    ──[   ] = [                                       ],      (7)
    dt[ y ]   [ −y + (y + αg(x))+                     ]

where α > 0 and c > 0 are two parameters. Note that the first term on the right-hand side is the expansion of ∇x Lc(x, y). Therefore the equilibrium set of the neural network is actually Ωe, which is equal to Ω∗ as claimed in Theorem 1.

Lemma 2 ([18], p. 68). Let P ∈ Rn×n be symmetric and Q ∈ Rn×n be symmetric and positive semidefinite. Assume that xT P x > 0 for all x ≠ 0 satisfying xT Qx = 0. Then there exists a scalar c such that P + cQ > 0.

Lemma 3. Let u = (xT, yT)T and let u∗ = ((x∗)T, (y∗)T)T be a KKT point satisfying the second-order sufficiency conditions in Lemma 1. There exists c > 0 such that ∇2xx Lc(u) > 0 and ∇F(u) ≥ 0 on N(u∗), where

    F(u) = [ ∇x Lc(x, y) ]
           [ −g(x)       ]

and N(u∗) ⊆ Rn+m is a set containing u∗ as one of its interior points.

Proof. A direct reasoning gives

    ∇F(u) + ∇F(u)T = [ 2∇2xx Lc(u)             2c∇g(x)Γ(y)Γ(g(x)) ]
                     [ 2cΓ(g(x))Γ(y)∇g(x)T     0                  ]

and

    ∇2xx Lc(u) = ∇2xx L(u) + c Σ_{j∈J} [yj² ∇gj(x)∇gj(x)T + yj² gj(x)∇2gj(x)].

Since u∗ is a KKT point, we have

    ∇F(u∗) + ∇F(u∗)T = [ 2∇2xx Lc(u∗)   0 ]
                       [ 0              0 ]

and

    ∇2xx Lc(u∗) = ∇2xx L(u∗) + c Σ_{j∈J} (yj∗)² ∇gj(x∗)∇gj(x∗)T.
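To make the dynamics in (7) concrete, here is a minimal forward-Euler sketch (my own illustration, not code from the paper) applied to the assumed toy problem min (x − 2)² s.t. x − 1 ≤ 0, for which the network should settle at the KKT pair (1, 2); α = 1, c = 1, and the step size are arbitrary choices:

```python
# Forward-Euler integration of the network (7) for
#   min (x - 2)^2  s.t.  g(x) = x - 1 <= 0     (KKT pair: x* = 1, y* = 2).
# One variable and one constraint, so Gamma(y^2)g(x) reduces to y^2 * g(x).
def simulate(x, y, alpha=1.0, c=1.0, h=1e-3, steps=60000):
    for _ in range(steps):
        g = x - 1.0
        grad_x_Lc = 2.0 * (x - 2.0) + y + c * y**2 * g   # expansion of grad_x Lc
        dx = -alpha * grad_x_Lc                          # first row of (7)
        dy = -y + max(y + alpha * g, 0.0)                # second row of (7)
        x, y = x + h * dx, y + h * dy
    return x, y

x, y = simulate(0.0, 0.0)
print(x, y)   # should approach (1.0, 2.0)
```

The state first descends on f, hits the constraint, and the multiplier y then grows until the stationarity condition balances, as the equilibrium analysis above predicts.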
Because u∗ satisfies the second-order sufficiency conditions, there exists at least one j ∈ J such that yj∗ > 0. By Lemma 2, there exists c > 0 such that ∇2xx Lc(u∗) > 0. Let λj(u), j ∈ I, denote the eigenvalues of the matrix function ∇2xx Lc(u). Each λj(u) is a continuous function of x and y satisfying λj(u∗) > 0. Therefore, there exists a neighborhood of u∗, denoted by N̄(u∗), on which ∇2xx Lc(u) > 0; moreover, u∗ is an interior point of this neighborhood. Similarly, we can prove that there exists N(u∗) ⊆ N̄(u∗), on which ∇F(u) ≥ 0, where u∗ is an interior point of N(u∗). The proof is completed.

Lemma 4. For any initial point (x(t0)T, y(t0)T)T ∈ Rn+m there exists a unique continuous solution (x(t)T, y(t)T)T of (7). Moreover, y(t) ≥ 0 if y(t0) ≥ 0.

Proof. Similar to the analysis of Lemma 3 in [12].

Theorem 2. Let u∗ = ((x∗)T, (y∗)T)T be a KKT point of the problem in (1) satisfying the second-order sufficiency conditions in Lemma 1. There exists c > 0 such that the neural network in (7) is asymptotically stable at u∗, where x∗ is a strict local minimum of the problem.

Proof. Choose y(t0) ≥ 0; then from Lemma 4, y(t) ≥ 0, ∀t ≥ t0. Consider the equivalent form of (7)

    du/dt = −u + PΩ(u − αF(u)),

where F(u) is defined in Lemma 3, Ω = Rn × Rm+, and PΩ(·) is a projection operator defined as PΩ(u) = arg min_{v∈Ω} ‖u − v‖. Define the Lyapunov function
    V(u) = −F(u)T r(u) − (1/(2α))‖r(u)‖² + (1/(2α))‖u − u∗‖²,

where r(u) = PΩ(u − αF(u)) − u and u∗ is a KKT point satisfying the second-order sufficiency conditions. Lemma 1 indicates that x∗ is a strict local minimum of the problem. As y∗ is uniquely determined by x∗, there exists ε > 0 such that u∗ is the unique KKT point in D(u∗; ε) = {u ∈ Rn+m | ‖u − u∗‖ < ε}. Similar to the proof of [12, Theorem 1], we can obtain

    V(u) ≥ (1/(2α))‖u − u∗‖²,    ∀u ∈ Rn+m,

and

    dV(u(t))/dt ≤ −F(u)T(u − u∗) − r(u)T ∇F(u) r(u)
               ≤ −(F(u) − F(u∗))T(u − u∗) − r(u)T ∇F(u) r(u),    ∀t ≥ t0.
Select a convex subset Ωc ⊆ N(u∗) that contains u∗ as its unique KKT point, where N(u∗) is defined in Lemma 3. By Lemma 3 there exists c > 0 such that ∇F(u) ≥ 0 on Ωc, which implies that F(u) is monotone on Ωc. Then dV(u)/dt ≤ 0 on Ωc, and the neural network in (7) is stable at u∗.
According to the Lyapunov theorem [19], to prove the asymptotic stability of the neural network it is only needed to show that dV(u)/dt = 0 if and only if u = u∗ in Ωc, or equivalently that dV(u)/dt = 0 if and only if du/dt = 0, since in Ωc, du/dt = 0 is equivalent to u = u∗. Consider a point u ∈ Ωc. Clearly, du/dt = 0 implies dV/dt = 0. Conversely, let u be a solution of dV(u)/dt = 0. It follows that

    r(u)T ∇F(u) r(u) = α² ∇x Lc(x, y)T ∇2xx Lc(x, y) ∇x Lc(x, y) = 0,

and dx/dt = −α∇x Lc(x, y) = 0 since ∇2xx Lc(x, y) > 0 on Ωc. In view of ∇F(u) ≥ 0 on Ωc, we have

    dV/dt ≤ −(F(u∗) − F(u))T(u∗ − u) = −∫₀¹ (x∗ − x)T ∇2xx Lc(xs, ys)(x∗ − x) ds,

where xs = x + s(x∗ − x) and ys = y + s(y∗ − y) with 0 ≤ s ≤ 1. Because Ωc is convex, (xsT, ysT)T is in Ωc, and ∇2xx Lc(xs, ys) > 0 for 0 ≤ s ≤ 1. Then dV(u)/dt = 0 implies x = x∗. With this implication we can deduce dV(u)/dt = 0 ⇒ F(u)T(u − u∗) = 0 ⇒ g(x)T(y − y∗) = 0 ⇒ g(x)T y = 0. Considering g(x) = g(x∗) ≤ 0 and y ≥ 0, we have dy/dt = (y + αg(x))+ − y = 0. Thus, u is a solution of du/dt = 0. In summary, dV(u)/dt = 0 if and only if du/dt = 0. Hence, the neural network is asymptotically stable at u∗. The proof is completed.
4 Comparisons with Other Neural Networks
In 1992, Zhang and Constantinides proposed a neural network based on an augmented Lagrangian function for seeking local minima of the following equality-constrained optimization problem [15]:

    min  f(x)
    s.t. h(x) = 0,

where f : Rn → R, h : Rn → Rm, and both f and h are assumed twice differentiable. The dynamic equation of the network is

    d [ x ]   [ −(∇f(x) + ∇h(x)y + c∇h(x)h(x)) ]
    ──[   ] = [                                 ],            (8)
    dt[ y ]   [ h(x)                            ]

where c > 0 is a control parameter. Under the second-order sufficiency conditions, the neural network can be shown to converge to local minima with an appropriate choice of c. The disadvantage of the neural network lies in that it handles equality constraints only. Though in theory inequality constraints can be converted to equality constraints by introducing slack variables, the dimension of the neural network will inevitably increase, which is usually undesirable in terms of model complexity. In this sense, the neural network proposed in the present paper can be regarded as an extension of the Lagrange network.
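The behavior of the Lagrange network (8) can be sketched on an assumed toy equality-constrained problem (my own example, not from [15]), min (x1² + x2²)/2 s.t. x1 + x2 − 2 = 0, whose optimum is x∗ = (1, 1) with multiplier y∗ = −1:

```python
# Forward-Euler sketch of the Lagrange network (8) on a toy problem:
#   min (x1^2 + x2^2)/2   s.t.  h(x) = x1 + x2 - 2 = 0
# Optimum: x* = (1, 1) with multiplier y* = -1.
def simulate_eq(x1, x2, y, c=1.0, h=1e-3, steps=30000):
    for _ in range(steps):
        hval = x1 + x2 - 2.0
        # dx/dt = -(grad f + grad h * y + c * grad h * h);  grad f = x, grad h = (1, 1)
        dx1 = -(x1 + y + c * hval)
        dx2 = -(x2 + y + c * hval)
        dy = hval                      # dy/dt = h(x)
        x1, x2, y = x1 + h * dx1, x2 + h * dx2, y + h * dy
    return x1, x2, y

print(simulate_eq(3.0, -1.0, 0.0))   # approaches (1.0, 1.0, -1.0)
```

With c = 0 this reduces to the classical (often oscillatory) Arrow-Hurwicz-type dynamics; the c∇h(x)h(x) term adds damping, which is the convexification effect the section describes.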
An alternative extension of the neural network in [15], which handles the inequality constraints in (1) directly, can be found in [16]; its dynamic system is

    d [ x ]   [ −(∇f(x) + ∇g(x)y² + c∇g(x)Γ(y²)g(x)) ]
    ──[   ] = [                                       ].      (9)
    dt[ y ]   [ 2Γ(y)g(x)                             ]

The local convergence of the neural network to its equilibrium set, denoted by Ω̂e, was proved by using linearization techniques, and moreover Ω∗ ⊂ Ω̂e. However, it is clear that Ω̂e ≠ Ω∗. For example, any critical point x of the objective function, which makes ∇f(x) = 0, together with y = 0 constitutes an equilibrium point of (9), but only in rare cases does such an equilibrium correspond to a KKT point.

If we set the control parameter c = 0 in (7), the neural network becomes

    d [ x ]   [ −α(∇f(x) + ∇g(x)y) ]
    ──[   ] = [                     ],                        (10)
    dt[ y ]   [ −y + (y + αg(x))+   ]

which is a special case of the neural network proposed in [12] for solving the convex version of (1). Clearly, its equilibrium set coincides with Ω∗. However, it will be seen in the next section that this neural network cannot be guaranteed to converge locally to local minima of nonconvex optimization problems. A scheme that takes advantage of the neural network in (10) for seeking local minima of nonconvex problems is found in [20]. The idea is to transform the problem in (1) into the following equivalent one, under some appropriate assumptions:

    min  f(x)
    s.t. (g(x) + b)^p ≤ b^p,                                  (11)

where b ≥ 0 is a vector and p ≥ 1 is a scalar, and then use the neural network in (10) to solve the new problem. Some local convergence capability can be ensured by selecting a sufficiently large p. The weak point of this approach is that large values of p introduce strong nonlinearity into the system and cause numerical problems.
5 Illustrative Example
Consider the following optimization problem:

    min  f(x) = 5 − (x1² + x2²)/2
    s.t. g1(x) = −x1² + x2 − 1 ≤ 0,
         g2(x) = x1⁴ − x2 ≤ 0.

As both f(x) and g1(x) are concave, the problem is a nonconvex optimization problem. Fig. 1 shows the contour of the objective function and the solutions to g1(x) = 0 and g2(x) = 0 on the x1-x2 plane. The feasible region is the nonconvex area enclosed by the bold curves. Simple calculations yield

    ∇2xx L(x, y) = [ −2y1 + 12x1²y2 − 1    0  ]
                   [ 0                    −1 ].
Fig. 1. Contour of the objective function and the feasible region
Fig. 2. Transient behavior of the neural network in (7) with different values of c: (a) c = 0, (b) c = 0.1, (c) c = 0.2, (d) c = 0.5
Evidently, ∇2xx L(x, y) is not positive definite over the entire real space, and the neural network in (10) cannot be applied to solve the problem. Now we check whether the neural network in (7) can be used to search for the KKT points. There are four KKT points associated with the problem: u∗1 = (−1.272, 2.618, 4.013, 1.395)T, u∗2 = (1.272, 2.618, 4.013, 1.395)T, u∗3 = (0, 0, 0, 0)T, u∗4 = (0, 1, 1, 0)T, but only the first two correspond to local minima. Moreover, it is verified that at either
u∗1 or u∗2, J(x∗) defined in Lemma 1 is equal to {1, 2}, and ∇g1(x∗), ∇g2(x∗) are linearly independent, which indicates M(x∗) = ∅. So the second-order sufficiency conditions hold trivially at either point. According to Theorem 2, the neural network in (7) can be made asymptotically stable at u∗1 and u∗2 by choosing an appropriate c > 0. Fig. 2 displays the state trajectories of the neural network with different values of c, starting from the same initial point (−2, 3, 0, 0)T. When c = 0, the neural network reduces to the one in (10). It is seen from Fig. 2(a) that some state trajectories diverge to infinity. When c = 0.1, the neural network is not convergent either, as shown in Fig. 2(b). However, when c ≥ 0.2, Figs. 2(c) and 2(d) show that the trajectories converge to u∗1 asymptotically.
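The example can be reproduced with a simple forward-Euler discretization of (7); this is my own re-implementation, and since the paper does not specify the solver or the value of α, the choices α = 1, step size h = 1e-4, and horizon T = 60 below are assumptions:

```python
# Forward-Euler simulation of the network (7) for the example:
#   min 5 - (x1^2 + x2^2)/2
#   s.t. g1 = -x1^2 + x2 - 1 <= 0,   g2 = x1^4 - x2 <= 0
# With c = 0.5 and initial point (-2, 3, 0, 0), the trajectory should
# approach u1* = (-1.272, 2.618, 4.013, 1.395).
def simulate_example(x1, x2, y1, y2, alpha=1.0, c=0.5, h=1e-4, T=60.0):
    for _ in range(int(T / h)):
        g1, g2 = -x1**2 + x2 - 1.0, x1**4 - x2
        # grad_x Lc = grad f + (y1 + c*y1^2*g1)*grad g1 + (y2 + c*y2^2*g2)*grad g2,
        # with grad f = (-x1, -x2), grad g1 = (-2x1, 1), grad g2 = (4x1^3, -1)
        gLc1 = -x1 + (y1 + c * y1**2 * g1) * (-2.0 * x1) \
                   + (y2 + c * y2**2 * g2) * (4.0 * x1**3)
        gLc2 = -x2 + (y1 + c * y1**2 * g1) * 1.0 \
                   + (y2 + c * y2**2 * g2) * (-1.0)
        x1, x2 = x1 - h * alpha * gLc1, x2 - h * alpha * gLc2
        y1 = y1 + h * (-y1 + max(y1 + alpha * g1, 0.0))
        y2 = y2 + h * (-y2 + max(y2 + alpha * g2, 0.0))
    return x1, x2, y1, y2

print(simulate_example(-2.0, 3.0, 0.0, 0.0))
# should approach u1* = (-1.272, 2.618, 4.013, 1.395)
```

Rerunning with c = 0 reproduces the divergence of Fig. 2(a), which is the point of the comparison: the augmentation term is what stabilizes the dynamics at the nonconvex local minimum.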
6 Concluding Remarks
In the paper, a recurrent neural network has been proposed for seeking local optima of general nonconvex optimization problems by means of solving the Karush-Kuhn-Tucker (KKT) equations. In the proposed scheme, there is no need to introduce slack variables to convert inequality constraints into equality constraints. Moreover, a nice property of the proposed neural network is that its equilibrium set coincides with the KKT point set. Another nice property is that, by choosing an appropriate control parameter, the neural network can be made asymptotically stable, albeit locally, at those KKT points associated with local optima under some standard assumptions in the optimization context. This can be regarded as meaningful progress in paving the way toward neural networks for completely solving nonconvex optimization problems, considering that many existing neural network models are unstable at such KKT points. A numerical example has been discussed to illustrate the performance of the proposed neural network.
Acknowledgement. This work was supported by the Hong Kong Research Grants Council under Grant CUHK4165/03E.
References

1. Hopfield, J.J., Tank, D.W.: 'Neural' Computation of Decisions in Optimization Problems. Biol. Cybern. 52 (1985) 141–152
2. Tank, D.W., Hopfield, J.J.: Simple Neural Optimization Networks: An A/D Converter, Signal Decision Circuit, and a Linear Programming Circuit. IEEE Trans. Circuits Syst. II 33 (1986) 533–541
3. Rodríguez-Vázquez, A., Domínguez-Castro, R., Rueda, A., Huertas, J.L., Sánchez-Sinencio, E.: Nonlinear Switched-Capacitor Neural Networks for Optimization Problems. IEEE Trans. Circuits Syst. 37 (1990) 384–397
4. Wang, J.: Analysis and Design of a Recurrent Neural Network for Linear Programming. IEEE Trans. Circuits Syst. I 40 (1993) 613–618
5. Wang, J.: A Deterministic Annealing Neural Network for Convex Programming. Neural Networks 7 (1994) 629–641
6. Bouzerdoum, A., Pattison, T.R.: Neural Network for Quadratic Optimization with Bound Constraints. IEEE Trans. Neural Networks 4 (1993) 293–304
7. Liang, X., Si, J.: Global Exponential Stability of Neural Networks with Globally Lipschitz Continuous Activations and Its Application to Linear Variational Inequality Problem. IEEE Trans. Neural Networks 12 (2001) 349–359
8. Xia, Y., Wang, J.: A Recurrent Neural Network for Solving Linear Projection Equations. Neural Networks 13 (2000) 337–350
9. Xia, Y., Wang, J.: On the Stability of Globally Projected Dynamical Systems. Journal of Optimization Theory and Applications 106 (2000) 129–150
10. Xia, Y., Leung, H., Wang, J.: A Projection Neural Network and Its Application to Constrained Optimization Problems. IEEE Trans. Circuits Syst. I 49 (2002) 447–458
11. Xia, Y., Feng, G., Wang, J.: A Recurrent Neural Network with Exponential Convergence for Solving Convex Quadratic Program and Linear Piecewise Equations. Neural Networks 17 (2004) 1003–1015
12. Xia, Y., Wang, J.: A Recurrent Neural Network for Nonlinear Convex Optimization Subject to Nonlinear Inequality Constraints. IEEE Trans. Circuits Syst. I 51 (2004) 1385–1394
13. Xia, Y., Wang, J.: Recurrent Neural Networks for Solving Nonlinear Convex Programs with Linear Constraints. IEEE Trans. Neural Networks 16 (2005) 379–386
14. Hu, X., Wang, J.: Solving Pseudomonotone Variational Inequalities and Pseudoconvex Optimization Problems Using the Projection Neural Network. IEEE Trans. Neural Networks 17 (2006) 1487–1499
15. Zhang, S., Constantinides, A.G.: Lagrange Programming Neural Networks. IEEE Trans. Circuits Syst. II 39 (1992) 441–452
16. Huang, Y.: Lagrange-Type Neural Networks for Nonlinear Programming Problems with Inequality Constraints. In: Proc. 44th IEEE Conference on Decision and Control and the European Control Conference. Seville, Spain, Dec. 12–15 (2005) 4129–4133
17. Luenberger, D.G.: Linear and Nonlinear Programming. 2nd Edition. Addison-Wesley, Reading, Massachusetts (1984)
18. Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York (1982)
19. Slotine, J.J., Li, W.: Applied Nonlinear Control. Prentice Hall, Englewood Cliffs, NJ (1991)
20. Hu, X., Wang, J.: A Recurrent Neural Network for Solving Nonconvex Optimization Problems. In: Proc. 2006 IEEE International Joint Conference on Neural Networks. Vancouver, Canada, July 16–21 (2006) 8955–8961