IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 41, NO. 5, OCTOBER 2011

A One-Layer Recurrent Neural Network for Constrained Nonsmooth Optimization Qingshan Liu, Member, IEEE, and Jun Wang, Fellow, IEEE

Abstract—This paper presents a novel one-layer recurrent neural network modeled by means of a differential inclusion for solving nonsmooth optimization problems, in which the number of neurons in the proposed neural network is the same as the number of decision variables of the optimization problems. Compared with existing neural networks for nonsmooth optimization problems, the global convexity condition on the objective functions and constraints is relaxed, which allows the objective functions and constraints to be nonconvex. It is proven that the state variables of the proposed neural network are convergent to optimal solutions if a single design parameter in the model is larger than a derived lower bound. Numerical examples with simulation results substantiate the effectiveness and illustrate the characteristics of the proposed neural network.

Index Terms—Convergence, differential inclusion, Lyapunov function, nonsmooth optimization, recurrent neural networks.

I. INTRODUCTION

Consider the following constrained optimization problem:

    minimize    f(x)
    subject to  g(x) ≤ 0
                Ax = b                                    (1)

where x = (x1, x2, . . . , xn)^T ∈ R^n, f : R^n → R is an objective function that may be nonsmooth and nonconvex, g(x) = (g1(x), g2(x), . . . , gp(x))^T : R^n → R^p is a p-dimensional vector-valued function, and gi (i = 1, 2, . . . , p) : R^n → R are functions that may be nonsmooth and nonconvex. A ∈ R^{m×n} is a full row-rank matrix (i.e., rank(A) = m ≤ n), and b = (b1, b2, . . . , bm)^T ∈ R^m.

Manuscript received July 26, 2010; revised March 3, 2011; accepted March 14, 2011. Date of publication May 2, 2011; date of current version September 16, 2011. This work was supported by the Research Grants Council of the Hong Kong Special Administrative Region, China, under Grant CUHK417209E and Grant CUHK417608E. This paper was recommended by Associate Editor Q. Zhang. Q. Liu is with the School of Automation, Southeast University, Nanjing 210096, China (e-mail: [email protected]). J. Wang is with the Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TSMCB.2011.2140395

Constrained optimization problems arise in a broad variety of scientific and engineering applications where real-time optimal solutions are often required [21], [23], [26], [29]. One possible and very promising approach to solving dynamic

optimization problems is to apply recurrent neural networks [12], [22]. As parallel computational models for solving convex optimization problems, many recurrent neural networks have been developed over the past two decades (e.g., see [7], [10], [14], [32], and references therein). In 1986, Tank and Hopfield [22] applied the Hopfield network to solving linear programming problems, which inspired many researchers to develop other neural networks for optimization. In 1988, the dynamic canonical nonlinear programming circuit (NPC) was introduced by Kennedy and Chua [12] for nonlinear programming; by utilizing a finite penalty parameter, it can generate approximate optimal solutions. Since then, research on the NPC has advanced considerably, and many neural network models have been designed for optimization problems (e.g., see [3], [20], [24], [35], and references therein). Among them, the Lagrangian network based on the Lagrangian method was proposed by Zhang and Constantinides [36] for solving convex nonlinear programming problems. The deterministic annealing neural network by Wang [25] was developed for linear and nonlinear convex programming. Based on the primal–dual method, the primal–dual network [27], the dual network [28], and the simplified dual network [18] were proposed for solving convex optimization problems. Recently, the projection method has been extended to construct recurrent neural networks for solving convex optimization problems and some constrained pseudoconvex optimization problems (e.g., [9] and [30]). In [15] and [16], one-layer recurrent neural networks with lower model complexity were proposed for solving linear and quadratic programming problems. In particular, in [7], a generalized NPC was proposed for solving nonsmooth nonlinear programming problems without the equality constraints in (1).
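To make the finite-penalty idea behind the canonical NPC concrete, the following is a minimal numerical sketch, not the circuit of [12]: a gradient flow of a penalized energy, integrated by forward Euler, on a toy problem of our own choosing. As the text notes, a finite penalty parameter yields only an approximate optimal solution.

```python
import numpy as np

# Toy problem (illustrative, not from [12]):
#   minimize (x1-2)^2 + (x2-2)^2  subject to  g(x) = x1 + x2 - 1 <= 0.
# Penalized energy: E(x) = f(x) + (s/2) * max(0, g(x))^2, with finite s.

def grad_E(x, s):
    grad_f = 2.0 * (x - 2.0)           # gradient of the objective
    g = x[0] + x[1] - 1.0              # inequality constraint value
    grad_g = np.ones(2)                # gradient of g
    return grad_f + s * max(0.0, g) * grad_g

s, dt = 100.0, 0.002                   # finite penalty parameter, Euler step
x = np.array([2.0, 2.0])               # initial state
for _ in range(20000):                 # Euler integration of dx/dt = -grad E
    x = x - dt * grad_E(x, s)

# On the diagonal x1 = x2 = t the equilibrium solves 2(t-2) + s(2t-1) = 0,
# i.e. t = (4+s)/(2+2s) ~ 0.5149 for s = 100, while the exact optimum is 0.5:
# the finite penalty gives an approximate solution, as stated above.
print(x)
```

Increasing s drives the equilibrium toward the exact constrained minimizer (0.5, 0.5), at the cost of stiffer dynamics.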
The neural network in [7] is guaranteed to generate optimal solutions to the problem under the assumption that int(S) ≠ ∅, where int(S) denotes the interior of feasible region S. In [17], a recurrent neural network was proposed for solving the nonsmooth convex optimization problem without the inequality constraints in (1). In [34], a recurrent neural network modeled by a differential inclusion was proposed for solving nonsmooth optimization problem (1).

Inspired by the works in [7] and [34], in this paper, a one-layer recurrent neural network is presented for solving the nonsmooth optimization problem subject to the equality and inequality constraints in (1). Because of the linear equality constraint Ax = b, the assumption int(S) ≠ ∅ in [7] no longer holds for problem (1). Thus, the neural network in [7] may not be suitable for solving problem (1). In [34], to make the equilibrium points of the neural network feasible for the linear equality constraints,



the neural network therein was constructed based on the l2 norm for the equality constraints. As a result, the feasibility of the linear equality constraints needs to be checked when designing the neural network model. Moreover, two penalty parameters and some assumptions on the inequality constraints, such as estimating an upper bound of the Lipschitz constant of the inequality constraint functions over a compact set, are needed for the neural network in [34]. To overcome the aforementioned limitations, this paper presents a recurrent neural network with only one design parameter for solving problem (1), and some restrictive assumptions are relaxed.

The remainder of this paper is organized as follows: In Section II, the proposed recurrent neural network model is described. The theoretical analysis of the proposed neural network is presented in Section III. Next, in Section IV, simulation results of several numerical examples are given. Finally, Section V concludes this paper.

II. MODEL DESCRIPTION

Here, the recurrent neural network model for solving nonsmooth optimization problem (1) is described. First, some notations are introduced. In this paper, ‖·‖₁ and ‖·‖₂ denote the l1 and l2 norms, respectively, of a vector or a matrix. For a given vector x ∈ R^n

    I+(x) = {j ∈ {1, 2, . . . , p} : gj(x) > 0}
    I0(x) = {j ∈ {1, 2, . . . , p} : gj(x) = 0}
    I−(x) = {j ∈ {1, 2, . . . , p} : gj(x) < 0}.

The region where the constraints are satisfied (feasible region) is defined as S = {x ∈ R^n : g(x) ≤ 0, Ax = b}. Moreover, we define S1 = {x ∈ R^n : g(x) ≤ 0} and S2 = {x ∈ R^n : Ax = b}. Clearly, S = S1 ∩ S2.

Throughout this paper, the objective function f(x) and constraints gi(x) (i = 1, 2, . . . , p) in problem (1) may be nonsmooth and nonconvex. However, the following two assumptions on optimization problem (1) are necessary in this paper.

Assumption 1: There exist x̂ ∈ R^n and r > 0 such that x̂ ∈ int(S1) ∩ S2 and S ⊂ B(x̂, r), where B(x̂, r) = {x ∈ R^n : ‖x − x̂‖₂ ≤ r} is the r-neighborhood of x̂.

Assumption 2: The objective function f(x) of problem (1) is convex on S and Lipschitz bounded on B(x̂, r) ∩ S2. The inequality constraints gi (i = 1, 2, . . . , p) of problem (1) are convex on S2.

Remark 1: From Assumption 1, we can see that feasible region S is assumed to be bounded. Compared with the conditions in [2], in which inequality feasible region S1 needs to be bounded, the assumption on optimization problem (1), as shown in Assumption 1, is weaker. In [16], quadratic programming problems in which the objective functions only need to be convex on equality feasible region S2 are investigated. However, the decision variables of the quadratic programming problem are restricted to a bounded box set. Thus, inequality feasible region

S1 is also bounded. Therefore, optimization problem (1) under Assumption 1 includes more optimization problems than those in [2] and [16].

Remark 2: In the literature (see [11], [31], [33], [34], and references therein), many recurrent neural networks have been developed for solving convex optimization problems based on the assumption that the objective function and constraints are convex. To the best of our knowledge, there are few reported successes in neurodynamic optimization with nonconvex objective functions and constraints. In this paper, optimization problem (1) only needs to satisfy the convexity conditions stated in Assumption 2. Therefore, the addressed problems are more general than the others in the literature.

The dynamic equation of the proposed recurrent neural network model for solving (1) is described as follows:

    ε dx/dt ∈ −A^T φ[−1,1](Ax − b) − μ(t)(I − P)[∂f(x) + σ ∂g(x) φ[0,1](g(x))]        (2)

where ε is a positive scaling constant, σ is a positive gain parameter, μ(t) ≥ 0 is a function of time t, I is the identity matrix, P = A^T(AA^T)^{−1}A, ∂f is the generalized gradient of f as defined in Appendix I, and ∂g = (∂g1, ∂g2, . . . , ∂gp). φ[−1,1] and φ[0,1] are φ[l,h] of appropriate dimensions with l = −1, h = 1 and l = 0, h = 1, respectively. φ[l,h](v) = (φ[l,h](v1), φ[l,h](v2), . . . , φ[l,h](vp))^T, and its components are defined as

    φ[l,h](vi) = { h,       if vi > 0
                 { [l, h],  if vi = 0                     (3)
                 { l,       if vi < 0

for i = 1, 2, . . . , p.

Let tS2 = ε‖Ax0 − b‖₁/λmin(AA^T), where λmin(·) denotes the minimum eigenvalue of a matrix, and x0 is the initial state of neural network (2) at t = t0. Function μ(t) is a monotone nondecreasing function, and two examples are given as follows, i.e., in a discontinuous form

    μ(t) = { 0,  if t ≤ tS2
           { 1,  if t > tS2                               (4)

and in a continuous form

    μ(t) = { 0,                 if t ≤ tS2
           { (t − tS2)^α/δ^α,   if tS2 < t ≤ tS2 + δ      (5)
           { 1,                 if t > tS2 + δ

where α and δ are positive constants. Function μ(t) depends on initial state x0, parameter ε, and A and b in the equality constraints of problem (1).

Remark 3: Matrix P is a projection matrix with some salient features, such as symmetry, P² = P, (I − P)² = I − P, and ‖P‖₂ = 1, which can be directly derived from the definition of P.

III. THEORETICAL ANALYSIS

Here, the convergence and optimality of the proposed neural network are proven.
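The projection-matrix identities stated in Remark 3, together with the identity A(I − P) = 0 used repeatedly in the proofs, can be verified numerically; a small sketch with an arbitrarily chosen full row-rank A (example data for illustration only, not from the paper):

```python
import numpy as np

# Check the properties of P = A^T (A A^T)^{-1} A from Remark 3.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])          # m = 2, n = 3, rank(A) = 2
P = A.T @ np.linalg.inv(A @ A.T) @ A     # projection onto the row space of A
I = np.eye(3)

assert np.allclose(P, P.T)                       # symmetry
assert np.allclose(P @ P, P)                     # P^2 = P
assert np.allclose((I - P) @ (I - P), I - P)     # (I - P)^2 = I - P
assert np.isclose(np.linalg.norm(P, 2), 1.0)     # ||P||_2 = 1 (spectral norm)
assert np.allclose(A @ (I - P), 0.0)             # A(I - P) = 0: the (I - P)
                                                 # term never moves the state
                                                 # off of the set Ax = b
print("all projection identities hold")
```

Because A(I − P) = 0, any flow of the form −(I − P)(·) is tangent to the affine set {x : Ax = b}, which is exactly why the penalty term in (2) cannot destroy equality feasibility once it is attained.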


Let us define D(x) = e^T(g(x))^+, where u^+ = (u1^+, u2^+, . . . , up^+)^T with ui^+ = max{0, ui} (i = 1, 2, . . . , p), and e = (1, 1, . . . , 1)^T ∈ R^p; then

    D(x) = { 0,                    if x ∈ S1
           { Σ_{i∈I+(x)} gi(x),    if x ∈ R^n\S1

which is a nonsmooth barrier function. For any x ∈ R^n, the generalized gradient of D(x) can be derived as follows:

    ∂D(x) = { Σ_{i∈I0(x)} [0, 1]∂gi(x),                              if x ∈ bd(S1)
            { {0},                                                   if x ∈ int(S1)
            { Σ_{i∈I+(x)} ∂gi(x),                                    if x ∉ S1 and I0(x) = ∅
            { Σ_{i∈I+(x)} ∂gi(x) + Σ_{i∈I0(x)} [0, 1]∂gi(x),         if x ∉ S1 and I0(x) ≠ ∅

where bd(S1) denotes the boundary of set S1. It follows that neural network (2) has the equivalent differential inclusion

    ε dx/dt ∈ −∂B(x) − μ(t)(I − P)(∂f(x) + σ∂D(x))        (6)

where B(x) = ‖Ax − b‖₁ and ∂B(x) = A^T K[φ[−1,1](Ax − b)], with K(·) being the closure of the convex hull. In the following, the neural network in (6) is utilized to analyze the performance of the neural network in (2).

Definition 1: x̄ is said to be an equilibrium point of system (6) if, for any t ∈ [t0, ∞)

    0 ∈ ∂B(x̄) + μ(t)(I − P)(∂f(x̄) + σ∂D(x̄))        (7)

i.e., if there exist η̄ ∈ K[φ[−1,1](Ax̄ − b)], γ̄ ∈ ∂f(x̄), and ξ̄ ∈ ∂D(x̄) such that, for any t ∈ [t0, ∞)

    A^T η̄ + μ(t)(I − P)(γ̄ + σξ̄) = 0.        (8)

From the definition of an equilibrium point of system (6), multiplying both sides of (8) by A yields AA^T η̄ = 0 in view of A(I − P) = 0. Furthermore, η̄ = 0 since A has full row rank. From the definition of μ(t), (I − P)(γ̄ + σξ̄) = 0. Therefore, x̄ is an equilibrium point of system (6) if 0 ∈ K[φ[−1,1](Ax̄ − b)] and 0 ∈ (I − P)(∂f(x̄) + σ∂D(x̄)).

A. Finite-Time Convergence to S2

Here, the state of neural network (6) is proven to reach equality feasible region S2 in finite time and stay there thereafter.

Theorem 1: The state of neural network (6) with any initial point x0 ∈ R^n is guaranteed to reach equality feasible region S2 by tS2 = ε‖Ax0 − b‖₁/λmin(AA^T) and stays there thereafter.

Proof: Note that B(x) = ‖Ax − b‖₁. According to the chain rule, we have

    d/dt B(x) = ξ^T ẋ(t)    ∀ξ ∈ ∂B(x(t)).

From the definition of P, one gets that A(I − P) = 0. Thus, for any x0 ∈ R^n, when x ∈ R^n\S2, we have

    d/dt B(x) = −(1/ε)‖A^T η‖₂²    ∀η ∈ K[φ[−1,1](Ax − b)].

For any x ∈ R^n\S2 and η ∈ K[φ[−1,1](Ax − b)], since A^T η ∈ ∂B(x), then A^T η ≠ 0 and at least one of the components of η is −1 or 1. On one hand, since A has full row rank, AA^T is invertible. It follows that

    ‖(AA^T)^{−1}AA^T η‖₂ = ‖η‖₂ ≥ 1.

On the other hand, we have

    ‖(AA^T)^{−1}AA^T η‖₂ ≤ ‖(AA^T)^{−1}A‖₂ ‖A^T η‖₂.

Since AA^T is positive definite, it follows that

    ‖(AA^T)^{−1}A‖₂ = √(λmax((AA^T)^{−1}A((AA^T)^{−1}A)^T)) = √(λmax((AA^T)^{−1})) = 1/√(λmin(AA^T)) > 0

where λmax is the maximum eigenvalue of the matrix. Thus, ‖A^T η‖₂ ≥ √(λmin(AA^T)). Then

    d/dt B(x) ≤ −(1/ε)λmin(AA^T) < 0.        (9)

Integrating both sides of (9) from t0 = 0 to t, then

    B(x(t)) ≤ B(x(t0)) − (1/ε)λmin(AA^T) t.

Thus, B(x(t)) = 0 when t = εB(x0)/λmin(AA^T). That is, the state of neural network (6) reaches S2 in finite time, and an upper bound of the hit time is tS2 = ε‖Ax0 − b‖₁/λmin(AA^T).

Next, we prove that, when t ≥ tS2, the state vector of neural network (6) remains inside S2 thereafter. If not, suppose that the trajectory leaves S2 at time t1 and stays outside of S2 for almost all (a.a.) t ∈ (t1, t2), where t1 < t2. Then, B(x(t1)) = 0 and, from the aforementioned analysis, B(x(t)) < 0 for a.a. t ∈ (t1, t2). However, from the definition of B(x), B(x(t)) ≥ 0 for any t ∈ [t0, ∞), which contradicts the aforementioned result. That is, the state vector of neural network (6) reaches equality feasible region S2 by tS2 at the latest and stays there thereafter. □

It is worth pointing out that the neural network of this paper is capable of solving problem (1) even in the absence of the equality constraints, which is the case studied in [7].

B. Boundedness of State Vector x(t)

Here, the boundedness of the state vector of neural network (6) is proven. Because of Assumption 2, f(x) is Lipschitz bounded on B(x̂, r) ∩ S2. Throughout this paper, we always


denote lf as an upper bound of the Lipschitz constant of f(x) on B(x̂, r) ∩ S2.

From the definition of P, it is easy to obtain the following lemma.

Lemma 1: For any x ∈ R^n, Ax = b if and only if Px = q, where P is defined in (2) and q = A^T(AA^T)^{−1}b.

Next, inspired by the work in [2] and [34], the following two lemmas hold.

Lemma 2: Suppose that Assumptions 1 and 2 hold. For any x ∈ S2\S1 and ξ ∈ ∂D(x), (x − x̂)^T(I − P)ξ > ĝ, where ĝ = min_{1≤i≤p}(−gi(x̂)).

Proof: The proof is given in Appendix II. □

Lemma 3: Suppose that Assumptions 1 and 2 hold. For any x ∈ (B(x̂, r) ∩ S2)\S1 and ξ ∈ ∂D(x), ‖(I − P)ξ‖₂ > ĝ/r holds, where ĝ is defined in Lemma 2.

Proof: The proof is given in Appendix II. □

The boundedness of the state vector of neural network (6) is stated as the following theorem.

Theorem 2: Suppose that Assumptions 1 and 2 hold. For any x0 ∈ B(x̂, r), the state vector of neural network (6) stays in B(x̂, r) (i.e., x(t) ∈ B(x̂, r)) if σ > r lf/ĝ.

Proof: The proof is divided into two steps.

Step 1: We prove that, for any x0 ∈ B(x̂, r), the state vector of neural network (6) satisfies x(t) ∈ B(x̂, r) for t ∈ [t0, tS2], where tS2 is defined in Theorem 1. In fact, if x0 ∈ S2, then the result obviously holds since tS2 = t0. If x0 ∉ S2, when t ∈ [t0, tS2], we have μ(t) = 0. Let ρ(x(t)) = ‖x(t) − x̂‖₂²/2. Since x̂ ∈ S2, we have Ax̂ = b. Then, one gets that

    ε dρ(x(t))/dt = ε(x(t) − x̂)^T ẋ(t) = (x(t) − x̂)^T(−A^T η) = −(Ax(t) − b)^T η ≤ 0

where η ∈ K[φ[−1,1](Ax − b)]. Then, ‖x(t) − x̂‖₂ is nonincreasing, and x(t) ∈ B(x̂, r) when t ∈ [t0, tS2].

Step 2: We prove that, for any x0 ∈ B(x̂, r), there exists t̃ ≤ tS2 such that x(t) ∈ B(x̂, r) for t ∈ (t̃, ∞). In fact, from Theorem 1, x(t) reaches S2 in finite time and stays there thereafter. Assume that x(t) reaches S2 at t = t̃ for the first time; then t̃ ≤ tS2. Next, we prove that x(t) ∈ B(x̂, r) for t > t̃.

When t > t̃, we have x(t) ∈ S2. Then, Ax(t) = Ax̂ = b. For any η ∈ K[φ[−1,1](Ax − b)], γ ∈ ∂f(x), and ξ ∈ ∂D(x), when t > t̃, we have

    ε dρ(x(t))/dt = ε(x(t) − x̂)^T ẋ(t)
                  = (x(t) − x̂)^T(−A^T η − μ(t)(I − P)(γ + σξ))
                  = −μ(t)(x(t) − x̂)^T(I − P)(γ + σξ)
                  ≤ μ(t)[‖x(t) − x̂‖₂ ‖(I − P)γ‖₂ − σ(x(t) − x̂)^T(I − P)ξ].        (10)

If x(t) ∉ S, according to Lemma 2, one gets that (x(t) − x̂)^T(I − P)ξ > ĝ. Since ‖(I − P)γ‖₂ ≤ ‖γ‖₂, we get that

    ε dρ(x(t))/dt ≤ μ(t)(‖x(t) − x̂‖₂ ‖γ‖₂ − σĝ).

If σ > r lf/ĝ, then x(t) ∈ B(x̂, r). If not, trajectory x(t) leaves B(x̂, r) at some time t1 with t1 > tS2 and ‖x(t1) − x̂‖₂ = r. Then, dρ(x(t))/dt|_{t=t1} ≥ 0. From (10), combining ‖γ‖₂ ≤ lf and μ(t1) > 0, one gets that

    ε dρ(x(t))/dt|_{t=t1} ≤ μ(t1)(r lf − σĝ) < 0

which is a contradiction. If x(t) ∈ S, then x(t) ∈ B(x̂, r) is obviously true. Thus, x(t) ∈ B(x̂, r) for t > t̃.

Consequently, for any x0 ∈ B(x̂, r), the state vector of neural network (6) satisfies x(t) ∈ B(x̂, r) if σ > r lf/ĝ. □

C. Finite-Time Convergence to S

Here, neural network (6) is further proven to be convergent to feasible region S in finite time.

Theorem 3: Suppose that Assumptions 1 and 2 hold. For any x0 ∈ B(x̂, r), the state of neural network (6) is guaranteed to reach feasible region S in finite time and stays there thereafter if σ > r lf/ĝ.

Proof: According to Theorem 1, the trajectory of x(t) reaches equality feasible region S2 in finite time and stays there thereafter. It remains to show that, once in set S2, the trajectory reaches set S1 in finite time and stays there thereafter. According to the definition of D(x), it is convex on S2. By the chain rule, we have

    d/dt D(x) = ξ^T ẋ(t)    ∀ξ ∈ ∂D(x(t)).

From Theorem 2, for any x0 ∈ B(x̂, r), x(t) ∈ B(x̂, r). Since x ∈ S2, from Lemma 1, we have Px = q. Thus, Pẋ = 0. Since x can be written as x = Px + (I − P)x, then ẋ = (I − P)ẋ. Combining (I − P)A^T = 0 yields

    ε dx/dt ∈ −μ(t)(I − P)(∂f(x) + σ∂D(x)).        (11)

Then, for any x ∈ (B(x̂, r) ∩ S2)\S1 and any γ ∈ ∂f(x), ξ ∈ ∂D(x), we have

    d/dt D(x) = ξ^T ẋ(t)
              = −(μ(t)/ε) ξ^T(I − P)(γ + σξ)
              = −(μ(t)/ε) [ξ^T(I − P)γ + σ ξ^T(I − P)ξ]
              ≤ −(μ(t)/ε) ‖(I − P)ξ‖₂ [σ‖(I − P)ξ‖₂ − ‖γ‖₂].

For any γ ∈ ∂f(x), we have ‖γ‖₂ ≤ lf. By Lemma 3, for any ξ ∈ ∂D(x), ‖(I − P)ξ‖₂ > ĝ/r. From the definition of


μ(t) in (4) and (5), when t ≥ tS2 + δ, μ(t) = 1. Then, for t ≥ tS2 + δ, if σ > r lf/ĝ, we have

    d/dt D(x) < −(1/ε)(ĝ/r)(σĝ/r − lf) < 0.

Denote k = ĝ(σĝ/r − lf)/(εr). Then, k > 0 and

    d/dt D(x) < −k.        (12)

Integrating (12) from tS2 + δ to t, one gets that

    D(x(t)) ≤ D(x(tS2 + δ)) − k(t − tS2 − δ).

Thus, when t ≥ D(x(tS2 + δ))/k + tS2 + δ, D(x(t)) ≤ 0. Therefore, the state of neural network (6) reaches S in finite time. Similar to the proof of Theorem 1, we can prove that the state of neural network (6) stays in S thereafter. □

D. Optimality Analysis

Theorem 4: Suppose that Assumptions 1 and 2 hold. Any equilibrium point of neural network (6) is an optimal solution of problem (1), and vice versa, if σ > r lf/ĝ.

Proof: Let x* be an optimal solution of problem (1); then x* ∈ S. Since x* is a minimum point of f(x) over feasible region S, according to Lemma 6 in Appendix I, it follows that

    0 ∈ ∂f(x*) + NS(x*)

where NS(x*) is the normal cone to set S at x* defined in Appendix I. Because int(S1) ∩ S2 ≠ ∅, we get that 0 ∈ int(S1 − S2). From Lemma 5 in Appendix I, it follows that NS = NS1 + NS2. Then

    0 ∈ ∂f(x*) + NS1(x*) + NS2(x*).

Thus

    0 ∈ (I − P)∂f(x*) + (I − P)NS1(x*) + (I − P)NS2(x*).

By simple derivation, we get that NS2(x*) = {A^T y : y ∈ R^m}. Furthermore, (I − P)NS2(x*) = {0} since (I − P)A^T = 0. Then

    0 ∈ (I − P)∂f(x*) + (I − P)NS1(x*).

Suppose x* ∈ bd(S1) ∩ S2; then ∂D(x*) = Σ_{i∈I0(x*)} [0, 1]∂gi(x*). It follows that there exists u ∈ NS1(x*) such that (I − P)u ∈ −(I − P)∂f(x*). For any γ ∈ ∂f(x*), ‖(I − P)γ‖₂ ≤ ‖I − P‖₂‖γ‖₂ ≤ ‖γ‖₂ ≤ lf. Then, ‖(I − P)u‖₂ ≤ lf.

Inspired by the proof of [2, Lemma 5], we prove that (I − P)u ∈ σ(I − P)∂D(x*). In fact, according to [34, Property 5], we have NS1(x*) = ∪_{ν≥0} ν∂D(x*) = Σ_{i∈I0(x*)} [0, +∞)∂gi(x*). Thus, (I − P)NS1(x*) = Σ_{i∈I0(x*)} [0, +∞)(I − P)∂gi(x*). Then, for i ∈ I0(x*), there exist σi ∈ [0, +∞) and ξi ∈ ∂gi(x*) such that (I − P)u = Σ_{i∈I0(x*)} σi(I − P)ξi.

For i ∈ I0(x*), since x* ∈ bd(S1) ∩ S2, we have (x* − x̂)^T ξi ≥ gi(x*) − gi(x̂) = −gi(x̂) ≥ ĝ, where x̂ ∈ int(S1) ∩ S2. Since x*, x̂ ∈ S2, we have Px* = Px̂ = q. Then, (x* − x̂)^T(I − P)ξi = (x* − x̂)^T ξi ≥ ĝ.

Note that σ(I − P)∂D(x*) = Σ_{i∈I0(x*)} [0, σ](I − P)∂gi(x*). We claim that σi ≤ σ for i ∈ I0(x*). If not, there exists j ∈ I0(x*) such that σj > σ; then

    (x* − x̂)^T(I − P)u = Σ_{i∈I0(x*)} σi(x* − x̂)^T(I − P)ξi ≥ Σ_{i∈I0(x*)} σi ĝ ≥ σj ĝ > σĝ.

Thus, ‖(I − P)u‖₂ > σĝ/‖x* − x̂‖₂ ≥ σĝ/r. If σ > r lf/ĝ, then we have ‖(I − P)u‖₂ > lf, which contradicts ‖(I − P)u‖₂ ≤ lf. Consequently, σi ≤ σ for i ∈ I0(x*), and it follows that (I − P)u ∈ σ(I − P)∂D(x*). Combining this with 0 ∈ (I − P)∂f(x*) + (I − P)NS1(x*) yields 0 ∈ (I − P)∂f(x*) + σ(I − P)∂D(x*).

When x* ∈ int(S1) ∩ S2, we have NS1(x*) = {0} and ∂D(x*) = {0}. Thus, 0 ∈ (I − P)∂f(x*) + (I − P)NS1(x*) implies that 0 ∈ (I − P)∂f(x*). Furthermore, we get that 0 ∈ (I − P)(∂f(x*) + σ∂D(x*)).

From the aforementioned analysis, for any optimal solution x* of problem (1), if σ > r lf/ĝ, then 0 ∈ (I − P)(∂f(x*) + σ∂D(x*)). Moreover, since x* ∈ S2, one gets that Ax* − b = 0. Thus, 0 ∈ A^T K[φ[−1,1](Ax* − b)] = ∂B(x*). It follows that 0 ∈ ∂B(x*) + μ(t)(I − P)(∂f(x*) + σ∂D(x*)). Therefore, x* is an equilibrium point of neural network (6).

Next, we prove that the converse is true. Let ψ(x) = f(x) + σD(x) and x̄ be an equilibrium point of neural network (6). Since x̄ is an equilibrium point of neural network (6), there exist η̄ ∈ K[φ[−1,1](Ax̄ − b)] and ζ̄ ∈ ∂ψ(x̄) such that, for any t ∈ [t0, ∞)

    A^T η̄ + μ(t)(I − P)ζ̄ = 0.        (13)

Multiplying both sides of (13) by A yields AA^T η̄ = 0. Then, η̄ = 0 since A is a full row-rank matrix. This implies that Ax̄ = b. Thus, (I − P)ζ̄ = 0 from (13) and the definition of μ(t). Since P = A^T(AA^T)^{−1}A, we have ζ̄ − A^T(AA^T)^{−1}Aζ̄ = 0. Let ȳ = −(AA^T)^{−1}Aζ̄; then ζ̄ + A^T ȳ = 0.

According to Assumption 2, f(x) and gi(x) (i = 1, 2, . . . , p) are convex on S, so ψ(x) is convex on S. From Theorem 3, if σ > r lf/ĝ, then the trajectory of neural network (6) is convergent to S in finite time. Thus, x̄ ∈ S and D(x̄) = 0. Then, for any x ∈ S,

    ψ(x) ≥ ψ(x̄) + (x − x̄)^T ζ̄ = ψ(x̄) − (x − x̄)^T A^T ȳ = ψ(x̄)

in which the last equality holds since Ax = Ax̄ = b. Combining D(x) = D(x̄) = 0 for x, x̄ ∈ S, one gets that f(x) ≥ f(x̄) for any x ∈ S. Therefore, x̄ is a minimum point of f(x) over S. □

Remark 4: According to Theorem 4, the optimality of the equilibrium points of neural network (6) depends on gain parameter σ, i.e., an equilibrium point is an optimal solution of problem (1) if σ is larger than a derived lower bound. To estimate the lower bound of σ, one only needs an upper bound of the Lipschitz constant of f(x) on B(x̂, r) ∩ S2, a feasible point in S, and an upper bound of feasible region S. Thus, the neural network proposed in this paper and the theoretical results show some advantages over those in [2] and [34]. For example, in [2], under the assumption of the boundedness of S1, the neural network model depends on the upper bound of the Lipschitz constant of D(x) on a compact set containing S1. In [34], two design parameters are used in


the neural network model, and one of them depends on the upper bound of the Lipschitz constant of D(x) in a bounded set. However, for the neural network herein, these restrictions are relaxed, and there is no need to estimate the upper bound of the Lipschitz constant of D(x). Consequently, the results of this paper imply a reduction of the model complexity of the neural networks for solving problem (1).

E. Convergence Analysis

Here, the convergence property of neural network (6) is investigated by using the Lyapunov method and differential inclusion theory (see, e.g., [1], [4], [6], [19], and references therein).

Theorem 5: Suppose that Assumptions 1 and 2 hold. For any x0 ∈ B(x̂, r), the state of neural network (6) is convergent to an optimal solution of problem (1) if σ > r lf/ĝ.

Proof: Let x̄ be an equilibrium point of neural network (6). According to Theorem 4, x̄ is an optimal solution of problem (1). Thus, Ax̄ = b. From Theorem 3, we can suppose that x0 ∈ S and μ(t) = 1. Let ψ(x) = f(x) + σD(x). There exist η̄ = 0 ∈ K[φ[−1,1](Ax̄ − b)] and ζ̄ ∈ ∂ψ(x̄) such that

    A^T η̄ + (I − P)ζ̄ = 0.        (14)

Substituting it into (6) yields

    ε dx/dt ∈ −A^T(K[φ[−1,1](Ax − b)] − η̄) − (I − P)(∂ψ(x) − ζ̄).        (15)

Since x ∈ S, from Lemma 1, we have Px = q. Then, Pẋ = 0. Since x can be written as x = Px + (I − P)x, we have ẋ = (I − P)ẋ. Combining it with (I − P)A^T = 0, we have

    ε dx/dt ∈ −(I − P)(∂ψ(x) − ζ̄).        (16)

Consider the Lyapunov function

    V(x) = ε[ψ(x) − ψ(x̄) − (x − x̄)^T ζ̄] + (1/2)‖x − x̄‖₂²;        (17)

then

    ∂V(x) = ε(∂ψ(x) − ζ̄) + x − x̄.

By using the chain rule, it follows that V(x(t)) is differentiable for a.a. t ≥ t0, and

    d/dt V(x(t)) = ξ(t)^T ẋ(t)    ∀ξ(t) ∈ ∂V(x(t)).

Let ξ(t) = ε(ζ(t) − ζ̄) + x(t) − x̄, where ζ(t) ∈ ∂ψ(x(t)). Then

    d/dt V(x(t)) ≤ sup_{ζ∈∂ψ(x)} [ε(ζ − ζ̄) + x − x̄]^T [−(1/ε)(I − P)(ζ − ζ̄)]
                 = sup_{ζ∈∂ψ(x)} [−(ζ − ζ̄)^T(I − P)(ζ − ζ̄) − (1/ε)(x − x̄)^T(ζ − ζ̄) + (1/ε)(x − x̄)^T P(ζ − ζ̄)].

According to Assumption 2, f(x) and gi(x) (i = 1, 2, . . . , p) are convex on S, so ψ(x) is convex on S. Then, for any ζ ∈ ∂ψ(x), (x − x̄)^T(ζ − ζ̄) ≥ 0 holds. Since x, x̄ ∈ S, we have Px = Px̄ = q, and hence (x − x̄)^T P(ζ − ζ̄) = 0. Then

    d/dt V(x(t)) ≤ sup_{ζ∈∂ψ(x)} [−(ζ − ζ̄)^T(I − P)(ζ − ζ̄)]
                 = sup_{ζ∈∂ψ(x)} [−‖(I − P)(ζ − ζ̄)‖₂²].

From (14), since η̄ = 0, one gets that (I − P)ζ̄ = 0. Then

    d/dt V(x(t)) ≤ sup_{ζ∈∂ψ(x)} [−‖(I − P)ζ‖₂²] = − inf_{ζ∈∂ψ(x)} ‖(I − P)ζ‖₂².        (18)

Define Γ(x) = inf_{ζ∈∂ψ(x)} ‖(I − P)ζ‖₂². If x̄ is an equilibrium point of neural network (6), then Γ(x̄) = 0. Conversely, if there exists x̌ ∈ S such that Γ(x̌) = 0, then, since ∂ψ(x) is a compact convex subset of R^n, there exists ζ̌ ∈ ∂ψ(x̌) such that (I − P)ζ̌ = 0; thus, x̌ is an equilibrium point. Therefore, Γ(x) = 0 on S if and only if x is an equilibrium point of neural network (6).

From the boundedness of x(t) and (6), we get that ‖ẋ(t)‖₂ is also bounded, with bound denoted by M. Then, there exist an increasing sequence {tk} with lim_{k→∞} tk = ∞ and a limit point x̃ such that lim_{k→∞} x(tk) = x̃. Inspired by the proof in [13], we prove that Γ(x̃) = 0. If this does not hold, then Γ(x̃) > 0. From its definition, Γ(x) is lower semicontinuous, so there exist ω > 0 and ε0 > 0 such that Γ(x) > ε0 for all x ∈ B(x̃, ω), where B(x̃, ω) = {x ∈ R^n : ‖x − x̃‖₂ ≤ ω} is the ω-neighborhood of x̃. Since lim_{k→∞} x(tk) = x̃, there exists a positive integer N such that, for all k ≥ N, ‖x(tk) − x̃‖₂ ≤ ω/2. When t ∈ [tk − ω/(4M), tk + ω/(4M)] and k ≥ N, we have

    ‖x(t) − x̃‖₂ ≤ ‖x(t) − x(tk)‖₂ + ‖x(tk) − x̃‖₂ ≤ M|t − tk| + ω/2 ≤ ω.

It follows that Γ(x(t)) > ε0 for all t ∈ [tk − ω/(4M), tk + ω/(4M)]. On one hand, since the Lebesgue measure of the set ∪_{k≥N} [tk − ω/(4M), tk + ω/(4M)] is infinite,

    ∫_{t0}^{∞} Γ(x(t)) dt = ∞.        (19)

On the other hand, by (18), V(x(t)) is monotonically nonincreasing and bounded, and then, there exists a constant V0 such


that lim_{t→∞} V(x(t)) = V0. Furthermore, we have

    ∫_{t0}^{∞} Γ(x(t)) dt = lim_{s→∞} ∫_{t0}^{s} Γ(x(t)) dt
                          ≤ −lim_{s→∞} ∫_{t0}^{s} V̇(x(t)) dt
                          = −lim_{s→∞} [V(x(s)) − V(x(t0))]
                          = −V0 + V(x(t0))

which contradicts (19). Therefore, Γ(x̃) = 0. That is, limit point x̃ is an equilibrium point of neural network (6).

Finally, let us define another Lyapunov function, i.e.,

    V˜(x) = ε[ψ(x) − ψ(x̃) − (x − x̃)^T ζ̃] + (1/2)‖x − x̃‖₂²

where ζ̃ ∈ ∂ψ(x̃). Similar to the aforementioned proof, we have V˜(x(t)) ≥ ‖x − x̃‖₂²/2 and dV˜(x(t))/dt ≤ 0. From the continuity of function V˜(x(t)), for any υ > 0, there exists τ > 0 such that V˜(x(t)) < υ² when ‖x − x̃‖₂ ≤ τ. Since V˜(x(t)) is monotonically nonincreasing on the interval [t0, ∞), there exists a positive integer L such that ‖x(tL) − x̃‖₂ ≤ τ and, when t ≥ tL,

    ‖x(t) − x̃‖₂² ≤ 2V˜(x(t)) ≤ 2V˜(x(tL)) < 2υ².

That is, lim_{t→+∞} x(t) = x̃. Then, the state of neural network (6) is convergent to an equilibrium point. Combining this with Theorem 4 completes the proof. □

F. Further Results on Finite-Time Convergence

Furthermore, the convergence to an optimal solution of problem (1) can be achieved in finite time if a mild condition is satisfied. Similar to that in [7], to achieve finite-time convergence, another assumption is stated as follows.

Assumption 3: There exists α > 0 such that

    inf_{x∈S\M} inf_{ζ∈∂ψ(x)} ‖(I − P)ζ‖₂ > α

where ψ(x) = f(x) + σD(x), and M is the optimal solution set of problem (1).

Theorem 6: Suppose that Assumptions 1–3 hold. For any x0 ∈ B(x̂, r), the state of neural network (6) is convergent to an optimal solution of problem (1) in finite time if σ > r lf/ĝ.

Proof: From (18) and Assumption 3, for x ∈ S\M, one gets that

    V̇(x(t)) ≤ −α².        (20)

Integrating both sides of (20) from tS to t yields

    V(x(t)) ≤ V(x(tS)) − α²(t − tS)

where tS is the minimum time at which x(t) reaches S. Then, V(x(t)) ≤ 0 when t ≥ V(x(tS))/α² + tS. From (17), V(x(t)) ≥ ‖x − x̄‖₂²/2, where x̄ is the optimal solution to which x(t) converges. Consequently, x(t) = x̄ when t ≥ V(x(tS))/α² + tS. That is, x(t) is convergent to an optimal solution of problem (1) in finite time. □

IV. SIMULATION RESULTS

Here, the proposed recurrent neural network is demonstrated for solving two numerical examples. In both examples, μ(t) in (4) is used for the simulations.

Example 1: Consider the following optimization problem:

    minimize    f(x) = −0.25x1² + 1.2x2² + 0.1x1x2 + 2x1 − 2x2
    subject to  g1(x) = −x1 + x2 − 2 ≤ 0
                g2(x) = x1 − 2x2 − 2 ≤ 0
                g3(x) = −x1x2 ≤ 0
                x1 + 2x2 = 3.                              (21)

As f(x) and g3(x) are nonconvex, the neural networks for convex optimization in the literature (e.g., see [2], [11], [33], and [34]) may not be capable of solving this problem. Substituting x1 = −2x2 + 3 into f(x) and g3(x) results in f(x) = −2.7x2 + 3.75 and g3(x) = 2x2² − 3x2, which are convex. Thus, f(x) and g3(x) are convex on equality feasible region S2. According to the analysis in the preceding section, the proposed neural network in (2) is capable of solving this problem.

Fig. 1. Two-dimensional phase plot of the trajectories of (x1, x2) of neural network (2) with σ = 17 in Example 1.

Feasible region S is shown in Fig. 1: it is the line segment of x1 + 2x2 = 3 between x1 = 0 and g2(x) = 0. In Fig. 1, we can see that inequality feasible region S1 is nonconvex and unbounded, whereas feasible region S is convex and bounded. Let x̂ = (1, 1)^T ∈ int(S1) ∩ S2; then ĝ = 1. Moreover, feasible region S ⊂ B(x̂, r) with r = 2. An upper bound of the

1330

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 41, NO. 5, OCTOBER 2011

depicts the convergence of the state variables of the neural network with ten random initial values in B(ˆ x, r). It shows that the state variables of the neural network are convergent to the unique optimal solution x∗ when σ = 4, 11, and 20, as shown in Fig. 3. However, if σ = 1, then the neural network is stable at an equilibrium point x∗ = (−1.85, 1.925, 0.692, −0.616)T , which is not feasible.

V. C ONCLUSION

Fig. 2. Transient behaviors of the state variables of neural network (2) with σ = 17 in Example 1.

 Lipschitz constant of f (x) on B(ˆ x, r) S2 is estimated as lf = 8.2. Then, a lower bound of design parameter σ in the neural network is estimated as 16.4. Let  = 10−5 and σ = 17 in the simulations. Fig. 1 depicts the phase plot of (x1 , x2 )T from 20 initial points on B(ˆ x, r), which shows that the state variables slide to feasible region S and then converge to optimal solution x, r). x∗ = (0, 1.5)T , in which the dashed circle indicates B(ˆ Fig. 2 shows the transient behavior of the state variables of the neural network with 20 initial points on B(ˆ x, r). Example 2: Consider a nonsmooth optimization problem as follows: minimize

f (x) = 2|x1 − x2 + 2| + |x2 − x3 + 2x4 | + |x3 + x4 − 4| − cos (0.2(x1 − x4 ))

subject to

g1 (x) = |2x1 − x2 | + x23 − 4 ≤ 0 g2 (x) = −x2 − x3 − |x4 + 2| − 2 ≤ 0 g3 (x) = x1 − x4 − 5 ≤ 0 g4 (x) = x4 − x1 − 5 ≤ 0 g5 (x) = x21 + x22 + x23 + x24 − 8 ≤ 0 x1 + 2x2 = 2

− 2x3 + x4 = −2.

(22)

In this problem, objective function f (x) is nonconvex and nonsmooth. However, according to the formulations of g3 (x) and g4 (x), − cos(0.2(x1 − x4 )) is convex when −5 ≤ x1 − x4 ≤ 5. Then, objective function f (x) is convex on the feasible region, and the proposed neural network in (2) is capable of solving this problem. This problem has a unique optimal solution x∗ = (−0.9,  ˆ = (0, 1, 0, −2)T ∈ int(S1 ) S2 , 1.45, 0.85, −0.3)T . Let x then gˆ = 3. Moreover, √in view of g5 (x), feasible region S ⊂ B(ˆ x, r) with r = 4 2. An  upper bound of the Lipschitz constant of f (x) on B(ˆ x, r) S2 is estimated as lf = 5.3. Then, a lower bound of design parameter σ in the neural network is estimated as 10.2. Simulation results are shown in Fig. 3 with  = 10−5 and four different values of σ, which

This paper has presented a single-layer recurrent neural network for solving nonsmooth optimization problems. Based on the assumptions of bounded feasible region and some convex conditions on the objective functions and inequality constraints, the feasibility of the equilibrium point of the proposed neural network is guaranteed by properly selecting the values of the design parameter in the model. Furthermore, by using the Lyapunov method and the differential inclusion theory, the neural network is proven to be convergent to an exact optimal solution of the optimization problem. In contrast to the existing neural networks in the literature for nonsmooth optimization, the present neural network has the following advantages: First, the objective functions and constraints in the optimization problems are not restricted to be globally convex but need to be locally convex only on the feasible region and the equality feasible region. Second, a single design parameter in the model is independent of the Lipschitz constant of inequality constraints. Simulation results in two numerical examples are given to illustrate the effectiveness and characteristics of the present neural network.
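As a small sanity check on Example 2 above, the reported optimal solution can at least be verified to be feasible for problem (22). The script below (plain Python, no external dependencies) only checks feasibility, not optimality.

```python
# Verify that the reported optimal solution of Example 2,
# x* = (-0.9, 1.45, 0.85, -0.3)^T, satisfies all constraints of problem (22).
x1, x2, x3, x4 = -0.9, 1.45, 0.85, -0.3

g = [
    abs(2 * x1 - x2) + x3**2 - 4,        # g1
    -x2 - x3 - abs(x4 + 2) - 2,         # g2
    x1 - x4 - 5,                        # g3
    x4 - x1 - 5,                        # g4
    x1**2 + x2**2 + x3**2 + x4**2 - 8,  # g5
]
eq = [x1 + 2 * x2 - 2, -2 * x3 + x4 + 2]

feasible = all(gi <= 1e-9 for gi in g) and all(abs(e) <= 1e-9 for e in eq)
print(feasible)  # -> True: all inequality and equality constraints hold
```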

APPENDIX I

Here, we present some definitions and properties concerning set-valued maps, nonsmooth analysis, and convex analysis, which are needed for the theoretical analysis in this paper. We refer the readers to [1], [4], [5], and [8] for more thorough discussions.

Definition 2: Suppose E ⊂ Rn. F : x → F(x) is called a set-valued map from E to Rn if, to each point x of set E, there corresponds a nonempty closed set F(x) ⊂ Rn.

Definition 3: A function ϕ : Rn → R is said to be Lipschitz near x ∈ Rn if there exist ε, δ > 0 such that, for any x′, x″ ∈ Rn satisfying ‖x′ − x‖₂ < δ and ‖x″ − x‖₂ < δ, we have |ϕ(x′) − ϕ(x″)| ≤ ε‖x′ − x″‖₂. If ϕ is Lipschitz near any point x ∈ Rn, then ϕ is said to be locally Lipschitz in Rn.

Assume that ϕ is Lipschitz near x. The generalized directional derivative of ϕ at x in the direction v ∈ Rn is given by

ϕ⁰(x; v) = lim sup_{y→x, s→0⁺} [ϕ(y + sv) − ϕ(y)]/s.

Clarke's generalized gradient of ϕ is defined as

∂ϕ(x) = {y ∈ Rn : ϕ⁰(x; v) ≥ yᵀv, ∀v ∈ Rn}.
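To make Clarke's generalized gradient concrete, consider ϕ(x) = |x|: away from the origin, ϕ is differentiable with derivative sign(x), and the convex hull of the gradients along sequences approaching 0 is the interval [−1, 1], which is exactly ∂ϕ(0). The sketch below (a hypothetical helper written for illustration only, not from the paper) recovers the endpoints of that interval:

```python
# Illustration: for phi(x) = |x|, Clarke's generalized gradient at 0 is the
# interval [-1, 1].  Away from 0 the function is differentiable with gradient
# sign(x), so the convex hull of the limiting gradients along sequences
# x_n -> 0 recovers [-1, 1].
def limiting_gradients(x0=0.0, n_samples=100):
    # sample gradients of |x| at differentiable points approaching x0 = 0
    grads = set()
    for k in range(1, n_samples + 1):
        for xn in (x0 + 1.0 / k, x0 - 1.0 / k):   # sequences from both sides
            grads.add(1.0 if xn > 0 else -1.0)    # d|x|/dx = sign(x), x != 0
    return min(grads), max(grads)                 # endpoints of the convex hull

lo, hi = limiting_gradients()
print([lo, hi])  # -> [-1.0, 1.0]
```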

Fig. 3. Transient behaviors of the state variables of neural network (2) with four different values of σ in Example 2.

When ϕ is locally Lipschitz in Rn, ϕ is differentiable for a.a. x ∈ Rn (in the sense of the Lebesgue measure). Then, Clarke's generalized gradient of ϕ at x ∈ Rn is equivalent to

∂ϕ(x) = K{lim_{n→∞} ∇ϕ(xn) : xn → x, xn ∉ N, xn ∉ E}

where K(·) denotes the closure of the convex hull of the corresponding set, N ⊂ Rn is an arbitrary set with measure zero, and E ⊂ Rn is the set of points at which ϕ is not differentiable.

Definition 4: A function ϕ : Rn → R that is Lipschitz near x ∈ Rn is said to be regular at x if, for any direction v ∈ Rn, the one-sided directional derivative

ϕ′(x; v) = lim_{ξ→0⁺} [ϕ(x + ξv) − ϕ(x)]/ξ

exists

and we have ϕ⁰(x; v) = ϕ′(x; v). Function ϕ is said to be regular in Rn if it is regular at any x ∈ Rn.

Regular functions are very important in the Lyapunov approach and nonsmooth analysis used in this paper and have been studied in the literature (see, e.g., [4] and [5] for references). In particular, a nonsmooth convex function on Rn is regular at any x ∈ Rn. For a finite family of functions ϕi (i = 1, 2, . . . , n) that are regular at x, we have ∂(Σ_{i=1}^{n} ϕi)(x) = Σ_{i=1}^{n} ∂ϕi(x).

Consider the following ordinary differential equation:

dx/dt = ψ(x),  x(t0) = x0.   (23)

A set-valued map is defined as

φ(x) = ∩_{ε>0} ∩_{μ(N)=0} K[ψ(B(x, ε)\N)]

where μ(N) is the Lebesgue measure of set N and B(x, ε) = {y : ‖y − x‖₂ ≤ ε}. A solution of (23) is an absolutely continuous function x(t) defined on an interval [t0, t1] (t0 ≤ t1 ≤ +∞) that satisfies x(t0) = x0 and the differential inclusion

dx/dt ∈ φ(x),  a.a. t ∈ [t0, t1].
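A classical example of such a differential inclusion is ẋ = −sign(x), whose Filippov regularization has φ(0) = [−1, 1], so the solution from x(0) = x0 > 0 reaches zero at t = x0 and stays there. A forward-Euler sketch (step size and horizon are arbitrary illustrative choices, and this is not the paper's network) shows the state entering an O(h) band around zero in finite time:

```python
# Forward-Euler simulation of the Filippov system dx/dt = -sign(x) from
# x(0) = 1.  The exact solution hits zero at t = 1; the discrete state
# reaches, and then chatters within, a band of width h around zero.
def sign(v):
    return (v > 0) - (v < 0)

h, x, t = 1e-3, 1.0, 0.0
while t < 2.0:                 # horizon well beyond the hitting time t = 1
    x -= h * sign(x)           # explicit Euler step of dx/dt = -sign(x)
    t += h
# |x| is now O(h): the trajectory reached zero in finite time and chatters
```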


In the regular case, the following chain rule is of key importance in the Lyapunov approach used in this paper.

Lemma 4 (Chain Rule) [4]: If V : Rn → R is regular at x(t) and x(t) : R → Rn is differentiable at t and Lipschitz near t, then

d/dt V(x(t)) = ξᵀẋ(t),  ∀ξ ∈ ∂V(x(t)).
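The chain rule in Lemma 4 can be checked numerically for a simple regular function. In the sketch below, V(x) = |x1| + |x2| is evaluated along the smooth trajectory x(t) = (cos t, 2 + sin t) (an arbitrary illustrative choice); both components are positive at t = 0.3, so ∂V reduces to the singleton {(1, 1)ᵀ}:

```python
import math

# Numerical check of the chain rule for the regular function
# V(x) = |x1| + |x2| along x(t) = (cos t, 2 + sin t).  At t = 0.3 both
# components are positive, so xi = (1, 1) and dV/dt = x1'(t) + x2'(t).
V = lambda x1, x2: abs(x1) + abs(x2)
x = lambda t: (math.cos(t), 2.0 + math.sin(t))

t0, h = 0.3, 1e-5
analytic = -math.sin(t0) + math.cos(t0)               # xi^T x_dot, xi = (1, 1)
numeric = (V(*x(t0 + h)) - V(*x(t0 - h))) / (2 * h)   # central difference
print(abs(analytic - numeric) < 1e-6)  # -> True
```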

Definition 5: Suppose that E ⊂ Rn is a nonempty closed convex set. The normal cone to set E at x ∈ E is defined as

N_E(x) = {v ∈ Rn : vᵀ(x − y) ≥ 0, ∀y ∈ E}.

Lemma 5 [4]: If E1, E2 ⊂ Rn are closed convex sets and satisfy 0 ∈ int(E1 − E2), then, for any x ∈ E1 ∩ E2, N_{E1∩E2}(x) = N_{E1}(x) + N_{E2}(x).

Lemma 6 [4]: Suppose that f is Lipschitz near x and attains a minimum over S at x; then 0 ∈ ∂f(x) + N_S(x).

APPENDIX II

Proof of Lemma 2: For any x ∈ S2\S1, we have gi(x) − gi(x̂) > −gi(x̂) > 0 for i ∈ I+(x), where x̂ ∈ int(S1) ∩ S2. If I0 ≠ ∅, then gi(x) − gi(x̂) = −gi(x̂) > 0 for i ∈ I0(x). Since gi(x) is convex on S2, for any γi(x) ∈ ∂gi(x), one gets γi(x)ᵀ(x − x̂) ≥ gi(x) − gi(x̂). Since ĝ = min_{1≤i≤p}(−gi(x̂)) > 0, it follows that γi(x)ᵀ(x − x̂) > ĝ for i ∈ I+(x) and γi(x)ᵀ(x − x̂) ≥ ĝ for i ∈ I0(x).

Since x, x̂ ∈ S2, according to Lemma 1, it follows that Px = q and Px̂ = q. Thus, for any i ∈ I+(x), one gets

(x − x̂)ᵀ(I − P)γi(x) = (x − x̂)ᵀγi(x) − (x − x̂)ᵀPγi(x) = (x − x̂)ᵀγi(x) > ĝ.

Similarly, for i ∈ I0(x), we have (x − x̂)ᵀ(I − P)γi(x) ≥ ĝ. For any x ∈ S2\S1 and ξ ∈ ∂D(x), ξ = Σ_{i∈I+(x)} γi(x) if I0 = ∅ and ξ = Σ_{i∈I+(x)} γi(x) + Σ_{i∈I0(x)} γi(x) if I0 ≠ ∅. Thus

(x − x̂)ᵀ(I − P)ξ ≥ Σ_{i∈I+(x)} γi(x)ᵀ(I − P)(x − x̂) > Σ_{i∈I+(x)} ĝ ≥ ĝ

in which the last inequality holds since I+(x) ≠ ∅. □

Proof of Lemma 3: According to Lemma 2, for any x ∈ (B(x̂, r) ∩ S2)\S1, we have (x − x̂)ᵀ(I − P)ξ > ĝ, where ξ ∈ ∂D(x). Since (x − x̂)ᵀ(I − P)ξ ≤ ‖x − x̂‖₂‖(I − P)ξ‖₂ ≤ r‖(I − P)ξ‖₂, it follows that ‖(I − P)ξ‖₂ > ĝ/r. □

REFERENCES

[1] J. Aubin and A. Cellina, Differential Inclusions: Set-Valued Maps and Viability Theory. New York: Springer-Verlag, 1984.
[2] W. Bian and X. Xue, "Subgradient-based neural networks for nonsmooth nonconvex optimization problems," IEEE Trans. Neural Netw., vol. 20, no. 6, pp. 1024–1038, Jun. 2009.
[3] A. Bouzerdoum and T. Pattison, "Neural network for quadratic optimization with bound constraints," IEEE Trans. Neural Netw., vol. 4, no. 2, pp. 293–304, Mar. 1993.
[4] F. Clarke, Optimization and Nonsmooth Analysis. New York: Wiley, 1983.

[5] A. Filippov, Differential Equations With Discontinuous Righthand Sides, ser. Mathematics and Its Applications (Soviet Series). Boston, MA: Kluwer, 1988.
[6] M. Forti, M. Grazzini, P. Nistri, and L. Pancioni, "Generalized Lyapunov approach for convergence of neural networks with discontinuous or non-Lipschitz activations," Phys. D, vol. 214, no. 1, pp. 88–99, Feb. 2006.
[7] M. Forti, P. Nistri, and M. Quincampoix, "Generalized neural network for nonsmooth nonlinear programming problems," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 9, pp. 1741–1754, Sep. 2004.
[8] M. Forti, P. Nistri, and M. Quincampoix, "Convergence of neural networks for programming problems via a nonsmooth Łojasiewicz inequality," IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1471–1486, Nov. 2006.
[9] X. Hu and J. Wang, "Solving pseudomonotone variational inequalities and pseudoconvex optimization problems using the projection neural network," IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1487–1499, Nov. 2006.
[10] X. Hu and J. Wang, "Design of general projection neural networks for solving monotone linear variational inequalities and linear and quadratic optimization problems," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 37, no. 5, pp. 1414–1421, Oct. 2007.
[11] X. Hu and B. Zhang, "A new recurrent neural network for solving convex quadratic programming problems with an application to the k-winners-take-all problem," IEEE Trans. Neural Netw., vol. 20, no. 4, pp. 654–664, Apr. 2009.
[12] M. Kennedy and L. Chua, "Neural networks for nonlinear programming," IEEE Trans. Circuits Syst., vol. 35, no. 5, pp. 554–562, May 1988.
[13] G. Li, S. Song, C. Wu, and Z. Du, "A neural network model for non-smooth optimization over a compact convex subset," in Proc. 3rd ISNN, vol. 3971, Springer LNCS, 2006, pp. 344–349.
[14] Q. Liu and J. Cao, "A recurrent neural network based on projection operator for extended general variational inequalities," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 40, no. 3, pp. 928–938, Jun. 2010.
[15] Q. Liu and J. Wang, "A one-layer recurrent neural network with a discontinuous activation function for linear programming," Neural Comput., vol. 20, no. 5, pp. 1366–1383, May 2008.
[16] Q. Liu and J. Wang, "A one-layer recurrent neural network with a discontinuous hard-limiting activation function for quadratic programming," IEEE Trans. Neural Netw., vol. 19, no. 4, pp. 558–570, Apr. 2008.
[17] Q. Liu and J. Wang, "A one-layer recurrent neural network for non-smooth convex optimization subject to linear equality constraints," in Proc. 15th ICONIP, vol. 5507, Springer LNCS, 2009, pp. 1003–1010.
[18] S. Liu and J. Wang, "A simplified dual neural network for quadratic programming with its KWTA application," IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1500–1510, Nov. 2006.
[19] W. Lu and T. Chen, "Dynamical behaviors of delayed neural network systems with discontinuous activation functions," Neural Comput., vol. 18, no. 3, pp. 683–708, Mar. 2006.
[20] C. Maa and M. Shanblatt, "Linear and quadratic programming neural network analysis," IEEE Trans. Neural Netw., vol. 3, no. 4, pp. 580–594, Jul. 1992.
[21] R. Rockafellar, "Linear–quadratic programming and optimal control," SIAM J. Control Optim., vol. 25, no. 3, pp. 781–814, May 1987.
[22] D. Tank and J. Hopfield, "Simple neural optimization networks: An A/D converter, signal decision circuit, and a linear programming circuit," IEEE Trans. Circuits Syst., vol. CAS-33, no. 5, pp. 533–541, May 1986.
[23] Q. Tao, X. Liu, and X. Cui, "A linear optimization neural network for associative memory," Appl. Math. Comput., vol. 171, no. 2, pp. 1119–1128, Dec. 2005.
[24] J. Wang, "Analysis and design of a recurrent neural network for linear programming," IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 40, no. 9, pp. 613–618, Sep. 1993.
[25] J. Wang, "A deterministic annealing neural network for convex programming," Neural Netw., vol. 7, no. 4, pp. 629–641, 1994.
[26] J. Wang, "Primal and dual neural networks for shortest-path routing," IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 28, no. 6, pp. 864–869, Nov. 1998.
[27] Y. Xia, "A new neural network for solving linear and quadratic programming problems," IEEE Trans. Neural Netw., vol. 7, no. 6, pp. 1544–1548, Nov. 1996.
[28] Y. Xia, G. Feng, and J. Wang, "A recurrent neural network with exponential convergence for solving convex quadratic program and related linear piecewise equations," Neural Netw., vol. 17, no. 7, pp. 1003–1015, Sep. 2004.
[29] Y. Xia, G. Feng, and J. Wang, "A primal–dual neural network for online resolving constrained kinematic redundancy in robot motion control," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 35, no. 1, pp. 54–64, Feb. 2005.

[30] Y. Xia, H. Leung, and J. Wang, "A projection neural network and its application to constrained optimization problems," IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 49, no. 4, pp. 447–458, Apr. 2002.
[31] Y. Xia and J. Wang, "A general projection neural network for solving monotone variational inequalities and related optimization problems," IEEE Trans. Neural Netw., vol. 15, no. 2, pp. 318–328, Mar. 2004.
[32] Y. Xia and J. Wang, "A one-layer recurrent neural network for support vector machine learning," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 2, pp. 1261–1269, Apr. 2004.
[33] Y. Xia and J. Wang, "A recurrent neural network for nonlinear convex optimization subject to nonlinear inequality constraints," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 7, pp. 1385–1394, Jul. 2004.
[34] X. Xue and W. Bian, "Subgradient-based neural networks for nonsmooth convex optimization problems," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 8, pp. 2378–2391, Sep. 2008.
[35] Y. Yang and J. Cao, "Solving quadratic programming problems by delayed projection neural network," IEEE Trans. Neural Netw., vol. 17, no. 6, pp. 1630–1634, Nov. 2006.
[36] S. Zhang and A. Constantinides, "Lagrange programming neural networks," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 39, no. 7, pp. 441–452, Jul. 1992.

Qingshan Liu (S’07–M’08) received the B.S. degree in mathematics from Anhui Normal University, Wuhu, China, in 2001, the M.S. degree in applied mathematics from Southeast University, Nanjing, China, in 2005, and the Ph.D. degree in automation and computer-aided engineering from The Chinese University of Hong Kong, Shatin, Hong Kong, in 2008. In 2008, he joined the School of Automation, Southeast University. From August to November 2009, he was a Senior Research Associate in the Department of Manufacturing Engineering and Engineering Management, City University of Hong Kong, Kowloon, Hong Kong. From February to August 2010, he was a Postdoctoral Fellow in the Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong. He is currently an Associate Professor in the School of Automation, Southeast University. His current research interests include optimization theory and applications, artificial neural networks, computational intelligence, and nonlinear systems.


Jun Wang (S’89–M’90–SM’93–F’07) received the B.S. degree in electrical engineering and the M.S. degree in systems engineering from Dalian University of Technology, Dalian, China, in 1982 and 1985, respectively, and the Ph.D. degree in systems engineering from Case Western Reserve University, Cleveland, OH, in 1991. He held various academic positions at Dalian University of Technology, Case Western Reserve University, and the University of North Dakota, Grand Forks. He also held various short-term visiting positions at the U.S. Air Force Armstrong Laboratory, Dayton, OH, in 1995; RIKEN Brain Science Institute, Wako, Japan, in 2001; the Universite Catholique de Louvain, Ottignies, Belgium, in 2001; the Chinese Academy of Sciences, Beijing, China, in 2002; Huazhong University of Science and Technology, Hubei, China, in 2006–2007; and Shanghai Jiao Tong University, Shanghai, China, as a Cheung Kong Chair Professor in 2008–2011. He is currently a Professor in the Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong. His current research interests include neural networks and their applications. Prof. Wang has been an Associate Editor of the IEEE T RANSACTIONS ON S YSTEMS , M AN , AND C YBERNETICS —PART B since 2003 and an Editorial Advisory Board Member of the International Journal of Neural Systems since 2006. He also served as an Associate Editor of the IEEE T RANSACTIONS ON N EURAL N ETWORKS from 1999 to 2009 and the IEEE T RANSACTIONS ON S YSTEMS , M AN , AND C YBERNETICS —PART C from 2002 to 2005. He was a Guest Editor of special issues of the European Journal of Operational Research in 1996, International Journal of Neural Systems in 2007, and Neurocomputing in 2008. 
He has organized several international conferences such as the 13th International Conference on Neural Information Processing in 2006, of which he was the General Chair, and the 2008 IEEE World Congress on Computational Intelligence held in Hong Kong. He served as the President of the Asia Pacific Neural Network Assembly in 2006. In addition, he has served on many IEEE committees such as the Fellows Committee. He is an IEEE Distinguished Lecturer for 2010–2012. He was the recipient of the Research Excellence Award from The Chinese University of Hong Kong for 2008–2009 and the Shanghai Natural Science Award (first class) for 2009.