
J Optim Theory Appl DOI 10.1007/s10957-012-0258-4

A New Steepest Descent Differential Inclusion-Based Method for Solving General Nonsmooth Convex Optimization Problems

Alireza Hosseini · S.M. Hosseini

Received: 16 April 2012 / Accepted: 21 December 2012 © Springer Science+Business Media New York 2013

A. Hosseini · S.M. Hosseini (✉)
Department of Mathematics, Tarbiat Modares University, P.O. Box 14115-175, Tehran, Iran
e-mail: [email protected]

A. Hosseini
e-mail: [email protected]

Abstract In this paper, we investigate a steepest descent neural network for solving general nonsmooth convex optimization problems. Convergence to the optimal solution set is proved analytically. We apply the method to several numerical tests, which confirm the theoretical results and the performance of the proposed neural network.

Keywords Steepest descent neural network · Differential inclusion-based methods · General nonsmooth convex optimization · Convergence of trajectories

1 Introduction

Nonsmooth optimization problems belong to an important category of applied problems in industry, engineering, and economics, and different methods have been introduced for solving them since the 1960s. These problems are important because of their scientific and engineering applications (see [1–3] and the references therein), and they are difficult to solve, even in the unconstrained case. Several algorithms for solving such problems are available, such as subgradient methods [4], cutting plane methods [5], analytic center cutting plane methods [6], bundle methods [7–9], and differential inclusion-based methods [10–13]. Most of these methods are feasible methods in the sense that, for solving a constrained optimization problem, the initial condition must be chosen from inside the feasible region; for more details see, for example, [14] and [15].
On the other hand, some versions of bundle methods (see, for example, [16, 17]), as well as some differential inclusion-based methods, such as the methods of [10, 11] and the one proposed here, can deal with infeasible starting points. In these methods, it is possible to choose the initial condition from outside the feasible region while still retaining convergence. This is one of the most important features distinguishing these methods from some other conventional ones, and it is also a reason for authors to develop methods of this class for solving optimization problems.

Differential equation-based and differential inclusion-based methods for solving optimization problems can be grouped into four classes: gradient projection methods [12, 13, 18, 19], primal–dual methods [20, 21], Lagrange multiplier rule methods [22], and penalty-based methods [10, 11], possibly combined with one another. Gradient projection-based methods are among those that have received a lot of attention in recent years. In these methods, the right-hand side of the designed differential inclusion is a projection of a subgradient function (which depends on the objective function) onto the feasible solution set. While finding projections onto simple sets in R^n (e.g., those given by box constraints) is an easy task, it is more challenging for complex sets. Although primal–dual methods are efficient for solving some special problems, such as quadratic programming problems, they are not efficient for more general optimization problems. Lagrange multiplier methods have been designed for problems with nonsmooth objective functions and special constraints, e.g., linear ones. Among these classes, the penalty-based methods are the most robust.

One of the basic methods for a general form of convex optimization problem is the neural network introduced by Forti et al., which is a penalty-based method. This method has several valuable properties: first, its global convergence is guaranteed; second, the initial condition can be selected from outside the feasible region; and, third, convergence to the feasible region in finite time is established. On the other hand, the selection of an appropriate penalty parameter is a limitation of this method, since convergence is guaranteed only for a sufficiently large penalty parameter. Furthermore, the performance of the method for different values of the penalty parameter cannot be anticipated beforehand; for example, larger values do not guarantee better performance. In this paper, we introduce a new, penalty-parameter-free neural network for solving general nonsmooth convex optimization problems, based on a different analytical formulation. In this neural network, the calculation of projections is not required, and there is no need to increase the dimension of the problem.

The rest of the paper is organized as follows. In Sect. 2, we formulate the problem and describe some concepts that are used in the rest of the paper. In Sect. 3, some results related to the new neural network are first presented, and then global convergence to a solution of general convex problems is established theoretically for this steepest descent differential inclusion-based neural network. In Sect. 4, a more practical form of the designed neural network is presented and its nonlinear circuit implementation is shown. Finally, in Sect. 5, some typical examples are given, illustrating the effectiveness and the performance of the proposed neural network.

2 The Problem and Preliminaries

In this section, first, the problem is stated. Then, some definitions and preliminary results concerning nonsmooth optimization, set-valued maps, and nonsmooth analysis are provided; these will be used in the rest of the paper. Furthermore, Forti et al.'s method is briefly explained.

Let us consider the constrained optimization problem

    min f(x)   s.t.   G(x) ≤ 0.                                        (1)

Here, x ∈ R^n and G = (g_1, g_2, ..., g_m)^T : R^n → R^m is an m-dimensional vector-valued function of n variables. Furthermore, f, g_i : R^n → R, i = 1, 2, ..., m, are general convex Lipschitz functions. We assume that problem (1) is feasible and generally nonsmooth, i.e., the functions involved in this problem can be nondifferentiable, and that the optimal solution set is bounded. We will analyze the existence of solutions of a newly proposed differential inclusion and their convergence to the optimal solution set of problem (1).

Suppose that Ω is the feasible region of problem (1), Ω^* is the optimal solution set, Ω^0 is the interior of Ω, ∂Ω is the boundary of Ω, and Ω' is the complement of Ω. We use b(x^0, l) to denote the open ball with center x^0 and radius l, and μ(A) denotes the Lebesgue measure of a set A ⊆ R.

Definition 2.1 (Upper semicontinuity, [23]) Let X and Y be normed spaces. We say that the set-valued map F : X ⇒ Y is upper semicontinuous (USC) at x^0 iff, given ε > 0, there exists δ > 0 such that F(x^0 + δb(0, 1)) ⊆ F(x^0) + εb(0, 1). We say that F is USC iff it is USC at every x^0 ∈ X.

Definition 2.2 [23] The vector v ∈ R^n is a subgradient of a convex function f : R^n → R at x ∈ R^n iff, for each y ∈ R^n, f(y) ≥ f(x) + v^T(y − x). The set of all subgradients of f at x is denoted by ∂f(x).

Definition 2.3 [23] Assume that F : R^n ⇒ R^n is a set-valued map; then a vector function x(·) : [0, ∞[ → R^n is called a solution of ẋ(t) ∈ F(x(t)) on [t_1, t_2] iff x(·) is absolutely continuous on [t_1, t_2] and ẋ(t) ∈ F(x(t)) for a.e. t ∈ [t_1, t_2].

Lemma 2.1 [24] Suppose that F : R^n → R is a locally Lipschitz function and semismooth at x ∈ R^n; then, for each h ∈ R^n and v ∈ ∂F(x),

    F(x + h) − F(x) = v^T h + o(‖h‖).

Note that, according to [24] and the references therein, smooth, convex, and concave functions F : R^n → R are semismooth at each x ∈ R^n (see the definition of semismooth functions in [25]).

2.1 Forti et al.'s Method

One of the differential inclusion-based methods with appropriate properties is the method introduced by Forti et al. [10], which is based on the theory of the penalty method. They proposed a neural network for solving nonsmooth optimization problems; a brief explanation of this method is given here. Consider the following barrier function:

    B(x) = 0                        if x ∈ Ω,
           Σ_{i ∈ I^+(x)} g_i(x)    if x ∉ Ω,

where, for a given x ∈ R^n,

    I^+(x) := {i ∈ {1, 2, ..., m} : g_i(x) > 0},

and the following differential inclusion for solving problem (1):

    ẋ(t) ∈ −∂f(x(t)) − σ∂B(x(t)),                                      (2)

where

    ∂B(x) = Σ_{i ∈ I^0(x)} [0, 1]∂g_i(x)                              if x ∈ ∂Ω,
            {0}                                                        if x ∈ Ω^0,
            Σ_{i ∈ I^+(x)} ∂g_i(x) + Σ_{i ∈ I^0(x)} [0, 1]∂g_i(x)      if x ∉ Ω and I^0(x) ≠ ∅,     (3)
            Σ_{i ∈ I^+(x)} ∂g_i(x)                                     if x ∉ Ω and I^0(x) = ∅,

and I^0(x) := {i ∈ {1, 2, ..., m} : g_i(x) = 0}.

Furthermore, suppose that there exists some x^0 ∈ Ω such that G(x^0) < 0, and consider a sufficiently large ball b(x^0, R) with center x^0 and radius R such that Ω ⊂ b(x^0, R) and

    M_f(R) := max_{x ∈ b(x^0,R)\Ω} max_{υ ∈ ∂f(x)} ‖υ‖_2 < ∞.

Moreover, let

    f_m := − max_{i ∈ {1,...,m}} g_i(x^0) > 0,

and

    M_B(R) := max_{x ∈ b(x^0,R)} ‖x‖_2 / (f_m + B(x)) < ∞.

Then, we have the following result.

Theorem 2.1 [10] Suppose that E_σ is the set of equilibrium points of the differential inclusion (2) and that σ > Γ(R) := M_f(R) M_B(R). Then E_σ = Ω^*. Furthermore, for any initial condition x_0 ∈ b(x^0, R), the differential inclusion (2) has a unique solution x(t) converging to the optimal solution set of problem (1).
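As an aside for readers who wish to experiment numerically with the penalty-based dynamics (2), the following minimal Python sketch shows one explicit Euler discretization. It is not the circuit model analyzed in [10]; the subgradient oracles, the step size h, and the rule of assigning the coefficient 0 to active but non-violated constraints (one admissible selection from (3)) are illustrative assumptions.

```python
import numpy as np

def penalty_euler_step(x, f_subgrad, g_funcs, g_subgrads, sigma, h):
    """One explicit Euler step x <- x - h*(v_f + sigma*v_B) for the
    penalty inclusion (2).  v_f is an arbitrary element of the
    subdifferential of f at x; v_B sums subgradients of the violated
    constraints, which is one admissible element of (3) (constraints
    with g_i(x) = 0 receive the coefficient 0 from the interval [0, 1])."""
    v_f = f_subgrad(x)
    v_B = np.zeros_like(x)
    for g_i, g_i_subgrad in zip(g_funcs, g_subgrads):
        if g_i(x) > 0:                    # constraint i is violated
            v_B = v_B + g_i_subgrad(x)
    return x - h * (v_f + sigma * v_B)
```

Note that the behavior of such a scheme still depends on the choice of σ: Theorem 2.1 guarantees convergence only for σ > Γ(R), which is precisely the tuning issue that motivates the penalty-parameter-free method of Sect. 3.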


3 A New Differential Inclusion-Based Method

In this section, a new differential inclusion-based method, called the steepest descent neural network, is introduced and analyzed theoretically. Unlike the penalty-based methods, the convergence of this method is not controlled by any penalty parameter. We first introduce a differential inclusion corresponding to problem (1) and then show theoretically that the existence of its solutions and the convergence of the trajectories to the optimal solution set of problem (1) are guaranteed.

3.1 Definition of a New Differential Inclusion

Without any loss of generality, problem (1) can be replaced by the following problem:

    min f(x)   s.t.   g(x) ≤ 0,                                        (4)

where g(x) := max{g_i(x) : i = 1, 2, ..., m}. Obviously, by the convexity of the g_i's, g is also a convex function. We propose the following differential inclusion:

    ẋ(t) ∈ S(x(t)),   x(0) = x_0,                                      (5)

where

    S(x) := −∂g(x)                                            if x ∈ Ω',
            −∂f(x)                                            if x ∈ Ω^0,     (6)
            {−α∂g(x) − β∂f(x) : α + β = 1, α, β ≥ 0}           if x ∈ ∂Ω,

and x_0 ∈ R^n is the initial condition. We then prove that this differential inclusion leads to our new method.
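To make the selection in (6) concrete, the following Python sketch returns one admissible element of S(x). The oracles f_subgrad and g_subgrad, the tolerance used to detect the boundary numerically, and the choice α = β = 1/2 on ∂Ω are assumptions made for illustration; any convex combination is admissible there. Since g(x) = max{g_i(x)}, a subgradient of any g_i attaining the maximum is a valid element of ∂g(x).

```python
def S_element(x, f_subgrad, g_subgrad, g, tol=1e-9):
    """Return one admissible velocity from the set S(x) defined in (6).

    f_subgrad(x) and g_subgrad(x) are user-supplied oracles returning an
    arbitrary subgradient of f and g at x, with g(x) = max_i g_i(x).
    The tolerance tol decides (numerically) whether x is treated as a
    boundary point; there we arbitrarily take alpha = beta = 1/2."""
    gx = g(x)
    if gx > tol:        # x lies outside the feasible region
        return -g_subgrad(x)
    if gx < -tol:       # x lies in the interior of the feasible region
        return -f_subgrad(x)
    return -0.5 * g_subgrad(x) - 0.5 * f_subgrad(x)   # x is (numerically) on the boundary
```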

3.2 Existence of Solutions and Convergence

In this subsection, we show that the right-hand side of the differential inclusion (5) is USC, which guarantees the existence of a solution.

Lemma 3.1 S(x), as defined in (6), is a USC set-valued map on R^n.

Proof Suppose ε > 0 and x ∈ R^n; then three cases can happen.

Case (i): x ∈ Ω^0. Then there exists some δ_1 > 0 such that x + δ_1 b(0, 1) ⊆ Ω^0. By the upper semicontinuity of −∂f(x) [26], there exists a δ_2 > 0 such that

    −∂f(x + δ_2 b(0, 1)) ⊆ −∂f(x) + εb(0, 1).

Then, with δ = min{δ_1, δ_2}, we obtain

    S(x + δb(0, 1)) ⊆ S(x) + εb(0, 1).


Case (ii): x ∈ ∂Ω. We take δ_1, δ_2 such that

    −∂f(x + δ_1 b(0, 1)) ⊆ −∂f(x) + εb(0, 1)

and

    −∂g(x + δ_2 b(0, 1)) ⊆ −∂g(x) + εb(0, 1).

Obviously, these values exist because of the upper semicontinuity of −∂f and −∂g. Assume δ = min{δ_1, δ_2}; then

    S(x + δb(0, 1)) ⊆ {−α∂f(x + δb(0, 1)) − β∂g(x + δb(0, 1)) : α + β = 1, α, β ≥ 0},

because, from (6), if x + δb(0, 1) ⊆ Ω', then S(x + δb(0, 1)) = −∂g(x + δb(0, 1)), that is, we can take α = 0 and β = 1; similarly, if x + δb(0, 1) ⊆ Ω^0, then S(x + δb(0, 1)) = −∂f(x + δb(0, 1)), that is, we can take α = 1 and β = 0; and for points of x + δb(0, 1) lying on ∂Ω the result is obvious.

It can be shown that S(x + δb(0, 1)) ⊆ S(x) + εb(0, 1). To prove this, let y ∈ S(x + δb(0, 1)); then there exist α, β with α + β = 1, α, β ≥ 0, and γ ∈ ∂f(x + δb(0, 1)), η ∈ ∂g(x + δb(0, 1)) such that y = −αγ − βη. Hence,

    −γ ∈ −∂f(x) + εb(0, 1),   −η ∈ −∂g(x) + εb(0, 1).

Therefore, y ∈ −α∂f(x) − β∂g(x) + εb(0, 1) ⊆ S(x) + εb(0, 1), which shows

    S(x + δb(0, 1)) ⊆ S(x) + εb(0, 1).

Case (iii): x ∈ Ω'. The proof of this case is similar to that of Case (i), with g used instead of f. □

Proposition 3.1 The differential inclusion (5), used to solve the nonsmooth problem (1), has a local solution for any initial condition x_0 ∈ R^n.

Proof From Lemma 3.1, S is a USC set-valued map with nonempty, compact, and convex values. Hence, the local existence of a solution x(t) of (5) on [0, t_1] (t_1 > 0) with x(0) = x_0 is a straightforward consequence of Theorem 1 on p. 77 of [27]. □


To continue our analytical investigation, we assume that one of the following two assumptions holds.

Assumption 3.1 There exists an optimal solution in the interior of Ω.

Assumption 3.2 All the optimal solutions are located on the boundary of Ω.

The proof of convergence under Assumption 3.1 is similar to the analysis of [28], although a different differential inclusion is considered; thus, we do not give the details of that proof. Under Assumption 3.2, however, the analysis of [28] cannot be applied to investigate the convergence to an element of the optimal solution set. Hence, in the rest of this paper, with an essentially different analysis, we prove that the newly proposed method converges to an element of the optimal solution set for general convex nonsmooth optimization problems.

For an arbitrary x^* ∈ Ω^*, let us consider V(x, x^*) = (1/2)‖x − x^*‖^2. Then, if x(t) is a solution trajectory of the differential inclusion (5), it obviously satisfies

    dV(x(t), x^*)/dt = ẋ(t)^T (x(t) − x^*) = D(x(t), x^*),             (7)

where

    D(x, x^*) := −γ^T (x − x^*)                 if x ∈ Ω^0,
                 −(αγ^T + βη^T)(x − x^*)        if x ∈ ∂Ω,              (8)
                 −η^T (x − x^*)                 if x ∈ Ω',

for some γ ∈ ∂f(x), η ∈ ∂g(x), and α + β = 1, α, β ≥ 0, where α and β depend on x.

Lemma 3.2 Suppose that x^* ∈ Ω^*. Then:
(a) D(x, x^*) ≤ 0 for any x ∈ R^n;
(b) if Λ ⊂ R^n is a compact set such that Λ ∩ (Ω^* ∪ ∂Ω) = ∅, then there exists some τ > 0 such that, for any x ∈ Λ, D(x, x^*) < −τ.

Proof (a) The proof is similar to the proof of Lemma 3.5 of [28].
(b) The proof is similar to the proof of Lemma 3.7 of [28]. □

Now, a straightforward corollary of Lemma 3.2(a) follows.

Corollary 3.1 Assume {t_n} ⊆ R_+, t_n ↑ ∞, and lim_{n→∞} x(t_n) = x^* ∈ Ω^*; then lim_{t→∞} x(t) = x^*.

Theorem 3.1 A solution of the differential inclusion (5) with any initial condition x_0 exists globally.

Proof The conclusion follows from Proposition 3.1 and Lemma 3.2(a). □

Now, we prove the convergence of the trajectories of the differential inclusion (5) to the optimal solution set under each of the above-mentioned Assumptions 3.1 and 3.2 separately.


3.3 Convergence Under Assumption 3.1

Theorem 3.2 Suppose that x_0 ∈ R^n is arbitrary and Ω^* is bounded. Then, under Assumption 3.1, the trajectory x(t) of the differential inclusion (5) converges to a point in Ω^*.

Proof The proof is similar to the proof of Theorem 3.9 of [28]. □

In the rest of this section, we suppose that Assumption 3.2 holds and that x(t) is an arbitrary solution trajectory of the differential inclusion (5).

3.4 Convergence Under Assumption 3.2

In this subsection, when all the optimal solutions are on the boundary of the feasible region, we need, as mentioned before, an analysis essentially different from that of [28] to prove the convergence of the differential inclusion (5) to the optimal solution set of problem (1). It is shown that, no matter whether the constraints are strictly convex or not, convergence to the optimal solution set is guaranteed for general nonsmooth convex optimization problems.

Lemma 3.3 For any positive real number c, the Lebesgue measure of

    H_c := {t ≥ 0 : |g(x(t))| ≥ c}

is finite.

Proof (By contradiction) Suppose that the result of the lemma does not hold; thus, there exists some c > 0 such that μ(H_c) = ∞. Let us define

    A := {x ∈ R^n : |g(x)| ≥ c}.

Then, from the continuity of |g|, A is a closed subset of R^n. From Lemma 3.2(a), there exists some r > 0 such that ‖x(t)‖ ≤ r for all t ≥ 0. Now, define Λ as follows:

    Λ := A ∩ {x ∈ R^n : ‖x‖ ≤ r}.

Obviously, Λ is compact and Λ ∩ (Ω^* ∪ ∂Ω) = ∅; therefore, from Lemma 3.2(b), there exists some τ > 0 such that

    D(x, x^*) < −τ,   ∀x ∈ Λ.

Thus, letting t → ∞ in the following consequence of the fundamental theorem of calculus for Lebesgue integrals,

    V(x(t), x^*) = V(x(0), x^*) + ∫_0^t (dV(x(s), x^*)/ds) ds,

and using Lemma 3.2(a), which implies dV(x(s), x^*)/ds ≤ 0, we obtain (with H̄_c := [0, ∞[ \ H_c)

    lim_{t→∞} V(x(t), x^*) = V(x(0), x^*) + ∫_{H̄_c} (dV(x(s), x^*)/ds) ds + ∫_{H_c} (dV(x(s), x^*)/ds) ds
                           ≤ V(x(0), x^*) + ∫_{H_c} (−τ) ds = V(x(0), x^*) − τμ(H_c) = −∞,

which is a contradiction. □

Remark 1 Note that, in Lemma 3.3, we can replace the set H_c with the set

    {t ≥ 0 : d(x(t), ∂Ω) ≥ c},

where d is the distance function.

Lemma 3.4 An absolutely continuous function g : R → R is Lipschitz continuous iff it has a bounded first derivative for almost all x ∈ R, that is, |g'(x)| < k for some k > 0. Here, k is a Lipschitz constant of g.

Lemma 3.5 x(t) is a Lipschitz continuous function of t.

Proof By Lemma 3.2(a), we can assume that there exists a compact set C ⊂ R^n such that x(t) ∈ C for each t ≥ 0. Suppose t ≥ 0 is fixed. From the definition of the differential inclusion (5), there exist some α, β ≥ 0, α + β = 1, such that ẋ(t) = −αγ − βη, where γ ∈ ∂f(x(t)) and η ∈ ∂g(x(t)). From the properties of the generalized derivative and the Lipschitz continuity of f and g, there exist some M, N > 0 such that, for each x ∈ C, γ ∈ ∂f(x), and η ∈ ∂g(x),

    ‖γ‖ < M,   ‖η‖ < N.

Therefore,

    ‖ẋ(t)‖ < M + N.

Hence, by Lemma 3.4, x(t) is Lipschitz continuous with Lipschitz constant M + N. □

Lemma 3.6 For x^* ∈ Ω^*, let us consider V(t) = V(x(t), x^*). Then, for any positive real number c, the Lebesgue measure of the set

    A_c := {t ≥ 0 : V̇(t) ≤ −c} = {t ≥ 0 : |V̇(t)| ≥ c}

is finite.


Proof From standard theorems of real analysis and the Lipschitz continuity of V, V̇ is Lebesgue integrable and thus Lebesgue measurable, and, for each c > 0, the set A_c is measurable. Suppose that the result of the lemma is not true; then there exists a c > 0 such that μ(A_c) = ∞. Then, from the following equation for Lebesgue integrals,

    V(x(t), x^*) = V(x(0), x^*) + ∫_0^t (dV(x(s), x^*)/ds) ds,

we obtain, as t → ∞ (with Ā_c := [0, ∞[ \ A_c),

    lim_{t→∞} V(x(t), x^*) = V(x(0), x^*) + ∫_{Ā_c} (dV(x(s), x^*)/ds) ds + ∫_{A_c} (dV(x(s), x^*)/ds) ds
                           ≤ V(x(0), x^*) + ∫_{A_c} (−c) ds = −∞.

However, this is a contradiction. □

In the rest of this subsection, we assume that, in addition to Assumption 3.2, the following assumption is also true.

Assumption 3.3 There exists some x^0 ∈ Ω such that g(x^0) < 0.

Lemma 3.7 Suppose that x(t) converges in finite time to a point x̄; then x̄ ∈ Ω^*.

Proof By Lemma 3.3, there exists a sequence {t_n} such that t_n ↑ ∞ and x(t_n) → x̄ ∈ ∂Ω. Thus, from its convergence in finite time, we conclude that x(t) → x̄. Therefore, for some t̄ > 0, we have x(t) = x̄ for all t ≥ t̄ and ẋ(t̄) = 0. From the definition of x(t), we have ẋ(t̄) ∈ −α∂f(x̄) − β∂g(x̄). Then

    0 ∈ α∂f(x̄) + β∂g(x̄),

where α + β = 1, α, β ≥ 0. Now, three cases might happen.

(i) α, β ≠ 0. In this case,

    0 ∈ ∂f(x̄) + (β/α)∂g(x̄),

and hence there exist γ ∈ ∂f(x̄) and η ∈ ∂g(x̄) such that γ + (β/α)η = 0. Now, for a proof by contradiction, suppose that x̄ is not an optimal solution. Then there exists some x̃ ∈ Ω^* such that f(x̄) > f(x̃). However, from the convexity of f and g,

    0 > f(x̃) − f(x̄) ≥ γ^T (x̃ − x̄)   and   0 ≥ g(x̃) − g(x̄) ≥ η^T (x̃ − x̄).


Thus,

    (γ + (β/α)η)^T (x̃ − x̄) < 0,

which is a contradiction with γ + (β/α)η = 0. Thus, x̄ is an optimal solution.

(ii) β = 0. In this case, 0 ∈ ∂f(x̄); hence, x̄ is an optimal solution of the unconstrained problem. However, from Lemma 3.3, x̄ ∈ ∂Ω is feasible and, thus, it is an optimal solution of the constrained problem (1).

(iii) α = 0. Here, 0 ∈ ∂g(x̄); that is, x̄ is a minimizer of the convex function g on R^n. However, from Assumption 3.3 and g(x̄) = 0, x̄ cannot be a minimizer of g and, hence, this case cannot happen. □

Lemma 3.8 Assume that x(t) does not converge in finite time and that, for a sequence {t_n}, t_n ↑ ∞, V(x(t), x^*) is differentiable at each t_n and V̇(x(t_n), x^*) → 0, for some x^* ∈ Ω^*. Then, for arbitrarily small h_n, there exists a sequence of elements s_n ∈ ]t_n, t_n + h_n[ such that V(x(t), x^*) is differentiable at each s_n and V̇(x(s_n), x^*) → 0.

Proof From the absolute continuity of x(t), it can be seen that V(x(t), x^*) is also absolutely continuous. Moreover,

    |V̇(x(t), x^*)| = |ẋ(t)^T (x(t) − x^*)| < MR =: L,

where M is a Lipschitz constant of the Lipschitz continuous function x(t) and R is an upper bound for ‖x(t) − x^*‖, t ≥ 0. Thus, by Lemma 3.4, V(x(t), x^*) is Lipschitz continuous with Lipschitz constant L, and the Clarke generalized derivative can be defined for V at almost every t ∈ R_+. To give an equivalent definition of the generalized gradient, recall that a locally Lipschitz function U(t) is differentiable almost everywhere. Then, for such a function U(t), the generalized gradient is defined as

    ∂U(t) := conv{ lim_{n→∞} U̇(l_n) : l_n → t, l_n ∉ N, l_n ∉ Ω_t },    (9)

where Ω_t is the set of points in t + εb(0, 1), for some arbitrary ε > 0, at which U fails to be differentiable, N is an arbitrary set of measure zero, and conv{·} denotes the closure of the convex hull. Without loss of generality, we can assume that |V̇(x(t_n), x^*)| < 1/n. Now, define

    U_n(t) := V(x(t), x^*)                                    if t ≥ t_n,
              V(x(t_n), x^*) + (1/2)(u_n^− + u_n^+)(t − t_n)  if t < t_n,

where u_n^− and u_n^+ respectively denote the right-hand essential infimum and supremum of V̇(x(·), x^*) at t_n, the infimum and supremum being taken over almost all t ≥ t_n. Note that the constructed function U_n(t) is obviously Lipschitz continuous; thus, we can use the definition of the generalized gradient for this function. From (9),

    ∂U_n(t_n) = [u_n^−, u_n^+],                                        (10)

because lim_{k→∞} U̇_n(l_k), for each {l_k} such that l_k ↓ t_n, is a scalar in [u_n^−, u_n^+]. Furthermore, if t < t_n, then U_n is differentiable at t and U̇_n(t) = (1/2)(u_n^− + u_n^+). Thus, because of the compactness and convexity of ∂U_n(t_n), (10) holds. On the other hand, U̇_n(t) = V̇(x(t), x^*) for each t ∈ ]t_n, t_n + h_n[ at which V is differentiable. By Proposition 2.2.2 of [26],

    V̇(x(t_n), x^*) = D_+ U_n(t_n) ∈ ∂U_n(t_n),

where D_+ U_n(t_n) is the right-sided derivative of U_n at t_n. Now, we prove that there exists a sequence of elements s_n ∈ ]t_n, t_n + h_n[ such that |V̇(x(s_n), x^*)| < 2/n. If this were not the case, then, for each t ∈ ]t_n, t_n + h_n[ at which V is differentiable, we would have U̇_n(t) = V̇(x(t), x^*) < −2/n and, thus, by the definition of the generalized derivative of U_n(t), [u_n^−, u_n^+] ⊆ ]−∞, −2/n] would hold. Therefore,

    V̇(x(t_n), x^*) ∈ [u_n^−, u_n^+] ⊆ ]−∞, −2/n],

which is in contradiction with |V̇(x(t_n), x^*)| < 1/n. Thus, V̇(x(s_n), x^*) → 0, and the proof is complete. □

Lemma 3.9 ([29, Lemma 5.13]) Let x_1, x_2, ..., x_n be real-valued locally Lipschitz functions defined on R_+ and let g be a real-valued locally Lipschitz function defined on R^n. If x = (x_1, x_2, ..., x_n)^T is differentiable at some t̃ > 0 and either (d/dt)g(x(t̃)) or D_{ẋ(t̃)} g(x(t̃)) exists, then

    (d/dt) g(x(t̃)) = D_{ẋ(t̃)} g(x(t̃)).

Here, D_{ẋ(t̃)} g(x(t̃)) is the directional derivative of g at the point x(t̃) in the direction ẋ(t̃).

Lemma 3.10 Suppose x^* ∈ Ω^*; then there exists a sequence {t_n} ⊆ R_+, t_n ↑ ∞, such that x(t_n) → x̄ for some x̄ ∈ ∂Ω and V̇(x(t_n), x^*) → 0 as n → ∞.

Proof Suppose s_1 > 0 is arbitrary and s_n > n has been constructed inductively; we now construct s_{n+1}. From Lemmas 3.6 and 3.3, μ(A_n), μ(B_n) < ∞, where

    A_n := {t ≥ 0 : |V̇(x(t), x^*)| ≥ 1/(n+1)},   B_n := {t ≥ 0 : |g(x(t))| ≥ 1/(n+1)}.

Thus, μ(A_n ∪ B_n) < ∞ and, therefore, μ(Ā_n ∩ B̄_n ∩ {t ≥ 0}) = ∞, where Ā_n and B̄_n denote the complements of A_n and B_n. Now, we choose s_{n+1} ∈ Ā_n ∩ B̄_n with s_{n+1} > n + 1, that is,

    |V̇(x(s_{n+1}), x^*)| < 1/(n+1),   |g(x(s_{n+1}))| < 1/(n+1).

Therefore, …

    … > 0,                                                             (14)
    η^T (x̄ − x^*) ≥ g(x̄) − g(x^*) = 0.                                (15)

From (14), we see that γ ≠ 0. Furthermore, (13), (14), and (15) yield α = 0 and, thus, β_n → β = 1. Moreover, η ≠ 0, because if η = 0, then 0 = η ∈ ∂g(x̄), which is in contradiction with Assumption 3.3. Now, using {t_n}, we can introduce a sequence {u_n} that satisfies g(x(u_n)) < 0, V̇(x(u_n), x^*) → 0, and x(u_n) → x̄. To construct that sequence, from γ ≠ 0 and η ≠ 0, we can assume that

    ‖γ_n‖ > N_1,   ‖η_n‖ > N_2,                                        (16)

for some positive numbers N_1 and N_2. Note that (16) holds for the members of some subsequences of {γ_n} and {η_n}; however, for simplicity, we assume that this property is true for the main sequences. Furthermore, from the properties of the generalized gradient, there exist some positive numbers M_1 and M_2 such that

    ‖γ_n‖ < M_1,   ‖η_n‖ < M_2.                                        (17)

Now, we choose ε, δ > 0 such that

    δ, ε < N_2^2/(4M_2).                                               (18)
Since β_n → 1, there exists some k > 0 such that, for each n > k,

    β_n > (M_1 M_2 + M_2 ε + δ)/(N_2^2 + M_1 M_2).                     (19)

Obviously, by (18), (M_1 M_2 + M_2 ε + δ)/(N_2^2 + M_1 M_2) < 1. Then, by the definition of the differential inclusion (5), for s > k there exist γ_s ∈ ∂f(x(t_s)) and η_s ∈ ∂g(x(t_s)) such that ẋ(t_s) = −α_s γ_s − β_s η_s, or

    ẋ_i(t_s) = −α_s γ_{si} − β_s η_{si},   i = 1, 2, ..., n,

where α_s + β_s = 1. Thus, for each i ∈ {1, 2, ..., n}, there exists h_{si} > 0 such that, for each h < h_{si},

    x_i(t_s) − hα_s γ_{si} − hβ_s η_{si} − εh/n^2 < x_i(t_s + h) < x_i(t_s) − hα_s γ_{si} − hβ_s η_{si} + εh/n^2.

Suppose that h < (1/2) min{h_{si}}_{i=1,2,...,n} =: h̄_s^0; then there exist some ε_i, i = 1, 2, ..., n, such that |ε_i| < ε/n^2 and x_i(t_s + h) = x_i(t_s) − hα_s γ_{si} − hβ_s η_{si} + hε_i. In other words, x(t_s + h) = x(t_s) − hα_s γ_s − hβ_s η_s + hk, where k := (ε_1, ..., ε_n)^T. Suppose u = t_s + h; by the convexity of g, g is semismooth at each x^0 ∈ R^n. Thus, from the property of semismoothness, for each x^0 ∈ R^n, ζ ∈ ∂g(x^0), and ρ ∈ R^n with small norm, we have

    g(x^0 + ρ) = g(x^0) + ζ^T ρ + o(‖ρ‖).

Note that this is a straightforward consequence of Lemma 2.1 and the convexity of g. By taking x^0 := x(t_s), ζ := η_s, and ρ := −α_s hγ_s − β_s hη_s + hk, it is easily seen that, for any such h, 0 ≤ ‖−α_s hγ_s − β_s hη_s + hk‖ ≤ (M_1 + M_2 + ε)h. Moreover, we have

    g(x(u)) = g(x(t_s)) − η_s^T (α_s hγ_s + β_s hη_s − hk) + o(‖ρ‖).    (20)

This means that there exists h̄_s^1 < h̄_s^0 such that, if h < h̄_s^1, then

    g(x(u)) < g(x(t_s)) − η_s^T (α_s hγ_s + β_s hη_s − hk) + δh/2.      (21)


Furthermore, by the continuity of x(t), there exists h̄_s^2 < h̄_s^1 such that, if h < h̄_s^2, then

    ‖x(t_s) − x(t_s + h)‖ < 1/s.                                       (22)

On the other hand, from Lemma 3.8 we can conclude, without loss of generality, that there exists h_s < h̄_s^2 such that

    |V̇(x(t_s + h_s), x^*)| < 1/s.                                      (23)

Hence, by considering ρ_s := −α_s h_s γ_s − β_s h_s η_s + h_s k, where k depends on h_s, and u_s := t_s + h_s, from (20) we have

    g(x(u_s)) = g(x(t_s)) − η_s^T (α_s h_s γ_s + β_s h_s η_s − h_s k) + o(‖ρ_s‖).   (24)

From (16)–(19), it is seen that

    −η_s^T (α_s h_s γ_s + β_s h_s η_s − h_s k) < (1 − β_s)M_1 M_2 h_s − β_s N_2^2 h_s + M_2 h_s ε < −δh_s,   (25)

which implies g(x(u_s)) < 0, by (21), (24), and (25). From (23) and (22),

    ‖x(t_s) − x(u_s)‖ < 1/s,   |V̇(x(u_s), x^*)| < 1/s,

and, consequently,

    x(u_n) → x̄,   V̇(x(u_n), x^*) → 0   as n → ∞,

and the proof is now complete. □

Lemma 3.12 Let x^* ∈ Ω^*, let {t_n} be the sequence from Lemma 3.10, and let there exist a subsequence {t_{n_k}} of {t_n} such that, for each k, g(x(t_{n_k})) > 0. Then there exists a sequence {υ_n} such that υ_n ↑ ∞, g(x(υ_n)) ≤ 0, V̇(x(υ_n), x^*) → 0, and x(υ_n) → x̄ for some x̄ ∈ ∂Ω.

Proof Without loss of generality, we can suppose that g(x(t_n)) > 0 for each n. Exactly one of the following statements holds.

Statement 1. There exists some N > 0 such that, for each t > N,

    g(x(t)) > 0,   or   |V̇(x(t), x^*)| ≥ 1/N,   or   |g(x(t))| ≥ 1/N.

Statement 2. For each N > 0, there exists υ_N > N such that

    g(x(υ_N)) ≤ 0,   and   |V̇(x(υ_N), x^*)| < 1/N,   and   |g(x(υ_N))| < 1/N.


Obviously, if Statement 2 holds, then there exists a subsequence {υ_{n_k}} of {υ_n} such that υ_{n_k} ↑ ∞, g(x(υ_{n_k})) ≤ 0, V̇(x(υ_{n_k}), x^*) → 0 for some x^* ∈ Ω^*, and x(υ_{n_k}) → x̄ for some x̄ ∈ ∂Ω; hence, in this case, we are done. In the other case, when Statement 1 holds, we consider

    A := {t > N : g(x(t)) ≤ 0} ⊆ {t > N} ∩ ({t : |V̇(x(t), x^*)| ≥ 1/N} ∪ {t : |g(x(t))| ≥ 1/N}).

Then, from Lemmas 3.3 and 3.6, μ(A) < ∞ and μ(Ā ∩ {t > N}) = ∞, where Ā denotes the complement of A. Now, define E(t) := g(x(t)); then E(t) is Lipschitz continuous and, by Lemma 3.9,

    Ė(t) = D_{ẋ(t)} g(x(t)).

Suppose t ∈ Ā ∩ {t > N}; then ẋ(t) = −γ_t for some γ_t ∈ ∂g(x(t)), and, from the convexity of g and a property of semismooth functions,

    Ė(t) = lim_{h↓0} (g(x(t) − hγ_t) − g(x(t)))/h = lim_{h↓0} (−hγ_t^T γ_t + o(h))/h = −γ_t^T γ_t.

Now, it can be shown that there exists some k > 0 such that ‖γ_t‖ > k for each t ∈ Ā ∩ {t > N}. If this were not the case, there would exist sequences {ϑ_n}, {γ_n}, with ϑ_n ∈ Ā ∩ {t > N} and γ_n ∈ ∂g(x(ϑ_n)), such that x(ϑ_n) → x̃ ∈ (∂Ω ∪ Ω') and γ_n → 0; thus 0 ∈ ∂g(x̃), which is in contradiction with Assumption 3.3. Therefore, there exists k > 0 such that Ė(t) < −k^2 for each t ∈ Ā ∩ {t > N}. On the other hand, there exists a positive number l such that |g(x(t))| < l for each t > 0 (because of the continuity of g and the boundedness of x(t)). Moreover, for some Lipschitz constant M > 0 of g, we have ‖γ_t‖ < M and, thus, |Ė(t)| < M^2. Furthermore,

    g(x(t_n)) = g(x(N)) + ∫_N^{t_n} Ė(s) ds                            (26)

for each n such that t_n > N. By taking the limit of (26) as n → ∞, we get

    g(x̄) = g(x(N)) + ∫_A Ė(s) ds + ∫_{Ā ∩ {t > N}} Ė(s) ds
          ≤ g(x(N)) + M^2 μ(A) − k^2 μ(Ā ∩ {t > N}) = −∞,              (27)

which is an obvious contradiction, confirming that Statement 1 cannot hold. Therefore, the proof is complete. □


Corollary 3.2 Suppose x^* ∈ Ω^*; then there exists a sequence {l_n} ⊆ R_+ such that l_n ↑ ∞, g(x(l_n)) < 0, x(l_n) → x̄ for some x̄ ∈ ∂Ω, and V̇(x(l_n), x^*) → 0.

Proof Let {t_n} be the sequence obtained from Lemma 3.10; then there exists a subsequence {t̄_n} of {t_n} for which exactly one of the following cases holds:

Case (i): for each n, g(x(t̄_n)) < 0;
Case (ii): for each n, g(x(t̄_n)) = 0;
Case (iii): for each n, g(x(t̄_n)) > 0.

In Case (i), it is enough to take l_n = t̄_n for each n. In Case (ii), it is enough to take l_n = u_n for each n, where {u_n} is the sequence obtained from Lemma 3.11. In Case (iii), by Lemma 3.12, there exists a sequence whose elements satisfy either Case (i) or Case (ii), for which we have already obtained the desired conclusions. □

Theorem 3.3 There exists x̄ ∈ Ω^* such that x(t) → x̄.

Proof If x(t) converges in finite time, then, by Lemma 3.7, the result is true. Otherwise, let us take x^* ∈ Ω^*; then, from Corollary 3.2, there exists a sequence {l_n} ⊆ R_+ such that l_n ↑ ∞, g(x(l_n)) < 0, x(l_n) → x̄ for some x̄ ∈ ∂Ω, and V̇(x(l_n), x^*) → 0. We then show that x̄ satisfies x̄ ∈ Ω^*, which, according to Corollary 3.1, implies that x(t) → x̄.

From g(x(l_n)) < 0, there exists a sequence {γ_n} such that γ_n ∈ ∂f(x(l_n)) and

    V̇(x(l_n), x^*) = ẋ(l_n)^T (x(l_n) − x^*) = −γ_n^T (x(l_n) − x^*).   (28)

Thus, by the upper semicontinuity of ∂f at each x ∈ R^n, there exists γ ∈ ∂f(x̄) such that, after taking the limit of (28) as n → ∞, we obtain

    lim_{n→∞} V̇(x(l_n), x^*) = −γ^T (x̄ − x^*) = 0.                     (29)

If γ = 0, then 0 ∈ ∂f(x̄), that is, x̄ ∈ Ω^*; otherwise, from the convexity of f, we get

    γ^T (x̄ − x^*) ≥ f(x̄) − f(x^*) ≥ 0.                                 (30)

Hence, from (29) and (30), we get f(x^*) = f(x̄), which implies x̄ ∈ Ω^* and, by Corollary 3.1, x(t) → x̄. □

4 Implementation in Nonlinear Circuits

It should be noted that the differential inclusion (5) can be implemented by nonlinear circuits. Suppose

    χ(x) = 0        if x < 0,
           [0, 1]   if x = 0,
           1        if x > 0;


then the following theorem discusses the practical implementation of the differential inclusion (5).

Theorem 4.1 The differential inclusion (5) can be implemented with the following system:

    ẋ(t) ∈ χ(g(x(t)))(∂f(x(t)) − ∂g(x(t))) − ∂f(x(t)).                 (31)

Proof We show that

    S(x) = χ(g(x))(∂f(x) − ∂g(x)) − ∂f(x).

Assume x ∈ Ω^0; then χ(g(x)) = 0, which implies χ(g(x))(∂f(x) − ∂g(x)) − ∂f(x) = −∂f(x) = S(x). If x ∈ Ω', then χ(g(x)) = 1; therefore, χ(g(x))(∂f(x) − ∂g(x)) − ∂f(x) = −∂g(x) = S(x). If x ∈ ∂Ω, that is, g(x) = 0, then

    S(x) = {−α∂g(x) − β∂f(x) : α + β = 1, α, β ≥ 0},                    (32)

and χ(g(x)) = [0, 1]; thus,

    χ(g(x))(∂f(x) − ∂g(x)) − ∂f(x) = [0, 1](∂f(x) − ∂g(x)) − ∂f(x).     (33)

Obviously, (32) and (33) are the same, and the proof is complete. □

Figure 1 shows the designed neural network for differentiable problems when we apply the differential inclusion (31).

Fig. 1 Block diagram for differentiable cases

Note that, for the nondifferentiable case, instead of the gradient of each function, one can use an element of the generalized gradient. From the system defined by (31), clearly, in comparison with other related neural networks for solving nonlinear programming problems (see, for example, [10, 11, 30]), the newly designed neural network has a simpler architecture and can easily be used in practical problems. As far as we can tell, for the other related methods for nonsmooth optimization problems, a simple implementation in circuit form has not yet been described. For instance, for the penalty-based method, with the complicated right-hand side of the differential inclusion (3), expressing the method for implementation by a nonlinear circuit may not be easy, or may be more complicated than for the new method presented in this article.
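As a complement to the block diagram, the following Python sketch simulates (31) by an explicit Euler discretization. The value 1/2 selected from χ(0) = [0, 1] on the boundary, the fixed step size h, and the iteration count are illustrative assumptions rather than tuned values from this paper; f_subgrad and g_subgrad are assumed to return arbitrary subgradients of f and g.

```python
import numpy as np

def simulate_inclusion(x0, f_subgrad, g_subgrad, g, h=1e-3, n_steps=100_000):
    """Explicit Euler discretization of (31):
        x_{k+1} = x_k + h * (chi(g(x_k)) * (v_f - v_g) - v_f),
    with v_f a subgradient of f and v_g a subgradient of g at x_k.
    For g(x_k) > 0 the step is along -v_g, for g(x_k) < 0 along -v_f,
    and at g(x_k) = 0 we pick chi = 1/2, one admissible value from [0, 1]."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        v_f, v_g = f_subgrad(x), g_subgrad(x)
        gx = g(x)
        chi = 1.0 if gx > 0 else (0.0 if gx < 0 else 0.5)
        x = x + h * (chi * (v_f - v_g) - v_f)
    return x
```

Such a discretization is only a rough numerical surrogate for the continuous-time trajectories analyzed in Sect. 3; the convergence results there are stated for the differential inclusion itself.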


Fig. 2 Convergence to an optimal solution of problem P1, with three different initial conditions

5 Numerical Results

In this section, we provide some numerical examples to illustrate the theoretical results of the paper. The test problems P1 and P2 (see [31]) appear here with the notation of the main formulation of problem (1) in Sect. 2. We have considered infeasible initial points quite different from those used in [31], in order to illustrate in practice what was shown theoretically, namely that the convergence of the proposed method is independent of the initial points.

Problem P1 (Rosen–Suzuki)

    f(x) = x_1^2 + x_2^2 + 2x_3^2 + x_4^2 − 5x_1 − 5x_2 − 21x_3 + 7x_4,

    g(x) = max{ x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_1 − x_2 + x_3 − x_4 − 8;
                x_1^2 + 2x_2^2 + x_3^2 + 2x_4^2 − x_1 − x_4 − 10;
                x_1^2 + x_2^2 + x_3^2 + 2x_1 − x_2 − x_4 − 5 }.

In this problem, the constraint is nonsmooth. The optimal value of the objective function for this problem is −44. We have used three different sets of initial conditions. In each case, the solution trajectory converges to an equilibrium point of the differential inclusion, which is an optimal solution.

(i) x_i = 2, i = 1, ..., 4; see Fig. 2 for the convergence to an optimal solution of problem P1, plotted in pink.
(ii) (2, −3, 4, 1); see Fig. 2 for the convergence to an optimal solution of problem P1, plotted in green.
(iii) (−1, 4, 7, 11); see Fig. 2 for the convergence to an optimal solution of problem P1, plotted in blue.
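To illustrate how Problem P1 can be supplied to a discretization such as the one sketched in Sect. 4, the Python functions below encode f, g = max of the g_i, and one subgradient selection (all pieces are smooth, so the gradient of an active piece is a valid subgradient of g). The encoding follows the formulas as printed above and is an illustrative script, not the authors' implementation.

```python
import numpy as np

def f(x):                      # Rosen–Suzuki objective of Problem P1
    x1, x2, x3, x4 = x
    return x1**2 + x2**2 + 2*x3**2 + x4**2 - 5*x1 - 5*x2 - 21*x3 + 7*x4

def f_subgrad(x):              # f is smooth, so its only subgradient is the gradient
    x1, x2, x3, x4 = x
    return np.array([2*x1 - 5, 2*x2 - 5, 4*x3 - 21, 2*x4 + 7])

def g_components(x):
    x1, x2, x3, x4 = x
    return np.array([
        x1**2 + x2**2 + x3**2 + x4**2 + x1 - x2 + x3 - x4 - 8,
        x1**2 + 2*x2**2 + x3**2 + 2*x4**2 - x1 - x4 - 10,
        x1**2 + x2**2 + x3**2 + 2*x1 - x2 - x4 - 5,
    ])

def g(x):                      # g(x) = max_i g_i(x), as in (4)
    return g_components(x).max()

def g_subgrad(x):              # gradient of one active piece is a valid subgradient
    x1, x2, x3, x4 = x
    grads = np.array([
        [2*x1 + 1, 2*x2 - 1, 2*x3 + 1, 2*x4 - 1],
        [2*x1 - 1, 4*x2,     2*x3,     4*x4 - 1],
        [2*x1 + 2, 2*x2 - 1, 2*x3,     -1.0    ],
    ])
    return grads[np.argmax(g_components(x))]
```

For example, simulate_inclusion(np.array([2.0, -3.0, 4.0, 1.0]), f_subgrad, g_subgrad, g), with the hypothetical routine sketched in Sect. 4 and a suitably small step size, mimics the run started from the second initial condition above.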


Fig. 3 Solution trajectory path of the differential inclusion (5) with initial condition (10, −8) for problem P2 and its convergence to an optimal solution, which is located on the boundary of the feasible region

Problem P2 (Constrained CB2-II)

    f(x) = max{ x_1^2 + x_2^4;  (2 − x_1)^2 + (2 − x_2)^2;  2 exp(x_2 − x_1) },

    g_1(x) = x_1^2 + x_2^2 − 2x_1 + x_2 − 4,
    g_2(x) = |x_1| − 1,
    g_3(x) = |x_2| − 2.

Here, both the objective function and the constraints are nonsmooth. The optimal value of the objective function for this problem is 2. We have solved the problem with three different sets of initial conditions, all of which converge to the optimal solution (1, 1).

(i) (10, −8); see Fig. 6 for the convergence to an optimal solution of problem P2, plotted in pink (solid). A solution trajectory path of the differential inclusion (5) in this case is shown in Fig. 3.
(ii) (0, −14); see Fig. 6 for the convergence to an optimal solution of problem P2, plotted in green (dashed). A solution trajectory path of the differential inclusion (5) in this case is shown in Fig. 4.
(iii) (−5, 6); see Fig. 6 for the convergence to an optimal solution of problem P2, plotted in blue (dotted). A solution trajectory path of the differential inclusion (5) in this case is shown in Fig. 5.
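Similarly, Problem P2 can be encoded as follows; here both f and g are genuine max-type nonsmooth functions, and a subgradient is obtained from an active piece (np.sign(0) = 0 is a valid subgradient of the absolute value at 0). Again, this is only an illustrative encoding of the formulas printed above.

```python
import numpy as np

def f_components(x):
    x1, x2 = x
    return np.array([x1**2 + x2**4,
                     (2 - x1)**2 + (2 - x2)**2,
                     2*np.exp(x2 - x1)])

def f(x):                       # nonsmooth objective of Problem P2
    return f_components(x).max()

def f_subgrad(x):               # gradient of an active piece is a valid subgradient of f
    x1, x2 = x
    grads = np.array([[2*x1,               4*x2**3],
                      [2*(x1 - 2),         2*(x2 - 2)],
                      [-2*np.exp(x2 - x1), 2*np.exp(x2 - x1)]])
    return grads[np.argmax(f_components(x))]

def g_components(x):
    x1, x2 = x
    return np.array([x1**2 + x2**2 - 2*x1 + x2 - 4,
                     abs(x1) - 1,
                     abs(x2) - 2])

def g(x):                       # g(x) = max{g_1(x), g_2(x), g_3(x)}
    return g_components(x).max()

def g_subgrad(x):               # one subgradient of g via an active piece
    x1, x2 = x
    grads = np.array([[2*x1 - 2,    2*x2 + 1],
                      [np.sign(x1), 0.0],
                      [0.0,         np.sign(x2)]])
    return grads[np.argmax(g_components(x))]
```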


Fig. 4 Solution trajectory path of the differential inclusion (5) with initial condition (0, −14) for problem P2 and its convergence to an optimal solution, which is located on the boundary of the feasible region

Fig. 5 Solution trajectory path of the differential inclusion (5) with initial condition (−5, 6) for problem P2 and its convergence to an optimal solution, which is located on the boundary of the feasible region


Fig. 6 Convergence to an optimal solution of problem P2, with three different initial conditions

6 Conclusions

In this paper, we introduced a differential inclusion that solves general nonsmooth convex optimization problems. Convergence to a solution was investigated theoretically. In this new neural network, the calculation of projections is not required and, by the same token, there is no need for any penalty parameter. Furthermore, the method can easily deal with infeasible starting points. We have illustrated the effectiveness of the theoretical results as well as the performance of the proposed method by some numerical examples. These examples involve nonsmooth objective functions and constraints, and infeasible starting points were used. Generalizations of the proposed method for solving nonconvex problems and further application results will be discussed in future work.

Acknowledgements The authors would like to thank the editor and the reviewers of the paper for their instructive comments. Certainly, their meticulous reading and fruitful comments enriched the content and the structure of this paper.

References

1. Mordukhovich, B.: Variational Analysis and Generalized Differentiation, I: Basic Theory. Springer, Berlin (2006)
2. Mordukhovich, B.: Variational Analysis and Generalized Differentiation, II: Applications. Springer, Berlin (2006)
3. Soleimani-damaneh, M.: Nonsmooth optimization using Mordukhovich's subdifferential. SIAM J. Control Optim. 48, 3403–3432 (2010)
4. Shor, N.: Minimization Methods for Non-differentiable Functions. Springer, Berlin (1985)
5. Kelley, J.E.: The cutting-plane method for solving convex programs. J. Soc. Ind. Appl. Math. 8, 703–712 (1960)
6. Goffin, J.L., Haurie, A., Vial, J.P.: Decomposition and nondifferentiable optimization with the projective algorithm. Manag. Sci. 38, 284–302 (1992)
7. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms, Part 2: Advanced Theory and Bundle Methods. Springer, Berlin (2010)
8. Solodov, M.V.: On approximations with finite precision in bundle methods for nonsmooth optimization. J. Optim. Theory Appl. 119, 151–165 (2003)
9. Zhao, X., Luh, P.B.: New bundle methods for solving Lagrangian relaxation dual problems. J. Optim. Theory Appl. 113, 373–397 (2002)
10. Forti, M., Nistri, P., Quincampoix, M.: Generalized neural network for nonsmooth nonlinear programming problems. IEEE Trans. Circuits Syst. I 51, 1741–1754 (2004)
11. Bian, W., Xue, X.: Subgradient-based neural networks for nonsmooth nonconvex optimization problems. IEEE Trans. Neural Netw. 20, 1024–1038 (2009)
12. Li, G., Song, S., Guan, X.: Subgradient-based feedback neural networks for non-differentiable convex optimization problems. Sci. China Ser. F, Inf. Sci. 20, 421–435 (2006)
13. Li, G., Song, S., Wu, C.: Generalized gradient projection neural networks for nonsmooth optimization problems. Sci. China Ser. F, Inf. Sci. 53, 990–1004 (2010)
14. Lemaréchal, C., Nemirovskii, A., Nesterov, Y.: New variants of bundle methods. Math. Program. 69, 111–147 (1995)
15. Mäkelä, M.M.: Survey of bundle methods for nonsmooth optimization. Optim. Methods Softw. 17(1), 1–29 (2001)
16. Kiwiel, K.C.: Methods of Descent for Nondifferentiable Optimization. Lecture Notes in Mathematics. Springer, Berlin (1985)
17. Mäkelä, M.M., Neittaanmäki, P.: Nonsmooth Optimization. World Scientific, Singapore (1992)
18. Xia, Y., Leung, H., Wang, J.: A projection neural network and its application to constrained optimization problems. IEEE Trans. Circuits Syst. I, Fundam. Theory Appl. 49, 442–457 (2002)
19. Xia, Y., Wang, J.: A general projection neural network for solving monotone variational inequalities and related optimization problems. IEEE Trans. Neural Netw. 15, 318–328 (2004)
20. Xia, Y., Wang, J.: Neural network for solving linear programming problems with bounded variables. IEEE Trans. Neural Netw. 6, 515–519 (1995)
21. Xia, Y.: A new neural network for solving linear programming problems and its application. IEEE Trans. Neural Netw. 7, 525–529 (1996)
22. Wang, J., Hu, Q., Jiang, D.: A Lagrangian network for kinematic control of redundant robot manipulators. IEEE Trans. Neural Netw. 10, 1123–1132 (1999)
23. Aubin, J.P., Cellina, A.: Differential Inclusions: Set-Valued Maps and Viability Theory. Springer, Berlin (1984)
24. Mifflin, R.: Semismooth and semiconvex functions in constrained optimization. SIAM J. Control Optim. 15, 957–972 (1977)
25. Defeng, S., Womersley, R.S., Houduo, Q.: A feasible semismooth asymptotically Newton method for mixed complementarity problems. Math. Program. 94, 167–187 (2002)
26. Clarke, F.H.: Optimization and Nonsmooth Analysis. Society for Industrial and Applied Mathematics, Philadelphia (1983)
27. Filippov, A.F.: Differential Equations with Discontinuous Right-Hand Sides: Control Systems. Kluwer, Boston (1988)
28. Hosseini, A., Hosseini, S.M., Soleimani-damaneh, M.: A differential inclusion-based approach for solving nonsmooth convex optimization problems. Optimization (2011). doi:10.1080/02331934.2011.613993
29. Borwein, J.M., Moors, W.B.: Essentially strictly differentiable Lipschitz functions. J. Funct. Anal. 149, 305–351 (1997)
30. Liu, Q., Wang, J.: A one-layer recurrent neural network for constrained nonsmooth optimization. IEEE Trans. Syst. Man Cybern., Part B 41, 1323–1333 (2011)
31. Lukšan, L., Vlček, J.: Test problems for nonsmooth unconstrained and linearly constrained optimization. Technical report 798, Institute of Computer Science, Academy of Sciences of the Czech Republic (2000)