A One-Layer Recurrent Neural Network for Non-smooth Convex Optimization Subject to Linear Equality Constraints

Qingshan Liu^1 and Jun Wang^2

^1 School of Automation, Southeast University, Nanjing, Jiangsu, China
[email protected]
^2 Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
[email protected]

* The work described in this paper was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project no. CUHK417608E).

Abstract. In this paper, a one-layer recurrent neural network is proposed for solving non-smooth convex optimization problems with linear equality constraints. Compared with existing neural networks, the proposed neural network has a simpler architecture, and its number of neurons equals the number of decision variables of the optimization problem. Global convergence of the neural network is guaranteed when the non-smooth objective function is convex. Simulation results show that the state trajectories of the neural network converge to the optimal solutions of non-smooth convex optimization problems and demonstrate the performance of the proposed neural network.

1 Introduction

Consider the following nonlinear programming (NP) problem:

minimize f(x)
subject to Ax = b,    (1)

where x ∈ R^n, f(x): R^n → R is a continuous convex function that is not necessarily smooth (i.e., not continuously differentiable), A ∈ R^{m×n} is a full row-rank matrix (i.e., rank(A) = m), and b ∈ R^m. Convex programming has many applications in scientific and engineering areas, such as signal and image processing, manufacturing, optimal control, and pattern recognition. Non-smooth optimization has been widely applied to minimax problems, parameter estimation, and support vector machine learning. Recurrent neural networks amenable to hardware implementation are effective for the online solution of convex programming problems [1,2,3,4,5,6,7,8,9,10].



In 1986, Tank and Hopfield [1] first proposed a neural network for solving linear programming problems, which motivated the development of neural networks for solving linear and nonlinear programming problems. Kennedy and Chua [2] presented a neural network for solving nonlinear programming problems based on the finite penalty parameter method; the network converges only to an approximate optimal solution and suffers from implementation problems when the penalty parameter is very large. Zhang and Constantinides [11] proposed the Lagrangian network, which has a two-layer structure and can be used to solve some strictly convex programming problems. Wang and Xia [5] proposed a primal-dual neural network that can be applied to some convex quadratic programming problems. More recently, projection neural networks were proposed for solving general nonlinear programming problems; these networks have good convergence properties and globally converge to an exact optimal solution for convex programming problems [12,13]. In [10,14], we proposed one-layer recurrent neural networks for solving linear and quadratic programming problems. Non-smooth optimization was investigated in [15,16]: in [15], a neural network model was proposed for solving non-smooth convex optimization subject to bound constraints, and in [16], a two-layer recurrent neural network was constructed for solving non-smooth convex optimization subject to linear equality and bound constraints. In this paper, a one-layer recurrent neural network is proposed for solving the non-smooth convex programming problem (1). Compared with existing neural networks for non-smooth convex optimization, the new neural network has a simpler structure, yet it can be applied to more general convex programming problems.

2 Model Description

In this section, a one-layer recurrent neural network is constructed for solving programming problem (1).

Definition 1 [17]. Suppose E ⊂ R^n. F: x → F(x) is called a set-valued function from E to R^n if to each point x of the set E there corresponds a nonempty closed set F(x) ⊂ R^n.

Definition 2 [18]. Let V(x) be a function from R^n to R. For any x ∈ R^n,

DV(x)(v) = lim_{h→0+} [V(x + hv) − V(x)] / h.

We say that DV(x)(v) is the derivative from the right of V at x in the direction v. If DV(x)(v) exists for all directions, we say that V is differentiable from the right at x. The closed convex subset (possibly empty)

∂V(x) = {ξ ∈ R^n : ∀v ∈ R^n, ξ^T v ≤ DV(x)(v)}

is called the sub-differential of V at x. An element ξ of ∂V(x) is called a sub-gradient of V at x.
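To make Definition 2 concrete, consider f(x) = ‖x‖_1 = Σ_i |x_i|: its sub-differential is the Cartesian product of {sign(x_i)} for nonzero components and the interval [−1, 1] for zero components. A sub-gradient oracle of this kind is all that the network model of this section requires in practice. The following minimal Python sketch (our illustration, not code from the paper; the function name subgrad_l1 is hypothetical) returns one element of ∂‖x‖_1:

```python
import numpy as np

def subgrad_l1(x, tol=1e-12):
    """Return one sub-gradient of f(x) = ||x||_1 at x.

    For |x_i| > tol the i-th component is sign(x_i); at x_i = 0 any
    value in [-1, 1] belongs to the sub-differential, and 0 is chosen here.
    """
    g = np.sign(x)
    g[np.abs(x) <= tol] = 0.0  # any value in [-1, 1] would be admissible here
    return g

# Example: one sub-gradient of ||.||_1 at (1.5, 0, -2) is (1, 0, -1).
print(subgrad_l1(np.array([1.5, 0.0, -2.0])))
```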


Theorem 1. x* is an optimal solution of problem (1) if and only if there exists y* ∈ R^m such that (x*^T, y*^T)^T satisfies the following equations:

0 ∈ ∂f(x) − A^T y,    (2)

0 = Ax − b.    (3)

Proof: See Theorem 1 in [16].

Next, according to equations (2) and (3), the one-layer recurrent neural network model is derived. From (2), for any γ ∈ ∂f(x), we have

x = x − γ + A^T y.    (4)

Substituting (4) into (3), it follows that A(x − γ + A^T y) − b = 0 for all γ ∈ ∂f(x); that is,

AA^T y = Aγ − Ax + b,    ∀γ ∈ ∂f(x).

Since A is of full row rank, AA^T is invertible. Then

y = (AA^T)^{-1}(Aγ − Ax + b),    ∀γ ∈ ∂f(x).    (5)

Substituting (5) into (2), for any γ ∈ ∂f(x), we have

γ − A^T(AA^T)^{-1}(Aγ − Ax + b) = 0.    (6)

Let P = A^T(AA^T)^{-1}A and q = A^T(AA^T)^{-1}b. Then (6) can be written as

Px + (I − P)γ − q = 0,    ∀γ ∈ ∂f(x),    (7)

where I is the identity matrix. The matrix P, called the projection matrix, is symmetric and satisfies P^2 = P. Based on equation (7), the proposed recurrent neural network model is described by the following differential inclusion:

dx/dt ∈ λ[−Px − (I − P)∂f(x) + q],    (8)

where λ is a positive scaling constant.

Definition 3. x* is said to be an equilibrium point of neural network (8) if there exists γ* ∈ ∂f(x*) such that

−Px* − (I − P)γ* + q = 0.    (9)

From the above analysis, the following theorem obviously holds.

Theorem 2. x* is an optimal solution of problem (1) if and only if it is an equilibrium point of neural network (8).
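Theorem 2 suggests a direct way to compute an optimal solution: simulate the dynamics (8) until they settle at an equilibrium. As an informal illustration (not code from the paper; the function names, step size, and iteration count are assumptions of ours), the following Python sketch builds P and q from A and b and integrates one trajectory of (8) with a forward-Euler scheme, given a sub-gradient oracle for f:

```python
import numpy as np

def build_P_q(A, b):
    """P = A^T (A A^T)^{-1} A and q = A^T (A A^T)^{-1} b, assuming A has full row rank."""
    AAT_inv = np.linalg.inv(A @ A.T)
    P = A.T @ AAT_inv @ A
    q = A.T @ AAT_inv @ b
    return P, q

def simulate(A, b, subgrad, x0, lam=1e6, dt=1e-9, steps=200_000):
    """Forward-Euler integration of dx/dt = lam * (-P x - (I - P) g + q), g in ∂f(x)."""
    P, q = build_P_q(A, b)
    I = np.eye(A.shape[1])
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        g = subgrad(x)                                # one sub-gradient of f at x
        x = x + dt * lam * (-P @ x - (I - P) @ g + q)
    return x
```

A smaller effective step λ·dt tracks the continuous-time inclusion more closely at the cost of more iterations; for non-smooth f the discrete iterates chatter within a neighborhood of the equilibrium whose size scales with the step.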

3 Global Convergence

In this section, the global convergence of the recurrent neural network (8) is analyzed. Throughout this paper, we assume that the optimal solution set of problem (1), denoted by Ω*, is nonempty and contains a finite x* ∈ Ω*. Then the equilibrium point set of neural network (8), denoted by Ω^e, is nonempty.

Definition 4. The neural network (8) is said to be globally convergent to an optimal solution of problem (1) if for any trajectory x(t) of the neural network with initial point x(t0) ∈ R^n, there exists an optimal solution x* ∈ Ω* such that lim_{t→+∞} x(t) = x*.

Theorem 3. The neural network (8) is stable in the sense of Lyapunov and globally convergent to an optimal solution of problem (1).

Proof: Assume that x* is an equilibrium point of neural network (8); then there exists γ* ∈ ∂f(x*) such that −Px* − (I − P)γ* + q = 0. Substituting this into (8), the network can be rewritten as

dx/dt ∈ λ[−P(x − x*) − (I − P)(∂f(x) − γ*)].    (10)

Consider the following Lyapunov function:

V(x) = λ[f(x) − f(x*) − (x − x*)^T γ* + ‖x − x*‖_2^2 / 2].    (11)

We have

∂V(x) = λ[∂f(x) − γ* + x − x*].

By the chain rule [19], V(x(t)) is differentiable for a.a. t ≥ 0 and

V̇(x(t)) = ξ(t)^T ẋ(t),    ∀ξ(t) ∈ ∂V(x(t)).

Let

ξ(t) = λ[γ(t) − γ* + x(t) − x*],

where γ(t) ∈ ∂f(x(t)). Then

V̇(x(t)) ≤ λ^2 sup_{γ ∈ ∂f(x)} [γ − γ* + x − x*]^T [−P(x − x*) − (I − P)(γ − γ*)]
         = λ^2 sup_{γ ∈ ∂f(x)} [−(γ − γ*)^T (I − P)(γ − γ*) − (x − x*)^T P(x − x*) − (x − x*)^T (γ − γ*)].    (12)


Since f(x) is convex, (x − x*)^T (γ − γ*) ≥ 0 holds for any γ ∈ ∂f(x). Then

V̇(x(t)) ≤ λ^2 sup_{γ ∈ ∂f(x)} [−(γ − γ*)^T (I − P)(γ − γ*) − (x − x*)^T P(x − x*)]
         = −λ^2 inf_{γ ∈ ∂f(x)} [(γ − γ*)^T (I − P)(γ − γ*) + (x − x*)^T P(x − x*)].    (13)

On the other hand, for any γ ∈ ∂f(x),

‖ẋ‖_2^2 = λ^2 [−P(x − x*) − (I − P)(γ − γ*)]^T [−P(x − x*) − (I − P)(γ − γ*)]
        = λ^2 [(x − x*)^T P(x − x*) + (γ − γ*)^T (I − P)(γ − γ*)].    (14)

From (13) and (14), it follows that

V̇(x(t)) ≤ − inf_{γ ∈ ∂f(x)} ‖ẋ‖_2^2 = −λ^2 inf_{γ ∈ ∂f(x)} ‖Px + (I − P)γ − q‖_2^2.    (15)

From (11), we have V(x) ≥ λ‖x − x*‖_2^2 / 2. Let L(x0) = {x ∈ R^n : V(x) ≤ V(x0)}; then L(x0) is bounded. From (15), x(t) is also bounded, so the solution x(t) exists on [t0, +∞). Hence V(x) is a Lyapunov function of (8), and neural network (8) is stable in the sense of Lyapunov.

Define Γ(x) = inf_{γ ∈ ∂f(x)} ‖Px + (I − P)γ − q‖_2^2. If x* ∈ Ω^e, then Γ(x*) = 0. Conversely, if there exists x̂ ∈ R^n such that Γ(x̂) = 0, then, since ∂f(x̂) is a compact convex subset of R^n, there exists γ̂ ∈ ∂f(x̂) such that Px̂ + (I − P)γ̂ − q = 0. Therefore, Γ(x) = 0 if and only if x ∈ Ω^e. As the rest of the proof is similar to that of Theorem 1 in [10], it is omitted here.
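As an informal numerical companion to Theorem 3 (our illustration, not part of the paper), the Lyapunov function (11) can be evaluated along a simulated trajectory and checked to be non-increasing up to discretization error. Here f, x_star, and gamma_star are assumed to be given, with x_star an optimal solution and gamma_star a sub-gradient satisfying (9):

```python
import numpy as np

def lyapunov_V(x, x_star, gamma_star, f, lam=1.0):
    """Lyapunov function (11): lam*[f(x) - f(x*) - (x - x*)^T gamma* + ||x - x*||^2 / 2]."""
    d = np.asarray(x, dtype=float) - x_star
    return lam * (f(x) - f(x_star) - d @ gamma_star + 0.5 * d @ d)

# Along Euler iterates x_0, x_1, ... of (8), the sequence
# lyapunov_V(x_k, x_star, gamma_star, f) should be (approximately) non-increasing.
```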

4 Simulation Results

In this section, two examples are given to demonstrate the effectiveness of the proposed recurrent neural network for solving constrained least absolute deviation and nonlinear curve-fitting problems.

Example 1. Consider the following constrained least absolute deviation problem:

minimize ‖Cx − d‖_1
subject to Ax = b,    (16)

where x = (x1, x2, x3, x4)^T and

C = [[4, 2, −1, 2], [1, 3, 2, −1]],  d = (−3, 5)^T,
A = [[4, 1, −2, 1], [1, 3, 1, −1]],  b = (−2, 4)^T.

This problem has a unique optimal solution x* = (0.5, 0.125, 1, −2.125)^T. With λ = 10^6, the simulation results are shown in Fig. 1 for 10 random initial values. We can see that the state trajectories of neural network (8) are globally convergent to the unique optimal solution.


Fig. 1. Transient behavior of the neural network (8) in Example 1
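For completeness, a small script reproducing the setting of Example 1 is given below (our own sketch; the step size and iteration count are illustrative, and the final iterate only approximates x* to within the discretization accuracy). The sub-gradient of f(x) = ‖Cx − d‖_1 used here is C^T sign(Cx − d).

```python
import numpy as np

C = np.array([[4., 2., -1., 2.], [1., 3., 2., -1.]])
d = np.array([-3., 5.])
A = np.array([[4., 1., -2., 1.], [1., 3., 1., -1.]])
b = np.array([-2., 4.])

# P = A^T (A A^T)^{-1} A and q = A^T (A A^T)^{-1} b, as in Section 2.
AAT_inv = np.linalg.inv(A @ A.T)
P = A.T @ AAT_inv @ A
q = A.T @ AAT_inv @ b
I = np.eye(4)

lam, dt = 1e6, 1e-9                 # scaling constant and illustrative Euler step
rng = np.random.default_rng(0)
x = rng.standard_normal(4)          # random initial point, as in the paper
for _ in range(400_000):
    g = C.T @ np.sign(C @ x - d)    # one sub-gradient of ||C x - d||_1
    x = x + dt * lam * (-P @ x - (I - P) @ g + q)

print(np.round(x, 3))               # expected to be close to x* = (0.5, 0.125, 1, -2.125)
```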

Example 2 (Nonlinear Curve-Fitting Problem). Let us consider a constrained nonlinear least absolute deviation curve-fitting problem: find the parameters of the combined exponential and polynomial function y(x) = a4 e^x + a3 x^3 + a2 x^2 + a1 x + a0 that fits the data given in Table 1, subject to the equality constraints y(1.2) = −7.8 and y(4.6) = −3.4. This problem can be formulated as follows:

minimize ‖Cx − d‖_1
subject to Ax = b,    (17)

where x = (x1 , x2 , x3 , x4 , x5 )T = (a4 , a3 , a2 , a1 , a0 )T and ⎛

⎞T 1 1.649 2.718 4.482 7.389 12.183 20.086 33.116 54.598 90.017 ⎜ 0 0.125 1 3.375 8 15.625 27 42.875 64 91.125 ⎟ ⎜ ⎟ 6.25 9 12.25 16 20.25 ⎟ C=⎜ ⎜ 0 0.25 1 2.25 4 ⎟ , ⎝ 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 ⎠ 1 1 1 1 1 1 1 1 1 1

T d = −8.6 −8.2 −7.9 −9 −7 −6.2 −3 −3.8 −2.8 −3.8 ,     3.32 1.728 1.44 1.2 1 −7.8 A= ,b = , 99.484 97.336 21.16 4.6 1 −3.4 By utilizing the neural network in (8) for solving this problem, the simulation results are shown in Fig. 2(a) with λ = 106 and 10 random initial points, from which we can see that the neural network is globally convergent to the optimal

A One-Layer Recurrent Neural Network

1009

Table 1. Nonlinear fitting data for Example 2 x y

0 −8.6

0.5 −8.2

1 −7.9

1.5 −9

2 −7

2.5 −6.2

3 −3

3.5 −3.8

state trajectories

4 −2.8

4.5 −3.8

curve−fitting results −2 LA LS

10

−3

−4 5

x1

0

y(x)

−5 x4

x2

−6

x3

−7 −5 −8 x5 −10

0

0.5

1 time (sec)

(a)

1.5

2 −5

x 10

−9

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

x

(b)

Fig. 2. (a) Transient behavior of neural network (8) in Example 2; (b) Comparison of the two nonlinear curve fitting methods between the LA and LS in Example 2

solution x∗ = (−0.21, 0.32, −0.55, 1.26, −8.39)T or (a4 , a3 , a2 , a1 , a0 ) = (−0.21, 0.32, −0.55, 1.26, −8.39). The curve fitting is drawn in Fig. 2(b) for l1 -norm (solid line) and l2 -norm (dashed line). It shows that least absolute (LA) has better fitting performance than least square (LS) in dealing with outliers.
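The matrices in (17) are determined entirely by the data in Table 1 and the two interpolation constraints, so they need not be entered by hand. The following sketch (ours, not from the paper; the helper name basis is hypothetical) constructs C, d, A, and b from the data; the same Euler loop as in Example 1, with sub-gradient C^T sign(Cx − d), can then be reused to recover the reported parameters.

```python
import numpy as np

# Data from Table 1.
xs = np.arange(0.0, 5.0, 0.5)    # 0, 0.5, ..., 4.5
ys = np.array([-8.6, -8.2, -7.9, -9., -7., -6.2, -3., -3.8, -2.8, -3.8])

def basis(t):
    """Row (e^t, t^3, t^2, t, 1), matching x = (a4, a3, a2, a1, a0)^T."""
    return np.array([np.exp(t), t**3, t**2, t, 1.0])

C = np.vstack([basis(t) for t in xs])       # 10 x 5 data matrix
d = ys                                      # observed values
A = np.vstack([basis(1.2), basis(4.6)])     # equality constraints y(1.2), y(4.6)
b = np.array([-7.8, -3.4])

# Minimize ||C x - d||_1 subject to A x = b with network (8), e.g. with the
# Euler loop of Example 1 and the sub-gradient g = C.T @ np.sign(C @ x - d).
```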

5 Conclusions

In this paper, a one-layer recurrent neural network has been proposed for solving non-smooth convex programming problems with linear equality constraints. Compared with other neural networks for convex optimization, the proposed neural network has lower architectural complexity, with only a one-layer structure and fewer neurons, yet it is efficient for solving both smooth and non-smooth optimization problems. The global convergence of the neural network is proved based on Lyapunov theory. Furthermore, the proposed recurrent neural network is effective for solving constrained least absolute deviation and nonlinear curve-fitting problems. Simulation results demonstrate the performance and effectiveness of the proposed neural network.

References

1. Tank, D., Hopfield, J.: Simple neural optimization networks: An A/D converter, signal decision circuit, and a linear programming circuit. IEEE Transactions on Circuits and Systems 33, 533–541 (1986)


2. Kennedy, M., Chua, L.: Neural networks for nonlinear programming. IEEE Transactions on Circuits and Systems 35, 554–562 (1988)
3. Wang, J.: Analysis and design of a recurrent neural network for linear programming. IEEE Transactions on Circuits and Systems-I 40, 613–618 (1993)
4. Wang, J.: A deterministic annealing neural network for convex programming. Neural Networks 7, 629–641 (1994)
5. Wang, J., Xia, Y.: Analysis and design of primal-dual assignment networks. IEEE Transactions on Neural Networks 9, 183–194 (1998)
6. Xia, Y., Wang, J.: Neural network for solving least absolute and related problems. Neurocomputing 19, 13–21 (1998)
7. Xia, Y., Wang, J.: A general projection neural network for solving monotone variational inequalities and related optimization problems. IEEE Transactions on Neural Networks 15, 318–328 (2004)
8. Yang, Y., Cao, J.: Solving quadratic programming problems by delayed projection neural network. IEEE Transactions on Neural Networks 17, 1630–1634 (2006)
9. Hu, X., Wang, J.: Solving pseudomonotone variational inequalities and pseudoconvex optimization problems using the projection neural network. IEEE Transactions on Neural Networks 17, 1487–1499 (2006)
10. Liu, Q., Wang, J.: A one-layer recurrent neural network with a discontinuous hard-limiting activation function for quadratic programming. IEEE Transactions on Neural Networks 19, 558–570 (2008)
11. Zhang, S., Constantinides, A.: Lagrange programming neural networks. IEEE Transactions on Circuits and Systems-II 39, 441–452 (1992)
12. Xia, Y., Wang, J.: On the stability of globally projected dynamical systems. Journal of Optimization Theory and Applications 106, 129–150 (2000)
13. Xia, Y., Leung, H., Wang, J.: A projection neural network and its application to constrained optimization problems. IEEE Transactions on Circuits and Systems-I 49, 447–458 (2002)
14. Liu, Q., Wang, J.: A one-layer recurrent neural network with a discontinuous activation function for linear programming. Neural Computation 20, 1366–1383 (2008)
15. Li, G., Song, S., Wu, C., Du, Z.: A neural network model for non-smooth optimization over a compact convex subset. In: Wang, J., Yi, Z., Zurada, J.M., Lu, B.-L., Yin, H. (eds.) ISNN 2006. LNCS, vol. 3971, pp. 344–349. Springer, Heidelberg (2006)
16. Liu, Q., Wang, J.: A recurrent neural network for non-smooth convex programming subject to linear equality and bound constraints. In: King, I., Wang, J., Chan, L.W., Wang, D. (eds.) ICONIP 2006. LNCS, vol. 4233, pp. 1004–1013. Springer, Heidelberg (2006)
17. Filippov, A.: Differential Equations with Discontinuous Righthand Sides. Mathematics and Its Applications (Soviet Series). Kluwer Academic Publishers, Boston (1988)
18. Aubin, J., Cellina, A.: Differential Inclusions: Set-Valued Maps and Viability Theory. Springer, New York (1984)
19. Forti, M., Nistri, P.: Global convergence of neural networks with discontinuous neuron activations. IEEE Transactions on Circuits and Systems-I 50, 1421–1435 (2003)