IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 10, NO. 6, NOVEMBER 1999
1321
Stable Dynamic Backpropagation Learning in Recurrent Neural Networks Liang Jin and Madan M. Gupta, Fellow, IEEE
Abstract—The conventional dynamic backpropagation (DBP) algorithm proposed by Pineda does not necessarily imply the stability of the dynamic neural model in the sense of Lyapunov during a dynamic weight learning process. A difficulty with the DBP learning process is thus associated with the stability of the equilibrium points, which have to be checked by simulating the set of dynamic equations, or else by verifying the stability conditions, after the learning has been completed. To avoid unstable phenomena during the learning process, two new learning schemes, called the multiplier and constrained learning rate algorithms, are proposed in this paper to provide stable adaptive updating processes for both the synaptic and somatic parameters of the network. In the multiplier method, the explicit stability conditions are introduced into the iterative error index, and the new updating formulations contain a set of inequality constraints. In the constrained learning rate algorithm, the learning rate is updated at each iterative instant by an equation derived using the stability conditions. With these stable DBP algorithms, any analog target pattern may be implemented by a steady output vector which is a nonlinear vector function of the stable equilibrium point. The applicability of the approaches presented is illustrated through both analog and binary pattern storage examples.

Index Terms—Adaptive algorithm, dynamic backpropagation algorithm, dynamic neural networks, Lyapunov stability, nonlinear dynamics.
I. INTRODUCTION
DYNAMIC neural networks (DNN's), which contain both feedforward and feedback connections between the neural layers, play an important role in visual processing, pattern recognition, neural computing, and control [36], [37]. In neural associative memory, DNN's which deal with a static target pattern can be divided into two classes according to how the pattern in the network is expressed [6]–[8], [21]: 1) the target pattern (input pattern) is given as an initial state of the network or 2) the target pattern is given as a constant input to the network. In both cases, the DNN must be designed such that the state of the network converges ultimately to a locally or globally stable equilibrium point which depends only on the target pattern [10]–[12].

In an earlier paper on neural associative memory, Hopfield [3], [4] proposed a well-known DNN for a binary vector pattern. In this model, every memory vector is an equilibrium point of the dynamic network, and the stability of the equilibrium point is guaranteed by the stable learning process. Many alternative techniques for storing binary vectors using both continuous- and discrete-time dynamic networks have appeared since then [9], [13], [14], [17], [20], [21]. For the analog vector storage problem, Sudharsanan and Sundareshan [13] developed a systematic synthesis procedure for constructing a continuous-time dynamic neural network in which a given set of analog vectors can be stored as the stable equilibrium points. Marcus et al. [35] discussed an associative memory in a so-called analog iterated-map neural network using both the Hebb rule and the pseudoinverse rule. Atiya and Abu-Mostafa [16] recently proposed a new method using the Hopfield continuous-time network, and a set of static weight learning formulations was developed in their paper. An excellent survey of some previous work on the design of associative memories using the Hopfield continuous-time model was given by Michel and Farrell [22].

A dynamic learning algorithm for the first class of DNN's, where the analog target pattern is directly stored at an equilibrium point of the network, was first proposed by Pineda [1] for a class of continuous-time networks. At the same time, a dynamic learning algorithm was described by Almeida [5]. In order to improve the capability of storing multiple patterns in such an associative memory, a modified algorithm for the dynamic learning process was later developed by Pineda [2]. Two dynamic phenomena in the dynamic learning process were isolated into primitive architectural components which perform the operations of continuous nonlinear transformation and autoassociative recall. The dynamic learning techniques for programming the architectural components were presented in a formalism appropriate for a collective nonlinear dynamic neural system [2].

Manuscript received September 15, 1998; revised May 28, 1999. L. Jin is with the Microelectronics Group, Lucent Technologies Inc., Allentown, PA 18103 USA. M. M. Gupta is with the Intelligent Systems Research Laboratory, College of Engineering, University of Saskatchewan, Saskatoon, Sask., Canada S7N 5A9. Publisher Item Identifier S 1045-9227(99)09400-X.
This dynamic learning process was named dynamic back propagation (DBP) by Narendra [38], [39] due to the application of the gradient descent method. More recently, this method was applied to a nonlinear functional approximation with a dynamic network using a dynamic algorithm for both the synaptic and somatic parameters by Tawel [15]. Some control applications of the DBP learning algorithm in recurrent neural networks may be found in the survey papers [40], [41]. However, the problem of a DBP learning algorithm for discrete-time dynamic neural networks has received little attention in the literature. In the DBP method, the dynamic network is designed using a dynamic learning process so that each given target vector becomes an equilibrium state of the network. The stability is easily ensured for a standard continuous-time Hopfield
network during the learning process, if the synaptic weight matrix is set as a symmetric matrix with zero diagonal elements [3], [4] at each learning instant. Generally speaking, though, the dynamic learning algorithm does not guarantee the asymptotic stability of the equilibrium points, and a checking phase must be added to the learning process by simulating the set of dynamic equations, or else by verifying the stability conditions, after the learning has been completed. If some of the equilibrium points are unstable, the learning process must be repeated using a learning rate that is small enough, which is a very time-consuming process. An interesting topic is thus to develop a systematic way of ensuring the asymptotic stability of the equilibrium points during such a DBP learning procedure.

It is important to note that another issue associated with a stable DBP learning process is the stability criteria of a dynamic neural network. The stability conditions of continuous-time Hopfield neural networks have been extensively studied [24]–[29] during the past few years. Recently, several stability criteria were proposed for discrete-time recurrent networks. A global stability condition for a so-called iterated-map neural network with a symmetric weight matrix was proposed in [34] by Marcus and Westervelt using Lyapunov's function method and eigenvalue analysis, and the condition was used in the associative memory learning algorithms in [35] by Marcus et al. Recently, the stability and bifurcation properties of some simple discrete-time neural networks were analyzed by Blum and Wang [32], [33], and the stability of the fixed points was studied for a class of discrete-time recurrent networks by Li [29] using the norm condition of a matrix. Changes in the stable region of the fixed points due to the changing of the neuron gain were also obtained.
More recently, the problem of the absolute stability of a general class of discrete-time recurrent neural networks was discussed in [30] and [31] by Jin, Nikiforuk, and Gupta, and some absolute stability conditions which are directly expressed in terms of the synaptic weights were derived. From the stability theory point of view, the stability of discrete-time dynamic neural networks can be evaluated using Lyapunov's first or second method. A stability analysis using Lyapunov's first method and the well-known Gersgorin theorem [43] will be incorporated to develop a stable DBP learning process in this paper.

A stable equilibrium point learning problem associated with discrete-time dynamic neural networks is studied in this paper using stable dynamic backpropagation (SDBP) algorithms. It is assumed that the analog vector is to be implemented by a steady output vector which is a nonlinear vector function of the network state, rather than stored directly at an equilibrium point of the network. Two new learning algorithms, 1) the multiplier method and 2) the constrained learning rate method, are developed for the purpose of ensuring the stability of the network during the DBP learning phase. The dynamics and the Gersgorin theorem-based global stability conditions for a general class of discrete-time DNN's are discussed in Section II. A conventional DBP algorithm is extended in Section III to discrete-time networks which have nonlinear output equations. In Section IV, the explicit stability conditions are introduced into the weight updating formulations using the multiplier
concept, which has been used in optimal control theory [42], and a set of stable DBP formulations is constructed by a gradient algorithm with the inequality constraints. The constrained learning rate algorithm is proposed in Section V, where the learning rate is adapted by an equation derived using the stability conditions. The applicability of the approaches is illustrated using examples in Section VI, and some conclusions are given in Section VII.

II. DISCRETE-TIME DYNAMIC NEURAL NETWORKS

A. Network Models

Consider a general form of a dynamic recurrent neural network described by a discrete-time nonlinear system of the form

x(k + 1) = f(x(k), W, θ),  y(k) = g(x(k))   (1)

where x(k) = [x_1(k), x_2(k), …, x_n(k)]^T is the state vector of the dynamic neural network, x_i(k) represents the internal state of the ith neuron, W = [w_ij] is the real-valued matrix of the synaptic connection weights, θ = [θ_1, θ_2, …, θ_n]^T is a threshold vector or a so-called somatic vector, y(k) is an observation vector or output vector, f is a continuous and differentiable vector-valued function which is bounded and uniformly bounded, and g is a known continuous and differentiable vector-valued function. The recurrent neural network consists of both feedforward and feedback connections between the layers and neurons, forming complicated dynamics. In fact, the weight w_ij represents a synaptic connection parameter between the jth neuron and the ith neuron, and θ_i is a threshold at the ith neuron. Hence, the nonlinear vector-valued function on the right side of system (1) may be represented componentwise as

x_i(k + 1) = f_i(x(k); w_i1, w_i2, …, w_in, θ_i)   (2)

where

x = [x_1, x_2, …, x_n]^T,  f = [f_1, f_2, …, f_n]^T   (3)

and

θ = [θ_1, θ_2, …, θ_n]^T.   (4)
Equation (2) indicates that the dynamics of the ith neuron in the network are associated with all the states of the network, the synaptic weights w_i1, …, w_in, and the somatic threshold parameter θ_i. The four main types of discrete-time dynamic neural models are given in Table I. These neural models describe different dynamic properties due to the different neural state equations. Models I and II consist of complete nonlinear difference
TABLE I
FOUR DISCRETE-TIME DYNAMIC NEURAL MODELS
equations. Models III and IV are, however, seminonlinear equations which contain linear terms on the right-hand side of the models. In these neural models, W = [w_ij] is the synaptic connection weight matrix, g_i is the neural gain of the ith neuron, a_i is the time constant or linear feedback gain of the ith neuron, and θ_i is a threshold at the ith neuron. The neural activation function σ(·) may be chosen as a continuous and differentiable nonlinear sigmoidal function satisfying the following conditions:

1) σ(x) → ±1 as x → ±∞;
2) σ(x) is bounded with the upper bound 1 and the lower bound −1;
3) σ(x) = 0 at a unique point x = 0;
4) σ'(x) > 0 and σ'(x) → 0 as x → ±∞;
5) σ'(x) has a global maximal value.

Typical examples of such a function are the hyperbolic tangent σ(x) = tanh(x) and saturating functions defined in terms of the sign function sgn(·); all of the above nonlinear activation functions are bounded, monotonic, nondecreasing functions.

B. Gersgorin's Theorem-Based Stability Conditions

The equilibrium points of the system (1) are defined by the following nonlinear algebraic equation:

x^f = f(x^f, W, θ).   (5)

Without loss of generality, one can assume that there exists at least one solution of (5); that is, the system (1) has at least one equilibrium point x^f. In fact, there exists at least one equilibrium point for every neural model given in Table I. Moreover, one can estimate that the regions containing the equilibrium points of models I–IV are n-dimensional hypercubes determined by the bounds of the activation function. Gersgorin's theorem [43] has often been used to derive stability conditions by several authors [13], [16], [31]. For a known real or complex matrix, Gersgorin's theorem provides an effective approach for determining the positions of the eigenvalues of the matrix. In order to analyze the stability of the equilibrium points of system (1), let J(x^f) = [∂f_i/∂x_j] evaluated at x = x^f be the Jacobian of the function f with respect to x. Based on Lyapunov's first method, if the row sum condition

Σ_{j=1}^{n} |J_ij(x^f)| < 1,  i = 1, 2, …, n   (6)

or the column sum condition

Σ_{i=1}^{n} |J_ij(x^f)| < 1,  j = 1, 2, …, n   (7)

holds, then x^f is a locally stable state equilibrium point. Furthermore, if the elements of the Jacobian J(x) are uniformly bounded, there exist bounding functions J̄_ij of the network parameters such that

|J_ij(x)| ≤ J̄_ij  for all x.   (8)

In this case, if

Σ_{j=1}^{n} J̄_ij < 1,  i = 1, 2, …, n   (9)

or

Σ_{i=1}^{n} J̄_ij < 1,  j = 1, 2, …, n   (10)
TABLE II
GLOBAL STABILITY CONDITIONS OF THE NEURAL MODELS IN TABLE I
then the equilibrium point is a unique globally stable equilibrium point of the system. Using the uniformly bounded property of the derivative σ'(·), the global stability conditions of the neural models shown in Table I are summarized in Table II. If θ_i is treated as an external input to the ith neuron in these neural models, the stability conditions given in Table II are called the absolute stability conditions [24], [31] because θ_i is not involved in the stability conditions. In the later sections, these stability conditions will be incorporated to develop the stable dynamic learning algorithms.
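These row-sum conditions are straightforward to evaluate numerically. The sketch below (a hedged illustration, with parameter values that are not taken from the paper) checks an absolute stability condition of the Table II type for a Model III/IV-style network with linear feedback terms, x_i(k+1) = a_i x_i(k) + σ(Σ_j w_ij g_j x_j(k) + θ_i), with σ = tanh. Since |tanh'| ≤ 1, a sufficient row-sum condition is |a_i| + Σ_j |w_ij| g_j < 1 for every i; the exact per-model formulas are those summarized in Table II.

```python
import math

# Hedged sketch of a Table II-style absolute stability check for a network
# with linear feedback terms,
#   x_i(k+1) = a_i*x_i(k) + tanh(sum_j w_ij*g_j*x_j(k) + theta_i).
# Since |tanh'| <= 1, a sufficient row-sum condition is
#   |a_i| + sum_j |w_ij|*g_j < 1   for every i.
# All parameter values are illustrative, not taken from the paper.

def absolutely_stable(a, g, W):
    return all(abs(a[i]) + sum(abs(W[i][j]) * g[j] for j in range(len(g))) < 1.0
               for i in range(len(a)))

a = [0.5, 0.4]                       # linear feedback gains
g = [1.0, 1.0]                       # neural gains
W = [[0.2, 0.1], [0.1, 0.3]]         # synaptic weights
assert absolutely_stable(a, g, W)    # row sums 0.8 and 0.8 are below 1

# "absolute" stability: convergence holds regardless of the threshold theta
theta = [5.0, -5.0]
x = [0.9, -0.9]
for _ in range(500):
    x = [a[i] * x[i]
         + math.tanh(sum(W[i][j] * g[j] * x[j] for j in range(2)) + theta[i])
         for i in range(2)]
# x is now (numerically) the unique global equilibrium point
```

Because the condition makes the map a contraction in the maximum norm with factor below one, the iteration converges to the unique equilibrium from any initial state, which is exactly the role these conditions play in the learning algorithms below.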
III. GENERALIZED DBP LEARNING ALGORITHM

The DBP algorithm for a class of continuous-time recurrent neural networks was first proposed by Pineda [1], [2]. A DBP learning algorithm for a general class of dynamic neural systems with nonlinear output equations will be developed in this section for the analog target pattern storage purpose. Let s = [s_1, s_2, …, s_m]^T be an analog target pattern which is desired to be implemented by a steady-state output vector y^f which is a nonlinear vector function of an equilibrium point x^f of the neural system (1); that is, y^f = g(x^f). The purpose of the learning procedure is to adjust the synaptic weights w_ij and the somatic threshold parameters θ_i such that s can be realized by the nonlinear function g(x^f). Define an error function as

E = (1/2) Σ_i (s_i − y_i^f)^2.   (11)

Next, we discuss the learning formulations of the synaptic weights and somatic parameters. After performing a gradient descent in E, the incremental change term of the weight w_ij is given as

Δw_ij = −η_w ∂E/∂w_ij   (12)

where η_w is a learning rate associated with the synaptic weights. On the other hand, for the somatic parameter θ_i, the incremental formulation is easily given as

Δθ_i = −η_θ ∂E/∂θ_i   (13)

where η_θ is a learning rate associated with the somatic parameters. The incremental terms are derived [see the Appendix] to be (14) and (15)
TABLE III
DYNAMIC BACKPROPAGATION (DBP) LEARNING ALGORITHM
where the quantity defined in (16) is said to be the steady adjoint equation. Hence, the synaptic and somatic learning formulations are obtained as (17) and (18), where the steady states are given by (19) and (20). In fact, x^f and z^f are the equilibrium points of the dynamic systems (21) and (22), respectively, and (22) is said to be the adjoint equation associated with (21).

The updating rules (17) and (18) are not able to guarantee the stability of the systems (21) and (22), and a checking procedure for the stability of both (21) and (22) is needed in such a dynamic learning process. Two primary approaches may be used to check the stability of the network during a dynamic learning process. In the first approach, the stability conditions of the equilibrium are verified after the whole dynamic learning process has been completed. In this case, if the network is unstable, the learning phase must be repeated, and the steady states x^f and z^f must be solved from the nonlinear algebraic equations (19) and (20) at each iterative instant. In the second approach, the stability of the network is verified at each iterative instant. When the network is unstable, in other words, when (19) and (20) do not converge to the stable equilibrium points x^f and z^f, the iterative process needs to be repeated by adjusting the learning rates η_w and η_θ until the solutions of (19) and (20) converge to the stable equilibrium points as time becomes large. Both of these methods for stability studies are very time consuming.

For the neural models given in Table I, the DBP learning algorithms for the synaptic weights w_ij and the thresholds θ_i are derived in Table III. The steady adjoint equations corresponding to the neural models are also given in Table III.

IV. STABLE DBP LEARNING: MULTIPLIER METHOD

The learning algorithm given in the last section performs a storage process in which the target pattern is stored at an
equilibrium point of the neural system (1). In fact, the target pattern should be stored at a stable equilibrium point of the neural system (1); that is, after finishing the learning procedure, the synaptic weights and the somatic parameters should satisfy either one of the local stability conditions (6) and (7) or one of the global stability conditions (9) and (10), so that the stored pattern is a locally or globally stable equilibrium point. For this goal, one may require that the systems (21) and (22) be stable at each learning instant, or that, after finishing the dynamic learning procedure, the neural system with the learned synaptic and somatic parameters be stable in the sense of Lyapunov.

The multiplier method was used effectively to deal with inequality constraints on functions of the control and state variables in optimal control by Bryson and Ho [42]. In dynamic optimal control problems, the multipliers associated with the inequality constraints appear in an augmented Hamiltonian, which plays a role similar to that of the error index in the neural parameter learning process; in this way the difficulties due to the inequality constraints on both the control and state variables are overcome successfully. For the sake of simplicity, only the row sum local and global stability conditions will be addressed in the stable DBP learning algorithms in this paper. The multiplier method will now be used to develop the stable learning algorithm for the neural system (1).
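Before the formal derivation, the overall scheme can be sketched end to end. The following is a hedged illustration, not the paper's exact formulations: a two-neuron Model I-type network x(k+1) = tanh(Wx(k) + θ) with the identity output map y = x (an assumption; the paper allows a general nonlinear g). Each iteration relaxes to the fixed point x^f, forms the steady adjoint vector by a small linear solve, applies the gradient update of Section III, and adds a multiplier-style increment that activates only when a row of W violates the row-sum global stability condition.

```python
import math

# Hedged sketch of multiplier-style stable DBP for a two-neuron network
#   x(k+1) = tanh(W x(k) + theta),  output y = x (identity g: an assumption).
# Constants RHO, LAM, ETA and the penalty form are illustrative.

RHO = 0.9          # stability margin: require sum_j |w_ij| < RHO < 1
LAM = 0.05         # multiplier weight on a violated stability constraint
ETA = 0.5          # learning rate

def fixed_point(W, theta, iters=300):
    x = [0.0, 0.0]
    for _ in range(iters):                           # relax to x^f
        x = [math.tanh(W[i][0]*x[0] + W[i][1]*x[1] + theta[i]) for i in range(2)]
    return x

def dbp_update(W, theta, s):
    x = fixed_point(W, theta)
    d = [1.0 - xi*xi for xi in x]                    # tanh' at the fixed point
    e = [s[i] - x[i] for i in range(2)]              # output error
    # steady adjoint: solve (I - D W)^T zp = e, then z = D zp  (2x2 solve)
    m = [[1.0 - d[0]*W[0][0], -d[0]*W[0][1]],
         [-d[1]*W[1][0], 1.0 - d[1]*W[1][1]]]
    det = m[0][0]*m[1][1] - m[0][1]*m[1][0]
    zp = [( m[1][1]*e[0] - m[1][0]*e[1]) / det,
          (-m[0][1]*e[0] + m[0][0]*e[1]) / det]
    z = [d[i]*zp[i] for i in range(2)]
    for i in range(2):
        row = abs(W[i][0]) + abs(W[i][1])
        lam = LAM if row >= RHO else 0.0             # multiplier switches on
        for j in range(2):
            sgn = (W[i][j] > 0) - (W[i][j] < 0)
            W[i][j] += ETA*z[i]*x[j] - lam*sgn       # DBP term + stability term
        theta[i] += ETA*z[i]
    return W, theta, x

s = [0.4, -0.3]                                      # analog target pattern
W, theta = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
for _ in range(400):
    W, theta, xf = dbp_update(W, theta, s)           # xf approaches s
```

In this run the penalty term stays inactive because the weights remain well inside the stable region; its role is to pull an offending row sum back below the margin whenever a raw DBP step would leave the stable region.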
A. Local Stable DBP (LSDBP) Learning

Let the local stability condition (6) be incorporated so as to develop the local stable DBP (LSDBP) learning algorithm, which guarantees the local stability of the equilibrium at the end of the learning. It is seen that the original neural equation (21) and the corresponding adjoint equation (22) have the same Jacobian, and the local and global stability conditions (6)–(7) and (9)–(10) of system (21) are thus sufficient conditions for the local and global stability of the adjoint system (22), respectively. Hence, in order to guarantee the stability of the equilibrium point which is used to represent the desired analog vector, the stability condition (6) may be considered as a set of inequality constraints on the synaptic weights w_ij and the somatic parameters θ_i. Using the multiplier method for such a constrained neural learning, an augmented learning error index is defined as (23), where the second term on the right side of the equation will guarantee that the equilibrium point becomes locally asymptotically stable as the iterative time k → ∞. In other words, the introduction of the second term in the error index forces the trained system to satisfy the local stability condition. The multiplier is required to satisfy the additional requirement of being positive if the corresponding stability constraint in (6) is violated and zero if it is satisfied. Hence, the incremental terms of the synaptic and somatic parameters are derived from (23) as (24) and (25). The partial derivatives appearing in (24) and (25) are obtained from the Appendix as (26) and (27).

The main difficulty with this algorithm is associated with the computation of the Jacobian matrix at each iterative instant. As the number of neurons becomes large, this computation becomes very time consuming. This shortcoming can be avoided using the global stable fixed point learning algorithm, which will be described now.

B. Global Stable DBP (GSDBP) Learning

It is to be noted that the global stability conditions (9) and (10) of the neural system (21) are also the global stability conditions of the adjoint system (22). If x^f and z^f are guaranteed to be the globally stable equilibrium points of the systems (21) and (22), respectively, after the learning process, the global stability condition (9) may then be used in the augmented error index; that is, (28), where the second term on the right side of the equation will guarantee that the trained system satisfies the global
TABLE IV
LSDBP LEARNING ALGORITHMS WITH MULTIPLIERS
stability condition (9) as the iterative time k → ∞, and the multiplier satisfies the additional requirement of being positive if the global stability constraint in (9) is violated and zero if it is satisfied. Hence, the incremental terms of the synaptic weights w_ij and the thresholds θ_i are, respectively, derived from (28) as (29) and (30). Using the formulations obtained in the last section, the stable synaptic and somatic learning algorithms are described by the updating equations (31) and (32), where x^f and z^f are determined by (19) and (20), and the last terms in these updating equations are the additional incremental terms due to the global stability condition (9). Since the stability of the network is ensured only gradually in this algorithm as the iterative time k → ∞, x^f and z^f have to be solved from the nonlinear algebraic equations (19) and (20) of the equilibrium states at each iterative instant.

Based on the global stability conditions shown in Table II, the additional incremental terms for the neural models given in Table I are presented in Table IV, where only the row sum conditions are considered in the derivations. Since the global stability conditions are independent of the threshold θ for the networks given in Table I, the updating formulations of the threshold derived in the last section remain unchanged. It is obvious that the computational requirement of the GSDBP learning algorithm is not significantly increased compared with the DBP and LSDBP algorithms.

V. CONSTRAINED LEARNING RATE METHOD
The learning rate in the DBP algorithm not only plays an important role in the convergence of the updating process,
but it also influences the stability of the network. During a dynamic learning process, if the learning rate is too large, the network may become unstable, while if the learning rate is too small, the convergence of the learning phase will be too slow to be practical. Indeed, because the incremental terms of the weights are time varying during the learning process, the scale of a suitable learning rate needs to be adjusted at each learning instant so that the stability of the system is ensured. In fact, this is the reason that, if a fixed learning rate is used, an unstable situation may occur in the conventional DBP learning. For the purpose of stability, a criterion for determining an adaptive learning rate using the known stability conditions of the network at each iterative instant will therefore be developed in this section using the DBP formulations given in Section III.
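The danger of letting the weights drift outside the stable region can be seen directly: a weight matrix that violates the row-sum condition may drive the network into a sustained oscillation instead of an equilibrium. A small illustration (weight values are illustrative, not from the paper):

```python
import math

# Illustration: x(k+1) = tanh(W x(k)) with row sums 2 > 1 (condition violated)
# settles into a sustained oscillation, while the scaled-down matrix with
# row sums 0.5 < 1 converges to the unique equilibrium at the origin.

def iterate(W, x, steps):
    for _ in range(steps):
        x = [math.tanh(W[i][0]*x[0] + W[i][1]*x[1]) for i in range(2)]
    return x

W_bad = [[0.0, 2.0], [-2.0, 0.0]]          # row sums 2: condition violated
x1 = iterate(W_bad, [0.1, 0.1], 200)
x2 = iterate(W_bad, x1, 1)
assert abs(x1[0]-x2[0]) + abs(x1[1]-x2[1]) > 0.5   # still moving: no equilibrium

W_ok = [[0.0, 0.5], [-0.5, 0.0]]           # row sums 0.5: condition satisfied
y1 = iterate(W_ok, [0.1, 0.1], 200)
assert abs(y1[0]) + abs(y1[1]) < 1e-8      # converged to the origin
```

A fixed learning rate in plain DBP can move the weights from the second regime into the first; the constrained learning rate derived below prevents exactly this.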
A. LSDBP Learning

First, an LSDBP learning algorithm will be considered using the local stability condition. Let the equilibrium point of system (1) be stable at the iterative time k, and let the row sum local stability condition (6) be satisfied; that is, (33). In order to determine the learning rates η_w and η_θ such that the stability condition (6) is also satisfied at the iterative time k + 1, it can be assumed that (34) holds, where the quantities involved are defined by (35)–(37). For convenience, the iterative time k will not appear in the following derivation. Using a Taylor series expansion to first-order terms of Δw_ij, Δθ_i, and Δx^f, (34) may be represented as (38). Furthermore, based on the equilibrium point equation (5), (39) holds; hence, Δx^f may be solved as (40). Substituting this result into (38) yields (41), where the incremental terms used in the last section are modified using the time-varying learning rates η_w(k) and η_θ(k) as in (42) and (43). Hence, if (44) holds, then the fixed point x^f is locally stable. Moreover, let (45) be defined; the stable learning rate is then obtained as (46).

Obviously, the computational complexity of the above learning algorithm arises from the computation of a matrix inverse at each iterative time.

B. Global Stable DBP (GSDBP) Learning

If the equilibrium point of the network at each iterative instant is required to be globally stable, another constraint formulation of the learning rate may then be derived using the global stability condition. In Section II, one of the absolute stability conditions was given as (47).
TABLE V
GSDBP LEARNING WITH CONSTRAINED LEARNING RATE ALGORITHM

Let the system at the iterative time k satisfy the above condition; that is, (48). Consider the stability of the system at the iterative time k + 1, and let (49) hold, where Δw_ij and Δθ_i are given by (36) and (37). Expanding the terms on the left side of the above inequality to first-order incremental terms produces (50). It is to be noted that the iterative time k is omitted above and in the following formulations.
On the other hand, let η_w = η_θ; that is, let both the synaptic learning and the somatic learning have the same learning rate. The stable learning rate at instant k is then obtained as (51). The computational formulations of the stable learning rates for the neural models given in Table I are summarized in Table V. Since the threshold θ is not involved in the stability conditions given in Table II, the learning rate algorithms given in Table V are independent of the threshold θ. For each neural model, it is easy to see that the constraint equation of the learning rate has a simple form. The computational phase of the learning rate can therefore be easily implemented with the DBP routines given in Section III at each iterative instant.
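The mechanism can be sketched as follows. Instead of the closed-form expressions of Table V, this hedged illustration uses simple backtracking: the rate is halved until the updated weights satisfy the row-sum global stability condition. The increment dW and all constants are illustrative.

```python
# Sketch of the constrained-learning-rate idea: choose the largest rate eta
# (up to eta0) for which the *updated* weights still satisfy the row-sum
# global stability condition  max_i sum_j |w_ij + eta*dw_ij| < rho < 1.
# Backtracking stands in for the paper's model-specific formulas (Table V).

def constrained_rate(W, dW, eta0=1.0, rho=0.9):
    eta = eta0
    while eta > 1e-12:
        if max(sum(abs(W[i][j] + eta*dW[i][j]) for j in range(len(W)))
               for i in range(len(W))) < rho:
            return eta
        eta *= 0.5                      # shrink until the condition holds
    return 0.0

W  = [[0.4, 0.3], [0.0, 0.2]]           # stable: row sums 0.7 and 0.2
dW = [[1.0, 1.0], [0.0, 0.0]]           # a raw DBP increment (illustrative)
eta = constrained_rate(W, dW)           # here eta = 0.0625
W_next = [[W[i][j] + eta*dW[i][j] for j in range(2)] for i in range(2)]
assert max(abs(W_next[i][0]) + abs(W_next[i][1]) for i in range(2)) < 0.9
```

As in the paper's scheme, the rate adapts per iteration: a large raw increment automatically receives a small rate, so the network never leaves the globally stable region and the steady states can always be obtained by simple iteration.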
VI. IMPLEMENTATION EXAMPLES

Example 1 (Analog Vector Storage): In this example, the proposed global stable learning algorithms are used to study an analog vector storage procedure. Let a two-dimensional analog vector storage problem be considered using a two-neuron model of the following form: (52). The target pattern is realized by a steady output vector whose components are nonlinear functions of an equilibrium state x^f of the network. The GSDBP algorithms with the multiplier and constrained learning rate methods are used to carry out the storage process. In this example, the four possible desired equilibrium states can easily be obtained from the known target vector and the output equations. Hence, the target analog vector may be implemented by any one of the above equilibrium states of the network. This shows that the storage capacity of a dynamic network may be increased by introducing some suitable nonlinear output equations.

The initial values of the weights and thresholds were chosen randomly in the interval [−0.5, 0.5], and the initial learning rates were set. Using the multiplier method provided in Section IV, a dynamic learning process was then achieved so that the analog vector was realized by an equilibrium point of the network. With the selected multipliers, the weights and thresholds were obtained at the end of the learning.
Similar to the well-known static BP learning for feedforward networks, the computational procedure shows that a suitable choice of the values of the learning rates and the multipliers in the multiplier method is the first step for a successful dynamic learning process. On the other hand, using the constrained learning rate algorithm, the analog vector was perfectly realized by an equilibrium state, and the set of weights and thresholds was computed at the end of the learning.

Fig. 1. The error index curves during both the multiplier and constrained learning rate procedures.

Fig. 2. The phase plane diagram of the equilibrium point of the network during the multiplier learning, where the global stable equilibrium point is x^f_1 = 0.70705 and x^f_2 = 0.54772.

Fig. 3. The phase plane diagram of the equilibrium point of the network during the constrained learning rate learning, where the global stable equilibrium point is x^f_1 = 0.70714 and x^f_2 = 1.22757.

The decrement curves of the error indexes are given in Fig. 1 for the two dynamic learning processes, and the trajectories of the equilibrium points at each iterative instant are depicted in Figs. 2 and 3, respectively. The results show that even though the constrained learning rate method required more iterations than the multiplier method, the computational time of the former was less than that of the latter. The reason is that the equilibrium points x^f and z^f of the network and the adjoint equation had to be solved from a set of nonlinear algebraic equations at each iterative instant in the multiplier method because of the instability of the network during the initial iterative steps. In the constrained learning rate method, however, x^f and z^f may be obtained through a simple iterative procedure for the dynamic equations of the network and the adjoint system with the fixed weights and thresholds, because both the network and adjoint equations are stable at each iterative instant. On the other hand, the learning rate may become very small at some iterative instants for the purpose of the global stability of the network, and the convergence may thus become somewhat slow in the constrained learning rate algorithm. This drawback appeared in these simulation studies.

Example 2 (Binary Image Storage): A binary image pattern storage process is discussed in this example to illustrate the applicability of the algorithms, where the target pattern is a 10 × 10 binary image as shown in Fig. 4(a). The equations of the dynamic network have the following form: (53), where the double subscripts are introduced to represent the two-dimensional binary image: x_ij is the state of the neuron (i, j), which corresponds to the image unit (i, j); w_ij,kl is a weight between the neuron (i, j) and the neuron (k, l); and θ_ij is a threshold of the neuron (i, j). The neural activation function was chosen as a sigmoidal function of the type given in Section II. In this example, the binary target pattern is desired to be stored directly at a stable equilibrium point of the network. The global stability conditions of the neural model (53) may be obtained as the corresponding row or column sum conditions of Table II.

Let the GSDBP algorithms be used to deal with this problem. For the initial stability of the network, all the iterative initial values of the weights and thresholds were chosen
Fig. 4. The binary patterns corresponding to the equilibrium point of the network (53) during the learning process using the multiplier method. (a) The target pattern; (b) k = 0; (c) k = 50; (d) k = 100; (e) k = 150; (f) k = 200; (g) k = 250; (h) k = 300.
randomly in the interval [ 0.001, 0.001], the initial learning in both the multiplier and rate was selected as constrained learning rate algorithms, and the multipliers were in the multiplier selected as method. In order to observe the changing of the binary image corresponding to the equilibrium state of the network during was learning process, the pattern employed to represent the binary image at the iterative instant, , if , and , if . where Using the multiplier method, the dynamic learning process , and the 10 was completed at the iterative instant 10 binary patterns were perfectly stored at a global stable equilibrium point of the network. The recovered binary images at some iterative instant from the analog vector using the instant pattern operator are depicted in Fig. 4, and the error index curve during the dynamic learning process is given in Fig. 6. The results obtained in Figs. 4 and 6 show that the multiplier method has a satisfactory convergence, even if a set of high-dimensional nonlinear algebric equations are needed to be solved at each iterative instant. The binary image patterns were also perfectly stored at a global stable using the constrained equilibrium point at the instant learning rate method. The changing procedure of the binary image corresponding to the steady-state vector of the network is shown in Fig. 5. The error index curve is given in Fig. 6 and shows the converging procedure of both the weight and threshold learning process. The total computational time of the learning process using the constrained learning rate method was much smaller than that of the learning process using the multiplier method. The results show that GSDBP learning algorithms can be used effectively for large-scale pattern storage problems. 32 girl Example 3 (Gray-Scale Image Storage): A 32 image with 256 gray-scale given in Fig. 7 is used to show the effectiveness of the SDBP algorithm developed in the paper for a purpose of associative memory. 
It is easy to show that a fully connected neural network requires 1 048 576 synaptic weights in order to store such an image. To reduce the computational requirement, the weight matrix is simplified as follows:
[weight matrix in simplified block-banded form, with nonzero 128 × 128 sub-blocks only between adjacent layers] (54)
Thus, a simplified version of the neural network is given by the following equations:
(55)

The neural structure represented by (55) can be viewed as a multilayered neural network with two visible layers and a number of hidden layers, where each layer has the same number of neural units. It is assumed here that the multilayered structure (55) has two visible layers and six hidden layers. Let each layer have 128 neural units, so that the network contains a total of 1024 neural units. It is easy to show that this eight-layered structure involves only 131 072 synaptic weights, which is only 12.5 percent of the one-layered structure. This significant reduction in the number of connection weights may reduce the computational complexity, such as the computing time and the memory requirement, of the weight learning. For associative memory synthesis, the simulation results indicate that the computing time for the multilayered structure using the algorithm presented in this paper is only 5% of that for a single-layer Hopfield network using the pseudoinverse algorithm. Three blurred girl images, given in Figs. 8(a), 9(a), and 10(a), are distorted, respectively, by removing two parts of the original image, by a 1 × 10 motion blur without noise, and by adding white Gaussian noise with a 0-dB signal-to-noise ratio (SNR). These image patterns are input, respectively, to the multilayered network to test the recall capability of the associative memory. The recalled results are given respectively
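The weight-count arithmetic quoted above can be verified directly. The only assumption in this sketch is that each of the eight layer-to-layer connections in the looped multilayered structure contributes one full 128 × 128 weight block:

```python
# Weight-count comparison between a fully connected single-layer network
# and the eight-layered structure of Example 3 (assumed block layout).
n_units = 32 * 32                     # 1024 neurons, one per image pixel
full_weights = n_units ** 2           # fully connected: 1 048 576 weights

n_layers = 8                          # two visible + six hidden layers
units_per_layer = 128                 # 8 * 128 = 1024 units in total
# one 128 x 128 block per connection between consecutive layers,
# counting the feedback link that closes the recurrent loop (8 blocks)
banded_weights = n_layers * units_per_layer ** 2   # 131 072 weights

reduction = banded_weights / full_weights          # 0.125, i.e., 12.5%
```

This reproduces the 131 072 weights and the 12.5 percent figure stated in the text.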
Fig. 5. The binary patterns correspond to the equilibrium point of the network (53) during the learning process using the constrained learning rate method. (a). The target pattern; (b). k = 0; (c). k = 50; (d). k = 100; (e). k = 150; (f). k = 200; (g). k = 250; (h). k = 300; (i). k = 350; (j). k = 400.
Fig. 8. (a) The girl image with two parts removed; (b) recalled image.
Fig. 6. The error index curves in Example 2 during both the multiplier and constrained learning rate learning procedures.
Fig. 7. Original girl image pattern (32 × 32).
Fig. 9. (a) The girl image with 1 × 10 motion; (b) recalled image.
in Figs. 8(b), 9(b), and 10(b). It is seen that the distributed associative memory was able to recall the stored image perfectly when presented with the blurred images given in Figs. 8(a) and 9(a); however, the noise-reduction capability of such a memory structure is somewhat poor, as shown in Fig. 10(b).

VII. CONCLUSION

The conventional DBP algorithm, which has been used exclusively for the adjustment of the parameters of dynamic neural networks, is extended in this paper using two new stable learning concepts, the multiplier and the constrained learning rate methods, proposed for the purposes of LSDBP and GSDBP learning, respectively. These dynamic learning schemes make use of the conventional DBP version
Fig. 10. (a) The girl image with white Gaussian noise; (b) recalled image.
proposed by Pineda [1], [2], together with some additional formulations derived from the known stability conditions of the network. The standard DBP routines can therefore still be used, and the stability of the network is ensured after the dynamic learning has been completed. It is important to note that the
computational requirement of the LSDBP and GSDBP learning algorithms is not significantly increased as compared with the conventional DBP algorithm. The effectiveness of the proposed schemes was tested using both analog and binary pattern memory problems. The nonlinear dynamic neural models discussed in this paper have the potential for application to image restoration, adaptive control, and visual processing. Since the multiplier and constrained learning rate formulations are derived from the explicit stability conditions of the network, a better estimate of the stability condition of a dynamic network may benefit the learning algorithms. In this paper, sets of Gersgorin’s theorem-based stability conditions for a general class of dynamic networks were applied to design the learning algorithms.
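As an illustration of what a Gersgorin-type stability test looks like in practice, the following sketch checks one common sufficient condition for sigmoidal networks: the maximum absolute row sum of the weight matrix, scaled by the maximum activation slope, must be below one. The specific bound, the slope value, and the function name are illustrative assumptions, not the paper's exact conditions:

```python
import numpy as np

def gershgorin_stable(W, max_slope=0.25):
    """Sufficient (not necessary) stability test for x(k+1) = f(W x(k) + theta).

    If every Gershgorin row sum of max_slope * |W| lies below one, the map
    is a contraction, so the network has a unique, globally stable
    equilibrium.  max_slope = 0.25 is the peak derivative of the logistic
    sigmoid; other activations require a different slope bound.
    """
    W = np.asarray(W, dtype=float)
    row_sums = np.abs(W).sum(axis=1)      # Gershgorin row sums of |W|
    return bool(max_slope * row_sums.max() < 1.0)
```

A check of this kind could be evaluated at each iterative instant of a constrained learning scheme: weight updates that would violate the bound are the ones a stability-preserving algorithm must reject or scale back.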
APPENDIX

In this Appendix, the procedure for deriving (14)–(16) is given. Based on the equilibrium point equation (5), the partial derivative of the equilibrium state with respect to a synaptic weight results in the following expression:

(56)

where δ denotes the Kronecker delta function. Moreover, let (57); then (56) can be represented as (58). For convenience, let a matrix be introduced whose elements are defined by (59); then (58) can be rewritten as (60). Taking the inverse of this matrix, the partial derivative may be solved as (61). Hence, the incremental changes in the weights and thresholds are expressed as (62) and (63). Furthermore, let a new variable be introduced: (64). It can be shown that (65), that is, (66). Hence, (62) and (63) can be represented by (67) and (68).

REFERENCES
[1] F. J. Pineda, “Generalization of backpropagation to recurrent neural networks,” Phys. Rev. Lett., vol. 59, no. 19, pp. 2229–2232, 1987.
[2] F. J. Pineda, “Dynamics and architecture for neural computation,” J. Complexity, vol. 4, pp. 216–245, 1988.
[3] J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” in Proc. Nat. Academy Sci. USA, vol. 79, 1982, pp. 2554–2558.
[4] J. Hopfield, “Neurons with graded response have collective computational properties like those of two state neurons,” in Proc. Nat. Academy Sci. USA, vol. 81, 1984, pp. 3088–3092.
[5] L. B. Almeida, “A learning rule for asynchronous perceptrons with feedback in a combinatorial environment,” in Proc. IEEE 1st Conf. Neural Networks, San Diego, CA, June 21–24, 1987, vol. II, pp. 609–618.
[6] S. Amari, “Learning patterns and pattern sequences by self-organizing nets of threshold elements,” IEEE Trans. Comput., vol. C-21, pp. 1197–1206, Nov. 1972.
[7] S. Amari, “Neural theory of association and concept formation,” Biol. Cybern., vol. 26, pp. 175–185, 1977.
[8] M. A. Arbib, Brains, Machines, and Mathematics. New York: McGraw-Hill, 1988.
[9] T. Kohonen, Associative Memory: A System Theoretical Approach. New York: Springer-Verlag, 1977.
[10] B. Kosko, “Bidirectional associative memories,” IEEE Trans. Syst., Man, Cybern., vol. SMC-18, pp. 42–60, 1988.
[11] Y. Chauvin, “Dynamic behavior of constrained backpropagation networks,” in Advances in Neural Information Processing Systems, D. S. Touretzky, Ed., vol. 2. San Mateo, CA: Morgan Kaufmann, 1990, pp. 519–526.
[12] A. Guez, V. Protopopescu, and J. Barhen, “On the stability, storage capacity, and design of continuous nonlinear neural networks,” IEEE Trans. Syst., Man, Cybern., vol. 18, pp. 80–87, Jan./Feb. 1988.
[13] S. I. Sudharsanan and M. K. Sundareshan, “Equilibrium characterization of dynamical neural networks and a systematic synthesis procedure for associative memories,” IEEE Trans. Neural Networks, vol. 2, pp. 509–521, Sept. 1991.
[14] R. Kamimura, “Activated hidden connections to accelerate the learning in recurrent neural networks,” in Proc. Int. Joint Conf. Neural Networks (IJCNN), 1992, pp. I-693–700.
[15] R. Tawel, “Nonlinear functional approximation with networks using adaptive neurons,” in Proc. Int. Joint Conf. Neural Networks (IJCNN), 1992, pp. III-491–496.
[16] A. Atiya and Y. S. Abu-Mostafa, “An analog feedback associative memory,” IEEE Trans. Neural Networks, vol. 4, no. 1, pp. 117–126, Jan. 1993.
[17] Y. S. Abu-Mostafa and J.-M. St. Jacques, “Information capacity of the Hopfield model,” IEEE Trans. Inform. Theory, vol. IT-31, pp. 461–464, 1984.
[18] M. Morita, “Associative memory with nonmonotone dynamics,” Neural Networks, vol. 6, no. 1, pp. 115–126, 1993.
[19] A. H. Gee, S. V. B. Aiyer, and R. W. Prager, “An analytical framework for optimizing neural networks,” Neural Networks, vol. 6, no. 1, pp. 79–98, 1993.
[20] S. V. B. Aiyer, M. Niranjan, and F. Fallside, “A theoretical investigation into the performance of the Hopfield model,” IEEE Trans. Neural Networks, vol. 1, pp. 204–215, 1990.
[21] J. Farrell and A. Michel, “A synthesis procedure for Hopfield’s continuous-time associative memory,” IEEE Trans. Circuits Syst., vol. 37, pp. 877–884, 1990.
[22] A. Michel and J. Farrell, “Associative memories via artificial neural networks,” IEEE Contr. Syst. Mag., pp. 6–17, Apr. 1990.
[23] S. Grossberg, “Nonlinear neural networks: Principles, mechanisms and architectures,” Neural Networks, vol. 1, no. 1, pp. 17–61, 1988.
[24] M. A. Cohen and S. Grossberg, “Absolute stability of global pattern formation and parallel memory storage by competitive neural networks,” IEEE Trans. Syst., Man, Cybern., vol. SMC-13, pp. 815–826, 1983.
[25] A. Guez, V. Protopopescu, and J. Barhen, “On the stability, storage capacity, and design of nonlinear continuous neural networks,” IEEE Trans. Syst., Man, Cybern., vol. SMC-18, pp. 80–87, 1988.
[26] D. G. Kelly, “Stability in contractive nonlinear neural networks,” IEEE Trans. Biomed. Eng., vol. 37, pp. 231–242, 1990.
[27] J. A. Anderson, J. W. Silverstein, S. A. Ritz, and R. S. Jones, “Distinctive features, categorical perception, and probability learning: Some applications of a neural model,” in Neurocomputing: Foundations of Research, J. A. Anderson and E. Rosenfeld, Eds. Cambridge, MA: MIT Press, 1988.
[28] A. N. Michel, J. Si, and G. Yen, “Analysis and synthesis of a class of discrete-time neural networks described on hypercubes,” IEEE Trans. Neural Networks, vol. 2, pp. 32–46, 1991.
[29] L. K. Li, “Fixed point analysis for discrete-time recurrent neural networks,” in Proc. IJCNN, June 1992, vol. IV, pp. 134–139.
[30] L. Jin, P. N. Nikiforuk, and M. M. Gupta, “Absolute stability conditions for discrete-time recurrent neural networks,” IEEE Trans. Neural Networks, vol. 5, pp. 954–964, 1994.
[31] L. Jin and M. M. Gupta, “Globally asymptotical stability of discrete-time analog neural networks,” IEEE Trans. Neural Networks, vol. 7, pp. 1024–1031, 1996.
[32] L. Jin and M. M. Gupta, “Equilibrium capacity of analog feedback neural networks,” IEEE Trans. Neural Networks, vol. 7, pp. 782–787, 1996.
[33] E. K. Blum and X. Wang, “Stability of fixed points and periodic orbits and bifurcations in analog neural networks,” Neural Networks, vol. 5, no. 4, pp. 577–587, 1992.
[34] C. M. Marcus and R. M. Westervelt, “Dynamics of iterated map neural networks,” Phys. Rev. A, vol. 40, no. 1, pp. 577–587, 1989.
[35] C. M. Marcus, F. R. Waugh, and R. M. Westervelt, “Associative memory in an analog iterated-map neural network,” Phys. Rev. A, vol. 41, no. 6, pp. 3355–3364, 1990.
[36] D. E. Rumelhart and J. L. McClelland, “Learning internal representations by error propagation,” in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1. Cambridge, MA: MIT Press, 1986.
[37] R. Hecht-Nielsen, “Theory of the backpropagation neural network,” in Proc. Int. Joint Conf. Neural Networks, June 1989, pp. I-593–605.
[38] K. S. Narendra and K. Parthasarathy, “Identification and control of dynamical systems using neural networks,” IEEE Trans. Neural Networks, vol. 1, pp. 4–27, Mar. 1990.
[39] K. S. Narendra and K. Parthasarathy, “Gradient methods for the optimization of dynamical systems containing neural networks,” IEEE Trans. Neural Networks, vol. 2, pp. 252–262, 1991.
[40] K. J. Hunt, D. Sbarbaro, R. Zbikowski, and P. J. Gawthrop, “Neural networks for control systems—A survey,” Automatica, vol. 28, no. 6, pp. 1083–1112, 1992.
[41] D. R. Hush and B. G. Horne, “Progress in supervised neural networks: What’s new since Lippmann?,” IEEE Signal Processing Mag., no. 1, pp. 8–39, Jan. 1993.
[42] A. E. Bryson and Y. C. Ho, Applied Optimal Control. New York: Blaisdell, 1969.
[43] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1985.
Liang Jin received the B.S. and M.Sc. degrees in electrical engineering from the Changsha Institute of Technology, China, in 1982 and 1985, respectively, and the Ph.D. degree in electrical engineering from the Chinese Academy of Space Technology, China, in 1989. From 1989 to 1991, he was a Research Scientist of the Alexander von Humboldt (AvH) Foundation at the University of the Bundeswehr, Munich, Germany. From 1991 to 1995, he was a Research Scientist in the Intelligent Systems Research Laboratory at the University of Saskatchewan, Saskatoon, Canada. He was with SED Systems Inc., Saskatoon, Canada, from 1995 to 1996 as a Design Engineer. From 1996 to 1999, he was a Member of Scientific Staff at Nortel Networks, Ottawa, Canada. Since 1999, he has been a Member of Technical Staff with the Microelectronics Group of Lucent Technologies, Allentown, PA. He has published more than 30 conference and journal papers in the areas of neural networks, digital signal processing, and control and communication systems, and holds four U.S. patents (pending). His current research interests include intelligent information systems, digital signal processing with applications to wireless communications, and neural networks with applications to communication and control systems.
Madan M. Gupta (M’63–SM’76–F’90) received the B.Eng. (Hons.) and M.Sc. degrees in electronics-communications engineering from the Birla Engineering College (now the BITS), Pilani, India, in 1961 and 1962, respectively. He received the Ph.D. degree from the University of Warwick, U.K., in 1967 in adaptive control systems. In 1998, he received an earned Doctor of Science (D.Sc.) degree from the University of Saskatchewan, Canada, for his research in the fields of adaptive control systems, neural networks, fuzzy logic, neurocontrol systems, neuro-vision systems, and the early detection and diagnosis of cardiac ischemic disease. He is currently Professor of Engineering and the Director of the Intelligent Systems Research Laboratory and the Centre of Excellence on Neuro-Vision Research at the University of Saskatchewan, Canada. In addition to publishing over 650 research papers, he has coauthored two books on fuzzy logic (with Japanese translations) and has edited 21 volumes in the fields of adaptive control systems, fuzzy logic/computing, neuro-vision, and neurocontrol systems. His present research interests extend to the areas of neuro-vision, neurocontrols, and the integration of fuzzy-neural systems, the neuronal morphology of biological vision systems, intelligent and cognitive robotic systems, cognitive information, new paradigms in information processing, and chaos in neural systems. He is also developing new architectures of computational neural networks (CNN’s) and computational fuzzy neural networks (CFNN’s) for applications to advanced robotic systems. Dr. Gupta has served the engineering community worldwide in various capacities through societies such as IFSA, IFAC, SPIE, NAFIPS, UN, CANSFINS, and ISUMA. He is a Fellow of the SPIE. In June 1998, he was honored with the prestigious Kaufmann Prize and Gold Medal for Research into Fuzzy Logic.
He has been elected as a Visiting Professor and a Special Advisor in the areas of high technology to the European Centre for Peace and Development (ECPD), University of Peace, which was established by the United Nations.