
A Novel Fuzzy-Neural-Network Modeling Approach to Crude-Oil Blending

Wen Yu, Senior Member, IEEE

Abstract—In this brief, we propose a new fuzzy-neural-network (FNN) modeling approach which is applied to the modeling of crude-oil blending. The structure and the parameters of the FNNs are updated online. The new idea for the structure identification is that the input (precondition) and the output (consequent) spaces are partitioned over the same time index. This idea gives a better explanation of the input–output mapping of nonlinear systems. The contributions to the parameter identification are as follows: 1) a time-varying learning rate is applied to the commonly used backpropagation algorithm, and the upper bound of the modeling error and the stability of learning are proved, and 2) since the data of the precondition and the consequent lie in the same temporal interval, we can train each rule with its own group of data.

Index Terms—Crude-oil blending, fuzzy neural networks, online clustering.

I. INTRODUCTION

BOTH neural networks and fuzzy logic are universal estimators; they can approximate any nonlinear function to any prescribed accuracy, provided that sufficient hidden neurons or fuzzy rules are available. Recent results show that the fusion of these two technologies is very effective for nonlinear-system modeling [4]. Such neurofuzzy modeling falls into two categories [15], [17]: structure identification and parameter identification. Parameter identification is usually addressed by gradient-descent variants, e.g., the least-squares algorithm and backpropagation (BP) [22]. Structure identification selects the fuzzy rules; it relies on a substantial amount of heuristic observation to express proper strategic knowledge, and it is often tackled by offline trial-and-error approaches, such as the unbias criterion [19]. There are several approaches which generate fuzzy rules from numerical data. One of the most common methods for structure initialization is uniform partitioning of each input variable, which results in a fuzzy grid [13]. In [2], the Takagi-Sugeno-Kang (TSK) model is used for designing various neurofuzzy identifiers. These earlier approaches consist of two learning phases. The first is structure learning, which involves finding the main input variables among all possible ones, specifying the membership functions, partitioning the input space, and determining the number of fuzzy rules. The second is parameter learning, which involves determining and optimizing the unknown parameters; it uses

Manuscript received January 24, 2008; revised August 23, 2008. Manuscript received in final form October 14, 2008. First published April 14, 2009; current version published October 23, 2009. Recommended by Associate Editor J. Sarangapani. The author is with the Departamento de Control Automático, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav-IPN), Av.IPN 2508, México D.F., 07360, México (e-mail: [email protected]). Color versions of one or more of the figures in this brief are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCST.2008.2008194

some optimization methods based on the linguistic information obtained from the human expert and the numeric data of the actual system. These two learning phases are interrelated; neither of them can be carried out independently of the other. Traditionally, they are done sequentially and offline: the parameter updating is employed after the structure has been decided. Most structure-identification methods are based on data clustering, such as fuzzy C-means clustering [5], mountain clustering [17], and subtractive clustering [7]. These approaches require that all input–output data be ready before we start to identify the plant, so these structure-identification approaches are offline. There are a few online-clustering methods in the literature. In order to maintain an up-to-date clustering structure, an online version of the classical K-means clustering algorithm was developed by Beringer and Hüllermeier [3]. In [11], the input space is partitioned according to an aligned-clustering-based algorithm; after the number of rules is decided, the parameters are tuned by a recursive least-squares algorithm. A combination of online clustering and a genetic algorithm for fuzzy systems is proposed in [12]: the preconditions of a fuzzy system are constructed online by an aligned-clustering-based approach, and the consequents are designed by genetic reinforcement learning. In [21], the input space is automatically partitioned into fuzzy subsets by an adaptive resonance theory mechanism; fuzzy rules that tend to give high output error are split in two by a specific fuzzy-rule splitting procedure. In [20], the Takagi–Sugeno fuzzy inference system is applied for online knowledge learning; it requires more training data than models which use global generalization, such as the adaptive-network-based fuzzy inference system (ANFIS) [13] and multilayer perceptrons [22]. Online clustering for the input–output data with a recursively calculated spatial proximity measure is given in [1]; it is instrumental for the online identification of Takagi–Sugeno models with recursive modified weighted least-squares estimation. There exist two weaknesses in these online-clustering methods: 1) the input–output mapping of nonlinear systems evolves through time, but the partitioning of the input (precondition) and the output (consequent) spaces does not take the same time interval into account, and 2) since the data of the precondition and the consequent are not assured to lie in the same temporal interval, all data have to be used to train each rule. In this brief, a novel online-clustering approach is proposed to overcome these two weaknesses for nonlinear-system modeling. There are three new contributions in this brief. 1) The new idea for the structure identification is that the input (precondition) and the output (consequent) spaces are partitioned over the same time index, which in turn gives a better explanation of the input–output mapping of nonlinear systems. 2) A time-varying learning rate is applied to the BP algorithm; the upper bound of the modeling error and the stability of learning are obtained.



3) An application to the modeling of crude-oil blending is presented to show that the online-clustering method can be applied to nonlinear-system modeling via fuzzy neural networks (FNNs).

II. NONLINEAR-SYSTEM MODELING VIA FNNS

We start from a state-space discrete-time smooth nonlinear system

$x(k+1) = f[x(k), u(k)], \quad y(k) = g[x(k)]$   (1)

where $u(k)$ is the input vector, $x(k)$ is the state vector, and $y(k)$ is the output vector; $f$ and $g$ are general nonlinear smooth functions. Equation (1) can be rewritten as an input–output relation

(2)

Denoting the vectors of past outputs and inputs accordingly, and since (1) is a smooth nonlinear system, (2) can be expressed as the multivariable nonlinear autoregressive moving average (NARMA) model

$y(k) = \Phi[X(k)]$   (3)

where

$X(k) = [y(k-1), y(k-2), \ldots, u(k-d), u(k-d-1), \ldots]^T$   (4)

$\Phi(\cdot)$ is an unknown nonlinear function representing the plant dynamics, $u(k)$ and $y(k)$ are the measurable scalar input and output, and $d$ is the time delay. A generic fuzzy model is presented as a collection of fuzzy rules in the following form (Mamdani fuzzy model [16])

$R^j$: IF $x_1$ is $A_1^j$ and $\cdots$ and $x_n$ is $A_n^j$ THEN $\hat{y}_1$ is $B_1^j$ and $\cdots$ and $\hat{y}_m$ is $B_m^j$   (5)

We use $l$ fuzzy IF–THEN rules to perform a mapping from the input linguistic vector $x = [x_1, \ldots, x_n]$ to the output linguistic vector $\hat{y} = [\hat{y}_1, \ldots, \hat{y}_m]$. $A_i^j$ and $B_q^j$ are standard fuzzy sets. Each input variable $x_i$ has its own fuzzy sets; in the case of full connection, every combination of them defines a rule. By using product inference, center-average, and singleton fuzzifier, the $q$th output of the fuzzy-logic system can be expressed as

$\hat{y}_q = \dfrac{\sum_{j=1}^{l} w_q^j \prod_{i=1}^{n} \mu_{A_i^j}(x_i)}{\sum_{j=1}^{l} \prod_{i=1}^{n} \mu_{A_i^j}(x_i)}$   (6)

where $\mu_{A_i^j}$ are the membership functions of the fuzzy sets $A_i^j$ and $w_q^j$ is the point at which $\mu_{B_q^j}(w_q^j) = 1$. When we have prior information about the identified plant, we can construct fuzzy rules as in (5). The object of fuzzy neural modeling is to find the center values $w_q^j$, as well as the membership functions $\mu_{A_i^j}$, such that the FNN (6) can follow the nonlinear plant (3).

Fig. 1. Partitioning of input–output spaces.
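The inference in (6) is the standard product-inference, center-average, singleton-fuzzifier composition. The following minimal sketch illustrates it for Gaussian membership functions; the rule count, variable names, and numerical values are illustrative assumptions, not taken from the brief.

```python
import numpy as np

def fuzzy_output(x, centers, sigmas, w):
    """Center-average defuzzification with product inference, as in (6).

    x       : (n,) input vector
    centers : (l, n) Gaussian membership-function centers, one row per rule
    sigmas  : (l, n) Gaussian membership-function widths
    w       : (l,)   consequent centers (points where the output MF peaks)
    """
    # firing strength of each rule: product of the Gaussian memberships
    mu = np.exp(-((x - centers) ** 2) / (sigmas ** 2))   # (l, n)
    firing = np.prod(mu, axis=1)                          # (l,)
    return np.dot(firing, w) / (np.sum(firing) + 1e-12)   # weighted average

# illustrative call with two rules over a two-dimensional input
x = np.array([0.4, 1.2])
centers = np.array([[0.0, 1.0], [1.0, 2.0]])
sigmas = np.ones((2, 2))
w = np.array([0.5, 1.5])
print(fuzzy_output(x, centers, sigmas, w))
```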

III. STRUCTURE IDENTIFICATION

The objective of structure identification is to partition the input and the output data of the nonlinear system and to determine how many groups we need, i.e., how many rules of the above form are required.

Now, we use the following example to explain the importance of online clustering in the same time interval. We consider a nonlinear function

(7)

The data pair is shown in Fig. 1; by the normal online-clustering methods proposed in [1], [11], [20], and [21], the input and the output may be partitioned into four groups. These groups can be formed into four rules of the form "IF the input belongs to group i, THEN the output belongs to group i." Obviously, the third rule does not satisfy the relation (7), because its precondition and its consequent do not occur at the same time. One possible method to deal with this kind of continuously increasing sequence of time-stamped data is to use an incremental version of the K-means algorithm [3], where the standard K-means algorithm runs on the current data streams. When a new block is available for all streams, the current streams are updated by a sliding-window operation, and the clustering structure of the current streams is taken as an initialization for the clustering structure of the new streams. In this brief, the basic idea of online clustering is that the input- and output-space partitioning is carried out in the same time interval. If the distance from a point to the center is less


than a required length, the point belongs to this group. When new data come, the center and the group should be changed according to the new data. We give the following algorithm. The Euclidean distance at time k is defined as

(8)

where the centers of the input and the output groups at time k appear together with two positive weighting factors; normally, fixed values can be chosen for these factors. Usually, fuzzy models are developed by partitioning the data in the input- and output-space domains separately. In this brief, we also consider the time domain. The new idea of the online clustering in this brief is that the input–output space partitioning is carried out in the same temporal interval. There are two reasons: 1) nonlinear-system modeling is to find a suitable mapping between the input and the output spaces, and these two spaces are connected by the time index, and 2) we will propose an online modeling approach based on the online clustering. When a new group (or a new rule) is created, we do not want to use all the data to train it as in [1], [11], [20], and [21]. If the data have a time property, we can use the data in the corresponding time interval to train the rule. Therefore, clustering with a time interval will simplify parameter identification and make the online modeling easier. For each group, the centers are updated by

(9)

where the first and the last time index of the group define its length and its time interval. The process of the structure identification can be formed as the following steps (a code sketch of this procedure follows at the end of this section).
1) For the first data pair, its input and output are taken as the centers of the first group.
2) If new data come, use (8) and (9) to calculate the distance to the centers of the current group. If no new data come, go to 5).
3) If the distance is within the threshold, the new data stay in the current group; go to 2).
4) If the distance exceeds the threshold, the new data start a new group and become its center; go to 2).
5) Check the distances between all centers; if two centers are closer than the threshold, the two groups are combined into one group.
There are three design parameters: the input weight, the output weight, and the threshold. The two weights can be regarded as the weights on the input and the output spaces, respectively. If the input dominates the dynamic property, we should increase the input weight and decrease the output weight. Usually, we select the weights such that the input and the output are of the same importance. If the output space is not taken into account, the method becomes the normal online clustering. The threshold governs the creation of new rules; it is the lowest possible value of similarity required to join two objects in one cluster. How to choose the user-defined threshold is a tradeoff. If the threshold value is too small, there will still be many groups present at the end, and many of them will be singletons. Conversely, if the threshold is too large, many objects that are not very similar may end up in the same cluster. If we want the algorithm to partition the data into several groups, the threshold must be chosen accordingly; otherwise, there is only one group. There are some approaches to select the optimal cluster number; for example, in [3], the optimal cluster number is updated by (10), where a quality measure is evaluated for each candidate cluster number. Going to a smaller cluster number means that one of the current clusters has disappeared, e.g., the streams in this cluster have become very similar to the streams in a neighboring cluster. Going to a larger cluster number means that an additional cluster has emerged, e.g., a homogeneous cluster of streams has separated into two groups. If we do not use the threshold and change the online Euclidean distance (8) into sliding windows, (10) can be applied to decide the cluster number.
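The steps above can be summarized in the sketch below. Because the exact expressions of (8) and (9) are not legible in this copy, the weighted input/output distance, the incremental center update, and the names alpha, beta, and L0 (the two weights and the threshold) are written in one plausible form and should be read as assumptions.

```python
import numpy as np

def online_clustering(X, Y, alpha, beta, L0):
    """Joint input-output online clustering over the same time index.

    X, Y  : sequences of input vectors x(k) and scalar outputs y(k)
    alpha : assumed weight on the input-space distance
    beta  : assumed weight on the output-space distance
    L0    : assumed threshold for creating a new group (rule)
    Returns a list of groups, each with its centers and its time interval.
    """
    groups = [{"cx": np.array(X[0], float), "cy": float(Y[0]),
               "first": 0, "last": 0, "n": 1}]
    for k in range(1, len(X)):
        g = groups[-1]                      # current group
        d = alpha * np.linalg.norm(X[k] - g["cx"]) + beta * abs(Y[k] - g["cy"])
        if d <= L0:                         # step 3: point stays in the group
            g["n"] += 1
            g["last"] = k
            # incremental mean of the data assigned to the group
            g["cx"] += (X[k] - g["cx"]) / g["n"]
            g["cy"] += (Y[k] - g["cy"]) / g["n"]
        else:                               # step 4: start a new group at time k
            groups.append({"cx": np.array(X[k], float), "cy": float(Y[k]),
                           "first": k, "last": k, "n": 1})
    # step 5: merge groups whose centers are closer than the threshold
    merged = []
    for g in groups:
        for m in merged:
            if (alpha * np.linalg.norm(g["cx"] - m["cx"])
                    + beta * abs(g["cy"] - m["cy"])) <= L0:
                m["n"] += g["n"]
                m["last"] = max(m["last"], g["last"])
                break
        else:
            merged.append(g)
    return merged
```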

IV. PARAMETER IDENTIFICATION

For each group, there is one fuzzy rule

(11)

We use the input–output data of the group to train the membership functions, i.e., the parameter identification of the membership functions is performed in the corresponding input/output time interval found in the structure identification. We use Gaussian functions as the membership functions. If we use the singleton fuzzifier and Mamdani implication, the output of the ith group can be expressed as

(12)

where the consequent value is the center of the (normal) fuzzy set, and suitable initial values are selected for these parameters. We use the data of the group to find suitable membership functions in the corresponding zone. It can be transformed into a modeling problem to determine the membership-function parameters and the consequent centers; the objective is to find the center value of the consequent, as well as the membership functions, such that the rule output follows the data in that zone. We assume that the


data pair in this group can be expressed by a Gaussian membership function


Theorem 1: If we use the Mamdani-type FNN (12) to identify the nonlinear plant (13), the following BP algorithm makes the modeling error bounded

(13)

where the centers, widths, and consequent values are unknown parameters which minimize the modeling error. In the case of three independent variables, a smooth function has the Taylor formula

(18)

,

where ,

where

is the remainder of the Taylor formula. If we let correspond to , and and correspond to , and , then we have

(14)

(15)

where

and

. The identification error satisfies the following average performance

where the quantities are defined as above. The only difference between the stable learning (18) and gradient descent is the learning gain. For gradient descent, the gain is a positive constant; it should be small enough that the learning process is stable. In (18), the normalizing learning rate is time-varying in order to assure that the identification is stable. The time-varying learning rate is easier to decide; no prior information is required, and, for example, we may simply fix the nominal rate. The contradiction between fast convergence and stable learning can thus be avoided. If we select the learning rate as a dead-zone function (zero when the modeling error lies inside a small band, and the normalized rate otherwise)
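The exact update (18) is not legible in this copy. The sketch below shows the kind of normalized, time-varying learning rate the text describes: the gradient step for one rule is scaled by 1/(1 + squared gradient norm), so stability does not depend on hand-tuning a small constant rate. The Gaussian-membership parameterization, the gradient expressions, and the nominal rate eta are assumed forms for illustration; the rule is trained only on the data of its own group, as the brief proposes.

```python
import numpy as np

def train_rule(X, Y, c, sigma, w, eta=1.0, epochs=10):
    """BP-style update of one Mamdani rule with a normalized learning rate.

    X, Y     : the data of the group (time interval) that this rule covers
    c, sigma : Gaussian membership centers/widths of the rule, shape (n,)
    w        : consequent center of the rule
    """
    for _ in range(epochs):
        for x, y in zip(X, Y):
            mu = np.exp(-((x - c) ** 2) / sigma ** 2)
            z = np.prod(mu)                    # firing strength of the rule
            e = z * w - y                      # modeling error of this rule
            # gradients of the rule output with respect to w, c, sigma
            g_w = z
            g_c = w * z * 2.0 * (x - c) / sigma ** 2
            g_s = w * z * 2.0 * (x - c) ** 2 / sigma ** 3
            norm2 = g_w ** 2 + np.dot(g_c, g_c) + np.dot(g_s, g_s)
            eta_k = eta / (1.0 + norm2)        # time-varying (normalized) rate
            w -= eta_k * e * g_w
            c -= eta_k * e * g_c
            sigma -= eta_k * e * g_s
    return c, sigma, w
```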

order approximation error of the Taylor series. Using the chain rule, we get

(16) so

(17)

where

.

satisfies

(19)

is a second-

We define

, and

equation (18) is the same as that in [23]. If a sigma-modification term or a modified delta-rule term is added to (18), it becomes that of [14]. However, all of them need the upper bound of the modeling error. Moreover, the modeling error is enlarged by the robust modifications [10]. A learning law with a constant learning rate can assure that the parameters converge to optimal (or locally optimal) values; the learning law (18) cannot assure that the parameters converge to optimal values. Compared with other FNNs [11], [15], [17], the parameter-identification algorithm (18) proposed in this brief has two advantages: 1) we use the data in the time interval which corresponds to the group to train each fuzzy rule independently, which generally gives better model accuracy than the normal FNNs, and 2) the time-varying learning rate in the BP-like algorithm can assure the boundedness and the convergence of the modeling error. The relation between the structure and parameter identifications is given in the following remark. Remark 1: The parameter learning is also affected by the structure identification. If the structure identification is poor (e.g., the weights and the threshold are not suitable), then the rule (11) cannot represent the mapping in the time interval well, and the unmodeled dynamic in (13) becomes bigger. Therefore, the bound of


Fig. 2. TMDB crude-oil blending process.

modeling error increases with it. Another influencing factor of the structure identification is the length of the group's time interval. This may matter when new rules are added to the system. If the interval is too small, the training data are not enough to guarantee that the learning procedure converges. If the interval is too big (when no clustering exists, it is a normal FNN), then the time interval becomes longer, and one rule (11) cannot approximate the complete dynamic in this long period. In this case, the unmodeled dynamic in (13) increases, and the modeling error is bigger.

V. APPLICATION TO THE MODELING OF CRUDE-OIL BLENDING

Crude oils are often blended to increase the sale price or the processability of a lower grade crude oil by blending it with a higher grade, higher priced crude. American Petroleum Institute (API) gravity is the most widely used indication of the density of crude oil. Usually, crude-oil blending is realized by a model-based optimization algorithm. The static properties of crude-oil blending can be obtained by thermodynamic analysis. However, mathematical models work only under some special conditions [6]. In this real application, we have only input/output data; FNNs can be applied to model the crude-oil blending. In this brief, we discuss a typical crude-oil blending process in PEMEX (a Mexican petroleum company); it is called Terminal Marítima de Dos Bocas Tabasco (TMDB). The flow sheet is shown in Fig. 2(a). It has three blenders, one dehydration unit, and one tank. Fig. 2(b) shows the static process of the crude-oil blending; each feed stock has a flow rate and a property, which can be API gravity. The data are recorded daily in the form of Microsoft Excel files. Each day, we have input data and output data; this is called the integrated model. Because new data keep arriving continuously, we use the online-clustering technique proposed in this brief for FNN modeling. The original data are obtained every hour; the daily mean values are calculated and saved, so that the measurement noise is reduced. We use 730 input/output pairs, which correspond to two years' worth of records, to train the fuzzy model.
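As a point of reference for the data-driven model, the simplest static description of a blender is a flow-weighted mix of the feed properties. Because API gravity is a nonlinear transform of specific gravity, the sketch below mixes specific gravity volumetrically and converts back; this ideal-mixing rule and the sample numbers are illustrative assumptions, not the plant model identified in the brief.

```python
def api_to_sg(api):
    """API gravity to specific gravity (standard definition)."""
    return 141.5 / (api + 131.5)

def sg_to_api(sg):
    return 141.5 / sg - 131.5

def blend_api(flows, apis):
    """Ideal volumetric blending: flow-weighted specific gravity."""
    sgs = [api_to_sg(a) for a in apis]
    total = sum(flows)
    sg_mix = sum(f * s for f, s in zip(flows, sgs)) / total
    return sg_to_api(sg_mix)

# three feed stocks entering a blender (illustrative values)
print(blend_api(flows=[120.0, 80.0, 50.0], apis=[33.0, 22.0, 16.0]))
```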

1) Structure Identification: The following Mamdani fuzzy model is used; for each group, the rule is

(20)

For the crude-oil blending, the output API gravity matters more than the input flow for partitioning, so the weights are selected accordingly. From Fig. 3, it can be seen that the maximum changes in the input and the output are about three and one; following Remark 2, the threshold is chosen accordingly for this application. The input–output partitioning of the two-month data is shown in Fig. 3, where the markers represent the center of each group and the vertical lines are the boundaries between the groups. We can see that there are three groups (rules) in the two-month data. For example, the time interval of the second group has length 15, and the center of the second group lies within that interval. The time intervals of the input and the output are identical. Then, we do online clustering for the other 22-month data. There exist six group combinations (step 5) in Section III), after which the final group number is obtained.

2) Parameter Identification: Each group has one fuzzy rule in the form of (20). We use (18) to train the membership functions of each rule. After parameter training, the final fuzzy model is obtained by the product inference, center-average, and singleton fuzzifier.

3) Testing: We use 28 testing data, which are one month's worth of records from the other year. In this way, we can assure that the testing phase is independent of the training phase. The testing data cover 28 days; this is a very short period compared with the learning period (730), because the training phase is slow and needs a large amount of data to assure convergence. The modeling results are shown in Fig. 4. The first plot shows the training phase; in order to make it clear, only a part of the training data (from 600 to 740) is reported. The behavior of the testing phase and the modeling error are shown in the second and third plots of Fig. 4.


Fig. 3. Input and output partitioning of two-month data.

Fig. 4. Modeling of crude-oil blending with online clustering and FNNs.

TABLE I COMPARISONS FOR DIFFERENT PARAMETERS

A. Comparisons

First, we discuss how the weighting and threshold parameters affect the structure identification and how to select testing sets. The results are shown in Table I. In this section, "Rule #" is the rule number, and the "Training" and "Testing" columns are RMS errors defined as $\sqrt{\frac{1}{N}\sum_{k=1}^{N}[y(k)-\hat{y}(k)]^2}$. "Case 1," "case 2," and "case 3" correspond to three choices of these parameters. The first 22-month data are for training; the other one-month data are for testing. "R1" and "R2" are two random selections of the training sets (22-month data) and testing sets (one-month data)


from the two-year data, with the same parameter values. From Table I, we can obtain the following conclusions. 1) The two weights act on the input and the output. In this application example, we try different values for them and find that they not only decide the structure (rule number) but also influence the parameter learning. The testing errors reach a minimum for one particular choice of the weights, and the training errors are almost the same. 2) The number of groups into which the spaces are partitioned also depends on the threshold parameter. From the earlier analysis, when the threshold is big, there are fewer groups. Each group has one fuzzy rule, so the number of fuzzy rules is also smaller. Now, for the same training data, there are more data in each group (or for each fuzzy rule), so the training error is small. However, the structure error of the three-fuzzy-rule system is big; although the parameter training is good, the testing error is big. When the threshold is small, there are more rules and fewer data for each rule. 3) If the training set and the testing set are selected randomly, the training error and the testing error do not change a lot, because the crude-oil blending policy remains the same for the whole year.

Second, in order to illustrate structure identification, we compare our approach, Online clustering for FNNs (OFN), with BP [18]; Online Fuzzy clustering with Independent input and output partition (OFI) [11], [20]; Discrete-time Neural Networks (DNNs) [24]; normal FNNs [13]; and FNNs with stable learning (FNN1) [22]. The training epochs for all models are the same, i.e., 730 (two-year data). The results are shown in Table II. In the method proposed in this brief, we use two learning rates. The normal BP algorithm [18] is compared with our stable parameter-identification algorithm (18). We use the same multilayer neural network as [18] (the numbers of input-layer, hidden-layer, and output-layer nodes are 8, 5, and 1, respectively). We found that above a certain learning rate the BP training becomes unstable. OFI uses the whole data set and the BP algorithm to train the fuzzy rules; the thresholds for the input and output are selected in two cases. The DNN [24] uses a similar time-varying learning algorithm; its structure is the same as that of BP [18]. The normal FNNs [13], [22] use six and eight rules; the training set is the 22-month data, and the testing set is the other one-month data. From Table II, we can obtain the following conclusions. 1) The time-varying learning rate for the steepest-descent update equations is faster than normal BP, particularly when the training data set is not large enough to assure convergence of the BP. Therefore, the training and testing errors of BP are bigger than those of OFN when the training set is the 22-month data. Moreover, FNN2 is better than FNN1. 2) OFI and FNN use the whole data set to train all membership functions of the fuzzy system. Although FNNs can learn from local data inherently, due to the local mapping of fuzzy rules, they are not like OFN, which uses the data in a certain interval to train each rule independently.


TABLE II COMPARISONS FOR DIFFERENT APPROACHES

OFN has better model accuracy than OFI and FNN for this example. 3) OFN and OFI obtain the rule number automatically, but OFI does not consider the time index. We find that, with the same threshold, our method has less model complexity; it needs fewer fuzzy rules while keeping a high modeling accuracy. 4) It is well known that both neural networks and fuzzy systems can approximate any nonlinear function to any prescribed accuracy. However, the fuzzy systems (OFN, OFI, FNN) are more complex than the neural networks (BP, DNN); here, each rule has eight Gaussian membership functions and one consequent parameter, so the fuzzy systems have more total parameters than the neural networks. Comparing OFN and DNN, we see that both fuzzy systems and neural networks can achieve high modeling accuracy with time-varying learning rates. However, the hidden nodes of the DNN have to be specified in advance, whereas the structure of OFN is obtained automatically. The data arrive once a day, so it may seem that there is no need for online learning, because we could retrain a fuzzy system in batch every day. The advantage of online clustering appears when the nonlinear system changes, for example, when the prescription of the crude-oil blending is modified. In this case, the historical data cannot be applied; for a batch method, we would have to use a forgetting factor or a moving window to select recent data. The online clustering proposed in this brief avoids this kind of problem.
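The "Training" and "Testing" columns of Tables I and II are root-mean-square errors over the respective data sets. A minimal computation is sketched below; the data split and the random values are placeholders for illustration only.

```python
import numpy as np

def rms_error(y_true, y_pred):
    """Root-mean-square modeling error used in Tables I and II."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# placeholder example: 22 months of training data, 1 month of testing data
y_train, y_train_hat = np.random.rand(660), np.random.rand(660)
y_test, y_test_hat = np.random.rand(28), np.random.rand(28)
print("training RMS:", rms_error(y_train, y_train_hat))
print("testing  RMS:", rms_error(y_test, y_test_hat))
```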

where updating law (18) can be written as

. The

Therefore, we have

(22) Because then

Since

,

, then

VI. CONCLUSION

In this brief, we have presented a quick and efficient approach for nonlinear-system modeling using FNNs. Both the structure identification and the parameter learning are done online. By the novel online-clustering approach and the time-varying learning law, we resolve two problems in online clustering for nonlinear-system modeling: 1) the input and output data are made to correspond at the same time, and 2) the parameters are updated with their own group data, and the learning process is stable.

Therefore,

We define as

APPENDIX
PROOF OF THEOREM 1

; we choose

We select a positive-definite scalar

(21)

(23)


where the dead-zone bound is defined in (19). Therefore, the Lyapunov function satisfies

(24)

where the bounding terms are class-K functions and the Lyapunov function is time varying. In addition, it satisfies a similar bound whose terms are also class-K functions. From [8], we know that the error system is stable and the modeling error is bounded. From (23), we have


It is noted that if the modeling error lies outside the dead zone, then the Lyapunov function decreases; hence, the total time during which the error stays outside the dead zone must be finite. Consider the corresponding time interval. If the error leaves the ball defined by the dead zone only a finite number of times and then reenters, it will eventually stay inside this ball. If the error leaves the ball an infinite number of times, then, since the total time spent outside the ball is finite, the error is still bounded. Therefore, the identification error and the weights are bounded. Considering the largest tracking error during the interval, its boundedness implies that the error will converge to the ball defined by the dead zone. Equation (19) is obtained.

REFERENCES

[1] P. Angelov, "An approach for fuzzy rule-base adaptation using online clustering," Int. J. Approx. Reason., vol. 35, no. 3, pp. 275–289, Mar. 2004.
[2] M. F. Azeem, M. Hanmandlu, and N. Ahmad, "Structure identification of generalized adaptive neuro-fuzzy inference systems," IEEE Trans. Fuzzy Syst., vol. 11, no. 5, pp. 666–681, Oct. 2003.
[3] J. Beringer and E. Hüllermeier, "Online clustering of parallel data streams," Data Knowl. Eng., vol. 58, no. 2, pp. 180–204, Aug. 2006.
[4] M. Brown and C. J. Harris, Neurofuzzy Adaptive Modelling and Control. Englewood Cliffs, NJ: Prentice-Hall, 1994.
[5] J. C. Bezdek, Pattern Recognition With Fuzzy Objective Function Algorithms. New York: Plenum, 1981.
[6] D. M. Chang, C. C. Yu, and I. L. Chien, "Coordinated control of blending systems," IEEE Trans. Control Syst. Technol., vol. 6, no. 4, pp. 495–506, Jul. 1998.
[7] S. L. Chiu, "Fuzzy model identification based on cluster estimation," J. Intell. Fuzzy Syst., vol. 2, no. 3, pp. 267–278, 1994.
[8] M. J. Corless and G. Leitmann, "Continuous state feedback guaranteeing uniform ultimate boundedness for uncertain dynamic systems," IEEE Trans. Autom. Control, vol. AC-26, no. 5, pp. 1139–1144, Oct. 1981.
[9] C. A. Gama, A. G. Evsukoff, P. Weber, and N. F. F. Ebecken, "Parameter identification of recurrent fuzzy systems with fuzzy finite-state automata representation," IEEE Trans. Fuzzy Syst., vol. 16, no. 1, pp. 213–224, Feb. 2008.
[10] P. A. Ioannou and J. Sun, Robust Adaptive Control. Upper Saddle River, NJ: Prentice-Hall, 1996.
[11] C. F. Juang and C. T. Lin, "An online self-constructing neural fuzzy inference network and its applications," IEEE Trans. Fuzzy Syst., vol. 6, no. 1, pp. 12–32, Feb. 1998.
[12] C. F. Juang, "Combination of online clustering and Q-value based GA for reinforcement fuzzy system design," IEEE Trans. Fuzzy Syst., vol. 13, no. 3, pp. 289–302, Jun. 2005.
[13] J. S. Jang, "ANFIS: Adaptive-network-based fuzzy inference system," IEEE Trans. Syst., Man, Cybern., vol. 23, no. 3, pp. 665–685, May/Jun. 1993.
[14] S. Jagannathan and F. L. Lewis, "Identification of nonlinear dynamical systems using multilayered neural networks," Automatica, vol. 32, no. 12, pp. 1707–1712, Dec. 1996.
[15] C. T. Lin, Neural Fuzzy Control Systems With Structure and Parameter Learning. New York: World Scientific, 1994.
[16] E. H. Mamdani, "Application of fuzzy algorithms for control of simple dynamic plant," Proc. Inst. Elect. Eng.—Control Theory Appl., vol. 121, no. 12, pp. 1585–1588, 1976.
[17] S. Mitra and Y. Hayashi, "Neuro-fuzzy rule generation: Survey in soft computing framework," IEEE Trans. Neural Netw., vol. 11, no. 3, pp. 748–768, May 2000.
[18] K. S. Narendra and S. Mukhopadhyay, "Adaptive control using neural networks and approximate models," IEEE Trans. Neural Netw., vol. 8, no. 3, pp. 475–485, May 1997.
[19] I. Rivals and L. Personnaz, "Neural-network construction and selection in nonlinear modeling," IEEE Trans. Neural Netw., vol. 14, no. 4, pp. 804–819, Jul. 2003.
[20] G. Serra and C. Bottura, "An IV-QR algorithm for neuro-fuzzy multivariable online identification," IEEE Trans. Fuzzy Syst., vol. 15, no. 2, pp. 200–210, Apr. 2007.
[21] S. G. Tzafestas and K. C. Zikidis, "NeuroFAST: Online neuro-fuzzy ART-based structure and parameter learning TSK model," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 31, no. 5, pp. 797–802, Oct. 2001.
[22] W. Yu and X. Li, "Fuzzy identification using fuzzy neural networks with stable learning algorithms," IEEE Trans. Fuzzy Syst., vol. 12, no. 3, pp. 411–420, Jun. 2004.
[23] W. Yu, A. S. Poznyak, and X. Li, "Multilayer dynamic neural networks for nonlinear system online identification," Int. J. Control, vol. 74, no. 18, pp. 1858–1864, Dec. 2001.
[24] X. Li and W. Yu, "Modeling of crude oil blending via discrete-time neural networks," Int. J. Comput. Intell., vol. 2, no. 1, pp. 63–70, 2005.


Recurrent Neural Networks Training With Stable Bounding Ellipsoid Algorithm

Wen Yu, Senior Member, IEEE, and José de Jesús Rubio

Abstract—Bounding ellipsoid (BE) algorithms offer an attractive alternative to traditional training algorithms for neural networks, for example, backpropagation and least squares methods. The benefits include high computational efficiency and fast convergence speed. In this paper, we propose an ellipsoid propagation algorithm to train the weights of recurrent neural networks for nonlinear systems identification. Both hidden layers and output layers can be updated. The stability of the BE algorithm is proven. Index Terms—Bounding ellipsoid (BE), identification, recurrent neural networks.

I. INTRODUCTION

RECENT results show that neural network techniques seem to be effective in identifying a broad category of complex nonlinear systems when complete model information cannot be obtained. Neural networks can be classified as feedforward and recurrent ones [8]. Feedforward networks, for example multilayer perceptrons, are implemented to approximate nonlinear functions. The main drawback of these neural networks is that the weights' updating does not utilize information on the local data structure and the function approximation is sensitive to the training data [17]. Since recurrent networks incorporate feedback, they have powerful representation capabilities and can successfully overcome the disadvantages of feedforward networks [13]. Even though backpropagation has been widely used as a practical training method for neural networks, there are some limitations such as slow convergence, local minima, and sensitivity to noise. In order to overcome these problems, many methods for neural identification, filtering, and training have been proposed, for example, Levenberg–Marquardt, momentum algorithms [15], the extended Kalman filter [23], and least-squares approaches [17], which can speed up the backpropagation training. Most of them use static structures. There are some special restrictions for recurrent structures. In [2], the output layer must be linear and the hidden-layer weights are chosen randomly. The extended Kalman filter with decoupling structure has fast convergence speed [22], however the computational complexity

Manuscript received December 03, 2007; revised July 17, 2008 and October 21, 2008; accepted December 18, 2008. First published May 15, 2009; current version published June 03, 2009. W. Yu is with the Departamento de Control Automático, CINVESTAV-IPN, México D.F. 07360, México (e-mail: [email protected]). J. de Jesús Rubio is with the Sección de Estudios de Posgrado e Investigación, Instituto Politécnico Nacional-ESIME Azcapotzalco, Col.Sta. Catarina, México D.F. 07320, México. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNN.2009.2015079

in each iteration is increased. The decoupled Kalman filter with diagonal matrix [19] is similar to the gradient algorithm, so the convergence speed cannot be increased. A main drawback of the Kalman filter training is that the theoretical analysis requires the uncertainty of neural modeling to be a Gaussian process. In 1979, Khachiyan indicated how an ellipsoid method for linear programming can be implemented in polynomial time [1]. This result caused great excitement and stimulated a flood of technical papers. The ellipsoid technique is a helpful tool in state estimation of dynamic systems with bounded disturbances [5]. There are many potential applications to problems outside the domain of linear programming. Weyer and Campi [27] obtained confidence ellipsoids which are valid for a finite number of data points, whereas Ros et al. [20] presented an ellipsoid propagation such that the new ellipsoid satisfies an affine relation with another ellipsoid. In [3], the ellipsoid algorithm is used as an optimization technique that takes into account the constraints on cluster coefficients. Lorenz and Boyd [14] described in detail several methods that can be used to derive an appropriate uncertainty ellipsoid for the array response. In [16], the problem concerning the asymptotic behavior of ellipsoid estimates is considered for linear discrete-time systems. There are few applications of ellipsoids to neural networks. In [4], unsupervised and supervised learning laws in the form of ellipsoids are used to find and tune the fuzzy function rules. In [12], an ellipsoid type of activation function is proposed for feedforward neural networks. In [10], multiweight optimization for bounding ellipsoid (BE) algorithms is introduced. In [6], a simple adaptive algorithm is proposed that estimates the magnitude of noise. They are based on two operations of ellipsoid calculus: summation and intersection, which correspond to the prediction and correction phases of the recursive state estimation problem, respectively. In [21], we used the BE algorithm to train recurrent neural networks, but the training algorithm did not have a standard recurrent form, so theoretical analysis could not be carried out. In this paper, we modify the above algorithm and analyze the stability of nonlinear system identification. To the best of our knowledge, neural network training and stability analysis with the ellipsoid or the BE algorithm has not yet been established in the literature, and this is the first paper to successfully apply the BE algorithm for stable training of recurrent neural networks. In this paper, the BE algorithm is modified to train the weights of a recurrent neural network for nonlinear system identification. Both hidden layers and output layers can be updated. Stability analysis of the identification error with the BE algorithm is given by a Lyapunov-like technique.


II. RECURRENT NEURAL NETWORKS TRAINING WITH BE ALGORITHM

is a known initial constant weight, is where with respect the derivative of nonlinear activation function and is the remainder of the Lipschitz form (5). to Also

Consider the following discrete-time nonlinear system:

(1)

where the state vector and the input vector have known upper bounds, and the plant dynamics is an unknown nonlinear smooth vector-valued function. We use the following series-parallel [15] recurrent neural network to identify the nonlinear plant (1):

(2) represents the state of the neural network. The is a stable matrix. The weights in an output , the weights in a hidden layer are is -dimensional vector function, and is a diagonal matrix

where matrix layer are

(6) is the derivative of nonlinear activation function where with respect to and is the remainder of the Lipschitz form (6). From [18, Lemma 12.5], and are bounded, and are when the functions bounded. So we have

(7) is a known initial constant weight, where . Similarly

(8) . When as a diagonal matrix, where substituting (7) and (8) into the plant (4), we have the following single-output form: (9) (3) where

and and

are sigmoid functions. The expressions on the right-hand side of (2) can also be evaluated at the network's own states and outputs, respectively, in which case it is called the parallel model [15]. By our previous results in [28], the parallel model training gives similar results to the series-parallel model (2). According to the Stone–Weierstrass theorem [13], the unknown nonlinear system (1) can be written in the following form:

(4)

where the last term represents the unmodeled dynamics. By [13], we know it can be made arbitrarily small by simply selecting an appropriate number of hidden neurons. Because the sigmoid functions are differentiable, based on [18, Lemma 12.5] we conclude that

where the output is

is th element of the vector The unmodeled dynamics are defined as

. (10)

where the parameter vector and the data vector are defined accordingly. The output of the recurrent neural network is

(11)

We define the training error as

(12)

The identification error between the plant (1) and the neural network (2) is

(5)

(13)


ellipsoid intersection will not increase. Now, using the ellipsoid definition for the neural identification, we define the parameter-error ellipsoid as

(14)

where the unknown optimal weight in (9) is the one that minimizes the modeling error. In this paper, we use the following two assumptions.

A1. It is assumed that the modeling uncertainty belongs to an ellipsoid set

(15)

Fig. 1. An ellipsoid.

where the bounds are known positive constants. The assumption A1 requires that the uncertainty is bounded. In this paper, we discuss open-loop identification, and we assume that the plant (1) is bounded-input–bounded-output (BIBO) stable, i.e., the input and the state in (1) are bounded. Since they are bounded, all of the data in the regressor are bounded, so the regressor is bounded. If

Fig. 2. Ellipsoid intersection of two ellipsoid sets.

Now we use the BE algorithm to train the recurrent neural network (2) such that the identification error is bounded. Definition 1: A real n-dimensional ellipsoid set, centered on a given point, can be described as

where the shape matrix is a positive-definite symmetric matrix. The volume of the ellipsoid is defined as in [5] and [20]

where the constant represents the volume of the unit ball in n dimensions. The orientation (direction of the axes) of the ellipsoid is determined by the eigenvectors of the shape matrix, and the lengths of the semimajor axes are determined by its eigenvalues. A 2-D ellipsoid is shown in Fig. 1. Definition 2: The ellipsoid intersection of two ellipsoid sets is another ellipsoid set [25], defined as

where the two shape matrices are positive-definite symmetric matrices. The normal intersection of the two ellipsoid sets is not an ellipsoid set in general. The ellipsoid set defined above contains the normal intersection of the ellipsoid sets [25]; Fig. 2 shows this idea. There exists a minimal-volume ellipsoid corresponding to the intersection, called the optimal bounding ellipsoid (OBE); see [11], [20], and [25]. In this paper, we will not try to find the OBE, but we will design an algorithm such that the volume of the new

By Definition 1, the two sets share a common center. Finding the exact set is an intractable task since the amount of information in (15) grows linearly in time. Moreover, evaluating a value of the bound in (14) involves the solution of high-order inequalities for (15).

A2. It is assumed that the initial weight errors are inside an ellipsoid

(16)

where the optimal weights are unknown. The assumption A2 requires that the initial weights of the neural networks be bounded; it can be satisfied by choosing suitable initial values. From the definition in (14), these sets also share a common center. By (14) and (15), the ellipsoid intersection satisfies

(17)

Thus, the problem of identification is to find a minimum set which satisfies (14). We will construct a recursive identification algorithm such that the next set is a BE set if the current one is a BE set. The next theorem shows the propagation process of these ellipsoids.
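Definition 1 describes an ellipsoid by a center and a positive-definite matrix, with the volume proportional to the square root of the determinant of that matrix and the axes set by its eigenstructure. The sketch below uses one common convention for that definition; the membership test, the proportional volume, and the sample matrix are illustrative assumptions, not the paper's exact notation.

```python
import numpy as np

def in_ellipsoid(w, center, P):
    """Membership test for E = {w : (w - c)^T P^{-1} (w - c) <= 1}.

    P is positive definite; its eigenvectors give the axis directions and
    its eigenvalues the squared semi-axis lengths (assumed convention).
    """
    d = w - center
    return float(d @ np.linalg.solve(P, d)) <= 1.0

def ellipsoid_volume(P):
    """Volume up to the unit-ball constant: proportional to sqrt(det P)."""
    return np.sqrt(np.linalg.det(P))

center = np.zeros(2)
P = np.diag([4.0, 1.0])          # semi-axes of length 2 and 1
print(in_ellipsoid(np.array([1.5, 0.3]), center, P))   # True
print(ellipsoid_volume(P))                              # 2.0 (proportional)
```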


Theorem 1: If in (14) is an ellipsoid set, we use the foland : lowing recursive algorithm to update

Substituting (21) into (22), it gives

(18) is a given diagonal positive-definite matrix, and where and positive constant which is selected such that Then is an ellipsoid set and satisfies

(23)

is a .

By the intersection property (17) of the ellipsoid sets, we have

(19)

So (23) becomes

where and is given in (12). Proof: First, we apply the matrix inversion lemma [7] to by (18) calculate

Now we use as in (12). The second term of the above equation can be calculated as Since

where specifically,

, and

denote matrices of the correct size, and , and

so

, it gives From (21), we know . Because so

,

(20) so (21) where

. Now we calculate . By (18), we have and

so (24) Equation (19) is established. Since

and (25)

(22)

is an ellipsoid set.


Remark 1: The algorithm (18) has a scalar form, which is for each subsystem. This method can decrease the computational burden when we estimate the weights of the recurrent neural network. A similar idea can be found in [8] and [19]. For each and in (18), we have element of

(26)

If the gain does not change with time, it becomes the backpropagation algorithm [8]. The time-varying gain in the BE algorithm may speed up the training process. The BE algorithm (18) has a similar structure to the extended Kalman filter training algorithm [22], [23], [26]

(27)

where the gain involves a small positive constant and the covariance of the process noise. For certain choices of these quantities it becomes the least-squares algorithm [7]; for others, (27) coincides with the BE algorithm (18). But there is a big difference: the BE algorithm is for the deterministic case, and the extended Kalman filter is for the stochastic case. The ellipsoid intersection of the two sets is (17), which is also an ellipsoid, defined as

where is a vector variable and is the center of . We cannot assure that the center of the ellipsoid intersection is also , but since the centers of and are , we can . From (19) and (18), we know guarantee that is inside

Fig. 3. Convergence of the intersection 5 .

The following steps show how to train the weights of recurrent neural networks with the BE algorithm. 1) Construct a recurrent neural network model (2) to identify the unknown nonlinear system (1); the system matrix is selected such that it is stable. 2) Rewrite the neural network in linear form

3) Train the weights as

4) The update gain is changed according to the BE algorithm (18).
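The numbered steps above define the training loop. Because the recursive expressions of (18) are not legible in this copy, the sketch below uses a generic recursive least-squares-style gain as a stand-in for the BE update of the gain matrix and the weights; it conveys the structure of the loop (linear-in-the-parameters form, error, gain, weight and matrix update), not the paper's exact algorithm. The names lam and p0 are assumed design constants.

```python
import numpy as np

def be_style_training(Z, Y, n_params, lam=0.99, p0=1e3):
    """Recurrent-identifier training loop with an RLS-style recursive gain.

    Z : sequence of regressor (data) vectors z(k), each of shape (n_params,)
    Y : sequence of plant outputs y(k)
    lam, p0 : forgetting factor and initial matrix scale (assumed values)
    """
    theta = np.zeros(n_params)          # weights written in linear form
    P = p0 * np.eye(n_params)           # ellipsoid-shaping / gain matrix
    for z, y in zip(Z, Y):
        y_hat = float(theta @ z)        # step 2: linear-in-parameters output
        e = y_hat - y                   # training error e(k)
        Pz = P @ z
        gain = Pz / (lam + float(z @ Pz))
        theta = theta - gain * e        # step 3: weight update
        P = (P - np.outer(gain, Pz)) / lam   # step 4: update of the matrix
    return theta, P
```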

III. STABILITY ANALYSIS

where

and . , i.e.,

If , then

. So , and the volume of satisfies

The volume of the new ellipsoid is less than the volume of the previous one when the modeling error is not small. Thus, the parameter-error set will converge to a residual set. This means that when the modeling error is bigger than the unmodeled dynamic, the set will converge to that residual set; see Fig. 3.

Theorem 1 tells us that is a BE if A2 is satisfied. So the weights of the neural networks are bounded with the training algorithm (18). The next theorem gives the bound of the identification error. Theorem 2: If we use the neural network (2) to identify the unknown nonlinear plant (1) with the training algorithm (18), is bounded, and the normalthen the identification error ization of the training error converges to the residual set (28) where

.


Proof: We define the following Lyapunov function:

is minimized. When is bounded, , we know (30) and

is also bounded. From

(29) (32)

as

Evaluating

Summarize (32) from

to

by (24), we have , so Since

is constant and (33)

From Theorem 1, we know is a BE set, so

. We define . So (30)

where

It is (28). Remark 2: Even if the parameters converge to their optimal values with the training algorithm (18), from (4), we know that there always exists unmodeled dynamics (structure error). So . we cannot reach

. Because

IV. SIMULATIONS The nonlinear plant to be identified is expressed as [15], [24]

where the bounding terms are class-K functions of the error and of the input, so the error dynamics admit a smooth input-to-state stable (ISS) Lyapunov function as in [9]; the dynamics of the training error are input-to-state stable. The "INPUT" corresponds to the second term of the last line in (30). The "STATE" corresponds to the first term of the last line in (30), i.e., the training error. Because the "INPUT" is bounded and the dynamics are ISS, the "STATE" is bounded. The training error is not the same as the identification error, but they are minimized at the same time. From (2), (4), (9), and (12), we have (31) where

. By the relation

(34) This input–output model can be transformed into the following state–space model

(35)

This unknown nonlinear system has the standard form (1). We use the recurrent neural network given in (2) to identify it. We select the system matrix as a stable diagonal matrix. The neural identifier (2) can be written in the form of (36). Here

because

Since

is a constant, the minimization of the training error means the upper bound of the identification error

is the nonlinear part. The dynamics of the linear part are determined by the eigenvalues of the system matrix. In this example, we found that the chosen matrix can assure both stability and fast response for the dynamic neural network (36). Model complexity is important in the context of system identification; it corresponds to the number of hidden nodes of the neural model. In order to get higher accuracy, we should use more hidden nodes. In [15], the static neural networks needed 20 hidden nodes for this example. For this simulation, we test different numbers of hidden nodes, and we find that with


Fig. 4. Identification errors of backpropagation and BE. Fig. 6. Identification errors of backpropagation and BE with a bounded perturbation.

that after the backpropagation algorithm becomes unstable. If we define the mean squared error for finite time as

Fig. 5. Identification errors of Kalman filter and BE.

more than three hidden nodes, the identification accuracy will not improve much. So we use three nodes in the hidden layer, i.e., , and . The initial weights , are chosen in random in . The input is and

then the comparison results for the identification error are shown in Fig. 4. in the BE When the time-varying learning rate , the training algorithm (18) is constant, e.g., BE training becomes backpropagation. The updating steps in the BE training are variable to guarantee the stability. Also when is large, so this BE algorithm performs much better than backpropagation. Extended Kalman filter training algorithms [22], [23], [26] are also effective when disturbances are white noise or small bounded noise, which is

(38)

(37) From Theorem 1, we know that the initial condition for should be large, and we select . , and corresponds to the learning Theorem 1 requires rate in (18). The bigger is, the faster is the training algorithm, but it is less robust. In this example, we found that is , i.e., satisfied. Theorem 1 also requires . It is the upper bound of the modeling error as in (15), and from (18), we see that also decides the learning rate. For this example, we found that is a good choice. We compare the BE training algorithm (18) with the standard backpropagation algorithm (26), and the learning rate for . In this simulation, we found the backpropagation is

We choose and . The comparison results for the identification error are shown in Fig. 5. Now we repeat the above simulations with a bounded perturin the input. This input–output model can be written bation as


(39)


Fig. 7. Identification errors of Kalman filter and BE with a bounded perturbation.

where is a uniform random noise. When it is bounded by 0.02, the comparisons between the BE algorithm and backpropagation are shown in Fig. 6. When it is bounded by 0.2, the comparisons between the BE algorithm and extended Kalman filter are shown in Fig. 7. They show that the BE technique does better than both backpropagation and extended Kalman filter in the presence of non-Gaussian noise. in the extended Kalman filter (38) and The leaning rate the BE (18) are similar. Both of them have fast convergence speeds. The extended Kalman filter requires that the external disturbances be white noises. On the other hand, the noises in the BE training are required to be bounded. When the disturbances are big, the bounded ellipsoid training proposed in this paper has less steady error than the extended Kalman filter algorithm. Theorem 1 gives necessary conditions of and for stable and . In this example, we found that learning, i.e., and , the learning process becomes unstable. if V. CONCLUSION In this paper, a novel training method for recurrent neural network is proposed, and the BE algorithm is modified for neural identification. Ellipsoid intersection and ellipsoid volume are introduced to explain the physical meaning of the proposed training algorithms. Both hidden layers and output layers of the recurrent neural networks can be updated. Lyapunov-like technique is used to prove that the ellipsoid intersection can be propagated and the BE algorithm is stable. The proposed concept can be extended for feedforward neural networks. The BE algorithm may also be applied to nonlinear adaptive control, fault detection and diagnostics, performance analysis of dynamic systems and time series, and forecasting. REFERENCES [1] R. G. Bland, D. Goldfarb, and M. J. Todd, “The ellipsoid method: A survey,” Oper. Res., vol. 29, pp. 1039–1091, 1981.

[2] F. N. Chowdhury, “A new approach to real-time training of dynamic neural networks,” Int. J. Adapt. Control Signal Process., vol. 31, pp. 509–521, 2003. [3] M. V. Correa, L. A. Aguirre, and R. R. Saldanha, “Using steady-state prior knowledge to constrain parameter estimates in nonlinear system identification,” IEEE Trans. Circuits Syst. I, Fund. Theory Appl., vol. 49, no. 9, pp. 1376–1381, Sep. 2002. [4] J. A. Dickerson and B. Kosko, “Fuzzy function approximation with ellipsoid rules,” IEEE Trans. Syst. Man Cybern. B. Cybern., vol. 26, no. 4, pp. 542–560, Aug. 1996. [5] E. Fogel and Y. F. Huang, “On the value of information in system identification: Bounded noise case,” Automatica, vol. 18, no. 2, pp. 229–238, 1982. [6] S. Gazor and K. Shahtalebi, “A new NLMS algorithm for slow noise magnitude variation,” IEEE Signal Process. Lett., vol. 9, no. 11, pp. 348–351, Nov. 2002. [7] G. C. Goodwin and K. Sang Sin, Adaptive Filtering Prediction and Control. Englewood Cliffs, NJ: Prentice-Hall, 1984. [8] S. Haykin, Neural Networks-A Comprehensive Foundation. New York: Macmillan, 1994. [9] Z. P. Jiang and Y. Wang, “Input-to-state stability for discrete-time nonlinear systems,” Automatica, vol. 37, no. 2, pp. 857–869, 2001. [10] D. Joachim and J. R. Deller, “Multiweight optimization in optimal bounding ellipsoid algorithms,” IEEE Trans. Signal Process., vol. 54, no. 2, pp. 679–690, Feb. 2006. [11] S. Kapoor, S. Gollamudi, S. Nagaraj, and Y. F. Huang, “Tracking of time-varying parameters using optimal bounding ellipsoid algorithms,” in Proc. 34th Allerton Conf. Commun. Control Comput., Monticello, IL, 1996, pp. 1–10. [12] N. S. Kayuri and V. Vienkatasubramanian, “Representing bounded fault classes using neural networks with ellipsoid activation functions,” Comput. Chem. Eng., vol. 17, no. 2, pp. 139–163, 1993. [13] E. B. Kosmatopoulos, M. M. Polycarpou, M. A. Christodoulou, and P. A. Ioannou, “High-order neural network structures for identification of dynamic systems,” IEEE Trans. Neural Netw., vol. 6, no. 2, pp. 422–431, Mar. 1995. [14] R. G. Lorenz and S. P. Boyd, “Robust minimum variance beam-forming,” IEEE Trans. Signal Process., vol. 53, no. 5, pp. 1684–1696, May 2005. [15] K. S. Narendra and K. Parthasarathy, “Identification and control of dynamic systems using neural networks,” IEEE Trans. Neural Netw., vol. 1, no. 1, pp. 4–27, Mar. 1990. [16] S. A. Nazin and B. T. Polyak, “Limiting behavior of bounding ellipsoid for state estimation,” in Proc. 5th IFAC Symp. Nonlinear Control Syst., St. Petersburg, Russia, 2001, pp. 585–589. [17] A. G. Parlos, S. K. Menon, and A. F. Atiya, “An algorithm approach to adaptive state filtering using recurrent neural network,” IEEE Trans. Neural Netw., vol. 12, no. 6, pp. 1411–1432, Nov. 2001. [18] A. S. Poznyak, E. N. Sanchez, and W. Yu, Differential Neural Networks for Robust Nonlinear Control. Singapore: World Scientific, 2001. [19] G. V. Puskorius and L. A. Feldkamp, “Neurocontrol of nonlinear dynamic systems with Kalman filter trained recurrent networks,” IEEE Trans. Neural Netw., vol. 5, no. 2, pp. 279–297, Mar. 1994. [20] L. Ros, A. Sabater, and F. Thomas, “An ellipsoid calculus based on propagation and fusion,” IEEE Trans. Syst. Man Cybern. B, Cybern., vol. 32, no. 4, pp. 430–442, Aug. 2002. [21] J. J. Rubio and W. Yu, “Neural networks training with optimal bounded ellipsoid algorithm,” in Advances in Neural Networks-ISNN 2007, ser. Lecture Notes in Computer Science 4491. Berlin, germany: SpringerVerlgag, 2007, pp. 1173–1182. [22] J. J. Rubio and W. 
Yu, “Nonlinear system identification with recurrent neural networks and dead-zone Kalman filter algorithm,” Neurocomputing, vol. 70, no. 13, pp. 2460–2466, 2007. [23] D. W. Ruck, S. K. Rogers, M. Kabrisky, P. S. Maybeck, and M. E. Oxley, “Comparative analysis of backpropagation and the extended Kalman filter for training multilayer perceptrons,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 6, pp. 686–691, Jun. 1992. [24] P. S. Sastry, G. Santharam, and K. P. Unnikrishnan, “Memory neural networks for identification and control of dynamic systems,” IEEE Trans. Neural Netw., vol. 5, no. 2, pp. 306–319, Mar. 1994. [25] F. C. Schweppe, Uncertain Dynamic Systems. Englewood Cliffs, NJ: Prentice-Hall, 1973. [26] S. Singhal and L. Wu, “Training multilayer perceptrons with the extended Kalman algorithm,” Adv. Neural Inf. Process. Syst. I, pp. 133–140, 1989.


[27] E. Weyer and M. C. Campi, “Non-asymptotic confidence ellipsoids for the least squares estimate,” in Proc. 39th IEEE Conf. Decision Control, Sydney, Australia, 2000, pp. 2688–2693. [28] W. Yu, “Nonlinear system identification using discrete-time recurrent neural networks with stable learning algorithms,” Inf. Sci., vol. 158, no. 1, pp. 131–147, 2002.

Wen Yu (M’97–SM’04) received the B.S. degree in electrical engineering from Tsinghua University, Beijing, China, in 1990 and the M.S. and Ph.D. degrees in electrical engineering from Northeastern University, Shenyang, China, in 1992 and 1995, respectively. From 1995 to 1996, he served as a Lecturer at the Department of Automatic Control, Northeastern University. In 1996, he joined CINVESTAV-IPN, México, where he is currently a Professor at the Departamento de Control Automático. He also held a research position with the Instituto Mexicano del Petróleo, from December 2002 to November 2003. Since October 2006, he has been a senior visiting research fellow at Queen’s University Belfast. He also held a visiting profes-


sorship at Northeastern University in China from 2006 to 2008. His research interests include adaptive control, neural networks, and fuzzy control. Dr. Yu serves as an Associate Editor of Neurocomputing and the International Journal of Modelling, Identification and Control. He is a member of the Mexican Academy of Science.

José de Jesús Rubio was born in México City in 1979. He graduated in electronic engineering from the Instituto Politecnico Nacional, México, in 2001. He received the M.S. and Ph.D. degrees in automatic control from CINVESTAV IPN, México, in 2004 and 2007, respectively. He was a full time Professor in the Autonomous Metropolitan University, Mexico City, Mexico, from 2006 to 2008. Since 2008, he has been a full time Professor at the Instituto Politecnico Nacional, ESIME Azcapotzalco, Mexico. He has published four chapters in international books and ten papers in international magazines and he has presented more than 20 papers in international conferences. He is a member of the adaptive fuzzy systems task force. His research interests are primarily focused on evolving intelligent systems, nonlinear and adaptive control systems, neural-fuzzy systems, mechatronic, robotic, and delayed systems.
