Quick fuzzy backpropagation algorithm

Neural Networks 14 (2001) 231-244

Contributed article

A. Nikov a,*, S. Stoeva b

a Technical University of Sofia, PO Box 41, BG-1612, Sofia, Bulgaria
b Bulgarian National Library, Sofia, Bulgaria

Received 30 December 1998; accepted 1 August 2000

* Corresponding author. Tel.: +359-2-965-3693; fax: +359-2-512-909. E-mail address: nikov@tu-sofia.acad.bg (A. Nikov).

Abstract

A modification of the fuzzy backpropagation (FBP) algorithm called the QuickFBP algorithm is proposed, where the computation of the net function is significantly quicker. It is proved that the FBP algorithm is of exponential time complexity, while the QuickFBP algorithm is of polynomial time complexity. Convergence conditions of the QuickFBP, resp. the FBP algorithm are defined and proved for: (1) single output neural networks in case of training patterns with different targets; and (2) multiple output neural networks in case of training patterns with equivalued target vector. They support the automation of the weights training process (quasi-unsupervised learning), establishing the target value(s) depending on the network's input values. In these cases the simulation results confirm the convergence of both algorithms. An example with a large-sized neural network illustrates the significantly greater training speed of the QuickFBP algorithm compared with the FBP algorithm. The adaptation of an interactive web system to users on the basis of the QuickFBP algorithm is presented. Since the QuickFBP algorithm ensures quasi-unsupervised learning, it is broadly applicable in areas such as adaptive and adaptable interactive systems and data mining. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords: Neural networks; Fuzzy logic; Quasi-unsupervised learning; Adaptable interactive systems

1. Introduction

Many problems in applied sciences and engineering, especially adaptive and adaptable interactive systems (Berthold & Hand, 1999; Brusilovsky, Kobsa & Vassileva, 1998; Decker & Sycara, 1997; Fink, Kobsa & Schreck, 1997; Oppermann, Rashev & Kinshuk, 1997; Oppermann, 1998) and data mining (Berry & Linoff, 1997; Michalski, Bratko & Kubat, 1998; Westphal & Blaxton, 1998), are connected with the acquisition of implicit knowledge in the relevant domain. Neural networks can represent internally the knowledge necessary to solve a given problem. After learning, the network's knowledge about the problem is spread over its weights and units. Therefore neural networks describe implicit knowledge. In many practical situations like those mentioned above there is no special need to translate it into explicit knowledge. The widely used learning method for neural networks is the backpropagation algorithm (Rumelhart, Hinton & Williams, 1986). The classical backpropagation algorithms are supervised algorithms that require the targets to be indicated. In real-life situations of extraction of implicit knowledge it is often

difficult for the experts to determine the exact target as a point evaluation. If in this case it were possible to determine the interval to which the point evaluation belongs, this would considerably help the experts in their definition of the target. The basic idea of this paper is to present an algorithm ensuring quasi-unsupervised learning based on the convergence conditions of the algorithm. One such algorithm is the fuzzy backpropagation (FBP) algorithm (Stoeva & Nikov, 2000), which is quicker than the standard backpropagation algorithms (Nauck, Klawonn & Krause, 1997; Rumelhart et al., 1986). For the bottom-up aggregation of input values (forward propagation) the Sugeno fuzzy integral (Sugeno, 1977) is employed. This aggregation has a psychological background that simulates the experts' decision-making. For top-down changing of the network weights (learning), error backpropagation takes place. In this way fuzzy logic is combined with neural networks (Nauck et al., 1997). However, the FBP algorithm has a drawback: the computation of the net function is based on constructing the power set of the set of network inputs. Therefore the computational time grows exponentially in case of large-sized neural networks containing neurons with a large number of inputs. This reduces the possibilities for application of the FBP algorithm in the above mentioned areas. Here we



propose a modification of the FBP algorithm, called the QuickFBP algorithm, where the computation of the net function is significantly quicker. This work thus extends the paper (Stoeva & Nikov, 2000), where the convergence conditions of the FBP algorithm are proved only for single output neural networks in case of training patterns with equal targets. In the present paper new convergence conditions of the QuickFBP resp. the FBP algorithm are formulated and proved for:

1. single output neural networks in case of training patterns with different targets and
2. multiple output neural networks in case of training patterns with equivalued target vector.

They determine the target(s) depending on the network's input values. Fuzzy neurons are used in similar learning processes in (Pedrycz, 1991; Pedrycz & Rocha, 1993), but the convergence conditions for these algorithms are not published. The simulation examples provided here illustrate the QuickFBP algorithm. A summary of the notation used is given in Table 1.

Table 1
Notation list

∨: max operation in the unit interval [0, 1]
∧: min operation in the unit interval [0, 1]
a_i^(l): activation value of the ith neuron at the lth layer
^m a^(l): activation value a^(l) relevant to the mth pattern
a_i^(l)(s): activation value a_i^(l) at the sth step
a_{j'}^(0): the upper bound of the pattern's inputs
a_{j''}^(0): the lower bound of the pattern's inputs
a_{j'_m}^(0): the upper bound of input values of the mth pattern
a_{j''_m}^(0): the lower bound of input values of the mth pattern
f(net): activation function
g: fuzzy measure on the set P(J)
h: fuzzy measure on the set P(I)
net_i^(l): net value of the ith neuron at the lth layer
s: the sth iteration step
t: the pattern's output value, i.e. target value
t_m: the output value of the mth pattern
t_mk: the kth output value of the mth pattern
w_ij^(l): weight of the jth input of the ith neuron at the lth layer
Δw_ij^(l): weight change at the lth layer
w_ij^(l)(s): weight w_ij^(l) at the sth step
w_ij^old: weight w_ij^(l)(s)
w_ij^new: weight w_ij^(l)(s + 1)
x_j: the jth pattern's input
I: the set of (indices of) neurons at the hidden layer
J: the set of (indices of) network inputs
|J|: the cardinal number of the set J
K: the set of (indices of) neurons at the output layer
L: the number of network layers
M: the set of (indices of) training patterns
N^+: the set of positive integers
P(I): the power set of the set I
Q: performance index
R: the set of real numbers
T_m: the vector of output values of the mth pattern, i.e. the target vector
X_m: the vector of input values of the mth pattern
δ_i^(l): error for the ith neuron at the lth layer
∅: the empty set
η: the learning rate

2. Description of the QuickFBP algorithm

The QuickFBP algorithm is a fuzzy backpropagation algorithm that offers a significant computational speed-up for training multilayer networks. It provides a considerable acceleration of the convergence of the FBP algorithm by means of a modified computation of the net function. The steps of the QuickFBP algorithm are described as follows (cf. Fig. 1). Each training pattern is composed of n_0 input values x_j and of n_2 desired output values (targets) t_k (cf. Fig. 4).

Step 1: determining of input values of training patterns
The input values of training patterns are determined arbitrarily, without assumptions about their probability distributions or about their statistical independence. They are transformed into the unit interval [0, 1].

Step 2: determining of target values of training patterns
The targets of training patterns are determined in the unit interval [0, 1] according to the relevant convergence conditions (cf. Section 4 below). This can be done:

1. Automatically by a computer on the basis of:
   - the interval between the minimum and maximum of the input values of training pattern(s) with equal targets for single output neural networks, or
   - the convergence profile of input values of training patterns with different targets for single output neural networks, or
   - the convergence 3D-profile of input values of training patterns with an equivalued target vector for multiple output neural networks
   (a short code sketch of this automatic case is given after Step 3 below).
2. Semi-automatically by experts that define the direction and degree of target value changes. Thus the actual target value is determined by fuzzy sets relevant to predefined linguistic expressions.

Step 3: determining of neural network initial weights
The neural network initial weights are also determined arbitrarily, without requiring them to be different from each other. They belong to the unit interval [0, 1].
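The automatic variant of Step 2 can be pictured with a short sketch based on the convergence conditions of Section 4: for patterns with equal targets, a common target may be chosen from the interval between the largest per-pattern minimum and the smallest per-pattern maximum (Theorem 2); for patterns with different targets, every target equals the minimal or the maximal input of its own pattern (Theorems 3-5). The code below is only an illustrative reading of these rules; the function names are not from the paper.

```python
# Illustrative sketch of Step 2 (automatic target determination), assuming
# all inputs are already scaled to [0, 1]. Function names are hypothetical.

def equal_target_interval(patterns):
    """Interval from which one common target t may be chosen (cf. Theorem 2):
    from the largest per-pattern minimum up to the smallest per-pattern maximum."""
    lo = max(min(x) for x in patterns)
    hi = min(max(x) for x in patterns)
    return lo, hi

def profile_targets(patterns, profile="min"):
    """Targets for patterns with different targets (cf. Theorems 3 and 4):
    each target equals the minimal ('min') or maximal ('max') input of its pattern."""
    pick = min if profile == "min" else max
    return [pick(x) for x in patterns]

patterns = [[0.2, 0.7, 0.4], [0.1, 0.9, 0.5]]
print(equal_target_interval(patterns))   # (0.2, 0.7): any common t in between
print(profile_targets(patterns, "min"))  # [0.2, 0.1]
print(profile_targets(patterns, "max"))  # [0.7, 0.9]
```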


Fig. 1. QuickFBP algorithm steps.

Step 4: quick computation of the net function
In this step a quicker computation of the net function of the QuickFBP algorithm in comparison with that of the FBP algorithm is proposed.

Let x_j ∈ [0, 1] ⊂ R, j = 1, ..., n_0, and all network weights w_ij^(l) ∈ [0, 1] ⊂ R, where l = 1, 2. Let P(J) be the power set of the set J of the network inputs (indices), where J_ij is the jth input of the ith neuron. Further on, for simplicity, network inputs are identified with their indices. Let g: P(J) → [0, 1] be a function defined as follows:

g(∅) = 0,
g({J_ij}) = w_ij^(1), 1 ≤ j ≤ n_0,
g({J_ij_1, J_ij_2, ..., J_ij_r}) = w_ij_1^(1) ∨ w_ij_2^(1) ∨ ... ∨ w_ij_r^(1), where {j_1, j_2, ..., j_r} ⊂ {1, 2, ..., n_0},
g({J_i1, J_i2, ..., J_ij, ..., J_in_0}) = 1,

where the notations ∨ and ∧ stand for the operations MAX and MIN, respectively, in the unit real interval [0, 1] ⊂ R. It is proved (Stoeva, 1992) that the function g is a fuzzy measure on P(J). Therefore the functional assumes the form of the Sugeno integral over the finite reference set J. Let h: P(I) → [0, 1] be a function defined in a similar way.

2.1. Computation of the net function by the FBP algorithm

The FBP algorithm calculates the net function as follows (Stoeva & Nikov, 2000):

Hidden neurons:
net_i^(1) = ∨_{G ∈ P(J)} [ (∧_{p ∈ G} x_p) ∧ g(G) ],   (1)
a_i^(1) = f(net_i^(1)), i = 1, 2, ..., n_1.

Output neurons:
net^(2) = ∨_{G ∈ P(I)} [ (∧_{p ∈ G} a_p^(1)) ∧ h(G) ],   (2)
a^(2) = f(net^(2)).

The activation functions f(net) are linear ones or equal to f(net) = 1/(1 + e^(-24(net - 0.5))) in order to obtain f(net) ∈ [0, 1]. Therefore the activation values a_i^(l) ∈ [0, 1], where l = 1, 2.

2.2. Computation of the net function by the QuickFBP algorithm

The quicker computation of the net function proposed is carried out as follows. Let P_s(J) resp. P_s(I) be the subset of the power set P(J)


resp. P(I) of the set J of network inputs resp. the set I of neurons at the hidden layer that contains only the subsets of single elements and the whole set J resp. I.

Proposition 1. The net function can be calculated as follows:

Hidden neurons:
net_i^(1) = ∨_{G ∈ P_s(J)} [ (∧_{p ∈ G} x_p) ∧ g(G) ],   (3)

Output neurons:
net^(2) = ∨_{G ∈ P_s(I)} [ (∧_{p ∈ G} a_p^(1)) ∧ h(G) ].   (4)

Proof. During the computation of the net function net_i^(1) according to Eq. (3) the operations ∨ and ∧ are taken over the activation values resp. weights of all subsets of single elements of the set J and of the whole set J, i.e. over all the combinations of 1-class and over the single combination of |J|-class. The same activation values resp. weights "build up" all the possible combinations of r-class of activation values resp. weights, where r = 1, 2, ..., |J|, that are needed for the computation of the net function net_i^(1) according to Eq. (1). Since J is a finite set, no different results of the operations ∨ and ∧ can appear during the two computations of the net function net_i^(1). Therefore, the net function net_i^(1) can be computed on the basis of the set P_s(J). Analogously, it can be proved that the computation of the net function net^(2) can be done on the basis of the set P_s(I).

In order to get a value at the output node, the input nodes of the neuron proposed by Pedrycz (1991) should be activated to a certain degree. Otherwise, because of the lack of summation, the aggregation effect might be too low to activate the output. To overcome this shortcoming and to obtain the entire unit interval of admissible values at the output node, a bias v is incorporated in Pedrycz's (1991) fuzzy neuron:

net_i^(1) = ∨[ ∨_j (x_j ∧ w_ij^(1)), v ].   (5)

In order to find the convergence conditions for the QuickFBP algorithm we introduce a special bias ∧_j x_j. Therefore, Eq. (3) can be written as follows:

net_i^(1) = ∨[ ∨_j (x_j ∧ w_ij^(1)), ∧_j x_j ].   (6)

This bias is of great importance for proving the convergence conditions in Theorem 1. Such conditions have not been published for Pedrycz's algorithm (Pedrycz, 1991).
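To make the reduced computation concrete, the following sketch evaluates the FBP net function of Eq. (1) over the full power set and the QuickFBP net function with bias of Eq. (6); both return the same value, as Proposition 1 states. This is an illustrative reading in Python, not the authors' implementation, and the identifiers are chosen here only for clarity.

```python
# Illustrative comparison of the FBP net function (Eq. (1), full power set)
# with the QuickFBP net function with bias (Eq. (6), singletons plus the
# whole input set). Sketch only; the identifiers are not from the paper.
from itertools import combinations

def fbp_net(x, w):
    """Eq. (1): maximum over all non-empty subsets G of
    min(inputs in G) AND g(G), where g(G) is the maximum of the weights
    in G and g of the whole input set equals 1."""
    n = len(x)
    best = 0.0
    for r in range(1, n + 1):
        for G in combinations(range(n), r):
            g = 1.0 if r == n else max(w[j] for j in G)
            best = max(best, min(min(x[j] for j in G), g))
    return best

def quickfbp_net(x, w):
    """Eq. (6): maximum over the singletons of (x_j AND w_j),
    together with the bias min_j x_j (the whole-set term)."""
    return max(max(min(xj, wj) for xj, wj in zip(x, w)), min(x))

x = [0.3, 0.8, 0.6]
w = [0.9, 0.4, 0.7]
print(fbp_net(x, w), quickfbp_net(x, w))  # 0.6 0.6 -- equal, as Proposition 1 states
```

For n inputs the loop in fbp_net visits 2^n - 1 subsets, while quickfbp_net touches only the n singletons and the bias term; this is the source of the exponential versus polynomial behaviour analysed in Section 3.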

Step 5: error computation
Here the error computation formula for single-layered, single-output (l = 1, n_l = 1) neural networks is derived. The following performance index Q is introduced:

Q = (1/2) Σ_m (t_m - ^m a_i^(l))²,   (7)

where m = 1, 2, ..., M and M is the number of training patterns. A stationary point of Q is searched for, following a sequence of iterations where the appropriate increments/decrements are based upon the derivative ∂Q/∂w_ij^(l). Using the chain rule, we obtain:

∂Q/∂w_ij^(l) = (∂Q/∂(^m a_i^(l))) · (∂(^m a_i^(l))/∂w_ij^(l)).   (8)

The first derivative can be computed from Eq. (7):

∂Q/∂(^m a_i^(l)) = -(t_m - ^m a_i^(l)).

Taking into account that the activation function f(net) is a linear one, the second derivative is expressed as follows:

∂net_i^(l)/∂w_is^(l) = ∂/∂w_is^(l) [ ∨( ∨_j (x_j ∧ w_ij^(l)), ∧_j x_j ) ]
= ∂/∂w_is^(l) { w_is^(l), if ∧_j x_j ≤ ∨_j (x_j ∧ w_ij^(l)) and ∨_{j≠s} (x_j ∧ w_ij^(l)) ≤ x_s ∧ w_is^(l) and w_is^(l) ≤ x_s;  τ, otherwise },

where τ is an expression that does not include w_is^(l). Therefore it follows that the derivative can be computed as:

∂net_i^(l)/∂w_is^(l) = { 1, if ∧_j x_j ≤ ∨_j (x_j ∧ w_ij^(l)) and ∨_{j≠s} (x_j ∧ w_ij^(l)) ≤ x_s ∧ w_is^(l) and w_is^(l) ≤ x_s;  0, otherwise }.

Substituting back into Eq. (8), we obtain the error:

δ_i^(l) = -∂Q/∂w_is^(l) = { t_m - ^m a_i^(l), if ∧_j x_j ≤ ∨_j (x_j ∧ w_ij^(l)) and ∨_{j≠s} (x_j ∧ w_ij^(l)) ≤ w_is^(l) and w_is^(l) ≤ x_s;  0, otherwise }.   (9)
The error for networks with more than one layer (l . 1) can be computed for each network node in an analogous way.


Fig. 2. Time complexity of the FBP and the QuickFBP algorithms for one neuron with n inputs.

Step 6: network weights learning
In order to obtain the trained weights w_ij^new ∈ [0, 1], the weights are handled as follows:

Case t = a^(2): no adjustment of the weights is necessary.
Case t > a^(2): if w_ij^old < t then w_ij^new = 1 ∧ (w_ij^old + Δw_ij^(l)); if w_ij^old ≥ t then w_ij^new = w_ij^old.
Case t < a^(2): if w_ij^old > t then w_ij^new = 0 ∨ (w_ij^old - Δw_ij^(l)); if w_ij^old ≤ t then w_ij^new = w_ij^old.

Here Δw_ij^(l) = η δ_i^(l), where δ_i^(l) is determined by means of Eq. (9) and η is the learning rate.
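The case analysis of Step 6 can be summarised in a small helper. The sketch below treats delta_w as a non-negative step size (the magnitude of η·δ_i^(l) from Eq. (9)) and only illustrates the clipping rules; it is not the authors' implementation.

```python
# Illustrative weight handling of Step 6 for a single weight. delta_w is
# assumed to be a non-negative step size (the magnitude of eta * delta
# from Eq. (9)).

def update_weight(w_old, t, a, delta_w):
    """t == a: no change; t > a: raise weights below t, clipped at 1 (1 AND ...);
    t < a: lower weights above t, clipped at 0 (0 OR ...)."""
    if t > a and w_old < t:
        return min(1.0, w_old + delta_w)
    if t < a and w_old > t:
        return max(0.0, w_old - delta_w)
    return w_old

# The actual output a = 0.5 is below the target t = 0.9, so a weight of 0.5
# is raised (with an assumed delta_w = 0.2) while a weight of 0.95 is left alone.
print(update_weight(0.5, t=0.9, a=0.5, delta_w=0.2))   # 0.7
print(update_weight(0.95, t=0.9, a=0.5, delta_w=0.2))  # 0.95
```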

3. Time complexity of the QuickFBP algorithm versus the FBP algorithm

For large-sized neural networks it is very important to determine the application bounds of the learning algorithm used. In case of a large number of network inputs, the time complexity of the algorithm determines its application bounds (Papadimitriou & Steiglitz, 1982). An appropriate measure of algorithm complexity is the number of steps for the computation of the net function for a given number of network inputs n. For simplicity, and without loss of generality, a network with one neuron with n inputs is considered. The number of steps for the computation of the net function of the FBP algorithm according to Eqs. (1) and (2) is approximately 2^(n+1) - 1 (Reingold, Nievergelt & Deo, 1977). The number of steps for the computation of the net function of the QuickFBP algorithm according to Eqs. (3) and (4) is approximately n + 1. So the time complexity of the FBP algorithm is approximately O(2^(n+1)) and the time complexity of the QuickFBP algorithm is approximately O(n). An example of the corresponding time complexities of the FBP and the QuickFBP algorithms is given in Fig. 2. Accordingly, the QuickFBP algorithm for one neuron with 100 inputs needs 101 computational steps, whereas the FBP algorithm needs about 10^30 steps. Therefore the QuickFBP algorithm is approximately

2^(n+1) / n   (10)

times quicker than the FBP algorithm in this case.

4. Convergence conditions of the QuickFBP resp. the FBP algorithm

The convergence conditions of the FBP algorithm (Stoeva & Nikov, 2000) are proved only for single output neural networks in case of training patterns with equal targets. Here we add and prove the following two new convergence conditions for:

1. single output neural networks in case of training patterns with different targets (cf. Section 4.2) and
2. multiple output neural networks in case of training patterns with equivalued target vector (cf. Section 4.3).

Without loss of generality, we may consider networks with only one hidden layer, but all definitions and convergence conditions can easily be extended to networks with more than one hidden layer (see the description of the QuickFBP algorithm). The proofs can be performed through mathematical induction with respect to the number of network layers.


Fig. 3. A neural network with single output neuron and a hidden layer.

Fig. 4. A neural network with n2 output neurons and a hidden layer.

4.1. Convergence conditions for single output neural networks in case of training patterns with equal targets

The following convergence conditions of the FBP algorithm for single output neural networks in case of a single training pattern (cf. Fig. 3) are also valid for the QuickFBP algorithm. This can easily be proved using Proposition 1. According to these conditions, the interval between the minimum and maximum of the input values of the training pattern shall include the output of the training pattern, i.e. the target value (Stoeva & Nikov, 2000).

Definition 1. The QuickFBP resp. the FBP algorithm is convergent to the target value t if there exists a number s_0 ∈ N^+ such that the following equalities hold:

t = a^(2)(s_0) = a^(2)(s_0 + 1) = a^(2)(s_0 + 2) = ...,
w_ij^(l)(s_0) = w_ij^(l)(s_0 + 1) = w_ij^(l)(s_0 + 2) = ..., l = 1, 2.

The following theorem concerns the case of a single training pattern.

Theorem 1. The QuickFBP, resp. the FBP algorithm is convergent to the target value t if the following condition holds for the neurons at the input layer:

∃j' ∃j'' (a_{j'}^(0) ≥ t and a_{j''}^(0) ≤ t).
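In plain terms the condition says that the target must lie between the smallest and the largest input value of the pattern. A one-line check, as an illustrative sketch (the function name is not from the paper):

```python
# Illustrative check of the Theorem 1 condition: some input is >= t and
# some input is <= t, i.e. t lies within [min(x), max(x)].
def satisfies_theorem1(x, t):
    return min(x) <= t <= max(x)

print(satisfies_theorem1([0.2, 0.7, 0.4], 0.5))  # True
print(satisfies_theorem1([0.2, 0.7, 0.4], 0.9))  # False
```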


The proof of this theorem is given in (Stoeva & Nikov, 2000).

Definition 2. In case of multiple training patterns (X_m, t), m = 1, ..., M, the QuickFBP resp. the FBP algorithm is convergent to the target value t if there exists a number s_0 ∈ N^+ such that the following equalities hold:

t = ^m a^(2)(s_0) = ^m a^(2)(s_0 + 1) = ^m a^(2)(s_0 + 2) = ..., m = 1, ..., M;
w_ij^(l)(s_0) = w_ij^(l)(s_0 + 1) = w_ij^(l)(s_0 + 2) = ..., l = 1, 2.

The following theorem concerns the case of training patterns with equal targets.

Theorem 2. Let more than one training pattern (X_m, t), m = 1, ..., M, be given. Let a_{j'_m}^(0), a_{j''_m}^(0) be inputs of the mth pattern such that Theorem 1 holds, i.e. a_{j'_m}^(0) ≥ t and a_{j''_m}^(0) ≤ t. Let a_{m'}^(0) = ∧_{m=1}^{M} a_{j'_m}^(0) and a_{m''}^(0) = ∨_{m=1}^{M} a_{j''_m}^(0). Then the QuickFBP, resp. the FBP algorithm is convergent to the target value t if the following condition holds: a_{m'}^(0) ≥ t and a_{m''}^(0) ≤ t.

The proof of this theorem is given in Stoeva and Nikov (2000).

4.2. Convergence conditions for single output neural networks in case of training patterns with different targets

The following convergence conditions of the QuickFBP resp. the FBP algorithm for single output neural networks (cf. Fig. 3) in case of training patterns with different targets are formulated and proved: the target value of each training pattern shall equal its minimal input value or its maximal one. The following Definition 3 and Theorem 3 concern the case of two training patterns, and Definition 4 and Theorem 4 concern the common case of multiple training patterns (M > 2).

Definition 3. In case of two training patterns (X_m, t_m), m = 1, 2, the QuickFBP resp. the FBP algorithm is convergent to the target values t_m, m = 1, 2, if there exists a number s_0 ∈ N^+ such that the following equalities hold:

t_1 = a^(2)(s_0) = a^(2)(s_0 + 2) = a^(2)(s_0 + 4) = ...,
t_2 = a^(2)(s_0 + 1) = a^(2)(s_0 + 3) = a^(2)(s_0 + 5) = ...,
w_ij^(l)(s_0) = w_ij^(l)(s_0 + 1) = w_ij^(l)(s_0 + 2) = ...,

where l = 1, 2.

Theorem 3. Let two training patterns (X_m, t_m), m = 1, 2, be given. Let a_{j''_m}^(0), a_{j'_m}^(0) be the minimal, respectively the maximal input value of the mth training pattern, such that Theorem 1 holds, i.e. a_{j'_m}^(0) ≥ t_m and a_{j''_m}^(0) ≤ t_m. Then the QuickFBP, resp. the FBP algorithm is convergent to the target values t_m, m = 1, 2, iff the following condition holds:

(t_1 = a_{j''_1}^(0) and t_2 = a_{j''_2}^(0)) or (t_1 = a_{j'_1}^(0) and t_2 = a_{j'_2}^(0)).

Proof. Let t_1 = a_{j''_1}^(0), t_2 = a_{j''_2}^(0) and t_1 ≤ t_2. At some iteration step s_0 the lower target value t_1 implies a reduction of the network weights sufficiently low such that at the next iteration step s_0 + 1 only the network inputs determine the network output. So the weights remain unchanged further on. The iteration process is convergent to the target values t_1 and t_2 according to Definition 3.

Let t_1 = a_{j'_1}^(0), t_2 = a_{j'_2}^(0) and t_1 ≥ t_2. At some iteration step s_0 the higher target value t_1 implies an increase of the network weights sufficiently high such that at the next iteration step s_0 + 1 only the network inputs determine the network output. So the weights remain unchanged further on. The iteration process is convergent to the target values t_1 and t_2 according to Definition 3.

The 'only if' part of Theorem 3 can be proved by analogy with the 'only if' part of Theorem 1.

Definition 4. In case of multiple training patterns (X_m, t_m), m = 1, ..., M, the QuickFBP, resp. the FBP algorithm is convergent to the target values t_m, m = 1, ..., M, if there exists a number s_0 ∈ N^+ such that the following equalities hold:

t_m = a^(2)(s_0 + m - 1) = a^(2)(s_0 + M + m - 1) = a^(2)(s_0 + 2M + m - 1) = ..., m = 1, ..., M;
w_ij^(l)(s_0) = w_ij^(l)(s_0 + 1) = w_ij^(l)(s_0 + 2) = ...,

where l = 1, 2.

Theorem 4. Let more than two training patterns (X_m, t_m), m = 1, ..., M, M > 2, be given. Let a_{j''_m}^(0), a_{j'_m}^(0) be the minimal respectively maximal input value of the mth training pattern, such that Theorem 1 holds, i.e. a_{j'_m}^(0) ≥ t_m and a_{j''_m}^(0) ≤ t_m. Then the QuickFBP resp. the FBP algorithm is convergent to the target values t_m, m = 1, ..., M, if the following conditions hold:

(∀m: t_m = a_{j''_m}^(0)) or (∀m: t_m = a_{j'_m}^(0)), m = 1, ..., M.


Fig. 5. A single output neural network in case of training patterns with different targets.

Proof. Theorem 3 can easily be generalised for the common case of M training patterns.

An example illustrating Theorems 3 and 4 is given in Figs. 5-7.

4.3. Convergence condition for multiple output neural networks in case of training patterns with equivalued target vector

The following convergence conditions of the QuickFBP, resp. the FBP algorithm for multiple output neural networks (cf. Fig. 4) in case of training patterns with equivalued target vector are formulated and proved: the target values of each training pattern shall equal its minimal input value or its maximal one. The following Definition 5 and Theorem 5 concern multiple output neural networks in case of training patterns with equivalued target vector.

Definition 5. In case of multiple training patterns (X_m, T_m), m = 1, ..., M, T_m = (t_m1, t_m2, ..., t_mn_2), the QuickFBP, resp. the FBP algorithm is convergent to the target values t_mk, m = 1, ..., M, k = 1, ..., n_2, if there exists a number s_0 ∈ N^+ such that the following equalities hold:

t_mk = a_k^(2)(s_0 + m - 1) = a_k^(2)(s_0 + M + m - 1) = a_k^(2)(s_0 + 2M + m - 1) = ..., m = 1, ..., M, k = 1, ..., n_2;
w_ij^(l)(s_0) = w_ij^(l)(s_0 + 1) = w_ij^(l)(s_0 + 2) = ...,

where l = 1, 2.


Fig. 6. Convergence profiles and an illustrative non-convergence profile determining target values for the neural network shown in Fig. 5.

Fig. 7. Errors in case of training patterns with different targets.

Theorem 5. Let M training patterns (X_m, T_m), m = 1, ..., M, T_m = (t_m1, t_m2, ..., t_mn_2), be given. Let a_{j''_m}^(0), a_{j'_m}^(0) be the minimal respectively maximal input value of the mth training pattern, such that Theorem 1 holds, i.e. a_{j'_m}^(0) ≥ t_mk and a_{j''_m}^(0) ≤ t_mk, where k = 1, ..., n_2. Then the QuickFBP resp. the FBP algorithm is convergent to the target values t_mk, m = 1, ..., M, k = 1, ..., n_2, if the following conditions hold:

(∀m ∀k: t_mk = a_{j''_m}^(0)) or (∀m ∀k: t_mk = a_{j'_m}^(0)), m = 1, ..., M, k = 1, ..., n_2.

Proof. Theorem 4 can easily be generalised for the common case of a network with n_2 outputs.

An example illustrating Theorem 5 is shown in Figs. 8-11.

5. Computer simulation results

In the following a computer simulation illustrates the QuickFBP algorithm. Examples of single, resp. multiple


Fig. 8. A multiple output neural network in case of training patterns with equivalued target vector.

Fig. 9. Convergence profiles and an illustrative non-convergence profile determining the equivalued target vector for the neural network shown in Fig. 8.


Fig. 10. Errors in case of training patterns with equivalued target vector for network output a_1^(3).

Fig. 11. Errors in case of training patterns with equivalued target vector for network output a_2^(3).

output neural networks in case of training patterns with different targets (cf. Figs. 5-7), resp. with an equivalued target vector (cf. Figs. 8-11), are given. The QuickFBP algorithm is compared with the FBP algorithm (cf. Figs. 12 and 13). An example illustrates the application of the QuickFBP algorithm for the adaptation of an interactive web system to users (cf. Fig. 14).

5.1. Single output neural network in case of training patterns with different targets

The following example illustrates Theorems 3 and 4 concerning the convergence conditions for single output neural networks in case of training patterns with different targets. In Fig. 5 an example of a neural network with 4


Fig. 12. A fully connected neural network with 10 inputs, 55 neurons, single output neuron and nine hidden layers.

inputs, 6 neurons at 3 layers and 1 output is given. Five training patterns are used. According to Theorems 3 and 4, the following two convergence profiles for determining the target values and an illustrative non-convergence profile are defined.

MinTargets: Low convergence profile. The target value should equal the minimal input value of the relevant training

pattern. If all target values fulfil this condition the learning process is convergent. After two training steps the error (the difference between the actual network output and the target value of the relevant training pattern) is zero (cf. Fig. 7).

MaxTargets: Upper convergence profile. The target value should equal the maximal input value of the relevant training pattern. If all target values fulfil this condition the learning process is convergent. After five training steps the error reaches zero (cf. Fig. 7).

Min_MaxTargets: Illustrative non-convergence profile with oscillations. It is a profile different from the MinTargets and MaxTargets profiles. If one or more target values belong to this profile the learning process is not convergent. There are oscillations but the error does not reach zero (cf. Fig. 7).

5.2. Multiple output neural network in case of training patterns with equivalued target vector

Fig. 13. Comparison of the FBP and the QuickFBP algorithms on the basis of computational time [s] needed for their convergence.

An example illustrates Theorem 5 concerning the convergence condition for multiple output neural networks in case of training patterns with equivalued target vector. For this purpose a neural network is used with 4 inputs, 7 neurons at 3 layers and 2 outputs (cf. Fig. 8). Here five training patterns are applied for network learning. According to Theorem 5


Fig. 14. Initial (left) and adapted (right) ELFI interaction structure.

for each training pattern three equivalued target vectors are determined. The elements of the target vectors belong to the profiles shown in Fig. 9. For the first training pattern these vectors are:

• MinTargets: (0.0, 0.0)
• MaxTargets: (0.9, 0.9)
• Min_MaxTargets: (0.5, 0.5)

As Figs. 10 and 11 show, the learning process is convergent for the MinTargets and MaxTargets profiles. At the same time the process is not convergent for the Min_MaxTargets profile, i.e. for target values different from the minimal and maximal input values of the relevant training patterns.

5.3. Experimental comparison of the QuickFBP algorithm with the FBP algorithm

For a comparison of the QuickFBP with the FBP algorithm, a large-sized neural network with 55 neurons at 10 layers is used (cf. Fig. 12). Theoretically, according to Eq. (10), the number of steps for computing the net functions of all neurons by the QuickFBP algorithm is approximately 1% of the FBP steps. The real QuickFBP computational time is 5% of the FBP time (cf. Fig. 13). The deviation of the experimental result from the theoretical one can be explained by taking into account the time spent on the error, weight, etc. calculations.

5.4. Illustrative example of a user-adapted interactive web system

The information overflow on the Internet can be tackled by adapting the information to a particular user or a group of users. An adaptation of the interaction structure of the web system Electronic Research Funding Information server ELFI (http://www.elfi.ruhr-uni-bochum.de/elfi/) based on the QuickFBP algorithm is carried out. The user interactions with ELFI are recorded in a logfile. The training patterns are taken from this logfile data using the transition frequencies between the interaction points. Using these patterns the QuickFBP algorithm determines the weights of the interaction points. On the basis of these weights the interaction structure (left) in Fig. 14 was adapted to the user (cf. the right interaction structure in Fig. 14). Further details are given in (Nikov & Pohl, 1999).
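The construction of training patterns from the logfile can be pictured with a small sketch. The data format and the function below are hypothetical; the paper only states that transition frequencies between interaction points are used, and the actual preprocessing is described in Nikov and Pohl (1999).

```python
# Hypothetical sketch of deriving training-pattern inputs from logged
# transitions between interaction points; the real preprocessing is
# described in Nikov and Pohl (1999), not here.
from collections import Counter

def transition_pattern(visits, points):
    """Count transitions between consecutive interaction points in one
    session log and normalise the counts to [0, 1] as pattern inputs."""
    counts = Counter(zip(visits, visits[1:]))
    total = max(sum(counts.values()), 1)
    return [counts[(a, b)] / total for (a, b) in points]

session = ["home", "search", "results", "search", "results", "detail"]
points = [("home", "search"), ("search", "results"), ("results", "detail")]
print(transition_pattern(session, points))  # [0.2, 0.4, 0.2]
```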


6. Conclusions


This work presents an extension of the paper (Stoeva & Nikov, 2000). A modification of the fuzzy backpropagation algorithm, called the QuickFBP algorithm, is proposed, where the net function computation is significantly quicker. It is proved that the FBP algorithm is of exponential time complexity, while the QuickFBP algorithm is of polynomial time complexity. Therefore the QuickFBP algorithm is approximately 2^(n+1)/n times quicker than the FBP algorithm for one neuron with n inputs. An example with a large-sized neural network shows a significant reduction of the QuickFBP computational time, down to only 5% of the FBP time. Two convergence conditions of the QuickFBP, resp. the FBP algorithm are defined and proved for:

1. single output neural networks in case of training patterns with different targets and
2. multiple output neural networks in case of training patterns with equivalued target vector.

These conditions determine the target(s) depending on the network's input values by means of introducing a special bias in the net function. The bias differs from similar biases used in other works, e.g. in Pedrycz's fuzzy neuron (Pedrycz, 1991). On the basis of the convergence conditions, the target(s) of training patterns can be determined automatically or semi-automatically. Therefore it is possible to carry out quasi-unsupervised learning of network weights, which ensures a wide applicability of the QuickFBP algorithm.

The QuickFBP algorithm has been implemented and tested in the area of adaptive interactive web systems. The results of this first real-world application of the algorithm demonstrate its ability to adapt the interaction structure to the user. The QuickFBP algorithm should be implemented in other interactive web systems for filtering the information to meet the individual users' needs and preferences (Espinoza & Höök, 1996; Gunst, Oppermann & Thomas, 1996). This adaptivity of web systems is an important issue for web mining (Feldman & Klösgen, 1998). Further acceleration of the algorithm may be achieved by using genetic algorithms for changing the network's weights (Kinnebrock, 1994).

Acknowledgements

This article was inspired by research on the LaboUr project "Machine Learning for User Modeling" funded by grant No 1044/12 of the German Science Foundation. The work was partially carried out while A. Nikov was a guest researcher at GMD FIT, the German National Research Center for Information Technology.

References

Berry, M., & Linoff, G. (1997). Data mining techniques for marketing, sales and customer support. New York: John Wiley & Sons.
Berthold, M., & Hand, D. J. (Eds.). (1999). Intelligent data analysis: an introduction. Berlin: Springer-Verlag.
Brusilovsky, P., Kobsa, A., & Vassileva, J. (Eds.). (1998). Adaptive hypertext and hypermedia. Dordrecht: Kluwer.
Decker, K. S., & Sycara, K. (1997). Intelligent adaptive information agents. Journal of Intelligent Information Systems, 9, 239-260.
Espinoza, F., & Höök, K. (1996). An interactive WWW interface to an adaptive information system. Workshop "User Modeling for Information Filtering on the World Wide Web" at the Fifth International Conference on User Modeling, Hawaii (http://www.cs.su.oz.au/~bob/um96-workshop.html).
Feldman, R., & Klösgen, W. (1998). Data mining on the web: a new promising challenge. Künstliche Intelligenz, 1, 35-36.
Fink, J., Kobsa, A., & Schreck, J. (1997). Personalized hypermedia information provision through adaptive and adaptable system features: user modeling, privacy and security issues. In Proc. Fourth International Conference on Intelligence in Services and Networks, Como (pp. 459-467). Berlin: Springer.
Gunst, G., Oppermann, R., & Thomas, C. G. (1996). Adaptable and adaptive systems. In Computers as assistants: a new generation of support systems (pp. 29-46). Hillsdale: Lawrence Erlbaum Associates.
Kinnebrock, W. (1994). Accelerating the standard backpropagation method using a genetic approach. Neurocomputing, 6, 583-588.
Michalski, R. S., Bratko, I., & Kubat, M. (1998). Machine learning and data mining: methods and applications. London: John Wiley.
Nauck, D., Klawonn, F., & Krause, R. (1997). Foundations of neuro-fuzzy systems. Chichester: Wiley.
Nikov, A., & Pohl, W. (1999). Combining user and usage modeling for user-adaptivity. In H.-J. Bullinger & J. Ziegler (Eds.), Human-computer interaction: ergonomics and user interfaces (pp. 336-340). London, Mahwah, New Jersey: Lawrence Erlbaum Associates.
Oppermann, R., Rashev, R., & Kinshuk (1997). Adaptability and adaptivity in learning systems. In A. Behrooz (Ed.), Knowledge transfer (pp. 173-179). London: Pace.
Oppermann, R. (1998). Adaptive user support: ergonomic design of manually and automatically adaptable software (computers, cognition, and work). Hillsdale, New Jersey: Lawrence Erlbaum Associates.
Papadimitriou, C. H., & Steiglitz, K. (1982). Combinatorial optimization: algorithms and complexity. Englewood Cliffs: Prentice-Hall.
Pedrycz, W. (1991). Neurocomputations in relational systems. IEEE Trans. Pattern Analysis and Machine Intelligence, PAMI-13, 289-297.
Pedrycz, W., & Rocha, A. F. (1993). Fuzzy-set based models of neurons and knowledge-based networks. IEEE Trans. Fuzzy Systems, 1, 254-266.
Reingold, E. M., Nievergelt, J., & Deo, N. (1977). Combinatorial algorithms: theory and practice. Englewood Cliffs: Prentice-Hall.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Parallel distributed processing: explorations in the microstructure of cognition. Cambridge, MA: MIT Press.
Sugeno, M. (1977). Fuzzy measures and fuzzy integrals: a survey. In M. Gupta, G. Saridis & B. Gaines (Eds.), Fuzzy automata and decision processes (pp. 89-101). Amsterdam: North-Holland.
Stoeva, S., & Nikov, A. (2000). A fuzzy backpropagation algorithm. Fuzzy Sets and Systems, 112(1), 27-39.
Stoeva, S. (1992). A weight-learning algorithm for fuzzy production systems with weighting coefficients. Fuzzy Sets and Systems, 48, 87-97.
Westphal, C., & Blaxton, T. (1998). Data mining solutions. John Wiley.