To appear in: International Conference on Artificial Neural Networks (ICANN'97), to be held in Lausanne, Switzerland, October 8-10, 1997
The Application of Radial Basis Function Networks with Implicit Continuity Constraints

Ralf Salomon
Department of Computer Science, University of Zurich
Winterthurerstr. 190, 8057 Zurich, Switzerland
FAX: +41-1-363 00 35, E-mail: [email protected]

Abstract. In contrast to most applications, it is not suitable for autonomous agents to distinguish between a learning and a performance phase; rather, continuous learning is required, especially in dynamically changing, partially unknown environments. This paper shows how modified radial basis function networks can be used as controllers for mobile robots that can adapt to different environments and also to sensor faults. In addition, the proposed model yields fast convergence rates in various regression and classification tasks; e.g., learning the well-known double-spiral problem requires only one epoch with perfect generalization.
1 Introduction

One goal of research on autonomous agents is to build robots that are able to behave without human control in dynamically changing, partially unknown environments. The control architecture of such agents is typically implemented as neural networks. The research community has developed several different neural network models, such as backpropagation (BP), radial basis functions (RBF) [10], growing cell structures [4], and self-organizing (Kohonen) feature maps [6], which have been successfully applied in diverse areas, such as pattern and speech recognition, financial forecasting, control, etc.

A common characteristic of the aforementioned models is that they distinguish between a learning and a performance phase. This distinction is, however, inappropriate for autonomous agents, since they are exposed to a continuous, unending stream of sensory patterns $x_1, \ldots, x_t$ in which not only the composition but also the relevance of each pattern might change over time, especially in dynamically changing environments. It is therefore not sufficient to select a set of training patterns; rather, continuous on-line learning is required. Due to memory demands, it is also not suitable to store the entire sequence of all observed sensory patterns. Even the storage of the last $p$ patterns would not be a solution, since repetitive retraining (one epoch) would be too time consuming to be done between two time steps. The unusual task of neural network training for autonomous agents is that it has to provide an adequate and representative compression in time, rather than finding a compression of a pre-selected, finite training set as is usually done.

In the autonomous agent context, the well-known problem of oversized networks becomes very important. A network with too many weights/parameters with respect to the number of distinct training patterns very likely generalizes poorly (see also the discussion of the VC dimension [13]; the VC dimension expresses the probability of a misclassification as a function of the number of weights and the number of training patterns). For existing network models, many techniques have been developed that tackle this problem by dynamically adjusting the network size during training. BP-oriented examples are cross-validation (e.g., [2]), weight sharing [8], hints [1], tangent prop [12], and pruning methods, such as optimal brain damage [9] or skeletonization [11]. The size of an RBF network is normally determined by means of clustering algorithms, such as k-means clustering or hierarchical clustering, which estimate the centers and widths of the receptive fields. Fine-tuning can be done by applying gradient descent. These methods are, however, not applicable to autonomous agents, since they require a fixed training set. Furthermore, a long series of sensory stimulations $x_t$ does not ensure that this series provides sufficiently many distinct training patterns.

Section 2 proposes a modified RBF network that (1) bounds its high VC dimension by an implicit continuity constraint and (2) allows for continuous updates without changing the learned mappings of other regions in input space. Section 3 shows how this model can be used as a controller for autonomous agents, and Section 4 presents some regression and classification applications. Section 5 concludes with a brief discussion.
2 The Model

The model employs a set of RBFs $R_i(x)$, and the normalized output is of the form $f(x) = \sum_i f_i R_i(x) / \sum_i R_i(x)$. A weight $w_{ij}^h$ connects input unit $j$ with hidden unit $i$, and a weight $w_{ij}^o$ connects hidden unit $j$ with output unit $i$. Each hidden unit $i$ computes its net input $\mathrm{net}_i^h$ and activation $o_i^h$ as follows:

\[
\mathrm{net}_i^h = \gamma \sum_j \bigl( o_j^i - w_{ij}^h \bigr)^2 ,
\qquad
o_i^h = e^{-\mathrm{net}_i^h} ,
\tag{1}
\]
with $o_j^i$ denoting the activation of input unit $j$ and $\gamma$ denoting a scaling factor, which is constant and equal for all hidden units. The activation of each output unit $o_i^o$ is given as
\[
o_i^o = \frac{\sum_j w_{ij}^o \, o_j^h}{\sum_j o_j^h} .
\tag{2}
\]
Rather than obtaining a small VC dimension by parsimonious design, this model features many hidden units and reduces its high VC dimension by imposing an implicit continuity constraint. The $w_{ij}^h$ connections are initialized at random and remain fixed. The output weights $w_{ij}^o$ are locally adapted as follows:
\[
w_{ij}^o \leftarrow w_{ij}^o
+ \eta \, \frac{o_j^h}{\sum_j o_j^h}
\Bigl[ (t_i - o_i^o) + 0.1 \, (t_i - w_{ij}^o) \Bigr] ,
\tag{3}
\]
with $\eta$ denoting the learning rate and $t_i$ denoting the expected output at unit $i$. In this learning rule, the term $(t_i - o_i^o)$ performs the actual learning, whereas the term $0.1\,(t_i - w_{ij}^o)$ stabilizes the convergence of the weights $w_{ij}^o$. The parameter $\gamma$ specifies the size of each receptive field and depends on the resolution of the input space. $\gamma$ should be on the order of the average distance between
Fig. 1. The Khepera robot and the location of the infrared sensors.
adjacent training points; e.g., 100 training points equally distributed in $x \in [-5, 5]$ lead to $\gamma = (5+5)/100 = 0.1$. Each output unit adapts its weights $w_{ij}^o$ by applying learning rule (3) and interpolates over those weights $w_{ij}^o$ that belong to the most active hidden units $o_j^h$. Since the learning rule and the output units consider not only the winner but all hidden units in proportion to their activation, the model establishes an implicit continuity hint, which reduces the network's effective VC dimension.
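The forward pass of Eqs. (1)-(2) and learning rule (3) can be sketched in a few lines of Python. The class name `ContinuityRBF` and all parameter defaults are illustrative assumptions, not the paper's code; the symbols `gamma` and `eta` follow the scaling factor and learning rate of Section 2.

```python
import numpy as np

class ContinuityRBF:
    """Sketch (not the paper's implementation) of the normalized RBF model
    with fixed random centres and the local output-weight update of Eq. (3)."""

    def __init__(self, n_in, n_hidden, n_out, gamma=0.1, eta=1.0,
                 low=-1.0, high=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Hidden weights w^h (the centres) are initialized at random
        # and remain fixed (Section 2).
        self.w_h = rng.uniform(low, high, size=(n_hidden, n_in))
        # Output weights w^o are the only adapted parameters.
        self.w_o = np.zeros((n_out, n_hidden))
        self.gamma, self.eta = gamma, eta

    def hidden(self, x):
        # Eq. (1): net_i^h = gamma * sum_j (o_j^i - w_ij^h)^2, o_i^h = exp(-net_i^h)
        net = self.gamma * ((x - self.w_h) ** 2).sum(axis=1)
        return np.exp(-net)

    def forward(self, x):
        # Eq. (2): normalized output o_i^o = sum_j w_ij^o o_j^h / sum_j o_j^h
        oh = self.hidden(x)
        return self.w_o @ oh / oh.sum(), oh

    def update(self, x, t):
        # Eq. (3): each weight moves in proportion to the relative activation
        # of its hidden unit, so weights of inactive regions stay untouched.
        oo, oh = self.forward(x)
        p = oh / oh.sum()
        delta = (t - oo)[:, None] + 0.1 * (t[:, None] - self.w_o)
        self.w_o += self.eta * p[None, :] * delta
```

Because the update is weighted by the relative activations $o_j^h / \sum_j o_j^h$, a pattern only changes weights of hidden units whose receptive fields it falls into, which is what permits the continuous, local updates discussed in Section 1.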
3 Application to Autonomous Agent Control

The RBF model described in the previous section has been used to control the Khepera™ robot, which is shown in Fig. 1. Khepera's body is 32 mm high and 55 mm in diameter; it is equipped with 8 infrared sensors, which deliver very noisy readings, and it can be connected to a workstation. In the experiments described below, only the six frontal sensors have been used. For a very detailed description of the robot, see [5]. The controller connects each infrared sensor to a one-dimensional RBF network consisting of four hidden units and a scaling factor $\gamma = 0.1$. The "forward" component $s_f$ and "turn" component $s_t$ are determined as
\[
s_f = 1 - 1.2 \tanh \Biggl( \sum_{i=1}^{6} o_i^o(\mathrm{IR}_i) \Biggr) ,
\qquad
s_t = \tanh \Biggl( \sum_{i=1}^{3} o_i^o(\mathrm{IR}_i) - \sum_{i=4}^{6} o_i^o(\mathrm{IR}_i) \Biggr) .
\tag{4}
\]
As can be seen, the "forward" component uses the sum of all six network outputs, whereas the "turn" component uses the activation difference between the left-hand and the right-hand side. Both motor activations $M_l$ and $M_r$ are determined as
\[
M_l = s_f + s_t ,
\qquad
M_r = s_f - s_t .
\tag{5}
\]
To allow for continuous learning, the control structure also features a value system, which monitors the robot's actions. The value system constantly compares the motor speed settings, i.e., $M_l$ and $M_r$, with the actual motor speeds measured with attached wheel encoders. In case of a collision, where the wheel-surface friction slows down the motors significantly, the value system triggers learning rule (3) and demands higher receptive-field responses $o_i^o(\mathrm{IR}_i)$, which should be proportional to the total infrared activation of the corresponding side. If, however, the speed averaged over the last 20 time steps is lower than a certain threshold, e.g., 5, the value system demands lower receptive-field responses, since the current response slows down the motors too much. By these means, the robot is able to constantly adapt to a dynamically changing environment. In experiments in an arena of size 60 x 45 cm with wooden walls of 3 cm height, the robot never became stuck. Figure 2 shows the initial weights $w_{ij}^o$ and the weights after 500 steps. In addition, the controller is able to adapt to sensor changes, sensor faults, and the presence of ambient light, which significantly changes the sensor readings for unchanged robot-obstacle distances.
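The value-system behaviour described above can be sketched as follows. Only the 20-step averaging window and the threshold of 5 are given in the text; the stall criterion, the 50 % ratio, and all names here are assumptions about one plausible realization.

```python
from collections import deque

class ValueSystem:
    """Hedged sketch of the value system: compares commanded and measured
    wheel speeds and decides whether receptive-field responses should be
    raised (collision) or lowered (over-braking)."""

    def __init__(self, stall_ratio=0.5, window=20, speed_threshold=5.0):
        # stall_ratio is an assumed collision criterion; window and
        # speed_threshold follow the values given in the text.
        self.history = deque(maxlen=window)
        self.stall_ratio = stall_ratio
        self.speed_threshold = speed_threshold

    def judge(self, commanded, measured):
        """commanded/measured: (left, right) wheel speeds.
        Returns +1 (demand higher responses), -1 (demand lower responses),
        or 0 (no learning step)."""
        self.history.append(sum(abs(m) for m in measured) / 2.0)
        # Collision: a wheel turns much slower than commanded (friction stall),
        # so learning rule (3) is triggered with higher target responses.
        for c, m in zip(commanded, measured):
            if abs(c) > 0 and abs(m) < self.stall_ratio * abs(c):
                return +1
        # Over-braking: average speed over the last 20 steps below threshold,
        # so lower receptive-field responses are demanded.
        if (len(self.history) == self.history.maxlen
                and sum(self.history) / len(self.history) < self.speed_threshold):
            return -1
        return 0
```

The returned sign would then select the targets $t_i$ fed into learning rule (3), raising or lowering the responses $o_i^o(\mathrm{IR}_i)$ on the affected side.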
4 Regression and Classification

In addition to mobile-robot control, the proposed model has been applied to various regression and classification tasks. The model yielded fast convergence rates and good generalization performance, and it exhibited incremental learning properties. Among other functions, the proposed model was used to approximate three periods of the one-dimensional function $f(x) = \sin(2\pi x)$, $x \in [-1.0, 2.0]$. Training was performed on 30 equally distributed points, and generalization was tested on 300 patterns. Using standard BP networks with between 5 and 500 units in the hidden layer, using on-line and batch update, and initial weights between $[-0.1, 0.1]$ and $[-40, 40]$, the network provided a sufficient approximation within 20,000 epochs only for 5 and 10 hidden units; in all other cases, learning completely failed. The proposed model with 100 hidden units, $\gamma = 0.1$, and $\eta = 1.0$ learned the desired mapping in merely 100 epochs. Figures 3 and 4 show the initial mapping and the mapping after 100 epochs. The network obtained almost the same mapping after merely 10 epochs, which is three orders of magnitude faster than backpropagation with optimal parameters. Furthermore, the mapping was virtually unchanged when using 1000 or more hidden units and the same number of training examples. Also, during ongoing learning for another 1000 epochs, the network did not exhibit any overlearning. Figure 5 shows the model's incremental learning capabilities. Here, the trained network already shown in Fig. 4 was trained on 10 new examples drawn from the function $f(x) = \sin(4\pi x)$, $x \in [-1.0, 0.0]$, for another 100 epochs without seeing any of the old patterns from $x \in [0.0, 2.0]$. As can be seen, the network has learned
Fig. 2. The evolution of the output weights $w_{ij}^o$ after 500 steps. For details, see the text.
[Figures 3 and 4: plots of $f(x)$ versus $x$ over $[-1, 2]$, each showing the network mapping, the desired mapping, the hidden units, and the training points.]

Fig. 3. Initial mapping of a network with 100 hidden units and $\gamma = 0.1$. The initial mapping is, of course, arbitrary.

Fig. 4. Final network mapping after 100 epochs. The network's mapping is virtually identical with the desired function.
the new patterns without forgetting the old mapping. Finally, the proposed model has been applied to the well-known double-spiral problem [7], in which 194 $(x, y)$ points represent two intertwined spirals. Figure 6 shows the perfect generalization of a network with 2500 hidden units, $\gamma = 0.1$, and $\eta = 0.7$ after only one epoch. Other algorithms require 10,000-20,000 epochs [7], 1700 epochs [3], or 180 epochs [4], respectively.
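The 194-point benchmark itself is easy to regenerate. The construction below follows the usual Lang-Witbrock recipe (97 points per spiral, radius shrinking from 6.5); it is an assumption that the paper used exactly this variant.

```python
import numpy as np

def two_spirals(n=97):
    """Generate the classic double-spiral benchmark: n points on each of
    two intertwined spirals (2 * 97 = 194 points in total)."""
    i = np.arange(n)
    phi = i * np.pi / 16.0                # angle grows by pi/16 per point
    r = 6.5 * (104.0 - i) / 104.0         # radius shrinks from 6.5 towards 0
    spiral = np.stack([r * np.sin(phi), r * np.cos(phi)], axis=1)
    points = np.concatenate([spiral, -spiral])   # second spiral is point-mirrored
    labels = np.concatenate([np.zeros(n), np.ones(n)])
    return points, labels
```

A two-dimensional instance of the proposed model with 2500 hidden units, $\gamma = 0.1$, and $\eta = 0.7$ is then trained for a single epoch on these 194 patterns.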
5 Conclusions

This paper has proposed a modified RBF network model that is especially designed for controlling autonomous agents, where compression in time is required rather than compression with respect to a fixed training set. In addition to mobile-robot control, this paper has discussed several regression and classification tasks in which high convergence rates and good generalization were obtained. The design of the proposed RBF model resembles Kohonen feature maps and growing cell structures, but it does not explicitly maintain any neighborhood relation of adjacent units. Further research should address self-adaptation of the hidden weights $w_{ij}^h$ by local rules, without using feedback of certain signals from the output layer, in order to eventually discard the parameter $\gamma$.
Acknowledgements This work was supported in part by a Human Capital and Mobility fellowship ERBCHBICT941266 of the European Union.
References

1. Y.S. Abu-Mostafa, Hints and the VC Dimension, Neural Computation, 1992.
2. H. Bourlard and N. Morgan, Connectionist Speech Recognition - A Hybrid Approach, Kluwer Academic Publishers, 1994.
3. S.E. Fahlman and C. Lebiere, The Cascade-Correlation Learning Architecture, in D. Touretzky (ed.), Advances in Neural Information Processing Systems 2, pp. 524-532, Morgan Kaufmann Publishers, 1990.
4. B. Fritzke, Supervised Learning with Growing Cell Structures, in J. Cowan, G. Tesauro, and J. Alspector (eds.), Advances in Neural Information Processing Systems 6, pp. 255-262, Morgan Kaufmann Publishers, 1994.
5. Khepera Users Manual, Laboratoire de microinformatique, Swiss Federal Institute of Technology (EPFL), 1015 Lausanne, Switzerland.
6. T. Kohonen, Self-Organization and Associative Memory, Springer-Verlag, 1989.
7. K.J. Lang and M.J. Witbrock, Learning to Tell Two Spirals Apart, in D. Touretzky, G. Hinton, and T. Sejnowski (eds.), Proceedings of the 1988 Connectionist Models Summer School, pp. 52-59, Morgan Kaufmann Publishers, 1989.
8. Y. Le Cun, Generalization and Network Design Strategies, in R. Pfeifer, Z. Schreter, F. Fogelman, and L. Steels (eds.), Connectionism in Perspective, pp. 148-153, Elsevier, Amsterdam, 1989.
9. Y. Le Cun, J.S. Denker, and S.A. Solla, Optimal Brain Damage, in D. Touretzky (ed.), Advances in Neural Information Processing Systems 2, pp. 598-605, Morgan Kaufmann Publishers, 1990.
10. J. Moody and C. Darken, Learning with Localized Receptive Fields, in D. Touretzky, G. Hinton, and T. Sejnowski (eds.), Proceedings of the 1988 Connectionist Models Summer School, pp. 133-143, Morgan Kaufmann Publishers, 1989.
11. M.C. Mozer and P. Smolensky, Skeletonization: A Technique for Trimming the Fat from a Network via Relevance Assessment, in D.S. Touretzky (ed.), Advances in Neural Information Processing Systems 1, pp. 107-115, Morgan Kaufmann Publishers, 1989.
12. P. Simard, B. Victorri, Y. Le Cun, and J. Denker, Tangent Prop - A Formalism for Specifying Selected Invariances in an Adaptive Network, in D.S. Touretzky (ed.), Advances in Neural Information Processing Systems 4, pp. 895-903, Morgan Kaufmann Publishers, 1992.
13. V. Vapnik, Estimation of Dependences Based on Empirical Data, Springer, 1982.
[Figures 5 and 6: the left plot shows $f(x)$ versus $x$ over $[-1, 2]$ with the network mapping, the new desired mapping, and the training points; the right plot shows the two-spiral classification over $[-1, 1] \times [-1, 1]$.]

Fig. 5. Presenting new patterns from the interval $[-1.0, 0.0]$ does not affect the mapping in the interval $[0.0, 2.0]$.

Fig. 6. Final mapping of an RBF network using the proposed learning rule after only one epoch.