Adaptive Classification in Autonomous Agents

Christian Scheier and Dimitrios Lambrinos
AI Lab, Computer Science Department, University of Zurich
Winterthurerstrasse 190, 8057 Zurich, Switzerland
E-mail: scheier@ifi.unizh.ch

October 9, 1995

Abstract
One of the fundamental tasks facing autonomous robots is to reduce the many degrees of freedom of the input space by some sort of classification mechanism. The sensory stimulation caused by one and the same object, for instance, varies enormously depending on lighting conditions, distance from the object, orientation, and so on. Efforts to solve this problem, say in classical computer vision, have had only limited success. In this paper a new approach towards classification in autonomous robots is proposed. Its cornerstone is the integration of the robot's own actions into the classification process. More specifically, correlations through time-linked, independent samples of sensory stimuli and of kinesthetic signals produced by self-motion of the system form the basis of the category learning. Thus, it is suggested that classification should not be seen as an isolated perceptual (sub-)system but rather as a sensory-motor coordination which comes about through a self-organizing process. These ideas are illustrated with a case study of an autonomous system that has to learn to distinguish between graspable and non-graspable objects.
1 Introduction
If an autonomous robot is to function in the real world it must have means to parse the environment into coherent entities, that is, into bounded objects and events. It must also possess the means to recognize discriminably different entities with important common characteristics as instances of the same category. Both capacities form the core of what is usually referred to as classification. The ability to categorize is obviously of great value to robots, as it permits them to respond correctly to items they have never encountered before by virtue of class properties of those items that relate them to previous experience. Categorization in autonomous robots is tricky because even under constant environmental conditions the sensory stimulation caused by one and the same object varies strongly depending, e.g., on the distance from and the orientation to the object. As an example, consider the sensory space of the robot that was used to conduct the experiments presented in this paper. It has 8 IR sensors, where each IR sensor has 2^10 states, i.e. there are (2^10)^8 = 2^80 different states, which amounts roughly to 10^24. Clearly, the problem of reducing the many degrees of freedom of this space is a non-trivial one. It is extremely difficult to define a priori configurations within this space which are meaningful. This implies that categories should be acquired at run time, i.e. as the agent is moving around in the environment. Many psychological models of category learning are based on supervised learning (e.g. [13]). The advantage of supervised learning is that the final error metric is available during training.

AI Lab, Tech. Report No. 95.12, submitted to EMCSR (European Meeting on Cybernetics and Systems Research), Vienna.
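The size of the sensory space quoted above is easy to verify with a quick calculation; the snippet below is only a sanity check of that arithmetic (the 2^10-states-per-sensor figure is taken from the text):

```python
# 8 IR sensors, each reporting one of 2**10 = 1024 raw values:
# the joint input space has (2**10)**8 = 2**80 distinct states.
states_per_sensor = 2 ** 10
num_sensors = 8
total_states = states_per_sensor ** num_sensors

print(total_states == 2 ** 80)  # True
print(f"{total_states:.1e}")    # 1.2e+24, i.e. roughly 10**24
```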
Unfortunately, when constructing classifiers for autonomous robots, one must deal with not having an omniscient entity labeling all incoming sensory patterns. Thus, the learning should be self-organizing (e.g. [15]). Self-organization is important because the categories acquired by the system must be grounded in the system's own interaction with its environment ([10]). In robotics, but also in computer vision and psychology, categorization is normally treated as an information processing question: the sensors receive a particular input which is processed and mapped onto an internal representation. Often neural networks are used for this purpose. The problem encountered is that typically an enormous number of highly different sensory patterns should map onto the same representation. This problem has also been referred to as perceptual aliasing [23]. Efforts to solve this problem have been successful only to a limited extent. In contrast, biological systems are extremely good at categorizing their environment at run time (see e.g. [11]). One key difference between information processing approaches and categorization in natural systems seems to be that the former view categorization as a process which happens on the input side, whereas the latter heavily include information stemming from the organism's own movements. For recognizing a memorized visual stimulus, for instance, flies have to shift the actual image to the same location in the visual field where the image had been presented during the storage process ([8]). In other words, the incorporation of action is crucial for the recognition process. Self-motion is even more important for the development of categories in young infants (see e.g. [18] for a review). According to these results, movement must itself be considered a perceptual system providing critical input for the classification process. It also provides for the dynamic sampling of the stimulus attributes (e.g. [4]).
In sum, these results suggest that classification is not an information process which takes place on the sensory or input side, but rather a sensory-motor coordination. This idea has already been pointed out by [7] and we have argued elsewhere along similar lines ([14], [16]). In this paper we show that viewing categorization as a sensory-motor coordination sheds new light on and leads to simplifications of the difficult problem of categorization in autonomous robots. In the experiments presented below, a mobile robot has to collect small objects. Since there are other objects in the environment which it should not collect, it has to learn (a) to recognize objects it should collect and (b) to discriminate the latter from all other objects in the environment.
2 Adaptive Classification: a case study
2.1 The experimental set-up
The mobile robot used in the experiments is shown in Figure 1. It is a Khepera(TM) robot, 55 mm in diameter and 32 mm high (weight 70 g), equipped with eight IR sensors, six in the front and two in the back. It has two wheels which are individually driven by DC motors. In addition, there is an arm with a gripper installed at the end. The arm is moved by a DC motor coupled with a position sensor. The maximum object size that can be grasped with the gripper is about 40 mm. Pegs inside the gripper can be detected by an optical barrier that is mounted on the gripper. With this gripper the robot can grasp certain pegs but not others: some are too heavy and cannot be lifted with the arm. The robot should collect the ones it can grasp. The simplest strategy is to try to grasp every object it encounters; if the grasping is successful the agent brings the peg home, if not it moves on. But this would be highly inefficient. In order to make the collection process more efficient the distinction between graspable and non-graspable objects should be based on information which is more readily available. This information comes from visual and from kinesthetic signals. In our experiments we exploit the fact that the agent is able to move around. Instead of looking at a particular (fixed) sensory pattern we let the agent move along an object. This is a reflex action: whenever it gets close to an object it will move along it (see below).
Figure 1: The robot and its environment. The small pegs are graspable, i.e., can be lifted by the robot. The large ones are too heavy and cannot be grasped by the agent. The robot's task is to bring small pegs to the nest.
2.2 The control architecture
In the real world, agents always have to do several things, and at least some of them will be incompatible. To decide what the agent should do in a particular situation is one of the important functions of a control architecture. In the literature this problem has been called "action selection". The term is inappropriate because it introduces a particular bias on how the issue should be tackled (see below). A comprehensive review of "action selection" is given in [20]. There is a fundamental problem with most approaches. They explicitly or implicitly rely on the assumption that what is expressed behaviorally has an internal correspondence. If we see an agent following an object we suspect an object-following module (which is sometimes called a "behavioral layer"). But there is a frame-of-reference issue here. Behavior is by definition the result of a system-environment interaction, and the internal mechanisms cannot be directly inferred from the behavior alone. Of course, there must be something within the organism which leads to this behavior. But it would be a mistake to infer that, if we want an agent to follow objects, we have to define a special module for object-following. Yet this is precisely what is often done. [2] proposed an alternative scheme which does not suffer from this problem. Instead of having modules (or behavioral layers) there are a number of simple processes which all run in parallel and continuously influence the agent's internal state. An extension of this approach, the "Extended Braitenberg Architecture" (EBA), has been proposed in [16]. Let us illustrate the point directly with our case study. There are the following processes: move forward, turn towards object, avoid obstacle, grasp, and bring to nest. This sounds very much like the traditional approaches. The main difference is the following. All the processes run all the time. The influence they exert on the behavior of the agent varies depending on the circumstances.
So, under certain conditions they will have no influence and, in others, they will constitute the major influence; but they are not on or off. The basic architecture is shown in Figure 2. There are values (called quantities) associated with each of the effectors, namely the speed of the motors and the position of the gripper, respectively. The processes continuously write onto these quantities, i.e. they add or subtract particular values. Thus, the activation of the effector quantities is a superposition of the output values from all processes. For example, the motor speeds are calculated as:

    s(t) = (s_l(t), s_r(t)) = \left( \sum_{i=1}^{N} o_i^l(t), \sum_{i=1}^{N} o_i^r(t) \right)    (1)

where s_l, s_r are the speeds of the left and the right motor, o_i^l, o_i^r are the outputs of the i-th process to the speed quantity of the left and the right motor, respectively, and N is the number of processes.

Figure 2: The control architecture of the robot: the processes MOVE FORWARD, AVOID OBSTACLE, TURN-TOWARDS-OBJECT, GRASP, and BRING HOME receive input from the sensors and from other processes and write onto the effector quantities (motor speed and gripper). See text for explanation.

The agent does not care where the speed values for the motors come from: they can originate from any process writing onto the speed quantities. Since the main focus of this paper is the problem of categorization we give only a short description of each process in Figure 2. A detailed presentation of their mathematical formulations can be found in [16].
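To make Equation 1's superposition concrete, here is a minimal Python sketch. The process bodies are invented stand-ins (the actual formulations are in [16]); only the additive bookkeeping mirrors the EBA scheme:

```python
# Sketch of the EBA quantity update (Eq. 1): every process always runs and
# adds its contribution (o_l, o_r) to the left/right motor-speed quantities;
# no process is ever switched on or off. Process bodies are illustrative.

def move_forward(sensors):
    # Default process: constant forward contribution.
    return (2.0, 2.0)

def avoid_obstacle(sensors):
    # Per the text: large right-side IR values increase the left-motor
    # quantity and decrease the right-motor quantity (and vice versa).
    left_ir = sensors["ir_left"]
    right_ir = sensors["ir_right"]
    return (right_ir - left_ir, left_ir - right_ir)

PROCESSES = [move_forward, avoid_obstacle]

def motor_speeds(sensors):
    # Eq. 1: s(t) = (sum_i o_i^l(t), sum_i o_i^r(t))
    s_l = sum(p(sensors)[0] for p in PROCESSES)
    s_r = sum(p(sensors)[1] for p in PROCESSES)
    return s_l, s_r

print(motor_speeds({"ir_left": 0.0, "ir_right": 1.0}))  # (3.0, 1.0)
```

Because the contributions are simply summed, adding a new process never requires changing the existing ones, only retuning their relative magnitudes.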
move forward: This is a default process: if the other processes are not or only slightly activated, move forward causes the robot to move straight ahead with a default speed.

avoid obstacle: This process increases its influence on the speed quantities as the robot gets close to an obstacle. Obstacles on the right will lead to large values in the sensors on the right side and in turn to an increase of the speed quantity associated with the left motor, s_l, and a decrease of the speed quantity s_r associated with the right motor.

turn towards object: Whenever a sensor on the right or on the left of the agent is on, the agent tries to maintain this condition. This makes the agent move along anything which stimulates a lateral IR sensor. Together with the avoid obstacle process it will also start moving along obstacles it encounters head on; there is no need to provide explicitly for this behavior internally. The agent will first turn, which leads to activation in one of the lateral IR sensors, which in turn causes the agent to move along the object.

grasp: This process causes the robot to grasp an object. Because this is a core process we give it separate treatment below.

bring to nest: Once the robot has grasped an object, the bring to nest process gets highly activated. This in turn causes the robot to bring the object to a home base. This process is connected to the ambient light sensors of the robot. It uses the activations of these sensors to find the home bases, i.e. it implements phototaxis.
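As an illustration of how a single process such as turn towards object might contribute to the speed quantities, the following sketch keeps a lateral IR reading near a setpoint. Gain and setpoint are hypothetical values, not parameters from the paper:

```python
# Illustrative sketch of the turn-towards-object process: the agent tries to
# keep a lateral IR sensor activated, which makes it move along an object's
# boundary. TARGET and GAIN are made-up values, not taken from [16].

TARGET = 0.5   # desired lateral IR activation (hypothetical setpoint)
GAIN = 1.5     # proportional gain (hypothetical)

def turn_towards_object(lateral_ir):
    """Return (delta_left, delta_right) speed contributions that steer the
    robot so the lateral reading moves back toward TARGET."""
    error = TARGET - lateral_ir
    # Too little stimulation (object far): steer toward the sensed side;
    # too much (object close): steer away. The contributions are simply
    # added to the motor-speed quantities, as in Eq. 1.
    return (GAIN * error, -GAIN * error)

print(turn_towards_object(0.25))  # (0.375, -0.375)
```

Run in closed loop with avoid obstacle, such a proportional correction produces the move-along-object behavior described above without any dedicated "wall following" module.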
2.3 Learning rules
While designing the individual processes is quite straightforward, having the processes cooperate to achieve the desired behaviors is nontrivial: how much should each process add to the various quantities? Similar to the sensor fusion problem of the classical approach or the behavior fusion problem in behavior-based robotics (e.g. [3]), there is a process fusion problem in our approach. For the designer the difficulty is as follows. If one process works fine, say move forward, and a new one is added, say avoid obstacle, this new process will interfere with the existing ones since it also writes onto the same quantities (the speed values of the motors). How should the parameters
be chosen, how should the tuning be done? The solution is to use principles of self-organization. This implies that the agent must be embedded in a value system which guides the self-organized learning (e.g. [9]). Since in our case study we are primarily interested in the grasping behavior, we have used learning to tune the output of the grasp process. This part of the architecture is illustrated in Figure 3.

Figure 3: Tuning of the grasp process: the Sensory-Motor Map receives the IR sensors and the angular velocity vector as input and is connected via modifiable weights w to the grasp process; the value system, driven by the weight sensor, gates learning. Explanations see text.

There is a Sensory-Motor Map (SMM) which receives inputs from the IR sensors and from a 10-dimensional angular velocity vector. In all experiments the raw signals were used, i.e. there was no preprocessing of the inputs to the SMM. Angular velocities are approximated by taking the difference of the speed quantities:

    \omega(t) = v_l(t) - v_r(t)    (2)

The SMM is based on a self-organizing map ([5]), but in principle any type of self-organizing neural net could be used. We use self-organizing maps because it has been shown that they are plausible models of human classification (e.g. [6], [17]). While experimenting with the map, however, a major problem was encountered: the "moving along object" behavior of the agent was not very stable. This has to do with the coordination between the avoid obstacle and the turn towards object process and relates to the parameters of these two processes. Even when we tried to tune these parameters, the noise on the IR sensors was "breaking" this coordination, leading to an oscillatory path along the object. The oscillation that occurs on the motor side is then propagated back to the sensory side via the angular velocity input to the map. As a result of this "noise-recycling", the activation pattern of the map never remains stable in one point but keeps changing position on the map. This oscillation of the map's activation pattern made the association between the map's activity and the grasp and avoid processes problematic.
A way of solving the above problem is to use a "leaky integrator" activation function ([5]), which adds a decay term to the activation function used in standard self-organizing maps:

    a_i(t) = \gamma \, a_i(t-1) + \exp\left( -\tfrac{1}{2} \, (w_i(t) - in(t))^2 \right)    (3)

where 0 < \gamma < 1 relates to the time constant of the neuron. Equation 3 makes the activation of the units depend not only on the weights and the current inputs but also on the activation at the previous time step. This operation can also be seen as performing a kind of "low-pass" filtering on the angular velocity input. The SMM is connected to an internal quantity Q of the grasp process via modifiable weights. The idea behind these adaptive connections is that the agent should learn to associate its sensory-motor mappings with grasping or leaving an object. In essence, the SMM is associated with Q if there is a reinforcement or value signal. Reinforcement is generated whenever the weight sensor is on, i.e. the robot was able to lift a small peg.
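A toy run of the leaky-integrator activation of Equation 3 shows the intended low-pass effect: with a noisy input hovering around one unit's weight, that unit's integrated activation comes to dominate. Map size, weights, and inputs are invented for illustration; the real SMM input is the 8 IR values plus the 10-dimensional angular velocity vector:

```python
import math

GAMMA = 0.5  # leak parameter, 0 < gamma < 1 (illustrative value)

def update_activations(activations, weights, inp):
    """Eq. 3: a_i(t) = gamma * a_i(t-1) + exp(-0.5 * (w_i - in)**2).
    The decay term low-pass filters the oscillating angular-velocity
    input, so the winning unit stays stable despite sensor noise."""
    return [GAMMA * a + math.exp(-0.5 * (w - inp) ** 2)
            for a, w in zip(activations, weights)]

weights = [0.0, 1.0, 2.0]          # toy 3-unit map
acts = [0.0, 0.0, 0.0]
for inp in [1.1, 0.9, 1.2, 1.0]:   # noisy samples around 1.0
    acts = update_activations(acts, weights, inp)

# The unit whose weight (1.0) matches the noisy input accumulates the
# largest activation, despite no single sample hitting it exactly.
print(max(range(3), key=lambda i: acts[i]))  # 1
```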
In a pure reinforcement learning approach, the agent initially makes more or less random movements. If by chance it happens to do the right thing, i.e. something which leads to a reinforcement, it can learn, and this particular sensory-motor coordination will be strengthened. But as is well known in the reinforcement learning literature (e.g. [12], [19]), this would take much too long. An initial bias must be introduced: the agent must be constrained to explore the environment in a particular manner, not just at random. In our example, the agent must be endowed by the designer with an "idea" of when to try to grasp an object. After learning has occurred this initial bias should be tuned such that the agent grasps small but not large pegs. As we mentioned earlier, the agent is "normalizing" the sensory stimulation by approaching an object and circling around it. The latter is achieved by the turn towards object process. It continuously adds to Q. Thus Q only gets bigger if this process is active for a certain period of time. As this activation gets high it will eventually cause the agent to make a grasping movement. If the peg is liftable the value system generates a reinforcement signal, which in turn enables the Hebbian learning to occur. Thus learning is a posteriori and depends on the reinforcement signal. This is in accordance with results from category learning in infants ([18]) and animals ([22]). The updating of the weights between the SMM and Q thus is as follows:

    \Delta w_j = v(t) \left( \alpha \, Q \, a_j(t) - \beta \, (Q + a_j(t)) \, w_j(t) \right)    (4)

where w_j represents the strength of the connection between the unit j of the SMM and Q, v(t) is the value signal (activity of the value system), a_j is the activation of the pre-synaptic unit, and \alpha, \beta are the learning rate and decay parameters, respectively.
There are several points to be noted concerning the update rule (4). While the potentiation term is standard in Hebbian learning rules, the depression term includes the activity of both pre- and postsynaptic units. Thus, weights are decreased if there is either no pre- or no postsynaptic activity. This bidirectional active forgetting differs from the forgetting term in standard associative learning schemes, where only the postsynaptic activity is included (e.g. [21]). The main advantage of our approach is that erroneous associations between the SMM and Q that have been built up at the beginning of the learning process, where the sensory-motor mappings have not yet been stabilized, will be significantly decreased as soon as a stable mapping is acquired. This in turn leads to fewer misclassifications and faster learning (see below). The dependency of the update upon w_j(t) dynamically limits the range over which the connection strength is allowed to vary. Thus, there is no need for thresholding or normalization of connection strengths. Value systems can be understood as basic evolutionary adaptations that generate global signals which relate to the occurrence of salient events and that can gate synaptic modification. Value signals only impose biases on synaptic modifications depending on the outcome of previous interactions of the robot with the environment. There is no attempt to correct "errors" at individual neurons, as is the case for the back-propagation training rule used in many models of human categorization (e.g. [13]). In sum, weights are only modified if the peg is liftable (which triggers v(t)). Since only small pegs are liftable, an association between SMM activations for small pegs and high activations in Q (which in turn lead to a grasping movement) is acquired.
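The value-gated update of Equation 4 can be sketched as follows; the learning-rate and decay constants are illustrative, not the values used on the robot:

```python
ALPHA = 0.1  # learning rate (illustrative)
BETA = 0.05  # decay parameter (illustrative)

def update_weight(w_j, a_j, Q, v):
    """Eq. 4: dw_j = v(t) * (alpha*Q*a_j - beta*(Q + a_j)*w_j).
    v gates the update: with no value signal, nothing changes. The decay
    term is active when EITHER the pre- or the postsynaptic side is
    active (bidirectional active forgetting)."""
    return w_j + v * (ALPHA * Q * a_j - BETA * (Q + a_j) * w_j)

# No reinforcement: the weight is untouched.
print(update_weight(0.5, a_j=1.0, Q=1.0, v=0.0))  # 0.5
# Reinforcement with correlated pre/post activity: the weight grows.
print(update_weight(0.5, a_j=1.0, Q=1.0, v=1.0))  # ~0.55
# Reinforcement but silent postsynaptic quantity (Q = 0): the weight
# decays, which prunes early erroneous associations.
print(update_weight(0.5, a_j=1.0, Q=0.0, v=1.0))  # ~0.475
```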
3 Results
Experiments were conducted in a flat arena (100 cm x 100 cm) with walls (8 cm high) on each side. Small pegs were 1.5 cm in diameter and 2 cm high; large pegs were 3 cm in diameter with the same height (2 cm). The shape of both types of pegs was cylindrical (see Figure 1). There were 20 pegs of each type in the environment. The robot's task was to collect the small ones and bring them to a home base.

Figure 4: A typical trajectory of the robot before learning. For visualization purposes small pegs were removed from the gripper once the robot had grasped them. This enabled the robot to continue its trajectory instead of bringing the peg home.

As mentioned above, instead of using perceptual inputs alone, signals stemming from the robot's own actions were used as the basis for the category learning. This leads to considerable simplifications for the object recognition and classification process. In Figure 4 a typical trajectory of the robot before learning has occurred is shown. The traces show smooth behavior of the robot as it interacts with its environment. As can be seen, grasp behavior is initially triggered by the bias: after a given number of steps of circling around a peg the robot tries to grasp it. As observers we can say that the robot is not able to distinguish between graspable (small) and non-graspable (large) pegs. As the robot circles around pegs the SMM gets inputs from the IR sensors and the angular velocity vector (see Figure 3). Over time the SMM acquires a topographical mapping of these inputs. In Figure 5 the evolution of this mapping is depicted. Initially, because the weights in the SMM are initialized to random values, the activations on the map are randomly distributed and indistinguishable for both types of objects (Figure 5, left four panels). As the robot encounters more objects the activations of the SMM get more stable and are distinct for the two types of objects (Figure 5, right four panels).

Moreover, because the robot can circle around objects in two different directions (left, right) there are actually four clusters of activations on the SMM. The categories are grounded in the robot's interactions with the objects. From the robot's perspective a small (large) peg can either be grasped from the right or from the left side relative to its "body". Thus the categories serve as a kind of indexical representation ([1]). If additional pegs with different shapes are added to the environment, the SMM restructures itself and a new mapping including all types of pegs is acquired (data not shown). This adaptivity is important since in the real world the system has to be able to cope with pegs it has never encountered before. In Figure 6 a typical trajectory of the robot after it has encountered a large number of pegs is shown. There is a clear distinction between the move along object and grasp behavior for large and small pegs, respectively. In order to quantify this learning process two performance measures were used. First, the time the agent spends circling around small and large pegs before and after learning was measured. The hypothesis was that once the categories are acquired the agent will need significantly less time to circle around pegs. Second, the number of pegs grasped before and after learning was
Figure 5: Evolution of activations on the Sensory-Motor Map. Four panels on the left show activations before the mappings have been acquired. On the right map activations after learning are depicted.
Figure 6: A typical trajectory of the robot after it has encountered a large number of pegs.
Table 1: Averaged steps around a peg (mean ± std. dev.) and number of pegs grasped, sums over 50 trials.

    performance measure   large peg before   large peg after   small peg before   small peg after
    steps                 41.3 ± 2.5         14.6 ± 1.6        39.1 ± 2.0         12.3 ± 2.5
    pegs grasped          18.2 ± 1.3          1.6 ± 0.8        18.7 ± 0.6         19.2 ± 0.5
recorded. The hypothesis in this case was that after learning the number of large pegs grasped by the agent should be significantly lower than before learning. In order to obtain statistically meaningful results the measures were taken over 50 runs. The environmental conditions in each run (e.g. placement of objects, number of objects) were identical. There were 20 pegs of each type. The results are summarized in Table 1. There was a significant decrease in the number of steps the agent circled around pegs after learning had taken place. While initially the agent needed around 40 steps, after learning the average number of steps decreased to 14.6 ± 1.6 and 12.3 ± 2.5 for large and small pegs, respectively. Moreover, the small standard deviations reveal the robustness of this behavior. In addition, the number of pegs grasped decreased significantly for large pegs but remained similar for small pegs. Note that the robot initially grasps large pegs because of the bias discussed previously. Thus, the agent has learned to avoid large pegs and to grasp small pegs. Again, the small standard deviations show that the learning was robust over all trials. In sum, the experiments show that the agent learns to associate the sensory-motor mappings for small and large pegs with grasping and avoiding, respectively.
4 Conclusions
In this paper we have demonstrated that viewing classification as a sensory-motor coordination rather than a process happening on the sensor side can lead to significant simplifications. It enables the agent to reliably learn a classification of objects in the environment in a straightforward and simple way. Classification comes about via a process of self-organization using a temporal self-organizing map. Context information is introduced into the model in terms of a leaky integrator activation function in order to enable the map to learn noisy sequences. The new learning rule, which uses bidirectional active forgetting, proves to be very efficient.
Acknowledgments

This work is supported by the Swiss National Science Foundation, grant number 20-40581.94.
References
[1] P. E. Agre and D. Chapman. What are plans for? Robotics and Autonomous Systems, 6(1):17-35, 1990.
[2] V. Braitenberg. Vehicles. Kluwer Academic Publishers, 1984.
[3] R. Brooks. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, RA-2:14-23, September/October 1986.
[4] E. M. Bushnell and J. P. Boudreau. Motor development in the mind: the potential role of motor abilities as a determinant of aspects of perceptual development. Child Development, 64:1005-1021, 1993.
[5] G. J. Chappell and J. G. Taylor. The temporal Kohonen map. Neural Networks, 6:441-445, 1993.
[6] V. R. de Sa. Unsupervised Classification Learning from Cross-Modal Environmental Structure. PhD thesis, University of Rochester, 1994.
[7] J. Dewey. The reflex arc concept in psychology. Psychological Review, (3):357-370, 1896.
[8] M. Dill, R. Wolf, and M. Heisenberg. Visual pattern recognition in Drosophila involves retinotopic matching. Nature, (365):751-753, 1993.
[9] G. M. Edelman. The Remembered Present. Basic Books, New York, 1987.
[10] S. Harnad. The symbol grounding problem. Physica D, (42):335-346, 1990.
[11] R. J. Herrnstein. Levels of categorization. In G. M. Edelman, W. E. Gall, and W. M. Cowan, editors, Signal and Sense: Local and Global Order in Perceptual Maps, pages 365-413. Wiley, New York, 1993.
[12] L. P. Kaelbling. Learning in Embedded Systems. MIT Press, Cambridge, MA, 1994.
[13] J. K. Kruschke and M. A. Erickson. Five principles for models of category learning. In Z. Dienes, editor, Connectionism and Human Learning. Oxford University Press, Oxford, 1995.
[14] R. Pfeifer and C. Scheier. From perception to action: the right direction? In Proceedings "From Perception to Action" Conference, pages 1-11, Los Alamitos, 1994. IEEE Computer Society Press.
[15] R. Pfeifer and P. Verschure. The challenge of autonomous agents: Pitfalls and how to avoid them. In L. Steels and R. Brooks, editors, The "Artificial Life" Route to "Artificial Intelligence". Erlbaum, Hillsdale, NJ, 1994.
[16] C. Scheier and R. Pfeifer. Classification as sensory-motor coordination: a case study on autonomous agents. In Proceedings ECAL95, 1995.
[17] P. G. Schyns. A modular neural network model of concept acquisition. Cognitive Science, (15):461-508, 1991.
[18] E. Thelen and L. B. Smith. A Dynamic Systems Approach to the Development of Cognition and Action. Bradford Books, Cambridge, Massachusetts, 1994.
[19] S. Thrun. Efficient exploration in reinforcement learning. Technical report, School of Computer Science, Carnegie Mellon University, Pittsburgh, 1992.
[20] T. Tyrell. Computational Mechanisms for Action Selection. PhD thesis, University of Edinburgh, 1990.
[21] P. Verschure, J. Wray, O. Sporns, G. Tononi, and G. M. Edelman. Multilevel analysis of a behaving real world artifact: An illustration of synthetic neural modeling. In Proceedings "From Perception to Action" Conference, Los Alamitos, 1994. IEEE Computer Society Press.
[22] E. A. Wasserman. The conceptual abilities of pigeons. American Scientist, (83):246-268, 1995.
[23] S. D. Whitehead and D. H. Ballard. Active perception and reinforcement learning. In W. Porter and R. J. Mooney, editors, Machine Learning: Proceedings of the Seventh International Conference, pages 179-190, 1990.