1999 IEEE Int. Conf. on Robotics and Automation, pp. 2255-2260
Probabilistic localization by appearance models and active vision

B.J.A. Kröse, R. Bunschoten
[email protected]
Real World Computing Partnership, Autonomous Learning Functions Lab. SNN
Department of Computer Science, University of Amsterdam
Kruislaan 403, NL-1098 SJ Amsterdam, The Netherlands
Abstract

In order to do useful things a mobile robot needs some sort of global information about the environment it is operating in. In this paper an approach is described where the global information is not cast in a model of the geometry of the environment but in a model of all sensory data of the robot. As a primary sensing system we used computer vision. The model gives a probability distribution over the learned locations given an observation. We developed an active vision strategy to increase the performance and tested the method on real image data from our robot.
1 Introduction

Methods for mobile robot navigation come in two flavors: "reactive" navigation and "planned" navigation. In reactive navigation schemes sensor data is mapped directly onto motor commands, while in planned navigation an internal model is used to calculate a path (optimal in some sense) which subsequently has to be followed. Reactive navigation is mainly used for collision avoidance and simple goal-directed tasks. The sensors used are mainly sonar, infrared and, to an increasing extent, vision. For more complicated goal-directed tasks a pure reactive method is insufficient; for example, the system may be trapped in local minima. This is where planned navigation comes in. Here we need a model of the world for the determination of the optimal path. Traditionally such a model is represented as a geometric model in the task space of the robot. A wide variety of path planning algorithms exist (see [10] for an overview) which compute a path in the task space or configuration space of the robot. This reduces the navigation problem to a path-following problem. However, this is not a trivial task. Path following requires a good estimation of the location
of the robot. The proprioceptive sensors (shaft encoders, which measure the revolutions of the wheels) are subject to inaccuracies as a result of slip. Therefore, information from exterospecific sensors has to be used to update the estimate of the robot's location. In completely known environments the environment model can be given a priori, but in most realistic cases the parameters of the environment model have to be estimated or updated from sensor data. Environment models typically consist of geometric objects in a Cartesian coordinate system or of occupancy grids. Incremental methods such as Kalman filters have been presented as adequate methods to update the estimated location and the estimated parameters of the environment model [3]. In other approaches the sensor information is first mapped to a representation in the task domain before being fused with previous estimates [6]. All these approaches have in common that sensor data is used to update the estimated location of the robot defined in some Cartesian coordinate system (the task space) in which the environment is also modeled. This approach is in contrast with the current trend in reactive robot control, where the robot is controlled by the difference between the current and desired sensor values, without constructing a representation in the world domain. Much of this work has been done for robot manipulators [17] but also for mobile robots [1]. If we want to apply such sensor-based control or visual servoing on our robot we may have a discrepancy between the state representation for global navigation (i.e. in the task space) and for reactive navigation (in the sensor space). To avoid this, environment learning models in the sensory domain have been proposed. Some are based on more abstract representations of objects [8, 13], some are bottom-up, starting from the raw sensory signals [9, 18]. The latter approaches model the environment as a low-dimensional manifold in the high-dimensional sensor space. Such models can be based on neural networks as nonlinear
models [7] or on local linear models [4]. Many of the above-mentioned methods for modeling in the sensor domain are based on range data as sensor measurements. A much richer source of information is a vision system, and recently many papers have been presented on the use of vision for environment modeling [11, 16].
2 Appearance modeling of the environment

Our approach to modeling the environment in the visual domain is again based on the idea described in the previous section: the intrinsic dimensionality of images taken from a camera on a mobile robot is restricted by the degrees of freedom of the camera and by the illumination conditions. If the latter do not vary, all images taken from a camera mounted on a mobile robot lie on a 3D manifold in the high-dimensional image space. In principle we could try to model this manifold with, for example, a Kohonen network (as we did earlier with range data [7]). However, the problem with images is that, because of the extremely high dimensionality of the sensor space (for example 256 x 256), we would need a ridiculously high number of examples to get reliable low-variance estimates. Instead of estimating a parameterized manifold in this high-dimensional space we first perform a linear projection to a lower-dimensional representation. This approach has been used by Murase and Nayar [12] for object recognition and by Deguchi [5] for visual servoing. As a linear projection we perform a Principal Component Analysis (PCA) on the data, also known as the Karhunen-Loeve transform [14]. Suppose an image is represented by $z = (z_1, \ldots, z_N)^T$, with $N$ the number of pixels; then the whole set of images is denoted by $Z = (z_1, \ldots, z_M)$. We make a set of samples $Z_{norm}$ of which the mean is zero by subtracting the average image $\bar{z} = \frac{1}{M}\sum_{j=1}^{M} z_j$ from each image, and compute the covariance matrix $R = Z_{norm} Z_{norm}^T$. We determine the eigenvectors $e_1, e_2, \ldots$ and the corresponding eigenvalues $\lambda_i$. After sorting the eigenvectors in descending order of eigenvalue we can select the first $k$ eigenvectors as a new basis for the patterns: $z_{norm,i} \approx \sum_{s=1}^{k} y_{s,i} e_s$.¹ The projection of a normalized image $z_{norm,i}$ into the eigenspace is $y_i = (y_1, \ldots, y_k)^T$ with

$y_i = [e_1, e_2, \ldots, e_k]^T z_{norm,i}$.   (1)

In this low-dimensional space we can now use the patterns $y_i$ to learn a nonlinear representation. For our experiments we positioned our robot (a Nomad Super Scout) at 18 different locations in a part of the laboratory (which is a very cluttered environment). The locations are on a square grid with 1 m spacing (see Figure 1). For each location we captured 24 images, taken at intervals of 15 degrees; in total 432 images were acquired. To reduce the dimensionality of the problem we smoothed and subsampled the images to 64 x 64 pixels. The images from one particular location are shown in Figure 2. We performed the PCA on the 432 images. In order to determine $k$ (the number of eigenvectors needed for a reasonable reconstruction) we computed $\sum_{i=1}^{k} \lambda_i / \sum_{i=1}^{N} \lambda_i$: the fraction of the variance which is preserved. On this basis we decided to take $k = 20$, the point at which 80% of the variance is still preserved. Some of the principal eigenvectors are depicted in Figure 3. To get an impression of the smoothness of the manifold in eigenspace we visualized the curve formed by a pure rotation of the camera (see Figure 4), using only the first 3 eigenvectors. Note that the curve is not as smooth as the curves found by Murase et al. for objects under different orientations. We decided not to construct a parameterized model of the data, but instead used a nearest neighbour approach for localization. In order to say something about the confidence of the localization we first cast the problem in a probabilistic framework.

¹ In our experiments we did not compute the covariance matrix $R = Z_{norm} Z_{norm}^T$ but the implicit covariance matrix $\tilde{R} = Z_{norm}^T Z_{norm}$. Now $\tilde{R}$ is an $M \times M$ matrix, much smaller than $R$ because the number of images is much smaller than the number of pixels in an image. Murase [12] shows that the first eigenvectors of $R$ can be obtained as $e_i = \tilde{\lambda}_i^{-1/2} Z_{norm} \tilde{e}_i$.

Figure 1: Top-view of the experimental room with the positions at which the learning samples have been acquired.
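As a concrete illustration (not part of the original paper), the following Python/NumPy sketch shows how the eigenspace and the projection of Eq. (1) could be computed with the implicit-covariance trick of footnote 1; the array names and function interfaces are illustrative assumptions.

import numpy as np

def build_eigenspace(images, k=20):
    """images: (M, N) array with one flattened, smoothed 64x64 image per row."""
    M, N = images.shape
    mean = images.mean(axis=0)                    # average image
    Z = (images - mean).T                         # Z_norm, shape (N, M)

    # Implicit covariance trick (footnote 1): eigendecompose the small
    # M x M matrix Z^T Z instead of the N x N matrix Z Z^T.
    R_tilde = Z.T @ Z
    lam, e_tilde = np.linalg.eigh(R_tilde)        # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:k]             # keep the k largest
    lam, e_tilde = lam[order], e_tilde[:, order]

    # Map back to image-space eigenvectors: e_i = lam_i^(-1/2) Z_norm e~_i
    E = Z @ e_tilde / np.sqrt(lam)                # (N, k), columns e_1 .. e_k
    return mean, E, lam

def project(image, mean, E):
    """Eq. (1): y = [e_1 ... e_k]^T (z - mean)."""
    return E.T @ (image - mean)

With the 432 images of 64 x 64 = 4096 pixels used here, this trick reduces the eigenproblem from a 4096 x 4096 matrix to a 432 x 432 matrix.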
Figure 4: Projected images in a 3D eigenspace spanned by the three eigenvectors with the largest eigenvalues. The images are taken at one single location but at different orientations; neighbouring orientations are connected by a line.
Figure 2: All images taken at one particular position in the room by rotating the robot in steps of 15 degrees clockwise.

3 Probabilistic representation

Suppose that the probability of generating a vector $y$ in eigenspace from a robot pose with location $x_i$ and orientation $\phi_j$ is given by $P(y \mid x_i, \phi_j)$. With Bayes' rule we can compute the distribution giving the probability that the robot is at location $x_i$ with orientation $\phi_j$, given $y$:

$P(x_i, \phi_j \mid y) = \frac{P(y \mid x_i, \phi_j)\, P(x_i, \phi_j)}{\sum_{i,j} P(y \mid x_i, \phi_j)\, P(x_i, \phi_j)}$,   (2)

in which $P(x_i, \phi_j)$ is the prior probability for state $(x_i, \phi_j)$. We model the probability $P(y \mid x_i, \phi_j)$ as a Gaussian centered at the learned vector $y_{x_i,\phi_j}$ with covariance matrix $\Sigma$:

$P(y_A \mid x_i, \phi_j) = \frac{\exp[-\frac{1}{2}(y_A - y_{x_i,\phi_j})^T \Sigma^{-1} (y_A - y_{x_i,\phi_j})]}{(2\pi)^{k/2}\, |\Sigma|^{1/2}}$.   (3)

For the moment we assume a diagonal covariance matrix, with equal variance in all dimensions. For the localization procedure we start with a newly acquired image $z_A$, which is first projected on the linear subspace according to Eq. (1), resulting in a vector $y_A$. For this observation $y_A$ we obtain a probability distribution for the location and orientation of the robot, $P(x_i, \phi_j \mid y_A)$, according to Eq. (2), which depends on the distance of $y_A$ to the learned prototypes.
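A minimal sketch of Eqs. (2) and (3) follows; the (locations x orientations x k) layout of the stored prototypes, the isotropic variance sigma2 and the uniform prior are assumptions made for the illustration, not prescriptions of the paper.

import numpy as np

def pose_posterior(y_A, prototypes, sigma2=1.0, prior=None):
    """
    y_A        : (k,) projection of the newly acquired image.
    prototypes : (L, O, k) learned vectors y_{x_i, phi_j}.
    Returns    : (L, O) array with P(x_i, phi_j | y_A), Eq. (2).
    """
    d2 = ((prototypes - y_A) ** 2).sum(axis=-1)   # squared distances to prototypes
    # Gaussian with diagonal covariance sigma2 * I, Eq. (3); the normalisation
    # constant is common to all prototypes and cancels in Bayes' rule.
    log_lik = -0.5 * d2 / sigma2
    log_lik -= log_lik.max()                      # for numerical stability
    lik = np.exp(log_lik)
    prior = np.full_like(lik, 1.0 / lik.size) if prior is None else prior
    post = lik * prior
    return post / post.sum()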
Figure 3: The first eigenvectors of the covariance matrix of the data set.
We expect this approach to work well if there is a nice `smooth' relationship between the robot pose $(x, \phi)$ and the projection $y$. However, from Figures 4 and 5 we see that, although the visual information is richer than the range data which we used in earlier research, it is still possible that different poses generate a very similar image (just think of the robot looking at a brick wall). This problem is often called `perceptual aliasing'. Also, because of the projection to the low-dimensional subspace, different images $z$ may project to a similar $y$, particularly if $k$ is chosen too low. Therefore we adopt an active vision strategy for the localization task.
4 Active vision strategy
The probability that the robot is at location $x_i$, independent of its orientation, can be found by first calculating the probability of observing $y_A$ at this location, independent of orientation:

$P(y_A \mid x_i) = \sum_j P(y_A \mid x_i, \phi_j)\, P(\phi_j \mid x_i)$,   (4)

in which $P(y_A \mid x_i, \phi_j)$ can be derived from Eq. (3). We then apply Bayes' rule to get the probability distribution over the locations:

$P(x_i \mid y_A) = \frac{P(y_A \mid x_i)\, P(x_i)}{\sum_i P(y_A \mid x_i)\, P(x_i)}$.   (5)

If the probability of being at a certain location $x_i$ is higher than some threshold, we can assign the robot's location to $x_i$. If there is no such strong evidence, we need to gather more evidence on the location. We do this by rotating the camera over an angle $\alpha_A$ such that a new observation $y_B$ is made. Bayes' rule is used a second time to find the new probability distribution:

$P(x_i \mid y_A, y_B) = \frac{P(y_A, y_B \mid x_i)\, P(x_i)}{\sum_i P(y_A, y_B \mid x_i)\, P(x_i)}$.   (6)

Under the assumption that $y_A$ and $y_B$ are conditionally independent we obtain:

$P(x_i \mid y_A, y_B) = \frac{P(y_A \mid x_i)\, P(y_B \mid x_i)\, P(x_i)}{\sum_i P(y_A \mid x_i)\, P(y_B \mid x_i)\, P(x_i)}$,   (7)

which can be calculated from $P(y_A \mid x_i)$ and $P(y_B \mid x_i)$ derived from Eqs. (4) and (3). Again we can decide to locate the robot or to take a new observation in a different direction.
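The sketch below illustrates Eqs. (4)-(7) under the same assumed prototype layout; taking P(phi_j | x_i) to be uniform and dropping the common Gaussian normalisation constant are implementation choices for the illustration, not statements from the paper.

import numpy as np

def location_likelihood(y, prototypes, sigma2=1.0):
    """Eq. (4): P(y | x_i) = sum_j P(y | x_i, phi_j) P(phi_j | x_i)."""
    d2 = ((prototypes - y) ** 2).sum(axis=-1)     # (L, O)
    lik = np.exp(-0.5 * d2 / sigma2)              # unnormalised Eq. (3)
    return lik.mean(axis=1)                       # uniform P(phi_j | x_i) = 1/O

def location_posterior(y, prototypes, sigma2=1.0, prior=None):
    """Eq. (5): P(x_i | y)."""
    lik = location_likelihood(y, prototypes, sigma2)
    prior = np.full_like(lik, 1.0 / lik.size) if prior is None else prior
    post = lik * prior
    return post / post.sum()

def fused_posterior(y_A, y_B, prototypes, sigma2=1.0, prior=None):
    """Eq. (7): two observations fused under conditional independence."""
    lik = (location_likelihood(y_A, prototypes, sigma2) *
           location_likelihood(y_B, prototypes, sigma2))
    prior = np.full_like(lik, 1.0 / lik.size) if prior is None else prior
    post = lik * prior
    return post / post.sum()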
Figure 5: Three curves in eigenspace generated by rotating the camera at locations 2, 5 and 6.
5 The optimal rotation

From Figure 5 we can see that in some parts of the eigenspace the curves which indicate a certain location $x_i$ are close together, while in other parts they are further apart. An optimal camera rotation will be such a displacement on the curve that the ambiguity is minimized. Since we have written the localization problem in a probabilistic framework we can use an information-theoretic approach to finding the best rotation. The aim is to have a unique allocation to one location, which means that the distribution $P(x_i \mid y_A)$ must be very peaked. A natural measure is the entropy of the distribution:

$H(y_A) = -\sum_i P(x_i \mid y_A) \log P(x_i \mid y_A)$.   (8)

The entropy reduction $s_{y_A}(\alpha)$ which is achieved by taking a second measurement $y_B$ under relative angle $\alpha$ is defined as:

$s_{y_A}(\alpha) = H(y_A) - H(y_B)$.   (9)

For all learning samples $y_i$ in our set we determined the optimal rotation, which can now be defined by:

$\alpha^*_{y_i} = \arg\max_{\alpha} s_{y_i}(\alpha)$.   (10)
Note that this approach requires an optimal rotation to be stored for every learning sample in the set. In the field of object recognition similar approaches have been presented, in which an average entropy reduction was computed [2].
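A possible offline computation of these stored rotations is sketched below; it reuses location_posterior from the previous sketch and approximates the observation obtained after a rotation by the stored prototype at the shifted orientation index, which is an assumption about the procedure rather than the paper's exact implementation.

import numpy as np

def entropy(p, eps=1e-12):
    """Eq. (8): H = -sum_i p_i log p_i."""
    return -np.sum(p * np.log(p + eps))

def optimal_rotations(prototypes, sigma2=1.0, step_deg=15):
    """Returns an (L, O) array with the best relative rotation (in degrees)
    for every stored prototype, following Eqs. (9) and (10)."""
    L, O, _ = prototypes.shape
    best = np.zeros((L, O))
    for i in range(L):
        for j in range(O):
            y_A = prototypes[i, j]
            H_A = entropy(location_posterior(y_A, prototypes, sigma2))
            gains = []
            for shift in range(1, O):                    # candidate rotations
                y_B = prototypes[i, (j + shift) % O]     # predicted second view
                H_B = entropy(location_posterior(y_B, prototypes, sigma2))
                gains.append(H_A - H_B)                  # entropy reduction, Eq. (9)
            best[i, j] = (np.argmax(gains) + 1) * step_deg   # argmax over alpha, Eq. (10)
    return best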
Figure 7: Experimental results showing the fraction of correctly classified locations. To the left the localization after one observation, to the right the localization after two observations.
Figure 6: Probability distribution over all locations and orientations for an image taken at test location C and orientation 75 degrees.
6 Experiments

From our total set of captured images we selected 168 images as a training set. These images were taken at the 7 `prototype' locations [1-7], as depicted in Figure 1, at intervals of 15 degrees. For each of these images we determined the optimal rotation according to Eq. (10). We tested the robot localization on 120 robot poses, taken from 5 test locations [A-E], at orientation intervals of 15 degrees. The 5 test locations are about 25 cm from the learning locations. The localization procedure is as follows. First we take an image from one of the test poses. We compute the projection in the eigenspace and determine the probability over the 168 learning samples (7 locations and 24 orientations), using Eqs. (2) and (3). For an arbitrary pose (location C at orientation 75 degrees) this probability distribution is given in Figure 6. Since we are only interested in the location, we take the marginal distribution by summation over the orientations $\phi_j$. The location $x_i$ with the highest probability is selected as the estimated location. To evaluate the performance of the method we considered a localization to be correct if the selected location was the nearest neighbour in the workspace. Note that the test samples are taken from positions about 20 cm from the training locations. We could have used some sort of interpolation to get a better position estimate, but because the training points were not sampled on a dense grid, the interpolation would have been very inaccurate. The fraction of correct localizations is given in the left graph of Figure 7. For each of the test samples we then computed the probability distribution over the locations after a second observation, using the optimal rotation. The right graph of Figure 7 shows the localization after two observations.
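For completeness, a hypothetical evaluation loop in the spirit of this experiment is sketched below; the data-loading names are placeholders, and the helper functions (project, location_posterior, fused_posterior) come from the earlier sketches.

import numpy as np

def evaluate(test_images, rotated_images, true_locations, mean, E, prototypes, sigma2=1.0):
    """Fraction of correct location estimates after one and after two observations."""
    correct_one = correct_two = 0
    for z_A, z_B, true_loc in zip(test_images, rotated_images, true_locations):
        y_A = project(z_A, mean, E)                  # Eq. (1)
        p1 = location_posterior(y_A, prototypes, sigma2)
        correct_one += int(np.argmax(p1) == true_loc)

        y_B = project(z_B, mean, E)                  # image taken after the chosen rotation
        p2 = fused_posterior(y_A, y_B, prototypes, sigma2)
        correct_two += int(np.argmax(p2) == true_loc)
    n = len(true_locations)
    return correct_one / n, correct_two / n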
7 Discussion

Modeling in sensor space, or "appearance modeling", is a good alternative to modeling in Cartesian space. The localization is driven directly by the sensory input, and the mapping from sensory data to a geometrical representation is no longer needed. Our experiments show that localization can be carried out on such models. In this paper we did not try to obtain a parameterization of the low-dimensional manifold in sensor space; we simply stored the learning samples and used a nearest neighbour technique. The experiments also reveal some of the problems related to the approach. Figures 4 and 5 show that the manifold is highly curved, which was also shown by [15]. The number of data points needed for an accurate model of the manifold will be large, even though we first reduced the dimension of the sensor data with a linear projection. However, obtaining supervised training data is expensive. Methods have been presented that use model knowledge to generate
'synthetic' (range) data [4] from a few experimental range scans, so as to have more training samples. This approach is also possible for images, and the samples can then be used to obtain a parametric representation of the manifold. If, however, a nearest neighbour method is used, this has the disadvantage that an enormous number of learning samples has to be stored and that recognition will be slow. Another problem shown by the experiments relates to our assumption on the noise. We assumed Gaussian noise with equal variance in all directions; in this case the Bayesian localization procedure is identical to a 1-nearest neighbour method. We have also worked on an approach which locally models the noise characteristics of the data. This enables us to use a locally varying metric in a nearest neighbour method. Because of the large number of learning samples which has to be stored in a nearest neighbour method, we are currently developing a parametric representation of the manifold and the noise. Active vision is a strong tool for disambiguating multiple location hypotheses, as was shown earlier by [11]. In this paper we determine and store the optimal rotation for each prototype, but if we change to a parameterized representation of the manifold, the optimal rotation will become a continuous function of the intrinsic state of the system. We are currently trying to apply the method to model a much larger part of the building.
Acknowledgement

We thank Nikos Vlassis and the reviewers for comments on the manuscript. Part of the work on the optimal rotation was done by G. de Haan as a master's thesis project.
References

[1] Andersen, C.S., S. Jones, J. Crowley, "Appearance based processes for visual navigation", Proc. SIRS'97, pp. 227-236, 1997.
[2] Borotschnig, H., L. Paletta, M. Prantl, A. Pinz, "Active object recognition in parametric eigenspace", in: P.H. Lewis and M.S. Nixon (eds), Proc. Brit. Mach. Vision Conf., pp. 629-638, 1998.
[3] Crowley, J.L., "World modeling and position estimation for a mobile robot using ultrasonic ranging", Proc. of the IEEE Int. Conf. on Rob. and Aut., 1989.
[4] Crowley, J.L., F. Wallner and B. Schiele, "Position estimation using principal components of range data", Proc. 1998 IEEE Int. Conf. on Robotics and Automation, pp. 3121-3128, 1998.
[5] Deguchi, K., "A direct interpretation of dynamic images and camera motion for vision guided robotics", Proc. IEEE/SICE/RSJ Int. Conf. on Multisensor Fusion, pp. 313-320, 1996.
[6] Durrant-Whyte, H.F., "Integration, coordination and control of multi-sensor robot systems", Kluwer Academic Publishers, 1988.
[7] Kröse, B.J.A., M. Eecen, "A Self-Organizing Representation of Sensor Space for Mobile Robot Navigation", Proc. of the 1994 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Munich, Germany, 1994, pp. 9-14.
[8] Pierce, D. and B. Kuipers, "Learning to explore and build maps", Proc. AAAI-94, AAAI/MIT Press, 1994.
[9] Kurtz, A., "Building maps for path-planning and navigation using learning classification of external sensor data", in: I. Aleksander and J. Taylor (eds), Artificial Neural Networks 2, North-Holland/Elsevier Science Publishers, Amsterdam, 1992, pp. 587-591.
[10] Latombe, J.C., "Robot motion planning", Kluwer Academic Publishers, Boston, 1991.
[11] Maeda, S., Y. Kuno, Y. Shirai, "Active Vision Based on Eigenspace Analysis", Proc. IROS'97, pp. 1018-1023, 1997.
[12] Murase, H. and S.K. Nayar, "Visual learning and recognition of 3-D objects from appearance", Int. Jrnl of Computer Vision, 14, 5-24, 1995.
[13] Mataric, M., "Integration of representation into goal-driven behaviour-based robots", IEEE Trans. on Robotics and Automation, 8 (3), 1992.
[14] Oja, E., "Subspace methods of Pattern Recognition", Research Studies Press, Hertfordshire, 1983.
[15] Pourraz, F. and J.L. Crowley, "Continuity properties of the appearance manifold for mobile robot estimation", SIRS'98, Edinburgh, 1998.
[16] Tagare, H.D., D. McDermott, H. Xiao, "Visual place recognition for autonomous robots", Proc. 1998 IEEE Int. Conf. on Rob. and Aut., pp. 2530-2535, 1998.
[17] Weiss, L.E., A.C. Sanderson and C.P. Neuman, "Dynamic sensor-based control of robots with visual feedback", IEEE Journal on Robotics and Automation, RA-3, 404-417, 1987.
[18] Zimmer, U., "Robust world-modeling and navigation in a real world", Neurocomputing, vol. 13 (2-4), pp. 247-260, 1996.