Proceedings of the 2000 IEEE International Conference on Robotics & Automation San Francisco, CA • April 2000
Using Multiple Gaussian Hypotheses to Represent Probability Distributions for Mobile Robot Localization

David J. Austin (d.austin@computer.org)
Patric Jensfelt (patric@s3.kth.se)

Centre for Autonomous Systems, Royal Institute of Technology, Stockholm SE-100 44, Sweden.
Abstract

A new mobile robot localization technique is presented which uses multiple Gaussian hypotheses to represent the probability distribution of the robot's location in the environment. Sensor data is assumed to be provided in the form of a Gaussian distribution over the space of robot poses. A tree of hypotheses is built, representing the possible data association histories for the system. Covariance intersection is used for the fusion of the Gaussians whenever a data association decision is taken. However, such a tree can grow without bound, and so rules are introduced for the elimination of the least likely hypotheses from the tree and for the proper re-distribution of their probabilities. This technique is applied to a feature-based mobile robot localization scheme, and experimental results are given demonstrating the effectiveness of the scheme.
1 Introduction
In many areas it is desirable to be able to efficiently represent an arbitrary probability distribution. A number of techniques have been used for this, including grid-based approaches [1] and sample-based approaches [2, 3, 4]. However, these approaches tend to be computationally expensive. Grid-based approaches can be very wasteful, requiring computation of the probability even in areas where the probability is negligible. On the other hand, sample-based approximations require the computation of a significant number of samples, and care must be taken when deciding the location of the samples to ensure that all of the significant areas of the distribution are sampled. Furthermore, extracting information such as the peak from grid-based or sample-based distributions can be computationally expensive.
Mobile robot localization is one area of particular interest where the representation of arbitrary probability distributions is required. Here we consider the problem of global localization, which is to determine the position of the robot in an environment by observations of features in the environment with known locations (stored in a map). A probabilistic approach is required for global localization, as the observation of the features is a highly uncertain process, depending upon real sensors and signal processing. However, the features observed by the robot are generally not unique and may occur multiple times in the map. As a result, the probability distribution over the robot's location will be, in general, multi-modal. Many localization schemes have neglected this and used single-mode representations, of which the Kalman filter is by far the most common (e.g. [5, 6, 7]). In this paper, we present a method for representing the probability distribution by a number (or a mixture) of Gaussian hypotheses. Multiple Gaussian hypotheses have been previously applied to the problem of mobile robot localization by Jensfelt and Kristensen [8]. However, this method assumed a solution to the "data association" problem. That is, it was assumed that measurements could be matched against the existing set of hypotheses, and only matching measurements were used to update the hypotheses. On the other hand, the method presented here delays the data association step by forming all possible matches and only then removing the least likely of them (if necessary). A further enhancement presented here is the use of "covariance intersection" [9] to intersect Gaussian hypotheses. This technique does not depend on the assumption of statistical independence and hence, the robot localization method presented here has no requirement for independent measurement data. This is significant, as it is not clear that statistical independence can be guaranteed for most robotic applications.
2 System Model
As discussed above, due to the highly uncertain nature of real-world sensors and signal processing, we use a probabilistic framework for mobile robot localization. That is, we maintain a probability distribution over the space of possible robot locations. For this paper we will consider the space of robot poses (both position and orientation in the plane). We denote the true robot pose at time t as $x(t) = (x(t), y(t), \alpha(t))$ and the estimate at time t as $\hat{x}(t) = (\hat{x}(t), \hat{y}(t), \hat{\alpha}(t))$. For the mobile robot, we assume that the system evolves as:

$$x(t+1) = f(x(t), u(t)) + v(t) \qquad (1)$$
where u(t) is the change in the robot pose estimated by odometry and v(t) is the process noise, assumed zero-mean and Gaussian with covariance matrix Q(t) (determined from a study of the odometry performance [6]). Whenever the robot moves, we can form a new estimate from an existing estimate based on equation (1).
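To make the motion step concrete, here is a minimal sketch of how a pose estimate and its covariance could be propagated under equation (1); the particular compose function f and the linearized covariance propagation are our illustrative assumptions (standard EKF-style practice), not details fixed by the paper.

```python
import numpy as np

def motion_update(pose, cov, u, Q):
    """Propagate a pose estimate through equation (1).

    pose: current estimate (x, y, alpha); cov: its 3x3 covariance.
    u: odometry increment (dx, dy, dalpha) in the robot frame.
    Q: covariance of the process noise v(t).
    """
    x, y, a = pose
    dx, dy, da = u
    c, s = np.cos(a), np.sin(a)
    # f(x(t), u(t)): compose the odometry increment with the current pose.
    new_pose = np.array([x + c * dx - s * dy,
                         y + s * dx + c * dy,
                         a + da])
    # Jacobian of f w.r.t. the pose, for first-order covariance propagation.
    F = np.array([[1.0, 0.0, -s * dx - c * dy],
                  [0.0, 1.0,  c * dx - s * dy],
                  [0.0, 0.0,  1.0]])
    return new_pose, F @ cov @ F.T + Q
```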
It is necessary to use external sensor information to obtain an initial estimate for the pose and to prevent the uncertainty in the pose growing without bound. For this purpose, we assume that there are a number of feature detectors which observe features in the environment. Without loss of generality, we treat the matching of features serially and consider only one detected feature at a time. When a feature is detected, it is matched against a map and provides a set $\mathcal{F}(t)$ of feature hypotheses $f_j(t) \in \mathcal{F}(t)$. For example, a line feature detector could extract lines from laser scanner data and match these lines against a map. In addition, each feature has a uniform zero hypothesis $f_0(t)$ which captures the possibilities of sensor error, the feature observed not being in the map, or the map being incorrect. Hence, the set of feature hypotheses is written as

$$\mathcal{F}(t) = \{f_j(t) : j = 0, 1, \ldots\} \qquad (2)$$
The number of feature hypotheses is given by $|\mathcal{F}(t)|$, using the standard notation for set size. The feature hypotheses are stochastic and we assume that they are given by

$$f_j(t) = g(x(t)) + w(t) \qquad (3)$$
where g is a nonlinear feature extraction function and w(t) is the measurement noise, assumed zero-mean and Gaussian with covariance matrix R(t). It is further assumed that v(t) and w(t) are independent. Given this general framework, we must update an a priori probability distribution with the information in the feature hypotheses to give a new, posterior distribution which reflects the updated knowledge of the robot pose. However, before we can apply the new information, we must decide upon a representation for the probability distributions.
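As a toy instance of equation (3), one could simulate a hypothetical range-bearing detector for a point landmark; the paper's actual detectors return lines and doors, so the landmark model here is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def observe_feature(pose, landmark, R):
    """Simulate equation (3): f_j(t) = g(x(t)) + w(t), with w ~ N(0, R).

    Hypothetical range-bearing model for a point landmark (illustrative;
    not one of the paper's feature detectors).
    """
    x, y, a = pose
    lx, ly = landmark
    g = np.array([np.hypot(lx - x, ly - y),          # range to landmark
                  np.arctan2(ly - y, lx - x) - a])   # bearing w.r.t. heading
    return g + rng.multivariate_normal(np.zeros(2), R)
```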
3 Representation of Distributions
As discussed above, for mobile robot localization, it is necessary to represent a complicated multi-modal probability distribution. Many localization schemes have neglected this and used single-mode representations, of which the Kalman filter is by far the most common [5, 6, 7]. Here we use a model which consists of a number of Gaussian hypotheses, each with a weight and covariance matrix. In addition, we use a single uniform hypothesis to model the possibility that the position of the robot is completely unknown. Initially, there is only a uniform hypothesis, indicating that we have no knowledge of the robot pose. We denote the set of Gaussians (plus the uniform, if any) at time t as $\mathcal{H}(t)$, where $t = 0, 1, 2, \ldots$ is the discrete time or iteration number. Hence, we write

$$\mathcal{H}(t) = \{h_i(t) : i = 0, 1, \ldots\} \qquad (4)$$
We denote the number of hypotheses at time t as $|\mathcal{H}(t)|$. In order for the hypotheses to remain Gaussian when updated by the feature information, it is necessary to assume that the feature distributions are also Gaussian. The assumption that the feature hypotheses are Gaussian restricts the applicability of this method (e.g. a Gaussian hypothesis is not a good approximation for the probability distribution resulting from the observation of a point landmark at some (non-zero) distance, which results in an annular distribution). However, for a wide range of features, a Gaussian assumption is appropriate. So now we have an a priori set of hypotheses $\mathcal{H}(t)$ and we obtain from a feature detector a set of feature hypotheses $\mathcal{F}(t)$. The problem posed is: how to form a new set of hypotheses $\mathcal{H}(t+1)$ which incorporates the information in $\mathcal{F}(t)$? This problem is commonly subdivided into two sequential steps: data association, and tracking or sensor fusion [10]. The data association step attempts to form matches between elements of the sets $\mathcal{H}(t)$ and $\mathcal{F}(t)$. Only those matches are fused in the second step to create new hypotheses $\mathcal{H}(t+1)$.
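One plausible data layout for the mixture (4) is a weight, mean and covariance per Gaussian mode plus a single weighted uniform term; the field names below are our own illustrative choices, not the paper's.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class GaussianHypothesis:
    weight: float                  # probability mass of this mode
    mean: np.ndarray               # pose estimate (x, y, alpha)
    cov: np.ndarray                # 3x3 covariance matrix
    children: list = field(default_factory=list)  # hypothesis tree (Sec. 3.2)

@dataclass
class UniformHypothesis:
    weight: float                  # mass assigned to "pose completely unknown"

# Initially there is no pose knowledge: H(0) contains only the uniform term.
hypotheses = [UniformHypothesis(weight=1.0)]
```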
Figure 1: All possible data associations for two initial hypotheses and three feature hypotheses.

For the second, data fusion step, we use the covariance intersection method [9] to compute the intersection of two Gaussian hypotheses. Unlike the Kalman filter, this method does not require an assumption of independence. This means that, even if the robot repeatedly observes the same feature, the covariance of the Gaussian hypotheses will not diminish. The Kalman filter would suffer from divergence in this situation, resulting in vanishing covariances. One drawback of this two-step method is that the data association step is non-trivial and, in practice, cannot be achieved perfectly. This means that some possible matches are not considered at all, resulting in an approximation of the true distribution. For a true representation of the distribution, we should consider all possible matches of elements of $\mathcal{H}(t)$ with elements of $\mathcal{F}(t)$; this is the optimal Bayesian filter [10]. An example of this is shown in Figure 1. Here we have two initial hypotheses and three feature hypotheses. We must consider the fusion (or intersection) of all possible pairs consisting of one element from $\mathcal{H}(t)$ and one element from $\mathcal{F}(t)$. Note that the number of hypotheses will rise exponentially with time, because each new set of features multiplies the number of hypotheses in $\mathcal{H}(t)$ by the number of features $|\mathcal{F}(t)|$. As a result, for a practical application, it is necessary to find a simpler approximation to the true set of hypotheses. For example, Jensfelt and Kristensen [8] used this approach, with the restriction that each element of $\mathcal{H}(t)$ could match at most one element of $\mathcal{F}(t)$. In this way, the storage and computational requirements were strictly bounded, under the assumption that the data association step was very reliable. Here we propose an alternate approach which does not assume a solution to the data association problem and attempts to minimise the approximations made.
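A minimal sketch of covariance intersection [9]: the fused inverse covariance is a convex combination of the two inverse covariances. Choosing the mixing weight to minimize the determinant of the fused covariance is one common criterion; it is our choice here, not one mandated by the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def covariance_intersection(xa, Pa, xb, Pb):
    """Fuse two Gaussian estimates without assuming their errors are
    independent:
        Pc^-1 = w Pa^-1 + (1 - w) Pb^-1
        xc    = Pc (w Pa^-1 xa + (1 - w) Pb^-1 xb)
    """
    Pa_inv, Pb_inv = np.linalg.inv(Pa), np.linalg.inv(Pb)

    def fused_det(w):
        # Determinant of the fused covariance as a function of the weight.
        return np.linalg.det(np.linalg.inv(w * Pa_inv + (1 - w) * Pb_inv))

    w = minimize_scalar(fused_det, bounds=(0.0, 1.0), method="bounded").x
    Pc = np.linalg.inv(w * Pa_inv + (1 - w) * Pb_inv)
    xc = Pc @ (w * Pa_inv @ xa + (1 - w) * Pb_inv @ xb)
    return xc, Pc
```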
3.1 Approach
The key idea of this approach is to consider as many possible data associations as we can. However, we must remain within some fixed memory size, and hence
there is some maximum number of hypotheses, and it becomes necessary to delete hypotheses. Each deletion corresponds to eliminating one of the choices in the data association step. To minimise the effects of deleting hypotheses, we choose to delete those with minimum probability. In this way, the limiting of possible data associations is delayed for as long as possible, and incorrect associations are less likely for the hypotheses with greatest probabilities. The first step is to create a set of new posterior hypotheses $\mathcal{H}(t+1)$, as shown in Figure 1, using covariance intersection for the intersection of Gaussian hypotheses. This step creates $|\mathcal{H}(t)|\,|\mathcal{F}(t)|$ new hypotheses, i.e.

$$|\mathcal{H}(t+1)| = |\mathcal{H}(t)|\,|\mathcal{F}(t)| \qquad (5)$$
The second step is to repeatedly find the worst hypothesis and delete it. Note that, when the worst hypothesis is deleted, the probability mass associated with that hypothesis must be assigned to some or all of the remaining hypotheses. The deletion of hypotheses is effectively a decision of data association, and therefore care is necessary in re-distributing the probability mass of the deleted node.
3.2 Redistribution of Probability
A simplistic approach to the redistribution of probability mass would be to "throw it away" and re-normalize afterwards. However, this can result in incorrect behavior, as can be demonstrated by an example. Consider a sensor which has a 40% probability of false measurement, and suppose that in the first iteration the sensor detects a feature which exists in five locations in the map. Then, after creating the new hypotheses, we have the situation shown in Figure 2. The uniform hypothesis that resulted from the zero hypothesis of the feature has total probability 0.4, and the five other hypotheses have equal probability 0.12. If we decide at this point to keep only two of the hypotheses and we naively discard the probability mass of the deleted nodes, then the final situation is as shown in Figure 2. After re-normalization, the weight of the uniform hypothesis is roughly 0.77 and the Gaussian hypothesis has weight roughly 0.23. This is incorrect, because the elimination of one of the Gaussian hypotheses does not mean that the probability of sensor failure has increased. In fact, if we eliminate one of the Gaussian hypotheses, it implies that the other Gaussian hypotheses are more likely. Thus, the correct procedure is to assign the probability mass from the deleted hypotheses to the remaining undeleted Gaussian hypothesis, with the result that the uniform hypothesis has probability 0.4 and the remaining Gaussian hypothesis has probability 0.6.
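The arithmetic of this example can be checked directly:

```python
# Five map locations plus the zero hypothesis of the feature.
uniform, gaussians = 0.4, [0.12] * 5         # 0.4 + 5 * 0.12 = 1.0

# Naive: keep the uniform and one Gaussian, discard the rest, re-normalize.
kept_total = uniform + gaussians[0]
naive_uniform = uniform / kept_total         # ~0.77: sensor failure inflated
naive_gauss = gaussians[0] / kept_total      # ~0.23

# Correct: mass of the deleted Gaussians goes to the surviving Gaussian;
# the sensor-failure (uniform) mass is untouched.
correct_uniform = uniform                    # 0.4
correct_gauss = sum(gaussians)               # 0.6
```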
Figure 2: An example situation where discarding the probability mass of the deleted nodes and re-normalizing would give incorrect weight to the zero hypothesis of the feature.
Figure 3: Example of hypothesis tree evolution over time. Note that deletion of hypothesis A is effectively a data association decision for the previous time step, t = 1.

The above discussion indicates that we need to keep track of the parents of the hypotheses, so that we know which hypotheses result from the one feature. In fact, this principle applies iteratively, so that we must keep track of the parents of the parents and so on, back to the original uniform hypothesis that the system started with. In other words, we must build a tree, with the nodes being the hypotheses and each level in the tree corresponding to a different iteration, starting from the initialization at the top of the tree and ending with the most recent data at the bottom. Figure 3 gives an overview of the relationship between hypotheses that needs to be stored as the number of iterations increases. Figure 3 also shows an example of why we need to consider the complete history of hypotheses. If, in the second iteration (t = 2), we delete the node labeled A, we are effectively making a data association decision for the previous iteration (t = 1), and so the probability of node A should be assigned to the two siblings B and C.
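A sketch of the redistribution rule illustrated by node A in Figure 3, written over the GaussianHypothesis type from Section 3. Splitting the deleted mass among siblings in proportion to their weights is our assumption; the paper does not spell out the exact split.

```python
def delete_and_redistribute(node, parent):
    """Remove `node` from the tree and hand its probability mass to its
    siblings (e.g. node A's mass going to B and C in Figure 3)."""
    parent.children.remove(node)
    siblings = parent.children
    if not siblings:
        # Last child: the whole branch dies; the parent must be deleted
        # and its mass redistributed one level further up.
        raise ValueError("no siblings: delete the parent branch instead")
    total = sum(s.weight for s in siblings)
    for s in siblings:
        share = s.weight / total if total > 0 else 1.0 / len(siblings)
        s.weight += share * node.weight
```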
1. For each existing Gaussian hypothesis $h_i(t)$:
   (a) For each Gaussian feature hypothesis $f_j(t)$:
       i. Compute the probability $p_{int}$ of the intersection of the two Gaussians.
       ii. If $p_{int} > p_{min}$, create a new hypothesis $h_k(t+1)$ as a child of the old hypothesis $h_i(t)$.
       iii. Otherwise, add the probability $p_{int}$ to the spare probability of the old hypothesis node.
   (b) If no children were added to the existing Gaussian $h_i(t)$, delete it and any ancestors which have no children as a result.
   (c) If the hypothesis $h_i(t)$ has any spare probability, re-distribute it.
2. Let $n_{total}$ be the total number of new hypotheses created.
3. Repeat $n_{total} - N$ times:
   (a) Search the new Gaussians to find the one with least probability.
   (b) Delete the least probable Gaussian.
   (c) Re-distribute its probability.
4. Simplify the probability tree, deleting any nodes which have exactly one child (substituting the child in place of the deleted node).

Figure 4: The algorithm for adding a new layer of leaf nodes when a new feature is observed.
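Ignoring the tree bookkeeping, the core of the update in Figure 4 might be sketched as follows. The overlap-based intersection probability and the fixed-weight covariance intersection are simplifying stand-ins for details the paper leaves open (see the determinant-minimizing version in Section 3).

```python
import numpy as np

def gaussian_overlap(ma, Pa, mb, Pb):
    """Likelihood that two Gaussian pose estimates describe the same pose:
    N(ma - mb; 0, Pa + Pb). A plausible stand-in for the paper's p_int."""
    d, S = ma - mb, Pa + Pb
    norm = np.sqrt((2.0 * np.pi) ** len(d) * np.linalg.det(S))
    return float(np.exp(-0.5 * d @ np.linalg.solve(S, d)) / norm)

def ci_fixed(ma, Pa, mb, Pb, w=0.5):
    """Covariance intersection with a fixed mixing weight (simplification)."""
    Pc = np.linalg.inv(w * np.linalg.inv(Pa) + (1.0 - w) * np.linalg.inv(Pb))
    mc = Pc @ (w * np.linalg.solve(Pa, ma) + (1.0 - w) * np.linalg.solve(Pb, mb))
    return mc, Pc

def update_layer(hyps, feats, N, p_min):
    """One layer of Figure 4, flattened: form all hypothesis-feature pairs,
    keep at most the N most probable children, and return the lost mass to
    the survivors in proportion to their weights.
    hyps, feats: lists of (weight, mean, cov) triples."""
    children = []
    for wh, mh, Ph in hyps:
        for wf, mf, Pf in feats:
            p = wh * wf * gaussian_overlap(mh, Ph, mf, Pf)   # step 1(a)i
            if p > p_min:                                    # step 1(a)ii
                mc, Pc = ci_fixed(mh, Ph, mf, Pf)
                children.append((p, mc, Pc))
    children.sort(key=lambda c: c[0], reverse=True)
    kept, dropped = children[:N], children[N:]               # steps 2-3
    lost = sum(p for p, _, _ in dropped)
    total = sum(p for p, _, _ in kept) or 1.0
    return [(p + lost * p / total, m, P) for p, m, P in kept]
```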
4 Algorithm and Implementation
The history of the hypotheses is stored in a doubly-linked tree structure, allowing traversal in both directions. When each new feature is observed, the nodes in the tree are updated according to the model of equation (1), and then a new layer of leaf nodes is added to the bottom of the tree. The algorithm used is shown in Figure 4. Special care must be taken in steps 1(c) and 3(c) above to ensure that the "spare" probability resulting from unlikely hypotheses being deleted is re-distributed properly. Also note that we are able to delete all nodes from the tree which have only one child. This step does not affect the information in the tree and allows us to place an upper bound on the number of nodes in the tree of $n_{nodes} \le N \log(N)$, because each node in the simplified tree must have at least two children. This simplification is implemented in step 4 of the algorithm.
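The single-child collapse of step 4 is a short recursive pass; here is a sketch over a minimal node type (hypothetical fields, matching the layout sketched in Section 3):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    weight: float
    children: list = field(default_factory=list)

def simplify(node: Node) -> Node:
    """Splice out nodes with exactly one child (step 4 of Figure 4).
    Only branching nodes carry data association information, so this
    preserves the tree's content while bounding its size."""
    node.children = [simplify(c) for c in node.children]
    if len(node.children) == 1:
        return node.children[0]   # substitute the lone child for this node
    return node
```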
5 Experiments
The algorithm described above was implemented on a Nomad 200 operating in an office environment. Two feature detectors were used. The first is a laser line feature detector, which determines dominant lines in the room using a Hough transform and matches those against a simple rectangular model of each room. The second feature detector is a door detector, which matches templates against the laser scan looking for doors. Both the laser lines and doors have been measured manually and stored in a map. Figure 5 shows the evolution of the set of Gaussian hypotheses over a number of iterations. For this experiment, an upper bound of N = 100 was used. The integration of each new feature took approximately 300 ms on the Pentium III 450 MHz computer on-board the Nomad 200. In this experiment, a laser-based pose tracker has also been used to estimate the pose of the robot, and both estimates are shown as circles. In Figure 5(a), the two circles overlap. In (b), the detected door feature is in slightly the wrong place, and so the uniform hypothesis dominates, giving no best pose estimate; the output pose is set to zero (the circle in the lower left of the room). In (c), the two estimates seem to be in good agreement, but there seems to be a time delay in the recording (note how the detected features are misaligned with the map). The comparison with the laser pose tracker demonstrates that the method can localize the robot quite accurately. The alert reader will notice that the lab is a different size from all of the other rooms in the map. This means that, when the robot turns around and sees lines from all sides of the room, it is fairly straightforward to localize. We have also run the algorithm in the corridors of the building, and it results in a distribution which seems reasonable, with long hypotheses lying along each of the corridors. However, with our simplistic rectangular room model and door features, it seems that there is insufficient information to localize fully. In particular, the left corridor has very little information for laser lines and doors. Clearly, further work must be carried out to implement and test additional feature detectors. However, in feature-rich environments such as the lab, the algorithm functions very well.
6 Conclusions
In this paper, we have presented a new algorithm which uses multiple Gaussian hypotheses to represent probability distributions for mobile robot localization. The algorithm makes as few data association decisions as possible, and makes the least significant data association decisions first, to try to minimize the impact of the inevitable mistakes. Another important feature is that we use covariance intersection for the intersection of Gaussian hypotheses, removing the assumption of measurement independence. This is significant, as measurement independence cannot be guaranteed for a robotic application. The experimental results demonstrate the effectiveness of the method for localization in a real-world environment.
7 Acknowledgment
This research has been sponsored by the Swedish Foundation for Strategic Research through the Centre for Autonomous Systems. The funding is gratefully acknowledged.
References

[1] W. Burgard, D. Fox, D. Hennig, and T. Schmidt, "Estimating the absolute position of a mobile robot using position probability grids," in Proc. of the National Conference on Artificial Intelligence, 1996.
[2] F. Dellaert, W. Burgard, D. Fox, and S. Thrun, "Using the condensation algorithm for robust, vision-based mobile robot localization," in Conference on Computer Vision and Pattern Recognition, IEEE, 1999.
[3] M. Isard and A. Blake, "Condensation - conditional density propagation for visual tracking," Intl. Journal of Computer Vision, vol. 29, no. 1, pp. 5-28, 1998.
[4] F. Dellaert, D. Fox, W. Burgard, and S. Thrun, "Monte Carlo localization for mobile robots," in IEEE Intl. Conf. on Robotics and Automation, pp. 1322-1328, May 1999.
[5] J. L. Crowley, "World modeling and position estimation for a mobile robot," in IEEE Intl. Conf. on Robotics and Automation, vol. 3, pp. 1574-1579, 1987.
[6] P. Jensfelt, "Localization using laser scanning and minimalistic environmental models," licentiate thesis, Automatic Control, Royal Institute of Technology, SE-100 44 Stockholm, Sweden, Apr. 1999.
[7] J. Leonard and H. Durrant-Whyte, "Mobile robot localization by tracking geometric beacons," IEEE Transactions on Robotics and Automation, vol. 7, no. 3, pp. 376-382, 1991.
[8] P. Jensfelt and S. Kristensen, "Active global localisation for a mobile robot using multiple hypothesis tracking," in Workshop on Reasoning with Uncertainty in Robot Navigation (IJCAI'99), Stockholm, Sweden, Aug. 1999.
[9] J. K. Uhlmann, "Dynamic map building and localization: New theoretical foundations," PhD thesis, University of Oxford, 1995.
[10] Y. Bar-Shalom and T. Fortmann, Tracking and Data Association. Academic Press, 1988.
[Figure 5: four map panels showing the covariance ellipses of the hypotheses at (a) t = 3, (b) t = 6, (c) t = 9, and (d) t = 30; scale bars of 5 m and 20 m.]
Figure 5: Four snapshots of the covariance ellipses of the hypotheses. The entire map of the building is used, and the lab space is shown expanded. The robot is initially started in the lab with no idea about its location. Figures (a), (b) and (c) show how the hypotheses evolve over time. Note in (a) the long thin hypotheses adjacent to each corridor, which say that one of the line segments observed could be matched to the corridor wall. A later iteration, t = 30, shown in (d), has a sensing error which results in new hypotheses being created away from the (by now highly clumped) estimates of the robot pose. The map consists of rectangular models for each room and a set of doors. The doors are shown marked as thicker line segments on the walls. The set of features that the robot has recently observed is shown with darker lines on top of the map. For example, in (a) the robot has observed a door (solid black rectangle) and two line segments, which are shown as solid black lines along the top and left of the lab.