Dominant Orientation Tracking for Path Following
Alan M. Zhang and R. Andrew Russell
Centre for Perceptive and Intelligent Machines in Complex Environments: Intelligent Robotics
Monash University, Clayton, Victoria 3800, Australia
{alan.zhang, andy.russell}@eng.monash.edu.au
Abstract— The behaviour that allows mobile robots to follow what humans consider paths is beneficial for autonomous navigation in a wide range of environments. Paths can be corridors, footpaths, catwalks, roads, etc.; in fact, roads can be considered well-structured paths. Traditional approaches to path following have been extensions of road following systems. We present a vision-based path following method that differs from the traditional model-fitting approach of road following. The method focuses on tracking the direction in which parallel lines in the environment are aligned. It is similar in principle to methods that extract vanishing points, but achieves better robustness through appropriate simplifying assumptions. Preliminary results from successful path following experiments are also presented. Index Terms— road following; vanishing point; Markov; dominant orientation; path following
I. INTRODUCTION
Mobile robots operating in structured or semi-structured environments should exploit any regular features that they can find in the environment for the purpose of navigation. A particularly common and useful type of feature is what humans consider paths, such as corridors, footpaths, roads, catwalks, etc. Following paths is beneficial for mobile robots because paths are generally easier and safer to travel on than the surrounding terrain. They effectively limit the robot to only one degree of freedom along the path, so that localisation and path planning can be significantly simplified. Research in the area of path following has mainly concentrated on a subset of paths: roads. Road following has received considerable attention and the literature on the subject is vast. Vision is by far the most common mode of sensory input. Being a passive sensor, vision offers low power consumption, a compact form factor, ready commercial availability and low cost. Since the system presented in this paper is also vision-based, the following brief review of the literature will focus on vision-based systems. Road followers can be divided into two categories by the types of road they are designed to follow: structured or ill-structured roads. Structured roads are roads where lane markings or road boundaries can be reliably detected. Because it is assumed that good features can be extracted from the images, structured road followers can fit complex road models to the image features rather accurately. Accurate estimation of road curvature is important for these systems because they are designed to steer vehicles at high speeds. A variety of robust statistical estimation and heuristic methods have been used to track the road using a given model; examples include Kalman filters [5], [14], particle filters [4], line snakes [17], maximum
likelihood estimation using the Metropolis algorithm [11], and many others. Features of ill-structured roads are more difficult to extract and are subject to a greater variety of noise such as shadows, cracks on the road surface, occlusion, etc. Footpaths and dirt roads (as opposed to highways) are typically referred to as ill-structured. Following of ill-structured roads has received less attention in the literature. In [6], unsupervised image segmentation is performed, followed by fitting road models to the image segments. Pixels in the image are classified as on-road or off-road based on colour statistics in [16], followed by a voting procedure to determine the most likely road model. The same idea of pixel classification is used by [9] to follow dirt roads, except that a simpler road model fitting procedure is employed. Three commonalities exist among most road following systems regardless of whether they are designed for ill-structured or structured roads.
1. Explicit road models are required; most of them consist of parallel lines representing the lane markings or boundaries of the road.
2. While some road following systems can detect obstacles, all features used to track the road are assumed to be in the ground plane.
3. Most systems do not integrate time series of measurements in a rigorous way (with the exception of [4]). They instead find the best fitting road model on a per-image basis, i.e. they are maximum likelihood estimators of the road model parameters.
Following paths with mobile robots has a different set of requirements compared to road following for vehicles. Robustness is the key issue for robotics because the operating environments are expected to be considerably more complex and variable. Recovery from tracking failures should occur quickly and without human intervention. Estimating the curvature of the road is of secondary concern if the robot is operated at low speeds. Explicit and complex road models are therefore unattractive. The complexity of the environment means the assumption that all features lie in the ground plane must be relaxed. Integrating a series of measurements for more robust tracking can be made easier by using odometry measurements. This paper presents a vision-based path following system for mobile robots that was designed to satisfy the aforementioned requirements. The organisation of the paper is as follows: Section II introduces the concept of dominant orientation and describes the path following system. Experimental results and some discussion are presented in Section III. Section IV considers possible future directions.
Fig. 1. Hardware configuration, not drawn to scale. Side view is shown on the left and top view on the right. A colour camera is mounted at the front of the mobile robot platform pointing slightly downwards. The central axis of the camera is at right angles to the wheel axle.
II. PATH FOLLOWING BY DOMINANT ORIENTATION TRACKING
We make the observation that paths are typically identified by linear features aligned with the direction of the path. Such linear features could be the boundaries of the road, the handrail of a catwalk, the grooves between tiles on a concrete footpath, edges of the road curb, etc. The direction in which most linear features in the environment point is referred to as the dominant orientation. These linear features must also be parallel to the ground. The assumption is that the dominant orientation points in the direction of the path. Therefore we hope to achieve path following by tracking the dominant orientation. This very general model of paths should enable the system to operate in a wide range of environments.
A. Hardware Configuration
Fig. 1 illustrates the hardware configuration of the system. A colour camera is mounted on a Pioneer 3-DX mobile robot. Processing of visual information and robot motion control is provided by an on-board 2.4GHz Pentium 4 computer. Wheel encoders provide odometry measurements. Because the robot was not designed for outdoor applications, it generates a significant amount of vibration and jitter when operated in outdoor environments. To alleviate this problem the camera is mounted low on the robot, at approximately 34cm from the ground. Images are captured at 10 frames per second with a resolution of 320 by 240 pixels. Lens distortion is minimal so no compensation is required. Camera shutter speed and gain are adjusted automatically by the on-board PC to accommodate changes in lighting conditions in outdoor environments. The camera has a 40 degree horizontal and 30 degree vertical field of view, which is rather narrow. As a result, in order to capture a significant proportion of the environment, it is pointed slightly downwards while making sure the horizon is still within view. The central axis of the camera is aligned perpendicular to the drive wheel axle as shown in Fig. 1. The camera is oriented such that the horizon appears horizontal in the image.
B. Assumptions
The following simplifying assumptions are made. The ground is assumed to be flat. The location of the horizon in the image (and implicitly the tilt angle of the camera with respect to the ground plane) is assumed to be given (it was obtained experimentally). Only straight line features are considered. Curved lines, such as boundaries of curved footpaths, are assumed to be locally straight. However, linear features are not assumed to all lie in the ground plane.
C. Extraction of Linear Features
Any straight linear feature in the environment may serve as an indication of the dominant orientation. A simple method of extracting lines from edge maps of images is employed. Currently only the green colour channel is used because it contains less noise than the red and blue colour channels. To reduce the noise-accentuating effect of foreshortening, the image is first smoothed with a variable-width Gaussian kernel. The standard deviation of the Gaussian is progressively reduced according to the geometry involved in projecting the ground plane onto the image plane, such that more distant areas are smoothed to a lesser extent. Edge maps are then extracted with the Canny edge detector. The noise suppression effect of smoothing is shown in Fig. 2, where edge maps of the original and smoothed images are compared. Straight lines in the edge map are then detected using the Hough transform. Vegetation and clutter in the scene often produce regions with very dense edges. To remove erroneous lines caused by these dense edge regions, the Hough transform accumulator is filtered with a template to emphasise "clean" lines that do not have any other lines closely adjacent to them. Lines close to vertical are ignored as they are mostly caused by vertical features in the environment such as door frames and corners of walls.
Fig. 2. Effect of Gaussian smoothing. (a) Original image. (b) Edge map of the original image obtained with the Canny detector. (c) Gaussian-smoothed image; the standard deviation of the Gaussian is progressively reduced as we move up the image. (d) Edge map of the smoothed image; notice that much of the noise is removed while the major features are preserved.
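To make this extraction pipeline concrete, the sketch below is a minimal Python/OpenCV version of the steps described above. It is an illustration under stated assumptions rather than the implementation used on the robot: the row-dependent Gaussian smoothing is approximated by blending two uniformly blurred copies of the green channel, the accumulator filtering that emphasises "clean" lines is omitted, and the horizon row, kernel widths and thresholds are placeholder values.

```python
import cv2
import numpy as np

def extract_lines(bgr_image, horizon_row=80):
    """Illustrative line-extraction pipeline: green channel, row-dependent
    Gaussian smoothing (weak near the horizon, strong near the image bottom),
    Canny edge detection, then a standard Hough transform.
    All parameter values are placeholders, not the authors' settings."""
    green = bgr_image[:, :, 1]                  # only the green channel is used
    rows, _ = green.shape

    # Approximate the variable-width Gaussian by blending two uniformly
    # blurred copies: heavy smoothing for nearby ground at the bottom of the
    # image, light smoothing for distant areas close to the horizon.
    heavy = cv2.GaussianBlur(green, (0, 0), sigmaX=3.0).astype(np.float32)
    light = cv2.GaussianBlur(green, (0, 0), sigmaX=0.8).astype(np.float32)
    weight = np.clip((np.arange(rows) - horizon_row) /
                     float(rows - horizon_row), 0.0, 1.0)[:, None]
    smoothed = (weight * heavy + (1.0 - weight) * light).astype(np.uint8)

    edges = cv2.Canny(smoothed, 50, 150)

    # Each detected line is returned as (rho, theta) with the origin at the
    # top-left corner of the image.
    hough = cv2.HoughLines(edges, rho=1, theta=np.pi / 180, threshold=60)
    lines = [] if hough is None else [tuple(l[0]) for l in hough]

    # Ignore near-vertical lines (door frames, wall corners): with this
    # parametrisation, theta close to 0 or pi means a near-vertical line.
    return [(rho, theta) for rho, theta in lines
            if min(theta, np.pi - theta) > np.deg2rad(10)]
```

The (rho, theta) parametrisation also makes it straightforward to recover, after shifting the origin to the image centre, the closest point p on each line that the measurement likelihood of Section II-D uses.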
D. Dominant Orientation Tracking
Because the robot might operate in complex environments, a robust method to track the dominant orientation is required. Fortunately, the state-space needed for tracking the dominant orientation is a single angle ranging from 0 to 180 degrees, measured in the robot-centric frame of reference. The small size of the state-space means that, if it is discretised appropriately and the Markov assumption is made, the full posterior over the entire state-space can be estimated. The approach adopted is derived from the formulation of Markov localization presented in [7]. The estimation process is governed by two phases:

Prediction phase:
$$Bel(L_T^- = l) = \int P(l \mid a_T, l')\, Bel(L_{T-1} = l')\, dl' \qquad (1)$$

Update phase:
$$Bel(L_T = l) = \beta_T\, P(s_T \mid l)\, Bel(L_T^- = l) \qquad (2)$$
Where $Bel(L_T = l)$ is the posterior probability distribution over the discretised state-space; $L_T$ is the state variable at time $T$, ranging from 0 to 180 degrees; $a_T$ is the odometry measurement between $T-1$ and $T$; $s_T$ is the sensor measurement at time $T$; $\beta_T$ is a normalising constant; and the minus sign in the superscript of $Bel(L_T^- = l)$ indicates that the estimate is prior to the update phase. Odometric errors are modelled as normal distributions. As a consequence, the state transition probability function $P(l \mid a_T, l')$ also takes the form of a Gaussian. Thus Equation (1) can be calculated efficiently using a cross-correlation:

$$Bel(L_T^- = l) = \int f(l - \Delta\Phi - l' \mid \sigma_{a_T})\, Bel(L_{T-1} = l')\, dl' \qquad (3)$$

Where $f(l - \Delta\Phi - l' \mid \sigma_{a_T})$ is a zero-mean Gaussian probability density function with a standard deviation of $\sigma_{a_T}$; $\sigma_{a_T}$ is a function of the distances registered on both wheel encoders; and $\Delta\Phi$ is the angle of rotation as measured by odometry. The expression $P(s_T \mid l)$ in (2) is the likelihood of observing the detected lines in the image given a specific orientation $l$. For a given value of $l$, the diagram in Fig. 3 illustrates the quantities involved. The line $ab$ is a detected line in the image; $C$ is the centre of the image; $p$ is the closest point on $ab$ to the centre of the image $C$ and is obtained directly from the Hough transform in the line detection step; and $v$ is the vanishing point on the horizon calculated from the given orientation $l$. All lines parallel to the ground plane (regardless of their height above it) that point in the direction of $l$ should intersect the horizon at $v$ in the image. The unnormalised probability density function describing the likelihood of observing line $ab$ given the vanishing point $v$ is assumed to be a function of the angle $\theta$ in Fig. 3:

$$P(\theta) = \frac{1}{(k\theta)^2 + 1} \qquad (4)$$
Fig. 3. Quantities involved in calculating the measurement likelihood.
The constant $k$ is a tunable scaling factor that reflects the amount of noise in the line detection process. Equation (4) is a bell-shaped curve that emulates a Gaussian but is much faster to calculate. A more sophisticated likelihood function that used the accumulator counts from the Hough transform directly as the error model for the detected lines was also implemented but did not yield superior results. This is because the accumulator counts are good error models for detected lines in a single frame but not between frames. For instance, mechanical vibration of the camera causes lines from the same source in the scene to be shifted in the image between successive frames. The constant $k$ in (4) can be tuned to account for this inter-frame error. After (2) has been evaluated and the result normalised, a uniform prior is superimposed onto the result to allow for the following of curved paths and recovery from tracking failure. Note that this tracking process will implicitly reject linear features not parallel to the ground plane. It was found that a discretisation of 1 degree per division for $l$ is adequate for path following. An area of research in computer vision related to our method is the recovery of vanishing points of parallel lines from images [2], [3], [12], [13]. These methods use vanishing points either to reconstruct the orientation of planar surfaces, or to match vanishing points between images of the same scene taken from different viewpoints to recover the motion of the camera. They are generally designed to detect vanishing points with high accuracy. In contrast, robustness is of primary concern in path following. Robustness in our method is achieved through the explicit specification of the location of the horizon, which restricts the location of vanishing points to only one dimension along the horizon, and the incorporation of odometry data that provides direct estimates of the camera motion. While the projects reported in [15], [18] achieved successful path following using vanishing points, a minimum of two parallel lines was required and they did not integrate any odometry information. Fig. 4 is a screen-shot showing the visualisation of the estimated posterior and the measurement likelihood during a test run. The group of white lines on the right in Fig. 4 shows the measurement likelihood; the posterior is shown near the centre. The length of the lines pointing in a particular direction is proportional to the probability of the direction being the dominant orientation.
Fig. 4. Screen-shot of an outdoor experiment. The original captured image on the upper-left is superimposed with the detected lines in red. The lines pointing in the dominant orientation are shown as thick blue lines. The posterior estimate is visualised at the centre and the measurement likelihood is shown on the right. The length of the lines pointing in a particular direction is proportional to the value of the probability density function associated with that direction. All directions are relative to the robot-centric frame of reference.
Lines contributing to the measurement likelihood are shown as thin red lines in the top-left image, and lines that point in the dominant orientation are shown as thick blue lines.
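To make the prediction-update cycle concrete, the following minimal NumPy sketch implements Equations (1)-(4) over a 180-bin state space (1 degree per bin). Several details are assumptions made for illustration only: per-line likelihoods are combined by multiplication (an independence assumption the paper does not spell out), the vanishing-point geometry is our reconstruction of Fig. 3 for a pinhole camera with zero roll, and the values of k, the odometry noise, the uniform mixing weight and the camera parameters (f_px, cx, cy, horizon_row) are placeholders.

```python
import numpy as np

N_BINS = 180        # dominant orientation state space, 1 degree per bin
K = 8.0             # noise scaling factor of Eq. (4) -- illustrative value
UNIFORM_MIX = 0.02  # weight of the uniform prior mixed in after each update

def predict(belief, delta_phi_deg, sigma_deg):
    """Prediction phase (Eq. 3): shift the belief by the rotation measured by
    odometry, then blur it with a zero-mean Gaussian via circular
    cross-correlation over the wrap-around 0..179 degree state space.
    The sign of the shift depends on the chosen angle convention."""
    shifted = np.roll(belief, int(round(delta_phi_deg)) % N_BINS)
    bins = np.arange(N_BINS)
    predicted = np.zeros(N_BINS)
    for l in range(N_BINS):
        diff = np.abs(bins - l)
        diff = np.minimum(diff, N_BINS - diff)             # circular distance
        predicted[l] = np.sum(np.exp(-0.5 * (diff / sigma_deg) ** 2) * shifted)
    return predicted / predicted.sum()

def line_deviation(rho, theta, l_deg, horizon_row, f_px, cx, cy):
    """Deviation angle (the theta of Fig. 3, in radians) between a detected
    line (rho, theta from the Hough transform) and the direction towards the
    vanishing point v implied by candidate orientation l_deg, measured at the
    point p on the line closest to the image centre C. Assumes a pinhole
    camera with zero roll and l_deg = 90 pointing straight ahead."""
    tilt = np.arctan2(cy - horizon_row, f_px)              # tilt implied by horizon row
    alpha = np.deg2rad(l_deg - 90.0)
    v = np.array([cx + f_px * np.tan(alpha) / np.cos(tilt), horizon_row])
    n = np.array([np.cos(theta), np.sin(theta)])           # unit normal of the line
    c = np.array([cx, cy])
    p = c + (rho - np.dot(c, n)) * n                       # closest point on the line to C
    d = np.array([-np.sin(theta), np.cos(theta)])          # direction of the line
    to_v = v - p
    cos_dev = abs(np.dot(d, to_v)) / (np.linalg.norm(to_v) + 1e-9)
    return np.arccos(np.clip(cos_dev, 0.0, 1.0))

def update(belief, lines, horizon_row, f_px, cx, cy):
    """Update phase (Eqs. 2 and 4): multiply the predicted belief by the
    measurement likelihood, normalise, then mix in a small uniform prior so
    that curved paths and tracking failures can be recovered from."""
    likelihood = np.ones(N_BINS)
    for l in range(N_BINS):
        for rho, theta in lines:
            dev = line_deviation(rho, theta, l, horizon_row, f_px, cx, cy)
            likelihood[l] *= 1.0 / ((K * dev) ** 2 + 1.0)  # Eq. (4)
    posterior = likelihood * belief
    posterior /= posterior.sum()
    posterior = (1.0 - UNIFORM_MIX) * posterior + UNIFORM_MIX / N_BINS
    return posterior / posterior.sum()
```

In use, predict would be called once per frame with the odometry rotation accumulated since the previous frame, followed by update with the (rho, theta) lines returned by the extraction step; the current dominant orientation estimate is simply the index of the largest bin.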
Fig. 5. Screen-shots from experiments in various environments. (a) Corridor. (b) Tiled concrete footpath. (c) Footpath with some curvature. (d) Tracking of the curb of a road.
E. Robot Motion Control Strategy
While not using an explicit road model provides the system with flexibility, it also means that the boundaries of the path are not readily obtainable. The current control strategy assumes that the robot is initially located on the path and tries to avoid crossing any lines pointing in the direction of the dominant orientation. The controller uses a subsumption architecture. The nominal behaviour is to move at a constant speed and adjust the robot's orientation towards the dominant orientation. The dominant orientation is set as the desired heading and a PID controller adjusts the actual robot heading towards the desired heading. The PID controller is implemented by the AROS [1] control software running onboard the robot. While the robot is traveling in the direction of the dominant orientation, lines pointing in the same direction should appear stationary in the image. A counter records the number of times a line is detected at the same position in the image. If the count is above a threshold, the line is considered persistent and the nominal behaviour is preempted to steer the robot so as not to cross the line.
III. EXPERIMENTAL RESULTS AND DISCUSSION
The system has been tested in both indoor and outdoor environments with no modification of any software parameters. Fig. 5 contains screen-shots of experiments in various environments. Preliminary results showed successful tracking of the dominant orientation. The robot successfully followed what humans perceived as paths, thereby validating the suitability of using the dominant orientation for path following. The prediction-update cycle of dominant orientation tracking ran at an
average of 7 frames per second in corridors and 2 frames per second in outdoor environments where there was more clutter. These rates were adequate for motion control when the robot was traveling at a constant speed of 20cm per second. As mentioned in Section II-A, the camera suffered from significant vibration in outdoor environments that displaced the same scene by up to 20 pixels in successive frames. This vibration was intentionally not compensated in software in order to test the system's robustness. While tests have only been performed in a limited number of outdoor experiments, the system was able to cope with the noise introduced by vibration. To test recovery from tracking failures, while the robot was following a path it was rotated by up to 70 degrees without odometry registering the rotation. The recovery of the dominant orientation occurred in under 8 prediction-update cycles inside corridors. The system was also able to cope with pedestrians walking in front of or past the robot. Also notice in Fig. 5(a) that the bottom of the wall, the skirting on the wall above the floor (which is a feature that is parallel to the ground plane but not in the ground plane), as well as the bottom edge of the door (which is a transient feature that disappears when the robot moves past it), all contributed to determining the dominant orientation, as indicated by the thick blue lines. This demonstrates that the system achieved robustness by using whatever cues were available to determine the dominant orientation. This is also one of the reasons why the system could recover quickly from tracking failures. However, preliminary experiments also highlighted a number of issues. Shadows in outdoor environments were
a major problem. When both sunlit and shaded areas were visible in the same scene, the automatic camera shutter speed and gain control made the shadow areas appear so dark that no features could be detected in the shaded area. As a result, features such as the edges of footpaths were lost when a shadow was cast over part of the footpath. Also, because the camera was mounted close to the ground, shadows cast by tree branches onto the paths caused dense regions of edges to appear in the image. The system was often confused by this because the edges of the shadows in the image seemed to form lines pointing in the same direction. Preliminary results indicate that the current system did not cope well with highly curved paths because the assumption was that the path is locally straight.
IV. FUTURE WORK
We plan to improve the linear feature detection procedure to be more robust against clutter by utilizing other cues such as segment boundaries in a segmented image or local edge orientation information. The current implementation of the classic Hough transform is quite processing-intensive. Faster Hough transform variants such as the probabilistic Hough transform [10] and the progressive Hough transform [8] are currently being investigated. Optimizing the tracker to achieve higher update rates should allow the robot to move at higher speeds. Mounting the camera at a higher vantage point should reduce the effect of foreshortening and improve system performance. Experiments in a greater variety of environments and performance comparisons with previous methods are also planned for the future.
V. CONCLUSION
We have described a system that tracks the dominant orientation of the environment and uses it to follow paths. Preliminary experimental results have shown that the system was able to operate in a range of indoor and outdoor environments. Recovery from tracking failures was fast and reliable. The strength of the system is that it does not try to fit an explicit road model to sensor observations, providing it with more flexibility and robustness. However, this is also its shortcoming, because not tracking road boundaries makes staying on the path difficult. Perhaps a hybrid solution could offer the best of both worlds, where dominant orientation tracking helps to bootstrap a model-based road follower and also helps it to recover from tracking failures.
ACKNOWLEDGMENT
The work described in this paper was supported by the Australian Research Council funded Centre for Perceptive and Intelligent Machines in Complex Environments.
REFERENCES
[1] Pioneer 3-DX documentation, http://robots.activmedia.com.
[2] Andres Almansa, Agnes Desolneux, and Sebastien Vamech. Vanishing point detection without any a priori information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(4):502–507, April 2003.
[3] Matthew E. Antone and Seth Teller. Automatic recovery of relative camera rotations for urban scenes. Computer Vision and Pattern Recognition (CVPR'00), 2:2282–2289, 2000.
[4] Nicholas Apostoloff and Alexander Zelinsky. Robust vision based lane tracking using multiple cues and particle filtering. In Proceedings of the IEEE Intelligent Vehicles Symposium, 2003.
[5] Romuald Aufrere, Roland Chapuis, and Frederic Chausse. A fast and robust vision based road following algorithm. In Proceedings of the IEEE Intelligent Vehicles Symposium 2000, Dearborn (MI), USA, 2000.
[6] Jill D. Crisman and Charles E. Thorpe. UNSCARF, a color vision system for the detection of unstructured roads. In Proceedings of the 1991 IEEE International Conference on Robotics and Automation, Sacramento, California, April 1991.
[7] D. Fox. Markov Localization: A Probabilistic Framework for Mobile Robot Localization and Navigation. PhD thesis, Dept. of Computer Science, University of Bonn, Germany, December 1998.
[8] C. Galambos, J. V. Kittler, and J. Matas. Gradient based progressive probabilistic Hough transform. Vision, Image and Signal Processing, 148(3):158–165, June 2001.
[9] R. Ghurchian, T. Takahashi, Z. D. Wang, and E. Nakano. On robot self-navigation in outdoor environments by color image processing. In Seventh International Conference on Control, Automation, Robotics and Vision (ICARCV'02), Singapore, December 2002.
[10] Heikki Kälviäinen, Petri Hirvonen, Lei Xu, and Erkki Oja. Probabilistic and non-probabilistic Hough transforms: overview and comparisons. Image and Vision Computing, 13(4):239–252, 1995.
[11] K. Kluge and S. Lakshmanan. A deformable-template approach to lane detection. In Proceedings of the Intelligent Vehicles '95 Symposium, pages 54–59, 1995.
[12] J. C. Leung and G. F. McLean. Vanishing point matching. In IEEE International Conference on Image Processing (ICIP'96), page 17A10, 1996.
[13] P. L. Palmer, M. Petrou, and J. Kittler. Accurate line parameters from an optimising Hough transform for vanishing point detection. In Proceedings of the Fourth International Conference on Computer Vision, pages 529–533, Berlin, Germany, 1993.
[14] Daniel Raviv and Martin Herman. A new approach to vision and control for road following. In Proceedings of the IEEE Workshop on Visual Motion, 1991.
[15] Rolf Schuster, Nirwan Ansari, and Ali Bani-Hashemi. Steering a robot with vanishing points. IEEE Transactions on Robotics and Automation, 9(4):491–498, 1993.
[16] Charles Thorpe, Martial Hebert, Takeo Kanade, and Steven A. Shafer. Vision and navigation for the Carnegie-Mellon Navlab. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(3), May 1988.
[17] Yue Wang, Eam Khwang Teoh, and Dinggang Shen. Lane detection using B-Snake. In Proceedings of the International Conference on Information Intelligence and Systems, 1999.
[18] Zhongfei Zhang, Richard Weiss, and Allen R. Hanson. Automatic calibration and visual servoing for a robot navigation system. In Proceedings of the 1993 IEEE International Conference on Robotics and Automation, volume 1, Atlanta, Georgia, USA, May 1993.