
An Omnidirectional Time-of-Flight Camera and its Application to Indoor SLAM

Katrin Pirker, Matthias Rüther, Horst Bischof

Gerald Schweighofer, Heinz Mayer

Institute for Computer Graphics and Vision, Graz University of Technology, Inffeldgasse 16/II, 8010 Graz, Austria, [email protected]

Institute of Digital Image Processing, Joanneum Research GmbH, Wastiangasse 6, 8010 Graz, Austria, [email protected]

Abstract—Photonic mixer devices (PMDs) are able to create reliable depth maps of indoor environments. Yet, their application in mobile robotics, especially in simultaneous localization and mapping (SLAM), is hampered by their limited field of view. Enhancing the field of view with optical devices is not trivial, because the active light source and the sensor rays need to be redirected in a defined manner. In this work we propose an omnidirectional PMD sensor which is well suited for indoor SLAM and easy to calibrate. Using a single sensor and multiple planar mirrors, we are able to navigate reliably in indoor environments and to create geometrically consistent maps, even on optically difficult surfaces.

I. INTRODUCTION

Photonic mixer devices, commonly known as Time-of-Flight (TOF) cameras, are an emerging technology for 3D computer vision. By measuring the travel time of light with an area sensor, a single camera robustly produces 2.5D height maps within its central perspective field of view. In contrast to laser profile scanners, a TOF camera theoretically delivers more information because of its area-based measurement principle, and it promises to become a cheap device because of foreseeable applications in the gaming industry. Although a TOF camera produces dense geometrical information, the reasons for not using it for simultaneous localization and mapping (SLAM) are manifold: the limited field of view, sensitivity to stray light, and difficulties in multiplexing more than a few cameras.

To solve the SLAM problem, the most widespread robotic sensor is currently the laser range finder. Various authors extracted line features from the two-dimensional range data for fast localization and map building (e.g. [10], [8]). A further approach to solve the SLAM problem with a laser range finder, called FastSLAM, was introduced by Montemerlo et al. [5]. They extended the Rao-Blackwellized particle filter (RBPF) idea introduced by Murphy [7], where the robot localization problem is solved by a particle filter and the map features are estimated by Extended Kalman Filters. Eliazar and Parr [2] successfully adapted the FastSLAM algorithm to occupancy grid maps and developed a purely laser-based mapping algorithm called DP-SLAM.

Work on SLAM methods using TOF cameras is sparse. One of the first approaches was developed by Weingarten et al. [13], who used a TOF camera for obstacle avoidance and path planning.


Fig. 1. Proposed sensor setup. (a) The four planar mirrors arranged as a pyramid. (b) A Pioneer PT-3 robot equipped with an upward-facing TOF camera, with the mirror pyramid placed diagonally to the robot motion direction.

They compared their results to a laser scanner approach within a 4 × 4 m² laboratory environment. Ohno et al. [9] estimated the robot trajectory and the surface of the environment using a real-time iterative closest point variant to register the range scans. To compensate for the low resolution and the limited field of view, several authors have combined a Time-of-Flight sensor with a CCD camera. Prusak et al. [11] proposed the combination with a high-resolution spherical camera to construct a 3D depth panorama by rotating the whole robot around its axis when entering unknown terrain. Pose estimation is done by tracking interest points in the spherical image, while map building is done through scan matching. However, they do not provide any evaluation regarding accuracy or time complexity. In order to reduce the computational cost of traditional Kalman filter approaches, Zhou et al. [14] used a Sparse Extended Information Filter to produce dense colored point clouds of a 6 × 4 m² synthetic rescue area. They evaluated their approach visually against a standard scan matching algorithm. To remove systematic errors caused by varying object reflectivity or phase noise, May et al. [4] focused on accurate calibration routines. They developed a mapping routine to produce metric maps from TOF sensor data only. First, false measurements are rejected using various filtering routines. Second, different variants of the iterative closest point algorithm are used to register two consecutive scans, assuming very small pose changes.


Finally, a GraphSLAM approach is used to perform loop closing and to compensate for accumulated registration errors. They evaluated their system by moving the robot along a 10 m path; mapping accuracy was assessed via the distance between two walls, resulting in a deviation of 0.4 m.

In summary, the main drawbacks of using a TOF camera for SLAM are the limited field of view, the low resolution and the limited capture range. Hence, localization and map building are restricted to small indoor environments. A forward-looking camera with a field of view of 40° and a range of 5 m might capture no more than empty space in a typical office corridor. A side-looking camera typically observes a single wall, which constrains camera motion in only one direction. Using more than one camera to increase the field of view increases the overall complexity and raises the problems of synchronization, calibration, excessive data transfer and interference between the cameras.

In this work we use a TOF camera in an omnidirectional configuration to improve its localization capabilities. We use a number of planar mirrors to split and redirect the light source and the field of view. The number of mirrors is scalable and only limited by the image area. Calibration of the camera extrinsics is particularly simple and does not require a large calibration device.

II. OMNIDIRECTIONAL TOF VISION

A. Imaging Principle

The mode of operation of a TOF camera is based on measuring the phase shift between an emitted amplitude-modulated light signal and its reflection. The system consists of a light source which is usually coaxial to the camera viewing direction. The emitted light is amplitude-modulated with a frequency f_m; in most cases, LEDs in the near-infrared spectrum around 850 nm are used for this purpose. Objects in the scene reflect the light, and the incoming signal is measured in each pixel and compared to the original modulation signal. Depending on the camera-object distance, the light travel time determines the phase shift between the original and the received signal. Both signals are correlated within a sensor pixel. With four samples per modulation period, the phase shift can be reconstructed as

\Phi = \arctan\frac{A_3 - A_1}{A_0 - A_2},    (1)

where A_0, ..., A_3 are the four samples shown in Figure 2. Physically, the samples are measured as the charge difference in neighboring CCD pixels, where one is charged with the original signal and the other with the reflected signal. The phase shift is related to scene depth through

d = \Phi \frac{c}{4 \pi f_m},    (2)

Fig. 2. Measurement principle of a Time-of-Flight camera.

where c is the speed of light. Furthermore, most TOF cameras return the reflected signal amplitude, calculated by

A = \frac{\sqrt{(A_3 - A_1)^2 + (A_2 - A_0)^2}}{2},    (3)

and the signal DC offset by

B = \frac{1}{2} \sum_i A_i.    (4)
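As an illustration only (this code is not part of the original work, and the modulation frequency of 20 MHz is an assumed example value), the following Python sketch evaluates Eqs. (1)-(4) per pixel from the four raw samples:

import numpy as np

C = 299_792_458.0  # speed of light [m/s]

def tof_measurements(A0, A1, A2, A3, f_m=20e6):
    """Evaluate Eqs. (1)-(4) per pixel from the four phase samples A0..A3.

    f_m is the modulation frequency (20 MHz is an assumed example value).
    Returns phase [rad], depth [m], amplitude and offset as arrays.
    """
    # Eq. (1); arctan2 is used instead of the plain arctangent to resolve the quadrant.
    phi = np.mod(np.arctan2(A3 - A1, A0 - A2), 2.0 * np.pi)
    d = phi * C / (4.0 * np.pi * f_m)                            # Eq. (2)
    amplitude = np.sqrt((A3 - A1) ** 2 + (A2 - A0) ** 2) / 2.0   # Eq. (3)
    offset = 0.5 * (A0 + A1 + A2 + A3)                           # Eq. (4) as written; some formulations use a factor of 1/4
    return phi, d, amplitude, offset

Using arctan2 instead of the arctangent of Eq. (1) recovers the full 0 to 2π phase range before conversion to depth.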

B. Omnidirectional Geometry

In creating an omnidirectional device with a TOF camera, one has to account for the difference between the light source and the projection center: the focused imaging rays of the camera as well as the coaxial light source need to be redirected. Small parabolic mirrors, commonly used in omnidirectional devices, are not suitable, because the LED projection centers differ from the camera center and their rays are reflected in different directions. Fisheye lenses will not work for the LEDs either. Further, considering the application of indoor SLAM, we are not interested in covering the entire upper hemisphere with a modest resolution of 176 × 144 pixels. Therefore, we propose to use four planar mirrors assembled as a pyramid, as sketched in Figure 1(a). This setup has several advantages: the imaging rays are redirected mainly to the sides, front and rear, where most geometric information should be present in an indoor environment. Furthermore, in contrast to classical omnidirectional devices, it is comparably easy to align and calibrate planar mirrors in front of a camera. We decided to place the pyramid diagonally to the robot motion direction to capture most of the surrounding structure (see Figure 1(b)).

C. Calibration

Calibration of omnidirectional cameras traditionally involves the construction of an accurate, large and concave calibration target inside which the camera is placed. Other approaches follow the traditional way of placing a planar reference target around the camera, which is feasible if the imaging model is mainly radially symmetric. Our imaging model differs from this assumption: there are essentially four independent virtual cameras to calibrate, which differ in their extrinsics and share the same intrinsics. Because the mirrors are oriented in a divergent manner, it is hard to constrain their relative pose through a simple external target; here, too, a large, concave calibration target would be required.

Fig. 3. Exemplary images from the TOF camera facing four planar mirrors. (a) Intensity image with the manually marked mirror regions used for external calibration. (b) Exemplary depth image showing a corridor scene.

To simplify the calibration process, we calibrate the TOF camera without the mirrors to determine the lens distortion parameters and the projection matrix P. We then determine the mirror poses separately using a simple, small calibration target consisting of four dots printed on paper. We follow the approach documented in [3], where a reference target is placed in front of the camera and close to the mirror, as sketched in Figure 3(a): the four dots are observed directly (four circles, left) and indirectly through the mirror (four circles, right). Each correspondence between a direct and an indirect feature yields an epipolar line in the direct view. Using at least two corresponding points between target and mirror image, the epipole e of the reflected, virtual camera can be determined in the real camera image by robust intersection. Because the mirror plane is situated halfway between the real and the virtual camera (see Figure 4) and is perpendicular to the baseline, the mirror normal n is computed as

n = M^{-1} e,    (5)

where M is the first 3 × 3 submatrix of P. Knowing the position of the mirror plane is equivalent to knowing the stereo baseline, and consequently the metric scale of the scene. Therefore, in a first step we deliberately set the baseline to one, which gives the mirror plane \pi_m = (n^T, 1)^T up to scale, the mirror reflection matrix D and the virtual camera P_v:

D = \begin{pmatrix} I_{3\times3} - 2nn^T & -2dn \\ 0_{1\times3} & 1 \end{pmatrix}, \qquad P_v = PD.    (6)

Coordinates of the calibration target points are further calculated by triangulation, and metric scale is determined either from the known distance between triangulated reference points or, if accurate enough, by measuring the range between camera and calibration target points. One obvious drawback is the lack of constraints between separate mirrors, so a final bundle adjustment cannot improve the results. Yet, in our experiments the calibration accuracy proved sufficient for indoor SLAM.
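The following Python sketch (our own illustration, not code from the paper; function and variable names are chosen here) shows how Eqs. (5) and (6) combine a projection matrix P and a measured epipole e into the mirror reflection matrix D and the virtual camera P_v:

import numpy as np

def virtual_camera(P, e, d=1.0):
    """Sketch of Eqs. (5)-(6): mirror normal, reflection matrix D and virtual camera.

    P : 3x4 projection matrix of the real camera.
    e : epipole of the reflected (virtual) camera in the real image, homogeneous 3-vector.
    d : mirror plane offset; set to 1 here, i.e. the scale-free case described in the text.
    """
    M = P[:, :3]
    n = np.linalg.solve(M, e)        # Eq. (5): n = M^-1 e (defined up to scale)
    n = n / np.linalg.norm(n)        # normalize the mirror normal
    D = np.eye(4)
    D[:3, :3] = np.eye(3) - 2.0 * np.outer(n, n)   # reflection part I - 2 n n^T
    D[:3, 3] = -2.0 * d * n                        # translation part -2 d n
    P_v = P @ D                      # Eq. (6): virtual camera
    return n, D, P_v

Applying this once per mirror would yield the four virtual projection matrices that are later used to interpret the masked mirror regions.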

III. GENERATION OF RANGE READINGS

Because we only consider a robot moving in the xy-plane, we are interested in extracting two-dimensional range and bearing data from the omni-TOF readings, similar to a laser range finder. As shown in the experiments, two-dimensional data is sufficient for localization and mapping.


Fig. 4. Imaging principle using two planar mirrors.


Fig. 5. Feature extraction. (a) Schematic view of the two-dimensional feature extraction from the sparse three-dimensional data stemming from the omni-TOF. (b) Resulting range and bearing measurements at a resolution of 1°.

After removing lens distortion from the range and intensity images, we perform rough amplitude filtering to discard false readings caused by surfaces with specular reflectivity, such as metallic doors or glass (see Figures 8(b) and 8(c)). Distance readings from each previously masked mirror region (see Figure 3(b)) are transformed into three-dimensional points with respect to the camera origin using the projection matrices obtained from calibration. Finally, we select points whose distance to a plane parallel to the xy-plane at a predefined height is smaller than a user-defined threshold d, as sketched in Figure 5(a). The resulting points are mapped onto the xy-plane and sampled at a resolution of 1° to generate distance measurements. Exemplary range readings representing the small corridor from Figure 3(b) are shown in Figure 5(b); these serve as input for the SLAM system.
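For illustration, a minimal Python sketch of this extraction step under stated assumptions (the slice height, distance threshold and binning parameters are example values, not taken from the paper):

import numpy as np

def extract_scan(points, z_plane=0.3, d_thresh=0.05, angular_res_deg=1.0):
    """Convert omni-TOF 3D points (N x 3, robot frame) into a 2D range scan.

    Points closer than d_thresh to the horizontal plane z = z_plane are kept,
    projected onto the xy-plane and binned by bearing; per bin the closest
    range is returned (np.inf marks empty bins). z_plane and d_thresh are
    example values, not taken from the paper.
    """
    mask = np.abs(points[:, 2] - z_plane) < d_thresh
    xy = points[mask, :2]
    ranges = np.hypot(xy[:, 0], xy[:, 1])
    bearings = np.degrees(np.arctan2(xy[:, 1], xy[:, 0])) % 360.0
    n_bins = int(round(360.0 / angular_res_deg))
    scan = np.full(n_bins, np.inf)
    bins = (bearings / angular_res_deg).astype(int) % n_bins
    np.minimum.at(scan, bins, ranges)   # keep the nearest return per bearing bin
    return scan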

IV. LOCALIZATION AND MAPPING

To perform SLAM we adapt the well-known Rao-Blackwellized particle filter (RBPF) proposed by Montemerlo et al. [5] to occupancy grid maps, for two reasons. First, we extract range and bearing measurements from the sensor data, similar to laser readings, which have been used very successfully for large-scale environment mapping in the past. Second, we have to cope with noisy, low-resolution range measurements as shown in Figure 5(b), so feature-based methods are not suitable in our case.

The main idea of an RBPF is to estimate the joint distribution p(x_{1:t}, m | z_{1:t}, u_{1:t-1}) of the robot trajectory x_{1:t} = x_1, ..., x_t and the environment map m, where z_{1:t} = z_1, ..., z_t denotes the observations up to time t and u_{1:t-1} denotes the odometry measurements, usually obtained from the wheel encoders. According to Murphy [7] this posterior can be factorized as

p(x_{1:t}, m \mid z_{1:t}, u_{1:t-1}) = p(x_{1:t} \mid z_{1:t}, u_{1:t-1}) \cdot p(m \mid x_{1:t}, z_{1:t}),    (7)

which allows us to estimate the robot pose and the map separately. To compute the most likely robot trajectory p(x_{1:t} | z_{1:t}, u_{1:t-1}) from time 1 to time t we use a particle filter. To obtain a new set of particles at each time step we draw from the probabilistic odometry motion model as stated by Thrun et al. [12], using the wheel odometry of the robot. According to Moravec [6], the posterior over maps p(m | x_{1:t}, z_{1:t}) given a set of measurements z_{1:t} can be broken down into estimating the probability of each grid cell m_i independently:

p(m \mid x_{1:t}, z_{1:t}) = \prod_i p(m_i \mid x_{1:t}, z_{1:t}),    (8)

using the log-odds representation of occupancy. Furthermore, each particle is assigned an importance factor w, which is computed recursively as

w_t = w_{t-1} \cdot p(z_t \mid m, x_t),    (9)

where p(z_t | m, x_t) denotes the observation model. Because our sensor measures the range to nearby objects, we decided to model p(z_t | m, x_t) with a standard beam model with a smooth likelihood function. Before incorporating the next measurement we perform adaptive resampling based on the value of the effective sample size N_eff, as proposed by Doucet et al. [1]:

N_{eff} = \frac{1}{\sum_{i=1}^{N} \bar{w}_i^2},    (10)

where \bar{w}_i denotes the normalized weight of particle i.
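As a hedged illustration of how Eqs. (8)-(10) interact (not the authors' implementation; the log-odds increments, resampling threshold and the placeholder observation model are assumptions), a compact Python sketch of the per-particle bookkeeping:

import numpy as np

def update_log_odds(grid, cells, hit, l_occ=0.85, l_free=-0.4):
    """Log-odds occupancy update for the grid cells touched by one beam.
    l_occ / l_free are assumed example increments."""
    grid[cells] += l_occ if hit else l_free
    return grid

def effective_sample_size(weights):
    """Eq. (10): N_eff from the normalized particle weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def rbpf_step(particles, weights, observation_likelihood, z_t, resample_ratio=0.5):
    """One weighting/resampling step; motion sampling and map updates are omitted.

    observation_likelihood(particle, z_t) stands in for p(z_t | m, x_t) in Eq. (9).
    """
    weights = np.array([w * observation_likelihood(p, z_t)          # Eq. (9)
                        for p, w in zip(particles, weights)])
    weights /= weights.sum()
    if effective_sample_size(weights) < resample_ratio * len(particles):
        idx = np.random.choice(len(particles), size=len(particles), p=weights)
        particles = [particles[i] for i in idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights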

V. EXPERIMENTS

To evaluate the localization and mapping performance of the omni-TOF we acquired four test sequences: ROOM, OFFICE, CORRIDOR 1 and CORRIDOR 2. Localization accuracy is analyzed by:
• error propagation whilst driving a loop
• robustness to difficult surfaces
while mapping accuracy is measured by:
• visual comparison to ground plans
• distance measures between known objects

To empirically evaluate calibration performance, the robot was placed in a room surrounded by objects of equal height, such that the upper object edges can be observed on all sides. The generated point cloud shown in Figure 6 indicates reasonable calibration quality: if the front and back mirror planes were not aligned correctly, the edges would not appear at the same height.

Fig. 6. Side view of the setup used for mirror calibration evaluation.

The CORRIDOR sequences consist of comparably narrow corridors (see Figure 8(a)) and a large meeting room, where we covered loops of 58 m and 118 m, respectively. Here the robot is surrounded by metal doors, metal grids and glass walls, as shown in Figure 8, which pose a challenge for most optical sensors. A more spacious environment, close to the range limits of a TOF camera, is covered by the OFFICE sequence, driving two loops of about 46 m. The raw range readings and odometry output of each test sequence are shown in Figure 7. The resulting occupancy grid maps together with the estimated trajectories are shown in Figures 9(a), 10(a) and 11. We successfully closed all loops using only 200 particles. Mapping accuracy is close to the groundtruth data, represented by floor plans, as shown in Figures 9(b) and 10(b). Still, there are some misreadings, especially when passing the metal doors, and data is very sparse in regions with the metal grid, as shown in the closeups of Figure 10(b). When entering terrain where the environment exceeds the sensor's maximum range, data is completely missing and the resulting map is incomplete there (shown in the closeups of Figure 9(b)). However, as long as the sensor receives valid readings through at least one of the mirror planes, mapping is still possible even in broader areas, as demonstrated in the OFFICE sequence shown in Figure 11(a). The accuracy of the produced maps is evaluated using manual measurements at known locations in the test sequences CORRIDOR 1 and CORRIDOR 2. We take groundtruth data at five specific locations, as shown in Figure 12(a). We cut out patches at various locations in the resulting occupancy grids and estimate the distance of opposite walls by fitting two Gaussians to the data; a schematic view of the procedure is shown in Figure 12.
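One possible sketch of this wall-distance evaluation (our own illustration; the cell size and initial guesses are assumptions, and scipy's curve_fit stands in for whatever fitting routine the authors used):

import numpy as np
from scipy.optimize import curve_fit

def two_gaussians(x, a1, mu1, s1, a2, mu2, s2):
    return (a1 * np.exp(-0.5 * ((x - mu1) / s1) ** 2)
            + a2 * np.exp(-0.5 * ((x - mu2) / s2) ** 2))

def wall_distance(profile, cell_size_cm=10.0):
    """Estimate the distance between two opposite walls from a 1D occupancy profile.

    profile: occupancy values summed across a patch, sampled along the direction
    normal to the walls. cell_size_cm and the initial guesses are assumptions.
    """
    profile = np.asarray(profile, dtype=float)
    x = np.arange(len(profile)) * cell_size_cm
    p0 = [profile.max(), x[len(x) // 4], cell_size_cm,      # rough initial guesses
          profile.max(), x[3 * len(x) // 4], cell_size_cm]
    params, _ = curve_fit(two_gaussians, x, profile, p0=p0)
    return abs(params[4] - params[1])                       # |mu2 - mu1| in cm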

TABLE I
Measured distances of two test sequences compared to groundtruth data of known object distances within the environment. Measurements are given in cm.

location   ground truth   CORRIDOR 1 meas   offset   CORRIDOR 2 meas   offset
#1         217            209.69            7.31     210.62            6.31
#2         134            134.86            0.86     137.88            3.88
#3         306            388.00            82.00    409.58            103.57
#4         1213           1493.25           280.25   1522.83           309.83
#5         133            133.28            0.28     132.25            0.75

Fig. 7. Raw data stemming from the omnidirectional TOF device for the three test sets used for map building. (a) Test sequence CORRIDOR 1, covering a path of 118 m. (b) Smaller indoor scene CORRIDOR 2, moving the robot along a 58 m path. (c) Raw data of the broader office scene.

Fig. 8. Images of the mapped indoor environment. (a) Indoor environment mapped in test sequences CORRIDOR 1 and CORRIDOR 2. Challenging locations such as metallic doors (b), glazed doors (c) or coarse grids (d).

Fig. 9. Results of test sequence CORRIDOR 1. (a) Occupancy grid and the estimated robot trajectory. (b) Floor plan overlaid with the most likely map.

The results of the two sequences for which groundtruth was available are listed in Table I. As expected, we are close to the groundtruth at locations 1, 2 and 5, where walls normal to the driving direction have been mapped. Errors in the robot path accumulate when mapping is performed along the driving direction, which can be seen in the larger mapping errors at locations 3 and 4. We also applied three-dimensional occupancy grid mapping to the OFFICE test sequence and achieved nearly the same trajectory as in the two-dimensional case, as shown in Figure 11(b). Yet, the mapping quality does not improve significantly compared to the two-dimensional version and is not considered worth the computational effort.

Fig. 10. Results of the localization and mapping routine of test sequence CORRIDOR 2. (a) Occupancy grid and the estimated robot trajectory. (b) Floor plan overlaid with the estimated grid map.


Fig. 11. Results of test sequence OFFICE. (a) Map and trajectory of the broader office scene for the most probable particle. (b) Resulting three-dimensional occupancy grid with a cell resolution of 0.1 m.

Fig. 12. Groundtruth measurements. (a) Locations in the map where groundtruth was measured manually. (b), (c) The metric distance between opposite walls at location 1 is taken from the occupancy grid by fitting two Gaussians to the black peaks in various patches.

VI. CONCLUSION

A narrow field of view is one of the reasons why TOF cameras are rarely applied in SLAM. Using a mirror system, it is possible to extend the camera's field of view and spread it over an angular range of 360°. Of course, the same effect could be achieved by using several low-cost TOF cameras looking in different directions; yet, our setup has the advantage of a single modulated light source, easy calibration and less hardware effort.

One obvious drawback of the proposed sensor is its size. We experienced considerable performance degradation when the mirrors get closer to the camera, because the reflection of the illumination no longer coincides with the reflection of the camera image. For the application of SLAM, a point light source which is independent of the camera would certainly be beneficial. We plan to redesign the mirror setup to enlarge the side views and to concentrate on enhanced filtering routines that remove systematic errors, in order to obtain a more accurate point cloud and perform 3D map building in a realistic way.

ACKNOWLEDGMENT

This work was supported within the Austrian Research Promotion Agency's (FFG) program Research Studios Austria, Machine Vision Meets Mobility, and by the Austrian Federal Ministry of Economy, Family and Youth (bmwfj).

REFERENCES

[1] A. Doucet, N. De Freitas, and N. Gordon, editors. Sequential Monte Carlo Methods in Practice. Springer Verlag, 2001.
[2] A. Eliazar and R. Parr. DP-SLAM: Fast, robust simultaneous localization and mapping without predetermined landmarks. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, pages 1135–1142, 2003.
[3] B. Hu, C. Brown, and R. Nelson. Multiple-view 3-D reconstruction using a mirror. Technical report, University of Rochester, May 2005.
[4] S. May, D. Droeschel, S. Fuchs, D. Holz, and A. Nuechter. Robust 3D-mapping with time-of-flight cameras. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1673–1678, 2009.
[5] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. FastSLAM: A factored solution to the simultaneous localization and mapping problem. In Proceedings of the AAAI National Conference on Artificial Intelligence, pages 593–598, 2002.
[6] H. Moravec. Sensor fusion in certainty grids for mobile robots. AI Mag., pages 61–74, 1988.
[7] K. Murphy. Bayesian map learning in dynamic environments. In Neural Information Processing Systems (NIPS), pages 1015–1021. MIT Press, 1999.
[8] V. Nguyen, A. Harati, A. Martinelli, R. Siegwart, and N. Tomatis. Orthogonal SLAM: A step toward lightweight indoor autonomous navigation. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5007–5012, 2006.
[9] K. Ohno, T. Nomura, and S. Tadokoro. Real-time robot trajectory estimation and 3D map construction using 3D camera. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5279–5285, 2006.
[10] S. Pfister, S. Roumeliotis, and J. Burdick. Weighted line fitting algorithms for mobile robot map building and efficient data representation. In IEEE International Conference on Robotics and Automation, volume 1, pages 1304–1311, 2003.
[11] A. Prusak, O. Melnychuk, H. Roth, I. Schiller, and R. Koch. Pose estimation and map building with a Time-of-Flight camera for robot navigation. Int. J. Intell. Syst. Technol. Appl., pages 355–364, 2008.
[12] S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics (Intelligent Robotics and Autonomous Agents). The MIT Press, 2005.
[13] J. Weingarten, G. Gruener, and R. Siegwart. A state-of-the-art 3D sensor for robot navigation. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2155–2160, 2004.
[14] W. Zhou, J. V. Miró, and G. Dissanayake. Information-driven 6D SLAM based on ranging vision. In International Conference on Intelligent Robots and Systems, pages 2072–2077, 2008.