Image-based ICP Algorithm for Visual Odometry Using an RGB-D Sensor in a Dynamic Environment

Deok-Hwa Kim and Jong-Hwan Kim
Department of Robotics Program, KAIST
335 Gwahangno, Yuseong-gu, Daejeon 305-701, Republic of Korea
[email protected], [email protected]

Abstract. This paper proposes a novel approach to calculating visual odometry with a Microsoft Kinect sensor, incorporating depth information into RGB color information to generate 3D feature points based on the speeded-up robust features (SURF) descriptor. The generated 3D feature points are used by the iterative closest point (ICP) algorithm applied between successive images from the sensor. Unlike previous approaches, the ICP algorithm here works on the image information of the features; this paper thus presents a modified version of a state-of-the-art implementation of the ICP algorithm. This approach enables accurate calculation of the rigid body transformation matrix for visual odometry in a dynamic environment: dynamically moving features are separated out as outliers, which are then filtered with the random sample consensus (RANSAC) algorithm for accurate calculation of the rigid body transformation matrix. The experiments demonstrate that visual odometry is successfully obtained using the proposed algorithm in a dynamic environment.

Keywords: Visual Odometry, RGB-D Sensor, Iterative Closest Point Algorithm, Dynamic Environments, SLAM
1 Introduction
Wheeled mobile robots can use their encoder values for odometry information. However, aerial vehicles and humanoid robots cannot use their encoder values directly for odometry, which makes visual odometry more important than other sensor modalities in those fields. The robotics and computer vision communities have developed many techniques and algorithms for 3D mapping and visual odometry using range scanners [5, 6], stereo cameras [3], monocular cameras [4, 7], and RGB-D sensors [1, 2, 10]. Most visual odometry systems require the spatial alignment of successive camera frames. To deal with this alignment problem, the ICP algorithm [11] has been used. ICP minimizes the difference between two sets of points and is often employed to reconstruct 2D or 3D surfaces from different scans. If a feature set contains unreliable points, a system using those points becomes unstable. To solve this problem, the RANSAC algorithm, a simple and useful method, has been widely used [12]. There has been much research on visual odometry using an RGB-D sensor based on the ICP and RANSAC algorithms; however, that research has not considered dynamic environments. With dynamic environment applications in mind, this paper proposes a novel approach to calculating visual odometry with a modified, image-based ICP algorithm.
2 Visual Odometry System Using an RGB-D Sensor
Many researchers have proposed visual odometry systems using an RGB-D sensor, and those systems follow a similar algorithm [1, 10]. This section introduces the algorithm for obtaining visual odometry information, which proceeds in five steps. First, feature points must be detected. The visual odometry system works on rotation-invariant feature points, for which the SURF algorithm [8] is employed; SURF involves much parallelizable computation, so it can be run on a graphics processing unit (GPU) for higher speed. The SURF algorithm yields feature information such as feature position and feature scale. After feature detection, feature matching is computed between the prior frame and the current frame. The next step is 3D reconstruction using the camera intrinsic parameters (focal lengths and principal point) and the depth information; this follows directly from the proportional relation between the focal length and the depth. Then inlier detection is performed by reprojection using the homography between the prior image plane and the current image plane, which is found by the RANSAC algorithm. This step refines the matched features.
Fig 1. The Image matching result with SURF GPU algorithm.
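The 3D reconstruction step above can be sketched as a pinhole back-projection. The intrinsic values in the example (fx, fy, cx, cy) are illustrative placeholders, not calibrated Kinect parameters.

```python
import numpy as np

def back_project(u, v, z, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth z into a 3D point (pinhole model)."""
    x = (u - cx) * z / fx   # proportional relation between focal length and depth
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# Example: a pixel at the principal point lies on the optical axis.
p = back_project(320.0, 240.0, 1.5, fx=525.0, fy=525.0, cx=320.0, cy=240.0)
```

Applying this to every matched SURF feature with a valid depth reading turns the 2D matches into the 3D point sets used in the following steps.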
Using the matched features, the rigid body transformation matrix [9] is calculated by the singular value decomposition (SVD) of the cross-dispersion matrix [C]. The cross-dispersion matrix is computed as

[C] = Σ_i (p_i − p̄)(q_i − q̄)^T,

where p_i is the position of the i-th point measured in the prior image; q_i the position of the i-th point measured in the current image; p̄ the mean position of the prior image; and q̄ the mean position of the current image. [C] can be decomposed into [U], [W], [V] by SVD, where [U] and [V] are orthogonal matrices and [W] is the diagonal matrix containing the singular values of [C]. For the rigid body transformation matrix, the rotation matrix is computed as

[R] = [V][U]^T,

and the translation vector is estimated by

t = q̄ − [R] p̄.
Combining the rotation matrix [R] and the translation vector t gives the rigid body transformation matrix, from which the odometry information is estimated. For a more accurate estimate of the transformation matrix, the RANSAC algorithm is applied, and the result of this procedure is used as the initial rigid body transformation matrix. However, errors still remain in this matrix, so a further compensation step is needed.
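The SVD procedure above can be sketched as follows. The sign-correction term guarding against a reflection is a standard addition from the rigid-body fitting literature [9]; the example transform is made up for illustration.

```python
import numpy as np

def rigid_transform_svd(P, Q):
    """Estimate R, t such that Q ≈ P @ R.T + t from matched (N, 3) point sets."""
    p_bar, q_bar = P.mean(axis=0), Q.mean(axis=0)
    C = (P - p_bar).T @ (Q - q_bar)              # cross-dispersion matrix [C]
    U, W, Vt = np.linalg.svd(C)                  # [C] = [U][W][V]^T
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflection
    R = Vt.T @ D @ U.T                           # rotation [R]
    t = q_bar - R @ p_bar                        # translation t
    return R, t

# Example: recover a known 30-degree rotation about z plus a translation.
th = np.pi / 6
R_true = np.array([[np.cos(th), -np.sin(th), 0.0],
                   [np.sin(th),  np.cos(th), 0.0],
                   [0.0,         0.0,        1.0]])
t_true = np.array([0.5, -0.2, 1.0])
P = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
Q = P @ R_true.T + t_true
R_est, t_est = rigid_transform_svd(P, Q)
```

With noise-free, correctly matched points the decomposition recovers the transform exactly; RANSAC wraps this solver to suppress outlier matches.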
Fig 2. The odometry information from up, forward, backward, down movements.
To further compensate the rigid body motion, this paper uses the ICP algorithm, which has been used for 2D and 3D reconstruction in the vision field. It consists of three steps. First, the closest points between the feature sets are found; a KD-tree search is used for this, and in this paper this step is modified to use image information, as discussed in detail in the next section. Next, the rigid body transformation matrix is found from the closest point sets using the SVD method. Finally, an update step is computed, which produces new closest sets of features. Iterating this search for the transformation matrix makes the odometry information more precise. These procedures together produce the visual odometry information. Figure 2 shows the result for continuous up, forward, backward, and down movements. In the next section, an accurate visual odometry algorithm for a dynamic environment (consisting of static and dynamic objects) is proposed.
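The three ICP steps can be sketched as a minimal self-contained loop. The brute-force nearest-neighbour search stands in for the KD-tree used in practice, and all function names and the example motion are illustrative, not the paper's implementation.

```python
import numpy as np

def solve_rigid(P, Q):
    """SVD solution for R, t mapping matched P onto Q (cross-dispersion method)."""
    p_bar, q_bar = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - p_bar).T @ (Q - q_bar))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, q_bar - R @ p_bar

def icp(P, Q, max_iteration=20, tol=1e-9):
    """Align point set P onto Q; returns the accumulated R, t."""
    R_acc, t_acc = np.eye(3), np.zeros(3)
    P_cur = P.copy()
    for _ in range(max_iteration):
        # Step 1: closest points (brute force here; a KD-tree in practice).
        d = np.linalg.norm(P_cur[:, None, :] - Q[None, :, :], axis=2)
        idx = d.argmin(axis=1)
        # Step 2: rigid transform from the closest point sets (SVD).
        R, t = solve_rigid(P_cur, Q[idx])
        # Step 3: update the point set and the accumulated transform.
        P_new = P_cur @ R.T + t
        R_acc, t_acc = R @ R_acc, R @ t_acc + t
        if np.linalg.norm(P_new - P_cur) < tol:
            break
        P_cur = P_new
    return R_acc, t_acc

# Example: recover a small rigid motion of a unit cube's corners.
th = 0.05
R_true = np.array([[np.cos(th), -np.sin(th), 0.0],
                   [np.sin(th),  np.cos(th), 0.0],
                   [0.0,         0.0,        1.0]])
t_true = np.array([0.05, 0.0, 0.0])
P = np.array([[x, y, z] for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)])
Q = P @ R_true.T + t_true
R_est, t_est = icp(P, Q)
```

Because the example motion is small relative to the point spacing, the nearest-neighbour correspondences are correct from the first iteration and the loop converges almost immediately.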
3 The Image-based ICP Algorithm in a Dynamic Environment
In a dynamic environment, visual odometry cannot be computed correctly because dynamic movements cause incorrect matches when the closest sets are found in the ICP algorithm. In this section, an image-based ICP algorithm is proposed for dynamic environment applications.

3.1 The image-based ICP algorithm
The image-based ICP algorithm differs from the original one in how it detects the closest sets of features: when finding the closest sets, the image information of each feature is combined with its 3D position information. This image information contributes to the precise detection of matched features.

Algorithm 1. The image-based ICP
    T ← T_0
    for i = 0 to max_iteration do
        (P_c, Q_c) ← FindClosestSetPoint(P, Q, C_P, C_Q)
        T_i ← SolveRigidTransform(P_c, Q_c)
        T ← T_i T
        if (error(T_i) < threshold)
            break
    end
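The closest-set search of Algorithm 1 can be sketched as a combined distance over position and normalized RGB. The color weight `w_rgb` and the example values are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def find_closest_set_point(P, Q, C_P, C_Q, w_rgb=0.5):
    """For each prior feature (row of P with color in C_P), return the index of
    the current feature (Q, C_Q) minimizing the combined position + color
    distance. Brute force for clarity; the algorithm would query a KD-tree
    built on the same 6D vectors."""
    A = np.hstack([P, w_rgb * (C_P / 255.0)])   # position + normalized RGB
    B = np.hstack([Q, w_rgb * (C_Q / 255.0)])
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return d.argmin(axis=1)

# Example: two current features lie near the prior feature. Pure Euclidean
# distance would pick the green feature at 0.05, but the color term steers
# the match to the red feature at 0.06.
P   = np.array([[0.0, 0.0, 0.0]])
C_P = np.array([[255.0, 0.0, 0.0]])                      # red prior feature
Q   = np.array([[0.05, 0.0, 0.0], [0.06, 0.0, 0.0]])
C_Q = np.array([[0.0, 255.0, 0.0], [255.0, 0.0, 0.0]])   # green, red
idx = find_closest_set_point(P, Q, C_P, C_Q)
```

This is what lets features on a moving object, whose colors do not match the static scene behind them, fall out as outliers instead of corrupting the closest sets.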
If the features are matched more precisely, they can be divided into inliers and outliers. Features on dynamic obstacles are classified as outliers because they move differently from the others.
Fig 3. Diagram of finding closest set of feature using the proposed algorithm.
Algorithm 1 shows the image-based ICP algorithm. The "FindClosestSetPoint" function takes two additional parameters, C_P and C_Q, where C_P is the set of RGB values of the prior image features and C_Q is the set of RGB values of the current image features. The function uses a KD-tree search, which is faster than exhaustive search, to obtain the closest sets. Then the rigid body transformation matrix is solved using the RANSAC algorithm. Figure 3 shows the diagram of finding the closest feature set using the image-based ICP algorithm; each search considers not only the Euclidean distance but also the normalized RGB distance.

3.2 Benefits of the image-based ICP algorithm
This paper proposes a novel visual odometry algorithm for use in a dynamic environment. It is very robust to dynamic obstacles, and can therefore be applied to computing visual odometry in real-life settings. For example, it is very hard for aerial vehicles to obtain odometry information from their motors; in such applications, this visual odometry algorithm using an RGB-D sensor can be applied instead.
4 Experiments
The proposed algorithm was tested with an RGB-D sensor, the Microsoft Kinect. The experimental environment was Gentoo OS with an Intel i5 3.3 GHz quad-core processor, an NVIDIA GTX 560 GPU, and 6 GB RAM. The average computation time was 152.0 ms per frame. The experiments were carried out in a static environment and in a dynamic environment.
4.1 The experiment result in a static environment
Fig 4. The experiment result in a static environment for up, forward and return movements: (left) using the conventional ICP algorithm; (right) using the image-based ICP algorithm.
Experiments in the static environment were conducted in a room with no movement other than that of the Kinect sensor and the experimenter. The experimenter, holding the Kinect sensor, moved it upward, forward, backward, and downward. Figure 4 shows the results: the left graph is the visual odometry obtained with the conventional ICP algorithm, and the right graph is the visual odometry obtained with the image-based ICP algorithm. Both algorithms showed similar results in the static environment, so the image-based ICP algorithm can also be applied to calculating visual odometry in a static environment.
4.2 The experiment result in a dynamic environment
To realize a dynamic environment, the experimenter roamed around a room, as Fig. 5 shows, while the visual odometry was computed using a fixed Kinect sensor.

Fig 5. A dynamic environment.

Figure 6 shows the result of the experiment with the proposed visual odometry algorithm compared with the conventional ICP algorithm. The left graph, the result of the conventional ICP algorithm, shows unstable computation of the visual odometry: there was no inlier detection in the ICP algorithm, so the outliers produced by the roaming person were not filtered out when solving the rigid body transformation matrix. The right graph, in contrast, shows very stable computation of the visual odometry in the dynamic environment with the fixed Kinect. The proposed algorithm adds filtering steps to the ICP algorithm, namely closest set detection based on images and outlier suppression using the RANSAC algorithm. As a result, visual odometry using the RGB-D sensor is more accurate with the proposed algorithm than with the conventional ICP algorithm in a dynamic environment.
Fig 6. The experiment result in a dynamic environment with a fixed RGB-D camera: (left) using the conventional ICP algorithm; (right) using the image-based ICP algorithm.
5 Conclusion
This paper proposed the image-based ICP algorithm for calculating visual odometry and demonstrated that the proposed algorithm is very robust to dynamic obstacles. In this algorithm, the RGB information of the feature points is used to find the closest point sets in the ICP algorithm; the RANSAC algorithm is then applied to solve the rigid body transformation matrix while eliminating the outlier effect of dynamic object features. Together, these procedures enable accurate calculation of odometry information in a dynamic environment. However, the proposed algorithm still has a computation time problem caused by the RANSAC algorithm, so the real-time issue must be addressed, considering fast movements of the RGB-D sensor. Furthermore, a SLAM system based on the proposed visual odometry algorithm is to be considered in future work.
Acknowledgements This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the Human Resources Development Program for Convergence Robot Specialists support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2012-H1502-12-1002)
References
1. Albert S. Huang, Abraham Bachrach, Peter Henry, Michael Krainin, Daniel Maturana, Dieter Fox, and Nicholas Roy (2011) Visual Odometry and Mapping for Autonomous Flight Using an RGB-D Camera. Paper presented at the International Symposium on Robotics Research (ISRR), Flagstaff, AZ, USA, Aug 2011
2. Frank Steinbrücker, Jürgen Sturm, and Daniel Cremers (2011) Real-Time Visual Odometry from Dense RGB-D Images. Paper presented at the IEEE International Conference on Computer Vision Workshops, Barcelona, Spain, pp 719-722, Nov 2011
3. Navid Nourani-Vatani, Jonathan Roberts, and Mandyam V. Srinivasan (2008) IMU Aided 3D Visual Odometry for Car-like Vehicles. Paper presented at the 10th Australasian Conference on Robotics & Automation, Canberra, ACT, Australia, pp 1-8, Dec 2008
4. Laurent Kneip, Margarita Chli, and Roland Siegwart (2011) Robust Real-Time Visual Odometry with a Single Camera and an IMU. Paper presented at the 22nd British Machine Vision Conference, University of Dundee, pp 16.1-16.11, Sep 2011
5. Marco Baglietto, Antonio Sgorbissa, Damiano Verda, and Renato Zaccaria (2011) Human Navigation and Mapping with a 6DOF IMU and a Laser Scanner. Robotics and Autonomous Systems 59:1060-1069. doi: 10.1016/j.robot.2011.08.005
6. Armin Hornung, Kai M. Wurm, and Maren Bennewitz (2010) Humanoid Robot Localization in Complex Indoor Environments. Paper presented at the IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Taiwan, pp 1690-1695, Oct 2010
7. Andrew J. Davison, Ian D. Reid, Nicholas D. Molton, and Olivier Stasse (2007) MonoSLAM: Real-Time Single Camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(6):1052-1067. doi: 10.1109/TPAMI.2007.1049
8. Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool (2008) Speeded-Up Robust Features (SURF). Computer Vision and Image Understanding 110(3):346-359. doi: 10.1016/j.cviu.2007.09.014
9. John H. Challis (1994) A Procedure for Determining Rigid Body Transformation Parameters. Journal of Biomechanics 28(6):733-737. doi: 10.1016/0021-9290(94)00116-L
10. Peter Henry, Michael Krainin, Evan Herbst, Xiaofeng Ren, and Dieter Fox (2010) RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments. Paper presented at the International Symposium on Experimental Robotics, Delhi, India, Dec 2010
11. Aleksandr V. Segal, Dirk Haehnel, and Sebastian Thrun (2009) Generalized-ICP. Paper presented at Robotics: Science and Systems, Seattle, USA, Jun 2009
12. Martin A. Fischler and Robert C. Bolles (1981) Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Communications of the ACM 24(6):381-395. doi: 10.1145/358669.358692