Camera Pose Determination From a Single View of Parallel Lines

Xianghua Ying, Hongbin Zha
National Laboratory on Machine Perception, Peking University, Beijing, 100871, P.R. China
{xhying, zha}@cis.pku.edu.cn

Abstract--In this paper, we present a method for finding closed-form solutions to the problem of determining the pose of a camera with respect to a given set of parallel lines in 3D space from a single view, a problem that cannot be solved by previous methods for the Perspective-n-Line (PnL) problem. The main idea of our method is that the distances from the optical center of the camera to these parallel lines are determined first, and the pose parameters are then recovered from the obtained distances. The problem of finding these optical-center-to-line distances is in fact a degenerated Perspective-n-Point (PnP) problem, and we prove that the degenerated P3P problem has at most two solutions. One application of our method is to distinguish crosswalks from staircases as an aid for the partially sighted. The method also provides a different way to investigate the problem of shape from texture.
I. INTRODUCTION
Using 2D-to-3D point or line correspondences to determine camera pose has received much attention over the past two decades. The problem of determining the pose of a camera with respect to a given set of 3D points from a single view is called the Perspective-n-Point (PnP) problem [3, 4]. Correspondingly, the problem of determining the pose using 3D lines is called the Perspective-n-Line (PnL) problem [1, 2]. To solve these problems, the following are assumed known: the intrinsic parameters of the camera, the 2D image coordinates of the image points and lines, the 3D world coordinates of the control points and lines, and the 2D-to-3D point and line correspondences. For three point correspondences, Fischler and Bolles [3] found that the solutions are given by a quartic equation in one unknown, so the number of solutions is at most four. For three line correspondences, Dhome et al. [2] and Chen [1] showed that the solutions are given by an eighth-degree equation in one unknown, so the number of solutions is at most eight. Note that in the PnL problem, each line segment in 3D space is treated as an infinite line when computing the pose; the endpoints of the line segments are not involved in the computation. In real scenes, parallel line features appear in, for example, crosswalks, staircases, ceilings, floors, windows, and the vertical intersection lines of adjacent walls. Such parallel line features are often used for robot navigation [6, 7] and structure from motion [8, 9] (note that [8] requires multiple views and [9] requires two sets of coplanar parallel lines; both differ from our method). However, as pointed out by Chen [1], previous methods for the PnL problem become unusable when all space lines belong to one set of parallel lines (the lines need not lie in one plane). The main reason is that these methods determine the rotation in a first step and then recover the translation; for parallel lines, the rotation cannot be determined, i.e., the previous methods fail at the first step. Yet very little attention has been paid to this problem. In this paper, a novel approach for solving this problem for parallel lines is presented. Unlike previous methods, the first step of our method estimates the distances from the optical center of the camera to the parallel lines, called the optical-center-to-line distances, and the pose parameters are then recovered from these distances.

0-7803-9134-9/05/$20.00 ©2005 IEEE

II. PROBLEM STATEMENT
A set of parallel lines in 3D space, L1, L2 and L3, and the optical center of the camera O are illustrated in Fig. 1. From the optical center O, construct perpendicular lines to L1, L2 and L3, with perpendicular feet A, B, C, respectively. The distances from O to these parallel lines are therefore OA, OB and OC. Obviously, A, B, C and O are coplanar. Since the equations of the parallel lines are given in the world coordinate system, the distances between them, i.e., AB, BC and CA, can be obtained. As discussed in [1, 2], the equation of the plane containing O and the image line of a 3D space line, called the interpretation plane, can be obtained in the camera coordinate system. Obviously, the interpretation plane also passes through the space line. The angles ∠BOC, ∠AOC and ∠AOB, which are the dihedral angles between the interpretation planes, can then be determined. Therefore, the problem of recovering the optical-center-to-line distances can be treated as a special case of the PnP problem in which all 3D control points and the optical center of the camera are coplanar; here, the control points are the perpendicular feet from the optical center to the parallel lines. Unlike the general P3P problem, the degenerated P3P problem requires solving only a quadratic equation in one unknown, and the number of solutions is at most two. We can also easily see that, like the aperture problem mentioned by Marr [5], the translation parameter along the direction of the parallel lines (denoted by the dashed line in Fig. 1) cannot be determined, since the camera can be translated freely along this direction while keeping the image unchanged. Despite this ambiguity, the five recovered pose parameters, three for rotation and two for translation, are still very useful 3D cues extracted from a single view for interpreting real scenes. For example, in robot navigation on the ground plane using vertical lines, the 2D translation parameters on the ground and the 3D rotation parameters in space can be recovered using our method, and the undetermined translation parameter along the vertical direction is often neglected. A novel application of our method is to distinguish crosswalks from staircases as an aid for the partially sighted. Note that the ambiguity can be removed if one additional point correspondence is given. The work of Sugihara [7] is closest to ours. However, his approach considers only vertical lines, with the optical axis of the camera kept parallel to the ground plane, and recovers only the 2D planar motion parameters, one for rotation and two for translation; how to estimate the 3D motion parameters is not studied in his paper. He mentioned the estimation of the 3D motion parameters as a future direction, but to our knowledge no further results have appeared in the literature. Another important novelty of this paper is that we treat the problem of pose estimation from parallel lines as a special case of the PnL problem and show that it can be converted to the degenerated PnP problem. Deeper discussions of the degenerated PnP problem are also given in this paper.
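As a concrete sketch of the first step, the normal of an interpretation plane can be computed from two pixels on an image line and the intrinsic matrix, and the dihedral angles between interpretation planes are then the angles between these normals. The code below is our own illustration, not the authors' implementation; the intrinsic matrix `K` (focal length 400, principal point at the image center, cf. the simulations in Section V) and all pixel values are hypothetical.

```python
import numpy as np

# Hypothetical pinhole intrinsics: fx = fy = 400, principal point (320, 240).
K = np.array([[400.0,   0.0, 320.0],
              [  0.0, 400.0, 240.0],
              [  0.0,   0.0,   1.0]])

def interpretation_plane_normal(p1, p2, K):
    """Unit normal (camera frame) of the plane through the optical center
    and the image line joining pixels p1, p2.

    The image line is l = p1 x p2 in homogeneous coordinates; a point X in
    the camera frame projects onto l iff l . (K X) = 0, so n = K^T l."""
    l = np.cross([p1[0], p1[1], 1.0], [p2[0], p2[1], 1.0])
    n = K.T @ l
    return n / np.linalg.norm(n)

def dihedral_angle(n1, n2):
    """Acute dihedral angle between two interpretation planes. Orienting
    the normals consistently resolves the theta vs. pi - theta ambiguity."""
    return np.arccos(np.clip(abs(n1 @ n2), -1.0, 1.0))
```

For example, two vertical image lines at x = 320 (through the principal point) and x = 400 subtend an angle of arctan(80/400) at the optical center.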
Fig. 1. The optical center of the camera is represented by O. L1, L2 and L3 are three parallel lines in 3D space.

III. THE DEGENERATED P3P PROBLEM

As discussed in Section II, the problem of finding the optical-center-to-line distances is the degenerated P3P problem, as shown in Fig. 2. We use 2D planar coordinates, and let O = (xO, yO), A = (x1, y1), B = (x2, y2), C = (x3, y3). Let α = ∠BOC, β = ∠AOC, γ = ∠AOB. From the law of cosines, we obtain:

OB^2 + OC^2 − 2 cos α · OB · OC = BC^2,   (1)
OC^2 + OA^2 − 2 cos β · OC · OA = AC^2,   (2)
OA^2 + OB^2 − 2 cos γ · OA · OB = AB^2.   (3)

Now consider only the two control points A, B and the angle γ, i.e., the degenerated P2P problem. From (3), we obtain:

(xO − x12)^2 + (yO − y12)^2 = r12^2,   (xO − x12′)^2 + (yO − y12′)^2 = r12^2,   (4)

where

x12, x12′ = (x1 + x2)/2 ± (y1 − y2)/(2 tan γ),
y12, y12′ = (y1 + y2)/2 ∓ (x1 − x2)/(2 tan γ),
r12 = sqrt((x1 − x2)^2 + (y1 − y2)^2) / (2 sin γ).

Fig. 2. Geometry for the degenerated P3P problem.

From (4) we know that two circles can be derived from (3), where (x12, y12) and (x12′, y12′) are the two circle centers and r12 is the common radius. The two circles C12, C12′ are shown in Fig. 2. Note that if 0 < γ < π/2, then the arcs AOB and AO′B greater than a semicircle are the possible loci for γ, and the two arcs less than a semicircle are the possible loci for π − γ. Similar results can be derived for (1) and (2).

Definition 1. In the degenerated P2P problem, the possible positions of the optical center lie on two arcs. Each such arc is called a base arc. The circle on which a base arc lies is called a base circle. The line passing through the two control points is called a base line.

In Fig. 2, arcs AOB and AO′B are the two base arcs, circles C12 and C12′ are the two base circles, and line AB is the base line. Obviously, the two base arcs lie on opposite sides of the base line.

Lemma 1. For the degenerated P3P problem, if the three control points and the optical center of the camera are cocircular, then there are infinitely many solutions. This is called the cocircular degenerated P3P problem.

For the degenerated P3P problem, when the three control points and the optical center of the camera are not cocircular, there are in total six base circles and six base arcs as defined by (1), (2), (3) (shown in Fig. 2). Obviously, an intersection of three base arcs corresponding to different edges of the triangle ABC is a solution to the degenerated P3P problem.

Lemma 2. For the degenerated P3P problem, if the three control points and the optical center of the camera are not cocircular, then at most one solution lies on each base arc.

Proof. As shown in Fig. 2, the base circles C12, C12′ corresponding to AB are drawn in dark lines, the base circles C23, C23′ corresponding to BC in light lines, and the base circles C13, C13′ corresponding to AC in dashed lines. For C12, C12′, C23 and C23′, there are in general four intersections among the four circles, O, O′, P, Q, other than the intersections coinciding with the vertices of the triangle, i.e., A, B, C (see Fig. 2). Obviously, the base arc AOC passes through O. We now prove that the base arc AOC cannot pass through the other three intersections P, Q, O′. If arc AOC passed through P, then C13 would coincide with C23, since both pass through O, C, P; hence A, B, C, O would be cocircular, a contradiction. If arc AOC passed through Q, then C13 would coincide with C12, since both pass through O, A, Q; hence A, B, C, O would again be cocircular, a contradiction. If arc AOC passed through O′, then arc AOC would be identical to arc AO′C, also a contradiction. Therefore at most one solution lies on the base arc AOC. Similar conclusions hold for the other five base arcs, with the role of arc AOC replaced accordingly. Hence at most one solution lies on each base arc.

From Lemmas 1 and 2, we obtain:

Theorem 1. For the degenerated P3P problem, if the three control points and the optical center of the camera are not cocircular, then there are at most two solutions.

Theorem 2. For the degenerated P3P problem, if the optical center of the camera is located inside the triangle formed by the three control points, then there is at most one solution.

Note that in [7], Sugihara resolved the ambiguity stated in Theorem 1 by taking the solution for which the control points appear in clockwise order around the optical center as the true one. However, he did not state how many ambiguous solutions exist in the degenerated P3P problem. For the degenerated P3P problem, we only need to find the intersections of the six base circles defined by (1), (2), (3); therefore, only quadratic equations in one unknown need to be solved.
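This construction translates directly into a numerical solver: build the base circles of Eq. (4) for two of the angle constraints, intersect them pairwise, discard intersections at the triangle vertices, and verify the surviving candidates against all three angles. The sketch below is our own illustration under these assumptions, not the paper's implementation.

```python
import itertools
import math

def base_circles(p1, p2, theta):
    """Two base circles for control points p1, p2 seen under angle theta:
    centers as in Eq. (4), common radius r = |p1 p2| / (2 sin theta)."""
    (x1, y1), (x2, y2) = p1, p2
    d = math.hypot(x2 - x1, y2 - y1)
    r = d / (2 * math.sin(theta))
    h = d / (2 * math.tan(theta))          # center offset from the midpoint
    ux, uy = (y1 - y2) / d, (x2 - x1) / d  # unit normal of the base line
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2
    return [(mx + h * ux, my + h * uy), (mx - h * ux, my - h * uy)], r

def circle_intersections(c1, r1, c2, r2):
    """Intersection points of two circles (empty if disjoint or concentric)."""
    (x1, y1), (x2, y2) = c1, c2
    d = math.hypot(x2 - x1, y2 - y1)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []
    a = (r1 ** 2 - r2 ** 2 + d ** 2) / (2 * d)
    h = math.sqrt(max(r1 ** 2 - a ** 2, 0.0))
    px, py = x1 + a * (x2 - x1) / d, y1 + a * (y2 - y1) / d
    return [(px + h * (y2 - y1) / d, py - h * (x2 - x1) / d),
            (px - h * (y2 - y1) / d, py + h * (x2 - x1) / d)]

def angle_at(o, p, q):
    """Angle p-o-q."""
    v1 = (p[0] - o[0], p[1] - o[1])
    v2 = (q[0] - o[0], q[1] - o[1])
    c = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    return math.acos(max(-1.0, min(1.0, c)))

def degenerate_p3p(A, B, C, alpha, beta, gamma, tol=1e-6):
    """Planar positions O with angle BOC = alpha, AOC = beta, AOB = gamma
    (at most two by Theorem 1, in the non-cocircular case)."""
    circAB, rAB = base_circles(A, B, gamma)
    circBC, rBC = base_circles(B, C, alpha)
    sols = []
    for c1, c2 in itertools.product(circAB, circBC):
        for o in circle_intersections(c1, rAB, c2, rBC):
            if any(math.hypot(o[0] - p[0], o[1] - p[1]) < tol for p in (A, B, C)):
                continue  # discard intersections at the triangle vertices
            if (abs(angle_at(o, B, C) - alpha) < tol and
                    abs(angle_at(o, A, C) - beta) < tol and
                    abs(angle_at(o, A, B) - gamma) < tol and
                    not any(math.hypot(o[0] - s[0], o[1] - s[1]) < tol
                            for s in sols)):
                sols.append(o)
    return sols
```

Only the base circles of (3) and (1) are intersected; the remaining constraint (2) is used as a check, so no equation beyond a quadratic is ever solved.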
IV. FINDING THE POSE PARAMETERS

As shown in Fig. 1, the optical-center-to-line distances OA, OB and OC can be recovered by solving the degenerated P3P problem presented in Section III. The two translation parameters of the optical center O in the world coordinate system can then be determined easily, leaving the translation parameter along the direction of the parallel lines undetermined. The rotation parameters of the camera are recovered as follows. The unit direction vectors k1, k2, k3 of the lines OA, OB, OC in the world coordinate system can be determined easily once OA, OB and OC are found. The normals n′i, i = 1, 2, 3, of the three interpretation planes represented in the world coordinate system are defined as
n′i = R^T ni,   (5)
where ni is the normal of the i-th interpretation plane represented in the camera coordinate system. Obviously, n′i is perpendicular to both ki and the common direction mi of the parallel lines, i.e.,
n′i = mi × ki.   (6)
Note that m1 = m2 = m3. The obtained n′i are shown in Fig. 3b, and the normals of the interpretation planes described in the camera coordinate system are shown in Fig. 3a. The problem of finding the rotation of the camera therefore reduces to finding the rotation between the two coordinate frames shown in Fig. 3a and 3b. Since n3 · (n1 × n2) = 0 and n′3 · (n′1 × n′2) = 0, i.e., n1, n2, n3 as well as n′1, n′2, n′3 are coplanar and hence not independent, we use only n1, n2 and n′1, n′2. The two point sets {(0,0,0)^T, n1, n2} and {(0,0,0)^T, n′1, n′2} are registered using the 3D registration method of [10] to recover the 3D rotation. The reason we can recover all three rotation parameters from parallel lines is that, in the process of finding the rotation, we also use the dihedral angles between the interpretation planes. However, in the previous
methods for the P3L problem, only the fact that n′i is perpendicular to mi is taken into account.

Fig. 3. Geometry for estimating the rotation. (a) The normals ni in the camera coordinate system. (b) The normals n′i in the world coordinate system.

V. EXPERIMENTAL RESULTS

A. Simulations
The effective focal length of the simulated camera is 400, and the image resolution is 640 × 480. Three parallel 3D space lines are generated in the world coordinate system OW XW YW ZW; to simplify their representation, we assume they are parallel to the YW axis. We produce two images of these three parallel lines taken at two different poses Ri and Ti, i = 1, 2. Since translation along the YW axis, i.e., along the direction of the parallel lines, does not change the image, we write Ti* instead of Ti to denote this ambiguity. Note that here the 3D transformation is composed of a translation Ti* followed by a rotation Ri. On each image line we choose about 50 points, to which zero-mean Gaussian noise with standard deviation σ is added; the noise level σ is varied from 0.1 to 2.0 pixels. Due to space limitations, we only show the estimation results for (R, T*) at noise level 2.0 in Table I. From Table I, we find that the estimates are robust to noise.
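The rotation-recovery step used in these experiments, registering the point sets {(0,0,0)^T, n1, n2} and {(0,0,0)^T, n′1, n′2} with the SVD-based method of Arun et al. [10], can be sketched as follows. This is our own illustration, not the authors' code; since both sets contain the origin and only a rotation is sought, no centroid subtraction or translation estimate is needed.

```python
import numpy as np

def rotation_from_normals(n_cam, n_world):
    """Least-squares rotation R satisfying n'_i = R^T n_i (Eq. (5)),
    via SVD registration in the spirit of Arun et al. [10]."""
    Q = np.asarray(n_cam, dtype=float)    # rows: n_1, n_2 (camera frame)
    P = np.asarray(n_world, dtype=float)  # rows: n'_1, n'_2 (world frame)
    H = Q.T @ P                           # 3x3 correlation matrix
    U, _, Vt = np.linalg.svd(H)
    # M maps n_i to n'_i; the diagonal term guards against a reflection,
    # which can otherwise appear because H has rank 2 here.
    d = np.linalg.det(Vt.T @ U.T)
    M = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return M.T                            # camera rotation R
```

With exact, non-parallel normals the recovered R is unique, since a proper rotation mapping two non-collinear unit vectors also maps their cross product.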
TABLE I
SIMULATED ESTIMATION RESULTS UNDER NOISE LEVEL 2.0

Image 1
  Ground truth:  R1 = [ 0.667 −0.333 −0.667 ; 0.667 0.667 0.333 ; 0.333 −0.667 0.667 ],  T1* = ( 0.00, *, 500.00 )
  Mean:          R1 = [ 0.668 −0.332 −0.665 ; 0.665 0.666 0.335 ; 0.331 −0.666 0.667 ],  T1* = ( 1.19, *, 500.71 )
  Std:           [ 0.007 0.003 0.007 ; 0.003 0.002 0.008 ; 0.006 0.003 0.004 ],  ( 5.91, *, 2.45 )

Image 2
  Ground truth:  R2 = [ 0.851 0.488 0.191 ; −0.191 0.628 −0.754 ; −0.488 0.605 0.628 ],  T2* = ( −100.00, *, 400.00 )
  Mean:          R2 = [ 0.852 0.488 0.187 ; −0.193 0.627 −0.754 ; −0.486 0.606 0.629 ],  T2* = ( −101.58, *, 398.15 )
  Std:           [ 0.003 0.005 0.011 ; 0.010 0.002 0.003 ; 0.009 0.002 0.007 ],  ( 5.84, *, 5.43 )

(The entry * denotes the translation component along the direction of the parallel lines, which cannot be determined.)

Fig. 4. (a) A crosswalk image. (b) A staircase image. Some detected parallel lines are superimposed in (a) and (b). The obtained slope angles of the two planes in (a) are 18.7° and 19.1°, while the slope angles of the two planes in (b) are 37.7° and 35.8°.

B. Real data
As mentioned in [6], crosswalks and staircases are useful road features for outdoor navigation in mobility aids for the partially sighted. After edge extraction, both crosswalks and staircases can be represented by a set of parallel lines, so they must be distinguished in real applications. Three methods for making this distinction are given in [6]; however, these methods are slow and far from real time because they require nonlinear optimization. The parallel edges extracted from crosswalks and staircases can both be classified into two sets of equally spaced parallel lines using intensity-variation information: the edges extracted from a crosswalk are classified into white-to-black and black-to-white edges, while the edges extracted from a staircase are classified into concave and convex edges [6]. Therefore, there exist two planes, each containing one set of parallel lines, and we can find the slope angles of these two planes of the crosswalk or staircase with respect to the horizontal plane. Since our method only needs to solve a quadratic equation in one unknown, it can satisfy real-time requirements. The recovered results from the real images are shown in Fig. 4. A threshold on the slope angle, e.g., 30 degrees, can be used to distinguish crosswalks from staircases.

VI. CONCLUSIONS
In this paper, we treat the problem of pose estimation from parallel lines as a special case of the PnL problem, and a novel method for camera pose determination from a single view of parallel lines is presented. We show that the three rotation parameters and two of the translation parameters can be recovered, while the remaining translation parameter along the direction of the parallel lines stays undetermined. Our future work is to apply the novel pose determination method to virtual reality.

REFERENCES

[1] H. Chen, "Pose determination from line-to-plane correspondences: existence condition and closed-form solutions," PAMI, 13(6), 1991, 530-541.
[2] M. Dhome, M. Richetin, J.-T. Lapreste, "Determination of the attitude of 3D objects from a single view," PAMI, 11(12), 1989, 1265-1278.
[3] M. Fischler, R. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Comm. ACM, 24(6), 1981, 381-395.
[4] W. J. Wolfe, D. Mathis, C. Weber, M. Magee, "The perspective view of three points," PAMI, 13(1), 1991, 66-73.
[5] D. Marr, Vision. Freeman Publishers, New York, 1982.
[6] S. Se, "Zebra-crossing detection for the partially sighted," CVPR 2000, vol. 2, 211-217.
[7] K. Sugihara, "Some location problems for robot navigation using a single camera," CVGIP, 42(1), 1988, 112-129.
[8] P. Baker, Y. Aloimonos, "Structure from motion of parallel lines," ECCV 2004, vol. 4, 229-240.
[9] F. Heuvel, "Exterior orientation using coplanar parallel lines," 10th Scandinavian Conf. on Image Analysis, 1997, 71-78.
[10] K. S. Arun, T. S. Huang, S. D. Blostein, "Least-squares fitting of two 3-D point sets," PAMI, 9(5), 1987, 698-700.