Using Scene Constraints during the Calibration Procedure

Didier Bondyfalat    Théodore Papadopoulo    Bernard Mourrain
INRIA Sophia-Antipolis, 2004, route des Lucioles, BP 93, 06902 Sophia-Antipolis Cedex, France
[email protected]

Abstract

This paper focuses on the problem of calibration from a single view and a map of a scene. This situation arises quite often when modelling urban scenes, e.g. for augmented reality purposes. We show how some scene constraints can be used to achieve a calibration-like procedure. An example excerpted from a sequence of pictures for which self-calibration-like techniques consistently fail illustrates some of the benefits of the approach.
1. Introduction

In applications such as augmented reality, 3D reconstruction and its prerequisite, calibration, is sometimes still an issue. Although self-calibration techniques have made tremendous progress in recent years [4, 6, 9, 2, 1, 7, 3], some stages of self-calibration still have difficulties with some scenes. Figure 1 shows an example: these two pictures are excerpted from a series of pictures of London taken from across the Thames. Although depth information is available in these pictures, using self-calibration techniques has been impossible because the very first stage of the process (the computation of the Fundamental matrix) gives results that are far too unstable. One of the drawbacks of the current techniques is that they take only the image formation constraints into account during the self-calibration process. The scene, however, might also be a rich source of constraints that can be exploited. Indeed, people who model such scenes for augmented reality purposes do make use of such constraints. One particularly rich example of constraints is a map of the imaged scene, and such maps are often available for urban scenes. While this case (one orthographic or weak-perspective image, the map, and some perspective ones, the pictures) has been recognized as rather common and useful in practice [10, 8, 11], none of these works has studied the potential use of the scene constraints;
Figure 1. Two views of London taken from across the Thames.
this has, however, been studied in the general context of self-calibration [5]. This paper formalises the use of constraints during the calibration process in the specific case where one image and a map of the scene are available. Figure 2 shows an example of such a configuration. Since the map is a very particular view, it is possible to compute the epipolar geometry between those two views. As can be seen, the result is very poor: the epipole in the map, which corresponds to the position of the photographer, is situated in the middle of the Thames! In the sequel, we show how scene constraints can be used early in the process of self-calibration to provide better results.
2. Notations and preliminary results

Throughout this paper, we will make use of the Grassmann-Cayley algebra, with $\wedge$ and $\vee$ denoting the meet and join operators, respectively.

Figure 2. An image and a map of the pictured scene. The epipolar geometry computed by a standard algorithm is shown as red lines on the map.

2.1. The projective model of a camera

A camera is defined by its $3 \times 4$ projection matrix $P$. The three rows and four columns of $P$ are denoted by $P_1, P_2, P_3$ and $P^1, P^2, P^3, P^4$, respectively. The optical center $C$ of the camera is the unique 3D point satisfying $PC = 0$, i.e. $C = P_1 \wedge P_2 \wedge P_3$. The optical ray defined by a point $m$ of the retina with coordinates $(x : y : z)$ is:

$$x P_2 \wedge P_3 + y P_3 \wedge P_1 + z P_1 \wedge P_2 \qquad (1)$$

2.2. The inverse projection matrix for a given plane

A 3D plane not passing through $C$ is denoted by $\pi = (\pi_1 : \pi_2 : \pi_3 : \pi_4)$. There is a homography between the retinal plane and such a plane $\pi$. Its matrix $P^+_\pi$ is given by:

$$P^+_\pi = (P_2 \wedge P_3 \wedge \pi,\ P_3 \wedge P_1 \wedge \pi,\ P_1 \wedge P_2 \wedge \pi)$$

For each point $m$ of the retina, $P^+_\pi(m)$ represents the coordinate vector of the 3D point at the intersection of the optical ray $(Cm)$ and of the plane $\pi$. $P^+_\pi$ thus behaves as the inverse projection matrix for the plane $\pi$.

Proof: Indeed, if $m = (x : y : z)^t$ then

$$P^+_\pi(m) = x (P_2 \wedge P_3 \wedge \pi) + y (P_3 \wedge P_1 \wedge \pi) + z (P_1 \wedge P_2 \wedge \pi) = (x P_2 \wedge P_3 + y P_3 \wedge P_1 + z P_1 \wedge P_2) \wedge \pi,$$

which is the meet of the optical ray (1) with the plane $\pi$. $\square$

We now define the $4 \times 4$ matrix $M$ by:

$$M = \begin{pmatrix} P_1 \\ P_2 \\ P_3 \\ \pi \end{pmatrix}.$$

The optical center of the camera not belonging to the plane $\pi$, $M$ is invertible and thus defines a homography of $\mathbb{P}^3$.

Proof: $M = (P_1, P_2, P_3, \pi)^t$, thus

$$\det(M) = P_1 \wedge P_2 \wedge P_3 \wedge \pi = C \wedge \pi \neq 0. \qquad \square$$

From now on, $A, B, C, D$ are four points of the plane $\pi$. Their coordinates are given by $(A_1 : A_2 : A_3 : A_4)$, $(B_1 : B_2 : B_3 : B_4)$, $(C_1 : C_2 : C_3 : C_4)$ and $(D_1 : D_2 : D_3 : D_4)$, respectively. The images of $A, B, C, D$ through the camera $P$ are called $a, b, c, d$, respectively. In the remainder of this article, we will be particularly interested in the 3D lines $(AB)$ and $(CD)$ and in their respective images $u = a \times b$ and $v = c \times d$.

Proposition 1. If $a = (a_1 : a_2 : a_3)^t$, then the coordinates $A_i,\ i = 1 \ldots 4$, of $A = P^+_\pi(a)$ are obtained by taking the determinant of the matrix $M$ in which the $i$-th column has been replaced by the vector $(a^t\ 0)^t$, e.g. (in block notation, each column being a 4-vector):

$$A_1 = \begin{vmatrix} a & P^2 & P^3 & P^4 \\ 0 & \pi_2 & \pi_3 & \pi_4 \end{vmatrix}.$$

Proof: If $A = P^+_\pi(a)$ then $a = PA$ (up to scale) and $\pi \cdot A = 0$. This can be written $MA = (a^t\ 0)^t$. Thus $A = M^{-1} (a^t\ 0)^t$, and the result follows using Cramer's rule. $\square$
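To make these definitions concrete, here is a minimal numpy sketch (a random, purely illustrative camera and plane; all names are ours, not the paper's) that builds $M$ and checks the inverse projection of Proposition 1, $A = M^{-1}(m^t\ 0)^t$ up to scale:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical camera and plane (random values for illustration only).
P = rng.normal(size=(3, 4))          # projection matrix
pi = rng.normal(size=4)              # plane, generically not through C

# The 4x4 matrix M stacks the three rows of P and the plane pi.
M = np.vstack([P, pi])
assert abs(np.linalg.det(M)) > 1e-9  # M invertible <=> C does not lie on pi

# Inverse projection of an image point m onto the plane pi.
m = rng.normal(size=3)
A = np.linalg.solve(M, np.append(m, 0.0))

# Checks: A lies on pi, and A reprojects onto m (up to scale).
assert abs(pi @ A) < 1e-9
assert np.allclose(np.cross(P @ A, m), 0, atol=1e-9)
```

Solving with $M$ rather than forming the meets explicitly is just a convenience; both compute the intersection of the optical ray $(Cm)$ with $\pi$.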
3. Single View Constraints Using Grassmann-Cayley Algebra

In this section, we look for expressions of the orthogonality and parallelism constraints between two lines of the 3D scene viewed by a camera. In other words, we want to express the fact that two lines in the image are images of parallel or orthogonal 3D lines.
3.1. The orthogonality constraint

In the following, we assume that the points of the plane at infinity $H_\infty$ are those whose last coordinate is zero, and that the points $M = (X, Y, Z, 0)^t$ of the absolute conic satisfy the equation $X^2 + Y^2 + Z^2 = 0$, using the canonical scalar product $\langle M, M' \rangle = XX' + YY' + ZZ' + TT'$. We denote by $H_\infty(X)$ the scalar product of the vector $(0 : 0 : 0 : 1)^t$ representing $H_\infty$ and of the vector $(X_1 : X_2 : X_3 : X_4)^t$ representing the 3D point $X$, i.e. $H_\infty(X) = H_\infty^t X = X_4$.

Proposition 2. $(AB)$ and $(CD)$ are orthogonal iff:

$$D_{14} D'_{14} + D_{24} D'_{24} + D_{34} D'_{34} = 0, \qquad (2)$$

where $D_{ij} = \begin{vmatrix} A_i & B_i \\ A_j & B_j \end{vmatrix}$ and $D'_{ij} = \begin{vmatrix} C_i & D_i \\ C_j & D_j \end{vmatrix}$ are the Plücker coordinates of the lines $(AB)$ and $(CD)$, respectively.

Proof: $(AB)$ and $(CD)$ are orthogonal iff the two points at infinity of those lines are conjugate with respect to the absolute conic, i.e. $\langle (A \vee B) \wedge H_\infty,\ (C \vee D) \wedge H_\infty \rangle = 0$. Moreover:

$$\langle (A \vee B) \wedge H_\infty,\ (C \vee D) \wedge H_\infty \rangle = \langle H_\infty(A) B - H_\infty(B) A,\ H_\infty(C) D - H_\infty(D) C \rangle = \langle A_4 B - B_4 A,\ C_4 D - D_4 C \rangle = D_{14} D'_{14} + D_{24} D'_{24} + D_{34} D'_{34}. \qquad \square$$

Lemma 1. We recall two simple properties of cross products that will be used in the sequel:

$$(u \times v) \times w = \langle u, w \rangle v - \langle v, w \rangle u \qquad (3)$$
$$\langle a \times b,\ x \times y \rangle = \langle a, x \rangle \langle b, y \rangle - \langle b, x \rangle \langle a, y \rangle \qquad (4)$$

Proposition 3. $(AB)$ and $(CD)$ are orthogonal iff:

$$\big(\pi_3 \langle u, P^2 \rangle - \pi_2 \langle u, P^3 \rangle\big)\big(\pi_3 \langle v, P^2 \rangle - \pi_2 \langle v, P^3 \rangle\big) + \big(\pi_1 \langle u, P^3 \rangle - \pi_3 \langle u, P^1 \rangle\big)\big(\pi_1 \langle v, P^3 \rangle - \pi_3 \langle v, P^1 \rangle\big) + \big(\pi_2 \langle u, P^1 \rangle - \pi_1 \langle u, P^2 \rangle\big)\big(\pi_2 \langle v, P^1 \rangle - \pi_1 \langle v, P^2 \rangle\big) = 0 \qquad (5)$$

where $u = a \times b$ and $v = c \times d$ are the image lines corresponding to the 3D lines $(AB)$ and $(CD)$, respectively.

Proof: The scheme of the demonstration is the following: starting from equation (2), we substitute the coordinates of the points using Proposition 1. In block notation:

$$D_{14} = \begin{vmatrix} a & P^2 & P^3 & P^4 \\ 0 & \pi_2 & \pi_3 & \pi_4 \end{vmatrix} \begin{vmatrix} P^1 & P^2 & P^3 & b \\ \pi_1 & \pi_2 & \pi_3 & 0 \end{vmatrix} - \begin{vmatrix} b & P^2 & P^3 & P^4 \\ 0 & \pi_2 & \pi_3 & \pi_4 \end{vmatrix} \begin{vmatrix} P^1 & P^2 & P^3 & a \\ \pi_1 & \pi_2 & \pi_3 & 0 \end{vmatrix}.$$

Developing the determinants along their last line, we get:

$$D_{14} = \begin{vmatrix} \langle a, \Phi \rangle & \langle b, \Phi \rangle \\ \langle a, \Lambda \rangle & \langle b, \Lambda \rangle \end{vmatrix} = \langle a, \Phi \rangle \langle b, \Lambda \rangle - \langle b, \Phi \rangle \langle a, \Lambda \rangle,$$

with:

$$\Phi = \pi_2\ P^3 \times P^4 - \pi_3\ P^2 \times P^4 + \pi_4\ P^2 \times P^3, \qquad \Lambda = -\pi_1\ P^2 \times P^3 + \pi_2\ P^1 \times P^3 - \pi_3\ P^1 \times P^2.$$

Using relation (4), we obtain $D_{14} = \langle a \times b,\ \Phi \times \Lambda \rangle = \langle u,\ \Phi \times \Lambda \rangle$. From equation (3), we also have:

$$\Phi \times \Lambda = \big(\pi_4\ P^2 \times P^3 - (\pi_3 P^2 - \pi_2 P^3) \times P^4\big) \times \Lambda = \pi_4\ (P^2 \times P^3) \times \Lambda - \langle \pi_3 P^2 - \pi_2 P^3,\ \Lambda \rangle P^4 + \langle P^4, \Lambda \rangle (\pi_3 P^2 - \pi_2 P^3).$$

From the definition of $\Lambda$, using again formula (3), we get:

$$\langle \pi_3 P^2 - \pi_2 P^3,\ \Lambda \rangle = \pi_3 \langle P^2, \Lambda \rangle - \pi_2 \langle P^3, \Lambda \rangle = 0, \qquad (P^2 \times P^3) \times \Lambda = |P^1, P^2, P^3|\ (\pi_3 P^2 - \pi_2 P^3).$$

Thus $\Phi \times \Lambda = \big(\pi_4 |P^1, P^2, P^3| + \langle P^4, \Lambda \rangle\big) (\pi_3 P^2 - \pi_2 P^3)$. It is easy to show that

$$\pi_4 |P^1, P^2, P^3| + \langle P^4, \Lambda \rangle = \det(M).$$

Consequently $D_{14} = \det(M) \big(\pi_3 \langle u, P^2 \rangle - \pi_2 \langle u, P^3 \rangle\big)$. The other determinants are obtained similarly, so that equation (2) becomes:

$$\det(M)^2 \Big[ \big(\pi_3 \langle u, P^2 \rangle - \pi_2 \langle u, P^3 \rangle\big)\big(\pi_3 \langle v, P^2 \rangle - \pi_2 \langle v, P^3 \rangle\big) + \big(\pi_1 \langle u, P^3 \rangle - \pi_3 \langle u, P^1 \rangle\big)\big(\pi_1 \langle v, P^3 \rangle - \pi_3 \langle v, P^1 \rangle\big) + \big(\pi_2 \langle u, P^1 \rangle - \pi_1 \langle u, P^2 \rangle\big)\big(\pi_2 \langle v, P^1 \rangle - \pi_1 \langle v, P^2 \rangle\big) \Big] = 0.$$

Since $\det(M) \neq 0$, this is equivalent to (5). $\square$
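As a sanity check of Proposition 3, the following numerical sketch (numpy; the random camera, plane and all names are our own illustration, not the paper's) projects two orthogonal lines of a plane $\pi$ and evaluates the left-hand side of (5):

```python
import numpy as np

rng = np.random.default_rng(1)

P = rng.normal(size=(3, 4))                       # random camera
n = rng.normal(size=3); n /= np.linalg.norm(n)    # unit normal of pi
O = rng.normal(size=3)                            # a point on pi
pi = np.append(n, -n @ O)                         # pi = (n, -<n, O>)

# Two orthogonal directions lying inside the plane pi.
d1 = np.cross(n, rng.normal(size=3)); d1 /= np.linalg.norm(d1)
d2 = np.cross(n, d1)                              # in-plane, orthogonal to d1

def image_line(X, Y):
    """u = x cross y: image line of the 3D line (XY)."""
    x, y = P @ np.append(X, 1.0), P @ np.append(Y, 1.0)
    w = np.cross(x, y)
    return w / np.linalg.norm(w)

def lhs5(u, v):
    """Left-hand side of the orthogonality constraint (5)."""
    P1, P2, P3 = P[:, 0], P[:, 1], P[:, 2]        # columns P^1, P^2, P^3
    f = lambda w: np.array([pi[2] * (w @ P2) - pi[1] * (w @ P3),
                            pi[0] * (w @ P3) - pi[2] * (w @ P1),
                            pi[1] * (w @ P1) - pi[0] * (w @ P2)])
    return f(u) @ f(v)

u = image_line(O, O + d1)                         # image of (AB), direction d1
v = image_line(O + d1, O + d1 + d2)               # image of (CD), direction d2

print(lhs5(u, v))                                 # ~ 0: the 3D lines are orthogonal
print(lhs5(u, image_line(O, O + d1 + d2)))        # clearly nonzero at 45 degrees
```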
3.2. The parallelism constraint

Proposition 4. Assuming that $A, B, C, D$ do not belong to the plane at infinity, the lines $(AB)$ and $(CD)$ are parallel iff:

$$H_\infty(C)\ A \vee B \vee D - H_\infty(D)\ A \vee B \vee C = 0 \qquad (6)$$

Proof: Two lines not belonging to the plane at infinity are parallel iff they intersect at a point of the plane at infinity or, equivalently, iff the point at infinity of one of the lines (i.e. the unique intersection point of $H_\infty$ and of the 3D line) belongs to the other one. This last property can be written very naturally in the Grassmann-Cayley algebra:

$$(A \vee B) \vee (H_\infty \wedge (C \vee D)) = 0 \iff A \vee B \vee (H_\infty(C) D - H_\infty(D) C) = 0 \iff H_\infty(C)\ A \vee B \vee D - H_\infty(D)\ A \vee B \vee C = 0. \qquad \square$$

Proposition 5. The lines $(AB)$ and $(CD)$ are parallel iff:

$$\langle \pi_1 (P^3 \times P^2) + \pi_2 (P^1 \times P^3) - \pi_3 (P^1 \times P^2),\ u \times v \rangle = 0, \qquad (7)$$

where $u = a \times b$, $v = c \times d$ denote the image lines corresponding to $(AB)$ and $(CD)$, respectively; $u \times v$ is the intersection point of these two image lines.

Proof: Equation (6) is equivalent to

$$\forall I,\quad |H_\infty(C) D - H_\infty(D) C,\ A,\ B,\ I| = 0.$$

The mapping $M$ being invertible, we have (in block notation):

$$\det(M)\ |H_\infty(C) D - H_\infty(D) C,\ A,\ B,\ I| = \begin{vmatrix} H_\infty(C) d - H_\infty(D) c & a & b & (i_1, i_2, i_3)^t \\ 0 & 0 & 0 & i_4 \end{vmatrix},$$

where $i = (i_1 : i_2 : i_3 : i_4)$ is the image of $I$ by the mapping $M$. If $I$ does not belong to $\pi$ then $i_4 \neq 0$, and the equation becomes $|H_\infty(C) d - H_\infty(D) c,\ a,\ b| = 0$. Since $H_\infty(C) = C_4 = \langle c, \Lambda \rangle$ and $H_\infty(D) = D_4 = \langle d, \Lambda \rangle$, we have:

$$H_\infty(C)\ d - H_\infty(D)\ c = \langle \Lambda, c \rangle d - \langle \Lambda, d \rangle c = (c \times d) \times \Lambda.$$

Thus:

$$|H_\infty(C) d - H_\infty(D) c,\ a,\ b| = 0 \iff |(c \times d) \times \Lambda,\ a,\ b| = 0 \iff |c \times d,\ \Lambda,\ a \times b| = 0 \iff \langle \Lambda,\ (a \times b) \times (c \times d) \rangle = 0,$$

which is (7) since $\Lambda = \pi_1 (P^3 \times P^2) + \pi_2 (P^1 \times P^3) - \pi_3 (P^1 \times P^2)$. $\square$
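The same kind of numerical check applies to Proposition 5; a self-contained sketch (again with illustrative random data and our own names):

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.normal(size=(3, 4))                       # random camera
n = rng.normal(size=3); n /= np.linalg.norm(n)
O = rng.normal(size=3)
pi = np.append(n, -n @ O)                         # plane pi = (n, -<n, O>)
d1 = np.cross(n, rng.normal(size=3))              # direction inside pi
d2 = np.cross(n, d1)                              # second in-plane direction

def image_line(X, Y):
    x, y = P @ np.append(X, 1.0), P @ np.append(Y, 1.0)
    return np.cross(x, y)

# Lambda = pi1 (P^3 x P^2) + pi2 (P^1 x P^3) - pi3 (P^1 x P^2)
Lam = (pi[0] * np.cross(P[:, 2], P[:, 1])
       + pi[1] * np.cross(P[:, 0], P[:, 2])
       - pi[2] * np.cross(P[:, 0], P[:, 1]))

u = image_line(O, O + d1)                         # image of (AB)
v = image_line(O + d2, O + d2 + d1)               # image of (CD), parallel to (AB)

print(Lam @ np.cross(u, v))                       # ~ 0: constraint (7) holds
print(Lam @ np.cross(u, image_line(O, O + d2)))   # nonzero: lines not parallel
```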
4. Camera Calibration Using a Single View and a Map

In this section, we present a technique that allows the calibration of a camera using a single view obtained with this camera and a map of the scene. Our input data consists of correspondences (points or segments) between the image and the map, as well as some geometrical properties of the scene. Of particular interest are vertical segments and parallelisms and orthogonalities in horizontal or vertical planes. In the remainder, verticality is defined by the $y$ axis.

4.1. Estimation of $P^2$

What is the epipolar geometry between an image and a map? Every point of the map corresponds to the image of a vertical line in the view. Consequently, the epipole of the map in the image is at the intersection of all the image lines corresponding to 3D vertical lines.

Proposition 6. $P^2$ represents the coordinates of the epipole of the map in the image.

Proof: Let us consider a vertical line that goes through the point $M = (M_1 : 0 : M_3 : M_4)^t$. This line also contains the point of coordinates $(M_1 : 1 : M_3 : M_4)^t$. The image of this line is trivially written as the join of the projections of these two points:

$$(M_1 P^1 + M_3 P^3 + M_4 P^4) \vee (M_1 P^1 + P^2 + M_3 P^3 + M_4 P^4).$$

Developing this expression, we verify that $P^2$ always belongs to this image line, i.e.:

$$P^2 \vee (M_1 P^1 + M_3 P^3 + M_4 P^4) \vee (M_1 P^1 + P^2 + M_3 P^3 + M_4 P^4) = 0. \qquad \square$$

We have thus obtained a means to estimate $P^2$ linearly, up to a scale factor, as the common intersection of the images of the vertical lines. This factor is fixed using the constraint $\|P^2\| = 1$, which fixes the global scale factor of the matrix $P$.
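In practice, $P^2$ can be estimated from the detected images of vertical segments. A minimal sketch of this least-squares step (the SVD-based solver and the function name are our choice; the paper only states that the estimation is linear):

```python
import numpy as np

def estimate_P2(vertical_image_lines):
    """Epipole of the map in the image: the point lying on every image
    line u_k of a 3D vertical line, i.e. <u_k, P^2> = 0 for all k.
    Solved as a homogeneous least-squares problem with ||P^2|| = 1."""
    L = np.asarray(vertical_image_lines)          # one 3-vector line per row
    _, _, Vt = np.linalg.svd(L)
    return Vt[-1]                                 # unit-norm null vector of L
```

Each row of `L` is an image line, e.g. fitted to a vertical building edge; the recovered unit vector fixes the global scale of $P$ as described above.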
4.2. Estimation of the fundamental matrix

The fundamental matrix between the map and the view satisfies $m_v^t F m_m = 0$. It is easy to check that:

Proposition 7.

$$F = \big(P^1 \times P^2,\ P^3 \times P^2,\ P^4 \times P^2\big). \qquad (8)$$

Proof: Let $M = (M_1 : Y : M_3 : M_4)$ be a 3D point; its images in the map and in the view are respectively given by:

$$m_m = (M_1 : M_3 : M_4), \qquad m_v = PM = M_1 P^1 + Y P^2 + M_3 P^3 + M_4 P^4.$$

Consequently,

$$F m_m = M_1\ P^1 \times P^2 + M_3\ P^3 \times P^2 + M_4\ P^4 \times P^2 = (m_v - Y P^2) \times P^2 = m_v \times P^2$$

is the epipolar line defined by $m_v$ and the epipole $P^2$. $\square$

Proposition 8. Parallelisms in a known vertical plane give a constraint on the first two columns of $F$.

Proof: If $\pi$ is a vertical plane then $\pi_2 = 0$, and the parallelism constraint (7) gives:

$$\langle \pi_1\ P^3 \times P^2 - \pi_3\ P^1 \times P^2,\ u \times v \rangle = 0. \qquad (9)$$

By (8), this involves only the first two columns of $F$. Since the plane is known, $\pi_1, \pi_3$ are given by the line of the map corresponding to the projection of $\pi$. $\square$

$P^2$ being known (see the previous section), correspondences between points in the map and points or lines of the image, as well as parallel lines in known vertical planes (as per Proposition 8), give constraints that allow a linear estimation of $F$. Practically, we write a homogeneous linear system corresponding to these constraints in the unknowns $P^1$, $P^3$ and $P^4$, and this system is solved using least squares; a sketch of this construction is given below. We thus obtain an estimate $\tilde F$ of $F$ up to a scale factor. By construction, $F$ will be of rank 2 with $P^2$ as a left kernel.
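To make the construction of this linear system explicit, here is a minimal sketch (function names and data layout are our own illustration; a real implementation would also normalize the input coordinates before the SVD):

```python
import numpy as np

def estimate_F(p2, point_pairs, vert_parallel_pairs):
    """Linear estimation of F = (P^1 x P^2, P^3 x P^2, P^4 x P^2) with
    P^2 = p2 known. Unknowns x = (P^1, P^3, P^4) stacked in R^9.
    point_pairs: list of (m_v, m_m) homogeneous view/map correspondences,
                 map points given as (M1 : M3 : M4).
    vert_parallel_pairs: list of ((u, v), (pi1, pi3)) as in Proposition 8."""
    rows = []
    for m_v, m_m in point_pairs:
        # m_v^t F m_m = sum_j m_m[j] * <P^j', p2 x m_v>  (triple product)
        k = np.cross(p2, m_v)
        rows.append(np.concatenate([m_m[0] * k, m_m[1] * k, m_m[2] * k]))
    for (u, v), (pi1, pi3) in vert_parallel_pairs:
        # (9): pi1 <P^3, p2 x w> - pi3 <P^1, p2 x w> = 0, with w = u x v
        k = np.cross(p2, np.cross(u, v))
        rows.append(np.concatenate([-pi3 * k, pi1 * k, np.zeros(3)]))
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    x = Vt[-1]                                   # homogeneous least squares
    return np.column_stack([np.cross(x[0:3], p2),
                            np.cross(x[3:6], p2),
                            np.cross(x[6:9], p2)])  # rank 2, p2 in left kernel
```

With noise-free data this recovers (8) up to scale; building the columns as cross products with `p2` enforces the rank-2 structure by construction.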
4.3. Estimating the projection matrix up to four parameters

Given the estimates of $P^2$ and of $\tilde F$, it is possible to obtain linearly some information about the other columns of the matrix $P$.

Proposition 9. There are three vectors $P_0^1, P_0^3, P_0^4$ and four scalars $\lambda \neq 0,\ \mu_1, \mu_3, \mu_4 \in \mathbb{R}$ such that:

$$\lambda P^i = P_0^i + \mu_i P^2, \qquad i \in \{1, 3, 4\}.$$

Proof: Let us define, for $i \in \{1, 3, 4\}$:

$$l(i) = \begin{cases} 1 & \text{if } i = 1, \\ i - 1 & \text{if } i \neq 1. \end{cases}$$

For each $i \in \{1, 3, 4\}$, we have $\tilde F^{l(i)} = \lambda\ P^i \times P^2$ with $\lambda \neq 0$. Taking the cross product of this formula by $P^2$ (remember that $\|P^2\| = 1$) and applying formula (3), we obtain:

$$P^2 \times \tilde F^{l(i)} = \lambda \big(P^i - \langle P^2, P^i \rangle P^2\big).$$

Defining $P_0^i = P^2 \times \tilde F^{l(i)}$ and denoting by $\mu_i$ the unknown quantity $\lambda \langle P^2, P^i \rangle$, we get the desired result. $\square$
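Extracting the $P_0^i$ from $\tilde F$ is then immediate; a possible sketch (our own naming):

```python
import numpy as np

def P0_columns(F_tilde, p2):
    """Proposition 9: P0^i = P^2 x F~^{l(i)} for i in {1, 3, 4},
    where l(1), l(3), l(4) = 1, 2, 3 index the columns of F~."""
    P0_1 = np.cross(p2, F_tilde[:, 0])
    P0_3 = np.cross(p2, F_tilde[:, 1])
    P0_4 = np.cross(p2, F_tilde[:, 2])
    return P0_1, P0_3, P0_4
```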
4.4. Estimating the projection matrix up to two parameters

Proposition 10. Let $(ab)$ and $(cd)$ be the image lines corresponding to the 3D lines $(AB)$ and $(CD)$ of the scene, respectively. If $(CD)$ is a vertical line, and if $(AB)$ and $(CD)$ belong to the vertical plane $\pi$ and are orthogonal, then:

$$\langle v,\ \pi_3 P_0^1 - \pi_1 P_0^3 \rangle \Big( \langle u,\ \pi_3 P_0^1 - \pi_1 P_0^3 \rangle + (\pi_3 \mu_1 - \pi_1 \mu_3) \langle u, P^2 \rangle \Big) = 0, \qquad (10)$$

where $u = a \times b$ and $v = c \times d$.

Proof: If $\pi$ is a vertical plane, $\pi_2 = 0$, and since $v$ is the image of a vertical line, $\langle v, P^2 \rangle = 0$. Substituting in the orthogonality constraint (5), we have:

$$\big(\pi_3 \langle v, P^1 \rangle - \pi_1 \langle v, P^3 \rangle\big) \big(\pi_3 \langle u, P^1 \rangle - \pi_1 \langle u, P^3 \rangle\big) = 0.$$

Substituting $\lambda P^1 = P_0^1 + \mu_1 P^2$ and $\lambda P^3 = P_0^3 + \mu_3 P^2$ in the above equation and using $\langle v, P^2 \rangle = 0$, we get:

$$\langle v,\ \pi_3 P_0^1 - \pi_1 P_0^3 \rangle \Big( \langle u,\ \pi_3 P_0^1 - \pi_1 P_0^3 \rangle + (\pi_3 \mu_1 - \pi_1 \mu_3) \langle u, P^2 \rangle \Big) = 0.$$

Since $\lambda \neq 0$, we obtain the announced result. $\square$

Proposition 11. Let $(ab)$ and $(cd)$ be the image lines corresponding to the 3D lines $(AB)$ and $(CD)$ of the scene, respectively. If $(AB)$ and $(CD)$ are in a horizontal plane and orthogonal, then:
$$\langle u, P_0^1 \rangle \langle v, P_0^1 \rangle + \langle u, P_0^3 \rangle \langle v, P_0^3 \rangle + \mu_1 \big( \langle u, P^2 \rangle \langle v, P_0^1 \rangle + \langle u, P_0^1 \rangle \langle v, P^2 \rangle \big) + \mu_3 \big( \langle u, P_0^3 \rangle \langle v, P^2 \rangle + \langle u, P^2 \rangle \langle v, P_0^3 \rangle \big) + \big( \mu_1^2 + \mu_3^2 \big) \langle u, P^2 \rangle \langle v, P^2 \rangle = 0. \qquad (11)$$

Proof: Since $\pi$ is a horizontal plane, $\pi_1 = \pi_3 = 0$ and $\pi_2 \neq 0$. Substituting these in equation (5), we obtain:

$$\langle u, P^1 \rangle \langle v, P^1 \rangle + \langle u, P^3 \rangle \langle v, P^3 \rangle = 0. \qquad (12)$$

The announced result is obtained by substituting $\lambda P^1 = P_0^1 + \mu_1 P^2$ and $\lambda P^3 = P_0^3 + \mu_3 P^2$ and using the linearity of the scalar product. $\square$
Proposition 12. Let $(ab)$ and $(cd)$ be the image lines corresponding to the 3D lines $(AB)$ and $(CD)$ of the scene, respectively. If $(AB)$ and $(CD)$ are in a horizontal plane and parallel, then:

$$\langle P_0^1 \times P_0^3 + (\mu_3 P_0^1 - \mu_1 P_0^3) \times P^2,\ u \times v \rangle = 0. \qquad (13)$$

Proof: If $\pi$ is a horizontal plane then $\pi_1 = \pi_3 = 0$ and $\pi_2 \neq 0$. Thus equation (7) becomes:

$$\pi_2\ \langle P^1 \times P^3,\ u \times v \rangle = 0.$$

Substituting $\lambda P^1 = P_0^1 + \mu_1 P^2$ and $\lambda P^3 = P_0^3 + \mu_3 P^2$, we obtain:

$$\langle P_0^1 \times P_0^3 + (\mu_3 P_0^1 - \mu_1 P_0^3) \times P^2,\ u \times v \rangle = 0.$$

Since $\lambda \neq 0$, we obtain the result. $\square$
We have thus obtained two kinds of geometrical constraints that can be used to estimate, up to a scale factor, the values of $\lambda, \mu_1, \mu_3$ by solving a homogeneous linear system in the least-squares sense (see the sketch below). The orthogonality constraints (11) can be used in a non-linear refinement of the solution. We thus obtain the matrix $P$ up to the two parameters $\lambda$ and $\mu_4$:

$$\lambda P = \big(P_0^1 + \mu_1 P^2,\ \lambda P^2,\ P_0^3 + \mu_3 P^2,\ P_0^4 + \mu_4 P^2\big). \qquad (14)$$

To obtain a full estimation of the projection matrix, it is then sufficient to set an origin and a scale factor along the vertical axis. Consequently, the knowledge of the heights of two points in the scene is enough to fully recover $P$.
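As an illustration of this last linear step, here is a minimal sketch (the homogenizing unknown $t$ and all names are our own formulation; the paper only specifies a homogeneous least-squares solve). Only the second factor of (10) is used, since the first factor does not involve the unknowns:

```python
import numpy as np

def estimate_mu(P0_1, P0_3, p2, vert_orth, horiz_par):
    """Stack constraints (10) and (13) as a homogeneous system in
    (t, mu1, mu3), t being a homogenizing unknown, and solve by SVD.
    vert_orth: list of (u, v, pi1, pi3) for Proposition 10 (v vertical).
    horiz_par: list of (u, v) for Proposition 12."""
    rows = []
    for u, v, pi1, pi3 in vert_orth:
        # second factor of (10):
        # <u, pi3 P0^1 - pi1 P0^3> t + (pi3 mu1 - pi1 mu3) <u, P^2> = 0
        rows.append([u @ (pi3 * P0_1 - pi1 * P0_3),
                     pi3 * (u @ p2), -pi1 * (u @ p2)])
    for u, v in horiz_par:
        w = np.cross(u, v)
        # (13): <P0^1 x P0^3, w> t - <P0^3 x P^2, w> mu1
        #       + <P0^1 x P^2, w> mu3 = 0
        rows.append([w @ np.cross(P0_1, P0_3),
                     -w @ np.cross(P0_3, p2),
                     w @ np.cross(P0_1, p2)])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    t, m1, m3 = Vt[-1]                           # defined up to scale
    return m1 / t, m3 / t                        # mu1, mu3 (t must be nonzero)
```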
5. Experiments

To assess the potential of the method, various experiments have been run. We illustrate the method on two such experiments. The first experiment uses a calibration pattern, as shown in figure 3. The results obtained after full calibration by the procedure explained in this article (Fundamental matrix computation, then projection matrix up to four, two and zero parameters using successive constraints) are:

$$P_0 = \begin{pmatrix} 5.77 \cdot 10^{-1} & -2.09 \cdot 10^{-3} & 3.71 \cdot 10^{-1} & 1.30 \cdot 10^{2} \\ -1.71 \cdot 10^{-2} & 9.96 \cdot 10^{-1} & 6.67 \cdot 10^{-2} & 3.17 \cdot 10^{1} \\ 3.43 \cdot 10^{-4} & 8.04 \cdot 10^{-5} & -3.56 \cdot 10^{-4} & 1 \end{pmatrix},$$

$$P_2 = \begin{pmatrix} 5.55 \cdot 10^{-1} & -2.82 \cdot 10^{-2} & 3.47 \cdot 10^{-1} & 1.31 \cdot 10^{2} \\ -3.21 \cdot 10^{-2} & 9.52 \cdot 10^{-1} & 6.56 \cdot 10^{-2} & 3.41 \cdot 10^{1} \\ 2.68 \cdot 10^{-4} & -1.54 \cdot 10^{-5} & -3.78 \cdot 10^{-4} & 1 \end{pmatrix},$$

where $P_0$ is the result obtained without any noise and $P_2$ is the one obtained when adding a 2.0 standard deviation Gaussian noise.
Figure 3. A calibration pattern with its corresponding map. The geometric primitives on which parallelism and orthogonality constraints are used are overlaid on the calibration pattern image.
Figure 4. The London images and the geometric constraints used for calibration.
With no added noise, the result is in perfect accordance with the known theoretical projection matrix. When noise is added, while the results are not perfect, it should be remembered that they were obtained in a purely linear fashion; adding some non-linear optimisation procedure after each stage of the method should further improve the quality of the result. The second experiment has been conducted with the London images shown in the introduction. Since this is a real situation, with which standard calibration techniques have failed, a reasonable idea of the final projection matrix that should be recovered is not available. Thus, we show the result of the algorithm after the Fundamental matrix computation stage. Figure 4 shows the primitives used in each image. As can be seen in figures 5 and 6, the epipolar geometry between the view and the map computed when using constraints is much better than the one obtained from the viewing geometry alone (i.e. from point matches between the map and the image). For one, the viewer is no longer situated in the Thames but is correctly positioned on the other bank. Moreover, the epipolar lines in the image now correspond to images of vertical lines all over the image, which is usually not the case with standard methods.
6. Conclusion and perspectives

In this paper, we have addressed the problem of calibrating a camera from a single view and a map of the scene. Some special cases of geometrical 3D constraints that occur often in practice have been shown to provide linear equations on fundamental objects of computer vision such as the Fundamental matrix or the projection matrix. A procedure for obtaining the full calibration of a camera from these in a linear fashion has been outlined. Of course, for even better results, various non-linear optimisation stages must be added to this procedure. The examples show nonetheless how useful scene constraints can be in
Figure 5. Epipolar geometry between the view and the map using the standard methods.
References
Figure 6. Epipolar geometry between the view and the map using the geometrical constraints shown in figure 4.
some situations. Obviously, there is a whole domain of extensions that has not been considered at all in this article: the case of multiple views of the scene plus the map information. In this case too, a lot can be done to integrate scene constraints into the calibration process. This will be the subject of a forthcoming article.
Acknowledgments. This work is partially supported by the European project CUMULI (LTR 21 914). The images and map of London have been kindly provided by Didier Stricker from Fraunhofer IGD.
[1] B. Triggs. Autocalibration from planar scenes. In H. Burkhardt and B. Neumann, editors, Proceedings of the 5th European Conference on Computer Vision, volume I of Lecture Notes in Computer Science, pages 89–105, Freiburg, Germany, June 1998. Springer-Verlag.
[2] B. Triggs. Autocalibration and the absolute quadric. In IEEE International Conference on Computer Vision and Pattern Recognition, pages 609–614, 1997.
[3] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000.
[4] S. J. Maybank and O. D. Faugeras. A theory of self-calibration of a moving camera. The International Journal of Computer Vision, 8(2):123–152, Aug. 1992.
[5] D. Liebowitz and A. Zisserman. Combining scene and auto-calibration constraints. In Proceedings of the 7th International Conference on Computer Vision, Kerkyra, Greece, Sept. 1999. IEEE Computer Society Press.
[6] M. Armstrong, A. Zisserman, and R. Hartley. Self-calibration from image triplets. In Fourth European Conference on Computer Vision, pages 3–16, Apr. 1996.
[7] M. Pollefeys and L. Van Gool. Stratified self-calibration with the modulus constraint. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(8):707–724, Aug. 1999.
[8] N. Navab, Y. Genc, and M. Appel. Lines in one orthographic and two perspective views. In Proceedings of the International Conference on Computer Vision and Pattern Recognition, volume 2, pages 607–614, Hilton Head Island, SC, June 2000. IEEE Computer Society.
[9] Q. Luong and O. Faugeras. Self-calibration of a moving camera from point correspondences and fundamental matrices. The International Journal of Computer Vision, 22(3):261–289, 1997.
[10] M. Appel and N. Navab. Registration of technical drawings and calibrated images for industrial augmented reality. In IEEE Workshop on Applications of Computer Vision, pages 48–55, Palm Springs, CA, Dec. 2000.
[11] Z. Zhang, P. Anandan, and H. Shum. What can be determined from a full and a weak perspective image? In Proceedings of the 7th International Conference on Computer Vision, pages 680–687, Kerkyra, Greece, 1999. IEEE Computer Society Press.