Image and Vision Computing 23 (2005) 517–539 www.elsevier.com/locate/imavis

Geometric calibration of digital cameras through multi-view rectification

Luca Lucchese*
School of Electrical Engineering and Computer Science, Oregon State University, 313 Owen Hall, Corvallis, OR 97331, USA

Received 4 April 2004; received in revised form 30 September 2004; accepted 14 January 2005

Abstract

This paper introduces a new and very effective method for high-precision geometric calibration of digital cameras. The internal and external geometry are estimated: (1) by extracting features with subpixel accuracy from various views of a planar calibration plate; and (2) by mapping these feature sets into the corresponding points of the undistorted and rectified image that would be generated by an ideal pinhole digital camera with the same focal length as the camera to calibrate but devoid of lens distortion and perspective warp. The rectification of the views is formulated as a nonlinear least-squares optimization problem where a quadratic cost function expressing the residual registration error has to be minimized. The views are first prealigned with the reference image by means of a simplified mathematical model. This initialization, together with the closed-form computation of the gradient of the cost function, allows the Levenberg-Marquardt algorithm employed to find its minimum to rapidly converge to the optimal solution. The performance of the new calibration algorithm is tested with a set of real images available on the Internet and discussed in the paper. Also, its accuracy is assessed by means of synthetic versions of the actual images generated with the estimated parameters. © 2005 Elsevier B.V. All rights reserved.

Keywords: CCD and CMOS digital cameras; Pinhole camera; 2-D projective transformation; 2-D homography; Image rectification; Camera calibration; Intrinsic and extrinsic parameters; Focal length; Pixel aspect ratio; Lens distortion

1. Introduction

Camera calibration is one of the classical problems of photogrammetry and computer vision, where many tasks require the computation of accurate metric information from images. Calibrating a camera consists in determining the transformation which maps 3-D points of a certain scene or object into their corresponding 2-D projections onto the image plane of the camera [1,2]. This transformation depends on two sets of parameters: (1) the extrinsic (or external) parameters, which define the camera's pose, i.e. its position and orientation in space with respect to a given reference system; and (2) the intrinsic (or internal) parameters, which describe the internal geometry of the camera, i.e. the image formation process through its optical system. For the determination of the external geometry of a CCD or CMOS digital camera, one has to estimate the 3-D rigid transformation which relates the 3-D reference system

associated with the camera (camera coordinate system (CCS)) with the 3-D reference system associated with the imaged scene (object coordinate system (OCS)). This transformation has six degrees of freedom: the three components of the translational displacement and the three degrees of freedom of the rotation matrix. The intrinsic parameters fall into two subsets. The first includes the effective focal length, the coordinates of the principal point, and the pixel aspect ratio of the CCD or CMOS sensor (they are defined in Section 3). An ideal digital camera can be modeled mathematically as a pinhole imaging device where these values completely describe the mapping of 3-D scene points into corresponding 2-D image points through the laws of projective geometry. The pinhole model, however, fails to provide an adequate description of the image formation process taking place in real acquisition devices, since their lens systems may introduce various types of geometric and chromatic aberrations. Geometric distortion is usually modeled with appropriate polynomials in the 2-D coordinates of the image plane which account for both radial and decentering, or tangential, components [3]; the coefficients of these polynomials constitute the second set of intrinsic parameters.


The literature on geometric camera calibration offers a number of algorithms which deliver estimates of all, or some of, the intrinsic and extrinsic parameters with various degrees of accuracy. Some representative techniques are reviewed in Section 2. Based on the calibration procedure, they can be divided into:

(1) Methods estimating the external parameters and some of the internal parameters by exploiting the correspondences between 3-D points on a convenient calibration target (either 2-D or 3-D) and the relative 2-D features measured in one or more images of the target;
(2) Methods estimating the external parameters and some of the internal parameters without any calibration target by exploiting self-consistency constraints across different views of a static and rigid 3-D scene; they are usually referred to as self-calibration or auto-calibration techniques;
(3) Methods estimating some of the intrinsic parameters by enforcing priors such as straightness of lines in one or more views of a scene containing rectilinear features.

The algorithm advanced in this paper can be considered as an extension of the methods of the third category, where all the intrinsic and extrinsic parameters are estimated very accurately by registering different views of a planar calibration plate (for a reference to plane-based camera calibration, the reader is referred to Sturm and Maybank's work [4]) with a regular pattern onto an undistorted and rectified (for the concept of metric rectification, the reader is referred to Liebowitz and Zisserman's work [5]) image [2] of the plate itself. Such an image would be generated by an ideal pinhole digital camera with the same focal length as the camera to calibrate but devoid of lens distortion and perspective warp. The feature measurements from the various views are mapped into the image plane of this virtual view according to an image warping model which includes all internal and external parameters. The two geometric priors exploited by our algorithm are the flatness of the plate and the regularity of the pattern printed on it.

The calibration algorithm consists of two stages and is applied to sets of features extracted from different views of the plate. Even though our algorithm works with features obtained from any kind of planar target, a regular black-and-white checkerboard is considered in this paper, since the X-junctions in its images can be readily extracted with subpixel accuracy with the method advanced in [6]. The first stage estimates the 2-D homographies which relate each view of the calibration plate to the undistorted and rectified reference image; this stage also returns rough estimates of the most important internal parameters of the camera: the coordinates of the principal point and the first radial distortion coefficient. In the second stage, all the views are registered onto the reference image yielding the internal geometry of

the camera, expressed by 10 parameters, and the external geometries of all the views used for calibration. The estimates from the first stage are used to initialize the Levenberg-Marquardt algorithm in the solution of the nonlinear minimization problem of the second stage, and convergence to the optimal solution is achieved in a few iterations. The partial derivatives of the quadratic cost functions defined in the two stages for view rectification are computed with respect to all the internal and external parameters, allowing the closed-form computation of the gradients and Hessian matrices used by the Levenberg-Marquardt algorithm in their minimization. This provides a major computational advantage with respect to techniques that estimate these parameters numerically.

The high accuracy of our feature extractor (image features are obtained with an accuracy of the order of a few hundredths of a pixel in low-noise conditions [6]) and the ease of implementation of our multi-view rectification algorithm make up a powerful tool for the complete, reliable, and high-precision calibration of digital cameras affected by any kind of lens distortion, making it very attractive for a wide range of applications both in computer vision and in photogrammetry. The algorithm has been tested with images taken with different kinds of CCD and CMOS sensors showing a wide range of lens distortion, from mild to severe. The experimental results reported in this paper refer to the image dataset acquired by Jean-Yves Bouguet and available on the Internet [7]. This choice was made to give other researchers an easily accessible benchmark against which to compare the performance of our algorithm. The estimation accuracy of our calibration method was evaluated by comparing the results obtained with Bouguet's images with the results obtained with synthetic versions of these images. In the first case, the residual multi-view rectification error is about 0.2 pixels; in the second, about 0.01 pixels. The different precision in the two sets of experiments can be explained (1) by the presence of digitization noise in the real images, (2) by the possible deviation from perfect flatness of the calibration plate, and (3) by the likely presence of distortion components not accounted for in the mathematical model describing geometric lens aberrations. However, the fact that both internal and external parameters obtained from the synthetic images closely match the corresponding parameters estimated from the actual images clearly confirms the high accuracy and the convergence to the optimal solution of our algorithm.

The major contributions of this paper, preliminarily presented in [8-10], can be summarized as follows. (1) The concept of estimating all the internal and external parameters through image rectification is unprecedented, and the two-step algorithm we propose constitutes an effective method for accomplishing this task. (2) The external parameters of each view and the focal length of the camera are estimated from the relative homography coefficients once the pixel aspect ratio has been computed as the zero-crossing of a simple nonlinear equation built upon the cosines of the 3-D rotation angles. This result represents an attractive alternative for the estimation of the pose of a camera.

This paper has six sections. Section 2 reviews relevant work in camera calibration. Section 3 presents the camera model and the calibration geometry used by our algorithm. Section 4 provides the details of the three algorithmic steps for camera calibration. Section 5 reports and discusses the experimental results with both real and synthetic data. Section 6 draws the conclusions.

2. Related work

The bundle adjustment method is one of the classical camera calibration algorithms advanced in analytical photogrammetry [11]. The internal and external parameters are recovered by minimizing the mean-squared distances between the observed positions on the camera's image plane of some fiducial 3-D points of a calibration plate and their predicted locations based on the image formation process derived from the laws of perspective projection and the mathematical model of geometric lens distortion. This method is computationally intensive, since a large-scale nonlinear optimization problem has to be solved, and requires good initial estimates of the parameters in order to converge to the optimal solution.

Many authors have proposed simplified or linearized versions of the bundle adjustment method. Abdel-Aziz and Karara advanced the direct linear transformation (DLT) technique, where lens distortion is not taken into account and the coefficients of the 3×4 projective transformation matrix in homogeneous coordinates are estimated with a linear least-squares algorithm [12]. Tsai extended this model by adding radial lens distortion through the radial alignment constraint [13]; in this method, which has become widely popular in the computer vision community, as well as in its extension [14], the internal and external parameters are decoupled by decomposing the camera model into linear and nonlinear components. A more complex model of lens distortion is adopted in the two-step method of [15]: in the first step, the calibration parameters are estimated in closed form by ignoring lens distortion; in the second step, these are refined with an iterative nonlinear optimization algorithm. The progressive refinement of the mathematical model of the camera and its use in different steps of the calibration procedure appears to be the most effective strategy for the accurate estimation of the internal and external geometry of the camera. The algorithm proposed in this paper follows a similar approach. Several extensions of the two-step method have been proposed, e.g. the three-step method of [16] and the four-step method of [17].

Some authors have investigated the possibility of estimating intrinsic and extrinsic parameters through self-calibration techniques; this is usually accomplished by enforcing self-consistency constraints between sets of views of a scene. In their pioneering work, Maybank and Faugeras [18] showed that the absolute conic provides the relationship between camera calibration and epipolar geometry. Several other authors have further developed this theory by considering fixed or varying intrinsic parameters and various types of camera motions and scene geometry [19-27]. Self-calibration has also been demonstrated by using images of surfaces of revolution [28] and by exploiting image sequences where objects follow trajectories dictated by the law of gravity [29]. The self-calibration algorithms usually adopt five intrinsic parameters: the effective focal length of the camera, the pixel aspect ratio, the skew angle between the axes of the image plane, and the coordinates of the principal point.

There exist other techniques which employ more sophisticated models to describe the internal geometry of the camera and achieve high-precision calibration by means of 2-D or 3-D targets. One term of the polynomial expansion modeling radial distortion is used in the algorithm of [30]; two terms are adopted instead by Zhang [31,32], by Chatterjee and Roychowdhury [33], and by Pedersini et al. [34]. Heikkilä and Silvén also add two decentering distortion terms [17,35,36]. A similar model is also adopted in [37] for the calibration of zoom lenses. Three radial distortion parameters and two decentering distortion parameters are used in the camera models of [38,39].

Zhang's calibration method [31,32], which adopts a planar and regular target like the technique advanced in this paper, has become very popular in the computer vision community. Moreover, Bouguet's Matlab Camera Calibration Toolbox, which is the term of comparison of our algorithm, shares similar characteristics with Zhang's method. Therefore, the latter will be briefly reviewed. Zhang's algorithm retrieves the internal and external parameters by estimating the homography between a model plane containing the calibration target and several images of this target. A simple two-parameter warping model can be plugged into the algorithm in order to account for lens distortion. Only the first two radial distortion coefficients, as mentioned above, are considered; they are estimated with a linear least-squares algorithm once the other internal and external parameters have been obtained by minimizing an appropriate cost function [31,32]. These two steps are then applied alternately until convergence is reached. Zhang's algorithm cannot, however, be straightforwardly extended to higher-order lens distortion models, since tangential distortion coefficients would produce nonlinear equations. Unlike Zhang's technique, the algorithm proposed in this paper estimates, within a single framework, the internal and external parameters by modeling the combined warping due to perspective tilt (2-D homography) and lens distortion which relates the various images of the calibration target and a virtual image representing a rectified and distortion-compensated version of all the views. Furthermore, unlike


Zhang's technique, the tangential components of lens distortion are estimated as well, thus extending the range of applicability of the algorithm.

A final note concerns a family of algorithms which estimate the lens distortion parameters without the use of calibration objects and, therefore, without knowledge of 3-D metric structure. They exploit the fact that an ideal projective camera maps 3-D straight lines in space into 2-D straight lines on the image plane. Therefore, by enforcing the straightness of curved image features which are supposed to be rectilinear, it is possible to estimate the geometric aberration induced by the lens [3,40-43]. Stein's method utilizes epipolar and trilinear constraints between, respectively, pairs and triplets of views to estimate radial distortion [44]. Other authors have advanced motion estimation and image mosaicking algorithms which take into account lens distortion through simple models described by reduced sets of parameters [45,46]. Still others estimate the intrinsic parameters through the optical flow induced on the image plane by deliberate camera motions [47,48]. Some researchers have also analyzed the dependence of camera calibration accuracy on measurement errors on the image plane and on the calibration target [38,49,50].

3. Camera model and calibration geometry

This section presents the mathematical model and the geometry adopted for the calibration of a digital camera $C(\mathbf{z}_i)$; this notation stresses the dependence of the image formation process of the camera on the internal parameters $\mathbf{z}_i$ which describe the geometric distortion produced by its optical system. Based on these parameters, the geometric aberrations induced by the lens, or lens system, of the camera can be corrected and the images acquired with the camera $C(\mathbf{z}_i)$ can be transformed into images devoid of any distortion. Such images would be equivalently generated by an ideal pinhole digital camera $\mathcal{P}$ mapping 3-D object points into 2-D image points through a perspective projection [1,2].

3.1. Problem formulation

Let $\{I_d^{(k)}(\mathbf{x})\}_{k=1,\dots,K}$, $\mathbf{x} \triangleq [x\ y]^T \in \mathbb{R}^2$, denote a set of images of a planar calibration plate $O$ taken with $C(\mathbf{z}_i)$ from $K$ different viewpoints. (Although digital images are naturally defined over a finite, discrete domain, for convenience they will be defined over $\mathbb{R}^2$.) The calibration plate may be any regular 2-D pattern printed on a flat surface whose salient features or markers can be extracted with subpixel accuracy from images of the plate itself; the only two requirements for the acquisition of these images are: (1) that the whole pattern be contained within the image, and (2) that the angle between the optical axis of the camera and the normal to the plane containing the pattern vary across different views. A regular black-and-white checkerboard is usually adopted as the pattern on the calibration plate because its X-junctions (or corners) can be easily extracted and used as fiducial points. In this paper, we have used it too for two reasons: (1) image features can be obtained with subpixel accuracy through the algorithm of [6], which has a precision of the order of a few hundredths of a pixel; (2) the performance of the camera calibration algorithm presented in this paper is compared with that of a widely used calibration toolbox implemented in Matlab by Jean-Yves Bouguet and available on the Internet [7]; this method uses a regular checkerboard as well. Rather than using our own images, we have thus preferred to use Bouguet's images; Fig. 1 shows four of the 25 images from the dataset of [7].

Fig. 1. Images 8, 13, 18, and 21 (a-d, respectively) from the image dataset of [7].

Our objective is to estimate the internal parameters $\mathbf{z}_i$ characterizing the camera $C(\mathbf{z}_i)$ by mapping the features extracted from the $K$ views of the calibration plate $O$ into the corresponding points of a convenient reference frame devoid of any lens distortion and perspective warp. This undistorted and rectified image would be generated by an ideal pinhole digital camera $\mathcal{P}$ having the same focal length as $C(\mathbf{z}_i)$, two of its CCS axes parallel to the sides of the plate, and the third, corresponding to the optical axis, orthogonal to the calibration plate and intersecting the plate at its center; this geometric arrangement will be referred to throughout as the centered fronto-parallel (CFP) configuration with respect to $O$.

3.2. External geometry of the camera and perspective projection equations

Fig. 2 shows the arrangement of the two cameras $C(\mathbf{z}_i)$ and $\mathcal{P}$ with respect to the plane containing the calibration plate $O$; let $O_o X_o Y_o Z_o$ denote the left-handed OCS attached to $O$. Let $O'X'Y'Z'$ denote the left-handed CCS associated with $\mathcal{P}$; its origin $O'$ is located at the focal point (pinhole) of the camera, the $X'$-axis and $Y'$-axis are parallel to the sides of the calibration plate $O$, and the $Z'$-axis, the optical axis, is orthogonal to it. Let $o'x'y'$ be the coordinate system of the image plane of $\mathcal{P}$, which is parallel to the plane $X'Y'$ and located at a distance $f$ from it, $f$ being the effective focal length of $\mathcal{P}$. (The plane indexed by $o'x'y'$ is a virtual image plane since it is placed between the object $O$ and the focal point $O'$ [1]; the actual image plane is located at $Z' = -f$. The virtual image plane is used for convenience since its axes $x'$ and $y'$ are, respectively, equioriented with $X'$ and $Y'$, thereby eliminating the upside-down effect of objects on the actual image plane. It is also assumed that the pinhole camera is focused at infinity, so that the distance between the pinhole and the image plane $x'y'$ is equal to the focal length $f$ [1].) The intersection $o'$ of the optical axis $Z'$ with the image plane $x'y'$ is called the principal point of the camera $\mathcal{P}$. Based on this pinhole camera model, a 3-D point $(X', Y', Z')$ in the CCS of $\mathcal{P}$ is mapped into the 2-D point


$(x', y')$ on its image plane according to the two perspective projection equations

$$x' = f_x \frac{X'}{Z'}, \qquad y' = f_y \frac{Y'}{Z'}, \tag{1}$$

where $f_x \triangleq k f$ and $f_y \triangleq l f$; $w \triangleq 1/k$ and $h \triangleq 1/l$ are, respectively, the width and the height of a pixel [1]; $\rho \triangleq f_y/f_x = l/k = w/h$ defines the pixel aspect ratio. If $f$ is measured in meters, $k$ and $l$ are expressed in pixels/meter. Eq. (1) can also be conveniently expressed in homogeneous coordinates [1,2] as

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & 0 & 0 \\ 0 & f_y & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X' \\ Y' \\ Z' \\ 1 \end{bmatrix}. \tag{2}$$
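As a quick illustration of Eq. (1), the following Python sketch (not part of the original paper; the focal lengths are placeholder values in pixel units, of the same order as those estimated in Section 5) projects points expressed in the pinhole camera's CCS onto its image plane:

```python
import numpy as np

def project_pinhole(X, fx, fy):
    """Project 3-D points given in the CCS of the ideal pinhole camera
    onto its image plane, following Eq. (1): x' = fx X'/Z', y' = fy Y'/Z'.

    X : (N, 3) array of points (X', Y', Z').
    Returns an (N, 2) array of image-plane coordinates (x', y')."""
    X = np.asarray(X, dtype=float)
    return X[:, :2] * np.array([fx, fy]) / X[:, 2:3]

# A point on the optical axis projects to the principal point (the origin
# of the o'x'y' system); off-axis points are scaled by fx/Z' and fy/Z'.
print(project_pinhole([[0.0, 0.0, 2.0], [0.1, 0.05, 2.0]], fx=656.0, fy=657.0))
```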

Fig. 2. Camera calibration geometry. $O$ is the calibration plate; $O_o X_o Y_o Z_o$ is the OCS attached to it. $O'X'Y'Z'$ is the CCS associated with the pinhole camera $\mathcal{P}$; $o'x'y'$ is the reference system of the relative image plane, at a distance $f$ from the focal point (pinhole) $O'$; $I_r(\mathbf{x}')$ is the image of the calibration plate seen on $x'y'$. $OXYZ$ is the CCS associated with $C(\mathbf{z}_i)$; $oxy$ is the reference system of the relative image plane, at a distance $f$ from the focal point $O$; $\tilde{o}$, having coordinates $(x_p, y_p)$ in the reference system $oxy$, is the principal point of $C(\mathbf{z}_i)$; $\tilde{o}\tilde{x}\tilde{y}$ is the image plane reference system centered at $\tilde{o}$ and related to $oxy$ through Eq. (3); $I_d^{(k)}(\mathbf{x})$ is the image of the calibration plate seen on the image plane $xy$ from the $k$th viewpoint.

In this paper, we assume that the ideal camera $\mathcal{P}$ is located in a CFP configuration with respect to the calibration plate $O$, with its focal point $O'$ at a distance $Z' = L + f$ from it (see Fig. 2). This distance is arbitrary but such that the image $I_r(\mathbf{x}')$, $\mathbf{x}' \triangleq [x'\ y']^T \in \mathbb{R}^2$, of the calibration plate $O$ is entirely contained in $x'y'$; the image $I_r(\mathbf{x}')$ is thus a version of $O$ simply scaled by the two constant factors $f_x/(L+f)$ and $f_y/(L+f)$. Therefore, our problem can alternately be cast as the projection of the $K$ views of the calibration plate $O$ onto $O$ itself.

Let us assume that the $k$th image $I_d^{(k)}(\mathbf{x})$ of the calibration plate is acquired with the real camera $C(\mathbf{z}_i)$ from a viewpoint different from the CFP configuration, as schematically indicated in Fig. 2. Let $OXYZ$ denote the left-handed CCS associated with $C(\mathbf{z}_i)$: its origin $O$ is located at the focal point of the camera and its optical axis $Z$ points at the calibration plate $O$. Let $oxy$ denote the reference system of the image plane of $C(\mathbf{z}_i)$. Because of the imperfections of the optical system of a real digital camera, the principal point of $C(\mathbf{z}_i)$ (denoted by $\tilde{o}$ in Fig. 2) very seldom coincides with the actual physical center (denoted by $o$ in Fig. 2) of its CCD or CMOS array (image plane). This offset is taken into account by introducing a second reference system $\tilde{o}\tilde{x}\tilde{y}$ indexing the image plane of $C(\mathbf{z}_i)$. If $(x_p, y_p)$ denote the coordinates of the principal point in the reference system $oxy$, a simple translation relates $oxy$ to $\tilde{o}\tilde{x}\tilde{y}$, i.e.

$$\tilde{x} = x - x_p, \qquad \tilde{y} = y - y_p. \tag{3}$$

Therefore, the perspective projection equations for $C(\mathbf{z}_i)$ write

$$\tilde{x} = f_x \frac{X}{Z}, \qquad \tilde{y} = f_y \frac{Y}{Z}. \tag{4}$$

Eqs. (3) and (4) can be combined and written with the formalism of homogeneous coordinates as

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & x_p & 0 \\ 0 & f_y & y_p & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}. \tag{5}$$

From Eqs. (2) and (5), it should be noted that the skew angle between the image plane coordinate axes is assumed to be zero. This assumption is reasonable with common CCD cameras, where these axes are close to being perfectly orthogonal [1,2].

The CCSs $O'X'Y'Z'$ and $OXYZ$ of $\mathcal{P}$ and $C(\mathbf{z}_i)$, respectively, relate through a 3-D rigid transformation that can be expressed as

$$\mathbf{X} = R^{(k)} \mathbf{X}' + \mathbf{T}^{(k)}, \tag{6}$$

where $\mathbf{X} \triangleq [X\ Y\ Z]^T \in \mathbb{R}^3$, $\mathbf{X}' \triangleq [X'\ Y'\ Z']^T \in \mathbb{R}^3$, $\mathbf{T}^{(k)} \triangleq [T_X^{(k)}\ T_Y^{(k)}\ T_Z^{(k)}]^T \in \mathbb{R}^3$, and $R^{(k)} \in SO(3)$, the group of the $3 \times 3$ special orthogonal matrices [1]. By adopting the formalism of the roll, pitch, and yaw angles [1], the 3-D rotation matrix can be decomposed into three elementary rotation matrices as

$$R^{(k)} = R(\omega^{(k)}) R(\varphi^{(k)}) R(\theta^{(k)}) \triangleq \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\omega^{(k)} & \sin\omega^{(k)} \\ 0 & -\sin\omega^{(k)} & \cos\omega^{(k)} \end{bmatrix} \begin{bmatrix} \cos\varphi^{(k)} & 0 & -\sin\varphi^{(k)} \\ 0 & 1 & 0 \\ \sin\varphi^{(k)} & 0 & \cos\varphi^{(k)} \end{bmatrix} \begin{bmatrix} \cos\theta^{(k)} & \sin\theta^{(k)} & 0 \\ -\sin\theta^{(k)} & \cos\theta^{(k)} & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{7}$$

The entries of matrix $R^{(k)}$ in Eq. (7) are given by

$$\begin{cases} r_{11}^{(k)} = \cos\theta^{(k)} \cos\varphi^{(k)}, \\ r_{12}^{(k)} = \sin\theta^{(k)} \cos\varphi^{(k)}, \\ r_{13}^{(k)} = -\sin\varphi^{(k)}, \\ r_{21}^{(k)} = -\sin\theta^{(k)} \cos\omega^{(k)} + \cos\theta^{(k)} \sin\varphi^{(k)} \sin\omega^{(k)}, \\ r_{22}^{(k)} = \cos\theta^{(k)} \cos\omega^{(k)} + \sin\theta^{(k)} \sin\varphi^{(k)} \sin\omega^{(k)}, \\ r_{23}^{(k)} = \cos\varphi^{(k)} \sin\omega^{(k)}, \\ r_{31}^{(k)} = \sin\theta^{(k)} \sin\omega^{(k)} + \cos\theta^{(k)} \sin\varphi^{(k)} \cos\omega^{(k)}, \\ r_{32}^{(k)} = -\cos\theta^{(k)} \sin\omega^{(k)} + \sin\theta^{(k)} \sin\varphi^{(k)} \cos\omega^{(k)}, \\ r_{33}^{(k)} = \cos\varphi^{(k)} \cos\omega^{(k)}. \end{cases} \tag{8}$$
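The decomposition of Eq. (7) is straightforward to code; a minimal sketch (not from the paper), with a sanity check against two of the entries listed in Eq. (8):

```python
import numpy as np

def rotation_rpy(omega, phi, theta):
    """Compose R = R(omega) R(phi) R(theta) as in Eq. (7); the product
    reproduces the entries r_ij listed in Eq. (8)."""
    Rw = np.array([[1, 0, 0],
                   [0, np.cos(omega), np.sin(omega)],
                   [0, -np.sin(omega), np.cos(omega)]])
    Rp = np.array([[np.cos(phi), 0, -np.sin(phi)],
                   [0, 1, 0],
                   [np.sin(phi), 0, np.cos(phi)]])
    Rt = np.array([[np.cos(theta), np.sin(theta), 0],
                   [-np.sin(theta), np.cos(theta), 0],
                   [0, 0, 1]])
    return Rw @ Rp @ Rt

# Sanity check against Eq. (8): r11 = cos(theta) cos(phi), r13 = -sin(phi).
R = rotation_rpy(0.1, -0.2, 0.3)
assert np.isclose(R[0, 0], np.cos(0.3) * np.cos(-0.2))
assert np.isclose(R[0, 2], -np.sin(-0.2))
```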

The parameters $\omega^{(k)}$, $\varphi^{(k)}$, and $\theta^{(k)}$ of $R^{(k)}$ and the three components of $\mathbf{T}^{(k)}$ will be referred to as the external (or extrinsic) parameters of the camera associated with the $k$th view. Strictly speaking, the external geometry of $C(\mathbf{z}_i)$ is defined as the rigid transformation relating $OXYZ$ to $O_o X_o Y_o Z_o$ [1,2]. This can be readily computed once the focal length $f$ of $C(\mathbf{z}_i)$ has been estimated, since $O'X'Y'Z'$ and $O_o X_o Y_o Z_o$ relate through a displacement $L + f$ along their third coordinate.

3.3. Internal geometry of the camera

Unlike $\mathcal{P}$, the real camera $C(\mathbf{z}_i)$ is affected by lens aberration, which can be decomposed into radial and decentering (or tangential) distortion [1,2]. The former is caused by imperfections of the curvature of the lens surface which make the light rays bend more or less than the correct amount. The latter is caused by the misalignment of the centers of curvature of the two surfaces of the lens; as a consequence, there exists a displacement between the principal point and the physical center of the camera image plane, i.e., point $o$ does not coincide with point $\tilde{o}$ (see Fig. 2). Because of distortion, 3-D straight lines in a scene do not map into 2-D straight lines on the image plane; the bending of lines is clearly noticeable in the four images of Fig. 1.

The actual 'distorted' location $(x_d, y_d)$, due to both types of geometric aberration, of a point on the image plane $xy$ of the real camera $C(\mathbf{z}_i)$ relates to the 'undistorted' location $(x_u, y_u)$ that would be obtained if $C(\mathbf{z}_i)$ were the ideal pinhole camera $\mathcal{P}$ in the same position and with the same orientation as

$$\begin{cases} x_u = \psi_1(x_d, y_d, \mathbf{z}_\ell) = x_d + \delta_1(x_d, y_d, \mathbf{z}_\ell), \\ y_u = \psi_2(x_d, y_d, \mathbf{z}_\ell) = y_d + \delta_2(x_d, y_d, \mathbf{z}_\ell), \end{cases} \tag{9}$$

where $\delta_1(x_d, y_d, \mathbf{z}_\ell)$ and $\delta_2(x_d, y_d, \mathbf{z}_\ell)$ are appropriate corrective offsets. The model in Eq. (9) will be referred to as a corrective model for lens distortion. The notation in Eq. (9) stresses the dependence of the pertinent functions both on the spatial location $(x_d, y_d)$ and on the parameters $\mathbf{z}_\ell$ of the model. A very accurate mathematical model for the two corrective displacements $\delta_1$ and $\delta_2$, accounting for both radial and decentering distortions, is provided by

$$\begin{cases} \delta_1(x_d, y_d, \mathbf{z}_\ell) = (x_d - x_p)(k_1 r^2 + k_2 r^4 + k_3 r^6) + q\,[p_1(r^2 + 2(x_d - x_p)^2) + 2 p_2 (x_d - x_p)(y_d - y_p)], \\ \delta_2(x_d, y_d, \mathbf{z}_\ell) = (y_d - y_p)(k_1 r^2 + k_2 r^4 + k_3 r^6) + q\,[2 p_1 (x_d - x_p)(y_d - y_p) + p_2(r^2 + 2(y_d - y_p)^2)], \end{cases} \tag{10}$$

where $x_p$ and $y_p$ are the coordinates of the principal point defined in Section 3.2, $k_1$, $k_2$, and $k_3$ are the radial distortion coefficients, $p_1$, $p_2$, and $p_3$ are the decentering distortion coefficients [2,3], and the quantities $r$ and $q$ are defined, respectively, as

$$r \triangleq \left((x_d - x_p)^2 + (y_d - y_p)^2\right)^{1/2} \quad \text{and} \quad q \triangleq 1 + p_3 r^2. \tag{11}$$

The distortion coefficients in Eqs. (10) and (11) are conveniently collected into the vector $\mathbf{z}_\ell \triangleq [x_p\ y_p\ k_1\ k_2\ k_3\ p_1\ p_2\ p_3]^T \in \mathbb{R}^8$; the vector containing the internal parameters of the camera $C(\mathbf{z}_i)$ is thus defined as $\mathbf{z}_i \triangleq [\mathbf{z}_\ell^T\ f_x\ f_y]^T \in \mathbb{R}^{10}$.
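A direct transcription of the corrective model of Eqs. (9)-(11) (a sketch, not the paper's code), assuming the parameter ordering of $\mathbf{z}_\ell$ given above:

```python
import numpy as np

def correct_distortion(xd, yd, z_l):
    """Map distorted image coordinates (xd, yd) to their undistorted
    counterparts (xu, yu) via Eqs. (9)-(11).
    z_l = [xp, yp, k1, k2, k3, p1, p2, p3], as defined in the text."""
    xp, yp, k1, k2, k3, p1, p2, p3 = z_l
    dx, dy = xd - xp, yd - yp
    r2 = dx**2 + dy**2                       # r^2 of Eq. (11)
    radial = k1 * r2 + k2 * r2**2 + k3 * r2**3
    q = 1.0 + p3 * r2                        # q of Eq. (11)
    d1 = dx * radial + q * (p1 * (r2 + 2 * dx**2) + 2 * p2 * dx * dy)
    d2 = dy * radial + q * (2 * p1 * dx * dy + p2 * (r2 + 2 * dy**2))
    return xd + d1, yd + d2                  # Eq. (9)
```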

An alternative model describing the geometric aberration of the lens is provided by

$$\begin{cases} x_d = \psi'_1(x_u, y_u, \mathbf{z}'_\ell) = x_u + \delta'_1(x_u, y_u, \mathbf{z}'_\ell), \\ y_d = \psi'_2(x_u, y_u, \mathbf{z}'_\ell) = y_u + \delta'_2(x_u, y_u, \mathbf{z}'_\ell), \end{cases} \tag{12}$$

which is the 'inverse' of that in Eq. (9). This model will be referred to as a lens distortion model. Of course, functions $\psi'_1$ and $\psi'_2$ are not the inverse functions of $\psi_1$ and $\psi_2$, respectively, and $\mathbf{z}'_\ell \neq \mathbf{z}_\ell$. Some approximate relationships between the two models, only for the case of radial distortion, are reported by Tordoff in [51]. In this paper we will be using the corrective model in Eq. (9). For notational convenience, in the rest of the paper the subscripts in Eqs. (9)-(11) will be dropped and $(x_u, y_u)$ will be denoted as $(x', y')$ and $(x_d, y_d)$ as $(x, y)$.

3.4. Forward and backward homographies

If the values of the parameters $\mathbf{z}_\ell$ in Eqs. (10) and (11) were known, it would be possible to correct the geometric aberration caused by the lens on the image plane $xy$ of $C(\mathbf{z}_i)$, thereby transforming this camera into an ideal pinhole digital imaging device $\mathcal{P}$ looking at the calibration plate $O$ from the same viewpoint. By plugging these values into Eq. (9), the actual distorted image $I_d^{(k)}(\mathbf{x})$ could be warped into an image $I_u^{(k)}(\mathbf{x})$ devoid of lens aberrations; this operation is schematically represented in Fig. 3. After this transformation, the corrected image $I_u^{(k)}(\mathbf{x})$ would relate to $I_r(\mathbf{x}')$ through a 2-D homography [2] (also referred to as a projective transformation or projectivity or collineation) defined, in homogeneous coordinates, by

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} a_{11}^{(k)} & a_{12}^{(k)} & b_1^{(k)} \\ a_{21}^{(k)} & a_{22}^{(k)} & b_2^{(k)} \\ c_1^{(k)} & c_2^{(k)} & 1 \end{bmatrix} \begin{bmatrix} x - x_p \\ y - y_p \\ 1 \end{bmatrix} \triangleq H^{(k)} \begin{bmatrix} x - x_p \\ y - y_p \\ 1 \end{bmatrix}. \tag{13}$$

The 2-D homography of Eq. (13) will also be denoted in non-homogeneous coordinates as

$$\begin{cases} x' = \chi_1(\mathbf{x}; \mathbf{z}_H^{(k)}) \triangleq \dfrac{a_{11}^{(k)}(x - x_p) + a_{12}^{(k)}(y - y_p) + b_1^{(k)}}{c_1^{(k)}(x - x_p) + c_2^{(k)}(y - y_p) + 1}, \\[2ex] y' = \chi_2(\mathbf{x}; \mathbf{z}_H^{(k)}) \triangleq \dfrac{a_{21}^{(k)}(x - x_p) + a_{22}^{(k)}(y - y_p) + b_2^{(k)}}{c_1^{(k)}(x - x_p) + c_2^{(k)}(y - y_p) + 1}, \end{cases} \tag{14}$$

where functions $\chi_1$ and $\chi_2$ denote the ratios of homogeneous coordinates and $\mathbf{z}_H^{(k)} \triangleq [a_{11}^{(k)}\ a_{21}^{(k)}\ a_{12}^{(k)}\ a_{22}^{(k)}\ b_1^{(k)}\ b_2^{(k)}\ c_1^{(k)}\ c_2^{(k)}]^T \in \mathbb{R}^8$.
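The forward homography of Eq. (14) in code form (a sketch; note the component ordering of $\mathbf{z}_H^{(k)}$ defined above, with $a_{21}$ preceding $a_{12}$):

```python
def forward_homography(x, y, z_H, xp, yp):
    """Evaluate the forward homography of Eq. (14).
    z_H = [a11, a21, a12, a22, b1, b2, c1, c2], ordered as in the text."""
    a11, a21, a12, a22, b1, b2, c1, c2 = z_H
    dx, dy = x - xp, y - yp
    den = c1 * dx + c2 * dy + 1.0
    return ((a11 * dx + a12 * dy + b1) / den,
            (a21 * dx + a22 * dy + b2) / den)
```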

Fig. 3. The two geometric transformations relating the $k$th view $I_d^{(k)}(\mathbf{x})$, $\mathbf{x} \in \mathbb{R}^2$, of the calibration plate $O$ to the undistorted and rectified reference image $I_r(\mathbf{x}')$, $\mathbf{x}' \in \mathbb{R}^2$; $I_u^{(k)}(\mathbf{x})$ represents the version of $I_d^{(k)}(\mathbf{x})$ after compensation for lens distortion through Eqs. (9) and (10). $I_r(\mathbf{x}')$ is obtained by rectifying $I_u^{(k)}(\mathbf{x})$ through the forward homography of Eq. (13). The inverse transformation warping $I_r(\mathbf{x}')$ into $I_u^{(k)}(\mathbf{x})$ is obtained through the backward homography of Eq. (20).


The transformation of Eq. (13) or (14), schematically depicted in Fig. 3, will be referred to as the forward homography.

By combining the two nonlinear geometric transformations of Eqs. (9), (10), and (13), the reference systems of the two images $I_d^{(k)}(\mathbf{x})$ and $I_r(\mathbf{x}')$ relate in homogeneous coordinates as

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} a_{11}^{(k)} & a_{12}^{(k)} & b_1^{(k)} \\ a_{21}^{(k)} & a_{22}^{(k)} & b_2^{(k)} \\ c_1^{(k)} & c_2^{(k)} & 1 \end{bmatrix} \begin{bmatrix} \psi_1(\mathbf{x}, \mathbf{z}_\ell) - x_p \\ \psi_2(\mathbf{x}, \mathbf{z}_\ell) - y_p \\ 1 \end{bmatrix} \tag{15}$$

and in non-homogeneous coordinates as

$$\begin{cases} x' = \chi_1(\boldsymbol{\psi}(\mathbf{x}, \mathbf{z}_\ell); \mathbf{z}_H^{(k)}) \triangleq \dfrac{a_{11}^{(k)}(\psi_1(\mathbf{x}, \mathbf{z}_\ell) - x_p) + a_{12}^{(k)}(\psi_2(\mathbf{x}, \mathbf{z}_\ell) - y_p) + b_1^{(k)}}{c_1^{(k)}(\psi_1(\mathbf{x}, \mathbf{z}_\ell) - x_p) + c_2^{(k)}(\psi_2(\mathbf{x}, \mathbf{z}_\ell) - y_p) + 1}, \\[2ex] y' = \chi_2(\boldsymbol{\psi}(\mathbf{x}, \mathbf{z}_\ell); \mathbf{z}_H^{(k)}) \triangleq \dfrac{a_{21}^{(k)}(\psi_1(\mathbf{x}, \mathbf{z}_\ell) - x_p) + a_{22}^{(k)}(\psi_2(\mathbf{x}, \mathbf{z}_\ell) - y_p) + b_2^{(k)}}{c_1^{(k)}(\psi_1(\mathbf{x}, \mathbf{z}_\ell) - x_p) + c_2^{(k)}(\psi_2(\mathbf{x}, \mathbf{z}_\ell) - y_p) + 1}, \end{cases} \tag{16}$$

where $\boldsymbol{\psi}(\mathbf{x}, \mathbf{z}_\ell) \triangleq [\psi_1(\mathbf{x}, \mathbf{z}_\ell)\ \psi_2(\mathbf{x}, \mathbf{z}_\ell)]^T \in \mathbb{R}^2$.

The coefficients $\mathbf{z}_H^{(k)}$ of the forward homography are functions of the external parameters in Eq. (6) as well as of the focal length $f$ and pixel aspect ratio $\rho$ of the camera $C(\mathbf{z}_i)$. This dependence can be made more explicit by computing the backward homography, which maps $I_r(\mathbf{x}')$ into $I_u^{(k)}(\mathbf{x})$. The backward homography can be expressed in homogeneous coordinates as

$$\begin{bmatrix} x - x_p \\ y - y_p \\ 1 \end{bmatrix} = \begin{bmatrix} \bar{a}_{11}^{(k)} & \bar{a}_{12}^{(k)} & \bar{b}_1^{(k)} \\ \bar{a}_{21}^{(k)} & \bar{a}_{22}^{(k)} & \bar{b}_2^{(k)} \\ \gamma_1^{(k)} & \gamma_2^{(k)} & 1 \end{bmatrix} \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} \triangleq \bar{H}^{(k)} \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}, \tag{17}$$

and also as a function of the external parameters [2] as

$$\begin{bmatrix} x - x_p \\ y - y_p \\ 1 \end{bmatrix} = K \left( R^{(k)} + \frac{1}{L+f} \begin{bmatrix} 0 & 0 & T_X^{(k)} \\ 0 & 0 & T_Y^{(k)} \\ 0 & 0 & T_Z^{(k)} \end{bmatrix} \right) K^{-1} \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}, \tag{18}$$

where

$$K = \begin{bmatrix} f_x & 0 & 0 \\ 0 & f_y & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{19}$$

The backward homography in Eq. (17) can be alternately expressed in non-homogeneous coordinates as

$$\begin{cases} x - x_p = \dfrac{\bar{a}_{11}^{(k)} x' + \bar{a}_{12}^{(k)} y' + \bar{b}_1^{(k)}}{\gamma_1^{(k)} x' + \gamma_2^{(k)} y' + 1}, \\[2ex] y - y_p = \dfrac{\bar{a}_{21}^{(k)} x' + \bar{a}_{22}^{(k)} y' + \bar{b}_2^{(k)}}{\gamma_1^{(k)} x' + \gamma_2^{(k)} y' + 1}. \end{cases} \tag{20}$$

By comparing Eqs. (17) and (18), after matrix multiplication in Eq. (18), it is

$$\begin{cases} \bar{a}_{11}^{(k)} \triangleq \dfrac{r_{11}^{(k)}}{D^{(k)}}, \qquad \bar{a}_{12}^{(k)} \triangleq \dfrac{f_x}{f_y} \dfrac{r_{12}^{(k)}}{D^{(k)}}, \\[1.5ex] \bar{a}_{21}^{(k)} \triangleq \dfrac{f_y}{f_x} \dfrac{r_{21}^{(k)}}{D^{(k)}}, \qquad \bar{a}_{22}^{(k)} \triangleq \dfrac{r_{22}^{(k)}}{D^{(k)}}, \\[1.5ex] \bar{b}_1^{(k)} \triangleq f_x \dfrac{r_{13}^{(k)} + t_X^{(k)}}{D^{(k)}}, \qquad \bar{b}_2^{(k)} \triangleq f_y \dfrac{r_{23}^{(k)} + t_Y^{(k)}}{D^{(k)}}, \\[1.5ex] \gamma_1^{(k)} \triangleq \dfrac{1}{f_x} \dfrac{r_{31}^{(k)}}{D^{(k)}}, \qquad \gamma_2^{(k)} \triangleq \dfrac{1}{f_y} \dfrac{r_{32}^{(k)}}{D^{(k)}}, \end{cases} \tag{21}$$

where $\mathbf{t}^{(k)} \triangleq [t_X^{(k)}\ t_Y^{(k)}\ t_Z^{(k)}]^T \in \mathbb{R}^3$, with $t_n^{(k)} \triangleq T_n^{(k)}/(L+f)$, $n = X, Y, Z$, is the normalized translational displacement relative to the $k$th view of $O$; $D^{(k)} \triangleq r_{33}^{(k)} + t_Z^{(k)}$, and the coefficients $r_{ij}^{(k)}$ are the entries of the 3-D rotation matrix $R^{(k)}$ defined in Eq. (8).

Since forward and backward homographies are the inverse of each other, from Eqs. (13) and (17) it is $H^{(k)} = (\bar{H}^{(k)})^{-1}$, whence, by dropping the superscripts for notational convenience,

$$\begin{aligned} A &\triangleq \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \frac{1}{\det \bar{A}} \begin{bmatrix} \bar{a}_{22} - \bar{b}_2 \gamma_2 & -\bar{a}_{12} + \bar{b}_1 \gamma_2 \\ -\bar{a}_{21} + \bar{b}_2 \gamma_1 & \bar{a}_{11} - \bar{b}_1 \gamma_1 \end{bmatrix}, \\ \mathbf{b} &\triangleq \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = -\frac{1}{\det \bar{A}} \begin{bmatrix} \bar{a}_{22} \bar{b}_1 - \bar{a}_{12} \bar{b}_2 \\ -\bar{a}_{21} \bar{b}_1 + \bar{a}_{11} \bar{b}_2 \end{bmatrix}, \\ \mathbf{c} &\triangleq \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = -\frac{1}{\det \bar{A}} \begin{bmatrix} \bar{a}_{22} \gamma_1 - \bar{a}_{21} \gamma_2 \\ -\bar{a}_{12} \gamma_1 + \bar{a}_{11} \gamma_2 \end{bmatrix}, \end{aligned} \tag{22}$$

where

$$\bar{A} \triangleq \begin{bmatrix} \bar{a}_{11} & \bar{a}_{12} \\ \bar{a}_{21} & \bar{a}_{22} \end{bmatrix}. \tag{23}$$

Therefore, the dependence of the forward homography coefficients in Eq. (13) on the external parameters of the camera and on the focal lengths $f_x$ and $f_y$ is expressed by Eqs. (21) and (22).

3.5. The cosines' equation

If the forward homography coefficients are known, the external parameters and the focal lengths $f_x$ and $f_y$ of the camera can be estimated by computing the backward homography coefficients through $\bar{H}^{(k)} = (H^{(k)})^{-1}$ and by solving Eq. (21) for $\omega^{(k)}$, $\varphi^{(k)}$, $\theta^{(k)}$, $t_X^{(k)}$, $t_Y^{(k)}$, $t_Z^{(k)}$, $f_x$, and $f_y$.


A new method for accomplishing this task has been advanced in [8] and is briefly described next. From Eq. (21), by dropping the superscripts and defining the two angles

$$\eta_1 \triangleq \arctan\!\left(\frac{\bar{a}_{21}}{\rho\, \bar{a}_{22}}\right) \quad \text{and} \quad \eta_2 \triangleq \arctan\!\left(\frac{\rho\, \gamma_2}{\gamma_1}\right), \tag{24}$$

after a few trigonometric manipulations (the reader is referred to [8] for the details of the derivation), it is

$$\begin{cases} \omega = \pm \arctan\!\left( \left( \tan(\theta + \eta_1) \tan(\theta - \eta_2) \right)^{1/2} \right), \\ \varphi = \pm \arcsin\!\left( \left( \tan(\theta + \eta_1) / \tan(\theta - \eta_2) \right)^{1/2} \right), \\ \theta = \arctan(\rho\, \bar{a}_{12} / \bar{a}_{11}). \end{cases} \tag{25}$$

Clearly, the roll, pitch, and yaw angles in Eq. (25) depend on the coefficients $\bar{a}_{ij}$ and $\gamma_i$ of the backward homography, which are computed from $\bar{H}^{(k)} = (H^{(k)})^{-1}$, and are parameterized by $\rho$, the pixel aspect ratio. An estimate of $\rho$ can be obtained by observing that, from Eqs. (21) and (23), it is

$$\det \bar{A} = \bar{a}_{11}\bar{a}_{22} - \bar{a}_{12}\bar{a}_{21} = (r_{11} r_{22} - r_{12} r_{21})/D^2, \tag{26}$$

where, by using the first of Eq. (21), $r_{11}/\bar{a}_{11}$ can be substituted for $D$, yielding, after some simple algebra,

$$g(\rho) \triangleq \det \bar{A}\, \cos\varphi\, \cos^2\theta - \bar{a}_{11}^2 \cos\omega = 0. \tag{27}$$

Eq. (27) is referred to as the cosines' equation [8], since it contains only the cosines of the angles $\omega$, $\varphi$, and $\theta$. The zero-crossing $\rho_0$ of $g(\rho)$ in close proximity to $\rho = 1$ (since the pixels of common digital cameras are approximately square) provides an estimate of the pixel aspect ratio for the $k$th view of $O$. Fig. 4 shows the cosines' equation $g(\rho)$ associated with the image of Fig. 1(b) (view 13 from the image dataset of [7]); in this case, it is $\rho_0 = 0.99$. Fig. 5 shows the estimates of the pixel aspect ratio obtained from all 25 views of the calibration plate. It is worth observing that, as expected, all estimates are rather close to 1.

From the values of the pixel aspect ratio, the angles of Eq. (25) can be computed for all the views and, from these, the normalized translational displacement $\mathbf{t}$ and the focal lengths $f_x$ and $f_y$ in Eq. (21) can be obtained as

$$\begin{cases} t_X = \bar{b}_1 \dfrac{r_{33} + t_Z}{f_x} - r_{13}, \\[1ex] t_Y = \bar{b}_2 \dfrac{r_{33} + t_Z}{f_y} - r_{23}, \\[1ex] t_Z = \dfrac{r_{11}}{\bar{a}_{11}} - r_{33}, \\[1ex] \dfrac{1}{f_x} = \gamma_1 \dfrac{r_{33} + t_Z}{r_{31}}, \\[1ex] f_y = \rho f_x. \end{cases} \tag{28}$$
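A numerical sketch of this procedure (not the paper's implementation), assuming a backward homography matrix `Hb` already normalized as in Eq. (17), ignoring the sign ambiguities ($\pm$) of Eq. (25), and using `scipy.optimize.brentq` to locate the zero-crossing of $g(\rho)$ near $\rho = 1$:

```python
import numpy as np
from scipy.optimize import brentq

def angles_from_Hbar(Hb, rho):
    """Roll, pitch, and yaw angles of Eq. (25), from the backward homography
    coefficients, parameterized by the pixel aspect ratio rho. The +/- sign
    ambiguities of Eq. (25) are ignored in this sketch."""
    a11, a12, a21, a22 = Hb[0, 0], Hb[0, 1], Hb[1, 0], Hb[1, 1]
    g1, g2 = Hb[2, 0], Hb[2, 1]
    theta = np.arctan(rho * a12 / a11)
    eta1 = np.arctan(a21 / (rho * a22))              # Eq. (24)
    eta2 = np.arctan(rho * g2 / g1)
    omega = np.arctan(np.sqrt(np.tan(theta + eta1) * np.tan(theta - eta2)))
    phi = np.arcsin(np.sqrt(np.tan(theta + eta1) / np.tan(theta - eta2)))
    return omega, phi, theta

def g(rho, Hb):
    """Cosines' equation of Eq. (27); its zero crossing near rho = 1
    estimates the pixel aspect ratio."""
    omega, phi, theta = angles_from_Hbar(Hb, rho)
    detA = Hb[0, 0] * Hb[1, 1] - Hb[0, 1] * Hb[1, 0]
    return detA * np.cos(phi) * np.cos(theta)**2 - Hb[0, 0]**2 * np.cos(omega)

# With H the forward homography of Eq. (13) estimated for one view and
# Hb = inv(H) normalized so that Hb[2, 2] = 1 (Eq. 17), one would call:
# rho0 = brentq(g, 0.9, 1.1, args=(Hb,))
```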

Fig. 4. Cosines' equation $g(\rho)$ relative to the image of Fig. 1(b); its zero crossing $\rho_0 = 0.99$ close to $\rho = 1$ is denoted by a red circle. (For interpretation of references to color in this figure legend, the reader is referred to the web version of the article.)

In order to estimate the absolute translational displacement $\mathbf{T}^{(k)}$ in Eq. (6), either the pixel width $w$ or the pixel height $h$ has to be known, since $\mathbf{T}^{(k)} = (L+f)\,\mathbf{t}^{(k)}$. $L$ too has to be computed, from the actual dimensions of the squares on the calibration plate $O$ and from the dimensions of the squares of the virtual image $I_r(\mathbf{x}')$. Let $D_o$ denote the length, e.g. expressed in meters, of the sides of the squares of $O$. Let $d_{x'}$ and $d_{y'}$ denote the corresponding lengths, in pixel units, of the sides of the squares of $I_r(\mathbf{x}')$. Therefore, from Eq. (1), it is

$$d_{x'} = k f \frac{D_o}{L+f}, \qquad d_{y'} = l f \frac{D_o}{L+f}, \tag{29}$$

whence

$$L = \left(k \frac{D_o}{d_{x'}} - 1\right) f \quad \text{or} \quad L = \left(l \frac{D_o}{d_{y'}} - 1\right) f. \tag{30}$$

Fig. 5. Estimates of the pixel aspect ratio $\rho \triangleq f_y/f_x$ obtained from all 25 views of the calibration plate $O$ for the image dataset of [7].

4. Camera calibration algorithm

The calibration method consists of two stages. First, in a preprocessing stage, sets of corresponding features are


extracted with subpixel accuracy from the $K$ images $I_d^{(k)}(\mathbf{x})$ of $O$. Then, each view is registered onto the virtual image $I_r(\mathbf{x}')$ by using a simplified version of the model in Eq. (16) involving a reduced set of parameters. Finally, these estimates are used to initialize a multi-view rectification algorithm where all $K$ images are registered onto $I_r(\mathbf{x}')$ by using the complete model of Eq. (16).

4.1. Subpixel feature extraction

For the extraction of salient features from the $K$ images $I_d^{(k)}(\mathbf{x})$ of the calibration plate $O$, we have designed a graphical user interface (GUI) which has been implemented in Matlab. The GUI allows the user to select a quadrilateral region of interest (ROI) in each image by clicking with the mouse at four locations, as indicated in Fig. 6(a), which refers to the image of Fig. 1(b); the selected ROI is enclosed by yellow lines. An algorithm is currently being developed for the automatic detection of the ROI using the textural characteristics of the pattern on the calibration plate as a prior.

In [6] we have advanced a simple algorithm for extracting features from images of checkerboard calibration plates with subpixel accuracy; the data extracted are the spatial locations of the X-junctions (or corners) within these images. The corners are first roughly extracted with Harris' corner detector [52] and then refined by computing the locations of the saddle points on the surfaces representing the intensity profiles of the images. The former are marked with red '+' symbols in Fig. 6(a), the latter with yellow '+' symbols in Fig. 6(b). The accuracy delivered by this method is of the order of a few hundredths of a pixel in the absence of noise or in low-noise conditions. The algorithmic details and the performance analysis in the presence of additive Gaussian noise can be found in [6].

After extraction, the Cartesian coordinates of the X-junctions are collected in two vectors $\mathbf{X} \in \mathbb{R}^{MN}$ and $\mathbf{Y} \in \mathbb{R}^{MN}$; these need to be appropriately ordered and stored into two arrays $x^{(k)}[m,n] \in \mathbb{R}^{M \times N}$ and $y^{(k)}[m,n] \in \mathbb{R}^{M \times N}$, $m = 1,\dots,M$, $n = 1,\dots,N$, for processing. For the calibration example of this paper, it is $M = 12$ and $N = 13$, for a grand total of 156 extracted features. The algorithm for the conversion of $\mathbf{X}$ and $\mathbf{Y}$ into the two ordered arrays $x^{(k)}[m,n]$ and $y^{(k)}[m,n]$ is reported in Appendix A. This algorithm was tested with images of planar checkerboard patterns affected by various degrees of distortion and always performed successfully.

The virtual image $I_r(\mathbf{x}')$ of the calibration plate $O$ (within the ROI) as seen from the ideal pinhole digital camera $\mathcal{P}$ in a CFP position is also generated, and the locations of its X-junctions are orderly stored in the two arrays $x'[m,n]$ and $y'[m,n]$, $m = 1,\dots,M$, $n = 1,\dots,N$. Of course, in this case the coordinates of the X-junctions need not be extracted but can be directly obtained from the number of squares of the checkerboard pattern and from their size. While the number of squares is dictated by the characteristics of the actual calibration plate $O$, their size can be chosen arbitrarily.

Fig. 6. (a) Initial estimate of the X-junctions of $I_d^{(k)}(\mathbf{x})$ within the selected ROI (enclosed by yellow lines) in the image of Fig. 1(b); (b) refinement of the estimates with the saddle points of the intensity surface. (For interpretation of references to color in this figure legend, the reader is referred to the web version of the article.)
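The subpixel extractor itself is the algorithm of [6] and is not reproduced here; the following is only a schematic illustration of the saddle-point idea, fitting a quadratic surface to the intensities around a rough integer corner estimate and returning its stationary point (the window size `w` is an assumption of this sketch):

```python
import numpy as np

def refine_saddle(img, x0, y0, w=4):
    """Illustrative refinement step in the spirit of Section 4.1: fit
    I(u, v) ~ a u^2 + b u v + c v^2 + d u + e v + f around the rough
    corner (x0, y0) and return the stationary (saddle) point."""
    ys, xs = np.mgrid[y0 - w:y0 + w + 1, x0 - w:x0 + w + 1]
    patch = img[y0 - w:y0 + w + 1, x0 - w:x0 + w + 1].astype(float)
    u = (xs - x0).ravel().astype(float)
    v = (ys - y0).ravel().astype(float)
    M = np.column_stack([u * u, u * v, v * v, u, v, np.ones_like(u)])
    a, b, c, d, e, _ = np.linalg.lstsq(M, patch.ravel(), rcond=None)[0]
    # Stationary point: solve [2a b; b 2c] [du dv]^T = -[d e]^T.
    du, dv = np.linalg.solve([[2 * a, b], [b, 2 * c]], [-d, -e])
    return x0 + du, y0 + dv
```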

4.2. Stage I: view prealignment with a simplified warping model

The $K$ geometric transformations mapping each view $I_d^{(k)}(\mathbf{x})$ of $O$ onto the reference image $I_r(\mathbf{x}')$ are computed by simplifying the mathematical model of Eq. (16). Since the most important parameters in Eq. (10) are the coordinates of the principal point $x_p$ and $y_p$ and the first radial distortion coefficient $k_1$ (the others being a few orders of magnitude smaller [1,2]), the corrective displacements of Eq. (10) are simplified as

$$\begin{cases} \delta_1(x_d, y_d, \mathbf{z}_\ell) = (x_d - x_p) k_1 r^2, \\ \delta_2(x_d, y_d, \mathbf{z}_\ell) = (y_d - y_p) k_1 r^2, \end{cases} \tag{31}$$

and the vector containing the reduced set of internal parameters is denoted as $\mathbf{z}_{\ell r} \triangleq [x_p\ y_p\ k_1]^T \in \mathbb{R}^3$.


Moreover, one may notice that various combinations of the parameters $a_{ij}^{(k)}$, $b_i^{(k)}$, $x_p$, and $y_p$ in Eq. (16) could lead to the same fitting of the data, since the net translational components in their numerators can be expressed as

$$\begin{cases} d_1^{(k)} \triangleq b_1^{(k)} - (a_{11}^{(k)} x_p + a_{12}^{(k)} y_p), \\ d_2^{(k)} \triangleq b_2^{(k)} - (a_{21}^{(k)} x_p + a_{22}^{(k)} y_p). \end{cases} \tag{32}$$

These ambiguities are due to the fact that the forward homography coefficients and the coordinates of the principal point do not have to satisfy any constraint. Therefore, based on Eq. (32), a new parameter vector defining the forward homography is built as $\mathbf{z}_{Hr}^{(k)} \triangleq [a_{11}^{(k)}\ a_{21}^{(k)}\ a_{12}^{(k)}\ a_{22}^{(k)}\ d_1^{(k)}\ d_2^{(k)}\ c_1^{(k)}\ c_2^{(k)}]^T \in \mathbb{R}^8$. The overall reduced parameter vector is denoted as $\mathbf{z}_r^{(k)} \triangleq [\mathbf{z}_{\ell r}^T\ (\mathbf{z}_{Hr}^{(k)})^T]^T \in \mathbb{R}^{11}$.

With these simplifications, Eq. (16) is rewritten as

$$\begin{cases} x' = \chi_1(\boldsymbol{\psi}(\mathbf{x}, \mathbf{z}_{\ell r}); \mathbf{z}_{Hr}^{(k)}) = \dfrac{a_{11}^{(k)} \psi_1(\mathbf{x}, \mathbf{z}_{\ell r}) + a_{12}^{(k)} \psi_2(\mathbf{x}, \mathbf{z}_{\ell r}) + d_1^{(k)}}{c_1^{(k)}(\psi_1(\mathbf{x}, \mathbf{z}_{\ell r}) - x_p) + c_2^{(k)}(\psi_2(\mathbf{x}, \mathbf{z}_{\ell r}) - y_p) + 1}, \\[2ex] y' = \chi_2(\boldsymbol{\psi}(\mathbf{x}, \mathbf{z}_{\ell r}); \mathbf{z}_{Hr}^{(k)}) = \dfrac{a_{21}^{(k)} \psi_1(\mathbf{x}, \mathbf{z}_{\ell r}) + a_{22}^{(k)} \psi_2(\mathbf{x}, \mathbf{z}_{\ell r}) + d_2^{(k)}}{c_1^{(k)}(\psi_1(\mathbf{x}, \mathbf{z}_{\ell r}) - x_p) + c_2^{(k)}(\psi_2(\mathbf{x}, \mathbf{z}_{\ell r}) - y_p) + 1}. \end{cases} \tag{33}$$

The fitting of the model described by Eq. (33) to the sets of corresponding features computed in the preprocessing stage is cast into a minimization problem and solved within a maximum likelihood framework. A cost function for the $k$th view $I_d^{(k)}(\mathbf{x})$ of $O$ is defined by mapping its features into the corresponding features of the reference image $I_r(\mathbf{x}')$ as

$$c_r^{(k)} \triangleq \sum_{m=1}^{M} \sum_{n=1}^{N} \left[ (\chi_1^{(k)}[m,n] - x'[m,n])^2 + (\chi_2^{(k)}[m,n] - y'[m,n])^2 \right], \tag{34}$$

where

$$\chi_1^{(k)}[m,n] \triangleq \chi_1(\boldsymbol{\psi}(\mathbf{x}, \mathbf{z}_{\ell r}); \mathbf{z}_{Hr}^{(k)})\Big|_{x = x^{(k)}[m,n],\ y = y^{(k)}[m,n]}, \qquad \chi_2^{(k)}[m,n] \triangleq \chi_2(\boldsymbol{\psi}(\mathbf{x}, \mathbf{z}_{\ell r}); \mathbf{z}_{Hr}^{(k)})\Big|_{x = x^{(k)}[m,n],\ y = y^{(k)}[m,n]}. \tag{35}$$

The problem is thus formulated as

$$\hat{\mathbf{z}}_r^{(k)} = \arg\min_{\mathbf{z}_r^{(k)}} c_r^{(k)} \tag{36}$$

and solved with the Levenberg-Marquardt (LM) algorithm [1,53]. By dropping the superscripts for notational convenience and by denoting with $z_r$ the generic component of the reduced parameter vector $\mathbf{z}_r$, the partial derivatives of the cost function in Eq. (34) are

$$\frac{\partial c_r}{\partial z_r} = 2 \sum_{m=1}^{M} \sum_{n=1}^{N} \left[ \Delta x'[m,n] \frac{\partial \chi_1}{\partial z_r} + \Delta y'[m,n] \frac{\partial \chi_2}{\partial z_r} \right], \tag{37}$$

where

$$\Delta x'[m,n] \triangleq \chi_1[m,n] - x'[m,n], \qquad \Delta y'[m,n] \triangleq \chi_2[m,n] - y'[m,n], \tag{38}$$

and the partial derivatives of $\chi_1$ and $\chi_2$ in Eq. (33) are evaluated at $x = x^{(k)}[m,n]$ and $y = y^{(k)}[m,n]$. The analytical expressions of these derivatives are reported in Appendix B. The partial derivatives in Eq. (37) of the cost function $c_r$ are collected into the gradient vector

$$\nabla_{\mathbf{z}_r} c_r \triangleq \left[ \frac{\partial c_r}{\partial x_p}\ \frac{\partial c_r}{\partial y_p}\ \frac{\partial c_r}{\partial k_1}\ \frac{\partial c_r}{\partial a_{11}}\ \frac{\partial c_r}{\partial a_{21}}\ \frac{\partial c_r}{\partial a_{12}}\ \frac{\partial c_r}{\partial a_{22}}\ \frac{\partial c_r}{\partial d_1}\ \frac{\partial c_r}{\partial d_2}\ \frac{\partial c_r}{\partial c_1}\ \frac{\partial c_r}{\partial c_2} \right]^T \in \mathbb{R}^{11}, \tag{39}$$

and, from this, an approximation of the Hessian matrix required by the LM algorithm [53] is computed as $\Xi_r = \nabla_{\mathbf{z}_r} c_r\, \nabla_{\mathbf{z}_r}^T c_r$.

In order for the LM algorithm to converge to the optimal solution, an initial estimate of $\mathbf{z}_r$ close to it has to be determined. This is obtained by further simplifying the model of Eq. (33) and by setting $x_p = 0$, $y_p = 0$, and $k_1 = 0$ in Eq. (31); this clearly leads to $\delta_1 = \delta_2 = 0$, whence $\psi_1 = x$ and $\psi_2 = y$. Eq. (33) is thus rewritten as

$$\begin{cases} x' = \chi_1(\mathbf{x}; \mathbf{z}_{Hr}^{(k)}) = \dfrac{a_{11}^{(k)} x + a_{12}^{(k)} y + d_1^{(k)}}{c_1^{(k)} x + c_2^{(k)} y + 1}, \\[2ex] y' = \chi_2(\mathbf{x}; \mathbf{z}_{Hr}^{(k)}) = \dfrac{a_{21}^{(k)} x + a_{22}^{(k)} y + d_2^{(k)}}{c_1^{(k)} x + c_2^{(k)} y + 1}, \end{cases} \tag{40}$$

which makes the geometric warping between the two feature sets linear with respect to the forward homography parameters $\mathbf{z}_{Hr}^{(k)}$; these can therefore be computed in closed form as $\hat{\mathbf{z}}_{Hr}^{(k)}$ with a linear least-squares algorithm [1,53]. By using $\hat{\mathbf{z}}_{r,o}^{(k)} \triangleq [0\ 0\ 0\ (\hat{\mathbf{z}}_{Hr}^{(k)})^T]^T \in \mathbb{R}^{11}$ as the initial estimate, the LM algorithm yields the optimal solution $(\hat{\mathbf{z}}_r^{(k)})_{\text{opt}}$ to the problem in Eq. (36) in a few iterations; experimentally, convergence can be observed after about 10-15 iterations.

From $(\hat{\mathbf{z}}_r^{(k)})_{\text{opt}}$, the values for $b_1^{(k)}$ and $b_2^{(k)}$ are then obtained by solving Eq. (32). Table 1 displays the estimates $\hat{x}_p$, $\hat{y}_p$, and $\hat{k}_1$ of the internal parameters $x_p$, $y_p$, and $k_1$, respectively, for all $K = 25$ images of the calibration plate $O$ in the image dataset of [7]. While the estimates of $k_1$ are consistent across all the views and close to the final estimated value of Table 2, the values of $\hat{x}_p$ and $\hat{y}_p$ vary significantly from view to view. However, it is worth observing that their average values, $-9.1119$ and $-2.3681$, respectively, are not far from the estimates obtained with the global multi-view rectification of Stage II (see Table 2). The estimates $\hat{\mathbf{z}}_{Hr}^{(k)}$, $k = 1,\dots,K$, of the forward homography parameters are omitted from Table 1 for brevity.
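The closed-form initialization of Eq. (40) amounts to a standard linear least-squares problem; a minimal sketch (assuming the feature arrays have been flattened to (M·N, 2) matrices):

```python
import numpy as np

def init_homography(xy, xy_ref):
    """Solve Eq. (40) for z_Hr = [a11, a21, a12, a22, d1, d2, c1, c2]
    by linear least squares, with xp = yp = k1 = 0 as in the text.

    xy     : (M*N, 2) features extracted from one view,
    xy_ref : (M*N, 2) corresponding features of the reference image."""
    x, y = xy[:, 0], xy[:, 1]
    xr, yr = xy_ref[:, 0], xy_ref[:, 1]
    zero, one = np.zeros_like(x), np.ones_like(x)
    # x' (c1 x + c2 y + 1) = a11 x + a12 y + d1, and similarly for y'.
    rows_x = np.column_stack([x, zero, y, zero, one, zero, -x * xr, -y * xr])
    rows_y = np.column_stack([zero, x, zero, y, zero, one, -x * yr, -y * yr])
    A = np.vstack([rows_x, rows_y])
    b = np.concatenate([xr, yr])
    z_Hr, *_ = np.linalg.lstsq(A, b, rcond=None)
    return z_Hr
```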


Table 1
Estimates $\hat{x}_p$, $\hat{y}_p$, and $\hat{k}_1$ of the internal parameters $x_p$, $y_p$, and $k_1$, respectively, obtained with the view prealignment algorithm of Stage I

View    $\hat{x}_p$    $\hat{y}_p$    $\hat{k}_1$ (×10⁷)
1       −5.2545     −13.4045    5.8899
2       −7.1935     −10.5434    5.9075
3       −9.2943     −11.1053    5.8961
4       −11.7588    −10.0521    5.9513
5       −14.8657    −13.0820    6.0187
6       −6.7191     3.7412      6.0385
7       −12.7780    8.5370      5.8840
8       −8.1745     7.3946      5.7896
9       −6.6581     −5.9170     6.0438
10      0.1254      −5.7894     5.9086
11      −4.1062     −3.6927     5.8336
12      −5.0844     −2.0198     5.9013
13      −6.7979     0.4586      5.9456
14      −10.7166    −0.2313     6.0155
15      −7.2134     −0.1467     5.9825
16      −17.6458    −8.9736     6.2405
17      −8.8328     −0.9312     6.0560
18      −10.4511    0.8707      6.1284
19      −7.7047     8.1823      5.9927
20      −11.5716    8.7465      5.7727
21      −15.3545    9.2107      5.8167
22      −16.2622    −2.3090     5.7556
23      −6.9279     −0.9114     5.9850
24      −9.5771     −2.7039     6.0412
25      −6.9795     −14.5303    5.8064

The numbering of the views follows the order of the image dataset of [7].

Table 2
Estimates $\hat{\mathbf{z}}_i$ of the internal parameters $\mathbf{z}_i$ obtained from the 25 images in the dataset of [7], along with their standard deviations (SDs)

$z_i$     $\hat{z}_i$      SD
$x_p$     −16.48           0.94
$y_p$     −2.90            0.97
$k_1$     5.71×10⁻⁷        2.85×10⁻⁸
$k_2$     6.05×10⁻¹³       6.76×10⁻¹³
$k_3$     −2.06×10⁻¹⁸      4.58×10⁻¹⁸
$p_1$     −6.34×10⁻⁷       5.23×10⁻⁷
$p_2$     −1.06×10⁻⁶       7.08×10⁻⁷
$p_3$     1.37×10⁻⁶        1.04×10⁻⁵
$f_x$     656.66           0.71
$f_y$     657.83           0.57

4.3. Stage II: internal and external geometry from multi-view rectification

The internal geometry of the camera and the parameters defining the external geometry of the $K$ views of $O$ are estimated by defining the cost function

$$c = \sum_{k=1}^{K} c_r^{(k)}, \tag{41}$$

where the $k$th component cost function $c_r^{(k)}$ is expressed by Eq. (34); the two arrays $\chi_1^{(k)}[m,n]$ and $\chi_2^{(k)}[m,n]$ are now defined as

$$\chi_1^{(k)}[m,n] \triangleq \chi_1(\boldsymbol{\psi}(\mathbf{x}, \mathbf{z}_i); \mathbf{z}_e^{(k)})\Big|_{x = x^{(k)}[m,n],\ y = y^{(k)}[m,n]}, \qquad \chi_2^{(k)}[m,n] \triangleq \chi_2(\boldsymbol{\psi}(\mathbf{x}, \mathbf{z}_i); \mathbf{z}_e^{(k)})\Big|_{x = x^{(k)}[m,n],\ y = y^{(k)}[m,n]}, \tag{42}$$

where

$$\begin{cases} x' = \chi_1(\boldsymbol{\psi}(\mathbf{x}, \mathbf{z}_i); \mathbf{z}_e^{(k)}) = \dfrac{a_{11}^{(k)}(\mathbf{z}_e^{(k)})(\psi_1(\mathbf{x}, \mathbf{z}_i) - x_p) + a_{12}^{(k)}(\mathbf{z}_e^{(k)})(\psi_2(\mathbf{x}, \mathbf{z}_i) - y_p) + b_1^{(k)}(\mathbf{z}_e^{(k)})}{c_1^{(k)}(\mathbf{z}_e^{(k)})(\psi_1(\mathbf{x}, \mathbf{z}_i) - x_p) + c_2^{(k)}(\mathbf{z}_e^{(k)})(\psi_2(\mathbf{x}, \mathbf{z}_i) - y_p) + 1}, \\[2ex] y' = \chi_2(\boldsymbol{\psi}(\mathbf{x}, \mathbf{z}_i); \mathbf{z}_e^{(k)}) = \dfrac{a_{21}^{(k)}(\mathbf{z}_e^{(k)})(\psi_1(\mathbf{x}, \mathbf{z}_i) - x_p) + a_{22}^{(k)}(\mathbf{z}_e^{(k)})(\psi_2(\mathbf{x}, \mathbf{z}_i) - y_p) + b_2^{(k)}(\mathbf{z}_e^{(k)})}{c_1^{(k)}(\mathbf{z}_e^{(k)})(\psi_1(\mathbf{x}, \mathbf{z}_i) - x_p) + c_2^{(k)}(\mathbf{z}_e^{(k)})(\psi_2(\mathbf{x}, \mathbf{z}_i) - y_p) + 1}. \end{cases} \tag{43}$$

By using homogeneous coordinates, Eq. (43) reads

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} a_{11}^{(k)}(\mathbf{z}_e^{(k)}) & a_{12}^{(k)}(\mathbf{z}_e^{(k)}) & b_1^{(k)}(\mathbf{z}_e^{(k)}) \\ a_{21}^{(k)}(\mathbf{z}_e^{(k)}) & a_{22}^{(k)}(\mathbf{z}_e^{(k)}) & b_2^{(k)}(\mathbf{z}_e^{(k)}) \\ c_1^{(k)}(\mathbf{z}_e^{(k)}) & c_2^{(k)}(\mathbf{z}_e^{(k)}) & 1 \end{bmatrix} \begin{bmatrix} \psi_1(\mathbf{x}, \mathbf{z}_i) - x_p \\ \psi_2(\mathbf{x}, \mathbf{z}_i) - y_p \\ 1 \end{bmatrix}. \tag{44}$$

Eqs. (44) and (43) are simply Eqs. (15) and (16), respectively, where the dependence of the forward homography coefficients on the external parameters $\mathbf{z}_e^{(k)} \triangleq [\omega^{(k)}\ \varphi^{(k)}\ \theta^{(k)}\ t_X^{(k)}\ t_Y^{(k)}\ t_Z^{(k)}]^T \in \mathbb{R}^6$ of the $k$th view is made explicit through the use of Eqs. (21) and (22). It should be observed that, while the forward homography coefficients in Eq. (33) are unconstrained and the simplifications of Eq. (32) are necessary to remove estimation ambiguities, making explicit in Eq. (43) their dependence on the external parameters intrinsically establishes constraints among these coefficients (i.e. the 2-D homography has to be consistent with a 3-D rigid motion) and, therefore, all parameters can be estimated unambiguously.

It is also worth pointing out that the remarkable characteristics of our camera calibration algorithm, both in terms of accuracy and robust convergence to the optimal solution (see Section 5), and its novelty with respect to the current literature in the field rely on the formulation of Eq. (43) or (44). This formulation encodes the six degrees of freedom of the external geometry associated with each view of the calibration plate and the 10 parameters describing lens distortion into the image warping which compensates for such distortion and, at the same time, rectifies each view. Although this warping is highly nonlinear and the cost function associated with it is defined over a parameter space having a large number of dimensions, our algorithmic strategy permits the optimal and accurate recovery of all parameters.
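A compact sketch of the structure of the Stage II model (an illustration, not the paper's implementation): it reuses the `rotation_rpy` and `correct_distortion` sketches given earlier, builds the backward homography coefficients of Eq. (21), inverts them per Eq. (22), and returns the stacked residuals whose squared norm is the cost $c$ of Eqs. (34) and (41):

```python
import numpy as np

def stage2_residuals(z, feats, feats_ref, K):
    """Residual vector for the cost c of Eq. (41). z packs the 10 internal
    parameters z_i followed by the 6 external parameters z_e^(k) of each
    view, as in Section 4.3; feats[k] holds the (M*N, 2) features of view k
    and feats_ref the (M*N, 2) reference features."""
    z_i, res = z[:10], []
    xp, yp = z_i[0], z_i[1]
    fx, fy = z_i[8], z_i[9]
    for k in range(K):
        omega, phi, theta, tX, tY, tZ = z[10 + 6 * k:16 + 6 * k]
        R = rotation_rpy(omega, phi, theta)
        # Backward homography coefficients of Eq. (21), then invert (Eq. 22).
        D = R[2, 2] + tZ
        Hb = np.array([[R[0, 0] / D, fx * R[0, 1] / (fy * D), fx * (R[0, 2] + tX) / D],
                       [fy * R[1, 0] / (fx * D), R[1, 1] / D, fy * (R[1, 2] + tY) / D],
                       [R[2, 0] / (fx * D), R[2, 1] / (fy * D), 1.0]])
        H = np.linalg.inv(Hb)
        H /= H[2, 2]
        # Undistort the measured features (Eq. 9) and rectify them (Eq. 44).
        xu, yu = correct_distortion(feats[k][:, 0], feats[k][:, 1], z_i[:8])
        p = np.column_stack([xu - xp, yu - yp, np.ones_like(xu)]) @ H.T
        res.append((p[:, :2] / p[:, 2:3] - feats_ref).ravel())
    return np.concatenate(res)
```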


The new optimization problem is formulated as

$$\hat{\mathbf{z}} = \arg\min_{\mathbf{z}} c, \tag{45}$$

where $\mathbf{z} \triangleq [\mathbf{z}_i^T\ (\mathbf{z}_e^{(1)})^T \cdots (\mathbf{z}_e^{(K)})^T]^T \in \mathbb{R}^{10+6K}$ is the overall parameter vector; in the example of this paper, $\mathbf{z} \in \mathbb{R}^{160}$. The LM algorithm is used also for the solution of the problem in Eq. (45). By dropping the superscripts for notational convenience and by denoting with $z$ the generic component of the parameter vector $\mathbf{z}$, the partial derivatives of the cost function in Eq. (41) are

$$\frac{\partial c}{\partial z} = 2 \sum_{k=1}^{K} \sum_{m=1}^{M} \sum_{n=1}^{N} \left[ \Delta x'[m,n] \frac{\partial \chi_1}{\partial z} + \Delta y'[m,n] \frac{\partial \chi_2}{\partial z} \right], \tag{46}$$

where $\chi_1[m,n]$ and $\chi_2[m,n]$ in Eq. (38) are now expressed by Eq. (42). The partial derivatives of $\chi_1$ and $\chi_2$ with respect to $z$ are reported in Appendices C-F. The partial derivatives in Eq. (46) of the cost function $c$ are collected into the gradient vector

$$\nabla_{\mathbf{z}} c = [\nabla_{\mathbf{z}_i}^T c\ \nabla_{\mathbf{z}_e^{(1)}}^T c \cdots \nabla_{\mathbf{z}_e^{(K)}}^T c]^T \in \mathbb{R}^{10+6K}, \tag{47}$$

where

$$\nabla_{\mathbf{z}_i} c \triangleq \left[ \frac{\partial c}{\partial x_p}\ \frac{\partial c}{\partial y_p}\ \frac{\partial c}{\partial k_1}\ \frac{\partial c}{\partial k_2}\ \frac{\partial c}{\partial k_3}\ \frac{\partial c}{\partial p_1}\ \frac{\partial c}{\partial p_2}\ \frac{\partial c}{\partial p_3}\ \frac{\partial c}{\partial f_x}\ \frac{\partial c}{\partial f_y} \right]^T \in \mathbb{R}^{10} \tag{48}$$

and

$$\nabla_{\mathbf{z}_e^{(k)}} c \triangleq \left[ \frac{\partial c}{\partial \omega^{(k)}}\ \frac{\partial c}{\partial \varphi^{(k)}}\ \frac{\partial c}{\partial \theta^{(k)}}\ \frac{\partial c}{\partial t_X^{(k)}}\ \frac{\partial c}{\partial t_Y^{(k)}}\ \frac{\partial c}{\partial t_Z^{(k)}} \right]^T \in \mathbb{R}^6, \quad k = 1,\dots,K, \tag{49}$$

and, from this, an approximation of the Hessian matrix for the LM algorithm is computed as $\Xi = \nabla_{\mathbf{z}} c\, \nabla_{\mathbf{z}}^T c$.

The initial values for the components of the parameter vector $\mathbf{z}$ are chosen as follows: (i) $x_p$, $y_p$, and $k_1$ are initialized by averaging the corresponding estimates across the $K$ views computed in Stage I and collected in Table 1; (ii) all the remaining lens distortion coefficients in $\mathbf{z}_\ell$ are set to zero; (iii) the external parameters $\mathbf{z}_e^{(k)}$ of the $K$ views are initialized with the values computed from Eqs. (25) and (28) after determining the backward homography from the forward homography through $\bar{H}^{(k)} = (H^{(k)})^{-1}$ and after computing $\rho_0 = f_y/f_x$ from the zero-crossing of Eq. (27) close to $\rho = 1$; (iv) $f_x$ and $f_y$ are initialized by averaging the corresponding $K$ values obtained from (iii).

Thanks to this initialization, the LM algorithm converges to the optimal solution $\hat{\mathbf{z}}_{\text{opt}}$ rather quickly, usually within 20-25 iterations (it should be remembered that the overall number of optimization variables is $10 + 6K$). This can be clearly appreciated in the plots of Fig. 7(a) and (b), which show the cost function $c$ versus the iteration number of the LM algorithm for the two examples of Sections 5.1 and 5.2.
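The paper minimizes Eq. (45) with a Levenberg-Marquardt loop driven by the closed-form gradients of Eqs. (46)-(49); the sketch below instead hands the residual vector of the previous sketch to SciPy's LM driver, which differences the Jacobian numerically and therefore forfeits the computational advantage discussed in Section 1:

```python
from scipy.optimize import least_squares

def calibrate(z0, feats, feats_ref, K):
    """Minimal Stage II driver (Eq. 45). z0 packs the initial values
    (i)-(iv) described in the text; stage2_residuals is the sketch above.
    Note that sol.cost equals one half of c in Eq. (41)."""
    sol = least_squares(stage2_residuals, z0, method='lm',
                        args=(feats, feats_ref, K))
    return sol.x, sol.cost

# Usage, assuming z0 and the feature arrays have been prepared:
# z_opt, c_opt = calibrate(z0, feats, feats_ref, K=25)
```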

Fig. 7. (a) Behavior of the cost function c in Eq. (41) with respect to the number of iterations of the LM algorithm for the 25 images of [7]; (b) analogous plot for the corresponding synthetic images.

Similar convergence patterns were observed also in other experimental runs where optical systems affected by various degrees of lens distortion were calibrated.

5. Experimental results

5.1. Real images

The calibration algorithm of Section 4 was applied to the 25 images of the dataset of [7]. The estimates $\hat{\mathbf{z}}_i$ of the internal parameters $\mathbf{z}_i$ are shown in Table 2; this table also shows the relative standard deviations (SDs), obtained by computing the square roots of the diagonal entries of the inverse of the Hessian matrix $\Xi$ for $\mathbf{z} = \hat{\mathbf{z}}_{\text{opt}}$. The estimates $\hat{\mathbf{z}}_e^{(k)} \triangleq [\hat{\omega}^{(k)}\ \hat{\varphi}^{(k)}\ \hat{\theta}^{(k)}\ \hat{t}_X^{(k)}\ \hat{t}_Y^{(k)}\ \hat{t}_Z^{(k)}]^T \in \mathbb{R}^6$, $k = 1,\dots,25$, of the external parameters $\mathbf{z}_e^{(k)}$ of the 25 views of the calibration plate $O$ are not reported here for reasons of space.


Fig. 8. (a) Net corrective displacement $\delta \triangleq (\delta_1^2 + \delta_2^2)^{1/2}$, where $\delta_1$ and $\delta_2$ are defined in Eq. (10), relative to the internal parameters of Table 2; (b) radial component of $\delta$; (c) decentering component of $\delta$. The yellow '+' symbol indicates the physical center of the image plane whereas the red '×' symbol denotes the estimated principal point. (For interpretation of references to color in this figure legend, the reader is referred to the web version of the article.)

Fig. 8(a) shows the net corrective displacement $\delta \triangleq (\delta_1^2 + \delta_2^2)^{1/2}$, where $\delta_1$ and $\delta_2$ are defined in Eq. (10), relative to the internal parameters of Table 2. The yellow '+' symbol indicates the physical center of the image plane whereas the red '×' symbol denotes the estimated principal point. The intensity of the image is proportional to the amount of compensation necessary to find the correct location of any given pixel; it should be noted that, near the four corners, the corrective displacement is of the order of 40 pixels. The arrows denote the magnitude and orientation of the gradient of $\delta$. The radial and decentering components of $\delta$ are displayed in Fig. 8(b) and (c). The comparison of the two reveals that the former accounts for most of the lens distortion.

Fig. 9(a) shows the image of Fig. 1(b) after compensation for lens distortion with the internal parameters of Table 2. Fig. 9(b) shows the same image after rectification, i.e. after eliminating the perspective warp in the image of Fig. 9(a) by means of the estimated external parameters associated with this view.

Fig. 10 shows the residual rectification errors $\Delta x'$ and $\Delta y'$ after registering all $K$ views $I_d^{(k)}(\mathbf{x})$ of $O$ onto the reference image $I_r(\mathbf{x}')$ by plugging $\mathbf{z} = \hat{\mathbf{z}}_{\text{opt}}$ into Eq. (43). Fig. 11(a) and (b) shows the histograms of the Cartesian components of the residual rectification errors, namely the horizontal offset $\Delta x'$ and the vertical offset $\Delta y'$ defined in Sections 4.2 and 4.3. Fig. 11(c) shows the histogram of the net residual displacement $e' \triangleq ((\Delta x')^2 + (\Delta y')^2)^{1/2}$.

Fig. 9. (a) Image of Fig. 1(b) after compensation for lens distortion with the internal parameters of Table 2. (b) Image of Fig. 9(a) after rectification.


Fig. 10. Residual multi-view rectification errors for the 25 images of [7].

These fits are marked with red lines in Fig. 11(a)–(c), respectively; their mean values and SDs are reported in the plots. It should be noticed that the mean value of the latter is 0.2 pixels, with an SD of 0.1 pixels.

Only a subset of the internal parameters computed by Bouguet in [7] can be compared with our estimates because he uses, as many authors do (e.g. see [36]), the lens distortion model of Eq. (12), where $d_1'$ and $d_2'$ are expressed by Eq. (10) with $\mathbf{x}_d$ in place of $\mathbf{x}_u$. Therefore, a direct comparison between the radial and decentering distortion coefficients is not possible. The parameters that can be compared are the coordinates of the principal point and the effective focal lengths along x and y; with Bouguet's calibration method they are found to be: $x_p = -17.03$ (0.58), $y_p = -2.50$ (0.55), $f_x = 657.25$ (0.28), and $f_y = 657.68$ (0.29), with the respective SDs reported in parentheses. These values are very close to the estimates obtained with our algorithm and reported in Table 2.

In order to compare our residual alignment errors with Bouguet's reprojection errors, a few simplifications have to be made, since the transformations of Eq. (43) are not invertible. Bouguet's reprojection errors are computed on the image planes of the K views of O; our registration errors are computed instead on the image plane of the virtual pinhole camera in a CFP configuration with respect to O. By neglecting lens distortion and by using the backward homography of Eq. (20), the equivalent reprojection errors for the kth view can be approximated as

$$\begin{cases} \Delta x[m,n] \simeq \dfrac{a_{11}^{(k)} \Delta x'[m,n] + a_{12}^{(k)} \Delta y'[m,n]}{g_1^{(k)} x'[m,n] + g_2^{(k)} y'[m,n] + 1}, \\[1.5ex] \Delta y[m,n] \simeq \dfrac{a_{21}^{(k)} \Delta x'[m,n] + a_{22}^{(k)} \Delta y'[m,n]}{g_1^{(k)} x'[m,n] + g_2^{(k)} y'[m,n] + 1}. \end{cases} \qquad (50)$$
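In code, the conversion of Eq. (50) is immediate; the sketch below assumes per-point residual arrays together with the 2×2 block $A^{(k)}$ and the pair $(g_1^{(k)}, g_2^{(k)})$ of the kth homography (the unit term in the denominator reflects the usual normalization of the homography's third row and is an assumption of this sketch):

```python
import numpy as np

def approx_reprojection_errors(dx_p, dy_p, x_p, y_p, A, g):
    """Eq. (50): map rectification-plane residuals (dx_p, dy_p) at the
    points (x_p, y_p) to approximate view-plane reprojection errors.
    A = [[a11, a12], [a21, a22]] and g = (g1, g2); lens distortion is
    neglected, as in the text."""
    den = g[0] * x_p + g[1] * y_p + 1.0   # projective denominator (assumed normalization)
    dx = (A[0, 0] * dx_p + A[0, 1] * dy_p) / den
    dy = (A[1, 0] * dx_p + A[1, 1] * dy_p) / den
    return dx, dy
```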

Fig. 11. Histograms of the residual multi-view rectification errors $\Delta x'$, $\Delta y'$, and $e' \triangleq ((\Delta x')^2 + (\Delta y')^2)^{1/2}$ (a–c, respectively) and their fitting with Gaussian PDFs (a and b) and a Rayleigh PDF (c).

The histograms of $\Delta x$, $\Delta y$, and $e \triangleq ((\Delta x)^2 + (\Delta y)^2)^{1/2}$ across all K views are displayed in Fig. 12(a)–(c), respectively. The standard deviations of $\Delta x$ and $\Delta y$ are, respectively, $\sigma_{\Delta x} = 0.091$ and $\sigma_{\Delta y} = 0.107$; the corresponding values obtained with Bouguet's technique are 0.119 and 0.116.
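The Gaussian and Rayleigh fits of Figs. 11 and 12 can be reproduced, for instance, with scipy.stats; the arrays below are synthetic stand-ins for the pooled residuals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
dx = rng.normal(0.0, 0.1, 10_000)             # stand-ins for the pooled residuals
dy = rng.normal(0.0, 0.1, 10_000)
e = np.hypot(dx, dy)                          # net residual displacement

mu_x, sigma_x = stats.norm.fit(dx)            # Gaussian fits for the offsets
mu_y, sigma_y = stats.norm.fit(dy)
_, scale_e = stats.rayleigh.fit(e, floc=0.0)  # Rayleigh fit for the net error
print(sigma_x, sigma_y, scale_e)
```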


Fig. 13. (a) Synthetic version of the image of Fig. 1(b); (b) synthetic image superimposed on the actual image.

5.2. Synthetic images

In order to assess the accuracy of our camera calibration algorithm, we have generated synthetic versions of all 25 images of the dataset of [7] by using as internal and external parameters the values obtained from the 25 actual images. Fig. 13(a) shows the synthetic version of the image of Fig. 1(b) and Fig. 13(b) its superposition onto the actual image. The estimates $\hat{\zeta}_i$ of the internal parameters $\zeta_i$ obtained from the 25 synthetic images are shown in Table 3.

Fig. 12. Histograms of the approximated reprojection errors $\Delta x$, $\Delta y$, and $e \triangleq ((\Delta x)^2 + (\Delta y)^2)^{1/2}$ (a–c, respectively) and their fitting with Gaussian PDFs (a and b) and a Rayleigh PDF (c).

Table 3
Estimates $\hat{\zeta}_i$ of the internal parameters $\zeta_i$ obtained from the synthetic versions of the 25 images in the dataset of [7], along with their SDs

$\zeta_i$    $\hat{\zeta}_i$      SD
$x_p$        −16.48               0.94
$y_p$        −2.87                0.97
$k_1$        5.71×10⁻⁷            2.85×10⁻⁸
$k_2$        6.11×10⁻¹³           6.76×10⁻¹³
$k_3$        −2.10×10⁻¹⁸          4.58×10⁻¹⁸
$p_1$        −6.34×10⁻⁷           5.23×10⁻⁷
$p_2$        −1.07×10⁻⁶           7.10×10⁻⁷
$p_3$        1.30×10⁻⁶            1.03×10⁻⁵
$f_x$        656.66               0.71
$f_y$        657.82               0.57


Fig. 14. Histogram $h(e(\hat{\zeta}_e))$ of the relative estimation errors $e(\hat{\zeta}_e) \triangleq |[(\hat{\zeta}_e)_{real} - (\hat{\zeta}_e)_{syn}]/(\hat{\zeta}_e)_{real}|$ between the estimates $(\hat{\zeta}_e)_{real}$ of the external parameters obtained from the actual images and the corresponding estimates $(\hat{\zeta}_e)_{syn}$ obtained from the synthetic images generated by using those parameters.

The comparison with Table 2 shows that all the parameters have been estimated very accurately. Fig. 14 shows the histogram $h(e(\hat{\zeta}_e))$ of the relative estimation errors

$$e(\hat{\zeta}_e) \triangleq \left| \frac{(\hat{\zeta}_e)_{real} - (\hat{\zeta}_e)_{syn}}{(\hat{\zeta}_e)_{real}} \right| \qquad (51)$$

between the estimates $(\hat{\zeta}_e)_{real}$ of the external parameters obtained from the actual images and the corresponding estimates $(\hat{\zeta}_e)_{syn}$ obtained from the synthetic images generated by using those parameters. It should be noted that the maximum relative estimation error is about 0.3%, whereas most of the errors are below 0.05%. This is a clear indication that the 3-D structure of the scene, encoded by the external parameters, is accurately estimated for all the views of the calibration plate O. Fig. 15 shows the residual rectification errors $\Delta x'$ and $\Delta y'$ after registering all K synthetic views $\ell_d^{(k)}(\mathbf{x})$ of O onto the reference image $\ell_r(\mathbf{x}')$ by plugging $\zeta = \hat{\zeta}_{opt}$ into Eq. (43). Fig. 16(a)–(c) shows plots analogous to those of Fig. 11(a)–(c). The numerical values in Fig. 16(c) indicate that the residual multi-view rectification error has a mean value of about 0.01 pixels and an SD of about 0.005 pixels, showing (1) the high precision of the subpixel feature extractor of [6] and (2) the high accuracy of our calibration technique. The difference of about one order of magnitude between the residual registration errors obtained from the actual and synthetic images can be ascribed (1) to the presence of digitization noise in the former, (2) to the possible deviation from perfect flatness of the calibration plate, and (3) to the likely presence of lens distortion components not modeled by Eq. (10).
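Eq. (51) amounts to a componentwise relative error; a short Python rendering is:

```python
import numpy as np

def relative_estimation_error(ze_real, ze_syn):
    """Eq. (51): relative error between the external parameters
    estimated from the actual and from the synthetic images."""
    ze_real, ze_syn = np.asarray(ze_real), np.asarray(ze_syn)
    return np.abs((ze_real - ze_syn) / ze_real)
```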


Fig. 15. Residual multi-view rectification errors for the synthetic versions of the 25 images of [7].

6. Concluding remarks

A new algorithmic framework for estimating the intrinsic and extrinsic parameters of a digital camera has been presented in this paper; it is based on the joint rectification of multiple views of a planar calibration target. Its performance has been tested and commented on with both real and synthetic data, which have demonstrated its high precision. The major contributions of this paper consist in showing how both the internal and external geometry can be obtained very accurately from multi-view image rectification and in providing a truly effective algorithm for its actual implementation. Unlike other related contributions, this paper provides the expressions of all the partial derivatives of the quadratic cost functions defined in the two stages of the algorithm with respect to all the internal and external parameters, allowing the closed-form computation of the gradients and Hessian matrices used by the Levenberg-Marquardt algorithm in their minimization. Another important feature of our method is its striking algorithmic simplicity with respect to most of the calibration techniques which use camera models of comparable complexity (e.g. see [33,36]). The high accuracy of our feature extractor and the ease of implementation of our multi-view rectification algorithm constitute a powerful tool for the complete, reliable, and high-precision calibration of digital cameras affected by any kind of lens distortion. Research is currently underway to completely automate and speed up the calibration procedure, implemented in Matlab at the time of writing, and to make it applicable to active vision systems where the intrinsic parameters may vary over time. Future research will aim at determining the optimal number of views to use for calibration, in particular the minimum number which guarantees a reliable estimate of the principal point. In fact, the principal point is one of the most difficult parameters to determine, and the estimates of the other lens distortion coefficients do depend on it [7]. In our tests with real data, it was observed that the coordinates of the principal point do not move significantly when more than 5–6 views are used. We believe that our technique provides a very attractive tool for the complete and accurate calibration of digital cameras and represents a great asset for both computer vision and photogrammetry applications.



Acknowledgements

The author thanks the reviewers, whose comments provided very useful insights for improving the final version of the manuscript.

Appendix A

This appendix provides the algorithm for the conversion of the two feature vectors X and Y into the two ordered arrays x^(k)[m, n] and y^(k)[m, n]:

(i) (x^(k)[1, 1], y^(k)[1, 1]), (x^(k)[1, N], y^(k)[1, N]), (x^(k)[M, N], y^(k)[M, N]), and (x^(k)[M, 1], y^(k)[M, 1]) are computed as the entries of X and Y having the smallest Euclidean distance, respectively, from corners 1, 2, 3, and 4 of the manually selected ROI (see Fig. 6(a));

(ii) (x^(k)[1, n], y^(k)[1, n]), for n = 2, …, N−1, are determined by computing the analytical equation of the line joining points (x^(k)[1, 1], y^(k)[1, 1]) and (x^(k)[1, N], y^(k)[1, N]) and by seeking the entries of X and Y whose distance to this line is below a preselected threshold; these candidates are then ordered based on their distance from (x^(k)[1, 1], y^(k)[1, 1]); the row of index M and the columns of indices 1 and N are determined likewise;

(iii) the matrices x^(k)[m, n] and y^(k)[m, n] are then filled left-to-right and top-to-bottom; once (x^(k)[m, n], y^(k)[m, n]) has been determined, (x^(k)[m, n+1], y^(k)[m, n+1]) is found:

(a) by computing
$$w = \arctan \frac{y^{(k)}[m-1, n+1] - y^{(k)}[m-1, n]}{x^{(k)}[m-1, n+1] - x^{(k)}[m-1, n]}, \qquad (A1)$$

Fig. 16. Histograms of the residual multi-view rectification errors $\Delta x'$, $\Delta y'$, and $e' \triangleq ((\Delta x')^2 + (\Delta y')^2)^{1/2}$ (a–c, respectively) and their fitting with Gaussian PDFs (a and b) and a Rayleigh PDF (c).


(b) by seeking the entries of X and Y having an angular distance to w below a preselected angular threshold and lying within a preselected Euclidean distance from (x^(k)[m, n], y^(k)[m, n]);

(c) by keeping, out of the point set found in (b), the point having the smallest Euclidean distance from (x^(k)[m, n], y^(k)[m, n]).
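A Python sketch of steps (iii)(a)–(c), with illustrative threshold values, could read:

```python
import numpy as np

def next_in_row(pts, prev, above, above_right, ang_tol=0.15, dist_tol=40.0):
    """Steps (iii)(a)-(c) of Appendix A: pick the point following `prev`
    in the current row, guided by the direction w of Eq. (A1) observed
    in the row above; the thresholds are illustrative."""
    w = np.arctan2(above_right[1] - above[1], above_right[0] - above[0])
    d = pts - prev                                    # (P, 2) displacements
    ang = np.arctan2(d[:, 1], d[:, 0])
    dist = np.hypot(d[:, 0], d[:, 1])
    wrap = np.abs(np.angle(np.exp(1j * (ang - w))))   # angular distance to w
    ok = (wrap < ang_tol) & (dist < dist_tol) & (dist > 0)
    cand, cand_dist = pts[ok], dist[ok]
    return cand[np.argmin(cand_dist)]                 # step (c): nearest candidate
```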

Appendix B

This appendix computes the partial derivatives of $\xi_1$ and $\xi_2$ in Eq. (33) with respect to the components of the reduced parameter vector $\zeta_r^{(k)}$. The superscripts are omitted for notational convenience:

$$\begin{cases}
\frac{\partial \xi_1}{\partial \zeta_r} = \frac{1}{P}\left[(a_{11} - \xi_1 c_1)\frac{\partial \psi_1}{\partial \zeta_r} + (a_{12} - \xi_1 c_2)\frac{\partial \psi_2}{\partial \zeta_r}\right], \\
\frac{\partial \xi_2}{\partial \zeta_r} = \frac{1}{P}\left[(a_{21} - \xi_2 c_1)\frac{\partial \psi_1}{\partial \zeta_r} + (a_{22} - \xi_2 c_2)\frac{\partial \psi_2}{\partial \zeta_r}\right], \quad \zeta_r = x_p, y_p, k_1, \\
\frac{\partial \xi_1}{\partial a_{11}} = \frac{\partial \xi_2}{\partial a_{21}} = \frac{\psi_1}{P}, \quad \frac{\partial \xi_1}{\partial a_{12}} = \frac{\partial \xi_2}{\partial a_{22}} = \frac{\psi_2}{P}, \\
\frac{\partial \xi_1}{\partial b_1} = \frac{\partial \xi_2}{\partial b_2} = \frac{1}{P}, \\
\frac{\partial \xi_1}{\partial c_1} = -\xi_1 \frac{\psi_1}{P}, \quad \frac{\partial \xi_1}{\partial c_2} = -\xi_1 \frac{\psi_2}{P}, \\
\frac{\partial \xi_2}{\partial c_1} = -\xi_2 \frac{\psi_1}{P}, \quad \frac{\partial \xi_2}{\partial c_2} = -\xi_2 \frac{\psi_2}{P},
\end{cases} \qquad (B1)$$

where $P \triangleq c_1(\psi_1 - x_p) + c_2(\psi_2 - y_p) + 1$, and

$$\begin{cases}
\frac{\partial \psi_1}{\partial x_p} = -k_1 (r^2 + 2(x - x_p)^2), \\
\frac{\partial \psi_2}{\partial x_p} = \frac{\partial \psi_1}{\partial y_p} = -2 k_1 (x - x_p)(y - y_p), \\
\frac{\partial \psi_2}{\partial y_p} = -k_1 (r^2 + 2(y - y_p)^2), \\
\frac{\partial \psi_1}{\partial k_1} = r^2 (x - x_p), \\
\frac{\partial \psi_2}{\partial k_1} = r^2 (y - y_p).
\end{cases} \qquad (B2)$$
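These expressions are easy to validate numerically; the snippet below checks the first line of (B2) against a central finite difference for the Stage-I model $\psi_1 = x + k_1 r^2 (x - x_p)$ (the numerical values are arbitrary):

```python
import numpy as np

def psi(x, y, xp, yp, k1):
    """Stage-I model behind (B2), with r^2 = (x-xp)^2 + (y-yp)^2."""
    r2 = (x - xp) ** 2 + (y - yp) ** 2
    return x + k1 * r2 * (x - xp), y + k1 * r2 * (y - yp)

x, y, xp, yp, k1 = 120.0, -80.0, 3.0, -2.0, 5e-7
r2 = (x - xp) ** 2 + (y - yp) ** 2
analytic = -k1 * (r2 + 2 * (x - xp) ** 2)     # d(psi1)/d(xp) from (B2)
h = 1e-4
numeric = (psi(x, y, xp + h, yp, k1)[0] - psi(x, y, xp - h, yp, k1)[0]) / (2 * h)
assert np.isclose(analytic, numeric)
```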

Appendix C

This appendix computes the partial derivatives of the backward homography coefficients in Eq. (21) with respect to the components $\zeta_e = \omega, \varphi, \theta, t_X, t_Y, t_Z$ of the external parameter vector $\zeta_e$ and with respect to $f_x$, $f_y$, in terms of the derivatives of the forward homography coefficients (the tilde denotes the coefficients of the backward homography):

$$\begin{cases}
\frac{\partial \tilde{a}_{11}}{\partial \zeta_e} = \frac{1}{\det A}\Big(\frac{\partial a_{22}}{\partial \zeta_e} - g_2 \frac{\partial b_2}{\partial \zeta_e} - b_2 \frac{\partial g_2}{\partial \zeta_e}\Big) - \frac{S}{\det^2 A}\,(a_{22} - b_2 g_2), \\
\frac{\partial \tilde{a}_{12}}{\partial \zeta_e} = \frac{1}{\det A}\Big(-\frac{\partial a_{12}}{\partial \zeta_e} + g_2 \frac{\partial b_1}{\partial \zeta_e} + b_1 \frac{\partial g_2}{\partial \zeta_e}\Big) - \frac{S}{\det^2 A}\,(-a_{12} + b_1 g_2), \\
\frac{\partial \tilde{a}_{21}}{\partial \zeta_e} = \frac{1}{\det A}\Big(-\frac{\partial a_{21}}{\partial \zeta_e} + g_1 \frac{\partial b_2}{\partial \zeta_e} + b_2 \frac{\partial g_1}{\partial \zeta_e}\Big) - \frac{S}{\det^2 A}\,(-a_{21} + b_2 g_1), \\
\frac{\partial \tilde{a}_{22}}{\partial \zeta_e} = \frac{1}{\det A}\Big(\frac{\partial a_{11}}{\partial \zeta_e} - g_1 \frac{\partial b_1}{\partial \zeta_e} - b_1 \frac{\partial g_1}{\partial \zeta_e}\Big) - \frac{S}{\det^2 A}\,(a_{11} - b_1 g_1), \\
\frac{\partial \tilde{b}_{1}}{\partial \zeta_e} = \frac{1}{\det A}\Big(b_2 \frac{\partial a_{12}}{\partial \zeta_e} + a_{12} \frac{\partial b_2}{\partial \zeta_e} - b_1 \frac{\partial a_{22}}{\partial \zeta_e} - a_{22} \frac{\partial b_1}{\partial \zeta_e}\Big) - \frac{S}{\det^2 A}\,(a_{12} b_2 - a_{22} b_1), \\
\frac{\partial \tilde{b}_{2}}{\partial \zeta_e} = \frac{1}{\det A}\Big(b_1 \frac{\partial a_{21}}{\partial \zeta_e} + a_{21} \frac{\partial b_1}{\partial \zeta_e} - b_2 \frac{\partial a_{11}}{\partial \zeta_e} - a_{11} \frac{\partial b_2}{\partial \zeta_e}\Big) - \frac{S}{\det^2 A}\,(a_{21} b_1 - a_{11} b_2), \\
\frac{\partial \tilde{c}_{1}}{\partial \zeta_e} = \frac{1}{\det A}\Big(g_2 \frac{\partial a_{21}}{\partial \zeta_e} + a_{21} \frac{\partial g_2}{\partial \zeta_e} - g_1 \frac{\partial a_{22}}{\partial \zeta_e} - a_{22} \frac{\partial g_1}{\partial \zeta_e}\Big) - \frac{S}{\det^2 A}\,(a_{21} g_2 - a_{22} g_1), \\
\frac{\partial \tilde{c}_{2}}{\partial \zeta_e} = \frac{1}{\det A}\Big(g_1 \frac{\partial a_{12}}{\partial \zeta_e} + a_{12} \frac{\partial g_1}{\partial \zeta_e} - g_2 \frac{\partial a_{11}}{\partial \zeta_e} - a_{11} \frac{\partial g_2}{\partial \zeta_e}\Big) - \frac{S}{\det^2 A}\,(a_{12} g_1 - a_{11} g_2),
\end{cases} \qquad (C1)$$

where S is the partial derivative of $\det A = a_{11} a_{22} - a_{12} a_{21}$ with respect to $\zeta_e$:

$$S \triangleq \frac{\partial(\det A)}{\partial \zeta_e} = a_{22}\frac{\partial a_{11}}{\partial \zeta_e} + a_{11}\frac{\partial a_{22}}{\partial \zeta_e} - \Big(a_{21}\frac{\partial a_{12}}{\partial \zeta_e} + a_{12}\frac{\partial a_{21}}{\partial \zeta_e}\Big). \qquad (C2)$$
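Equations (C1)–(C2) are the componentwise form of the matrix identity $d(H^{-1}) = -H^{-1}\,(dH)\,H^{-1}$; a compact sketch, under the assumption that the backward homography is renormalized so that its (3,3) entry stays equal to 1, is:

```python
import numpy as np

def backward_coefficient_derivatives(H, dH):
    """Derivative of the (normalized) inverse homography given the
    forward matrix H (3x3, H[2,2] = 1) and its derivative dH with
    respect to one external parameter."""
    Hinv = np.linalg.inv(H)
    dHinv = -Hinv @ dH @ Hinv            # d(H^-1) = -H^-1 dH H^-1
    s, ds = Hinv[2, 2], dHinv[2, 2]      # renormalization by the (3,3) entry
    return (dHinv * s - Hinv * ds) / s**2
```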

Appendix D

This appendix computes the partial derivatives of the forward homography coefficients in Eq. (22) with respect to the components $\zeta_e = \omega, \varphi, \theta, t_X, t_Y, t_Z$ of the external parameter vector $\zeta_e$ and with respect to $f_x$, $f_y$:

$$\begin{cases}
\frac{\partial a_{11}}{\partial \omega} = \frac{1}{Q^2}\cos\theta\cos^2\varphi\sin\omega, \\
\frac{\partial a_{11}}{\partial \varphi} = -\frac{1}{Q^2}\, t_Z \cos\theta\sin\varphi, \\
\frac{\partial a_{11}}{\partial \theta} = -\frac{1}{Q}\sin\theta\cos\varphi, \\
\frac{\partial a_{11}}{\partial t_X} = \frac{\partial a_{11}}{\partial t_Y} = 0, \\
\frac{\partial a_{11}}{\partial t_Z} = -\frac{1}{Q^2}\cos\theta\cos\varphi, \\
\frac{\partial a_{11}}{\partial f_x} = \frac{\partial a_{11}}{\partial f_y} = 0,
\end{cases} \qquad (D1)$$

$$\begin{cases}
\frac{\partial a_{21}}{\partial \omega} = \frac{1}{Q^2}\frac{f_y}{f_x}\,(t_Z \sin\theta\sin\omega + t_Z \cos\theta\sin\varphi\cos\omega + \cos\varphi\cos\theta\sin\varphi), \\
\frac{\partial a_{21}}{\partial \varphi} = \frac{1}{Q^2}\frac{f_y}{f_x}\,(t_Z \cos\theta\cos\varphi\sin\omega - \sin\varphi\cos^2\omega\sin\theta + \cos\omega\cos\theta\sin\omega), \\
\frac{\partial a_{21}}{\partial \theta} = -\frac{1}{Q}\frac{f_y}{f_x}\,(\cos\theta\cos\omega + \sin\theta\sin\varphi\sin\omega), \\
\frac{\partial a_{21}}{\partial t_X} = \frac{\partial a_{21}}{\partial t_Y} = 0, \\
\frac{\partial a_{21}}{\partial t_Z} = \frac{1}{Q^2}\frac{f_y}{f_x}\,(\sin\theta\cos\omega - \cos\theta\sin\varphi\sin\omega), \\
\frac{\partial a_{21}}{\partial f_x} = -\frac{1}{Q}\frac{f_y}{f_x^2}\, r_{21}, \qquad \frac{\partial a_{21}}{\partial f_y} = \frac{1}{Q}\frac{1}{f_x}\, r_{21},
\end{cases} \qquad (D2)$$

$$\begin{cases}
\frac{\partial a_{12}}{\partial \omega} = \frac{1}{Q^2}\frac{f_x}{f_y}\sin\theta\cos^2\varphi\sin\omega, \\
\frac{\partial a_{12}}{\partial \varphi} = -\frac{1}{Q^2}\frac{f_x}{f_y}\, t_Z \sin\theta\sin\varphi, \\
\frac{\partial a_{12}}{\partial \theta} = \frac{1}{Q}\frac{f_x}{f_y}\cos\theta\cos\varphi, \\
\frac{\partial a_{12}}{\partial t_X} = \frac{\partial a_{12}}{\partial t_Y} = 0, \\
\frac{\partial a_{12}}{\partial t_Z} = -\frac{1}{Q^2}\frac{f_x}{f_y}\sin\theta\cos\varphi, \\
\frac{\partial a_{12}}{\partial f_x} = \frac{1}{Q}\frac{1}{f_y}\, r_{12}, \qquad \frac{\partial a_{12}}{\partial f_y} = -\frac{1}{Q}\frac{f_x}{f_y^2}\, r_{12},
\end{cases} \qquad (D3)$$

$$\begin{cases}
\frac{\partial a_{22}}{\partial \omega} = \frac{1}{Q^2}(-t_Z \cos\theta\sin\omega + t_Z \sin\theta\sin\varphi\cos\omega + \sin\theta\cos\varphi\sin\varphi), \\
\frac{\partial a_{22}}{\partial \varphi} = \frac{1}{Q^2}(t_Z \sin\theta\cos\varphi\sin\omega + \sin\varphi\cos^2\omega\cos\theta + \sin\theta\cos\omega\sin\omega), \\
\frac{\partial a_{22}}{\partial \theta} = \frac{1}{Q}(-\sin\theta\cos\omega + \cos\theta\sin\varphi\sin\omega), \\
\frac{\partial a_{22}}{\partial t_X} = \frac{\partial a_{22}}{\partial t_Y} = 0, \\
\frac{\partial a_{22}}{\partial t_Z} = -\frac{1}{Q^2}(\cos\theta\cos\omega + \sin\theta\sin\varphi\sin\omega), \\
\frac{\partial a_{22}}{\partial f_x} = \frac{\partial a_{22}}{\partial f_y} = 0,
\end{cases} \qquad (D4)$$

$$\begin{cases}
\frac{\partial b_{1}}{\partial \omega} = -\frac{1}{Q^2}\, f_x (\sin\varphi - t_X)\cos\varphi\sin\omega, \\
\frac{\partial b_{1}}{\partial \varphi} = -\frac{1}{Q^2}\, f_x (\cos\omega + t_Z \cos\varphi - t_X \sin\varphi\cos\omega), \\
\frac{\partial b_{1}}{\partial \theta} = 0, \\
\frac{\partial b_{1}}{\partial t_X} = \frac{1}{Q}\, f_x, \qquad \frac{\partial b_{1}}{\partial t_Y} = 0, \\
\frac{\partial b_{1}}{\partial t_Z} = \frac{1}{Q^2}\, f_x (\sin\varphi - t_X), \\
\frac{\partial b_{1}}{\partial f_x} = \frac{1}{Q}(-\sin\varphi + t_X), \qquad \frac{\partial b_{1}}{\partial f_y} = 0,
\end{cases} \qquad (D5)$$

$$\begin{cases}
\frac{\partial b_{2}}{\partial \omega} = \frac{1}{Q^2}\, f_y \cos\varphi\,(\cos\varphi + t_Z \cos\omega + t_Y \sin\omega), \\
\frac{\partial b_{2}}{\partial \varphi} = \frac{1}{Q^2}\, f_y \sin\varphi\,(-t_Z \sin\omega + t_Y \cos\omega), \\
\frac{\partial b_{2}}{\partial \theta} = 0, \qquad \frac{\partial b_{2}}{\partial t_X} = 0, \\
\frac{\partial b_{2}}{\partial t_Y} = \frac{1}{Q}\, f_y, \\
\frac{\partial b_{2}}{\partial t_Z} = -\frac{1}{Q^2}\, f_y (\cos\varphi\sin\omega + t_Y), \\
\frac{\partial b_{2}}{\partial f_x} = 0, \qquad \frac{\partial b_{2}}{\partial f_y} = \frac{1}{Q}(\cos\varphi\sin\omega + t_Y),
\end{cases} \qquad (D6)$$

$$\begin{cases}
\frac{\partial g_{1}}{\partial \omega} = \frac{1}{Q^2}\frac{1}{f_x}(t_Z \sin\theta\cos\omega - t_Z \cos\theta\sin\varphi\sin\omega + \sin\theta\cos\varphi), \\
\frac{\partial g_{1}}{\partial \varphi} = \frac{1}{Q^2}\frac{1}{f_x}\cos\omega\,(t_Z \cos\theta\cos\varphi + \sin\varphi\sin\theta\sin\omega + \cos\theta\cos\omega), \\
\frac{\partial g_{1}}{\partial \theta} = \frac{1}{Q}\frac{1}{f_x}(\cos\theta\sin\omega - \sin\theta\sin\varphi\cos\omega), \\
\frac{\partial g_{1}}{\partial t_X} = \frac{\partial g_{1}}{\partial t_Y} = 0, \\
\frac{\partial g_{1}}{\partial t_Z} = -\frac{1}{Q^2}\frac{1}{f_x}(\sin\theta\sin\omega + \cos\theta\sin\varphi\cos\omega), \\
\frac{\partial g_{1}}{\partial f_x} = -\frac{1}{Q}\frac{1}{f_x^2}\, r_{31}, \qquad \frac{\partial g_{1}}{\partial f_y} = 0,
\end{cases} \qquad (D7)$$

$$\begin{cases}
\frac{\partial g_{2}}{\partial \omega} = -\frac{1}{Q^2}\frac{1}{f_y}(t_Z \cos\theta\cos\omega + t_Z \sin\theta\sin\varphi\sin\omega + \cos\theta\cos\varphi), \\
\frac{\partial g_{2}}{\partial \varphi} = \frac{1}{Q^2}\frac{1}{f_y}\cos\omega\,(t_Z \sin\theta\cos\varphi - \sin\varphi\cos\theta\sin\omega + \sin\theta\cos\omega), \\
\frac{\partial g_{2}}{\partial \theta} = \frac{1}{Q}\frac{1}{f_y}(\sin\theta\sin\omega + \cos\theta\sin\varphi\cos\omega), \\
\frac{\partial g_{2}}{\partial t_X} = \frac{\partial g_{2}}{\partial t_Y} = 0, \\
\frac{\partial g_{2}}{\partial t_Z} = \frac{1}{Q^2}\frac{1}{f_y}(\cos\theta\sin\omega - \sin\theta\sin\varphi\cos\omega), \\
\frac{\partial g_{2}}{\partial f_x} = 0, \qquad \frac{\partial g_{2}}{\partial f_y} = -\frac{1}{Q}\frac{1}{f_y^2}\, r_{32},
\end{cases} \qquad (D8)$$

where $Q \triangleq r_{33} + t_Z = \cos\varphi\cos\omega + t_Z$.
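The coefficient expressions implied by these tables can be collected in code and each entry of (D1)–(D8) verified by finite differences; the parameterization below is reverse-engineered from the tables (it reproduces the derivatives listed above) and should be read as a consistency check rather than as the paper's literal Eq. (22):

```python
import numpy as np

def forward_coefficients(w, phi, th, tX, tY, tZ, fx, fy):
    """Homography coefficients consistent with Appendix D, with
    Q = cos(phi)cos(w) + tZ; order: a11, a21, a12, a22, b1, b2, g1, g2."""
    cw, sw = np.cos(w), np.sin(w)
    cp, sp = np.cos(phi), np.sin(phi)
    ct, st = np.cos(th), np.sin(th)
    Q = cp * cw + tZ
    return np.array([
        ct * cp / Q,                                  # a11
        (fy / fx) * (ct * sp * sw - st * cw) / Q,     # a21
        (fx / fy) * st * cp / Q,                      # a12
        (ct * cw + st * sp * sw) / Q,                 # a22
        fx * (tX - sp) / Q,                           # b1
        fy * (cp * sw + tY) / Q,                      # b2
        (st * sw + ct * sp * cw) / (fx * Q),          # g1
        (st * sp * cw - ct * sw) / (fy * Q),          # g2
    ])

# Central-difference check of (D1): da11/dw = cos(th) cos(phi)^2 sin(w) / Q^2.
p = dict(w=0.3, phi=-0.2, th=0.1, tX=0.5, tY=-0.4, tZ=3.0, fx=650.0, fy=655.0)
h = 1e-6
num = (forward_coefficients(**{**p, 'w': p['w'] + h})[0]
       - forward_coefficients(**{**p, 'w': p['w'] - h})[0]) / (2 * h)
Q = np.cos(p['phi']) * np.cos(p['w']) + p['tZ']
ana = np.cos(p['th']) * np.cos(p['phi']) ** 2 * np.sin(p['w']) / Q ** 2
assert np.isclose(num, ana)
```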

Appendix E

This appendix computes the partial derivatives of $\xi_1$ and $\xi_2$ in Eq. (43) with respect to the components of the lens parameter vector $\zeta_\ell$:

$$\begin{cases}
\frac{\partial \xi_1}{\partial \zeta_\ell'} = \frac{1}{D}\left[(a_{11} - \xi_1 c_1)\frac{\partial \psi_1}{\partial \zeta_\ell'} + (a_{12} - \xi_1 c_2)\frac{\partial \psi_2}{\partial \zeta_\ell'} - (a_{11} - \xi_1 c_1)\right], \\
\frac{\partial \xi_2}{\partial \zeta_\ell'} = \frac{1}{D}\left[(a_{21} - \xi_2 c_1)\frac{\partial \psi_1}{\partial \zeta_\ell'} + (a_{22} - \xi_2 c_2)\frac{\partial \psi_2}{\partial \zeta_\ell'} - (a_{21} - \xi_2 c_1)\right], \quad \zeta_\ell' = x_p, y_p, \\
\frac{\partial \xi_1}{\partial \zeta_\ell''} = \frac{1}{D}\left[(a_{11} - \xi_1 c_1)\frac{\partial \psi_1}{\partial \zeta_\ell''} + (a_{12} - \xi_1 c_2)\frac{\partial \psi_2}{\partial \zeta_\ell''}\right], \\
\frac{\partial \xi_2}{\partial \zeta_\ell''} = \frac{1}{D}\left[(a_{21} - \xi_2 c_1)\frac{\partial \psi_1}{\partial \zeta_\ell''} + (a_{22} - \xi_2 c_2)\frac{\partial \psi_2}{\partial \zeta_\ell''}\right], \quad \zeta_\ell'' = k_1, k_2, k_3, p_1, p_2, p_3.
\end{cases} \qquad (E1)$$

The partial derivatives of $\psi_1$ and $\psi_2$ in Eq. (E1) can be easily computed by writing Eqs. (9) and (10) in a compact form as

$$\begin{cases}
\psi_1(x, y, \zeta_\ell) = x + \varrho\,(x - x_p) + q\, l_x, \\
\psi_2(x, y, \zeta_\ell) = y + \varrho\,(y - y_p) + q\, l_y,
\end{cases} \qquad (E2)$$

with

$$\begin{cases}
\varrho \triangleq k_1 r^2 + k_2 r^4 + k_3 r^6, \\
l_x \triangleq p_1 (r^2 + 2(x - x_p)^2) + 2 p_2 (x - x_p)(y - y_p), \\
l_y \triangleq 2 p_1 (x - x_p)(y - y_p) + p_2 (r^2 + 2(y - y_p)^2).
\end{cases} \qquad (E3)$$

The partial derivatives of $\psi_1$ and $\psi_2$ with respect to the components of the lens parameter vector $\zeta_\ell$ are:

$$\begin{cases}
\frac{\partial \psi_1}{\partial x_p} = -[\varrho + 2(\dot{\varrho}\,(x - x_p)^2 + l_x p_3 (x - x_p) + q(3 p_1 (x - x_p) + p_2 (y - y_p)))], \\
\frac{\partial \psi_2}{\partial x_p} = -2[\dot{\varrho}\,(x - x_p)(y - y_p) + l_y p_3 (x - x_p) + q(p_1 (y - y_p) + p_2 (x - x_p))], \\
\frac{\partial \psi_1}{\partial y_p} = -2[\dot{\varrho}\,(x - x_p)(y - y_p) + l_x p_3 (y - y_p) + q(p_1 (y - y_p) + p_2 (x - x_p))], \\
\frac{\partial \psi_2}{\partial y_p} = -[\varrho + 2(\dot{\varrho}\,(y - y_p)^2 + l_y p_3 (y - y_p) + q(p_1 (x - x_p) + 3 p_2 (y - y_p)))], \\
\frac{\partial \psi_1}{\partial k_1} = r^2 (x - x_p), \quad \frac{\partial \psi_2}{\partial k_1} = r^2 (y - y_p), \\
\frac{\partial \psi_1}{\partial k_2} = r^4 (x - x_p), \quad \frac{\partial \psi_2}{\partial k_2} = r^4 (y - y_p), \\
\frac{\partial \psi_1}{\partial k_3} = r^6 (x - x_p), \quad \frac{\partial \psi_2}{\partial k_3} = r^6 (y - y_p), \\
\frac{\partial \psi_1}{\partial p_1} = q(r^2 + 2(x - x_p)^2), \quad \frac{\partial \psi_2}{\partial p_1} = 2 q (x - x_p)(y - y_p), \\
\frac{\partial \psi_1}{\partial p_2} = 2 q (x - x_p)(y - y_p), \quad \frac{\partial \psi_2}{\partial p_2} = q(r^2 + 2(y - y_p)^2), \\
\frac{\partial \psi_1}{\partial p_3} = l_x r^2, \quad \frac{\partial \psi_2}{\partial p_3} = l_y r^2,
\end{cases} \qquad (E4)$$

where $\dot{\varrho} \triangleq k_1 + 2 k_2 r^2 + 3 k_3 r^4$.
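The compact model (E2)–(E3) translates directly into code; the factor q is defined earlier in the paper and is taken here as $q = 1 + p_3 r^2$, an assumption consistent with $\partial\psi_1/\partial p_3 = l_x r^2$ in (E4):

```python
import numpy as np

def distort(x, y, zl):
    """Lens model of Eqs. (E2)-(E3); zl = (xp, yp, k1, k2, k3, p1, p2, p3).
    The factor q = 1 + p3*r^2 is an assumption inferred from (E4)."""
    xp, yp, k1, k2, k3, p1, p2, p3 = zl
    dx, dy = x - xp, y - yp
    r2 = dx ** 2 + dy ** 2
    rho = k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3      # radial polynomial
    lx = p1 * (r2 + 2 * dx ** 2) + 2 * p2 * dx * dy  # decentering terms
    ly = 2 * p1 * dx * dy + p2 * (r2 + 2 * dy ** 2)
    q = 1.0 + p3 * r2
    return x + rho * dx + q * lx, y + rho * dy + q * ly
```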

Appendix F


This appendix computes the partial derivatives of $\xi_1$ and $\xi_2$ in Eq. (43) with respect to the components of the external parameter vector $\zeta_e$ and with respect to $f_x$ and $f_y$:

$$\begin{bmatrix} \frac{\partial\chi}{\partial\omega} \\ \frac{\partial\chi}{\partial\varphi} \\ \vdots \\ \frac{\partial\chi}{\partial f_y} \end{bmatrix} = \begin{bmatrix} \frac{\partial a_{11}}{\partial\omega} & \frac{\partial a_{21}}{\partial\omega} & \frac{\partial a_{12}}{\partial\omega} & \frac{\partial a_{22}}{\partial\omega} & \frac{\partial b_{1}}{\partial\omega} & \frac{\partial b_{2}}{\partial\omega} & \frac{\partial c_{1}}{\partial\omega} & \frac{\partial c_{2}}{\partial\omega} \\ \frac{\partial a_{11}}{\partial\varphi} & \frac{\partial a_{21}}{\partial\varphi} & \frac{\partial a_{12}}{\partial\varphi} & \frac{\partial a_{22}}{\partial\varphi} & \frac{\partial b_{1}}{\partial\varphi} & \frac{\partial b_{2}}{\partial\varphi} & \frac{\partial c_{1}}{\partial\varphi} & \frac{\partial c_{2}}{\partial\varphi} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ \frac{\partial a_{11}}{\partial f_y} & \frac{\partial a_{21}}{\partial f_y} & \frac{\partial a_{12}}{\partial f_y} & \frac{\partial a_{22}}{\partial f_y} & \frac{\partial b_{1}}{\partial f_y} & \frac{\partial b_{2}}{\partial f_y} & \frac{\partial c_{1}}{\partial f_y} & \frac{\partial c_{2}}{\partial f_y} \end{bmatrix} \begin{bmatrix} \frac{\partial\chi}{\partial a_{11}} \\ \frac{\partial\chi}{\partial a_{21}} \\ \frac{\partial\chi}{\partial a_{12}} \\ \frac{\partial\chi}{\partial a_{22}} \\ \frac{\partial\chi}{\partial b_{1}} \\ \frac{\partial\chi}{\partial b_{2}} \\ \frac{\partial\chi}{\partial c_{1}} \\ \frac{\partial\chi}{\partial c_{2}} \end{bmatrix}, \qquad (F1)$$

where the rows of the matrix run over $\omega, \varphi, \theta, t_X, t_Y, t_Z, f_x, f_y$ and $\chi$ is a short notation for $\xi_1$ and $\xi_2$. In Eq. (F1), the partial derivatives of $\xi_1$ and $\xi_2$ with respect to the forward homography coefficients are, respectively,

$$\begin{cases}
\frac{\partial \xi_1}{\partial a_{11}} = \frac{\psi_1 - x_p}{P}, \quad \frac{\partial \xi_1}{\partial a_{21}} = 0, \\
\frac{\partial \xi_1}{\partial a_{12}} = \frac{\psi_2 - y_p}{P}, \quad \frac{\partial \xi_1}{\partial a_{22}} = 0, \\
\frac{\partial \xi_1}{\partial b_1} = \frac{1}{P}, \quad \frac{\partial \xi_1}{\partial b_2} = 0, \\
\frac{\partial \xi_1}{\partial c_1} = -\xi_1 \frac{\psi_1 - x_p}{P}, \quad \frac{\partial \xi_1}{\partial c_2} = -\xi_1 \frac{\psi_2 - y_p}{P},
\end{cases} \qquad (F2)$$

and

$$\begin{cases}
\frac{\partial \xi_2}{\partial a_{11}} = 0, \quad \frac{\partial \xi_2}{\partial a_{21}} = \frac{\psi_1 - x_p}{P}, \\
\frac{\partial \xi_2}{\partial a_{12}} = 0, \quad \frac{\partial \xi_2}{\partial a_{22}} = \frac{\psi_2 - y_p}{P}, \\
\frac{\partial \xi_2}{\partial b_1} = 0, \quad \frac{\partial \xi_2}{\partial b_2} = \frac{1}{P}, \\
\frac{\partial \xi_2}{\partial c_1} = -\xi_2 \frac{\psi_1 - x_p}{P}, \quad \frac{\partial \xi_2}{\partial c_2} = -\xi_2 \frac{\psi_2 - y_p}{P}.
\end{cases} \qquad (F3)$$
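In code, Eq. (F1) is a single matrix-vector product; the sketch below also spells out the (F2) vector (names and orderings follow the matrix above):

```python
import numpy as np

def external_gradient(coeff_jacobian, dchi_dcoeff):
    """Chain rule of Eq. (F1): `coeff_jacobian` is the 8x8 matrix of
    coefficient derivatives of Appendix D (rows: w, phi, th, tX, tY,
    tZ, fx, fy; columns: a11, a21, a12, a22, b1, b2, c1, c2), and
    `dchi_dcoeff` is the 8-vector of Eq. (F2) or (F3)."""
    return coeff_jacobian @ dchi_dcoeff

def dxi1_dcoeff(xi1, psi1, psi2, xp, yp, P):
    """Eq. (F2): derivatives of xi1 with respect to the coefficients."""
    u, v = (psi1 - xp) / P, (psi2 - yp) / P
    return np.array([u, 0.0, v, 0.0, 1.0 / P, 0.0, -xi1 * u, -xi1 * v])
```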

References

[1] D.A. Forsyth, J. Ponce, Computer vision—a modern approach, Prentice Hall, Upper Saddle River, NJ, 2002.
[2] R. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, UK, 2000.
[3] D.C. Brown, Close-range camera calibration, Photogrammetric Engineering 37 (8) (1971) 855–866.
[4] P.F. Sturm, S.J. Maybank, On plane-based camera calibration: a general algorithm, singularities, applications, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'99), Fort Collins, CO, June 1 (1999) 23–25.
[5] D. Liebowitz, A. Zisserman, Metric rectification for perspective images of planes, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'98), Santa Barbara, CA, June 1998, 482–488.
[6] L. Lucchese, S.K. Mitra, Using saddle points for subpixel feature detection in camera calibration targets, Proceedings of the 2002 Asia Pacific Conference on Circuits and Systems, Singapore, December 2002.
[7] J.-Y. Bouguet, Camera calibration toolbox for Matlab, http://www.vision.caltech.edu/bouguetj/calib_doc/, November 2001.
[8] L. Lucchese, Estimating the pose and focal length of a camera from the perspective projection of a planar calibration plate, Proceedings of the Fifth IASTED International Conference on Signal and Image Processing (SIP 2003), Honolulu, HI, August 2003, 201–206.
[9] L. Lucchese, Digital camera calibration. Part I: internal and external geometry from multi-view alignment, Proceedings of the 3rd IASTED International Conference on Visualization, Imaging, and Image Processing (VIIP 2003), Benalmádena, Spain, September III (2003) 1061–1065.
[10] L. Lucchese, Digital camera calibration. Part II: camera model, subpixel feature extraction, and algorithm initialization, Proceedings of VIIP 2003, Benalmádena, Spain, September III (2003) 1066–1070.
[11] C.C. Slama, Manual of Photogrammetry, 4th ed., American Society of Photogrammetry, Falls Church, VA, 1980.
[12] Y.I. Abdel-Aziz, H.M. Karara, Direct linear transformation into object space coordinates in close-range photogrammetry, Proceedings of the Symposium on Close-Range Photogrammetry, Urbana-Champaign, IL, January 1971, 1–18.
[13] R.Y. Tsai, A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf cameras and lenses, IEEE Journal of Robotics and Automation RA-3 (4) (1987) 323–344.
[14] R.K. Lenz, R.Y. Tsai, Techniques for calibration of the scale factor and image center for high accuracy 3D machine vision metrology, IEEE Transactions on Pattern Analysis and Machine Intelligence 10 (5) (1988) 713–720.
[15] J. Weng, P. Cohen, M. Herniou, Camera calibration with distortion models and accuracy evaluation, IEEE Transactions on Pattern Analysis and Machine Intelligence 14 (10) (1992) 965–980.
[16] H. Bacakoglu, S. Kamel, A three-step camera calibration method, IEEE Transactions on Instrumentation and Measurement 46 (5) (1997) 1165–1172.
[17] J. Heikkilä, O. Silvén, A four-step camera calibration procedure with implicit image correction, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, San Juan, Puerto Rico, June 1997, 1106–1112.
[18] S.J. Maybank, O.D. Faugeras, A theory of self calibration of a moving camera, International Journal of Computer Vision 8 (2) (1992) 123–151.
[19] D. Liebowitz, A. Zisserman, Combining scene and auto-calibration constraints, Proceedings of the Seventh IEEE International Conference on Computer Vision (ICCV'99), Kerkyra, Greece, September 1 (1999) 293–300.
[20] A.W. Fitzgibbon, Simultaneous linear estimation of multiple view geometry and lens distortion, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'01), Kauai, HI, December 1 (2001) 125–132.
[21] L. de Agapito, E. Hayman, I.D. Reid, Self-calibration of rotating and zooming cameras, International Journal of Computer Vision 45 (2) (2001) 107–127.
[22] Q.-T. Luong, O.D. Faugeras, Self-calibration of a moving camera from point correspondences and fundamental matrices, International Journal of Computer Vision 22 (3) (1997) 261–289.
[23] O.D. Faugeras, L. Quan, P. Sturm, Self-calibration of a 1D projective camera and its application to the self-calibration of a 2D projective camera, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (10) (2000) 1179–1185.
[24] E. Malis, R. Cipolla, Camera self-calibration from unknown planar structures enforcing the multiview constraints between collineations, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (9) (2002) 1268–1272.
[25] M. Pollefeys, R. Koch, L. Van Gool, Self-calibration and metric reconstruction in spite of varying and unknown intrinsic camera parameters, International Journal of Computer Vision 32 (1) (1998) 7–25.
[26] P.F. Sturm, Self-calibration of a moving zoom-lens camera by precalibration, Image and Vision Computing 15 (1997) 583–589.
[27] B. Triggs, Autocalibration from planar scenes, Proceedings of the European Conference on Computer Vision (ECCV'98), Freiburg, Germany, June 1 (1998) 89–105.
[28] K.-Y. Wong, P.R.S. Mendonça, R. Cipolla, Camera calibration from surfaces of revolution, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (2) (2003) 147–161.
[29] P.F. Sturm, L. Quan, Camera calibration and relative pose estimation from gravity, Proceedings of the 16th International Conference on Pattern Recognition (ICPR 2000), Barcelona, Spain, September 1 (2000) 72–75.
[30] T.-S. Shen, C.-H. Menq, Automatic camera calibration for multiple-sensor integrated coordinate measurement system, IEEE Transactions on Robotics and Automation 17 (4) (2001) 502–507.
[31] Z. Zhang, Flexible camera calibration by viewing a plane from unknown orientations, Proceedings of the International Conference on Computer Vision (ICCV'99), Corfu, Greece, September 1999, 666–673.
[32] Z. Zhang, A flexible new technique for camera calibration, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (11) (2000) 1330–1334.
[33] C. Chatterjee, V.P. Roychowdhury, Algorithms for coplanar camera calibration, Machine Vision and Applications 12 (2000) 84–97.
[34] F. Pedersini, A. Sarti, S. Tubaro, Accurate and simple geometric calibration of multi-camera systems, Signal Processing 77 (1999) 309–334.
[35] J. Heikkilä, O. Silvén, Calibration procedure for short focal length off-the-shelf CCD cameras, Proceedings of the 13th International Conference on Pattern Recognition (ICPR'96), Vienna, Austria, August 1 (1996) 166–170.
[36] J. Heikkilä, Geometric camera calibration using circular control points, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (10) (2000) 1066–1077.
[37] M. Li, J.-M. Lavest, Some aspects of zoom lens camera calibration, IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (11) (1996) 1105–1110.
[38] J.-M. Lavest, M. Viala, M. Dhome, Do we really need an accurate calibration pattern to achieve a reliable camera calibration?, Proceedings of the European Conference on Computer Vision (ECCV'98), Freiburg, Germany, June 1 (1998) 158–174.
[39] R. Samtaney, A method to solve interior and exterior camera calibration parameters for image resection, NASA Advanced Supercomputing, Moffett Field, CA, Technical Report NAS-99-003, 1999.
[40] M. Ahmed, A. Farag, Non-metric calibration of camera lens distortion, Proceedings of the International Conference on Image Processing, Thessaloniki, Greece, September 2 (2001) 157–160.
[41] F. Devernay, O.D. Faugeras, Straight lines have to be straight, Machine Vision and Applications 13 (1) (2001) 14–24.
[42] B. Prescott, G.F. McLean, Line-based correction of radial lens distortion, Graphical Models and Image Processing 59 (1) (1997) 39–47.
[43] R. Swaminathan, S.K. Nayar, Nonmetric calibration of wide-angle lenses and polycameras, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (10) (2002) 1172–1178.
[44] G.P. Stein, Lens distortion calibration using point correspondences, Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR'97), San Juan, Puerto Rico, June 1997, 602–608.
[45] Y. Altunbasak, R.M. Mersereau, A.J. Patti, A fast parametric motion estimation algorithm with illumination and lens distortion correction, IEEE Transactions on Image Processing 12 (4) (2003) 395–408.
[46] H.S. Sawhney, R. Kumar, True multi-image alignment and its application to mosaicking and lens distortion correction, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (3) (1999) 235–243.
[47] R.T. Collins, Y. Tsin, Calibration of outdoor active camera, Carnegie Mellon University, Technical Report CMU-RI-TR-98-36, 1999.
[48] F. Du, M. Brady, Self-calibration of the intrinsic parameters of cameras for active vision systems, Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR'93), New York, NY, June 1993, 477–482.
[49] J.Z.C. Lai, On the sensitivity of camera calibration, Image and Vision Computing 11 (10) (1993) 656–664.
[50] S.K. Kopparapu, P. Corke, The effect of measurement noise on intrinsic camera calibration parameters, Proceedings of the 1999 IEEE International Conference on Robotics and Automation, Detroit, MI, May 1999, 1281–1286.
[51] B.J. Tordoff, Active control of zoom for computer vision, DPhil Thesis, Michaelmas 2002, http://www.eng.cam.ac.uk/~bjt21/Papers/.
[52] C. Harris, M. Stephens, A combined corner and edge detector, Proceedings of the Fourth Alvey Vision Conference, Manchester, UK, August 1988, 147–151.
[53] W.H. Press, B.P. Flannery, S.A. Teukolsky, W.T. Vetterling, Numerical Recipes in C: The Art of Scientific Computing, 2nd ed., Cambridge University Press, Cambridge, 1992.