Efficient Visual-Inertial Navigation using a Rolling-Shutter Camera with Inaccurate Timestamps
Chao X. Guo, Dimitrios G. Kottas, Ryan C. DuToit, Ahmed Ahmed, Ruipeng Li and Stergios I. Roumeliotis

Multiple Autonomous Robotic Systems Laboratory, Technical Report TR-2014-001, February 2014
Dept. of Computer Science & Engineering, University of Minnesota
4-192 EE/CS Building, 200 Union St. S.E., Minneapolis, MN 55455
Tel: (612) 625-2217, Fax: (612) 625-0572
URL: http://mars.cs.umn.edu
This technical report is structured as follows: In Section I, we introduce the expression for generating the position and orientation of an interpolated coordinate frame. In Section II, we derive the Jacobian matrices for the measurement model that addresses both the time-synchronization and rolling-shutter effects. Sections III and IV present the unobservable directions of the linearized system, while its nonlinear counterpart is employed for proving the uniqueness of those directions. Finally, in Section V we provide a strategy for selecting the poses that comprise the estimator's optimization window, for two common motion profiles: generic motion, which provides sufficient baseline between consecutive camera poses, and hovering scenarios (i.e., the static case or purely rotational motion).

I. INTERPOLATION MODEL

A. Translation

For generating the position of the interpolated frame at time $t_\phi$, we use the following linear interpolation model:
$$\mathbf{p}_\phi = (1-\lambda)\,\mathbf{p}_k + \lambda\,\mathbf{p}_{k+1} \qquad (1)$$
where $\lambda$ is the interpolation parameter, $\mathbf{p}_k$ and $\mathbf{p}_{k+1}$ are the camera positions in the global frame at times $t_k$ and $t_{k+1}$ (with $t_k \leq t_\phi \leq t_{k+1}$), and $\mathbf{p}_\phi$ is the camera position at the time the feature measurement was captured.

B. Rotation

For the orientation, we introduce the following small-angle-based interpolation model:
$${}^{k+1}_{\;\;k}\mathbf{C} \simeq \mathbf{I} - \lfloor\boldsymbol{\theta}\times\rfloor \qquad (2)$$
$${}^{\phi}_{k}\mathbf{C} \simeq \mathbf{I} - \lambda\lfloor\boldsymbol{\theta}\times\rfloor = (1-\lambda)\,\mathbf{I} + \lambda\,{}^{k+1}_{\;\;k}\mathbf{C} \qquad (3)$$
$${}^{\phi}\mathbf{C} = {}^{\phi}_{k}\mathbf{C}\,{}^{k}\mathbf{C} = (1-\lambda)\,{}^{k}\mathbf{C} + \lambda\,{}^{k+1}\mathbf{C} \qquad (4)$$
where ${}^{i}_{j}\mathbf{C}$ denotes the orientation of frame $\{j\}$ in frame $\{i\}$. When no subscript is used, the rotation is with respect to the global frame $\{G\}$ (i.e., ${}^{k+1}\mathbf{C} \triangleq {}^{k+1}_{\;\;\;\;G}\mathbf{C}$). The interpolation parameter $\lambda$ accounts for the time delay caused by the time synchronization and the rolling shutter:
$$\lambda = \frac{t_{ts} + t_{rs}^{m}}{t_n} \qquad (5)$$
where $t_{ts}$ is the time-synchronization offset, $t_{rs}^{m}$ is the rolling-shutter delay of the image row $m$ containing the measurement, and $t_n$ is the time interval between the poses at $t_k$ and $t_{k+1}$.
II. MEASUREMENT MODEL AND JACOBIANS

The measurement model can be written as $\mathbf{z} = h'(\mathbf{x}) + \mathbf{n}$, where $\mathbf{n}$ is the measurement noise, and
$$h'(\mathbf{x}) = \begin{bmatrix} \frac{h(\mathbf{x})_1}{h(\mathbf{x})_3} \\[4pt] \frac{h(\mathbf{x})_2}{h(\mathbf{x})_3} \end{bmatrix} \qquad (6)$$
where
$$h(\mathbf{x}) = {}^{\phi}\mathbf{p}_f = {}^{\phi}\mathbf{C}\,(\mathbf{p}_f - \mathbf{p}_\phi) = \left[(1-\lambda)\,{}^{k}\mathbf{C} + \lambda\,{}^{k+1}\mathbf{C}\right]\left(\mathbf{p}_f - (1-\lambda)\,\mathbf{p}_k - \lambda\,\mathbf{p}_{k+1}\right) \qquad (7)$$
The measurement Jacobians can be computed using the chain rule, as
$$\frac{\partial h'(\mathbf{x})}{\partial \mathbf{x}} = \frac{\partial h'(\mathbf{x})}{\partial h(\mathbf{x})}\,\frac{\partial h(\mathbf{x})}{\partial \mathbf{x}} \qquad (8)$$
where
$$\frac{\partial h'(\mathbf{x})}{\partial h(\mathbf{x})} = \begin{bmatrix} \frac{1}{h(\mathbf{x})_3} & 0 & -\frac{h(\mathbf{x})_1}{h(\mathbf{x})_3^2} \\[4pt] 0 & \frac{1}{h(\mathbf{x})_3} & -\frac{h(\mathbf{x})_2}{h(\mathbf{x})_3^2} \end{bmatrix} \triangleq \mathbf{H}_{p_k} \qquad (9)$$
and
$$\begin{aligned}
\mathbf{T}_1 &\triangleq (1-\lambda)\,{}^{k}\mathbf{C} + \lambda\,{}^{k+1}\mathbf{C} \\
\mathbf{t}_2 &\triangleq \mathbf{p}_f - (1-\lambda)\,\mathbf{p}_k - \lambda\,\mathbf{p}_{k+1} \\
\mathbf{H}_{\theta_k} &= (1-\lambda)\lfloor{}^{k}\mathbf{C}\,\mathbf{t}_2\times\rfloor \\
\mathbf{H}_{\theta_{k+1}} &= \lambda\lfloor{}^{k+1}\mathbf{C}\,\mathbf{t}_2\times\rfloor \\
\mathbf{H}_{k} &= (\lambda-1)\,\mathbf{T}_1 \\
\mathbf{H}_{k+1} &= -\lambda\,\mathbf{T}_1 \\
\mathbf{H}_{p_f} &= \mathbf{T}_1 \\
\mathbf{H}_{\lambda} &= ({}^{k+1}\mathbf{C} - {}^{k}\mathbf{C})\,\mathbf{t}_2 + \mathbf{T}_1(\mathbf{p}_k - \mathbf{p}_{k+1})
\end{aligned} \qquad (10)$$
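As a sanity check on (7) and (10), the sketch below (our own verification script, with variable names following the report's notation) evaluates $h(\mathbf{x})$ and compares the analytic Jacobians against central finite differences.

```python
import numpy as np

rng = np.random.default_rng(1)

def skew(a):
    """Skew-symmetric matrix such that skew(a) @ b = a x b."""
    return np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0]])

def rot(w):
    """Rotation matrix from a rotation vector (Rodrigues' formula)."""
    t = np.linalg.norm(w)
    A = skew(w / t)
    return np.eye(3) + np.sin(t) * A + (1 - np.cos(t)) * (A @ A)

C_k, C_kp1 = rot(rng.normal(size=3)), rot(rng.normal(size=3))
p_k, p_kp1, p_f = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)
lam = 0.3

def h(p_k, p_kp1, p_f, lam):
    """Eq. (7): the feature expressed in the interpolated camera frame."""
    T1 = (1 - lam) * C_k + lam * C_kp1
    t2 = p_f - (1 - lam) * p_k - lam * p_kp1
    return T1 @ t2

# Analytic Jacobians from eq. (10).
T1 = (1 - lam) * C_k + lam * C_kp1
t2 = p_f - (1 - lam) * p_k - lam * p_kp1
H_k, H_pf = (lam - 1) * T1, T1
H_lam = (C_kp1 - C_k) @ t2 + T1 @ (p_k - p_kp1)

# Central finite differences.
eps = 1e-6
num_H_lam = (h(p_k, p_kp1, p_f, lam + eps) - h(p_k, p_kp1, p_f, lam - eps)) / (2 * eps)
num_H_pf = np.column_stack([(h(p_k, p_kp1, p_f + eps * e, lam)
                             - h(p_k, p_kp1, p_f - eps * e, lam)) / (2 * eps)
                            for e in np.eye(3)])
num_H_k = np.column_stack([(h(p_k + eps * e, p_kp1, p_f, lam)
                            - h(p_k - eps * e, p_kp1, p_f, lam)) / (2 * eps)
                           for e in np.eye(3)])

assert np.allclose(H_lam, num_H_lam, atol=1e-5)
assert np.allclose(H_pf, num_H_pf, atol=1e-5)
assert np.allclose(H_k, num_H_k, atol=1e-5)
```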
where $\mathbf{H}_{\theta_k}$ and $\mathbf{H}_{\theta_{k+1}}$ are the Jacobians with respect to the orientation of the global frame in the camera frames at times $t_k$ and $t_{k+1}$, $\mathbf{H}_k$ and $\mathbf{H}_{k+1}$ are the Jacobians with respect to the camera positions in the global frame at times $t_k$ and $t_{k+1}$, $\mathbf{H}_{p_f}$ is the Jacobian with respect to the feature position, and $\mathbf{H}_\lambda$ is the Jacobian with respect to the interpolation parameter $\lambda$.

III. OBSERVABILITY ANALYSIS: LINEARIZED SYSTEM

In our observability analysis, we assume that the camera and IMU frames coincide so as to simplify notation. In practice, the extrinsic calibration parameters can be determined using the algorithm proposed in [1]. The system's state vector is $\begin{bmatrix}\mathbf{x}_k^T & \mathbf{x}_{k+1}^T & \boldsymbol{\xi}^T\end{bmatrix}^T$, where $\mathbf{x}_i$, $i = k, k+1$, is the IMU state:
$$\mathbf{x}_i = \begin{bmatrix}{}^{i}\mathbf{q}^T & \mathbf{b}_g^T & \mathbf{v}_i^T & \mathbf{b}_a^T & \mathbf{p}_i^T\end{bmatrix}^T \qquad (11)$$
where ${}^{i}\mathbf{q}$ is the global orientation in the IMU frame (in quaternion parametrization), $\mathbf{b}_g$ is the gyroscope bias, $\mathbf{v}_i$ is the IMU velocity in the global frame, $\mathbf{b}_a$ is the accelerometer bias, and $\mathbf{p}_i$ is the IMU position in the global frame. $\boldsymbol{\xi}$ is defined as:
$$\boldsymbol{\xi} = \begin{bmatrix}\mathbf{p}_f^T & \lambda\end{bmatrix}^T \qquad (12)$$
where $\mathbf{p}_f$ is the feature position in the global frame, and $\lambda$ is the interpolation parameter. Note that only one point feature is necessary. A system's observability matrix [2], $\mathbf{M}$, is defined as
$$\mathbf{M} = \begin{bmatrix}\mathbf{H}^1 \\ \mathbf{H}^2\boldsymbol{\Phi}^{2,1} \\ \vdots \\ \mathbf{H}^k\boldsymbol{\Phi}^{k,1}\end{bmatrix} \qquad (13)$$
where $\boldsymbol{\Phi}^{k,1}$ is the state transition matrix from time step 1 to $k$, and $\mathbf{H}^k$ is the measurement Jacobian at time step $k$. As described in [2], a system's unobservable directions, $\mathbf{N}$, span the observability matrix's right nullspace:
$$\mathbf{M}\mathbf{N} = \mathbf{0} \qquad (14)$$
From now on, we will use $\boldsymbol{\Phi}$ and $\mathbf{H}$ with superscripts to denote the state transition and measurement Jacobian matrices of the whole system, while $\boldsymbol{\Phi}$ and $\mathbf{H}$ with subscripts denote the state transition and measurement Jacobian matrices corresponding to the IMU state $\mathbf{x}_i$, $i = k, k+1$. According to (10), the measurement Jacobian can be written as:
$$\mathbf{H}^k = \mathbf{H}_{p_k}\begin{bmatrix}\mathbf{H}_R^k & \mathbf{H}_R^{k+1} & \mathbf{H}_{p_f} & \mathbf{H}_\lambda\end{bmatrix} \qquad (15)$$
where, for $i = k, k+1$,
$$\mathbf{H}_R^i \triangleq \begin{bmatrix}\mathbf{H}_{\theta_i} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{H}_i\end{bmatrix} \qquad (16)$$
Similarly, the system state transition matrix can be defined as:
$$\boldsymbol{\Phi}^{k,1} = \begin{bmatrix}\boldsymbol{\Phi}_{k,1} & \mathbf{0}_{15\times15} & \mathbf{0}_{15\times3} & \mathbf{0}_{15\times1} \\ \mathbf{0}_{15\times15} & \boldsymbol{\Phi}_{k+1,2} & \mathbf{0}_{15\times3} & \mathbf{0}_{15\times1} \\ \mathbf{0}_{3\times15} & \mathbf{0}_{3\times15} & \mathbf{I}_{3\times3} & \mathbf{0}_{3\times1} \\ \mathbf{0}_{1\times15} & \mathbf{0}_{1\times15} & \mathbf{0}_{1\times3} & 1\end{bmatrix} \qquad (17)$$
where $\boldsymbol{\Phi}_{i,j}$, $(i,j) = (k,1), (k+1,2)$, is the state transition matrix between the IMU states $\mathbf{x}_j$ and $\mathbf{x}_i$. As shown in [3], the structure of $\boldsymbol{\Phi}_{i,j}$ is:
$$\boldsymbol{\Phi}_{i,j} = \begin{bmatrix}{}^{i}_{j}\mathbf{C} & \boldsymbol{\phi}_{12}^{i} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \boldsymbol{\phi}_{31}^{i} & \boldsymbol{\phi}_{32}^{i} & \mathbf{I} & \boldsymbol{\phi}_{34}^{i} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{I} & \mathbf{0} \\ \boldsymbol{\phi}_{51}^{i} & \boldsymbol{\phi}_{52}^{i} & (i-j)\delta t\,\mathbf{I} & \boldsymbol{\phi}_{54}^{i} & \mathbf{I}\end{bmatrix} \qquad (18)$$
Then, substituting $\mathbf{H}^k$ from (10), the block columns of the $k$-th block row of the system's observability matrix, $\mathbf{M}$, are:
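The block structure of (17) is straightforward to assemble; the following snippet (illustrative only, with random stand-ins for $\boldsymbol{\Phi}_{k,1}$ and $\boldsymbol{\Phi}_{k+1,2}$) shows how the feature position and interpolation parameter enter as static states.

```python
import numpy as np

rng = np.random.default_rng(0)
Phi_k1 = rng.normal(size=(15, 15))     # stand-in for the IMU transition Phi_{k,1}
Phi_kp12 = rng.normal(size=(15, 15))   # stand-in for Phi_{k+1,2}

# Eq. (17): the feature position and the interpolation parameter are
# static states, so their transition blocks are identities.
Phi = np.block([
    [Phi_k1,             np.zeros((15, 15)), np.zeros((15, 3)), np.zeros((15, 1))],
    [np.zeros((15, 15)), Phi_kp12,           np.zeros((15, 3)), np.zeros((15, 1))],
    [np.zeros((3, 15)),  np.zeros((3, 15)),  np.eye(3),         np.zeros((3, 1))],
    [np.zeros((1, 15)),  np.zeros((1, 15)),  np.zeros((1, 3)),  np.ones((1, 1))],
])
assert Phi.shape == (34, 34)
```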
$$\begin{aligned}
\mathbf{M}_{k_1} &= \mathbf{H}_{p_k}\!\left((1-\lambda)\lfloor{}^{k}\mathbf{C}\,\mathbf{t}_2\times\rfloor\,{}^{k}_{1}\mathbf{C} + (\lambda-1)\,\mathbf{T}_1\boldsymbol{\phi}_{51}^{k}\right) &
\mathbf{M}_{k_7} &= \mathbf{H}_{p_k}\!\left(\lambda\lfloor{}^{k+1}\mathbf{C}\,\mathbf{t}_2\times\rfloor\,{}^{k+1}_{\;\;\;2}\mathbf{C} - \lambda\,\mathbf{T}_1\boldsymbol{\phi}_{51}^{k+1}\right) \\
\mathbf{M}_{k_2} &= \mathbf{H}_{p_k}\!\left((1-\lambda)\lfloor{}^{k}\mathbf{C}\,\mathbf{t}_2\times\rfloor\boldsymbol{\phi}_{12}^{k} + (\lambda-1)\,\mathbf{T}_1\boldsymbol{\phi}_{52}^{k}\right) &
\mathbf{M}_{k_8} &= \mathbf{H}_{p_k}\!\left(\lambda\lfloor{}^{k+1}\mathbf{C}\,\mathbf{t}_2\times\rfloor\boldsymbol{\phi}_{12}^{k+1} - \lambda\,\mathbf{T}_1\boldsymbol{\phi}_{52}^{k+1}\right) \\
\mathbf{M}_{k_3} &= \mathbf{H}_{p_k}\!\left((k-1)\delta t\,(\lambda-1)\,\mathbf{T}_1\right) &
\mathbf{M}_{k_9} &= \mathbf{H}_{p_k}\!\left(-(k-1)\delta t\,\lambda\,\mathbf{T}_1\right) \\
\mathbf{M}_{k_4} &= \mathbf{H}_{p_k}\!\left((\lambda-1)\,\mathbf{T}_1\boldsymbol{\phi}_{54}^{k}\right) &
\mathbf{M}_{k_{10}} &= \mathbf{H}_{p_k}\!\left(-\lambda\,\mathbf{T}_1\boldsymbol{\phi}_{54}^{k+1}\right) \\
\mathbf{M}_{k_5} &= \mathbf{H}_{p_k}\!\left((\lambda-1)\,\mathbf{T}_1\right) &
\mathbf{M}_{k_{11}} &= \mathbf{H}_{p_k}\!\left(-\lambda\,\mathbf{T}_1\right) \\
\mathbf{M}_{k_6} &= \mathbf{H}_{p_k}\,\mathbf{T}_1 &
\mathbf{M}_{k_{12}} &= \mathbf{H}_{p_k}\!\left(({}^{k+1}\mathbf{C} - {}^{k}\mathbf{C})\,\mathbf{t}_2 + \mathbf{T}_1(\mathbf{p}_k - \mathbf{p}_{k+1})\right)
\end{aligned} \qquad (19)$$
where $\mathbf{M}_{k_1}$–$\mathbf{M}_{k_5}$ multiply the error states of $\mathbf{x}_1$, $\mathbf{M}_{k_7}$–$\mathbf{M}_{k_{11}}$ those of $\mathbf{x}_2$, and $\mathbf{M}_{k_6}$, $\mathbf{M}_{k_{12}}$ the feature position and the interpolation parameter, respectively.
In [3], it has been shown that for a vision-aided inertial navigation system with a time-synchronized global-shutter camera, the unobservable directions are:
$$\mathbf{N}_{global} = \begin{bmatrix}{}^{1}\mathbf{C}\,\mathbf{g} & \mathbf{0}_{3\times3} \\ \mathbf{0}_{3\times1} & \mathbf{0}_{3\times3} \\ -\lfloor\mathbf{v}_1\times\rfloor\mathbf{g} & \mathbf{0}_{3\times3} \\ \mathbf{0}_{3\times1} & \mathbf{0}_{3\times3} \\ -\lfloor\mathbf{p}_1\times\rfloor\mathbf{g} & \mathbf{I}_{3\times3} \\ -\lfloor\mathbf{p}_f\times\rfloor\mathbf{g} & \mathbf{I}_{3\times3}\end{bmatrix} \qquad (20)$$
In our case, the system receives the same amount of information from the IMU and camera measurements, except that there exists a time misalignment between the two sensors. Thus, we expect our system to have at least the same unobservable directions.

Lemma 1: The system with time-misaligned IMU and camera measurements has at least the following four unobservable directions:
$$\mathbf{N} \triangleq \begin{bmatrix}{}^{1}\mathbf{C}\,\mathbf{g} & \mathbf{0}_{3\times3} \\ \mathbf{0}_{3\times1} & \mathbf{0}_{3\times3} \\ -\lfloor\mathbf{v}_1\times\rfloor\mathbf{g} & \mathbf{0}_{3\times3} \\ \mathbf{0}_{3\times1} & \mathbf{0}_{3\times3} \\ -\lfloor\mathbf{p}_1\times\rfloor\mathbf{g} & \mathbf{I}_{3\times3} \\ {}^{2}\mathbf{C}\,\mathbf{g} & \mathbf{0}_{3\times3} \\ \mathbf{0}_{3\times1} & \mathbf{0}_{3\times3} \\ -\lfloor\mathbf{v}_2\times\rfloor\mathbf{g} & \mathbf{0}_{3\times3} \\ \mathbf{0}_{3\times1} & \mathbf{0}_{3\times3} \\ -\lfloor\mathbf{p}_2\times\rfloor\mathbf{g} & \mathbf{I}_{3\times3} \\ -\lfloor\mathbf{p}_f\times\rfloor\mathbf{g} & \mathbf{I}_{3\times3} \\ 0 & \mathbf{0}_{1\times3}\end{bmatrix} = \begin{bmatrix}\mathbf{N}_g & \mathbf{N}_p\end{bmatrix} \qquad (21)$$

Proof: First, it can easily be seen that the only nonzero blocks of $\mathbf{N}_p$ multiply $\mathbf{M}_{k_5}$, $\mathbf{M}_{k_{11}}$, and $\mathbf{M}_{k_6}$, hence $\mathbf{M}_k\mathbf{N}_p = \mathbf{H}_{p_k}\left((\lambda-1)\mathbf{T}_1 - \lambda\mathbf{T}_1 + \mathbf{T}_1\right) = \mathbf{H}_{p_k}(-\mathbf{T}_1 + \mathbf{T}_1) = \mathbf{0}$, for any $k$. Thus, $\mathbf{M}\mathbf{N}_p = \mathbf{0}$. To verify that $\mathbf{N}_g$ is also an unobservable direction of the system, we multiply it with the system's observability matrix:
$$\begin{aligned}
\mathbf{M}_k\mathbf{N}_g &= \mathbf{H}_{p_k}\Big((1-\lambda)\,{}^{k}\mathbf{C}\lfloor\mathbf{t}_2\times\rfloor\mathbf{g} + (\lambda-1)\,\mathbf{T}_1\lfloor(\mathbf{p}_1-\mathbf{p}_k)\times\rfloor\mathbf{g} - (\lambda-1)\,\mathbf{T}_1\lfloor\mathbf{p}_1\times\rfloor\mathbf{g} - \mathbf{T}_1\lfloor\mathbf{p}_f\times\rfloor\mathbf{g}\Big) \\
&\quad + \mathbf{H}_{p_k}\Big(\lambda\,{}^{k+1}\mathbf{C}\lfloor\mathbf{t}_2\times\rfloor\mathbf{g} - \lambda\,\mathbf{T}_1\lfloor(\mathbf{p}_2-\mathbf{p}_{k+1})\times\rfloor\mathbf{g} + \lambda\,\mathbf{T}_1\lfloor\mathbf{p}_2\times\rfloor\mathbf{g}\Big) \\
&= \mathbf{H}_{p_k}\Big((1-\lambda)\,{}^{k}\mathbf{C}\lfloor\mathbf{t}_2\times\rfloor\mathbf{g} + (1-\lambda)\,\mathbf{T}_1\lfloor\mathbf{p}_k\times\rfloor\mathbf{g} - \mathbf{T}_1\lfloor\mathbf{p}_f\times\rfloor\mathbf{g} + \lambda\,{}^{k+1}\mathbf{C}\lfloor\mathbf{t}_2\times\rfloor\mathbf{g} + \lambda\,\mathbf{T}_1\lfloor\mathbf{p}_{k+1}\times\rfloor\mathbf{g}\Big) \\
&= \mathbf{H}_{p_k}\,\mathbf{T}_1\lfloor\left(\mathbf{t}_2 + (1-\lambda)\mathbf{p}_k + \lambda\mathbf{p}_{k+1} - \mathbf{p}_f\right)\times\rfloor\mathbf{g} = \mathbf{0}
\end{aligned} \qquad (22)$$
in which we used the identity $\lfloor\mathbf{C}\mathbf{a}\times\rfloor\mathbf{C} = \mathbf{C}\lfloor\mathbf{a}\times\rfloor$, and $\boldsymbol{\phi}_{51}^{k}$ is substituted with the following expression, as proved in [3]:
$$\boldsymbol{\phi}_{51}^{k} = \left\lfloor\left(\mathbf{p}_1 + (k-1)\delta t\,\mathbf{v}_1 - \frac{1}{2}\left((k-1)\delta t\right)^2\mathbf{g} - \mathbf{p}_k\right)\times\right\rfloor{}^{1}\mathbf{C}^{T} \qquad (23)$$
(and analogously for $\boldsymbol{\phi}_{51}^{k+1}$). The last equality in (22) holds because, from the definition of $\mathbf{t}_2$ in (10), $\mathbf{t}_2 + (1-\lambda)\mathbf{p}_k + \lambda\mathbf{p}_{k+1} - \mathbf{p}_f = \mathbf{0}$.
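The cancellations in the proof of Lemma 1 can be verified numerically. The sketch below (our own check, using the block columns of (19) that the candidate directions actually touch; the gyroscope- and accelerometer-bias columns are omitted since $\mathbf{N}$ is zero there, and the leading projection Jacobian $\mathbf{H}_{p_k}$ is dropped without loss of generality) confirms $\mathbf{M}_k\mathbf{N} = \mathbf{0}$ for random geometry.

```python
import numpy as np

rng = np.random.default_rng(2)

def skew(a):
    return np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0]])

def rot(w):
    t = np.linalg.norm(w)
    A = skew(w / t)
    return np.eye(3) + np.sin(t) * A + (1 - np.cos(t)) * (A @ A)

# Random geometry: orientations/positions at steps 1, 2, k, k+1, velocities,
# a feature, gravity, and the interpolation parameter.
C1, C2, Ck, Ckp1 = (rot(rng.normal(size=3)) for _ in range(4))
p1, p2, pk, pkp1, pf, v1, v2 = (rng.normal(size=3) for _ in range(7))
g = np.array([0.0, 0.0, -9.81])
lam, k, dt = 0.37, 5, 0.1

T1 = (1 - lam) * Ck + lam * Ckp1
t2 = pf - (1 - lam) * pk - lam * pkp1

# phi_51 blocks, eq. (23), for Phi_{k,1} and Phi_{k+1,2}.
phi51_k = skew(p1 + (k - 1) * dt * v1 - 0.5 * ((k - 1) * dt) ** 2 * g - pk) @ C1.T
phi51_kp1 = skew(p2 + (k - 1) * dt * v2 - 0.5 * ((k - 1) * dt) ** 2 * g - pkp1) @ C2.T

# Relevant block columns of M_k from eq. (19), without the H_{p_k} prefactor.
Mk1 = (1 - lam) * skew(Ck @ t2) @ (Ck @ C1.T) + (lam - 1) * T1 @ phi51_k
Mk3 = (k - 1) * dt * (lam - 1) * T1
Mk5 = (lam - 1) * T1
Mk6 = T1
Mk7 = lam * skew(Ckp1 @ t2) @ (Ckp1 @ C2.T) - lam * T1 @ phi51_kp1
Mk9 = -(k - 1) * dt * lam * T1
Mk11 = -lam * T1

# Gravity direction N_g, eq. (21): all contributions must cancel.
MkNg = (Mk1 @ (C1 @ g) + Mk3 @ (-skew(v1) @ g) + Mk5 @ (-skew(p1) @ g)
        + Mk7 @ (C2 @ g) + Mk9 @ (-skew(v2) @ g) + Mk11 @ (-skew(p2) @ g)
        + Mk6 @ (-skew(pf) @ g))
assert np.allclose(MkNg, 0.0, atol=1e-9)

# Translation directions N_p: identity blocks at the p_1, p_2, p_f rows.
assert np.allclose(Mk5 + Mk11 + Mk6, 0.0, atol=1e-12)
```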
IV. OBSERVABILITY ANALYSIS: NON-LINEAR SYSTEM

In the previous section, we proved that the directions in $\mathbf{N}$ lie in the system's nullspace. In this section, we show that the time offset between the IMU and camera, corresponding to the interpolation parameter, is observable for the VINS. Thus, there exist no unobservable directions other than those in $\mathbf{N}$.

Theorem 1: Consider a nonlinear, continuous-time system:
$$\dot{\mathbf{x}} = \sum_{i=0}^{l}\mathbf{f}_i(\mathbf{x})\,u_i, \qquad \mathbf{z} = h(\mathbf{x}) \qquad (24)$$
If a time delay $\tau$ is unobservable, then
$$\mathcal{O}\,\dot{\mathbf{x}} = \mathbf{0} \qquad (25)$$
where $\mathcal{O}$ is the system's observability matrix, as defined in [4].

Proof: For the system (24), given the control inputs $u_i$ and the measurements $\mathbf{z}$, if the time delay $\tau$ is unobservable, then:
$$\mathbf{z} = h(\mathbf{x}(t)) = h(\mathbf{x}(t+\tau)) \qquad (26)$$
Applying a Taylor-series expansion to (26), we have:
$$\begin{aligned}
h(\mathbf{x}(t)) &= h(\mathbf{x}(t+\tau)) = h(\mathbf{x}(t)) + \tau\dot{h} + \frac{1}{2}\tau^2\ddot{h} + \frac{1}{6}\tau^3\dddot{h} + \cdots \\
\Rightarrow\; \mathbf{0} &= \left(\tau\nabla L^0 h + \frac{1}{2}\tau^2\sum_{i=0}^{l}\nabla L_{\mathbf{f}_i}^1 h\,u_i + \frac{1}{6}\tau^3\sum_{i=0}^{l}\sum_{j=0}^{l}\nabla L_{\mathbf{f}_i\mathbf{f}_j}^2 h\,u_i u_j + \cdots\right)\dot{\mathbf{x}}
\end{aligned} \qquad (27)$$
where $\dot{h}$, $\ddot{h}$, and $\dddot{h}$ are substituted with the following expressions:
$$\begin{aligned}
\dot{h} &= \frac{\partial h}{\partial t} = \frac{\partial h}{\partial\mathbf{x}}\frac{\partial\mathbf{x}}{\partial t} = \nabla L^0 h\,\dot{\mathbf{x}} \\
\ddot{h} &= \frac{\partial\dot{h}}{\partial t} = \frac{\partial\left(\nabla L^0 h\,\dot{\mathbf{x}}\right)}{\partial t} = \frac{\partial\left(\sum_{i=0}^{l}L_{\mathbf{f}_i}^1 h\,u_i\right)}{\partial t} = \left(\sum_{i=0}^{l}\nabla L_{\mathbf{f}_i}^1 h\,u_i\right)\dot{\mathbf{x}} \\
\dddot{h} &= \frac{\partial\ddot{h}}{\partial t} = \frac{\partial\left(\left(\sum_{i=0}^{l}\nabla L_{\mathbf{f}_i}^1 h\,u_i\right)\dot{\mathbf{x}}\right)}{\partial t} = \frac{\partial\left(\sum_{i=0}^{l}\sum_{j=0}^{l}L_{\mathbf{f}_i\mathbf{f}_j}^2 h\,u_i u_j\right)}{\partial t} = \left(\sum_{i=0}^{l}\sum_{j=0}^{l}\nabla L_{\mathbf{f}_i\mathbf{f}_j}^2 h\,u_i u_j\right)\dot{\mathbf{x}}
\end{aligned} \qquad (28)$$
Since (27) must be satisfied for any set of control inputs $u_i$, the following relation needs to hold:
$$\mathcal{O}\,\dot{\mathbf{x}} = \mathbf{0}, \qquad \mathcal{O} = \begin{bmatrix}\nabla L^0 h \\ \nabla L_{\mathbf{f}_i}^1 h \\ \nabla L_{\mathbf{f}_i\mathbf{f}_j}^2 h \\ \nabla L_{\mathbf{f}_i\mathbf{f}_j\mathbf{f}_k}^3 h \\ \vdots\end{bmatrix} \qquad (29)$$
where $i, j, k = 0, \ldots, l$, and the matrix $\mathcal{O}$ is, by definition [4], the observability matrix of the system (24).
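To illustrate how the rows of $\mathcal{O}$ in (29) are stacked from gradients of Lie derivatives, the following sympy sketch applies the construction to a toy unicycle system (our own example, unrelated to the VINS model).

```python
import sympy as sp

# Toy control-affine system: a unicycle with a range-squared measurement.
x, y, th = sp.symbols('x y th', real=True)
state = sp.Matrix([x, y, th])
f1 = sp.Matrix([sp.cos(th), sp.sin(th), 0])   # forward-velocity direction
f2 = sp.Matrix([0, 0, 1])                     # turn-rate direction
h = sp.Matrix([x**2 + y**2])                  # measurement

def lie(fi, L):
    """Lie derivative of the scalar function L along the field fi."""
    return L.jacobian(state) * fi

# Rows of the observability matrix, eq. (29): gradients of L^0 h,
# L^1_{f_i} h, L^2_{f_i f_j} h, ...
rows = [h.jacobian(state)]
rows += [lie(fi, h).jacobian(state) for fi in (f1, f2)]
rows += [lie(fj, lie(fi, h)).jacobian(state) for fi in (f1, f2) for fj in (f1, f2)]
O = sp.Matrix.vstack(*rows)
print(O.rank())   # rank 3 at generic states: the toy system is observable
```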
Lemma 2: For a vision-aided inertial navigation system, $\mathcal{O}\,\dot{\mathbf{x}} \neq \mathbf{0}$.

Proof: The VINS process model can be written as:
$$\dot{\mathbf{x}} = \begin{bmatrix}\dot{\mathbf{s}} \\ \dot{\mathbf{v}} \\ \dot{\mathbf{p}} \\ \dot{\mathbf{p}}_f \\ \dot{\mathbf{b}}_a \\ \dot{\mathbf{b}}_g\end{bmatrix} = \underbrace{\begin{bmatrix}-\frac{1}{2}\mathbf{D}\mathbf{b}_g \\ \mathbf{g} - \mathbf{C}^T\mathbf{b}_a \\ \mathbf{v} \\ \mathbf{0} \\ \mathbf{0} \\ \mathbf{0}\end{bmatrix}}_{\mathbf{f}_0} + \underbrace{\begin{bmatrix}\frac{1}{2}\mathbf{D} \\ \mathbf{0} \\ \mathbf{0} \\ \mathbf{0} \\ \mathbf{0} \\ \mathbf{0}\end{bmatrix}}_{\mathbf{f}_1}\boldsymbol{\omega} + \underbrace{\begin{bmatrix}\mathbf{0} \\ \mathbf{C}^T \\ \mathbf{0} \\ \mathbf{0} \\ \mathbf{0} \\ \mathbf{0}\end{bmatrix}}_{\mathbf{f}_2}\mathbf{a} \qquad (30)$$
where $\frac{1}{2}\mathbf{D} \triangleq \frac{\partial\mathbf{s}}{\partial\boldsymbol{\theta}}$.

According to [5], the system's observability matrix can be split into the product of two matrices:
$$\mathcal{O} = \boldsymbol{\Xi}\mathbf{B} \qquad (31)$$
where $\boldsymbol{\Xi}$ is a full-column-rank matrix and, with $\boldsymbol{\xi}$ a $3\times3$ full-rank matrix, $\mathbf{B}$ is:
$$\mathbf{B} = \begin{bmatrix}\boldsymbol{\xi} & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 \\ \mathbf{0}_3 & \mathbf{I}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 \\ \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{I}_3 & \mathbf{0}_3 & \mathbf{0}_3 \\ \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{I}_3 & \mathbf{0}_3 \\ \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{I}_3\end{bmatrix}\begin{bmatrix}\lfloor\mathbf{C}(\mathbf{p}_f-\mathbf{p})\times\rfloor\frac{\partial\boldsymbol{\theta}}{\partial\mathbf{s}} & \mathbf{0}_3 & -\mathbf{C} & \mathbf{C} & \mathbf{0}_3 & \mathbf{0}_3 \\ \lfloor\mathbf{C}\mathbf{v}\times\rfloor\frac{\partial\boldsymbol{\theta}}{\partial\mathbf{s}} & \mathbf{C} & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 \\ \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{I}_3 & \mathbf{0}_3 & \mathbf{0}_3 \\ \lfloor\mathbf{C}\mathbf{g}\times\rfloor\frac{\partial\boldsymbol{\theta}}{\partial\mathbf{s}} & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 \\ \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{I}_3 & \mathbf{0}_3\end{bmatrix} \qquad (32)$$
Then:
$$\begin{aligned}
\mathcal{O}\,\dot{\mathbf{x}} &= \boldsymbol{\Xi}\mathbf{B}\dot{\mathbf{x}} = \boldsymbol{\Xi}\mathbf{B}(\mathbf{f}_0 + \mathbf{f}_1\boldsymbol{\omega} + \mathbf{f}_2\mathbf{a}) \\
&= \underbrace{\boldsymbol{\Xi}\begin{bmatrix}\boldsymbol{\xi} & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 \\ \mathbf{0}_3 & \mathbf{I}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 \\ \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{I}_3 & \mathbf{0}_3 & \mathbf{0}_3 \\ \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{I}_3 & \mathbf{0}_3 \\ \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{I}_3\end{bmatrix}}_{\mathbf{A}}\begin{bmatrix}-\lfloor\mathbf{C}(\mathbf{p}_f-\mathbf{p})\times\rfloor\mathbf{b}_g - \mathbf{C}\mathbf{v} + \lfloor\mathbf{C}(\mathbf{p}_f-\mathbf{p})\times\rfloor\boldsymbol{\omega} \\ -\lfloor\mathbf{C}\mathbf{v}\times\rfloor\mathbf{b}_g + \mathbf{C}\mathbf{g} - \mathbf{b}_a + \lfloor\mathbf{C}\mathbf{v}\times\rfloor\boldsymbol{\omega} + \mathbf{a} \\ \mathbf{0} \\ -\lfloor\mathbf{C}\mathbf{g}\times\rfloor\mathbf{b}_g + \lfloor\mathbf{C}\mathbf{g}\times\rfloor\boldsymbol{\omega} \\ \mathbf{0}\end{bmatrix}
\end{aligned} \qquad (33)$$
Since $\mathbf{A}$ has full column rank [5], and the vector multiplying it in (33) is, in general, nonzero (e.g., its second block element contains the term $\mathbf{C}\mathbf{g} - \mathbf{b}_a + \mathbf{a}$), $\mathcal{O}\,\dot{\mathbf{x}} \neq \mathbf{0}$.

Based on Theorem 1 and Lemma 2, the time delay $\tau$ is observable. Therefore, there exists no unobservable direction that is not spanned by $\mathbf{N}$.

V. CLONING DURING HOVERING

Given sufficient baseline between consecutive camera poses, the proposed sliding-window estimator manages cloned poses using a first-in, first-out (FIFO) scheme. Specifically, before cloning the IMU pose corresponding to the new image into the estimator's sliding window, the oldest pose is marginalized, so that the length of the sliding window remains constant. Assume that at time step $k-1$ the sliding window comprises $N$ poses, $\{\mathbf{x}_{k-N}, \ldots, \mathbf{x}_{k-1}\}$. After processing the feature tracks first observed in image $k-N$, the oldest pose $\mathbf{x}_{k-N}$ is marginalized. At the next time step $k$, the new pose $\mathbf{x}_k$ is added to the estimator's optimization window. In short, the FIFO scheme slides the window of camera poses forward in time; this is the image-management scheme most commonly employed by sliding-window filters. Under sufficient motion, a FIFO cloning strategy leads to poses uniformly distributed in space and time, which allows (i) the triangulation of features, for linearization purposes, and (ii) the representation of camera poses as an interpolation of consecutive IMU poses. Furthermore, under motion with sufficient baseline, the system has four unobservable directions, equal in number to the system's theoretical minimum.

However, if the sensor platform is "hovering" (i.e., the camera performs purely rotational motion, or remains static), an alternative cloning strategy is required so that no new unobservable directions are introduced [6]. A simple solution is to use a last-in, first-out (LIFO) cloning strategy [6], during which a new (hovering) IMU pose replaces the newest (hovering) clone in the sliding window. Such a scheme maintains non-hovering camera frames in the estimator's optimization window, which provide sufficient baseline for visual observations. Consider a sliding window comprising the poses $\{\mathbf{x}_{k-N}, \ldots, \mathbf{x}_{k-2}, \mathbf{x}_{k-1}\}$. At the arrival of image $k$, the platform enters a hovering state, and the management of cloned poses switches to a LIFO scheme. After $M$ images, the sliding window will comprise the poses $\{\mathbf{x}_{k-N}, \ldots, \mathbf{x}_{k-2}, \mathbf{x}_{k-1+M}\}$. Clearly, the increasing time interval between the newest poses of the window, $\{\mathbf{x}_{k-2}, \mathbf{x}_{k-1+M}\}$, does not allow us to employ the interpolation model for processing measurements originating from image $k-1+M$. Instead, we propose an alternative cloning strategy (Algorithm 1). Specifically, instead of maintaining only the most recent hovering pose in the estimator's window, we keep the two most recent hovering poses (i.e., $\mathbf{x}_{k-2+M}$, $\mathbf{x}_{k-1+M}$). Such a strategy allows us to use this pair of clones for modeling measurements originating from the newest image $k-1+M$, while at the same time non-hovering poses remain in the estimator's window so that baseline is maintained between the camera frames comprising the estimator's window.
Algorithm 1 Proposed clone management at the arrival of image k, for a sliding window of size N
  if Regular Motion then
    Drop oldest pose, {x_{k-N}}
    Add new pose, {x_k}
  else if Switched from Regular Motion to Hovering then
    Drop the two most recent poses, {x_{k-2}, x_{k-1}}
    Add new pose, {x_k}
  else if Remained in Hovering then
    Drop second most recent pose, {x_{k-2}}
    Add new pose, {x_k}
  end if
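A possible implementation of Algorithm 1 (our own sketch; the motion classifier that distinguishes regular motion from hovering is abstracted as boolean inputs, e.g., provided by the detector of [6]) is:

```python
def update_window(window, new_pose, hovering, was_hovering, max_size):
    """Clone management per Algorithm 1. `window` is a list of pose ids,
    oldest first; `hovering`/`was_hovering` come from a hovering detector
    (abstracted here). Returns the updated window."""
    if not hovering:
        # Regular motion: FIFO -- marginalize the oldest clone.
        if len(window) >= max_size:
            window.pop(0)
    elif not was_hovering:
        # Switched from regular motion to hovering: drop the two most
        # recent poses so the window keeps older, baseline-rich clones.
        del window[-2:]
    else:
        # Remained in hovering: drop the second most recent pose, so the
        # two newest clones always bracket the current hovering interval.
        del window[-2]
    window.append(new_pose)
    return window

# Example: regular motion for 5 images, then hovering for 3.
w, N = [], 5
for k in range(5):
    w = update_window(w, k, hovering=False, was_hovering=False, max_size=N)
for k in range(5, 8):
    w = update_window(w, k, hovering=True, was_hovering=(k > 5), max_size=N)
print(w)   # older non-hovering poses persist; the newest two track hovering
```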
REFERENCES

[1] F. M. Mirzaei and S. I. Roumeliotis, "A Kalman filter-based algorithm for IMU-camera calibration: Observability analysis and performance evaluation," IEEE Trans. on Robotics, vol. 24, no. 5, pp. 1143–1156, Oct. 2008.
[2] P. S. Maybeck, Stochastic Models, Estimation and Control. New York, NY: Academic Press, 1979, vol. 1.
[3] J. A. Hesch, D. G. Kottas, S. L. Bowman, and S. I. Roumeliotis, "Observability-constrained vision-aided inertial navigation," University of Minnesota, Dept. of Comp. Sci. & Eng., MARS Lab, Tech. Rep., Feb. 2012. [Online]. Available: http://www-users.cs.umn.edu/~dkottas/pdfs/vins_tr_winter_2012.pdf
[4] R. Hermann and A. J. Krener, "Nonlinear controllability and observability," IEEE Trans. on Automatic Control, vol. 22, no. 5, pp. 728–740, Oct. 1977.
[5] J. A. Hesch, D. G. Kottas, S. L. Bowman, and S. I. Roumeliotis, "Camera-IMU-based localization: Observability analysis and consistency improvement," International Journal of Robotics Research, vol. 33, no. 1, pp. 182–201, Jan. 2014.
[6] D. G. Kottas, K. Wu, and S. I. Roumeliotis, "Detecting and dealing with hovering maneuvers in vision-aided inertial navigation systems," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, Nov. 3–7, 2013, pp. 3172–3179.