Rigorous Bounds for Two-Frame Structure from Motion

John Oliensis ([email protected])
NEC Research Institute, 4 Independence Way, Princeton, N.J. 08540

Abstract

We present an analysis of the problem of recovering motion, particularly rotation, from two image frames. We derive an exact bound on the magnitude of the error in recovering the rotation. With the single weak requirement that the average translational image displacements are smaller than the field of view, we demonstrate rigorously that the error in recovering the rotation is small. A lower bound on the rotation error can be derived which, for reasonable values of the field of view, is very close to the upper bound. Thus the upper bound is in fact a good estimate of the rotation error. We demonstrate the validity of these theoretical claims on synthetic and real images. These results form part of our correctness proof for a recently developed algorithm for recovering structure and motion from a multiple-image sequence. In addition, we argue and demonstrate experimentally that in the complementary domain, when the translation is large, the whole motion can be recovered robustly if perspective effects are also important. Thus it appears that rotation can be robustly recovered over a large domain.
Keywords: structure from motion, low level vision, rigorous bounds, nonlinear estimation, multiframe structure from motion.
1 Introduction
We have demonstrated in [7, 6] a new algorithm for structure from motion (SFM) which, in the appropriate domain, provably reconstructs structure and motion correctly. It is the first SFM algorithm for which a rigorous justification exists. In the current paper, we present a part of the correctness proof for this algorithm, deriving tight bounds and estimates on the error in recovering rotation between two image frames when the translation is moderate. It has traditionally been believed that recovering rotations was difficult. We prove here that this is not true when the translation is moderate or small. Intuitively, our result is straightforward: if the translational displacements of image points are not large, then merely aligning the fields of view of two images gives reasonable rotation estimates. We make this intuition precise in this paper. We also argue and show experimentally that in the complementary large-translation domain, if we additionally assume that perspective effects are important, then in fact the complete motion and structure can be determined reliably. Experimentally, it seems that the domain where "small translation" techniques determine the rotation well actually overlaps the domain where "large translation" techniques work. Thus it appears from our work that the rotation can be recovered reliably over a large domain.

The rigorous upper bounds on the rotation error derived in this paper are in fact good estimates of it. Good estimates of the expected error are important in designing reliable algorithms; they can be used to ensure that at each stage the algorithm estimates quantities that can be estimated robustly, and they are also important in understanding the domain of validity of the algorithm. For a detailed understanding of the error sensitivity we must confront the errors head-on, and our analysis is inevitably messy. The final result bears out our original intuition: the rotation error is scaled by the average size of the translational image displacements.

Our experimental work on the large-translation domain supports our previous claim [7, 6] that this domain is essentially an easy one: any SFM algorithm, including the notoriously unreliable "8-point" algorithm [5], will work in this domain. In addition, we demonstrate that most of the error in recovering the Euclidean structure is due to a single component of the structure. It is often claimed that only the affine structure is reliably recoverable from two images; here we show that an "intermediate" structure, containing all but one of the Euclidean components, can be recovered reliably, at least when the translation is large.
2 Rotation Error Bound

In this section we derive an explicit bound on the error in recovering the relative rotation between two noisy images neglecting a nonzero translation between the two images. The derived bound goes beyond a first order estimate; it does not require that the translation and noise be infinitesimally small. Let the first image be represented by unit rays $p_i$ and the second by unit rays $p'_i$. Assume there are $N_p$ points in both images. We assume that the rotation is determined by finding the $R$ minimizing the least-square error for the unit rays

$$\sum_i |p'_i - R p_i|^2, \qquad (1)$$

which is equivalent to maximizing

$$\sum_i p'^{\,t}_i R\, p_i. \qquad (2)$$

Let $R_T$ represent the true rotation between the two images. Define $R \equiv R_E R_T$ ($R_E$ is the error rotation) and $\bar p_i \equiv R_T p_i$. $R_E$ is the rotation maximizing

$$\sum_i p'^{\,t}_i R_E\, \bar p_i. \qquad (3)$$

Let the unit vector $\hat\omega$ define the axis and $\theta$ the angle of rotation for $R_E$. $|\theta|$ measures the size of the error in the recovered rotation. We will derive a bound on $|\theta|$ starting from (3).
Theorem 1 Define the $3 \times 3$ symmetric matrix $M$ by

$$M \equiv \frac{1}{N_p} \sum_i \left( 1_3\, p'^{\,t}_i \bar p_i - \tfrac{1}{2}\left(p'_i \bar p^{\,t}_i + \bar p_i\, p'^{\,t}_i\right)\right), \qquad (4)$$

where $1_3$ is the identity matrix, and define the 3-vector

$$V \equiv \frac{1}{N_p} \sum_i p'_i \times \bar p_i = \frac{1}{N_p} \sum_i (p'_i - \bar p_i) \times \bar p_i. \qquad (5)$$

Assume $M$ is positive definite. Then $|\theta| \le |M^{-1} V|$.

Remark. As we discuss below, $M$ will be positive definite when the translation and noise are sufficiently small.
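The bound of Theorem 1 is easy to exercise numerically. The sketch below is ours, not from the paper, and assumes NumPy; it generates two views of a random point set related by a pure translation plus a little noise (so the true rotation is the identity and $\bar p_i = p_i$), recovers the least-squares rotation of (1) via the standard SVD (Procrustes) solution, and compares the spurious rotation angle $|\theta|$ against $|M^{-1}V|$:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rays(n, fov_deg):
    # Unit rays distributed in a cone of half-angle fov/2 about +z.
    cmin = np.cos(np.radians(fov_deg) / 2.0)
    c = rng.uniform(cmin, 1.0, n)                  # cos(polar angle)
    phi = rng.uniform(0.0, 2.0 * np.pi, n)
    s = np.sqrt(1.0 - c**2)
    return np.stack([s * np.cos(phi), s * np.sin(phi), c], axis=1)

def best_rotation(pbar, pprime):
    # R in SO(3) maximizing sum_i p'_i . R pbar_i  (Procrustes via SVD).
    U, _, Vt = np.linalg.svd(pbar.T @ pprime)      # SVD of sum_i pbar_i p'_i^t
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

n = 50
pbar = random_rays(n, 90.0)                        # rotation-compensated first image
T = np.array([1.0, 0.0, 0.0])                      # unit translation
r = rng.uniform(5.0, 20.0, n)                      # radial depths, so |T|/r <= 0.2
pprime = pbar - T / r[:, None]                     # translate ...
pprime += 0.002 * rng.standard_normal((n, 3))      # ... and add a little noise
pprime /= np.linalg.norm(pprime, axis=1, keepdims=True)

# Spurious rotation found when the translation is (wrongly) neglected.
R_E = best_rotation(pbar, pprime)
theta = np.arccos(np.clip((np.trace(R_E) - 1.0) / 2.0, -1.0, 1.0))

dots = np.einsum('ij,ij->i', pprime, pbar)
M = np.eye(3) * dots.mean() - 0.5 * (pprime.T @ pbar + pbar.T @ pprime) / n
V = np.cross(pprime, pbar).mean(axis=0)

assert np.linalg.eigvalsh(M)[0] > 0                # M is positive definite here
assert theta <= np.linalg.norm(np.linalg.solve(M, V))   # Theorem 1
```

The SVD solution is the global maximizer of (2), which is exactly the rotation the theorem bounds.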
Proof. (3) can be decomposed as

$$\sum_i p'^{\,t}_i R_E\, \bar p_i = \sum_i (\hat\omega^t p'_i)(\hat\omega^t \bar p_i) + \sum_i (\hat\omega \times p'_i)^t\, (\hat\omega \times R_E\, \bar p_i). \qquad (6)$$

The first term on the right-hand side involves the rotation only through $\hat\omega$ and is independent of $\theta$. The second, $\theta$-dependent term can be written as

$$\sum_i (\hat\omega \times p'_i)^t \left( \cos\theta\, (\hat\omega \times \bar p_i) - \sin\theta\, (\hat\omega \times (\hat\omega \times \bar p_i)) \right). \qquad (7)$$
The coefficient of $\cos(\theta)$ in this expression is proportional to $\hat\omega^t M \hat\omega$ and is nonvanishing since $M$ is positive definite. Then the $\theta_{\hat\omega}$ giving the maximum can be solved for explicitly:

$$\tan(\theta_{\hat\omega}) = \frac{\sum_i (\hat\omega \times p'_i)^t\, \bar p_i}{\sum_i (\hat\omega \times p'_i)^t\, (\hat\omega \times \bar p_i)} \equiv \frac{A(\hat\omega)}{B(\hat\omega)}, \qquad (8)$$

where

$$A(\hat\omega) \equiv \frac{1}{N_p}\sum_i (\hat\omega \times p'_i)^t\, \bar p_i, \qquad B(\hat\omega) \equiv \frac{1}{N_p}\sum_i (\hat\omega \times p'_i)^t\, (\hat\omega \times \bar p_i). \qquad (9)$$
Substituting $\tan(\theta_{\hat\omega})$ back into (7) yields $N_p\left(A^2(\hat\omega) + B^2(\hat\omega)\right)^{1/2}$. This is the value at the maximum over $\theta$ of the second term in (6). From the definition of $B(\hat\omega)$, the first term in (6) can be rewritten as

$$\sum_i (\hat\omega^t p'_i)(\hat\omega^t \bar p_i) = \sum_i p'^{\,t}_i \bar p_i - N_p\, B(\hat\omega). \qquad (10)$$

Thus after the substitution for $\tan(\theta_{\hat\omega})$, (3) becomes, up to rotation-independent terms and an irrelevant factor of $N_p$,

$$E(\hat\omega) \equiv \sqrt{A^2(\hat\omega) + B^2(\hat\omega)} - B(\hat\omega) = B\left(\sqrt{1 + (A/B)^2} - 1\right). \qquad (11)$$

$B(\hat\omega)$ and $A(\hat\omega)$ may be written

$$B(\hat\omega) = \hat\omega^t M \hat\omega, \qquad (12)$$

$$A(\hat\omega) = V^t \hat\omega, \qquad (13)$$

where $M$, $V$ are the matrix and vector defined in the statement of the theorem. Define $X(\hat\omega) \equiv (1/2)\, A^2(\hat\omega)/B(\hat\omega)$. As we discuss below, when the translation and noise are small, $|A/B|$ will in general also be small and $E(\hat\omega) \approx X(\hat\omega)$. Thus, as a first step toward maximizing $E$, we consider maximizing $X(\hat\omega)$ with respect to $\hat\omega$.
Lemma 2 Let $\hat\omega_M$ be the value of the unit vector $\hat\omega$ maximizing $X(\hat\omega)$. Assume $M$ is positive definite. Then the ratio $\gamma(\hat\omega_M) \equiv A(\hat\omega_M)/B(\hat\omega_M)$, where $A(\hat\omega)$, $B(\hat\omega)$ are defined by (9), satisfies $|\gamma(\hat\omega_M)| = |M^{-1}V|$, and $\theta_M \equiv \tan^{-1}(\gamma(\hat\omega_M))$ satisfies

$$|\theta_M| \le |M^{-1}V|.$$
Proof. Since $M$ is real symmetric, we can write $M = U S U^t = M^{1/2} M^{1/2}$, where $U$ is orthogonal, $S$ is real diagonal, and $M^{1/2} \equiv U S^{1/2} U^t$. Since $M$ is positive definite, $M^{1/2}$ is real and the eigenvalues of $M$ on the diagonal of $S$ are positive. Let $\omega' \equiv M^{1/2}\hat\omega$ and $V' \equiv M^{-1/2} V$. Maximizing $X$ over unit vectors $\hat\omega$ is equivalent to maximizing

$$(\hat\omega'^{\,t}\, V')^2, \qquad (14)$$

where $\hat\omega' \equiv \omega'/|\omega'|$, over the ellipsoid $\omega'^{\,t} M^{-1} \omega' = 1$. Clearly, the maximum occurs at one of the two points on the ellipsoid where $\omega'$ is parallel to $V'$, corresponding to a value for $\hat\omega$ of

$$\hat\omega_M \propto M^{-1/2}\, V' = M^{-1/2} M^{-1/2}\, V = M^{-1} V. \qquad (15)$$
Substituting this value into the expressions for $A(\hat\omega_M)$, $B(\hat\omega_M)$ yields

$$|\gamma(\hat\omega_M)| \equiv \left|\frac{A(\hat\omega_M)}{B(\hat\omega_M)}\right| = |M^{-1}V|\; \frac{V^t M^{-1} V}{V^t M^{-1} M M^{-1} V} = |M^{-1} V|. \qquad (16)$$

Since $|\theta_M| \le |\tan(\theta_M)|$, it follows that

$$|\theta_M| \le |M^{-1} V|. \qquad (17)$$

We now return to the maximization of the exact expression $E(\hat\omega)$. In terms of $X(\hat\omega)$ and $\gamma(\hat\omega) \equiv A(\hat\omega)/B(\hat\omega)$,

$$E(\hat\omega) = \frac{2\, X(\hat\omega)}{\gamma^2(\hat\omega)}\left(\sqrt{1 + \gamma^2(\hat\omega)} - 1\right) \equiv X(\hat\omega)\, K\!\left(\gamma^2(\hat\omega)\right), \qquad (18)$$

$$K(x) \equiv 2\, \frac{\sqrt{1+x} - 1}{x}. \qquad (19)$$

$K(x)$ for $x \ge 0$ is a monotonically decreasing function of $x$. Recall that $\hat\omega_M \propto M^{-1}V$ gives the maximum value $X_M$ of $X(\hat\omega)$. Let $\gamma_M \equiv \gamma(\hat\omega_M)$; we have shown in the Lemma that $|\gamma_M| = |M^{-1}V|$. The only way to achieve a value $E(\hat\omega)$ larger than $E(\hat\omega_M) = X_M\, K(\gamma_M^2)$ is via a value of $\hat\omega$ such that $K(\gamma^2(\hat\omega)) > K(\gamma_M^2)$. But since $K(x)$ is monotonic decreasing, this implies $|\gamma(\hat\omega)| < |\gamma_M|$. Since $|\theta(\hat\omega)| \le |\tan(\theta(\hat\omega))| = |\gamma(\hat\omega)|$, we have the desired result: the error in the rotation is bounded by $|\theta| < |M^{-1}V|$.
2.1 Estimates on the Rotation Bound

In the remainder of this section we discuss the expected magnitude of the bound derived above, paying particular attention to its dependence on the field of view (FOV). We also discuss the conditions under which the bound is valid, that is, the conditions under which $M$ will be positive definite, with all eigenvalues positive. Let $\delta p_i \equiv p'_i - \bar p_i$. Recall that $\delta p_i$ reflects the displacement due to noise and translation only, since the rotation has been compensated exactly in the definition of $\bar p_i$. Thus $\delta p_i$ is small if the translation and noise are. We first derive crude but simple bounds on the eigenvalues of $M$ and on $|M^{-1}V|$ in terms of the $\delta p_i$, and then extend them to tighter estimates in a following subsection. We use the notation $\langle x \rangle \equiv \sum_i x_i / N_p$ to denote the average value of a quantity $x$, where $i$ varies over all $N_p$ feature points. Let $\langle|\delta p|\rangle$ be the average value of the $|\delta p_i|$. We have

$$|V| = \frac{1}{N_p}\left|\sum_i (\delta p_i \times \bar p_i)\right| \le \langle|\delta p|\rangle, \qquad (20)$$

$$B(\hat\omega) = \frac{1}{N_p}\left(\sum_i |\hat\omega \times \bar p_i|^2 + \sum_i (\hat\omega \times \delta p_i)^t\, (\hat\omega \times \bar p_i)\right). \qquad (21)$$

In (21), the second term is also bounded by $\langle|\delta p|\rangle$,

$$\left|\frac{1}{N_p}\sum_i (\hat\omega \times \delta p_i)^t\, (\hat\omega \times \bar p_i)\right| \le \frac{1}{N_p}\sum_i |\delta p_i|\,|\bar p_i| = \langle|\delta p|\rangle, \qquad (22)$$

while the first term is positive for arbitrary $\hat\omega$ (assuming the image points $\bar p_i$ are not all colinear). We denote the first term of (21) by $B_0(\hat\omega)$:

$$B_0(\hat\omega) = \frac{1}{N_p}\sum_i |\hat\omega \times \bar p_i|^2 = \hat\omega^t M_0\, \hat\omega, \qquad (23)$$

where

$$M_0 \equiv 1_3 - \frac{1}{N_p}\sum_i \bar p_i\, \bar p^{\,t}_i. \qquad (24)$$

Denote the eigenvalues of $M_0$ by $e_i$ and those of $M$ by $e'_i$, ordered from greatest to least. The least eigenvalues $e_3$ and $e'_3$ are, respectively, greatest lower bounds for $B_0(\hat\omega)$ and $B(\hat\omega)$. $e_3$ depends just on the initial observed image points and, as discussed in the next section, is typically of order 1 if the field of view (FOV) is moderate or large. We also have that $e_3 > 0$ unless the image points are colinear, as noted above. Thus from (21) and (22),

$$B(\hat\omega) \equiv \hat\omega^t M \hat\omega \ge e'_3 \ge e_3 - \langle|\delta p|\rangle. \qquad (25)$$

If $\langle|\delta p|\rangle$ is sufficiently small, $e_3 - \langle|\delta p|\rangle > 0$ and $B(\hat\omega) > 0$ for arbitrary $\hat\omega$, implying that $M$ is positive definite. Moreover, $e_3 - \langle|\delta p|\rangle > 0$ implies

$$|\theta| \le \left|M^{-1}V\right| \le \frac{|V|}{e'_3} \le \frac{\langle|\delta p|\rangle}{e_3 - \langle|\delta p|\rangle}. \qquad (26)$$

When $e_3 \approx 1$ and $\langle|\delta p|\rangle \ll 1$, (26) states that the rotation error (in radians) is bounded approximately by $\langle|\delta p|\rangle$, the average displacement of an image point due to the translation or noise. Recall that $\delta p_i$ is a difference of unit vectors; $\langle|\delta p|\rangle \approx 1$ would imply a large average translational shift in the image of approximately $45^\circ$. (26) is a pessimistic bound on $|M^{-1}V|$ since the overlap of $V$ with the least eigendirection of $M$ (the one corresponding to $e'_3$) is typically small. Our bounds on $|\theta|$ and $e'_3$ will be tightened in subsection 2.2.
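The chain of inequalities (20)-(26) can be exercised directly. The sketch below is ours, not from the paper, and assumes NumPy; it builds a rotation-compensated, purely translated point pair and confirms (20), (25), and the crude bound (26):

```python
import numpy as np

rng = np.random.default_rng(1)

# Rotation-compensated rays in a 90 deg cone about +z, displaced by translation only.
n = 100
c = rng.uniform(np.cos(np.radians(45.0)), 1.0, n)
phi = rng.uniform(0.0, 2.0 * np.pi, n)
s = np.sqrt(1.0 - c**2)
pbar = np.stack([s * np.cos(phi), s * np.sin(phi), c], axis=1)

T = np.array([1.0, 0.0, 0.0])
r = rng.uniform(8.0, 30.0, n)                       # |T|/r_i <= 0.125
pprime = pbar - T / r[:, None]
pprime /= np.linalg.norm(pprime, axis=1, keepdims=True)

dp = pprime - pbar
avg_dp = np.linalg.norm(dp, axis=1).mean()          # <|delta p|>

dots = np.einsum('ij,ij->i', pprime, pbar)
M = np.eye(3) * dots.mean() - 0.5 * (pprime.T @ pbar + pbar.T @ pprime) / n
M0 = np.eye(3) - pbar.T @ pbar / n
V = np.cross(pprime, pbar).mean(axis=0)

e3 = np.linalg.eigvalsh(M0)[0]                      # least eigenvalue of M0
e3p = np.linalg.eigvalsh(M)[0]                      # least eigenvalue of M

assert np.linalg.norm(V) <= avg_dp                  # (20)
assert e3p >= e3 - avg_dp - 1e-12                   # (25)
assert e3 - avg_dp > 0                              # M positive definite regime
assert np.linalg.norm(np.linalg.solve(M, V)) <= avg_dp / (e3 - avg_dp)  # (26)
```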
2.1.1 An Example

The usefulness of the bound in (26) depends crucially on the size of $e_3$, the least eigenvalue of $M_0$. $M_0$ measures the scatter in the measured image points in the first image; $e_3$ will be small when the observed image points cluster around a single direction. In this section we estimate the typical size of $e_3$ depending on the field of view. Let $\Phi_F$ denote the size of the FOV. Let $E_i$ be the eigenvector of $M_0$ corresponding to $e_i$; similarly let $E'_i$ be the eigenvector of $M$ corresponding to $e'_i$. Assume for convenience and without loss of generality that $E_3 = \hat z$. Then

$$e_3 = \frac{1}{N_p}\sum_i \left(x_i^2 + y_i^2\right) \equiv \langle x^2 + y^2 \rangle, \qquad (27)$$

where $x_i, y_i$ are coordinates of the $\bar p_i$, and the brackets denote the average value. (For later reference, note that $e_3 \le \sin^2(\Phi_F)$ since $x_i^2 + y_i^2 \le \sin^2(\Phi_F)$.) In the limit of continuous image sampling,

$$\langle x^2 + y^2 \rangle = \frac{\int_{\cos(\Phi_F/2)}^{1} \sin^2(\theta)\, d\cos\theta}{\int_{\cos(\Phi_F/2)}^{1} d\cos\theta} = \frac{1}{3}\left(1 - \cos(\Phi_F/2)\right)\left(2 + \cos(\Phi_F/2)\right). \qquad (28)$$

(28) should also give the typical $e_3$ values for discrete image points distributed uniformly across the field of view. For this continuous example, $e_3 = \langle x^2 + y^2 \rangle \approx .26$ at $\Phi_F = 90^\circ$. It follows from (26) that the rotation error is bounded at worst by approximately $4|V| \le 4\langle|\delta p|\rangle$; however, as remarked in the previous section, the bound is typically significantly better than this when the small overlap of $V$ with the least eigendirection of $M$ is taken into account. For small field of view, $e_3$ can be small; in the continuous example $e_3 \approx .06$ at $\Phi_F = 40^\circ$. In this case the bound in (26) can be large (or negative) and we require a better estimate of the bound on the rotation error.
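The closed form (28) and its discrete counterpart are easy to reproduce. The sketch below is ours, not from the paper, and assumes NumPy; it evaluates (28) at the two quoted field-of-view values and checks it against a Monte-Carlo estimate of the least eigenvalue of $M_0$ for rays sampled uniformly in $\cos\theta$ over the cone, matching the measure in (28):

```python
import numpy as np

rng = np.random.default_rng(2)

def e3_continuous(fov_deg):
    # Closed form (28) for <x^2 + y^2> under continuous uniform sampling of the cone.
    c = np.cos(np.radians(fov_deg) / 2.0)
    return (1.0 - c) * (2.0 + c) / 3.0

def e3_monte_carlo(fov_deg, n=20000):
    # Least eigenvalue of M0 = 1 - <p p^t> for rays uniform in cos(theta) over the FOV.
    c = rng.uniform(np.cos(np.radians(fov_deg) / 2.0), 1.0, n)
    phi = rng.uniform(0.0, 2.0 * np.pi, n)
    s = np.sqrt(1.0 - c**2)
    p = np.stack([s * np.cos(phi), s * np.sin(phi), c], axis=1)
    return np.linalg.eigvalsh(np.eye(3) - p.T @ p / n)[0]

print(e3_continuous(90.0), e3_continuous(40.0))    # about 0.26 and 0.06
assert abs(e3_continuous(90.0) - 0.26) < 0.005
assert abs(e3_continuous(40.0) - 0.06) < 0.005
assert abs(e3_monte_carlo(90.0) - e3_continuous(90.0)) < 0.02
```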
2.2 An Improved Estimate

In this section we first improve the bound (25) on the least eigenvalue $e'_3$ of $M$. Second, we improve the bound (26) on the rotation error by estimating the overlap of $V$ with the least eigendirection of $M$. These results depend on the following Theorems.
Definition 1 Let $H'$, $H$ be $N \times N$ symmetric matrices, and let $H$ be positive definite. Define the perturbation $\Delta H \equiv H' - H$. Let $e'_i$, $e_i$ be the eigenvalues respectively of $H'$ and $H$, ordered from greatest to least, and let $E'_i$ and $E_i$ be the corresponding unit eigenvectors. Define $P_{\perp N}$ to be the projection operator $P_{\perp N} \equiv 1_N - E_N E_N^t$, where $1_N$ is the $N \times N$ identity matrix. Define

$$D \equiv P_{\perp N}\left(H + \Delta H - 1_N\left(e_N + \Delta H_{NN}\right)\right) P_{\perp N}, \qquad (29)$$

where $\Delta H_{NN} \equiv E_N^t\, \Delta H\, E_N$, and also define $\Delta H_{\perp\perp} \equiv P_{\perp N}\, \Delta H\, P_{\perp N}$.

Definition 2 For an arbitrary matrix $A$ let $||A||$ be its 2-norm (largest singular value of $A$).

Definition 3 Let $a$ be a lower bound on the eigenvalues of the matrix $D$ (excluding the zero eigenvalue associated with $E_N$) and let

$$b \equiv |P_{\perp N}\, \Delta H\, E_N|. \qquad (30)$$

Define

$$B_2(\gamma) \equiv \frac{\sqrt{1 + 4\gamma^2} - 1}{2} \le \gamma^2.$$
Theorem 3 (eigenvalue bound) Assume that $a > 0$. Then the perturbation of the least eigenvalue is bounded by

$$\Delta H_{NN} \ge e'_N - e_N \ge \Delta H_{NN} - a\, B_2(b/a). \qquad (31)$$

Remark. The assumption on $a$ will hold if the least eigenvalue of $H$ is well separated from the others compared to the scale of the perturbation $\Delta H$. For example, we may take

$$a \equiv e_{(N-1)} - e_N - |\Delta H_{NN}| - ||\Delta H_{\perp\perp}||.$$
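Theorem 3 can be sanity-checked numerically. The sketch below is ours, not from the paper, and assumes NumPy; it builds a random symmetric matrix with a well-separated least eigenvalue, takes $a$ to be the exact least eigenvalue of $D$ on the subspace orthogonal to $E_N$, and verifies both sides of (31):

```python
import numpy as np

rng = np.random.default_rng(3)

def B2(g):
    return (np.sqrt(1.0 + 4.0 * g**2) - 1.0) / 2.0

N = 4
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
H = Q @ np.diag([3.0, 2.5, 2.0, 0.5]) @ Q.T          # least eigenvalue well separated
dH = rng.standard_normal((N, N)) * 0.05
dH = (dH + dH.T) / 2.0                               # small symmetric perturbation

e, E = np.linalg.eigh(H)                             # ascending: index 0 is the least
eN, EN = e[0], E[:, 0]
epN = np.linalg.eigvalsh(H + dH)[0]

dH_NN = EN @ dH @ EN
P = np.eye(N) - np.outer(EN, EN)                     # projector P_perp
D = P @ (H + dH - np.eye(N) * (eN + dH_NN)) @ P
dvals, dvecs = np.linalg.eigh(D)
# a: least eigenvalue of D excluding the zero eigenvalue associated with E_N
a = min(dvals[k] for k in range(N) if abs(dvecs[:, k] @ EN) < 0.5)
b = np.linalg.norm(P @ dH @ EN)
assert a > 0

# Theorem 3:  dH_NN  >=  e'_N - e_N  >=  dH_NN - a * B2(b / a)
assert dH_NN + 1e-9 >= epN - eN >= dH_NN - a * B2(b / a) - 1e-9
```

In the two-dimensional worst case, where all of the perturbation couples $E_N$ to a single eigenvector at gap $a$, the lower bound of (31) is attained exactly, so the numerical margin can be small.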
Definition 4 With the definitions above and assuming that $E'_i \cdot E_i \ne 0$, define $\Delta E_i$ by

$$E'_i \equiv \frac{E_i + \Delta E_i}{|E_i + \Delta E_i|}, \qquad \Delta E_i \cdot E_i = 0.$$

Theorem 4 Assume that $a > 0$. Then

$$|\Delta E_N| \le 2\, \frac{b}{a}.$$

The proofs of these Theorems are omitted due to lack of space. Theorems similar to these can be found in [2] and [11]; however, the bounds stated above, since they deal with the least eigenvalues, improve slightly on the bounds quoted in these references.

We first use Theorem 3 to bound the least eigenvalue of $M$, focusing on the case when the FOV and $e_3$ are small, since the bound (25) suffices when the FOV is large. We have

$$\Delta M_{33} \ge e'_3 - e_3 \ge \Delta M_{33} - a\, B_2(b/a), \qquad (32)$$

where $\Delta M_{33} \equiv E_3^t\, \Delta M\, E_3$; $\Delta M \equiv M - M_0$ is treated as a small perturbation of $M_0$. We first estimate $\Delta M_{33}$.

Bound on $\Delta M_{33}$
Define

$$\rho_i \equiv |\bar p_i - T/r_i|, \qquad \zeta_i \equiv 1 - \bar p_i \cdot T/r_i - |\bar p_i - T/r_i|, \qquad T_{\perp i} \equiv T - \bar p_i\, \bar p_i \cdot T, \qquad (33)$$

where $r_i$ is the radial depth and $T$ is the unit translation vector. We take the translation to have unit magnitude; then our basic small-translation assumption is $|T| = 1 \ll r_i$. It follows that the $\rho_i$ are order 1. The displacement of an image point is

$$\delta p_i = \frac{\bar p_i - T/r_i}{|\bar p_i - T/r_i|} - \bar p_i + \eta_i = -\frac{T_{\perp i}}{r_i\, \rho_i} + \bar p_i\, \frac{\zeta_i}{\rho_i} + \eta_i, \qquad (34)$$

where $\eta_i$ represents the effects of noise. In terms of the $\delta p_i$,

$$\Delta M \equiv M - M_0 = \frac{1}{N_p}\left( 1_3 \sum_i \bar p_i \cdot \delta p_i - \frac{1}{2}\sum_i \left(\delta p_i\, \bar p^{\,t}_i + \bar p_i\, \delta p^{\,t}_i\right)\right). \qquad (35)$$

$\Delta M$ can be rewritten as $\Delta M = M_1 + M_2 + M_\eta$, where

$$M_1 \equiv \frac{1}{N_p}\sum_i \frac{\zeta_i}{\rho_i}\left(1_3 - \bar p_i\, \bar p^{\,t}_i\right), \qquad
M_2 \equiv \frac{1}{2 N_p}\sum_i \frac{\bar p_i\, T_{\perp i}^t + T_{\perp i}\, \bar p^{\,t}_i}{r_i\, \rho_i},$$

$$M_\eta \equiv \frac{1}{N_p}\left( 1_3 \sum_i \bar p_i \cdot \eta_i - \frac{1}{2}\sum_i \left(\bar p_i\, \eta_i^t + \eta_i\, \bar p^{\,t}_i\right)\right).$$
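The decomposition (34) of the displacement is purely algebraic and can be checked directly. The sketch below is ours, not from the paper, and assumes NumPy; it verifies (34) for a random noiseless ray and confirms that $\zeta_i$ is nonpositive and second order in $|T|/r_i$:

```python
import numpy as np

rng = np.random.default_rng(4)

pbar = rng.standard_normal(3)
pbar /= np.linalg.norm(pbar)                  # unit ray, rotation compensated
T = rng.standard_normal(3)
T /= np.linalg.norm(T)                        # unit translation
r = 8.0                                       # radial depth, so |T|/r is small

rho = np.linalg.norm(pbar - T / r)            # rho_i
zeta = 1.0 - pbar @ T / r - rho               # zeta_i
Tperp = T - pbar * (pbar @ T)                 # T_perp,i

pprime = (pbar - T / r) / rho                 # noiseless second ray
dp_model = -Tperp / (r * rho) + zeta * pbar / rho
assert np.allclose(pprime - pbar, dp_model)   # equation (34) with eta_i = 0
assert zeta <= 0.0                            # zeta_i is nonpositive ...
assert abs(zeta) < (np.linalg.norm(Tperp) / r) ** 2   # ... and second order
```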
Note that $\zeta_i$ is nonpositive and second order in the small translational image displacement $|T|/r_i$. Thus the first term $M_1$ in (35) makes a negligible contribution to $\Delta M_{33}$. Moreover, since it is clear from the definition of $M_0$ that $E_3$ is within the field of view, $E_3 \cdot \bar p_i \ge \cos(\Phi_F)$ and

$$0 \ge E_3^t \left(\frac{1}{N_p}\sum_i \frac{\zeta_i}{\rho_i}\left(1_3 - \bar p_i\, \bar p^{\,t}_i\right)\right) E_3 \ge -\sin^2(\Phi_F)\, \langle |\zeta_i|/\rho_i \rangle.$$

Thus the contribution of $M_1$ is further reduced at small FOV due to the factor $\sin^2(\Phi_F)$. The contribution of the noise term $M_\eta$ is also small. Write $\eta_i \equiv \eta_{\perp i} + \eta_{\parallel i}$, where $\eta_{\perp i}$, $\eta_{\parallel i}$ are the components of $\eta_i$ respectively perpendicular and parallel to $\bar p_i$. If the noise is small, as is typical (typically it is less than or on the order of one pixel), then since the noise perturbs one unit vector into another, $|\eta_{\parallel i}| = o(\eta_i^2)$. The terms involving $\eta_{\parallel i}$ are bounded by $2\langle|\eta_{\parallel}|\rangle$, where the angled brackets denote as before the average value over all points. Further, we have $|\eta_{\perp i} \cdot E_3| \le \sin(\Phi_F)\, \eta_{\perp max}$, where $\eta_{\perp max} \equiv \max_i |\eta_{\perp i}|$, so that this contribution is suppressed at small FOV. The total noise contribution is bounded by

$$\left|E_3^t\, M_\eta\, E_3\right| \le 2\langle|\eta_{\parallel}|\rangle + \sin(\Phi_F)\, \eta_{\perp max}. \qquad (36)$$

For a typical situation with $90^\circ$ FOV and a $512 \times 512$ image, one pixel noise corresponds to $|\eta_i| \approx .004$ and the noise contribution is clearly small. It is insignificant in our experiments compared to the error due to the translation. In addition, the term in (36) linear in the size of the noise $\eta_{\perp max}$ should typically be further suppressed by a factor of order $N_p^{-1/2}$, due to the cancellation of the random noise between different points. Below, we incorporate a factor of $s$ in the noise bound to denote this extra suppression factor. The main correction to $\Delta M_{33}$ comes from the second term $M_2$. For small field of view, this term is again suppressed since $|T_{\perp i} \cdot E_3| \le \sin(\Phi_F)\, T_{\perp max}$, where $T_{\perp max} \equiv \max_i |T - \bar p_i\, \bar p_i \cdot T|$. The $M_2$ contribution is bounded by

$$\langle (r_i\rho_i)^{-1} \rangle\, \sin(\Phi_F)\, T_{\perp max}.$$

Combining the bounds on the three terms yields

$$|\Delta M_{33}| \le B_{33} \equiv \sin^2(\Phi_F)\, \langle |\zeta_i|/\rho_i \rangle + \langle (r_i\rho_i)^{-1} \rangle\, \sin(\Phi_F)\, T_{\perp max} + 2\langle|\eta_{\parallel}|\rangle + s\, \sin(\Phi_F)\, \eta_{\perp max}, \qquad (37)$$

where $s$ represents the additional suppression due to noise cancellations.
The $B_2$ term

The other term appearing in the bound (31) is $-a\, B_2(b/a)$. Since $B_2$ is a monotonic increasing function of its argument, this term can be bounded by

$$|a\, B_2(b/a)| \le a_{max}\, B_2(b_{max}/a_{min}),$$

where $a_{max}$, $a_{min}$, $b_{max}$ are the upper or lower bounds on $a$ and $b$ derived below. $a$ was originally defined as a lower bound on the eigenvalues of the matrix

$$P_{\perp 3}\left(M_0 + \Delta M - 1_3\left(e_3 + \Delta M_{33}\right)\right)P_{\perp 3}.$$

Since $M_1$ is negative definite, we may choose $a$ as

$$a \equiv e_2 - e_3 - |M_{2,33}| - |M_{\eta,33}| - ||\Delta M_{\perp\perp}||, \qquad (38)$$

where $M_{\{1,2,\eta\},33} \equiv E_3^t\, M_{\{1,2,\eta\}}\, E_3$, $\Delta M_{\perp\perp} \equiv P_{\perp 3}\, \Delta M\, P_{\perp 3}$, and $P_{\perp 3}$ is defined as in Definition 1. Also write $\Delta M_{\perp\perp} = M_{1,\perp\perp} + M_{2,\perp\perp} + M_{\eta,\perp\perp}$, where $M_{\{1,2,\eta\},\perp\perp} \equiv P_{\perp 3}\, M_{\{1,2,\eta\}}\, P_{\perp 3}$. From (38), and since it follows from the definition of $M_0$ that $e_i \le 1$,

$$a \le e_2 - e_3 \le 1 - e_3 \equiv a_{max}. \qquad (39)$$

Next we derive a lower bound on $a$. By the triangle inequality,

$$||\Delta M_{\perp\perp}|| \le ||M_{1,\perp\perp}|| + ||M_{2,\perp\perp}|| + ||M_{\eta,\perp\perp}||.$$

It is easily shown that

$$||M_{1,\perp\perp}|| \le -\langle \zeta_i/\rho_i \rangle, \qquad ||M_{2,\perp\perp}|| \le \langle (r_i\rho_i)^{-1} \rangle\, \sin(\Phi_F)\, T_{\perp max},$$

$$||M_{\eta,\perp\perp}|| \le \langle|\eta_{\parallel}|\rangle\left(1 + \sin^2(\Phi_F)\right) + s\, \sin(\Phi_F)\, \eta_{\perp max}, \qquad (40)$$

where the factors of $\sin(\Phi_F)$ in the last two equations follow from the action of the projection operator $P_{\perp 3}$ on $\bar p_i$. This yields the bound $a \ge a_{min}$ with

$$a_{min} \equiv 1 - 2 e_3 - \langle |\zeta_i|/\rho_i \rangle - 2\langle (r_i\rho_i)^{-1} \rangle\, \sin(\Phi_F)\, T_{\perp max} - \langle|\eta_{\parallel}|\rangle\left(3 + \sin^2(\Phi_F)\right) - 2 s\, \sin(\Phi_F)\, \eta_{\perp max}$$
$$\equiv 1 - 2 e_3 - \Delta_{min}. \qquad (41)$$
From the definition of $M_0$, $\mathrm{Tr}(M_0) = 2$ and, as shown in Section 2.1.1, $e_3 \le \sin^2(\Phi_F)$. Combined with $e_i \le 1$, these results imply that

$$e_{1,2} \ge 1 - e_3 \ge \cos^2(\Phi_F), \qquad (42)$$

and $1 - 2 e_3 \ge \cos(2\Phi_F)$. Thus the sum of the first two terms in (41) is order 1 in the case of interest, small FOV, while the remaining terms in (41) are small: recall that typically we expect $s \approx N_p^{-1/2}$, and from the assumption $r_i^{-1}, |\eta| \ll 1$ it follows that $\rho_i \approx 1$, $\zeta_i \lesssim r_i^{-2}$, $\langle|\eta_{\parallel}|\rangle = o(\eta^2)$. Thus $a_{min}$ is typically positive and order 1 in the small-FOV case. (When the FOV is large and $e_3 \approx 1$, we use the bound (25) instead of the one derived in this section.) From similar reasoning, $b \equiv |P_{\perp 3}\, \Delta M\, E_3|$ is bounded by $b \le b_{max}$, with

$$b_{max} \equiv \langle |\zeta_i|/\rho_i \rangle\, \sin(\Phi_F) + \frac{1}{2}\left\langle \frac{1}{r_i\rho_i} \right\rangle\left(1 + \sin(\Phi_F)\right) T_{\perp max} + 2\langle|\eta_{\parallel}|\rangle + \frac{1}{2}\, s\left(1 + \sin(\Phi_F)\right)\eta_{\perp max}. \qquad (43)$$

Since $B_2(\gamma) \le \gamma^2$, the contribution of the $B_2$ term to the bound on the eigenvalue difference is second order in small quantities: $o(b_{max}^2) \approx o(r_i^{-2}, |\eta|^2)$. Thus although some terms in $b_{max}$ have no $\sin(\Phi_F)$ suppression at small FOV, the $B_2$ term is small compared to the first order terms $e_3$ and $E_3^t\, \Delta M\, E_3$ except at very small FOV. In summary, the bound on $e'_3$ is given from (32) by

$$e'_3 \ge e'_{3\,min} \equiv e_3 - B_{33} - a_{max}\, B_2(b_{max}/a_{min}), \qquad (44)$$

with $B_{33}$ as in (37), and with $a_{max}$, $a_{min}$, $b_{max}$ given in (39), (41), and (43). For very small FOV, $e_3$ is scaled by $\Phi_F^2$, while $E_3^t\, \Delta M\, E_3$, the most significant term in the bound on $|e'_3 - e_3|$, varies as $\Phi_F$ times $o(|T|/r_i, \eta)$. Thus $M$ should be positive definite except when the image displacement due to the translation or noise is about the same size as the FOV. In such cases our bound on the rotation error may fail. The exact region of positive definiteness of $M$ depends on the situation: on the sizes of the image displacements and on the distribution of feature points in the image. We explore this region experimentally in the next section.
Experiments on $e'_3$

We have conducted a series of experiments to measure the least eigenvalue $e'_3$ of $M$. In these experiments, the FOV, the number of points, the translation direction, the maximum value of $|T|/r_i$ (the inverse radial depth scaled by the translation magnitude), and the standard deviation of the image noise were given as inputs. A number of synthetic trials were run for each setting of these parameters. In each trial, feature points were selected randomly within the field of view, and values of $|T|/r_i$ were chosen randomly, varying uniformly from 0 up to the input maximum value. (Note that using the uniform distribution for the inverse depths gives a difficult test of our approach, since it causes the translational image displacements to be larger than they typically are in practical situations. Typically, the radial depths $r_i$ are distributed uniformly; assuming this instead for the inverse depths causes the 3D points to cluster at small depths, with probability density $P(r_i) \propto 1/r_i^2$.) A second image was generated from the first using the input translation and randomly computed structure; then Gaussian image noise with 2 pixel standard deviation was added to each image point in the second image. Finally, the least eigenvalues of $M$ and $M_0$ were computed.

The results are shown in Table 1. In this table $m, \sigma$ refer to the mean and standard deviation of $e'_3$ over the given number of trials. The ranges, means, and standard deviations adequately represent the experimental distributions, since the histograms of $e'_3$ and $e_3$ appear to be approximately Gaussian. Note that $e'_3$ remains positive in all cases, even for large translational displacements up to $|T|/r_i \le 0.5$ and a very small FOV of $30^\circ$ ($0.52$ radians $\approx \max(|T|/r_i)$). For even smaller FOV, some negative $e'_3$ are obtained at $|T|/r_i \le 0.5$. This bears out our claim that $M$ is positive definite and our bound valid except when the translational image displacements are approximately equal to the FOV. For more typical FOV with $\Phi_F \ge 40^\circ$, the values for $e'_3$ are in good agreement both with the experimentally computed values for $e_3$ and with the theoretical estimates computed in Section 2.1.1 for the continuous example. Note that the results are worse (smaller $e'_3$) when $T$ is parallel to the image plane. This happens because the $T_{\perp i}$, and thus the dominant perturbation $M_2$ in (35), are larger in this case. Also, $e'_3$ tends to be slightly smaller for backward motion (negative $T$) compared with forward. This is due to the fact that for forward motion the overlap of $T$ with the view direction is positive; since $E_3$ is approximately in the view direction, the first order contribution to the eigenvalue shift $E_3^t\, M_2\, E_3$ also tends to be positive. However, at small FOV we do find some trials where $e'_3 < e_3$ for forward motion, even though a first order analysis would suggest the opposite inequality. The second order terms can dominate the first, since the first order terms are suppressed by $\sin(\Phi_F)$ at small FOV while the second order terms are not. Finally, note that the results are slightly worse when the number of feature points is small. This is probably because of the larger variance in the effective FOV; effectively, $\Phi_F$ and therefore $e_3$ may be smaller than their nominal values due to the nonuniform distribution of points across the field of view.

#Trials  Npts  FOV  T         max(|T|/r_i)  range(e'_3)  m, sigma(e'_3)  range(e'_3/e_3)
100      20    90   (1 0 0)   .3            .14-.33      .24, .03        .90-1.01
100      20    40   (1 0 0)   .3            .04-.06      .06, .009       .73-1.02
200      20    20   (1 0 0)   .3            .005-.022    .013, .003      .31-1.03
200      20    20   (1 0 1)   .3            .009-.027    .017, .003      .55-1.15
200      20    20   -(1 0 1)  .3            .008-.022    .015, .003      .60-.99
200      20    20   (1 0 2)   .3            .01-.029     .020, .003      .90-1.19
200      20    20   -(1 0 2)  .3            .009-.023    .016, .003      .70-.97
200      20    20   (0 0 1)   .3            .014-.034    .022, .003      1.09-1.26
200      20    20   -(0 0 1)  .3            .011-.022    .016, .002      .83-.92
100      20    90   (1 0 0)   .5            .14-.33      .23, .034       .82-1.00
100      20    40   (1 0 0)   .5            .022-.08     .05, .01        .45-.92
100      20    30   (1 0 0)   .5            .004-.044    .025, .008      .50-1.13
100      20    30   (1 0 1)   .5            .017-.054    .035, .008      .55-.96
100      20    30   -(1 0 1)  .5            .016-.052    .030, .006      .57-1.10
100      40    30   (1 0 1)   .5            .022-.050    .037, .006      .21-1.24
100      10    30   (1 0 1)   .5            .006-.059    .033, .011      .39-.94
100      10    30   -(1 0 1)  .5            .009-.052    .027, .008      --

Table 1: Rotation error bound: minimum eigenvalues of $M$.
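The Table 1 protocol is easy to re-run. The sketch below is ours, not from the paper, and assumes NumPy, with isotropic angular noise of about $.004$ radians standing in for the pixel noise; it repeats the hardest tabulated setting, FOV $30^\circ$ with $\max(|T|/r_i) = .5$ and $T$ parallel to the image plane, and confirms that $e'_3$ stays positive:

```python
import numpy as np

rng = np.random.default_rng(5)

def least_eig_M(n, fov_deg, T, max_t_over_r, sigma):
    # One synthetic trial: random rays in the FOV, uniform |T|/r_i,
    # angular noise sigma on the second image; returns e3' = min eig of M.
    c = rng.uniform(np.cos(np.radians(fov_deg) / 2.0), 1.0, n)
    phi = rng.uniform(0.0, 2.0 * np.pi, n)
    s = np.sqrt(1.0 - c**2)
    pbar = np.stack([s * np.cos(phi), s * np.sin(phi), c], axis=1)
    t_over_r = rng.uniform(0.0, max_t_over_r, n)       # |T|/r_i, uniform as in the text
    pprime = pbar - t_over_r[:, None] * T
    pprime += sigma * rng.standard_normal((n, 3))
    pprime /= np.linalg.norm(pprime, axis=1, keepdims=True)
    dots = np.einsum('ij,ij->i', pprime, pbar)
    M = np.eye(3) * dots.mean() - 0.5 * (pprime.T @ pbar + pbar.T @ pprime) / n
    return np.linalg.eigvalsh(M)[0]

T = np.array([1.0, 0.0, 0.0])                          # translation parallel to image plane
vals = [least_eig_M(20, 30.0, T, 0.5, 0.004) for _ in range(20)]
print(min(vals), max(vals))
assert min(vals) > 0.0                                 # M stays positive definite
```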
2.3 Improved Bound on $|\theta|$ for Small FOV

As stated previously, the least eigenvalue of $M$ may be small at small FOV even though it remains positive. In this case (26) is inadequate as a bound on $|\theta|$. In this section we improve the bound at small FOV by computing a better estimate for $|M^{-1}V|$, taking account of the fact that $V$ typically has small overlap with $E'_3$. Writing $M$ in its eigenvector basis gives

$$M^{-1} V = \sum_{j=1}^{3} \frac{E'_j\, (E'_j \cdot V)}{e'_j}.$$

From (42) we have

$$\left|M^{-1} V\right| \le \frac{|V|}{\cos^2(\Phi_F)} + \frac{|E'_3 \cdot V|}{e'_3}.$$

At small FOV the first term is of order $|V| \approx o(r_i^{-1}, \eta)$. We wish to show that the second term is also small, even for small $e'_3$, due to the small overlap $|E'_3 \cdot V| \ll 1$. We show first that $E'_3 \approx \langle p \rangle$, where $\langle p \rangle$ is proportional to the average viewing direction, and then that $\langle p \rangle \cdot V \ll 1$.
Bound on $E'_3$

We have

$$\frac{1}{N_p}\sum_i \bar p_i\, \bar p^{\,t}_i = \langle p \rangle \langle p^t \rangle + \frac{1}{N_p}\sum_i (\bar p_i - \langle p \rangle)(\bar p_i - \langle p \rangle)^t,$$

where

$$|\langle p \rangle| = \frac{1}{N_p}\sum_i \frac{\langle p \rangle}{|\langle p \rangle|} \cdot \bar p_i \ge \cos(\Phi_F). \qquad (45)$$

Define $C \equiv \frac{1}{N_p}\sum_i (\bar p_i - \langle p \rangle)(\bar p^{\,t}_i - \langle p^t \rangle)$. Then

$$M = 1_3 - \langle p \rangle \langle p^t \rangle - C + \Delta M.$$

Define

$$P_{\perp v} \equiv 1_3 - \hat v\, \hat v^t,$$

where $\hat v \equiv \langle p \rangle / |\langle p \rangle|$ is the average viewing direction. $\hat v$ is the least eigenvector of the matrix $1_3 - \langle p \rangle \langle p^t \rangle$ (i.e., the one associated with the least eigenvalue). From Theorem 4, the least eigenvector $E'_3$ of $M$ is given, up to normalization, by $\langle p \rangle / |\langle p \rangle| + \Delta$, where $\Delta$ satisfies $\Delta \cdot \langle p \rangle = 0$ and is bounded by

$$|\Delta| \le 2\, \frac{\left|P_{\perp v}\left(-C + \Delta M\right) \hat v\right|}{\tilde a}. \qquad (46)$$

Here $\tilde a$ is a lower bound on the eigenvalues of the matrix $\tilde D$ (excluding the zero eigenvalue associated with $\langle p \rangle$), where $\tilde D$ is defined by

$$\tilde D \equiv P_{\perp v}\left(1_3 - \langle p \rangle \langle p^t \rangle - C + \Delta M - 1_3\left(1 - |\langle p \rangle|^2 + \hat v^t\left(-C + \Delta M\right)\hat v\right)\right)P_{\perp v}$$
$$= P_{\perp v}\left(-C + \Delta M + 1_3\left(|\langle p \rangle|^2 - \hat v^t\left(-C + \Delta M\right)\hat v\right)\right)P_{\perp v}.$$

$\tilde a$ can be lower bounded by reasoning similar to that of subsection 2.2:

$$\tilde a \ge \cos^2(\Phi_F) - ||\tilde C_{\perp\perp}|| - ||\tilde{\Delta M}_{\perp\perp}|| + |C_{vv}| - |M_{2,vv}| - |M_{\eta,vv}|,$$

where we have used (45) and the positive definiteness of $C$ and $-M_1$, and defined

$$\tilde C_{\perp\perp} \equiv P_{\perp v}\, C\, P_{\perp v}, \qquad \tilde{\Delta M}_{\perp\perp} \equiv P_{\perp v}\, \Delta M\, P_{\perp v}, \qquad C_{vv} \equiv \hat v^t\, C\, \hat v, \qquad M_{\{2,\eta\},vv} \equiv \hat v^t\, M_{\{2,\eta\}}\, \hat v.$$

(Note that in $\tilde{\Delta M}_{\perp\perp}$ the projection matrix annihilates $\langle p \rangle$ rather than $E_3$, as previously for $\Delta M_{\perp\perp}$.) $C_{vv}$ is bounded by $\sin^4(\Phi_F)$ and is thus very small at small FOV. The bounds on $||\tilde{\Delta M}_{\perp\perp}||$, $|M_{2,vv}|$, and $|M_{\eta,vv}|$ can be taken to be exactly the same as for their analogs in (40) and (37). This is so since these quantities differ from their analogs only by the substitution of $\hat v$ for $E_3$, and the only information on $E_3$ used in deriving the previous bounds was that it was contained within the FOV, as is true also for $\hat v$. Lastly, $||\tilde C_{\perp\perp}|| \le \sin^2(\Phi_F)$. Putting these bounds together yields $\tilde a \ge \tilde a_{min}$,

$$\tilde a_{min} \equiv \cos(2\Phi_F) + |C_{vv}| - \Delta_{min},$$

where $\Delta_{min}$ is defined as in (41).
Similarly an upper bound on the numerator in (46) can be derived. The term in the numerator involving $C$ is bounded by

$$\left|P_{\perp v}\, C\, \hat v\right| \le \sin(\Phi_F)\left(1 - \cos^2(\Phi_F)\right) = \sin^3(\Phi_F),$$

while, as for the bound on $b$ in (43),

$$\left|P_{\perp v}\, \Delta M\, \hat v\right| \le b_{max}.$$

This implies finally that

$$|\Delta| \le 2\, \frac{\sin^3(\Phi_F) + b_{max}}{\tilde a_{min}},$$

which at small FOV is $o\left(\langle (r_i\rho_i)^{-1} \rangle,\; s\,|\eta|\right)$. Now we turn to estimating $V$:

$$V = \frac{1}{N_p}\sum_i \left(-\frac{T \times \bar p_i}{r_i\, \rho_i} + \eta_i \times \bar p_i\right). \qquad (47)$$
Clearly $V$ is approximately orthogonal to $\langle p \rangle$, up to corrections of the order of the FOV. Since the least eigenvector of $M$ is approximately $\hat v$, we focus on this component. The noise-dependent contribution is bounded by

$$\left|\hat v \cdot \frac{1}{N_p}\sum_i \eta_i \times \bar p_i\right| \le s\, \sin(\Phi_F)\, \eta_{\perp max},$$

where as before $s$ is a suppression factor due to cancellation of the noise between different image points. The $T$-dependent contribution is

$$\left|\frac{1}{N_p}\, \hat v \cdot \sum_i \frac{T \times \bar p_i}{r_i\, \rho_i}\right| = \frac{|T \times \hat v|}{N_p}\left|\sum_i \frac{\bar p_{\perp i}}{r_i\, \rho_i}\right| \le |T \times \hat v|\, \sin(\Phi_F)\, \langle (r_i\rho_i)^{-1} \rangle, \qquad (48)$$

where $\bar p_{\perp i}$ denotes the component of $\bar p_i$ along the direction of $T \times \langle p \rangle$, and we have used $|\bar p_{\perp i}| \le \sin(\Phi_F)$ in deriving the inequality.

Roughly, the $T$-dependent contribution is given by the average dot product between the flow $\hat v \times \bar p$ produced by a first order rotation around the view direction $\hat v$ and a first order translational flow $T/r_i$. In general, since the flows due to translations and rotations differ significantly, we expect this average dot product to be small. (This is particularly true for a rotation around the viewing direction.) However, it is important to understand when the average dot product is at its largest, causing the bound in (48) to approach saturation, since in these cases the expected rotation error will be relatively larger also. The terms in the sum in (48) will tend to cancel, since $\sum_i \bar p_{\perp i} = 0$ (because $T \times \langle p \rangle \perp \langle p \rangle$) and since $r_i\rho_i > 0$. There will be no cancellation only if the 3D points are at infinite depth in one half of the visual field, that is, if $r_i \to \infty$ for one sign of $\bar p_{\perp i}$. Also, the sum in (48) will be larger when $r_i^{-1}$ and $\bar p_{\perp i}$ are correlated, since this increases the effects of the large translational displacements. $r_i^{-1} \propto \bar p_{\perp i}$ implies that the 3D points lie approximately on a plane, with normal perpendicular to $T$ and to the average viewing direction $\hat v$. The largest rotation error can be expected for translations $T$ parallel to the image plane (due to the factor $T_{\perp max}$) and for 3D points lying on a plane approximately perpendicular to the image plane. Conversely, for general nonplanar configurations of 3D points, the sum in (48) will display cancellation and be relatively smaller than the nominal bound. In summary,
$$|V \cdot \hat v| \le \sin(\Phi_F)\left(s\, \eta_{\perp max} + s_2\, |T \times \hat v|\, \langle (r_i\rho_i)^{-1} \rangle\right),$$

$$|V| \le s\, \eta_{\perp max} + T_{\perp max}\, \langle (r_i\rho_i)^{-1} \rangle,$$

where as before we use $s_2 \le 1$ as a reminder of the additional suppression that occurs when the 3D points do not fall on a plane perpendicular to the image plane. Combining all results, we derive the improved bound on the rotation error:
E D s ?max + T?max (rii)?1 jj 2 cos E (F ) D sin (F ) s ?max + s2 jT v^j (rii )?1 + e03 min ! 0 s + T D(r )?1E 1 3 ?max i i A; +2 sin (a~F ) + bmax @ ?max e0 min 3 min 19
(49)
where $e'_{3\min}$ is defined in (44). Recall that at small FOV, $e'_{3\min}$ is of order $\sin^2(\theta_F)$, except at FOV so small that the average translational image displacement is of comparable size to $\theta_F$. Also, $b_{\max} = o\bigl(\bigl\langle (r_i)^{-1} \bigr\rangle\,|T|\bigr)$ and $\tilde a_{\min} \approx 1$. Thus, for FOV larger than the translational image displacement, crudely,
$$
|\delta\Omega| \;\lesssim\; \frac{\bigl\langle (r_i)^{-1} \bigr\rangle\,|T|}{\sin(\theta_F)}.
$$
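As a numerical illustration of this crude bound, the following sketch evaluates it for made-up sample values resembling the experiments reported below; this is not the paper's experimental code, and the use of the FOV half-angle for $\theta_F$ is our assumption.

```python
import numpy as np

# Crude bound:  |delta Omega|  <~  <(r_i)^-1> |T| / sin(theta_F).
# Sample values are hypothetical, chosen to resemble Table 2
# (20 points, 40-degree FOV, max |T|/r_i about .3).
rng = np.random.default_rng(1)
inv_depths = rng.uniform(0.0, 0.3, size=20)   # (r_i)^-1, in units where |T| = 1
T_mag = 1.0                                   # translation magnitude
theta_F = np.deg2rad(40.0 / 2)                # half-angle of the FOV (assumed convention)

bound_rad = np.mean(inv_depths) * T_mag / np.sin(theta_F)
print(f"crude rotation-error bound: {np.degrees(bound_rad):.1f} degrees")
```

For these sample values the bound comes out within an order of magnitude of the measured errors in Table 2; it is only a rough scale estimate, not a tight prediction.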
The rotation error is bounded roughly by the ratio of the average translational image displacement to the FOV. From (26), this holds also at large FOV.

Experiments

We have conducted synthetic experiments to verify these theoretical expectations. The results appear in Table 2, where each entry summarizes 100 trials. The experiments were conducted as for those reported in Table 1; in particular, the inverse depths were chosen from a uniform distribution, which decreases the average depths of the 3D points and correspondingly increases the rotation error. For this reason, in practice we expect rotation errors significantly smaller than those reported in Table 2. Note that in our experiments the maximum computed rotation error over all trials and all cases is 23.4°, or about .41 radians. In our structure from motion algorithm [7, 6], the first order effects of rotation error are eliminated. Thus the effect of rotation error on the estimates of structure and motion is in the worst case scaled by $.41^2 \approx 17\%$. This is not large, especially considering that the rotation errors are likely to be smaller for the remaining image pairs in a multiframe sequence. As shown in the last column, especially at moderate to large FOV the upper bound $B$ is a very good estimate of the actual rotation error $|\delta\Omega|$. Although we omit the derivation for lack of space, it can be shown that a lower bound on $|\delta\Omega|$ exists which at such FOV is very close to the upper bound. Finally, since, as discussed in subsection 2.3, the case of 3D planar scenes is expected to produce relatively large rotation errors, we ran experiments for this case. First we ran 100 trials
Npts  FOV  T         max(|T|/r_i)  range(|dOmega|)  range(B)    mean, sigma (B)  range(|dOmega|/B)
20    90   (1 0 0)   .3            5.5–10.8         5.5–11.0    8.4, 1.2         .984–.997
20    40   (1 0 0)   .3            6.3–12.5         6.3–13.3    9.3, 1.4         .94–.996
20    20   (1 0 0)   .3            5.6–19.0         5.7–28.3    11.5, 4.0        .62–.995
20    20   (1 0 1)   .3            4.5–13.7         4.5–18.1    8.8, 2.4         .75–.995
20    20   (1 0 2)   .3            3.3–10.9         3.3–11.6    5.6, 1.5         .92–.998
20    20   (-1 0 2)  .3            2.3–7.7          2.3–8.1     4.1, 1.2         .93–.9995
20    40   (1 0 0)   .5            9.7–18.1         9.8–22.9    15.3, 2.6        .78–.989
20    30   (1 0 0)   .5            9.5–23.4         9.6–45.0    17.8, 5.3        .39–.988
20    30   (1 0 1)   .5            7.9–20.5         7.9–31.0    15.0, 3.5        .66–.992
20    30   (-1 0 1)  .5            6.0–14.9         6.0–16.6    9.4, 1.9         .88–.995
40    30   (1 0 1)   .5            10.2–17.6        10.3–21.0   14.1, 2.1        .81–.988
40    30   (1 0 0)   .5            10.8–17.8        10.9–23.5   15.7, 2.6        .73–.986
20    30   (1 0 2)   .5            5.8–16.4         5.8–20.1    10.2, 2.1        .81–.994

Table 2: Measured rotation error $|\delta\Omega|$ computed by SVD and bound $B$ from $M^{-1}V$. FOV and angular errors are in degrees. Each entry summarizes the results of 100 trials.
with 20 points lying on a plane orthogonal to the translation direction and perpendicular to the image plane. The points were chosen uniformly on the plane; the plane was then positioned so that the FOV and maximum inverse radial depth were approximately 40° and .3, respectively. Though the actual FOV varied down to values as small as 27°, the maximum rotation error was just 17°, while the mean was 12.8°. An additional set of 100 trials, in which the plane normal was allowed to vary randomly near the set of image plane directions, produced similar results. These results are consistent with those reported in Table 2. We also report results on two real image sequences: the rocket field sequence [1, 12] and the PUMA sequence [9, 10]. The rocket sequence consists of nine images obtained by approximately forward motion in an outdoor environment. The maximum translation magnitude, between the first and ninth image, was 7.35 feet, while the depth of the nearest point relative to the first (farthest) camera position was 17.8 feet. The field of view was about 45°. We computed the rotations neglecting the translation. The rotation was recovered with errors increasing nearly linearly with the size of the translation, from .33° to 4.3°. The PUMA sequence consists of 16 images of 32 points, with predominantly rotational motion of the camera at the rate of about 4° per image. The maximum translation was 1.5 feet, while the depth of the nearest point to the first camera position was 13.4 feet. The maximum rotation from the first image was 60.5°. The FOV was about 40°. Neglecting the
translation, the rotation was recovered with error increasing almost perfectly linearly with the translation magnitude. The error varied from .37° to a maximum of 4.3°.
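The rotation-only estimates quoted above ("computed by SVD", neglecting the translation) can be sketched as an orthogonal Procrustes problem: find the rotation that best aligns the unit image rays of the two frames in a least-squares sense. This is a generic reconstruction of such an SVD-based step, not necessarily the paper's exact procedure, and all names below are ours.

```python
import numpy as np

def rotation_from_rays(p, q):
    """Least-squares rotation R with q_i ~ R p_i (orthogonal Procrustes via SVD).

    p, q: (N, 3) arrays of unit image rays in the first and second frames.
    """
    U, _, Vt = np.linalg.svd(q.T @ p)            # SVD of the 3x3 correlation matrix
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # enforce det(R) = +1
    return U @ D @ Vt

# Noise-free check against a known rotation (Rodrigues formula, 5 degrees about y).
rng = np.random.default_rng(0)
axis = np.array([0.0, 1.0, 0.0])
K = np.array([[0, -axis[2], axis[1]],
              [axis[2], 0, -axis[0]],
              [-axis[1], axis[0], 0]])
ang = np.deg2rad(5.0)
R_true = np.eye(3) + np.sin(ang) * K + (1 - np.cos(ang)) * (K @ K)

p = rng.normal(size=(20, 3))
p /= np.linalg.norm(p, axis=1, keepdims=True)    # random unit rays
q = p @ R_true.T                                 # rays after the rotation
R_est = rotation_from_rays(p, q)
print(np.allclose(R_est, R_true))                # True: exact recovery without noise
```

With noisy rays and a nonzero translation, the same routine returns the rotation estimate whose error the bounds of this section control.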
3 Large Translations

We now turn to the case when the translation is of the same order as, or large compared to, the distances of the 3D points. We argue and demonstrate experimentally that in this case, assuming the feature points vary significantly in depth, both the rotation and translation can be recovered accurately. The structure can also be recovered accurately, except for very distant points. Given a multi-image sequence, the distances of distant points should also be determinable by an algorithm such as [7, 6], by taking advantage of the motion information computed from two-image pairs. The reasons why the structure and motion can be recovered robustly for large translation from two images are straightforward. Most importantly, because of the significant depth variation, translation and rotation are easy to distinguish. For a rotation, the image displacements from the first image to the second are uniform, depending only on image point position, so that the displacements for nearby points are similar. On the other hand, for translations, the displacements for near points are very different from those for far points. A large translation amplifies these differences compared to the image noise, allowing easy disambiguation. It is well known that the main cause of failure in motion recovery is the ambiguity between rotation and translation; accurate disambiguation leads to accurate recovery of the entire motion. The structure computation is robust firstly because the motion is accurately known. Also, the signal to noise ratio is large: the structure is determined from the translational image displacements, which are large compared to the noise. Finally, the nonlinearities of structure computation are minimized, since the image rays from the two images tend to intersect at large angles due to the large translation, causing the depths to be relatively insensitive to
image noise. We conducted a set of 100 synthetic experiments to test these claims. In each trial, a set of 20 3D points was chosen randomly, with image coordinates varying over a field of view of 40°, and depths varying uniformly from 2 to 40 in units of the translation size. We neglected to include more distant points (r_i > 40) since there is not enough information in the two images to determine the depths of these points reliably. As noted above, the depths of these farther points may be determinable using more images once the motion has been computed from the image pairs. In each case, the translation was taken to be in the x-direction. This was done since translation parallel to the image plane is traditionally believed to be a difficult case for motion reconstruction. The rotation was chosen randomly, varying up to a maximum of about 25°. One pixel Gaussian image noise was added to each image point in each image. The results are displayed in the figures. The angular error in the translation direction as computed by the "8-point" algorithm is shown in Figure 1a, while the final error in this direction, as computed by Levenberg-Marquardt starting from the result of the "8-point" algorithm, is shown in Figure 1b. Even though the "8-point" algorithm is notorious for its unreliability, it reconstructs the translation direction accurately when the translation is large. Similar results for the rotation error (in degrees) are shown in Figures 2a and 2b, for the "8-point" and Levenberg-Marquardt results, respectively. Figure 3a shows a 3D histogram of the number of points reconstructed for given values of the depth and depth error. The maximum depth error shown is 10 units; the number of reconstructions indicated at 10 units of depth error includes all reconstructions with higher depth errors as well. It is clear that for depths less than about 20 units, the reconstruction error is small. Figure 3b shows this explicitly. The same results for the percentage depth error are shown
in Figure 4a. Note that the results are better at the smallest depths than they are at slightly larger depths. The reason for this is the scaling ambiguity: in displaying these results, we scaled the reconstructed depths to the ground truth using only points with ground truth depths less than 10 units. (This was done since we expect these points to be reconstructed more accurately, due to their larger translational image displacements.) The results for the depths are actually much better than indicated in these displays. It can be shown that the component of the depths that is least accurately determined by structure from motion is the constant component of the inverse depths. It is well known that the affine structure is more robustly determined than the full Euclidean structure; in fact, this constant component is the most difficult Euclidean component and often is responsible for most of the error in determining the Euclidean structure. We omit a fuller theoretical discussion of this result for lack of space, but illustrate it experimentally in Figures 4b-6a. For these figures, a single component of the structure, the constant component, has been corrected to the ground truth for each of the 100 trials. The residual depth error is then plotted. Figures 4b, 5a, 5b, and 6a show, respectively, the results for all points, for points with depths less than 20, the fractional errors for all points, and the fractional errors for points with depths less than 20. In Figure 6b, the fraction of points of a given depth with depth errors greater than 10% is shown. In Figure 7, similar results are shown after correction of the single, difficult structure component mentioned above. The result that the error is dominated by the constant component of the inverse depths is likely to be a useful one.
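The constant-component correction used for these figures can be illustrated with a small sketch: shift the estimated inverse depths so that their mean matches the ground truth, then examine the residual depth errors. The function name and sample values below are hypothetical; this is our own reconstruction, not the authors' code.

```python
import numpy as np

def correct_constant_component(inv_depth_est, inv_depth_true):
    """Remove the constant (hardest-to-estimate) component of the inverse depths
    by shifting the estimates so their mean matches the ground truth."""
    offset = np.mean(inv_depth_true) - np.mean(inv_depth_est)
    return inv_depth_est + offset

# Hypothetical example: estimates corrupted by a constant inverse-depth offset
# plus small per-point noise.
true_inv = np.array([0.05, 0.10, 0.20, 0.40, 0.50])
est_inv = true_inv + 0.03 + np.random.default_rng(2).normal(0, 0.002, 5)
corrected = correct_constant_component(est_inv, true_inv)
print(np.abs(corrected - true_inv))   # residual errors, much smaller than the offset
```

In the experiments, the large constant offset dominates the raw Euclidean error; removing it leaves only the small per-point residuals, which is the effect shown in Figures 4b-7.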
In a multiframe approach for the large translation case, this component could be reconstructed via an optimization solely in this component, using the values of all other structure components fixed at their values as computed by two-frame algorithms. Finally, we computed the rotation directly assuming, as in the previous sections, that the translation is zero. Note that since the minimum 3D depth in these experiments is
Figure 1: a, b. Histograms of the angular error in the translation direction over the 100 trials (number of trials vs. angular error): (a) 8-point algorithm; (b) Levenberg-Marquardt.
2, the "large translation" domain considered here actually overlaps the "small translation" domain previously discussed, where the maximum inverse depth was .5. The maximum rotation error obtained by this method was 12°, and the average was 6.4°. Thus these experiments can be considered successful examples of either the large translation or the small translation regime. We have also obtained similar results for translations in the directions (1 0 1), (1 0 2), and (0 0 1). In these cases the bas-relief ambiguity is less significant, so that the single constant component accounts for less of the error. However, the motion is recovered more accurately. Lastly, we ran trials similar to those above but using structure obtained from the real PUMA image sequence. The results for the "8-point" algorithm are shown in Table 3, where each entry summarizes 500 trials. As before, 1 pixel uniform noise was added to each image point, and the translation was selected randomly under the constraints indicated. The results of Levenberg-Marquardt starting from the 8-point estimates were: for T_z = 0 and |T| = 12, the translation direction was determined over 250 trials with an average angular error of 1.6°, standard deviation of 1.2°, and maximum angular error of 8.6°. Also,
Figure 2: a, b. Histograms of the rotation error in degrees over the 100 trials (number of trials vs. rotation error): (a) 8-point algorithm; (b) Levenberg-Marquardt.
Figure 3: a, b. 3D histograms of the number of points reconstructed at given 3D depth and magnitude of depth error.
Figure 4: a, b. Number of points reconstructed at given 3D depth and (a) percent depth error, (b) magnitude of depth error.
Figure 5: a, b. Number of points reconstructed at given 3D depth and (a) magnitude of depth error, (b) percent depth error.
Figure 6: a, b. (a) Number of points reconstructed at given 3D depth and percent depth error; (b) fraction of reconstructed points with depth errors greater than 10%, vs. depth.
Figure 7: Fraction of reconstructed points with depth errors greater than 10%, vs. depth, after correction of the single difficult structure component.
Table 3: Two-frame motion recovery for large translation using the 8-point algorithm. The angular error in determining the translation direction is displayed (in degrees) for different magnitudes of the input 3D translation. The mean of the angular error, its standard deviation, the maximum angular error, and the numbers of trials with errors greater than 10° and greater than 20° are shown. The angular error alpha quoted is min(|alpha|, 180° - |alpha|), since both T and -T are correct solutions.

Baseline (ft)         4    6    8    10   12   14   16
Tz = 0:
  Mean(|alpha|)       41   14   7    6    5    5    4
  sigma(|alpha|)      25   12   6    5    4    3    3
  max(|alpha|)        90   81   32   29   20   19   14
  N > 10°             437  253  130  84   45   37   27
  N > 20°             370  114  17   5    1    0    0
Forward T:
  Mean(|alpha|)       22   7    4    3    2
  sigma(|alpha|)      18   6    4    4    2
  max(|alpha|)        89   42   23   20   19
  N > 10°             369  106  39   21   5
  N > 20°             191  19   3    1    0
Backward T:
  Mean(|alpha|)       32   18   12   9    8    7    7
  sigma(|alpha|)      16   10   7    5    5    4    4
  max(|alpha|)        88   86   42   35   25   22   22
  N > 10°             476  411  287  210  171  123  112
  N > 20°             393  174  66   22   14   3    6

the average depth error (in feet), its standard deviation, and the maximum depth error were recorded for each trial. The averages (and standard deviations sigma_T over trials) of these results over the 250 trials were .3 feet (sigma_T = .2), .2 feet (sigma_T = .1), and .7 feet (sigma_T = .4), and their maximum values were 1.3 feet, .6 feet, and 2.5 feet. Thus the largest depth error for any feature point over all 250 trials was 2.5 feet. Clearly, these results support our claim that the depth can be recovered accurately from just two image frames when the translational displacement is large. These experiments were repeated for forward motion. As expected, for forward motion the translation is better determined than for sideways motion. The depths are determined with only slightly less accuracy, presumably because the depths of points near the FOE are recovered relatively badly.
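The "8-point" estimates in these experiments follow the classic linear algorithm of Longuet-Higgins [5]. A minimal sketch (a generic reconstruction, not the implementation used here): solve linearly for the essential matrix $E = [T]_\times R$ from the epipolar constraints, then read the translation direction off as the left null vector of $E$.

```python
import numpy as np

def eight_point_translation(q1, q2):
    """Translation direction (up to sign) from >= 8 calibrated correspondences.

    q1, q2: (N, 3) homogeneous image rays in frames 1 and 2, related by
    X2 = R X1 + T, so that q2^T E q1 = 0 with essential matrix E = [T]_x R.
    """
    # Each correspondence gives one linear constraint on the 9 entries of E.
    A = np.einsum('ni,nj->nij', q2, q1).reshape(len(q1), 9)
    E = np.linalg.svd(A)[2][-1].reshape(3, 3)    # null vector of A, up to scale
    # T spans the left null space of E, since T^T [T]_x R = 0.
    return np.linalg.svd(E)[0][:, 2]

# Synthetic, noise-free check (hypothetical configuration, not the paper's data).
rng = np.random.default_rng(3)
ang = np.deg2rad(3.0)
R = np.array([[np.cos(ang), 0, np.sin(ang)],
              [0, 1, 0],
              [-np.sin(ang), 0, np.cos(ang)]])   # small rotation about y
T = np.array([1.0, 0.0, 0.0])                    # sideways ("difficult") translation
X1 = rng.uniform(-1, 1, (20, 3)) + np.array([0, 0, 5])   # points in front of camera
X2 = X1 @ R.T + T
q1 = X1 / X1[:, 2:3]                             # normalized image coordinates
q2 = X2 / X2[:, 2:3]
t = eight_point_translation(q1, q2)
err = min(np.linalg.norm(t - T), np.linalg.norm(t + T))
print(err)                                       # near zero for noise-free data
```

The sign ambiguity handled by `err` is exactly the $T$ vs. $-T$ ambiguity accounted for in the angular errors of Table 3.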
References

1. R. Dutta, R. Manmatha, L.R. Williams, and E.M. Riseman, "A data set for quantitative motion analysis," CVPR, 159-164, 1989.
2. G. Golub and C.F. Van Loan, Matrix Computations, Johns Hopkins Press, Baltimore, Maryland, 1983.
3. D.J. Heeger and A.D. Jepson, "Subspace methods for recovering rigid motion I: Algorithm and implementation," IJCV 7, 95-117, 1992.
4. R. Hummel and V. Sundareswaran, "Motion parameter estimation from global flow field data," PAMI 15, 459-476, 1993.
5. H.C. Longuet-Higgins, "A computer algorithm for reconstructing a scene from two projections," Nature 293, 133-135, 1981.
6. J. Oliensis, "A Linear Solution for Multiframe Structure from Motion," in Proc. ARPA Image Understanding Workshop, Monterey, California, November 1994, pp. 1225-1231.
7. J. Oliensis, "Multiframe Structure from Motion in Perspective," to appear in Proc. IEEE Workshop on the Representations of Visual Scenes, Boston, June 1995.
8. J. Oliensis and J.I. Thomas, "Incorporating Motion Error in Multi-frame Structure from Motion," in IEEE Workshop on Visual Motion, Princeton, New Jersey, October 1991, pp. 8-13.
9. H.S. Sawhney, J. Oliensis, and A.R. Hanson, "Description and Reconstruction from Image Trajectories of Rotational Motion," ICCV, 494-498, 1990.
10. H.S. Sawhney and A.R. Hanson, "Comparative results of some motion algorithms on real image sequences," IUW, 307-313, 1990.
11. G.W. Stewart and J.G. Sun, Matrix Perturbation Theory, Academic Press, Boston, 1990.
12. J.I. Thomas, A. Hanson, and J. Oliensis, "Refining 3D reconstructions: A theoretical and experimental study of the effect of cross-correlations," CVGIP:IU 60, 359-370, 1994.
13. C. Tomasi and T. Kanade, "Shape and motion from image streams under orthography: A factorization method," IJCV 9, 137-154, 1992.
14. C. Tomasi and T. Kanade, "Factoring Image Sequences into Shape and Motion," IEEE Workshop on Visual Motion, Princeton, 21-28, 1991.