Segmentation of a Piece-Wise Planar Scene from Perspective Images

Allen Y. Yang, Shankar Rao, Andrew Wagner and Yi Ma
Coordinated Science Laboratory, University of Illinois
1308 W. Main, Urbana, Illinois 61801
Email: yangyang, srrao, awagner, [email protected]

Abstract

We study and compare two novel embedding methods for segmenting feature points of piece-wise planar structures from two (uncalibrated) perspective images. We show that a set of different homographies can be embedded in different ways into a higher-dimensional real or complex space, so that each homography corresponds to either a complex bilinear form or a real quadratic form. Each embedding reveals different algebraic properties and relations of homographies. We give a closed-form segmentation solution for each case by utilizing these properties based on subspace-segmentation methods. These theoretical results show that one can intrinsically segment a piece-wise planar scene from 2-D images without explicitly performing any 3-D reconstruction. The resulting segmentation may make subsequent 3-D reconstruction much better-conditioned. We demonstrate the proposed methods with convincing experimental results.
1. Introduction

Piece-wise planar structures are ubiquitous in both indoor and outdoor man-made environments. Given a sequence of images and a set of tracked feature points of a scene, many problems associated with reconstructing the 3-D motion and structure from the image sequence can be significantly simplified, or made better-conditioned, if we know which subsets of points belong to the same plane in 3-D. For instance, the classical "x-point" algorithms become ill-conditioned when they happen to be applied to coplanar features, so it is important to know in advance which subsets of features are coplanar. In addition, it is well known to practitioners of multiple-view geometry that knowing some planes in 3-D (and their homographies) may greatly facilitate many reconstruction problems, such as projective-to-affine stratification [4], structure from motion [3], camera self-calibration [12, 18], wide-baseline matching [8], planar parallax [5], and the study of dynamical scenes [10], to name a few.

However, without first performing the 3-D reconstruction, how can we determine which subsets of the features are coplanar in 3-D? This paper aims to provide principled answers and practical algorithms for this problem. We show rigorously that it is possible to segment a set of corresponding features from two perspective views into coplanar subsets without explicitly performing 3-D reconstruction. Note that we do not assume the scene is static: the planes can move independently of one another with respect to the two views, or they can all be static with respect to each other. In either case, there is a unified solution for segmenting the features into coplanar subsets. More precisely, in this paper we provide a solution to the following mathematical problem:

Problem (Homography Segmentation). Assume that a scene is piece-wise planar. Given a set of corresponding feature points $\{(x_1^k, x_2^k)\}_{k=1}^N$ from two perspective (calibrated or uncalibrated) views of the scene, estimate the number of planes and the homography relations, and segment the feature points according to the 3-D planes to which they belong.

Relation to Previous Work. Despite its practical importance, the above problem has received less attention than more general image/motion segmentation problems [11]. This is probably because most methods for segmenting rigid-body motions (i.e., fundamental matrices) [14, 17] do not directly apply to homographies. Given an image sequence, statistical approaches have been proposed to select point sets from the most likely planes using the RANSAC method [1, 4, 15] or other iterative schemes [7]. Sinclair and Blake [9] introduced an effective projective planar invariant to conduct local searches on the images for five coplanar points, and further predict new image positions based on their standard deviations.
Recently, [13] discovered that the problem of segmenting independently moving planar objects is solvable algebraically via a complexification of the homography matrices, which essentially reduces the problem to that of segmenting fundamental matrices.

Contributions of This Paper. In this paper, we provide an algebraic solution to homography segmentation that is an alternative to the complexification-based solution of [13]. The new solution exploits properties of the homography exclusively in the real domain. By a comparative study of the two solutions, in the complex domain and in the real domain, we examine more thoroughly the mathematical properties and algorithmic issues associated with segmenting multiple planar objects, regardless of their 3-D motion(s). Furthermore, in order to better deal with noise, we propose a more robust objective function for estimating the multi-plane homography, based on the Rayleigh quotient, which is better suited for segmentation purposes. Although we do not address the issue of outliers, the results provide insight for the development of robust techniques in the future. We conduct extensive simulations and experiments on synthetic data and real images to validate the performance of the proposed solutions and algorithms.
2. Homography Segmentation via Complex Bilinear Embedding We first review the formulation of the complex homography constraint based on [13]. In particular, we discuss the geometric interpretation of the induced complex epipolar lines and epipoles from the complex homography constraint. We then propose a new method using a Rayleigh quotient to estimate the complex multi-plane homography, which boosts the performance in the presence of noise.
2.1. Complexification of Homography

Given a feature point $X \in \mathbb{R}^3$ on a plane in space, its image points in the two views satisfy the homography relation in homogeneous coordinates:¹

$$x_2 \sim H x_1 \;\Leftrightarrow\; \lambda \begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix}, \quad (1)$$

where $H \in \mathbb{R}^{3\times3}$ is the homography matrix. Since $H$ is full rank, traditional motion-segmentation methods [14, 17] for the fundamental matrix $F$, which is of rank 2, no longer apply to the segmentation of homographies. However, we may complexify the homography matrix $H$ as follows:

$$\lambda \begin{bmatrix} x_2 + i y_2 \\ 1 \end{bmatrix} = \begin{bmatrix} h_{11} + i h_{21} & h_{12} + i h_{22} & h_{13} + i h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix},$$

where $i = \sqrt{-1}$, and we denote the complexified matrix by $\tilde{H} \in \mathbb{C}^{2\times3}$. Define $w_2 = [1, \, -(x_2 - i y_2)]^T \in \mathbb{C}^2$, which is Hermitian-orthogonal to $[x_2 + i y_2, \, 1]^T$, and consider $x_1 \in \mathbb{C}^3$. Then we can rewrite (1) as the following complex bilinear constraint:

$$w_2^* \tilde{H} x_1 = 0 \in \mathbb{C}, \quad (2)$$

where $*$ denotes the Hermitian transpose. This constraint is called the complex homography constraint.

¹ In this paper, the operator "$\sim$" means "equal up to a scalar."

2.2. Complex Epipolar Line and Epipole

Notice that, just like the fundamental matrix $F$, the complex homography matrix $\tilde{H}$ is of rank 2. Comparing equation (2) with the classical epipolar constraint $x_2^T F x_1 = 0$, we can define a similar concept of the (left) complex epipolar line:

$$\tilde{l} \sim \tilde{H}^* w_2 \in \mathbb{C}^3, \quad (3)$$

which still satisfies the geometric condition $\tilde{l}^* x_1 = 0$.

Now consider a plane $P$ in space. Assume we have $N$ feature points on $P$, and they all obey the same homography relation $H$. After the complexification of $H$ and $\{x_2^k\}_{k=1}^N$, they yield $N$ (possibly) different complex epipolar lines $\{\tilde{l}_k\}$. Since the complex homography $\tilde{H}$ has rank 2, there exists a vector $\tilde{e} \in \mathbb{C}^3$ such that

$$\tilde{H} \tilde{e} = 0. \quad (4)$$

If we further normalize $\tilde{e} = [\tilde{e}_1, \tilde{e}_2, 1]^T$ by a complex scalar, then $\tilde{e}$ becomes an image point on a complex image plane at distance 1 from the camera center. According to this definition, for any complex epipolar line $\tilde{l}$ in the first view, $\tilde{l}^* \tilde{e} = 0$. That is, $\tilde{e}$ is the intersection of all complex epipolar lines; $\tilde{e}$ is called the complex epipole associated with the homography $H$. In summary, a homography relation associates one complex epipolar line to each pair of feature points, and all complex epipolar lines intersect at the complex epipole of the homography.

2.3. Estimation of Multi-Plane Homography

Now assume we know there are $n$ different planar structures in a scene. We propose a new approach to estimate the complex multi-plane homography and the complex epipoles from the image data. For noisy data, a significant improvement in accuracy is achieved by estimating an optimal segmentation multi-plane homography based on the Rayleigh quotient.

Each image pair $(x_1, w_2)$ must satisfy

$$\prod_{k=1}^{n} \left(w_2^* \tilde{H}_k x_1\right) = 0, \quad (5)$$

since the image pair has to satisfy one of the $n$ different homography constraints. Equation (5) can be written, using an $n$th-degree Veronese embedding of the complex data $(x_1, w_2)$, in a bilinear form [13]:

$$\nu_n(w_2)^* \tilde{H} \nu_n(x_1) = 0 \in \mathbb{C}, \quad (6)$$

where the Veronese map $\nu_n: \mathbb{C}^K \to \mathbb{C}^{M_n^K}$ of degree $n$ is defined as

$$\nu_n: [z_1, \ldots, z_K]^T \mapsto [z_1^n, \, z_1^{n-1} z_2, \, \ldots, \, z_K^n]^T, \quad (7)$$

and the dimension of the codomain is $M_n^K = \binom{n+K-1}{n}$. The matrix $\tilde{H}$ in (6) is called the multi-plane homography matrix.
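To make the embedding concrete, here is a minimal numerical sketch of the Veronese map (7) and the codomain dimension $M_n^K = \binom{n+K-1}{n}$; the function name `veronese` is ours, not from the paper:

```python
from itertools import combinations_with_replacement
from math import comb

def veronese(z, n):
    """Degree-n Veronese map of eq. (7): all monomials of degree n in the
    entries of z, in lexicographic order [z1^n, z1^(n-1) z2, ..., zK^n]."""
    out = []
    for idx in combinations_with_replacement(range(len(z)), n):
        m = 1
        for i in idx:
            m *= z[i]
        out.append(m)
    return out

# Codomain dimension: M_n^K = C(n + K - 1, n).
v = veronese([2.0, 3.0, 5.0], 2)          # K = 3, n = 2
assert v == [4.0, 6.0, 10.0, 9.0, 15.0, 25.0]
assert len(v) == comb(2 + 3 - 1, 2) == 6
```

The same map applies verbatim to complex inputs, as used in (6).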
We further rewrite the bilinear equation (6) as the following linear form:

$$\left(\nu_n(w_2) \otimes \nu_n(x_1)\right)^* \tilde{H}^S = 0 \in \mathbb{C}, \quad (8)$$

where $\nu_n(w_2) \otimes \nu_n(x_1) \in \mathbb{C}^{M_n^2 \cdot M_n^3}$ is the Kronecker product of the two column vectors $\nu_n(w_2)$ and $\nu_n(x_1)$, and $\tilde{H}^S$ is the stacked version of the matrix $\tilde{H}$.

When we have $N$ pairs of image points, and assuming the model number $n$ is known, the matrix

$$L_n = \left[\nu_n(w_2^1) \otimes \nu_n(x_1^1), \; \ldots, \; \nu_n(w_2^N) \otimes \nu_n(x_1^N)\right] \in \mathbb{C}^{M_n^2 \cdot M_n^3 \times N} \quad (9)$$

has exactly a one-dimensional left null space spanned by the vector $\tilde{H}^S$. In other words, $\tilde{H}^S$ defines an $(M_n^2 \cdot M_n^3 - 1)$-dimensional hyperplane, which is the subspace containing all the data vectors of the form $\nu_n(w_2) \otimes \nu_n(x_1)$, such that $L_n^* \tilde{H}^S = 0$, as illustrated in Figure 1.

[Figure 1. $\tilde{H}^S$ defines a hyperplane in the ambient space. The normal evaluated at each point should point in the same direction.]

In the presence of noise, a linear least-squares (LLS) solution for $\tilde{H}^S$ is given by the singular vector of $L_n^*$ associated with the smallest singular value. However, the LLS solution might not be the best one for segmentation, so we present a new approach based on the Rayleigh quotient [2], which leads to a better multi-plane homography for segmentation.

The matrix $L_n$ is a function of the five-dimensional vector $[w_2, x_1]^T \in \mathbb{C}^5$. We compute the gradient with respect to $[w_2, x_1]^T$ for each column $k$, $k = 1, \ldots, N$, representing the multi-plane homography constraint for one pair of points:

$$\nabla L_n(k) = \left[\frac{\partial \nu_n(w_2^k)}{\partial w_2} \otimes \nu_n(x_1^k), \;\; \nu_n(w_2^k) \otimes \frac{\partial \nu_n(x_1^k)}{\partial x_1}\right].$$

Then the whole matrix $\nabla L_n$ has dimension $(M_n^2 \cdot M_n^3) \times 5 \times N$.

Example 1. Assume we have a pair of image points $x_1 = [x, y, z]^T$ and $w_2 = [u, v]^T$. Then the second-degree data vector $L_2$ is

$$\nu_2(w_2) \otimes \nu_2(x_1) = [u^2, uv, v^2]^T \otimes [x^2, xy, xz, y^2, yz, z^2]^T = [u^2 x^2, \, u^2 xy, \, \ldots, \, u^2 z^2, \, uv x^2, \, \ldots, \, v^2 z^2]^T \in \mathbb{C}^{18}.$$

Its gradient $\nabla L_2 \in \mathbb{C}^{18 \times 5}$ is given by

$$\begin{bmatrix} \frac{\partial (u^2 x^2)}{\partial u} & \frac{\partial (u^2 x^2)}{\partial v} & \cdots & \frac{\partial (u^2 x^2)}{\partial z} \\ \vdots & & \ddots & \vdots \\ \frac{\partial (v^2 z^2)}{\partial u} & \frac{\partial (v^2 z^2)}{\partial v} & \cdots & \frac{\partial (v^2 z^2)}{\partial z} \end{bmatrix} = \begin{bmatrix} 2ux^2 & 0 & 2u^2 x & 0 & 0 \\ \vdots & & \vdots & & \vdots \\ 0 & 2vz^2 & 0 & 0 & 2v^2 z \end{bmatrix}.$$

Define

$$A \doteq \sum_{k=1}^{N} L_n(k) L_n(k)^*, \qquad B \doteq \sum_{k=1}^{N} \nabla L_n(k) \nabla L_n(k)^*.$$

Then the multi-plane homography $\tilde{H}^S$ that best segments the $n$ planes minimizes the following Rayleigh quotient:

$$\tilde{H}^S = \arg\min_c \frac{c^* A c}{c^* B c}. \quad (10)$$

A closed-form solution to this problem is given by the minimal generalized eigenvector of the matrix pair $(A, B)$. Geometrically, equation (10) seeks an optimal solution for $\tilde{H}^S$ in the presence of noise, such that the solution simultaneously minimizes the model fitting error and maximizes the distances between the homography models. As we will see from experiments, the solution given by the Rayleigh quotient outperforms the LLS solution.

2.4. Segmenting Multi-Plane Homography
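The closed-form step behind the Rayleigh quotient (10) is a generalized eigenvalue problem: for a Hermitian $A$ and positive-definite $B$, the quotient $c^*Ac/c^*Bc$ is minimized by the generalized eigenvector of $(A, B)$ with the smallest generalized eigenvalue. A small self-contained check, with random symmetric matrices standing in for the $A$ and $B$ built from the data (the variable names are ours):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

# Stand-ins for A = sum_k L(k) L(k)^* (symmetric) and
# B = sum_k grad L(k) grad L(k)^* (symmetric positive definite).
M = rng.standard_normal((8, 8))
A = M + M.T
N = rng.standard_normal((8, 8))
B = N @ N.T + 8 * np.eye(8)

# Closed-form minimizer of c^T A c / c^T B c: the generalized
# eigenvector of (A, B) with the smallest generalized eigenvalue.
vals, vecs = eigh(A, B)
c_star = vecs[:, 0]
q_star = (c_star @ A @ c_star) / (c_star @ B @ c_star)
assert np.isclose(q_star, vals[0])

# Sanity check: no random direction attains a smaller quotient.
for _ in range(1000):
    c = rng.standard_normal(8)
    assert q_star <= (c @ A @ c) / (c @ B @ c) + 1e-9
```

The same decomposition applies to the complex Hermitian case; `scipy.linalg.eigh` handles both.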
The fact that different homographies normally result in different complex epipoles allows us to segment feature points from an unknown number of planar structures. For each image pair $(x_1^k, w_2^k)$, its complex epipolar line $\tilde{l}_k$ is

$$\tilde{l}_k \sim \left(\nu_n(w_2^k) \otimes \frac{\partial \nu_n(x_1^k)}{\partial x_1}\right)^{\!*} \tilde{H}^S \in \mathbb{C}^3, \quad (11)$$

which satisfies the property $\tilde{l}_k^* \cdot x_1^k = 0$. If we know there exist $n$ different planar structures in the scene, each structure is associated with a complex epipole $\tilde{e}_k$, $k = 1, \ldots, n$. So an arbitrary epipolar line $\tilde{l}$ must satisfy

$$\prod_{k=1}^{n} \left(\tilde{l}^* \cdot \tilde{e}_k\right) = 0. \quad (12)$$

Then the problem of segmenting feature points into different homographies becomes a standard subspace-segmentation problem with unknown subspace normals $\tilde{e}_k$, $k = 1, \ldots, n$. A solution to this problem is given in [13].

Remark 2 (Number of Planes). The above discussion assumes that we know the number $n$ of planar structures. We have several ways to determine this number using the rank condition and the singular values of the data matrix $L_n$. Theoretically, this number is determined by

$$n = \arg\min_m \left\{ m : \mathrm{rank}(L_m) = M_m^2 \cdot M_m^3 - 1 \right\}, \quad (13)$$

as $L_n$ has a one-dimensional null space for the correct $n$. In the presence of noise, the following model-selection criterion [6] gives very good estimates:

$$n = \arg\min_m \left\{ \frac{\sigma^2_{M_m^3 \cdot M_m^2}(L_m)}{\sum_{k=1}^{M_m^3 \cdot M_m^2 - 1} \sigma_k^2(L_m)} + \kappa \, M_m^3 \cdot M_m^2 \right\}, \quad (14)$$

where $\sigma_k(L_m)$ is the $k$th singular value of $L_m$ and $\kappa > 0$ is a small weighting value. We outline the segmentation process as Algorithm 1.

Algorithm 1 (Segmentation via Complex Bilinear Embedding).
1: Complexify $x_2 \to w_2$, and normalize the data vectors $x_1$ and $w_2$. Construct $L_k$, $k = 2, \ldots, n_{\max}$, where $n_{\max}$ is a preset model-number threshold. The number $n$ of planar structures is then determined from equation (14).
2: Minimize the Rayleigh quotient to get the multi-plane homography: $\tilde{H}^S = \arg\min_c \frac{c^* A c}{c^* B c}$.
3: For each pair of points $(x_1^k, w_2^k)$, compute its complex epipolar line $\tilde{l}_k \sim \left(\nu_n(w_2^k) \otimes \frac{\partial \nu_n(x_1^k)}{\partial x_1}\right)^{\!*} \tilde{H}^S$.
4: Identify the polynomial $p_n(\tilde{l}) = \prod_{k=1}^{n} (\tilde{l}^* \cdot \tilde{e}_k) = 0$.
5: for $k = n : 1$ do
6: $\quad \tilde{y}_k = \arg\min_{z \in \{\tilde{l}\}} \dfrac{|p_n(z)| / \|\nabla p_n(z)\| + \delta}{|\tilde{e}_n^* \cdot z| \cdots |\tilde{e}_{k+1}^* \cdot z| + \delta}; \qquad \tilde{e}_k = \dfrac{\nabla p_n(\tilde{y}_k)}{\|\nabla p_n(\tilde{y}_k)\|}$
   ($\delta > 0$ is a small positive number chosen to avoid cases in which the denominator or the numerator is zero.)
7: end for
8: Segment each image pair $(x_1, w_2)$ as follows: $\text{class} = \arg\min_{k=1,\ldots,n} |\tilde{l}^* \cdot \tilde{e}_k|$.

Example 3. We conduct two experiments on synthetic data to test the performance of the algorithm. In the first experiment, the camera is fixed, and two planes undergo different rotations and translations in space (Figure 2 top). To test the performance at different noise levels, we add Gaussian noise to the image points after the perspective projection. At each noise level, the experiment is repeated 100 times. The results in Figure 2 bottom show that the proposed Rayleigh quotient boosts the performance over the original LLS solution.

[Figure 2. Top: Two independently moving planes in front of a camera; Bottom: Classification rate versus noise level (left: Rayleigh quotient; right: LLS).]

The second experiment shows the segmentation of three fixed planar structures that mimic a corridor: the camera moves backward from the first view to the second (Figure 3 top). Figure 3 bottom shows the average classification rates of the Rayleigh quotient versus LLS at different noise levels.

[Figure 3. Top: Two views of three fixed planes in front of a moving camera; Bottom: Classification rate versus noise level (left: Rayleigh quotient; right: LLS).]
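The classification step at the end of Algorithm 1 is a simple nearest-hyperplane test: each complex epipolar line is assigned to the epipole it is most nearly (Hermitian-)orthogonal to. A sketch of just that step on synthetic epipoles of our own making:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two synthetic unit-norm complex epipoles.
raw = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
epipoles = [v / np.linalg.norm(v) for v in raw]

def line_through(e):
    """A random complex 'epipolar line' satisfying l^* . e = 0:
    remove from a random vector its component along e."""
    l = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    l -= np.vdot(e, l) * e          # vdot conjugates its first argument
    return l / np.linalg.norm(l)

def classify(l, epipoles):
    """Final step of Algorithm 1: class = argmin_k |l^* . e_k|."""
    return int(np.argmin([abs(np.vdot(l, e)) for e in epipoles]))

# Lines generated from epipole k are assigned to class k.
for k in (0, 1):
    for _ in range(20):
        assert classify(line_through(epipoles[k]), epipoles) == k
```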
3. Homography Segmentation via Real Quadratic Embedding
3.1. Real Quadratic Form In this section, we propose a method to embed the homography relation into a system of quadratic polynomials in the real domain.
Recall that the homography relation for an image pair $\{x_1 = [x_1, y_1, z_1]^T, \; x_2 = [x_2, y_2, z_2]^T\}$ in homogeneous coordinates is

$$\hat{x}_2 H x_1 = 0 \in \mathbb{R}^3, \quad (15)$$

where $\hat{x} \in \mathbb{R}^{3\times3}$ is the skew-symmetric matrix of $x$ such that $x \times z = \hat{x} z$ for all $z \in \mathbb{R}^3$. Since $\mathrm{rank}(\hat{x}) = 2$ for all nonzero vectors, the three equations in (15) are not independent. We pick the first two equations to describe the relation:

$$\begin{bmatrix} 0 & -z_2 & y_2 \\ z_2 & 0 & -x_2 \end{bmatrix} \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix} = 0 \in \mathbb{R}^2. \quad (16)$$

We cannot solve for $H$ by any linear embedding of $(x_1, x_2)$ using this constraint, since the forms of the two rows of the first matrix are different. (The complexification approach described above is one way to resolve this problem.) However, this relation can be described by two quadratic equations of the form

$$\begin{bmatrix} x_1^T & x_2^T \end{bmatrix} \begin{bmatrix} 0 & \bar{H}_j^T \\ \bar{H}_j & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 0, \quad j = 1, 2, \quad (17)$$

where each block is in $\mathbb{R}^{3\times3}$, and we set

$$\bar{H}_1 = \begin{bmatrix} 0 & 0 & 0 \\ h_{31} & h_{32} & h_{33} \\ -h_{21} & -h_{22} & -h_{23} \end{bmatrix}, \qquad \bar{H}_2 = \begin{bmatrix} -h_{31} & -h_{32} & -h_{33} \\ 0 & 0 & 0 \\ h_{11} & h_{12} & h_{13} \end{bmatrix}.$$

It is easy to verify that the two quadratic equations are equivalent to the original homography constraint (16): expanding $x_2^T \bar{H}_1 x_1$ recovers the first row of (16), and $x_2^T \bar{H}_2 x_1$ recovers the second. If we denote the matrix in equation (17) by $Q_j$, and $y = [x_1^T, x_2^T]^T \in \mathbb{R}^6$, the homography constraint becomes

$$y^T Q_j y = 0, \quad j = 1, 2. \quad (18)$$

Notice that (18) is not a general quadratic form, since the diagonal blocks of $Q_j$ must be zero according to (17). Let $\{y_i\}_{i=1}^6$ be the six entries of $y$. Then $y^T Q_j y$ is a special quadratic form which only involves monomials of the type $y_i y_j$ with $i \neq j$.
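The equivalence between (16) and the quadratic forms (18) is easy to verify numerically. The sketch below (helper names are ours) builds $Q_1, Q_2$ from a random homography, generates a correspondence $x_2 \sim H x_1$, and checks that $y = [x_1^T, x_2^T]^T$ lies on both quadratic surfaces:

```python
import numpy as np

rng = np.random.default_rng(0)

def quadratic_forms(H):
    """Build Q1, Q2 of eqs. (17)-(18) from a 3x3 homography H.
    Hb1, Hb2 encode the first two rows of the constraint x2^ H x1 = 0."""
    Hb1 = np.array([[0.0, 0.0, 0.0],
                    [H[2, 0], H[2, 1], H[2, 2]],
                    [-H[1, 0], -H[1, 1], -H[1, 2]]])
    Hb2 = np.array([[-H[2, 0], -H[2, 1], -H[2, 2]],
                    [0.0, 0.0, 0.0],
                    [H[0, 0], H[0, 1], H[0, 2]]])
    Z = np.zeros((3, 3))
    build = lambda Hb: np.block([[Z, Hb.T], [Hb, Z]])
    return build(Hb1), build(Hb2)

H = rng.standard_normal((3, 3))
x1 = rng.standard_normal(3)
x2 = H @ x1                       # x2 ~ H x1 (any scale works)
y = np.concatenate([x1, x2])

Q1, Q2 = quadratic_forms(H)
# Both quadratic constraints vanish on a true correspondence.
assert abs(y @ Q1 @ y) < 1e-9 and abs(y @ Q2 @ y) < 1e-9
```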
3.2. Real Quadratic Embedding

Given $n$ homographies in the scene, we have $n$ pairs $(Q_{k1}, Q_{k2})$. A point $y$ must satisfy the following constraint:

$$\bigcup_{k=1}^{n} \left\{ (y^T Q_{k1} y = 0) \cap (y^T Q_{k2} y = 0) \right\}, \quad (19)$$

where $\cap$ means logical "and" and $\cup$ means logical "or". Applying the distributive law, we rewrite the above as

$$\bigcap_{\sigma} \bigcup_{k=1}^{n} \left(y^T Q_{k\sigma_k} y = 0\right), \quad (20)$$

where $\sigma = (\sigma_1, \ldots, \sigma_n)$ ranges over all combinatorial sequences of 1's and 2's of length $n$. That is, $y$ satisfies a set of fitting polynomials of the form

$$p_\sigma(y) = \prod_{k=1}^{n} \left(y^T Q_{k\sigma_k} y\right) = 0. \quad (21)$$

Example 4. Assume we have two homographies: $(Q_{11}, Q_{12})$ and $(Q_{21}, Q_{22})$. Then any $y$ must satisfy the following four fitting polynomials: $(y^T Q_{11} y)(y^T Q_{21} y) = 0$, $(y^T Q_{11} y)(y^T Q_{22} y) = 0$, $(y^T Q_{12} y)(y^T Q_{21} y) = 0$, and $(y^T Q_{12} y)(y^T Q_{22} y) = 0$.

Since each term $y^T Q y$ only involves monomials of the type $y_i y_j$ with $i \neq j$, $p_\sigma(y)$ is a special degree-$2n$ polynomial which contains monomials of the type $y_1^{n_1} y_2^{n_2} \cdots y_6^{n_6}$ such that $n_i \leq n$ and $n_1 + n_2 + \cdots + n_6 = 2n$. For this reason, we consider a partial Veronese embedding:

Definition 5 (Real Quadratic Homography Embedding). The quadratic homography embedding of degree $2n$ is

$$\mu_{2n}: y \in \mathbb{R}^6 \mapsto [\ldots, \, y_1^{n_1} \cdots y_6^{n_6}, \, \ldots]^T,$$

where $n_1, \ldots, n_6 \leq n$ and $n_1 + n_2 + \cdots + n_6 = 2n$.
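A sketch of the partial Veronese map of Definition 5 (the function name `mu` is ours): it keeps only the degree-$2n$ monomials whose individual exponents do not exceed $n$.

```python
from itertools import combinations_with_replacement
from math import prod

def mu(y, n):
    """Quadratic homography embedding of degree 2n (Definition 5):
    monomials y1^n1 ... y6^n6 with n1 + ... + n6 = 2n and each ni <= n."""
    assert len(y) == 6
    out = []
    for idx in combinations_with_replacement(range(6), 2 * n):
        counts = [idx.count(i) for i in range(6)]
        if max(counts) <= n:                 # drop monomials with ni > n
            out.append(prod(y[i] ** c for i, c in enumerate(counts)))
    return out

# For n = 1 the surviving monomials are exactly y_i y_j with i != j:
# C(6, 2) = 15 of them (the full degree-2 Veronese has 21).
assert len(mu([1.0] * 6, 1)) == 15
```

For $n = 2$ the embedding has 90 monomials, versus $\binom{9}{4} = 126$ for the full degree-4 Veronese.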
With this notation, we rewrite (21) as $p_\sigma(y) = \mu_{2n}(y)^T Q_\sigma^S = 0$ for some coefficient vector $Q_\sigma^S$ associated with the monomials. To identify the coefficient vector $Q_\sigma^S$ from a collection of image pairs $\{y_k = (x_1^k, x_2^k)\}_{k=1}^N$, we solve for $Q_\sigma^S$ in the left null space of the embedded data matrix

$$L_{2n} = \left[\mu_{2n}(y_1), \, \cdots, \, \mu_{2n}(y_N)\right]. \quad (22)$$

That is, $L_{2n}^T Q_\sigma^S = 0$. However, as demonstrated in Example 4, the choice of fitting polynomial is not unique. After the quadratic homography embedding, the lifted data points $\mu_{2n}(y)$ sit in a subspace of codimension larger than one, unlike the complex bilinear embedding case. We illustrate the situation with two homographies in Figure 4, in comparison with Figure 1.

Similar to the complex case, we are able to segment the dataset by decomposing the polynomials $p_\sigma(y)$. From a segmentation point of view, one general polynomial from the null space of $L_{2n}$ is enough to correctly classify the data points. In the presence of noise, we again use the Rayleigh quotient to find the most discriminating polynomial among all the polynomials that fit the data set well:

$$Q^S = \arg\min_c \frac{c^T A c}{c^T B c}, \quad (23)$$

where

$$A \doteq \sum_{k=1}^{N} L_{2n}(k) L_{2n}(k)^T, \qquad B \doteq \sum_{k=1}^{N} \nabla L_{2n}(k) \nabla L_{2n}(k)^T.$$
This Rayleigh quotient seeks the optimal solution of QS which strikes a balance between the fitting error of an image pair to one of the homographies and the distance to all the other homographies to which the image pair does not belong.
[Figure 4. An illustration of a dataset from two homographies, $Q_1^S$ and $Q_2^S$, embedded via the real quadratic homography embedding.]

3.3. Segmentation

After the optimal $Q^S$ is computed, we have obtained a fitting polynomial $p(y)$. In general, the zero set of $p(y)$ is a union of quadratic hypersurfaces $\{y^T Q^k y = 0\}$. Notice that points lying on the same quadratic surface have normal vectors with different directions, so we cannot segment quadratic surfaces based on first-order derivatives alone. We also have to consider the second derivatives, i.e., the Hessian matrix $H_y$.²

Proposition 6 (Derivatives of a Fitting Polynomial). Given a fitting polynomial $p(y) = \prod_{k=1}^{n} y^T Q^k y$, suppose $y$ belongs to the quadratic surface $S^j$ parameterized by $Q^j$. Write $p(y) = p^j(y) \cdot \bar{p}^j(y)$, where $p^j(y) = y^T Q^j y$ and $\bar{p}^j(y) = \prod_{k \neq j} y^T Q^k y$. Then, at $y \in S^j$,

$$\nabla_y p(y) = 2 \bar{p}^j(y) Q^j y, \qquad H_y(y) = 2 \bar{p}^j(y) Q^j + 2 (Q^j y)(\nabla_y \bar{p}^j(y))^T + 2 (\nabla_y \bar{p}^j(y))(Q^j y)^T. \quad (24)$$

Notice that the first term of $H_y(y)$ is the scaled Hessian of the surface $S^j$, but the remaining terms depend on derivatives of factors from the other surfaces. This prevents us from directly using the Hessian to segment the data into different surfaces.

Definition 7 (Tangent Matrix). Given a point $y \in \mathbb{R}^6$, the tangent matrix $T(y) \in \mathbb{R}^{6\times5}$ is a matrix whose columns form a basis for the orthogonal complement of the space spanned by the normal $n$ at $y$.

The tangent matrix $T$ is used to eliminate the cross terms in the Hessian of (24).

Definition 8 (Contraction of Hessians). Given a point $y$, the contraction matrix is defined as

$$C(y) \doteq T(y)^T H_y(y) T(y) \in \mathbb{R}^{5\times5}. \quad (25)$$

Proposition 9. Given a quadratic surface $S^j$ representing a homography relation parameterized by $Q^j$, for a point $y$ on it the contraction matrix becomes

$$C(y) = 2 \bar{p}^j(y) \, T(y)^T Q^j T(y) \in \mathbb{R}^{5\times5}. \quad (26)$$

Now consider the intersection of the tangent spaces at two points $y_1$ and $y_2$, and define $T(y_1, y_2) = T(y_1) \cap T(y_2)$. In general, the column vectors of $T(y_1, y_2)$ span a 4-dimensional subspace.

² To avoid ambiguity between the Hessian and homography matrices, the Hessian matrix with respect to the variable $y$ is denoted $H_y$.
Definition 10 (Mutual Contraction). Define the mutual contraction for a pair of points $y_1$ and $y_2$ to be the pair of contractions $(\bar{C}(y_1, y_2), \bar{C}(y_2, y_1))$ at $y_1$ and $y_2$ with respect to $T(y_1, y_2)$:

$$\bar{C}(y_1, y_2) \doteq T(y_1, y_2)^T H_y(y_1) T(y_1, y_2), \quad (27)$$
$$\bar{C}(y_2, y_1) \doteq T(y_1, y_2)^T H_y(y_2) T(y_1, y_2). \quad (28)$$

Both $\bar{C}(y_1, y_2)$ and $\bar{C}(y_2, y_1)$ are $4 \times 4$ symmetric matrices. The following theorem shows that for a pair of points $(y_1, y_2)$ on the same quadratic surface, their mutual contraction matrices are equal up to scale.

Theorem 11. For any two points $y_j$ and $y_k$ on the same quadratic surface, denote $\bar{C}_j = \bar{C}(y_j, y_k)$ and $\bar{C}_k = \bar{C}(y_k, y_j)$. Then the following relation holds:

$$\bar{C}_j \sim \bar{C}_k \in \mathbb{R}^{4\times4}, \quad (29)$$

where $\sim$ means "equal up to a nonzero scalar."

Theorem 11 is a necessary but insufficient condition for distinguishing points on different surfaces. However, points from different surfaces having similar mutual contraction matrices is a zero-measure event.³ Hence, we form a similarity matrix $S$ with elements

$$S_{jk} = \frac{\langle \bar{C}_j, \bar{C}_k \rangle}{\|\bar{C}_j\| \|\bar{C}_k\|},$$

where $\langle \cdot, \cdot \rangle$ is the (matrix) dot product. Based on the similarity matrix $S$, any spectral clustering algorithm [16] will be able to segment the data into different homographies. The overall process of the proposed method is outlined in Algorithm 2.

Example 12. We run the same datasets as in Example 3 using the real quadratic embedding method; the results at different noise levels are shown in Figure 5. Compared to the complex embedding method, it performs much better at all noise levels.

³ Assume we have a union of two quadratic hypersurfaces $S_1$ and $S_2$, defined by $Q_1$ and $Q_2$. Suppose there exists a $\hat{Q}$ such that the stacked vector $\hat{Q}^S$ is in the left null space of $T(y_1, y_2) \otimes T(y_1, y_2)$ for some $y_1 \in S_1$ and $y_2 \in S_2$. If $Q_1 = Q_2 + \hat{Q}$, then $T(y_1, y_2)^T Q_1 T(y_1, y_2) = T(y_1, y_2)^T Q_2 T(y_1, y_2) + T(y_1, y_2)^T \hat{Q} T(y_1, y_2) = T(y_1, y_2)^T Q_2 T(y_1, y_2)$. Thus, even though the two points are from different surfaces, they both have the same mutual contractions up to a scale. However, all such $\hat{Q}$'s constitute only a zero-measure subset of all symmetric matrices.
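Theorem 11 can be checked numerically. The sketch below (our own random construction, not the paper's data) builds two quadratic surfaces of the form (17), places two points on the same surface, forms $T(y_1, y_2)$ as the null space of the two normals, and verifies that the mutual contractions agree up to scale:

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(2)

def rand_Q():
    """Random symmetric 6x6 with zero diagonal blocks, as in eq. (17)."""
    Hb = rng.standard_normal((3, 3))
    Z = np.zeros((3, 3))
    return np.block([[Z, Hb.T], [Hb, Z]])

Q1, Q2 = rand_Q(), rand_Q()

def point_on(Q):
    """A point y = [x1; x2] with y^T Q y = 2 x2 . (Hb x1) = 0."""
    Hb = Q[3:, :3]
    x1 = rng.standard_normal(3)
    v = Hb @ x1
    x2 = rng.standard_normal(3)
    x2 -= (x2 @ v) / (v @ v) * v       # force x2 orthogonal to Hb x1
    return np.concatenate([x1, x2])

def grad_hess(y):
    """Gradient and Hessian of p(y) = (y^T Q1 y)(y^T Q2 y)."""
    p1, p2 = y @ Q1 @ y, y @ Q2 @ y
    g = 2 * p2 * (Q1 @ y) + 2 * p1 * (Q2 @ y)
    H = (2 * p2 * Q1 + 2 * p1 * Q2
         + 4 * np.outer(Q1 @ y, Q2 @ y) + 4 * np.outer(Q2 @ y, Q1 @ y))
    return g, H

ya, yb = point_on(Q1), point_on(Q1)    # two points on the SAME surface
ga, Hya = grad_hess(ya)
gb, Hyb = grad_hess(yb)

T = null_space(np.vstack([ga, gb]))    # T(ya, yb): 6 x 4 basis
Ca, Cb = T.T @ Hya @ T, T.T @ Hyb @ T  # mutual contractions (27)-(28)

# Theorem 11: Ca ~ Cb, so their normalized inner product is +/- 1.
cos = abs(np.sum(Ca * Cb)) / (np.linalg.norm(Ca) * np.linalg.norm(Cb))
assert cos > 1 - 1e-6
```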
Algorithm 2 (Segmentation via Real Quadratic Embedding).
1: for all corresponding image pairs $(x_1, x_2)$ do
2: $\quad$ Let $y = [x_1^T, x_2^T]^T$.
3: $\quad$ Compute the quadratic homography embedding $\mu(y)$ of order $2n$, its derivative, and its Hessian.
4: end for
5: $Q^S = \arg\min_c \frac{c^T A c}{c^T B c}$.
6: for all $y_k$ do
7: $\quad$ Let $p(y_k) = \mu(y_k)^T Q^S$.
8: $\quad$ $n_k = \nabla_y p(y_k)$.
9: $\quad$ $H_{y_k} = H_y(y_k)$.
10: end for
11: for all pairs of points $(y_j, y_k)$ do
12: $\quad$ Let $T = [t_1, t_2, t_3, t_4]$ be a set of vectors that form a basis for $\mathrm{Null}(n_j, n_k)$.
13: $\quad$ Let $\bar{C}_j = T^T H_{y_j} T$ and $\bar{C}_k = T^T H_{y_k} T$.
14: $\quad$ Compute the similarity matrix entry $S_{jk} = \frac{\langle \bar{C}_j, \bar{C}_k \rangle}{\|\bar{C}_j\| \|\bar{C}_k\|}$.
15: end for
16: Use the similarity matrix $S$ to cluster the samples into $n$ groups.
Figure 5. Segmentation results on the datasets of Example 3. (Left) Correct classification rates for the first experiment. (Right) Correct classification rates for the second experiment.
3.4. Complex versus Real Embedding

We summarize the differences between the complex bilinear embedding and the real quadratic embedding as follows:

1. Noise-free case. Both methods give the same (correct) solution to the homography segmentation problem in the absence of noise.

2. Determining the number n of planes. The data matrix of the complex bilinear embedding has co-rank 1 for the correct number of planes. For the real quadratic embedding, however, the data matrix in general has co-rank strictly larger than one, which makes it much more difficult to determine n either in theory or in practice.

3. Imposing constraints on the multi-plane homography matrix. In the real case, it is easy to enforce additional constraints on $Q^S$ so that it is realizable by real homographies. In the complex case, in theory many entries of $\tilde{H}^S$ should be real, as the last rows of the $\{\tilde{H}_k\}$ are all real. Our current algorithm is unable to enforce this condition because the numerical computation takes place entirely in the complex domain. We believe this is the main reason that the real quadratic embedding outperforms the complex bilinear embedding.

4. Algorithm complexity. The complex bilinear embedding method is much simpler to implement than its quadratic counterpart, and its complexity and dimension are both much lower. The quadratic method may not be able to handle a large number of planes (say, n > 4) due to hardware limits.

5. Accuracy. In our simulations and experiments (next section), the real quadratic embedding consistently outperforms the complex bilinear embedding, provided the number of planes is known.

The algorithms given in this paper are at best conceptual. More robust implementations of both approaches may exist; this is left for future investigation.
4. Experiments

The first experiment is a controlled scene with a calibration cube and a checkerboard, shown in Figure 6. In total there are three planes in the images. Between the two views, the camera, the cube, and the checkerboard all undergo independent motions. All feature points and their correspondences are hand-picked in Matlab. The output of the two embedding methods is also shown in Figure 6; the real quadratic embedding method gives a better classification result.

We also run the algorithms on noisy outdoor images. Figure 7 shows pictures of a building; we hand-pick the corner points of the windows on three sides of the walls. On this dataset, the real quadratic embedding method also outperforms the complex bilinear embedding method.
5. Conclusions

In this paper, we have presented and compared two approaches to the problem of segmenting a piece-wise planar scene from two perspective images. Our theoretical analysis offers substantial mathematical insight into this problem and shows that it is in principle possible to obtain an exact segmentation of a piece-wise planar scene intrinsically from two perspective images. The conceptual algorithms derived from the theory give satisfactory simulation and experimental results, although more robust techniques remain to be investigated in future research.
References [1] A. Bartoli. Piecewise planar segmentation for automatic scene modeling. In CVPR, 2001.
[Figure 6. Two views of a cube and a checkerboard (top) and the segmentation results of the two embedding methods (bottom): (a) complex bilinear embedding, correct classification rate 93.1%; (b) real quadratic embedding, correct classification rate 99.5%. The segmentation is illustrated as points with different colors.]

[Figure 7. Two views of a building (top) and the segmentation results of the two embedding methods (bottom): (a) complex bilinear embedding, correct classification rate 78.9%; (b) real quadratic embedding, correct classification rate 99.2%. The segmentation is illustrated as points with different colors.]
[2] R. Duda, P. Hart, and D. Stork. Pattern Classification. John Wiley & Sons, second edition, 2001.
[3] O. D. Faugeras and F. Lustman. Motion and structure from motion in a piecewise planar environment. Int. J. of Pattern Recognition and Artificial Intelligence, 2(3):485-508, 1988.
[4] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge, 2000.
[5] M. Irani, P. Anandan, and M. Cohen. Direct recovery of planar-parallax from multiple frames. In Workshop on Vision Algorithms, pages 85-99, 1999.
[6] K. Kanatani. Evaluation and selection of models for motion segmentation. In ACCV, pages 7-12, 2002.
[7] M. Lourakis, A. Argyros, and S. Orphanoudakis. Detecting planes in an uncalibrated image pair. In BMVC, pages 587-596, 2002.
[8] P. Pritchett and A. Zisserman. Matching and reconstruction from widely separated views. Lecture Notes in Computer Science, 1506:78-92, 1998.
[9] D. Sinclair and A. Blake. Quantitative planar region detection. IJCV, 18(1):77-91, 1996.
[10] P. Sturm. Structure and motion for dynamic scenes - the case of points moving in planes. In ECCV, volume 2, pages 867-882, May 2002.
[11] P. H. S. Torr. Geometric motion segmentation and model selection. Phil. Trans. Royal Society of London A, 356(1740):1321-1340, 1998.
[12] B. Triggs. Autocalibration from planar scenes. In CVPR, 1998.
[13] R. Vidal and Y. Ma. A unified algebraic approach to 2-D and 3-D motion segmentation. In ECCV, 2004.
[14] R. Vidal, S. Soatto, Y. Ma, and S. Sastry. Segmentation of dynamic scenes from the multibody fundamental matrix. In ECCV Workshop on Vision and Modeling of Dynamic Scenes, Copenhagen, Denmark, 2002.
[15] E. Vincent and R. Laganiere. Detecting planar homographies in an image pair. In 2nd Inter. Symp. on Image and Signal Proc. and Anal., pages 182-187, 2001.
[16] Y. Weiss. Segmentation using eigenvectors: a unifying view. In ICCV, pages 975-982, 1999.
[17] L. Wolf and A. Shashua. Two-body segmentation from two perspective views. In CVPR, 2001.
[18] Z. Zhang. A flexible new technique for camera calibration. PAMI, 22(11):1330-1334, 2000.