THE QUADRIFOCAL VARIETY
arXiv:1501.01266v1 [math.AG] 6 Jan 2015
LUKE OEDING

Abstract. Multi-view Geometry is reviewed from an Algebraic Geometry perspective and multi-focal tensors are constructed as equivariant projections of the Grassmannian. A connection to the principal minor assignment problem is made by considering several flatlander cameras. The ideal of the quadrifocal variety is computed up to degree 6 using the representations of $\mathrm{GL}(3)^{\times 4}$ in the polynomial ring on the space of $3 \times 3 \times 3 \times 3$ tensors. Further analysis gives a lower bound for the number of minimal generators. We conjecture that the ideal of the quadrifocal variety is minimally generated in degree at most 6.
1. Introduction

Multi-view Geometry is a branch of Computer Vision [HZ03]. It considers several cameras in general position as a collection of projection maps. For instance, in the standard pinhole camera model the multi-view map is represented by a collection of $3 \times 4$ matrices $(A_1, \dots, A_n)$:
$$\mathbb{P}^3 \to \mathbb{P}^2 \times \cdots \times \mathbb{P}^2, \qquad [x] \mapsto ([A_1 x], \dots, [A_n x]).$$
One popular question is how to efficiently reconstruct the 3-dimensional image from the 2-dimensional projections.

Recently in [AST13] a certain Hilbert scheme made an appearance in connection to multi-view geometry. Theoretical techniques utilized there included Borel fixed monomial ideals, a universal Gröbner basis, degeneration to a special monomial ideal, and more, in order to describe the ideal defining the generic multiview variety.

From the multiview camera setup several tensors (such as the trifocal tensors [AT10, AO14] and quadrifocal tensors [SW00b]) can be constructed by treating the camera matrices as unknowns and considering different sets of minors of the block of camera matrices. For more about the applications of quadrifocal tensors to computer vision see [HS04, HS09, KS04, WS02, TP12].

In this paper we will be mainly concerned with the quadrifocal variety, which is parametrized as follows. Collect four camera matrices in a block matrix $A = (A_1^t \mid A_2^t \mid A_3^t \mid A_4^t)$. The 81 coordinate functions are given by the 81 special $4 \times 4$ minors of $A$ that only use one column from each block. The quadrifocal variety is the Zariski closure of the image of this map in $\mathbb{P}^{80}$.

In this paper we study the geometric and algebraic properties of quadrifocal tensors. Section 2 provides a uniform construction of the epipolar, trifocal and quadrifocal tensors via equivariant projections of a Grassmannian. Section 3 addresses the case of different dimensional cameras and contains a connection between the multi-focal variety and the variety of principal minors of square matrices.
Date: January 7, 2015.
Our main result is the following.

Theorem 1.1. Let $I_d$ denote the degree $d$ piece of the ideal of the quadrifocal variety. $I_d$ is zero for $d < 3$. $I_3$ is 600-dimensional. $I_4$ is 48,600-dimensional but contains no minimal generators. $I_5$ is 1,993,977-dimensional and contains at least 1,377 minimal generators. $I_6$ is 54,890,407-dimensional and contains at least 37,586 minimal generators.

In Section 5 we give an invariant description of all these equations. Because these are the lowest degree equations in the ideal and initial calculations indicate that there should be no more minimal generators, we conjecture that they form a complete set of minimal generators for the ideal of the quadrifocal variety.

Until now, it was only known that a quadrifocal tensor must adhere to 51 non-linear constraints [SW00b]. Indeed, the quadrifocal variety has codimension 51, so there must be at least 51 equations, but our results show that it is very far from being a complete intersection.

The lowest degree equations are 600 cubics. In Section 4 we give a simple description of these equations via contractions. From the contraction description we see that the cubic equations are a consequence of the fact that every contraction of a quadrifocal tensor is a homography tensor [SW00a]. The additional equations take the set of tensors having that property alone and cut it down to the quadrifocal locus.

2. Epipoles, fundamental matrices, trifocal and quadrifocal tensors

2.1. The multi-view setup. Multiple view geometry arises when one considers many images taken of the same scene, from (possibly) different viewpoints, and is beautifully presented in [HZ03]. The following introduction is an invariant view inspired by the ideas in [HZ03, Ch. 17].

Let $A_j$ denote $3 \times 4$ camera matrices (non-degenerate) for $1 \le j \le n$, with row spaces equal to $V_j$, a three-dimensional vector space. Let $W$ denote a 4-dimensional vector space, whose projectivization represents the 3-dimensional "projective world".
For fixed camera matrices $A_j$, the multi-view map (which also appeared in [AST13]) is
$$\mathbb{P}W \xrightarrow{\ (A_1, \dots, A_n)\ } \mathbb{P}V_1 \times \cdots \times \mathbb{P}V_n. \tag{1}$$
Now we wish to treat the camera matrices as variable or as having indeterminate entries. The map in (1) is the same if we replace the matrices $A_j$ with scalar multiples of themselves. So, we should consider our space of parameters for cameras to be the $n$-camera space
$$\mathbb{P}(W^* \otimes V_1) \times \cdots \times \mathbb{P}(W^* \otimes V_n).$$
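As a small illustration of the map (1) and of the scaling remark, the following sketch applies two cameras to a world point and checks that rescaling each $A_j$ leaves the image points in $\mathbb{P}V_j$ unchanged. The integer matrices, the world point and all helper names are hypothetical choices made only for this example.

```python
def apply_camera(A, x):
    """Multiply a 3x4 camera matrix A (a list of rows) by a vector x in W."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def cross(u, v):
    """Cross product of two vectors with three coordinates."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def same_projective_point(u, v):
    """Nonzero u, v represent the same point of P^2 iff they are parallel."""
    return any(u) and any(v) and cross(u, v) == [0, 0, 0]

def multiview(cameras, x):
    """The multi-view map: one image vector per camera."""
    return [apply_camera(A, x) for A in cameras]

# Hypothetical cameras and world point (exact integer arithmetic).
A1 = [[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3]]
A2 = [[1, 0, 0, -1], [0, 1, 0, 0], [0, 0, 1, 4]]
x = [1, 2, 3, 1]

images = multiview([A1, A2], x)

# Rescale each camera by a different nonzero scalar.
scaled_cameras = [[[5 * a for a in row] for row in A1],
                  [[-2 * a for a in row] for row in A2]]
scaled_images = multiview(scaled_cameras, x)

unchanged = all(same_projective_point(u, v)
                for u, v in zip(images, scaled_images))
print(images, unchanged)
```

Exact integers are used so that the parallelism test needs no numerical tolerance.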
We note here that if different camera models are taken, this paradigm may be easily altered to accommodate such changes by altering the spaces in which the cameras are modeled. 2.2. The Grassmannian. Faugeras and Mourrain studied multi-view geometry from the point of view of the Grassmann algebra, see [FM95, HS04]. We also adapt that approach as it provides a uniform treatment and a convenient way to organize many of our computations.
2.2.1. Exterior products. Recall that if $U$ is a vector space with basis $u_1, \dots, u_n$, we construct the exterior powers of $U$, denoted $\bigwedge^k U$, by considering the alternating (or wedge) product $\wedge$ and forming the vector space of $k$-vectors (length $k$ wedge products) with basis consisting of pure $k$-vectors $\{u_{i_1} \wedge \cdots \wedge u_{i_k} \mid 1 \le i_1 < i_2 < \cdots < i_k \le n\}$. Thus $\bigwedge^k U$ has dimension $\binom{n}{k}$ if $0 \le k \le n$ and 0 otherwise. It is straightforward to see that $v_1 \wedge \cdots \wedge v_k \ne 0$ if and only if the vectors $\{v_1, \dots, v_k\}$ are linearly independent.

Consider two non-zero $k$-vectors $v_1 \wedge \cdots \wedge v_k$ and $w_1 \wedge \cdots \wedge w_k$ and the underlying vector spaces $E := \operatorname{span}\{v_1, \dots, v_k\}$ and $F := \operatorname{span}\{w_1, \dots, w_k\}$. It is straightforward to check that
$$v_1 \wedge \cdots \wedge v_k = \lambda (w_1 \wedge \cdots \wedge w_k) \text{ for some } \lambda \ne 0 \quad \text{if and only if} \quad \textstyle\bigwedge^k E = \bigwedge^k F,$$
and equality holds when $\lambda$ is the determinant of the change of basis between $E$ and $F$.

We denote by $\mathbb{P}\bigwedge^k U$ the projective space consisting of lines through the origin in $\bigwedge^k U$, which we may consider as the set of classes
$$\Big\{ [\omega] = \lambda \omega \;\Big|\; \lambda \in \mathbb{C} \setminus \{0\},\ \omega \in \textstyle\bigwedge^k U \setminus \{0\} \Big\}.$$
This motivates the definition of the Grassmannian (in its minimal embedding). Let $\mathrm{Gr}(k, U)$ denote the set of $k$-planes in $U$. The rational map
$$\mathrm{Gr}(k, U) \to \mathbb{P}\textstyle\bigwedge^k U, \qquad M \mapsto \textstyle\bigwedge^k M$$
is an embedding (in fact, the embedding is a minimal rational embedding). The usual Plücker embedding is a slight variant of this construction, which we will review next in the context of multiple view geometry.
2.2.2. From multiple views to the Grassmannian. It is natural to consider the following $4 \times 3n$ blocked matrix, which presents a convenient way to keep track of external constraints on the multi-view setup:
$$M = \big( A_1^t \mid A_2^t \mid \cdots \mid A_n^t \big) \in (W^* \otimes V_1) \oplus \cdots \oplus (W^* \otimes V_n) = W^* \otimes \Big( \bigoplus_j V_j \Big).$$
The non-degeneracy condition is that each matrix $A_j$ in an $n$-tuple $([A_1^t], [A_2^t], \dots, [A_n^t])$ must have full rank, which occurs in an open set; so the row space of $M$ parametrizes the (Grassmannian variety of) 4-dimensional subspaces of a $3n$-dimensional space. The maximal minors of $M$ give coordinates (the Plücker coordinates) on the Grassmannian $\mathrm{Gr}(4, 3n)$. These minors are also known as multilinear coordinates. In invariant language this parametrization is the following map:
$$\varphi \colon W^* \otimes \Big( \bigoplus_j V_j \Big) \longrightarrow \textstyle\bigwedge^4 W^* \otimes \bigwedge^4 \Big( \bigoplus_j V_j \Big) \cong \bigwedge^4 \Big( \bigoplus_j V_j \Big), \qquad M \longmapsto \textstyle\bigwedge^4 M. \tag{2}$$
The image of $\varphi$ (the Zariski closure of the image of an open set) is isomorphic to the cone over the Grassmannian of 4-dimensional subspaces of $\bigoplus_j V_j$, which is the row space of $M$. In other words
$$\operatorname{Im}(\varphi) = \widehat{\mathrm{Gr}}\Big(4, \bigoplus_j V_j\Big) \subset \textstyle\bigwedge^4 \Big( \bigoplus_j V_j \Big).$$
Notice that because $W$ is 4-dimensional, $\bigwedge^4 W$ is one-dimensional, and thus passing to the maximal minors of the concatenated matrix $M$ removes the dependency on the world points represented by $\mathbb{P}W$.

It is well known that the dimension of the Grassmannian $\mathrm{Gr}(r, \mathbb{C}^N)$ is $r(N - r)$. In our example $\dim(\mathrm{Gr}(4, 3n)) = 4(3n - 4)$.
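The passage from $M$ to its maximal minors can be sketched in a few lines. The example below, for a hypothetical $4 \times 6$ integer matrix (two cameras), lists the $\binom{6}{4} = 15$ Plücker coordinates and checks that a change of basis $g$ in $W$ rescales all of them by $\det g$, which is why the world coordinates drop out projectively. The matrices `M` and `g` and the function names are assumptions made for illustration only.

```python
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for c, entry in enumerate(m[0]):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * entry * det(minor)
    return total

def pluecker(M):
    """All maximal minors of a k x N matrix, keyed by sorted column subsets."""
    k, N = len(M), len(M[0])
    return {cols: det([[row[c] for c in cols] for row in M])
            for cols in combinations(range(N), k)}

def matmul(g, M):
    """Product g * M of a square matrix g with a matrix M (lists of rows)."""
    return [[sum(g[i][t] * M[t][j] for t in range(len(M)))
             for j in range(len(M[0]))]
            for i in range(len(g))]

# Hypothetical 4 x 6 matrix M = (A_1^t | A_2^t) with integer entries.
M = [[1, 0, 0, 1, 0, 0],
     [0, 1, 0, 0, 1, 0],
     [0, 0, 1, 0, 0, 1],
     [2, 3, 5, 7, 11, 13]]

# Hypothetical invertible change of basis in the 4-dimensional space W.
g = [[1, 2, 0, 0],
     [0, 1, 3, 0],
     [0, 0, 1, 4],
     [1, 0, 0, 1]]

p = pluecker(M)
q = pluecker(matmul(g, M))
rescaled = all(q[cols] == det(g) * p[cols] for cols in p)
print(len(p), det(g), rescaled)
```

The rescaling is an instance of the Cauchy-Binet formula; the brute-force determinant is chosen for transparency rather than speed.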
If we restrict to camera space, we have to consider the image of the map up to the $n$-dimensional torus action which records the projective ambiguity in each of the $n$ cameras. We can restrict the target of the map to the appropriate GIT quotient:
$$\mathbb{P}(W^* \otimes V_1) \times \cdots \times \mathbb{P}(W^* \otimes V_n) \to \widehat{\mathrm{Gr}}\Big(4, \bigoplus_j V_j\Big) /\!/ (\mathbb{C}^*)^n \subset \textstyle\bigwedge^4 \Big( \bigoplus_j V_j \Big) /\!/ (\mathbb{C}^*)^n.$$
From this we obtain the dimension of the GIT quotient (see [HZ03, § 17.5] and [AST13, § 6]):
$$\dim \widehat{\mathrm{Gr}}\Big(4, \bigoplus_j V_j\Big) /\!/ (\mathbb{C}^*)^n = 4(3n - 4) + 1 - n = 11n - 15.$$
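As a quick arithmetic check of this count, the sketch below evaluates the formula for two, three and four cameras. For $n = 2$ it gives 7, the dimension quoted for the variety of fundamental matrices in Section 2.6, and for $n = 4$ it gives $29 = 80 - 51$, in agreement with the codimension 51 of the quadrifocal variety in $\mathbb{P}^{80}$ stated in the introduction. The function name is a hypothetical one chosen for this example.

```python
def git_quotient_dim(n):
    """Evaluate the displayed count 4(3n - 4) + 1 - n for n cameras."""
    return 4 * (3 * n - 4) + 1 - n

dims = {n: git_quotient_dim(n) for n in (2, 3, 4)}

# Codimension of the four-camera case inside P^80.
codim_quadrifocal = 80 - dims[4]
print(dims, codim_quadrifocal)
```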
Remark 2.1. Here is a classical formula for the degree of the Grassmann manifold (see [Muk93], [Ful98, Ex. 14.7.11] or [Dol12, § 10.1.2]):
$$\deg \mathrm{Gr}(r, n) = (r(n - r))! \prod_{1 \le i \le r < j \le n} (j - i)^{-1}.$$
For example
$$\deg(\mathrm{Gr}(4, 6)) = 14, \qquad \deg(\mathrm{Gr}(4, 9)) = 1662804, \qquad \deg(\mathrm{Gr}(4, 12)) = 1489877926680.$$
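The degree formula is easy to evaluate directly. The following minimal sketch takes the product over pairs $1 \le i \le r < j \le n$ (so that no factor $j - i$ vanishes) and can be used to recompute the examples for $\mathrm{Gr}(4, 6)$, $\mathrm{Gr}(4, 9)$ and $\mathrm{Gr}(4, 12)$. The function name is hypothetical.

```python
from math import factorial

def grassmannian_degree(r, n):
    """deg Gr(r, n) = (r(n-r))! / prod_{1 <= i <= r < j <= n} (j - i)."""
    denominator = 1
    for i in range(1, r + 1):
        for j in range(r + 1, n + 1):
            denominator *= j - i
    # The quotient is an integer, so exact integer division is safe here.
    return factorial(r * (n - r)) // denominator

degrees = {n: grassmannian_degree(4, n) for n in (6, 9, 12)}
print(degrees)
```

As a sanity check on the index range, the same function returns 2 for $\mathrm{Gr}(2, 4)$, the degree of the Plücker quadric in $\mathbb{P}^5$.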
One hope is that a better understanding of the GIT quotient of the Grassmannian might allow us to find the degree of the multi-focal tensor varieties.

2.3. Symmetry. For $j \in \{1, 2, 3, 4\}$ let $A_j$ be a $3 \times 4$ (non-degenerate) camera matrix (an element of $W^* \otimes V_j$), with blocking $A_j = (B_j \mid x_j)$. On each matrix $A_j$ we have an action of $\mathrm{GL}(V_j) \cong \mathrm{GL}(3)$ acting by change of coordinates in the camera plane. The action is
$$\big( \mathrm{GL}(V_1) \times \mathrm{GL}(V_2) \times \mathrm{GL}(V_3) \times \mathrm{GL}(V_4) \big) \times \Big( \bigoplus_{j=1}^{4} W^* \otimes V_j \Big) \to \bigoplus_{j=1}^{4} W^* \otimes V_j,$$
$$\big( (g_1, g_2, g_3, g_4), (A_1, A_2, A_3, A_4) \big) \mapsto (g_1 A_1, g_2 A_2, g_3 A_3, g_4 A_4),$$
where we take the action of each $g_j$ to be a change of basis in the row space of $A_j$. Because the matrices $A_j$ are assumed to be full rank, we can, without loss of generality, act by an element of $\mathrm{GL}(3)^{\times 4}$ and assume that $B_j = \mathrm{Id}_3$ and move the 4-tuple $(A_1^t \mid A_2^t \mid A_3^t \mid A_4^t)$ to
$$A \cong \left( \begin{array}{ccc|ccc|ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \\ x_{11} & x_{12} & x_{13} & x_{21} & x_{22} & x_{23} & x_{31} & x_{32} & x_{33} & x_{41} & x_{42} & x_{43} \end{array} \right). \tag{3}$$

Remark 2.2. There is an action of $\mathrm{GL}(W)$ acting simultaneously on all of the column spaces of the $A_j$ (which are all equal to $W$). While this action does not turn out to be useful (it is trivialized in the tensorial map), there is an action of $S_4$ permuting the matrices $A_j$, which in turn permutes the indices in the image of the tensorial map, and preserves the set of quadrifocal tensors.

2.4. Multilinearity spaces. The $4 \times 4$ minors of $M$ come in several classes, which have invariant descriptions. By construction, $\bigwedge^4 (\bigoplus_j V_j)$ is a vector space with a natural $\mathrm{GL}(3n)$-action, but we can further view this as a $\mathrm{GL}(\bigwedge^4 (\bigoplus_j V_j))$-action (each $\mathrm{GL}(V_j)$ acting by invertible linear change of coordinates in $V_j$), and there is a natural inclusion of $G := \prod_j \mathrm{GL}(V_j)$, which may be thought of as block diagonal (with the proper choice of basis)
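inside $\mathrm{GL}(3n)$. Moreover, we may view $G$ as a product of a torus $T^n := \prod_j T_j \cong (\mathbb{C}^*)^n$ ($T_j \cong \mathbb{C}^*$ acting by scaling block $j$ of $M$) and $\prod_j \mathrm{SL}(V_j)$. In summary
$$G := \prod_j T_j \times \prod_j \mathrm{SL}(V_j) \subset \mathrm{GL}\Big( \textstyle\bigwedge^4 \bigoplus_j V_j \Big).$$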
Thus, we may consider $\mathrm{Gr}(4, \bigoplus_j V_j)$ as a $G$-variety. On the other hand, on the GIT quotient, the torus action is trivialized, so we will consider $G' := \prod_j \mathrm{SL}(V_j)$ acting on $\widehat{\mathrm{Gr}}(4, \bigoplus_j V_j) \subset \bigwedge^4 (\bigoplus_j V_j)$ and on the GIT quotient $\widehat{\mathrm{Gr}}(4, \bigoplus_j V_j) /\!/ (\mathbb{C}^*)^n \subset \bigwedge^4 (\bigoplus_j V_j) /\!/ (\mathbb{C}^*)^n$. The effect of trivializing the torus action is that we may identify every $V_j$ with its dual, and this induces an identification of every irreducible representation $S_\pi V_j$ with its dual $S_\pi V_j^*$.

Now $G$ acts on $\bigwedge^4 (\bigoplus_j V_j)$ and it has a decomposition into irreducible $G$-modules:
$$\textstyle\bigwedge^4 \Big( \bigoplus_j V_j \Big) = \Big( \bigoplus_{i,j} \bigwedge^3 V_i \otimes V_j \Big) \oplus \Big( \bigoplus_{i,j} \bigwedge^2 V_i \otimes \bigwedge^2 V_j \Big) \oplus \Big( \bigoplus_{i,j,k} \bigwedge^2 V_i \otimes V_j \otimes V_k \Big) \oplus \Big( \bigoplus_{i,j,k,l} V_i \otimes V_j \otimes V_k \otimes V_l \Big),$$
where the summations are over (respectively) pairs, triples and quadruples of distinct indices. The 4 non-isomorphic module classes are as follows (we assume $i, j, k, l$ are distinct, and the $i$'s come from block $i$ and the $j$'s come from block $j$):

Columns in 4 × 4 minor    space                       invariant description
$i_1\ i_2\ i_3\ j$        epipole space               $\bigwedge^3 V_i \otimes V_j \cong V_j$
$i_1\ i_2\ j_1\ j_2$      fundamental matrix space    $\bigwedge^2 V_i \otimes \bigwedge^2 V_j \cong V_i^* \otimes V_j^*$
$i_1\ i_2\ j\ k$          trifocal space              $\bigwedge^2 V_i \otimes V_j \otimes V_k \cong V_i^* \otimes V_j \otimes V_k$
$i\ j\ k\ l$              quadrifocal space           $V_i \otimes V_j \otimes V_k \otimes V_l$

Now we will consider the projection of the cone over the Grassmannian $\widehat{\mathrm{Gr}}(4, \bigoplus_j V_j)$ to each type of multi-linearity space. The images of the projections are respectively the single view, the epipolar variety, the trifocal variety, and the quadrifocal variety. The fiber of the projection over a general point is the product of the ignored camera planes and the torus acting on the utilized camera planes. So it is interesting to consider the minimal number of cameras in each case. Moreover, because the projections on the level of vector spaces are equivariant, the images are automatically invariant (with respect to the appropriate group).

Suppose now that there are at most 4 cameras. For each $i \in \{1, 2, 3, 4\}$ let the set $\{e^i_1, e^i_2, e^i_3\}$ denote a basis of $V_i$, which also provides an ordered basis on $\mathbb{C}^{12} \cong V_1 \oplus V_2 \oplus V_3 \oplus V_4$.
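The four classes of minors can be counted by brute force, which also confirms that the module dimensions add up to $\dim \bigwedge^4 \mathbb{C}^{3n} = \binom{3n}{4}$. The sketch below sorts every choice of 4 columns of $M$ according to how many columns it takes from each block. The dictionary `NAMES` and the function name are hypothetical labels introduced for this example.

```python
from itertools import combinations
from collections import Counter

# Block pattern (columns used per block, in decreasing order) -> class name.
NAMES = {(3, 1): "epipole",
         (2, 2): "fundamental matrix",
         (2, 1, 1): "trifocal",
         (1, 1, 1, 1): "quadrifocal"}

def classify_minors(n):
    """Count the 4-subsets of the 3n columns of M by block pattern."""
    counts = Counter()
    for cols in combinations(range(3 * n), 4):
        # Column c belongs to block c // 3 (each block has 3 columns).
        per_block = Counter(c // 3 for c in cols)
        pattern = tuple(sorted(per_block.values(), reverse=True))
        counts[NAMES[pattern]] += 1
    return dict(counts)

counts = classify_minors(4)
print(counts, sum(counts.values()))
```

For four cameras this gives 81 quadrifocal minors, matching the 81 coordinates of the quadrifocal variety in $\mathbb{P}^{80}$, and a total of $\binom{12}{4} = 495$ minors.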
2.5. Epipoles. For a pair of cameras, we consider the projection of $\mathrm{Gr}(4, 6) = \mathrm{Gr}(4, V_1 \oplus V_2) \subset \mathbb{P}\bigwedge^4 (V_1 \oplus V_2) = \mathbb{P}^{14}$ to epipolar space $\mathbb{P}(V_1 \otimes \bigwedge^3 V_2) = \mathbb{P}^2$. The target space is (naturally isomorphic to) the projective plane, and the map surjects onto $\mathbb{P}^2$. The image is naturally $\mathrm{GL}(V_1) \times \mathrm{GL}(V_2)$-invariant, the action of $\mathrm{GL}(V_1)$ being trivial, and the 2-dimensional torus acts by a weight of $(3, 1)$. The image of the projection is the space of epipoles in view 1 imposed by view 2. Thus the epipoles may be recovered from the multiview setup via a projection from the Grassmannian.

To get an expression of an epipole in coordinates consider just two cameras and the matrix
$$A \cong \left( \begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ x_{1,1} & x_{1,2} & x_{1,3} & x_{2,1} & x_{2,2} & x_{2,3} \end{array} \right).$$
An element $S$ of single camera space has the following form in the Plücker coordinates associated to the determinants of the matrices constructed from one column from the first block of $A$ and the three columns of the second block:
$$S = \sum_{1 \le p \le 3} S_{p,\{1,2,3\}}\, e^1_p \wedge e^2_1 \wedge e^2_2 \wedge e^2_3 \in \textstyle\bigwedge^3 V_2 \otimes V_1 \cong V_1.$$
Applying this to $A$ we get the coordinates of the epipole
$$\begin{pmatrix} S_{1,\{1,2,3\}}(A) \\ S_{2,\{1,2,3\}}(A) \\ S_{3,\{1,2,3\}}(A) \end{pmatrix} = \begin{pmatrix} (x_{1,1} - x_{2,1}) \\ (-1)(x_{1,2} - x_{2,2}) \\ (x_{1,3} - x_{2,3}) \end{pmatrix}.$$

2.6. Fundamental matrices. Again for a pair of cameras, we may consider the projection of $\mathrm{Gr}(4, 6) = \mathrm{Gr}(4, V_1 \oplus V_2) \subset \mathbb{P}\bigwedge^4 (V_1 \oplus V_2) = \mathbb{P}^{14}$ to fundamental matrix space $\mathbb{P}(V_1^* \otimes V_2^*) = \mathbb{P}^8$. The target space may be interpreted as the projectivization of a space of $3 \times 3$ matrices, and carries the natural action of $\mathrm{GL}(V_1) \times \mathrm{GL}(V_2)$. We might call the image of the projection the variety of fundamental matrices, which is also naturally $\mathrm{GL}(V_1) \times \mathrm{GL}(V_2)$-invariant. Because the vector spaces $V_1$ and $V_2$ play symmetric roles, this image is also naturally $S_2$-invariant. It is well known that the matrices in the image of the projection have a one-dimensional kernel and the image variety is just the (degree 3) determinantal hypersurface.

Note that $\widehat{\mathrm{Gr}}(4, 6)$ is 9-dimensional, the GIT quotient $\widehat{\mathrm{Gr}}(4, 6) /\!/ (\mathbb{C}^*)^2$ is 7-dimensional, and thus birational to the projective variety of $3 \times 3$ matrices of rank $\le 2$, the variety of fundamental matrices. Also note that the 2-dimensional torus acts by a weight of $(2, 2)$.

An element $F$ of fundamental matrix space has the following form in Plücker coordinates: $F = \sum F_{\{i,j\},\{k,l\}}\, e^1_i \wedge e^1_j \wedge e^2_k \wedge e^2_l \in \bigwedge^2 V_i \otimes \bigwedge^2 V_j \cong V_i^* \otimes V_j^*$, the sum being over 1≤i,j,k,l≤3, i<j, k