Improving Accuracy of Geometric Parameter Estimation Using Projected Score Method
Takayuki Okatani and Koichiro Deguchi
Tohoku University, Aramaki Aza Aoba 6-6-01, Sendai 980-8579, Japan
[email protected]

Abstract
A fundamental problem in computer vision (CV) is the estimation of geometric parameters from multiple observations obtained from images; examples of such problems range from ellipse fitting to multi-view structure from motion (SFM). The maximum likelihood (ML) method is widely used to estimate the parameters in such problems under the assumption of Gaussian noise in the observations; bundle adjustment for SFM is one example. According to the theory of statistics, the ML estimates are nearly optimal for these problems, provided that the variance of the observation noise is sufficiently small. This implies that when the noise is not small, estimates more accurate than the ML estimates can be derived. In this study, we propose the application of a method called the projected score method, developed in statistics for computing higher-accuracy estimates, to the CV problems. We describe how it can be customized to solve the CV problems and propose a numerical algorithm to implement the method. We show that the method works effectively for such problems.

1. Introduction
Estimating geometric parameters from images, ranging from ellipse fitting to multi-view structure from motion (SFM), is a fundamental problem in computer vision. Such problems can generally be formulated as follows:

Problem: Consider n observations x₁, …, xₙ distributed in a space of certain dimensionality. They are assumed to contain observation noise as follows:

xᵢ = x̄ᵢ + εᵢ,  i = 1, …, n,  (1)

where x̄ᵢ is the true value of an observation xᵢ and is given by a function g of two parameters θ and ξᵢ as

x̄ᵢ = g(θ, ξᵢ),  i = 1, …, n.  (2)

Each noise εᵢ is assumed to independently follow an isotropic Gaussian distribution: εᵢ ~ N(0, σ²I). We estimate θ (and, if necessary, ξ₁, …, ξₙ) from these n observations.

The maximum likelihood (ML) method is widely used in computer vision to solve this problem because of its optimality properties. By optimality we mean that as the "sample size" tends to infinity, ML estimates attain the highest accuracy that we can hope for; specifically, they are unbiased and their variances attain the theoretical lower bound. Roughly speaking, the ML method is nearly the most accurate among all estimators, provided that the sample size is sufficiently large. This optimality holds true for the above-mentioned CV problems if each observation is observed repeatedly m times (e.g., multiple images are captured of the same scene) and the repetition count m is regarded as the sample size. The condition that m is sufficiently large is equivalent to the condition that the variance σ² of the observation noise is sufficiently small.
The number n of observations, instead of m, can be considered as the sample size, which appears more natural. However, as was pointed out in [4, 8], in theory, increasing the number of observations as n → ∞ does not contribute to guaranteeing the optimality of the ML estimate. This is because the above problem has a particular structure such that when a new observation xᵢ is added, a new unknown ξᵢ is also added; that is, the number of unknowns increases proportionally with the number of observations. In statistics, such problems are referred to as the Neyman-Scott problem [7].
To summarize, for the ML estimate of the above problem to be optimal, it is essential that the noise variance σ² be small. This leads to the following implications:
(I1) When σ² is not small, it is possible that the ML estimate is not accurate.
(I2) When σ² is not small, one may expect that increasing the number of observations will recover the optimality of the ML estimate. However, there is no guarantee that this expectation will be realized. (On the contrary, it could reduce the estimation accuracy in theory.)
In [9], we considered (I1). Specifically, we dealt with a particular class of problems in which x̄ = g(θ, ξ), considered as a function of ξ, gives a hypersurface in the space of x. We described the mechanism by which biases emerge in the ML estimate for this problem class and also presented a method for correcting the biases. The following two open problems remained unsolved:


(P1) How can we deal with more general problems in which x̄ = g(θ, ξ) gives a general manifold other than a hypersurface?
(P2) In relation to (I2), if the noise variance σ² is not small, is there an estimation method that can achieve optimality when the number of observations tends to infinity (n → ∞)?
In the field of statistics, many studies have been carried out on the Neyman-Scott problem having the above non-optimality property. Some of their results could be used to solve the CV problems. In this study, we apply one such result, called the projected score method, which was first developed by Small and McLeish [10, 11] and then generalized by Waterman and Lindsay [13, 14, 6], to our problems. This method attempts to minimize the negative influence of using an estimate of ξᵢ for estimating θ, which is considered to be the major cause of the non-optimality, by a projection-based approach that "orthogonalizes" the estimation of the two parameters. In what follows, we first describe how this method can be applied to the CV problems. Although the method is general enough to deal with all sorts of problems, it involves tedious calculations. Therefore, we derive formulae for the CV problems that considerably simplify the calculations. We also propose a numerical algorithm to implement the method in a realistic manner and then apply it to a few simple problems to examine the effectiveness of the method.

2. Geometric parameter estimation and nuisance parameters
2.1. Example of problem
In this paper, we focus mainly on the problem stated at the beginning of this paper. To begin with, we show a few examples of the problem. One example is fitting a simple ellipse (x/a)² + (y/b)² = 1 to point data. In this case, the coordinates (x, y) of each point form an observation, and their true values can be represented as x̄ = [x̄, ȳ]^⊤ = g(a, b, ξ) = [a cos ξ, b sin ξ]^⊤. Another example is estimating a projective transformation represented by [x', y', 1]^⊤ ∝ H[x, y, 1]^⊤ from point correspondences between two images of a single plane. Then, [x, y, x', y']^⊤ is an observation, and the (independent) components of H constitute θ. Selecting ξ as ξ ≡ [x̄, ȳ]^⊤, i.e., the true coordinates of the first image point, g can be designed appropriately. In a similar manner, the problem of SFM, in which multi-view images are used, can be formulated. If more weight is placed on the structure recovery, we denote the three-dimensional coordinates of the feature points by θ and the camera parameters of the i-th image by ξᵢ. Then, g can be designed appropriately. (Note that this is the method of bundle adjustment.)
The following is noteworthy. We assume throughout this paper that, as in Eq.(2), the true value of an observation is explicitly given as x̄ᵢ = g(θ, ξᵢ). Depending on the problem, it may instead be given implicitly as f(x̄ᵢ; θ) = 0, as in the estimation of a fundamental matrix. However, most such implicit representations can be converted into explicit representations by choosing an appropriate parametrization. Therefore, this assumption does not reduce the range of problems that can be dealt with.
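As a concrete illustration of how g(θ, ξ) can be written down for these examples, the following sketch (Python with NumPy; the function and variable names are ours, not part of the paper) gives explicit parametrizations for the simple ellipse and the planar projective transformation:

```python
import numpy as np

# Illustrative sketch (not the authors' code): explicit parametrizations
# xbar = g(theta, xi) for two of the examples in Sec. 2.1.

def g_ellipse(theta, xi):
    """Simple axis-aligned ellipse (x/a)^2 + (y/b)^2 = 1.
    theta = (a, b); xi is the angular nuisance parameter of one point."""
    a, b = theta
    return np.array([a * np.cos(xi), b * np.sin(xi)])

def g_homography(theta, xi):
    """Planar projective transformation [x', y', 1]^T ∝ H [x, y, 1]^T.
    theta: the 8 independent entries of H (h33 fixed to 1);
    xi = (xbar, ybar): true coordinates of the point in the first image.
    Returns the 4-vector [xbar, ybar, xbar', ybar']."""
    H = np.append(theta, 1.0).reshape(3, 3)
    p = H @ np.array([xi[0], xi[1], 1.0])
    return np.array([xi[0], xi[1], p[0] / p[2], p[1] / p[2]])
```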

2.2. Maximum likelihood estimation and two types of asymptotics
As stated earlier, we assume that the noise for an observation xᵢ follows an isotropic Gaussian distribution N(0, σ²I). (This assumption is made for simplicity, and it is possible to extend it to a more general case.) Thus, the probability density of xᵢ is given by

p(xᵢ; θ, ξᵢ) ∝ exp( −‖xᵢ − g(θ, ξᵢ)‖² / (2σ²) ).  (3)

Because of the independence between the observations, all the observations x₁, …, xₙ have the density

p(x₁, …, xₙ; θ, ξ₁, …, ξₙ) ∝ ∏ᵢ₌₁ⁿ p(xᵢ; θ, ξᵢ).  (4)

Note that θ is associated with all the observations, while ξᵢ is associated only with the i-th observation. In statistics, it is often the case that for problems having a structure similar to the considered problem, one is interested in estimating θ but not ξᵢ. In such a case, θ is usually called an interest parameter and ξᵢ a nuisance parameter. In this paper, we follow this convention and use these names to distinguish between the parameters, although in CV problems we are often interested in ξᵢ as well. (The point here is that the number of ξᵢ's grows with the number of observations.) The ML estimates of these parameters are given as the parameter values that maximize the likelihood of all the observations, i.e., the values of θ and ξ₁, …, ξₙ maximizing the following loglikelihood:

l(θ, ξ₁, …, ξₙ) = Σᵢ₌₁ⁿ lᵢ(θ, ξᵢ) + const.,  (5)

where lᵢ is the loglikelihood for a single observation xᵢ and is given by

lᵢ(θ, ξᵢ) = −‖xᵢ − g(θ, ξᵢ)‖² / (2σ²) + const.  (6)
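Maximizing this loglikelihood is equivalent to jointly minimizing the sum of squared residuals over θ and all ξᵢ. A minimal sketch of this ML step (using SciPy's Levenberg-Marquardt least-squares solver; all names, the residual layout, and the noise-variance estimate are our own choices, not the authors' code) is:

```python
import numpy as np
from scipy.optimize import least_squares

def ml_estimate(x_obs, g, theta0, xi0):
    """Jointly minimize J = sum_i ||x_i - g(theta, xi_i)||^2 over theta and all xi_i
    (the ML estimate under Eqs. (5)-(6)). x_obs: (n, d) observations; g(theta, xi)
    returns a length-d array; theta0, xi0 are initial values. Illustrative sketch."""
    n = len(x_obs)
    theta0 = np.asarray(theta0, float)
    xi0 = np.asarray(xi0, float).reshape(n, -1)
    p, k = len(theta0), xi0.shape[1]

    def residuals(params):
        theta, xi = params[:p], params[p:].reshape(n, k)
        return np.concatenate([x_obs[i] - g(theta, xi[i]) for i in range(n)])

    # "lm" requires at least as many residuals as unknowns (n*d >= p + n*k).
    res = least_squares(residuals, np.concatenate([theta0, xi0.ravel()]), method="lm")
    theta_hat = res.x[:p]
    xi_hat = res.x[p:].reshape(n, k)
    sigma2_hat = np.sum(res.fun**2) / res.fun.size  # simple plug-in estimate from the residue
    return theta_hat, xi_hat, sigma2_hat
```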

Generally, the accuracy of the ML estimate is discussed in an asymptotic sense as the "sample size" tends to infinity. In our problem, there are two types of asymptotics. One is that the number n of observations tends to infinity, n → ∞; the other is that when m independent observations xᵢ₁, …, xᵢₘ are obtained by repeatedly observing an identical point x̄ᵢ, the number m of repeated observations tends to infinity, m → ∞. For the latter, obtaining m observations of an identical point, each of which follows N(x̄ᵢ, σ²I), is equivalent to obtaining a single observation following N(x̄ᵢ, (σ²/m)I). Therefore, m → ∞ is equivalent to σ² → 0.
For these two types of asymptotics (i.e., n → ∞ and m → ∞), the ML estimate calculated as above gives contrasting optimality results. It is optimal as m → ∞ (or equivalently σ² → 0), whereas it is not as n → ∞ [4, 9]. The non-optimality in the latter case is due to the presence of the nuisance parameter ξᵢ in each observation xᵢ; as the number n of observations increases, so does the number of nuisance parameters. In fact, it is known [13] that for problems with many parameters, the ML estimate can become biased or can lose (asymptotic) efficiency even if it remains unbiased. On the other hand, these changes cannot occur in the asymptotics of m → ∞ (σ² → 0); in this case, standard optimality results hold and the ML estimate is optimal.

2.3. Estimating nuisance parameters and resulting biases
A typical case where the ML estimate is non-optimal is when it is biased. In [9], we discussed the biases of the ML estimate for a subset of the problem class considered here, namely the set of problems in which x̄ = g(θ, ξ), considered as a function of ξ, gives a hypersurface in the space of observations. We also explained the mechanism of emergence of a bias in the ML estimate for this problem class. It is usually required to estimate ξᵢ even if we are interested only in θ, and we showed that a bias emerges in the very process where the estimate of ξᵢ is used for estimating θ. From the analysis, we showed that the magnitude of the bias is closely connected to the ratio of the curvature of the hypersurface to the noise variance σ²; when they are comparable, because σ² is large and/or the curvature is large, the bias will be non-negligibly large.
A central issue of this study is how to deal with the other subset of problems, i.e., the set of problems in which x̄ = g(θ, ξ) does not give a hypersurface (a submanifold of codimension 1) but a more general manifold. In this case, it no longer appears possible to understand the phenomenon using simple properties of differential geometry such as curvature. Nevertheless, it is still true that using the estimate of ξᵢ for estimating θ contributes to the biasing of θ. In the field of statistics, this is explained using a tool called the estimating function, as briefly described below.
The ML estimate, the maximizer of the loglikelihood l given by Eq.(5), is also a solution of the following simultaneous equations, which are obtained by equating the partial derivatives of l with respect to the parameters to zero:

u₀(θ, ξ₁, …, ξₙ) = 0,  (7a)
v₁ᵢ(θ, ξᵢ) = 0,  i = 1, …, n,  (7b)

where u₀ ≡ ∂l/∂θ and v₁ᵢ ≡ ∂l/∂ξᵢ (the meaning of the subscripts will be explained later); u₀ and v₁ᵢ are called the score functions of θ and ξᵢ (the θ-score and ξ-score), respectively. We denote the ξᵢ satisfying Eq.(7b) for a fixed θ by ξ̂ᵢ(θ). Then, the ML estimate of θ can be regarded as a solution of the equation obtained by substituting ξ̂(θ) into Eq.(7a):

u₀(θ, ξ̂₁(θ), …, ξ̂ₙ(θ)) = 0.  (8)

Defining u₀ᵢ(θ, ξᵢ) ≡ ∂lᵢ/∂θ and using Eq.(5), Eq.(8) can be rewritten as

Σᵢ₌₁ⁿ u₀ᵢ(θ, ξ̂ᵢ(θ)) = 0.  (9)

In short, the ML estimate θ̂ is a solution of this equation. (Using the resulting θ̂, the ML estimate of ξᵢ is given as ξ̂ᵢ(θ̂).) As shown below, this substitution of ξ̂ᵢ(θ) can be considered to be the cause of the bias.
We introduce here an approach of statistical inference called the method of estimating functions [2]. In this approach, introducing a function f(θ, x) of the parameter θ and the observations x, we consider the solution θ of f(θ, x) = 0 for given x to be an estimate of θ. The introduced function f is called an estimating function, and the equation f = 0 is called an estimating equation, an example of which is Eq.(9). An advantage of this approach is that optimality properties of estimators can be discussed in terms of their corresponding estimating functions. For example, the condition that an estimate is unbiased can be represented as E[f(θ, x)] = 0, where E[·] is an expectation with respect to the density of x (this applies to the rest of this paper unless otherwise stated).
In the framework of estimating functions, we can consider ML estimation to be an inference method that uses the score functions of the parameters (i.e., u₀ᵢ and v₁ᵢ) as the estimating function. For problems without the Neyman-Scott structure (i.e., problems without increasing nuisance parameters), the score function, given as u₀ᵢ(θ) in our problem, is known to be an optimal estimating function. (This is the same as the optimality in the asymptotics m → ∞.) On the other hand, for problems having the Neyman-Scott structure, the θ-score has the form u₀ᵢ(θ, ξᵢ), thereby containing the unknown ξᵢ. Since its true value is unknown, we have to estimate ξᵢ (even if we are not interested in ξᵢ). In ML estimation, this problem is solved by substituting the ML estimate ξ̂ᵢ(θ) of ξᵢ into the θ-score, as in Eq.(9), and using the resulting function as the estimating function. However, it can generally be shown that

E[ Σᵢ₌₁ⁿ u₀ᵢ(θ, ξ̂ᵢ(θ)) ] ≠ 0.  (10)

According to the theory of estimating functions, this implies that the ML estimate θ̂ is biased.
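The bias implied by Eq.(10) can be seen numerically even in a toy setting. The following sketch (Python/NumPy; all numerical values are assumed for illustration) uses the circle-fitting model that appears later in Sec. 3.3.1, where the ML radius estimate is the mean distance of the points from the center; its empirical bias is close to σ²/(2r), the value derived in [9]:

```python
import numpy as np

# Monte Carlo illustration of the bias implied by Eq. (10), using the simple
# circle-fitting model of Sec. 3.3.1; all numerical values here are assumed.
rng = np.random.default_rng(0)
r_true, sigma, n, trials = 2.0, 0.5, 1000, 200

bias = []
for _ in range(trials):
    xi = rng.uniform(0.0, 2 * np.pi, n)                       # true nuisance angles
    pts = r_true * np.column_stack([np.cos(xi), np.sin(xi)])  # points on the circle
    pts += sigma * rng.standard_normal(pts.shape)             # isotropic Gaussian noise
    bias.append(np.mean(np.linalg.norm(pts, axis=1)) - r_true)  # ML radius minus truth

print(np.mean(bias), sigma**2 / (2 * r_true))   # empirical bias vs. sigma^2 / (2r)
```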

3. Projected score method
From the discussion in the last section, it is desirable, if possible, to estimate θ independently of the estimation of ξᵢ. In this section, we introduce the projected score method [13, 11], which is based on this concept; the method attempts to reduce the influence of the estimation of ξᵢ as much as possible. The purpose of this paper is to show how this method can be applied to the CV problems and to examine its usefulness; for the theoretical aspects of the method, refer to [13].

3.1. Outline of the theoretical background
Our concern here is to find a good estimating function that makes the desired estimation possible. Methods for designing an optimal (or nearly optimal) estimating function, depending on the difficulty of each problem, are described in [12]. (We will omit the subscript i of the observations hereafter, since the observations are treated equally and we thus have to consider only a single observation. The function for all the observations is merely given by the sum of the functions for the independent observations, as in Eq.(9).)
The simplest case is when there exists a complete sufficient statistic s(x) for ξ, and it is independent of θ. For these problems, which are called Type I problems in [12], it is shown [2] that an optimal estimating function is given by the conditional score u_c defined by

u_c ≡ u₀ − E[u₀ | s],  (11)

where E[· | s] is the expectation conditioned on s. Since u_c = u_c(θ) and it is free of ξ, θ can be estimated independently of ξ. Note that the estimate of θ is given as a solution of the equation Σ u_c(θ) = 0, where the sum is taken over all observations.
A more general case is that a complete sufficient statistic for ξᵢ exists, but it depends on θ. The problems in this case are called Type II. Then, it is shown in [5] that the conditional score u_c still retains some optimality (e.g., a debiasing property). In this case, u_c also depends on ξ as u_c = u_c(θ, ξ), and thus, unlike Type I problems, independent estimation of θ and ξ is not possible. Therefore, we substitute the ML estimate ξ̂ into u_c, as in the ML method, and solve Σ u_c(θ, ξ̂(θ)) = 0 for θ. Although this procedure is the same as in ML estimation, the use of the conditional structure (i.e., the modification of the θ-score as in Eq.(11)) reduces the influence of the substitution of the estimate of ξ.
The last and most general case is when there is no sufficient statistic for ξ. The problems in this category are called Type III. For them, it was proposed in [10, 11, 13] to adjust the score function by a projection-based approach so that the influence of ξ is reduced. The resulting function is called the projected score. It is defined for positive integers t = 1, 2, … and is denoted by uₜ. An intuitive rationale behind the use of uₜ as the estimating function is that for Type II problems, uₜ asymptotically approaches u_c as t → ∞. Although it is unrealistic to actually take t → ∞, the approximation (i.e., uₜ ≈ u_c) is usually accurate even for t as small as 2. In addition, uₜ can always be calculated systematically and mechanically. (For Type I and II problems, in which u_c can be obtained from its definition, the calculation of u_c involves the determination of conditional structures, which is often not easy.) Similar to u_c in the case of Type II problems, uₜ depends on ξ as uₜ = uₜ(θ, ξ). Thus, as in the case of u_c, we substitute ξ̂ into Σ uₜ(θ, ξ) = 0; owing to the similarity of uₜ to u_c, the bad influence of this substitution seen in the original ML solution is mitigated. Because of their complexity, most of the CV problems fall into Type III. Thus, we have to use the projected score method to solve them.

3.2. Bhattacharyya basis
The projected scores are defined using a set of functions called the Bhattacharyya basis, and their calculation is the main part of the calculation of the projected scores. Let s be the dimensionality of ξ and t be a positive integer (t = 1, 2, …). For t integers i₁, i₂, …, iₜ that satisfy 1 ≤ iⱼ ≤ s, define [13, 1]

v_{i₁⋯iₜ} ≡ (1/p) ∂ᵗp(x; θ, ξ) / (∂ξ_{i₁} ⋯ ∂ξ_{iₜ}),  (12)

where p is the density of x; for t = 1, this coincides with the ξ-score ∂l/∂ξ_{i₁}. From the definition, the number of possible v_{i₁⋯iₜ}'s is determined by the dimensionality s of ξ and the order t. For example, when s = 2 (i.e., ξ = [ξ₁, ξ₂]^⊤), there are two functions in the case of t = 1, i.e., v₁ and v₂, and four functions in the case of t = 2, i.e., v₁₁, v₁₂, v₂₁, and v₂₂. The set of functions {v_{i₁⋯iₜ}} is referred to as the Bhattacharyya basis of order t. The following recurrence formula is convenient for the calculation of these functions:

v_{i₁i₂⋯iₜi_{t+1}} = ∂v_{i₁i₂⋯iₜ}/∂ξ_{i_{t+1}} + v_{i₁i₂⋯iₜ} v_{i_{t+1}}.  (13)

This enables the calculation of higher-order basis elements in terms of the derivatives of lower-order basis elements. We denote the vector containing all the t-th order bases by vₜ. For example, when s = 2, v₁ = [v₁, v₂]^⊤ and v₂ = [v₁₁, v₁₂, v₂₁, v₂₂]^⊤. We also use the following notation:

wₜ ≡ [v₁^⊤, …, vₜ^⊤]^⊤.  (14)
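The recurrence (13) is easy to verify symbolically. The following SymPy sketch does so for a scalar nuisance parameter, using the circle model g(r, ξ) = [r cos ξ, r sin ξ]^⊤ that appears later in Sec. 3.3.1 (the code and its variable names are ours; the basis is computed from the density as in Eq.(12)):

```python
import sympy as sp

# Symbolic check of the recurrence (13) for a scalar nuisance parameter, using the
# circle model g(r, xi) = [r cos xi, r sin xi] of Sec. 3.3.1 (illustrative sketch).
x, y, r, xi, sigma = sp.symbols('x y r xi sigma', real=True)
gx, gy = r * sp.cos(xi), r * sp.sin(xi)

l = -((x - gx)**2 + (y - gy)**2) / (2 * sigma**2)   # loglikelihood, Eq. (6), up to const.
p = sp.exp(l)                                       # density up to a constant factor

v1 = sp.simplify(sp.diff(p, xi) / p)                # order-1 basis = xi-score dl/dxi
v2_def = sp.simplify(sp.diff(p, xi, 2) / p)         # definition (12): (1/p) d^2 p / dxi^2
v2_rec = sp.simplify(sp.diff(v1, xi) + v1**2)       # recurrence (13)

print(sp.simplify(v2_def - v2_rec))                 # 0: both routes give the same v_2
```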

3.3. Projected scores
Projected scores are defined as the θ-score (i.e., u₀(θ, ξ) ≡ ∂l(x; θ, ξ)/∂θ) minus its projection onto the space spanned by the Bhattacharyya basis. Specifically, the t-th order projected score uₜ is defined as

uₜ = u₀ − Πₜu₀,  (15)

where Πₜ is the projection operator onto the space spanned by w̃ₜ ≡ [1, wₜ^⊤]^⊤. For any random variable y, Πₜy gives P̂^⊤w̃ₜ, where P̂ is the P minimizing E[(y − P^⊤w̃ₜ)²] and E[·] represents an expectation under the model density of x. By calculating the above projection and using the identity relations E[u₀] = 0 and E[vⱼ] = 0 for every element vⱼ of wₜ, we can rewrite uₜ as

uₜ = u₀ − E[u₀wₜ^⊤] E[wₜwₜ^⊤]⁻¹ wₜ.  (16)

This is for the case of a single observation. Denoting the projected score for the i-th observation xᵢ by uₜᵢ, the t-th order projected score for n observations x₁, …, xₙ is given by their sum as follows [13]:

uₜ(θ, ξ₁, …, ξₙ) = Σᵢ₌₁ⁿ uₜᵢ(θ, ξᵢ).  (17)

The estimate is computed using the projected score uₜ thus calculated for a particular t. Similar to the case of the original ML method, substituting the ML estimate ξ̂ᵢ = ξ̂ᵢ(θ) of ξᵢ into Eq.(17), we solve

Σᵢ₌₁ⁿ uₜᵢ(θ, ξ̂ᵢ(θ)) = 0  (18)

to obtain the estimate θ̂ₜ. Note that although the first-order projected score u₁(θ, ξ) is different from the original score u₀(θ, ξ), once ξ̂(θ) is substituted into u₁, u₁ becomes identical to u₀, i.e., u₁(θ, ξ̂(θ)) = u₀(θ, ξ̂(θ)). This is because ξ = ξ̂(θ) leads to v₁ = 0, and the first-order score reduces as u₁ = u₀ − E[u₀v₁^⊤]E[v₁v₁^⊤]⁻¹v₁ = u₀. Therefore, it is meaningful to use only the second- and higher-order projected scores (t ≥ 2).

3.3.1 Example: estimation of simple circle
Using a simple example problem, we show how the above method of projected scores is used in practice. For this purpose, we consider the problem of estimating the radius of a circle with a fixed center. Representing the circle as x² + y² = r², we define g(r, ξ) = [r cos ξ, r sin ξ]^⊤. We assume that each observation xᵢ = [xᵢ, yᵢ]^⊤ follows N(x̄, σ²I₂), where x̄ = g(r, ξ). Then, we estimate r from the n observations.
The ML estimate of r is calculated as follows. First, solving the equation of the ξ-score, i.e., v₁ᵢ(r, ξᵢ) = 0, we obtain the ML estimate ξ̂ᵢ as ξ̂ᵢ = atan2(yᵢ, xᵢ). (Note that ξ̂ᵢ is independent of r in this case.) Next, substituting this into the sum of the r-score and solving the resulting equation, i.e., Σᵢ u₀ᵢ(r, ξ̂ᵢ) = 0, we obtain the ML estimate r̂₀ of r as follows:

r̂₀ = (1/n) Σᵢ₌₁ⁿ |xᵢ|.  (19)

The projected score method calculates another estimate as follows. The second-order score u₂ᵢ(r, ξᵢ) is used here. Substituting g, defined above, into Eq.(25) (the formula is given later), u₂ᵢ(r, ξᵢ) is given by

u₂ᵢ = u₀ᵢ − [σ²(xᵢ cos ξᵢ + yᵢ sin ξᵢ) − r(xᵢ sin ξᵢ − yᵢ cos ξᵢ)²] / [σ²(2r² + σ²)].  (20)

Now, the equation to be solved is Σᵢ u₂ᵢ(r, ξ̂ᵢ) = 0. Its solution is given as

r̂₂ = (1/2)(μ + √(μ² − 2σ²)),  (21)

where μ = (1/n)Σᵢ|xᵢ| (= r̂₀). In order to compute r̂₂, the value of σ² is necessary; it, too, can be estimated from the observations, as will be mentioned later. The resulting r̂₂ is a new estimate of r. The difference between r̂₂ and r̂₀ is approximately given as

r̂₂ − r̂₀ ≈ −σ²/(2μ).  (22)

In [9], by using Efron's curvature, we calculated the bias of the ML estimate r̂₀ as E[r̂₀] − r = σ²/(2r). Thus, r̂₂ can be regarded as r̂₀ with this bias well corrected.

4. Numerical computation
Thus, the solution of Eq.(18) yields the estimate of our interest. Although the statistical literature appears to assume that the solution can be obtained analytically, this is usually impossible for many CV problems because of their complexity and large dimensionality. Therefore, we propose to solve the equation numerically.

4.1. Algorithm
Suppose that we have the t-th order projected scores uₜᵢ(θ, ξᵢ) for a particular t. Then, the equation to be solved is Eq.(18), i.e., Σᵢ uₜᵢ(θ, ξ̂ᵢ(θ)) = 0, where ξ̂ᵢ(θ) is the ML estimate of ξᵢ for a fixed θ. Since it is difficult to analytically derive ξ̂ᵢ(θ) in most cases, we propose to solve for θ and ξᵢ simultaneously. We first consider numerically solving the following simultaneous equations:

Σᵢ₌₁ⁿ uₜᵢ(θ, ξᵢ) = 0,  (23a)
v₁ᵢ(θ, ξᵢ) = 0,  i = 1, …, n.  (23b)

The resulting solution should be the same as the expected one. Any algorithm that solves the above nonlinear equations is inevitably iterative, and good initial values are necessary to make it converge correctly. Considering that σ² tends to be small in practice and the ML estimate will then be close to the expected solution, we can use the ML estimate for the initial values.
Although the above solution can be obtained in principle, our experiments have shown that it has some difficulty with convergence. Thus, we propose here a different method that alternately updates θ and ξ₁, …, ξₙ; that is, one is computed while the other is kept fixed. The complete algorithm is as follows. First, J = Σᵢ ‖xᵢ − g(θ, ξᵢ)‖² is minimized and the ML estimates of θ and ξ₁, …, ξₙ are computed. For this computation, a standard least squares algorithm such as the Levenberg-Marquardt method is used. (For the initial values required for this optimization, a linear method such as the DLT algorithm [3] is used.) Next, the following two computations are repeated alternately until convergence: Eq.(23a) is solved for θ while ξ̂₁, …, ξ̂ₙ are fixed, and then ξ̂₁, …, ξ̂ₙ are updated by minimizing J (using the LM algorithm, etc.) while θ̂ is fixed. The latest parameter values are used as the initial values required at each step. The algorithm is summarized in Fig.1.

1. Compute the ML estimates of the parameters by minimizing J = Σᵢ ‖xᵢ − g(θ, ξᵢ)‖². (Let θ̂^(0) and ξ̂₁^(0), …, ξ̂ₙ^(0) be them.) Compute σ̂² from the residue of J after the minimization.
2. Repeat 3 and 4 for k = 1, 2, … alternately until convergence.
3. Solve Σᵢ uₜᵢ(θ, ξ̂ᵢ^(k−1)) = 0 for θ, where θ̂^(k−1) is used as the initial value. (Let θ̂^(k) be the solution.)
4. Update ξ₁, …, ξₙ by minimizing J while fixing θ = θ̂^(k). (Let ξ̂₁^(k), …, ξ̂ₙ^(k) be the results.)
Figure 1. The proposed algorithm.

The nonlinear equation in Step 3 of the algorithm (Fig.1) can be solved using any general-purpose algorithm. We used the trust-region dogleg method implemented in fsolve of Optimization Toolbox 3.0.3 in Matlab 7.1. (We carried out all experiments in this software environment.) Since this algorithm does not require the Jacobian matrix of the target function, which the Newton-type algorithms do require, the implementation of the function Σᵢ uₜᵢ is the only complex part of the overall implementation. As mentioned before, σ² is also unknown and therefore has to be estimated. We compute an estimate from the residue of J after the minimization in Step 1 for obtaining the ML estimates. For a more accurate estimate, we could recompute it from the residue available at each iteration of Step 4. (We do not employ this in the experiments described below, since there is no significant difference between the estimates obtained.)
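For concreteness, the following is a rough sketch of Steps 2-4 of Fig.1 in Python/SciPy (not the authors' Matlab implementation): Step 3 uses a derivative-free root finder in place of Matlab's fsolve, and Step 4 updates each ξᵢ by a small least-squares fit; all function and parameter names are our own.

```python
import numpy as np
from scipy.optimize import least_squares, root

def projected_score_estimate(x_obs, g, u_t, theta_ml, xi_ml, sigma2,
                             max_outer=10, tol=1e-8):
    """Sketch of the alternating scheme of Fig. 1 (Steps 2-4).
    x_obs: (n, d) observations; g(theta, xi): model function (length-d output);
    u_t(x, theta, xi, sigma2): projected score of one observation (length of theta);
    theta_ml, xi_ml, sigma2: results of Step 1 (e.g., from a joint least-squares fit)."""
    theta = np.asarray(theta_ml, float)
    xi = np.asarray(xi_ml, float).reshape(len(x_obs), -1)

    for _ in range(max_outer):
        # Step 3: solve sum_i u_t(x_i, theta, xi_i) = 0 for theta with xi fixed.
        def score_sum(th):
            return np.sum([u_t(x_obs[i], th, xi[i], sigma2)
                           for i in range(len(x_obs))], axis=0)
        theta_new = root(score_sum, theta, method='hybr').x

        # Step 4: update each xi_i by minimizing ||x_i - g(theta, xi_i)||^2, theta fixed.
        xi = np.array([least_squares(lambda t, i=i: x_obs[i] - g(theta_new, t),
                                     xi[i]).x for i in range(len(x_obs))])

        if np.linalg.norm(theta_new - theta) < tol:   # convergence of the outer loop
            theta = theta_new
            break
        theta = theta_new
    return theta, xi
```

As in the text, the latest parameter values are reused as initial values at each step, and only the function u_t for a single observation has to be coded for each problem.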

4.2. Formula for second-order projected scores
As mentioned earlier, the smallest-order projected score that is useful in practice is the second-order one (t = 2). Even this smallest-order projected score is expected to be accurate, and it is more cumbersome to compute higher-order projected scores. Therefore, we derive here a formula for u₂ that makes its calculation easier, in which the structure of the CV problems is used. As in the above discussions, on the basis of the independence between the observations, we omit the subscript i and focus on the projected score for a single observation. Moreover, we assume the nuisance parameter ξ to be a scalar. This assumption is made for the sake of simplicity; otherwise, tensor notation is necessary and the resulting expressions become inevitably complex. Nevertheless, the following derivation basically applies to the case of vector nuisance parameters ξ.
From Eq.(16), the projected score u₂ considered here is given by u₂ = u₀ − E[u₀w₂^⊤]E[w₂w₂^⊤]⁻¹w₂, where w₂ = [v₁, v₂]^⊤. Thus, the main part of the derivation is to calculate expectations of the form E[uₖvₗ]. The details of the derivation are given in Appendix A.1; here, we only summarize the results. First, v₂ is given by

v₂ = (1/σ²){g_ξξ^⊤(x − g) − |g_ξ|²} + v₁²,  (24)

and u₂ is given by

u₂ = u₀ − G_θ^⊤ [g_ξ  g_ξξ] [ g_ξ^⊤g_ξ, g_ξ^⊤g_ξξ ; g_ξ^⊤g_ξξ, g_ξξ^⊤g_ξξ + (2/σ²)(g_ξ^⊤g_ξ)² ]⁻¹ [v₁, v₂]^⊤,  (25)

where G_θ = ∂g/∂θ, g_ξ = ∂g/∂ξ, and g_ξξ = ∂²g/∂ξ², and the 2×2 matrix is written with ";" separating its rows.
If the numerical algorithm described earlier is used, then for each problem we only have to write the code for the function that numerically computes (the sum of) u₂ given above. Analytical calculations are necessary only for the first few derivatives of g with respect to θ and ξ.
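As an illustration of this, a direct transcription of Eqs.(24)-(25) for one observation and a scalar nuisance parameter might look as follows (Python/NumPy; the derivative callables G_theta, g_xi, g_xixi and all other names are our own convention, not the authors' code):

```python
import numpy as np

def u2_single(x, theta, xi, sigma2, g, G_theta, g_xi, g_xixi):
    """Second-order projected score of one observation, Eqs. (24)-(25), for a scalar
    nuisance parameter xi. g(theta, xi) returns the d-vector g; G_theta returns
    dg/dtheta (d x p, or length d if p = 1); g_xi and g_xixi return dg/dxi and
    d^2 g/dxi^2 (length d). Illustrative sketch, not the authors' implementation."""
    e = x - g(theta, xi)                                      # residual x - g
    Gt = np.asarray(G_theta(theta, xi), float).reshape(len(e), -1)
    gx, gxx = np.asarray(g_xi(theta, xi)), np.asarray(g_xixi(theta, xi))

    u0 = Gt.T @ e / sigma2                                    # theta-score, Eq. (28a)
    v1 = gx @ e / sigma2                                      # xi-score, Eq. (28b)
    v2 = (gxx @ e - gx @ gx) / sigma2 + v1**2                 # Eq. (24)

    # Eq. (25): subtract the projection of u0 onto span{v1, v2}
    M1 = np.column_stack([Gt.T @ gx, Gt.T @ gxx])
    M2 = np.array([[gx @ gx,  gx @ gxx],
                   [gx @ gxx, gxx @ gxx + 2.0 / sigma2 * (gx @ gx) ** 2]])
    return u0 - M1 @ np.linalg.solve(M2, np.array([v1, v2]))
```

For the circle example of Sec. 3.3.1 (g(r, ξ) = [r cos ξ, r sin ξ]^⊤), this projection reduces analytically to Eq.(20).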

5. Experimental results
We conducted several experiments to examine the effectiveness of the proposed method. The problems for which accuracy is of particular concern are the targets of the proposed method. Some of them are large-scale problems with many parameters and high-dimensional observations (e.g., multi-view SFM). However, for these problems, mainly because of the large number of degrees of freedom, it is technically difficult to compare the estimation accuracies of individual parameters or to ensure convergence of the algorithms. Therefore, in this paper, to evaluate the performance of the proposed algorithm as precisely as possible, we consider small-scale problems that are free from such difficulties. Specifically, we consider the estimation of an ellipse and that of a one-dimensional projective transformation. In both cases, the second-order projected score is used. Hereafter, we denote the proposed method by PS (projected score).

5.1. Ellipse estimation
An ellipse can be parametrized by its center (x₀, y₀), the major and minor axis half-lengths a and b, and the angle θ of the major axis with respect to the x axis. We estimate these five parameters from a set of noisy points {[xᵢ, yᵢ]^⊤ | i = 1, …, n}. Introducing the nuisance parameter ξᵢ, the true coordinates of [xᵢ, yᵢ]^⊤ can be represented as [x̄ᵢ, ȳᵢ]^⊤ = g(a, b, x₀, y₀, θ; ξᵢ), where

g = [ cos θ, −sin θ ; sin θ, cos θ ] [a cos ξᵢ, b sin ξᵢ]^⊤ + [x₀, y₀]^⊤.  (26)

We select [a, b, x₀, y₀, θ] = [5, 0.7, 0, 0, 0.3] and σ = 0.1. From 100 randomly generated points, the ellipse parameters are estimated by both the PS and ML methods. We performed many trials, for each of which the points were regenerated. Fig.2(a) shows the true ellipse along with an instance of the generated points.
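For reference, a data-generation sketch corresponding to Eq.(26) might look as follows (Python/NumPy; the true parameter values and σ are those given above, while the sampling of the nuisance angles and the random seed are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)                     # seed is arbitrary
a, b, x0, y0, th = 5.0, 0.7, 0.0, 0.0, 0.3         # [a, b, x0, y0, theta] from the text
sigma, n = 0.1, 100

def g_ellipse(params, xi):
    """Eq. (26): point on the rotated, translated ellipse for nuisance angle xi."""
    a, b, x0, y0, th = params
    R = np.array([[np.cos(th), -np.sin(th)],
                  [np.sin(th),  np.cos(th)]])
    return R @ np.array([a * np.cos(xi), b * np.sin(xi)]) + np.array([x0, y0])

xi_true = rng.uniform(0.0, 2 * np.pi, n)           # assumed: angles drawn uniformly
points = np.array([g_ellipse((a, b, x0, y0, th), t) for t in xi_true])
points += sigma * rng.standard_normal(points.shape)    # isotropic Gaussian noise, Eq. (1)
```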

Figure 2. Results of ellipse estimation. (a) True ellipse and an instance of the generated points. (b) Mean ellipses estimated by the PS and ML methods: true ellipse (thick line), PS estimate (broken line), and ML estimate (dotted line). (c) Thirty ellipses estimated by the PS method (gray lines) and the true ellipse (thick red line). (d) Ellipses estimated by the ML method.

Figure 3. Histograms of estimates of the major and minor axis lengths, a and b. (a) PS estimates of a, (b) ML estimates of a, (c) PS estimates of b, and (d) ML estimates of b. In each histogram, the vertical broken line indicates the true value.

Figs.2(b)-(d) show the results, with a part of the ellipse magnified. Fig.2(b) shows the mean ellipses obtained over 1000 trials (i.e., the ellipses given by the means of the estimated parameter values); the true ellipse is shown by the thick line, the ellipse estimated by the PS method by the broken line, and that estimated by the ML method by the dotted line. Figs.2(c) and (d) show 30 individual ellipses estimated by the PS and ML methods, respectively, where the estimated ellipses are shown by thin gray lines and the true ellipse by the thick (red) line. It is seen from these that the ML method yields non-negligible biases, particularly in the direction of the major axis, and that these are reduced in the estimate given by the PS method. (This behavior of the biases agrees with the results of [9].) This is quantitatively observed in Fig.3, which shows the histograms of the estimates of the axis length parameters a and b. Thus, the PS method yields better estimates with smaller biases than the ML method.
The proposed algorithm is iterative. In the experiments, the alternating computation (Steps 3 and 4 in Fig.1) converges in 3 to 10 iterations, in each of which the nonlinear solver in Step 3 (as well as Step 4) converges in fewer than 10 iterations. Thus, the proposed algorithm shows good convergence performance for this problem.

5.2. One-dimensional projective transformation
Next, we consider the estimation of one-dimensional projective transformations. We take the following transformation as an example:

[x', 1]^⊤ ∝ [ h₁₁, h₁₂ ; h₂₁, 1 ] [x, 1]^⊤ = [ 1.1, 0.9 ; 1.0, 1.0 ] [x, 1]^⊤.  (27)

(This is a simulation of a case in which two one-dimensional cameras placed above a plane take images of the plane from opposite directions.) We want to estimate h = [h₁₁, h₁₂, h₂₁]^⊤ (h₂₂ = 1 is fixed), given a set of point correspondences x ↔ x'. The observations are {(x, x')}, each of which contains additive Gaussian noise with σ = 0.1. Choosing the coordinate x of the first camera as ξ, we define g(h, ξ) = [ξ, (h₁₁ξ + h₁₂)/(h₂₁ξ + 1)]^⊤. This gives a curve on the xx' plane, as shown in Fig.4(a). Assuming regularly spaced feature points on the plane, we generate 500 points; an instance of the generated points is shown in Fig.4(a). As in the case of ellipse estimation, we apply the PS and ML methods to the data and carry out 1000 trials. Fig.4(b) shows magnified plots of the results, where the means of the estimates obtained by the two methods are plotted as curves; the thick line indicates the true curve, the dotted line the mean of the ML estimates, and the broken line, which almost overlaps with the true curve, the mean of the PS estimates. It is observed, again, that the PS method yields better estimates than the ML method, although the difference is very small.

Figure 4. Results of estimation of the one-dimensional projective transformation: (a) true curve representing the transformation and the observed points, and (b) magnified plots of the true curve (solid line), ML estimate (dotted line), and PS estimate (broken line).
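A corresponding data-generation sketch for this experiment is given below (Python/NumPy; H and σ are as stated above, while the range of the regularly spaced points and the seed are our assumptions — the true map has a pole at x = −1/h₂₁, which this range avoids):

```python
import numpy as np

h = np.array([1.1, 0.9, 1.0])          # (h11, h12, h21); h22 = 1 fixed
sigma, n = 0.1, 500

def g_homog1d(h, xi):
    """g(h, xi) = [xi, (h11*xi + h12)/(h21*xi + 1)] as defined in the text."""
    h11, h12, h21 = h
    return np.array([xi, (h11 * xi + h12) / (h21 * xi + 1.0)])

xi_true = np.linspace(0.0, 8.0, n)     # regularly spaced points; range is assumed
rng = np.random.default_rng(1)
obs = np.array([g_homog1d(h, t) for t in xi_true])
obs += sigma * rng.standard_normal(obs.shape)   # additive Gaussian noise on (x, x')
```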

6. Summary and conclusion
In this paper, we discussed methods for enabling more accurate estimation than the ML method for problems of estimating geometric parameters in computer vision. Underlying this is the fact that, since the CV problems have the Neyman-Scott structure, the standard optimality of the ML method does not hold; for example, increasing the number of observations does not contribute to guaranteeing the optimality of the ML estimate. We have proposed the application of the projected score method to CV problems; this method was developed in the field of statistics to deal with problems having the Neyman-Scott structure. We have presented a numerical algorithm implementing the method in a practical manner and have also derived formulae that considerably simplify the implementation for individual problems. Taking a few simple problems as examples, we have shown that the method works effectively. In the future, we will consider applying the method to more practical, larger-scale problems such as camera calibration and multi-view SFM.
In Section 1, we stated (P1) and (P2) as open problems. (P1) asks how to deal with problems that do not belong to the particular problem class considered in our previous study; an answer to this is the method described in this paper. (P2) asks whether there is an estimation method that can achieve optimality in the asymptotics of an increasing number of observations (i.e., n → ∞). This still remains unsolved. It is true that the estimates obtained using the method of projected scores can be more accurate than the ML estimates, as shown in this paper; however, this does not mean that the estimates are optimal (i.e., the most accurate), unless the interest and nuisance parameters are estimated independently. This will be a future research direction of this study.

A. Appendix
A.1. Derivation of the formula for the second-order projected score
First, by definition, u₀ and v₁ are given as follows:

u₀(θ, ξ) = (1/σ²) (∂g/∂θ)^⊤ (x − g),  (28a)
v₁(θ, ξ) = (1/σ²) (∂g/∂ξ)^⊤ (x − g).  (28b)

We wish to derive u₂, the second-order projected score. From Eq.(13), we have v₂ = ∂v₁/∂ξ + v₁², from which we obtain Eq.(24). In the case of t = 2, Eq.(16) reduces to

u₂ = u₀ − M₁M₂⁻¹[v₁, v₂]^⊤,  (29)

where

M₁ = [ E[u₀v₁], E[u₀v₂] ],  (30a)
M₂ = [ E[v₁v₁], E[v₁v₂] ; E[v₂v₁], E[v₂v₂] ].  (30b)

The elements of these matrices are calculated as follows:

E[u₀v₁] = (1/σ²) G_θ^⊤ g_ξ,  (31a)
E[u₀v₂] = (1/σ²) G_θ^⊤ g_ξξ,  (31b)
E[v₁v₁] = (1/σ²) g_ξ^⊤ g_ξ,  (31c)
E[v₁v₂] = (1/σ²) g_ξ^⊤ g_ξξ,  (31d)
E[v₂v₂] = (1/σ²) g_ξξ^⊤ g_ξξ + (2/σ⁴)(g_ξ^⊤ g_ξ)²,  (31e)

where G_θ = ∂g/∂θ, g_ξ = ∂g/∂ξ, and g_ξξ = ∂²g/∂ξ². Here, we have considered that when e (= x − g) is a random vector following N(0, σ²I), the expectations of all odd products of e are zero (e.g., E[(e^⊤e)e] = 0), and the expectations of even products are given as follows:

E[(a₁^⊤e)(a₂^⊤e)] = σ² a₁^⊤a₂,  (32a)
E[(a₁^⊤e)²(a₂^⊤e)²] = σ⁴{|a₁|²|a₂|² + 2(a₁^⊤a₂)²},  (32b)
E[(a^⊤e)⁶] = 15σ⁶|a|⁶.  (32c)

These are derived from the formula for the even moments of a zero-mean normal distribution, E[z^(2k)] = (2k)!/(k! 2^k) for z ~ N(0, 1). The substitution of Eqs.(31) into Eq.(29) yields Eq.(25).
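The moment identities (32a)-(32c) are easy to confirm by a quick Monte Carlo check (Python/NumPy; the vectors, σ, and the sample size are arbitrary choices for illustration):

```python
import numpy as np

# Quick Monte Carlo check of the moment identities (32a)-(32c); purely illustrative.
rng = np.random.default_rng(0)
d, sigma, N = 3, 0.5, 1_000_000
a1, a2 = np.array([1.0, -2.0, 0.5]), np.array([0.3, 1.0, -1.0])
e = sigma * rng.standard_normal((N, d))            # e ~ N(0, sigma^2 I)

s1, s2 = e @ a1, e @ a2
print(np.mean(s1 * s2),       sigma**2 * (a1 @ a2))                                  # (32a)
print(np.mean(s1**2 * s2**2), sigma**4 * ((a1 @ a1) * (a2 @ a2) + 2 * (a1 @ a2)**2)) # (32b)
print(np.mean(s1**6),         15 * sigma**6 * (a1 @ a1)**3)                          # (32c)
```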

References
[1] A. Bhattacharyya. On some analogues of the amount of information and their uses in statistical estimation. Sankhyā, 8:1–14, 201–218, 315–328, 1946.
[2] V. P. Godambe, editor. Estimating Functions. Oxford University Press, 1991.
[3] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000.
[4] K. Kanatani. Statistical optimization for geometric fitting: Theoretical accuracy bound and high order error analysis. International Journal of Computer Vision, 80(2):167–188, 2007.
[5] B. G. Lindsay. Conditional score functions: some optimality results. Biometrika, 34:503–512, 1982.
[6] B. G. Lindsay and R. Waterman. Second-order information loss due to nuisance parameters: a simple measure. In S. Ghosh, editor, Asymptotics, Nonparametrics, and Time Series, pages 789–810. New York: Dekker, 1999.
[7] J. Neyman and E. Scott. Consistent estimates based on partially consistent observations. Econometrica, 16(1):1–32, 1948.
[8] T. Okatani and K. Deguchi. Toward a statistically optimal method for estimating geometric relations from noisy data: cases of linear relations. In Proc. of CVPR, volume 1, pages 432–439, 2003.
[9] T. Okatani and K. Deguchi. On bias correction for geometric parameter estimation in computer vision. In Proc. of CVPR, 2009.
[10] C. G. Small and D. L. McLeish. Generalizations of ancillarity, completeness and sufficiency in an inference function space. Annals of Statistics, 16:534–551, 1988.
[11] C. G. Small and D. L. McLeish. Projection as a method for increasing sensitivity and eliminating nuisance parameters. Biometrika, 76(4):693–703, 1989.
[12] R. P. Waterman. Projected Score Methods. PhD thesis, The Pennsylvania State University, 1993.
[13] R. P. Waterman and B. G. Lindsay. Projected score methods for approximating conditional scores. Biometrika, 83(1):1–13, 1996.
[14] R. P. Waterman and B. G. Lindsay. A simple and accurate method for approximate conditional inference applied to exponential family models. Journal of the Royal Statistical Society B, 58(1):177–188, 1996.
