Robust direct motion estimation considering discontinuity

Pattern Recognition Letters 21 (2000) 999–1011

www.elsevier.nl/locate/patrec

Jong-Eun Ha a,*, In-So Kweon b,1

a MECA Group, Technology/R&D Center, Samsung Corning Co. Ltd., 472 Shin-Dong, Paldal-Gu, Suwon-Shi, Kyunggi-Do 442-390, South Korea
b Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, 373-1 Kusong-dong, Yusong-gu, Taejon, South Korea

Received 15 September 1999; received in revised form 18 April 2000

Abstract

In this paper, we propose a robust motion estimation algorithm that uses an uncalibrated 3D motion model and takes depth discontinuity into account. Most previous direct motion estimation algorithms with a 3D motion model compute the depth value through local smoothing, which results in erroneous estimates at depth discontinuities. We overcome this problem by adding a discontinuity-preserving regularization term to the original formulation. The robust estimation framework also enables motion segmentation through dominant camera motion compensation. Experimental results show improved estimates at depth discontinuities. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Optic flow; Discontinuity; Direct method; Uncalibrated

1. Introduction

Analysis of image motion plays an important role in many areas of computer vision: scene motion detection, object segmentation, tracking, and the recovery of scene structure. Typical gradient-based optic flow algorithms rely on the brightness constancy assumption: the invariance of recorded image brightness along motion trajectories. However, this assumption provides only a single constraint for the two unknowns at each pixel.
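In gradient-based form, linearizing brightness constancy with a first-order Taylor expansion gives the familiar optic flow constraint equation, which makes the under-determination explicit: one linear equation in the two unknown velocity components per pixel.

```latex
% Brightness constancy: I(x + u, y + v, t + 1) = I(x, y, t).
% A first-order expansion yields the optic flow constraint equation,
% a single line in (u, v)-space at each pixel (the aperture problem):
\[
  I_x u + I_y v + I_t = 0 .
\]
```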

* Corresponding author. Tel.: +82-331-219-7874; fax: +82-331-219-7085.
E-mail addresses: [email protected] (J.-E. Ha), [email protected] (I.-S. Kweon).
1 Tel.: +82-2-958-3465; fax: +82-2-960-0510.

Horn and Schunck (1981) introduced a spatial constraint by regularization on the optic flow field, often called the smoothness constraint, to compute a dense, smoothly varying velocity field. This method gives a globally smooth motion field, but it also blurs the motion field at discontinuities. In general, the optic flow field is piecewise continuous rather than locally constant or globally smooth. Nagel and Enkelmann (1986) proposed an oriented smoothness constraint to attenuate smoothing across strong intensity edges. Schunck (1989) identifies motion boundaries by clustering local gradient-based constraints. Bartolini and Piva (1997) proposed a median-based relaxation optic flow algorithm to reduce the strength of the smoothing step, especially across motion boundaries. Motion discontinuities can also be treated explicitly by introducing a line field to be computed



simultaneously with the image motion. This approach has often been embedded in a Markov random field (MRF) modeling framework. MRF modeling provides a means to organize velocities and motion discontinuities by allowing the introduction of generic knowledge of a local and contextual nature. The MRF framework for image motion estimation has been exploited in the work of Konrad and Dubois (1992) and Heitz and Bouthemy (1990). Robust estimation techniques by Black and Anandan (1993), Bober and Kittler (1994), Odobez and Bouthemy (1995), and Ayer et al. (1994) can also be exploited to tackle motion discontinuities. Local approaches can deal with motion discontinuities explicitly, but their performance degrades in areas with low intensity gradients and in homogeneous regions.

Bergen et al. (1992) pointed out that an explicit representation of the motion model might lead to more accurate computation of motion fields. The direct method imposes an explicit motion model in addition to the optic flow constraint, and it gives better performance in homogeneous regions. Previous direct methods using a 3D motion model compute the depth field through local smoothing. Horn and Weldon (1988) and Hanna (1991) proposed direct methods using the calibrated 3D motion model, obtaining the depth field by local smoothing. Szeliski and Coughlan (1997) proposed a spline-based image registration method with an uncalibrated 3D motion model, but a smooth depth map results from interpolating the spline's control points. All these methods employ least-squares estimation. Ayer (1995) proposed robust direct estimation using an uncalibrated 3D motion model, but the motion model is based on instantaneous motion, the depth map is obtained explicitly from the motion parameters at each step, and a locally constant depth map is assumed.

In this paper, we propose a robust direct approach using an uncalibrated 3D motion model that takes depth discontinuity into account. We handle depth discontinuities through regularization with discontinuity in the motion field. In addition, the dominant motion of the camera is obtained directly, owing to the robust estimation framework with the uncalibrated 3D motion model. On the other hand,

direct estimation using a simple linear model requires additional processing to extract the dominant motion of the camera. The proposed algorithm can also be easily extended to the motion segmentation problem.

2. Related direct methods

In this section, we review direct methods using calibrated or uncalibrated 3D motion models and their shortcomings at depth discontinuities. The basic assumption behind any optic flow algorithm is brightness constancy:

\[
  I(\mathbf{x}, t) = I(\mathbf{x} - \mathbf{u}(\mathbf{x}), t - 1).
\]

The direct method usually obtains the motion field by minimizing the sum of squared differences (SSD) error over a local image area or the entire image, using an explicit motion model:

\[
  E(\{\mathbf{u}\}) = \sum_{\mathbf{x}} \left[ I(\mathbf{x}, t) - I(\mathbf{x} - \mathbf{u}(\mathbf{x}), t - 1) \right]^2 . \tag{1}
\]

The motion model u(x) is chosen according to the application, and its explicit form is

\[
  \mathbf{u}(\mathbf{x}) = \mathbf{u}(\mathbf{x}; \{\theta_i\}), \tag{2}
\]

where \{\theta_i\} is a vector representing the model parameters. Thus, the motion estimation problem reduces to the estimation of the model parameters. In the case of the general perspective projection model, the image motion induced by a rigidly moving object can be written as

\[
  \mathbf{u}(\mathbf{x}) = \frac{1}{Z(\mathbf{x})} A \mathbf{t} + B \boldsymbol{\omega}, \qquad
  A = \begin{bmatrix} -f & 0 & x \\ 0 & -f & y \end{bmatrix}, \qquad
  B = \begin{bmatrix} xy/f & -(f^2 + x^2)/f & y \\ (f^2 + y^2)/f & -xy/f & -x \end{bmatrix}, \tag{3}
\]

where f is the focal length, t the translation vector, \omega the angular velocity vector, and Z the depth. Estimation for this motion model consists of two parts: estimation of the global parameters and estimation of the local parameters. Bergen et al. (1992) obtained the local depth parameters
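As a concrete reading of Eq. (3), the sketch below evaluates the calibrated instantaneous motion field at a single pixel; the function name and the NumPy array layout are our own choices, not the authors' implementation.

```python
import numpy as np

def rigid_motion_field(x, y, f, inv_depth, t, omega):
    """Evaluate the instantaneous motion model of Eq. (3):
    u(x) = (1/Z) A t + B omega, at a single pixel (x, y)."""
    # A maps the translation t = (tx, ty, tz); this term scales with 1/Z.
    A = np.array([[-f, 0.0, x],
                  [0.0, -f, y]])
    # B maps the angular velocity omega = (wx, wy, wz); depth-independent.
    B = np.array([[x * y / f, -(f**2 + x**2) / f,  y],
                  [(f**2 + y**2) / f, -x * y / f, -x]])
    return inv_depth * (A @ t) + B @ omega

# Example: pure forward translation gives a flow vector that points
# away from the focal center and scales with inverse depth.
flow = rigid_motion_field(10.0, 5.0, f=500.0, inv_depth=0.01,
                          t=np.array([0.0, 0.0, 1.0]),
                          omega=np.zeros(3))
```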


explicitly through the minimization of the following local component of the error measure:

\[
  E_{\mathrm{local}} = \sum_{5 \times 5} E(\mathbf{t}, \boldsymbol{\omega}, 1/Z). \tag{4}
\]

Differentiating Eq. (4) with respect to 1/Z(\mathbf{x}) and setting the result to zero gives

\[
  \frac{1}{Z} = -\frac{\sum_{5 \times 5} (\nabla I)^T A \mathbf{t}
    \left[ \Delta I - (\nabla I)^T A \mathbf{t}_i / Z_i
           + (\nabla I)^T B \boldsymbol{\omega}
           - (\nabla I)^T B \boldsymbol{\omega}_i \right]}
  {\sum_{5 \times 5} \left[ (\nabla I)^T A \mathbf{t} \right]^2}. \tag{5}
\]

To refine the global motion parameters, the minimization is performed over the entire image with 1/Z(\mathbf{x}) given by Eq. (5); thus, through Gauss-Newton minimization, only t and \omega are updated:

\[
  E_{\mathrm{global}} = \sum_{\mathrm{image}} E(\mathbf{t}, \boldsymbol{\omega}, 1/Z). \tag{6}
\]

After updating the global motion parameters, the local depth is obtained explicitly using Eq. (5). The local depth of Eq. (5) relies on a locally constant depth, and this assumption is violated at depth discontinuities. Ayer (1995) proposed a robust direct estimation with an uncalibrated 3D motion model; he updates only the motion parameters, using iteratively reweighted least squares (IRLS), while the depth is obtained from an explicit least-squares equation. Szeliski and Coughlan (1997) presented a registration algorithm using a spline with an uncalibrated 3D motion model. Projective depth is estimated at each control point, and the pixel-wise depth map is obtained by interpolating through the control points. All these methods produce a smooth depth map that ignores depth discontinuity. Hence, all previous direct approaches using calibrated or uncalibrated 3D motion models give erroneous results at depth discontinuities due to the locally constant depth assumption. In this paper, we present a robust direct method using an uncalibrated 3D motion model that takes depth discontinuity into account.
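For illustration, the following sketch computes the closed-form inverse depth of Eq. (5) over 5x5 windows, simplified by assuming the previous estimates t_i/Z_i and omega_i are zero; the function name, the epsilon guard, and the SciPy-based window accumulation are our own choices, not Bergen et al.'s code.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_inverse_depth(Ix, Iy, It, t, omega, f, xs, ys):
    """Least-squares inverse depth of Eq. (5), simplified to zero previous
    estimates. Each pixel solves
        min_{1/Z} sum_{5x5} [It + (1/Z)(grad I)^T A t + (grad I)^T B omega]^2.
    Ix, Iy, It are gradient/temporal-difference images; xs, ys are
    pixel-coordinate grids (e.g., from np.meshgrid)."""
    # (grad I)^T A t : the depth-dependent coefficient at every pixel.
    a = Ix * (-f * t[0] + xs * t[2]) + Iy * (-f * t[1] + ys * t[2])
    # (grad I)^T B omega : the purely rotational flow component.
    b = (Ix * (xs * ys / f * omega[0]
               - (f**2 + xs**2) / f * omega[1] + ys * omega[2])
         + Iy * ((f**2 + ys**2) / f * omega[0]
                 - xs * ys / f * omega[1] - xs * omega[2]))
    # Accumulate numerator and denominator over each 5x5 neighborhood;
    # the uniform averaging factor cancels in the ratio.
    num = uniform_filter(a * (It + b), size=5)
    den = uniform_filter(a * a, size=5)
    return -num / (den + 1e-12)   # 1/Z per pixel; epsilon avoids 0/0
```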


3. Robust direct estimation using an uncalibrated 3D motion model considering depth discontinuity

Direct model-based motion estimation obtains the motion field through the minimization of the registration error, using the motion model explicitly. A robust estimate of the motion field is obtained by minimizing

\[
  E(\{\mathbf{u}\}) = \sum_{i,j \in R} \rho\bigl(I_2(x_i + u_i, y_i + v_i) - I_1(x_i, y_i); \sigma\bigr). \tag{7}
\]

The function \rho is a robust M-estimator, and \sigma is the scale factor that adjusts the shape of the M-estimator. Various motion models can be used according to the specific application. We use the uncalibrated 3D motion model proposed by Hartley et al. (1992) and Faugeras (1992) to cope with unknown camera parameters and to deal with perspective effects:

\[
  u(x_1, y_1) = \frac{m_0 x_1 + m_1 y_1 + m_2 + z(x_1, y_1)\, m_8}{m_6 x_1 + m_7 y_1 + 1 + z(x_1, y_1)\, m_{10}} - x_1,
\]
\[
  v(x_1, y_1) = \frac{m_3 x_1 + m_4 y_1 + m_5 + z(x_1, y_1)\, m_9}{m_6 x_1 + m_7 y_1 + 1 + z(x_1, y_1)\, m_{10}} - y_1, \tag{8}
\]

where m = \{m_0, \ldots, m_{10}\} are the motion parameters of an uncalibrated camera and z(x, y) is the projective depth. This motion model is valid for any pinhole camera and can even cope with time-varying internal camera parameters. The projective coordinates are related to the true Euclidean coordinates through a 3D projective collineation, which can be recovered by a self-calibration algorithm using only projective information.

Previous direct approaches obtained the motion and structure parameters through the minimization of Eq. (7). In this paper, we consider the following two formulations that take the depth discontinuity into account:

\[
  E = \sum_i \rho_D\bigl(I_2(x_i + u_i, y_i + v_i) - I_1(x_i, y_i); \sigma_D\bigr)
      + \sum_i \sum_{t \in N_i} \rho_S(z_t - z_i; \sigma_S), \tag{9}
\]
\[
  E = \sum_i \rho_D\bigl(I_2(x_i + u_i, y_i + v_i) - I_1(x_i, y_i); \sigma_D\bigr)
      + \sum_i \sum_{t \in N_i} \rho_S(u_t - u_i; \sigma_S), \tag{10}
\]


where N_i represents the neighborhood of the current pixel; we consider the four neighbors to the east, west, south, and north. \rho_D and \rho_S are the robust M-estimators for the data conservation term and the spatial coherence term, and we use the same robust estimator for both. Eq. (9) is useful when the calibrated 3D motion model is used, but with the uncalibrated 3D motion model a difference in projective depth has no physical meaning. Therefore, we use the formulation of Eq. (10) to account for depth discontinuity in direct motion estimation with the uncalibrated 3D motion model.

Black and Rangarajan (1996) show that robust \rho-functions are closely related to the traditional line-process approaches for coping with discontinuities. For many \rho-functions it is possible to recover an equivalent formulation in terms of analog line processes. Based on this observation, the second term in Eq. (10) allows us to account for discontinuity in robust direct estimation with the uncalibrated 3D motion model. Therefore, we can impose a global constraint on the image motion through the uncalibrated 3D motion model and recover optic flow while preserving discontinuity.

The objective function of Eq. (10) is non-convex and has many local minima. We use the graduated non-convexity (GNC) algorithm of Blake and Zisserman (1987) to minimize this non-convex objective function. The GNC algorithm finds the solution by varying the functional form; in robust estimation, this adjustment of the functional form is achieved by adjusting the scale parameters. At each fixed scale, a gradient-based method can find the local minimum. We use simultaneous over-relaxation (SOR) as the local minimizer. The update formula for each parameter is

\[
  p^{n+1} = p^n - c \, \frac{1}{T(p)} \frac{\partial E}{\partial p}, \tag{11}
\]

where 0 < c < 2 is an over-relaxation parameter used to overcorrect the estimate of p^{n+1} at stage n + 1. For 0 < c < 2, the method is proven to converge. The term T(p) is an upper bound on the second partial derivative of E.
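A minimal sketch of the SOR step of Eq. (11); the gradient and the bound T(p) are assumed to be supplied by the caller, and the default value of c is our own illustrative choice.

```python
def sor_update(p, dE_dp, T_p, c=1.9):
    """One simultaneous over-relaxation (SOR) step of Eq. (11).

    p     -- current parameter value (a motion parameter or a depth)
    dE_dp -- gradient of the objective at p (Eqs. (12)/(13))
    T_p   -- T(p), an upper bound on the second partial derivative of E
    c     -- over-relaxation factor; convergence requires 0 < c < 2
    """
    return p - c * dE_dp / T_p
```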

The gradients of the objective function with respect to the motion and depth parameters are

\[
  \frac{\partial E}{\partial m_k} = \sum_i \rho'_D(e_i; \sigma_D)
    \left[ I_{2x} \frac{\partial u_i}{\partial m_k} + I_{2y} \frac{\partial v_i}{\partial m_k} \right]
    + \sum_i \sum_{t \in N_i} \rho'_S(u_i - u_t; \sigma_S)
    \left[ \frac{\partial u_i}{\partial m_k} - \frac{\partial u_t}{\partial m_k} \right], \tag{12}
\]
\[
  \frac{\partial E}{\partial z_k} = \sum_i \rho'_D(e_i; \sigma_D)
    \left[ I_{2x} \frac{\partial u_i}{\partial z_k} + I_{2y} \frac{\partial v_i}{\partial z_k} \right]
    + \sum_i \sum_{t \in N_i} \rho'_S(u_i - u_t; \sigma_S)
    \left[ \frac{\partial u_i}{\partial z_k} - \frac{\partial u_t}{\partial z_k} \right], \tag{13}
\]

where e_i = I_2(x_i + u_i, y_i + v_i) - I_1(x_i, y_i). We use the Lorentzian \rho-function:

\[
  \rho(x, \sigma) = \log\left( 1 + \frac{1}{2} \left( \frac{x}{\sigma} \right)^2 \right), \tag{14}
\]
\[
  \psi(x, \sigma) = \frac{\partial \rho}{\partial x} = \frac{2x}{2\sigma^2 + x^2}. \tag{15}
\]
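To make Eqs. (8), (10) and (14) concrete, here is a small sketch. The function names are ours; applying \rho_S component-wise to (u, v) is our reading of Eq. (10), and the warped image I2_warped is assumed to have been resampled externally at (x + u, y + v).

```python
import numpy as np

def lorentzian(x, sigma):
    """Lorentzian rho-function of Eq. (14)."""
    return np.log1p(0.5 * (x / sigma) ** 2)

def uncalibrated_flow(x, y, z, m):
    """Motion field of Eq. (8). m holds the 11 parameters m[0]..m[10];
    z is the projective depth at (x, y). Arrays broadcast element-wise."""
    den = m[6] * x + m[7] * y + 1.0 + z * m[10]
    u = (m[0] * x + m[1] * y + m[2] + z * m[8]) / den - x
    v = (m[3] * x + m[4] * y + m[5] + z * m[9]) / den - y
    return u, v

def robust_energy(I1, I2_warped, u, v, sigma_d, sigma_s):
    """Objective of Eq. (10): robust data term on the warping residual
    plus robust smoothness on neighbor differences of the flow field."""
    e = lorentzian(I2_warped - I1, sigma_d).sum()
    for f in (u, v):  # rho_S applied component-wise to (u, v)
        e += lorentzian(np.diff(f, axis=0), sigma_s).sum()  # vertical pairs
        e += lorentzian(np.diff(f, axis=1), sigma_s).sum()  # horizontal pairs
    return e
```

The Lorentzian grows only logarithmically, so large residuals (occlusions, flow across a depth boundary) contribute little, which is what lets the second term relax smoothing at discontinuities instead of blurring across them.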

3.1. Initialization

There are many unknowns: 11 motion parameters and a projective depth at each pixel. Initialization of these unknowns is a difficult problem. We initialize the motion parameters under the assumption that there is no motion between the two images, i.e., m_0 = m_4 = 1 and all other values are 0. The initial projective depth is set to a constant. Other initializations are also possible; for example, in the work of Szeliski and Coughlan (1997), the uncalibrated motion parameters are initialized for a given image sequence using a priori information. Since we do not use such information, our algorithm is more flexible.

3.2. Propagation of motion parameters in the pyramid structure

A coarse-to-fine strategy is usually employed to handle large displacements by constructing a pyramid of spatially filtered and sub-sampled images. In an optic flow algorithm that computes (u, v) at each pixel, the result at a coarse pyramid level is propagated to the next finer level with a constant multiplier, usually 2. In a direct method with an uncalibrated 3D motion model, we must instead transfer the motion parameters of the uncalibrated camera and the depth at each pixel to the next finer level. We use linear interpolation when transferring the depth to the next finer pyramid level. The motion transfer equation to the finer pyramid level is obtained using the following facts. Given corresponding points (x_1, y_1), (x_2, y_2) at pyramid level N - 1, their positions at the next finer pyramid level N are (x_1', y_1') = (2x_1, 2y_1) and (x_2', y_2') = (2x_2, 2y_2). These corresponding points must satisfy the motion model of Eq. (8). Substituting them into Eq. (8), we obtain

\[
  x_2' = \frac{m_0 x_1' + m_1 y_1' + 2m_2 + 2z(x_1, y_1)\, m_8}{(m_6/2)\, x_1' + (m_7/2)\, y_1' + 1 + z(x_1, y_1)\, m_{10}},
\]
\[
  y_2' = \frac{m_3 x_1' + m_4 y_1' + 2m_5 + 2z(x_1, y_1)\, m_9}{(m_6/2)\, x_1' + (m_7/2)\, y_1' + 1 + z(x_1, y_1)\, m_{10}}. \tag{16}
\]

From Eq. (16), the transfer formulae for the motion parameters to the next finer pyramid level are

\[
  m_2 \leftarrow 2 m_2, \quad m_5 \leftarrow 2 m_5, \quad m_8 \leftarrow 2 m_8, \quad
  m_9 \leftarrow 2 m_9, \quad m_6 \leftarrow \frac{m_6}{2}, \quad m_7 \leftarrow \frac{m_7}{2}. \tag{17}
\]

All other parameters keep the same values.
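A sketch of the level-to-level transfer: Eq. (17) rescales the parameter vector, and the depth map is upsampled 2x. We use nearest-neighbor upsampling here for brevity, whereas the paper specifies linear interpolation; the function name is our own.

```python
import numpy as np

def propagate_to_finer_level(m, z_coarse):
    """Transfer motion parameters (Eq. (17)) and depth to the next
    finer pyramid level. m is the 11-vector [m0, ..., m10]."""
    m = m.copy()
    m[2] *= 2.0   # m2 <- 2 m2
    m[5] *= 2.0   # m5 <- 2 m5
    m[8] *= 2.0   # m8 <- 2 m8
    m[9] *= 2.0   # m9 <- 2 m9
    m[6] /= 2.0   # m6 <- m6 / 2
    m[7] /= 2.0   # m7 <- m7 / 2
    # 2x depth upsampling (nearest-neighbor stand-in for linear interp.)
    z_fine = np.repeat(np.repeat(z_coarse, 2, axis=0), 2, axis=1)
    return m, z_fine
```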

1003

Fig. 1. Two updating groups in projective depth updating.

3.3. Update of the motion and depth parameters at each pyramid level

At each pyramid level, we first update the motion parameters for N iterations and then update the depth for N iterations. The scales \sigma_D, \sigma_S are decreased according to \sigma^{n+1} = c\sigma^n, with c set to 0.95, at each pyramid level. The motion parameters are updated in the following sequence: \{m_2, m_5, m_0, m_1, m_3, m_4, m_6, m_7, m_8, m_9, m_{10}\}. The projective depth at each pixel is updated in two passes over the even and odd groups, as shown in Fig. 1.
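A structural sketch of one pyramid level's schedule, under the assumption that a helper object supplies the derivatives of Eqs. (12)/(13) and the bounds T(p); the `grads` interface and the exact placement of the scale annealing within the loop are our own reading of Sec. 3.3.

```python
ORDER = [2, 5, 0, 1, 3, 4, 6, 7, 8, 9, 10]  # update sequence of Sec. 3.3

def update_level(m, z, grads, n_iter, sigma_d, sigma_s, c=0.95):
    """Skeleton of one pyramid level: N motion iterations, then N depth
    iterations over the even/odd groups of Fig. 1, then GNC annealing
    of the robust scales. sor_update is the step of Eq. (11)."""
    for _ in range(n_iter):                      # motion parameters first
        for k in ORDER:
            m[k] = sor_update(m[k],
                              grads.dE_dm(k, m, z, sigma_d, sigma_s),
                              grads.T_m(k, m, z))
    for _ in range(n_iter):                      # then the projective depth
        for parity in (0, 1):                    # even pass, then odd pass
            for i, j in grads.pixels(parity):    # checkerboard groups (Fig. 1)
                z[i, j] = sor_update(z[i, j],
                                     grads.dE_dz(i, j, m, z, sigma_d, sigma_s),
                                     grads.T_z(i, j, m, z))
    # Shrink the scales so the estimator sharpens as optimization proceeds.
    return m, z, c * sigma_d, c * sigma_s
```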

4. Experimental results

In this section, we first compare our algorithm with two other algorithms, one by Black and Anandan (1996) and another by Szeliski and Coughlan (1997), using the Yosemite image

Fig. 2. The Yosemite image sequence: (a) Yosemite 11 and (b) Yosemite 12.


sequence. Black and Anandan (1996)'s algorithm is a local flow algorithm; it obtains the pixel-wise optic flow using a robust formulation of the data conservation and spatial coherence terms. Szeliski and Coughlan (1997)'s algorithm computes the pixel-wise optic flow using the uncalibrated 3D motion model in a direct framework, and it is based on a least-squares formulation. Fig. 2 shows Yosemite images 11 and 12; we excluded the upper cloudy part because it has no true optic flow field. Using the true optic flow field, the three methods are compared in terms

Table 1
Flow error (mean/std)

  Black and Anandan (1996)       4.46°/4.21°
  Szeliski and Coughlan (1997)   4.11°/12.5°
  Proposed approach              4.02°/4.75°

Flow error (percentage)