Improved Accuracy in Gradient-Based Optical Flow Estimation

International Journal of Computer Vision 25(1), 5–22 (1997). © 1997 Kluwer Academic Publishers. Manufactured in The Netherlands.

Improved Accuracy in Gradient-Based Optical Flow Estimation∗

JONATHAN W. BRANDT
Advanced Systems Division, Silicon Graphics Computer Systems, 2011 N. Shoreline Blvd., Mountain View, CA 94043
[email protected]

Received June 20, 1995; Revised June 13, 1996; Accepted July 2, 1996

∗This research was undertaken at the Japan Advanced Institute of Science and Technology, Tatsunokuchi, Nomi-gun, Ishikawa, Japan 923-12, under a grant from Komatsu, Ltd.

Abstract. Optical flow estimation by means of first derivatives can produce surprisingly accurate and dense optical flow fields. In particular, recent empirical evidence suggests that the method that is based on local optimization of first-order constancy constraints is among the most accurate and reliable methods available. Nevertheless, a systematic investigation of the effects of the various parameters for this algorithm is still lacking. This paper reports such an investigation. Performance is assessed in terms of flow-field accuracy, density, and resolution. The investigation yields new information regarding pre-filter, differentiator, least-squares neighborhood, and reliability test selection. Several changes to previously-employed parameter settings result in significant overall performance improvements, while they simultaneously reduce the computational cost of the estimator.

Keywords: optical flow

1. Introduction

Accurate estimation of the optical flow field of an image sequence is critically important to a number of computer vision and image processing applications. These include image sequence compression, motion compensation, and the recovery of three-dimensional motion parameters and depth. Accuracy is especially important in the latter case because the computed optical flow field serves as input to numerically sensitive three-dimensional motion and structure estimation algorithms. In this case, systematic estimation errors can lead to disaster by biasing critical motion parameters such as time-to-impact.

Among the many techniques to estimate optical flow are those which are based on first-order spatiotemporal derivatives. These gradient-based methods are generally relatively simple to implement, efficient to compute, and can produce surprisingly accurate, dense optical flow fields. For example, a recent empirical study by Barron et al. (1993, 1994) found the gradient-based local optimization method (Lucas and Kanade, 1981a, 1981b, 1985; Adelson and Bergen, 1986; Kearney et al., 1987; Simoncelli et al., 1991, 1993) to be the best-performing overall. Barron's study compared the results of nine different optical flow estimation techniques, of varying levels of sophistication, for a suite of five synthetic and four natural image sequences. The study provides empirical evidence that the gradient-based local optimization method performs well under a variety of conditions. However, a systematic investigation of the effects of the various algorithm parameters under exhaustive test conditions is still lacking. This paper reports progress, in the form of analytical and experimental results, in understanding these effects. The results provide a set of guidelines for parameter selection. In addition, several changes to previously-employed parameter settings result in significant overall performance improvements, while they reduce the computational cost of the estimator.

In general, several parameters can affect the ultimate performance of the gradient-based optical flow
estimator. The next section outlines the algorithm and enumerates the relevant parameters. Performance evaluation requires performance criteria as well as a specification of the input conditions for the algorithm. These aspects are also described in the next section. The algorithm consists of four stages, each of which requires a design choice and consequently an evaluation of the effects of that choice. The subsequent sections explore each of these choices in detail. Section 3 examines the choice of the pre-filter and of the differentiator. These two choices are examined together because of the strong interaction between them. Section 4 examines the choice of the optimization neighborhood and the weighting of that neighborhood. Section 5 examines the reliability test that is used at the last stage of the algorithm to weed out unreliable estimates. In each case, the effects of the parameter settings on algorithm performance are evaluated, both analytically and experimentally. Data that are directly comparable to those reported by Barron et al. indicate progressive performance improvements in terms of overall accuracy, estimation density, and flow-field resolution.


2. Algorithm Description

Let f = f(x, y, t) denote the time-varying image intensity function and let (u, v) = (u(x, y, t), v(x, y, t)) denote the x- and y-components of the instantaneous optical flow value. The classical first-order constancy constraint (Horn and Schunck, 1981) is

$$f_x u + f_y v + f_t = 0, \qquad (1)$$

where subscripts denote partial derivatives. It is well-known that (1) is not sufficient to determine a unique flow value at each point and so additional constraints are required. The gradient-based local optimization method (GBLOM) (Lucas and Kanade, 1981a, 1981b, 1985; Adelson and Bergen, 1986; Kearney et al., 1987; Simoncelli et al., 1991, 1993) obtains the necessary additional constraints from a finite neighborhood and combines them by weighted, linear least-squares. Specifically, GBLOM finds the pair (u, v) that minimizes

$$\epsilon = \sum_{i=1}^{n} w_i \left( f_x^{(i)} u + f_y^{(i)} v + f_t^{(i)} \right)^2, \qquad (2)$$

where $f^{(i)} = f(x + \Delta x_i, y + \Delta y_i, t + \Delta t_i)$ and $w_i$ is the weight associated with constraint i. The triples $(\Delta x_i, \Delta y_i, \Delta t_i)$ determine a neighborhood around

each point from which n first-order constancy constraints are extracted. Traditionally, the temporal extent of the least-squares neighborhood is a single frame. (One notable exception is (Nomura et al., 1993).) Perhaps this choice has been made in the past due to implementation restrictions, such as memory limitations. Such limitations can be overcome without restricting the temporal extent of the least-squares neighborhood. Therefore, no such restriction is adopted here.

The pair (u, v) that minimizes (2) is the solution to

$$\begin{pmatrix} \sum w_i \bigl(f_x^{(i)}\bigr)^2 & \sum w_i f_x^{(i)} f_y^{(i)} \\ \sum w_i f_x^{(i)} f_y^{(i)} & \sum w_i \bigl(f_y^{(i)}\bigr)^2 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = - \begin{pmatrix} \sum w_i f_x^{(i)} f_t^{(i)} \\ \sum w_i f_y^{(i)} f_t^{(i)} \end{pmatrix}. \qquad (3)$$

To simplify notation, rewrite (3) as

$$\begin{pmatrix} M_{xx} & M_{xy} \\ M_{xy} & M_{yy} \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = - \begin{pmatrix} M_{xt} \\ M_{yt} \end{pmatrix}. \qquad (4)$$

If the spatial gradient of the image is sufficiently large and its direction varies sufficiently within the neighborhood, then the above linear system is well-conditioned and a unique, reliable flow value can be determined. On the other hand, if the spatial gradient is near zero, or its direction is nearly constant, then a reliable flow estimate cannot be produced. This is an instance of the “aperture problem” that arises in motion estimation (Horn and Schunck, 1981).

Weber and Malik (1993, 1995) suggest that total least-squares is potentially more accurate than standard least-squares. Total least-squares stems from the observation that we have essentially equal statistical confidence in the terms $f_x^{(i)}$, $f_y^{(i)}$, and $f_t^{(i)}$, while standard least-squares contains the implicit assumption that $f_x^{(i)}$ and $f_y^{(i)}$ are known exactly. Their algorithm, when applied to the test sequences used in Barron et al., produced more accurate results. However, it is unclear to what extent the total least-squares approach contributes to the improved accuracy, given that the algorithm also uses a Gaussian derivative-based filter bank to integrate multi-scale information. In addition, the total least-squares approach requires significantly more computation than the standard least-squares approach. Head-to-head comparison of total versus standard least-squares is an important topic of investigation. However, it is beyond the scope of this paper.

GBLOM depends on the assumption that the value of (u, v) is constant, or nearly-constant, within the
integration neighborhood. Modifications have been proposed that allow for variation of (u, v) within the neighborhood (e.g., (Campani and Verri, 1990)). However, these techniques introduce more unknowns into the least-squares system and therefore require larger integration neighborhoods for accurate estimation. Several researchers, notably (Uras et al., 1988; Verri et al., 1990; Otte and Nagel, 1994; Tistarelli, 1994), have suggested that higher-order derivatives can be employed to further constrain the optical flow at a point and thereby reduce the size of the integration neighborhood. However, Barron et al. found that the second-order-based method of Uras et al. was less accurate in general than the first-order-based method of Lucas and Kanade. A subsequent controlled comparison of the first- and second-order based methods by the author (Brandt, 1994b) further corroborated this finding. This paper considers only the first-order-based method (GBLOM).

GBLOM has four distinct stages:

1. Pre-Filtering: Apply a time/space low-pass filter to the input image in order to improve the signal-to-noise ratio and to reduce the non-linear components of the image that tend to degrade subsequent gradient estimation accuracy. For example, Barron et al. use an 11 × 11 × 11 Gaussian low-pass filter (σ = 1.5).

2. Gradient Estimation: Apply a differencing kernel in each of the three axial directions. For example, Barron et al. use the kernel D5 = [−1, 8, 0, −8, 1]/12.

3. Neighborhood Integration: Compute the coefficients for the linear system that determines the minimizing solution to (2) by forming weighted sums of the terms $(f_x^{(i)})^2$, $f_x^{(i)} f_y^{(i)}$, $(f_y^{(i)})^2$, $f_x^{(i)} f_t^{(i)}$, and $f_y^{(i)} f_t^{(i)}$ obtained from a local neighborhood. For example, Barron et al. use the 5 × 5 neighborhood specified by the separable kernel P5 = [.0625, .25, .375, .25, .0625].

4. Least-Squares Solution: Invert the resulting 2 × 2 linear system to obtain (u, v). Usually, some sort of reliability test is applied to the system in order to screen out those points where the optical flow is not uniquely determined. For example, Barron et al. require that the minimum eigenvalue of the linear system be greater than a prescribed threshold in order for the estimate to be considered reliable.

The required design choices regarding this algorithm are the selection of (1) the pre-filter, (2) the
differentiator, (3) the neighborhood integrator, and (4) the reliability test. Each of these choices affects the overall performance of the estimator in a variety of ways. Three competing factors determine the performance of the optical flow estimator:

1. Estimator Accuracy: How closely do the estimated flow values match the actual flow values? Is the estimator biased? How much variance is in the estimates?

2. Estimation Density: How many estimates are produced per unit area?

3. Flow-Field Resolution: How finely can the estimator resolve time/space transitions in the flow field?

Generally, greater accuracy and density require increasing the overall support of the estimator. (The estimator support is determined by the support of the pre-filter, the differentiator, and the neighborhood integrator.) Increasing the support tends to decrease the flow-field resolution. Therefore a tradeoff exists. In order to resolve the tradeoff, it is necessary to consider the system operating parameters. The system operating parameters fall into three main categories:

1. Image Characteristics: What is the image power spectrum?

2. Flow-Field Characteristics: What is the maximum flow magnitude to be reliably estimated? Can the flow magnitudes exceed the maximum, and if so, should the system detect and reject such cases? How rapidly does the flow field vary in space and time? What time/space flow-field resolution is required?

3. Noise Characteristics: What is the noise power spectrum? What is the signal-to-noise ratio?

The design problem is to select the pre-filter, differentiator, neighborhood integrator, and reliability test that results in the best performance, in terms of flow accuracy, estimation density, and flow-field resolution, according to the specified operating parameters. Naturally, all of this should be achieved at a reasonably low computational cost. The following sections examine each of these design choices in turn.
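To make the four-stage pipeline and the notation of Eqs. (2)–(4) concrete, the following sketch solves the local least-squares system at a single pixel. It is an illustration only (the paper contains no code); NumPy is assumed, the kernel values are the Barron et al. settings quoted above, and the function names are hypothetical.

```python
import numpy as np

# Kernels quoted in the text (Barron et al. settings).
D5 = np.array([-1.0, 8.0, 0.0, -8.0, 1.0]) / 12.0        # 5-point differentiator
P5 = np.array([0.0625, 0.25, 0.375, 0.25, 0.0625])        # separable 5x5 neighborhood weights

def flow_at_pixel(fx, fy, ft, weights, eig_threshold=1.0):
    """Stages 3 and 4 of GBLOM at one pixel.

    fx, fy, ft : flattened derivative samples over the integration neighborhood
    weights    : corresponding neighborhood weights w_i
    Returns (u, v), or None when the minimum-eigenvalue test rejects the estimate.
    """
    # Weighted second-moment sums of Eqs. (3) and (4).
    Mxx = np.sum(weights * fx * fx)
    Mxy = np.sum(weights * fx * fy)
    Myy = np.sum(weights * fy * fy)
    Mxt = np.sum(weights * fx * ft)
    Myt = np.sum(weights * fy * ft)

    M = np.array([[Mxx, Mxy], [Mxy, Myy]])
    if np.linalg.eigvalsh(M)[0] < eig_threshold:          # reliability test (stage 4)
        return None
    u, v = np.linalg.solve(M, -np.array([Mxt, Myt]))
    return u, v

# The 5x5 spatial weights are the outer product of P5 with itself.
weights_5x5 = np.outer(P5, P5).ravel()
```

Stages 1 and 2 (pre-filtering and differentiation with D5 or a matched kernel pair) are assumed to have produced the derivative volumes from which fx, fy, and ft are sampled.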

3. Choice of Pre-Filter and Differentiator

GBLOM relies on accurate partial derivative estimates obtained by the use of finite-differencing convolution kernels. Such kernels can accurately approximate the jω frequency-response characteristic of the derivative operator over only a limited frequency range. The range of reliable frequencies is related to the kernel's support size and thus ultimately to the operating speed and cost of the system that performs these computations. Therefore, the choice of finite-differencing kernel constitutes a tradeoff between accuracy and efficiency, among other factors.

It is often claimed that spatio-temporal low-pass pre-filtering of the input image sequence is necessary in order to obtain accurate results. The need for low-pass filtering has generally been attributed to the presence of broad-band noise in the input. However, it has been noted by several researchers that low-pass pre-filtering also helps reduce errors introduced by finite differencing. In particular, Cafforio and Rocca (1979, 1982) analyzed the effects of central versus non-central differences on motion estimation using a quadratic autocorrelation model. Kearney et al. (1987) used a Taylor-series expansion to examine the effect of non-central differencing. Finally, recent studies by the author and others provide a more detailed understanding of the problems of finite differencing in the context of optical flow estimation (Brandt, 1994a, 1994b, 1994c; Simoncelli, 1994).

Kearney's error analysis (Kearney et al., 1987) identified the gradient estimation step as a source of systematic error for optical flow estimation. Using the Taylor-series expansion, he argues that the derivative of a function f(x) that is estimated by the forward-differencing formula

$$\hat{f}'(x) = \frac{f(x + \Delta x) - f(x)}{\Delta x}$$

yields the approximation

$$\hat{f}'(x) \approx f'(x) + \frac{\Delta x}{2} f''(x).$$

So the error in the derivative estimate is proportional to the second derivative. However, it is more common to use central differences to estimate the derivative. That is,

$$\hat{f}'(x) = \frac{f(x + \Delta x) - f(x - \Delta x)}{2 \Delta x},$$

which effectively cancels the second-order term in the Taylor-series expansion and yields the alternative approximation

$$\hat{f}'(x) \approx f'(x) + \frac{\Delta x^2}{6} f'''(x).$$

In practice, central differencing produces errors that are well-characterized by the latter formula—the derivatives are generally underestimated when the curvature is decreasing, overestimated when it is increasing, and not significantly affected by the value of the second derivative. Let the positive parameter α specify the efficacy of the differentiator in estimating the derivative of f,

$$\hat{f}'(x) \approx f'(x) + \alpha f'''(x). \qquad (5)$$

Frequency domain analysis further corroborates the above approximation. Model the frequency-response characteristic of a finite differencing kernel as

$$D(\omega) = j\omega A(\omega), \qquad (6)$$

where A(ω) is the frequency-response characteristic of a low-pass filter. It quickly follows that the approximation in (5) is equivalent to assuming

$$A(\omega) \approx 1 - \alpha\omega^2, \qquad (7)$$

which is a reasonable form for a low-pass filter, at least for small values of ω. For instance, the Gaussian low-pass filter has a frequency-response characteristic that is proportional to $e^{-\alpha\omega^2}$, which has the Taylor-series expansion

$$e^{-\alpha\omega^2} = 1 - \alpha\omega^2 + O(\omega^4).$$

Therefore, the differencing kernel derived from the first derivative of the Gaussian is subject to errors of the form expressed in (5). Many other differencing kernels are subject to these errors as well. Note that Cafforio (1982) concludes that central differencing is preferable to forward differencing because of the intrinsic bias of the latter. In the following paragraphs, it is argued through analysis and simulation that although central differencing per se is unbiased, the resulting flow estimates are biased in a non-trivial way because of the particular manner in which the derivative estimates are combined arithmetically to produce those estimates.
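As a numerical illustration of (6) and (7) (not taken from the paper), the short script below evaluates the frequency response of the plain 3-point central difference and fits the implied low-pass factor A(ω) = D(ω)/(jω) with 1 − αω²; the fitted α comes out near 1/6, matching the Δx²/6 coefficient above.

```python
import numpy as np

# 3-point central difference: f'(x) ≈ (f(x + 1) - f(x - 1)) / 2.
taps = np.array([0.5, 0.0, -0.5])
offsets = np.array([1, 0, -1])                 # sample offsets of the taps

omega = np.linspace(0.01, 0.5, 100)            # low-frequency band, radians per sample
D = np.array([np.sum(taps * np.exp(1j * w * offsets)) for w in omega])

A = (D / (1j * omega)).real                    # implied low-pass factor A(w) of Eq. (6)
alpha = np.polyfit(omega ** 2, 1.0 - A, 1)[0]  # slope of (1 - A) versus w^2
print("fitted alpha:", alpha)                  # approximately 1/6
```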

3.1. Analysis of the One-Dimensional Case

Consider the one-dimensional optical flow estimation problem: given a signal of the form

$$f(x, t) = g(x - vt) \qquad (8)$$
that has been sampled in space and time, estimate v. By applying the one-dimensional flow-constraint equation, $\hat{v} f_x + f_t = 0$ (subscripts denote partial derivatives), the flow velocity can be estimated (provided $\hat{f}_x \neq 0$) by $\hat{v} = -\hat{f}_t / \hat{f}_x$. Substituting the approximation formulae for $\hat{f}_t$ and $\hat{f}_x$ (assuming that identical differentiators are used for estimating both derivatives) yields

$$\hat{v} \approx \left( \frac{g' + \alpha v^2 g'''}{g' + \alpha g'''} \right) v.$$

Thus, the relative bias in the flow velocity estimation is

$$(\hat{v} - v)/v \approx \left( \frac{\alpha g'''}{g' + \alpha g'''} \right) (v^2 - 1). \qquad (9)$$

The above formula implies that the relative bias is directly proportional to α and g''', and inversely proportional to g'. The term (v² − 1) is intriguing because it implies that the bias goes to zero as |v| approaches unity. One might object, at this point, that the above result is dubious because we have not specified any units, either of time or of distance, and it seems nonsensical to have a result that depends on the choice of these units. However, the result in fact does not depend on the choice of units. It depends only on the fact that the time and space differentiators have the same non-ideal characteristic, when expressed in the chosen units for time and space. If the time scale is changed relative to the distance scale without changing the differentiators, then the characteristics of the time and space differentiators will no longer be congruent.

If v is constant, then the Fourier transform of (8) is

$$F(\omega_1, \omega_2) = G(\omega_1)\,\delta(\omega_2 + v\omega_1),$$

which implies that the temporal frequencies are scaled by |v| relative to the spatial frequencies. (In fact, this relationship is the basis for Heeger's frequency-domain approach to optical flow estimation (Heeger, 1987).) If the spatial and temporal derivatives are each approximated by the same differencing operation D(ω), then the consequent distortion in the temporal domain will
generally differ from that in the spatial domain. However, if |v| is close to unity, then the distortions in the temporal and the spatial domains will be nearly identical, and so the errors in $\hat{f}_t$ and $\hat{f}_x$ will more-or-less cancel. This explains why the error approaches zero as the flow velocity approaches unity, regardless of how poorly the system operates (that is, regardless of the value of α). One lesson of the foregoing analysis is that it is very important to test the accuracy of the flow estimator over a broad velocity range.

It has been shown (Brandt, 1994c) that if g(x) = sin ωx and the differentiator is of the form expressed in (6), then the relative flow estimation bias is

$$(\hat{v} - v)/v \approx \frac{A(v\omega)}{A(\omega)} - 1.$$

If A(ω) is of the form specified in (7), then the above expression is equivalent to

$$(\hat{v} - v)/v \approx \left( \frac{\alpha\omega^2}{1 - \alpha\omega^2} \right) (1 - v^2), \qquad (10)$$

which can be viewed as the frequency-domain alternative to (9) that applies when g(x) is a sinusoid, or more generally, a narrow-band signal. The estimated flow value can be considered reasonably accurate only when $\alpha\omega^2 \ll 1$. In this region, the term $\alpha\omega^2/(1 - \alpha\omega^2)$ is non-negative, so the sign of $\hat{v} - v$ is the same as the sign of $(1 - v^2)v$. This is significant because it suggests that for a narrow-band signal, gradient-based optical flow magnitude estimates are systematically biased toward unity. Simulations have confirmed this property (Brandt, 1994a, 1994c).
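The following small simulation (my own illustration, not the author's code) reproduces this behavior for a sampled sinusoid: with 3-point central differences in both x and t (so α = 1/6), the estimated velocity is biased toward unit magnitude and the bias vanishes at |v| = 1, in rough agreement with (10).

```python
import numpy as np

def estimate_v(omega, v, n=256):
    """Gradient-based 1-D flow estimate for f(x, t) = sin(omega * (x - v * t)),
    using the same 3-point central difference in space and time."""
    x = np.arange(n, dtype=float)
    frames = np.stack([np.sin(omega * (x - v * t)) for t in (0.0, 1.0, 2.0)])
    fx = (frames[1, 2:] - frames[1, :-2]) / 2.0      # spatial derivative, middle frame
    ft = (frames[2, 1:-1] - frames[0, 1:-1]) / 2.0   # temporal derivative
    return -np.sum(fx * ft) / np.sum(fx * fx)        # least-squares solution of v*fx + ft = 0

omega = 0.8                                          # well inside the Nyquist range
for v in (0.25, 0.5, 1.0, 1.5):
    bias = estimate_v(omega, v) / v - 1.0
    predicted = (omega**2 / 6.0) / (1.0 - omega**2 / 6.0) * (1.0 - v**2)   # Eq. (10), alpha = 1/6
    print(f"v = {v:4.2f}   measured bias = {bias:+.4f}   predicted = {predicted:+.4f}")
```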

3.2. Analysis of the Two-Dimensional Case

The analysis in the preceding section yields an expression for the flow-estimation bias resulting from finite-differencing errors for the case of a translating one-dimensional signal. The expression implies that the flow estimation bias vanishes when |v| = 1 regardless of the degree of finite-differencing error and the form of the translating signal. Below, the analysis is extended to the two-dimensional case. The result describes the locus of flow values for which the finite-differencing-based distortion vanishes.

Suppose that the function f(x, y, t) is a time-varying image that has been formed by translating a fixed, two-dimensional pattern g(p, q) with constant
velocity. That is, the image is of the form

$$f(x, y, t) = g(p, q) = g(x - ut, y - vt), \qquad (11)$$

where u and v are constants. The two-dimensional, first-order motion constraint equation is

$$\hat{u} f_x + \hat{v} f_y + f_t = 0.$$

Applying the assumed form for f yields

$$\hat{u} g_p + \hat{v} g_q - (u g_p + v g_q) = 0,$$

which admits the solution $(\hat{u}, \hat{v}) = (u, v)$, regardless of g. However, if the derivative estimates are distorted according to (5), then the actual motion constraint equation that the flow estimator attempts to satisfy is

$$(\hat{u} - u) g_p + (\hat{v} - v) g_q + \alpha \left( \hat{u} g_{p^3} + \hat{v} g_{q^3} - \left( u \frac{\partial}{\partial p} + v \frac{\partial}{\partial q} \right)^{\!3} g \right) = 0. \qquad (12)$$

It follows that the set of flow values for which $(\hat{u}, \hat{v}) = (u, v)$, regardless of the value of α and the function g, is the set of flow values that satisfy the constraints

$$u(1 - u^2) = 0, \quad v(1 - v^2) = 0, \quad u^2 v = 0, \quad u v^2 = 0.$$

The set of flow values that satisfies these constraints is {(0, 0), (±1, 0), (0, ±1)}. Other flow values are systematically biased in order to satisfy (12). The amount of bias fluctuates depending on the magnitudes of the third-order derivatives $g_{p^3}$, $g_{p^2 q}$, $g_{p q^2}$, and $g_{q^3}$. Low-pass pre-filtering generally decreases the magnitudes of these derivatives and consequently reduces the amount of flow estimation bias.

3.3. Gradient Estimation Error Compensation

Simoncelli (1994) proposes that accurate multi-dimensional differentiation requires the design of a matched set of low-pass and differencing kernel pairs, so that a derivative in, say, the x direction requires an accompanying low-pass operation in the y and z directions. He combines several criteria in order to obtain a frequency-domain algorithm to design matched differencing and low-pass kernels. It turns out, however, that Simoncelli's criteria are stronger than necessary for the purpose of accurate optical flow estimation. (Simoncelli considered the general problem of differentiation, not just optical flow estimation.)

Suppose that the differentiator in each axial direction consists of a non-ideal one-dimensional differentiator of the form expressed in (6) and two low-pass filters in the other two axial directions. That is, the frequency-domain forms of the spatial and temporal differentiators are

$$D_x(\omega_1, \omega_2, \omega_3) = j\omega_1 A(\omega_1) B(\omega_2) B(\omega_3),$$
$$D_y(\omega_1, \omega_2, \omega_3) = j\omega_2 A(\omega_2) B(\omega_1) B(\omega_3),$$
$$D_t(\omega_1, \omega_2, \omega_3) = j\omega_3 A(\omega_3) B(\omega_1) B(\omega_2).$$

If the time-varying input image f is of the form expressed in (11) and u and v are constant in space and time, then the frequency-domain expression for f is

$$F(\omega_1, \omega_2, \omega_3) = G(\omega_1, \omega_2)\,\delta(u\omega_1 + v\omega_2 + \omega_3).$$

Applying the first-order motion constraint equation yields

$$(u D_x + v D_y + D_t) F = 0,$$

or

$$j \bigl( u\omega_1 A(\omega_1) B(\omega_2) B(\omega_3) + v\omega_2 A(\omega_2) B(\omega_1) B(\omega_3) + \omega_3 A(\omega_3) B(\omega_1) B(\omega_2) \bigr)\, G\,\delta(u\omega_1 + v\omega_2 + \omega_3) = 0.$$

If A = kB, then the terms involving A and B can be factored and eliminated, yielding

$$(u\omega_1 + v\omega_2 + \omega_3)\,\delta(u\omega_1 + v\omega_2 + \omega_3) = 0,$$

which is true for all u and v. If A and B are not linearly proportional, then the above factoring step is not possible, and consequently the flow constraint equation cannot produce unbiased solutions without constraining the values of u and v.

In practice, u and v are not constant. Nevertheless, the above analysis appears to be supported by experimental evidence: compensatory low-pass filtering of the type proposed by Simoncelli results in
increased flow estimation accuracy with smaller overall support, and uncompensated differentiation leads to errors that correspond roughly to (9).
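One simple way to satisfy the A = kB condition, sketched below, is to draw the differentiator and its compensating low-pass kernels from the same Gaussian. This is only an illustration of the matched-kernel idea under that assumption; it is not Simoncelli's actual design procedure, whose kernels are optimized in the frequency domain.

```python
import numpy as np

def matched_gaussian_pair(sigma=1.5, radius=4):
    """Matched 1-D kernels: a sampled Gaussian low-pass filter and the
    corresponding first-derivative-of-Gaussian differentiator.  Both share the
    same Gaussian envelope, so their responses satisfy A = kB up to sampling."""
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    lowpass = g / g.sum()                        # unit-gain low-pass kernel B
    deriv = -x / sigma**2 * g                    # derivative of the Gaussian
    deriv /= np.sum(deriv * (-x))                # convolution with a unit ramp gives slope 1
    return lowpass, deriv

lowpass, deriv = matched_gaussian_pair()
# A 3-D x-differentiator is then deriv along x combined with lowpass along y and t,
# applied as three successive separable 1-D convolutions (and similarly for y and t).
```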

3.4. Experiments

A series of experiments using synthetic sequences tends to support the hypothesis that the non-ideal response of the differentiator systematically biases optical flow estimation, unless compensatory low-pass filtering is performed. The procedure for each experiment run is as follows. First, a two-dimensional image with prescribed spatial frequency content is generated. The image is an instance of independent, identically-distributed white Gaussian noise that has been passed through a filter with frequency-response characteristic

$$H_{\rho_1, \rho_2}(\omega_1, \omega_2) = \begin{cases} 1 & \text{if } \rho_1 \le \sqrt{\omega_1^2 + \omega_2^2} \le \rho_2 \\ 0 & \text{otherwise.} \end{cases}$$

This filter is an isotropic band-pass filter whose pass band is delimited by the parameters ρ1 and ρ2. (The form of the filter is depicted in Fig. 1.) Three images are generated using the parameters (ρ1, ρ2) = (0.0, 0.2), (0.2, 0.4), and (0.4, 0.6), respectively. (Frequencies are normalized such that the Nyquist frequency is one.) These parameters enable the comparison of the relative optical flow estimation error resulting from information originating in different spatial frequency regions.

Figure 1. Isotropic two-dimensional band-pass filter.

Figure 2. The low-, medium-, and high-frequency test images.

Let these three images be
called the low-, medium-, and high-frequency images, respectively. Single frames of these test sequences are depicted in Fig. 2. Each of the three images is rotated about its center by a fixed angular increment per frame to generate an image sequence that specifies a rotating motion field. This rotating motion field has the property that the velocity at each pixel is unique and constant over time. In the experiments reported here, the image is rotated by 2° per frame and the resulting sequence is 179 frames of 128 × 128 pixels each. Each synthetic image sequence is then processed to estimate the optical flow. In each case, the integration neighborhood is the 5 × 5 neighborhood weighted by the separable kernel P5. Flow estimates for which the least eigenvalue of the least-squares system is less than one are rejected as unreliable. Both of these settings are identical to those used by Barron et al. The resulting optical flow estimates at each pixel are integrated over time to collect mean and variance statistics.

Three sets of experiments were performed, each using a different set of differentiators. The first case used the standard 5-point differentiator D5 = [−1, 8, 0, −8, 1]/12 in each of the axial directions, without any correcting low-pass operations in the non-differentiating directions. The second case used a 9-point first derivative of the Gaussian (σ = 1.5 pixels) in the direction of the derivative and the 9-point Gaussian low-pass (σ = 1.5 pixels) in the other two directions.
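For reference, one way to synthesize such a band-pass test image is sketched below. This is my reconstruction of the stated procedure, with NumPy assumed and the per-frame rotation delegated to scipy.ndimage.rotate; it is not the author's original code.

```python
import numpy as np
from scipy.ndimage import rotate

def bandpass_image(n=128, rho1=0.2, rho2=0.4, seed=0):
    """White Gaussian noise passed through the isotropic band-pass filter H with
    pass band rho1 <= |omega| <= rho2 (Nyquist frequency normalized to one),
    i.e. the construction of the medium-frequency test image."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n, n))
    f = np.fft.fftfreq(n, d=0.5)                 # spacing 0.5 puts the Nyquist frequency at 1.0
    radius = np.sqrt(f[:, None]**2 + f[None, :]**2)
    H = ((radius >= rho1) & (radius <= rho2)).astype(float)
    return np.real(np.fft.ifft2(np.fft.fft2(noise) * H))

def rotated_frame(image, frame, degrees_per_frame=2.0):
    """One frame of the rotating sequence: the base image rotated about its center."""
    return rotate(image, angle=degrees_per_frame * frame, reshape=False, mode="nearest")

base = bandpass_image()
sequence = [rotated_frame(base, k) for k in range(179)]
```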


Figure 3. The aggregate flow estimation results for the medium-frequency sequence. From left to right: the mean x-component, the mean y-component, and the density. From top to bottom: the D5 differentiator, matched Gaussian differentiator, matched Simoncelli differentiator. The black lines in the flow component images are equi-velocity contours.

Finally, the third case used the 5-point Simoncelli matched differentiator/low-pass pair.

Figure 3 depicts the aggregate flow estimation results for the medium-frequency sequence. The top row contains the mean x- and y-velocity components, as well as the estimate density per pixel (over time) that results from using the D5 differentiator with no compensatory low-pass filtering. The bias introduced by the use of this differentiator is evident in the dependence of the x-component on the x position and similarly of the y-component on the y position. (The ideal x-component value is a linear function of y only, and the ideal y-component value is a linear function of x only.) The bias in the x-component is roughly symmetric with respect to y, and anti-symmetric with respect to x. The middle row of Fig. 3 contains the corresponding results from using the matched Gaussian differentiator and low-pass filters. The bias effect is reduced but is still present, as evidenced by the slight curvature of the equi-velocity contours. In addition, the density of the estimates has been greatly reduced, mainly because the Gaussian eliminates too much high-frequency information. The bottom row contains corresponding results from using the matched Simoncelli filters. In this case the bias is reduced even further, relative to the results derived from the Gaussian filters, while maintaining a relatively high estimation density.

Figure 4. The absolute values of the relative error of the estimated x-component flow velocity are plotted as functions of the actual flow velocity magnitude. The inputs are filtered white Gaussian noise (see text), namely the low-frequency (solid), medium-frequency (dashed), and high-frequency (dash-dotted curve) images. The top plot depicts the case of no compensatory low-pass filtering. The middle plot is the case of matched nine-point Gaussian filters. The bottom plot uses Simoncelli's matched filters. All velocities are in units of pixels per frame.

The comparative performance of these three types of differentiation strategies can be assessed more precisely when the data are presented graphically. Figure 4 depicts the absolute value of the relative error in the x-component of the flow estimate (similar results occur for the y-component) as a function of the flow velocity magnitude. That is, it is the mean of

$$\frac{|\hat{u} - u|}{0.5 + |u|}$$

as a function of $\sqrt{u^2 + v^2}$.

The 0.5 in the above fraction compensates for the singularity that otherwise occurs at u = 0. It is clear from Fig. 4 that the optical flow is more accurately estimated from the low-frequency image than from the medium- and high-frequency images. It should be noted that the aliasing limit for flow magnitude estimation is 1.67 pixels per frame for the high-frequency image and 2.5 pixels per frame for the medium-frequency image. Also, the Gaussian filter pair effectively eliminates most of the information in the high-frequency sequence, at least for velocities greater than .75 pixels per frame. Nevertheless, Fig. 4 demonstrates that significant estimation bias occurs for these images well within the non-aliasing region, and it appears to be roughly in agreement with (9). That is, it approaches zero when the velocity magnitude approaches one. This bias pattern is particularly apparent for the case of the high-frequency image. When the nine-point Gaussian filters are used, no flow estimates are produced for the high-frequency image for velocity magnitudes that are greater than one. In this case, the high degree of temporal smoothing eliminates the flow information.

The simulations indicate that compensatory low-pass filtering can significantly reduce systematic gradient estimation errors, therefore obviating the need for extensive low-pass pre-filtering. Consequently, the estimator can have smaller overall support and can make better use of medium- and high-frequency information. Figure 4 also indicates that Simoncelli's matched kernel pairs generally make more efficient use of higher frequency information than the matched Gaussian filters.

In summary, these experiments appear to confirm the existence of systematic flow-estimation errors of the form predicted in (9) and (10). The errors appear to be particularly severe when the velocity magnitude is greater than one and, in general, for high-frequency information.

3.5. Evaluation of Aggregate Error

The error measure suggested by Barron et al. provides a means to evaluate the aggregate accuracy of the estimator as it depends on the differentiator/pre-filter combination. Specifically, they define the flow estimation error to be the angle between the unit vectors $(u, v, 1)/\|(u, v, 1)\|$ and $(\hat{u}, \hat{v}, 1)/\|(\hat{u}, \hat{v}, 1)\|$. Otte and Nagel (1994) comment that the angle-based error measure proposed by Barron et al. has the shortcomings that it is not symmetric with respect to the direction of the flow error, and that it weights errors in small magnitude flows more heavily relative to errors in large magnitude flows. Nevertheless, this work reports error in terms of Barron's measure in order to provide a basis of comparison with that work. Tables 1–3 contain the means and standard deviations of this error measure, as well as the densities of valid flow estimates, for a set of image sequences that have known flow values. The first three sequences are identical to those that are evaluated by Barron et al.
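For completeness, the angular error measure can be computed as follows; this is a direct transcription of the definition just given, not code from either paper.

```python
import numpy as np

def angular_error_deg(u, v, u_hat, v_hat):
    """Barron et al.'s error: the angle, in degrees, between the unit vectors
    (u, v, 1)/||(u, v, 1)|| and (u_hat, v_hat, 1)/||(u_hat, v_hat, 1)||."""
    a = np.array([u, v, 1.0])
    b = np.array([u_hat, v_hat, 1.0])
    cos_angle = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

# Example: a 5% overestimate of a unit horizontal flow gives an error of about 1.4 degrees.
print(angular_error_deg(1.0, 0.0, 1.05, 0.0))
```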


Table 1. Aggregate error figures with parameter settings as reported by Barron et al.

Sequence            Average error   Standard deviation   Density
Translating tree    0.67°           0.63°                43.41%
Diverging tree      2.03°           1.88°                50.95%
Yosemite            4.27°           10.21°               38.06%
Low frequency       2.33°           1.87°                80.74%
Medium frequency    1.47°           0.97°                64.25%
High frequency      1.75°           0.95°                3.02%

Table 2. Aggregate error figures using five-point Simoncelli differentiator and no pre-filtering.

Sequence            Average error   Standard deviation   Density
Translating tree    1.04°           2.77°                55.94%
Diverging tree      2.31°           3.40°                63.11%
Yosemite            6.00°           12.14°               55.83%
Low frequency       2.11°           1.83°                95.27%
Medium frequency    1.07°           0.80°                99.47%
High frequency      4.82°           15.35°               85.34%

Table 3. Aggregate error figures using five-point Simoncelli differentiator and 3 × 3 × 3 pre-filter.

Sequence            Average error   Standard deviation   Density
Translating tree    0.64°           0.84°                49.88%
Diverging tree      1.95°           1.85°                55.38%
Yosemite            4.68°           10.60°               48.29%
Low frequency       2.12°           1.81°                91.05%
Medium frequency    1.13°           0.85°                91.79%
High frequency      1.25°           0.68°                39.67%

The second three sequences are the low-, medium-, and high-frequency rotating sequences that are described in the preceding section. Table 1 contains the accuracy statistics resulting when the parameters are set as described by Barron et al. (The figures obtained here differ slightly from those reported by Barron et al., perhaps due to a slight difference in the pre-filtering kernel.) That is, the input is pre-filtered with an 11 × 11 × 11-point Gaussian kernel (σ = 1.5) and the differentiator is D5 with no compensatory low-pass filtering. The least-squares neighborhood and eigenvalue threshold are the same as described in the preceding section. The overall support of the estimator is 19 × 19 × 15 pixels. Table 2 is for the case of the 5-point matched differentiator/low-pass filter pair with no pre-filtering. In this case, the overall support of the estimator is
9 × 9 × 5 pixels. Table 3 is for the case of the 5-point matched differentiator/low-pass filter pair with the 3 × 3 × 3 pre-filter defined by the separable kernel P3 = [.25, .5, .25]. In this case, the overall support of the estimator is 11 × 11 × 7 pixels.

When the D5 differentiator is replaced by the 5-point matched differentiator/low-pass filter pair and the pre-filter is removed, the aggregate performance of the estimator generally deteriorates, although the density increases dramatically. Apparently, much useful flow information is rejected by the Gaussian pre-filter. The most notable density jump is for the case of the high-frequency synthetic image. With the pre-filter, only 3% of the pixels in this image result in flow estimates. But when the pre-filter is removed, the density increases to 85%. Interestingly, the accuracy of the medium-frequency synthetic image actually increases while the estimation density jumps from 64% to 99%. This improvement is remarkable considering that the estimator support is now roughly 13 times smaller than its original size. The improvement can be ascribed to the strong effect that gradient error compensation has on medium-frequency information, as is apparent in Figs. 3 and 4. The slight increase in estimation error and standard deviation observable in Table 2 indicates that the pre-filter does achieve some desirable noise rejection. However, the large-support Gaussian rejects too much useful flow information, and is perhaps overkill when the matched differentiator/low-pass filter is used. A smaller-support low-pass filter may provide sufficient noise rejection while passing useful flow information.

Table 3 confirms this conjecture. When the small-support (3 × 3 × 3) pre-filter is combined with the matched differentiator/low-pass filter pair, the error figures approach those of the original case, while the density figures remain relatively high. The result is startling, considering that the overall support of the estimator is now 6 times smaller than its original size. Note that the accuracies for the synthetic, rotating images have improved over the original figures. Probably this is due to the fact that the smaller support size results in greater flow-field resolution. The results in Tables 1–3 provide further confirmation that the matched differentiator/low-pass pair is able to compensate for gradient estimation errors, and produce accurate flow estimates with relatively small overall support. Additional tests with larger-support matched differentiator/low-pass pairs produced no accuracy improvements.

Some pre-filtering is still beneficial. However, this pre-filtering is now primarily for the purpose of noise rejection and anti-aliasing. That is, by using the matched kernels, pre-filtering is mostly decoupled from the problem of systematic gradient estimation error. Therefore, the choice of pre-filter depends mostly on the input SNR. This enables more efficient use of medium- and high-frequency flow information, and potentially much higher flow-field resolution.

4. Choice of Optimization Neighborhood

Little attention has been paid to the choice of weighted neighborhood that is used to form the local linear least-squares system that determines the optical flow values. The preceding sections demonstrated that changing the support of the differentiator and pre-filter kernels can have a significant impact on the resulting flow estimation accuracy. Naturally, this property should apply to the neighborhood kernel as well. Generally, increasing the support of the neighborhood should increase the estimation density, while it reduces the overall accuracy. The latter follows from the fact that the flow value is assumed to be constant, or nearly constant, within the neighborhood. As the size of the neighborhood increases, this assumption breaks down.

Table 4 contains the aggregate error values that result from assuming the same parameter settings as those in Table 3, but with the least-squares neighborhood changed to the 3 × 3 × 3 neighborhood determined by the separable kernel P3. This case is interesting because it increases the number of effective constraints per estimate from 25 to 27, while it reduces the overall support size to 9 × 9 × 9 (a 14% reduction). The change yields mixed results. There is a slight improvement for some cases, notably the rotating fields. This is understandable because the motion field in this case is temporally constant, and can therefore only benefit by integrating the flow constraints in the temporal direction. The performance for the translating and diverging tree sequences deteriorates somewhat (in terms of standard deviation). This is probably due to the fact that these images are somewhat sparse, containing large flat regions. Reducing the spatial extent of the neighborhood integrator increases the effective areas of the flat regions of these images. Nevertheless, the deterioration is marginal, and the 3 × 3 × 3 neighborhood kernel is probably preferable to the 5 × 5 kernel, unless prior information is given concerning the flow field (such as the fact that it is spatially constant), since the 3 × 3 × 3 kernel optimizes the space/time resolution of the estimator symmetrically.

Table 4. Aggregate error figures for the weighted 3 × 3 × 3 least-squares neighborhood case.

Sequence            Average error   Standard deviation   Density
Translating tree    0.73°           2.31°                50.02%
Diverging tree      1.81°           2.10°                48.78%
Yosemite            4.33°           9.91°                46.12%
Low frequency       1.25°           1.44°                91.36%
Medium frequency    0.71°           0.72°                92.56%
High frequency      0.97°           1.62°                37.73%

A question remains regarding the importance of the weights themselves in the weighted least-squares step of the algorithm. The kernels are generally centrally-weighted in order to give dominance to the central motion constraint. Central weighting tends to increase the flow-field resolution, but with some loss of density, since peripheral pixels that have strong flow information do not contribute significantly to the least-squares system. Table 5 contains the aggregate accuracy figures for the case of a uniformly-weighted 5 × 5 least-squares neighborhood kernel, while Table 6 contains figures for the case of a uniformly-weighted 3 × 3 × 3 kernel. As expected, using the uniformly-weighted kernels tends to reduce the accuracy (in terms of mean error and standard deviation) while it increases the density. However, the density increase is fairly marked, while the accuracy decrease is slight.

Table 5. Aggregate error figures for the uniformly-weighted 5 × 5 least-squares neighborhood case.

Sequence            Average error   Standard deviation   Density
Translating tree    0.61°           1.55°                57.87%
Diverging tree      2.03°           1.90°                62.90%
Yosemite            4.85°           11.06°               54.18%
Low frequency       2.15°           1.79°                97.87%
Medium frequency    1.27°           0.87°                96.36%
High frequency      1.46°           0.75°                42.36%

Table 6. Aggregate error figures for the uniformly-weighted 3 × 3 × 3 least-squares neighborhood case.

Sequence            Average error   Standard deviation   Density
Translating tree    0.72°           2.44°                53.26%
Diverging tree      1.82°           2.14°                52.17%
Yosemite            4.38°           10.02°               49.40%
Low frequency       1.26°           1.47°                94.56%
Medium frequency    0.72°           0.74°                94.85%
High frequency      0.96°           1.47°                39.00%

The foregoing results suggest that the optimum choice of neighborhood kernel depends greatly on the flow-field dynamics and the image spatial structure. In the absence of prior knowledge, the centrally-weighted 3 × 3 × 3 kernel is probably best. However, further improvement might be made by developing an algorithm that can adapt the neighborhood kernel according to the image data and flow-field dynamics. Certainly, the various multi-resolution flow estimation schemes (see, for example, (Enkelmann, 1988; Weber and Malik, 1993, 1995)) effectively adapt the constraint integration neighborhood.
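As an implementation note (my own sketch, assuming SciPy's ndimage module), the centrally-weighted 3 × 3 × 3 integration can be carried out with three separable 1-D convolutions of each derivative product by P3 = [.25, .5, .25]:

```python
import numpy as np
from scipy.ndimage import convolve1d

P3 = np.array([0.25, 0.5, 0.25])      # centrally-weighted 3-tap kernel per axis

def neighborhood_sum(product_volume):
    """Apply the separable 3x3x3 weighting to a volume of derivative products
    (for example fx*fx), giving one weighted sum per pixel and frame."""
    out = product_volume
    for axis in range(3):              # the t, y, and x axes of a (T, H, W) volume
        out = convolve1d(out, P3, axis=axis, mode="nearest")
    return out

# Usage: with derivative volumes fx, fy, ft of shape (T, H, W),
# Mxx = neighborhood_sum(fx * fx), Mxy = neighborhood_sum(fx * fy), and so on;
# the 2x2 system of Eq. (4) is then solved independently at every pixel.
```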

5. Choice of Reliability Test

The final step of the algorithm is to invert the system of equations specified by (4) and thereby obtain the instantaneous flow values (u, v). Often, the system is under-determined, or ill-conditioned, because the gradient direction does not vary sufficiently within the integration neighborhood (Nagel, 1987). A reliable flow value cannot be determined if this is the case. A second type of failure can occur that is sometimes called model failure. Model failure occurs when the constancy constraint specified in (1) is violated, or when the flow varies rapidly within the integration neighborhood (thus violating the tacit assumption that u and v are locally constant). Either case must be detected reliably in order to reject bad flow estimates.

5.1. Detecting Failure Due to Ill-Conditioning

Figure 5 depicts several possible eigenvalue-based detectors that can be used to screen out flow estimates that are unreliable due to ill-conditioning. The first is the reliability test suggested by Barron et al., namely that the minimum of the two eigenvalues exceed a threshold. The second is the sum of eigenvalues suggested by Simoncelli et al. (1991, 1993). The third is the determinant. The fourth is the reciprocal of the matrix condition number—namely the ratio of the minimum to the maximum eigenvalue. Note that the matrix in question is positive semi-definite and so the two eigenvalues are real and non-negative.
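All four measures follow directly from the 2 × 2 system matrix of Eq. (4); the sketch below (illustrative only, NumPy assumed) computes them side by side.

```python
import numpy as np

def reliability_measures(Mxx, Mxy, Myy):
    """The four eigenvalue-based reliability measures discussed in the text."""
    M = np.array([[Mxx, Mxy], [Mxy, Myy]])
    lam_min, lam_max = np.linalg.eigvalsh(M)       # ascending; non-negative for this matrix
    return {
        "min_eigenvalue": lam_min,                  # Barron et al.
        "eigenvalue_sum": lam_min + lam_max,        # Simoncelli et al. (the trace of M)
        "determinant": lam_min * lam_max,           # equals Mxx*Myy - Mxy**2
        "reciprocal_condition": lam_min / lam_max if lam_max > 0 else 0.0,
    }

print(reliability_measures(Mxx=4.0, Mxy=1.0, Myy=2.0))
```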


Figure 5. Several flow reliability tests are determined by the eigenvalues of the least-squares system.

The question is, how well does each of these correlate with the actual flow estimation error? Figures 7–10 depict the mean values of each of the four reliability measures as functions of the actual flow estimation error (measured as the base 10 logarithm of the magnitude of the absolute error). The distribution of the error values is shown in Fig. 6. The data are from the diverging tree image. These data indicate that the sum of eigenvalues and the reciprocal of the condition number are poor reliability indicators, while the minimum of the eigenvalues and the determinant appear to be relatively good indicators. Note that these latter two tests have the shared characteristic, as depicted in Fig. 5, that they reject estimates that get too close to either of the eigenvalue axes.

Figure 6. Histogram of flow estimation error, measured as the base 10 logarithm of the magnitude of the absolute error.

Figure 7. Mean and standard deviation of the minimum of the eigenvalues as a function of the actual estimation error (log of error magnitude).

Figure 8. Mean and standard deviation of the sum of the eigenvalues as a function of the actual estimation error (log of error magnitude).

Figure 9. Mean and standard deviation of the system determinant as a function of the actual estimation error (log of error magnitude).

Figure 10. Mean and standard deviation of the reciprocal of the system condition number as a function of the actual estimation error (log of error magnitude).

5.2. Detecting Model Failure

The second type of failure that can occur, namely model failure, cannot be detected, in general, by examining the eigenvalues of the least-squares system because such instances can easily result in a well-conditioned system. However, it is possible that the residual error of the least-squares system is related to model failure errors. The value of the residual ε, as defined in (2), for the minimizing values of (u, v) is

$$\epsilon = (u, v) \left[ \begin{pmatrix} M_{xx} & M_{xy} \\ M_{xy} & M_{yy} \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} + 2 \begin{pmatrix} M_{xt} \\ M_{yt} \end{pmatrix} \right] + M_{tt},$$

where $M_{tt} = \sum w_i \bigl( f_t^{(i)} \bigr)^2$. However, examination of (2) reveals that ε grows with the square of the image contrast. That is, if $\hat{f} = \alpha f$

for some scalar α, then $\hat{\epsilon} = \alpha^2 \epsilon$. So the residual should be normalized by dividing by the term $\sum w_i \bigl[ (f_x^{(i)})^2 + (f_y^{(i)})^2 + (f_t^{(i)})^2 \bigr]$, which also grows with the square of the image contrast and expresses the “contrast energy” of the image. The resulting normalized residual is

$$\epsilon_n = \frac{\epsilon}{M_{xx} + M_{yy} + M_{tt}}.$$

(If the denominator in the above expression approaches zero, then the estimate is rejected by the eigenvalue threshold.) The normalized residual is consequently invariant to changes of brightness scale, as well as any input bias. (The latter is due to the fact that the flow estimates depend only on the derivatives of the input image.)

Figure 11 provides a visual comparison of the contributions of the minimum of eigenvalue threshold and the normalized residual threshold for the rotating high-frequency sequence. This case is interesting because the high-velocity flow values on the periphery of this image cannot be reliably estimated due to aliasing. Examination of this figure reveals that the normalized residual threshold assists in rejecting many of these unreliable high-velocity estimates, although the minimum of eigenvalue criterion also contributes in the high-velocity region.

Figure 11. The rotating high-frequency sequence demonstrates the complementarity of the residual threshold relative to the minimum of eigenvalue threshold. The top left and right images are the minimum of eigenvalue map and the residual map for one frame of the sequence. The middle left and right images are thresholded versions of each. The bottom image depicts those pixels that are above the residual threshold, but below the eigenvalue threshold. (Residual threshold = 0.01; eigenvalue threshold = 0.5.)

Figure 12 provides evidence of successful rejection by the normalized residual threshold of a second type of model failure: flow discontinuity. At the horizon of the Yosemite sequence the clouds are translating horizontally, while the ground is more-or-less fixed. In the lower-left portion of the image, a large occluding structure in the foreground produces a flow discontinuity along its silhouette. Both of these regions, in addition to some velocity overflow regions in the image periphery, are detected by the normalized residual threshold, while they are passed, for the most part, by the minimum of eigenvalue threshold.

Figure 12. Comparison of the roles of the eigenvalue and residual thresholds for a frame of the Yosemite sequence. The top left and right images are the minimum of eigenvalue map and the residual map. The middle left and right images are thresholded versions of each. The bottom image depicts those pixels that are above the residual threshold, but below the eigenvalue threshold. (Residual threshold = 0.005; eigenvalue threshold = 0.3.)

Table 7 contains the error and density figures that result when a residual test with a threshold of 0.02 is added to the existing minimum eigenvalue test. The remaining parameter settings are identical to those of Table 4. This additional criterion significantly reduces the mean error and standard deviation for the Yosemite image, while slightly reducing the estimate density. This quantitatively corroborates our observations in Fig. 12.

Table 7. Aggregate error figures when the normalized residual error threshold is applied.

Sequence            Average error   Standard deviation   Density
Translating tree    0.71°           2.14°                49.96%
Diverging tree      1.80°           2.06°                48.70%
Yosemite            3.69°           8.34°                44.38%
Low frequency       1.25°           1.44°                91.36%
Medium frequency    0.71°           0.72°                92.56%
High frequency      0.94°           0.60°                37.72%

Clearly, the eigenvalue and normalized residual thresholds are related. For instance, a low minimum eigenvalue leads, statistically, to a high residual. One could consider each criterion in isolation as being sufficient to eliminate a significant number of bad estimates. However, the fact is that these two criteria reject overlapping but non-identical sets of errors. Thus, the two criteria provide complementary information which can be used to select a more highly restricted and reliable set of estimates than could be selected using either one of the two criteria alone.
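A hedged sketch of the combined test described in this section is given below; the default thresholds are the values used in the experiments reported above (a minimum eigenvalue of one and a normalized residual of 0.02), and the function itself is illustrative rather than the author's implementation.

```python
import numpy as np

def accept_estimate(Mxx, Mxy, Myy, Mxt, Myt, Mtt,
                    eig_threshold=1.0, residual_threshold=0.02):
    """Solve the 2x2 system of Eq. (4), then apply both the minimum-eigenvalue
    test and the normalized-residual test.  Returns (u, v) or None."""
    M = np.array([[Mxx, Mxy], [Mxy, Myy]])
    b = np.array([Mxt, Myt])
    if np.linalg.eigvalsh(M)[0] < eig_threshold:
        return None                                   # ill-conditioned: aperture problem
    uv = np.linalg.solve(M, -b)
    residual = uv @ (M @ uv + 2.0 * b) + Mtt          # residual of Eq. (2) at the minimizer
    if residual / (Mxx + Myy + Mtt) > residual_threshold:
        return None                                   # likely model failure or aliasing
    return tuple(uv)
```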


6. Discussion

The analysis and experiments of the preceding sections result in a series of performance improvements relative to the performance results reported by Barron et al. The improvements arise from each of the four stages of the algorithm—the pre-filter, the differentiator, the neighborhood integrator, and the reliability test. The analysis and experiments also provide new understanding regarding the tradeoffs among accuracy, density, and resolution that exist with respect to the algorithm. Specifically, the following lessons derive from the investigation reported here:

1. Differentiation requires compensatory low-pass filtering in the non-differentiating axial directions to correct for distortions introduced by the non-ideal response of the differentiator. In the absence of this compensation, excessive low-pass pre-filtering is required, which has the effect of reducing flow estimation density and flow-field resolution.

2. With compensatory low-pass filtering as part of the differentiation step, the pre-filtering step and the differentiation step are mostly decoupled. Thus the choice of pre-filter depends primarily on the input noise characteristics. Nevertheless, a small-support low-pass filter was observed to be beneficial in all cases, even for the “noiseless” synthetic sequences. Certainly, quantization effects can be reduced by such a pre-filter.

3. The neighborhood integrator can be made more compact by using a 3D kernel. The result is greater accuracy and less overall support.

4. Central weighting of the neighborhood integrator tends to increase the flow resolution and accuracy, but reduce estimation density.

5. The best choice of integration neighborhood depends strongly on the local flow dynamics and on the image intensity structure. Therefore, further performance improvements are possible through the use of an adaptive neighborhood.

6. The overall support of the algorithm is determined by the collective supports of the pre-filter, the differentiator, and the neighborhood integrator. In general, increasing the support increases the estimation density, while it reduces flow-field resolution. This suggests the use of an adaptive support, such as is effectively employed by the multi-resolution (e.g., (Weber and Malik, 1993, 1995)) and filter-bank methods (e.g., (Fleet and Jepson, 1990)).

7. Flow reliability tests that reject estimates for which the minimum eigenvalue is near zero are strongly correlated with errors due to ill-conditioning of the least-squares system. However, examination of this relationship suggests that other errors, such as those due to model failure, can be undetectable by eigenvalue tests.

8. The least-squares residual is correlated with model failure and also with failures due to temporal aliasing. The use of a residual threshold improves overall flow accuracy, while not excessively decreasing the density of the estimates.

The fundamental tradeoff regarding this algorithm is that increasing the overall support increases flow estimation density, while it decreases resolution and accuracy. Perhaps the most obvious way to cope with this apparent conflict is to adopt a multi-resolution estimation scheme. That is, generate a set of flow estimates for each pixel that are each based on a particular spatial resolution level, and then robustly combine these estimates using some statistical test. The use of such an estimator requires that each single-resolution estimator is tuned to perform as well as possible. Thus the results presented in this paper are certainly applicable to the multi-resolution case.

Related future work includes the development of an adaptive least-squares neighborhood integrator that can detect and adjust to the presence of spatial and temporal flow-field transitions. In addition, a quantitative measure of the effective resolution of the estimated flow field is lacking. In this paper, Barron et al.'s accuracy and density measures are employed in order to provide a basis for comparison. However, resolution assessment is currently only qualitative. Analysis of the algorithm response to space/time flow-field transitions under general conditions, as well as an experimental procedure to quantitatively assess this response, is required.

7. Conclusions

The design of the gradient-based, local optimization, optical flow estimation algorithm involves the selection of the pre-filter, the differentiator, the neighborhood integrator, and the reliability test. The goal is to make the choice that results in the best performance, in terms of flow accuracy, estimation density, and flow-field resolution, according to the specified operating parameters, at a reasonably low computational cost. Systematic investigation of the effects of these choices yields new information and suggests a series of changes to previously-suggested parameter settings that result in significant overall performance improvements.

Acknowledgment

The author is grateful for the encouragement and support of Prof. Makoto Miyahara at JAIST, and for the
help of the reviewers in improving the quality of this paper.

References

Adelson, E.H. and Bergen, J.R. 1986. The extraction of spatiotemporal energy in human and machine vision. In Proc. IEEE Workshop on Visual Motion, pp. 151–156.
Barron, J.L., Fleet, D.J., Beauchemin, S.S., and Burkitt, T.A. 1993. Performance of optical flow techniques. Technical Report TR-299, Dept. of Computer Science, Univ. of Western Ontario, July 1992. Revised July 1993.
Barron, J.L., Fleet, D.J., and Beauchemin, S.S. 1994. Performance of optical flow techniques. Int. J. Computer Vision, 12(1):43–77.
Brandt, J.W. 1994a. Analysis of bias in gradient-based optical-flow estimation. In Proc. Twenty-Eighth Annual Asilomar Conference on Signals, Systems, and Computers, pp. 721–725.
Brandt, J.W. 1994b. Derivative-based optical flow estimation: Controlled comparison of first- and second-order methods. In IAPR Workshop on Machine Vision Applications, pp. 464–469.
Brandt, J.W. 1994c. Finite-differencing errors in gradient-based optical flow estimation. In Proc. First IEEE Int. Conf. Image Processing, vol. II, pp. 775–779.
Cafforio, C. 1982. Remarks on the differential method for estimation of movement in television images. Signal Processing, 4:45–52.
Cafforio, C. and Rocca, F. 1979. Tracking moving objects in television images. Signal Processing, 1:133–140.
Campani, M. and Verri, A. 1990. Computing optical flow from an overconstrained system of linear algebraic equations. In Proc. Third IEEE Int. Conf. Computer Vision, pp. 22–26.
Enkelmann, W. 1988. Investigations of multigrid algorithms for the estimation of optical flow fields in image sequences. Computer Vision, Graphics, and Image Processing, 43:150–177.
Fleet, D.J. and Jepson, A.D. 1990. Computation of component image velocity from local phase information. Int. J. Computer Vision, 5:77–104.
Heeger, D.J. 1987. Model for the extraction of image flow. J. Optical Soc. America A, 4(8):1455–1471.
Horn, B.K.P. and Schunck, B.G. 1981. Determining optical flow. Artificial Intelligence, 17:185–203.
Kearney, J.K., Thompson, W.B., and Boley, D.L. 1987. Optical flow estimation: An error analysis of gradient-based methods with local optimization. IEEE Trans. Pattern Analysis and Machine Intelligence, 9(2):229–244.
Lucas, B. and Kanade, T. 1981a. An iterative image registration technique with an application to stereo vision. In Proc. DARPA Image Understanding Workshop, pp. 121–130.
Lucas, B. and Kanade, T. 1981b. An iterative image registration technique with an application to stereo vision. In Proc. 5th Int. Joint Conf. Artificial Intelligence, pp. 674–679.
Lucas, B. and Kanade, T. 1985. Optical navigation by the method of differences. In Proc. 7th Int. Joint Conf. Artificial Intelligence, pp. 981–984.
Nagel, H.-H. 1987. On the estimation of optical flow: Relations between different approaches and some new results. Artificial Intelligence, 33:299–324.
Nomura, A., Miike, H., and Koga, K. 1993. Detecting a velocity field from sequential images under time-varying illumination. In Time-Varying Image Processing and Moving Object Recognition, V. Cappellini (Ed.), Elsevier, vol. 3, pp. 343–350.
Otte, M. and Nagel, H.-H. 1994. Optical flow estimation: Advances and comparisons. In Lecture Notes in Computer Science, ECCV '94, Jan-Olof Eklundh (Ed.), vol. 800, pp. 51–60.
Simoncelli, E.P. 1993. Distributed Representation and Analysis of Visual Motion. Ph.D. Thesis, Dept. of Electrical Engineering and Computer Science, MIT.
Simoncelli, E.P. 1994. Design of multi-dimensional derivative filters. In Proc. First IEEE Int. Conf. Image Processing, vol. I, pp. 790–779.
Simoncelli, E.P., Adelson, E.H., and Heeger, D.J. 1991. Probability distributions of optical flow. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 310–315.
Tistarelli, M. 1994. Multiple constraints for optical flow. In Lecture Notes in Computer Science, ECCV '94, Jan-Olof Eklundh (Ed.), vol. 800, pp. 61–70.
Uras, S., Girosi, F., Verri, A., and Torre, V. 1988. A computational approach to motion perception. Biol. Cybern., 60:79–87.
Verri, A., Girosi, F., and Torre, V. 1990. Differential techniques for optical flow. J. Opt. Soc. Am. A, 7(5):912–922.
Weber, J. and Malik, J. 1993. Robust computation of optical flow in a multi-scale differential framework. In Proc. IEEE Int. Conf. Computer Vision, pp. 12–20.
Weber, J. and Malik, J. 1995. Robust computation of optical flow in a multi-scale differential framework. Int. J. Computer Vision, 14:67–81.

Tistarelli, M. 1994. Multiple constraints for optical flow. In Lecture Notes in Computer Science, ECCV ’94, Jan-Olof Eklundh (Ed.), vol. 800, pp. 61–70. Uras, S., Girosi, F., Verri, A., and Torre, V. 1988. A computational approach to motion perception. Biol. Cybern., 60:79– 87. Verri, A., Girosi, F., and Torre, V. 1990. Differential techniques for optical flow. J. Opt. Soc. Am. A, 7(5):912–922. Weber, J. and Malik, J. 1993. Robust computation of optical flow in a multi-scale differential framework. Proc. IEEE Int. Conf. Computer Vision, pp. 12–20. Weber, J. and Malik, J. 1995. Robust computation of optical flow in a multi-scale differential framework. Int. J. Computer Vision, 14:67–81.