Image and Vision Computing 17 (1999) 309–320
Detection of motion fields under spatio-temporal non-uniform illumination

Lin Zhang*, Tatsunari Sakurai, Hidetoshi Miike

Graduate School of Science and Engineering, Yamaguchi University, 2557 Tokiwadai, Ube 755, Japan

Received 5 September 1997; received in revised form 5 January 1998; accepted 5 May 1998
Abstract

In actual scene analysis, the influence of non-ideal conditions such as non-uniform illumination should be taken into account. The assumptions underlying the conventional methods for the estimation of motion fields are violated in this situation. In this study, two approaches are proposed to extract reliable motion fields under spatio-temporal non-uniform illumination: an extended constraint equation with spatio-temporal local optimization, and a pixel-based temporal filtering. Experiments confirm the performance of the proposed methods and clarify the differences between their characteristics. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Motion field; Non-uniform illumination; Spatio-temporal local optimization; Temporal filtering
1. Introduction

In the field of computer vision [1], many studies have aimed at obtaining information on the three-dimensional (3D) environment from image sequences. One of the most important problems is to determine optical flow, the distribution of apparent velocities of moving brightness patterns in an image sequence [2,3,18]. Optical flow results from relative motion between a camera and objects in the scene. A number of different approaches to determining optical flow have been proposed, including gradient-based, correlation-based, energy-based and phase-based methods. A recent survey is given by Barron et al. (1994) [5], where the different approaches are compared on a series of synthetic and real images.

In actual scene analysis, however, the performance of the conventional methods is not satisfactory, because of the influence of non-ideal conditions in the actual scene, for example non-uniform illumination [8], occlusions [9], multiple optical flows [10,11], non-rigid object motion [12] and diffusion [13]. To obtain a reliable optical flow, such problems must be taken into account. Recently, our research group proposed two methods for determining motion fields from sequential images under spatially or temporally non-uniform illumination [8,14]. The methods are based on an extended conservation equation, obtained by observing the total brightness change in a fixed local closed area. One of the methods assumes spatially non-uniform illumination and a stationary motion field; the other assumes temporally non-uniform illumination and local constancy of the motion vectors. With these methods, 2D motion fields of fluid flow can be determined under spatially or temporally non-uniform illumination separately [8].

In the ordinary approach, on the other hand, noise reduction and contrast enhancement of images are based on two-dimensional (2D) spatial filtering [4]. For these purposes, digital filtering with the 2D Fast Fourier Transform (FFT) and non-linear filters such as the median filter are frequently introduced. These space-domain approaches are effective for static image processing; however, it is usually difficult to remove the influence of non-uniform illumination (spatial and temporal) from a dynamic image sequence.

In this paper, we develop a new algorithm to cope with the two co-existing conditions of non-uniform illumination. We tested two approaches. The first method introduces the extended constraint equation with spatio-temporal local optimization. The second method introduces a new method of temporal filtering that enables the reduction of the influence of non-uniform illumination. The performance of the proposed methods is confirmed by the use of synthetic and real image sequences.

* Corresponding author. E-mail: [email protected]
2. Background of extended constraint equation

An extended constraint equation is derived from a conservation law of total brightness [14,16,17] in a fixed small region δS, as illustrated in Fig. 1:

\frac{\partial}{\partial t} \iint_{\delta S} f \, dS = - \oint_{\delta C} f \, \vec{v} \cdot \vec{n} \, dC + \iint_{\delta S} \phi \, dS, \qquad (1)

where f(x,y,t) is the spatio-temporal brightness distribution of the sequential images, δS is a fixed local observation area, δC is the contour surrounding δS, \vec{v} = (v_x, v_y) is the motion vector to be determined, \vec{n} is the unit vector normal to δC and pointing outwards, and φ is the rate of creation (or annihilation) of brightness at a pixel in δS. The creation term includes increasing or decreasing brightness on the image plane under the influence of non-uniform illumination. Eq. (1) reduces to a differential formula [14] in two dimensions:

\frac{\partial f}{\partial t} = - f \, \operatorname{div}(\vec{v}) - \vec{v} \cdot \operatorname{grad}(f) + \phi. \qquad (2)

Under the assumptions div(\vec{v}) = 0 [16] and φ = 0 [17], Eq. (2) coincides with the basic constraint equation of the gradient-based method [7]:

\frac{\partial f}{\partial t} = - \vec{v} \cdot \operatorname{grad}(f) = - v_x \frac{\partial f}{\partial x} - v_y \frac{\partial f}{\partial y}.

In this study, we adopt the following relationship for the determination of motion fields:

\frac{\partial f}{\partial t} = - \vec{v} \cdot \operatorname{grad}(f) + \phi. \qquad (3)

This relationship follows from Eq. (2) under the assumption div(\vec{v}) = 0, which requires rigid object motion perpendicular to the optical axis of the camera. Since this conservation equation contains the creation term of brightness, it is possible to estimate the effects of non-uniform illumination when detecting motion fields.

Nomura et al. (1995) [8] introduced an assumption of separability of non-uniform illumination. They assumed the spatio-temporal brightness distribution f(x,y,t) is

f(x, y, t) = r(x, y, t) \cdot g(x, y, t),

where r(x,y,t) represents the effect of non-uniform illumination and g(x,y,t) is the brightness distribution under uniform illumination.
Fig. 1. A schematic explanation of variables appearing in the conservation equation.
For the first situation, the illumination is assumed to be only spatially non-uniform and constant with respect to time (r = r(x,y), ∂r/∂t = 0). The reduced relationship is [8]

\frac{\partial f}{\partial t} + \vec{v} \cdot \operatorname{grad}(f) = f \sqrt{v_x^2 + v_y^2} \, q(x, y), \qquad (4)

where q(x,y) is an unknown constant. If the velocity field is locally constant with respect to time (δt), the motion vector \vec{v} and the unknown constant q(x,y) are determined by minimizing the following error function (temporal local optimization, TLO [14]) with a non-linear least-squares method (for example, the Newton-Raphson method):

E = \sum_{\delta t} \left( f_t + v_x f_x + v_y f_y - f q \sqrt{v_x^2 + v_y^2} \right)^2. \qquad (5)
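To make the estimator concrete, the following fragment is our illustrative sketch, not the authors' implementation: it minimizes the residuals of Eq. (5) at a single pixel, substituting SciPy's trust-region least-squares solver for the Newton-Raphson iteration mentioned above. All names (tlo_pixel, ft, fx, fy) are our assumptions.

```python
# Hypothetical sketch of TLO (Eq. (5)); inputs are 1-D arrays of the
# derivatives f_t, f_x, f_y and brightness f sampled over the temporal
# window dt at one pixel.
import numpy as np
from scipy.optimize import least_squares

def tlo_pixel(ft, fx, fy, f):
    """Estimate (vx, vy, q) by minimizing Eq. (5) at a single pixel."""
    def residuals(p):
        vx, vy, q = p
        speed = np.hypot(vx, vy)            # sqrt(vx^2 + vy^2)
        return ft + vx * fx + vy * fy - f * q * speed
    # Start away from (0, 0): the speed term is not differentiable there.
    sol = least_squares(residuals, x0=[0.1, 0.1, 0.0])
    return sol.x                            # (vx, vy, q)
```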
For the second situation, the illumination is assumed to be only temporally non-uniform and constant with respect to space (r = r(t), grad(r) = 0). The reduced relationship is [8]

\frac{\partial f}{\partial t} + \vec{v} \cdot \operatorname{grad}(f) = f \, \frac{\partial r(t)/\partial t}{r(t)} = f \, w(t), \qquad (6)

where w(t) is an unknown constant. If the velocity field is locally constant with respect to space (δS = δx·δy), the motion vector \vec{v} and the unknown constant w(t) are determined by minimizing the following error function with the linear least-squares method (spatial local optimization, SLO [15]):

E = \sum_{\delta x} \sum_{\delta y} \left( f_t + v_x f_x + v_y f_y - f w \right)^2. \qquad (7)
Here the parameters δx and δy represent the width of the local neighborhood δS in the x and y directions.

3. Proposed methods

3.1. A spatio-temporal local optimization

The extended constraint equation for the gradient-based method contains the creation term of brightness. Nomura et al. (1995) [8] separated this term under two different illumination conditions (spatially non-uniform or temporally non-uniform illumination) and determined it under each condition separately. In actual scenes, however, the two conditions of non-uniform illumination may co-exist. In this section, we propose a new model that fuses the two conditions of non-uniform illumination: the extended constraint equation with spatio-temporal local optimization [19].

The parameter φ(x,y,t) of Eq. (3) represents the influence of non-uniform illumination. Under the assumption that the illumination is only spatially non-uniform, φ₁ = f √(v_x² + v_y²) q(x,y). On the other hand, φ₂ = f w(t) when we assume that the illumination is only temporally non-uniform. Here q(x,y) represents the effect of spatially non-uniform illumination and w(t) represents that of non-stationary illumination.
Now we assume that under spatio-temporal non-uniform illumination the spatio-temporal brightness distribution f(x,y,t) is

f(x, y, t) = r(x, y, t) \cdot g(x, y, t) = r_1(x, y) \cdot r_2(t) \cdot g(x, y, t),

where r₁(x,y) represents the effect of spatially non-uniform illumination, r₂(t) represents the effect of temporally non-uniform illumination and g(x,y,t) is the virtual brightness distribution under uniform illumination. From Eq. (3), φ(x,y,t) is represented by the following equation [see Eq. (A12) in Appendix A]:

\phi(x, y, t) = \phi_1(x, y, t) + \phi_2(x, y, t) = f(x, y, t) \left[ \sqrt{v_x^2 + v_y^2} \, q(x, y) + w(t) \right]. \qquad (8)

When the unknown variables v_x, v_y are assumed to be constant in a local spatio-temporal volume δV = δx·δy·δt, Eq. (8) reduces to

\phi = f(x, y, t) \left( c \, q(x, y) + w(t) \right),

where c = √(v_x² + v_y²) = const. We also assume that the image function f(x,y,t) varies rapidly with respect to time and space compared with the effects of non-uniform illumination q(x,y) and w(t). Thus we again obtain a simplified relationship:

\frac{\partial f}{\partial t} = - \vec{v} \cdot \operatorname{grad}(f) + f \, w'(x, y, t), \qquad (9)

where w'(x,y,t) = c q(x,y) + w(t) is regarded as a constant parameter in the local volume δV. Since this equation contains the effects of both spatially non-uniform and non-stationary illumination, it can handle any combination of the two conditions of non-uniform illumination under the above assumptions, which are valid in a small spatio-temporal neighborhood δV = δx·δy·δt.

For the determination of the three unknown variables of Eq. (9), \vec{v} = (v_x, v_y) and w', we assume that the components of the motion vector and the unknown variable are constant with respect to time and space in the local volume δV:

w' = \text{const}, \quad v_x = \text{const}, \quad v_y = \text{const} \quad \text{in } \delta V. \qquad (10)

The motion vector \vec{v} and the unknown constant w' can then be determined by minimizing the following error function with the linear least-squares method:

E = \sum_{\delta x} \sum_{\delta y} \sum_{\delta t} \left( f_t + v_x f_x + v_y f_y - f w' \right)^2. \qquad (11)
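Because Eq. (11) is linear in (v_x, v_y, w'), each local volume reduces to an ordinary least-squares problem. The following sketch is our illustration of this step; the name esto_volume is ours, and the derivative arrays are assumed to be precomputed (e.g. by the differentiation scheme described in Section 4.2.1).

```python
# Hypothetical sketch of the estimator of Eq. (11) for one local volume dV
# (e.g. 5 x 5 pixels x 8 frames), with all samples flattened to 1-D arrays.
import numpy as np

def esto_volume(ft, fx, fy, f):
    """Solve min sum (f_t + vx*f_x + vy*f_y - f*w')^2; returns (vx, vy, w')."""
    A = np.stack([fx, fy, -f], axis=1)   # coefficient matrix for (vx, vy, w')
    b = -ft                              # move f_t to the right-hand side
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params                        # (vx, vy, w')
```

Note that the SLO estimator of Eq. (7) is the same computation with the samples drawn only from a spatial window δS (and w in place of w').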
We tentatively call this approach 'the extended constraint equation with spatio-temporal local optimization' (E-STO). Since this method can exploit many more equations than SLO and TLO, it is expected to determine more accurate motion fields. The comparison of E-STO, SLO and TLO is summarized in Table 1. It is also possible to introduce smoothness constraints or a regularization approach to solve Eq. (9).

3.2. The pixel-based temporal filtering

In this paper, we also introduce a new approach based on pixel-based image sequence processing [20]. For a sequential image, the temporal change of the brightness at each pixel is regarded as a time series. The proposed method is based on the digital Fourier transform and digital filtering, under the assumption of local constancy of the statistical characteristics of the time series. We evaluate the temporal development of the spectrum within a finite time at each pixel by the Fast Fourier Transform. After a digital filtering of the spectrum, an inverse transformation is carried out at each pixel site to create filtered image sequences (see Fig. 2).

Fig. 2. A schematic explanation of the pixel-based temporal filtering.

Let the temporal brightness change of the raw image sequence at pixel coordinate (x,y) be f(x,y,t). We define a local time-domain digital Fourier transform within a local time-window δT around t. The resulting instantaneous spectrum F(x,y,t;ω_k) can be represented by

F(x, y, t; \omega_k) = \sum_{\tau = t - \delta T/2}^{t + \delta T/2} f(x, y, \tau) \exp(-i \omega_k \tau),

where ω_k = 2πk/δT. After a digital filtering of the spectrum, an inverse transformation is carried out to create the filtered image f_ac(x,y,t):

f_{ac}(x, y, t) = \sum_{k} F_{ac}(x, y, t; \omega_k) \exp(i \omega_k t),

where F_ac(x,y,t;ω_k) = F(x,y,t;ω_k) for ω_k ≠ 0; that is, the DC component (k = 0) is removed. By shifting the time-window step by step, we can create a filtered image sequence (the ac-image). From the raw image sequence, a sequential image is thus created that enhances the brightness of moving objects; and because the DC component is removed in the filtered image sequence, it is also expected to reduce the influence of non-uniform (spatial and temporal) illumination. In a previous report [20] we proposed this method mainly for motion enhancement; in the present report, we test the performance of the temporal filtering as a preprocessing tool for the detection of motion fields.
Table 1
Comparison of the proposed method (E-STO) with the methods based on the assumption of separability of non-uniform illumination proposed by Nomura et al. (1995) [8]

Spatially non-uniform illumination:
  Constraint equation: ∂f/∂t + v·grad(f) = f(x,y,t) √(v_x² + v_y²) q(x,y)
  Error function: E = Σ_{δt} (f_t + v_x f_x + v_y f_y − f q √(v_x² + v_y²))²
  Solution: temporal local optimization (TLO)

Temporally non-uniform illumination:
  Constraint equation: ∂f/∂t + v·grad(f) = f(x,y,t) w(t)
  Error function: E = Σ_{δx} Σ_{δy} (f_t + v_x f_x + v_y f_y − f w)²
  Solution: spatial local optimization (SLO)

Proposed method, spatio-temporally non-uniform illumination:
  Constraint equation: ∂f/∂t + v·grad(f) = f(x,y,t) w′(x,y,t)
  Error function: E = Σ_{δx} Σ_{δy} Σ_{δt} (f_t + v_x f_x + v_y f_y − f w′)²
  Solution: spatio-temporal local optimization (E-STO)
4. Experiment

In this section, we apply the proposed methods to determine motion fields under non-uniform illumination (containing non-stationary illumination with respect to time and non-uniform illumination with respect to space). We compare the proposed methods with the conventional methods and discuss their performance. The experimental data comprise two synthetic image sequences and one real image sequence.

4.1. Experimental image sequences
4.1.1. Yosemite sequence¹ (synthetic data)

Fig. 3 shows a snapshot of a synthetic image sequence (the Yosemite sequence). The image sequence has a resolution of 316 × 252 pixels, and the brightness is quantized into 256 levels. The Yosemite sequence is a complex test scene. In the scene, the cloud has a translational motion with a speed of 2 pixels/frame, while the speed in the lower left is about 4-5 pixels/frame; moreover, the brightness of the cloud changes with respect to time and space. The landscape (mountains, valley, etc.) moves in the depth direction, so the motion field expands; that is, the motion field has divergence (div(\vec{v}) ≠ 0). This sequence is challenging because of the range of velocities, the occluding edges between the mountains and at the horizon, the divergence, and the non-uniform illumination. Fig. 4 shows the theoretical motion fields of Fig. 3.

4.1.2. Rotated Yosemite sequence under non-uniform illumination (synthetic data)

The Rotated Yosemite sequence is created from a static image (the first frame of the Yosemite sequence shown in Fig. 3). The sequence f(x,y,t), simulating spatially and temporally non-uniform illumination, is created by the equation

f(x, y, t) = r_1(x, y) \cdot r_2(t) \cdot g(x, y, t),

where

r_1(x, y) = \exp \left[ - \left( \frac{(x - x_{center})^2}{\sigma_x^2} + \frac{(y - y_{center})^2}{\sigma_y^2} \right) \right]

represents the effect of spatially non-uniform illumination, r₂(t) = sin(πt/t_size) represents the effect of temporally non-uniform illumination, and r(x,y,t) = r₁(x,y)·r₂(t) represents the combined effect of spatially and temporally non-uniform illumination.
Fig. 3. A simulation image sequence (Yosemite sequence).
¹ The Yosemite image sequence was obtained from the ftp site ftp.csd.uwo.ca.
Fig. 4. Theoretical motion fields of the Yosemite sequence.
g(x,y,t) is obtained by rotating the static image (see Fig. 3) clockwise around its center axis with constant angular velocity (0.8 degree/frame). The size of the Rotated Yosemite sequence is 200 × 150 pixels and 50 frames; the center axis is at x_center = 100, y_center = 75, and we take σ_x = 100, σ_y = 75 and t_size = 50. Fig. 5 shows the 10th and 20th frames of the sequence. The theoretical motion fields are shown in Fig. 6.
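For reproducibility, the construction can be sketched as follows. This is our illustration under the stated parameters: scipy.ndimage.rotate is our choice of rotation routine (the paper does not specify one), and the sign of the angle is our assumption for a clockwise rotation in image coordinates.

```python
# Hypothetical sketch of the Rotated Yosemite construction:
# f(x,y,t) = r1(x,y) * r2(t) * g(x,y,t).
import numpy as np
from scipy import ndimage

def make_rotated_sequence(static, t_size=50, deg_per_frame=0.8,
                          sigma_x=100.0, sigma_y=75.0):
    """static: (H, W) array (first Yosemite frame). Returns (t_size, H, W)."""
    H, W = static.shape
    y, x = np.mgrid[0:H, 0:W]
    xc, yc = W / 2.0, H / 2.0
    r1 = np.exp(-((x - xc) ** 2 / sigma_x ** 2 +
                  (y - yc) ** 2 / sigma_y ** 2))       # spatial factor
    frames = []
    for t in range(1, t_size + 1):
        r2 = np.sin(np.pi * t / t_size)                # temporal factor
        g = ndimage.rotate(static, -deg_per_frame * t,  # clockwise rotation
                           reshape=False, mode='nearest')
        frames.append(r1 * r2 * g)
    return np.stack(frames)
```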
Fig. 5. The 10th and 20th frames of the Rotated Yosemite sequence.
Fig. 6. Theoretical motion fields of the Rotated Yosemite sequence.
4.1.3. Toy car sequence (real data)

In order to confirm the usefulness of the proposed methods, we took sequential images of a toy car moving on a floor under spatially and temporally non-uniform illumination, through a TV camera with a sampling frequency of 30 Hz. The size of the Toy car sequence is 236 × 110 pixels and 40 frames, and the brightness is quantized into 256 levels. The toy car moves from lower left to upper right. The spatially non-uniform illumination ranges from 270 ± 5 to 1360 ± 10 lux, and the temporally non-uniform illumination from 45 ± 2 to 765 ± 5 lux. Fig. 7 shows the 2nd and 32nd frames of the Toy car sequence.

4.2. Experimental results

In this section, we report the quantitative performance of the conventional and proposed methods on the synthetic image sequences, and show the motion fields produced by the methods on the real image sequence.

4.2.1. Comparison of conventional methods with E-STO (proposed method 1)

We determine motion fields by use of the following methods. First, we use the conventional gradient-based method of Horn and Schunck [7].
Fig. 7. The 2nd and 32nd frames of the Toy car sequence.
• Spatial Global Optimization (SGO): Horn and Schunck [7] combined the basic constraint equation with a global smoothness term to constrain the estimated motion field \vec{v} = (v_x, v_y), minimizing
\iint \left[ \left( f_x v_x + f_y v_y + f_t \right)^2 + \alpha^2 \left( \lVert \nabla v_x \rVert_2^2 + \lVert \nabla v_y \rVert_2^2 \right) \right] dx \, dy.

We used α = 0.5 and 100 iterations, as suggested by Barron [5], in the tests below. The original method described by Horn and Schunck [7] used first-order differences to estimate the intensity derivatives. Because this is a relatively crude form of numerical differentiation and can be a source of considerable error, we followed Barron [5] and implemented the method with spatio-temporal Gaussian presmoothing (σ_xy = σ_t = 1.5) and four-point central differences for differentiation (with mask coefficients (1/12)(−1, 8, 0, −8, 1)). The motion fields obtained with this modified Horn and Schunck method are shown in Fig. 8a-c.

Second, we use the conventional gradient-based method with local optimization.

• Spatio-Temporal Local Optimization (STO): We also tested the conventional gradient-based method with spatio-temporal local optimization. We assume that the motion vector is constant with respect to time and space in a local volume of 5 × 5 pixels and 8 frames. The resulting motion fields are shown in Fig. 9a-c.

Third, we use the proposed method 1 (E-STO) based on Eq. (11). We assume that the motion vector \vec{v} and w' are constant in a local volume of 5 × 5 pixels and 8 frames. The resulting motion fields are shown in Fig. 10a-c.

The motion fields obtained by the conventional gradient-based methods (with SGO and STO) have serious errors in regions of non-uniform illumination, where the brightness distribution changes temporally and spatially (e.g. at the cloud in the Yosemite sequence). When we apply the extended constraint equation under the assumption of Eq. (11), the motion fields obtained at these places are clearly improved. However, for the Yosemite sequence, the reduction of the error at the foreground mountain surface is not satisfactory. Because the texture of the foreground mountain surface is only a pinstripe pattern, the aperture problem is easily encountered. When determining motion fields by the gradient method, we have to consider the aperture problem, which cannot be solved locally: if the pinstriped texture area is larger than the observation area, it is hard to obtain the correct motion fields with a local optimization method, and in general it is not possible to compute the true speed and direction from observations within a small neighborhood (a local area). To overcome this shortcoming, it is effective to introduce spatial presmoothing of the image sequence (e.g. Gaussian spatial presmoothing; see Fig. 13). Global optimization techniques [7] and hierarchical approaches [6] can also be introduced.
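As an illustration of the differentiation scheme above, the presmoothing and the four-point mask can be written as follows; this is our sketch (function names are ours), following the choices of Barron et al. [5] quoted in the text.

```python
# Hypothetical sketch: spatio-temporal Gaussian presmoothing followed by
# four-point central differences with mask (1/12)(-1, 8, 0, -8, 1).
import numpy as np
from scipy import ndimage

MASK = np.array([-1.0, 8.0, 0.0, -8.0, 1.0]) / 12.0

def derivatives(seq, sigma=1.5):
    """seq: (T, H, W). Returns (f_t, f_y, f_x) estimated at every voxel."""
    s = ndimage.gaussian_filter(seq, sigma=sigma)  # presmooth (t, y, x)
    ft = ndimage.convolve1d(s, MASK, axis=0)       # temporal derivative
    fy = ndimage.convolve1d(s, MASK, axis=1)       # vertical derivative
    fx = ndimage.convolve1d(s, MASK, axis=2)       # horizontal derivative
    return ft, fy, fx
```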
Fig. 8. (a) Motion field of Yosemite sequence determined by SGO: Horn and Schunck (modified) (conventional gradient-based method). (b) Motion field of Rotated Yosemite sequence determined by SGO: Horn and Schunck (modified) (conventional gradient-based method). (c) Motion field of Toy car sequence determined by SGO: Horn and Schunck (modified) (conventional gradient-based method).
4.2.2. Comparison of the conventional method on raw images and on temporally filtered images (proposed method 2)

In Figs. 11 and 12, we demonstrate a different approach to removing the influence of non-uniform illumination in motion analysis. The raw image sequences to be analysed are again the Yosemite sequence, the Rotated Yosemite sequence and the Toy car sequence shown in Figs. 3, 5 and 7, respectively. Examples of temporally filtered image sequences (ac-images) obtained by the temporal filtering described in Section 3.2 are shown in Fig. 11a-c. We applied the conventional gradient-based method with STO to obtain motion fields from the temporally filtered image sequences (ac-images, see Fig. 11a-c). Fig. 12a-c shows the analysed motion fields. Comparison of the analysed motion fields (Figs. 8, 9 and 12) shows that the errors of motion vector estimation at the non-uniformly illuminated regions are clearly reduced in Fig. 12.

Finally, for the Yosemite sequence we introduced a Gaussian spatial presmoothing to cope with the aperture problem. The conventional gradient-based method (with STO) was applied to the temporally filtered image sequence (ac-image, see Fig. 11a) together with Gaussian spatial presmoothing (σ_xy = 1.0). The result is shown in Fig. 13; the obtained motion fields clearly confirm the better performance. For a more detailed and quantitative evaluation, see Tables 2 and 3 in Section 4.3.
Fig. 9. (a) Motion field of Yosemite sequence determined by STO (conventional gradient-based method). (b) Motion field of Rotated Yosemite sequence determined by STO (conventional gradient-based method). (c) Motion field of Toy car sequence determined by STO (conventional gradient-based method).
Fig. 10. (a) Motion field of Yosemite sequence determined by the proposed method 1 (E-STO). (b) Motion field of Rotated Yosemite sequence determined by the proposed method 1 (E-STO). (c) Motion field of Toy car sequence determined by the proposed method 1 (E-STO).
Fig. 11. (a) A temporally filtered image sequence: ac-image (from the Yosemite sequence, δT = 8). (b) A temporally filtered image: ac-image (from the Rotated Yosemite sequence, δT = 16). (c) A temporally filtered image: ac-image (from the Toy car sequence, δT = 2).
Fig. 12. (a) Motion field of Yosemite sequence determined by STO: applied to ac-images (conventional gradient-based method). (b) Motion field of Rotated Yosemite sequence determined by STO: applied to ac-images (conventional gradient-based method). (c) Motion field of Toy car sequence determined by STO: applied to ac-images (conventional gradient-based method).
4.2.3. Application of E-STO to the filtered ac-images

We also tested applying the E-STO method to the temporally filtered ac-images (the combination of proposed method 1 with proposed method 2). The results are shown in Fig. 14a-c. Because the influence of non-uniform illumination has already been removed from the raw images in the ac-images, the improvement of this combination over proposed method 2 alone is not very apparent. Comparing Fig. 14a with Fig. 13, however, the motion fields obtained at the cloud area (upper left) are slightly improved.
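As a usage sketch, the combination amounts to chaining the two methods; this reuses the hypothetical helpers sketched in earlier sections (temporal_ac_filter, derivatives, esto_volume) and picks one arbitrary sample volume.

```python
# Hypothetical pipeline for Section 4.2.3: temporal filtering (method 2)
# followed by the E-STO estimator (method 1) on one example volume.
import numpy as np

def esto_on_ac_images(raw_seq):
    ac = temporal_ac_filter(raw_seq, dT=8)        # ac-images, DC removed
    ft, fy, fx = derivatives(ac, sigma=1.0)       # Gaussian presmoothing
    t0, y0, x0 = 0, 40, 40                        # an arbitrary sample volume
    win = np.s_[t0:t0 + 8, y0:y0 + 5, x0:x0 + 5]  # 8 frames x 5 x 5 pixels
    return esto_volume(ft[win].ravel(), fx[win].ravel(),
                       fy[win].ravel(), ac[win].ravel())
```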
Fig. 13. Motion field of Yosemite sequence determined by STO: applied to ac-images and Gaussian spatial presmoothing images (conventional gradient-based method).
4.3. Error measurement

We tested the proposed algorithms on the synthetic and real images. For the experiments on the synthetic images, where the correct motion fields are known, we can use the angular measure of error of Barron et al. (1994) [5] to
Table 2
Summary of the Yosemite sequence motion field results: error comparison of the proposed methods with other techniques reported by Barron et al. (1994) [5]. All compared methods provide 100% density of motion vectors.

Algorithm | Ave. error (deg) | S.D. (deg) | Frames¹ | G_s_s² | G_t_s³
Horn and Schunck (original) | 31.69 | 31.18 | 2 | No | No
Horn and Schunck (modified) | 9.78 | 16.19 | 15 | Yes | Yes
Uras et al. | 8.94 | 15.61 | 15 | Yes | Yes
Nagel | 10.22 | 16.51 | 15 | Yes | Yes
Anandan | 13.36 | 15.64 | 2 | No | No
Singh | 10.44 | 13.94 | 3 | No | No
Conventional STO | 10.18 | 16.40 | 8 | No | No
TLO (based on Eq. (5)) | 14.37 | 20.23 | 8 | No | No
SLO (based on Eq. (7)) | 10.49 | 17.64 | 3 | No | No
Proposed method 1 (E-STO: based on Eq. (11)) | 8.47 | 15.32 | 8 | No | No
Proposed method 2 (STO: based on ac-images) | 8.05 | 14.94 | 8 | No | No
Proposed method 2 with G_s_s (STO: based on ac-images) | 5.70 | 11.15 | 8 | Yes | No
Proposed method 3 with G_s_s (E-STO: based on ac-images) | 5.59 | 11.24 | 8 | Yes | No

Notes: ¹ Frames: the number of input frames the technique requires. ² G_s_s: Gaussian spatial presmoothing used. ³ G_t_s: Gaussian temporal presmoothing used.

Table 3
Summary of the Rotated Yosemite sequence motion field results: error comparison of the proposed methods with the Horn and Schunck technique. All compared methods provide 100% density of motion vectors.

Algorithm | Ave. error (deg) | S.D. (deg) | Frames¹ | G_s_s² | G_t_s³
Horn and Schunck (original) | 11.99 | 11.06 | 2 | No | No
Horn and Schunck (modified) | 10.57 | 9.53 | 15 | Yes | Yes
Conventional STO | 3.91 | 4.41 | 8 | No | No
Proposed method 1 (E-STO: based on Eq. (11)) | 2.61 | 2.91 | 8 | No | No
Proposed method 2 (STO: based on ac-images) | 2.47 | 2.16 | 8 | No | No
Proposed method 3 (E-STO: based on ac-images) | 2.46 | 2.43 | 8 | No | No

Notes: ¹ Frames: the number of input frames the technique requires. ² G_s_s: Gaussian spatial presmoothing used. ³ G_t_s: Gaussian temporal presmoothing used.
evaluate the results; this also allows a direct comparison of our methods with the other techniques in [5]. The error between the correct velocity \vec{v}_c = (v_x, v_y) and the estimate \vec{v}_e = (\hat{v}_x, \hat{v}_y) is measured as the angle between the corresponding unit vectors in 3D space,

\vec{v}_3 \equiv \frac{1}{\sqrt{v_x^2 + v_y^2 + 1}} \, (v_x, v_y, 1).

The angular error between the correct vector \vec{v}_{3c} and the estimate \vec{v}_{3e} is

\psi_E = \arccos(\vec{v}_{3c} \cdot \vec{v}_{3e}). \qquad (12)
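In code, this measure is straightforward; the following is our sketch (names are ours), with clipping added to guard against rounding slightly outside [−1, 1].

```python
# Hypothetical sketch of the angular error of Eq. (12) (Barron et al. [5]).
import numpy as np

def angular_error_deg(vc, ve):
    """vc, ve: (..., 2) arrays of correct and estimated (vx, vy)."""
    def to_unit3(v):
        # Lift (vx, vy) to (vx, vy, 1) and normalize to a unit vector.
        v3 = np.concatenate([v, np.ones(v.shape[:-1] + (1,))], axis=-1)
        return v3 / np.linalg.norm(v3, axis=-1, keepdims=True)
    dot = np.sum(to_unit3(vc) * to_unit3(ve), axis=-1)
    return np.degrees(np.arccos(np.clip(dot, -1.0, 1.0)))
```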
The error comparison of the proposed methods with the other techniques reported by Barron et al. [5] is summarized in Tables 2 and 3, which list the average and standard deviation of the angular error. The comparison with TLO and SLO (see Table 1) is also carried out. Proposed method 3 records the best performance.

5. Conclusions
Fig. 14. (a) Motion field of Yosemite sequence determined by E-STO: applied to ac-images and Gaussian spatially presmoothed images (proposed method 1 + proposed method 2). (b) Motion field of Rotated Yosemite sequence determined by E-STO: applied to ac-images (proposed method 1 + proposed method 2). (c) Motion field of Toy car sequence determined by E-STO: applied to ac-images (proposed method 1 + proposed method 2).
In this paper, we proposed two methods to determine motion fields from an image sequence under non-uniform illumination. The first method is based on the extended constraint equation derived from the conservation law of total brightness in a fixed observation area, which makes it possible to estimate both the effects of non-uniform illumination and the true motion fields. Since we adopted spatio-temporal local optimization, we obtained motion fields of higher resolution and reliability than the conventional gradient method. The performance of the proposed method was confirmed by the analysis of two synthetic image sequences and one real image sequence.

As the second method, we proposed a different approach to removing the influence of non-uniform illumination. The algorithm is based on a local temporal filtering: from an original image sequence, a dynamic scene sequence (the ac-image) is created, defined in a local time-domain window δT around t. The reduction of the influence of non-uniform illumination was also confirmed experimentally.

Both proposed methods thus reduce the influence of non-uniform illumination. For the aperture problem, however, the two proposed methods are not effective, because of their local approach. In the Yosemite sequence, the mountain area (the lower-left front region in the raw image sequence) has a parallel high-contrast pinstripe pattern. To cope with this aperture problem, approaches such as the regularization method can be introduced. As a simple approach, we tested Gaussian spatial presmoothing combined with the proposed temporal filtering; the performance of this filter is clearly demonstrated in Fig. 13. The proposed methods are therefore promising for complicated actual scene analysis. An advantage of the first method is the possibility of evaluating the non-uniform illumination quantitatively. Further investigations considering the neglected term (div(\vec{v})) and testing global approaches are expected.

Acknowledgements

The authors wish to thank Dr A. Nomura, Professor E.
Yokoyama and Dr Y. Mizukami for their critical comments and helpful discussions. This work was partly supported by the Sasakawa Scientific Research Grant from The Japan Science Society.
Appendix A. The model of spatio-temporal non-uniform illumination

We assume the spatio-temporal brightness distribution f(x,y,t) is

f(x, y, t) = r_1(x, y) \cdot r_2(t) \cdot g(x, y, t), \qquad (A1)

where r₁(x,y) represents the effect of spatially non-uniform illumination, r₂(t) represents the effect of temporally non-uniform illumination and g(x,y,t) is the virtual brightness distribution under uniform illumination. The following equation is adopted for determining the motion field under non-uniform illumination:

\frac{\partial f}{\partial t} + \vec{v} \cdot \operatorname{grad}(f) = \phi. \qquad (A2)

Substituting Eq. (A1) we obtain

\frac{\partial (r_1 r_2 g)}{\partial t} + \vec{v} \cdot \operatorname{grad}(r_1 r_2 g) = \phi. \qquad (A3)

Expansion of the left side of Eq. (A3) gives

g r_2 \left[ \frac{\partial r_1}{\partial t} + \vec{v} \cdot \operatorname{grad}(r_1) \right] + g r_1 \left[ \frac{\partial r_2}{\partial t} + \vec{v} \cdot \operatorname{grad}(r_2) \right] + r_1 r_2 \left[ \frac{\partial g}{\partial t} + \vec{v} \cdot \operatorname{grad}(g) \right] = \phi. \qquad (A4)

We assume that g(x,y,t) obeys the equation (because of uniform illumination)

\frac{\partial g}{\partial t} + \vec{v} \cdot \operatorname{grad}(g) = 0. \qquad (A5)

Then we obtain the relationship

g r_2 \left[ \frac{\partial r_1}{\partial t} + \vec{v} \cdot \operatorname{grad}(r_1) \right] + g r_1 \left[ \frac{\partial r_2}{\partial t} + \vec{v} \cdot \operatorname{grad}(r_2) \right] = \phi, \qquad (A6)

where r₁(x,y) represents the effect of spatially non-uniform illumination and r₂(t) represents the effect of temporally non-uniform illumination. Then

r_1 = r_1(x, y), \quad \frac{\partial r_1}{\partial t} = 0, \qquad (A7)

r_2 = r_2(t), \quad \operatorname{grad}(r_2) = 0. \qquad (A8)

Substituting Eqs. (A7) and (A8) into Eq. (A6) we obtain

g r_2 \, \vec{v} \cdot \operatorname{grad}(r_1) + g r_1 \frac{\partial r_2}{\partial t} = \phi. \qquad (A9)

Then φ is expressed as

\phi = f \, \vec{v} \cdot \frac{\operatorname{grad}(r_1)}{r_1} + f \cdot \frac{\partial r_2 / \partial t}{r_2}. \qquad (A10)

Here we introduce a vector

\vec{p}(x, y) = \operatorname{grad}(r_1(x, y)) / r_1(x, y).

Then Eq. (A10) is expressed as

\phi = f \, \vec{v} \cdot \vec{p} + f \cdot \frac{\partial r_2 / \partial t}{r_2} = f \, \lvert \vec{v} \rvert \lvert \vec{p} \rvert \cos\alpha + f \cdot \frac{\partial r_2 / \partial t}{r_2} = f \sqrt{v_x^2 + v_y^2} \, \lvert \vec{p} \rvert \cos\alpha + f \cdot \frac{\partial r_2 / \partial t}{r_2}, \qquad (A11)

where α(x,y) is the angle between \vec{v}(x,y) and \vec{p}(x,y). With the assumption of spatio-temporal local optimization (∂\vec{v}/∂x = ∂\vec{v}/∂y = ∂\vec{v}/∂t = 0 in δV = δx·δy·δt), the terms \vec{p}(x,y), α(x,y) and (∂r₂(t)/∂t)/r₂(t) are also constant in δV. With the symbols q(x,y) and w(t) used as unknown constants, Eq. (A11) is rewritten as

\phi(x, y, t) = f \, q(x, y) \sqrt{v_x^2 + v_y^2} + f \, w(t), \qquad (A12)

where

q(x, y) = \lvert \vec{p} \rvert \cos\alpha, \quad w(t) = \frac{\partial r_2 / \partial t}{r_2}.
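As a check on the algebra above (our addition, not part of the paper), the step from Eq. (A3) to Eq. (A10) can be verified symbolically:

```python
# Symbolic check: substituting the uniform-illumination condition (A5)
# into (A3) should leave exactly phi = f*(v.grad(r1)/r1 + (dr2/dt)/r2),
# i.e. Eq. (A10).
import sympy as sp

x, y, t, vx, vy = sp.symbols('x y t v_x v_y')
r1 = sp.Function('r1')(x, y)
r2 = sp.Function('r2')(t)
g = sp.Function('g')(x, y, t)
f = r1 * r2 * g

# Left-hand side of (A2)/(A3): df/dt + v . grad(f)
phi = sp.diff(f, t) + vx * sp.diff(f, x) + vy * sp.diff(f, y)
# Impose (A5): dg/dt = -(vx dg/dx + vy dg/dy)
phi = phi.subs(sp.Derivative(g, t), -(vx * sp.diff(g, x) + vy * sp.diff(g, y)))
# Compare with (A10)
target = f * ((vx * sp.diff(r1, x) + vy * sp.diff(r1, y)) / r1
              + sp.diff(r2, t) / r2)
print(sp.simplify(phi - target))   # prints 0
```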
References

[1] B.K.P. Horn, Robot Vision, MIT Press, Cambridge, MA, 1986.
[2] A. Verri, T. Poggio, Motion field and optical flow: qualitative properties, IEEE Trans. Pattern Anal. Machine Intell. 11 (1989) 490-498.
[3] F. Bergholm, S.A. Carlsson, A 'theory' of optical flow, CVGIP: Graphical Models and Image Processing 53 (2) (1991) 171-188.
[4] B. Jähne, Digital Image Processing, Springer-Verlag, Berlin, 1995, pp. 53-230.
[5] J.L. Barron, D.J. Fleet, S.S. Beauchemin, Systems and experiment: performance of optical flow techniques, Intern. J. Comput. Vis. 12 (1) (1994) 43-77.
[6] F. Glazer, G. Reynolds, P. Anandan, Scene matching by hierarchical correlation, Proc. IEEE Computer Society Conf., 1983, pp. 432-441.
[7] B.K.P. Horn, B.G. Schunck, Determining optical flow, Artificial Intell. 17 (1981) 185-203.
[8] A. Nomura, H. Miike, K. Koga, Determining motion fields under non-uniform illumination, Pattern Recog. Letters 16 (1995) 285-296.
[9] E. Dubois, J. Konrad, Estimation of 2D motion fields from image sequences with application to motion-compensated processing, in: Motion Analysis and Image Sequence Processing, Kluwer Academic Publishers, Dordrecht, 1993, pp. 53-87.
[10] J.A. Leese, C.S. Novak, B.B. Clark, An automated technique for obtaining cloud motion from geosynchronous satellite data using cross correlation, Journal of Applied Meteorology 10 (1971) 118-132.
[11] M.J. Black, The robust estimation of multiple motions: parametric and piecewise-smooth flow fields, Comput. Vision and Image Understanding 63 (1) (1996) 75-104.
[12] A. Verri, F. Girosi, V. Torre, Differential techniques for optical flow, J. Opt. Soc. Am. A 7 (1990) 912-922.
[13] A. Nomura, H. Miike, E. Yokoyama, Detecting motion and diffusion from a dynamic image sequence, Trans. of the Institute of Electronics Engineers Japan 115 (3) (1995) 4003 (in Japanese).
[14] A. Nomura, H. Miike, K. Koga, Field theory approach for determining optical flow, Pattern Recog. Letters 12 (1991) 183-190.
[15] J.K. Kearney, W.B. Thompson, D.L. Boley, Optical flow estimation: an error analysis of gradient-based methods with local optimization, IEEE Trans. Pattern Anal. Machine Intell. 9 (1987) 229-244.
[16] N. Cornelius, T. Kanade, Adapting optical flow to measure object motion in reflectance and X-ray image sequences, Proc. ACM SIGGRAPH/SIGART Interdisciplinary Workshop on Motion: Representation and Perception, Toronto, Ontario, Canada, 1983, pp. 145-153.
[17] J. Aisbett, Optical flow with an intensity-weighted smoothing, IEEE Trans. Pattern Anal. Machine Intell. 11 (1989) 512-522.
[18] A. Singh, Optic Flow Computation: A Unified Perspective, IEEE Computer Society Press, Los Alamitos, CA, 1991.
[19] T. Hara, T. Kudou, H. Miike, E. Yokoyama, A. Nomura, Recovering 3D-shape from motion stereo under non-uniform illumination, Proc. IAPR Workshop on Machine Vision Applications (MVA), 1996, pp. 241-244.
[20] H. Miike, T. Sakurai, L. Zhang, H. Yamada, Motion enhancement and visualization of dynamic streamline by pixel-based time-domain filtering of image sequence (submitted).