
Detecting Motion Independent of the Camera Movement through a Log-Polar Differential Approach*

Jose A. Boluda¹, Juan Domingo², Fernando Pardo¹, and Joan Pelechano²

¹ Departament d'Informatica i Electronica, Universitat de Valencia
² Institut de Robotica, Universitat de Valencia
C/ Doctor Moliner, 50. 46100 Burjassot, Spain

Abstract. This paper is concerned with a differential motion detection technique in log-polar coordinates that allows an object's motion to be tracked independently of the camera ego-motion when the camera moves along the direction in which it is pointing (its optical axis). The method does not use any explicit estimation of the motion field, which can be calculated afterwards at the moving points only. The method, previously formulated in Cartesian coordinates, is expressed here in log-polar coordinates, which allows the isolation of the object movement from the image displacement due to certain camera motions. Experimental results on a sequence of real images are included, in which a moving object is detected and the optical flow is calculated in log-polar coordinates only for the points of the object.

1 Introduction

The problem of real-time motion detection and tracking from a moving camera is an important issue in artificial vision. Typical applications are obstacle avoidance and time-to-impact computation. The constraints for a real-time implementation of these are the large amount of data to be processed and the high computational cost of the algorithms employed.

Motion estimation techniques fit into three categories: feature-based, optical flow and differential techniques [8]. The feature-based techniques are based on the extraction and matching of interesting points [6]. A typical problem is the instability of the extracted features due to noise or occlusions. The optical flow computation is a powerful approach but suffers from a very high computational cost, since regressions on the neighbors of a pixel must be applied to solve the optical flow equation. The accuracy of these techniques increases as their computational cost does [2]. The differential techniques typically compute spatial and/or temporal derivatives over the whole image. They also suffer from a very high computational cost, but due to their parallel nature they are very suitable for a parallel implementation [8].

* This work has been supported in part by the Spanish government (CICYT project TAP95-1086-C02-02) and has been partially developed at the Machine Vision Lab, Dept. of E.E., University of Virginia (USA).

On the other hand, the use of Cartesian coordinates to describe images, inherited from the usual camera layout, does not seem to be the natural choice for the optical flow computation, especially when the prevalent movement is along the optical axis. In fact, it has been shown that in this case the use of log-polar coordinates simplifies the computation of the optical flow [10,5].

The differential approach used in this paper was previously formulated in Cartesian coordinates [4]. Our translation of its restrictions to the log-polar mapping will show the utility of this representation, which automatically discards the image displacement due to the camera ego-motion. Moreover, this representation of the information reduces the amount of data to be processed, thus making the algorithm suitable for real-time implementations.

2 The Log-Polar Representation

In humans, the retina exhibits a non-uniform photo-receptor distribution: more resolution at the center of the image and less at the periphery, which allows a selective reduction of the information. The advantages of the log-polar representation for this kind of active vision system have been widely studied [9,10]. Special hardware that includes the log-polar transformation has been developed: a CMOS sensor [7] and a camera using this sensor [3] have been utilized for the experimental testing of the proposed algorithm. The sensor has two different areas: the retina, which follows the log-polar law, and the fovea, which follows a linear-polar scale in order to avoid the singularity at the origin and some scaling problems [7].

Fig. 1. The log-polar transformation

The transformation maps the Cartesian plane (x, y) onto the cortical plane (ξ, γ):

γ = θ,      ξ = r  (fovea, r < r₀),      ξ = log r  (retina, r ≥ r₀)

where

r = √(x² + y²),      θ = arctan(y/x)                    (1)

and r₀ is the radius of the fovea circle.

Figure 1 shows the log-polar representation graphically. This representation is a compromise between resolution and amplitude of the field of view, or in other words, between computational cost and width of the visual field.
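To make the mapping concrete, the following sketch resamples a conventional Cartesian frame into a cortical (ξ, γ) image according to (1). It is only an illustration: the fovea radius r0, the numbers of rings and sectors, and the exact spacing of the retinal rings are hypothetical parameters chosen for the example, not the geometry of the sensor described in [7].

import numpy as np

def cartesian_to_cortical(img, r0=8.0, n_rings=64, n_sectors=128):
    # Resample a grayscale frame into a cortical (xi, gamma) image following (1):
    # gamma = theta; xi grows linearly with r in the fovea (r < r0) and
    # logarithmically in the retina (r >= r0). r0, n_rings and n_sectors are
    # illustrative values, not the actual sensor geometry.
    h, w = img.shape
    cx, cy = w / 2.0, h / 2.0
    r_max = min(cx, cy)                          # largest radius inside the frame

    # Ring radii: linear spacing in the fovea, logarithmic spacing in the retina.
    n_fovea = max(1, int(n_rings * r0 / r_max))  # rings devoted to the fovea
    fovea_r = np.linspace(0.0, r0, n_fovea, endpoint=False)
    retina_r = r0 * np.exp(np.linspace(0.0, np.log(r_max / r0), n_rings - n_fovea))
    radii = np.concatenate([fovea_r, retina_r])

    # One column per angular sector.
    thetas = np.linspace(0.0, 2.0 * np.pi, n_sectors, endpoint=False)

    # Nearest-neighbour sampling of the Cartesian image at every (r, theta) pair.
    rr, tt = np.meshgrid(radii, thetas, indexing="ij")
    xs = np.clip(np.round(cx + rr * np.cos(tt)).astype(int), 0, w - 1)
    ys = np.clip(np.round(cy + rr * np.sin(tt)).astype(int), 0, h - 1)
    return img[ys, xs]                           # rows index xi, columns index gamma

The number of cortical pixels (n_rings x n_sectors) is much smaller than the number of Cartesian pixels, which is the data reduction exploited for real-time operation.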

3 The Original Algorithm and its Adaptation to the Log-Polar Mapping

The method for motion boundary detection employed here was developed in Cartesian coordinates by Chen and Nandhakumar [4]. Their formulation in the Cartesian plane (x, y) and our transformation to the log-polar plane (ξ, γ) are as follows. Let E(x, y, t) (respectively E(ξ, γ, t)) be the time-varying image sequence, T(S) the projection of a surface S in the projection plane, and ∂T(S) the border points of that projection. The assumptions of the method are:

- E is piecewise linear with respect to x and y (to ξ and γ) at any point belonging to the projection of a surface in the image plane, which means:

∂²E/∂x² = ∂²E/∂y² = 0          ∂²E/∂ξ² = ∂²E/∂γ² = 0          (2)

almost everywhere on T(S) − ∂T(S).

- The motion of the scene is smooth with respect to time, which means ∀(x, y) (∀(ξ, γ)) ∈ T(S):

∂²x/∂t² = ∂²y/∂t² = 0          ∂²ξ/∂t² = ∂²γ/∂t² = 0          (3)

Now, the optical flow equation has to be used. It can be written as:

v_x ∂E/∂x + v_y ∂E/∂y = −∂E/∂t          v_ξ ∂E/∂ξ + v_γ ∂E/∂γ = −∂E/∂t          (4)
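A quick way to see why the same functional form appears in both planes (this is only a sketch, not the derivation of [4]) is to impose brightness constancy directly in the coordinates in which the sensor samples the image. Along the trajectory of a scene point,

dE/dt = (∂E/∂ξ)(dξ/dt) + (∂E/∂γ)(dγ/dt) + ∂E/∂t = 0,

and defining the cortical velocities v_ξ = dξ/dt and v_γ = dγ/dt yields the right-hand equation of (4). Nothing in this step depends on the particular change of variables, so the same form also holds for the linear-polar scale of the fovea.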

It is easy to prove that for the log-polar scaling (retina) and also for the linear-polar scale (fovea) the optical flow equation has exactly the same functional form, the only difference being the names of the variables, as written in (4).¹ With these conditions, it can be proved [4] that ∀(x, y) ∉ ∂T(S):

∂²E/∂t² = 0          (5)

Thus, it is possible to detect motion boundaries through the computation of the second-order temporal derivative of the image.
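Intuitively (an informal reading of the proof in [4], under assumptions (2) and (3)): away from ∂T(S) the brightness is locally a linear ramp that translates with the surface,

E(x, y, t) ≈ a (x − x₀(t)) + b (y − y₀(t)) + c,

and if x₀(t) and y₀(t) satisfy (3), the grey level observed at a fixed pixel is a linear function of time, so its second temporal derivative vanishes; only on the motion boundaries ∂T(S) can (5) fail.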

¹ It remains to be proved that for any conformal mapping the optical flow equation would also have the same form.

When the absolute value of this magnitude at a given pixel exceeds a threshold value (ideally 0), this point should be marked. The threshold depends on the quality of the smoothing and on the camera movement. The method includes the ego-motion of the camera in the x and y coordinates (under the log-polar mapping, in the ξ and γ coordinates); the only restriction on this ego-motion is condition (3).

The first assumption (piecewise linearity) can be achieved through optimal smoothing, i.e. a direct linearization of the image. This method, though effective, requires solving many linear equations and is computationally expensive, so the condition is approximately achieved through a smoothing in the cortical plane performed with a convolution mask. This approximation can be compensated by increasing the threshold value, retaining only those points whose second spatial derivatives are both below a threshold.

The second assumption, that the motion of the scene is smooth with respect to time, is a strong constraint, and it can be accepted only if a sufficiently high image rate is used, so that in a sequence of three very close images any movement can be approximated by a linear one.

Let us detail the meaning of (3) for the case of log-polar coordinates. For the angular coordinate γ the condition is clearly a rotation with a constant angular velocity; this means that the camera may undergo that class of rotational movement around its optical axis. Apart from that, the velocity condition for the radial coordinate ξ expressed in (3) has to be translated to Cartesian coordinates in order to understand which movement is being discarded. Using (1) and (3), the differential equations in the Cartesian plane for the fovea and the retina are found to be:
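A minimal sketch of the detection step just described, assuming the frames are already in cortical (ξ, γ) coordinates (for instance produced by the earlier resampling sketch): three consecutive frames are smoothed with a convolution mask, the second temporal derivative is approximated by a central finite difference, and pixels whose absolute value exceeds a threshold are marked. The mask and the threshold are illustrative choices, not the ones used in the experiments.

import numpy as np
from scipy.ndimage import convolve

def detect_motion_boundaries(e_prev, e_curr, e_next, threshold=10.0):
    # e_prev, e_curr, e_next: cortical frames E(xi, gamma) at times t-1, t, t+1.
    # Smoothing in the cortical plane with a small mean mask pushes the frames
    # towards the piecewise-linear assumption (2); mask size is illustrative.
    mask = np.full((3, 3), 1.0 / 9.0)
    frames = [convolve(f.astype(float), mask, mode="nearest")
              for f in (e_prev, e_curr, e_next)]

    # Central finite difference in time: d2E/dt2 ~ E(t-1) - 2 E(t) + E(t+1).
    d2e_dt2 = frames[0] - 2.0 * frames[1] + frames[2]

    # By (5), away from motion boundaries this magnitude is close to zero, so
    # pixels exceeding the threshold are marked as motion boundary candidates.
    return np.abs(d2e_dt2) > threshold

The optical flow can then be estimated only at the marked points, which is how the experiments reported in the paper restrict the computation to the moving object.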