Local Parallel Computation of Stochastic Completion Fields Lance R. Williams
David W. Jacobs
NEC Research Institute 4 Independence Way Princeton, NJ 08540 November 9, 1995
Abstract
We describe a local parallel method for computing the stochastic completion eld introduced in an earlier paper (Williams and Jacobs '95). The stochastic completion eld represents the likelihood that a completion joining two contour fragments passes through any given position and orientation in the image plane. It is based upon the assumption that the prior probability distribution of completion shape can be modelled as a random walk in a lattice of discrete positions and orientations. The local parallel method can be interpreted as a stable nite dierence scheme for solving the underlying Fokker-Planck equation identi ed by Mumford (Mumford '94). The resulting algorithm is signi cantly faster than the previously employed method which relied on convolution with large kernel lters computed by Monte Carlo simulation. The complexity of the new method is O(n3m) while that of the previous algorithm was O(n4m2 ) (for an n n image with m discrete orientations). Perhaps most signi cantly, the use of a local method allows us to model the probability distribution of completion shape using stochastic processes which are neither homogenous nor isotropic. For example, it is possible to modulate particle decay rate by a directional function of local image brightnesses (i.e., anisotropic decay). The eect is that illusory contours can be made to respect the local image brightness structure. Finally, we note that the new method is more plausible as a neural model since 1) unlike the previous method, it can be computed in a sparse, locally connected network; and 2) the network dynamics are consistent with psychophysical measurements of the time course of illusory contour formation.
1 Introduction In recent years, several dierent researchers [10, 11, 34] have formulated computational models of perceptual completion and illusory contours. Although the function computed in each case is dierent, each method requires computing one (or more) convolutions with lters represented by kernels of size equal to the largest gaps to be bridged (usually taken to be the size of the image). From the standpoint of computer vision, it should be clear that computing large numbers of convolutions with kernels of size equal to the image size is prohibitively expensive. These methods also have de ciencies as models of human visual processing, since implementation in vivo would require pairs of neurons separated by large amounts of visual arc to be densely interconnected. While this possibility cannot be ruled out,1 a model based on local connections clearly makes weaker demands on neuroanatomy. Data from human psychophysics is also inconsistent with methods requiring large kernel convolution. First, experiments by Rock and Anson[25] (and others[24]) demonstrate (see Figure 1) that illusory contours can be supressed by the presence of texture (or other gural elements) along the path the completion would follow if these elements were absent. That is, the shape and salience of the completion is not solely a function of characteristics of the pair of elements it bridges, but is also a function of the intervening pattern of image brightnesses. Second, recent studies of the time course of illusory contour formation[27] show that the time required for an illusory contour to form is at least partly a function of the size of the gap to be bridged.2 Again, this is consistent with a local-iterative process since a global process would require an amount of time independent of the size of the gap to be bridged. One of the things which distinguishes the work of Williams and Jacobs[34] from prior models of illusory contour formation is that the algorithm they present is simply the most obvious means of computing a function they de ne in advance. This function, f : [P; Q] ! C , takes a pair of arguments, P and Q of dimensionality R2 S 1 and maps them to an output Absence of evidence is not evidence of absence! This result is due to Rubin, Shapley and Nakayama[27], who used a forced choice discrimination task together with masking stimulus to estimate the time required for illusory contour formation. Rubin et al. rst showed that the time required for an illusory contour to form increases in direct proportion to distance when the \pacmen" in a Kanizsa display are \pulled apart." Somewhat surprisingly, they then showed that the time required for an illusory contour to form remains approximately constant when the entire gure is uniformly scaled. This somewhat paradoxical result is consistent with a stochastic completion eld \pyramid" computed by repeated small kernel convolution. 1
2
1
(a)
(b)
Figure 1: Rock and Anson[25] show that illusory contours in the Kanizsa triangle can be suppressed by the presence of a background texture. (a) Kanizsa Triangle. (b) Kanizsa Triangle with background texture. representation, C of the same dimensionality. The space R2 S 1 can be interpreted as a distribution of orientations associated with every point in the image plane. Now consider a set of image curves (de ned by signed contrast edges), some closed, some fragmented. Let p(xp; yp; p) represent the probability that an image curve ends at point (xp; yp) with orientation p. Similiarly, let q(xq ; yq ; q ) represent the probability that an image curve begins at point (xq ; yq ) with orientation q . Given some means of associating probabilities with completion shapes, then the stochastic completion eld (i.e., C (u; v; )) represents the probability that a completion will pass through point (u; v) with orientation . Williams and Jacobs[34] and Mumford[20] have independently proposed that the prior distribution of boundary completion shape can be modeled by a particle undergoing a particular kind of random walk.3 Both show that for this random walk, the maximum likelihood path that a particle will take between two positions and orientations is a curve of least energy ([12]). In a recent paper, Thornber and Williams[30] derive analytic expressions for the stochastic completion eld (and for its expected value and variance). These analyses are independent of the particular algorithm used to compute the stochastic completion eld and therefore apply equally to the output of our previous method and to the method introduced in this paper. In this paper, we show how the stochastic completion eld can be computed eciently in a local parallel network. The network computation is based upon a nite dierence scheme for A similiar model is implicit in Cox, Rehg and Hingorani's[4] use of the Kalman lter in their work on edge tracking. 3
2
solving the Fokker-Planck equation identi ed by Mumford[20]. This avoids computationally prohibitive large kernel convolutions and is more consistent with human vision.
2 Related Work The earliest theory of illusory contour shape is due to Ullman[31]. Ullman hypothesized that the curve used by the human visual system to join two contour fragments is constructed from two circular arcs. Each circular arc is tangent to its sponsoring contour at one end and to the other arc at the point of intersection. From the family of possible curves of this form, R the pair which minimizes total bending energy (i.e., E = (s)2ds) models the shape of the illusory contour. Ullman suggested that illusory contours could be computed in parallel in a network by means of local operations, but he did not implement or test this idea. Grossberg and Mingolla[8] describe a cooperative-competitive model of illusory contour formation which involves repeated convolution with a large (but xed sized) kernel. In outward form, this kernel resembles that of Williams and Jacobs[34], but does does not represent the Green's function of a stochastic process of the sort described by us or Mumford[20]. The outputs of oppositely oriented lters are combined through a non-linear operation consisting of thresholding followed by a logical AND function. Consequently, the state of activity of an individual neuron does not correspond in an obvious way to a probability or other physical quantity. Grossberg and Mingolla's network is extremely complex and dicult to analyze. Clearly, part of this complexity stems from their desire to model, in a comprehensive way, the many dierent forms of stimulus which can elicit illusory contours (e.g., contrast edges of mixed sign, line endings, etc.). No other model attempts to be this comprehensive. Furthermore, no information about its convergence properties is presented, and these properties would seem dicult to ascertain. Assuming that the time required for the network computation to converge is of O(n) (for an n n image), then the complexity of their computation would appear to be O(n3m). Implemented in the obvious parallel way, this would be consistent with the time course of illusory contour formation in human vision[27]. More recently, Guy and Medioni[10] describe a method for computing a vector- eld representing global image structure from local tangent measurements. Like the Hough transform, the key to their approach is the local summation of a set of global voting patterns. Unlike 3
the Hough transform, the accumulator is spatially registered with the image and the voting pattern is a vector- eld, not a scalar- eld. However, because the voting pattern only represents orientations which are co-circular to the tangent measurements, the vector- eld is non-stochastic (i.e., deterministic), and cannot model the prior distribution of completion shapes. Furthermore, their use of analysis of moments to combine votes of dierent orientations is an immediate consequence of their decision to use a non-stochastic eld. That is, given that they desire a single orientation everywhere, and given that (in their scheme) evidence takes the form of votes of dierent orientations, in the end, they compute an average. The time complexity of Guy and Medioni's method is O(n4) since O(n2) number of voting patterns of size O(n2) must be combined to form the output vector- eld. Heitger and von der Heydt[11] describe a theory of gural completion based upon nonlinear combination of the convolutions of \keypoint" images with a xed set of oriented grouping lters. Keypoints are located at negative minima or positive maxima of curvature (i.e., corners) in a luminance boundary and represent the pair of orientations at likely points of occlusion. Unfortunately, neither the equations de ning the lters nor the proposed method of combination are described as a means of computing an underlying function. This makes analysis of their method dicult. However, because their grouping lters are scalar functions, they cannot (even implicitly) model a prior probability distribution of completion shapes in the manner we describe. Signi cantly, they demonstrate their method on both illusory contour gures like the Kanizsa triangle and on more \realistic" images (e.g. of plants and rocks) with impressive results. The time complexity of Heitger and von der Heydt's method is O(n4m) since m convolutions requiring O(n4) time must be computed. Although not portrayed as theories of illusory contours, there have been a number of methods in computer vision which compute contours or \salience" elds using dynamic programming or shortest path algorithms based on dynamic programming[1, 3, 6, 19, 17, 29]. These methods are related to ours by their use of cost functions based on smoothness and shortness. Perhaps more deeply, these dynamic programming methods are related to our present work because our repeated small kernel convolution can be viewed as a dynamic programming method of computing the source eld (see next section). Our method diers in many details, and most fundamentally, in that we compute a probability distribution based upon a stochastic model of contour formation, rather than a single contour minimizing a cost 4
function.
3 Review of Previous Method Like Mumford[20], we de ned a random walk which on average generates short, smooth paths. Unlike the familiar two-dimensional isotropic random walk, where a particle's state is simply its position in the plane, the particles of the random walk we de ned possess both position and orientation. The random walk is de ned by: 1) the equations of motion; and 2) a decay constant. The equations of motion describe the change in a particle's position and orientation as a function of time, the decay constant, , speci es a particle's half-life. Because a certain fraction of particles (i.e. 1 ? e? 1 ) decay per unit time, longer paths are exponentially less likely. A particle's position and orientation are updated according to the following stochastic non-linear dierential equation:
x_ = cos y_ = sin _ = ^(0; 2 ; t) where x_ and y_ specify change in position, _ is change in orientation (i.e. curvature) and ^(0; 2 ; t) is a normally distributed random variable with zero mean and variance equal to 2. Let there exist two subsets of states, P and Q representing the endpoints of a set of signed contrast edges corresponding to visible surface boundaries. We call P the set of sources and Q the set of sinks. The stochastic completion eld represents the probability that a particle, initially in state (xp; yp; p), for some p 2 P , will in the course of a random walk (representing a prior on completion shape) pass through state (u; v; ), on its way to state (xq ; yq ; q ), for some q 2 Q, for all combinations of u, v and . Williams and Jacobs[34] observed that the stochastic completion eld can be expressed as the product of two vector- eld convolutions. The convolution kernels are de ned with respect to random walks beginning (or ending) at the origin with orientation zero. More speci cally, the convolution kernels represent: 1) the probability that a particle beginning at (0; 0; 0) will reach any other position and orientation in the image plane (i.e., (x; y; )) 5
before it decays (i.e., the source eld); and 2) the probability that a particle beginning at (x; y; ) will reach (0; 0; 0) before it decays (i.e., the sink eld). Because the stochastic process is translation and rotation invariant, the probability that a particle will be in any position and orientation at some future time, t, can be computed by convolution with the appropriate Green's function, G:
p(u; v; ; t) =
Z1
dx
?1
Z1
dy
?1
Z
d G(u0 ? x; v0 ? y; ? ; t)p(x; y; ; 0) e ?t
?
where (u0; v0; ? ) is (u; v; ) rotated by ? about (x; y), so that u0 = u cos + v sin and v0 = ?u sin + v cos . The Green's function satisfying the dierential equation was originally computed by a Monte Carlo method which involved simulating the random walks of 1:0 106 particles on a 256 256 grid with 36 xed orientations. The probability that a particle beginning at (0; 0; 0) reaches (x; y; ) before it decays is approximated by the fraction of simulated trajectories beginning at (0; 0; 0) which intersect the region (x 1:0; y 1:0; =72). In a recent paper, Thornber and Williams[30] describe an analytic expression for the Green's function which can be used to compute lters which are signi cantly more accurate than those computed by the Monte Carlo method. If we assume that paths do not self-intersect before they decay, then the fraction of paths which pass through a given position and orientation equals the integral of the position and orientation probability density function over time. By changing the order of integration, and de ning a new Green's function, G0:
G0 (u; v; )
Z1 = dt G(u; v; ; t) e? 0
t
the explicit time variable, t, can be supressed. The result is that the source eld, p0 (u; v; ), can be computed by convolving the p.d.f. representing the source distribution with the new Green's function:
p0 (u; v; ) =
Z1 Z1 Z dx dy d G0(u0 ? x; v0 ? y; ? ) p(x; y; ; 0) ?1 ?1 ?
The sink eld, q 0(u; v; +),4 which represents the probability that a particle leaving (u; v; ) will reach a sink state before it decays can be computed in a similiar fashion:
q 0(u; v; + ) 4
The bar over the 0).
q (x; y; ;
Z1 Z1 Z = dx dy d G0(u0 ? x; v0 ? y; + ? ) q(x; y; + ; 0) ?1 ?1 ?
q
is meant to indicate that the initial condition is q(x; y; + ; 0) rather than
6
Finally, the stochastic completion eld, C (u; v; ), which represents the relative likelihood that a particle leaving a source state will pass through (u; v; ) and enter a sink state before it decays, equals the product of the source and sink elds:
C (u; v; ) = p0 (u; v; ) q 0 (u; v; + ) Clearly, the run-time complexity (i.e., neglecting the time required to compute G) of the above process is dominated by the convolution with the Green's function G0. The overall run-time compexity is therefore O(n4m2) for an n n image with m discrete orientations.
4 A Local Parallel Method In this section we show to how to compute the completion eld in a local parallel network. Fundamentally, the computation is the same as before:
Compute the source eld. Compute the sink eld. Compute their product. The dierence is in the way the source and sink elds are computed. In the new algorithm, the source eld is computed by integrating the Fokker-Planck equation for the stochastic process identi ed by Mumford[20]. The probability density function representing a particle's position at time t is given by: Zt y; ; t) p(x; y; ; t) = p(x; y; ; 0) + @p(x;@t 0 and @P = ? cos @P ? sin @P + 2 =2 @ 2 P ? 1= P @t @x @y @2 where P is p(x; y; ; t). The Fokker-Planck equation for the stochastic process is best understood by considering each of the four terms separately. The rst two terms taken together are the so called advection equation: @P = ? cos @P ? sin @P @t @x @y 7
Solutions of this equation have the form:
p(x; y; ; t) = p(x + cos t; y + sin t; ; t + t) for small t. That is, the probability density function for constant at times t and t + t are related through a translation by the vector [cos t; sin t]. Intuitively, for constant , probability density is being transported in the direction with unit speed. The third term is the classical diusion equation: @P = 2=2 @ 2 P @t @2 Solutions of the diusion equation are given by Gaussian convolutions. Although, strictly speaking, the Gaussian is not de ned for angular quantities, for small t and variance 2 the following will approximately hold:
p(x; y; ; t + t) =
Z
?
d p(x; y; ? ; t) exp(?2 =2t)
The last term implements the exponential decay: @P = ?1= P @t It is possible to interpret the Fokker-Planck equation in this case as describing a set of independent advection equations (one for each possible value of ) coupled in the dimension by the diusion equation. Taken together, the terms comprising the Fokker-Planck equation faithfully model the evolution in time of the probability density function representing a particle's position and orientation. The value of the source eld will also depend on the initial conditions. In this case, this is the probability density function describing the particle's position at time zero. In our implementation, the p.d.f. de ning the initial conditions consist of oriented impulses located at corners detected in the intensity image. These impulses represent the starting (or ending) points of possible completions. Strictly speaking, the Fokker-Planck equation can model our diusion process only if we assume smooth initial conditions. Unfortunately, because the p.d.f.'s de ning the source and sink distributions are discontinuous, their partial derivatives are not de ned. Moreover, the discrete method that we describe for solving the Fokker-Planck equation will be accurate only when a rst order approximation holds well 8
over the length of scales that we discretely represent. For now, we will assume that the initial conditions are smooth, and are well approximated by rst order expressions, and then show how to solve this partial dierential equation iteratively. In the next section, we will consider the accuracy of this method, and then describe a stochastic interpretation of the local method which does not require continuous initial conditions. A standard technique for solving partial dierential equations on a grid is to use the Taylor series expansion to write the partial derivatives at grid points as functions of the values of nearby grid points and of higher order terms. Then by neglecting higher order terms, we obtain an equation that expresses the value of a grid point at time t + 1 as a function of nearby grid points at time t, and which is accurate to rst order. See Ames[2] or Ghez[7] for more detailed discussion of this. This technique leads to the following set of equations, which are accurate to rst order and which represent an iterative method of solving the Fokker-Planck equation:
8 t < px;y; ? ptx?1;y; if cos > 0 t px;y; ? cos : t px+1;y; ? ptx;y; if cos < 0 8 t+1=4 t+1=4 +1=4 ? sin < px;y; ? px;y?1; if sin > 0 = ptx;y; : pt+1=4 ? pt+1=4 if sin < 0 x;y +1; x;y; t+1=2 t+1=2 +1=2 = px;y;? + (1 ? 2) px;y; + ptx;y; + 1 +3=4 = e? ptx;y;
p +1 4 =
(1)
+1=2 ptx;y;
(2)
t = x;y;
+3=4 ptx;y; +1 ptx;y;
(3) (4)
where = (2t)=2()2 . +1 into These equations require a few comments. First we have split the computation of ptx;y; +1 with a series four separate steps. This standard fractional method allows us to compute ptx;y; of separate convolutions in each of the three spatial directions. See Ames[2] for comments on the accuracy of this method, and descriptions of other related methods. Second, notice that our computations of the rst two terms depend on the sign of cos and sin . To see why this is necessary, consider the problem of computing the rst term for a particle traveling in the positive x direction. In this case, we wish to ensure that our approximation of @P at the @x point (x; y; ; t) depends on (x ? 1; y; 0 ; t) but not on (x + 1; y; 0 ; t) because the value of (x; y; 0 ; t + 1) depends on the rst value, but not the second. This method of ensuring that the local dierence used to approximate derivatives depends on the direction relevant to the 9
underlying physical process is called upwind dierencing, and can be shown to be formally necessary to ensure stability. Finally, we can see that the recurrence relationship given by these equations can be computed through small kernel convolution. For example, we can compute the rst recurrence using a 3 1 1 lter centered on the pixel being convolved with the lter. This lter is [cos ; 1 ? cos ; 0] for cos > 0 and [0; 1 + cos ; ? cos ] for cos < 0. The source eld, p0(x; y; ), which represents the probability that a particle beginning at a source will reach (x; y; ) before it decays, is computed by integrating the probability density function representing the particle's position over time:
p0(x; y; ) =
Z1
dt p(x; y; ; t)
0
We can approximate the integral of p(x; y; ; t) over all times less than some xed time t as follows: Xt p0(x; y; ; t) p(x; y; ; t) =0
i
which can be computed by the following recurrence:
p0 (x; y; ; t + 1) = p0(x; y; ; t) + p(x; y; ; t + 1) Since O(n) iterations are required to bridge the largest gaps which might be found in an image of size n n, and each iteration requires O(n2m) time, the run-time complexity of the small kernel method is O(n3m). In our experiments, the image size is typically 256 256 with 36 discrete orientations. This represents an eective speedup on the order of one thousand times when compared against the previous algorithm. We note that since the new algorithm is local and parallel, the communication overhead on a SIMD computer would be negligible, so that the parallel time complexity would be O(n) on a machine with O(n2m) processors. This is consistent with the estimated time required for illusory contour to form in human vision, which has been shown to scale linearly with the size of the gap to be bridged when the size of the inducing elements is held constant[27].
5 Stability and Accuracy Considerations We have presented a method of computing the stochastic completion eld using repeated small kernel convolutions. We now consider how faithfully this new method models the 10
underlying stochastic process. First, we identify the conditions under which our nitedierence scheme converges to the underlying partial dierential equation that it models. We then present a stochastic interpretation of the small kernel method. This interpretation embodies our original (i.e., physical) intuition and applies even in the case of discontinuous initial conditions. Finally, we compare the accuracy of our new method to our previous approach, which used a single large convolution mask. We can use standard techniques to determine that our method will be stable provided that the following three conditions are met: 1. = (2t)=2()2 21 . 2. cos xt 1. 3. sin yt 1. See standard texts, such as Ghez[7] or Ames[2] for the derivation and precise signi cance of stability. Informally, stability is necessary to ensure that the solution of the partial dierential equation at each time step incorporates all the possibly relevant information from the initial conditions, and to ensure convergence to the correct solution as space and time are discretized more nely. In the experiments that we report in this paper, we assume that x = y, and that the particles that we model are travelling at a speed of one spatial unit per unit of time. Therefore, conditions 2 and 3 will always be met. These conditions imply, however, that should we choose to discretize space more nely (i.e., to achieve greater accuracy), then we must also discretize time proportionately nely. Also, in all of our experiments we set = 36 . With this level of discretization, our method will only be stable when 36 . All our experiments use values of much less than this limit. We now present a stochastic interpretation of repeated small kernel convolution. This discussion serves two purposes: 1) The stochastic interpretation more directly embodies our original (i.e., physical) intuition that the prior probability distribution of completion shape can be modelled as a random walk in a lattice of discrete positions and orientations; and 2) Our stochastic interpretation applies even in the case of discontinuous initial conditions, 11
where interpreting the small kernel method as a rst order approximation to a partial differential equation breaks down. The small kernel method can be viewed as computing the probability distribution of the position of a particle undergoing a random walk in a lattice in R2 S 1. Suppose there is a particle in this lattice, with position (x; y; ), which at the rst time step:
moves to the neighboring site in the positive x direction with probability max(0; cos ). moves to the neighboring site in the ?x direction with probability max(0; ? cos ). remains at the current site with probability (1 ? j cos j). See Figure 2(a). Clearly, the convolution de ned by Equation 1 updates the probability distribution on the lattice consistent with this motion. In a similiar fashion, the change in the probability distribution of the particle's y-position is updated in the second time step. This update is described by Equation 2 (see Figure 2(b)). In the third time step, the particle:
moves to the neighboring site in the direction with probability . moves to the neighboring site in the ? direction with probability . remains at the current site with probability (1 ? 2). See Figure 2(c). Equation 3 updates the probability distribution on the lattice consistent with this motion. Finally, Equation 4 simulates the exponential decay of the particle (see Figure 2(d)), which occurs on the fourth time step. After t iterations of these four steps, we have computed the distribution of the position of a set of particles at time t, given initial conditions that specify a distribution of their positions at time zero. This stochastic process is an exact model of what our small kernel method computes, and is valid even when the initial conditions are discontinuous. We now estimate the dierence in magnitude between the outputs of the small and large kernel methods. Previously, we constructed a large convolution lter by simulating the paths of a large number of particles undergoing a random walk somewhat dierent than the one described above. Each particle began its random walk at the origin, with a direction of = 0. Time was divided into discrete units, and at each time step, was changed by a value drawn from a continuous, Gaussian distribution. The x and y positions of the particle 12
S T E P
S T E P
cos
1−cos
sin 1−sin
2
1 Y
(a)
X
S T E P
(b)
S T E P
1−2
e
1
4
3
(d)
(c)
Figure 2: The small kernel method can be interpreted as a random walk in a lattice of discrete positions and orientations. (a) Step 1: Advection in x direction (for cos 0). (a) Step 2: Advection in y direction (for sin 0). (c) Step 3: Diusion in . (d) Step 4: Exponential decay.
13
at each time step were represented as real numbers. The magnitude of the lter kernel at any given point was estimated to be the fraction of trajectories intersecting a small volume of x ? y ? space. Rotated and translated versions of this lter could then be used to determine the source (or sink) eld produced by a source (or sink) of any position and orientation. This Monte Carlo approach has two potential sources of inaccuracy compared to the continuous process described by the Fokker-Planck equation: 1) time is discretized not continuous; and 2) we only simulate the paths of a nite (though large) number of particles. The small kernel method shares this rst source of error and since we are only concerned with additional inaccuracy due to the use of the small kernel method, the second source of error does not concern us (see Thornber and Williams[30] for an analysis of the accuracy of the Monte Carlo method). The stochastic process simulated by the small kernel method diers from that of the previous method in two ways. First, in the new process, diuses discretely at each time step, while previously, changes in were drawn from a continuous Gaussian distribution. This is analogous to the way in which Gaussian smoothing is typically implemented on a grid. Since this discrete approximation will rapidly converge to a true Gaussian distribution, we focus on the second dierence, which is the restriction that particles trace their paths in a lattice. Previously, a particle with direction at the beginning of a time step moved by the vector [cos ; sin ] during that time step. In the new process, a particle moves one unit in the x direction with probability j cos j, and one unit in the y direction with probability j sin j. This can be interpreted as a stochastic approximation of the same motion. Consider the dierence in how each method computes the path of a particle as a function of a set of values. Let 1; 2 ; :::t be the values which takes at successive times, t. While the Monte Carlo method maintains the exact x and y coordinates over time of the piecewise linear path de ned by these values of , the small kernel method does not trace the trajectory of a particle exactly but rather maintains a probability distribution representing its likely position in the lattice (i.e., it introduces an additional source of randomness, beyond that inherent in the Monte Carlo method). Now, consider the x position of a particle after t time steps. The Monte Carlo method will give this as Pti=1 ci (where ci = cos i and si = sin i ). If we let xi be a random variable which is 1 with probability ci and 0 with probability 1 ? ci , then repeated small-kernel convolution 14
produces values in the x direction that are a probability distribution for the random variable Pt x . It is straightforward to show that this variable has a distribution with mean Pt c i=1 i i=1 i P P and variance ti=1 (ci ? ci)2 = ti=1(ci ? c2i ). This variance can range from 0 (for ci = 0 or ci = 1) to as much as 41 when each ci is equal to 21 (i.e., = 3 ). Moreover, it follows from Lindeburg's theorem that if this distribution
is normalized to be of mean 0 and unit variance it will converge to a normal distribution, as t ! 1. This depends on the variables satisfying the Lindeburg condition, which states roughly that each term in the sum becomes, in the limit, a vanishingly small component of the overall sum. See Feller[5], for example, for more detail on this. Similar reasoning tells us that the distribution of a particle's position in the y direction will have mean Pti=1 si and variance Pti=1 (si ? si)2 = Pti=1(si ? s2i ) which can also range from 0 to 4t . So, for example, with t = 100 the standard deviation of this \spread" in the position of a particle is bounded by 5 pixels. The source eld computed with each method consists of the sum of each possible path weighted by the probability of encountering the set of values that describes that path. Therefore, the entire source eld computed by the small kernel method is similar to what one would obtain by convolving the source eld produced by the Monte Carlo method with Gaussian lters with variances between 0 and 4t . Note that this eect is not isotropic. The above expressions show that particles travelling in directions parallel to the x and y axis will have zero variance, while those traveling in the 4 direction will have maximum variance.5 Finally, we note that we can obtain greater accuracy in computing the source eld, if needed, at the cost of additional computation. Suppose that we divide each unit in space, angle and time into d subunits, performing the same computations on these subunits. Similar calculations show that the variance in each direction will be bounded by 4td pixels (i.e., that the standard deviation drops with the square root of the number of discrete units used, as might be expected). We must, however, perform td convolutions on a eld of size nd nd m, rather than t convolutions on a eld of size n n m, increasing the run time by a factor of d3. In sum we can see that the use of a small convolution kernel has a negligible eect on The lack of any detectable global phase coherence in the orientation preference structure of visual cortex[21] suggests that (unlike the obvious computer implementation on a rectangular grid) systematic long range errors due to local anisotropy will be minimal in vivo. 5
15
the accuracy of our method, and that this eect can be reduced even more, if needed. This seems to be a price well worth paying in light of its many advantages.
6 Anisotropic Decay The fact that previous computation took the form of a convolution with a large kernel was a simple consequence of the translational and rotational invariance of the random process. That is, the likelihood of a particle taking a random walk of a given shape is assumed to be independent of the random walk's starting point and orientation. Because the new method is based upon repeated small kernel convolution, we are free to consider stochastic processes which are not everywhere the same, that is, processes which are neither homogenous nor isotropic. There is some precedent for this in computer vision, where dierent methods lumped together under the term \anisotropic diusion" are becoming increasingly popular for image enhancement[23, 28, 32]. The analogy breaks down in the sense that the quantity being \diused" in the one case is image brightness, and in our case, the probability density of a particle's position and orientation. Conceivably, each of the two parameters de ning the stochastic process (i.e., 2 and ) could be modulated by local directional functions of the image brightness. Interpreted within a Bayesian framework, this could be viewed as augmenting the prior model with the additional information provided by the local image brightnesses to better estimate the parameters of the stochastic process modelling completion shape within the local neighborhood. In this paper, we take only a rst step in this direction and consider modulating the half-life of the particle, . The eect is that illusory contours will respect the local image brightness structure. Perceptual completions occur in two types: modal and amodal. Modal completions are typically accompanied by a change in perceived brightness. Modal completions correspond to surface boundaries which are nominally visible (i.e., no occluding surface is presumed to be hiding them). Illusory contours are therefore modal completions. Amodal completions are nominally occluded (i.e., an intervening surface is presumed to be hiding them). Kellman and Shipley[15] and others (e.g., [22, 33]) hypothesize that modal and amodal completions are computed by the same underlying process. According to this view, whether or not a 16
completion is perceived as modal or amodal is determined by a later process devoted to organizing surfaces. We are not concerned here with all of the factors which play a role in determining whether or not a completion will be perceived as modal or amodal. Some of these factors are beyond the scope of the current paper (they concern the topological validity of the surface organization[33]). Rock[26] emphasizes one factor, termed stimulus conformity, which is relevant to our current discussion. Rock's notion of stimulus conformity can be used to explain why illusory contours (unlike amodal completions) cannot cross large brightness discontinuities. This is based on the following argument: The illusory contour is presumed to be the visible boundary of a surface which occludes either the area to its left or right. However, the discontinuity in image brightness continues uninterrupted on both sides of the boundary of the presumed occluding surface. This contradicts the assumption that the illusory contour is the visible boundary of an occluding surface. Unlike the global method, the local parallel algorithm allows us to incorporate local dependencies on image brightnesses into the computation. Speci cally, we can enforce the requirement that illusory contours not cross discontinuities in image brightness by means of a mechanism which we refer to as anisotropic decay. This mechanism makes the rate of decay an angular function of the local image brightnesses. The rst requirement is that in regions where brightness is changing slowly, a particle's half-life should remain constant. This will allow illusory contours to form in these regions. The second requirement is that along edges the half-life should be very short in directions leading away from the edge, but very long in directions leading towards the edge. This will have the eect of channeling particles toward the edges. Finally, we require that the half-life should be very long in the direction tangent to the edge. This will encourage particles to continue traveling along the edge. All of these requirements can be satis ed by a function based on the directional second derivative: 8 >< e D2 I if D2+=2 I 6= 0 and D1 I > 0 () = > e?D2 I if D2+=2 I 6= 0 and D1 I < 0 : + jrI j otherwise: where D1 I and D2 I are the rst and second directional derivatives of image brightness in the direction of the particle's motion and D2+=2 I is the second directional derivative in 17
the direction normal to the particle's motion. The condition D2+=2 I 6= 0 is simply used to decide whether or not the particle is traveling along an edge (i.e., a zero-crossing of the second directional derivative in the direction normal to the particle's motion). If so, the halflife is taken to be + jrI j, which becomes in homogenous areas as required, irrespective of whether or not there are zero crossings. If the particle is not travelling in the direction of an edge, then the half-life is exp(D2 I ) or exp(?D2 I ) depending on the sign of D1 I . This expression: 1) is large or small depending on whether or not the particle is moving toward or away from an edge; and 2) evaluates to in homogenous areas.
7 Experimental Results Through a sequence of pictures shown in Figure 3, we rst demonstrate the propagation of two probability density \waves" emitted from a pair of sources located on a line parallel to the x-axis and with orientations equal to 6 and 56 . The pictures show the probability density (i.e., p(x; y; ; t)) summed over all orientations (i.e, the marginal distribution in ). The time interval between successive pictures in the sequence, t is 5, the variance, 2, is 0.005 and the half-life, , is 4.5. Error due to the inherent anisotropy of the nite-dierence scheme is especially pronounced in the rst ve frames (a-e) where it manifests itself as a
attening of the face of the wave along the vertical direction. This eect diminishes as the size of the wave increases, the result of greater eective resolution. In (j-q) the waves from the dierent sources can be seen to pass through one another without aecting one another. Finally, we observe that there is a visible decrease in brightness between the rst and last frame of the sequence. While some of this decrease in brightness can be attributed to the diusion in space, the larger part can be attributed to the exponential decay of the particles. A second sequence of images, shown in Figure 4 depicts the formation of the completion eld as a function of time. To simplify this demonstration, the sink eld was assumed to be related to the source eld by a transformation which reverses directions, so that:
C (u; v; ; t) = p0(x; y; ; t) p0(x; y; + ; t) Frames (a-i) are totally dark, since insucient time has elapsed for particles traveling from the two sources to reach each other. The completion eld rst appears in frame (j), at the 18
point midway between both sources. In frames (k-t) the completion eld rapidly spreads outward in both directions away from the midpoint and back toward the two sources. The third demonstration is the well known Kanizsa triangle[13] (see Figure 5 (a)). Detecting and localizing points where it is likely that one surface occludes another (i.e., \T" or \L" junctions) and then measuring the orientations of both boundary contours in the neighborhood of that point is a dicult and unsolved problem. Although our method of detection and localization is not satisfactory, our method of measuring multiple orientations (similiar to the steerable \wedge" lter described by Farid and Simoncelli[9]), works reasonably well. For the moment, we use the steerable lter to measure the two orientations at corners detected as local maxima in the eigenvalue associated with the second eigenvector of the windowed second moment matrix[16, 28]. The maximum and minimum of the continuous response of the steerable lter is then measured to approximately 3 of accuracy. As an anti-aliasing measure, a unit mass is distributed proportionally among the two discrete orientations straddling the nominal maximum and minimum. The maximum is interpreted as a source and the minimum as a sink. Figure 5 (b) shows the stochastic completion eld computed using large kernel convolution with a lter generated by Monte Carlo simulation (from [34]). The completion eld is summed over all orientations and superimposed on the brightness gradient magnitude image for illustrative purposes. Figure 5 (c) shows the same, but this result was computed with a lter generated by numerical integration of the analytic expression for G derived by Thornber and Williams[30]. This is by far the most accurate method of computing the stochastic completion eld. Finally, Figure 5 (d) shows the stochastic completion eld computed using repeated small kernel convolution. There are small (but noticeable) numerical artifacts due to the inherent anisotropy of the nite dierence scheme for solving the advection equation (See Figure 6). Although space considerations preclude our showing them here, we have re-run all of the experiments from our previous paper using the small kernel algorithm and achieved results of similiar quality. Finally, we have tested the small kernel method on a picture of a dinosaur using the same method of keypoint detection and orientation measurement. The stochastic completion eld, summed over all orientations and superimposed on an edge image for illustration purposes, is shown in Figure 7. The body of the dinosaur has been completed where it is occluded by the legs. 19
In Figure 8 we demonstrate the eectiveness of the anisotropic decay function described in the previous section. This gure also illustrates the strong connection between the stochastic completion eld and the active energy minimizing contours (i.e., \snakes") of Kass, Witkin and Terzopolous[14]. The stochastic completion eld can be thought of as a probability distribution representing a family of snakes with given end conditions. This connection is strengthened by the inclusion of the anisotropic decay mechanism, since this device plays a role similiar to the \external" energy term of the snake formulation. Figure 8(a) shows two probability density \waves" emitted from a pair of sources located on the dinosaurs back. The source and sink elds were assumed equal and the time, t = 32. Figure 8(b) shows the stochastic completion eld computed for this pair of sources. The mode of the stochastic completion eld (i.e. the curve of least energy), lies signi cantly below the back of the dinosaur. Furthermore, the variance of the distribution is quite large. Figure 8(c) show the probability density \waves" emitted from this same pair of sources but with particle half-life determined by the anisotropic decay function. The probability density is maximum where the wave intersects the back of the dinosaur. Figure 8(d) shows that the mode of the resulting completion eld conforms closely to the back of the dinosaur and that the variance of the distribution is signi cantly reduced. In our last experiment, we demonstrate the eect of the anisotropic decay function using three variations of the Kanizsa square gure (see Ramachandran et al.[24]). The rst variation, shown in Figure 9(a), consists of four \pacmen" on a uniform white background. The second, shown in Figure 9(c) is the same except that a checkered background has been added. The phase of the checkered pattern is such that the illusory contours joining the \pacmen" must traverse large brightness gradients (i.e., the maximum likelihood path in the absence of the checkered pattern crosses the borders of the squares at right angles). Ramachandran et al.[24] report that the percept of the illusory square is severely diminished or supressed entirely by the addition of the out-of-phase checkered background. The third variation, shown in Figure 9(e), also places the \pacmen" on a checkered background, but this time the phase of the checkered pattern is such that the illusory contours encounter no large brightness gradients but instead coincide with the borders of the squares. Ramachandran et al.[24] report that the illusory square is greatly enhanced by the addition of the in-phase checkered background. The magnitude of the stochastic completion eld (summed 20
over all orientations) along the line y = 64 is displayed to the right of each of the Kanizsa squares. Since the scale of the vertical dimensions are identical, it is clear that the results of the computer simulation are consistent with the human percept in each case.
8 Conclusion Illusory contour formation provides a strong clue about how human boundary detection works, and consequently has been one of the most widely studied processes in human midlevel vision. However, no adequate computational theory of this phenomenon exists. Our method improves upon previous work, including our own, by better tting the known properties of the brain, and by explaining some of the data on the timing properties of illusory contour formation. Moreover, we believe that illusory contours are merely one aspect of the general problem, in both human and machine vision, of extracting and interpreting the boundaries of objects. We show experiments that indicate that our work can also be relevant to providing a better understanding of methods for extracting boundaries in everyday images. We show how to model illusory contour formation using repeated, small kernel convolutions. This method directly improves upon several published models because of its greater eciency (by several orders of magnitude) and because it allows the edge boundaries of objects to be connected in a way that takes account of all relevant image intensities, forming illusory contours only when they are consistent with these intensities. Our work therefore helps to bridge the gap between work on understanding human performance on illusory contours, and work in computer vision, such as \snakes," that attempts to extract useful contours that are smooth and yet faithful to the image intensities.
References
[1] Alter, T. D., and R. Basri, Extracting Salient Contours from Images: An Analysis of the Saliency Network, MIT A. I. Memo No. 1550, August 1995. [2] W. Ames, Numerical Methods for Partial Dierential Equations, Academic Press Inc., 1992. [3] Amini, A. A., T. E. Weymouth, and R. C. Jain, Using Dynamic Programming for Solving Variational Problems in Vision, IEEE Transactions on Pattern Anal. and Machine Intell., Vol. 12, No. 9, pp. 855-867, 1990. 21
[4] Cox, I.J., Rehg, J.M. and Hingorani, S., A Bayesian Multiple Hypothesis Approach to Edge Grouping and Contour Segmentation, Intl. Journal of Computer Vision 11, pp. 5-24, 1993. [5] W. Feller, An Introduction to Probability Theory and its Applications, Vol. II, John Wiley and Sons, 1971. [6] Geiger, D., A. Gupta, L. A. Costa, and J. Vlontzos, Dynamic Programming for Detecting, Tracking, and Matching Deformable Contours, IEEE Transactions on Pattern Anal. and Machine Intell., Vol. 17, No. 3, pp. 294-302, 1995. [7] R. Ghez, A Primer of Diusion Problems, John Wiley and Sons, 1988. [8] Grossberg, S., and Mingolla, E., The Role of Illusory Contours in Visual Segmentation, The Perception of Illusory Contours, Petry, S. and Meyer, G. (eds.), Springer-Verlag, New York, pp. 116-125, 1987. [9] Farid, H., and E. Simoncelli, Steerable Wedge Filters, Proc. of the 5th Intl. Conf. on Computer Vision, Cambridge, Mass., 1995. [10] Guy, G. and Medioni, G., Inferring Global Perceptual Contours from Local Features, Proc. of the DARPA Image Understanding Workshop, Washington, D.C., pp. 881-892, 1993. [11] Heitger, R. and Von der Heydt, R., A Computational Model of Neural Contour Processing, Figure-ground and Illusory Contours, Proc. of 4th Intl. Conf. on Computer Vision, Berlin, Germany, 1993. [12] Horn, B.K.P., The Curve of Least Energy, MIT AI Lab Memo No. 612, MIT, Cambridge, Mass.,1981. [13] Kanizsa, G., Organization in Vision, Praeger, New York, 1979. [14] Kass, M., Witkin, A. and Terzopolous, D., Snakes: Active Minimum Energy Seeking Contours, Proc. of the First Intl. Conf. on Computer Vision, London, England, pp. 259-268, 1987. [15] Kellman, P.J., and T.F. Shipley, A Theory of Visual Interpolation in Object Perception, Cognitive Psychology 23, pp. 141-221, 1991. [16] Lindeberg, T., Scale-Space Theory in Computer Vision, Kluwer Academic Publishers, The Netherlands, 1994. [17] Martelli, A., An Application of Heuristic Search Methods to Edge and Contour Detection, Comm. ACM, 19(2), February 1976. [18] Michaelis, M. and Sommer, G., Junction Classi cation by Multiple Orientation Detection, Proc. of European Conf. on Computer Vision, pp. 101-108, 1994. [19] Montanari, U.,On the Optimal Detection of Curves in Noisy Pictures, Comm. ACM, 14(5), May 1971. [20] Mumford, D., Elastica and Computer Vision, Algebraic Geometry and Its Applications, Springer-Verlag, New York, 1994. [21] Neibur, E. and Worgotter, F., Design Principles of Columnar Organization in Visual Cortex, Neural Computation 6, pp. 602-614, 1994. 22
[22] Nitzberg, M., Shiota, T. and Mumford, D., Filtering, Segmentation and Depth, Springer-Verlag, Berlin, 1993. [23] Perona, P. and Malik, J., Scale Space and Edge Detection Esing Anisotropic Diusion, IEEE Trans. on Pattern Analysis and Machine Intelligence, 12, pp. 629-639, 1990. [24] Ramachandran, V.S., Ruskin, D., Cobb, S., Rogers-Ramachandran, D., and Tyler, C.W., On the Perception of Illusory Contours, Vision Research, Vol. 34, No. 23, pp. 3145-3152, 1994. [25] Rock, I. and Anson, R., Illusory Contours as the Solution to a Problem, Perception Vol. 8, pp. 665-681, 1979. [26] Rock, I., The Logic of Perception, MIT Press, Cambridge, Mass., 1983. [27] Rubin, N., Shapley, R., and Nakayama, K., Rapid Propagation Speed of Signals Triggering Illusory Contours, Investigative Opthalmology and Visual Science (ARVO), Vol. 36, No. 4, 1995. [28] Nitzberg, M. and Shiota, T., Nonlinear Image Filtering with Edge and Corner Enhancement, IEEE Trans. on Pattern Analysis and Machine Intelligence, 14, pp. 826833, 1992. [29] Shashua, A. and Ullman, S., Structural Saliency: The Detection of Globally Salient Structures Using a Locally Connected Network, 2nd Intl. Conf. on Computer Vision, Clearwater, FL, pp. 321-327, 1988. [30] Thornber, K.K. and L.R. Williams, Analytic Solution of Stochastic Completion Fields, IEEE Intl. Symposium on Computer Vision, Coral Gables, FL, 1995. [31] Ullman, S., Filling-in the Gaps: The Shape of Subjective Contours and a Model for Their Generation, Biological Cybernetics 21, pp. 1-6, 1976. [32] Weickert, J., Multi-Scale Texture Enhancement, Conf. on Computer Analysis of Image and Patterns, Prague, Czech Republic, pp. 230-237, 1995. [33] Williams, L.R., Perceptual Completion of Occluded Surfaces, Ph.D. Dissertation, Dept. of Computer Science, Univ. of Massachusetts at Amherst, Amherst, Mass., 1994. [34] Williams, L.R., and D.W. Jacobs, Stochastic Completion Fields: A Neural Model of Illusory Contour Shape and Salience, Proc. of the 5th Intl. Conf. on Computer Vision, Cambridge, Mass., 1995.
23
(a)
(b)
(c)
(d)
(e)
(f)
(g)
(h)
(i)
(j)
(k)
(l)
(m)
(n)
(o)
(p)
(q)
(r)
(s)
(t)
Figure 3: Sequence of images depicting probability density \waves" emitted by a pair of sources located on a line parallel to the x-axis and with orientations equal to 6 and 56 . The pictures show the probability density (i.e., p(x; y; ; t)) summed over all orientations (i.e, the marginal distribution in ). The time interval between successive pictures in the sequence, t is 5, the variance, 2, is 0.005 and the half-life, , is 4.5. 24
(a)
(b)
(c)
(d)
(e)
(f)
(g)
(h)
(i)
(j)
(k)
(l)
(m)
(n)
(o)
(p)
(q)
(r)
(s)
(t)
Figure 4: Sequence of images depicting the formation of the completion eld (i.e., C (u; v; ; t)) as a function of time. Frames (a-i) are totally dark, since insucient time has elapsed for particles traveling from the two sources to reach each other. The completion eld rst appears in frame (j), at the point midway between both sources. In frames (k-t) the completion eld rapidly spreads outward in both directions away from the midpoint and back toward the two sources. 25
(a)
(b)
(c)
(d)
Figure 5: (a) Kanizsa Triangle. (b) Stochastic completion eld computed by large kernel convolution with lter generated by Monte Carlo simulation[34] (summed over all orientations and superimposed on the brightness gradient magnitude image). (c) Stochastic completion eld computed by large kernel convolution with lter generated by analytic expression[30]. (d) Stochastic completion eld computed by repeated small kernel convolution.
26
(a)
(b)
Figure 6: Numerical artifacts caused by anisotropy inherent in nite dierence solution of the advection equation. (a) Completion eld in neighborhood of \pacman" computed by large kernel convolution. (b) Completion eld in same neighborhood computed by repeated small kernel convolution.
(a)
(b)
Figure 7: (a) Dinosaur image (Apatosaurus). completed where it is occluded by the legs. 27
(b)
The body of the dinosaur has been
(a)
(b)
(c)
(d)
Figure 8: The stochastic completion eld can be thought of as a probability distribution representing a family of snakes with given end conditions. (a) Probability density at t = 32 (i.e., p(x; y; ; 32)) for a pair of sources placed on the back of the dinosaur. This was computed by the small kernel method without anisotropic decay. (b) The mode of the stochastic completion eld (i.e. the curve of least energy), lies signi cantly below the back of the dinosaur. (c) Probability density at t = 32 for same sources computed with anisotropic decay. The probability density is maximum where the wave intersects the back of the dinosaur. (d) The mode of the resulting completion eld conforms closely to the back of the dinosaur.
28
(a)
(b)
(c)
(d)
(e)
(f)
Figure 9: Anisotropic Decay. (a) Kanizsa Square. (b) Stochastic completion eld magnitude along line y = 64. (c) Kanizsa Square with out-of-phase checkerboard background (see Ramachandran et al.[24]). (d) Stochastic completion eld magnitude along line y = 64. (e) Kanizsa Square with in-phase checkerboard background. (f ) Stochastic completion eld magnitude along line y = 64. 29