IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,
VOL. 28, NO. 12,
DECEMBER 2006
On the Distribution of Saliency

Alexander Berengolts and Michael Lindenbaum

Abstract—Detecting salient structures is a basic task in perceptual organization. Saliency algorithms typically mark edge-points with some saliency measure, which grows with the length and smoothness of the curve on which these edge-points lie. Here, we propose a modified saliency estimation mechanism that is based on probabilistically specified grouping cues and on curve length distributions. In this framework, the Shashua and Ullman saliency mechanism may be interpreted as a process for detecting the curve with maximal expected length. Generalized types of saliency naturally follow. We propose several specific generalizations (e.g., gray-level-based saliency) and rigorously derive the limitations on generalized saliency types. We then carry out a probabilistic analysis of expected length saliencies. Using ergodicity and asymptotic analysis, we derive the saliency distributions associated with the main curves and with the rest of the image. We then extend this analysis to finite-length curves. Using the derived distributions, we derive the optimal threshold on the saliency for discriminating between figure and background and bound the saliency-based figure-from-ground performance.

Index Terms—Saliency networks, grouping, perceptual organization, figure-from-ground.
1 INTRODUCTION

ATTENTION mechanisms allow the human observer to focus computational resources on the more informative aspects of the retinal images [1], [2]. Effective attention can be carried out as a bottom-up process, without any information about the sought-for objects. The attention process is often modeled using a saliency map: an internal map, calculated by some preattentive mechanism [3], representing the estimated priorities assigned to every location; see, e.g., [4], [5], [6]. Image edges, which are more informative than other image parts, are detected by the human visual system [2] and further filtered by the attention process, which is effectively able to discriminate the important image edges, known as the "figure," from the less important edges, known as the "background." To account for this phenomenon using a computational theory [7], Shashua and Ullman proposed a measure, denoted saliency, which quantifies the quality of image curves according to their length and smoothness [8]; see also related work in [9]. The properties of this saliency measure were considered in [10]. Other saliency measures have been suggested as well. Some are based on local voting schemes and either use tensor voting [11] or count the number of random paths through the given point [12]. Others are recursive voting approaches, where every feature point votes according to its own saliency [13], [14] and saliency is calculated using eigenvector-based clustering. While the more recent saliency algorithms are mathematically sound and often provide a visually informative result, it seems that only the SU method [8] provides us with the curves that maximize a prespecified quality measure. Direct figure-ground discrimination approaches (e.g., [15], [16]) and contour grouping methods (e.g., [17], [18], [19], [20], [21]) may be considered as providing binary
The authors are with the Computer Science Department, Technion, Israel Institute of Technology, Haifa 32000, Israel. E-mail: {aer, mic}@cs.technion.ac.il.

Manuscript received 14 Aug. 2005; revised 13 Mar. 2006; accepted 1 May 2006; published online 12 Oct. 2006. Recommended for acceptance by J. Oliensis. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPAMI-0439-0805.

0162-8828/06/$20.00 © 2006 IEEE
saliency measures. We should also mention that saliency-like spatial support is often part of the design of edge detectors, either by incorporating hysteresis [22] or simply by increasing the edge detection scale. Increasing the scale indeed enhances the larger, more salient object boundaries, but may miss details and small objects. This applies as well to algorithms which suppress texture-based edges [23], [24]. In a recent study, we found that, with cue optimization, the SU saliency competes favorably with common edge detectors in detecting detailed significant edges [25].

Having its origin in attempts to explain a perceptual phenomenon, most saliency research focuses on biologically plausible mechanisms for finding long, smooth, and possibly closed curves. The saliency measure proposed in [8] is one possible quantification of the intuitively phrased desired properties, and other quantifications, which are just as plausible and computationally efficient, would lead to a different choice of the most salient curve. Thus, the first goal of this paper is to provide an alternative specification of saliency, which is consistent with the human visual system but also has a precise, quantitatively justified meaning. This notion of saliency relies on probabilistic cues and length distributions. A grouping cue typically uses some affinity estimate and provides uncertain information about the grouping of two (or more) features. This uncertainty is naturally described by a random variable (r.v.) indicating whether the two features are connected. The statistics of this r.v. depend on the cue value, and we refer to this r.v. as a probabilistic cue. Taking this stochastic approach (related to [26], [27], [28], [29]), the length of any curve through feature points becomes a random variable as well, characterized by a distribution.
The first part of this paper (Section 2) shows how the length distribution can be used for interpreting and generalizing saliency processes. We show that this distribution may be iteratively estimated, in parallel, for all image features. Different saliencies, such as the expected length, may be specified as different functions of this estimated distribution. Phrased in terms of our approach, the original saliency of [8] may be regarded as a particular case associated with a curvature/distance-based grouping cue and maximizing the expected length of the curve on which
this point lies. This way, the traditional saliency measure gets a different, clearer interpretation. The abstract treatment of cues as random variables allows different types of cues, not only those associated with curvature, to be used when specifying a saliency. The last result in the first part of the paper is negative: We formally show that while, in principle, many types of saliency could be specified from the length distribution, only a few may be optimized efficiently by dynamic programming. In particular, only a few generalized moments of the length distribution (including the mean) may be exactly optimized. The second part of the paper (Sections 3 and 4) applies the stochastic model to analysis and prediction. It considers a family of images, characterized by two distributions of cues: one for the important "figure" parts and another for the less important "background." Thus, the cues' values themselves are random, and the associated saliency at every point is random as well. This comes in contrast to the first part of the paper, which considers a specific image, with specific cue values, leading to a deterministic saliency (although, certainly, both the cues and the implied saliency vary from point to point). Given the cue distributions, we derive (for the first time) the distribution of (expected length) saliency values. This distribution is estimated both asymptotically, where we get a very close analytic approximation, and for finite-length curves. Setting a threshold on the saliency yields a binary distinction between figure and background. Once these distributions are available, this threshold may be optimally set and the associated performance (the error rate in the task of finding the figure) predicted.
This new characterization of saliency using probabilistic cues and the associated analysis has the following advantages:

• Another perspective: The consideration of the SU saliency, not only according to its original curvature-based interpretation but also by its probabilistic interpretation as expected length, gives another intuitive interpretation for the measure that makes one curve a winner.
• Generality: The probabilistic saliency lends itself to different realizations of saliencies based on different cues and thus allows other sources of information to be used for grouping.
• Systematic design: The probabilistic characterization of cues may be derived from typical images (using ground truth), rather than from preconceived opinions about the nature of figure subsets. The threshold for discriminating between figure and background on the basis of saliency may be calculated from the derived distributions.
• Predictability: The expected number of errors, as well as the effect of changing cues or thresholds, may be predicted from the derived distributions.

The flexibility (or generality) of our approach helps to overcome some problems described in [10], which are not, as we see it, inherent to the saliency process, but are caused instead by the use of geometry-based exponential saliency cues.

The paper continues as follows: First, in Section 2, we present the length distribution concept and the different saliency measures that may be built upon it. We investigate a limitation on the efficient optimization of these saliency functions and characterize the admissible saliencies. In
Section 3, we derive the distribution of expected length saliency, using a random cue model. The derived distributions are used for setting a figure-ground threshold and for estimating the performance of the saliency-based figureground discrimination (Section 4). Many experiments, describing the different types of saliencies and comparing the predicted saliency distributions to the true ones (on real imagery), are described in Section 5. Partial preliminary results of this paper were presented in [30], [31], and a full version including the proofs is in [32].
2 A PROBABILISTIC APPROACH TO SALIENCY
In this section, we describe the proposed saliency measure. We start with a definition of the length distribution, and then show how it is estimated and how saliencies may be associated with it and optimized. Then, we discuss a particular saliency and show that it is tightly related to the SU saliency [8]. Finally, we briefly discuss an inherent limitation that prevents us from further generalizing the proposed approach.
2.1 Length Distributions

Length distribution is a key concept in our approach. Let $x_i$ be a directional feature point in the image (e.g., an edgel). Such a point may or may not belong to some curve that extends $l^+$ length units to one side and $l^-$ length units to the other side. Here, we consider these lengths as random variables associated with the feature $x_i$, and characterize them by the distributions $D_i^+(l)$ and $D_i^-(l)$, respectively. The idea behind treating the lengths as random variables is discussed below. The direction, used to keep the order in the curve, is specified relative to, say, the direction of the gradient at this feature point, and may take one of the two $\{+,-\}$ values. The parts of the curve lying in the positive and negative directions are denoted the positive and negative extensions, respectively. Our basic intuition is that points with long extensions correspond to larger objects and deliver more significant information about the content of the image. Therefore, we shall try to find those feature points associated with $D_i^+(l)$ and $D_i^-(l)$ distributions which put more weight on longer $l$ values.

2.2 Estimating Length Distribution by Local Propagation

When no connectivity information is available, no features can be said to belong to any curve. Hence, all distributions are concentrated on very short lengths, corresponding to the lengths of the corresponding features themselves. For simplicity, we assume that all these distributions are identical and denote them by $D^{\emptyset}(l)$. When two feature points are connected, their distributions are tightly related. Consider two features, $x_i$ and $x_j$, which belong to some curve such that $x_j$ lies in the positive extension of $x_i$. Suppose that $D_j^+(l)$ is known. Then, $D_i^+(l)$ can be written as

$$D_i^+(l) = D_+^{j\to i}(l) = D_j^+(l - l_{ij}), \qquad (1)$$

where $l_{ij}$ is the distance from $x_i$ to $x_j$ (on the curve); see Fig. 1. This follows by observing that a positive extension of length $l$ associated with $x_j$ implies that the length of the positive extension of $x_i$ is $l + l_{ij}$. The notation $D_+^{j\to i}(l)$ explicitly emphasizes that this is an inference of the length distribution associated with the $i$th feature from the known distribution associated with the $j$th feature.
Fig. 1. Edge features, possibly belonging to the same curve.
As is common in image analysis, we can never be sure that two features lie on the same curve. In a non-model-based context, we may take a probabilistic approach, treat this event as a binary random variable, and estimate its probability from local information such as perceptual organization cues [33], [34]. Let $c(x_j)$ denote the curve on which $x_j$ lies and let $P_{ij}$ be the probability $\mathrm{Prob}\{x_i \in c(x_j)\}$; see Fig. 1. This probability, denoted a probabilistic grouping cue, is inferred as follows (in an already standard way; see, e.g., [35]): Find the distribution of some perceptual affinity (for example, the difference in gradient angles) for feature point pairs that belong to the same curve, and again for feature point pairs that do not. Then, given an affinity value associated with the $(x_i, x_j)$ pair, find the posterior probability that the two edgels $x_i$ and $x_j$ belong to the first class. Specifying the affinity between the two feature points using this abstract cue allows us to calculate a saliency-like measure based on different grouping cues, and not only on the cocircularity cue used in [8]. As we shall see, such different saliencies have a common meaning, independent of the different types of information they employ.

Consider a specific path $\gamma = \{x_1, x_2, \ldots, x_N\}$, starting at the feature point $x_1$, such that $x_{i+1}$ is on the positive extension of $x_i$, $i = 1, \ldots, N-1$. The length of the connected path starting at $x_1$ depends on all the randomly modeled events $\{x_i \in c(x_{i+1})\}$, $i = 1, \ldots, N-1$, that characterize the connectivity. Therefore, this length may be considered a random variable. Consider now some hypothesis about a particular path, in which the feature $x_j$ lies on the positive extension of the feature $x_i$. If the length distribution $D_j^+(l)$ is known, then the length distribution $D_i^+(l)$ is

$$\hat{D}_+^{j\to i}(l) = P_{ij}\, D_j^+(l - l_{ij}) + (1 - P_{ij})\, D^{\emptyset}(l). \qquad (2)$$

Note that $\hat{D}_+^{j\to i}(l)$ is not the expected length of the positive extension (which is a scalar). In the context of this path, $x_i$ is either connected to the curve or disconnected from everything (in the positive direction). An alternative formulation, where all curves to which $x_i$ may belong are taken into account, leads to a Bayesian estimate of $D_i^+(l)$. See Section 6 for a discussion of this alternative and its relation to the saliency approaches.

Suppose now that a path $\gamma = \{x_1, x_2, \ldots, x_N\}$ starts at the feature point $x_1$, such that $x_{i+1}$ is on the positive extension of $x_i$, $i = 1, \ldots, N-1$. Then, the length distribution associated with $x_1$ may be recursively calculated: $D_+^N(l) = D^{\emptyset}(l)$, $\hat{D}_+^{N-1}(l) = \hat{D}_+^{N \to N-1}(l)$, and so on, until $\hat{D}_+^1(l)$ is finally estimated.

A useful variation is to consider the distribution of the number of feature points on the curve. This distribution, denoted $D_+(k)$, is easily propagated, using the relation in (2), by replacing $l$ with $k$ and $l_{ij}$ with 1.
2.3 The Saliency Associated with a Length Distribution

The length distribution provides partial information on the importance of the curve (or path). Generally, longer curves
correspond to larger objects that are usually more significant. Therefore, a length distribution is better if it puts more weight on longer lengths. Let $Q[D_i^+(l)]$ be a quality measure (saliency), quantifying, in some way, the desired properties of the curve. The most straightforward choice would be the expected length. Other choices are possible. If, for example, strong curve connectivity is sought, then $Q[D_i^+(l)] = \int \sqrt{l}\, D(l)\, dl$ would de-emphasize the contribution of a long extension after a large gap. Such measures correspond to one extension of the feature point and are, therefore, denoted one-sided saliencies. The sum of the two one-sided saliencies is also a natural measure of the curve's quality.
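As a toy numerical illustration (ours, with made-up point-mass distributions), the $\sqrt{l}$ measure indeed ranks a solidly connected curve above a longer extension that lies beyond an uncertain gap, while the expected length ranks them the other way:

```python
import numpy as np

# Two hypothetical one-sided length distributions: a solidly
# connected curve of moderate length, and a long extension that is
# reachable only across an uncertain gap (half the mass stays short).
lengths = np.arange(101)
D_solid = np.zeros(101); D_solid[36] = 1.0
D_gap = np.zeros(101); D_gap[1] = 0.5; D_gap[100] = 0.5

def q_expected(D):       # Q[D] = integral of l D(l) dl
    return float(lengths @ D)

def q_sqrt(D):           # Q[D] = integral of sqrt(l) D(l) dl
    return float(np.sqrt(lengths) @ D)

print(q_expected(D_gap), q_expected(D_solid))   # 50.5 36.0
print(q_sqrt(D_gap), q_sqrt(D_solid))           # 5.5 6.0
```

The concavity of $\sqrt{l}$ is what penalizes the bimodal (gap) distribution: the long tail no longer compensates for the mass left at small lengths.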
2.4 The Probabilistic Saliency Optimization Process

So far, we have considered a specific curve, the length distribution associated with the feature point at its end, and the saliency associated with it. A saliency process that aims to find the best curves should find, for every feature point, a curve ending at this point and maximizing the saliency there. Calculating this optimum is easy for short paths but is generally exponential in $N$ and thus too expensive for practical use.¹ Fortunately, for some useful quality functions (see Section 2.5 for a specification), the optimal curves may be calculated by the simple dynamic programming iterative process described in Algorithm 1.

Algorithm 1. The Saliency Algorithm
1: Preprocessing: A neighborhood $\{x_j\}$, $j = 1, 2, 3, \ldots$, is specified for every feature point $x_i$, all probabilistic grouping cues $P_{ij}$ are calculated, and all the length distributions associated with the feature points are initialized to $D^{\emptyset}(l)$.
2: At the $k$th iteration ($k = 1, 2, 3, \ldots, N$), for every feature point $x_i$:
   1) For all neighbors $x_j$, $j = 1, 2, 3, \ldots$, of $x_i$, update the length distribution $\hat{D}_+^{j\to i}(l)$ using (2), and calculate the saliency $Q[\hat{D}_+^{j\to i}(l)]$.
   2) Choose the neighbor $x_j$ that maximizes the saliency and update the length distribution $D_i^+(l)$ to $\hat{D}_+^{j\to i}(l)$.

Apart from building the length distributions, the process also specifies, for every feature point, the next feature point on its extension. Thus, starting from salient points, the iterative process also finds the long, well-connected curves that contribute to and support this high saliency. Fig. 2 illustrates the development of the length distribution associated with a particular point, and Fig. 3 describes some (roughly) stable distributions obtained after many iterations. (The examples correspond to the expected length saliency.)
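A minimal sketch of Algorithm 1 (our own simplification, specialized to the expected length saliency of Section 2.6, so that each length distribution reduces to its scalar mean) might look as follows; the graph, cue values, and parameter names are hypothetical:

```python
import numpy as np

def saliency_iterations(P, L, E0=1.0, n_iter=50):
    """A sketch of the dynamic-programming saliency process
    (Algorithm 1), specialized to the expected length saliency so
    that every length distribution reduces to its mean.
    P[i, j]: probabilistic grouping cue for x_j extending x_i
             (0 where x_j is not a neighbor of x_i).
    L[i, j]: the distance l_ij between the two features."""
    n = P.shape[0]
    E = np.full(n, E0)                  # initialized to the isolated-feature mean
    nxt = np.full(n, -1)
    for _ in range(n_iter):
        # Candidate saliency through each neighbor j (cf. Eq. (3)):
        cand = P * (L + E[None, :]) + (1.0 - P) * E0
        cand[P == 0.0] = -np.inf        # non-neighbors do not compete
        best = cand.max(axis=1)
        nxt = np.where(np.isfinite(best), cand.argmax(axis=1), -1)
        E = np.where(np.isfinite(best), best, E0)
    return E, nxt

# A hypothetical 4-point chain with one weak link at its far end:
P = np.array([[0.0, 0.9, 0.0, 0.0],
              [0.0, 0.0, 0.9, 0.0],
              [0.0, 0.0, 0.0, 0.2],
              [0.0, 0.0, 0.0, 0.0]])
L = np.ones((4, 4))
E, nxt = saliency_iterations(P, L)
print(E)      # saliency decreases toward the curve's end
print(nxt)    # the chosen next feature for every point
```

The `nxt` array is the per-point choice described above: starting from a salient point and following `nxt` traces the supporting curve.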
2.5 An Inherent Limitation on Optimizable Saliencies

Note that the dynamic programming decision process makes only local choices of the preferred path. Still, if the quality function (saliency) satisfies certain conditions, it finds the global optimum. That is, after $N$ iterations, the choices made at the feature points specify, for every feature point, a curve of length $N$, which ends in this feature and

1. The derivation in this section is similar to that proposed in [8] yet differs somewhat due to the use of distributions.
Fig. 2. The one-sided length distribution at point C3 (in the top left illustration), plotted for 1, 2, 3, 4, and 10 iterations. (The horizontal axes of the plots show the length, in pixels.) These plots show how the distribution changes over time. Note that for such a smooth curve (a straight line segment), the distribution quickly develops a significant weight for the large values.
Fig. 3. Length distributions associated with different points after a large number of iterations. (a) The left graph describes some distributions corresponding to the different points $C_1, \ldots, C_7$ (shown in the previous figure) after 80 iterations. Note that points that are close to the end ($C_1$ is the closest) cannot have large values and correspond to distributions with peaks at small $l$ values. (b) The point $A$ gets support from a smooth curve and is associated with a distribution having significant weight at high values. (c) The point $B$ is weakly connected to $A$ and, therefore, its distribution is an average of the initial distribution, which focuses on low values, and that of $A$, which makes it roughly bimodal.
maximizes its saliency. Specifically, the quality criterion should satisfy a very special property: Consider some optimal path $(x_0, \ldots, x_i)$, maximizing the saliency at $x_i$. As a consequence, any subpath $(x_p, \ldots, x_q)$ on this optimal path must maximize the saliency at $x_q$ relative to all paths of the same length. See also the slightly less general extensibility criterion specified in [8]. As we want to find several types of saliency, we consider the class of saliencies specified by generalized moments,

$$Q[D_i^+(l)] = \int f(l)\, D_i^+(l)\, dl,$$

to see if they satisfy this property and, thus, can be optimized by dynamic programming. The result, given by the theorem below, is pessimistic: Only two types of saliencies—the expected length and the exponentially weighted length—can be optimized.

Proposition 1. Of all quality functions which may be written as generalized moments of the length distribution, $Q[D] = \int f(l)\, D(l)\, dl$, only

$$Q_1[D] = \int (a l + b)\, D(l)\, dl$$

and

$$Q_2[D] = \int (a e^{cl} + b)\, D(l)\, dl$$
may be exactly optimized by the Dynamic Programming-based process (2) (or by any Dynamic Programming algorithm).

This proposition holds if $l_{ij}$, the distance between feature points, can take arbitrary values. If this condition is not satisfied, a weaker but similar constraint still holds; see details and proofs in the technical report [32].

2.6 The Expected Length Saliency and Its Relation to the SU Saliency

Following the implications of Proposition 1, we now focus on the expected length saliency. As discussed below, this saliency has algorithmic advantages and also corresponds to the SU saliency. First, note that for the expected length saliency, the relatively costly length distribution update may be replaced by a direct update of the saliency values. Let $E[l_i^+]$ ($E^{\emptyset}[l]$) be the expected length associated with the distribution $D_i^+(l)$ ($D^{\emptyset}(l)$). Then, the distribution update rule is changed to the following expected length update rule:

$$E[l_i^+] = E[l_+^{j \to i}] = P_{ij}\,(l_{ij} + E[l_j^+]) + (1 - P_{ij})\, E^{\emptyset}[l]. \qquad (3)$$

For feature points on closed curves, the meaning of the saliency as expected length is somewhat distorted. The increase in the saliency of closed curves is often considered desirable because these curves are usually more significant than their open counterparts of the same length. Indeed, closed curves are considered more salient by the HVS [8]. See [8], [10] for the relevant discussions of the saliencies associated with closed and infinite curves.

Our approach follows the original saliency measure [8], relying on the human visual system's preference for long, smooth curves. There are some differences, however. In the framework of [8], features could be "real" (where we have, say, an edge point) or "virtual" (where there is no local image-based evidence for an edge). This choice allows the authors to hypothesize an image-independent, parallel local architecture that is a plausible model for a perceptual process. In our framework, the network of elements is sparser, but all features are real. The saliency of the $i$th feature, specified in [8], is updated by the local rule

$$E_i^{(n+1)} = \sigma_i + \rho_i \max_j \big( E_j^{(n)} f_{ij} \big),$$

where the maximum is taken over all the features in the neighborhood of the $i$th feature. Here, $E_i^{(n)}$ is the saliency of the $i$th feature after the $n$th iteration; $\sigma_i$ is a "local saliency," which is set to a positive value (e.g., 1) for every real feature; $\rho_i$ is a penalty for gaps, which is set to 1 in features (no gap) and to a lower value when the feature is virtual; and $f_{ij}$ is a "coupling constant" which decreases with the local curvature.

The curvature/distance-based cues used in [8] may be interpreted as a measure of the grouping probability. By the general assumption that smooth curves are more likely, a low curvature implies that a connection is more probable. Thus, for real feature points, the associated update formula may be interpreted as

$$E_i^{(n+1)} = 1 + \max_j \big( E_j^{(n)} P_{ij} \big). \qquad (4)$$

Recall now that, for our saliency, the expected length propagates as

$$E[l_i^+] = \max_j \big\{ (1 - P_{ij})\, E^{\emptyset}[l] + P_{ij}\,(l_{ij} + E[l_j^+]) \big\} = \max_j \big\{ E^{\emptyset}[l] + P_{ij}\,(l_{ij} - E^{\emptyset}[l] + E[l_j^+]) \big\},$$

which, for $l_{ij} = E^{\emptyset}[l] = 1$, yields (for the maximizing neighbor $j$)

$$E[l_i^+] = 1 + P_{ij}\, E[l_j^+].$$

This is just the same as (4). Therefore, the proposed saliency, with the specific choice of a curvature/distance cue and with the expected length quality criterion, is equivalent to the original saliency proposed in [8]. For several examples of saliency with different cues and different quality measures, see Section 6. Fig. 4 and Fig. 5 illustrate two such examples (see details in Section 6).

3 THE DISTRIBUTION OF SALIENCY

Saliency is mostly used for classifying the image curves into parts that are likely to be an important object (figure) and those that are not (background). The second part of this paper aims to quantitatively characterize the resulting saliency values in an image, so that we can decide on the classification criterion and predict how good this classification will be. We restrict ourselves to the expected length saliency.

3.1 Cues and Saliencies as Random Variables

The saliency at every point is the result of an accumulation process on the best curve that reaches it. Note that, in the discussion above, the probabilistic model specified the connectivity between two features as a random event characterized by the deterministic perceptual organization cue and the corresponding probabilistic cue $P_{ij}$. In this part, we do not consider the specific cues associated with particular feature point pairs, but rather consider the cues as random variables taking values in the $[0, 1]$ interval; to simplify notation, we denote such a cue variable simply by $x$. Different distributions are used for figure and for background. In this context, the saliency at every point is itself a random variable.

In our analysis, we shall first consider the propagation process on an isolated curve and shall be interested in the positive saliency of a point associated with a positive extension of $n$ feature points; that is, a point that is $n$-distant from the end of the curve. This saliency is a random variable, denoted $y_n$. Recall that the saliencies of consecutive points on a curve are related by (3). For the analysis, we shall make two simplifications: We assume that the expected length of an isolated feature point, $E^{\emptyset}[l]$, is 0 and that the distance between two feature points is constant (and 1). Both approximations are reasonable and significantly simplify the derivation and the presentation. Note also that the second approximation is exact if we consider a saliency defined as the expected number of feature points. The saliency is now propagated by

$$E[l_i^+] = P_{ij}\,(E[l_j^+] + 1), \qquad (5)$$

or, in our shorter notation,

$$y_{n+1} = x\,(y_n + 1). \qquad (6)$$

Note that $y_n$ and $y_{n+1}$ are different random variables with potentially different distributions. Let $f_X$, $f_{Y_n}$, and $f_{Y_{n+1}}$ be the distributions of $x$, $y_n$, and $y_{n+1}$, respectively. Using standard
Fig. 4. A typical saliency calculation with an angle cue: (starting from upper left, clockwise) the original image, edges, positive and negative saliencies, sum of saliencies (2D saliency), and thresholded 2D saliency.
Fig. 5. A typical saliency calculation with a gray-level cue: (starting from upper left, clockwise) the original image, edges, positive and negative saliencies, sum of saliencies (2D saliency), and thresholded 2D saliency.
transformations of distributions for functions of random variables yields

$$f_{Y_{n+1}}(\alpha) = \int f_{Y_n}\!\left(\frac{\alpha}{\beta} - 1\right) f_X(\beta)\, \frac{d\beta}{\beta}. \qquad (7)$$

In this calculation, we assume that $y_n$, the saliency at the $n$th point, and the cue $x$, specified between the $n$th and $(n+1)$th points, are independent random variables. To a first-order approximation, this assumption is justified because $y_n$ depends only on cues and saliencies associated with the path in one direction, away from the $(n+1)$th point. Note that no independence between $y_{n+1}$ and $y_n$, or between $y_{n+1}$ and $x$, is assumed, as, by (5), $y_{n+1}$ is deterministically specified by both $y_n$ and $x$. It seems, however, that the cue values form a nonstationary process along the curve. That is, a high cue in one location often implies that the adjacent cues are high as well. This dependence seems to be of second order, but it causes some inconsistencies between predictions and experimental results; see Section 3.8.

When the distribution of the cues is given, the relation (6) may be used to calculate the distribution iteratively. Naturally, the distribution depends on the distance $n$ from the curve's end, and, thus, the saliency process along the curve is nonstationary and difficult to estimate.
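As a numerical sanity check (our own sketch), one application of the density transform (7) can be evaluated on a grid; the cue density is assumed uniform on $[0.2, 0.8]$ and the initial saliency is taken as $y_1 = x$, both purely for illustration:

```python
import numpy as np

# One application of the density transform (7),
#   f_{Y_{n+1}}(a) = integral of f_{Y_n}(a/b - 1) f_X(b) db / b,
# evaluated on a grid. The cue density is assumed uniform on
# [0.2, 0.8] and y_1 = x, both purely for illustration.

def f_x(b):
    return np.where((b >= 0.2) & (b <= 0.8), 1.0 / 0.6, 0.0)

def f_y1(t):
    return f_x(t)      # y_1 = x (y_0 + 1) with y_0 = 0

alpha = np.linspace(0.0, 2.0, 2001)
beta = np.linspace(0.2, 0.8, 3001)
db = beta[1] - beta[0]

# f_{Y_2}(alpha): Riemann sum over the cue value beta
f_y2 = np.array([(f_y1(a / beta - 1.0) * f_x(beta) / beta).sum() * db
                 for a in alpha])

da = alpha[1] - alpha[0]
mass = f_y2.sum() * da
mean = (alpha * f_y2).sum() * da
print(mass)   # should stay close to 1
print(mean)   # should be close to <x>(E[y_1] + 1) = 0.5 * 1.5 = 0.75
```

The two printed checks follow directly from (6): the transform must conserve probability mass, and the mean must obey $E[y_{n+1}] = \langle x \rangle (E[y_n] + 1)$.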
This section continues as follows: We start with one-directional saliencies (satisfying (6)) and focus first on an asymptotic analysis, where we provide an analytic expression for the distribution. More accurate calculations, which take into account the finite length of curves and the interaction between figure and background, follow. This results in an accurate saliency model. The sources of inaccuracy are discussed, and a two-directional saliency, advantageous for figure-ground discrimination, is presented.
3.2 Constraints on the Asymptotic Distribution

We start with an asymptotic evaluation of the saliency distribution. Consider the set of curve points that are substantially far from the curve's end. The saliency at such points, when the iteration process converges (see below), reaches a statistical steady state, which essentially does not depend on the distance to this end. That is, the saliency at a point selected at random from this set is randomized according to a fixed distribution, $y_\infty$, which we shall now try to find. Note that in this part of the curve, the saliency process becomes ergodic, as the distribution at some point and a sample from the entire curve are the same.

Being a probabilistic cue, $x$ satisfies $0 \le x \le 1$. This condition suffices for the convergence of every moment associated with the sequence of distributions $\{y_n\}$, implying that for nonpathological distributions, the sequence of distributions converges to $y_\infty$ (see Section 3.5). Consider the plausible case where the cue $x = P_{ij}$ never provides complete certainty whether or not the two features belong to the same curve. Then, $x$ is restricted by

$$0 < x < 1. \qquad (8)$$

Using geometric progression (similar to the analysis of closed curves in the deterministic context of [10]) implies that the asymptotic saliency is bounded:

$$y_\infty \le \frac{x_{\max}}{1 - x_{\max}}, \qquad (9)$$

where $x_{\max} < 1$ denotes an upper bound on the cue values.
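A small Monte Carlo experiment (ours; the uniform cue distribution is an arbitrary illustrative choice) confirms both the convergence to a statistical steady state and the bound (9):

```python
import numpy as np

# Monte Carlo check of the geometric-progression bound (9): if the
# cue satisfies x <= x_max < 1, the stationary saliency cannot exceed
# x_max / (1 - x_max). The uniform cue on [0.2, 0.8] is an arbitrary
# illustrative choice.
rng = np.random.default_rng(0)
x_max = 0.8
bound = x_max / (1.0 - x_max)        # = 4.0

y = np.zeros(100_000)                # many independent curves, y_0 = 0
for _ in range(200):                 # run far from the curve's end
    x = rng.uniform(0.2, x_max, y.shape)
    y = x * (y + 1.0)                # relation (6)

print(y.max())     # never exceeds the bound
print(y.mean())    # close to <x> / (1 - <x>) = 1 here
```

Each chain realizes one path of the propagation (6); after enough steps, the empirical mean settles at the fixed point of $E[y_{n+1}] = \langle x \rangle (E[y_n] + 1)$ while every sample stays below the geometric-series bound.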
3.3
The Moments of the Asymptotic Saliency Distribution For the asymptotic limit, we get a Fredholm integral equation, Z 1 d : ð10Þ fX ðÞfY1 fY1 ðÞ ¼ þ1 þ1 0 The classical method for solving such equations is to represent their kernel as a sum of separable functions. It seems that here this approach would not lead to an analytic solution, unless very restrictive conditions are imposed. Therefore, another approach is sought. We chose to characterize the asymptotic saliency distribution by its moments. Raising both sides of (6) to the kth power and taking their expected value (with respect to the asymptotic distribution) yields hðy1 Þk i ¼ hxk ðy1 þ 1Þk i ¼ hxk ihðy1 þ 1Þk i: Using the binomial formula and the linearity of expectation,
1979
hðy1 Þk i ¼ hxk i
k X k l¼0
l
hðy1 Þl i
¼ hxk ihðy1 Þk i þ hxk i
k1 X k hðy1 Þl i: l l¼0
Therefore, hðy1 Þk i ¼
k1 hxk i X k hðy1 Þl i: 1 hxk i l¼0 l
ð11Þ
The first moments of the distribution fY may be written in explicit form: hxi ; 1 hxi hx2 i M2 ¼ hðy1 Þ2 i ¼ ð2hy1 i þ 1Þ 1 hx2 i hx2 i 1 þ hxi ; ¼ 1 hx2 i 1 hxi hx3 i ð3hðy1 Þ2 i þ 3hy1 i þ 1Þ M3 ¼ hðy1 Þ3 i ¼ 1 hx3 i hx3 i hxihx2 i þ 2hx2 i þ 2hxi þ 1 ; ¼ 1 hx3 i ð1 hxiÞð1 hx2 iÞ hx4 i M4 ¼ hðy1 Þ4 i ¼ ð4hðy1 Þ3 i þ 6hðy1 Þ2 i þ 4hy1 i þ 1Þ: 1 hx4 i ð12Þ M1 ¼ hy1 i ¼
(As the complexity of the moments increases very quickly, their calculation using the recursive formula of (11) may be more convenient.)
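The recursion (11) is indeed straightforward to implement; the sketch below computes the first asymptotic moments from the cue moments $\langle x^k\rangle$ and checks them on a degenerate (deterministic) cue, for which the saliency is exactly $x/(1-x)$:

```python
from math import comb

def asymptotic_moments(cue_moments):
    """Moments <y_inf^k> of the asymptotic saliency via the recursion (11):
    <y^k> = <x^k>/(1-<x^k>) * sum_{l=0}^{k-1} C(k,l) <y^l>.
    cue_moments[k] must hold <x^k>; cue_moments[0] == 1."""
    K = len(cue_moments) - 1
    M = [1.0]                      # <y^0> = 1
    for k in range(1, K + 1):
        xk = cue_moments[k]
        M.append(xk / (1.0 - xk) * sum(comb(k, l) * M[l] for l in range(k)))
    return M

# deterministic cue x = 0.8: <x^k> = 0.8^k, and the saliency is exactly
# x/(1-x) = 4, so the moments must be M_k = 4^k
M = asymptotic_moments([0.8 ** k for k in range(5)])
print(round(M[1], 6))   # 4.0
print(round(M[2], 6))   # 16.0
```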
3.4 An Explicit Approximation of the Asymptotic Saliency Distribution

The moments provide all the information about the required saliency distribution, and as we shall see below, just a few of them suffice to approximate it well. To get this approximation, we follow a complex but known procedure (see [36] and [32] for details), which requires the calculation of the cumulants,
$$\begin{aligned} C_1 &= M_1, \\ C_2 &= M_2 - M_1^2 \ (= \sigma^2), \\ C_3 &= M_3 - 3M_1 M_2 + 2M_1^3, \\ C_4 &= M_4 - 3M_2^2 - 4M_1 M_3 + 12 M_1^2 M_2 - 6M_1^4, \end{aligned} \qquad (13)$$
and yields the Edgeworth power series development of a distribution [37], [38],
$$\begin{aligned} f_{Y_\infty}(s) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(s-M_1)^2}{2\sigma^2}} \bigg[ 1 &+ \lambda_3 H_3\!\left(\frac{s-M_1}{\sigma}\right) + \lambda_4 H_4\!\left(\frac{s-M_1}{\sigma}\right) \\ &+ \lambda_5 H_5\!\left(\frac{s-M_1}{\sigma}\right) + \left(\lambda_6 + \frac{\lambda_3^2}{2}\right) H_6\!\left(\frac{s-M_1}{\sigma}\right) + \dots \bigg], \end{aligned} \qquad (14)$$
where the coefficients $\lambda_l = \frac{C_l}{l!\,\sigma^l}$ depend on the cumulants, $H_l(\cdot)$, $l = 1, 2, 3, \dots$ are the $l$th modified Hermite polynomials,
1980
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,
VOL. 28, NO. 12,
DECEMBER 2006
Fig. 6. Asymptotic distributions: (a) Four (synthetic) distributions of (four) cues. The horizontal axis is the cue value. (b) The corresponding asymptotic saliency distributions. The horizontal axis is saliency. A saliency distribution peaking at a higher saliency value corresponds to a cue with larger values. (c) The error in approximating these saliencies as a function of the number of cumulants taken into account. (d) One saliency distribution (the leftmost in (b), plotted as a solid line) and its approximations using two and three cumulants (dotted). This distribution is the one associated with the highest approximation error (i.e., the worst case). Note that the approximation by three cumulants almost coincides with the true distribution.
$$H_1(\eta) = \eta,\quad H_2(\eta) = \eta^2 - 1,\quad H_3(\eta) = \eta^3 - 3\eta,\quad H_4(\eta) = \eta^4 - 6\eta^2 + 3,\ \dots,$$
$M_j$, $j = 1, 2, 3, \dots$ are the moments (of $y_\infty$), and $\sigma^2$ is its variance.

While the Edgeworth series provides a closed-form expression for the distribution in terms of its moments, its use may not be practical for all distributions. First, the distribution must be smooth enough. Otherwise, the power series does not converge. Moreover, to get a reasonable approximation, the number of terms may be prohibitively large. Fortunately, we found that for typical cue distributions, the saliency distributions $f_{Y_\infty}$ that obey (10) require only a few terms (see Fig. 6). Taking only two terms (approximating by a Gaussian) gives an approximation error of 10-20 percent. Adding a third term, based on the skewness cumulant, decreases the approximation error to 1-4 percent for the $y_\infty$ distributions we tested. Thus, we propose using the following three Edgeworth series terms as an approximation:
$$f_{Y_\infty}(s) \approx \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(s-M_1)^2}{2\sigma^2}\right)\left[1 + \frac{C_3}{6\sigma^3} H_3\!\left(\frac{s-M_1}{\sigma}\right)\right]. \qquad (15)$$
See Fig. 6 for an illustration of the approximation's accuracy.
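The three-term approximation (15) can be sketched as a minimal implementation built from the first three moments (the symmetric test case below is purely illustrative):

```python
from math import sqrt, pi, exp

def edgeworth3(s, M1, M2, M3):
    """Three-term Edgeworth approximation (15) of the asymptotic saliency
    density, built from the first three moments via the cumulants (13)."""
    C2 = M2 - M1**2                         # variance
    C3 = M3 - 3*M1*M2 + 2*M1**3             # third cumulant (skewness)
    sigma = sqrt(C2)
    z = (s - M1) / sigma
    H3 = z**3 - 3*z                         # modified Hermite polynomial
    gauss = exp(-z*z/2) / (sqrt(2*pi) * sigma)
    return gauss * (1.0 + C3 / (6 * sigma**3) * H3)

# sanity check on a symmetric case: zero skew recovers the plain Gaussian
# (mean 5, variance 1, so M2 = 26 and M3 = 5**3 + 3*5*1 = 140 with C3 = 0)
val = edgeworth3(5.0, M1=5.0, M2=26.0, M3=140.0)
print(abs(val - 1/sqrt(2*pi)) < 1e-12)   # True
```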
3.5 The Effect of a Finite Curve's Length

The approximation of saliency by its asymptotic distribution is valid only for points far from the curve's end. Points closer to the end get, on the average, lower saliencies. Quantitatively, the $k$th moment of $y_n$ satisfies
$$\langle (y_n)^k \rangle = \langle x^k (y_{n-1} + 1)^k \rangle = \langle x^k\rangle \langle (y_{n-1} + 1)^k \rangle.$$
This expression can be evaluated using binomial expansions, leading to complex recursive sums. The first few moments are relatively simple:
$$\begin{aligned} \langle y_n\rangle &= \langle x\rangle\left(\langle y_{n-1}\rangle + 1\right) = \langle x\rangle^n \langle y_0\rangle + \langle x\rangle \sum_{l=0}^{n-1} \langle x\rangle^l \\ &= \langle x\rangle^n \langle y_0\rangle + \langle x\rangle\,\frac{1-\langle x\rangle^n}{1-\langle x\rangle} = \langle y_\infty\rangle + \langle x\rangle^n\left(\langle y_0\rangle - \langle y_\infty\rangle\right), \end{aligned} \qquad (16)$$
$$\begin{aligned} \langle (y_n)^2\rangle &= \langle x^2\rangle\left(\langle (y_{n-1})^2\rangle + 2\langle y_{n-1}\rangle + 1\right) \\ &= \langle x^2\rangle^n \langle (y_0)^2\rangle + 2\langle y_0\rangle\langle x^2\rangle\,\frac{\langle x^2\rangle^n - \langle x\rangle^n}{\langle x^2\rangle - \langle x\rangle} \\ &\quad + \frac{2\langle x\rangle\langle x^2\rangle}{1-\langle x\rangle}\left(\frac{1-\langle x^2\rangle^n}{1-\langle x^2\rangle} - \frac{\langle x^2\rangle^n - \langle x\rangle^n}{\langle x^2\rangle - \langle x\rangle}\right) + \langle x^2\rangle\,\frac{1-\langle x^2\rangle^n}{1-\langle x^2\rangle} \\ &= \langle (y_\infty)^2\rangle + \langle x^2\rangle^n\left(\langle (y_0)^2\rangle - \langle (y_\infty)^2\rangle\right) + 2\left(\langle y_0\rangle - \langle y_\infty\rangle\right)\langle x^2\rangle\,\frac{\langle x^2\rangle^n - \langle x\rangle^n}{\langle x^2\rangle - \langle x\rangle}. \end{aligned} \qquad (17)$$
Recalling that $\langle x\rangle < 1$ and, hence, $\langle x\rangle^n \to 0$, we can see that these expressions show how the distributions $y_n$ converge to the asymptotic distribution $y_\infty$. If the curve is isolated, then $\langle y_0\rangle$ is simply $E(y)$. Otherwise, $\langle y_0\rangle$ may change; see Section 3.7, which describes the analysis of the interaction between figure and background.

It turns out that the expected value $\langle y_\infty\rangle$ is the characteristic convergence parameter. To see this, let $N_x = \langle y_\infty\rangle$. As a result,
$$\langle y_{n=N_x}\rangle \approx \langle y_\infty\rangle + e^{-1}\left(\langle y_0\rangle - \langle y_\infty\rangle\right),$$
where we used $(1 - 1/a)^a \to e^{-1}$. This is somewhat unfortunate because it means that with better cues, associated with a higher expected cue $\langle x\rangle$ and a higher asymptotic expected saliency $\langle y_\infty\rangle$, the convergence is slower.
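Equation (16) is easy to verify by simulation; the sketch below compares the closed-form $\langle y_n\rangle$ against a Monte Carlo estimate under a hypothetical uniform cue distribution (illustrative parameters):

```python
import random

def mean_saliency_at(n, ex, y0):
    """Closed-form expected saliency at curve point n, Eq. (16):
    <y_n> = <y_inf> + <x>^n (<y_0> - <y_inf>), with <y_inf> = <x>/(1-<x>)."""
    y_inf = ex / (1 - ex)
    return y_inf + ex**n * (y0 - y_inf)

random.seed(1)
ex, n, trials = 0.8, 10, 200000
acc = 0.0
for _ in range(trials):
    y = 0.0                                        # isolated curve: y_0 = 0
    for _ in range(n):
        y = random.uniform(0.7, 0.9) * (y + 1.0)   # hypothetical cue, <x> = 0.8
    acc += y
predicted = mean_saliency_at(n, ex, 0.0)           # 4 - 4*0.8**10, about 3.57
print(abs(acc / trials - predicted) < 0.05)        # True
```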
3.6 The Average Distribution

Sampling saliencies from a finite-length curve (of length $N$), we get an average over all the distributions of $y_1, y_2, \dots, y_N$,
$$f_{\mathcal{Y}_N} = \frac{1}{N} \sum_{n=1}^{N} f_{y_n}. \qquad (18)$$
We use lowercase letters $y_n$ for the distribution at specific locations and calligraphic uppercase letters $\mathcal{Y}_N$ for the average saliency distribution over the entire curve. For the latter distribution,
Fig. 7. Finite curve average saliency distribution (for $\langle y_\infty\rangle = 15$) for curves of length 30, 60, 90, and $\infty$ (left to right).

$$\begin{aligned} \langle \mathcal{Y}_N\rangle &= \frac{1}{N}\sum_{n=1}^{N} \langle y_n\rangle = \langle y_\infty\rangle + \frac{\langle y_0\rangle - \langle y_\infty\rangle}{N} \sum_{n=1}^{N} \langle x\rangle^n \\ &= \langle y_\infty\rangle\left(1 + \frac{\langle y_0\rangle - \langle y_\infty\rangle}{N}\left(1 - \langle x\rangle^N\right)\right) \approx \langle y_\infty\rangle\left(1 + \frac{\langle y_0\rangle - \langle y_\infty\rangle}{N}\right), \end{aligned} \qquad (19)$$

$$\begin{aligned} \langle \mathcal{Y}_N^2\rangle &= \frac{1}{N}\sum_{n=1}^{N} \langle y_n^2\rangle = \langle y_\infty^2\rangle + \frac{\langle y_0^2\rangle - \langle y_\infty^2\rangle}{N} \sum_{n=1}^{N} \langle x^2\rangle^n + 2\langle x^2\rangle\,\frac{\langle y_0\rangle - \langle y_\infty\rangle}{N} \sum_{n=1}^{N} \frac{\langle x^2\rangle^n - \langle x\rangle^n}{\langle x^2\rangle - \langle x\rangle} \\ &\approx \langle y_\infty^2\rangle + \frac{\langle y_0^2\rangle - \langle y_\infty^2\rangle}{N}\cdot\frac{\langle x^2\rangle}{1-\langle x^2\rangle} + 2\,\frac{\langle y_0\rangle - \langle y_\infty\rangle}{N}\cdot\frac{\langle x^2\rangle}{1-\langle x^2\rangle}\cdot\frac{1}{1-\langle x\rangle}, \end{aligned} \qquad (20)$$

and
$$\begin{aligned} \sigma^2_{\mathcal{Y}_N} &= \langle \mathcal{Y}_N^2\rangle - \langle \mathcal{Y}_N\rangle^2 \\ &\approx \langle y_\infty^2\rangle - \langle y_\infty\rangle^2 + \frac{\langle y_0^2\rangle - \langle y_\infty^2\rangle}{N}\cdot\frac{\langle x^2\rangle}{1-\langle x^2\rangle} \\ &\quad + 2\,\frac{\langle y_0\rangle - \langle y_\infty\rangle}{N}\left(\frac{\langle x^2\rangle}{1-\langle x^2\rangle}\cdot\frac{1}{1-\langle x\rangle} - \langle y_\infty\rangle^2\right) - \left(\frac{\langle y_0\rangle - \langle y_\infty\rangle}{N}\right)^2 \langle y_\infty\rangle^2. \end{aligned} \qquad (21)$$

This distribution will be close to the asymptotic distribution only if the curve is long enough. See Fig. 7 for an example showing how the true distribution on a curve of length $N$ depends on $N$. Note that even for a relatively long curve, the distribution is highly asymmetrical. Therefore, we cannot describe it using a few first moments (i.e., a few terms of an Edgeworth series) as we did for the $y_n$ distributions. This is different for the two-directional saliencies; see Section 3.9. Naturally, for real applications, the curve's length is not known. We shall use a rough approximation and take half of the image size as the characteristic length of the "important" curves.
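The finite-curve average statistics can be checked the same way; the sketch below evaluates the mean (19) of the curve-averaged saliency and compares it with simulated curves of length $N = 30$ (hypothetical uniform cue, illustrative constants):

```python
import random

def mean_avg_saliency(N, ex, y0=0.0):
    """Mean of the saliency averaged over a length-N curve, Eq. (19):
    <Y_N> = <y_inf> + (<y_0> - <y_inf>)/N * sum_{n=1..N} <x>^n."""
    y_inf = ex / (1 - ex)
    geo = ex * (1 - ex**N) / (1 - ex)      # sum of <x>^n for n = 1..N
    return y_inf + (y0 - y_inf) / N * geo

random.seed(2)
N, ex, trials = 30, 0.8, 20000
acc = 0.0
for _ in range(trials):
    y, s = 0.0, 0.0
    for _ in range(N):
        y = random.uniform(0.7, 0.9) * (y + 1.0)
        s += y
    acc += s / N                           # average saliency along this curve
pred = mean_avg_saliency(N, ex)            # about 3.47 for these parameters
print(abs(acc / trials - pred) < 0.05)     # True
```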
3.7 The Interaction between Figure and Background

Using the derivation above, the distributions of figure and background can be evaluated independently, relying on the respective cue distributions. Such an idealization is inaccurate, as already observed in [10], where background elements got high saliencies due to figure proximity. To model the influence of background elements on the figure, we simply assume that the first element of every figure curve starts with a saliency $y_0$ distributed according to the background distribution.

The influence of the figure on the saliency of the background is a bit more complex. We assume that every background element belongs to some long curve starting from the figure and, therefore, the saliency of the first element of such a curve is distributed according to the figure saliency distribution. The number of such curves and their length is estimated as follows: By the uniformity assumption, the number of background elements that are direct neighbors of the figure is $N_{fig}\,\rho_{bg}$, where $N_{fig}$ is the number of figure points and $\rho_{bg}$ is the average number of background feature points in a unit disk. This is also the number of background curves. The length of these curves is thus estimated as $N_{bg}/(N_{fig}\,\rho_{bg})$. Note that, for most background elements that are far from any figure, the effect of the first saliency is negligible, especially because the cue average $\langle x\rangle$ is low and the convergence to the asymptotic distribution is fast. Near-figure background elements, on the other hand, may get high saliency values. The most significant effect of the interaction is the addition of a heavy tail to the right side of the background saliency distribution. A secondary effect is the cropping of the left tail of the figure saliency distribution because, starting with the background distribution (and not with $y_0 = 0$), very low saliencies are eliminated (see Fig. 8).
3.8 On the Accuracy of Our Model

Comparing the predicted distributions to the actual ones (see Section 5), we note that, while the similarity is apparent, there are also differences. Using the simplest, asymptotic predictions gives a reasonable estimate of the expected saliency but underestimates the variance. The asymptotic prediction matches the background saliency distribution much better than the figure saliency distribution. One reason for this difference is that, for the background, $N_x$ is much lower and the asymptotic approximation is substantially better.
Fig. 8. Predicted saliency distribution for (b) figure and for (a) background, according to three models: asymptotic (light dashed), finite length (dashed), and interaction-based (solid). The cue distributions as well as the other parameters (figure size, background density, etc.) correspond to the real image in Fig. 12.
Fig. 9. Finite curve average two-directional saliency distribution (for $\langle y_\infty\rangle = 15$) for curves of length 30, 60, 90, and $\infty$.
One problem we did not address is cue mixture. Different figure curves in the image may have different cue distributions. Our predicted distribution relies on their average cue distribution and is narrower than the true saliency distribution, which is essentially a sum of several subdistributions. This effect is much stronger in the figure distribution because the saliency is much more sensitive to small changes in high cue values. (Note that $\frac{d\langle y_\infty\rangle}{d\langle x\rangle} = \frac{1}{(1-\langle x\rangle)^2}$.) As mentioned above, cues along the curve tend to be locally correlated. In its extreme form, this correlation means that cues associated with different curves are distributed differently. Thus, the cue mixture may be considered a violation of the independence assumption. Note that two finite-length distributions associated with different cues are very different, but their left tails, which indicate the distribution of low saliencies, are not too different. For example, a change of the cue average from 0.95 to 0.975 more than doubles the expected length (from 19 to 39), but the expected saliency of the 10th figure point changes only from 7.6 to 8.7. This result is fortunate because, while the prediction using one average distribution gives a narrower, visually dissimilar figure distribution, it suffices for setting the threshold and estimating the error, which depends only on the low-saliency part of the figure distribution.
3.9 Two-Directional Saliencies

The one-directional (1D) saliency analyzed above is highly variable along the curve and, in particular, gets low values near one end of it. At the other end, figure points with high saliencies increase the saliency values of nearby background features to high values, which may be well above many saliency values on the figure; see, for example, Figs. 12, 13, and 14. Therefore, it is problematic to discriminate between figure and background by setting a threshold on saliency.

Two-directional (2D) saliency is the sum of the two 1D saliencies corresponding to the two directions specified at every pixel, and is associated with reduced variability. Note that the two 1D saliencies correspond to the two different extensions of the curve and are, therefore, independent. As a consequence, the distribution of the 2D saliency is the convolution of the two 1D saliency distributions, and the cumulants of the 2D saliencies are the sums of the corresponding 1D cumulants [36]. The 2D asymptotic distribution is associated with a double mean and variance, and is thus expected to be more concentrated.

For length-$N$ curves, the $n$th point in one direction is the $(N-n)$th point in the other direction. Let $\tilde{y}_n$ be the 2D saliency value at the $n$th point. Assuming for simplicity that $y_0$ is the same from both sides,
$$\langle \tilde{y}_n\rangle = \langle y_n\rangle + \langle y_{N-n+1}\rangle = 2\langle y_\infty\rangle + \left(\langle x\rangle^n + \langle x\rangle^{N-n+1}\right)\left(\langle y_0\rangle - \langle y_\infty\rangle\right).$$
For the average saliency (sampled uniformly on the curve), $\tilde{\mathcal{Y}}_N$,
$$\begin{aligned} \langle \tilde{\mathcal{Y}}_N\rangle = 2\langle \mathcal{Y}_N\rangle &= \frac{2}{N}\sum_{n=1}^{N} \langle y_n\rangle = 2\langle y_\infty\rangle + 2\,\frac{\langle y_0\rangle - \langle y_\infty\rangle}{N}\sum_{n=1}^{N}\langle x\rangle^n \\ &= 2\langle y_\infty\rangle\left(1 + \frac{\langle y_0\rangle - \langle y_\infty\rangle}{N}\left(1 - \langle x\rangle^N\right)\right) \approx 2\langle y_\infty\rangle\left(1 + \frac{\langle y_0\rangle - \langle y_\infty\rangle}{N}\right). \end{aligned} \qquad (22)$$
For the second moments,
$$\langle \tilde{y}_n^2\rangle = \langle y_n^2\rangle + \langle y_{N-n+1}^2\rangle + 2\langle y_n\rangle\langle y_{N-n+1}\rangle.$$
After some manipulations, we get the variance
$$\begin{aligned} \sigma^2_{\tilde{\mathcal{Y}}_N} = \langle \tilde{\mathcal{Y}}_N^2\rangle - \langle \tilde{\mathcal{Y}}_N\rangle^2 \approx 2\bigg[ &\langle y_\infty^2\rangle - \langle y_\infty\rangle^2 + \frac{\langle y_0^2\rangle - \langle y_\infty^2\rangle}{N}\cdot\frac{\langle x^2\rangle}{1-\langle x^2\rangle} \\ &+ 2\,\frac{\langle y_0\rangle - \langle y_\infty\rangle}{N}\left(\frac{\langle x^2\rangle}{1-\langle x^2\rangle}\cdot\frac{1}{1-\langle x\rangle} - \langle y_\infty\rangle^2\right) - \left(\frac{\langle y_0\rangle - \langle y_\infty\rangle}{N}\right)^2\langle y_\infty\rangle^2 \bigg], \end{aligned} \qquad (23)$$
which is almost twice the variance associated with the 1D average saliency (21). Higher moments and cumulants may be evaluated in the same way as the second moments, only the expressions are longer. This distribution is much more symmetrical and can be described well by just a few moments; see Fig. 9 for an illustration and Fig. 7 for a comparison.

Using 2D saliency induces a gap between the saliency of the figure and that of the deep background elements, and sets the saliency values of the near-background to the lower range of figure saliencies. Thus, while there is not a complete separation of figure and background saliencies, the figure-background discrimination is expected to be better.
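The independence of the two extensions can also be illustrated numerically; in the sketch below, two independent 1D saliency samples are summed, and the mean and variance of the resulting 2D saliency (approximately) double, as the cumulant argument predicts (the cue distribution is hypothetical):

```python
import random
import statistics

random.seed(3)

def one_dir_samples(k, n_steps=200):
    """Draw k samples of the (asymptotic) 1D saliency by iterating the
    recursion long enough to forget the initial condition."""
    out = []
    for _ in range(k):
        y = 0.0
        for _ in range(n_steps):
            y = random.uniform(0.7, 0.9) * (y + 1.0)
        out.append(y)
    return out

a = one_dir_samples(5000)
b = one_dir_samples(5000)
two_dir = [u + v for u, v in zip(a, b)]    # independent extensions -> sum

# mean and variance of the 2D saliency are (approximately) double the 1D ones
print(abs(statistics.mean(two_dir) - 2 * statistics.mean(a)) < 0.1)
print(abs(statistics.variance(two_dir)
          - (statistics.variance(a) + statistics.variance(b))) < 0.1)
```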
4 SALIENCY THRESHOLDING

4.1 Setting a Threshold

Given the saliency at every point, we may want to threshold it to get a binary image of the figure. Often, such thresholds are determined empirically, relying on the specific characteristics of the application. Knowing,
Fig. 10. Saliency calculation for a real image with gray-level cue: (a) gray-level image, (b) edge image, (c) 1D, and (d) 2D thresholded saliencies. The thresholds, 13 and 24, were calculated from the two distributions.
however, that the saliency comes from two given distributions (associated with figure and background populations) allows us to use standard decision-theoretic tools for setting the optimal threshold, minimizing some penalty function (e.g., a classification error) [39].

Let $N_{fig}$ and $N_{bg}$ be the number of figure and background elements, respectively. Similarly, let $N_{miss}$ and $N_{fa}$ be the number of figure errors (misses) and background errors (false alarms). Then, a reasonable penalty function would be the normalized error rate $\frac{N_{miss}}{N_{fig}} + \frac{N_{fa}}{N_{bg}}$. (Choosing this simple penalty is equivalent to assuming equal priors and equal importance of errors, or alternatively, penalties that are inversely proportional to the prior probability.)

As the true saliency distributions are usually not known, we wish to use the estimates derived above for setting the threshold. This may be done in several ways, which, in the experimental situations we considered, gave roughly the same result. The choice depends on the information available.

1. Based on the finite-length, interaction-based model: If the information available on the figure distribution is comparable to that available on the background, then we propose that both distributions be used. These should be estimated with finite length and figure-background interaction taken into account and the threshold set to minimize the penalty.

This approach may be used even when little is known about the two distributions. For example, in the case of figure and background cues for which only the first two moments are known, we may derive the corresponding statistics of the saliencies (taking into account the finite length and the interaction) and use a Gaussian approximation. (The mean and variance are specified by (19) and (21) for 1D saliency or by (22) and (23) for 2D saliency.) This approach is especially attractive when the Gaussian is a reasonable approximation, which happens for the 2D saliency. Of course, if more moments are known, better approximations may be used. For the Gaussian case, the threshold value may be calculated by solving a quadratic equation and is
$$Thr = \frac{\left(\frac{\mu_{bg}}{\sigma_{bg}^2} - \frac{\mu_{fig}}{\sigma_{fig}^2}\right) + \sqrt{\frac{(\mu_{fig} - \mu_{bg})^2}{\sigma_{fig}^2 \sigma_{bg}^2} + \left(\frac{1}{\sigma_{bg}^2} - \frac{1}{\sigma_{fig}^2}\right)\ln\frac{\sigma_{fig}^2}{\sigma_{bg}^2}}}{\frac{1}{\sigma_{bg}^2} - \frac{1}{\sigma_{fig}^2}}, \qquad (24)$$
Fig. 11. Comparison between (first row) the square root moment saliency, (second row) the expected value saliency, and (third row) the third moment saliency, for (left column) one-directional and (right column) two-directional saliencies. The saliencies are thresholded so that only the points associated with the top 10 percent are marked. The lower-degree moments emphasize strong unfragmented segments and contain fewer near-figure errors. Higher-degree moments provide lower figure fragmentation and less salient deep-background texture, but more near-figure errors (although the differences are not that large). This error characterization was similar for other thresholds as well.
where $\mu_{fig}, \mu_{bg}$ denote the expected values of the two saliency distributions, and $\sigma_{fig}^2, \sigma_{bg}^2$ denote their variances. (The plus sign is chosen in the numerator because the saliency distribution of the background is narrower.)

2. A heuristic based mainly on the background distribution: Often, the information about the figure is not reliable, and only partial or no information about it is available. We propose to set a threshold based only on the estimates for the background saliency distribution. The simplest option would be to set a threshold based only on the asymptotic background distribution viewed as a Gaussian, using a larger standard deviation than asymptotically predicted. For one-directional saliency, we propose that the threshold be set at, say, $\mu_\infty^{bg} + 4\sigma_\infty^{bg}$, where $\mu_\infty^{bg}$ and $\sigma_\infty^{bg}$ are the mean and standard deviation of the asymptotic background distribution, respectively. This is a simple heuristic, which follows from the observation that the variance is underestimated. Alternatively, if rough information about the figure suffices to calculate the mean of the figure distribution, we can use the interaction-corrected background distribution mean instead of the asymptotic value and get better results. Note that relying on the asymptotic figure distribution is significantly less reliable, as its variance is underestimated due to the reasons described in Section 3.7.
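A minimal sketch of the Gaussian threshold computation follows; it solves the two-Gaussian equal-likelihood condition underlying (24) and picks the root between the two means (the figure/background statistics below are hypothetical, not taken from the experiments):

```python
from math import sqrt, log

def gaussian_threshold(mu_fig, sig_fig, mu_bg, sig_bg):
    """Equal-likelihood threshold between two Gaussians, as in Eq. (24).
    Solves N(t; mu_fig, sig_fig) = N(t; mu_bg, sig_bg) for t, taking the
    root that lies between the two means. Assumes sig_fig != sig_bg."""
    a = 1/sig_bg**2 - 1/sig_fig**2
    b = mu_bg/sig_bg**2 - mu_fig/sig_fig**2
    disc = (mu_fig - mu_bg)**2 / (sig_fig**2 * sig_bg**2) \
           + (1/sig_bg**2 - 1/sig_fig**2) * log(sig_fig**2 / sig_bg**2)
    return (b + sqrt(disc)) / a

# hypothetical figure/background saliency statistics
t = gaussian_threshold(mu_fig=32.0, sig_fig=8.0, mu_bg=4.0, sig_bg=2.0)
print(4.0 < t < 32.0)    # True: the threshold falls between the two means
```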
4.2 Performance Estimation Discriminating between figure and background by thresholding the saliency is, inevitably, not error free. Estimating these errors is of interest if the overall system performance needs to be predicted. Due to the underestimation of the variance, the asymptotic distributions cannot be used for estimating the error rate. Using the interaction-based
Fig. 12. Real image saliency. (a) Original image, (b) input edges, (c) thresholded one-directional saliency, and (e) measured and predicted saliency distributions for the background, and (f) for the figure, (d) thresholded two-directional saliency, and (g) measured and predicted saliency distributions for the background, and (h) for the figure. The empirical distributions are solid (red) curves. The dashed curves are predictions: The brighter (green) corresponds to the asymptotic model and the darker (blue) to the interaction-based finite-length model.
models, however, provides reasonable results by simply integrating the errors using the predicted distributions.
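Under the Gaussian approximation, this error integration reduces to two tail integrals; a sketch (with hypothetical figure/background statistics):

```python
from math import erf, sqrt

def gauss_cdf(t, mu, sigma):
    """Cumulative distribution of a Gaussian, via the error function."""
    return 0.5 * (1 + erf((t - mu) / (sigma * sqrt(2))))

def predicted_error_rates(thr, mu_fig, sig_fig, mu_bg, sig_bg):
    """Predicted miss and false-alarm rates for a saliency threshold,
    obtained by integrating the two (Gaussian-approximated) distributions."""
    miss = gauss_cdf(thr, mu_fig, sig_fig)        # figure below threshold
    fa = 1 - gauss_cdf(thr, mu_bg, sig_bg)        # background above threshold
    return miss, fa

# hypothetical distributions, threshold chosen between the two means
miss, fa = predicted_error_rates(10.4, 32.0, 8.0, 4.0, 2.0)
print(miss < 0.05 and fa < 0.05)   # True: both error rates are a few percent
```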
5 EXPERIMENTS

The experiments may be divided into two types, corresponding to the two contributions of the paper. First, we describe some experiments demonstrating the proposed concept of distribution-based saliency and the different types of saliencies that correspond to the proposed generalizations. The second, larger, experiment set is focused on validating the analysis, demonstrating its applicability for saliency threshold specification, and revealing its limitations. This is done by comparing predicted saliency statistics to empirical statistics.

5.1 Implementation of Saliency with Different Cues

In contrast to [8], [10], we considered only real (nonvirtual) feature points, which were obtained using the DRF edge operator [40]. They were oriented using the directions of the corresponding gradients. This choice substantially reduces the problems associated with the orientation discretization inherent to the original SU saliency [8], [10]. The positive (negative) extension neighbors of every feature point were all (real) neighboring feature points s.t. the angle between the vector $x_i - x_j$ and the gradient is in $[\pi/6, 5\pi/6]$ ($[-5\pi/6, -\pi/6]$). The neighborhood was usually a disk of radius 10 pixels. The initial length distribution was set to have equal weights on the values 0, 1, and 2.

5.1.1 Traditional Cocircularity Cue

First, we considered a classical cue, using curvature (or weighted angle differences). Following [8], we set the cue as
$$P_{ij} = e^{-\|x_{ij}\|^2/50}\; e^{-\tan(GradAngleDiff/2)},$$
where $GradAngleDiff$ is the difference between the gradient angles at the two points. Note that, as the distance between a feature point and its neighbors is no longer constant, we added a preference for short distances. Interestingly, this dependency needs to reduce the cue faster than $\exp\{-\|x_{ij}\|\}$ because otherwise the process always prefers the far neighbors. (Going to these neighbors through other, closer neighbors gives a lower expected length.)
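A sketch of the co-circularity cue of Section 5.1.1 follows (the function name and the handling of angle wrap-around are our own additions; the constants mirror the illustrative ones above):

```python
from math import exp, tan, pi

def cocircularity_cue(p_i, p_j, grad_angle_i, grad_angle_j):
    """Sketch of the co-circularity cue P_ij: a distance term that decays
    faster than exponentially, times a smoothness term penalizing the
    difference between the two gradient angles (constants illustrative)."""
    dist2 = (p_i[0] - p_j[0])**2 + (p_i[1] - p_j[1])**2
    diff = abs(grad_angle_i - grad_angle_j)
    diff = min(diff, 2*pi - diff)              # wrap angle difference to [0, pi]
    return exp(-dist2 / 50.0) * exp(-abs(tan(diff / 2.0)))

# nearby, well-aligned edge points score higher than misaligned ones
good = cocircularity_cue((0, 0), (2, 0), 0.1, 0.1)
bad  = cocircularity_cue((0, 0), (2, 0), 0.1, 1.5)
print(good > bad)    # True
```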
5.1.2 Measuring Cue Statistics We found that the accurate estimation of cue distribution is difficult and usually biased to lower values. The saliency process depends only on the cues connecting the feature point to its best neighbor, which is the one from which its saliency is propagated. Such cues may be called active. All other cues are irrelevant but still contribute to the estimated cue distribution, if the latter is done on all pairs of nearby features. Therefore, in [29], for example, the actual pairs of adjacent feature points along the curve are manually marked. We used a somewhat simpler procedure, evaluating the cues for all pairs of nearby features, and then selecting for every feature point the maximal cue connecting it with some other feature. We found that estimating the two distributions from this set of maximal cues gives a cue distribution that is very close to that of active cues.
Fig. 13. Real image saliency. (a) Original image, (b) input edges, (c) thresholded one-directional saliency, and (e) measured and predicted saliency distributions for the background, and (f) for the figure, (d) thresholded two-directional saliency, and (g) measured and predicted saliency distributions for the background, and (h) for the figure. The empirical distributions are solid (red) curves. The dashed curves are predictions: The brighter (green) corresponds to the asymptotic model and the darker (blue) to the interaction-based finite-length model.
See Fig. 4 and Fig. 14 for two examples of synthetic and real images. Note that the saliency image has a concrete meaning: It is the expected length of the curve on which the point lies. For the one-sided case, for example, if one starts from a point associated with a saliency of 38 (a typical value for the strong curves on, say, the lizard’s back in the “lizard” example), one may expect to find about 38 neighbors on the curve in one of the directions.
5.1.3 Saliency with a Gray-Level Cue

We took the same saliency process and changed only the cue, which now measured the similarity in gray levels, and not the smoothness of the curve. Specifically, we set
$$P_{ij} = \frac{e^{-\|x_{ij}\|^2/50}}{1 + \frac{30\, GrayLevelDiff^2}{GradSize(i)\, GradSize(j)}}.$$
Fig. 14. Real image saliency. (a) Original image, (b) input edges, (c) thresholded one-directional saliency, and (e) measured and predicted saliency distributions for the background, and (f) for the figure, (d) thresholded two-directional saliency, and (g) measured and predicted saliency distributions for the background, and (h) for the figure. The empirical distributions are solid (red) curves. The dashed curves are predictions: The brighter (green) corresponds to the asymptotic model and the darker (blue) to the interaction-based finite-length model.
Here, $GrayLevelDiff$ is the difference in gray levels between the two feature points, and $GradSize(i)$ is the gradient size at $x_i$; see Fig. 5 and Fig. 10. Note that the saliency value has the same meaning: the expected length of the curve (either to one side or to both). Actually, the results were better than we expected and, for a synthetic image, outperform (in a sense) those obtained when the angle-based cue was used. To conclude, this experiment demonstrates that a saliency process that is similar, in principle, to that proposed in [8] can also work with other sources of information.
5.2 A Saliency Measure Emphasizing Confidence

Two length distributions associated with the same expected length may be associated with different moments, which often represent intuitive quality better. Consider, for example, a distribution putting full weight on the length $l = 10$ and another distribution sharing the weight between $l = 0$ and $l = 20$. Both distributions are associated with an expected length of 10, but we would often prefer the first, where a length of 10 is guaranteed. Such issues arise in real images as well. This was the case, for example, in the real "lizard" image, where many points on the texture were associated with large saliency. A
TABLE 1 One-Directional Saliency Threshold: Predictions for Different Images
quality measure emphasizing connectedness over sheer length may be preferred. One such measure is the "expected square root length," specified as $\sum_l D^+(l)\sqrt{l}$, which prefers shorter curves associated with higher confidence. In the example above, it gives a saliency value of $\sqrt{10}$ to the first distribution and only $\sqrt{20}/2$ to the second, uncertain, one. Indeed, we found that such a saliency may have advantages when working on real images (see Fig. 11). We found that the lower-degree moments emphasize strong unfragmented segments (which may be short segments in the background) and contain fewer near-figure errors. Higher-degree moments provide lower figure fragmentation with less deep-background texture, but more near-figure errors. We also found that the figure-background discrimination is less sensitive to the threshold values for higher-degree moment saliencies. Recall, however, that such saliencies have one severe theoretical deficiency: They are not extensible and, thus, global maximization is not guaranteed.
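The generalized length-moment saliencies compared here can be sketched as follows (the function and the representation of the length distribution $D^+(l)$ as a dict are illustrative):

```python
def moment_saliency(length_dist, power=0.5):
    """Generalized length-moment saliency: sum_l D(l) * l^power.
    power=1 gives the expected length; power=0.5 gives the 'expected
    square root length', which rewards confidence over raw length."""
    return sum(p * (l ** power) for l, p in length_dist.items())

certain   = {10: 1.0}            # length 10 guaranteed
uncertain = {0: 0.5, 20: 0.5}    # same expected length, no guarantee

# same expected length (power 1), but the sqrt measure prefers certainty
print(moment_saliency(certain, 1.0) == moment_saliency(uncertain, 1.0))  # True
print(moment_saliency(certain, 0.5) > moment_saliency(uncertain, 0.5))   # True
```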
5.3 Distribution Simulations

To test our approximation (using an Edgeworth series), to get distributions of saliency at particular curve points (i.e., nonaverage distributions), and to observe the distributions over long curves (i.e., asymptotic distributions), we carried out simulations where cues were randomly drawn according to fixed distributions. Fig. 6, Fig. 7, and Fig. 9 were prepared using the simulation results.

5.4 Predicting Saliency Distributions for Real Images

Given an image, we extract the cue distributions from labeled sets of figure and background; see Fig. 12e and Fig. 12f for examples. For this figure, the estimated means of the cues were 0.975 and 0.78 for the figure and the background, respectively. For this image, $N_{fig} = 3{,}185$, $N_{bg} = 47{,}885$, and $\rho_{bg} = 0.21$. These numbers were used for estimating the interaction-based distribution. We used the cue distributions to predict the saliency distribution as described above. We also ran the saliency process until it converged and measured its empirical distribution. (In another experiment, we ran the saliency process once with only the figure points in the image and then again with only the background points. This is a semireal situation, where the real cues, their nonuniformity, and the effect of finite-length curves are demonstrated without the figure-background interaction effect; see [32].) Fig. 12 describes one image, its edge map, its saliency map (thresholded for more informative visualization), and several distributions:
TABLE 2 One-Directional Saliency Error Rate (Figure Miss / Background False Addition) for Empirical Distribution-Based Threshold in Percents
1. the asymptotic predicted distribution,
2. the finite-length, interaction-based predicted distribution, and
3. the empirical distribution.

The asymptotic figure distribution gives a very good prediction of the mean saliency. The predicted mean of the figure (background) was 34 (3.56) versus a true mean of 32 (3.57). On the other hand, it substantially underestimates the variance of the figure distribution. The approximation of the background distribution is much better. Note also that the interaction analysis "pushes" the predictions in the right direction and that these predictions are good estimates of both figure and background distributions in the regions of overlap. See also Fig. 13 and Fig. 14 for two other examples. In all cases, the thresholding was done with the best threshold according to the predictions.
5.5 Setting the Threshold

Considering Fig. 12, Fig. 13, and Fig. 14, we calculated the best threshold from the true distributions and, using the methods described in Section 4, from the predicted distributions. All threshold values turned out to be rather similar (see Tables 1 and 3), implying that any of the proposed methods is a reasonable approach.

5.6 Performance Estimation

We then compared the performance prediction with the actual number of misclassified points. The threshold was selected by the empirical distribution-based method, but as shown above, the other methods give similar values. The performance predictions for one-directional saliency (see Table 2), given by the finite-length, interaction-based model, are close to the empirical results. The asymptotic distribution, as we expected, predicts no misclassified points. The difference between the empirical results and the predictions is due, in our opinion, to a mixture of cues being treated as one average cue, which also leads to an underestimation of the saliency distribution variance. For two-directional saliency (Table 4), the errors are much lower (as expected from the higher separation), and the cue mixture problem became more evident, with a correspondingly lower agreement.
6 DISCUSSION
This paper presented a framework and an algorithm for calculating a well-defined saliency measure that is based on estimating the length distribution and the expected length of curves. The work was motivated by the SU saliency [8],
TABLE 3 Two-Directional Saliency Threshold: Prediction Results for Different Images
which, in our opinion, rests on good principles but has not been adequately interpreted, at least for computer vision practitioners. One result of the proposed work is that the SU saliency at a point may be interpreted as an estimate of the expected length of the curve on which this point lies. The proposed approach lends itself to different types of systematic generalizations and, in particular, to saliencies based on different cues. To demonstrate this, we created a saliency process that is based on gray-level similarity. For every similarity cue, different types of saliency may be specified as functions of the length distribution. Such saliencies may be evaluated empirically and yield useful patterns (we tested three options), yet as rigorously proved, only a few may be exactly optimized by a local process. A minor generalization is the use of two-directional saliency, which seems trivial but is very useful for obtaining a better figure-ground discrimination and, to the best of our knowledge, has not been used before.

While the first part of the paper dealt with interpretation and generalization, the second part focused on analyzing the expected length saliency (corresponding to SU saliency). We derived the saliency distributions associated with the main curves and with the rest of the image. Using the derived distributions, we showed how to set a threshold on the saliency for optimally discriminating between figure and background, and how to predict the expected figure-background separation accuracy. While the threshold estimates are accurate, the predicted error rate is lower than the actual error rate. This gap demonstrates the complexity of modeling real grouping processes. See Section 3 for a detailed discussion. One interesting result of the analysis is that the cues used for the saliency process must be good enough. Recall that the expected asymptotic saliency is $\langle x\rangle/(1-\langle x\rangle)$, where $\langle x\rangle$ is the expected value of the cue.
Therefore, a reasonable cue, for which ⟨x⟩ = 0.85 on the figure (and, say, 0.3 on the background), results in an expected figure saliency of only about 5.7. This is simply too low for good discrimination from occasional high saliency values, which may arise in the background due to small random structures.

Saliency methods suffer from many deficiencies. In particular, three weaknesses were observed in [10]. The generalized process presented here suggests that these weaknesses may not be too severe:

1. The first problem was that short background segments, close to a salient figure and with good continuation to it, may get saliency values stronger than those given to the figure, implying that saliency is not a good indicator of figure. Note that this problem decreases significantly for two-directional saliencies, where the saliencies given to the background are always in the lower range of the figure saliencies.

TABLE 4 Two-Directional Saliency Error Rate (Figure Miss / Background False Addition) for the Empirical Distribution-Based Threshold, in Percent

2. A second weakness was the lack of scale invariance. First, note that curvature-based cues (used in [8], [10]) are inherently scale-dependent. Other types of cues, such as the gray-level cues described above, can be scale-independent (at least to a first approximation).

3. The third weakness, the possible undesirable preference of one large gap over several smaller ones, is due to the cue. If the cue is specified such that its dependence on distance decreases faster than an exponential function (of this distance), this preference is eliminated.

Thus, we suggest that these saliency weaknesses are associated with the specific cue construction suggested in [8] and analyzed in [10], and not with the inherent properties of the local, dynamic-programming-like process.

The interpretation of cues as probabilities was considered in [12], where the stochastic motion of a particle was used to model completion fields and to elicit a saliency process as well (as observed in [14]). The saliency induced by this process differs from that suggested in [8], mainly because it is not associated with a single "best" curve but with some average of all curves in the image. Interestingly, a modified form of the proposed saliency may be created by updating the length distribution not according to the best curve but according to a weighted average of all curves, the weights being simply the corresponding probabilities. In this way, we get an alternative estimate of the length distribution (and of the expected length). Asking ourselves which one is better, we observed that the saliency method proposed here (and that of [8]) is essentially a maximum likelihood approach to saliency and length estimation, because it calculates the saliency relative to the best "parameter," which, in this context, is the path.
The second approach is essentially Bayesian and allows contributions from many alternatives. Note that both methods can be used to calculate the expected-length estimate. We actually expect the second, Bayesian, method to give more visually pleasing saliency plots. Observe, however, that it does not provide an estimate of the best curve associated with every image point.
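As a numeric check of the asymptotic saliency quoted above, the following sketch (our illustration, not code from the paper) iterates the fixed-point recurrence s ← ⟨x⟩(1 + s), whose limit is ⟨x⟩/(1 − ⟨x⟩), and compares it against a Monte Carlo estimate of the expected length of a chain that survives each step independently with probability ⟨x⟩. The cue values 0.85 and 0.3 are the ones used in the discussion; function names and the simple independent-survival chain model are ours.

```python
import random

def asymptotic_saliency(x, iters=2000):
    """Iterate s <- x * (1 + s); for 0 <= x < 1 this converges to x / (1 - x)."""
    s = 0.0
    for _ in range(iters):
        s = x * (1.0 + s)
    return s

def mc_expected_length(x, trials=200_000, seed=0):
    """Monte Carlo estimate of the expected remaining length of a chain that
    continues past each edge-point independently with probability x."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        length = 0
        while rng.random() < x:
            length += 1
        total += length
    return total / trials

for x in (0.85, 0.3):  # figure cue vs. background cue from the text
    print(f"<x> = {x}: closed form = {x / (1 - x):.3f}, "
          f"iterated = {asymptotic_saliency(x):.3f}, "
          f"simulated = {mc_expected_length(x):.3f}")
```

The gap between roughly 5.7 (figure) and 0.43 (background) makes concrete why a cue averaging 0.85 still yields only a modest figure saliency, leaving it vulnerable to occasional high background values.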
ACKNOWLEDGMENTS This work was supported by the Israeli Science Foundation and by the MUSCLE Network of Excellence. The authors would like to thank the editor and the anonymous reviewers for their helpful comments.
REFERENCES
[1] M. Wertheimer, "Untersuchungen zur Lehre von der Gestalt. II," Psychologische Forschung, vol. 4, pp. 301-350, 1923; abridged English translation, "Laws of Organization in Perceptual Forms," pp. 331-363, 1938.
[2] S. Palmer, Vision Science: Photons to Phenomenology. MIT Press, 1999.
[3] U. Neisser, Cognitive Psychology. New York: Appleton-Century-Crofts, 1967.
[4] A. Treisman and G. Gelade, "A Feature Integration Theory of Attention," Cognitive Psychology, vol. 12, pp. 97-136, 1980.
[5] C. Koch and S. Ullman, "Shifts in Selective Visual Attention: Towards the Underlying Neural Circuitry," Human Neurobiology, vol. 4, pp. 219-227, 1985.
[6] J. Tsotsos, S. Culhane, W. Wai, Y. Lai, N. Davis, and F. Nuflo, "Modeling Visual Attention via Selective Tuning," Artificial Intelligence, vol. 78, nos. 1-2, pp. 507-545, 1995.
[7] D. Marr, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. W.H. Freeman and Company, 1982.
[8] A. Sha'ashua and S. Ullman, "Structural Saliency: The Detection of Globally Salient Structures Using a Locally Connected Network," Proc. Int'l Conf. Computer Vision (ICCV '88), pp. 321-327, 1988.
[9] U. Montanari, "On the Optimal Detection of Curves in Noisy Pictures," Comm. ACM, vol. 14, pp. 335-345, 1971.
[10] T. Alter and R. Basri, "Extracting Salient Curves from Images: An Analysis of the Saliency Network," Int'l J. Computer Vision, vol. 27, pp. 51-69, 1998.
[11] G. Guy and G. Medioni, "Inferring Global Perceptual Contours from Local Features," Int'l J. Computer Vision, vol. 20, nos. 1-2, pp. 113-133, Oct. 1996.
[12] L. Williams and D. Jacobs, "Stochastic Completion Fields: A Neural Model of Illusory Contour Shape and Salience," Proc. Int'l Conf. Computer Vision (ICCV '95), pp. 408-415, 1995.
[13] S. Sarkar and K. Boyer, "Quantitative Measures of Change Based on Feature Organization: Eigenvalues and Eigenvectors," Proc. Conf. Computer Vision and Pattern Recognition (CVPR '96), pp. 478-483, 1996.
[14] L.R. Williams and K.K. Thornber, "A Comparison of Measures for Detecting Natural Shapes in Cluttered Backgrounds," Proc. European Conf. Computer Vision (ECCV '98), pp. 432-448, 1998.
[15] L. Herault and R. Horaud, "Figure-Ground Discrimination by Mean Field Annealing," Proc. European Conf. Computer Vision (ECCV '92), pp. 58-66, 1992.
[16] A. Amir and M. Lindenbaum, "Ground from Figure Discrimination," Computer Vision and Image Understanding, vol. 76, no. 1, pp. 7-18, Oct. 1999.
[17] P. Parent and S. Zucker, "Trace Inference, Curvature Consistency, and Curve Detection," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, no. 8, pp. 823-839, Aug. 1989.
[18] E. Sharon, A. Brandt, and R. Basri, "Fast Multiscale Image Segmentation," Proc. Conf. Computer Vision and Pattern Recognition (CVPR '00), pp. 70-77, 2000.
[19] I. Jermyn and H. Ishikawa, "Globally Optimal Regions and Boundaries as Minimum Ratio Weight Cycles," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 10, pp. 1075-1088, Oct. 2001.
[20] E. Saund, "Perceptual Organization of Occluding Contours of Opaque Surfaces," Computer Vision and Image Understanding, vol. 76, no. 1, pp. 70-82, Oct. 1999.
[21] X. Ren, C. Fowlkes, and J. Malik, "Scale-Invariant Contour Completion Using Conditional Random Fields," Proc. Int'l Conf. Computer Vision (ICCV '05), vol. 2, pp. 1214-1221, 2005.
[22] J. Canny, "A Computational Approach to Edge Detection," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 8, no. 6, pp. 679-698, 1986.
[23] B. Dubuc and S. Zucker, "Complexity, Confusion, and Perceptual Grouping. Part II: Mapping Complexity," Int'l J. Computer Vision, vol. 42, nos. 1/2, pp. 83-115, 2001.
[24] D.R. Martin, C.C. Fowlkes, and J. Malik, "Learning to Detect Natural Image Boundaries Using Local Brightness, Color, and Texture Cues," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 1-20, 2004.
[25] R. Golubchyck and M. Lindenbaum, "Improving the Saliency Algorithm by Cue Optimization," Technical Report MSC-2006-07, Technion, 2006.
[26] J. Elder and S.W. Zucker, "Computing Contour Closure," Proc. European Conf. Computer Vision (ECCV '96), vol. 1, pp. 399-412, 1996.
[27] A. Amir and M. Lindenbaum, "A Generic Grouping Algorithm and Its Quantitative Analysis," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 2, pp. 168-185, Feb. 1998.
[28] A. Berengolts and M. Lindenbaum, "On the Performance of Connected Components Grouping," Int'l J. Computer Vision, vol. 41, no. 3, pp. 195-216, 2001.
[29] J. Elder and R.M. Goldberg, "Ecological Statistics of Gestalt Laws for the Perceptual Organization of Contours," J. Vision, vol. 2, no. 4, pp. 324-353, 2002.
[30] M. Lindenbaum and A. Berengolts, "A Probabilistic Interpretation of the Saliency Network," Proc. European Conf. Computer Vision (ECCV '00), pp. 257-272, 2000.
[31] A. Berengolts and M. Lindenbaum, "On the Distribution of Saliency," Proc. Conf. Computer Vision and Pattern Recognition (CVPR '04), vol. 2, pp. 543-549, 2004.
[32] A. Berengolts and M. Lindenbaum, "On the Distribution of Saliency," Technical Report CIS-2004-03, Technion, 2004.
[33] D.G. Lowe, Perceptual Organization and Visual Recognition. Kluwer Academic, 1985.
[34] A.P. Witkin and J.M. Tenenbaum, "On the Role of Structure in Vision," Human and Machine Vision, pp. 481-543, 1983.
[35] S. Sarkar and K.L. Boyer, "Perceptual Organization in Computer Vision: A Review and a Proposal for a Classificatory Structure," IEEE Trans. Systems, Man, and Cybernetics, vol. 23, no. 2, pp. 382-399, Mar./Apr. 1993.
[36] A. Stuart and K. Ord, Kendall's Advanced Theory of Statistics, vol. 1. Arnold Press, 2003.
[37] F. Edgeworth, "The Law of Error," Trans. Cambridge Philosophical Soc., vol. 20, pp. 36-113, 1904.
[38] C. Charlier, "Applications de la théorie des probabilités à l'astronomie," Traité, 1931.
[39] K. Fukunaga, Introduction to Statistical Pattern Recognition, second ed. Academic Press, 1990.
[40] J. Shen and S. Castan, "An Optimal Linear Operator for Edge Detection," Proc. Conf. Computer Vision and Pattern Recognition (CVPR '86), pp. 109-114, 1986.

Alexander Berengolts received the MSc degree in physics in 1995 and the PhD degree in computer science in 2004, both from the Technion, Israel. Since 1999, he has worked in industry (Camtek Ltd., Optun Ltd.) as a senior algorithm developer and is now with Tevet—Process Control Technology. His main research interests include low-level vision, grouping, and statistical methods for performance evaluation.
Michael Lindenbaum received the BSc, MSc, and DSc degrees from the Department of Electrical Engineering at the Technion, Israel, in 1978, 1987, and 1990, respectively. From 1978 to 1985, he served in the IDF. He did his postdoctoral work at the NTT Basic Research Labs in Tokyo, Japan, and, since 1991, he has been with the Department of Computer Science, Technion. He was also a consultant to HP Labs, Israel, and spent a sabbatical at NECI, New Jersey, in 2001. He served on several committees of computer vision conferences, co-organized the IEEE Workshop on Perceptual Organization in Computer Vision, and was an associate editor of Pattern Recognition and Pattern Recognition Letters. Dr. Lindenbaum has worked in digital geometry, computational robotics, learning, and various aspects of computer vision and image processing. Currently, his main research interest is computer vision, especially the statistical analysis of object recognition and grouping processes.