Traditional Saliency Reloaded: A Good Old Model in New Shape
Simone Frintrop, Thomas Werner, and Germán Martín García
Department of Computer Science III, University of Bonn, Germany
Abstract: We show in this paper that the seminal, biologically-inspired saliency model by Itti et al. [3] is still competitive with current state-of-the-art methods for salient object segmentation if some important adaptations are made. We show which changes are necessary to achieve high performance, with special emphasis on the scale-space: we introduce a twin pyramid for computing Difference-of-Gaussians, which enables a flexible center-surround ratio. The resulting system, called VOCUS2, is elegant and coherent in structure, fast, and computes saliency at the pixel level. Some example saliency maps are shown in Fig. 1.
[Figure 2 shows the VOCUS2 pipeline: the input image is split into an intensity, a red/green, and a blue/yellow channel; Gaussian smoothing yields a center and a surround pyramid for each channel; center-surround (C-S) and surround-center (S-C) differences produce on-off and off-on contrast pyramids; fusion operations combine these into feature maps, conspicuity maps, and the final saliency map.]
Figure 2: Overview of our saliency system VOCUS2.
Figure 1: Pixel-precise (middle) and segment-based (right) saliency maps of our VOCUS2 saliency system.
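The channel decomposition at the top of Fig. 2 can be sketched in a few lines. The following is a minimal sketch assuming one common opponent-color definition; the function opponent_channels and its formulas are illustrative assumptions, not necessarily the exact definition used in VOCUS2.

```python
import numpy as np

def opponent_channels(rgb):
    """Split an RGB image (H x W x 3, floats in [0, 1]) into the three
    feature channels of Fig. 2: intensity, red/green, and blue/yellow.
    Hypothetical opponent-color formulas; VOCUS2's exact definition
    may differ."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    intensity = (r + g + b) / 3.0   # intensity channel
    rg = r - g                      # red/green opponency
    by = b - (r + g) / 2.0          # blue/yellow opponency
    return intensity, rg, by
```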
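The twin-pyramid contrast stage of Fig. 2, explained in the System Overview below, could be sketched as follows, assuming SciPy. The parameter values (center sigma, center-surround ratio, number of layers) are illustrative, not the settings of the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def twin_pyramid_contrasts(channel, sigma_c=1.0, ratio=10.0, n_layers=5):
    """Per pyramid layer, smooth with a center sigma and a surround sigma
    (sigma_c * ratio) and take rectified differences: C-S gives the on-off
    contrast, S-C the off-on contrast. Because center and surround maps
    are computed separately, the ratio is a free parameter rather than
    being fixed by the pyramid's own sigmas. Values are illustrative."""
    on_off, off_on = [], []
    layer = np.asarray(channel, dtype=np.float64)
    for _ in range(n_layers):
        center = gaussian_filter(layer, sigma_c)            # center pyramid map
        surround = gaussian_filter(layer, sigma_c * ratio)  # surround pyramid map
        on_off.append(np.maximum(center - surround, 0.0))   # C-S (on-off) contrast
        off_on.append(np.maximum(surround - center, 0.0))   # S-C (off-on) contrast
        layer = layer[::2, ::2]  # naive downsampling to the next layer
    return on_off, off_on
```

Per Fig. 2, the resulting contrast pyramids are then fused into feature maps, conspicuity maps, and the final saliency map.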
System Overview: Fig. 2 shows an overview of the VOCUS2 saliency system. The basic structure is the same as in other systems based on the psychological Feature Integration Theory [5], e.g., Itti's iNVT [3] or our previous VOCUS system [1]: feature channels are computed in parallel, pyramids enable a multi-scale computation, and contrasts are computed by Difference-of-Gaussians. The main difference from these systems is that we introduce a new twin pyramid that enables a flexible center-surround ratio for computing feature contrasts. Instead of obtaining Difference-of-Gaussians by subtracting layers of the same pyramid, we build a center and a surround pyramid separately and compute the Difference-of-Gaussian contrasts from these maps. This makes it possible to choose the center-surround ratio freely instead of being restricted to the sigmas available in the pyramid. Since the center-surround ratio is the most crucial parameter of a saliency system, this change has a large effect on performance. Other changes include a different color space and a different fusion of the channels. Fig. 3 shows how each change with respect to the iNVT improves the performance on the MSRA-1000 dataset.

[Figure 3: precision-recall curves on the MSRA benchmark; legend entries with AUC values:
1. iNVT (0.45113)
2. Our version of iNVT (0.49374)
3. Intermediate (Fusion) (0.54612)
4. Intermediate (Colorspace) (0.57399)
5. Intermediate (Twin Pyramids + Layers 0-4) (0.66877)
6. Intermediate (c-s Ratio) (0.80823)
7. VOCUS2-Basic (0.81521)
8. VOCUS2-LP (0.83126)
9. VOCUS2-Prop (0.85083)]
Figure 3: Stepwise improvements of Itti's iNVT saliency system [3] until reaching VOCUS2. AUC values in parentheses.

An additional location prior can optionally be added to the system for specific applications and benchmarks. We simply add a wide Gaussian, centered on the image, which improved the performance on several benchmarks considerably (VOCUS2-LP; see the sketch below).

Additionally, we extended the method to obtain segment-based saliency maps by combining the saliency map with a generic object proposal detection method. The resulting object proposals are integrated into a segment-based saliency map (VOCUS2-Prop, cf. Fig. 1, right).

Results: In the full paper, we show results on the MSRA-10k, ECSSD, SED1, SED2, and PASCAL-S datasets and show that our method is competitive with state-of-the-art methods. Since the system does not rely on center or background priors (although they can be integrated if desired), it is especially well suited for complex scenes as obtained from mobile devices such as Google Glass or autonomous robots. The second row of Fig. 1 shows an example of such a scene and the corresponding saliency maps. In [2] and [4], we show how such saliency maps can be used for object discovery on mobile systems.
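As a sketch of the optional location prior described above (VOCUS2-LP): the text states that a wide Gaussian centered on the image is added to the system, but not how it is combined with the saliency map. The multiplicative combination and the width parameter sigma_frac below are therefore assumptions.

```python
import numpy as np

def apply_location_prior(saliency, sigma_frac=0.5):
    """Weight a saliency map (H x W) with a wide Gaussian centered on the
    image. sigma_frac (a fraction of the image size) and the multiplicative
    combination are assumptions; the paper may combine the prior
    differently."""
    h, w = saliency.shape
    ys, xs = np.mgrid[0:h, 0:w]
    prior = np.exp(-(((ys - h / 2.0) ** 2) / (2.0 * (sigma_frac * h) ** 2)
                     + ((xs - w / 2.0) ** 2) / (2.0 * (sigma_frac * w) ** 2)))
    return saliency * prior
```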
[1] Simone Frintrop. VOCUS: A Visual Attention System for Object Detection and Goal-directed Search, volume 3899 of LNAI. Springer, 2006.
[2] Esther Horbert, Germán Martín García, Simone Frintrop, and Bastian Leibe. Sequence-level object candidates based on saliency for generic object recognition on mobile systems. In ICRA, 2015.
[3] Laurent Itti, Christof Koch, and Ernst Niebur. A model of saliency-based visual attention for rapid scene analysis. TPAMI, 20(11), 1998.
[4] Germán Martín-García, Ekaterina Potapova, Thomas Werner, Michael Zillich, Markus Vincze, and Simone Frintrop. Saliency-based object discovery on RGB-D data with a late-fusion approach. In ICRA, 2015.
[5] Anne M. Treisman and Garry Gelade. A feature integration theory of attention. Cog. Psych., 12, 1980.

This is an extended abstract. The full paper is available at the Computer Vision Foundation webpage.