Attentive motion sensor for mobile robotic applications

Giacomo Indiveri, Chiara Bartolozzi, and Neeraj K. Mandloi

Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland (e-mail: [email protected])
Italian Institute of Technology, Genova, Italy (e-mail: [email protected])
[email protected] Abstract— We present a compact vision sensor comprising a one-dimensional array of adaptive photo-receptors, spatiotemporal feature extraction circuits, feature normalization circuits, and an attentional readout circuit that selects the most salient region in the feature map. The sensor comprises also digital input and output circuits for directly interfacing it to digital processing units, making it an ideal device for mobile robotic applications. We describe the sensor architecture and present experimental results measured from the fabricated chip. As we identified unexpected results from one of the computational stages, we compare the measured responses to circuit simulations and propose improvements for new revisions of the chip.
I. INTRODUCTION
978-1-4244-9472-9/11/$26.00 ©2011 IEEE
Processing detailed sensory information in real-time is a computationally demanding task for both natural and artificial sensory systems. This is especially true for vision, where the amount of information provided by the sensors typically exceeds the parallel processing capabilities of the system. An effective strategy to cope with large amounts of input data in the face of limited computational resources is to use selective attention, a process that allows the system to select sub-regions of the input and process them serially, shifting from one sub-region to another in sequence [1]. In active vision and robotic applications these strategies can be used to decide which salient sub-regions of the sensory input space to process, dramatically reducing the bandwidth requirements for information transfer and the system's overall computational load [1].

With this goal in mind, we designed a compact one-dimensional neuromorphic vision sensor that implements an on-chip model of selective attention, to select and track features of interest as they move in the environment. We will refer to this device as the Tracker-Motion Sensor (TMS). In the TMS, "saliency" is defined as a combination of spatial and temporal contrast features, as well as stimulus velocity. All these features are computed in parallel on the sensor focal-plane, weighted by adjustable biases, and summed into the input of a winner-take-all (WTA) circuit, which selects the most salient input and outputs the address of the winning pixel, using both analog and digital representations. The low-power vision processing capabilities of the TMS, combined with the parallel/distributed computation approach, allow both a low weight budget and a redundant, fault-tolerant
Fig. 1: TMS chip micro-photograph and functional block diagram. The chip contains one row of 64 pixels comprising adaptive photoreceptors, functional blocks for extracting spatio-temporal features from the visual scene, and circuits for saliency-map based selective attention processing. The block diagram depicts how pixels are interconnected, and interfaced to input/output circuits.
architecture. Furthermore, the current-mode neuromorphic circuits used in the TMS to design the WTA network are ideal for hardware models of selective attention systems [2]. Several neuromorphic attention systems of this kind have been proposed in the past [3]–[6]; they integrate photo-sensing elements and processing elements on the same focal plane to carry out competitive selection and visual tracking operations. The TMS extends previously proposed approaches by implementing a more complex saliency map computation stage, with a feature normalization stage for optimally combining the different saliency features. Furthermore, the TMS has digital input/output circuits which allow it to selectively read out the analog values of the most salient pixels in the scene: the digital input address decoders used to read out analog outputs can be driven by the chip's own output address encoders, which report the position of the most salient pixel. As opposed to imagers, which simply reproduce the visual
input, the TMS extracts relevant information for interacting with the external world in real-time. Its real-time processing characteristics, combined with flexible attention-based read-out techniques, make it an ideal device for on-line active-vision tasks and/or mobile robotic applications. In this paper we present the TMS architecture, describe its main analog processing blocks, and present preliminary experimental results measured from the fabricated chip. We point out unexpected behaviors in the current implementation, partly by comparing SPICE simulations to measured data. We conclude by proposing circuit modifications that can improve the system's performance and by describing application domains that are optimally suited for the TMS.
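The bandwidth argument behind attention-based readout can be illustrated numerically. The sketch below is a hypothetical model, not chip firmware: instead of streaming every pixel's feature values, an attentional sensor reports only the winning address and that pixel's measurements. All names, array sizes, and the feature count are illustrative assumptions.

```python
import numpy as np

N_PIXELS = 64            # matches the TMS array length
FEATURES_PER_PIXEL = 3   # temporal derivative, spatial derivative, velocity

def attended_readout(saliency, features):
    """Return only the most salient pixel's address and its feature values."""
    winner = int(np.argmax(saliency))   # WTA selection of the winning pixel
    return winner, features[winner]

# Toy data standing in for the per-pixel feature currents.
rng = np.random.default_rng(0)
features = rng.random((N_PIXELS, FEATURES_PER_PIXEL))
saliency = features.sum(axis=1)
addr, values = attended_readout(saliency, features)

full = N_PIXELS * FEATURES_PER_PIXEL   # words for a full-frame readout
attended = 1 + FEATURES_PER_PIXEL      # winner address + one pixel's features
print(full // attended)                # 48x fewer words per readout cycle
```

In this toy setting the attended readout moves 48 times fewer words per cycle than a full scan, which is the kind of data reduction the paper refers to.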
Fig. 2: Temporal derivative (TmpDiff) circuit (a) and DPI-based slow-pulse generation circuit (b).
II. THE TRACKER-MOTION SENSOR ARCHITECTURE

Figure 1 shows the micro-photograph and the block diagram of the TMS. Each pixel comprises a photoreceptor, a spatial derivative circuit, a temporal derivative circuit, and circuits for the computation of velocity and direction of motion. A feature-map circuit combines these signals at each pixel location to encode the pixel saliency, and provides this signal as input to a global winner-take-all (WTA) circuit, which implements the attentional selection mechanism.

A. Analog core

1) Photoreceptor circuit: The input of each pixel is the logarithmic adaptive photoreceptor proposed in [7]. It responds with low gain to static or slowly changing stimuli and with high gain to fast-changing stimuli centered around the adaptation point. To minimize power consumption, we implemented a self-biasing option [8] for the photoreceptor amplifier, such that the photoreceptor bias current is proportional to the total photo-current generated in the whole array. An optimal amount of power is used at each illumination level, because the amplifier bias current, and hence its bandwidth, scales with the input signal bandwidth.

2) Spatial derivative circuit: The spatial derivative is implemented by a "bump-antibump" circuit [9] that compares the photoreceptor outputs of two neighboring pixels and produces an output current with a bell-shaped profile: the current reaches its minimum when the two inputs are equal and increases with the absolute value of their difference.

3) Temporal derivative and edge detection circuit: The temporal circuits are those originally proposed in [10]: they comprise a hysteretic differentiating amplifier which converts rapid changes in the input voltage (the photoreceptor output) into current pulses with amplitudes proportional to the input signal's temporal contrast (see Fig. 2a).
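The first three pixel stages can be summarized with a simple behavioral model. The sketch below is an assumed first-order abstraction, not the transistor-level circuits: a logarithmic photoreceptor, a spatial-derivative stage that reports the dissimilarity of neighboring outputs (minimal for equal inputs, growing with their absolute difference, like the anti-bump output), and a temporal-derivative stage that responds to changes between frames. All function names and parameter values are hypothetical.

```python
import numpy as np

def photoreceptor(intensity, v0=1.0, gain=0.1):
    """Logarithmic encoding of light intensity (behavioral model)."""
    return v0 + gain * np.log(intensity)

def spatial_derivative(v):
    """Anti-bump-like dissimilarity of neighboring photoreceptor outputs."""
    return np.abs(np.diff(v))

def temporal_derivative(v_prev, v_now, dt=1e-3, tau=10e-3):
    """Discrete high-pass: responds to changes in the photoreceptor output."""
    return (v_now - v_prev) * tau / (dt + tau)

# A dark-to-bright edge enters the right half of a 4-pixel array.
light_t0 = np.array([10.0, 10.0, 10.0, 10.0])
light_t1 = np.array([10.0, 10.0, 100.0, 100.0])
v_prev, v_now = photoreceptor(light_t0), photoreceptor(light_t1)
sd = spatial_derivative(v_now)            # peaks between pixels 1 and 2
td = temporal_derivative(v_prev, v_now)   # peaks where intensity changed
print(int(np.argmax(sd)), int(np.argmax(td)))  # 1 2
```

The logarithmic front end makes both derivative stages respond to contrast rather than absolute illumination, which is the property the chip exploits.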
4) Velocity circuits: To compute the velocity of detected contrast changes, we use a new circuit based on the "facilitate and sample" (FS) concept first described in [10]. Moving edges in the scene produce temporal derivative pulses which are used to generate a sharp "fast" voltage pulse and a slowly decaying "slow" voltage pulse. The fast pulse of one pixel samples the slowly decaying pulses of its neighbors. The sampled voltage decreases monotonically with the time the edge takes to travel between pixels, and therefore encodes the edge velocity.

There are two novel aspects in the circuits proposed here, compared to the original FS circuit. The first is a new thresholding and signal-restoration stage, inserted between the temporal derivative and the fast-pulse generation circuit to improve robustness to noise: a two-node winner-take-all (WTA) circuit [11] compares the temporal derivative output to a set threshold and, when the threshold is exceeded, produces a normalized pulse that is optimally suited for driving the fast-pulse generation circuit. The second improvement is a new slow-pulse generation circuit, comprising a current-mode log-domain filter, the Differential Pair Integrator (DPI) [12], with tunable time constant (see Fig. 2b). In its original conception, the FS circuit produced a non-linear decay signal using a diode-capacitor circuit [10], with a very steep slope at short time intervals and a very shallow slope at longer intervals. This resulted in inhomogeneous sampling resolution for different speeds, whereas the exponentially decaying signal of the DPI provides uniform sampling resolution over a wide range of (tunable) speeds. Each pixel computes both leftward and rightward velocities; if an edge travels in one direction, one of the two values is significantly higher than the other, determining the estimated direction of motion. The comparison between the two sampled values is implemented by a three-input current-mode WTA; the third input sets a minimum threshold that drives the circuit when no moving edges are detected.

5) Feature maps and selective attention: The saliency value of each pixel is computed as a weighted sum of the currents corresponding to the measured temporal and spatial derivatives and velocity. This sum is implemented by three current-normalizer circuits that bring the different currents into a common range and allow each contribution to the final saliency map to be weighted independently.
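The facilitate-and-sample scheme with an exponential decay can be sketched numerically. With a decay of time constant tau, the sampled voltage is v0*exp(-dt/tau), so the travel time dt (and hence the speed, given the pixel pitch) is recovered by a logarithm. All values below (amplitude, time constant, pitch) are illustrative assumptions, not chip parameters.

```python
import math

def sampled_voltage(v0, tau, dt):
    """Slow-pulse amplitude when a neighbor's fast pulse samples it dt later."""
    return v0 * math.exp(-dt / tau)

def travel_time(v_sampled, v0, tau):
    """Invert the exponential decay to recover the edge travel time."""
    return tau * math.log(v0 / v_sampled)

v0, tau = 1.0, 0.05        # initial amplitude (a.u.) and DPI time constant (s)
pitch = 30e-6              # assumed pixel pitch (m)
dt_true = 0.02             # edge takes 20 ms to cross one pixel

vs = sampled_voltage(v0, tau, dt_true)   # voltage captured by the fast pulse
dt_est = travel_time(vs, v0, tau)        # recovered travel time (20 ms)
speed = pitch / dt_est                   # estimated edge speed (1.5e-3 m/s)
print(speed)
```

The exponential decay makes this inversion uniform in relative resolution across speeds, which is the advantage over the original diode-capacitor decay whose slope varies strongly with the interval length.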
The attentional selection is implemented by a current-mode WTA and a self-inhibition circuit [2], needed for scanning the array in order of decreasing saliency. Depending on the time constant of the self-inhibition, an increasing number of pixels can be selected as attentional targets.
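The scanning behavior produced by WTA selection with self-inhibition can be sketched as follows. This is a discrete abstraction of the analog dynamics, with an illustrative inhibition strength: after each win, the winner's saliency is suppressed, so subsequent cycles select the remaining pixels in order of decreasing saliency.

```python
import numpy as np

def scan_by_saliency(saliency, n_targets, inhibition=1.0):
    """Select n_targets pixels in decreasing-saliency order via WTA + self-inhibition."""
    s = np.asarray(saliency, dtype=float).copy()
    order = []
    for _ in range(n_targets):
        winner = int(np.argmax(s))  # WTA: current most salient pixel
        order.append(winner)
        s[winner] -= inhibition     # self-inhibition suppresses the winner
    return order

# Four pixels with saliencies 0.2, 0.9, 0.5, 0.7: scanned as 1, 3, 2.
print(scan_by_saliency([0.2, 0.9, 0.5, 0.7], n_targets=3))  # [1, 3, 2]
```

In the chip, the analogous knob is the self-inhibition time constant: a slower recovery lets more pixels take their turn as the attentional target.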
Fig. 3: Photoreceptor and SpaceDiff circuit responses to a moving bar. Seven measurements from seven neighboring pixels are superimposed.
Fig. 4: Average SpaceDiff circuit response measured across 7 pixels. Top: maximum steady-state and minimum response measurements. Bottom: their difference, corresponding to an estimate of the spatial derivative, is significantly different from zero (mean = 0.24 mV, std = 0.091 mV).
B. Asynchronous logic I/O and analog readout

The position of the winning location is reported off-chip using both analog position-to-voltage circuits and digital encoders. The selection of a winner in the competition stage is signaled off-chip via a digital address and two control signals that flag an error exception when more than one pixel is active or when no pixel is active, respectively. The analog currents corresponding to the spatial and temporal derivatives and to the velocity measure are transformed into voltages by on-chip current-to-voltage converters. A digital asynchronous decoder is used to address the pixel from which the measurements are read. The addressing can be driven by an external micro-controller, programmed either to scan the array sequentially or to perform attentional readout by selecting as outputs the pixels chosen by the WTA. The TMS behavior can be tuned by means of 36 biases controlled by an on-chip programmable bias generator [13] (see the bottom half of the circuitry in the chip photograph of Fig. 1a). The programmable, precise, and temperature-independent properties of this on-chip bias generator are critical for applications of the TMS in uncontrolled environments (e.g., outdoors) and for use on compact autonomous systems.

III. TMS TEST SETUP AND CHARACTERIZATION

To test the TMS with real-world stimuli, we mounted a lens onto it and imaged a moving dark bar on a white background, painted on a rotating drum with well-controlled rotational speed. We verified that the adaptive photoreceptor circuits behave as expected and measured the responses of the subsequent computational stages.

1) Spatial derivative circuits: Figure 3 shows the responses of the adaptive photoreceptors (top) and the spatial derivative circuits (bottom) to a moving bar that appears in the pixels' field of view, measured for seven neighboring pixels. The slow decrease in the photoreceptors' peak response is due to their adaptation. Subsequent circuits are not affected by this decay, as they compute spatial and temporal discontinuities. As the bar moves from one pixel to its neighbor, the spatial
Fig. 5: Spatial derivative estimate, computed for different velocities (arbitrary units) of the moving bar.
derivative produces a negative peak in its response. Mismatch effects have been minimized in this stage by careful layout. To quantify them, we measured the response of the seven pixels over ten repetitions of the same experiment. The top panel of Fig. 4 shows the average values of both the minimum peak and the maximum output of each pixel. The two values are significantly different in all pixels. Their difference, plotted in the lower panel of Fig. 4, is a measure of the spatial derivative of the input stimulus. Figure 5 shows the spatial derivative measured from one pixel as a function of the speed of the rotating drum; clockwise and counter-clockwise rotations can be clearly distinguished thanks to the asymmetric nature of the bump response.

2) Temporal derivative circuits: Figure 6 shows both the photoreceptor response (top) and the temporal derivative circuit response (bottom), for different values of the photoreceptor adaptation bias. The photoreceptor data confirm the expected behavior: a transient peak is followed by slow adaptation, and both the peak amplitude and the time course of adaptation can be tuned by means of a dedicated bias voltage. However, the (ON) temporal derivative (TmpDiff) circuit does not behave as expected: its output peak has an unexpected delay from the onset of the photoreceptor step. To investigate this issue we performed post-layout SPICE circuit simulations, using as input waveforms the same ones measured from the photoreceptors.

Fig. 6: Photoreceptor and TmpDiff circuit response to a moving bar, for different values of the photoreceptor adaptation bias (VAdap = 0.0 V to 1.5 V).

Fig. 7: TmpDiff circuit: SPICE simulation results vs. measured output (AC coupling), in response to the same input waveform as measured from the photoreceptor. The delay observed in the measured response is most likely due to parasitic capacitive effects not accounted for in the SPICE simulations.

Figure 7 shows the measured and simulated outputs of the circuit superimposed, in response to the measured photoreceptor waveform. As shown, the SPICE simulations do not explain the data. We therefore carried out further measurements over a wide range of bias settings: in most biasing conditions, the TmpDiff circuit produces many spurious peaks, mainly in response to noise and flicker. We found that these can be removed by increasing the circuit's unity-gain frequency. This can be achieved either by careful biasing of the amplifier in sub-threshold, or by decreasing the absolute value of its capacitors [10]. As changing bias settings in the TMS did not produce the desired outcome, we are investigating the effect of using smaller capacitors in the layout, and are considering a complete redesign with radically different temporal derivative circuits.

3) Velocity, feature-map, and selective attention circuits: The unexpected output of the temporal derivative circuit strongly affects the subsequent processing stages. In light of this, we are currently testing the velocity and downstream circuits. We are furthermore characterizing the analog and digital outputs, in order to determine the specifications needed by conventional digital processing modules (e.g., micro-controllers) and to develop a mixed analog/digital system optimally suited for mobile robotic applications.

IV. CONCLUSIONS

We presented a novel analog VLSI vision sensor which extends the capabilities of previously proposed designs by combining in a single chip spatial derivative measurements, temporal derivative measurements, velocity measurements, and selective attention processing. We endowed the device with digital inputs and outputs in order to directly interface it to micro-controllers for mobile robotic applications. The first processing stages of the chip work as expected. However, an unusual behavior measured in the temporal derivative stages has delayed the application of this device to the mobile robotic tasks it was designed for. We are currently "debugging" the analog VLSI circuits and are planning a second revision of this device. We are confident, however, that the architectural and functional solutions adopted for this sensor are suitable for a wide range of application scenarios, irrespective of the details (and issues) present in the individual processing stages.

ACKNOWLEDGMENTS

This work was supported by the EU eMorph grant #ICT231467.
REFERENCES

[1] L. Itti and C. Koch, "Computational modeling of visual attention," Nature Reviews Neuroscience, vol. 2, no. 3, pp. 194–203, 2001.
[2] C. Bartolozzi and G. Indiveri, "Selective attention in multi-chip address-event systems," Sensors, vol. 9, no. 7, pp. 5076–5098, 2009. [Online]. Available: http://www.mdpi.com/1424-8220/9/6/5076
[3] R. Etienne-Cummings, J. Van der Spiegel, and P. Mueller, "VLSI model of primate visual smooth pursuit," in Advances in Neural Information Processing Systems, vol. 8, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. Cambridge, MA: MIT Press, 1996, pp. 706–712.
[4] T. Morris, T. Horiuchi, and S. DeWeerth, "Object-based selection within an analog VLSI visual attention system," IEEE Transactions on Circuits and Systems II, vol. 45, no. 12, pp. 1564–1572, 1998.
[5] V. Brajovic and T. Kanade, "Computational sensor for visual tracking with attention," IEEE Journal of Solid-State Circuits, vol. 33, no. 8, pp. 1199–1207, August 1998.
[6] G. Indiveri, "Neuromorphic VLSI models of selective attention: from single chip vision sensors to multi-chip systems," Sensors, vol. 8, no. 9, pp. 5352–5375, 2008. [Online]. Available: http://www.mdpi.com/1424-8220/8/9/5352
[7] S.-C. Liu, "Silicon retina with adaptive filtering properties," Analog Integrated Circuits and Signal Processing, vol. 18, no. 2/3, pp. 243–254, February 1999.
[8] T. Delbrück and D. Oberhof, "Self biased low power adaptive photoreceptor," in International Symposium on Circuits and Systems (ISCAS 2004), vol. IV. IEEE, 2004, pp. 844–847.
[9] T. Delbrück, ""Bump" circuits for computing similarity and dissimilarity of analog voltages," in Proc. Int. Joint Conf. Neural Networks, July 1991, pp. I-475–479.
[10] J. Kramer, R. Sarpeshkar, and C. Koch, "Pulse-based analog VLSI velocity sensors," IEEE Transactions on Circuits and Systems II, vol. 44, no. 2, pp. 86–101, February 1997.
[11] J. Lazzaro, S. Ryckebusch, M. Mahowald, and C. Mead, "Winner-take-all networks of O(n) complexity," in Advances in Neural Information Processing Systems, vol. 2, D. Touretzky, Ed. San Mateo, CA: Morgan Kaufmann, 1989, pp. 703–711.
[12] C. Bartolozzi and G. Indiveri, "Synaptic dynamics in analog VLSI," Neural Computation, vol. 19, no. 10, pp. 2581–2603, October 2007.
[13] T. Delbrück and P. Lichtsteiner, "Fully programmable bias current generator with 24 bit resolution per bias," in International Symposium on Circuits and Systems (ISCAS 2006). IEEE, 2006.