A New High Speed CMOS Camera for Real-Time Tracking Applications

Ulrich Muehlmann, Miguel Ribo, Peter Lang, Axel Pinz
Institute of Electrical Measurement and Measurement Signal Processing and Christian Doppler Laboratory for Automotive Measurement Research, Graz University of Technology, Austria

Abstract— There are many potential applications for very high-speed vision sensors in robotics; tracking is perhaps the most obvious one. The main idea of this paper is to combine existing standard technology (CMOS imaging sensors, FPGA "glue logic", and a USB 2.0 interface) and to exploit direct pixel access for real-time tracking. We designed and built such a prototype camera system, including FPGA-programmed functionality for Fixed Pattern Noise (FPN) calibration, subsampling and direct subwindow access. The performance of this new camera is limited on one hand by the maximum pixel clock of the sensor, and on the other hand by the USB 2.0 microframe timing and bandwidth constraints. We achieve "frame rates" of up to 2.5 kHz for small subwindows, which can be randomly and individually addressed in each update cycle. Besides the technical details and specifications of the system, we show a demonstration application, a high-speed blob tracking system, which verifies the usability of our new camera for highly demanding tracking applications.

Keywords— CMOS imaging sensor, vision-based real-time tracking, FPGA
I. INTRODUCTION

Many robotic applications require real-time vision-based motion tracking. In the case of self-tracking, or "inside-out" tracking, the motion of a sensor platform has to be determined relative to stationary landmarks in the scene. "Outside-in" tracking tries to recover moving objects in a scene from a stationary sensor platform. In both cases it is often sufficient to track the motion of small, compact regions, which can be nicely modeled as "subwindows" of the imaging sensor. Another option is to track homogeneous background motion at lower scales, e.g. by subsampling the imaging sensor. Both subwindows and subsampling lead to a significant reduction of the amount of image data to be processed, which increases the tracking rates significantly. Theoretically, tracking rates of several kHz would be feasible, so that such a system could deal with very fast motion.
Several approaches to addressing marked subframes, high-speed image acquisition with CMOS camera technology, and exploiting logarithmic light-to-voltage conversion are reported in the literature. Related work on CMOS image sensors with a non-integrating direct readout architecture and logarithmic light-to-voltage conversion was presented in [11]. Similar work, based on spatially distributed high-speed readout of individual pixels, was introduced in [8, 9]. [10] describes a camera system for automotive applications based on a 256×256 pixel CMOS image sensor with an effective frame rate of 50 Hz. The possibility of placing the entire "glue logic" for fast image acquisition on a single chip is discussed in [2]; the author also reviews the requirements for CMOS image sensors and their historical development. High-speed sensor readout, achieved by many simultaneously working ADCs integrated on a chip, was introduced in [5]. [7] reports another hardware implementation, on a custom CMOS chip, for an edge detection algorithm. Addressing the problem of mismatch between individual pixels, known as Fixed Pattern Noise (FPN), [6] shows a way to reduce it with built-in automatic FPN correction. Another technique to remove FPN, employing on-chip calibration, is presented in [4].
Recent developments in CMOS imaging sensor technology support direct pixel access, and thus would allow subwindowing and subsampling. Unfortunately, existing CMOS cameras do not support this functionality in an efficient way. We therefore built our own camera using standard and affordable components: a CMOS imaging sensor, FPGA "glue logic", and a USB 2.0 interface.
Figure 1. Monocular prototype camera
Figure 1 shows the monocular prototype of our camera, consisting of three combined printed circuit boards (PCB1-PCB3). PCB1 contains the USB 2.0 connector, a special-purpose interface and the switched power supply circuit. PCB2 houses the USB 2.0 peripheral controller with an embedded microcontroller, 16 MByte of DRAM and an FPGA. PCB3 holds the sensor with its socket and a C-mount lens holder. Figure 2 shows our stereo prototype camera, consisting of one PCB1, one PCB2 and two PCB3 components.

Figure 2. Stereo prototype camera

II. CAMERA SYSTEM DESIGN

The key issue of this CMOS camera system is to maintain the best compromise between a compact, mobile system and high flexibility for tracking applications. To meet these requirements we decided to implement a USB 2.0 interface for its high transfer rate and good performance. In order to handle fast window addressing, to obtain synchronous frame data reading of both cameras, and to correct FPN on the fly, a considerable amount of glue logic must be added. To save space and gain speed, the whole logic is embedded in a single FPGA device. The resulting camera system design consists of the following three major parts:

A. Sensor

To be able to perform subsampling and direct window access, a suitable CMOS sensor must be selected. Related work on CMOS sensor technology is presented in [1]. For hard real-time tasks the sensor must not exhibit motion blur; therefore a sensor with non-integrating behavior must be selected. As image sensors we use 2/3", monochrome, logarithmic, non-integrating, randomly accessible CMOS area image sensors with a 10-bit on-chip ADC and an effective resolution of 1024×1024 pixels (FUGA 1000). This sensor features a high optical fill factor, a high dynamic range and a logarithmic response. However, the non-integrating behavior leads to the presence of FPN. This non-uniformity is well known and calibration methods are presented in [3]. Here, a straightforward calibration method is used; it is explained in the feature list of section II-C.

Our current version of the camera uses a monochrome sensor (we lose color information, but gain spatial resolution). Moreover, the sensor has a logarithmic response similar to that of the human retina. Thanks to this feature, a dynamic range of 120 dB (a factor of 10^6) in illumination intensity can be covered.

B. USB 2.0 Controller

The increasing adoption of the USB 2.0 bus opens a way to develop cheaper and more powerful camera systems, permitting data transfer rates of up to 480 Mbit/s. Modern computer systems include enhanced USB host controllers, so no additional hardware is required; in particular, unlike many other camera systems, no frame-grabber hardware must be added. For the most flexible use of the USB 2.0 interface, a peripheral device controller with an embedded microcontroller is used.

C. FPGA device

This device forms the link between the image sensors and the USB peripheral controller, and several services are programmed into it. In particular, the following features, demanded in this context, are included:
• Fixed Pattern Noise (FPN) correction: Calibration is needed to adapt the CMOS image sensor to the specific lighting conditions of the application and to correct the FPN present in a raw image. FPN is due to small non-uniformities between the pixels of the image sensor, which result from the manufacturing process. As the FPN has a fixed spatial distribution (i.e., it does not vary over time), it can easily be reduced by a simple pixel-per-pixel offset calibration (first-order correction of every pixel). Once calibrated, the camera performs real-time correction of each newly acquired image. The calibration stage consists of pointing the camera at the typical scene to be imaged, adjusting the two main sensor parameters (offset and gain) according to the specific lighting condition, taking a reference image with uniform gray value, and finally storing this reference image as a "calibration image" in the on-board memory. Obtaining a calibration image requires a constant, uniform gray level over the entire image sensor. This is achieved by placing a calibration paper (transparent paper) or frosted glass out of focus, for instance directly in front of the lens, so that ideally the resulting image is uniform. After calibration is complete, the calibration image is used in a real-time FPN correction process: the central step is a simple difference between the raw pixel readout and the corresponding pixel offset. A clipping stage keeps the calibrated pixel within the desired range (usually between 0 and 1023). The main advantage of this first-order correction is that it is ideally suited for high-speed applications, since the offset correction is simple and requires little processing power. Note that the response time of the image sensor can dramatically affect image quality, mainly through its influence on the FPN correction: the pixel value depends on the sampling rate, and thus on the specific timing conditions used when addressing and reading a pixel. As a result, a given offset is valid only for identical timing conditions, and a calibration image therefore depends on the specific illumination level and timing conditions. To partially overcome this side effect, the camera provides four memory blocks for such specific calibration images, i.e., a set of up to four calibration images per sensor can be stored in the on-board memory. (A minimal software sketch of this correction is given after this feature list.)
• Subsampling and subwindow access: To reduce the amount of image data, subsampling and subwindow access are provided. A specific subsampling mechanism of the sensor permits line-by-line subsampling. Basically, the FPGA-internal address generator consists of a presettable row and column counter with a dedicated increment register, which provides the absolute sensor addresses. An additional, so-called window address counter is overlaid, which keeps track of the current width and height of the image area. If an increment step of one is loaded into the increment register, a subwindow access without subsampling is executed; values greater than one determine the decimation factor of the observed image area. (A sketch of this address generation follows the module list in section III-B.)

• Mono/stereo ocular setup: Some tracking applications require stereo vision, so an optional extension for stereo operation is provided. Because of the well-known problem of synchronizing two cameras in software, we decided to integrate stereo grabbing in hardware. This has the advantage of concurrent image acquisition, with the restriction that equal subwindow sizes must be grabbed simultaneously from both sensors. Standard image data delivery uses 10-bit wide packets. Given the 16-bit wide peripheral interface of the USB controller, dual data transfer is realized by truncating the two least significant bits of each data stream and mapping the result to 16 bits for stereo: the upper and lower bytes are filled with data of the left and right sensor, respectively. This organization requires identical window sizes but affects neither transfer bandwidth nor image capture speed. (A sketch of the host-side unpacking is given after figure 3.)
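As a minimal illustration of the first-order FPN correction above, here is a host-side sketch in NumPy; the camera performs this step in the FPGA, and the array names are illustrative:

```python
import numpy as np

def fpn_correct(raw, offsets, lo=0, hi=1023):
    """First-order FPN correction as described above: subtract the
    per-pixel offset taken from the calibration image, then clip the
    result to the 10-bit output range."""
    corrected = raw.astype(np.int32) - offsets.astype(np.int32)
    return np.clip(corrected, lo, hi).astype(np.uint16)

# Tiny example: a uniform 10-bit scene disturbed by fixed per-pixel offsets.
rng = np.random.default_rng(0)
offsets = rng.integers(0, 64, size=(4, 4))   # FPN extracted during calibration
raw = 512 + offsets                          # what the sensor actually delivers
print(fpn_correct(raw, offsets))             # uniform 512 everywhere
```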
III. FPGA SYSTEM ARCHITECTURE

A. Interface

Figure 3 shows the block diagram and data flow of the camera system architecture. Basically, four different interfaces to the FPGA device can be defined. The interface between the USB 2.0 peripheral controller and the FPGA consists of two discrete busses which work independently. This architecture makes it possible to load an upcoming request into the device during image data transfers, a feature that can be exploited when more than one region of interest is to be acquired in a single grab without additional overhead. A bi-directional bus is used to control the image sensor hardware and to grab image data. The fourth interface is a bi-directional bus between memory and device.

B. FPGA internal modules

• Register Set: This module contains all registers necessary to control the image sensors: window width, window height, origin, image-sensor-related registers such as decimation, gain and offset, and a control register which is responsible for image acquisition and FPN calibration processing.

• Calibration Unit: This module forms the main part of the integrated hardware. Its major task is to execute the image acquisition and to correct the Fixed Pattern Noise (FPN) of the CMOS imagers. Furthermore, this unit comprises an image sensor controller, a module to communicate with the external memory, internal dual-port buffers to handle asynchronous data transfers between memory and image data, and a small arithmetic unit.

• Software Version ROM: Holds information about the loaded FPGA configuration file, such as release date and software version. This memory can also be used to verify special integrated hardware features.

• Timestamp Register: Provides the timestamp of the current frame.

• System Control: A finite state machine which coordinates the connected modules.
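To make the address generation of the subsampling/subwindow feature (section II-C) concrete, here is a minimal software sketch of the presettable row/column counters; this is our own illustration, not the actual FPGA logic:

```python
def window_addresses(row0, col0, height, width, inc=1):
    """Emulate the FPGA address generator: presettable row/column counters
    stepped by the increment register.  inc == 1 gives plain subwindow
    access; inc > 1 additionally decimates by that factor."""
    for wy in range(height):          # window address counter, rows
        for wx in range(width):       # window address counter, columns
            yield row0 + wy * inc, col0 + wx * inc   # absolute sensor address

# A 4x4 window at (200, 100) with decimation factor 2 spans an 8x8
# sensor area but delivers only 16 pixel addresses.
print(list(window_addresses(200, 100, 4, 4, inc=2))[:4])
# -> [(200, 100), (200, 102), (200, 104), (200, 106)]
```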
Figure 3. Block diagram and data flow
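Related to the stereo mode of section II-C, the 16-bit stereo stream can be unpacked on the host as follows. In this sketch, the paper fixes only that the upper byte carries the left sensor and the lower byte the right sensor; the function name and NumPy usage are our own choices:

```python
import numpy as np

def unpack_stereo(words):
    """Split the 16-bit stereo stream into two 8-bit images: each word
    holds the truncated (10 -> 8 bit) pixel of the left sensor in the
    upper byte and of the right sensor in the lower byte."""
    words = np.asarray(words, dtype=np.uint16)
    left = (words >> 8).astype(np.uint8)
    right = (words & 0xFF).astype(np.uint8)
    return left, right

left, right = unpack_stereo([0xAB12, 0x34CD])
print(left, right)   # [171  52] [ 18 205]
```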
IV. CAMERA PERFORMANCE

Due to the fundamental USB concept that requires exactly one master in the USB system (the host computer), USB devices cannot exchange information between themselves; they respond to host requests. USB 2.0 establishes a 125 µs time base called a "microframe" on a high-speed bus (refer to the USB 2.0 specification [12]). Due to the master-slave relationship between host and device, data exchange must be arranged in correct chronological order: a transfer starts with the host controller sending a request statement to the slave device and ends, after an error-free transmission, with an appropriate answer. The present camera system follows the same pattern. The host controller starts the asynchronous transmission by transmitting a data package containing sensor and region-of-interest related information needed for the requested image acquisition: sensor-specific parameters such as gain, offset and decimation factor, and values defining the origin, width and height of the observed sub-area, are programmed into the FPGA. After a short latency (USB access time), the device answers with the corresponding image data. The camera system is thus characterized by fast random access to partitions of the sensor. Partitions can differ in size and are not constrained to a common aspect ratio; single rows or columns can be read out with this technique as well. It should be pointed out that the camera attributes can be changed per acquired frame without restriction: area and position can vary between frames at no additional overhead, giving fast and flexible access to the sensor's sub-areas. Figure 4 shows the frame rate behavior for different window sizes and numbers of simultaneously acquired sensor areas at varying positions, measured on Pentium IV based hardware running at 2 GHz with a Linux kernel (version 2.6.0 beta).
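In pseudo-code, one request/answer cycle might look as follows. The `Camera` class and its method names are hypothetical stand-ins for a host driver, not the system's actual API:

```python
# Hypothetical host-side view of one acquisition cycle.  The Camera class
# and its method names are illustrative stand-ins, not a real driver API.

class Camera:
    def request(self, x, y, width, height, gain=1, offset=0, decimation=1):
        """Request statement: program origin, size and sensor parameters
        (gain, offset, decimation) into the FPGA register set."""
        ...

    def read(self):
        """Answer: receive the corresponding (FPN-corrected) image data."""
        ...

def track_one_blob(camera, tracker, x, y, size=15):
    """Every cycle may carry a completely new region of interest, so the
    subwindow can follow a moving target at no extra cost."""
    while True:
        camera.request(x, y, size, size)   # reposition the subwindow
        patch = camera.read()              # fetch its pixels
        x, y = tracker(patch, x, y)        # user-supplied position update
```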
Basically, two speed limitations can be identified. On one hand, the maximum pixel clock - in particular the sensor readout mechanism - may not be driven beyond 7.5 MHz. On the other hand, the USB high-speed time base of 125 µs constrains the delivery of small data packets: if small areas are read, the so-called microframes cannot be filled completely and the next request must wait for synchronization. The discontinuous run of the curves is due to the fact that USB 2.0 bus systems synchronize incoming and outgoing data packets to the microframe rate of 8 kHz. Theoretically, a maximum of 4000 frames per second can be achieved, because each frame requires at least two microframes, one for the request statement and a second one for delivering the image data (image acquisition within one microframe is ensured). Figure 5 depicts frame rate versus decimation factor. Low decimation factors yield a quadratic rise of the frame rate, because USB bus timing does not yet affect transmission: the large data packets transferred are essentially limited by the sensor readout circuit. Full-frame image capture (1024×1024 pixels) is obtained at a 7.5 Hz refresh rate. Decimation factors beyond 16 give images smaller than 64×64 pixels at frame rates above 1000 Hz; at this point, the USB 2.0 interface limits the data transfer due to microframe synchronization.

V. BLOB TRACKING DEMONSTRATOR

The functionality of this camera system was verified with the demonstrator shown in figure 6. We use a box with a dark background, a transparent cover, solid side walls, a looped top and a solid bottom with a centered pipe connection. Air flow is generated by strong ventilation from bottom to top. Sponge rubber balls inserted into the frame move around in a random manner; the task of our system is to track all balls simultaneously. This demonstration focuses on the capabilities of our camera and its application to very fast motion tracking. The tracking process is done in small subwindows, which remain centered on the individual blobs. Full-frame images, downscaled by a factor of 12, are taken only at initialization; later on, only subwindows (typically 15×15 pixels) are grabbed. Furthermore, the non-integrating behavior of the sensor results in low motion blur, and the logarithmic characteristic accommodates a wide range of illumination intensities.
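One simple way to realize such a re-centering step is a thresholded centroid update. The following is our own minimal sketch under that assumption, not the tracker actually used in the demonstrator:

```python
import numpy as np

def recenter(patch, x0, y0, threshold=600):
    """Threshold the 10-bit patch, take the centroid of the foreground
    pixels, and shift the subwindow origin so that the centroid returns
    to the patch center.  Returns None if the blob is lost, in which
    case the caller falls back to a full-frame (subsampled) search."""
    ys, xs = np.nonzero(patch > threshold)
    if xs.size == 0:
        return None
    h, w = patch.shape
    dx = int(round(xs.mean() - (w - 1) / 2))
    dy = int(round(ys.mean() - (h - 1) / 2))
    return x0 + dx, y0 + dy
```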
Figure 4. Frame rates vs. window size and number of windows per request

Figure 5. Frame rate vs. subsampling decimation factor
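The qualitative shape of the curves in figures 4 and 5 follows from the two limits discussed above. A rough, idealized model (our own illustration; the measured rates are lower because of host and driver latency):

```python
import math

PIXEL_CLOCK = 7.5e6   # Hz, maximum sensor pixel clock
MICROFRAME = 125e-6   # s, USB 2.0 high-speed time base

def frame_rate(pixels):
    """Idealized rate: sensor readout time quantized to whole microframes,
    plus one extra microframe for the request statement."""
    data_slots = max(1, math.ceil(pixels / PIXEL_CLOCK / MICROFRAME))
    return 1.0 / ((1 + data_slots) * MICROFRAME)

print(round(frame_rate(15 * 15)))         # small subwindow: 4000 Hz ceiling
print(round(frame_rate(64 * 64)))         # ~decimation 16:  ~1333 Hz
print(round(frame_rate(1024 * 1024), 1))  # full frame:      ~7.1 Hz
```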
TABLE I: BLOB TRACKING UPDATE RATES

Ball count    Update rate (Hz)
1             2500
3             1200
8              800
The experimental testbed consists of three different setups, with only one ball, three balls, and eight balls; all objects inside the frame move in a random manner. The tracking process was carried out over a long period of time (up to one hour). The average frame rates of the tracking process are shown in Table I. The results can be interpreted as follows: in the first setup, the system tracks one ball at a frame rate of about 2500 Hz; in the second, three balls at about 1200 Hz; and in the third, eight moving objects at about 800 Hz. As the emphasis of our system lies on the capabilities of the presented camera subsystem, the blob tracker itself has some deficiencies: it lacks a good motion predictor (e.g., to predict correct trajectories for colliding balls) and an enhanced recovery procedure; a full-frame recovery is performed whenever at least one ball is lost. As a consequence, the straightforward use of our blob tracker tends to decrease the frame rate of the system drastically as the number of moving objects inside the frame increases. Figure 7 shows snapshots of the described demonstrator, taken with different numbers of balls inside the frame.

Figure 6. Experimental setup

Figure 7. Snapshots from tracking 1, 3, and 8 balls

VI. SUMMARY
CMOS camera technology opens the way for new real-time tracking and image processing applications. The benefits are of two kinds: technical on one hand, and application-related on the other. Thanks to the high capacity of the USB 2.0 bus, fast and easy image grabbing can be done without additional frame grabber hardware. The random readout and subsampling mechanisms enable a wide field of non-standard image acquisition. Some applications do not require full spatial resolution (e.g., the blob tracker used in this context), so subsampling can be adopted to increase the image scanning rate; subsampling by a factor of two means four times faster image readout. Furthermore, this two-dimensional area camera can easily be turned into a one-dimensional line camera by applying row-only or column-only readout. The reprogrammable logic accommodates additional features, e.g. an external trigger input, mono or stereo sensor operation, or flexible control of additional hardware. Unfortunately, non-integrating CMOS image sensors suffer from FPN, which must be calibrated at extra effort. For most applications the first-order correction is fully satisfactory. However, when the camera is used in extreme conditions (e.g. very dark or very bright illumination) and the application requires higher image quality, a more advanced correction must be used. In those extreme conditions, if only a first-order correction is performed, the image may show a few white pixels in dark conditions or black pixels in bright conditions.
These are over- or under-corrected pixels whose response curves need to be corrected with a second-order correction model; for high-quality imaging, digital signal processing must be adopted. The USB 2.0 topology comes with the disadvantage that data transfers use microframes synchronized to 8 kHz; a workaround for this effect is to nest several sub-areas into one request.
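One common form such a higher-order model can take is a two-point (offset-plus-gain) correction per pixel. The following sketch is illustrative only and is not the correction used by the camera:

```python
import numpy as np

def two_point_correct(raw, dark, bright, out_max=1023.0):
    """Fit an individual gain and offset for every pixel from two
    reference images (dark and bright), flattening the response curve
    before clipping.  Illustrative, not the camera firmware."""
    gain = out_max / np.maximum(bright - dark, 1e-6)
    return np.clip((raw - dark) * gain, 0.0, out_max)
```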
In order to exploit these features, software applications must be adapted to handle non-standard image acquisition. With the non-standard image acquisition provided by the presented camera system, a new field of high-speed tracking applications for hard real-time tasks arises.
ACKNOWLEDGMENT

We gratefully acknowledge support from the following projects: the Tracking with Smart Sensors project (Austrian FWF project number P15748) and the Christian Doppler Laboratory for Automotive Measurement Research.

REFERENCES

[1] B. Dierickx, D. Scheffer, G. Meynants, W. Ogiers, and J. Vlummens, "Random addressable active pixel image sensor," Europto Conf. on Advanced Focal Plane Arrays and Electronic Cameras, Berlin, Germany, Oct. 9-10, 1996, pp. 1-5.
[2] E. R. Fossum, "CMOS image sensors: electronic camera-on-a-chip," IEEE Transactions on Electron Devices, vol. 44, no. 10, Oct. 1997, pp. 1689-1698.
[3] D. Joseph and S. Collins, "Modelling, calibration and correction of nonlinear illumination-dependent fixed pattern noise in logarithmic CMOS image sensors," Proc. 18th IEEE Instrumentation and Measurement Technology Conference (IMTC 2001), vol. 2, 2001, pp. 1296-1301.
[4] S. Kavadias, B. Dierickx, D. Scheffer, A. Alaerts, D. Uwaerts, and J. Bogaerts, "A logarithmic response CMOS image sensor with on-chip calibration," IEEE Journal of Solid-State Circuits, vol. 35, no. 8, Aug. 2000, pp. 1146-1152.
[5] A. Krymski, D. Van Blerkom, A. Andersson, N. Bock, B. Mansoorian, and E. R. Fossum, "A high speed, 500 frames/s, 1024×1024 CMOS active pixel sensor," 1999 Symposium on VLSI Circuits, Digest of Technical Papers, 1999, pp. 137-138.
[6] M. Loose, K. Meier, and J. Schemmel, "CMOS image sensor with logarithmic response and self calibrating fixed pattern noise correction," in International Symposium on Electronic Image Capture and Publishing - Advanced Focal Plane Arrays and Electronic Cameras, T. M. Bernard, Ed., Proc. SPIE 3410, 1998 (HD-KIP-98-13).
[7] T. W. J. Moorhead and T. D. Binnie, "Smart CMOS camera for machine vision applications," Seventh International Conference on Image Processing and Its Applications (Conf. Publ. No. 465), vol. 2, 1999, pp. 865-869.
[8] Y. Ohtsuka, T. Hamamoto, K. Aizawa, and M. Hatori, "A novel image sensor with flexible sampling control," Proc. 1998 IEEE International Symposium on Circuits and Systems (ISCAS '98), vol. 6, 1998, pp. 637-640.
[9] R. Ooi, T. Hamamoto, T. Naemura, and K. Aizawa, "Pixel independent random access image sensor for real time image-based rendering system," Proc. 2001 International Conference on Image Processing, vol. 2, 2001, pp. 193-196.
[10] M. Schanz, C. Nitta, A. Bussmann, B. J. Hosticka, and R. K. Wertheimer, "A high-dynamic-range CMOS image sensor for automotive applications," IEEE Journal of Solid-State Circuits, vol. 35, no. 7, July 2000, pp. 932-938.
[11] D. Scheffer, B. Dierickx, and G. Meynants, "Random addressable 2048×2048 active pixel image sensor," IEEE Transactions on Electron Devices, vol. 44, no. 10, Oct. 1997, pp. 1716-1720.
[12] Universal Serial Bus Specification, Revision 2.0, www.usb.org