A CMOS Image Sensor with Focal Plane Discrete Cosine Transform Computation Edwin J. Tan, Zeljko Ignjatovic and Mark F. Bocko Department of Electrical and Computer Engineering University of Rochester Rochester, NY 14627 fetan,ignjatov,
[email protected] Abstract— In this paper we describe an image sensor with nonuniform pixel placement that enables a highly efficient computation of the Discrete Cosine Transform, which is the most computationaly demanding step of the image compression algorithm. This technique is based on the Arithmetic Fourier Transform (AFT), which we show to be 5 times more computationaly efficient than currently used DCT computation methods. The architecture and circuits described can be implemented in conventional CMOS processes.
A. One-dimensional AFT algorithm The AFT based on the Mobius function was derived in [8]. The 1-D Mobius function 1 (n) is defined as
I. I NTRODUCTION Currently in commercial applications, image compression is performed by a separate digital signal processing circuit employing the DCT algorithms in the digital domain [1] [6]. The development of a new generation of image sensors that efficiently integrates the compression with the sensor itself will find widespread use in imaging applications where compression is needed and power consumption must be limited. The key element in the method that we are proposing is the two-dimensional Arithmetic Fourier Transform (2-D AFT) algorithm. This is a two-dimensional extension of the AFT algorithm proposed by Tufts in 1988 [8]. With some pre-scaling of the data, the AFT computes the DCT coefficients solely by performing additions. Unlike the conventional DCT algorithm, the AFT-based algorithm requires no multiplications. Given its relative mathematical simplicity, the AFT algorithm is well suited for efficient VLSI implementation. Since only additions are required, it is even possible to implement the AFT algorithm in the analog domain employing switched capacitor integrating circuits. In either analog or digital implementations, the AFT will enable savings in circuit complexity, size, and power consumption as well as an increase in speed. II. T HE A RITHMETIC F OURIER T RANSFORM The AFT is vastly more computationally efficient than existing DCT algorithms because it allows computation of the DCT coefficients solely through the operation of additions. However to compute the AFT, the input data must be sampled non-uniformly and an increase in the number of samples per unit area is required. To find the one-dimensional AFT equivalent of an 8 point one-dimensional DCT on the unit interval (0 to 1), 12 non-uniformly spaced samples at 0, 1/4, 2/7, 1/3, 2/5, 1/2, 4/7, 2/3, 3/4, 4/5, 6/7 and 1 are required. The first and the last samples are shared between adjacent 1-4244-0921-7/07 $25.00 © 2007 IEEE.
unit intervals. In number theory, fractions of the form k=j , where k = 0 1 : : : N ; 1 and j = 1 2 : : : N are called Farey fractions of order N [9]. Thus, the minimum required set of samples for calculating an 8 point DCT via the AFT is a subset of the Farey fractions of order 8.
8 1 < 1 (n) = : (;1)s 0
n = (p1 )(p2 )(p3 ) : : : (ps ) (1) p2 j n for any prime p where the vertical bar notation m j n means that the integer if if
n is divisible by the integer m with no remainder. If n can be expressed as the product of s different prime numbers, the value of 1 (n) is (;1)s , otherwise it is zero. Within a unit interval, the signal A(t) is assumed to be periodic with period of one. If the signal A(t) is further assumed to be band-limited to a total of N harmonics, its AFT coefficients are given by
ak (tref ) =
1 X m=1
1 (m) S (mk tref )
(2)
for k = 1 2 3 : : : N . Based on samples A(tref ; j=n) which are distributed at locations corresponding to respective Farey fractions of the interval 0 to 1, each S (n tref ) denotes the output of a filter having the following filtering function
S (n tref ) = n1
nX ;1 j =0
A tref ; nj
(3)
for n = 1 2 3 : : : N . Each of the filter outputs S (n tref ) is the sum of the respective samples A(tref ; j=n) multiplied by the scale factor 1=n, where tref is an arbitrary reference point. tref is preferably equal to 1 for a unit interval. Thus, each AFT coefficient is the sum of the filter outputs of selected filters, weighted by the Mobius function 1 (n).
2395
B. Two-dimensional extension of the AFT To process a two-dimensional (2-D) input signal such as an image or an image portion (e.g., a unit sub-image or block), the AFT is extended to two dimensions using a 2-D Mobius function 2 (n m) defined as
2 (n m) = 1 (n) 1 (m)
x. Given (8), the following relation (9) for the 2-D Fourier series coefficients is obtained [4].
akl (pref qref ) =
1 X 1 X m=1 n=1
2 (m n) S (mk nl pref qref ) (9)
III. I MPLEMENTATION OF THE 2-D AFT FOR A DIRECT 2-D DCT OUTPUT FROM AN IMAGE SENSOR ARRAY IN where n and m are positive integers, and 1 (n) is the 1-D NON - UNIFORM SAMPLE DISTRIBUTION Mobius function defined in (1). Given the close relationship between the DCT and the We derive the 2-D AFT formulae following the method Discrete Fourier Transform (DFT), the outputs from the 2presented in [8] for the derivation of the 1-D AFT. A zeroD AFT algorithm can be used to calculate the 2-D DCT mean 2-D input signal A(p q ); where p and q are continuous coefficients of a unit sub-image that is conventionally divided spatial coordinates in a unit range (i.e., a range from 0 to 1), into N N uniformly spaced pixels. can be represented with respect to any arbitrary reference point First, the image sensor array is divided into unit area blocks, (pref qref ) by a 2-D Fourier series as follows. each block having by definition, a size of 1 1. The pixel centroids inside each unit area are placed in locations based 1 1 A(pref qref ) = ak` (pref qref ) (5) on a set of Farey fractions of the unit block size, to provide the appropriate samples for the filters defined in (7). In order to k=1 `=1 calculate the filters’ outputs, an appropriate reference location ( pref qref ) is chosen. A convenient reference location is at akl (pref qref ) = Ak` cos(2 kpref + k ) cos(2 `qref + ` )pref = 1 and qref = 1 (at a corner of the unit area). Equation (6) (7) then becomes where (pref qref ) is an arbitrary reference location, preferably (1, 1) for a unit sub-image. k and ` are the phase shifts. 1 1 n;1 m;1 A 1 ; j 1 ; k S ( n m ) = (10) It is assumed that the signal A(p q ) is band-limited to N n m j=0 k=0 n m harmonics in both spatial dimensions p and q, i.e., the Fourier where n = 1 2 3 : : : N and m = 1 2 3 : : : N . series coefficients higher than N are equal to zero. A filter-bank The output of the 2-D AFT is a set of 2-D Fourier series having N 2 filters is used to process the image data. Each filter coefficients. In order to derive DCT coefficients from the has the following filtering function. Fourier series coefficients, an extended image block X (p q ) is derived by extending the original image block A(p q ) by n;1 m;1 1 1 j k S (n m pref qref ) = n m A pref ; n qref ; m its own mirror image in both directions, as shown in Fig. 1. The figure illustrates the method for an 8 8 pixel block. j =0 k=0 (7) (4)
XX
XX
XX
8 A(p q) > < (2 ; p q) X (p q) = > A : AA((2p;2 ;p q2); q)
where n = 1 2 3 : : : N and m = 1 2 3 : : : N . It can be seen from (7) that the spatial locations (pref ; j=n qref ; k=m) of the samples A(pref ; j=n qref ; k=m) processed by the filters are defined, relative to the reference location (pref qref ), by respective Farey fractions (j=n k=m). By replacing the signal A(p q ) in (7) by its Fourier series given in (5) and (6), it can be shown that the output of each filter is equal to the sum of a particular set of Fourier series coefficients of A(p q ) [4].
S (n m pref qref ) =
=
1X 1 X j =1 k=1
j s.t. njj k s.t. mjk
ajk (pref qref )
0 p < 1 0 q < 1 1 p < 2 0 q < 1 0 p < 1 1 q < 2 1 p < 2 1 q < 2
(11)
Fig. 1. Extension of the unit sub-image to the extended image employed in the DCT computation.
anjmk (pref qref )
X X
(8)
If the 2-D AFT is computed from the extended image block X (p q ) rather than from the original block A(p q ), the appropriate filter values become
Based on the assumption that the signal is band-limited, there are no more than bN=nc + bN=mc terms that are nonzero, where bxc denotes the largest integer which is less than
2396
S (n m) = n1 m1
nX ;1 mX ;1 j =0 k=0
X 2 ; 2 n j 2 ; 2 m k
(12)
From (11) and (12), the respective outputs filters can be expressed as
S (n m) of the
2b n2 c b m2 c X X 2 j 2k 1 1 4 A nm + S (n m) = n m j =0 k=0 m2 c 2 j 2k d nX b n2 c d mX 2 e;1 2 e;1 bX X A n m ++ A 2nj 2mk + j =0 k=1
j =1 k=0
3 m e;1 d nX 2 e;1 d X 2 A 2nj 2mk 5 j =1 k=1
coefficients within a scale factor; a proof of this result is provided in [4]. On the other hand, if the extended image does not satisfy the Nyquist criterion, the 2-D AFT coefficients are only an approximation of the 2-D DCT coefficients. Such aliasing is more likely to occur for images rich in highfrequency components. However, it is possible to improve the approximation using aliasing correction techniques. According to the proof given in [4], the corresponding 2-D DCT coefficients can be computed as follows DCTfAg(0 0) DCTfAg(k 0)
= = DCTfAg(0 `) = DCTfAg(k `) =
(13)
From (13) it is apparent that there are certain points in the sample space that are repeated. As a result, by calculating the DCT rather than the DFT, the number of independent samples required to find the 2-D AFT is decreased by half. The required non-uniform sample space is shown in Fig. 2, where the circles show the sample points needed for the 2-D AFT calculation, and the stars show the corresponding 2-D DCT sample points.
8p E A] (18) 4p2 xk0 k = 1 2 : : : N ; 1(19) 4 2 x0` ` = 1 2 : : : N ; 1(20) 4 xk` k ` = 1 2 : : : N ; 1 (21)
IV. E XAMPLE I MAGES In this section, we provide a visual comparison between the image obtained through the AFT based non-uniform sampling and the image obtained through uniform sampling of the continuous mathematical image. The values of the spatially continuous image I (x y ) at the coordinates x and y are defined by (22).
I (x y) = cos arctan xy
Fig. 2. Image sampling locations of the unit area sub-image required to compute the 2-D AFT.
With the image sampled as illustrated in Fig. 2, and using filters whose filtering functions are defined according to (13), the 2-D AFT coefficients xk` are computed as
xk` = xk0 = x0` =
1 X 1 X m=1 n=1
1 X m=1
1 X
n=1
2 (m n) S (mk n`)
1 (m) S (mk N )
1 (n) S (N n`)
for
; 127 x y 128
(22)
We see that the image defined by (22) has high frequency components as the x and y coordinates approach zero. The image I (x y ) is first divided into sub blocks. The AFT based sampling pattern (circles in Fig. 2) is used within each unit block to sample the image I (x y ). The DCT coefficients, which are calculated by the AFT procedure, are then returned to the spatial image by using the standard inverse DCT (IDCT). Aliasing correction methods were not applied in this example and the resulting spatial image is shown in Fig. 3.
k ` = 1 2 : : : N (14)
for for
k = 1 2 : : : N
` = 1 2 : : : N
(15) (16)
Fig. 3.
Image obtained through the AFT based 2-D DCT computation.
(17) At the same time, the unit blocks of the image I (x y ) are sampled uniformly at the locations that correspond to the are the conventional DCT (stars in Fig. 2). Fig. 4 shows the IDCT of 2-D AFT coefficients of the extended block image X. xk0 are the DCT computed for the conventionally sampled image. the coefficients obtained by calculating the 1-D AFT of the Fig. 5 confirms that the images are identical except in their mean values of the rows along the p-axis, and x0` are the center regions, where the aliasing artifacts related to the AFT coefficients obtained by calculating the 1-D AFT of the mean based 2-D DCT calculation due to the high frequency of the values of the columns along the q-axis. The equality between input signal are seen. the 1-D DCT and 1-D AFT coefficients is proven in [7]. The error is concentrated in the center region of the image If the extended image block X (p q ) obeys the Nyquist inside approximately 10 by 10 pixel area and its maximum criterion, the resulting AFT coefficients are equal to the DCT mean-square value is equal to 3.6%, which is insignificant.
x00 = E A] where E A] is the mean value of the image, xk`
2397
using analog circuits such as the circuit illustrated in Fig. 6.
Fig. 4.
Image obtained through the conventional 2-D DCT computation.
Fig. 5. Center 40 40 pixel portions of the images (high frequency region). Left figure is the AFT based 2-D DCT computation and the right figure is the conventional 2-D DCT computation. Fig. 6.
V. C OMPUTATIONAL ECONOMIES OF THE AFT Table I presents a comparison for the computation of the 1-D 8-point DCT. An estimate of the number of computations for the 2-D DCT with the results normalized to the 2-D AFT number of operations is shown in the last row of the table. TABLE I C OMPARISON OF THE NUMBER OF REAL OPERATIONS FOR DIFFERENT DCT ALGORITHMS AND AFT- BASED ALGORITHM
Additions FP Multiplications Integer Scaling Total Ops. (8 pt) 1-D DCT Total Ops. (8 8) 2-D DCT Normalized to AFT
Conventional Fast DCT Fast DCT in [2] 56 26 42 16 n.a.
n.a.
Fast DCT in [5] n.a. 18
Fast DCT in [3] 29 12
n.a.
n.a.
AFT 33 n.a. 7
1820
282
350
173
82
493
12
18
4.5
1
2-D DCT computation block employing switched capacitor circuits.
VI. C ONCLUSION Compression is an essential step in the image capture and processing chain, especially in power and bandwidth constrained remote viewing applications. We have proposed a physical redesign of an image sensor array that employs the AFT to vastly reduce the amount of computation required to perform the DCT, the first step in block compression. Following the DCT calculation either by AFT or conventional methods, the remainder of the compression algorithm proceeds as usual. The key in the simplification is rather than sampling the image at uniformly spaced intervals, the image is sampled at the maxima of the transform basis functions. In future work, a prototype sensor is being developed and tested. ACKNOWLEDGMENT This research was supported in-part by NSF ECS #0428157. R EFERENCES
It can be seen that even if we assume that the floatingpoint (FP) multiplication is equivalent to the operation of integer scaling, the 1-D AFT method is approximately 2.1 times as efficient as the most efficient prior art method [3], in terms of the total number of operations for computing a 1D DCT. Furthermore, because the number of total operations in the 2-D case is approximately proportional to the square of the number of computations in the 1-D case, the AFT method is approximately 4.5 times more efficient than the most efficient prior method for computing a 2-D DCT. If the AFT is calculated digitally, there will be a significant decrease in the number of multiplications, thus canceling the slight increase in the number of additions as compared to the methods given in [2], [5], and [3]. In addition, if the multiplications in the AFT computation comprise of pre-scaling the respective pixel intensities by integer values, they can be readily implemented
[1] A. Bandyopadhyay, J. Lee, R. Robucci, and P. Hasler. A 80 W/frame 104 128 CMOS imager front end for JPEG Compression. In Proceedings of the IEEE International Symposium on Circuits and Systems, pages 5318–5321, Kos, Greece, May 2005. [2] W. H. Chen, C. H. Smith, and S. C. Fralick. A Fast Computational Algorithm for the Discrete Cosine Transform. IEEE Transaction on Communications, COM-25:1004–1009, Sep 1977. [3] Z. Cvetkovic and M. V. Popovic. New fast recursive algorithms for the computation of discrete cosine and sine transforms. IEEE Transaction on Signal Processing, 40(8):2083–2086, Aug 1992. [4] Z. Ignjatovic. Towards All-digital Image Sensors with Built-in Compression. PhD thesis, University of Rochester, Rochester, NY, May 2004. [5] M. J. Narasimha and A. M. Peterson. On the Computation of the Discrete Cosine Transform. IEEE Transaction on Communications, COM-26(6):934–936, Jun 1978. [6] A. Olyaei and R. Genov. ViPro: Focal-Plane Spatially-Oversampling CMOS Image Compression Sensor. In Proceedings of the IEEE Custom Integrated Circuits Conference, San Jose, CA, Sep 2006. [7] D. W. Tufts, Z. Fan, and Z. Cao. Image Processing and the Arithmetic Fourier Transform. SPIE High Speed Computing II, 1058:46–53, 1989. [8] D. W. Tufts and G. Sadisiv. Arithmetic Fourier Transform and Adaptive Delta Modulation: A Symbiosys for High Speed Computing. SPIE High Speed Computing, 880:168–178, 1988. [9] N. M. Wigley and G. A. Jullien. On Implementing the Arithmetic Fourier Transform. IEEE Transaction on Signal Processing, 40(9):2233–2242, Sep 1992.
2398