CROSS-TALK EXPLAINED

Keigo Hirakawa

Harvard University, Department of Statistics and School of Engineering and Applied Sciences, Oxford Street, Cambridge, MA 02138 USA
[email protected]

ABSTRACT

Image sensor measurements are subject to degradation caused by photon and electron leakage. Color image data acquired via a spatial subsampling procedure implemented as a color filter array are especially vulnerable to ambiguity between neighboring pixels that measure different portions of the visible spectrum. This so-called "cross-talk" phenomenon is expected to become more severe as the electronics industry continues to shrink the device footprint, because the pixel sensors are packed more densely together. We show that an analysis of the mechanism underlying the cross-talk problem is surprisingly straightforward. Our comprehensive analysis admits a simple and effective color correction scheme for a given choice of color filter array in a digital camera.

Index Terms: color image sensors, cross-talk, color correction, color measurement, spatio-spectral sampling.

1. INTRODUCTION

Cost effectiveness has helped secure the dominance of the single-sensor solution over alternative color image acquisition configurations in consumer electronics. The popularity of this design paradigm is also marked by sustained progress in research on color filter arrays (CFAs) [1–5], interpolation [6–10], denoising [11], compression [12], and the quantum efficiency of pixel sensors [13]. In recent years, however, we have witnessed an industry trend toward higher image resolution as a response, at least in part, to heightened consumer awareness and expectation of digital color image content.
The challenges posed by densely populating pixel sensors are not limited to the manufacturability of integrated circuits; there are implications for image processing as well. Great care must be taken in analyzing the image sensor because acquisition is typically the first step in the digital camera pipeline and largely determines the image quality achievable by subsequent processing schemes. A well-known drawback of shrinking the geometry of the pixel sensors is that it increases noise. While a detailed investigation of noise sources is beyond the scope of this paper, studies suggest that the number of photons encountered during a spatio-temporal integration is a Poisson process: the "noise" variance scales linearly with the intensity of the light, the integration time, and the surface area of the sensor and lens. Consequently, the signal-to-noise ratio degrades when the surface of the sensor chip is subdivided into smaller pixels.

Photon and electron leakages caused by optical refraction and minority carriers are the sources of an interaction between neighboring pixels, the so-called "cross-talk" phenomenon. It is another artifact of the decreased distance between neighboring pixels that complicates the reconstruction of the desired image signal. "Desaturation" of color is the most noticeable consequence of cross-talk in single-sensor color image acquisition devices; it results from combining neighboring pixel measurements that represent different portions of the visible spectrum. Research on cross-talk has so far focused on the physical aspects [14–16]. Moreover, the problem of leakage cannot simply be "un-done" by existing de-convolution or image deblurring methods owing to the complexity of the sensor data. In sum, cross-talk is an important problem that still deserves our attention.

In this paper, we show that spatio-spectral sampling theory admits a straightforward alternative for analyzing the mechanism underlying cross-talk contamination. This comprehensive study precisely models the attenuation of color information based not only on the sensor and leakage characteristics but also on the color image content and the demosaicking method. We offer a simple and effective color correction scheme and compare the sensitivity of various color filter array patterns as characterized by the interplay between aliasing, cross-talk, and demosaicking.

Patent pending.

2. BACKGROUND

Let $x : \mathbb{Z}^2 \to \mathbb{R}^3$, $x = [x_r, x_g, x_b]^T$, be the color image of interest, where $x(n)$ is the RGB tristimulus value at pixel location $n \in \mathbb{Z}^2$. Once the light enters the camera through the lens, it is focused onto the surface of the sensor array. Ideally, the value digitally recorded by the pixel sensor, $y : \mathbb{Z}^2 \to \mathbb{R}$, is proportional to the intensity of the light that passes through the color filter at the corresponding pixel location:

$$y(n) = c(n)^T x(n),$$

where $c : \mathbb{Z}^2 \to \mathbb{R}^3$, $c = [c_r, c_g, c_b]^T$, is the color filter array. This is effectively a spatio-spectral subsampling procedure, whereby each pixel location measures only a portion of the visible spectrum selected from amongst a chosen "color partition" of that spectrum.
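As a minimal sketch of the ideal measurement model $y(n) = c(n)^T x(n)$, the following NumPy snippet builds a pure-color Bayer-type CFA and samples a random color image with it. The particular pixel layout, the function names `bayer_cfa` and `sample`, and the random test image are illustrative assumptions, not part of the paper.

```python
import numpy as np

def bayer_cfa(height, width):
    """Return c with shape (height, width, 3): a one-hot RGB filter per pixel.

    Assumed layout (illustrative): green where row + col is even,
    red on even rows / odd columns, blue on odd rows / even columns.
    """
    rows, cols = np.mgrid[0:height, 0:width]
    c = np.zeros((height, width, 3))
    c[..., 1] = (rows + cols) % 2 == 0               # green
    c[..., 0] = (rows % 2 == 0) & (cols % 2 == 1)    # red
    c[..., 2] = (rows % 2 == 1) & (cols % 2 == 0)    # blue
    return c

def sample(x, c):
    """Ideal sensor measurement y(n) = c(n)^T x(n), evaluated at every pixel."""
    return np.sum(c * x, axis=-1)

rng = np.random.default_rng(0)
x = rng.random((4, 6, 3))      # hypothetical RGB image, one tristimulus value per pixel
c = bayer_cfa(4, 6)
y = sample(x, c)               # one scalar measurement per pixel
```

Because each pixel of this CFA passes exactly one color, the filter weights sum to one at every location, and each entry of `y` equals a single channel of `x`.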
Two predominant causes of cross-talk contamination are optical diffraction and minority carriers. Optical diffraction occurs when a high incidence angle of the light entering the substrate causes the photons to stray away from the center of the pixel; microlenses can help to reduce this risk [14]. The diffusion is stochastic but mostly linear with respect to the intensity of the light. The incident angle is typically wider for the pixel sensors far from the lens axis, and thus the light that reaches the photosensitive material can be modeled as a spatially-variant convolution:

$$y(n) = \sum_m \{c^T x\}(n-m)\, h_0(n,m),$$

where $h_0 : \mathbb{Z}^2 \times \mathbb{Z}^2 \to \mathbb{R}_+$ is the location-dependent impulse response. The precise modeling of $h_0(n,m)$ as a function of sensor geometry is an active area of research involving sophisticated simulation [15, 16].

Minority carriers deteriorate the signal when electrons stray from the target after the charge is collected [14]. This effect is typically deterministic and mostly linear with respect to the signal strength, and it can be modeled as a spatially-invariant convolution:

$$y(n) = \sum_m \{c^T x\}(n-m)\, h_1(m),$$

where $h_1 : \mathbb{Z}^2 \to \mathbb{R}_+$ is the convolution kernel. Motivated by physics, the characteristics of this diffusion process are crudely modeled as $h_1(m) \propto e^{-\|\tau m\|/\kappa}$, where $\kappa$ is the diffusion constant, $\tau$ is the sample interval, and $\|\cdot\|$ is the Euclidean distance [16]. Combining the effects of optical diffusion and minority carriers, the overall acquisition process is:

$$y(n) = \sum_m \{c^T x\}(n-m)\, h(n,m),$$

where $h : \mathbb{Z}^2 \times \mathbb{Z}^2 \to \mathbb{R}_+$ represents the combined effect of the convolution filters $h_0$ and $h_1$. We assume $h$ is known a priori, as it is not data-dependent and is parameterized via calibration experiments.

3. ANALYSIS OF CROSS-TALK

With the aid of spatio-spectral sampling analysis [4, 6], we are concerned with the coding of color information "embedded" in the sensor data $y$ as a function of the cross-talk kernel $h$. Suppose $c_r + c_g + c_b = \gamma$ for some constant $\gamma$ (true by default for pure-color CFAs). Then,

$$
\begin{aligned}
c(n)^T x(n) &= \begin{bmatrix} c_r(n) & c_g(n) & c_b(n) \end{bmatrix}
\begin{bmatrix} x_r(n) \\ x_g(n) \\ x_b(n) \end{bmatrix} \\
&= \begin{bmatrix} c_r(n) & c_g(n) & c_b(n) \end{bmatrix}
\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}
\begin{bmatrix} x_r(n) \\ x_g(n) \\ x_b(n) \end{bmatrix} \\
&= \begin{bmatrix} c_r(n) & \gamma & c_b(n) \end{bmatrix}
\begin{bmatrix} x_\alpha(n) \\ x_g(n) \\ x_\beta(n) \end{bmatrix},
\end{aligned}
$$

where the difference images $x_\alpha = x_r - x_g$ and $x_\beta = x_b - x_g$ can be taken as a proxy for the chrominance component of $x$, while $x_g$ is similar to the luminance of $x$. The advantage of the $\{x_\alpha, x_g, x_\beta\}$ representation is that $x_\alpha$ and $x_\beta$ enjoy rapid spectral decay, while $x_g$ embodies image features such as edges and textures [6]. See Figure 1.

Fig. 1. Log-magnitude spectra of color channels of the "bike" image: (a) $\mathcal{F}x_g$, (b) $\mathcal{F}x_\alpha$, (c) $\mathcal{F}x_\beta$.

Recall that point-wise multiplication in space is equivalent to convolution in the Fourier domain. Suppose that $c_r$ and $c_b$ are finite sums of pure sinusoids (true by default for periodic CFAs):

$$\mathcal{F}c_r(\omega) = \sum_{j=0}^{J} s_j\, \delta(\omega - \lambda_j), \qquad \mathcal{F}c_b(\omega) = \sum_{j=0}^{J} t_j\, \delta(\omega - \lambda_j),$$

where $\mathcal{F}$ represents a two-dimensional Fourier transform, $\omega \in (\mathbb{R}/2\pi)^2$ is the radial frequency index, and $\delta$ is a Dirac function. Here, the carrier frequencies $\lambda_j \in (\mathbb{R}/2\pi)^2$ and weights $s_j, t_j \in \mathbb{C}$ are parameters determined fully by the choice of CFA pattern [4, 6]. Thus the Fourier transform of $c^T x$ is a linear combination of the baseline signal ($\mathcal{F}x_g$) and the "modulated" versions of the difference images ($\mathcal{F}x_\alpha$ and $\mathcal{F}x_\beta$):

$$\mathcal{F}\{c^T x\}(\omega) = \gamma\, \mathcal{F}x_g(\omega) + \sum_{j=0}^{J} \{s_j \mathcal{F}x_\alpha + t_j \mathcal{F}x_\beta\}(\omega - \lambda_j).$$

For the time being, let $h(n,m) \approx \hat{h}(m)$ (that is, the cross-talk effect is spatially-invariant) and $\hat{H} = \mathcal{F}\hat{h}$. Then,

$$
\begin{aligned}
\mathcal{F}y(\omega) &= \hat{H}(\omega)\, \mathcal{F}\{c^T x\}(\omega) \\
&= \hat{H}(\omega) \left[ \gamma\, \mathcal{F}x_g(\omega) + \sum_{j=0}^{J} \{s_j \mathcal{F}x_\alpha + t_j \mathcal{F}x_\beta\}(\omega - \lambda_j) \right] \\
&= \gamma\, \hat{H}(\omega)\, \mathcal{F}x_g(\omega) + \sum_{j=0}^{J} \hat{H}(\omega) \{s_j \mathcal{F}x_\alpha + t_j \mathcal{F}x_\beta\}(\omega - \lambda_j) \\
&\approx \gamma\, \hat{H}(\omega)\, \mathcal{F}x_g(\omega) + \sum_{j=0}^{J} \hat{H}(\lambda_j)\, \mathcal{F}\{s_j x_\alpha + t_j x_\beta\}(\omega - \lambda_j), \qquad (1)
\end{aligned}
$$

where the approximation in the last step is justified owing to the smoothness of $\hat{H}$ and the bandlimitedness of $\mathcal{F}x_\alpha$ and $\mathcal{F}x_\beta$.

The first term in (1) denotes the blurring that occurs as a result of spatial averaging in $\hat{h}$; the consequence of this low-pass filtering is usually negligible compared to the optical blurring (i.e. out of focus lens). However, the more noticeable artifact is "desaturation": this is evidenced by the attenuation of the modulated difference images $\mathcal{F}\{s_j x_\alpha + t_j x_\beta\}$ by the factor of $\hat{H}(\lambda_j)$. As a result, the reconstructed image often appears less colorful; see Figure 2. Because $\hat{h}$ is a lowpass filter, higher modulation terms suffer from an increased level of desaturation. As the spatially-variant impulse response $h$ varies very smoothly over $n$, we make no further attempt to distinguish between $h$ and $\hat{h}$ other than that the attenuation term is now a function of pixel location $n$:

$$H_n(\lambda_j) \doteq (2\pi)^{-1} \sum_m h(n,m)\, e^{-i \lambda_j^T m}.$$
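To make the attenuation factor concrete, the sketch below evaluates the frequency response of the exponential kernel $h(m) \propto e^{-\|\tau m\|/\kappa}$ at a few carrier frequencies. It assumes a spatially-invariant kernel, a truncation radius of 8 pixels, and a kernel normalized to unit sum (so the response at zero frequency is 1, which absorbs the $(2\pi)^{-1}$ constant). The values $\tau = 5$ and $\kappa = 2$ are those quoted in the experiments; the carrier frequencies and function names are illustrative choices.

```python
import numpy as np

def exp_kernel(radius, tau, kappa):
    """Truncated kernel h(m) proportional to exp(-||tau*m||/kappa), with unit sum."""
    r = np.arange(-radius, radius + 1)
    m1, m2 = np.meshgrid(r, r, indexing="ij")
    h = np.exp(-tau * np.hypot(m1, m2) / kappa)
    return m1, m2, h / h.sum()

def attenuation(lam, m1, m2, h):
    """H(lam) = sum_m h(m) exp(-i lam^T m); a unit-sum h gives H(0) = 1."""
    phase = np.exp(-1j * (lam[0] * m1 + lam[1] * m2))
    return np.sum(h * phase)

# Truncation radius is an assumption; the tail beyond it is negligible here.
m1, m2, h = exp_kernel(radius=8, tau=5.0, kappa=2.0)

# Illustrative carrier frequencies, including those of a Bayer-type pattern.
carriers = {"dc": (0.0, 0.0), "(pi, 0)": (np.pi, 0.0), "(pi, pi)": (np.pi, np.pi)}

# The kernel is symmetric, so the response is real up to rounding error.
gains = {name: attenuation(lam, m1, m2, h).real for name, lam in carriers.items()}
```

Under these assumptions the gain decreases as the carrier moves away from the origin, which is the low-pass behavior that the text identifies as the source of desaturation.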

4. CROSS-TALK COLOR CORRECTION

The role of the correction scheme needed to improve the color fidelity is to cancel the effects of the attenuation $H_n(\lambda_j)$. One naive approach to accomplishing this is "de-convolution," that is, rescaling $\mathcal{F}y$ by $H_n^{-1}$, but it is unattractive because of its numerical instabilities (especially in the presence of noise) and high computational costs. Moreover, the regularization terms employed in existing "de-blurring" methods are tuned to enhance image features and are not intended for the subsampled color image data that we deal with in a single-sensor color imaging device. Instead, we focus on the color fidelity after or in conjunction with demosaicking, a process of reconstructing a spatially undersampled vector field whose components correspond to particular color tristimulus values. To this end, suppose we re-write (1) in the spatial domain as follows:

$$
y(n) = \begin{bmatrix} \cdots, & H_n(\lambda_j)\, e^{i \lambda_j^T n}, & \cdots \end{bmatrix}
\begin{bmatrix} s_0 & \gamma & t_0 \\ s_1 & 0 & t_1 \\ \vdots & \vdots & \vdots \\ s_J & 0 & t_J \end{bmatrix}
\begin{bmatrix} x_\alpha(n) \\ x_g^h(n) \\ x_\beta(n) \end{bmatrix},
$$

where $x_g^h(n) = \sum_m h(n,m)\, x_g(n-m)$ is a slightly blurred green image, and, without loss of generality, let $\lambda_0 = [0, 0]^T$ and $H_n(\lambda_0) = 1$. Though the overwhelming majority of state-of-the-art demosaicking methods are nonlinear, they are often conditionally linear (examples include directional filtering, data-driven kernels, etc.) and can be re-written as a spatially-variant convolution. Let $f : \mathbb{Z}^2 \times \mathbb{Z}^2 \to \mathbb{R}^3$ be the impulse response of the demosaicking kernel, where $F_n : (\mathbb{R}/2\pi)^2 \to \mathbb{R}^3$ and

$$F_n(\omega) \doteq (2\pi)^{-1} \sum_m f(n,m)\, e^{-i \omega^T m}.$$

Then, the reconstructed color image (according to the chosen demosaicking method) is:

$$
\begin{aligned}
\hat{x}(n) &= \sum_m f(n,m)\, y(n-m) \\
&= \sum_m f(n,m) \begin{bmatrix} \cdots, & H_{n-m}(\lambda_j)\, e^{i \lambda_j^T (n-m)}, & \cdots \end{bmatrix}
\begin{bmatrix} s_0 & \gamma & t_0 \\ s_1 & 0 & t_1 \\ \vdots & \vdots & \vdots \\ s_J & 0 & t_J \end{bmatrix}
\begin{bmatrix} x_\alpha(n-m) \\ x_g^h(n-m) \\ x_\beta(n-m) \end{bmatrix} \\
&\approx \underbrace{\begin{bmatrix} \cdots, & F_n(\lambda_j) H_n(\lambda_j), & \cdots \end{bmatrix}
\begin{bmatrix} s_0 & \gamma & t_0 \\ s_1 & 0 & t_1 \\ \vdots & \vdots & \vdots \\ s_J & 0 & t_J \end{bmatrix}}_{M_n}
\begin{bmatrix} x_\alpha(n) \\ x_g^h(n) \\ x_\beta(n) \end{bmatrix}, \qquad (2)
\end{aligned}
$$

where the approximation in the last step is two-fold:

$$\sum_m \gamma f(n,m)\, x_g^h(n-m) \approx x_g^h(n) \qquad (3)$$

$$\sum_m f(n,m)\, H_{n-m}(\lambda_j)\, e^{i \lambda_j^T (n-m)}\, x_\alpha(n-m) \approx F_n(\lambda_j)\, H_n(\lambda_j)\, x_\alpha(n) \qquad (4)$$

The approximation in (3) stems from the fact that one key goal of demosaicking is precisely to achieve $\sum_m \gamma f(n,m)\, x_g(n-m) \approx x_g(n)$; the existence of blurring due to $h$ would in fact make this task even easier. The approximation in (4), which applies to $x_\beta$ as well, is justified owing to the smoothness of $H_n$ and the bandlimitedness of the difference images.

Besides the negligible blur introduced by $h$, the matrix $M_n \in \mathbb{R}^{3 \times 3}$ as indicated in (2) represents the desaturation as manifested by the cross-talk contamination; it is full-rank (unless $H_n(\lambda_j) = 0$ for some $j$) and off-line computable. Thus, the cross-talk color correction scheme is a simple pixel-wise matrix inversion,

$$\hat{x}_{CT}(n) = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} M_n^{-1}\, \hat{x}(n).$$

It can be performed in conjunction with (by combining $M_n^{-1}$ with $f$) or after (by retaining $F(\lambda_j)$) the demosaicking step. In many existing demosaicking methods, the kernel $f$ is predetermined and thus the matrix $M_n$ is easy to (pre-)compute. Take, for example, the directional filtering strategy: $f$ is limited to a few choices (e.g. "vertical" and "horizontal" interpolation kernels), considerably reducing the complexity of the proposed cross-talk color correction for real-time processing. Due to page constraints, we reserve the implementational details of this important special case to future publications.

5. EXPERIMENTAL RESULTS AND DISCUSSIONS

5.1. Experimental Results

For our experimental verification, a spatially-invariant cross-talk kernel is used: $h(n,m) = (\tau/2\kappa) \exp(-\|\tau m\|/\kappa)$, $\tau = 5$, $\kappa = 2$. The particular choice is a moot point since $h$ is assumed to be known a priori. We work with the CFA patterns in [1–4] and the demosaicking methods in [4] (linear) and [9] (nonlinear) for the Bayer pattern case. No comparisons to alternative color correction schemes are offered because there are no existing cross-talk correction methods that the author is aware of. The zoomed portions of example processed images are shown in Figure 2 and the mean square errors of reconstructions are reported in Table 1. Though the smoothing in the output images of the demosaicking step (Figure 2(a-d)) is hardly noticeable, the colors appear desaturated to varying degrees ([4] is most severe and [1] is least affected), and in all cases color desaturation dominates the reconstruction error. However, the proposed color correction scheme restores the desired color, and the output from [4] now outperforms [1]. See Figure 2(e-h).

Table 1. Performance comparison for various CFA patterns and reconstruction methods tested on the 20 Kodak test set images of [8] using nonlinear [9] and linear [4] demosaicking methods. Summary MSE statistics are expressed as "before/after" color correction.

| Summary MSE stat | Nonlinear, CFA [1] | Linear, CFA [1] | Linear, CFA [2] | Linear, CFA [3] | Linear, CFA [4] |
|---|---|---|---|---|---|
| mean | 59.0/19.1 | 63.1/25.3 | 76.5/25.1 | 68.5/24.7 | 97.1/18.1 |
| median | 52.7/15.1 | 54.9/20.6 | 61.5/20.6 | 58.3/20.2 | 72.8/14.8 |
| min | 17.3/6.3 | 19.3/11.8 | 22.4/9.6 | 21.4/11.3 | 25.2/6.4 |
| max | 244.6/59.2 | 253.4/68.4 | 386.2/79.1 | 317.4/60.5 | 564.0/53.0 |
| std. dev. | 47.1/13.7 | 47.8/15.0 | 74.3/17.0 | 60.4/13.5 | 110.5/12.0 |
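The pixel-wise correction of Section 4 can be sketched for a single pixel as follows. This is only an illustration under strong assumptions: the Bayer layout (red on even rows and odd columns, blue on odd rows and even columns), the attenuation values in `H`, and the demosaicking responses `F` are all hypothetical. In particular, `F` is a stand-in built with a pseudo-inverse so that reconstruction is exact when there is no cross-talk; it does not represent any specific demosaicking method from the references.

```python
import numpy as np

# Carriers of a Bayer CFA: (0,0), (pi,0), (0,pi), (pi,pi).
# Assumed layout: red on even rows / odd columns, blue on odd rows / even columns.
carriers = np.array([[0, 0], [np.pi, 0], [0, np.pi], [np.pi, np.pi]])
s = np.array([1, 1, -1, -1]) / 4.0   # c_r(n) = sum_j s_j exp(i lambda_j^T n)
t = np.array([1, -1, 1, -1]) / 4.0   # c_b(n) = sum_j t_j exp(i lambda_j^T n)
gamma = 1.0                          # c_r + c_g + c_b for a pure-color CFA

# (J+1) x 3 mixing matrix with rows [s_j, gamma or 0, t_j].
S = np.column_stack([s, [gamma, 0, 0, 0], t])

# Maps [x_alpha, x_g, x_beta] back to [x_r, x_g, x_b].
B_inv = np.array([[1.0, 1, 0], [0, 1, 0], [0, 1, 1]])

# Hypothetical demosaicking responses F(lambda_j), one column per carrier,
# chosen so that F @ S = B_inv (exact reconstruction without cross-talk).
F = B_inv @ np.linalg.pinv(S)

# Illustrative attenuation values H(lambda_j), with H(lambda_0) = 1.
H = np.array([1.0, 0.60, 0.60, 0.52])

M = (F * H) @ S                      # 3 x 3 desaturation matrix

x = np.array([0.8, 0.3, 0.2])        # true RGB value at one pixel
z = np.array([x[0] - x[1], x[1], x[2] - x[1]])   # [x_alpha, x_g, x_beta]
x_hat = M @ z                        # desaturated demosaicking output
x_ct = B_inv @ np.linalg.solve(M, x_hat)         # corrected pixel
```

In this idealized setting the correction recovers the original pixel exactly, because the model ignores noise, aliasing, and the approximations in (3) and (4). Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse, although for a fixed kernel the inverse could equally be precomputed once.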

5.2. Discussions

In addition to the experimental evaluations, (1) and (2) provide insights into the numerical stability of color filter arrays. Color filter arrays with higher carrier frequencies (greater $\|\lambda_j\|$), though more robust for demosaicking, are likely to suffer more from the cross-talk phenomenon (smaller $H_n(\lambda_j)$), as evidenced by Figure 2 and Table 1. The severity of the desaturation artifact has far less bearing on the effectiveness of the proposed color correction scheme, however. Indeed, CFA sampling can be viewed as spatial-frequency multiplexing, where CFA demosaicking is then a demultiplexing problem to recover subcarriers, with spectral overlap given the interpretation of aliasing [7]. The cross-talk contamination attenuates both $\mathcal{F}x_g(\omega)$ and $\mathcal{F}\{s_j x_\alpha + t_j x_\beta\}(\omega - \lambda_j)$ by an equal amount ($H_n(\lambda_j)$), and their ratio remains fixed even after the proposed color correction scheme, which has the effect of boosting the baseline and the subcarriers by $H_n^{-1}$. In net, the contribution of aliasing to the overall image quality is independent of cross-talk contamination; the color correction scheme remains numerically stable even if the L2 matrix norm of $M_n$ is small, as shown by the last column of Table 1.

Fig. 2. "Bike" image for various CFA patterns. (a-d) Output from the demosaicking step (L = linear; NL = nonlinear): (a) [1]+NL, (b) [2]+L, (c) [3]+L, (d) [4]+L. (e-h) After color correction: (e) [1]+NL, (f) [2]+L, (g) [3]+L, (h) [4]+L.

The interaction between cross-talk and noise is complex, and a complete analysis requires considering optical diffraction and minority carrier diffusion separately. Recall that the number of photons encountered during a spatio-temporal integration is a Poisson process. Although the optical diffraction takes place before the charge collection, the Poisson process in the sensor measurements is no longer spatially independent owing to the minority carrier diffusion, which couples the neighboring pixel sensor values after the photocurrent is generated [16]. Due to page constraints, we reserve the analysis of cross-talk, noise, and the proposed color correction scheme to future publications.

Finally, our analysis clarifies a common misunderstanding about cross-talk. Putting aside manufacturing variabilities, it is often claimed that a color filter array with a fixed number of neighbors corresponding to each color filter type is more robust to cross-talk contamination [3]. It is apparent from (1), however, that there is no evidence to support the advantages of this arrangement.

6. SUMMARY

With the aid of spatio-spectral sampling theories, we analyzed the cross-talk phenomenon as the coding of chrominance data embedded in the sensor measurements.
Due to the bandlimitedness of the chrominance images, the desaturation artifacts are characterized as the attenuation of the modulated signals by the frequency response of the cross-talk kernel at the carrier frequencies. The proposed method to correct the cross-talk contamination follows directly from the interplay between cross-talk and demosaicking, and can be reduced to a pixel-wise matrix operation. The brilliance of the color (saturation) is restored after an inverse matrix operation, as evidenced by our numerical experiments.

7. REFERENCES

[1] B. E. Bayer, "Color imaging array," US Patent 3 971 065, 1976.

[2] S. Yamanaka, "Solid state color camera," US Patent 4 054 906, 1977.

[3] R. Lukac and K. N. Plataniotis, "Color filter arrays: Design and performance analysis," IEEE Transactions on Consumer Electronics, vol. 51, pp. 1260–1267, 2005.

[4] K. Hirakawa and P. J. Wolfe, "Spatio-spectral color filter array design for enhanced image fidelity," in Proceedings of the IEEE International Conference on Image Processing, 2007, vol. 2, pp. 81–84. Extended version submitted to IEEE Transactions on Image Processing, October 2007.

[5] M. Parmar and S. J. Reeves, "A perceptually based design methodology for color filter arrays," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004, vol. 3, pp. 473–476.

[6] D. Alleysson, S. Süsstrunk, and J. Hérault, "Linear demosaicing inspired by the human visual system," IEEE Transactions on Image Processing, vol. 14, pp. 439–449, 2005.

[7] E. Dubois, "Filter design for adaptive frequency-domain Bayer demosaicking," in Proceedings of the IEEE International Conference on Image Processing, 2006, pp. 2705–2708.

[8] B. K. Gunturk, J. Glotzbach, Y. Altunbasak, R. W. Schafer, and R. M. Mersereau, "Demosaicking: Color filter array interpolation in single chip digital cameras," IEEE Signal Processing Magazine, vol. 22, pp. 44–54, 2005.

[9] K. Hirakawa and T. W. Parks, "Adaptive homogeneity-directed demosaicing algorithm," IEEE Transactions on Image Processing, vol. 14, pp. 360–369, 2005.

[10] R. Lukac and K. N. Plataniotis, "Single-sensor camera image processing," in Color Image Processing: Methods and Applications, R. Lukac and K. N. Plataniotis, Eds., pp. 363–392. CRC Press, Boca Raton, FL, 2006.

[11] K. Hirakawa and T. W. Parks, "Joint demosaicing and denoising," IEEE Transactions on Image Processing, vol. 15, pp. 2146–2157, 2006.

[12] N. Zhang and X. Wu, "Lossless compression of color mosaic images," IEEE Transactions on Image Processing, vol. 15, no. 6, pp. 1379–1388, June 2006.

[13] T. Kijima, H. Nakamura, J. Compton, and J. Hamilton, "Image sensor with improved light sensitivity," US Patent 20 070 177 236, 2007.

[14] G. Agranov, V. Berezin, and R. H. Tsai, "Crosstalk and microlens study in a color CMOS image sensor," IEEE Trans. Electron Devices, vol. 50, no. 1, 2003.

[15] H. Rhodes, G. Agranov, C. Hong, U. Boettiger, R. Mauritzson, J. Ladd, I. Karasev, J. McKee, E. Jenkins, W. Quinlin, I. Patrick, J. Li, X. Fan, R. Panicacci, S. Smith, C. Mouli, and J. Bruce, "CMOS imager technology shrinks and image performance," in IEEE Microelectronics and Electron Devices, 2004.

[16] I. Shcherback, T. Danov, and O. Yadid-Pecht, "A comprehensive CMOS APS crosstalk study: Photoresponse model, technology, and design trends," IEEE Trans. Electron Devices, vol. 51, no. 21, 2004.