PERCEPTUAL COMPRESSIVE SENSING FOR IMAGE SIGNALS

Yi Yang, Oscar C. Au, Lu Fang, Xing Wen and Weiran Tang
Department of Electronic and Computer Engineering
The Hong Kong University of Science and Technology
Clear Water Bay, Kowloon, Hong Kong, China
email: {yyang, eeau, fanglu, wxxab, tangweir}@ust.hk

ABSTRACT

Human eyes have different sensitivity to different frequency components of image signals; typically, low frequency components are more crucial to the perceptual quality of images than high frequency components. Based on this observation, we propose a novel sampling scheme for the compressive sensing framework by designing a weighting scheme for the sampling matrix. By adjusting the weighting coefficients, we can tune the structure of the sampling matrix to favor the frequency components that are important to human perception, so that those components can be recovered more precisely in the reconstruction procedure. Experimental results show that the proposed scheme greatly enhances the performance of the compressive sensing framework in both PSNR and visual quality, without increasing the complexity of the framework structure or the computational procedure.

The idea of compressive sensing is to measure the signal in a random manner at a very low sampling rate. It is based on the hypothesis that we have no prior knowledge about the signal structure, which, on the other hand, is actually an advantage of compressive sensing. This hypothesis holds for most real-life signals. However, for some specific types of signals, images for example, we do know some characteristics of the signal structure, such as the distribution of frequency components, and it would be beneficial to utilize such information in the sampling scheme. In this paper, we propose a perceptual sampling scheme for image signals in the compressive sensing framework. The scheme works on a block-based processing mechanism: the image is first divided into small macro-blocks, and a weighting matrix is then applied to the sampling matrix to put different emphasis on different frequency components according to the characteristics of the human perception system. The main advantages of our proposed method are: a) it is simple to implement and completely compatible with existing CS systems; b) it fully utilizes the signal characteristics while keeping the convenience of random sampling; c) it greatly enhances the performance of the existing CS framework with nearly no increase in complexity. The rest of the paper is organized as follows. Section 2 gives a brief review of the compressive sensing framework. Section 3 describes the characteristics of the human perception mechanism and our proposed sampling scheme. Section 4 presents the simulation results, and finally in Section 5 we draw our conclusion.

Index Terms— Compressive sensing, human perception, ℓ1 minimization

1. INTRODUCTION

In a conventional digital image sampling and compression system, natural image signals are sampled according to Shannon sampling theory and quantized into discrete digital numbers. A transform-based image codec such as JPEG or JPEG 2000 is then applied to compress the signal. As a result, only a few of the most important transform coefficients are kept and transmitted, while most of the other sampled data are discarded. This is a huge waste of time, storage space and computation power, especially for devices with limited power. A natural question arising is whether it is possible to directly acquire samples whose number is near the number we actually want (the number of coefficients coded in conventional codecs). Advances in information theory and digital signal processing in the past few years have brought a positive answer to this question. Compressive Sensing (CS, also known as compressed sampling) is a novel technique built upon the pathbreaking work by Candes et al. [1] and Donoho [2] that allows us to directly acquire condensed data with no or little information loss. It was shown that we can precisely reconstruct a highly condensed signal simply by linear programming, as long as the original signal is sparse. The CS principles provide the potential to dramatically reduce data size, power consumption, computation complexity and transmission bandwidth in digital data sampling systems. The compressive sensing framework consists of two major parts, the sampling procedure and the reconstruction method, and current research is mainly focused on three aspects:

• Finding sparse dictionaries that give sparser representations of different types of signals.
• Designing well-structured sampling matrices that sufficiently capture the signal structures.
• Developing efficient reconstruction methods to enhance the precision and speed of decoding.

978-1-4244-4291-1/09/$25.00 ©2009 IEEE (ICME 2009)
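As a side note to the introduction's observation that transform codecs keep only a few coefficients, the energy compaction of a smooth image block under the DCT can be illustrated in a few lines of NumPy. This is an illustrative sketch only; the ramp block below is a stand-in for real image data:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal 1D DCT-II basis matrix (rows are basis vectors).
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    D[0] /= np.sqrt(2.0)
    return D

D = dct_matrix(8)
block = np.add.outer(np.arange(8.0), np.arange(8.0))  # smooth 8x8 stand-in block
coeffs = D @ block @ D.T                              # 2D DCT of the block
energy = coeffs ** 2
# Fraction of the block energy carried by the DC coefficient alone:
print(energy[0, 0] / energy.sum())                    # ~0.82 for this smooth block
```

Keeping only a handful of low-frequency coefficients therefore preserves most of the signal energy; this is the redundancy that conventional codecs exploit and that the proposed scheme takes advantage of.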

2. COMPRESSIVE SAMPLING FRAMEWORK OVERVIEW

Consider a real-valued, finite-length, discrete-time signal x of dimension N. The compressive sensing sampling process takes linear, non-adaptive measurements of x through a dimension-reducing projection [3], which can be described as

y = Φx    (1)

where y is the vector of sampled measurements with dimension n (n ≪ N), and Φ is an n × N sampling matrix. Real-world signals (e.g. images, audio signals) are usually not perfectly sparse in the time/space domain, but they may be sparsely represented in a certain transform domain Ψ (e.g. DCT or wavelet). The above sampling problem can therefore be described in the more general form [3]:

y = ΦΨs = Θs    (2)

where Θ = ΦΨ is an n × N matrix. Since n ≪ N, the problem of reconstructing x from y is ill-conditioned. However, compressive sensing theory states that as long as x is sparse in some domain, the original signal can be reconstructed exactly from the condensed measurements. It was proved in [1][2] that when Φ and Ψ are incoherent and their product Θ satisfies the Restricted Isometry Property (RIP) [1], x can be well reconstructed from n = O(c log N) measurements [3], where c is a constant related to the sparsity of the original signal. The reconstruction can be accomplished by solving a simple linear programming problem [3]:

min ‖s̃‖₁   s.t.   y = Θs̃    (3)

A commonly used sampling matrix is the i.i.d. Gaussian matrix, whose entries are outcomes of i.i.d. Gaussian random variables [1][2]. It is universal in the sense that the product of an i.i.d. Gaussian matrix and a transform matrix, Θ = ΦΨ, is also i.i.d. Gaussian and thus satisfies the RIP with high probability regardless of the choice of transform basis Ψ. However, due to the structurelessness of the i.i.d. Gaussian matrix, the reconstruction process can be rather slow [4], especially for image signals. To accelerate recovery, other sampling operators (as described in [5]) have been proposed, including the partial random Fourier ensemble [5] and the scrambled block Hadamard ensemble [4]. Nevertheless, none of the aforementioned work takes the structure of image signals in the transform domain or the characteristics of human perception into consideration. In the following section, we propose our human-perception-based compressive sensing sampling scheme.

3. PERCEPTUAL COMPRESSIVE SENSING

3.1. Human Perception Characteristics

As is widely known, human eyes exhibit a masking effect on some parts of the frequency spectrum of image signals. Our eyes behave like low-pass filters [6], which makes them more sensitive to low frequency components than to high frequency ones. Based on this observation, the JPEG still image compression standard [7] designed a quantization table to suppress the frequency components to which human eyes are less sensitive. The entries of the table are based on the Just-Noticeable Distortion (JND) profile, which states that human eyes cannot sense changes in an image below the JND threshold [8]. By quantizing the DCT coefficients accordingly, JPEG achieves a significant compression ratio without a notable decrease in visual quality. In conventional compressive sensing, all components of image signals are treated equally; therefore, a high ratio of measurements (usually 30% ∼ 50%, depending on the transform used) is needed to recover the original images well, due to the imperfect sparsity of the original signals. This is neither efficient nor economical, so it is highly desirable to develop a sampling scheme based on human perception.

3.2. Proposed Sampling Scheme

Usually, the size of a natural image is considerably large, and it would be very expensive in memory and computation power to process the whole image at one time. We therefore apply a block-based compressive sensing framework, which is a more efficient choice [9]. Consider a natural image of dimension M × N. In block-based compressive sensing, we divide the whole image into macro-blocks of size nB × nB. For each block with index i, suppose the number of measurements to be taken is m; define the sampling rate Rs as the ratio of the number of measurements to the original signal size, i.e. Rs = m/nB². First, the target block is transformed to obtain a sparse-like representation; then the transform coefficients are vectorized into a 1D sequence in a certain scanning order before being sampled. The sampling process for block i can therefore be represented as:

yi = ΦB VB(T(xi))    (4)

where VB(·) is the vectorizing operator, yi is the sampled measurement sequence with length m = Rs nB², ΦB is the sampling matrix with dimension m × nB², and T(·) is the 2D transform operator. In this paper, an i.i.d. Gaussian matrix [5] and the 2D DCT are employed as the sampling matrix and transform basis, respectively. Define SB(·) = ΦB VB(T(·)) as the sampling operator. Then the equivalent sampling matrix for the whole image in (2) can be written as a block-diagonal matrix consisting of SB:

Θ = diag(SB, SB, …, SB)    (5)

For each row φi of the sampling matrix, every element corresponds to a certain frequency component of the transformed signal. In conventional CS, each measurement is a non-adaptive random linear combination of all the frequency components, equally weighted. To adjust the structure of the sampling matrix, we put an additional weight on each entry of the row:

φ̃i = [wi1 φi1  wi2 φi2  ···  win φin]    (6)

In matrix form:

φ̃i = φi ∘ ω    (7)

where ω = [w1 w2 ··· wn] and ∘ denotes element-wise multiplication. Define the weighting matrix Ω = [ωᵀ ωᵀ ··· ωᵀ]ᵀ, i.e. each of the m rows of Ω equals ω; then (4) is revised as

yi = (1/α) Φ̃B VB(T(xi)) = (1/α) (ΦB ∘ Ω) VB(T(xi))    (8)

where α is a normalizing factor that preserves the RIP of the modified sampling matrix; typically, we can take α = ‖Φ̃B‖₂. Define

s = VB(T(xi))    (9)

as the vectorized signal in the transform domain. The corresponding reconstruction LP problem for one block becomes

min ‖s̃‖₁   s.t.   yi = (1/α) Φ̃B s̃    (10)

where s̃ is the estimate of s. By adjusting the values of the weighting coefficients, we can make the sampling matrix favor the frequency components of interest. Besides the sampling rate and the weighting coefficients, the signal length, i.e. the block size in the block-based framework, also influences the reconstruction result; neither too small nor too large a block is a good choice. According to our empirical data, a 16 × 16 block is a proper size for an image of size 512 × 512 or larger. Typically, the weighting matrix is determined through a series of experiments according to the JND profile for a specific transform basis. For the 2D DCT, however, a convenient way is to derive the weighting coefficients from the JPEG quantization table by taking the inverse of the table entries and adjusting their amplitudes to a proper range. Based on the JPEG quantization table and our experiments, we design a weighting coefficient curve ω for a 16 × 16 block as shown in Fig. 1.¹ The weighting curve for other block sizes can be obtained by sub-sampling or interpolating this curve.
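To make the block sampling pipeline of (4)-(8) concrete, the following NumPy sketch implements T(·) as a 2D DCT, VB(·) as a zig-zag scan, and the weighted measurement (ΦB ∘ Ω)/α with spectral-norm normalization. The exponentially decaying weight vector is only an illustrative stand-in for the curve in Fig. 1, whose exact values are not reproduced here:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal 1D DCT-II basis matrix.
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    D[0] /= np.sqrt(2.0)
    return D

def zigzag_order(n):
    # JPEG-style zig-zag scan order over an n x n coefficient block.
    idx = [(i, j) for i in range(n) for j in range(n)]
    return sorted(idx, key=lambda p: (p[0] + p[1],
                                      p[0] if (p[0] + p[1]) % 2 else -p[0]))

def sample_block(x_block, Phi, w, order):
    # y_i = (1/alpha) (Phi_B ∘ Ω) V_B(T(x_i)), as in eqs. (4) and (8).
    n = x_block.shape[0]
    D = dct_matrix(n)
    coeffs = D @ x_block @ D.T                        # T(.): 2D DCT
    s = np.array([coeffs[i, j] for i, j in order])    # V_B(.): zig-zag vectorize
    Phi_w = Phi * w[None, :]                          # each row weighted by ω
    alpha = np.linalg.norm(Phi_w, 2)                  # spectral-norm normalization
    return Phi_w @ s / alpha

rng = np.random.default_rng(0)
nB, Rs = 16, 0.3
m = int(Rs * nB * nB)                                 # measurements per block
Phi = rng.standard_normal((m, nB * nB))               # i.i.d. Gaussian sampling matrix
w = 2.5 * np.exp(-np.arange(nB * nB) / 40.0) + 0.1    # illustrative weight curve
y = sample_block(rng.standard_normal((nB, nB)), Phi, w, zigzag_order(nB))
```

The random block here stands in for a real image macro-block; in the paper's setting the same operator is applied independently to every 16 × 16 block of the image.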

Fig. 2. Geometric illustration of the proposed sampling scheme: the projection y′ onto the null space H̃ of the weighted sampling matrix is much closer to the original signal than the projection onto the null space H of the un-weighted sampling matrix.

Fig. 1. Weight Curve (weighting coefficient, 0 to 2.5, versus frequency index, 0 to 255)

3.3. Geometric Explanation

Due to the length limitation of this paper, we cannot present a complete proof of validity for the proposed sampling scheme here; instead, we give a brief geometric explanation with illustration. Consider an N-dimensional signal space Rᴺ. A K-sparse² signal x in this space lies on a K-dimensional hyperplane aligned with the coordinate axes, as shown in Fig. 2, while the sampling matrix defines a null space H. A CS measurement sequence y is a projection of x from the high-dimensional space onto the null space H. Usually, for a conventional sampling matrix (e.g. an i.i.d. Gaussian matrix), the null space H is oriented in a random direction with equal probability; in other words, it favors no direction, and information from different frequency components is collected in approximately equal amounts (see y in Fig. 2). As a result, the reconstruction error is distributed equally over all frequency components. For a weighted random matrix, however, some coordinates of the null space are amplified while others are shrunk, so the null space is adjusted to favor a certain direction. In our case, the null space is weighted to approach the axes corresponding to the low frequency components; see H̃ in Fig. 2. H̃ lies in the area close to x with very high probability. Therefore, information from the low frequency components plays a major role in the measurements, which greatly affects the error distribution in the reconstruction procedure. Intuitively, since the measurements carry more information about the low frequency components, the reconstruction error for those components will be much lower than for the other components. This adjustment of the error distribution does not decrease the reconstruction quality. On one hand, the high frequency components that absorb more distortion are less sensitive to human eyes, so the visual quality is not greatly affected; on the other hand, high frequency parts inherently have lower energy than low frequency ones in the spectrum, so the error amplitudes remain low even though these components take more distortion. Conversely, since the low frequency components, to which human eyes are sensitive, are recovered more precisely, the visual quality is greatly enhanced compared with the conventional un-weighted sampling scheme.
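The geometric argument can be checked numerically: weighting rescales each coordinate of the measurement operator, so the columns of the weighted matrix that correspond to low frequencies carry much larger norms, i.e. contribute much more to every measurement. A small sketch, again using an illustrative decaying weight vector as a stand-in for the curve in Fig. 1:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 256, 128                                   # 16x16 block, sampling rate 0.5

Phi = rng.standard_normal((m, n))                 # un-weighted i.i.d. Gaussian matrix
w = 2.5 * np.exp(-np.arange(n) / 40.0) + 0.1      # illustrative decaying weights
Phi_w = Phi * w[None, :]                          # weighted matrix, Phi ∘ Ω

# A coordinate's influence on the measurements scales with its column norm.
col = np.linalg.norm(Phi_w, axis=0)
low = col[: n // 4].mean()                        # low-frequency coordinates
high = col[-(n // 4):].mean()                     # high-frequency coordinates
print(low / high)                                 # > 1: low frequencies dominate
```

Under these stand-in weights the low-frequency columns dominate by roughly an order of magnitude, matching the intuition that the null space is tilted toward the low-frequency axes.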

4. EXPERIMENT RESULTS

In our simulation, an i.i.d. Gaussian matrix is employed together with the 2D DCT, and the transformed signal is vectorized with zig-zag scanning. The test data are 512 × 512 8-bit grey level images. Block sizes of 8 × 8 and 16 × 16 are tested: for block sizes smaller than 8 × 8, the signal length is too short for the CS framework to work properly, while for block sizes larger than 16 × 16, the computational complexity is too high for our test bed. The weighting coefficients proposed in Section 3.2 are applied, as illustrated in Fig. 1. The reconstruction is accomplished by solving problem (10). Due to space limitations, we only demonstrate the results with block size 16 × 16 here; for more results, please visit our website http://ihome.ust.hk/˜yyang/. Table 1 tabulates the PSNR performance of our proposed sampling method compared with the normal sampling scheme on three test images with various features at different sampling rates. From the data listed, we can see that the proposed sampling scheme achieves up to 5.1 dB of gain on average over the different sampling rates. Figure 3 illustrates the comparison of visual quality between

¹ Due to space limitations, we only show the curve rather than the values of the coefficients here.
² A signal is K-sparse if only K of its entries are non-zero.


Fig. 3. Simulation results at sampling rate 0.3. (a) Pepper with un-weighted sampling, PSNR = 29.38 dB; (b) Pepper with proposed sampling, PSNR = 34.45 dB; (c) Lenna with un-weighted sampling, PSNR = 28.15 dB; (d) Lenna with proposed sampling, PSNR = 33.69 dB

our proposed method and the normal un-weighted sampling scheme.³ From Fig. 3(a) and Fig. 3(c) we can see that the un-weighted sampling scheme produces obvious blocking and noise-like artifacts, while, as illustrated in Fig. 3(b) and Fig. 3(d), our proposed perceptual sampling scheme achieves very good reconstruction results.⁴

³ We only demonstrate the reconstruction results of Lenna and Pepper at sampling rate 0.3 here due to space limitations.
⁴ You may want to zoom in on the images to see more details.

Table 1. PSNR Performance (in dB)

                      Sample Rate
Image    Scheme      0.2     0.3     0.4     0.5
Lenna    Normal     25.70   28.15   30.32   32.57
         Proposed   30.68   33.69   35.56   37.21
         Gain        4.98    5.54    5.24    4.64
Pepper   Normal     26.48   29.38   31.63   33.55
         Proposed   31.81   34.45   35.95   37.06
         Gain        5.33    5.07    4.32    3.51
Baboon   Normal     20.59   21.89   23.11   24.48
         Proposed   23.23   24.79   26.07   27.54
         Gain        2.64    2.90    2.96    3.06

5. CONCLUSION

In this paper, a novel human-perception-based sampling scheme for image signals is proposed for the compressive sensing framework. By adding weighting coefficients to the sampling matrix, the proposed perceptual sampling scheme emphasizes the perceptually important frequency components in the sampling procedure, so that they can be recovered more precisely in the reconstruction. The simulation results show that the proposed sampling scheme greatly enhances the performance of the compressive sensing framework in both objective and subjective assessments without increasing its complexity.

6. ACKNOWLEDGEMENT

This work has been supported in part by the Innovation and Technology Commission of the Hong Kong Special Administrative Region, China (project no. GHP/048/08).

7. REFERENCES

[1] E.J. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.

[2] D.L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.

[3] E.J. Candes, "Compressive sampling," in Proceedings of the International Congress of Mathematicians, Madrid, Spain, 2006, vol. 3, pp. 1433–1452.

[4] T.T. Do, T.D. Tran, and L. Gan, "Fast compressive sampling with structurally random matrices," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2008, pp. 3369–3372.

[5] E. Candes and J. Romberg, "Practical signal recovery from random projections," IEEE Trans. Signal Processing, 2005.

[6] Jeanny Hérault and Barthélémy Durette, Modeling Visual Perception for Image Processing, Springer Berlin/Heidelberg, 2007.

[7] W.B. Pennebaker and J.L. Mitchell, JPEG Still Image Data Compression Standard, Kluwer Academic Publishers, Norwell, MA, USA, 1992.

[8] N. Jayant, J. Johnston, and R. Safranek, "Signal compression based on models of human perception," Proceedings of the IEEE, vol. 81, no. 10, pp. 1385–1422, 1993.

[9] L. Gan, "Block compressed sensing of natural images," in Proc. International Conference on Digital Signal Processing, 2007, pp. 403–406.
