LOSSY COMPRESSION OF DISTRIBUTED SPARSE SOURCES: A PRACTICAL SCHEME

G. Coluccia¹, E. Magli¹, A. Roumy² and V. Toto-Zarasoa²

¹ Politecnico di Torino, C.so Duca degli Abruzzi 24, 10129 Torino, Italy
² INRIA, Campus Universitaire de Beaulieu, 35042 Rennes-Cedex, France

The research leading to these results has received funding from the EC's FP7 under grant agreement n. 216715 (NEWCOM++).

ABSTRACT

A new lossy compression scheme for distributed and sparse sources under a low-complexity encoding constraint is proposed. This architecture is able to exploit both the intra- and inter-signal correlations typical of signals monitored, for example, by a wireless sensor network. In order to meet the low-complexity constraint, the encoding stage is performed by a lossy distributed compressed sensing (CS) algorithm. The novelty of the scheme consists in the combination of lossy distributed source coding (DSC) and CS. More precisely, we propose a joint CS reconstruction filter, which exploits the knowledge of the side information to improve the quality of both the dequantization and the CS reconstruction steps. The joint use of CS and DSC achieves large bit-rate savings for the same quality with respect to the non-distributed CS scheme, e.g. up to 1.2 bps in the cases considered in this paper. Compared to the DSC scheme (without CS), we observe a gain that increases with the rate for the same mean square error.

1. INTRODUCTION

Lossy compression of sparse but distributed sources consists in finding a cost-constrained representation of inherently sparse sources by exploiting their inter- but also intra-correlation, when communication between the sources is not possible. This problem naturally arises in wireless sensor networks. Indeed, nodes of a sensor network may acquire temperature readings over time. The temperature may vary slowly over time, and hence consecutive readings have similar values. However, the readings also have inter-sensor correlation, as the sensors may be in the same room, in which the temperature is rather uniform. The question hence arises of how to exploit intra- and inter-sensor correlations without communication between the sensors and with a low-complexity acquisition process, in order to save energy at the sensor. Therefore, we consider continuous, correlated, distributed and sparse (in some domain) sources and perform lossy universal compression under a low encoding complexity constraint.

Compressed sensing (CS) [1, 2] has recently emerged as an efficient technique for sampling a signal with fewer coefficients than classic Shannon/Nyquist theory requires. The hypothesis underlying this approach is that the signal to be sampled is sparse, or at least "compressible", i.e., it must have a sparse representation in a convenient basis. In CS, sampling is performed by taking a number of projections of the signal onto pseudorandom sequences. Therefore, the acquisition presents appealing properties such as low encoding complexity, since the basis in which the signal is sparse does not need to be computed, and universality, since the sensing is blind to the source distribution. The universality property makes CS well suited to the compression of distributed and correlated sources, since the same measurement matrix can be used for all sources and the inter-correlation is preserved. On this basis, distributed CS (DCS) was proposed in [3], where it was shown that a distributed system based on CS could save up to 30% of measurements with respect to separate CS encoding/decoding of each source. However, CS [1, 2] and DCS [3] are mainly concerned with the performance of perfect reconstruction, and do not consider the representation/coding problem in a rate-distortion framework, particularly regarding the rate necessary to encode the measurements.

Recently, [4, 5] considered the cost of encoding the random measurements for single sources. More precisely, a rate-distortion analysis was performed, and it was shown that adaptive encoding (which takes into account the source distribution and is therefore complex) outperforms scalar quantization of random measurements. However, in the distributed context, adaptive encoding may lose the inter-correlation between the sources, since it is adapted to the distribution of each single source, or even to the realization of each source (KLT or Burrows-Wheeler Transform). DCS, instead, naturally maintains it. For this reason, we propose a quantized distributed CS architecture. The design requires optimizing the joint reconstruction filter. The main idea is to exploit the knowledge of side information (SI) not only as a way to reduce the encoding rate, but also as an instrument to improve the reconstruction quality. In particular, we exploit knowledge of the SI to improve the accuracy of the dequantized signal, improving the method in [6] through clipping of the reconstructed value within the quantization interval. Moreover, we propose two practical algorithms to estimate the common component between the unknown source and the SI, in order to help the CS reconstruction and allow the transmission of fewer measurement samples.

The construction of optimal quantizers for sparse sources has been addressed in [7], and the rate-distortion behaviour of sparse memoryless sources was described in [8]. Both papers give hints about the application of the proposed methods to a DCS scenario, but they consider the case where the sources are observed in the domain where they are sparse. Here, instead, we consider the more general case of a source not observed in its sparsity basis. On the other hand, the authors of [9] propose a method to exploit the knowledge of SI at the decoder to recover sparse signals. In that case, the SI consists in the (possibly approximate) knowledge of the positions of the nonzero components. This information is used to reduce the computational complexity of the Orthogonal Matching Pursuit algorithm for the recovery of the sparse signal. Instead, the SI we use is a sparse signal correlated with the unknown source through a common component. It therefore contains information not only about the positions of the nonzero components but also about their coefficients, and it is used both in the CS reconstruction step (estimation of the common part between the unknown source and the SI) and in the dequantizer.

2. BACKGROUND

2.1 Compressed sensing

In the standard CS framework, introduced in [2], a signal x ∈ R^{N×1} which has a sparse representation in some basis Ψ ∈ R^{N×N}, i.e.

$x = \Psi\theta, \quad \|\theta\|_{\ell_0} = K, \quad K \ll N,$

can be recovered from a smaller vector y ∈ R^{M×1}, K < M < N, of linear measurements y = Φx, where Φ ∈ R^{M×N} is the sensing matrix. The optimum solution, requiring at least M = K + 1 measurements, would be

$\hat{\theta} = \arg\min_{\theta} \|\theta\|_{\ell_0} \quad \text{s.t.} \quad \Phi\Psi\theta = y.$

Since ℓ0-norm minimization is an NP-hard problem, one can resort to a linear-programming reconstruction by minimizing the ℓ1 norm:

$\hat{\theta} = \arg\min_{\theta} \|\theta\|_{\ell_1} \quad \text{s.t.} \quad \Phi\Psi\theta = y,$

provided that M is large enough (∼ K log(N/K)). When the measurements are noisy, as in the case of quantized data subject to quantization noise, ℓ1 minimization with relaxed constraints is used for reconstruction:

$\hat{\theta} = \arg\min_{\theta} \|\theta\|_{\ell_1} \quad \text{s.t.} \quad \|\Phi\Psi\theta - y\|_{\ell_2} < \varepsilon, \qquad (1)$

where ε bounds the amount of noise in the data.
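To make the reconstruction step concrete, the following is a minimal sketch of (1) in Python using the cvxpy convex-optimization package; the function name cs_reconstruct and the reliance on the default solver are our own illustrative choices, not part of the scheme itself.

import cvxpy as cp

def cs_reconstruct(y, A, eps):
    # Sketch of eq. (1): min ||theta||_1  s.t.  ||A theta - y||_2 <= eps,
    # where A = Phi @ Psi. Illustrative only; any l1 solver could be used.
    theta = cp.Variable(A.shape[1])
    problem = cp.Problem(cp.Minimize(cp.norm1(theta)),
                         [cp.norm2(A @ theta - y) <= eps])
    problem.solve()
    return theta.value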

Extracting the elements of Φ at random from a Rademacher distribution (i.e., ±1 with equal probability) allows correct reconstruction with overwhelming probability.

2.2 Distributed source coding

Distributed source coding (DSC) refers to the problem of compressing correlated sources without cooperation at their encoders. In our scheme, DSC applies after the CS module, which spreads the intra-correlation of the sources among the measurement samples y. Therefore, we assume that the sources compressed with DSC are i.i.d. In the lossless case, Slepian and Wolf [10] showed that the separate encoding of two i.i.d. sources, say Y1 and Y2, does not incur any loss relative to joint encoding (in terms of compression sum rate R1 + R2) provided that decoding is performed jointly. In the asymmetric setup, one source (Y2, for instance) is compressed at its entropy H(Y2) and is hence available at the decoder; the other source (Y1) is compressed at its conditional entropy and can be recovered by exploiting Y2 as "side information". This problem is also referred to as source coding with SI at the decoder. It has been shown that it can be optimally solved by using channel codes matched to the channel that models the correlation between the source and the SI. This setup was extended to lossy compression of general sources by Wyner and Ziv (WZ) [11], who showed that separate encoding incurs a loss relative to joint encoding except for some distributions (Gaussian sources or, more generally, Gaussian correlation noise); practical WZ solutions compress and decompress the data relying on an inner SW codec and an outer quantizer plus reconstruction filter.

3. PROPOSED ALGORITHM

The problem we wish to solve is the following. In a WZ setting, we want to encode continuous, jointly sparse and correlated sources, achieving the smallest distortion for a given rate. We assume that the encoding stage must have low complexity; hence either CS is used to take advantage of intra-sensor correlations, because of its low complexity with respect to transform coding, or the system employs a CS camera that directly outputs the linear measurements. In the latter case, the measurements can be quantized in order to meet a rate (or distortion) constraint. In the former case, however, a new degree of freedom is introduced, since the quantizer can be placed either before or after the sensing module. This issue is discussed in the following. The main ingredients of the proposed solution are a CS stage to exploit sparsity, reducing the signal length from N to M, coupled with a WZ stage to exploit inter-sensor correlation, namely a scalar quantizer with joint reconstruction to set the desired degree of distortion, a SW codebook to achieve the minimum rate, and a joint CS reconstruction stage.

3.1 Jointly sparse source model

Let source signals x1 and x2 be acquired by the system. We assume that x1 and x2 follow the JSM-1 model [3]. According to this model, a set of J sparse correlated signals xj ∈ R^{N×1}, j = 1, ..., J, is modeled as the sum xj = xC + xI,j of two sparse components: i) a common component xC, shared by all signals, and ii) an innovation component xI,j, unique to each signal xj. Both xC and xI,j are sparse in some domain represented by the orthonormal basis matrix Ψ ∈ R^{N×N}, namely xC = ΨθC and xI,j = ΨθI,j, with ‖θC‖ℓ0 = KC, ‖θI,j‖ℓ0 = Kj and KC, Kj ≪ N. This model is a good fit for signals acquired by a group of sensors monitoring the same physical event at different points in space, where local factors affect the innovation component of a more global behavior captured by the common component. With respect to the original JSM-1 scheme described above, we introduce a parameter α to modulate the correlation between the sources, i.e. x1 = xC + αxI,1 and x2 = xC + αxI,2. A small sketch of this source model is given below.

3.2 Quantization and coding

When the data are first acquired through a conventional process and then sensed through linear random measurements, the quantizer can be placed either before or after the sensing module.
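Before comparing the two placements, here is the minimal sketch of the JSM-1 source generation of Section 3.1 announced there, assuming the DCT as the sparsity basis Ψ (as in our experiments); function and variable names are illustrative.

import numpy as np
from scipy.fft import idct

def jsm1_pair(N=512, Kc=8, Kj=8, alpha=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    def sparse_coeffs(K):
        # K-sparse coefficient vector in the DCT domain
        theta = np.zeros(N)
        theta[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)
        return theta
    x_c = idct(sparse_coeffs(Kc), norm='ortho')                 # common component x_C = Psi theta_C
    x1 = x_c + alpha * idct(sparse_coeffs(Kj), norm='ortho')    # x_1 = x_C + alpha x_I,1
    x2 = x_c + alpha * idct(sparse_coeffs(Kj), norm='ortho')    # x_2 = x_C + alpha x_I,2
    return x1, x2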

[Figure 1 appears here. (a) Measure-then-quantize (M+Q): the JSM-1 sources x1, x2 are sensed by Φ; the measurements y1, y2 are uniformly quantized, SW-encoded (DSC) and decoded (DSC⁻¹); joint dequantization yields ŷ1, from which CS⁻¹ recovers x̂1. (b) Quantize-then-measure (Q+M): the sources are first uniformly quantized to xq,1, xq,2, then sensed by Φ to obtain yq,1, yq,2, which are SW-encoded and decoded; CS⁻¹ and joint dequantization then yield x̂1.

Figure 1: Block diagrams of the M+Q and Q+M systems.]
In the single-source case, CS is known to be robust to quantization noise [4]. Here we raise the problem of the optimal position of the quantizer with respect to the CS module in the context of separate encoding of distributed sources. To answer this question, we define and compare two different schemes for encoding jointly sparse and correlated sources, depicted in Fig. 1. In the first scheme (Fig. 1(a)), denoted M+Q, linear measurements of the sources are first taken with a Gaussian matrix Φ such that (Φ)mn ∼ N(0, 1/M), to obtain y1 and y2. Then, the measurements are quantized with a scalar uniform quantizer and coded using a SW coder. In the second scheme (Fig. 1(b)), denoted Q+M, the sources are first quantized with a scalar uniform quantizer of step size ∆. After quantization, we take the linear measurements with a random i.i.d. integer matrix Φ with Rademacher-distributed entries (Pr((Φ)mn = ±1) = 0.5), yielding measurement vectors yq,1, yq,2 composed of integer elements, which are losslessly encoded by the SW encoder. It should be noted that, since x1 and x2 are correlated sample by sample, their linear measurements are still correlated if the same sensing matrix Φ is used for both sources, justifying the use of a SW coder.

3.3 Slepian-Wolf coding

In both systems, the quantization/measurement stage is followed by a zero-error SW co/decoding stage. Let yb,1 and yb,2 denote the Nb-long binary versions of yq,1 and yq,2, respectively. Let H be the parity-check matrix of a rate-Kb/Nb channel code, of size (Nb − Kb) × Nb. yb,1 is compressed by sending its syndrome s1 = H · yb,1, of size (Nb − Kb), to the decoder; the side information yb,2 is exploited at the decoder using the modified Belief Propagation algorithm presented in [12] to find the best estimate ŷb,1, i.e. the sequence closest to yb,2 with syndrome s1. Here, we use rate-adaptive LDPC accumulate codes, which have been shown to perform close to the SW bound [13] and allow zero-error coding through rate adaptation. A sketch of the syndrome computation follows.
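As an illustration, the syndrome former of the SW encoder reduces to a single matrix-vector product over GF(2). The toy parity-check matrix below is an illustrative stand-in for the rate-adaptive codes of [13], whose structured matrices (and the belief-propagation decoder of [12]) we do not reproduce here.

import numpy as np

def sw_encode(y_b, H):
    # Syndrome s = H y_b (mod 2), of length N_b - K_b < N_b: this is the compression step.
    return (H @ y_b) % 2

# Toy example with N_b = 6 and K_b = 3 (real systems use the LDPCA matrices of [13])
rng = np.random.default_rng(0)
H = rng.integers(0, 2, size=(3, 6))
s1 = sw_encode(rng.integers(0, 2, size=6), H)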

3.4 Reconstruction

Finally, reconstruction is performed in order to obtain an estimate x̂1 of the unknown source. The performance of the reconstruction stage can be improved by exploiting knowledge of the SI at the decoder, for both dequantization and CS reconstruction. In this section, for simplicity, we refer to the scenario of "coding with SI", or "asymmetric" coding, in which x1 has to be communicated and x2 is known at the receiver. We describe the implementation for the M+Q scheme, but the extension to Q+M is straightforward.

3.4.1 Joint Dequantization

The first improvement is given by joint signal dequantization. Under the classical Gaussian additive correlation noise model, i.e. y1 and y2 are Gaussian and their difference is independent of y2, we derive the optimal joint dequantizer

$\hat{y}_1 = y_2 + \sqrt{\frac{2}{\pi}}\,\sigma\,\frac{e^{-\frac{(a-y_2)^2}{2\sigma^2}} - e^{-\frac{(b-y_2)^2}{2\sigma^2}}}{\operatorname{erf}\!\left(\frac{b-y_2}{\sqrt{2}\sigma}\right) - \operatorname{erf}\!\left(\frac{a-y_2}{\sqrt{2}\sigma}\right)}, \qquad (2)$

where y1 is an element of y1 and y2 is the corresponding element of y2, σ² is the variance of y1 − y2, a = ŷq,1 − ∆/2 and b = ŷq,1 + ∆/2. However, in the JSM-1 model y1 − y2 is independent of neither y1 nor y2. Since (2) is not optimal under JSM-1, we resort to a suboptimal but more robust estimator. In [6], the output of the inverse quantizer is obtained as

$\hat{y}_1 = \frac{\sigma_q^2}{\sigma^2 + \sigma_q^2}\, y_2 + \frac{\sigma^2}{\sigma^2 + \sigma_q^2}\, \hat{y}_{q,1},$

where σq² denotes the quantization noise variance. The scheme we use is an improved version in which ŷ1 is clipped to the interval [a, b] whenever it falls outside of it.
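The two dequantizers above map directly to code; a minimal sketch follows, where sigma is the standard deviation of y1 − y2, sigma2 and sigma2_q are the correlation- and quantization-noise variances, and the function names are our own.

import numpy as np
from scipy.special import erf

def joint_dequantize(y_q1, y2, delta, sigma):
    # Optimal joint dequantizer of eq. (2) under the Gaussian correlation model
    a, b = y_q1 - delta / 2, y_q1 + delta / 2
    num = np.exp(-(a - y2)**2 / (2 * sigma**2)) - np.exp(-(b - y2)**2 / (2 * sigma**2))
    den = erf((b - y2) / (np.sqrt(2) * sigma)) - erf((a - y2) / (np.sqrt(2) * sigma))
    return y2 + np.sqrt(2 / np.pi) * sigma * num / den

def clipped_dequantize(y_q1, y2, delta, sigma2, sigma2_q):
    # Estimator of [6], clipped to the quantization interval [a, b] as proposed here
    y_hat = (sigma2_q * y2 + sigma2 * y_q1) / (sigma2 + sigma2_q)
    return np.clip(y_hat, y_q1 - delta / 2, y_q1 + delta / 2)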

3.4.2 Joint Reconstruction

The second improvement regards CS reconstruction. In particular, the idea is to estimate the sparsity support of the common component θC, i.e. the set Θ̂C of the positions of its nonzero coefficients. This estimate allows us to improve CS reconstruction: since we further sparsify the vector to be reconstructed, we require fewer measurements to successfully reconstruct the signal (or, equivalently, improve the quality of the reconstruction for the same number of measurements). In particular, we subtract from yq,1 the "measurements common component" yC = ΦxC; the resulting measurement vector only contains the innovation component of the source, xI,1, which is sparser than x1 and requires fewer measurements to be recovered. After recovering the innovation component, we re-add the estimated common component to obtain the final estimate of x1. The entire process is depicted in the block diagram of Fig. 2 and sketched in code below.
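A minimal sketch of this subtract-reconstruct-add loop follows, reusing the cs_reconstruct sketch from Section 2.1. Note one loud assumption: how the common component is rebuilt from the estimated support is our own choice for illustration (copying the SI coefficients θ2 on Θ̂C); the paper specifies only that the support is estimated, via the algorithms given below.

import numpy as np

def joint_reconstruct(y_q1, Phi, Psi, theta2, support_c, eps):
    # Assumed estimate of the common component: SI coefficients on the support estimate
    theta_c = np.zeros_like(theta2)
    idx = np.fromiter(support_c, dtype=int)
    theta_c[idx] = theta2[idx]
    x_c = Psi @ theta_c
    y_i1 = y_q1 - Phi @ x_c                          # remove the "measurements common component"
    theta_i1 = cs_reconstruct(y_i1, Phi @ Psi, eps)  # recover the sparser innovation, eq. (1)
    return x_c + Psi @ theta_i1                      # re-add the common component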

y 1

Subspace Detection

 Subspace C Comparison

Ψ-1

x C

Φ

y C

+ -

y I ,1

CS-1

x I ,1

x 1 +

+

2

Figure 2: Block diagram of the Joint Reconstruction stage b C . Both We propose two algorithms to estimate Θ take as inputs the set of nonzero elements of θ2 = ΨT x2 , and initially perform non-joint reconstruction to obtain b1 using (1). This is used to compute an initial estimate x Tb b b C of positions of the θ1 = Ψ x1 . The output is the set Θ nonzero elements of θC . In short, Algorithm 1 estimates the intersection of the positions of the significant components of both sources; algorithm 2 sorts by decreasing magnitude the nonzero elements of the estimated source, and then compares them to the SI coefficients.

Algorithm 1 Intersect algorithm
Require: ΘNZ,2 = {i | (θ2)i ≠ 0}, θ̂1, threshold t
Ensure: Θ̂C, an estimate of {i | (θC)i ≠ 0}
1: Θ̂NZ,1 ← {i | |(θ̂1)i| > t}
2: Θ̂C ← Θ̂NZ,1 ∩ ΘNZ,2
3: return Θ̂C

Algorithm 2 Sorting algorithm
Require: ΘNZ,2 = {i | (θ2)i ≠ 0}, θ̂1, KC
Ensure: Θ̂C, an estimate of {i | (θC)i ≠ 0}
1: Θ̂C ← ∅
2: for i ranging over the indices of θ̂1 sorted by decreasing |(θ̂1)i| do
3:   if i ∈ ΘNZ,2 then
4:     Θ̂C ← Θ̂C ∪ {i}
5:     if |Θ̂C| = KC then
6:       break
7:     end if
8:   end if
9: end for
10: return Θ̂C
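For concreteness, a direct transcription of the two algorithms in Python (illustrative; it assumes θ̂1 is a numpy array and ΘNZ,2 an iterable of indices):

import numpy as np

def intersect_support(theta2_nz, theta1_hat, t):
    # Algorithm 1: keep indices where |theta1_hat| exceeds t and the SI is nonzero
    nz1 = {int(i) for i in np.flatnonzero(np.abs(theta1_hat) > t)}
    return nz1 & set(theta2_nz)

def sorting_support(theta2_nz, theta1_hat, Kc):
    # Algorithm 2: scan theta1_hat by decreasing magnitude, keep indices in the SI support
    si, est = set(theta2_nz), set()
    for i in np.argsort(-np.abs(theta1_hat)):
        if int(i) in si:
            est.add(int(i))
            if len(est) == Kc:
                break
    return est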

4. NUMERICAL RESULTS

In this section, we assess the rate-distortion performance of the proposed scheme. The rate is measured in bits per original source symbol, as output by the zero-error SW encoder. Sources x1 and x2 have N = 512 samples, and we take α ∈ {10⁰, 10⁻¹, 10⁻², 10⁻³} and KC = K1 = K2 = 8. Ψ is the DCT matrix of length N. The quantization step for Q+M is ∆ ∈ {0.1, 0.01, 0.001}, and for M+Q it is ∆ = √(N/M) · {0.1, 0.01, 0.001}, in order to obtain comparable rates. The measurement vector has M = 64, 128 or 256 entries. For the SW code, L = 66, Nb = 1584, and Kb varies from 24 to 1584.

4.1 Comparison between Q+M and M+Q

Fig. 3 shows the rate-distortion performance of the Q+M and M+Q systems for α = 10⁻². It can be noticed that the best performance is achieved by the M+Q system with M = 128, while for M = 64 the decoder does not have enough measurements to properly reconstruct the signal. The Q+M system with M = 128 shows a penalty of 0.85 bit per source symbol. This can be explained as follows: i) in the Q+M scheme, the quantized signal may not be sparse (not sparse at all, or not sparse in the same basis as the original signal), so the scheme performs poorly; ii) the M+Q scheme performs well, which shows that, even in the distributed context, CS is robust to quantization noise.

4.2 Penalty of practical vs. optimal Slepian-Wolf encoder

Moreover, fig. 3 shows the penalty of the Slepian-Wolf encoder described above with respect to the conditional entropy H(yq,1 | yq,2), which bounds its performance. The penalty is about 0.15 bit per source symbol, which is quite close to the performance of the code shown in [14], even though the latter is not a zero-error coder.

4.3 Gain of proposed DCS scheme vs. DSC (no CS) and non-distributed CS

In addition, fig. 3 allows us to read off the individual gains of the CS and DSC blocks. Without CS, the rate-distortion curve would be the one labeled "No CS", whose penalty increases with the rate. On the other hand, to evaluate the advantage obtained using DSC, we compare the entropy H(yq,1) with the conditional entropy H(yq,1 | yq,2) and observe that joint encoding yields a gain of 1.18 bit per source sample.

4.4 Conjecture about the optimal number of measurements

Finally, we conjecture that the best RD performance is independent of the number of measurements. The optimal number of measurements would be the one guaranteeing (close-to-)perfect reconstruction. With more measurements than this, no further improvement would be obtained; with fewer, the reconstruction distortion would likely dominate the quantization distortion. Hence, all the distortion can be optimally tuned at the quantizer alone, which should yield a predictable degree of reconstruction distortion.

4.5 Joint Reconstruction

Fig. 4 shows the rate-distortion performance of the reconstruction algorithms proposed in Section 3.4 for M = 64, 128 in the M+Q scheme. Curves labeled IDR refer to independent dequantization and reconstruction, i.e., the curves of Fig. 3. JD labels curves referring to joint dequantization. JD+JR refers to joint dequantization and joint reconstruction. Ideal refers to the performance of a joint reconstruction knowing in advance the positions of the nonzero components of θC.

[Figure 3 appears here: distortion (MSE, log scale) vs. rate (bit per source symbol, 0 to 8), with curves for M+Q and Q+M at M = 64, 128, 256, for the "No CS" scheme, and for the entropy and conditional entropy of M+Q at M = 128.

Figure 3: Rate-distortion performance, Q+M vs. M+Q systems. N = 512, KC = Kj = 8, α = 10⁻².]

[Figure 4 appears here: distortion (MSE, log scale) vs. rate (bit per source symbol, 0 to 2), with curves for IDR, JD, JD+JR (ideal), JD+JR (intersect) and JD+JR (sorting), each at M = 64 and M = 128.

Figure 4: Rate-distortion performance, joint dequantization and joint reconstruction. N = 512, KC = Kj = 8, α = 10⁻².]

It can be noticed that the joint reconstruction provides the most significant gain. With M = 64, it is now possible to reconstruct the source, significantly outperforming M = 128. Moreover, the proposed algorithms for the estimation of the common component perform very close to each other and show only a slight penalty with respect to the ideal case.

5. DISCUSSION AND CONCLUSIONS

In this paper, we have proposed a system for the lossy compression of intra- and inter-correlated sources in application scenarios such as sensor networks. It is based on the joint use of CS, to capture the memory of a signal, and DSC, to take advantage of inter-sensor correlations. The proposed system has satisfactory rate-distortion performance in comparison with a system that employs only DSC and no CS, showing that linear measurements represent a viable, universal and low-complexity signal representation. First, we showed that the resilience of CS to quantization error also holds in the distributed setup. Moreover, the optimal number of measurements can be chosen as the one guaranteeing (close-to-)perfect reconstruction. In addition, joint decoding, dequantization and reconstruction techniques boost performance even further, making the proposed scheme an attractive choice for environments such as sensor networks, in which the devices performing acquisition and processing are severely constrained in terms of energy and computation.

REFERENCES

[1] D.L. Donoho, "Compressed sensing," IEEE Trans. on Inf. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[2] E.J. Candes and T. Tao, "Near-optimal signal recovery from random projections: universal encoding strategies?," IEEE Trans. on Inf. Theory, vol. 52, no. 12, pp. 5406–5425, 2006.
[3] D. Baron, M.F. Duarte, M.B. Wakin, S. Sarvotham, and R.G. Baraniuk, "Distributed compressive sensing," ArXiv e-prints, Jan. 2009.
[4] A.K. Fletcher, S. Rangan, and V.K. Goyal, "On the rate-distortion performance of compressed sensing," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP 2007), 2007.
[5] V.K. Goyal, A.K. Fletcher, and S. Rangan, "Compressive sampling and lossy compression," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 48–56, 2008.
[6] S.S. Pradhan and K. Ramchandran, "Distributed source coding using syndromes (DISCUS): design and construction," IEEE Trans. on Inf. Theory, vol. 49, no. 3, pp. 626–643, Mar. 2003.
[7] C. Weidmann, F. Bassi, and M. Kieffer, "Practical distributed source coding with impulse-noise degraded side information at the decoder," in Proc. 16th Eusipco, Lausanne, Switzerland, Aug. 2008.
[8] C. Weidmann and M. Vetterli, "Rate distortion behavior of sparse sources," submitted to IEEE Trans. on Inf. Theory, Aug. 2010.
[9] V. Stankovic, L. Stankovic, and S. Cheng, "Sparse signal recovery with side information," in Proc. 17th Eusipco, Glasgow, UK, Aug. 2009.
[10] D. Slepian and J.K. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. on Inf. Theory, vol. 19, no. 4, pp. 471–480, July 1973.
[11] A. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Trans. on Inf. Theory, vol. 22, no. 1, pp. 1–10, Jan. 1976.
[12] A.D. Liveris, Z. Xiong, and C.N. Georghiades, "Compression of binary sources with side information at the decoder using LDPC codes," IEEE Commun. Lett., vol. 6, no. 10, Oct. 2002.
[13] D. Varodayan, A. Aaron, and B. Girod, "Rate-adaptive codes for distributed source coding," EURASIP Signal Processing, vol. 86, no. 11, pp. 3123–3130, Nov. 2006.
[14] Z. Xiong, A.D. Liveris, and S. Cheng, "Distributed source coding for sensor networks," IEEE Signal Processing Magazine, vol. 21, no. 5, pp. 80–94, 2004.