
Fundamental Limits on Sensing Capacity for Sensor Networks and Compressed Sensing

arXiv:0804.3439v1 [cs.IT] 22 Apr 2008

Shuchin Aeron, Manqi Zhao, and Venkatesh Saligrama

Abstract—Modern applications of signal processing arising in sensor networks require efficient sensing of multi-dimensional data or phenomena. In this context it becomes important to understand fundamental performance limits of both sensing and communication between sensors. In this paper we focus primarily on sensing aspects. We propose the notion of sensing capacity to characterize the performance and effectiveness of various sensing configurations. We define sensing capacity as the maximal number of signal dimensions reliably identified per sensor deployed. The inverse of sensing capacity is the compression rate, i.e., the number of measurements required per signal dimension for accurate reconstruction, a concept of interest in compressed sensing (CS). Using information theoretic arguments we quantify sensing capacity (compression rate) as a function of the information rate of the event domain, the SNR of the observations, the desired distortion in the reconstruction, and the diversity of sensors. In this paper we consider fixed SNR linear observation models for sensor network (SNET) and CS scenarios for different types of distortions motivated by detection, localization and field estimation problems. The principal difference between the SNET and CS scenarios is the way in which the signal-to-noise ratio (SNR) is accounted. For SNET scenarios composed of many sensors, it makes sense to impose a fixed SNR for each individual sensor, which leads to row-wise normalization of the observation matrix. On the other hand, CS has dealt with column-wise normalization, i.e., the case where the vector valued observation corresponding to each signal component is normalized. The two cases lead to qualitatively different results. In the SNET scenario sensing capacity is generally small and correspondingly the number of sensors required does not scale linearly with the target density (sparsity), in contrast to the CS setting. Specifically, the number of measurements is generally proportional to the signal dimension and is weakly dependent on target density or sparsity. This raises questions on compressive gains in power-limited sensor network applications based on sparsity of the underlying source domain. For the compressed sensing scenario, we also illustrate gaps between existing state-of-the-art convex programming methods and optimal estimators.

The authors are with the Department of Electrical and Computer Engineering at Boston University, MA 02215. They can be reached at {shuchin, mqzhao, srv}@bu.edu.

I. INTRODUCTION

In this paper we focus on problems that arise in the area of multi-dimensional data acquisition and processing, motivated by applications in sensor networks (SNET) and compressed sensing (CS). In this context we define the notion of Sensing Capacity as a measure of the detection, estimation and reconstruction capability of a network of sensors. Sensing capacity is defined asymptotically as the maximal number of signal dimensions reliably identified per sensor. The inverse of sensing capacity is the compression rate, i.e., the ratio of the number of measurements to the number of signal dimensions, and it characterizes the minimum rate to which the source can be compressed. Sensing capacity is a function of the following parameters: (a) sensing channels, which include the SNR and the diversity of sensor operation; (b) sensing domains, which model the ambient phenomena or data; (c) solution objectives, which are essentially some function of the source that needs to be estimated; and (d) the desired distortion.

These questions are related to problems dealt with in the compressed sensing (CS) literature on sampling sparse signals [1], [2], [3] and to the problem of sampling signals with finite rate of innovation [4], [5]. A quantity of interest in both these problems is the minimum number of samples of a signal that ensures reliable recovery of the signal from these samples, which is equivalent to our notion of compression rate. Although the problem of finding sensing capacity can be dealt with in very general settings, in this paper we primarily focus on fixed SNR linear observation models and derive fundamental information theoretic upper and lower bounds to sensing capacity for both SNET and CS scenarios. Nevertheless our techniques and bounds can be extended to handle more general cases as well. A related notion of sensing capacity has also been proposed in [6] in the context of sensor networks for discrete applications such as target detection. While that work focused on recovery for arbitrary sources, our objective here is to focus on sources which are sparse and to characterize sensing capacity and compression rate as a function of sparsity. In general the notion of sensing capacity is useful for problems with positive information rate sources, i.e., where sparsity grows proportionately with signal dimension. We show that for problems where this information rate is asymptotically zero, such as problems of classification and detection, sensing capacity is infinite (i.e., the compression rate is zero). Therefore very few measurements suffice for such problems.

For positive information rate sources we obtain sharp contrasts between the SNET and CS scenarios. Concretely, consider for example the fixed SNR model Y = √(SNR) G X + N, where X ∈ R^n, N ∈ R^m is AWGN noise with variance 1 per dimension, and G ∈ R^{m×n} is the observation matrix drawn from a random Gaussian ensemble or from a random {1, −1} ensemble. Such random ensembles have been shown to yield good compression rates due to their good isometry properties [1], [7]. For the SNET scenario this choice has been shown to arise naturally in wireless SNET applications [8]. The main difference between the SNET and CS scenarios is the way in which the SNR is accounted. In SNETs a fixed SNR arises because each sensor observation has fixed SNR; thus for sensing a field of dimensionality n the sensed SNR per location is spread across the field of view. This leads to a row-wise normalization of the observation matrix. On the other hand, in the CS scenario column-wise normalization of the observation matrix has conventionally been employed. This difference leads to surprisingly different regimes of compression rate, as indicated in Table I.

Scenario | Sparsity Pattern Recovery          | Average Distortion ≤ d0 < α
CS       | m ≈ 6nH(α), SNR ≈ log n            | m ≈ 2nR_X(d0), for SNR ≈ 4R_X(d0)/d0
SNETs    | m ≈ n, SNR ≈ log n                 | m ≈ 2nR_X(d0)/log(1 + d0·SNR/2)

TABLE I
Contrasts between the CS and SNET scenarios. Results for exact sparsity pattern recovery are based on a non-Bayesian source while results for average distortion are based on a Bayesian source. α is the sparsity ratio (maximum non-zero components to the signal dimension n), m is the number of measurements, H(α) is the binary entropy function, and R_X(·) is the scalar rate-distortion function given the source distribution.

Note from the above results that in exact support recovery problems, for the CS scenario the number of measurements scales linearly with the entropy rate of the source for sufficiently large SNR ≈ log n. In contrast, for the SNET scenario with SNR ≈ log n the compression rate approaches unity. This means that for SNET scenarios, even with a moderate (logarithmic) scaling of SNR with the dimension of the signal, the number of measurements scales with the dimensionality of the source domain and no compression can be achieved. Similar trends hold for average distortion, as seen in Table I. Much of the literature is motivated by the performance of convex algorithms such as linear programming for the noiseless case [1], LASSO [9], [10], [11] and the Dantzig selector [7] for noisy cases. More recent bounds for exact support recovery using LASSO were obtained in [12]. In this paper we identify gaps between optimal achievable bounds to sensing capacity and compression rate and those obtained by convex programming methods. For support recovery problems LASSO appears to require sparsity to grow sublinearly with signal dimension, whereas one can tolerate linear growth by using optimal max-likelihood estimators. For average distortion the ℓ1 methods of [13], [12] hold for small levels of the sparsity ratio. This gap is summarized in Table II.

Method         | Exact Support Recovery               | Average Distortion ≤ d0 = c0·α
Max-Likelihood | m ≈ nH(α), α > 0, SNR ≈ log n        | constant SNR, m ≈ nR_X(c0·α)
LASSO          | m ≈ n, SNR ≈ log n, α ≤ 1/log²n → 0  | m ≈ nH(α), constant SNR, α < 10⁻⁴

TABLE II
Gap between achievable bounds for LASSO and the Max-Likelihood estimator for compressed sensing. Results for exact sparsity pattern recovery for ML and average distortion with LASSO are based on a Bayesian source model. α is the sparsity ratio (maximum non-zero components to the signal dimension n), m is the number of measurements, and R_X(·) is the rate-distortion function.

The adverse scaling of SNR for exact support recovery motivates a focus on approximate reconstruction for positive information rate sources. In this direction we extend Fano's inequality [14] to handle continuous signal spaces and arbitrary distortion measures. Using this we derive upper bounds to sensing capacity. The novelty of this new Fano bound is that it provides bounds to sensing capacity in terms of the rate distortion of the solution objective and the mutual information between source and observations conditioned on the sensing channels. Rate distortion is directly related to the source domain and the corresponding solution objective. The mutual information is related to the source domain being sensed and to the sensing channels. For problems of continuous field estimation, we utilize maximum-likelihood (ML) estimators constructed over the set of rate-distortion quantization points and derive lower bounds to sensing capacity. Sensing capacity bounds obtained from these techniques exhibit a contrast between the CS and SNET scenarios similar to the exact support recovery case. In particular, for the SNET scenario sensing capacity goes to zero as sparsity goes to zero, whereas in the CS scenario sensing capacity scales inversely with the rate distortion. Consequently fewer measurements are required in the CS scenario than in the SNET scenario. These differences are illustrated in Table I. Information theoretic analysis for exact support recovery for the CS case was considered in [15], where m ≥ O(αn log(n(1 − α))) but it requires that SNR → ∞. Information theoretic bounds for estimation of support for the CS scenario (expressed in terms of identifying the true underlying signal subspace) were derived in [16] for finite SNR. Nevertheless an extension to arbitrary distortion in reconstruction (either in support recovery or in ℓ2 distortion) was not provided. Here we provide a complete information theoretic characterization for approximate recovery in ℓ2 distortion. Also the results can in principle be extended to derive bounds for reconstructing any function of the source. For the simple case of binary alphabets, where ℓ2 distortion matches distortion in support recovery, our bounds are sharper than those obtained previously.

In addition, the information theoretic results obtained in this paper for sensing capacity offer another significant advantage. Through a simple manipulation of the mutual information term we isolate the effect of the sensing architecture, i.e., the structure of the sensing channels, on sensing capacity. This allows us to quantify in advance the efficiency of a particular type of sensor deployment and modality of operation and to compare performance in an algorithm independent manner. In this direction we compare the performance of several sensor configurations with differing fields-of-view.

This paper is organized as follows. In Section II we present the problem set-up in detail with examples of sensing domains, sensing channels, solution objectives and distortion measures. In Section III we show that for problems of detection and classification sensing capacity is infinite. In Section IV we show that for perfect support reconstruction sensing capacity is zero unless the SNR asymptotically approaches infinity. Here we point out quantitative differences in the operating regimes for the SNET and CS scenarios. Following these developments, in Section V-A we extend Fano's inequality to handle arbitrary distortion measures, continuous signal spaces and solution objectives. In Section V-B, for problems of field estimation, we utilize a maximum likelihood estimator to evaluate upper bounds to the probability of error subject to a distortion criterion. Using lower and upper bounds to the probability of error, in Sections VI-A and VI-B we evaluate explicit upper and lower bounds to sensing capacity for the SNET and CS scenarios respectively, under Bayesian priors on sensing domains modeling the signal sparsity, and show the effect of the fundamental parameters, viz., SNR, sparsity and distortion. We then contrast the two cases in Section VI-C. Finally, in Section VII we show how the extended Fano's inequality can be used to evaluate the effect of the sensing architecture on performance. For the sake of brevity, we briefly outline the approach and mention several results that have been reported in [17].

II. PROBLEM SET-UP

We first present an abstract setup and then describe the specific problem dealt with in this paper. We consider two types of sensing domains (sources): random and non-random. For random sensing domains,

Fig. 1. Schematic for the general problem.

let {P_n, X^n, F_n} be an n-dimensional probability space and let X_n ∈ X^n ⊂ R^n be drawn according to the probability measure P_n. For non-random sensing domains X_n is unknown, deterministic, and belongs to an n-dimensional parameter set Ξ_n ⊂ R^n. The parameter set is usually the same as X^n and for brevity we make no distinction between them unless otherwise noted. We will refer to the probability space as the source or the sensing domain. We consider m sensors and/or observations that measure X through noisy sensing channels

φ_i : X^n × Ω → Y,  i = 1, 2, ..., m

where Ω is the space where noise events are realized. The overall sensing channel Φ = [φ_j]_{j=1}^m induces a joint probability measure on X^n × Y^m. Let

f : X^n → Z

denote a solution objective, i.e., we are interested in estimating Z ∈ Z which is related to X via the mapping f. Let U denote a reconstruction operator

U : Y^m → Z

that generates estimates of Z ∈ Z from Y . To this end we introduce a distortion measure over Z × Z denoted by d(., .).

We will now introduce the notion of sensing capacity for random and non-random sources. Loosely, sensing capacity is the maximum ratio n/m that is achievable such that the probability of error in reconstruction of the solution objective asymptotically approaches zero. In particular we consider the asymptotic situation in which both the ambient dimension n and the number of sensors approach infinity. For the random case we need a sequence of n-fold probability distributions P_n, a sequence of sensing channels φ_j, j = 1, 2, . . . , m, . . ., and a sequence of estimators U_m that map the m-dimensional observation to the solution domain f(X). The non-random case is specified by providing a collection of sensing domains Ξ_n. Note that bounds on sensing capacity also provide a precise characterization of the number of measurements required to reconstruct the solution objective. In this paper we capture this by the Compression Rate, which is essentially the inverse of sensing capacity.

In general sensing capacity depends on the sensing domain (whether random or non-random), the distortion level d_0, the desired solution objective f, the sensing channel Φ, the signal-to-noise ratio SNR, the sparsity ratio α, the diversity of sensor operation characterized by the diversity ratio ξ, etc. To avoid cumbersome notation we drop this functional dependence and make it explicit wherever appropriate; for example, we write C(SNR) to make the dependence on SNR explicit. In the following we also consider sensing channels that are drawn from random ensembles that depend on n, m. This means that Φ is drawn from a probability space. We incorporate this setting by defining sensing capacity as the average over all sensing channels. For non-random choices of Φ we can assume a singular measure.

Sensing Capacity and Compression Rate: To this end denote the ratio n/m as the Sensing Rate for the set-up. We first define the ǫ-sensing capacity as follows.

Definition 2.1: The ǫ-sensing capacity is the supremum over all sensing rates such that for the sequence of n-dimensional realizations over the sensing domains and the sequence of (possibly random) sensing channels φ_1, .., φ_m, there exists a sequence of reconstruction operators U_m such that the probability that the distortion in reconstruction of Z = f(X) is below d_0 is greater than 1 − ǫ. In particular, for the random sensing domain we have

C_ǫ = limsup_{m,n} { n/m : E_Φ P( d(Z, Ẑ) ≥ d_0 | Φ ) ≤ ǫ }    (1)

and for the non-random setting we have

C_ǫ = limsup_{m,n} { n/m : sup_{X∈Ξ_n} E_Φ P( d(Z, Ẑ) ≥ d_0 | Φ ) ≤ ǫ }    (2)

Sensing capacity can now be defined as the limiting case

C = lim_{ǫ→0} C_ǫ    (3)

Similar to the definition of sensing capacity we have the following definition of compression rate. To this end let m/n denote the compression ratio for the sensing channel. With the setup as above, for the random setting we have

R_ǫ = 1/C_ǫ = liminf_{m,n} { m/n : E_Φ P( d(Z, Ẑ) ≥ d_0 | Φ ) ≤ ǫ }    (4)

and for the non-random setting

R_ǫ = 1/C_ǫ = liminf_{m,n} { m/n : sup_{X∈Ξ_n} E_Φ P( d(Z, Ẑ) ≥ d_0 | Φ ) ≤ ǫ }    (5)

Compression rate can now be defined as the limiting case

R = lim_{ǫ→0} R_ǫ    (6)

Since bounds on sensing capacity yield bounds on compression rate and vice versa, in the following we focus mainly on sensing capacity. We state results for the compression rate wherever required for the explanation of results.

A. Sensing Domains

In this paper we will focus on sparse sensing domains/sources. We consider the following domains for the non-random and random settings.

Non-Random sources: We say that Ξ_n ⊂ R^n is a family of k-sparse sequences if for every X ∈ Ξ_n the support of X has size at most k. Formally, let

Supp(X) = { i : X_i ≠ 0 },  X ∈ R^n    (7)

Then Ξ_n is a family of k-sparse sequences if

Card(Supp(X)) ≤ k,  ∀ X ∈ Ξ_n    (8)

We will refer to the ratio α = k/n as the sparsity ratio.

Random sources: For this case we consider the n-dimensional source vector X = (X_1, . . . , X_n) as a sequence drawn i.i.d. from a mixture distribution

P_X = αN(µ_1, σ_1²) + (1 − α)N(µ_0, σ_0²)    (9)

where α ≤ 1/2. Note that if µ_1 = 1, µ_0 = 0 and σ_1 = σ_0 = 0, then X is a Bernoulli(α) sequence; this models the discrete case for addressing problems such as target localization. If µ_1 = µ_0 = 0 but σ_1² = 1 and σ_0² = 0, this models the continuous case for addressing problems such as field reconstruction.

In this context we call α the sparsity ratio, which is held fixed for all values of n. Under the above model the signal will on average be k-sparse, where k = αn. Note that k → ∞ as n → ∞.

B. Sensing Channels

In this paper we will primarily focus on linear channels with AWGN noise. The observation at each sensor is given by Y_j = φ_j^T X + N_j, where φ_j is a vector of gains modulated for sensing or of natural gains from the environment. Note that these modulated gains could be random. In this context the SNR is an important parameter and we consider fixed SNR scenarios. In particular the observation vector at the set of m sensors is given by

Y = √(SNR) Φ X + N    (10)

where Φ = [φ_ij] ∈ R^{m×n} is the matrix of the corresponding (possibly random) sensing gains. We call Φ the sensing matrix.

There are two ways of normalizing the sensing matrix Φ, row-wise or column-wise, with respect to which the SNR can remain fixed. These lead to qualitatively different results and we therefore separate them into two cases below. The cases are motivated by both practical and historical reasons.

Sensor Network Scenario: In the SNET scenario we have a collection of m sensors, each of which is power limited. Therefore we must impose

E[ Σ_{j=1}^n φ_{i,j}² ] = 1,  ∀ i    (11)

i.e., the power of each row is normalized to one. Examples of such scenarios have been considered in several papers [18], [8], where n sensors observe a noisy source and the idea is to minimize the latency and energy of transmissions to a fusion center so as to appropriately reconstruct the source. The authors propose a distributed strategy under assumptions of a fast fading channel from the sensors to the fusion center, and the overall communication model is

Y = Φ(X + W)    (12)

where W is the additive noise at the sensors and Φ is an m × n matrix of channel gains over m uses of the channel. Each sensor has power constraint P. Here the network resource is the number of channel uses, say m, and one needs to minimize m. It can be seen that this problem can be recast in terms of sensing capacity (or compression rate), where sensing capacity now addresses bounds on the energy and channel uses needed for adequate reconstruction.

Compressed Sensing Scenario: In the compressed sensing scenario we have a collection of m observations that correspond to projections of the signal onto a collection of unit norm basis vectors. Thus here the columns of the matrix Φ are normalized:

E[ Σ_{i=1}^m φ_{i,j}² ] = 1,  ∀ j    (13)

i.e., the power of each column is normalized to one.

Channels Chosen from Random Ensembles: As discussed earlier, we can have sensing matrices drawn from random ensembles. For convenience we consider the class of random Gaussian matrices

Φ = { G ∈ R^{m×n} : G_ij ∼ (1 − ξ)N(µ_0, σ_0²) + ξN(µ_1, σ_1²) }    (14)

i.e., the sensing matrix G is a random matrix such that for each row i the entries G_ij, j = 1, 2, .., n, are distributed i.i.d. according to the mixture distribution (1 − ξ)N(µ_0, σ_0²) + ξN(µ_1, σ_1²). To this end, we have the following definition of sensing diversity.

Definition 2.2: Sensing diversity for a sensor is defined as the average number of non-zero elements in the sensing vector corresponding to that sensor.

Thus for the mixture model each sensor will on average have diversity ξn. For this ensemble, in order to have a fixed SNR we normalize the non-zero entries suitably such that either the average power of each row (SNET scenario) or the average power of each column (CS scenario) is normalized to one. Recently other classes of sensing matrices have been proposed, in [19] based on random filtering and in [20] where Toeplitz structured sensing matrices were considered from the point of view of computationally faster and more efficient sensing. Our methods can be extended to handle these cases as well. Some results in this direction have been reported in [17].
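The observation model and the two normalization conventions are easy to make concrete. The following is a minimal numerical sketch, not code from the paper: it samples the mixture source of Equation (9), draws a diversity-limited Gaussian sensing matrix as in Equation (14) normalized row-wise (SNET, Equation 11) or column-wise (CS, Equation 13) in expectation, and forms the observations of Equation (10). The function names and the unit-variance choice for the non-zero entries are illustrative assumptions.

```python
# Illustrative sketch of the fixed-SNR observation model; not the authors' code.
import numpy as np

def sample_source(n, alpha, continuous=True, rng=None):
    """Draw X i.i.d. from alpha*N(mu1, s1^2) + (1-alpha)*N(mu0, s0^2) as in Eq. (9)."""
    rng = np.random.default_rng(rng)
    support = rng.random(n) < alpha
    if continuous:                      # field-estimation case: N(0,1) on the support
        x = np.zeros(n)
        x[support] = rng.standard_normal(support.sum())
    else:                               # localization case: Bernoulli(alpha) in {0,1}
        x = support.astype(float)
    return x

def sample_sensing_matrix(m, n, xi=1.0, snet=True, rng=None):
    """Gaussian ensemble with diversity ratio xi; row- (SNET) or column- (CS) normalized."""
    rng = np.random.default_rng(rng)
    mask = rng.random((m, n)) < xi      # each sensor sees ~xi*n locations on average
    G = rng.standard_normal((m, n)) * mask
    if snet:
        G /= np.sqrt(xi * n)            # E sum_j G_ij^2 = 1 for every row i   (Eq. 11)
    else:
        G /= np.sqrt(xi * m)            # E sum_i G_ij^2 = 1 for every column j (Eq. 13)
    return G

def observe(G, x, snr, rng=None):
    """Y = sqrt(SNR) * G x + N with unit-variance AWGN, as in Eq. (10)."""
    rng = np.random.default_rng(rng)
    return np.sqrt(snr) * G @ x + rng.standard_normal(G.shape[0])
```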

C. Solution Objectives and Distortion Measures

Typical examples of solution objectives are: (1) Support (or sparsity pattern) recovery, i.e., f(X) = 1_{X≠0}; this is useful in target localization problems where the strengths of the targets are not important. (2) Exact recovery, i.e., f(X) = X; this is useful in problems of direct field estimation when one needs to reconstruct the underlying phenomenon in detail. (3) Subset selection, i.e., given L disjoint subsets S_k ⊂ X, k = 1, 2, . . . , L, determine which subset the observation comes from. (4) Prediction: f(X) = ΦX, and we are interested in predicting the signal f(X) from noisy observations Y = GX + N. Typical distortion measures of interest in these applications are: (1) the Hamming distortion measure for discrete solution objectives such as sign pattern and support recovery; (2) the squared distortion measure for continuous solution objectives such as exact recovery.

III. SENSING CAPACITY FOR DETECTION AND CLASSIFICATION

We will now show that for problems of detection and classification, where the number of hypotheses does not scale with the ambient signal dimension, the sensing capacity is infinite. This implies that the number of measurements can scale sub-linearly with respect to the signal dimension. Therefore, for such problems sensing capacity does not offer an adequate framework for understanding performance. Now consider the non-random detection problem wherein we have a collection of signals X_1, . . . , X_L that model L hypotheses or objects. Let d_min = min_{i≠j} (1/n)||X_i − X_j||² be the minimum distance between any two signals. Assume that d_min > 0. Further assume that the measurements are made according to the measurement model of Equation 10. It can be shown that under the Gaussian ensemble for G with ξ = 1 and AWGN N, the expected probability of error (expectation taken over G) in classification is upper bounded as (see e.g. [21])

P_e ≤ 2^{log L} ( 1/(1 + d_min·SNR/4) )^{m/2}    (15)

For a finite number of hypotheses L and for any ǫ > 0, the sensing capacity is lower bounded by

C ≥ [ (1/2) log(1 + d_min·SNR/4) − ǫ ] / (log L / n)    (17)

Consequently, the sensing capacity in this case approaches ∞. For general hypothesis classes and observation models, similar bounds can be obtained using large deviations principles involving the Kullback-Leibler (KL) divergence between the conditional distributions for the pair(s) of hypotheses. Indeed, a pointwise bound for the subset selection problem described in Section II-C can be readily obtained as well. Suppose we are given L disjoint sets S_k^n ⊂ X^n ⊂ R^n, k = 1, 2, . . . , L, and a signal X^n ∈ X^n is picked. We need to determine the membership of X^n from observations given by Equation 10. We consider the asymptotic situation with n → ∞ but L fixed. Suppose X^n ∈ S_1^n without loss of generality and we have the unnormalized distance

d̃_min(X) = liminf_{n→∞, Z^n ∈ S_j^n, j≠1} ||X^n − Z^n||² > 0

This implies that there is a number n_0(X) such that ||X_k − Z_k||² ≥ d̃_min(X)/2 for all k ≥ n_0(X). Here X_k and Z_k are the projections onto the first k components. Now, this implies that all of the claims above follow with

C ≥ [ (1/2) log(1 + d̃_min(X)·SNR/(8n)) ] / (log L / n)

for all n ≥ n_0(X). Consequently, sensing capacity is arbitrarily large for sufficiently large SNR (i.e., SNR ≫ log L).
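As a numerical illustration of the detection bound, the following sketch solves the bound as reconstructed in Equation (15), read in the form P_e ≤ L (1 + d_min·SNR/4)^{−m/2}, for the smallest m that drives the bound below a target error. The target and parameter values are illustrative assumptions; the point is that m grows with log L and shrinks with SNR, independently of the ambient dimension n.

```python
# Minimal sketch: measurements needed for reliable L-ary detection under the
# reconstructed bound P_e <= L * (1 + d_min*SNR/4)**(-m/2). Assumed parameters.
import numpy as np

def measurements_for_detection(L, d_min, snr, target=1e-3):
    """Smallest m with L * (1 + d_min*snr/4)**(-m/2) <= target."""
    per_measurement = 0.5 * np.log(1.0 + d_min * snr / 4.0)
    return int(np.ceil((np.log(L) - np.log(target)) / per_measurement))

for L in (2, 16, 1024):
    print(L, measurements_for_detection(L, d_min=1.0, snr=10.0))
```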

In summary, if the information rate or the complexity of the solution objective goes to zero with the dimension of the signal, a constant number of measurements usually suffices.

IV. SENSING CAPACITY FOR EXACT SUPPORT RECOVERY

In this section we consider the problem of exact support recovery under the SNR model of Equation 10 for both the SNET and CS scenarios, described through the sensing channels in Equations 11 and 13 respectively. We will develop results for the family of k-sparse non-random sequences Ξ,

Ξ = {X ∈ R^n | Card(Supp(X)) ≤ k, |X_i| > β for i in the support}    (18)


We will now briefly recall sensing capacity for exact support recovery. Suppose X̂ is the estimate for X based on the data. By sensing capacity for exact support recovery we mean the best ratio n/m as in Equations 5 and 6 with the error probability satisfying

P{ Sgn(X̂) ≠ Sgn(X) } → 0    (19)

Here the Sgn function is given by Sgn(X) = 1 if X > 0, Sgn(X) = −1 if X < 0, and Sgn(X) = 0 if X = 0.    (20)

We begin by deriving the necessary conditions for exact support recovery for the SNET and CS scenarios. Necessary conditions yield upper bounds to sensing capacity.

A. Necessary conditions for exact support recovery

In this section we derive the necessary conditions for exact support recovery. The necessary conditions are expressed in terms of upper bounds to sensing capacity derived from lower bounds to the probability of error. For the lower bounds to the probability of error, we will use the following theorem of [22], which provides a lower bound for N-ary hypothesis testing.

Lemma 4.1: Let (Y, B) be a space with a σ-field, P_1, ..., P_N be probability measures on B and θ(y) be the estimator of the measures defined on Y. Then:

max_{1≤j≤N} P_j(θ(y) ≠ P_j) ≥ (1/N) Σ_{i=1}^N P_i(θ(y) ≠ P_i) ≥ 1 − [ (1/N²) Σ_{i,j} D(P_i ‖ P_j) + 1 ] / log(N)    (21)

However, the use of Lemma 4.1 requires a finite number of hypotheses. In order to use this lemma to derive bounds for general k-sparse sequences X ∈ Ξ, we first show that the worst case probability of error in support recovery is lower bounded by the probability of error in support recovery for X belonging to k-sparse sequences in {0, β}^n. To this end we have the following lemma.

Lemma 4.2: Let Ξ be the set of k-sparse sequences as defined in Equation 18. Denote the observed distribution induced by X as P_X. Let Ξ_0 = {X ∈ Ξ | X_j = β, j ∈ Supp(X)} be a subset of Ξ consisting of binary valued sequences. Let X̂ denote an estimator for X based on the observation Y. Furthermore, let Ξ_1 ⊂ Ξ_0.

Then,

P_{e|G} = min_{X̂} max_{X∈Ξ} P_X{ Supp(X̂) ≠ Supp(X) | G } ≥ min_{X̂} max_{X∈Ξ_0} P_X( X̂ ≠ X, X̂ ∈ Ξ_0 | G )    (22)
        ≥ min_{X̂∈Ξ_1} max_{X∈Ξ_0} P_X( X̂ ≠ X, X̂ ∈ Ξ_0 | G )    (23)

Proof: See Appendix.

We will first consider the SNET scenario and then contrast it with the CS scenario. The main idea behind the following results is to lower bound the error probability by using Lemma 4.2, thereby restricting attention to binary sequences and further to a smaller subset of n elements of Ξ, and subsequently to use Lemma 4.1 to derive lower bounds for this set of binary sequences. The lower bound thus obtained yields the necessary conditions. To this end we have the following theorem for support recovery in sensor networks.

Theorem 4.1 (SNET necessity): Let X ∈ Ξ. Then for observations as in Equation 10, normalized row-wise (Equation 11) and based on the Gaussian ensemble for the sensing matrix G with ξ = 1, the sensing capacity is bounded by

C(SNR, α) ≤ min( 2β²SNR/(log n − 1), 2β²α(1 − α)SNR/(α log(1/α)) )    (24)

Proof: See Appendix.

We have the following corollary for the CS scenario.

Corollary 4.1 (CS necessity): Let X ∈ Ξ. Consider the compressed sensing setting of Equation 13 based on the Gaussian ensemble for the sensing matrix G with ξ = 1. Then it is necessary that the SNR increase as (log n − 1)/(2β²) for perfect support recovery.

Proof: See Appendix.

The first bound in Theorem 4.1 implies that the number of measurements necessary for exact recovery in the SNET case is asymptotically

m ≥ n log n / (2β² SNR)    (25)

Moreover the second bound in Theorem 4.1 suggests that as α → 0 the sensing capacity goes to zero, i.e.,

α → 0 ⟹ C(SNR, α) → 0    (26)

This means that, in contrast to the CS scenario, for the SNET scenario under fixed SNR the number of measurements must increase faster than the dimension of the signal in order to recover the support. One may conclude from this analysis that we need to increase the SNR moderately (say as log n) to ensure recovery. Nevertheless our max-likelihood bounds derived in the next section indicate that sensing capacity goes to zero even with slowly increasing SNR.

B. Sufficient conditions for exact support recovery

Let X_0 ∈ Ξ be the true signal, where

Ξ = {X | Card(Supp(X)) ≤ k, |X_i| > β for i in the support}    (27)

Consider the set

Ξ_0 = {X | Card(Supp(X)) ≤ k, |X_i| > β/2 for i in the support}    (28)

In order to derive sufficient conditions for exact support recovery we consider the following estimator:

X̂ = arg min_{X∈Ξ_0} ||Y − GX||²    (29)

and report Supp(X̂) as the final solution.
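For small problem sizes the estimator in Equation (29) can be evaluated by exhaustive search. The sketch below is an illustrative simplification rather than the paper's procedure: it scans candidate supports of size at most k and keeps the least-squares fit with the smallest residual, without enforcing the magnitude constraint |X_i| > β/2 that defines Ξ_0.

```python
# Brute-force sketch of ML support recovery (simplified from Eq. (29)).
import itertools
import numpy as np

def ml_support_estimate(Y, G, k):
    m, n = G.shape
    best_support, best_residual = (), np.inf
    # Feasible only for small n and k: the search is over sum_j C(n, j) supports.
    for size in range(1, k + 1):
        for support in itertools.combinations(range(n), size):
            Gs = G[:, support]
            coef, *_ = np.linalg.lstsq(Gs, Y, rcond=None)
            r = np.sum((Y - Gs @ coef) ** 2)
            if r < best_residual:
                best_support, best_residual = support, r
    return set(best_support)
```

Paired with the observation-model sketch of Section II-B, this can be used to estimate the empirical probability of support error for a given (n, m, SNR).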

Lemma 4.3: The error event of the above algorithm is

P_e = P( { min_{X∈Ξ_0 : d_supp(X,X_0) ≥ 1} ||Y − GX||² ≤ min_{X̃∈Ξ̃_0} ||Y − G_0 X̃||² } )    (30)
    ≤ P(E_1) + P(E_2)    (31)

where G_0 is the matrix corresponding to the support of X_0, Supp(X̃) = Supp(X_0), d_supp(·, ·) denotes the number of support errors, and where

E_1 = { min_{X∈Ξ_0 : d_supp(X,X_0) ≥ 1} ||Y − GX||² ≤ min_{X̃} ||Y − G_0 X̃||² }    (32)

E_2 = { N : ||(G_0^T G_0)^{−1} G_0^T N||_∞ ≥ β/2 }    (33)

Proof: Denote

A ≜ { min_{X∈Ξ_0 : d_supp(X,X_0) ≥ 1} ||Y − GX||² ≤ min_{X̃∈Ξ̃_0} ||Y − G_0 X̃||² }    (34)

B ≜ { min_{X̃∈Ξ̃_0} ||Y − G_0 X̃||² = min_{X̃} ||Y − G_0 X̃||² }    (35)

Then we have

P_e = P(A) = P(A ∩ B) + P(A ∩ B̄) ≤ P(A ∩ B) + P(B̄)    (36)

Actually,

A ∩ B = { min_{X∈Ξ_0 : d_supp(X,X_0) ≥ 1} ||Y − GX||² ≤ min_{X̃∈Ξ̃_0} ||Y − G_0 X̃||² } ∩ B    (37)
      = { min_{X∈Ξ_0 : d_supp(X,X_0) ≥ 1} ||Y − GX||² ≤ min_{X̃} ||Y − G_0 X̃||² } = E_1    (38)

B̄ = { min_{X̃∈Ξ̃_0} ||Y − G_0 X̃||² ≠ min_{X̃} ||Y − G_0 X̃||² } ⊂ E_2    (39)

and the lemma follows.

From the above lemma, it is sufficient to focus on the events E_1 and E_2 separately. To this end we have the following theorem for the CS scenario.

Theorem 4.2 (CS Sufficiency): Consider the compressed sensing scenario of Equation 13 with the Gaussian ensemble for the sensing matrix G with ξ = 1 and a signal domain satisfying Equation 18. Then for the ML estimator described above the following sensing capacity is achievable for SNR ≥ 32 log 2n/β²:

C(SNR, α) ≥ [ 1/(√(2α) + √(2H(2α)))² ] · [ √( 2/(1 + 32 log 2n/(β² SNR)) ) − 1 ]    (40)

where α = k/n is the sparsity ratio of the signal domain and H(α) is the binary entropy function in nats.

Proof: The proof can be found in the Appendix. Here we provide the intuition behind the proof. The proof essentially follows by separately upper bounding P(E_1) and P(E_2). For SNR ≥ (32/β²) log 2n, P(E_2) → 0. In order to upper bound P(E_1), using the Restricted Isometry Property (RIP) of G we show that support error events with support error ≥ 1 are almost contained in a union of support errors of size 1, and we then use a union bound to evaluate the sufficient conditions. We then use concentration of measure results on the RIP constant of G to arrive at the expression.


Note that the achievable capacity goes to zero unless SN R ≈ O(log n). This is also borne out by the necessary condition of Corollary 4.1. Therefore, for this regime it follows that the number of measurements, m is given by:

m ≈ 6H(2α)n, if SNR ≈ log n    (41)

Thus we see that the number of measurements is proportional to the sparsity and is otherwise independent of the dimension n for sufficiently large SNR. Furthermore, as α → 0 we see that the achievable sensing capacity goes to infinity for sufficiently large SNR, i.e.,

α → 0 ⟹ C(SNR, α) → ∞, if SNR ≈ log n    (42)

We have the following theorem for sensor networks.

Theorem 4.3 (SNET Sufficiency): Consider the SNET scenario of Equation 11 with the Gaussian ensemble for the sensing matrix G with ξ = 1 and a signal domain satisfying Equation 18. Then the ML estimator described above achieves a sensing capacity described by

C(SNR, α) ≥ [ √((η_0 + η_2)² + 4η_2η_0) − (η_0 + η_2) ]² / (4η_0²η_2²)    (43)

where η_2 = √(32 log 2n/(β² SNR)) is the normalized signal-to-noise ratio and η_0 = √3(√α + √(2H(α))) is a measure of the sparsity, with H(q) := −q log q − (1 − q) log(1 − q), q ∈ (0, 1/2).

Proof: The proof is detailed in the Appendix. The ideas and intuition behind the proof are the same as for Theorem 4.2.

The final expression for this case is complicated due to the implicit nature of the sufficient conditions on sensing capacity. We have different regimes of interest. Note that, as before, if SNR is held constant the parameter η_2 → ∞. This implies that the achievable sensing capacity goes to zero. Now if η_2 ≫ η_0 and 1/η_2 ≫ 0 (i.e., SNR ≈ log n) we get a different regime:

C(SNR, α) ≈ ( 1/(η_0 + η_2) )²    (44)

is achievable. This implies that even for sufficiently large SNR the ML sensing capacity remains bounded. In addition, even for large SNR the number of measurements required is not proportional to the sparsity ratio:

Fig. 2. Comparison of lower bounds on the compression rate for the SNET and CS scenarios based on max-likelihood when SNR is approximately log n. In the CS setting the compression rate increases nearly linearly with sparsity and goes to zero as sparsity goes to zero. In contrast, for the SNET scenario the compression rate goes to a constant.

m ≈ n(η_0 + η_2)²    (45)

This implies that the number of measurements generally scales with the signal dimension n for support recovery even when the SNR is sufficiently large but not infinite (i.e., η_2 ≠ 0). A particularly adverse effect is the difference between high and low levels of sparsity. At high levels of sparsity α the overhead due to the SNR factor η_2 is relatively small and therefore the number of measurements grows linearly with sparsity. In contrast, at low levels of sparsity the SNR overhead is significant and the number of measurements is nearly equal to the signal dimension. This is schematically shown in Figure 2. This has fundamental consequences for sensor networks: it implies that we cannot generally hope to achieve compression in a sensor network scenario, unlike the compressed sensing scenario. We point out that these results are somewhat sharper than those obtained through convex optimization techniques such as LASSO, as seen in Table II. Not only do we require smaller SNR but we also admit constant values of sparsity, as opposed to the vanishing sparsity levels of the LASSO method.
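The contrast in Figure 2 can be reproduced qualitatively from the expressions above. The sketch below evaluates m_CS ≈ 6nH(2α) from Equation (41) and m_SNET ≈ n(η_0 + η_2)² from Equation (45) with SNR ≈ log n; the value of β and the choice of n are assumptions picked only so that the SNR overhead η_2 is of order one, and the numbers should be read as trends rather than as the paper's plotted values.

```python
# Qualitative comparison of compression rates m/n for CS and SNET support recovery.
import numpy as np

def binary_entropy(p):                        # natural log, as in Theorem 4.2
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log(p) - (1 - p) * np.log(1 - p)

def compression_rates(n, alpha, beta=6.0):    # beta chosen so the SNR overhead eta2 is O(1)
    snr = np.log(n)                           # moderate (logarithmic) SNR scaling
    eta2 = np.sqrt(32 * np.log(2 * n) / (beta ** 2 * snr))
    eta0 = np.sqrt(3) * (np.sqrt(alpha) + np.sqrt(2 * binary_entropy(alpha)))
    rate_cs = 6 * binary_entropy(2 * alpha)   # m/n from Eq. (41)
    rate_snet = (eta0 + eta2) ** 2            # m/n from Eq. (45)
    return rate_cs, rate_snet

for a in (1e-6, 1e-4, 1e-2, 1e-1):
    print(a, compression_rates(n=10**6, alpha=a))
```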


Remark 4.1: It can be seen that for the parameter space of exactly k-sparse sequences given by

Ξ = {X | Card(Supp(X)) = k, |X_i| > β for i in the support}    (46)

in the sufficiency results described above the requirement on the scaling of SNR can be sharpened from O(log 2n) to O(log 2(n − k)). This is due to the fact that for a fixed k-sparse sequence X_0 with a given support, all the support errors can only occur in the n − k other locations.

To this end, note that the effect of the sensing architecture is not clear in the above analysis. In this context an alternative analysis tool is to employ Fano's inequality [14]. The main drawbacks are that it could be loose and is applicable only to discrete alphabets. Nevertheless, it lends itself to tractable analysis for understanding the effect of sensing diversity, as shown in [17]. In addition we also want to understand the performance when we relax the reconstruction criterion to tolerate some distortion in reconstruction. As we will see in the following section, if we relax our objective to average distortion instead of exact support recovery, the lower bound is reasonably tight. In this context the lower bound on the probability of error given by Lemma 4.1 can also be applied over the set of rate distortion quantization points for random sources or over the set of ǫ-net coverings of parameter spaces for non-random sources, as considered in [22]. Nevertheless this approach requires calculation of the Kullback-Leibler (KL) distance between pairs of probability distributions, which is hard to compute over an exponential set of points for arbitrary signal spaces. In contrast to this approach, in the following we derive a novel lower bound directly in terms of the rate distortion of the source and the mutual information between the source and the observations conditioned on the sensing channel(s). Moreover, it can also be shown that the lower bound we obtain is tighter than that in Lemma 4.1, since the average KL distance between the hypotheses upper bounds the mutual information.

V. BOUNDS TO PROBABILITY OF ERROR FOR AVERAGE DISTORTION

In this section we begin by developing a modified Fano’s inequality to handle arbitrary alphabets, in particular real-valued signals, and to handle arbitrary distortion in reconstruction. We will then derive Max-Likelihood (ML) upper bounds to the probability of error in reconstruction for the average distortion criteria. The lower and upper bounds to probability of error are subsequently used to derive upper bounds and lower bounds to sensing capacity for the sparse real-valued signals X under the measurements Y satisfying equation (10) for average distortion criteria. As pointed out in the previous section this direction holds forth two advantages: (a) Average distortion leads to non-vanishing sensing capacity even for constant SN R; (b) the effect of sensing architecture can be more easily discerned through information


theoretic inequalities. Based on these results we will show that the upper and lower bounds for sensing capacity are close for large but fixed SNR.

A. Modified Fano's Inequality

In the following we will use X and X^n interchangeably. Though the theorem derived below uses an i.i.d. assumption on the components of X, it is easy to see that the proof extends to all sources for which the Asymptotic Equipartition Property (AEP) [14] holds and for which the rate distortion theorem [14] holds. For the sake of clarity and ease of exposition we assume in the following that the components of X are i.i.d.

Lemma 5.1: Given observation(s) Y for the sequence X^n ≜ {X_1, ..., X_n} of random variables drawn i.i.d. according to P_X, let X̂^n(Y) be the reconstruction of X^n from Y. Let the distortion measure be given by d(X^n, X̂^n(Y)) = Σ_{i=1}^n d(X_i, X̂_i(Y)). Then,

P( (1/n) d(X̂^n(Y), X^n) ≥ d_0 ) ≥ [ R_X(d_0) − K(d_0, n) − (1/n) I(X^n; Y) − o(1) ] / R_X(d_0)    (47)

where K(d_0, n) is bounded by a constant and R_X(d_0) is the corresponding (scalar) rate distortion function for X.

Proof: Here we provide an outline of the proof without technical details; the detailed proof can be found in the Appendix. The proof of Lemma 5.1 closely follows the proof of Fano's inequality [23], where we start with the distortion error event

E_n = 1 if (1/n) d(X^n, X̂^n(Y)) ≥ d_0, and E_n = 0 otherwise    (48)

We then consider the following expansion,

H(f (X n ), En |Y) = H(f (X n )|Y) + H(En |f (X n ), Y) = H(En |Y) + H(f (X n )|En , Y)

where f (X n ) is the vector-rate distortion mapping subject to an average distortion level d0 under the measure d(., .), [23]. This implies that,


H(f(X^n)|Y) = H(E_n|Y) − H(E_n|f(X^n), Y) + H(f(X^n)|E_n, Y)    (49)
            = I(E_n; f(X^n)|Y) + H(f(X^n)|E_n, Y)    (50)
            ≤ H(E_n) + H(f(X^n)|E_n, Y)    (51)

Note that H(E_n) ≤ 1. Thus we have

H(f(X^n)|Y) ≤ 1 + P_n^e H(f(X^n)|Y, E_n = 1) + (1 − P_n^e) H(f(X^n)|Y, E_n = 0)    (52)

⇒ P_n^e ≥ [ H(f(X^n)|Y) − H(f(X^n)|Y, E_n = 0) − 1 ] / [ H(f(X^n)|Y, E_n = 1) − H(f(X^n)|Y, E_n = 0) ]

Now note that I(f (X n ); X n ) = H(f (X n )) and

H(f(X^n)|Y) = H(f(X^n)) − I(f(X^n); Y) ≥ H(f(X^n)) − I(X^n; Y)

This implies that,

P_n^e ≥ [ I(f(X^n); X^n) − I(X^n; Y) − H(f(X^n)|Y, E_n = 0) − 1 ] / [ H(f(X^n)|Y, E_n = 1) − H(f(X^n)|Y, E_n = 0) ]

Since f(X^n) is the rate distortion mapping, note that by definition of the rate distortion function I(f(X^n); X^n) ≥ nR_X(d_0), and w.h.p.

H(f(X^n)|Y, E_n = 1) ≤ log |Range(f(X))| → nR_X(d_0)

Identifying (1/n) H(f(X^n)|Y, E_n = 0) = K(n, d_0), we get the result.

Essentially, K(n, d_0) = (1/n) × log(# neighbors of a quantization point in an optimal n-dimensional rate-distortion mapping). In the derivation of the lower bound the mapping f(X^n) is a free parameter that can be optimized. It turns out that the optimal choice for f(X^n) is the one that achieves the rate distortion performance at distortion level d_0. To see this, first note that H(f(X^n)|Y, E_n = 1) and


H(f (X n )) depend on the range of the mapping f (X n ), i.e. the number of quantization points chosen

say N . For a good lower bound one may want to choose it as large as possible. On the other hand H(f (X n )|Y, En = 0) depends on the number of quantization points contained in the distortion ball

of radius d0 . We want to keep this as low as possible since it decreases the lower bound. It can be seen that selecting a mapping finer than at level d0 (i.e. finer quantization) will increase both N and H(f (X n )|Y, En = 0). This in fact balances out and the lower bound does not change. On the other hand

selecting a coarser mapping decreases the range of f (X n ) and hence decrease H(f (X n )|Y, En = 1), H(f (X n )) and H(f (X n )|Y, En = 0). This decreases both the numerator and the denominator and

thus the lower bound is looser. Thus selecting the mapping f(X^n) to be the rate distortion mapping at distortion level d_0 yields the tightest lower bound.

1) Discrete X under Hamming Distortion:

Lemma 5.2: Given observation(s) Y for the sequence X n , {X1 , ..., Xn } of random variables

drawn i.i.d. according to P_X with X_i ∈ X, |X| < ∞, let X̂^n(Y) be the reconstruction of X^n from Y. For the Hamming distortion measure d_H(·, ·) and for distortion levels

d_0 ≤ min( 1/2, (|X| − 1) min_{X∈X} P_X )    (53)

we have

P( (1/n) d_H(X^n, X̂^n(Y)) ≥ d_0 ) ≥ [ nR_X(d_0) − I(X^n; Y) − 1 − log nd_0 ] / [ n log(|X|) − n( h(d_0) + d_0 log(|X| − 1) + (log nd_0)/n ) ]    (54)

Proof: See Appendix.

Remark 5.1: The extended Fano’s inequality can be easily seen to hold for arbitrary sensing functions, arbitrary distortion measures and source distributions. Moreover the terms involved in the lower bound capture the effect of the relevant parameters namely, the source distribution and distortion in reconstruction via the rate distortion function and the effect of SNR and sensing functions via the mutual information term. In this way one can study the effect of various sensor topologies on the performance. Several results using this approach have been quantified in a previous paper [17]. In addition one can directly study the performance with respect to arbitrary solution objectives, though in the present paper we will concentrate on reconstruction of the source itself, i.e. the solution objective is an identity mapping. In the next section we will provide an upper bound to the probability of error for field estimation problems. As shown before even for discrete spaces exact reconstruction implies a zero sensing capacity. Thus in the following we will focus on the case when a certain distortion in reconstruction is allowed.


B. Upper Bounds

In this section we provide a constructive upper bound to the probability of error in reconstruction subject to an average squared distortion level. Unlike the lower bounds, here we provide upper bounds for the particular observation model of equation (10). This could potentially be generalized, but we keep our focus on the problem at hand. To this end assume that we are given a minimal cover as prescribed by the following lemma of [23].

Lemma 5.3: Given ǫ > 0 and the distortion measure d_n(·, ·), let N_ǫ(n, d_0) be the minimal number of points Z_1^n, ..., Z_{N_ǫ(n,d_0)}^n ⊂ X^n satisfying the covering condition

P_{X^n}( ∪_{i=1}^{N_ǫ(n,d_0)} B_i ) ≥ 1 − ǫ

Let N_ǫ(n, d_0) be the minimal such number. Then,

limsup_n (1/n) log N_ǫ(n, d_0) = R_X(ǫ, d_0)

where B_i ≜ { X^n : (1/n) d_n(X^n, Z_i^n) ≤ d_0 }, i.e., the d_0-distortion balls around the points Z_i^n, and where R_X(ǫ, d_0) is the infimum of the ǫ-achievable rates at distortion level d_0.

This means that there exists a function f(X) : X^n → {Z_i^n} such that P( (1/n) d(X, Z_i) ≤ d_0 ) ≥ 1 − ǫ. For sufficiently large n, we will assume in the following that X belongs to the typical set. For such a cover the points Z_i ≐ Z_i^n correspond to the rate-distortion quantization points. Since all the sequences in the typical set are equiprobable, we convert the problem to a max-likelihood detection set-up over the set of rate-distortion quantization points given by the minimal cover as follows. Given G and the rate distortion points Z_i corresponding to the functional mapping f(X^n), we enumerate the set of points GZ_i ∈ R^m.

Then, given the observation Y, we map Y to the nearest point (in R^m) GZ_i. We then ask for the following probability,

P_e(i, j) = P{ X ∈ B_i → Z_j | d(B_i, B_j) ≥ 2d_0, G }    (55)

In other words we are asking for the pairwise probability of error in mapping a signal that belongs to the distortion ball Bi to the quantization point Zj of the distortion ball Bj under the noisy mapping GX+N such that the set distance between the distortion balls is ≥ 2d0 , see figure 3. Under the minimum

distance estimator we have

P_e(i, j) = P( ||GX + N − GZ_i||² ≥ ||GX + N − GZ_j||² )    (56)

where we have omitted the conditioning variables for brevity. Simplifying the expression inside the probability we get

P_e(i, j) = P( 2Nᵀ G(Z_j − Z_i)/||G(Z_j − Z_i)|| ≥ ( ||G(X − Z_j)||² − ||G(X − Z_i)||² ) / ||G(Z_j − Z_i)|| )    (57)

Under the assumption that the noise N is AWGN with power N_0 in each dimension, its projection onto the unit vector G(Z_j − Z_i)/||G(Z_j − Z_i)|| is also AWGN with power N_0 in each dimension. Thus we have

P_e(i, j) = P( N ≥ ( ||G(X − Z_j)||² − ||G(X − Z_i)||² ) / (2||G(Z_j − Z_i)||) )    (58)
         ≤ P( N ≥ min_{X∈B_i} ( ||G(X − Z_j)||² − ||G(X − Z_i)||² ) / (2||G(Z_j − Z_i)||) )    (59)

distance from the quantization point Zi within the distortion ball Bi . For the case of squared distortion and covering via spheres of average radius d0 , it turns out that the √ worst case X is given by X = 3Zi4+Zj and ||Zi − Zj || = 4 nd0 . Plugging this value in the expression we have for the worst case pairwise probability of error that   ||G(Zi − Zj )|| Pe (i, j) ≤ P N ≥ 4   ||G(Zi − Zj )||2 ≤ exp − 32N0

(60) (61)

where the second inequality follows by a standard approximation to the Q(.) function. Now we apply the union bound over the set of rate distortion quantization points Zj minus the set of points that are the neighbors of Zi (see figure 3). For reasonable values of distortion d0 , the total number of such points still behaves as ∼ 2nRX (d0 ) , where RX (d0 ) is the scalar rate distortion function, [14]. Hence we have,   p ||G(Zi − Zj )||2 2 ˆ : ||Zi − Zj || = 4 nd0 2nRX (d0 ) Pe (||X − X|| ≥ 2nd0 ) ≤ exp − 32N0 August 6, 2009

(62)

DRAFT

25

Fig. 3.

Figure illustrating the minimum distance decoding under the Maximum Likelihood set-up over the rate

distortion quantization points.

VI. S ENSING

CAPACITY FOR

A PPROXIMATE R ECOVERY

In rest of the paper we will explicitly evaluate sensing capacity for the fixed SNR linear observation models for SNET and CS scenarios based on the upper and lower bounds to the probability of error obtained above for approximate recovery. Unlike the previous case here we will limit ourselves to a Bayesian framework where we model the sparse sensing domain through the following mixture model,

PX = αN (µ1 , σ12 ) + (1 − α)N (µ0 , σ02 )

(63)

We will evaluate results for the sensing ensemble Φ = G ∈ Rm×n : Gij ∼ (1 − ξ)N (µ0 , σ02 ) + ξN (µ1 , σ12 )

(64)

with appropriate row-wise normalization in SNET case and column-wise normalization in the CS case.

A. Bounds to approximate recovery for the SNET case 1) Discrete X, full diversity: Under this case X is drawn i.i.d. according to, PX = αN (1, 0) + (1 − α)N (0, 0)

August 6, 2009

(65)

DRAFT

26

i.e., X is a Bernoulli(α) sequence. Also note that α ≤ 1/2 in order for X to be sparse. For this case we have the following lemma. Lemma 6.1: With Hamming distance as the distortion measure, for a diversity ratio of ξ = 1 for the Gaussian ensemble for G and for d0 < α, the sensing capacity C obeys, 1 2

d0 log2 (1 + SNR ) 2 ≤ C(α, SN R, d0 ) ≤ H(α) − H(d0 )

1 2

log2 (1 + αSNR) H(α) − H(d0 )

(66)

where H(.) denotes the binary entropy function. Note that d0 < α as it does not make sense to reconstruct with a higher distortion that α. Proof: ¿From lemma 5.2 the probability of error is lower bounded by zero if the numerator in the lower bound is negative, this implies for any m, n that

Cm,n (d0 , G) ≤

1 m I(X; Y|G) RX (d0 ) − n1 − lognnd0

(67)

Note that for the binary alphabet RX (d0 ) = H(α) − H(d0 ). Further, since G is random we take expectation over G and bound the mutual information as follows,

EG I(X n ; Y|G) ≤ ≤

Pmax PX : n1 EXi2 ≤α

PX :

1 EG log det(Im×m + GXXT GT ) 2

log det(Im×m Pmax 1 EXi2 ≤α n

+ EG GXXT GT SNR) =

(68) m log(1 + αSN R) 2

(69)

Letting m, n → ∞ the result follows. The proof of the lower bound follows from the upper bound to the probability of error in approximate recovery derived in section V-B by taking expectation with respect to G. 2) Continuous X, full diversity: Under this case X is drawn i.i.d. according to,

PX = αN (1, 1) + (1 − α)N (0, 0)

(70)

Again note that α ≤ 1/2 in order for X to be sparse. For this case we have the following lemma. Lemma 6.2: With squared distance as the distortion measure, for diversity ratio ξ = 1 for the Gaussian ensemble for G and for values of d0 ≤ α2 , the sensing capacity C obeys, 1 log2 (1 + d0 SNR ) 2 2 log2 (1 + αSNR) ≤ C(α, SN R, d ) ≤ 0 α α H(α) + 2 log2 d0 H(α) + α2 log 2dα0

1 2

August 6, 2009

(71)


Fig. 4. Plot of sparsity versus upper bounds to the sensing capacity for various SNRs for the binary case (X = {0, 1}) for zero Hamming distortion.

Notice that here d0 ≤ α/2 and for reasonable reconstruction one typically desires d0 = ǫα for some ǫ > 0.

Proof: From Lemma 6.1 we have that E_G I(X; Y|G) ≤ (m/2) log(1 + α·SNR). In order that the probability of error be lower bounded by zero, from Lemma 5.1 it follows that asymptotically

n/m ≤ (1/m) E_G I(X; Y|G) / ( R_X(d_0) − K(d_0, n) )    (72)

It can be shown that |K(d_0, n) − 0.5α log 2| < ǫ with ǫ arbitrarily small for large enough n, see e.g. [24]. The lemma then follows by noting that (see [25])

R_X(d_0) = H(α) + (α/2) log(α/d_0),  if 0 < d_0 ≤ α    (73)

The proof of the lower bound follows from the upper bound to the probability of error in approximate recovery derived in Section V-B by taking the expectation with respect to G.

We now point out several interesting facts. For the binary as well as the continuous alphabet case, sensing capacity is a function of sparsity and SNR. Further note that as α ↓ 0 the sensing capacity goes to zero. This implies that in SNR limited sensor networks it is difficult to detect very sparse events. This is shown in Figure 4.

B. Bounds to approximate recovery for the CS case

1) Discrete X, full diversity: For this case we have the following lemma.


Lemma 6.3: With Hamming distance as the distortion measure, for a diversity ratio of ξ = 1 for the Gaussian ensemble for G and for d_0 < α, the sensing capacity C obeys

(1/2) log₂(1 + n·d_0·SNR/(2m)) / (H(α) − H(d_0)) ≤ C(α, SNR, d_0) ≤ (1/2) log₂(1 + α·n·SNR/m) / (H(α) − H(d_0))    (74)

where H(α) denotes the binary entropy function. Note that d_0 < α.

Proof: The proof of the upper bound follows along the same lines as that of Lemma 6.1, with the following upper bound to the mutual information:

E_G I(X^n; Y|G) ≤ (m/2) log(1 + n·α·SNR/m)    (75)

The proof of the lower bound follows from the upper bound to the probability of error in approximate recovery derived in Section V-B by taking the expectation with respect to G.

2) Continuous X, full diversity: For this case we have the following lemma.

Lemma 6.4: With squared distance as the distortion measure, for diversity ratio ξ = 1 for the Gaussian ensemble for G and for values of d_0 ≤ α/2, the sensing capacity C obeys

(1/2) log₂(1 + n·d_0·SNR/(2m)) / ( H(α) + (α/2) log(α/d_0) ) ≤ C(α, SNR, d_0) ≤ (1/2) log₂(1 + n·α·SNR/m) / ( H(α) + (α/2) log(α/(2d_0)) )    (76)

Notice that here d_0 ≤ α/2, and for reasonable reconstruction one typically desires d_0 = ǫα for some ǫ > 0.

Proof: The proof is similar to the proof of Lemma 6.2.

C. Contrast between the CS and SNET scenarios for approximate recovery

We will now compare the SNET and the CS cases in terms of achievable sensing capacity. For the CS case we have

C_CS ≥ (1/2) log(1 + n·d_0·SNR/(2m)) / R_X(d_0) = (1/2) log(1 + C·d_0·SNR/2) / R_X(d_0)    (77)

The above equation is implicit in the sensing capacity C. Note that for C = 0 there is equality. It can be seen that the inequality is satisfied for positive C, for example for C = 1/(2R_X(d_0)) and SNR ≥ 4R_X(d_0)/d_0. This implies

m ≈ 2nR_X(d_0).

On the other hand, for the SNET case we have


Fig. 5. (a) Sensing capacity as a function of distortion for the CS and SNET cases for SNR = 6R_X(d_0)/d_0 for the binary alphabet case with fixed sparsity ratio α = 0.25. Note that for the CS case sensing capacity increases for decreasing R_X(d_0), whereas for the SNET case it remains essentially constant. (b) The corresponding compression rate difference between the CS and SNET cases; more compressibility is achieved in the CS scenario.

$$C_{SNET} \;\ge\; \frac{\tfrac{1}{2}\log\!\left(1 + \tfrac{d_0\, SNR}{2}\right)}{R_X(d_0)} \qquad (78)$$
This implies that $m \approx \frac{2 n R_X(d_0)}{\log\left(1 + \frac{d_0\, SNR}{2}\right)}$. For d0 small we arrive at $m \approx \frac{4 n R_X(d_0)}{d_0\, SNR}$. Thus for SNR = 4 R_X(d_0)/d_0, in this case m ≈ n. Thus the number of measurements required does not scale with the rate distortion function. This difference is shown in Figure 5 for the binary alphabet case. The plots are generated for the CS scenario using the Padé approximation log(1 + x) ≈ x(6 + x)/(6 + 4x).
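To make the comparison concrete, the two achievable-capacity expressions above can be evaluated numerically. The sketch below is our own illustration (not the authors' code): it assumes the binary-alphabet rate distortion function R_X(d0) = H(α) − H(d0), base-2 logarithms, exact logs rather than the Padé approximation used for the plots, and a simple fixed-point iteration for the implicit CS relation.

```python
import numpy as np

def H2(p):
    """Binary entropy in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def cs_capacity(alpha, d0, snr, iters=200):
    """Positive solution of the implicit CS relation
    C = 0.5*log2(1 + C*d0*snr/2) / R_X(d0), via fixed-point iteration."""
    R = H2(alpha) - H2(d0)            # rate-distortion of the binary source (Hamming distortion)
    C = 1.0                           # any positive starting point converges to the positive fixed point
    for _ in range(iters):
        C = 0.5 * np.log2(1.0 + C * d0 * snr / 2.0) / R
    return C

def snet_capacity(alpha, d0, snr):
    """Achievable sensing capacity for the SNET (per-sensor fixed SNR) model."""
    R = H2(alpha) - H2(d0)
    return 0.5 * np.log2(1.0 + d0 * snr / 2.0) / R

alpha = 0.25
for d0 in (0.05, 0.10, 0.15, 0.20):
    snr = 6 * (H2(alpha) - H2(d0)) / d0     # SNR level used for the plots in Fig. 5
    print(f"d0={d0:.2f}  C_CS={cs_capacity(alpha, d0, snr):.2f}  "
          f"C_SNET={snet_capacity(alpha, d0, snr):.2f}")
```

Under these assumptions the CS capacity grows sharply as d0 increases (i.e., as R_X(d0) shrinks), while the SNET capacity stays close to a constant, which is the qualitative gap visible in Fig. 5.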

VII. EFFECT OF SENSING ARCHITECTURE

In this section we comment on the full power of the information theoretic method, which can be used to evaluate and compare the performance of various sensor network configurations. We will focus on the SNET scenarios; nevertheless the results can be easily extended to CS scenarios. Also, for the sake of brevity, in this paper we omit the details of the results and derivations and mainly point out several interesting results and their consequences. The reader is referred to [17] for further details.

Using the result on the lower bound on the probability of error given by Lemma 5.1, a necessary condition for reconstruction to within an average distortion level d0 to be feasible is immediately identified, namely, $R_X(d_0) - K(n, d_0) \le \frac{1}{n} I(X; Y \mid G)$. For a fixed prior on X the performance is


then determined by the mutual information term, which in turn depends on G. This motivates us to consider the effect of the structure of G on performance: by evaluating I(X; Y|G) for various ensembles of G we quantify the performance of many different scenarios that restrict the choice of G for sensing. When G is chosen independently of X and drawn randomly from an ensemble of matrices (to be specified later in the problem set-up), we have

$$I(X; Y, G) = \underbrace{I(X; G)}_{=0} + I(X; Y \mid G) \qquad (79)$$
$$= I(X; Y) + I(X; G \mid Y) \qquad (80)$$
$$\Rightarrow\; I(X; Y \mid G) = I(X; Y) + I(X; G \mid Y) \qquad (81)$$

This way of expanding allows us to isolate the effect of the structure of the sensing matrix G on the performance. In [17] we evaluated upper bounds to sensing capacity for various sensing configurations, such as (a) sensing diversity, which is the measure of field coverage per sensor as outlined in the problem set-up; (b) a deterministic filtering architecture, motivated by computational advantages in sensing using FFT methods; and (c) {0, 1} sensing matrices for sensors that cannot modulate the gains in different directions. For all these cases we reported the following general design principle: sensing capacity goes down as sensing diversity goes down.

Since the structure of G affects sensing capacity via its effect on the mutual information I(X; Y|G), we have the following lemma for the Gaussian ensemble for G.

Lemma 7.1: For a diversity ratio of ξ, with l = ξn as the average diversity per sensor, we have
$$E_G\, I(X; Y \mid G) \;\le\; \frac{m}{2}\, E_j \log\!\left(\frac{SNR}{l}\, j + 1\right), \qquad (82)$$
where the expectation is evaluated over the distribution
$$P(j) = \frac{\binom{k}{j}\binom{n-k}{l-j}}{\binom{n}{l}}$$

In the above lemma, j plays the role of the number of overlaps between the projection vector and the sparse signal. As the diversity is reduced this overlap reduces; the reduction in overlap reduces the effective SNR and thus the mutual information. The proof of this result can be found in [26]. Moreover, for random selection of the non-zero entries of G (or look directions) we observe a saturation effect of sensing diversity, wherein all the benefits are realized at a fraction of full diversity; see Figure 6. This may lead to further computational savings.
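The saturation effect can be checked directly from the bound in Lemma 7.1. The sketch below is our own illustration; SciPy's hypergeometric pmf, base-2 logarithms, and the chosen value of m are our assumptions (the bound is linear in m, so only the per-sensor value matters here).

```python
import numpy as np
from scipy.stats import hypergeom

def mi_upper_bound(n, m, k, xi, snr):
    """Upper bound of Lemma 7.1: (m/2) * E_j[ log2(1 + snr*j/l) ],
    with j ~ Hypergeometric(n, k, l) and l = xi*n the per-sensor diversity."""
    l = max(1, int(round(xi * n)))
    j = np.arange(0, min(k, l) + 1)
    pj = hypergeom.pmf(j, n, k, l)           # P(j) = C(k,j) C(n-k,l-j) / C(n,l)
    return 0.5 * m * np.sum(pj * np.log2(1 + snr * j / l))

n, k, m = 60, 15, 30                          # n, k, SNR as in Fig. 6; m is illustrative
snr = 10 ** (10 / 10)                         # 10 dB
for xi in (0.1, 0.25, 0.5, 0.75, 1.0):
    print(xi, mi_upper_bound(n, m, k, xi, snr) / m)   # per-sensor information; saturates in xi
```

The per-sensor quantity printed above increases quickly at low diversity ratios and then flattens out, mirroring the saturation behavior shown in Fig. 6.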


Fig. 6. The gap between the upper bounds to sensing capacity in very low diversity and full diversity for the binary alphabet case (left: sensing capacity vs. sparsity ratio α). Shown on the right is the sensing capacity as a function of the diversity ratio ξ for fixed sparsity; note the saturation effect with diversity ratio. For the plots we used SNR = 10 dB, n = 60, k = 15. The upper bound from Lemma 7.1 was used to evaluate the sensing capacity for various levels of diversity l = ξn.

For the {0, 1} ensemble, sensing diversity corresponds to the number of ones in each row of the sensing matrix. For this ensemble we show in [17] that for a fixed diversity ξ > 0, random sampling by the sensors is better than contiguous location sampling. The two types of sampling are shown in Figure 7 (a small illustrative sketch of the two row structures is given after the Fig. 7 caption below). The difference in performance is shown in Figure 8 for the case of a binary sensing domain.

VIII. APPENDIX

A. Proof of Lemma 4.2

For each X ∈ Ξ the observation is Y = GX + N, with distribution denoted $Y \overset{d}{\sim} P_X$. We next consider the equivalence class of all sequences with the same support and lump the corresponding class of observation probabilities into a single composite hypothesis, i.e.,

$$[X] = \{X' \in \Xi \mid \mathrm{Supp}(X') = \mathrm{Supp}(X)\} \qquad (83)$$
Each equivalence class bears a one-to-one correspondence with binary valued k-sparse sequences:
$$\Xi_0 = \{X \in \Xi \mid X_j = \beta,\; j \in \mathrm{Supp}(X)\} \qquad (84)$$

Our task is to lower bound the worst-case error probability


Fig. 7. Illustration of random sampling vs. contiguous sampling in a sensor network. This leads to different structures on the sensing matrix, which in turn lead to different performance.
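For reference, the two row structures sketched in Fig. 7 can be generated as follows. This is a hypothetical illustration of the {0, 1} ensembles discussed above; the function name and sizes are ours, and it is not the construction used to produce Fig. 8.

```python
import numpy as np

def zero_one_matrix(m, n, l, contiguous=False, rng=None):
    """{0,1} sensing matrix with l ones per row (diversity ratio xi = l/n).
    contiguous=False: each sensor samples l random locations;
    contiguous=True : each sensor samples a contiguous window of l locations."""
    rng = rng or np.random.default_rng()
    G = np.zeros((m, n), dtype=int)
    for i in range(m):
        if contiguous:
            start = rng.integers(0, n - l + 1)
            G[i, start:start + l] = 1
        else:
            G[i, rng.choice(n, size=l, replace=False)] = 1
    return G

G_rand = zero_one_matrix(m=6, n=12, l=3)                   # randomized sampling
G_cont = zero_one_matrix(m=6, n=12, l=3, contiguous=True)  # contiguous sampling
print(G_rand, G_cont, sep="\n\n")
```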

Fig. 8. A comparison of the upper bounds to sensing capacity for randomized sampling vs. contiguous sampling (α = 1/12; vertical axis: upper bound to sensing capacity, horizontal axis: diversity ratio). X follows the Bernoulli model and the ensemble for G is the {0, 1} ensemble.

$$P_{e|G} = \min_{\hat X}\, \max_{X \in \Xi}\, P_X([\hat X] \ne [X] \mid G) \qquad (85)$$
Now note that,
$$\max_{X \in \Xi} P_X([\hat X] \ne [X] \mid G) \;\ge\; \max_{X \in \Xi_0} P_X([\hat X] \ne [X] \mid G) \;=\; \max_{X \in \Xi_0} P_X(\hat X \ne X,\ \hat X \in \Xi_0 \mid G) \qquad (86)$$
This implies that

$$P_{e|G} = \min_{\hat X \in \Xi}\, \max_{X \in \Xi}\, P_X\{\mathrm{Supp}(\hat X) \ne \mathrm{Supp}(X) \mid G\} \;\ge\; \min_{\hat X \in \Xi}\, \max_{X \in \Xi_0}\, P_X(\hat X \ne X,\ \hat X \in \Xi_0 \mid G) \qquad (87)$$
$$= \min_{\hat X \in \Xi_0}\, \max_{X \in \Xi_0}\, P_X(\hat X \ne X,\ \hat X \in \Xi_0 \mid G) \qquad (88)$$
$$\ge \min_{\hat X \in \Xi_0}\, \max_{X \in \Xi_1}\, P_X(\hat X \ne X,\ \hat X \in \Xi_0 \mid G) \qquad (89)$$

B. Proof of Theorem 4.1

From Lemma 4.2 it is sufficient to focus on the case when X belongs to k-sparse sequences in {0, β}^n and any subset of these sequences. We will establish the first part of the theorem as follows. Let Ξ_1 be a subset of η-sparse binary valued sequences defined as follows: let X_0 ∈ Ξ_1 be an arbitrary element with support size |Supp(X_0)| = η − 1. Next choose n elements X_j, j = 1, 2, . . . , n, with support size equal to η and at unit Hamming distance from X_0. Denote by P_j, 0 ≤ j ≤ n, the induced observed probability kernels. Under the AWGN noise model, for a fixed sensing channel G and a fixed set of elements X_j, the probability kernels are Gaussian, i.e.,
$$H_j : \; Y \overset{d}{\sim} P_j \equiv \mathcal{N}\!\left(\sqrt{SNR}\, G X_j,\, \sigma^2\right), \quad j = 0, 1, \ldots, n \qquad (90)$$

Furthermore we have n + 1 hypotheses. Consider now the support recovery problem. It is clear that the error probability can be mapped into a corresponding hypothesis testing problem. For this we consider θ(Y) as an estimate of one of the n + 1 distributions above, and we have the following set of inequalities:
$$P_{e|G} = \max_{X \in \Xi_1} P_X(\hat X \ne X \mid G) = \max_j P_j(\theta(Y) \ne P_j \mid G) \;\ge\; \frac{1}{n+1} \sum_{j=0}^{n} P_j(\theta(Y) \ne P_j \mid G) \qquad (91)$$

where we write Pe|G to point out that the probability of error is conditioned on G. Applying Lemma 4.1 it follows that the probability of error in exact support recovery is:

$$P_{e|G} \;\ge\; \frac{\log(n+1) - \frac{1}{n^2} \sum_{i,j,\, i \ne j} D(P_i \| P_j) - 1}{\log(n+1)} \qquad (92)$$

We now lower bound D(P_i ‖ P_j). We observe that under the AWGN noise N we have
$$D(P_i \| P_j) \;\ge\; \min_{i,j,\, i \ne j} SNR\, \|G(X_i - X_j)\|^2 \qquad (93)$$


Now taking expectation over G we get,
$$P_e = E_G\, P_{e|G} \;\ge\; \frac{\log(n+1) - \frac{2\beta^2 SNR\, m}{n} - 1}{\log(n+1)} \qquad (94)$$

Now, to drive P_e → 0 requires the sensing capacity to be bounded by the expression in Equation 24. To establish the second upper bound we consider the family Ξ_1 of exactly k-sparse binary valued sequences, which form a subset of Ξ_0. Following similar logic as in the proof of the first part, for the set of exactly k-sparse sequences forming the corresponding $\binom{n}{k}$ hypotheses, we arrive at
$$P_e = E_G\, P_{e|G} \;\ge\; \frac{\log\binom{n}{k} - \frac{1}{\binom{n}{k}^2} \sum_{i,j,\, i \ne j} D(P_i \| P_j) - 1}{\log\binom{n}{k}} \qquad (95)$$
We compute the average pairwise KL distance:
$$\frac{1}{\binom{n}{k}^2} \sum_{i,j,\, i \ne j} D(P_i \| P_j) \;=\; \frac{1}{\binom{n}{k}} \sum_{j=1}^{k} SNR\, \|G(X_i - X_j)\|^2 \cdot \#(\text{sequences at Hamming distance } 2j) \qquad (96)$$

The equality follows from symmetry. Now taking expectations over G we have,
$$\frac{1}{\binom{n}{k}} \sum_{j=1}^{k} SNR\, E_G \|G(X_i - X_j)\|^2 \;=\; \frac{1}{\binom{n}{k}} \sum_{j=1}^{k} \frac{m}{n}\, SNR\, \|X_i - X_j\|^2 \qquad (97)$$
$$= \frac{1}{\binom{n}{k}} \sum_{j=1}^{k} \frac{m}{n}\, SNR\, \beta^2 (2j) \binom{k}{j} \binom{n-k}{j} \;=\; 2\beta^2\, SNR\, \alpha(1-\alpha)\, n\, \frac{m}{n} \qquad (98)$$

where the last equality follows from a standard combinatorial identity. The proof then follows by noting that $\log\binom{n}{k} \ge \alpha n \log\frac{1}{\alpha}$.

C. Proof of Corollary 4.1

The proof follows the same steps as in the proof of Theorem 4.1 up to Equation 94. Here we note that the ratio n/m is no longer a factor. Therefore, following the rest of the steps, we have that $2\beta^2 SNR \ge \log n$.


D. Proof of Theorems 4.2 and 4.3

We need the concept of the Restricted Isometry Property (RIP).

Definition 8.1: Let G = [g_j], j = 1, 2, . . . , n, be an m × n matrix with ||g_j|| = 1 for all j. For every integer 1 ≤ k ≤ n, let T denote an arbitrary subset of {1, 2, . . . , n} with |T| ≤ k, and let G_T = [g_j], j ∈ T. We define the k-restricted isometry constant δ_k to be the smallest quantity such that G_T obeys
$$(1 - \delta_k)\|X\|^2 \;\le\; \|G_T X\|^2 \;\le\; (1 + \delta_k)\|X\|^2, \quad \forall\, |T| \le k,\ \forall\, X \in \mathbb{R}^{|T|} \qquad (99)$$
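For small problem sizes the constant in Definition 8.1 can be computed by brute force. The following sketch is our own check (not from the paper); it only enumerates supports of size exactly k, which dominate the smaller supports by eigenvalue interlacing.

```python
import itertools
import numpy as np

def rip_constant(G, k):
    """Brute-force k-restricted isometry constant of a column-normalized matrix G:
    smallest delta with (1-delta)||x||^2 <= ||G_T x||^2 <= (1+delta)||x||^2 for all |T| <= k.
    Supports of size < k are covered because their Gram matrices are principal submatrices."""
    delta = 0.0
    for T in itertools.combinations(range(G.shape[1]), k):
        GT = G[:, T]
        eig = np.linalg.eigvalsh(GT.T @ GT)     # squared singular values of G_T
        delta = max(delta, 1 - eig.min(), eig.max() - 1)
    return delta

rng = np.random.default_rng(0)
m, n, k = 20, 30, 3
G = rng.normal(scale=1 / np.sqrt(m), size=(m, n))
G /= np.linalg.norm(G, axis=0)                  # enforce ||g_j|| = 1 as in Definition 8.1
print(rip_constant(G, k))
```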

We say G ∈ RIP(δ_k, α) if δ_k is the isometry constant for sparsity k = αn. For notational convenience we will drop the index k in δ_k in the following. Note that for the RIP property to make sense, (1 − δ) ≤ (1 + δ) and 1 − δ > 0; this implies that 0 < δ < 1. To this end, suppose that the sensing matrix G ∈ RIP(δ, 2α). We will now focus on E_1. This event is the union over all the support error events E_l, which we describe next. Let the true signal be X_0 with a given support. Let E_l denote a support error of l, i.e., the support of the estimate X̂_0 is wrong (missed detections + false alarms) at l places. Under the given estimator, the error event E_l of mapping X_0 to a sequence X such that d_supp(X_0, X) = l is given by

$$E_l = \left\{ N : \min_{X \in \Xi_0,\ d_{supp}(X_0, X) = l} \|Y - GX\|^2 \;\le\; \min_{\tilde X :\, d_{supp}(X_0, \tilde X) = 0} \|Y - G\tilde X\|^2 \right\} \qquad (100)$$

where d_supp(·, ·) is the number of places where the supports do not match. P(E_1) is then the probability of the union of events of this type over 1 ≤ l ≤ 2k, i.e.,
$$P(E_1) = P\!\left( \bigcup_{l=1}^{2k} E_l \right) \qquad (101)$$

Without loss of generality let |Supp(X_0)| = k_0 and assume that X_0 is supported on the first k_0 locations. Let the number of support errors be l, and fix the locations of the support errors. Let G_3 denote the matrix of columns of G where misses occur, G_1 the matrix of columns where the support matches, and G_2 the matrix of columns where false alarms occur. Then note that GX = G_1 X_1 + G_2 X_2 and GX_0 = G_3 X_{02} + G_1 X_{01}. Then consider the event,

$$S_l = \left\{ N : \min_{X_2 \in \Xi_0}\, \min_{X_1} \|G_3 X_{02} + G_1 X_{01} - G_1 X_1 - G_2 X_2 + N\|^2 \;\le\; \min_{\tilde X :\, d_{supp}(X_0, \tilde X) = 0} \|Y - G\tilde X\|^2 \right\} \qquad (102)$$


We will fix X_2 and perform the inner minimization first. The inner minimum is achieved at
$$X_{01} - X_1 = -(G_1^T G_1)^{-1} G_1^T (N + G_3 X_{02} - G_2 X_2) \qquad (103)$$

where (G_1^T G_1)^{-1} G_1^T = G_1^† is the pseudo-inverse of G_1. Plugging in the expressions we obtain,
$$\min_{X_2 \in \Xi_0}\, \min_{X_1} \|GX_0 - GX + N\|^2 = \min_{X_2 \in \Xi_0} \|N + G_3 X_{02} - G_2 X_2\|^2 - (N + G_3 X_{02} - G_2 X_2)^T G_1 G_1^\dagger (N + G_3 X_{02} - G_2 X_2) \qquad (104)$$
Now note that,

$$\min_{\tilde X :\, d_{supp}(X_0, \tilde X) = 0} \|Y - G\tilde X\|^2 = \|N\|^2 - N^T G_0 G_0^\dagger N \qquad (105)$$

where G_0 = [G_3 G_1]. For the sake of notational convenience we will drop stating the condition X_2 ∈ Ξ_0; in the following it should be assumed unless otherwise stated. Also note that X_{02} ∈ Ξ. Then, using the results derived so far, we have
$$S_l = \Big\{ N : \min_{X_2} (G_3 X_{02} - G_2 X_2)^T (I - G_1 G_1^\dagger)(G_3 X_{02} - G_2 X_2) - 2 N^T (I - G_1 G_1^\dagger)(G_3 X_{02} - G_2 X_2) + N^T (G_0 G_0^\dagger - G_1 G_1^\dagger) N \le 0 \Big\} \qquad (106)\text{--}(107)$$

First note that N^T (G_0 G_0^† − G_1 G_1^†) N ≥ 0, so we will ignore this term in subsequent calculations. Now note that H = (I − G_1 G_1^†) is the orthogonal projection onto the orthogonal complement of the range of G_1. To this end one can identify a necessary condition: for l = 1, if there is a column of the matrix [G_3 G_2] that falls in the range of G_1, then the right-hand side is zero, which implies a probability of error of 1. This will not happen as long as G has rank m and m ≥ 2k + 1. Assume for now that G has rank m and m ≥ 2k + 1; this is always true if G ∈ RIP(δ, 2α). Now note that,

$$S_l \subseteq \bigcup_{X_2} \left\{ N : 2 N^T H (G_3 X_{02} - G_2 X_2) \ge (G_3 X_{02} - G_2 X_2)^T H (G_3 X_{02} - G_2 X_2) \right\} \qquad (108)$$

Using the singular value decomposition G_1 = UΣV^*, one can show that (I − G_1 G_1^†) = U_{m−l_1} U_{m−l_1}^*, where U_{m−l_1} ∈ C^{m×(m−l_1)} is the matrix composed of the m − l_1 column vectors of U that span the orthogonal complement of the range of G_1 ∈ R^{m×l_1}. Thus we have,

$$S_l \subseteq \bigcup_{X_2} \left\{ N : 2 N^T U_{m-l_1} U_{m-l_1}^* (G_3 X_{02} - G_2 X_2) \ge (G_3 X_{02} - G_2 X_2)^T U_{m-l_1} U_{m-l_1}^* (G_3 X_{02} - G_2 X_2) \right\} \qquad (109)$$

$$S_l \subseteq \bigcup_{X_2} \left\{ N : 2 N^T U_{m-l_1} U_{m-l_1}^*\, [G_3\ G_2] \begin{bmatrix} X_{02} \\ -X_2 \end{bmatrix} \ge \begin{bmatrix} X_{02} \\ -X_2 \end{bmatrix}^T \begin{bmatrix} G_3^T \\ G_2^T \end{bmatrix} U_{m-l_1} U_{m-l_1}^*\, [G_3\ G_2] \begin{bmatrix} X_{02} \\ -X_2 \end{bmatrix} \right\} \qquad (110)$$

To this end denote $\tilde N^T = N^T U_{m-l_1}$, $\tilde G = U_{m-l_1}^* [G_3\ G_2]$, and $X' = \begin{bmatrix} X_{02} \\ -X_2 \end{bmatrix}$. Then we have,

$$S_l \subseteq \bigcup_{X_2 \in \Xi_0,\ X_{02} \in \Xi} \left\{ \tilde N : 2 \tilde N^T \tilde G X' \ge \|\tilde G X'\|^2 \right\} \;=\; \tilde S_l \qquad (111)$$
Lemma 8.1: If G ∈ RIP(δ, 2α) then
$$\tilde S_l \subset A_1(\delta) = \bigcup_{\tilde g \in \{\tilde g_1, \ldots, \tilde g_n\}} \left\{ \tilde N,\ |X'| = \beta/2 : 2 \tilde N^T \tilde g X' \ge \frac{1-\delta}{1+\delta} \|\tilde g X'\|^2 \right\} \qquad (112)$$

i.e., each event with l support errors is almost contained in the union of events with a support error of 1.

Proof: First note that if G ∈ RIP(δ, 2α) then U^* G ∈ RIP(δ, 2α), and for any fixed locations of the errors
$$\left\{ \tilde N : 2 \tilde N^T \tilde G X' \ge \|\tilde G X'\|^2 \right\} \subset \left\{ \tilde N : 2 \tilde N^T \tilde G X' \ge (1-\delta) \|X'\|^2 \right\} \qquad (113)$$

Now note that $\tilde G X' = \sum_{j=1}^{l} \tilde g_j X'_j$, where $\tilde g_j$ is the j-th column of $\tilde G$ and $X' = [X'_1, \ldots, X'_j, \ldots, X'_l]^T$. Note also that $\|X'\|^2 = \sum_j |X'_j|^2$. This implies that $\tilde S_l \subset \tilde E_1(\delta)$, where
$$\tilde E_1(\delta) = \bigcup_{X',\ \tilde g \in \{\tilde g_1, \ldots, \tilde g_l\}} \left\{ \tilde N : 2 \tilde N^T \tilde g X' \ge (1-\delta) \|X'\|^2 \right\} \subseteq \bigcup_{X',\ \tilde g \in \{\tilde g_1, \ldots, \tilde g_n\}} \left\{ \tilde N : 2 \tilde N^T \tilde g X' \ge \frac{1-\delta}{1+\delta} \|\tilde g X'\|^2 \right\} \qquad (114)$$


The result then follows by noting that |X'| ≥ β/2 and
$$\bigcup_{|X'| \ge \beta/2} \left\{ \tilde N : 2 \tilde N^T \tilde g X' \ge \frac{1-\delta}{1+\delta} \|\tilde g X'\|^2 \right\} \subseteq \left\{ \tilde N,\ |X'| = \beta/2 : 2 \tilde N^T \tilde g X' \ge \frac{1-\delta}{1+\delta} \|\tilde g X'\|^2 \right\} \qquad (115)$$
Now note that since every event E_l is contained in the union of events $\tilde S_l$ over all possible $\binom{n}{l}$ support error locations, and from Lemma 8.1 each of these $\tilde S_l$ is contained in A_1(δ), this implies that E_l ⊂ A_1(δ). Therefore,

P(E1 ) ≤ P(A1 (δ))

(116)

Thus we only need to upper bound the probability of the event A_1(δ). For this we will use the union bound over the n possible support error locations for the choices X' = β/2 and X' = −β/2. To this end we have the following lemmas.

Lemma 8.2: For the CS scenario, under the fixed SNR model,
$$P\!\left(A_1(\delta) \mid G \in RIP(\delta, 2\alpha)\right) \;\le\; e^{-\frac{(1-\delta)^2}{(1+\delta)^2} \frac{\beta^2 SNR}{32}}\; e^{\log 2n} \qquad (117)$$

Lemma 8.3: For the SNET scenario, under the fixed SNR model,
$$P\!\left(A_1(\delta) \mid G \in RIP(\delta, 2\alpha)\right) \;\le\; e^{-\frac{(1-\delta)^2}{(1+\delta)^2} \frac{\beta^2 SNR\, m}{32 n}}\; e^{\log 2n} \qquad (118)$$

The proofs of the above lemmas follow by using the union bound and by taking expectation over G. To this end we have the following lemma for the RIP constant δ, taken from [13].

Lemma 8.4 (Restricted Isometry Constant): Let the sparsity be α = k/n and consider the function
$$f(\alpha) := \sqrt{n/m}\,\left(\sqrt{\alpha} + \sqrt{2 H(\alpha)}\right)$$
where H(·) is the entropy function H(q) := −q log q − (1 − q) log(1 − q) defined for 0 < q < 1. For each ε > 0, the RIP constant δ of an m × n Gaussian matrix G whose elements are i.i.d. N(0, 1/m) obeys
$$P\!\left(1 + \delta \ge (1 + (1+\varepsilon) f(\alpha))^2\right) \;\le\; 2 \exp\!\left(-n H(\alpha)\varepsilon / 2\right) \qquad (119)$$

Lemma 8.4 implies that with probability exceeding $1 - e^{-n\varepsilon H(2\alpha)}$,
$$\frac{1-\delta}{1+\delta} \;\ge\; \frac{1-\eta_1}{1+\eta_1} \qquad (120)$$
where $\eta_1 = 2(1+\varepsilon) f(2\alpha) + (1+\varepsilon)^2 f^2(2\alpha)$.
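To get a feel for how this RIP-based condition behaves as the compression ratio n/m grows, one can evaluate f(·) and η1 numerically. The sketch below is our own (natural logarithms in H(·) and the value of ε are arbitrary assumptions); it prints the high-probability lower bound on (1 − δ)/(1 + δ) from (120).

```python
import numpy as np

def Hb(q):
    """Entropy function H(q) = -q log q - (1-q) log(1-q), natural log as in Lemma 8.4."""
    return -q * np.log(q) - (1 - q) * np.log(1 - q)

def f(alpha, n_over_m):
    """f(alpha) = sqrt(n/m) * (sqrt(alpha) + sqrt(2 H(alpha)))."""
    return np.sqrt(n_over_m) * (np.sqrt(alpha) + np.sqrt(2 * Hb(alpha)))

def eta1(alpha, n_over_m, eps=0.1):
    """eta_1 = 2(1+eps) f(2 alpha) + (1+eps)^2 f(2 alpha)^2, as used in (120)."""
    f2 = f(2 * alpha, n_over_m)
    return 2 * (1 + eps) * f2 + (1 + eps) ** 2 * f2 ** 2

alpha = 0.05
for n_over_m in (2, 5, 10, 20, 50):
    e1 = eta1(alpha, n_over_m)
    print(n_over_m, (1 - e1) / (1 + e1))   # lower bound on (1-delta)/(1+delta)
```

Once the printed quantity drops below zero the bound becomes vacuous, which is one way to see why the admissible compression ratio n/m in the following proofs is limited.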


E. Proof of Theorem 4.2

From the above it follows that for P(E_1) to go down to zero it is sufficient that
$$\frac{(1-\eta_1)^2}{(1+\eta_1)^2}\, \frac{\beta^2 SNR}{32} = (1+\gamma)\log 2n$$
for some arbitrary γ > 0. Let $\eta_2 = \left(\frac{32(1+\gamma)\log 2n}{\beta^2 SNR}\right)^{1/2}$. This implies that it is sufficient that
$$\frac{1-\eta_1}{1+\eta_1} \ge \eta_2 \;\Longrightarrow\; \eta_1 \le \frac{1-\eta_2}{1+\eta_2} \;\Longrightarrow\; (1+\varepsilon)f(2\alpha)\big(2 + (1+\varepsilon)f(2\alpha)\big) + 1 \le 1 + \frac{1-\eta_2}{1+\eta_2} \;\Longrightarrow\; \big(1 + (1+\varepsilon)f(2\alpha)\big)^2 \le \frac{2}{1+\eta_2} \;\Longrightarrow\; (1+\varepsilon)f(2\alpha) \le \sqrt{\frac{2}{1+\eta_2}} - 1$$

In terms of sensing capacity C =

r

n m

(121)

(122) (123) (124)

it is sufficient that,

n 1 p √ ≤ m (1 + ε)( 2α + 2H(2α))



v u u t

2 1+



 32(1+γ) log 2n 1/2 2 β SN R

Since γ and ε are arbitrary following sensing capacity is achievable,

√ 1 p C= √ ( 2α + 2H(2α))

 v u u t

2 1+



32 log 2n β 2 SN R



 − 1



 1/2 − 1

(125)

(126)

Note that in order that sensing capacity as defined above to be achievable and to be positive it is required that -

SN R ≥

32 log 2n β2

(127)

To this end we have the following Lemma Lemma 8.5: For the CS scenario P(E2 ) → 0 for SN R ≥

32c log 2n β2

for fixed c > 0.

Proof: For X0 supported on the submatrix G0 (say) we wish to find the probability  P(E2 ) = P N : k(GT0 G0 )−1 GT0 Nk∞ ≥ β/2

First note that the ℓ∞ norm of the vector (GT0 G0 )−1 GT0 N is ≤ eigenvalue of G0 . Since G0 ∈ Rm×k0 August 6, 2009

(128)

||N|| √ √ m SN Rσmin

where σmin is the minimum q is i.i.d. Gaussian from [27] it follows that , σmin → 1 − km0 > DRAFT

40

0 a.s. In [28] a strong concentration form of this behavior was proved where it was shown that the q k convergence is exponentially fast. In the worst case k0 = k, for which σmin → 1 − m > 0 and for q m ≥ 2k + 1, σmin ≥ 1 − 12 .

Now note that since N is AWGN noise with variance one per dimension,

p ||N|| P( √ ≥ (1 + ε) log m) ≤ e−εm m q k −1 ) ≤ 4. The result then follows by identifying c = (1 − m

(129)

F. Proof of Theorem 4.3 For this case in order that P(E1 ) to go down to zero it is sufficient that, m (1 − η1 )2 β 2 SN R ≥ (1 + γ) log 2n (130) n 32(1 + η1 )2  1/2 log n for some arbitrary γ > 0. To this end let η3 = 32n(1+γ) . Then the above condition is 2 mβ SN R

equivalent to the condition,

η1 ≤

1 − η3 1 + η3

(131)

Substituting the values of η1 and η3 into the expressions we obtain,

r

1 n p √ ≤ m (1 + ε)( 2α + 2H(2α))

 v u u t

2 1+



32n(1+γ) log n mSN R



 1/2 − 1

(132)

Since γ and ε are arbitrary, the achievable sensing capacity obeys the implicit relation given by the following equation,

√ 1 p C= √ ( 2α + 2H(2α))

 v u u t



2   1/2 − 1 √ log n 1 + C 8SN R

(133)

It can be seen that in order that P(E1 ) to go down to zero and for sensing capacity to be positive it is required that

SN R ≥ August 6, 2009

32n log 2n mβ 2

(134)

DRAFT

41

For a fixed fraction of

n m

this scaling and the scaling necessary from Theorem 4.1 are the same. To

this end we have the following Lemma Lemma 8.6: For the SNET scenario P(E2 ) → 0 for SN R ≥

a.s.

32cn log 2n mβ 2

for some fixed c > 0. q pm k ) Proof: The proof is similar to the proof in CS case except for the fact that σmin → n (1 − m

Since the equation is implicit in sensing capacity, we will simplify it in order to arrive at a simpler √ expression stated in the Theorem. To this end assume that 2f (2α) ≥ f 2 (2α). Let η0 = 3(1 + ε)( 2α + 1/2  p log 2n 2H(2α)) and η2 = 32(1+γ) . Then for probability of error to go down to zero it is sufficient β 2 SN R that,

1 − rη0 ≥ rη2 1 + rη0

where r =

pn

m.

(135)

This yields the following quadratic in r , r 2 η0 η2 + r(η0 + η2 ) − 1 ≤ 0

(136)

The positive solution to the quadratic equation is given by, p (η0 + η2 )2 + 4η0 η2 − (η0 + η2 ) r = 2η0 η2 ∗

(137)

Since γ and ǫ are arbitrary the result follows. G. Proof of lemma 5.1 Let X n = {X1 , ..., Xn } be an i.i.d. sequence where each variable Xi is distributed according to a

distribution PX defined on the alphabet X . Denote PX n , (PX )n the n-dimensional distribution induced

by PX . Let the space X n be equipped with a distance measure d(., .) with the distance in n dimensions P given by dn (X n , Z n ) = nk=1 d(Xk , Zk ) for X n , Z n ∈ X n . Given ǫ > 0, there exist a set of points o n n ⊂ X n such that, Z1n , ..., ZN ǫ (n,d0 ) 

PX n 

Nǫ (n,d0 )

[

i=1



Bi  ≥ 1 − ǫ

(138)

 where Bi , X n : n1 dn (X n , Zin ) ≤ d0 , i.e., the d0 balls around the set of points cover the space X n

in probability exceeding 1 − ǫ. August 6, 2009

DRAFT

42

Given such set of points there exists a function f (X n ) : X n → Zin s.t. P

1 n n n dn (X , Zi )

 ≤ d0 ≥ 1−ǫ.

To this end, let TPX n denote the set of δ - typical sequences in X n that are typical PX n , i.e. TPX n =

  1 ˆ n ) − H(X)| ≤ δ X n : | − log P(X n

ˆ n ) is the empirical distribution induced by the sequence X n . We have the following lemma where P(X

from [14]. Lemma 8.7: For any η > 0 there exists an n0 such that for all n ≥ n0 , such that  1 n ˆ P X : | − log P(X ) − H(X)| < δ > 1 − η n ˆ n (Y) that produces an estimate of In the following we choose η = δ. Given that there is an algorithm X 

n

X n given the observation Y . To this end define an error event on the algorithm as follows,   1 if 1 d (X n , X ˆ n (Y)) ≥ d0 n n En =  0 otherwise

Define another event An as follows

  1 if X n ∈ T n PX An =  0 otherwise

(139)

Note that since X n is drawn according to PX n and given δ > 0 we choose n0 such that conditions of lemma 8.7 are satisfied. In the following we choose n ≥ n0 (δ). Then a priori, P(An = 1) ≥ (1 − δ). Now, consider the following expansion,

H(f (X n ), En , An |Y) = H(f (X n )|Y) + H(En , An |f (X n ), Y)

(140)

= H(En , An |Y) + H(f (X n )|En , An , Y)

(141)

This implies that

H(f (X n )|Y) = H(En , An |Y) − H(En , An |f (X n ), Y) + H(f (X n )|En , An , Y)

August 6, 2009

(142)

= I(En , An ; f (X n )|Y) + H(f (X n )|En , An , Y)

(143)

≤ H(En , An ) + H(f (X n )|En , An , Y)

(144)

≤ H(En ) + H(An ) + H(f (X n )|En , An , Y)

(145) DRAFT

43

Note that H(En ) ≤ 1 and H(An ) = δ log

1 δ

+ (1 − δ) log

1 1−δ

∼ δ for δ small enough. Thus we have

H(f (X n )|Y) ≤ 1 + δ + Pne H(f (X n )|Y, En = 1, An ) + (1 − Pne )H(f (X n )|Y, En = 0, An )

(146)

Now the term Pne H(f (X n )|Y, En = 1, An ) ≤ Pne log Nǫ (n, d0 ). Note that the second term does not go to zero. For the second term we have that,

(1 − Pne )H(f (X n )|Y, En = 0, An ) = P(An = 1)(1 − Pne )H(f (X n )|Y, En = 0, An = 1) + P(An = 0)(1 − Pne )H(f (X n )|Y, En = 0, An = 0)

(147) ≤ (1 − Pne )H(f (X n )|Y, En = 0, An = 1) + δ(1 − Pne ) log (Nǫ (n, d0 ))

(148)

The first term on R.H.S in the above inequality is bounded via, (1 − Pne )H(f (X n )|Y, En = 0, An = 1) ≤ (1 − Pne ) log (|S|)

where S is the set given by, o   n S = i : dset Bf (X n ) , B i ≤ d0

where dset (S1 , S2 ) = mins∈S1 ,s′ ∈S2 dn (s, s′ ) is the set distance between two sets. Now note that I(f (X n ); X n ) = H(f (X n )) and H(f (X n )|Y) = H(f (X n ))−I(f (X n ); X n ) ≥ H(f (X n ))−I(X n ; Y)

where the second inequality follows from data processing inequality over the Markov chain f (X n ) ↔ X n ↔ Y . Thus we have,

Pne ≥

I(f (X n ); X n ) − log |S| − I(X n ; Y) − 1 (1 − δ) log Nǫ (n, d0 ) − log |S| −

δ(1 + log Nǫ (n, d0 )) (1 − δ) log Nǫ (n, d0 ) − log |S|

The above inequality is true for all the mappings f satisfying the distortion criteria for mapping X n and for all choices of the set satisfying the covering condition given by 8.8. We now state the following lemma for a minimal covering, taken from [23].

August 6, 2009

DRAFT

44

Lemma 8.8: Given ǫ > 0 and the distortion measure dn (., .), let Nǫ (n, d0 ) be the minimal number n of points Z1n , ..., ZN ⊂ X n satisfying the covering condition, ǫ (n,d0 )   Nǫ (n,d0 ) [ Bi  ≥ 1 − ǫ PX n  i=1

Let Nǫ (n, d0 ) be the minimal such number. Then,

1 lim sup Nǫ (n, d0 ) = RX (ǫ, d0 ) n n

where RX (ǫ, d0 ) is the infimum of the ǫ- achievable rates at distortion level d0 . ˆ X) subject to Note that limǫ↓0 RX (ǫ, d0 ) = RX (d0 ) where RX (d0 ) = minp(X|X) I(X; ˆ

1 n ˆn n E(d(X , X ))

d0 . In order to lower bound Pne we choose the mapping f (X n ) to correspond to the minimal cover. Also

w.l.o.g we choose δ = ǫ. We note the following. 1) From lemma 8.7, given ǫ > 0, ∃n0 (ǫ) such that for all n ≥ n0 (ǫ), we have P(TPX n ) ≥ 1 − ǫ. 2) Given ǫ > 0 and for all β > 0, for the minimal cover we have from lemma 8.8 that ∃ n1(β) such that for all n ≥ n1 (β), Nǫ (n, d0 ) ≤ n(RX (ǫ, d0 ) + β).

3) From the definition of the rate distortion function we have for the choice of the functions f (X n ) that satisfies the distortion criteria, I(f (X n ); X n ) ≥ nRX (ǫ, d0 ). Therefore we have for n ≥ max(n0 , n1 ), Pne ≥

nRX (ǫ, d0 ) − log |S| − I(X n ; Y) − 1 (1 − ǫ)(n(RX (ǫ, d0 ) + β) − log |S| −

Clearly, log |S| ≤ n2 RX (ǫ, d0 ).

ǫ(1 + n(RX (ǫ, d0 ) + β) (1 − ǫ)n(RX (ǫ, d0 ) + β) − log |S|

a) Limiting case: Since the choice of ǫ, β is arbitrary we can choose them to be arbitrary small. In fact we can choose ǫ, β ↓ 0. Also note that for every ǫ > 0 and β > 0 there exists n2 (β) such that RX (d0 ) + β ≥ RX (ǫ, d0 ) ≥ RX (d0 ) − β . Therefore for all n ≥ max(n0 , n1 , n2 ) in the limiting case

when ǫ, β ↓ 0, we have Pe ≥

RX (d0 ) − n1 log |S| − n1 I(X n ; Y) − o(1) RX (d0 ) − n1 log |S|

This implies that

August 6, 2009

DRAFT



45

Pe ≥

RX (d0 ) −

1 n

log |S| − n1 I(X n ; Y) − o(1) RX (d0 )

The proof then follows by identifying K(n, d0 ) =

1 n

log |S|, and is bounded above by a constant.

H. Proof of lemma 5.2 Proof: Define the error event,   1 if 1 d (X n , X ˆ n (Y)) ≥ d0 n H E=  0 otherwise

Expanding H(X n , E|Y) in two different ways we get that,

H(X n |Y) ≤ 1 + nPe log(|X |) + (1 − Pe )H(X n |E = 0, Y)

Now the term  nd 0 −1 X

 n (1 − Pe )H(X |E = 0, Y) ≤ (1 − Pe ) log (|X | − 1)nd0 −j (149) d0 n − j j=0   n ≤ (1 − Pe ) log nd0 (150) (|X | − 1)nd0 d0 n − 1   log nd0 (151) ≤ n(1 − Pe ) h(d0 ) + d0 log(|X | − 1) + n  n where the second inequality follows from the fact that d0 ≤ 1/2 and d0 n−j (|X | − 1)nd0 −j is a n

decreasing function in j for d0 ≤ 1/2. Then we have for the lower bound on the probability of error that,   H(X n |Y) − n h(d0 ) + d0 log(|X | − 1) + lognnd0 − 1   Pe ≥ n log(|X |) − n h(d0 ) + d0 log(|X | − 1) + lognnd0

Since H(X n |Y) = H(X n ) − I(X n ; Y) we have

  n H(X) − h(d0 ) − d0 log(|X | − 1) − lognnd0 − I(X n ; Y) − 1   Pe ≥ n log(|X |) − n h(d0 ) + d0 log(|X | − 1) + lognnd0

It is known that RX (d0 ) ≥ H(X) − h(d0 ) − d0 log(|X | − 1), with equality iff d0 ≤ (|X | − 1) min PX X∈X

August 6, 2009

DRAFT

46

see e.g., [23]. Thus for values of distortion d0 ,   d0 ≤ min 1/2, (|X | − 1) min PX X∈X

(152)

we have for all n,

Pe ≥

nRX (d0 ) − I(X n ; Y) − 1 − log nd0   n log(|X |) − n h(d0 ) + d0 log(|X | − 1) + lognnd0

I. Rate distortion function for the mixture Gaussian source under squared distortion measure

It has been shown in [25] that the rate distortion function for a mixture of two Gaussian sources, with variance σ_1^2 and mixture ratio α, and variance σ_0^2 and mixture ratio 1 − α, is given by
$$R_{mix}(D) = \begin{cases} H(\alpha) + \dfrac{1-\alpha}{2}\log\!\left(\dfrac{\sigma_0^2}{D}\right) + \dfrac{\alpha}{2}\log\!\left(\dfrac{\alpha \sigma_1^2}{D - (1-\alpha)\sigma_0^2}\right) & \text{if } D < \sigma_0^2 \\[6pt] H(\alpha) + \dfrac{\alpha}{2}\log\!\left(\dfrac{\alpha \sigma_1^2}{D}\right) & \text{if } \sigma_0^2 < D \le (1-\alpha)\sigma_0^2 + \alpha\sigma_1^2 \end{cases}$$
For a strict sparsity model we have σ_0^2 → 0, so that
$$R_{mix}(D) = H(\alpha) + \frac{\alpha}{2}\log\!\left(\frac{\alpha \sigma_1^2}{D}\right), \quad 0 < D \le \alpha\sigma_1^2$$
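As a quick illustration of the strict-sparsity limit above, the following sketch evaluates this rate distortion function; it is our own helper (base-2 logarithms assumed), not part of the paper.

```python
import numpy as np

def H2(a):
    """Binary entropy in bits."""
    return -a * np.log2(a) - (1 - a) * np.log2(1 - a)

def R_sparse(D, alpha, sigma1_sq):
    """Rate-distortion of the strictly sparse Gaussian mixture (sigma_0^2 -> 0):
    R(D) = H(alpha) + (alpha/2) log2(alpha*sigma1^2 / D), valid for 0 < D <= alpha*sigma1^2."""
    assert 0 < D <= alpha * sigma1_sq
    return H2(alpha) + 0.5 * alpha * np.log2(alpha * sigma1_sq / D)

print(R_sparse(D=0.01, alpha=0.1, sigma1_sq=1.0))
```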

REFERENCES

[1] D. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, April 2006.
[2] J. Haupt and R. Nowak, “Signal reconstruction from noisy random projections,” IEEE Transactions on Information Theory, vol. 52, no. 9, pp. 4036–4068, Sep 2006.
[3] E. Candès and T. Tao, “Near optimal signal recovery from random projections: Universal encoding strategies?” preprint, 2004.
[4] M. Vetterli, P. Marziliano, and T. Blu, “Sampling signals with finite rate of innovation,” IEEE Transactions on Signal Processing, vol. 50, no. 6, pp. 1417–1428, June 2002.
[5] I. Maravic and M. Vetterli, “Sampling and reconstruction of signals with finite rate of innovation in the presence of noise,” IEEE Transactions on Signal Processing, vol. 53, no. 8, pp. 2788–2805, August 2005.
[6] Y. Rachlin, R. Negi, and P. Khosla, “Sensing capacity for discrete sensor network applications,” in Proceedings of the Fourth International Symposium on Information Processing in Sensor Networks, 2005.
[7] E. J. Candès and T. Tao, “The Dantzig selector: statistical estimation when p is much larger than n,” to appear in Annals of Statistics.
[8] M. Rabbat, J. Haupt, A. Singh, and R. Nowak, “Decentralized compression and predistribution via randomized gossiping,” in International Conference on Information Processing in Sensor Networks, Nashville, TN, USA, April 2006.


[9] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society, vol. 58, no. 1, pp. 267–288, April 1996.
[10] J. A. Tropp, “Recovery of short linear combinations via l1 minimization,” IEEE Transactions on Information Theory, vol. 51, no. 4, pp. 1568–1570, April 2005.
[11] ——, “Just relax: Convex programming methods for identifying sparse signals,” IEEE Transactions on Information Theory, vol. 51, no. 3, pp. 1030–1051, March 2006.
[12] E. J. Candès and Y. Plan, “Near-ideal model selection by l1 minimization,” California Institute of Technology, Tech. Rep., 2007.
[13] E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.
[14] T. M. Cover and J. Thomas, Elements of Information Theory. Wiley, New York, 1991.
[15] M. J. Wainwright, “Sharp thresholds for high-dimensional and noisy recovery of sparsity,” Dept. of Statistics, Univ. of California, Berkeley, Tech. Rep., May 2006.
[16] A. K. Fletcher, S. Rangan, and V. K. Goyal, “Rate-distortion bounds for sparse approximation,” in IEEE/SP 14th Workshop on Statistical Signal Processing, Madison, WI, 26-29 August 2007, pp. 254–258.
[17] S. Aeron, M. Zhao, and V. Saligrama, “On sensing capacity of sensor networks for a class of linear observation models,” in IEEE Statistical Signal Processing Workshop, Madison, WI, August 26-29 2007.
[18] W. Bajwa, J. Haupt, A. Sayeed, and R. Nowak, “Compressive wireless sensing,” in International Conference on Information Processing in Sensor Networks, Nashville, TN, USA, April 2006.
[19] M. Wakin, M. Duarte, D. Baron, and R. Baraniuk, “Random filters for compressive sampling and reconstruction,” in Proceedings of the 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing, Toulouse, France, May 2006.
[20] W. Bajwa, J. Haupt, G. Raz, S. Wright, and R. Nowak, “Toeplitz-structured compressed sensing matrices,” in IEEE Workshop on Statistical Signal Processing, Madison, WI, August 2007.
[21] J. Haupt, R. Castro, R. Nowak, G. Fudge, and A. Yeh, “Compressive sampling for signal classification,” in Fortieth Asilomar Conference on Signals, Systems and Computers (ACSSC ’06), 2006, pp. 1430–1434.
[22] Y. G. Yatracos, “A lower bound on the error in nonparametric regression type problems,” Annals of Statistics, vol. 16, no. 3, pp. 1180–1187, Sep 1988.
[23] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Academic Press, New York, 1981.
[24] K. Zeger and A. Gersho, “Number of nearest neighbors in a Euclidean code,” IEEE Transactions on Information Theory, vol. 40, no. 5, pp. 1647–1649, Sep 1994.
[25] Z. Reznic, R. Zamir, and M. Feder, “Joint source-channel coding of a Gaussian mixture source over a Gaussian broadcast channel,” IEEE Transactions on Information Theory, pp. 776–781, March 2002.
[26] S. Aeron, M. Zhao, and V. Saligrama, “On sensing capacity of sensor networks for a class of linear observation fixed SNR models,” Boston University, Tech. Rep., 2007, http://arxiv.org/abs/0704.3434.
[27] J. W. Silverstein, “The smallest eigenvalue of a large dimensional Wishart matrix,” The Annals of Probability, vol. 13, no. 4, pp. 1364–1368, Nov. 1985.
[28] M. Ledoux, The Concentration of Measure Phenomenon. Mathematical Surveys and Monographs 89, American Mathematical Society, 2001.
