Adaptive Statistical Sampling Methods for Decentralized Estimation and Detection of Localized Phenomena

Erhan Baki Ermiş and Venkatesh Saligrama

Information Systems and Sciences Laboratory, Department of Electrical and Computer Engineering, Boston University, Boston, MA 02215
Email: [email protected], [email protected]

Abstract— Sensor networks (SNETs) for monitoring spatial phenomena have emerged as an area of significant practical interest. We focus on the important problem of detection of distributed events, sources, or abnormalities that are localized, i.e., only a small number of sensors in the vicinity of the phenomena are in the field of observation. This problem complements the standard decentralized detection problem, where noisy information about an event is measured by the entire network. For localized phenomena the main difficulty arises from the coupling of: a) noisy sensor observations that lead to local false positives/negatives; and b) limited energy, which constrains communication among sensor nodes. Together these difficulties call for reaching a decentralized statistical ordering based on limited collaboration. We are then led to the following fundamental problem: determine the most probable event locations while minimizing communication cost. Our objective in this paper is to characterize the fundamental trade-offs between global performance (false alarms and miss rate) and communication cost. We develop a framework to minimize the communication cost subject to worst-case misclassification constraints by making use of the false discovery rate (FDR) concept along with an optimal local measure transformation at each sensor node. The preliminary results show that the FDR concept applied in a sensor network context leads to a significant reduction in the communication cost of the system. A very interesting implication of this work is that the detection performance of a wireless sensor network is comparable to that of a wired network of sensors.

I. INTRODUCTION

The design and deployment of sensor networks (SNETs) for decision making pose fundamental challenges due to energy constraints and uncertain environments. It is well known [3] that energy limits the ability to collaborate by limiting the communication range, capacity, and system lifetime. On the other hand, to overcome uncertainty, which arises from noisy measurement of distributed phenomena, some form of collaboration among sensor nodes is necessary. In this paper we focus on one such problem, where communication costs due to information exchange must be minimized subject to end-to-end information quality constraints. Specifically, we develop solutions for detection of distributed events, sources, or abnormalities that are localized, i.e., only a small number of sensors in the vicinity of the phenomena are in the field of observation. This problem complements the standard decentralized detection problem, where noisy information about an event is measured by the entire network. Fig. 1 illustrates the salient difference between the two problems. In decentralized detection each sensor measures noisy information about a single global phenomenon. The global phenomenon by itself can be one of several different discrete possibilities. Researchers have investigated several architectures within this context. Under the fusion architecture [2], [12], [13] a local decision is made at each sensor node and transmitted to a fusion center(s). In this context, our earlier work [14] developed an alternative belief-propagation-based approach,

Fig. 1. Decentralized Detection vs. Localized Detection: (a) Global Information; (b) Local Information.

wherein each sensor node transmits its likelihood to its neighboring sensor, which then forms an update and forwards this information to its nearest neighbor. Nevertheless, the setting of these problems is significantly different from the problem dealt with in this paper. In contrast to the global information setting, where sensor data is correlated across all the sensor nodes, the issue here is that most sensors measure independent information. For localized phenomena the main difficulty arises from the coupling of: a) noisy sensor observations that lead to local false positives/negatives; and b) limited energy, which constrains communication among sensor nodes. Together these difficulties call for reaching a decentralized statistical ordering based on limited collaboration. We are then led to the following fundamental problem: determine the most probable event locations while minimizing communication cost. Our objective in this paper is to characterize the fundamental trade-offs between global performance (false alarms and miss rate) and communication cost. We develop a framework to minimize the communication cost subject to worst-case misclassification constraints by making use of the false discovery rate (FDR) concept along with an optimal local measure transformation at each sensor node. The preliminary results show that the FDR concept applied in a sensor network context leads to a significant reduction in the communication cost of the system. A very interesting implication of this work is that the detection performance of a wireless sensor network is comparable to that of a wired network of sensors in the following sense: corresponding to an achievable (centralized) false positive/negative threshold, the communication complexity grows in proportion to the number of events/sources/abnormalities while achieving the same centralized performance. The organization of this paper is as follows: Section II provides a brief discussion of how the problem described here relates to the statistical literature. Section III motivates possible solutions to the

different formulations. We describe two possible strategies: 1) a nonadaptive solution wherein individual sensors make local decisions that are then transmitted to the fusion center; 2) an adaptive, feedback-based solution where sensor decisions are made recursively available to undecided sensors. Section IV then develops a novel statistical framework to characterize the trade-off between false alarms and misses, which can be seen as being analogous to Neyman-Pearson type trade-offs. In Section V we show that the method developed can be efficiently decentralized, resulting in a significant reduction of communication costs. These solutions are then applied to the problem of boundary detection in Section VI.

II. DISCUSSION

The general problem described here is related to the so-called multiple comparison tests (MCPs) in the statistical literature [1], [10], [11] as well as the bio-statistical community [4]. The setup consists of a collection of sample observations, y, each drawn either from the probability distribution f_Y(y|H_0), which corresponds to the null hypothesis, H_0, or from f_Y(y|H_1), which corresponds to the positive hypothesis. The problem is to partition the samples into two bins corresponding to the null and positive hypotheses respectively. A general partition can be associated with a decision rule for each sensor node, k, mapping all the observations into the two different hypotheses, i.e.,

u_k : y_1^N → {0, 1}, k = 1, 2, ..., m    (1)

The main difficulty encountered in MCPs is that controlling the global false alarm rate leads to controlling the so-called family-wise error rate (FWER), which is also known as the Bonferroni procedure [11]. To further illustrate this point, consider the table below, where m is the number of samples (or sensor nodes) known in advance. The locations and the hypotheses are drawn with some probability distribution, which may or may not be known. R is an observable random variable; U, V, S, T are unobservable random variables.

              True H_0    True H_1    Total
Declared H_0  U           T           m - R
Declared H_1  V           S           R
Total         m_0         m - m_0     m

Control of the global probability of false alarms, i.e., FWER = Prob{V ≥ 1} ≤ γ, in general requires control of the individual false alarm threshold of each sample (or sensor) at the level γ/m, where m is the number of samples (the number of sensor nodes in the network), i.e., Prob{u_k(y) = 1 | H_0} ≤ γ/m. The sufficiency part is straightforward and follows from a union-bound type argument. Although the strategy directly lends itself to decentralized decision making, it is well known (and easy to see) that it has poor detection power, i.e., there are a large number of false negatives. Taking into account the low power of detection of Bonferroni's procedure, Benjamini and Hochberg introduced the FDR procedure [1]. This concept, instead of trying to control the probability of making any type one error, controls the expected ratio of the number of observations falsely declared significant, V, to the total number of observations declared significant, R, i.e., FDR = E(Q) = E{V/(V + S)} = E{V/R}. This relaxation improves the detection power while maintaining the FDR within γ, where γ is the desired false discovery rate constraint.
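To make the conservativeness concrete (a small worked instance, assuming a standard Gaussian null purely for illustration): with m = 100 sensors and a global constraint γ = 0.05, each sensor must test at the level γ/m = 5 × 10^-4; under f_Y(y|H_0) = N(0, 1) this means a sensor declares a detection only when its observation exceeds Φ^-1(1 − 5 × 10^-4) ≈ 3.29, so weak but genuine events are routinely missed.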

It is easy to establish that the false alarm rate [1] is bounded from below by the FDR, i.e., FWER = Prob{V ≥ 1} ≥ E(Q) = FDR. Although it is a weaker notion in terms of false alarm probability, the significant increase in the power of detection makes it a desirable approach in many problems. Nevertheless, the FDR procedure suffers from two significant drawbacks that make it unsuitable in our applications:
• First, the FDR procedure does not depend on the probability distribution, f_Y(y|H_1), that generates the positive samples. The focus is primarily on reducing false positives, and there is no control over the false negatives or miss rate. Therefore, depending on the application, the detection power can be poor.
• Second, the FDR strategy does not lend itself easily to decentralized implementation.
The main contributions of this paper are twofold:
• We develop a new FDR-type procedure that optimizes the power of detection among the family of FDR-like procedures for a fixed FDR threshold, γ. This is achieved through a convenient measure preserving transformation.
• The new procedure lends itself to efficient decentralization.
The experimental results show that an application of the FDR-type procedure to the boundary detection and source localization problems leads to great improvements in terms of both detection power and communication costs.

III. ADAPTIVE VS NON-ADAPTIVE DECISION RULES

To describe possible decision rules we first consider the centralized strategy. Here the decision rule for each sensor is a mapping from all observations to one of the two hypotheses, as described in Eq. (1). A decentralized decision rule is one where each sensor makes a decision based solely on its local observation, i.e., u_k : Y_k → {φ, 0, 1}. A sensor transmits whenever the decision is positive, i.e., u_k(y_k) = 1. A non-adaptive solution corresponds to the Bonferroni procedure outlined in Section II: a uniform false positive threshold, γ/m, is set at each sensor to test the observation. Each sensor then tests its observation against this threshold, transmits positive decisions, and takes no action if the test result is negative. As described earlier, this non-adaptive solution bounds the global false alarm probability at γ; however, a major drawback is its detection rate, which is usually arbitrarily small (a minimal sketch of this local rule is given below). Finally, an adaptive decision rule allows some form of collaboration between the sensor nodes and requires the introduction of time and a "not yet decided" state. For this purpose, we allow each sensor to take on three values u_k(·) ∈ {φ, 0, 1}, where φ corresponds to the undecided state. In the beginning all the sensors are undecided. At time t + 1, an undecided sensor, k, updates its decision based on its local observation, y_k, and all the messages, U^t, received from other sensor nodes up to time period t (note that only positive messages, i.e., those equal to 1, are received), i.e., u_k(t + 1) : Y_k × U^t → {φ, 0, 1}. Such an adaptive rule constitutes a feedback mechanism and is shown schematically in Fig. 2. It is possible to describe more elaborate feedback mechanisms. However, we focus on this structure for several reasons: a) the feedback structure is simple; b) only 1-bit communication is admissible; c) for a uniform broadcast cost/bit, c, for each sensor node this scheme is optimal.
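As a concrete illustration of the non-adaptive rule (a minimal sketch of our own, not the authors' code; it assumes each sensor already holds the p value of its observation, and the name bonferroni_rule is ours):

```python
def bonferroni_rule(p_k: float, gamma: float, m: int) -> int:
    """Non-adaptive local rule: declare (and transmit) 1 only if the local
    p value clears the per-sensor Bonferroni threshold gamma / m."""
    return 1 if p_k <= gamma / m else 0

# e.g., with gamma = 0.05 and m = 100 sensors, only p values below 5e-4 are reported
print(bonferroni_rule(3e-4, 0.05, 100), bonferroni_rule(0.01, 0.05, 100))  # -> 1 0
```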

Fig. 2. Adaptive Decision Rule.

We are now ready to formalize our problem. There are two possibilities to explore. The first objective is to maximize the expected number of declarations subject to an FDR constraint and a communication constraint:

maximize E(R) subject to: E(V/R) ≤ γ, Σ_{i,t} c(u_i(0), ..., u_i(t)) ≤ α

where the second constraint limits the transmission bit budget to α. However, as it turns out, this problem does not always guarantee maximization of detection power. Therefore, we may seek to solve a modified problem:

minimize E(T) subject to: E(V/R) ≤ γ, Σ_{i,t} c(u_i(0), ..., u_i(t)) ≤ α

where c is the cost of communicating a decision:

c(u_i(0), ..., u_i(t)) = 1 if u_i(τ) ≠ 1 for all τ < t and u_i(t) = 1, and 0 otherwise.    (2)

We will discuss solutions to each of these problems in the sequel.

A. Controlling the False Discovery Rate Under a Communication Constraint

Consider m observations coming from different hypotheses, H_0 and H_1^i, with corresponding probability density functions (pdfs) f_0 and f_1^i. It is not known how many of these observations come from the null hypothesis and how many of them come from positive hypotheses. The goal is to declare the maximum number of observations as significant subject to an FDR constraint. The FDR procedure is described as follows:
1) Calculate the p values for all the observations.
2) Order the p values in ascending order.
3) Find the largest index, i_max, such that p_i ≤ (i/m)γ.
4) Declare p_j significant for 0 ≤ j ≤ i_max.
The following definition of the p value of a random variable X is used in this paper:

p(X) = ∫_X^∞ f_0(t) dt = 1 − F_0(X)    (3)

where f_0 is the pdf of the observations under H_0, and γ is the FDR constraint. It is obvious that the p value of a random variable is itself a random variable. Throughout the paper, the p value of the random variable X_0, where X_0 comes from H_0, is denoted by P_0, and similarly for P_1.
Lemma 3.1: The random variable P_0 is distributed uniformly in (0, 1) regardless of the distribution of X_0.
Theorem 3.2: For independent test statistics under the null hypothesis, and for any configuration of positive hypotheses, the above procedure controls the FDR at level γ.
Proof: The reader is referred to the original work [1] for the proof of this theorem and a better understanding of the FDR concept. The general sketch of the proof is as follows: first, a uniform bound is formed on E(Q | p_1, ..., p_{m_1}), the distribution of the FDR conditioned on the p values for H_1. The uniform bound is derived by making use of Lemma 3.1 and the assumption that the test statistics are independent.
Lemma 3.1 and Theorem 3.2 lead to the following proposition.
Proposition 3.3: The false discovery rate constraint is satisfied for FDR procedures applied along with transformations that are measure invariant with respect to the distribution under H_0.
Proof: Applying a measure preserving transform with respect to the distribution under H_0 leaves the random variable P_0 uniformly distributed in (0, 1). Since the distribution under the null hypothesis is preserved, following the proof of Theorem 3.2 from the original work [1] gives the result stated in the proposition.
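A minimal implementation sketch of steps 1)-4) may help fix ideas (our own illustration, not the authors' code; it assumes a standard Gaussian null so that Eq. (3) reduces to 1 − Φ(x)):

```python
import numpy as np
from scipy.stats import norm

def p_values(x, f0_cdf=norm.cdf):
    # p value as defined in Eq. (3): p(X) = 1 - F0(X)
    return 1.0 - f0_cdf(x)

def fdr_procedure(p, gamma):
    """FDR (Benjamini-Hochberg style) procedure, steps 1)-4)."""
    m = len(p)
    order = np.argsort(p)                           # step 2: ascending order
    p_sorted = p[order]
    thresholds = gamma * np.arange(1, m + 1) / m    # (i/m) * gamma
    below = np.nonzero(p_sorted <= thresholds)[0]
    declared = np.zeros(m, dtype=bool)
    if below.size > 0:
        i_max = below[-1]                           # step 3: largest index
        declared[order[: i_max + 1]] = True         # step 4: declare significant
    return declared

# toy usage: 90 null samples from N(0,1), 10 shifted samples from N(3,1)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 90), rng.normal(3, 1, 10)])
print(fdr_procedure(p_values(x), gamma=0.05).sum(), "declared significant")
```

Note that step 4) declares every observation up to the ordered index i_max, even if some intermediate ordered p values exceed their own thresholds.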

B. Distributed Thresholding Strategies

Using the FDR procedure described in Section III-A as the basis of the decision rule at time t, an algorithm can be established to perform the FDR procedure in a distributed fashion. Although a multi-layered approach can be used to solve more challenging problems, a single-layer algorithm is described here for simplicity (a simulation sketch of this single-layer scheme appears below):
1) Each sensor calculates the p value of its observation, p_i, and tests p_i against (1/m)γ.
2) The sensor(s) with p_i ≤ (1/m)γ declare their observations significant and communicate this decision to the other sensors by a suitable protocol (assume l of them declare their observations significant).
3) The decisions of the l sensors are fed back to the system, and all the sensors update their threshold to ((l + 1)/m)γ.
4) Each sensor tests the p value of its data against the new threshold and declares its data significant accordingly (assume k more sensors declare their observations significant).
5) The new significant decisions are fed back to the system again, and the threshold is updated to ((l + k + 1)/m)γ.
6) Steps 4 and 5 are repeated until there are no more sensors that declare their observations significant under the most current threshold, at which point the process terminates.
In a multi-layered approach, the idea is to first set a larger threshold to detect the boundary locally at each sensor node. This is accomplished by local collaboration. Then, among the sensors that locally declare significant observations, a second-level test is performed with the overall desired false discovery rate constraint. This process then repeats at higher levels of the network. The high-level tests make use of more data, and they occur on a larger scale. Although they are more reliable, there is an extra communication cost involved. The approach presented in this work limits these high-level tests to a small subset of the entire set of nodes, and thus reduces the communication costs.
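The following is a minimal simulation sketch of the single-layer scheme above (our illustration, not the authors' implementation); it also tallies the 1-bit broadcasts charged by Eq. (2):

```python
import numpy as np

def distributed_fdr(p, gamma):
    """Feedback-based FDR thresholding: sensors test against a shared threshold
    that grows to (count + 1) * gamma / m after each feedback round."""
    m = len(p)
    declared = np.zeros(m, dtype=bool)
    bits = 0
    while True:
        threshold = (declared.sum() + 1) * gamma / m
        new = (~declared) & (p <= threshold)
        if not new.any():
            break                      # step 6: no new declarations, terminate
        bits += int(new.sum())         # each newly declared sensor broadcasts one bit
        declared |= new
    return declared, bits
```

In the worst case each round adds a single new declaration, so the number of feedback rounds is bounded by the number of sensors that ever declare.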

IV. DOMAIN TRANSFORMED FDR PROCEDURE

In many cases the adaptive solution that has been described so far is sufficient. However, in some problems the distribution of the observations may have characteristics which accentuate the suboptimal nature of the FDR procedure. Specifically, the FDR procedure performs best when the p values of the data that come from H_1 are clustered near zero, and that may not necessarily be the case, as seen in the following example.
Example: Consider two Gaussian random variables with f_0(x) ∼ N(0, 1) and f_1(x) ∼ N(−4, 1), and consider the FDR constraint γ = 0.05. The goal is to detect as many samples of P_1 as possible from a mixture of samples subject to the FDR constraint, γ. In this case most of the realizations of the random variable P_1 are close to 1 rather than 0, and the FDR procedure described above will not declare them significant. In this situation the FDR procedure suffers from a severe increase in miss rate. The issue here is that the algorithm searches for significant observations that are less than or equal to the FDR threshold, while all the significant observations are near 1. To overcome this problem, consider the following transformation of the random variables P_0 and P_1:

P̂_i = 1 − P_i,  i = 0, 1    (4)
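As a quick numeric check of why the flip helps (our arithmetic for the example above, with Φ the standard normal cdf): an observation x = −4 drawn from f_1 has p value P_1 = 1 − Φ(−4) ≈ 1 − 3.2 × 10^-5 ≈ 0.99997, far above any FDR threshold, whereas its flipped value P̂_1 = 1 − P_1 = Φ(−4) ≈ 3.2 × 10^-5 lies well below even the smallest threshold γ/m for moderate m.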

Since P_0 is uniformly distributed in (0, 1), it is obvious that P̂_0 is also uniformly distributed in (0, 1). Observe, however, that most of the realizations of P̂_1 are close to 0. The transformation shifts all the significant p values to near 0 without changing the distribution of the insignificant observations, generating a more suitable data set for the algorithm. Therefore, when the FDR procedure is performed on this new set of p values, more of the observations coming from H_1 will be declared significant, and thus the detection power of the test is increased. Furthermore, the FDR constraint γ is still satisfied, since the transform preserves the U(0, 1) distribution of the p values for the observations coming from H_0. In this section, a method is developed which not only solves the possible early termination problem of the distributed algorithm, but also yields the best performance of the FDR procedure subject to the problem constraints. Conveniently, as a result of this solution, E(T) is also minimized within the capabilities of the FDR procedure. The FDR procedure does not assume knowledge of the distribution of the observations under the positive hypothesis. However, in many problems, the distribution of the observations under the positive hypothesis is known, or can be estimated. Here, it is shown that when this extra information is available, one can design more powerful tests while still remaining within the FDR constraint γ. Making use of the assumption that the distributions of the observations are known under the null and positive hypotheses, we introduce a transformation in the p domain. The transformation is simple in nature, and is a reorientation of the p domain. Despite its simplicity, it has three very important properties to note:
1) It preserves the uniform distribution of p values under the null hypothesis,

2) It maps a non-monotonic or monotonically increasing density of p values to a monotonically decreasing one,
3) After the p values are put in ascending order, the plot of p values vs. indices looks like a convex function sampled at integer points.
The transformation depends on the distribution of the observations under H_1, but not on the realizations themselves. Therefore it can be applied a priori, before experimentation. Due to the first property of the transformation, the FDR constraint is satisfied after the transformation, and by the second property the detection power of the FDR procedure is increased. The third property plays a crucial role in the implementation of the FDR procedure in a distributed manner; it is discussed in more detail in Section V.

A. Transformation of p Domain

Let g_0 and g_1 be the probability density functions of P_0 and P_1 respectively. Define the transformation, T_n, on the p domain as follows:
1) Partition the range of g_1 into n bins of size ε = 1/n, the preimages of which induce a partitioning of the p domain,
2) Index the partitions in the p domain with a location index i, i = 1..n, according to their order of appearance as p ranges from 0 to 1,
3) Index the partitions in the p domain with an area index j, j = 1..n, such that the j-th partition has the j-th largest area under g_1,
4) Beginning from j = 1, rearrange the locations of the partitions so that the location index i of each bin is equal to its area index, j.
Proposition 4.1: The sequence of transformations {T_n} converges to a measurable transformation T.
Proof: {T_n} is a sequence of measurable transforms which is pointwise convergent. Define lim T_n(x) = T(x). It follows from a standard application of elementary analysis [8] that sup_n{T_n} is also a measurable transform and that lim sup{T_n} is also measurable. Since T_n is pointwise convergent, the result follows.
The following procedure will be referred to as the Domain Transformed FDR (DTFDR) procedure throughout the paper:
1) Apply the transformation T to the p values of the observations,
2) Follow the FDR procedure.
The following propositions follow directly:
Proposition 4.2: T is a measure invariant transformation for samples of H_0 in the p-value domain.
Proof: The proof follows by construction and continuity arguments.
Proposition 4.3: The DTFDR procedure controls the false discovery rate at the same level as the FDR procedure.
Proof: The proof follows from Proposition 4.2 and Lemma 3.1.
Proposition 4.4: The measure transformation T converts an arbitrary density of p values for H_1 into a monotonically decreasing density over (0, 1).
Proof: The proof follows by construction.
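A rough numerical sketch of T_n may be useful (this is our reading of steps 1)-4), under the simplifying assumption that the p domain itself is cut into n equal-width bins that are then permuted by their mass under g_1; the name build_dtfdr_transform and the argument g1_pdf are ours):

```python
import numpy as np

def build_dtfdr_transform(g1_pdf, n=1000):
    """Approximate p-domain rearrangement: the bin with the j-th largest mass
    under g1 is relocated to the j-th position from the origin."""
    edges = np.linspace(0.0, 1.0, n + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mass = g1_pdf(centers)                     # bin mass is roughly g1(center) / n
    dest = np.empty(n, dtype=int)
    dest[np.argsort(-mass)] = np.arange(n)     # heaviest bin -> slot 0, next -> slot 1, ...
    def T(p):
        p = np.asarray(p, dtype=float)
        src = np.clip((p * n).astype(int), 0, n - 1)   # which bin p falls in
        offset = p - edges[src]                        # position inside that bin
        return edges[dest[src]] + offset               # same offset in the relocated bin
    return T
```

Because the bins have equal width and points are only translated, a uniform P_0 stays uniform, while feeding T(p) into the FDR procedure is then the DTFDR recipe above (with T_n standing in for its limit T).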


B. Improving the Performance of FDR Procedure


Fig. 3. The distribution of P1 before (solid) and after (dotted) the transformation is applied

Before proceeding any further, the term "stochastically larger" [7] must be introduced: we say that the random variable X is stochastically larger than the random variable Y, denoted X ≥_st Y, when F_X(a) ≤ F_Y(a) for all a. In this paper the term "dominant density" is used for the pdf of Y.
Lemma 4.5: Let X_1, ..., X_n ∈ (0, 1) be n independent random variables with common density function f_X and let Y_1, ..., Y_n ∈ (0, 1) be n independent random variables with common density function f_Y. Also, let X_(i) and Y_(i) denote the i-th smallest of X_1, ..., X_n and Y_1, ..., Y_n respectively. If F_X(t) ≥ F_Y(t) for all t ∈ (0, 1), then Y_(i) ≥_st X_(i).


Fig. 4. Expected number of observations declared as significant for the original (solid) and the transformed (dotted) data with 100 observations: (a) expected number of declarations vs. ratio of observations from the positive hypothesis to the total number of observations; (b) expected number of declarations vs. FDR constraint; (c) expected miss rate vs. FDR constraint. In (a) the FDR constraint is .05; in (b) and (c) the ratio is fixed at .3.

Proof: The density of the i-th order statistic X_(i) is

f_{X_(i)}(t) = [n! / ((i − 1)!(n − i)!)] (F_X(t))^{i−1} (1 − F_X(t))^{n−i} f_X(t).

Let A_i = n! / ((i − 1)!(n − i)!). Then

F_{X_(i)}(t) = A_i ∫_0^t (F_X(x))^{i−1} (1 − F_X(x))^{n−i} f_X(x) dx = A_i ∫_0^{F_X(t)} u^{i−1} (1 − u)^{n−i} du.

By the same approach it is easy to see that

F_{Y_(i)}(t) = A_i ∫_0^t (F_Y(y))^{i−1} (1 − F_Y(y))^{n−i} f_Y(y) dy = A_i ∫_0^{F_Y(t)} u^{i−1} (1 − u)^{n−i} du.

By the hypothesis of the lemma, F_X(t) ≥ F_Y(t) for all t ∈ (0, 1), and since 0 ≤ u ≤ 1, it follows that F_{X_(i)}(t) ≥ F_{Y_(i)}(t) for all t ∈ (0, 1), which concludes the proof of the lemma.
The following theorem formalizes the main result of this section:
Theorem 4.6: For any given data set with known distributions and any integer k, the probability of declaring the first k values as significant is larger under the DTFDR procedure than under the FDR procedure.
Proof: The proof of this theorem follows from Lemma 4.5. Let P̂_1 = T(P_1). By the construction of the transform, the density of P̂_1, ĝ_1, dominates the density of P_1, g_1. In other words, P̂_1 ≤_st P_1. Therefore, the results of Lemma 4.5 apply to the random variables P_1 and P̂_1. First, assume that the observations contain only samples from H_0. Since the random variable T(P_0) is stochastically equal to P_0, the probability of declaring k of them significant is equal for all k in both procedures. Next, when the samples of H_1 are added one by one, the index of the k-th smallest p value in the FDR procedure is less likely to increase than the index of the k-th smallest p value in the DTFDR procedure, because P_1 is stochastically larger than T(P_1). Then the proof follows.
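A small Monte Carlo illustration of this comparison, in the setting of the example of Section IV (our own check, not an experiment from the paper; here the flip of Eq. (4) plays the role of the transformation, since g_1 is concentrated near 1):

```python
import numpy as np
from scipy.stats import norm

def bh_count(p, gamma):
    """Number of declarations made by the FDR procedure of Section III-A."""
    ps = np.sort(p)
    ok = np.nonzero(ps <= gamma * np.arange(1, len(ps) + 1) / len(ps))[0]
    return 0 if ok.size == 0 else int(ok[-1]) + 1

rng = np.random.default_rng(1)
m, m1, gamma = 100, 30, 0.05
x = np.concatenate([rng.normal(0, 1, m - m1),   # samples from H0
                    rng.normal(-4, 1, m1)])     # samples from H1 as in the example
p = 1.0 - norm.cdf(x)                           # p values per Eq. (3)
print(bh_count(p, gamma))         # plain FDR: typically 0 declarations
print(bh_count(1.0 - p, gamma))   # after the flip of Eq. (4): close to m1
```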

V. COMMUNICATION COSTS

Apart from the gains in the detection power of the FDR procedure, the transformation T leads to further advantages in communication costs. When, for example, H_1 has a multi-modal distribution, the linearly increasing FDR threshold can intersect the ordered p value curve at multiple locations. Hence early termination of the distributed algorithm is possible if an ordered p value above the threshold curve is discovered. However, the DTFDR procedure yields a convex p value curve, as shown in Fig. 5, and resolves this issue. The implication of this convex structure is that once the distributed DTFDR procedure terminates, we have the confidence that there will be very little penalty for not searching any further, whereas this is not the case with the distributed FDR procedure. We state this as a lemma next.
Lemma 5.1: For a large number of observations, the expected values of the order statistics in the transformed domain approximate samples of a convex function. Furthermore, the expected intersection point of this convex curve and the threshold curve can be lower bounded by a non-trivial value for mixture densities f such that 1/F̂(x) and 1/(1 − F̂(x)) are both convex.
Proof: For this, we assume a mixture distribution F for the observations. Proposition 4.4 implies that the distribution F̂ is a concave function, since the mixture density is converted to a monotonically decreasing one by the transformation. Combining this with the fact that for sufficiently large m the approximation to E(P_(i)) is given by E(P_(i)) = F̂^{−1}(i/(m + 1)) [9] proves the first part of the lemma. For the second part, it is noteworthy that even if the number of observations is not large, the expected values of the order statistics can be bounded from below and from above. This relationship is given by the following inequality [9] for densities such that 1/F̂(x) and 1/(1 − F̂(x)) are both convex: F̂^{−1}((i − 1)/m) ≤ E(P_(i)) ≤ F̂^{−1}(i/m). The lower bound in this inequality is itself a convex function of i, and therefore the expected point of intersection can be lower bounded using this curve.
Theorem 5.2: For small FDR thresholds, the communication cost of the system scales linearly with the actual number of significant observations among the data if the DTFDR procedure is implemented by the system.

Proof: This follows from the convex structure induced by the transformation. The communication cost is a function of the number of observations from both the positive (m_1) and the null (m_0) hypotheses. However, for small γ, the cost is a weak function of m_0. This is because when γ is small, the number of observations that are declared significant is also small. Since in the region near 0 the number of observations from the positive hypothesis strongly dominates the number of observations from the null hypothesis, the communication cost is dominated by the number of observations from the positive hypothesis in that region. As m_1 gets larger, the cost of the system for maximum detection power increases proportionally.


Fig. 5. Ordered p value vs. index plots: (a) Original Data Set; (b) Transformed Data Set.

Theorem 5.3: For the following minimization problem with a fixed pair (γ, β), the solution of the class of FDR-like procedures is lower bounded by the solution of the DTFDR procedure:

minimize Σ_{i,t} c(u_i(0), ..., u_i(t)) subject to: E(V/R) ≤ γ, E(S) ≥ β

where c is given by Eq. (2).
Proof: It is easy to see that, due to the properties of the transform, for a fixed FDR constraint γ the DTFDR procedure minimizes E(T), and in turn maximizes E(S), within the family of FDR-like procedures. This is because the density of P_1 decreases monotonically. Furthermore, for any given number of declarations R = r, E(S|R = r) is also maximized by DTFDR, and the reasoning is the same as above. This implies that the same value of E(S|R = r) can be achieved with a smaller number of declarations (smaller r) when the DTFDR procedure is used as opposed to any other procedure from the family. Since the communication cost is determined by the number of declarations, the result follows.
As an example for Theorem 5.3, Fig. 6 was generated using 100 observations, 30 of which were significant. The experiments were repeated 1000 times to obtain the average behavior of the systems. Now consider the following problem as an example: the FDR constraint is set to γ = .15, and β = 5. In this problem, the communication cost of the system using the DTFDR procedure is 5 bits on average, while the cost of the system using the FDR procedure is about 15 bits on average. Looking at a similar problem, which was posed in Section III, if the communication constraint were set to 10 bits, then the system implementing the distributed DTFDR procedure could detect about 10 significant observations (E(S) = 10), whereas the system implementing the distributed FDR procedure could detect none. Hence E(T) is smaller for the system that uses the DTFDR procedure as its decision making strategy, compared to the system that uses the FDR procedure. This is the behavior predicted by the theoretical analysis throughout the paper. This example illustrates how the DTFDR procedure maximizes the detection power and minimizes the communication cost compared to other procedures from the family of FDR-like procedures.

VI. EXPERIMENTAL RESULTS

Fig. 7. Sensor field model and the solution via [5] for the boundary detection problem: (a) Sensor Field Model; (b) Solution via [5].

Fig. 6. Communication costs for FDR (solid) and DTFDR (dotted): (a) Ordered p values; (b) E(S) vs. Comm. Cost.

Fig. 8. Boundary detection via distributed implementation of FDR and Bonferroni procedures: (a) FDR Procedure; (b) Bonferroni Procedure.

Fig. 7 and Fig. 8 are taken from our previous work [6]. The noise was assumed to be normally distributed. These figures demonstrate how the FDR procedure can be very useful when the variance of the observations is the same under both H_0 and H_1. Results of the straightforward implementation of the FDR procedure are compared with the implementation of a nonadaptive system, namely the Bonferroni procedure, as well as a related procedure described recently in [5]. The strength of the proposed method is evident in the results. The implementation of the suggested algorithm detected the boundary successfully, whereas the implementation of the Bonferroni procedure gave poor detection performance, as expected. The detected parts of the boundary are indicated as solid squares in the figure representing the implementation of the method in [5].


Fig. 11. Close-ups of the unnormalized density (a) and the distribution (b) of P_1 before (solid) and after (dotted) the transformation.

Fig. 9. Sensor field model and the solution via [5] for the mine detection problem: (a) Sensor Field Model; (b) Solution via [5].

Fig. 12. Detection performance of distributed implementations for α = 800 bits: (a) Field Model; (b) FDR; (c) DTFDR.

Fig. 10. Mine detection via distributed implementation of FDR and Bonferroni procedures: (a) FDR Procedure; (b) Bonferroni Procedure.

Again, the same approaches were tested on the mine detection problem. Fig. 9 and Fig. 10 display the simulation results. The observation model was the same as the previous one, with Gaussian noise that was the same under both hypotheses. It can easily be seen that the FDR procedure is superior to both the Bonferroni procedure and the method in [5] in terms of detection. To see how the DTFDR procedure improves on the FDR procedure in a wireless SNET context, we performed several tests in which the structure of the data brought out the suboptimal nature of the FDR procedure. For this study we assumed that the observation models under the positive and null hypotheses were different. The experiments were performed on a grid of size 128 × 128. The observations were distributed according to N(0, 1) for H_0 and N(2.25, .08) for H_1. The probability of having a significant observation at a given location was 0.12, and the FDR threshold was set to γ = .15. We then varied the communication constraint α. The results are presented for illustrative cases in Fig. 11 through Fig. 14. For α ≤ 1500, the implementation of the FDR procedure was unable to detect the observations from the positive hypothesis, whereas the DTFDR procedure was able to do so. As the constraint was loosened, the performance of the DTFDR procedure increased accordingly, almost in a linear fashion. The behavior of the FDR procedure predicted in Fig. 6, namely that a minimum bit budget is needed before any significant observations can be detected, is clear in these figures: until the bit budget was increased to 1500 there was no detection, even though the same number of observations from the positive hypothesis was present.
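The synthetic setup can be reproduced roughly as follows (our sketch of the stated parameters; whether .08 denotes a variance or a standard deviation is not stated, so a variance is assumed here):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
n = 128                                              # 128 x 128 sensor grid
truth = rng.random((n, n)) < 0.12                    # locations holding a significant event
obs = rng.normal(0.0, 1.0, (n, n))                   # H0 observations: N(0, 1)
obs[truth] = rng.normal(2.25, np.sqrt(0.08), truth.sum())   # H1: N(2.25, .08), .08 read as variance
p = 1.0 - norm.cdf(obs)                              # p values against the N(0,1) null, Eq. (3)
# p.ravel() can then be fed to the distributed FDR/DTFDR sketches with gamma = 0.15
```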

Fig. 13. Detection performance of distributed implementations for α = 1200 bits: (a) Field Model; (b) FDR; (c) DTFDR.

Fig. 14. Detection performance of distributed implementations for α = 1600 bits: (a) Field Model; (b) FDR; (c) DTFDR.

VII. CONCLUSION

This paper develops tools for the detection of localized events, sources, or abnormalities within a sensor network. Unlike decentralized detection, the focus here is on problems involving localized information, where only a small number of sensors in the vicinity of the phenomena are in the field of observation. For localized phenomena the main difficulty arises from the coupling of: a) noisy sensor observations that lead to local false positives/negatives; and b) limited energy, which constrains communication among sensor nodes. Together this situation poses fundamental difficulties when one desires to control false alarms/misses at a global level. The paper characterizes fundamental trade-offs between global performance (false alarms and miss rate) and communication cost. We develop a novel framework, Domain Transformed FDR (DTFDR), to minimize the communication cost subject to worst-case misclassification constraints. This involves a measure invariant transformation followed by application of the FDR approach. The preliminary results show that the DTFDR concept applied in a sensor network context leads to a significant reduction in the communication cost of the system. We show that the detection performance of a wireless sensor network is comparable to that of a wired network of sensors.

REFERENCES
[1] Y. Benjamini and Y. Hochberg, "Controlling the false discovery rate: A practical and powerful approach to multiple testing," Journal of the Royal Statistical Society, Series B, vol. 57, pp. 289-300, 1995.
[2] J. F. Chamberland and V. V. Veeravalli, "Decentralized detection in sensor networks," IEEE Transactions on Signal Processing, 2003.
[3] D. Estrin, D. Culler, K. Pister, and G. Sukhatme, "Connecting the Physical World with Pervasive Networks," IEEE Pervasive Computing, vol. 1, pp. 59-69, January-March 2002.
[4] G. Fleury, A. Hero, S. Yoshida, T. Carter, C. Barlow, and A. Swaroop, "Clustering Gene Expression Signals from Retinal Microarray Data," Proc. IEEE Int. Conf. on Acoust., Speech, and Sig. Process., Orlando, FL, 2002.
[5] R. Nowak and U. Mitra, "Boundary Estimation in Sensor Networks," 2nd International Workshop on Information Processing in Sensor Networks, Palo Alto, CA, 2003.
[6] V. Saligrama, Y. Shi, and W. C. Karl, "Performance Guarantees in Sensor Networks," IEEE Int. Conf. on Acoust., Speech, and Sig. Process., Montreal, Canada, 2004.
[7] S. Ross, Introduction to Stochastic Dynamic Programming. Academic Press, 1983.
[8] H. L. Royden, Real Analysis. Prentice Hall, 1988.
[9] H. A. David and H. N. Nagaraja, Order Statistics. John Wiley and Sons, 2003.
[10] X. Shen, H.-C. Huang, and N. Cressie, "Nonparametric hypothesis testing for a spatial signal," Journal of the American Statistical Association, vol. 97, 2002.
[11] R. J. Simes, "An improved Bonferroni procedure for multiple tests of significance," Biometrika, vol. 73, pp. 751-754, 1986.
[12] J. N. Tsitsiklis, "Decentralized detection," Advances in Statistical Signal Processing, H. V. Poor and J. B. Thomas, Eds., vol. 2.
[13] P. K. Varshney, Distributed Detection and Data Fusion. Springer, 1997.
[14] Saligrama Venkatesh, M. Alanyali, O. Savas, and S. Aeron, "Classification in sensor networks," Proceedings of the International Symposium on Information Theory, 2004.
