Unexpected Properties and Optimum Distributed Sensor Detectors for Dependent Observation Cases

Yunmin Zhu
Department of Mathematics, Sichuan University, Chengdu, Sichuan 610064, P. R. China

R. S. Blum
EECS Dept., Lehigh University, Bethlehem, PA 18015-3084, U.S.A.

Zhi-Quan Luo and K. M. Wong
Communications Res. Lab., McMaster University, Hamilton, ONT L8S 4K1, Canada

Abstract
Optimum distributed signal detection system design is studied for cases with statistically dependent observations from sensor to sensor. The common parallel architecture is assumed. Here each sensor sends a decision to a fusion center which determines a final binary decision using a nonrandomized fusion rule. General L-sensor cases are considered. A discretized iterative algorithm is suggested which can provide approximate solutions to the necessary conditions for optimum distributed sensor decision rules under a fixed fusion rule. The algorithm is shown to converge in a finite number of iterations, and the solutions obtained are shown to approach the solutions to the original problem, without discretization, as the variable step size shrinks to zero. In the formulation, both binary and multiple-bit sensor decision cases are considered. Illustrative numerical examples are presented for two, three and four sensor cases where a common random Gaussian signal is to be detected in Gaussian noise. Some unexpected properties of distributed signal detection systems are also proven to be true. In an L-sensor distributed detection system which uses L − 1 bits in the decisions of the first L − 1 sensors, it is shown that the last sensor should use no greater than $2^{L-1}$ bits in its decision. Using more than this number of bits cannot improve performance. Further, in these cases a particular fusion rule, which depends only on the number of bits used in the sensor decisions, can be used without sacrificing any performance. This fusion rule can achieve optimum performance with the correct set of sensor decision rules.

This work was supported by The National Climbing Project and NNSF of China. This paper is also based on work supported by the Office of Naval Research under Grant No. N00014-97-1-0774 and by the National Science Foundation under Grant No. MIP-9703730.
1 Introduction

In recent years, there has been increased research interest in distributed signal detection problems [1] through [18]. A system with multiple distributed sensors has many advantages over one with a single sensor. These include increases in the reliability, robustness and survivability of the system. Distributed signal detection problems for cases with statistically independent observations have received significant attention. Problems with statistically dependent observations have received much less study, mainly because they have been shown to be difficult. A set of coupled equations describing necessary conditions for optimum distributed sensor detectors was produced in [1]. It was explained in [1] that finding solutions to these equations is extremely difficult, due to the coupling. While [1] produced necessary conditions for the single-bit sensor decision case, similar equations for multiple-bit sensor decision cases were provided in [6]. Again, the authors emphasized the difficulty of solving for the optimum sensor detectors. A mathematical verification demonstrating the computational difficulty of finding the optimum sensor detectors was provided in [7]. Some progress was made for weak signal cases: closed-form expressions for the optimum sensor detectors for weak signal cases were found in [8], and interpretations of the forms of the optimum sensor rules were given. A Gauss-Seidel approach was suggested in [9] to numerically find the functional forms of the sensor detectors for cases where signals may not be weak. In [9], the sensor rule optimization problem is addressed as a continuous one; discretizing the optimization problem is not addressed there. We propose an efficient discretized iterative algorithm to search for optimum sensor rules. We prove the convergence of this algorithm and show that its solutions approach the solutions of the original continuous algorithm as a step-size parameter is taken to zero.

Studies of fusion rules for distributed signal detection systems have also appeared. The form of the optimum fusion rule for cases with independent observations was produced in [10]. Further studies of fusion rules for independent observations have also appeared [11, 12]. General studies of fusion rules for dependent observations have appeared less frequently, but a limited number of studies can be found [13, 14]. In this research, we present some new findings about optimum fusion rules. The basic nature of these findings demonstrates how little is known about this topic. In particular, we show some highly unexpected properties of fusion rules. We show that a specific fixed fusion rule, which depends only on the number of bits used in the sensor decisions, can be used to obtain overall optimum performance in some specific cases. These cases include those where an L-sensor distributed decision system uses a total of L − 1 one-bit decisions distributed over the first L − 1 sensors, while the last sensor uses $2^{L-1}$ bits in its decision. Moreover, the fusion rule employed in these cases does not depend on the statistical properties of the observation data. Of course, to get optimum performance, the choice of the sensor rules does depend on statistical properties of the observation data. Further, we show that, in these cases, increasing the number
of bits used in the last sensor decision will not improve performance. Even if the observations themselves are sent from this sensor to the fusion center, performance will not be improved. Our numerical results support the above statements. We note that other types of unexpected behavior have been observed for distributed detection systems in the past, for example in [15].

The paper is organized as follows. In Section 2, we present the problem formulation. In Section 3 we propose a discretized algorithm to find optimum sensor detectors for a fixed fusion rule and we demonstrate the finite convergence of this algorithm. In Section 4 we extend our results to cases with sensor detectors that make multiple-bit decisions. In Section 4, we also present some unexpected properties of distributed signal detection systems. In Section 5, several numerical results are provided. Finally, we present our conclusions in the last section, Section 6.
2 Problem Formulation

Consider a Bayesian hypothesis testing problem with two hypotheses $H_0$ and $H_1$, and L sensors, where an $n_i$-dimensional vector of observations $y_i$ is observed at sensor i. A binary decision, $u_i = 0$ or 1, is generated at sensor i, i = 1, ..., L, using the nonrandomized sensor decision rule $u_i = I_i(y_i)$. Here $u_i$ denotes the decision at sensor i. For convenience we define
$\Omega_i = \{ y_i \in \ldots$

$$
\begin{cases}
I_1^{(i+1)}(y_1) = I\left[\int P_{11}\big(I_2^{(i)}(y_2), \ldots, I_L^{(i)}(y_L)\big)\, \hat{L}(y_1, y_2, \ldots, y_L)\, dy_2 \cdots dy_L\right], \\[4pt]
I_2^{(i+1)}(y_2) = I\left[\int P_{21}\big(I_1^{(i+1)}(y_1), I_3^{(i)}(y_3), \ldots, I_L^{(i)}(y_L)\big)\, \hat{L}(y_1, y_2, \ldots, y_L)\, dy_1\, dy_3 \cdots dy_L\right], \\[4pt]
\qquad \vdots \\[4pt]
I_L^{(i+1)}(y_L) = I\left[\int P_{L1}\big(I_1^{(i+1)}(y_1), I_2^{(i+1)}(y_2), \ldots, I_{L-1}^{(i+1)}(y_{L-1})\big)\, \hat{L}(y_1, y_2, \ldots, y_L)\, dy_1\, dy_2 \cdots dy_{L-1}\right].
\end{cases}
\tag{3.1}
$$
To facilitate computer implementation of the above process, we discretize the variables. Let the discretization (the grid) of $y_j$ be $\{y_{j1}, \ldots, y_{jN_j}\}$, j = 1, ..., L. Then the iterations (3.1) become

$$
\begin{cases}
I_{1m_1}^{(i+1)} = I\left[\displaystyle\sum_{m_2=1}^{N_2} \sum_{m_3=1}^{N_3} \cdots \sum_{m_L=1}^{N_L} P_{11}\big(I_2^{(i)}(y_{2m_2}), \ldots, I_L^{(i)}(y_{Lm_L})\big)\, \hat{L}(y_{1m_1}, \ldots, y_{Lm_L})\, \Delta y_2\, \Delta y_3 \cdots \Delta y_L\right], & m_1 = 1, \ldots, N_1, \\[6pt]
I_{2m_2}^{(i+1)} = I\left[\displaystyle\sum_{m_1=1}^{N_1} \sum_{m_3=1}^{N_3} \cdots \sum_{m_L=1}^{N_L} P_{21}\big(I_1^{(i+1)}(y_{1m_1}), I_3^{(i)}(y_{3m_3}), \ldots, I_L^{(i)}(y_{Lm_L})\big)\, \hat{L}(y_{1m_1}, \ldots, y_{Lm_L})\, \Delta y_1\, \Delta y_3 \cdots \Delta y_L\right], & m_2 = 1, \ldots, N_2, \\[6pt]
\qquad \vdots \\[6pt]
I_{Lm_L}^{(i+1)} = I\left[\displaystyle\sum_{m_1=1}^{N_1} \sum_{m_2=1}^{N_2} \cdots \sum_{m_{L-1}=1}^{N_{L-1}} P_{L1}\big(I_1^{(i+1)}(y_{1m_1}), I_2^{(i+1)}(y_{2m_2}), \ldots, I_{L-1}^{(i+1)}(y_{(L-1)m_{L-1}})\big)\, \hat{L}(y_{1m_1}, \ldots, y_{Lm_L})\, \Delta y_1\, \Delta y_2 \cdots \Delta y_{L-1}\right], & m_L = 1, \ldots, N_L,
\end{cases}
\tag{3.2}
$$
where $\Delta y_1, \Delta y_2, \ldots, \Delta y_L$ are the positive step sizes used for discretizing the vectors $y_1, y_2, \ldots, y_L$ respectively. In the sequel, for notational simplicity we normalize them to unity. The iterations (3.2) are the corresponding discrete versions of the continuous iteration process (3.1). As such, they are readily implemented on computers. A simple termination criterion for the iteration process is to stop as soon as

$$
I_{1m_1}^{(i+1)} = I_{1m_1}^{(i)},\quad I_{2m_2}^{(i+1)} = I_{2m_2}^{(i)},\quad \ldots,\quad I_{Lm_L}^{(i+1)} = I_{Lm_L}^{(i)} \quad \text{for all } m_1, m_2, \ldots, m_L.
\tag{3.3}
$$

An alternative is to stop when

$$
\sum_{m_1, m_2, \ldots, m_L} \left( \big|I_{1m_1}^{(i+1)} - I_{1m_1}^{(i)}\big| + \cdots + \big|I_{Lm_L}^{(i+1)} - I_{Lm_L}^{(i)}\big| \right) \le \varepsilon,
\tag{3.4}
$$
where $\varepsilon > 0$ is some pre-specified tolerance parameter.

We now examine the convergence of the Gauss-Seidel iterative algorithm of (3.2). To simplify expressions, define

$$
\Phi\big(I_1^{(i+1)}, \ldots, I_j^{(i+1)}, I_{j+1}^{(i)}, \ldots, I_L^{(i)}\big)
= \sum_{m_1, m_2, \ldots, m_L} \left[1 - F\big(I_{1m_1}^{(i+1)}, \ldots, I_{jm_j}^{(i+1)}, I_{(j+1)m_{j+1}}^{(i)}, \ldots, I_{Lm_L}^{(i)}\big)\right] \hat{L}(y_{1m_1}, \ldots, y_{Lm_L}) \quad \text{for all } 1 \le j \le L,
\tag{3.5}
$$

and

$$
G_j^{(i+1)}(y_{jm_j})
= \sum_{m_1, \ldots, m_{j-1}, m_{j+1}, \ldots, m_L} P_{j1}\big(I_{1m_1}^{(i+1)}, \ldots, I_{(j-1)m_{j-1}}^{(i+1)}, I_{(j+1)m_{j+1}}^{(i)}, \ldots, I_{Lm_L}^{(i)}\big)\, \hat{L}(y_{1m_1}, \ldots, y_{Lm_L}) \quad \text{for all } 1 \le j \le L,
\tag{3.6}
$$

where $\Phi$ is the discrete version of the cost functional in (2.3) and $G_j^{(i+1)}$ is the discrete version of the integral $\int P_{j1} \hat{L}(y_1, \ldots, y_L)\, dy_1 \cdots dy_{j-1}\, dy_{j+1} \cdots dy_L$ in (2.8). In order to simplify the presentation of the proof of convergence, we present a sequence of lemmas.
Lemma 3.1 Once the termination condition in (3.3) is satisfied for some $i = k \ge 0$, then this condition will remain satisfied for all $i \ge k$.

The lemma follows immediately from the iterative algorithm (3.2).
Lemma 3.2 $\Phi\big(I_1^{(i+1)}, \ldots, I_j^{(i+1)}, I_{j+1}^{(i)}, \ldots, I_L^{(i)}\big)$ is non-increasing as j is increased, and $\Phi\big(I_1^{(i+1)}, I_2^{(i+1)}, \ldots, I_L^{(i+1)}\big) \le \Phi\big(I_1^{(i)}, I_2^{(i)}, \ldots, I_L^{(i)}\big)$.
Proof. Using (2.5), (3.2), (3.5) and (3.6), we have

$$
\begin{aligned}
\Phi\big(I_1^{(i+1)}, \ldots, I_j^{(i+1)}, I_{j+1}^{(i)}, \ldots, I_L^{(i)}\big)
&= \sum_{m_j=1}^{N_j} \big(1 - I_j^{(i+1)}(y_{jm_j})\big)\, G_j^{(i+1)}(y_{jm_j}) + C_j \\
&= \sum_{m_j=1}^{N_j} \big(1 - I_j^{(i)}(y_{jm_j})\big)\, G_j^{(i+1)}(y_{jm_j}) + C_j + \sum_{m_j=1}^{N_j} \big(I_j^{(i)}(y_{jm_j}) - I_j^{(i+1)}(y_{jm_j})\big)\, G_j^{(i+1)}(y_{jm_j}) \\
&= \Phi\big(I_1^{(i+1)}, \ldots, I_{j-1}^{(i+1)}, I_j^{(i)}, \ldots, I_L^{(i)}\big) + \sum_{m_j=1}^{N_j} \big(I_j^{(i)}(y_{jm_j}) - I_j^{(i+1)}(y_{jm_j})\big)\, G_j^{(i+1)}(y_{jm_j}) \\
&\le \Phi\big(I_1^{(i+1)}, \ldots, I_{j-1}^{(i+1)}, I_j^{(i)}, \ldots, I_L^{(i)}\big) \quad \text{for all } j \le L,
\end{aligned}
$$

where $C_j$ is a constant independent of $I_j^{(i)}$ and $I_j^{(i+1)}$. The first three equalities follow from (2.5), (3.5) and (3.6), and the last inequality holds because (3.2) implies

$$
I_j^{(i+1)}(y_{jm_j}) = 1 \quad \text{if and only if} \quad G_j^{(i+1)}(y_{jm_j}) \ge 0, \quad \text{for all } 1 \le m_j \le N_j;
\tag{3.7}
$$

that is to say, all terms of the summation

$$
\sum_{m_j=1}^{N_j} \big(I_j^{(i)}(y_{jm_j}) - I_j^{(i+1)}(y_{jm_j})\big)\, G_j^{(i+1)}(y_{jm_j})
\tag{3.8}
$$

are non-positive. Q.E.D.
From Lemma 3.2, we know $\Phi(I_1^{(i)}, I_2^{(i)}, \ldots, I_L^{(i)})$ is non-increasing in i; since it can take only finitely many values (there are only finitely many choices of the discrete sensor rules), it must converge to a stationary value after a finite number of iterations. Then we have
Lemma 3.3 The $I_{1m_1}^{(i)}, I_{2m_2}^{(i)}, \ldots, I_{Lm_L}^{(i)}$ from (3.2) are also finitely convergent.

Proof. By Lemma 3.2, $\Phi$ must attain a stationary value after a finite number of iterations, i.e.,

$$
\Phi\big(I_1^{(i+1)}, \ldots, I_j^{(i+1)}, I_{j+1}^{(i)}, \ldots, I_L^{(i)}\big) = \Phi\big(I_1^{(i+1)}, \ldots, I_{j-1}^{(i+1)}, I_j^{(i)}, \ldots, I_L^{(i)}\big).
$$

Then, using the analysis in Lemma 3.2 (recall all terms in (3.8) are non-positive),

$$
\big(I_j^{(i)}(y_{jm_j}) - I_j^{(i+1)}(y_{jm_j})\big)\, G_j^{(i+1)}(y_{jm_j}) = 0 \quad \text{for all } m_j,
$$

which implies either

$$
I_j^{(i+1)}(y_{jm_j}) = I_j^{(i)}(y_{jm_j}),
$$

or (from (3.7))

$$
G_j^{(i+1)}(y_{jm_j}) = 0, \quad \text{i.e.,} \quad I_j^{(i+1)}(y_{jm_j}) = 1.
$$

It follows that once $\Phi$ attains such a stationary value, either $I_j^{(i+1)}(y_{jm_j})$ is invariant or $I_j^{(i+1)}(y_{jm_j}) = 1$. In other words, $I_j^{(i+1)}(y_{jm_j})$ may change value, between 0 and 1, at most a finite number of times, after which it can no longer change. Thus the algorithm cannot oscillate indefinitely between two fixed points. Q.E.D.

Summarizing the above three lemmas, we can assert the following theorem on the finite convergence of the discretized Gauss-Seidel iteration process.
Theorem 3.1 For any positive discretization step sizes of the elements of $y_1, y_2, \ldots, y_L$ and any initial choice of $(I_1^{(0)}, I_2^{(0)}, \ldots, I_L^{(0)})$, the algorithm of (3.2) terminates with a set $(I_1^{(i)}, I_2^{(i)}, \ldots, I_L^{(i)})$ satisfying (3.3) after a finite number of iterations.
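To make the discretized iteration (3.2) and the stopping rule (3.3) concrete, here is a minimal sketch for L = 2 under the AND fusion rule $F(u_1, u_2) = u_1 u_2$, using the two-sensor Gaussian model and the priors and costs from Section 5. Because the definitions of Section 2 are not reproduced above, the sketch assumes the minimum-error-probability form $\hat{L}(y) = P_1 p(y \mid H_1) - P_0 p(y \mid H_0)$ and the threshold convention $I[x] = 1$ iff $x \ge 0$ of (3.7); for the AND rule, the decomposition $1 - u_1 u_2 = (1 - u_1)u_2 + (1 - u_2)$ gives $P_{11}(u_2) = u_2$ and, symmetrically, $P_{21}(u_1) = u_1$. All function and variable names are ours, not the paper's.

```python
# Sketch of the discretized Gauss-Seidel iteration (3.2) for L = 2 sensors
# and the AND fusion rule F(u1, u2) = u1*u2 (so P_11(u2) = u2, P_21(u1) = u1).
import numpy as np
from scipy.stats import multivariate_normal

delta = 0.2
grid = np.linspace(-7.0, 7.0, 71)                   # discretization of y1, y2
Y1, Y2 = np.meshgrid(grid, grid, indexing="ij")
pts = np.stack([Y1.ravel(), Y2.ravel()], axis=1)

P0, P1 = 2.0 / 3.0, 1.0 / 3.0                       # priors from Section 5
p1 = multivariate_normal([2, 2], [[6, 3], [3, 5]]).pdf(pts).reshape(Y1.shape)
p0 = multivariate_normal([0, 0], [[3, 0], [0, 2]]).pdf(pts).reshape(Y1.shape)
Lhat = P1 * p1 - P0 * p0                            # assumed form of L-hat

I1 = (grid > 0).astype(int)                         # arbitrary initial rules
I2 = (grid > 0).astype(int)
for sweep in range(1000):
    # G_1(y_1m1) = sum over m2 of P_11(I2(y_2m2)) * Lhat, threshold per (3.7)
    I1_new = (Lhat @ I2 >= 0).astype(int)
    # G_2 uses the freshly updated I1 (Gauss-Seidel, not Jacobi)
    I2_new = (I1_new @ Lhat >= 0).astype(int)
    if np.array_equal(I1, I1_new) and np.array_equal(I2, I2_new):
        break                                       # criterion (3.3)
    I1, I2 = I1_new, I2_new

# Bayes cost as a Riemann sum: C = P0 + integral of Lhat over region {F = 0}
F = np.outer(I1, I2)                                # AND rule on the grid
cost = P0 + delta**2 * Lhat[F == 0].sum()
print(f"converged after {sweep} sweeps, discretized cost = {cost:.4f}")
```

Per Remark 3.1 below, the iteration is only guaranteed to reach a person-by-person optimum, so in practice one would rerun it from several initial rules and keep the lowest-cost solution.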
Remark 3.1 Here we have to emphasize again that Theorem 3.1 does not guarantee that the algorithm in (3.2) converges to a globally optimum solution of (2.6) for all initial conditions. It is possible, even as the step sizes approach zero, that for some initial values the algorithm converges to a person-by-person optimum solution (i.e., a solution that cannot be improved by changing only one sensor rule) that is not globally optimum. Of course, the problem is that (2.6) gives only necessary conditions. Fortunately, in all cases we investigated numerically (see Section 5), only one (usually) or a few solutions to (3.2) were observed.

Let $\Delta y_1 = \Delta y_2 = \cdots = \Delta y_L = \Delta$ and let $C_\Delta$ denote the minimum of $C_\Delta(I_1, I_2, \ldots, I_L; F)$, the discrete version of $C(I_1, I_2, \ldots, I_L; F)$. Thus $C_\Delta$ is computed using a sum, and it is the quantity our discretized algorithm tries to, and can, achieve. One may question the existence of the limit of $C_\Delta$ and its relationship to the minimum of $C(I_1, I_2, \ldots, I_L; F)$. The following theorem asserts, under some mild conditions, that the limit of $C_\Delta$ exists and that $C_\Delta$ converges to the infimum of $C(I_1, I_2, \ldots, I_L; F)$ as $\Delta$ tends to zero.
Theorem 3.2 Suppose that for any region

$$
H_0 = \{(y_1, y_2, \ldots, y_L) : F(I_1(y_1), I_2(y_2), \ldots, I_L(y_L)) = 0\}
$$

defined by a set of sensor decision rules $(I_1, I_2, \ldots, I_L)$ and a fusion rule $F(I_1, I_2, \ldots, I_L)$, the following inequality holds:

$$
\left| \int_{H_0} \hat{L}(y_1, y_2, \ldots, y_L)\, dy_1\, dy_2 \cdots dy_L - S(H_0, \Delta) \right| \le M \Delta,
\tag{3.9}
$$

where $S(H_0, \Delta)$ is a Riemann sum of the integral in (3.9) with grid size $\Delta$ and the constant M does not depend on $H_0$ and $\Delta$. Then we have

$$
\lim_{\Delta \to 0} C_\Delta = \inf_{I_1, \ldots, I_L} C(I_1, I_2, \ldots, I_L; F) \triangleq C_{\inf}.
$$
Proof. For arbitrary $\varepsilon > 0$, by the definition of $C_{\inf}$ there exists a set of sensor decision rules $(I_1^{\varepsilon}, I_2^{\varepsilon}, \ldots, I_L^{\varepsilon})$ such that

$$
C(I_1^{\varepsilon}, I_2^{\varepsilon}, \ldots, I_L^{\varepsilon}; F) \le C_{\inf} + \tfrac{1}{2}\varepsilon.
\tag{3.10}
$$

Denote the Riemann sum version of $C(I_1^{\varepsilon}, \ldots, I_L^{\varepsilon}; F)$ by $C_\Delta(I_1^{\varepsilon}, \ldots, I_L^{\varepsilon}; F)$. By (3.9), there exists $\delta > 0$ such that for any $\Delta \le \delta$,

$$
C_\Delta(I_1^{\varepsilon}, \ldots, I_L^{\varepsilon}; F) \le C(I_1^{\varepsilon}, \ldots, I_L^{\varepsilon}; F) + \tfrac{1}{2}\varepsilon.
\tag{3.11}
$$

Thus, combining the above two inequalities yields

$$
C_\Delta(I_1^{\varepsilon}, \ldots, I_L^{\varepsilon}; F) \le C_{\inf} + \varepsilon.
\tag{3.12}
$$

Furthermore, noticing the definition of $C_\Delta$,

$$
C_\Delta \le C_\Delta(I_1^{\varepsilon}, \ldots, I_L^{\varepsilon}; F) \le C_{\inf} + \varepsilon \quad \text{for all } \Delta \le \delta,
\tag{3.13}
$$

which implies that

$$
\limsup_{\Delta \to 0} C_\Delta \le C_{\inf} + \varepsilon.
\tag{3.14}
$$

Since $\varepsilon$ is arbitrary, we have

$$
\limsup_{\Delta \to 0} C_\Delta \le C_{\inf}.
\tag{3.15}
$$

On the other hand, suppose

$$
\liminf_{\Delta \to 0} C_\Delta < C_{\inf}.
\tag{3.16}
$$

Then there would be a positive constant $\eta > 0$ and a sequence $\{\Delta_k\}$ such that $\Delta_k \to 0$ and

$$
C_{\Delta_k} < C_{\inf} - \eta.
\tag{3.17}
$$

For every such $C_{\Delta_k}$, there must be a set of rules $(I_1^{(k)}, I_2^{(k)}, \ldots, I_L^{(k)})$ such that

$$
C_{\Delta_k} = C_{\Delta_k}(I_1^{(k)}, I_2^{(k)}, \ldots, I_L^{(k)}; F).
\tag{3.18}
$$

Using the inequality (3.9) and (3.17), for K large enough we have

$$
C(I_1^{(K)}, I_2^{(K)}, \ldots, I_L^{(K)}; F) \le C_{\Delta_K} + \eta < C_{\inf},
\tag{3.19}
$$

which contradicts the definition of $C_{\inf}$. Therefore, the reverse of inequality (3.16) must be true and

$$
C_{\inf} \le \liminf_{\Delta \to 0} C_\Delta \le \limsup_{\Delta \to 0} C_\Delta \le C_{\inf}.
\tag{3.20}
$$

The theorem follows. Q.E.D.

Remark 3.2 The assumption (3.9) in this theorem is not restrictive. When $\hat{L}(y_1, y_2, \ldots, y_L)$ is locally Lipschitz continuous, one can easily prove that the inequality (3.9) holds.
4 Extensions and Unexpected Properties

To improve decision accuracy, if communication bandwidth is available, the sensors can make multiple-bit decisions. In this case an $r_i$-bit sensor decision can be made by a group of decision functions at sensor i. In particular, the mth bit in the sensor decision at sensor $\ell$ is produced by $u_{\ell,m} = I_{\ell,m}(y_\ell)$ for $m = 1, \ldots, r_\ell$ and $\ell = 1, \ldots, L$. It is not hard to see that all of the above analyses, algorithms and results can be extended to this more general case. In fact, a set of equations very similar to (2.5) results. These are

$$
\begin{aligned}
1 - F(u_{1,1}, \ldots, u_{L,r_L})
&= (1 - u_{1,1})\, P_{1,1,1}(u_{1,2}, \ldots, u_{L,r_L}) + P_{1,1,2}(u_{1,2}, \ldots, u_{L,r_L}) \\
&= (1 - u_{1,2})\, P_{1,2,1}(u_{1,1}, u_{1,3}, \ldots, u_{L,r_L}) + P_{1,2,2}(u_{1,1}, u_{1,3}, \ldots, u_{L,r_L}) \\
&\;\;\vdots \\
&= (1 - u_{L,r_L})\, P_{L,r_L,1}(u_{1,1}, \ldots, u_{L,r_L-1}) + P_{L,r_L,2}(u_{1,1}, \ldots, u_{L,r_L-1}).
\end{aligned}
\tag{4.1}
$$

The difference is that (4.1) now requires one equation for each bit at each sensor, and that the constants change, $P_{ij} \to P_{i,m,j}$. Using (4.1) leads to a set of equations similar to those given in (2.6) and (3.1). The new version of (3.1) is

$$
\begin{cases}
I_{1,1}^{(i+1)}(y_1) = I\left[\int P_{1,1,1}\big(I_{1,2}^{(i)}, I_{1,3}^{(i)}, \ldots, I_{1,r_1}^{(i)}, \ldots, I_{L,1}^{(i)}, \ldots, I_{L,r_L}^{(i)}\big)\, \hat{L}(y_1, y_2, \ldots, y_L)\, dy_2\, dy_3 \cdots dy_L\right], \\[4pt]
I_{1,2}^{(i+1)}(y_1) = I\left[\int P_{1,2,1}\big(I_{1,1}^{(i+1)}, I_{1,3}^{(i)}, \ldots, I_{1,r_1}^{(i)}, \ldots, I_{L,1}^{(i)}, \ldots, I_{L,r_L}^{(i)}\big)\, \hat{L}(y_1, y_2, \ldots, y_L)\, dy_2\, dy_3 \cdots dy_L\right], \\[4pt]
\qquad \vdots \\[4pt]
I_{L,r_L}^{(i+1)}(y_L) = I\left[\int P_{L,r_L,1}\big(I_{1,1}^{(i+1)}, I_{1,2}^{(i+1)}, \ldots, I_{L,r_L-1}^{(i+1)}\big)\, \hat{L}(y_1, y_2, \ldots, y_L)\, dy_1\, dy_2 \cdots dy_{L-1}\right].
\end{cases}
\tag{4.2}
$$

Accordingly, there is an obvious discretized version of (4.2), analogous to (3.2).
Unexpected Properties

Thus far, we have not yet considered optimizing the fusion rule. Of course, one could use our techniques to find the best sensor rules for each fusion rule in some class, and then pick the fusion rule in this class that gives the best performance. This could require significant computation if the class of fusion rules considered is large. Here we present an alternative for some special cases which generally requires considerably less computation. This alternative can find the best distributed signal detection system over the class of all nonrandomized fusion rules. It is based on the following fact: in special cases we can provide a fixed fusion rule that can be used to achieve optimum performance. This is not always possible, just in these special cases, and these special cases exhibit unexpected and interesting properties, as we shall discuss in the rest of this section.

Consider a case with L sensors and assume L − 1 of them are required to make binary decisions (for simplicity, we denote these sensor decisions and their decision rules using a single index, as in the binary sensor decision case) while one (the last) makes a $2^{L-1}$-bit decision (for which we use two indices). The fusion rule we propose is given by

$$
\{u_1, \ldots, u_{L,2^{L-1}} : F = 0\} =
\left\{
\begin{aligned}
& u_1 = 0, \; u_2 = 0, \; \ldots, \; u_{L-1} = 0, \; u_{L,1} = 0; \\
& u_1 = 1, \; u_2 = 0, \; \ldots, \; u_{L-1} = 0, \; u_{L,2} = 0; \\
& \qquad \vdots \\
& u_1 = 1, \; u_2 = 1, \; \ldots, \; u_{L-1} = 1, \; u_{L,2^{L-1}} = 0
\end{aligned}
\right\}.
\tag{4.3}
$$
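Concretely, (4.3) says that the first L − 1 binary decisions, read as a binary number, select one bit of the last sensor's $2^{L-1}$-bit decision, and that selected bit is the final decision. A minimal sketch of this rule (the u_1-as-least-significant-bit ordering follows the "binary count" grouping described in the proof below; the code and names are ours):

```python
# The fixed fusion rule (4.3): decisions u_1, ..., u_{L-1} index one bit of
# the last sensor's 2^(L-1)-bit decision, and that bit is the final decision.
from typing import Sequence

def fixed_fusion_rule(u: Sequence[int], uL_bits: Sequence[int]) -> int:
    """u: binary decisions u_1..u_{L-1}; uL_bits: bits u_{L,1}..u_{L,2^{L-1}}.
    Returns the final decision (0 = decide H0, 1 = decide H1)."""
    assert len(uL_bits) == 2 ** len(u)
    j = sum(ui << i for i, ui in enumerate(u))  # binary count of u_1..u_{L-1}
    return uL_bits[j]                           # final decision F = u_{L,j+1}

# Example, L = 3: sensors 1 and 2 send one bit each, sensor 3 sends 4 bits.
print(fixed_fusion_rule([1, 0], [0, 1, 0, 1]))  # selects u_{3,2}, prints 1
```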
Theorem 4.1 By employing the fusion rule in (4.3), in the case specified (L − 1 binary sensors and one $2^{L-1}$-bit sensor), we can obtain optimum performance.
The Proof of Theorem 4.1 Obviously, the theorem will be proven if we show that there exists a set of sensor decision rules which, with this fusion rule, can implement any distributed detection scheme with a nonrandomized fusion rule. Consider a set of sensor decision rules $I_1, \ldots, I_{L,2^{L-1}}$ and a general fusion rule

$$
\{u_1, \ldots, u_{L,2^{L-1}} : F = 0\} =
\left\{
\begin{aligned}
& u_1 = d_1^{(1)}, \; u_2 = d_2^{(1)}, \; \ldots, \; u_{L-1} = d_{L-1}^{(1)}, \; u_{L,1} = d_{L,1}^{(1)}, \; \ldots, \; u_{L,2^{L-1}} = d_{L,2^{L-1}}^{(1)}; \\
& u_1 = d_1^{(2)}, \; u_2 = d_2^{(2)}, \; \ldots, \; u_{L-1} = d_{L-1}^{(2)}, \; u_{L,1} = d_{L,1}^{(2)}, \; \ldots, \; u_{L,2^{L-1}} = d_{L,2^{L-1}}^{(2)}; \\
& \qquad \vdots \\
& u_1 = d_1^{(N)}, \; u_2 = d_2^{(N)}, \; \ldots, \; u_{L-1} = d_{L-1}^{(N)}, \; u_{L,1} = d_{L,1}^{(N)}, \; \ldots, \; u_{L,2^{L-1}} = d_{L,2^{L-1}}^{(N)}
\end{aligned}
\right\},
\tag{4.4}
$$

where $N \le 2^{(L-1)+2^{L-1}}$ and, in (4.4), all $d_i^{(j)} = 0$ or 1. Assume that each set of sensor decisions that produces F = 0 is explicitly written in (4.4). Thus, for example, if the Lth sensor is irrelevant to the final decision, then we could express (4.4) with a small N where the Lth sensor's decisions never appear. However, this is really the same as listing all combinations of the Lth sensor's decisions for each combination of the first L − 1 sensor decisions that appear in (4.4). This last form is assumed here.

Now divide all the possible sensor decision combinations into a number of groups, where each group consists of all sets of sensor decisions which use the same decisions at the first L − 1 sensors. The jth group has the common terms $u_1 = d_1^{(j)}, u_2 = d_2^{(j)}, \ldots, u_{L-1} = d_{L-1}^{(j)}$ equal to the binary count for j. Each member of the group is uniquely identified by the decisions at sensor L. Note there are $2^{L-1}$ groups. Now define the set of observations at sensor L that produce a given set of distinguishing sensor decisions as ($\hat{d}_{L,j} = 0$ or 1, $j = 1, \ldots, 2^{L-1}$)

$$
\Omega_L = \{y_L : I_{L,1}(y_L) = \hat{d}_{L,1}, \; \ldots, \; I_{L,2^{L-1}}(y_L) = \hat{d}_{L,2^{L-1}}\}.
$$

We denote by $\Omega_{Lj}$ the union of all such $\Omega_L$ which correspond to sensor decisions in the jth group, when the jth group does appear in (4.4). Note that if the jth group does not appear in (4.4) then $\Omega_{Lj}$ is the empty set. Thus, using $\Omega_{Lj}$, we can define a new sensor decision rule $\hat{I}_{L,j}(y_L)$ at the Lth sensor as

$$
\hat{I}_{L,j}(y_L) =
\begin{cases}
0, & y_L \in \Omega_{Lj}, \\
1, & \text{otherwise}.
\end{cases}
$$

Notice that if $\Omega_{Lj}$ is the empty set then $\hat{I}_{L,j}(y_L) = 1$ for all $y_L$. Using $\hat{I}_{L,j}(y_L)$, $j = 1, \ldots, 2^{L-1}$, with the other sensor rules $I_i$, $i = 1, \ldots, L-1$, and (4.3) ensures that the overall scheme produces the same output as the original scheme using the rule from (4.4). Thus our fusion rule (4.3) allows us to represent any scheme with a rule from (4.4). Q.E.D.
Remark 4.1 An optimum set of sensor rules and a fusion rule is not necessarily unique. This is clear from Theorem 4.1.
The next theorem shows the special nature of the case we have considered (L − 1 binary sensors and one $2^{L-1}$-bit sensor). It says that performance is not improved by sending a sensor decision from the $2^{L-1}$-bit sensor which uses more than $2^{L-1}$ bits. This is true even if this sensor can send uncompressed data (the original observation $y_L$) to the fusion center.

Theorem 4.2 When one sensor (the Lth) transmits more than $2^{L-1}$ bits (including uncompressed observation data) and the other sensors together transmit L − 1 information bits to the fusion center, the optimum performance is equivalent to that of a system in which the Lth sensor transmits only $2^{L-1}$ information bits to the fusion center, and in particular to the scheme in Theorem 4.1 which uses (4.3).
The Proof of Theorem 4.2 It still suffices to prove that all schemes which use fusion rules of the form $F(I_1(y_1), \ldots, I_{L-1}(y_{L-1}), y_L)$ with any sensor decision rules can be implemented using the fusion rule (4.3) with the sensor decision rules at the Lth sensor set in a specific way. First assume the first L − 1 sensors make binary decisions as in Theorem 4.1. Consider a general scheme with a critical region for the fusion rule of the form

$$
H_0 = \left\{ (y_1, \ldots, y_L) :
\begin{aligned}
& F\big(I_1(y_1) = d_1^{(1)}, \; I_2(y_2) = d_2^{(1)}, \; \ldots, \; I_{L-1}(y_{L-1}) = d_{L-1}^{(1)}, \; y_L\big) = 0; \\
& F\big(I_1(y_1) = d_1^{(2)}, \; I_2(y_2) = d_2^{(2)}, \; \ldots, \; I_{L-1}(y_{L-1}) = d_{L-1}^{(2)}, \; y_L\big) = 0; \\
& \qquad \vdots \\
& F\big(I_1(y_1) = d_1^{(N)}, \; I_2(y_2) = d_2^{(N)}, \; \ldots, \; I_{L-1}(y_{L-1}) = d_{L-1}^{(N)}, \; y_L\big) = 0
\end{aligned}
\right\},
\tag{4.5}
$$

where all $d_i^{(j)} = 0$ or 1. Take the same approach as in Theorem 4.1, but with $\hat{I}_{L,j}(y_L) \equiv F\big(d_1^{(j)}, \ldots, d_{L-1}^{(j)}, y_L\big)$. Using $\hat{I}_{L,j}(y_L)$, $j = 1, \ldots, 2^{L-1}$, with the other sensor rules $I_i$, $i = 1, \ldots, L-1$, and (4.3) ensures that the overall scheme produces the same output as the original scheme using the rule from (4.5). It is easy to see that the important point for the proof is that the last sensor has a bit in its sensor decision that can match up with each combination of the other sensor decisions. Thus L − 1 bits can be used by the first L − 1 sensors in total, and it is not important that each of the first L − 1 sensors makes a one-bit decision. Q.E.D.
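The construction in these two proofs is easy to check mechanically. The sketch below does so for L = 3: it draws an arbitrary fusion rule F on $(u_1, u_2)$ and the four bits of sensor 3, redefines sensor 3's bits as $\hat{I}_{3,j}(\cdot) = F(d^{(j)}, \cdot)$ exactly as in the proof of Theorem 4.2, and verifies that the fixed rule (4.3) then reproduces F for every input. Since only the bit pattern sensor 3 emits matters, the $2^4$ patterns stand in for $y_3$; the random rule and all names are ours.

```python
# Brute-force check (L = 3) that the fixed rule (4.3), with sensor 3's bits
# redefined as in the proofs of Theorems 4.1/4.2, reproduces an arbitrary
# nonrandomized fusion rule on (u1, u2) and sensor 3's four bits.
import itertools
import random

L = 3
nbits = 2 ** (L - 1)                     # 4 bits at the last sensor
random.seed(0)
# an arbitrary fusion rule, tabulated on all (u1, u2, b1, b2, b3, b4)
F = {key: random.randint(0, 1)
     for key in itertools.product((0, 1), repeat=(L - 1) + nbits)}

def u_of_group(j):                       # binary count for group j, u1 = LSB
    return tuple((j >> i) & 1 for i in range(L - 1))

def fixed_fusion(u, bits):               # rule (4.3)
    return bits[sum(ui << i for i, ui in enumerate(u))]

for pattern in itertools.product((0, 1), repeat=nbits):  # stands in for y_3
    # redefined sensor-3 bits: I-hat_{3,j}(y_3) = F(d^(j), y_3)
    new_bits = [F[u_of_group(j) + pattern] for j in range(nbits)]
    for u in itertools.product((0, 1), repeat=L - 1):
        assert fixed_fusion(u, new_bits) == F[u + pattern]
print("rule (4.3) with redefined sensor-3 bits reproduces the arbitrary rule")
```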
Remark 4.2 Theorem 4.2 is useful in practice. For example, when the performance of a decision system is not satisfactory, we may add a number of extra sensors. The theorem tells us how many bits the added sensor should use when it joins a set of binary sensors. We note that increasing the number of bits used in a sensor decision will generally increase performance except in specific situations; some of these have been outlined. Thus, even in a case with L binary sensors, it is generally advantageous to increase the number of bits used in one of the sensor decisions until the limit of $2^{L-1}$ is reached.
Remark 4.3 Theorem 4.1 and Theorem 4.2 suggest a way to find optimum distributed signal detection systems which use binary sensor decisions if one can make observations at the fusion center. Start with an (L − 1)-sensor system. Allow observations to be made at the fusion center, through an added sensor there, and allow these observations to be used in the fusion. Recall that Theorem 4.2 tells us we could quantize these observations to $2^{L-1}$ bits if desired, without loss of performance, and in fact we need to do this to use Theorem 4.1. Then, Theorem 4.1 says (4.3) can be used to achieve optimum performance over the class of nonrandomized fusion rules. From Theorem 4.2, the system would be optimum over all systems that combine binary decisions at the first L − 1 sensors with an unquantized observation at the Lth sensor. In essence, we have exchanged the complexity of searching over multiple fusion rules for the complexity of designing an extra sensor rule for the added Lth sensor (at the fusion center). The added sensor is more difficult to design than the other sensors since it makes a $2^{L-1}$-bit sensor decision. Once the optimum sensor rules are found, one can use (4.3) to learn exactly how the first L − 1 single-bit sensor decisions and the observations at the fusion center are used to generate a final decision.
Remark 4.4 Note that the added Lth sensor could be a dummy which does not really make observations. This might be used as a "trick" to avoid searching for the optimum fusion rule for the (L − 1)-sensor problem. In this case we expect the rules $\hat{I}_{L,j}(y_L)$, $j = 1, \ldots, 2^{L-1}$, will not depend on $y_L$, so each $\hat{I}_{L,j}(y_L)$ will take on the value 0 or 1 for all $y_L$. In fact, if we think of the dummy sensor output as fixed at a certain value, maybe 0, then we must acknowledge that this value could be mapped to any of the possible sensor decisions at sensor L. Since there are $2^{L-1}$ bits in the Lth sensor's decision, there are $2^{2^{L-1}}$ possible sensor decisions at sensor L. In this case, trying all possible sensor decision rules at sensor L involves trying all of the $2^{2^{L-1}}$ possible combinations of the individual bit decisions at sensor L. Note that this is exactly the total number of fusion rules for the original (L − 1)-sensor problem (if none are ruled out [17]), which makes complete sense. Thus we see there is no magic associated with our results.
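The counting in this remark is easy to reproduce. The following sketch (for L = 3, names ours) fixes the dummy sensor's $2^{L-1} = 4$ bits at each of the $2^{2^{L-1}} = 16$ possible constant values and tabulates the fusion rule that (4.3) then induces on $(u_1, u_2)$; all 16 Boolean functions of the two real decisions appear, one per constant bit pattern.

```python
# Remark 4.4 counting check (L = 3): a dummy third sensor with four constant
# bits, fused via (4.3), realizes every fusion rule on the two real decisions.
import itertools

def fixed_fusion(u, bits):               # rule (4.3), u1 = least significant
    return bits[sum(ui << i for i, ui in enumerate(u))]

realized = set()
for const_bits in itertools.product((0, 1), repeat=4):   # dummy's decisions
    # the induced rule on (u1, u2), tabulated over its four possible inputs
    table = tuple(fixed_fusion(u, const_bits)
                  for u in itertools.product((0, 1), repeat=2))
    realized.add(table)
print(len(realized))                     # 16 = 2^(2^(L-1)) distinct rules
```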
Remark 4.5 From the proofs of Theorems 4.1 and 4.2, it is clear that the important property of the case considered (a total of L − 1 one-bit decisions at the first L − 1 sensors and $2^{L-1}$ bits at the last sensor) is that there is one bit in the last sensor's decision for each of the possible combinations of sensor decisions from the other sensors. Clearly this leads to a generalization of the results in Theorems 4.1 and 4.2 that will work as long as the last sensor has enough bits in its decision. This leads to fixed fusion rules for other cases that can be used to achieve optimum performance. It also leads to more cases where a finite number of bits in a sensor's decision will lead to the same performance that can be achieved if the sensor sends unquantized data to the fusion center. As one example, consider a two-sensor case where the first sensor makes a 2-bit decision and the second sensor makes a 4-bit decision. A slight generalization of (4.3) gives the fusion rule to use here. A case of this type is considered in the numerical results given in the next section.
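The generalization sketched in this remark needs no new machinery: the total bits sent by all but the last sensor, read as a binary number, again select one bit of the last sensor's decision. A minimal sketch under that reading (the bit ordering is our assumption; the paper only states that a slight generalization of (4.3) applies):

```python
# Remark 4.5 generalization (names ours): the 2 bits of sensor 1 select one
# of sensor 2's 4 bits, mirroring (4.3) with L - 1 total bits as the index.
def generalized_fixed_fusion(first_bits, last_bits):
    assert len(last_bits) == 2 ** len(first_bits)
    j = sum(b << i for i, b in enumerate(first_bits))
    return last_bits[j]

print(generalized_fixed_fusion([0, 1], [1, 0, 1, 0]))  # selects bit 3 -> 1
```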
5 Numerical Results

In the following numerical investigations, we consider detecting a common random Gaussian signal in Gaussian noise with 2, 3, and 4 sensors. In our algorithm, we take $\Delta = 0.2$ and $y_i \in [-7, 7]$.
Two sensors

The observations consist of signal s and noises $\nu_1, \nu_2$ so that

$$
H_1: \;\; y_1 = s + \nu_1, \quad y_2 = s + \nu_2; \qquad
H_0: \;\; y_1 = \nu_1, \quad y_2 = \nu_2,
$$

where s, $\nu_1$ and $\nu_2$ are all mutually independent, and

$$
s \sim N(2, 3), \quad \nu_1 \sim N(0, 3), \quad \nu_2 \sim N(0, 2).
$$

Therefore, the two conditional pdfs given $H_1$ and $H_0$ are

$$
p(y_1, y_2 \mid H_1) \sim N\!\left(\begin{pmatrix} 2 \\ 2 \end{pmatrix}, \begin{bmatrix} 6 & 3 \\ 3 & 5 \end{bmatrix}\right), \qquad
p(y_1, y_2 \mid H_0) \sim N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}\right).
$$
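For reference, the centralized curves in Figs. 1-3 come from thresholding the likelihood ratio of this joint Gaussian model. A Monte Carlo sketch of a few points on that ROC (sample size, seed and names are ours):

```python
# Monte Carlo sketch of the centralized ROC: threshold the log-likelihood
# ratio of the joint two-sensor Gaussian model given above.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
n = 200_000
mvn1 = multivariate_normal([2, 2], [[6, 3], [3, 5]])
mvn0 = multivariate_normal([0, 0], [[3, 0], [0, 2]])

y_h1 = mvn1.rvs(n, random_state=rng)                # observations under H1
y_h0 = mvn0.rvs(n, random_state=rng)                # observations under H0

def llr(y):                                         # log-likelihood ratio
    return mvn1.logpdf(y) - mvn0.logpdf(y)

llr1, llr0 = llr(y_h1), llr(y_h0)
for t in [-1.0, 0.0, 1.0]:                          # a few thresholds
    pd = np.mean(llr1 > t)                          # detection probability
    pf = np.mean(llr0 > t)                          # false-alarm probability
    print(f"threshold {t:+.1f}: Pf = {pf:.3f}, Pd = {pd:.3f}")
```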
First, we consider the two-sensor case using binary sensor decisions. The ROCs (receiver operating characteristics) [19] for the centralized, AND, and OR rule cases are provided in Fig. 1. We also include the ROC of the optimum scheme using binary decisions at sensor one and two-bit decisions at the second sensor. This is a case of the type discussed in Theorem 4.1, so the fusion rule in (4.3) is used here. Here we use the notation "OPT(1+2)". Note that using two-bit sensor decisions yields better performance, as we expect. To show the performance of choosing different sensors to transmit the extra bit, in Fig. 2 we have computed the ROC for the optimum scheme where sensor one transmits two bits and sensor two transmits one, "OPT(2+1)", and for the optimum scheme where sensor one transmits one bit and sensor two transmits two, "OPT(1+2)". Next, we consider cases where one sensor makes two-bit decisions and the other sensor makes four-bit decisions. This case is covered by the extension to Theorem 4.1 discussed in Remark 4.5. Thus a fixed fusion rule, which is a slight extension of the one in Theorem 4.1, can be used to obtain optimum performance. In Fig. 3, we have again compared switching the sensor which uses the extra bits, as shown in the results labeled "OPT(4+2)" and "OPT(2+4)". From the above three figures, we can see that, typically, the more information bits transmitted by the sensors, the closer the performance of the distributed scheme to the performance of the centralized decision system. In addition, when the power of one sensor's noise is smaller, we should generally use the extra bits at this sensor (see Figs. 2 and 3). For all the results in Figs. 1, 2 and 3, we tried running the iterative algorithm with many different starting conditions, and in each case we found only one solution, which is the one shown.
Three sensors

Now we add one more sensor, with sensor noise $\nu_3 \sim N(0, 1)$, to the above system. The resulting two conditional pdfs are given by

$$
p(y_1, y_2, y_3 \mid H_1) \sim N\!\left(\begin{pmatrix} 2 \\ 2 \\ 2 \end{pmatrix}, \begin{bmatrix} 6 & 3 & 3 \\ 3 & 5 & 3 \\ 3 & 3 & 4 \end{bmatrix}\right), \qquad
p(y_1, y_2, y_3 \mid H_0) \sim N\!\left(\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}\right).
$$

In Fig. 4, we show ROC curves for 2-sensor and 3-sensor centralized decision systems as well as two distributed detection cases. We consider a two-sensor case with sensor detectors that use one-bit and two-bit sensor decisions. This is the case considered in Theorem 4.1, where the fixed fusion rule given in this theorem can be used to achieve optimum performance. We also consider a three-sensor case where the first two sensors make one-bit decisions and the last sensor makes four-bit decisions. This is another case of the type discussed in Theorem 4.1. From Fig. 4 we can see that the three-sensor distributed decision system, with six bits of communication distributed among the sensors, can be superior to the two-sensor centralized decision system. Again, for all the results in Fig. 4, we tried running the iterative algorithm with many different starting conditions, and in each case we found only one solution, which is the one shown.
Four sensors

Now we add one more sensor, with sensor noise $\nu_4 \sim N(0, 0.5)$, to the above system. The resulting two conditional pdfs given $H_1$ and $H_0$ are

$$
p(y_1, y_2, y_3, y_4 \mid H_1) \sim N\!\left(\begin{pmatrix} 2 \\ 2 \\ 2 \\ 2 \end{pmatrix}, \begin{bmatrix} 6 & 3 & 3 & 3 \\ 3 & 5 & 3 & 3 \\ 3 & 3 & 4 & 3 \\ 3 & 3 & 3 & 3.5 \end{bmatrix}\right), \qquad
p(y_1, y_2, y_3, y_4 \mid H_0) \sim N\!\left(\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{bmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0.5 \end{bmatrix}\right),
$$

respectively. We use the fixed fusion rule from Theorem 4.1 for the case considered there with four sensors (single-bit decisions at three sensors and eight-bit decisions at the other sensor). We know this fixed fusion rule can achieve optimum performance. We compute the cost functional in (2.2) and present results in Table 1. The parameters needed to calculate (2.2) are $P_1 = 1/3$, $P_0 = 2/3$, $C_{00} = C_{11} = 0$ and $C_{10} = C_{01} = 1$. Here we tried many different starting conditions, and some of these resulted in different solutions, as illustrated in Table 1. In Table 1 we show just one of the starting conditions that produced a given solution; actually, we found that many starting conditions produced the same solution. In Table 1, $I(\cdot)$ is an indicator function defined as in (2.7). Note that we do not really need the initial sensor rule with respect to $y_4$ in our algorithm; the first three sensor rules determine the last, as can be seen from (3.2).
Cent/Distr     C (Cost)   Initial sensor rules
Cent (4 sen)   0.1055
Cent (3 sen)   0.1410
Distr          0.1162     [I(y_1), I(y_2), I(y_3)]; [I(y_1), I(−y_2), I(y_3)]
Distr          0.1143     [I(cos(y_1)), I(sin(y_2)), I(−sin(y_3))]
Distr          0.1144     [I(−sin(y_1)), I(−cos(y_2)), I(sin(y_3))]; [I(sin(y_1)), I(−cos(y_2)), I(cos(y_3))]

Table 1: Some centralized and distributed (using our algorithm) designs found and their costs.

From the results, we can see that the distributed costs are quite close to the centralized cost. In addition, the initial sensor decision rules in our algorithm do influence the final cost, but the numerical results indicate that performance may not be too sensitive to the initial sensor decision rules.
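The centralized entries in Table 1 can be checked independently: with the 0-1 costs and priors above, the minimum Bayes cost is the error probability of the MAP test, which a Monte Carlo sketch can approximate (sample size, seed and names are ours).

```python
# Monte Carlo sketch of the centralized four-sensor cost: with 0-1 costs the
# minimum Bayes cost is the error probability of the MAP (likelihood ratio)
# test on the joint Gaussian model given above.
import numpy as np
from scipy.stats import multivariate_normal

P0, P1 = 2 / 3, 1 / 3
m1 = [2, 2, 2, 2]
S1 = [[6, 3, 3, 3], [3, 5, 3, 3], [3, 3, 4, 3], [3, 3, 3, 3.5]]
m0 = [0, 0, 0, 0]
S0 = np.diag([3.0, 2.0, 1.0, 0.5])
mvn1, mvn0 = multivariate_normal(m1, S1), multivariate_normal(m0, S0)

rng = np.random.default_rng(2)
n = 500_000
y1, y0 = mvn1.rvs(n, random_state=rng), mvn0.rvs(n, random_state=rng)
dec = lambda y: np.log(P1) + mvn1.logpdf(y) > np.log(P0) + mvn0.logpdf(y)
cost = P1 * np.mean(~dec(y1)) + P0 * np.mean(dec(y0))
print(f"centralized Monte Carlo cost ~ {cost:.4f}")  # Table 1 reports 0.1055
```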
6 Conclusion

We investigated distributed signal detection problems without making the assumption of independent observations from sensor to sensor. We have provided necessary conditions for optimum sensor decision rules under a given fusion rule, proposed a discretized Gauss-Seidel iterative algorithm, and proved its convergence. Further, we uncovered some highly unexpected results concerning distributed signal detection systems. In certain cases we have shown that a fixed fusion rule can be used to achieve optimum performance. The fusion rule is independent of the detection problem (additive noise, known signals, random signals), pdfs, prior probabilities, and all other details except for the number of bits used in the sensor decisions. This significantly reduces the complexity of finding optimum distributed detection schemes in these cases. The cases for which this fixed fusion rule can be found include those where L − 1 out of L sensors use a total of L − 1 bits in their sensor decisions, while the last sensor makes a $2^{L-1}$-bit decision. Further, we have also shown that performance is not improved if more than $2^{L-1}$ bits are used in the last sensor decision. This is true even if full precision is used. All these results have been supported by numerical investigations.

There are a number of interesting investigations which deserve exploration. Studying the properties of the optimum sensor detectors is an important topic which has just recently begun to receive attention [16, 17]. The investigations in [16, 17] categorize when likelihood ratio tests are optimum at the sensors for the specific problem of detecting a known signal in Gaussian noise which is correlated from sensor to sensor. This is an important issue which can considerably simplify the design of optimum sensor detectors. Similar studies should be made for other distributed signal detection problems.
References

[1] R. R. Tenney and N. R. Sandell, Jr., "Detection with distributed sensors," IEEE Transactions on Aerospace and Electronic Systems, vol. AES-17, no. 4, pp. 501-510, 1981.

[2] P. K. Varshney, Distributed Detection and Data Fusion, New York: Springer-Verlag, 1997.

[3] J. N. Tsitsiklis, "Decentralized detection," in Advances in Statistical Signal Processing, Vol. 2: Signal Detection, H. V. Poor and J. B. Thomas, Eds. Greenwich, CT: JAI Press, 1990.

[4] R. Viswanathan and P. K. Varshney, "Distributed detection with multiple sensors: part I - fundamentals," Proceedings of the IEEE, pp. 54-63, Jan. 1997.

[5] R. S. Blum, S. A. Kassam, and H. V. Poor, "Distributed detection with multiple sensors: part II - advanced topics," Proceedings of the IEEE, pp. 64-79, Jan. 1997.

[6] I. Y. Hoballah and P. K. Varshney, "Distributed Bayesian signal detection," IEEE Transactions on Information Theory, vol. 35, no. 5, pp. 995-1000, Sept. 1989.

[7] J. N. Tsitsiklis and M. Athans, "On the complexity of decentralized decision making and detection problems," IEEE Transactions on Automatic Control, vol. AC-30, pp. 440-446, 1985.

[8] R. S. Blum and S. A. Kassam, "Optimum distributed detection of weak signals in dependent sensors," IEEE Transactions on Information Theory, vol. 38, pp. 1066-1079, 1992.

[9] Z. B. Tang, K. R. Pattipati, and D. L. Kleinman, "A distributed M-ary hypothesis testing problem with correlated observations," IEEE Transactions on Automatic Control, vol. 37, no. 7, pp. 1042-1046, July 1992.

[10] Z. Chair and P. K. Varshney, "Optimal data fusion in multiple sensor detection systems," IEEE Transactions on Aerospace and Electronic Systems, vol. AES-22, pp. 98-101, Jan. 1986.

[11] P. K. Willett and D. Warren, "The suboptimality of randomized tests in distributed and quantized detection systems," IEEE Transactions on Information Theory, vol. 38, pp. 355-361, 1992.

[12] V. Veeravalli, T. Basar, and H. V. Poor, "Decentralized sequential detection with a fusion center performing the sequential test," IEEE Transactions on Information Theory, vol. 39, pp. 433-442, 1993.

[13] E. Drakopoulos and C. C. Lee, "Optimum multisensor fusion of correlated local decisions," IEEE Transactions on Aerospace and Electronic Systems, vol. AES-27, pp. 424-429, 1991.

[14] M. Kam, Q. Zhu, and W. S. Gray, "Optimal data fusion of correlated local decisions in multiple sensor detection systems," IEEE Transactions on Aerospace and Electronic Systems, vol. AES-28, pp. 916-920, July 1992.

[15] M. Cherikh and P. B. Kantor, "Counter examples in distributed detection," IEEE Transactions on Information Theory, vol. IT-38, pp. 162-165, Jan. 1992.

[16] P. N. Chen and A. Papamarcou, "Likelihood ratio partitions for distributed signal detection in correlated Gaussian noise," in Proc. IEEE Int'l Symp. Information Theory, Oct. 1995, p. 118.

[17] P. F. Swaszek, P. Willett, and R. S. Blum, "Distributed detection of dependent data - the two sensor problem," in Proc. 29th Annual Conference on Information Sciences and Systems, Princeton University, Princeton, NJ, pp. 1077-1082, March 1996.

[18] Z. Q. Luo, K. M. Wong, Y. Sun, and Y. M. Zhu, "Optimum local binary decisions for a fixed fusion rule in a distributed multi-sensor system," Technical Report, Communications Research Laboratory, McMaster University, 1996.

[19] H. V. Poor, An Introduction to Signal Detection and Estimation, 2nd ed. New York: Springer-Verlag, 1994.

[20] H. L. Van Trees, Detection, Estimation, and Modulation Theory, vol. 1, New York: Wiley, 1968.

[21] A. W. Al-Khafaji and J. R. Tooley, Numerical Methods in Engineering Practice, Holt, Rinehart and Winston, Inc., 1986.
[Figure 1: ROC curves ($P_d$ versus $P_f$), step size 0.2, $y_i \in [-7, 7]$. Two sensor case: centralized and distributed OPT(1+2), AND(1+1), OR(1+1) rules.]

[Figure 2: ROC curves ($P_d$ versus $P_f$), step size 0.2, $y_i \in [-7, 7]$. Two sensor case: distributed OPT(2+1), OPT(1+2).]

[Figure 3: ROC curves ($P_d$ versus $P_f$), step size 0.2, $y_i \in [-7, 7]$. Two sensor case: centralized and distributed OPT(4+2), OPT(2+4) rules.]

[Figure 4: ROC curves ($P_d$ versus $P_f$), step size 0.2, $y_i \in [-7, 7]$. Two and three sensor cases: centralized and distributed OPT(1+2), OPT(1+1+4) rules.]