Information Fusion 5 (2004) 157–167 www.elsevier.com/locate/inffus
Distributed M-ary hypothesis testing with binary local decisions

Xiaoxun Zhu a, Yingqin Yuan b, Chris Rorres c, Moshe Kam b,*

a Metrologic Instruments, Inc., Blackwood, NJ 08012, USA
b Data Fusion Laboratory, Department of Electrical and Computer Engineering, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, USA
c Department of Clinical Studies, School of Veterinary Medicine, University of Pennsylvania, Kennett Square, PA 19348, USA
Received 28 December 2002; received in revised form 20 October 2003; accepted 20 October 2003 Available online 18 November 2003
Abstract

Parallel distributed detection schemes for M-ary hypothesis testing often assume that for each observation the local detector transmits at least $\log_2 M$ bits to a data fusion center (DFC). However, it is possible for fewer than $\log_2 M$ bits to be available, and in this study we consider 1-bit local detectors with $M > 2$. We develop conditions for asymptotic detection of the correct hypothesis by the DFC, formulate the optimal decision rules for the DFC, and derive expressions for the performance of the system. Local detector design is demonstrated in examples, using genetic algorithm search for local decision thresholds. We also provide an intuitive geometric interpretation for the partitioning of the observations into decision regions. The interpretation is presented in terms of the joint probability of the local decisions and the hypotheses.
© 2003 Elsevier B.V. All rights reserved.

Keywords: Decision fusion; Distributed hypothesis testing
1. Introduction

Most studies of parallel distributed detection have been aimed at binary hypothesis testing [2,4,7,10,11,15–17]. When M-ary hypothesis testing was considered, the local detectors (LDs) were often assumed to transmit at least $\log_2 M$ bits to the Data Fusion Center (DFC) for every observation [1,12]. However, the cardinality of the local decisions need not be equal to the number of hypotheses. As Tang and his co-authors observe [14], when the total capacity of the communication channels is fixed and the information quality of each LD is identical, it is better to have a large number of short and independent messages than a smaller number of relatively long messages. Following this observation, we investigate here the effectiveness and performance of an architecture where binary messages are used even when $M > 2$. Approaches to address this problem were suggested in [3,20]. In [20], a hierarchical structure was used to break the complex M-ary decision problem into a set of much simpler binary decision fusion problems,
requiring a detector at each node of the decision tree. In [3] an architecture was studied where several binary decisions are fused into a single M-ary decision and processing-time constraints need to be satisfied.

Our distributed detection system employs $N$ LDs to survey a common volume for evidence of one of $M$ hypotheses, $\{H_i\}_{i=0}^{M-1}$. These LDs are restricted to make a single binary decision per observation, i.e., they have to compress each observation into either "1" or "−1". The DFC uses the local decisions $u = \{u_j\}_{j=1}^{N} \in \{-1,1\}^N$ to make a global decision $D$ in favor of one of the $M$ hypotheses. In this context, it is appropriate to model the $j$th LD, $j \in \{1,2,\ldots,N\}$, through a set of transition probabilities $R = \{R_{ji}\}_{i=0}^{M-1}$, where $R_{ji}$ is the probability that the $j$th detector transmits "1" to the DFC when the phenomenon $H_i$ was present, namely

$$R_{ji} = P\{u_j = 1 \mid H_i\}. \qquad (1)$$

This model of a local detector is shown in Fig. 1.

The DFC is characterized by the set $E = \{\beta_i\}_{i=0}^{M-1}$, where $\beta_i$ is the probability that the DFC accepts one of the hypotheses $\{H_k\}_{k \ne i}$ given that phenomenon $H_i$ was present. Thus,
* Corresponding author. Tel.: +1-215-895-6920; fax: +1-215-895-1695. E-mail address: [email protected] (M. Kam).

1566-2535/$ - see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/j.inffus.2003.10.004
$$\beta_i = \sum_{k=0,\,k\ne i}^{M-1} P\{D = H_k \mid H_i\}. \qquad (2)$$
Fig. 1. Transition model (hypothesis–decision) for a local detector.

In (2), $D = H_k$ is used to indicate that the decision ($D$) of the DFC is to accept the $k$th hypothesis. The probability of error $P_e$ of the DFC is

$$P_e = \sum_{i=0}^{M-1} P(H_i)\,\beta_i. \qquad (3)$$

2. Optimal design for the DFC

Our main objective is to find $\{\beta_i\}_{i=0}^{M-1}$ and to determine conditions under which $\lim_{N\to\infty} P_e = 0$. In addition, we discuss (Appendix B) the design of the local decision rules.

The optimal DFC decision rule that minimizes $P_e$ is

$$D = \arg\max_{H_i}\{P(H_i \mid u)\} = \arg\max_{H_i}\left\{\frac{P(u \mid H_i)P(H_i)}{P(u)}\right\} = \arg\max_{H_i}\{P(u \mid H_i)P(H_i)\} = \arg\max_{H_i}\{\log P(u \mid H_i) + \log P(H_i)\}. \qquad (4)$$

We set

$$Q_i = \log P(u \mid H_i) + \log P(H_i). \qquad (5)$$

Under the assumption that the LD observations are conditionally independent (conditioned on the hypothesis), we have

$$P(u \mid H_i) = \prod_{j=1}^{N} P(u_j \mid H_i) = \prod_{j=1}^{N} (R_{ji})^{(1+u_j)/2}(1-R_{ji})^{(1-u_j)/2} = \prod_{j=1}^{N} \left[R_{ji}(1-R_{ji})\right]^{1/2}\left[\frac{R_{ji}}{1-R_{ji}}\right]^{u_j/2}.$$

Hence we can rewrite $Q_i$ as

$$Q_i = \log P(H_i) + \frac{1}{2}\sum_{j=1}^{N}\log[R_{ji}(1-R_{ji})] + \frac{1}{2}\sum_{j=1}^{N} u_j\log\frac{R_{ji}}{1-R_{ji}} = w_{0i} + \sum_{j=1}^{N} w_{ji}u_j,$$

where for $i = 0,1,\ldots,M-1$ we have set

$$w_{0i} = \log P(H_i) + \frac{1}{2}\sum_{j=1}^{N}\log[R_{ji}(1-R_{ji})]$$

and

$$w_{ji} = \frac{1}{2}\log\frac{R_{ji}}{1-R_{ji}}, \quad j = 1,2,\ldots,N.$$

Using the optimal DFC decision rule, it is also possible to rewrite $\beta_i$ in (2) as

$$\beta_i = \sum_{k=0,\,k\ne i}^{M-1} P\{D = H_k \mid H_i\} = \sum_{k=0,\,k\ne i}^{M-1} P\{Q_k = \max\{Q_j\}_{j=0}^{M-1} \mid H_i\} = \sum_{k=0,\,k\ne i}^{M-1}\sum_{u\in U} P(u \mid H_i)\,P\{Q_k = \max\{Q_j\}_{j=0}^{M-1} \mid u\}$$
$$= \sum_{u\in U}\left[P(u \mid H_i)\sum_{k=0,\,k\ne i}^{M-1}\;\prod_{m=0,\,m\ne k}^{M-1} U_{-1}\left\{w_{0k} - w_{0m} + \sum_{j=1}^{N}(w_{jk} - w_{jm})u_j\right\}\right], \qquad (6)$$

where $U = \{-1,1\}^N$ is the set of all possible values of $u$, and $U_{-1}\{\cdot\}$ is the unit-step function,

$$U_{-1}\{x\} = \begin{cases} 1, & x > 0, \\ 0, & \text{otherwise.}\end{cases} \qquad (7)$$

Thus we possess an optimal (min-$P_e$) decision rule for the DFC, and an expression ((3) and (6)) for the global performance of this architecture.

3. System performance with identical LDs

We consider a simpler system where all LDs are identical (the case of non-identical LDs is analyzed along a similar path but is somewhat more demanding in bookkeeping and notation). Under this assumption, the local transition probabilities are denoted $\{R_i\}_{i=0}^{M-1}$, where $R_i = R_{ji}$ for $j = 1,2,\ldots,N$.

The error probabilities for the DFC are ($i = 0,1,\ldots,M-1$)

$$\beta_i = \sum_{k=0,\,k\ne i}^{M-1}\sum_{u\in U}\left\{P(u \mid H_i)\prod_{m=0,\,m\ne k}^{M-1} U_{-1}\left\{w_{0k} - w_{0m} + (w_k - w_m)\sum_{j=1}^{N}u_j\right\}\right\}, \qquad (8)$$

where

$$w_{0i} = \log P(H_i) + \frac{N}{2}\log[R_i(1-R_i)]$$
and

$$w_i = \frac{1}{2}\log\frac{R_i}{1-R_i}.$$

3.1. Decision region for $H_i$

The binary decisions received by the DFC are governed by a discrete probability distribution. Under $H_i$, each value of $u \in U$ has a probability of being realized which depends on $R_i$. If the local detectors are identical, then

$$P(u \text{ has } k \text{ "1"s and } N-k \text{ "−1"s} \mid H_i) = \binom{N}{k} R_i^k (1-R_i)^{N-k}.$$

Therefore, for each hypothesis $H_i$ there exists a corresponding binomial distribution of order $N$. For the distributions to be distinguishable from each other, i.e., for $H_i$ to be distinguished from $H_m$ ($i \ne m$), there must exist at least one value of $u$ such that

$$P(u \text{ has } k \text{ "1"s and } N-k \text{ "−1"s} \mid H_i) \ne P(u \text{ has } k \text{ "1"s and } N-k \text{ "−1"s} \mid H_m). \qquad (9)$$

For all hypotheses to be distinguishable, (9) must hold for all $i$ and $m$ with $i \ne m$. In the case of identical local detectors, this is equivalent to $R_i \ne R_m$. Furthermore, if any two hypotheses are distinguishable, we can re-index the hypotheses $\{H_i\}_{i=0}^{M-1}$ such that

$$R_0 < R_1 < \cdots < R_{M-1}. \qquad (10)$$

3.2. Criterion for accepting $H_i$

For the DFC to make a decision $D = H_i$, the following must be true:

$$U_{-1}\left\{w_{0i} - w_{0m} + (w_i - w_m)\sum_{j=1}^{N}u_j\right\} = 1, \quad \forall m \ne i. \qquad (11)$$

Suppose that $L$ out of the $N$ LDs make the decision 1, while the other $N-L$ LDs make the decision −1. Eq. (11) becomes

$$w_{0i} - w_{0m} + (w_i - w_m)(2L - N) > 0, \quad \forall m \ne i, \qquad (12)$$

which can be written as

$$L\log\frac{R_m(1-R_i)}{R_i(1-R_m)} < \log\frac{P(H_i)}{P(H_m)} + N\log\frac{1-R_i}{1-R_m}, \quad \forall m \ne i. \qquad (13)$$

Let

$$T_{i,m} = \frac{\log\dfrac{P(H_i)}{P(H_m)} + N\log\dfrac{1-R_i}{1-R_m}}{\log\dfrac{R_m(1-R_i)}{R_i(1-R_m)}}, \quad \forall m \ne i. \qquad (14)$$

We note that $T_{i,m} = T_{m,i}$. If $\frac{R_m}{1-R_m} > \frac{R_i}{1-R_i}$, i.e., $R_m > R_i$, which according to (10) corresponds to $m > i$, Eq. (13) becomes $L < T_{i,m}$; when $m < i$, Eq. (13) becomes $L > T_{i,m}$. Let

$$L_i^{\min} = \max_{m=0,\ldots,i-1}\{\lceil T_{i,m}\rceil\}, \qquad (15)$$

$$L_i^{\max} = \min_{m=i+1,\ldots,M-1}\{\lfloor T_{i,m}\rfloor\}, \qquad (16)$$

where $\lceil x\rceil$ is the smallest integer greater than or equal to $x$, and $\lfloor x\rfloor$ is the largest integer less than or equal to $x$. Then if $L_i^{\min} < L_i^{\max}$, the DFC accepts the hypothesis $H_i$ if and only if $L_i^{\min} \le L < L_i^{\max}$. However, if $L_i^{\min} \ge L_i^{\max}$, the hypothesis $H_i$ is not detectable by the DFC. In Section 5 we elaborate further on the nature of these "decision intervals".

3.3. DFC performance and asymptotic properties

For identical LDs, the probability of error under $H_i$ (Eq. (8)) becomes

$$\beta_i = \sum_{k=0,\,k\ne i}^{M-1}\;\sum_{j=L_k^{\min}}^{L_k^{\max}} \binom{N}{j} R_i^j (1-R_i)^{N-j}. \qquad (17)$$

Moreover, the probability of detection for phenomenon $H_i$, $f_i = P\{D = H_i \mid H_i\}$, can be written as

$$f_i = \sum_{j=L_i^{\min}}^{L_i^{\max}} \binom{N}{j} R_i^j (1-R_i)^{N-j}. \qquad (18)$$

Theorem 1 now provides conditions for asymptotic detection of the correct hypothesis by the DFC, namely conditions for the probability of error to go to zero as the number of sensors, $N$, goes to infinity.

Theorem 1. If condition (10) holds, then $\lim_{N\to\infty} f_i = 1$ and $\lim_{N\to\infty}\beta_i = 0$ for $i = 0,1,\ldots,M-1$. The probability of error of the DFC, $P_e = \sum_{i=0}^{M-1}P(H_i)\beta_i$, converges to zero at least exponentially as $N \to \infty$.

The proof is in Appendix A.

4. Examples
4.1. Gaussian populations––same variance, different means

Let the local observations be drawn from one of five Gaussian populations with the same variance ($\sigma^2 = 1$) but different means ($H_0$: $-2m$, $H_1$: $-m$, $H_2$: $0$, $H_3$: $m$, and $H_4$: $2m$). The observations are statistically independent, and all local detectors are identical.
The a priori probabilities are equal ($P(H_0) = P(H_1) = \cdots = P(H_4) = 1/5$). The local detectors employ the following decision rule based on the observation $z$:

$$u = \begin{cases} 1, & z > 0, \\ -1, & \text{otherwise.}\end{cases} \qquad (19)$$

The DFC employs the optimal decision rule in Eq. (4). We calculate the probability of error ($P_e$) of the DFC with respect to $N$, the number of local detectors, for different values of $m$. Fig. 2 shows $P_e$. It is not surprising that when $m$ is small (e.g., $m = 0.5$), the DFC performance is poor. As $m$ grows (e.g., $m = 1.5$), performance improves. It may appear somewhat counter-intuitive that as $m$ increases further (e.g., $m = 2$), the performance of the DFC does not continue to improve. However, as $m$ becomes very large, the LD transition probabilities for certain hypotheses (e.g., $R_3$ and $R_4$) become closer in value. As a result, the discrete distributions of the local decisions under hypotheses $H_3$ and $H_4$ become less and less distinguishable, thus increasing the probability of error at the DFC for large values of $m$.

4.2. Gaussian populations––different variances, same mean

We assume that the observations are drawn from one of four zero-mean Gaussian populations with different variances ($H_0$: $\sigma^2 = 1$, $H_1$: $4$, $H_2$: $9$, and $H_3$: $16$). The observations are statistically independent, and all local detectors are identical. The a priori probabilities are equal ($P(H_0) = P(H_1) = \cdots = P(H_3) = 1/4$). The local detector employs the following decision rule based on the observation $z$:

$$u = \begin{cases} 1, & z^2 > t, \\ -1, & \text{otherwise.}\end{cases} \qquad (20)$$

The DFC employs the optimal decision rule in Eq. (4). We calculate the probability of error ($P_e$) of the DFC with respect to $N$, the number of local detectors, for different values of $t$. Fig. 3 shows that $P_e$ decreases exponentially for different values of $t$ (the graphs tend to straight lines). When $t$ is small, the transition probabilities for all hypotheses are very close in value (most of the area under the probability density functions is outside the interval $[-\sqrt{t}, \sqrt{t}]$). As $t$ increases, these probabilities become more distinct from each other. However, when $t$ is large enough, the area under the probability density functions is mostly confined within $[-\sqrt{t}, \sqrt{t}]$, and the transition probabilities are close in value once again. The resulting degradation in performance is demonstrated (for $t = 6$) in Fig. 3.

4.3. Poisson populations

We assume that the observations are drawn from one of four Poisson populations with different means ($H_0$: $m$, $H_1$: $2m$, $H_2$: $3m$, and $H_3$: $4m$). The observations are statistically independent, and all local detectors are identical. The a priori probabilities are equal ($P(H_0) = P(H_1) = \cdots = P(H_3) = 1/4$). The local detector employs the following decision rule based on the observation $z$:

$$u = \begin{cases} 1, & z > m/\ln(3/2), \\ -1, & \text{otherwise.}\end{cases} \qquad (21)$$

The DFC employs the optimal decision rule in Eq. (4). We calculate the probability of error ($P_e$) of the DFC with respect to $N$, the number of local detectors, for different values of $m$.
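The examples above are straightforward to reproduce numerically. The sketch below is our own illustration (it is not code from the paper, and the function names are ours); it computes the exact probability of error of the optimal fusion rule (4) for example 4.1, exploiting the fact that with identical LDs the fused statistic reduces to $L$, the number of local "1" decisions:

```python
from math import comb, erf, log, sqrt

def Phi(x):
    # Standard normal CDF
    return 0.5 * (1 + erf(x / sqrt(2)))

def dfc_error(R, priors, N):
    """Exact P_e of the optimal DFC when all N local detectors are
    identical, with R[i] = P(u_j = 1 | H_i)."""
    M = len(R)
    Pe = 0.0
    for L in range(N + 1):           # L = number of local "1" decisions
        # log-posterior scores, as in Eqs. (4)-(5)
        scores = [log(priors[i]) + L * log(R[i]) + (N - L) * log(1 - R[i])
                  for i in range(M)]
        d = max(range(M), key=lambda i: scores[i])   # optimal decision
        for i in range(M):
            if i != d:
                Pe += priors[i] * comb(N, L) * R[i]**L * (1 - R[i])**(N - L)
    return Pe

# Example 4.1: five Gaussian hypotheses with means -2m..2m, unit variance;
# the local rule u = 1 iff z > 0 gives R_i = Phi(mu_i).
m = 1.0
R = [Phi(mu) for mu in (-2*m, -m, 0.0, m, 2*m)]
priors = [0.2] * 5
for N in (10, 50, 100):
    print(N, dfc_error(R, priors, N))
```

Running the same computation for several values of $m$ reproduces the trends discussed above.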
Fig. 2. Probability of error ($P_e$) vs. $N$ (example 4.1); curves for $m = 0.5, 1, 1.5, 2$.

Fig. 3. Probability of error ($P_e$) vs. $N$ (example 4.2); curves for $t = 1, 2, 4, 6$.
Fig. 4. Probability of error ($P_e$) vs. $N$ (example 4.3); curves for $m = 1, 2, 5, 20$.

Fig. 5. A typical decision region diagram (here $M = N = 10$).
Fig. 4 shows that $P_e$ decreases exponentially for different values of $m$ (the graphs tend to straight lines). Similar to example 4.1, when $m$ is small, the DFC performance is poor. It improves as $m$ increases to 2 and 5. However, as $m$ becomes larger (e.g., $m = 20$), some of the LD transition probabilities (e.g., those of $H_2$ and $H_3$) become so close that the DFC performance starts to degrade again.
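The decision intervals of Eqs. (14) to (16) can also be computed directly. The following sketch is our own illustration (all helper names are ours); it evaluates $T_{i,m}$, the interval bounds $L_i^{\min}$ and $L_i^{\max}$, and the detection probability $f_i$ of Eq. (18) for the Poisson setup of example 4.3 with $m = 2$:

```python
from math import ceil, floor, comb, exp, factorial, log

def poisson_tail(lam, thresh):
    # P(Z > thresh) for Z ~ Poisson(lam)
    return 1.0 - sum(exp(-lam) * lam**k / factorial(k)
                     for k in range(int(floor(thresh)) + 1))

def T(i, m_idx, R, priors, N):
    """T_{i,m} of Eq. (14)."""
    num = log(priors[i] / priors[m_idx]) + N * log((1 - R[i]) / (1 - R[m_idx]))
    den = log(R[m_idx] * (1 - R[i]) / (R[i] * (1 - R[m_idx])))
    return num / den

def decision_interval(i, R, priors, N):
    """(L_i_min, L_i_max) from Eqs. (15)-(16); R must be sorted as in (10)."""
    M = len(R)
    lo = max((ceil(T(i, m, R, priors, N)) for m in range(i)), default=0)
    hi = min((floor(T(i, m, R, priors, N)) for m in range(i + 1, M)), default=N)
    return lo, hi

def detection_prob(i, R, priors, N):
    """f_i of Eq. (18)."""
    lo, hi = decision_interval(i, R, priors, N)
    return sum(comb(N, j) * R[i]**j * (1 - R[i])**(N - j)
               for j in range(max(lo, 0), min(hi, N) + 1))

# Example 4.3 with m = 2: Poisson means 2, 4, 6, 8; local rule u = 1 iff z > m/ln(3/2)
m = 2.0
thresh = m / log(1.5)
R = [poisson_tail(mean, thresh) for mean in (m, 2*m, 3*m, 4*m)]
priors = [0.25] * 4
for N in (20, 80):
    print(N, [round(detection_prob(i, R, priors, N), 3) for i in range(4)])
```

Since the four transition probabilities are distinct, each detection probability approaches one as $N$ grows, as Theorem 1 predicts.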
5. Decision region for $H_i$––geometric interpretation

In this section we discuss an intuitive geometric interpretation of the partition of decision regions for each hypothesis in $\{H_i\}_{i=0}^{M-1}$. From (5), we have

$$Q_i' = \frac{Q_i}{N} = \frac{1}{N}\log P(H_i) + \frac{L}{N}\log R_i + \left(1 - \frac{L}{N}\right)\log(1-R_i), \quad i = 0,\ldots,M-1. \qquad (22)$$

Given the number $L$ of LDs that make the decision 1, our optimal decision rule is to select the $H_i$ that corresponds to the largest $Q_i'$. Let us denote by $x$ the fraction of LDs that make the decision 1; that is, $x = L/N$, $0 \le x \le 1$. Then (22) can be written as

$$Q_i'(x) = x\log\frac{R_i}{1-R_i} + \frac{1}{N}\log P(H_i) + \log(1-R_i), \quad i = 0,\ldots,M-1, \qquad (23)$$

where we now write $Q_i'(x)$ to emphasize the dependence of $Q_i'$ on $x$. For each $i$, $Q_i'(x)$ is a linear function of $x$ with slope $\log[R_i/(1-R_i)]$. When the probabilities $R_i$ are ordered as in (10), these slopes are correspondingly ordered by

$$\log\frac{R_0}{1-R_0} < \log\frac{R_1}{1-R_1} < \cdots < \log\frac{R_{M-1}}{1-R_{M-1}}. \qquad (24)$$

Fig. 5 is a typical diagram for ten LDs showing the ten linear functions $\{Q_0'(x), Q_1'(x), \ldots, Q_9'(x)\}$ plotted as functions of $x$ for the case $M = N = 10$. Let us denote by $Q_{\max}'(x)$ the function

$$Q_{\max}'(x) = \max\{Q_0'(x), Q_1'(x), \ldots, Q_{M-1}'(x)\}. \qquad (25)$$

As can be seen from Fig. 5, this function is a continuous, piecewise-linear, convex-up function of $x$ over the interval $[0,1]$. In the particular example illustrated, only four of the ten functions (namely $Q_1'(x)$, $Q_4'(x)$, $Q_6'(x)$ and $Q_9'(x)$) enter into the formation of $Q_{\max}'(x)$. Likewise, only the four corresponding hypotheses ($H_1$, $H_4$, $H_6$ and $H_9$) can be distinguished by the DFC, depending on which of the four intervals $[0, x_1)$, $[x_1, x_2)$, $[x_2, x_3)$ and $[x_3, 1]$ the fraction $x$ lies in. We remark, however, that because $x$ can only assume the discrete values in the set $\{0, 1/N, 2/N, \ldots, (N-1)/N, 1\}$, it is possible that $x$ may not be able to fall in one of the non-empty intervals determined by the function $Q_{\max}'(x)$. This would further restrict the set of hypotheses that could be distinguished by the DFC.

There are two situations in which all $M$ hypotheses can potentially be distinguished by the DFC. The first situation is when the probabilities $P(H_i)$ of the $M$ hypotheses are all equal to each other. Letting the common probability be $P$, (23) reduces to

$$Q_i'(x) = x\log\frac{R_i}{1-R_i} + \frac{1}{N}\log P + \log(1-R_i), \quad i = 0,\ldots,M-1. \qquad (26)$$

As can be verified, each of the straight lines represented by (26) is tangent to the function
$$E(x) = x\log\frac{x}{1-x} + \frac{1}{N}\log P + \log(1-x) \qquad (27)$$

at $x = R_i$. In other words, $E(x)$ is the envelope of the linear functions $Q_i'(x)$ as the parameter $R_i$ varies over all values between 0 and 1. As indicated in Fig. 6, $E(x)$ is a smooth convex-up function over the interval $[0,1]$, and so each of the straight lines determined by each $Q_i'(x)$ lies below the function $E(x)$ for all $x \in [0,1]$. This means that for any particular index $i$, the function $Q_{\max}'$ must equal $Q_i'(x)$ for all $x$ in some neighborhood of the point of tangency $x = R_i$. More specifically, because of the ordering $R_0 < R_1 < \cdots < R_{M-1}$, the interval $[0,1]$ can be partitioned into $M$ non-empty subintervals $[0, x_1), [x_1, x_2), \ldots, [x_{M-2}, x_{M-1})$ and $[x_{M-1}, 1]$, such that

(1) $x_i < R_i < x_{i+1}$, $i = 0,\ldots,M-1$, where $x_0 = 0$ and $x_M = 1$;

(2)
$$Q_{\max}'(x) = \begin{cases} Q_i'(x), & x \in [x_i, x_{i+1}),\ 0 \le i < M-1, \\ Q_{M-1}'(x), & x \in [x_{M-1}, 1].\end{cases} \qquad (28)$$

The optimal decision rule is then to choose hypothesis $H_i$ if $x \in [x_i, x_{i+1})$ for some $i = 0,\ldots,M-2$, or hypothesis $H_{M-1}$ if $x \in [x_{M-1}, 1]$. In this way, hypothesis $H_i$ is selected by the DFC if the fraction of the LDs that register 1 is in the subinterval containing $R_i$. As before, however, it is possible that none of the $N+1$ possible values of the discrete variable $x$ lies in some of these subintervals, and so the corresponding hypotheses cannot be selected.

The second situation in which all $M$ hypotheses can be distinguished is when the number of detectors $N$ is sufficiently large, so that the term $\frac{1}{N}\log P(H_i)$ in (23) can be neglected. We then have

$$Q_i'(x) \approx x\log\frac{R_i}{1-R_i} + \log(1-R_i), \quad i = 0,\ldots,M-1. \qquad (29)$$

Each of the straight lines in (29) is tangent to the curve

$$E(x) = x\log\frac{x}{1-x} + \log(1-x), \qquad (30)$$

which is simply a vertical translation of the curve (27). Fig. 6 thus serves to also illustrate the case when the number of local detectors is large. Consequently, all $M$ hypotheses can be distinguished if $N$ is sufficiently large; furthermore, a sufficiently large $N$ guarantees that the $N+1$ possible values of the discrete variable $x = L/N$ will be distributed among all of the $M$ intervals that distinguish the hypotheses.

5.1. Examples––Gaussian populations, same variance, different means

We provide a geometric interpretation of our decision rule, using data from example 4.1, where the local observations were drawn from one of five Gaussian populations with the same variance ($\sigma^2 = 1$) but different means ($H_0$: $-2m$, $H_1$: $-m$, $H_2$: $0$, $H_3$: $m$, and $H_4$: $2m$). The observations were statistically independent, and all local detectors were identical. The a priori probabilities were equal ($P(H_0) = P(H_1) = \cdots = P(H_4) = 1/5$). Fig. 7 shows the decision regions for hypotheses $H_0$, $H_2$ and $H_4$, with $m = 1$, using graphs of $E(x)$ and $Q_i'(x)$.
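The tangency property behind Fig. 6 is easy to check numerically. The sketch below (our own illustration) verifies that each large-$N$ line $Q_i'(x)$ of Eq. (29) touches the envelope $E(x)$ of Eq. (30) at $x = R_i$ and never rises above it elsewhere:

```python
from math import log

def E(x):
    # Envelope, Eq. (30)
    return x * log(x / (1 - x)) + log(1 - x)

def Q(x, R):
    # Large-N line, Eq. (29)
    return x * log(R / (1 - R)) + log(1 - R)

Rs = [0.1, 0.3, 0.5, 0.7, 0.9]           # transition probabilities, ordered as in (10)
grid = [k / 200 for k in range(1, 200)]  # points in the open interval (0, 1)

for R in Rs:
    # Tangency: the line meets the envelope exactly at x = R ...
    assert abs(Q(R, R) - E(R)) < 1e-12
    # ... and lies below it everywhere else
    assert all(Q(x, R) <= E(x) + 1e-12 for x in grid)
print("each line is tangent to E(x) from below")
```

The gap $E(x) - Q_i'(x)$ is exactly the Kullback–Leibler divergence between Bernoulli($x$) and Bernoulli($R_i$), which is why the lines support the envelope from below.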
Fig. 6. Envelope $E(x)$ and the family of functions $Q_i'$ ($N \to \infty$).

Fig. 7. Decision region illustration ($m = 1$).
Fig. 8. Decision region illustration ($m = 2$).

We observed before that when $m$ increases, the performance of the DFC does not necessarily always improve (e.g., $m = 2$). Fig. 8 demonstrates that as $R_3$ and $R_4$ become closer in value, the decision region for $H_4$ becomes very small, causing the optimal decision rule to choose $H_3$ over $H_4$ most of the time. Similarly, $H_1$ is usually chosen rather than $H_0$. The observed increase in the probability of error at the DFC (with the increase of $m$) corresponds to the diminishing decision regions for certain hypotheses.

6. Conclusion

We studied a distributed detection system that performs M-ary hypothesis testing with identical LDs making binary decisions. We showed that when the local detectors compress the observation into a single bit, the data fusion center (DFC) is able to distinguish the hypotheses provided that the distribution functions of the local decisions are not identical. We also showed that the probability of error decreases exponentially as the number of local detectors increases. Practical examples were given for discovering the local decision rules using genetic algorithm searches for local thresholds.

Additional investigation is needed to determine the exact trade-off between compression level (i.e., number of bits per observation) and hardware complexity, namely the number of sensors required to achieve a certain error rate. This is tied to the design of the local decision rules, since the transition probabilities determine the rate of convergence of the probability of error.

Appendix A. Proof of Theorem 1

Lemma 1. If $p$ and $r$ are such that $p \ne r$ and $0 < p, r < 1$, then

$$\left(\frac{r}{p}\right)^{p}\left(\frac{1-r}{1-p}\right)^{1-p} < 1. \qquad (A.1)$$

Proof. Let

$$F(p,r) = p\log\frac{r}{p} + (1-p)\log\frac{1-r}{1-p}. \qquad (A.2)$$

Thus,

$$\frac{\partial}{\partial p}F(p,r) = \log\frac{r}{p} - \log\frac{1-r}{1-p}, \qquad (A.3)$$

$$\frac{\partial^2}{\partial p^2}F(p,r) = -\frac{1}{p} - \frac{1}{1-p}. \qquad (A.4)$$

This indicates that $F(p,r)$ has maxima where

$$\log\frac{r}{p} - \log\frac{1-r}{1-p} = 0. \qquad (A.5)$$

It is easy to show that the maximum is at $p = r$, $0 < p, r < 1$. Also, for any given $r$, $0 < r < 1$, we note from (A.3) that $\frac{\partial}{\partial p}F(p,r) > 0$ for $p < r$, and $\frac{\partial}{\partial p}F(p,r) < 0$ for $p > r$. Hence the maximum at $p = r$ is also a global maximum for $0 < p, r < 1$. Therefore, $\forall p \ne r$, $0 < p, r < 1$, we have $F(p,r) < F(p,p) = 0$, and Eq. (A.1) follows. □

Lemma 2. Given any $r \in (0,1)$ and $p_1, p_2$ such that

1. $1 > p_1 > p_2 > r > 0$; or
2. $1 > r > p_1 > p_2 > 0$,

it follows that

$$\frac{\log\frac{r}{p_1}}{\log\frac{1-r}{1-p_1}} > \frac{\log\frac{r}{p_2}}{\log\frac{1-r}{1-p_2}}. \qquad (A.6)$$

Proof. Let $1 > p > r > 0$, and

$$f(p,r) = \log\frac{p}{r}\bigg/\log\frac{1-r}{1-p} \qquad (A.7)$$

and

$$g(p,r) = \frac{\partial f(p,r)}{\partial p} = \left[(1-p)\log\frac{1-r}{1-p} - p\log\frac{p}{r}\right]\bigg/\left[p(1-p)\left(\log\frac{1-r}{1-p}\right)^2\right].$$

From Lemma 1, $g(p,r) < 0$ for $1 > p > r > 0$. This implies that $\forall p_1, p_2$ with $1 > p_1 > p_2 > r > 0$, Eq. (A.6) is true. Similarly, for $1 > r > p > 0$, it is easy to show that $g(p,r) < 0$. Therefore Eq. (A.6) is also true $\forall p_1, p_2$ with $1 > r > p_1 > p_2 > 0$. □
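Lemma 1 states that $F(p,r)$, which equals minus the Kullback–Leibler divergence between Bernoulli($p$) and Bernoulli($r$), is strictly negative whenever $p \ne r$. A quick numerical check (our own, not part of the paper):

```python
from math import log, exp

def F(p, r):
    # F(p, r) of Eq. (A.2); equals -KL(Bernoulli(p) || Bernoulli(r))
    return p * log(r / p) + (1 - p) * log((1 - r) / (1 - p))

pts = [k / 10 for k in range(1, 10)]
for p in pts:
    for r in pts:
        if p != r:
            assert F(p, r) < 0          # strict negativity (Lemma 1)
            assert exp(F(p, r)) < 1     # equivalent product form (A.1)
        else:
            assert abs(F(p, r)) < 1e-12
print("Lemma 1 verified on a grid")
```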
Lemma 3. If condition (10) holds, then $\forall k, m, n \in \{0,1,\ldots,M-1\}$, $\exists N_{m,n}^k \in \mathbb{R}$ such that $\forall N > N_{m,n}^k$, $T_{k,m} > T_{k,n}$, provided that either $m > n > k$ or $k > m > n$.

Proof. Let

$$N_{m,n}^k = \left(\frac{\log\frac{P(H_k)}{P(H_n)}}{\log\frac{R_n(1-R_k)}{R_k(1-R_n)}} - \frac{\log\frac{P(H_k)}{P(H_m)}}{\log\frac{R_m(1-R_k)}{R_k(1-R_m)}}\right)\Bigg/\left(\frac{1}{1 + \log\frac{R_m}{R_k}\Big/\log\frac{1-R_k}{1-R_m}} - \frac{1}{1 + \log\frac{R_n}{R_k}\Big/\log\frac{1-R_k}{1-R_n}}\right). \qquad (A.8)$$

Since either $m > n > k$ or $k > m > n$ is true, from Eq. (10) we have either $R_m > R_n > R_k$ or $R_k > R_m > R_n$. From Lemma 2,

$$\frac{1}{1 + \log\frac{R_m}{R_k}\Big/\log\frac{1-R_k}{1-R_m}} > \frac{1}{1 + \log\frac{R_n}{R_k}\Big/\log\frac{1-R_k}{1-R_n}}. \qquad (A.9)$$

Provided that $N > N_{m,n}^k$, from Eqs. (A.8) and (14) we have $T_{k,m} > T_{k,n}$. □

Lemma 4. If condition (10) holds, then $\forall k \in \{1,\ldots,M-1\}$, $\exists N_{\min}^k$ such that $\forall N > N_{\min}^k$, $L_k^{\min} = \lceil T_{k,k-1}\rceil$; and $\forall k \in \{0,\ldots,M-2\}$, $\exists N_{\max}^k$ such that $\forall N > N_{\max}^k$, $L_k^{\max} = \lfloor T_{k,k+1}\rfloor$.

Proof. Let $N_{\min}^k = \max_{m,n=0}^{k-1} N_{m,n}^k$ and $N_{\max}^k = \min_{m,n=k+1}^{M-1} N_{m,n}^k$, where $N_{m,n}^k$ is as defined in Eq. (A.8). The lemma follows from Lemma 3 and Eqs. (15) and (16). □

Proof of Theorem 1. Using the DeMoivre–Laplace theorem [9, pp. 49–50],

$$\lim_{N\to\infty} f_i = \lim_{N\to\infty} G\!\left(\frac{L_i^{\max} - NR_i}{\sqrt{NR_i(1-R_i)}}\right) - \lim_{N\to\infty} G\!\left(\frac{L_i^{\min} - NR_i}{\sqrt{NR_i(1-R_i)}}\right), \qquad (A.10)$$

where

$$G(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-y^2/2}\,dy. \qquad (A.11)$$

When $N$ is sufficiently large (Lemma 4),

$$L_i^{\max} = \lfloor T_{i,i+1}\rfloor > T_{i,i+1} - 1, \qquad (A.12)$$
$$L_i^{\min} = \lceil T_{i,i-1}\rceil < T_{i,i-1} + 1. \qquad (A.13)$$

Therefore

$$\frac{L_i^{\max} - NR_i}{\sqrt{NR_i(1-R_i)}} > \frac{1}{\sqrt{R_i(1-R_i)}}\left[(A_{i,i+1} - 1)/\sqrt{N} + B_{i,i+1}\sqrt{N}\right], \qquad (A.14)$$

$$\frac{L_i^{\min} - NR_i}{\sqrt{NR_i(1-R_i)}} < \frac{1}{\sqrt{R_i(1-R_i)}}\left[(A_{i,i-1} + 1)/\sqrt{N} + B_{i,i-1}\sqrt{N}\right], \qquad (A.15)$$

where

$$A_{k,m} = \frac{\log\frac{P(H_k)}{P(H_m)}}{\log\frac{R_m(1-R_k)}{R_k(1-R_m)}} \qquad (A.16)$$

and

$$B_{k,m} = \frac{1}{1 + \log\frac{R_m}{R_k}\Big/\log\frac{1-R_k}{1-R_m}} - R_k. \qquad (A.17)$$

From Lemma 1 and $R_{i+1} > R_i$,

$$\frac{1}{1 + \log\frac{R_{i+1}}{R_i}\Big/\log\frac{1-R_i}{1-R_{i+1}}} > R_i. \qquad (A.18)$$

Therefore,

$$B_{i,i+1} > 0. \qquad (A.19)$$

Since $A_{i,i+1}$ is bounded,

$$\lim_{N\to\infty} G\!\left(\frac{1}{\sqrt{R_i(1-R_i)}}\left[(A_{i,i+1}-1)/\sqrt{N} + B_{i,i+1}\sqrt{N}\right]\right) = 1. \qquad (A.20)$$

From Eq. (A.14),

$$\lim_{N\to\infty} G\!\left(\frac{L_i^{\max} - NR_i}{\sqrt{NR_i(1-R_i)}}\right) = 1. \qquad (A.21)$$

Similarly, since $R_i > R_{i-1}$, one can show from Lemma 1 that

$$\frac{1}{1 + \log\frac{R_{i-1}}{R_i}\Big/\log\frac{1-R_i}{1-R_{i-1}}} < R_i. \qquad (A.22)$$

Hence

$$B_{i,i-1} < 0. \qquad (A.23)$$

Since $A_{i,i-1}$ is bounded,

$$\lim_{N\to\infty} G\!\left(\frac{1}{\sqrt{R_i(1-R_i)}}\left[(A_{i,i-1}+1)/\sqrt{N} + B_{i,i-1}\sqrt{N}\right]\right) = 0. \qquad (A.24)$$
From Eq. (A.15),

$$\lim_{N\to\infty} G\!\left(\frac{L_i^{\min} - NR_i}{\sqrt{NR_i(1-R_i)}}\right) = 0. \qquad (A.25)$$

From Eqs. (A.21) and (A.25),

$$\lim_{N\to\infty} f_i = 1, \quad i = 0,1,\ldots,M-1. \qquad (A.26)$$

From Eq. (A.10),

$$\lim_{N\to\infty}\beta_i = 1 - \lim_{N\to\infty} G\!\left(\frac{L_i^{\max} - NR_i}{\sqrt{NR_i(1-R_i)}}\right) + \lim_{N\to\infty} G\!\left(\frac{L_i^{\min} - NR_i}{\sqrt{NR_i(1-R_i)}}\right). \qquad (A.27)$$

Using the inequalities in Eqs. (A.14) and (A.15),

$$\lim_{N\to\infty}\beta_i \le \lim_{N\to\infty}\epsilon_1(N) + \lim_{N\to\infty}\epsilon_2(N), \qquad (A.28)$$

where

$$\epsilon_1(N) = 1 - G\!\left(\frac{1}{\sqrt{R_i(1-R_i)}}\left[(A_{i,i+1}-1)/\sqrt{N} + B_{i,i+1}\sqrt{N}\right]\right) \qquad (A.29)$$

and

$$\epsilon_2(N) = G\!\left(\frac{1}{\sqrt{R_i(1-R_i)}}\left[(A_{i,i-1}+1)/\sqrt{N} + B_{i,i-1}\sqrt{N}\right]\right). \qquad (A.30)$$

Using the property $G(-X) = 1 - G(X)$, we can rewrite $\epsilon_2(N)$ as

$$\epsilon_2(N) = 1 - G\!\left(\frac{1}{\sqrt{R_i(1-R_i)}}\left[(-A_{i,i-1}-1)/\sqrt{N} - B_{i,i-1}\sqrt{N}\right]\right). \qquad (A.31)$$

When $N$ is large enough such that

$$N > -\frac{A_{i,i+1} + A_{i,i-1}}{B_{i,i+1} + B_{i,i-1}}, \qquad (A.32)$$

then if $B_{i,i+1} + B_{i,i-1} < 0$, Eq. (A.32) yields

$$(A_{i,i+1}-1)/\sqrt{N} + B_{i,i+1}\sqrt{N} < (-A_{i,i-1}-1)/\sqrt{N} - B_{i,i-1}\sqrt{N}; \qquad (A.33)$$

otherwise,

$$(A_{i,i+1}-1)/\sqrt{N} + B_{i,i+1}\sqrt{N} > (-A_{i,i-1}-1)/\sqrt{N} - B_{i,i-1}\sqrt{N}. \qquad (A.34)$$

Therefore, since $1 - G(x)$ is monotonically decreasing,

$$\lim_{N\to\infty}\beta_i < \begin{cases} 2\lim_{N\to\infty}\epsilon_1(N), & B_{i,i+1} + B_{i,i-1} < 0, \\ 2\lim_{N\to\infty}\epsilon_2(N), & \text{otherwise.}\end{cases} \qquad (A.35)$$

From [19, p. 39],

$$1 - G(X) < \frac{1}{\sqrt{2\pi}\,X}\,e^{-X^2/2}, \quad X > 0. \qquad (A.36)$$

Therefore,

$$\epsilon_1(N) < \frac{1}{\sqrt{2\pi}}\cdot\frac{\sqrt{R_i(1-R_i)}}{(A_{i,i+1}-1)/\sqrt{N} + B_{i,i+1}\sqrt{N}}\exp\left\{-\frac{\left[(A_{i,i+1}-1)/\sqrt{N} + B_{i,i+1}\sqrt{N}\right]^2}{2R_i(1-R_i)}\right\}$$
$$= \frac{1}{\sqrt{2\pi}}\cdot\frac{\sqrt{R_i(1-R_i)}}{(A_{i,i+1}-1)/\sqrt{N} + B_{i,i+1}\sqrt{N}}\exp\left\{-\frac{(A_{i,i+1}-1)^2/N + B_{i,i+1}^2N + 2B_{i,i+1}(A_{i,i+1}-1)}{2R_i(1-R_i)}\right\}$$
$$< \frac{1}{\sqrt{2\pi}}\cdot\frac{\sqrt{R_i(1-R_i)}}{(A_{i,i+1}-1)/\sqrt{N} + B_{i,i+1}\sqrt{N}}\exp\left\{-\frac{B_{i,i+1}^2N + 2B_{i,i+1}(A_{i,i+1}-1)}{2R_i(1-R_i)}\right\}. \qquad (A.37)$$

Similarly,

$$\epsilon_2(N) < \frac{1}{\sqrt{2\pi}}\cdot\frac{\sqrt{R_i(1-R_i)}}{(-A_{i,i-1}-1)/\sqrt{N} - B_{i,i-1}\sqrt{N}}\exp\left\{-\frac{(A_{i,i-1}+1)^2/N + B_{i,i-1}^2N + 2B_{i,i-1}(A_{i,i-1}+1)}{2R_i(1-R_i)}\right\}$$
$$< \frac{1}{\sqrt{2\pi}}\cdot\frac{\sqrt{R_i(1-R_i)}}{(-A_{i,i-1}-1)/\sqrt{N} - B_{i,i-1}\sqrt{N}}\exp\left\{-\frac{B_{i,i-1}^2N + 2B_{i,i-1}(A_{i,i-1}+1)}{2R_i(1-R_i)}\right\}. \qquad (A.38)$$

Hence,

$$\beta_i = O\!\left(e^{-cN}/\sqrt{N}\right) \text{ for some } c > 0, \qquad (A.39)$$

and

$$\lim_{N\to\infty}\beta_i = 0, \quad i = 0,1,\ldots,M-1. \qquad (A.40)$$

From Eq. (A.40),

$$\lim_{N\to\infty} P_e = \sum_{i=0}^{M-1} P(H_i)\lim_{N\to\infty}\beta_i = 0. \qquad (A.41)$$

Since $\beta_i$ decreases at least exponentially for $i = 0,1,\ldots,M-1$, $P_e$, a linear combination of these $\beta_i$'s, also decreases at least exponentially. □
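The exponential rate established by Theorem 1 can be observed numerically. The sketch below is our own (the setup mirrors example 4.1 with $m = 1$); it computes the exact $P_e$ for growing $N$ and extracts the per-sensor decay rates, which should be positive and roughly constant if $P_e \approx C\rho^N$:

```python
from math import comb, erf, log, sqrt

def Phi(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def dfc_error(R, priors, N):
    # Exact P_e of the optimal fusion rule for identical local detectors
    Pe = 0.0
    for L in range(N + 1):
        scores = [log(priors[i]) + L * log(R[i]) + (N - L) * log(1 - R[i])
                  for i in range(len(R))]
        d = max(range(len(R)), key=lambda i: scores[i])
        for i in range(len(R)):
            if i != d:
                Pe += priors[i] * comb(N, L) * R[i]**L * (1 - R[i])**(N - L)
    return Pe

R = [Phi(mu) for mu in (-2, -1, 0, 1, 2)]   # example 4.1, m = 1
priors = [0.2] * 5
pes = [dfc_error(R, priors, N) for N in (40, 80, 120, 160)]
rates = [log(pes[k] / pes[k + 1]) / 40 for k in range(3)]  # per-sensor decay rate
print(rates)
```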
Appendix B. Design of local decision rules

For many decentralized detection problems, including the one studied here, determination of the optimal local decision rules was shown to be NP-complete [18]. The necessary conditions of the optimum are often described as a set of coupled non-linear equations that are extremely difficult to solve [5,15,20]. Several numerical methods were proposed to approximate the optimal local decision rules for such systems, including variants of the Gauss–Seidel algorithm [14,21] and the use of genetic algorithms (GAs) [8]. Though "most of the problems analyzed in the literature have been found to have globally optimal solutions in which each sensor uses the same threshold" [6], this is not the general case. In this appendix we demonstrate how the local decision rules for our architecture can be approximated using genetic algorithm search. The best solutions that we found are for non-identical LDs.

We assume that the local detector compares each scalar local observation $z_j$, $j = 1,2,\ldots,N$, to a threshold (or thresholds) in order to determine the local decision $u_j$ (see [6], Section III). We search numerically for a minimum probability of error $P_e$ (as defined in Eq. (3)) in terms of the threshold(s) of each LD, assuming that we know the probability density function of the local observation $z_j$ and the a priori probabilities of the hypotheses. In general, $P_e$ is a non-continuous, non-differentiable function of the local thresholds, which makes gradient-based optimization algorithms ineffective. We therefore used genetic algorithms (GAs) for the optimization task [8,13]. The GA sought a local minimum of $P_e$ in $N$ thresholds, where $N$ is the number of LDs.

In the following, we provide two examples (corresponding to the examples in Sections 4.1 and 4.2) for design of the local decision rules. The calculations were made for different numbers of sensors (2–7), using numerical search for the decision thresholds. Our GA [13] used a varying crossover rate and a constant mutation rate of 0.12. The search was terminated either when the number of iterations reached 20,000 or when the improvement in the probability of error over the last 100 iterations was less than $10^{-4}$.

B.1. Example B-1: Gaussian populations––same variance, different means

The hypothesis testing problem has four equi-probable hypotheses. Under hypothesis $i$ ($i = 1,\ldots,4$), the observations are Gaussian with mean $m_i$ and standard deviation $\sigma_i$. Specifically, $m_1 = 1$, $m_2 = 0$, $m_3 = 5$ and $m_4 = 7$, and $\sigma_i = 1$ for all $i$. Following [6] we assume that the $i$th LD uses the rule described in Eq. (19). Fig. 9 shows the global probability of error for the case of two (2) LDs. Not surprisingly, the optima occur when the LDs are not identical, and we find two distinct global minima (we can permute the values of the two thresholds between the LDs).
Fig. 9. Probability of error ($P_e$) vs. thresholds in a two-sensor case (example B-1).
Table 1. Optimization results using identical and non-identical thresholds

N = 2: identical threshold 4.880 (Pe = 0.3846); non-identical thresholds 1.9750, 6.0281 (Pe = 0.3300)
N = 3: identical threshold 4.880 (Pe = 0.3267); non-identical thresholds −0.5065, 2.4871, 5.9247 (Pe = 0.2356)
N = 4: identical threshold 4.850 (Pe = 0.2993); non-identical thresholds −0.5348, 3.1490, 5.1615, 5.6864 (Pe = 0.2149)
N = 5: identical threshold 4.800 (Pe = 0.2865); non-identical thresholds −0.0628, 0.3386, 2.0482, 5.3941, 5.5808 (Pe = 0.1935)
N = 6: identical threshold 5.200 (Pe = 0.2767); non-identical thresholds −0.2424, 0.2868, 0.4751, 2.5893, 5.7135, 6.1509 (Pe = 0.1748)
N = 7: identical threshold 5.150 (Pe = 0.2671); non-identical thresholds −0.7008, −0.6992, −0.0608, 0.9873, 5.5744, 6.1316, 6.2272 (Pe = 0.1572)
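The GA itself is not reproduced here, but the underlying search problem is easy to recreate. The sketch below is our own (a brute-force grid search stands in for the paper's GA); it minimizes the exact $P_e$ over two thresholds for example B-1 and confirms that the best non-identical pair beats the best identical threshold:

```python
from math import erf, sqrt

def Phi(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

MEANS = (1.0, 0.0, 5.0, 7.0)     # example B-1; equal priors of 1/4

def pe(thresholds):
    """Exact P_e of the optimal DFC for LDs with rule u_j = 1 iff z_j > t_j."""
    total = 0.0
    for bits in range(2 ** len(thresholds)):      # enumerate u patterns
        best = 0.0
        for mi in MEANS:
            p = 0.25
            for j, t in enumerate(thresholds):
                r = 1 - Phi(t - mi)               # P(u_j = 1 | mean mi)
                p *= r if (bits >> j) & 1 else 1 - r
            best = max(best, p)                   # posterior-optimal decision (4)
        total += best
    return 1 - total

grid = [-2 + 0.25 * k for k in range(45)]         # thresholds in [-2, 9]
best_ident = min(pe((t, t)) for t in grid)
best_pair = min((pe((t1, t2)), t1, t2) for t1 in grid for t2 in grid)
print(best_ident, best_pair)
```

With this coarse grid, the non-identical optimum lands near the threshold pair reported in Table 1 (roughly 2 and 6), with a clearly lower $P_e$ than any identical pair.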
Fig. 10. Probability of error vs. number of sensors, for identical and non-identical sensors (example B-2).
thresholds for 2–7 LDs. The table also shows the solution for identical LDs. Clearly, identical LDs perform much worse than non-identical LDs.

B.2. Example B-2: Gaussian populations––different variances, same means

Following the notation of example B-1, we tested four Gaussian hypotheses. Here, m_1 = m_2 = m_3 = m_4 = 0 and σ_1 = 1, σ_2 = 4, σ_3 = 10, σ_4 = 20. The local decision rule is defined by Eq. (20). Fig. 10 shows the probability of error vs. the number of sensors, using the local thresholds found by the GA. Again, non-identical sensors provide better performance than identical sensors. However, the threshold search for a large number of sensors can become computationally expensive. In this case, the results of Theorem 1 encourage the use of identical sensors.
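The same closed-form calculation adapts to the different-variance case. Since Eq. (20) is not reproduced in this appendix, the sketch below assumes a plausible magnitude rule, u_i = 1 when |x_i| exceeds t_i (with equal means, only the magnitude of the observation is informative); this assumption is ours, not the paper's.

```python
import itertools
import math

# Example B-2 model (assumed): four equiprobable zero-mean Gaussian
# hypotheses with standard deviations 1, 4, 10, 20.
SIGMAS = [1.0, 4.0, 10.0, 20.0]
PRIOR = 0.25

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_error_magnitude(thresholds):
    """P_e under MAP fusion when LD i reports u_i = 1{|x_i| > t_i}
    (an assumed stand-in for the rule of Eq. (20))."""
    p_correct = 0.0
    for u in itertools.product((0, 1), repeat=len(thresholds)):
        likelihoods = []
        for s in SIGMAS:
            p = 1.0
            for u_i, t in zip(u, thresholds):
                p1 = 2.0 * (1.0 - phi(t / s))  # P(|x_i| > t | sigma = s)
                p *= p1 if u_i == 1 else 1.0 - p1
            likelihoods.append(p)
        p_correct += PRIOR * max(likelihoods)
    return 1.0 - p_correct
```

Note that evaluating P_e requires a sum over all 2^N decision vectors, which is one reason the threshold search becomes expensive as the number of sensors grows.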
References

[1] W. Baek, S. Bommareddy, Optimal M-ary data fusion with distributed sensors, IEEE Transactions on Aerospace and Electronic Systems 31 (3) (1995) 1150–1152.
[2] Z. Chair, P.K. Varshney, Optimal data fusion in multiple sensor detection systems, IEEE Transactions on Aerospace and Electronic Systems 22 (1) (1986) 98–101.
[3] B.V. Dasarathy, Operationally efficient architecture for fusion of binary-decision sensors in multidecision environments, Optical Engineering 36 (3) (1997) 632–641.
[4] R.O. Duda, P.E. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.
[5] I.Y. Hoballah, P.K. Varshney, Distributed Bayesian signal detection, IEEE Transactions on Information Theory 35 (5) (1989) 995–1000.
[6] W.W. Irving, J.N. Tsitsiklis, Some properties of optimal thresholds in decentralized detection, IEEE Transactions on Automatic Control 39 (4) (1994) 835–838.
[7] M. Kam, W. Chang, Q. Zhu, Hardware complexity of binary distributed detection systems with isolated local Bayesian detectors, IEEE Transactions on Systems, Man and Cybernetics 21 (3) (1991) 713–725.
[8] W. Liu, Y. Lu, J.S. Fu, Data fusion of multiradar system by using genetic algorithm, IEEE Transactions on Aerospace and Electronic Systems 38 (2) (2002) 601–612.
[9] A. Papoulis, Probability, Random Variables, and Stochastic Processes, third ed., McGraw-Hill, New York, 1991.
[10] A.R. Reibman, L.W. Nolte, Design and performance comparison of distributed sensor systems, IEEE Transactions on Aerospace and Electronic Systems 23 (6) (1987) 789–797.
[11] A.R. Reibman, L.W. Nolte, Optimal detection and performance of distributed sensor systems, IEEE Transactions on Aerospace and Electronic Systems 23 (1) (1987) 24–30.
[12] F.A. Sadjadi, Hypothesis testing in a distributed environment, IEEE Transactions on Aerospace and Electronic Systems 22 (2) (1986) 134–137.
[13] K.S. Tang, K.F. Man, Q. Kwong, S. He, Genetic algorithms and their applications, IEEE Signal Processing Magazine 13 (6) (1996) 22–37.
[14] Z.B. Tang, K.R. Pattipati, D.L. Kleinman, A distributed M-ary hypothesis testing problem with correlated observations, IEEE Transactions on Automatic Control 37 (7) (1992) 1042–1046.
[15] R.R. Tenney, N.R. Sandell, Detection with distributed sensors, IEEE Transactions on Aerospace and Electronic Systems 17 (4) (1981) 501–509.
[16] S.C. Thomopoulos, R. Viswanathan, D.K. Bougoulias, Optimal decision fusion in multiple sensor systems, IEEE Transactions on Aerospace and Electronic Systems 23 (5) (1987) 644–653.
[17] S.C. Thomopoulos, R. Viswanathan, D.K. Bougoulias, Optimal distributed decision fusion, IEEE Transactions on Aerospace and Electronic Systems 25 (5) (1989) 761–765.
[18] J.N. Tsitsiklis, M. Athans, On the complexity of decentralized decision making and detection problems, IEEE Transactions on Automatic Control 30 (5) (1985) 440–446.
[19] H.L. Van Trees, Detection, Estimation, and Modulation Theory, vol. 1, Wiley, New York, 1969.
[20] Q. Zhang, P.K. Varshney, Decentralized M-ary detection via hierarchical binary decision fusion, Information Fusion 2 (2001) 3–16.
[21] Y. Zhu, R.S. Blum, Z. Luo, K.M. Wong, Unexpected properties and optimum-distributed sensor detectors for dependent observation cases, IEEE Transactions on Automatic Control 45 (1) (2000) 62–72.