Optimizing Decision Fusion in the Presence of Byzantine Data

Huimin Chen
Vesselin P. Jilkov
X. Rong Li
Department of Electrical Engineering, University of New Orleans, New Orleans, LA 70148, U.S.A.
Email: {hchen2,vjilkov,xli}@uno.edu

Abstract—We consider the problem of fusing local decision outputs into a global decision with a budget constraint in the presence of Byzantine data. Each local decision maker is assumed to provide a finite-valued output regarding two competing hypotheses. A fusion rule is characterized by a probabilistic mixing of decision trees corresponding to deterministic policies to reach a global decision. For practical problems where maximizing the detection probability is of primary concern, we propose to optimize the fusion rule under the budget constraint so that the fusion center can maintain the expected operational cost in the long run. In addition, we assume that each local processor receives feedback from the fusion center sequentially in order to achieve the desired false alarm rate. A practical procedure based on the conformal prediction method is proposed for the honest local processor to adapt to the globally optimal decision fusion policy against Byzantine attack. We show that, in the asymptotic regime, the attacker can only reduce the detection probability vs. budget curve to the situation where the fusion center has full knowledge of the compromised local processors. Illustrative examples representing the conflict detection problem in air traffic management are provided for policy analysis within the optimization of decision fusion in the presence of Byzantine data.
Keywords: Decision fusion, distributed detection, decision tree, Byzantine attack, conformal prediction, efficient front.

I. INTRODUCTION
Consider a distributed detection system with a collection of local processors and a fusion center. The objective is to make a decision between two competing hypotheses using the outputs from the local processors. The classical decentralized detection problem has been studied extensively with various configurations [15]. Recently, significant attention has been drawn to the inference problems with distributed sensor networks in the presence of Byzantine data. In a decision fusion setting, an intelligent adversary may compromise a subset of the local processors and send falsified data to the fusion center in order to degrade the detection performance. The optimal attacking distributions for Byzantine sensors to minimize the detection error exponent at the fusion center were obtained in [13]. When the fraction of Byzantine sensors reaches a critical threshold, the consistency of the detection at the fusion center can be completely destroyed [16]. Further theoretical results have been obtained in the Bayesian detection setting [6] and under the tree topology configuration [5]. These results quantify the detection performance degradation with various assumptions on the attacking power of the adversary. However, they are valid assuming that the fusion rule is
fixed and known to the adversary. The resulting performance analysis can be viewed as a worst case assessment of the Byzantine attacks. Alternatively, the fusion center can deploy various mitigation strategies against the Byzantine attacks in a repeated game setting. In [1], the authors adopted an entropy based trust model to identify the Byzantine sensors. A reputation based Byzantine identification scheme was proposed in [14]. Securing the communication channels from local processors to the fusion center through distributed coding [10], authentication protocol design [2], and smart quantization of sensor measurements [17] has also been shown to enhance the protection from independent or collaborative data falsification attacks. For other data falsification mitigation schemes, see the survey article [16].

Existing Byzantine mitigation schemes primarily focus on improving the identification accuracy of the Byzantine sensors, in the hope that decision fusion using only the input from the honest sensors will achieve the same error exponent as that without the Byzantine sensors. Owing to the sequential nature of the Byzantine identification methods, one faces the challenge of analyzing the detection performance in the non-asymptotic regime. In fact, most of the existing performance analysis of distributed detection in the presence of Byzantine data has to assume an identical sensing model among the honest sensors. The results are usually of the worst case kind, without consideration of Byzantine mitigation.

Under a slightly different decision fusion setting, this paper considers maximizing the detection probability at the fusion center under an operational budget constraint. It has been shown that the optimal fusion policy can be characterized by a random mixing of decision trees, and the resulting detection probability vs. budget curve is a piece-wise linear concave function [3]. We extend the decision fusion problem to a sequential setting where the fusion center has to learn the statistical property of the output from each local processor. We assume that each local processor receives 1-bit feedback from the fusion center after sending its locally processed measurement or decision. Since the false alarm rate at which each local processor achieves the globally optimal detection performance is unknown and time varying, the honest processor has to adaptively adjust its detection threshold based on the feedback from the fusion center. We propose that each honest processor apply conformal prediction [18] to its output to achieve the desired false alarm rate. On the other hand, a Byzantine sensor can apply any attack distribution and acquire complete knowledge of the decision fusion policy. Assuming conditional independence among the local processors, we show that, with the reputation of each local processor being maintained at the fusion center, the intelligent adversary can only reduce the detection probability vs. budget curve to the situation where the fusion center is fully aware of the compromised local processors. Illustrative examples representing the conflict detection problem in air traffic management are provided for policy analysis within the framework of optimizing decision fusion.

The rest of the paper is organized as follows. Section II formulates the decision fusion problem with a budget constraint in the presence of Byzantine data. Section III presents the optimal decision fusion policy when each local processor's performance is known to the fusion center. Section IV provides a sequential update method for the honest local processor and the fusion center to maximize the detection probability. Section V gives simulation examples representing the conflict detection problem in the presence of Byzantine data. Concluding remarks are given in Section VI.
II. PROBLEM FORMULATION
The problem concerns binary hypotheses ℋ0 and ℋ1 to be decided at the fusion center. A local processor is a device that can generate 𝐾 possible outputs. For each output 𝑖 = 1, ..., 𝐾, denote by 𝑙0𝑖 the likelihood of getting 𝑖 under ℋ0 and by 𝑙1𝑖 the likelihood of getting 𝑖 under ℋ1. In the conventional setting, one assumes that the fusion center has complete knowledge of the performance of each local processor, which is fully characterized by {𝑙0𝑖, 𝑙1𝑖}, 𝑖 = 1, ..., 𝐾. Furthermore, one can assume that {𝑙0𝑖} and {𝑙1𝑖} are sets of probabilities ordered by the likelihood ratio, i.e.,

$$\sum_{i=1}^{K} l_{0i} = \sum_{i=1}^{K} l_{1i} = 1, \qquad l_{0i} \ge 0, \; l_{1i} \ge 0, \qquad (1)$$

and

$$\frac{l_{11}}{l_{01}} \ge \frac{l_{12}}{l_{02}} \ge \cdots \ge \frac{l_{1K}}{l_{0K}}. \qquad (2)$$
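For concreteness, the following Python sketch builds a 𝐾-output local processor model satisfying (1) and reorders its outputs so that (2) holds. The numerical values are illustrative, not taken from the paper.

```python
import numpy as np

# Hypothetical 4-output local processor: l0[i], l1[i] are the output likelihoods
# under H0 and H1, respectively.
l0 = np.array([0.40, 0.30, 0.20, 0.10])
l1 = np.array([0.05, 0.15, 0.30, 0.50])
assert np.isclose(l0.sum(), 1.0) and np.isclose(l1.sum(), 1.0)  # condition (1)

# Reindex the outputs so that l1[i]/l0[i] is non-increasing, as required by (2).
order = np.argsort(-(l1 / l0))
l0, l1 = l0[order], l1[order]
print(l1 / l0)  # [5.0, 1.5, 0.5, 0.125], a non-increasing sequence
```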
The fusion center combines the outputs from 𝑁 local processors to declare ℋ0 or ℋ1. Let 𝐴 be the set of possible actions. For the binary hypothesis testing problem, the fusion center has to declare one of the two hypotheses, i.e., 𝐴 = {“ℋ0”, “ℋ1”} for any sequence of outputs provided by the local processors. Thus a decision fusion rule is viewed as a probabilistic mixing of the deterministic actions for each possible output sequence from the local processors. Clearly, the fusion center has the option to choose local processors sequentially based on the outputs from the local processors chosen previously. The decision procedure is generally characterized as a decision tree. Formally, let 𝐷 be an arbitrary sequential procedure to run the local processors and ℒ(𝐷) be the set of all possible outputs generated by 𝐷, i.e., the set of decision trees. A policy 𝑃 over 𝐷 is a weighted mapping 𝑝 : ℒ(𝐷) × 𝐴 → [0, 1] such that 0 ≤ 𝑝(𝑖, 𝑎) ≤ 1 for every 𝑖 ∈ ℒ(𝐷) and 𝑎 ∈ 𝐴. Clearly, the probabilities of assigning outcome 𝑖 to the available actions should satisfy

$$\sum_{a \in A} p(i, a) = 1, \quad \forall i \in \mathcal{L}(D). \qquad (3)$$
When maximizing the detection probability, we cast the optimal fusion rule as

$$P^* = \arg\max_{P \in \mathcal{P}} \Pr(\text{“}\mathcal{H}_1\text{”} \mid \mathcal{H}_1)(P) \qquad (4)$$

subject to

$$C(P) \le B \qquad (5)$$

where 𝐶(𝑃) is the cost of deploying policy 𝑃 and 𝐵 is the budget constraint. Denote by 𝑃𝐷(𝑃) = Pr(“ℋ1”∣ℋ1)(𝑃) the detection probability under policy 𝑃. We have

$$P_D(P) = \sum_{i \in \mathcal{L}(D)} l_{1i}(D)\, p(i, \text{“}\mathcal{H}_1\text{”}). \qquad (6)$$

The normalized cost of policy 𝑃 can be decomposed as

$$C(P) = C(D) + \sum_{i \in \mathcal{L}(D)} \left[ \pi_1 l_{1i}\, p(i, \text{“}\mathcal{H}_1\text{”}) + \kappa \pi_0 l_{0i}\, p(i, \text{“}\mathcal{H}_1\text{”}) \right]$$

where 𝜋0 = 𝑃(ℋ0) and 𝜋1 = 𝑃(ℋ1) are the prior probabilities of the underlying hypotheses and 𝜅 > 1 accounts for the consequence of a false alarm. Note that the cost of declaring “ℋ0” is often negligible compared with declaring “ℋ1” under ℋ0. The cost 𝐶(𝐷) is usually associated with the expected sensor operation cost and the communication cost, while the other two terms are associated with the expected consequence of declaring ℋ1. One may rewrite the budget constraint as

$$C(P) = C_0(D) + \kappa P_{FA}(P) \le B \qquad (7)$$

where 𝐶0 is the cost of running 𝐷 and communicating the outputs to the fusion center, and 𝑃𝐹𝐴 is the false alarm probability under policy 𝑃. If 𝐶0 = 0, the problem reduces to Neyman-Pearson detection [7].
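To make (6) and the cost decomposition concrete, the following sketch evaluates a randomized terminal assignment for a fixed decision tree. All numbers (leaf likelihoods, priors, 𝜅, 𝐶(𝐷)) are hypothetical.

```python
import numpy as np

# Hypothetical terminal outputs of a fixed decision tree D with three leaves.
l0 = np.array([0.7, 0.2, 0.1])    # P(leaf i | H0)
l1 = np.array([0.1, 0.3, 0.6])    # P(leaf i | H1)
p_h1 = np.array([0.0, 0.5, 1.0])  # p(i, "H1"): randomized terminal assignment
pi0, pi1, kappa = 0.9, 0.1, 10.0  # assumed priors and false-alarm penalty
c_d = 0.2                         # assumed cost of running D and reporting outputs

p_detect = float(np.sum(l1 * p_h1))               # detection probability, eq. (6)
p_fa = float(np.sum(l0 * p_h1))                   # false alarm probability of the policy
cost = c_d + pi1 * p_detect + kappa * pi0 * p_fa  # cost decomposition preceding eq. (7)
print(p_detect, p_fa, cost)                       # compare `cost` against the budget B
```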
In the presence of Byzantine sensors, the fusion center cannot rely on its prior knowledge of the local processors' performance. We assume that the fusion center learns {𝑙0𝑖, 𝑙1𝑖}, 𝑖 = 1, ..., 𝐾, sequentially and optimizes the decision fusion policy based on the learned statistical model. In addition, the fusion center is able to provide 1-bit feedback to each local processor in order to improve the global detection performance. We assume that each honest processor will adjust its local detection threshold accordingly to meet the desired false alarm rate set by the fusion center. On the other hand, a Byzantine sensor may intentionally send the opposite information to confuse the fusion center. Note that the Byzantine sensor does not have to send the opposite information to the fusion center consistently, since such an attack strategy can be easily identified when the percentage of Byzantine sensors is relatively small. We are concerned with the adaptive strategy for each honest local processor to provide the most informative output to the fusion center based on the feedback. We also want to analyze the detection probability vs. budget performance in the presence of Byzantine sensors over the long run.

III. OPTIMAL DECISION FUSION POLICY WITH KNOWN STATISTICAL MODELS OF LOCAL PROCESSORS [3]

When the fusion center has perfect knowledge of {𝑙0𝑖, 𝑙1𝑖}, 𝑖 = 1, ..., 𝐾, the optimal decision fusion policy under a budget constraint was obtained in [3] and is summarized as follows. Let 𝑔 be a local processor that can generate the set of outputs ℒ(𝑔) regarding the two hypotheses. Let 𝐴𝑖 be the set of available
actions for 𝑖 ∈ ℒ(𝑔). For the entire range of possible budget 𝐵, the following constrained optimization problem

$$\max \sum_{i \in \mathcal{L}(g),\, a \in A_i} l_{1i}\, P_D(a)\, p(i, a)$$

subject to

$$\sum_{i \in \mathcal{L}(g),\, a \in A_i} l_{0i}\, C(a)\, p(i, a) \le B - C(g)$$
$$\sum_{a \in A_i} p(i, a) = 1, \quad \forall i \in \mathcal{L}(g)$$
$$p(i, a) \ge 0, \quad \forall i \in \mathcal{L}(g), \; \forall a \in A_i$$

yields a piece-wise linear concave function 𝑃𝐷(𝐵). For output 𝑖 with probability 𝑙0𝑖 of being seen under ℋ0 and 𝑙1𝑖 under ℋ1, the set of actions 𝐴𝑖 = {𝑎𝑖,0, ..., 𝑎𝑖,𝑚𝑖} may include “ℋ0”, “ℋ1” and invoking a new policy with other local processors, where 𝑚𝑖 = ∣𝐴𝑖∣. Denote by 𝐶𝑖,𝑗 the cost associated with 𝑎𝑖,𝑗 and by 𝛿𝑖,𝑗 the detection probability associated with 𝑎𝑖,𝑗. Assuming 𝐶𝑖,0 = 𝛿𝑖,0 = 0, we define the improvement of the detection probability per unit cost using 𝑎𝑖,𝑗 as

$$\rho_{ij} = \frac{l_{1i}}{l_{0i}} \cdot \frac{\delta_{i,j} - \delta_{i,j-1}}{C_{i,j} - C_{i,j-1}}. \qquad (8)$$

By incrementally increasing the budget to assign 𝑝(𝑖, 𝑎𝑖∗,𝑗∗) = 1 when

$$(i^*, j^*) = \arg\max_{i,j} \rho_{ij}, \quad i \in \mathcal{L}(g), \; j \in A_i,$$
we can construct the set of (𝐶(𝑃∗), 𝑃𝐷(𝑃∗)) pairs with a greedy algorithm shown in Algorithm 1. The line segment connecting two adjacent points (𝐶𝑘−1, 𝑃𝐷,𝑘−1) and (𝐶𝑘, 𝑃𝐷,𝑘) can be achieved by randomly mixing the two corresponding policies 𝑃𝑘−1 and 𝑃𝑘.

Algorithm 1 Greedy Allocation
Input: 𝑔 = {𝑙0𝑖, 𝑙1𝑖}, 𝑖 ∈ ℒ(𝑔); 𝒜 = {𝑎𝑖,𝑗}, 𝑖 ∈ ℒ(𝑔).
Initialization: 𝑃∗ = ∅, 𝑘 = 0, 𝑃𝐷0 = 0, 𝐶0 = 𝐶(𝑔); 𝐶𝑖,𝑗 = 𝐶(𝑎𝑖,𝑗), 𝛿𝑖,𝑗 = 𝑃𝐷(𝑎𝑖,𝑗), 𝑗(𝑖) = 0, 𝑝(𝑖, 𝑎𝑖,0) = 1, 𝑝(𝑖, 𝑎𝑖,𝑗) = 0 for 𝑗 = 1, ..., 𝑚𝑖; 𝑃0 = (𝑔, 𝑝); 𝜌𝑖𝑗 = (𝑙1𝑖/𝑙0𝑖) · (𝛿𝑖,𝑗 − 𝛿𝑖,𝑗−1)/(𝐶𝑖,𝑗 − 𝐶𝑖,𝑗−1), ∀𝑖 ∈ ℒ(𝑔), 𝑗 = 1, ..., 𝑚𝑖.
Policy Update: repeat
  𝑘 = 𝑘 + 1
  (𝑖𝑘, 𝑗𝑘) = arg max𝑖,𝑗 𝜌𝑖𝑗, 𝑖 ∈ ℒ(𝑔), 𝑗(𝑖) < 𝑗 ≤ 𝑚𝑖
  𝐶𝑘 = 𝐶𝑘−1 + 𝑙0𝑖𝑘 (𝐶𝑖𝑘,𝑗𝑘 − 𝐶𝑖𝑘,𝑗𝑘−1)
  𝑃𝐷𝑘 = 𝑃𝐷𝑘−1 + 𝑙1𝑖𝑘 (𝛿𝑖𝑘,𝑗𝑘 − 𝛿𝑖𝑘,𝑗𝑘−1)
  𝑝(𝑖𝑘, 𝑎𝑖𝑘,𝑗𝑘−1) = 0, 𝑝(𝑖𝑘, 𝑎𝑖𝑘,𝑗𝑘) = 1
  𝑃𝑘 = (𝑔, 𝑝), 𝑃∗ = 𝑃∗ ∪ 𝑃𝑘, 𝑗(𝑖𝑘) = 𝑗𝑘
until there is no 𝑖 ∈ ℒ(𝑔) with 𝑗(𝑖) < 𝑚𝑖.
Output: Efficient front connecting the points (𝐶𝑘, 𝑃𝐷𝑘) and the associated optimal deterministic policies 𝑃∗.
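A minimal Python sketch of the greedy allocation in Algorithm 1 for the special case where each leaf has only the two terminal actions “ℋ0” (zero cost, zero detection) and “ℋ1” (assumed unit cost and unit conditional detection); real instances would also include actions that invoke further local processors, and all numbers here are illustrative.

```python
import numpy as np

def greedy_frontier(l0, l1, c_g=0.0, action_cost=1.0, action_pd=1.0):
    """Breakpoints (C_k, PD_k) of the efficient front for a single decision stage.

    Each leaf i starts at a_{i,0} = "H0" (cost 0, detection 0) and can be upgraded
    to "H1" with cost `action_cost` and conditional detection `action_pd`; leaves
    are upgraded in decreasing order of the gain-per-cost ratio rho_ij of eq. (8).
    """
    rho = (l1 / l0) * (action_pd / action_cost)  # eq. (8) with delta_{i,0} = C_{i,0} = 0
    order = np.argsort(-rho)                     # greedy upgrade order
    points = [(c_g, 0.0)]                        # (budget spent, detection achieved)
    cost, pd = c_g, 0.0
    for i in order:
        cost += l0[i] * action_cost              # expected extra cost under H0
        pd += l1[i] * action_pd                  # extra detection under H1
        points.append((cost, pd))
    return points                                # breakpoints of a concave frontier

# Hypothetical 3-leaf processor.
l0 = np.array([0.7, 0.2, 0.1])
l1 = np.array([0.1, 0.3, 0.6])
print(greedy_frontier(l0, l1))  # [(0, 0), (0.1, 0.6), (0.3, 0.9), (1.0, 1.0)]
```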
Let 𝑁 be the number of local processors. In order to construct the efficient front (𝐶(𝑃), 𝑃𝐷(𝑃)) for the entire budget range, we need to recursively apply Algorithm 1 to obtain the decision trees for a subset of size 𝑘 based on the efficient front from the subsets of size 𝑘 − 1. The optimal fusion policy that obtains the complete efficient front by combining the outputs from 𝑁 local processors is given in Algorithm 2. The action set 𝐴𝐺𝑘∖𝑔 includes all possible actions for the local processors in the subset 𝐺𝑘 except 𝑔. Note that the optimal decision policy involving local processors 𝐺𝑘 combines 𝑘 different policies from the greedy Algorithm 1 and generates the combined policy using Algorithm 3. The operator ⊎ indicates that the union of the undominated deterministic policies is used to construct the efficient front of the 𝑃𝐷(𝐵) (detection probability vs. budget) curve. The computation of the optimal decision fusion policy requires retaining the optimal sub-policies (and the associated 𝑃𝐷(𝐵) curves) for all subsets of 𝐺. Thus the recursion given in Algorithm 2 can be interpreted as dynamic programming in 𝑁 stages, computing the optimal policies at stage 𝑘 using the optimal policies at stage 𝑘 − 1.

Algorithm 2 Decision Fusion
Input: Collection of local processors 𝐺 = {𝑔1, ..., 𝑔𝑁}.
Initialization: 𝐴∅ = {“ℋ0”, “ℋ1”}.
Fusion Policy:
For 𝑘 = 1, ..., 𝑁
  For all subsets 𝐺𝑘 ⊆ 𝐺 of size 𝑘
    For each local processor 𝑔 ∈ 𝐺𝑘
      𝐹𝐺𝑘,𝑔 = Greedy Allocation(𝑔, 𝐴𝐺𝑘∖𝑔)
    End For
    𝐴𝐺𝑘 = ⊎ [ ∪𝑔∈𝐺𝑘 ( 𝐹𝐺𝑘,𝑔 ∪ 𝐴𝐺𝑘∖𝑔 ) ]
  End For
End For
Output: Fusion policy 𝐴𝐺 and associated efficient front 𝑃𝐷(𝐵).

Algorithm 3 Finding Undominated Policies 𝑃∗ = ⊎ [𝑃1, ..., 𝑃𝑘]
Input: Collection of policies {𝑃1, ..., 𝑃𝑘} with increasing cost 𝐶(𝑃1) < ... < 𝐶(𝑃𝑘).
Initialization: 𝑃∗ = ∅.
Policy Selection: repeat
  𝜌𝑖 = (𝑃𝐷(𝑃𝑖) − 𝑃𝐷(𝑃∗)) / (𝐶(𝑃𝑖) − 𝐶(𝑃∗)), 𝑖 = 1, ..., 𝑘
  𝑗 = arg max𝑖 𝜌𝑖
  𝑃∗ = 𝑃∗ ∪ 𝑃𝑗
until no improvement in 𝑃𝐷.
Output: Policy 𝑃∗ and associated efficient front.

For 𝑁 local processors each producing two outputs, the total number of deterministic fusion rules given by binary decision trees is

$$S_n = n\, S_{n-1}^2 + 2, \quad n = 1, ..., N,$$

with 𝑆0 = 2. We can see that even for 𝑁 = 5, 𝑆5 > 5.8 × 10^18. On the other hand, selecting the undominated policies from 𝑘 candidates using Algorithm 3 has 𝑂(𝑘) running time. The greedy allocation in Algorithm 1 runs in 𝑂(∣𝒜∣ log ∣ℒ(𝑔)∣) time. The computation of the efficient front using Algorithm 2 takes 𝑂(2^𝑁 𝑁^2 log 𝑁 ∣𝒜∣) time. The critical computation boils down to the decision fusion policies for the 2^𝑁 subsets of the local processors. The memory needed to compute the optimal policy with 𝑘 local processors depends on the number of subsets with 𝑘 − 1 local processors, which grows exponentially in 𝑁. Thus the optimal decision fusion policy using Algorithm 2 has exponential complexity in memory and running time as 𝑁 increases, even though it is much more efficient than brute force enumeration of all possible decision trees.

Note that 𝑃𝐷(𝐵) generated by fusing the outputs from local processors is a piece-wise linear concave function whose breakpoints (𝐶(𝑃𝑗), 𝑃𝐷(𝑃𝑗)) are achieved by certain deterministic policies 𝑃𝑗, 𝑗 = 0, ..., 𝐿, where 𝑃0 assigns “ℋ0” to every terminal output of the decision tree. In the worst case, 𝐿 also grows exponentially in 𝑁. It is important to note that computing only the first 𝑘 breakpoints for a relatively small 𝑘 does not incur exponential complexity. This has been experimentally verified for problems of moderate size with randomly generated sensor models [3].
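As a quick sanity check on the combinatorial growth mentioned above, a few lines of Python (not from the paper) evaluate the recursion 𝑆𝑛 = 𝑛 𝑆𝑛−1² + 2 with 𝑆0 = 2 and confirm that 𝑆5 exceeds 5.8 × 10^18.

```python
def num_binary_decision_trees(n_max):
    s = 2      # S_0 = 2: declare "H0" or "H1" without querying any processor
    seq = [s]
    for n in range(1, n_max + 1):
        s = n * s * s + 2   # S_n = n * S_{n-1}^2 + 2
        seq.append(s)
    return seq

print(num_binary_decision_trees(5))  # S_5 = 5,830,399,440,299,918,420 > 5.8e18
```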
IV. SEQUENTIAL UPDATE OF DECISION FUSION POLICY
When the statistical models of the local processors 𝐺 are unknown to the fusion center, we consider optimizing the decision fusion policy in a sequential setting where the fusion center estimates 𝐺 and then finds the optimal policy under the budget constraint using Algorithm 2 with the estimated likelihood functions 𝐺̂ of the local processors. Each local processor receives 1-bit feedback regarding the global decision on the two hypotheses. It is worth noting that most of the training examples are from ℋ0 since ℋ1 is rare, and thus maximizing the detection probability for decision fusion makes sense.

A. Sequential Estimation of Local Performance

Without loss of generality, we consider a local processor that generates only two possible outputs, namely, the local decisions “ℋ0” and “ℋ1”, while the fusion center needs to estimate the local processor's performance in terms of 𝑃𝐹𝐴 and 𝑃𝐷. A practical approach is to compare the local decision and the global decision sequentially, assuming that the global decision is always correct. At time 𝑡, let ℎ0𝑡 be the number of global decisions “ℋ0” and ℎ1𝑡 the number of global decisions “ℋ1”. Let 𝜌0𝑡 be the number of local decisions sent to the fusion center that differ from the global decision when the global decision is “ℋ0”, and let 𝜌1𝑡 be the number of local decisions that differ from the global decision when the global decision is “ℋ1”. The fusion center estimates the false alarm probability by

$$P_{FA}(t) = \frac{\rho_{0t} + 1}{h_{0t} + 2}. \qquad (9)$$

Similarly, the detection probability is estimated by

$$P_D(t) = 1 - \frac{\rho_{1t} + 1}{h_{1t} + 2}. \qquad (10)$$
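A small sketch of the running estimates (9)-(10) maintained at the fusion center for one local processor; the string labels "H0"/"H1" and the class name are only illustrative, and the +1/+2 terms act as add-one smoothing.

```python
class LocalPerformanceEstimator:
    """Fusion-center estimate of one local processor's P_FA and P_D, eqs. (9)-(10)."""

    def __init__(self):
        self.h0 = self.h1 = 0      # number of global decisions "H0" / "H1"
        self.rho0 = self.rho1 = 0  # disagreements when the global decision is "H0" / "H1"

    def update(self, local_decision, global_decision):
        # Compare the local and global decisions, assuming the global one is correct.
        if global_decision == "H0":
            self.h0 += 1
            self.rho0 += int(local_decision != global_decision)
        else:
            self.h1 += 1
            self.rho1 += int(local_decision != global_decision)

    def p_fa(self):
        return (self.rho0 + 1) / (self.h0 + 2)      # eq. (9)

    def p_d(self):
        return 1 - (self.rho1 + 1) / (self.h1 + 2)  # eq. (10)
```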
The prior knowledge (at 𝑡 = 0) of a local processor's performance is usually specified by its operating point assuming that it is honest. However, incorporating prior knowledge of an honest processor's operating point is susceptible to Byzantine attacks. In addition, such an operating point may not be globally optimal from the fusion center's perspective. Thus there is a need for an honest processor to adjust its operating point in order for the fusion center to recognize its type.

B. Sequential Local Update Based on Conformal Prediction

Since the local processor receives the feedback regarding the global decision after sending its local output to the fusion center, it may adjust its operating point in terms of 𝑃𝐹𝐴 and 𝑃𝐷 to improve the detection probability at the fusion center. However, computing the threshold that achieves the desired 𝑃𝐹𝐴 is challenging when the distribution of the test statistic is unknown or when finding the tail probability is computationally demanding. Nevertheless, an honest processor has to rely on its operating point to distinguish itself from the Byzantine sensors. Here we focus on the sequential decision making of an honest local processor that controls its false alarm rate without knowing the distribution of the test statistic under ℋ0.

We propose that the local processor use the conformal prediction technique [18] to adjust its false alarm rate based on the fusion center's feedback. Conformal prediction is a hedging method that generates the predicted output with a given confidence level using a set of training examples generated from the same distribution. In the sequential setting, a local processor computes the test statistic 𝑥𝑡, sends the local decision 𝑦̄𝑡 to the fusion center and receives the global decision 𝑦𝑡. We can treat (𝑥1, 𝑦1), ..., (𝑥𝑡, 𝑦𝑡) as the training examples, and the local processor has to predict 𝑦𝑡+1 using 𝑥𝑡+1 as well as {(𝑥𝑖, 𝑦𝑖)}, 𝑖 = 1, ..., 𝑡. Assuming that the training examples are exchangeable, i.e., the order of the training examples does not affect the nonconformity measure for any particular example, one can calculate the nonconformity score 𝛼𝑖 = 𝐴(𝐵𝑖, (𝑥𝑖, 𝑦𝑖)) for each example, where 𝐵𝑖 = {(𝑥𝑗, 𝑦𝑗) ∣ 𝑗 = 1, ..., 𝑡, 𝑗 ≠ 𝑖} and 𝐴 denotes a nonconformity score function defined over the space of training examples. The so-called 𝑝-value of a particular example (𝑥𝑖, 𝑦𝑖) can be estimated using the nonconformity scores by

$$q_i = \frac{|\{j = 1, ..., t \mid \alpha_j > \alpha_i\}|}{t}. \qquad (11)$$
For the testing case (𝑥𝑡+1, 𝑦) with a hypothetical label 𝑦, one expects that 𝑦𝑡+1 ≠ 𝑦 would lead to 𝛼𝑡+1 being relatively large compared with 𝛼1, ..., 𝛼𝑡. Thus a small 𝑝-value is an indication that the testing example with label 𝑦 might be an outlier compared with the training examples. Denote by Γ𝜖 the valid prediction set with significance level 𝜖, i.e., given (𝑥1, 𝑦1), ..., (𝑥𝑡, 𝑦𝑡), the testing example 𝑥𝑡+1 has a label 𝑦 satisfying 𝑃(𝑦 ∉ Γ𝜖) ≤ 𝜖. It can be shown that the smoothed 𝑝-value estimate for the 𝑖-th training sample, given by

$$q_i = \frac{|\{j \mid \alpha_j > \alpha_i\}| + \tau_i\, |\{j \mid \alpha_j = \alpha_i\}|}{t}, \qquad (12)$$

yields a valid significance level 𝑞𝑖, where 𝜏𝑖 is a random variable uniformly distributed in [0, 1] [18]. Thus conformal prediction provides a set of valid outputs for any testing example based on the nonconformity scores obtained from the training examples.
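The smoothed 𝑝-value (12) amounts to a rank statistic with random tie-breaking. A minimal sketch (illustrative numbers, not from the paper):

```python
import random

def smoothed_p_value(alphas, i, tau=None):
    """Smoothed p-value of the i-th example among the scores alphas[0..t-1], eq. (12)."""
    t = len(alphas)
    tau = random.random() if tau is None else tau  # uniform tie-breaking variable tau_i
    greater = sum(a > alphas[i] for a in alphas)
    equal = sum(a == alphas[i] for a in alphas)    # includes j = i itself
    return (greater + tau * equal) / t

# Hypothetical nonconformity scores of t = 5 examples; p-value of the 4th example.
print(smoothed_p_value([0.2, 0.5, 0.1, 0.9, 0.4], i=3))
```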
Using the training examples {(𝑥1, 𝑦1), ..., (𝑥𝑡, 𝑦𝑡)}, one can generate a valid prediction set Γ𝜖 for the output 𝑦𝑡+1 given the input 𝑥𝑡+1 and a fixed significance level 𝜖. It is interesting to note that the prediction set can be empty. In this case, every possible output label has a 𝑝-value below 𝜖. When the test example is not truly representative of the training data set, the prediction set is empty even under a fairly small 𝜖. Another possibility is that the true label 𝑦𝑡+1 does not belong to Γ𝜖. According to conformal prediction theory, this can happen with probability no larger than 𝜖. Thus if the local processor declares 𝑦𝑡+1 as “ℋ1” when 𝑦𝑡+1 ∉ Γ𝜖, then the false alarm probability can be controlled at level 𝜖 without knowing the underlying distribution that generates the training examples {(𝑥𝑖, 𝑦𝑖)}, 𝑖 = 1, ..., 𝑡 [12].

In the decision fusion problem considered earlier, the local processor computes its test statistic 𝑥𝑖 and compares it with a threshold to reach a local decision. Once a global decision is made, the feedback 𝑦𝑖 from the fusion center becomes available to the local processor. Thus the local processor can use the estimated 𝑝-value to adjust its false alarm rate. Specifically, when 𝑦𝑖 = “ℋ1” while the local decision is “ℋ0”, the local processor should set the detection threshold to the test statistic 𝑥𝑖 or to a smoothed version obtained from the test statistics of the missed detection events at times 𝑡 = 1, ..., 𝑖 − 1. This will increase the detection probability at the cost of an increased false alarm rate. When 𝑦𝑖 = “ℋ0” while the local decision is “ℋ1”, the local processor has to lower 𝜖 slightly so that 𝑥𝑖 will not be treated as an indication of “ℋ1” at the specified 𝑝-value. Finally, we suggest that the local processor use the height of the kernel density estimate of 𝑥𝑖 under 𝑦𝑖 = “ℋ0” as the nonconformity score function, i.e.,

$$A_i(x) = \frac{i-1}{i} A_{i-1}(x) + \frac{1}{i\sigma} K\!\left(\frac{x - x_i}{\sigma}\right)$$

where 𝐾(·) is chosen as the Gaussian kernel

$$K(x) = e^{-\frac{x^2}{2\sigma^2}} \quad \text{with} \quad \sigma^2 = \frac{1}{i} \sum_{j=1}^{i} x_j^2.$$
A similar technique has been used in [12] for detecting anomalous events from trajectory data. The sequential update of a local processor is fully characterized by Algorithm 4.

C. Byzantine Attack Strategies and Possible Mitigation Schemes

There are different attack strategies that can be adopted by the local processors compromised and controlled by the intelligent adversary. Assume that a Byzantine sensor will send the opposite information to the fusion center with probability 𝑃𝑜. Without coordination by the intelligent adversary, the opposite information is simply the alternative hypothesis of the local processor's decision. If the intelligent adversary knows the statistical models of the local processors that send their decisions to the fusion center, then the opposite information corresponds to the alternative hypothesis of the global decision made as if the intelligent adversary were the fusion center.
Algorithm 4 Sequential Update of an Honest Local Processor
Input: Test statistics {𝑥1, ..., 𝑥𝑡−1, 𝑥𝑡} and global decisions {𝑦̄1, ..., 𝑦̄𝑡−1}.
Local Decision: Estimate the 𝑝-value 𝑞𝑡 of 𝑦𝑡 = “ℋ0” using (12). If 𝑞𝑡 < 𝜖, then send 𝑦𝑡 = “ℋ1” to the fusion center. Otherwise, send 𝑦𝑡 = “ℋ0”.
Threshold Update: Receive feedback 𝑦̄𝑡 from the fusion center. If 𝑦̄𝑡 = “ℋ0” and 𝑦𝑡 = “ℋ1”, then 𝜖 = 𝑞𝑡. If 𝑦̄𝑡 = “ℋ1” and 𝑦𝑡 = “ℋ0”, then 𝜖 = min{𝛼𝑗 : 𝑦̄𝑗 = “ℋ1”}.
Output: Local decision 𝑦𝑡 and the updated threshold 𝜖.
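A hedged Python sketch of Algorithm 4. The nonconformity score function, the initial significance level and the label strings are all assumptions for illustration; the paper suggests a kernel density based score, which is left abstract here.

```python
import random

class HonestLocalProcessor:
    """Sketch of Algorithm 4: conformal threshold update from fusion-center feedback."""

    def __init__(self, score, epsilon=0.2):
        self.score = score       # any nonconformity function mapping a statistic to a scalar
        self.epsilon = epsilon   # current significance level (assumed starting value)
        self.alphas = []         # nonconformity scores of all past test statistics
        self.h1_scores = []      # scores of statistics the fusion center labeled "H1"

    def decide(self, x):
        """Smoothed p-value of the new statistic, eq. (12), and the local decision."""
        alpha = self.score(x)
        self.alphas.append(alpha)
        t = len(self.alphas)
        greater = sum(a > alpha for a in self.alphas)
        equal = sum(a == alpha for a in self.alphas)
        q = (greater + random.random() * equal) / t
        decision = "H1" if q < self.epsilon else "H0"
        return decision, q, alpha

    def feedback(self, local_decision, global_decision, q, alpha):
        """Threshold update rules of Algorithm 4 after the global decision arrives."""
        if global_decision == "H1":
            self.h1_scores.append(alpha)
        if global_decision == "H0" and local_decision == "H1":
            self.epsilon = q                    # false alarm: operate more conservatively
        elif global_decision == "H1" and local_decision == "H0" and self.h1_scores:
            self.epsilon = min(self.h1_scores)  # missed detection: rule as stated above

# Toy usage with an arbitrary score (larger statistic taken as less conforming to H0).
proc = HonestLocalProcessor(score=lambda x: x)
d, q, a = proc.decide(1.3)
proc.feedback(d, "H0", q, a)
```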
In the extreme situation, the Byzantine sensors all report the incorrect decision to the fusion center. Let 𝜉 be the fraction of Byzantine sensors among all local processors. Clearly, if 𝜉 ≥ 0.5, then the attacker can completely destroy the consistency of any policy at the fusion center, no matter how good the performance of the honest local processors is [13]. On the other hand, when 𝜉 < 0.5, the fusion center may start with a majority vote to estimate 𝜉 and compare the global decision with the local decisions to identify the Byzantine sensors [1]. This mitigation strategy is effective when the honest local processors have low 𝑃𝐹𝐴 and high 𝑃𝐷. In the sequential setting, the fusion center is able to estimate the performance of each local processor, so that the estimated difference 𝑃𝐷 − 𝑃𝐹𝐴 is conservative when an honest processor generates its local decision based on conformal prediction.

In order to reduce the chance of being identified as a Byzantine sensor, the attacker may change 𝑃𝑜 dynamically with an arbitrary attack distribution. Since the primary objective at the fusion center is to maximize 𝑃𝐷 under a budget constraint, the attacker may send “ℋ0” with high probability when the underlying hypothesis is ℋ1, while maintaining a low false alarm rate by sending “ℋ1” only when the majority of the honest local processors do so. It is worth noting that a larger collection of local processors is more reliable under Byzantine attacks even if 𝜉 remains unchanged.

D. Performance Analysis

We assume that the fusion center receives 𝑁 observations from the local processors and that with probability 𝜉 an observation is Byzantine. Denote by 𝑙0 the distribution of the output from an honest local processor under ℋ0 and by 𝑙1 the distribution under ℋ1. Denote by 𝑙̃0 the distribution of the output from a Byzantine sensor under ℋ0 and by 𝑙̃1 the distribution under ℋ1. With the observation origin uncertainty, any output from a local processor has the mixture distribution 𝑝0 = (1 − 𝜉)𝑙0 + 𝜉𝑙̃0 under ℋ0 and 𝑝1 = (1 − 𝜉)𝑙1 + 𝜉𝑙̃1 under ℋ1. We assume that the 𝑁 observations are conditionally independent and that the distributions 𝑝0 and 𝑝1 are distinguishable, i.e., the divergence between 𝑝0 and 𝑝1 is positive, 𝐷(𝑝0∥𝑝1) > 0, where

$$D(p_0 \| p_1) = \sum_{k=1}^{K} \left[(1-\xi)\, l_{0k} + \xi\, \tilde{l}_{0k}\right] \log \frac{(1-\xi)\, l_{0k} + \xi\, \tilde{l}_{0k}}{(1-\xi)\, l_{1k} + \xi\, \tilde{l}_{1k}}.$$
¯ one has It has been shown in [13] that when 𝜉 < 𝜉, 𝐷(𝑝0 //𝑝1 ) > 0 no matter what attack distributions ˜𝑙0 and ˜𝑙1 are, where ∑ + 𝑘 (𝑙1𝑘 − 𝑙0𝑘 ) ¯ ∑ 𝜉= 1 + 𝑘 (𝑙1𝑘 − 𝑙0𝑘 )+ and (𝑥)+ = 𝑥 when 𝑥 > 0 and (𝑥)+ = 0 otherwise. Next, we show that under the above assumptions, the fusion center is able to estimate 𝑙0 and 𝑙1 in the sequential setting based on the global decisions made using the outputs from the honest and Byzantine sensors. Assume that the honest local processor has the operating point satisfying 𝜉 < (𝑃𝐷 − 𝑃𝐹 𝐴 )/(1 + 𝑃𝐷 − 𝑃𝐹 𝐴 ). Otherwise, the detection capability would be completely destroyed even when the fusion center has the perfect knowledge of 𝑙0 and 𝑙1 among the honest local processors in the worst case of Byzantine attack [13]. For the likelihood ratio test between 𝑝0 and 𝑝1 whose false alarm rate does not exceed a given threshold 𝑃𝐹 𝐴 implicitly required by the budget constraint, with 𝑁 conditionally independent local decisions, the miss detection error probability has the asymptotic expression (𝑛)
𝑃𝑀 = 𝑒−𝑛[𝐷(𝑝0 //𝑝1 )+𝑒(𝑛)] where 𝑒(𝑛) → 0 as 𝑛 → ∞ [7]. Thus for a fixed decision fusion policy based on the likelihood ratio test statistic from any local processor, the probability of choosing a Byzantine sensor in the optimal fusion policy at the 𝑡-th round approaches zero for large 𝑡 when the attack distribution is distinguishable from the performance of an honest local processor. On the other hand, an honest local processor has a conservative estimate of its 𝑃𝐹 𝐴 based on the conformal prediction using the feedback from the fusion center. Thus we have ∑ 𝜋1 ˆ𝑙1𝑖 𝑝(𝑖, “ℋ1 ”) + 𝜅𝜋0 ˆ𝑙0𝑖 𝑝(𝑖, “ℋ1 ”) 𝐶(𝑃 ) = 𝐶(𝐷) + ≤ 𝐶(𝐷) +
∑
𝑖∈ℒ(𝐷)
𝜋1 𝑙1𝑖 𝑝(𝑖, “ℋ1 ”) + 𝜅𝜋0 𝑙0𝑖 𝑝(𝑖, “ℋ1 ”) ≤ 𝐵
𝑖∈ℒ(𝐷)
for the policy 𝑃 being optimized with 𝑙̂0 and 𝑙̂1. In the asymptotic regime, a Byzantine sensor is either fully exposed to the fusion center or helping the fusion center to improve the detection performance under the given budget constraint. The attacker can only reduce the detection probability vs. budget curve to the situation where the fusion center has complete knowledge of the Byzantine sensors and thus ignores their outputs. Assuming that the sensing and communication costs are small, we can see that an honest local processor will increase its 𝑃𝐹𝐴 until reaching the budget constraint at the fusion center or the maximum of 𝑃𝐷 − 𝑃𝐹𝐴. Intuitively, an honest processor operating at the maximum of 𝑃𝐷 − 𝑃𝐹𝐴 has the largest distinction from a Byzantine sensor in terms of the maximally allowable fraction of Byzantine sensors 𝜉 that does not destroy the decision fusion policy. This is equivalent to minimizing the Bayesian error assuming equal priors for the two hypotheses.
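The following sketch (hypothetical output distributions, Python) evaluates the mixture divergence 𝐷(𝑝0∥𝑝1) and the critical Byzantine fraction 𝜉̄ from [13] defined above.

```python
import numpy as np

def mixture_divergence(l0, l1, l0_byz, l1_byz, xi):
    """KL divergence between the H0 and H1 mixture distributions seen by the center."""
    p0 = (1 - xi) * l0 + xi * l0_byz
    p1 = (1 - xi) * l1 + xi * l1_byz
    return float(np.sum(p0 * np.log(p0 / p1)))

def critical_fraction(l0, l1):
    """xi_bar: below this fraction, D(p0||p1) > 0 for any attack distribution [13]."""
    s = np.sum(np.maximum(l1 - l0, 0.0))
    return s / (1.0 + s)

# Hypothetical honest output distributions and a hypothesis-reversed attack.
l0 = np.array([0.8, 0.2])  # honest processor under H0
l1 = np.array([0.1, 0.9])  # honest processor under H1
print(critical_fraction(l0, l1))                    # 0.7 / 1.7 ~= 0.41
print(mixture_divergence(l0, l1, l1, l0, xi=0.3))   # positive, since 0.3 < xi_bar
```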
V. SIMULATION STUDY
We illustrate the performance of the proposed approach through simulation examples. The scenario considered here represents a simplified conflict detection problem in air traffic
control and collision avoidance. Realistic estimates of collision risk using air traffic surveillance data have been reported in [8], [9], [4]. As new sensor and communication systems will provide each aircraft more capability in determining possible airspace encounters, the resulting decision fusion problem also becomes costly to optimize and vulnerable to Byzantine data. To simplify the model of an aircraft encounter and the detection performance of an independent observer, we assume that an honest local processor obtains a test statistic following a Gaussian distribution 𝒩(0, 1) under ℋ0 and 𝒩(𝜇, 1) under ℋ1 with a positive mean 𝜇 such that 𝑃𝐹𝐴 = 0.2 and 𝑃𝐷 = 0.9. Such an operating point seems worse than the existing conflict detection system [11], but it brings the benefit of decision fusion as the airspace becomes increasingly dense while each aircraft has more freedom in trajectory based operation. By changing the threshold above which ℋ1 is declared, the local processor can have different operating points over time on the receiver operating characteristic (ROC) curve (𝑃𝐹𝐴, 𝑃𝐷).

In the experimental study, we have 20 local processors, among which 6 are Byzantine sensors, corresponding to 𝜉 = 0.3. All the honest processors have identical performance in terms of the ROC curve. Assume that the cost of obtaining the decision from a local processor is 1 while the cost of resolving the conflict is 10 when the global decision is “ℋ1”. The fusion center has a budget constraint of 1.6 for each conflict detection event under the normal condition. Without Byzantine data, the fusion center obtains the efficient front of (𝐵, 𝑃𝐷) by connecting the breakpoints (0, 0), (2.4, 0.81) and (3, 0.9) when the local processors operate at 𝑃𝐹𝐴 = 0.2 and 𝑃𝐷 = 0.9. At budget 𝐵 = 1.6, the detection probability is 𝑃𝐷 = 0.81 · 1.6/2.4 = 0.54 when the fusion center randomly picks two local processors with probability 0.67 and declares ℋ1 upon receiving “ℋ1” from both processors. The expected cost under ℋ0 is (0.2² · 10 + 2) · 0.67 = 1.6, which meets the budget constraint. The detection probability is 𝑃𝐷 = 0.66 when the fusion center randomly selects a single local processor and declares ℋ1 upon receiving “ℋ1” from that processor, provided the local processor operates at 𝑃𝐹𝐴 = 0.06. In this case, the expected cost under ℋ0 is 0.06 · 10 + 1 = 1.6. Thus the detection probability can be improved when the honest local processors reduce their false alarm rate to meet the budget constraint at the fusion center.

We consider two attack schemes by the Byzantine sensors. The first scheme tries to minimize the relative entropy between the distributions under ℋ0 and ℋ1. The resulting attack distribution turns out to be hypothesis-reversed: when the truth is ℋ1, the Byzantine sensors generate output according to ℋ0, and vice versa [13]. Note that the intelligent adversary has to know the true hypothesis and also assumes that the honest processors operate at 𝑃𝐹𝐴 = 0.2 and 𝑃𝐷 = 0.9 when computing the optimal attack distribution. We call this attack static since the attack distribution is fixed. A more realistic situation is that the Byzantine sensors generate the attack distribution according to the estimated 𝑃𝐹𝐴 and 𝑃𝐷 of the honest processors selected in the fusion policy by the fusion center. In this case, the intelligent adversary does not know the true hypothesis, but applies the hypothesis-reversed attack distribution based on the local decisions of the honest processors. We call this attack dynamic since the attack distribution is time varying and depends on the updates of the honest processors' operating points and the resulting optimized decision fusion policy.
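As a quick check of the budget arithmetic stated above (not the authors' code), the following sketch reproduces the expected costs and the mixing weight for the two policies at budget 𝐵 = 1.6, assuming the honest operating point 𝑃𝐹𝐴 = 0.2, 𝑃𝐷 = 0.9, unit query cost and conflict resolution cost 10.

```python
p_fa, p_d = 0.2, 0.9
query_cost, resolve_cost, budget = 1.0, 10.0, 1.6

# Policy A: query two processors and declare H1 only if both report "H1",
# applied with probability w so that the expected cost meets the budget.
cost_two = 2 * query_cost + (p_fa ** 2) * resolve_cost  # expected cost under H0 = 2.4
w = budget / cost_two                                   # mixing weight ~= 0.67
pd_mixed = w * (p_d ** 2)                               # ~= 0.54 at budget 1.6

# Policy B: query a single processor operating at a reduced false alarm rate of 0.06.
p_fa_low = 0.06
cost_single = query_cost + p_fa_low * resolve_cost      # = 1.6, meets the budget exactly
print(cost_two, w, pd_mixed, cost_single)
```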
Fig. 1. False alarm probability of an honest local processor under Byzantine attacks

We generated the true hypothesis independently with 𝑃(ℋ0) = 0.9 and 𝑃(ℋ1) = 0.1. The honest local processor that has the largest probability of being selected by the fusion center at the final time 𝑡 = 2000 has the false alarm probabilities shown in Fig. 1. Clearly, by reducing the false alarm rate from 0.2 to 0.06, an honest processor can increase its chance of being selected by the fusion center. This is accomplished when all of the honest processors apply Algorithm 4 to update their operating points. Note that the false alarm rate decreases more slowly under the dynamic attack owing to the adaptive nature of the attack distributions generated by the Byzantine sensors. The actual detection probabilities of the optimized fusion policy are shown in Fig. 2 under static and dynamic Byzantine attacks. We can see that 𝑃𝐷 increases from around 0.5 to 0.66 in both cases, and 𝑃𝐷 is slightly larger under the dynamic attack owing to the reduced miss probability under ℋ1 of the hypothesis-reversed attack distribution. Clearly, the probability of selecting a Byzantine sensor becomes very small in both cases, so the resulting decision fusion performance is fairly close to 𝑃𝐷 = 0.66 under 𝐵 ≤ 1.6 without Byzantine data.

Fig. 2. Detection probability at the fusion center under Byzantine attacks

It is worth noting that an honest processor may have an incentive to operate at a false alarm rate higher than 0.06 in order to maximize 𝑃𝐷 − 𝑃𝐹𝐴, and then reduce the false alarm rate to meet the budget constraint once it is recognized as an honest processor by the fusion center. Such a strategy, rather than using conformal prediction, may improve the detection probability to 0.66 with fewer training examples when ℋ1 occurs frequently. However, the Byzantine sensors may respond accordingly to increase the expected operating cost at the fusion center with some modification of the dynamic attack strategy. Thus we are unable to offer a general picture of the best Byzantine mitigation strategy in the non-asymptotic regime, even with this simplified decision fusion example.

Next, we consider decision fusion under Byzantine attack where the performance of the local processors changes over time. Assume that the 20 local processors are randomly deployed in a 20 × 20 area. Each local processor moves in a random direction with a speed of 1 in each time step. If the distance between two local processors is smaller than 1, then a conflict (hypothesis ℋ1) should be declared. Each local processor is able to estimate its own position as well as the positions of the other local processors. The estimation error is
assumed to be Gaussian with zero mean and variance equal to the distance from the observer to the true location of the local processor being measured. An honest local processor estimates the pairwise distance between two local processors and compares it with a threshold to determine whether a conflict should be declared. We assume that there are 6 Byzantine sensors that engage in the dynamic attack using the hypothesis-reversed attack distribution. Fig. 3 shows one realization of the scenario at the initial time with two real conflicts (representing possible air collisions) among the local processors.

Fig. 3. One realization of the conflict detection scenario with 20 local processors, 6 of which are Byzantine sensors

Assume that the cost of obtaining the decision from a local processor is 0.1 while the cost of resolving the conflict is 10. The fusion center has a budget constraint of 1.6. Note that without the reputation of each local processor being maintained at the fusion center, the conflict between two Byzantine sensors cannot be detected by any decision fusion policy, since 𝜉 = 0.3 is large enough to destroy the consistency of the detection. Fig. 4 compares the detection probability at the fusion center under Byzantine attacks with that obtained when all local processors are honest. We can see that the performance degradation is quite small, thanks to the sequential update of each local processor's 𝑃𝐹𝐴 and 𝑃𝐷 at the fusion center.

Fig. 4. Comparison of conflict detection probability at the fusion center without and with Byzantine attack

It is worth noting that the performance gap will shrink further when the cost of obtaining the local decision decreases. In addition, even when the fusion center can perfectly identify all of the Byzantine sensors, the detection performance is still upper bounded by the 𝑃𝐷 curve without Byzantine attacks. Thus the identification of Byzantine sensors is less crucial than the estimation of every local processor's performance when one wants to maximize the detection probability in the presence of Byzantine data. This has practical implications when the fusion center has a limited budget to query only a small percentage of the local processors in each trial, while the honest local processor's best response is to operate at the maximum of 𝑃𝐷 − 𝑃𝐹𝐴 so that the fusion center has a large probability of querying its local decision.
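To illustrate the mobile scenario described above, the following hedged sketch simulates one time step: true pairwise conflicts from the geometry and an honest observer's noisy distance estimate, with the stated parameters (20 processors, 20 × 20 area, unit speed, conflict radius 1, error variance equal to range) and an arbitrary random seed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, area, conflict_radius = 20, 20.0, 1.0

pos = rng.uniform(0.0, area, size=(n, 2))        # initial positions
heading = rng.uniform(0.0, 2 * np.pi, size=n)    # random directions, speed 1 per step
pos += np.stack([np.cos(heading), np.sin(heading)], axis=1)

# True conflicts (hypothesis H1): pairs closer than the conflict radius.
dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
conflicts = [(i, j) for i in range(n) for j in range(i + 1, n)
             if dist[i, j] < conflict_radius]

# Honest observer k: each position estimate has zero-mean Gaussian error whose
# variance equals the distance from the observer to the measured processor.
k = 0
noisy = pos + rng.normal(size=(n, 2)) * np.sqrt(dist[k])[:, None]
est_dist_01 = np.linalg.norm(noisy[0] - noisy[1])  # estimated distance for pair (0, 1)
print(len(conflicts), est_dist_01)
```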
VI. CONCLUSION
We studied the problem of fusing local decision outputs into a global decision with a budget constraint in the presence of Byzantine data. Each local decision maker is assumed to provide a finite-valued output regarding two competing hypotheses. A fusion rule is characterized by a probabilistic mixing of decision trees corresponding to deterministic policies to reach a global decision. For problems where maximizing the detection probability is of primary concern, we optimize the fusion rule under the budget constraint so that the fusion center can maintain the expected operational cost in the long run. In addition, we assume that each local processor receives feedback from the fusion center sequentially in order to achieve the desired false alarm rate. A conformal prediction based procedure is proposed for the honest local processor to adapt to the globally optimal decision fusion policy in the presence of Byzantine sensors. The strategy is relatively simple to implement in the sequential setting. We show that, with the reputation of each local processor being maintained at the fusion center, the attacker can only reduce the detection probability vs. budget curve to the situation where the fusion center has complete knowledge of the compromised local processors. Illustrative examples representing the decision fusion problem for conflict detection in air traffic management were provided for policy analysis. We found that the detection performance degradation is fairly small in the repeated game when the percentage of Byzantine sensors is below a critical threshold.

ACKNOWLEDGMENT

This work is supported in part by NASA/LEQSF(2013-15)Phase3-06 through grant NNX13AD29A and Air Force STTR FA9453-13-M-0059.

REFERENCES

[1] M. Abdelhakim, L. Lightfoot, J. Ren, and T. Li, “Distributed detection in mobile access wireless sensor networks under Byzantine attacks,” IEEE Trans. Parallel and Distributed Systems, doi:10.1109/TPDS.2013.74, 2013.
[2] Y. Chen, J. Yang, W. Trappe, and R. Martin, “Detecting and localizing identity-based attacks in wireless and sensor networks,” IEEE Trans. Vehicular Technology, 59(5), pp. 2418–2434, 2010.
[3] H. Chen, V. P. Jilkov, and X. R. Li, “On optimizing decision fusion with a budget constraint,” 16th International Conference on Information Fusion, Istanbul, Turkey, 2013.
[4] R. Cole, M. Kochenderfer, R. Weibel, M. Edwards, D. Griffith, and W. Olson, “Fielding a sense and avoid capability for unmanned aircraft systems: policy, standards, technology, and safety modeling,” Air Traffic Control Quarterly, 21(1), pp. 5–27, 2013.
[5] B. Kailkhura, S. Brahma, Y. S. Han, and P. K. Varshney, “Distributed detection in tree topologies with Byzantines,” CoRR, vol. abs/1309.4513, 2013.
[6] B. Kailkhura, Y. S. Han, S. Brahma, and P. K. Varshney, “Distributed Bayesian detection with Byzantine data,” CoRR, vol. abs/1307.3544, 2013.
[7] S. M. Kay, Fundamentals of Statistical Signal Processing, Vol. II: Detection Theory, Prentice Hall, 1998.
[8] M. Kochenderfer, M. Edwards, L. Espindle, J. Kuchar, and J. Griffith, “Airspace encounter models for estimating collision risk,” AIAA Journal of Guidance, Control, and Dynamics, 33(2), pp. 487–499, 2010.
[9] M. Kochenderfer, J. Holland, and J. Chryssanthacopoulos, “Next generation airborne collision avoidance system,” Lincoln Laboratory Journal, 19(1), pp. 55–71, 2012.
[10] O. Kosut and L. Tong, “Distributed source coding in the presence of Byzantine sensors,” IEEE Trans. Information Theory, 54(6), pp. 2550–2565, 2008.
[11] J. Kuchar and L. Yang, “A review of conflict detection and resolution modeling methods,” IEEE Trans. Intelligent Transportation Systems, 1(4), pp. 179–189, 2000.
[12] R. Laxhammar and G. Falkman, “Online learning and sequential anomaly detection in trajectories,” IEEE Trans. Pattern Analysis and Machine Intelligence, doi:10.1109/TPAMI.2013.172, 2013.
[13] S. Marano, V. Matta, and L. Tong, “Distributed detection in the presence of Byzantine attacks,” IEEE Trans. Signal Processing, 57(1), pp. 16–29, 2009.
[14] A. Rawat, P. Anand, H. Chen, and P. Varshney, “Collaborative spectrum sensing in the presence of Byzantine attacks in cognitive radio networks,” IEEE Trans. Signal Processing, 59(2), pp. 774–786, 2011.
[15] P. K. Varshney, Distributed Detection and Data Fusion, New York: Springer-Verlag, 1997.
[16] A. Vempaty, L. Tong, and P. Varshney, “Distributed inference with Byzantine data: State-of-the-art review on data falsification attacks,” IEEE Signal Processing Magazine, 30(5), pp. 65–75, 2013.
[17] A. Vempaty, O. Ozdemir, K. Agrawal, H. Chen, and P. K. Varshney, “Localization in wireless sensor networks: Byzantines and mitigation techniques,” IEEE Trans. Signal Processing, 61(6), pp. 1495–1508, 2013.
[18] V. Vovk, A. Gammerman, and G. Shafer, Algorithmic Learning in a Random World, Springer, 2005.