12th International Conference on Information Fusion Seattle, WA, USA, July 6-9, 2009
Optimal Distributed Estimation Fusion with Compressed Data*
Zhansheng Duan and X. Rong Li
Department of Electrical Engineering, University of New Orleans, New Orleans, LA 70148, U.S.A.
{zduan,xli}@uno.edu

* Research supported in part by NSFC grant 60602026, Project 863 through grant 2006AA01Z126, ARO through grant W911NF-08-1-0409, and NAVO through Contract # N62306-09-P-3S01. Z. Duan is also with the College of Electronic and Information Engineering, Xi'an Jiaotong University.
Abstract – Considering communication constraints and the affordable computational resources at the fusion center (e.g., in sensor networks), it is more beneficial for local sensors to send in compressed data. In this paper, a linear local compression rule is first constructed based on the full rank decomposition of the measurement matrix at each local sensor. Then an optimal distributed estimation fusion algorithm with the compressed data is proposed. It has three nice properties: it is globally optimal in that it is equivalent to centralized fusion; it requires less communication from each sensor to the fusion center than existing centralized and distributed fusion algorithms; and it never uses the inverses of the corresponding error covariance matrices. Compression along time in the case of reduced-rate communication for some simpler cases and an extension to the singular measurement noise case are also discussed. Several counterexamples are provided to answer some potential questions.

Keywords: Estimation fusion, distributed fusion, centralized fusion, linear MMSE, weighted least squares, reduced-rate communication, singular measurement noise, full rank decomposition.
1 Introduction

Estimation fusion, or data fusion for estimation, is the problem of how to best utilize the useful information contained in multiple sets of data for the purpose of estimating an unknown quantity, a parameter or process (at a time) [1]. There are two basic estimation fusion architectures: centralized and decentralized/distributed (also referred to as measurement fusion and track fusion in target tracking, respectively), depending on whether the raw measurements are sent to the fusion center or not. In centralized fusion, all raw measurements are sent to the fusion center, while in distributed fusion, each sensor sends in only processed data. In terms of communication burden, there is an unresolved dispute about which of these two basic architectures is the better choice. For example, some argue that distributed fusion with local estimates transmitted should be preferred, since sending raw measurements is usually more demanding. This argument seems reasonable. Unfortunately, to obtain the cross-correlation of the estimation errors across
sensors, extra communication of the filtering gain, measurement matrix, etc., is also needed. It is then doubtful that the distributed architecture can still beat the centralized architecture communication-wise. But as will be shown in this paper, there does exist a distributed estimation fusion algorithm that beats centralized estimation fusion in terms of communication while having the same performance.

Distributed estimation fusion has been researched for several decades, and numerous results are available. Two classes of optimality criteria have been used most in the existing distributed estimation fusion algorithms. The first class [2, 3, 4, 5] tries to reconstruct the centralized fused estimate from the locally processed data (e.g., local estimates). That is, its optimality criterion is equivalence to centralized estimation fusion. The second class [6, 7, 8, 9, 10, 5] is optimal for the locally processed data without regard to the equivalence to centralized fusion. The first class is potentially better than the second, but it is also harder to obtain.

Considering communication constraints and the affordable computational resources at the fusion center (e.g., in sensor networks), it is more beneficial for local sensors to send in compressed data. There has been some discussion [11, 12, 13, 14, 4, 15] of compression in the estimation fusion literature, in the hope of reducing the communication from each local sensor to the fusion center. The existing schemes mainly differ in the following aspects: Where are the compression rules constructed, at the fusion center or at each sensor separately? Which of the optimality criteria above is used? Does construction of the compression rule at one sensor need information from another sensor? Some of the existing results are questionable in that construction of the compression rules at the fusion center needs not only local information, other than the compressed data, to be transmitted first, but also feedback of the information needed for compression to each local sensor. Taking all this into account, the initial motivation for compression seems violated, although the results are indeed optimal in some sense.

In this paper, a linear local compression rule is first constructed based on the full rank decomposition of the measurement matrix at each sensor. Then the optimal distributed fusion algorithm with the compressed data is proposed. It
has three nice properties. First, it is globally optimal in that it is equivalent to centralized fusion. Second, the communication requirement from each sensor to the fusion center is less than that of existing centralized and distributed fusion algorithms. Third, the inverses of the corresponding error covariance matrices are never used, so it can be applied in more general cases. Compression along time in the case of reduced-rate communication for some simpler cases and an extension to the singular measurement noise case are also discussed. Several counterexamples are provided to answer some potential questions.

The paper is organized as follows. Sec. 2 formulates the problem. Sec. 3 describes a local linear compression rule. Sec. 4 presents the distributed fusion algorithms with the compressed data. Sec. 5 analyzes their optimality. Sec. 6 discusses compression along time in the case of reduced-rate communication for some simpler cases. Sec. 7 discusses the extension to the singular measurement noise case. Sec. 8 provides several counterexamples to answer some potential questions. Sec. 9 gives concluding remarks.
2 Problem formulation
Process estimation: Consider the following generic dynamic system

$$ x_k = F_{k-1} x_{k-1} + G_{k-1} w_{k-1} \quad (1) $$

with zero-mean white noise $w_k$ with $\operatorname{cov}(w_k) = Q_k \ge 0$, and $x_k \in \mathbb{R}^n$, $E[x_0] = \bar{x}_0$, $\operatorname{cov}(x_0) = P_0$. Assume that altogether $N_s$ sensors are used to observe the state at the same time:

$$ z_k^{(i)} = H_k^{(i)} x_k + v_k^{(i)}, \quad i = 1, 2, \cdots, N_s \quad (2) $$

with zero-mean white noise $v_k^{(i)}$ with $\operatorname{cov}(v_k^{(i)}) = R_k^{(i)} > 0$ and $z_k^{(i)} \in \mathbb{R}^{m_i \times 1}$. $\langle w_k \rangle$, $\langle v_k^{(i)} \rangle$ and $x_0$ are uncorrelated with each other. It is also assumed that the measurement noises are uncorrelated across sensors.

Parameter estimation: It is assumed that the time-invariant parameter $x$ to be estimated satisfies (2).

Remark: The parameter $x$ of interest can be random or nonrandom. If $x$ is random, it is assumed that no prior about it is available. The case in which $x$ is random with a complete prior can be treated as a special type of process estimation [1], so it is not classified as parameter estimation in this paper.

In distributed fusion, the fusion center tries to get the best estimate of the state with the processed data received from each sensor. In this paper, by distributed estimation fusion we mean that only data-processed observations are available at the fusion center, not necessarily the local estimates from each sensor. Systems with only local estimates available at the fusion center, referred to as standard distributed estimation fusion in [1], are not the focus of this paper. Also, the optimality criterion used in this paper for distributed estimation fusion with compressed data is equivalence to centralized estimation fusion.

3 Sensor measurement compression

In distributed estimation fusion with compressed data, a mapping $g_k^{(i)}(\cdot)$ ($i = 1, 2, \cdots, N_s$) is applied to compress each local raw measurement first:

$$ \bar{z}_k^{(i)} = g_k^{(i)}(z_k^{(i)}) $$

where $\dim(\bar{z}_k^{(i)}) \le \dim(z_k^{(i)})$. The compressed data $\bar{z}_k^{(i)}$ is then sent to the fusion center for fusion. In this paper, only linear compression is considered:

$$ \bar{z}_k^{(i)} = T_k^{(i)} z_k^{(i)} $$

It is also required that the compression rule at each sensor be constructed locally based on local information, and that there be no feedback of information from the fusion center to the local sensors.

Given $\operatorname{rank}(H_k^{(i)}) = r_i$, $r_i \le \min(m_i, n)$, from full rank decomposition we have

$$ H_k^{(i)} = M_k^{(i)} N_k^{(i)} \quad (3) $$

where $M_k^{(i)} \in \mathbb{R}^{m_i \times r_i}$ has full column rank and $N_k^{(i)} \in \mathbb{R}^{r_i \times n}$ has full row rank. It was shown in [15] that the linear transformation

$$ \bar{T}_k^{(i)} = (H_k^{(i)})' (R_k^{(i)})^{-1} $$

is optimal in the sense that the distributed estimation fusion based on it is equivalent to the centralized fusion. Substituting $H_k^{(i)}$ of Eq. (3) into $\bar{T}_k^{(i)}$, we have

$$ \bar{T}_k^{(i)} = (N_k^{(i)})' (M_k^{(i)})' (R_k^{(i)})^{-1} $$

Since $N_k^{(i)}$ has full row rank, premultiplying both sides by $((N_k^{(i)})')^+$ (here $A^+$ stands for the unique Moore-Penrose pseudoinverse, MP inverse for short, of matrix $A$) yields the new transformation matrix

$$ \tilde{T}_k^{(i)} = (M_k^{(i)})' (R_k^{(i)})^{-1} $$

Furthermore, let

$$ T_k^{(i)} = ((M_k^{(i)})' (R_k^{(i)})^{-1} M_k^{(i)})^{-1/2} (M_k^{(i)})' (R_k^{(i)})^{-1} \quad (4) $$

Then from Eq. (2) it follows that

$$ \bar{z}_k^{(i)} = T_k^{(i)} z_k^{(i)} = T_k^{(i)} H_k^{(i)} x_k + T_k^{(i)} v_k^{(i)} = \bar{H}_k^{(i)} x_k + \bar{v}_k^{(i)} $$

where $\bar{H}_k^{(i)} = T_k^{(i)} H_k^{(i)}$ and $\bar{v}_k^{(i)} = T_k^{(i)} v_k^{(i)}$, with zero-mean white noise $\langle \bar{v}_k^{(i)} \rangle$ with

$$ \bar{R}_k^{(i)} = \operatorname{cov}(\bar{v}_k^{(i)}) = T_k^{(i)} R_k^{(i)} (T_k^{(i)})' = I_{r_i \times r_i} \quad (5) $$

uncorrelated with $\langle w_k \rangle$ and $x_0$ and uncorrelated across sensors, $\bar{z}_k^{(i)} \in \mathbb{R}^{r_i}$, and

$$ \bar{H}_k^{(i)} = ((M_k^{(i)})' (R_k^{(i)})^{-1} M_k^{(i)})^{1/2} N_k^{(i)} \in \mathbb{R}^{r_i \times n} \quad (6) $$

Similarly, for the parameter estimation case, we have

$$ \bar{z}_k^{(i)} = \bar{H}_k^{(i)} x + \bar{v}_k^{(i)}, \quad i = 1, 2, \cdots, N_s $$

Remark: The total communication requirement from each sensor to the fusion center at any time instant is $r_i \times 1 + r_i \times n = r_i \times (n + 1)$, where $r_i \times 1$ is for $\bar{z}_k^{(i)}$ and $r_i \times n$ is for $\bar{H}_k^{(i)}$. This beats existing centralized and distributed fusion since $r_i \le \min(m_i, n)$.

Remark: The introduction of full rank decomposition may be computationally costly at each sensor. For efficient ways to calculate full rank decompositions, see [16].
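For concreteness, the following Python sketch illustrates the compression rule of Eq. (4). It is an illustration of ours, not part of the paper's algorithm specification: the full rank decomposition is obtained via the SVD (one of many valid choices), and the function name `compression_rule` is hypothetical.

```python
import numpy as np

def compression_rule(H, R):
    """Build the local compression T of Eq. (4) from a full rank
    decomposition H = M N, obtained here via the SVD."""
    U, s, Vt = np.linalg.svd(H)
    r = int(np.sum(s > 1e-10 * s[0]))     # numerical rank r_i
    M = U[:, :r] * s[:r]                  # m_i x r_i, full column rank
    N = Vt[:r, :]                         # r_i x n, full row rank
    Rinv = np.linalg.inv(R)
    L = M.T @ Rinv @ M                    # M' R^{-1} M, r_i x r_i, > 0
    w, Q = np.linalg.eigh(L)              # L^{-1/2} via eigendecomposition
    T = (Q @ np.diag(w ** -0.5) @ Q.T) @ M.T @ Rinv   # Eq. (4)
    return T, T @ H                       # T and Hbar = T H

# Check on a random rank-deficient sensor: cov(T v) = I, as in Eq. (5).
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))  # rank 2
C = rng.standard_normal((4, 4))
R = C @ C.T + 0.1 * np.eye(4)                                  # R > 0
T, Hbar = compression_rule(H, R)
assert np.allclose(T @ R @ T.T, np.eye(T.shape[0]))
```

Any full rank decomposition works here; the SVD-based one above is simply numerically convenient.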
4 Distributed fusion with compressed data

Let

$$ z_k^d = [(\bar{z}_k^{(1)})', (\bar{z}_k^{(2)})', \cdots, (\bar{z}_k^{(N_s)})']' \quad (7) $$
$$ H_k^d = [(\bar{H}_k^{(1)})', (\bar{H}_k^{(2)})', \cdots, (\bar{H}_k^{(N_s)})']' $$
$$ v_k^d = [(\bar{v}_k^{(1)})', (\bar{v}_k^{(2)})', \cdots, (\bar{v}_k^{(N_s)})']' $$

Then

$$ R_k^d = \operatorname{cov}(v_k^d) = I_{r \times r}, \quad r = \sum_{i=1}^{N_s} r_i \quad (8) $$

and the stacked measurement equation at the fusion center with respect to (w.r.t.) all $N_s$ local sensors becomes

$$ z_k^d = H_k^d x_k + v_k^d $$

4.1 Process estimation

Assuming that the distributed fused estimate at time $k-1$ is $\hat{x}_{k-1|k-1}^d$ with the corresponding error covariance matrix $P_{k-1|k-1}^d$, then in the sense of LMMSE [1, 17], the optimal distributed fused estimate of the state at the fusion center at time $k$ can be recursively computed as (LMMSE Distributed Fusion):

$$ \hat{x}_{k|k-1}^d = F_{k-1} \hat{x}_{k-1|k-1}^d \quad (9) $$
$$ P_{k|k-1}^d = F_{k-1} P_{k-1|k-1}^d F_{k-1}' + G_{k-1} Q_{k-1} G_{k-1}' \quad (10) $$
$$ \hat{x}_{k|k}^d = \hat{x}_{k|k-1}^d + K_k^d (z_k^d - H_k^d \hat{x}_{k|k-1}^d) $$
$$ K_k^d = P_{k|k-1}^d (H_k^d)' (S_k^d)^{-1} $$
$$ P_{k|k}^d = P_{k|k-1}^d - P_{k|k-1}^d (H_k^d)' (S_k^d)^{-1} H_k^d P_{k|k-1}^d \quad (11) $$
$$ S_k^d = H_k^d P_{k|k-1}^d (H_k^d)' + R_k^d \quad (12) $$

4.2 Parameter estimation

The unique optimal weighted least-squares (WLS) fuser of the parameter $x$ at the fusion center having minimum norm is given by (WLS Distributed Fusion):

$$ \hat{x}_k^d = ((H_k^d)' (R_k^d)^{-1} H_k^d)^+ (H_k^d)' (R_k^d)^{-1} z_k^d $$
$$ P_k^d = ((H_k^d)' (R_k^d)^{-1} H_k^d)^+ $$

5 Optimality of distributed fusion with compressed data

Let

$$ z_k^c = [(z_k^{(1)})', (z_k^{(2)})', \cdots, (z_k^{(N_s)})']' $$
$$ H_k^c = [(H_k^{(1)})', (H_k^{(2)})', \cdots, (H_k^{(N_s)})']' $$
$$ v_k^c = [(v_k^{(1)})', (v_k^{(2)})', \cdots, (v_k^{(N_s)})']' $$

Then

$$ R_k^c = \operatorname{cov}(v_k^c) = \operatorname{diag}\{R_k^{(1)}, R_k^{(2)}, \cdots, R_k^{(N_s)}\} \quad (13) $$

and the stacked measurement equation at the fusion center w.r.t. all $N_s$ local sensors can be written as

$$ z_k^c = H_k^c x_k + v_k^c $$

Assuming that the centralized fused state estimate at time $k-1$ is $\hat{x}_{k-1|k-1}^c$ with the corresponding error covariance matrix $P_{k-1|k-1}^c$, the optimal LMMSE centralized fused estimate of the state at the fusion center at time $k$ can be recursively computed as (LMMSE Centralized Fusion):

$$ \hat{x}_{k|k-1}^c = F_{k-1} \hat{x}_{k-1|k-1}^c \quad (14) $$
$$ P_{k|k-1}^c = F_{k-1} P_{k-1|k-1}^c F_{k-1}' + G_{k-1} Q_{k-1} G_{k-1}' \quad (15) $$
$$ \hat{x}_{k|k}^c = \hat{x}_{k|k-1}^c + K_k^c (z_k^c - H_k^c \hat{x}_{k|k-1}^c) $$
$$ K_k^c = P_{k|k-1}^c (H_k^c)' (S_k^c)^{-1} $$
$$ P_{k|k}^c = P_{k|k-1}^c - P_{k|k-1}^c (H_k^c)' (S_k^c)^{-1} H_k^c P_{k|k-1}^c \quad (16) $$
$$ S_k^c = H_k^c P_{k|k-1}^c (H_k^c)' + R_k^c \quad (17) $$

For parameter estimation, the unique optimal WLS fuser at the fusion center having minimum norm is (WLS Centralized Fusion):

$$ \hat{x}_k^c = ((H_k^c)' (R_k^c)^{-1} H_k^c)^+ (H_k^c)' (R_k^c)^{-1} z_k^c $$
$$ P_k^c = ((H_k^c)' (R_k^c)^{-1} H_k^c)^+ $$

For the given dynamic system observed by multiple sensors, the following theorems hold.

Theorem 1. If $P_{k-1|k-1}^d = P_{k-1|k-1}^c$, then for the LMMSE distributed and centralized fusion, we have

$$ (H_k^d)' (S_k^d)^{-1} H_k^d = (H_k^c)' (S_k^c)^{-1} H_k^c $$

Proof: See the Appendix.

Theorem 2. If $\hat{x}_{k-1|k-1}^d = \hat{x}_{k-1|k-1}^c$ and $P_{k-1|k-1}^d = P_{k-1|k-1}^c$, then the LMMSE distributed fusion is globally optimal in that it is equivalent to the centralized fusion:

$$ \hat{x}_{k|k}^d = \hat{x}_{k|k}^c, \quad P_{k|k}^d = P_{k|k}^c $$

Proof: See the Appendix.

For the given multi-sensor parameter estimation fusion problem, the following theorems hold.

Theorem 3. For the WLS distributed and centralized fusion, we have

$$ (\bar{H}_k^{(i)})' (\bar{R}_k^{(i)})^{-1} \bar{H}_k^{(i)} = (H_k^{(i)})' (R_k^{(i)})^{-1} H_k^{(i)} $$
$$ (\bar{H}_k^{(i)})' (\bar{R}_k^{(i)})^{-1} \bar{z}_k^{(i)} = (H_k^{(i)})' (R_k^{(i)})^{-1} z_k^{(i)} $$

Proof: See the Appendix.

Theorem 4. The WLS distributed fusion is globally optimal in that it is equivalent to the centralized fusion:

$$ \hat{x}_k^c = \hat{x}_k^d, \quad P_k^c = P_k^d $$

Proof: See the Appendix.

In summary, there are three nice properties associated with the proposed distributed fusion algorithm:

• It is globally optimal in that it is equivalent to the optimal centralized fusion.

• The communication requirement from each sensor to the fusion center is just $r_i \times (n + 1)$, which beats the existing centralized and distributed fusion algorithms.

• The inverses of the corresponding error covariance matrices are never used, which makes it more general.

Remark: The computational burden can be further reduced by recursive processing of the compressed data at the fusion center; this is not discussed here due to space limitations. The interested reader is referred to [15] for details.

Remark: After the results of this paper had been worked out, we found that a similar idea given in the appendix of [4] to reduce the dimensionality of the raw measurements also uses full rank decomposition of the measurement matrix at each sensor. By comparison, the distributed estimation fusion algorithm of [4] is based on the information form of the Kalman filter, and its optimality (equivalence to the centralized fusion) was proved based on this form, which requires the existence of the inverses of the corresponding error covariance matrices, a requirement that cannot always be satisfied in more general cases. Also, one goal of this paper is to eliminate the transmission of the compressed measurement noise covariance matrix, as was made clear above, which is not the case for [4]; the results of this paper are therefore more applicable to process estimation fusion.
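Before moving on, the equivalence asserted by Theorems 1 and 2 is easy to see numerically. The sketch below is our illustration (all names are hypothetical); it runs one LMMSE cycle, Eqs. (9)-(12) and (14)-(17), once with the raw stacked data and once with the locally compressed data, reusing the `compression_rule` sketch from Sec. 3.

```python
import numpy as np

def lmmse_update(x, P, z, H, R, F, GQG):
    """One prediction + LMMSE update cycle, Eqs. (9)-(12)/(14)-(17).
    GQG stands for the term G Q G'."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + GQG
    S = H @ P_pred @ H.T + R                 # Eq. (12)/(17)
    K = P_pred @ H.T @ np.linalg.inv(S)
    return x_pred + K @ (z - H @ x_pred), P_pred - K @ S @ K.T

# Two sensors with rank-deficient H observing a 3-dim state (toy setup).
rng = np.random.default_rng(1)
n, F, GQG = 3, np.eye(3), 0.1 * np.eye(3)
x0, P0 = np.zeros(n), np.eye(n)
Hs = [rng.standard_normal((2, 1)) @ rng.standard_normal((1, n)) for _ in range(2)]
Rs = [np.eye(2) + np.ones((2, 2)) for _ in range(2)]
x_true = rng.standard_normal(n)
zs = [H @ x_true + rng.multivariate_normal(np.zeros(2), R) for H, R in zip(Hs, Rs)]

# Centralized fusion stacks raw data; distributed fusion stacks compressed data.
Hc, zc = np.vstack(Hs), np.concatenate(zs)
Rc = np.block([[Rs[0], np.zeros((2, 2))], [np.zeros((2, 2)), Rs[1]]])
TH = [compression_rule(H, R) for H, R in zip(Hs, Rs)]   # Sec. 3 sketch
Hd = np.vstack([Hb for _, Hb in TH])
zd = np.concatenate([T @ z for (T, _), z in zip(TH, zs)])
xc, Pc = lmmse_update(x0, P0, zc, Hc, Rc, F, GQG)
xd, Pd = lmmse_update(x0, P0, zd, Hd, np.eye(Hd.shape[0]), F, GQG)
assert np.allclose(xc, xd) and np.allclose(Pc, Pd)      # Theorem 2
```

Note that the compressed update here inverts a 2x2 innovation covariance instead of the 4x4 one of the centralized update, in line with the communication and computation savings claimed above.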
6 Compression along time

In the above, it is assumed that the local sensors communicate with the fusion center at full rate. However, in some applications, due to communication constraints (e.g., on communication bandwidth, power consumption, or both), it is more meaningful for the sensors to send in processed data at a reduced rate. In the following, it is assumed that every $N$ time instants the sensors send their compressed data to the fusion center, and the fusion center then does the corresponding fusion operation.

For process estimation, depending on whether the dynamic system is driven by process noise or not and whether the state transition matrix (STM) is invertible or not, the dynamic systems can be divided into four classes. Due to space limitations and complexity, only the two simpler classes are discussed here. The other two classes are left for future work.

6.1 With invertible STM and no process noise

In this case, for $j = 1, 2, \cdots, N-1$,

$$ x_{k+j} = \prod_{l=j}^{N-1} F_{k+l}^{-1} \, x_{k+N} $$
$$ z_{k+j}^{(i)} = H_{k+j}^{(i)} \prod_{l=j}^{N-1} F_{k+l}^{-1} \, x_{k+N} + v_{k+j}^{(i)} $$

Thus it follows that

$$ z_{k+N}^i = H_{k+N}^i x_{k+N} + v_{k+N}^i $$

where

$$ z_{k+N}^i = [(z_{k+N}^{(i)})', (z_{k+N-1}^{(i)})', \cdots, (z_{k+1}^{(i)})']' $$
$$ H_{k+N}^i = [(H_{k+N}^{(i)})', \cdots, (H_{k+1}^{(i)} \prod_{l=1}^{N-1} F_{k+l}^{-1})']' $$
$$ v_{k+N}^i = [(v_{k+N}^{(i)})', (v_{k+N-1}^{(i)})', \cdots, (v_{k+1}^{(i)})']' $$
$$ R_{k+N}^i = \operatorname{cov}(v_{k+N}^i) = \operatorname{diag}\{R_{k+N}^{(i)}, \cdots, R_{k+1}^{(i)}\} > 0 $$

Given the full rank decomposition

$$ H_{k+N}^i = \breve{M}_{k+N}^i \breve{N}_{k+N}^i $$

at sensor $i$, its local raw measurements from $k+1$ up to $k+N$ can be compressed optimally as

$$ \bar{z}_{k+N}^i = \bar{H}_{k+N}^i x_{k+N} + \bar{v}_{k+N}^i $$

where

$$ \bar{z}_{k+N}^i = T_{k+N}^i z_{k+N}^i, \quad \bar{H}_{k+N}^i = T_{k+N}^i H_{k+N}^i $$
$$ T_{k+N}^i = ((\breve{M}_{k+N}^i)' (R_{k+N}^i)^{-1} \breve{M}_{k+N}^i)^{-1/2} (\breve{M}_{k+N}^i)' (R_{k+N}^i)^{-1} $$
$$ \bar{R}_{k+N}^{(i)} = \operatorname{cov}(\bar{v}_{k+N}^i) = I_{d_i \times d_i}, \quad d_i = \operatorname{rank}(H_{k+N}^i) $$

Note that in this case, the prediction at the fusion center is $N$ steps ahead:

$$ \hat{x}_{k+N|k}^d = \prod_{l=1}^{N} F_{k+N-l} \, \hat{x}_{k|k}^d $$
$$ P_{k+N|k}^d = \left(\prod_{l=1}^{N} F_{k+N-l}\right) P_{k|k}^d \left(\prod_{l=1}^{N} F_{k+N-l}\right)' $$

It can then be updated by the locally compressed data $\bar{z}_{k+N}^i$, $i = 1, 2, \cdots, N_s$, to obtain $\hat{x}_{k+N|k+N}^d$ and $P_{k+N|k+N}^d$.
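The stacking step above is mechanical; the following sketch (an illustration of ours, with hypothetical names and argument layout) builds the stacked pair $(H_{k+N}^i, R_{k+N}^i)$ under the Sec. 6.1 assumptions of an invertible STM and no process noise.

```python
import numpy as np
from scipy.linalg import block_diag

def stack_along_time(Hs, Rs, Fs):
    """Stack z_{k+1},...,z_{k+N} against x_{k+N} as in Sec. 6.1.
    Hs = [H_{k+1},...,H_{k+N}], Rs likewise, Fs = [F_{k+1},...,F_{k+N-1}],
    with every F invertible and no process noise. Returns stacked (H, R)
    with block order z_{k+N}, z_{k+N-1}, ..., z_{k+1}."""
    N, n = len(Hs), Hs[0].shape[1]
    rows, Phi = [Hs[-1]], np.eye(n)            # z_{k+N} sees x_{k+N} directly
    for j in range(N - 1, 0, -1):              # j = N-1, ..., 1
        Phi = np.linalg.inv(Fs[j - 1]) @ Phi   # prod_{l=j}^{N-1} F_{k+l}^{-1}
        rows.append(Hs[j - 1] @ Phi)           # block for z_{k+j}
    return np.vstack(rows), block_diag(*Rs[::-1])
```

The stacked pair can then be compressed exactly as in Sec. 3 (e.g., with the `compression_rule` sketch), yielding $\bar{z}_{k+N}^i$ of dimension $d_i = \operatorname{rank}(H_{k+N}^i)$.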
6.2 With not necessarily invertible STM and no process noise

In this case, for $j = 1, 2, \cdots, N$,

$$ x_{k+j} = \prod_{l=1}^{j} F_{k+j-l} \, x_k $$
$$ z_{k+j}^{(i)} = H_{k+j}^{(i)} \prod_{l=1}^{j} F_{k+j-l} \, x_k + v_{k+j}^{(i)} $$

It follows that

$$ z_k^i = H_k^i x_k + v_k^i $$

where

$$ z_k^i = [(z_{k+1}^{(i)})', (z_{k+2}^{(i)})', \cdots, (z_{k+N}^{(i)})']' $$
$$ H_k^i = [(H_{k+1}^{(i)} F_k)', \cdots, (H_{k+N}^{(i)} \prod_{l=1}^{N} F_{k+N-l})']' $$
$$ v_k^i = [(v_{k+1}^{(i)})', (v_{k+2}^{(i)})', \cdots, (v_{k+N}^{(i)})']' $$
$$ R_k^i = \operatorname{cov}(v_k^i) = \operatorname{diag}\{R_{k+1}^{(i)}, \cdots, R_{k+N}^{(i)}\} > 0 $$

Given the full rank decomposition

$$ H_k^i = \breve{M}_k^i \breve{N}_k^i $$

at sensor $i$, its local raw measurements from $k+1$ up to $k+N$ can be compressed optimally as

$$ \bar{z}_k^i = \bar{H}_k^i x_k + \bar{v}_k^i $$

where

$$ \bar{z}_k^i = T_k^i z_k^i, \quad \bar{H}_k^i = T_k^i H_k^i $$
$$ T_k^i = ((\breve{M}_k^i)' (R_k^i)^{-1} \breve{M}_k^i)^{-1/2} (\breve{M}_k^i)' (R_k^i)^{-1} $$
$$ \bar{R}_k^{(i)} = \operatorname{cov}(\bar{v}_k^i) = I_{d_i \times d_i}, \quad d_i = \operatorname{rank}(H_k^i) $$

In this case, $\hat{x}_{k|k}^d$ and $P_{k|k}^d$ at the fusion center are first updated by the locally compressed data $\bar{z}_k^i$, $i = 1, 2, \cdots, N_s$, to obtain the smoothed estimate $\hat{x}_{k|k+N}^d$ and $P_{k|k+N}^d$, and then

$$ \hat{x}_{k+N|k+N}^d = \prod_{l=1}^{N} F_{k+N-l} \, \hat{x}_{k|k+N}^d $$
$$ P_{k+N|k+N}^d = \left(\prod_{l=1}^{N} F_{k+N-l}\right) P_{k|k+N}^d \left(\prod_{l=1}^{N} F_{k+N-l}\right)' $$

6.3 Parameter estimation

Since parameter estimation is a special case of process estimation with $F_k = I_{n \times n}$ and $w_k = 0$, compression along time for parameter estimation is relatively easier and can be done optimally in a way similar to what is done above for process estimation with no process noise, except that now the fusion rule is optimal in the sense of WLS.

7 Extension to the singular measurement noise case

In the above, it is assumed that $R_k^{(i)} > 0$, $i = 1, 2, \cdots, N_s$, which may limit the application of the proposed algorithm. We now extend it to the general case $R_k^{(i)} \ge 0$.

If $R_k^{(i)} \ge 0$ is singular, then $\operatorname{rank}(R_k^{(i)}) = a_i < m_i$. It follows from the singular value decomposition (SVD) that there must exist a unitary matrix $U_k^{(i)}$ such that

$$ U_k^{(i)} R_k^{(i)} (U_k^{(i)})' = \begin{bmatrix} \bar{R}_{k,1}^{(i)} & 0_{a_i \times b_i} \\ 0_{b_i \times a_i} & 0_{b_i \times b_i} \end{bmatrix} $$

where $b_i = m_i - a_i$ and $\bar{R}_{k,1}^{(i)} > 0$ is an $a_i \times a_i$ diagonal matrix. Let

$$ \bar{z}_k^{(i)} = U_k^{(i)} z_k^{(i)} $$

Then from Eq. (2), it follows that

$$ \bar{z}_k^{(i)} = U_k^{(i)} H_k^{(i)} x_k + U_k^{(i)} v_k^{(i)} = \bar{H}_k^{(i)} x_k + \bar{v}_k^{(i)} $$

where

$$ \bar{z}_k^{(i)} = [(\bar{z}_{k,1}^{(i)})', (\bar{z}_{k,2}^{(i)})']' $$
$$ \bar{H}_k^{(i)} = [(\bar{H}_{k,1}^{(i)})', (\bar{H}_{k,2}^{(i)})']' = U_k^{(i)} H_k^{(i)} $$
$$ \bar{v}_k^{(i)} = [(\bar{v}_{k,1}^{(i)})', (\bar{v}_{k,2}^{(i)})']' = U_k^{(i)} v_k^{(i)} $$
$$ \bar{z}_{k,1}^{(i)}, \bar{v}_{k,1}^{(i)} \in \mathbb{R}^{a_i \times 1}, \quad \bar{z}_{k,2}^{(i)}, \bar{v}_{k,2}^{(i)} \in \mathbb{R}^{b_i} $$
$$ \bar{H}_{k,1}^{(i)} \in \mathbb{R}^{a_i \times n}, \quad \bar{H}_{k,2}^{(i)} \in \mathbb{R}^{b_i \times n} $$
$$ \operatorname{cov}(\bar{v}_{k,1}^{(i)}) = \bar{R}_{k,1}^{(i)}, \quad \operatorname{cov}(\bar{v}_{k,1}^{(i)}, \bar{v}_{k,2}^{(i)}) = 0, \quad \bar{v}_{k,2}^{(i)} = 0 \ \text{a.s.} $$

Since $U_k^{(i)}$ is a unitary matrix, $\bar{z}_k^{(i)} = U_k^{(i)} z_k^{(i)}$ is optimal in that the LMMSE estimation based on $z_k^{(i)}$ is equivalent to that based on $\bar{z}_k^{(i)}$. That is, the original measurement equation (2) is equivalent to

$$ \bar{z}_{k,1}^{(i)} = \bar{H}_{k,1}^{(i)} x_k + \bar{v}_{k,1}^{(i)} $$
$$ \bar{z}_{k,2}^{(i)} = \bar{H}_{k,2}^{(i)} x_k $$

Here, $\bar{z}_{k,1}^{(i)} = \bar{H}_{k,1}^{(i)} x_k + \bar{v}_{k,1}^{(i)}$ can be compressed optimally into a new measurement with dimension $\operatorname{rank}(\bar{H}_{k,1}^{(i)})$ as before. The noise-free measurement $\bar{z}_{k,2}^{(i)} = \bar{H}_{k,2}^{(i)} x_k$ can also be compressed optimally, by the following theorem. One disadvantage is that in general we need to use the MP inverse instead of just the inverse, especially to handle the noise-free part.

Theorem 5. The noise-free measurement $\bar{z}_{k,2}^{(i)} = \bar{H}_{k,2}^{(i)} x_k$ can be compressed optimally into a new measurement with dimension $\operatorname{rank}(\bar{H}_{k,2}^{(i)})$ by simply selecting the linearly independent rows of $\bar{H}_{k,2}^{(i)}$.

Proof: See the Appendix.
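A sketch of the splitting step above (our illustration; `split_singular_noise` and the tolerance are hypothetical choices): since $R_k^{(i)}$ is symmetric, its eigendecomposition serves as the SVD here, with the rows reordered so the noisy block comes first.

```python
import numpy as np

def split_singular_noise(z, H, R, tol=1e-10):
    """Split z = Hx + v with singular R >= 0 into a noisy part
    (z1, H1, R1 > 0) and a noise-free part (z2, H2)."""
    w, U = np.linalg.eigh(R)            # R = U diag(w) U'
    order = np.argsort(w)[::-1]         # noisy components first
    w, U = w[order], U[:, order]
    a = int(np.sum(w > tol * max(w[0], 1.0)))   # a_i = rank(R)
    Ut = U.T                            # unitary; transforms z and H
    z_bar, H_bar = Ut @ z, Ut @ H
    return (z_bar[:a], H_bar[:a], np.diag(w[:a]),   # z1, H1, R1
            z_bar[a:], H_bar[a:])                   # z2, H2 (noise-free)
```

The noisy block can then be compressed as in Sec. 3, and the noise-free block by the row selection of Theorem 5.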
8 Counterexamples

As can be seen from the above, the final $\bar{H}_k^{(i)}$ has full row rank. In view of Theorem 5, one may think that what is done in this paper is trivial for a noisy measurement $z_k^{(i)}$ generated with $R_k^{(i)} > 0$, i.e., that $z_k^{(i)}$ can also be compressed optimally by simply selecting the linearly independent rows of $H_k^{(i)}$. As shown next, this is not necessarily the case.

Consider the following problem of random parameter estimation:

$$ z = Hx + v \quad (18) $$

where $x$ has a priori mean $\bar{x}$ and covariance $C_x$, $v$ has zero mean and covariance $R$, and

$$ C_x = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 3 & 6 & 10 \\ 1 & 4 & 10 & 20 \end{bmatrix}, \quad H = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 2 & 4 & 6 & 8 \end{bmatrix}, \quad R = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 3 & 6 \end{bmatrix} $$

The MSE matrix of the LMMSE estimate of $x$ with $z$ directly is

$$ P^c = \begin{bmatrix} 0.1455 & -0.0937 & -0.1449 & 0.0910 \\ -0.0937 & 0.1266 & 0.1140 & -0.1466 \\ -0.1449 & 0.1140 & 0.2043 & -0.1673 \\ 0.0910 & -0.1466 & -0.1673 & 0.2393 \end{bmatrix} $$

With $Tz$ it is

$$ P^d = \begin{bmatrix} 0.1967 & -0.0472 & -0.1331 & 0.0262 \\ -0.0472 & 0.1689 & 0.1247 & -0.2056 \\ -0.1331 & 0.1247 & 0.2070 & -0.1822 \\ 0.0262 & -0.2056 & -0.1822 & 0.3215 \end{bmatrix} $$

for

$$ T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} $$

which selects rows 1 and 2 of $H$, and

$$ P^d = \begin{bmatrix} 0.1497 & -0.0903 & -0.1452 & 0.0830 \\ -0.0903 & 0.1293 & 0.1138 & -0.1531 \\ -0.1452 & 0.1138 & 0.2044 & -0.1667 \\ 0.0830 & -0.1531 & -0.1667 & 0.2547 \end{bmatrix} $$

for

$$ T = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} $$

which selects rows 2 and 3 of $H$. An in-depth analysis shows that the difference between $P^d$ and $P^c$ is due to the correlation among the components of the measurement noise vector $v$. By selecting linearly independent rows of $H$, this correlation, which is in fact useful, is discarded completely. The MSE matrix of the LMMSE estimate of $x$ with $Tz$ is the same as that based on $z$ directly for the elementary transformation matrices

$$ T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & -0.5 \end{bmatrix} $$

Unfortunately, the third component of $Tz$ is then noise only and cannot be discarded, due to its correlation with the noise in the other two components of $Tz$. That is, we have no compression in this case at all.

It should be noted that the compression proposed above is not unique, for two reasons. One is that the full rank decomposition of $H_k^{(i)}$ is not unique, and all such decompositions give the same result. The other is that any transformation of the form

$$ T_k^{(i)} = (A_k^{(i)} (M_k^{(i)})' (R_k^{(i)})^{-1} M_k^{(i)} (A_k^{(i)})')^{-1/2} A_k^{(i)} (M_k^{(i)})' (R_k^{(i)})^{-1} $$

is optimal if $A_k^{(i)} \in \mathbb{R}^{r_i \times r_i}$ is an invertible matrix.

Note, however, that not every transformation $T_k^{(i)}$ that leads to a full row rank $\bar{H}_k^{(i)}$ and $\bar{R}_k^{(i)} = I_{r_i \times r_i}$ will give the same $P^d$ as $P^c$. For instance,

$$ T = (M'M)^{-1/2} M' R^{-1/2} $$

gives a full row rank $\bar{H}$ and $\bar{R} = I_{2 \times 2}$ for the example of Eq. (18), where $H = MN$ with

$$ M = \begin{bmatrix} 1 & 2 \\ 5 & 6 \\ 2 & 4 \end{bmatrix}, \quad N = \begin{bmatrix} 1 & 0 & -1 & -2 \\ 0 & 1 & 2 & 3 \end{bmatrix} $$

is one full rank decomposition of $H$, but the MSE matrix of the LMMSE estimate of $x$ with $Tz$ is

$$ P^d = \begin{bmatrix} 0.2006 & -0.0460 & -0.1395 & 0.0060 \\ -0.0460 & 0.1679 & 0.1187 & -0.2202 \\ -0.1395 & 0.1187 & 0.2049 & -0.1757 \\ 0.0060 & -0.2202 & -0.1757 & 0.3705 \end{bmatrix} $$

which is certainly different from $P^c$ based on $z$ directly.

One may also think of transforming the full row rank $\bar{H}_k^{(i)}$ into the reduced row echelon form $\breve{H}_k^{(i)}$ (i.e., the row canonical form) to save communication further. Unfortunately, the resultant $\breve{R}_k^{(i)}$ will no longer be an identity matrix. Due to the symmetry of $\breve{R}_k^{(i)}$, we can send in just its upper or lower triangular part, but what is saved in transmitting $\breve{H}_k^{(i)}$ is offset exactly by the cost of transmitting $\breve{R}_k^{(i)}$. So in general, even if we reduce $\bar{H}_k^{(i)}$ to its reduced row echelon form, the communication is still $r_i \times (n + 1)$.
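These numbers can be re-derived in a few lines. The sketch below is our illustration (names are hypothetical), using the standard LMMSE MSE formula $P = C_x - C_x H'(H C_x H' + R)^{-1} H C_x$:

```python
import numpy as np

def lmmse_mse(Cx, H, R):
    """MSE matrix of the LMMSE estimate of x from z = Hx + v."""
    S = H @ Cx @ H.T + R
    return Cx - Cx @ H.T @ np.linalg.solve(S, H @ Cx)

Cx = np.array([[1, 1, 1, 1], [1, 2, 3, 4], [1, 3, 6, 10], [1, 4, 10, 20]], float)
H = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [2, 4, 6, 8]], float)
R = np.array([[1, 1, 1], [1, 2, 3], [1, 3, 6]], float)

Pc = lmmse_mse(Cx, H, R)                       # P^c, with z directly
T12 = np.array([[1, 0, 0], [0, 1, 0]], float)  # keep rows 1 and 2
Pd = lmmse_mse(Cx, T12 @ H, T12 @ R @ T12.T)   # P^d, with Tz
print(np.round(Pc, 4), np.round(Pd, 4), sep="\n")
```

Swapping `T12` for the other transformations listed above should reproduce the remaining $P^d$ matrices.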
9 Conclusions

In fusion applications, due to constraints on communication (e.g., on communication bandwidth or power consumption) and on computational resources at the fusion center, it is more beneficial for the local sensors to send in compressed data. In this paper, a linear local compression rule is first constructed based on the full rank decomposition of the measurement matrix at each sensor, and then the optimal distributed estimation fusion algorithm with the compressed data is proposed. Its three nice properties make it attractive. First, it is globally optimal in that it is equivalent to the centralized fusion. Second, the communication requirement from each sensor to the fusion center is less than that of existing centralized and distributed fusion algorithms. Third, the inverses of the corresponding error covariance matrices are never used, so it can be applied in more general cases. Compression along time in the case of reduced-rate communication for some simpler cases and an extension to the singular measurement noise case are also discussed. Several counterexamples are provided to answer some potential questions.

In our work, the compressed dimension for each sensor is the rank of the measurement matrix. Whether this is the minimal compressible dimension for the formulated problem is a topic for future work.
Appendix

A. Proof of Theorem 1

Since $P_{k-1|k-1}^d = P_{k-1|k-1}^c$, from Eqs. (10) and (15) it follows that $P_{k|k-1}^d = P_{k|k-1}^c$. Let

$$ T_k = \operatorname{diag}\{T_k^{(1)}, T_k^{(2)}, \cdots, T_k^{(N_s)}\} $$

Then it follows from Eqs. (8), (5), (13), (7) and (4) that

$$ R_k^d = I_{r \times r} = T_k R_k^c T_k', \quad H_k^d = T_k H_k^c $$

Thus

$$ (H_k^d)' (S_k^d)^{-1} H_k^d = (H_k^c)' T_k' (T_k H_k^c P_{k|k-1}^d (H_k^c)' T_k' + T_k R_k^c T_k')^{-1} T_k H_k^c = (H_k^c)' T_k' (T_k S_k^c T_k')^{-1} T_k H_k^c $$

Furthermore, let

$$ M_k = \operatorname{diag}\{M_k^{(1)}, M_k^{(2)}, \cdots, M_k^{(N_s)}\}, \quad L_k = M_k' (R_k^c)^{-1} M_k $$

Then $T_k = L_k^{-1/2} M_k' (R_k^c)^{-1}$ and

$$ (H_k^d)' (S_k^d)^{-1} H_k^d = (H_k^c)' (R_k^c)^{-1} M_k L_k^{-1/2} (L_k^{-1/2} M_k' (R_k^c)^{-1} S_k^c (R_k^c)^{-1} M_k L_k^{-1/2})^{-1} L_k^{-1/2} M_k' (R_k^c)^{-1} H_k^c $$
$$ = (H_k^c)' (R_k^c)^{-1} M_k (M_k' (R_k^c)^{-1} S_k^c (R_k^c)^{-1} M_k)^{-1} M_k' (R_k^c)^{-1} H_k^c $$

It can be easily seen that

$$ (H_k^c)' (R_k^c)^{-1} M_k = (H_k^c)' (S_k^c)^{-1} S_k^c (R_k^c)^{-1} M_k $$

Also, from Eq. (17) and the matrix inversion lemma, we have

$$ (S_k^c)^{-1} = (R_k^c)^{-1} - (R_k^c)^{-1} H_k^c U_k^{-1} P_{k|k-1}^d (H_k^c)' (R_k^c)^{-1} $$

where

$$ U_k = P_{k|k-1}^d (H_k^c)' (R_k^c)^{-1} H_k^c + I $$

Thus

$$ (H_k^c)' (S_k^c)^{-1} = (I - (H_k^c)' (R_k^c)^{-1} H_k^c U_k^{-1} P_{k|k-1}^d) (H_k^c)' (R_k^c)^{-1} $$
$$ (H_k^c)' (R_k^c)^{-1} M_k = (I - (H_k^c)' (R_k^c)^{-1} H_k^c U_k^{-1} P_{k|k-1}^d) (H_k^c)' (R_k^c)^{-1} S_k^c (R_k^c)^{-1} M_k $$

Note that $(H_k^c)' = \breve{N}_k' M_k'$, where $\breve{N}_k = [(N_k^{(1)})', (N_k^{(2)})', \cdots, (N_k^{(N_s)})']'$. Then

$$ (H_k^c)' (R_k^c)^{-1} M_k = (I - (H_k^c)' (R_k^c)^{-1} H_k^c U_k^{-1} P_{k|k-1}^d) \breve{N}_k' M_k' (R_k^c)^{-1} S_k^c (R_k^c)^{-1} M_k $$

Taking the transpose on both sides, we have

$$ M_k' (R_k^c)^{-1} H_k^c = M_k' (R_k^c)^{-1} S_k^c (R_k^c)^{-1} M_k \breve{N}_k (I - P_{k|k-1}^d V_k^{-1} (H_k^c)' (R_k^c)^{-1} H_k^c) $$

where

$$ V_k = (H_k^c)' (R_k^c)^{-1} H_k^c P_{k|k-1}^d + I $$

Then

$$ (H_k^d)' (S_k^d)^{-1} H_k^d = (I - (H_k^c)' (R_k^c)^{-1} H_k^c U_k^{-1} P_{k|k-1}^d) \breve{N}_k' M_k' (R_k^c)^{-1} S_k^c (R_k^c)^{-1} M_k \cdot (M_k' (R_k^c)^{-1} S_k^c (R_k^c)^{-1} M_k)^{-1} \cdot M_k' (R_k^c)^{-1} S_k^c (R_k^c)^{-1} M_k \breve{N}_k (I - P_{k|k-1}^d V_k^{-1} (H_k^c)' (R_k^c)^{-1} H_k^c) $$
$$ = (I - (H_k^c)' (R_k^c)^{-1} H_k^c U_k^{-1} P_{k|k-1}^d) (H_k^c)' (R_k^c)^{-1} \cdot S_k^c \cdot (R_k^c)^{-1} H_k^c (I - P_{k|k-1}^d V_k^{-1} (H_k^c)' (R_k^c)^{-1} H_k^c) $$
$$ = (H_k^c)' (S_k^c)^{-1} S_k^c (S_k^c)^{-1} H_k^c = (H_k^c)' (S_k^c)^{-1} H_k^c $$

This completes the proof.

B. Proof of Theorem 2

Since $\hat{x}_{k-1|k-1}^d = \hat{x}_{k-1|k-1}^c$, it follows from Eqs. (9) and (14) that $\hat{x}_{k|k-1}^d = \hat{x}_{k|k-1}^c$. Also, since $P_{k-1|k-1}^d = P_{k-1|k-1}^c$, it follows from Eqs. (10) and (15) that $P_{k|k-1}^d = P_{k|k-1}^c$, and from Theorem 1 and Eqs. (11) and (16) that $P_{k|k}^d = P_{k|k}^c$. By the almost sure uniqueness of LMMSE estimators (two LMMSE estimators of the same estimand, i.e., the quantity to be estimated, using the same set of data are almost surely identical if and only if their MSE matrices are equal [18]), it follows that $\hat{x}_{k|k}^d = \hat{x}_{k|k}^c$. This completes the proof.

C. Proof of Theorem 3

From Eqs. (5) and (6), it follows that

$$ (\bar{H}_k^{(i)})' (\bar{R}_k^{(i)})^{-1} \bar{H}_k^{(i)} = (\bar{H}_k^{(i)})' \bar{H}_k^{(i)} = (N_k^{(i)})' ((M_k^{(i)})' (R_k^{(i)})^{-1} M_k^{(i)})^{1/2} ((M_k^{(i)})' (R_k^{(i)})^{-1} M_k^{(i)})^{1/2} N_k^{(i)} $$
$$ = (N_k^{(i)})' (M_k^{(i)})' (R_k^{(i)})^{-1} M_k^{(i)} N_k^{(i)} = (H_k^{(i)})' (R_k^{(i)})^{-1} H_k^{(i)} $$

and

$$ (\bar{H}_k^{(i)})' (\bar{R}_k^{(i)})^{-1} \bar{z}_k^{(i)} = (\bar{H}_k^{(i)})' \bar{z}_k^{(i)} = (N_k^{(i)})' ((M_k^{(i)})' (R_k^{(i)})^{-1} M_k^{(i)})^{1/2} ((M_k^{(i)})' (R_k^{(i)})^{-1} M_k^{(i)})^{-1/2} (M_k^{(i)})' (R_k^{(i)})^{-1} z_k^{(i)} $$
$$ = (N_k^{(i)})' (M_k^{(i)})' (R_k^{(i)})^{-1} z_k^{(i)} = (H_k^{(i)})' (R_k^{(i)})^{-1} z_k^{(i)} $$
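As a numerical companion to the proof above (our illustration, reusing the hypothetical `compression_rule` sketch from Sec. 3), the identities of Theorem 3 and the resulting Theorem 4 equivalence can be checked directly:

```python
import numpy as np

H = np.array([[1., 2., 3., 4.], [5., 6., 7., 8.], [2., 4., 6., 8.]])
R = np.array([[1., 1., 1.], [1., 2., 3.], [1., 3., 6.]])
z = np.array([1., 2., 3.])                     # any measurement vector
T, Hbar = compression_rule(H, R)               # Sec. 3 sketch; Rbar = I
zbar = T @ z
Rinv = np.linalg.inv(R)
# Theorem 3: information matrix and information state are preserved.
assert np.allclose(Hbar.T @ Hbar, H.T @ Rinv @ H)
assert np.allclose(Hbar.T @ zbar, H.T @ Rinv @ z)
# Theorem 4: identical minimum-norm WLS fusers (MP pseudoinverse).
xc = np.linalg.pinv(H.T @ Rinv @ H) @ H.T @ Rinv @ z
xd = np.linalg.pinv(Hbar.T @ Hbar) @ Hbar.T @ zbar
assert np.allclose(xc, xd)
```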
D. Proof of Theorem 4

Since $R_k^c$ and $R_k^d$ are both block diagonal matrices, it follows that

$$ (H_k^c)' (R_k^c)^{-1} H_k^c = \sum_{i=1}^{N_s} (H_k^{(i)})' (R_k^{(i)})^{-1} H_k^{(i)}, \quad (H_k^d)' (R_k^d)^{-1} H_k^d = \sum_{i=1}^{N_s} (\bar{H}_k^{(i)})' (\bar{R}_k^{(i)})^{-1} \bar{H}_k^{(i)} $$
$$ (H_k^c)' (R_k^c)^{-1} z_k^c = \sum_{i=1}^{N_s} (H_k^{(i)})' (R_k^{(i)})^{-1} z_k^{(i)}, \quad (H_k^d)' (R_k^d)^{-1} z_k^d = \sum_{i=1}^{N_s} (\bar{H}_k^{(i)})' (\bar{R}_k^{(i)})^{-1} \bar{z}_k^{(i)} $$

Furthermore, it follows from Theorem 3 that

$$ (H_k^c)' (R_k^c)^{-1} H_k^c = (H_k^d)' (R_k^d)^{-1} H_k^d, \quad (H_k^c)' (R_k^c)^{-1} z_k^c = (H_k^d)' (R_k^d)^{-1} z_k^d $$

Thus

$$ P_k^c = ((H_k^c)' (R_k^c)^{-1} H_k^c)^+ = ((H_k^d)' (R_k^d)^{-1} H_k^d)^+ = P_k^d $$
$$ \hat{x}_k^c = ((H_k^c)' (R_k^c)^{-1} H_k^c)^+ (H_k^c)' (R_k^c)^{-1} z_k^c = ((H_k^d)' (R_k^d)^{-1} H_k^d)^+ (H_k^d)' (R_k^d)^{-1} z_k^d = \hat{x}_k^d $$

E. Proof of Theorem 5

Premultiply $\bar{H}_{k,2}^{(i)}$ by elementary row transformation matrices so that the only difference between $\bar{H}_{k,2}^{(i)}$ and the final transformed measurement matrix is that the linearly dependent rows of $\bar{H}_{k,2}^{(i)}$ are replaced by zero row vectors. Since elementary row transformation matrices are invertible, the LMMSE estimation based on $\bar{z}_{k,2}^{(i)}$ must be equivalent to the LMMSE estimation based on the newly transformed measurement. In the final transformed measurement matrix, all the linearly dependent rows of $\bar{H}_{k,2}^{(i)}$ are replaced by zero row vectors; that is, we have selected the linearly independent rows of $\bar{H}_{k,2}^{(i)}$.

References

[1] X. R. Li, Y. M. Zhu, J. Wang, and C. Z. Han, "Optimal linear estimation fusion - Part I: Unified fusion rules," IEEE Transactions on Information Theory, vol. 49, no. 9, pp. 2192-2208, September 2003.
[2] C. Y. Chong, "Hierarchical estimation," in Proceedings of the MIT/ONR Workshop on C3, Monterey, CA, 1979.
[3] H. R. Hashemipour, S. Roy, and A. J. Laub, "Decentralized structures for parallel Kalman filtering," IEEE Transactions on Automatic Control, vol. 33, no. 1, pp. 88-94, January 1988.
[4] E. B. Song, Y. M. Zhu, J. Zhou, and Z. S. You, "Optimal Kalman filtering fusion with cross-correlated sensor noises," Automatica, vol. 43, no. 8, pp. 1450-1456, August 2007.
[5] Z. S. Duan and X. R. Li, "The optimality of a class of distributed estimation fusion algorithm," in Proceedings of the 11th International Conference on Information Fusion, Cologne, Germany, 2008, pp. 1-6.
[6] Y. Bar-Shalom and L. Campo, "The effect of the common process noise on the two-sensor fused-track covariance," IEEE Transactions on Aerospace and Electronic Systems, vol. 22, no. 6, pp. 803-805, November 1986.
[7] K. H. Kim, "Development of track to track fusion algorithms," in Proceedings of the 1994 American Control Conference, Baltimore, MD, June 1994, pp. 1037-1041.
[8] K. C. Chang, R. K. Saha, and Y. Bar-Shalom, "On optimal track-to-track fusion," IEEE Transactions on Aerospace and Electronic Systems, vol. 33, no. 4, pp. 1271-1276, October 1997.
[9] H. M. Chen, T. Kirubarajan, and Y. Bar-Shalom, "Performance limits of track-to-track fusion versus centralized estimation: theory and application," IEEE Transactions on Aerospace and Electronic Systems, vol. 39, no. 2, pp. 386-400, April 2003.
[10] K. C. Chang, Z. Tian, and S. Mori, "Performance evaluation for MAP state estimate fusion," IEEE Transactions on Aerospace and Electronic Systems, vol. 40, no. 2, pp. 706-714, April 2004.
[11] K. S. Zhang, X. R. Li, P. Zhang, and H. F. Li, "Optimal linear estimation fusion - Part VI: Sensor data compression," in Proceedings of the 6th International Conference on Information Fusion, Cairns, Queensland, Australia, July 2003, pp. 221-228.
[12] Y. M. Zhu, E. B. Song, J. Zhou, and Z. S. You, "Optimal dimensionality reduction of sensor data in multisensor estimation fusion," IEEE Transactions on Signal Processing, vol. 53, no. 5, pp. 1631-1639, May 2005.
[13] E. B. Song, Y. M. Zhu, and J. Zhou, "Sensors' optimal dimensionality compression matrix in estimation fusion," Automatica, vol. 41, no. 12, pp. 2131-2139, December 2005.
[14] I. D. Schizas, G. B. Giannakis, and Z. Q. Luo, "Distributed estimation using reduced-dimensionality sensor observations," IEEE Transactions on Signal Processing, vol. 55, no. 8, pp. 4284-4299, August 2007.
[15] Z. S. Duan and X. R. Li, "Optimal distributed estimation fusion with transformed data," in Proceedings of the 11th International Conference on Information Fusion, Cologne, Germany, 2008, pp. 1291-1297.
[16] R. Piziak and P. L. Odell, "Full rank factorization of matrices," Mathematics Magazine, vol. 72, no. 3, pp. 193-201, June 1999.
[17] X. R. Li, "Recursibility and optimal linear estimation and filtering," in Proceedings of the 43rd IEEE Conference on Decision and Control, Atlantis, Paradise Island, Bahamas, December 14-17, 2004, pp. 1761-1766.
[18] X. R. Li and K. S. Zhang, "Optimal linear estimation fusion - Part IV: Optimality and efficiency of distributed fusion," in Proceedings of the 4th International Conference on Information Fusion, Montreal, QC, Canada, August 2001, pp. WeB1-19-WeB1-26.