LIDS-P-955

November 1979

COMBINING AND UPDATING OF LOCAL ESTIMATES AND REGIONAL MAPS ALONG SETS OF ONE-DIMENSIONAL TRACKS

Alan S. Willsky*   Martin Bello**   David A. Castanon†   Bernard C. Levy†   George Verghese††

ABSTRACT

In this paper we consider the problem of combining and updating estimates that may have been generated in a distributed fashion or may represent estimates, generated at different times, of the same process sample path. The first of these cases has applications in decentralized estimation, while the second has applications in updating maps of spatially distributed random quantities given measurements along several tracks. The method of solution for the second problem uses the result of the first, and the similarity in the formulation and solution of these problems emphasizes the conceptual similarity between many problems in decentralized control and in the analysis of random fields.

*Laboratory for Information and Decision Systems and the Department of Electrical Engineering and Computer Science, M.I.T., Cambridge, MA 02139. The work of this author was performed in part at The Analytic Sciences Corporation, Reading, MA, and in part at M.I.T. with partial support provided by the Air Force Office of Scientific Research under Grant AFOSR-77-3281B.

**Laboratory for Information and Decision Systems and the Department of Electrical Engineering and Computer Science, M.I.T., Cambridge, MA 02139. The work of this author was performed in part at The Analytic Sciences Corporation, Reading, MA, and in part at The M.I.T. Lincoln Laboratory.

†Laboratory for Information and Decision Systems and the Department of Electrical Engineering and Computer Science, M.I.T., Cambridge, MA 02139. The work of these authors was performed at M.I.T. with support provided by the Office of Naval Research under Contract ONR/N00014-76-C-0346.

††Electric Power Systems Engineering Laboratory and Department of Electrical Engineering and Computer Science, M.I.T., Cambridge, MA 02139.

I. INTRODUCTION

The research reported in this paper was motivated by the following problem in the mapping of two-dimensional random fields, that is, spatially distributed random quantities. Measurements or surveys of the field are collected at different times along sets of one-dimensional tracks across the field. The sets of tracks may differ from survey to survey. Either a local or regional map is generated for each of these surveys, and the problem is either to combine these local maps optimally or to update an overall map as each new survey becomes available. Problems of this type arise in many applications, including the mapping of vertical temperature profiles of the atmosphere given data provided by satellites [1] and the mapping of anomalies in the earth's gravitational field and the effects of such anomalies on errors in inertial navigation systems [2,14].

The problem posed in the preceding paragraph is not solved completely in this paper, but a special case of it is: the case in which the tracks are all parallel and the field along the direction of the tracks can be modeled by a finite-dimensional linear shaping filter driven by white noise. In addition to solving this special case and providing insight into the general case, the solution we obtain is of independent interest in that it provides a procedure for optimally updating smoothed estimates as more data is collected. Furthermore, one of the principal steps in our development is the construction of the optimal combined filtered (i.e., causal) estimate from several local filtered estimates. This is basically a problem in decentralized filtering, and our results extend those of Speyer [3] and Chong [7].

In the next section we present and discuss the solution to the problem of combining decentralized filtered estimates, while Section III contains the description and solution of the problem of updating smoothed estimates. In Section IV we apply the results of the preceding section to the problem of real-time smoothing, that is, of estimation given a previous smoothed estimate and new real-time data. The paper concludes with a discussion in Section V.

II. COMBINING DECENTRALIZED FILTERED ESTIMATES

2.1 Formulation and Solution of the General Case

Consider a linear dynamical system driven by Gaussian white noise

\dot{x}(t) = A(t)x(t) + w(t)    (2.1)

E[w(t)w(\tau)'] = Q(t)\delta(t-\tau)    (2.2)

where w(t) is independent of x(0), which is taken to be a zero-mean Gaussian random variable with covariance \Pi(0). Suppose we have two sets of white-noise-corrupted observations

y_1(t) = C_1(t)x(t) + v_1(t)    (2.3)

y_2(t) = C_2(t)x(t) + v_2(t)    (2.4)

where v_1 and v_2 are independent of each other and of w and x(0), with

E[v_i(t)v_i(\tau)'] = R_i(t)\delta(t-\tau),    i=1,2    (2.5)
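To make the formulation concrete, the following sketch simulates a sampled version of the global model (2.1)-(2.5). It is a minimal illustration under assumed, time-invariant matrices of our own choosing; the Euler-Maruyama step and the R_i/dt scaling of the sampled measurement noise are standard discretization conventions, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, T = 0.01, 10.0

# Illustrative time-invariant matrices (assumptions, not from the paper):
A  = np.array([[-1.0, 0.0], [0.5, -2.0]])        # dynamics, eq. (2.1)
Q  = np.diag([0.3, 0.4])                         # process noise intensity, eq. (2.2)
C1 = np.array([[1.0, 0.0]])                      # sensor set 1, eq. (2.3)
C2 = np.array([[0.0, 1.0]])                      # sensor set 2, eq. (2.4)
R1 = np.array([[0.1]])                           # noise intensities, eq. (2.5)
R2 = np.array([[0.2]])
Pi0 = np.eye(2)                                  # covariance of x(0)

x = rng.multivariate_normal(np.zeros(2), Pi0)    # x(0) ~ N(0, Pi(0))
y1, y2 = [], []
for _ in range(int(T / dt)):
    # Euler-Maruyama step: the increment of w over dt has covariance Q*dt.
    x = x + dt * (A @ x) + rng.multivariate_normal(np.zeros(2), Q * dt)
    # Sampled continuous-time white measurement noise has covariance R_i/dt.
    y1.append(C1 @ x + rng.multivariate_normal(np.zeros(1), R1 / dt))
    y2.append(C2 @ x + rng.multivariate_normal(np.zeros(1), R2 / dt))
```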

The two sets of measurements (2.3),(2.4) can be thought of as representing observations taken at different locations in a distributed system or at different nodes in a network of interconnected systems. These observations are processed separately to produce local filtered estimates, and we wish to consider the problem of recovering the overall optimal filtered estimate

\hat{x}(t|t) = E\{x(t) \mid y_1(s), y_2(s), s \le t\}    (2.6)

in terms of these local estimates. If this can be done, then much of the raw data processing can be done locally without any loss in global performance. In addition, if local filtering is performed on the data, we may reduce the required bandwidth for transmission of information to a centralized processor. A problem of this type was considered by Speyer [3] in the context of decentralized control. Our work represents an extension of the estimation portion of his results. Also, while we consider only two sets of measurements (2.3),(2.4), the preceding formulation and our analysis of it extend in an obvious manner to the case of N sets of measurements and local estimates.

In order to complete the formulation of the problem, we assume that the local processing algorithms are Kalman filters based on different models:

\dot{x}_i(t) = A_i(t)x_i(t) + w_i(t),    i=1,2    (2.7)

E[w_i(t)w_i(\tau)'] = Q_i(t)\delta(t-\tau),    i=1,2    (2.8)

y_i(t) = H_i(t)x_i(t) + v_i(t),    i=1,2    (2.9)

where x_i(0) is taken to be zero-mean with covariance \Pi_i(0). It is important to emphasize here that (2.7)-(2.9) represents a model whose sole purpose is the design of local Kalman filters. This model may not accurately reflect the actual statistics of y_i. At the moment we are assuming no relationship between the local model (2.7)-(2.9) and the correct global model (2.1)-(2.5), except for the assumption that the v_i in (2.9) are the same as in (2.5) (i.e., that they have the same statistics, so at least the measurement noise is modeled in the same fashion locally and globally). As we need to impose some relationship between local and global models, we will do so.

Given these local models, the equation for each local processor is given by

\dot{\hat{x}}_i(t|t) = [A_i - P_iH_i'R_i^{-1}H_i]\hat{x}_i(t|t) + P_iH_i'R_i^{-1}y_i(t)    (2.10)

The covariance P_i can be precomputed from either of the following equations:

\dot{P}_i = A_iP_i + P_iA_i' + Q_i - P_iH_i'R_i^{-1}H_iP_i    (2.11)

\frac{d}{dt}(P_i^{-1}) = -P_i^{-1}A_i - A_i'P_i^{-1} - P_i^{-1}Q_iP_i^{-1} + H_i'R_i^{-1}H_i    (2.12)

with the initial condition

P_i(0) = \Pi_i(0)    (2.13)
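As a numerical illustration, the local filter (2.10)-(2.11) can be integrated with a simple Euler scheme. This is a sketch under the same assumed time-invariant matrices as before; a production implementation would use a proper ODE/SDE integrator.

```python
import numpy as np

def local_filter(A_i, H_i, Q_i, R_i, Pi_i0, y, dt):
    """Euler integration of the local Kalman-Bucy filter (2.10)-(2.11).

    y is a sequence of sampled measurements y_i(k*dt); all model matrices
    are assumed constant."""
    n = A_i.shape[0]
    x_hat = np.zeros(n)              # zero-mean prior, so x_i(0|0) = 0
    P = Pi_i0.copy()                 # P_i(0) = Pi_i(0), eq. (2.13)
    R_inv = np.linalg.inv(R_i)
    estimates = []
    for y_k in y:
        K = P @ H_i.T @ R_inv        # filter gain P_i H_i' R_i^{-1}
        x_hat = x_hat + dt * (A_i @ x_hat + K @ (y_k - H_i @ x_hat))  # eq. (2.10)
        P = P + dt * (A_i @ P + P @ A_i.T + Q_i - K @ H_i @ P)        # eq. (2.11)
        estimates.append(x_hat.copy())
    return np.array(estimates), P
```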

The problem to be solved is to obtain an algorithm for computing the global estimate \hat{x} in (2.6) in terms of \hat{x}_1 and \hat{x}_2.* Speyer in [3] solved this problem when the local models are the same as the global model, and we will comment on the simplifications that occur in that case shortly. Allowing the local models to differ from the global model leads to several potential advantages. For example, presumably the local models are lower-dimensional than (2.1) and represent the important dynamics at that particular location in the distributed system or network. Therefore, the local processors can be made far less complex than the global processor. Of course, we cannot recover \hat{x} from \hat{x}_1 and \hat{x}_2 for arbitrary choices of local models, but the conditions needed are quite weak. Specifically, as we will see, the only condition that is required is that there exist (possibly time-varying) matrices M_1 and M_2 such that

C_i = H_iM_i,    i=1,2    (2.14)

* From this point on, the explicit time dependence of matrices will be suppressed. If a particular matrix is constant, we will explicitly state this in order to avoid confusion.

Equation (2.14) and its implications deserve some comment. First note that (2.14) is equivalent to

R(C_i) \subseteq R(H_i),    i=1,2    (2.15)

or, equivalently,

N(C_i') \supseteq N(H_i'),    i=1,2    (2.16)

What these conditions say is that if any set of components of H_ix_i are linearly interrelated, then the same set of components of C_ix must have exactly the same linear interrelationship. That is, if the local models (2.7)-(2.9) assume any redundancy among the sensed quantities -- i.e., the components of y_i -- then that redundancy must actually exist in the global model. Note that if (2.15) is satisfied, valid choices for M_1 and M_2 are

M_i = H_i^{\dagger}C_i,    i=1,2    (2.17)

(where "\dagger" denotes pseudo-inverse), and the choice is unique only if N(H_i) = \{0\}. Thus, the dynamics (2.7),(2.8) can be totally arbitrary, as long as (2.15) or (2.16) is satisfied. For example, one implication of this condition is that the dimension of x_i must be at least as large as the number of linearly independent components of the measurement vector y_i. However, the condition (2.15) is sufficiently weak that, if we desire, we can always choose a local model of this minimal dimension that satisfies the condition.
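The condition (2.15) and the choice (2.17) are easy to check numerically. The following sketch uses NumPy's pseudo-inverse on illustrative matrices of our own; the names and values are hypothetical.

```python
import numpy as np

# Hypothetical output matrices: C1 maps a 4-dimensional global state to 2
# sensed quantities; H1 maps a 2-dimensional local state to the same sensors.
C1 = np.array([[1.0, 0.0, 2.0, 0.0],
               [0.0, 1.0, 0.0, 3.0]])
H1 = np.array([[1.0, 0.0],
               [0.0, 1.0]])

# Condition (2.15): each column of C1 lies in the column space of H1,
# i.e., H1 H1^+ C1 = C1 with "+" the Moore-Penrose pseudo-inverse.
H1_pinv = np.linalg.pinv(H1)
assert np.allclose(H1 @ H1_pinv @ C1, C1), "range condition (2.15) fails"

M1 = H1_pinv @ C1                    # a valid choice of M1, eq. (2.17)
assert np.allclose(C1, H1 @ M1)      # eq. (2.14); unique here since N(H1) = {0}
```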

Therefore, the condition does not require that there be any physical relationship between the local states, x_1 and x_2, and the global state x. On the other hand, (2.14) suggests an interpretation of x_i as being a part of the global state, specifically M_ix. If this is the case, then (2.7) implies that this part of the state is decoupled from the remaining part of x in the sense that M_ix is itself a Markov process. This is, of course, not the usual case in practice, where approximations are made in assuming that the couplings between the local states can be neglected or can be replaced by additional white noise sources. What our results say is that as long as (2.14) holds, for the purposes of reconstructing \hat{x}, it doesn't matter if (2.7) is an exact or approximate expression for the evolution of the local state. If x_i actually equals M_ix, we obtain some simplifications in the equations that define our algorithm, and we will discuss these at the end of this section.

As a first step in deriving our algorithm, consider the Kalman filter for the calculation of the global estimate \hat{x}:

\dot{\hat{x}}(t|t) = [A - PC_1'R_1^{-1}C_1 - PC_2'R_2^{-1}C_2]\hat{x}(t|t) + PC_1'R_1^{-1}y_1(t) + PC_2'R_2^{-1}y_2(t)    (2.18)

where P can be calculated from

\dot{P} = AP + PA' + Q - PC_1'R_1^{-1}C_1P - PC_2'R_2^{-1}C_2P    (2.19)

P(0) = \Pi(0)    (2.20)
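For reference, one Euler step of the centralized filter (2.18)-(2.19) can be sketched as follows (our illustration, with the same discretization caveats as before):

```python
import numpy as np

def global_filter_step(x_hat, P, y1_k, y2_k, A, Q, C1, R1, C2, R2, dt):
    """One Euler step of the centralized two-sensor filter (2.18)-(2.19)."""
    L1 = P @ C1.T @ np.linalg.inv(R1)          # gain P C1' R1^{-1}
    L2 = P @ C2.T @ np.linalg.inv(R2)          # gain P C2' R2^{-1}
    dx = A @ x_hat + L1 @ (y1_k - C1 @ x_hat) + L2 @ (y2_k - C2 @ x_hat)
    dP = A @ P + P @ A.T + Q - L1 @ C1 @ P - L2 @ C2 @ P
    return x_hat + dt * dx, P + dt * dP
```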

The solution to the problem we have posed can be obtained as follows. Rearranging (2.10), we have*

H_i'R_i^{-1}y_i = P_i^{-1}\{\dot{\hat{x}}_i - [A_i - P_iH_i'R_i^{-1}H_i]\hat{x}_i\}    (2.21)

Examining (2.18), we see that the quantities needed in the calculation of \hat{x} are C_1'R_1^{-1}y_1 and C_2'R_2^{-1}y_2. These can be obtained from (2.21) only if matrices M_1 and M_2 exist that satisfy (2.14). Assuming that this is the case, we can combine (2.14), (2.18), and (2.21) to obtain

\dot{\hat{x}} = [A - PC_1'R_1^{-1}C_1 - PC_2'R_2^{-1}C_2]\hat{x} + PM_1'P_1^{-1}\{\dot{\hat{x}}_1 - [A_1 - P_1H_1'R_1^{-1}H_1]\hat{x}_1\} + PM_2'P_2^{-1}\{\dot{\hat{x}}_2 - [A_2 - P_2H_2'R_2^{-1}H_2]\hat{x}_2\}    (2.22)

* Note that we have implicitly made one other assumption about the local models, in that in (2.21) we are assuming that P_i is invertible. This will be guaranteed as long as \Pi_i(0) is invertible.

In order to simplify notation, define the following quantities:

F = A - PC_1'R_1^{-1}C_1 - PC_2'R_2^{-1}C_2    (2.23)

F_i = A_i - P_iH_i'R_i^{-1}H_i,    i=1,2    (2.24)

G_i = PM_i'P_i^{-1},    i=1,2    (2.25)

Then, in order to avoid differentiating \hat{x}_1 and \hat{x}_2 in (2.22), we define

z = \hat{x} - G_1\hat{x}_1 - G_2\hat{x}_2    (2.26)

and, differentiating, we find that

\dot{z} = Fz + K_1\hat{x}_1 + K_2\hat{x}_2    (2.27)

\hat{x} = z + G_1\hat{x}_1 + G_2\hat{x}_2    (2.28)

where

K_i = FG_i - \dot{G}_i - G_iF_i,    i=1,2    (2.29)

If we use the differential equations for P_i, P_i^{-1}, and P, (2.29) becomes*



K_i = PM_i'P_i^{-1}Q_iP_i^{-1} - QM_i'P_i^{-1} + P[M_i'A_i' - A'M_i' - \dot{M}_i']P_i^{-1},    i=1,2    (2.30)

* Note that in (2.29) we have implicitly assumed that M_1 and M_2 are differentiable. Again this is not a particularly restrictive condition. For example, in the time-invariant case it is certainly true, since M_1 and M_2 can be taken to be constant.
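As a sketch of how (2.25)-(2.30) would be used, the following fragment computes the time-invariant gains (with \dot{M}_i = 0) and performs one Euler step of the fusion dynamics (2.27)-(2.28). The covariances P and P_i are treated as given, e.g., steady-state Riccati solutions; this is our illustration, not code from the paper.

```python
import numpy as np

def fusion_gains(P, P_i, M_i, Q_i, Q, A, A_i):
    """Gains G_i, eq. (2.25), and K_i, eq. (2.30), with M_i constant."""
    P_i_inv = np.linalg.inv(P_i)
    G_i = P @ M_i.T @ P_i_inv
    K_i = (G_i @ Q_i @ P_i_inv
           - Q @ M_i.T @ P_i_inv
           + P @ (M_i.T @ A_i.T - A.T @ M_i.T) @ P_i_inv)
    return G_i, K_i

def fusion_step(z, x1_hat, x2_hat, F, K1, K2, G1, G2, dt):
    """One Euler step of (2.27) followed by the memoryless combination (2.28)."""
    z = z + dt * (F @ z + K1 @ x1_hat + K2 @ x2_hat)
    x_hat = z + G1 @ x1_hat + G2 @ x2_hat
    return z, x_hat
```

Note that only the local estimates \hat{x}_1 and \hat{x}_2 enter the combiner; no raw measurements are needed.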

If all of the models, local and global, are time-invariant and if we consider the steady-state case, then the above solution still applies (with \dot{M}_i = 0) and is also time-invariant. This is the general solution to the problem of combining decentralized maps.

In addition, this solution can be directly adapted to the problem of computing \hat{x} from \hat{x}_1 and y_2. This is of interest in situations in which one local processor transmits information to a global processor that has measurements of its own. We can solve this problem by returning to (2.18), and instead of replacing both C_1'R_1^{-1}y_1 and C_2'R_2^{-1}y_2 by expressions in terms of \hat{x}_i and \dot{\hat{x}}_i, we make this substitution only for C_1'R_1^{-1}y_1. The remaining analysis is analogous to that carried out previously, and the result is

\hat{x} = z + G_1\hat{x}_1    (2.31)

where

\dot{z} = Fz + K_1\hat{x}_1 + PC_2'R_2^{-1}y_2    (2.32)

Here F, K_1, and G_1 are the same as given previously.

In the next two subsections we present two special cases which result in some simplifications in (2.23)-(2.32) and consequently allow us to interpret our result in more detail.
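A sketch of this variant, in which the fusion center receives the transmitted estimate \hat{x}_1 together with its own raw measurements y_2 (again our illustration, with Euler discretization):

```python
import numpy as np

def fuse_estimate_and_raw_step(z, x1_hat, y2_k, F, K1, G1, P, C2, R2, dt):
    """One Euler step of (2.31)-(2.32): combine a remote local estimate
    with locally available raw measurements."""
    measurement_drive = P @ C2.T @ np.linalg.inv(R2) @ y2_k   # P C2' R2^{-1} y2
    z = z + dt * (F @ z + K1 @ x1_hat + measurement_drive)    # eq. (2.32)
    x_hat = z + G1 @ x1_hat                                   # eq. (2.31)
    return z, x_hat
```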


2.2 The Special Case of Identical Local and Global Models

In this section we consider the case examined by Speyer in [3]. Specifically, we assume that the models used by the local processors are identical to the global model. That is,

A_1 = A_2 = A,    Q_1 = Q_2 = Q,    C_1 = H_1,    C_2 = H_2,    M_1 = M_2 = I    (2.33)

In this case the expressions for K_1 and K_2 simplify to

K_i = (PP_i^{-1} - I)QP_i^{-1},    i=1,2    (2.34)

and (2.28) becomes

\hat{x} = z + P(P_1^{-1}\hat{x}_1 + P_2^{-1}\hat{x}_2)    (2.35)

Note that the second term in the expression for \hat{x} is the usual expression for combining independent estimates [4,5]. However, \hat{x}_1 and \hat{x}_2 are not independent in general, and z represents a correction for this correlation.

The reason that \hat{x}_1 and \hat{x}_2 are not independent estimates is that they are based not only on measurements with independent noises but also on a priori information. Specifically, both of the local estimates incorporate statistical descriptions of x(0) and w(t), and thus the errors in both estimates are correlated with these processes. It is the correlation with the process w(t) that leads to the need for a dynamical correction (z) to account for the correlation in the processes \hat{x}_1 and \hat{x}_2. If Q = 0 (i.e., if w(t) is not present), then K_i = 0 and hence z \equiv 0, and \hat{x} is a memoryless function of \hat{x}_1 and \hat{x}_2. In this case it is straightforward to show that

\hat{x}(t|t) = P(t)[P_1^{-1}(t)\hat{x}_1(t|t) + P_2^{-1}(t)\hat{x}_2(t|t)]    (2.36)

and

P^{-1}(t) = P_1^{-1}(t) + P_2^{-1}(t) - \Pi^{-1}(t)    (2.37)

where \Pi(t) is the unconditional covariance of x(t). In general \Pi(t) satisfies

\dot{\Pi}(t) = A(t)\Pi(t) + \Pi(t)A'(t) + Q(t)    (2.38)

with \Pi(0) given. Equations (2.36) and (2.37) hold only in the case when Q is zero. Note that even in this case \hat{x}_1 and \hat{x}_2 are not independent estimates, because of the correlation of the estimation errors with x(0).

Following the work of Wall [4], we can interpret (2.36) and (2.37) as follows.

We have three sources of information on which to base our estimate of x(t): the measurement processes y_1 and y_2, and the a priori information about x(t), provided by the unconditional propagation of the mean and variance from the specified statistics of x(0). The estimate \hat{x}_i uses y_i and the a priori information, which, therefore, is used twice. Equation (2.37) corrects for the fact that both P_1^{-1} and P_2^{-1} reflect the uses of this information. Also, (2.36) is the correct expression under the assumption that x(0) is zero mean. If this is not the case, that is, if its mean m(0) \neq 0, then (2.36) is replaced by

\hat{x}(t|t) = P(t)[P_1^{-1}(t)\hat{x}_1(t|t) + P_2^{-1}(t)\hat{x}_2(t|t) - \Pi^{-1}(t)m(t)]    (2.39)

where m(t) is the unconditional mean of x(t), which satisfies

\dot{m}(t) = A(t)m(t)    (2.40)

Again we see the "subtracting out" of the effect of a priori information, so that the duplication of this information is removed.

Finally, note that K_i = 0 also if P = P_i. However, this is only the case if the other set of measurements contains no information. In general, if the system is observable from each set of measurements, (PP_i^{-1} - I) will be invertible. Of course, all of the previous statements have certain obvious generalizations. For example, if part of the state is uncontrollable from the noise, then the corresponding part of \hat{x} is a memoryless function of \hat{x}_1 and \hat{x}_2. Also, if one set of measurements, say set 1, contains no information about a part of x, then the corresponding parts of P and P_2 are identical.
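The static combination rules (2.36), (2.37), and (2.39) translate directly into a few lines of linear algebra. The following sketch, with made-up covariances, shows the shared-prior correction; it assumes the Q = 0 setting discussed above.

```python
import numpy as np

def fuse_with_shared_prior(x1, P1, x2, P2, Pi, m=None):
    """Combine two local estimates that share a common prior.

    Implements eqs. (2.36)-(2.37), or (2.39) when the prior mean m is
    nonzero; Pi is the unconditional covariance of x(t)."""
    P1_inv, P2_inv, Pi_inv = (np.linalg.inv(S) for S in (P1, P2, Pi))
    P = np.linalg.inv(P1_inv + P2_inv - Pi_inv)      # eq. (2.37)
    info = P1_inv @ x1 + P2_inv @ x2
    if m is not None:
        info = info - Pi_inv @ m       # count the prior mean once, eq. (2.39)
    return P @ info, P

# Illustrative numbers only:
x1 = np.array([1.0, 0.5]); P1 = np.diag([0.2, 0.3])
x2 = np.array([0.8, 0.7]); P2 = np.diag([0.25, 0.2])
Pi = np.eye(2)                                       # prior covariance of x(t)
x_hat, P = fuse_with_shared_prior(x1, P1, x2, P2, Pi)
```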

2.3 The Case in Which the Local Model is a Subsystem of the Global Model

In some cases the dynamics of one of the local models may, in fact, be the exact dynamics of a subsystem of the global model. Specifically, if this is true of local model 1, then

x_1(t) = M_1(t)x(t)    (2.41)

Equation (2.41) has several important implications. Since x_1 satisfies (2.7),(2.8), and x satisfies (2.1),(2.2), equation (2.41) states that the Markov process x(t) has a subprocess, namely x_1(t), that is Markov by itself.

Differentiating (2.41) and using (2.1) and (2.7), we have that

A_1M_1x + w_1 = \dot{x}_1 = \dot{M}_1x + M_1\dot{x} = \dot{M}_1x + M_1Ax + M_1w    (2.42)

and from this we conclude that

A_1M_1 = \dot{M}_1 + M_1A    (2.43)

and

w_1 = M_1w    (2.44)

which implies that

Q_1 = M_1QM_1'    (2.45)

Also, directly from (2.41) we have that

\Pi_1 = M_1\Pi M_1'    (2.46)

Note that from (2.46) it is clear that \Pi_1 is invertible only if M_1 is onto (assuming that \Pi is invertible). We will assume that this is the case, since from (2.41) we see that any other choice for M_1 leads to an x_1 with fewer degrees of freedom than it has components. In addition, under these conditions, the expression for K_1 simplifies:

K_1 = PM_1'P_1^{-1}M_1QM_1'P_1^{-1} - QM_1'P_1^{-1} = [PM_1'P_1^{-1}M_1 - I]QM_1'P_1^{-1}    (2.47)

This equation bears some resemblance to the form of the gain when the local model is the same as the global model.
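The subsystem conditions (2.43)-(2.45) and the simplified gain (2.47) are easy to verify numerically. Below is a minimal sketch with an assumed lower block-triangular global model, so that the first state is Markov by itself; all numbers are illustrative.

```python
import numpy as np

# Global model in the canonical form (2.48)-(2.51): A21 couples x1 into gamma,
# but nothing feeds back into x1, so x1 is Markov by itself.
A  = np.array([[-1.0, 0.0],
               [ 0.5, -2.0]])
Q  = np.array([[0.3, 0.1],
               [0.1, 0.4]])
M1 = np.array([[1.0, 0.0]])             # M1 = (I 0), eq. (2.49)

A1 = M1 @ A @ M1.T                      # subsystem dynamics A_1
Q1 = M1 @ Q @ M1.T                      # eq. (2.45)
assert np.allclose(A1 @ M1, M1 @ A)     # eq. (2.43) with M1 constant

def K1_subsystem(P, P1, M1, Q):
    """Simplified combining gain, eq. (2.47)."""
    P1_inv = np.linalg.inv(P1)
    n = P.shape[0]
    return (P @ M1.T @ P1_inv @ M1 - np.eye(n)) @ Q @ M1.T @ P1_inv
```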

In order to gain further insight, we wish to consider a particularly convenient form for the global model. This is done by choosing a basis for the global state space so that the components of x_1 are the first components of x. Assuming without loss of generality that the global model is in this form, then

x = \begin{pmatrix} x_1 \\ \gamma \end{pmatrix}    (2.48)

M_1 = (I \;\; 0)    (2.49)

A = \begin{pmatrix} A_1 & 0 \\ A_{21} & A_{22} \end{pmatrix}    (2.50)

Q = \begin{pmatrix} Q_1 & Q_{12} \\ Q_{12}' & Q_{22} \end{pmatrix}    (2.51)

y_1 = (H_1 \;\; 0)\begin{pmatrix} x_1 \\ \gamma \end{pmatrix} + v_1    (2.52)

y_2 = (C_{21} \;\; C_{22})\begin{pmatrix} x_1 \\ \gamma \end{pmatrix} + v_2    (2.53)

This form is illustrated in Figure 2.1. Note from the figure that it is clear that the global system is not observable from y_1 alone. This is not surprising given that x_1 is Markov by itself.

[Figure 2.1]

Using (2.48)-(2.53), equation (2.47) becomes

K_1 = \begin{pmatrix} [(P)_{11}P_1^{-1} - I]Q_1P_1^{-1} \\ [(P)_{12}'P_1^{-1}Q_1 - Q_{12}']P_1^{-1} \end{pmatrix}    (2.54)

where

P = \begin{pmatrix} (P)_{11} & (P)_{12} \\ (P)_{12}' & (P)_{22} \end{pmatrix}    (2.55)

From this and the previous equations and from the figure we can get a clearer picture of the structure of our solution in this case.* Since K_1 is partitioned, let us consider each part individually. The first piece, [(P)_{11}P_1^{-1} - I]Q_1P_1^{-1}, is exactly of the form of the gain that we saw in the preceding subsection when the local and global models are identical (see equation (2.34)). This is not surprising, as the first piece of the global state x is nothing more than x_1, for which the local and global models agree. Therefore, the incorporation of \hat{x}_1 into a global estimate of this piece of x, given y_1 and y_2, is the same as the problem we considered in Subsection 2.2.

* In the following discussion we use the notation developed previously. Thus \hat{x}_1 refers to the local estimate of x_1 given y_1 (P_1 is its locally-computed covariance), and \hat{x} refers to the global estimate of x given y_1 and y_2 (global covariance P). In the particular case being examined here x_1 = M_1x, and therefore there is some chance of confusion. We have attempted to reduce this chance by using \hat{x}_1 and \hat{x} only in the senses described above. Also, we have denoted the upper left-hand block of P by (P)_{11} (see (2.55)) to distinguish it from P_1. Here (P)_{11} is the estimation error covariance of x_1 given y_1 and y_2, while P_1 is the error covariance based only on y_1.


The second piece, [(P)_{12}'P_1^{-1}Q_1 - Q_{12}']P_1^{-1}, essentially tells us how to use the estimate of x_1 to obtain an estimate of the remaining part of the state. Consider for the moment the case in which there is no second set of measurements, that is, when C_{21} = C_{22} = 0. In this case we have a cascade interconnection of two systems and measurements from only the first of these. It is clear that under these conditions (P)_{11} = P_1, which merely states that local processor #1 produces the best filtered estimate of x_1 given y_1. From (2.54) we see that this observation is consistent with the fact that the first part of K_1 is zero. Also, the second piece of K_1 becomes

[(P)_{12}'P_1^{-1}Q_1 - Q_{12}']P_1^{-1}    (2.56)

and using (2.23)-(2.28) and (2.54), the optimal estimator for \gamma becomes

\hat{\gamma} = \eta + (P)_{12}'P_1^{-1}\hat{x}_1,
\dot{\eta} = A_{22}\eta + [(P)_{12}'P_1^{-1}Q_1 - Q_{12}']P_1^{-1}\hat{x}_1    (2.57)
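A sketch of the cascade estimator (2.57), reconstructing the unobserved part \gamma from the local estimate of x_1 (Euler discretization; the time-varying covariance blocks are passed in as arguments):

```python
import numpy as np

def gamma_estimator_step(eta, x1_hat, A22, P12, P1, Q1, Q12, dt):
    """One Euler step of eq. (2.57): estimate gamma from the estimate of x1.

    P12 is the off-diagonal block (P)_12 of the global error covariance;
    P1 is the local error covariance of x1."""
    P1_inv = np.linalg.inv(P1)
    S = P12.T @ P1_inv                          # memoryless gain (P)_12' P_1^{-1}
    K = (S @ Q1 - Q12.T) @ P1_inv               # second piece of K_1, eq. (2.56)
    eta = eta + dt * (A22 @ eta + K @ x1_hat)
    gamma_hat = eta + S @ x1_hat                # eq. (2.57)
    return eta, gamma_hat
```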

These equations describe how the optimal estimate of the unobservable part of a system can be constructed from the optimal estimate of the observable part. It is worth noting that this particular special case is of practical importance, for example, in navigation systems in which accelerations are sensed and in which velocities and positions are to be estimated. Our result states that the acceleration measurements can be processed first (locally) to produce optimal acceleration estimates, and these estimates can then be used (perhaps in a centralized processor) to compute the optimal estimates of velocity and position. Again, the transmission of filtered measurements may be done more efficiently than the transmission of the raw data, and the complexity of each of the two processors (for \hat{x}_1 and for \hat{\gamma}) is less than the complexity of a global, centralized estimator for x. Such a procedure may also be of value even if y_2 is present, for example, if we do have velocity or position sensors. In this case, from eq. (2.32) we see that our results tell us how to reconstruct the optimal estimate of acceleration, velocity, and position in terms of velocity and position sensor measurements and the estimate of acceleration obtained by processing the accelerometers alone. Again there may be transmission savings in transmitting this estimate rather than the raw accelerometer data, and, in addition, there may be implementation advantages in breaking the overall optimal estimator into smaller pieces.

Note also from (2.47) that K_1 = 0 if Q = 0. In fact, from (2.49) and (2.51) (together with the fact that Q_{12} must be zero if Q_1 is), we see that K_1 = 0 if Q_1 = 0. In this case, whether y_2 is present or not, \hat{x} depends on \hat{x}_1 in a memoryless fashion. This is best understood by noting that with Q_1 = 0, x_1 is a time-varying bias*:

x_1(t) = \Phi_1(t,0)x_1(0)    (2.58)

and it also produces a time-varying bias in \gamma:

\gamma(t) = \Phi_{22}(t,0)\gamma(0) + \int_0^t \Phi_{22}(t,\tau)A_{21}(\tau)x_1(\tau)\,d\tau + \int_0^t \Phi_{22}(t,\tau)[0 \;\; I]w(\tau)\,d\tau    (2.59)

* Here \Phi_1 is the state transition matrix associated with A_1. Similarly, \Phi_{22} is the state transition matrix for A_{22}.

The measurements y_1 provide information about the second term in (2.59), which can be rewritten as

\left[\int_0^t \Phi_{22}(t,\tau)A_{21}(\tau)\Phi_1(\tau,0)\,d\tau\right]x_1(0)    (2.60)

Thus the best estimate of \gamma given the measurements y_1 is simply a memoryless function of \hat{x}_1. For example, if we do not have a second set of measurements (C_{21} = C_{22} = 0), then (2.28) reduces to

\hat{x} = PM_1'P_1^{-1}\hat{x}_1    (2.61)

where M_1 is as in (2.49) and P is partitioned as in (2.55). Therefore

\hat{\gamma} = (P)_{12}'P_1^{-1}\hat{x}_1    (2.62)

III. THE SMOOTHING UPDATE PROBLEM

Consider the formulation described in the preceding subsection, but with the following additional features: (1) we observe the measurements over the time interval [0,T]; (2) the data are processed locally to produce the smoothed estimates

\hat{x}_{is}(t) = E[x_i(t) \mid y_i(\sigma), 0 \le \sigma \le T]