A Systolic Array for Recursive Least Squares Computations
Marc Moonen and Joos Vandewalle
ESAT, Katholieke Universiteit Leuven, Belgium
Abstract: The original recursive least squares (RLS) computational scheme basically consists of triangular updates and triangular backsolves, and it is well known that pipelining these two separate steps on a parallel architecture as such is impossible. As a result of this, a straightforward systolic implementation with, e.g., O(n^2) processors for the O(n^2) operations per update will have a throughput which is only O(n^{-1}), where n is the problem size. Several alternative schemes have been developed, but as far as we know, up till now none of these has been worked into an efficient (stable) systolic implementation. Here, we focus on an RLS algorithm by Pan and Plemmons [12], and we show how an efficient systolic implementation can be derived. We get around the critical path problem by introducing a few additional computational steps. Such a strategy is apparently new. As the different steps are then perfectly pipelined, the obtained throughput is the highest possible, i.e. O(n^0). The overall efficiency is seen to be roughly 67%.
EDICS classification: 7.1.4 Systolic and Wavefront Architectures.

Mailing address: Marc Moonen, ESAT-KUL, K. Mercierlaan 94, 3030 Heverlee, Belgium, tel. 32/16/22 09 31, email: [email protected]

1  Introduction
The linear least squares (LS) problem is one of the oldest problems in applied mathematics, with numerous applications throughout modern engineering. It appears in linear estimation methods for adaptive antennas, beamforming, data communications, space navigation, systems identification, etc. In general, without reference to any specific application, the problem can be stated as follows: given a real m × n matrix A (m ≥ n) with full column rank n and a real m-vector b, find x that solves

$$ \min_x \| A x - b \| $$

The solution to the above minimization problem is known to be

$$ x_{LS} = (A^T A)^{-1} A^T b $$
The matrix A^T A is called the information matrix, as it measures the information content in the experiment, while its inverse is called the covariance matrix, as it measures the expected error in x. An accurate and stable way of computing the LS solution consists in first applying a sequence of orthogonal transformations, such that A is reduced to an upper triangular matrix (QR-factorization) [3, 4]

$$ Q^T A = \begin{bmatrix} R \\ 0 \end{bmatrix}, \qquad Q^T b = \begin{bmatrix} y \\ z \end{bmatrix} $$

where Q is an m × m orthogonal matrix (Q Q^T = I_{m×m}) and R is n × n upper triangular. The least squares solution can then be computed by solving the triangular system

$$ R \, x_{LS} = y $$

Computing the QR factorization takes O(mn^2) operations, while the triangular backsolve requires O(n^2) operations.
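For illustration, a minimal numerical sketch of this QR-based solve is given below (Python/NumPy; the variable names and the small random test problem are illustrative additions, not part of the original text).

import numpy as np

# Minimal sketch of the QR-based LS solve described above (illustrative only).
rng = np.random.default_rng(0)
m, n = 8, 4
A = rng.standard_normal((m, n))                 # m x n, full column rank
b = rng.standard_normal(m)

# QR factorization: Q^T A = [R; 0], Q^T b = [y; z]
Q, R_full = np.linalg.qr(A, mode="complete")    # Q is m x m, R_full is m x n
R = R_full[:n, :]                               # n x n upper triangular factor
y = (Q.T @ b)[:n]

# Triangular backsolve R x_LS = y
x_ls = np.linalg.solve(R, y)

# Agrees with the normal-equations solution x_LS = (A^T A)^{-1} A^T b
assert np.allclose(x_ls, np.linalg.solve(A.T @ A, A.T @ b))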
In many real-time signal processing applications it is necessary to continuously update the least squares solution, when after each sampling new observations are added to the problem. Without loss of generality, assume that at a certain time step an additional vector a and a scalar β are appended, forming

$$ \bar{A} = \begin{bmatrix} A \\ a^T \end{bmatrix}, \qquad \bar{b} = \begin{bmatrix} b \\ \beta \end{bmatrix} $$

The recursive least squares (RLS) problem consists in re-computing the least squares solution after appending the new data, by making use of the results of the previous time step. Briefly, the RLS problem can be solved as follows (information matrix approach). Instead of A and b, the corresponding triangular factor R and right-hand side y are stored. In a first step, these are updated with the new observation (QR-updating [2])

$$ \bar{Q}^T \begin{bmatrix} R \\ a^T \end{bmatrix} = \begin{bmatrix} \bar{R} \\ 0 \end{bmatrix}, \qquad \bar{Q}^T \begin{bmatrix} y \\ \beta \end{bmatrix} = \begin{bmatrix} \bar{y} \\ \ast \end{bmatrix} $$

where $\bar{Q}$ is an (n + 1) × (n + 1) orthogonal matrix. The new LS solution can then be computed by solving the triangular system

$$ \bar{R} \, \bar{x}_{LS} = \bar{y} $$
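The QR-updating step can be sketched directly: the appended row a^T is annihilated against R by n Givens rotations, and the same rotations are applied to the right-hand side. The sketch below is illustrative only (Python/NumPy; the function name qr_update_row and the final consistency check are not part of the paper).

import numpy as np

def qr_update_row(R, y, a, beta):
    # Append one observation (a^T, beta) to the triangular system (R, y):
    # n Givens rotations zero out a^T against the diagonal of R (sketch).
    R, y, a, beta = R.copy(), y.copy(), a.astype(float), float(beta)
    n = R.shape[0]
    for k in range(n):
        rho = np.hypot(R[k, k], a[k])
        if rho == 0.0:
            continue
        c, s = R[k, k] / rho, a[k] / rho
        Rk, ak = R[k, k:].copy(), a[k:].copy()
        R[k, k:], a[k:] = c * Rk + s * ak, -s * Rk + c * ak   # zeroes a[k]
        y[k], beta = c * y[k] + s * beta, -s * y[k] + c * beta
    return R, y

# Usage: the updated LS solution solves R_new x = y_new
rng = np.random.default_rng(1)
A, b = rng.standard_normal((8, 4)), rng.standard_normal(8)
Q, R = np.linalg.qr(A)                          # reduced QR: R is 4 x 4
y = Q.T @ b
a_new, beta_new = rng.standard_normal(4), rng.standard_normal()
R_new, y_new = qr_update_row(R, y, a_new, beta_new)
x_new = np.linalg.solve(R_new, y_new)
A2, b2 = np.vstack([A, a_new]), np.append(b, beta_new)
assert np.allclose(x_new, np.linalg.lstsq(A2, b2, rcond=None)[0])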
Both the triangular update and the triangular backsolve require O(n^2) operations. For extremely high throughput applications such as image and radar processing, this computational complexity is a serious impediment. Systolic arrays or wavefront arrays then provide a means of exploiting large amounts of parallelism and hence achieving orders of magnitude improvement in performance, consistent with the requirements. For the QR-updating step, one can make use of the well-known Gentleman-Kung array [1]. Subsequent updates are then pipelined on an n × n triangular array, such that the throughput is independent of the problem size, i.e. O(n^0). As for the RLS problem, one could try to implement the additional triangular backsolves on the same triangular array. Pipelining the two separate steps (update & triangular backsolve) however appears to be impossible. The QR-update runs from the upper-left corner to the lower-right corner of the array, while the backsolve runs in precisely the opposite direction, hence creating a critical path. Alternatively, the triangular backsolve can be computed on an additional linear array [1]. The problem here is that the necessary data transfer from the triangular array to the linear array breaks the pipelining on the former, so that the throughput again drops to O(n^{-1}). (Note that in some cases, e.g. for adaptive beamforming, only the least squares residuals β − a^T x_LS are required. In [7], it is shown how these can be computed on the Gentleman-Kung array, along with the QR-update and without explicitly forming x_LS. The throughput then remains O(n^0).)

One way of solving the above problem consists in storing and updating R^{-1} as well as R. Instead of a triangular backsolve, one then only has to perform a matrix-vector multiplication

$$ x_{LS} = R^{-1} \, y $$
On a triangular array, the matrix-vector multiplications can be computed along with the triangular updates. Furthermore, the transformations that update R can also be used for updating R^{-1} [10, 5, 12], which results in an elegant systolic implementation. As there are now two triangular factors to be updated, the operation count is roughly doubled. In [9] it is however shown that this updating scheme is unstable. The procedure for stabilizing the scheme, as developed in [9], again doubles the number of computations. The resulting scheme, though very elegant, is thus highly inefficient.

A related, but much more efficient RLS scheme has been derived in [12]. Only R^{-1} is stored and updated, where the transformations for updating R^{-1} follow from the matrix-vector product −a^T R^{-1}, as follows.

Algorithm Orthogonal-Inverse Updating [12]
Given R^{-1}, right-hand side y and LS solution x_LS. Input a and β.

Step 1. Form the matrix-vector product

$$ \underbrace{s^T}_{1 \times n} = -a^T \, R^{-1} $$

Step 2. For k = 1, ..., n determine orthogonal plane rotations Q_k so that

$$ \begin{bmatrix} 1 & s^T \end{bmatrix} Q_1 Q_2 \cdots Q_n = \begin{bmatrix} \gamma & \underbrace{0}_{1 \times n} \end{bmatrix} $$

Step 3. Update R^{-1}

$$ \begin{bmatrix} 0 & R^{-1} \end{bmatrix} Q_1 Q_2 \cdots Q_n = \begin{bmatrix} u & \bar{R}^{-1} \end{bmatrix} $$

Step 4. Update LS solution

$$ \bar{x}_{LS} = x_{LS} - \frac{\beta - a^T x_{LS}}{\gamma} \, u $$
The vector −u/γ is known as the Kalman gain vector. The plane rotations in Step 2 are defined by

$$ Q_k = \begin{bmatrix} \cos\theta_k & 0 & -\sin\theta_k & 0 \\ 0 & I_{k-1} & 0 & 0 \\ \sin\theta_k & 0 & \cos\theta_k & 0 \\ 0 & 0 & 0 & I_{n-k} \end{bmatrix} $$

where I_l is an l × l identity matrix and θ_k is chosen such that the (k+1)st element in [1  s^T] Q_1 Q_2 ... Q_{k-1} is zeroed. For details, we refer to [12].

In the sequel we develop a systolic array for this orthogonal-inverse updating algorithm. First we show how to implement one recursive update. The different computational steps of the updating algorithm are seen to correspond to different computational fronts that run over the array in different directions. Then we show how subsequent updates can be pipelined, by introducing an `intermediate updating' procedure, for taking care of the crossing of the different computational fronts. This intermediate updating represents a computational overhead, such that the overall efficiency will be roughly 67%. However, as the different steps are now pipelined, an O(n^0) throughput is obtained. As far as we know, our array is therefore the first to implement a complete RLS algorithm in a stable and efficient manner. The correctness of the presented array has been verified by software simulation.
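A plain sequential sketch of Steps 1 through 4, as reconstructed above, is given below for reference (Python/NumPy; the variable names, the explicit rotation loop and the final check against a direct recomputation are illustrative additions, the rotation sign convention is one possible choice, and this is not the systolic implementation developed in the sequel).

import numpy as np

def orthogonal_inverse_update(Rinv, x, a, beta):
    # One orthogonal-inverse update (Steps 1-4), sequential sketch.
    # Rinv: n x n upper triangular R^{-1}; x: current LS solution;
    # a, beta: new observation. Returns the updated (Rinv, x).
    n = Rinv.shape[0]
    w = np.concatenate(([1.0], -(a @ Rinv)))        # Step 1: [1, s^T], s^T = -a^T R^{-1}
    M = np.hstack((np.zeros((n, 1)), Rinv))         # [0, R^{-1}]
    for k in range(1, n + 1):                       # Steps 2-3: rotations Q_1 ... Q_n
        rho = np.hypot(w[0], w[k])
        c, s = w[0] / rho, w[k] / rho               # zero w[k] against w[0]
        w[0], w[k] = rho, 0.0
        col0, colk = M[:, 0].copy(), M[:, k].copy()
        M[:, 0], M[:, k] = c * col0 + s * colk, -s * col0 + c * colk
    gamma, u, Rinv_new = w[0], M[:, 0], M[:, 1:]    # gamma = sqrt(1 + s^T s)
    x_new = x - ((beta - a @ x) / gamma) * u        # Step 4
    return Rinv_new, x_new

# Check against a direct recomputation on random data (illustrative only)
rng = np.random.default_rng(2)
A, b = rng.standard_normal((6, 4)), rng.standard_normal(6)
Rinv = np.linalg.inv(np.linalg.qr(A)[1])
x = np.linalg.lstsq(A, b, rcond=None)[0]
a_new, beta_new = rng.standard_normal(4), rng.standard_normal()
Rinv, x = orthogonal_inverse_update(Rinv, x, a_new, beta_new)
A2, b2 = np.vstack([A, a_new]), np.append(b, beta_new)
assert np.allclose(x, np.linalg.lstsq(A2, b2, rcond=None)[0])
assert np.allclose(Rinv @ Rinv.T, np.linalg.inv(A2.T @ A2))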
2  A systolic array for orthogonal-inverse updating
A preview of the array is given in Figure 1. Briefly, the triangular array stores and updates R^{-1}, while the additional column to the right stores and updates the LS solution x_LS. New data are fed in to the left, one vector at a time, while the LS solutions run out to the right at the same rate. Note at once that the data flow is completely different from that in the original Gentleman-Kung array.

Implementing one single update. One single update of the above orthogonal-inverse scheme is easily implemented on a triangular array. Figure 2 exhibits a 10 × 10 example. A dot corresponds to an entry either in the triangular factor R^{-1} or the current LS solution x_LS (right-hand column). In the Gentleman-Kung array, one elementary processor is associated with each such entry. Here we adopt an alternative partitioning, which is related to the odd-even ordering in Jacobi-type algorithms [13, 6]. For the sake of simplicity, we can assume that one processor is associated with each 2 × 2 block of contiguous matrix elements [11]. Let it suffice to say that such a partitioning can be implemented with a 100% processor utilization, as was detailed e.g. in [8]. The reason for adopting this partitioning is that pipelining successive updates in a Gentleman-Kung type array is much harder and above all extremely messy. With the odd-even partitioning, successive updates are nicely separated and easy to trace. At a certain time step, only the framed 2 × 2 blocks (Figure 2) are `active'. Essentially, such `activity' is initiated on the main diagonal and then propagated to the blocks next outwards. In Figure 2a, e.g., all 2 × 2 blocks on the main diagonal corresponding to odd pivot indices are active (the pivot index is the row (column) number of the upper-left entry in such a 2 × 2 block). In Figure 2b, this has been propagated one step outwards. In Figure 2c, this first sequence has moved far enough to allow the next sequence to be generated, this time corresponding to even pivot indices. In Figure 2e, the next odd sequence again starts off, etc. We refer to the original paper by Stewart [13] for further details.
1. The first step in the orthogonal-inverse updating scheme is the matrix-vector multiplication s^T = −a^T R^{-1}, and this is readily implemented on the array. The a-vector is fed in as indicated in Figure 2d and propagated to the right. Meanwhile the product terms are accumulated and propagated upwards. The functionality of a 2 × 2 block is exhibited in Figure 3, which distinguishes the incoming a-components, the accumulated intermediate products and the stored R^{-1} entries. The s-components run out at the top end of the array. As for the right-hand column containing x_LS, we can adopt much the same functionality to compute β − a^T x_LS (used in Step 4 of the algorithm), if β is fed in from the bottom. We can already point out that also throughout the rest of our description, the operations in the right-hand column hardly differ from those in the triangle. The matrix-vector multiplication is completed after Figure 2i.
2. As soon as the s-components become available at the top end of the array, they are propagated downwards, towards the diagonal. The functionality of a 2 × 2 block is then exhibited in Figure 4. Similarly, β − a^T x_LS is propagated downwards in the right-hand column. The plane rotations Q_1 through Q_n in the second step can then be computed on the main diagonal. Rotation Q_1 is computed in Figure 2e as indicated, Q_2 in Figure 2g, etc. The functionality of a 2 × 2 block on the main diagonal is exhibited in Figure 5. Here we make use of an auxiliary variable propagated along the diagonal, which is initially equal to 1 (for Q_1) and equal to γ after Q_10 (Q_n), see Step 2 of the algorithmic description. Rotation Q_1 zeroes the first component of s, which then drops out, Q_2 zeroes the second component, etc. After Figure 2w, the s-vector is completely zeroed, while the diagonal variable equals γ, which will be used for the x_LS update in Step 4.

3. In the third step, the inverse factor R^{-1} is updated by making use of the computed rotations Q_1 through Q_n and an auxiliary vector which initially equals zero. Rotation Q_1 creates a fill-in in the first position of this vector, Q_2 creates a fill-in in the second position, etc. After Q_10, one ends up with the u-vector. The rotations which are computed on the main diagonal are therefore propagated upwards, while the auxiliary vector is propagated to the right. The functionality of a 2 × 2 block is exhibited in Figure 6.

4. Again the same functionality can be adopted for the right-hand column to perform the fourth step. Instead of a rotation

$$ \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} $$

one can now propagate the transformation

$$ \begin{bmatrix} 1 & 1 \\ -\dfrac{\beta - a^T x_{LS}}{\gamma} & -\dfrac{\beta - a^T x_{LS}}{\gamma} \end{bmatrix} $$

such that the x_LS in the last column is updated and furthermore copied into the outgoing vector, which is then output. Note that both β − a^T x_LS (propagated downwards) and γ (propagated along the diagonal) are by then available in the lower right corner. The updated least squares solution $\bar{x}_{LS}$ is completely available in Figure 2E.

Pipelining different updates. One complete update, as detailed in the previous section, requires O(n) time steps (from Figure 2e through 2E). An O(n^0) throughput can only be obtained if successive updates can be pipelined. Suppose we input a new data vector at each odd phase, in other words in Figures 2e, 2i, 2m, etc. Successive s-vectors are then distributed over the grid as depicted in Figure 7, where successive s-vectors are indicated with alternating symbols. It is seen that each framed 2 × 2 block has two s-components in its upper part. Similarly, the auxiliary vectors are distributed over the grid as depicted in Figure 8. It is seen that each framed 2 × 2 (off-diagonal) block also has two auxiliary vector components in its left part. The functionality of all the 2 × 2 blocks could then be as indicated in Figures 3 through 6. However, different updates slightly interfere, and in order to take care of this, we will have to introduce a few additional operations.

In order to make things clear, let us first pick out two arbitrary updates and see how they interfere. In Figure 9, a first update is still being completed, while a second one starts off in Figure 9e. Let us refer to the first update with a_1, β_1, s_1 and u_1, and to the second update with a_2, β_2, s_2 and u_2. The first update generates $\bar{R}^{-1}$ and $\bar{x}_{LS}$, the second one generates $\bar{\bar{R}}^{-1}$ and $\bar{\bar{x}}_{LS}$. As the first update is not yet completed, the computed matrix-vector multiplication in the second update will be $s_{\sharp 2}^T = -a_2^T R_\sharp^{-1}$, instead of $s_2^T = -a_2^T \bar{R}^{-1}$, where $R_\sharp^{-1}$ is some intermediate version of $\bar{R}^{-1}$. The matrices $\bar{R}^{-1}$ and $R_\sharp^{-1}$ are related as follows

$$ \begin{bmatrix} aux_\sharp & R_\sharp^{-1} \end{bmatrix} Q_\sharp = \begin{bmatrix} u_1 & \bar{R}^{-1} \end{bmatrix} $$

where $aux_\sharp$ is the current value of the auxiliary vector in the first update, and $Q_\sharp$ represents the rotations which are yet to be carried out in that same update. In our case $Q_\sharp = Q_{\sharp 7} Q_{\sharp 8} Q_{\sharp 9} Q_{\sharp 10}$.
From this it readily follows that

$$ \begin{bmatrix} -a_2^T \, aux_\sharp & \underbrace{-a_2^T R_\sharp^{-1}}_{s_{\sharp 2}^T} \end{bmatrix} Q_\sharp = \begin{bmatrix} -a_2^T u_1 & \underbrace{-a_2^T \bar{R}^{-1}}_{s_2^T} \end{bmatrix} $$

so that the vector $s_2^T$ can be recovered by applying the rotations $Q_\sharp$ onto $[-a_2^T aux_\sharp \;\; s_{\sharp 2}^T]$. The trick is thus to compute $-a_2^T R_\sharp^{-1}$ as well as $-a_2^T aux_\sharp$, and then afterwards apply the $Q_\sharp$ rotations to get $s_2$. This is a computational overhead of course, but it allows for an efficient pipelining of successive updates, so that the throughput is O(n^0).

In Figure 9, the vector-vector product $-a_2^T aux_\sharp$ is computed along with the matrix-vector product. Intermediate products are indicated in Figures 9e through 9g. The functionality of a 2 × 2 block is exhibited in Figure 10, in substitution for Figure 3. The resulting value $-a_2^T aux_\sharp$ is then propagated downwards, together with the s-components. Rotations $Q_{\sharp 7}$ through $Q_{\sharp 10}$ are seen to be applied to $[-a_2^T aux_\sharp \;\; s_{\sharp 2}^T]$ in Figures 9h, 9j, 9l and 9n respectively. The functionality of a 2 × 2 block is then exhibited in Figure 11, in substitution for Figures 4 and 6.
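This recovery can be checked numerically. The sketch below (Python/NumPy; the helper name, the split point j and the random test data are illustrative choices, not the paper's) runs Steps 1 through 3 of a first update while recording the rotations, stops it at an intermediate state, and then verifies that applying the remaining rotations to the "wrong" product indeed yields the correct one.

import numpy as np

def inverse_update_rotations(Rinv, a):
    # Steps 1-3 of one orthogonal-inverse update: return the rotation
    # parameters (k, c, s) and the successive states of [aux, R^{-1}].
    n = Rinv.shape[0]
    w = np.concatenate(([1.0], -(a @ Rinv)))        # [1, s^T]
    M = np.hstack((np.zeros((n, 1)), Rinv))         # [0, R^{-1}]
    rotations, states = [], []
    for k in range(1, n + 1):
        rho = np.hypot(w[0], w[k])
        c, s = w[0] / rho, w[k] / rho
        w[0], w[k] = rho, 0.0
        col0, colk = M[:, 0].copy(), M[:, k].copy()
        M[:, 0], M[:, k] = c * col0 + s * colk, -s * col0 + c * colk
        rotations.append((k, c, s))
        states.append(M.copy())
    return rotations, states

rng = np.random.default_rng(3)
n = 4
Rinv = np.linalg.inv(np.triu(rng.standard_normal((n, n))) + 3.0 * np.eye(n))
a1, a2 = rng.standard_normal(n), rng.standard_normal(n)

rotations, states = inverse_update_rotations(Rinv, a1)
j = 2                                               # suppose Q_1 ... Q_j applied so far
aux_s, Rinv_s = states[j - 1][:, 0], states[j - 1][:, 1:]   # intermediate [aux, R^{-1}]
u1, Rinv_bar = states[-1][:, 0], states[-1][:, 1:]          # final [u_1, updated R^{-1}]

# The 'wrong' product, computed against the intermediate inverse factor
v = np.concatenate(([-(a2 @ aux_s)], -(a2 @ Rinv_s)))
# Apply the remaining rotations Q_{j+1} ... Q_n
for k, c, s in rotations[j:]:
    v[0], v[k] = c * v[0] + s * v[k], -s * v[0] + c * v[k]

assert np.allclose(v[0], -(a2 @ u1))          # = -a_2^T u_1
assert np.allclose(v[1:], -(a2 @ Rinv_bar))   # = s_2^T, the correct product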
As for the right-hand column, one can similarly verify that first $\beta_2 - a_2^T x_{LS}$ is accumulated from the bottom to the top, instead of $\beta_2 - a_2^T \bar{x}_{LS}$. The correct value is then obtained as follows

$$ \beta_2 - a_2^T \bar{x}_{LS} \;=\; \beta_2 - a_2^T \left( x_{LS} - \frac{\beta_1 - a_1^T x_{LS}}{\gamma_1} \, u_1 \right) \;=\; \beta_2 - a_2^T x_{LS} + \left( -a_2^T u_1 \right) \left( -\frac{\beta_1 - a_1^T x_{LS}}{\gamma_1} \right) $$

The value that runs out of the triangular array into the right-hand column (Figure 9n-o) is known to be $-a_2^T u_1$, and as $-(\beta_1 - a_1^T x_{LS})/\gamma_1$ happens to be (part of) the upgoing transformation in the right-hand column, the functionality in the right-hand column is again seen to be roughly the same as in the triangular array.

In Figure 9, it was outlined how two (and only two) updates interfere. With one update starting in each odd phase, the situation is of course not any different. Several vector-vector products are then computed along with each matrix-vector multiplication, etc. Remarkably enough, the functionality of the 2 × 2 blocks is extremely simple. From Figures 7-8 it followed that each 2 × 2 block has two s-components in its upper part, and two auxiliary vector components to its left. Additionally it will now have one accumulated vector-vector product in its upper left corner. All 2 × 2 blocks therefore always perform exactly the same operations, as indicated in Figures 10-11 (Figure 5 for the blocks on the diagonal). The operations in the right-hand column are essentially the same. Finally, from this it can also be seen that one third of these operations are essentially overhead, so that the overall efficiency is roughly 67%.
3  Conclusion

A triangular processor array has been developed for implementing recursive least squares computations. The throughput is O(n^0), which means that data vectors are fed in at a rate which is independent of the problem size. The corresponding least squares solutions then run out of the array at the same rate. The overall efficiency is seen to be roughly 67%. As far as we know, this array is the first to implement a complete RLS algorithm in a stable and efficient manner.

Acknowledgement: This research work was carried out at the ESAT laboratory of the Katholieke Universiteit Leuven, in the framework of the Community Research programme ESPRIT Basic Research Action Nr. 3280, and was partially sponsored by the European Commission. Marc Moonen is a senior research assistant with the N.F.W.O. (Belgian National Fund for Scientific Research).
References

[1] W.M. Gentleman, H.T. Kung, Matrix triangularization by systolic arrays. Real-Time Signal Processing IV, Proc. SPIE, Vol. 298, 1982, pp. 19-26.
[2] P.E. Gill, G.H. Golub, W. Murray, M.A. Saunders, Methods for modifying matrix factorizations. Mathematics of Computation, Vol. 28, No. 126, 1974, pp. 505-535.
[3] G.H. Golub, Numerical methods for solving linear least squares problems. Numer. Math. 7, 1965, pp. 206-216.
[4] G.H. Golub, C.F. Van Loan, Matrix Computations. North Oxford Academic Publishing Co. / Johns Hopkins University Press, 1983.
[5] J. Hudson, T.J. Shepherd, J.G. McWhirter, Parallel weight extraction by a systolic least squares algorithm. Advanced Algorithms and Architectures for Signal Processing IV, Proc. SPIE, Vol. 1152, 1989.
[6] F.T. Luk, A triangular processor array for computing singular values. Lin. Alg. Appl. 77, 1986, pp. 259-273.
[7] J.G. McWhirter, Recursive least squares minimization using a systolic array. Real-Time Signal Processing VI, Proc. SPIE, Vol. 431, 1983, pp. 105-112.
[8] M. Moonen, J. Vandewalle, A hexagonally connected processor array for Jacobi-type matrix algorithms. Electronics Letters 26, No. 6, 1990, pp. 400-401.
[9] M. Moonen, J. Vandewalle, Recursive least squares with stabilized inverse factorization. Signal Processing 21, No. 1, 1990.
[10] M. Morf, T. Kailath, Square-root algorithms for least-squares estimation. IEEE Trans. Automat. Control 20, 1975, pp. 487-497.
[11] D.P. O'Leary, G.W. Stewart, Data-flow algorithms for parallel matrix computations. Comm. ACM 28, 1985, pp. 840-853.
[12] C.T. Pan, R.J. Plemmons, Least squares modifications with inverse factorization: parallel implications. J. Computational and Applied Mathematics, Vol. 27, No. 1-2, 1989, pp. 109-127.
[13] G.W. Stewart, A Jacobi-like algorithm for computing the Schur decomposition of a nonhermitian matrix. SIAM J. Sci. Stat. Comput., Vol. 6, No. 4, 1985, pp. 853-863.
CAPTIONS

Figure 1: Outline of the RLS array.
Figure 2: Data flow for one update.
Figure 3: Functionality for a 2 × 2 block for one update, (1) matrix-vector multiplication.
Figure 4: Functionality for a 2 × 2 block for one update, (2) data movement.
Figure 5: Functionality for a 2 × 2 block on the main diagonal, for one update, (3a) R^{-1}-update.
Figure 6: Functionality for an off-diagonal 2 × 2 block, for one update, (3b) R^{-1}-update.
Figure 7: Distribution of successive s-vectors over the array.
Figure 8: Distribution of auxiliary vectors over the array.
Figure 9: Data flow and interaction of successive updates.
Figure 10: Complete functionality for a 2 × 2 block, (1) matrix-vector multiplication.
Figure 11: Complete functionality for a 2 × 2 block, (2) R^{-1}-update.
[Figure 1: new data enter the triangular array at the left; the LS solutions leave at the right. Diagram not reproduced.]
[Figure 2, parts 1 and 2: snapshots a through E of the 10 × 10 array during one update; the rotations Q_1 through Q_10 appear on the main diagonal. Diagrams not reproduced.]
[Figures 3 and 4: 2 × 2 block functionality for the matrix-vector multiplication and the data movement. Diagrams not reproduced.]
[Figure 5: diagonal 2 × 2 block: (1) compute rotation parameters and update; (2) propagate; (3) propagate rotation parameters cos θ, sin θ. Diagram not reproduced.]
[Figure 6: off-diagonal 2 × 2 block: (1) input rotation parameters cos θ, sin θ; (2) update with the rotation [cos θ, −sin θ; sin θ, cos θ]; (3) propagate; (4) propagate rotation parameters cos θ, sin θ. Diagram not reproduced.]
[Figures 7 and 8: distribution of successive s-vectors and of auxiliary vectors over the array. Diagrams not reproduced.]
[Figure 9: snapshots a through p of two interfering updates; the rotations Q_{♯7} through Q_{♯10} appear on the diagonal. Diagrams not reproduced.]
[Figure 10: complete 2 × 2 block functionality for the matrix-vector multiplication. Diagram not reproduced.]
[Figure 11: complete 2 × 2 block functionality for the R^{-1}-update: (1) input rotation parameters cos θ, sin θ; (2) update; (3) propagate; (4) propagate rotation parameters cos θ, sin θ. Diagram not reproduced.]