Proceedings of the American Control Conference, Albuquerque, New Mexico, June 1997. 0-7803-3832-4/97/$10.00 © 1997 AACC
An LFT Approach to Parameter Estimation

Greg Wolodkin†, Sundeep Rangan‡ and Kameshwar Poolla*
Abstract

In this paper we consider a unified framework for parameter estimation problems which arise in a system identification context. In this framework, the parameters to be estimated appear in a linear fractional transformation (LFT) with a known constant matrix M. Through the addition of other nonlinear or time-varying elements in a similar fashion, this framework is capable of treating a wide variety of identification problems, including structured nonlinear systems, linear parameter-varying (LPV) systems, and all of the various parametric linear system model structures. In this paper, we consider both output error and maximum likelihood (ML) cost functions. Using the structure of the problem, we are able to compute the gradient and the Hessian directly, without inefficient finite-difference approximations. Since the LFT structure is general, it allows us to consider issues such as identifiability and persistence of excitation for a large class of model structures, in a single unified framework. Within this framework, there is no distinction between "open-loop" and "closed-loop" identification.

Keywords: System identification, parameter estimation

1. Introduction

Parameter estimation plays a central role in many identification algorithms. In this paper we consider a unified framework for parameter estimation problems which arise in a system identification context. In our framework, the parameters to be estimated appear as a linear fractional transformation (LFT) with a known constant matrix M. Through the addition of other nonlinear or time-varying elements in LFT with M, our framework is capable of treating a wide variety of identification problems, including structured nonlinear systems, linear parameter-varying (LPV) systems, and all of the various parametric linear system model structures. Once the parameter estimation problem is cast into the LFT framework it can be solved generically.

In this paper we focus our attention on the output error cost function. The results are readily generalized to include the maximum likelihood (ML) cost function. The use of ML estimators is motivated by their attractive and well-known asymptotic properties (under sufficiently informative experimentation) such as unbiasedness, consistency and efficiency. Further, we shall restrict our attention to the case of Gaussian noise statistics. For nonlinear problems, we will require that the noise is modest enough that all signals retain approximately Gaussian statistics.

The principal analytical advantage of our framework is that it allows for a unified treatment of important issues such as identifiability, persistence of excitation, robustness, and experiment design. Existing results for standard model structures such as ARX, output-error, ARMAX, or Box-Jenkins (see [4]) can be recovered as special cases of the general treatment. We have begun to explore some of these issues, but much work remains to be done. We should like to draw attention to the fact that there is no need to distinguish between "open-loop" and "closed-loop" identification in this framework. Indeed, this distinction becomes artificial in the LFT framework.

The traditional impediment to using ML parameter estimation (even in the Gaussian noise case) lies in the associated nonlinear programming problem of maximizing the log-likelihood function. The necessary gradient and Hessian computations are inefficient and, even worse, ugly. Also, these computations are problem dependent and not particularly portable. Thus, minor modifications in the model structure require the user to recompute log-likelihood functions, gradients and Hessians. Existing tools usually require the user to specify the log-likelihood function, its gradients and Hessians, or approximate these using finite-difference or secant techniques. In this context the principal advantage of the LFT framework is that it allows us to analytically compute gradient and Hessian information efficiently and without significant user input.

The remainder of the paper is organized as follows. In Section 2 we introduce the LFT structure and discuss several elementary results associated with it. Identifiability is the subject of Section 3, while Section 4 contains a discussion of computational issues including gradient/Hessian computation, and choice of initial seeds.

Supported in part by the National Science Foundation under Grant ECS 95-09539.
† Department of Aerospace Engineering and Mechanics, University of Minnesota, Minneapolis, MN 55455, [email protected], corresponding author.
‡ Department of Electrical Engineering, University of California, Berkeley, CA 94720.
* Department of Mechanical Engineering, University of California, Berkeley, CA 94720, (510) 642-4642, [email protected].
2. Parameter Estimation using LFTs

We begin with an arbitrary interconnection of integrators, delays of various durations, gains, unknown (to be estimated) parameters, static nonlinearities, (known) time-varying gains, (known) input signals, (unmeasured) noise signals, measured output signals, and summing junctions. The unknown parameters appearing in this interconnection may recur at altogether different points in the interconnection. As is widely employed in robust control problem formulations, we may manipulate this interconnection in order to represent it in terms of an LFT. The LFT representation consists of a known, constant matrix M interconnected with several additional blocks as shown in Fig. 1. Here, u is the applied input signal, y are the observed outputs, and e is a unit-variance white noise process. The static nonlinearities are collectively designated as f(·) while the time-varying gains are denoted by g(t). All integrators and delays are gathered into a single block, and similarly for θ ∈ R^{nw×nz}, the (matrix-valued) parameters to be identified. Implicit in the LFT framework for parameter estimation is the fact that we restrict our attention to finite-dimensional models.

Figure 1: The general LFT framework.

We briefly mention the notion of "folding" the parameters into the model, through use of the star product.

Definition 2.1 Fix a matrix θ ∈ R^{nw×nz} and consider a matrix M partitioned as

    M = [ M11  M12 ]
        [ M21  M22 ]                                         (1)

where M22 ∈ R^{nz×nw}. The interconnection of M and θ as shown in Fig. 2 is said to be well-posed if (I − θM22) is invertible. In this case, the star product of M and θ is defined to be

    M * θ = M11 + M12 (I − θM22)^{-1} θ M21                  (2)

Figure 2: LFT interconnection of M with θ.

Remark 2.2 In the remainder of this paper, we will make ubiquitous use of star products. In doing so, we will implicitly assume that the statements made are valid only if the associated interconnection is well-posed. We should remark that for a fixed matrix M, the set of matrices θ for which the interconnection fails to be well-posed has zero measure.

The structure of the model in Fig. 1 has many useful properties. For example, for a given input signal u° and parameter values θ°, the model can be readily linearized about the resulting trajectory to yield a linear time-varying system.

Lemma 2.3 Consider an LFT consisting of an interconnection matrix M and a nonlinear block f(·). Let u°, q° denote the inputs to M and f respectively. The interconnection can be linearized about the input trajectory u° by replacing the nonlinear block with the time-varying gain

    g(t) = (∂f/∂q)(q°(t))                                    (3)

Figure 3: LFT interconnection of M with nonlinear block.

Remark 2.4 For many nonlinearities the gradient can be computed analytically. In some cases, such as table lookup or other non-parametric forms, numerical differencing may be required to obtain a linearization. Once obtained, however, cost functions and their gradients can be computed analytically with no further differencing.
Another useful property of the LFT structure lies in the analytical computation of various gradients, in particular the gradients of various cost functions with respect to the parameters θ.

Lemma 2.5 Consider an LFT consisting of an interconnection matrix M and a parameter matrix θ. Fix θ° and suppose the interconnection is well-posed at θ°. Let N = M * θ°.

(a) The derivative of the star product with respect to the (i,j)th entry of θ is

    ∂(M * θ)/∂θij |θ=θ° = M12 (I − θ°M22)^{-1} e_i e_j' (I − M22 θ°)^{-1} M21

(b) Fix an input signal u independent of θ and let y(θ) be the resulting output. Then,

    ∂y/∂θij |θ=θ° = M12 (I − θ°M22)^{-1} v                   (4)

where

    v = e_i e_j' z                                           (5)

Here e_i denotes the ith standard Euclidean basis vector, and z is the state vector associated with the upper loop of the LFT interconnection. Thus gradient computation in the single-parameter case requires one time response computation to obtain the model state z and output y, one additional time response to obtain ∂y/∂θij, and one inner product to obtain the gradient. In some cases (e.g. for LTI systems) an adjoint system can be constructed which computes all of the ∂y/∂θij in one time response computation.

3. Identifiability

In this section we discuss identifiability aspects for the LFT model structure described in the previous section. The study of identifiability is fundamental to system identification problems. Loosely speaking, a model structure is identifiable if its associated parameterization contains no redundancy. The importance of characterizing identifiability for a given model structure cannot be overstated, since the success of any identification scheme depends heavily on a one-to-one mapping from θ to input-output behavior. For the following, we will limit our discussion to linear time-invariant model structures. Later in this section, we will lift this restriction and discuss identifiability in the general case.

Definition 3.1 Two parameter values θ° and θ are called indistinguishable if their likelihood functions p_y(y; θ°) and p_y(y; θ) are identical for every input signal u of finite support.

Definition 3.2 Fix the parameter value θ°. The indistinguishable manifold at θ°, I_M(θ°), associated with the model structure M is the set

    I_M(θ°) = { θ ∈ Θ : θ and θ° are indistinguishable }     (6)

Definition 3.3 Fix the parameter value θ°. The unidentifiable subspace at θ°, UI(θ°), associated with the model structure M is the tangent space of I_M at θ° (if it exists). The model structure M is said to be locally identifiable at θ° if UI(θ°) = 0. The model structure M is said to be globally identifiable if it is locally identifiable at all θ ∈ Θ except for a set of measure zero.

In the event a model structure is not globally identifiable, it becomes necessary to confine the nonlinear programming appropriately. We will therefore require a characterization of the subspace UI(θ°). Recent results include a characterization of UI(θ°) in terms of the matrix N = M * θ; however, due to space limitations it has been omitted here.

4. Numerical Aspects

In this section we present formulae for gradient and Hessian computation. Due to space considerations, we consider the output error case only. Fix a given record of input-output data u°, y°, together with an LFT model as in Fig. 1 with nominal parameters θ°. Let y denote the model output. Other intermediate signals we require include the model state x, and the input to the nonlinear block q. We begin by defining the two-norm output error cost

    J = || y° − y ||₂²                                       (7)

The gradient of J with respect to the (i,j)th entry of θ can be written as

    ∂J/∂θij = −2 (y° − y)' (∂y/∂θij)                         (8)

In the general case, ∂y/∂θij is computed using a two-step process. First the model is linearized as in Lemma 2.3 to obtain a linear time-varying system. The desired derivatives are then computed using Lemma 2.5. Through the use of an adjoint system, all of the gradients can be computed at once. The Hessian of J is approximated as
    ∂²J/∂θij ∂θkl ≈ 2 (∂y/∂θij)' (∂y/∂θkl)                   (9)

Here again the derivatives are computed as before. In this case, however, there is no way to make use of an adjoint filter. The search direction associated with the approximate Hessian (Newton direction) is simply the product of the inverse Hessian with the gradient. This can be obtained through solution of a least-squares problem as follows. Compute the derivatives of y and store them as the columns of S. For vector-valued outputs, a reshaping is necessary to produce a single column vector for each parameter θij.
    S = [ ∂y/∂θ1   ...   ∂y/∂θnp ]                           (10)

In this notation, the Hessian is given by H = S'S. Similarly, the gradient can be written as g = S'(y° − y). Thus the Newton direction

    δ = H^{-1} g = (S'S)^{-1} S'(y° − y)                     (11)

which is the least-squares solution of Sδ = y° − y. To summarize briefly, gradient and Hessian computation requires one to

1. Compute one time response to obtain x, y, q.
2. Linearize the system about u° as in Lemma 2.3 to obtain a linear time-varying system.
3. For gradient computation, compute one time response as in Lemma 2.5; for Hessian computation, compute np time responses as in Lemma 2.5.
4. Compute the desired search direction as above.

5. An Example

Figure 4: Example system.

Consider the following identification problem, consisting of an LTI system with two unknown parameters interconnected with a square-law nonlinearity. Specifically, suppose the LTI system is written in state-space form as

    H = [ −α  −β | 1 ]
        [  1   0 | 0 ]
        [  1   0 | 0 ]                                       (12)

Here H has two states, one input, and one output. The output of the LTI system is then squared and scaled via the nonlinearity

    N(v) = γ v²                                              (13)

Thus there are three parameters, two of which are repeated. It is a simple exercise to verify that the LFT interconnection can be written as a constant 9 × 9 matrix M with entries in {0, ±1}, with the parameters α and β each repeated in the parameter block together with γ, as depicted in Fig. 5.

Figure 5: Example LFT interconnection.

In this example, our "experimental" data record will consist of 100 samples of white noise for the input u°. This signal will be applied to the "true" system, whose output will be corrupted with a small amount of white noise to generate y°. The "true" system has α = 0.8, β = 0.3, and γ = 1.0, and its output has been corrupted with noise of variance 0.1. Fig. 6 shows the output of the true system, together with a noise-corrupted version.

The gradient-descent algorithm is started at the initial condition α = 0.7, β = 0.2, γ = 2.0, at which point the relative error

    e_rel = || y° − y ||₂ / || y° ||₂

is equal to 0.862. After 5 gradient iterations, the relative cost is reduced to 0.051, better but still substantially larger than the error associated with the output noise. The parameter estimates at this point are 0.743, 0.272, and 1.993 respectively. Starting at the same initial condition, after 5 Newton iterations we find the relative cost reduced to 0.0033, the error associated with the output noise. Parameter estimates are 0.798, 0.299, and 1.025 respectively. The results are summarized in Table 1.
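The example can be reproduced in outline. The sketch below is our own reconstruction, not the authors' code: we assume the realization x(t+1) = [−α, −β; 1, 0] x(t) + [1; 0] u(t) with the first state squared and scaled by γ, and, for self-containment, we build the sensitivity matrix S of (10) by finite differences rather than by the analytic time responses of Lemma 2.5. The Newton step is then the least-squares solution of Sδ = y° − y, as in (11).

```python
import numpy as np

def simulate(theta, u):
    """Simulate the assumed example model: 2-state LTI followed by N(v) = g*v^2."""
    a, b, g = theta
    x1 = x2 = 0.0
    y = np.empty_like(u)
    for t, ut in enumerate(u):
        v = x1                                 # LTI output (assumed C = [1 0])
        y[t] = g * v**2                        # square-law nonlinearity (13)
        x1, x2 = -a * x1 - b * x2 + ut, x1     # A = [[-a, -b], [1, 0]], B = [1, 0]'
    return y

def newton_step(theta, u, y0, h=1e-6):
    """One Gauss-Newton step: delta solves the least-squares problem
    S delta ~= y0 - y, so H = S'S and g = S'(y0 - y) as in the text."""
    y = simulate(theta, u)
    S = np.column_stack([
        (simulate(theta + h * e, u) - y) / h   # finite-difference column dy/dtheta_i
        for e in np.eye(len(theta))])
    delta, *_ = np.linalg.lstsq(S, y0 - y, rcond=None)
    return theta + delta

rng = np.random.default_rng(1)
u = rng.standard_normal(100)                   # 100 samples of white-noise input
y0 = simulate(np.array([0.8, 0.3, 1.0]), u)    # "true" system alpha, beta, gamma
y0 += np.sqrt(0.1) * rng.standard_normal(100)  # output noise of variance 0.1

theta = np.array([0.7, 0.2, 2.0])              # initial seed from the paper
for _ in range(5):
    theta = newton_step(theta, u, y0)
```

Under these assumptions a few steps should drive the relative error ||y° − y||₂/||y°||₂ down toward the noise floor, which is the behavior the Newton iteration exhibits in Table 1.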
Figure 7: Prediction error for Newton and gradient methods.
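The relative error quoted above is a direct norm ratio; a minimal sketch (the signal values here are illustrative, not the paper's data):

```python
import numpy as np

def relative_error(y0, y):
    # e_rel = ||y0 - y||_2 / ||y0||_2
    return np.linalg.norm(y0 - y) / np.linalg.norm(y0)

y0 = np.array([1.0, 2.0, -1.0])
assert relative_error(y0, y0) == 0.0                         # perfect fit
assert np.isclose(relative_error(y0, np.zeros(3)), 1.0)      # trivial zero predictor
```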
Table 1: Comparison of gradient and Newton methods.

                            e_rel     α        β        γ
    Initial seed            0.862    0.700    0.200    2.000
    Gradient (5 iterations) 0.051    0.743    0.272    1.993
    Newton (5 iterations)   0.0033   0.798    0.299    1.025

6. Summary

In this paper we have considered an LFT framework for parameter estimation problems arising in a system identification context. Since the LFT structure may contain time-varying and/or nonlinear blocks, it is possible to treat a wide variety of identification problems with this one structure. Although the results presented have focused on the output error cost function, similar results have been obtained in a maximum likelihood setting. Though the formulae are somewhat more involved, they are again completely general and can be computed analytically. A simple example has been presented to illustrate the concept.

References

[1] P. E. Gill, W. Murray, and M. A. Saunders. Practical Optimization. Academic Press, 1981.
[2] J. E. Dennis, Jr. and R. B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, 1983.
[3] R. L. Kosut, M. Lau, and S. Boyd. Identification of systems with parametric and nonparametric uncertainty. In Proceedings of the 1990 American Control Conference, pages 2412-2417, San Diego, CA, 1990.
[4] L. Ljung. System Identification: Theory for the User. Prentice-Hall, 1987.
[5] D. G. Luenberger. Introduction to Linear and Nonlinear Programming. Addison-Wesley, 1973.
[6] A. K. Packard and J. C. Doyle. The complex structured singular value. Automatica, 29(1):71-109, 1993.
[7] R. Smith and J. Doyle. Towards a methodology for robust parameter identification. In Proceedings of the 1990 American Control Conference, pages 2394-2399, San Diego, CA, 1990.
[8] R. Tempo and G. Wasilkowski. Maximum likelihood estimators and worst case optimal algorithms for system identification. Int. J. of Control, 10:265-270, 1988.
Figure 6: True and noise-corrupted outputs.