IEEE Conference on Intelligent Control, Limassol, Cyprus, June 2005

Residual Generation from Principal Component Models for Fault Diagnosis in Linear Systems
Part I: Review of static systems

Janos Gertler, Fellow, IEEE (1)

Abstract. First, some fundamental concepts of principal component analysis are summarized. This is followed by a review of our recently developed and published results on the generation of enhanced (structured) residuals from PC models, both by algebraic transformation and by direct structured modeling. Recent results on residual optimization and extensions to dynamic systems are discussed in Part II of the paper.

1. INTRODUCTION

In model-based fault detection and diagnosis, the observed plant variables are checked against a mathematical model of the system. This concept is referred to as analytical redundancy. Any discrepancy is expressed as residuals; the analysis of these residuals then leads to decisions concerning the presence and location of faults. The mathematical model may arise from a first-principles description of the system, or from system identification using empirical data.

The two main approaches within the analytical redundancy framework are consistency (parity) relations [1,2] and diagnostic observers [3]. Consistency relations are suitably rearranged equations of the plant's input-output model. These are subjected to transformations to yield residuals enhanced for fault isolation; such residuals possess structured or directional properties with respect to the various faults [2]. In the parity-space formulation of the same fundamental idea, residuals are obtained as projections of the observation vector onto "parity vectors", orthogonal to the space of fault-free variables [4]. While the space spanned by the parity vectors (the parity space) is defined by the system, with the appropriate choice within this space the desired residual properties may be achieved.

Principal component analysis (PCA) [5] has been applied successfully in the supervision of complex chemical process systems. By revealing linear relations among the process variables, PCA allows the representation of large systems with models of significantly reduced dimensionality. PCA has also been suggested as a tool for the identification of faulty sensors, by reconstruction via iterative substitution and optimization [6].

-----
(1) Janos Gertler is with the School of Information Technology, George Mason University, Fairfax, VA 22030, USA; (703) 993-1604; [email protected]

As has been shown recently [7,8], principal component models can also serve as a basis for analytical-redundancy-type fault detection and diagnosis. The PC directions (eigenvectors) associated with the last principal components are, in fact, parity vectors, and the space they span (the residual space of the PC model) is the parity space. Thus the residual enhancement techniques developed in the analytical redundancy framework may be applied to PC models as well.

Structured residuals are powerful tools in fault isolation [2]. In a structured design, each residual responds to a specific subset of faults, and thus a fault-specific subset of the residuals responds to each fault, resulting in a unique fault code. Structured residuals may be obtained from consistency relations, or from a full PC model, by algebraic transformation. But they can also be generated by first specifying residual structures and then obtaining subsystem models, each corresponding to a residual, by system identification or by PC modeling [7].

In this paper, we first summarize some fundamental concepts of principal component analysis. This is followed by a review of our recently developed and published results on the generation of enhanced (structured) residuals from PC models, both by algebraic transformation and by direct structured modeling. Recent results on residual optimization and extensions to dynamic systems are discussed in the second part of the paper.

2. PRINCIPAL COMPONENT MODELS

2.1 Full PC models

Consider a linear static (steady-state) system with n observed (measured or manipulated) variables x°=[x°1 ... x°n]' and k unobserved disturbances z=[z1 ... zk]'. The variables and the disturbances are centered (their mean value is subtracted) and normalized (divided by their standard deviation). The superscript ° signifies that the observations are free from faults. Assume that there are m linear relations among the variables, so that

B x°(t) + D z(t) + v(t) = 0     (1)

where B and D are m×n and m×k matrices and v=[v1 ... vm]' represents the effect of noise in the equations. In many cases, it is meaningful to decompose x° into outputs y°=[y°1 ... y°m]'=[x°1 ... x°m]' and inputs u°=[u°1 ... u°n-m]'=[x°m+1 ... x°n]'; then (1) may be written as

y°(t) = A u°(t) + D z(t) + v(t)     (2)

where A is an m×(n-m) matrix and B = [-I A]. If k<m then the disturbances can be eliminated by a transformation so that (1) becomes

B* x°(t) + v*(t) = 0     (3)

where B* is (m-k)×n and v*=[v*1 ... v*m-k]'. Such a transformation is equivalent to using k of the m equations to express the k disturbances and substituting them into the remaining m-k equations.

Now consider a training data set consisting of N observations x°(t), t=1...N, of the x° vector which are guaranteed to be free of faults. Define X=[x°(1) ... x°(N)]. By singular value decomposition, the X matrix may be expressed as

X = Σi=1...n σ*i qi γi'     (4)

where σ*i are the singular values of X (the non-negative square roots of the eigenvalues of XX') and qi are the eigenvectors of XX', while γi are the first n eigenvectors of X'X. An observation x°(t) is expressed as

x°(t) = Σi=1...n σ*i qi γit = Σi=1...n qi pi(t)     (5)

where

pi(t) = σ*i γit = qi' x°(t)     (6)

is the projection of the observation on the eigenvector qi. Because of the m-k linear relations among the data, the last m-k singular values are zero if there is no noise, and they are small relative to the first n-m+k singular values if there is moderate noise. Thus (5) may be written as

x°(t) = Σi=1...n-m+k qi qi' x°(t) + ε(t) = QM QM' x°(t) + ε(t)     (7)

where

ε(t) = Σi=n-m+k+1...n qi qi' x°(t) = QR QR' x°(t)     (8)

with

QM = [q1 ... qn-m+k],   QR = [qn-m+k+1 ... qn]

Here ε(t) is the residual, expressing the effect of noise. Equations (7) and (8) are the principal component (PC) representation of the data; the eigenvectors qi are the principal directions ("loadings") and the projections pi(t) are the "scores". The first n-m+k eigenvectors (the columns of QM) span the model (representation) space, the subspace where the noise-free observations exist, while the remaining m-k eigenvectors (the columns of QR) span the orthogonal complement of the representation space, the residual space.

The PC residual in (8) is described in the n-dimensional space of the variables, as the sum of the m-k n-dimensional vectors qn-m+k+1 ... qn, each with a scale factor (score) pi(t) = qi' x°(t). Though these vectors are n-dimensional, they are confined to the much smaller (m-k)-dimensional subspace. It will be simpler to work with the (m-k)-dimensional equivalent of this residual, expressed directly in that subspace. By a change of basis,

e°(t) = QR' ε(t) = QR' x°(t)     (9)

Here each of the m-k elements of e°(t) is a scalar that is the projection of x°(t) on one of the eigenvectors spanning the residual space.
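As an illustration of (4)-(9), the following numpy sketch (ours, not part of the original development; the toy data generator and all names are illustrative assumptions, with k=0 and centered, normalized variables presumed) builds the PC model from fault-free training data and evaluates the residual (9):

    import numpy as np

    def pc_model(X, m):
        # X: n x N fault-free training data; m: number of linear relations (k=0).
        # Equation (4): X = sum_i sigma*_i q_i gamma_i'; Q holds the eigenvectors q_i.
        n, N = X.shape
        Q, svals, _ = np.linalg.svd(X, full_matrices=False)
        QM = Q[:, :n - m]     # model (representation) space, eq. (7)
        QR = Q[:, n - m:]     # residual space, eq. (8)
        return QM, QR, svals

    def residual(QR, x):
        # Equation (9): the (m-k)-dimensional residual e(t) = QR' x(t).
        return QR.T @ x

    # toy system: n=4 variables obeying m=2 noisy linear relations y = A u
    rng = np.random.default_rng(0)
    N = 500
    u = rng.standard_normal((2, N))
    A = np.array([[0.7, -0.3], [0.2, 0.9]])
    X = np.vstack([A @ u, u]) + 0.01 * rng.standard_normal((4, N))
    QM, QR, svals = pc_model(X, m=2)
    print(svals)                  # the last m singular values are near zero
    print(residual(QR, X[:, 0]))  # small for a fault-free observation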

Relationship between B and QR'. Consider equations (1) and (9), with no noise and no disturbances. Then B x°(t)=0 and QR' x°(t)=0. Observe that the rows of both B and QR' are orthogonal to any x°(t); that is, both sets of rows span the same residual space. This implies that

B = M QR'     (10)

where M is a full-rank square matrix. This relationship is important because it allows drawing conclusions about the structure and behavior of QR' by studying those of B. Note that the PC representation of the system is not unique; the m linear relations in the data only determine the residual space, and any m orthonormal vectors spanning this space qualify as rows of QR'. The actual outcome of the PC model is affected by the noise. With B = [-I A],

A = -(QR1')^-1 QR2'     (11)

where QR' = [QR1' QR2'] and QR1' is square. This relationship allows the estimation of the explicit system parameters (matrix A) from the eigenvectors associated with the last principal components. If noise is present in the data then the relationship becomes approximate [9].

2.2 Partial PC models

Consider again system (1) but, for the sake of simplicity, without disturbances. There are m linear relations among the n variables, so the model space is n-m dimensional and the residual space is m dimensional. Let us select one of the variables xj, use one of the equations to express it, and substitute it into the remaining equations. The resulting system has n-1

variables and m-1 linear relations. (This is the same as keeping the original system and re-classifying xj as a disturbance.) If a PC model is applied to this reduced system, its model space is still n-m dimensional but its residual space is m-1 dimensional. This is a partial PC representation. Two special situations should be mentioned:

a. The variable xj is present in only one equation. Then the reduced system is obtained by simply omitting this equation.

b. When eliminating xj, one or more additional variables also get eliminated (this happens if there are collinear columns in matrix B). This reduces the dimension of the model space, without affecting that of the residual space.

Now select two variables, xj and xg, and use two of the equations to eliminate them (or keep the full set of equations but re-classify these two variables as disturbances). Then there are n-2 variables in the reduced system, with m-2 linear relations among them. The model space is still of dimension n-m (or less, see b. above) while the dimension of the residual space is m-2.

In general, by eliminating h<m variables, we arrive at a reduced system of n-h variables and m-h linear relations. In the PC model of the reduced system, the dimension of the model space is n-m (or less, see b. above) while the dimension of the residual space is m-h. Two cases are of particular interest:

1. h=m-1. This is the maximum possible elimination. The residual space is one-dimensional; QR' contains a single eigenvector.

2. h=m-2. The residual space is two-dimensional; QR' contains two eigenvectors.

If there are k true disturbances in the system then the number of variables that may be eliminated is subject to h<m-k. In the partial PC models, the dimension of the model space is n-m+k while the residual space is m-k-h.

3. RESIDUAL PROPERTIES

3.1 The effect of faults and noise

From here on, we are dealing with the monitoring phase. Consider the observations x(t), which potentially contain sensor and actuator faults ∆x(t):

x(t) = x°(t) + ∆x(t)     (12)

Due to the normalization of the variables, each fault is also divided by the standard deviation of the respective variable. Now apply the residual equation (9) to x(t), yielding

e(t) = QR' x(t) = QR' x°(t) + QR' ∆x(t) = e°(t) + QR' ∆x(t)     (13a)

Alternatively, (13a) may be written as

e(t) = QR' x(t) = QR' [∆x(t) + ε(t)]     (13b)

Notice that e(t) is completely insensitive to the true value x°(t) of the observed variables. That is, e(t) is a consistency-type residual. Equation (13b) is basically a parity relation, and the rows of QR', that is, the eigenvectors qn-m+k+1 ... qn, are parity vectors [4].

The effect of faults. Consider Eq. (13b) with ε(t)=0 and expand QR' and ∆x(t) as

e(t) = [q*1 q*2 ... q*n] [∆x1(t) ∆x2(t) ... ∆xn(t)]'     (14)

where q*j, j=1...n, are the columns of the QR' matrix (recall: its rows are the eigenvectors). Thus the response to a single fault ∆xj(t), j=1...n, is

e(t|∆xj) = q*j ∆xj(t)     (15)

The effect of noise. Now consider Eq. (13b) with ∆x(t)=0, that is, e(t) = QR' ε(t), and compute its covariance:

E{e(t)e'(t)} = QR' E{ε(t)ε'(t)} QR     (16)

With (13b), this becomes

E{e(t)e'(t)} = QR' E{x°(t)x°'(t)} QR     (17)

The covariance of x°(t) may be approximated from the training data as

E{x°(t)x°'(t)} = (1/N) X X' = (1/N) [QM QR] Diag(σ*1² ... σ*n²) [QM QR]'     (18)

Now observe that QR' [QM QR] = [0 I]. Thus

E{e(t)e'(t)} = Diag(σ*n-m+k+1²/N ... σ*n²/N)     (19)

That is, the elements of the e(t) vector are uncorrelated, and their variance σj², j=1...m-k, is related to the singular values of the data, σ*i², i=n-m+k+1...n, as

σj² = E{ej²(t)} = σ*n-m+k+j²/N     (20)

One may relate the fault-free residual e°(t) to the model error v(t) using (10). Recall, however, that in the presence of noise, (10) is approximate. From (1) and (10) (with z=0),

v(t) ≈ -M QR' x°(t) = -M e°(t)     (21)

If B = [-I A] then

e°(t) = -M^-1 v(t) ≈ QR1' v(t)     (22)
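Equations (15) and (19)-(20) can be verified numerically; the sketch below (ours, reusing the toy system of the earlier sketch) shows that a fault of size delta on variable j shifts e(t) by delta times the j-th column of QR', and that the residual covariance is diagonal with entries σ*i²/N:

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, N = 4, 2, 500
    u = rng.standard_normal((2, N))
    A = np.array([[0.7, -0.3], [0.2, 0.9]])
    X = np.vstack([A @ u, u]) + 0.01 * rng.standard_normal((n, N))
    Q, svals, _ = np.linalg.svd(X, full_matrices=False)
    QR = Q[:, n - m:]

    # eq. (15): a fault of size delta on variable j shifts e(t) by delta * q*_j
    j, delta = 2, 1.0
    x_faulty = X[:, 0].copy()
    x_faulty[j] += delta
    print(QR.T @ x_faulty - QR.T @ X[:, 0])  # equals delta * (j-th column of QR')
    print(delta * QR.T[:, j])

    # eqs. (19)-(20): residual covariance ~ Diag(sigma*_i^2 / N)
    E = QR.T @ X
    print(np.cov(E, bias=True))
    print(svals[n - m:] ** 2 / N)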

3.2 Isolation-enhanced residuals

To detect a fault, a single scalar residual may suffice (provided it is sufficiently sensitive to that fault). For fault isolation, however, a set of scalar residuals (a residual vector) is necessary. In the context of analytical redundancy based on explicit models, residuals are usually enhanced so that they facilitate the isolation of faults. The main enhancement techniques involve directional and structured residuals. Residual enhancement is achieved by the transformation of the "primary" residuals or by appropriate design of a diagnostic observer. These same ideas can be adopted in the PC framework as well.

Directional residuals. Directional residuals are designed so that, in response to a particular fault, the residual is always confined to a specific direction in the residual space. In a static system, residuals naturally have directional properties. The directions may be modified by the manipulation of the residuals to achieve better separation of the responses. Ideally, the response directions should be orthogonal (or at least independent); this limits the number of faults to the number of system outputs.

Structured residuals. A very effective isolation approach involves residuals that are selectively sensitive to subsets of faults. Such residuals are called structured. With a structured set, each residual is sensitive to a different subset of faults and each fault affects a different subset of residuals. Diagnosis is achieved by threshold testing each residual and then comparing the resulting Boolean pattern to standard patterns.

The structure of a residual set is described by a structure (incidence) matrix. The rows of the matrix are residuals while the columns are faults. A 1 (one) at an intersection means that the fault in question does affect the residual in question, while a 0 (zero) means that it does not. For fault detection, no column of the structure matrix should contain only 0's. For isolation, all columns should be different. For "strong" isolation (that is, for protection against false isolation under certain conditions), the columns need to have certain symmetry properties, best achieved if the number of 0's in each column is the same (CC, "column canonical" structure). Shown below is a strongly isolating structure for 4 faults and 4 residuals.

      ∆x1  ∆x2  ∆x3  ∆x4
r1      1    1    0    0
r2      1    0    1    0
r3      0    1    0    1
r4      0    0    1    1
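The detection, isolation and column-canonical conditions just described are mechanical to check; the following small sketch (ours) tests them for the structure matrix above, with rows as residuals and columns as faults:

    import numpy as np

    def check_structure(S):
        # S: 0/1 structure matrix, rows = residuals, columns = faults
        cols = [tuple(int(v) for v in c) for c in S.T]
        detectable = all(any(c) for c in cols)             # no all-zero column
        isolating = len(set(cols)) == len(cols)            # all columns different
        canonical = len({c.count(0) for c in cols}) == 1   # same number of 0's
        return detectable, isolating, canonical

    S = np.array([[1, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 1]])
    print(check_structure(S))  # (True, True, True): strongly isolating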

Structured residual sets are subject to simple combinatorial constraints: (i) the total number of zeros in the columns should be the same as that in the rows, and (ii) the number of different columns should be the same as the number of faults [2]. We usually strive for structures where each row has the maximum possible number of 0's, that is, m-1.

Structured residuals with optimization. By giving up one (or more) of the possible zeros in the row structure, one (or more) degree(s) of freedom is introduced that may be utilized to optimize the residual for some performance index [10]. This concept works in the PC framework just as well as it does with explicit models; in fact, with PC models the optimization can be given a simple geometric interpretation (see later).

4. ENHANCED RESIDUALS FROM THE FULL PC MODEL

Enhanced residuals can be obtained from the full PC model by manipulating the residual vector e(t). This method was discussed in [8]. The basis for residual enhancement is (14) and (15), where the response to each fault is shown to be a column of QR'. In the following discussion, we will assume that there are no real disturbances in the system (k=0).

Directional residuals. Equation (14) is already directional. To change the directions for a subset of not more than m faults, a transformation may be applied to the residual as

r(t) = W e(t)     (23)

where r(t) and W have m rows and W is such that

W QRi' = Γ     (24)

Here QRi' contains the columns belonging to the subset of faults and the columns of Γ are the desired directions. In an important special case, Γ = I. If m columns are selected and QRi' has full rank then there is a unique solution for W.

Structured residuals. Structured residuals are generated separately, row by row, as

ri(t) = wi e(t)     (25)

The maximum number of guaranteed zeros in a row is m-1. If these are all utilized (maximum zero design) then wi is to satisfy

wi [QRi' q*g] = [0 ... 0 1]     (26)

Here QRi' contains those m-1 columns that belong to the selected zero responses, while q*g is the column belonging to an arbitrary "one" response. One may then normalize wi. There is a unique solution for wi if [QRi' q*g] has full rank.

Structured residuals with optimization. This subject will be addressed in Part II of the paper.

5. STRUCTURED RESIDUALS FROM PARTIAL PC MODELS

Partial PC models lead naturally to structured residuals, without the need for algebraic transformation. One first specifies the structure of the residual set, then obtains a partial PC model for each row (subsystem). Recall that a partial PC model is obtained by applying PCA to a selected subset of variables. The residual will only be sensitive to the sensor and actuator faults associated with the variables in the model. This idea was originally introduced in [7] and further demonstrated in [11].

We will present two approaches: 1. use one-dimensional partial PC models to obtain residuals with the maximum number of 0's; 2. use two-dimensional partial PC models and optimization to obtain residuals that have fewer 0's but optimal fault sensitivity.

Example. Consider a system with n=5 observed variables subject to m=3 linear relations. The one-dimensional partial PC models will contain n-m+1=3 variables each (two variables missing from each). There are 5!/(3!2!)=10 ways to select 2 out of 5. The 10 row-structures are shown in Table 1. For a column canonical (strongly isolating) set, 5 rows are needed. A possible set is shown in Table 2.

Table 1.
       x1  x2  x3  x4  x5
e1      0   0   1   1   1
e2      0   1   0   1   1
e3      0   1   1   0   1
e4      0   1   1   1   0
e5      1   0   0   1   1
e6      1   0   1   0   1
e7      1   0   1   1   0
e8      1   1   0   0   1
e9      1   1   0   1   0
e10     1   1   1   0   0

Table 2.
       x1  x2  x3  x4  x5
e1      0   0   1   1   1
e2      0   1   0   1   1
e6      1   0   1   0   1
e9      1   1   0   1   0
e10     1   1   1   0   0

5.1. One-dimensional partial PC models. Assume again that there are no real disturbances in the system (k=0). To obtain a one-dimensional partial PC model (that is, a partial PC model having a one-dimensional residual space), one needs to eliminate m-1 variables. This leaves one linear relation for the remaining n-m+1 variables. The row-structure of the residual contains m-1 zeros; this is the maximum number of zeros possible with m linear relations in the system. Any spontaneous elimination of variables (due to collinearity with those eliminated intentionally) results in (undesired) additional zeros in the structure.

For each partial PC model, the residual space contains just one eigenvector and the residual is a scalar. Using superscripts to distinguish partial systems, the residual equation for the i-th partial system is

ei(t) = qi' xi(t) = qi' [∆xi(t) + εi(t)]     (27)

Here xi(t) contains those n-m+1 variables that were selected to stay in the model. The parity vector qi is the last eigenvector in the partial PC model. The variance (σi)² of ei(t) (the effect of noise) can be obtained as 1/N times the eigenvalue that accompanies qi. The residual is uniquely defined for each structure. There are n!/[(n-m+1)!(m-1)!] ways to select m-1 variables out of n; these all result in a different row structure. Apart from trivially small systems, only a small fraction of these rows is necessary to compose a strongly isolating set.
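In code, the design is direct: for each row of the structure, run PCA on the retained variables and keep the last eigenvector as the parity vector of (27). A sketch (ours), using the Table 2 structure:

    import numpy as np

    def partial_pc_rows(X, structure):
        # X: n x N fault-free training data; structure: 0/1 rows over the n variables.
        # Returns, per row, the retained indices and the parity vector q_i (eq. 27).
        models = []
        for row in structure:
            idx = [j for j, s in enumerate(row) if s == 1]
            Q, _, _ = np.linalg.svd(X[idx, :], full_matrices=False)
            models.append((idx, Q[:, -1]))  # last eigenvector spans the residual space
        return models

    table2 = [[0, 0, 1, 1, 1],
              [0, 1, 0, 1, 1],
              [1, 0, 1, 0, 1],
              [1, 1, 0, 1, 0],
              [1, 1, 1, 0, 0]]
    # monitoring: e_i(t) = q_i' x_i(t), using only the variables in idx, e.g.
    # models = partial_pc_rows(X, table2)
    # idx, q = models[0]; e0 = q @ x[idx]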

5.2. Two-dimensional partial PC models. Two-dimensional partial PC models are obtained by eliminating (ignoring) m-2 variables. This leaves n-m+2 variables in each partial model. The two-dimensional residual for the i-th partial system is

ei1(t) = qi1' xi(t) = qi1' [∆xi(t) + εi(t)]
ei2(t) = qi2' xi(t) = qi2' [∆xi(t) + εi(t)]     (28)

Here qi1 and qi2 are the last two eigenvectors in the partial PC model. Notice that the two scalar residuals, ei1(t) and ei2(t), respond to the same subset of faults. The fault-free residuals have zero mean, are independent, and their variances (σi1)² and (σi2)² are 1/N times the respective eigenvalues.

A new residual ri(t) may be obtained as a linear combination of ei1(t) and ei2(t):

ri(t) = αi1 ei1(t) + αi2 ei2(t)     (29)

This can be represented as

ri(t) = si' xi(t),   si = αi1 qi1 + αi2 qi2     (30)

Here the new parity vector si is a unit-length vector in the plane spanned by qi1 and qi2. Because of the orthonormality of the latter, the coefficients can be expressed as αi1 = sin θi, αi2 = cos θi, where θi is the angle of si in the plane (measured from qi2, see Figure 1). Thus

ri(t, θi) = [(sin θi) qi1 + (cos θi) qi2]' xi(t)     (31)

The effect of the fault ∆xj(t) on the transformed residual ri(t, θi) is

ri(t, θi | ∆xj) = sij ∆xj(t) = [cij sin(θi + ϕij)] ∆xj(t)     (32)

where

cij = [(qi1j)² + (qi2j)²]^1/2,   ϕij = tan-1(qi2j / qi1j)
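Numerically, (30)-(32) reduce to a few lines (a sketch, ours; qi1 and qi2 are the two last eigenvectors of the partial model):

    import numpy as np

    def transformed_parity(q1, q2, theta):
        # eqs. (30)-(31): s_i = (sin theta) q_i1 + (cos theta) q_i2
        return np.sin(theta) * q1 + np.cos(theta) * q2

    def fault_gain(q1, q2, j, theta):
        # eq. (32): s_ij = c_ij sin(theta + phi_ij)
        c = np.hypot(q1[j], q2[j])
        phi = np.arctan2(q2[j], q1[j])
        return c * np.sin(theta + phi)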

[Figure 1. Parity vectors in two-dimensional space: si = αi1 qi1 + αi2 qi2, with the angle θi measured from qi2.]

The variance of the fault-free residual ri(t, θi) (the effect of the noise) is

(σi)² = (sin θi)² (σi1)² + (cos θi)² (σi2)²     (33)

The transformed residual ri(t, θi) inherits the joint row-structure of ei1(t) and ei2(t) (with one exception, see the note below); it has m-2 zeros. However, it is not unique; it depends on the free parameter θi. This parameter may be used to optimize the residual according to some performance criterion. This will be addressed in Part II.

Note that θi may be so chosen that the transformation results in the elimination of one additional variable (i.e., to eliminate xg(t), choose θi so that θi + tan-1(qi2g/qi1g) = 0). In this case, the structure of ri(t) has m-1 zeros and thus it becomes identical with one of the structures obtained from one-dimensional partial models.

CONCLUSION

In this Part I of the paper, after recalling the basics of principal component modeling, we reviewed the concept of enhanced residuals, including directional, structured and optimized structured enhancement. We also discussed the idea of partial PC models. Then we demonstrated the generation of structured residuals, both from the full PC model and from a structurally designed set of partial PC models.

ACKNOWLEDGEMENT

Over the years, several people contributed to this research, including Thomas McAvoy, Weihua Li, Yungbing Huang, Jin Cao and Yongtong Hu. A suggestion by Mark Kramer provided the initial impetus. Part of this work was supported by NSF under Grant #ECS-9906250.

REFERENCES

[1] Chow, E.Y. and A.S. Willsky, "Analytical Redundancy and the Design of Robust Failure Detection Systems", IEEE Trans. on Automatic Control, 29, 603 (1984).
[2] Gertler, J. and D. Singer, "A New Structural Framework for Parity Equation Based Fault Detection and Isolation", Automatica, 26, 381 (1990).
[3] Frank, P., "Fault Diagnosis in Dynamic Systems Using Analytical and Knowledge-Based Redundancy", Automatica, 26, 459 (1990).
[4] Lou, X.C., A.S. Willsky and G.C. Verghese, "Optimal Robust Redundancy Relations for Failure Detection in Uncertain Systems", Automatica, 22, 333 (1986).
[5] Kresta, J.W., J.F. MacGregor and T.E. Marlin, "Multivariate Statistical Monitoring of Processes", Can. J. Chem. Eng., 69, 35 (1991).
[6] Dunia, R., S.J. Qin, T.F. Edgar and T.J. McAvoy, "Identification of Faulty Sensors Using Principal Component Analysis", AIChE Journal, 42, 2787 (1996).
[7] Gertler, J. and T. McAvoy, "Principal Component Analysis and Parity Relations - A Strong Duality", Preprints of IFAC Safeprocess Symposium (Hull, UK), 2, 837 (1997).
[8] Gertler, J., W. Li, Y. Huang and T. McAvoy, "Isolation-Enhanced Principal Component Analysis", AIChE Journal, 45, 323 (1999).
[9] Cao, J. and J. Gertler, "Noise-Induced Bias in Last Principal Component Modeling of Linear Systems", J. of Process Control, 14, 365 (2003).
[10] Qin, S.J. and W. Li, "Detection, Identification, and Reconstruction of Faulty Sensors with Maximized Sensitivity", AIChE Journal, 45, 1963 (1999).
[11] Huang, Y., J. Gertler and T. McAvoy, "Sensor and Actuator Fault Isolation by Structured Partial PCA with Nonlinear Extensions", Journal of Process Control, 10, 459 (2000).

IEEE Conference on Intelligent Control, Limassol, Cyprus, June 2005

Residual Generation from Principal Component Models for Fault Diagnosis in Linear Systems
Part II: Extension to optimal residuals and dynamic systems

Janos Gertler, Fellow, IEEE (1)

Abstract. The generation of enhanced (structured) residuals from PC models, both by algebraic transformation and by direct structured modeling, was reviewed in Part I of this paper. In Part II, some recent results on residual optimization are included. Extensions to dynamic systems are discussed. The new concept of dynamic redundancy is introduced, followed by an algorithm to simultaneously estimate the system order and the number of linear relations. The generation of enhanced (and optimized) residuals is revisited in the dynamic framework and explained in the light of dynamic redundancy. A simple example is given to illustrate the new concepts.

1. INTRODUCTION

In the companion paper (same title, Part I: Review of static systems [1]), the generation of enhanced (structured) residuals from PC models was reviewed, both by algebraic transformation and by structured subsystem modeling. For basic definitions and notation, the reader is referred to that paper.

The possibility of combining PC-based structured residuals with optimization for maximum sensitivity within the structure was pointed out in the companion paper. Here this avenue will be explored in more detail.

In the second part of this paper, extensions to dynamic systems will be discussed. Dynamics can be handled in the PCA framework by including time-shifted samples of the variables in the model [2]. In this paper, the behavior of such models will be analyzed and the new concept of dynamic redundancy will be introduced. Based on this, an algorithm to simultaneously estimate the system order and the number of linear relations will be proposed. The generation of enhanced (and optimized) residuals will be revisited in the dynamic framework and explained in the light of dynamic redundancy. A simple example will be given to illustrate the new concepts.

An alternative approach to dynamic system modeling using PCA has embedded principal components into the framework of subspace system identification [3,4].

-----
(1) Janos Gertler is with the School of Information Technology, George Mason University, Fairfax, VA 22030, USA; (703) 425-3419; [email protected]

2. OPTIMAL DESIGN OF RESIDUALS

A measure of residual sensitivity. In the structured residual framework, scalar residuals are individually threshold tested to arrive at a fault pattern. The threshold is determined by the standard deviation of the fault-free residual. Thus a natural measure of fault sensitivity is the size of the fault response relative to the fault-free standard deviation. This measure is different for each fault to which the particular residual responds. With residuals generated from two-dimensional PC models, the sensitivity index (signal-to-noise ratio) of the i-th residual ri(t, θi) to the j-th fault ∆xj(t), denoted as κij, is

κij = |sij| / σi     (1)

where sij is the j-th element of the i-th parity vector. In the absence of other information, we have to assume that each fault has the same (unit) nominal size. If the faults are known to have different nominal sizes, then the sensitivity indices should be weighted with these. Note that no assumption is made, nor is necessary, concerning the time behavior of the faults (such as step or ramp faults).

The max/min optimum criterion. An important issue is the choice of the performance index. Recall that a residual has different sensitivities with respect to the various faults. Qin and Li [5] proposed the average fault sensitivity of the residual as the performance index. With this index, however, the optimization algorithm would strive to maximize the largest sensitivity, since this would result in the highest return in the average, while (almost) ignoring the smallest ones. We recommend, instead, a max-min index which is aimed at maximizing the weakest sensitivity, resulting in a more balanced behavior of the residual (see also [6] and [7]). Thus the critical (worst-case) sensitivity is

κi = minj κij = minj |sij| / σi     (2)

Optimal design from the full PC model. We will only discuss the case when there are m-2 zeros in each row structure and the resulting one degree of freedom is used for optimization. The problem may be formulated as a constrained optimization: seek the m elements of wi so that the minimal sensitivity index is maximized, subject to a set of equality constraints (equation (26) in Part I) with m-2 columns in QRi' (m-1 constraints, one free parameter to optimize). Note that a closed-form solution is not possible since the performance index is not a smooth function.

An alternative approach utilizes the geometric properties of the PC model. Notice that by requiring m-2 zeros in the structure, the m-dimensional residual is forced into a two-dimensional subspace. Find two orthonormal vectors that span this subspace and then seek the optimized residual as a linear combination of those vectors. Thus satisfying the constraints is decoupled from the parametric optimization. The two orthonormal vectors can be found by requiring conditions like (26) (Part I), first wi1 with m-2 zeros and two ones, then wi2 with m-2 zeros, one "one" and the added condition wi1⊥wi2. The parametric optimization problem is identical with the one encountered in partial PC design and will be discussed at that point.

Optimization in the partial PC framework. As shown in Part I (equations (32) and (33)), for residuals obtained from two-dimensional partial PC models, both the fault gains sij and the noise effect (σi)² depend on the parameter θi, as

sij = cij sin(θi + ϕij)     (3)

where cij = [(qi1j)² + (qi2j)²]^1/2 and ϕij = tan-1(qi2j / qi1j), and

(σi)² = (sin θi)² (σi1)² + (cos θi)² (σi2)²     (4)

With these,

κij(θi) = |sij(θi)| / σi(θi)     (5)

We then seek the value of θi which maximizes the smallest of these indices:

θi_opt = arg maxθi κi(θi) = arg maxθi minj κij(θi)     (6)

Because of the absolute value, the range of interest is -π/2 ≤ θi ≤ π/2.

Finding the optimum. As formulated in (6), we are dealing with a one-dimensional optimization problem with a continuous optimization variable. What makes the problem non-trivial is that the n-m+2 different κij(θi) functions intersect over the range of optimization. Thus the function κi(θi), for which the maximum is sought, is not a single (smooth) function but consists of sections of the various κij(θi) functions. In other words, we seek the maximum of the smallest of the n-m+2 functions at any given value of θi (Figure 1). A closed-form solution does not seem possible. One may observe that the optimum is either at the maximum of one of the functions κig(θi) or at an intersection of two functions, κij(θi) = κig(θi).

[Figure 1. Sensitivity curves κij(θi), j=1...4 (one-dimensional search). The horizontal axis is scaled to θi/0.5π.]
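Given the observation above, a dense grid over [-π/2, π/2] is a simple and adequate solver for (6). A sketch (ours), taking the last two eigenvectors q1, q2 and the noise standard deviations sig1, sig2 of a two-dimensional partial PC model as inputs:

    import numpy as np

    def maxmin_theta(q1, q2, sig1, sig2, fault_idx, grid=2001):
        # Grid search for eq. (6): maximize over theta the smallest kappa_ij(theta).
        best_theta, best_val = 0.0, -np.inf
        for th in np.linspace(-np.pi / 2, np.pi / 2, grid):
            s = np.sin(th) * q1 + np.cos(th) * q2                   # parity vector
            sigma = np.hypot(np.sin(th) * sig1, np.cos(th) * sig2)  # eq. (4)
            kappa = np.min(np.abs(s[fault_idx])) / sigma            # eqs. (2), (5)
            if kappa > best_val:
                best_theta, best_val = th, kappa
        return best_theta, best_val

    # usage: fault_idx lists the n-m+2 faults to which the residual responds
    # theta_opt, kappa_opt = maxmin_theta(q1, q2, sig1, sig2, fault_idx=[0, 1, 2, 3])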

3. PC MODELING OF DYNAMIC SYSTEMS

In a computerized environment, dynamic systems are described by their difference equations. These contain, in addition to the present samples, a number of past (delayed) samples of the system variables. The highest delay in the equations is the order of the model. When PCA is applied to a dynamic system, the set of variables is expanded to include the past samples [2] (which may be considered "pseudo-variables"). This increases the size of the PC model, in terms of the number of variables, sometimes significantly if the system order is high. However, the number of relationships among the variables is the same as in steady state.

3.1. Dynamic model structures

Consider a static system without disturbances and noise:

B x°(t) = 0     (7)

There are n variables and m linear relations in the system. The ν-th order discrete dynamic extension of this equation, in the usual regression format, is

B0 x°(t) + B1 x°(t-1) + ... + Bν x°(t-ν) = 0     (8)

where Bk, k=0...ν, are m×n matrices. Eq. (8) may be condensed into

BD ϕν°(t) = 0     (9)

where ϕν°(t) = [x°'(t) x°'(t-1) ... x°'(t-ν)]' is the regression vector and BD = [B0 B1 ... Bν]. An alternative way of presenting linear difference equations utilizes the shift operator z^-k, defined as

z^-k x(t) = x(t-k)     (10)

With this, (8) is written as

(B0 + B1 z^-1 + ... + Bν z^-ν) x°(t) = B(z) x°(t) = 0     (11)

The shift operator notation is not only compact but also reduces manipulations with equations to manipulations with polynomials.

In the PCA framework, all samples [t, t-1, ..., t-ν] of the n variables appear in the model (as "pseudo-variables"), so the dimension of the model is n(ν+1) and it has the form

QRD' ϕν°(t) = QR0' x°(t) + QR1' x°(t-1) + ... + QRν' x°(t-ν) = 0     (12)

where the rows of QRD' are the extended eigenvectors spanning the residual space. In shift operator format, (12) becomes

QR'(z) x°(t) = (QR0' + QR1' z^-1 + ... + QRν' z^-ν) x°(t) = 0     (13)

The rows of the extended matrices BD and QRD' span the same subspace (see equation (10) in Part I), so that

BD = M QRD'     (14)

This implies Bk = M QRk', k=0...ν, and B(z) = M QR'(z).

Now consider the static (steady-state) model of the system. The static model describes the system behavior when all the variables are constant. Then

B x°(t) = 0,   QR' x°(t) = 0     (15)

and

B = B0 + B1 + ... + Bν = B(z)|z=1     (16a)
QR' = QR0' + QR1' + ... + QRν' = QR'(z)|z=1     (16b)

3.2. Dynamic redundancy

The following concept is central to the estimation and diagnostic use of dynamic PC models. Consider first a system with a single output, described by a single equation:

b'(z) x°(t) = 0     (17)

where b'(z) = b'0 + b'1 z^-1 + ... + b'ν z^-ν. Now consider the equation

[b'(z) x°(t)] c(z) = 0     (18)

where c(z) = c0 + c1 z^-1 + ... + cµ z^-µ. Observe that while for equation (17) the regression vector is ϕν°(t) = [x°'(t) ... x°'(t-ν)]', for (18) it becomes ϕν+µ°(t) = [x°'(t) ... x°'(t-ν-µ)]'. With the latter, (17) implies, in addition to the original equation

b'0 x°(t) + b'1 x°(t-1) + ... + b'ν x°(t-ν) = 0     (19a)

also its shifted variants

b'0 x°(t-1) + b'1 x°(t-2) + ... + b'ν x°(t-ν-1) = 0
.....     (19b)
b'0 x°(t-µ) + b'1 x°(t-1-µ) + ... + b'ν x°(t-ν-µ) = 0

Observe that (18) is a linear combination of the original equation and the shifted ones. Of these, only the first one is real while the other µ are superfluous. We will refer to this situation (already observed and utilized in [1]) as (18) having a dynamic redundancy of order µ.

A PC model obtained from the ϕν°(t) dataset will recognize only one relationship among the data, but one obtained from ϕν+µ°(t) will "recognize" µ+1. Thus, for the residual space describing (18),

Rank QRD' = Rank QR'(z) = µ+1     (20)

Now consider steady state. The static equivalent of (19a) and of all lines in (19b) is b'x(t) = 0, where b' = b'0 + b'1 + ... + b'ν. Thus the static gain matrix corresponding to the set of equations represented by (18) is B = [b c1b ... cµb]'. Clearly, this is a rank-1 matrix. Then it follows from (14) that, for the residual space describing (18), QR' = M^-1 B and

Rank QR' = 1     (21)

A system in general has m independent dynamic linear relations, each with a potentially different real order νi. If the regression vector is sufficiently large to cover all real relations, these will be recognized by the PC model. Depending on the choice of the regression vector, each relation may have a different order of dynamic redundancy µi. The PC model will recognize all the redundant relations as well, so that

Rank QRD' = Σi=1...m (1 + µi)     (22)

However, the static equivalent of the model ignores the redundant relations, that is,

Rank QR' = m     (23)
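The rank relations (20)-(23) are easy to observe numerically. In the sketch below (ours), a single first-order relation (ν=1, m=1) is simulated; each extra lag added to the regression vector contributes one superfluous, shifted copy of the relation:

    import numpy as np

    def lagged(X, nlags):
        # data matrix of regression vectors [x'(t) x'(t-1) ... x'(t-nlags)]'
        N = X.shape[1]
        return np.vstack([X[:, nlags - k: N - k] for k in range(nlags + 1)])

    rng = np.random.default_rng(2)
    N = 2000
    u = rng.standard_normal(N)
    y = np.zeros(N)
    for t in range(1, N):
        y[t] = 0.8 * y[t - 1] + 0.2 * u[t]   # one relation, true order nu = 1
    X = np.vstack([y, u])

    for mu in range(3):                       # mu = order of dynamic redundancy
        svals = np.linalg.svd(lagged(X, 1 + mu), compute_uv=False)
        rank_qrd = int(np.sum(svals < 1e-8 * svals[0]))
        print(mu, rank_qrd)                   # prints mu, mu + 1, per eq. (20)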

3.3. Estimating dynamic models

In a practical situation, the "true system" (the B matrices) is not known and we aim for a PC model (the Q matrices). The critical issue is the composition of the regression vector ϕ(t); this implies the simultaneous estimation of the number of linear relations and of the dynamic orders of the system. Of course, we can seldom reconstruct the true system exactly; the objective is to find the smallest estimated regression vector ϕ^(t), and the accompanying dynamic PC model QRD, that describe the system with sufficient accuracy. In the following, we outline a procedure for this task.

The algorithm relies on the fundamental equations (22) and (23), which state that

• there is dynamic redundancy in the model if Rank QRD' > Rank QR'.

We supplement this with two additional observations:

• If the dynamic order of the estimated regression vector ϕ^(t) is not sufficiently high to cover the real relationships in the system, then Rank QR' < m.

• In most cases, the order of the individual variables xj, j=1...n, is not the same but some νj, with ν = maxj(νj). This implies that there are dynamic redundancies and/or zero columns in the Bj and QRj matrices. Eliminating these, and the associated pseudo-variables, may significantly reduce the size of the PC model. To this end, the full ν-th order regression vector ϕν^(t) has to be replaced with a reduced vector ϕν*^(t) in which each variable is present with its own order νj.

Based on the above properties, the following algorithm is proposed:

1. Start with ϕ^(t) = x(t). Perform a PCA; note Rank QR = Rank QRD.

2. Increase the order simultaneously for all variables, as ϕ^(t) = [x'(t) x'(t-1)]', ϕ^(t) = [x'(t) x'(t-1) x'(t-2)]', etc. Perform a PCA and compute Rank QR and Rank QRD in each step. Stop the procedure when adding a new sample no longer increases Rank QR. Then m = Rank QR, and the system order ν is the last shift that still resulted in an increase of Rank QR.

3. Select the first variable x1(t). Drop the highest shift x1(t-ν) from ϕ*^(t), perform a PCA, and compute Rank QRD and Rank QR. Then drop x1(t-ν+1), repeat the computation, etc. Continue this procedure as long as Rank QR does not decrease. Repeat this for all other variables xi(t), i=2...n.

Dropping a sample (in step 3) may or may not result in the reduction of Rank QRD. If it does, then by this step a redundant equation is eliminated. This may result in other variable samples attaining zero columns in QRD.

Note that there are two tolerances in the above procedure: the tolerance for a "zero" eigenvalue in the PCA modeling and that of the rank computation. These tolerances need to be set according to the level of noise in the system. Clearly, the algorithm is empirical and the results may be affected by the selection of the tolerances.
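A minimal sketch of steps 1 and 2 follows (ours; the tolerances are illustrative and, per the note above, must be tuned to the noise level; step 3, the variable-by-variable pruning, follows the same rank-tracking pattern):

    import numpy as np

    def pc_ranks(Phi, n, tol=1e-8):
        # Rank QRD' = number of near-zero singular values of the lagged data matrix;
        # Rank QR' = rank of the static equivalents QR'(z)|z=1, eqs. (22)-(23).
        U, svals, _ = np.linalg.svd(Phi, full_matrices=False)
        res = U[:, svals < tol * svals[0]]            # residual-space eigenvectors
        rank_dyn = res.shape[1]
        if rank_dyn == 0:
            return 0, 0
        blocks = Phi.shape[0] // n                    # number of lags + 1
        static = sum(res[k * n:(k + 1) * n, :] for k in range(blocks))
        return rank_dyn, int(np.linalg.matrix_rank(static, tol=1e-6))

    def estimate_order(X, max_lag=10):
        # steps 1-2: grow the lag until Rank QR' stops increasing
        n, N = X.shape
        prev = -1
        for nlags in range(max_lag + 1):
            Phi = np.vstack([X[:, nlags - k: N - k] for k in range(nlags + 1)])
            _, rank_stat = pc_ranks(Phi, n)
            if rank_stat <= prev:
                return nlags - 1, prev   # system order nu, number of relations m
            prev = rank_stat
        return max_lag, prev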

4. STRUCTURED RESIDUALS FROM DYNAMIC PC MODELS

4.1. Structured residuals by algebraic transformation

Structured residuals may be designed by generating subsystems through algebraic elimination (transformation) from a full dynamic PC model [7]. The procedure may be described at three different levels: (1) shift-operator matrix transformation, (2) shift-operator elimination and (3) regression-form elimination.

The shift-operator transformation is the dynamic generalization of the static formulas (equations (25) and (26) in Part I):

ri(t) = wi(z) e(t)     (24)

where wi(z) is to satisfy

wi(z) [QRi'(z) q*g(z)] = [0 ... 0 1]     (25)

If [QRi'(z) q*g(z)] has full rank (for all z) then (25) can be solved for wi(z) by the inversion of the polynomial matrix. If π(z) = Det[QRi'(z) q*g(z)] is a polynomial (not just a number) then, in order to keep wi(z) polynomial (not rational), the response specification has to be modified to [0 ... 0 π(z)] [8].

Equations (24) and (25) correspond to a series of eliminations, using the shift-operator relations

e(t) = QR'(z) x(t) = QR'(z) ∆x(t)     (26)

Each shift-operator manipulation, in turn, corresponds to manipulations with time-shifted residuals (equations) and as such affects the dynamic order and redundancy of the residual. In particular:

• Multiplication of a residual by a polynomial, c(z)ei(t), with c(z) = c0 + c1 z^-1 + ... + cµ z^-µ, brings into the transformation the shifted residuals ei(t-1), ..., ei(t-µ), and thus results in an increased order.

• A polynomial that can be factored out from a residual at any point during the procedure, as ri(t) = c(z)ri*(t), is an indication that ri(t) contains redundant shifted residuals; canceling c(z) leads to a reduced order.

If the primary PC model is of minimum order (as sought by the algorithm described in Section 3.3) then the order of some of the transformed residuals may be higher than that of the original equations, but it never exceeds the McMillan order of the full system [9].

4.2. Structured residuals from partial PC models

Each residual structure corresponds to a subsystem. The subsystem models are now determined directly from experimental data, including the estimation of the subsystem order. The order obtained this way may be significantly lower than the theoretical limit.

Residual structures are defined in terms of variables (not variable samples). That is, a variable indicated with "one" in a structure is present, possibly with several samples, while a variable indicated with "zero" is absent completely. Therefore the possible row structures in a dynamic system are identical with those obtained for a static system with the same variables.

For each subsystem (row-structure), the dynamic residual is to be computed (as a generalization of equation (13b) in Part I) as

ei(t) = QiRD' ϕi^(t)     (27)

This clearly requires knowledge of the composition of ϕi^(t); the parameters of QiRD' then follow. While the dynamic orders need to be estimated, the number of relations in the subsystem is known and is very low (m=1 for maximum-zero subsystems and m=2, perhaps m=3, for subsystems designed for parametric optimization). This simplifies the algorithm quite substantially and provides some guidance in the choice of tolerances.

4.3. The effect of dynamics on optimization

By generalizing equation (13b) of Part I, the effect of faults and noise in the dynamic residual of the i-th subsystem is

ei(t) = QiRD' [∆ϕi^(t) + ψ(t)]     (28)

with ∆ϕi^(t) containing the faults ∆x1, ∆x2, ..., ∆xn, each with its subsystem-specific set of samples, and ψ(t) containing the noise terms in the appropriate structure. Thus the noise-free response to the j-th variable fault is

ei(t|∆xj) = [qiR*j0 ... qiR*jνj] [∆xj(t) ... ∆xj(t-νj)]'     (29)

where qiR*jk, k=0...νj, are the columns of QiRD' belonging to the samples of ∆xj. The relationship among the consecutive samples of ∆xj is not known; the most reasonable assumption is that ∆xj is constant over the samples t, t-1, ..., t-νj. Thus

ei(t|∆xj) = (qiR*j0 + ... + qiR*jνj) ∆xj(t) = qiR*j ∆xj(t)     (30)

where qiR*j is the j-th column of the static coefficient matrix QiR'. That is, parametric optimization in the dynamic situation is identical with that in a static system; the coefficient matrix used is the steady-state equivalent computed from the dynamic coefficient matrix.

Example. The foregoing will be illustrated on the simple example system shown in Figure 2.

[Figure 2. Example system, with inputs u1, u2 and outputs y1, y2.]

The system is described by the two independent equations

y1(t) = 0.8y1(t-1) + 0.2u1(t)     (31)

y2(t) = 0.9y2(t-1) + 0.3y1(t) + 0.5u2(t)     (32)

By algebraic manipulation, a third equation is obtained as

y2(t) = 1.7y2(t-1) - 0.72y2(t-2) + 0.06u1(t) + 0.5u2(t) - 0.4u2(t-1)     (33)

The possible residual structures are

      y1  y2  u1  u2
r1     1   0   1   0
r2     1   1   0   1
r3     0   1   1   1

Here the rows belong to residuals and the columns to faults; a "1" means that a fault affects the residual while a "0" means it does not. Clearly, the faults on u2 and y2 cannot be distinguished from one another.

The system was simulated with zero-mean colored random inputs, without noise. Then the algorithm described in Section 3.3 was used to estimate the PC model.

Full PC model and transformation. The minimal regression vector was found as

ϕ^(t) = [y1(t) y1(t-1) y2(t) y2(t-1) u1(t) u2(t)]'

and one possible gain (eigenvector) matrix QRD' as

  -0.7645   0.6223  -0.0444   0.0399   0.1556   0.0222
  -0.1333  -0.0588   0.6892  -0.6203  -0.0147  -0.3446

The shift-operator equivalent of this, QR'(z), is

qR*1(z) = [-0.7645(1-0.8140z^-1)   -0.0444(1-0.9z^-1)    0.1556    0.0222]
qR*2(z) = [-0.1333(1+0.4411z^-1)    0.6892(1-0.9z^-1)   -0.0147   -0.3446]
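These numbers can be reproduced, up to sign and scaling of the eigenvectors, with a short simulation. The sketch below (ours; the AR(1) input coloring is an assumption, since the exact input spectra are not specified in the text) recovers the parity space of the minimal regression vector and re-derives the u2-decoupled residual (cf. (34) below):

    import numpy as np

    rng = np.random.default_rng(3)
    N = 5000
    w1, w2 = rng.standard_normal(N), rng.standard_normal(N)
    u1 = np.zeros(N); u2 = np.zeros(N); y1 = np.zeros(N); y2 = np.zeros(N)
    for t in range(1, N):
        u1[t] = 0.7 * u1[t - 1] + w1[t]                       # colored inputs
        u2[t] = 0.7 * u2[t - 1] + w2[t]
        y1[t] = 0.8 * y1[t - 1] + 0.2 * u1[t]                 # eq. (31)
        y2[t] = 0.9 * y2[t - 1] + 0.3 * y1[t] + 0.5 * u2[t]   # eq. (32)

    # the minimal regression vector phi^(t) found in the text
    Phi = np.vstack([y1[1:], y1[:-1], y2[1:], y2[:-1], u1[1:], u2[1:]])
    U, svals, _ = np.linalg.svd(Phi, full_matrices=False)
    q1, q2 = U[:, -1], U[:, -2]      # two near-zero singular values -> QRD'

    # decouple from u2 (entry 5): the combination with zero u2 coefficient
    s1 = q2[5] * q1 - q1[5] * q2
    print(s1 / s1[0])  # ~ [1, -0.8, 0, 0, -0.2, 0]: consistent with (31), (34)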

A residual decoupled from u2 may be obtained as r1(t) = s1(z) [y1(t) y2(t) u1(t)]', where

s1(z) = c (0.3446 qR*1(z) + 0.0222 qR*2(z)) = [1-0.8z^-1    0    -0.2]     (34)

This is consistent with (31) (c is a normalizing factor). Notice that y2 has also disappeared from the residual. Similarly, a residual decoupled from u1 is r2(t) = s2(z) [y1(t) y2(t) u2(t)]', where

s2(z) = c (0.0147 qR*1(z) + 0.1556 qR*2(z)) = [-0.3    1-0.9z^-1    -0.5]     (35)

This is consistent with (32). Finally, a residual decoupled from y1 is r3(t) = s3(z) [y2(t) u1(t) u2(t)]', with



s3(z) = c (0.1333(1+0.4411z^-1) qR*1(z) - 0.7645(1-0.814z^-1) qR*2(z))
      = [(1-0.8z^-1)(1-0.9z^-1)    -0.06    -0.5(1-0.8z^-1)]     (36)

Here the multiplication of qR*1 and qR*2 by z^-1 brought time-shifted variants into the transformation, leading to an increase of the dynamic order. r3(t) is of course consistent with (33).

Structured partial PC models. The same data was used to obtain subsystem models in various structures. With x1 = [y1 y2 u1]', the algorithm found

ϕ1^(t) = [y1(t) y1(t-1) y2(t) u1(t)]'
s1 = [0.7715  -0.6172  0.0000  -0.1543]

This is identical with (34), apart from a scaling factor. With x2 = [y1 y2 u2]', the model was found as

ϕ2^(t) = [y1(t) y2(t) y2(t-1) u2(t)]'
s2 = [-0.2046  0.6820  -0.6138  -0.3419]

Compare to (35). Finally, with x3 = [y2 u1 u2]',

ϕ3^(t) = [y2(t) y2(t-1) y2(t-2) u1(t) u2(t) u2(t-1)]'
s3 = [0.4554  -0.7742  0.3279  -0.0273  -0.2277  0.1822]

Apart from scaling, this is identical to (36).

5. CONCLUSION

In the first part of this paper, our recently developed and published results on the generation of enhanced (structured) residuals from PC models were reviewed, both by algebraic transformation and by direct structured modeling. In the present, second part, some recent results on residual optimization were included. Further, extensions to dynamic systems were discussed. The concept of dynamic redundancy was introduced, followed by an algorithm to simultaneously estimate the system order and the number of linear relations. The generation of enhanced (and optimized) residuals was revisited in the dynamic framework and explained in the light of dynamic redundancy.

Significant additional work is needed concerning dynamic PC modeling and its application to residual generation. In particular:

• The relationship between various model structures and the implications of dynamic redundancy need to be further explored.

• The dynamic estimation algorithm, as proposed here, suffers from some deficiencies, in that the rank estimation it relies on is very sensitive to the noise-to-signal ratio and to the choice of tolerances. Clearly, more robust techniques need to be sought to solve this problem. One possible avenue, presently being explored, is the reduction of the noise effect by incorporating instrumental variables into the estimation algorithm.

• The estimation of dynamic models, and residual generation based on them, needs to be tested on industrial examples of non-trivial size.

ACKNOWLEDGEMENT

The contribution of Jin Cao to the work on optimal structured residuals is acknowledged. Part of this work was supported by NSF under Grant #ECS-9906250.

REFERENCES

[1] Gertler, J., "Residual Generation from Principal Component Models for Fault Diagnosis in Linear Systems. Part I: Review of Static Systems", 2005 ISIC-MED Conference.
[2] Ku, W., R.H. Storer and C. Georgakis, "Disturbance Detection and Isolation by Dynamic Principal Component Analysis", Chemometrics and Intelligent Laboratory Systems, 30, 179 (1995).
[3] Wang, J. and S.J. Qin, "A New Subspace Identification Approach Based on Principal Component Analysis", J. of Process Control, 12, 841 (2002).
[4] Li, W. and S.J. Qin, "Consistent Dynamic PCA Based on Errors-in-Variables Subspace Identification", J. of Process Control, 11, 661 (2001).
[5] Qin, S.J. and W. Li, "Detection, Identification, and Reconstruction of Faulty Sensors with Maximized Sensitivity", AIChE Journal, 45, 1963 (1999).
[6] Xu, R. and C. Kwan, "Robust Isolation of Sensor Failures", Asian Journal of Control, 5, 12 (2003).
[7] Kwan, C., J. Cao, W. Li, R. Xu and L. Haynes, "Principal Component Analysis for Feature Extraction from One Dimensional Signals, Phase 1 Final Report", Intelligent Automation, Inc. (2003).
[8] Gertler, J. and J. Cao, "PCA-Based Fault Diagnosis in the Presence of Control and Dynamics", AIChE Journal, 50, 388 (2004).
[9] Gertler, J., "Fault Detection and Diagnosis in Engineering Systems", Marcel Dekker, New York (1998).