
IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, VOL. 12, NO. 6, NOVEMBER 2004

A Probabilistic Approach to Fault Diagnosis of Industrial Systems

Alberto Barigozzi, Lalo Magni, and Riccardo Scattolini

Abstract—A method for fault diagnosis of industrial systems is presented. Plant devices, sensors, actuators, and diagnostic tests are described as stochastic finite-state machines. A formal composition rule of these models is given to obtain: 1) the set of admissible fault signatures; 2) their conditional probability given any fault; and 3) the conditional probability of a fault given a prescribed signature. The modularity and flexibility of this method make it suitable for complex systems made up of a large number of components. The method is used in an industrial automotive application; specifically, the diagnosis of the throttle body and of the angular sensors measuring the throttle plate angle is described in detail.

Index Terms—Automata, automotive, fault diagnosis, finite-state machines, probabilistic models.

Manuscript received July 23, 2003. Manuscript received in final form February 17, 2004. Recommended by Associate Editor A. T. Vemuri. This work was supported in part by MURST Project “New techniques for identification and adaptive control of industrial systems.”
A. Barigozzi was with the Dipartimento di Informatica e Sistemistica, University of Pavia, Italy. He is now with IBM Global Services-ITS, 20132, Milano, Italy (e-mail: [email protected]).
L. Magni is with the Dipartimento di Informatica e Sistemistica, University of Pavia, Italy (e-mail: [email protected]).
R. Scattolini is with the Dipartimento di Elettronica e Informazione, Politecnico di Milano, 20133 Milano, Italy (e-mail: [email protected]).
Digital Object Identifier 10.1109/TCST.2004.833606

I. INTRODUCTION

MANY APPROACHES have been proposed so far for fault detection and diagnosis of industrial systems. Among them, signal-based (or data-driven) methods attempt to extract useful information from the analysis of specific signals; see, e.g., [1]. More sophisticated analytical model-based methods (see, e.g., the books [2], [3] and the references therein) rely on a quantitative model of the system under diagnosis to generate a residual sequence, formed by the difference between the predicted output and the actual measured output. Residuals are usually generated by parameter estimation, observers, and parity relations. Finally, knowledge-based methods use qualitative model descriptions in terms of signed directed graphs [4]–[6], symptom trees [7], artificial neural networks [8], [9], and expert systems [10]. A recent and complete presentation of the main approaches to fault diagnosis can be found in [11], while a review of the application of the main model-based techniques is reported in [12].

Each of the methods presented above has its own strengths and fields of application; however, it is widely recognized that in many cases the design of diagnostic systems for complex plants calls for a wise combination of various techniques; see [13]–[16]. The proposed mixed approaches are usually derived by combining physical, first-principles, and qualitative models obtained with neural networks [13] or statistical classifiers [14].

In other cases [15], they integrate analytical and heuristic symptoms by means of the theory of fuzzy sets to obtain a unified diagnostic reasoning strategy.

In this brief, a new method is presented to integrate, in a unified framework, the different diagnosis approaches currently available. The diagnostic system is assumed to be composed of apparatuses and tests. Apparatuses are all the system components that can be subject to a fault, that is, plant devices, sensors, actuators, transmission lines, and software code. Tests are sources of information that can be used to monitor the system. As such, they can rely on analytical and hardware redundancies, signal analysis, and logical relations, and can be designed with the help of signal-based, model-based, or knowledge-based methods. Both apparatuses and tests are described as stochastic finite-state machines (FSMs), whose states represent the safe and fault behavior of apparatuses or the diagnosis of normal and abnormal conditions given by tests. Transitions between states are probabilistic and forced by events, which describe either the occurrence of faults or normal working conditions. Associated with apparatuses and tests there are also alarms, whose status (switched off/on) is deterministically defined by the current state of the FSM.

By assigning transition probabilities and marginal probabilities to safe and fault events, simple composition rules make it possible to determine the feasible configurations of alarms (signatures) and their conditional probability given any event. This is useful in the design of the diagnostic system, both to assess its capability to correctly identify and isolate faults and to tune the thresholds used by the diagnostic tests. With this method, it is also possible to determine the probability of a fault event given any signature during plant operation; this information is particularly useful to complete the fault diagnosis online. Finally, the incremental method adopted here to derive the overall model of the system, starting from the FSM models of apparatuses and tests, provides modularity and flexibility for the rapid prototyping of the diagnostic strategy.

The method proposed in this brief extends to the stochastic case the methods presented in [17] and [18]. Moreover, the use of FSMs to describe the system under diagnosis has already been considered in [19], where a fault observer was derived using the information provided by the sequence of events registered in working conditions. With respect to the deterministic methods proposed in [17]–[19], the probabilistic point of view adopted here makes it easy to include in the problem formulation the intrinsic nondeterminism implicit in the definition of normal (or abnormal) behavior of the system, or in the use of thresholds, which can lead to false alarms or missed detections.

1063-6536/04$20.00 © 2004 IEEE


The method proposed here has already been used in an industrial automotive application to study the strategy for diagnosing faults of the throttle body, the intake manifold, the accelerator and brake pedals, the combustion chamber, and a number of sensors. In a deterministic environment [17], the results achieved were in full agreement with those provided by a standard failure mode and effect analysis (FMEA), which, however, required much more development effort. Owing to space limitations, only a smaller and more tractable problem is described in detail here, namely the diagnosis of the throttle body and of the angular sensors measuring the throttle plate angle.

II. MODELING DIAGNOSTIC SYSTEMS WITH STOCHASTIC AUTOMATA

A. Models of Apparatuses and Tests

A diagnostic system is composed of apparatuses and diagnostic tests. Apparatuses are devices that can be subject to faults, such as plant components, sensors, and actuators, as well as software code, control algorithms, and transmission networks. Diagnostic tests are used to detect and isolate faults; they can be simple operations, such as signal comparisons, or more sophisticated algorithms, like those based on consistency relations, analytical redundancies, or logical propositions.

Both apparatuses and tests can be described by a finite-state machine FSM $= (X, E, Y, f, p)$, where the set of states $X$ describes the normal or the fault behavior of components, and the symbol $n_X$ represents the cardinality of the set $X$. $E$ is the set of events, which correspond to the occurrence of faults and govern the transitions between states. Outputs $Y$ are the available alarms, while $f$ is the deterministic output transformation, $f: X \to \{0,1\}^{n_Y}$. Thus, the current state uniquely defines the status of alarms. Finally, $p$ is the state transition probability, i.e., $p: X \times E \to [0,1]$, where $p(\cdot \mid e)$ is a discrete probability measure. Obviously, $\sum_{x \in X} p(x \mid e) = 1$, $\forall e \in E$.

B. Composition Rules

A formal composition rule of FSM models is now derived under the following assumptions.

Assumption A1 (Independence of Alarms): The sets of outputs of the FSM models of apparatuses and tests are independent.

Assumption A2 (No Simultaneous Faults): Fault events can occur only one at a time and, once a fault has occurred, the diagnostic procedure is complete before the arrival of a new fault event.

Note that Assumption A1 is quite natural, since it simply states that each alarm is uniquely related to an apparatus or to a test. Assumption A2, on the contrary, is more restrictive, but it is commonly accepted in many industrial fields, such as the automotive industry.

Given two FSM models of apparatuses or tests, FSM$_1 = (X_1, E_1, Y_1, f_1, p_1)$ and FSM$_2 = (X_2, E_2, Y_2, f_2, p_2)$, their synchronous composition is described by the model FSM $= (X, E, Y, f, p)$, which can be obtained according to the following composition rules, where $e_0$ is the safe event:

$$
\begin{aligned}
E &= E_1 \cup E_2, \qquad Y = Y_1 \cup Y_2, \qquad f(x_1, x_2) = \big(f_1(x_1), f_2(x_2)\big), \\
p\big((x_1, x_2) \mid e\big) &=
\begin{cases}
p_1(x_1 \mid e)\, p_2(x_2 \mid e) & \text{if } e \in E_1 \cap E_2 \\
p_1(x_1 \mid e)\, p_2(x_2 \mid e_0) & \text{if } e \in E_1 \setminus E_2 \\
p_1(x_1 \mid e_0)\, p_2(x_2 \mid e) & \text{if } e \in E_2 \setminus E_1
\end{cases} \\
X &= \big\{ (x_1, x_2) \in X_1 \times X_2 : \exists\, e \in E,\ p\big((x_1, x_2) \mid e\big) > 0 \big\}.
\end{aligned}
\tag{1}
$$

Rule (1) can be regarded as an extension to the stochastic case of the synchronous composition rules of deterministic automata (see [20]), usually applied in the analysis of discrete event systems. In the composition of FSM models, it can happen that some composite states $(x_1, x_2)$ are unreachable by any event, that is, $p((x_1, x_2) \mid e) = 0$, $\forall e \in E$; hence, these states are removed from $X$ by the last rule, thereby reducing its cardinality.

C. Implementation

In view of an algorithmic implementation, any FSM model of apparatuses or tests can be represented by two matrices, namely, the state/event matrix $M$ and the output transformation matrix $F$. The rows of $M$ correspond to states, its columns are associated to events, and its element $m_{ij}$ is the probability that event $e_j$ forces a transition to state $x_i$, so that $0 \le m_{ij} \le 1$ and $\sum_i m_{ij} = 1$, $\forall j$. As for matrix $F$, its rows correspond to states, while its columns are associated to alarms; hence, $f_{ik}$ is equal to one if alarm $y_k$ is switched on in state $x_i$ and is zero otherwise.

Given two models FSM$_1$ and FSM$_2$, described by $(M_1, F_1)$ and $(M_2, F_2)$, the composite model FSM is described by the state/event matrix $M$ and by the output transformation matrix $F$. Each element of $M$ corresponds to a composite state $(x_1, x_2)$ and to an event $e$. Its value is given by:
• the product of the term associated to $x_1$ and $e$ in $M_1$ and the term associated to $x_2$ and $e$ in $M_2$ when $e \in E_1 \cap E_2$;
• the product of the term associated to $x_1$ and $e$ in $M_1$ and the term associated to $x_2$ and $e_0$ in $M_2$ when $e \in E_1 \setminus E_2$;
• the product of the term associated to $x_1$ and $e_0$ in $M_1$ and the term associated to $x_2$ and $e$ in $M_2$ when $e \in E_2 \setminus E_1$.
Null rows of $M$ must be removed, as they correspond to unreachable states.

As for the matrix $F$, it has $n_X$ rows and $n_{Y_1} + n_{Y_2}$ columns. Its element corresponding to the composite state $(x_1, x_2)$ and to the output $y$ is equal to the term associated to $x_1$ and $y$ in $F_1$ or to $x_2$ and $y$ in $F_2$ (recall Assumption A1).

TABLE I STATE/EVENT MATRIX OF THE THROTTLE

TABLE II STATE/EVENT AND OUTPUT TRANSFORMATION MATRICES OF THE SENSORS

TABLE III STATE/EVENT AND OUTPUT TRANSFORMATION MATRICES OF THE HARDWARE REDUNDANCY TEST

TABLE IV STATE/EVENT AND OUTPUT TRANSFORMATION MATRICES OF THE ANALYTICAL REDUNDANCY TEST
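The matrix form of the synchronous composition described in Section II-C can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `compose`, the event labels (`e0` for the safe event, `eT`, `eE`, `eF`), and all numerical values are assumptions chosen for illustration, and the state/event matrices are assumed to store $P(\text{state} \mid \text{event})$ with one column per event.

```python
import numpy as np

def compose(M1, events1, F1, M2, events2, F2, safe="e0"):
    """Synchronous composition of two stochastic FSMs given in matrix form.

    M[i, j] = P(state i | event j), so every column sums to one.
    F[i, k] = 1 if alarm k is switched on in state i, else 0.
    """
    # Union of the two event sets, keeping the order of first appearance.
    events = list(events1) + [e for e in events2 if e not in events1]
    n1, n2 = M1.shape[0], M2.shape[0]
    M = np.zeros((n1 * n2, len(events)))
    for j, e in enumerate(events):
        # A machine that does not know event e behaves as under the safe event.
        c1 = M1[:, events1.index(e if e in events1 else safe)]
        c2 = M2[:, events2.index(e if e in events2 else safe)]
        # Row i1 * n2 + i2 of M encodes the composite state (i1, i2).
        M[:, j] = np.outer(c1, c2).ravel()
    # Alarms belong to exactly one machine (Assumption A1): place the
    # alarm columns of the two machines side by side.
    F = np.hstack([np.repeat(F1, n2, axis=0), np.tile(F2, (n1, 1))])
    # Null rows correspond to unreachable composite states and are removed.
    reachable = M.sum(axis=1) > 0
    return M[reachable], events, F[reachable]

# Hypothetical throttle model (two states, no alarm of its own).
M_T = np.array([[1.0, 0.1],
                [0.0, 0.9]])
F_T = np.zeros((2, 0))
# Hypothetical sensor model (two states, one alarm).
M_S = np.array([[0.99, 0.0, 0.3],
                [0.01, 1.0, 0.7]])
F_S = np.array([[0.0], [1.0]])

M, events, F = compose(M_T, ["e0", "eT"], F_T, M_S, ["e0", "eE", "eF"], F_S)
print(events)
print(M)
print(F)
```

Applying `compose` repeatedly, each time with the result of the previous call as the first model, would build up an overall model one apparatus or test at a time.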

D. Interpretation and Use of the Model

By repeatedly applying the composition rule (1) to the models of all apparatuses and tests, one finally obtains the overall model of the system, together with its state/event matrix $M$ and output transformation matrix $F$. The $n_X$ rows of $M$ correspond to all the composite states obtained, while its $n_E$ columns are associated to all possible events in the system. The element $m_{ij}$ is the probability that the system is in the $i$th composite state given the $j$th event; as such, $0 \le m_{ij} \le 1$ and $\sum_i m_{ij} = 1$, $\forall j$. As already stated, null rows of $M$ are associated to unreachable composite states and must be removed. Matrix $F$ has $n_Y$ columns, which correspond to the outputs of all the collected submodels, and $n_X$ rows. The element $f_{ik}$ is one provided that one of the states activating the alarm $y_k$ belongs to the composite state corresponding to row $i$; otherwise, it is zero. The set $\Sigma$ of signatures, that is, the configurations of alarms allowed by the system, is given by the different rows of $F$. Remarkably, its cardinality $n_\Sigma$ can be smaller than $n_X$, because two or more composite states can produce the same signature.

Matrices $M$ and $F$ are not particularly useful per se, but are necessary to compute the matrix henceforth called $P_\Sigma$, with $n_\Sigma$ rows and $n_E$ columns. The $(i,j)$th element of $P_\Sigma$ is the conditional probability of the signature $\sigma_i$ given the event $e_j$. To determine its value, it is enough to select from the rows of $M$ the composite states producing the configuration of alarms corresponding to $\sigma_i$ and to add the conditional probabilities of these states given $e_j$, as specified by the elements of $M$. Obviously, $\sum_i [P_\Sigma]_{ij} = 1$, $\forall j$. The information provided by matrix $P_\Sigma$ is useful in the design of the diagnostic strategy to assess its capability to discriminate between different fault events. Indeed, a satisfactory strategy should be such that any fault event is related with high probability to only one signature.

Finally, given the marginal probabilities of the events, from the matrix $P_\Sigma$ and by means of Bayes' theorem, it is possible to compute the probability of the event $e_j$ given the signature $\sigma_i$. This information can be stored in the matrix $P_E$, with $n_\Sigma$ rows and $n_E$ columns, and such that $\sum_j [P_E]_{ij} = 1$, $\forall i$. In real-time operation, the information stored in matrix $P_E$ is useful for the diagnosis, that is, to estimate the most likely fault when a signature occurs.

III. FAULT DIAGNOSIS OF THE THROTTLE BODY

In the development of drive-by-wire systems, one of the most significant requirements for safe operation is that the torque provided by the engine equals the driver's request. For this purpose, it is necessary to perform an accurate online diagnosis of the throttle body. In the following, it is shown how the procedure described in this brief can be applied to the (simplified) problem of the diagnosis of a throttle and of two sensors, duplicated for hardware redundancy, used to measure the throttle angle.

For the diagnosis of the throttle and of the sensors, four tests are performed. Two of them are local tests of the sensors, checking whether the measured signal is inside a prescribed range. A third test is related to hardware redundancy and checks the discrepancy between the measures provided by the two sensors. Finally, an analytical redundancy test, fed by the measure provided by the first sensor, is used to isolate throttle faults. It is based on the comparison between the computed and measured air flow rate inside the intake manifold. In the description of the analytical redundancy test, it is assumed that all the other signals used are free from errors. The rationale behind the scheme is that gross sensor errors are diagnosed locally, more subtle sensor faults are revealed by the hardware redundancy test, and throttle faults are detected by the analytical redundancy test.

A. Model of the Throttle

The throttle is an apparatus which, in the simplified example described here, can be either in the safe operating state or, because of the throttle fault event, in the fault state. This fault event can be associated, for example, with a mechanical block or with a malfunction of the electric drive. Its FSM model has no output transformation and is described by the state/event matrix reported in Table I. Hence, the throttle cannot be in the fault state when it works properly, while a negligible fault, for example a small leakage, can leave the device in the safe state.


TABLE V STATE/EVENT MATRIX OF THE OVERALL MODEL

TABLE VI OUTPUT TRANSFORMATION MATRIX OF THE OVERALL MODEL

TABLE VII MATRIX $P_\Sigma$

B. Model of the Sensors

The two sensors are apparatuses subject to electrical and functional faults, each described by its own fault event. Electrical faults correspond to a short (or to an open) circuit, while functional faults can represent a bias or a long-term drift. The FSM model of each sensor has three events, namely, the safe event, the electrical fault event, and the functional fault event. Its status can be described by two states: the safe state and the fault state. Correspondingly, the output of each sensor model is equal to zero in the safe state and to one in the fault state. Hence, the sensors can be described by the state/event and output transformation matrices shown in Table II.

In this case, there can be both a false alarm (the safe event can lead to the fault state) and a missed detection. In particular, a functional fault, for example a small bias, maintains the system in the safe state with a significant probability.
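As a minimal sketch of how a sensor model of this kind could be encoded, the Python fragment below builds a two-state, three-event state/event matrix and its output transformation matrix, and reads off the false-alarm and missed-detection probabilities. The numerical values, the labels, and the helper `check_fsm` are hypothetical; they are not the entries of Table II.

```python
import numpy as np

EVENTS = ["safe", "electrical_fault", "functional_fault"]
STATES = ["safe_state", "fault_state"]

# M[i, j] = P(state i | event j); all values are hypothetical.
M = np.array([[0.98, 0.00, 0.40],   # safe state
              [0.02, 1.00, 0.60]])  # fault state
# F[i, 0] = status of the sensor alarm in state i (0 = off, 1 = on).
F = np.array([[0],
              [1]])

def check_fsm(M, F):
    """Raise ValueError if (M, F) is not a valid stochastic FSM description."""
    if M.shape[0] != F.shape[0]:
        raise ValueError("M and F must have one row per state")
    if np.any(M < 0) or not np.allclose(M.sum(axis=0), 1.0):
        raise ValueError("each column of M must be a probability distribution")

check_fsm(M, F)

# False alarm: the safe event leads to the fault state.
false_alarm = M[STATES.index("fault_state"), EVENTS.index("safe")]
# Missed detection: a functional fault leaves the sensor in the safe state.
missed_detection = M[STATES.index("safe_state"), EVENTS.index("functional_fault")]
print(false_alarm, missed_detection)
```

In a model of this form, tightening or loosening a test threshold would show up as a trade-off between these two entries.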


TABLE VIII MATRIX $P_E$

C. Model of the Hardware Redundancy Test

The hardware redundancy (HR) test has two states, namely, the safe state and the fault state, and is subject to the electrical and functional fault events of both sensors, besides the safe event. Its state/event and output transformation matrices are given in Table III.

D. Model of the Analytical Redundancy Test

Finally, the analytical redundancy test, which is based on the angular measure provided by the first sensor, is sensitive only to the throttle fault event and to the electrical and functional fault events of the first sensor, besides the safe event, and has two states, namely, the safe state and the fault state. The matrices describing its FSM model are reported in Table IV.

By applying (1), after the elimination of the null rows, the overall FSM model has 24 states, S1-S24, and is described by the state/event matrix of Table V. Note that, as expected, four of the events are each associated with high probability to only one composite state. On the contrary, the remaining two events can lead the system to different composite states with comparable probabilities. Indeed, the sensors' functional faults are more subtle and difficult to diagnose. The overall output transformation matrix is reported in Table VI, where the signatures are also defined. The matrix shows that different states, for example S2 and S6, produce the same signature. As such, they are indistinguishable from the analysis of the configuration of alarms.

As described in Section II-D, it is possible to compute from the matrices of Tables V and VI the matrix $P_\Sigma$ shown in Table VII. This matrix clearly shows that there is a specific signature associated with each of the events that lead with high probability to a single composite state, while the sensors' functional faults can cause different configurations of alarms with similar probabilities. Moreover, from $P_\Sigma$ it is apparent that the probability of obtaining some signatures is very low for each possible event.

Finally, given the assumed marginal probabilities of the events, the matrix $P_E$ in Table VIII can be computed by applying Bayes' theorem. Matrix $P_E$ stores all the information to be used for the online diagnosis. Once a given configuration of alarms is recognized, it is immediately possible to read from the corresponding row of $P_E$ the probability of any fault event. In the case considered here, it is apparent that some signatures could lead to an ambiguity in the fault diagnosis. For example, when certain signatures occur, both a throttle fault and a functional fault of the first sensor should be considered as possible causes. On the contrary, another signature leads almost uniquely to the diagnosis of a functional fault of the first sensor, while others can only be due to a fault of the first sensor, although its nature cannot be definitely decided from them.
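The two computations described in Section II-D, grouping composite states by signature and then inverting the result with Bayes' theorem, can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function name `diagnosis_matrices`, the small four-state model, and the prior probabilities are assumptions, and the numbers are not those of Tables V to VIII. The marginal probabilities are assumed to be strictly positive so that every signature has nonzero total probability.

```python
import numpy as np

def diagnosis_matrices(M, F, priors):
    """Return (signatures, P_sigma, P_E) for an overall FSM model.

    M[i, j]       = P(composite state i | event j)
    F[i, k]       = 1 if alarm k is on in composite state i
    priors[j]     = marginal probability of event j (assumed > 0)
    P_sigma[s, j] = P(signature s | event j)
    P_E[s, j]     = P(event j | signature s)
    """
    rows = [tuple(int(a) for a in row) for row in F]
    signatures = sorted(set(rows))          # distinct rows of F
    index = {s: i for i, s in enumerate(signatures)}
    P_sigma = np.zeros((len(signatures), M.shape[1]))
    for state_probs, sig in zip(M, rows):
        # States producing the same signature are summed together.
        P_sigma[index[sig]] += state_probs
    joint = P_sigma * np.asarray(priors)    # P(signature s, event j)
    P_E = joint / joint.sum(axis=1, keepdims=True)  # Bayes' theorem
    return signatures, P_sigma, P_E

# Hypothetical overall model: 4 composite states, 4 events, 1 alarm.
M = np.array([[0.99, 0.099, 0.0, 0.3],
              [0.01, 0.001, 1.0, 0.7],
              [0.00, 0.891, 0.0, 0.0],
              [0.00, 0.009, 0.0, 0.0]])
F = np.array([[0], [1], [0], [1]])
priors = [0.90, 0.04, 0.03, 0.03]           # hypothetical marginals

signatures, P_sigma, P_E = diagnosis_matrices(M, F, priors)
print(signatures)
print(P_sigma)
print(P_E)
# Online diagnosis: most likely event for each observed signature.
print(P_E.argmax(axis=1))
```

With a single alarm, as in this toy model, two of the events share the same column of `P_sigma` and therefore cannot be told apart, which is the kind of ambiguity the signature matrix is meant to expose at design time.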


IV. DISCUSSION AND CONCLUSION

The modularity of the proposed method allows the designer to create a library of FSM models of apparatuses and tests and to combine them quickly in many different configurations. This reduces the development time in the design of the diagnostic strategy. Moreover, in this way it is easy to modify the value of some parameters specifying the behavior of the tests and to verify the effects of these changes on the ability to diagnose faults. The tests can be based on any one of the many methods already available for fault detection, so that qualitative and quantitative knowledge of the system can be integrated in a single environment. It is also believed that the probabilistic approach adopted here is often necessary to obtain information and perform a diagnosis online even when the signature of the alarms is not an expected one.

The proposed approach requires large sets of data to obtain experimentally the transition probabilities of the FSM models, as well as the marginal probabilities of the events. Moreover, it must be pointed out that the size of the overall model of the system can become very large in view of the combinatorial composition rules adopted. However, in the composition of FSM models it is possible to force the elimination of composite states with negligible probability of occurrence and to normalize the reduced model for further compositions. Following this method, significant reductions of the size of the overall diagnostic system can be obtained at the price of an approximate solution.

REFERENCES

[1] M. Basseville and I. V. Nikiforov, Detection of Abrupt Changes: Theory and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[2] R. J. Patton, P. M. Frank, and R. N. Clark, Fault Diagnosis in Dynamic Systems: Theory and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1989.
[3] J. J. Gertler, Fault Detection and Diagnosis in Engineering Systems. New York: Marcel Dekker, 1998.


[4] M. Iri, K. Aoki, E. O’Shima, and H. Matsuyama, “An algorithm for diagnosis of system failures in the chemical process,” Comput. Chem. Eng., vol. 3, pp. 489–493, 1979.
[5] J. Guan and J. H. Graham, “Diagnostic reasoning with fault propagation digraph and sequential testing,” IEEE Trans. Syst., Man, Cybern., vol. 24, pp. 1552–1558, Oct. 1994.
[6] J. M. Koscielny, “Fault isolation in industrial processes by the dynamic table of states method,” Automatica, vol. 31, pp. 747–753, 1995.
[7] E. S. Yoon and J. H. Han, “Process failure detection and diagnosis using the tree model,” in Proc. IFAC World Congr., 1987, pp. 126–129.
[8] M. Chow, R. N. Sharpe, and J. C. Hung, “On the application and design consideration of artificial neural network fault detectors,” IEEE Trans. Ind. Electron., vol. 40, pp. 181–198, Apr. 1993.
[9] K. Watanabe, S. Hirota, L. Hou, and D. M. Himmelblau, “Diagnosis of multiple simultaneous fault via hierarchical artificial neural networks,” AIChE J., vol. 40, pp. 839–848, 1994.
[10] J. de Kleer, A. K. Mackworth, and R. Reiter, “Characterizing diagnoses and systems,” Artif. Intell., vol. 56, pp. 197–222, 1992.
[11] L. H. Chiang, E. L. Russell, and R. D. Braatz, Fault Detection and Diagnosis in Industrial Systems. New York: Springer-Verlag, 2001.
[12] R. Isermann and P. Ballé, “Trends in the application of model-based fault detection and diagnosis of technical processes,” Control Eng. Practice, vol. 5, pp. 709–719, 1997.
[13] H. E. Garcia and R. B. Vilim, “Combining physical modeling, neural processing, and likelihood testing for online process monitoring,” in Proc. IEEE Int. Conf. Systems, Man Cybernetics, vol. 1, Piscataway, NJ, 1998, pp. 806–810.
[14] D. Mylaraswamy and V. Venkatasubramanian, “A hybrid framework for large scale process fault diagnosis,” Comput. Chem. Eng., vol. 21-S, pp. S935–S940, 1997.
[15] R. Isermann and M. Ulieru, “Integrated fault detection and diagnosis, systems,” in Proc. IEEE Conf. Systems, Man Cybernetics: Systems Engineering Service of Human, 1993, pp. 743–748.
[16] G. Biswas, R. Kapadia, and X. W. Yu, “Combined qualitative-quantitative steady-state diagnosis of continuous-valued systems,” IEEE Trans. Syst., Man, Cybern. A, vol. 27, pp. 167–185, Mar. 1997.
[17] L. Magni, R. Scattolini, and C. Rossi, “A fault detection and isolation method for complex industrial systems,” IEEE Trans. Syst., Man, Cybern. A, vol. 30, pp. 860–865, Nov. 2000.
[18] L. Magni, R. Scattolini, and C. Rossi, “A design methodology for diagnostic strategies for industrial systems,” Int. J. Syst. Sci., vol. 33, pp. 505–512, 2002.
[19] M. Sampath, R. Sengupta, S. Lafortune, K. Sinnamohideen, and D. Teneketzis, “Failure diagnosis using discrete event models,” IEEE Trans. Contr. Syst. Technol., vol. 4, pp. 105–124, Mar. 1996.
[20] C. G. Cassandras and S. Lafortune, Introduction to Discrete Event Systems. Norwell, MA: Kluwer, 1999.