Fault Identification Algorithmic: A New Formal Approach
Bechir AYEB
University of Sherbrooke, RiFa Lab / DMI
Sherbrooke (PQ) CANADA J1K 2R1
[email protected]

Abstract
Much research has been devoted to system-level diagnosis. Two issues have been addressed: the first is diagnosability; the second is the design of fault identification algorithms. This paper focuses on the second of these concerns. It investigates the process of fault identification itself, introduces a new formal approach, and proposes a fault identification algorithm which runs in $O(n^2\sqrt{\tau}/\sqrt{\log n})$, $\tau < n/2$.
1 Prologue
Consider a system S of n units (e.g., subsystems, modules, chips, processors), where each unit is assigned a particular subset of the remaining units to test.* The complete collection of tests is called the connection assignment and can be represented by a directed graph $G\langle V, E \rangle$ where $V$ is a finite set of vertices, each vertex representing a unit of S, and $E$ is a set of directed edges, each representing a test between a pair of units. An outcome, denoted by $a_{ij}$, is associated with each edge $(v_i, v_j)$ in $E$, where $a_{ij} = 1$ (resp. $a_{ij} = 0$) if $v_i$ evaluates (i.e., tests) $v_j$ as faulty (resp. fault-free). The set of outcomes associated with the edges of the corresponding graph of S is called a syndrome.

Of course, when the tester is fault-free the outcome is deterministic. That is, for a fault-free tester $v_j$, if $v_i$ is fault-free then $a_{ji} = 0$, and if $v_i$ is faulty then $a_{ji} = 1$. When the tester is faulty, matters are more complicated and assumptions may have to be made. Under the PMC model [21] (also called the symmetric model), no assumptions are made:^1 the outcome provided by a faulty tester may be either 0 or 1. A syndrome is said to be consistent with $F$, a subset of units, if it may be obtained when $F$ is the set of all faulty units. A system S is called $\tau$-diagnosable if, given a syndrome, all the (permanent) faulty units in S can be identified provided that the number of faulty units does not exceed $\tau$.

Two related issues are worth addressing here. The first concerns the necessary and sufficient conditions for a system S to be $\tau$-diagnosable. For the PMC model, these conditions are given in [15].^2

Lemma 1 (Hakimi & Amin [15]) A system S is $\tau$-diagnosable iff: $n \geq 2\tau + 1$; $|\Gamma^{-1}(v)| \geq \tau$, $\forall v \in V$; and $|\Gamma(V')| > p$, $\forall V' \subset V$ such that $|V'| = n - 2\tau + p$ and $0 \leq p \leq \tau - 1$. □

* This work is supported by Federal Grant OGP0121468; NSERC.
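The notion of a syndrome being consistent with a fault set $F$ under the PMC model can be illustrated with a short sketch. The function name `is_consistent`, the dictionary encoding of the syndrome, and the three-unit ring used as input are assumptions made for illustration only; they do not come from the paper.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding: (tester, tested) -> outcome a_ij
Syndrome = Dict[Tuple[str, str], int]


def is_consistent(syndrome: Syndrome, faulty: Set[str]) -> bool:
    """Return True if `syndrome` may be observed when `faulty` is the set of all faulty units (PMC model)."""
    for (tester, tested), outcome in syndrome.items():
        if tester in faulty:
            # Under PMC, a faulty tester may report either 0 or 1.
            continue
        # A fault-free tester reports 1 exactly when the tested unit is faulty.
        expected = 1 if tested in faulty else 0
        if outcome != expected:
            return False
    return True


# Illustrative ring: u1 tests u2, u2 tests u3, u3 tests u1.
syndrome = {("u1", "u2"): 0, ("u2", "u3"): 1, ("u3", "u1"): 0}
print(is_consistent(syndrome, {"u3"}))  # True
print(is_consistent(syndrome, set()))   # False
```

With the empty fault set, the fault-free tester u2 could not have reported 1 on u3, which is why the second call rejects it.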
Similar results have been obtained in [1], [13], and [23]. Let us now turn to the second issue, which concerns the design of fault identification algorithms for $\tau$-diagnosable systems. Under the PMC model, we have the following results: (1) the Kameda et al. algorithm runs in $O(\tau|E|)$ [17]; (2) the Dahbura & Masson algorithm runs in $O(n^{2.5})$ [13]; (3) the Sullivan algorithm runs in $O(\tau^3 + |E|)$ [24].

This paper is concerned with the second issue, that is, fault identification algorithms. We first shed some light on the process of fault identification, which yields a simple algorithm to identify all faulty units. The proposed algorithm runs in $O(n^2\sqrt{\tau}/\sqrt{\log n})$, $\tau < n/2$. This bound is not tight, however; it depends on solving other related problems. Hence, subsequent improvements are possible, but they do not alter the basic results given in this paper.

The rest of this paper is organized as follows. Section 2 provides the problem formulation and an illustrative example. It also presents D, a fault identification algorithm with the complexity mentioned above. Finally, Section 3 summarizes this paper and presents concluding remarks.

^1 In [7], a new model is proposed. It assumes that $a_{ij} = 1$ whenever both units $v_i$ and $v_j$ are faulty. This model is known as the BGM model; it is also called the asymmetric model.
^2 As usual, we have: $\Gamma(v_i) = \{v_j \mid (v_i, v_j) \in E\}$ and $\Gamma^{-1}(v_i) = \{v_j \mid (v_j, v_i) \in E\}$. If $V_1$ and $V_2$ are two given sets then (1) $|V_1|$ denotes the cardinality of set $V_1$; (2) $V_1 \setminus V_2$ denotes set difference; (3) $\Gamma(V_1) = (\bigcup_{v \in V_1} \Gamma(v)) \setminus V_1$; (4) $\Gamma^{-1}(V_1) = (\bigcup_{v \in V_1} \Gamma^{-1}(v)) \setminus V_1$.
2 Our Framework
Due to space limitations, the paper has been drastically shortened. Results are stated without proof and several steps have been simplified. The complete version, including details and proofs, is available in [3], a 20-page research report.
2.1 Formulation
A system S under diagnosis is a pair $\langle \mathcal{U}, \tau \rangle$ where $\mathcal{U}$ is a finite set of constants $\{u_1, \ldots, u_n\}$ representing the units of S, and $\tau$ is an integer denoting the diagnosability of S.
To formalize^3 the notion of a syndrome corresponding to a given system S, we introduce the following distinguished predicate [6]. The unary predicate $uu(x)$, $x \in \mathcal{U}$, is interpreted as meaning that the unit $x$ is unreliable, i.e., faulty. Using the $uu(\cdot)$ predicate, we formalize the outcomes composing the syndrome of the system under diagnosis as follows. Let $u_i$ and $u_j$ be two units of $\mathcal{U}$. If $a_{ij} = 0$ then we write $\neg uu(u_i) \Rightarrow \neg uu(u_j)$. Conversely, if $a_{ij} = 1$ then we write $\neg uu(u_i) \Rightarrow uu(u_j)$. Observe that $a_{ij} = 1$ and $a_{ji} = 1$ yield the same formula, up to contraposition. Clearly, a syndrome, hereafter denoted by $\Sigma$, consists of a set of formulas which have the following forms: (1) $\neg uu(u_i) \Rightarrow \neg uu(u_j)$, which is equivalent to the clause $(uu(u_i) \vee \neg uu(u_j))$, or (2) $\neg uu(u_i) \Rightarrow uu(u_j)$, which can be rewritten as $(uu(u_i) \vee uu(u_j))$.

As in [6], we suppose that we have a conventional deduction system, denoted here by $\models_{sld}$, with two inference rules. Let $\Phi$ and $\Psi$ be two literals. The first rule is the standard Modus Ponens: if $\Phi$ holds and $\Phi \Rightarrow \Psi$ holds, then $\Psi$ holds; e.g., if $uu(x)$ holds and $uu(x) \Rightarrow uu(y)$ holds then $uu(y)$ holds. The second is the resolution rule adapted to $\models_{sld}$, which combines two implications into a new one; Lemma 2 in §2.3 gives the instance used in this paper. For the sake of completeness, we add the following axiom $[\forall x \in \mathcal{U}, uu(x) \oplus \neg uu(x)]$ to our deduction system $\models_{sld}$. This axiom states that any unit must be exclusively unreliable (i.e., faulty) or not unreliable (i.e., fault-free).

Observe that the only predicate which can occur in the formulas handled by $\models_{sld}$, our deduction system, is $uu(x)$, where $x \in \mathcal{U}$, the given set of units. Therefore, all conjunctions and clauses in this paper are necessarily built using the $uu(x)$ predicate, where $x \in \mathcal{U}$. If $\Phi$ is a formula, then $\mathcal{U}(\Phi)$ denotes the set of units occurring in $\Phi$. Since units in $\mathcal{U}(\Phi)$ can occur in positive or negative literals, $\mathcal{U}^{+}(\Phi)$ denotes all units occurring in positive literals, whereas $\mathcal{U}^{-}(\Phi)$ denotes those occurring in negative literals. Let $\Phi = uu(u) \wedge uu(v) \wedge \neg uu(p) \wedge \neg uu(q) \wedge \neg uu(r)$ be a formula; then $\mathcal{U}(\Phi) = \{u, v, p, q, r\}$, $\mathcal{U}^{+}(\Phi) = \{u, v\}$, and $\mathcal{U}^{-}(\Phi) = \{p, q, r\}$. Now, we can define a diagnosis as follows.

^3 The following notation is used throughout this paper: capital Greek letters (e.g., $\Sigma$, $\Delta$, $\Pi$) denote formulas. End-of-alphabet lower-case letters $x, y, z$ denote variables, while other lower-case letters (e.g., $u, v, u_i, v_j, a, b, c, \ldots$) denote constants. Calligraphic characters (e.g., $\mathcal{U}$, $\mathcal{U}_v$) denote sets. Integers are denoted by lower-case Greek letters, e.g., $\tau$, $\kappa$, $\rho$. The symbol $\wedge$ (resp. $\vee$, resp. $\neg$, resp. $\oplus$, resp. $\Rightarrow$) denotes and (resp. or, resp. not, resp. exclusive or, resp. implies). Finally, the $\setminus$ operator denotes set difference, while $|E|$ denotes the cardinality of the set $E$.
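Definition 1 Let $S\langle \mathcal{U}, \tau \rangle$ be a system under diagnosis. Then a Diagnosis for S w.r.t. syndrome $\Sigma$ is a conjunction $\Delta$ such that: (1) $|\mathcal{U}^{+}(\Delta)|$ is minimal w.r.t. set cardinality; (2) $\Delta \models_{sld} \Sigma$; and (3) $\{\Sigma \cup \Delta\}$ is consistent. □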
The following properties are simple consequences of the above definition and make a first connection with the definition of a diagnosis used in the system-level literature.
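Property 1 Let $S\langle \mathcal{U}, \tau \rangle$ be a system under diagnosis. If $\Delta$ is a diagnosis for S w.r.t. syndrome $\Sigma$, then the following statements hold: (1) $\forall x \in \{\mathcal{U} \setminus \mathcal{U}^{+}(\Delta)\}, \forall y \in \mathcal{U}^{+}(\Delta), (\neg uu(x) \Rightarrow \neg uu(y)) \notin \Sigma$. (2) $\forall x, y \in \mathcal{U}, (\neg uu(x) \Rightarrow uu(y)) \in \Sigma \Rightarrow x \in \mathcal{U}^{+}(\Delta) \vee y \in \mathcal{U}^{+}(\Delta)$. (3) If $S\langle \mathcal{U}, \tau \rangle$ is $\tau$-diagnosable then $\Delta$ is necessarily unique and $|\mathcal{U}^{+}(\Delta)| \leq \tau$. □

Property 2 Let $S\langle \mathcal{U}, \tau \rangle$ be a system under diagnosis. Consider $\Sigma$, a syndrome of S, and let $\Delta_1$ and $\Delta_2$ be two conjunctions such that: (1) $|\mathcal{U}^{+}(\Delta_1)| \leq \tau$, $|\mathcal{U}^{+}(\Delta_2)| \leq \tau$; (2) $\mathcal{U}^{+}(\Delta_1) \neq \mathcal{U}^{+}(\Delta_2)$, $\mathcal{U}^{+}(\Delta_1) \cap \mathcal{U}^{+}(\Delta_2) \neq \emptyset$; (3) $\Delta_1$ is a diagnosis for S w.r.t. $\Sigma$; and (4) $\Delta_2$ is a diagnosis for S w.r.t. $\Sigma$. Then $S\langle \mathcal{U}, \tau \rangle$ is not $\tau$-diagnosable. □

Consider $\Sigma$, a set of formulas, and $\Pi$, a conjunction. $\Pi$ is called an implicant of $\Sigma$ provided that $\Pi \models \Sigma$. Let $\Pi$ be an implicant of $\Sigma$; then $\Pi$ is called a prime implicant provided that if $\Pi'$ is an implicant of $\Sigma$ and $\Pi \models \Pi'$ then $\Pi = \Pi'$. In what follows, we focus only on prime implicants; we write p-implicants to denote prime implicants. As in [6], the following property is immediate.

Property 3 Let $S\langle \mathcal{U}, \tau \rangle$ be a system under diagnosis and $\Sigma$ its syndrome. If $\Pi$ is a p-implicant of $\Sigma$ such that $\mathcal{U}^{+}(\Pi)$ is minimal w.r.t. set cardinality, then $\Pi$ is a diagnosis of $S\langle \mathcal{U}, \tau \rangle$. □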
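The translation of test outcomes into clauses used throughout this subsection can be sketched in a few lines. The representation of a literal as a `(unit, polarity)` pair, the function name `encode`, and the three sample outcomes are illustrative assumptions, not part of the paper.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical representation: (unit, True) stands for uu(unit),
# (unit, False) stands for "not uu(unit)".
Literal = Tuple[str, bool]
Clause = FrozenSet[Literal]


def encode(outcomes: Dict[Tuple[str, str], int]) -> Set[Clause]:
    """Turn outcomes a_ij into the two clause forms of a syndrome."""
    clauses: Set[Clause] = set()
    for (ui, uj), a in outcomes.items():
        if a == 0:
            # not uu(ui) => not uu(uj), i.e. (uu(ui) or not uu(uj))
            clauses.add(frozenset({(ui, True), (uj, False)}))
        else:
            # not uu(ui) => uu(uj), i.e. (uu(ui) or uu(uj))
            clauses.add(frozenset({(ui, True), (uj, True)}))
    return clauses


# Illustrative outcomes: a tests b (0), b and c test each other (both 1).
outcomes = {("a", "b"): 0, ("b", "c"): 1, ("c", "b"): 1}
sigma = encode(outcomes)
print(len(sigma))  # 2: the two mutual outcomes of value 1 give one clause
```

Storing clauses as frozensets makes the remark about $a_{ij} = a_{ji} = 1$ visible directly: both outcomes collapse into a single clause.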
2.2 An Illustrative Example
For the sake of illustration, let us borrow the following example from Dahbura & Masson [13]. Let $S_{dm}\langle \mathcal{U}_{dm}, \tau_{dm} \rangle$ be the system under diagnosis, where $\mathcal{U}_{dm} = \{a, b, c, d, e, f, g, h, i\}$ and $\tau_{dm} = 4$. Fig. 2.1 depicts $S_{dm}$ together with $\Sigma_{dm}$, its corresponding syndrome.

[Figure: test graph over the units a to i, each edge labelled with its outcome 0 or 1.]
Fig. 2.1: Illustrative Example: $S_{dm}\langle \mathcal{U}_{dm}, \tau_{dm} \rangle$.
$$
\Sigma_{dm} = \left\{
\begin{array}{ll}
(uu(b) \vee \neg uu(a)), & (uu(c) \vee \neg uu(a)), \\
(uu(d) \vee \neg uu(a)), & (uu(c) \vee \neg uu(b)), \\
(uu(b) \vee \neg uu(c)), & (uu(c) \vee \neg uu(d)), \\
(uu(c) \vee \neg uu(e)), & (uu(d) \vee \neg uu(e)), \\
(uu(f) \vee \neg uu(g)), & (uu(g) \vee \neg uu(f)), \\
(uu(i) \vee uu(b)), & (uu(c) \vee uu(i)), \\
(uu(d) \vee uu(i)), & (uu(i) \vee uu(e)), \\
(uu(h) \vee uu(b)), & (uu(c) \vee uu(h)), \\
(uu(d) \vee uu(h)), & (uu(e) \vee uu(h)), \\
(uu(a) \vee uu(g)), & (uu(b) \vee uu(g)), \\
(uu(c) \vee uu(g)), & (uu(g) \vee uu(d)), \\
(uu(f) \vee uu(a)), & (uu(f) \vee uu(b)), \\
(uu(c) \vee uu(f)), & (uu(d) \vee uu(f))
\end{array}
\right\}
$$

Considering the syndrome $\Sigma_{dm}$, $S_{dm}\langle \mathcal{U}_{dm}, \tau_{dm} \rangle$ admits the following diagnosis: $\Delta_{dm} = \neg uu(a) \wedge \neg uu(b) \wedge \neg uu(c) \wedge \neg uu(d) \wedge \neg uu(e) \wedge uu(f) \wedge uu(g) \wedge uu(h) \wedge uu(i)$
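The stated diagnosis can be checked mechanically against the 26 clauses of the example syndrome. The two-letter string encoding and the helper name `satisfies` below are assumptions made for brevity; the clause lists themselves are transcribed from the syndrome above.

```python
# "xy" in MIXED encodes the clause (uu(x) or not uu(y)).
MIXED = "ba ca da cb bc cd ce de fg gf".split()
# "xy" in CONFLICTS encodes the clause (uu(x) or uu(y)).
CONFLICTS = "ib ci di ie hb ch dh eh ag bg cg gd fa fb cf df".split()


def satisfies(faulty):
    """True if declaring exactly `faulty` unreliable satisfies every clause."""
    mixed_ok = all(x in faulty or y not in faulty for x, y in MIXED)
    conflicts_ok = all(x in faulty or y in faulty for x, y in CONFLICTS)
    return mixed_ok and conflicts_ok


delta_dm = set("fghi")           # units declared unreliable by the diagnosis
print(len(MIXED) + len(CONFLICTS))  # 26
print(satisfies(delta_dm))          # True
print(satisfies(set()))             # False
```

The fault set has four elements, which matches the bound of 4 on the number of faulty units in the example.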
2.3 Theoretical Issues
Lemma 2 Let $S\langle \mathcal{U}, \tau \rangle$ be a system under diagnosis and $\Sigma$ its syndrome. Then we have the following: $\forall x, y, z \in \mathcal{U}$, $\Sigma \models_{sld} (\neg uu(x) \Rightarrow \neg uu(y)) \wedge \Sigma \models_{sld} (\neg uu(y) \Rightarrow uu(z)) \Rightarrow \Sigma \models_{sld} (\neg uu(x) \Rightarrow uu(z))$. □

While the importance of Lemma 2 will be shown in §2.4, the following lemmas offer an immediate result: the status of a given unit.

Lemma 3 Let $S\langle \mathcal{U}, \tau \rangle$ be a system under diagnosis and $\Sigma$ its syndrome. Then we have: $\forall x, y \in \mathcal{U}$, $uu(x) \wedge [\Sigma \models_{sld} (\neg uu(y) \Rightarrow \neg uu(x))] \Rightarrow \Sigma \models_{sld} uu(y)$. □

Lemma 4 Let $S\langle \mathcal{U}, \tau \rangle$ be a system under diagnosis and $\Sigma$ its syndrome. Then we have: $\forall x, y \in \mathcal{U}$, $\Sigma \models_{sld} (\neg uu(x) \Rightarrow \neg uu(y)) \wedge \Sigma \models_{sld} (\neg uu(y) \Rightarrow uu(x)) \Rightarrow \Sigma \models_{sld} uu(x)$. □

Assuming that $S\langle \mathcal{U}, \tau \rangle$, the system under diagnosis, is $\tau$-diagnosable, we introduce the following definition.

Definition 2 Let $S\langle \mathcal{U}, \tau \rangle$ be a system under diagnosis. Then the Actual Diagnosis for S w.r.t. $\Sigma$, its syndrome, is a conjunction, denoted by $\Omega$, such that (1) $\mathcal{U}(\Omega) = \mathcal{U}$, i.e., the state (reliable or not) of each unit is specified in $\Omega$, and (2) $\models_{sld} \Omega$, i.e., $\Omega$ holds in the intended interpretation. □

Considering $\Omega$, the following lemma is immediate.

Lemma 5 Let $S\langle \mathcal{U}, \tau \rangle$ be a system under diagnosis and $\Sigma$ its syndrome. If $\Delta$ is a diagnosis for $S\langle \mathcal{U}, \tau \rangle$ w.r.t. $\Sigma$, then $\Omega \models_{sld} \Delta$. □

The following proposition is straightforward but it serves our purpose.

Proposition 1 Let $\Sigma$ be a set of formulas and $\Pi_1, \ldots, \Pi_m$ the set of its p-implicants. If $\Sigma$ is consistent and complete, then there exists among $\Pi_1, \ldots, \Pi_m$ at least one p-implicant which necessarily holds. □

Lemma 6 Let $S\langle \mathcal{U}, \tau \rangle$ be a system under diagnosis and $\Sigma$ its syndrome. Suppose that $\Pi_1, \ldots, \Pi_m$ are the p-implicants of $\Sigma$. Then among them, there exists at least one p-implicant $\Pi_k$ such that: $\Omega \models_{sld} \Pi_k$. □

Considering Lemma 6 together with Definition 1, we obtain:

Corollary 1 Let $S\langle \mathcal{U}, \tau \rangle$ be a system under diagnosis and $\Sigma$ its syndrome. If $\Delta$ is a diagnosis for $S\langle \mathcal{U}, \tau \rangle$ w.r.t. $\Sigma$, then $\Delta$ is a p-implicant of $\Sigma$. □
Lemma 7 Let $S\langle \mathcal{U}, \tau \rangle$ be a system under diagnosis and $\Sigma$ its syndrome. Suppose that $\Pi_1, \ldots, \Pi_m$ are the p-implicants of $\Sigma$. Assuming that S is $\tau$-diagnosable, then there exists one and only one p-implicant $\Pi$ among $\Pi_1, \ldots, \Pi_m$ such that: $|\mathcal{U}^{+}(\Pi)| \leq \tau$. □

Let Generate($\Sigma$) be a procedure to compute the set of p-implicants for $\Sigma$, a given set of clauses. Suppose that the Select primitive picks the p-implicant which is minimal w.r.t. set cardinality. Then by using Lemma 7, we obtain a first diagnosis algorithm, A, shown in Fig. 2.2.

A($S\langle \mathcal{U}, \tau \rangle$, $\Sigma$)
    PIs $\leftarrow$ Generate($\Sigma$)
    $\Delta \leftarrow$ Select(PIs)
    Return($\Delta$)

Fig. 2.2: A, Abstract Algorithm

Clearly, A is an abstract algorithm and several versions could be derived. It is also inefficient; we investigate efficiency in the next subsection. In this vein, the following corollaries characterize (un)reliable units.

Corollary 2 Let $S\langle \mathcal{U}, \tau \rangle$ be a system under diagnosis and $\Sigma$ its syndrome. Consider $u \in \mathcal{U}$ and suppose that $\neg uu(u)$ holds; then for any p-implicant $\Pi$ of $\Sigma$, we have: $u \in \mathcal{U}^{+}(\Pi) \Rightarrow |\mathcal{U}^{+}(\Pi)| > \tau$. □

Corollary 3 Let $S\langle \mathcal{U}, \tau \rangle$ be a system under diagnosis and $\Sigma$ its syndrome. Consider $u \in \mathcal{U}$ and suppose that for any $\Pi$, a p-implicant of $\Sigma$, we have $u \notin \mathcal{U}^{+}(\Pi)$; then $\neg uu(u)$ holds. □

As a matter of fact, a diagnosis could be built recursively, as shown by the following lemma.

Lemma 8 Let $S\langle \mathcal{U}, \tau \rangle$ be a system under diagnosis and $\Sigma$ its syndrome. Considering $v \in \mathcal{U}$ and supposing that $uu(v)$ holds, let us build $S'\langle \mathcal{U}', \tau' \rangle$, a new system under diagnosis, and $\Sigma'$, its syndrome, such that $\mathcal{U}' = \mathcal{U} \setminus \{v\}$, $\tau' = \tau - 1$, and $\Sigma'$ consists of all formulas of $\Sigma$ except those which include $v$. Then all previous lemmas and corollaries hold for $S'\langle \mathcal{U}', \tau' \rangle$. □
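The effect of the abstract algorithm A on the running example can be imitated by exhaustive search. The sketch below does not implement Generate or Select: as a stated simplification, it enumerates candidate fault sets by increasing size and returns the first one that satisfies every clause, which gives a smallest set of unreliable units. The names `satisfies` and `smallest_diagnosis` and the string encoding of the clauses are hypothetical.

```python
from itertools import combinations

UNITS = "abcdefghi"
# "xy" in MIXED encodes (uu(x) or not uu(y)); "xy" in CONFLICTS encodes (uu(x) or uu(y)).
MIXED = "ba ca da cb bc cd ce de fg gf".split()
CONFLICTS = "ib ci di ie hb ch dh eh ag bg cg gd fa fb cf df".split()


def satisfies(faulty):
    """True if declaring exactly `faulty` unreliable satisfies every clause."""
    return (all(x in faulty or y not in faulty for x, y in MIXED)
            and all(x in faulty or y in faulty for x, y in CONFLICTS))


def smallest_diagnosis(units):
    """Brute-force stand-in for Generate + Select: exponential in len(units)."""
    for size in range(len(units) + 1):
        for candidate in combinations(units, size):
            if satisfies(set(candidate)):
                return set(candidate)
    return None


diagnosis = smallest_diagnosis(UNITS)
print(sorted(diagnosis))  # ['f', 'g', 'h', 'i']
```

The search visits up to $2^9$ subsets here, which illustrates why an approach that enumerates candidates does not scale.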
2.4 Implementation Issues
As one could expect, the previous results, and Algorithm A in particular, are of little practical use: they do not afford us practical ways to diagnose unreliable units. Indeed, computing the p-implicants of a given syndrome is rather expensive; the process could be exponential [6]. We must therefore seek efficient ways to build diagnoses. Intuitively, we want to take advantage of the nature of our formulas.

Let us focus on $\Sigma$, a given syndrome. Let us denote by $\Theta$ the conjunction of all clauses which can be deduced from $\Sigma$. Formally, we have: $\Sigma \models_{sld} \Theta$. Observe that since $\mathcal{U}$ is finite, the set of all formulas which can be deduced from $\Sigma$ under $\models_{sld}$ is necessarily finite. Moreover, $\models_{sld}$ is sound and valid by definition. Finally, since $\Sigma$ is assumed to be consistent and complete, $\Theta$ is also consistent and complete. From the operational viewpoint, $\Theta$ includes all clauses in $\Sigma$ and possibly new (deduced) clauses which are founded on Lemma 2. In our example, $\Theta$ coincides with $\Sigma$, namely because Dahbura & Masson have arranged the syndrome such that the set of clauses remains minimal and the status of units cannot be stated easily.

Let us now focus on the nature of the clauses included in $\Theta$. Any clause in $\Theta$ is either (1) $(uu(x) \vee uu(y))$, or (2) $(uu(x) \vee \neg uu(y))$, where $x, y \in \mathcal{U}$. Let us then split $\Theta$ into two subsets $\Theta^{+}$ (resp. $\Theta^{-}$) including all clauses of the first (resp. second) form. A close examination of our deduction system shows that clauses in $\Theta^{-}$ come from clauses in $\Sigma$. On the other hand, clauses in $\Theta^{+}$ are generated from a combination of clauses in $\Theta^{+}$ and $\Theta^{-}$, using the resolution rule. From an informational standpoint, clauses in $\Theta^{-}$ convey little information. A clause such as $(uu(u) \vee \neg uu(v))$ states that if $u$ is reliable then $v$ is also reliable. On the other hand, a clause in $\Theta^{+}$ (e.g., $(uu(u) \vee uu(v))$) states that at least one of $u$ and $v$ is unreliable.

Considering the above remarks, let us focus on $\Theta^{+}$ and call all clauses in $\Theta^{+}$ conflicts. Intuitively, a conflict clause $(uu(u) \vee uu(v))$ tells us that there is a conflict between units $u$ and $v$ and one of them is necessarily unreliable. In fact, the computation of a diagnosis for a system under diagnosis $S\langle \mathcal{U}, \tau \rangle$ w.r.t. $\Sigma$, its syndrome, amounts to resolving all conflicts in $\Theta^{+}$. From a theoretical standpoint, computing a diagnosis w.r.t. $\Theta^{+}$ instead of $\Theta$ does not convey the same information. In particular, such a diagnosis does not specify the state of reliable units. Indeed, for any prime implicant $\Pi$ of $\Theta^{+}$, we have $\mathcal{U}^{-}(\Pi) = \{\}$. Hence, a unit which is not stated explicitly to be unreliable is assumed to be reliable. In fact, such an assumption was already made in our initial Definition 1 but not in Definition 2. On the other hand, diagnosis computation w.r.t. $\Theta^{+}$ is computationally attractive.

Consider $\Theta^{+}$, a given set of conflicts, and let $x \in \mathcal{U}$ be a given unit. Let us then denote by $\Theta^{+}_x$ the set of conflicts in which $x$ occurs. Similarly, let $\mathcal{U}_x$ be the subset of units occurring in $\Theta^{+}_x$. Note that $x \in \mathcal{U}_x$.
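Finally, let us denote by $\overline{\Theta^{+}_x}$ the subset of remaining conflicts which do not include units occurring in $\mathcal{U}_x$. Then we have:

Lemma 9 Let $S\langle \mathcal{U}, \tau \rangle$ be a system under diagnosis and $\Theta^{+}$ the set of its conflicts. Consider $u \in \mathcal{U}$ and suppose that $|\Theta^{+}_u| = \kappa$; then $uu(u)$ holds iff: $|\Pi| > \tau - \kappa$, where $\Pi$ denotes the p-implicant of $\overline{\Theta^{+}_u}$ such that $|\mathcal{U}(\Pi)|$ is minimal w.r.t. set cardinality. □

Considering Lemma 9, we can now design another algorithm to build a diagnosis. The C algorithm is shown in Fig. 2.3.

C($\mathcal{U}$, $\tau$, $\Sigma$)
Begin
    $\Delta \leftarrow \emptyset$
    For $x \in \mathcal{U}$ Do
        Let $\kappa = |\Theta^{+}_x|$
        PIs $\leftarrow$ Generate($\overline{\Theta^{+}_x}$)
        $\Pi \leftarrow$ Select(PIs)
        If $|\Pi| > \tau - \kappa$
            Then Let $uu(x)$ in $\Delta$
            Else Let $\neg uu(x)$ in $\Delta$
        Fi
    Od
    Return($\Delta$)
End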
Fig. 2.3: C, Concrete Algorithm

Back to our example, let us take unit $a$. We have: $\Theta^{+}_a = \{(uu(a) \vee uu(f)), (uu(a) \vee uu(g))\}$, $\kappa = 2$, $\overline{\Theta^{+}_a} = \{(uu(i) \vee uu(b)), (uu(c) \vee uu(i)), (uu(d) \vee uu(i)), (uu(i) \vee uu(e)), (uu(h) \vee uu(b)), (uu(c) \vee uu(h)), (uu(d) \vee uu(h)), (uu(e) \vee uu(h))\}$, while $\Pi = (uu(i) \wedge uu(h))$ stands for a minimal p-implicant. Since $|\Pi| = 2$ does not exceed $\tau - \kappa = 2$, it follows that $\neg uu(a)$ is in $\Delta$. Hence, unit $a$ is reliable. On the other hand, consider unit $f$. We obtain: $\Theta^{+}_f = \{(uu(f) \vee uu(a)), (uu(f) \vee uu(b)), (uu(f) \vee uu(c)), (uu(f) \vee uu(d))\}$, $\kappa = 4$, $\overline{\Theta^{+}_f} = \{(uu(i) \vee uu(e)), (uu(e) \vee uu(h))\}$, and $\Pi = uu(e)$. This time, we have $|\Pi| = 1 > \tau - \kappa = 0$, and $uu(f)$ is in $\Delta$. Unit $f$ is then unreliable.

Yet, the C algorithm is still not efficient because it uses p-implicants. However, let us first observe that computing a p-implicant which is minimal w.r.t. set cardinality coincides with computing a minimum vertex cover in a bipartite graph [19, 9]. Most importantly, we are not interested in computing the p-implicant which is minimal w.r.t. set cardinality, but rather in knowing its size. Therefore, using König's result [19], we know that the size of a maximum matching of a bipartite graph is also the size of a minimum vertex cover. These remarks yield a diagnosis algorithm, called D, depicted in Fig. 2.4. From an abstract view, the D algorithm works in a way similar to the C algorithm. The main difference is that the D algorithm represents the formulas in $\Theta^{+}$ as a graph in order to handle the size of a maximum matching rather than the size of p-implicants. However, the mapping is not straightforward and raises several subtleties. These are explained later, when commenting on the stages of the algorithm.

For now, let us turn our attention to the computation of $\Theta$. To this end, Proposition 1 together with the definition of an order on units play a central role. As noted in §2.4 (paragraph 3), $\Theta$ coincides with $\Sigma$ in our example. However, this is not always the case. Intuitively, the order aims at driving the inference process, particularly when using Lemma 2, and obtaining $\Theta$ in $O(n^2 \log n)$. Consider two clauses $(\neg uu(u) \vee uu(v))$, $(uu(u) \vee uu(w))$; we have to define the following order: $u \prec v \prec w$. Such an order, in which $u$ precedes the other units, starts the inference process with clauses including unit $u$ to generate new clauses (i.e., $(uu(v) \vee uu(w))$), which are added to $\Theta$. Details are in [3].

Now, we are ready to comment on the D algorithm. It requires four parameters: (1) $\mathcal{U}$, the set of units; (2) $\tau$, the maximum number of faulty units; (3) $EMM_{\Theta^{+}}$, the maximum matching of the corresponding graph; (4) $\sigma$, the size of $EMM_{\Theta^{+}}$. To provide the third parameter, we proceed as follows. Start by generating $\Theta$, then build $BG_{\Theta^{+}}\langle V^r, V^g, E \rangle$, a bipartite graph for $\Theta^{+}$. If $x$ is a unit occurring in $\Theta^{+}$, then let us color it twice, once red, denoted by $x^r$, and then green, denoted by $x^g$. The set of vertices (units) $V^r$ (resp. $V^g$) includes all red (resp. green) units. Put the red vertices on the left side and the green ones on the right side of $BG_{\Theta^{+}}$. If $(uu(u) \vee uu(v))$ is a conflict in $\Theta^{+}$ then build two edges $(u^r, v^g)$ and $(v^r, u^g)$ and put them in $E$. From a computational viewpoint, $BG_{\Theta^{+}}$ doubles both the number of units and the number of conflicts in $\Theta^{+}$. For convenience, let us introduce the following conventions.
If $x \in \mathcal{U}$ is a unit, then $x^r$ (resp. $x^g$) denotes the red (resp. green) vertex in $V^r$ (resp. $V^g$). When the color does not matter, we write $x^{\star}$. Finally, if $x^{\star}$ denotes a colored vertex in $V^r$ or $V^g$, then $x$ stands for the "original" unit in $\mathcal{U}$.

Finally, compute a maximum matching of $BG_{\Theta^{+}}\langle V^r, V^g, E \rangle$ to obtain $EMM_{\Theta^{+}}$. Using the procedure of [2], this could be done in $O(n'^{1.5}\sqrt{m/\log n'})$, where $n'$ (resp. $m$) is the size of the set of vertices (resp. edges) in $BG_{\Theta^{+}}$. We have $n' \leq 2n$, where $n = |\mathcal{U}|$. Assuming^4 that the number of edges for each vertex does not exceed $\tau$, then $m \leq \tau n'$. Summing up, the complexity of this stage is in $O(n^2\sqrt{\tau}/\sqrt{\log n})$. Set $\sigma$ to the size of $EMM_{\Theta^{+}}$.

Let us denote by $EMM_{\Theta^{+}}$ (resp. $VMM_{\Theta^{+}}$) the subset of edges (resp. vertices) included in a maximum matching of $BG_{\Theta^{+}}\langle V^r, V^g, E \rangle$, the corresponding bipartite graph of $\Theta^{+}$. As mentioned above, the set of edges of $BG_{\Theta^{+}}$ doubles the number of conflicts in $\Theta^{+}$. Ultimately, it doubles the size of the maximum matching [3]. Now, we are ready to comment on each stage.

^4 In fact, if $x$ has more than $\tau$ edges, then $uu(x)$ trivially holds.

D($\mathcal{U}$, $\tau$, $EMM_{\Theta^{+}}$, $\sigma$)
    % Stage 1.0: Initialization
    $\mathcal{U}^{-} = \{\}$
    $\mathcal{U}^{+} = \{\}$
    % Stage 2.0: Diagnosis
    For $x \in \mathcal{U}$
        % Stage 2.1: Initialization
        $A \leftarrow Adj\{x^r\}$
        $\kappa \leftarrow |A|$
        $\rho \leftarrow 0$
        % Stage 2.2: Updating $EMM_{\Theta^{+}}$ w.r.t. $x^r$
        If $(x^r, y^g) \in EMM_{\Theta^{+}}$
            Then $\rho \leftarrow \rho + 1$
                 $A \leftarrow A \setminus y^g$
        % Stage 2.3: Updating $EMM_{\Theta^{+}}$ w.r.t. $Adj\{x^r\}$
        For $y^g \in A$
            If $(z^r, y^g) \in EMM_{\Theta^{+}}$
                Then If $\exists y'^g \in Adj\{z^r\} \wedge (z^r, y'^g) \notin EMM_{\Theta^{+}}$
                         Then $\kappa \leftarrow 2\tau$
                         Else $\rho \leftarrow \rho + 1$
        % Stage 2.4: Decision w.r.t. Lemma 9
        If $\sigma - \rho > 2(\tau - \kappa)$
            Then $\mathcal{U}^{+} \leftarrow \mathcal{U}^{+} \cup \{x\}$
            Else $\mathcal{U}^{-} \leftarrow \mathcal{U}^{-} \cup \{x\}$
    % Stage 3.0: Termination
    $\Delta_D = (\bigwedge_{x \in \mathcal{U}^{+}} uu(x) \wedge \bigwedge_{x \in \mathcal{U}^{-}} \neg uu(x))$
End.
Fig. 2.4: D Algorithm.

Stage 1.0 simply initializes two sets $\mathcal{U}^{+}$ and $\mathcal{U}^{-}$, the sets of unreliable and reliable units respectively. Similarly, Stage 3.0 collects units in $\mathcal{U}^{+}$ and $\mathcal{U}^{-}$ to build the final diagnosis $\Delta_D$. Stage 2.0 uses Lemma 9 to determine the status of each unit $x \in \mathcal{U}$. Stage 2.1 is merely initialization, where $A$ denotes the set of vertices adjacent to $x^r$, $\kappa$ denotes the size of $A$, and $\rho$, set to 0, will serve as a counter. The primitive $Adj\{x^{\star}\}$ returns the subset of units which are adjacent to $x^{\star}$ in $BG_{\Theta^{+}}$. A conventional preprocessing with an appropriate data structure ensures that $Adj\{x^{\star}\}$ can be done in $O(1)$.

Stage 2.2 and Stage 2.3 aim at computing the size of a minimal p-implicant of $\overline{\Theta^{+}_x}$. Referring to $BG_{\Theta^{+}}$, this amounts to computing the size of a maximum matching of $BG_{\Theta^{+}}$ when removing vertex $x^r$ as well as all vertices adjacent to $x^r$. Instead of computing a new maximum matching, Stage 2.2 and Stage 2.3 actually update the size of $EMM_{\Theta^{+}}$. Assume that $Adj\{x^r\} = \{y_1^g, \ldots, y_k^g\}$. If there is an edge $(x^r, y_j^g) \in EMM_{\Theta^{+}}$, then the size of $EMM_{\Theta^{+}}$ must be decremented by one. That is the task of Stage 2.2, where $\rho$ counts the number of edges deleted from $EMM_{\Theta^{+}}$. Again, a simple data structure ensures that such a test can be done in $O(1)$.

Consider $A = Adj\{x^r\} \setminus y_j^g$. Stage 2.3 must proceed as follows. Whenever $EMM_{\Theta^{+}}$ includes $(z^r, y_i^g)$ such that $y_i^g \in A$, then the initial size of $EMM_{\Theta^{+}}$ has to be decremented by one, except in the following particular case: there is an edge $(z^r, y'^g) \in E$ such that $y'^g \notin VMM_{\Theta^{+}}$. Fortunately, such a particular case occurs only when $x^r$, the unit being considered, is unreliable. Indeed, suppose that $x^r$ is reliable (i.e., $\neg uu(x)$ holds); then by definition $\forall y \in A, uu(y)$. Since $(z^r, y_i^g) \in EMM_{\Theta^{+}}$, it follows that $\neg uu(z^r)$. Hence, we have $\forall y' \in Adj\{z'^r\}, uu(y')$. Moreover, we have $\forall y' \in Adj\{z'^r\}, \exists (z'^r, y') \in EMM_{\Theta^{+}}$. Summing up, the particular case occurs only when $x$ is an unreliable unit. Formally, we have the following property.
Property 4 Let $x$ be a unit in $\mathcal{U}$ and suppose that $Adj\{x^r\} = \{y_1^g, \ldots, y_k^g\}$ is the subset of units adjacent to $x^r$ in $BG_{\Theta^{+}}$. If $EMM_{\Theta^{+}}$ is a maximum matching of $BG_{\Theta^{+}}$, then we have: $\forall z \in \mathcal{U}, \forall y^g \in Adj\{x^r\}$, $z^r \neq x^r \wedge (z^r, y^g) \in EMM_{\Theta^{+}} \wedge \exists y'^g \in Adj\{z^r\} \wedge (z^r, y'^g) \notin EMM_{\Theta^{+}} \Rightarrow uu(x)$. □

As in Stage 2.2, variable $\rho$ in Stage 2.3 counts the number of edges deleted from $EMM_{\Theta^{+}}$. Note that $\kappa$ is set to a high value (e.g., $2\tau$) when the particular case described above is encountered. Testing for such a case could be done in $O(1)$, provided a simple preprocessing which indicates for each unit $x^r$ whether $Adj\{x^r\} \subseteq VMM_{\Theta^{+}}$. Stage 2.4 uses Lemma 9 to determine the status of each unit. Finally, note that Stage 2.0 is in $O(n)$, while preprocessing (e.g., building $Adj\{x\}$) could be done in $O(n^2)$. Summing up, we obtain the following lemmas.
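Lemma 10 (Correctness) Let $S\langle \mathcal{U}, \tau \rangle$ be a system under diagnosis and $\Sigma$ its syndrome. If $\Omega$ denotes the actual diagnosis of $S\langle \mathcal{U}, \tau \rangle$ w.r.t. $\Sigma$ and $\Delta_D$ is the diagnosis generated by algorithm D, then $\Omega = \Delta_D$. □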
Lemma 11 (Complexity) The complexity of the D algorithm is in $O(n^2\sqrt{\tau}/\sqrt{\log n})$, which corresponds to the complexity of computing a maximum matching for a bipartite graph. □
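König's equality between the size of a maximum matching and the size of a minimum vertex cover, on which the complexity bound rests, can be checked on the conflicts of the running example. The sketch below is not the procedure of [2]: as a stated substitution, it uses the simple augmenting-path method, which is slower but short, and compares it with an exhaustive search for the smallest cover. All function names are hypothetical.

```python
from itertools import combinations


def max_matching_size(left, edges):
    """Size of a maximum matching in a bipartite graph (augmenting paths)."""
    adj = {u: [v for (a, v) in edges if a == u] for u in left}
    match = {}  # right vertex -> left vertex currently matched to it

    def augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be re-matched elsewhere.
            if v not in match or augment(match[v], seen):
                match[v] = u
                return True
        return False

    return sum(1 for u in left if augment(u, set()))


def min_cover_size(edges):
    """Size of a smallest vertex cover, by exhaustive search (exponential)."""
    vertices = sorted({v for e in edges for v in e})
    for size in range(len(vertices) + 1):
        for cand in combinations(vertices, size):
            chosen = set(cand)
            if all(x in chosen or y in chosen for x, y in edges):
                return size


# Conflicts left once unit a and its neighbours f, g are removed: {i, h} x {b, c, d, e}.
rest_a = [(u, v) for u in "ih" for v in "bcde"]
# All 16 conflicts of the example, oriented from {f, g, h, i} to {a, b, c, d, e}.
all_conflicts = ([(u, v) for u in "fg" for v in "abcd"]
                 + [(u, v) for u in "hi" for v in "bcde"])

print(max_matching_size("ih", rest_a), min_cover_size(rest_a))
print(max_matching_size("fghi", all_conflicts), min_cover_size(all_conflicts))
```

Both pairs of values agree, as König's theorem requires for bipartite graphs; only the matching side can be computed in polynomial time in general.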
Note that the D algorithm is basic in the sense that it does not test certain particular cases in which a unit could be declared reliable or unreliable directly. However, these particular cases, discussed in [3], do not change the overall complexity, which depends on the complexity of computing a maximum matching for a bipartite graph.
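The decision rule of Lemma 9, as used by the C algorithm, can be replayed on the sixteen conflicts of the running example. The sketch below is an illustration under stated assumptions: the smallest p-implicant is obtained by exhaustive search for a smallest cover rather than by Generate and Select or by matching, and the names `min_cover_size` and `algorithm_c` are hypothetical.

```python
from itertools import combinations

# Conflicts of the example: "xy" encodes the clause (uu(x) or uu(y)).
CONFLICTS = [tuple(p) for p in
             "ib ci di ie hb ch dh eh ag bg cg gd fa fb cf df".split()]


def min_cover_size(edges):
    """Size of a smallest set of units hitting every conflict (exhaustive)."""
    vertices = sorted({v for e in edges for v in e})
    for size in range(len(vertices) + 1):
        for cand in combinations(vertices, size):
            chosen = set(cand)
            if all(x in chosen or y in chosen for x, y in edges):
                return size


def algorithm_c(units, tau, conflicts):
    """Return the units declared unreliable by the rule of Lemma 9."""
    faulty = set()
    for x in units:
        own = [e for e in conflicts if x in e]            # conflicts involving x
        kappa = len(own)
        u_x = {v for e in own for v in e} | {x}           # units occurring in them
        rest = [e for e in conflicts if not (set(e) & u_x)]  # remaining conflicts
        if min_cover_size(rest) > tau - kappa:
            faulty.add(x)
    return faulty


print(sorted(algorithm_c("abcdefghi", 4, CONFLICTS)))  # ['f', 'g', 'h', 'i']
```

For unit a the remaining conflicts need a cover of size 2, which does not exceed 4 - 2, so a is kept reliable; for unit f the cover has size 1, which exceeds 4 - 4, so f is declared unreliable, as in the worked example.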
3 Epilogue
3.1 Related Work
Using the proposed framework, it is straightforward to capture the essence^5 of both the Kameda et al. [17] and Sullivan [24] algorithms; see [3] for details. This paper is in fact rooted in Dahbura & Masson's proposal [13]. Therefore, several similarities could be drawn between the two proposals. Our proposal differs from the work of Dahbura & Masson on at least three levels.

On the conceptual level, it seems that the ultimate goals of the two works are different. The goal of Dahbura & Masson is specifically to obtain an efficient and elegant diagnosis algorithm. On the other hand, our primary aim is to investigate the diagnosis computation process. Consequently, the proposed D algorithm appears as an incidental result. Unlike Dahbura & Masson's work, ours requires no diagnosability criteria, thanks to our use of p-implicants.

On the decision level, the status of each unit is determined according to different criteria. In particular, the D algorithm is based on Lemma 9 and uses König's Theorem [19], while the Dahbura & Masson algorithm uses Hall's Theorem [16] and is based on an elegant labeling procedure.

Finally, on the level of implementation, both algorithms take advantage of maximum matching. The Dahbura & Masson algorithm requires the computation of a maximum matching in a general graph, whereas the D algorithm uses a bipartite graph. Fortunately, computation of a maximum matching is much easier for a bipartite graph. Its theoretical complexity, which is now in $O(n^2\sqrt{\tau}/\sqrt{\log n})$, may be reduced still further. Obviously, there is no simple way to adapt the Dahbura & Masson algorithm in [13] to handle bipartite graphs without altering the essence of the algorithm.^6

Yet, we do not believe that the D algorithm is in practice more efficient than that of Dahbura & Masson. The new bound, which is in $O(n^2\sqrt{\tau}/\sqrt{\log n})$ compared to $O(n^{2.5})$, is rather theoretical. As a matter of fact, experimentation shows that backtracking algorithms (e.g., the Sullivan algorithm, which is in $O(\tau^3 + |E|)$) are the most efficient [12]. In fact, the ultimate goal of this paper is not to take part in a competition, but rather to shed some light on the process of diagnosis.

^5 Actually, we showed in [3] that it is possible to provide a counter-example for the Kameda et al. [17] algorithm.
3.2 Concluding Remarks
The ultimate objective of this paper is to present an investigation^7 of the diagnosis computation process in $\tau$-diagnosable systems. Such an investigation yields a simple diagnosis algorithm whose (theoretical) complexity is attractive: $O(n^2\sqrt{\tau}/\sqrt{\log n})$ compared to $O(n^{2.5})$, the best known complexity.

This work could be extended in several directions. The first direction consists in studying non-PMC models [7, 22] and intermittent faults [25, 8]. In the same vein, distributed diagnosis is another area where investigation is worthwhile. The preliminary results [4, 5] are quite encouraging but much work remains to be done.

Several challenges remain to be tackled. It is unclear whether simple criteria can be found to determine straightforwardly the status of a unit. That is, given a unit $x$, is it possible to say whether $x$ is reliable or not simply by exploring a constant number of other units? In [18], a thorough investigation is provided, and a deep case-by-case analysis remains to be done.
Acknowledgments
I appreciated the encouragement of Dr. Tony Dahbura, the co-author of [13]. His elegant algorithm (with Prof. G. Masson) triggered this work, which aims at analyzing the essence of an efficient computation of diagnoses by using the concept of maximum matching. I would also like to thank my colleague Prof. D. Ziou, who spent a (very) long time listening while I presented multiple versions of this work.

^6 A recent correspondence with Dr. A. Dahbura has drawn our attention to [11], which is built on a bipartite graph. A deep analysis of [11] is ongoing. We believe, however, that this work goes beyond using bipartite graphs: it proposes a new approach shedding deep insights on the diagnosis process itself.
^7 See research report [3] for a thorough investigation.
References
[1] F. J. Allan, T. Kameda, and S. Toida. An Approach to the Diagnosability Analysis of a System. IEEE Transactions on Computers, 25:1040-1042, October 1975.
[2] H. Alt, N. Blum, K. Mehlhorn, and M. Paul. Computing a Maximum Cardinality Matching in a Bipartite Graph in Time $O(n^{1.5}\sqrt{m/\log n})$. Information Processing Letters, 37:237-240, February 1991.
[3] B. Ayeb. Fault Identification in System-Level Diagnosis: An $O(n^2\sqrt{\tau}/\sqrt{\log n})$ Algorithm. Research Report # DMI-218, 20 pages, Université de Sherbrooke, 1998.
[4] B. Ayeb. Reliability: Circumventing vs Identifying Faults (Bridging the Gap). In Proceedings of 2nd IMACS International Conference on Computational Engineering in Systems Applications, pages (3)129-133, Hammamet (TN), April 1-4 1998.
[5] B. Ayeb and A. Farhat. The Byzantine Problem: Masking and Demasking Faults. Research Report # 219, Submitted, 1998.
[6] B. Ayeb, P. Marquis, and M. Rusinowitch. Preferring Diagnoses By Abduction. IEEE Transactions on Systems, Man and Cybernetics, 23:792-808, May 1993.
[7] F. Barsi, F. Grandoni, and P. Maestrini. A Theory of Diagnosability of Digital Systems. IEEE Transactions on Computers, C-25(6):585-593, 1976.
[8] D. M. Blough, G. F. Sullivan, and G. M. Masson. Intermittent Fault Diagnosis in Multiprocessor Systems. IEEE Transactions on Computers, 41(11):1430-1441, November 1992.
[9] J. A. Bondy and U. S. R. Murty. Graph Theory with Applications. Elsevier North-Holland, New York, 1976.
[10] D. Coppersmith and S. Winograd. Matrix Multiplication via Arithmetic Progressions. Journal of Symbolic Computation, 9(3):251-280, 1990.
[11] A. Dahbura and G. Masson. A Practical Variation of the $O(n^{2.5})$ Fault Diagnosis Algorithm. In Proceedings of FTCS, pages 428-433, 1984.
[12] A. T. Dahbura, J. J. Laferrera, and L. L. King. A Performance Study of System-Level Fault Diagnosis Algorithms. In Proceedings of 4th Phoenix Conf. on Computers and Communication, pages 469-473, Scottsdale, AZ (USA), 1985.
[13] A. T. Dahbura and G. M. Masson. An $O(n^{2.5})$ Fault Identification Algorithm for Diagnosable Systems. IEEE Transactions on Computers, 33:486-492, June 1984.
[14] A. Goralcikova and V. Koubek. A Reduct and Closure Algorithms for Graphs. In Proceedings Conf. on Mathematical Foundations of Computer Science, LNCS 74, pages 301-307. Springer, Heidelberg, 1979.
[15] S. L. Hakimi and A. T. Amin. Characterization of Connection Assignment of Diagnosable Systems. IEEE Transactions on Computers, pages 86-88, January 1974.
[16] P. Hall. On Representatives of Subsets. J. London Math. Soc., 10:26-30, 1935.
[17] T. Kameda, S. Toida, and F. J. Allan. A Diagnosing Algorithm for Networks. Information and Control, 29:141-148, 1975.
[18] M. A. Kennedy and G. G. L. Meyer. The PMC System Level Fault Model: Cardinality Properties of the Implied Faulty Sets. IEEE Transactions on Computers, 38(3):478-480, March 1989.
[19] D. König. Graphs and Matrices (in Hungarian). Mat. Fiz. Lapok, 38:116-119, 1931.
[20] K. Mehlhorn. Data Structures and Efficient Algorithms. Springer Verlag, New York, 1984.
[21] F. P. Preparata, G. Metze, and R. T. Chien. On the Connection Assignment Problem of Diagnosable Systems. IEEE Transactions on Electronic Computers, EC-16(6):848-854, December 1967.
[22] A. Sengupta and A. T. Dahbura. On Self-Diagnosable Multiprocessor Systems: Diagnosis by the Comparison Approach. IEEE Transactions on Computers, 41(11):1386-1396, November 1992.
[23] A. K. Somani, V. K. Agarwal, and D. Avis. A Generalized Theory for System Level Diagnosis. IEEE Transactions on Computers, C-36(5):538-546, May 1987.
[24] G. F. Sullivan. An $O(t^3 + |E|)$ Fault Identification Algorithm for Diagnosable Systems. IEEE Transactions on Computers, 37(4):388-397, April 1988.
[25] C. L. Yang and G. M. Masson. A Fault Identification Algorithm for $t_i$-Diagnosable Systems. IEEE Transactions on Computers, C-35(6):503-510, June 1986.