Combination of Multiple Classifiers with Measurement Values

Y. S. Huang¹ and C. Y. Suen
Centre for Pattern Recognition and Machine Intelligence
Concordia University, Montreal, Quebec H3G 1M8, Canada

¹ On leave from the Industrial Technology Research Institute, Taiwan, R.O.C.

Abstract

This paper introduces an approach for the combination of classifiers in a context where each classifier can offer not only class labels but also the corresponding measurement values. This approach is called the Linear Confidence Accumulation method (LCA). LCA consists of three steps: first, measurement values are transformed into confidence values; second, a confidence aggregation function aggregates the confidence values of each class label; and last, the final decision is derived by a decision rule based on the accumulated confidence values. Preliminary experiments have been performed and showed that LCA achieved better performance than the voting and the Bayesian methods. This reveals that measurement values play an important role in improving a system's performance when combining different classifiers.

1 Introduction

Handwritten character recognition has been studied for more than two decades. During this period, most research emphasized topics such as image preprocessing, feature extraction, classification and contextual post-processing. Many good techniques related to these topics have been developed. However, none of them can achieve a satisfactory performance when faced with characters of degraded quality such as unconstrained, cursive, or noisy characters. Recently, a trend called "Combination of Multiple Experts" (CME) has emerged to solve this problem, and it has been shown to be very promising [1, 2, 3, 4, 5, 6, 7]. In general, there are three types of classifiers which offer different levels of output information: the first offers a unique class label, the second offers a subset of ranked or unranked candidate class labels, and the third offers a subset of class labels with measurement values. At the CENPARMI research lab of Concordia University, Suen and his group [4, 5, 6, 7] have conducted substantial research on combining the first type of classifiers (the results on handwritten numeral recognition are quite encouraging). Also, at the CEDAR lab of the State University of New York at Buffalo, Hull, Ho and Srihari used a candidate-subset combining and re-ranking approach for the second type of classifiers [3, 8]. As for the third type, little work [1] has come forward to date. One approach for combining classifiers of the third type has been developed at CENPARMI recently; it is called the Linear Confidence Accumulation method (LCA). Experiments with the voting method, the Bayesian method and LCA have been performed on a data set of 2997 numeral samples, where the voting method and the Bayesian method did not take measurement values into account. The results of the experiments reveal that measurement values are important and can be used to improve system performance. Section 2 defines the basic symbols of CME and gives a formal specification of CME dealing with the third type of classifiers.

2 Symbol Definition

e_k denotes classifier k, where k = 1, ..., K, and K is the total number of classifiers. C_1, ..., C_M are mutually exclusive and exhaustive sets of patterns, where M represents the total number of pattern classes. Λ = {1, ..., M} is the set of all class index numbers. x denotes an input pattern, and e_k(x) = {m_i^k(x) | ∀i (1 ≤ i ≤ M)} means that expert k assigns the input x to each class i with a measurement value m_i^k(x). Often m_i^k(x) is written simply as m_i^k. Then the problem of CME becomes: when K experts give their individual decisions about the identity of an unknown input, how can these individual decisions be combined efficiently to produce a better decision? To formulate this problem, it becomes

    E(x) = E(e_1(x), ..., e_K(x)),  where e_k(x) = {m_1^k, ..., m_M^k},    (1)

where E(x) is the combination function of the multiple classifiers, which gives x one definitive class label j, and j ∈ Λ ∪ {M + 1}.
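To make the notation concrete, the following minimal sketch (ours, not from the paper) shows the data shapes that equation (1) implies: each classifier maps an input to one measurement value per class, and a combination function reduces the K outputs to a single label, with M + 1 serving as the rejection label.

```python
from typing import Callable, Dict, List

# Hypothetical type aliases illustrating the CME notation; the names are ours.
Input = object                                   # an input pattern x
Measurements = Dict[int, float]                  # class index i -> measurement m_i^k
Classifier = Callable[[Input], Measurements]     # e_k
Combiner = Callable[[List[Measurements]], int]   # E: returns j in {1..M} or M+1

def combine(x: Input, experts: List[Classifier], E: Combiner) -> int:
    """Equation (1): E(x) = E(e_1(x), ..., e_K(x))."""
    return E([e(x) for e in experts])
```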




3 The Linear Confidence Aggregation Method

3.1 Measurement Value and Confidence Value

There are many types of measurement values output by various classifiers, such as distances, similarities and subjective confidence values¹. Conventionally, similarities and subjective confidence values range over [0, 1], but distances may range over different intervals (e.g., [0.1, 2.0] or [3.0, 100.0]) depending on the output scales of the individual classifiers. Suppose m_i^k(x) contributes only to the confidence value of class i; then a general equation to aggregate multiple measurement values is

    V^i(x) = F(m_i^1, ..., m_i^k, ..., m_i^K),    (2)

where V^i denotes the aggregated confidence value of class i (1 ≤ i ≤ M) and F is the aggregating function, whose parameters are m_i^k (1 ≤ k ≤ K). The most common and simple model of the function F performs a weighted linear summation:

    V^i(x) = w_1 · m_i^1 + ... + w_k · m_i^k + ... + w_K · m_i^K,

where w_k is the weight of classifier k (1 ≤ k ≤ K). Usually, these weights are adjusted only in the design phase and remain fixed during operation. However, due to the nonlinearity of measurement values, a constant weight cannot serve its role well. Therefore, a modified aggregation technique becomes

    V^i(x) = t_1(m_i^1) + ... + t_k(m_i^k) + ... + t_K(m_i^K),    (3)

where t_k denotes a function which transforms a measurement value of classifier k into an objective confidence value². Since this approach sums up all the confidence values linearly, it is called the Linear Confidence Accumulation method (LCA). However, equation (3) is based on three assumptions: (1) m_i^k(x) contributes only to the confidence of class i, (2) the classifiers are independent of each other, and (3) the aggregated confidence values are produced by a linear summation of the individual confidence values. The remainder of this section discusses two issues: (1) the transformation from a measurement value into a confidence value, and (2) the decision rule which gives x one definitive class label j.
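As an illustrative sketch of equation (3) (our code, not the paper's), the aggregation amounts to summing per-classifier transformed confidences class by class; the transformation functions t_k are placeholders here, to be estimated as described next.

```python
from typing import Callable, Dict, List

def lca_aggregate(
    outputs: List[Dict[int, float]],             # outputs[k][i] = m_i^k
    transforms: List[Callable[[float], float]],  # transforms[k] = t_k
) -> Dict[int, float]:
    """Equation (3): V^i(x) = t_1(m_i^1) + ... + t_K(m_i^K)."""
    V: Dict[int, float] = {}
    for m_k, t_k in zip(outputs, transforms):
        for i, m in m_k.items():
            V[i] = V.get(i, 0.0) + t_k(m)
    return V
```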

Suppose each sample x contains both I(x) and {m_d^k(x) | ∀d, k (1 ≤ d ≤ M and 1 ≤ k ≤ K)}, where I(x) is the expected class label of x. Let c_d^k(y) represent the objective confidence value when classifier k supports class d with a measurement value y. In fact, c_d^k(y) is the conditional probability that x belongs to class d when classifier k gives its measurement value as y. Therefore, c_d^k(y) can be expressed by a conditional probability as

    c_d^k(y) = P(x ∈ C_d | y, EN_k) = P(I(x) = d | y, EN_k),    (4)

where P(·) denotes the probability function and EN_k denotes the classification environment generated from classifier k. Letting P_k be the probability function of classifier k, equation (4) becomes

    c_d^k(y) = P_k(I(x) = d | y).    (5)

Using the property of a conditional probability, c_d^k(y) can be further derived, and equation (3) then becomes

    V^i(x) = c_i^1(m_i^1) + ... + c_i^k(m_i^k) + ... + c_i^K(m_i^K).    (6)

¹ A confidence value offered directly by a classifier is often based on a distance value and a heuristic transformation function; it may be quite different from the real confidence value. Therefore, it is called a subjective confidence value.
² An objective confidence value is not a subjective assignment but one derived from a probability function. Hereafter, an objective confidence value will be referred to simply as a confidence value when no ambiguity is possible.
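The paper characterizes c_d^k(y) = P_k(I(x) = d | y) as a conditional probability but the OCR loses its derivation; one common way to estimate such a quantity (our assumption, not a procedure given in the text) is to bin the measurement values of classifier k on labelled training data and take the empirical frequency of correctness within each bin.

```python
import numpy as np

def estimate_confidence_table(measurements, labels, target_class, n_bins=20):
    """Empirical estimate of c_d^k(y) = P_k(I(x) = d | y) for one classifier k
    and one class d, by histogram binning (an assumed estimation procedure).

    measurements: 1-D array of m_d^k(x) over training samples x
    labels:       1-D array of true labels I(x)
    """
    measurements = np.asarray(measurements, dtype=float)
    correct = (np.asarray(labels) == target_class)
    edges = np.linspace(measurements.min(), measurements.max(), n_bins + 1)
    bin_idx = np.clip(np.digitize(measurements, edges) - 1, 0, n_bins - 1)
    conf = np.zeros(n_bins)
    for b in range(n_bins):
        in_bin = (bin_idx == b)
        if in_bin.any():
            conf[b] = correct[in_bin].mean()  # fraction of class-d samples in bin b
    return edges, conf  # lookup: t_k(y) = conf[bin containing y]
```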

3.2 Confidence Accumulation and Decision

Since the aggregated confidence value of each class label can be computed by equation (6) for LCA, the next step is to derive the final decision from these aggregated confidence values. Various kinds of decision rules can be designed to serve this purpose. The most optimistic decision rule is

    E(x) = j,      if j ∈ Λ and V^j(x) = max_i V^i(x) > 0;
    E(x) = M + 1,  otherwise.    (7)

Usually, a threshold α (0 ≤ α ≤ 1) is adopted to make decisions more reliable. Then the decision rule becomes

    E(x) = j,      if j ∈ Λ and V^j(x) = max_i V^i(x) > α · K;
    E(x) = M + 1,  otherwise.    (8)
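A hedged sketch of decision rules (7) and (8) follows (our code; the function name is ours). Setting alpha to 0 recovers the optimistic rule (7).

```python
def lca_decide(V, M, alpha=0.0, K=1):
    """Decision rules (7)/(8): pick the class j maximizing V^j and accept it
    only if V^j exceeds alpha * K; otherwise return the rejection label M+1."""
    j = max(range(1, M + 1), key=lambda i: V.get(i, 0.0))
    return j if V.get(j, 0.0) > alpha * K else M + 1
```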

4 Experiment

Two classifiers, e1 and e2, were used in the following experiments; they were developed by Franke [9] during his visit to the CENPARMI lab. In fact, e1 and e2 perform quite similarly because both of them are functional classifiers and were constructed using the same kind of features but different polynomial combinations. In this sense, e1 and e2 are not the most appropriate classifiers for CME. However, because (1) no more appropriate data set is currently available, and (2) some applications may have only a very limited number of experts, e1 and e2 were adopted in our preliminary experiments. Table 1 shows the individual performances of e1 and e2 on 2997 samples, where Rec., Sub., Rej. and Rel. denote the recognition, substitution, rejection and reliability rates respectively. The reliability rate is defined by

    Reliability = Recognition / (100% − Rejection).

Table 1: Performance of individual classifiers.

         Rec.    Sub.   Rej.   Rel.
    e1   96.29   3.70   0.00   0.9629
    e2   96.50   3.50   0.00   0.9650
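For concreteness, a small helper (ours) that computes the four rates from raw counts, matching the reliability definition above:

```python
def rates(n_correct, n_wrong, n_reject):
    """Recognition, substitution, rejection (in %) and reliability rates."""
    n = n_correct + n_wrong + n_reject
    rec = 100.0 * n_correct / n
    sub = 100.0 * n_wrong / n
    rej = 100.0 * n_reject / n
    rel = rec / (100.0 - rej) if rej < 100.0 else 0.0  # Rec / (100% - Rej)
    return rec, sub, rej, rel
```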

Besides LCA, the voting [4] and the Bayesian [5] methods are also included in the experiments. Since the voting and the Bayesian methods are designed for combining the first type of classifiers, they take only one class label from each classifier, and the measurement values are not taken into account. For comparison, LCA also took only one class label with a measurement value from each classifier in this experiment. Tables 2, 3 and 4 show the performance of the voting, the Bayesian and the LCA methods respectively.

Table 2: Performances after combining two classifiers by the voting method.

      α     Rec.    Sub.   Rej.   Rel.
     0.5    96.30   3.70   0.00   0.9630
     1.0    94.26   0.90   4.84   0.9905

Table 3: Performances after combining two classifiers by the Bayesian method.

      α      Rec.    Sub.   Rej.   Rel.
    0.000    96.33   3.67   0.00   0.9633
    0.500    95.70   3.04   1.27   0.9692
    0.600    94.86   1.70   3.44   0.9824
    0.700    94.26   1.13   4.60   0.9881
    0.800    94.26   0.90   4.84   0.9905
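As a rough sketch (our simplification, not the exact scheme of [4]), the voting combiner can be viewed as follows: each classifier casts one vote for its top label, and the winner is accepted only if its share of the votes reaches α.

```python
from collections import Counter

def vote_combine(top_labels, M, alpha=0.5):
    """Simplified majority voting over K top-choice labels.
    Accept label j only if its share of the votes is at least alpha;
    otherwise reject (return M + 1). Ties are broken arbitrarily here."""
    counts = Counter(top_labels)
    j, n = counts.most_common(1)[0]
    return j if n / len(top_labels) >= alpha else M + 1
```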


Table 4: Performances after combining two classifiers by LCA.

      α      Rec.    Sub.   Rej.   Rel.
    0.000    97.10   2.90   0.00   0.9710
    0.100    96.03   1.84   2.14   0.9812
    0.200    95.26   1.33   3.40   0.9862
    0.300    94.66   1.07   4.27   0.9888
    0.400    94.26   0.87   4.87   0.9909
    0.500    93.89   0.83   5.27   0.9912
    0.600    93.29   0.80   5.91   0.9915

[Figure 1: The graphic representation of the performances of the voting, the Bayesian and the LCA methods.]

Fig. 1 shows these performances in graphic form. Clearly, LCA performed best. This reveals that measurement values are important for improving system performance. Interestingly, even when the two classifiers were strongly dependent on each other, the performance could still be improved considerably by LCA.
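The rows of Tables 2-4 correspond to sweeping the rejection threshold α; a hedged sketch (all names ours) of how such a table could be produced with the helpers above:

```python
def sweep_alpha(samples, M, K, alphas, decide):
    """Produce (alpha, Rec, Sub, Rej, Rel) rows as in Tables 2-4.
    samples: list of (V, true_label) pairs, with V as in equation (6);
    decide:  a decision rule such as lca_decide."""
    rows = []
    for a in alphas:
        n_ok = n_bad = n_rej = 0
        for V, truth in samples:
            j = decide(V, M, alpha=a, K=K)
            if j == M + 1:
                n_rej += 1
            elif j == truth:
                n_ok += 1
            else:
                n_bad += 1
        n = len(samples)
        rec, sub, rej = 100 * n_ok / n, 100 * n_bad / n, 100 * n_rej / n
        rel = rec / (100 - rej) if rej < 100 else 0.0
        rows.append((a, rec, sub, rej, rel))
    return rows
```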


5 Conclusion

CME can be applied to many areas in expert systems and pattern recognition. According to the levels of output information, classifiers can be divided into three types: the first offers a unique class label, the second offers a subset of ranked or unranked candidate class labels, and the third offers a subset of class labels with measurement values. This paper proposed one approach, LCA, to combine classifiers of the third type. It contains three steps. First, the measurement values offered by the individual classifiers are transformed into confidence values. Secondly, the transformed confidence values are aggregated into collective confidence values. Thirdly, with a decision rule, the final decision is derived by comparing the collective confidence values of the class labels. Using an available data set, two strongly dependent classifiers were used in our preliminary experiments. Although these two classifiers are not quite suitable for CME, they provided an extremely challenging test case for the three CME methods: voting, Bayesian, and LCA. Two conclusions can be drawn from this preliminary experiment: (1) measurement values play an important role in improving the system performance, and (2) the transformation function which derives the confidence values from measurement values is quite effective. During the derivation of LCA, several assumptions were made, such as that the classifiers are independent of each other and that the aggregated confidence values are produced by a linear summation of the individual confidence values. The performance may deteriorate dramatically when these assumptions are not satisfied. Therefore, future research should focus on:

- experiments on a larger and more general data set;
- elimination of the assumptions behind LCA and further improvement of its performance.


6 Acknowledgements

The authors want to thank Mr. J. Franke for providing the two classifiers used in the experiments. This work was supported by research grants from the Natural Sciences and Engineering Research Council of Canada, the Ministry of Education of Quebec, and the Chinese Character Recognition Project (project no. 35N1100) supported by MOEA, Taiwan, R.O.C.


References

[1] E. Mandler and J. Schürmann, "Combining the Classification Results of Independent Classifiers Based on the Dempster-Shafer Theory of Evidence," Pattern Recognition and Artificial Intelligence, pp. 381-393, 1988.

[2] X. Ling and W.G. Rudd, "Combining Opinions from Several Experts," Applied Artificial Intelligence, Vol. 3, pp. 439-452, Hemisphere Publishing Corporation, 1989.


[3] J.J. Hull, A. Commike, and T.K. Ho, "Multiple Algorithms for Handwritten Character Recognition," Proc. International Workshop on Frontiers in Handwriting Recognition, pp. 117-129, Montréal, Canada, 1990.

[4] C. Nadal, R. Legault, and C.Y. Suen, "Complementary Algorithms for the Recognition of Totally Unconstrained Handwritten Numerals," Proc. International Conference on Pattern Recognition, Vol. 1, pp. 443-449, Atlantic City, New Jersey, USA, 1990.

[5] L. Xu, A. Krzyzak, and C.Y. Suen, "Methods of Combining Multiple Classifiers and Their Applications to Handwriting Recognition," IEEE Trans. on Systems, Man and Cybernetics, Vol. SMC-22, No. 3, pp. 418-435, 1992.

[6] Y.S. Huang and C.Y. Suen, "An Optimal Method of Combining Multiple Experts for Handwritten Numeral Recognition," Proc. Third International Workshop on Frontiers in Handwriting Recognition, pp. 11-20, Buffalo, New York, USA, 1993.

[7] Y.S. Huang and C.Y. Suen, "The Behavior-Knowledge Space Method for the Combination of Multiple Classifiers," Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 347-352, New York, 1993.

[8] T.K. Ho, "A Theory of Multiple Classifier Systems and Its Application to Visual Word Recognition," Doctoral Dissertation, Department of Computer Science, State University of New York at Buffalo, 1992.

[9] J. Franke, "On the Functional Classifier," Proc. 1st Int. Conf. on Document Analysis and Recognition, pp. 481-489, St. Malo, France, 1991.

