
A Method of Feature Selection Using Contribution Ratio based on Boosting

Masamitsu Tsuchiya, Hironobu Fujiyoshi
Dept. of Computer Science, Chubu University
Aichi 487-8501 Japan
{tsuchiya, hf}@vision.cs.chubu.ac.jp

Abstract

AdaBoost and support vector machine (SVM) algorithms are commonly used in the field of object recognition. The classification performance of these classifiers is sensitive to the feature set they are given. To improve this performance, it is not enough to use classifiers that select features accurately; attention must also be given to determining which feature subset to use in the classifier. Kugler et al. proposed evaluating feature sets with the margin of the decision boundary of an SVM classifier as a solution to this problem. However, the SVM margin is sometimes large because of outliers. This paper presents a feature selection method that uses a contribution ratio based on boosting, which is effective for evaluating features. Comparing our method with the conventional one that uses a confident margin, we found that the contribution ratio obtained from boosting can select better feature sets.

1 Introduction

Feature-based classifiers are commonly used for object recognition and object type classification [1]. To make these classifiers more robust, we need to select features that are invariant to changes in location, scale, viewpoint, and other environmental conditions. To improve classification performance, it is not enough to use classifiers that select features accurately, such as AdaBoost [2] and SVM [3]; we must also determine which feature subset to use in the classifier. Removing ineffective features is also important for reducing computational cost while maintaining classification performance. A method for selecting a small, effective feature subset (feature subset selection, FSS) is therefore needed for both robustness and speed, particularly in image and object recognition algorithms that use large feature sets [5], [6]. Previous work in this area has focused on the margin of SVMs. The problem with using the SVM margin directly is that its value does not always have a clear relationship with the performance of the SVM, so the best subset obtained is not guaranteed to be the best possible one. Kugler et al. proposed FSS using the confident margin (CM) as the subset criterion [4]. By monitoring the peak of the CM curve, their method comes close to the best recognition rate without computing the recognition rate directly, which reduces computational time. However, these methods must leave out each feature in turn, so the SVM margin has to be measured as many times as there are features in the subset. In addition, the SVM margin is misleading in some cases (margin errors). This paper describes a novel method for selecting feature subsets using a contribution ratio (CR). The CR expresses the relative importance of features based on a boosting classifier for object type classification [8]. Starting with a given set of features, we estimate the CR of each feature and remove the features with a low CR, which contribute little, until the desired number of features remains. We evaluated our method on two real-world binary classification tasks and confirmed that feature subset selection using the CR is more reliable than the method based on the SVM margin.

2 Related works [4]

When an SVM is constructed, the distance between the decision boundary and the training samples is maximized. This distance is called the "margin". The decision function of an SVM is given by:

$$ f(\mathbf{x}) = \langle \mathbf{w} \cdot \Phi(\mathbf{x}) \rangle + b \tag{1} $$
$$ f(\mathbf{x}) = \sum_{i=1}^{n} w_i \Phi_i(\mathbf{x}) + b, \tag{2} $$

where $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ are the inputs, $y_i \in Y = \{-1, 1\}$ are the labels, $\mathbf{w}$ is the weight vector, $\Phi$ is the projection into the nonlinear feature space, and $b$ is the bias. The margin of an SVM is $1/\|\mathbf{w}\|$. Solving the SVM optimization problem with the Lagrange multiplier method gives the weight vector:

$$ \mathbf{w} = \sum_{i=1}^{n} y_i \alpha_i \Phi(\mathbf{x}_i), \tag{3} $$
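For a linear kernel, where $\Phi$ is the identity map, the weight vector of Eq. (3) and the margin $1/\|\mathbf{w}\|$ can be computed directly. The following is a minimal sketch, assuming the Lagrange multipliers $\alpha_i$ have already been obtained from some SVM solver; the function names are hypothetical and are not part of SVM-light [7] or any other library.

```python
import math
from typing import Sequence


def weight_vector(xs: Sequence[Sequence[float]], ys: Sequence[int],
                  alphas: Sequence[float]) -> list[float]:
    """Eq. (3) with a linear kernel (Phi = identity, an assumption):
    w = sum_i y_i * alpha_i * x_i."""
    dim = len(xs[0])
    w = [0.0] * dim
    for x, y, a in zip(xs, ys, alphas):
        for j in range(dim):
            w[j] += y * a * x[j]
    return w


def svm_margin(w: Sequence[float]) -> float:
    """Geometric margin 1 / ||w||."""
    return 1.0 / math.sqrt(sum(v * v for v in w))


# Two support vectors on either side of the origin; alpha_1 = alpha_2 = 0.5
w = weight_vector([[1.0, 0.0], [-1.0, 0.0]], [1, -1], [0.5, 0.5])
print(w, svm_margin(w))  # [1.0, 0.0] 1.0
```

Only samples with $\alpha_i > 0$ (the support vectors) contribute to $\mathbf{w}$, which is why a single outlying support vector can shift the margin noticeably.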

where $\alpha_i$ are the Lagrange multipliers. A classifier with a large margin is expected to generalize well, so the margin can be used to evaluate both the classifier and the input features of an SVM. However, the SVM margin is sometimes large because of outliers [4], and in such cases margin-based evaluation produces errors. Kugler et al. therefore used the confident margin, which combines a confidence value with the margin (the normal margin, NM). The confidence $c$ is given by:

$$ c = \frac{1}{n} \sum_{i=1}^{n} y_i f(\mathbf{x}_i). \tag{4} $$
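The confidence of Eq. (4) and the confident margin of Kugler et al. [4], $CM = c \cdot NM$ with $NM = 1/\|\mathbf{w}\|$, can be sketched for a linear decision function $f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$. This is an illustration under that linear-kernel assumption and the $1/n$ averaging above; the function names are hypothetical.

```python
import math
from typing import Sequence


def decision(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear SVM decision function f(x) = <w, x> + b (assumed linear kernel)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def confidence(w, b, xs, ys) -> float:
    """Eq. (4): mean of y_i * f(x_i) over the n training samples."""
    return sum(y * decision(w, b, x) for x, y in zip(xs, ys)) / len(xs)


def confident_margin(w, b, xs, ys) -> float:
    """CM = c * NM, where NM = 1 / ||w|| is the normal margin."""
    nm = 1.0 / math.sqrt(sum(wj * wj for wj in w))
    return confidence(w, b, xs, ys) * nm


xs, ys = [[1.0, 0.0], [-1.0, 0.0], [2.0, 0.0]], [1, -1, 1]
print(confident_margin([1.0, 0.0], 0.0, xs, ys))  # 1.333...
print(confident_margin([2.0, 0.0], 0.0, xs, ys))  # 1.333...
```

Scaling $\mathbf{w}$ by 2 doubles $c$ but halves $NM$, so $CM$ is unchanged; a misclassified sample contributes a negative term $y_i f(\mathbf{x}_i)$ and lowers $c$.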

The confidence $c$ defined in Eq. (4) is larger when classification performance is stronger. Using Eqs. (1) and (3), the decision function of an SVM can be expressed as:

$$ f(\mathbf{x}) = \sum_{i=1}^{n} y_i \alpha_i K(\mathbf{x}_i, \mathbf{x}) + b, \tag{5} $$
$$ K(\mathbf{x}_i, \mathbf{x}) = \langle \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}) \rangle. \tag{6} $$

The confident margin $CM$, computed from the confidence $c$ and the normal margin $NM$, is:

$$ CM = c \cdot NM. \tag{7} $$

Using the confident margin reduces the discrepancy between actual classification performance and its evaluation. However, even the confident margin is not free from margin errors, as we discuss later. A criterion that remains reliable when the dataset contains outliers is therefore needed.

3 Feature contribution ratio based on boosting

Feature subset evaluation using the SVM margin is problematic because outliers cause errors. Moreover, the number of SVM classifiers that must be constructed is on the order of the square of the number of features. We therefore focus on boosting, which, like an SVM, maximizes a margin. A boosting classifier is constructed by selecting the features that are most effective for classification. Tsuchiya et al. proposed a feature evaluation method based on boosting [8]. They defined the feature contribution ratio in terms of the weak hypotheses and the weights derived from the performance of each weak hypothesis.

3.1 AdaBoost

AdaBoost selects a subset of features to construct a robust classifier from a training dataset $\{(\mathbf{x}_i, y_i) : 1 \le i \le n\}$, where $\mathbf{x} = (x_1, \ldots, x_P)$ is the $P$-dimensional feature vector and $y$ is the label:

$$ y_i = \begin{cases} +1, & \text{if } \mathbf{x}_i \in \text{target} \\ -1, & \text{otherwise.} \end{cases} \tag{8} $$

In each round, the learning algorithm selects from all the features. For each feature $p$, AdaBoost picks the optimal threshold $th$ by:

$$ th_p = \arg\min_{0 \le th \le 1} \sum_{i=1}^{n} I\bigl(y_i \neq \operatorname{sgn}(x_{p,i} - th)\bigr), \quad 1 \le p \le P, \tag{9} $$

where $I(\cdot)$ is the indicator function, $x_{p,i}$ is the $p$-th feature of sample $\mathbf{x}_i$, and the resulting weak hypothesis is $h_{p,th}(\mathbf{x}) = \operatorname{sgn}(x_p - th)$. The output of AdaBoost after learning is a binary classifier that consists of a linear combination of the weak hypotheses on the selected features, with weights $\alpha_t$. The final classifier $H_T$ is therefore:

$$ H_T(\mathbf{x}) = \sum_{t=1}^{T} \alpha_t h^{(t)}(\mathbf{x}). \tag{10} $$
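Eqs. (9) and (10) describe discrete AdaBoost with single-feature threshold classifiers (decision stumps). The sketch below is one way to implement them in plain Python, under assumptions that are ours rather than the paper's: candidate thresholds are the observed feature values plus 0, a polarity $s \in \{+1, -1\}$ is allowed so that a stump can also fire below its threshold, errors are weighted by the current sample weights, and $\alpha_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}$ is the common discrete-AdaBoost weight. Names such as `Stump` and `train_adaboost` are hypothetical.

```python
import math
from typing import List, NamedTuple, Sequence


class Stump(NamedTuple):
    feature: int      # index p of the feature used by this weak hypothesis
    threshold: float  # th in Eq. (9)
    polarity: int     # +1 or -1; an assumption, Eq. (9) uses +1 only
    alpha: float      # weight alpha_t in Eq. (10)

    def predict(self, x: Sequence[float]) -> int:
        # s * sgn(x_p - th), with sgn(0) taken as -1 (assumption)
        return self.polarity * (1 if x[self.feature] > self.threshold else -1)


def train_adaboost(xs: List[List[float]], ys: List[int],
                   features: Sequence[int], rounds: int) -> List[Stump]:
    """Discrete AdaBoost over decision stumps restricted to `features`."""
    n = len(xs)
    weights = [1.0 / n] * n
    stumps: List[Stump] = []
    for _ in range(rounds):
        best = None  # (weighted error, feature, threshold, polarity)
        for p in features:
            # candidate thresholds: 0 and every observed value of feature p
            for th in sorted({0.0, *(x[p] for x in xs)}):
                for s in (1, -1):
                    err = sum(w for x, y, w in zip(xs, ys, weights)
                              if s * (1 if x[p] > th else -1) != y)
                    if best is None or err < best[0]:
                        best = (err, p, th, s)
        err, p, th, s = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        stump = Stump(p, th, s, 0.5 * math.log((1 - err) / err))
        stumps.append(stump)
        # re-weight: misclassified samples gain weight, correct ones lose it
        weights = [w * math.exp(-stump.alpha * y * stump.predict(x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return stumps


def strong_classify(stumps: Sequence[Stump], x: Sequence[float]) -> int:
    """Sign of H_T(x) = sum_t alpha_t * h_t(x), Eq. (10)."""
    score = sum(st.alpha * st.predict(x) for st in stumps)
    return 1 if score >= 0 else -1
```

For example, with `xs = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.2], [0.1, 0.8]]` and `ys = [1, 1, -1, -1]`, every round picks feature 0 at threshold 0.2, because that stump alone separates the classes.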

3.2 Evaluating feature contribution ratio

In each round, AdaBoost selects, from the full set of features, the feature with the lowest weighted error on the training examples. The final classifier balances the $P$ features to maximize classification performance. The weight $\alpha_t$, the selected feature, and the threshold $th$ chosen in each round are important factors for classification performance. Here, we introduce a metric that indicates how much each feature "contributes" to classification performance. The contribution ratio $CR_p$ of feature $p$ is defined by:

$$ CR_p = \sum_{t=1}^{T} \alpha'_t \cdot \delta_K\bigl[P(h_t) - p\bigr], \tag{11} $$

where $p$ denotes a feature type, $\alpha'_t$ is the weight of the weak hypothesis chosen in round $t$, $P(\cdot)$ returns the feature used by the weak hypothesis $h_t$ chosen in round $t$ of AdaBoost training, and $\delta_K$ is the Kronecker delta. The contribution ratio $CR_p$ thus measures the contribution of feature $p$ and lets us determine which subset of features should be selected for a given classification task. This evaluation requires only a single AdaBoost classifier to evaluate all the features in the subset. In addition, CR is more robust to outliers than the SVM margin, because it is estimated from both the performance and the frequency of use of the weak hypotheses. Based on this metric, we propose a feature selection method using a contribution ratio based on boosting.
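Given the per-round output of an AdaBoost training run, Eq. (11) reduces to adding each round's weight into the bin of the feature that round used. In this sketch the weight $\alpha'_t$ is assumed to be $\alpha_t$ divided by the sum of all $\alpha_t$, so that the CR values sum to 1; that normalization is our assumption, and the function name is hypothetical.

```python
from typing import Iterable, Tuple


def contribution_ratio(rounds: Iterable[Tuple[int, float]],
                       num_features: int) -> list[float]:
    """Eq. (11): CR_p = sum_t alpha'_t * delta_K[P(h_t) - p].

    `rounds` holds one (feature index P(h_t), weight alpha_t) pair per round.
    alpha'_t is ASSUMED to be alpha_t normalized so the weights sum to 1.
    """
    rounds = list(rounds)
    total = sum(alpha for _, alpha in rounds)
    cr = [0.0] * num_features
    for feature, alpha in rounds:
        # Kronecker delta: only the feature used in round t receives alpha'_t
        cr[feature] += alpha / total
    return cr


# Feature 0 chosen twice, feature 2 once, feature 1 never
print(contribution_ratio([(0, 1.0), (2, 0.5), (0, 0.5)], 3))  # [0.75, 0.0, 0.25]
```

A feature that AdaBoost never selects gets $CR_p = 0$, which makes it the first candidate for removal.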

4 Feature subset selection using contribution ratio

A margin can only evaluate the classifier trained on a given feature subset as a whole, and it is not robust. A margin-based method is therefore problematic: many classifiers must be constructed, and when outliers are present the margin does not consistently follow classification performance. In contrast, the contribution ratio (CR) from a single AdaBoost classifier evaluates all the features in a subset at once. CR is also more robust to outliers than the SVM margin, and selection with CR requires constructing only as many AdaBoost classifiers as there are features. We developed a feature selection method using a contribution ratio based on boosting; an illustration of the method is shown in Figure 1. Starting with a given set of features, we estimate the CR of each feature in the current set and remove the features that do not contribute until the desired number of features remains. The feature subset selection algorithm using the contribution ratio is given in Figure 2.
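The selection procedure can be sketched as a backward-elimination loop. To keep the sketch independent of any particular boosting implementation, the CR computation is passed in as a callback (`estimate_cr`, hypothetical) that is assumed to retrain AdaBoost on the surviving features and return each feature's CR. Recording the removal order means that a subset of any size can be read off afterwards.

```python
from typing import Callable, Dict, List, Sequence


def fss_by_contribution_ratio(
        features: Sequence[int],
        estimate_cr: Callable[[List[int]], Dict[int, float]]) -> List[int]:
    """Backward elimination driven by the contribution ratio.

    `estimate_cr(surviving)` is a hypothetical callback that trains AdaBoost on
    the surviving features and returns {feature: CR}. Returns the features in
    the order they were removed (least contributing first).
    """
    surviving = list(features)
    removed: List[int] = []
    while surviving:
        cr = estimate_cr(surviving)
        # features that AdaBoost never picked are treated as CR = 0
        worst = min(surviving, key=lambda p: cr.get(p, 0.0))
        surviving.remove(worst)
        removed.append(worst)
    return removed


def best_subset(removal_order: List[int], size: int) -> List[int]:
    """The last `size` features to be removed form the selected subset."""
    return removal_order[len(removal_order) - size:]


# Stub CR estimator with fixed importances, for illustration only
importance = {0: 0.1, 1: 0.5, 2: 0.05, 3: 0.35}
order = fss_by_contribution_ratio([0, 1, 2, 3],
                                  lambda s: {p: importance[p] for p in s})
print(order, best_subset(order, 2))  # [2, 0, 3, 1] [3, 1]
```

Because the callback retrains on each shrinking subset, CR values are recomputed after every removal, so a feature that was redundant with a removed one can gain importance in later rounds.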

Figure 1. FSS using contribution ratio

Algorithm: FSS using contribution ratio
1. Input: n features, training dataset $(\mathbf{x}_i, y_i)$
2. Initialize: subset of surviving features $s = [1, 2, \ldots, n]$
3. Repeat until $s$ is empty:
   (a) Train an AdaBoost classifier with all the training examples, using the features in $s$
   (b) Compute the contribution ratios $CR_1, CR_2, \ldots, CR_n$ of the features in $s$
   (c) Find the worst feature: $worst = \arg\min_i CR_i$
   (d) Remove the worst feature from $s$
Figure 2. FSS algorithm

5 Evaluation

We used two test sets containing binary classification problems: the "Sonar" data from the UCI repository [9] and the "VH" data from a CU database [8]. The Sonar data include 208 samples with 60 features, and the VH data include 800 samples with 7 features. Our method uses only boosting classifiers; however, we believe it is also usable with common classifiers. To test this hypothesis, we constructed SVM classifiers from the feature subsets selected with the confident margin and with the contribution ratio. Estimating the recognition rate with leave-one-out and cross-validation methods in order to estimate CM from separable datasets is difficult. Our goal in these experiments is to show, by comparing feature subset selection using CR and using CM, that CR improves the accuracy and effectiveness of selection on data containing outliers, which cause errors in CM. The SVM classifier and the margin estimation are implemented with SVM-light [7].

5.1 Sonar data case

We report the best recognition rate (Best RR) and the corresponding feature dimension (DIM) obtained with feature selection in Table 1. Both methods are effective for reducing the number of features. On this dataset CM is more effective than our method; however, our method still selects effective features that maintain the recognition rate.

Table 1. Results of FSS on Sonar dataset
Method              Best RR [%]   DIM
Our method          91            36
Confident margin    93            13

Figure 3. Variation of recognition rate with FSS on Sonar dataset (axes: feature dimension, 59 to 0; recognition rate RR, 0 to 1)

5.2 VH data case

We report the Best RR and DIM obtained with feature selection in Table 2, and we show the variation of the recognition rate in Figure 4. Both methods are effective for reducing the number of features. However, when only the last feature remains, the recognition rate using CM is 20% lower than that of our method. We believe this problem is caused by the SVM margin.

Table 2. Results of FSS on VH dataset
Method              Best RR [%]   DIM
Our method          100           3
Confident margin    100           4

Figure 4. Variation of recognition rate with FSS on VH dataset (curves: RR(CM), RR(CR), CM; horizontal axis: feature dimension, 6 to 1; left axis: recognition rate; right axis: confident margin)

5.3 Margin-based evaluation problem

As the variation of the confident margin (CM) in Figure 4 shows, CM is very high when only the last feature remains. This is because selection using CM cannot reject ineffective features. Table 3 lists the confidence (c) and the normal margin (NM) when only the last feature remains. In both cases the confidence works to reduce the margin; however, on the VH data NM is too large for the feature to be evaluated effectively. This means that the confidence does not sufficiently compensate for a large margin. In contrast, our method selects feature subsets effectively: when CR is used, it does not reject the most effective features, and it is not affected by the outliers that distort the SVM margin.

Table 3. Margin and confidence (DIM = 1)
Dataset   NM     c      CM
Sonar     0.57   0.12   0.06
VH        5.13   0.50   2.57

6 Conclusion

We have presented a novel feature subset selection method that uses a contribution ratio. The contribution ratio is obtained from the features selected and the weights computed during AdaBoost training. We experimentally validated our method by demonstrating robust selection with an SVM classifier on test sets containing outliers. This enables us to determine which features should be selected for a general learning-based classifier.

References

[1] R. Collins, A. Lipton, H. Fujiyoshi, and T. Kanade, "Algorithms for cooperative multisensor surveillance", Proc. of the IEEE, Vol. 89, No. 10, pp. 1456-1477 (Oct. 2001).
[2] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting", Journal of Computer and System Sciences, pp. 119-139 (1997).
[3] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, Cambridge University Press (2000).
[4] M. Kugler, K. Aoki, A. S. Nugroho, S. Kuroyanagi, and A. Iwata, "A Feature Subset Selection for Support Vector Machines using Confident Margin", Proc. of IJCNN, Montreal, IEEE Computer Society (2005).
[5] D. Hoiem, A. A. Efros, and M. Hebert, "Putting Objects in Perspective", Proc. of CVPR (2), pp. 2137-2144 (2006).
[6] P. Sabzmeydani and G. Mori, "Detecting Pedestrians by Learning Shapelet Features", Proc. of CVPR, pp. 511-518 (2007).
[7] T. Joachims, "Making Large-Scale SVM Learning Practical", in Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. Burges, and A. Smola (eds.), MIT Press (1999).
[8] M. Tsuchiya and H. Fujiyoshi, "Evaluating Feature Importance for Object Classification in Visual Surveillance", Proc. of ICPR, pp. 978-981 (2006).
[9] C. Blake, E. Keogh, and C. J. Merz, UCI Repository of Machine Learning Databases [http://www.ics.uci.edu/~mlearn/MLRepository.html], Irvine, CA: University of California, Department of Information and Computer Science (1998).
