
A Modified Editing k-nearest Neighbor Rule

Ruiqin Chang, Zheng Pei and Chao Zhang
School of Mathematics and Computer Engineering, Xihua University, Chengdu, 610039, China
Email: [email protected], [email protected]

Abstract—Classification of objects is an important problem in a variety of fields and applications, and many different methods are available for making a decision in such cases. The k-nearest neighbor rule (k-NN) is a well-known nonparametric decision procedure, and classification rules based on the k-NN have been proposed and applied in diverse substantive areas. The editing k-NN proposed by Wilson is an important one. In this rule, the reference set is edited first: every sample in the reference set is classified using the k-NN rule on the set formed by eliminating that sample from the reference set, and all misclassified samples are then deleted from the reference set. Afterward, any input sample is classified using the k-NN rule and the edited reference set. The editing k-nearest neighbors classifier (EKNN) thus consists of the k-nearest neighbor classifier and an editing reference set. However, the editing reference set obtained by this method is only a subset of the reference set, which may result in the loss of some important information and a decline in classification accuracy. In this paper, we focus on modifying the editing reference set of EKNN: the new editing set in our method consists of subsets of the reference set and the testing set, obtained by classifying every sample in the reference set and the testing set with the k-NN rule and removing the misclassified samples from the reference set and the testing set, respectively. The advantages of our method are a reduced loss of information and an improved recognition rate. Comparison and analysis of the experimental results demonstrate the capability of the proposed algorithm.

Index Terms—k-nearest neighbors classifier, editing technique, reference set, testing set, training set

I. INTRODUCTION

Pattern classification is a method for discriminating patterns; it is an approach to supervised learning in pattern recognition. The k-nearest neighbor rule (k-NN), proposed by Fix and Hodges [1], is a well-known nonparametric decision rule for pattern classification. Let D = {(X_1, θ_1), ..., (X_n, θ_n)} be the classified sample set, drawn independently and identically distributed according to the distribution F(X, θ) of (X, θ), where the X_i take values in a metric space (X, d) and the θ_i take values in the set {1, ..., c}. Given a new pattern X, it is desired to estimate its label θ by the majority of the labels θ^[1], ..., θ^[k] corresponding to the k nearest neighbors X^[1], ..., X^[k] of X among {X_1, ..., X_n}. Since then, the k-NN rule has been investigated extensively; such investigations can be found in Cover and Hart [3], Devroye [4], Urahama and Nagao [5], Wagner [6], Yan [7], and Wilson [2].

The k-NN assigns to X the class θ associated with the largest number of points among its k nearest neighbors; this method is computationally expensive. To improve on this drawback of k-NN, Wilson proposed the editing k-NN [2], which can be described as follows: the θ_i of each X_i is first estimated using the k-NN, and the data set {(X_1, θ_1), ..., (X_n, θ_n)} is edited by deleting (X_i, θ_i) whenever θ_i does not coincide with its estimate; finally, the k-NN is used again to estimate the label θ of X using the edited data [8]. In the editing k-NN, editing the reference set is performed first. D is divided into a reference set D_1 and a testing set D_2, each sample in the reference set is classified by the testing set using the k-NN [3], and the edited reference set D_1′ is formed by eliminating samples from the reference set, that is, all misclassified samples are deleted from the reference set; afterward, any input sample is classified using the k-NN and the edited reference set. The editing k-NN has yielded many interesting results in finite sample size problems. From the practical point of view, it effectively improves the classification accuracy and reduces the computation time, which makes it suitable as a preprocessor for much more complex classification. Nowadays, based on the k-NN and the editing k-NN, many interesting modified classification algorithms have been proposed. For example, Frigui and Gader [9] propose a possibilistic k-NN for land mine detection using sensor data generated by a ground-penetrating radar system, in which edge histogram descriptors are used for feature extraction and a possibilistic k-NN is used for confidence assignment. Wu et al. [10] develop two effective techniques, namely template condensing and preprocessing, to significantly speed up k-NN classification while maintaining the level of accuracy. Samsudin et al. [11] extend three variants of the nearest neighbor algorithm to develop a number of nonparametric group-based classification techniques. Hattori and Takahashi [12] propose a new editing k-nearest neighbor (k-NN) rule in which, for every sample y in the editing reference set, all the k- or (k+1)-nearest neighbors of y must belong to the class to which y belongs. Rubio et al. [13] present an alternative based on the setting of problem-specific kernels applied to time series prediction, focusing on large-scale prediction.


In [14]-[18], the fuzzy k-nearest neighbor (FK-NN) algorithm is extended to yield higher classification accuracy, and it has been shown to be a powerful soft pattern classifier. Jahromi et al. [19] and Babu et al. [20] take the weights of the training instances into account in the course of classification; those algorithms reduce the size of the training set and can be viewed as powerful instance reduction techniques. Gil-Pita and Yao [21] propose three improvements of the editing k-nearest neighbor design using genetic algorithms: the use of a mean square error based objective function, the implementation of a clustered crossover, and a fast smart mutation scheme.

However, the editing reference set obtained by the EKNN is only a subset of the reference set, which may result in the loss of some important information. In order to delete noise more effectively and further improve the classification accuracy, we modify the previous method. Based on the previous process, let D_1′ be the testing set and D_2 the reference set. We classify D_2 by D_1′ using k-NN again and remove the misclassified samples from D_2; the remaining samples of D_2 that are accurately classified serve as D_2′. The editing training set finally used in the proposed method is the union of D_1′ and D_2′, which retains the information-rich samples that the EKNN algorithm has removed. The performance of the proposed method has been compared with those of the k-NN and EKNN.

Theoretically, the selected samples preserve most of the important information of the original training set. In practice, better sample selection techniques should be able to detect and ignore noisy or misleading samples. In EKNN, the quality of the training set affects the recognition accuracy and increases the efficiency of the next sample selection. From the practical point of view, the potential benefits of sample selection are as follows:
1) Simplifying data visualisation: by reducing the training set to a smaller one, trends within the training set can be identified more easily. This can be very important for training sets in which only a few samples have an influence on the outcomes.
2) Reduction of computation time: with a smaller training set, the runtimes of learning algorithms can be shortened significantly in the classification phase. A smaller training set can also avoid the "overfitting" phenomenon.
3) Improvement of classification accuracy: the prediction performance of the classifiers and the classification accuracy can be increased as a result of sample selection, through the removal of noisy and misleading samples.
4) Reduction of storage space: after removing noisy or misleading samples, the amount of data that needs to be stored is greatly reduced.
From previous experimentation, it is observed that methods with sample selection often achieve higher classification accuracy than those without. This motivates a new sample selection technique to guide the search for an optimal training set.


The remainder of this paper is structured as follows. In Section 2, the basic theory is described, which includes the k-nearest neighbor classifier and the editing k-nearest neighbors classifier. The editing k-nearest neighbors classifier that we modify is described in Section 3. An experiment in Section 4 demonstrates the validity of the proposed method. Section 5 concludes the paper.

II. PRELIMINARIES

The principal focus of this paper lies in the modification of the editing k-nearest neighbors classifier (MEKNN); however, an overview of the current k-nearest neighbor classifier (k-NN) [1] and of EKNN [2] is necessary to fully appreciate the MEKNN approach. First, we introduce the k-nearest neighbors classifier.

A. k-Nearest Neighbors Classifier
In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a method for classifying objects based on the closest training examples in the feature space. k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is amongst the simplest of all machine learning algorithms: an object is classified by a majority vote of its neighbors, with the object being assigned to the class most common amongst its k nearest neighbors (k is a positive integer, typically small). The k-NN classifier is commonly based on the Euclidean distance between a testing sample and the specified training samples. The algorithm of the nearest neighbors classifier is summarized as follows.
Let D = {X_1, X_2, ..., X_n} be the training set, let C_1, C_2, ..., C_c be the c classes of samples in D, where each sample belongs to exactly one class, and let X_i be a training sample with p features {x_i1, x_i2, ..., x_ip}, in which i = 1, 2, ..., n and n is the total number of training samples. Then input the value of k. For an unlabeled vector X_l (a query or testing sample), the distance between a training sample X_i (i = 1, 2, ..., n) and the testing sample X_l is calculated as the Euclidean distance:

D(X_i, X_l) = \sqrt{\sum_{r=1}^{p} (x_{ir} - x_{lr})^2}.

Sort these n distances and select the k nearest neighbors that are closest to X_l. Among them there are c′ classes, and class j′ has k_{j′} nearest neighbors closest to X_l, in which 1 ≤ c′ ≤ c and

\sum_{j'=1}^{c'} k_{j'} = k.


Finally, the unlabeled vector is assigned to the class that is most frequent among the k training samples nearest to it. That is, X_l ∈ C_j when

DN = \max_{j'}\{k_{j'}\} = k_j,

where DN denotes the number of the k nearest neighbors of X_l that belong to C_j.

Input: the original training set D; the new sample X_l; classes C_i in D, where i = 1, 2, ..., c;
(1) for each X_j ∈ D
(2) for (j = 1; j ≤ |D|; j++)
(3)   Distance[j] = D(X_j, X_l)
(4) end
(5) input k
(6) select the k shortest distances from Distance[|D|] and store them in ShorDis[n], n = 1, 2, ..., k
(7) find the class C_i that includes the largest number of the distances in ShorDis[n]
(8) assign X_l to C_i

Figure 1. The k-nearest neighbors classifier algorithm
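As a concrete illustration only (not the authors' original code), the following Python sketch implements the procedure of Figure 1 with Euclidean distance and a majority vote; the function and variable names are our own.

import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_query, k):
    """Assign x_query to the class most common among its k nearest training samples."""
    X_train = np.asarray(X_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    # steps (1)-(4) of Figure 1: distance from the query to every training sample
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # step (6): indices of the k shortest distances
    nearest = np.argsort(dists)[:k]
    # steps (7)-(8): majority vote among the k nearest neighbors
    votes = Counter(np.asarray(y_train)[nearest].tolist())
    return votes.most_common(1)[0][0]

For example, knn_classify([[0, 0], [1, 1], [5, 5]], [0, 0, 1], [0.5, 0.5], k=1) returns 0, since the closest training sample to (0.5, 0.5) belongs to class 0.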

From Figure 1, we can see that the k-NN is a pure classification algorithm without a sample selection phase. When a new sample is assigned to a class, it is necessary to calculate the distance between the new sample and every existing sample in the training set; when n is very large, this computation is considerable. In fact, it is not necessary to calculate the distances from an unlabeled sample to every existing sample in the training set in order to find its class, but only the distances to some representative samples. A solution to this problem is to select the information-rich samples to represent the original training set. In the following we introduce several methods that address this problem.

B. Sample Selection
The following methods can operate effectively in deleting noisy or misleading samples:
• The Editing k-Nearest Neighbor (EKNN) algorithm was created in 1972 by Wilson [2]. The main idea of EKNN is to remove the samples that are misclassified; EKNN starts from the original training set [22].
• Dasarathy [16] has developed a condensing method to edit the reference set: his method provides the minimal consistent subset (MCS), which is used as the editing reference set. All the samples in the reference set can be correctly classified using the k-NN and the MCS [23].
• The Condensed Nearest Neighbor rule (CNN) was proposed by Hart [5]. The CNN algorithm starts a new data set from one instance per class randomly chosen from the training set. After that, each instance from the training set that is wrongly classified using the new data set is added to this set. This procedure is very fragile with respect to noise and the order of presentation [22].
• The Reduced Nearest Neighbor rule (RNN), described by Gates [6], is based on the same idea as CNN. However, RNN starts from the original training set and rejects only those instances that do not decrease the accuracy [22].
All of these methods attempt to reduce the size of the training set. In this paper, however, our focus is on EKNN, so we introduce EKNN first.

C. Editing k-Nearest Neighbors Classifier
The editing k-nearest neighbors classifier consists of the k-nearest neighbor classifier and an editing training set. In order to reduce the classification error rate and improve the classification accuracy, the EKNN selects the samples of the training set that are accurately classified as the editing training set (training set). When we classify a sample of the testing set, its distance to each pattern in the training set is evaluated and the sample is then assigned to a class by k-NN. This saves a great deal of computation time. The detailed description of the algorithm is as follows.
Let D = {X_1, X_2, ..., X_n} be the training set, where each sample has a given class label. We divide D into two subsets D_1 and D_2, in which D_1 = {Y_1, Y_2, ..., Y_M} (1 ≤ M ≤ n), D_2 = {Z_1, Z_2, ..., Z_N} (1 ≤ N ≤ n), and n = M + N. Y_i and Z_j are training samples, each with p features {y_i1, y_i2, ..., y_ip} and {z_j1, z_j2, ..., z_jp} (i = 1, 2, ..., M; j = 1, 2, ..., N). Let D_1 be the reference set (training set) and D_2 the testing set. Every sample belonging to D_2 is classified by D_1 once more; namely, for every Z_j ∈ D_2 and Y_i ∈ D_1, the distance between Z_j and Y_i is evaluated as follows:

D(Z_j, Y_i) = \sqrt{\sum_{r=1}^{p} (z_{jr} - y_{ir})^2},

then k-NN is used to assign the class of Z_j, and the results are preserved. We then check whether each sample in D_2 is misclassified or not; if it is misclassified, the corresponding test sample is removed from D_2. The remaining samples of D_2 that are accurately classified form D_2′ = {Z_1′, Z_2′, ..., Z_r′} (1 ≤ r ≤ N), which serves as the editing training set. For an unlabeled vector X_l, it is classified by D_2′ using k-NN. The editing k-nearest neighbors classifier algorithm, based on the k-NN algorithm, is given in Figure 2.


From the process of implementation, it is clear that the computation is greatly reduced. Besides, the approach increases the classification accuracy and effectively removes the noisy or misleading samples, which can be clearly discerned in Figure 3.

Input: the original training set D; the new sample X_l; classes C_i in D, where i = 1, 2, ..., c;
(1) divide D into two subsets D_1 and D_2
(2) for every Z_j^2 ∈ D_2 and Z_k^1 ∈ D_1, calculate the distance between Z_j^2 and Z_k^1, then use k-NN to find which class Z_j^2 belongs to according to D_1, and preserve the results (namely, do (1)-(8) in Figure 1)
(3) remove the misclassified samples from D_2, compared with the original classes in D_2, and keep the remaining samples of D_2 as the training set D_2′
(4) for a new sample X_l, classify it by D_2′ using k-NN (namely, do (1)-(8) in Figure 1)
(5) return the class C_i of X_l

Figure 2. The editing k-nearest neighbors classifier algorithm

Figure 3. Comparison of before and after sample selection
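The editing step of Figure 2 can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation; it reuses the knn_classify function sketched earlier, and names such as edit_with_reference are our own.

import numpy as np

def edit_with_reference(X_ref, y_ref, X_edit, y_edit, k):
    """Keep only the samples of the edited subset whose k-NN label,
    computed from the reference subset, agrees with their true label."""
    X_edit, y_edit = np.asarray(X_edit, dtype=float), np.asarray(y_edit)
    keep = [i for i in range(len(X_edit))
            if knn_classify(X_ref, y_ref, X_edit[i], k) == y_edit[i]]
    return X_edit[keep], y_edit[keep]

def eknn_classify(X1, y1, X2, y2, x_query, k):
    # Figure 2: D2' (the correctly classified part of D2) is the edited
    # training set; a new sample is then classified by D2' with k-NN
    X2e, y2e = edit_with_reference(X1, y1, X2, y2, k)
    return knn_classify(X2e, y2e, x_query, k)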

III. A METHOD OF SAMPLE SELECTION BASED ON THE EDITING K-NEAREST NEIGHBORS CLASSIFIER

The modified method proposed in this paper (MEKNN) uses the EKNN as a basis for sample selection, while using k-NN to classify. It is well known that the main idea of EKNN is to reduce the original training set to a smaller one in order to lower the computational complexity and improve the classification accuracy. However, the smaller training set finally generated is only a subset of the testing set. This may result in the loss of some important information and in the misclassification of some samples. In order to reduce the loss of information and improve the classification accuracy, in this paper we modify the previous method: the final editing training set is no longer a subset of the reference set alone, but the union of two subsets belonging to the reference set and the testing set, respectively. The detailed process is as follows.
Let D = {X_1, X_2, ..., X_n} be the training set, where each sample has a given class label. We divide D into two subsets D_1 and D_2, in which D_1 = {Y_1, Y_2, ..., Y_M} (1 ≤ M ≤ n), D_2 = {Z_1, Z_2, ..., Z_N} (1 ≤ N ≤ n), and n = M + N. Y_i and Z_j are training samples, each with p features {y_i1, y_i2, ..., y_ip} and {z_j1, z_j2, ..., z_jp} (i = 1, 2, ..., M; j = 1, 2, ..., N). Let D_1 be the reference set (training set) and D_2 the testing set. Every sample belonging to D_2 is classified by D_1 once more; namely, for every Z_j ∈ D_2 and Y_i ∈ D_1, the distance between Z_j and Y_i is evaluated as follows:

D(Z_j, Y_i) = \sqrt{\sum_{r=1}^{p} (z_{jr} - y_{ir})^2},

then k-NN is used to assign the class of Z_j, and the results are preserved. We check whether each sample in D_2 is misclassified or not; if it is misclassified, the corresponding test sample is removed from D_2. The remaining samples of D_2 that are accurately classified form D_2′ = {Z_1′, Z_2′, ..., Z_r′} (1 ≤ r ≤ N). Now let D_2′ be the reference set and the original reference set D_1 be the testing set, and repeat the previous classification operation. Finally we obtain D_1′ = {Y_1′, Y_2′, ..., Y_q′} (1 ≤ q ≤ M), a subset of D_1. The union D′ = D_1′ ∪ D_2′ = {Y_1′, Y_2′, ..., Y_q′, Z_1′, Z_2′, ..., Z_r′} serves as the editing training set.

For an unlabeled vector X_l, it is classified by D′ using k-NN. Figure 4 shows the algorithm of the improved method, based on the k-NN and EKNN algorithms presented above in Figure 1 and Figure 2. The MEKNN algorithm is similar to the EKNN algorithm, but the difference between them is that they use different training sets to guide the classification process. From the description above we can see that the number of samples in the training set of the improved method is larger than in the original method. This is because the size of the training set is rationally increased in MEKNN; at the same time, MEKNN retains the important information-rich samples that EKNN has lost. With the improved method, the classification accuracy is higher, as shown in Section 4.
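The two-pass editing described above can be sketched as follows. Again, this is only an illustrative sketch under our own naming, building on the edit_with_reference and knn_classify functions from the earlier sketches.

import numpy as np

def meknn_training_set(X1, y1, X2, y2, k):
    # first pass (as in EKNN): edit D2 by the reference set D1, giving D2'
    X2e, y2e = edit_with_reference(X1, y1, X2, y2, k)
    # second pass: swap roles, edit D1 by D2', giving D1'
    X1e, y1e = edit_with_reference(X2e, y2e, X1, y1, k)
    # the editing training set is the union D' = D1' ∪ D2'
    return np.vstack([X1e, X2e]), np.concatenate([y1e, y2e])

def meknn_classify(X1, y1, X2, y2, x_query, k):
    X_edit, y_edit = meknn_training_set(X1, y1, X2, y2, k)
    return knn_classify(X_edit, y_edit, x_query, k)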


IV. EXPERIMENT

This section presents the results of experimental studies, which are shown in Table I, Table II and Figure 5, respectively. The data sets used in this experiment are two two-dimensional synthetic data sets generated as follows: each class has 2000 samples that are independent and identically distributed (i.i.d.), drawn from a normal distribution with mean (2,3) and (3,2), respectively, and the same covariance matrix I_{2×2} (i.e., the identity matrix). A comparison of k-NN, EKNN and MEKNN is given based on the reduced size and the classification accuracy. In the process of classification, the proposed method first generates a reduced training set, and then classifies an unlabeled vector using k-NN.

Input: the original training set D; the new sample X_l; classes C_i in D, where i = 1, 2, ..., c;
(1) divide D into two subsets D_1 and D_2
(2) for every Z_j^2 ∈ D_2 and Z_k^1 ∈ D_1, calculate the distance between Z_j^2 and Z_k^1, then use k-NN to find which class Z_j^2 belongs to according to D_1, and preserve the results (namely, do (1)-(8) in Figure 1)
(3) remove the misclassified samples from D_2, compared with the original classes in D_2, and keep the remaining samples of D_2 as D_2′
(4) for every Z_k^1 ∈ D_1 and Z_j′ ∈ D_2′, calculate the distance between Z_k^1 and Z_j′, then use k-NN to find which class Z_k^1 belongs to according to D_2′, and preserve the results (namely, do (1)-(8) in Figure 1)
(5) remove the misclassified samples from D_1, compared with the original classes in D_1, and keep the remaining samples of D_1 as D_1′
(6) take D′ = D_1′ ∪ D_2′ as the training set; for X_l, classify it by D′ using k-NN (namely, do (1)-(8) in Figure 1)
(7) return the class C_i of X_l

Figure 4. The modified editing k-nearest neighbors classifier algorithm
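For reference, the synthetic data generation and the train/test split described at the start of this section can be sketched in Python as follows. The random seed, the alternating split of DTNN into DTNN1 and DTNN2, and the 0/1 class labels are our own choices, and the sketch reuses knn_classify and meknn_training_set from the earlier sketches.

import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# two classes, 2000 i.i.d. samples each, means (2,3) and (3,2), identity covariance
DT1 = rng.multivariate_normal(mean=[2, 3], cov=np.eye(2), size=2000)
DT2 = rng.multivariate_normal(mean=[3, 2], cov=np.eye(2), size=2000)

# 1000 samples of each class form the training set DTNN, the rest form DTTS
X_train = np.vstack([DT1[:1000], DT2[:1000]])
y_train = np.array([0] * 1000 + [1] * 1000)
X_test = np.vstack([DT1[1000:], DT2[1000:]])
y_test = np.array([0] * 1000 + [1] * 1000)

# split DTNN into DTNN1 and DTNN2 (one possible split: alternating samples)
X1, y1 = X_train[0::2], y_train[0::2]
X2, y2 = X_train[1::2], y_train[1::2]

k = 6
# retention rate of the edited training set and accuracy on DTTS
X_edit, y_edit = meknn_training_set(X1, y1, X2, y2, k)
retention = len(X_edit) / len(X_train)
pred = np.array([knn_classify(X_edit, y_edit, x, k) for x in X_test])
accuracy = (pred == y_test).mean()
print(f"retention rate: {retention:.4f}, MEKNN accuracy: {accuracy:.4f}")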


A. Sample Selection
In the experiment, we denote the two data sets by DT1 and DT2, respectively. We then select 1000 samples from each of DT1 and DT2. The selected 2000 samples serve as the training set DTNN, and the remaining 2000 samples as the testing set DTTS. DTNN and DTTS are shown in Figure 5 in the form of a chart.
For EKNN: we first divide DTNN into two subsets DTNN1 and DTNN2, and let DTNN1 be the reference set and DTNN2 the testing set. Then DTNN2 is classified by DTNN1 using k-NN, and the samples that are misclassified compared with their original classes are excluded from DTNN2. Finally, we obtain a subset DTNN′2 of DTNN2, which serves as the new training set.
For MEKNN: based on the EKNN algorithm we obtain DTNN′2; we then let DTNN′2 be the reference set and DTNN1 the testing set. After repeating the steps above, we obtain a subset DTNN′1 of DTNN1. Finally, the union DTNN′ of DTNN′1 and DTNN′2 is taken as the training set.
The results of sample selection using EKNN and MEKNN are shown in Table I. The retention rate in Table I is calculated as follows:

r = \frac{D_{RemainNumber}}{D_{TrainingNumber}}

From Table I, it can be clearly seen that the retention rate of EKNN is only 35.6%. This means that only few samples have been retained, which may not represent the distribution of the whole training set. When MEKNN is employed, the retention rate rises to 72.25%. This means that while the noisy or misleading samples are removed, as many of the information-rich samples as possible are retained. In other words, MEKNN can compensate for the shortcoming of EKNN and make the classification accuracy higher.

Figure 5. The original data (the red points denote DTNN, the blue points denote DTTS)

B. Comparison of Classification Accuracy
In this section we compare the classification accuracy of k-NN, EKNN and MEKNN, which is shown in Table II. In the process of classification, the three methods use different training sets.


TABLE I. THE RESULTS OF SAMPLE SELECTION USING EKNN AND MEKNN

METHOD                 EKNN                MEKNN
NUMBER OF SAMPLES      4000                4000
NUMBER OF CLASSES      2                   2
TESTING NUMBER         2000                2000
SUBSET                 D1       D2         D1       D2
TRAINING NUMBER        1000     1000       1000     1000
REMAINING NUMBER       0        712        733      712
TOTAL REMAINING        712                 1445
RETENTION RATE         35.60%              72.25%

TABLE II. VERIFY THE VALIDITY OF THE MODIFIED EKNN

       k-NN                     EKNN                     MEKNN
K      CORRECT    ACCURACY      CORRECT    ACCURACY      CORRECT    ACCURACY
2      1360       68.00%        1510       75.50%        1525       76.25%
4      1453       72.65%        1543       77.15%        1550       77.50%
6      1493       74.60%        1546       77.30%        1555       77.75%
8      1513       75.65%        1549       77.45%        1559       77.95%
10     1544       77.20%        1556       77.80%        1555       77.75%

(CORRECT denotes the number of correctly classified samples out of the 2000 testing samples.)

For k-NN, the training set is DTNN (the original training set); its size is 2000. Each sample of DTTS is classified directly by evaluating its distances to the samples in DTNN using k-NN. For EKNN, the training set is DTNN′2; its size is 712, and DTTS is classified by DTNN′2 using k-NN. For MEKNN, the training set is DTNN′; its size is 1445, and DTTS is classified by DTNN′ using k-NN. The classification results of the three methods are shown in Table II. The experiments were run for several values of the number of neighbors k, namely k = 2, 4, 6, 8, 10. From Figure 6 we can also clearly see that the results of the three methods are quite different, and the average classification accuracy of MEKNN is the highest of the three methods. From Table II and Figure 6, we can draw the conclusion that the modified method is better than the other two methods. Compared to k-NN, first the classification accuracy is improved; second, the training set is shrunk; and last, the amount of computation is reduced. Compared to EKNN, the modified method retains some important information that EKNN has lost, and the classification accuracy is also improved.

Figure 6. The comparison of k-NN, EKNN and the modified EKNN

V. CONCLUSION

A new editing k-nearest neighbors classifier (MEKNN) is proposed in this paper, and its performance is investigated by comparing it with k-NN and EKNN.


From the experimental part, we can conclude that the improved method has a definite effect. In Section 4, it is found that the proposed method yields better results than the k-NN [1] and EKNN [2]. This result also suggests that an effective and rational editing training set, which preserves the important information, is obtained by the proposed method; this is also the key reason why it increases the classification accuracy. The condition for a sample to be included in the editing training set of the proposed method is different from that in EKNN [2], and the number of samples in the editing training set, as shown previously, is larger than that in EKNN [2]. However, the comparison of k-NN, EKNN and MEKNN shows that, although MEKNN finds a more reasonable training set and improves the classification accuracy, this often comes at the expense of runtime compared with EKNN. Determining the optimal size of the training set while at the same time minimizing the runtime is worth further research. Although the method in this paper does not reach a perfect standard, it is important to point out that the proposed method preserves the important information that the EKNN [2] algorithm loses, as well as deleting the redundant information that the k-NN [1] algorithm keeps. Both in theory and in experiments, we obtained better results than the previous methods.

ACKNOWLEDGMENT

This work is partly supported by the research fund of the Sichuan key laboratory of intelligent network information processing (SGXZD1002-10) and the key laboratory of radio signals intelligent processing (Xihua University) (XZD0818-09).

REFERENCES

[1] E. Fix and J. L. Hodges, "Discriminatory analysis, nonparametric discrimination, consistency properties," U.S. Air Force Sch. Aviation Medicine, Randolph Field, Texas, Project 21-49-004, Contract AF 41(128)-31, Rep. 4, 1951.
[2] D. L. Wilson, "Asymptotic properties of nearest neighbor rules using edited data," IEEE Trans. Syst., Man, Cybern., vol. SMC-2, no. 3, pp. 408-421, 1972.
[3] T. M. Cover and P. E. Hart, "Nearest neighbor pattern classification," IEEE Transactions on Information Theory, vol. IT-13, pp. 21-27, 1967.
[4] L. Devroye, "On the asymptotic probability of error in nonparametric discrimination," Ann. Statist., vol. 9, no. 6, pp. 1320-1327, 1981.
[5] K. Urahama and T. Nagao, "Analog circuit implementation and learning algorithm for nearest neighbor classifiers," Pattern Recognit. Lett., vol. 15, pp. 723-730, 1994.
[6] T. J. Wagner, "Convergence of the nearest neighbor rule," IEEE Trans. Inform. Theory, vol. 17, pp. 566-571, 1971.
[7] H. Yan, "Prototype optimization for nearest neighbor classifiers using a two-layer perceptron," Pattern Recognit., vol. 26, pp. 317-324, 1993.
[8] M. S. Yang and C. H. Chen, "On the editing fuzzy k-nearest neighbor rule," IEEE Trans. Systems, Man, and Cybernetics—Part B: Cybernetics, vol. 28, no. 3, pp. 461-466, 1998.


[9] H. Frigui and P. Gader, "Detection and discrimination of land mines in ground-penetrating radar based on edge histogram descriptors and a possibilistic k-nearest neighbor classifier," IEEE Trans. Fuzzy Systems, vol. 17, no. 1, pp. 185-199, 2009.
[10] Y. Wu, K. Ianakiev, and V. Govindaraju, "Improved k-nearest neighbor classification," Pattern Recognition, vol. 35, pp. 2311-2318, 2002.
[11] N. A. Samsudin and A. P. Bradley, "Nearest neighbour group-based classification," Pattern Recognition, doi: 10.1016/j.patcog, 2010.
[12] K. Hattori and M. Takahashi, "A new editing k-nearest neighbor rule in the pattern classification problem," Pattern Recognition, vol. 33, pp. 521-528, 2000.
[13] G. Rubio, L. J. Herrera, H. Pomares, I. Rojas, and A. Guillen, "Design of specific-to-problem kernels and use of kernel weighted k-nearest neighbours for time series modelling," Neurocomputing, vol. 73, pp. 1965-1975, 2010.
[14] D. Li, H. Gu, and L. Zhang, "A fuzzy c-means clustering algorithm based on nearest-neighbor intervals for incomplete data," Expert Systems with Applications, vol. 37, pp. 6942-6947, 2010.
[15] H. Seker, M. O. Odetayo, D. Petrovic, and R. N. G. Naguib, "A fuzzy logic based-method for prognostic decision making in breast and prostate cancers," IEEE Transactions on Information Technology in Biomedicine, vol. 7, 2003.
[16] R. Ostermark, "A fuzzy vector valued KNN-algorithm for automatic outlier detection," Applied Soft Computing, vol. 9, pp. 1263-1272, 2009.
[17] X. Song, Y. Zheng, X. Wu, X. Yang, and J. Yang, "A complete fuzzy discriminant analysis approach for face recognition," Applied Soft Computing, vol. 10, pp. 208-214, 2010.
[18] S. Rasheed, D. Stashuk, and M. Kamel, "Adaptive fuzzy k-NN classifier for EMG signal decomposition," Med. Eng. Phys., pp. 694-709, 2006.
[19] M. Z. Jahromi, E. Parvinnia, and R. John, "A method of learning weighted similarity function to improve the performance of nearest neighbor," Information Sciences, vol. 179, pp. 2964-2973, 2009.
[20] V. Suresh Babu and P. Viswanath, "Rough-fuzzy weighted k-nearest leader classifier for large data sets," Pattern Recognition, vol. 42, pp. 1719-1731, 2009.
[21] R. Gil-Pita and X. Yao, "Using a genetic algorithm for editing k-nearest neighbor classifiers," Springer-Verlag Berlin Heidelberg, LNCS, vol. 4881, pp. 1141-1150, 2007.
[22] M. Govindarajan and RM. Chandrasekaran, "Evaluation of k-nearest neighbor classifier performance for direct marketing," Expert Systems with Applications, vol. 37, pp. 253-258, 2010.
[23] R. Fraiman, A. Justel, and M. Svarc, "Pattern recognition via projection-based kNN rules," Computational Statistics & Data Analysis, vol. 54, pp. 257-263, May 2010.

Ruiqin Chang graduated from the Adult Education program of Henan University in 2008 and received the Bachelor's degree of Engineering in Computer Science. She is currently a graduate student with the School of Mathematics and Computer Engineering, Xihua University, Chengdu, Sichuan, China. Her major is Applied Mathematics, and her research interest is Intelligent Information Processing, which includes areas such as decision analysis, group decision, and knowledge discovery.


Zheng Pei received the M.S. and Ph.D. degrees from Southwest Jiaotong University, Chengdu, China, in 1999 and 2002, respectively. He is currently a professor in the School of Mathematics and Computer Engineering, Xihua University, Chengdu, China. His research interests include rough set theory, fuzzy set theory, logical reasoning, and linguistic information processing. He has published many papers, most of which are indexed by SCI, IEEE, and core journals.


Chao Zhang is a graduate student with the School of Mathematics and Computer Engineering, Xihua University, Chengdu, Sichuan, China. His major is Applied Mathematics, and his research interest is Intelligent Information Processing, which includes areas such as decision analysis, group decision, and knowledge discovery.