Expert Systems with Applications 37 (2010) 3373–3379
Ensemble with neural networks for bankruptcy prediction

Myoung-Jong Kim a,*, Dae-Ki Kang b

a Department of Business Administration, Dongseo University, San69-1, Churye-2Dong, Sasang-Gu, Busan 617-716, Republic of Korea
b Department of Computer and Information Engineering, Dongseo University, San69-1, Churye-2Dong, Sasang-Gu, Busan 617-716, Republic of Korea
* Corresponding author. Tel.: +82 51 320 1917; fax: +82 51 320 1629. E-mail address: [email protected] (M.-J. Kim).
doi:10.1016/j.eswa.2009.10.012
Keywords: Boosting; Bagging; Neural networks; Bankruptcy prediction

Abstract
In a bankruptcy prediction model, accuracy is one of the crucial performance measures because of its significant economic impact. Ensembles are widely used methods for improving the performance of classification and prediction models. Two popular ensemble methods, Bagging and Boosting, have been applied with great success to various machine learning problems, mostly using decision trees as base classifiers. In this paper, we propose ensembles with neural networks for improving the performance of traditional neural networks on bankruptcy prediction tasks. Experimental results on Korean firms indicate that the bagged and boosted neural networks show improved performance over traditional neural networks.
© 2009 Elsevier Ltd. All rights reserved.
1. Introduction

Bankruptcy prediction has been an important and widely studied topic in accounting and finance. Accuracy is clearly of crucial importance in a bankruptcy prediction model because bankruptcy has a significant impact on management, stockholders, employees, customers, and the nation. Numerous statistical techniques have been used to improve the performance of bankruptcy prediction models. Beaver (1966) originally proposed univariate analysis of financial ratios to predict bankruptcy. Many empirical studies have since proposed statistical bankruptcy prediction models using multiple regression (Meyer & Pifer, 1970), discriminant analysis (Altman, 1968; Altman, Edward, Haldeman, & Narayanan, 1977), logistic models (Dimitras, Zanakis, & Zopounidis, 1996; Ohlson, 1980; Pantalone & Platt, 1987), and probit analysis (Zmijewski, 1984). However, the strict assumptions of traditional statistics, such as linearity, normality, independence among predictor variables, and a pre-existing functional form relating the criterion variable to the predictor variables, have limited their application to the real world. Data mining techniques used in bankruptcy prediction include decision trees (Han, Chandler, & Liang, 1996; Shaw & Gentry, 1998), case-based reasoning (CBR) (Bryant, 1997; Buta, 1994), and neural networks (NNs) (Odom & Sharda, 1990; Ravi & Ravi, 2007). Ensemble methods have generally been used as tools to improve the accuracy of learning algorithms by constructing and combining an ensemble of weak classifiers, each of which needs only to be moderately accurate on the training set (Perrone, 1994; Schapire, 1990).
Two popular methods for creating accurate ensembles are Bagging (Breiman, 1996) and Boosting (Freund & Schapire, 1996). Both theoretical and empirical studies have demonstrated impressive improvements in generalization behavior (Bauer & Kohavi, 1999; Breiman, 1996, 1997, 1998; Friedman, Hastie, & Tibshirani, 1998; Maclin & Opitz, 1997; Quinlan, 1996; Schapire, 1999; Schapire, Freund, Bartlett, & Lee, 1997). It has been reported that ensembles decrease the generalization error of CART decision trees (Breiman, 1996), C4.5 decision trees (Bauer & Kohavi, 1999; Quinlan, 1996), and NNs (Maclin & Opitz, 1997). Recently, several studies on bankruptcy prediction have applied AdaBoost, one of the most popular Boosting algorithms, to bankruptcy classification trees and have shown that AdaBoost decreases the generalization error and improves accuracy (Alfaro, Gámez, & García, 2007). An empirical comparison has shown that AdaBoost with classification trees decreases the generalization error by about 30% with respect to the error produced by an NN (Alfaro, García, Gámez, & Elizondo, 2008). Previous studies have thus suggested that ensembles with classification trees are very effective for bankruptcy prediction; however, there has been little empirical testing of ensembles with NNs in the bankruptcy prediction literature. A major reason is that ensembles with decision trees provide fast training and well-established default parameter settings, whereas NNs are difficult to test, both in terms of the significant processing time required and in selecting training parameters (Opitz & Maclin, 1999). Ensemble methods are expected to provide the following advantages over a traditional NN. First, NNs have been introduced as one of the prominent techniques that show effective performance in bankruptcy prediction. An ensemble can produce even more accurate results than any of the individual NN classifiers making up the ensemble, thus intensifying the discriminant capability of NNs.
Second, classification approaches based on error minimization, such as NNs, are prone to overfitting: when a classifier is too closely adjusted to the training set, its generalization error tends to increase when it is applied to previously unseen samples. Ensemble methods can make base classifiers such as NNs robust to overfitting and thus reduce generalization error. Finally, ensembles built from a variety of standard classifiers are expected to provide further insight into the general characteristics of ensemble methods, which are influenced by the learning algorithm. Against this background, we propose two ensemble methods to improve the performance of NNs for bankruptcy prediction. Two popular methods, Bagging and Boosting, are used to create accurate ensembles by combining the predictions of multiple NN classifiers. This paper presents a comprehensive evaluation of both Bagging and Boosting on a bankruptcy data set of Korean firms with respect to prediction accuracy, reduction in generalization error, and robustness to overfitting. The experimental results show that the two ensembles, bagged and boosted NNs, consistently outperform all of the 10 NNs with different topologies and also show a considerable reduction in generalization error. The next section describes the two ensemble methods, the Bagging and Boosting algorithms. Section 3 discusses several implementation issues that arise when using NNs as base classifiers. Sections 4 and 5 explain the experimental design and the results of the experiment, respectively. The final section presents several concluding remarks and future research issues.
2. Bagging and boosting algorithms

Several ensemble methods for constructing and combining an ensemble of classifiers have been proposed to improve the accuracy of learning algorithms (Breiman, 1998; Krogh & Vedelsby, 1995; Perrone, 1994). We explain two popularly used methods, Bagging (Breiman, 1994) and AdaBoost (Freund, 1995; Schapire, 1990), which differ in the way they prepare the training sets.
2.1. Bagging algorithm

Bagging is a bootstrap aggregation method that creates and combines multiple classifiers, each of which is trained on a bootstrap replicate of the original training set. The bootstrap data are created by resampling examples uniformly with replacement from the original training set, and each classifier is trained on the corresponding bootstrap replicate. The classifiers can be trained in parallel, and the final classifier C is generated by combining the ensemble of classifiers with unweighted majority voting. The Bagging algorithm is described in Table 1 and sketched in code below. Breiman (1996) considered Bagging a variance reduction technique for a given base procedure such as decision trees or NNs. Bagging is known to be particularly effective when the classifiers are unstable, that is, when perturbing the learning set can cause significant changes in the classification behavior, because Bagging improves generalization performance through a reduction in variance while maintaining or only slightly increasing bias (Breiman, 1996; Geman, Bienenstock, & Doursat, 1992).
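As a rough illustration of the procedure in Table 1, the following sketch trains B neural networks on bootstrap replicates and combines them by unweighted majority voting. The use of scikit-learn's MLPClassifier, the 0/1 label encoding, and the specific hyper-parameters are assumptions made for the example, not details taken from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier  # assumed base learner; any classifier would do


def fit_bagged_ensemble(X, y, B=25, seed=0):
    """Train B classifiers, each on a bootstrap replicate of (X, y) (Table 1, step 1)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    ensemble = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)  # draw n examples uniformly with replacement
        clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000)
        ensemble.append(clf.fit(X[idx], y[idx]))
    return ensemble


def predict_bagged(ensemble, X):
    """Combine the ensemble with an unweighted majority vote (labels assumed to be 0/1)."""
    votes = np.stack([clf.predict(X) for clf in ensemble])  # shape (B, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Because each bootstrap replicate is drawn independently, the loop over b could equally well be parallelized, which is the parallelism mentioned above.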
2.2. Boosting algorithm

Boosting (Freund & Schapire, 1996; Schapire, 1990) constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through the iterations. Observations that are incorrectly predicted by previous classifiers are chosen more often than examples that were correctly predicted, so Boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. Boosting combines the predictions of the ensemble of classifiers with weighted majority voting, giving larger weights to the more accurate classifiers. AdaBoost, used in this paper, is one of the most widely used Boosting methods. The algorithm of AdaBoost can be described as follows. Let {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} be a training set, where x is a vector of predictor variables and y is a two-class response variable such that y ∈ {−1, 1}. A weight w_b(i) is assigned to each observation x_i and is initially set to 1/n. The bth classifier, C_b, is learned on the training set T_b constructed according to these weights and is applied to each training observation. The error of this classifier, e_b, is calculated as
e_b = \sum_{i=1}^{n} w_b(i)\, e_b(i), \quad \text{where } e_b(i) = \begin{cases} 0 & \text{if } C_b(x_i) = y_i \\ 1 & \text{if } C_b(x_i) \neq y_i \end{cases} \qquad (1)
AdaBoost requires e_b to be below 0.5, that is, C_b must be at least slightly better than random guessing. The importance of C_b is indicated by α_b, defined as α_b = ln((1 − e_b)/e_b). The weights for the (b+1)th classifier are calculated as w_{b+1}(i) = w_b(i) exp(α_b e_b(i)), and the calculated weights are normalized to sum to one. Consequently, the weight of each incorrectly classified observation is increased and the weight of each correctly classified observation is decreased, so each classifier is forced to concentrate on the training examples that were misclassified by the previous classifiers. This procedure is repeatedly applied to the training set with the modified weights, producing a sequence of classifiers C_b, b = 1, 2, ..., B. Finally, the ensemble classifier computes the final predicted output as the weighted sum of its votes, C(x) = sign(\sum_{b=1}^{B} α_b C_b(x)). Table 2 illustrates the major procedures of AdaBoost. Freund and Schapire (1997) suggest that as the number of iterations B increases, the training error of the AdaBoost classifier tends exponentially to zero, and that the generalization error (e_R) of the final classifier has an upper bound which depends on the training or apparent error (e_A), the size of the training set, the Vapnik–Chervonenkis (VC) dimension of the space of base classifiers, and the number of iterations B in AdaBoost.

3. Ensemble with neural networks for bankruptcy prediction

The proposed method improves performance by using Bagging and AdaBoost to generate the final output from the combined predictions of multiple NNs used as base classifiers. The same prediction problem is solved using three different classification methods: NN, Bagging, and AdaBoost.

3.1. NN classifier

The traditional NN scheme explored as the base classifier is the well-known multi-layer perceptron (MLP) with one hidden layer.
Table 1
Bagging algorithm (Breiman, 1996).
1. Repeat for b = 1, 2, ..., B:
   (a) Construct a bootstrap sample {(x*_1, y*_1), (x*_2, y*_2), ..., (x*_n, y*_n)} by randomly drawing n times with replacement from the data {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}.
   (b) Fit the bootstrapped classifier C_b on the corresponding bootstrap sample.
2. Output the final classifier C(x) = (1/B) \sum_{b=1}^{B} C_b(x).
Table 2
AdaBoost algorithm (Freund & Schapire, 1996).
1. Start with w_1(i) = 1/n, i = 1, 2, ..., n.
2. Repeat for b = 1, 2, ..., B:
   (a) Fit the classifier C_b(x) ∈ {−1, 1} using the weights w_b(i) on T_b.
   (b) Compute e_b = \sum_{i=1}^{n} w_b(i) e_b(i) and α_b = ln((1 − e_b)/e_b).
   (c) Update the weights w_{b+1}(i) = w_b(i) exp(α_b e_b(i)) and normalize them.
3. Output the final classifier C(x) = sign(\sum_{b=1}^{B} α_b C_b(x)).
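A minimal sketch of the procedure in Table 2 follows. Because a standard MLP cannot be trained directly on weighted examples, each base classifier is fitted on a sample T_b drawn with probabilities w_b(i), as described in Section 2.2; the scikit-learn MLP base learner, its hyper-parameters, and the {−1, +1} label encoding are assumptions made for the example.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier  # assumed base learner


def fit_adaboost(X, y, B=25, seed=0):
    """AdaBoost as in Table 2; y is assumed to take values in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(n, 1.0 / n)                                    # step 1: w_1(i) = 1/n
    classifiers, alphas = [], []
    for _ in range(B):
        idx = rng.choice(n, size=n, replace=True, p=w)         # training set T_b drawn by weight
        clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000).fit(X[idx], y[idx])
        miss = (clf.predict(X) != y).astype(float)             # e_b(i): 1 if misclassified, else 0
        eps = float(np.sum(w * miss))                          # weighted error e_b
        if eps >= 0.5 or eps == 0.0:                           # stop when no better than chance
            break
        alpha = np.log((1.0 - eps) / eps)                      # importance alpha_b
        w = w * np.exp(alpha * miss)                           # up-weight misclassified examples
        w = w / w.sum()                                        # normalize weights to sum to one
        classifiers.append(clf)
        alphas.append(alpha)
    return classifiers, np.array(alphas)


def predict_adaboost(classifiers, alphas, X):
    """Weighted majority vote: sign of the alpha-weighted sum of the base predictions."""
    votes = np.stack([clf.predict(X) for clf in classifiers])  # shape (B, n_samples)
    return np.sign(alphas @ votes)
```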
Each NN has the same structure of 7 nodes in the input layer and 2 nodes in the output layer, corresponding to the 7 independent variables and the two output classes {bankrupt, non-bankrupt}, respectively. Several experiments were conducted to determine the effect of the number of hidden nodes on the accuracy in predicting the test data set, varying the number of hidden nodes from 5 to 15. The activation functions were selected to be linear in the input layer and sigmoid in the hidden and output layers. The learning algorithm was back-propagation with an adaptive learning rate starting at 0.3 and finishing at 0.01 and a momentum term set to 0.3.

3.2. Boosted classifier

The focus of Boosting is to produce a series of classifiers. The training set used for each member of the series is chosen based on the performance of the earlier classifiers in the series, and classifier generation continues as long as each classifier yields a weighted error of less than 50%, that is, better than chance in the two-class case. The basic framework of the boosted classifier is shown in Fig. 1.

3.3. Bagged classifier

The bagged NNs have the same parameters and structure as the NN classifier. They are used as the base classifiers making up the ensemble, each of which is learned on the training data for that network. The Bagging algorithm is used to generate a single classifier that produces the output of the ensemble by combining the predictions of the multiple NNs. A bag of classifiers is built with an ensemble of 25 NN classifiers for each fold. The basic framework of the bagged classifier is shown in Fig. 2.
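The base NN classifiers of Section 3.1 can be approximated with a configuration such as the one below. scikit-learn's MLPClassifier is an assumption (the paper does not name a toolkit), and its adaptive learning-rate schedule and softmax output layer differ in detail from the sigmoid-output, momentum back-propagation network described above.

```python
from sklearn.neural_network import MLPClassifier  # assumed stand-in for the paper's back-propagation MLP


def make_base_nn(hidden_nodes=10, seed=0):
    """One 7-h-2 base classifier: 7 inputs, h hidden nodes (varied from 5 to 15), two output classes."""
    return MLPClassifier(
        hidden_layer_sizes=(hidden_nodes,),
        activation="logistic",        # sigmoid hidden units
        solver="sgd",
        learning_rate="adaptive",     # adaptive learning rate, starting at 0.3 in the paper
        learning_rate_init=0.3,
        momentum=0.3,                 # momentum term used in the paper
        max_iter=1000,
        random_state=seed,
    )
```

Twenty-five such networks would then be trained on bootstrap replicates for the bagged classifier (Section 3.3) or generated sequentially with reweighted samples for the boosted classifier (Section 3.2).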
4. Experimental design

The data used in this study were obtained from a commercial bank in Korea. The data set contains 1458 externally audited manufacturing firms; half of them went bankrupt during 2002–2005, while the healthy firms were selected from companies active at the end of 2005. Initially, 32 financial ratios categorized as profitability, debt coverage, leverage, capital structure, liquidity, activity, and size were investigated through a literature review and basic statistical methods. Finally, the 7 financial ratios with the highest accuracy ratio (AR) were selected; the AR is a single number indicating the discriminating power of a given model (or variable) based on cumulative accuracy profiles (CAP). The accuracy ratio is computed as the ratio of the area between the actual model and the random model to the area between the perfect model and the random model, as shown in Fig. 3. Thus, the better the prediction model, the closer the AR is to 100%. The accuracy ratios of the financial ratios are listed in Table 3. Although less directly related to model predictive power, the potential presence of multicollinearity is an important checkpoint for the model. Variance inflation factors (VIF) among the 7 selected financial ratios were computed to check for multicollinearity. Table 4 shows that the estimated VIF values are below the threshold levels of 4 and 10 that are commonly used in VIF analysis when testing for the presence of multicollinearity. The findings indicate that the model variables do not present any substantial multicollinearity.

5. Experimental results

We repeated 10-fold cross-validation five times with different random seeds, as in Opitz and Maclin (1999), in order to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross-validation, the entire data set (1458 firms) is first partitioned into 10 equal-sized sets, and each set is then used in turn as the test set while the classifier trains on the other nine sets. That is, the cross-validation folds were run independently for each algorithm.
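The accuracy ratio described above can be computed from a CAP curve roughly as follows; the score convention (higher score means riskier firm) and the 0/1 bankruptcy indicator are assumptions made for this sketch.

```python
import numpy as np


def accuracy_ratio(scores, bankrupt):
    """AR from a cumulative accuracy profile: (area between model and random CAP)
    divided by (area between perfect and random CAP)."""
    scores = np.asarray(scores, dtype=float)
    bankrupt = np.asarray(bankrupt, dtype=float)
    order = np.argsort(-scores)                                # exclude the riskiest firms first
    y = np.concatenate(([0.0], np.cumsum(bankrupt[order]) / bankrupt.sum()))
    x = np.concatenate(([0.0], np.arange(1, len(scores) + 1) / len(scores)))
    area_model = np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x))   # trapezoidal area under the model CAP
    area_random = 0.5                                          # diagonal CAP of the random model
    area_perfect = 1.0 - bankrupt.mean() / 2.0                 # perfect model excludes all defaulters first
    return (area_model - area_random) / (area_perfect - area_random)
```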
Fig. 1. The basic framework of the boosted classifier.
Fig. 2. The basic framework of the bagged classifier: bootstrap data sets 1–25 drawn from the original training data each train one network, and the network outputs C1–C25 are combined.
Fig. 3. Cumulative accuracy profile and accuracy ratio (x-axis: percent of sample excluded; y-axis: percent of defaulters excluded; curves for the perfect model, the model under consideration, and the random model; Accuracy Ratio = B / [A + B]).
In this way we obtained results for the three classifiers on each of the 50 experiments. Table 5 reports the average prediction accuracy of each classifier. For all 10 topologies, the bagged and boosted classifiers consistently produce more accurate results than the NN. A t-test is used to examine whether the average prediction performance of the three classifiers over the 50 folds differs significantly. The results of the t-test indicate that the bagged classifier outperforms the boosted classifier and the NN classifier at the 1% statistical significance level, and that the boosted classifier in turn outperforms the NN classifier at the 1% statistical significance level. This implies that the two ensemble methods can be effective tools for improving the performance of NNs in the bankruptcy prediction domain. The bagged NN shows better results and more stable learning ability than the other classifiers, while the boosted NN yields a relatively small improvement compared with the bagged classifier. One major reason is that the boosted NN may have lost the opportunity to improve its performance because of the constraint that new classifier generation depends on the performance of the earlier classifiers, whereas Bagging imposes no
such constraint, because the resampling of the training set does not depend on the performance of the earlier classifiers. In fact, we observed that most of the boosted NNs over the 50 folds have fewer than 10 base classifiers. Another possible reason is that the NN used as the base classifier can be an obstacle to generating new classifiers, because its error on the testing set prematurely converges to about 50% while the boosted NN is being learned. Table 5 also shows that the ensembles with NNs are useful for coping with the overfitting problem of NNs. The NN shows worse results on the testing set than on the training set; the average difference in accuracy is 3.78%, while those of the two ensemble methods are reduced to 0.60% and 0.50%, respectively. This means that the two proposed methods are capable of reducing this accuracy gap (or generalization error) because of their robustness to overfitting. Table 6 reports the generalization error, Type I error, and Type II error of the three classifiers on the testing set. There is a reduction of 8.60% in the NN test error of 28.94% by learning the boosted NN, and a reduction of 16.97% by learning the bagged NN. Type I error is the rate of misclassifying bankrupt firms as non-bankrupt firms, while Type II error is the rate of misclassifying non-bankrupt firms as bankrupt firms.
Table 3
The accuracy ratio (AR) of each financial ratio.

Category            Variable                                          AR (%)
Profitability       Ordinary income to total assets*                  52.1
                    Net income to total assets (ROA)                  45.7
                    Financial expenses to sales                       49.2
                    Financial expenses to total debt                  48.5
                    Net financing cost to sales                       50.3
                    Ordinary income to sales                          45.7
                    Net income to sales                               49.9
                    Ordinary income to capital                        48.2
                    Net income to capital                             47.5
Debt coverage       EBITDA to interest expenses*                      53.2
                    EBIT to interest expenses                         49.2
                    Cash operating income to interest expenses        48.5
                    Cash operating income to total debt               47.8
                    Cash flow after interest payment to total debt    52.7
                    Cash flow after interest payment to total debt    51.9
                    Debt repayment coefficient                        50.8
                    Borrowings to interest expenses                   52.4
Leverage            Total debt to total assets*                       52.4
                    Capital to total assets                           51.9
                    Current assets to total assets                    51.3
Capital structure   Retained earnings to total assets*                53.6
                    Retained earnings to total debt                   51.6
                    Retained earnings to current assets               50.8
Liquidity           Cash ratio*                                       46.5
                    Quick ratio                                       45.7
                    Current assets to current liabilities             43.2
Activity            Inventory to sales*                               31.5
                    Current liabilities to sales                      28.5
                    Accounts receivable to sales                      27.0
Size                Total assets*                                     25.2
                    Sales                                             22.4
                    Fixed assets                                      24.6

* The 7 financial ratios with the highest AR in each category.

Table 4
Variance inflation factors.

Variable                              VIF
Ordinary income to total assets       1.66
EBITDA to interest expenses           2.31
Total debt to total assets            1.97
Retained earnings to total assets     2.74
Cash ratio                            1.64
Inventory to sales                    1.69
Total assets                          1.61
Type I error is regarded as the most important error because of its significant impact. The boosted and bagged NNs decrease Type I error by 76.72% and 73.89%, respectively. In ensemble learning, it is useful to visualize the behavior of an ensemble classifier by plotting the cumulative graph of its margins. The margin of an instance is defined as the difference between the number of correct base classifiers and the maximum among the numbers of base classifiers attached to each of the other class labels, and it thus reflects the certainty of the classification. Hence, if there are two class labels {0, 1} and we fix '1' as the correct label without loss of generality, then the margin m(x) of an instance x is defined as follows:

m(x) = \frac{\#\,\text{of correct base classifiers} - \#\,\text{of incorrect base classifiers}}{\#\,\text{of base classifiers}}

When all base classifiers correctly classify a given instance, the margin of the instance is 1.
Table 5
Comparison of predictive accuracy (%).

                 NN                            Boosted NN                    Bagged NN
Topology    Training  Testing  Dif.      Training  Testing  Dif.      Training  Testing  Dif.
7–5–2         74.71    71.06   3.66        75.58    75.04   0.54        76.46    75.99   0.47
7–6–2         74.81    70.78   4.03        75.29    75.03   0.26        76.51    75.92   0.58
7–7–2         74.73    71.12   3.60        75.51    75.24   0.28        76.56    75.92   0.64
7–8–2         74.84    71.12   3.71        75.95    74.90   1.05        76.56    76.06   0.50
7–9–2         74.81    70.92   3.89        75.92    75.24   0.67        76.34    75.86   0.48
7–10–2        74.82    70.92   3.90        75.54    74.62   0.91        76.43    75.78   0.65
7–11–2        74.84    70.85   3.99        75.76    75.58   0.17        76.49    75.99   0.50
7–12–2        74.80    71.19   3.60        75.66    74.96   0.70        76.51    76.20   0.31
7–13–2        74.84    71.12   3.72        75.87    75.18   0.70        76.53    75.92   0.61
7–14–2        74.84    71.12   3.30        75.96    75.18   0.78        76.37    76.06   0.30
Average       74.80    71.02   3.78        75.70    75.10   0.60        76.47    75.97   0.50
Table 6
Comparison of prediction error rates (%).

                 NN                              Boosted NN                      Bagged NN
Topology    Overall  Type I  Type II      Overall  Type I  Type II      Overall  Type I  Type II
7–5–2         28.94   23.05    34.84        24.96   16.61    33.32        24.01   16.88    31.14
7–6–2         29.22   22.91    35.53        24.97   17.97    31.97        24.08   17.30    30.86
7–7–2         28.88   23.05    34.71        24.76   17.15    32.37        24.08   17.16    31.00
7–8–2         28.88   23.18    34.57        25.10   17.70    32.50        23.94   17.16    30.72
7–9–2         29.08   23.32    34.84        24.76   16.46    33.05        24.14   17.29    31.00
7–10–2        29.08   23.46    34.71        25.38   20.16    30.58        24.22   17.57    30.86
7–11–2        29.15   23.59    34.71        24.42   19.34    29.49        24.01   17.43    30.59
7–12–2        28.81   23.46    34.16        25.04   17.97    32.09        23.80   17.16    30.45
7–13–2        28.88   23.59    34.16        24.82   15.92    33.73        24.08   17.16    31.00
7–14–2        28.46   23.59    33.33        24.83   19.61    30.03        23.94   17.15    30.73
Average       28.94   23.32    34.55        24.90   17.89    31.91        24.03   17.23    30.83
Fig. 4. Cumulative distribution graph of margins with 10 different NN topologies.
If the number of class labels is more than two, to calculate a margin we measure the difference between the number of correct base classifiers and the maximum number of base classifiers among the incorrect class labels. Thus the definition of a margin is generalized as follows:

m(x) = \frac{\#\,\text{of correct base classifiers} - \max_i\,\#\,\text{of base classifiers that predict incorrect class label } i}{\#\,\text{of base classifiers}}
So, if all base classifiers incorrectly predict one particular class label, the margin will be −1. To visually assist human understanding of the behavior of an ensemble classifier, Kuncheva (2004) introduced the cumulative distribution graph of margins. In this graph, the x-axis represents a margin, m, and the y-axis represents the percentage of instances whose margin is less than or equal to m. If all instances are correctly classified, the graph is simply a vertical line at m = 1. Fig. 4 shows the cumulative margin graphs of the bagged and boosted classifiers for the 10 different NN topologies. It is interesting to see that the boosted ensembles form a curve similar to an 'S' (shown as a thin line in Fig. 4), whereas the bagged ensembles form a curve similar to an 'N' (shown as a thick line in Fig. 4). As noted earlier, this is because the sample distribution of a classifier in a boosted ensemble depends on its predecessor's results. Therefore, in terms of sample distribution and the resulting neural network weights, the base classifiers in a boosted ensemble have greater variance among themselves than the base classifiers in a bagged ensemble.
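For reference, the margins and the points of their cumulative distribution graph can be computed as follows; the array layout and label encoding are assumptions made for this sketch.

```python
import numpy as np


def margins(base_predictions, y_true):
    """Two-class margins: (# correct - # incorrect base classifiers) / # base classifiers."""
    base_predictions = np.asarray(base_predictions)  # shape (B, n_instances), predicted labels
    B = base_predictions.shape[0]
    correct = (base_predictions == np.asarray(y_true)).sum(axis=0)
    return (correct - (B - correct)) / B


def cumulative_margin_distribution(m):
    """x: sorted margins; y: fraction of instances with margin <= x (the graph of Kuncheva, 2004)."""
    xs = np.sort(np.asarray(m))
    ys = np.arange(1, len(xs) + 1) / len(xs)
    return xs, ys
```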
6. Conclusion

In this study, two popular ensemble methods, Bagging and Boosting, are applied to bankruptcy prediction to improve the classification performance of NNs. The boosted and bagged NNs consistently show improved predictive accuracy across all 10 different topologies. In particular, the bagged NN produces a more accurate single classifier than the other classifiers. The boosted NN generates a relatively small improvement compared with the bagged classifier, owing to the constraint that new classifier generation depends on the performance of the earlier classifiers. These results suggest the need for a new learning strategy for the boosted NN to produce a more accurate classifier. In this sense, Logit Boost (Friedman, 2001), which fits together the predictions of the classifiers as an additive model using a maximum likelihood criterion, can be applied as an alternative mechanism to resolve this problem. The boosted and bagged NNs also alleviate the overfitting problem and thus achieve reductions of 8.60% and 16.97% in the test error compared with the NN. In particular, they decrease Type I error by 76.72% and 73.89%, respectively. These results mean that the two proposed ensemble methods can be effective tools for improving the performance of NNs in the bankruptcy domain. However, several issues remain to be addressed by further research. First, this research has not addressed many important tasks, such as the effect of the interdependence of the combined classifiers on joint accuracy or the behavior of combination methods in the presence of noisy data (Opitz & Maclin, 1999). Further investigation of these tasks would facilitate the vigorous combination of ensemble methods and standard classifiers. Second, further improvements of the boosted NN could be achieved by using newly proposed Boosting algorithms such as confidence-rated boosting (Schapire & Singer, 1999), Margin Boost (Mason, Baxter, Bartlett, & Frean, 2000), and Logit Boost (Friedman, 2001).

References
Alfaro, E., Gámez, M., & García, N. (2007). Multiclass corporate failure prediction by AdaBoost.M1. Advanced Economic Research, 13, 301–312.
Alfaro, E., García, N., Gámez, M., & Elizondo, D. (2008). Bankruptcy forecasting: An empirical comparison of AdaBoost and neural networks. Decision Support Systems, 45, 110–122.
Altman, E. L. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance, 23(4), 589–609.
Altman, E. L., Edward, I., Haldeman, R., & Narayanan, P. (1977). A new model to identify bankruptcy risk of corporations. Journal of Banking and Finance, 1, 29–54.
Beaver, W. (1966). Financial ratios as predictors of failure, empirical research in accounting: Selected studies. Journal of Accounting Research, 4(3), 71–111.
Bauer, E., & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36, 105–139.
Breiman, L. (1994). Bagging predictors. Machine Learning, 24(2), 123–140.
Breiman, L. (1996). Bias, variance, and arcing classifiers (Tech. Rep. No. 460). Berkeley: Statistics Department, University of California at Berkeley.
Breiman, L. (1997). Arcing the edge (Tech. Rep. No. 486). Berkeley: Statistics Department, University of California at Berkeley.
Breiman, L. (1998). Arcing classifiers. Annals of Statistics, 26(3), 801–849.
Bryant, S. M. (1997). A case-based reasoning approach to bankruptcy prediction modeling. International Journal of Intelligent Systems in Accounting, Finance and Management, 6(3), 195–214.
Buta, P. (1994). Mining for financial knowledge with CBR. AI Expert, 9(10), 34–41.
Dimitras, A. I., Zanakis, S. H., & Zopounidis, C. (1996). A survey of business failure with an emphasis on prediction methods and industrial applications. European Journal of Operational Research, 90(3), 487–513.
Freund, Y. (1995). Boosting a weak learning algorithm by majority. Information and Computation, 121(2), 256–285.
Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Machine learning: Proceedings of the 13th international conference (pp. 148–156).
Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139.
Friedman, J., Hastie, T., & Tibshirani, R. (1998). Additive logistic regression: A statistical view of boosting (Tech. Rep.). Stanford: Department of Statistics, Stanford University.
Friedman, J. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5), 1189–1232.
Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4(1), 1–58.
Han, I., Chandler, J. S., & Liang, T. P. (1996). The impact of measurement scale and correlation structure on classification performance of inductive learning and statistical methods. Expert Systems with Applications, 10(2), 209–221.
Krogh, A., & Vedelsby, J. (1995). Neural network ensembles, cross validation and active learning. In G. Tesauro, D. S. Touretzky, & T. K. Leen (Eds.), Advances in neural information processing systems (Vol. 7, pp. 231–238). Cambridge, MA: MIT Press.
Kuncheva, L. I. (2004). Combining pattern classifiers: Methods and algorithms. New York: Wiley.
Maclin, R., & Opitz, D. (1997). An empirical evaluation of bagging and boosting. In Proceedings of the 14th national conference on artificial intelligence (pp. 546–551).
Mason, L., Baxter, J., Bartlett, P., & Frean, M. (2000). Functional gradient techniques for combining hypotheses. In A. J. Smola, P. L. Bartlett, B. Schölkopf, & D. Schuurmans (Eds.), Advances in large margin classifiers. Cambridge, MA: MIT Press.
Meyer, P. A., & Pifer, H. (1970). Prediction of bank failures. The Journal of Finance, 25, 853–868.
Odom, M., & Sharda, R. (1990). A neural network for bankruptcy prediction. In Proceedings of the international joint conference on neural networks. San Diego, CA: IEEE Press.
Ohlson, J. (1980). Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting Research, 18(1), 109–131.
Opitz, D., & Maclin, R. (1999). Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, 11, 169–198.
Pantalone, C., & Platt, M. B. (1987). Predicting commercial bank failure since deregulation. New England Economic Review, 37–47.
Perrone, M. E. (1994). Putting it all together: Methods for combining neural networks. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in neural information processing systems (Vol. 6, pp. 1188–1189). San Mateo, CA: Morgan Kaufmann.
Quinlan, J. R. (1996). Bagging, boosting and C4.5. In Machine learning: Proceedings of the 14th international conference (pp. 725–730).
Ravi, P., & Ravi, K. V. (2007). Bankruptcy prediction in banks and firms via statistical and intelligent techniques: A review. European Journal of Operational Research, 180, 1–28.
Schapire, R. E. (1990). The strength of weak learnability. Machine Learning, 5(2), 197–227.
Schapire, R. E. (1999). Theoretical views of boosting. In Computational learning theory: Fourth European conference, EuroCOLT (pp. 1–10).
Schapire, R. E., Freund, Y., Bartlett, P., & Lee, W. S. (1997). Boosting the margin: A new explanation for the effectiveness of voting methods. In Machine learning: Proceedings of the 14th international conference (pp. 322–330).
Schapire, R. E., & Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3), 297–336.
Shaw, M., & Gentry, J. (1998). Using an expert system with inductive learning to evaluate business loans. Financial Management, 17(3), 45–56.
Zmijewski, M. E. (1984). Methodological issues related to the estimation of financial distress prediction models. Journal of Accounting Research, 22(1), 59–82.