Expert Systems with Applications 37 (2010) 3373–3379
Ensemble with neural networks for bankruptcy prediction

Myoung-Jong Kim a,*, Dae-Ki Kang b

a Department of Business Administration, Dongseo University, San69-1, Churye-2Dong, Sasang-Gu, Busan 617-716, Republic of Korea
b Department of Computer and Information Engineering, Dongseo University, San69-1, Churye-2Dong, Sasang-Gu, Busan 617-716, Republic of Korea
* Corresponding author. Tel.: +82 51 320 1917; fax: +82 51 320 1629. E-mail address: [email protected] (M.-J. Kim).
doi:10.1016/j.eswa.2009.10.012
Keywords: Boosting; Bagging; Neural networks; Bankruptcy prediction

Abstract
In a bankruptcy prediction model, accuracy is one of the crucial performance measures because of its significant economic impact. Ensembles are widely used methods for improving the performance of classification and prediction models. Two popular ensemble methods, Bagging and Boosting, have been applied with great success to various machine learning problems, mostly using decision trees as base classifiers. In this paper, we propose ensembles with neural networks for improving the performance of traditional neural networks on bankruptcy prediction tasks. Experimental results on Korean firms indicate that the bagged and boosted neural networks show improved performance over traditional neural networks.
© 2009 Elsevier Ltd. All rights reserved.
1. Introduction

Bankruptcy prediction has been an important and widely studied topic in accounting and finance. Accuracy is clearly of crucial importance in a bankruptcy prediction model because bankruptcy has a significant impact on management, stockholders, employees, customers, and the nation. Numerous statistical techniques have been used to improve the performance of bankruptcy prediction models. Beaver (1966) originally proposed univariate analysis of financial ratios to predict bankruptcy. Many empirical studies have since proposed statistical bankruptcy prediction models using multiple regression (Meyer & Pifer, 1970), discriminant analysis (Altman, 1968; Altman, Edward, Haldeman, & Narayanan, 1977), logistic models (Dimitras, Zanakis, & Zopounidis, 1996; Ohlson, 1980; Pantalone & Platt, 1987), and probit analysis (Zmijewski, 1984). However, the strict assumptions of traditional statistics, such as linearity, normality, independence among predictor variables, and a pre-existing functional form relating the criterion variable to the predictor variables, have limited their application to the real world. Data mining techniques used in bankruptcy prediction include decision trees (Han, Chandler, & Liang, 1996; Shaw & Gentry, 1998), case-based reasoning (CBR) (Bryant, 1997; Buta, 1994), and neural networks (NNs) (Odom & Sharda, 1990; Ravi & Ravi, 2007). Ensemble methods have generally been used as tools to improve the accuracy of learning algorithms by constructing and combining an ensemble of weak classifiers, each of which needs only to be moderately accurate on the training set (Perrone, 1994; Schapire, 1990).
Two popular methods for creating accurate ensembles are Bagging (Breiman, 1996) and Boosting (Freund & Schapire, 1996). Both theoretical and empirical studies have demonstrated impressive improvements in generalization behavior (Bauer & Kohavi, 1999; Breiman, 1996, 1997, 1998; Friedman, Hastie, & Tibshirani, 1998; Maclin & Opitz, 1997; Quinlan, 1996; Schapire, 1999; Schapire, Freund, Bartlett, & Lee, 1997). It has been reported that ensembles decrease the generalization error of CART decision trees (Breiman, 1996), C4.5 decision trees (Bauer & Kohavi, 1999; Quinlan, 1996), and NNs (Maclin & Opitz, 1997). Recently, several studies on bankruptcy prediction have applied AdaBoost, one of the most popular Boosting algorithms, to bankruptcy classification trees and have shown that AdaBoost decreases the generalization error and improves accuracy (Alfaro, Gámez, & García, 2007). An empirical comparison has shown that AdaBoost with classification trees decreases the generalization error by about 30% with respect to the error produced by an NN (Alfaro, García, Gámez, & Elizondo, 2008). Previous studies have thus suggested that ensembles with classification trees are very effective for bankruptcy prediction; however, there has been little empirical testing of ensembles with NNs in the bankruptcy prediction literature. A major reason is that ensembles with decision trees provide fast training and well-established default parameter settings, whereas NNs are difficult to test, both in terms of the significant processing time required and in selecting training parameters (Opitz & Maclin, 1999). Ensemble methods are expected to provide the following advantages over a traditional NN. First, NNs have been introduced as one of the prominent techniques that show effective performance in bankruptcy prediction. An ensemble can produce even more accurate results than any of the individual NN classifiers making up the ensemble, thus intensifying the discriminant capability of NNs.
Second, classification approaches based on error minimization, such as NNs, are prone to overfitting: when a classifier is too closely adjusted to the training set, its generalization error tends to increase when it is applied to previously unseen samples. Ensemble methods can make base classifiers such as NNs robust to overfitting and thus reduce generalization error. Finally, ensembles built from a variety of standard classifiers are expected to provide further insight into the general characteristics of ensemble methods, which are influenced by the learning algorithm. Against this background, we propose two ensemble methods to improve the performance of NNs for bankruptcy prediction. Two popular methods, Bagging and Boosting, are used to create accurate ensembles by combining the predictions of multiple NN classifiers. This paper presents a comprehensive evaluation of both Bagging and Boosting on a bankruptcy data set of Korean firms with respect to prediction accuracy, reduction in generalization error, and robustness to overfitting. The experimental results show that the two ensembles, bagged and boosted NNs, consistently outperform all of the 10 NNs with different topologies and also show a considerable reduction in generalization error. The next section describes the two ensemble methods, the Bagging and Boosting algorithms. Section 3 discusses several implementation issues that arise when using NNs as base classifiers. Sections 4 and 5 explain the experimental design and the results of the experiment, respectively. The final section presents several concluding remarks and future research issues.
2. Bagging and boosting algorithms

Several ensemble methods for constructing and combining an ensemble of classifiers have been proposed to improve the accuracy of learning algorithms (Breiman, 1998; Krogh & Vedelsby, 1995; Perrone, 1994). We explain two popularly used methods, Bagging (Breiman, 1994) and AdaBoost (Freund, 1995; Schapire, 1990), which differ in the way they prepare the training sets.
2.1. Bagging algorithm

Bagging is a bootstrap aggregation method that creates and combines multiple classifiers, each of which is trained on a bootstrap replicate of the original training set. The bootstrap data are created by resampling examples uniformly with replacement from the original training set, and each classifier is trained on the corresponding bootstrap replicate. The classifiers can be trained in parallel, and the final classifier C is generated by combining the ensemble of classifiers with unweighted majority voting. The Bagging algorithm is described in Table 1 and sketched in code below. Breiman (1996) considered Bagging a variance reduction technique for a given base procedure such as decision trees or NNs. Bagging is known to be particularly effective when the classifiers are unstable, that is, when perturbing the learning set can cause significant changes in the classification behavior, because Bagging improves generalization performance through a reduction in variance while maintaining or only slightly increasing bias (Breiman, 1996; Geman, Bienenstock, & Doursat, 1992).
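As a rough illustration of the procedure in Table 1, the following sketch trains B neural networks on bootstrap replicates and combines them by unweighted majority voting. The use of scikit-learn's MLPClassifier, the 0/1 label encoding, and the specific hyper-parameters are assumptions made for the example, not details taken from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier  # assumed base learner; any classifier would do


def fit_bagged_ensemble(X, y, B=25, seed=0):
    """Train B classifiers, each on a bootstrap replicate of (X, y) (Table 1, step 1)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    ensemble = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)  # draw n examples uniformly with replacement
        clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000)
        ensemble.append(clf.fit(X[idx], y[idx]))
    return ensemble


def predict_bagged(ensemble, X):
    """Combine the ensemble with an unweighted majority vote (labels assumed to be 0/1)."""
    votes = np.stack([clf.predict(X) for clf in ensemble])  # shape (B, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Because each bootstrap replicate is drawn independently, the loop over b could equally well be parallelized, which is the parallelism mentioned above.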
2.2. Boosting algorithm

Boosting (Freund & Schapire, 1996; Schapire, 1990) constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through the iterations. Observations that are incorrectly predicted by previous classifiers are chosen more often than examples that were correctly predicted, so Boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. Boosting combines the predictions of the ensemble of classifiers with weighted majority voting, giving larger weights to the more accurate classifiers. AdaBoost, used in this paper, is one of the most widely used Boosting methods. The algorithm of AdaBoost can be described as follows. Let {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} be a training set, where x is a vector of predictor variables and y is a two-class response variable such that y ∈ {−1, 1}. A weight w_b(i) is assigned to each observation x_i and is initially set to 1/n. The bth classifier, C_b, is learned on the training set T_b constructed according to these weights and is applied to each training observation. The error of this classifier, e_b, is calculated as
e_b = \sum_{i=1}^{n} w_b(i)\, e_b(i), \quad \text{where } e_b(i) = \begin{cases} 0 & \text{if } C_b(x_i) = y_i \\ 1 & \text{if } C_b(x_i) \neq y_i \end{cases} \qquad (1)
AdaBoost requires e_b to be below 0.5, that is, C_b must be at least slightly better than random guessing. The importance of C_b is indicated by α_b, defined as α_b = ln((1 − e_b)/e_b). The weights for the (b+1)th classifier are calculated as w_{b+1}(i) = w_b(i) exp(α_b e_b(i)), and the calculated weights are normalized to sum to one. Consequently, the weight of each incorrectly classified observation is increased and the weight of each correctly classified observation is decreased, so each classifier is forced to concentrate on the training examples that were misclassified by the previous classifiers. This procedure is repeatedly applied to the training set with the modified weights, producing a sequence of classifiers C_b, b = 1, 2, ..., B. Finally, the ensemble classifier computes the final predicted output as the weighted sum of its votes, C(x) = sign(\sum_{b=1}^{B} α_b C_b(x)). Table 2 illustrates the major procedures of AdaBoost. Freund and Schapire (1997) suggest that as the number of iterations B increases, the training error of the AdaBoost classifier tends exponentially to zero, and that the generalization error (e_R) of the final classifier has an upper bound which depends on the training or apparent error (e_A), the size of the training set, the Vapnik–Chervonenkis (VC) dimension of the space of base classifiers, and the number of iterations B in AdaBoost.

3. Ensemble with neural networks for bankruptcy prediction

The proposed method improves performance by using Bagging and AdaBoost to generate the final output from the combined predictions of multiple NNs used as base classifiers. The same prediction problem is solved using three different classification methods: NN, Bagging, and AdaBoost.

3.1. NN classifier

The traditional NN scheme explored as the base classifier is the well-known multi-layer perceptron (MLP) with one hidden layer.
Table 1
Bagging algorithm (Breiman, 1996).
1. Repeat for b = 1, 2, ..., B:
   (a) Construct a bootstrap sample {(x*_1, y*_1), (x*_2, y*_2), ..., (x*_n, y*_n)} by randomly drawing n times with replacement from the data {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}.
   (b) Fit the bootstrapped classifier C_b on the corresponding bootstrap sample.
2. Output the final classifier C(x) = (1/B) \sum_{b=1}^{B} C_b(x).
Table 2
AdaBoost algorithm (Freund & Schapire, 1996).
1. Start with w_1(i) = 1/n, i = 1, 2, ..., n.
2. Repeat for b = 1, 2, ..., B:
   (a) Fit the classifier C_b(x) ∈ {−1, 1} using the weights w_b(i) on T_b.
   (b) Compute e_b = \sum_{i=1}^{n} w_b(i) e_b(i) and α_b = ln((1 − e_b)/e_b).
   (c) Update the weights w_{b+1}(i) = w_b(i) exp(α_b e_b(i)) and normalize them.
3. Output the final classifier C(x) = sign(\sum_{b=1}^{B} α_b C_b(x)).
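A minimal sketch of the procedure in Table 2 follows. Because a standard MLP cannot be trained directly on weighted examples, each base classifier is fitted on a sample T_b drawn with probabilities w_b(i), as described in Section 2.2; the scikit-learn MLP base learner, its hyper-parameters, and the {−1, +1} label encoding are assumptions made for the example.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier  # assumed base learner


def fit_adaboost(X, y, B=25, seed=0):
    """AdaBoost as in Table 2; y is assumed to take values in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(n, 1.0 / n)                                    # step 1: w_1(i) = 1/n
    classifiers, alphas = [], []
    for _ in range(B):
        idx = rng.choice(n, size=n, replace=True, p=w)         # training set T_b drawn by weight
        clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000).fit(X[idx], y[idx])
        miss = (clf.predict(X) != y).astype(float)             # e_b(i): 1 if misclassified, else 0
        eps = float(np.sum(w * miss))                          # weighted error e_b
        if eps >= 0.5 or eps == 0.0:                           # stop when no better than chance
            break
        alpha = np.log((1.0 - eps) / eps)                      # importance alpha_b
        w = w * np.exp(alpha * miss)                           # up-weight misclassified examples
        w = w / w.sum()                                        # normalize weights to sum to one
        classifiers.append(clf)
        alphas.append(alpha)
    return classifiers, np.array(alphas)


def predict_adaboost(classifiers, alphas, X):
    """Weighted majority vote: sign of the alpha-weighted sum of the base predictions."""
    votes = np.stack([clf.predict(X) for clf in classifiers])  # shape (B, n_samples)
    return np.sign(alphas @ votes)
```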
Each NN has the same structure of 7 nodes in the input layer and 2 nodes in the output layer, corresponding to the 7 independent variables and the two output classes {bankrupt, non-bankrupt}, respectively. Several experiments were conducted to determine the effect of the number of hidden nodes on the accuracy in predicting the test data set, varying the number of hidden nodes from 5 to 15. The activation functions were selected to be linear in the input layer and sigmoid in the hidden and output layers. The learning algorithm was back-propagation with an adaptive learning rate starting at 0.3 and finishing at 0.01 and a momentum term set to 0.3.

3.2. Boosted classifier

The focus of Boosting is to produce a series of classifiers. The training set used for each member of the series is chosen based on the performance of the earlier classifiers in the series, and classifier generation continues as long as each classifier yields a weighted error of less than 50%, that is, better than chance in the two-class case. The basic framework of the boosted classifier is shown in Fig. 1.

3.3. Bagged classifier

The bagged NNs have the same parameters and structure as the NN classifier. They are used as the base classifiers making up the ensemble, each of which is learned on the training data for that network. The Bagging algorithm is used to generate a single classifier that produces the output of the ensemble by combining the predictions of the multiple NNs. A bag of classifiers is built with an ensemble of 25 NN classifiers for each fold. The basic framework of the bagged classifier is shown in Fig. 2.
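The base NN classifiers of Section 3.1 can be approximated with a configuration such as the one below. scikit-learn's MLPClassifier is an assumption (the paper does not name a toolkit), and its adaptive learning-rate schedule and softmax output layer differ in detail from the sigmoid-output, momentum back-propagation network described above.

```python
from sklearn.neural_network import MLPClassifier  # assumed stand-in for the paper's back-propagation MLP


def make_base_nn(hidden_nodes=10, seed=0):
    """One 7-h-2 base classifier: 7 inputs, h hidden nodes (varied from 5 to 15), two output classes."""
    return MLPClassifier(
        hidden_layer_sizes=(hidden_nodes,),
        activation="logistic",        # sigmoid hidden units
        solver="sgd",
        learning_rate="adaptive",     # adaptive learning rate, starting at 0.3 in the paper
        learning_rate_init=0.3,
        momentum=0.3,                 # momentum term used in the paper
        max_iter=1000,
        random_state=seed,
    )
```

Twenty-five such networks would then be trained on bootstrap replicates for the bagged classifier (Section 3.3) or generated sequentially with reweighted samples for the boosted classifier (Section 3.2).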
4. Experimental design

The data used in this study were obtained from a commercial bank in Korea. The data set contains 1458 externally audited manufacturing firms; half of them went bankrupt during 2002–2005, while the healthy firms were selected from companies active at the end of 2005. Initially, 32 financial ratios categorized as profitability, debt coverage, leverage, capital structure, liquidity, activity, and size were investigated through a literature review and basic statistical methods. Finally, the 7 financial ratios with the highest accuracy ratio (AR) were selected; the AR is a single number indicating the discriminating power of a given model (or variable) based on cumulative accuracy profiles (CAP). The accuracy ratio is computed as the ratio of the area between the actual model and the random model to the area between the perfect model and the random model, as shown in Fig. 3. Thus, the better the prediction model, the closer the AR is to 100%. The accuracy ratios of the financial ratios are listed in Table 3. Although less directly related to model predictive power, the potential presence of multicollinearity is an important checkpoint for the model. Variance inflation factors (VIF) among the 7 selected financial ratios were computed to check for multicollinearity. Table 4 shows that the estimated VIF values are below the threshold levels of 4 and 10 that are commonly used in VIF analysis when testing for the presence of multicollinearity. The findings indicate that the model variables do not present any substantial multicollinearity.

5. Experimental results

We repeated 10-fold cross-validation five times with different random seeds, as in Opitz and Maclin (1999), in order to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross-validation, the entire data set (1458 firms) is first partitioned into 10 equal-sized sets, and each set is then used in turn as the test set while the classifier trains on the other nine sets. That is, the cross-validation folds were run independently for each algorithm.
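The accuracy ratio described above can be computed from a CAP curve roughly as follows; the score convention (higher score means riskier firm) and the 0/1 bankruptcy indicator are assumptions made for this sketch.

```python
import numpy as np


def accuracy_ratio(scores, bankrupt):
    """AR from a cumulative accuracy profile: (area between model and random CAP)
    divided by (area between perfect and random CAP)."""
    scores = np.asarray(scores, dtype=float)
    bankrupt = np.asarray(bankrupt, dtype=float)
    order = np.argsort(-scores)                                # exclude the riskiest firms first
    y = np.concatenate(([0.0], np.cumsum(bankrupt[order]) / bankrupt.sum()))
    x = np.concatenate(([0.0], np.arange(1, len(scores) + 1) / len(scores)))
    area_model = np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x))   # trapezoidal area under the model CAP
    area_random = 0.5                                          # diagonal CAP of the random model
    area_perfect = 1.0 - bankrupt.mean() / 2.0                 # perfect model excludes all defaulters first
    return (area_model - area_random) / (area_perfect - area_random)
```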
Fig. 1. The basic framework of the boosted classifier.
Fig. 2. The basic framework of the bagged classifier: bootstrap data sets 1–25 drawn from the original training data each train one network, and the network outputs C1–C25 are combined.
Fig. 3. Cumulative accuracy profile and accuracy ratio (x-axis: percent of sample excluded; y-axis: percent of defaulters excluded; curves for the perfect model, the model under consideration, and the random model; Accuracy Ratio = B / [A + B]).
In this way we obtained results for the three classifiers on each of the 50 experiments. Table 5 reports the average prediction accuracy of each classifier. For all 10 topologies, the bagged and boosted classifiers consistently produce more accurate results than the NN. A t-test is used to examine whether the average prediction performance of the three classifiers over the 50 folds differs significantly. The results of the t-test indicate that the bagged classifier outperforms the boosted classifier and the NN classifier at the 1% statistical significance level, and that the boosted classifier in turn outperforms the NN classifier at the 1% statistical significance level. This implies that the two ensemble methods can be effective tools for improving the performance of NNs in the bankruptcy prediction domain. The bagged NN shows better results and more stable learning ability than the other classifiers, while the boosted NN yields a relatively small improvement compared with the bagged classifier. One major reason is that the boosted NN may have lost the opportunity to improve its performance because of the constraint that new classifier generation depends on the performance of the earlier classifiers, whereas Bagging imposes no
such constraint, because the resampling of the training set does not depend on the performance of the earlier classifiers. In fact, we observed that most of the boosted NNs over the 50 folds have fewer than 10 base classifiers. Another possible reason is that the NN used as the base classifier can be an obstacle to generating new classifiers, because its error on the testing set prematurely converges to about 50% while the boosted NN is being learned. Table 5 also shows that the ensembles with NNs are useful for coping with the overfitting problem of NNs. The NN shows worse results on the testing set than on the training set; the average difference in accuracy is 3.78%, while those of the two ensemble methods are reduced to 0.60% and 0.50%, respectively. This means that the two proposed methods are capable of reducing this accuracy gap (or generalization error) because of their robustness to overfitting. Table 6 reports the generalization error, Type I error, and Type II error of the three classifiers on the testing set. There is a reduction of 8.60% in the NN test error of 28.94% by learning the boosted NN, and a reduction of 16.97% by learning the bagged NN. Type I error is the rate of misclassifying bankrupt firms as non-bankrupt firms, while Type II error is the rate of misclassifying non-bankrupt firms as bankrupt firms.
Table 3
The accuracy ratio (AR) of each financial ratio.

Category            Variable                                          AR (%)
Profitability       Ordinary income to total assets*                  52.1
                    Net income to total assets (ROA)                  45.7
                    Financial expenses to sales                       49.2
                    Financial expenses to total debt                  48.5
                    Net financing cost to sales                       50.3
                    Ordinary income to sales                          45.7
                    Net income to sales                               49.9
                    Ordinary income to capital                        48.2
                    Net income to capital                             47.5
Debt coverage       EBITDA to interest expenses*                      53.2
                    EBIT to interest expenses                         49.2
                    Cash operating income to interest expenses        48.5
                    Cash operating income to total debt               47.8
                    Cash flow after interest payment to total debt    52.7
                    Cash flow after interest payment to total debt    51.9
                    Debt repayment coefficient                        50.8
                    Borrowings to interest expenses                   52.4
Leverage            Total debt to total assets*                       52.4
                    Capital to total assets                           51.9
                    Current assets to total assets                    51.3
Capital structure   Retained earnings to total assets*                53.6
                    Retained earnings to total debt                   51.6
                    Retained earnings to current assets               50.8
Liquidity           Cash ratio*                                       46.5
                    Quick ratio                                       45.7
                    Current assets to current liabilities             43.2
Activity            Inventory to sales*                               31.5
                    Current liabilities to sales                      28.5
                    Accounts receivable to sales                      27.0
Size                Total assets*                                     25.2
                    Sales                                             22.4
                    Fixed assets                                      24.6

* The 7 financial ratios with the highest AR in each category.

Table 4
Variance inflation factors.

Variable                              VIF
Ordinary income to total assets       1.66
EBITDA to interest expenses           2.31
Total debt to total assets            1.97
Retained earnings to total assets     2.74
Cash ratio                            1.64
Inventory to sales                    1.69
Total assets                          1.61
Type I error is regarded as the most important error because of its significant impact. The boosted and bagged NNs decrease Type I error by 76.72% and 73.89%, respectively. In ensemble learning, it is useful to visualize the behavior of an ensemble classifier by plotting the cumulative graph of its margins. The margin of an instance is defined as the difference between the number of correct base classifiers and the maximum among the numbers of base classifiers attached to each of the other class labels, and it thus reflects the certainty of the classification. Hence, if there are two class labels {0, 1} and we fix '1' as the correct label without loss of generality, then the margin m(x) of an instance x is defined as follows:

m(x) = \frac{\#\,\text{of correct base classifiers} - \#\,\text{of incorrect base classifiers}}{\#\,\text{of base classifiers}}

When all base classifiers correctly classify a given instance, the margin of the instance is 1.
Table 5
Comparison of predictive accuracy (%).

                 NN                            Boosted NN                    Bagged NN
Topology    Training  Testing  Dif.      Training  Testing  Dif.      Training  Testing  Dif.
7–5–2         74.71    71.06   3.66        75.58    75.04   0.54        76.46    75.99   0.47
7–6–2         74.81    70.78   4.03        75.29    75.03   0.26        76.51    75.92   0.58
7–7–2         74.73    71.12   3.60        75.51    75.24   0.28        76.56    75.92   0.64
7–8–2         74.84    71.12   3.71        75.95    74.90   1.05        76.56    76.06   0.50
7–9–2         74.81    70.92   3.89        75.92    75.24   0.67        76.34    75.86   0.48
7–10–2        74.82    70.92   3.90        75.54    74.62   0.91        76.43    75.78   0.65
7–11–2        74.84    70.85   3.99        75.76    75.58   0.17        76.49    75.99   0.50
7–12–2        74.80    71.19   3.60        75.66    74.96   0.70        76.51    76.20   0.31
7–13–2        74.84    71.12   3.72        75.87    75.18   0.70        76.53    75.92   0.61
7–14–2        74.84    71.12   3.30        75.96    75.18   0.78        76.37    76.06   0.30
Average       74.80    71.02   3.78        75.70    75.10   0.60        76.47    75.97   0.50
Table 6
Comparison of prediction error rates (%).

                 NN                              Boosted NN                      Bagged NN
Topology    Overall  Type I  Type II      Overall  Type I  Type II      Overall  Type I  Type II
7–5–2         28.94   23.05    34.84        24.96   16.61    33.32        24.01   16.88    31.14
7–6–2         29.22   22.91    35.53        24.97   17.97    31.97        24.08   17.30    30.86
7–7–2         28.88   23.05    34.71        24.76   17.15    32.37        24.08   17.16    31.00
7–8–2         28.88   23.18    34.57        25.10   17.70    32.50        23.94   17.16    30.72
7–9–2         29.08   23.32    34.84        24.76   16.46    33.05        24.14   17.29    31.00
7–10–2        29.08   23.46    34.71        25.38   20.16    30.58        24.22   17.57    30.86
7–11–2        29.15   23.59    34.71        24.42   19.34    29.49        24.01   17.43    30.59
7–12–2        28.81   23.46    34.16        25.04   17.97    32.09        23.80   17.16    30.45
7–13–2        28.88   23.59    34.16        24.82   15.92    33.73        24.08   17.16    31.00
7–14–2        28.46   23.59    33.33        24.83   19.61    30.03        23.94   17.15    30.73
Average       28.94   23.32    34.55        24.90   17.89    31.91        24.03   17.23    30.83
Fig. 4. Cumulative distribution graph of margins with 10 different NN topologies.
If the number of class labels is more than two, to calculate a margin we measure the difference between the number of correct base classifiers and the maximum number of base classifiers among the incorrect class labels. Thus the definition of a margin is generalized as follows:

m(x) = \frac{\#\,\text{of correct base classifiers} - \max_i\,\#\,\text{of base classifiers that predict incorrect class label } i}{\#\,\text{of base classifiers}}
So, if all base classifiers incorrectly predict one particular class label, the margin will be −1. To visually assist human understanding of the behavior of an ensemble classifier, Kuncheva (2004) introduced the cumulative distribution graph of margins. In this graph, the x-axis represents a margin, m, and the y-axis represents the percentage of instances whose margin is less than or equal to m. If all instances are correctly classified, the graph is simply a vertical line at m = 1. Fig. 4 shows the cumulative margin graphs of the bagged and boosted classifiers for the 10 different NN topologies. It is interesting to see that the boosted ensembles form a curve similar to an 'S' (shown as a thin line in Fig. 4), whereas the bagged ensembles form a curve similar to an 'N' (shown as a thick line in Fig. 4). As noted earlier, this is because the sample distribution of a classifier in a boosted ensemble depends on its predecessor's results. Therefore, in terms of sample distribution and the resulting neural network weights, the base classifiers in a boosted ensemble have greater variance among themselves than the base classifiers in a bagged ensemble.
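For reference, the margins and the points of their cumulative distribution graph can be computed as follows; the array layout and label encoding are assumptions made for this sketch.

```python
import numpy as np


def margins(base_predictions, y_true):
    """Two-class margins: (# correct - # incorrect base classifiers) / # base classifiers."""
    base_predictions = np.asarray(base_predictions)  # shape (B, n_instances), predicted labels
    B = base_predictions.shape[0]
    correct = (base_predictions == np.asarray(y_true)).sum(axis=0)
    return (correct - (B - correct)) / B


def cumulative_margin_distribution(m):
    """x: sorted margins; y: fraction of instances with margin <= x (the graph of Kuncheva, 2004)."""
    xs = np.sort(np.asarray(m))
    ys = np.arange(1, len(xs) + 1) / len(xs)
    return xs, ys
```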
6. Conclusion

In this study, two popular ensemble methods, Bagging and Boosting, are applied to bankruptcy prediction to improve the classification performance of NNs. The boosted and bagged NNs consistently show improved predictive accuracy across all 10 different topologies. In particular, the bagged NN produces a more accurate single classifier than the other classifiers. The boosted NN generates a relatively small improvement compared with the bagged classifier, owing to the constraint that new classifier generation depends on the performance of the earlier classifiers. These results suggest the need for a new learning strategy for the boosted NN to produce a more accurate classifier. In this sense, Logit Boost (Friedman, 2001), which fits together the predictions of the classifiers as an additive model using a maximum likelihood criterion, can be applied as an alternative mechanism to resolve this problem. The boosted and bagged NNs also alleviate the overfitting problem and thus achieve reductions of 8.60% and 16.97% in the test error compared with the NN. In particular, they decrease Type I error by 76.72% and 73.89%, respectively. These results mean that the two proposed ensemble methods can be effective tools for improving the performance of NNs in the bankruptcy domain. However, several issues remain to be addressed by further research. First, this research has not addressed many important tasks, such as the effect of the interdependence of the combined classifiers on joint accuracy or the behavior of combination methods in the presence of noisy data (Opitz & Maclin, 1999). Further investigation of these tasks would facilitate the vigorous combination of ensemble methods and standard classifiers. Second, further improvements of the boosted NN could be achieved by using newly proposed Boosting algorithms such as confidence-rated boosting (Schapire & Singer, 1999), Margin Boost (Mason, Baxter, Bartlett, & Frean, 2000), and Logit Boost (Friedman, 2001).

References
Alfaro, E., Gámez, M., & García, N. (2007). Multiclass corporate failure prediction by AdaBoost.M1. Advanced Economic Research, 13, 301–312.
Alfaro, E., García, N., Gámez, M., & Elizondo, D. (2008). Bankruptcy forecasting: An empirical comparison of AdaBoost and neural networks. Decision Support Systems, 45, 110–122.
Altman, E. L. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance, 23(4), 589–609.
Altman, E. L., Edward, I., Haldeman, R., & Narayanan, P. (1977). A new model to identify bankruptcy risk of corporations. Journal of Banking and Finance, 1, 29–54.
Beaver, W. (1966). Financial ratios as predictors of failure, empirical research in accounting: Selected studies. Journal of Accounting Research, 4(3), 71–111.
Bauer, E., & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36, 105–139.
Breiman, L. (1994). Bagging predictors. Machine Learning, 24(2), 123–140.
Breiman, L. (1996). Bias, variance, and arcing classifiers (Tech. Rep. No. 460). Berkeley: Statistics Department, University of California at Berkeley.
Breiman, L. (1997). Arcing the edge (Tech. Rep. No. 486). Berkeley: Statistics Department, University of California at Berkeley.
Breiman, L. (1998). Arcing classifiers. Annals of Statistics, 26(3), 801–849.
Bryant, S. M. (1997). A case-based reasoning approach to bankruptcy prediction modeling. International Journal of Intelligent Systems in Accounting, Finance and Management, 6(3), 195–214.
Buta, P. (1994). Mining for financial knowledge with CBR. AI Expert, 9(10), 34–41.
Dimitras, A. I., Zanakis, S. H., & Zopounidis, C. (1996). A survey of business failure with an emphasis on prediction methods and industrial applications. European Journal of Operational Research, 90(3), 487–513.
Freund, Y. (1995). Boosting a weak learning algorithm by majority. Information and Computation, 121(2), 256–285.
Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Machine learning: Proceedings of the 13th international conference (pp. 148–156).
Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139.
Friedman, J., Hastie, T., & Tibshirani, R. (1998). Additive logistic regression: A statistical view of boosting (Tech. Rep.). Stanford: Department of Statistics, Stanford University.
Friedman, J. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5), 1189–1232.
Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4(1), 1–58.
Han, I., Chandler, J. S., & Liang, T. P. (1996). The impact of measurement scale and correlation structure on classification performance of inductive learning and statistical methods. Expert Systems with Applications, 10(2), 209–221.
Krogh, A., & Vedelsby, J. (1995). Neural network ensembles, cross validation and active learning. In G. Tesauro, D. S. Touretzky, & T. K. Leen (Eds.), Advances in neural information processing systems (Vol. 7, pp. 231–238). Cambridge, MA: MIT Press.
Kuncheva, L. I. (2004). Combining pattern classifiers: Methods and algorithms. New York: Wiley.
Maclin, R., & Opitz, D. (1997). An empirical evaluation of bagging and boosting. In Proceedings of the 14th national conference on artificial intelligence (pp. 546–551).
Mason, L., Baxter, J., Bartlett, P., & Frean, M. (2000). Functional gradient techniques for combining hypotheses. In A. J. Smola, P. L. Bartlett, B. Schölkopf, & D. Schuurmans (Eds.), Advances in large margin classifiers. Cambridge, MA: MIT Press.
Meyer, P. A., & Pifer, H. (1970). Prediction of bank failures. The Journal of Finance, 25, 853–868.
Odom, M., & Sharda, R. (1990). A neural network for bankruptcy prediction. In Proceedings of the international joint conference on neural networks. San Diego, CA: IEEE Press.
Ohlson, J. (1980). Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting Research, 18(1), 109–131.
Opitz, D., & Maclin, R. (1999). Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, 11, 169–198.
Pantalone, C., & Platt, M. B. (1987). Predicting commercial bank failure since deregulation. New England Economic Review, 37–47.
Perrone, M. E. (1994). Putting it all together: Methods for combining neural networks. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in neural information processing systems (Vol. 6, pp. 1188–1189). San Mateo, CA: Morgan Kaufmann.
Quinlan, J. R. (1996). Bagging, boosting and C4.5. In Machine learning: Proceedings of the 14th international conference (pp. 725–730).
Ravi, P., & Ravi, K. V. (2007). Bankruptcy prediction in banks and firms via statistical and intelligent techniques: A review. European Journal of Operational Research, 180, 1–28.
Schapire, R. E. (1990). The strength of weak learnability. Machine Learning, 5(2), 197–227.
Schapire, R. E. (1999). Theoretical views of boosting. In Computational learning theory: Fourth European conference, EuroCOLT (pp. 1–10).
Schapire, R. E., Freund, Y., Bartlett, P., & Lee, W. S. (1997). Boosting the margin: A new explanation for the effectiveness of voting methods. In Machine learning: Proceedings of the 14th international conference (pp. 322–330).
Schapire, R. E., & Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3), 297–336.
Shaw, M., & Gentry, J. (1998). Using an expert system with inductive learning to evaluate business loans. Financial Management, 17(3), 45–56.
Zmijewski, M. E. (1984). Methodological issues related to the estimation of financial distress prediction models. Journal of Accounting Research, 22(1), 59–82.