Online Forecasting of Stock Market Movement Direction Using the Improved Incremental Algorithm

Dalton Lunga and Tshilidzi Marwala

University of the Witwatersrand, School of Electrical and Information Engineering, Private Bag 3, Wits 2050, Johannesburg, South Africa
{d.lunga, t.marwala}@ee.wits.ac.za
http://www.ee.wits.ac.za/∼marwala
Abstract. In this paper we present a particular implementation of the Learn++ algorithm: we investigate the predictability of financial movement direction with Learn++ by forecasting the daily movement direction of the Dow Jones. The Learn++ algorithm is derived from the AdaBoost algorithm, which works by sub-sampling the training data. The goal of concept learning, according to the probably approximately correct (PAC) weak learning model, is to generate from a set of examples a description of another function, called the hypothesis, that is close to the target concept. The hypotheses derived from weak learning are boosted to provide a better composite hypothesis for generalizing the final classification boundary. The framework is implemented using a multi-layer perceptron (MLP) as the weak learner. First, a weak learning algorithm, which tries to learn a class concept with a single-input perceptron, is established. The Learn++ algorithm is then applied to improve the weak MLP's learning capacity, introducing the concept of online incremental learning. The proposed framework is able to adapt as new data are introduced and to classify the direction of market movement.
1 Introduction
The financial market is a complex, evolutionary, and non-linear dynamical system. The field of financial forecasting is characterized by data intensity, noise, non-stationarity, an unstructured nature, a high degree of uncertainty, and hidden relationships [1]. Many factors interact in finance, including political events, general economic conditions, and traders' expectations; predicting market price movements is therefore quite difficult. Yet according to a growing body of academic investigation, movements in market prices are not random. Rather, they behave in a highly non-linear and dynamic manner. The standard random walk assumption about future prices may merely be a veil of randomness that shrouds a noisy non-linear process [2]. Incremental learning is a natural solution in such scenarios; it can be defined as the process of extracting new information from an additional dataset that later becomes available, without losing prior
knowledge. Various definitions and interpretations of incremental learning can be found in the literature, including online learning [3], relearning of previously misclassified instances, and growing and pruning of classifier architectures [4]. An algorithm possesses incremental learning capabilities if it meets the following criteria:

– Ability to acquire additional knowledge when new stock data are introduced.
– Ability to retain previously learned information about the stock closing prices.
– Ability to learn new classes of stock data if they are introduced by new data.

Some applications of online classification problems have been reported recently [5]. In most cases, the degree of accuracy and the acceptability of a classification are measured by the error on misclassified instances. Although Learn++ has mostly been applied to classification problems, we show in this paper that the Learn++ algorithm can boost a weak learning model to classify stock closing values with minimum error and reduced training time. For practitioners in the financial markets, forecasting methods based on minimizing forecast error may not be adequate to meet their objectives. In other words, trading driven by a forecast with a small forecast error may not be as profitable as trading guided by an accurate prediction of the direction of movement. The main goal of this study is to explore the predictability of financial market movement direction using an ensemble of classifiers implemented with the Learn++ algorithm. This paper discusses ensemble systems, introduces the basic theory of incremental learning and the Learn++ algorithm, and presents the experimental scheme as well as the results obtained.
2 Ensemble of Classifiers
Ensemble systems have attracted a great deal of attention over the last decade due to their empirical success over single-classifier systems on a variety of applications. Such systems combine an ensemble of generally weak classifiers to take advantage of the so-called instability of the weak classifier, which causes the classifiers to construct sufficiently different decision boundaries for minor modifications of their training parameters; as a result, each classifier makes different errors on any given instance. A strategic combination of these classifiers, such as weighted majority voting [6], then eliminates the individual errors, generating a strong classifier. A rich collection of algorithms has been developed using multiple classifiers, such as AdaBoost [7], with the general goal of improving the generalization performance of the classification system. Using multiple classifiers for incremental learning, however, has remained largely unexplored. Learn++, in part inspired by AdaBoost, was developed in response to recognizing the potential feasibility of ensembles of classifiers for solving the incremental learning problem. Learn++ was initially introduced in [8] as an incremental learning algorithm for MLP-type networks. A more versatile form of the algorithm was presented in [9] for all supervised classifiers. We have recently recognized that the
inherent voting mechanism of the algorithm can also be used to effectively determine the confidence of the classification system in its own decisions. In this work, we describe the Learn++ algorithm, along with representative results on incremental learning and confidence estimation obtained by applying the algorithm to predict the direction of movement of the Dow Jones average indicators.
3 Incremental Learning
An incremental learning algorithm is defined as an algorithm that learns new information from unseen data without requiring access to previously used data [10]. The algorithm must also be able to learn new information from new data while still retaining the knowledge acquired from the original data. Lastly, the algorithm must be able to learn new classes that may be introduced by new data. This type of learning algorithm is sometimes referred to as a 'memoryless' online learning algorithm. Learning new information without requiring access to previously used data, however, raises the 'stability-plasticity dilemma' [11]: a completely stable classifier maintains the knowledge from previously seen data but fails to adjust in order to learn new information, while a completely plastic classifier is capable of learning new data but loses prior knowledge. The problem with the MLP is that it is a stable classifier and is not able to learn new information after it has been trained. Different procedures have been implemented for incremental learning. One procedure for learning new information from additional data involves discarding the existing classifier and training a new classifier on the accumulated data. Other methods, such as pruning of networks, controlled modification of classifier weights, or growing of classifier architectures, are also referred to as incremental learning algorithms; these involve modifying the weights of the classifier using the misclassified instances only. Such algorithms are capable of learning new information; however, they suffer from 'catastrophic forgetting' and require access to old data. Another approach evaluates the current performance of the classifier architecture: if the present architecture does not sufficiently represent the decision boundaries being learned, new decision clusters are generated in response to the new patterns. This approach does not require access to old data and can accommodate new classes; however, its main shortcomings are cluster proliferation and extreme sensitivity to the selection of algorithm parameters. In this paper, Learn++ is implemented for online prediction of stock movement direction using the Dow Jones average indicators. The Learn++ algorithm is summarized in the next section.
4 Learn++
Learn++ is an incremental learning algorithm that uses an ensemble of classifiers combined through weighted majority voting. Learn++ was inspired by a boosting algorithm called adaptive boosting (AdaBoost).
Each classifier is trained on a training subset that is drawn according to a distribution D. The classifiers are trained using a weakLearn algorithm, whose only requirement is that it initially achieve a classification rate of at least 50%. For each database $S_k$ that contains learning examples and their corresponding classes, Learn++ starts by initializing the weights $w$ that define the distribution $D_t$, for $t = 1, \ldots, T$, where $T$ is the number of hypotheses to be generated. Initially the weights are set to be uniform, giving every instance an equal probability of being selected into the first training subset:

$D_1(i) = \frac{1}{m}$  (1)

where $m$ is the number of training examples in database $S_k$. The training data are then divided into a training subset TR and a testing subset TE to ensure the weakLearn capability, with both subsets selected from $S_k$ according to the current distribution. After the subsets have been selected, the weakLearn algorithm is run: the weak learner is trained on $TR_t$, and the resulting hypothesis $h_t$ is tested on both subsets to obtain its error:

$\epsilon_t = \sum_{i:\, h_t(x_i) \neq y_i} D_t(i)$  (2)
The error is required to be less than 1/2; a normalized error $\beta_t$ is then computed as

$\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$  (3)

If the error is greater than 1/2, the hypothesis is discarded, new training and testing subsets are selected according to $D_t$, and another hypothesis is computed. All classifiers generated so far are combined using weighted majority voting to obtain the composite hypothesis $H_t$:

$H_t = \arg\max_{y \in Y} \sum_{t:\, h_t(x) = y} \log \frac{1}{\beta_t}$  (4)
Weighted majority voting gives higher voting weights to hypotheses that perform well on their training and testing subsets. The error of the composite hypothesis is computed as in Eq. 5:

$E_t = \sum_{i:\, H_t(x_i) \neq y_i} D_t(i)$  (5)

If this error is greater than 1/2, the current composite hypothesis is discarded and new training and testing data are selected according to the distribution $D_t$. Otherwise, if the error is less than 1/2, the normalized error of the composite hypothesis is computed as

$B_t = \frac{E_t}{1 - E_t}$  (6)
This normalized error is used in the distribution update rule, where the weights of correctly classified instances are reduced, consequently increasing the relative weights of misclassified instances. This ensures that instances misclassified by the current composite hypothesis have a higher probability of being selected for the subsequent training set. The distribution update rule is given by

$w_{t+1}(i) = w_t(i) \cdot B_t^{[\,H_t(x_i) = y_i\,]}$  (7)

where $[\cdot]$ evaluates to 1 if the predicate holds and to 0 otherwise. Once the $T$ hypotheses have been created for each database, the final hypothesis is computed by combining the composite hypotheses from all $K$ databases using weighted majority voting:

$H_{final} = \arg\max_{y \in Y} \sum_{k=1}^{K} \sum_{t:\, H_t(x) = y} \log \frac{1}{B_t}$  (8)
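To make the training loop concrete, the following is a minimal sketch of one Learn++ training session (Eqs. 1-8) in Python. It assumes scikit-learn's MLPClassifier as a stand-in for weakLearn; the function names (learnpp_session, weighted_majority), the numerical guards, and the merging of TR/TE into a single weighted resample are our simplifications, not part of the original algorithm.

```python
# Minimal sketch of one Learn++ training session (Eqs. 1-8).
# Assumptions: scikit-learn's MLPClassifier stands in for weakLearn,
# labels are -1/+1, and TR/TE are merged into one weighted resample.
import numpy as np
from sklearn.neural_network import MLPClassifier

def weighted_majority(hypotheses, betas, X, classes=(-1, 1)):
    """Eqs. (4)/(8): combine hypotheses, each voting with weight log(1/beta_t)."""
    votes = np.zeros((len(X), len(classes)))
    for h, b in zip(hypotheses, betas):
        pred = h.predict(X)
        for j, c in enumerate(classes):
            votes[pred == c, j] += np.log(1.0 / b)
    return np.array(classes)[votes.argmax(axis=1)]

def learnpp_session(X, y, T=3, seed=0):
    rng = np.random.default_rng(seed)
    m = len(X)
    D = np.ones(m) / m                          # Eq. (1): uniform initial distribution
    hypotheses, betas = [], []
    while len(hypotheses) < T:
        D = D / D.sum()                         # renormalize into a distribution
        idx = rng.choice(m, size=m, p=D)        # draw a training subset according to D
        h = MLPClassifier(hidden_layer_sizes=(15,), max_iter=500).fit(X[idx], y[idx])
        eps = D[h.predict(X) != y].sum()        # Eq. (2): weighted error of h_t
        if eps > 0.5:
            continue                            # weakLearn requirement violated: redraw
        betas.append(max(eps, 1e-10) / (1 - eps))    # Eq. (3), guarded against eps = 0
        hypotheses.append(h)
        H = weighted_majority(hypotheses, betas, X)  # Eq. (4): composite hypothesis H_t
        E = D[H != y].sum()                     # Eq. (5): composite error
        if E > 0.5:
            hypotheses.pop(); betas.pop()       # discard and draw new subsets
            continue
        B = max(E, 1e-10) / (1 - E)             # Eq. (6)
        D[H == y] *= B                          # Eq. (7): down-weight correct instances
    return hypotheses, betas
```

Across the databases S1 through S4, Eq. (8) then corresponds to calling weighted_majority on the union of the hypotheses and normalized errors collected from every session.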
5 Confidence Measurement
An intimately related issue is the confidence of the classifier in its decision, and in particular whether the confidence of the algorithm improves as new data become available. The voting mechanism inherent in Learn++ hints at a practical approach for estimating confidence: decisions made with a vast majority of votes carry higher confidence than those made by a slight majority [12]. We have implemented McIver and Friedl's weighted exponential voting based confidence metric [13] with Learn++:

$C_i(x) = P(y = i \mid x) = \frac{\exp F_i(x)}{\sum_{k=1}^{N} \exp F_k(x)}, \quad 0 \le C_i(x) \le 1$  (9)

where $C_i(x)$ is the confidence assigned to instance $x$ when it is classified as class $i$, $F_i(x)$ is the total vote associated with the $i$th class for instance $x$, and $N$ is the number of classes. The total vote $F_i(x)$ that class $i$ receives for any given instance is computed as

$F_i(x) = \sum_{t=1}^{T} \begin{cases} \log \frac{1}{\beta_t}, & \text{if } h_t(x) = i \\ 0, & \text{otherwise} \end{cases}$  (10)

where $T$ is the total number of hypotheses.
The confidence of the winning class is then taken as the confidence of the algorithm in making the decision with respect to that class. Since $C_i(x)$ lies between 0 and 1, the confidences can be translated into linguistic indicators, as shown in Table 1; these indicators are adopted in interpreting our experimental results. Equations (9) and (10) allow Learn++ to determine its own confidence in any classification it makes. The desired outcome of the confidence analysis is a high confidence on correctly classified instances and a low confidence on misclassified instances, so that low confidence can be used to flag instances that are being misclassified by the algorithm. A second desired outcome is improving confidence on correctly classified instances and declining confidence on misclassified instances as new data become available, so that the incremental learning ability of the algorithm can be further confirmed.
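Under the same assumptions as the Learn++ sketch above, Eqs. (9) and (10) amount to a softmax over the per-class vote totals; the helper name confidence_of is illustrative.

```python
# Sketch of the confidence metric in Eqs. (9)-(10): softmax over class votes.
import numpy as np

def confidence_of(hypotheses, betas, x, classes=(-1, 1)):
    F = np.zeros(len(classes))                   # Eq. (10): total vote per class
    for h, b in zip(hypotheses, betas):
        pred = h.predict(x.reshape(1, -1))[0]
        F[classes.index(pred)] += np.log(1.0 / b)
    C = np.exp(F) / np.exp(F).sum()              # Eq. (9): exponentially weighted votes
    i = int(C.argmax())
    return classes[i], float(C[i])               # winning class and its confidence
```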
Table 1. Confidence estimation representation

Confidence range (%)   Confidence level
90 ≤ C ≤ 100           Very High (VH)
80 ≤ C < 90            High (H)
70 ≤ C < 80            Medium (M)
60 ≤ C < 70            Low (L)
C < 60                 Very Low (VL)
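For interpretation, a confidence value in [0, 1] can be mapped to the linguistic levels of Table 1; the thresholds below follow the table, while the function itself is an illustrative helper of our own.

```python
# Illustrative mapping from a confidence value in [0, 1] to Table 1's levels.
def confidence_label(c: float) -> str:
    pct = 100.0 * c
    if pct >= 90: return "Very High (VH)"
    if pct >= 80: return "High (H)"
    if pct >= 70: return "Medium (M)"
    if pct >= 60: return "Low (L)"
    return "Very Low (VL)"
```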
6 Forecasting Framework

6.1 Experimental Design
In our empirical analysis, we set out to examine the daily changes of the Dow Jones Index. The Dow Jones averages are unique in that they are price weighted rather than market-capitalization weighted. Their component weightings are therefore affected only by changes in the stock prices, in contrast with other indexes whose weightings are affected by both price changes and changes in the number of shares outstanding [14]. When the averages were initially created, their values were calculated by simply adding up the component stock prices and dividing by the number of components. Later, the practice of adjusting the divisor was introduced to smooth out the effects of stock splits and other corporate actions; a toy illustration of this adjustment is sketched below. The Dow Jones Industrial Average measures the composite price performance of 30 highly capitalized stocks trading on the New York Stock Exchange (NYSE), representing a broad cross-section of US industries. Trading in the index has gained unprecedented popularity in major financial markets around the world, and the increasing diversity of financial instruments related to the Dow Jones Index has broadened the dimension of global investment opportunity for both individual and institutional investors. There are two basic reasons for the success of these index trading vehicles. First, they provide an effective means for investors to hedge against potential market risk. Second, they create new profit-making opportunities for market speculators and arbitrageurs. It therefore has profound implications and significance for researchers and practitioners alike to accurately forecast the movement direction of stock prices.
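The divisor adjustment mentioned above can be illustrated with a toy example (the numbers are hypothetical, not from the paper): when a component splits, the divisor is rescaled so that the average is unchanged.

```python
# Toy illustration of a price-weighted average with divisor adjustment.
prices = [120.0, 60.0, 90.0]          # hypothetical component prices
divisor = 3.0
average = sum(prices) / divisor       # 90.0

# One component splits 2-for-1: its price halves, so the divisor is
# rescaled to leave the average unchanged.
prices[0] /= 2                        # 120 -> 60
divisor = sum(prices) / average       # new divisor: 210 / 90 ~ 2.333
assert abs(sum(prices) / divisor - 90.0) < 1e-9
```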
6.2 Model Input Selection
Most previous researchers have employed multivariate inputs, and several studies have examined the cross-sectional relationship between stock indexes and macroeconomic variables. The potential macroeconomic input variables used by forecasting models include the term structure of interest rates (TS), the short-term interest rate (ST), the long-term interest rate (LT), the consumer price index (CPI), industrial production (IP), government consumption (GC), private consumption (PC), gross national product (GNP), and gross domestic product (GDP). Data for these macroeconomic variables were not available for our study; thus, only the closing values of the index were selected as inputs.
A one-step-ahead prediction of the index was performed on a daily basis. The output of this prediction model was used as input to the Learn++ algorithm for classification into the correct category: 1 indicates an increase in the next day's predicted closing value relative to the previous day's closing value, and −1 indicates a decrease. Figure 1 depicts the conceptual model of all processes required for this study. The first prediction model can be written as in Eq. 11:

$CV_t = F(cv_{t-1}, cv_{t-2}, cv_{t-3}, cv_{t-4})$  (11)
where $CV_t$ is the predicted closing value at time $t$ and $cv_{t-i}$ denotes the actual closing value $i$ days earlier, for $i = 1, 2, 3, 4$. The second model takes the output of the first model as its input in predicting the direction of movement of the index. The classification stage can be represented by Eq. 12:

$Direction_t = F(CV_t)$  (12)
where $CV_t$ is the first model's prediction of the fifth day's closing value, given the raw data at times $t-1$ to $t-4$, and $Direction_t$ is a categorical variable indicating the movement direction of the Dow Jones Index at time $t$: if the index at time $t$ is larger than that at time $t-1$, $Direction_t$ is 1; otherwise, $Direction_t$ is −1.
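A minimal sketch of the two-stage scheme in Eqs. (11) and (12) follows, assuming the daily closing values are held in a NumPy array; the helper names are illustrative, not from the paper.

```python
# Sketch of the two-stage data preparation for Eqs. (11)-(12).
import numpy as np

def make_regression_set(close, lag=4):
    """Windows of the last `lag` closing values -> next day's close (Eq. 11)."""
    X = np.array([close[i - lag:i] for i in range(lag, len(close))])
    y = close[lag:]
    return X, y

def direction_labels(predicted_close, close, lag=4):
    """Eq. (12): +1 if the predicted close exceeds the previous day's close, else -1."""
    prev = close[lag - 1:-1]          # close at time t-1 for each target t
    return np.where(predicted_close > prev, 1, -1)
```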
Fig. 1. Proposed model for online stock forecasting
6.3 Experimental Results
The forecasting model described in the preceding sections is estimated and validated on in-sample data. Model estimation and selection are then followed by an empirical evaluation based on the out-of-sample data. At this stage, the relative performance of the model is measured by the classification accuracy of the final hypothesis for all given databases, and the confidence of the algorithm in its own decisions is used to establish the accuracy of the predicted closing-value category. The first experiment implements a one-step-ahead prediction of the next day's stock closing value. After predicting the
next day's closing value, this value is fed into a classification model that indicates the direction of movement of the stock prices. As discussed above, the database consisted of 1476 instances of the Dow Jones average closing value for the period from January 2000 to November 2005; 1000 instances were used for training and the remaining instances were used for validation. The two binary classes are 1, indicating an upward direction of returns in the Dow Jones stock, and −1, indicating a predicted fall/downward direction of movement. Four datasets S1, S2, S3, S4, each containing exactly one quarter of the entire training data, were provided to Learn++ in four training sessions for incremental learning. For each training session k (k = 1, 2, 3, 4), three weak hypotheses were generated by Learn++. Each hypothesis h1, h2, h3 of the kth training session was generated using a training subset TR_t and a testing subset TE_t. The weak learner was a single-hidden-layer MLP with 15 hidden-layer nodes and 1 output node, trained to an MSE goal of 0.1. The test dataset, Validate, consisted of 476 instances reserved for validation purposes. On average, the individual MLP hypotheses performed a little over 50%, improving to over 80% when the hypotheses were combined by weighted majority voting. This improvement demonstrates the performance-improvement property that Learn++ inherits from AdaBoost on a given database. The data distribution and the percentage classification performance are given in Table 2; the performances listed are on the validation data, Validate, following each training session. Table 3 provides a breakdown of correctly classified and misclassified instances falling into each confidence range after each training session, and the trends of the confidence estimates over subsequent training sessions are given in Table 4. The desired outcome on the actual confidences is high to very high confidence on correctly classified instances, and low to very low confidence on misclassified instances. The desired outcome on confidence trends is increasing or steady confidence on correctly classified instances, and decreasing confidence on misclassified instances, as new data are introduced.

Table 2. Training and generalisation performance of Learn++

Database   Class(1)   Class(−1)   Test performance (%)
S1         132        68          72
S2         125        75          82
S3         163        37          85
S4         104        96          86
Validate   143        57          –
The performance shown in Table 2 indicates that the algorithm improves its generalization capacity as new data become available. The improvement is modest, however, since the majority of the new information is already learned in the first training session. Table 3 indicates that the vast majority of correctly classified instances tend to have very high confidence, with continually improved confidence at consecutive training sessions. While a considerable portion of
misclassified instances also had high confidence for this database, the desired general trends of increasing confidence on correctly classified instances and decreasing confidence on misclassified ones were notable and dominant, as shown in Table 4.

Table 3. Confidence results

                         VH    H    M    L    VL
Correctly classified
S1                       96    14   13   6    15
S2                       104   7    22   14   17
S3                       111   11   6    39   3
S4                       101   13   42   4    12
Incorrectly classified
S1                       23    7    13   8    3
S2                       27    0    1    4    3
S3                       21    1    2    2    4
S4                       24    0    2    0    2
Table 4. Confidence trends for Dow Jones

                        Increasing   Steady   Decreasing
Correctly classified    119          8        16
Misclassified           –            –        24
7 Conclusion
In this paper, we study the use of an incremental learning algorithm to predict the direction of financial market movement. As demonstrated in our empirical analysis, Learn++ gives good results in converting the weak learner (an MLP) into a strong learning algorithm, and it is able to assess the confidence of its own decisions. In general, the majority of correctly classified instances had very high confidence estimates, while lower confidence values were associated with misclassified instances; classifications with low confidence can therefore be used as a flag to further evaluate those instances. Furthermore, the algorithm showed increasing confidence in correctly classified instances and decreasing confidence in misclassified instances over subsequent training sessions. This is a comforting outcome, which further indicates that the algorithm can incrementally acquire new and novel information from additional data.
Acknowledgement. This research was fully funded by the National Research Foundation of the Republic of South Africa.
References

1. Carpenter, G., Grossberg, S., Markuzon, N., Reynolds, J., Rosen, D.: Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Transactions on Neural Networks 3 (1992) 698–713
2. McNelis, P.D.: Neural Networks in Finance: Gaining Predictive Edge in the Market. Elsevier Academic Press, Oxford, UK (2005)
3. Freund, Y., Schapire, R.: A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55 (1997) 119–139
4. Bishop, C.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford (1995)
5. Vilakazi, B., Marwala, T., Mautla, R., Moloto, E.: Online bushing condition monitoring using computational intelligence. WSEAS Transactions on Power Systems 1 (2006) 280–287
6. Littlestone, N., Warmuth, M.: The weighted majority algorithm. Information and Computation 108 (1994) 212–261
7. Polikar, R., Byorick, J., Krause, S., Marino, A., Moreton, M.: Learn++: A classifier independent incremental learning algorithm. In: Proceedings of the International Joint Conference on Neural Networks (2002)
8. Polikar, R.: Algorithms for enhancing pattern separability, feature selection and incremental learning with applications to gas sensing electronic nose systems. PhD thesis, Iowa State University, Ames (2000)
9. Freund, Y., Schapire, R.: A short introduction to boosting. Journal of the Japanese Society for Artificial Intelligence 14 (1999) 771–780
10. Polikar, R., Udpa, L., Udpa, S., Honavar, V.: An incremental learning algorithm with confidence estimation for automated identification of NDE signals. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 51 (2004) 990–1001
11. Grossberg, S.: Nonlinear neural networks: principles, mechanisms and architectures. Neural Networks 1 (1988) 17–61
12. Byorick, J., Polikar, R.: Confidence estimation using the incremental learning algorithm, Learn++. Lecture Notes in Computer Science 2714 (2003) 181–188
13. McIver, D., Friedl, M.: Estimating pixel-scale land cover classification confidence using nonparametric machine learning methods. IEEE Transactions on Geoscience and Remote Sensing 39 (2001)
14. Leung, M., Daouk, H., Chen, A.: Forecasting stock indices: a comparison of classification and level estimation models. International Journal of Forecasting 16 (2000) 173–190