Online Forecasting of Stock Market Movement Direction Using the Improved Incremental Algorithm

Dalton Lunga and Tshilidzi Marwala

University of the Witwatersrand, School of Electrical and Information Engineering, Private Bag 3, Wits 2050, Johannesburg, South Africa
{d.lunga, t.marwala}@ee.wits.ac.za
http://www.ee.wits.ac.za/∼marwala
Abstract. In this paper we present a particular implementation of the Learn++ algorithm: we investigate the predictability of financial movement direction with Learn++ by forecasting the daily movement direction of the Dow Jones. The Learn++ algorithm is derived from the AdaBoost algorithm, which works by sub-sampling the training data. The goal of concept learning, according to the probably approximately correct (PAC) weak learning model, is to generate from a set of examples a description of another function, called the hypothesis, that is close to the target concept. The hypotheses derived from weak learning are boosted to provide a better composite hypothesis for generalizing the final classification boundary. The framework is implemented using a multi-layer perceptron (MLP) as the weak learner. First, a weak learning algorithm, which tries to learn a class concept with a single-input perceptron, is established. The Learn++ algorithm is then applied to improve the weak MLP's learning capacity, introducing the concept of online incremental learning. The proposed framework is able to adapt as new data are introduced and to classify the direction of market movement.
1 Introduction
The financial market is a complex, evolutionary, and non-linear dynamical system. The field of financial forecasting is characterized by data intensity, noise, non-stationarity, an unstructured nature, a high degree of uncertainty, and hidden relationships [1]. Many factors interact in finance, including political events, general economic conditions, and traders' expectations; predicting market price movements is therefore quite difficult. Yet according to a growing body of academic investigation, movements in market prices are not random. Rather, they behave in a highly non-linear and dynamic manner. The standard random walk assumption about future prices may merely be a veil of randomness that shrouds a noisy non-linear process [2]. Incremental learning is a natural solution in such scenarios; it can be defined as the process of extracting new information from an additional dataset that later becomes available, without losing prior
knowledge. Various definitions and interpretations of incremental learning can be found in the literature, including online learning [3], relearning of previously misclassified instances, and growing and pruning of classifier architectures [4]. An algorithm possesses incremental learning capabilities if it meets the following criteria:

– Ability to acquire additional knowledge when new stock data are introduced.
– Ability to retain previously learned information about the stock closing prices.
– Ability to learn new classes of stock data if they are introduced by new data.

Some applications of online classification problems have been reported recently [5]. In most cases, the degree of accuracy and the acceptability of a classification are measured by the error on misclassified instances. Although Learn++ has mostly been applied to classification problems, we show in this paper that the Learn++ algorithm can boost a weak learning model to classify stock closing values with minimum error and reduced training time. For practitioners in the financial markets, forecasting methods based on minimizing forecast error may not be adequate to meet their objectives. In other words, trading driven by a forecast with a small forecast error may not be as profitable as trading guided by an accurate prediction of the direction of movement. The main goal of this study is to explore the predictability of financial market movement direction using an ensemble of classifiers implemented with the Learn++ algorithm. This paper discusses ensemble systems, introduces the basic theory of incremental learning and the Learn++ algorithm, and presents the experimental scheme as well as the results obtained.
2 Ensemble of Classifiers
Ensemble systems have attracted a great deal of attention over the last decade due to their empirical success over single-classifier systems on a variety of applications. Such systems combine an ensemble of generally weak classifiers to take advantage of the so-called instability of the weak classifier, which causes the classifiers to construct sufficiently different decision boundaries for minor modifications of their training parameters; as a result, each classifier makes different errors on any given instance. A strategic combination of these classifiers, such as weighted majority voting [6], then eliminates the individual errors, generating a strong classifier. A rich collection of algorithms has been developed using multiple classifiers, such as AdaBoost [7], with the general goal of improving the generalization performance of the classification system. Using multiple classifiers for incremental learning, however, has remained largely unexplored. Learn++, in part inspired by AdaBoost, was developed in response to recognizing the potential feasibility of ensembles of classifiers for solving the incremental learning problem. Learn++ was initially introduced in [8] as an incremental learning algorithm for MLP-type networks. A more versatile form of the algorithm was presented in [9] for all supervised classifiers. We have recently recognized that the
inherent voting mechanism of the algorithm can also be used to effectively determine the confidence of the classification system in its own decisions. In this work, we describe the Learn++ algorithm, along with representative results on incremental learning and confidence estimation obtained by applying the algorithm to predict the direction of movement of the Dow Jones average indicators.
3 Incremental Learning
An incremental learning algorithm is defined as an algorithm that learns new information from unseen data without requiring access to previously used data [10]. The algorithm must also be able to learn new information from new data while still retaining the knowledge acquired from the original data. Lastly, the algorithm must be able to learn new classes that may be introduced by new data. This type of learning algorithm is sometimes referred to as a 'memoryless' online learning algorithm. Learning new information without requiring access to previously used data, however, raises the 'stability-plasticity dilemma' [11]: a completely stable classifier maintains the knowledge from previously seen data but fails to adjust in order to learn new information, while a completely plastic classifier is capable of learning new data but loses prior knowledge. The problem with the MLP is that it is a stable classifier and is not able to learn new information after it has been trained. Different procedures have been implemented for incremental learning. One procedure for learning new information from additional data involves discarding the existing classifier and training a new classifier on the accumulated data. Other methods, such as pruning of networks, controlled modification of classifier weights, or growing of classifier architectures, are also referred to as incremental learning algorithms; these involve modifying the weights of the classifier using the misclassified instances only. Such algorithms are capable of learning new information; however, they suffer from 'catastrophic forgetting' and require access to old data. Another approach evaluates the current performance of the classifier architecture: if the present architecture does not sufficiently represent the decision boundaries being learned, new decision clusters are generated in response to the new patterns. This approach does not require access to old data and can accommodate new classes; however, its main shortcomings are cluster proliferation and extreme sensitivity to the selection of algorithm parameters. In this paper, Learn++ is implemented for online prediction of stock movement direction using the Dow Jones average indicators. The Learn++ algorithm is summarized in the next section.
4 Learn++
Learn++ is an incremental learning algorithm that uses an ensemble of classifiers combined through weighted majority voting. Learn++ was inspired by a boosting algorithm called adaptive boosting (AdaBoost).
Each classifier is trained on a training subset that is drawn according to a distribution D. The classifiers are trained using a weakLearn algorithm, whose only requirement is that it initially achieve a classification rate of at least 50%. For each database $S_k$ that contains learning examples and their corresponding classes, Learn++ starts by initializing the weights $w$ that define the distribution $D_t$, for $t = 1, \ldots, T$, where $T$ is the number of hypotheses to be generated. Initially the weights are set to be uniform, giving every instance an equal probability of being selected into the first training subset:

$D_1(i) = \frac{1}{m}$  (1)

where $m$ is the number of training examples in database $S_k$. The training data are then divided into a training subset TR and a testing subset TE to ensure the weakLearn capability, with both subsets selected from $S_k$ according to the current distribution. After the subsets have been selected, the weakLearn algorithm is run: the weak learner is trained on $TR_t$, and the resulting hypothesis $h_t$ is tested on both subsets to obtain its error:

$\epsilon_t = \sum_{i:\, h_t(x_i) \neq y_i} D_t(i)$  (2)
The error is required to be less than 1/2; a normalized error $\beta_t$ is then computed as

$\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$  (3)

If the error is greater than 1/2, the hypothesis is discarded, new training and testing subsets are selected according to $D_t$, and another hypothesis is computed. All classifiers generated so far are combined using weighted majority voting to obtain the composite hypothesis $H_t$:

$H_t = \arg\max_{y \in Y} \sum_{t:\, h_t(x) = y} \log \frac{1}{\beta_t}$  (4)
Weighted majority voting gives higher voting weights to hypotheses that perform well on their training and testing subsets. The error of the composite hypothesis is computed as in Eq. 5:

$E_t = \sum_{i:\, H_t(x_i) \neq y_i} D_t(i)$  (5)

If this error is greater than 1/2, the current composite hypothesis is discarded and new training and testing data are selected according to the distribution $D_t$. Otherwise, if the error is less than 1/2, the normalized error of the composite hypothesis is computed as

$B_t = \frac{E_t}{1 - E_t}$  (6)
This normalized error is used in the distribution update rule, where the weights of correctly classified instances are reduced, consequently increasing the relative weights of misclassified instances. This ensures that instances misclassified by the current composite hypothesis have a higher probability of being selected for the subsequent training set. The distribution update rule is given by

$w_{t+1}(i) = w_t(i) \cdot B_t^{[\,H_t(x_i) = y_i\,]}$  (7)

where $[\cdot]$ evaluates to 1 if the predicate holds and to 0 otherwise. Once the $T$ hypotheses have been created for each database, the final hypothesis is computed by combining the composite hypotheses from all $K$ databases using weighted majority voting:

$H_{final} = \arg\max_{y \in Y} \sum_{k=1}^{K} \sum_{t:\, H_t(x) = y} \log \frac{1}{B_t}$  (8)
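To make the training loop concrete, the following is a minimal sketch of one Learn++ training session (Eqs. 1-8) in Python. It assumes scikit-learn's MLPClassifier as a stand-in for weakLearn; the function names (learnpp_session, weighted_majority), the numerical guards, and the merging of TR/TE into a single weighted resample are our simplifications, not part of the original algorithm.

```python
# Minimal sketch of one Learn++ training session (Eqs. 1-8).
# Assumptions: scikit-learn's MLPClassifier stands in for weakLearn,
# labels are -1/+1, and TR/TE are merged into one weighted resample.
import numpy as np
from sklearn.neural_network import MLPClassifier

def weighted_majority(hypotheses, betas, X, classes=(-1, 1)):
    """Eqs. (4)/(8): combine hypotheses, each voting with weight log(1/beta_t)."""
    votes = np.zeros((len(X), len(classes)))
    for h, b in zip(hypotheses, betas):
        pred = h.predict(X)
        for j, c in enumerate(classes):
            votes[pred == c, j] += np.log(1.0 / b)
    return np.array(classes)[votes.argmax(axis=1)]

def learnpp_session(X, y, T=3, seed=0):
    rng = np.random.default_rng(seed)
    m = len(X)
    D = np.ones(m) / m                          # Eq. (1): uniform initial distribution
    hypotheses, betas = [], []
    while len(hypotheses) < T:
        D = D / D.sum()                         # renormalize into a distribution
        idx = rng.choice(m, size=m, p=D)        # draw a training subset according to D
        h = MLPClassifier(hidden_layer_sizes=(15,), max_iter=500).fit(X[idx], y[idx])
        eps = D[h.predict(X) != y].sum()        # Eq. (2): weighted error of h_t
        if eps > 0.5:
            continue                            # weakLearn requirement violated: redraw
        betas.append(max(eps, 1e-10) / (1 - eps))    # Eq. (3), guarded against eps = 0
        hypotheses.append(h)
        H = weighted_majority(hypotheses, betas, X)  # Eq. (4): composite hypothesis H_t
        E = D[H != y].sum()                     # Eq. (5): composite error
        if E > 0.5:
            hypotheses.pop(); betas.pop()       # discard and draw new subsets
            continue
        B = max(E, 1e-10) / (1 - E)             # Eq. (6)
        D[H == y] *= B                          # Eq. (7): down-weight correct instances
    return hypotheses, betas
```

Across the databases S1 through S4, Eq. (8) then corresponds to calling weighted_majority on the union of the hypotheses and normalized errors collected from every session.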
5 Confidence Measurement
An intimately related issue is the confidence of the classifier in its decision, and in particular whether the confidence of the algorithm improves as new data become available. The voting mechanism inherent in Learn++ hints at a practical approach for estimating confidence: decisions made with a vast majority of votes carry higher confidence than those made by a slight majority [12]. We have implemented McIver and Friedl's weighted exponential voting based confidence metric [13] with Learn++:

$C_i(x) = P(y = i \mid x) = \frac{\exp F_i(x)}{\sum_{k=1}^{N} \exp F_k(x)}, \quad 0 \le C_i(x) \le 1$  (9)

where $C_i(x)$ is the confidence assigned to instance $x$ when it is classified as class $i$, $F_i(x)$ is the total vote associated with the $i$th class for instance $x$, and $N$ is the number of classes. The total vote $F_i(x)$ that class $i$ receives for any given instance is computed as

$F_i(x) = \sum_{t=1}^{T} \begin{cases} \log \frac{1}{\beta_t}, & \text{if } h_t(x) = i \\ 0, & \text{otherwise} \end{cases}$  (10)

where $T$ is the total number of hypotheses.
The confidence of the winning class is then taken as the confidence of the algorithm in making the decision with respect to that class. Since $C_i(x)$ lies between 0 and 1, the confidences can be translated into linguistic indicators, as shown in Table 1; these indicators are adopted in interpreting our experimental results. Equations (9) and (10) allow Learn++ to determine its own confidence in any classification it makes. The desired outcome of the confidence analysis is a high confidence on correctly classified instances and a low confidence on misclassified instances, so that low confidence can be used to flag instances that are being misclassified by the algorithm. A second desired outcome is improving confidence on correctly classified instances and declining confidence on misclassified instances as new data become available, so that the incremental learning ability of the algorithm can be further confirmed.
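Under the same assumptions as the Learn++ sketch above, Eqs. (9) and (10) amount to a softmax over the per-class vote totals; the helper name confidence_of is illustrative.

```python
# Sketch of the confidence metric in Eqs. (9)-(10): softmax over class votes.
import numpy as np

def confidence_of(hypotheses, betas, x, classes=(-1, 1)):
    F = np.zeros(len(classes))                   # Eq. (10): total vote per class
    for h, b in zip(hypotheses, betas):
        pred = h.predict(x.reshape(1, -1))[0]
        F[classes.index(pred)] += np.log(1.0 / b)
    C = np.exp(F) / np.exp(F).sum()              # Eq. (9): exponentially weighted votes
    i = int(C.argmax())
    return classes[i], float(C[i])               # winning class and its confidence
```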
Table 1. Confidence estimation representation

Confidence range (%)   Confidence level
90 ≤ C ≤ 100           Very High (VH)
80 ≤ C < 90            High (H)
70 ≤ C < 80            Medium (M)
60 ≤ C < 70            Low (L)
C < 60                 Very Low (VL)
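For interpretation, a confidence value in [0, 1] can be mapped to the linguistic levels of Table 1; the thresholds below follow the table, while the function itself is an illustrative helper of our own.

```python
# Illustrative mapping from a confidence value in [0, 1] to Table 1's levels.
def confidence_label(c: float) -> str:
    pct = 100.0 * c
    if pct >= 90: return "Very High (VH)"
    if pct >= 80: return "High (H)"
    if pct >= 70: return "Medium (M)"
    if pct >= 60: return "Low (L)"
    return "Very Low (VL)"
```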
6 Forecasting Framework

6.1 Experimental Design
In our empirical analysis, we set out to examine the daily changes of the Dow Jones Index. The Dow Jones averages are unique in that they are price weighted rather than market-capitalization weighted. Their component weightings are therefore affected only by changes in the stock prices, in contrast with other indexes whose weightings are affected by both price changes and changes in the number of shares outstanding [14]. When the averages were initially created, their values were calculated by simply adding up the component stock prices and dividing by the number of components. Later, the practice of adjusting the divisor was introduced to smooth out the effects of stock splits and other corporate actions; a toy illustration of this adjustment is sketched below. The Dow Jones Industrial Average measures the composite price performance of 30 highly capitalized stocks trading on the New York Stock Exchange (NYSE), representing a broad cross-section of US industries. Trading in the index has gained unprecedented popularity in major financial markets around the world, and the increasing diversity of financial instruments related to the Dow Jones Index has broadened the dimension of global investment opportunity for both individual and institutional investors. There are two basic reasons for the success of these index trading vehicles. First, they provide an effective means for investors to hedge against potential market risk. Second, they create new profit-making opportunities for market speculators and arbitrageurs. It therefore has profound implications and significance for researchers and practitioners alike to accurately forecast the movement direction of stock prices.
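The divisor adjustment mentioned above can be illustrated with a toy example (the numbers are hypothetical, not from the paper): when a component splits, the divisor is rescaled so that the average is unchanged.

```python
# Toy illustration of a price-weighted average with divisor adjustment.
prices = [120.0, 60.0, 90.0]          # hypothetical component prices
divisor = 3.0
average = sum(prices) / divisor       # 90.0

# One component splits 2-for-1: its price halves, so the divisor is
# rescaled to leave the average unchanged.
prices[0] /= 2                        # 120 -> 60
divisor = sum(prices) / average       # new divisor: 210 / 90 ~ 2.333
assert abs(sum(prices) / divisor - 90.0) < 1e-9
```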
6.2 Model Input Selection
Most previous researchers have employed multivariate inputs, and several studies have examined the cross-sectional relationship between stock indexes and macroeconomic variables. The potential macroeconomic input variables used by forecasting models include the term structure of interest rates (TS), the short-term interest rate (ST), the long-term interest rate (LT), the consumer price index (CPI), industrial production (IP), government consumption (GC), private consumption (PC), gross national product (GNP), and gross domestic product (GDP). Data for these macroeconomic variables were not available for our study; thus, only the closing values of the index were selected as inputs.
A one-step-ahead prediction of the index was performed on a daily basis. The output of this prediction model was used as input to the Learn++ algorithm for classification into the correct category: 1 indicates an increase in the next day's predicted closing value relative to the previous day's closing value, and −1 indicates a decrease. Figure 1 depicts the conceptual model of all processes required for this study. The first prediction model can be written as in Eq. 11:

$CV_t = F(cv_{t-1}, cv_{t-2}, cv_{t-3}, cv_{t-4})$  (11)
where $CV_t$ is the predicted closing value at time $t$ and $cv_{t-i}$ denotes the actual closing value $i$ days earlier, for $i = 1, 2, 3, 4$. The second model takes the output of the first model as its input in predicting the direction of movement of the index. The classification stage can be represented by Eq. 12:

$Direction_t = F(CV_t)$  (12)
where $CV_t$ is the first model's prediction of the fifth day's closing value, given the raw data at times $t-1$ to $t-4$, and $Direction_t$ is a categorical variable indicating the movement direction of the Dow Jones Index at time $t$: if the index at time $t$ is larger than that at time $t-1$, $Direction_t$ is 1; otherwise, $Direction_t$ is −1.
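A minimal sketch of the two-stage scheme in Eqs. (11) and (12) follows, assuming the daily closing values are held in a NumPy array; the helper names are illustrative, not from the paper.

```python
# Sketch of the two-stage data preparation for Eqs. (11)-(12).
import numpy as np

def make_regression_set(close, lag=4):
    """Windows of the last `lag` closing values -> next day's close (Eq. 11)."""
    X = np.array([close[i - lag:i] for i in range(lag, len(close))])
    y = close[lag:]
    return X, y

def direction_labels(predicted_close, close, lag=4):
    """Eq. (12): +1 if the predicted close exceeds the previous day's close, else -1."""
    prev = close[lag - 1:-1]          # close at time t-1 for each target t
    return np.where(predicted_close > prev, 1, -1)
```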
Fig. 1. Proposed model for online stock forecasting
6.3 Experimental Results
The forecasting model described in the preceding sections is estimated and validated on in-sample data. Model estimation and selection are then followed by an empirical evaluation based on the out-of-sample data. At this stage, the relative performance of the model is measured by the classification accuracy of the final hypothesis for all given databases, and the confidence of the algorithm in its own decisions is used to establish the accuracy of the predicted closing-value category. The first experiment implements a one-step-ahead prediction of the next day's stock closing value. After predicting the
next day's closing value, this value is fed into a classification model that indicates the direction of movement of the stock prices. As discussed above, the database consisted of 1476 instances of the Dow Jones average closing value for the period from January 2000 to November 2005; 1000 instances were used for training and the remaining instances were used for validation. The two binary classes are 1, indicating an upward direction of returns in the Dow Jones stock, and −1, indicating a predicted fall/downward direction of movement. Four datasets S1, S2, S3, S4, each containing exactly one quarter of the entire training data, were provided to Learn++ in four training sessions for incremental learning. For each training session k (k = 1, 2, 3, 4), three weak hypotheses were generated by Learn++. Each hypothesis h1, h2, h3 of the kth training session was generated using a training subset TR_t and a testing subset TE_t. The weak learner was a single-hidden-layer MLP with 15 hidden-layer nodes and 1 output node, trained to an MSE goal of 0.1. The test dataset, Validate, consisted of 476 instances reserved for validation purposes. On average, the individual MLP hypotheses performed a little over 50%, improving to over 80% when the hypotheses were combined by weighted majority voting. This improvement demonstrates the performance-improvement property that Learn++ inherits from AdaBoost on a given database. The data distribution and the percentage classification performance are given in Table 2; the performances listed are on the validation data, Validate, following each training session. Table 3 provides a breakdown of correctly classified and misclassified instances falling into each confidence range after each training session, and the trends of the confidence estimates over subsequent training sessions are given in Table 4. The desired outcome on the actual confidences is high to very high confidence on correctly classified instances, and low to very low confidence on misclassified instances. The desired outcome on confidence trends is increasing or steady confidence on correctly classified instances, and decreasing confidence on misclassified instances, as new data are introduced.

Table 2. Training and generalisation performance of Learn++

Database   Class(1)   Class(−1)   Test performance (%)
S1         132        68          72
S2         125        75          82
S3         163        37          85
S4         104        96          86
Validate   143        57          –
The performance shown in Table 2 indicates that the algorithm improves its generalization capacity as new data become available. The improvement is modest, however, since the majority of the new information is already learned in the first training session. Table 3 indicates that the vast majority of correctly classified instances tend to have very high confidence, with continually improved confidence at consecutive training sessions. While a considerable portion of
misclassified instances also had high confidence for this database, the desired general trends of increasing confidence on correctly classified instances and decreasing confidence on misclassified ones were notable and dominant, as shown in Table 4.

Table 3. Confidence results

                         VH    H    M    L    VL
Correctly classified
S1                       96    14   13   6    15
S2                       104   7    22   14   17
S3                       111   11   6    39   3
S4                       101   13   42   4    12
Incorrectly classified
S1                       23    7    13   8    3
S2                       27    0    1    4    3
S3                       21    1    2    2    4
S4                       24    0    2    0    2
Table 4. Confidence trends for Dow Jones

                        Increasing   Steady   Decreasing
Correctly classified    119          8        16
Misclassified           –            –        24
7 Conclusion
In this paper, we study the use of an incremental learning algorithm to predict the direction of financial market movement. As demonstrated in our empirical analysis, Learn++ gives good results in converting the weak learner (an MLP) into a strong learning algorithm, and it is able to assess the confidence of its own decisions. In general, the majority of correctly classified instances had very high confidence estimates, while lower confidence values were associated with misclassified instances; classifications with low confidence can therefore be used as a flag to further evaluate those instances. Furthermore, the algorithm showed increasing confidence in correctly classified instances and decreasing confidence in misclassified instances over subsequent training sessions. This is a comforting outcome, which further indicates that the algorithm can incrementally acquire new and novel information from additional data.
Acknowledgement. This research was fully funded by the National Research Foundation of the Republic of South Africa.
References

1. Carpenter, G., Grossberg, S., Markuzon, N., Reynolds, J., Rosen, D.: Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Transactions on Neural Networks 3 (1992) 698–713
2. McNelis, P.D.: Neural Networks in Finance: Gaining Predictive Edge in the Market. Elsevier Academic Press, Oxford, UK (2005)
3. Freund, Y., Schapire, R.: A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55 (1997) 119–139
4. Bishop, C.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford (1995)
5. Vilakazi, B., Marwala, T., Mautla, R., Moloto, E.: Online bushing condition monitoring using computational intelligence. WSEAS Transactions on Power Systems 1 (2006) 280–287
6. Littlestone, N., Warmuth, M.: The weighted majority algorithm. Information and Computation 108 (1994) 212–261
7. Polikar, R., Byorick, J., Krause, S., Marino, A., Moreton, M.: Learn++: A classifier independent incremental learning algorithm. In: Proceedings of the International Joint Conference on Neural Networks (2002)
8. Polikar, R.: Algorithms for enhancing pattern separability, feature selection and incremental learning with applications to gas sensing electronic nose systems. PhD thesis, Iowa State University, Ames (2000)
9. Freund, Y., Schapire, R.: A short introduction to boosting. Journal of the Japanese Society for Artificial Intelligence 14 (1999) 771–780
10. Polikar, R., Udpa, L., Udpa, S., Honavar, V.: An incremental learning algorithm with confidence estimation for automated identification of NDE signals. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 51 (2004) 990–1001
11. Grossberg, S.: Nonlinear neural networks: principles, mechanisms and architectures. Neural Networks 1 (1988) 17–61
12. Byorick, J., Polikar, R.: Confidence estimation using the incremental learning algorithm, Learn++. Lecture Notes in Computer Science 2714 (2003) 181–188
13. McIver, D., Friedl, M.: Estimating pixel-scale land cover classification confidence using nonparametric machine learning methods. IEEE Transactions on Geoscience and Remote Sensing 39 (2001)
14. Leung, M., Daouk, H., Chen, A.: Forecasting stock indices: a comparison of classification and level estimation models. International Journal of Forecasting 16 (2000) 173–190