Computers & Industrial Engineering 64 (2013) 425–441


Improved estimation of electricity demand function by using of artificial neural network, principal component analysis and data envelopment analysis

A. Kheirkhah a, A. Azadeh b, M. Saberi c, A. Azaron d,e,*, H. Shakouri b

a Department of Industrial Engineering, University of Bu-Ali Sina, Hamadan, Iran
b Department of Industrial Engineering, College of Engineering, University of Tehran, P.O. Box 11365-4563, Iran
c Young Researchers Club, Islamic Azad University, Tafresh Branch, Tafresh, Iran
d Michael Smurfit Graduate School of Business, University College Dublin, Carysfort Avenue, Blackrock, Co. Dublin, Ireland
e Department of Industrial Engineering, Istanbul Sehir University, Istanbul, Turkey

Article info

Article history: Received 3 November 2010; Received in revised form 25 July 2012; Accepted 19 September 2012; Available online 13 October 2012

Keywords: Neural networks; Time series analysis; Data envelopment analysis; Electricity consumption forecasting

Abstract

Due to various seasonal and monthly changes in electricity consumption and difficulties in modeling it with the conventional methods, a novel algorithm is proposed in this paper. This study presents an approach that uses Artificial Neural Network (ANN), Principal Component Analysis (PCA), Data Envelopment Analysis (DEA) and ANOVA methods to estimate and predict electricity demand for seasonal and monthly changes in electricity consumption. Pre-processing and post-processing techniques from the data mining field are used in the present study. We analyze the impact of data pre-processing and post-processing on the ANN performance, and 680 ANN-MLP models are constructed for this purpose. DEA is used to compare the constructed ANN models as well as the performance of the ANN learning algorithms. The average, minimum, maximum and standard deviation of the mean absolute percentage error (MAPE) of each constructed ANN are used as the DEA inputs. The DEA helps the user to choose an appropriate ANN model as an acceptable forecasting tool. In other words, various error calculation methods are used to find a robust ANN learning algorithm. Moreover, PCA is used as an input selection method, and a preferred time series model is chosen from the linear (ARIMA) and nonlinear models. After selecting the preferred ARIMA model, the McLeod–Li test is applied to determine the nonlinearity condition. Once the nonlinearity condition is satisfied, the preferred nonlinear model is selected and compared with the preferred ARIMA model, and the best time series model is selected. Then, a new algorithm is developed for the time series estimation; in each case an ANN or conventional time series model is selected for the estimation and prediction. To show the applicability and superiority of the proposed ANN-PCA-DEA-ANOVA algorithm, data on Iranian electricity consumption from April 1992 to February 2004 are used. The results show that the proposed algorithm provides an accurate solution for the problem of estimating electricity consumption.

© 2012 Elsevier Ltd. All rights reserved.

1. Significance

The significance of the proposed algorithm is fourfold. First, it is flexible and identifies the preferred estimation model based on the results of MAPE (Mean Absolute Percentage Error), ANOVA and DEA, whereas previous studies consider the best fitted fuzzy regression model based on MAPE or other relative error results. Second, the proposed model may identify either a linear (ARIMA) or a nonlinear time series model; an ARIMA model may be the best choice for the prediction of electricity consumption because of its dynamic structure, whereas previous studies assume that ANN always provides the best estimation. Third, it utilizes Principal Component Analysis (PCA) to define input variables instead of the trial and error method. Fourth,

* Corresponding author. Tel.: +353 1 716 8853. E-mail address: [email protected] (A. Azaron). 0360-8352/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.cie.2012.09.017

the McLeod–Li test is used to consider both the linearity and nonlinearity conditions of the electricity consumption time series data.

2. Introduction

Electricity, as an energy resource, plays an ever-increasing role in the world economy, and its multi-purpose application in production and consumption has gained special attention. Because of the development of societies and the growth of economic activities, electricity is having a greater impact on corporations and their services. Corporations use electricity as a production factor. Moreover, families, directly or indirectly, rely on electricity. Thus, energy consumption determines the welfare of both individuals and the social economy. Recently, intelligent methods have been widely used in the energy field (Azadeh, Ghaderi, & Sohrabkhani, 2007; Azadeh, Ghaderi, Tarverdian, & Saberi, 2007;

Aznarte et al., 2007; Chiang, Urban, & Baldridge, 1996; Gareta, Romeo, & Gil, 2006; Hill, O'Connor, & Remus, 1996; Indro, Jiang, Patuwo, & Zhang, 1999; Jhee & Lee, 1993; Karunasinghe & Liong, 2006; Kohzadi, Boyd, Kermanshahi, & Kaastra, 1996; Nayak, Sudheer, Rangan, & Ramasastri, 2004; Oliveira & Meira, 2006; Stern, 1996; Tang, Almeida, & Fishwick, 1991; Tang & Fishwick, 1993). Models in these studies fall into two main categories: the regression-based models (Azadeh, Ghaderi, Tarverdian et al., 2007; Azadeh et al., 2007; Aznarte et al., 2007; Gareta et al., 2006; Hill, O'Connor, & Remus, 1996; Indro et al., 1999; Kohzadi et al., 1996; Stern, 1996; Tang & Fishwick, 1993; Tang et al., 1991) and the time series-based models (Abdel-Aal, Al-Garni, & Al-Nassar, 1997; Aydinalp, Ismet Ugursal, & Fung, 2002; Azadeh & Ebrahimipour, 2004; Azadeh, Saberi, Gitiforouz, & Saberi, 2009; Azadeh, Saberi, & Seraj, 2010; Azadeh & Tarverdian, 2007; Azadeh, Ghaderi, & Sohrabkhani, 2007; Chiang et al., 1996; Dillon, Sestito, & Leuang, 1991; Enlin, 1995; Gallant & Stephen, 1993; Hsu & Chen, 2003; Jhee & Lee, 1993; Kalogirou, 2000; Kalogirou & Bojic, 2000; Metaxiotis, Kagiannas, Ashounis, & Psarras, 2003; Oliveira & Meira, 2006; Ozturk, Ceylan, Canyurt, & Hepbasli, 2003; Peng, Hubele, & Karady, 1992; Yao, Song, Zhang, & Cheng, 2000; Yu, Choi, & Hui, 2011; Zhu, 1998). There have been many studies of ANN models in different contexts. An ANN is configured for a specific application, such as pattern recognition, function approximation or data classification, in different areas of science. Time series modeling is one of the main applications. Many researchers have shown ANN's comparability and superiority to conventional methods for estimating functions (Azadeh, Ghaderi, & Sohrabkhani, 2007; Azadeh, Ghaderi, Tarverdian et al., 2007; Azadeh et al., 2007; Chiang et al., 1996; Hill et al., 1996; Indro et al., 1999; Jhee & Lee, 1993; Kohzadi et al., 1996; Stern, 1996; Tang & Fishwick, 1993; Tang et al., 1991). The main objective of this paper is to contribute to the use of ANN, conventional time series models, PCA, DEA and data pre-processing methods for time series modeling. ANN and conventional time series models are used for the estimation and prediction. Also, PCA is used for the selection of ANN inputs. Finally, data pre-processing (to make the process covariance stationary) and post-processing (to recover the original scale of the data) methods are used to improve the ANN performance. The literature reveals that a combination of these methods with ANN to model time series has rarely been attempted. Although the data pre-processing concept is considered in some of the literature, the covariance stationarity concept in data pre-processing has been ignored (Aznarte et al., 2007; Gareta et al., 2006; Karunasinghe & Liong, 2006; Nayak et al., 2004; Oliveira & Meira, 2006). Moreover, the selection of input variables in most heuristic methods is either experimental or based on the trial and error method (Al-Saba & El-Amin, 1999; Ghiassi, Saidane, & Zimbra, 2005; Karunasinghe & Liong, 2006; Kim, Oh, Kim, & Do, 2004; Nayak et al., 2004; Palmer, Montano, & Sese, 2006; Zhang & Hu, 1998). In summary, this paper presents an intelligent algorithm to model non-stationary time series data with respect to electricity consumption.
Conventional time series, Artificial Neural Network (ANN), Principal Component Analysis (PCA), Data Envelopment Analysis (DEA) and data pre-processing methods are integrated for forecasting electricity consumption. A unique feature of this study is that it considers the impact of data pre-processing and post-processing on the ANN. Another unique feature of the proposed algorithm is the utilization of PCA to define input variables, unlike conventional methods that use the trial and error method. In other words, using PCA removes the need to consider and assess 4096 × 680 different ANN models. In addition, 680 ANN models are used in the present study. Seventeen general training ANN algorithms are utilized to construct the proposed ANN. Moreover, in order to obtain a more exact comparison over multiple criteria, DEA is utilized.

Criteria such as the average, minimum, maximum and standard deviation of the mean absolute percentage error (MAPE) of each constructed ANN are considered in this study. The superiority of the proposed algorithm is shown by comparing its results against other intelligent tools such as Genetic Algorithm (GA), Artificial Neural Network (ANN), Fuzzy Inference System (FIS), Adaptive Network Based Fuzzy Inference System (ANFIS) and Fuzzy Regression (FR) with respect to MAPE. This paper is organized as follows. In the next section, the literature is reviewed. Then, ANN is described in detail. After introducing conventional time series models, the importance of data pre-processing is explained and different data pre-processing methods are proposed. An algorithm¹ for the estimation and prediction with ANN is developed in Section 5, and the results and concluding remarks are shown in the subsequent sections.

¹ In this algorithm, three basic models are selected, two of which are ANN models and the last is an ARIMA model. The two ANN models are the ANN model with pre-processing (ANNW) and the ANN model without pre-processing (ANNWO). The most suitable ANN model is then compared with the ARIMA model using the Granger–Newbold test. The MAPE error is also used to compare the models with actual data and to estimate the errors of the models.

3. Literature review

Artificial Neural Networks, Genetic Algorithms, Neuro-Fuzzy and Fuzzy Inference Systems are often used in the energy sector as significant tools of Artificial Intelligence (AI) science. High flexibility, good estimation and forecasting capability, and the ability to deal with noisy data are the main reasons that these methods are used in energy estimation and prediction. Some of the AI applications in the estimation of energy demand in various fields are discussed below. Peng et al. (1992) proposed a minimum-distance-based identification of the suitable historical patterns of load and temperature used for training of the ANN. They did not consider the possible integration of ANN with the other candidate methods. Moreover, they gave no reason for selecting one lag in the input time series selection. Dillon et al. (1991) showed that ANNs have the ability to learn and construct a complex nonlinear mapping through a set of input/output examples. They did not compare the results of ANN with the other suitable methods. Moreover, their normalization method is not suitable for ANN, as it can produce zero values while the ANN works with non-zero values. Abductive network machine learning for predicting monthly electric energy consumption in the domestic sector of eastern Saudi Arabia was also proposed by Abdel-Aal et al. (1997). Rahman and Drezga (1992) used a connected ANN consisting of one major and three supporting neural networks to incorporate input variables such as the day of the week, the hour of the day and temperature. Kalogirou (2000) constructed and developed an ANN technique in order to estimate the heating loads of buildings and to forecast the energy consumption of a passive solar building. As the focus of that study is on the suitability of ANN in energy systems, a comparison of its performance with at least traditional statistical methods is vital; this is one of its main drawbacks. Kalogirou and Bojic (2000) proposed an ANN forecasting model in order to predict the energy consumption of a passive solar building. One shortcoming of this study is the small number of training data. Moreover, they compared the ANN results only with dynamic simulation programs. Yao et al. (2000) used a wavelet transformation and neural network technique for electrical load forecasting. The energy consumption of a Canadian residential sector was used as the case study for research into the forecasting ability of a neural network (Aydinalp et al., 2002). Their proposed approach was time-consuming, which is one of the main shortcomings of that study. Hsu and Chen (2003) proposed an ANN model for forecasting

the peak load of Taiwan. This study has two major drawbacks. The first is that they did not provide a scientific method for the time series input selection. The second is that they compared the ANN performance only with traditional regression methods. The application of AI methods in short-term electric load forecasting was discussed by Metaxiotis et al. (2003). Ozturk et al. (2003) proposed a model based on genetic algorithm (GA) in order to estimate Turkey's energy demand. Using a simple GA for the purpose of estimation is questionable, as GA only optimizes functions such as the OLS error function; the GA solution is a semi-optimal solution, while the OLS solution is a general optimization solution. Yu et al. (2011) proposed an extreme learning machine (ELM) and traditional statistical methods to deal with the time-consuming problem of ANN in forecasting. Their case study for forecasting was the fashion industry. They stated that ANN is slow in fashion industry prediction, but they did not provide sufficient evidence for their claim. Sun, Choi, Au, and Yu (2008) proposed ELM as a novel ANN in sales forecasting. They concluded that their proposed method outperforms several other sales forecasting methods. For short-term forecasting, hybrid forecasting methods are useful for areas such as daily electricity consumption. Amjady and Keynia (2011) proposed a new strategy for price prediction. They used a Probabilistic Neural Network (PNN) and a Hybrid Neuro-Evolutionary System (HNES) in their work. They considered the occurrence and value of electricity price spikes in their proposed model. They did not consider various error methods to determine the performance of their proposed method. Kavaklioglu (2011) proposed an electricity prediction model based on Support Vector Regression (SVR). Population, gross national product, imports and exports were used as the inputs for this model. Electricity consumption of Turkey from 1975 to 2006 was used to fit the model and project consumption up to 2026. However, they did not compare the performance of the SVR with other estimation methods. Akay and Atak (2007) used the grey prediction method for the uncertain behavior pattern of electricity consumption in Turkey. They showed that their method is applicable in situations with few data. Zhou, Ang, and Poh (2006) used a trigonometric grey prediction for electricity demand forecasting. A review of the above works shows that all or some of the following have not been taken into consideration in past studies:

• The impact of pre-processing data on ANN modeling performance.
• Using an input selection method instead of trial and error methods.
• Taking both linearity and nonlinearity conditions of time series modeling into account.
• Using the Granger–Newbold test to compare time series models.
• Using the DEA method to compare the performances of ANN learning algorithms.
• Evaluation of forecasting ability (closed loop simulation).

4. Explanation of estimation tools

Artificial Neural Networks (ANNs), conventional time series models and data pre-processing methods are described in the following sub-sections.


4.1. ANN

An Artificial Neural Network is a promising alternative to econometric models. An ANN is an information processing paradigm that is inspired by the way that biological nervous systems, such as the brain, process information. ANNs, like people, learn by example. This heuristic method can be useful for nonlinear processes that have unknown functional forms (Enders, 2004). There have been many studies of ANN models in time series forecasting (Hill

et al., 1996; Kohzadi et al., 1996; Tang & Fishwick, 1993; Tang et al., 1991), but traditional concepts such as covariance stationarity and input selection are ignored. In this study, we show that the use of these concepts improves the performance of an ANN model. Although it has been shown that multilayer feed-forward networks are universal approximators, the superiority of ANN over traditional models is yet to be proven. Hence, an ANN model is selected for the electricity consumption estimation only if it has better forecasting performance than the ARIMA model. Among the different networks, the feed-forward neural networks or multi-layer perceptrons (MLPs) are commonly used in engineering fields. MLP networks are normally arranged in three layers of neurons; the input layer and output layer represent the input and output variables of the model, and between them lie one or more hidden layers, which contain the network's ability to learn nonlinear relationships. Architecture selection is one of the major issues for the empirical results and consists of:

1. Number of input and output variables.
2. Number of hidden layers.
3. Hidden and output activation functions.
4. Learning algorithm.

All of the above issues are open questions today. The number of hidden units is determined by a trial and error process (White, 1989). Too few neurons in the hidden layers can lead to under-fitting, while too many neurons can cause over-fitting; the actual number of neurons required in the hidden layer must be found by trial and error. Moreover, the inputs that are used by the network must have an effect on the value of the output(s). In fact, the input and output variables should be identified carefully in order to enable the network to learn the relationships quickly and to use fewer hidden units.

Fig. 1. Closed and open loop simulation: (a) closed loop simulation; (b) open loop simulation.

Fig. 2. Closed loop simulation with PCA.

The ANN is trained by m samples of the training data set and n new observations, namely validation (or test) data, to verify the so-called generalization capability (Schiffmann, Joost, & Werner, 1992). This validation is usually done via an open loop simulation procedure. In this study, a new aspect of the capability concept is considered to test an ANN's performance, which is based on simulation power. Herein, a closed loop simulation is used at the validation step to test the performance of the constructed network. Fig. 1 illustrates the difference between these two methods. To explain further, suppose that the ANN is trained by a time series y(t), t = 1, ..., m, using a set of lagged samples y(t − i), the ith lag of y(t), i = 1, ..., k; 1 < k ≤ m, as the input (independent) variables. In a closed loop simulation, y(t + j), j ≥ 1, is generated through the network from the previously generated outputs y(t + j − i). In fact, to test the simulation power of an ANN, we use samples m − k + 1 to m and then let the model generate all the succeeding samples, y(t), t = m + 1, ..., m + n. If the network is capable of regenerating the data without any need for correction, it is validated as an acceptable ANN; otherwise, the structure and/or the parameters should be modified. Fig. 2 shows the closed loop simulation when the PCA approach is used for the input selection. In this study, the following algorithm is used. Moreover, it is assumed that all networks have a single hidden layer, because a single hidden layer network is found to be sufficient to model any function (Cybenko, 1989). To find the appropriate number of hidden nodes, networks with one to q nodes in their hidden layer are constructed. The value of q is optional and should be changed if the goal error has not been met.
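To make the closed loop validation concrete, the sketch below feeds a trained network its own outputs recursively. It is a minimal illustration, not the authors' implementation: it assumes a fitted scikit-learn style regressor `net` exposing `predict` and `n_features_in_` (the number of lagged inputs), both of which are our assumptions.

```python
import numpy as np

def closed_loop_simulate(net, history, n_steps):
    """Generate n_steps forecasts recursively from a model trained on k lags.

    net     -- fitted regressor with predict(X); stand-in for the trained MLP
    history -- 1-D array containing at least the k most recent true samples
    n_steps -- number of future samples to generate in closed loop
    """
    k = net.n_features_in_           # number of lagged inputs the net expects
    window = list(history[-k:])      # seed with the last k actual observations
    outputs = []
    for _ in range(n_steps):
        x = np.array(window[-k:][::-1]).reshape(1, -1)   # [y(t-1), ..., y(t-k)]
        y_next = float(net.predict(x)[0])
        outputs.append(y_next)
        window.append(y_next)        # feed the prediction back, no correction
    return np.array(outputs)

def mape(actual, forecast):
    """Mean absolute percentage error, the error measure used in this paper."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs((actual - forecast) / actual)) * 100
```

An open loop simulation would instead build every input vector from the actual test observations, so the gap between the two MAPE values shows how quickly the network's own errors compound.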

4.2. Learning algorithms

Well-known types of back-propagation training algorithms are explained in Table 1 (Gallant & Stephen, 1993).

4.3. Conventional time series models

4.3.1. ARIMA model

The ARIMA model belongs to a family of flexible linear time series models that can be used to model many different types of seasonal as well as non-seasonal time series. The ARIMA model can be expressed as:

$$\Phi_p(L)\,y_t = \theta_q(L)\,\varepsilon_t, \qquad \Phi_p(L) = 1 - \Phi_1 L - \cdots - \Phi_p L^p, \qquad \theta_q(L) = 1 - \theta_1 L - \cdots - \theta_q L^q \tag{1}$$

where L is the back shift operator defined by $L^k y_t = y_{t-k}$ and $\varepsilon_t$ is a sequence of white noises with zero mean and constant variance (in the popular multiplicative seasonal form, the operators also involve powers of $L^s$, where s is the seasonal length). Eq. (1) is often referred to as the ARIMA (p, q) model, in which p and q are the orders of the autoregressive and moving average terms, respectively. Box and Jenkins (1970) proposed a set of effective model building strategies for identification, estimation, diagnostic checking, and forecasting of ARIMA models. In the identification stage, the sample autocorrelation function (ACF) is plotted. A slowly decaying autocorrelation function suggests non-stationary behavior. In such circumstances, Box and Jenkins recommend differencing the data. A common practice is to use a logarithmic transformation if the variance does not appear to be constant. After pre-processing, if needed, the ACF and PACF of the pre-processed data are examined to determine all plausible ARIMA models. A well-estimated model should be parsimonious, fit the data well, have residuals that approximate a white-noise process and produce good out-of-sample forecasts. The Box–Pierce Q-statistic can be used to test whether the residuals approximate a white-noise process; this eliminates from consideration those models whose residuals are not white noise. Then, the parsimony and goodness of fit of the model are checked using the Akaike Information Criterion (AIC) and Schwartz Bayesian Criterion (SBC). Finally, the Granger–Newbold test is used to compare the forecasting performance of the models (Enders, 2004).

4.3.2. Nonlinear time series

Some nonlinear time series patterns were also developed, mainly by Granger and Priestley. One of these nonlinear models is referred to as bilinear; the first-order bilinear model is shown below:

$$X_t = a X_{t-1} + b Z_t + c Z_{t-1} X_{t-1} \tag{2}$$

where $Z_t$ is a stochastic process and a, b and c are the model parameters. It should be noted that only the last term of the above equation is nonlinear. Another type of nonlinear model is the Threshold Auto Regressive (TAR) model, in which the parameters depend on the past values of the process. One example of such models is described below:

$$X_t = \begin{cases} a_1 X_{t-1} + Z_t^{(1)} & \text{if } X_{t-1} < d \\ a_2 X_{t-1} + Z_t^{(2)} & \text{if } X_{t-1} \geq d \end{cases} \tag{3}$$

Furthermore, the proposed algorithm fits the best linear or nonlinear model to the data set. This is quite important, because most studies assume that linear time series such as ARMA provide the best fit.
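The identification–estimation–diagnostic cycle of Section 4.3.1 can be sketched with standard tooling. The snippet below is an illustrative Box–Jenkins style search, not the authors' code; it assumes the statsmodels library and an already pre-processed (stationary) series y.

```python
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

def select_arma(y, max_p=2, max_q=2):
    """Fit candidate ARMA(p, q) models on the stationary series and keep those
    whose residuals pass a portmanteau white-noise check, ranked by AIC/BIC."""
    candidates = []
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            if p == q == 0:
                continue
            fit = ARIMA(y, order=(p, 0, q)).fit()
            # Box-Pierce/Ljung-Box check on the residuals at lag 12
            lb = acorr_ljungbox(fit.resid, lags=[12], boxpierce=True)
            if lb["bp_pvalue"].iloc[0] > 0.05:   # residuals ~ white noise
                candidates.append(((p, q), fit.aic, fit.bic))
    # Parsimony and fit: smaller AIC/BIC is better
    return sorted(candidates, key=lambda c: (c[1], c[2]))
```

Seasonal terms, such as the lag-12 moving average component used later in the case study, can be added through the model's seasonal order in the same way.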


4.4. Data pre-processing

Covariance stationarity is one of the basic assumptions in time series analysis and should be assessed (by definition, an ARIMA model is covariance stationary if it has a finite and time-invariant mean and covariance).


Table 1. ANN learning algorithms.

1. Batch training with weight and bias learning rules (Trainb): trains a network with weight and bias learning rules with batch updates; the weights and biases are updated at the end of an entire pass through the input data.
2. BFGS quasi-Newton back-propagation (Trainbfg): updates weight and bias values according to the BFGS quasi-Newton method.
3. Bayesian regularization back-propagation (Trainbr): updates weight and bias values according to Levenberg–Marquardt optimization. It minimizes a combination of squared errors and weights, and then determines the correct combination so as to produce a network that generalizes well. The process is called Bayesian regularization.
4. Cyclical order incremental training with learning functions (Trainc): trains a network with weight and bias learning rules with incremental updates after each presentation of an input; inputs are presented in cyclic order.
5. Conjugate gradient back-propagation with Powell–Beale restarts (Traincgb): updates weight and bias values according to conjugate gradient back-propagation with Powell–Beale restarts.
6. Conjugate gradient back-propagation with Fletcher–Reeves updates (Traincgf): updates weight and bias values according to conjugate gradient back-propagation with Fletcher–Reeves updates.
7. Conjugate gradient back-propagation with Polak–Ribiere updates (Traincgp): updates weight and bias values according to conjugate gradient back-propagation with Polak–Ribiere updates.
8. Gradient descent back-propagation (Traingd): updates weight and bias values according to gradient descent.
9. Gradient descent with adaptive learning rate back-propagation (Traingda): updates weight and bias values according to gradient descent with adaptive learning rate.
10. Gradient descent with momentum back-propagation (Traingdm): updates weight and bias values according to gradient descent with momentum.
11. Gradient descent with momentum and adaptive learning rate back-propagation (Traingdx): updates weight and bias values according to gradient descent with momentum and an adaptive learning rate.
12. Levenberg–Marquardt back-propagation (Trainlm): updates weight and bias values according to Levenberg–Marquardt optimization.
13. One step secant back-propagation (Trainoss): updates weight and bias values according to the one step secant method.
14. Random order incremental training with learning functions (Trainr): for each epoch, all training vectors (or sequences) are presented once in a different random order, with the network weight and bias values updated after each individual presentation.
15. Resilient back-propagation (Trainrp): updates weight and bias values according to the resilient back-propagation algorithm (Rprop).
16. Sequential order incremental training with learning functions (Trains): trains a network with weight and bias learning rules with sequential updates; the sequence of inputs is presented to the network with updates occurring after each time step.
17. Scaled conjugate gradient back-propagation (Trainscg): updates weight and bias values according to the scaled conjugate gradient method.

Moreover, using pre-processed data is useful in most heuristic methods (Zhang & Qi, 2005). If the process is not covariance stationary, the most suitable pre-processing method should be identified and applied. In forecasting models, a pre-processing method should have the capability of transforming the pre-processed data back to its original scale (called post-processing). So, in time series forecasting methods, an appropriate pre-processing method should have two main properties: it should make the process stationary, and it should have the post-processing capability. In the following, the most useful pre-processing methods are studied. These methods have been reported in the literature and reference books and will be used in their original forms in the case study. In the case study section, the pre-processing method that has the two mentioned properties is finally selected.
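As an illustration of a pre-processing method with both required properties, the pair of helpers below applies the first difference of the logarithm and inverts it exactly. This is a sketch of the idea rather than the authors' code; base-10 logarithms are assumed, matching Eq. (5) in the case study.

```python
import numpy as np

def preprocess(x):
    """First difference of the base-10 logarithm: y(t) = log10(x(t) / x(t-1)).
    Removes an exponential trend, pushing the series toward covariance
    stationarity; the result has one observation fewer than the input."""
    x = np.asarray(x, dtype=float)
    return np.diff(np.log10(x))

def postprocess(y, x_last):
    """Invert the transform: x(t) = x(t-1) * 10**y(t), seeded with the last
    actual raw value x_last, so estimates return to the original scale."""
    out, prev = [], float(x_last)
    for v in y:
        prev = prev * 10.0 ** v
        out.append(prev)
    return np.array(out)
```

Because the transform has an exact inverse, it satisfies the post-processing requirement; plain normalizations rescale the data but do not remove the trend, which is why they are rejected in the case study.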

5. Proposed algorithm

We now develop an algorithm to model the time series process, which is shown in Fig. 3. Moreover, the conceptual framework of the proposed algorithm is depicted in Fig. 4. Figs. 3 and 4 assist readers in following the proposed algorithm. Two ANN models (ANNW and ANNWO) are considered to determine the impact of pre-processing on the ANN models for estimation. A conventional time series estimation method is also considered to study the efficiency of ANN compared with conventional models. The algorithm has the following basic steps:

Step 1: Data collection. Divide the data into two sets, one for estimating the models, called the 'training data' set, and the other for evaluating the validity of the estimated model, called the 'testing data' set. Usually, the training data set contains 70% or 90% of all data and the remaining data are used as the testing data set (Aznarte et al., 2007).

Step 2: Pre-processing data. The stationarity assumption should be studied for both the ANNW model and the ARIMA model. If the process is not covariance stationary, the most suitable pre-processing method should be selected and applied to the model.

Step 3: Input selection. The input variables for each model should be determined. For the ARIMA model, input variables can be selected using the autocorrelation function (ACF) and partial autocorrelation function (PACF), while in most heuristic methods, the selection of input variables is experimental or based on the trial and error method (Al-Saba & El-Amin, 1999; Ghiassi et al., 2005; Karunasinghe & Liong, 2006; Kim et al., 2004; Nayak et al., 2004; Zhang & Hu, 1998; Zhang & Qi, 2005). As there is no specified method for selecting input variables in ANN, the PCA (Principal Component Analysis) approach is proposed to select the input variables in both ANN models. The importance of this approach is understood when one considers the difficulty and inaccuracy of the trial and error method. Irregular input selection produces imprecision, and even if all the previous lag combinations are used, the trial and error method will be time-consuming. For example, if all the combinations are selected from the 12 most recent lags, the number of combinations will be:

$$\sum_{i=1}^{12} \binom{12}{i} = 2^{12} - 1 \tag{4}$$

where $\binom{12}{i}$ counts the different input sets containing i input variables.

Fig. 3. ANN-time series algorithm with non-stationary data.

The PCA approach introduces only a few combinations for the model input in comparison with the trial and error process. PCA is regarded as a method for main feature extraction and can reduce or remove redundant variables from the original input data (Azadeh & Ebrahimipour, 2004; Enlin, 1995; Zhu, 1998). Since the lost information needs to be justified, a reasonable number of principal components must be chosen to avoid poor predicting performance. Generally, the objective of PCA is to identify a new set of variables such that each new variable, called a principal component, is a linear combination of the original variables. The first new variable y1 accounts for the maximum variance in the sample data, the second new variable y2 accounts for the second maximum variance in the sample data, and so on. Moreover, the new variables (principal components) are uncorrelated.

PCA is performed by identifying the eigenstructure of the covariance matrix or the singular value decomposition of the original data. The PCA approach helps us to reduce the number of original variables to a set of new variables. The new set of variables accounts for 85–90% of the total variation; changing this percentage could change the number of new variables.

Step 4: Estimation of models. Running and estimating all models is done in this step. The plausible architecture of the ANN is constructed in this step, and if the error goal (4–5% in the present study) is not met in Step 6, the number of nodes is increased.

Step 5: Post-processing data. Post-processing of the estimated data is done in this step.
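A minimal version of the Step 3 input selection, assuming scikit-learn and a lag matrix built from the 12 most recent lags (our construction, not the authors' code):

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_inputs(y, n_lags=12, var_share=0.90):
    """Build rows [y(t-1), ..., y(t-n_lags)] for each target y(t) and keep
    just enough principal components to explain var_share of the variance."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack(
        [y[n_lags - i - 1: len(y) - i - 1] for i in range(n_lags)])
    target = y[n_lags:]
    pca = PCA(n_components=var_share)  # a fraction selects components by variance
    Z = pca.fit_transform(X)           # uncorrelated new input variables
    return Z, target, pca
```

With the case study data, four components are enough to reach the 90% threshold, which is exactly the reduction reported in Section 6.3.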

Fig. 4. Conceptual framework of the proposed algorithm: raw data are collected and pre-processed; PCA feeds the ANNs, which are compared by DEA (yielding ANN*); the linear and nonlinear time series models are screened by the McLeod–Li test; and the Granger–Newbold test selects the preferred model.

Fig. 5. Procedure of applying DEA on ANN results: for each of ANN1 through ANN17, the test data yield AveMAPE, MinMAPE, MaxMAPE and StdMAPE, which enter DEA to determine the best ANN.

Step 6: Comparing the models. The predictive ability of each model is evaluated in this step. First, the two general ANN models are compared with each other to study the impact of pre-processing on ANN performance and to determine the best ANN. DEA is used to compare the ANN models; Fig. 5 shows the procedure of applying DEA to the ANN results. After the DEA run, either ANNW or ANNWO will be selected as the preferred group. In fact, the impact of data pre-processing on ANN performance is determined by DEA. ANOVA is then used to find

the best structure in the group that has been determined by DEA. The preferred ARIMA models are selected from the plausible models by the Box–Pierce Q-statistic test, the Akaike Information Criterion (AIC) and the Schwartz Bayesian Criterion (SBC). The nonlinearity of the process is determined by the McLeod–Li test. If this test shows the process is nonlinear, then the plausible nonlinear models are constructed and the best nonlinear model is determined. Finally, the best conventional time series model is selected from the best ARIMA model and the nonlinear model with the Granger–Newbold test. The most suitable ANN model is then compared with the best conventional

time series model. Furthermore, the comparison of each model with actual data is done and the error of each model is calculated.
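The Granger–Newbold comparison used in Step 6 can be written directly from two models' one-step forecast errors. The sketch below follows the standard textbook form (e.g., Enders, 2004) rather than the paper's own code:

```python
import numpy as np
from scipy import stats

def granger_newbold(e1, e2):
    """Compare two forecast error series of equal length n. Under equal
    accuracy, x = e1 + e2 and z = e1 - e2 are uncorrelated, so
    r_xz / sqrt((1 - r_xz**2) / (n - 1)) follows a t distribution with
    n - 1 degrees of freedom; the sign of r_xz shows which error series
    has the larger variance."""
    e1, e2 = np.asarray(e1, float), np.asarray(e2, float)
    x, z = e1 + e2, e1 - e2
    n = len(e1)
    r = np.corrcoef(x, z)[0, 1]
    t_stat = r / np.sqrt((1.0 - r ** 2) / (n - 1))
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
    return t_stat, p_value
```

In the case study, the same statistic (2.602 over the 12 test months) is what decides between the best ANN and the best ARIMA model.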


5.1. A note on using DEA in the proposed algorithm

The use of DEA helps the user to find an appropriate ANN model to utilize as an acceptable forecasting tool. In other words, various error calculation methods are used to find the robust ANN learning algorithm. As the learning algorithm is one of the important parameters of an ANN, it is helpful to find an appropriate and preferred learning algorithm in each case study. The performances of some of these algorithms have already been reported in the literature, in terms of their specific use for the purpose of estimating electricity consumption in developing countries. These algorithms, together with the closed loop and open loop assessment methods, should be examined closely by relevant managers in order to ascertain their performance.
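The DEA scoring itself reduces to one small linear program per ANN. The sketch below poses the input-oriented envelopment model with the four MAPE statistics as inputs and a unit output for every ANN; it is our illustration under a constant returns to scale formulation, and a VRS variant would add the convexity constraint sum(lambda) = 1.

```python
import numpy as np
from scipy.optimize import linprog

def dea_input_oriented(X, o):
    """Efficiency of unit o. X is an (n_units, n_inputs) matrix holding
    AveMAPE, MinMAPE, MaxMAPE and StdMAPE per ANN; each unit's output is 1.

    minimize theta  subject to
        sum_j lambda_j * x_ij <= theta * x_io   for every input i
        sum_j lambda_j        >= 1              (unit output constraint)
        lambda_j >= 0
    """
    n, m = X.shape
    c = np.r_[1.0, np.zeros(n)]               # variables: [theta, lambda_1..n]
    A_ub = np.c_[-X[o], X.T]                  # X^T lambda - theta * x_o <= 0
    b_ub = np.zeros(m)
    A_ub = np.vstack([A_ub, np.r_[0.0, -np.ones(n)]])   # -sum(lambda) <= -1
    b_ub = np.append(b_ub, -1.0)
    bounds = [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.fun                            # theta* in (0, 1]; 1 = efficient
```

Scores from this plain model top out at 1; the full rank (super-efficiency) scores above 1 reported in Table 4 require the further step of excluding the evaluated unit from the reference set.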

6. Case study

The proposed algorithm is applied to 130 data points, namely the monthly consumption values from April 1992 to February 2004 in Iran. The data are derived from the Energy Balances of the Islamic Republic of Iran book (2005 version), prepared by the Ministry of Energy, Energy Planning Department. The raw data are shown in Table 2. The flowchart of the algorithm in the pre-processing case and for ANN is depicted in Fig. 6. The flowchart shows only the flow of the algorithm at the end of DEA, and only for an ANN in the pre-processing case, to help readers follow the algorithm.

Table 2. Iranian monthly electricity consumption (kW h) data (the column alignment of the entries from April 1992 to July 1998 is not recoverable from the extraction; the recoverable portion follows).

August-98: 11,913,777; September-98: 11,226,582; October-98: 9,437,329; November-98: 8,962,981; December-98: 9,100,005; January-99: 9,151,585; February-99: 9,252,917; March-99: 9,118,551; April-99: 8,822,002; May-99: 10,307,161; June-99: 11,330,227; July-99: 12,506,292; August-99: 12,879,931; September-99: 12,004,267; October-99: 10,235,825; November-99: 9,754,226; December-99: 9,701,765; January-00: 9,703,783; February-00: 9,901,824; March-00: 9,401,967; April-00: 9,436,829; May-00: 10,729,817; June-00: 12,437,144; July-00: 13,715,487; August-00: 14,094,341; September-00: 13,308,948; October-00: 11,637,033; November-00: 10,331,066; December-00: 10,555,722; January-01: 10,789,612; February-01: 10,606,500; March-01: 10,210,820; April-01: 10,510,355; May-01: 11,833,496; June-01: 13,466,023; July-01: 15,215,272; August-01: 15,405,851; September-01: 14,479,378; October-01: 12,274,190; November-01: 11,332,188; December-01: 11,483,643; January-02: 11,707,494; February-02: 11,485,285; March-02: 11,202,808; April-02: 11,328,070; May-02: 12,920,315; June-02: 14,715,280; July-02: 16,291,332; August-02: 16,944,113; September-02: 16,086,172; October-02: 13,581,415; November-02: 12,435,256; December-02: 12,612,147; January-03: 12,690,363; February-03: 12,676,607.

Fig. 6. Flowchart of the proposed algorithm for ANN and pre-processed data.

6.1. Data collection

The 130 data points are divided into 118 training data and 12 test data. Moreover, the 129 pre-processed data points are divided into 117 training data and 12 test data. To address our problem, the extrapolating ability of the ANN should be calculated; hence the validation data were chosen from the period closest to 2005.


6.2. Pre-processing data


It can be seen in Fig. 7a that the raw data exhibit a trend. As any trend needs to be removed in order to arrive at a more precise estimation, and also to study the impact of pre-processing on the ANN, all pre-processing methods are applied to the data, and finally the best pre-processing method for our data, the one which can make the model covariance stationary, is selected. Examination of Fig. 7b–d shows that the normalization methods are not suitable for pre-processing the data. Although the first difference of the series seems to have a constant mean, its variance is an increasing function of time; therefore, this method is not covariance stationary and cannot be used for data pre-processing (Fig. 7e). It can be seen in Fig. 7f, which shows the first difference of the logarithm of the consumption data, that this method is the most likely candidate to be covariance stationary. Therefore, it is the most applicable pre-processing method for our data.

6.3. Input selection

For both general ANN models, the PCA approach is used to select input variables among 12 lags. Using this approach, the input variables of ANNWO are reduced to 4, so that 90% of the total variation can be described with these new variables.


Fig. 7. Raw data and pre-processed data by different methods, April 1992 to February 2004: (a) raw data (kW h); (b) min–max normalization; (c) z-score normalization; (d) sigmoidal normalization; (e) first difference; (f) first difference of logarithm.

6.4. Estimation of models

In this section, the modeling task is carried out. As each sub-method has some specific parameters, it is important to establish them and finally find the best architecture. As stated in Section 4.1, there are four important parameters in the ANN architecture:

1. Number of input and output variables.
2. Number of hidden layers.
3. Hidden and output activation functions.
4. Learning algorithm.

The first parameter is determined by PCA in the present study. Moreover, the number of hidden layers is set to one, following Cybenko (1989). As the ANN is used for estimation in this paper, the hidden and output activation functions are set to tansig and purelin, respectively. Finally, 17 general training algorithms are considered in the present study. DEA models can be input or output oriented and can be specified as constant returns to scale (CRS) or variable returns to scale (VRS). In the present study, input-oriented DEA is selected according to the type of problem, as only the inputs are important and controllable in this problem; as the ANNs are in a similar situation, CRS is selected too. Different ARIMA models are also selected according to the Box–Jenkins strategy.
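The MATLAB training functions of Table 1 have no one-to-one Python equivalents, but the architecture search itself can be sketched as below. This assumes scikit-learn and mirrors the paper's setup only loosely: tanh hidden units stand in for tansig, the identity output of MLPRegressor for purelin, and a single solver replaces the 17 training algorithms.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def best_hidden_size(X_train, y_train, X_val, y_val, max_nodes=20):
    """Train one-hidden-layer MLPs with 1..max_nodes tanh units and keep the
    size with the lowest validation MAPE."""
    scores = {}
    for h in range(1, max_nodes + 1):
        net = MLPRegressor(hidden_layer_sizes=(h,), activation="tanh",
                           solver="lbfgs", max_iter=2000, random_state=0)
        net.fit(X_train, y_train)
        pred = net.predict(X_val)
        scores[h] = np.mean(np.abs((y_val - pred) / y_val)) * 100
    best = min(scores, key=scores.get)
    return best, scores
```

Repeating this loop per training algorithm, with and without pre-processing, is what yields the 680 models examined below.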

Table 3. Statistical variables related to the ANN-MLP models (MAPE values in %; "without" = without pre-processing and post-processing, "with" = with pre-processing and post-processing).

No. | Learning algorithm | AveMAPE without | AveMAPE with | MinMAPE without | MinMAPE with | MaxMAPE without | MaxMAPE with | StdMAPE without | StdMAPE with
1 | Trainb | 15.82 | 1.72 | 0.15 | 0.40 | 226.87 | 5.85 | 50.27 | 0.99
2 | Trainbfg | 0.09 | 0.14 | 0.05 | 0.08 | 0.24 | 0.25 | 0.04 | 0.10
3 | Trainbr | 0.02 | 0.03 | 0.01 | 0.02 | 0.02 | 0.07 | 0.00 | 0.02
4 | Trainc | 0.05 | 0.39 | 0.02 | 0.35 | 0.08 | 0.53 | 0.01 | 0.52
5 | Traincgb | 0.09 | 0.18 | 0.03 | 0.06 | 0.25 | 0.52 | 0.05 | 0.13
6 | Traincgf | 0.11 | 0.22 | 0.02 | 0.07 | 0.52 | 0.80 | 0.11 | 0.19
7 | Traincgp | 0.11 | 0.32 | 0.03 | 0.05 | 0.28 | 1.91 | 0.07 | 0.33
8 | Traingd | 1.84 | 1.52 | 0.16 | 0.24 | 18.66 | 5.03 | 4.08 | 1.59
9 | Traingda | 0.40 | 0.77 | 0.05 | 0.09 | 1.42 | 2.63 | 0.35 | 0.67
10 | Traingdm | 21.48 | 1.33 | 0.17 | 0.33 | 333.13 | 4.66 | 73.94 | 0.94
11 | Traingdx | 1.95 | 1.12 | 0.15 | 0.06 | 15.76 | 3.82 | 3.54 | 0.89
12 | Trainlm | 0.02 | 0.05 | 0.01 | 0.02 | 0.04 | 0.08 | 0.004 | 0.024
13 | Trainoss | 0.2 | 0.41 | 0.03 | 0.06 | 1.52 | 2.81 | 0.19 | 0.44
14 | Trainr | 0.06 | 0.69 | 0.02 | 0.67 | 0.09 | 0.72 | 0.02 | 0.53
15 | Trainrp | 0.31 | 0.43 | 0.04 | 0.11 | 1.07 | 0.86 | 0.26 | 0.32
16 | Trains | 558.26 | 2.63 | 0.93 | 0.69 | 3479.63 | 5.17 | 765.98 | 1.55
17 | Trainscg | 12.77 | 0.23 | 0.02 | 0.07 | 253.31 | 0.85 | 22.004 | 0.15

6.4.1. Estimation of electricity consumption using ANNW and ANNWO

In order to obtain the best ANN, for each MLP training algorithm (17 general training algorithms), 20 MLP models (1–20 nodes in the hidden layer) are tested to find the best architecture. In fact, 680 ANN-MLP models are constructed and tested: 340 models (17 × 20) with pre-processed data and 340 models without pre-processed data. Table 3 shows the statistical variables for the 34 training algorithm configurations (17 with and 17 without pre-processed data), where:
- MAPEjk is defined as the Mean Absolute Percentage Error of the ANN with j nodes in the hidden layer, with regard to the kth part of the data used as test data.
- AVEMAPEj = Average {MAPEjk: k = 1, 2, ...}, i.e., the average MAPE over all constructed ANNs with j nodes in the hidden layer.
- AVEMAPE = Average (AVEMAPEj).
- StdMAPE = Std (AVEMAPEj).
- MaxMAPE = Max (AVEMAPEj).
- MinMAPE = Min (AVEMAPEj).
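The four DEA inputs for one training algorithm then reduce to simple statistics over the per-architecture MAPE values; a compact sketch with our own naming:

```python
import numpy as np

def mape_statistics(mape_by_nodes):
    """Collapse the MAPE values of all ANNs built for one learning algorithm
    (e.g., the 20 hidden-node variants) into the four DEA input criteria."""
    m = np.asarray(list(mape_by_nodes.values()), dtype=float)
    return {"AveMAPE": m.mean(), "MinMAPE": m.min(),
            "MaxMAPE": m.max(), "StdMAPE": m.std(ddof=1)}
```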

6.4.1.1. Assessing the impact of data pre-processing by using DEA. DEA is a non-parametric method that uses linear programming to calculate the efficiency of a given set of decision-making units (DMUs). The less efficient units and the relative efficiencies are expressed as scores on a scale of 0–1, with the frontier units receiving a score of 1. DEA models can be input or output oriented and can be specified as constant returns to scale (CRS) or variable returns to scale (VRS).


We have utilized the DEA method to compare the ANN models. Since MAPE alone may not be appropriate for the comparison process, we utilize the additional variables AVEMAPE, MinMAPE, MaxMAPE and StdMAPE for each ANN model; these error variables are shown in Table 4. The DEA method effectively takes the values of these variables into account to study the performance of the ANN models. We treated the values of AVEMAPE, MinMAPE, MaxMAPE and StdMAPE for each model as inputs, with the specific output of 1 for each ANN. The calculated full rank efficiency scores, along with the ANN performance ranks, are shown in Table 4. Examination of Table 4 shows that the ANNs with Bayesian regularization back-propagation and Levenberg–Marquardt back-propagation algorithms and pre-processed data are the best. Moreover, examination of Table 4 shows that 12 of the 17 best-ranked learning algorithms use the pre-processing and post-processing methods. Therefore, five learning algorithms have good performance even without using the pre-processed data: Trainbr, trainlm, traingdx, traincgp and traincgb.

6.4.1.2. Finding the best ANN model using ANOVA. A factorial design is used to study the effect of the following factors on the performance of the ANN: training algorithm (Train Alg.), hidden transfer function (Hidden Transfer Fun.) and number of hidden neurons (H. Neurons). The ANOVA table for this experiment is presented in Table 5. The results of the ANOVA show that the training algorithm and the hidden transfer function are the statistically significant factors affecting the MAPE test of the ANN. The only insignificant factor is the number of neurons in the hidden layer (H. Neurons), because of its large p-value. To select the best ANN structure, we can refer to the levels of the significant factors, i.e., training algorithm and hidden transfer function. In Table 6, the average MAPE for each level of these factors is presented. As seen in Table 6, among the training algorithms, trainlm has the lowest MAPE; hence this training algorithm is preferred to train the ANN. Tansig as the transfer function is associated with a lower MAPE in comparison with logsig.

6.4.2. Estimation of electricity consumption using the ARIMA model

To find the best time series model, pre-processed data are used. Fig. 8 shows the ACF and PACF charts. The theoretical ACF of a pure MA(q) process cuts off to zero at lag q, and the theoretical ACF of an AR(1) model decays geometrically. According to this figure, neither of these specifications seems

Table 4. Full rank efficiency of the ANN-MLP models (the suffix "w" denotes an ANN that uses pre-processed data and post-processing, e.g., Trainbw; the suffix "wo" denotes an ANN that uses raw data, e.g., Trainbwo).

Rank | Learning algorithm | Efficiency score
1 | Trainbrw | 1.51
2 | Trainlmw | 1
3 | Traincgfw | 0.79
4 | Trainbrwo | 0.76
5 | Trainscgw | 0.73
6 | Traincw | 0.70
7 | Trainrw | 0.67
8 | Trainlmwo | 0.65
9 | Traincgpw | 0.54
10 | Trainossw | 0.54
11 | Traincgpw | 0.48
12 | Trainrpw | 0.48
13 | Traingdaw | 0.35
14 | Traincgpw | 0.33
15 | Trainbfgw | 0.32
16 | Traincgpwo | 0.30
17 | Traingdxwo | 0.29
18 | Trainosswo | 0.29
19 | Trainscgwo | 0.26
20 | Traincgfwo | 0.25
21 | Trainbfgwo | 0.23
22 | Traingdawo | 0.21
23 | Trainrpwo | 0.16
24 | Trainbw | 0.123
25 | Traingdxw | 0.122
26 | Traingdw | 0.11
27 | Traingdmw | 0.10
28 | Traingdwo | 0.07
29 | Traingdmwo | 0.05
30 | Traincwo | 0.05
31 | Trainbwo | 0.04
32 | Trainrwo | 0.03
33 | Trainswo | 0.02
34 | Trainsw | 0.02

appropriate for the electricity consumption. The ACF does not decay geometrically, and is therefore suggestive of an AR(2) process or a process with both autoregressive and moving average components. Since we have monthly data, we might want to incorporate a seasonal factor at lag 12. Therefore, we consider five models for our training data: AR(1), AR(2), ARIMA(1, 1), ARIMA(1, (1, 12)) and AR(2, MA(12)). Table 7 shows the models' information, including the Q statistics, AIC, SBC and coefficient estimates. The Box–Pierce Q-statistic

test shows that AR(1) and AR(2) should be eliminated. However, as measured by AIC and SBC, ARIMA(1, 1) and AR(2, MA(12)) do not fit the data as well as the ARIMA(1, (1, 12)). The Granger–Newbold test also shows that ARIMA(1, (1, 12)) has the best forecasting performance versus ARIMA(1, 1) and AR(2, MA(12)).

Table 5. ANOVA table for the MAPE test of different ANNs.

Source | DF | SS | MS | F | P-value
Train alg. | 2 | 1076.60 | 538.30 | 257.59 | 0.000
Hidden transfer fun. | 1 | 24.58 | 24.58 | 11.76 | 0.001
H. neurons | 4 | 4.87 | 1.22 | 0.58 | 0.675
Error | 527 | 1101.29 | 2.09 | |
Total | 539 | 2908.79 | | |

Table 7. Characteristics of the ARIMA models, including Q statistics, AIC, SBC and coefficient estimates (significance levels of the Q statistics in parentheses).

 | AR(1) | AR(2) | ARIMA(1, 1) | ARIMA(1, (1, 12)) | AR(2, MA(12))
a0 | 0.009 | 0.008 | 0.009 | 0.003 | 0.008
a1 | 0.201 | 0.235 | 0.616 | 0.235 | 0.122
a2 | | 0.174 | | | 0.076
b1 | | | 0.879 | 0.149 | 
b12 | | | | 0.757 | 0.885
Q(4) | 15.6 (0.00) | 8.94 (0.07) | 4.66 (0.33) | 1.3 (0.85) | 2.02 (0.72)
Q(8) | 25.005 (0.00) | 16.04 (0.05) | 11.642 (0.17) | 7.2 (0.51) | 8.5 (0.41)
Q(12) | 31.35 (0.00) | 21.4 (0.05) | 18.13 (0.11) | 11.5 (0.501) | 12.1 (0.41)
AIC | −623 | −632 | −639 | −648 | −645
SBC | −620 | −625 | −631 | −640 | −642

Table 6. Average MAPE for each level of the significant factors.

Factor | Level | Number of ANNs | MAPE
Train alg. | Trainlm | 180 | 2.72
Train alg. | Traingdx | 180 | 6.15
Train alg. | Trainrp | 180 | 4.00
Hidden transfer fun. | Tansig | 270 | 4.08
Hidden transfer fun. | Logsig | 270 | 4.50
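A factorial ANOVA of this form (training algorithm, hidden transfer function and hidden neurons against MAPE) can be reproduced with statsmodels. The sketch assumes the 540 results sit in a long-format DataFrame with the column names shown, which are ours, not the paper's:

```python
import statsmodels.api as sm
from statsmodels.formula.api import ols

def mape_anova(df):
    """df columns: 'mape', 'train_alg', 'transfer_fun', 'h_neurons'.
    Returns a Table-5 style decomposition with SS, F and p per factor."""
    model = ols("mape ~ C(train_alg) + C(transfer_fun) + C(h_neurons)",
                data=df).fit()
    return sm.stats.anova_lm(model, typ=2)
```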

Table 8. The McLeod–Li test results (Q(n) reports the Ljung–Box Q-statistic for the autocorrelation coefficients of the squared residuals of the estimated model; significance levels in parentheses).

Ljung–Box Q-statistic: Q(4) = 1.4 (0.846); Q(8) = 7.3 (0.5); Q(12) = 11.2 (0.51).

Fig. 8. ACF and PACF charts for the ARIMA model.

Fig. 9. Comparison of the ARIMA monthly electricity consumption estimate with actual data over the test months.

6.4.2.1. The McLeod–Li test. The residuals of ARIMA(1, (1, 12)) are used to run the McLeod–Li test. Examination of Table 8 shows that the nonlinearity condition is not satisfied. Therefore, ARIMA(1, (1, 12)) is selected as the preferred time series model.

6.4.2.2. Nonlinear time series model. As mentioned, the ARIMA model is adequate for our data and there is no need to identify an appropriate nonlinear time series model. The comparison of the ARIMA(1, (1, 12)) output with actual data is shown in Fig. 9.
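The McLeod–Li test amounts to a Ljung–Box portmanteau test applied to the squared residuals of the fitted ARIMA model. A sketch using statsmodels, consistent with the Q(4), Q(8) and Q(12) layout of Table 8 (our illustration, not the authors' code):

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

def mcleod_li(residuals, lags=(4, 8, 12)):
    """Ljung-Box Q-statistics on the squared residuals; small p-values would
    signal remaining nonlinear (ARCH-type) structure in the series."""
    r2 = np.asarray(residuals, dtype=float) ** 2
    lb = acorr_ljungbox(r2, lags=list(lags))
    return dict(zip(lags, zip(lb["lb_stat"], lb["lb_pvalue"])))
```

Large p-values, as in Table 8, mean the nonlinearity condition is not met and the linear ARIMA model is retained.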

Table 9. The impact of data post-processing on ANN performance: the network error before post-processing is about 23 times the error after post-processing (see Section 6.6).

Table 10. Errors (MAPE) of the models.

 | ARIMA model | ANNW model | ANNWO model
MAPE | 0.103 | 0.01 | 0.02

6.5. Post-processing data

Since the data are pre-processed for both the ANNW and ARIMA models, the estimated data obtained by these models should be post-processed. Here, we have:

$$y_t = \log\left(\frac{x_t}{x_{t-1}}\right) \;\Rightarrow\; x_t = x_{t-1}\cdot 10^{y_t} \tag{5}$$

Let Annx_i (i = 1, ..., 12) be the ANN output for the pre-processed test data. Then Annx_i is post-processed by the following formula:

$$x(i) = x(i-1)\cdot 10^{\,\mathrm{Annx}_i} \tag{6}$$

ARIMA's output is post-processed in a similar way.

6.6. Comparing the models

ANNW (trainlmw) and the conventional time series model (ARIMA(1, (1, 12))) are compared using the Granger–Newbold test, with trainlmw considered as model 1 and ARIMA(1, (1, 12)) as model 2. The value of the t-statistic is statistically different from zero (2.602). Since rxz (explained in Appendix A) is positive, trainlmw has better forecasting performance than the ARIMA(1, (1, 12)) model. Table 9 shows the impact of data post-processing on ANN performance. According to Table 9, calculating the ANN performance after post-processing the ANN's output is necessary: avoiding post-processing unexpectedly increases the network error to about 23 times that of the post-processed network output. Table 10 shows that ANNW has the lowest MAPE, which shows the efficiency of ANNW compared with the other models. The comparison of ANNW (the best proposed model) with actual data is shown in Fig. 10.

6.7. Technical note about the number of training data

Rumelhart and McClelland (1986) claimed that becoming trapped in a local minimum is rarely a practical problem for

Fig. 10. Comparison of the ANNW monthly electricity consumption estimate with actual data over the test months.


6.7. Technical note about the number of training data

Rumelhart and McClelland (1986, chap. 8) claimed that becoming trapped in a local minimum is rarely a practical problem for back-propagation learning. Moreover, the 0.01% error shows that the trained neural network has an acceptable performance. Finally, the following formula shows that the available data (130 observations) are sufficient:

$$N \geq \frac{n}{e} \quad (7)$$

where N is the number of learning data, n is the number of hidden neurons and e is the error goal. In the present study, n is 4 and e is 0.04, so N must be greater than 100. Because the number of training data (118) exceeds 100, the data set is sufficient.

6.8. A note on the comparability and superiority of the proposed method

The use of the following tests gives the proposed method comparability and superiority over conventional function-estimation methods:

• The McLeod–Li test, to consider both the linearity and nonlinearity conditions of time series modeling.
• The Granger–Newbold test, to compare time series models.
• The DEA method, to compare ANN models.

Moreover, several experiments were conducted to show the comparability and superiority of the proposed algorithm. Monthly Iranian electricity consumption values from April 1992 to February 2004 are used in these experiments. Typically, the training set contains 70–90% of all data and the remaining data are used as the test set (Aznarte et al., 2007). To improve the assessment of ANN performance, the mean absolute percentage error (MAPE) is extended to four indexes that cover different aspects of the MAPE values, and DEA is used to integrate the four extended indexes, as sketched below.
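The following sketch shows how the four extended MAPE indexes and the DEA comparison could be computed. It is illustrative only: it assumes each candidate network is trained several times (one MAPE per run) and assigns every network a constant dummy output of 1, so that a standard input-oriented CCR model scores the networks on their four MAPE indexes. The helper names and the `scipy` usage are our assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import linprog

def mape_indexes(mape_per_run):
    """The four DEA inputs for one candidate ANN: average, minimum,
    maximum and standard deviation of its MAPE values over runs."""
    m = np.asarray(mape_per_run, dtype=float)
    return np.array([m.mean(), m.min(), m.max(), m.std()])

def ccr_efficiency(X, Y, o):
    """Input-oriented CCR efficiency of DMU o.
    X: (m, n) inputs, Y: (s, n) outputs, n = number of DMUs."""
    m, n = X.shape
    s = Y.shape[0]
    c = np.r_[1.0, np.zeros(n)]              # minimise theta
    A_in = np.c_[-X[:, [o]], X]              # sum(lam * x_j) <= theta * x_o
    A_out = np.c_[np.zeros((s, 1)), -Y]      # sum(lam * y_j) >= y_o
    b = np.r_[np.zeros(m), -Y[:, o]]
    res = linprog(c, A_ub=np.vstack([A_in, A_out]), b_ub=b,
                  bounds=[(0, None)] * (n + 1), method="highs")
    return res.fun                           # efficiency score theta*

# Hypothetical usage: several learning algorithms, five runs each.
# X = np.column_stack([mape_indexes(m) for m in mape_lists])  # (4, n)
# Y = np.ones((1, X.shape[1]))                                # dummy output
# scores = [ccr_efficiency(X, Y, j) for j in range(X.shape[1])]
```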

7. Comparison with other intelligent methods

The results of four intelligent methods are compared with the proposed algorithm in terms of the MAPE value.

First: the Genetic Algorithm (GA) mimics the natural evolution process, in which a population of a species adapts to its environment; here, a population of candidate designs is created and allowed to evolve to fit the design environment under consideration. Several GA studies show the applicability of this method. Azadeh and Tarverdian (2007) presented an integrated algorithm for forecasting monthly electrical energy consumption that uses the genetic algorithm to select the optimum model, which could be time series, GA or simulation-based GA; monthly electricity consumption in Iran was used as the case study. The MAPE value of GA is 0.014, calculated by the method outlined in Appendix B.

Second: Artificial Neural Networks (ANNs) are a promising alternative to econometric models. An ANN is an information-processing paradigm inspired by the way biological nervous systems, such as the brain, process information; like people, ANNs learn by example. Azadeh, Ghaderi, Tarverdian, and Saberi (2007) illustrated an ANN approach based on a supervised Multi-Layer Perceptron (MLP) network for forecasting electrical consumption. To train the ANN, pre-processed data were extracted using time series techniques, and monthly electricity consumption in Iran was collected to train and test the network. The MAPE value of the ANN is 0.0156.

Third: Fuzzy Linear Regression (FR) is an extension of classical regression used to estimate relationships among variables when the available data are limited and imprecise and the variables interact in an uncertain, qualitative and fuzzy way. Azadeh et al. (2010) presented an integrated fuzzy regression and time series approach to estimate and predict electricity demand under seasonal and monthly changes in consumption with non-stationary data. According to their findings, the MAPE value of the best fuzzy regression model is 0.008.

Fourth: Neuro-Fuzzy combines an ANN with a fuzzy system, and therefore has the advantages of both models and is sometimes selected instead of an ANN. Azadeh et al. (2009) presented a hybrid Adaptive Network based Fuzzy Inference System (ANFIS), computer simulation and time series algorithm to estimate and predict electricity consumption; monthly electricity consumption data in Iran were collected to train and test the ANFIS. The MAPE value of ANFIS was reported as 0.015.

In summary, Table 11 shows the MAPE of the Genetic Algorithm (GA), Fuzzy Regression (FR), Artificial Neural Network (ANN) and Adaptive Network based Fuzzy Inference System (ANFIS) versus the proposed algorithm. The table indicates that the proposed algorithm provides a better estimation than GA, FR, ANN and ANFIS.

8. Conclusion

In this paper, an algorithm was developed to improve electricity consumption estimation. The intelligent ANN–PCA–DEA algorithm was built with different data pre-processing methods, and its efficiency was examined on Iranian electricity consumption. DEA and the Granger–Newbold test were used to show the efficiency of the ANN; the calculation of the ANN performance is based on its closed and open simulation abilities. DEA was used to find an ANN learning algorithm that is robust with respect to different error calculation methods, integrating several error measures for assessing the learning algorithms in electricity consumption estimation; DEA was used for long-term assessment and the Granger–Newbold test for on-time comparison. Moreover, a non-covariance-stationary process was converted to a covariance-stationary one by a suitable data pre-processing (data mining) method, and the PCA method was then used for input selection. Three time series models were compared using DEA and the Granger–Newbold test. The DEA results showed the positive impact of data pre-processing on the ANN performance as well as on the performance of the ANN learning algorithms, and the Granger–Newbold test showed that the ANN performs better than ARIMA. We also showed that data pre-processing must be followed by data post-processing. In summary, the following features of the proposed algorithm have not been considered in previous studies:

(1) The impact of data pre-processing on the ANN.
(2) The use of PCA for input selection, versus the trial and error method of previous studies.
(3) The use of the McLeod–Li test to consider both the linearity and nonlinearity conditions of time series modeling.
(4) The use of the Granger–Newbold test to compare time series models.
(5) The use of the DEA method to compare the ANN learning algorithms.

Table 11
The MAPE estimation of the proposed algorithm versus GA, fuzzy regression, ANN and ANFIS.

Method                                          | MAPE (%)
Genetic algorithm                               | 0.14
Fuzzy regression                                | 0.082
Artificial neural network                       | 0.156
Adaptive network based fuzzy inference system   | 0.155
The proposed algorithm (trainbrw)               | 0.01


Table 12
The features of the ANN-PCA-DEA algorithm versus other methods (ANN, ANFIS, fuzzy regression, genetic algorithm, decision tree).

Feature                                   | The ANN-PCA-DEA algorithm
Multiple inputs                           | √
Data complexity and non-linearity         | √
Intelligent modeling and forecasting      | √
High precision and reliability            | √
Data pre-processing and post-processing   | √
Systematic input selection                | √
Multiple criteria model assessment        | √
Fuzzy data modeling                       | –

Among the compared methods, only the ANN-PCA-DEA algorithm provides data pre-processing and post-processing, systematic input selection and multiple criteria model assessment.

The ANN-PCA-DEA algorithm is also compared with the previous models to show its advantages (Table 12). Future research may concentrate on reducing the complexity and running time of the proposed algorithm. Moreover, to extend the proposed model, the seasonal changes can be removed from the data and the result compared with the two ANN models presented in this paper. The impact of data pre-processing and post-processing on other methods, such as Fuzzy Regression, the Genetic Algorithm (GA), Neuro-Fuzzy and the Memetic Algorithm, can also be studied.

Acknowledgments

This study was supported by a grant from the University of Tehran (Grant No. 8106013/1/09) and by a grant from the Iran National Science Foundation (INSF), Grant No. 91000659.

Appendix A. The Granger–Newbold test

Granger and Newbold (1974) proposed this test to compare two time series models; it overcomes the problem of contemporaneously correlated forecast errors. Let $e_{1t}$ and $e_{2t}$ be the forecast errors of models 1 and 2, respectively, and form $x_t = e_{1t} + e_{2t}$ and $z_t = e_{1t} - e_{2t}$. Let $r_{xz}$ denote the sample correlation coefficient between $\{x_t\}$ and $\{z_t\}$. If $r_{xz}$ is statistically different from zero, then model 1 has the larger MSE when $r_{xz}$ is positive, and model 2 has the larger MSE when $r_{xz}$ is negative.
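A minimal Python sketch of this test is given below. The t-statistic uses the usual form $r_{xz}\sqrt{(n-1)/(1-r_{xz}^2)}$ with $n-1$ degrees of freedom, which we take as an assumption consistent with the description above; the function name is illustrative.

```python
import numpy as np

def granger_newbold(e1, e2):
    """Granger-Newbold comparison of two forecast-error series.
    Returns (r_xz, t_stat). Since x_t * z_t = e1_t**2 - e2_t**2,
    a significantly positive r_xz means model 1 has the larger MSE,
    and a significantly negative r_xz means model 2 does."""
    e1, e2 = np.asarray(e1, dtype=float), np.asarray(e2, dtype=float)
    x, z = e1 + e2, e1 - e2
    r = np.corrcoef(x, z)[0, 1]
    n = len(e1)
    t_stat = r * np.sqrt((n - 1) / (1.0 - r ** 2))  # ~ t with n-1 d.f.
    return r, t_stat
```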



Appendix B. Error estimation methods

There are four basic error estimation methods:

• Mean Absolute Error (MAE).
• Mean Square Error (MSE).
• Root Mean Square Error (RMSE).
• Mean Absolute Percentage Error (MAPE).

They can be calculated by the following equations, where $x_t$ and $x'_t$ denote the actual and estimated values, respectively, and n is the number of observations:

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n} \left| x_t - x'_t \right| \quad (8)$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n} \left( x_t - x'_t \right)^2 \quad (9)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} \left( x_t - x'_t \right)^2} \quad (10)$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n} \left| \frac{x_t - x'_t}{x_t} \right| \quad (11)$$

All methods except MAPE have a scaled output. Because the pre-processed and raw data used for model estimation have different scales, MAPE is the most suitable method for estimating the errors.
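The four measures of Eqs. (8)–(11) can be computed together; the sketch below is illustrative only (x are the actual values, x_hat the estimates).

```python
import numpy as np

def error_measures(x, x_hat):
    """MAE, MSE, RMSE and MAPE of Eqs. (8)-(11)."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    e = x - x_hat
    mse = np.mean(e ** 2)
    return {"MAE": np.mean(np.abs(e)),
            "MSE": mse,
            "RMSE": np.sqrt(mse),
            "MAPE": np.mean(np.abs(e / x))}
```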

Appendix C. Pre-processing methods

C.1. Normalization

Normalization scales the numbers in a data set to improve the accuracy of the subsequent numeric computations. Nayak et al. (2004), Enders (2004) and White (1989) used this approach in articles in which time series functions are estimated heuristically. The different normalization algorithms are described in the following.

C.2. Min–Max normalization

Let $x_{old}$, $x_{max}$ and $x_{min}$ be the value, maximum and minimum of the raw data, respectively, and $X'_{max}$ and $X'_{min}$ the maximum and minimum of the normalized data. The normalization of $x_{old}$, called $X'_{new}$, is given by the following transformation function:

$$X'_{new} = \frac{x_{old} - x_{min}}{x_{max} - x_{min}} \left( X'_{max} - X'_{min} \right) + X'_{min}$$

C.3. Zscore normalization

In this method, the data are transformed so that their mean and variance become 0 and 1, respectively. The transformation function is:

$$X_{new} = \frac{x_{old} - \mathrm{mean}}{\mathrm{std}}$$

where std is the standard deviation of the raw data.

C.4. Sigmoidal normalization

This method uses a sigmoid function to scale the data to the range $[-1, 1]$. The transformation function is:

$$X_{new} = \frac{1 - e^{-a}}{1 + e^{-a}}, \qquad a = \frac{x_{old} - \mathrm{mean}}{\mathrm{std}}$$

C.5. The first difference method

As mentioned, the first step in the Box–Jenkins method is to transform the data to make them stationary. The difference method was proposed by Box and Jenkins (1970). In this method, the following transformation is applied:

$$y_t = x_t - x_{t-1} \quad (12)$$


C.6. The first difference of the logarithm

In this method, the following transformation is applied:

$$Y_t = \log(x_t) - \log(x_{t-1}) \quad (13)$$
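For reference, the transformations of this appendix can be sketched as follows; the helper names are illustrative, and base-10 logarithms are assumed for C.6, consistent with the post-processing step of Section 6.

```python
import numpy as np

def min_max(x, new_min=0.0, new_max=1.0):
    """C.2: scale the raw data to [new_min, new_max]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min()) * (new_max - new_min) + new_min

def zscore(x):
    """C.3: transform to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def sigmoidal(x):
    """C.4: squash the z-scores into (-1, 1) with a sigmoid."""
    a = zscore(x)
    return (1.0 - np.exp(-a)) / (1.0 + np.exp(-a))

def first_diff(x):
    """C.5 (Eq. (12)): y_t = x_t - x_{t-1}."""
    return np.diff(np.asarray(x, dtype=float))

def log_first_diff(x):
    """C.6 (Eq. (13)): Y_t = log10(x_t) - log10(x_{t-1})."""
    return np.diff(np.log10(np.asarray(x, dtype=float)))
```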

References

Abdel-Aal, R. E., Al-Garni, A. Z., & Al-Nassar, Y. N. (1997). Modeling and forecasting monthly electric energy consumption in eastern Saudi Arabia using abductive networks. Energy, 22, 911–921.
Akay, D., & Atak, M. (2007). Grey prediction with rolling mechanism for electricity demand forecasting of Turkey. Energy, 32(9), 1670–1675.
Al-Saba, T., & El-Amin, I. (1999). Artificial neural networks as applied to long-term demand forecasting. Artificial Intelligence in Engineering, 13, 189–197.
Amjady, N., & Keynia, F. (2011). A new prediction strategy for price spike forecasting of day-ahead electricity markets. Applied Soft Computing, 11(6), 4246–4256.
Aydinalp, M., Ismet Ugursal, V., & Fung, A. S. (2002). Modelling of the appliance, lighting and space-cooling energy consumptions in the residential sector using neural networks. Applied Energy, 71, 87–110.
Azadeh, A., & Ebrahimipour, V. (2004). An integrated approach for assessment and ranking of manufacturing systems based on machine performance. International Journal of Industrial Engineering, 11, 349–363.
Azadeh, A., Ghaderi, S. F., Tarverdian, S., & Saberi, M. (2007). Integration of artificial neural networks and genetic algorithm to predict electrical energy consumption. In Proceedings of the 32nd annual conference of the IEEE industrial electronics society – IECON'2006. Paris, France: Conservatoire National des Arts & Metiers.
Azadeh, A., Ghaderi, S. F., & Sohrabkhani, S. (2007). Forecasting electrical consumption by integration of Neural Network, time series and ANOVA. Applied Mathematics and Computation, 186, 1753–1761.
Azadeh, A., Ghaderi, S. F., Tarverdian, S., & Saberi, M. (2007). Integration of artificial neural networks and genetic algorithm to predict electrical energy consumption. Applied Mathematics and Computation, 186, 1731–1741.
Azadeh, A., Saberi, M., Gitiforouz, A., & Saberi, Z. (2009). A hybrid simulation-adaptive network based fuzzy inference system for improvement of electricity consumption estimation. Expert Systems with Applications, 36, 11108–11117.
Azadeh, A., Saberi, M., & Seraj, O. (2010). An integrated fuzzy regression algorithm for energy consumption estimation with non-stationary data: A case study of Iran. Energy Policy, 35(6), 2351–2366.
Azadeh, A., & Tarverdian, S. (2007). Integration of genetic algorithm, computer simulation and design of experiment for forecasting electrical consumption. Energy Policy, 35(10), 5229–5241.
Aznarte, J. L., Sanchez, J. M. B., Lugilde, D. N., Fernandez, C. D. L., Guardia, C. D., & Sanchez, F. A. (2007). Forecasting airborne pollen concentration time series with neural and neuro-fuzzy models. Expert Systems with Applications, 32(4), 1218–1225.
Box, G. E. P., & Jenkins, G. M. (1970). Time series analysis: Forecasting and control. San Francisco: Holden-Day (Revised ed. 1976).
Chiang, W. C., Urban, T. L., & Baldridge, G. W. (1996). A neural network approach to mutual fund net asset value forecasting. Omega, 24, 205–215.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2, 303–314.
Dillon, T. S., Sestito, S., & Leuang, S. (1991). Short term load forecasting using an adaptive neural network. International Journal of Electrical Power & Energy Systems, 13(4), 186–192.
Enders, W. (2004). Applied econometric time series. John Wiley & Sons.
Enlin, M. (1995). Use of principal component analysis in selecting plants for fluoride. Chinese Journal of Ecology, 14(3), 124–128.
Gallant, A., & Stephen, I. (1993). Neural network learning and expert systems. MIT Press.
Gareta, R., Romeo, L. M., & Gil, A. (2006). Forecasting of electricity prices with neural networks. Energy Conversion and Management, 47, 1770–1778.
Ghiassi, M., Saidane, H., & Zimbra, D. K. (2005). A dynamic artificial neural network model for forecasting time series events. International Journal of Forecasting, 21, 341–362.
Granger, C. W. J., & Newbold, P. (1974). Spurious regressions in econometrics. Journal of Econometrics, 2, 111–120.
Hill, T., O'Connor, M., & Remus, W. (1996). Neural network models for time series forecasts. Management Science, 42(7), 1082–1092.
Hsu, C. C., & Chen, C. Y. (2003). Regional load forecasting in Taiwan: Applications of artificial neural networks. Energy Conversion and Management, 44, 1941–1949.
Indro, D. C., Jiang, C. X., Patuwo, B. E., & Zhang, G. P. (1999). Predicting mutual fund performance using artificial neural networks. Omega, 27, 373–380.
Jhee, W. C., & Lee, J. K. (1993). Performance of neural networks in managerial forecasting. Intelligent Systems in Accounting, Finance and Management, 2, 55–71.
Kalogirou, S. A. (2000). Applications of artificial neural networks for energy systems. Applied Energy, 67, 17–35.
Kalogirou, S. A., & Bojic, M. (2000). Artificial neural networks for the prediction of the energy consumption of a passive solar building. Energy, 25, 479–491.
Karunasinghe, D. S. K., & Liong, S. Y. (2006). Chaotic time series prediction with a global model artificial neural network. Journal of Hydrology, 323, 92–105.
Kavaklioglu, K. (2011). Modeling and prediction of Turkey's electricity consumption using support vector regression. Applied Energy, 88(1), 368–375.
Kim, T. Y., Oh, K. J., Kim, C., & Do, J. D. (2004). Artificial neural networks for non-stationary time series. Neurocomputing, 61, 439–447.
Kohzadi, N., Boyd, M. S., Kermanshahi, B., & Kaastra, I. (1996). A comparison of artificial neural network and time series models for forecasting commodity prices. Neurocomputing, 10(3), 169–181.
Metaxiotis, K., Kagiannas, A., Ashounis, D., & Psarras, J. (2003). Artificial intelligence in short term electric load forecasting: a state-of-the-art survey for the researcher. Energy Conversion and Management, 44, 1525–1534.
Nayak, P. C., Sudheer, K. P., Rangan, D. M., & Ramasastri, K. S. (2004). A neuro-fuzzy computing technique for modeling hydrological time series. Journal of Hydrology, 291, 52–66.
Oliveira, A. L. I., & Meira, S. R. L. (2006). Detecting novelties in time series through neural networks forecasting with robust confidence intervals. Neurocomputing, 70(1–3), 79–92.
Ozturk, H., Ceylan, H., Canyurt, O. E., & Hepbasli, A. (2003). Electricity estimation using genetic algorithm approach: A case study of Turkey. Energy, 30, 1003–1012.
Palmer, A., Montano, J. J., & Sese, A. (2006). Designing an artificial neural network for forecasting tourism time series. Tourism Management, 27, 781–790.
Peng, T. M., Hubele, N. F., & Karady, G. G. (1992). Advancement in the application of neural networks for short term load forecasting. IEEE Transactions on Power Systems, 7(1), 250–258.
Rahman, S., & Drezga, I. (1992). Identification of a standard for comparing short-term load forecasting techniques. Electric Power Systems Research, 25(3), 149–158.
Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Foundations (Vol. 1). Cambridge, MA: MIT Press.
Schiffmann, W., Joost, M., & Werner, R. (1992). Optimization of the backpropagation algorithm for training multilayer perceptrons. Technical Report 16/92. Institute of Physics, University of Koblenz.
Stern, H. S. (1996). Neural networks in applied statistics. Technometrics, 38(3), 205–214.
Sun, Z. L., Choi, T. M., Au, K. F., & Yu, Y. (2008). Sales forecasting using extreme learning machine with applications in fashion retailing. Decision Support Systems, 46, 411–419.
Tang, Z., Almeida, C., & Fishwick, P. (1991). A time series forecasting using neural networks vs. Box–Jenkins methodology. Simulation, 57(5), 303–310.
Tang, Z., & Fishwick, P. A. (1993). Back-propagation neural nets as models for time series forecasting. ORSA Journal on Computing, 5(4), 374–385.
White, H. (1989). Some asymptotic results for learning in single hidden-layer feedforward network models. Journal of the American Statistical Association, 84(408), 1003–1013.
Yao, S. J., Song, Y. H., Zhang, L. Z., & Cheng, X. Y. (2000). Wavelet transform and neural networks for short-term electrical load forecasting. Energy Conversion and Management, 41, 1975–1988.
Yu, Y., Choi, T. M., & Hui, C. L. (2011). An intelligent fast sales forecasting model for fashion products. Expert Systems with Applications, 38, 7373–7379.
Zhang, G., & Hu, M. Y. (1998). Neural network forecasting of the British Pound/US Dollar exchange rate. Omega, 26, 495–506.
Zhang, G. P., & Qi, M. (2005). Neural network forecasting for seasonal and trend time series. European Journal of Operational Research, 160, 501–514.
Zhou, P., Ang, B. W., & Poh, K. L. (2006). A trigonometric grey prediction approach to forecasting electricity demand. Energy, 31(14), 2839–2847.
Zhu, J. (1998). Data envelopment analysis vs. principal component analysis: An illustrative study of economic performance of Chinese cities. European Journal of Operational Research, 111, 50–61.