A Comparative Study on Hybrid Linear and Nonlinear Modeling Framework for Air Passenger Traffic Forecasting Yukun Bao, Dongbo Yi, Tao Xiong, Zhongyi Hu, Shuai Zheng Advanced in Information Sciences and Service Sciences. Volume 3, Number 5, June 2011

A Comparative Study on Hybrid Linear and Nonlinear Modeling Framework for Air Passenger Traffic Forecasting

*1 Yukun Bao, 2 Dongbo Yi, 3 Tao Xiong, 4 Zhongyi Hu, 5 Shuai Zheng

*1 School of Management, Huazhong University of Sci. & Tech., Wuhan, P.R. China, [email protected]
2 School of Management, Huazhong University of Sci. & Tech., Wuhan, P.R. China, [email protected]
3 School of Management, Huazhong University of Sci. & Tech., Wuhan, P.R. China, [email protected]
4 School of Management, Huazhong University of Sci. & Tech., Wuhan, P.R. China, [email protected]
5 School of Management, Huazhong University of Sci. & Tech., Wuhan, P.R. China, [email protected]

doi : 10.4156/aiss.vol3.issue5.28

Abstract

The hybrid linear and nonlinear modeling framework has been widely used as a promising method for time series forecasting. However, there have been very few, if any, large-scale comparative studies of this framework for air passenger traffic forecasting. This study aims to fill that gap. The linear models selected are the autoregressive integrated moving average (ARIMA) model and the seasonal autoregressive integrated moving average (SARIMA) model. The nonlinear models selected are support vector machines (SVMs) and multi-layer feed-forward neural networks (FNN). Specifically, we apply these models to the monthly air passenger traffic series of four American airlines. The results demonstrate that significant improvement can be achieved with the hybrid linear and nonlinear framework. In particular, the hybrid framework combining SARIMA and SVMs performed best in terms of symmetric mean absolute percentage error (SMAPE), multiple comparisons with the best (MCB), and fraction best (FRAC-BEST).

Keywords: Hybrid Linear and Nonlinear Modeling Framework, Air Passenger Traffic Forecasting, Seasonal Time Series Forecasting.

* Corresponding author.

1. Introduction

Forecasting is an important component of planning in any enterprise, but it is particularly critical in airline revenue management because of the direct influence forecasts have on the booking limits that determine airline profits. Thus, how to model and forecast air passenger traffic has long been a major research topic with significant practical implications. Unfortunately, passenger behavior and other complicating factors make air passenger traffic forecasting extremely difficult. In the past decades, academic researchers and practitioners have made many contributions to air passenger forecasting.

Early research on air passenger traffic forecasting focused mainly on models for demand distributions, models for arrival processes, and censored demand data. Descriptions of statistical models of passenger booking can be found in Beckmann and Bobkoski [1]. Lyle models demand as composed of a Gamma systematic component with Poisson random errors [2]. This model leads to a negative binomial distribution for total demand, which is then truncated for demand censorship. Empirical studies have shown that the normal probability distribution gives a good continuous approximation to aggregate airline demand distributions (see, for example, Belobaba [3] and Shlifer and Vardi [4]). The use of the Poisson process, when appropriate, is useful in dynamic treatments because of the memoryless property of the exponential inter-arrival distribution. Cumulative arrivals from compound Poisson processes can provide a reasonable fit to the coefficients of variation of real-world arrival data. For example, the stuttering Poisson process used by Rothstein [5] and Beckmann and Bobkoski [6] is a compound process that allows for batch arrivals at each occurrence of a Poisson arrival event and achieves more realistic variances.

Since the deregulation of aviation in the United States, air travel demand has become a favorite topic among scholars. Traditional research has focused on the economic characteristics of the origin and destination points, such as population, income and price, as discussed by Boyer and Butten. Since then, most scholars have used these factors as a basis for building econometric models for empirical study [7-8]. Most econometric models aim to reveal the relationship between air passenger traffic flow and selected variables such as geo-economic and service-related factors.

Compared with econometric modeling, little attention has been paid to time series models in air passenger forecasting. Important work has been done by Grubb and Mason [9] and Bermudez et al. [10], both of which used the Holt-Winters method to model monthly UK air passenger data. The promising results show that univariate forecasting may have some advantages over multivariate econometric modeling for long lead times. A univariate forecast depends only on the past of the series and not on estimated relationships between the series and exogenous variables, and it does not require forecasts of the exogenous variables, which will themselves be subject to uncertainty. Traditional time series models assume that the time series is linear or nearly linear, and they can yield good results when the series is stationary. Nevertheless, in the real world there is always some irregularity along with the trend and seasonality. Recent research focuses on modeling time series with complex nonlinearity, dynamic variation and high irregularity.
Artificial intelligence models such as artificial neural networks (ANN), support vector machines (SVMs) and genetic programming (GP), together with some data mining methods, are widely used in air passenger forecasting. Kyungdoo et al. first used ANN to forecast air passenger traffic [11]. Alekseev and Seixas used multivariate neural forecasting for air transport after preprocessing the data by decomposition. Their results showed that neural processing outperforms the traditional econometric approach and generalizes over time series behavior, and also revealed the capabilities of the proposed hybrid model [12].

Recently, it has become widely accepted that combining multiple forecasts, particularly those of linear and nonlinear models, tends to improve forecasting performance because different forecasting models can complement each other in capturing patterns in the data. For example, Terui and van Dijk presented a combined linear and nonlinear time series model for forecasting the US monthly employment rate and production indices [13]. Zhang proposed a combined approach using ARIMA and ANN models for time series forecasting [14]. Pai and Lin established a hybrid ARIMA and support vector machines model for stock price forecasting [15]. Nevertheless, can the combined models above effectively deal with seasonality, which is perhaps the most significant characteristic of an air passenger time series? Using the concept of a hybrid of linear and nonlinear models, Tseng et al. presented a hybrid forecasting model that combines the seasonal ARIMA model and the back-propagation (BP) neural network model to predict seasonal time series data [16]. Chen and Wang proposed a hybrid methodology that exploits the unique strengths of the seasonal ARIMA model and the support vector machines (SVMs) model in forecasting seasonal time series [17].
Naturally, two questions arise: whether the models proposed by Tseng et al. [16] and Chen and Wang [17] are still valid for seasonal air passenger time series, and which one is better in this context. This study extends the hybrid linear and nonlinear framework to air passenger forecasting by conducting an extensive comparative experiment with different modeling scenarios. The goals of the study are twofold. One is to investigate whether a hybrid of linear and nonlinear models can effectively model and forecast seasonal air passenger time series. The other is to compare the performance across different modeling methodologies. To do so, we employ the four hybrid models presented in [14-17]. The corresponding individual models, namely ARIMA, seasonal ARIMA, FNN and SVMs, are selected as benchmark models. Monthly series of four selected airlines are used in the experiment, and forecast accuracy is judged by the symmetric mean absolute percentage error (SMAPE). Moreover, we also consider rank-based measures, which indicate whether there are significant differences between forecasting models.

The rest of this paper is organized as follows. Section 2 describes the methodologies used in this study, including ARIMA, SARIMA, FNN and SVMs, along with the hybrid methodology. Section 3 describes the data source and the experimental design, covering the accuracy measure, rank-based measures, implementation and parameter selection in detail. In Section 4, the experimental results are discussed. Finally, conclusions of this study are drawn in Section 5.

2. Methodologies

2.1. Seasonal ARIMA model

Seasonal ARIMA (SARIMA) is the most popular linear model for modeling many types of seasonal as well as nonseasonal time series. A time series $\{Z_t \mid t = 1, 2, \dots, k\}$ is generated by a SARIMA$(p, d, q)(P, D, Q)_s$ process with mean $\mu$ of the Box and Jenkins time series model (if $P$, $D$ and $Q$ all equal zero, the SARIMA model degenerates to the ARIMA model) if

$$\phi_p(B)\,\Phi_P(B^s)\,(1 - B)^d (1 - B^s)^D Z_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t \tag{1}$$

with

$$\phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p,$$
$$\Phi_P(B^s) = 1 - \Phi_s B^s - \Phi_{2s} B^{2s} - \dots - \Phi_{Ps} B^{Ps},$$
$$\theta_q(B) = 1 - \theta_1 B - \theta_2 B^2 - \dots - \theta_q B^q,$$
$$\Theta_Q(B^s) = 1 - \Theta_s B^s - \Theta_{2s} B^{2s} - \dots - \Theta_{Qs} B^{Qs},$$

where $p, d, q, P, D, Q$ are integers; $s$ is the season length ($s = 4$ for quarterly data and $s = 12$ for monthly data); $B$ is the backward shift operator; $\varepsilon_t$ is the estimated residual at time $t$; $d$ is the number of regular differences; $D$ is the number of seasonal differences; and $Z_t$ denotes the observed value at time $t$, $t = 1, 2, \dots, k$.

Fitting a SARIMA model involves the following four steps:

Step 1: Identification of the SARIMA$(p, d, q)(P, D, Q)_s$ structure.
Step 2: Estimation of the unknown parameters.
Step 3: Goodness-of-fit tests on the estimated residuals.
Step 4: Forecasting of future outcomes based on the known data.

The $\varepsilon_t$ should be independently and identically distributed as normal random variables with mean 0 and constant variance $\sigma^2$. The roots of $\phi_p(Z) = 0$ and $\theta_q(Z) = 0$ should all lie outside the unit circle.
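As a minimal illustration of the differencing operators $(1 - B)^d$ and $(1 - B^s)^D$ in Eq. (1), the following Python sketch applies regular and seasonal differences to a short series. The function names `difference` and `sarima_difference` and the toy quarterly data are assumptions introduced for illustration; they are not part of the original study, which used EViews for model fitting.

```python
def difference(series, lag=1):
    """Apply (1 - B^lag) once: z_t - z_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sarima_difference(series, d=0, D=0, s=12):
    """Apply (1 - B)^d (1 - B^s)^D to a series (illustrative helper)."""
    out = list(series)
    for _ in range(D):  # seasonal differences at lag s
        out = difference(out, lag=s)
    for _ in range(d):  # regular differences at lag 1
        out = difference(out, lag=1)
    return out


# Toy quarterly series (s = 4): linear trend plus a repeating seasonal pattern.
season = [10, -5, 3, -8]
toy = [2 * t + season[t % 4] for t in range(12)]

# One seasonal difference removes the seasonal pattern and leaves the
# constant trend increment over one season (2 * 4 = 8).
seasonal_diff = sarima_difference(toy, d=0, D=1, s=4)

# A further regular difference removes the remaining constant level.
stationary = sarima_difference(toy, d=1, D=1, s=4)
```

Each seasonal difference shortens the series by $s$ observations and each regular difference by one, which matters when only 138 training observations are available.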

2.2. Feed Forward Neural Network

In this study, we employ a particular ANN structure, the multi-layer feed-forward neural network (FNN) trained with the error back-propagation algorithm, which is the most popular and widely used network paradigm in time series forecasting. The FNN has three layers: an input layer, a hidden layer and an output layer. The nodes of each layer connect to the nodes of the next layer, but nodes within the same layer are not connected to each other. The input layer consists of all the input factors. Information from the input layer is processed in the hidden layer, and the resulting output vector is computed in the output layer. The hidden and output layers each apply a mathematical function, generally nonlinear, called the activation function. The sigmoid function is a widely used nonlinear activation function whose output lies between 0 and 1.
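The layer structure described above can be sketched as a single forward pass. This is a minimal illustration under stated assumptions: the weights are hypothetical values chosen by hand, the function names are not from the paper, and the linear output node follows the configuration reported later in Section 3.5. Training by back-propagation is not shown.

```python
import math


def sigmoid(z):
    """Logistic activation; the output lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fnn_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: input layer -> sigmoid hidden layer -> linear output node."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical weights for a network with 3 inputs and 2 hidden nodes.
x = [0.2, 0.5, 0.1]
W_hidden = [[0.4, -0.6, 0.1], [0.3, 0.8, -0.5]]
b_hidden = [0.0, 0.1]
w_out = [1.5, -2.0]
b_out = 0.25

y_hat = fnn_forward(x, W_hidden, b_hidden, w_out, b_out)
```

Because nodes in the same layer are not connected, each hidden activation depends only on the inputs, so the hidden layer can be computed in a single pass.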

2.3. Support vector machines for regression

Support vector machines are a powerful approach to data classification and regression that employs the structural risk minimization (SRM) principle. A detailed introduction to this subject, together with technical details on how to employ SVMs for time series forecasting, can be found in [18-20]. Given a set of training data points $G = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in X$, $y_i \in R$, $x_i$ is the input vector, $y_i$ is the desired value and $n$ is the total number of data patterns, the aim is to find a function that fits all these data well. Support vector regression (SVR) is one of the methods for performing this regression task. In general, SVR approximates the function using

$$f(x) = (w \cdot \phi(x)) + b, \tag{2}$$

where $(\cdot)$ denotes the inner product and $\phi(x)$ is the high dimensional feature space which is nonlinearly mapped from the input space $x$. The coefficients $w$ and $b$ are estimated by minimizing

$$R_{reg}(f) = C \frac{1}{n} \sum_{i=1}^{n} \Gamma(f(x_i), y_i) + \frac{1}{2} \|w\|^2, \tag{3}$$

$$\Gamma(f(x), y) = \begin{cases} |f(x) - y| - \varepsilon, & \text{if } |f(x) - y| \ge \varepsilon, \\ 0, & \text{otherwise.} \end{cases} \tag{4}$$

Figure 1. $\varepsilon$-insensitive loss function

In the regularized risk function given by Eq. (3), the first term, $C \frac{1}{n} \sum_{i=1}^{n} \Gamma(f(x_i), y_i)$, is the empirical risk, which is measured by the $\varepsilon$-insensitive loss function (Fig. 1) given by Eq. (4). This loss function provides the advantage of enabling one to use sparse data points to represent the decision function given by Eq. (2). The second term, $\frac{1}{2} \|w\|^2$, is the regularization term. $C$ is referred to as the regularization constant, and it determines the trade-off between the empirical risk and the regularization term. Increasing the value of $C$ increases the relative importance of the empirical risk with respect to the regularization term. $\varepsilon$ is called the tube size, and it is equivalent to the approximation accuracy placed on the training data points. Both $C$ and $\varepsilon$ are user-prescribed parameters. In this paper, we use a general type of $\varepsilon$-insensitive loss function, which is given as:

$$\Gamma(f(x_i) - y_i) = \begin{cases} 0, & \text{if } -\varepsilon \le y_i - f(x_i) \le \varepsilon, \\ y_i - f(x_i) - \varepsilon, & \text{if } y_i - f(x_i) > \varepsilon, \\ f(x_i) - y_i - \varepsilon, & \text{if } f(x_i) - y_i > \varepsilon. \end{cases} \tag{5}$$
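To make the roles of $C$ and $\varepsilon$ concrete, the sketch below evaluates the $\varepsilon$-insensitive loss and the regularized risk of Eq. (3) for given predictions. It is an illustration only: the function names are assumptions, and the weight vector is passed in directly rather than obtained by solving the optimization problem.

```python
def eps_insensitive_loss(prediction, target, eps):
    """Zero inside the tube of half-width eps, linear in the excess outside it."""
    return max(0.0, abs(prediction - target) - eps)


def regularized_risk(predictions, targets, weights, C, eps):
    """Eq. (3): C times the mean eps-insensitive loss plus 0.5 * ||w||^2."""
    n = len(targets)
    empirical = sum(
        eps_insensitive_loss(f, y, eps) for f, y in zip(predictions, targets)
    ) / n
    regularization = 0.5 * sum(w * w for w in weights)
    return C * empirical + regularization


# A prediction within eps of the target costs nothing ...
inside_tube = eps_insensitive_loss(1.05, 1.0, eps=0.1)
# ... while one outside the tube is charged only for the excess over eps.
outside_tube = eps_insensitive_loss(1.5, 1.0, eps=0.1)
```

Raising `C` scales up the empirical term only, which is the trade-off described in the text: a larger `C` penalizes training errors more heavily relative to the size of the weight vector.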

Finally, using the Lagrange function and exploiting the optimality constraints, we obtain the following quadratic programming (QP) problem:

$$\arg\min_{a, a^*} \; \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (a_i - a_i^*)(a_j - a_j^*)(\phi(x_i) \cdot \phi(x_j)) + \sum_{i=1}^{n} (\varepsilon - y_i)\, a_i + \sum_{i=1}^{n} (\varepsilon + y_i)\, a_i^* \tag{6}$$

subject to

$$\sum_{i=1}^{n} (a_i - a_i^*) = 0, \quad a_i, a_i^* \in [0, C], \tag{7}$$

where $a_i$ and $a_i^*$ are the corresponding Lagrange multipliers, used to push and pull the outcome of $f(x_i)$ towards $y_i$, respectively.

Solving the QP problem of Eq. (6) under the constraints of Eq. (7), we determine the Lagrange multipliers $a$ and $a^*$ and obtain $w = \sum_{i=1}^{n} (a_i - a_i^*)\, \phi(x_i)$. Therefore the estimation function in Eq. (2) becomes:

$$f(x) = \sum_{i=1}^{n} (a_i - a_i^*)(\phi(x) \cdot \phi(x_i)) + b. \tag{8}$$

So far, we have not considered the computation of $b$. In fact, it can be obtained by exploiting the Karush-Kuhn-Tucker (KKT) conditions. These conditions state that at the optimal solution, the product between the Lagrange multipliers and the constraints has to equal zero. In this case, it means that

$$a_i \left( \varepsilon + \xi_i - y_i + (w \cdot \phi(x_i)) + b \right) = 0,$$
$$a_i^* \left( \varepsilon + \xi_i^* + y_i - (w \cdot \phi(x_i)) - b \right) = 0,$$

and

$$(C - a_i)\, \xi_i = 0, \quad (C - a_i^*)\, \xi_i^* = 0, \tag{9}$$

where $\xi_i$ and $\xi_i^*$ are slack variables used to measure the errors on the up and down sides. Here $\xi_i > 0$ corresponds to a point for which $y_i - (w \cdot \phi(x_i)) - b > \varepsilon$, and $\xi_i^* > 0$ corresponds to a point for which $y_i - (w \cdot \phi(x_i)) - b < -\varepsilon$, as illustrated in Fig. 2.

Figure 2. Nonlinear SVR with $\varepsilon$-insensitive tube

Since $a_i \cdot a_i^* = 0$ and $\xi_i^{(*)} = 0$ for $a_i^{(*)} \in (0, C)$, $b$ can be computed as follows:

$$b = \begin{cases} y_i - (w \cdot \phi(x_i)) - \varepsilon, & \text{for } a_i \in (0, C), \\ y_i - (w \cdot \phi(x_i)) + \varepsilon, & \text{for } a_i^* \in (0, C). \end{cases} \tag{10}$$

Using the kernel trick, Eq. (8) can be written as:

$$f(x) = \sum_{i=1}^{n} (a_i - a_i^*)\, K(x, x_i) + b.$$

Here $K(x, x_i) = (\phi(x) \cdot \phi(x_i))$ is defined as the kernel function. The value of the kernel is equal to the inner product of two vectors $x_i$ and $x_j$ in the feature space $\phi(x_i)$ and $\phi(x_j)$, that is, $K(x_i, x_j) = (\phi(x_i) \cdot \phi(x_j))$. The elegance of using the kernel function is that one can deal with feature spaces of arbitrary dimensionality without having to compute the map $\phi(x)$ explicitly. Any function satisfying Mercer's condition can be used as the kernel function. Typical examples of kernel functions are as follows:

Linear: $K(x_i, x_j) = x_i^T x_j$
Polynomial: $K(x_i, x_j) = (\gamma x_i^T x_j + r)^d, \ \gamma > 0$
Radial basis function (RBF): $K(x_i, x_j) = \exp\left(-\|x_i - x_j\|^2 / (2\sigma^2)\right)$
Sigmoid: $K(x_i, x_j) = \tanh(\gamma x_i^T x_j + r)$

Here, $\gamma$, $\sigma$, $r$ and $d$ are kernel parameters. The kernel parameters should be chosen carefully, as they implicitly define the structure of the high dimensional feature space $\phi(x)$ and thus control the complexity of the final solution.
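The four kernels listed above, and the kernel form of the SVR estimation function, can be written out directly. The sketch below is illustrative only: the function names and default parameter values are assumptions, the RBF kernel uses the $2\sigma^2$ parameterization, and `svr_predict` takes the coefficients $(a_i - a_i^*)$ and the offset $b$ as given instead of solving the QP problem.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, r=0.0, d=2):
    return (gamma * dot(u, v) + r) ** d


def rbf_kernel(u, v, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigmoid_kernel(u, v, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(u, v) + r)


def svr_predict(x, support_vectors, coefficients, b, kernel):
    """Kernel form of Eq. (8): sum_i (a_i - a_i*) K(x, x_i) + b.

    `coefficients` holds the differences (a_i - a_i*), assumed already known.
    """
    return sum(c * kernel(x, sv) for c, sv in zip(coefficients, support_vectors)) + b


# Hypothetical two-point model evaluated with the RBF kernel.
prediction = svr_predict(
    [0.5], [[0.0], [1.0]], [0.8, -0.3], b=0.1,
    kernel=lambda u, v: rbf_kernel(u, v, sigma=0.5),
)
```

Only kernel evaluations between the query point and the stored training points are needed, so the feature map $\phi(x)$ never has to be computed explicitly.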

2.4. The hybrid framework

Air passenger traffic demand is always affected by various factors such as the economy, weather and important events. Besides seasonality and long-term trend, there are always some irregular fluctuations which cannot be easily captured. ARIMA and SVMs have advantages in capturing linear and nonlinear patterns, respectively. So the hybrid model in this study is composed of an ARIMA component and an SVM component. Thus, the model can capture both linear and nonlinear patterns with improved overall forecasting performance. The hybrid model ($Z_t$) can then be represented as follows:

$$Z_t = Y_t + N_t, \tag{11}$$

where $Y_t$ is the linear part modeled by ARIMA and $N_t$ is the nonlinear part modeled by SVMs. Let $\tilde{Y}_t$ be the forecast value of the ARIMA model at time $t$, and let $\varepsilon_t$ represent the forecast error of the ARIMA model at time $t$; then

$$\varepsilon_t = Z_t - \tilde{Y}_t. \tag{12}$$

The forecast errors are then modeled by SVMs and can be represented as follows:

$$\varepsilon_t = f(\varepsilon_{t-1}, \varepsilon_{t-2}, \dots, \varepsilon_{t-n}) + \Delta_t, \tag{13}$$

where $f$ is a nonlinear function modeled by SVMs and $\Delta_t$ is the random error. Therefore, the combined forecast is

$$\tilde{Z}_t = \tilde{Y}_t + \tilde{N}_t, \tag{14}$$

where $\tilde{N}_t$ is the forecast value of the SVMs model. For the sake of brevity, we name the hybrid models combining a linear model (ARIMA or SARIMA) and a nonlinear model (SVMs or FNN) as ARIMASVM, ARIMAFNN, SARIMASVM and SARIMAFNN, respectively.
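The two-stage procedure of Eqs. (11) to (14) can be sketched in a few lines. This is a minimal illustration under loud assumptions: the linear forecasts are passed in as plain lists standing in for ARIMA or SARIMA output, and `fit_mean_of_lags` is a deliberately trivial stand-in for the nonlinear learner, not an SVM or FNN. All function names are hypothetical.

```python
def lagged_pairs(values, n_lags):
    """Build (inputs, target) pairs: [e_{t-n}, ..., e_{t-1}] -> e_t."""
    X = [values[t - n_lags:t] for t in range(n_lags, len(values))]
    y = [values[t] for t in range(n_lags, len(values))]
    return X, y


def hybrid_forecast(observed, linear_fitted, linear_next, fit_residual_model, n_lags):
    """One-step-ahead hybrid forecast.

    observed:      past observations Z_t
    linear_fitted: in-sample linear forecasts, aligned with `observed`
    linear_next:   linear forecast for the next period
    fit_residual_model(X, y): returns a callable mapping lagged residuals
                   to the next residual (stand-in for SVMs or FNN)
    """
    # Eq. (12): residuals of the linear model.
    residuals = [z - y for z, y in zip(observed, linear_fitted)]
    # Eq. (13): learn e_t = f(e_{t-1}, ..., e_{t-n}) from the residual history.
    X, y = lagged_pairs(residuals, n_lags)
    residual_model = fit_residual_model(X, y)
    nonlinear_next = residual_model(residuals[-n_lags:])
    # Eq. (14): combined forecast.
    return linear_next + nonlinear_next


def fit_mean_of_lags(X, y):
    """Trivial stand-in learner (NOT an SVM or FNN): predicts the mean of the lags."""
    return lambda lags: sum(lags) / len(lags)


# Toy data: the linear model is consistently one unit too low.
observed = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
linear_fitted = [9.0, 11.0, 10.0, 12.0, 11.0, 13.0]
combined = hybrid_forecast(observed, linear_fitted, 13.0, fit_mean_of_lags, n_lags=2)
```

In this toy case the residual model picks up the systematic one-unit error, so the combined forecast corrects the linear forecast by that amount.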

3. Experiment design

3.1. Data collection and preprocessing

Monthly air passenger traffic series of four American airlines are chosen as experimental samples. The data are freely available from the Bureau of Transportation Statistics, U.S. Department of Transportation (http://www.bts.gov/). The main reason for selecting these four airlines is that they are well known in America and represent the development trend of the American air industry. All four series cover the period from Jan. 1990 to Dec. 2005, with a total of 192 observations each. The data from Jan. 1990 to Jun. 2001 are used as the training set (138 observations), and the remainder is used as the testing set. The original data of the four airlines are shown in Figure 3(a) to Figure 3(d). All performance comparisons are based on the testing sets. We consider only one-step-ahead forecasting.
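The observation counts quoted above follow from simple month arithmetic, which the sketch below reproduces together with the chronological split. The helper names `month_index` and `split_series` are assumptions made for illustration, and the placeholder series stands in for the real passenger data.

```python
def month_index(year, month, start_year=1990, start_month=1):
    """Zero-based position of (year, month) in a monthly series starting Jan. 1990."""
    return (year - start_year) * 12 + (month - start_month)


def split_series(series, n_train):
    """Chronological split: the first n_train points train, the rest test."""
    return series[:n_train], series[n_train:]


n_total = month_index(2005, 12) + 1  # Jan. 1990 to Dec. 2005
n_train = month_index(2001, 6) + 1   # Jan. 1990 to Jun. 2001
n_test = n_total - n_train           # Jul. 2001 to Dec. 2005

# Placeholder values standing in for one airline's monthly series.
placeholder = list(range(n_total))
train, test = split_series(placeholder, n_train)
```

The split is chronological rather than random, so every test observation lies after all training observations, as one-step-ahead forecasting requires.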

(a) Monthly data of American
(b) Monthly data of Delta
(c) Monthly data of Southwest
(d) Monthly data of UnitedAir

Figure 3. Air passenger traffic series of four American airlines

In this paper, we adopt a linear transformation to scale the original data sets into the range [0, 1], as shown in Eq. (15). The two main advantages of scaling are that it prevents inputs in greater numeric ranges from dominating those in smaller numeric ranges, and that it avoids numerical difficulties during the calculation. According to the experimental results of this study, scaling the input values can help increase the accuracy of FNN and SVMs.

$$y_{k,t} = \frac{x_{k,t} - \min(x_k)}{\max(x_k) - \min(x_k)}, \tag{15}$$

where $x_{k,t}$ is the original value, $y_{k,t}$ is the scaled value, $\max(x_k)$ is the maximum value of data set $k$ ($k = 1, 2, 3, 4$), and $\min(x_k)$ is the minimum value of data set $k$.
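A minimal sketch of the scaling step and its inverse follows. The function names are assumptions, and the inverse transform is included because scaled forecasts generally have to be mapped back to passenger counts before they are interpreted; the paper does not describe that step explicitly.

```python
def min_max_scale(values):
    """Scale a series linearly into [0, 1] using its own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def inverse_scale(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


raw = [10, 20, 30]
scaled = min_max_scale(raw)
restored = inverse_scale(scaled, min(raw), max(raw))
```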

3.2. Accuracy measure

To compare the effectiveness of the different models, this study uses the symmetric mean absolute percentage error (SMAPE), as this is the main measure considered in many forecasting competitions. It is defined as:

$$\mathrm{SMAPE} = \frac{1}{N} \sum_{t=1}^{N} \frac{|x(t) - \hat{x}(t)|}{(x(t) + \hat{x}(t)) / 2}, \tag{16}$$

where $x(t)$ denotes the observation at period $t$, $\hat{x}(t)$ denotes the forecast of $x(t)$, and $N$ is the number of forecasting periods. In this study, we run each model twenty times to even out fluctuations arising from, for example, random initial weights and parameter estimation. Each of the twenty runs produces a specific SMAPE. We then average these twenty SMAPE values to obtain the overall SMAPE for the specific time series and model. Finally, the overall SMAPE across all four time series (SMAPE-ALL), which is the main performance measure in this experiment, is computed.
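The SMAPE computation and the averaging over repeated runs can be sketched as follows. The function names and the sample run values are hypothetical. Taking a plain mean across the series to obtain SMAPE-ALL is an assumption, since the text states only that an overall value is computed. Values are returned as fractions, so they would be multiplied by 100 to be reported in percent.

```python
def smape(actual, forecast):
    """Eq. (16): mean of |x - x_hat| / ((x + x_hat) / 2) over the forecast periods."""
    terms = [abs(x - f) / ((x + f) / 2.0) for x, f in zip(actual, forecast)]
    return sum(terms) / len(terms)


def mean(values):
    return sum(values) / len(values)


def smape_all(run_smapes_by_series):
    """Average run-level SMAPEs per series, then average across series.

    The plain mean across series is an assumption made for this sketch.
    """
    per_series = {name: mean(runs) for name, runs in run_smapes_by_series.items()}
    return per_series, mean(list(per_series.values()))


# Hypothetical run-level SMAPE values (fractions, not percent) for two series.
runs = {"series_a": [0.10, 0.12, 0.14], "series_b": [0.20, 0.22]}
per_series, overall = smape_all(runs)
```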

3.3. Rank-based measures

A rank-based performance measure termed multiple comparisons with the best (MCB) is used in this experiment to test whether some models perform significantly worse than the best model. Specifically, we first compute the rank of each model $p$ ($p = 1, 2, \dots, P$) on each time series $q$ ($q = 1, 2, \dots, Q$), say $R_p(q)$, with 1 being the best and 8 being the worst. Here $P$ and $Q$ are the numbers of models and time series, respectively ($P = 8$, $Q = 4$ in this context). Then the average rank of each model $p$ is computed by averaging $R_p(q)$ over all time series. The $\alpha\%$ confidence limits (in our case $\alpha\% = 95\%$) will then be

$$\bar{R}_p \pm 0.5\, q_{\alpha P} \sqrt{\frac{P(P+1)}{12Q}}, \tag{17}$$

where $\bar{R}_p$ is the average rank of model $p$ over all time series, and $q_{\alpha P}$ is the upper $\alpha$ percentile of the range of $P$ independent standard normal variables.

In addition, we also compute another rank-based measure, namely the "fraction best" (FRAC-BEST for short). It is defined as the fraction of time series for which a specific model beats all other models. We use the SMAPE as the basis for computing this measure. A model with a high FRAC-BEST, even if its overall SMAPE is only average, is worth testing on a new problem, as it has a chance of being the best.
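The rank-based measures can be sketched as below. The function names and the toy error values are assumptions made for illustration, and ties between models are not handled. The critical value `q_alpha` is left as an input because it has to be looked up in statistical tables; no value is assumed here.

```python
import math


def rank_models(errors):
    """Rank models on one series by error (1 = best); errors is {model: SMAPE}."""
    ordered = sorted(errors, key=errors.get)
    return {model: position for position, model in enumerate(ordered, start=1)}


def average_ranks(errors_by_series):
    """Average each model's rank over all series."""
    totals = {model: 0 for model in errors_by_series[0]}
    for errors in errors_by_series:
        for model, rank in rank_models(errors).items():
            totals[model] += rank
    return {model: total / len(errors_by_series) for model, total in totals.items()}


def frac_best(errors_by_series):
    """Fraction of series on which each model has the lowest error."""
    wins = {model: 0 for model in errors_by_series[0]}
    for errors in errors_by_series:
        wins[min(errors, key=errors.get)] += 1
    return {model: count / len(errors_by_series) for model, count in wins.items()}


def mcb_half_width(q_alpha, n_models, n_series):
    """Half-width of the MCB interval: 0.5 * q * sqrt(P (P + 1) / (12 Q))."""
    return 0.5 * q_alpha * math.sqrt(n_models * (n_models + 1) / (12.0 * n_series))


# Hypothetical SMAPE values for three models on two series.
toy_errors = [
    {"A": 0.10, "B": 0.20, "C": 0.30},
    {"A": 0.20, "B": 0.10, "C": 0.30},
]
toy_ranks = average_ranks(toy_errors)
toy_frac = frac_best(toy_errors)
```

A model is judged significantly worse than the best when its interval lies entirely above the upper limit of the best-ranked model's interval.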

3.4. ARIMA and SARIMA implementation

The ARIMA and SARIMA models used in this study are implemented using EViews 4.0. The specifications are selected based on the autocorrelation function (ACF) and partial autocorrelation function (PACF) and evaluated according to Akaike's Information Criterion (AIC).

3.5. FNN implementation

In this study, we employ the Matlab (Version R2006b) ANN toolbox for FNN modeling. The architecture of the FNN model is as follows. The number of input nodes is set at six. The number of hidden nodes varies from 6 to 20, and the number that minimizes the error rate on the training set is selected. The number of output nodes is set at one. For the stopping criterion, the number of learning epochs is set to 1000, as there is no prior knowledge of this value before the experiment. In the training phase, the gradient descent with momentum algorithm is applied to update the weight and bias values. The learning rate is set to 0.9 and the momentum constant to 0.1. The activation function of the hidden layer is the sigmoid function, and the output node uses the linear transfer function. Each experiment is repeated 10 times and the average values are reported.
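Two pieces of this configuration lend themselves to a short sketch: forming input patterns for six input nodes, and a weight update with momentum. Both are assumptions made for illustration. The paper does not state how the six inputs are constructed, so six consecutive lagged values are assumed here, and `momentum_step` uses a common textbook update that may differ in detail from the rule implemented in the Matlab toolbox.

```python
def make_windows(series, n_inputs=6):
    """Turn a series into (inputs, target) patterns using n_inputs lagged values."""
    X = [series[t - n_inputs:t] for t in range(n_inputs, len(series))]
    y = [series[t] for t in range(n_inputs, len(series))]
    return X, y


def momentum_step(weight, gradient, velocity, learning_rate=0.9, momentum=0.1):
    """One gradient-descent-with-momentum update (textbook form, assumed)."""
    new_velocity = momentum * velocity - learning_rate * gradient
    return weight + new_velocity, new_velocity


# Ten toy observations yield four training patterns with six inputs each.
X, y = make_windows(list(range(10)), n_inputs=6)

# A single update starting from zero velocity.
w, v = momentum_step(weight=1.0, gradient=0.5, velocity=0.0)
```

With 138 training observations and six lagged inputs, this windowing would give 132 training patterns per series under the stated assumption.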


3.6. SVMs implementation

The SVMs models used in this experiment are implemented using the LibSVM package (version 2.86). In general, the radial basis function (RBF) kernel is suggested for SVMs. The RBF kernel nonlinearly maps the samples into a high-dimensional space, so it can handle nonlinear problems. Therefore, in this paper, the RBF kernel is used in the SVR. The selection of the three parameters $C$, $\varepsilon$ and $\sigma$ of an SVR model is important to the accuracy of forecasting. There are many existing practical approaches to the selection of $C$ and $\varepsilon$, such as user-defined values based on prior knowledge and experience, cross-validation, and asymptotical optimization. However, structured methods for efficiently and simultaneously selecting all three parameters are lacking. Hence, in this study $\varepsilon$ is set to a fixed value for each model, and we then use a grid search on $(C, \sigma)$ for the following reasons. Firstly, the computational time needed to find good parameters by grid search is not much greater than that of other methods. Secondly, the grid search can be easily parallelized because each pair $(C, \sigma)$ is independent. To increase efficiency, we try exponentially growing sequences of $(C, \sigma)$ to identify good parameters. We set $C = 2^{-6}, 2^{-5}, \dots, 2^{5}, 2^{6}$ and the kernel parameter $\sigma = 10^{-5}, 10^{-4}, \dots, 10^{5}$. In the grid-search process, pairs of $(C, \sigma)$ are tried and the one with the best accuracy is selected.
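The grid search described above can be sketched independently of any SVM library. In the sketch below, `exponential_grid`, `grid_search` and `toy_score` are hypothetical names. The exponent ranges assume that the grids run from negative to positive powers, and `toy_score` is an artificial error surface standing in for the validation error of a trained SVR, which is not computed here.

```python
import math


def exponential_grid(base, low_exp, high_exp):
    """Exponentially growing sequence base**low_exp, ..., base**high_exp."""
    return [base ** e for e in range(low_exp, high_exp + 1)]


def grid_search(c_values, sigma_values, score):
    """Try every (C, sigma) pair and return (error, C, sigma) with the lowest error."""
    best = None
    for c in c_values:
        for sigma in sigma_values:
            err = score(c, sigma)
            if best is None or err < best[0]:
                best = (err, c, sigma)
    return best


def toy_score(c, sigma):
    """Artificial error surface with its minimum at C = 2^3, sigma = 10^-1."""
    return (math.log2(c) - 3) ** 2 + (math.log10(sigma) + 1) ** 2


c_grid = exponential_grid(2.0, -6, 6)        # 13 candidate values of C
sigma_grid = exponential_grid(10.0, -5, 5)   # 11 candidate values of sigma
best_err, best_c, best_sigma = grid_search(c_grid, sigma_grid, toy_score)
```

Every pair is scored independently of the others, so the double loop could be distributed across workers without changing the result, which is the parallelization argument made in the text.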

4. Results and discussions

The training and testing forecasting performances of all the examined models (ARIMASVM, ARIMAFNN, SARIMASVM, SARIMAFNN, ARIMA, SARIMA, SVMs and FNN) in terms of SMAPE-ALL, average rank and fraction best for the monthly data are shown in Table 1.

Table 1. Performance comparison of the individual models versus the hybrid models

Model        Training data       Testing data        Average rank   FRAC-BEST (%)
             SMAPE% (Std err)    SMAPE% (Std err)
ARIMA        20.18 (1.24)        23.48 (1.35)        6.18           2.17
SARIMA       19.76 (0.97)        21.04 (1.04)        5.47           2.51
SVMs         16.27 (0.57)        18.48 (0.75)        4.45           5.76
FNN          15.27 (1.01)        23.31 (1.15)        5.52           6.48
ARIMASVM     16.01 (0.54)        17.35 (0.86)        3.65           8.27
ARIMAFNN     17.54 (0.87)        20.14 (1.07)        4.88           8.49
SARIMASVM    14.25 (0.34)        15.64 (0.54)        2.59           15.84
SARIMAFNN    16.14 (0.58)        17.82 (0.75)        3.27           8.48

As mentioned in the introduction, the goals of this experiment-based study are twofold. One is to investigate whether a hybrid of linear and nonlinear models can effectively model and forecast seasonal air passenger time series. The other is to compare the performance across different modeling methodologies. The following paragraphs discuss these two goals based on the results achieved.

Regarding the first goal, three findings can be drawn from Table 1. Firstly, all four hybrid models outperform the corresponding individual models in terms of SMAPE-ALL for the testing data sets. This finding is consistent with the results in [15-17]. Secondly, the gaps between in-sample and out-of-sample error in terms of SMAPE-ALL are smaller for the hybrid models. Last but not least, the standard deviation of the hybrid models is smaller than that of the corresponding individual models. This shows that hybrid models exhibit more stable and consistent performance in air passenger forecasting.

As for the comparison across different modeling methodologies, Table 1 shows that SARIMASVM achieves the best performance in terms of both mean rank and fraction best. In addition, Figure 4 shows that the seven models with confidence intervals above the dashed line are significantly worse than the best model (SARIMASVM).
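The in-sample versus out-of-sample gaps discussed above can be recomputed directly from the SMAPE values in Table 1. The sketch below does only that arithmetic; the variable names are illustrative, and the numbers are copied from the table rather than produced by rerunning any model.

```python
# (training SMAPE%, testing SMAPE%) as reported in Table 1.
table1 = {
    "ARIMA": (20.18, 23.48),
    "SARIMA": (19.76, 21.04),
    "SVMs": (16.27, 18.48),
    "FNN": (15.27, 23.31),
    "ARIMASVM": (16.01, 17.35),
    "ARIMAFNN": (17.54, 20.14),
    "SARIMASVM": (14.25, 15.64),
    "SARIMAFNN": (16.14, 17.82),
}

# Gap between out-of-sample and in-sample error, in percentage points.
gaps = {model: round(test - train, 2) for model, (train, test) in table1.items()}

# Models ordered by testing SMAPE, best first.
by_test_error = sorted(table1, key=lambda model: table1[model][1])
```

Sorting by testing error places SARIMASVM first, in line with its average rank and FRAC-BEST values in the table.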


Interestingly, the performance of the individual FNN model is almost the worst among all eight models, as shown in Table 1. This may be due to an intrinsic drawback of artificial neural networks: although FNN is noise tolerant, it suffers from difficulties with generalization because of overfitting. Moreover, ARIMA also performs poorly. A possible reason is that ARIMA cannot model the seasonality that exists in air passenger time series.

5. Conclusions

Based on the experiments presented in this study, we can draw the following conclusions. (1) The experimental results show that the hybrid linear and nonlinear frameworks produce satisfactory forecasts in terms of SMAPE-ALL and rank-based measures. (2) The SARIMASVM and SARIMAFNN models outperform the other two hybrid methodologies, namely ARIMASVM and ARIMAFNN. (3) SARIMASVM outperforms all other candidate models.

During the experiment, we also found that attempts to improve the forecast accuracy of the individual models lead to overfitting on the training sets. That is, although the individual model may achieve better forecast performance, the hybrid model does not improve and may even perform worse. Therefore, the optimal parameters of the hybrid model should be further researched.

6. Acknowledgement

This research was funded by the National Science Foundation of China (No. 70771042 and No. 70731001) and supported by the Hubei Provincial Department of Education Project of Key Research Institute of Humanities and Social Sciences at Universities-Center for Modern Information Management.

Figure 4. The average ranks with 95% confidence limits for the multiple comparisons with the best test. The dashed line indicates that any method with a confidence interval above this line is significantly worse than the best. (Vertical axis: Average Rank, from 2 to 8. Horizontal axis: Model, in the order ARIMA, SARIMA, SVMs, FNN, ARIMASVM, ARIMAFNN, SARIMASVM, SARIMAFNN.)


7. References

[1] Martin J. Beckmann, F. Bobkoski, “Airline demand: An analysis of some frequency distributions,” “Naval Research Logistics Quarterly”, John Wiley & Sons, vol. 5, no. 1, pp. 43-51, 1958.
[2] Charles Lyle, “A statistical analysis of the variability in aircraft occupancy,” in Proceedings of 10th AGIFORS Symposium, 1970.
[3] Peter Paul Belobaba, “Air travel demand and airline seat inventory management,” Ph.D. thesis, Flight Transportation Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 1987.
[4] E. Shlifer, Y. Vardi, “An airline overbooking policy,” “Transportation Science”, Informs, vol. 9, no. 2, pp. 101-114, 1975.
[5] Marvin Rothstein, “Stochastic models for airline booking policies,” Ph.D. thesis, Graduate School of Engineering and Science, New York University, New York, NY, 1968.
[6] Marvin Rothstein, “An airline overbooking model,” “Transportation Science”, Informs, vol. 5, no. 2, pp. 180-192, 1971.
[7] Seraj Y. Abed, Abdullah O. Ba-Fail, Sajjad M. Jasimuddin, “An econometric analysis of international air travel demand in Saudi Arabia,” “Journal of Air Transport Management”, Elsevier, vol. 7, no. 3, pp. 143-148, 2001.
[8] Tobias Grosche, Franz Rothlauf, Armin Heinzl, “Gravity models for airline passenger volume estimation,” “Journal of Air Transport Management”, Elsevier, vol. 13, no. 4, pp. 175-183, 2007.
[9] Howard Grubb, Alexina Mason, “Long lead-time forecasting of UK air passengers by Holt-Winters methods with damped trend,” “International Journal of Forecasting”, Elsevier, vol. 17, no. 1, pp. 71-82, 2001.
[10] Jose D. Bermudez, Jose V. Segura, Enriqueta Vercher, “Holt-Winters forecasting: An alternative formulation applied to UK air passenger data,” “Journal of Applied Statistics”, Taylor & Francis, vol. 34, no. 9, pp. 1075-1090, 2007.
[11] Nam Kyungdoo, Schaefer Thomas, “Forecasting international airline passenger traffic using neural networks,” “Logistics and Transportation Review”, National Academy of Science, vol. 31, no. 3, pp. 239-251, 1995.
[12] K.P.G. Alekseev, J.M. Seixas, “A multivariate neural forecasting modeling for air transport - Preprocessed by decomposition: A Brazilian application,” “Journal of Air Transport Management”, Elsevier, vol. 15, no. 5, pp. 212-216, 2009.
[13] Nobuhiko Terui, Herman K. van Dijk, “Combined forecasts from linear and nonlinear time series models,” “International Journal of Forecasting”, Elsevier, vol. 18, no. 3, pp. 421-438, 2002.
[14] Guoqiang Peter Zhang, “Time series forecasting using a hybrid ARIMA and neural network model,” “Neurocomputing”, Elsevier, vol. 50, no. 1, pp. 159-175, 2003.
[15] Ping-Feng Pai, Chih-Sheng Lin, “A hybrid ARIMA and support vector machines model in stock price forecasting,” “Omega”, Elsevier, vol. 33, no. 6, pp. 497-505, 2005.
[16] Fang-Mei Tseng, Hsiao-Cheng Yu, Gwo-Hsiung Tzeng, “Combining neural network model with seasonal time series ARIMA model,” “Technological Forecasting & Social Change”, Elsevier, vol. 69, no. 1, pp. 71-87, 2002.
[17] Kuan-Yu Chen, Cheng-Hua Wang, “A hybrid SARIMA and support vector machines in forecasting the production values of the machinery industry in Taiwan,” “Expert Systems with Applications”, Elsevier, vol. 32, no. 1, pp. 254-264, 2007.
[18] Jae Kwon Bae, “Forecasting Decisions on Dividend Policy of South Korea Companies Listed in the Korea Exchange Market Based on Support Vector Machines,” “JCIT”, vol. 5, no. 8, pp. 186-194, 2010.
[19] Yukun Bao, Rui Zhang, Tao Xiong, Zhongyi Hu, “Forecasting Non-normal Demand by Support Vector Machines with Ensemble Empirical Mode Decomposition,” “AISS”, vol. 3, no. 3, pp. 81-91, 2011.
[20] Yao-hong Zhao, Ping Zhong, Kuai-ni Wang, “Application of Least Squares Support Vector Regression Based on Time Series in Prediction of Gas,” “JCIT”, vol. 6, no. 1, pp. 243-250, 2011.
