An Adaptive-Margin Support Vector Regression for Short-Term Traffic Flow Forecast Dali Wei1, Hongchao Liu2,* 1
2
Civil and Environmental Department, Texas Tech University, 10th and Akron, Lubbock, TX 79409 Civil and Environmental Department, Texas Tech University, 10th and Akron, Lubbock, TX 79409 Email:
[email protected]; Fax: 1- (806)7424168; Telephone: 1-(806)7422801 *Corresponding
1
author
An Adaptive-Margin Support Vector Regression for ShortTerm Traffic Flow Forecast ABSTRACT The day-to-day volatility of traffic series provides valuable information for accurately tracking the complex characteristics of short-term traffic such as stochastic noise and non-linearity. Recently, Support Vector Regression (SVR) has been applied for shortterm traffic forecasting. However, standard SVR adopts a global and fixed ε-margin, which not only fails to tolerate the day-to-day traffic variation, but also requires a blind and time-consuming searching procedure to obtain a suitable value for ε. In this work, on the ground of stochastic modeling of day-to-day traffic variation, we propose an adaptive SVR short-term traffic forecasting model. The time-varying deviation of the day-to-day traffic variation, described in a bi-level formula, is integrated into SVR as heuristic information to construct an adaptive ε-margin, in which both local and normalized factors are considered. Comparative experiments using filed traffic data indicate that the proposed model consistently outperforms the standard SVR with an improved computational efficiency. KEY WORDS: Short-term traffic forecasting, Support Vector Regression, ε-margin, day-to-day traffic variation 1. Introduction Short-term traffic forecasting plays an important role in proactive traffic control and operation. Usually, the predicting period required is less than 15 minutes. In such a short interval, traffic volume and velocity exhibit very complex characteristics such as stochastic noise and non-linearity (Jiang & Adeli, 2005; L. Li, Lin, & Liu, 2006; Wei, Chen, & Chen, 2010). A number of forecasting methods have been proposed regarding short-term traffic as a typical time series, such as the Auto Regression Integrated Moving Average (ARIMA) model (Williams, Durvasula, & Brown, 1998; Williams & Hoel, 1999, 2003), the NonParameter Regression model (Davis & Nihan, 1991) and the Neural Network (NN) models (Park, Messer, & Urbanik II, 1998; Smith, Williams, & Keith Oswald, 2002; Xie & Zhang, 2006). For example, Williams and Hoel (1999) applied traditional parametric Box-Jenkins time series models to the dynamic system of single point traffic flow forecasting, which addressed most of the parametric model concerns for traffic condition data by establishing a theoretical foundation for using seasonal ARIMA forecast models. The experiments of Smith et al. (2002) further indicated that the performances of the seasonal ARIMA are better than the nearest neighbor nonparametric regression model, where the Naive forecast model is applied as a benchmark model. Recently, Support Vector Regression (SVR), as a universal learning model, has drawn increasing attentions and successfully been applied in short-term traffic forecasting. 2
Various empirical studies indicate that SVR consistently outperforms traditional forecasting models like ES and ARIMA (Castro-Neto, Jeong, Jeong, & Han, 2009; F. Chen, Wei, & Tang, 2010; Wei et al., 2010; Y. Zhang & Xie, 2008). SVR using the 𝜀-insensitive loss function is in the form of (Vapnik, 1995): 0 𝑖𝑓 |𝑦 − 𝑓(𝑥, 𝜔)| < 𝜀 𝐿𝜉 (𝑦, 𝑓(𝑥, 𝜔)) = � (1) |𝑦 − 𝑓(𝑥, 𝜔)| − 𝜀 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 where 𝑓(𝑥, 𝜔) is the predicting function, 𝐿𝜉 is the loss function, 𝑦 is the observed value and 𝜀 is the margin parameter. As shown in Figure 1, if the data sample is inside the 𝜀tube (the tube between the two dash lines), SVR regards it as having no loss or error corresponding to the estimation of 𝑓(𝑥, 𝜔). Only the data samples outside the tube are taken into account in the optimization. The 𝜀-insensitive loss function would yield better generalization than other loss functions like Huber’s loss function, which not only measures the empirical risk (training error), but also controls the generalization ability and the tolerance of noise, while conferring the advantage of sparsity in the solution (Smola, Schölkopf, & Müller, 1998).
Figure 1: Illustration of 𝜀 insensitive function in SVR
However, in the context of short-term traffic forecasting, using a global and fixed 𝜀margin for all time period within a day lacks the flexibility to capture and tolerate the time-varying variation in daily traffic. SVR, similar to other supervised learning models, learns from historical traffic data (training samples) to gain the knowledge in the form of equation systems, and then predicts the future traffic states. Although showing a similar pattern, day-to-day traffic exhibits a prominent variation due to underlying stochastic travelling behavior. This has been well verified by a number of empirical traffic data analyses (Hellinga & Abdy, 2008; Levinson, Sullivan, & Bryson, 2006; J. Q. Li, 2011; Rakha & Van Aerde, 1995; Yin, 2008; L. Zhang, Yin, & Lou, 2010). Hence, to attain a satisfying predicting accuracy, SVR not only needs to learn the seasonal and cyclical patterns, but also needs to adequately tolerate and track this day-to-day variation. Specifically, this variation is time-dependent—different time periods exhibit different variation characteristics. Using a fixed 𝜀-margin in SVR obviously cannot adapt to this time-varying volatility. Furthermore, it is always a tricky task to determine a suitable value of 𝜀 in the fixedmargin SVR. Without prior experience, all optional values for 𝜀 have to be tried and compared using a K-fold cross-validation method, which is a time-consuming and blind searching procedure. We believe that integrating an adaptive 𝜀-margin in SVR to utilize the day-to-day traffic variation as heuristic information, will not only lead to a better 3
forecasting performance, but also avoid the computation complexity for determine a suitable global margin. This forms the basic motivation of this work. In this paper, an adaptive-margin SVR model for short-term traffic forecasting is proposed on the ground of a stochastic modeling of traffic variation. We establish a bilevel stochastic formula to describe the traffic variation, which combines the timevarying characteristic into the day-to-day traffic variation. The obtained time-varying deviation then is utilized to construct an adaptive 𝜀-margin of SVR, in which both local and normalized factors are considered. Four traffic datasets from Hefei, China and Seattle, USA are used in this paper for training our model and validate the performance. Our new model is compared with the fixed 𝜀 margin SVR, as well as the seasonal ARIMA, NN and Naive forecast.
The rest of this paper is organized as follows. In section 2, we briefly introduce the SVR, and then propose the stochastic model for the day-to-day traffic variance and the adaptive 𝜀 -margin in SVR. Section 3 presents comparative experiments to evaluate the performance of our model. Section 4 concludes the paper. 2. Model formulation The section briefly introduces some fundamentals of ε -SVR, and then presents the stochastic modeling of the day-to-day traffic variation and describes how to integrate this variation into the adaptive margin of SVR. 2.1 Support Vector Regression Support Vector Machine (SVM) is a universal learning method based on the statistical learning theory developed by Vapnik (1995). SVM embodies the structural risk minimization (SRM) principle to minimize an upper bound on the expected risk, which is opposed to empirical risk minimization (ERM) that aims to minimize the error on the training data. This feature equips SVM with a greater generalization performance. SVR is an application of SVM for function estimation problem. Given a set of training data: (𝑥𝑖 , 𝑦𝑖 ), 𝑖 = 1,2,3 … , 𝑙, 𝑥𝑖 ∈ ℛ𝑑 , 𝑦𝑖 ∈ ℛ (2) a linear function regression function can be stated as: 𝑓(𝑥, 𝜔) = 𝜔𝑇 𝜙(𝑥) + 𝑏 (3) in high-dimensional feature space ℱ, where ω is a vector in ℱ. Function ϕ(x) maps the input x into a vector in ℱ. The quality of estimation is measured by the loss function of L(y, f(x, ω)). SVR uses a new type of loss function, namely an ℰ-intensive loss function in Equation (1) proposed by Vapnik. The empirical risk is then: 1 𝑅𝑒𝑚𝑝 (𝜔) = ∑𝑛𝑖=1 𝐿ℰ (𝑦𝑖 , 𝑓(𝑥, 𝜔)) (4) 𝑛
SVR performs linear regression in the high-dimension feature space using the ℰ-intensive loss function and, at the same time, tries to reduce model complexity by minimizing ‖𝜔‖2 . This is described by introducing slack variables, 𝜉𝑖 , 𝜉𝑖∗ 𝑖 = 1, … , 𝑛, to 4
measure the deviation of training samples outside the ℰ -tube. Thus, SVR is then formulated as: 𝑛 1 2 min ‖𝜔‖ + 𝐶 �(𝜉𝑖 + 𝜉𝑖∗ ) 2 𝑖=1
𝑦𝑖 − 𝑓(𝑥𝑖 , 𝜔) ≤ ℰ + 𝜉𝑖∗ 𝑠. 𝑡. � 𝑓(𝑥𝑖 , 𝜔) − 𝑦𝑖 ≤ ℰ + 𝜉𝑖 𝜉𝑖 , 𝜉𝑖∗ ≥ 0, 𝑖 = 1, … , 𝑛
1
(5)
The ‖𝜔‖2 in Equation (5) is called the regularization term. 𝐶 is the regularized fixed for 2 determining the trade-off between the empirical risk and the regularization term. Increasing the value of 𝐶 will result in the relative importance of the empirical risk with respect to the regularization term to grow. We can view the goal defined by Equation (6) as: 1 𝑅𝑟𝑒𝑔 (𝜔) = min ‖𝜔‖2 + 𝑅𝑒𝑚𝑝 (𝜔) (6) 2
The ERM principle only considers the term 𝑅𝑒𝑚𝑝 (𝜔), which is the risk on the given dataset. When dealing with few data in very high-dimensional spaces, this may lead to over-fitting and thus poor generalization properties. Hence the SVR method adds a 1 capacity control term— ‖𝜔‖2 , which leads to the regularized risk function. 2
To solve the optimization problem of Equation (5), the objective function can be transformed to a dual problem, and its solution is given by: 𝑛𝑠𝑣 (𝛼𝑖 − 𝛼𝑖∗ )𝐾(𝑥𝑖 , 𝑥) 𝑓(𝑥) = ∑𝑖=1 𝑠. 𝑡. 0 ≤ 𝛼𝑖 ≤ 𝐶, 0 ≤ 𝛼𝑖∗ ≤ 𝐶 (7) where 𝑛𝑠𝑣 is the number of support vectors and 𝐾(𝑥𝑖 , 𝑥) is the kernel function. So with Equation (6), the one-step SVR forecasting model is illustrated in Figure 2. The input variables of this model are 𝐹(𝑡 − 𝑘) ,…, 𝐹(𝑡), where 𝐹(𝑡) denotes the traffic volume at time period 𝑡. The one-step model prediction is 𝐹(𝑡 + 1). Volume
t F(t-k)
F(t-k+i)
F(t)
F(t+1)
SVR Figure 2: Illustration of SVR short-term traffic forecasting model
As stated in section 1, the 𝜀-insensitive loss function not only measures the empirical risk (training error), but also controls the generalization ability and the tolerance of noise. It 5
is well known that a suitable value of 𝜀 should depend on the input variation level 𝜀 ∝ 𝜎, as well as the number of training samples. In the context of short-term traffic forecasting, the traffic variation exhibits a time-varying stochastic characteristic, which demands an adaptive 𝜀-margin to accommodate the varying level of variation. Next, we will develop a time-varying model for describing day-to-day traffic variation and then present the adaptive 𝜀-margin in SVR. 2.2 Modeling Stochastic Variation in Day-to-Day Traffic
Although showing a typical seasonal and cyclical pattern, daily traffic also exhibits a prominent day-to-day variation, which has been verified by a number of empirical traffic data analyses. Levinson et al. (2006) analyzed weekday traffic data from 22 continuous traffic counting stations in the city of Milwaukee and computed the coefficient of variation (COV) of peak hour traffic volume, which ranged from 0.048 to 0.155 with a mean of 0.089. Yin (2008) collected traffic data during 9–11AM for an intersection of Gainesville, Florida and showed that the traffic flow varies significantly within that time interval. Hellinga and Abdy (2008) investigated the observed traffic data in Ontario, Canada and found the variation of day-to-day hour volume can be best described by the normal distribution. They also analyzed the daily traffic volume for a 12 month period at city of Toronto and concluded that the hourly volumes follow a normal distribution (Abdy & Hellinga, 2008). Rakha and Van Aerde (1995) investigated traffic volumes at Orlando, Florida for 22 days and validated that 5-min traffic counts for weekday traffic is normally distributed with a confidence level of 95%. Based on these empirical findings, we formulate the traffic volume 𝑓(𝑡, 𝑖) at time period 𝑡 of day 𝑖 as a random distribution with a mean of 𝑓𝑒 (𝑡) and variance of 𝜎 2 (𝑡). So we have the day-to-day variation in the form of: ∆𝑓(𝑡) = [𝑓(𝑡, 𝑖) − 𝑓(𝑡, 𝑗)] ∼ 𝐷�0, 𝜎 2 (𝑡)� (8) 2 (𝑡) 2 (𝑡) where 𝜎 = 2𝜎∗ with the assumption that the distribution of the traffic volumes at the same time period 𝑡 for day 𝑖 and day 𝑗 are independent. 𝐷 is the random distribution of the day-to-day traffic volume variations. For determining the variance 𝜎(𝑡) in Equation (8), we introduce a bimodal formula as Equation (9), which is able to describe the time-varying characteristic of the day-to-day traffic variation. 𝜎(𝑡) = 𝜎0 �𝑝𝑒
𝑡−𝜇
−� 𝜎 1 � 1
2
+ (1 − 𝑝)𝑒
𝑡−𝜇
−� 𝜎 2 � 2
2
�
(9)
Equation (9) is derived from the empirical variation data shown in Figure 3, which plots the day-to-day variation from five-minute traffic counts on five consecutive weekdays collected in Huangshan Avenue in Hefei, China. Three features can be easily observed from the figure. First, the variation is obviously time-varying within a day and shows different magnitude of deviations; secondly, the variation is symmetric with the zero axis; last but not least, two peak variations can be identified, representing the morning and afternoon peak hours. With well-calibrated parameters, these features can be characterized by the proposed Equation (9).
6
Day to day traffic volume variation (Normalized) vehicle per 5 minutes
0.5 0.4 0.3 0.2 0.1 0 -0.1 -0.2 -0.3 -0.4 -0.5 1440
1080
720 Time (minutes)
360
0
Figure 3: Day-to-day traffic variation of the traffic dataset from Hefei, China
Using Equation (8) together with Equation (9), the day-to-day traffic variation is then modeled as: ∆𝑓(𝑡) ∼
𝑡−𝜇
1� −� 𝐷 �0, 𝜎0 �𝑝𝑒 𝜎1
2
+ (1 −
𝑡−𝜇
2
2� −� 𝑝)𝑒 𝜎2
��
(10)
In Equation (10), at the day-to-day level, the traffic variation at certain time 𝑡 is subject to a random distribution with a mean of zero. Meanwhile within a day, the variances of the at different time periods 𝑡 are associated by a bimodal function to represent the timevarying characteristics observed from field data. To find the parameters of the distribution, the Max-likelihood Estimation (MLE) method is applied to identify the deviation 𝜎(𝑡) in day-to-day traffic, which requires the maximization of the log-likelihood function: 𝑛 𝑛 1 ln ℒ�𝜎 2 (𝑡)� = − ln 2𝜋 − ln𝜎 2 (𝑡) − 2(𝑡) ∑𝑛𝑖=1 Δ(t)2𝑖 (11) 2
2
2𝜎
where 𝑛 is the number of variation observations and ∆ is the day-to-day traffic variation. Taking derivatives with respect to 𝜎 2 (𝑡) and solving the resulting system of first order conditions yields the maximum likelihood estimations: 1 �(𝑡))2 ] 𝜎� 2 (𝑡) = ∑𝑛𝑖=1[(Δ𝑖 (𝑡) − Δ (12) 𝑛
𝜎� 2 (𝑡) is then associated in a compact function form of Equation (8) by the Non-Linear Least Square (NLLS) method, which finds the optimal parameter set of {𝑝, 𝜇1 , 𝜇2 , 𝜎1 , 𝜎2 , 𝜎0 } to minimize: Ψ = � �𝜎0 =
𝑡−𝜇1 2 −� � �𝑝𝑒 𝜎1
+ (1 −
𝑡−𝜇
−� 𝜎 1 � 1 ∑𝐾 �𝑝𝑒 �(𝜎 0 𝑘=1
2
𝑡−𝜇2 2 −� � 𝑝)𝑒 𝜎2 �
+ (1 −
𝑡−𝜇
2
− 𝜎�(𝑡)� 𝑑𝑡
2� −� 𝑝)𝑒 𝜎2
2
� − 𝜎�(𝑘))2 �
(13)
where 𝐾 is the number of observations within a day. A widely-accepted computation method, the Gauss-Newton method is applied to solve the NLLS of Equation (13). 7
2.3
Adaptive margin
On the ground of stochastic modeling the time-varying day-to-day traffic variation, this section describes how to integrate this variation to construct an adaptive 𝜀-margin. It is well known that 𝜀 should depend on the data variation (Cherkassky & Ma, 2004; Smola et al., 1998), and therefore we proposed a new time-varying 𝜀-margin as: 𝜀(𝑡) = �� 𝜆1 𝑔(𝜎(𝑡)) �) ����� + 𝜆��� 2 𝑔(𝜎 𝑙𝑜𝑐𝑎𝑙 𝑣𝑎𝑟𝑖𝑎𝑡𝑖𝑜𝑛 𝑡𝑒𝑟𝑚
𝑠. 𝑡. 𝜆1 + 𝜆2 = 1
𝑛𝑜𝑟𝑚𝑎𝑙𝑖𝑧𝑒𝑑 𝑡𝑒𝑟𝑚
(13)
The 𝜆1 𝑔(𝜎(𝑡)) term describes the local variation, while 𝜆2 𝑔(𝜎�) contains the normalized variation information. The parameters of 𝜆1 and 𝜆2 provide a tradeoff between local and global information. The proposed time-varying form of 𝜀-margin includes the fixed margin as a special case if 𝜆1 = 0. When 𝜆2 = 0, the margin then solely relies on local variation. As stated in section 2.1, 𝜀 is also dependent on the number of training samples 𝑛. We adopt an empirical selection proposed by Cherkassky and Ma (2004) as: ln 𝑛
𝑔(𝜎) = 3𝜎�
𝑛
(14)
Combining Equation (13) and Equation (14), the proposed adaptive margin is ln 𝑛
𝜀(𝑡) = 3(𝜆1 𝜎(𝑡) + 𝜆2 𝜎�)�
(15)
𝑛
With this new margin, the SVR optimization problem is then re-formulated as: 𝑛
1 min ‖𝜔‖2 + 𝐶 �(𝜉𝑖 + 𝜉𝑖∗ ) 2 𝑖=1
𝑠. 𝑡.
⎧𝑦𝑖 − 𝑓(𝑥𝑖 , 𝜔) ≤ 𝜏(𝜆1 𝜎(𝑡) + 𝜆2 𝜎�)�ln 𝑛 + 𝜉𝑖∗ 𝑛 ⎪ ln 𝑛
⎨ 𝑓(𝑥𝑖 , 𝜔) − 𝑦𝑖 ≤ 𝜏(𝜆1 𝜎(𝑡) + 𝜆2 𝜎�)� ⎪ 𝜉𝑖 , 𝜉𝑖∗ ≥ 0, 𝑖 = 1, … , 𝑛 ⎩
𝑛
+ 𝜉𝑖
3. Experiments and Discussion
This section presents comparative experiments to validate our adaptive-margin SVR using four different field data. The performance is compared with fixed-margin SVR determined by the method of cross-validation, as well as the seasonal ARIMA model, RBF-NN model and the Naive forecast method. 3.1 Experimental settings 8
(16)
3.1.1 Data Normalization and Preprocessing Four field datasets from Hefei, China and Seattle, USA are used in this experiment. The first dataset is collected from Huangshan Avenue in Hefei, China. It contains traffic counts for five consecutive weekdays from Sep. 15th to Sep. 19th 2008 in normal traffic conditions without incidents and adverse weather. The second dataset contains traffic counts of eight weeks from March 26th to May 20th, 2007 on I-90 (north bound of station cabinet ES-855D), obtained from the Traffic Data Acquisition and Distribution (TDAD) database maintained by an ITS research group at the University of Washington. The third dataset also contains eight weeks of traffic counts from Mar 19th to May 13th on SR-18 (east bound of station cabinet ES-394D). The fourth dataset contains eight weeks of traffic counts from April 2nd to May 26th on SR525 (south bound of station cabinet ES-766D). The traffic counts are aggregated into 5 minutes interval and normalized into [0,1] by: 𝑦𝑛𝑜𝑟𝑚 =
𝑦−𝑦𝑚𝑖𝑛
(17)
𝑦𝑚𝑎𝑥 −𝑦𝑚𝑖𝑛
where 𝑦𝑚𝑎𝑥 and 𝑦𝑚𝑖𝑛 are the maximum and minimum traffic counts in the dataset, respectively. Since the data series contains obvious outliers with zero or extremely large traffic volume. The statistics of the four datasets are summarized in Table 1. Table 1: Descriptive Statistics of experimental datasets
Collection Periods
Series length
Mean (vehicle/5minutes)
Hefei Dataset
th
th
Sep. 15 to Sep. 19 2008 th
th
Standard Deviation
1440
21.6671
16.0877
ES-855D
Mar. 26 to May 20 2007
16128
204.1465
141.5130
ES-394D
Mar. 19th to May 13th 2007
16128
54..4217
36.1245
16128
70.1395
46.3563
ES-766D
nd
th
Apr. 2 to May 26 2007
3.1.2 Auto-Correlation The input dimension is important for establishing the prediction model. A suitable input dimension is usually determined by the statistical autocorrelation function (ACF), which measures the correlation between data points in a time series separated by different time lags (Jiang & Adeli, 2005). The smallest time lag which makes ACF equal to zero was used to determine the input dimension. Figure 4 shows the ACF for the four datasets. For the Hefei dataset (Fig. 4a), the ACF curve first intersects with zero axes at a time lag of 6 hours, and thus the input dimension is then selected to be 72. For the other three datasets from Seattle, the ACF is plotted in Figure 5(b)-(d) and the input dimension is selected to be 60.
9
Besides 𝜀, other parameters including the kernel parameter 𝜆 and the regularization parameter 𝐶 are determined by cross-validation. The SVR program is implemented based on the SVM-KM toolbox (Canu, Grandvalet, Guigue, & Rakotomamonjy, 2005). 1
1
0.8 0.6 0.5
0.2
ACF
ACF
0.4
0 -0.2
0
-0.4 -0.6 -0.8 0
8
16
32
24 Hour
40
-0.5 0
48
16
32
48
Hour
(a)
(b) 1
1 0.8 0.6
0.5
ACF
ACF
0.4 0.2 0
0
-0.2 -0.4 -0.6 0
-0.5
32
16
48
16
0
Hour
32
48
Hour
(c)
(d)
Figure 4: ACF of the traffic series data (a) dataset from Hefei; (b) ES766; (c) ES394; (d) ES855
3.2 Variation Calibration Result As described in section 2, the variation for day-to-day traffic is modeled by Equation (9) and a combination method of MLE and NLLS is adopted to calibrate the parameters. The results are summarized in Table 2. Table 2: Parameter sets in variation equation (9)
Hefei Dataset ES-855D ES-766D ES-394D
𝝈𝟎 𝒑
𝝈𝟎 (𝟏 − 𝒑)
540.0
𝝁𝟐
948.0
𝝈𝟏
114.4
579.5
0.0749
0.1309
𝝁𝟏
0.1019
356.7
1008.5
183.1
403.1
0.0608
0.08010
334.9
912.5
171.5
411
0.0851 0.0448
0.05345
296.8
971.0
198.0
𝝈𝟐
394.3
The time-varying deviations of variations are then plotted in Figure 6. We can see that for the tested datasets the deviations of the afternoon peak is slightly smaller than the morning peak, and the descending branch is also smoother compared with the morning peak. For the datasets from Seattle, the deviation of the morning peak is obviously larger than the afternoon peak. The traffic counts from ES-855 shows relatively larger 10
deviations than other datasets, which is easy to understand and consistent with the deviation of the four time series listed in Table 1. 0.12 Heifei dataset ES-394D dataset ES-855D dataset ES-766D dataset
0.1
σ(t)
0.08
0.06
0.04
0.02
0 0
360
720 Time (minutes)
1080
1440
Figure 5: Time-varying deviation in day-to-day traffic
3.3 Performance Evaluation and Discussion This section presents the comparative results of our adaptive SVR with fixed 𝜀 on all the four datasets first, then three forecasting models including ARIMA, RBF-NN, and Naïve forecast methods are implemented on the three large datasets from Seattle. 3.3.1 Performance evaluation with Fix-margin SVR Three error indicators are used in this experiment, including Mean Absolute Error (MAE), Mean Sum of Squares of Errors (MSE), and Normalized Square Root of the Mean Sum of Squares of Errors (NSRMSE). The NSRMSE measures the MSE in a normalized way regarding the level of the deviation in the data itself. 1 (18) 𝑀𝐴𝐸 = ∑𝑁 �𝑖 − 𝑦𝑖 | 𝑖=1|𝑦 𝑁
1
𝑀𝑆𝐸 = ∑𝑁 �𝑖 − 𝑦𝑖 )2 𝑖=1(𝑦 𝑁
1
𝑁𝑆𝑅𝑀𝑆𝐸 = �𝑁1
𝑁
∑𝑁 � 𝑖 −𝑦𝑖 )2 𝑖=1(𝑦 ∑𝑁 �)2 𝑖=1(𝑦𝑖 −𝑦
(19)
(20)
The adaptive-margin SVR model is compared with the fixed 𝜀-SVR first, where the parameter of 𝜀 is usually determined by cross-validation. In K-fold cross-validation, the original sample is randomly partitioned into K subsamples. Of the K subsamples, a single subsample is retained as the validation data for testing the model, and the remaining K−1 subsamples are used as the training data. The cross-validation process is then repeated K times (the folds), with each of the K subsamples used exactly once as the validation data. Although cross-validation is widely adopted to determine a suitable fixed margin 𝜀 in SVR, it is a time-consuming and blind searching procedure with a computational complexity of 𝑂(𝐾𝑅), where 𝑅 is the range of the parameter (F. Chen, Wei, & Tang, 2011). In contrast, the adaptive-margin SVR avoids this blind searching procedure and 11
improved the computational efficiency through establishing an adaptive margin, which is heuristically determined by the day-to-day traffic variations. For the dataset collected from Hefei, traffic counts on Monday are used as the training samples, and the validation dataset contains traffic volumes from Tuesday to Friday. Results with different combination of 𝜆1 and 𝜆2 are listed in Table 3. The predicted and observed traffic counts on Tuesday are plotted in Figure 7. The NRSSME is plotted in Figure 6, from which we can see that the predicting errors are descending for the consecutive four weekdays with an increase in the proportion of local variation. For nearly all the four of the predicting weekdays, the lowest predicting errors are obtained when 𝜆1 = 1 and 𝜆2 = 0, which indicates utilizing local variation information without normalized or global information can achieve a better performance. It is also interesting to note that since the traffic on Friday shows a different pattern from typical weekdays, the prediction errors also tend to be slightly larger. Table 4 gives the forecasting errors between fixed and adaptive margin SVRs. It can be seen that compared with the time-consuming K-fold cross-validation procedure in the fixed-margin SVR, the proposed adaptive-margin SVR not only improves the computational efficiency by heuristically determining an adaptive margin, but also consistently outperforms the fixed-margin SVR on the forecasting accuracy. NRSMSE for adaptive-margin NRSMSE for fixed-margin 0.56
0.52
0.558
NRSMSE
0.515 0.556 0.554
0.51
0.552 0.55 0.548 -1 0.585
0.505 Wednesday
Tuesday
-0.6
-0.2
0.2
0.6
1
0.58
0.5 -1 0.59
-0.6
-0.2
0.2
0.6
1
0.585
0.575
0.58
0.57 0.575 Thursday -1
-0.6
-0.2
0.2
λ 1-λ 2
0.6
1
Friday
-1
-0.5
0
λ 1-λ 2
0.5
1
Figure 6: Predicting results with different combination of λ1 and λ2
Table 3: Predicting errors for adaptive margin SVR for the dataset in Hefei
Tuesday
𝝀𝟏 − 𝝀𝟐
MAE NSRMSE
-1
-0.6
-0.2
0.2
0.6
1
5.828 0.5581
5.834 0.5586
5.807 0.5575
5.773 0.5539
5.773 0.5517
5.766 0.5492
12
Wednesday
Thursday
Friday
MSE MAE NSRMSE MSE MAE NSRMSE MSE MAE NSRMSE MSE
79.92 5.168 0.5182 68.57 5.828 0.5816 82.07 6.208 0.5834 83.29
80.06 5.114 0.5146 67.62 5.807 0.5791 81.37 6.181 0.5809 82.58
79.75 5.086 0.5132 67.25 5.773 0.5775 80.92 6.140 0.5784 82.12
78.72 5.052 0.5090 66.15 5.760 0.5748 80.16 6.079 0.5750 81.35
78.10 5.032 0.5051 65.14 5.692 0.5705 78.97 6.038 0.5731 80.14
77.39 5.005 0.5013 64.17 5.610 0.5662 77.78 6.032 0.5733 78.94
Table 4: Predicting errors for fixed-margin SVR and adaptive for the dataset in Hefei
𝜺 margin
Tuesday
Wednesday
Thursday
Friday
Adaptive margin SVR
Fixed margin SVR
5.766 0.5492 77.39 5.005 0.5013 64.17 0.0825 0.5662 77.78 5.610 0.5834 78.94
5.8072 0. 5575 79.74752 5.1068 0.5146 64.27 5.746 0.5787 82.30311 6.154 0.5895 85.56835
MAE NSRMSE MSE MAE NSRMSE MSE MAE NSRMSE MSE MAE NSRMSE MSE
1 Predicted Value Observed Value
Traffic volume per 5 minutes (Normalizd)
0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0
360
720
1080
Time (minutes)
Figure 7: Illustration of predicted and observed traffic counts on Tuesday for the dataset collected in Hefei
Similar results are obtained for the three datasets from Seattle as listed in Table 5. The traffic counts of the first five weekdays for the collection periods are used as the training samples. The model is then evaluated with the traffic counts of the last four weeks. Figure 8 plots the NSRMSE errors for different combination of 𝜆1 and 𝜆2 , from which we can see that the best performances are also obtained for all datasets when the local variations are fully utilized. The errors listed in Table 6 also indicate that the predicting accuracy with the adaptive margin consistently outperforms the fixed margin SVR.
13
NSRMSE
0.2
0.195
ES394 0.19 -1
-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1
0.8
1
NSRMSE
0.22 0.215 0.21
NSRMSE
0.205 -1 0.13
ES766 -0.8
0.125 0.12 0.115 0.11
NSRMSE with adpative margin NSRMSE with fixed-margin
ES855
0.105 -1
-0.8
-0.6
-0.4
0
-0.2
λ 1-λ 2
0.2
0.4
0.6
Figure 8: Predicting results with different combination of λ1 and λ2
Table 5: Predicting errors for adaptive margin SVR for the dataset in TDAD
ES394
ES766
ES855
𝝀𝟏 − 𝝀𝟐
MAE NSRMSE MSE MAE NSRMSE MSE MAE NSRMSE MSE
-1
-0.6
-0.2
0.2
0.6
1
5.434 0.1981 56.67 7.532 0.2132 112.6 13.83 0.1244 356.89
5.333 0.1955 55.19 7.477 0.2104 109.7 13.56 0.1214 340.0
5.120 0.1953 55.08 7.421 0.2093 108.6 12.34 0.1203 310.9
5.148 0.1922 53.34 7.425 0.2076 106.8 12.77 0.1161 333.7
5.015 0.1902 52.24 7.400 0.2062 105.4 12.57 0.1149 304.5
5.013 0.1901 52.18 7.370 0.2059 105.1 12.63 0.1153 306.6
Table 6: Predicting errors for fixed-margin SVR and adaptive for the dataset in Hefei
ES394
ES766
ES855
𝜺 margin
Adaptive margin SVR
Fixed margin SVR
5.013 0.1901 52.1819 7.370 0.2059 105.1 12.63 0.1153 306.6
5.412 0.1966 55.0757 7.457 0.2149 109.7 13.73 0.1247 331.6
MAE NSRMSE MSE MAE NSRMSE MSE MAE NSRMSE MSE
14
3.3.2 Model comparisons Above experiments demonstrated that the adaptive margin SVR model consistently outperformed the fixed margin SVR forecast. Next, comparisons with three forecasting models including seasonal ARIMA, Radius Basis Function NN (RBF-NN) and Naive forecast are conducted. For the structure of ARIMA, in Williams et al. (1998), Williams and Hoel (2003) and follow-on research (Smith et al., 2002), the seasonal ARIMA (1, 0, 1) (0, 1, 1)s consistently emerged as the preferred structure. For the data with 5 minutes resolution, the seasonal cycle length 𝑆 for a week is 2016, thus the recursive prediction equation is written as: y� t+1 = yt−2015 + ∅1 (yt − yt−2016 ) − θ1 (yt − y� t ) − Θ1 (yt−2015 − y� t−2015 ) +θ1 Θ1 (yt−2016 − y� t−2016 )
(21)
The parameters in Eq. (21) are then estimated using the Econometrics toolbox of Matlabtm(2012a) with the calibration dataset of the first four weeks traffic count (8064 observations) as listed in table 5. With the calibrated parameters, the model is then evaluated by iteratively forecasting the traffic counts of the second four weeks. The errors are calculated regarding the weekday traffic. Table 7: ARIMA parameters
ES394
0.9533
0.6127
0.4950
ES766
0.9280
𝜽𝟏
0.7492
0.5191
ES855
0.9510
0.4863
0.5027
Dataset
∅𝟏
𝜣𝟏
The architecture of the RBF-NN is determined by the package of DTREG (Sherrod, 2003), which applies the evolutionary approach developed by S. Chen, Hong, Harris, and Sharkey (2004) to determine the optimal parameter set. Same as the adaptive SVR model, traffic counts of the first week are used as the training samples. The third comparison model as suggested in Smith et al. (2002) is the Naive forecast in the form of: yt y� t+1 = yt+1, hist (22) yt, hist
where yt, hist is the historical average of the traffic flow rate at the time-of-day and dayof-week associated with time interval t. Forecasting models that could not be shown to consistently produce more accurate forecast than this naive method would be of little value.
The performances of above comparing models, as well as the adaptive SVR model, are listed in Table 7. It can be seen that the Naive forecast shows the largest errors for all three datasets, and the adaptive margin SVR model consistently outperforms other comparing models. In terms of NSRMSE, the adaptive-margin SVR model improves the forecasting accuracy by 6%, 7% and 2% with ES394, ES766 and ES855 datasets, respectively. Furthermore, SVR model only used one-week traffic data as the training 15
samples, while the Naive and ARIMA models both utilized traffic counts of four consecutive weeks. Table 7: Prediction errors for different models
Models
Adaptive margin SVR
Naive Forecast
RBF-NN
ARIMA (1,0,1)(0,1,1)1026
ES394
MAE MSE NSRMSE
5.013 52.1819 0.1901
6.964 95.90 0.2564
5.420 55.58 0.1962
ES766
MAE MSE NSRMSE
7.357 105.0 0.2059
9.554 182.8 0.2722
7.655 110.8 0.2114
7.97 126.9 0.2243
MAE MSE NSRMSE
12.53 284.9 0.1113
15.00 439.3 0.1382
13.03 319.6 0.1177
14.85 428.8 0.1365
ES855
5.92 69.97 0.2159
4. Conclusion and future work Accurately forecasting short-term traffic is a challenging task due to its complex characteristics. This paper proposed an adaptive-margin SVR with integration of day-today traffic variation, which can adequately tolerate and track the time-varying volatility. The adaptive-margin SVR avoids this blind searching procedure and improves the computational efficiency through establishing an adaptive and time-varying 𝜀 margin, which is heuristically determined by the day-to-day traffic variations. Experimental results with field data indicate that the adaptive 𝜀 with only the local variation can achieve better performance compared with combinations of the global variation. The adaptive-margin model consistently outperforms the fixed 𝜀 margin SVR with improved computational efficiency. The impact of the training samples is also discussed, demonstrating that a complete set of daily and weekly traffic counts plays a fundamental role for SVR learning a seasonal traffic pattern. We only investigated the 𝜀 margin of the SVR model in this paper. However, other parameters in SVR such as the regulation parameter 𝐶 also may affect the learning ability and forecasting accuracy. Combinations of heuristically determined parameter set including both the 𝜀 margin and regulation parameter are expected to further improve the performance of the forecasting model. And now this is being investigated by the authors. Reference Abdy, Z. R., & Hellinga, B. R. (2008). Use of Microsimulation to Model Day-to-Day Variability of Intersection Performance. Transportation Research Record: Journal of the Transportation Research Board, 2088(-1), 18-25. Canu, S., Grandvalet, Y., Guigue, V., & Rakotomamonjy, A. (2005). Svm and kernel methods matlab toolbox. Perception Systemes et Information, INSA de Rouen, Rouen, France, 2, 2. Castro-Neto, M., Jeong, Y. S., Jeong, M. K., & Han, L. D. (2009). Online-SVR for short-term traffic flow prediction under typical and atypical traffic conditions. Expert Systems with Applications, 36(3), 6164-6173.
16
Chen, F., Wei, D., & Tang, Y. (2010). Wavelet analysis based sparse LS-SVR for time series data. Paper presented at the Bio-Inspired Computing: Theories and Applications (BIC-TA), 2010 IEEE Fifth International Conference on. Chen, F., Wei, D., & Tang, Y. (2011). Virtual Ion Selective Electrode for Online Measurement of Nutrient Solution Components. Sensors Journal, IEEE, 11(2), 462-468. Chen, S., Hong, X., Harris, C. J., & Sharkey, P. M. (2004). Sparse modeling using orthogonal forward regression with PRESS statistic and regularization. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, 34(2), 898-911. Cherkassky, V., & Ma, Y. (2004). Practical selection of SVM parameters and noise estimation for SVM regression. Neural Networks, 17(1), 113-126. doi: 10.1016/s0893-6080(03)00169-2 Davis, G. A., & Nihan, N. L. (1991). Nonparametric regression and short-term freeway traffic forecasting. Journal of Transportation Engineering, 117(2), 178-188. Hellinga, B., & Abdy, Z. (2008). Signalized intersection analysis and design: Implications of dayto-day variability in peak-hour volumes on delay. Journal of Transportation Engineering, 134(7), 307-318. Jiang, X., & Adeli, H. (2005). Dynamic wavelet neural network model for traffic flow forecasting. Journal of Transportation Engineering, 131(10), 771-779. Levinson, H. S., Sullivan, D., & Bryson, R. W. (2006). Effects of Urban Traffic Volume Variations on Service Levels. Paper presented at the Transportation Research Board 85th Annual Meeting. Li, J. Q. (2011). Discretization modeling, integer programming formulations and dynamic programming algorithms for robust traffic signal timing. Transportation Research Part C: Emerging Technologies, 19(4), 708-719. Li, L., Lin, W., & Liu, H. (2006). Type-2 fuzzy logic approach for short-term traffic forecasting. Paper presented at the Intelligent Transport Systems, IEE Proceedings. Park, B., Messer, C. J., & Urbanik II, T. (1998). Short-term freeway traffic volume forecasting using radial basis function neural network. Transportation Research Record: Journal of the Transportation Research Board, 1651(-1), 39-47. Rakha, H., & Van Aerde, M. (1995). Statistical analysis of day-to-day variations in real-time traffic flow data. Transportation research record, 26-34. Sherrod, P. H. (2003). DTREG predictive modeling software. Software available at http://www. dtreg. com. Smith, B. L., Williams, B. M., & Keith Oswald, R. (2002). Comparison of parametric and nonparametric models for traffic flow forecasting. Transportation Research Part C: Emerging Technologies, 10(4), 303-321. Smola, A. J., Schölkopf, B., & Müller, K. R. (1998). The connection between regularization operators and support vector kernels. Neural Networks, 11(4), 637-649. Vapnik, V. (1995). The nature of statistical learning theory: Springer-Verlag New York, Inc. Wei, D., Chen, F., & Chen, C. (2010). Least-Square Wavelet-Support Vector Machine for Shortterm Traffic Flow Forecast. The Mediterranean Journal of Measurement and Control, 6(2), 39-45. Williams, B. M., Durvasula, P. K., & Brown, D. E. (1998). Urban freeway traffic flow prediction: application of seasonal autoregressive integrated moving average and exponential smoothing models. Transportation Research Record: Journal of the Transportation Research Board, 1644(-1), 132-141. Williams, B. M., & Hoel, L. A. (1999). Modeling and forecasting vehicular traffic flow as a seasonal stochastic time series process. Williams, B. M., & Hoel, L. A. (2003). Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results. Journal of Transportation Engineering, 129(6), 664-672.
17
Xie, Y., & Zhang, Y. (2006). A wavelet network model for short-term traffic volume forecasting. Journal of Intelligent Transportation Systems, 10(3), 141-150. Yin, Y. (2008). Robust optimal traffic signal timing. Transportation Research Part B: Methodological, 42(10), 911-924. Zhang, L., Yin, Y., & Lou, Y. (2010). Robust Signal Timing for Arterials under Day-to-Day Demand Variations. Transportation Research Record: Journal of the Transportation Research Board, 2192(-1), 156-166. Zhang, Y., & Xie, Y. (2008). Forecasting of short-term freeway volume with v-support vector machines. Transportation Research Record: Journal of the Transportation Research Board, 2024(-1), 92-99.
18