Computers & Geosciences 59 (2013) 148–155

Improvement of the multilayer perceptron for air quality modelling through an adaptive learning scheme

K.I. Hoi, K.V. Yuen, K.M. Mok*

Department of Civil and Environmental Engineering, University of Macau, Av. Padre Tomás Pereira, Taipa, Macau SAR, China

Article history: Received 3 September 2012; received in revised form 29 April 2013; accepted 10 June 2013; available online 19 June 2013.

Keywords: Extended Kalman filter; Multilayer perceptron; PM10; Time-varying multilayer perceptron

Abstract

The multilayer perceptron (MLP), normally trained by the offline backpropagation algorithm, cannot adapt to a changing air quality system and consequently underperforms. To improve this, the extended Kalman filter is adopted into the learning algorithm to build a time-varying multilayer perceptron (TVMLP) in this study. Application of the TVMLP to model the daily averaged concentration of respirable suspended particulates with aerodynamic diameter of not more than 10 μm (PM10) in Macau shows statistically significant improvement in the performance indicators over the MLP counterpart. In addition, the adaptive learning algorithm addresses explicitly the uncertainty of the prediction, so that confidence intervals can be provided. More importantly, the adaptiveness of the TVMLP improves the predictions in the region of higher particulate concentrations that concerns the public.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

With its exceptional function approximation capability, the multilayer perceptron (MLP) with the backpropagation learning algorithm (Hornik et al., 1989) has been a popular technique in air quality modelling (Mok and Tam, 1998; Cobourn et al., 2000; Plochl, 2001; Lu and Chang, 2005; Chattopadhyay and Chattopadhyay, 2009; Voukantis et al., 2010). However, these conventional MLPs are generally trained in an offline manner; i.e., the network weights and biases are assumed constant after training. This assumption is appropriate for modelling time-invariant systems, but it does not always hold for the air quality systems of many urban agglomerations worldwide because of their ongoing development. Macau, a global gaming Mecca located on the southeast coast of China on the western bank of the Pearl River Delta, is one city undergoing such changes. The energy consumption accompanying its urbanization has increased the emissions of air pollutants. In addition, the urbanization process can also influence the local meteorology, which directly governs the dispersion of the air pollutants. Studies indicate that urbanization in the Pearl River Delta region can have significant impacts on the land–sea breezes and regional circulations, thereby affecting the airflow pattern and air quality over the region (Lu et al., 2009). Urbanization therefore makes the air quality in the region a time-varying system. For this reason, it was

* Corresponding author. Tel.: +853 83974971; fax: +853 28831694.
http://dx.doi.org/10.1016/j.cageo.2013.06.002

shown that a trained MLP underperformed when it was applied to model the air quality in Macau (Hoi et al., 2009). In addition, the backpropagation learning algorithm cannot address the uncertainty of the prediction. The objective of this study is therefore to deal with these shortcomings of the conventional MLP, especially the handling of the time-varying nature of an air quality system, which has not been addressed in previous studies with MLP-based models. This is done by adopting the extended Kalman filter into the learning algorithm to build an adaptive time-varying multilayer perceptron (TVMLP) (Singhal and Wu, 1989; Chibani and Nemmour, 2003; Nemmour and Chibani, 2003; Härter and Campos Velho, 2008, 2010; Canty, 2009). In addition, this study addresses the estimation of the uncertain noise variances in the governing equations of the Kalman filter. Since the values of these variances have to be prescribed by the user and have been shown to affect the performance of Kalman filter based models (Harvey, 1991; Yuen et al., 2007; Hoi et al., 2010), they should be properly assigned. The Bayesian approach is used to obtain their optimized estimates. The codes implementing the TVMLP algorithm and the noise variance optimization were developed by the authors in Matlab.

The paper is organized as follows. In Section 2, the formulation of the Kalman filter based TVMLP and the appropriate assignment of its noise variances are presented in detail. Application of the TVMLP to modelling the daily averaged PM10 concentration of Macau (response variable) with its past time history and the relevant meteorological conditions (input variables) is then presented in Section 3. Within the same section, its general performance is compared with that of the conventional offline MLP by using several well-known indicators in air quality studies.


A hypothesis test (paired t-test) is also conducted to assess the statistical significance of their differences. In addition, the performance of both models in capturing the extreme values ([PM10] > 100 μg m−3) is compared. Conclusions are drawn in Section 4.


2. The Kalman filter based time-varying multilayer perceptron (TVMLP)

The architecture of the three-layer time-varying multilayer perceptron (TVMLP) built in this study is shown in Fig. 1. The three-layer structure is adopted here since it is commonly used in air quality studies (Mok and Tam, 1998; Cobourn et al., 2000; Plochl, 2001; Yuen and Lam, 2006; Chattopadhyay and Chattopadhyay, 2009). The network has one input layer, one hidden layer with the hyperbolic tangent sigmoid transfer function, and one output layer with the linear transfer function.

Fig. 1. Architecture of the time-varying multilayer perceptron in the present study.

The input layer comprises s0 normalized input variables x_{1,k−1}, …, x_{s0,k−1} of the (k−1)th day. Each normalized input variable x_i is bounded within the range [−1, 1] and is calculated from the raw input variable x′_i ∈ [x′_{i,min}, x′_{i,max}] by the linear transformation

x_{i,k} = −1 + 2(x′_{i,k} − x′_{i,min}) / (x′_{i,max} − x′_{i,min})   (1)

This normalization is common practice in MLP applications to avoid saturation of the neurons and blocking of the learning algorithm for large inputs. In the hidden layer, where the neurons are, the symbols s1, w(1)_{i,j,k−1} and b(1)_{i,k−1} denote the number of neurons, the network weight to the ith neuron from the jth input variable, and the bias to the ith neuron of the first (hidden) layer at the (k−1)th step, respectively; the superscript (l) indicates the layer. In the output layer, the symbol x̂_k denotes the network output at the kth step. For the present setup, the normalized actual output x_k of the kth step is assumed to be the sum of the network output x̂_k and a process noise f_{k−1}:

x_k = x̂_k + f_{k−1}   (2)
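For concreteness, a minimal sketch of the normalization in Eq. (1) and of its inverse (used later to convert normalized predictions and confidence bounds back to concentration units) is given below; the function names and the illustrative value are not from the paper, and the PM10 range is the training range reported in Section 3.

```python
def normalize(x_raw, x_min, x_max):
    """Eq. (1): map a raw variable bounded by [x_min, x_max] onto [-1, 1]."""
    return -1.0 + 2.0 * (x_raw - x_min) / (x_max - x_min)

def denormalize(x_norm, x_min, x_max):
    """Inverse of Eq. (1), used to convert normalized outputs (and their
    confidence bounds) back to the original units."""
    return x_min + 0.5 * (x_norm + 1.0) * (x_max - x_min)

# Illustrative check with the PM10 training range quoted in Section 3
x = normalize(100.0, 5.32, 617.05)       # a raw value of 100 ug/m3 maps to about -0.69
print(round(x, 3), round(denormalize(x, 5.32, 617.05), 2))
```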

The process noise f represents the unmodelled dynamics and is modelled as a Gaussian independent and identically distributed (i.i.d.) process with zero mean and variance σ_f². The normalized measured output z_k, which is converted from the raw measured output z′_k in the same way as Eq. (1), is assumed to be the sum of the normalized actual value x_k and a measurement noise n_k:

z_k = x_k + n_k   (3)

where the measurement noise n is also modelled as a Gaussian i.i.d. process with zero mean and variance σ_n². The stochastic processes f in Eq. (2) and n in Eq. (3) are assumed to be independent. For the conventional MLP, the weights and biases in the network are kept fixed. However, this is not appropriate for a time-varying system, as mentioned earlier. Therefore, the weights and biases in the hidden layer and the output layer of the present TVMLP are allowed to evolve according to

ϕ_{m,k} = ϕ_{m,k−1} + u_{m,k−1},   m = 1, …, (s0+2)s1+1   (4)

where ϕ_{m,k} denotes the mth network weight or bias in the hidden layer or the output layer at the kth time step and u_{m,k−1} denotes the correction made to the coefficient ϕ_{m,k−1} at the (k−1)th time step. The stochastic process u is modelled as Gaussian i.i.d. with zero mean and covariance matrix diag(σ_1², …, σ²_{(s0+2)s1+1}). Now the state vector that contains the normalized actual output and the unknown coefficients (weights and biases) is defined as

Y_k = [x_k, w(1)_{1,1,k}, …, w(1)_{1,s0,k}, …, w(1)_{s1,1,k}, …, w(1)_{s1,s0,k}, b(1)_{1,k}, …, b(1)_{s1,k}, w(2)_{1,1,k}, …, w(2)_{1,s1,k}, b(2)_{1,k}]^T   (5)

Then the normalized measured output z_k can be expressed in terms of Y_k in the form

z_k = C Y_k + n_k   (6)

where C is the row vector C = [1, 0_{1×((s0+2)s1+1)}]. The

process noise vector F_k, containing the process noise f_k and the corrections at time step k, is defined as

F_k = [f_k, u_{1,k}, …, u_{(s0+2)s1+1,k}]^T   (7)

It follows from the above that F_k has zero mean and a diagonal covariance matrix Q:

Q = diag(σ_f², σ_1², …, σ²_{(s0+2)s1+1})   (8)

Then the state vector Y_k can be written as a general vector function of Y_{k−1} and F_{k−1}:

Y_k = [x̂_k + f_{k−1}, w(1)_{1,1,k−1} + u_{1,k−1}, …, w(1)_{1,s0,k−1} + u_{s0,k−1}, …, w(1)_{s1,1,k−1} + u_{s0(s1−1)+1,k−1}, …, w(1)_{s1,s0,k−1} + u_{s0s1,k−1}, b(1)_{1,k−1} + u_{s0s1+1,k−1}, …, b(1)_{s1,k−1} + u_{(s0+1)s1,k−1}, w(2)_{1,1,k−1} + u_{(s0+1)s1+1,k−1}, …, w(2)_{1,s1,k−1} + u_{(s0+2)s1,k−1}, b(2)_{1,k−1} + u_{(s0+2)s1+1,k−1}]^T = r(Y_{k−1}) + F_{k−1}   (9)

The above equation can be linearized locally at the updated state estimate Ŷ_{k−1|k−1}:

Y_k = r(Ŷ_{k−1|k−1}) + (∂r/∂Y_{k−1})|_{Y_{k−1}=Ŷ_{k−1|k−1}} (Y_{k−1} − Ŷ_{k−1|k−1}) + F_{k−1}   (10)

This first-order difference equation can be rewritten as follows:

Y_k = Â_{k−1|k−1} Y_{k−1} + F_{k−1} + Ĝ_{k−1|k−1}   (11)

where Ĝ_{k−1|k−1} = r(Ŷ_{k−1|k−1}) − Â_{k−1|k−1} Ŷ_{k−1|k−1}. The matrix Â_{k−1|k−1} denotes the estimator of the matrix A at the (k−1)th time step, conditional on the normalized measurements D_{k−1} = {z_1, z_2, …, z_{k−1}}. It is equal to the first-order derivative of the state vector Y_k with respect to the state vector Y_{k−1} evaluated at the state estimate


Ŷ_{k−1|k−1}:

Â_{k−1|k−1} = [ Ŵ(2)_{k−1}Ĥ(1)_{k−1}Ŵ(1)_{k−1}E, (Ŵ(2)_{k−1}Ĥ(1)_{k−1}) ⊗ p̂^T_{k−1}, Ŵ(2)_{k−1}Ĥ(1)_{k−1}, (â(1)_{k−1})^T, 1 ; 0_{((s0+2)s1+1)×1}, I_{((s0+2)s1+1)×((s0+2)s1+1)} ] |_{Y_{k−1}=Ŷ_{k−1|k−1}}   (12)

where the semicolon separates the two block rows (the first block row is the derivative of x_k with respect to Y_{k−1}, and the second corresponds to the random-walk evolution of the coefficients in Eq. (4)) and the symbol ⊗ denotes the Kronecker product. The symbol Ŵ(2)_{k−1|k−1} denotes the row vector containing the estimators of the weights in the output layer conditional on D_{k−1}:

Ŵ(2)_{k−1|k−1} = [ŵ(2)_{1,1,k−1|k−1}, …, ŵ(2)_{1,s1,k−1|k−1}]   (13)

The symbol Ĥ(1)_{k−1|k−1} denotes the estimator of the diagonal matrix defined as the first-order derivative of the output vector of the hidden layer a(1)_{k−1} = [a(1)_{1,k−1}, …, a(1)_{s1,k−1}]^T with respect to the network input vector to this layer h(1)_{k−1} = [h(1)_{1,k−1}, …, h(1)_{s1,k−1}]^T, conditional on D_{k−1}:

Ĥ(1)_{k−1|k−1} = (∂a(1)_{k−1}/∂h(1)_{k−1})|_{h(1)_{k−1}=ĥ(1)_{k−1|k−1}} = diag(1 − (â(1)_{1,k−1|k−1})², …, 1 − (â(1)_{s1,k−1|k−1})²)   (14)

The symbol Ŵ(1)_{k−1|k−1} denotes the estimator of the weight matrix in the hidden layer conditional on D_{k−1}:

Ŵ(1)_{k−1|k−1} = [ ŵ(1)_{1,1,k−1|k−1} … ŵ(1)_{1,s0,k−1|k−1} ; ⋮ ⋱ ⋮ ; ŵ(1)_{s1,1,k−1|k−1} … ŵ(1)_{s1,s0,k−1|k−1} ]   (15)

The symbol E denotes the column vector

E = [1, 0_{1×(s0−1)}]^T   (16)

Post-multiplying the matrix Ŵ(1)_{k−1|k−1} by the vector E extracts the first column of that matrix. The symbol p̂_{k−1|k−1} represents the estimator of the input vector conditional on D_{k−1}:

p̂_{k−1|k−1} = [x̂_{1,k−1|k−1}, …, x̂_{s0,k−1|k−1}]^T   (17)

With the TVMLP state-space model of Eq. (11), the prediction and filtering steps of the extended Kalman filter can be performed alternately for the state vector. When the measurements up to the (k−1)th step are available, the prediction step estimates the state vector at the kth step from the filtered state vector at the (k−1)th step:

Ŷ_{k|k−1} = r(Ŷ_{k−1|k−1}) = [b̂(2)_{1,k−1|k−1} + Σ_{i=1}^{s1} ŵ(2)_{1,i,k−1|k−1} â(1)_{i,k−1|k−1}, ŵ(1)_{1,1,k−1|k−1}, …, ŵ(1)_{1,s0,k−1|k−1}, …, ŵ(1)_{s1,1,k−1|k−1}, …, ŵ(1)_{s1,s0,k−1|k−1}, b̂(1)_{1,k−1|k−1}, …, b̂(1)_{s1,k−1|k−1}, ŵ(2)_{1,1,k−1|k−1}, …, ŵ(2)_{1,s1,k−1|k−1}, b̂(2)_{1,k−1|k−1}]^T   (18)

It is noted that, when evaluating the expectation of Y_k conditional on D_{k−1}, the term Â_{k−1|k−1}Y_{k−1} becomes Â_{k−1|k−1}Ŷ_{k−1|k−1}, which is cancelled by the corresponding term inside Ĝ_{k−1|k−1} in Eq. (11). Therefore, it is only necessary to evaluate r(Ŷ_{k−1|k−1}) instead of Ĝ_{k−1|k−1}. The corresponding uncertainty of the state estimate can be quantified by the covariance matrix

P_{k|k−1} = Â_{k−1|k−1} P_{k−1|k−1} Â^T_{k−1|k−1} + Q   (19)

Within this prediction covariance matrix, the (1,1) component σ²_{x,k} is the variance of the predicted network output on the kth day conditional on D_{k−1}. This quantity can represent the uncertainty of the normalized network output through the 99.7% confidence interval [x̂_{k|k−1} − 3σ_{x,k}, x̂_{k|k−1} + 3σ_{x,k}]. Although this confidence interval is for the normalized output, the inverse of the linear transformation in Eq. (1) can readily be applied to obtain the corresponding interval for the original values. When the measurement of the kth step is available, the estimates of the system output and the network parameters are updated as follows:

Ŷ_{k|k} = P_{k|k} (P⁻¹_{k|k−1} Ŷ_{k|k−1} + σ_n⁻² C^T z_k)   (20)

The uncertainty of the state estimation is represented by its covariance matrix

P_{k|k} = P_{k|k−1} − (1/(σ_n² + σ²_{x,k})) v_k v_k^T   (21)

where v_k = P_{k|k−1} C^T is the first column of the prediction covariance matrix P_{k|k−1}.
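To make the recursion concrete, the following is a minimal NumPy sketch of one predict/update cycle of the TVMLP (Eqs. (12) and (18)–(21)). It is not the authors' Matlab implementation: the function and variable names are illustrative, the state is packed in the order of Eq. (5), and the update is written in the standard Kalman-gain form, which is algebraically equivalent to the information form of Eq. (20).

```python
import numpy as np

def pack(x, W1, b1, W2, b2):
    """State vector Y of Eq. (5): filtered output followed by all weights and biases."""
    return np.concatenate(([x], W1.ravel(), b1, W2.ravel(), b2))

def unpack(Y, s0, s1):
    x = Y[0]
    i = 1
    W1 = Y[i:i + s1 * s0].reshape(s1, s0); i += s1 * s0
    b1 = Y[i:i + s1];                      i += s1
    W2 = Y[i:i + s1].reshape(1, s1);       i += s1
    b2 = Y[i:i + 1]
    return x, W1, b1, W2, b2

def forward(W1, b1, W2, b2, p):
    """One-hidden-layer tanh network with linear output (Fig. 1)."""
    a = np.tanh(W1 @ p + b1)            # hidden-layer outputs a(1)
    return (W2 @ a + b2).item(), a

def jacobian(W1, b1, W2, b2, p):
    """State-transition Jacobian of Eq. (12): the first row is d x_hat / d Y_{k-1};
    the remaining rows are the identity of the random-walk coefficients of Eq. (4)."""
    _, a = forward(W1, b1, W2, b2, p)
    WH = W2.ravel() * (1.0 - a ** 2)    # W(2) H(1), cf. Eq. (14)
    top = np.concatenate((
        [WH @ W1[:, 0]],                # w.r.t. the filtered previous output: W(2) H(1) W(1) E
        np.kron(WH, p),                 # w.r.t. hidden-layer weights: (W(2) H(1)) ⊗ p^T
        WH,                             # w.r.t. hidden-layer biases
        a,                              # w.r.t. output-layer weights
        [1.0],                          # w.r.t. output-layer bias
    ))
    A = np.eye(top.size)
    A[0, :] = top
    return A

def tvmlp_step(Y, P, z, inputs_rest, Q, sn2, s0, s1):
    """One EKF cycle: prediction (Eqs. 18-19) followed by filtering (Eqs. 20-21).
    inputs_rest holds the exogenous normalized inputs x2..xs0 of the current step;
    the first input is the filtered previous output stored in the state."""
    x, W1, b1, W2, b2 = unpack(Y, s0, s1)
    p = np.concatenate(([x], inputs_rest))
    # Prediction: only the output component of the state changes, Eq. (18)
    x_pred, _ = forward(W1, b1, W2, b2, p)
    Y_pred = Y.copy()
    Y_pred[0] = x_pred
    A = jacobian(W1, b1, W2, b2, p)
    P_pred = A @ P @ A.T + Q                        # Eq. (19)
    # Update with the new measurement z_k (gain form of Eqs. 20-21)
    v = P_pred[:, 0]                                # P_{k|k-1} C^T with C = [1, 0, ..., 0]
    s2x = P_pred[0, 0]                              # predicted output variance
    gain = v / (s2x + sn2)
    Y_filt = Y_pred + gain * (z - x_pred)
    P_filt = P_pred - np.outer(v, v) / (s2x + sn2)  # Eq. (21)
    return Y_filt, P_filt, x_pred, s2x
```

A full run would initialize Y (for example from offline-trained weights) and P with a diagonal prior covariance, and then call tvmlp_step once per day, collecting x_pred and s2x for the confidence intervals derived from Eq. (19).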

It should be mentioned that the extended Kalman filter, instead of the standard one, is used for the state estimation because the proposed TVMLP assumes that the predicted network output depends partly on the output of the previous step. If the measured history were adopted directly as the input, the modelling accuracy would inevitably be affected by the measurement noise. To reduce this error, the variable x_k is incorporated into the state vector so that the filtered history can be used as the input instead of the measured one. In addition, including x_k in the state vector allows the variance of the unmodelled dynamics to be treated as a noise parameter and quantified. The consequence of using this augmented state vector, however, is that Eq. (9) may become nonlinear, so linearization is necessary for the application of the extended Kalman filter, which is not applicable to highly nonlinear systems (Yuen and Beck, 2003).

From the formulations, it can be seen that the TVMLP is tuned differently from the conventional offline MLP. For the MLP, the network weights and biases are trained (adjusted) with a batch of input vectors and target values such that the sum of squared modelling errors is minimized, and the network parameters are kept fixed afterwards unless the network is retrained with another set of data. In contrast, the network weights and biases of the present TVMLP are updated adaptively at every step from the weighted combination of the predicted state vector and the measurement in Eq. (20). Therefore, any new change in the modelled system can be adapted to through the updated coefficients. Furthermore, the associated uncertainties of the predicted and updated states can be quantified with Eqs. (19) and (21), providing additional information that the backpropagation learning algorithm of the conventional MLP cannot. The limitation of the adaptive updating scheme, however, is that it is valid only when the measured air quality information of the filtering step is available.

It is noted that the TVMLP is tuned adaptively at each filtering step by a weighted combination of the state estimate and the measurement. The weights are assigned through the process noise variance σ_f² and the measurement noise variance σ_n², which indicate the modelling error and the quality of the normalized


measurements, respectively. Previous studies show that the choice of these parameters can affect the performance of the Kalman filter (Harvey, 1991; Yuen et al., 2007; Hoi et al., 2010). Therefore, accurate estimation of these quantities is required. Here, these values are assigned their optimal estimates obtained by minimizing the following objective function J(θ), derived based on the Bayesian approach (Yuen et al., 2007; Yuen, 2010; Yuen and Kuok, 2011):

θ̂ = arg min_{θ∈Θ} J(θ) = arg min_{θ∈Θ} { (N/2) ln 2π + (1/2) Σ_{k=1}^{N} [ ln(σ²_{x,k} + σ_n²) + (z_k − x̂_{k|k−1})² / (σ²_{x,k} + σ_n²) ] }   (22)

The symbol θ̂ = [σ̂_f², σ̂_n²]^T is a column vector containing the optimal estimates of the noise variances. Although the process noise variance does not appear explicitly in the expression, its influence on the objective function is reflected in the variance σ²_{x,k}. The uncertainty of the estimated noise variances can be quantified by using the posterior PDF of the parameter vector conditional on the measurements, e.g., to calculate the standard deviation or the contours of equal probability density. It should be noted that the objective function in Eq. (22) is derived based on a non-informative prior distribution of the uncertain parameters; it can therefore be modified through the specification of the prior probability density function of the parameter vector.
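A minimal sketch of the objective of Eq. (22) is given below. The interface is an assumption: run_filter(theta) stands for a user-supplied routine (for example, a loop over the tvmlp_step sketch above) that, for a candidate θ = (σ_f², σ_n²), runs the filter over the measurement record and returns the one-step-ahead predictions x̂_{k|k−1} and their variances σ²_{x,k}. The gradient-free Nelder–Mead search is one reasonable choice, not necessarily the authors'.

```python
import numpy as np
from scipy.optimize import minimize

def objective_J(theta, z, run_filter):
    """Negative log-evidence of Eq. (22) for theta = (sigma_f^2, sigma_n^2)."""
    sf2, sn2 = theta
    if sf2 <= 0.0 or sn2 <= 0.0:               # keep the variances admissible
        return np.inf
    pred_mean, pred_var = run_filter(theta)    # x_hat_{k|k-1}, sigma^2_{x,k} (assumed interface)
    s = pred_var + sn2
    N = len(z)
    return 0.5 * N * np.log(2.0 * np.pi) + 0.5 * np.sum(np.log(s) + (z - pred_mean) ** 2 / s)

def optimize_noise_variances(z, run_filter, theta0=(0.01, 0.01)):
    """Minimize J(theta); the optimum plays the role of theta_hat in Eq. (22)."""
    res = minimize(objective_J, np.asarray(theta0), args=(z, run_filter),
                   method="Nelder-Mead")
    return res.x
```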

3. Application of the TVMLP to air quality modelling and comparison with the MLP

To show the capability of the TVMLP in modelling the variation of air quality over time, it is applied to the daily averaged PM10 concentrations recorded at an ambient monitoring station in Macau from 2001 to 2011. The PM10 concentrations are measured in μg m−3 by a Tapered Element Oscillating Microbalance (TEOM 1400a) with a sampling interval of 1 h. The daily averaged concentration is simply the arithmetic mean of the hourly concentrations over 24 h, and it is considered valid when at least three quarters of the hourly concentrations are available during the day of concern. These validated data were provided by the Macau Meteorological and Geophysical Bureau (SMG), the official department for monitoring the ambient air quality. Fig. 2 shows the time history of the daily averaged PM10 concentrations from 2001 to 2011. The measured concentrations within this period range between 5.32 μg m−3 and 617.05 μg m−3, with a mean of 58.49 μg m−3 and a standard deviation of 39.46 μg m−3. The measurements exhibit a distinct pattern related to the seasonal variation of the monsoon (Mok and Hoi, 2005). Based on this dataset, the results from the TVMLP are compared with those of the conventional MLP.
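The daily averaging and the three-quarter completeness rule described above can be sketched as follows (an illustration assuming the hourly record is a pandas series indexed by timestamps, with gaps stored as NaN; names are illustrative).

```python
import pandas as pd

def daily_average_pm10(hourly: pd.Series) -> pd.Series:
    """Arithmetic mean of the hourly PM10 values over each day, kept only when
    at least three quarters (18 of 24) of the hourly values are available."""
    grouped = hourly.resample("D")
    daily_mean = grouped.mean()
    valid_hours = grouped.count()
    return daily_mean.where(valid_hours >= 18)
```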


For the comparison, both the TVMLP and the MLP are set to estimate, at the output layer, the PM10 value of the next day from the same set of inputs. A previous study in Macau indicates that the daily averaged PM10 concentration of the kth day (x′ ∈ [5.32 μg m−3, 617.05 μg m−3]) is related to the daily averaged PM10 value of the (k−1)th day (x′_1 ∈ [5.32 μg m−3, 617.05 μg m−3]), the hourly averaged PM10 value between 11:00 PM of the (k−1)th day and 00:00 AM of the kth day (x′_2 ∈ [0 μg m−3, 514.30 μg m−3]), the magnitude (x′_3 ∈ [10.62 km h−1, 1307.50 km h−1]) and the absolute angle (x′_4 ∈ [0°, 180°]) of the resultant wind velocity vector, which is the vector sum of the hourly wind velocity vectors over 24 h, and the rainfall index (x′_5 ∈ [0 mm h, 4700 mm h]) defined by the product of the daily rainfall amount and the duration of rainfall (Hoi et al., 2009). More details about the data collection and the quality assurance of the supplied variables can be found on the official SMG website (SMG, 2013). This input–output relation (5 input variables vs. 1 output variable) is adopted here directly. Each raw input or output variable is scaled to the range [−1, 1] by using the corresponding maximum and minimum values of the training dataset.

The performance of the TVMLP is compared with that of the best conventional MLP counterpart, i.e., one with the same inputs, the same number of network layers and the same number of hidden-layer neurons (hidden neurons), giving the value of the same output variable. The architecture of the offline MLP counterpart is the same as that of the TVMLP in Fig. 1, except that its weights and biases are time-invariant and the time index k is omitted from those symbols. A single hidden layer is adopted in the network architecture, and with the input–output relation adopted from Hoi et al. (2009) as discussed earlier, the remaining parameter to be determined is the number of hidden neurons. To choose the best conventional MLP, data from 2001 and 2002 are used to train MLPs with different numbers of hidden neurons (1 to 10) by the backpropagation algorithm. Each trained MLP is then tested with data from 2003 to 2005. Their performance is evaluated through indicators commonly used in air quality studies (Willmott, 1982; Comrie, 1997; Mok and Tam, 1998; Cobourn et al., 2000; Hoi et al., 2009): the root-mean-square error (RMSE), the normalized root-mean-square error (RMSE/RMS_PM10), the mean absolute percentage error (MAPE), the coefficient of determination (r²) and the index of agreement (IA). In general, good performance is associated with small values of RMSE, RMSE/RMS_PM10 and MAPE, as well as large values of r² and IA. The sketch after this paragraph illustrates these indicators. Table 1 shows the performance of the trained MLPs with different numbers of hidden neurons on the data from 2003 to 2005. The MLP(3), with 3 hidden neurons, performs best among the candidates in the testing phase. Therefore, this MLP(3) is chosen for comparison with the TVMLP(3), which also uses 3 hidden neurons, in predicting the PM10 concentrations from 2003 to 2011.
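The indicators listed above can be computed as in the sketch below; the IA follows Willmott (1982), while r² is taken here as the squared Pearson correlation, one common convention that the paper does not spell out.

```python
import numpy as np

def indicators(obs, pred):
    """RMSE, RMSE/RMS_obs, MAPE, r2 and index of agreement for paired series."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - obs
    rmse = np.sqrt(np.mean(err ** 2))
    nrmse = rmse / np.sqrt(np.mean(obs ** 2))            # RMSE / RMS_PM10
    mape = 100.0 * np.mean(np.abs(err) / obs)            # observations assumed > 0
    r2 = np.corrcoef(obs, pred)[0, 1] ** 2
    ia = 1.0 - np.sum(err ** 2) / np.sum(
        (np.abs(pred - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    return {"RMSE": rmse, "RMSE/RMS": nrmse, "MAPE": mape, "r2": r2, "IA": ia}
```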

Fig. 2. Time history of the measured daily averaged PM10 concentrations (2001–2011).



Before comparing the performance of the TVMLP(3) and the MLP(3), it is noted that the noise variances σ_f² and σ_n² are unknown and their choices can affect the performance of the TVMLP. Therefore, these values are assigned their optimal estimates obtained with Eq. (22). The optimal noise variances σ_f² and σ_n², conditional on the measured daily averaged PM10 concentrations of 2001 and 2002, are 0.0048 and 0.0162, respectively. These estimates are dimensionless since they correspond to the normalized PM10 time history, so their order of magnitude cannot be compared directly with the variance (1557.4 μg² m−6) of the raw PM10 time history. These values are used by the TVMLP(3) when it predicts data from 2003 onwards.

Table 1. Performance of the MLPs and the TVMLP(3) in the testing phase.

Model class   RMSE (μg m−3)   RMSE/RMS_PM10   MAPE (%)   r²     IA
MLP(1)        21.54           0.54            27.08      0.76   0.91
MLP(2)        22.20           0.56            28.52      0.74   0.91
MLP(3)        20.76           0.52            25.53      0.77   0.92
MLP(4)        21.15           0.53            25.38      0.76   0.92
MLP(5)        23.78           0.60            25.66      0.69   0.90
MLP(6)        25.57           0.64            28.00      0.66   0.89
MLP(7)        22.40           0.56            26.43      0.73   0.91
MLP(8)        27.32           0.69            26.67      0.61   0.87
MLP(9)        22.35           0.56            25.38      0.73   0.91
MLP(10)       23.28           0.59            26.63      0.71   0.90
TVMLP(3)      19.62           0.49            28.53      0.78   0.94

The comparison of the TVMLP(3) and the MLP(3) is done in two phases: the testing phase, covering the data of 2003 to 2005, and the prediction phase, covering the data of 2006 to 2011. Since the MLP(3) was chosen for its best performance during testing, the TVMLP(3) is compared with it in this phase first in order to see the starting condition of the two perceptrons. Time histories of their yearly modelling results are plotted together with the measurements in Fig. 3. Visually, the TVMLP(3) and the MLP(3) perform similarly and both capture the main variation trend of the data. However, the TVMLP(3) also provides the corresponding 99.7% confidence interval of its prediction, shown by the grey shaded area in Fig. 3. The measurements are almost entirely bounded by this interval, meaning that the extended Kalman filter based learning algorithm used by the TVMLP(3) is

able to provide a reliable adaptive estimate of its prediction uncertainty. This is one advantage that the backpropagation training algorithm used by the MLP(3) does not possess. To obtain an overall comparison of the two perceptrons in this phase, the performance indicators of the TVMLP(3) are calculated and listed in Table 1 together with those of the MLPs. The indicators of the TVMLP(3) and of the best performing MLP(3) are similar, with those of the TVMLP(3) slightly better except for the MAPE: the RMSE and RMSE/RMS_PM10 of the TVMLP(3) are about 6% lower and its r² and IA are 1% to 2% higher, while the difference in MAPE between the two is about 3%. With these minimal differences, the TVMLP(3) and the MLP(3) are considered to start at a similar capacity, and their comparison in the prediction phase is carried out next.

In the prediction phase, both the TVMLP(3) and the MLP(3) are set to forecast the daily averaged PM10 concentration from 2006 to 2011. Time histories of their yearly modelling results are plotted in Fig. 4 together with the measurements and the 99.7% confidence interval given by the TVMLP(3). The two perceptrons continue to capture the main variation trend of the data. The overall comparison is made by reviewing their performance indicators in Table 2. The TVMLP(3) performs better in all indicators during 2006 and 2007, except for the MAPE. Notable differences in the indicators can be observed in the years 2009 and 2010, and the TVMLP(3) also outperforms the MLP(3) in all indicators during 2010 and 2011. Overall, the RMSE and the RMSE/RMS_PM10 of the TVMLP(3) over the six years are about 30% lower.

To further establish the confidence of the calculated percentage difference, a hypothesis test is conducted to examine its statistical significance. The paired t-test, a well-known statistical test for comparing two population means, is adopted here. The null hypothesis assumes that there is no difference between the means of the squared prediction errors of the two models; the alternative hypothesis is that the mean squared error of the TVMLP(3) is less than that of the MLP(3). The t-test yields the significance level, which indicates the probability level at which the null hypothesis would be supported; the smaller the significance level, the more probable it is that the null hypothesis should be rejected in favour of the alternative. The t-test gives a significance level of 15%, meaning that it is highly probable (≥85%) that the 30% difference is real. In addition, the MAPE of the TVMLP(3) is 8% lower, and the t-test also shows that

Fig. 3. Time histories of the modeling results from the MLP(3) (dotted line), the TVMLP(3) (solid line) with its corresponding confidence interval (shaded area), and the measured daily averaged PM10 concentrations (circles) in the testing phase between 2003 and 2005.



Fig. 4. Time histories of the modeling results from the MLP(3) (dotted line), the TVMLP(3) (solid line) with its corresponding confidence interval (shaded area), and the measured daily averaged PM10 concentrations (circles) in the prediction phase between 2006 and 2011.

Table 2. Performance of the MLP(3) and the TVMLP(3) in the prediction phase.

Year                Model class   RMSE (μg m−3)   RMSE/RMS_PM10   MAPE (%)   r²     IA
2006                MLP(3)        23.12           0.58            26.72      0.67   0.90
                    TVMLP(3)      21.58           0.54            29.38      0.71   0.91
2007                MLP(3)        22.59           0.50            23.83      0.77   0.92
                    TVMLP(3)      21.82           0.48            25.81      0.77   0.93
2008                MLP(3)        17.27           0.45            27.27      0.80   0.94
                    TVMLP(3)      16.84           0.44            23.80      0.81   0.95
2009                MLP(3)        51.64           1.13            31.45      0.30   0.61
                    TVMLP(3)      16.23           0.53            26.82      0.72   0.92
2010                MLP(3)        33.18           0.73            38.06      0.48   0.77
                    TVMLP(3)      30.01           0.66            29.50      0.58   0.82
2011                MLP(3)        14.38           0.49            24.59      0.77   0.93
                    TVMLP(3)      14.24           0.48            24.41      0.77   0.93
Overall 2006–2011   MLP(3)        29.74           0.75            28.63      0.51   0.83
                    TVMLP(3)      20.78           0.53            26.61      0.72   0.91

this difference is very likely (≥99%) to be real. Finally, the overall r² and IA of the TVMLP(3) are about 41% and 10% higher, respectively. As the inputs and the number of hidden neurons in both perceptrons are the

same, the obvious improvement in prediction must be due to the adaptiveness introduced into the TVMLP through its extended Kalman filter based learning algorithm, which allows the network parameters to be updated whenever a new measurement is obtained.

To illustrate under which conditions the TVMLP(3) performs better, scatterplots of the measured daily averaged PM10 concentrations and the predictions by the two perceptrons are shown in Fig. 5. For the results given by the MLP(3) in Fig. 5(a), the points generally fall around the 45° line when the measured PM10 concentrations are low. However, when the PM10 concentrations are above 100 μg m−3, the points fall mostly below the 45° line, meaning that the MLP(3) underestimates the high concentrations. On the other hand, the results of the TVMLP(3) in Fig. 5(b) show points distributed more evenly about the 45° line even when the measured PM10 concentrations are above 100 μg m−3. To further compare the performance of both models in capturing these extreme values, the RMSEs of both models for the high PM10 concentrations ([PM10] > 100 μg m−3) are evaluated. The RMSE of the TVMLP(3) (42.41 μg m−3) is 40% lower than that of the MLP(3) (70.19 μg m−3), and the t-test shows that the difference has a significance level of 15%. From this comparison, it is very likely





(≥85%) that the TVMLP(3) performs better in the range of higher particulate concentrations because of the adaptiveness introduced into the network. This reconfirms that changes of the air quality system over time can be tracked and responded to by the TVMLP.

Fig. 5. Scatterplot of modelled PM10 concentrations and the measurements (a) by the MLP(3) and (b) by the TVMLP(3) in the prediction phase.
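The one-sided paired t-tests reported above compare the daily squared prediction errors of the two perceptrons; a minimal sketch (with hypothetical error arrays, not the paper's data) is:

```python
from scipy import stats

def one_sided_paired_t(sq_err_tvmlp, sq_err_mlp):
    """Paired t-test on squared prediction errors from the same days.
    H0: equal mean squared errors; H1: the TVMLP mean squared error is smaller."""
    t, p_two_sided = stats.ttest_rel(sq_err_tvmlp, sq_err_mlp)
    # one-sided significance level for the 'TVMLP smaller' alternative
    p_one_sided = p_two_sided / 2.0 if t < 0 else 1.0 - p_two_sided / 2.0
    return t, p_one_sided
```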

4. Conclusion

Considering the possible time-varying nature of the ambient air quality system, a conventional MLP that uses an offline learning algorithm may not give satisfactory predictions: its network parameters are kept fixed in the prediction phase after training, so it cannot adapt to changes in the modelled system over time. To deal with this problem, one can retrain the MLP offline at appropriate times with additional data when available, which is cumbersome and time-consuming. Another way is to introduce adaptiveness into the MLP so that it adjusts itself automatically when new information is available. The present study takes the adaptive approach: to improve the MLP, the extended Kalman filter is adopted into the learning algorithm to build a time-varying multilayer perceptron (TVMLP). The formulated adaptive learning algorithm also addresses explicitly the uncertainty of the prediction, which the backpropagation training algorithm used by the conventional MLP cannot, making users more confident in applying the TVMLP results for decision making when needed.

Application of the TVMLP to modelling the daily averaged PM10 concentration in Macau shows its advantage over the conventional MLP with the same set of input variables, the same functional form and the same number of hidden neurons. More importantly, the TVMLP is shown to perform better in the range of higher particulate concentrations. This improvement is important because the days with higher pollutant concentrations impose greater health effects on individuals and are therefore of concern to sensitive groups.

Acknowledgments

The present study is supported by the Science and Technology Development Fund of the Macau SAR government under grant no. 025/2011/A and the university level grant UL019/08-Y5/CEE/MKM01/FST of the research committee of UM. The Macau Meteorological and Geophysical Bureau is acknowledged for supplying the data.

References

Canty, M.J., 2009. Boosting a fast neural network for supervised land cover classification. Computers and Geosciences 35, 1280–1295.
Chattopadhyay, G., Chattopadhyay, S., 2009. Autoregressive forecast of monthly total ozone concentration: a neurocomputing approach. Computers and Geosciences 35, 1925–1932.
Chibani, Y., Nemmour, H., 2003. Kalman filtering as a multilayer perceptron training algorithm for detecting changes in remotely sensed imagery. In: Proceedings of the IEEE International Geoscience and Remote Sensing Symposium IGARSS'03. Toulouse, France, pp. 4101–4103.
Cobourn, W.G., Dolcine, L., French, M., Hubbard, M.C., 2000. A comparison of nonlinear regression and neural network models for ground-level ozone forecasting. Journal of Air and Waste Management Association 50, 1999–2009.
Comrie, A.C., 1997. Comparing neural networks and regression models for ozone forecasting. Journal of Air and Waste Management Association 47, 653–663.
Härter, F.P., Campos Velho, H.F., 2008. New approach to applying neural network in nonlinear dynamic model. Applied Mathematical Modelling 32, 2621–2633.
Härter, F.P., Campos Velho, H.F., 2010. Multilayer perceptron neural network in a data assimilation scenario. Engineering Applications of Computational Fluid Mechanics 4, 237–245.
Harvey, A.C., 1991. Forecasting Structural Time Series Models and the Kalman Filter. Cambridge University Press.
Hoi, K.I., Yuen, K.V., Mok, K.M., 2009. Prediction of daily averaged PM10 concentrations by statistical time-varying model. Atmospheric Environment 43, 2579–2581.
Hoi, K.I., Yuen, K.V., Mok, K.M., 2010. Optimizing the performance of Kalman filter based statistical time-varying air quality models. Global NEST Journal 12, 27–39.
Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2, 359–366.
Lu, H.C., Chang, T.S., 2005. Meteorologically adjusted trends of daily maximum ozone concentrations in Taipei, Taiwan. Atmospheric Environment 39, 6491–6501.
Lu, X., Chow, K.C., Yao, T., Lau, K.H., Fung, C.H., 2009. Effects of urbanization on the land–sea breeze circulation over the Pearl River Delta region in winter. International Journal of Climatology, 1089–1104. http://dx.doi.org/10.1002/joc.1947.
Mok, K.M., Hoi, K.I., 2005. Effect of meteorological conditions on PM10 concentrations—a study in Macau. Environmental Monitoring and Assessment 102, 201–223.
Mok, K.M., Tam, S.C., 1998. Short-term prediction of SO2 concentration in Macau with artificial neural networks. Energy and Buildings 28, 279–286.
Nemmour, H., Chibani, Y., 2003. Comparison between object- and pixel-level approaches for change detection in multispectral images by using neural networks. In: Proceedings of Image and Signal Processing for Remote Sensing IX. Barcelona, Spain, pp. 551–559.
Plochl, M., 2001. Neural network approach for modelling ammonia emission after manure application on the field. Atmospheric Environment 35, 5833–5841.
Singhal, S., Wu, L., 1989. Training feed-forward networks with the extended Kalman algorithm. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing ICASSP-89. Glasgow, Scotland, pp. 1187–1190.
SMG, 2013. Annual Meteorological Observation Report of 2012, 40 pp. 〈http://www.smg.gov.mo/www/ccaa/pdf/e_pdf_download.php〉 [accessed 20 April 2013].
Voukantis, D., Niska, H., Karatzas, K., Riga, M., Damialis, A., 2010. Forecasting daily pollen concentrations using data-driven modelling in Thessaloniki, Greece. Atmospheric Environment 44, 5101–5111.
Willmott, C.J., 1982. Some comments on the evaluation of model performance. Bulletin of the American Meteorological Society 63, 1309–1313.
Yuen, K.V., 2010. Bayesian Methods in Structural Dynamics and Civil Engineering. John Wiley and Sons (Asia) Pte Ltd, Singapore, 294 pp.
Yuen, K.V., Beck, J.L., 2003. Updating properties of nonlinear dynamical systems with uncertain input. Journal of Engineering Mechanics (ASCE) 129, 9–20.
Yuen, K.V., Hoi, K.I., Mok, K.M., 2007. Selection of noise parameters for Kalman filter. Earthquake Engineering and Engineering Vibration 6, 49–56.
Yuen, K.V., Kuok, S.C., 2011. Bayesian methods for updating dynamic models. Applied Mechanics Reviews 64, 010802-1–010802-18.
Yuen, K.V., Lam, H.F., 2006. On the complexity of artificial neural networks for smart structures monitoring. Engineering Structures 28 (7), 977–984.