Prediction Interval Construction and Optimization for Adaptive Neuro Fuzzy Inference Systems

Abbas Khosravi, Saeid Nahavandi, and Doug Creighton

Abstract—The performance of the Adaptive Neuro Fuzzy Inference System (ANFIS) drops significantly when uncertainty exists in the data or system operation. Prediction Intervals (PIs) can quantify the uncertainty associated with ANFIS point predictions. This paper first presents a methodology to adapt the delta technique for the construction of PIs for the outcomes of ANFIS models. As ANFIS models are linear in their consequent part, ANFIS-based PIs are computationally less expensive than Neural Network (NN)-based PIs. Secondly, the paper proposes a method for optimizing ANFIS-based PIs. A new PI-based cost function is developed for training ANFIS models, and a simulated annealing-based algorithm is applied for the minimization of this nonlinear cost function and the adjustment of the premise and consequent parameters of the ANFIS model. Using three real world case studies, it is shown that ANFIS-based PIs are computationally less expensive than NN-based PIs, and that the proposed optimization algorithm leads to better quality PIs than optimized NN-based PIs.

Index Terms—Prediction interval, uncertainty, ANFIS.

I. INTRODUCTION

The application of NN and ANFIS models for prediction purposes has proliferated over the last two decades. Both are universal approximators capable of identifying and approximating nonlinear relationships between independent (input) and dependent (target) variables [1] [2] [3]. As NN and ANFIS models both generate averaged values of the targets, their performance decreases under uncertainty in the data. Sources of uncertainty include the occurrence of probabilistic events, lack of data, and measurement noise. Such a reduction cannot be mitigated by changing the model structure or repeating the training process. For effective decision-making, it is important to know how well the
predictions generated by these models match the real targets. A practical problem of NNs and ANFIS is that they provide only a point prediction, without any indication of its accuracy. Such a prediction conveys no information about the prediction errors, the data uncertainty, or the reliability of the generated results. Prediction Intervals (PIs) have been proposed in the literature to address these deficiencies. A PI represents the uncertainty in a prediction and offers a range within which the target is highly likely to lie. Compared to point predictions, PIs provide more information about possible future realities and are practically more useful.

A range of techniques has been proposed in the literature for PI construction for NN outcomes [4], [5], [6], [7], [8], [9], [10], [11]. However, there is no published resource considering the construction of PIs for the predictions generated by ANFIS. As NNs and ANFIS have the same structure and share similar learning algorithms [1], it should be possible to apply a NN-based PI construction method to ANFIS models as well. One of the primary benefits of using ANFIS for optimized PI generation is that the computational cost is markedly lower than for NNs. The delta technique requires calculation of the derivative of the NN output with respect to the network parameters. As the activation functions of the NN hidden layers are usually nonlinear, the computational burden of the method is high. In contrast, the consequent parameters of ANFIS models are linear and their derivatives are computationally inexpensive to calculate, allowing PIs to be generated more quickly.

A second area that has remained unexplored is how PIs can be optimized in terms of their width and coverage probability. Existing scientific and practical research focuses predominantly on the coverage probability of PIs for their assessment.
The closeness of this probability to the nominal confidence level, (1 − 𝛼)%, is interpreted as an indication of PI quality. Such an evaluation is highly subjective and

can lead to misleading results and potentially disastrous decision-making consequences. In this paper, measures proposed in [12] and [13] are applied for the quantitative evaluation of PIs. A new PI-based cost function is developed using these measures to enhance the quality of PIs by reducing their width and increasing their coverage probability. A Simulated Annealing (SA) method [14] is applied to minimize the developed cost function. There are numerous papers in the literature reporting the successful application of SA to this type of optimization problem [15] [16].

The major contribution of this research is two-fold. First, it extends the delta technique to ANFIS models. Through this extension, PIs can be constructed to quantify the uncertainties associated with point predictions obtained from ANFIS models. Second, it proposes a new method for improving prediction quality through PI optimization. This improvement is achieved by training ANFIS models using an innovative PI-based cost function.

This paper is organized as follows. Section II describes how the NN delta technique can be applied to the
ANFIS. Section III introduces the PI assessment measures. The new cost function and its minimization are discussed in Section IV. Experimental results are presented in Section V. Finally, Section VI concludes the paper with remarks for further study in this domain.

II. ADAPTING THE DELTA TECHNIQUE FOR ANFIS

A NN model can be represented as a nonlinear regression model,

𝑦 = 𝑓(𝑋, 𝑤∗) + 𝜖    (1)

where 𝑋 and 𝑦 are respectively the set of inputs (𝑚 independent variables) and the corresponding target (dependent variable), 𝑓(⋅) with the parameter set 𝑤∗ is the nonlinear function representing the true regression function, and 𝜖 is noise with zero expectation. The estimated NN parameters, denoted ŵ, are obtained by minimizing the Sum of Squared Errors (SSE) cost function,

𝑆𝑆𝐸 = (𝑦 − ŷ)ᵀ(𝑦 − ŷ)    (2)

The NN parameter estimation problem can be viewed as a nonlinear regression problem with the cost function defined in (2). Therefore, the standard methods for constructing asymptotic PIs for nonlinear regression models are applicable. In the neighborhood of the true set of NN parameters, 𝑤∗, 𝑓(𝑋, 𝑤) can be approximated by a first-order Taylor series,

ŷ = 𝑓(𝑋, 𝑤∗) + 𝐽(ŵ − 𝑤∗)    (3)

where the 𝑖, 𝑗-th element of the matrix 𝐽 is ∂𝑓ᵢ/∂𝑤ⱼ. These derivatives are calculated at the true set of parameters, 𝑤∗. For the sample 𝑥₀, the linearized model is,

ŷ₀ = 𝑓(𝑥₀, 𝑤∗) + 𝑔₀ᵀ(ŵ − 𝑤∗)    (4)

where 𝑔₀ is a vector whose 𝑗-th entry is ∂𝑓(𝑥₀)/∂𝑤ⱼ.

With the assumption that the noise terms 𝜖 in (1) are independently and normally distributed, 𝑁(0, 𝜎²), the (1 − 𝛼)% PI for ŷ₀ is,

ŷ₀ ± 𝑡(1−𝛼/2, 𝑑𝑓) 𝑠 √(1 + 𝑔₀ᵀ(𝐽ᵀ𝐽)⁻¹𝑔₀)    (5)

where 𝑡(1−𝛼/2, 𝑑𝑓) is the (1 − 𝛼/2) quantile of the cumulative t-distribution with 𝑑𝑓 degrees of freedom. 𝑑𝑓 is the difference between the number of training samples (𝑛) and the number of NN parameters (𝑝). 𝐽 is the Jacobian matrix of the NN model with respect to its parameters, and 𝑠 is an unbiased estimate of 𝜎,

𝑠² = 𝑆𝑆𝐸/(𝑛 − 𝑝)    (6)
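As a concrete numerical sketch of Eqs. (2)–(6), the PI of (5) for a single test point can be computed as below. This is an illustrative implementation rather than the authors' code; the small ridge term λ (guarding against a nearly singular 𝐽ᵀ𝐽) and the externally supplied t-quantile are our assumptions.

```python
import numpy as np

def delta_pi(J, g0, y, y_hat, y0_hat, t_quant):
    """Delta-technique PI of Eq. (5) for one test point.

    J       : (n, p) Jacobian of training predictions w.r.t. the p parameters
    g0      : (p,)   gradient of the model output at the test input x0
    y, y_hat: training targets and point predictions, used for s in Eq. (6)
    t_quant : the 1 - alpha/2 quantile of the t-distribution with n - p
              degrees of freedom (in practice, e.g. scipy.stats.t.ppf)
    """
    n, p = J.shape
    sse = float((y - y_hat) @ (y - y_hat))      # Eq. (2)
    s = np.sqrt(sse / (n - p))                  # Eq. (6)
    lam = 1e-8                                  # ridge guard for J^T J
    core = g0 @ np.linalg.solve(J.T @ J + lam * np.eye(p), g0)
    half = t_quant * s * np.sqrt(1.0 + core)    # half-width of Eq. (5)
    return y0_hat - half, y0_hat + half
```

For a purely linear model 𝑦 = 𝑋𝑤 the Jacobian is simply 𝑋 and 𝑔₀ = 𝑥₀; the ANFIS consequent part enjoys the same property, which is why its derivatives are cheap.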

For cases where (𝐽ᵀ𝐽) in (5) is (nearly) singular, a small regularizing term can be added, replacing (𝐽ᵀ𝐽)⁻¹ with (𝐽ᵀ𝐽 + 𝜆𝐼)⁻¹. The singularity problem is avoided at the cost of a relatively small increase in the widths of PIs. A detailed discussion of this technique and its extensions can be found in [5] and [7].

The delta technique as described above can be adapted and applied to ANFIS models. ANFIS uses the Takagi–Sugeno fuzzy inference model with a six layer feed-forward network [1]. It is essentially a multilayer feed-forward network that uses NN learning algorithms and fuzzy reasoning to map an input space to an output space. Similar to NN models [5], ANFIS models have been proven to be globally identifiable [17], so ANFIS can be considered as the nonlinear regression model shown in (1). 𝑤 corresponds to the set of premise (membership function) and consequent parameters of the ANFIS. These parameters are adjusted through minimization of (2) using backpropagation or hybrid training techniques [1]. Entries of 𝐽 and 𝑔₀ are derivatives of the ANFIS output with respect to its premise and consequent parameters (𝑝 parameters in total). As the consequent part of each rule in the ANFIS model is a linear combination of the input variables and a constant, these derivatives can be calculated in much less time than for NN models. The formula in (5) can then be applied for the construction of PIs for ANFIS models.

III. PI ASSESSMENT INDICES

There is limited information in the literature regarding how PIs can be assessed quantitatively. In the majority of papers applying PI construction techniques, discussion of the quality of PIs is either ignored, incomplete, or subjective. The PI Coverage Probability (PICP) [12] [13], the probability that the targets lie within the constructed PIs, is computed by,

𝑃𝐼𝐶𝑃 = (1/𝑛) ∑ᵢ₌₁ⁿ 𝑐ᵢ    (7)

where 𝑐ᵢ = 1 if 𝑦ᵢ ∈ [𝐿ᵢ, 𝑈ᵢ] and 𝑐ᵢ = 0 otherwise. 𝐿ᵢ and 𝑈ᵢ are the lower and upper bounds of the 𝑖-th PI respectively. From a theoretical standpoint, PICP should be sufficiently close to or greater than the nominal confidence level.
Normalized Mean Prediction Interval Width (NMPIW) [12] [13] is an index measuring how wide PIs are. NMPIW is calculated as follows,

𝑁𝑀𝑃𝐼𝑊 = (1/(𝑛𝑅)) ∑ᵢ₌₁ⁿ (𝑈ᵢ − 𝐿ᵢ)    (8)

where 𝑅 = 𝑦ₘₐₓ − 𝑦ₘᵢₙ. NMPIW is the mean of the PI widths normalized by the target range. The normalization allows PIs constructed for different targets to be compared objectively. If the extreme values of the targets are used as the PI bounds, NMPIW equals one (its maximum value). Practically, it is desirable to have PIs with a small NMPIW and a high PICP. These two measures are directly related: widening PIs increases their coverage probability, and demanding a higher coverage probability widens PIs. Therefore, a comprehensive measure needs to cover both aspects of PIs: width and coverage probability. As coverage probability is theoretically important, any proposed measure should heavily penalize PIs with an unsatisfactory coverage probability; thus, more importance should be given to PICP when developing the new measure. The Coverage-Width-based Criterion (CWC) [12] addresses these requirements,

𝐶𝑊𝐶 = 𝑁𝑀𝑃𝐼𝑊 / 𝜎(𝑃𝐼𝐶𝑃, 𝜂, 𝜇)    (9)

where 𝜎(⋅) is the sigmoidal function defined as follows,

𝜎(𝑃𝐼𝐶𝑃, 𝜂, 𝜇) = 1 / (1 + 𝑒^(−𝜂(𝑃𝐼𝐶𝑃 − 𝜇)))    (10)
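The three indices are a straightforward transcription of Eqs. (7)–(10); the sketch below is ours, not the authors' code, and the defaults 𝜂 = 200 and 𝜇 = 0.875 are the values later reported in Section V.

```python
import numpy as np

def picp(y, lower, upper):
    """PI coverage probability, Eq. (7): fraction of targets inside their PI."""
    return np.mean((y >= lower) & (y <= upper))

def nmpiw(lower, upper, y):
    """Normalized mean PI width, Eq. (8): mean width over the target range."""
    return np.mean(upper - lower) / (y.max() - y.min())

def cwc(y, lower, upper, eta=200.0, mu=0.875):
    """Coverage-width-based criterion, Eqs. (9)-(10).

    A PICP below mu drives the sigmoid toward zero, so dividing by it
    heavily penalizes PIs with unsatisfactory coverage probability.
    """
    sigma = 1.0 / (1.0 + np.exp(-eta * (picp(y, lower, upper) - mu)))
    return nmpiw(lower, upper, y) / sigma
```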

where 𝜂 and 𝜇 are two controlling parameters. The level of confidence associated with the PIs can be used as a guide for selecting the hyperparameters of CWC. One reasonable principle is to penalize PIs whose PICP is less than (1 − 𝛼)%, based on the theory that the coverage probability of PIs over an infinite number of replicates approaches (1 − 𝛼)%.

IV. PI OPTIMIZATION

The delta technique described in Section II is employed only for the construction of PIs; the ANFIS models themselves are trained using the backpropagation technique to minimize the SSE cost function. This section introduces a method for adjusting the ANFIS parameters to improve the quality of PIs in terms of their width and coverage probability. In the proposed optimization method, ANFIS models are first trained using the traditional backpropagation technique. The optimization algorithm then modifies these parameters in order to minimize a
PI-based cost function. The focus of the proposed method is on the optimal adjustment of ANFIS parameters based on the characteristics of PIs. The ANFIS model structure, as described in [1], remains unchanged. Hereafter, the opt subscript is used for all measures and models computed using the proposed optimization algorithm, and the bpm subscript indicates those computed with the backpropagation method.

The first step is to develop a cost function that includes measures related to the quality of PIs. As the CWC index covers both the width and the coverage probability of PIs, it should be at the core of the new PI-based cost function. The fundamental assumptions of the delta technique are valid when the ANFIS models are trained through minimization of SSE. To keep these assumptions valid when training ANFIS models with the new cost function, at least one of the following should be satisfied: (i) ŵ remains in a close neighborhood of the ANFIS parameters obtained through minimization of (2), or (ii) a new set of parameters is accepted only when it leads to a reduction or a small increase in (2). The requirement for the first option follows from the linearization in (3). The second option guarantees that SSE never increases substantially. Therefore, the following PI-error-based Cost Function (PICF) is proposed for training ANFIS and adjusting its premise and consequent parameters,

𝑃𝐼𝐶𝐹 = 𝐶𝑊𝐶 + 𝜋 𝑒^(𝛽(𝑆𝑆𝐸opt − 𝑆𝑆𝐸bpm))    (11)

where

𝜋 = 0 if 𝑆𝑆𝐸opt ≤ 𝑆𝑆𝐸bpm, and 𝜋 = 1 if 𝑆𝑆𝐸opt > 𝑆𝑆𝐸bpm    (12)

𝜋 is a coefficient controlling the penalization. 𝑃𝐼𝐶𝐹 = 𝐶𝑊𝐶 when 𝑆𝑆𝐸opt ≤ 𝑆𝑆𝐸bpm, meaning
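Equations (11)–(12) reduce to a few lines of code. The sketch below is ours; the value of the magnifying parameter 𝛽 is a placeholder, as the paper does not report the one used.

```python
import math

def picf(cwc, sse_opt, sse_bpm, beta=1.0):
    """PI-error-based cost function, Eqs. (11)-(12).

    cwc     : coverage-width-based criterion of the candidate parameters
    sse_opt : SSE of the candidate ANFIS parameters
    sse_bpm : SSE of the backpropagation-trained reference model
    beta    : magnifying parameter (placeholder value)
    """
    pi = 0.0 if sse_opt <= sse_bpm else 1.0                  # Eq. (12)
    return cwc + pi * math.exp(beta * (sse_opt - sse_bpm))   # Eq. (11)
```

When the candidate parameters do not worsen the SSE, PICF reduces to CWC; otherwise the exponential penalty grows quickly with the SSE deterioration.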


ANFIS prediction performance is sensitive to its premise and consequent parameters. This sensitivity results in complex PICF behavior with many local minima. Descent-based methods are not suitable for PICF minimization because they are easily trapped in local optima. PICF also depends on 𝜋 and PICP, both of which are discontinuous, so gradient-based optimization methods are not applicable. Derivative-free optimization methods, in contrast, require no information about the gradient or higher derivatives of the cost function. In addition, global optimization methods are not halted by the presence of a local minimum during the optimization process. Hence, they are the best candidates for minimizing (11) and adjusting the ANFIS parameters.

The PI optimization method is based on the Simulated Annealing (SA) technique introduced in [14].¹ SA randomly explores the neighborhood of the current solution seeking a better solution. It is able to escape from local minima because there is a probability of accepting a new solution that increases the cost function. This probability is controlled by a parameter called the cooling temperature. The training algorithm is described in Fig. 1.

Three data subsets are randomly selected from the available samples. The first two datasets, 𝐷₁ and 𝐷₂, are used for training, and the third, 𝐷test, is used as the test dataset. The backpropagation technique is used to train the ANFIS model on 𝐷₁ through minimization of SSE. The obtained set of ANFIS parameters (𝑤bpm) serves as the initial value for the optimization procedure, and 𝑆𝑆𝐸bpm (corresponding to 𝑤bpm) is calculated and used in the PICF defined in (11). The cooling temperature is initially set sufficiently large to allow uphill movements in the early iterations of the optimization algorithm.
In each iteration, a new set of parameters (𝑤new) is generated through random perturbation of one parameter of the current solution (𝑤opt). PIs are then constructed for the new set of ANFIS parameters. If the new parameters improve the PI quality (𝑃𝐼𝐶𝐹new < 𝑃𝐼𝐶𝐹opt), the optimization algorithm accepts the transition and sets 𝑤opt = 𝑤new. Otherwise, a random number between 0 and 1 is generated, and the decision to accept or reject the uphill movement is made based on the Boltzmann distribution (Steps 8 and 9 in Fig. 1). In case of rejection, 𝑤opt remains unchanged for the current iteration. The optimization algorithm continues (returning to Step 4) until one or more of the following termination criteria are met: (i) reaching the maximum number of iterations, (ii) no further improvement for a specific number of consecutive iterations, (iii) reaching a very low temperature, or (iv) 𝑃𝐼𝐶𝐹 becoming smaller than a predefined value.

The proposed method uses two training datasets, 𝐷₁ and 𝐷₂, for adjusting the ANFIS parameters. The optimization algorithm primarily uses the samples of 𝐷₂ for minimization of the proposed cost function.

¹Other metaheuristic optimization algorithms, such as genetic algorithms, can also be used for minimization of the PICF.


Step 1: Split the dataset into 𝐷₁, 𝐷₂, and 𝐷test.
Step 2: Train the ANFIS using the backpropagation technique for minimization of SSE.
Step 3: Initialization (𝑇 = 𝑇₀ and 𝑤opt = 𝑤bpm).
Step 4: Update the cooling temperature (𝑇ₖ).
Step 5: Generate a new set of ANFIS parameters (𝑤new) through random perturbation.
Step 6: Construct PIs for 𝐷₂ and calculate PICF.
Step 7: If 𝑃𝐼𝐶𝐹new < 𝑃𝐼𝐶𝐹opt, then 𝑤opt = 𝑤new and 𝑃𝐼𝐶𝐹opt = 𝑃𝐼𝐶𝐹new. Go to Step 10.
Step 8: If 𝑃𝐼𝐶𝐹new > 𝑃𝐼𝐶𝐹opt, generate a random number 𝑟 ∈ [0, 1].
Step 9: If 𝑟 ≤ 𝑒^(−(𝑃𝐼𝐶𝐹new − 𝑃𝐼𝐶𝐹opt)/(𝜅𝑇ₖ)), then 𝑤opt = 𝑤new and 𝑃𝐼𝐶𝐹opt = 𝑃𝐼𝐶𝐹new.
Step 10: If the termination criterion is not met, return to Step 4.
Step 11: Construct PIs for the test samples using 𝐴𝑁𝐹𝐼𝑆opt.

Fig. 1. The simulated annealing-based optimization algorithm.
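The loop of Fig. 1 can be sketched as a generic simulated-annealing skeleton. The initial temperature and geometric cooling factor follow the settings reported in Section V (𝑇₀ = 5, factor 0.95), while the perturbation scale, 𝜅 = 1, and the single low-temperature stopping rule are illustrative assumptions rather than the paper's exact choices.

```python
import math
import random

def sa_minimize(cost, w0, t0=5.0, cooling=0.95, kappa=1.0,
                max_iter=10000, step=0.5, t_min=1e-6, seed=0):
    """Skeleton of the SA loop in Fig. 1 (Steps 3-10).

    cost : callable evaluating PICF for a parameter vector (Step 6)
    w0   : initial parameters, e.g. the backpropagation solution w_bpm
    """
    rng = random.Random(seed)
    w_opt = list(w0)                                # Step 3: start from w_bpm
    f_opt = cost(w_opt)
    t = t0
    for _ in range(max_iter):
        t *= cooling                                # Step 4: geometric cooling
        w_new = list(w_opt)                         # Step 5: perturb one
        i = rng.randrange(len(w_new))               # randomly chosen parameter
        w_new[i] += rng.gauss(0.0, step)
        f_new = cost(w_new)                         # Step 6
        # Steps 7-9: always accept downhill moves; accept uphill moves
        # with Boltzmann probability exp(-(f_new - f_opt) / (kappa * t))
        if (f_new < f_opt or
                rng.random() <= math.exp(-(f_new - f_opt) / (kappa * t))):
            w_opt, f_opt = w_new, f_new
        if t < t_min:                               # Step 10: stop criterion
            break
    return w_opt, f_opt
```

As the temperature falls, the acceptance probability for uphill moves decays toward zero, so the search gradually becomes greedy, matching the behavior described for Fig. 2.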

The use of these two datasets helps to avoid over-fitting and improves the generalization power of the optimized ANFIS models. This feature is demonstrated through the case studies in the next section.

V. EXPERIMENTS AND RESULTS

Three real world case studies are implemented and analyzed to examine the effectiveness of the proposed method for the development of optimal ANFIS-based PIs. The level of confidence associated with all PIs is 90%. 𝜂 and 𝜇 are set to 200 and 0.875 respectively. The initial temperature is set to 5, which allows uphill movements in the early iterations of the optimization algorithm. A geometric cooling schedule with a cooling factor of 0.95 is applied during the optimization. In all experiments, it is assumed that each ANFIS input has three Gaussian Membership Functions (MFs). Therefore, the number of rules in the ANFIS model is 3^𝑛input, where 𝑛input is the number of inputs. For comparison, the optimized delta method for NNs, 𝐷𝑒𝑙𝑡𝑎𝑁𝑁opt [12], is also implemented for the three case studies. The NN structure is determined through a five-fold cross-validation technique.
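To make the rule-base size concrete: with three Gaussian MFs per input and the product t-norm, the normalized firing strengths over the 3^𝑛input rule grid can be computed as below. This is our illustration rather than the authors' implementation, and the MF parameters are arbitrary placeholders.

```python
import itertools
import numpy as np

def gaussian_mf(x, c, sigma):
    """Gaussian membership value exp(-(x - c)^2 / (2 sigma^2))."""
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def firing_strengths(x, centers, sigmas):
    """Normalized firing strengths for a full grid of rules.

    centers, sigmas : (n_input, 3) parameters of 3 Gaussian MFs per input.
    Returns a vector of length 3 ** n_input that sums to one.
    """
    n_input = len(x)
    # membership degree of each input in each of its three MFs
    mu = np.array([[gaussian_mf(x[i], centers[i][j], sigmas[i][j])
                    for j in range(3)] for i in range(n_input)])
    # one rule per combination of MF choices (product t-norm)
    w = np.array([np.prod([mu[i][combo[i]] for i in range(n_input)])
                  for combo in itertools.product(range(3), repeat=n_input)])
    return w / w.sum()
```

The exponential growth of the rule count with 𝑛input is one reason the case studies use small input subsets.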

Fig. 2. Profile of PICF (solid line) and the cooling temperature (dashed line) for T70.

A. Baggage Handling System

The first case study deals with the prediction of travel times of bags in an airport Baggage Handling System (BHS). The underlying targets are the times required to process 70% and 90% of each flight's bags, hereafter referred to as T70 and T90. The high level of uncertainty in this system is primarily due to the frequent occurrence of probabilistic events. Further experimental information can be found in [13] and [18].

The iteration history of PICF and the cooling temperature during the optimization process is displayed in Fig. 2 for T70. The early rapid fluctuations indicate the optimization algorithm's effort to avoid becoming stuck in local minima: several transitions to higher-cost solutions are permitted through the randomly generated transition probabilities. As the temperature falls (Step 4 of the optimization algorithm), uphill movements become less likely; the rapid fluctuations gradually decrease and PICF follows a continuous downward trend. After iteration 3000, the optimization algorithm becomes greedy and accepts only downhill transitions. Application of the optimization algorithm reduces 𝑃𝐼𝐶𝐹 for T90 from 58.39 to 50.27; for T70, the reduction is 7.14 (from 44.51 to 37.37). These results indicate that the ANFIS parameters determined using the traditional training algorithms are not optimal in terms of PI characteristics (width and coverage probability), and that the parameters obtained using the proposed optimization method lead to more reliable, higher quality PIs.

Table I summarizes NMPIW, PICP, and CWC for the test samples (𝐷test) of T70 and T90. An interesting feature of the proposed optimization method is that it squeezes PIs without compromising their coverage probability. These results show that the proposed PI optimization method not only reduces the width of PIs, but also improves their coverage probability.
The obtained results also indicate that 𝑃𝐼opt are of higher quality (greater PICP with lower NMPIW) than the PIs constructed using

TABLE I
PI CHARACTERISTICS FOR TEST SAMPLES (𝐷test) OF THE THREE CASE STUDIES

                         | Optimized Delta for NNs | Traditional method (bpm) | Proposed method (opt)
Target                   | PICP(%) NMPIW(%) CWC    | PICP(%) NMPIW(%) CWC     | PICP(%) NMPIW(%) CWC
T70                      | 95.59   43.43    43.43  | 97.06   43.89    43.89   | 98.53   36.84    36.84
T90                      | 91.18   58.95    58.99  | 89.71   62.58    63.34   | 91.18   55.45    55.48
Steam pressure           | 88.00   61.26    83.80  | 92.00   88.95    88.96   | 92.00   79.80    79.81
Main steam temperature   | 94.00   81.54    81.54  | 96.00   53.88    53.88   | 96.00   47.77    47.77
Reheat steam temperature | 84.00   46.39    50921  | 92.00   53.20    53.21   | 92.00   47.09    47.10
Dry bulb temperature     | 94.47   64.60    64.60  | 91.70   66.93    66.95   | 90.78   61.42    61.50
Moisture content         | 90.78   63.75    63.84  | 93.09   56.73    56.73   | 93.09   53.77    53.75
Wet bulb temperature     | 86.18   60.59    918.5  | 92.63   69.85    69.85   | 91.71   65.15    65.16

the optimized delta method for NNs. PIs become wider as the level of uncertainty in the data increases. From an operational standpoint, T90 is more affected by the occurrence of probabilistic events in the BHS than T70; its associated uncertainty is therefore higher, resulting in wider PIs (𝑁𝑀𝑃𝐼𝑊bpm). On average, the PIs for T90 are 18.97% wider than those for T70.

B. Power Station

The second case study is a 120 MW power plant described in [19]. The purpose of modeling is to predict the steam pressure, main steam temperature, and reheat steam temperature based on subsets of inputs. In this study, two variables are considered as the ANFIS inputs. The convergence behavior of the optimization algorithm for these three targets is similar to that shown in Fig. 2. The optimization algorithm terminates in fewer than 12,000 iterations. During the optimization, the cost function drops from 53.88 (first iteration) to 48.04 (termination point).

The evaluation measures for PIs constructed for the test samples of these three targets are summarized in Table I. Again, the quality of PIs is significantly improved through the application of the optimization algorithm. PIs for the steam pressure show the most improvement: while the width of PIs for this target decreased from 88.95% to 79.80%, the PICP remained unchanged (92%). The experimental results also reveal that the proposed optimized delta method for ANFIS outperforms the optimized delta method for NNs. The quality of 𝑃𝐼opt is superior to that of PIs constructed using the 𝐷𝑒𝑙𝑡𝑎𝑁𝑁opt method for all three targets of the power station. For the reheat steam temperature, the coverage probability of the NN-based PIs is low, resulting in a very large CWC. 𝑃𝐼bpm and 𝑃𝐼opt for the reheat steam temperature are illustrated in Fig. 3. The superiority of 𝑃𝐼opt over


Fig. 3. PIs constructed using the proposed optimization method (top) and the traditional technique (bottom) for reheat steam temperature (second case study).

𝑃𝐼bpm is obvious from this plot. The PIs for some samples are wider than others. This can be explained by the magnitude of the term 𝑔₀ᵀ(𝐽ᵀ𝐽)⁻¹𝑔₀ in (5) for those samples. All PIs have a minimum width determined by 2𝑡(1−𝛼/2, 𝑑𝑓)𝑠. If a test sample is significantly different from the training samples, the term 𝑔₀ᵀ(𝐽ᵀ𝐽)⁻¹𝑔₀ will be large, inflating the PI width from 2𝑡(1−𝛼/2, 𝑑𝑓)𝑠 to 2𝑡(1−𝛼/2, 𝑑𝑓)𝑠 √(1 + 𝑔₀ᵀ(𝐽ᵀ𝐽)⁻¹𝑔₀). This happened for many samples and is observable in Fig. 3, in particular for samples 26 and 43. The construction of PIs with different widths implicitly indicates that the ANFIS model does not treat all test samples equally. If a PI width differs greatly from the others, the sample is atypical for the model, and the reliability and accuracy of the ANFIS point prediction for that sample would be low.

C. Industrial Dryer

Data for the third case study comes from an industrial dryer study [20]. The underlying targets in this system are the dry bulb temperature, wet bulb temperature, and moisture content of raw material. The purpose of modeling is to predict these targets using the fuel flow rate, hot gas exhaust fan speed, and rate of flow of raw material. Table I summarizes the obtained PI measures for this case study. For this dataset, the average PI width for the three targets decreased by between 2.96% (moisture content of raw material) and 5.51% (dry bulb temperature). For the second target, the coverage probabilities for 𝑃𝐼bpm and 𝑃𝐼opt are the same; this value dropped slightly for the other two targets. Although the obtained PICPs remain above the nominal confidence level (90%), this small drop clearly shows how sensitive the PI characteristics are to the ANFIS parameters. Comparison of the CWCs for NN and ANFIS-based PIs demonstrates the superiority of the proposed method over the optimized delta technique for NNs. Although the quality of PIs constructed using 𝐷𝑒𝑙𝑡𝑎𝑁𝑁opt is acceptable, their coverage probability is lower than the nominal confidence level, for instance for the wet bulb temperature. Furthermore, they are wider than the ANFIS-based PIs, making them less informative.

Fig. 4. Evolution of membership functions for the first input of the ANFIS model of the dry bulb temperature target (third case study).

The evolution of the three Gaussian MFs for the first ANFIS input of the dry bulb temperature target is illustrated in Fig. 4. The optimization algorithm changes the parameters of these MFs in order to improve PI quality (narrower width with a higher coverage probability). The movement of the MFs to the left and right reflects the attempt of the optimization algorithm to escape from local minima and find the global optimum.

The three experimental results included here demonstrate the effectiveness of the proposed optimization algorithm. As the constructed PIs are narrow and properly cover the targets, the generalization power of the developed models is acceptable. The key point is that ANFIS models trained using the traditional learning algorithms are not optimal in terms of PI characteristics. If the purpose of model development is the construction of PIs, the proposed method provides PIs of improved quality.

It is also important to compare the computational requirements of PI construction using the delta technique for NNs and for ANFIS. Fig. 5 shows the elapsed time for PI construction for the test samples of the different targets. According to these results, the computational load of the delta technique for NNs is on average three times that required by the ANFIS method. This

Fig. 5. Elapsed time (sec.) for PI construction using NN and ANFIS models for the different case studies.

clearly shows that the ANFIS-based method for PI construction outperforms the traditional NN-based method in terms of computational demand. In conclusion, the proposed delta method for the construction of ANFIS-based PIs and their optimization leads to high quality PIs. Optimized ANFIS-based PIs not only are narrower with a higher coverage probability than traditional and optimized NN-based PIs [12] [18], but also carry a lower computational burden. These results encourage further investigation into the development and application of fuzzy systems for uncertainty quantification.

VI. CONCLUSION

The neural network delta technique for the construction of prediction intervals is adapted and optimized for ANFIS models in this paper. The main motivations for this study are the simplicity and lower computational cost of ANFIS models compared to NNs. Width and coverage probability-based measures are applied for the quantitative evaluation of prediction interval quality. Based on these measures, a new cost function is designed to simultaneously address the key characteristics of prediction intervals. The ANFIS premise and consequent parameters are trained through the minimization of this cost function. As the cost function is complex, nondifferentiable, and sensitive to the model parameters, the simulated annealing optimization method is employed for its minimization. The effectiveness of the proposed prediction interval optimization method is examined using three real world case studies. The obtained results indicate that the application of the proposed optimization method improves the quality of prediction intervals. Comparative studies also reveal that ANFIS-based PIs are of higher quality than NN-based PIs, and that their computational requirements are lower than those of the NN-based method for PI construction.
ACKNOWLEDGMENT

This research was fully supported by the Centre for Intelligent Systems Research (CISR) at Deakin University.

REFERENCES

[1] J.-S. R. Jang, "ANFIS: adaptive-network-based fuzzy inference system," IEEE Transactions on Systems, Man and Cybernetics, vol. 23, no. 3, pp. 665–685, 1993.
[2] J.-S. R. Jang, C.-T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Englewood Cliffs, NJ: Prentice-Hall, 1997.
[3] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.
[4] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford: Oxford University Press, 1995.
[5] J. T. G. Hwang and A. A. Ding, "Prediction intervals for artificial neural networks," Journal of the American Statistical Association, vol. 92, no. 438, pp. 748–757, 1997.
[6] A. Khosravi, S. Nahavandi, D. Creighton, and A. F. Atiya, "A lower upper bound estimation method for construction of neural network-based prediction intervals," IEEE Transactions on Neural Networks, DOI: 10.1109/TNN.2010.2096824, accepted 28 Nov. 2010.
[7] R. D. De Veaux, J. Schumi, J. Schweinsberg, and L. H. Ungar, "Prediction intervals for neural networks via nonlinear regression," Technometrics, vol. 40, no. 4, pp. 273–282, 1998.
[8] D. J. C. MacKay, "The evidence framework applied to classification networks," Neural Computation, vol. 4, no. 5, pp. 720–736, 1992.
[9] B. Efron, "Bootstrap methods: another look at the jackknife," The Annals of Statistics, vol. 7, no. 1, pp. 1–26, 1979.
[10] T. Heskes, "Practical confidence and prediction intervals," in Advances in Neural Information Processing Systems, M. Mozer, M. Jordan, and T. Petsche, Eds., vol. 9. MIT Press, 1997, pp. 176–182.
[11] D. Nix and A. Weigend, "Estimating the mean and variance of the target probability distribution," in IEEE International Conference on Neural Networks, vol. 1, 1994, pp. 55–60.
[12] A. Khosravi, S. Nahavandi, and D. Creighton, "Construction of optimal prediction intervals for load forecasting problem," IEEE Transactions on Power Systems, accepted Jan. 2010.
[13] ——, "A prediction interval-based approach to determine optimal structures of neural network metamodels," Expert Systems with Applications, vol. 37, pp. 2377–2387, 2010.
[14] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, pp. 671–680, 1983.
[15] S. Wu and T. Chow, "Self-organizing and self-evolving neurons: a new neural network for optimization," IEEE Transactions on Neural Networks, vol. 18, no. 2, pp. 385–396, 2007.
[16] S.-J. Ho, L.-S. Shu, and S.-Y. Ho, "Optimizing fuzzy neural networks for tuning PID controllers using an orthogonal simulated annealing algorithm OSA," IEEE Transactions on Fuzzy Systems, vol. 14, no. 3, pp. 421–434, 2006.
[17] J. A. M. and J. Benítez, "On the identifiability of TSK additive fuzzy rule-based models," Advances in Soft Computing, vol. 6, pp. 79–86, 2006.
[18] A. Khosravi, S. Nahavandi, and D. Creighton, "Constructing prediction intervals for neural network metamodels of complex systems," in International Joint Conference on Neural Networks (IJCNN), 2009, pp. 1576–1582.
[19] M. Moonen, B. De Moor, L. Vandenberghe, and J. Vandewalle, "On- and off-line identification of linear state-space models," International Journal of Control, vol. 49, no. 1, pp. 219–232, 1989.
[20] C. T. Chou and J. Maciejowski, "System identification using balanced parametrizations," IEEE Transactions on Automatic Control, vol. 42, no. 7, pp. 956–974, 1997.