Optimization of Temporal Processes: A Model Predictive Control Approach
Zhe Song and Andrew Kusiak, Department of Mechanical and Industrial Engineering, 3131 Seamans Center, The University of Iowa, Iowa City, IA 52242-1527 (Email:
[email protected])
Abstract. A dynamic predictive-control model of a nonlinear and temporal process is considered. Evolutionary computation and data-mining algorithms are integrated to solve the model: data-mining algorithms learn dynamic equations from process data, and an evolutionary algorithm then solves the optimization problem guided by the knowledge extracted by the data-mining algorithms. Several properties of the optimization model are discussed in detail, in particular the selection of regressors, time delays, prediction and control horizons, and weights. The concepts proposed in this paper are illustrated with an industrial case study of a combustion process.
Key Words: Model predictive control, Evolutionary strategy, Data mining, Nonlinear temporal process, Optimization.
Nomenclature

x                    vector of controllable variables of a process
x_i                  the ith controllable variable
v                    vector of non-controllable variables of a process
v_i                  the ith non-controllable variable
y                    a performance variable of a process
S                    search space of x
f(.)                 function capturing the mapping between (x, v) and y
t                    sampling time stamp
d_y, d_{x_i}, d_{v_i}            maximum possible time delays for y, x_i, v_i
D_y^y, D_{x_i}^y, D_{v_i}^y      sets of time-delay constants selected for the corresponding variables y, x_i, v_i under the performance variable y
d_y^{y,low}, d_y^{y,high}        lower and upper bounds of the set D_y^y
d_{x_i}^{y,low}, d_{x_i}^{y,high}    lower and upper bounds of the set D_{x_i}^y
d_{v_i}^{y,low}, d_{v_i}^{y,high}    lower and upper bounds of the set D_{v_i}^y
d_x^{y,min}, d_x^{y,max}         minimum and maximum time delays for the controllable variables in x
d_v^{y,min}, d_v^{y,max}         minimum and maximum time delays for the non-controllable variables in v
X                    set of all controllable variables
X^{1,y}, X^{2,y}     two subsets of X
N_p                  prediction horizon
N_c                  control horizon
r                    reference value for y
J(.)                 cost function related to x, y, and r
q(i)                 ith weight constant
R, S                 positive semi-definite weight matrices
N                    number of different data-mining algorithms used to identify the process function
f_i(.)               process function identified by data-mining algorithm i
λ                    offspring size
µ                    parent size or initial population size
S^p                  solution matrix of the pth individual
σ^p                  mutation matrix of the pth individual
N(.)                 normal distribution
1. Introduction

Optimizing nonlinear and non-stationary processes presents challenges for traditional solution approaches. A process can be represented as a triplet (x, v, y), where x ∈ R^k is a vector of k controllable variables, v ∈ R^m is a vector of m (measurable) non-controllable variables, and y ∈ R is a system performance variable. A greater value of y indicates better performance. The value of the performance variable changes in response to the controllable and non-controllable variables, which are considered in this research as input variables. The underlying relationship is represented as

y = f(x, v)

where f(.) is a function capturing the process; it may change in time. Finding control settings that optimize y can be formulated as the constrained single-objective optimization problem (1):

argmax_x  y    (1)
s.t.  x ∈ S

In model (1), S is the feasible search space. In many industrial applications, the non-controllable variables v, the underlying function f(.), and the search space S are time dependent. Finding optimal control settings for nonlinear and temporal processes poses several challenges:

1. It is difficult to derive analytic models describing f(.). For example, modeling the relationship between combustion-process efficiency and input variables is not trivial. Thus it is difficult to solve model (1) with traditional optimization techniques.

2. The function f(.) is non-stationary, so updating f(.) is necessary in practical applications. For example, a combustor ages over time, and regular maintenance and repair change the combustor's properties, thus impacting the combustion process.
Recent advances in Evolutionary Computation (EC) and Data Mining (DM) present an opportunity to model and optimize complex systems using operational data. In recent years, data-mining algorithms have proved effective in applications such as manufacturing, marketing, and combustion processes [3, 15, 20, 24], and many EC research and application projects have been successful [1, 2, 7, 8, 10, 18, 23, 28]. The results reported in the literature offer a promising direction for solving complex problems that are difficult to handle with traditional analytical methods. In this paper, DM and EC are combined to optimize a nonlinear and temporal process. A predictive control model is developed to control the process; the dynamic equations included in the model are learnt by data-mining algorithms. Various properties of the model predictive control problem are analyzed in this paper.
2. Dynamic Modeling

A process can be considered as a dynamic Multi-Input-Single-Output (MISO) system. Assume y(t) can be determined from the previous system status: {y(t-1), ..., y(t-d_y)}, {x_1(t-1), ..., x_1(t-d_{x_1})}, ..., {x_k(t-1), ..., x_k(t-d_{x_k})}, {v_1(t-1), ..., v_1(t-d_{v_1})}, ..., {v_m(t-1), ..., v_m(t-d_{v_m})}. The positive integers d_y, d_{x_1}, ..., d_{x_k}, d_{v_1}, ..., d_{v_m} are the maximum possible time delays considered for the corresponding variables. The dynamic MISO model can be extracted from historical process data by data-mining algorithms. Although many system identification algorithms are available, most of them assume a certain system structure [31]. Data mining offers many efficient algorithms for extracting predictive models from large amounts of data.
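The construction of lagged regressors described above can be sketched as follows. `build_lagged_matrix` is a hypothetical helper, shown under the simplifying assumption that all controllable variables share one maximum delay and all non-controllable variables share another:

```python
import numpy as np

def build_lagged_matrix(y, x_list, v_list, d_y, d_x, d_v):
    """Build a design matrix of lagged regressors for the MISO model.

    y        : 1-D array of the performance variable
    x_list   : list of 1-D arrays, one per controllable variable
    v_list   : list of 1-D arrays, one per non-controllable variable
    d_y, d_x, d_v : maximum time delays for y, each x_i, each v_i
    Returns (X, target) where row t of X holds
    y(t-1)..y(t-d_y), x_i(t-1)..x_i(t-d_x), v_i(t-1)..v_i(t-d_v).
    """
    d_max = max(d_y, d_x, d_v)
    rows, target = [], []
    for t in range(d_max, len(y)):
        row = [y[t - d] for d in range(1, d_y + 1)]
        for x in x_list:
            row += [x[t - d] for d in range(1, d_x + 1)]
        for v in v_list:
            row += [v[t - d] for d in range(1, d_v + 1)]
        rows.append(row)
        target.append(y[t])
    return np.array(rows), np.array(target)
```

Any regression algorithm can then be trained on `(X, target)` to approximate the MISO mapping.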
To obtain an accurate dynamic model that generalizes well for process optimization, selection of appropriate predictors is important. Data mining offers algorithms that can perform this task. For example, the boosting-tree algorithm [12-13] can be used to determine predictor importance, and the wrapper technique can be combined with a genetic random search to determine the best set of predictors [11, 26]. For the performance variable y(t), a regressor (predictor) selection algorithm selects a set of important predictors from {y(t-1), ..., y(t-d_y)}, {x_1(t-1), ..., x_1(t-d_{x_1})}, ..., {x_k(t-1), ..., x_k(t-d_{x_k})}, {v_1(t-1), ..., v_1(t-d_{v_1})}, ..., {v_m(t-1), ..., v_m(t-d_{v_m})}.
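As a simple stand-in for the boosting-tree importance ranking (which would require a tree-ensemble library), candidate lagged predictors can be ranked by a filter criterion such as absolute correlation with y(t). `rank_predictors` and its column names are illustrative, not the paper's actual procedure:

```python
import numpy as np

def rank_predictors(X, y, names, top_k):
    """Rank candidate lagged predictors by |correlation| with y(t).

    A filter-style stand-in for the boosting-tree importance ranking;
    X is the lagged design matrix and names labels its columns
    (e.g. "x1(t-5)").  Returns the top_k column names.
    """
    scores = []
    for j in range(X.shape[1]):
        c = np.corrcoef(X[:, j], y)[0, 1]  # Pearson correlation
        scores.append((abs(c), names[j]))
    scores.sort(reverse=True)              # most correlated first
    return [name for _, name in scores[:top_k]]
```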
For ease of discussion, different index sets for y need to be defined.

Definition 1. For response variable y, D_y^y = {d_y^{y,low}, ..., d_y^{y,high}} is a set of integers selected from {1, ..., d_y}, related to y's previous values and arranged in ascending order, with d_y^{y,low} ≤ d_y^{y,high}. Similarly, D_{x_1}^y = {d_{x_1}^{y,low}, ..., d_{x_1}^{y,high}} is a set selected from {1, ..., d_{x_1}} for predictors related to x_1, and D_{v_1}^y = {d_{v_1}^{y,low}, ..., d_{v_1}^{y,high}} is a set selected from {1, ..., d_{v_1}} for predictors related to v_1. In total there are 1 + k + m such sets for y: D_y^y, D_{x_1}^y, ..., D_{x_k}^y, D_{v_1}^y, ..., D_{v_m}^y.

Based on Definition 1, y = f(x, v) can be rewritten as the dynamic model

y(t) = f([y(t-d)]_{d∈D_y^y}, [x_1(t-d)]_{d∈D_{x_1}^y}, ..., [x_k(t-d)]_{d∈D_{x_k}^y}, [v_1(t-d)]_{d∈D_{v_1}^y}, ..., [v_m(t-d)]_{d∈D_{v_m}^y}),

where each bracketed term [·]_{d∈D} expands by enumerating all elements of the corresponding set.
3. Model Predictive Control

The concept of model predictive control (MPC) is used to determine optimal control settings of the process; here, however, the optimization is performed by an evolutionary algorithm. MPC has proven effective in industrial applications [16, 21, 22]. Illustrative applications of MPC solved by EC algorithms are discussed in [14], and a computational intelligence algorithm for solving the MPC problem is presented in [27]. However, the research reported in the literature indicates that the relationship between the controllable and non-controllable variables of the MPC has not been fully analyzed; in particular, the weights used in the MPC deserve further study. The basic elements of MPC are [11, 30]:

1. Sample the response, controllable, and non-controllable variables of the process.
2. Build a model with the data collected in the previous step.
3. Use the process model to predict its future behavior over a prediction horizon N_p when a control action is applied along a control horizon N_c.
4. Calculate the optimal control sequence {x(t), ..., x(t + N_c)} that minimizes a cost function J.
5. Apply the control setting x(t) and repeat the procedure at the next sampling time.

Suppose at sampling time t an attempt is made to find an optimal control sequence of the selected controllable variables x, {x(t), ..., x(t + N_c)}, that minimizes a cost function. Assume the prediction horizon is N_p and the reference signal for the performance variable y is {r(t), ..., r(t + N_p)}. The value of r is usually greater (better) than y at any time.
As in MPC, the optimization model (1) can be rewritten as

argmin_{x(t), ..., x(t + N_c)}  J(x, y, r)    (2)
subject to
x ∈ S
y(t) = f([y(t-d)]_{d∈D_y^y}, [x_1(t-d)]_{d∈D_{x_1}^y}, ..., [x_k(t-d)]_{d∈D_{x_k}^y}, [v_1(t-d)]_{d∈D_{v_1}^y}, ..., [v_m(t-d)]_{d∈D_{v_m}^y})

where

J(x, y, r) = Σ_{i=t}^{t+N_p} q(i)[r(i) - y(i)]^2 + Σ_{i=t}^{t+N_c} [x(i)^T R x(i) + Δx(i)^T S Δx(i)],

Δx(i) = x(i) - x(i-1), q(i) is a positive weight, and R and S are positive semi-definite matrices. Solving model (2) brings the system's current performance y(t) toward the desired values {r(t), ..., r(t + N_p)}. In some applications, the constraint x ∈ S can be replaced by more specific constraints. In this research, the dynamic models are constructed by data-mining algorithms, and an EC algorithm is used to solve model (2), as analytical models are not available.

To further analyze the MPC problem, suppose at sampling time t the system status is y(t), x_1(t), ..., x_k(t), v_1(t), ..., v_m(t), and all historical system information is available. Based on the dynamic equation y(t) = f([y(t-d)]_{d∈D_y^y}, [x_1(t-d)]_{d∈D_{x_1}^y}, ..., [x_k(t-d)]_{d∈D_{x_k}^y}, [v_1(t-d)]_{d∈D_{v_1}^y}, ..., [v_m(t-d)]_{d∈D_{v_m}^y}), some relationships among the input process variables need to be clarified. In this dynamic equation, d_y^{y,low}, d_{x_1}^{y,low}, ..., d_{x_k}^{y,low}, d_{v_1}^{y,low}, ..., d_{v_m}^{y,low} are the lower bounds of D_y^y, D_{x_1}^y, ..., D_{x_k}^y, D_{v_1}^y, ..., D_{v_m}^y.
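The cost function of model (2) can be evaluated directly from its definition. A minimal sketch, assuming the predicted trajectory of y has been computed elsewhere from the dynamic equation:

```python
import numpy as np

def cost_J(y_pred, r, x_seq, q, R, S, x_prev):
    """Cost function of model (2):
    J = sum_i q(i)[r(i)-y(i)]^2 + sum_i [x(i)^T R x(i) + dx(i)^T S dx(i)].

    y_pred : predicted y(t)..y(t+Np)        (length Np+1)
    r, q   : reference values and weights   (length Np+1)
    x_seq  : control sequence x(t)..x(t+Nc) (shape (Nc+1, k))
    x_prev : x(t-1), used for dx(t) = x(t) - x(t-1)
    """
    tracking = sum(q[i] * (r[i] - y_pred[i]) ** 2 for i in range(len(y_pred)))
    effort = 0.0
    prev = x_prev
    for x in x_seq:
        dx = x - prev                       # control move
        effort += x @ R @ x + dx @ S @ dx   # quadratic penalties
        prev = x
    return tracking + effort
```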
Definition 2. Let d_x^{y,min} = min{d_{x_1}^{y,low}, ..., d_{x_k}^{y,low}} be the smallest time delay of the controllable variables x; d_x^{y,max} = max{d_{x_1}^{y,high}, ..., d_{x_k}^{y,high}} the largest time delay of the controllable variables x; d_v^{y,min} = min{d_{v_1}^{y,low}, ..., d_{v_m}^{y,low}} the smallest time delay of the non-controllable variables v; and d_v^{y,max} = max{d_{v_1}^{y,high}, ..., d_{v_m}^{y,high}} the largest time delay of the non-controllable variables v.

Observation 1. If there exists a constant d ∈ ∪_{j=1}^{m} D_{v_j}^y with d < d_x^{y,min}, there is not sufficient information about the non-controllable variables to optimize y by finding the optimal control setting {x_1(t), ..., x_k(t)} at sampling time t.

To explain Observation 1, consider an illustrative example. For y(t) = f(x_1(t-3), v_1(t-1)), at sampling time t the controllable variable x_1(t) optimizing y(t+3) is to be determined, i.e., y(t+3) = f(x_1(t), v_1(t+2)). As v_1(t+2) is not known, it is difficult to find the optimal value of x_1(t).
Definition 3. Let X = {x_1, ..., x_k} be the set of all controllable variables. For a response variable y, define X^{1,y} = {x_j | d_{x_j}^{y,low} = d_x^{y,min}, j = 1, ..., k}. It is obvious that X^{1,y} ⊆ X. Let X^{2,y} = X - X^{1,y}; for any controllable variable x_j ∈ X^{2,y}, d_{x_j}^{y,low} > d_x^{y,min} holds.

Observation 2. At sampling time t, one can modify only the controllable variables in X^{1,y} to optimize y. X^{1,y} is called the actionable variable set of y.

To make sure all controllable variables are used to optimize y, Observation 2 offers hints for selecting predictors and the corresponding time delays. Observation 2 is explained by the following example. For y(t) = f(x_1(t-3), x_2(t-4)), at sampling time t, y(t+3) is optimized by modifying the values of the controllable variables, i.e., y(t+3) = f(x_1(t), x_2(t-1)). As x_2(t-1) has already occurred, y(t+3) is optimized with x_1(t) only. Based on Observation 2, the x vector in model (2) is composed of the variables belonging to X^{1,y}, i.e., x(t) = [x_j(t)]_{x_j ∈ X^{1,y}}.
Observation 3. There exist lower and upper bounds for the prediction horizon N_p and the control horizon N_c in model (2): UpBound_{N_p} = d_v^{y,min}, LowBound_{N_p} = d_x^{y,min}; UpBound_{N_c} = d_v^{y,min} - d_x^{y,min}, LowBound_{N_c} = 0. In addition, N_p and N_c should satisfy the equation N_c = N_p - d_x^{y,min}.

Consider the dynamic model y(t) = f(y(t-1), x_1(t-1), v_1(t-5)), with d_v^{y,min} = 5 and d_x^{y,min} = 1. To see why UpBound_{N_p} = 5 makes sense, predict forward from time t: y(t) = f(y(t-1), x_1(t-1), v_1(t-5)), y(t+1) = f(y(t), x_1(t), v_1(t-4)), y(t+2) = f(y(t+1), x_1(t+1), v_1(t-3)), y(t+3) = f(y(t+2), x_1(t+2), v_1(t-2)), ..., y(t+5) = f(y(t+4), x_1(t+4), v_1(t)), y(t+6) = f(y(t+5), x_1(t+5), v_1(t+1)). Since v_1(t+1) is not known, there is no way to accurately predict y(t+6). It is also clear that y(t) has already been determined by the historical values y(t-1), x_1(t-1), v_1(t-5); the control action can only start influencing y at y(t+1) by changing x_1(t). The control horizon can thus span x_1(t) to x_1(t+4), so the upper bound of N_c is 4, i.e., UpBound_{N_c} = d_v^{y,min} - d_x^{y,min} = 5 - 1 = 4.
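The bounds of Observation 3 follow directly from the two minimum delays; a minimal sketch:

```python
def horizon_bounds(d_x_min, d_v_min):
    """Horizon bounds from Observation 3: Np lies in [d_x_min, d_v_min],
    Nc lies in [0, d_v_min - d_x_min], and Nc = Np - d_x_min."""
    return {
        "Np_low": d_x_min, "Np_high": d_v_min,
        "Nc_low": 0, "Nc_high": d_v_min - d_x_min,
    }
```

For the example above, `horizon_bounds(1, 5)` gives an upper bound of 5 on N_p and 4 on N_c.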
4. Solving the MPC with an Evolutionary Computation Algorithm

The MPC problem is usually solved by nonlinear programming algorithms; in this study, it is solved with an evolutionary computation algorithm. Robustness of optimal solutions is important in many applications. In this paper, the robustness of model (2) is improved by using several data-mining algorithms and combining their predictions. Different definitions of robustness can be found in [9] and [25]. Using N different data-mining algorithms to learn the dynamic equation, model (2) can be rewritten as model (3):

argmin_{x(t), ..., x(t + N_c)}  J(x, y, r)    (3)
subject to
x ∈ S
y(t) = (1/N) Σ_{i=1}^{N} f_i([y(t-d)]_{d∈D_y^y}, [x_1(t-d)]_{d∈D_{x_1}^y}, ..., [x_k(t-d)]_{d∈D_{x_k}^y}, [v_1(t-d)]_{d∈D_{v_1}^y}, ..., [v_m(t-d)]_{d∈D_{v_m}^y})

In model (3), N different data-mining algorithms are used to construct f_i(.), i = 1, ..., N. The final prediction of y(t) is the average of the N predictions.
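The ensemble constraint of model (3) is a plain average over the N identified models; a minimal sketch treating each f_i as a callable:

```python
def ensemble_predict(models, regressors):
    """Average the predictions of the N identified models f_1..f_N,
    as in the constraint of model (3).  Each model is any callable
    mapping the regressor vector to a predicted y(t)."""
    return sum(f(regressors) for f in models) / len(models)
```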
Assume N_c and N_p satisfy all the conditions discussed above. The objective function

J(x, y, r) = Σ_{i=t}^{t+N_p} q(i)[r(i) - y(i)]^2 + Σ_{i=t}^{t+N_c} [x(i)^T R x(i) + Δx(i)^T S Δx(i)]

can be written as

J(x, y, r) = Σ_{i=t}^{t+d_x^{y,min}-1} q(i)[r(i) - y(i)]^2 + Σ_{i=t+d_x^{y,min}}^{t+N_p} q(i)[r(i) - y(i)]^2 + Σ_{i=t}^{t+N_c} [x(i)^T R x(i) + Δx(i)^T S Δx(i)],

where Σ_{i=t}^{t+d_x^{y,min}-1} q(i)[r(i) - y(i)]^2 is already determined irrespective of the values {x(t), ..., x(t + N_c)}. As a result, only

Σ_{i=t+d_x^{y,min}}^{t+N_p} q(i)[r(i) - y(i)]^2 + Σ_{i=t}^{t+N_c} [x(i)^T R x(i) + Δx(i)^T S Δx(i)]

is calculated for each individual solution and used as the fitness function value in the evolutionary algorithm.

Observation 4. The optimal solution of model (2) or (3) may decrease y at t + d_x^{y,min} when the optimal value of x(t) is implemented at sampling time t. Keeping the weight q(t + d_x^{y,min}) relatively larger than the other weights helps ensure that y at t + d_x^{y,min} is not decreased by changing x(t) to its optimal value.
Each individual solution can be represented as a matrix. Recall that x(t) = [x_j(t)]_{x_j ∈ X^{1,y}} is a vector with index j varying from j_low to j_high, i.e., x(t) = [x_{j_low}(t), ..., x_{j_high}(t)]^T. Let

{x(t), ..., x(t + N_c)} =
[ x_{j_low}(t)   ...  x_{j_low}(t + N_c)  ]
[     ...                  ...            ]
[ x_{j_high}(t)  ...  x_{j_high}(t + N_c) ]    (4)

In this paper, an evolutionary strategy (ES) [10] is selected to solve model (3).

Definition 4. Let λ be the offspring size, and µ be both the number of offspring selected and the initial population size. Individuals in the parent population are numbered from 1 to µ; individuals in the offspring population are numbered from 1 to λ.

Definition 5. The general form of the pth individual in the evolutionary strategy is (S^p, σ^p), where

S^p = [ x^p_{j_low}(t)  ... x^p_{j_low}(t + N_c)  ]    σ^p = [ σ^p_{j_low}(t)  ... σ^p_{j_low}(t + N_c)  ]
      [     ...                  ...              ]          [     ...                  ...              ]
      [ x^p_{j_high}(t) ... x^p_{j_high}(t + N_c) ]          [ σ^p_{j_high}(t) ... σ^p_{j_high}(t + N_c) ]

Each element of σ^p is used as the standard deviation of a zero-mean normal distribution.

The basic steps of the evolutionary strategy algorithm are [10]:

Algorithm 1
1: Initialize µ individuals (candidate solutions) to form the initial parent population.
2: Repeat until a stopping criterion is satisfied:
   2.1: Select and recombine parents from the parent population to generate λ offspring (children).
   2.2: Mutate the λ children.
   2.3: Select the best µ children based on fitness function values.
   2.4: Use the selected µ children as parents for the next generation.
4.1. Mutation

An individual (S^p, σ^p) is mutated using equations (5) and (6), with σ^p mutated first and S^p mutated next:

σ^p = σ^p • [ e^{N(0,τ') + N^t_{j_low}(0,τ)}   ...  e^{N(0,τ') + N^{t+N_c}_{j_low}(0,τ)}  ]
            [     ...                                      ...                            ]
            [ e^{N(0,τ') + N^t_{j_high}(0,τ)}  ...  e^{N(0,τ') + N^{t+N_c}_{j_high}(0,τ)} ]    (5)

where N(0, τ') is a random number drawn from a normal distribution with mean 0 and standard deviation τ', shared by all entries, and each N^t_j(0, τ) is a random number drawn with mean 0 and standard deviation τ independently for the corresponding entry σ^p_j(t). "•" denotes the Hadamard (element-wise) matrix product [29].

S^p = S^p + N(0, σ^p)    (6)

where N(0, σ^p) is a matrix of the same size as S^p; each element is generated from a normal distribution with mean 0 and the corresponding standard deviation in σ^p.
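Equations (5) and (6) can be sketched in a few lines of NumPy. The step-size bounds of 0.005 and 0.1 anticipate the case-study settings of Section 5.3 and are otherwise arbitrary defaults:

```python
import numpy as np

def mutate(S_p, sigma_p, tau, tau_prime, sigma_lo=0.005, sigma_hi=0.1,
           rng=np.random.default_rng()):
    """Self-adaptive ES mutation of equations (5)-(6).

    One global draw N(0, tau') is shared by all entries of sigma_p,
    one independent draw N(0, tau) is made per entry, and the
    solution matrix S_p is then perturbed element-wise.
    """
    global_draw = rng.normal(0.0, tau_prime)                 # shared term
    local_draws = rng.normal(0.0, tau, size=sigma_p.shape)   # per-entry terms
    sigma_new = sigma_p * np.exp(global_draw + local_draws)  # eq. (5)
    sigma_new = np.clip(sigma_new, sigma_lo, sigma_hi)       # bound step sizes
    S_new = S_p + rng.normal(0.0, 1.0, size=S_p.shape) * sigma_new  # eq. (6)
    return S_new, sigma_new
```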
4.2. Selection and recombination of parents

Definition 6. Let SelectedParents be an index set composed of n unique indices randomly selected from 1 to µ. SelectedParents changes every time it is generated. The value n is usually much smaller than µ (frequently n = 2).

To generate λ children, parents are selected from the parent population and recombined λ times. Each time, n parents are selected randomly to produce one child using equation (7):

( (1/n) Σ_{p∈SelectedParents} S^p,  (1/n) Σ_{p∈SelectedParents} σ^p )    (7)

A discrete recombination operator [10] was also applied in this research; however, it did not perform as well as the intermediary recombination operator in (7).
4.3. Children Selection
The selection process in the evolutionary strategy algorithm is simple: once the λ children are generated, the best µ of them are selected based on their fitness function values.
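Putting recombination, mutation, and selection together, Algorithm 1 can be sketched as follows. The `fitness` callable stands in for the MPC cost of model (3), and the τ, τ' heuristics follow the settings discussed in Section 5.3; this is a sketch, not the authors' implementation:

```python
import numpy as np

def evolve(fitness, init_parents, lam, mu, n=2, generations=50,
           rng=np.random.default_rng()):
    """(mu, lambda) evolutionary strategy of Algorithm 1 with the
    intermediary recombination of equation (7).

    fitness      : callable scoring a solution matrix S (lower is better)
    init_parents : list of (S, sigma) pairs, length mu
    """
    parents = list(init_parents)
    for _ in range(generations):
        children = []
        for _ in range(lam):
            # recombine n randomly chosen parents (eq. (7))
            idx = rng.choice(len(parents), size=n, replace=False)
            S = sum(parents[i][0] for i in idx) / n
            sigma = sum(parents[i][1] for i in idx) / n
            # mutate: shared and per-entry log-normal step-size update
            tau_p, tau = 1 / np.sqrt(2 * n), 1 / np.sqrt(2 * np.sqrt(n))
            sigma = sigma * np.exp(rng.normal(0, tau_p) +
                                   rng.normal(0, tau, sigma.shape))
            S = S + rng.normal(0, 1, S.shape) * sigma
            children.append((S, sigma))
        # keep the best mu children as parents of the next generation
        children.sort(key=lambda c: fitness(c[0]))
        parents = children[:mu]
    return parents[0]
```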
5. Industrial case study

To validate the concepts introduced in this paper, industrial data were collected from Boiler 11 (fluidized-bed combustion) of the University of Iowa Power Plant (UI PP). The boiler burns coal and biomass (oat hulls); the ratio of coal to oat hulls changes depending on the availability of oat hulls, and the load demand of Boiler 11 changes in time. Based on domain expertise, finding the optimal air flow is the first choice. Boiler 11 has two air flow inputs, the primary and the secondary air flow. Fig. 1 illustrates the proposed optimization framework. The boiler combustion process generates data streams stored in a data (PI) historian. The data streams are periodically used by the data-mining algorithms to update the process models with new data. The evolutionary strategy algorithm then determines the optimal air flow settings, which can be either recommended to the boiler operators or directly input as set points of the air flow controller.
[Fig. 1: block diagram linking the boiler combustion process, the PI historian, the data-mining algorithms, the identified process models, the evolutionary strategy optimization, and the boiler operators or automatic controllers via the control signal and the optimal air settings.]
Fig. 1. Illustration of the proposed boiler combustion process optimization by data mining and evolutionary strategy algorithms.
5.1 Data set, sampling frequency, and predictor selection
From the UI PP data historian, 5868 data points were sampled at 5-minute intervals. Data set 1 in Table 1 is the total data set of 5868 points, from "2/1/07 2:50 AM" to "2/21/07 11:45 AM." During this period, the boiler operations could be described as normal. Data set 1 was divided into two subsets: data set 2, consisting of 4715 data points, was used to extract models with data-mining algorithms, and data set 3 was used to test the models learnt from data set 2. Considering the noise in the industrial data, data set 1 was denoised by the moving-average approach with a lag of 4, and all data points were scaled to the interval [0, 1].

Table 1. The data set description.
Data set | Start time stamp  | End time stamp    | Description
1        | 2/1/07 2:50 AM    | 2/21/07 11:45 AM  | Total data set
2        | 2/1/07 2:50 AM    | 2/17/07 11:40 AM  | Training data set; total 4715 data points
3        | 2/17/07 11:45 AM  | 2/21/07 11:45 AM  | Testing data set; total 1153 data points
In this paper, boiler efficiency is heuristically modeled as a function of the coal-to-primary-air ratio (coal flow (klbs/hr) / primary air flow (klbs/hr)), the coal-to-secondary-air ratio (coal flow (klbs/hr) / secondary air flow (klbs/hr)), the coal-to-oat-hull ratio (coal flow (klbs/hr) / oat hull flow (klbs/hr)), and the coal and oat hull quality (BTU/lb). Other variables could be considered depending on the application context. Using the coal-to-primary-air ratio, the primary air flow can be determined from the current coal flow; similarly, the secondary air flow can be determined from the current coal flow and the coal-to-secondary-air ratio. The coal-to-oat-hull ratio is assumed to be a non-controllable variable from a practical standpoint.

The maximum time delays d_y, d_{x_1}, d_{x_2}, d_{v_1}, d_{v_2}, d_{v_3} are all assumed to be 9. With 5-minute sampling intervals, this corresponds to a maximum delay of 9 × 5 = 45 minutes. For example, if the operator changes the primary air flow right now, it is assumed to take at most 45 minutes for the change to have an observable effect on boiler efficiency. After running the boosting-tree algorithm [12-13] on data set 2, the importance of each predictor was calculated. Considering Observations 1-3, the 11 predictors listed in Table 2 were selected. The current boiler efficiency y(t) can then be written as

y(t) = f(y(t-1), y(t-2), x_1(t-5), x_1(t-6), x_1(t-7), x_2(t-5), x_2(t-6), x_2(t-7), v_1(t-9), v_2(t-9), v_3(t-9))    (8)

Table 2. Predictors used in the dynamic equation of boiler efficiency.
x_1(t-5) | Coal and primary air ratio
x_1(t-6) | Coal and primary air ratio
x_1(t-7) | Coal and primary air ratio
x_2(t-5) | Coal and secondary air ratio
x_2(t-6) | Coal and secondary air ratio
x_2(t-7) | Coal and secondary air ratio
v_1(t-9) | Coal and oat hull ratio
v_2(t-9) | Coal quality
v_3(t-9) | Oat hull quality
y(t-1)   | Combustion efficiency
y(t-2)   | Combustion efficiency
5.2 Learning dynamic equations from the process data

Five different data-mining algorithms were used to learn the dynamic equation (8) from data set 2: the classification and regression tree algorithm (C&R Tree) [6], the boosting-tree regression algorithm [12-13], the random forests regression algorithm [5], a back-propagation feed-forward neural network (NN), and the radial basis function (RBF) algorithm [4, 17]. The extracted models were tested on data set 3. Table 3 summarizes the models' prediction accuracy on data set 3, and Figs. 2(a) to (e) show the first 200 predicted and observed values of data set 3. The NN performed best among the five algorithms, and random forests performed worst. In summary, all the algorithms made high-quality predictions on the testing data and captured the system dynamics. However, for a temporal process, updating the learned models with new data points is necessary. The modeling task is simplified by using data-mining algorithms.

Table 3. Prediction accuracy of different algorithms for data set 3.
Algorithm      | Mean absolute error | Mean absolute error Std | Mean absolute error % | Mean absolute error % Std
C&R Tree       | 0.0055 | 0.0052 | 0.6436 | 0.6084
Boosting Tree  | 0.0040 | 0.0033 | 0.4742 | 0.3903
Random Forests | 0.0066 | 0.0055 | 0.7943 | 0.6708
NN             | 0.0030 | 0.0027 | 0.3515 | 0.3262
RBF            | 0.0054 | 0.0051 | 0.6471 | 0.6143
[Figs. 2(a)-(e): predicted vs. observed boiler efficiency for the first 200 points of data set 3, one panel per algorithm (C&R Tree, boosting tree, random forests, ANN, RBF); x-axis: point number in ascending time sequence, y-axis: boiler efficiency.]

Fig. 2. Predicted values and observed values of the first 200 data points of data set 3.
5.3 Evolutionary Strategy Algorithm

Based on Equation (8) and Observations 2-3, the prediction horizon N_p is set to 9 and the control horizon N_c to 4. An individual in the evolutionary strategy is thus represented by two 2×5 matrices:

S^p = [ x_1^p(t) ... x_1^p(t+4) ]    σ^p = [ σ_1^p(t) ... σ_1^p(t+4) ]
      [ x_2^p(t) ... x_2^p(t+4) ]          [ σ_2^p(t) ... σ_2^p(t+4) ]    (9)

Model (3) can be expressed as

argmin_{x(t), ..., x(t+4)}  J(x, y, r)    (10)
subject to
x ∈ S
y(t) = (1/5) Σ_{i=1}^{5} f_i(y(t-1), y(t-2), x_1(t-5), x_1(t-6), x_1(t-7), x_2(t-5), x_2(t-6), x_2(t-7), v_1(t-9), v_2(t-9), v_3(t-9))

where J(x, y, r) = Σ_{i=t}^{t+9} q(i)[r(i) - y(i)]^2 + Σ_{i=t}^{t+4} [x(i)^T R x(i) + Δx(i)^T S Δx(i)], and only Σ_{i=t+5}^{t+9} q(i)[r(i) - y(i)]^2 + Σ_{i=t}^{t+4} [x(i)^T R x(i) + Δx(i)^T S Δx(i)] needs to be calculated. Based on an analysis of the historical data distribution of x, the vector [x_1, x_2]^T is limited in this case study to values between [0.05, 0.05]^T and [0.2, 0.29]^T.

The ES parameters τ' and τ are determined from the heuristics τ' = 1/√(2n) and τ = 1/√(2√n), with n = 2, in this case the number of actionable variables [10]. The lower and upper bounds for the standard deviation values in σ^p are set to 0.005 and 0.1, respectively. Two parents are selected to generate one child. For the initial population at sampling time t, each column of S^p is generated by drawing uniform random numbers between [0.05, 0.05]^T and [x_1(t) + 0.1, x_2(t) + 0.1]^T; similarly, each column of σ^p is generated by drawing uniform random numbers between [0.005, 0.005]^T and [0.1, 0.1]^T. Although the literature suggests a selection pressure of µ/λ = 1/7 [10, 19], numerous experiments showed that µ/λ = 15/400 performs well and that 50 generations are enough for the algorithm to converge to a local optimum.

The reference values {r(t), ..., r(t+9)} are set to the constants {2, ..., 2}, since the boiler efficiency is always between 0 and 1. Based on Observation 4, the weight vector {q(t), ..., q(t+9)} plays an important role; different weights are discussed later. R and S are set to either the identity matrix I or the zero matrix 0. Figs. 3-5 are based on q(t+5) = 10, q(t+i) = 1 for i = 1, 2, 3, 4, 6, 7, 8, 9, and R = S = I.
At sampling time "2/18/2007 2:05 AM," model (10) was solved with the ES. Fig. 3 shows that the objective function value of the best individual in each generation decreases as the number of generations increases. The selection pressure µ/λ is important for the ES to converge to optimal values. Fig. 4 illustrates another example of solving model (10), at sampling time "2/21/2007 5:05 AM." µ = 15, λ = 400 achieves the best performance in terms of computational time and objective function value. For µ = 15, λ = 200, the ES converges to a higher value of the objective function, and for µ = 15, λ = 100 the ES appears to be unstable.
[Fig. 3: convergence curves of the best objective function value over 100 generations for λ = 100 to 700, with µ = 15.]
Fig. 3. ES solving model (10) for different offspring sizes at sampling time “2/18/2007 2:05 AM”; the objective function value is the best individual’s fitness value at each generation.
[Fig. 4: convergence curves of the best objective function value over 100 generations for λ = 100 to 700, with µ = 15.]
Fig. 4. ES solving the model (10) for different offspring sizes at sampling time “2/21/2007 5:05 AM”; the objective function value is the best individual’s fitness value at each generation.
[Fig. 5: convergence curves of the best objective function value over 100 generations for (λ, µ) = (400, 15), (533, 20), (667, 25), and (800, 30).]

Fig. 5. ES solving model (10) at "2/21/2007 5:05 AM," with the same selection pressure and increasing population size; the objective function value is the best individual's fitness value at each generation.

Next, µ was increased while the selection pressure was fixed at µ/λ = 0.0375. Fig. 5 demonstrates that, once the selection pressure is fixed, increasing µ does not significantly affect the performance of the ES in optimizing model (10).
5.4 Virtual testing for prediction of efficiency gains

At sampling time t, the current values of x_1(t) and x_2(t) are replaced by the optimal values determined from model (10). To see whether the combustion efficiency

y(t+5) = f(y(t+4), y(t+3), x_1(t), x_1(t-1), x_1(t-2), x_2(t), x_2(t-1), x_2(t-2), v_1(t-4), v_2(t-4), v_3(t-4))

is improved, a virtual testing approach is used. Virtual testing [20] validates the concepts introduced in this paper without live testing. The basic idea is to develop a valid computational model of the combustion process on which the proposed method can be tested. The virtual model is derived from historical data and simulates the real-time process: an artificial neural network is used to construct f(.) based on data set 1, so the underlying combustion process between "2/1/07 2:50 AM" and "2/21/07 11:45 AM" can be assumed to be modeled by f(.), which is then used to predict the combustion efficiency y from the inputs. Recall that model (10) is based on the predictive models f_1(.) to f_5(.), which were learnt from data set 2; data set 3 was not used to derive the five models. The optimization results derived from f_1(.) to f_5(.) are therefore tested on f(.), which represents the real-time combustion process in the window from "2/1/07 2:50 AM" to "2/21/07 11:45 AM."
Let the optimal values of x1 (t ) and x2 (t ) be denoted by x1* (t ) and x2* (t ) . If x1* (t ) and x2* (t ) are implemented at time t, the combustion efficiency changes from y (t + 5) to y* (t + 5) , where y* (t + 5) = f ( y (t + 4), y (t + 3), x1* (t ), x1 (t − 1), x1 (t − 2), x2* (t ), x2 (t − 1), x2 (t − 2), v1 (t − 4), v2 (t − 4), v3 (t − 4)) .
Two factors could affect the efficiency gains. First, based on Observation 4, y (t + 5) could decrease to y* (t + 5) if the weight q(t + 5) is not large enough. Second, if the models f1 (.) through f5 (.) do not accurately capture f (.) , the optimal solution of model (10) may decrease the efficiency predicted by f (.) . To address the second factor, models f1 (.) to f5 (.) could be updated with new process data to make sure they capture the underlying process accurately in real applications. To address the first factor, different values of the weight q(t + 5) are tested. Based on Observation 4, if q(t + 5) = 10 , q(t + i ) = 0 for i = 1, 2, 3, 4, 6, 7, 8, 9 , and R = S = 0 , the optimal solution will lead to an increased
combustion efficiency. Five experiments were performed to assess the impact of different combinations of weights (see Table 4). In each experiment, model (10) was solved for each data point in data set 3. Each data point in data set 3 is controlled with the optimal values x1* (t ) and x2* (t ) , and y (t + 5) and y* (t + 5) are compared to see whether there is an efficiency gain. All the experiments in Table 4 were run with 50 generations and µ / λ = 15 / 400 ; x1* (t ) and x2* (t ) were obtained from the best individual in the 50th generation.

Table 4. Experiments performed to evaluate the impact of weights.

Exp #  Description
1      q(t + 5) = 1,    R = S = I, q(t + i) = 1, i = 1, 2, 3, 4, 6, 7, 8, 9
2      q(t + 5) = 10,   R = S = I, q(t + i) = 1, i = 1, 2, 3, 4, 6, 7, 8, 9
3      q(t + 5) = 100,  R = S = I, q(t + i) = 1, i = 1, 2, 3, 4, 6, 7, 8, 9
4      q(t + 5) = 1000, R = S = I, q(t + i) = 1, i = 1, 2, 3, 4, 6, 7, 8, 9
5      q(t + 5) = 10,   R = S = 0, q(t + i) = 0, i = 1, 2, 3, 4, 6, 7, 8, 9
In experiments 1 through 5, as q(t + 5) increases relative to the other weights, Observation 4 implies that y* (t + 5) exceeds y (t + 5) with higher confidence.
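The weight combinations of Table 4 plug into an MPC-style cost of the following general shape. The exact form of model (10) is not reproduced here; this sketch assumes a diagonal tracking weight q over the prediction horizon and scaled-identity move penalties for R and S, with hypothetical numbers for the targets, predictions, and control moves.

```python
def mpc_objective(r, y, q, du, dx, R_diag=1.0, S_diag=1.0):
    """Weighted MPC cost: tracking term sum_i q[i]*(r[i]-y[i])**2 plus
    quadratic penalties on control moves. R and S are taken as scaled
    identity matrices, matching R = S = I or R = S = 0 in Table 4."""
    tracking = sum(qi * (ri - yi) ** 2 for qi, ri, yi in zip(q, r, y))
    move_penalty = R_diag * sum(d * d for d in du) + S_diag * sum(d * d for d in dx)
    return tracking + move_penalty

# Nine prediction steps t+1..t+9; the efficiency target r exceeds the prediction y.
r = [0.90] * 9
y = [0.88] * 9
du = [0.1, 0.05]          # hypothetical control moves for x1, x2
dx = [0.02, 0.01]

# Exp. 1: equal weights, R = S = I.  Exp. 5: all weight on t+5, R = S = 0.
q_exp1 = [1] * 9
q_exp5 = [0, 0, 0, 0, 10, 0, 0, 0, 0]
cost_exp1 = mpc_objective(r, y, q_exp1, du, dx, R_diag=1.0, S_diag=1.0)
cost_exp5 = mpc_objective(r, y, q_exp5, du, dx, R_diag=0.0, S_diag=0.0)
```

Under Exp. 5, only the tracking error at t + 5 contributes to the cost, which is why its optimizer is free to maximize the t + 5 efficiency alone.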
Table 5. Experiment results for the settings of Table 4.

Exp #  Average efficiency gain y*(t + 5) − y(t + 5)  Std deviation  # of positive gains  # of non-positive gains
1      -0.0335                                       0.0248         58                   1095
2       0.01592                                      0.0159         1106                 47
3       0.03309                                      0.0155         1153                 0
4       0.03352                                      0.0154         1153                 0
5       0.03366                                      0.0154         1153                 0
Fig. 6. Efficiency gains (Exp. 1–5) of the first 300 data points of data set 3, controlled with the optimal values x1* (t ) and x2* (t ) .
The data in Table 5 indicate that the equal weights of Exp. 1 result in a large number of data points with decreased efficiency. In Exp. 2, the number of non-positive efficiency gains is significantly reduced. In Exp. 3–5, all data points are controlled with positive efficiency gains, and the average efficiency gain increases from Exp. 1 to Exp. 5. Even a 1% improvement (i.e., a 0.01 efficiency gain) in combustion efficiency translates into significant savings in the power industry. Table 5 shows that the proposed approach can improve combustion efficiency without live tests. Fig. 6 plots the efficiency gains of the first 300 data points of data set 3. Note that in Exp. 3–5, as q(t + 5) is large relative to the other weights, the efficiency gains of the three experiments tend to be the same.
6. Conclusion A generic framework for optimizing a nonlinear and temporal process was presented. The framework combines three techniques: evolutionary algorithms, data mining, and model predictive control. The dynamics of the temporal process can be captured by periodically updating the predictive models with recent process data. Unlike traditional MPC, noncontrollable variables are directly considered in the predictive models, thus providing more accurate dynamic information about the
process. The weights of the MPC model are shown to be important in optimizing the process at each sampling step. Although an evolutionary strategy algorithm was used in this paper, other evolutionary computation algorithms could optimize the model predictive control (MPC) model. Selection pressure is a critical parameter of the evolutionary strategy algorithm, as it determines the extent to which better individuals are carried into the next generation. The recombination operator is another important factor in the evolutionary strategy algorithm; a discrete recombination operator was initially used in this research and was then replaced by a better-performing one. Producing a suboptimal solution for a data point of data set 3 took about a minute on a standard PC, so a 5-minute sampling interval provides enough time for the algorithm to solve the model in many applications. Future research could focus on multi-objective optimization, where several competing performance variables are considered; multi-objective evolutionary algorithms could be developed to solve such problems.
References
[1] J.M. Alonso, F. Alvarruiz, J.M. Desantes, L. Hernández, V. Hernández, and G. Moltó, "Combining neural networks and genetic algorithms to predict and reduce diesel engine emissions," IEEE Transactions on Evolutionary Computation, vol. 11, pp. 46-55, 2007.
[2] A. Benedetti, M. Farina, and M. Gobbi, "Evolutionary multiobjective industrial design: the case of a racing car tire-suspension system," IEEE Transactions on Evolutionary Computation, vol. 10, pp. 230-244, 2006.
[3] M.J.A. Berry and G.S. Linoff, Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. New York: Wiley, 2004.
[4] C. Bishop, Neural Networks for Pattern Recognition. Oxford, UK: Oxford University Press, 1995.
[5] L. Breiman, "Random forests," Machine Learning, vol. 45, pp. 5-32, 2001.
[6] L. Breiman, J.H. Friedman, and R.A. Olshen, Classification and Regression Trees. Monterey, CA: Wadsworth International, 1984.
[7] D. Büche, P. Stoll, R. Dornberger, and P. Koumoutsakos, "Multiobjective evolutionary algorithm for the optimization of noisy combustion processes," IEEE Transactions on Systems, Man, and Cybernetics – Part C, vol. 32, pp. 460-473, 2002.
[8] Z. Cai and Y. Wang, "A multiobjective optimization-based evolutionary algorithm for constrained optimization," IEEE Transactions on Evolutionary Computation, vol. 10, pp. 658-675, 2006.
[9] K. Deb and H. Gupta, "Introducing robustness in multi-objective optimization," Evolutionary Computation, vol. 14, pp. 463-494, 2006.
[10] A.E. Eiben and J.E. Smith, Introduction to Evolutionary Computing. New York: Springer-Verlag, 2003.
[11] J. Espinosa, J. Vandewalle, and V. Wertz, Fuzzy Logic, Identification and Predictive Control. London, UK: Springer-Verlag, 2005.
[12] J.H. Friedman, "Stochastic gradient boosting," Computational Statistics & Data Analysis, vol. 38, pp. 367-378, 2002.
[13] J.H. Friedman, "Greedy function approximation: A gradient boosting machine," Annals of Statistics, vol. 29, pp. 1189-1232, 2001.
[14] H. Ghezelayagh and K.Y. Lee, "Intelligent predictive control of a power plant with evolutionary programming optimizer and neuro-fuzzy identifier," in Proceedings of the 2002 Congress on Evolutionary Computation, 2002, pp. 1308-1313.
[15] J.A. Harding, M. Shahbaz, S. Srinivas, and A. Kusiak, "Data mining in manufacturing: a review," ASME Transactions: Journal of Manufacturing Science and Engineering, vol. 128, pp. 969-976, 2006.
[16] V. Havlena and J. Findejs, "Application of model predictive control to advanced combustion control," Control Engineering Practice, vol. 13, pp. 671-680, 2005.
[17] S. Haykin, Neural Networks: A Comprehensive Foundation. New York: Macmillan Publishing, 1994.
[18] J.S. Heo, K.Y. Lee, and R. Garduno-Ramirez, "Multiobjective control of power plants using particle swarm optimization techniques," IEEE Transactions on Energy Conversion, vol. 21, pp. 552-561, 2006.
[19] T. Jansen, K.A. De Jong, and I. Wegener, "On the choice of the offspring population size in evolutionary algorithms," Evolutionary Computation, vol. 13, pp. 413-440, 2005.
[20] A. Kusiak and Z. Song, "Combustion efficiency optimization and virtual testing: A data-mining approach," IEEE Transactions on Industrial Informatics, vol. 2, pp. 176-184, 2006.
[21] X.J. Liu and C.W. Chan, "Neural-fuzzy generalized predictive control of boiler steam temperature," IEEE Transactions on Energy Conversion, vol. 21, pp. 900-908, 2006.
[22] C.H. Lu and C.C. Tsai, "Generalized predictive control using recurrent fuzzy neural networks for industrial processes," Journal of Process Control, vol. 17, pp. 83-92, 2007.
[23] C.K. Tan, S. Kakietek, S.J. Wilcox, and J. Ward, "Constrained optimisation of pulverised coal fired boiler using genetic algorithms and artificial neural networks," International Journal of COMADEM, vol. 9, pp. 39-46, 2006.
[24] P.N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Boston, MA: Pearson Education/Addison Wesley, 2006.
[25] S. Tsutsui and A. Ghosh, "Genetic algorithms with a robust solution searching scheme," IEEE Transactions on Evolutionary Computation, vol. 1, pp. 201-208, 1997.
[26] I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition. San Francisco, CA: Morgan Kaufmann, 2005.
[27] T. Zhang and J. Lu, "A PSO-based multivariable fuzzy decision-making predictive controller for a once-through 300-MW power plant," International Journal of Cybernetics and Systems, vol. 7, pp. 417-441, 2006.
[28] H. Zhou, K. Cen, and J. Fan, "Modeling and optimization of the NOx emission characteristics of a tangentially fired boiler with artificial neural networks," Energy, vol. 29, pp. 167-183, 2004.
[29] http://en.wikipedia.org/wiki/Matrix_multiplication, accessed April 17, 2007.
[30] E.F. Camacho and C. Bordons, Model Predictive Control. London, UK: Springer-Verlag, 1999.
[31] Y.C. Zhu, Multivariable System Identification for Process Control. New York: Pergamon, 2001.
Appendix

A1. Proof of Observation 1.
Proof: The predictors of
y(t) = f([y(t − d)]_{d∈D_y^y}, [x_1(t − d)]_{d∈D_{x_1}^y}, ..., [x_k(t − d)]_{d∈D_{x_k}^y}, [v_1(t − d)]_{d∈D_{v_1}^y}, ..., [v_m(t − d)]_{d∈D_{v_m}^y})
can be partitioned into three parts, with the predictors ordered within each part. The first part concerns y's past status, ordered in ascending sequence of time delays: {y(t − d_y^{y,low}), ..., y(t − d_y^{y,high})}. The second part contains the controllable variables, {x_s(t − d_x^{y,min}), ..., x_p(t − d_x^{y,max})}, arranged in ascending order of time delays, where x_s is the controllable variable for which d_{x_s}^{y,low} = d_x^{y,min} = min{d_{x_1}^{y,low}, ..., d_{x_k}^{y,low}}, and x_p is the controllable variable for which d_{x_p}^{y,high} = d_x^{y,max} = max{d_{x_1}^{y,high}, ..., d_{x_k}^{y,high}}. The third part concerns the noncontrollable variables, {v_s(t − d_v^{y,min}), ..., v_p(t − d_v^{y,max})}, arranged in ascending order of time delays, where v_s is the noncontrollable variable for which d_{v_s}^{y,low} = d_v^{y,min} = min{d_{v_1}^{y,low}, ..., d_{v_m}^{y,low}}, and v_p is the noncontrollable variable for which d_{v_p}^{y,high} = d_v^{y,max} = max{d_{v_1}^{y,high}, ..., d_{v_m}^{y,high}}.
The dynamic equation can then be rewritten as
y(t) = f(y(t − d_y^{y,low}), ..., y(t − d_y^{y,high}), x_s(t − d_x^{y,min}), ..., x_p(t − d_x^{y,max}), v_s(t − d_v^{y,min}), ..., v_p(t − d_v^{y,max})).
At sampling time t, y is optimized by changing the values of {x_1(t), ..., x_k(t)}. The modified value of x_s(t) optimizes y(t + d_x^{y,min}) as
y(t + d_x^{y,min}) = f(y(t − d_y^{y,low} + d_x^{y,min}), ..., y(t − d_y^{y,high} + d_x^{y,min}), x_s(t), ..., x_p(t − d_x^{y,max} + d_x^{y,min}), v_s(t − d_v^{y,min} + d_x^{y,min}), ..., v_p(t − d_v^{y,max} + d_x^{y,min})).
Since the current and past system status is known at sampling time t, if any item among {−d_v^{y,min} + d_x^{y,min}, ..., −d_v^{y,max} + d_x^{y,min}} is greater than 0, future information about the noncontrollable variables is needed. To continuously optimize y by the MPC approach, one needs to make sure that −d_v^{y,min} + d_x^{y,min} ≤ 0, i.e., d_x^{y,min} ≤ d_v^{y,min}. ∎
A2. Proof of Observation 2.
Proof: Rewrite
y(t) = f([y(t − d)]_{d∈D_y^y}, [x_1(t − d)]_{d∈D_{x_1}^y}, ..., [x_k(t − d)]_{d∈D_{x_k}^y}, [v_1(t − d)]_{d∈D_{v_1}^y}, ..., [v_m(t − d)]_{d∈D_{v_m}^y})
into a simpler form by keeping only the controllable variables:
y(t) = f(..., [x_j(t − d_x^{y,min})]_{x_j∈X^{1,y}}, [x_m(t − d)]_{x_m∈X^{2,y}, d∈D_{x_m}^y}, ...).
At sampling time t, {x_1(t), ..., x_k(t)} can be changed, and
y(t + d_x^{y,min}) = f(..., [x_j(t)]_{x_j∈X^{1,y}}, [x_m(t − d + d_x^{y,min})]_{x_m∈X^{2,y}, d∈D_{x_m}^y}, ...).
To optimize y(t + d_x^{y,min}), only [x_j(t)]_{x_j∈X^{1,y}} can be changed. Since −d + d_x^{y,min} < 0 for x_m ∈ X^{2,y} and d ∈ D_{x_m}^y, the values [x_m(t − d + d_x^{y,min})]_{x_m∈X^{2,y}, d∈D_{x_m}^y} are all historical values that have already occurred, and it is impossible to change historical data to optimize y(t + d_x^{y,min}). ∎
A3. Proof of Observation 3.
Proof: Due to the limited information about the noncontrollable variables, there exist upper bounds for N_p and N_c. The first part of the objective function is Σ_{i=t}^{t+N_p} q(i)[r(i) − y(i)]^2, which can be expanded by substituting y(i) using the dynamic equation:
Σ_{i=t}^{t+N_p} q(i)[r(i) − f([y(i − d)]_{d∈D_y^y}, [x_1(i − d)]_{d∈D_{x_1}^y}, ..., [x_k(i − d)]_{d∈D_{x_k}^y}, [v_1(i − d)]_{d∈D_{v_1}^y}, ..., [v_m(i − d)]_{d∈D_{v_m}^y})]^2.
As i increases from t to t + N_p, the indices (i − d) of {[v_1(i − d)]_{d∈D_{v_1}^y}, ..., [v_m(i − d)]_{d∈D_{v_m}^y}} eventually exceed t, provided that N_p is large enough. It is easy to see that when i equals t + d_v^{y,min}, some of these indices become t; when i increases further, some become larger than t, which means that future information about some of the noncontrollable variables is needed to predict y(i). Therefore, the upper bound for N_p is UpBound_{N_p} = d_v^{y,min}.
To determine the upper bound of N_c, d_x^{y,min} ≤ d_v^{y,min} needs to hold, otherwise model (2) cannot be optimized. Note also that N_p is always greater than N_c. As i increases from t to t + N_p, the indices (i − d) of {[x_1(i − d)]_{d∈D_{x_1}^y}, ..., [x_k(i − d)]_{d∈D_{x_k}^y}} become t when i is t + d_x^{y,min} and become larger than t as i increases further. Therefore, before i reaches t + d_x^{y,min}, all values of {[x_1(i − d)]_{d∈D_{x_1}^y}, ..., [x_k(i − d)]_{d∈D_{x_k}^y}} are historical and cannot be changed. In other words, y(t) to y(t + d_x^{y,min} − 1) are already determined by historical values, while y(t + d_x^{y,min}) to y(t + N_p) can be changed by changing x(t) to x(t + N_p − d_x^{y,min}). In order to optimize model (2), N_p − d_x^{y,min} ≥ 0, so the lower bound of N_p is LowBound_{N_p} = d_x^{y,min}. Since N_c = N_p − d_x^{y,min}, the upper bound of N_c is UpBound_{N_p} − d_x^{y,min} = d_v^{y,min} − d_x^{y,min}. Let UpBound_{N_c} = d_v^{y,min} − d_x^{y,min}. ∎
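The bounds derived in this proof reduce to a simple computation once the minimum delays are known. The sketch below encodes them, using the case-study delays (x(t) first affects y at t + 5, so d_x^{y,min} = 5; v enters as v(t − 4) in the model of y(t + 5), so d_v^{y,min} = 9).

```python
def horizon_bounds(dx_min, dv_min):
    """Horizon bounds from Observation 3. Requires d_x^{y,min} <= d_v^{y,min};
    returns (LowBound_Np, UpBound_Np, UpBound_Nc)."""
    if dx_min > dv_min:
        raise ValueError("model cannot be optimized: d_x^{y,min} > d_v^{y,min}")
    low_np = dx_min            # LowBound_Np = d_x^{y,min}
    up_np = dv_min             # UpBound_Np  = d_v^{y,min}
    up_nc = dv_min - dx_min    # UpBound_Nc  = d_v^{y,min} - d_x^{y,min}
    return low_np, up_np, up_nc

# Case-study delays of the combustion model.
low_np, up_np, up_nc = horizon_bounds(5, 9)
```

For the combustion case study this gives a prediction horizon between 5 and 9 steps and a control horizon of at most 4, consistent with the nine weighted prediction steps q(t + 1) to q(t + 9) used in the experiments.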
A4. Proof of Observation 4.
Proof: For ease of discussion, the weight matrices R and S are assumed to be zero; when they are not zero, the conclusion still holds, as shown in the proof. Let {x(t) = x(t), ..., x(t + N_c) = x(t)} be the solution in which x does not change during the control horizon, with corresponding performance values {y(t + d_x^{y,min}), ..., y(t + N_p)}. Let {x*(t), ..., x*(t + N_c)} be the optimal solution of model (3), with corresponding performance values {y*(t + d_x^{y,min}), ..., y*(t + N_p)}. Since {x*(t), ..., x*(t + N_c)} is the optimal solution, the objective function Σ_{i=t+d_x^{y,min}}^{t+N_p} q(i)[r(i) − y*(i)]^2 is smaller than or equal to Σ_{i=t+d_x^{y,min}}^{t+N_p} q(i)[r(i) − y(i)]^2.
However, this inequality does not guarantee that q(t + d_x^{y,min})[r(t + d_x^{y,min}) − y*(t + d_x^{y,min})]^2 ≤ q(t + d_x^{y,min})[r(t + d_x^{y,min}) − y(t + d_x^{y,min})]^2. In other words, [r(t + d_x^{y,min}) − y*(t + d_x^{y,min})]^2 could be larger than [r(t + d_x^{y,min}) − y(t + d_x^{y,min})]^2. Note that the value of r is greater (better) than y at any time, so r(t + d_x^{y,min}) − y*(t + d_x^{y,min}) ≥ r(t + d_x^{y,min}) − y(t + d_x^{y,min}) could hold. That is to say, if x(t) is replaced with x*(t), y(t + d_x^{y,min}) may be decreased to y*(t + d_x^{y,min}).
Rewrite Σ_{i=t+d_x^{y,min}}^{t+N_p} q(i)[r(i) − y*(i)]^2 ≤ Σ_{i=t+d_x^{y,min}}^{t+N_p} q(i)[r(i) − y(i)]^2 as Σ_{i=t+d_x^{y,min}}^{t+N_p} q(i){[r(i) − y*(i)]^2 − [r(i) − y(i)]^2} ≤ 0, let a(i) = [r(i) − y*(i)]^2 − [r(i) − y(i)]^2, and expand the inequality into
q(t + d_x^{y,min}) a(t + d_x^{y,min}) + Σ_{i=t+d_x^{y,min}+1}^{t+N_p} q(i) a(i) ≤ 0,
which is equivalent to
a(t + d_x^{y,min}) ≤ − [Σ_{i=t+d_x^{y,min}+1}^{t+N_p} q(i) a(i)] / q(t + d_x^{y,min}).
Note that a(i) ≤ 0 is equivalent to y*(i) ≥ y(i). To make sure a(t + d_x^{y,min}) ≤ 0 holds once an optimal solution is determined, the right-hand side − [Σ_{i=t+d_x^{y,min}+1}^{t+N_p} q(i) a(i)] / q(t + d_x^{y,min}) has to approach zero. The values of the weights q(i) can be adjusted accordingly: one easy way is to make q(t + d_x^{y,min}) large enough and q(i), i = t + d_x^{y,min} + 1, ..., t + N_p, small enough. ∎
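Observation 4 can be illustrated numerically: when the weight on the first prediction step dominates, the optimizer cannot trade that step away in favor of later ones. The two-step predictions y1 and y2 below are invented toy functions (the first step prefers one control value, the second another), not the combustion models.

```python
def best_u(q, targets, preds, grid):
    """Grid-search the control value minimizing sum_i q[i]*(r[i]-y_i(u))**2."""
    return min(grid, key=lambda u: sum(qi * (ri - p(u)) ** 2
                                       for qi, ri, p in zip(q, targets, preds)))

# Toy two-step predictions: step 1 is best at u = 0.2, step 2 at u = 1.0.
y1 = lambda u: 0.9 - 0.5 * (u - 0.2) ** 2
y2 = lambda u: 0.9 - 0.5 * (u - 1.0) ** 2
r = [0.9, 0.9]
grid = [i / 100 for i in range(0, 121)]

u_equal = best_u([1, 1], r, [y1, y2], grid)     # equal weights: compromise u
u_heavy = best_u([100, 1], r, [y1, y2], grid)   # heavy weight on the first step
```

With equal weights the compromise solution sacrifices the first-step prediction; with the heavy first-step weight the solution stays close to that step's optimum, so y1(u_heavy) > y1(u_equal), mirroring the role of a large q(t + d_x^{y,min}).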