ARTICLE IN PRESS
Control Engineering Practice 12 (2004) 917–929
Estimating product composition profiles in batch distillation via partial least squares regression
Eliana Zamprogna (a), Massimiliano Barolo (a,*), Dale E. Seborg (b)
(a) Dipartimento di Principi e Impianti di Ingegneria Chimica (DIPIC), Università di Padova, Via Marzolo, 9, 35131 Padova PD, Italy
(b) Department of Chemical Engineering, University of California, Santa Barbara, CA 93106, USA
Received 15 February 2003; accepted 24 November 2003
Abstract
The properties of two multivariate regression techniques, principal component analysis (PCA) and partial least squares (PLS) regression, are exploited to develop soft sensors that estimate the product composition profiles of a simulated batch distillation process from available temperature measurements. The estimators' performance is evaluated with respect to several issues, such as the pre-processing of the calibration and validation data sets, the number of measurements used as sensor inputs, the presence of noise in the input measurements, and the use of lagged measurements. A simple augmentation of the conventional PLS regression approach is also proposed, based on the development and sequential use of multiple regression models. The results show that PLS estimators can provide accurate composition estimates for a batch distillation process. Their computational requirements are very low, which makes them attractive for on-line use.
© 2004 Elsevier Ltd. All rights reserved.
Keywords: Batch distillation; Composition estimators; Soft sensors; Partial least squares regression; Principal component analysis
*Corresponding author. Tel.: +39-0498275473; fax: +39-0498275461. E-mail address: [email protected] (M. Barolo).
0967-0661/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.conengprac.2003.11.005
1. Introduction
Batch distillation is a well-known unit operation that is widely used in the fine chemical, pharmaceutical, biochemical, and food industries to process small amounts of materials with high added value. The success of batch distillation as a separation method is largely due to its operational flexibility. A single batch column can separate a multicomponent mixture into several products within a single operation, whereas a continuous separation would require either a train of columns or a multi-pass operation. Moreover, whenever completely different mixtures must be processed from day to day, the versatility of a batch column is unmatched. These attributes are crucial for responding quickly to a market characterised by short product lifetimes and stringent specification requirements. Batch columns can be operated in three different ways: at constant reflux ratio (with variable distillate
composition), at constant distillate composition (with variable reflux ratio), and at total reflux. A combination of these three basic modes can be used to optimize the performance of the separation. Whatever the operating mode, proper operation of a batch column requires knowledge of the product compositions throughout the batch. Although product composition can be measured on-line, on-line analyzers are well known to be complex, expensive, and difficult to maintain; they also introduce significant measurement delays, which can be detrimental from a control point of view (Leegwater, 1992). To circumvent these disadvantages, the product composition can be estimated on-line rather than measured. The use of such inferential composition estimators (or software sensors) has long been suggested to assist the monitoring and control of continuous distillation columns, and several applications have been reported in the literature for both simulated and experimental columns (Joseph & Brosilow, 1978; Yu & Luyben, 1988; Lang & Gilles, 1990; Mejdell & Skogestad, 1991; Baratti, Bertucco, Da Rold, & Morbidelli, 1995; Chien & Ogunnaike, 1997; Kano, Miyazaki, Hasebe, & Hashimoto, 2000). However, the
issue of composition estimation in batch distillation columns has received very little attention. Quintero-Marmol, Luyben, and Georgakis (1991) and Quintero-Marmol and Luyben (1992) compared the performance of a steady-state composition estimator, a quasi-dynamic estimator (QDE), and an extended Luenberger observer (ELO) for a ternary batch column, and found that the ELO provided the best performance. However, they noted that the observer was quite sensitive to the accuracy of the assumed vapor–liquid equilibria and of the assumed initial compositions; moreover, the estimator's performance degraded rapidly when the tray temperature measurements (i.e., the observer's inputs) were affected by noise. Similar issues were noted by Barolo and Berto (1998), who used the composition estimates generated by an ELO within a nonlinear strategy for composition control in a conventional batch rectifier. They also observed that the estimator accuracy tended to degrade when tray hydraulics were taken into account and the number of trays was large. Han and Park (2001) used Luyben's QDE to estimate the distillate composition and to control the estimated composition profile of a batch rectifier. To improve the estimator's robustness to process/model mismatch and measurement noise, Barolo, Pistillo, and Trotta (2000) developed an extended Kalman filter (EKF) to reconstruct the product composition profiles of a middle-vessel batch column from temperature measurements. They showed that, although robustness to measurement noise was generally improved with respect to an ELO, the estimation performance was strongly affected by the location of the measurement sensors. State vector initialization and filter "tuning" were more difficult than for the ELO; moreover, the improvements in estimator performance came at the expense of a much larger computational load.
Oisiovici and Cruz (2000, 2001) developed an EKF to infer the product composition of a batch rectifier and applied it within a globally linearizing control scheme for operation at constant distillate composition. They pointed out that an accurate description of vapor–liquid equilibria is very important for filter performance. They also observed that filter performance usually improved when larger sets of secondary measurements were used and/or when the sampling frequency was increased; however, these options increased the estimator's complexity and computational burden. Similar results were obtained by Venkateswarlu and Avantika (2001), who also pointed out the difficulty of tuning the EKF covariance matrices. To summarize the results reported in these papers, QDEs do not seem to be accurate enough for actual monitoring and/or control applications in batch distillation. ELOs are more reliable, and have modest
computational requirements, but they may be difficult to initialize and have very poor robustness to process/model mismatch and to measurement noise. EKFs are much more robust to mismatch and noise, but their performance still depends heavily on the thermodynamic modeling of vapor–liquid equilibria; they are also difficult to initialize and tune, and require considerable computational effort for on-line use. In all of the batch distillation studies cited above, the composition estimator was based on a fundamental (i.e., first-principles) model of the process. In this paper, we pursue a different approach by developing an estimator based on an empirical process model. The objective of this research is to evaluate the applicability of multivariate regression techniques to the development of a composition soft sensor for a conventional batch rectifier. This approach is potentially very advantageous, because most of the disadvantages of estimators based on a physical model can be avoided with an empirical estimator. In fact, a priori knowledge of vapor–liquid equilibrium behavior is not required in the latter case. Also, the estimator does not require composition initialization and is computationally simple, which is desirable for on-line implementation. Partial least squares (PLS) regression is a widely used multivariate regression technique, and its application to the development of composition estimators for chemical processes has attracted considerable interest (Kourti & MacGregor, 1995; Yin, 1998; Kourti, 2002). This projection method extracts the information contained in available process data and projects it onto a low-dimensional space defined by new variables called latent variables.
Several applications of PLS regression to soft sensor development have been reported for continuous distillation processes (Mejdell & Skogestad, 1991; Park & Han, 1998; Hong, Jung, & Han, 1999; Kano et al., 2000; Shin, Lee, & Park, 2000), whereas the extension of this technique to batch distillation has received relatively little attention. This may be because PLS regression was originally developed for continuous steady-state process systems, and its extension to discontinuous processes raises some difficulties. The technique has since been extended to the analysis, on-line monitoring and diagnosis of batch processes (Nomikos & MacGregor, 1995; Duchesne & MacGregor, 2000), and successful applications have been reported (Wold, Kettaneh-Wold, & Skagerberg, 1989; Kourti, Nomikos, & MacGregor, 1995; Zheng, McAvoy, Huang, & Chen, 2001). However, in the vast majority of these cases, PLS regression is used only to estimate the final quality of the batch product, whereas in batch distillation the composition profile must be known throughout the batch. Fletcher, Morris, and Martin (2002) developed a local estimation approach based on dynamic PLS regression (Kaspar & Ray, 1993) to
monitor the performance of a batch fermentation process during the entire operation. To the authors' knowledge, no previous applications of PLS regression to the development of a composition estimator for batch distillation monitoring have been reported. In this paper, alternative composition soft sensors based on PLS regression are developed and evaluated for a simulated conventional batch distillation process operated at constant reflux ratio, extending the preliminary results reported by Zamprogna, Barolo, and Seborg (2002). In particular, a technique based on principal component analysis (PCA) is developed to pre-process the input data set, and several PLS regression approaches are considered to estimate the product composition not only at the end of the process but throughout the batch. Several issues are addressed, such as the effect of the number of measurements used as soft sensor inputs, the effect of measurement noise, and the effect of augmenting the input data with lagged measurements. Also, a novel PLS regression approach is proposed, based on the development and sequential use of individual regression models for different portions of the batch. The paper is organized as follows. Section 2 provides the theoretical background for PCA and PLS regression. The process model and operating strategy are described in Section 3, while process data generation and pre-processing are discussed in Section 4. Section 5 provides details on the soft sensor development and evaluates the effects of different factors on the estimator's performance. An augmentation of the PLS estimator is proposed in Section 6, and conclusions are presented in Section 7.
2. Multivariate regression techniques
This section reviews two important multivariate regression techniques, PCA and PLS.
2.1. Principal component analysis
PCA explains the variance contained in a set of correlated process variables by projecting the data onto a low-dimensional space defined by new uncorrelated variables, called principal components (Geladi & Kowalski, 1986). The original process variables are collectively represented as an m × n matrix K, where m is the number of samples and n is the number of variables. The transformation, which consists of an orthogonal regression in the n-dimensional space of the original variables, factors the observation matrix K into two matrices: the score matrix T (m × s) and the principal component (or loading) matrix P (n × s), where s is the specified number of principal
components. The data matrix is then decomposed as
$$K = T P^{T} + E, \tag{1}$$
where E is an m × n matrix of residuals that contains the part of K left out of the regression. The principal components, which are aligned along the s columns of the matrix P, are the eigenvectors corresponding to the s largest eigenvalues of the covariance matrix of K. The covariance matrix of the (mean-centered) matrix K is defined as
$$\operatorname{cov}(K) = \frac{K^{T} K}{m - 1}, \tag{2}$$
and the relationship between the covariance matrix and each loading vector $p_i$ is
$$\operatorname{cov}(K)\, p_i = \lambda_i\, p_i, \tag{3}$$
where $\lambda_i$ is the eigenvalue associated with the eigenvector $p_i$; it measures the amount of variance of the original data described by the score-loading vector pair $t_i p_i^{T}$. Because these pairs are ordered by decreasing $\lambda_i$, the first pair captures the largest amount of variance of any pair in the decomposition, and each subsequent pair captures the greatest possible amount of variance remaining after the preceding terms $t_i p_i^{T}$ have been subtracted from K. The total variance of the original data retained in the PCA transformation is the sum of the variances explained by the s principal components retained in the model. Computationally, the loading vectors can be obtained sequentially with the Nonlinear Iterative PArtial Least Squares (NIPALS) algorithm (Geladi & Kowalski, 1986; Wold, Esbensen, & Geladi, 1987), which minimizes the Euclidean norm of the residual matrix E for the given number of principal components. The optimal number of principal components s can be assessed using a number of methods, cross-validation being the most reliable and widely used (Kourti & MacGregor, 1995). The residual Q and Hotelling's T² statistics are widely used metrics to determine how well a sample conforms to the model (Jackson, 1991).
2.2. Partial least squares regression
PLS is conceptually similar to PCA, except that it reduces the dimensions of two data sets (an m × $n_X$ input data set X and an m × $n_Y$ output data set Y) simultaneously, finding the directions (latent variables, LVs) in the input space that are most predictive of the output space (Kourti & MacGregor, 1995). A detailed description of the PLS algorithm and its mathematical formulation is given by Geladi and Kowalski (1986). The PLS algorithm decomposes the original matrices X and Y into two lower-dimensional score matrices, T (m × k) and U (m × k), which represent
the projection of the original matrices X and Y onto the latent variable space, plus two residual matrices E (m × $n_Y$) and F (m × $n_X$), which contain the parts of Y and X that are left out of the regression:
$$\begin{aligned} Y &= T P^{T} + E,\\ X &= U Q^{T} + F. \end{aligned} \tag{4}$$
The latent variables are aligned along the k columns of the two score matrices, T (m × k) and U (m × k), and are ordered so that the amount of information (variance) of the original data described by each latent variable decreases with its order. The PLS transformation is performed so that the score vectors of each ith latent variable are related through an inner linear relationship:
$$u_i = b_i t_i + h_i, \tag{5}$$
where $b_i$ is a coefficient determined by minimizing the norm of the residual vector $h_i$. The optimal number of latent variables k can be assessed using a number of methods, cross-validation being the most reliable and widely used (Kourti & MacGregor, 1995; Jackson, 1991). The latent vectors are usually calculated iteratively using either the NIPALS algorithm or the SIMPLS algorithm; the latter offers a lower computational load and faster convergence (De Jong, 1993).
Because most practical problems are nonlinear, nonlinear PLS techniques have been developed to retain the robust generalization property of conventional (i.e., linear) PLS while representing any nonlinear relationships between X and Y. These techniques keep the framework of linear PLS but use a nonlinear relationship for each pair of latent variables $u_i$ and $t_i$, which can generally be represented as
$$u_i = f_i(t_i) + h_i, \tag{6}$$
where $f_i(\cdot)$ is a nonlinear vector function. For example, it can be a second-order polynomial regressor (polynomial PLS; Wold et al., 1989), a spline function (spline PLS; Wold, 1992), or a feedforward artificial neural network (ANN PLS; Qin & McAvoy, 1992). These linear and nonlinear techniques are generally referred to as static PLS algorithms.
When the process variables are also autocorrelated in time, dynamic PLS can be used. In this approach, a relatively large number of past samples of the variables (lagged values) are included at each sampling instant in the original input data matrix X (Ricker, 1988; Lakshminarayanan, Shah, & Nandakumar, 1997), or in both the input and output matrices, X and Y (Qin & McAvoy, 1996). The augmented data matrices are then processed with conventional linear or nonlinear PLS regression algorithms.
Conventional PLS assumes that the data are given as two-dimensional matrices. For batch processes, however, data are typically arranged as three-dimensional arrays, in which different batch runs are organized as rows, the measured variables as columns, and their time evolution along the third dimension. Each horizontal slice through this three-dimensional array contains the trajectories of all the variables from a single batch; each vertical slice collects the values of all the variables for all the batches at the same time instant. To extend conventional PLS to batch data sets, these arrays must be unfolded into two-dimensional matrices. One possible unfolding stacks one horizontal slice after the other, as shown in Fig. 1(a); this procedure is particularly useful when the number of recorded samples varies from batch to batch. Alternatively, the multiway PLS (MWPLS) approach (Nomikos & MacGregor, 1995; Kosanovic, Dahl, & Piovoso, 1996) unfolds $X_B$ by placing each of its vertical slices side by side to the right, starting with the one corresponding to the first time interval, as shown in Fig. 1(b). The three-way array is thus decomposed so that all measurements collected over the entire duration of a batch are aligned along one row, and each row of the new matrix X represents one batch. This unfolding makes it possible to analyze the variability among the batches in X by summarizing the information carried by the original data set with respect to both the variables and their time variation. However, it can be used only if all the batches in the data set have the same duration (i.e., the same number of time samples), which is not the case in batch distillation.
Fig. 1. Arrangement of the batch data $X_B$ (batches × variables × samples) in PLS (a) and in MWPLS (b).
To determine how well the original data conform to the PLS regression space, the mean squared (MSQ) error is defined as:
$$\mathrm{MSQ}_i = \sqrt{\frac{(y_i - \hat{y}_i)(y_i - \hat{y}_i)^{T}}{m}}, \tag{7}$$
where $y_i$ is the row vector of measurements of the generic ith output variable, $\hat{y}_i$ is the vector of its estimates obtained from the soft sensor, and m is the total number of recorded samples.
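The two unfolding schemes of Fig. 1 can be illustrated with a short Python/NumPy sketch. It is an illustration under stated assumptions, not code from this study: each batch is assumed to be stored as a (samples × variables) array, and the function names `unfold_batchwise` and `unfold_multiway` are hypothetical. The stacking of Fig. 1(a) accepts batches of different duration, whereas the MWPLS arrangement of Fig. 1(b) requires equal durations and raises an error otherwise.

```python
import numpy as np


def unfold_batchwise(batches):
    """Fig. 1(a)-style unfolding (hypothetical helper).

    Each batch is a (samples x variables) array. The batches are stacked
    vertically, so batches of different duration are allowed.
    """
    if len({b.shape[1] for b in batches}) != 1:
        raise ValueError("all batches must record the same variables")
    return np.vstack(batches)


def unfold_multiway(batches):
    """Fig. 1(b)-style (MWPLS) unfolding (hypothetical helper).

    Returns one row per batch: all variables at the first sample, then all
    variables at the second sample, and so on. Requires equal durations.
    """
    if len({b.shape for b in batches}) != 1:
        raise ValueError("multiway unfolding needs batches of equal duration")
    # A row-major reshape places the time slices of each batch side by side.
    return np.vstack([b.reshape(1, -1) for b in batches])
```

With the five temperatures $T_B, T_5, \dots, T_{20}$ as columns, stacking batches of different length with `unfold_batchwise` gives a matrix with one row per recorded sample, which is the arrangement needed here because the simulated batch durations differ.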
3. The process
The process considered in this work is the separation of a zeotropic ternary mixture in a conventional batch rectifier with 20 trays, operated according to the constant-reflux strategy described by Luyben (1991). The column is initially operated at total reflux. When the distillate composition meets the desired quality specification, distillate withdrawal is started, and products and slop cuts are sequentially collected from the top and segregated in separate tanks. The heaviest cut is recovered from the reboiler at the end of the batch. Details about slop-cut withdrawal and handling are given in Luyben's paper. The process objective is to recover each component of the feed at a given minimum purity level; namely, the mole fraction of the key component in each product must be equal to or greater than 0.95. A software sensor is developed to estimate the instantaneous mole fractions of the light and intermediate components in the distillate stream ($x_{D,1}$ and $x_{D,2}$, respectively) and the mole fraction of the heavy component in the reboiler ($x_{B,3}$), which are the key compositions needed for process monitoring. To this end, temperature measurements are assumed to be available from the still pot and from four trays (trays 5, 10, 15 and 20, numbered from the bottom), as suggested by Quintero-Marmol et al. (1991). The process is modeled by a system of differential and algebraic equations, which is described extensively in Barolo and Berto (1998). The only difference with respect to that model is that a tray holdup of 5 mol is
considered in the present study. This model will be referred to as "the process" hereafter.
4. Process data generation and pre-processing
The data sets needed to develop the PLS regression model of the process were generated by repeatedly running the first-principles model of the batch column under different operating conditions. Nineteen batch operations were simulated by varying the initial feed composition, boilup rate and reflux rate from batch to batch, as summarized in Table 1. This mimics, to some extent, the situation encountered in practice, where the feed and operating conditions of a batch distillation process may change widely from batch to batch. For each operation, the trajectories of all process variables were monitored throughout the batch and recorded with a sampling period of 36 s. For the ith batch, the recorded temperature measurements $T_B$, $T_5$, $T_{10}$, $T_{15}$, $T_{20}$ were arranged as column vectors to form the input data matrix $X_i$. Similarly, the mole fractions of the light and intermediate components in the distillate stream and of the heavy component in the bottoms were used to assemble the output matrix $Y_i$. Because the simulated batches have different durations, the matrices $X_i$ and $Y_i$ have different numbers of samples from batch to batch. The simulated data sets were divided into two groups: data from the first 11 batches were used to compute the parameters of the multivariate regression model (calibration sets), and the remaining eight batches were used to test the accuracy of the PLS regression (validation sets).
Table 1
Characterization of the simulated batch runs included in the data set
Batch run   XF                V (mol/h)   D (mol/h)   tF (h)
Calibration data
T01         0.25/0.60/0.15    105         52.50       3.96
T02         0.33/0.33/0.34    100         47.84       3.28
T03         0.40/0.20/0.40    130         68.06       2.42
T04         0.10/0.80/0.10     85         44.73       5.81
T05         0.10/0.20/0.70     90         42.65       2.27
T06         0.35/0.15/0.50     95         50.26       2.55
T07         0.20/0.45/0.35    105         49.52       3.33
T08         0.50/0.40/0.10    100         53.19       3.88
T09         0.60/0.25/0.15    115         53.99       3.80
T10         0.55/0.15/0.30    110         51.16       3.66
T11         0.10/0.05/0.85     80         42.78       1.13
Validation data
V01         0.45/0.40/0.15    110         58.82       3.38
V02         0.20/0.30/0.50     95         51.07       2.85
V03         0.33/0.50/0.17     75         40.54       5.00
V04         0.75/0.10/0.15     85         40.47       5.01
V05         0.08/0.40/0.52    100         50.00       3.32
V06         0.20/0.10/0.70    120         56.33       1.37
V07         0.45/0.05/0.50     80         37.38       4.14
V08         0.65/0.10/0.25    105         56.45       3.66
Note: For each batch run, the values of feed composition XF, boilup rate V, distillate rate D, and final batch time tF are specified.
PLS regression is data-driven. As a consequence, the analysis and pre-processing of the data used for PLS model calibration and validation are of paramount importance and require particular care. First, data from abnormal operations need to be detected and removed from the database, as they generally make process identification more difficult and result in a regression model that is not representative of ordinary process behavior in the considered operating region (Kano et al., 2000). Second, a suitable scaling procedure needs to be adopted for the available data set, as proper data normalization can favor the determination of the most representative PLS multiplane (Kourti & MacGregor, 1995). These two issues are addressed in the following subsection.
Fig. 2. Value of residuals Q and Hotelling's T² statistic and their 95% confidence limits (Q95 = 0.2; T²95 = 9.5) for (a) the calibration sets, and (b) the validation sets.
4.1. Data analysis and pre-processing
The data sets selected for calibration and validation were examined to verify that no anomalous batches were included in the databases. To this end, a PCA-based method was developed to detect abnormal batch runs. In this approach, each simulated operation is characterized by a feature vector composed of the mole fractions of the light and intermediate components in the feed ($x_{F,1}$, $x_{F,2}$), the boilup rate V, the reflux rate R, and the total duration of the batch $t_F$. The feature vectors of the batch operations used for calibration and validation were collected into two matrices ($K_{fc}$ and $K_{fv}$, respectively), so that the ith feature vector of a data set occupies the ith row of the corresponding matrix. The feature matrices $K_{fc}$ and $K_{fv}$ are therefore 11 × 5 and 8 × 5, respectively. PCA was then performed on the calibration feature matrix $K_{fc}$ to define a lower-dimensional space that represents the information content of this matrix. A subspace of two principal components was used to project $K_{fc}$, as two components explain more than 80% of the original data variance. The residual Q and Hotelling's T² statistics in the principal component space were calculated for each calibration and validation set and are reported in Fig. 2. All the calibration sets conform fairly well to the selected principal component space, as the 95% confidence limits are never violated. Conversely, the plot of the residuals Q for the validation data reveals that validation batch runs 4, 7 and 8 are anomalous, since their residuals exceed the 95% confidence limit calculated from the calibration data. Close inspection of these data sets revealed that they were indeed characterized by an unusual course of operation, as the bottom of the column empties before the composition in the still
pot reaches the desired specification. These data sets were therefore excluded from the database. The input and output matrices corresponding to the calibration sets were then arranged according to the procedure represented schematically in Fig. 1(a), yielding the overall input and output calibration data matrices $X_c$ and $Y_c$. Similarly, the input and output matrices of the five validation batches remaining after the anomalous sets were purged from the original database were used to build the validation data matrices $X_v$ and $Y_v$. Calibration and validation data were scaled to zero mean and unit variance; this normalization was found to yield soft sensors with better estimation performance than alternative scaling methods (Zamprogna, 2002).
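The screening step of Section 4.1 can be sketched in Python/NumPy as follows. This is an illustrative sketch, not the authors' code: the class `PCAMonitor` and its methods are hypothetical, the principal components are obtained from the eigen-decomposition of Eqs. (2) and (3) rather than with NIPALS, the features are assumed to be autoscaled with calibration statistics (the text does not state how $K_{fc}$ was scaled), and the 95% confidence limits are passed in rather than computed (Fig. 2 reports Q95 = 0.2 and T²95 = 9.5).

```python
import numpy as np


class PCAMonitor:
    """PCA model of a calibration feature matrix with residual Q and
    Hotelling's T^2 statistics (hypothetical helper, illustrative only)."""

    def __init__(self, K_cal, n_components):
        K_cal = np.asarray(K_cal, dtype=float)
        # Assumption: autoscale with calibration mean and standard deviation.
        self.mean = K_cal.mean(axis=0)
        self.std = K_cal.std(axis=0, ddof=1)
        Z = (K_cal - self.mean) / self.std
        m = Z.shape[0]
        cov = Z.T @ Z / (m - 1)                      # Eq. (2)
        eigval, eigvec = np.linalg.eigh(cov)         # Eq. (3); ascending order
        order = np.argsort(eigval)[::-1]             # sort by decreasing eigenvalue
        self.eigval = eigval[order][:n_components]   # lambda_1 >= ... >= lambda_s
        self.P = eigvec[:, order][:, :n_components]  # loading matrix P (n x s)
        # Fraction of the total (autoscaled) variance retained by s components.
        self.explained = self.eigval.sum() / eigval.sum()

    def statistics(self, K):
        """Return (Q, T2) for each row of K, scaled with calibration statistics."""
        Z = (np.atleast_2d(np.asarray(K, dtype=float)) - self.mean) / self.std
        T = Z @ self.P                               # scores
        E = Z - T @ self.P.T                         # residual matrix of Eq. (1)
        q = np.sum(E ** 2, axis=1)                   # residual Q per row
        t2 = np.sum(T ** 2 / self.eigval, axis=1)    # Hotelling's T^2 per row
        return q, t2

    def flag(self, K, q_limit, t2_limit):
        """True for rows whose Q or T^2 exceeds the supplied limit."""
        q, t2 = self.statistics(K)
        return (q > q_limit) | (t2 > t2_limit)
```

In the setting described above, `K_cal` would be the 11 × 5 calibration feature matrix $K_{fc}$ with `n_components=2`, and `flag` would be applied to the rows of $K_{fv}$. Because the scores are computed with the same $\lambda_i$ used in Eq. (3), the T² values of the calibration rows always sum to s(m − 1), which is a convenient sanity check.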
5. PLS soft sensors
Linear, polynomial, spline and ANN PLS transformations were carried out on the calibration data sets $X_c$ and $Y_c$ to determine a regression model relating the characteristics of the recorded temperature profiles to the changes in the dominant-component mole fraction in each product. Three latent variables, determined by cross-validation, were retained in the regression models. Table 2 shows the total percentage of variance of the original calibration data captured by the resulting regression models, together with the estimation error index MSQ calculated for the validation data for each soft sensor. As can be seen, the total percentage of explained variance of $X_c$ and $Y_c$ is not strongly affected by the regression method, as its value is roughly the same for all the soft sensors. In contrast, the validation MSQ values show that the choice of PLS regression model affects the estimation accuracy of the resulting soft sensor more markedly. In particular, the linear and polynomial PLS soft sensors perform about equally well, whereas estimation accuracy is poorer when spline or ANN PLS regression is adopted. The value of MSQ calculated
for these latter soft sensors is markedly higher for $x_{D,1}$ and $x_{D,2}$, whereas the deterioration in estimation accuracy for $x_{B,3}$ is moderate. Overall, the best estimation performance is provided by the linear PLS soft sensor, which yields the lowest total MSQ.
Fig. 3 compares the actual product composition profiles with the estimates provided by a linear PLS soft sensor for all the validation data. Clearly, a composition estimator can indeed be developed from a PLS regression model of the process: the estimated product composition profiles match the actual profiles quite accurately. Note that the bottom composition estimate needs to be accurate only by the end of each batch, since the bottom product is not withdrawn continuously.
Fig. 3. Comparison between the actual value of the product compositions and their estimates provided by a linear PLS soft sensor (validation data). Panels: $x_{D,1}$, $x_{D,2}$ and $x_{B,3}$ (mole fraction, 0–1) versus time sample (0–1600), actual and estimated profiles.
Table 2
Total percent of variance and validation MSQ estimation error calculated for linear and nonlinear PLS soft sensors (variance refers to calibration data, MSQ to validation data)
PLS regression   Variance explained (%)   MSQ × 10³
method           Xc       Yc              xD,1    xD,2    xB,3    Total
Linear           98.05    88.37           2.17    2.84    2.06    7.07
Polynomial       98.05    87.66           2.49    2.84    2.03    7.37
Spline           98.03    88.79           3.14    3.64    2.39    9.19
ANN              98.05    87.18           3.05    3.52    2.35    8.92
It is worth remarking that, unlike structured estimators, no information on the system thermodynamics needs to be provided to the PLS
estimator (although this information is needed to generate the calibration and validation databases when experimental data are not available). Moreover, no initialization of the composition profiles is required, and the composition estimates are obtained almost instantaneously, as the computational load of the PLS soft sensor is very low.
5.1. Effect of measurement noise
Normally distributed noise with zero mean and standard deviation σ was added to the process inputs of both the calibration and validation data sets to determine whether measurement noise can undermine the estimation accuracy of the PLS estimators. In particular, the effects of low-level noise (σ = 0.1 °C) and high-level noise (σ = 1.0 °C) in the process data are evaluated. As shown in Table 3, the cumulative percentage of explained data variance decreases slightly when noisy input data are used to develop the linear and nonlinear estimators, compared with the nominal case, in which
noise-free input data were considered; this reduction grows with the noise level. However, measurement noise does not significantly affect the overall estimation accuracy of the PLS models: Fig. 4 shows that the MSQ error remains nearly constant as the noise level varies. Owing to the inherent features of PLS regression, the first few latent variables capture most of the relevant information contained in the process data, whereas random noise is typically associated with the higher-order latent variables. Therefore, measurement noise is largely removed when the original data are projected onto the lower-dimensional PLS space.

5.2. Effect of the number of temperature inputs

As mentioned in Section 3, five temperatures, evenly distributed along the column, were used as soft sensor inputs, following the indications of Quintero-Marmol et al. (1991). In order to investigate the effect of using a different number of temperature inputs (nX), linear and nonlinear PLS soft sensors (with
three latent variables) were developed assuming that measurements from all 20 trays of the column and from the reboiler were available (nX = 21). Alternatively, linear and nonlinear PLS estimators were obtained using three temperature measurements as model inputs (TB, T10 and T20; nX = 3). Because in the latter case the number of input temperatures equals the number of latent variables retained by the regression models, the total percentage of variance captured for the Xc matrix is 100% when nX = 3. As can be seen by comparing Table 4 to Table 2, all the estimators capture approximately the same amount of information on the calibration input data Xc as the corresponding models using five temperature measurements. However, the percentage of variance explained for the output data Yc is lower when 21 temperatures are used, which indicates that a larger portion of the information about the composition dynamics is lost in this case. When three temperature inputs are considered instead, the amount of information described for Yc is approximately the same as that obtained for the nominal case.

Table 3
Total percent of variance captured by linear and nonlinear three-dimensional PLS soft sensors when model inputs are affected by normally distributed noise with zero mean and standard deviation σ (calibration data)

PLS regression   σ = 0 °C          σ = 0.1 °C        σ = 1.0 °C
method           Xc      Yc        Xc      Yc        Xc      Yc
Linear           98.05   88.37     98.03   88.35     97.87   87.78
Polynomial       98.05   87.66     98.03   87.65     97.87   87.06
Spline           98.03   88.79     98.01   88.73     97.88   88.30
ANN              98.05   87.18     98.03   87.21     97.57   86.60

[Figure: four panels (Linear PLS, Polynomial PLS, Spline PLS, ANN PLS) showing MSQ × 10³ for xD,1, xD,2, xB,3 and Total; legend: noise free, σ = 0.1 °C, σ = 1.0 °C.]
Fig. 4. Estimation error MSQ for linear, polynomial, spline and ANN PLS soft sensors when input data corrupted by noise of different levels are used (validation data).

Table 4
Total percent of variance captured by linear and nonlinear three-dimensional PLS soft sensors when different numbers of temperature measurements (nX) are used as model inputs (calibration data)

PLS regression   nX = 3            nX = 21
method           Xc      Yc        Xc      Yc
Linear           100.00  87.59     98.09   83.39
Polynomial       100.00  88.24     98.08   82.10
Spline           100.00  87.67     98.03   83.98
ANN              100.00  88.81     98.08   82.45
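The "percent of variance captured" figures in Tables 3 and 4, including the 100% value for Xc when nX = 3, can be illustrated with a small linear PLS sketch. This is an illustration under stated assumptions, not the authors' implementation: it uses the NIPALS algorithm (the study does not state which PLS algorithm was used), autoscaled data, and the hypothetical helper names `autoscale` and `pls_explained_variance`. The captured variance of each block is taken as the fraction of its total sum of squares removed by deflation on the first A latent variables. Because successive NIPALS weight vectors are orthonormal, retaining as many latent variables as there are (linearly independent) input columns deflates X completely, which is consistent with the 100% Xc entries for three temperatures and three latent variables.

```python
import numpy as np


def autoscale(M):
    """Center each column and scale it to unit variance (assumed pre-processing)."""
    M = np.asarray(M, dtype=float)
    return (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)


def pls_explained_variance(X, Y, n_components, tol=1e-12, max_iter=500):
    """Linear PLS2 via NIPALS; returns (% SS of X captured, % SS of Y captured).

    X: (n_samples, n_inputs) scaled inputs, e.g. tray temperatures.
    Y: (n_samples, n_outputs) scaled outputs, e.g. product compositions.
    """
    X = np.array(X, dtype=float)  # copy, so deflation never touches the caller's data
    Y = np.array(Y, dtype=float).reshape(len(X), -1)
    ss_x0, ss_y0 = np.sum(X ** 2), np.sum(Y ** 2)
    for _ in range(n_components):
        # Start the inner iteration from the Y column with the largest sum of squares.
        u = Y[:, [int(np.argmax(np.sum(Y ** 2, axis=0)))]]
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)          # X weights (unit length)
            t = X @ w                       # X scores: the latent variable
            q = (Y.T @ t) / (t.T @ t)       # Y loadings
            u_new = (Y @ q) / (q.T @ q)     # Y scores
            converged = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = (X.T @ t) / (t.T @ t)           # X loadings
        X -= t @ p.T                        # deflate X
        Y -= t @ q.T                        # remove the part of Y predicted by t
    pct_x = 100.0 * (1.0 - np.sum(X ** 2) / ss_x0)
    pct_y = 100.0 * (1.0 - np.sum(Y ** 2) / ss_y0)
    return pct_x, pct_y
```

With column data, `X` would hold the autoscaled temperatures and `Y` the autoscaled compositions; with three inputs and three latent variables the X block is fully explained by construction, while fewer latent variables leave part of X unexplained.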
Fig. 5, which compares the validation MSQ error obtained for the soft sensors using different numbers of temperature inputs, provides further evidence that the PLS models using 21 measurement inputs describe the actual process dynamics less well than in the nominal case (nX = 5). It was also found that the computational load required to calculate the optimal regression parameters increases, because of the larger number of input variables incorporated in the data matrices. Consequently, the regression procedure becomes lengthier, particularly for spline PLS and ANN PLS. Conversely, reducing the number of temperatures in the input matrix does not deteriorate the estimation accuracy of the model. These results confirm that the number and location of the temperature inputs play an important role in the estimator accuracy, to the point that an inappropriate choice of the sensor inputs may reduce the estimation accuracy (Kano et al., 2000). Therefore, further investigation is needed to determine the optimal subset of temperature measurements to be fed to the estimator. A study addressing this issue is under development, and results will be reported elsewhere.

5.3. Dynamic PLS regression

The PLS estimators considered so far are static in nature; the composition dynamics is reconstructed by simply placing static estimates side by side in time. This may seem somewhat at odds with the fact that batch distillation is an inherently dynamic process. For this reason, the possibility of developing an intrinsically dynamic estimator was explored.
[Figure: four panels (Linear PLS, Polynomial PLS, Spline PLS, ANN PLS) showing MSQ × 10³ for xD,1, xD,2, xB,3 and Total; legend: nX = 5, nX = 21, nX = 3.]
Fig. 5. Estimation error MSQ for linear, polynomial, spline and ANN PLS soft sensors when different numbers of temperature inputs are used (validation data).
Table 5
Total percent of variance captured by linear and nonlinear three-dimensional PLS soft sensors when L lagged inputs are considered in the data matrices (calibration data)

PLS regression   L = 0            L = 2            L = 4            L = 6
method           Xc      Yc       Xc      Yc       Xc      Yc       Xc      Yc
Linear           98.05   88.37    97.51   88.20    97.07   87.93    96.64   87.64
Polynomial       98.05   87.66    97.51   87.50    97.07   87.25    96.64   86.98
Spline           98.03   88.79    97.49   88.42    97.05   88.01    96.63   87.68
ANN              98.05   87.18    97.51   78.04    97.07   86.70    96.64   86.30
[Figure: four panels (Linear PLS, Polynomial PLS, Spline PLS, ANN PLS) showing MSQ × 10³ for xD,1, xD,2, xB,3 and Total; legend: Lags = 0, 2, 4, 6.]
Fig. 6. Estimation error MSQ for linear, polynomial, spline and ANN dynamic PLS soft sensors obtained considering input matrices with different numbers of lagged inputs (validation data).
The conventional PLS algorithm was extended to dynamic modeling by augmenting the original input data matrices Xc and Xv with lagged values. The augmented matrix Xc was then processed using the conventional linear and nonlinear PLS regression paradigms, and the performance of the resulting multivariate models was tested using the augmented matrix Xv. As shown in Table 5, the number of lagged temperature samples included in the input data set affects the cumulative percentage of variance explained by the PLS models: the total variance captured decreases slightly as the number of input lags increases, for all the regression models. However, augmenting the input matrices does not significantly affect the estimation performance of the resulting soft sensors, as can be inferred from Fig. 6: the value of MSQ does not change substantially for different numbers of lags included in the original data. A major drawback of dynamic PLS is that adding a relatively large number of lagged values to the input data matrix substantially increases the dimension of this matrix. Consequently, the computational load required to develop the multivariate regression becomes considerably heavier, and the regression procedure becomes much lengthier. This is particularly true for spline PLS and ANN PLS, which are based on computationally more intensive algorithms. In addition, the resulting model is more cumbersome to manipulate, owing to its larger number of parameters.
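The lagged-input augmentation described above can be sketched as follows. This is a minimal illustration under stated assumptions: each row of the augmented matrix is taken as [x(k), x(k-1), ..., x(k-L)], lags are built within each batch so that samples from different batches are never mixed, and the first L samples of every batch (for which a full lag window is unavailable) are discarded together with the corresponding composition rows. The paper does not specify how the initial samples were handled, and the function names are hypothetical. The sketch also makes the dimensional cost explicit: with nX inputs and L lags, the input matrix grows from nX to nX(L + 1) columns, e.g., from 5 to 35 columns for L = 6.

```python
import numpy as np


def augment_with_lags(X, n_lags):
    """Stack current and past samples: row k -> [x(k), x(k-1), ..., x(k-L)].

    X is one batch, shaped (n_samples, n_inputs). The first n_lags samples are
    dropped because their lag window would reach before the start of the batch.
    """
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    if not 0 <= n_lags < n_samples:
        raise ValueError("n_lags must be in [0, n_samples)")
    # Block `lag` holds x(k - lag) for k = n_lags, ..., n_samples - 1.
    blocks = [X[n_lags - lag : n_samples - lag] for lag in range(n_lags + 1)]
    return np.hstack(blocks)


def build_dynamic_dataset(batches_X, batches_Y, n_lags):
    """Augment each batch separately, trim its outputs to match, and stack all batches."""
    X_aug = np.vstack([augment_with_lags(Xb, n_lags) for Xb in batches_X])
    Y_aug = np.vstack([
        np.asarray(Yb, dtype=float).reshape(len(Yb), -1)[n_lags:] for Yb in batches_Y
    ])
    return X_aug, Y_aug
```

The same function would be applied to the calibration batches (giving the augmented Xc) and to the validation batches (giving the augmented Xv), after which any static PLS routine can be used unchanged.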
6. Multiple PLS soft sensors

As shown in Fig. 3, a linear PLS soft sensor provides fairly accurate estimates. However, some mismatch can be observed between the actual and estimated composition profiles, particularly for xD,1 and xD,2. In order to improve the estimator performance, a simple augmentation of the conventional PLS methods was considered that takes into account the
peculiar characteristics of the batch distillation process. Namely, the proposed approach consists of subdividing the data recorded for each batch into subsets, each corresponding to a particular operating period (i.e., startup; main cut 1 withdrawal; slop cut 1 withdrawal; ...). For each period, separate models can be developed using linear and nonlinear PLS regression algorithms. The PLS models of the same type (linear, polynomial, spline or ANN) obtained for each period are then used sequentially to estimate the whole composition profile. The composition estimators obtained through this regression procedure are referred to as multiple PLS (MPLS) soft sensors. This modeling strategy is motivated by the fact that each batch goes through a series of phases with substantially different characteristics. In fact, because of the operating procedure adopted, the process dynamic regime moves from a condition of total reflux to a condition of constant reflux. At the same time, the tray compositions undergo large excursions, due to the sequential movement of the light and intermediate components from the bottom to the top of the column. These changes in the column dynamic regime and composition distribution are reflected in the variability of the temperature profile, which the PLS regressors use to reconstruct the entire composition profiles. In contrast to conventional PLS, the MPLS method handles this complexity and variety of information about the different process phases directly, which can potentially improve the model accuracy. To develop the MPLS models, the original data for each batch run were partitioned into five sections, corresponding to the total reflux phase and to the withdrawal of the products and slop cuts obtained from the top of the column. Linear and nonlinear MPLS models were developed.
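The partition-and-sequence logic of an MPLS soft sensor can be sketched as follows. To keep the example short and self-contained, each period-specific model here is an ordinary least-squares regression with intercept, used as a stand-in for the linear or nonlinear PLS models of the study; the period label of every sample is assumed to be known (for instance, from the operating recipe), and all function names are hypothetical. Any pair of fit/predict functions, such as a PLS implementation, could be passed in instead.

```python
import numpy as np


def fit_linear(X, Y):
    """Stand-in regressor (least squares with intercept); a PLS model would go here."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def fit_multiple_models(X, Y, period, fit=fit_linear):
    """Fit one model per operating period (e.g. total reflux, main cut 1, slop cut 1, ...)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(X), -1)
    period = np.asarray(period)
    return {p: fit(X[period == p], Y[period == p]) for p in np.unique(period)}


def predict_multiple_models(models, X, period, predict=predict_linear):
    """Estimate each sample with the model of the period it belongs to."""
    X = np.asarray(X, dtype=float)
    period = np.asarray(period)
    Y_hat = None
    for p, model in models.items():
        mask = period == p
        if not mask.any():
            continue
        pred = np.asarray(predict(model, X[mask])).reshape(int(mask.sum()), -1)
        if Y_hat is None:
            # NaN marks samples whose period has no fitted model.
            Y_hat = np.full((len(X), pred.shape[1]), np.nan)
        Y_hat[mask] = pred
    return Y_hat
```

The sketch sidesteps the practical difficulty, discussed in the concluding remarks, of detecting the current period on line; here the period labels are simply given.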
Table 6 reports the MSQ values calculated for the linear and nonlinear MPLS regression models on the validation data. Spline PLS regression yields a very large MSQ. As already mentioned, the model parameters are generally more difficult to tune for a spline PLS soft sensor because of the inherent complexity of its regression algorithm (Wold, 1992). Because the MPLS method requires the
tuning and sequential use of several single spline PLS models, the use of spline regression within this augmented approach may not be profitable, as the results show. For the other MPLS soft sensors, the estimation error obtained for xB,3 is in general slightly higher than that obtained with the corresponding conventional PLS soft sensors (see Table 2); however, this increase can be considered nearly negligible. Conversely, the MSQ calculated for xD,1 and xD,2 is markedly lower for the soft sensors based on the MPLS regression approach. This is particularly so for the linear MPLS soft sensor, which yields the lowest total MSQ error and therefore provides the most accurate estimation performance. As shown in Fig. 7, this soft sensor is capable of describing the composition dynamics very accurately, and its estimation accuracy is clearly superior to that of the conventional PLS approach.

Table 6
MSQ error calculated for the linear and nonlinear MPLS soft sensors (validation data)

PLS regression   MSQ × 10³
method           xD,1    xD,2    xB,3    Total
Linear           1.31    1.60    2.09    5.00
Polynomial       1.71    1.87    2.83    6.41
Spline           6.17    21.86   12.19   40.22
ANN              1.84    1.88    2.48    6.20

[Figure: panels xD,1, xD,2 and xB,3 (composition, 0–1) versus time sample (0–1600); legend: Actual, Linear MPLS.]
Fig. 7. Comparison between the actual value of the product compositions and their estimates provided by the linear MPLS soft sensor.
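The Total column of Tables 2 and 6 appears to be the sum of the three per-composition MSQ values (e.g., 1.31 + 1.60 + 2.09 = 5.00 for the linear MPLS soft sensor), so the linear total drops from 7.07 × 10⁻³ with conventional PLS to 5.00 × 10⁻³ with MPLS, a reduction of roughly 29%. The sketch below computes such per-composition and total errors from validation data, assuming MSQ is the plain mean squared error over all validation samples; the paper does not spell out its normalization, and the helper name `msq` is hypothetical.

```python
import numpy as np


def msq(Y_true, Y_est):
    """Mean squared estimation error per composition and their sum ("Total").

    Y_true, Y_est: arrays shaped (n_samples, n_compositions), e.g. columns
    xD,1, xD,2 and xB,3 over all validation time samples.
    """
    err = np.asarray(Y_true, dtype=float) - np.asarray(Y_est, dtype=float)
    per_output = np.mean(err ** 2, axis=0)
    return per_output, float(per_output.sum())


# Relative reduction of the total linear MSQ (values x 10^3 from Tables 2 and 6).
total_pls, total_mpls = 7.07, 5.00
reduction = 100.0 * (total_pls - total_mpls) / total_pls  # about 29.3 %
```

Since the bottom product is not withdrawn continuously, one could also evaluate the xB,3 error only at the final sample of each batch; that variant is not reported in the study.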
7. Concluding remarks

In this study, a PLS-based soft sensor was developed for a simulated batch distillation process in order to estimate the composition of the distillate stream and of the bottom product using secondary process information provided by temperature measurements. PCA was used to analyze the available process data and identify the anomalous operations to be excluded from the
database. Several soft sensors were developed using linear and nonlinear PLS regression, and their estimation performances were compared. Various issues were addressed, such as the effect of the number of measurement inputs, the effect of noise in the model input variables, and the effect of augmenting the original process data with lagged input measurements. With respect to the selection of the variables used as sensor inputs, it was shown that effective composition estimation can be achieved even when not all of the available temperature measurements are used as input data to calibrate the PLS regression models; indeed, reducing the number of temperatures in the input matrix does not necessarily deteriorate the estimation accuracy of the model. These results confirm the importance of proper data selection in the development of regression soft sensors, and motivate further investigation to determine the optimal number and location of the temperature measurements to be used as soft sensor inputs. The estimators' performance is not undermined by measurement noise, as the inherent properties of PLS regression make it possible to segregate and eliminate data disturbances. No significant improvement of the estimation accuracy was observed when employing dynamic PLS regression; on the contrary, the resulting soft sensors are generally more complex and difficult to calibrate. The estimator performance improves significantly with the proposed multiple PLS regression approach, particularly for the estimation of the distillate composition. Finally, the computing power required by the estimators is generally very low, which makes them attractive for on-line use. For the practical implementation of these soft sensors, however, some issues should be considered.
For example, since the sensors rely on multiple (temperature) measurements, they are vulnerable to sensor malfunctions, and therefore care should be taken to identify any sensor faults. On the other hand, measurement noise should not be an issue, as discussed above. Switching between models in the multiple PLS approach may not be easy to achieve in practice, because a poor composition estimate may cause the soft sensor to switch from one model to another at the wrong time, with the risk of further worsening the composition estimate. An alternative approach could be to develop a single model for each composition to be estimated; a multiple-model structure would still be obtained, but with no need for switching between models. This approach should also prove convenient when a larger number of composition profiles need to be estimated (e.g., for feeds with more than three components).
Acknowledgements

This research was carried out in the framework of the MIUR-PRIN 2002 project “Operability and controllability of middle-vessel distillation columns” (ref. no. 2002095147 002).
References

Baratti, R., Bertucco, A., Da Rold, A., & Morbidelli, M. (1995). Development of a composition estimator for binary distillation columns. Application to a pilot plant. Chemical Engineering Science, 50, 1541–1550.
Barolo, M., & Berto, F. (1998). Composition control in batch distillation: Binary and multicomponent mixtures. Industrial and Engineering Chemistry Research, 37, 4689–4698.
Barolo, M., Pistillo, A., & Trotta, A. (2000). Issues in the development of a composition estimator for a middle-vessel batch column. In E. F. Camacho, L. Basañez, & J. A. de la Puente (Eds.), Advanced control of chemical processes 2000, IFAC ADCHEM 2000 (pp. 923–928). Oxford, UK: Elsevier.
Chien, I., & Ogunnaike, B. A. (1997). Modeling and control of a temperature-based high-purity distillation column. Chemical Engineering Communications, 158, 71–105.
De Jong, S. (1993). SIMPLS: An alternative approach to partial least squares regression. Chemometrics and Intelligent Laboratory Systems, 18, 251–263.
Duchesne, C., & MacGregor, J. F. (2000). Multivariate analysis and optimization of process variable trajectory for batch processes. Chemometrics and Intelligent Laboratory Systems, 51, 125–137.
Fletcher, N. M., Morris, A. J., & Martin, E. B. (2002). Local linear and nonlinear multi-way partial least squares batch modelling. Edited preprints of b'02–15th IFAC World Conference, Barcelona, Spain, July 21–26.
Geladi, P., & Kowalski, B. R. (1986). Partial least-squares regression: A tutorial. Analytica Chimica Acta, 185, 1–17.
Han, M., & Park, S. (2001). Profile position control of batch distillation based on a nonlinear wave model. Industrial and Engineering Chemistry Research, 40, 4111–4120.
Hong, S. J., Jung, J. H., & Han, C. (1999). A design methodology of a soft sensor based on local models. Computers and Chemical Engineering, 23, S351–S354.
Jackson, J. E. (1991). A user's guide to principal components. New York, USA: Wiley.
Joseph, B., & Brosilow, C. B. (1978). Inferential control of processes. Part I: Steady state analysis and design. AIChE Journal, 24, 485–492.
Kano, M., Miyazaki, K., Hasebe, S., & Hashimoto, I. (2000). Inferential control system of distillation compositions using dynamic partial least squares regression. Journal of Process Control, 10, 157–166.
Kaspar, M. H., & Ray, W. H. (1993). Dynamic PLS modeling for process control. Chemical Engineering Science, 48, 3447–3461.
Kosanovic, K. A., Dahl, K. S., & Piovoso, M. J. (1996). Improved process understanding using multiway principal component analysis. Industrial and Engineering Chemistry Research, 35, 138–146.
Kourti, T. (2002). Process analysis and abnormal situation detection: From theory to practice. IEEE Control Systems Magazine, 22(5), 12–25.
Kourti, T., & MacGregor, J. F. (1995). Tutorial: Process analysis, monitoring and diagnosis, using multivariate regression methods. Chemometrics and Intelligent Laboratory Systems, 28, 3–21.
Kourti, T., Nomikos, P., & MacGregor, J. F. (1995). Analysis, monitoring and fault diagnosis of batch processes using multiblock and multiway PLS. Journal of Process Control, 4, 277–284.
Lakshminarayanan, S., Shah, S. L., & Nandakumar, K. (1997). Modeling and control of multivariable processes: Dynamic PLS approach. AIChE Journal, 43, 2307–2322.
Lang, L., & Gilles, E. D. (1990). Nonlinear observers for distillation columns. Computers and Chemical Engineering, 14, 1297–1301.
Leegwater, H. (1992). Industrial experience with double quality control. In W. L. Luyben (Ed.), Practical distillation control. New York, USA: Van Nostrand Reinhold.
Luyben, W. L. (1991). Multicomponent batch distillation. 1. Ternary systems with slop recycle. Industrial and Engineering Chemistry Research, 27, 642–657.
Mejdell, T., & Skogestad, S. (1991). Estimation of distillation compositions from multiple temperature measurements using partial-least-squares regression. Industrial and Engineering Chemistry Research, 30, 2543–2555.
Nomikos, P., & MacGregor, J. F. (1995). Multi-way partial least squares in monitoring batch processes. Chemometrics and Intelligent Laboratory Systems, 30, 97–108.
Oisiovici, R. M., & Cruz, S. L. (2000). State estimation of batch distillation columns using an extended Kalman filter. Chemical Engineering Science, 55, 4667–4680.
Oisiovici, R. M., & Cruz, S. L. (2001). Inferential control of high-purity multicomponent batch distillation columns using an extended Kalman filter. Industrial and Engineering Chemistry Research, 40, 2628–2639.
Park, S., & Han, C. (1998). A nonlinear soft sensor based on multivariate smoothing procedure for quality estimations in distillation columns. Computers and Chemical Engineering, 24, 871–877.
Qin, S. J., & McAvoy, T. J. (1992). Non-linear PLS modelling using artificial neural networks. Computers and Chemical Engineering, 16, 379–391.
Qin, S. J., & McAvoy, T. J. (1996). Nonlinear FIR modeling via a neural net PLS approach. Computers and Chemical Engineering, 20, 147–159.
Quintero-Marmol, E., & Luyben, W. L. (1992). Inferential model-based control of multicomponent batch distillation. Chemical Engineering Science, 47, 887–898.
Quintero-Marmol, E., Luyben, W. L., & Georgakis, C. (1991). Application of an extended Luenberger observer to the control of multicomponent batch distillation. Industrial and Engineering Chemistry Research, 30, 1870–1880.
Ricker, N. L. (1988). The use of biased least squares estimators for parameters in discrete-time pulse response model. Industrial and Engineering Chemistry Research, 27, 343–350.
Shin, J., Lee, M., & Park, S. (2000). Design of a composition estimator for inferential control of distillation columns. Chemical Engineering Communications, 178, 221–248.
Venkateswarlu, C., & Avantika, S. (2001). Optimal state estimation of multicomponent batch distillation. Chemical Engineering Science, 56, 5771–5786.
Wold, S. (1992). Nonlinear partial least squares modelling. II. Spline inner relation. Chemometrics and Intelligent Laboratory Systems, 14, 71–84.
Wold, S., Esbensen, K., & Geladi, P. (1987). Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2, 37–52.
Wold, S., Kettaneh-Wold, N., & Skagerberg, B. (1989). Non-linear PLS modelling. Chemometrics and Intelligent Laboratory Systems, 7, 53–65.
Yin, K. K. (1998). Multivariate statistical methods for fault detection and diagnosis in chemical process industries: A survey. Trends in Chemical Engineering, 4, 233–241.
Yu, C. C., & Luyben, W. L. (1988). Control of multicomponent distillation columns using rigorous composition estimators. In Distillation and absorption 1987, IChemE Symposium Series No. 104 (pp. A29–A69). London, UK: IChemE.
Zamprogna, E. (2002). Development of virtual sensors for batch distillation monitoring and control using multivariate regression techniques. Ph.D. dissertation, Department of Chemical Engineering, University of Padova, Italy.
Zamprogna, E., Barolo, M., & Seborg, D. E. (2002). Development of a soft sensor for a batch distillation column using linear and nonlinear PLS regression techniques. Edited preprints of b'02–15th IFAC World Conference, Barcelona, Spain, July 21–26.
Zheng, L. L., McAvoy, T. J., Huang, Y., & Chen, G. (2001). Application of multivariate statistical analysis in batch processes. Industrial and Engineering Chemistry Research, 40, 1641–1649.