Application of Optimization Algorithms for the Improvement of Water Quality Monitoring Systems

Marina G. Erechtchoukova, Stephen Y. Chen and Peter A. Khaiter
Atkinson Faculty of Liberal and Professional Studies, York University, Canada

Abstract

The number of observations is limited by budgetary and other constraints, but the collected data must be sufficient to meet a wide range of monitoring objectives. The process of determining an efficient temporal water quality monitoring design has been formulated as an optimization problem. In the problem, a target level of acceptable uncertainty for a given indicator is specified, and the resultant solution is the smallest set of observation dates sufficient to achieve this level. Solution robustness can be improved by adding a sliding window, i.e. the maximum deviation in the observation dates that still provides the acceptable level of uncertainty. Application of the proposed approach to a real-world case study suggested an iterative procedure for the improvement of monitoring designs.

I.N. Athanasiadis et al., Information Technologies in Environmental Engineering, Environmental Science and Engineering, DOI 10.1007/978-3-540-88351-7_13, © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

With respect to an aquatic system, environmental monitoring is defined as the long-term standardized measurement, observation, evaluation, and reporting of the aquatic conditions such that its status and trends can be determined (Helmer 1993). Monitoring systems provide broad sets of data collected in accordance with a program designed for a specific set of scientific, environmental, and/or managerial objectives. The Canada-wide framework for water quality monitoring includes three main phases: (1)

planning, which determines the objectives and the scope of the program; (2) collection/analysis, incorporating field sampling, laboratory analysis, data management and processing, data analysis and interpretation, data reporting, and quality assurance/quality control; and (3) information utilization for decision making, policy development and enforcement, and education (WQTG 2006).

Monitoring systems have a complex infrastructure supporting all of their sampling and data processing activities. One key component of a monitoring system is its network of sampling sites where water quality observations and measurements are conducted. Four design questions must be answered in developing an efficient monitoring network that can meet the required monitoring objectives: what needs to be measured or observed, how it should be measured, where it needs to be measured or observed, and when (how often) observations and measurements have to be performed. The answer to the first question identifies a list of water quality indicators which will be tracked by the system. The second question helps to select appropriate measurement tools and analytical methods. The last two determine the spatial and temporal location of the observation points.

For many tasks of water quality assessment, relatively long series of values for water quality indicators are required. That is why samples are collected periodically at the same locations. Monitoring systems operate in a changing environment and under limited budgets. Due to budgetary constraints, existing recommendations prescribe as few as 4 to 6 observations at some monitoring sites or 6 to 12 samples over a three-year period (Statistics Canada 2008). At the same time, the systems are expected to provide sufficient data for a wide range of scientifically valid conclusions with the level of errors not exceeding 10%.
The extent to which an assessment conducted on scarce data sets satisfies these requirements is unclear. Thus, the issues of possible improvements, increased efficiency and/or optimization of a monitoring system in general, and a monitoring design in particular, are urgent. This study investigates the approaches towards improvements in the temporal monitoring design.

2 Monitoring design optimization

Optimization of a monitoring system is a complex multidisciplinary problem. Groot and Schilperoort (1983) proposed a framework for optimization of water quality monitoring networks. The framework is based on two main subjects: the water system to be monitored and the monitoring objectives. The water system and the monitoring objectives determine the scope and scale of a monitoring program. Moreover, the program must be designed so that data collection and subsequent information extraction meet the monitoring objectives. The degree to which the information meets the objectives represents the effectiveness of a monitoring program. It is suggested that the effectiveness of a monitoring design should be expressed as a function of frequencies of observations, the number and location of sampling sites, and the number and kind of observed ingredients. The same variables affect the cost of the monitoring program. The optimal monitoring program can be found by weighing the effectiveness against the cost. The proposed framework reflects a key feature of the optimization process: the optimal (i.e. recommended for future use) network can only be determined based on the information provided by the present one. In order to apply the proposed framework, it is necessary to relate the degree to which the monitoring data meet the objectives to the cost estimates of direct and indirect benefits of the data provided by the program. Such estimates, although they have been described in the literature, are not always available and are very approximate.

Another water quality monitoring framework (WQTG 2006) provides guidance on program development. The framework does not include a cost-effectiveness analysis step, but it describes important activities for comprehensive assessment of water resources. In many cases, the selection of the monitoring program is done subjectively. The solutions are obtained based on heuristics and expert knowledge rather than on formal procedures. Thus, the solutions can be considered as satisficing, but they are not formally optimal. The interdisciplinary nature of a monitoring system makes it necessary to consider different and sometimes conflicting aspects of the system during its optimization.
Thus, the problem of monitoring optimization can be solved as a multi-objective mixed integer programming model with constraints (Ning and Chang 2002), or as a constrained optimization using genetic algorithms (Icaga 2005; Cieniawski et al. 1995). Application of formal optimization techniques requires a quantification of the effectiveness of a monitoring program, which in turn requires a list of water quality indicators to monitor and an objective function to optimize. Since monitoring activities are always subject to financial and logistic limitations, constrained optimization algorithms are deemed more suitable for this purpose. It is important to introduce criteria demonstrating to what extent one monitoring design outperforms another. Intuitively, one can introduce a measure of the monitoring outcomes and use the measure to choose the most appropriate design. Since the main outcomes of a monitoring system are data sets and information derived from these data sets, the measure should reflect the compliance of data with monitoring objectives. The resultant measure describes the quality of data supplied by a certain monitoring design. This issue has received due attention in contemporary scientific

literature. The methods and approaches used can be classified into those performing a statistical evaluation of the uncertainty of the monitoring data (e.g. Erechtchoukova and Khaiter 2007; Harmel et al. 2006; Ullrich et al. 2008) and those based on entropy theory (e.g. LoBuglio et al. 2007; Moramarco et al. 2008).

3 The optimization problem

It is necessary to formulate an objective function and to find an optimum for this function with or without constraints. A measure has to be identified first. Cost estimates are preferable from a managerial perspective since they allow for straightforward comparison with the available budget and between various monitoring designs. Although absolute values of the estimates are not available, the cost of a monitoring design increases monotonically with the number of samples collected at the monitoring sites and the number of sites. Hence, the efficiency of a monitoring design is related to the required number of samples. Focusing on the optimization of a temporal monitoring design at a given site, the current study uses the number of samples collected at that site as a measure of efficiency.

At the same time, monitoring data carry important information about the status and trends of the aquatic environment described via a set of water quality parameters, i.e. measured water ingredients and physical properties. Monitoring data are a discrete approximation of continuous fields of values of water quality parameters. Traditionally, monitoring data are analyzed statistically, which implies that the fewer samples used for an estimate, the greater the uncertainty in the final result. The uncertainty is expressed as deviations of the estimated values from the actual ones. The uncertainty of monitoring data is unavoidable, and it is usually minimized by extensive sampling. The current study uses these two properties of monitoring data, cost that grows with the number of samples and uncertainty that shrinks with it, to derive an efficient monitoring design. The problem can be formulated in terms of mathematical programming:

$$\min n \quad \text{subject to} \quad D(I) \le V \qquad (1)$$

where n is the number of required observations, I is the selected estimator for the water quality indicator, D(I) is the estimator deviation, and V is the given level of uncertainty. The problem (1) is a generalization of the models proposed by Erechtchoukova and Tsirkunov (1989) and Bodo and Unny (1983). The expression of D depends on an estimator I selected for a

particular case study and a measure selected to describe possible deviations of estimated values from the actual ones. Monitoring systems supply the values of concentrations of chemical ingredients in a water column which can be used for various types of data analysis, including those not considered at the planning stage of a monitoring framework. This makes it preferable to have a monitoring design that provides the best possible fit for a series of concentrations. At the same time, chemical loads of ingredients are widely used for sustainable water resource management, including waste load allocations and total maximum daily load problems. Shrestha et al. (2008) pointed out that for effective water quality management, estimates of ingredient loads are more important than concentrations. Hooper et al. (2001) have also suggested using chemical loads as an objective for designing the monitoring program. Two water quality indicators are considered in this paper: the annual chemical load and the annual average concentration of a water ingredient. The quality of an estimate can be described by its variance (e.g. Erechtchoukova and Tsirkunov 1989; Bodo and Unny 1983). Variance of a selected estimator allows for evaluation of a guaranteed range of possible deviations and hence helps to determine the required number of observations over an investigated period of time for achieving a desired level of accuracy. At the same time, the application of the estimator's variance results in a relatively high number of required observations, which is not affordable for many sites and water bodies. An introduction of additional expert knowledge about the dynamics of investigated water quality parameters, underlying hydrological processes, and mathematical properties of the estimator helps to reduce the numbers, but it makes the utilization of automatic optimization techniques more difficult.
For this reason, another metric to measure the quality of monitoring designs was constructed. Values of observed concentrations can be used to reconstruct the dynamics of concentrations of a selected ingredient based on interpolation with the goal of keeping the sum of absolute deviations of the interpolated curve from the actual one below a certain level of uncertainty. The level can be determined as a fraction of an estimate of the selected indicator. For many natural streams, water quality indicators are characterized by sharp seasonal variations in concentration values. The latter explains the fact that optimal monitoring designs have a non-uniform distribution of samples over the year. Thus, it is necessary to determine not only the total number of required observations per year, but also the pattern of how samples should be distributed in an optimal monitoring design. The monitoring design is described as a vector of 365 binary coordinates in which ones denote days when observations are conducted and zeros represent the days when no sample is collected. The sum of all coordinates of the vector gives the value of n from (1), and the optimal monitoring design can be

found as a solution of (1) from a finite search space of $2^{365}$ possible solutions. To act as both an evaluator of the proposed optimization problem (1) and as a benchmark for future tests with optimization techniques such as genetic algorithms, a greedy search algorithm has been implemented. Using a standard greedy approach of building a solution one component at a time without backtracking (Reinelt 1994), a solution is obtained by eliminating the least informative observation date from a complete series of 365 daily observations of a selected water quality indicator. The algorithm stops when the closest approximation to the given level of uncertainty is obtained. The deviation D(I) on a complete series of 365 daily observations of the selected indicator returns zero. Reducing the number of observations can only increase this value since more points must be interpolated. Hence, in each step the value of D(I) moves closer to the allowed uncertainty level V until it reaches the threshold. Although the proposed algorithm does not guarantee the global optimum solution, its complexity of $O(N^2)$, where N is the maximum number of observations (e.g. 365 for daily measurements), makes it useful for obtaining satisficing solutions. The greedy search solution prescribes a set of particular calendar dates for observations which provide sufficiently good interpolation of a selected indicator over a given period of time. These dates are determined a posteriori. The selected dates depend on the actual dates on which hydrological events had taken place within this period. These can vary from year to year, so the developed monitoring design is suitable only for years that have the exact same hydrological and hydrochemical characteristics. To improve the robustness of the algorithm, the constraint function in (1) has been modified by adding a "sliding window".
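The greedy elimination described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that D(I) is the sum of absolute deviations of the linearly interpolated curve from the daily series, expressed as a fraction of the sum of the series, and that the first and last days are always kept. The function names `interpolation_deviation` and `greedy_design` are hypothetical. The sketch re-evaluates the deviation from scratch for every candidate date, so it makes on the order of N² deviation evaluations without trying to be efficient.

```python
import numpy as np

def interpolation_deviation(values, design):
    """Assumed D(I): relative sum of absolute deviations between the daily
    series and its linear interpolation through the days kept in `design`."""
    days = np.arange(len(values))
    kept = days[design]
    approx = np.interp(days, kept, values[kept])
    return np.abs(approx - values).sum() / np.abs(values).sum()

def greedy_design(values, p):
    """Remove the least informative day, one at a time and without
    backtracking, for as long as the deviation stays at or below p."""
    values = np.asarray(values, dtype=float)
    design = np.ones(len(values), dtype=bool)   # start from daily sampling
    while design.sum() > 2:
        best_day, best_dev = None, np.inf
        # Try dropping each interior day that is still in the design
        # (endpoints are kept; this is an assumption of the sketch).
        for day in np.flatnonzero(design)[1:-1]:
            design[day] = False
            dev = interpolation_deviation(values, design)
            design[day] = True
            if dev < best_dev:
                best_day, best_dev = day, dev
        if best_dev > p:
            break                               # any further removal violates the constraint
        design[best_day] = False
    return design
```

The returned boolean vector plays the role of the binary design vector, and `design.sum()` gives n.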
The size of the sliding window represents the largest deviation in the observation dates where the collected samples will still be able to provide measurements of the water quality indicators that meet the desired accuracy levels. Thus, the model (1) can be rewritten as:

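$$\min n \quad \text{subject to} \quad \frac{|I_c - I_a|}{I_a} \le p \ \text{ for all shifts of the observation dates within } sw, \qquad (2)$$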

where Ic – the indicator estimate calculated using interpolation, Ia – the actual value of the selected indicator, p – the maximum allowed level of uncertainty (e.g. 0.05, 0.1, etc.), sw – the size of the sliding window around a proposed date when a sample can be collected.
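One possible reading of the sliding-window constraint is sketched below, for the average annual concentration. Several details are assumptions of this sketch rather than statements from the text: the window is taken as centred on each proposed date (half-width `sw // 2` days), all dates are shifted by the same offset, and shifted dates are clipped to the ends of the year. The function names are hypothetical.

```python
import numpy as np

def annual_mean_estimate(values, sample_days):
    """Ic: annual mean of the curve linearly interpolated through the sampled days."""
    days = np.arange(len(values))
    sample_days = np.unique(np.clip(sample_days, 0, len(values) - 1))
    return np.interp(days, sample_days, values[sample_days]).mean()

def worst_case_deviation(values, design_days, sw):
    """Largest |Ic - Ia| / Ia over uniform shifts of the dates within the window."""
    values = np.asarray(values, dtype=float)
    design_days = np.asarray(design_days)
    actual = values.mean()                 # Ia from the complete daily series
    half = sw // 2                         # assumed: window centred on each date
    worst = 0.0
    for shift in range(-half, half + 1):
        estimate = annual_mean_estimate(values, design_days + shift)
        worst = max(worst, abs(estimate - actual) / abs(actual))
    return worst

def satisfies_constraint(values, design_days, sw, p):
    """True when the design meets uncertainty level p for every tested shift."""
    return worst_case_deviation(values, design_days, sw) <= p
```

Because the zero shift is always among the tested offsets, the worst-case deviation cannot decrease as the window grows, which is in line with larger windows requiring more observations.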

4 The case study

Data sets collected over four years (denoted as Year 1, Year 2, Year 3, and Year 4) at the Vyatka River near the town of Vyatskiye Polyany have been used in model (2). The Vyatka River is a large Eastern-European river with a length of 1,370 km and a watershed area of 129,000 km² located in the Kirov Oblast (Region) and the Republic of Tatarstan in the Russian Federation. Its average water discharge is about 700 m³/s. Concentrations and instantaneous loads of chloride ions were chosen as water quality indicators of interest mainly because long series of daily observed values of water discharges and concentrations were available. The series of instantaneous load values were obtained by multiplying the corresponding values of the water discharge and the concentration of chloride ions. In the first step of the study, the problem (2) was solved on the Year 1 data set, which has a unimodal hydrograph, for different sizes of the sliding window ranging from zero days to 41 days. As a result, for each size of the sliding window, base monitoring designs have been determined that provide sampling programs sufficient to interpolate the chemograph and the curve of instantaneous loads.

Table 1. The performance of the monitoring design developed using data from Year 1 with p = 0.05 for the annual load (L) and the average annual concentration (C) of chloride ions

Sliding      Total number of    Year 2           Year 3           Year 4
window (sw)  observations
             L      C           L      C         L      C         L      C
0            16     13          0.185  0.968     0.196  0.127     0.128  0.082
3            18     14          0.184  0.099     0.203  0.129     0.108  0.064
11           25     19          0.175  0.075     0.155  0.097     0.108  0.041
21           30     21          0.122  0.066     0.133  0.071     0.092  0.041
31           31     22          0.120  0.060     0.105  0.077     0.070  0.035
41           33     22          0.137  0.040     0.114  0.071     0.080  0.025
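The load calculation described in this section (instantaneous load as the product of discharge and concentration) can be illustrated with a short sketch. The units are an assumption: discharge in m³/s and concentration in mg/L, so that their product is in g/s. The function names and the concentration value in the usage note are hypothetical and are not taken from the Vyatka River data.

```python
import numpy as np

SECONDS_PER_DAY = 86_400

def instantaneous_load(discharge_m3s, conc_mg_l):
    """Load in g/s: (m3/s) * (mg/L) = (m3/s) * (g/m3)."""
    return np.asarray(discharge_m3s, dtype=float) * np.asarray(conc_mg_l, dtype=float)

def annual_load_tonnes(discharge_m3s, conc_mg_l):
    """Sum the daily loads over the series and convert grams to tonnes."""
    load_g_s = instantaneous_load(discharge_m3s, conc_mg_l)
    return load_g_s.sum() * SECONDS_PER_DAY / 1e6
```

For a constant discharge of 700 m³/s and an illustrative concentration of 10 mg/L, the instantaneous load is 7,000 g/s, or 604.8 tonnes per day, which gives 220,752 tonnes over 365 days.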

The obtained monitoring designs were then applied to Year 2 (which also has a unimodal hydrograph) and to Year 3 and Year 4 (which have bimodal hydrographs) – see Fig. 1. The chemographs of the chloride ion concentrations for the same years are shown in Fig. 2.

Fig. 1. Hydrographs of the investigated years, the Vyatka River

Fig. 2. Chemographs of the investigated years, the Vyatka River

The performance of the determined monitoring designs was compared on the data sets for the investigated years using deviations of the selected estimator (see Table 1). The results demonstrate that the suggested sampling programs can evaluate the selected water quality indicators within reasonable levels of accuracy, but these levels do not always meet the uncertainty level p that was used to develop the monitoring design. Hydrological and hydrochemical regimes of a particular year can substantially affect the accuracy of the estimates. The greater the difference between the regimes in the base year and in the year being evaluated, the lower the accuracy of the estimates, i.e. the higher the deviation in the indicator estimate used in (2). As seen from Table 1, Year 3 consistently delivered the highest error in estimates for both water quality indicators and for all sizes of the sliding window. That year was chosen as a new base year in an attempt to obtain a monitoring design with lower errors of estimates. The sampling program developed from the data set collected in Year 3 (see Table 2) was applied to all other years in the study. The accuracy of the estimates is better than that obtained with the base designs of Year 1. This result suggests an iterative approach to improving the monitoring designs.

Table 2. The number of observations required for calculating the total annual chloride ion load (monitoring designs of the base Year 3)

Sliding      Total number of observations
window (sw)  5% uncertainty   10% uncertainty
0            22               14
3            25               15
11           35               19
21           39               20
31           41               22
41           44               25
61           45               26

5 Discussion

An obvious benefit of the proposed optimization model (2) is that the size of the sliding window is its only parameter. The constraint function is evaluated using basic statistics on observed and linearly interpolated data. The sw parameter is site independent and does not require site-specific parameterization. This makes the model applicable to observation sites on rivers of various lengths and watershed areas, since these characteristics are reflected through the hydrological values and water quality parameters. The monitoring designs found as the solutions of problem (2) prescribe particular dates when the samples should be collected. However, measurements cannot always be made on the desired dates, and the timing of hydrological events can vary from year to year. The introduction of sliding windows allows for flexibility in the observation dates, which is important for the practical operation of environmental monitoring systems. The choice of an appropriate size of the sliding window matters because the total number of required samples per year is an increasing function of the size of the sliding window. On the other hand, the larger the sliding window, the more evenly samples are distributed over the year. As shown in Fig. 3, the monitoring design derived from Year 3 data and a 41-day sliding window provides a sampling program sufficient for evaluating the annual chloride ion load with the desired level of accuracy for all investigated years, and it can be used as the first approximation in the development of recommendations for sampling programs. In general, recommendations of specific sampling dates for observations are hard to follow. The solutions obtained for a fixed size of the sliding window can instead be used to determine monthly sampling frequencies which would guarantee a required level of accuracy (see Table 3).

Fig. 3. The robustness of the monitoring designs developed using data from Year 3 versus the size of the sliding window

Table 3. Monthly sampling frequencies in Year 3 for chloride ion load estimate

Month  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
5%     2    2    3    4    5    5    4    3    6    4    3    2
10%    1    1    1    3    3    2    2    3    4    3    1    1
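Monthly frequencies such as those in Table 3 can be derived from a 365-day binary design vector by counting the observation days that fall in each month. The sketch below assumes a non-leap year with index 0 corresponding to 1 January; the function name `monthly_frequencies` is hypothetical.

```python
import numpy as np

DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]  # non-leap year

def monthly_frequencies(design):
    """Count observation days per calendar month in a 365-day binary design."""
    design = np.asarray(design, dtype=int)
    if design.size != 365:
        raise ValueError("expected a 365-day design vector")
    # Index of the first day of each month: 0, 31, 59, ..., 334
    month_starts = np.cumsum([0] + DAYS_IN_MONTH[:-1])
    # reduceat sums each slice between consecutive month starts
    return np.add.reduceat(design, month_starts)
```

The twelve counts always sum to n, the total number of observations in the design.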

Monthly frequencies in Table 3 vary from two to six samples per month for keeping the uncertainty below 5% and from one to four samples per month for estimates within a 10% level of uncertainty. It is interesting to note that the highest frequencies are required to record variability of the indicators, while extreme values of the indicator (though important) play a secondary role. The suggested monitoring designs depend on the selected indicator. Approximation of the instantaneous loads used for evaluation of the total annual load requires a higher number of samples than the approximation of the concentration curve. Although values of water discharge provide additional information about hydrological characteristics in the estimate, they also amplify variability of the instantaneous loads, which leads to the increased numbers of required observations. The observation dates determined for the chemical load in Year 3 have been tested on the series of concentrations collected in all investigated years, and the errors were less than the 5% uncertainty (Table 4). Selection of an indicator for constructing an efficient monitoring design is an important issue, but it is beyond the scope of the paper. However, concentrations of several water ingredients with different physical and chemical properties are usually obtained from the same sample. This implies the necessity to introduce a common monitoring design for all these ingredients. Since the load indicator reflects hydrological processes which contribute to the values of water quality parameters and are common for these parameters at a given sampling site, the choice of the chemical load indicator as an objective for the design of efficient monitoring programs appears reasonable. In general, the larger the number of samples and the more uniform their distribution over the investigated period of time, the more robust the monitoring design will be and the easier the recommendations for a sampling program are to follow.
However, a uniform distribution of samples leads to oversampling in periods when variations of the selected indicators are small. Calculation showed that between 25 and 40 observations per year are required to attain a reasonable level of accuracy in estimating the annual load of chloride ions at the cross-section Vyatskiye Polyany. These numbers significantly exceed recommendations for this water quality ingredient provided by many existing monitoring programs, but such recommendations lead to errors in the estimates that are higher than 10%.

Table 4. The performance of the monitoring designs developed for chemical loads using Year 3 data with p = 0.05 on the series of concentrations

Sliding window  Year 1  Year 2  Year 3  Year 4
0               0.063   0.069   0.037   0.063
3               0.038   0.043   0.029   0.025
11              0.037   0.045   0.031   0.033
21              0.031   0.036   0.025   0.022
31              0.025   0.024   0.029   0.016
41              0.017   0.019   0.022   0.014
51              0.019   0.021   0.017   0.018
61              0.018   0.020   0.019   0.014

6 Conclusions

The approach to designing an efficient sampling program and its application to the case study provide insights into optimization of water quality monitoring systems. At many monitoring sites, recommended frequencies of observations are not sufficient for water quality estimates with a declared 10% error target. As a rule, concentrations of several water quality parameters are obtained from the same sample. The optimization problem described in this study can be extended to include more than one water quality ingredient, generating designs common for these ingredients. In this case, the same computational scheme is useful to obtain a satisficing solution. The proposed approach can be used to derive a framework for optimization of a temporal monitoring design at an existing sampling site. It can be adopted in automated procedures which generate monitoring designs supplying data sufficient to attain a given level of accuracy. The framework is based on iterations and requires data sets collected over several years, preferably with different types of hydrograph.

Acknowledgements

The authors are grateful to anonymous reviewers for their thoughtful suggestions and helpful comments on the manuscript. Part of the research was based on data sets prepared in the Hydrochemical Institute, the Russian Federation.

References

Bodo B, Unny TE (1983) Sampling strategies for mass-discharge estimation. Journal of Environmental Engineering 109(4): 812-829
Cieniawski S, Ehart J, Ranjithan S (1995) Using genetic algorithms to solve a multiobjective groundwater monitoring problem. Water Resources Research 31(2): 399-409
Erechtchoukova MG, Khaiter PA (2007) Uncertainty reduction in modelling of chemical load in streams. In: Oxley L, Kulasiri D (eds) MODSIM 2007 International Congress on Modelling and Simulation. Modelling and Simulation Society of Australia and New Zealand, December 2007, pp. 2445-2451
Erechtchoukova MG, Tsirkunov VV (1989) Determination of sampling strategy for calculation of mass-discharge in streams with a given accuracy. In: Proc. of the Conf. Problems of Surface Hydrology, February, 1987, Leningrad: Gidrometeoizdat, 171-189
Groot S, Schilperoort T (1983) Optimization of water quality monitoring networks. Water Science and Technology 16: 275-287
Harmel RD, Cooper RJ, Slade RM, Haney R, Arnold JG (2006) Cumulative uncertainty in measured streamflow and water quality data for small watersheds. Transactions of the ASABE 49(3): 689-701
Helmer R (1993) Water quality monitoring: national and international approaches. In: Int. Symp. Proc. Hydrochemistry 1993. Rostov-on-Don, Russia: IAHS Publ., 219: 3-17
Hooper RP, Aulenbach BT, Kelly VJ (2001) The National Stream Quality Accounting Network: A flux-based approach to monitoring the water quality of large rivers. Hydrological Processes 15: 1089-1106
Icaga Y (2005) Genetic algorithm usage in water quality monitoring networks optimization in Gediz (Turkey) river basin. Environmental Monitoring & Assessment 108: 261-277
LoBuglio JN, Characklis GW, Serre ML (2007) Cost-effective water quality assessment through the integration of monitoring data and modeling results. Water Resources Research 43: W03435, doi:10.1029/2006WR005020
Moramarco T, Ammari A, Burnelli A, Mirauda D, Pascale V (2008) Entropy theory application for flow monitoring in natural channels. In: Sànchez-Marrè M et al. (eds) Proc. of the 4th International Congress on Environmental Modelling and Software (iEMSs 2008). iEMSs, Barcelona, Catalonia, July 2008, pp. 430-437
Ning SK, Chang N-B (2002) Multi-objective, decision-based assessment of a water quality monitoring network in a river system. J. Environ Monit. 4: 121-126
Reinelt G (1994) The Traveling Salesman: Computational Solutions for TSP Applications. Springer-Verlag, Berlin New York
Shrestha SF, Kazama LT, Newman H (2008) A framework for estimating pollutant export coefficients from long-term in-stream water quality monitoring data. Environmental Modelling and Software 23: 182-194
Statistics Canada (2008) Canadian environmental sustainability indicators 2007: Freshwater quality indicator. Data sources and methods. (Cat. N 16-256-X) http://www.statcan.gc.ca/pub/16-256-x/16-256-x2008000-eng.pdf
Ullrich A, Volk M, Schmidt G (2008) Influence of the uncertainty of monitoring data on model calibration and evaluation. In: Sànchez-Marrè M et al. (eds) Proc. of the 4th International Congress on Environmental Modelling and Software (iEMSs 2008). iEMSs, Barcelona, Catalonia, July 2008, pp. 544-552
US Environmental Protection Agency (2003) Elements of a state water monitoring and assessment program (EPA 841-B-03-003). Online URL http://www.epa.gov/owow/monitoring/elements/index.html
Water Quality Task Group (2006) A Canada-wide framework for water quality monitoring. PN 1369. Online URL http://www.ccme.ca/assets/pdf/wqm_framework_1.0_e_web.pdf