A Multiagent Q-Learning-Based Optimal Allocation Approach for Urban Water Resource Management System

Jianjun Ni, Member, IEEE, Minghua Liu, Li Ren, and Simon X. Yang, Senior Member, IEEE
Abstract—The water environment system is a complex system, and agent-based modeling provides an effective approach that has been applied in water resource management research. Urban water resource optimal allocation is a challenging and critical issue in water environment systems, and it belongs to the class of resource optimal allocation problems. In this paper, a novel approach based on multiagent Q-learning is proposed to deal with this problem. In the proposed approach, the water users of the different regions in a city are abstracted into an agent-based model. To realize cooperation among these stakeholder agents, a maximum mapping value function-based Q-learning algorithm is proposed in this study, which allows the agents to self-learn. In the proposed algorithm, an adaptive reward value function is used to improve the performance of the multiagent Q-learning algorithm, so that the influence of multiple factors on the optimal allocation can be fully considered. The proposed approach can deal with various situations in urban water resource allocation. The experimental results show that the proposed approach is capable of allocating water resource efficiently and that the objectives of all the stakeholder agents can be successfully achieved.

Note to Practitioners—Water resource optimal allocation is an important decision-making activity in water resource management systems. This paper sets up a water resource optimal allocation model based on multiagent modeling technology, where different optimization objectives are abstracted into various properties of agents, and a new multiagent Q-learning approach is proposed to deal with the optimal allocation problem in water resource management systems. The proposed approach can be used in practical water resource management systems with any feasible data from governmental statistical agencies and companies. The experimental results demonstrate the effectiveness and efficiency of the agent-based allocation model and of the proposed approach based on the novel multiagent Q-learning algorithm.

Index Terms—Complex system, multiagent Q-learning, optimal allocation, urban water resource management.

Manuscript received September 07, 2012; accepted November 19, 2012. Date of publication January 14, 2013; date of current version January 01, 2014. This paper was recommended for publication by Associate Editor L. Moench and Editor H. Ding upon evaluation of the reviewers' comments. This work was supported in part by the National Natural Science Foundation of China (61203365), the Jiangsu Province Natural Science Foundation (BK2012149), the Open Fund of Changzhou Key Laboratory of Sensor Networks and Environmental Sensing (CZSN201102), and the Fundamental Research Funds for the Central Universities (2011B04614).

J. Ni and M. Liu are with the Changzhou Key Laboratory of Sensor Networks and Environmental Sensing, College of Computer and Information, Hohai University, Changzhou, Jiangsu 213022, China (e-mail: [email protected]; [email protected]).

L. Ren is with the College of Hydrology and Water Resources, Hohai University, Nanjing, Jiangsu 210098, China (e-mail: [email protected]).

S. X. Yang is with the Advanced Robotics and Intelligent Systems (ARIS) Laboratory, School of Engineering, University of Guelph, Guelph, ON N1G 2W1, Canada (e-mail: [email protected]).

Digital Object Identifier 10.1109/TASE.2012.2229978
I. INTRODUCTION
WATER RESOURCE is a very important factor in the development of the economy and society. It is a challenging and critical issue in water environment systems to allocate and utilize water resource properly. The challenge arises from the complexity of water resource management, which requires the ability to integrate numerous objectives in order to satisfy the goals of different stakeholder agents [1], [2]. Urban water resource allocation has become more and more pressing, because urban population and industry are both intensive and water demand increases continuously. Making a reasonable allocation of water resource is the main task of urban water resource management systems. Urban water resource optimal allocation is a complicated and difficult task, because there are multiple water sources and various water users. Furthermore, the objectives of these water users are different and even contrary [3].

Various methods have been proposed to deal with the optimal allocation problem. For example, Ren et al. [4] proposed an effective methodology for the robust global optimization of electromagnetic devices, which is based on the gradient index and a multiobjective optimization method. Zhang et al. [5] presented a new manufacturing resource allocation method, using an extended genetic algorithm (GA) to support multiobjective decision-making optimization for supply chain deployment. Rathinam et al. [6] proposed a resource allocation algorithm for multivehicle systems with nonholonomic constraints based on a constant factor approximation algorithm. Recently, some machine learning-based approaches have been proposed to deal with the optimal allocation problem [7], [8]. Bone and Dragićević [9] developed an agent-based model to achieve optimal forest harvesting strategies through the integration of a reinforcement learning algorithm. Song et al. [10] proposed a machine learning approach for determining feasible plans of a remanufacturing system, where rough set theory was applied to establish the relationship between a plan and its feasibility, and an iterative reinforcement process was used to enhance the confidence. Although the water resource optimal allocation task is similar to other resource optimal allocation tasks, there are some differences; for example, the main challenge of resource optimal allocation for a remanufacturing system is uncertainty, whereas the main challenge in water resource optimal allocation is the complicated internal relations (e.g., multiple water sources and various water users). Thus, the methods developed for other resource allocation tasks provide some useful
reference for water resource optimal allocation. However, it is difficult to apply them directly to water resource optimal allocation. For example, Zhang et al. [5] used an extended GA to solve a multiobjective decision-making optimization problem, but that method cannot give optimal solutions for continuous decision steps in water resource allocation. Furthermore, that GA cannot deal with the interactions among agents in water resource systems.

The problem of urban water resource optimal allocation has become a research focus recently. Nowadays, the sustainable development theory is the basis of optimal water allocation, and many relationship models have been developed [11]. The traditional models and methods cannot satisfy the requirements of water resource allocation, which has been extended to river basins and regional water systems as a whole. More and more researchers have focused on heuristic approaches derived from nature to solve the water resource allocation problem. Abolpour et al. [12] proposed an adaptive neural fuzzy inference system method to simulate seven interconnected sub-basins in a regional river system located in Iran. Cunha and Ribeiro [13] proposed a tabu search algorithm to find the least-cost design of looped water distribution networks. Montalvo et al. [14] proposed a multiobjective variant of the particle swarm optimization (PSO) algorithm to deal with the water distribution system optimization problem.

Water resource optimal allocation is a classic complex system problem [15], [16]. The relevant factors of various aspects (such as the development condition of the social economy, water demand, and water resource protection) must be considered in the decision making of water resource allocation. Although there is much research on water resource optimal allocation, most of these approaches attempt to obtain an optimal value that satisfies the objective functions and constraints. Usually only one of the economic, social, or environmental objectives can be achieved, and few of these approaches can satisfy all the goals of the different stakeholder agents. Traditional mathematical models have difficulty dealing with such complex system problems. To solve the problem of water environment complex systems, the agent-based model presents an effective approach that has been implemented in water resource management research [17]–[19]. In this paper, a water resource allocation model is set up based on multiagent modeling technology, and a novel approach based on multiagent Q-learning is proposed to accomplish the water resource optimal allocation task efficiently.

The Q-learning-based method is one of the most popular reinforcement learning methods; it is widely used to learn tabular relationships among states, described by a finite number of values for each variable, and discrete actions [20], [21]. Recently, there has been a lot of research on Q-learning-based methods for various optimization problems. For example, Kartoun et al. [22] presented a physical model to find the directions of forces and moments required to open a plastic bag, and a Q-learning algorithm was implemented on the robot learning system. Juang and Lu [23] proposed a design of fuzzy controllers by ant colony optimization incorporated with fuzzy Q-learning. Cui et al. [24] proposed a new machine-learning-based ship design optimization approach, where the Q-learning algorithm was utilized to realize the learning function in the optimization process. In a multiagent system, the Q-learning algorithm can increase the
intelligence of the system when multiple agents pursue a common goal through communication and cooperation. Each agent is affected in the learning process by the knowledge, beliefs, and intentions of the other agents [25], [26]. However, the traditional multiagent Q-learning methods have some shortcomings in water resource optimal allocation. For example, the computation of methods based on the traditional multiagent Q-learning algorithms becomes very complicated, because there are too many objectives to be optimized in a water resource system [27], [28]. To deal with these problems, the individual objectives are abstracted into an agent-based model in this study and an improved multiagent Q-learning algorithm is proposed, which allows agents to self-learn. The proposed Q-learning algorithm is based on a maximum mapping value function to strengthen the cooperation among the stakeholder agents. In the proposed multiagent Q-learning algorithm, an adaptive reward value function is used to improve its performance, so that the influence of multiple factors on the optimal allocation can be fully considered. The experimental results show that the proposed approach is capable of allocating water resource efficiently in various situations.

The main contributions of this paper are summarized as follows.
1) A water resource optimal allocation model based on multiagents is presented, where the optimal allocation problem of the water resource system is considered as an interactive problem among various stakeholder agents.
2) A novel resource allocation approach based on a multiagent Q-learning algorithm is proposed. The adaptability of the multiagent Q-learning algorithm is improved and all the influence factors are considered naturally.
3) A set of experiments is conducted to show the efficiency of the proposed approach.
4) Some comparison experiments are conducted, and the proposed approach is shown to be more suitable than the general multiagent Q-learning methods for water resource optimal allocation in water resource management systems.

Remark: The basic idea of the proposed approach in this paper is similar to that in [9], where an integration approach based on a reinforcement learning method and an agent-based model is proposed to achieve optimal forest harvesting strategies. However, there are two basic differences between the approach in the literature and the proposed approach in this paper (see Section II). The first one is the optimization process of the two approaches. The other one is the way the multiple factors are considered.

This paper is organized as follows. Section II presents the proposed approach. To illustrate the effectiveness of the proposed approach, experiments for various situations are given in Section III. Some discussions and comparison studies are presented in Section IV. Finally, the conclusion is given in Section V.

II. THE PROPOSED MULTIAGENT Q-LEARNING APPROACH

In this paper, the water resource optimal allocation problem in water resource management systems is studied, which is dynamic, interactive, and related to social and economic factors [15]. The problem is defined as follows: 1) there are a number of stakeholder agents in the system; 2) each agent has a set of actions, and an action corresponds to a water resource allocation solution; and
3) the water resource allocation task is to decide the optimal allocation solution for a continuous decision period, which is made up of a limited number of decision steps.

To solve this problem, a novel multiagent Q-learning approach is proposed in this paper, which is based on the agent-based model for water resource management (this agent-based model will be introduced concretely in Section III). In the proposed approach, a maximum mapping value function is defined so that the Q-learning-based approach can deal with the optimal allocation effectively, and a GA is used to find this maximum mapping value efficiently. To deal with the multifactor constraint problem in water resource optimal allocation tasks, an adaptive multifactor reward value function is designed in the proposed approach. The work flow of the proposed approach is shown in Fig. 1. The proposed multiagent Q-learning approach is presented in detail as follows.

Fig. 1. The work flow of the proposed approach.

A. The Maximum Mapping Value Function of the Proposed Approach

In reinforcement learning-based methods, the learning value function is the main part, which associates each pair of state and action with a utility value. The design of the learning value function is the key to a reinforcement learning-based approach. Q-learning is a model-free reinforcement learning method, which is a temporal-difference method to estimate the accumulative future rewards (or costs) of performing an action in a given state. In the Q-learning method, a Q-learning function is used as the learning value function. When the standard Q-learning method is used for the decision making of an agent, a table must be set up to store each pair of state and action. The Q-value can be calculated by [20], [29]

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a \in A(s_{t+1})} Q(s_{t+1}, a) - Q(s_t, a_t) \right] \qquad (1)$$

where $\alpha$ is the learning rate; $\gamma$ is the discount factor; $(s_t, a_t)$ is the current state and action pair; $s_{t+1}$ is the next state; $r_{t+1}$ is the immediate reward; and $A(s_{t+1})$ is the action set of all of the possible actions under the state $s_{t+1}$.

Remark: In a multiagent system, an agent needs to keep track of its environment as well as of the other agents. To extend Q-learning to the multiagent learning domain, the joint actions of the participating agents, rather than merely individual actions, need to be taken into account. In general, the Q-learning algorithm for a single agent is extended; for example, the Nash Q-value function for one agent in the multiagent system [30], [31] is defined as

$$Q_i(s, a_1, \ldots, a_n) \leftarrow (1 - \alpha) Q_i(s, a_1, \ldots, a_n) + \alpha \left[ r_i(s, a_1, \ldots, a_n) + \gamma \, \mathrm{Nash}Q_i(s') \right] \qquad (2)$$

where $(a_1, \ldots, a_n)$ is a joint action; $r_i(s, a_1, \ldots, a_n)$ is a one-period reward for the $i$th agent in the state $s$ under this joint action; $n$ is the number of agents in the system; and $\mathrm{Nash}Q_i(s')$ is the payoff of the $i$th agent in the state $s'$ for the selected Nash equilibrium.

Because the joint action set increases rapidly with the number of agents, the convergence problem of the algorithm based on the Q-value function in (2) becomes very serious. Some improvements have been made to multiagent Q-learning to deal with the convergence problem. For example, Wu et al. [32] proposed a novel multiagent reinforcement learning method for job scheduling problems. The approach circumvents the scalability problem by using an ordinal distributed learning strategy, and realizes multiagent coordination based on an information-sharing mechanism with limited communication. Fujita and Matsuo [33] proposed a hybrid modular Q-learning technique for improving the learning performance of Q-learning, by partially increasing the dimensionality of the state space. However, the multiagent Q-learning methods introduced above still have some limitations for water resource optimal allocation. For example, most of the methods based on approximation functions rely on a slow and iterative learning procedure, which is not effective in a real-time environment. Furthermore, the optimization procedures of most methods do not consider the interactions among the stakeholder agents.

To deal with the problems above, a maximum mapping value function-based Q-learning algorithm is proposed in this paper. The main idea of the proposed approach is to calculate a mapping value of the Q-function for every state-action pair and to find the maximum sum of mapping values of the Q-function for each decision step.
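To make the standard update in (1) concrete, the following minimal sketch shows a tabular Q-learning update in Python. It is an illustration of the textbook rule only; the state and action encodings, the reward signal, and the parameter values are placeholders rather than the ones used in the water resource model.

import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate (placeholders)
Q = defaultdict(float)                  # tabular Q-values, keyed by (state, action)

def choose_action(state, actions):
    # Epsilon-greedy action selection over the current Q-table.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, next_actions):
    # One application of (1): move Q(s_t, a_t) toward r + gamma * max_a Q(s_{t+1}, a).
    best_next = max(Q[(next_state, a)] for a in next_actions) if next_actions else 0.0
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Example of a single update on a toy state space (all names are placeholders).
q_update("s0", "a1", 0.3, "s1", ["a1", "a2"])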
The mapping value here is a quantity used to evaluate each allocation solution. The mapping value function in this study is defined as the iterative update

$$m_i(s_t, a_t) \leftarrow m_i(s_t, a_t) + \alpha \left[ r_i + \gamma \max_{a} m_i(s_{t+1}, a) - m_i(s_t, a_t) \right] \qquad (3)$$

where $(s_t, a_t)$ is a state-action pair; $t$ represents the decision step; $a_t$ represents an action (namely, an allocation solution) at the decision step $t$; and $r_i$ is the reward value given to the solution of the $i$th agent, which will be introduced in Section II-C. The sum of the mapping values of all the agents in the system can be obtained by

$$M_j(t) = \sum_{i=1}^{n} m_i(s_t, a_t^j), \qquad t = 1, 2, \ldots, T \qquad (4)$$

where $M_j(t)$ is the sum of mapping values for the $j$th allocation solution in the $t$th decision step; $m_i$ can be obtained by the iterative formula (3); and $T$ is the maximum decision step of the water resource allocation task. Let the number of all the possible solutions in one decision step be $N$; then the number of sums of mapping values obtained for this decision step is $N$ as well. If the maximum sum of mapping values among these values can be found, then the optimal allocation solution (namely, the actions corresponding to the maximum mapping value) for this decision step is obtained. When the optimal allocation solutions for all the decision steps have been found, the optimal allocation task is finished.

Remark: In the proposed approach, the optimal allocation solutions for each decision step are obtained based on the maximum sum of mapping values. Because the mapping value of each agent is calculated independently and the sum of mapping values is just a linear computation, the convergence of the proposed approach can be guaranteed by the Q-learning algorithm for a single agent (see [31], [34], and [35]).

B. The Maximum Mapping Value Searching Based on GA

The description above shows that the proposed approach has some advantages over the general multiagent Q-learning algorithm; for example, the proposed approach can satisfy all the goals of the different stakeholder agents, and its computation is relatively simple. However, in the proposed multiagent Q-learning algorithm based on the maximum mapping value function, the number of candidate mapping values increases with the number of actions of the agents in the system, so there is a tradeoff between the searching speed and the computation precision. To solve this problem, a GA is used to make the decision quickly and precisely. The GA used in this paper is introduced as follows (a sketch of the search is given after the Remark below).

1) The chromosome of the GA. In the proposed approach, the allocation solution is used to make up the chromosome of the GA, which is coded as a binary number. The allocation solutions are constructed from the various objectives that need to be optimized. For example, if there are $p$ objectives in the system, one chromosome of the GA is $(x_1, x_2, \ldots, x_p)$. The length of the chromosome is

$$L = \sum_{j=1}^{p} l_j \qquad (5)$$

where $l_j$ is the maximum size of the $j$th gene in the chromosome, which is calculated by (6) from the maximum value $f_j^{\max}$ and the minimum value $f_j^{\min}$ of the $j$th objective, using a rounding function towards zero. Because these objectives have different units and dimensions, all of the objectives in the allocation solutions are normalized before they are translated into binary numbers.

2) The fitness function. The fitness function is the key to the GA-based method in the optimal allocation task. In this study, the sum of the mapping values of all the agents is used as the fitness function of the GA [see (4)], which corresponds to the state-action pair of one allocation solution.

3) The genetic operations. Because the chromosome is coded in binary, the selection, crossover, and mutation operations are realized easily. The general method is adopted in this paper [36]. The selection strategy is chiefly based on the fitness level of the individual chromosomes in the population; in this paper, roulette-wheel selection is used. The crossover operator starts with two selected individuals and is applied with a fixed crossover probability. The mutation operator simply negates every bit of the chromosome with a fixed mutation probability.

Remark: Although there is much research on optimal allocation approaches based on GAs, there are some differences between that literature and the proposed approach. For example, the fitness function in most of this research is based directly on the benefits of all the agents, which makes the GA more complex than the proposed approach. Furthermore, the optimization process of the proposed approach is based on the Q-learning algorithm, which makes the obtained optimal allocation solutions globally optimal over all the decision steps; this is different from those approaches based directly on GAs.
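The following sketch illustrates, under simplifying assumptions, how the GA of Section II-B might search for the allocation solution with the maximum sum of mapping values [the fitness in (4)] in one decision step. The chromosome length, the decoding of a binary chromosome into an allocation solution, and the example mapping-value functions are placeholders, not the exact components used in the paper.

import random

NUM_OBJECTIVES = 4          # assumed number of objectives encoded in one chromosome
BITS_PER_GENE = 8           # assumed gene length; the chromosome length is the sum of gene lengths, as in (5)
CHROM_LEN = NUM_OBJECTIVES * BITS_PER_GENE
POP_SIZE, GENERATIONS = 40, 100
P_CROSS, P_MUT = 0.8, 0.01  # illustrative crossover and mutation probabilities

def decode(chrom):
    # Map each binary gene to a normalized objective value in [0, 1] (placeholder decoding).
    values = []
    for j in range(NUM_OBJECTIVES):
        bits = chrom[j * BITS_PER_GENE:(j + 1) * BITS_PER_GENE]
        values.append(int("".join(map(str, bits)), 2) / (2 ** BITS_PER_GENE - 1))
    return values

def fitness(chrom, mapping_value_fns):
    # Fitness = sum of all agents' mapping values for the decoded allocation solution, as in (4).
    solution = decode(chrom)
    return sum(m(solution) for m in mapping_value_fns)

def roulette_select(pop, fits):
    # Roulette-wheel selection; assumes nonnegative fitness values.
    pick, acc = random.uniform(0, sum(fits)), 0.0
    for chrom, fit in zip(pop, fits):
        acc += fit
        if acc >= pick:
            return chrom
    return pop[-1]

def ga_search(mapping_value_fns):
    pop = [[random.randint(0, 1) for _ in range(CHROM_LEN)] for _ in range(POP_SIZE)]
    best = max(pop, key=lambda c: fitness(c, mapping_value_fns))
    for _ in range(GENERATIONS):
        fits = [fitness(c, mapping_value_fns) for c in pop]
        new_pop = []
        while len(new_pop) < POP_SIZE:
            p1, p2 = roulette_select(pop, fits), roulette_select(pop, fits)
            c1, c2 = p1[:], p2[:]
            if random.random() < P_CROSS:                    # one-point crossover
                cut = random.randrange(1, CHROM_LEN)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):                           # bit-flip mutation
                for k in range(CHROM_LEN):
                    if random.random() < P_MUT:
                        child[k] ^= 1
                new_pop.append(child)
        pop = new_pop[:POP_SIZE]
        best = max(pop + [best], key=lambda c: fitness(c, mapping_value_fns))
    return decode(best)

# Example with three hypothetical agents whose mapping values depend on the decoded objectives.
example_fns = [lambda s: s[0] + 0.5 * s[1],
               lambda s: 1.0 - abs(s[2] - 0.3),
               lambda s: 0.8 * s[3]]
print(ga_search(example_fns))

In the full approach, ga_search would be invoked once per decision step, with the mapping-value functions evaluating (3) for each stakeholder agent.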
C. The Adaptive Multifactor Reward Value Function

In reinforcement learning-based methods, how to define the reward value function is one of the main tasks. The reward value function of the learning system is used to seek the maximum future reward. In the learning process, the reward value often takes a Boolean form, namely the reward value is 0 when the goal is not achieved and 1 otherwise. A reward value in Boolean form is convenient for simple systems, but it is difficult for it to reflect the influences of the environmental feedback in large and complex systems, because a single reward signal is inadequate to accurately evaluate the effect of an action from the viewpoint of the whole system. To give full consideration to all the factors in the system, several improvements on the reward value function have been proposed. However, many of these reward value functions have shortcomings. For example, the reward function based on a weighted sum is complicated, because the weights of the factors need to be calculated [9]. The reward values obtained by methods based on fuzzy rules are discrete and limited in number, which is not suitable for complex systems,
because the reward values in real-time applications are continuous and unlimited [20]. To deal with the problems above, an adaptive reward value function is proposed, which is introduced as follows:

$$r_i = \sum_{j=1}^{K} B(f_j, a) \qquad (7)$$

where $r_i$ is the reward value for the $i$th agent; $K$ is the number of influence factors (the influence factors in this paper are the indexes for water resource optimal allocation, such as the water quality and the economic benefit index); $f_j$ is the $j$th influence factor; $a$ is an action (decision); and $B(f_j, a)$ is a benefit function used to calculate the benefit of the influence factor $f_j$ under the action $a$, which is determined by the actual application. To reduce the effect of the influence factors' dimensions on the algorithm, the value of each influence factor should be normalized by

$$\bar{f}_j = \frac{f_j}{f_j^{\max}} \qquad (8)$$

where $\bar{f}_j$ is the normalized value of the influence factor and $f_j^{\max}$ is the maximum value of this influence factor.
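As an illustration only, the following sketch computes an adaptive multifactor reward in the spirit of (7) and (8). The particular influence factors, their maximum values, and the linear benefit functions are assumptions made for the example; the real benefit functions are application-specific, as stated above.

# Illustrative influence factors for one agent (names and values are assumptions).
influence_factors = {
    "water_quality": 0.82,
    "transport_price_index": 1.3,
    "economic_benefit_index": 4.5,
    "ecologic_benefit_index": 0.9,
}
factor_max = {   # assumed maximum values, used for the normalization in (8)
    "water_quality": 1.0,
    "transport_price_index": 2.0,
    "economic_benefit_index": 5.0,
    "ecologic_benefit_index": 1.0,
}

def benefit(name, normalized_value, action):
    # Placeholder benefit function B(f_j, a): a linear gain scaled by the allocated amount,
    # with the transportation price treated as a cost.
    sign = -1.0 if name == "transport_price_index" else 1.0
    return sign * normalized_value * action["allocated_amount"]

def adaptive_reward(factors, maxima, action):
    # Reward as in (7): accumulate the benefits of all normalized influence factors under the action.
    reward = 0.0
    for name, value in factors.items():
        normalized = value / maxima[name]        # normalization (8)
        reward += benefit(name, normalized, action)
    return reward

action = {"allocated_amount": 0.35}              # hypothetical allocated amount for this agent
print(adaptive_reward(influence_factors, factor_max, action))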
The work flow of the proposed approach is summarized as follows (a high-level sketch is given after Fig. 2).
1) Initialize all the state-action pairs of the agents, namely, select a possible allocation solution randomly for all the agents at every decision step.
2) Get all the initial values of the influence factors of the agents and normalize them.
3) Calculate the real-time reward values for all the agents by (7).
4) Calculate the mapping value of each agent and obtain the sum of the mapping values of all the agents by (3) and (4).
5) Change the actions of the agents at the current decision step, and repeat steps 3) to 5) until all the actions of this decision step have been evaluated.
6) Search for the maximum sum of mapping values for the current decision step by the GA introduced in Section II-B. The actions corresponding to this maximum mapping value are the optimal solutions in this step.
7) Move to the next decision step and go to step 5), until the optimal allocation solutions for all the decision steps are obtained.
The pseudocode of the proposed approach is given in Fig. 2.

Fig. 2. The pseudocode of the proposed approach.
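The following driver is a sketch of steps 1)–7) above, not the exact procedure of Fig. 2. It assumes the GA search from the earlier sketch and hypothetical agent objects exposing make_mapping_value_fn (which would wrap the reward (7) and the update (3)) and apply.

def allocate(agents, decision_steps, ga_search):
    # For each decision step, build per-agent mapping-value functions and let the GA
    # search for the allocation solution with the maximum sum of mapping values (4).
    plan = []
    for t in range(decision_steps):
        mapping_value_fns = [agent.make_mapping_value_fn(t) for agent in agents]   # steps 2)-5)
        best_solution = ga_search(mapping_value_fns)                               # step 6)
        for agent in agents:
            agent.apply(best_solution, t)      # commit the chosen actions before moving to t + 1
        plan.append(best_solution)
    return plan                                # step 7): one optimal solution per decision step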
III. EXPERIMENTAL STUDIES

In the urban water resource optimal allocation task, water demand and supply are subject to the complex interplay of social and ecological dynamics existing in the environment. Furthermore, there are multiple water sources, such as the local surface water from rivers or lakes, the water transferred from another city, and the water reclaimed by water companies. To conduct the task efficiently, a water resource allocation model for a city based on multiagent modeling technology is set up first in this paper. For simplification without losing generality, there are just two different water sources in this model, and there are four different regions in this city, namely, four regional agents, which are the water consumers. Inside each region, there are three different stakeholder agents, which are abstracted as an industry agent, an inhabitant agent, and an ecology agent. In these experiments, the main influence factors of water resource allocation are the water quality, the transportation price index, the economic benefit index, and the ecologic benefit index. All these influence factors are dimensionless parameters. The water quality is the key factor of the water resource management system, which is used to reflect the quality of the water. The transportation price index is used to reflect the transportation cost of water from a water source to the various regions. The economic benefit index is used to denote the contribution of water resource to the regional economy. The ecologic benefit index reflects the influence of water resource on the ecological environment.
TABLE I INITIAL VALUES OF INFLUENCE FACTORS IN THE WATER RESOURCE ALLOCATION MODEL
Fig. 4. The distribution of the water sources and the regional agents in the experiments.
TABLE II SOME PARAMETERS OF THE PROPOSED APPROACH
Fig. 3. The water resource allocation model based on multiagents.
The frame of the proposed water resource allocation model is shown in Fig. 3. The task of water resource allocation is to decide the allocation proportion of the different water sources and the total water supply for every region. The goal is to achieve the best overall social, economic, and ecologic benefits.

To test the performance of the proposed multiagent Q-learning approach in the water resource optimal allocation task, some experiments are conducted in this paper. The geographical positions of the two water sources and the four regional agents in these experiments are shown in Fig. 4. The overall constraint conditions in these experiments are: 1) the actual supply quantity to all the regions cannot exceed the water supply capacity; 2) the daily water supply to each region cannot be less than the minimum daily water demand of this region; and 3) the change of the daily water demand in every region is less than 5%. These constraints are expressed by
$$\begin{cases} \displaystyle\sum_{i}\sum_{j} W_{ij} \le W \\ G_i \ge D_i^{\min} \\ \dfrac{\lvert D_i(t+1) - D_i(t)\rvert}{D_i(t)} \le 5\% \end{cases} \qquad (9)$$

where $W_{ij}$ represents the actual supply quantity of water source $j$ for the regional agent $i$; $W$ is the total available water supply of the city; $G_i$ and $D_i$ are the daily water supply and the daily water demand of the regional agent $i$, respectively; $D_i^{\min}$ is the minimum daily water demand of the regional agent $i$, which is set as 95% of the daily water demand; and $D_i(t)$ is the water demand of the regional agent $i$ on the $t$th day (namely, the $t$th decision step in the proposed approach).

In this study, the initial values of the influence factors in the water resource allocation model are listed in Table I. Because most parameters of the proposed approach can be calculated adaptively according to different tasks, only the parameters that should be set by the designer are given and listed in Table II. These parameters of the proposed approach are the same in all the experiments. The average daily water demands of the four regional agents are each set to a fixed value in million tons (MT).
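As a sketch only, the constraints in (9) could be checked as follows. The variable names mirror the symbols above, and the sample figures in the commented example are placeholders rather than the experimental data.

def constraints_satisfied(W_ij, W_total, G, D, D_prev):
    # W_ij: supply of each (regional agent, water source) pair, e.g. {("R1", "S1"): 0.30, ...}
    # W_total: total available water supply of the city (MT/D)
    # G, D: daily water supply and daily water demand per regional agent (MT/D)
    # D_prev: previous day's water demand per regional agent (MT/D)
    if sum(W_ij.values()) > W_total:                                   # constraint 1): capacity
        return False
    for region in D:
        if G[region] < 0.95 * D[region]:                               # constraint 2): minimum demand
            return False
        if abs(D[region] - D_prev[region]) / D_prev[region] > 0.05:    # constraint 3): <= 5% change
            return False
    return True

# Example with placeholder figures (hypothetical region/source labels, not the experimental data):
# constraints_satisfied({("R1", "S1"): 0.30, ("R1", "S2"): 0.10}, 1.8,
#                       {"R1": 0.40}, {"R1": 0.41}, {"R1": 0.40})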
TABLE III ALLOCATION OF TOTAL WATER QUANTITY FOR EACH AGENT UNDER NORMAL STATE IN ONE EXPERIMENT
Fig. 5. The allocation proportion of the water source for each agent under normal state in one experiment.
To satisfy the constraint conditions, the available water supply of the city is set in million tons per day (MT/D); the available water supplies of the two water sources are 0.8267 and 0.9067 MT/D, respectively. In the normal state, the daily water demand of each regional agent fluctuates by around 1%.
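The following small sketch shows one way such daily demand fluctuations could be simulated; the ±1% fluctuation range and the example average demand are assumptions for illustration, not the series used in the experiments.

import random

def demand_series(average_demand, days, fluctuation=0.01):
    # Generate a daily water demand series that fluctuates by about +/-1% per day,
    # which also keeps the day-to-day change well below the 5% limit of constraint 3) in (9).
    series = [average_demand]
    for _ in range(days - 1):
        series.append(series[-1] * (1.0 + random.uniform(-fluctuation, fluctuation)))
    return series

# Example: a 30-day series for one regional agent with a hypothetical 0.45 MT/D average demand.
print(demand_series(0.45, 30))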
TABLE IV AVERAGE VALUE OF TEN REPETITION EXPERIMENTS FOR THE TOTAL WATER QUANTITY ALLOCATION UNDER NORMAL STATE
A. Experiment Under Normal State

In the first experiment, the economic and social conditions are in the normal state, namely, all the influence factors fluctuate within the normal range set in this study. The initial values of the influence factors are input into the water resource allocation model, and the optimal allocation solutions are obtained by the proposed approach. Because the operators of the GA are probabilistic, every experiment is conducted 10 times. The experimental results are listed in Tables III and IV (to make this paper more concise, only the results for the allocation of the total water supply are given). Table III lists the allocation results for every regional agent of one experiment in this situation, and Table IV lists the average values of these ten repeated experiments. Fig. 5 shows the allocation proportion curves of one water source for each agent. Because the sum of the proportions of the two water sources for one agent is equal to 1, only the proportion of one water source for each agent is given (see Fig. 5).

Remark: The deviation value in Table III is defined in (10) as the deviation of the average allocation of total water quantity for a regional agent (i.e., the average daily water supply) from the water demand of that agent.

The results in Table III show that all the deviation values of the four regional agents are negative, namely, the water supply quantity is less than the water demand. The reason is that the total water supply is limited. The results in Fig. 5 show that the proposed approach can give the optimal allocation of the different water sources to each agent based on the complex interaction among these agents. For example, the average allocation proportion of one water source for one regional agent is
about 0.65, which is obviously higher than the allocation proportion of the same water source for another regional agent (about 0.49). The main reason is that, for the former agent, this water source gives more benefit than the other water source, while it is the opposite for the latter agent (see Table I). The curves of the allocation proportion in Fig. 5 and the standard deviation values in Table III show that the fluctuations of the allocation solutions are small. Also, the average results of the ten repetitions show that the deviations among these repeated experiments are very small (see Table IV). These results show that the allocation solutions based on the proposed approach do not change drastically. This property is very important for urban water resource allocation, because it can help the society, economy, and environment to develop in a healthy way [37].
TABLE V THE ALLOCATION OF TOTAL WATER QUANTITY FOR EACH AGENT AND THE TRANSPORTATION PRICE INDEX BASED ON THE PROPOSED APPROACH IN ONE EXPERIMENT
TABLE VI AVERAGE VALUE AND STANDARD DEVIATION OF TEN REPETITION EXPERIMENTS FOR THE TOTAL WATER QUANTITY ALLOCATION UNDER ABNORMAL STATE
These results show that the proposed approach can produce optimal allocation solutions for urban water resource effectively.

B. Experiment Under Abnormal State

To further test the proposed approach in the practice of water resource allocation, an experiment is conducted in which the state of the water environment system changes abruptly. In this experiment, the initial values of the influence factors are the same as those of the first experiment and all of them fluctuate within the normal range, except that the transportation price index of one water source for one regional agent changes seriously for certain reasons (such as an increase in the transportation pipeline cost, which is a typical abnormal state in some developing countries, e.g., China). The fluctuations of this transportation price index are listed in Table V. To prove the effectiveness of the proposed approach in this situation, the experiment is repeated ten times. The total water quantity allocation and the allocation proportion of the water source in one experiment are shown in Table V and Fig. 6, respectively. The average value and the standard deviation of these ten repeated experiments for the total water quantity allocation in this situation are listed in Table VI.

The results in Fig. 6 show that the allocation proportion of this water source for the affected agent decreases as its transportation price index increases. With the continual interaction, the allocation proportions of this water source for two of the other agents increase, while it remains almost unchanged for the remaining agent. The main reason for these results is that the demands for the different water sources are different for each agent.
Fig. 6. The allocation proportion of the water source for each agent under abnormal state in one experiment.
It is obvious that the allocation proportion of this water source for the affected agent decreases significantly on the 7th day and the 14th day, when its transportation price index changes. However, the allocation proportion of this water source for the affected agent does not change on the 21st day, when the transportation price index increases again (see Table V and Fig. 6). The reason is that the total quantity of the two water sources is limited. To satisfy the constraint conditions [see (9)], the proportion of one water source cannot increase continually with the decrease of the proportion of the
TABLE VII SATISFACTION RATE OF AGENTS BASED ON THE GENERAL Q-LEARNING METHOD AND THE PROPOSED APPROACH
other water source. The standard deviation values of these ten repeated experiments show that the result fluctuation of the proposed approach is small (see Tables V and VI), which means that the proposed approach has good stability. These results show that the proposed approach is capable of dealing with a dynamic environment and giving an optimal allocation proportion for the different water sources. This performance is very important for real applications.

IV. DISCUSSION

To illustrate the advantages of the proposed approach, some comparison experiments are conducted and the results are discussed. For comparability, the first experiment in Section III-A is used as a reference, and the initial values of the influence factors in these experiments are the same as those of the first experiment (see Table I).

First, the proposed approach is compared with the general Q-learning method based on the preference of the decision maker. The parameters of the general Q-learning method are the same as those of the proposed approach, but the work flow is different. The work flow of the general Q-learning method based on the preference of the decision maker in the water resource allocation task is as follows: 1) select an agent from the system according to the preference of the decision maker, and obtain the allocation solution for this agent by calculating the maximum mapping value of this agent; and 2) calculate the mapping values of the other agents in the system under this allocation solution. To ease the analysis, the concept of a satisfaction rate is introduced here to reflect the level of objective attainment. The satisfaction rate in this study is denoted as SA and is defined as

$$\mathrm{SA}_i = \frac{m_i}{m_i^{\max}} \times 100\% \qquad (11)$$

where $m_i$ is the mapping value of the $i$th agent and $m_i^{\max}$ is the maximum mapping value. The satisfaction rates of each agent based on the general Q-learning approach and the proposed approach are listed in Table VII.

The results in Table VII show that the general Q-learning method based on the preference of the decision maker can fully satisfy the objective of only one agent, and its comprehensive satisfaction rate is lower than that of the proposed approach. For example, when the decision maker prefers a particular agent, the objective of that agent is satisfied fully (SA = 100%), but the biggest
satisfaction rate among the other agents under this allocation solution is lower. By contrast, the lowest satisfaction rate of any agent obtained by the proposed approach is 88.57%. The average satisfaction rate of the proposed approach is higher than all the values obtained by the general Q-learning method based on the preferences of different decision makers. The results of this experiment show that the allocation solutions obtained by the proposed approach can satisfy most of the objectives of the agents, and the comprehensive satisfaction rate is high. This property of the proposed approach is very important in water resource allocation, because it can balance the development of society, economy, and ecology.

To discuss the effect of the GA in the proposed approach, a second comparison experiment is conducted, in which the proposed approach is compared with the general mapping value Q-learning approach. In the general mapping value Q-learning approach, all the parameters and the work flow are the same as those of the proposed approach, except that the searching method for the maximum mapping value function is different: the traditional traversal searching algorithm is used. With the chosen number of actions, the total computation time of the general mapping value Q-learning approach is 3.11 s in this experiment, whereas the total computation time of the proposed approach is just 0.74 s on the same computer. From another point of view, the number of actions under each state must be set as small as possible to ensure convergence and save decision time when the general mapping value Q-learning approach is used, which makes the allocation results imprecise (see Fig. 7).

Finally, the effect of the adaptive multifactor reward function in the proposed approach is discussed. In the literature on water resource allocation, global optimization algorithms are often used [38]–[40], where the constraint function is generally defined as a combination of the influence factors weighted by corresponding coefficients (12). The water resource allocation approaches based on those algorithms have some limitations. For example, the constraint functions in those global optimization algorithms can consider only one aspect, such as the economic
Fig. 7. The allocation proportion of the water source when the number of actions is different.
aspect or the social aspect. Furthermore, the corresponding coefficients are difficult to obtain and are fixed, so they cannot satisfy the requirements of a changing, dynamic system environment. In contrast, the proposed approach can fully consider all the influence factors of the various aspects, and the reward value is calculated in real time as the system state changes. Therefore, the allocation solutions are more reasonable than those of the traditional optimal allocation approaches based on general global optimization algorithms.

V. CONCLUSION

The optimal allocation of water resource for an urban water management system has been investigated in this paper. To deal with this problem, a water resource optimal allocation model based on multiagent modeling technology is set up, where the different optimization objectives are abstracted into various properties of different agents. A maximum mapping value-based multiagent Q-learning approach is proposed to deal with the optimal allocation problem. In the proposed approach, an adaptive reward value function is proposed, which can give full consideration to all the factors in the system, and a GA is used to make the decision quickly and precisely. The proposed approach has high stability, which can help the society, economy, and environment to develop in a healthy way. The proposed approach is capable of dealing with a dynamic environment and giving an optimal allocation proportion for the different water sources. Some comparison experiments are conducted, and the results show that the proposed approach is efficient in the water resource optimal allocation task for urban water resource management systems.

ACKNOWLEDGMENT

The authors would like to thank the editors and the reviewers for their helpful comments.

REFERENCES

[1] G.-H. Wei, F. Liu, and L. Ma, “Fuzzy optimization of water resources project scheme based on improved grey relation analysis,” in Proc. 3rd Int. Conf. Comput. Res. Develop., Shanghai, China, 2011, vol. 4, pp. 333–336.
[2] W. Sun and Z. Zeng, “City optimal allocation of water resources research based on sustainable development,” Adv. Mater. Res., vol. 446–449, pp. 2703–2707, 2012.
[3] B. Huang, F. Gui, and X. Zhang, “Study of the optimal water resources allocation scenarios in Pingxiang city,” Adv. Intell. Soft Comput., vol. 105, pp. 487–493, 2011.
[4] Z. Ren, M.-T. Pham, M. Song, D.-H. Kim, and C. S. Koh, “A robust global optimization algorithm of electromagnetic device utilizing gradient index and multi-objective optimization method,” IEEE Trans. Magn., vol. 47, no. 5, pp. 1254–1257, May 2011.
[5] W. Y. Zhang, S. Zhang, M. Cai, and J. X. Huang, “A new manufacturing resource allocation method for supply chain optimization using extended genetic algorithm,” Int. J. Adv. Manuf. Technol., vol. 53, no. 9–12, pp. 1247–1260, Apr. 2011.
[6] S. Rathinam, R. Sengupta, and S. Darbha, “A resource allocation algorithm for multivehicle systems with nonholonomic constraints,” IEEE Trans. Autom. Sci. Eng., vol. 4, no. 1, pp. 98–104, Jan. 2007.
[7] A. Gosavi, Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning. Boston, MA: Kluwer Academic, 2003.
[8] D. Vengerov, “A reinforcement learning approach to dynamic resource allocation,” Eng. Appl. Artif. Intell., vol. 20, no. 3, pp. 383–390, Apr. 2007.
[9] C. Bone and S. Dragićević, “Simulation and validation of a reinforcement learning agent-based model for multi-stakeholder forest management,” Comput., Environ. Urban Syst., vol. 34, no. 2, pp. 162–174, 2010.
[10] C. Song, X. Guan, Q. Zhao, and Y. Ho, “Machine learning approach for determining feasible plans of a remanufacturing system,” IEEE Trans. Autom. Sci. Eng., vol. 2, no. 3, pp. 262–275, Jul. 2005.
[11] M. Han, H. Du, X. Yang, and Y. Liu, “Research advances on water resources optimal distribution,” Procedia Environ. Sci., vol. 2, pp. 1912–1918, 2010.
[12] B. Abolpour, M. Javan, and M. Karamouz, “Water allocation improvement in river basin using adaptive neural fuzzy reinforcement learning approach,” Appl. Soft Comput., vol. 7, no. 1, pp. 265–285, Jan. 2007.
[13] M. Cunha and L. Ribeiro, “Tabu search algorithms for water network optimization,” Eur. J. Oper. Res., vol. 157, no. 3, pp. 746–758, Sep. 2004.
[14] I. Montalvo, J. Izquierdo, S. Schwarze, and R. Perez-Garcia, “Multiobjective particle swarm optimization applied to water distribution systems design: An approach with human interaction,” Math. Comput. Modelling, vol. 52, no. 7–8, pp. 1219–1227, 2010.
[15] L. A. House-Peters and H. Chang, “Urban water demand modeling: Review of concepts, methods, and organizing principles,” Water Resources Res., vol. 47, no. 5, 2011.
[16] L. Kanta and E. M. Zechman, “A complex adaptive systems approach to develop basin-scale optimal management strategies for water resources systems,” in Proc. World Environ. Water Resources Congr., Palm Springs, CA, May 16–20, 2011, pp. 2840–2843.
[17] C. Li, F. Wang, X. Wei, and Z. Ma, “Solution method of optimal scheme set for water resources scheduling group decision-making based on multi-agent computation,” Intell. Autom. Soft Comput., vol. 17, no. 7, pp. 871–883, 2011.
[18] D. Nickel, R. Barthel, and J. Braun, “Large-scale water resources management within the framework of GLOWA-Danube—The water supply model,” Phys. Chem. Earth, vol. 30, no. 6–7, pp. 383–388, 2005.
[19] J. Ni, M. Liu, J. Fei, and H. Ma, “Reinforcement learning based multiagent cooperation for water price forecasting decision support system,” Information—An International Interdisciplinary J., vol. 15, no. 5, pp. 1889–1899, May 2012.
[20] A. Bonarini, A. Lazaric, F. Montrone, and M. Restelli, “Reinforcement distribution in fuzzy Q-learning,” Fuzzy Sets and Systems, vol. 160, no. 10, pp. 1420–1443, 2009.
[21] R. Ollington and P. Vamplew, “Concurrent Q-learning: Reinforcement learning for dynamic goals and environments,” Int. J. Intell. Syst., vol. 20, no. 10, Oct. 2005.
[22] U. Kartoun, A. Shapiro, H. Stern, and Y. Edan, “Physical modeling of a bag knot in a robot learning system,” IEEE Trans. Autom. Sci. Eng., vol. 7, no. 1, pp. 172–177, 2010.
[23] C.-F. Juang and C.-M. Lu, “Ant colony optimization incorporated with fuzzy Q-learning for reinforcement fuzzy control,” IEEE Trans. Syst., Man, Cybern. Part A: Syst. Humans, vol. 39, no. 3, pp. 597–608, 2009.
[24] H. Cui, O. Turan, and P. Sayer, “Learning-based ship design optimization approach,” Comput. Aided Design, vol. 44, no. 3, Mar. 2012.
[25] L. Panait and S. Luke, “Cooperative multi-agent learning: The state of the art,” Autonomous Agents and Multi-Agent Systems, vol. 11, no. 3, pp. 387–434, Nov. 2005.
[26] X.-H. Xia, L.-H. Xu, and X.-Y. Kuang, “A distributed Nash Q-learning approach for optimizing urban traffic,” J. Convergence Inf. Technol., vol. 7, no. 2, pp. 92–100, 2012.
[27] L. Busoniu, R. Babuška, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Trans. Syst., Man, Cybern. Part C: Appl. Rev., vol. 38, no. 2, pp. 156–172, 2008.
[28] D. Liu, S. Guo, X. Chen, Q. Shao, Q. Ran, X. Song, and Z. Wang, “A macro-evolutionary multi-objective immune algorithm with application to optimal allocation of water resources in Dongjiang River basins, South China,” Stochastic Environmental Research and Risk Assessment, vol. 26, no. 4, pp. 491–507, May 2012.
[29] C. Watkins, “Learning from delayed rewards,” Ph.D. dissertation, King’s College, Cambridge, U.K., May 1989.
[30] J. Hu and M. P. Wellman, “Nash Q-learning for general-sum stochastic games,” J. Mach. Learning Res., vol. 4, no. 6, pp. 1039–1069, 2003.
[31] E. Yang and D. Gu, “Multiagent reinforcement learning for multi-robot systems: A survey,” Tech. Rep., 2004. [Online]. Available: http://www.essex.ac.uk/csee/research/publications/technicalreports/2004/csm404.pdf
[32] J. Wu, X. Xu, P. Zhang, and C. Liu, “A novel multi-agent reinforcement learning approach for job scheduling in Grid computing,” Future Generation Comput. Syst., vol. 27, no. 5, pp. 430–439, May 2011.
[33] K. Fujita and H. Matsuo, “Multiagent reinforcement learning with the partly high-dimensional state space,” Syst. Comput. Japan, vol. 37, no. 9, pp. 22–31, 2006.
[34] M. L. Littman, “Value-function reinforcement learning in Markov games,” Cognitive Syst. Res., vol. 2, no. 1, pp. 55–66, 2001.
[35] X. Xu, D. Hu, and X. Lu, “Kernel-based least squares policy iteration for reinforcement learning,” IEEE Trans. Neural Networks, vol. 18, no. 4, pp. 973–992, Jul. 2007.
[36] M. Zhou and S. Sun, Theory and Application of Genetic Algorithm. Beijing, China: National Defense Industry Press, 1996.
[37] H. Wang and Y. Tong, “Optimal allocation models for regional water resources with sustainable development,” J. Tsinghua Univ. (Sci. Technol.), vol. 47, no. 9, pp. 1531–1536, Sep. 2007.
[38] Z. Zabinsky, Stochastic Adaptive Search for Global Optimization. Boston, MA: Kluwer Academic, 2003.
[39] A. Montazar, “A decision tool for optimal irrigated crop planning and water resources sustainability,” J. Global Optim., pp. 1–14, 2011. [Online]. Available: http://dx.doi.org/10.1007/s10898-011-9803-1
[40] J. R. Kasprzyk, P. M. Reed, G. W. Characklis, and B. R. Kirsch, “Many-objective de Novo water supply portfolio planning under deep uncertainty,” Environmental Modelling & Software, vol. 34, pp. 87–104, Jun. 2012.

Jianjun Ni (M’11) received the Ph.D. degree from the Department of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, China, in 2005.
Currently, he is an Associate Professor with the College of Computer and Information, Hohai University, Jiangsu, China. He was a Visiting Professor with the Advanced Robotics and Intelligent Systems (ARIS) Laboratory, University of Guelph, Canada, from November 2009 to October 2010. He has published over 30 papers in related international conferences and journals. He has served as a reviewer for a number of international journals.
His research interests include fuzzy systems, neural networks, robotics, machine intelligence, and multiagent systems.
Minghua Liu received the B.S. degree in 2009 from Hohai University, Jiangsu, China. Currently, she is working towards the M.S. degree at the Department of Information and Communication Engineering, College of Computer and Information, Hohai University. Her research interests include intelligent information processing and machine learning.
Li Ren received the Ph.D. degree from the College of Hydrology and Water Resources, Hohai University, Jiangsu, China, in 2009. Currently, she is a Lecturer with the College of Hydrology and Water Resources, Hohai University. She has published over ten papers in related international conferences and journals. She has served as Reviewer of a number of international journals. Her research interests include eco-hydrology, water resources system optimization, management and utilization of water resources.
Simon X. Yang (SM’08) received the B.Sc. degree in engineering physics from Beijing University, Beijing, China, in 1987, the first M.Sc. degree in biophysics from the Chinese Academy of Sciences, Beijing, in 1990, the second M.Sc. degree in electrical engineering from the University of Houston, Houston, TX, in 1996, and the Ph.D. degree in electrical and computer engineering from the University of Alberta, Edmonton, Canada, in 1999. He joined the School of Engineering at the University of Guelph, Canada, in 1999. Currently, he is a Professor and the Head of the Advanced Robotics and Intelligent Systems (ARIS) Laboratory, University of Guelph, Guelph, Canada. His research interests include intelligent systems, robotics, sensors and multisensor fusion, wireless sensor networks, control systems, soft computing, and computational neuroscience. Prof. Yang serves as an Associate Editor of the IEEE TRANSACTIONS ON NEURAL NETWORKS, the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, PART B, and the International Journal of Robotics and Automation, and serves as an Associate Editor or Editorial Member of several other journals. He has involved in the organization of many conferences.