Dynamic Bandwidth Allocation under Uncertainty in Cognitive Radio Networks

Kun Zhu, Dusit Niyato, and Ping Wang
School of Computer Engineering, Nanyang Technological University (NTU), Singapore
Email: {zhuk0001,dniyato,wangping}@ntu.edu.sg

Abstract—We consider the problem of dynamic bandwidth allocation among different service classes under uncertainty in cognitive radio networks. In such networks, the secondary users compete for bandwidth resources and the service providers compete for user access (e.g., subscription). To address this problem, a two-level dynamic game framework is developed. The underlying dynamic service selection of secondary users is modeled as an evolutionary game based on replicator dynamics. The random, irrational churning behavior of secondary users is modeled as a stochastic disturbance to the evolution of the service selection distribution. At the upper level, a bandwidth allocation stochastic differential game is formulated to model the competition among different service providers. The service selection distribution of the underlying evolutionary game describes the state of the upper stochastic differential game, and a Markov perfect Nash equilibrium is considered to be the solution. The decentralized nature of the framework makes the system flexible and simple to implement.
Keywords – Bandwidth allocation, Replicator dynamics, Stochastic differential game, Markov perfect Nash equilibrium.

I. INTRODUCTION

In cognitive radio networks, different service providers may provide different services in terms of coverage area, mobility support, offered data rate, and price. Also, each service provider can differentiate its service into multiple service classes. Naturally, two issues arise. First, the secondary users compete for the services and dynamically adapt their service selection strategies according to the time-varying perceived performance so that the individual utility is maximized. Due to the stochastic nature of the wireless channel or the irrational behavior of secondary users, the service selection distribution may be continuously subject to stochastic disturbance. Second, according to the disturbed dynamic service selection distribution, the service providers competitively and dynamically allocate the available network capacity to the offered service classes to obtain the maximum profit. This type of competitive behavior can be analyzed using dynamic games. Also, because of the stochastic disturbance, the uncertainty needs to be considered.

A few works have addressed the issues of service selection and flow assignment in heterogeneous wireless networks. In [1], evolutionary game based algorithms were proposed for dynamic network selection. A Markov decision process (MDP) based control scheme was proposed for flow assignment among different networks in [2]. However, none of these works jointly considered the problem of dynamic bandwidth allocation and service selection in cognitive radio networks under competition in which the secondary users can change their service selection dynamically. Our previous work [3] addressed the problem of dynamic optimal bandwidth allocation
in heterogeneous wireless networks. However, the stochastic disturbance to the service selection dynamics was not considered.

To address the issues of dynamic service selection and dynamic bandwidth allocation under uncertainty, a hierarchical (i.e., two-level) game framework is developed to jointly obtain the strategies for both secondary users and service providers. The dynamic competition of service selection among secondary users is modeled as an evolutionary game. The strategy adaptation in this evolutionary game is subject to the controls of the service providers, in terms of bandwidth allocation, as well as to stochastic disturbance. In turn, the providers observe the service selection distribution of the secondary users and decide the optimal bandwidth allocation strategy dynamically. This dynamic bandwidth allocation under uncertainty is modeled as a stochastic differential game [4] whose state is the service selection distribution.

The rest of this paper is organized as follows. Section II presents the system model. The replicator dynamics of service selection adaptation under continuous stochastic disturbance is presented in Section III. The optimal bandwidth allocation control considering the dynamic disturbed service selection is formulated as a stochastic differential game in Section IV. Section V presents the numerical studies. The summary of this paper is given in Section VI.

II. SYSTEM MODEL AND ASSUMPTIONS

We consider a particular service area a in the coverage of a cognitive radio network consisting of M secondary service providers and N secondary users as shown in Fig. 1. The M service providers buy spectrum from primary users to provide service to secondary users. Cooperative dynamic spectrum access (DSA) is used for spectrum sharing, and the interference caused to primary users by the secondary system is not considered here (i.e., this is based on the exclusive-use DSA model). Provider i ∈ {1, 2, . . . 
, M } can provide $K_i$ service classes to secondary users to satisfy different service requirements. Denote by $K = \sum_{i=1}^{M} K_i$ the total number of service classes. For differentiating services, the Paris Metro Pricing (PMP) model [5] is used, in which the bandwidth is divided into several logically separated partitions, one for each class, and different classes differ only in price. The secondary users can select and churn to different providers independently according to the perceived instantaneous utility [6], which depends on the allocated bandwidth and price. Here, we assume a secondary user can only use service from one provider at a time. For the resource allocation among secondary users, we consider an unweighted round-robin scheme where all secondary users subscribed to the same service class share the available bandwidth equally. In this case, the bandwidth that a secondary user receives from service class $j$ of provider $i$ at time $t$ is denoted as $\tau_{ij}(t) = b_{ij}(t)/n_{ij}(t)$, where $b_{ij}(t)$ represents the total bandwidth of service class $j$ from provider $i$ and $n_{ij}(t)$ represents the total number of secondary users choosing service class $j$ of provider $i$ at time $t$. Specifically, we have $\sum_{i=1}^{M}\sum_{j=1}^{K_i} n_{ij}(t) = N$ and $b_{ij}(t) = C_i \eta_i \gamma_{ij}(t)$, where $C_i$ is the total size of spectrum that provider $i$ bought from primary users, $\gamma_{ij}(t)$ is the proportion of the bandwidth of provider $i$ allocated to service class $j$ (i.e., $\sum_{j=1}^{K_i} \gamma_{ij}(t) = 1$), and $\eta_i$ is the average spectral efficiency in bits per symbol per Hz [7].
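As a small illustration of the equal-share rule above, the following Python sketch evaluates $\tau_{ij} = C_i \eta_i \gamma_{ij} / n_{ij}$ for every class. The function name `per_user_bandwidth` and the allocation proportions in `gamma` are assumptions made for this example only; the capacities, spectral efficiencies, and user counts follow the setting of Section V (30 users with initial shares 0.2, 0.3, 0.1, and 0.4).

```python
def per_user_bandwidth(C, eta, gamma, n):
    """Return tau[i][j] = C[i] * eta[i] * gamma[i][j] / n[i][j].

    C[i]: spectrum of provider i, eta[i]: its average spectral efficiency,
    gamma[i][j]: share of provider i's bandwidth given to class j,
    n[i][j]: number of users in class j of provider i (assumed > 0).
    """
    tau = []
    for C_i, eta_i, gamma_i, n_i in zip(C, eta, gamma, n):
        if abs(sum(gamma_i) - 1.0) > 1e-9:
            raise ValueError("proportions of one provider must sum to 1")
        b_i = [C_i * eta_i * g for g in gamma_i]        # class bandwidth b_ij
        tau.append([b / k for b, k in zip(b_i, n_i)])   # equal split among users
    return tau


# C and eta as in Section V; gamma is a hypothetical allocation.
C = [5.0, 3.0]
eta = [1.5, 1.5]
gamma = [[0.6, 0.4], [0.5, 0.5]]
n = [[6, 9], [3, 12]]
tau = per_user_bandwidth(C, eta, gamma, n)
```

With these inputs, a user in class 1 of provider 1 receives a larger share than a user in class 2 of the same provider, because fewer users divide a larger partition.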
[Fig. 1 diagram: service providers SP1, SP2, ..., SPM, each offering service classes SC1, SC2, ..., SCKi (bandwidth allocation), and the secondary users in area a choosing among them (service selection).]
Fig. 1. System model of multi-class cognitive radio networks (SP: service provider; SC: service class).

III. EVOLUTION OF SERVICE SELECTION

An underlying evolutionary game is formulated to model the dynamic competition of service selection among secondary users. This is the lower-level game in the proposed two-level game framework. The players are the $N$ boundedly rational secondary users in area $a$. In the context of an evolutionary game, these $N$ users constitute the population. The strategies of secondary users are the choices of service provider and service class. The payoff of a secondary user is the utility that quantifies the performance satisfaction level according to the allocated bandwidth and price. Specifically, we consider the utility function $u(\tau_{ij}(t)) = \alpha \tau_{ij}(t)/p_{ij}$, where $\alpha$ is an adjustable constant that allows the function to be fitted to empirical data for different types of applications, and $p_{ij}$ is the price charged by provider $i$ for service class $j$ per secondary user per unit of time.

Let $x_{ij}(t) = n_{ij}(t)/N$ denote the proportion of secondary users choosing service class $j$ from provider $i$ at time $t$, which is also referred to as the population share. The population shares of all $K$ service classes constitute the population state, denoted by the vector $\vec{x}(t) = [x_{11}(t) \cdots x_{ij}(t) \cdots x_{MK_M}(t)]^T$, where $x_{ij}(t) \in [0, 1]$ and $\sum_{i=1}^{M}\sum_{j=1}^{K_i} x_{ij}(t) = 1$. The expected payoff of a secondary user selecting service class $j$ of provider $i$ can be defined as $\pi_{ij}(t) = u(\tau_{ij}(t))$. Accordingly, the average payoff of the population can be derived as $\bar{\pi}(t) = \sum_{i=1}^{M}\sum_{j=1}^{K_i} x_{ij}(t)\pi_{ij}(t)$. For a small period of time, the rate of service selection strategy change can be modeled by replicator dynamics, defined as follows:

$$\dot{x}_{ij}(t) = \delta x_{ij}(t)\left(\pi_{ij}(t) - \bar{\pi}(t)\right), \tag{1}$$

with initial condition $\vec{x}(0) = \vec{x}_0 \in X$, for all $i \in \{1, 2, \ldots, M\}$ and $j \in \{1, 2, \ldots, K_i\}$, where $\delta$ is the learning rate and $X$ is the state space which contains all possible population distributions. The learning rate $\delta$ controls the speed of adaptation in service selection. The service selection adaptation algorithm for secondary users is the same as that proposed in [1] and therefore is not included here.

The boundedly rational secondary users lack complete information and computation capability. Therefore, irrational churning can occur randomly. That is, with a small probability, a secondary user may make a wrong decision and choose a strategy with a lower payoff. This random irrational churning can be interpreted as a stochastic perturbation of the population state evolution. We assume this disturbance enters the dynamics in the form of a Wiener process. In this case, the evolution dynamics under disturbance can be represented as follows:

$$dx_{ij}(t) = \delta x_{ij}(t)\left(\pi_{ij}(t) - \bar{\pi}(t)\right)dt + \sigma_{ij}(x_{ij}(t))\,d\omega(t), \tag{2}$$
where $\sigma_{ij}(x_{ij}(t))$ is a variance term and $\omega(t)$ represents a standard Wiener process. Specifically, $\sigma_{ij}(x_{ij}(t))$ represents the direct impact of the stochastic disturbance on the evolution of the population share of service class $j$ of provider $i$. If $\sigma_{ij}(x_{ij}(t)) \equiv 0$ for all $t \in [0, \infty)$, the uncertainty disappears and the stochastic state dynamics defined in (2) degenerate to the deterministic state dynamics defined in (1). To make the population share $x_{ij}$ remain bounded within $[0, 1]$ under the continuous stochastic disturbance, the variance term needs to satisfy $\sigma_{ij}(x_{ij}(t)) > 0$ for $x_{ij}(t) \in (0, 1)$ and $\sigma_{ij}(0) = \sigma_{ij}(1) = 0$.

IV. OPTIMAL BANDWIDTH ALLOCATION UNDER UNCERTAINTY

Given the dynamic service selection behavior of secondary users, the providers can optimally allocate the bandwidth among different service classes to achieve the maximum profit. Increasing the bandwidth allocated to a certain service class is a natural way to improve its performance and to attract more secondary users to this service class. However, with limited total capacity, increasing the bandwidth allocated to one service class decreases the bandwidth allocated to the other service classes, which may result in a reduced total profit. In this section, we formulate the dynamic bandwidth allocation with disturbed service selection dynamics as a stochastic differential game. This is the upper-level game in the proposed two-level game framework. The optimal control strategies for providers will be derived and the characteristics of the disturbed state evolution path will be investigated.

A. Noncooperative Dynamic Bandwidth Allocation

We first formulate the competitive dynamic bandwidth allocation as a noncooperative stochastic differential game.
In this case, each of the M noncooperative providers competes to maximize the present value of its own objective function derived over an infinite time horizon by controlling the bandwidth allocation strategy.
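The disturbed replicator dynamics (2), which serve as the state equation of this game, can be explored numerically. The sketch below is one possible Euler-Maruyama discretization, not the authors' implementation. Several choices are assumptions made only for illustration: the noise shape $\sigma(x) = s\,x(1-x)$ (which satisfies $\sigma(0) = \sigma(1) = 0$), a fixed bandwidth allocation, a flat indexing of the $K$ classes, the name `simulate_selection`, and the clipping of shares to a small interior interval so that payoffs stay finite. The shares are not renormalized after a noisy step.

```python
import math
import random


def simulate_selection(x0, b, p, N, alpha=1.0, delta=1.0, s=0.0,
                       dt=0.01, steps=2000, seed=0):
    """Euler-Maruyama path of dx_k = delta*x_k*(pi_k - pi_bar) dt + sigma(x_k) dw.

    x0[k]: initial share of class k, b[k]: class bandwidth C_i*eta_i*gamma_ij,
    p[k]: price of class k. sigma(x) = s*x*(1-x) is an assumed noise shape.
    """
    rng = random.Random(seed)
    eps = 1e-9                               # keeps shares strictly inside (0, 1)
    x = list(x0)
    for _ in range(steps):
        # payoff pi_k = alpha * tau_k / p_k with tau_k = b_k / (N * x_k)
        pi = [alpha * b_k / (N * x_k * p_k) for b_k, x_k, p_k in zip(b, x, p)]
        pi_bar = sum(x_k * pi_k for x_k, pi_k in zip(x, pi))
        dw = rng.gauss(0.0, math.sqrt(dt))   # one Wiener increment shared by all classes
        x = [min(1.0 - eps, max(eps,
                 x_k + delta * x_k * (pi_k - pi_bar) * dt
                 + s * x_k * (1.0 - x_k) * dw))
             for x_k, pi_k in zip(x, pi)]
    return x


# Section V values for N, prices, and initial shares; equal split gamma = 0.5 is assumed.
x0 = [0.2, 0.3, 0.1, 0.4]
b = [3.75, 3.75, 2.25, 2.25]                 # 5*1.5*0.5 and 3*1.5*0.5
p = [0.2, 0.1, 0.3, 0.25]
x_det = simulate_selection(x0, b, p, N=30)                  # s = 0: deterministic (1)
x_noisy = simulate_selection(x0, b, p, N=30, s=0.05, seed=1)
```

Setting `s=0` recovers the deterministic replicator dynamics (1), which is a convenient sanity check before adding noise.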
The players are the $M$ service providers. The strategy of each provider is the dynamic control of the proportion of bandwidth allocated to its service classes. We consider a closed-loop control strategy which can use feedback information (i.e., the state) to adjust the control process. Specifically, the control strategy of provider $i$ is denoted by the vector $\vec{\gamma}_i(\vec{x}(t), t) = [\gamma_{i1}(\vec{x}(t), t) \cdots \gamma_{ij}(\vec{x}(t), t) \cdots \gamma_{iK_i}(\vec{x}(t), t)]^T$. Naturally, $\gamma_{ij}(\vec{x}(t), t) \in [0, 1]$ and $\sum_{j=1}^{K_i} \gamma_{ij}(\vec{x}(t), t) = 1$ for all $t \in [0, +\infty)$.

The disturbed population state $\vec{x}(t)$ of the underlying service selection evolutionary game describes the state of the upper differential game. The stochastic differential equations (2) describe how the current state $\vec{x}(t)$, the service providers' controls $\vec{\gamma}_i(\vec{x}(t), t)$ for all $i \in \{1, 2, \ldots, M\}$, and the white noise disturbance influence the rate of change of the state at time $t$. As is common in control theory, the arguments will be omitted whenever no ambiguity occurs.

The instantaneous profit of provider $i$ choosing strategy $\vec{\gamma}_i$ is expressed as follows:

$$J^i_{ins}(\vec{\gamma}_i, \vec{\gamma}_{-i}) = \sum_{j=1}^{K_i} \left( p_{ij} N x_{ij} - \beta_{ij} (\gamma_{ij} C_i)^2 \right), \tag{3}$$

where $\vec{\gamma}_{-i}$ is the vector of strategies of all players except player $i$, $\beta_{ij}$ is a cost factor, and $\beta_{ij}(\gamma_{ij} C_i)^2$ is the instantaneous cost of provider $i$ buying spectrum for service class $j$ from primary users.

For each of the $M$ providers, the maximization of expected profit under dynamic competition becomes a stochastic optimal control problem subject to the constraints (e.g., the disturbed state evolution differential equations), given the control strategies of the other providers. The optimal control model can be expressed as follows: maximize $J^i(\vec{\gamma}_i, \vec{\gamma}_{-i})$, where

$$J^i(\vec{\gamma}_i, \vec{\gamma}_{-i}) = E\left\{ \int_0^{\infty} e^{-\rho t} J^i_{ins}(\vec{\gamma}_i, \vec{\gamma}_{-i})\,dt \right\}, \tag{4}$$

subject to

$$dx_{ij} = \delta x_{ij}\left(\pi_{ij} - \bar{\pi}\right)dt + \sigma_{ij}(x_{ij})\,d\omega, \qquad \vec{x}(0) = \vec{x}_0, \tag{5}$$
for $i \in \{1, \ldots, M\}$ and $j \in \{1, \ldots, K_i\}$, where $\rho$ is the discount rate influencing the present value of future profit.

Since this stochastic differential game is defined over an infinite time horizon, we need to verify whether the objective functionals are upper bounded for all feasible strategies and corresponding state trajectories. To this end, we first note that the instantaneous profit $J^i_{ins}(\vec{\gamma}_i, \vec{\gamma}_{-i})$ is upper bounded for all $i \in \{1, 2, \ldots, M\}$ since $x_{ij} \in [0, 1]$. Denoting $L = \max\{J^i_{ins}(\vec{\gamma}_i, \vec{\gamma}_{-i})\}$, it follows that $J^i(\vec{\gamma}_i, \vec{\gamma}_{-i}) \le E\left\{\int_0^{\infty} e^{-\rho t} L\,dt\right\} = L/\rho < \infty$, and therefore the objective functionals are well defined.

For this noncooperative formulation, Nash equilibrium is a reasonable notion of optimality and is considered to be the solution. To obtain a Markov perfect Nash equilibrium (i.e., a subgame perfect closed-loop Nash equilibrium [8]), the Hamilton-Jacobi-Bellman (HJB) method can be used. As is common for autonomous dynamic games with an infinite time horizon, the aim is to find the stationary Markov perfect Nash equilibrium. In this case, the optimal value function and the optimal control strategy for the problem in (4) are independent of time $t$. Specifically, denote by $V^i(\vec{x})$ the optimal value function for provider $i$. The HJB equation for the formulated bandwidth allocation stochastic differential game is expressed as follows:

$$\rho V^i(\vec{x}) - \frac{1}{2}\,\mathrm{tr}\!\left[V^i_{xx}\,\vec{\sigma}\vec{\sigma}^T\right] = \max_{\vec{\gamma}_i} \left\{ J^i_{ins}(\vec{\gamma}_i, \vec{\gamma}_{-i}) + \sum_{j=1}^{M}\sum_{k=1}^{K_j} V^i_{x_{jk}}\,\dot{x}_{jk} \right\}, \tag{6}$$
where $\mathrm{tr}[\cdot]$ is the trace of a matrix [8], $\vec{\sigma} = [\sigma_{11} \cdots \sigma_{ij} \cdots \sigma_{MK_M}]^T$ is a $K$-dimensional vector, $V^i_{x_{jk}} = \partial V^i/\partial x_{jk}$, $V^i_{x_{ij}x_{kl}} = \partial^2 V^i/\partial x_{ij}\partial x_{kl}$, and $V^i_{xx}$ is a $K \times K$ matrix.

Maximizing the right-hand side of (6) with respect to $\vec{\gamma}_i$ yields the optimal bandwidth allocation control strategies as follows:

$$\gamma^*_{ij} = \max\left(0,\ \frac{\delta \alpha \eta_i V^i_{x_{ij}}}{2\beta_{ij}\,p_{ij}\,N\,C_i}\right). \tag{7}$$

Here $V^i_{x_{ij}}$ can be interpreted as the shadow value, i.e., the variation of the total profit induced by a variation of the population share $x_{ij}$. Since an increase in a population share of provider $i$ increases its profit, we have $V^i_{x_{ij}} \ge 0$, and (7) reduces to $\gamma^*_{ij} = \delta \alpha \eta_i V^i_{x_{ij}}/(2\beta_{ij}\,p_{ij}\,N\,C_i)$. Substituting $\gamma^*_{ij}$ into (6), we obtain a system of partial differential equations as follows:

$$\rho V^i(\vec{x}) - \frac{1}{2}\,\mathrm{tr}\!\left[V^i_{xx}\,\vec{\sigma}\vec{\sigma}^T\right] = J^i_{ins}(\vec{\gamma}^*_i, \vec{\gamma}^*_{-i}) + \sum_{j=1}^{M}\sum_{k=1}^{K_j} V^i_{x_{jk}}\,\dot{x}_{jk}, \tag{8}$$
for all $i \in \{1, 2, \ldots, M\}$.

If $\sigma_{ij}(x_{ij}(t)) \equiv 0$ for all $t \in [0, \infty)$, the stochastic bandwidth allocation differential game degenerates to a deterministic differential game. The degenerate deterministic bandwidth allocation differential game is a linear state differential game whose optimal value is a linear function of the system state. For the nondegenerate (i.e., $\sigma_{ij}(x_{ij}(t)) \neq 0$) stochastic bandwidth allocation differential game, we first assume that the optimal value function also has a linear form and then show that this assumption works. Specifically, we assume $V^i(\vec{x}) = \sum_{j=1}^{M}\sum_{k=1}^{K_j} a^i_{jk} x_{jk} + b^i$, where $a^i_{jk}$ and $b^i$ are constant coefficients. Under this form, $V^i_{xx} = 0$, so the trace term in (8) vanishes. Moreover, $\partial V^i/\partial x_{ij} = a^i_{ij}$ and $\gamma^*_{ij} = \delta \alpha \eta_i a^i_{ij}/(2\beta_{ij}\,p_{ij}\,N\,C_i)$. Substituting $V^i(\vec{x})$ into (8) and equating the coefficients of $x_{jk}$ yields

$$a^i_{jk} = \begin{cases} \dfrac{p_{jk} N^2}{\rho N + \alpha \delta C}, & j = i, \\[2mm] 0, & j \neq i, \end{cases} \tag{9}$$

for all $j \in \{1, \ldots, M\}$ and $k \in \{1, \ldots, K_j\}$, where $C = \sum_{l=1}^{M} C_l \eta_l$. Similarly, equating the constant terms yields

$$b^i = \frac{1}{\rho}\left[ \sum_{j=1}^{M}\sum_{k=1}^{K_j} \frac{\alpha\delta}{N}\, a^i_{jk}\, C_j \eta_j \gamma^*_{jk} - \sum_{j=1}^{K_i} \beta_{ij} (C_i \gamma^*_{ij})^2 \right], \tag{10}$$
for all $i \in \{1, 2, \ldots, M\}$. We have thus obtained the optimal value function as well as the closed-form optimal bandwidth allocation strategy $\vec{\gamma}^*_i$ for provider $i$. Similarly, the optimal control strategies for all providers can be obtained. Accordingly, the strategy profile $\varphi^* = \{\vec{\gamma}^*_i, \vec{\gamma}^*_{-i}\}$ constitutes the Markov perfect Nash equilibrium for the stochastic bandwidth allocation differential game.

B. Cooperative Dynamic Bandwidth Allocation

We have described the noncooperative bandwidth allocation stochastic differential game. Now, we consider the case in which the providers agree to cooperate. Specifically, the providers decide their bandwidth control strategies in a cooperative manner with the objective of maximizing the aggregated profit. This cooperative dynamic bandwidth allocation can be formulated as a cooperative stochastic differential game. The objective functional is the aggregated profit and is therefore the same for all providers. Denote by $\vec{\gamma}^c_i$ and $\vec{\gamma}^c_{-i}$ the cooperative bandwidth allocation strategies for provider $i$ and for all other providers, respectively. Similar to the noncooperative case, each of the $M$ providers needs to solve a stochastic optimal control problem expressed as follows: maximize

$$J^c(\vec{\gamma}^c_i, \vec{\gamma}^c_{-i}) = E\left\{ \int_0^{\infty} e^{-\rho t} \sum_{i=1}^{M} J^i_{ins}(\vec{\gamma}^c_i, \vec{\gamma}^c_{-i})\,dt \right\}, \tag{11}$$

with the same constraints as defined in (5).

To obtain the optimal control for the cooperative bandwidth allocation, the HJB method can also be used. Denote by $V^c(\vec{x}) = \sum_{i=1}^{M}\sum_{j=1}^{K_i} a^c_{ij} x_{ij} + b^c$ the optimal value function for the aggregated profit. The HJB equation can be expressed as follows:

$$\rho V^c(\vec{x}) - \frac{1}{2}\,\mathrm{tr}\!\left[V^c_{xx}\,\vec{\sigma}\vec{\sigma}^T\right] = \max_{\vec{\gamma}^c_i} \left\{ \sum_{l=1}^{M} J^l_{ins}(\vec{\gamma}^c_l, \vec{\gamma}^c_{-l}) + \sum_{j=1}^{M}\sum_{k=1}^{K_j} V^c_{x_{jk}}\,\dot{x}_{jk} \right\}, \tag{12}$$

where $V^c_{x_{jk}}$ and $V^c_{xx}$ have the same definitions as those of the noncooperative model.

With the same method, we can obtain the optimal cooperative control $\gamma^{c*}_{ij} = \delta \alpha \eta_i V^c_{x_{ij}}/(2\beta_{ij}\,p_{ij}\,N\,C_i)$ and the coefficients of the optimal value function $V^c$ as follows:

$$a^c_{ij} = \frac{p_{ij} N^2}{\rho N + \alpha \delta C}, \qquad b^c = \frac{1}{\rho} \sum_{i=1}^{M}\sum_{j=1}^{K_i} \left[ \frac{\alpha\delta}{N}\, a^c_{ij}\, C_i \eta_i \gamma^{c*}_{ij} - \beta_{ij} (C_i \gamma^{c*}_{ij})^2 \right], \tag{13}$$
for all $i \in \{1, \ldots, M\}$ and $j \in \{1, \ldots, K_i\}$. We observe that $a^i_{ij} = a^c_{ij}$, which indicates that the noncooperative bandwidth allocation control for provider $i$ is the same as the cooperative control strategy (i.e., $\gamma^*_{ij} = \gamma^{c*}_{ij}$). Therefore, the noncooperative control strategies also maximize the aggregated profit.

C. Characterization of the Evolution Path

With the optimal bandwidth allocation controls of all providers, we now investigate the characteristics of the disturbed evolution path analytically. In particular, we examine the mean evolution path. Since the noncooperative control strategies coincide with the cooperative strategies and therefore have the same characteristics, we only consider the noncooperative case here.

Substituting the optimal control $\gamma^*_{ij}$ into (2) yields

$$dx_{ij}(t) = (A_{ij} - A x_{ij}(t))\,dt + \sigma_{ij}(x_{ij}(t))\,d\omega(t), \tag{14}$$

where $A_{ij} = \delta \alpha C_i \eta_i \gamma^*_{ij}/(p_{ij} N)$ and $A = \delta \alpha \sum_{l=1}^{M} C_l \eta_l / N$. Then, $x_{ij}$ is a solution of the stochastic differential equation (14) that satisfies the integral equation

$$x_{ij}(t) = x_{ij}(0) + \int_0^t (A_{ij} - A x_{ij}(s))\,ds + \int_0^t \sigma_{ij}(x_{ij}(s))\,d\omega(s). \tag{15}$$

Taking expectations in (15), the stochastic integral has zero mean, so the mean evolution path is independent of the stochastic disturbance and can be derived as follows:

$$E[x_{ij}(t)] = x_{ij}(0) + \int_0^t (A_{ij} - A\,E[x_{ij}(s)])\,ds. \tag{16}$$

Equation (16) can be rewritten as a first-order linear differential equation in the mean path $E[x_{ij}]$ as follows:

$$\frac{dE[x_{ij}]}{dt} = A_{ij} - A\,E[x_{ij}], \tag{17}$$

with initial condition $E[x_{ij}(0)] = x_{ij}(0)$. Therefore, the mean evolution path is given by

$$E[x_{ij}(t)] = e^{-At} x_{ij}(0) + \left(1 - e^{-At}\right)\frac{A_{ij}}{A}, \tag{18}$$

for all $i \in \{1, 2, \ldots, M\}$ and $j \in \{1, 2, \ldots, K_i\}$, and the long-run mean equilibrium population share of service class $j$ of provider $i$ is $\lim_{t\to\infty} E[x_{ij}(t)] = A_{ij}/A$.
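The closed-form mean path (18) is easy to check numerically. The following Python sketch evaluates it and compares a finite-difference derivative against the right-hand side of (17). The function name `mean_path` and the coefficient values are hypothetical and were chosen only for illustration; they are not the values used in the numerical study.

```python
import math


def mean_path(t, x0, A_ij, A):
    """E[x_ij(t)] from (18) for drift (A_ij - A*x) and initial share x0."""
    decay = math.exp(-A * t)
    return decay * x0 + (1.0 - decay) * A_ij / A


# Hypothetical coefficients (assumptions for this example).
x0, A_ij, A = 0.2, 0.1, 0.4
limit = A_ij / A                      # long-run mean population share

# Central finite difference of (18) versus the right-hand side of (17) at t = 3.
t, h = 3.0, 1e-6
lhs = (mean_path(t + h, x0, A_ij, A) - mean_path(t - h, x0, A_ij, A)) / (2 * h)
rhs = A_ij - A * mean_path(t, x0, A_ij, A)
```

The two sides agree up to discretization error, and `mean_path` approaches `limit` as `t` grows, at a rate set by `A`.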
V. PERFORMANCE EVALUATION

A. Parameter Setting

We consider a cognitive radio network with two secondary service providers (i.e., $M = 2$), both providing two service classes to 30 active secondary users (i.e., $N = 30$) in a particular service area. The total spectrum bandwidths of providers 1 and 2 are $C_1 = 5$ MHz and $C_2 = 3$ MHz, respectively. The average spectral efficiency is assumed to be 1.5 bits/s/Hz for both providers (i.e., $\eta_1 = \eta_2 = 1.5$). The fixed connection fees for the two service classes of the two service providers are set to $p_{11} = 0.2$, $p_{12} = 0.1$, $p_{21} = 0.3$, and $p_{22} = 0.25$. For the replicator dynamics, we set the learning rate to $\delta = 1$. The initial proportions of secondary users choosing the two service classes of the two service providers are assumed to be $x_{11}(0) = 0.2$, $x_{12}(0) = 0.3$, $x_{21}(0) = 0.1$, and $x_{22}(0) = 0.4$.

B. Numerical Results

The dynamic behavior of service selection of secondary users under the bandwidth allocation control is investigated. The population share trajectories under stochastic disturbance from the initial selection distribution are shown in Fig. 2. Also, the mean state evolution paths for all service classes are given. Fig. 2 shows that the mean evolution paths converge to a population share distribution at which all secondary users receive the same utility, equal to the average utility of the population. Due to the continuous stochastic disturbance, the actual state evolution path fluctuates around the mean path and never settles at the steady state. Fig. 3 shows the normal-density 95% confidence interval of the population share evolution path $x_{11}$. Though the evolution path may not follow a normal distribution, the interval shown provides an approximate bound for the fluctuation of the population share due to the disturbance.
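A band of the kind plotted in Fig. 3 (mean plus or minus 1.96 standard deviations) can be estimated by Monte Carlo simulation. The sketch below is not the authors' procedure: it simulates the scalar dynamics (14) with Euler-Maruyama steps and reports the sample mean and band at a final time. The noise shape $\sigma(x) = s\,x(1-x)$, the coefficients `A_ij`, `A`, and `s`, the clipping to $[0, 1]$, and the name `share_band` are all assumptions made for illustration.

```python
import math
import random
import statistics


def share_band(x0, A_ij, A, s, T=20.0, dt=0.01, paths=200, seed=0):
    """Sample mean and mean -/+ 1.96 s.d. of x(T) over Euler-Maruyama paths of
    dx = (A_ij - A*x) dt + s*x*(1-x) dw  (the noise shape is an assumption)."""
    rng = random.Random(seed)
    steps = int(round(T / dt))
    finals = []
    for _ in range(paths):
        x = x0
        for _ in range(steps):
            dw = rng.gauss(0.0, math.sqrt(dt))        # Wiener increment over dt
            x += (A_ij - A * x) * dt + s * x * (1.0 - x) * dw
            x = min(1.0, max(0.0, x))                 # keep the share inside [0, 1]
        finals.append(x)
    mid = statistics.mean(finals)
    sd = statistics.stdev(finals)
    return mid - 1.96 * sd, mid, mid + 1.96 * sd


# Hypothetical coefficients; the long-run mean share is A_ij / A = 0.25.
low, mid, high = share_band(x0=0.2, A_ij=0.1, A=0.4, s=0.2)
```

Because the drift in (14) is linear, the sample mean should stay close to the mean path (18) regardless of the noise level, while the width of the band grows with `s`.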
[Figure: proportion of users choosing services versus time (0 to 100) for service classes 1 and 2 of SP1 and SP2.]
Fig. 2. Evolution of service selection distribution from initial state [0.2,0.3,0.1,0.4].

[Figure: population share of service class 1 of provider 1 versus time (0 to 100), showing x11 together with E[x11]+1.96 s.d. and E[x11]−1.96 s.d.]
Fig. 3. 95% confidence interval for population share of service class 1 of provider 1.

The optimal value function for provider 1 (i.e., $V^1$) is shown in Fig. 4. This optimal value function is linear with respect to the corresponding population shares (i.e., $x_{11}$ and $x_{12}$). As the population share increases, the optimal value increases linearly. Also, because the price charged to secondary users for service class 1 is higher than that for service class 2, $V^1$ increases faster with $x_{11}$ than with $x_{12}$. The lines in the bottom plane show the population shares at which the optimal values are equal. The optimal value function for provider 2 can be obtained similarly and the same arguments apply.

[Figure: surface plot of the optimal value function V1 over x11 and x12.]
Fig. 4. Optimal value function of provider 1.

VI. SUMMARY

In this paper, we have addressed the problem of optimal bandwidth allocation among different service classes in cognitive radio networks, considering the secondary users' dynamic service selection under continuous stochastic disturbance. To model the competitive behavior of service providers, a stochastic bandwidth allocation differential game has been formulated whose state is the service selection distribution. A Markov perfect Nash equilibrium is considered to be the solution. In addition, we have considered the cooperative bandwidth allocation of service providers to maximize the aggregated profit. Numerical studies have been performed to investigate the optimal value function and the mean and bound of the state evolution path. For future work, the pricing issue and the amount of bandwidth to buy from primary users can be considered.

REFERENCES

[1] D. Niyato and E. Hossain, “Dynamics of networks selection in heterogeneous wireless networks: an evolutionary game approach,” IEEE Transactions on Vehicular Technology, vol. 58, no. 4, pp. 2008-2017, May 2009.
[2] J. P. Singh, T. Alpcan, P. Agrawal, and V. Sharma, “An optimal flow assignment framework for heterogeneous network access,” in Proc. WoWMoM, June 2007, pp. 1-12.
[3] K. Zhu, D. Niyato, and P. Wang, “Optimal bandwidth allocation with dynamic service selection in heterogeneous wireless networks,” in Proc. Globecom, December 2010, pp. 1-6.
[4] D. W. K. Yeung and L. A. Petrosyan, Cooperative Stochastic Differential Games. Springer, 2006.
[5] S. Shakkottai, R. Srikant, A. Ozdaglar, and D. Acemoglu, “The price of simplicity,” IEEE Journal on Selected Areas in Communications, vol. 26, no. 7, pp. 1269-1276, September 2008.
[6] C. U. Saraydar, N. B. Mandayam, and D. J. Goodman, “Pricing and power control in a multicell wireless data network,” IEEE Journal on Selected Areas in Communications, vol. 19, no. 10, pp. 1883-1892, October 2001.
[7] A. J. Goldsmith and S. G. Chua, “Variable rate variable power MQAM for fading channels,” IEEE Transactions on Communications, vol. 45, no. 10, pp. 1218-1230, October 1997.
[8] E. J. Dockner, S. Jørgensen, N. V. Long, and G. Sorger, Differential Games in Economics and Management Science. Cambridge Univ. Press, 2000.