Performance analysis of energy harvesting sensors with time-correlated energy supply

Nicolò Michelusi, Kostas Stamatiou and Michele Zorzi
{nicolo.michelusi, kstamat, zorzi}@dei.unipd.it
Department of Information Engineering, University of Padova, via Gradenigo, 6b, 35131 Padova, Italy

Abstract—Sensors powered by energy harvesting devices (EHD) are increasingly being deployed in practice, due to the demonstrated advantage of long-term, autonomous operation, without the hassle of battery replacement. This paper is concerned with the following fundamental problem: how should the harvested energy be managed to ensure optimal performance, if the statistical properties of the ambient energy supply are known? To formulate the problem mathematically, we consider an EHD-powered sensor which senses data of varying importance and model the availability of ambient energy by a two-state Markov chain ("GOOD" and "BAD"). Assuming that data transmission incurs an energy cost, our objective is to identify low-complexity transmission policies, which achieve good performance in terms of average long-term importance of the transmitted data. We derive the performance of a Balanced Policy (BP), which adapts the transmission probability to the ambient energy supply, so as to balance energy harvesting and consumption, and demonstrate that it performs within 10% of the globally optimal policy. Moreover, a BP which avoids energy overflow by always transmitting when the sensor battery is fully charged is shown to perform within 5% of the optimum. Finally, we identify a key performance parameter for any policy, the relative battery capacity, defined as the ratio of the battery capacity to the expected duration of the BAD harvesting period.

I. INTRODUCTION

In recent years, energy harvesting (EH) has become a vast research topic, spanning different disciplines related to electronics, energy storage, measurement and modeling of ambient energy sources, and energy management [1], [2]. In the field of sensor networking, energy harvesting devices (EHD) are increasingly being deployed to power sensors in a variety of applications where battery replacement is difficult or cost-prohibitive, e.g., see [3]. In this paper, we are concerned with a fundamental question regarding the operation of EHD-powered sensors (EHS): how should statistical information on the ambient energy supply be exploited in order to achieve optimal performance?

We consider a general system model consisting of an EHS, which senses data of varying "importance" and reports them judiciously to its receiver (RX). The EHS is powered by an EHD, which harvests energy from an ambient energy source that is modeled by a two-state Markov chain, where states "GOOD" and "BAD" correspond to an abundance and scarcity of ambient energy, respectively. Assuming that data transmission incurs an energy cost, our objective is to characterize low-complexity transmission policies, which achieve close to optimal performance, in terms of average long-term importance of the reported data.

We derive analytically the performance of a Balanced Policy (BP), which adapts the transmission probability based only on the ambient energy availability, while balancing the energy consumption and harvesting rates, and demonstrate that it performs within 10% of the globally optimal policy, obtained by maximizing the long-term average data importance with the Policy Iteration algorithm [4], [5]. In addition, a BP which always transmits when the sensor battery is fully charged is shown to perform within 5% of the optimum. The main implication of these results is that very good data reporting performance can be achieved with simple adaptation to the ambient energy supply, without precise knowledge of the energy stored in the sensor battery at any given time. We also demonstrate that the salient system parameter is the relative battery capacity, defined as the ratio of the battery capacity to the expected duration of the BAD harvesting period, and that the performance of any policy depends weakly on the absolute value of the latter.

The model considered in the present paper is a generalization of the one studied in [6], where the energy supply was modeled as temporally independent, and properties of the optimal policy were derived. Previous related work on energy management policies for EHS includes [7]–[10], which have relied on different models and performance metrics. An assumption that is commonly made in [7]–[10] is that the energy storage capacity is infinite. Instead, the present paper explicitly takes into account the effect of finite battery capacity and its interplay with the EH process.

The remainder of the paper is organized as follows. In Section II, we describe in detail our system model. Section III states the optimization problem, and includes the definitions and preliminary results which pave the way for the performance analysis of the BP presented in Section IV. In Section V, we present an array of numerical examples, and Section VI summarizes our main conclusions. A list of symbols employed throughout the paper is provided in Table I. In the following, x̄ = 1 − x denotes the complement of x ∈ [0, 1] and χ(·) is the indicator function.

II. SYSTEM MODEL

A general block diagram of a wireless EHS is shown in Fig. 1. The EHD collects ambient energy, which is stored in a storage element, either a battery or a (super)capacitor (henceforth referred to as the "battery" for simplicity), which powers the sensing apparatus and the RF circuitry.

Table I
LIST OF SYMBOLS

β: Long-term average harvesting rate
pB: Transition probability from BAD to BAD harvesting state
pG: Transition probability from GOOD to GOOD harvesting state
DB: Average duration of BAD harvesting period
DG: Average duration of GOOD harvesting period
γ: Ratio of GOOD to BAD average harvesting durations, DG/DB
emax: Battery capacity
ρ: Relative battery capacity, emax/DB
η: Transmission probability, as induced by threshold policy µ
ηG: Transmission probability of the BP in the GOOD harvesting state
θ: Overflow Avoidance (OA) parameter ∈ {0, 1}

Figure 1. Block diagram of a wireless EHS (blocks: sensing apparatus with observation input, EHD with ambient energy input, energy storage, controller, and RF block with data output).

Figure 2. Markov chain model of the EH process: states GOOD (Bk = 1) and BAD (Bk = 0), with self-transition probabilities pG and pB, and cross-transition probabilities 1 − pG and 1 − pB.

A processing unit, e.g., a micro-controller, manages the energy consumption of the EHS. In this paper, we are only concerned with the energy expenditure associated with the transmission of sensing data by the RF block. We consider a slotted-time system, where slot k is the time interval [k, k + 1), k ∈ Z+. At each time instant k, the EHS has a new data packet to send to the RX, over the interval [k, k + δ), where δ ∈ (0, 1] is the duty cycle of the sensor node. If the packet is not sent, then it is discarded. The EHS battery is modeled by a buffer. As in previous work [7], [10], we assume that each position in the buffer can hold one energy quantum and that the transmission of one data packet requires the expenditure of one energy quantum. The maximum number of quanta that can be stored is emax and the set of possible energy levels is denoted by E = {0, 1, . . . , emax}. At time k + 1, k ∈ Z+, the amount of energy in the buffer is

Ek+1 = min{ [Ek − Qk]^+ + Bk, emax },   (1)

where {Bk } is the energy arrival process and {Qk } is the action process. Qk = 1 if the current data packet is transmitted, which results in the expenditure of one energy quantum, and Qk = 0 otherwise. Bk models the randomness in the energy harvested by the EHD in slot k. We assume that Bk ∈ {0, 1}, i.e., either one energy quantum is harvested, or no energy is harvested at all, and that the energy harvested in time-slot k, Bk , can be used only from the next time-slot k + 1. We assume that the underlying EH process {Ak } is a two-state Markov chain, with state space {G, B}, where G and B denote the GOOD and BAD harvesting states, respectively, as shown in Fig. 2. Then, Bk = 1 in the GOOD EH state (Ak = G), and Bk = 0 in the BAD EH state (Ak = B). We denote the transition probabilities of {Ak }

as pG ≜ Pr(Ak = G | Ak−1 = G) (transition from G to G) and pB ≜ Pr(Ak = B | Ak−1 = B) (transition from B to B). Moreover, we denote the average durations of the GOOD and BAD EH periods as DG = (1 − pG)^{−1} and DB = (1 − pB)^{−1}, respectively, and their ratio as γ = DG/DB. The steady-state distribution of {Ak} is

πA(G) = p̄B / (p̄B + p̄G),   (2)
πA(B) = p̄G / (p̄B + p̄G).   (3)

Since one energy quantum is harvested in every GOOD time-slot, the average EH rate is

β ≜ lim_{K→∞} (1/K) E[ Σ_{k=0}^{K−1} Bk ] = πA(G),   (4)

with β ∈ (0, 1). Note that the i.i.d. Bernoulli model considered in [6] follows as a special case of the model proposed in this paper, when pG + pB = 1. We now formally define energy outage and overflow.

Definition 1 In slot k, energy outage occurs if Ek = 0, and energy overflow occurs if (Ek = emax) ∩ (Bk = 1) ∩ (Qk = 0).

Under energy outage, no transmissions can be performed, hence Qk = 0. Energy overflow occurs when a harvested energy quantum cannot be stored and is lost, due to a fully charged battery. At time k, the EHS state is defined as Sk = (Ek, Vk, Ak−1), where Vk is the importance value of the current data packet. We assume that Vk ∈ R+ is a continuous random variable with probability density function (pdf) fV(v), v ≥ 0, and that {Vk} are i.i.d. Note that, at time k, the EHS controller observes the past energy arrival Bk−1, hence it knows Ak−1 but not the current EH state Ak, which does not appear in the state vector Sk.
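To make the preceding definitions concrete, the following minimal Python sketch simulates the battery recursion (1) driven by the two-state EH chain of Fig. 2, under a simple per-state threshold policy. It is only an illustration of the model, not part of the analysis: the exponential importance distribution (used later in Section V), the parameter values and all function names are our own assumptions.

```python
import numpy as np

def simulate_ehs(p_G, p_B, e_max, eta, K=200_000, seed=0):
    """Monte Carlo sketch of the battery recursion (1) driven by the two-state
    Markov EH process of Fig. 2. `eta[a]` is the average transmission probability
    used when the last observed EH state is a in {'G', 'B'} (a simple stand-in
    for a full policy mu(1; e, v, a)). Returns the empirical long-term reward
    and the outage/overflow frequencies."""
    rng = np.random.default_rng(seed)
    E, A_prev = e_max, 'B'                 # initial battery level and EH state
    reward = outage = overflow = 0
    for _ in range(K):
        # current EH state: stay in G w.p. p_G, stay in B w.p. p_B
        if A_prev == 'G':
            A = 'G' if rng.random() < p_G else 'B'
        else:
            A = 'B' if rng.random() < p_B else 'G'
        B = 1 if A == 'G' else 0           # one quantum harvested only in GOOD slots
        V = rng.exponential(1.0)           # importance value (exponential, unit mean)
        # threshold policy: send the fraction eta[A_prev] of most important packets
        Q = 1 if (E > 0 and V >= -np.log(eta[A_prev])) else 0
        reward += Q * V
        outage += (E == 0)
        overflow += (E == e_max and B == 1 and Q == 0)
        E = min(max(E - Q, 0) + B, e_max)  # battery update, cf. (1)
        A_prev = A
    return reward / K, outage / K, overflow / K

# example: beta = 0.1, D_B = 10 (p_B = 0.9), gamma = 1/9 (p_G = 0.1), e_max = 10,
# with "balanced" probabilities eta_G = 0.55, eta_B = gamma*(1 - eta_G) = 0.05
print(simulate_ehs(p_G=0.1, p_B=0.9, e_max=10, eta={'G': 0.55, 'B': 0.05}))
```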

III. OPTIMIZATION PROBLEM AND POLICY DEFINITIONS

A. General optimization problem

Given Sk = (e, v, a) ∈ E × R+ × {G, B}, the policy µ implemented by the controller in Fig. 1 is defined by the probability µ(1; e, v, a) of transmitting the data packet in slot k. The respective probability of discarding the data packet is µ(0; e, v, a) = 1 − µ(1; e, v, a). (For the sake of maximizing a long-term average reward function of the state and action processes, it is sufficient to consider only stationary policies depending on the present state [11].) Given an initial state S0, the long-term average reward under policy µ is defined as

G(µ, S0) = liminf_{K→∞} (1/K) E[ Σ_{k=0}^{K−1} Qk Vk | S0 ],   (5)

where the expectation is taken with respect to {Bk, Qk, Vk} and Qk is drawn according to µ. G(µ, S0) represents the average importance per time-slot of the transmitted data. The optimization problem at hand is to determine the optimal µ* such that

µ* = arg max_µ G(µ, S0).   (6)

B. Preliminary Results

In the following lemma, which is stated without proof, the threshold structure (with respect to the data importance value) of the optimal policy µ* is established.

Lemma 1 For the optimal policy µ*, ∀(e, a) ∈ E × {G, B}, there exists a threshold vth*(e, a) such that

µ*(1; e, v, a) = { 1, v ≥ vth*(e, a);  0, v < vth*(e, a) }.   (7)

As a result of Lemma 1, we henceforth consider only the subset of policies with threshold structure. As in [6], we introduce the function g(x), x ∈ [0, 1], defined as

g(x) = EV[ χ(V ≥ FV^{−1}(x)) V ] = ∫_{FV^{−1}(x)}^{+∞} ν fV(ν) dν,   (8)

where v = FV^{−1}(x) denotes the inverse of the complementary cumulative distribution function of the importance value process, FV(v). The function g(x) represents the average reward accrued by transmitting only the data packets with importance value above the threshold v = FV^{−1}(x), corresponding to an average transmission probability x. The properties of g(x) are stated in the following lemma.

Lemma 2 The function g(x) is strictly increasing and strictly concave in x ∈ (0, 1).

According to the definition of g(x), if η(e, a) = EV[µ(1; e, V, a)] denotes the average transmission probability in state (e, a), then vth(e, a) = FV^{−1}(η(e, a)) and the average reward accrued in state (e, a) is g(η(e, a)). The mapping between µ, vth(·) and η(·) is one-to-one, and the transition probabilities of the time-homogeneous Markov chain {(Ek, Ak−1)} are governed by η. As a result, from now on we refer to a threshold policy µ in terms of the corresponding average transmission probability η. We denote the state space of the Markov chain {(Ek, Ak−1)} as S = E × {G, B} \ {(0, G)}. Note that (0, G) is not in S, since Ak−1 = G implies Bk−1 = 1, hence Ek > 0.
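As a quick illustration of (8), the sketch below evaluates g(x) by numerical quadrature for a generic importance pdf and cross-checks it against the closed form x(1 − ln x), which holds for exponentially distributed importance with unit mean (the example used later in Section V, eq. (26)); the helper names are ours.

```python
import numpy as np
from scipy.integrate import quad

def g_numeric(x, f_V, Fbar_inv):
    """g(x) from (8): average reward when only packets with importance above the
    threshold v_th = F_V^{-1}(x) are sent, i.e. with average tx probability x."""
    v_th = Fbar_inv(x)
    val, _ = quad(lambda v: v * f_V(v), v_th, np.inf)
    return val

# exponential importance with unit mean: f_V(v) = e^{-v}, F_V^{-1}(x) = -ln x
f_V = lambda v: np.exp(-v)
Fbar_inv = lambda x: -np.log(x)
for x in (0.05, 0.1, 0.5, 0.9):
    # numerical value vs. closed form x(1 - ln x), cf. (26)
    print(x, g_numeric(x, f_V, Fbar_inv), x * (1 - np.log(x)))
```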

C. Policy Definitions

Definition 2 The set U of admissible policies is

U ≡ {η : η(0, B) = 0, η(emax, a) ∈ (0, 1], η(e, a) ∈ (0, 1), e = 1, . . . , emax − 1, ∀a ∈ {G, B}}.

The Markov chain {(Ek, Ak−1)} induced by a policy η ∈ U has a unique closed communicating class, hence there exists a unique steady-state distribution, independent of the initial state, denoted by πη(e, a), e ∈ E, a ∈ {G, B} [12]. The long-term reward defined in (5) then becomes

G(η) = lim_{K→∞} (1/K) E[ Σ_{k=0}^{K−1} χ(Vk ≥ FV^{−1}(η(Ek, Ak−1))) Vk | S0 ]
     = Σ_{e=1}^{emax} [ πη(e, G) g(η(e, G)) + πη(e, B) g(η(e, B)) ],   (9)

and is independent of the initial state S0. The optimization problem (6) over the class of admissible policies is equivalent to

η* = arg max_{η∈U} G(η).   (10)

The optimal policy η* can be found numerically using the Policy Iteration algorithm for average cost per stage, infinite horizon problems [4], [5]. The focus of this paper is on identifying sub-optimal policies, which trade off performance with complexity. To this end, we define the set of Balanced Policies (BP), which will be studied in detail in Section IV.

Definition 3 A Balanced Policy (BP) η ∈ U is defined as any policy such that

η(e, G) = { 0, e = 0;  ηG, e ∈ {1, 2, . . . , emax − 1};  θ + θ̄ηG, e = emax },   (11)
η(e, B) = { 0, e = 0;  ηB, e ∈ {1, 2, . . . , emax − 1};  θ + θ̄ηB, e = emax },   (12)

where ηG ∈ (0, 1) and ηB ∈ (0, 1) are the transmission probabilities in the GOOD and BAD harvesting states, respectively, such that

βηG + β̄ηB = β,   (13)

and θ ∈ {0, 1} is the Overflow Avoidance (OA) parameter. The set of BPs is denoted by UBP ⊂ U. For a given β, a BP is fully characterized by the pair (ηG, θ), where ηG ∈ ((1 − γ^{−1})^+, 1), which guarantees ηG, ηB ∈ (0, 1), and θ ∈ {0, 1}. In the following, we refer to a BP in terms of the pair (ηG, θ), which determines the corresponding BP η ∈ UBP via (11), (12) and (13).

As its name suggests, a BP results in a "balanced" operation of the EHD. In fact, since πA(G) = β and πA(B) = β̄, (13) is equivalent to πA(G)ηG + πA(B)ηB = β, i.e., under a BP, the average energy consumption rate equals the average EH rate, if the impact of energy outage and overflow is neglected. An alternative interpretation of its operation is that the expected energy recharge over a GOOD harvesting period, DG η̄G, equals the expected energy discharge over a BAD harvesting period, DB ηB, resulting in an equilibrium between the recharge and discharge phases. The OA technique (θ = 1) guarantees that no energy is lost due to overflow, by forcing the EHS to transmit when the battery is fully charged. Besides its low realization complexity, the BP is also of interest in scenarios where the exact energy level of the battery is unknown, or its measurement is imprecise [13], [14]. Note that OA introduces only a mild dependence on the energy level, since the EHS controller only needs to know when the battery is fully charged. In Section V, it will be demonstrated via numerical results that the performance degradation with respect to the optimal policy, which also takes into account the current energy level in the decision process, is small.
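Definition 3 translates directly into a small lookup table over the states (e, a). The sketch below builds it for a given pair (ηG, θ), deriving ηB from the balance condition (13); the function name and the numerical values are illustrative only.

```python
def balanced_policy(eta_G, theta, beta, e_max):
    """Sketch of Definition 3: per-state transmission probabilities eta(e, a) of a
    BP (eta_G, theta). eta_B follows from the balance condition (13), i.e.
    eta_B = gamma * (1 - eta_G) with gamma = beta / (1 - beta)."""
    gamma = beta / (1.0 - beta)
    eta_B = gamma * (1.0 - eta_G)
    lo = max(0.0, 1.0 - 1.0 / gamma)        # (1 - gamma^{-1})^+ keeps eta_B in (0, 1)
    assert lo < eta_G < 1.0, "eta_G must lie in ((1 - 1/gamma)^+, 1)"
    eta = {}
    for a, eta_a in (('G', eta_G), ('B', eta_B)):
        eta[(0, a)] = 0.0                               # empty battery: no transmission
        for e in range(1, e_max):
            eta[(e, a)] = eta_a                         # interior battery levels, (11)-(12)
        eta[(e_max, a)] = theta + (1 - theta) * eta_a   # full battery: OA transmits w.p. 1
    return eta, eta_B

eta, eta_B = balanced_policy(eta_G=0.55, theta=1, beta=0.1, e_max=10)
print(eta_B, eta[(10, 'G')], eta[(10, 'B')])            # 0.05, 1.0, 1.0
```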

IV. PERFORMANCE ANALYSIS OF THE BP

This section is devoted to the performance analysis of the BP (Definition 3). Initially, we determine the steady-state distribution of {(Ek, Ak−1)} under a generic policy η ∈ U (Lemma 3). Then, in Lemma 4, we evaluate the long-term reward of the BP. Finally, in Lemma 5, we characterize the optimal BP in the asymptotic regime DB → ∞, emax → ∞, where the ratio emax/DB is kept fixed.

Lemma 3 For any η ∈ U,

[πη(e, G), πη(e, B)]^T = Q(e) πη(0, B),  e ∈ {1, 2, . . . , emax},   (14)
πη(0, B) = ( 1 + [1, 1] × Σ_{e=1}^{emax} Q(e) )^{−1},   (15)

where Q(e) is defined recursively as

Q(0) = [0, 1]^T,   (16)
Q(e) = K(e) × Q(e − 1),  e ∈ {1, 2, . . . , emax}.   (17)

K(e) is defined as

K(e) = [ KGG(e), KGB(e) ; KBG(e), KBB(e) ],   (18)

with

KGG(e) = − η̄(e − 1, G)(1 − pB − pG) / [pB η̄(e, G) + p̄G η(e, G)],
KGB(e) = p̄B / [pB η̄(e, G) + p̄G η(e, G)],
KBG(e) = − p̄G η̄(e − 1, G) / ( η(e, B) [pB η̄(e, G) + p̄G η(e, G)] ),
KBB(e) = p̄B η̄(e, G) / ( η(e, B) [pB η̄(e, G) + p̄G η(e, G)] ) + η(e − 1, B)/η(e, B),

for e ∈ {1, 2, . . . , emax − 1}, and

KGG(emax) = − η̄(emax − 1, G) [p̄G p̄B − pG pB η(emax, B)] / ( p̄G [pB η(emax, B) + p̄B η(emax, G)] ),
KGB(emax) = γ [1 − pB η̄(emax − 1, B) η̄(emax, B)] / [pB η(emax, B) + p̄B η(emax, G)],
KBG(emax) = − η̄(emax − 1, G) (1 − pG η̄(emax, G)) / [pB η(emax, B) + p̄B η(emax, G)],
KBB(emax) = [η(emax − 1, B) + p̄B η̄(emax − 1, B) η̄(emax, G)] / [pB η(emax, B) + p̄B η(emax, G)].

The performance of a BP (ηG, θ) is fully characterized by β, DB and by the ratio ρ = emax/DB, which we name the relative battery capacity and which captures the ability of the battery to absorb the fluctuations in the EH process. Over a BAD EH phase, with average duration DB, the battery is being discharged. Clearly, the longer DB, the deeper the battery discharge and the higher the likelihood of an energy outage event. Therefore, batteries with smaller capacity emax, relative to DB, are more prone to being fully discharged over a BAD EH phase, and are more likely to suffer from energy outage. The parameter ρ captures this behavior. The other system parameters can be expressed as a function of β, DB and ρ as emax = ρDB, γ = β/β̄, pB = 1 − DB^{−1}, pG = 1 − γ^{−1}DB^{−1} and, from (13), ηB = γη̄G. We denote the long-term reward as a function of the BP (ηG, θ) and of the parameters ρ, DB as G(ηG, θ; ρ, DB) (the dependence on β is implied for notational brevity). We have the following lemma, which characterizes the long-term reward of the BP (ηG, θ), and its outage and overflow probabilities.

Lemma 4 For the BP (ηG, θ), we have

G(ηG, θ; ρ, DB) = βg(ηG) + (β̄ − πη(0, B)) g(ηB) + θγπη(0, B) (g(1) − g(ηG)),   (19)
Prη(outage) = πη(0, B),   (20)
Prη(overflow) = θ̄ ηB πη(0, B),   (21)

where ηB = γη̄G and

πη(0, B) = β̄ [ (DB − 1)ηB + ηG ] / [ ρDB + (DB − 1)ηB − θη̄G ].   (22)

Proof: Substituting (11) and (12) in (14) and (15) yields the steady-state distribution under the BP. Then, (19) results from (9), and (20), (21) from Definition 1.

Remark: In (19), the term βg(ηG) accounts for the average reward accrued in the GOOD harvesting state; the term (β̄ − πη(0, B))g(ηB) is the reward accrued in the BAD harvesting state, where πη(0, B) accounts for a performance loss, due to energy outage events. The term θγπη(0, B)(g(1) − g(ηG)) represents the gain obtained by OA, i.e., by forcing the EHS to transmit when the battery is fully charged.

Given DB, ρ, θ, the optimization problem over the class of BPs becomes ηG*(θ; ρ, DB) = arg max_{ηG} G(ηG, θ; ρ, DB).
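For concreteness, the closed forms (19)-(22) are straightforward to evaluate numerically. The sketch below does so using the exponential-importance g(x) of (26); the parameter values are illustrative and the function names are ours.

```python
import numpy as np

def g(x):
    """g(x) for exponentially distributed importance with unit mean, cf. (26)."""
    return x * (1.0 - np.log(x)) if x > 0 else 0.0

def bp_performance(eta_G, theta, rho, D_B, beta):
    """Evaluate the closed forms (19)-(22) of Lemma 4 for the BP (eta_G, theta)."""
    gamma = beta / (1.0 - beta)
    eta_B = gamma * (1.0 - eta_G)                      # balance condition (13)
    # steady-state probability of the (empty battery, BAD) state, cf. (22)
    pi_0B = (1.0 - beta) * ((D_B - 1) * eta_B + eta_G) \
            / (rho * D_B + (D_B - 1) * eta_B - theta * (1.0 - eta_G))
    reward = (beta * g(eta_G) + ((1.0 - beta) - pi_0B) * g(eta_B)
              + theta * gamma * pi_0B * (g(1.0) - g(eta_G)))      # (19)
    return reward, pi_0B, (1 - theta) * eta_B * pi_0B             # (19), (20), (21)

# example: beta = 0.1, rho = 0.5, D_B = 20 (so e_max = rho * D_B = 10)
print(bp_performance(eta_G=0.55, theta=1, rho=0.5, D_B=20, beta=0.1))
```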

Figure 3. Long-term reward as a function of emax = ρDB, β = 0.1, for ρ/β ∈ {0.2, 1, 5} (curves: OP, OBP, OBP-OA, NABP, GP, and the upper bound g(β)). The asymptotic long-term reward for DB → ∞ is plotted with bold markers on the right side of each subplot.

Figure 4. Percentage loss with respect to the OP, as a function of emax = ρDB, β = 0.1, for ρ/β ∈ {0.2, 1, 5} (curves: OBP, OBP-OA, NABP, GP).

We focus on the regime DB → ∞, for fixed ρ, β, θ, for the following reasons. First, we are interested in the regime of correlated energy arrivals, of which DB → ∞ represents an extreme scenario. Second, the asymptotic regime provides a simple analytical expression of the long-term reward, which lends itself to further analysis. Third, as will be verified in Section V via numerical results, the optimal policy in the asymptotic regime performs well also for small values of DB, since the long-term reward depends only mildly on the absolute value of DB. From Lemma 4, the asymptotic long-term reward is found to be

G∞(ηG, θ; ρ) = lim_{DB→∞} G(ηG, θ; ρ, DB)
             = βg(ηG) + β̄ [ρ/(ρ + ηB)] g(ηB) + θβ [ηB/(ρ + ηB)] (g(1) − g(ηG)).   (23)

We denote the asymptotically optimal BP with OA parameter θ ∈ {0, 1} as ηG*(θ; ρ) = arg max_{ηG} G∞(ηG, θ; ρ). In the following lemma, we characterize the optimal BP in the asymptotic regime DB → ∞. Its proof is provided in the Appendix.

Lemma 5 ηG*(θ; ρ) is the unique solution of L(ηG, θ; ρ) = 0 over ηG ∈ (β, 1), where

L(ηG, θ; ρ) = [ρ + ηB]^2 g′(ηG) − ρ[ρ + ηB] g′(ηB) + ρg(ηB) − θ[ρ + ηB] ηB g′(ηG) − θγρ [g(1) − g(ηG)].   (24)

Since L(ηG, θ; ρ) is a decreasing function of ηG, with

L(ηG, θ; ρ) > 0 for ηG < ηG*(θ; ρ), and L(ηG, θ; ρ) < 0 for ηG > ηG*(θ; ρ),   (25)

and ηG*(θ; ρ) ∈ (β, 1), the optimal BP ηG*(θ; ρ) can be easily determined numerically from (24) using the bisection method [15].
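A minimal bisection sketch of this procedure is given below. It assumes the exponential-importance example of Section V, for which g and g′ have closed forms; the function names, the bracketing interval endpoints and the tolerance are our own choices.

```python
import numpy as np

# exponential-importance example of Section V: g(x) = x(1 - ln x), g'(x) = -ln x
g  = lambda x: x * (1.0 - np.log(x))
dg = lambda x: -np.log(x)

def L(eta_G, theta, rho, beta):
    """The function L(eta_G, theta; rho) of (24), with eta_B = gamma * (1 - eta_G)."""
    gamma = beta / (1.0 - beta)
    eta_B = gamma * (1.0 - eta_G)
    return ((rho + eta_B) ** 2 * dg(eta_G)
            - rho * (rho + eta_B) * dg(eta_B)
            + rho * g(eta_B)
            - theta * (rho + eta_B) * eta_B * dg(eta_G)
            - theta * gamma * rho * (g(1.0) - g(eta_G)))

def optimal_bp(theta, rho, beta, tol=1e-9):
    """Bisection over (beta, 1): by (25), L is decreasing and changes sign at eta_G*."""
    lo, hi = beta + 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if L(mid, theta, rho, beta) > 0:
            lo = mid          # still to the left of the root
        else:
            hi = mid
    return 0.5 * (lo + hi)

for rho in (0.02, 0.1, 0.5):
    print(rho, optimal_bp(theta=0, rho=rho, beta=0.1),   # without OA
               optimal_bp(theta=1, rho=rho, beta=0.1))   # with OA
```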

V. NUMERICAL RESULTS

In this section, we compare the performance of the following policies, for indicative values of the parameters β, ρ, DB.
• Optimal policy (OP), obtained via the Policy Iteration algorithm [4], [5].
• Optimal BP (OBP), obtained as the solution of L(ηG, θ = 0; ρ) = 0 from (24) using the bisection method.
• Optimal BP with OA (OBP-OA), obtained as the solution of L(ηG, θ = 1; ρ) = 0 from (24) using the bisection method.
• Non-Adaptive BP (NABP), obtained from (11) and (12) with ηG = ηB = β, θ = 0. The NABP does not adapt the transmission probability to the EH state.
• Greedy Policy (GP), obtained from (11) and (12) with ηG = 1, ηB = 0, θ = 0. The GP immediately uses the energy harvested in the previous time-slot, so a battery with small capacity suffices (emax = 1).

As an example, we let fV(v) = e^{−v}, v ≥ 0, i.e., the data importance is exponentially distributed with unit mean. Then, FV(v) = Pr(Vk ≥ v) = e^{−v}, FV^{−1}(x) = − ln(x) and, from (8),

g(x) = EV[ χ(V ≥ FV^{−1}(x)) V ] = x(1 − ln x),   (26)

with derivative g′(x) = − ln x. Unless otherwise stated, we use β = 0.1. In Fig. 3, we plot the long-term reward of the considered policies as a function of emax = ρDB, for ρ ∈ {0.02, 0.1, 0.5}. Moreover, we plot the constant g(β), which, as shown in [6], is an upper bound to the long-term reward, achievable when emax → ∞ for fixed DB, β. The asymptotic long-term reward when DB → ∞ is indicated with bold markers on the right side of each curve. In Fig. 4, we plot the corresponding performance percentage losses with respect to the OP.

Figure 5. Long-term reward as a function of ρ, in the asymptotic regime DB → ∞, β = 0.1 (curves: OBP, OBP-OA, NABP, GP, and the upper bound g(β)).

Figure 6. Transmission probability in the GOOD EH state, ηG, for the BPs considered, as a function of ρ, in the asymptotic regime DB → ∞, β = 0.1.

It is apparent that, for all policies, the long-term reward quickly approaches its asymptotic value: e.g., for emax = ρDB > 5, the curves level out. This indicates that the parameter DB only mildly affects the system performance for emax > 5, which justifies the asymptotic analysis of Section IV. The results confirm that ρ is a key parameter of the system, which strongly affects the performance. By comparing the different policies, the OBP incurs only a small performance degradation with respect to the OP, within 10%, for all values of ρ considered. By performing OA (OBP-OA), an even smaller degradation is incurred, within 5% of the optimal policy, for emax > 5. The NABP, which does not adapt to the EH state and transmits with constant probability β, approaches the OBP for large values of ρ, but incurs a significant performance degradation for small values of ρ, due to the erratic nature of the EH process.

In Fig. 5, we plot the asymptotic (DB → ∞) long-term reward as a function of ρ and, in Fig. 6, the respective transmission probability in the GOOD EH state, ηG. Fig. 5 shows that the best performance is attained by OBP-OA, followed by OBP. NABP incurs a performance degradation with respect to OBP-OA and OBP, which decreases as ρ increases. All the policies considered, except GP, approach the upper bound g(β) for large values of ρ. We observe that the long-term reward is an increasing function of ρ. To explain this behavior, note that, for large ρ, the battery capacity is large with respect to the duration of a BAD EH period, hence it is capable of absorbing the fluctuations in the EH process, thus being seldom affected by energy outage events. Conversely, for small ρ, the energy storage capability is small, hence the battery is quickly discharged over a BAD EH period. As a consequence, the EHS experiences long outage periods, which impair its performance.

From Fig. 6, it is seen that, for the OBP and OBP-OA, the transmission probability in the GOOD harvesting state ηG decreases with ρ, from 1 for ρ = 0 to β for ρ → ∞. In fact, when the battery capacity is small, the harvested energy should be immediately used to avoid energy overflow, which explains the optimality of the GP. Conversely, if the capacity of the battery is large with respect to the duration of a BAD

harvesting period (ρ ≫ 1), a more stable operation of the EHS is enabled, and energy outage events are seldom experienced. In this case, a non-adaptive policy (NABP) is optimal. From Fig. 6, we can also see that the OBP is more aggressive than the OBP-OA for all values of ρ, so as to minimize the impact of energy overflow by forcing the battery to not fully charge.

VI. CONCLUSIONS

We have analyzed the performance of a BP for an EHS with a time-correlated energy supply described by a two-state Markov chain. We provided closed-form expressions for the average long-term importance of the reported data, as well as an easy-to-solve equation which determines the optimal transmission probabilities in the GOOD and BAD harvesting states. Numerical results demonstrated that the BP performs very well, within 10% of the (numerically computed) optimal policy, at a fraction of the complexity, since it does not require knowledge of the energy level in the battery. A BP with OA reduces the gap even further, to within 5%. These results are encouraging for practical design, since they demonstrate that very good data reporting performance can be achieved with simple policies that adapt to the availability of ambient energy over time.

APPENDIX A: PROOFS OF THE LEMMAS IN SECTION IV

Proof of Lemma 3: Let e ∈ {1, 2, . . . , emax − 1}. By applying the stationary equation at (e − 1, B) and (e, G), we obtain

πη(e − 1, B) = πη(e − 1, B) pB η̄(e − 1, B) + πη(e − 1, G) p̄G η̄(e − 1, G) + πη(e, B) pB η(e, B) + πη(e, G) p̄G η(e, G),
πη(e, G) = πη(e − 1, B) p̄B η̄(e − 1, B) + πη(e − 1, G) pG η̄(e − 1, G) + πη(e, G) pG η(e, G) + πη(e, B) p̄B η(e, B).   (27)

By solving the above system of equations with respect to [πη(e, G), πη(e, B)]^T, we obtain

[πη(e, G), πη(e, B)]^T = K(e) × [πη(e − 1, G), πη(e − 1, B)]^T,   (28)

where K(e) is defined in (18). Since πη(0, G) = 0, we have

[πη(0, G), πη(0, B)]^T = Q(0) πη(0, B),   (29)

where Q(0) = [0, 1]^T. Moreover, for e = 1, . . . , emax − 1, by induction on e we have

[πη(e, G), πη(e, B)]^T = Q(e) πη(0, B),   (30)

where Q(e) is defined recursively in Lemma 3. The stationary equation at (emax − 1, B) is given by (27) for e = emax. By applying the stationary equation at (emax, G), we obtain

πη(emax, G) = πη(emax − 1, B) p̄B η̄(emax − 1, B) + πη(emax − 1, G) pG η̄(emax − 1, G) + πη(emax, B) p̄B + πη(emax, G) pG.   (31)

By solving the system of equations (27)-(31) with respect to [πη(emax, G), πη(emax, B)]^T, we obtain

[πη(emax, G), πη(emax, B)]^T = K(emax) × [πη(emax − 1, G), πη(emax − 1, B)]^T = K(emax) × Q(emax − 1) πη(0, B).   (32)

Finally, the expression for πη(0, B) is obtained by the law of total probability, i.e., Σ_{e=0}^{emax} [πη(e, B) + πη(e, G)] = 1.
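As a numerical sanity check of this derivation (not part of the paper's analysis), one can also build the full transition matrix of {(Ek, Ak−1)} for a given policy η and solve πP = π directly, then compare the result with the recursion (14)-(18). The brute-force sketch below, with illustrative parameter values, does exactly that.

```python
import numpy as np

def stationary_bruteforce(eta, p_G, p_B, e_max):
    """Build the transition matrix of {(E_k, A_{k-1})} induced by the policy
    eta(e, a) and return its stationary distribution, for comparison with the
    recursive construction (14)-(18) of Lemma 3."""
    states = [(e, a) for e in range(e_max + 1) for a in ('G', 'B')
              if not (e == 0 and a == 'G')]             # (0, G) is not reachable
    idx = {s: i for i, s in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for (e, a_prev) in states:
        for q, pq in ((1, eta[(e, a_prev)]), (0, 1.0 - eta[(e, a_prev)])):
            for a, pa in (('G', p_G if a_prev == 'G' else 1.0 - p_B),
                          ('B', 1.0 - p_G if a_prev == 'G' else p_B)):
                b = 1 if a == 'G' else 0
                e_next = min(max(e - q, 0) + b, e_max)  # battery update (1)
                P[idx[(e, a_prev)], idx[(e_next, a)]] += pq * pa
    w, v = np.linalg.eig(P.T)                           # left eigenvector for eigenvalue 1
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    pi /= pi.sum()
    return {s: pi[i] for i, s in enumerate(states)}

# example: a BP with eta_G = 0.55, eta_B = 0.05, theta = 1, e_max = 10, D_B = 10
e_max, eta_G, eta_B = 10, 0.55, 0.05
eta = {(e, a): (0.0 if e == 0 else 1.0 if e == e_max else (eta_G if a == 'G' else eta_B))
       for e in range(e_max + 1) for a in ('G', 'B')}
print(stationary_bruteforce(eta, p_G=0.1, p_B=0.9, e_max=e_max)[(0, 'B')])  # compare with (22)
```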

Proof of Lemma 5: The expressions of the asymptotic long-term reward and of the outage and overflow probabilities are obtained by simply letting DB → ∞ in Lemma 4. After algebraic manipulation, we have that the derivative of G∞(ηG, θ; ρ) with respect to ηG is positive if and only if

dG∞(ηG, θ; ρ)/dηG ∝ L(ηG, θ; ρ) > 0,   (33)

where L(ηG, θ; ρ) is given in (24). Moreover,

dL(ηG, θ; ρ)/dηG ∝ −2θ̄γ g′(ηG) + θ̄[ρ + ηB] g″(ηG) + γρ g″(ηB) + ρθ g″(ηG) < 0.   (34)

Therefore, L(ηG, θ; ρ) is a decreasing function of ηG, with

L(1, θ; ρ) = ρ^2 [g′(1) − lim_{x→0} g′(x)] − θγρ g(1) < 0,   (35)
L(β, θ; ρ) = θ̄[ρ + β]β g′(β) + θ̄ρ g(β) + θρ(1 + γ)(g(β) − βg(1)) > 0.   (36)

We conclude that there exists a unique ηG ∈ (β, 1) which maximizes G∞(ηG, θ; ρ). This is obtained as the unique solution of L(ηG, θ; ρ) = 0, proving the lemma.

REFERENCES

[1] J. A. Paradiso and T. Starner, "Energy scavenging for mobile and wireless electronics," IEEE Pervasive Computing, vol. 4, pp. 18–27, Jan. 2005.
[2] D. Niyato, E. Hossain, M. Rashid, and V. Bhargava, "Wireless sensor networks with energy harvesting technologies: a game-theoretic approach to optimal energy management," IEEE Wireless Communications, vol. 14, no. 4, pp. 90–96, Aug. 2007.
[3] D. Anthony, P. Bennett, M. C. Vuran, M. B. Dwyer, S. Elbaum, A. Lacy, M. Engels, and W. Wehtje, "Sensing through the continent: towards monitoring migratory birds using cellular sensor networks," in Proceedings of the 11th International Conference on Information Processing in Sensor Networks (IPSN), vol. 12, Apr. 2012, pp. 329–340.
[4] R. Howard, Dynamic Programming and Markov Processes, 1st ed. The MIT Press, 1960.
[5] D. Bertsekas, Dynamic Programming and Optimal Control. Athena Scientific, Belmont, Massachusetts, 2005.
[6] N. Michelusi, K. Stamatiou, and M. Zorzi, "On optimal transmission policies for energy harvesting devices," in Information Theory and Applications Workshop (ITA), Feb. 2012, pp. 249–254.
[7] N. Jaggi, K. Kar, and A. Krishnamurthy, "Rechargeable sensor activation under temporally correlated events," Springer Wireless Networks (WINET), vol. 15, pp. 619–635, July 2009.
[8] M. Gatzianas, L. Georgiadis, and L. Tassiulas, "Control of wireless networks with rechargeable batteries," IEEE Trans. Wireless Commun., vol. 9, pp. 581–593, Feb. 2010.
[9] V. Sharma, U. Mukherji, V. Joseph, and S. Gupta, "Optimal energy management policies for energy harvesting sensor nodes," IEEE Trans. Wireless Commun., vol. 9, pp. 1326–1336, Apr. 2010.
[10] A. Seyedi and B. Sikdar, "Energy efficient transmission strategies for body sensor networks with energy harvesting," IEEE Trans. Commun., vol. 58, pp. 2116–2126, July 2010.
[11] K. W. Ross, "Randomized and past-dependent policies for Markov decision processes with multiple constraints," Operations Research, vol. 37, pp. 474–477, June 1989.
[12] D. A. Levin, Y. Peres, and E. L. Wilmer, Markov Chains and Mixing Times. American Mathematical Society, 2006.
[13] N. Michelusi, K. Stamatiou, L. Badia, and M. Zorzi, "Operation policies for energy harvesting devices with imperfect state-of-charge knowledge," in First IEEE International Workshop on Energy Harvesting for Communications, June 2012.
[14] N. Michelusi, L. Badia, R. Carli, K. Stamatiou, and M. Zorzi, "Energy generation and state-of-charge knowledge in energy harvesting devices," in 8th International Wireless Communications and Mobile Computing Conference, Cyprus, Aug. 2012.
[15] R. L. Burden and J. D. Faires, Numerical Analysis, 9th ed. Cengage Learning, 2011.