2011 IEEE International Symposium on Information Theory Proceedings

Performance Bounds for Active Sequential Hypothesis Testing

Mohammad Naghshvar and Tara Javidi
Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA 92093 USA
{naghshvar, tjavidi}@ucsd.edu

Abstract—Consider a decision maker who is responsible for dynamically collecting observations so as to enhance his information about an underlying phenomenon of interest in a speedy manner while accounting for the cost of data collection. Due to the sequential nature of the problem, the decision maker relies on his current information state to adaptively (re-)evaluate the trade-off between the cost of various sensing actions and the precision of their outcomes. In this paper, using results in dynamic programming, a lower bound for the optimal total cost is established. Moreover, an upper bound is obtained using a heuristic policy for the dynamic selection of actions. Using the obtained bounds, the closed-loop (feedback) gain is shown to be at least logarithmic in the penalty associated with wrong declarations. Furthermore, the proposed heuristic is shown to achieve asymptotic optimality in many practically relevant problems, such as variable-length coding with feedback and noisy dynamic search.

I. INTRODUCTION

This paper considers a generalization of the classical sequential hypothesis testing problem. Suppose there are M hypotheses among which only one is true. A Bayesian decision maker is responsible for enhancing his information about the correct hypothesis in a speedy and sequential manner while accounting for the cost of the sample collection process, or sensing. In contrast to the classical sequential hypothesis testing problem, however, our decision maker can choose one of K available actions and hence exert some control over the collected samples' "information content." We refer to this generalization as the active sequential hypothesis testing problem.

The active sequential hypothesis testing problem naturally arises in a broad spectrum of applications in cognition, communications, design of experiments, and sensor management. For instance, consider the classic I-SPY game, in which a child is given an image and is asked to identify the location of an object against a crowded background: the child's gaze not only provides samples which successively enhance the child's belief about the object's location, but its focus on a particular segment also controls the "information content" of the collected samples. It is intuitive that an optimized Bayesian decision maker relies on his current belief to adaptively select the observation sample and (re-)evaluate the trade-off between observation precision and the cost of the various actions. Making this intuition precise is the topic of our study.


The most well-known instance of our problem is the case of binary hypothesis testing with passive sensing (M = 2, K = 1), first studied by Wald [1]. In this instance of the problem, the optimal action at any given time is given by a sequential probability ratio test (SPRT). There are numerous studies on generalizations to M > 2 (with K = 1) and on the performance of simple and practical heuristic tests such as the MSPRT [2]–[4]. In [5], we provided a first step towards characterizing the optimal solution to the active hypothesis testing problem by identifying sufficient conditions under which active sequential hypothesis testing reduces to the passive sensing case (K = 1). Furthermore, we specialized the results to the binary case with additive observation noise and hypothesis-driven sensing actions (i.e., each sensing action is specialized to discriminate the validity of one and only one of the hypotheses). It was shown that under a large class of additive observation noise models, active binary hypothesis-driven testing reduces to a passive sequential test in which the sensing action is restricted to the one known to result in the least "noisy" observation, where noisiness is measured in terms of a ratio of moment generating functions. In [6], we considered the active binary hypothesis testing problem (M = 2, general K) when the sufficient condition for reduction to passive testing does not necessarily hold. Using results in dynamic programming, an asymptotically optimal policy was provided.

This paper generalizes the results in [6] to the M-ary case. More precisely, general lower and upper bounds on the value function are provided that recover the bounds in [6] for the binary case M = 2. Unlike in the binary case of [6], however, the bounds may be loose, and the gap between the lower and the upper bound might grow with the total number of samples. Beyond the binary setup, we discuss important special cases where the bounds are tight and the proposed algorithm achieves asymptotic optimality.

The remainder of this paper is organized as follows. In Section II, we formulate the problem. Section III provides a dynamic programming formulation and characterizes an optimal policy. Sections IV and V provide lower and upper bounds on the value function and study the gap between these bounds. Finally, we conclude the paper and discuss future work in Section VI.

Notation: Let [x]^+ = max{x, 0}. A random variable is denoted by an upper case letter (e.g., X, Y, Z) and its realization by a lower case letter (e.g., x, y, z). Similarly, a random column vector and its realization are denoted by boldface symbols (e.g., X and x). For any set S, |S| denotes the cardinality of S. The Kullback–Leibler (KL) divergence between two probability density functions q(·) and q'(·) is denoted by D(q‖q'), where

\[
D(q \,\|\, q') = \int q(z) \log\frac{q(z)}{q'(z)}\, dz.
\]

II. PROBLEM FORMULATION

Here, we provide a precise formulation of our problem.

Problem (P) [Active Sequential Hypothesis Testing]: Let H_i, i = 1, 2, ..., M, denote M hypotheses of interest, among which only one holds true. Let θ be the random variable that takes the value θ = i on the event that H_i is true, i = 1, 2, ..., M. A is the set of all sensing actions and is assumed to be finite with |A| = K < ∞. Z is the observation space. For all a ∈ A, the observation kernel q_i^a(·) (on Z) is the probability density function of the observation Z when action a has been taken and H_i is true. We assume that the one-step cost of every sensing action is equal to 1. Let l denote the penalty for a wrong declaration of the true hypothesis, i.e., the penalty of selecting H_j, j ≠ i, when H_i is true. Let τ be the stopping time at which the decision maker retires. The objective is to find a sequence of sensing actions A(0), A(1), ..., A(τ−1), a stopping time τ, and a declaration rule d : Z^τ → {1, 2, ..., M} that together minimize the total cost

\[
\mathbb{E}\big[\tau + l\,\mathbf{1}_{\{d(Z^\tau)\neq\theta\}}\big], \tag{1}
\]

where the expectation is taken with respect to the initial belief as well as the distribution of the observation sequence.
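To make the ingredients of Problem (P) concrete, the following minimal sketch instantiates a toy instance with a finite observation alphabet; the sizes, the random kernels, and all names are our own illustrative choices, not from the paper:

```python
import numpy as np

# Hypothetical toy instance of Problem (P): M = 3 hypotheses, K = 2 sensing
# actions, and a finite observation alphabet of size Z standing in for the
# observation space. All numbers are illustrative.
M, K, Z = 3, 2, 50
rng = np.random.default_rng(0)

# q[a, i] is the pmf of the observation under action a when H_i is true.
q = rng.dirichlet(np.ones(Z), size=(K, M))

def kl(p, r):
    """KL divergence D(p||r) between two pmfs on a common finite alphabet."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / r[mask])))

# D[a, i, j] = D(q_i^a || q_j^a): how fast action a separates H_i from H_j.
D = np.array([[[kl(q[a, i], q[a, j]) for j in range(M)]
               for i in range(M)] for a in range(K)])
```

The array D collects exactly the divergences that drive the bounds in Sections IV and V.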

III. DYNAMIC PROGRAMMING FORMULATION

In this section, we first derive the corresponding dynamic programming (DP) equation for Problem (P). Then, from the DP solution, we characterize an optimal policy for Problem (P).

A. Dynamic Programming

The problem of active M-ary hypothesis testing is a Partially Observable Markov Decision Problem (POMDP) in which the state is static and the observations are noisy. It is known that any POMDP is equivalent to an MDP with a compact yet uncountable state space, for which the belief of the decision maker about the underlying state becomes an information state [7]. In our setup, thus, the information state at time t is nothing but a belief vector specified by the conditional probabilities of hypotheses H_1, H_2, ..., H_M being true given the initial belief and all previous observations. Accordingly, the information state space is defined as

\[
\mathcal{P}(\Theta) = \Big\{ \rho \in [0,1]^M : \sum_{i=1}^{M} \rho_i = 1 \Big\},
\]

where Θ is the σ-algebra generated by the random variable θ. In one sensing step, the evolution of the belief vector follows Bayes' rule and is given by Φ^a, a measurable function from P(Θ) × Z to P(Θ) for all a ∈ A:

\[
\Phi^a(\rho, z) = \Big( \rho_1 \frac{q_1^a(z)}{q_\rho^a(z)},\; \rho_2 \frac{q_2^a(z)}{q_\rho^a(z)},\; \ldots,\; \rho_M \frac{q_M^a(z)}{q_\rho^a(z)} \Big), \tag{2}
\]

where $q_\rho^a(z) = \sum_{i=1}^{M} \rho_i q_i^a(z)$, assuming $q_\rho^a(z) > 0$.

Definition. A Markov stationary deterministic policy is a mapping from the information state space P(Θ) to the action space A, based on which the sensing actions A(t), t = 0, 1, ..., τ−1, the stopping time τ, and the declaration rule d are selected. (The sequence of actions can be selected in a Markovian, stationary, and deterministic fashion without loss of optimality [8, Theorem 9.1].) From [8], there exists a policy which minimizes (1); it is referred to as an optimal policy and denoted by π*.

Fact 1 (Theorems 1, 4 in [9]). Let V* : P(Θ) → R_+ be a value function solving the following fixed point equation:

\[
V^*(\rho) = \min\Big\{ 1 + \min_{a\in\mathcal{A}} (T^a V^*)(\rho),\; \min_j\, l\,(1-\rho_j) \Big\}, \tag{3}
\]

where for any measurable function g : P(Θ) → R, the operator T^a, a ∈ A, is defined as

\[
(T^a g)(\rho) = \int g\big(\Phi^a(\rho, z)\big)\, q_\rho^a(z)\, dz. \tag{4}
\]

Then V*(ρ) is equal to the minimum cost in Problem (P) when the initial belief is ρ.

Note that (3) provides a natural characterization of an optimal policy π* for Problem (P): retire and declare H_i as the true hypothesis whenever V*(ρ) = l(1 − ρ_i), i = 1, 2, ..., M; otherwise, select action a* = arg min_{a∈A} (T^a V*)(ρ).

Fact 2. Suppose there exist a function $\underline{V} : \mathcal{P}(\Theta) \to \mathbb{R}_+$ and a policy π : P(Θ) → A for which

\[
\liminf_{t\to\infty} \mathbb{E}^{\pi}\big[\underline{V}(\rho(t))\big] = 0
\]

for all initial belief vectors ρ(0) ∈ P(Θ), where ρ(t) denotes the posterior belief at time t. If

\[
\underline{V}(\rho) \le \min\Big\{ 1 + \min_{a\in\mathcal{A}} (T^a \underline{V})(\rho),\; \min_j\, l\,(1-\rho_j) \Big\},
\]

then $\underline{V} \le V^*$. (The proof of Fact 2 follows along the same lines as the proof of Proposition 7.3.4 of [10].)
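The update (2) is a single application of Bayes' rule; a minimal sketch for the finite-alphabet toy instance above (the function name and array shapes are our own):

```python
import numpy as np

def belief_update(rho, a, z, q):
    """One step of Bayes' rule: returns Phi^a(rho, z) as in (2).

    rho : current belief vector of length M
    a   : index of the sensing action taken
    z   : index of the observed symbol
    q   : array of shape (K, M, Z) with q[a, i] the pmf of Z under (a, H_i)
    """
    likelihoods = q[a, :, z]      # q_i^a(z) for i = 1, ..., M
    joint = rho * likelihoods     # rho_i * q_i^a(z)
    marginal = joint.sum()        # q_rho^a(z), assumed > 0
    return joint / marginal

# e.g.: rho = belief_update(np.array([0.5, 0.3, 0.2]), a=0, z=7, q=q)
```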

IV. LOWER BOUND AND UPPER BOUND FOR V*

As discussed in the previous section, finding an optimal policy π* for Problem (P) requires knowledge of the value function V*. Although V* can be approximated numerically from (3) using the value iteration technique [10], finding a closed-form expression for V* might not, in general, be feasible. In Subsection IV-A, however, we use techniques from negative dynamic programming [10] to find a lower bound for the value function V*, while in Subsection IV-B, an upper bound for V* is established by proposing an appropriate policy π̃. In Subsection V-B, we identify a set of important special cases for which the bounds are asymptotically tight, establishing the asymptotic optimality of π̃.
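As an illustration of that numerical route, here is a minimal value-iteration sketch for (3) in the special case M = 2, where the belief simplex is the interval [0, 1]; the grid discretization, interpolation, and all parameters are our own simplifications, not the authors' implementation:

```python
import numpy as np

def value_iteration(q, l, n_grid=501, n_iter=200):
    """Approximate V* of (3) for M = 2 on a grid over rho_1 in [0, 1].

    q : array of shape (K, 2, Z); q[a, i] is the pmf of Z under (a, H_i).
    l : penalty for a wrong declaration.
    """
    K, M, Z = q.shape
    assert M == 2
    grid = np.linspace(0.0, 1.0, n_grid)         # rho_1; rho_2 = 1 - rho_1
    V = np.zeros(n_grid)
    for _ in range(n_iter):
        stop = l * np.minimum(1.0 - grid, grid)  # min_j l (1 - rho_j)
        cont = np.full(n_grid, np.inf)
        for a in range(K):
            # (T^a V)(rho) = sum_z q_rho^a(z) V(Phi^a(rho, z)), cf. (4)
            TaV = np.zeros(n_grid)
            for z in range(Z):
                q1, q2 = q[a, 0, z], q[a, 1, z]
                marg = grid * q1 + (1.0 - grid) * q2        # q_rho^a(z)
                nxt = grid * q1 / np.maximum(marg, 1e-12)   # updated rho_1
                TaV += marg * np.interp(nxt, grid, V)
            cont = np.minimum(cont, TaV)
        V = np.minimum(1.0 + cont, stop)                    # fixed point (3)
    return grid, V

# e.g., on the first two hypotheses of the toy kernels:
# grid, V = value_iteration(q[:, :2, :], l=100.0)
```

The retire/continue regions of the resulting V directly trace the optimal-policy characterization given after Fact 1.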


Before we proceed, we need the following technical assumptions.

Assumption 1. For any two hypotheses i and j, i ≠ j, there exists an action a ∈ A such that D(q_i^a ‖ q_j^a) > 0.

Assumption 2.

\[
\max_{i,j}\, \max_{a\in\mathcal{A}}\, \sup_{z\in\mathcal{Z}} \frac{q_i^a(z)}{q_j^a(z)} < \infty.
\]

Assumption 1 ensures the possibility of discriminating between any two hypotheses. Assumption 2 implies that no two hypotheses are fully distinguishable using a single observation sample; it is made for ease of our proof of an upper bound for the value function. Standard techniques as in [11], [12] can be applied to obtain an upper bound when Assumption 2 does not hold.
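Both assumptions are easy to verify numerically on a finite-alphabet instance; a sketch using the hypothetical q and D arrays from Section II:

```python
import numpy as np

def check_assumptions(q, D):
    """Check Assumptions 1 and 2 for kernels q (K, M, Z) and KL array D."""
    K, M, Z = q.shape
    # Assumption 1: every ordered pair (i, j), i != j, is separated by some action.
    a1 = all(D[:, i, j].max() > 0
             for i in range(M) for j in range(M) if i != j)
    # Assumption 2: likelihood ratios q_i^a(z) / q_j^a(z) uniformly bounded.
    ratios = q[:, :, None, :] / q[:, None, :, :]   # shape (K, M, M, Z)
    a2 = bool(np.isfinite(ratios).all())
    return a1, a2, float(ratios.max())
```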

A. Lower bound for V*

Proposition 1. There exists a K' independent of l such that

\[
\underline{V}(\rho) = \sum_{i=1}^{M} \rho_i \left[ \max_{j\neq i}
\frac{\log\frac{1-l^{-1}}{l^{-1}/(M-1)} - \log\frac{\rho_i}{\rho_j}}
{\max_{\hat{a}\in\mathcal{A}} D(q_i^{\hat{a}} \,\|\, q_j^{\hat{a}})} - K' \right]^+
\]

is a lower bound for the optimal value function V*.

Proof: The proof is based on Fact 2 in Subsection III-A and is omitted in the interest of brevity.

Next, we find an upper bound for V* which is shown to be achievable by a broad range of policies.

B. Upper bound for V*

As noted in Section II, the set of all sensing actions is denoted by A = {1, 2, ..., K}. Let $\Lambda = \{\lambda \in [0,1]^K : \sum_{i=1}^{K} \lambda_i = 1\}$.

Proposition 2. There exists a K'' independent of l such that

\[
\overline{V}(\rho) = \sum_{i=1}^{M} \rho_i \left(
\frac{\Big[\log\frac{1-l^{-1}}{l^{-1}/(M-1)} - \min_{k\neq i}\log\frac{\rho_i}{\rho_k}\Big]^+}
{\max_{\lambda\in\Lambda}\min_{j\neq i}\sum_{a\in\mathcal{A}} \lambda_a D(q_i^a \,\|\, q_j^a)} + K'' \right)
\]

is an upper bound for the optimal value function V*.

Proof: Let λ*_i = [λ*_{i1}, λ*_{i2}, ..., λ*_{iK}] ∈ Λ be the vector that achieves the maximum in $\max_{\lambda\in\Lambda}\min_{j\neq i}\sum_{a\in\mathcal{A}} \lambda_a D(q_i^a \,\|\, q_j^a)$. Consider an arbitrary threshold ρ̃ with 1/2 < ρ̃ < 1 − l^{−1}. It will be shown that the upper bound $\overline{V}$ is achievable under any Markov (randomized) policy π̃ such that:
• if ρ_i ≥ 1 − l^{−1}, π̃ retires and selects hypothesis i as the true hypothesis;
• if ρ_i ∈ [ρ̃, 1 − l^{−1}), then P(π̃(ρ) = a) = λ*_{ia} for all a ∈ A;
• if ρ_i ∈ [0, ρ̃) for all i = 1, 2, ..., M, then P(π̃(ρ) = a) = 1/|A| for all a ∈ A.

We denote by ρ_i(n) the posterior belief about hypothesis i after n observations. Let τ, τ_i, and τ̃_i, i = 1, 2, ..., M, be Markov stopping times defined as follows:

\[
\tau = \min\Big\{ n : \min_j \{1 - \rho_j(n)\} \le l^{-1} \Big\},
\]
\[
\tau_i = \min\Big\{ n : \min_{j\neq i} \frac{\rho_i(n)}{\rho_j(n)} \ge \frac{1-l^{-1}}{l^{-1}/(M-1)} \Big\},
\]
\[
\tilde{\tau}_i = \min\Big\{ n : \min_{j\neq i} \frac{\rho_i(n')}{\rho_j(n')} \ge \frac{\tilde{\rho}}{(1-\tilde{\rho})/(M-1)} \;\;\forall n' \ge n \Big\}.
\]

From (1), the total cost under policy π̃ can be written as

\[
V_{\tilde{\pi}}(\rho) = \mathbb{E}[\tau] + \min_j \{(1-\rho_j(\tau))\, l\}
\le \mathbb{E}[\tau] + 1
\le \sum_{i=1}^{M} \rho_i\, \mathbb{E}[\tau_i \mid \theta = i] + 1, \tag{5}
\]

where ρ = [ρ_1, ρ_2, ..., ρ_M] = [ρ_1(0), ρ_2(0), ..., ρ_M(0)] is the prior belief about H_1, H_2, ..., H_M, and the last inequality follows from the fact that τ ≤ τ_i, i = 1, 2, ..., M. What remains is to find an upper bound for E[τ_i | θ = i], i = 1, 2, ..., M. Before we proceed, we introduce the following notation to facilitate the proof. Let

\[
D_i^* := \max_{\lambda\in\Lambda}\min_{j\neq i}\sum_{a\in\mathcal{A}} \lambda_a D(q_i^a \,\|\, q_j^a),
\qquad
\tilde{D}_i^* := \min_{j\neq i} \frac{1}{|\mathcal{A}|}\sum_{a\in\mathcal{A}} D(q_i^a \,\|\, q_j^a),
\]
\[
T_i^*(\rho) := \log\frac{1-l^{-1}}{l^{-1}/(M-1)} - \min_{k\neq i}\log\frac{\rho_i}{\rho_k}.
\]

For any ε > 0, we have

\[
\mathbb{E}[\tau_i \mid \theta = i] = \sum_{n=1}^{\infty} P(\{\tau_i \ge n\} \mid \theta = i)
\le \frac{T_i^*(\rho)}{D_i^*}(1+\epsilon) + \sum_{n \,:\, n > \frac{T_i^*(\rho)}{D_i^*}(1+\epsilon)} P(\{\tau_i \ge n\} \mid \theta = i)
\le \frac{T_i^*(\rho)}{D_i^*}(1+\epsilon) + K_i'', \tag{6}
\]

where K''_i is independent of l and the last inequality follows from Lemma 1 below. Combining (5) and (6), and letting K'' = max_i K''_i + 1, completes the proof of Proposition 2.

Lemma 1. Given any ε > 0 and for n > (T_i^*(ρ)/D_i^*)(1 + ε), P({τ_i ≥ n} | θ = i) is exponentially decaying.

Proof: To prove this lemma, we appeal to the following fact from [11].

Fact 3 (McDiarmid's Inequality). Let X = (X_1, ..., X_n) be a family of independent random variables with X_k taking values in a set 𝒳_k for each k. Suppose a real-valued function f defined on ∏_{k=1}^{n} 𝒳_k satisfies |f(x) − f(x')| ≤ c_k whenever the vectors x and x' differ only in the k-th coordinate. Then for any η ≥ 0,

\[
P\big( f(X) - \mathbb{E}[f(X)] \le -\eta \big) \le e^{-2\eta^2 / \sum_{k=1}^{n} c_k^2}.
\]
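Before applying Fact 3 to the log-likelihood ratios below, here is a quick Monte Carlo sanity check of the inequality; the choice f(X) = Σ_k X_k with Uniform[0,1] coordinates (so c_k = 1) is our own illustrative example:

```python
import numpy as np

# f(X) = X_1 + ... + X_n with X_k ~ Uniform[0,1] i.i.d.; changing one
# coordinate moves f by at most c_k = 1, so Fact 3 gives
# P(f - E[f] <= -eta) <= exp(-2 eta^2 / n).
rng = np.random.default_rng(1)
n, eta, trials = 100, 5.0, 50_000
f = rng.random((trials, n)).sum(axis=1)
empirical = np.mean(f - n / 2 <= -eta)
bound = np.exp(-2 * eta**2 / n)
print(empirical, bound)   # the empirical frequency stays below the bound
```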

Let B_ij(n) and B̃_ij(n) be events in the probability space defined as follows:

\[
B_{ij}(n) := \Big\{ \log\frac{\rho_i(n)}{\rho_j(n)} < \log\frac{1-l^{-1}}{l^{-1}/(M-1)} \Big\},
\qquad
\tilde{B}_{ij}(n) := \Big\{ \log\frac{\rho_i(n)}{\rho_j(n)} < \log\frac{\tilde{\rho}}{(1-\tilde{\rho})/(M-1)} \Big\}.
\]

We have

\[
P(\tilde{B}_{ij}(n) \mid \theta = i)
= P\Big( \log\tfrac{\rho_i(n)}{\rho_j(n)} - \mathbb{E}\big[\log\tfrac{\rho_i(n)}{\rho_j(n)}\big]
< \log\tfrac{\tilde{\rho}}{(1-\tilde{\rho})/(M-1)} - \mathbb{E}\big[\log\tfrac{\rho_i(n)}{\rho_j(n)}\big] \,\Big|\, \theta = i \Big)
\]
\[
= P\Big( \log\tfrac{\rho_i(n)}{\rho_j(n)} - \mathbb{E}\big[\log\tfrac{\rho_i(n)}{\rho_j(n)}\big]
< \log\tfrac{\tilde{\rho}}{(1-\tilde{\rho})/(M-1)} - \log\tfrac{\rho_i}{\rho_j}
- \sum_{t=1}^{n} D\big(q_i^{A(t)} \,\|\, q_j^{A(t)}\big) \,\Big|\, \theta = i \Big)
\]
\[
\le P\Big( \log\tfrac{\rho_i(n)}{\rho_j(n)} - \mathbb{E}\big[\log\tfrac{\rho_i(n)}{\rho_j(n)}\big]
< \log\tfrac{\tilde{\rho}}{(1-\tilde{\rho})/(M-1)} - \min_{k\neq i}\log\tfrac{\rho_i}{\rho_k} - n\tilde{D}_i^* \,\Big|\, \theta = i \Big). \tag{7}
\]

Similarly, we can show that

\[
P(B_{ij}(n) \mid \theta = i) \le P\Big( \log\tfrac{\rho_i(n)}{\rho_j(n)} - \mathbb{E}\big[\log\tfrac{\rho_i(n)}{\rho_j(n)}\big]
< T_i^*(\rho) - (n - \tilde{\tau}_i) D_i^* \,\Big|\, \theta = i \Big). \tag{8}
\]

For any δ > 0,

\[
P(\{\tau_i \ge n\} \mid \theta = i) \le P\big(\textstyle\bigcup_{j\neq i} B_{ij}(n) \,\big|\, \theta = i\big)
\le P\big(\textstyle\bigcup_{j\neq i} B_{ij}(n) \cap \{\tilde{\tau}_i \le n\delta\} \,\big|\, \theta = i\big) + P(\{\tilde{\tau}_i > n\delta\} \mid \theta = i)
\]
\[
\le \sum_{j\neq i} P\big(B_{ij}(n) \cap \{\tilde{\tau}_i \le n\delta\} \,\big|\, \theta = i\big)
+ \sum_{j\neq i} \sum_{m\,:\,m > n\delta} P(\tilde{B}_{ij}(m) \mid \theta = i). \tag{9}
\]

From (7)–(9), Fact 3, and Assumption 2, we have the assertion of the lemma.

V. ASYMPTOTIC OPTIMALITY AND FEEDBACK GAIN

In this section, we state and discuss the consequences of the bounds obtained in Section IV. In Subsection V-A, we focus on the advantage of causally selecting sensing actions. In particular, we show that the performance gap between an open-loop policy and π̃ (and hence the optimal policy) grows logarithmically as the penalty l increases. In Subsection V-B, we study the gap between the lower and upper bounds provided in Section IV and discuss important special cases where the bounds are asymptotically tight.

A. Full Feedback Gain

In this subsection, we utilize the upper bound provided by Proposition 2 to shed light on the value of feedback. To do so, we consider a class of policies which do not fully utilize the feedback nature of the problem. In particular, consider the class of policies π_α defined as follows:
• For a given vector α with α_k ≥ 0 and Σ_{k=1}^{K} α_k = 1, π_α takes sensing action a, a = 1, 2, ..., K, with probability α_a until min_j {1 − ρ_j} ≤ l^{−1}.

Let V_α(ρ) denote the total cost when the initial belief state is ρ and policy π_α is enforced. The following proposition quantifies the feedback gain.

Proposition 3. V_α(ρ) is lower bounded by

\[
\underline{V}_\alpha(\rho) = \sum_{i=1}^{M} \rho_i \left[ \max_{j\neq i}
\frac{\log\frac{1-l^{-1}}{l^{-1}/(M-1)} - \log\frac{\rho_i}{\rho_j}}
{\sum_{a\in\mathcal{A}} \alpha_a D(q_i^a \,\|\, q_j^a)} - K_\alpha \right]^+,
\]

where K_α is independent of the penalty l.

Proof: The proof is based on Fact 2 and is omitted in the interest of brevity.

This shows that unless there exists a λ* such that for all i ∈ {1, 2, ..., M},

\[
\max_{\lambda\in\Lambda}\min_{j\neq i}\sum_{a\in\mathcal{A}} \lambda_a D(q_i^a \,\|\, q_j^a)
= \min_{j\neq i}\sum_{a\in\mathcal{A}} \lambda^*_a D(q_i^a \,\|\, q_j^a),
\]

the feedback gain grows logarithmically with l.
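The maximin weights λ*_i used by π̃, as well as the common-λ* condition above, can be computed with a small linear program; a sketch (the LP formulation is standard, the function name is ours, and D is the hypothetical KL array from Section II):

```python
import numpy as np
from scipy.optimize import linprog

def maximin_weights(D, i):
    """lambda*_i = argmax over the simplex of min_{j != i} sum_a lambda_a D[a,i,j].

    Variables are (lambda_1, ..., lambda_K, t); we maximize t subject to
    t <= sum_a lambda_a D[a,i,j] for all j != i and sum_a lambda_a = 1.
    """
    K, M, _ = D.shape
    others = [j for j in range(M) if j != i]
    c = np.r_[np.zeros(K), -1.0]                       # minimize -t
    A_ub = np.array([np.r_[-D[:, i, j], 1.0] for j in others])
    b_ub = np.zeros(len(others))
    A_eq = np.r_[np.ones(K), 0.0].reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * K + [(None, None)])
    return res.x[:K], -res.fun                          # (lambda*_i, D*_i)

# No-feedback-gain check: does a single lambda attain D*_i for every i at once?
# lams, Dstars = zip(*(maximin_weights(D, i) for i in range(D.shape[1])))
```

When no single λ passes this check, the proposition above says that every open-loop π_α pays an extra Θ(log l) relative to π̃.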

B. Special Cases: Asymptotic Optimality

The lower and upper bounds provided by Propositions 1 and 2 are not tight in general. More precisely, we have

\[
\min_{j\neq i}\max_{a\in\mathcal{A}} D(q_i^a \,\|\, q_j^a)
\ge \max_{\lambda\in\Lambda}\min_{j\neq i}\sum_{a\in\mathcal{A}} \lambda_a D(q_i^a \,\|\, q_j^a)
\ge \max_{a\in\mathcal{A}}\min_{j\neq i} D(q_i^a \,\|\, q_j^a). \tag{10}
\]

In this subsection, we consider three important special cases of active hypothesis testing for which the upper and lower bounds are asymptotically tight. By asymptotically tight, we mean bounds that differ only by a constant independent of l. This is significant because it implies that as l increases (or, equivalently, as the number of samples grows), the action decisions of policy π̃ differ from those of any optimal policy in at most a finite and bounded number of instances (we refer to this property as asymptotic optimality). The three cases are given below.

• Binary Hypothesis Testing (M = 2): Consider Problem (P) for M = 2. In this setting, (10) holds with equality, i.e.,

\[
\min_{j\neq 1}\max_{a\in\mathcal{A}} D(q_1^a \,\|\, q_j^a) = \max_{a\in\mathcal{A}} D(q_1^a \,\|\, q_2^a) = \max_{a\in\mathcal{A}}\min_{j\neq 1} D(q_1^a \,\|\, q_j^a),
\]

and

\[
\min_{j\neq 2}\max_{a\in\mathcal{A}} D(q_2^a \,\|\, q_j^a) = \max_{a\in\mathcal{A}} D(q_2^a \,\|\, q_1^a) = \max_{a\in\mathcal{A}}\min_{j\neq 2} D(q_2^a \,\|\, q_j^a).
\]

The problem of the reliability (error exponent) of binary hypothesis testing, with either a fixed sample size or a variable length (sequential setting), is hardly new. Recently, the authors in [13] generalized this problem to the active testing context and showed the optimality of policy π̃ (in terms of error exponent). Our work complements the findings in [13] by establishing the asymptotic optimality of π̃ in a total cost (and Bayesian) sense.

• Variable-length Coding with Feedback: Consider a transmitter who wishes to convey a message θ ∈ {1, 2, ..., M} to a receiver over a discrete memoryless channel P(Z|X) with feedback. Here, the transmitter is given the ability to trade off channel uses (cost) against the reliability of the receiver's knowledge of the message. As discussed in [14], this problem can be mapped to a special case of active hypothesis testing with A = {f : H → X}, K = |H| · |X|, and observation kernels

\[
q_i^f(z) = P(Z = z \mid X = f(i)).
\]

Here again,

\[
\min_{j\neq i}\max_{f\in\mathcal{A}} D(q_i^f \,\|\, q_j^f)
= \max_{f\in\mathcal{A}}\min_{j\neq i} D(q_i^f \,\|\, q_j^f)
= \max_{x,x'\in\mathcal{X}} D\big(P(Z \mid X = x) \,\|\, P(Z \mid X = x')\big).
\]

The problem above was first tackled by Burnashev in his seminal paper [15], where he provided upper and lower bounds on the number of channel uses needed to achieve a fixed probability of error. In [16], we addressed the relation between our bounds (in the variable-length coding setup) and those provided in [15]. (A numerical sketch of this mapping appears after Table I below.)

• Noisy Dynamic Search: Consider the problem of sequentially searching for one and only one object of interest in M locations, where at each instant one location can be searched. The outcome of observing location i is a random variable S_i + N, where S_i = b if location i contains the object and S_i = 0 otherwise, for some b ∈ R, and N represents the noise component of the observation. The goal is to find the object quickly and accurately. In [5], we have shown that the problem above can be modeled as an active hypothesis-driven testing problem with A = {1, 2, ..., M} (K = M) and the observation kernels

\[
q_i^a(\cdot) = \begin{cases} f_{\mathrm{obj}}(\cdot) & a = i \\ f_{\mathrm{noise}}(\cdot) & a \neq i \end{cases} \tag{11}
\]

such that f_obj(z) = f_noise(z − b) and f_noise is symmetric. Table I shows D(q_1^a ‖ q_j^a), j ≠ 1, a ∈ {1, 2, 3}, for M = K = 3. Note that under the symmetry condition above, we have D(f_obj ‖ f_noise) = D(f_noise ‖ f_obj). It is clear from Table I (and can be generalized) that

\[
\min_{j\neq i}\max_{a\in\mathcal{A}} D(q_i^a \,\|\, q_j^a)
= \max_{a\in\mathcal{A}}\min_{j\neq i} D(q_i^a \,\|\, q_j^a)
= D(f_{\mathrm{obj}} \,\|\, f_{\mathrm{noise}}).
\]

This problem was first considered in [17], where it was shown that observing the location with the highest likelihood of holding the object is optimal in a finite-horizon scenario with no early detection/stopping. Tightness of the upper and lower bounds here shows the asymptotic optimality of this policy in an infinite-horizon scenario with a stopping time. Moreover, our results show that the asymptotic optimality holds beyond the symmetric noise model so long as D(f_obj ‖ f_noise) ≥ D(f_noise ‖ f_obj). (A numerical sketch reproducing Table I also follows below.)

TABLE I
D(q_1^a ‖ q_j^a), j ≠ 1, for M = K = 3.

          | j = 2                  | j = 3
  a = 1   | D(f_obj ‖ f_noise)     | D(f_obj ‖ f_noise)
  a = 2   | D(f_noise ‖ f_obj)     | 0
  a = 3   | 0                      | D(f_noise ‖ f_obj)
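First, the variable-length coding mapping, sketched for a binary symmetric channel; the channel, the message count, and the brute-force enumeration of encoding functions are our own illustrative choices:

```python
import numpy as np
from itertools import product

# Toy DMC: binary symmetric channel with crossover eps; messages H = {1..M}.
M, eps = 4, 0.1
P = np.array([[1 - eps, eps],      # P(Z | X = 0)
              [eps, 1 - eps]])     # P(Z | X = 1)

def kl(p, r):
    return float(np.sum(p * np.log(p / r)))

# An action is an encoding function f: H -> X, listed here as a tuple
# (f(1), ..., f(M)); its observation kernel is q_i^f = P(Z | X = f(i)).
actions = list(product([0, 1], repeat=M))
kernels = {f: np.array([P[f[i]] for i in range(M)]) for f in actions}

# Right-hand side of the equality above: max over channel inputs of
# D(P(Z|X=x) || P(Z|X=x')); for the BSC both orders coincide.
print(max(kl(P[x], P[xp]) for x in (0, 1) for xp in (0, 1) if x != xp))
```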

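Second, the noisy dynamic search kernels (11) with Gaussian noise, reproducing Table I numerically; b, σ, and the integration grid are our own illustrative choices:

```python
import numpy as np

b, sigma = 1.0, 0.5
z = np.linspace(-6.0, 8.0, 4001)
dz = z[1] - z[0]

def gauss(z, mu):
    return np.exp(-(z - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

f_noise = gauss(z, 0.0)   # observation pdf when the searched location is empty
f_obj = gauss(z, b)       # f_obj(z) = f_noise(z - b)

def kl(p, r):
    return float(np.sum(p * np.log(p / r)) * dz)

# Row a of Table I: q_1^a is f_obj iff a = 1, and q_j^a is f_obj iff a = j.
for a in (1, 2, 3):
    row = [kl(f_obj if a == 1 else f_noise, f_obj if a == j else f_noise)
           for j in (2, 3)]
    print(f"a={a}:", [round(v, 3) for v in row])
# Nonzero entries equal b^2 / (2 sigma^2) = 2.0 here; with symmetric Gaussian
# noise D(f_obj||f_noise) = D(f_noise||f_obj), matching Table I.
```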
VI. DISCUSSION AND FUTURE WORK

In this paper, we considered the problem of active sequential M-ary hypothesis testing. Using a DP characterization, we provided lower and upper bounds on the optimal value function, generalizing the bounds obtained for binary hypothesis testing in [6]. Subsequently, we used these bounds to quantify the value of full feedback. We also discussed three special cases for which the bounds are tight and result in identifying an asymptotically optimal policy.

REFERENCES

[1] A. Wald and J. Wolfowitz, "Optimum Character of the Sequential Probability Ratio Test," Ann. Math. Statist., vol. 19, no. 3, pp. 326–339, 1948.
[2] P. Armitage, "Sequential Analysis with More than Two Alternative Hypotheses, and its Relation to Discriminant Function Analysis," Journal of the Royal Statistical Society, Series B, vol. 12, no. 1, pp. 137–144, 1950.
[3] G. Lorden, "Nearly-optimal Sequential Tests for Finitely Many Parameter Values," The Annals of Statistics, vol. 5, no. 1, pp. 1–21, 1977.
[4] V. P. Dragalin, A. G. Tartakovsky, and V. V. Veeravalli, "Multihypothesis Sequential Probability Ratio Tests. I: Asymptotic Optimality," IEEE Transactions on Information Theory, vol. 45, no. 7, pp. 2448–2461, November 1999.
[5] M. Naghshvar and T. Javidi, "Active M-ary Sequential Hypothesis Testing," in IEEE International Symposium on Information Theory (ISIT), 2010, pp. 1623–1627.
[6] M. Naghshvar and T. Javidi, "Information Utility in Active Sequential Hypothesis Testing," in Proceedings of the 48th Allerton Conference on Communication, Control, and Computing, 2010.
[7] P. R. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice-Hall, Inc., 1986.
[8] R. E. Strauch, "Negative Dynamic Programming," The Annals of Mathematical Statistics, vol. 37, no. 4, pp. 871–890, 1966.
[9] S. E. Shreve and D. P. Bertsekas, "Universally Measurable Policies in Dynamic Programming," Mathematics of Operations Research, vol. 4, no. 1, pp. 15–30, February 1979.
[10] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., 1994.
[11] C. McDiarmid, "On the Method of Bounded Differences," in Surveys in Combinatorics, London Mathematical Society Lecture Note Series 141, Cambridge University Press, 1989, pp. 148–188.
[12] M. V. Burnashev, "Sequential Discrimination of Hypotheses with Control of Observations," Math. USSR Izvestija, vol. 15, no. 3, pp. 419–440, 1980.
[13] Y. Polyanskiy and S. Verdu, "Hypothesis Testing with Feedback," in Information Theory and Applications Workshop (ITA), 2011.
[14] A. Mahajan, A. Nayyar, and D. Teneketzis, "Identifying Tractable Decentralized Problems on the Basis of Information Structures," in Proceedings of the 46th Allerton Conference on Communication, Control, and Computing, 2008, pp. 1440–1449.
[15] M. V. Burnashev, "Data Transmission over a Discrete Channel with Feedback: Random Transmission Time," Problemy Peredachi Informatsii, vol. 12, no. 4, pp. 10–30, 1975.
[16] M. Naghshvar and T. Javidi, "Variable-Length Coding with Noiseless Feedback and Finite Messages," in Asilomar Conference on Signals, Systems and Computers, 2010.
[17] D. A. Castanon, "Optimal Search Strategies in Dynamic Hypothesis Testing," IEEE Transactions on Systems, Man and Cybernetics, vol. 25, no. 7, pp. 1130–1138, July 1995.
