Optimal and asymptotically optimal decision rules for sequential screening and resource allocation

L. Pronzato
Laboratoire I3S, CNRS-UNSA, bât. Euclide, Les Algorithmes, 2000 route des Lucioles, BP 121, 06903 Sophia-Antipolis Cedex, France. Tel: 33 4 92 94 27 08; Fax: 33 4 92 94 28 98; email: [email protected]

December 1, 1999

Abstract

We consider the problem of maximizing the expected sum of $n$ variables $X_k$ chosen sequentially in an i.i.d. sequence of length $N$. It is equivalent to the following resource allocation problem: $n$ machines have to be allocated to $N$ jobs of value $X_k$ ($k = 1,\dots,N$) arriving sequentially, the $i$-th machine has a (known) probability $p_i$ of performing the job successfully, and the total expected reward must be maximized. The optimal solution of this stochastic dynamic-programming problem is derived when the distribution of the $X_k$'s is known: the optimal threshold for accepting $X_k$, or allocating the $i$-th machine to job $k$, is given by a backward recurrence equation. This optimal solution is compared with the simpler (but suboptimal) open-loop feedback-optimal solution, for which the threshold is constant, and their asymptotic behaviors are investigated. The asymptotic behavior of the optimal threshold is used to derive a simple open-loop solution, which is proved to be asymptotically optimal ($N \to \infty$ with $n$ fixed) for a large class of distributions. Certainty equivalence is proposed for the case where the distribution of the $X_k$'s is unknown and estimated on-line. Simulation results indicate that open-loop rules can achieve good performance.

Keywords: certainty equivalence; dynamic programming; neutrality; resource allocation; sequential experimental design; sequential screening; stochastic assignment.

1 Introduction

Consider the situation where $N$ items are generated sequentially, the characteristic of interest of the $k$-th item being measured by a scalar variable $X_k \in \mathcal X$. One wishes to sift out the $n$ best items in terms of their $X$ value, that is, to maximize $\sum_{i=1}^n X_{k_i}$, with $k_i \in \{1,\dots,N\}$ the index of the $i$-th item selected. The decision to select the $k$-th item or not must be taken on-line, that is, at time $k$, and $X_k$ is observed before one makes a decision. The $X_k$'s are assumed to be independently identically distributed¹ (i.i.d.) random variables with probability measure $\mu(\cdot)$. Let $\{u_k\}_k$ be the decision sequence, with $u_k = 1$ if item $k$ is selected at step $k$, and $u_k = 0$ otherwise. The problem is then:
\[
\text{maximize } E J_N(u_1^N) = E\Big\{\sum_{k=1}^N u_k X_k\Big\}, \tag{1}
\]
with $u_1^N = (u_1,\dots,u_N)$ satisfying the constraints²
\[
u_j \in \mathcal U_j = \{0,1\},\quad j = 1,\dots,N,\qquad \sum_{k=1}^N u_k = n. \tag{2}
\]

The expectation in (1) is with respect to the product measure $\mu^N(\cdot)$ of $(X_1,\dots,X_N)$, and $X_k$ is known when $u_k$ is selected. The measure $\mu(\cdot)$ is assumed to be non-degenerate (it has several support points if it is discrete) and such that $E\{|X|\} < \infty$, with $E\{\cdot\}$ the expectation with respect to $X$. Note that by changing the probability measure of the $X_k$'s one can consider objectives of the form $E\{\sum_{k=1}^N u_k f(X_k)\}$; see Example 2 (ii) in Section 3.3. We shall see that at step $j$, with $N-j+1$ items still available and $a_j$ already selected, the optimal solution is defined by thresholds $\tilde s(j, a_j)$ such that $X_j$ is selected if larger than $\tilde s(j, a_j)$ and is rejected otherwise. Many decision problems can be formulated in this way, and we only mention some of them.

(i) Selling with a deadline. One has to sell $n$ identical items within a given period of time, say $N$ days; one offer $X_k$ comes each day to buy one item, and is lost if not immediately accepted.

(ii) Selection of best items. $n$ items can be collected, e.g. for delayed processing, with $X_k$ the score of item $k$, e.g. its information content. The decision to collect items or not must be taken on-line, and any item selected at step $j$ cannot be replaced by another one at step $k > j$. For instance, for some experiments in nuclear physics, events are selected according to the energy dissipated in a detector [8].

(iii) Sequential experimental design. One has to estimate a scalar parameter $\theta$ from observations $y_k = \eta(\theta, Z_k) + \epsilon_k$, with $\{\epsilon_k\}_k$ an i.i.d. sequence. The experimental conditions are characterized by the $Z_k$'s, which are i.i.d. and independent of $\{\epsilon_k\}_k$. They are generated sequentially, and the decision of observing $y_k$ or not must be taken on-line. Only $n$ observations are allowed. When the objective is to make a precise estimation of $\theta$, and the criterion is the Fisher information (see e.g. [22]), evaluated at a prior nominal value $\hat\theta^0$ for $\theta$, one has to maximize $\sum_{i=1}^n X_{k_i}(\hat\theta^0)$, with $X_{k_i}(\theta) = (\partial\eta(\theta, Z_{k_i})/\partial\theta)^2$, $k_i \in \{1,\dots,N\}$.

(iv) Secretary problem. One has to appoint $n$ persons among $N$ applicants that are interviewed sequentially. The decision to appoint a candidate or not is taken immediately after the interview; $X_k$ is the score given to candidate $k$. This is a generalization of Problem 9 in Chap. 2 of [6], for which $n = 1$; see also Problem 16 in Chap. 4 of [7], vol. 1. It corresponds to a modification of the "secretary problem", which has a long history, see [14, 15, 17]. In the basic form of the problem, all $X_k$'s are different, only their relative ranks are observed, $\mu(\cdot)$ is unknown, $n = 1$ and one aims at maximizing the probability of selecting the applicant with the largest score. The optimal strategy, see [18], is then to reject the first (approximately) $N/e$ items and accept the next item with $X$ value larger than all of the preceding ones (if there is no such item, one proceeds till the end of the game and retains the last item, with a loss). Many generalizations have been considered. For instance, in [28, 31] the applicants can refuse the offer of being selected with a known fixed probability, and offers are made indefinitely until one is accepted. A similar situation is considered in [4], with the constraint that only $m$ offers are made. Moreover, in [32] the probability that an applicant will refuse an offer depends on its relative rank. In [26, 33], one can recall a candidate to whom an offer was not previously made. Tamaki [30] allows one to make an offer to the last applicant interviewed, who will accept with a given probability $p_1$, and also to the last $m$ applicants, who will accept with a given probability $p_2$. In [23, 34], the original problem is relaxed by considering that a win is obtained if the item retained has overall rank less than or equal to $s \ge 1$. Few extensions to $n > 1$ seem to exist. The case $n = 2$, with a win if the best and second best applicants are selected, is considered in [29]. Note, however, that there is an important difference between the standard secretary problem and (1,2): in the first case the payoff is zero or one, and no assumption on the probability measure for the scores $X_k$ is made. We shall return to this point in Section 5, where the measure $\mu(\cdot)$ is supposed to be unknown.

¹ The case of sub- and supermartingales is considered in Theorem 2 of [9]: when the $X_k$'s form a submartingale, that is, $E\{X_k \mid X_1,\dots,X_{k-1}\} \ge X_{k-1}$, $k \ge 2$, the optimal rule is to select the last $n$ items. When they form a supermartingale, $E\{X_k \mid X_1,\dots,X_{k-1}\} \le X_{k-1}$, $k \ge 2$, it is optimal to select the first $n$ items.

² Most examples involve non-negative $X_k$'s. In this case, the developments in the paper remain valid when the constraint (2) is relaxed to $\sum_{k=1}^N u_k \le n$.

There is a most interesting connection between (1,2) and the seemingly more difficult problem of sequential resource allocation, hence the title of the paper. Consider a situation where $N$ jobs arrive sequentially, job $k$ having value $X_k$. There are $n$ machines available, and associated with machine $i$ is a probability $p_i$ that it will perform the job successfully. The relative ranks of the $p_i$'s are known. The problem is to allocate machines to arriving jobs so as to maximize the total expected reward; that is,
\[
\text{maximize } E R_N = E\Big\{\sum_{k=1}^N p_{j_k} X_k\Big\}, \tag{3}
\]

where the random variable $j_k$ labels the machine allocated to job $k$. Note that if there are $n > N$ machines available, only the $N$ machines with largest $p_i$'s have to be considered, whereas if $n < N$ one can consider $N - n$ virtual machines with associated $p_i$ equal to zero. For that reason, one can always assume that $n = N$. It is shown in [9] that the optimal solution of this stochastic assignment problem is as follows. At step $j$, with $N-j+1$ jobs to be allocated to the remaining $N-j+1$ machines, with associated probabilities $p(j,j) \le p(j,j+1) \le \dots \le p(j,N)$, there exist thresholds $-\infty = \tau(j,j-1) \le \tau(j,j) \le \tau(j,j+1) \le \dots \le \tau(j,N-1) \le \tau(j,N) = \infty$ such that the machine with probability $p(j,i)$ should be used if $\tau(j,i-1) < X_j \le \tau(j,i)$. Moreover, the thresholds are independent of the values of the $p_i$'s, see [9]. Taking $n$ $p_i$'s equal to one and $N-n$ equal to zero makes this assignment problem equivalent to the selection problem (1,2), and the thresholds $\tilde s(j, a_j)$ and $\tau[j, N-(n-a_j)]$ exactly coincide. Therefore, by solving (1,2) one also solves the resource allocation problem (3). In what follows we shall only consider the formulation (1,2) of the problem.

The main purpose of the paper is to construct simple open-loop decision rules that are asymptotically optimal when $N \to \infty$ with $n$ fixed. The case $n = N - [\alpha N]$, $N \to \infty$ with $\alpha \in (0,1)$ and $[x]$ rounding $x$ towards the nearest integer, is considered in [1], where a strategy that uses a constant threshold $s(\alpha)$ is shown to be asymptotically optimal. The case where $n$ is fixed is more difficult: maintaining the threshold constant is not asymptotically optimal, and we need to adapt the threshold as a function of the index $j$ of the current step and the number $a_j$ of items already collected. Two decision rules are considered in Section 2: the optimal (closed-loop) solution and the open-loop feedback-optimal solution. Their behaviors, especially for $N \to \infty$ with $n$ fixed, are investigated in Section 3. In Section 4 we use the asymptotic behavior of the optimal solution to derive a simpler open-loop solution which is shown to be asymptotically optimal for a large class of distributions. The case where the distribution of the $X_k$'s is unknown is considered in Section 5, with an approach based on certainty equivalence. The proofs of all lemmas and theorems are given in an appendix.

2 The Closed-Loop and Open-Loop Feedback solutions

For any sequence $u_1^N$ and any step $j$, $1 \le j \le N$, $a_j$ will denote the number of items already stored; that is,
\[
a_j = \sum_{k=1}^{j-1} u_k, \tag{4}
\]

with $a_1 = 0$. Consider first the case where the measure $\mu(\cdot)$ of $X$ is known (the case where it is unknown will be considered in Section 5). The problem then reduces to a usual discrete-time stochastic control problem, where $j$ represents time, $(a_j, X_j) \in \{0,\dots,n\}\times\mathcal X$ represents the state at time $j$ and $u_j \in \mathcal U_j \subseteq \{0,1\}$ the control at time $j$. For each $j \in \{1,\dots,N\}$ and each policy $u_j^N$, one can consider the conditional expected gain-to-go at time $j$ given the state $(a_j, X_j)$:
\[
C(j, a_j, X_j; u_j^N) = E\Big\{\sum_{k=j}^N u_k X_k \,\Big|\, a_j, X_j\Big\}. \tag{5}
\]

Problem (1,2) then corresponds to the maximization of $C(1, a_1 = 0, X_1; u_1^N)$, where, for all $j$, the control function $u_j$ only depends on the past states $(a_1, X_1,\dots,a_j, X_j)$. The constraint (2) translates into constraints on the admissible sets:
\[
\mathcal U_k = \{1\}\ \text{for } k = j,\dots,N\ \text{ if } a_j + N - j + 1 \le n \tag{6}
\]
(all remaining items are selected) and
\[
\mathcal U_k = \{0\}\ \text{for } k = j,\dots,N\ \text{ if } a_j = n \tag{7}
\]
(no further item can be selected).

2.1 Open-Loop Feedback-Optimal decisions

The suboptimal approach called Open-Loop Feedback-Optimal (OLFO) in control theory takes decisions optimally at each step $j$, with the restriction that future decisions $u_k$, $k > j$, should only depend on $u_j$ and the present state $(a_j, X_j)$, see, e.g., [5]. At step $j$, with $n-N+j-1 < a_j < n$, $u_j$ is chosen so as to maximize the expected gain-to-go; that is,
\[
\hat u_j = \arg\max_{u_j\in\{0,1\}}\Big[u_j X_j + \max_{u_{j+1}^N} E\Big\{\sum_{k=j+1}^N u_k X_k \,\Big|\, a_j, X_j\Big\}\Big].
\]
Since $u_{j+1}^N$ only depends on $u_j, a_j, X_j$, one has $E\{\sum_{k=j+1}^N u_k X_k \mid a_j, X_j\} = (n - a_j - u_j)E\{X\}$. For $n-N+j-1 < a_j < n$, the OLFO rule is thus:
\[
\hat u_j(a_j, X_j) = \begin{cases} 0 & \text{if } X_j \le E\{X\},\\ 1 & \text{otherwise,}\end{cases}
\]
where the choice $\hat u_j(a_j, E\{X\}) = 0$ is arbitrary. It relies on the comparison of $X_j$ with a constant threshold, equal to $E\{X\}$. More generally, let $\hat C_s(j, a_j, X_j)$ denote the conditional expected gain-to-go (5) for a rule using a constant threshold $s$, and let $\hat c_s(j, a_j)$ denote its expected value with respect to $X_j$. One has, for $n-N+j-1 < a_j < n$, $\hat C_s(j, a_j, X_j) = \hat u^s_j(a_j, X_j) X_j + \hat c_s[j+1, a_j + \hat u^s_j(a_j, X_j)]$, and therefore
\[
\hat c_s(j, a_j) = \hat x(s) + \bar F(s)\,\hat c_s(j+1, a_j+1) + F(s)\,\hat c_s(j+1, a_j), \tag{8}
\]
where
\[
\hat x(s) = E\{X \mathbf 1_{X>s}\}\quad\text{and}\quad \bar F(s) = 1 - F(s), \tag{9}
\]
with $F(\cdot)$ the distribution function of $X$ ($F(x) = \mathrm{Prob}[X \le x]$) and $\mathbf 1_A(\cdot)$ the indicator function of the set $A$ ($\mathbf 1_A(x) = 1$ if $x \in A$, $\mathbf 1_A(x) = 0$ otherwise). The two cases (6) and (7) respectively give $\hat c_s(j, n+j-N-1) = (N-j+1)E\{X\}$ and $\hat C_s(j, n, X_j) = 0$ for all $X_j$, which is used to initialize the backward recurrence (8).
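For concreteness, here is a minimal sketch of the backward recurrence (8) with the boundary cases (6) and (7), written for a discrete law (the fair die used in Example 1 below); the function and variable names are illustrative, not from the paper.

```python
# Backward recurrence (8) for the expected gain-to-go of a constant-threshold
# rule; xs, ps describe a discrete law, s is the threshold.
def constant_threshold_gain(N, n, xs, ps, s):
    EX = sum(x * p for x, p in zip(xs, ps))
    xhat = sum(x * p for x, p in zip(xs, ps) if x > s)   # x^(s) = E{X 1_{X>s}}
    Fbar = sum(p for x, p in zip(xs, ps) if x > s)       # F̄(s) = Prob(X > s)
    c = [[0.0] * (n + 1) for _ in range(N + 2)]          # c[j][a] = ĉ_s(j, a)
    for j in range(N, 0, -1):
        for a in range(n + 1):
            if a == n:
                c[j][a] = 0.0                            # case (7): storage full
            elif a + N - j + 1 <= n:
                c[j][a] = (N - j + 1) * EX               # case (6): select the rest
            else:                                        # recurrence (8)
                c[j][a] = xhat + Fbar * c[j + 1][a + 1] + (1 - Fbar) * c[j + 1][a]
    return c[1][0]

# OLFO uses s = E{X} = 3.5 for the fair die; with N = 7, n = 3 this returns
# ĉ(1,0) ≈ 14.11, the lower-left entry of Ĝ in Example 1 below.
print(constant_threshold_gain(7, 3, range(1, 7), [1 / 6] * 6, 3.5))
```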

2.2 Closed-Loop decisions

The optimal thresholds $\tilde s(j, a_j)$ are obtained by solving the following stochastic dynamic programming problem:
\[
\max_{u_j\in\mathcal U_j}\Big[u_j X_j + E_{X_{j+1}}\Big\{\max_{u_{j+1}\in\mathcal U_{j+1}}\Big[u_{j+1}X_{j+1} + \cdots + E_{X_{N-1}}\Big\{\max_{u_{N-1}\in\mathcal U_{N-1}}\Big[u_{N-1}X_{N-1} + E_{X_N}\Big\{\max_{u_N\in\mathcal U_N} u_N X_N\Big\}\Big]\Big\}\Big]\Big\}\Big], \tag{10}
\]
with the constraints (6,7) on the sets $\mathcal U_j$. Let $\tilde C(j, a_j, X_j)$ denote the optimal conditional expected gain-to-go (5) when $u_j^N$ is chosen optimally, and $\tilde c(j, a_j)$ denote its expected value with respect to $X_j$. One has, for $n-N+j-1 < a_j < n$, $\tilde C(j, a_j, X_j) = \max_{u_j\in\{0,1\}}[u_j X_j + \tilde c(j+1, a_j + u_j)]$. The optimal decision is thus
\[
\tilde u_j(a_j, X_j) = \begin{cases} 0 & \text{if } X_j \le \tilde s(j, a_j) = \tilde c(j+1, a_j) - \tilde c(j+1, a_j+1),\\ 1 & \text{otherwise,}\end{cases} \tag{11}
\]
which gives $\tilde C(j, a_j, X_j) = \max[X_j + \tilde c(j+1, a_j+1), \tilde c(j+1, a_j)]$ and the following backward recurrence equation for $\tilde c(j, a_j)$:
\[
\tilde c(j, a_j) = E\{\max[X + \tilde c(j+1, a_j+1),\ \tilde c(j+1, a_j)]\}. \tag{12}
\]
The two cases (6) and (7) give $\tilde c(j, j+n-N-1) = (N-j+1)E\{X\}$ and $\tilde C(j, n, X_j) = 0$ for all $X_j$, which initializes the recurrence (12).

Example 1:

Consider the case where the $X_k$'s are the results of the roll of a fair die. The die is rolled $N = 7$ times and $n = 3$ results can be stored. The matrices $[\hat G]_{i,j} = \hat c(j, n-i)$, $[\tilde G]_{i,j} = \tilde c(j, n-i)$ and $[\tilde T]_{i,j} = \tilde c(j+1, n-i) - \tilde c(j+1, n-i+1)$ are given by
\[
\hat G \simeq \begin{pmatrix} \cdot & \cdot & 4.91 & 4.82 & 4.625 & 4.25 & 3.5\\ \cdot & 9.625 & 9.34 & 8.875 & 8.125 & 7 & 3.5\\ 14.11 & 13.59 & 12.84 & 11.81 & 10.5 & 7 & 3.5 \end{pmatrix},
\]
\[
\tilde G \simeq \begin{pmatrix} \cdot & \cdot & 5.13 & 4.94 & 4.67 & 4.25 & 3.5\\ \cdot & 9.83 & 9.43 & 8.92 & 8.17 & 7 & 3.5\\ 14.26 & 13.68 & 12.93 & 11.94 & 10.5 & 7 & 3.5 \end{pmatrix},
\]
\[
\tilde T \simeq \begin{pmatrix} \cdot & \cdot & 4.94 & 4.67 & 4.25 & 3.5 & 0\\ \cdot & 4.30 & 3.97 & 3.5 & 2.75 & 0 & 0\\ 3.85 & 3.5 & 3.028 & 2.33 & 0 & 0 & 0 \end{pmatrix}.
\]
One starts at step 1 in the lower left corner of the matrices. If $(i,j)$ is the position at step $k$, the position at step $k+1$ is $(i-1, j+1)$ if $X_k$ is larger than the threshold (that is, $E\{X\}$ for the OLFO rule, $[\tilde T]_{i,j}$ for the optimal rule) and $(i, j+1)$ otherwise. The dots indicate situations that are not reachable. The threshold of the optimal rule increases with the number $N-j$ of steps-to-go and with the number $n-i$ of items already stored. It also increases when $n-i$ increases with $N-j-i$ constant, see Lemma 1 in Section 3.3. When $N = 50$, the first 5 columns of the matrices are:
\[
\hat G \simeq \begin{pmatrix} \cdot & \cdot & 5.000 & 5.000 & 5.000\\ \cdot & 10.000 & 10.000 & 10.000 & 10.000\\ 15.000 & 15.000 & 15.000 & 15.000 & 15.000 \end{pmatrix}, \tag{13}
\]
\[
\tilde G \simeq \begin{pmatrix} \cdot & \cdot & 5.9997 & 5.9996 & 5.9995\\ \cdot & 11.997 & 11.996 & 11.995 & 11.995\\ 17.984 & 17.982 & 17.979 & 17.975 & 17.972 \end{pmatrix}, \tag{14}
\]
\[
\tilde T \simeq \begin{pmatrix} \cdot & \cdot & 5.9996 & 5.9995 & 5.9994\\ \cdot & 5.9965 & 5.9959 & 5.9952 & 5.9943\\ 5.9849 & 5.9826 & 5.9799 & 5.9769 & 5.9734 \end{pmatrix}. \tag{15}
\]

□
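The optimal rule of Section 2.2 admits the same kind of implementation as the OLFO sketch above. The following sketch iterates (11) and (12) for the die example and reproduces the entries of $\tilde G$ and $\tilde T$; names are again illustrative.

```python
# Backward recurrence (12) and thresholds (11) for a discrete law.
def optimal_rule(N, n, xs, ps):
    EX = sum(x * p for x, p in zip(xs, ps))
    c = [[0.0] * (n + 1) for _ in range(N + 2)]          # c[j][a] = c~(j, a)
    s = [[None] * (n + 1) for _ in range(N + 1)]         # s[j][a] = s~(j, a)
    for j in range(N, 0, -1):
        for a in range(n + 1):
            if a == n:
                c[j][a] = 0.0                            # case (7)
            elif a + N - j + 1 <= n:
                c[j][a] = (N - j + 1) * EX               # case (6)
            else:
                s[j][a] = c[j + 1][a] - c[j + 1][a + 1]  # threshold of (11)
                c[j][a] = sum(p * max(x + c[j + 1][a + 1], c[j + 1][a])
                              for x, p in zip(xs, ps))   # recurrence (12)
    return c, s

c, s = optimal_rule(7, 3, range(1, 7), [1 / 6] * 6)
print(c[1][0], s[1][0])   # ≈ 14.26 and 3.85: entries [G~]_{3,1} and [T~]_{3,1}
```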

In this simple example the expected gain-to-go and optimal thresholds tend to a limit when the number $N-j$ of steps-to-go increases while the storage capacity remains bounded. This type of behavior is general when the support of $\mu(\cdot)$ is bounded from above, as shown in the next section.

3 Performance

3.1 Upper bound

Consider first the optimal non-sequential strategy, which corresponds to the selection of the $n$ best items once all $N$ have been observed. This gives an upper bound on $\tilde c(1,0)$. Let $\{X^*_{k,N}\}_k$ denote the sequence obtained by ordering the $X_k$'s by decreasing values, $k = 1,\dots,N$; one has
\[
\tilde c(1,0) \le E\Big\{\sum_{k=1}^n X^*_{k,N}\Big\}. \tag{16}
\]
Let $F^*_{k,N}(\cdot)$ denote the distribution function of $X^*_{k,N}$; see, e.g., [24], p. 190,
\[
F^*_{k,N}(x) = \sum_{i=0}^{k-1} C^i_N\,[F(x)]^{N-i}[1-F(x)]^i = B_{k,N+1-k}[F(x)],\qquad k = 1,\dots,N,
\]
with $C^i_N = N!/[i!(N-i)!]$ and $B_{p,q}(\cdot)$ the incomplete Beta function
\[
B_{p,q}(x) = \frac{\Gamma(p+q)}{\Gamma(p)\Gamma(q)}\int_0^x t^{p-1}(1-t)^{q-1}\,dt.
\]
Using $E\{X^*_{k,N}\} = \int_0^\infty [1 - F^*_{k,N}(t)]\,dt - \int_{-\infty}^0 F^*_{k,N}(t)\,dt$, one can easily compute the upper bound (16) for any $F(\cdot)$. For instance, one gets when $\mu(\cdot)$ is uniform on $[0,1]$: $E\{X^*_{k,N}\} = (N+1-k)/(N+1)$, $\bar F(E\{X^*_{k,N}\}) = k/(N+1)$, and
\[
E J_N(u_1^N) \le n\,\frac{N}{N+1} - \frac{n(n-1)}{2(N+1)}.
\]
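As a numerical cross-check of the uniform case, here is a sketch; the midpoint integration grid is an arbitrary choice, not from the paper.

```python
from math import comb

# Upper bound (16) for the uniform law: E{X*_{k,N}} computed from the order-
# statistic distribution F*_{k,N}, compared with the closed form above.
def mean_order_stat_uniform(k, N, grid=20000):
    Fk = lambda x: sum(comb(N, i) * x ** (N - i) * (1 - x) ** i for i in range(k))
    return sum(1.0 - Fk((i + 0.5) / grid) for i in range(grid)) / grid

N, n = 50, 10
bound = sum(mean_order_stat_uniform(k, N) for k in range(1, n + 1))
print(bound)                                             # ≈ 8.92
print(n * N / (N + 1) - n * (n - 1) / (2 * (N + 1)))     # same value, closed form
```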

Since, at step $j$, $n - a_j$ items are still to be collected among $N-j+1$, an intuitive open-loop sequential decision rule for $n-N+j-1 < a_j < n$ is as follows:
\[
u_j(a_j, X_j) = \begin{cases} 1 & \text{if } X_j \ge s(j, a_j) = E\{X^*_{n-a_j,\,N-j+1}\},\\ 0 & \text{otherwise.}\end{cases} \tag{17}
\]
For the uniform law, this gives the thresholds $s(j, a_j) = [N-j+2-(n-a_j)]/(N-j+2)$. We shall see (Examples 2 and 3 and Remark 3) that this rule is generally asymptotically suboptimal (for $N \to \infty$ with $n$ fixed). Another simple open-loop rule will be suggested in Section 4 and will be proved to be asymptotically optimal for a large class of distributions.

3.2 Constant threshold

Consider a decision rule that uses a constant threshold $s$. We reverse the course of time in order to transform the backward recurrence (8) into a forward recurrence. For any step $j$, with $a_j$ items already stored, denote the expected gain-to-go $\hat c_s(j, a_j)$ by $\hat\gamma^k_m$, with $k = k(a_j) = n - a_j$ (the current storage capacity) and $m = m(j, a_j) = N + 1 - j - n + a_j$ (the number of steps-to-go before reaching the situation (6)). One gets with this new notation:
\[
\hat\gamma^k_0 = k E\{X\}\ \ \forall k \ge 0,\qquad \hat\gamma^0_m = 0\ \ \forall m \ge 0, \tag{18}
\]
\[
\hat\gamma^k_m = \hat x(s) + \bar F(s)\,\hat\gamma^{k-1}_m + F(s)\,\hat\gamma^k_{m-1},\qquad \forall m \ge 1,\ k \ge 1. \tag{19}
\]
The limiting behavior of $\hat\gamma^k_m$ when $m$ tends to infinity is then given by the following theorem.

Theorem 1 For any fixed $k \ge 0$ and any $s$ such that $\bar F(s) > 0$, the decision rule that uses a constant threshold $s$ is such that
\[
\lim_{m\to\infty}\hat\gamma^k_m = k\,\frac{\hat x(s)}{\bar F(s)}, \tag{20}
\]
with $\hat x(s)$ and $\bar F(s)$ given by (9). Moreover, for any finite $m$,
\[
\boldsymbol\gamma_m = \frac{\hat x(s)}{\bar F(s)}\,\mathbf u + \Big[E\{X\} - \frac{\hat x(s)}{\bar F(s)}\Big]\,[1 - \bar F(s)]^m\,\mathbf V^{-m}\mathbf u, \tag{21}
\]
where $\boldsymbol\gamma_m = (\hat\gamma^1_m,\dots,\hat\gamma^k_m)^\top$, $\mathbf u = (1,\dots,k)^\top$ and $\mathbf V = \mathbf I - \bar F(s)\mathbf U$, with $\mathbf I$ the $k$-dimensional identity matrix and
\[
\mathbf U = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & 0\\ 1 & 0 & 0 & \cdots & 0 & 0\\ 0 & 1 & 0 & \cdots & 0 & 0\\ \vdots & & \ddots & \ddots & & \vdots\\ 0 & 0 & 0 & \cdots & 1 & 0 \end{pmatrix}. \tag{22}
\]

Note that (21) can be used to optimize the threshold $s$: for any $(m,k)$ fixed in advance, one can determine the threshold $s^*(m,k)$ that will give the maximum possible value of the expected gain-to-go among all strategies that use a constant threshold. We shall see (Example 4 (ii)) that this approach is generally suboptimal, even asymptotically.

Example 1: (continued)

For $s = E\{X\} = 3.5$ one has $\bar F(s) = 0.5$ and $\hat x(s) = 2.5$, which gives $\lim_{m\to\infty}\hat\gamma^k_m = 5k$, as shown in (13). □
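The limit can also be checked numerically from the forward recurrence (18,19); a small sketch for the die:

```python
# Forward recurrence (18)-(19) with s = E{X} = 3.5 on the fair die:
# x^(s) = 2.5, F̄(s) = 0.5, and Theorem 1 predicts γ̂^k_m → 5k.
xhat, Fbar, EX = 2.5, 0.5, 3.5
K, M = 3, 200
g = [[0.0] * (M + 1) for _ in range(K + 1)]   # g[k][m] = γ̂^k_m; row k = 0 stays 0
for k in range(1, K + 1):
    g[k][0] = k * EX                          # γ̂^k_0 = k E{X}, see (18)
    for m in range(1, M + 1):
        g[k][m] = xhat + Fbar * g[k - 1][m] + (1 - Fbar) * g[k][m - 1]
print([round(g[k][M], 4) for k in range(1, K + 1)])   # ≈ [5.0, 10.0, 15.0]
```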

3.3 Optimal decisions

We use notation similar to that of the previous section, and denote the optimal expected gain-to-go $\tilde c(j, a_j)$ by $\tilde\gamma^k_m$, with $k = n - a_j$ and $m = N + 1 - j - n + a_j$. The two situations (6) and (7) now give
\[
\tilde\gamma^k_0 = k E\{X\}\ \ \forall k \ge 0,\qquad \tilde\gamma^0_m = 0\ \ \forall m \ge 0. \tag{23}
\]

Define $M = \min\{x \mid F(x) = 1\} \in \mathbb R \cup \{+\infty\}$, so that $M < \infty$ corresponds to the case where the support of $\mu(\cdot)$ is bounded from above. Similarly, let $M' \in \mathbb R \cup \{-\infty\}$ denote the infimum of the support of $\mu(\cdot)$. Define $h(\cdot)$ by
\[
h(\cdot):\ s \mapsto h(s) = \hat x(s) - s\bar F(s) = E\{(X-s)^+\}, \tag{24}
\]
where $x^+ = \max(x,0)$, so that $h(s) + s = E\{\max(X,s)\}$ and $h(M') + M' = E\{X\}$. The ratio $h(s)/\bar F(s)$ is known as the mean excess function. The function $h(\cdot)$ is continuous and satisfies $h(s) > 0$ for any $s < M$ (and, when $M$ is finite, $h(s) = 0$ for any $s \ge M$). Also,
\[
\forall s_1 < s_2,\quad h(s_1) - h(s_2) \ge 0, \tag{25}
\]
where the inequality is strict when $\bar F(s_1) > 0$, that is, when $s_1 < M$, and
\[
\forall s_1 < s_2,\quad s_1 + h(s_1) \le s_2 + h(s_2), \tag{26}
\]
where the inequality is strict when $\bar F(s_2) < 1$, that is, when $s_2 > M'$. When $\mu(\cdot)$ is absolutely continuous with respect to the Lebesgue measure, with $\varphi(\cdot)$ its density, then $h(\cdot)$ is twice differentiable and
\[
\frac{dh(s)}{ds} = -\bar F(s),\qquad \frac{d^2h(s)}{ds^2} = \varphi(s). \tag{27}
\]

We shall denote the optimal threshold in the decision rule (11) by $\tilde s^k_m$, with, for $k + m \ge 1$, $\tilde s^k_m = \tilde\gamma^k_{m-1} - \tilde\gamma^{k-1}_m$. In agreement with (6,7), we set $\tilde s^0_m = \infty$ and $\tilde s^k_0 = M'$ for $k \ge 1$, $m \ge 1$. This gives, for any $k \ge 2$, $m \ge 1$:
\[
\tilde s^k_m = \tilde\gamma^k_{m-1} - \tilde\gamma^{k-1}_{m-1} - (\tilde\gamma^{k-1}_m - \tilde\gamma^{k-2}_m) + \tilde s^{k-1}_m. \tag{28}
\]
The backward recurrence (12) can be written
\[
\tilde\gamma^k_m = \tilde\gamma^{k-1}_m + E\{\max(X, \tilde\gamma^k_{m-1} - \tilde\gamma^{k-1}_m)\} = \tilde\gamma^{k-1}_m + E\{\max(X, \tilde s^k_m)\}, \tag{29}
\]
and thus
\[
\tilde\gamma^k_m - \tilde\gamma^{k-1}_m = E\{\max(X, \tilde s^k_m)\} = \tilde s^k_m + h(\tilde s^k_m). \tag{30}
\]
Together with (28), this gives, for any $k \ge 1$, $m \ge 1$,
\[
\tilde s^k_m = \tilde s^k_{m-1} + h(\tilde s^k_{m-1}) - h(\tilde s^{k-1}_m), \tag{31}
\]
which corresponds to (8) in [9]. The following lemma states some properties of $\tilde s^k_m$ and $\tilde\gamma^k_m$.

Lemma 1 The optimal threshold $\tilde s^k_m$ satisfies $M' < \tilde s^1_1 < M$ and
\[
\forall n \ge 2,\quad M' < \tilde s^n_1 < \cdots < \tilde s^{n-m+1}_m < \tilde s^{n-m}_{m+1} < \cdots < \tilde s^1_n < M. \tag{32}
\]
The optimal expected gain-to-go satisfies
\[
\forall m \ge 0,\ k \ge 1,\quad \tilde\gamma^k_m < kM. \tag{33}
\]

The next theorem gives the limiting performance of the optimal decision rule.

Theorem 2 The optimal expected gain-to-go and optimal threshold satisfy
\[
\forall k \ge 1,\quad \lim_{m\to\infty}\tilde s^k_m = M,\qquad \lim_{m\to\infty}\tilde\gamma^k_m = kM,
\]
with $M \in \mathbb R \cup \{+\infty\}$ the supremum of the support of $\mu(\cdot)$.

Example 1: (continued)

One has $M = 6$, so that the third columns of the matrices $\tilde G$ and $\tilde T$ respectively tend to $(6, 12, 18)^\top$ and $(6, 6, 6)^\top$ when $N$ tends to infinity, see (14,15). □

Example 2: $\mu(\cdot)$ is uniform.

(i) Assume that $\mu(\cdot)$ is uniform on $[0,1]$: $\varphi(x) = 1$, $x \in [0,1]$. One gets for $s$ in $[0,1]$: $\bar F(s) = 1 - s$, $\hat x(s) = (1-s^2)/2$, $h(s) = (1-s)^2/2$. The recurrence equation for $\tilde s^1_m = \tilde\gamma^1_{m-1}$ is thus: $\tilde s^1_1 = E\{X\} = 1/2$ and $\tilde s^1_m = \tilde s^1_{m-1} + (1 - \tilde s^1_{m-1})^2/2$, $m > 1$, which is studied for instance in [16, 19]. Consider the following differential equation: $dz(t)/dt = h[z(t)] = [1 - z(t)]^2/2$, $z(1) = 1/2$, which gives $z(t) = (t+1)/(t+3)$ and $\bar F[z(t)] = 2/(t+3)$. Define $\epsilon_m = \{\bar F[z(m)] - \bar F(\tilde s^1_m)\}/\bar F[z(m)]$. One has
\[
\epsilon_{m+1} - \epsilon_m = -\frac{m+5}{(m+3)^2}\,\epsilon_m + \frac{m+4}{(m+3)^2}\,(\epsilon_m)^2 + \frac{1}{(m+3)^2},
\]
so that $\epsilon_{m+1} < \epsilon_m$ for $1/(m+4) < \epsilon_m < 1$, and one can show that $\epsilon_m \to 0$ when $m \to \infty$. Therefore, $\bar F(\tilde s^1_m) \simeq \bar F[z(m)] \simeq 2/m$ when $m \to \infty$. Similar calculations give $m\bar F(\tilde s^2_m) \to 1 + \sqrt 5$ and $m\bar F(\tilde s^3_m) \to 1 + \sqrt{7 + 2\sqrt 5}$, $m \to \infty$. Note that for the decision rule (17), derived from the optimal non-sequential policy of Section 3.1, one gets:
\[
\bar F[s(j, a_j)] = \bar F[s^k_m] = \bar F(E\{X^*_{n-a_j,\,N-j+1}\}) = \frac{n - a_j}{N - j + 2} = \frac{k}{m + k + 1},
\]
and thus $m\bar F[s^k_m] \to k$ when $m \to \infty$. The optimal sequential rule therefore differs from (17), even asymptotically.

(ii) Consider now a situation where $F(\cdot)$ is continuous and strictly increasing, with $E\{\sum_{k=1}^N u_k F(X_k)\}$ to be maximized. Direct calculation gives, for $\bar\gamma^1_m = 1 - \tilde\gamma^1_m$: $\bar\gamma^1_1 = 1/2$, $\bar\gamma^1_2 = 3/8,\dots,$ $\bar\gamma^1_m = \bar\gamma^1_{m-1}(1 - \bar\gamma^1_{m-1}/2)$, and $\bar\gamma^1_m = 2/m + o(1/m)$, $m \to \infty$. This result can also be obtained by noticing that situation (ii) coincides with (i), since $\{X'_k\}_k = \{F(X_k)\}_k$ is i.i.d. and uniformly distributed in $[0,1]$, see, e.g., [24], p. 190. □
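A quick numerical check of the limit $m\bar F(\tilde s^1_m) \to 2$ in case (i); a sketch:

```python
# Example 2 (i): iterate s~^1_m = s~^1_{m-1} + (1 - s~^1_{m-1})^2 / 2, starting
# from s~^1_1 = E{X} = 1/2, and monitor m F̄(s~^1_m) = m (1 - s~^1_m).
s = 0.5
for m in range(1, 100001):
    if m in (100, 1000, 10000, 100000):
        print(m, m * (1.0 - s))        # slowly increases towards 2
    s += 0.5 * (1.0 - s) ** 2
```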

Example 3:

Assume that $\varphi(s) = K(M-s)^\gamma$ for $s_0 \le s \le M$, with $\gamma > -1$ (the case $\gamma = 0$ corresponds to Example 2). It gives
\[
\bar F(s) = \frac{K}{\gamma+1}(M-s)^{\gamma+1},\qquad h(s) = \frac{K}{(\gamma+1)(\gamma+2)}(M-s)^{\gamma+2},\qquad s_0 \le s \le M.
\]
Take $i$ as the first value of $m$ such that $\tilde s^1_m \ge s_0$. For any $m \ge i$, $\tilde s^1_{m+1} = \tilde s^1_m + K(M - \tilde s^1_m)^{\gamma+2}/[(\gamma+1)(\gamma+2)]$ and $\bar F(\tilde s^1_{m+1}) = \bar F(\tilde s^1_m)\big[1 - \bar F(\tilde s^1_m)/(\gamma+2)\big]^{\gamma+1}$. Consider the differential equation
\[
\frac{dz(t)}{dt} = h[z(t)],\qquad z(m_0) = \tilde s^1_{m_0}, \tag{34}
\]
which gives for $m_0 > i$: $[M - z(t)]^{\gamma+1} = (\gamma+2)/[K(t+C)]$, with $C = (\gamma+2)/\{K[M - z(m_0)]^{\gamma+1}\} - m_0$. Therefore, $\bar F[z(t)] = (\gamma+2)/[(\gamma+1)(t+C)] \simeq (\gamma+2)/[t(\gamma+1)]$. Define $\epsilon_m = 1 - m(\gamma+1)\bar F(\tilde s^1_m)/(\gamma+2)$, which gives $\epsilon_{m+1} = \beta_0(m) + \beta_1(m)\epsilon_m + \beta_2(m)(\epsilon_m)^2 + \cdots$ with $\beta_0(m) = (\gamma+2)/[2(\gamma+1)](1/m^2) + O(1/m^3)$, $\beta_1(m) = 1 - 1/m + (\gamma+4)/[2(\gamma+1)](1/m^2) + O(1/m^3)$, $\beta_2(m) = 1/m - (\gamma-2)/[2(\gamma+1)](1/m^2) + O(1/m^3)$. For $m_0$ large enough in (34), $\epsilon_m$ has the same behavior as the solution of the recurrence equation $\omega_{m+1} = (\gamma+2)/[2(\gamma+1)](1/m^2) + (1 - 1/m)\omega_m$, which satisfies $\omega_m \to 0$, $m \to \infty$, see Lemma 2 in the appendix. Therefore, $\epsilon_m \to 0$ and
\[
m\bar F(\tilde s^1_m) \to \frac{\gamma+2}{\gamma+1},\qquad m \to \infty. \tag{35}
\]
For the heuristic rule (17), one has (see [10], p. 137)
\[
E\{X^*_{1,m}\} = M - \left(\frac{Km}{\gamma+1}\right)^{-1/(\gamma+1)}\frac{\Gamma[1/(\gamma+1)]}{\gamma+1} + o\big(m^{-1/(\gamma+1)}\big),
\]
and this threshold satisfies $m\bar F(E\{X^*_{1,m}\}) \to \{\Gamma[1/(\gamma+1)]/(\gamma+1)\}^{\gamma+1}$, which differs from (35). □

Example 4:

We consider two different densities for $X$:
\[
\varphi_1(x) = \frac{1}{\theta}\exp\Big(-\frac{x}{\theta}\Big),\qquad \varphi_2(x) = \frac{\sqrt 2}{\sqrt\pi\,\theta}\exp\Big(-\frac{x^2}{2\theta^2}\Big),\qquad x \ge 0. \tag{36}
\]
We take $\theta = 1$ in the computations below.

(i) Consider first the case $N = 1000$, $n = 3$. Figure 1 presents the evolution of $\tilde c(j,3)$ (full line) and $\hat c(j,3)$ (dashed line) as functions of $j$ for $\varphi_1(\cdot)$. The corresponding upper bound (16) is in dotted line. Figure 2 presents the curves obtained for $\varphi_2(\cdot)$. For both densities, the performance of the optimal rule is much better than that of OLFO decisions, and the loss due to the sequential character of the decisions is small.

POSSIBLE LOCATION OF FIGURE 1

POSSIBLE LOCATION OF FIGURE 2

Figure 3 concerns the case where optimal decisions are taken with the wrong density. For the curve in dashed line, optimal decisions are taken for $\varphi_1(\cdot)$ while the true density is $\varphi_2(\cdot)$. For the curve in full line, the true density is $\varphi_1(\cdot)$ and optimal decisions are for $\varphi_2(\cdot)$. Comparison with Figures 1 and 2 indicates that errors on the assumed distribution of the $X_k$'s produce a reasonably small decrease of performance for the optimal decision rule. This robustness with respect to the assumed distribution will be important when the distribution is estimated, see Section 5.

POSSIBLE LOCATION OF FIGURE 3

(ii) Take now $N = 1000$, $n = 1$. Figure 4 presents $\tilde c(j,1)$ (full line) and $\hat c_{s^*(j,1)}(j,1)$ (dash-dotted line) as functions of $j$ for $\varphi_1(\cdot)$, with $s^*(j,1)$ the optimal value of $s$ obtained by maximizing $\hat c_s(j,1)$ with respect to $s$, see (21): $\hat c_{s^*(j,1)}(j,1)$ is thus the maximum value of the expected gain-to-go that could be obtained with a rule that uses a constant threshold from the current step $j$ up to $j = N$. Keeping the threshold constant appears to be suboptimal, even asymptotically, already in the case $n = 1$.

POSSIBLE LOCATION OF FIGURE 4

□

From Theorems 1 and 2, the performance of the OLFO rule can be much worse than that of the optimal rule, in particular when the probability measure $\mu(\cdot)$ has a thin tail, as illustrated by Example 4 (i). Keeping the threshold constant, even if optimized, is also suboptimal, as shown in Figure 4. On the other hand, the optimal threshold follows a nonlinear recurrence equation, which may require heavy computations (note in particular that it is impossible to compute all the thresholds in advance when the distribution of the $X_k$'s is estimated on-line, as is the case in Section 5). The derivation of a simpler open-loop solution, with performance close to optimal, is thus of special interest. A natural approach is to use an adaptive threshold of the form $s^k_m = f(m,k)$, with $f(m,k)$ derived from the asymptotic behavior of the threshold $\tilde s^k_m$ of the optimal solution. This is considered in the next section.

4 An asymptotically optimal open-loop rule

We assume that $\mu(\cdot)$ is absolutely continuous with respect to the Lebesgue measure, with density $\varphi(\cdot)$ having unbounded support ($M = \infty$). We know from Theorem 2 that the optimal expected gain-to-go $\tilde\gamma^k_m \to \infty$ when $m \to \infty$ with $k$ fixed, $k \ge 1$, whereas, from Theorem 1, $\hat\gamma^k_m$ remains bounded when the threshold $s$ is kept constant. The purpose of this section is to construct open-loop decision rules with a performance close to optimal when $m \to \infty$. We make the following assumption on the tail of $\mu(\cdot)$.

H1: $F(\cdot)$ is twice differentiable, the density $\varphi(\cdot)$ is such that $\varphi(s) > 0$, and its derivative $\varphi'(\cdot)$ satisfies $\varphi'(s) < 0$ for $s$ larger than some $s_1$. Moreover, $\bar F(\cdot)$ has the representation
\[
\bar F(s) = \bar F(s_0)\exp\Big[-\int_{s_0}^s \frac{1}{a(t)}\,dt\Big],\qquad s \ge s_0,
\]
where the auxiliary function $a(t) > 0$ is absolutely continuous w.r.t. the Lebesgue measure, with derivative $a'(t)$ having limit $\lim_{t\to\infty} a'(t) = \bar a \in [0,1)$.

Note that $a(s) = \bar F(s)/\varphi(s)$ for $s > s_0$. When $\bar a = 0$, $\bar F(\cdot)$ is a von Mises function, see [10], p. 138, a class which contains for instance the exponential, normal, lognormal, Weibull, Gamma, Erlang and Gumbel distributions, all with a tail decreasing faster than any power law $s^{-\alpha}$. In that case, $\lim_{t\to\infty}\bar F(t)\varphi'(t)/[\varphi(t)]^2 = -1$ and $\lim_{t\to\infty} a(t)/t = 0$, see [10], p. 140. When $a(t) = t/b(t)$ with $b(t) \to \alpha \in (1,\infty)$ as $t \to \infty$, $\bar a = 1/\alpha$ and $\bar F(\cdot)$ is regularly varying with index $-\alpha$; that is (see [10], p. 566),
\[
\bar F(\cdot) \in \mathcal R_{-\alpha}:\quad \lim_{s\to\infty}\frac{\bar F(ts)}{\bar F(s)} = t^{-\alpha},\qquad t > 0.
\]

We define
\[
A(s) = \frac{h(s)\,\varphi(s)}{[\bar F(s)]^2},\qquad \bar A = \lim_{s\to\infty} A(s). \tag{37}
\]
When $\bar F(\cdot)$ is a von Mises function, direct application of L'Hôpital's rule shows that $\bar A = 1$. When $\bar F(\cdot) \in \mathcal R_{-\alpha}$, $\alpha \in (1,\infty)$, $\varphi(\cdot) \in \mathcal R_{-(\alpha+1)}$ and, from [13], vol. 2, p. 281, $\bar A = \alpha/(\alpha-1)$. We shall use the following definitions to characterize the asymptotic performance of a given decision rule.

Definition 1 Consider a suboptimal decision rule with expected gain-to-go $\gamma^k_m$. We shall say that it is asymptotically equivalent to the optimal rule (AEOR) if $\gamma^k_m/\tilde\gamma^k_m \to 1$ as $m \to \infty$, and that it is asymptotically optimal (AO) if $\tilde\gamma^k_m - \gamma^k_m \to 0$ as $m \to \infty$.

Note that asymptotic optimality implies asymptotic equivalence to the optimal rule. The asymptotics we shall consider correspond to $m \to \infty$ with $k$ fixed, that is, in the setup of Sections 1 and 2, to $N \to \infty$ with $n$ fixed. Albright and Derman [1] consider the case where $n = N - [\alpha N]$, $\alpha \in (0,1)$, $N \to \infty$, with $[x]$ rounding $x$ towards the nearest integer. They show (Theorem 2) that for $\alpha \in (0,1)$ and $m \to \infty$, $\bar F(\tilde s^{[\alpha m]}_m) \to \alpha/(1+\alpha)$, and (Theorem 1 and Lemma 2) that the open-loop rule with constant threshold $\bar F^{-1}[\alpha/(1+\alpha)]$ is AEOR. The case where $k$ is fixed is more difficult, in the sense that, in general, constant thresholds do not provide asymptotic equivalence to the optimal rule (see Figure 4 in Example 4 (ii) and Remark 5 below). When the threshold is suitably chosen as a function of $m$ and $k$, we get the following result.

Theorem 3 (i) $k = 1$. If $\bar F(\cdot)$ satisfies H1, an open-loop rule that uses thresholds $s^1_m$ such that $m\bar A\bar F(s^1_m) \to 1$ as $m \to \infty$ is AEOR when $\liminf_{s\to\infty} a(s) > c > 0$, and AO when $\limsup_{s\to\infty} a(s) < C < \infty$.

(ii) $k \ge 1$. If $\bar F(\cdot)$ satisfies H1 with $\bar a = 0$ ($\bar F(\cdot)$ is a von Mises function), an open-loop rule that uses thresholds $s^k_m$ such that $m\bar F(s^k_m) \to k$ as $m \to \infty$ is AEOR when $\liminf_{s\to\infty} a(s) > c > 0$, and AO when $a(s) \to 0$ as $s \to \infty$.
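To illustrate Theorem 3 (ii), here is a Monte-Carlo sketch for the exponential law (a von Mises tail with $\bar A = 1$), using thresholds $s^k_m$ defined by $\bar F(s^k_m) = k/m$; the sample sizes are arbitrary choices.

```python
import math, random

# Open-loop rule with adaptive thresholds: at step j, select X_j when
# F̄(X_j) = exp(-X_j) ≤ k/m, with k = n - a_j and m = N + 1 - j - n + a_j.
def adaptive_open_loop_gain(N, n, rng):
    gain, a = 0.0, 0
    for j in range(1, N + 1):
        if a == n:
            break
        x = rng.expovariate(1.0)
        k, m = n - a, N + 1 - j - n + a
        if m <= 0 or m * math.exp(-x) <= k:   # m ≤ 0: constraint (6), take all
            gain, a = gain + x, a + 1
    return gain

rng = random.Random(1)
mean = sum(adaptive_open_loop_gain(1000, 3, rng) for _ in range(2000)) / 2000
print(mean)   # close to the nonsequential upper bound (16) for N = 1000, n = 3
```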

Remark 1 Using Lemma 5 (ii) and the same arguments as in the proof of Theorem 3 (ii), one can easily show that when $\bar F(\cdot)$ is a von Mises function with $\limsup_{s\to\infty} a(s) < C < \infty$, then $\limsup_{m\to\infty}\,[\tilde\gamma^k_m - k\gamma^1_{m/k}] < kC(\log k + E_{\Lambda,k})$, with $E_{\Lambda,k}$ the mean of the sum of a $k$-dimensional $\Lambda$-extremal variate and $\gamma^1_{m/k}$ the expected gain-to-go of an open-loop rule that uses thresholds $s^k_m$ such that $m\bar F(s^k_m) \to k$, $m \to \infty$. Note that when $k = 1$ this only gives $\limsup_{m\to\infty}\,[\tilde\gamma^1_m - \gamma^1_m] < CE_\Lambda$, with $E_\Lambda$ the mean of the Gumbel distribution, whereas part (i) of Theorem 3 indicates that $\tilde\gamma^1_m - \gamma^1_m \to 0$. Our proof of part (ii), based on the decomposition of the original problem into $k$ subproblems (see the appendix), thus gives a pessimistic result. See also Remark 4.

Remark 2 The arguments used in the proof of Theorem 3 (ii) imply that $\tilde\gamma^k_m / E\{\sum_{i=1}^k X^*_{i,m}\} \to 1$ as $m \to \infty$ when $\bar F(\cdot)$ is a von Mises function. Moreover, if $\lim_{s\to\infty} a(s) = 0$, then $E\{\sum_{i=1}^k X^*_{i,m}\} - \tilde\gamma^k_m \to 0$ as $m \to \infty$. This means that the expected gain-to-go for sequential decisions can be made arbitrarily close to its upper bound (16) by choosing $N$ large enough, provided $\bar F(\cdot)$ decreases fast enough. Note, however, that the convergence can be very slow, as shown in Figure 2. In the case of the normal and exponential distributions considered in Example 4, there exist important results on the almost sure (a.s.) behavior of maxima, see [10], pp. 176-177:
\[
\lim_{m\to\infty}\frac{X^*_{1,m}}{\sqrt{2\log m}} = 1\ \text{ a.s. when } \bar F(s) \simeq \frac{K}{s}\exp(-s^2/2),\ s \to \infty,\ K > 0,
\]
\[
\lim_{m\to\infty}\frac{X^*_{1,m}}{\log m} = \frac1b\ \text{ a.s. when } \bar F(s) \simeq K\exp(-bs),\ s \to \infty,\ K > 0.
\]
This implies $\tilde\gamma^1_m/\log m \to 1$ for $\varphi_1(\cdot)$ and $\tilde\gamma^1_m/\sqrt{2\log m} \to 1$ for $\varphi_2(\cdot)$.

Remark 3 For any distribution function $F(\cdot)$, choosing a threshold $s^1_m$ that satisfies $m\bar F(s^1_m) \to c$ as $m \to \infty$, with $c$ some constant in $[0,\infty)$, is equivalent to $\lim_{m\to\infty}\mathrm{Prob}(X^*_{1,m} \le s^1_m) = \exp(-c)$. This is known under the name of Poisson approximation, see, e.g., [10], p. 116. Therefore, choosing thresholds $s^1_m$ that satisfy $\mathrm{Prob}(X^*_{1,m} \le s^1_m) \to \exp(-1/\bar A)$ implies $m\bar A\bar F(s^1_m) \to 1$, and Theorem 3 (i) applies. One can easily check that this is not the case for the open-loop rule (17), since $m\bar A\bar F(E\{X^*_{1,m}\}) \to l \ne 1$. Indeed, when $\bar F(\cdot) \in \mathcal R_{-\alpha}$, $\alpha > 1$, $\mathrm{Prob}(X^*_{1,m} \le E\{X^*_{1,m}\}) \to \exp[-(E_\Phi)^{-\alpha}]$, with $E_\Phi = \Gamma[(\alpha-1)/\alpha]$, and $(E_\Phi)^{-\alpha} \ne 1/\bar A$ for any $\alpha > 1$. When $\bar F(\cdot)$ is a von Mises function, $\mathrm{Prob}(X^*_{1,m} \le E\{X^*_{1,m}\}) \to \exp[-\exp(-E_\Lambda)]$, with $E_\Lambda$ the mean of the Gumbel distribution, and $\exp(-E_\Lambda) \ne 1$.

Remark 4 More precise results than Theorem 3 (ii) can be obtained for the exponential distribution, $\bar F(s) = K\exp(-bs)$, $K > 0$. Lemma 4 (iv) gives $\lim_{m\to\infty} m\bar F(\tilde s^1_m) = 1$. Define $\rho(\tilde s^k_m) = (1/K)\exp(b\tilde s^k_m) = 1/\bar F(\tilde s^k_m)$, so that $\rho(\tilde s^1_m) \simeq m$ as $m \to \infty$. The recurrence (31) gives for $k = 1$: $\rho(\tilde s^1_{m+1}) - \rho(\tilde s^1_m) = \{\exp[1/\rho(\tilde s^1_m)] - 1\}\rho(\tilde s^1_m) = 1 + 1/2m + o(1/m)$. Therefore, $\rho(\tilde s^1_m) = m + (1/2)\log m + o(\log m)$, and $\tilde s^1_m = (1/b)\log(Km) + (\log m)/(2bm) + o[(\log m)/m]$, $m \to \infty$. Define $\lambda^k_m = \rho(\tilde s^k_m)/\rho(\tilde s^{k-1}_m)$. For $k = 2$, one gets $\lambda^2_{m+1} = \lambda^2_m\exp\{[1/\lambda^2_m - \rho(\tilde s^1_m)/\rho(\tilde s^1_{m+1}) - 1]/\rho(\tilde s^1_m)\}$ and thus $\lambda^2_{m+1} = \lambda^2_m\exp\{[1/m + o(1/m)][1/\lambda^2_m - 2 + o(1)]\}$, which gives $\lambda^2_m \to 1/2$ as $m \to \infty$, and therefore $\rho(\tilde s^2_m) = m/2 + o(m)$. One gets similarly, for $k > 1$, $\lambda^{k+1}_{m+1} = \lambda^{k+1}_m\exp\{[1/\lambda^{k+1}_m - \rho(\tilde s^k_m)/\rho(\tilde s^k_{m+1}) + \rho(\tilde s^k_m)/\rho(\tilde s^{k-1}_{m+1}) - 1]/\rho(\tilde s^k_m)\}$. Assuming that $\rho(\tilde s^k_m) = m/k + o(m)$, which is true for $k = 2$, we get $\lambda^{k+1}_{m+1} = \lambda^{k+1}_m\exp\{[k/m + o(1/m)][1/\lambda^{k+1}_m - (k+1)/k + o(1)]\}$, which implies $\lambda^{k+1}_m \to k/(k+1)$ and $\rho(\tilde s^{k+1}_m) = m/(k+1) + o(m)$ as $m \to \infty$. Therefore, by induction on $k$, $\rho(\tilde s^k_m) = m/k + o(m)$; that is, for any $k$, $m\bar F(\tilde s^k_m) \to k$, $m \to \infty$. Lemma 3 (iv) then implies that for thresholds $s^k_m$ satisfying $m\bar F(s^k_m) \to k$ one has $|s^k_m - \tilde s^k_m| \to 0$, and a proof similar to that of Theorem 3 (i) shows that the decision rule defined by the $s^k_m$ is AO. We make the conjecture that this is more generally the case for distributions satisfying H1 with $\limsup_{s\to\infty} a(s) < C < \infty$.
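The limit $m\bar F(\tilde s^k_m) \to k$ of Remark 4 can be checked by iterating (31) directly; a numerical sketch for $b = K = 1$:

```python
import math

# Recurrence (31) for F̄(s) = exp(-s): h(s) = exp(-s) = E{(X-s)^+} for s ≥ 0,
# with boundary values s~^0_m = ∞ (so h = 0) and s~^k_0 = M' = 0.
h = lambda s: math.exp(-s)
K, M = 3, 200000
s_prev = None                                # row k-1 of thresholds
for k in range(1, K + 1):
    s = [0.0] * (M + 1)
    for m in range(1, M + 1):
        h_left = 0.0 if s_prev is None else h(s_prev[m])
        s[m] = s[m - 1] + h(s[m - 1]) - h_left          # recurrence (31)
    print(k, M * math.exp(-s[M]))            # m F̄(s~^k_m) → k
    s_prev = s
```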


Remark 5 A direct application of Theorem 1 shows that the expected gain-to-go $\hat\gamma^1_{\bar m}$ of the open-loop rule that uses a constant threshold $\bar F^{-1}[1/(\bar A\bar m)]$ satisfies $\hat\gamma^1_{\bar m}/\bar F^{-1}[1/(\bar A\bar m)] \to \bar A(1 - e^{-1/\bar A}) = R$, with $R = 1 - 1/e$ if $\bar F(\cdot)$ is a von Mises function, and $R = (1 - e^{-(\alpha-1)/\alpha})[\alpha/(\alpha-1)]$ if $\bar F(\cdot) \in \mathcal R_{-\alpha}$, $\alpha \in (1,\infty)$. From Theorem 3 (i), this means that, for $\bar F(\cdot)$ satisfying H1, if at a step $\bar m$ sufficiently large one freezes the threshold at the value $\bar F^{-1}[1/(\bar A\bar m)]$, the ratio of the expected gain-to-go to its optimal value will be close to $R < 1$. On the other hand, this ratio will be arbitrarily close to 1 for $m$ large enough when the threshold is adaptive, that is, equal to $\bar F^{-1}[1/(\bar A m)]$ at each step $m$.

Consider the following open-loop decision rule, defined by
\[
u_j(a_j, X_j) = \begin{cases} 0 & \text{if } \bar F(X_j) \ge \dfrac{n - a_j}{\bar A(N-j+1) + (1-\bar A)(n-a_j) - \epsilon},\\ 1 & \text{otherwise,}\end{cases} \tag{38}
\]
with $0 < \epsilon \le 1$. This CE open-loop rule (obtained from (38) by replacing $\bar F(\cdot)$ with its empirical estimate $\hat{\bar F}_j(\cdot)$) can be expressed directly in terms of the order statistics $\{X^*_{i,j-1}\}$:
\[
u_j(a_j, X_j) = \begin{cases} 0 & \text{if } X_j < X^*_{l_j,\,j-1},\\ 1 & \text{otherwise,}\end{cases} \tag{41}
\]
with, for any $j$, $X^*_{l,j-1} = -\infty$ for $l > j-1$, $X^*_{0,j-1} = +\infty$ and
\[
l_j = \left\lceil \frac{j\,(n - a_j)}{\bar A(N-j+1) + (1-\bar A)(n-a_j) - \epsilon} \right\rceil, \tag{42}
\]
where $\lceil x\rceil$ rounds $x$ to the nearest larger integer. Indeed, $X_j < X^*_{l_j,j-1}$ implies $\hat{\bar F}_j(X_j) \ge l_j/j$, and thus $\hat{\bar F}_j(X_j) \ge (n-a_j)/[\bar A(N-j+1) + (1-\bar A)(n-a_j) - \epsilon]$ and $u_j(a_j, X_j) = 0$ in (38); $X_j \ge X^*_{l_j,j-1}$ implies $\hat{\bar F}_j(X_j) \le (l_j-1)/j$ and thus $u_j(a_j, X_j) = 1$ in (38).
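A direct implementation of (41,42) is straightforward; the sketch below assumes $\bar A = 1$ and $\epsilon = 0.01$ and, for brevity, omits the forced rejection of the first items prescribed by (40).

```python
import math

# CE open-loop rule (41)-(42): select X_j by comparing it with the l_j-th
# largest of the past observations x_1, ..., x_{j-1}.
def ce_open_loop(x, n, Abar=1.0, eps=0.01):
    N, a, selected = len(x), 0, []
    for j in range(1, N + 1):
        if a == n:
            break
        if a + N - j + 1 <= n:                   # constraint (6): take the rest
            selected.append(x[j - 1]); a += 1; continue
        denom = Abar * (N - j + 1) + (1 - Abar) * (n - a) - eps
        lj = math.ceil(j * (n - a) / denom)      # rank l_j of (42)
        past = sorted(x[:j - 1], reverse=True)   # X*_{1,j-1} ≥ X*_{2,j-1} ≥ ...
        thr = past[lj - 1] if lj <= j - 1 else -math.inf   # X*_{l_j, j-1}
        if x[j - 1] >= thr:                      # rule (41)
            selected.append(x[j - 1]); a += 1
    return selected
```

For instance, `ce_open_loop(x, 10)` applied to a sample of length 50 mimics strategy S1 of Example 6 below, apart from the initial forced rejections.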

³ A particular feature of problem (1,2) in this context is that decisions are neutral with respect to estimation (see [12, 20] for a definition of neutrality in control problems): the sequence $\{X_k\}_k$ is observed whatever the decision sequence $\{u_k\}_k$, and decisions have no effect on the accuracy of the estimation of the distribution. The results in [3] therefore show that neutrality is not sufficient for certainty equivalence, which contradicts the suggestion in [20] (see also [5]).

Figure 6 presents $l_j$ given by (42) as a function of $j > 1$ for $N = 50$, $\bar A = 1$, $\epsilon = 0.01$, $a_j = 0$ and $n = 1$, $n = 5$, $n = 10$. Only the values $l_j < j$ are plotted; for $j$ large enough (with $a_j = 0$), $l_j \ge j$ and any item is selected. This corresponds to the constraint (6). Note that $l_j \ge 1$ for any $j \ge 2$, so that the first item larger than $X_1$ is always accepted if (41,42) are used starting at $j = 2$. Small values of $X$ might thus be accepted in case $X_1$ is small, hence the importance of forcing the rejection of the first items by (40).

POSSIBLE LOCATION OF FIGURE 6

Example 6:

Assume that the true measure $\mu(\cdot)$ has the density $\varphi_2(\cdot)$ in (36), with $\theta = 1$. We take $N = 50$, which gives $r = 19$ in (39), and compare six different strategies:

- S1 is the CE open-loop rule defined by (40-42) with $\bar A = 1$, $\epsilon = 0.01$; it is suboptimal because of CE and because of the open-loop decisions;
- S2 rejects all $X_j$'s for $j \le r-1$, with $r$ given by (39), and then accepts any $X_j$ larger than all previous ones, or such that $a_j + N - j + 1 \le n$; when $n = 1$, it maximizes the probability of selecting the largest $X_k$;
- S3 is optimal for the true measure $\mu$;
- S4 is the OLFO-CE rule with the empirical measure $\hat\mu_j$; it is suboptimal because of CE and open-loop decisions;
- S5 is the optimal-CE rule with the empirical measure $\hat\mu_j$; it is suboptimal because of CE;
- S6 is the optimal-CE rule with the measure $\mu_{\hat\theta_j}$, where the parameter $\theta$ corresponds to the standard deviation in $\varphi_2(\cdot)$, see (36), and $\hat\theta_j$ is the empirical standard deviation of $x_1^j$; it is suboptimal because of CE, but it uses more information than S5 about $\mu(\cdot)$.

(i) Consider first the case $n = 1$. To compare the performances of the six strategies above, we performed 1000 independent repetitions of the sequential screening experiment. Table 1 gives the empirical means and standard deviations of $J = \sum u_k X_k$ and $\eta$ for the six strategies, with $\eta = 1$ if $X^*_{1,N}$ is selected, and $\eta = 0$ otherwise. The empirical means and standard deviations of the computing times $T$ are also indicated (the computations are performed in MATLAB on a personal computer equipped with a Pentium processor at 150 MHz). Note that for S3 the optimal thresholds are only computed once, at $j = 1$, hence the small value of $E\{T\}$ compared with S6.

POSSIBLE LOCATION OF TABLE 1

The upper bound (16) on $J$ obtained for the non-sequential selection of $X^*_{1,50}$ is approximately equal to 2.51. The theoretical value of $\tilde c(1,0)$ for $\hat\mu = \mu$, which corresponds to S3, is approximately 2.30. The decrease of performance due to the on-line character of the decisions is thus rather small for strategies S1, S3, S5 and S6. The loss due to CE is small: compare S5 and S6 with S3. Using open-loop decisions also has a small effect when the threshold is adaptive: compare S1 with S5. On the contrary, the loss is important when OLFO is used, see the results obtained for S4. The theoretical value of $\hat c(1,0)$ for $\hat\mu = \mu$ is approximately 1.37, so that the poor performance of S4 is not due to CE. Note that, as indicated by Theorems 1 and 2, the superiority of S1, S3, S5 and S6 over S4 would increase with $N$. S2 performs rather poorly in terms of $J$, although it is optimal in terms of $\eta$ when nothing is known about $\mu(\cdot)$ (the theoretical expected value of $\eta$ is $(r-1)c_{r-1}/N \simeq 0.3743$). S3 performs better than S2 in terms of $\eta$, but assumes that $\mu(\cdot)$ is known. Selecting one $X_k$ at random gives on the average $J = E\{X\} = \sqrt{2/\pi} \simeq 0.798$ and $E\{\eta\} = 1/50$.

Since the standard deviations in Table 1 are rather large, we perform a statistical analysis of the results based on the method of paired comparisons. First, we compute the differences between the performances of each pair of strategies, the same $X_k$'s being used for all strategies in each experiment. For comparing S$j$ with S$k$, we compute $\Delta^{j,k}_{J,i} = [J(S_j)]_i - [J(S_k)]_i$, with $[J(S)]_i$ the cumulative gain $\sum u_k X_k$ obtained for strategy S in the $i$-th experiment. This gives $q = 1000$ independent realizations of $\Delta^{j,k}_{J,i}$, with empirical mean $\hat E(\Delta^{j,k}_J)$ and standard deviation $\hat\sigma(\Delta^{j,k}_J)$. The same is done with the criterion $\eta$, which yields $\hat E(\Delta^{j,k}_\eta)$ and $\hat\sigma(\Delta^{j,k}_\eta)$. Then, we test whether S$j$ performs significantly better or worse than S$k$ in terms of $J$ and $\eta$ by computing the ratios $\delta^{j,k}_J = \sqrt q\,\hat E(\Delta^{j,k}_J)/\hat\sigma(\Delta^{j,k}_J)$ and $\delta^{j,k}_\eta = \sqrt q\,\hat E(\Delta^{j,k}_\eta)/\hat\sigma(\Delta^{j,k}_\eta)$. If the two decision rules have similar average performances, which corresponds to the null hypothesis, then $\delta^{j,k}_J$ and $\delta^{j,k}_\eta$ approximately follow Student's t-distribution with $q-1$ degrees of freedom. For large values of $q$, which is the case here, the distribution is approximately $\mathcal N(0,1)$. The critical value, for a level of significance 0.5% (one-sided test), is 2.576. Values of $\delta^{j,k}_J$ (resp. $\delta^{j,k}_\eta$) larger than 2.576 thus indicate that S$j$ performs significantly better than S$k$ in terms of $J$ (resp. $\eta$). Values smaller than $-2.576$ indicate that S$k$ performs better than S$j$. The values of $\delta^{j,k}_J$ and $\delta^{j,k}_\eta$ are given respectively in Tables 2 and 3, with indices $j,k$ corresponding respectively to the rows and columns of the tables.

POSSIBLE LOCATION OF TABLE 2

POSSIBLE LOCATION OF TABLE 3

One can conclude from these results that the strategies can be ranked in order of increasing performance as S4, S2, S1, S5, S6, S3 in terms of $J$, and as S4, S5, S1, S6, S2, S3 in terms of $\eta$, some of the successive differences being non-significant. Taking the computing times (see Table 1) into account, S1 and S5 are very efficient when $\mu(\cdot)$ is

unknown, although, of course, S2 is recommended if the characteristic of interest is $\eta$. We shall see below that the situation is much different for S2 when $n > 1$.

(ii) Consider now the case $n = 10$. Figure 7 presents the evolution of the thresholds used at step $j$ to select $u_j$: the curve in dashed line corresponds to S5 and that in full line to S3. The values of $X_k$ are indicated by stars. The ten values above the dashed line (resp. full line) correspond to those that are sifted out by the decision rule S5 (resp. S3). As $j$ increases, the empirical measure $\hat\mu_j$ tends to $\mu$ and the decisions for S5 get closer to the optimal ones. In this particular run, for $j = 46,\dots,49$, S3 still has 2 items to select, whereas S5 has only one. The threshold thus becomes zero at $j = 49$ for S3 and only at $j = 50$ for S5, the last item $X_{50}$ being selected by S3 but not by S5.

POSSIBLE LOCATION OF FIGURE 7

We performed again 1000 independent repetitions of the sequential screening experiment. Table 4 gives the empirical means and standard deviations of $J = \sum u_k X_k$, $\eta$ and computing times $T$ for the six strategies, with $\eta$ now defined as the number of $X_k$ selected that are actually among the $n$ largest values, that is, those that satisfy $X_k \ge X^*_{n,N}$. The small value of $E\{T\}$ for S3 is due to the fact that the optimal thresholds are only computed once, at $j = 1$.

POSSIBLE LOCATION OF TABLE 4

The upper bound (16) on $J$ obtained for the non-sequential selection of $X^*_{1,50},\dots,X^*_{10,50}$ is approximately equal to 17.3. The theoretical value of $\tilde c(1,0)$ for $\hat\mu = \mu$, which corresponds to S3, is approximately 16.63, which coincides with the value in the table. The performances of S1, S3, S5 and S6 are rather close to the upper bound 17.3, and the comparison of S5 and S6 with S3 shows that the loss due to CE is small. The loss due to open-loop decisions is small when the threshold is adaptive, compare S1 with S5, but this loss is important when OLFO is used, see S4. The theoretical value of $\hat c(1,0)$ for $\hat\mu = \mu$ is approximately 13.65, so that the poor performance of S4 is not due to CE. Again, the superiority of S1, S3, S5 and S6 over S4 would increase with $N$, see Theorems 1 and 2. S2 has very poor performance: selecting 10 $X_k$'s at random already gives on the average $J = 10\,E\{X\} = 10\sqrt{2/\pi} \simeq 7.98$. The expected value of $\eta$ for 10 $X_k$'s taken at random among 50 i.i.d. values is 2 (see [24], p. 35). We see that strategies S1, S3, S5 and S6 are thus rather efficient at selecting the best items. During the 1000 repetitions of the experiment, the 10 best items $X^*_{1,50},\dots,X^*_{10,50}$ were selected 116 times by S3, 17 times by S5, 12 by S6 and 9 by S1. S2 and S4 never succeeded in selecting all of them. These results might seem rather poor, but it should be noticed that the probability of selecting these 10 largest values by a pure random choice is as small as $0.97\times10^{-10}$! (see [24], p. 35). A statistical analysis of the results based on the method of paired comparisons yields

the values of $\delta^{j,k}_J$ and $\delta^{j,k}_\eta$, given in Tables 5 and 6 respectively.
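The test statistic is simple to reproduce; a sketch, where `J_a` and `J_b` stand for the $q = 1000$ per-experiment gains of two strategies:

```python
import math

# Paired-comparison statistic δ = sqrt(q) Ê(Δ) / σ̂(Δ), to be compared with the
# N(0,1) critical value 2.576 (level 0.5%, one-sided test).
def paired_delta(J_a, J_b):
    q = len(J_a)
    d = [u - v for u, v in zip(J_a, J_b)]
    mean = sum(d) / q
    sd = math.sqrt(sum((di - mean) ** 2 for di in d) / (q - 1))
    return math.sqrt(q) * mean / sd
```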

POSSIBLE LOCATION OF TABLE 5

POSSIBLE LOCATION OF TABLE 6

One can conclude from these results that the strategies can be ranked in order of increasing performance as follows, both in terms of $J$ and $\eta$: S2, S4, S1, S5, S6, S3, some of the successive differences being non-significant. S2 is the worst strategy for $J$ and $\eta$ among those considered. This is due to the rejection of the first $r-1$ items, with $r$ chosen independently of $n$. The performances of S5 and S6 are very close, although S6 uses more information. This confirms the observation made in [3] that distribution-free approaches should be preferred to Bayesian optimal rules when prior information is poor (note that we did not assume any prior on $\theta$ for S6, and simply used the empirical estimate $\hat\theta_j = \mathrm{std}\{x_1^j\}$). S5 is thus very efficient when no prior information on $\mu(\cdot)$ is available. Taking the computing times in Table 4 into account, together with the fact that the performances of S5 and S1 are close, we conclude that S1 is a most attractive strategy.

□

6 Conclusions and further developments

We have used the asymptotic behavior of the optimal closed-loop solution of a sequential screening (or stochastic assignment) problem to derive a simple open-loop rule with good asymptotic performance. In particular, its asymptotic optimality has been proved for a large class of distributions. The case of distributions with bounded support has only been touched on through examples (2 and 3). They indicate that for $n = 1$ the optimal thresholds satisfy a property similar to Lemma 4 (iv). However, Example 2 shows that the extension to $n > 1$ might be difficult. Several generalizations of problems (1,2), or (3), might be considered: another cost, function of $k$, could be added to the sum of the $X_k$'s; the $X_k$'s might arrive in time according to some process until a random deadline $T$, or each $p_i$ might become zero after a deadline $T_i$, see [25]; the $X_k$'s might be correlated, e.g. the distributions of two successive $X_k$'s might be governed by a Markov chain, see [2], which generalizes the results of [1]. The extension of the asymptotic results ($N \to \infty$ with $n$ fixed) of Section 4 to these situations deserves further study. Certainty equivalence has been forced in the case where the distribution of the $X_k$'s is unknown: we reject the first $t$ samples, with $t$ chosen according to a heuristic rule, see (40), derived from the optimal solution of the secretary problem, and then replace at each step the unknown distribution by an estimate. Although this is suboptimal, numerical simulations indicate that the loss of optimality is reasonably small.

Finally, an extension of special importance concerns the experimental design problem (iii) of Section 1 in the case where $\theta$ is a $p$-dimensional vector, with possible application to active on-line example selection for learning problems. When the observations are $y_k = \eta(\theta, Z_k) + \epsilon_k$, with $\{\epsilon_k\}_k$ an i.i.d. sequence of measurement errors, a classical criterion for the precision of the estimation of $\theta$ is the determinant of the Fisher information matrix,
\[
\mathbf M(\theta; Z_{k_1},\dots,Z_{k_n}) = \sum_{i=1}^n \big[\partial\eta(\theta, Z_{k_i})/\partial\theta\big]\big[\partial\eta(\theta, Z_{k_i})/\partial\theta^\top\big].
\]
Several methods exist to maximize $\det[\mathbf M]$ with respect to $Z_{k_1},\dots,Z_{k_n}$ when the $Z_k$'s can be chosen freely in a given set, see, e.g., [11, 22, 27]. When the $Z_{k_i}$'s must be selected sequentially from a given i.i.d. sequence $Z_1,\dots,Z_N$, the problem resembles (1,2), with, however, the important difference that the criterion is no longer additive. Note that the optimal decisions depend on $\theta$, which is unknown. Certainty equivalence might be used when $\theta$ can be estimated on-line. A simple open-loop policy (the OLFO rule of Section 2.1) is proposed in [21]. The development of open-loop adaptive rules similar to (38) is currently under study.

Appendix

Proof of Theorem 1. The case $k = 0$ has already been considered, see (18). Define $\boldsymbol\gamma_m = (\hat\gamma^1_m,\dots,\hat\gamma^k_m)^\top$. One has from (19) $\boldsymbol\gamma_m = \hat x(s)\mathbf 1 + \bar F(s)\mathbf U\boldsymbol\gamma_m + [1 - \bar F(s)]\boldsymbol\gamma_{m-1}$, with $\mathbf 1$ the $k$-dimensional vector with all entries equal to 1 and $\mathbf U$ given by (22). This can also be written
\[
\boldsymbol\gamma_m = \hat x(s)\mathbf V^{-1}\mathbf 1 + [1 - \bar F(s)]\mathbf V^{-1}\boldsymbol\gamma_{m-1}, \tag{43}
\]
with $\mathbf V = \mathbf I - \bar F(s)\mathbf U$ and $\mathbf I$ the $k$-dimensional identity matrix. The matrix $\mathbf V^{-1}$ has its $k$ eigenvalues equal to 1 and $\bar F(s) > 0$, so that the iteration matrix $[1-\bar F(s)]\mathbf V^{-1}$ has spectral radius $1-\bar F(s) < 1$ and (43) is the evolution equation of a stable stationary linear system. This system converges to its unique stable solution, given by (20). Direct calculation gives (21).

Proof of Lemma 1.

Proof of (32). First, we prove that
\[
\forall n \ge 1,\quad M' < \tilde s^{n+1}_1 < \tilde s^n_1 \ \text{ and }\ \tilde s^1_n < \tilde s^1_{n+1} < M. \tag{44}
\]

For $n = 1$, $\tilde s^1_1 = E\{X\}$, and thus $M' < \tilde s^1_1 < M$ (since $\mu(\cdot)$ is nondegenerate and $E\{X\}$ is finite).

Assume that $M' < \tilde s^n_1$ (which is true for $n = 1$). It implies $h(M') > h(\tilde s^n_1)$, see (25). Equations (28) and (23) give
\[
\tilde s^{n+1}_1 = \tilde\gamma^{n+1}_0 - \tilde\gamma^n_0 - (\tilde\gamma^n_1 - \tilde\gamma^{n-1}_1) + \tilde s^n_1 = E\{X\} - E\{\max(X, \tilde s^n_1)\} + \tilde s^n_1 = E\{X\} - h(\tilde s^n_1),
\]
and therefore $\tilde s^{n+1}_1 > E\{X\} - h(M') = M'$. From the definition of $\tilde s^k_m$ and (23), one has, $\forall n \ge 0$, $\tilde s^{n+1}_1 = \tilde\gamma^{n+1}_0 - \tilde\gamma^n_1 = (n+1)E\{X\} - \tilde\gamma^n_1$, so that $\tilde s^n_1 - \tilde s^{n+1}_1 = \tilde\gamma^n_1 - \tilde\gamma^{n-1}_1 - E\{X\}$. For $\tilde s^n_1 > M'$, (29) gives $\tilde\gamma^n_1 > \tilde\gamma^{n-1}_1 + E\{X\}$, and thus $\tilde s^n_1 > \tilde s^{n+1}_1 > M'$.

Assume that $\tilde s^1_n < M$ (which is true for $n = 1$). It implies $h(\tilde s^1_n) > 0$ and $h(\tilde s^1_n) + \tilde s^1_n < M$, see (26). Define $\Delta^k_m = \tilde\gamma^k_m - \tilde\gamma^k_{m-1}$. From (30), one has
\[
\forall k \ge 1,\ m \ge 1,\quad \Delta^k_m = h(\tilde s^k_m) = E\{(X - \tilde s^k_m)^+\}. \tag{45}
\]
Since $\forall n \ge 1$, $\tilde s^1_n = \tilde\gamma^1_{n-1} - \tilde\gamma^0_n = \tilde\gamma^1_{n-1}$, (45) gives $\tilde s^1_{n+1} - \tilde s^1_n = \tilde\gamma^1_n - \tilde\gamma^1_{n-1} = h(\tilde s^1_n)$, and therefore $\tilde s^1_n < \tilde s^1_{n+1} < M$. This completes the proof of (44).

We prove now (32) by induction on $n$. First note that (44) with $n = 1$ corresponds to (32) with $n = 2$. Assume that (32) is true for $n$; that is,
\[
M' < \tilde s^{k+1}_m < \tilde s^k_m < \tilde s^k_{m+1} < M,\qquad k + m = n,\ k \ge 1,\ m \ge 1. \tag{46}
\]
Using (31), one gets for $k + m = n + 1$
\[
\tilde s^{k+1}_m = \tilde s^{k+1}_{m-1} + h(\tilde s^{k+1}_{m-1}) - h(\tilde s^k_m),\qquad \tilde s^k_{m+1} = \tilde s^k_m + h(\tilde s^k_m) - h(\tilde s^{k-1}_{m+1}),
\]
and (46) gives, for $m \ge 2$, $k \ge 2$,
\[
M' < \tilde s^{k+1}_{m-1} < \tilde s^k_{m-1} < \tilde s^k_m < M\quad\text{and}\quad M' < \tilde s^k_m < \tilde s^{k-1}_m < \tilde s^{k-1}_{m+1} < M.
\]
Therefore, from (25,26),
\[
\tilde s^{k+1}_{m-1} + h(\tilde s^{k+1}_{m-1}) < \tilde s^k_m + h(\tilde s^k_m)\quad\text{and}\quad h(\tilde s^k_m) > h(\tilde s^{k-1}_{m+1}).
\]
This finally gives $\tilde s^{k+1}_m < \tilde s^k_m < \tilde s^k_{m+1}$, $k + m = n + 1$, $m \ge 2$, $k \ge 2$. The two boundary inequalities, obtained respectively for $m = 1$ and $k = 1$, result from (44), which completes the proof.

Proof of (33). The proof is by induction on $k$. For $k = 1$, from (32), $\forall m \ge 0$, $\tilde\gamma^1_m = \tilde s^1_{m+1} < M$. Assume that the property is true for $k$; that is, $\forall m \ge 0$, $\tilde\gamma^k_m < kM$. One has, $\forall m \ge 0$, $\tilde\gamma^{k+1}_m = \tilde s^{k+1}_{m+1} + \tilde\gamma^k_{m+1} < M + kM$, which completes the proof.

Proof of Theorem 2. From (32), $\tilde s^k_m$ increases with $m$, so it has a limit $\tilde s^k_\infty \in \mathbb R \cup \{+\infty\}$. For $k = 1$, $\tilde s^1_{m+1} = \tilde s^1_m + h(\tilde s^1_m)$, so that the limit $\tilde s^1_\infty$ satisfies $h(\tilde s^1_\infty) = 0$. From (31), for $k > 1$, $\tilde s^k_\infty$ and $\tilde s^{k-1}_\infty$ satisfy $h(\tilde s^k_\infty) = h(\tilde s^{k-1}_\infty)$. By induction on $k$, one gets $\tilde s^k_\infty = M$. From (45) and (32), for any $k \ge 1$, $\tilde\gamma^k_m$ increases with $m$, and thus has a limit $\tilde\gamma^k_\infty \in \mathbb R \cup \{+\infty\}$. We show that $\tilde\gamma^k_\infty = kM$ by induction on $k$. For $k = 1$, $\tilde\gamma^1_m = \tilde s^1_{m+1}$, so that $\tilde\gamma^1_\infty = \tilde s^1_\infty = M$. Assume that the property holds for $k$: $\tilde\gamma^k_\infty = \lim_{m\to\infty}\tilde\gamma^k_m = kM$. Since $\tilde\gamma^{k+1}_m = \tilde s^{k+1}_{m+1} + \tilde\gamma^k_{m+1}$, one has $\tilde\gamma^{k+1}_\infty = \lim_{m\to\infty}\tilde s^{k+1}_m + \lim_{m\to\infty}\tilde\gamma^k_m = (k+1)M$.

Proof of Theorem 3.

(i) $k = 1$. Let $\gamma^1_m$ denote the expected gain-to-go for the thresholds $s^1_m$. It satisfies
\[
\gamma^1_{m+1} = \gamma^1_m + h(s^1_{m+1}) + (s^1_{m+1} - \gamma^1_m)\bar F(s^1_{m+1}). \tag{47}
\]
The optimal gain-to-go satisfies $\tilde\gamma^1_{m+1} = \tilde s^1_{m+2} = \tilde\gamma^1_m + h(\tilde\gamma^1_m)$. Define $\Delta_m = \tilde\gamma^1_m - \gamma^1_m \ge 0$, $\delta_m = \Delta_m/\tilde\gamma^1_m \ge 0$ and $u_m = s^1_{m+1} - \tilde\gamma^1_m$. The recurrence (47) gives $\Delta_{m+1} = \Delta_m[1 - \bar F(s^1_{m+1})] + [(u_m)^2/2]\,\varphi(s^1_{m+1} + u)$ for some $u$ in $(\min[0,u_m], \max[0,u_m])$, and thus, for $m$ large enough,
\[
\Delta_{m+1} < \Delta_m[1 - \bar F(s^1_{m+1})] + [(u_m)^2/2]\,\varphi(s^1_{m+1} - |u_m|). \tag{48}
\]
Consider first the case $\liminf_{s\to\infty} a(s) > c > 0$. Since $\tilde\gamma^1_{m+1} > \tilde\gamma^1_m$, (48) gives
\[
\delta_{m+1} < \delta_m[1 - \bar F(s^1_{m+1})] + \bar F(s^1_{m+1})\,r_m, \tag{49}
\]
with $r_m = [(u_m)^2/2]\,\varphi(s^1_{m+1} - |u_m|)/[\bar F(s^1_{m+1})\tilde\gamma^1_m]$. Lemma 4 (iv) gives $\bar F(s^1_{m+1})/\bar F(\tilde s^1_{m+1}) \to 1$, and Lemma 3 (i) and (ii) imply $|u_m|/a(s^1_{m+1}) \to 0$. Lemma 3 (v) then implies $\bar F(s^1_{m+1})/\bar F(s^1_{m+1} - |u_m|) > b > 0$ for $m$ large enough, and therefore $r_m < [1/(2b)]\,(|u_m|/\tilde\gamma^1_m)\,[|u_m|/a(s^1_{m+1} - |u_m|)]$. Lemma 3 (i) and (ii) imply $|u_m|/\tilde\gamma^1_m \to 0$ and $|u_m|/a(s^1_{m+1} - |u_m|) \to 0$, so that $r_m \to 0$ when $m \to \infty$. Lemma 2 applied to the recurrence (49) implies $\delta_m \to 0$ as $m \to \infty$.

Consider now the case $\limsup_{s\to\infty} a(s) < C < \infty$, and define $R_m = [(u_m)^2/2]\,\varphi(s^1_{m+1} - |u_m|)/\bar F(s^1_{m+1})$. Lemma 4 (v) and Lemma 3 (vi) give $\limsup_{m\to\infty}|u_m|/a(s^1_{m+1}) < D < \infty$; Lemma 3 (v) thus implies $R_m < [|u_m|/(2b)]\,[|u_m|/a(s^1_{m+1} - |u_m|)]$ for some $b > 0$ and $m$ large enough, which gives $R_m < B'|u_m|$ for some $B' < \infty$. Since $|u_m| \to 0$ from Lemma 4 (v) and Lemma 3 (iv), Lemma 2 applied to (48) implies $\Delta_m \to 0$ as $m \to \infty$.

(ii) $k \ge 1$. The proof is based on the following consideration. The original problem is decomposed into $k$ subproblems of size $\lfloor m/k\rfloor$, with $\lfloor x\rfloor$ the integer part of $x$. Each one gives an expected gain-to-go $\tilde\gamma^1_{\lfloor m/k\rfloor}$, and, since this approach is suboptimal, $\tilde\gamma^k_m > k\tilde\gamma^1_{\lfloor m/k\rfloor}$. Moreover, in each subproblem, $\tilde\gamma^1_{\lfloor m/k\rfloor} > \gamma^1_{m/k}$, the expected gain-to-go of the open-loop rule that uses thresholds $s^k_m$ such that $m\bar F(s^k_m) \to k$, $m \to \infty$. On the other hand, $\tilde\gamma^k_m < E\{\sum_{i=1}^k X^*_{i,m}\}$, the expected gain of the nonsequential strategy. Therefore,
\[
k\,\gamma^1_{m/k} < \tilde\gamma^k_m < E\Big\{\sum_{i=1}^k X^*_{i,m}\Big\}.
\]
Consider first the case $\liminf_{s\to\infty} a(s) > c > 0$. Lemma 5 (i) gives $(1/k)E\{\sum_{i=1}^k X^*_{i,m}\}/\bar F^{-1}(1/\lfloor m/k\rfloor) \to 1$; Lemma 4 (iv) and Lemma 3 (ii, vi) imply $\bar F^{-1}(1/\lfloor m/k\rfloor)/\tilde\gamma^1_{\lfloor m/k\rfloor} \to 1$. Finally, part (i) of the theorem implies $\tilde\gamma^1_{\lfloor m/k\rfloor}/\gamma^1_{m/k} \to 1$, which gives the result.

Consider now the case where $a(s) \to 0$ when $s \to \infty$. Lemma 5 (iii) implies $\bar F^{-1}(1/\lfloor m/k\rfloor) - (1/k)E\{\sum_{i=1}^k X^*_{i,m}\} \to 0$; Lemma 4 (v) gives $\bar F^{-1}(1/\lfloor m/k\rfloor) - \tilde\gamma^1_{\lfloor m/k\rfloor} \to 0$; part (i) of the theorem gives $\tilde\gamma^1_{\lfloor m/k\rfloor} - \gamma^1_{m/k} \to 0$, which concludes the proof.

Lemma 2 Consider the recurrence $\delta_{m+1} = (1 - \alpha_m)\delta_m + \beta_m$, where $\delta_m \ge 0$, $\alpha_m > 0$, $\beta_m > 0$ and $\beta_m = o(\alpha_m)$. Then $\lim_{m\to\infty} m\alpha_m = \alpha$, $\alpha \in (0,\infty)$, implies $\lim_{m\to\infty}\delta_m = 0$.

Proof. For any $M$ and $m \le M$, define $f_{m,M} = \prod_{i=m}^M (1 - \alpha_i)$. One has, for any $M$ and $m_0 < M$:
\[
\delta_{M+1} = \delta_{m_0} f_{m_0,M} + \sum_{i=m_0}^M \beta_i\,\frac{f_{i,M}}{1 - \alpha_i}.
\]
Take $K$ such that $1/K < \alpha < K$ and choose $m_0 > 2K$ such that $\forall m \ge m_0$, $\beta_m/\alpha_m < \epsilon/(2K^2)$ and $1/(mK) < \alpha_m < K/m$. Then, for $M \ge i \ge m_0$, $(f_{i,M})^K < i/(M+1)$, $1/(i-K) < 2/i$ and
\[
\delta_{M+1} < \delta_{m_0} f_{m_0,M} + \frac{\epsilon K}{2K^2}\sum_{i=m_0}^M \frac{f_{i,M}}{i-K} < \delta_{m_0}\Big[\frac{m_0}{M+1}\Big]^{1/K} + \frac{\epsilon}{K}\sum_{i=m_0}^M \frac{1}{i}\Big[\frac{i}{M+1}\Big]^{1/K}.
\]
Noticing that $\sum_{i=1}^M (1/i)[i/(M+1)]^{1/K} \to K$ as $M \to \infty$, one gets $\delta_{M+1} < \delta_{m_0}[m_0/(M+1)]^{1/K} + \epsilon$, which becomes less than $2\epsilon$ for $M$ large enough. Since $\epsilon$ is arbitrary, this concludes the proof.

Lemma 3 Assume that $\bar F(\cdot)$ satisfies H1 and that $s_m \to \infty$ when $m \to \infty$. The following properties are then satisfied for $m \to \infty$:

(i) when $\bar a > 0$, $s_m \sim s'_m$ is equivalent to $\bar F(s_m) \sim \bar F(s'_m)$, which is also equivalent to $|s_m - s'_m|/a(s_m) \to 0$;

(ii) when $\bar a = 0$, $\bar F(s_m) \sim \bar F(s'_m)$ is equivalent to $|s_m - s'_m|/a(s_m) \to 0$;

(iii) when $\bar a = 0$ with $\liminf_{s\to\infty} a(s) > c > 0$, $|s_m - s'_m| \to 0$ implies $\bar F(s_m) \sim \bar F(s'_m)$;

(iv) when $\bar a = 0$ with $\limsup_{s\to\infty} a(s) < C < \infty$, $\bar F(s_m) \sim \bar F(s'_m)$ implies $|s_m - s'_m| \to 0$;

(v) $\limsup_{m\to\infty} e_m/a(s_m) < B < \infty$ implies $\liminf_{m\to\infty}\bar F(s_m + e_m)/\bar F(s_m) > b$ for some $b > 0$;

(vi) when $\bar a = 0$, $\limsup_{m\to\infty}|s_m - s'_m|/a(s_m) < B < \infty$ implies $s_m \sim s'_m$, $a(s_m) \sim a(s'_m)$ and $\limsup_{m\to\infty}|s_m - s'_m|/a(s'_m) < B'$ for some $B' < \infty$.

Proof. Denote $x'_m = \max(s_m, s'_m)$, $x_m = \min(s_m, s'_m)$. Since $\varphi(\cdot)$ is ultimately decreasing, one has for $m$ larger than some $m_0$
$$\varphi(x'_m)(x'_m - x_m) \;<\; \bar F(x_m) - \bar F(x'_m) \;<\; \varphi(x_m)(x'_m - x_m)\,. \qquad (50)$$

(i) When $\bar a > 0$, $\bar F(\cdot)\in R_{-\alpha}$ and $s_m/a(s_m)\to\alpha$ when $s_m\to\infty$. First, $|s_m - s'_m|/a(s_m) = [|s_m - s'_m|/s_m]\,[s_m/a(s_m)]$, so that $|s_m - s'_m|/a(s_m)\to 0$ is equivalent to $s_m\sim s'_m$. Assume that $s_m\sim s'_m$; $s_m\to\infty$ and $s_m\sim s'_m$ imply $x_m/a(x_m)\to\alpha$; (50) gives $[\bar F(x_m) - \bar F(x'_m)]/\bar F(x_m) < [x_m/a(x_m)]\,(x'_m - x_m)/x_m$, so that $\bar F(s_m)\sim\bar F(s'_m)$. Conversely, assume that $\bar F(s_m)\sim\bar F(s'_m)$; (50) gives $(x'_m - x_m)/x'_m < \{[\bar F(x_m) - \bar F(x'_m)]/\bar F(x'_m)\}\,[a(x'_m)/x'_m]$, with $a(x'_m)/x'_m\to 1/\alpha$ since $s_m\to\infty$, and thus $s_m\sim s'_m$.

(ii) When $\bar a = 0$, $\bar F(\cdot)$ is a von Mises function. Assume that $|s'_m - s_m|/a(s_m)\to 0$, $s_m\to\infty$. It means that $\forall\epsilon > 0$, $\exists m_1$ such that $\forall m > m_1$, $|s_m - s'_m| < \epsilon\, a(s_m)$; that is,
$$\frac{\bar F[s_m + \epsilon a(s_m)]}{\bar F(s_m)} \;<\; \frac{\bar F(s'_m)}{\bar F(s_m)} \;<\; \frac{\bar F[s_m - \epsilon a(s_m)]}{\bar F(s_m)}\,. \qquad (51)$$
The left-hand side of (51) tends to $\exp(-\epsilon)$ and its right-hand side tends to $\exp(\epsilon)$, see [10], p. 142. Therefore, $\bar F(s_m)\sim\bar F(s'_m)$. Conversely, assume that $\bar F(s_m)\sim\bar F(s'_m)$, $s_m\to\infty$. One gets, $\forall\epsilon > 0$, $\exists m_1$ such that $\forall m > m_1$, $\exp(-\epsilon) < \bar F(s'_m)/\bar F(s_m) < \exp(\epsilon)$. Take $\epsilon' = \epsilon(1 - 2\epsilon)$. There exists $m_2$ such that $\forall m > m_2$,
$$\exp(-2\epsilon) - \epsilon' \;<\; \frac{\bar F[s_m + 2\epsilon a(s_m)]}{\bar F(s_m)} \;<\; \exp(-2\epsilon) + \epsilon' \;<\; \exp(-\epsilon)$$
and
$$\exp(\epsilon) \;<\; \exp(2\epsilon) - \epsilon' \;<\; \frac{\bar F[s_m - 2\epsilon a(s_m)]}{\bar F(s_m)} \;<\; \exp(2\epsilon) + \epsilon'\,.$$
Therefore, $\forall m > \max(m_1, m_2)$, $\bar F[s_m + 2\epsilon a(s_m)] < \bar F(s'_m) < \bar F[s_m - 2\epsilon a(s_m)]$, and thus $s_m - 2\epsilon a(s_m) < s'_m < s_m + 2\epsilon a(s_m)$, which gives $|s'_m - s_m|/a(s_m)\to 0$.

(iii) When $\bar a = 0$ and $\liminf_{s\to\infty} a(s) > c > 0$, (50) gives $[\bar F(x_m) - \bar F(x'_m)]/\bar F(x_m) < (x'_m - x_m)/a(x_m)$, and $|s_m - s'_m|\to 0$ together with $1/a(x_m) < 2/c$ for $m$ large enough gives $\bar F(s_m)\sim\bar F(s'_m)$.

(iv) When $\bar a = 0$ and $\limsup_{s\to\infty} a(s) < C$, (50) gives $(x'_m - x_m)/a(x'_m)\to 0$ when $\bar F(s_m)\sim\bar F(s'_m)$. For $m$ large enough, $1/a(x'_m) > 1/(2C)$, so that $|s_m - s'_m|\to 0$.

(v) Since one can take $b = 1$ when $e_m < 0$, we only have to consider the case $e_m > 0$. Assume first that $\bar F(\cdot)\in R_{-\alpha}$. Then $s_m + e_m = s_m\{1 + [e_m/a(s_m)][a(s_m)/s_m]\}$, with $a(s_m)/s_m\to 1/\alpha$. Therefore, $\limsup_{m\to\infty} e_m/a(s_m) < B$ implies $s_m + e_m < s_m(1 + 2B/\alpha)$ for $m$ large enough. This implies $\bar F(s_m + e_m)/\bar F(s_m) > \bar F[s_m(1 + 2B/\alpha)]/\bar F(s_m)\to(1 + 2B/\alpha)^{-\alpha}$ as $m\to\infty$.

Assume now that $\bar F(\cdot)$ is a von Mises function. Then $\limsup_{m\to\infty} e_m/a(s_m) < B$ implies, for $m$ large enough, $\bar F(s_m + e_m)/\bar F(s_m) > \bar F[s_m + 2Ba(s_m)]/\bar F(s_m)\to\exp(-2B)$ as $m\to\infty$.

(vi) One has $|s'_m - s_m|/a(s_m) = [s_m/a(s_m)]\,|s'_m - s_m|/s_m$, and $\bar a = 0$ implies $a(s_m)/s_m\to 0$ as $s_m\to\infty$. Therefore, $\limsup_{m\to\infty}|s_m - s'_m|/a(s_m) < B$ implies, for $m$ large enough, $|s'_m - s_m|/s_m < 2B\,a(s_m)/s_m\to 0$, that is, $s_m\sim s'_m$. Moreover, $|a(s_m) - a(s'_m)| = |s_m - s'_m|\,|a'(\bar s)|$, with $\bar s\in(x_m, x'_m)$; $\limsup_{m\to\infty}|s_m - s'_m|/a(s_m) < B$ implies $s'_m\sim s_m$, and thus $s'_m\to\infty$ and $\bar s\to\infty$; $\bar a = \lim_{s\to\infty} a'(s) = 0$, so that for $m$ large enough $|1 - a(s'_m)/a(s_m)| < 2B\,|a'(\bar s)|\to 0$. This gives $a(s_m)\sim a(s'_m)$, and then $|s_m - s'_m|/a(s'_m) = [a(s_m)/a(s'_m)]\,|s_m - s'_m|/a(s_m)$ has $\limsup$ smaller than some $B' < \infty$.
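To see the two regimes of H1 concretely, recall that $a(s) = \bar F(s)/\varphi(s)$, the identification used implicitly in the proof above (where $\varphi = \bar F/a$). For the Pareto tail $\bar F(s) = s^{-\alpha}$ this gives $a(s) = s/\alpha$, hence $\bar a = 1/\alpha > 0$, while for von Mises tails $\bar a = 0$. The short check below (the standard normal tail is our choice of example) illustrates the property used in the proof of (ii), namely $\bar F[s + x\,a(s)]/\bar F(s)\to e^{-x}$ ([10], p. 142):

import math

def Fbar(s):   # standard normal survival function
    return 0.5 * math.erfc(s / math.sqrt(2.0))

def phi(s):    # standard normal density
    return math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)

x = 2.0
for s in (3.0, 6.0, 10.0, 20.0):
    a = Fbar(s) / phi(s)                  # auxiliary function, a(s) ~ 1/s here
    ratio = Fbar(s + x * a) / Fbar(s)
    print(f"s = {s:5.1f}   a(s) = {a:.4f}   ratio = {ratio:.4f}"
          f"   limit exp(-x) = {math.exp(-x):.4f}")

The ratio approaches $e^{-x}$ as $s$ grows, even though $\bar F$ itself decays much faster than exponentially.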

Lemma 4 Assume that $\bar F(\cdot)$ satisfies H1. Consider the differential equation $dz(t)/dt = h[z(t)]$, $z(m_0) = \tilde s^1_{m_0}$ for some $m_0\geq 1$ such that $\varphi(s)$ is strictly positive and decreasing for $s > \tilde s^1_{m_0}$, and define $e_m = \tilde s^1_m - z(m)$. Then one has:
(i) $tA\bar F[z(t)]\to 1$ as $t\to\infty$;
(ii) $e_m > 0$ for any $m > m_0$;
(iii) $\limsup_{m\to\infty} e_m/a[z(m)] < B$ for some $B < \infty$, and $\lim_{m\to\infty} e_m = 0$;
(iv) $\liminf_{s\to\infty} a(s) > c > 0$ implies $mA\bar F(\tilde s^1_m)\to 1$;
(v) $\limsup_{s\to\infty} a(s) < C < \infty$ implies $|\tilde s^1_m - \bar F^{-1}(1/m)|\to 0$ and $\limsup_{m\to\infty}|\tilde s^1_m - \bar F^{-1}(1/m)|/a(\tilde s^1_m) < D$ for some $D < \infty$.

Proof. (i) Consider the behavior of $\bar F[z(t)]$ as $t\to\infty$. Direct calculation gives
$$\frac{d}{dt}\left\{\frac{1}{\bar F[z(t)]}\right\} \;=\; \frac{\varphi[z(t)]\,h[z(t)]}{\bar F^{2}[z(t)]} \;=\; A[z(t)] \;\to\; A \quad (t\to\infty)\,,$$
and thus $tA\bar F[z(t)]\to 1$ when $t\to\infty$.

(ii) Using the Taylor mean-value theorem, we get
$$z(m+1) = z(m) + \frac{dz(t)}{dt}\Big|_{m} + \frac12\,\frac{d^2 z(t)}{dt^2}\Big|_{m+t_0}\,, \qquad t_0\in(0,1)\,.$$
Since $\tilde s^1_{m+1} = \tilde s^1_m + h(\tilde s^1_m)$, $dz(t)/dt = h[z(t)]$ and $d^2 z(t)/dt^2 = -\bar F[z(t)]\,h[z(t)]$, we get $e_{m+1} = e_m + h(\tilde s^1_m) - h[z(m)] + (1/2)\bar F[z(m+t_0)]\,h[z(m+t_0)]$, $t_0\in(0,1)$, and $e_{m_0} = 0$. Using again the Taylor mean-value theorem, this recurrence gives
$$e_{m+1} = e_m + e_m\,\frac{dh(s)}{ds}\Big|_{z(m)+\bar e} + \frac12\,\bar F[z(m+t_0)]\,h[z(m+t_0)] \qquad (52)$$
$$\phantom{e_{m+1}} = e_m\,\{1 - \bar F[z(m)+\bar e]\} + \frac12\,\bar F[z(m+t_0)]\,h[z(m+t_0)]\,,$$
with $\bar e\in(0, e_m)$, $t_0\in(0,1)$. Therefore, $e_{m_0+1} > 0$ and, by induction on $i$, $e_{m_0+i} > 0$ for all $i\geq 1$.

(iii)

(iii-1) Assume that $\liminf_{s\to\infty} a(s) > c > 0$, which includes the case $\bar a > 0$. We show first that $e_m$ is bounded. Going one step further in the development of $h(\tilde s^1_m) - h[z(m)]$, we obtain the recurrence
$$e_{m+1} = e_m + e_m\,\frac{dh(s)}{ds}\Big|_{z(m)} + \frac12\,\frac{d^2 h(s)}{ds^2}\Big|_{z(m)+\bar e}\,(e_m)^2 + \frac12\,\bar F[z(m+t_0)]\,h[z(m+t_0)]\,, \qquad \bar e\in(0, e_m)\,,\; t_0\in(0,1)\,,$$
and thus for $m\geq m_0$, since $\varphi(\cdot)$, $\bar F(\cdot)$ and $h(\cdot)$ decrease and $e_m > 0$,
$$e_{m+1} \;<\; e_m\,\{1 - \bar F[z(m)]\} + \frac12\,\varphi[z(m)]\,(e_m)^2 + \frac12\,\bar F[z(m)]\,h[z(m)]\,. \qquad (53)$$
Take $m_0$ such that $\forall m\geq m_0$, $h[z(m)] < c/2$ and $a[z(m)] > c/2$; it implies $e_m < c/2$ for any $m > m_0$. Indeed, $e_{m_0} = 0$ and (53) directly gives $e_{m+1} < c/2$ when $e_m < c/2$. It also implies $e_m/a[z(m)] < 1$ for any $m > m_0$. We show now that $e_m\to 0$ when $m\to\infty$. Lemma 3 (v) gives $\liminf_{m\to\infty}\bar F[z(m) + e_m]/\bar F[z(m)] > 2b$ for some $b > 0$, and (52) then gives for $m$ large enough
$$e_{m+1} \;<\; e_m\,\{1 - 2b\,\bar F[z(m)]\} + \frac12\,\bar F[z(m)]\,h[z(m)]\,. \qquad (54)$$
Since from (i) $\bar F[z(m)]\sim 1/(Am)$, Lemma 2 implies $e_m\to 0$ when $m\to\infty$.

(iii-2) Assume now that $\bar a = 0$. We show first that $\rho_m = e_m/a[z(m)]$ is bounded. One can write $a[z(m+1)] = a[z(m)] + [z(m+1) - z(m)]\,a'(\bar z)$ for some $\bar z$ in $(z(m), z(m+1))$. Therefore, $a[z(m+1)] > a[z(m)] - h[z(m)]\max_{z\in[z(m), z(m+1)]}|a'(z)|$. When $\bar F(\cdot)$ is a von Mises function, $A(z)\to 1$ and $a'(z)\to 0$ when $z\to\infty$. Therefore, for any $\epsilon < 1/2$, there exists $m_1$ such that for any $m > m_1$, $A[z(m)]\max_{z\in[z(m), z(m+1)]}|a'(z)| < \epsilon$ and $A[z(m)]\bar F[z(m)] < 1/16$. Since $h[z(m)] = a[z(m)]\,A[z(m)]\,\bar F[z(m)]$, we obtain $a[z(m+1)] > a[z(m)]\,\{1 - \epsilon\bar F[z(m)]\}$, and (53) gives for $m > m_1$:
$$\rho_{m+1} \;<\; \frac{1}{1 - \epsilon\bar F[z(m)]}\,\Big(\rho_m\,\{1 - \bar F[z(m)]\} + \frac12\,\bar F[z(m)]\,(\rho_m)^2 + \frac12\,h[z(m)]\,\varphi[z(m)]\Big)$$
$$\phantom{\rho_{m+1}} \;<\; \rho_m\,\{1 - (1-\epsilon)\bar F[z(m)]\} + \bar F[z(m)]\,(\rho_m)^2 + h[z(m)]\,\varphi[z(m)]\,.$$
Take $m_0 > m_1$; we show that $\rho_m < (1-\epsilon)/2$ for any $m\geq m_0$. Since $e_{m_0} = 0$, $\rho_{m_0} = 0$, and $\rho_m < (1-\epsilon)/2$ implies $\rho_{m+1} < (1-\epsilon)/2 + \bar F[z(m)]\,\{A[z(m)]\bar F[z(m)] - (1-\epsilon)^2/4\} < (1-\epsilon)/2$. Therefore, $\limsup_{m\to\infty} e_m/a[z(m)] < B = (1-\epsilon)/2$. We show now that $e_m\to 0$ when $m\to\infty$. Using Lemma 3 (v) and (52), one gets (54) for $m$ large enough. Since $\bar F[z(m)]\sim 1/m$ ($A = 1$ when $\bar a = 0$), Lemma 2 implies $\lim_{m\to\infty} e_m = 0$.

(iv) Since $e_m\to 0$ from (iii), Lemma 3 (i, iii) gives $\bar F(\tilde s^1_m)/\bar F[z(m)]\to 1$. Since $mA\bar F[z(m)]\to 1$, see (i), one gets $mA\bar F(\tilde s^1_m)\to 1$.

(v) When $\limsup_{s\to\infty} a(s) < C < \infty$, (i) and Lemma 3 (iv) give $|z(m) - \bar F^{-1}(1/m)|\to 0$; $e_m\to 0$ thus implies $|\tilde s^1_m - \bar F^{-1}(1/m)|\to 0$ when $m\to\infty$. From (iii), $\limsup_{m\to\infty}|\tilde s^1_m - z(m)|/a[z(m)] < B < \infty$; (i) and Lemma 3 (ii) give $|z(m) - \bar F^{-1}(1/m)|/a[z(m)]\to 0$. Lemma 3 (vi) then implies $\limsup_{m\to\infty}|\tilde s^1_m - \bar F^{-1}(1/m)|/a(\tilde s^1_m) < D < \infty$.
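For Exp(1), everything in Lemma 4 is explicit (this choice of $\bar F$ is ours, for illustration): $h(s) = \int_s^\infty\bar F(u)\,du = e^{-s}$, $a(s)\equiv 1$, $A = 1$, the ODE integrates to $z(t) = \log(e^{z(m_0)} + t - m_0)$, and the threshold recursion is $\tilde s^1_{m+1} = \tilde s^1_m + e^{-\tilde s^1_m}$, started here at $\tilde s^1_1 = 0$ (with one step left and $X\geq 0$, anything is accepted). The sketch below checks $e_m > 0$, $e_m\to 0$ and $mA\bar F(\tilde s^1_m)\to 1$:

import math

s, m0 = 0.0, 1            # s~1_1 = 0: with one step left, accept anything (X >= 0)
c = math.exp(s) - m0      # ODE solution: z(t) = log(t + c), since dz/dt = exp(-z)
for m in range(1, 10**6 + 1):
    if m in (10, 10**3, 10**6):
        z = math.log(m + c)
        print(f"m = {m:>7d}   s_m = {s:8.4f}   z(m) = {z:8.4f}   "
              f"e_m = {s - z:.5f}   m*Fbar(s_m) = {m * math.exp(-s):.4f}")
    s += math.exp(-s)     # s~1_{m+1} = s~1_m + h(s~1_m), with h(s) = exp(-s)

The error $e_m$ is positive and shrinks to 0, while $m\bar F(\tilde s^1_m)$ climbs toward 1, exactly as parts (ii), (iii) and (iv) assert.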

Lemma 5 Assume that $\bar F(\cdot)$ is a von Mises function. Then the following is true:
(i)
$$\frac{\mathrm{E}\{\sum_{i=1}^{k} X^*_{i,m}\}}{k\,\bar F^{-1}(1/\lfloor m/k\rfloor)} \;\to\; 1 \quad\text{as } m\to\infty\,;$$
(ii) $\limsup_{s\to\infty} a(s) < C < \infty$ implies $\limsup_{m\to\infty}|\mathrm{E}\{\sum_{i=1}^{k} X^*_{i,m}\} - k\,\bar F^{-1}(1/\lfloor m/k\rfloor)| < b$ for some $b < \infty$;
(iii) $\lim_{s\to\infty} a(s) = 0$ implies $\mathrm{E}\{\sum_{i=1}^{k} X^*_{i,m}\} - k\,\bar F^{-1}(1/\lfloor m/k\rfloor)\to 0$ as $m\to\infty$.

Proof. (i) We shall denote $d_m = \bar F^{-1}(1/m)$ and $c_m = a(d_m)$. When $\bar F(\cdot)$ is a von Mises function,
$$\Big[(1/k)\,\mathrm{E}\Big\{\sum_{i=1}^{k} X^*_{i,m}\Big\} - d_m\Big]\Big/ c_m \;\to\; E_{\Lambda,k} \quad\text{as } m\to\infty\,, \qquad (55)$$
with $E_{\Lambda,k}$ the mean of the sum of a $k$-dimensional $\Lambda$-extremal variate, see [10] p. 201. Since $a(s)/s\to 0$ when $s\to\infty$, one gets $(1/k)\,\mathrm{E}\{\sum_{i=1}^{k} X^*_{i,m}\}/d_m\to 1$ as $m\to\infty$. Define $d_{m,k} = d_m - a(d_m)\log k$. Since $\bar F(\cdot)$ is a von Mises function, $\bar F(d_{m,k})/\bar F(d_m)\to k$ as $m\to\infty$. Since $\bar F(d_{\lfloor m/k\rfloor})/\bar F(d_m) = m/\lfloor m/k\rfloor\to k$, one gets $\bar F(d_{\lfloor m/k\rfloor})/\bar F(d_{m,k})\to 1$. Lemma 3 (ii) implies $d_{\lfloor m/k\rfloor}/d_{m,k}\to 1$, and $a(d_m)/d_m\to 0$ gives $d_{m,k}/d_m\to 1$, so that finally $(1/k)\,\mathrm{E}\{\sum_{i=1}^{k} X^*_{i,m}\}/d_{\lfloor m/k\rfloor}\to 1$ when $m\to\infty$.

(ii) (55) gives $(1/c_m)\{(1/k)\,\mathrm{E}\{\sum_{i=1}^{k} X^*_{i,m}\} - d_m - a(d_m)E_{\Lambda,k}\}\to 0$; since $\bar F(d_{\lfloor m/k\rfloor})/\bar F(d_{m,k})\to 1$, see the proof of part (i), Lemma 3 (iv) implies $|d_{\lfloor m/k\rfloor} - d_{m,k}|\to 0$, and therefore, since $c_m = a(d_m) < C$,
$$(1/k)\,\mathrm{E}\Big\{\sum_{i=1}^{k} X^*_{i,m}\Big\} - d_{\lfloor m/k\rfloor} - (\log k + E_{\Lambda,k})\,a(d_m) \;\to\; 0 \quad\text{as } m\to\infty\,. \qquad (56)$$

(iii) The result directly follows from (56).
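For Exp(1) (again our choice of example), the quantities in Lemma 5 are explicit: the $i$-th largest of $m$ draws has mean $H_m - H_{i-1}$, $d_m = \bar F^{-1}(1/m) = \log m$, and $a(\cdot)\equiv 1$, so part (i) (ratio tending to 1) and part (ii) (bounded difference) can be checked directly:

import math

def H(j):                       # harmonic number H_j (H_0 = 0)
    return sum(1.0 / i for i in range(1, j + 1))

k = 3
for m in (30, 300, 3000, 30000):
    top_k = sum(H(m) - H(i - 1) for i in range(1, k + 1))   # E{sum_{i<=k} X*_{i,m}}
    kd = k * math.log(m // k)                               # k * Fbar^{-1}(1/floor(m/k))
    print(f"m = {m:>6d}   E sum = {top_k:8.4f}   k*d = {kd:8.4f}   "
          f"ratio = {top_k / kd:.4f}   diff = {top_k - kd:.4f}")

The difference stabilizes near a constant (a value involving $E_{\Lambda,k}$ through (56)) rather than vanishing, since here $a(\cdot)\equiv 1$ so that (ii) applies but (iii) does not, while the ratio slowly tends to 1 as in (i).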

Acknowledgements The author wishes to thank an anonymous referee for his impressive review. His numerous comments, corrections and suggestions have been extremely useful in improving the paper.

References

[1] C. Albright and C. Derman. Asymptotically optimal policies for the stochastic sequential assignment problem. Management Science, 19(1):46–51, 1972.
[2] S. Albright. A Markov chain version of the secretary problem. Naval Research Logistics Quarterly, 23(1):151–159, 1976.
[3] S. Albright. A Bayesian approach to a generalized house selling problem. Management Science, 24(4):432–440, 1977.
[4] K. Ano, M. Tamaki, and M. Hu. A secretary problem with uncertain employment when the number of offers is restricted. Journal of the Operations Research Society of Japan, 39(3):307–315, 1996.
[5] Y. Bar-Shalom and E. Tse. Dual effect, certainty equivalence, and separation in stochastic control. IEEE Transactions on Automatic Control, 19(5):494–500, 1974.
[6] D. Bertsekas. Dynamic Programming: Deterministic and Stochastic Models. Prentice-Hall, Englewood Cliffs, 1987.
[7] D. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, Belmont, 1995.
[8] C.E.R.N. The compact muon solenoid, technical proposal, trigger and data acquisition. Technical Report CERN/LHCC 94-38, European Laboratory for Particle Physics, 1994.
[9] C. Derman, G. Lieberman, and S. Ross. A sequential stochastic assignment problem. Management Science, 18(7):349–355, 1972.
[10] P. Embrechts, C. Klüppelberg, and T. Mikosch. Modelling Extremal Events. Springer, Berlin, 1997.
[11] V. Fedorov. Theory of Optimal Experiments. Academic Press, New York, 1972.
[12] A. Fel'dbaum. Optimal Control Systems. Academic Press, New York, 1965.
[13] W. Feller. An Introduction to Probability Theory and Its Applications. Wiley, New York, 1966.
[14] T. Ferguson. Who solved the secretary problem? Statistical Science, 4(3):282–296, 1989.
[15] P. Freeman. The secretary problem and its extensions: a review. International Statistical Review, 51:189–206, 1983.
[16] I. Gutman. On a problem of L. Moser. Canadian Mathematical Bulletin, 3(1):35–39, 1960.
[17] R. Kadison. Strategies in the secretary problem. Expositiones Mathematicae, 12:125–144, 1994.
[18] D. Lindley. Dynamic programming and decision theory. Applied Statistics, 10:39–51, 1961.
[19] L. Moser. On a problem of Cayley. Scripta Mathematica, 22:289–292, 1956.
[20] J. Patchell and O. Jacobs. Separability, neutrality and certainty equivalence. International Journal of Control, 13(2):337–342, 1971.
[21] L. Pronzato. Sequential selection of observations in randomly generated experiments. Tatra Mountains Mathematical Publications, 1999. To appear.
[22] F. Pukelsheim. Optimal Experimental Design. Wiley, New York, 1993.
[23] M. Quine and J. Law. Exact results for a secretary problem. Journal of Applied Probability, 33:630–639, 1996.
[24] A. Rényi. Calcul des Probabilités. Dunod, Paris, 1966.
[25] R. Righter. A resource allocation problem in a random environment. Operations Research, 37(2):329–338, 1989.
[26] J. Rose. The secretary problem with a call option. Operations Research Letters, 3:237–241, 1984.
[27] S. Silvey. Optimal Design. Chapman & Hall, London, 1980.
[28] M. Smith. A secretary problem with uncertain employment. Journal of Applied Probability, 12:620–624, 1975.
[29] M. Tamaki. Recognizing both the maximum and the second maximum of a sequence. Journal of Applied Probability, 16:803–812, 1979.
[30] M. Tamaki. A full-information best-choice problem with finite memory. Journal of Applied Probability, 23:718–735, 1986.
[31] M. Tamaki. A secretary problem with uncertain employment and best choice of available candidates. Operations Research, 39(2):274–284, 1991.
[32] M. Tamaki. A secretary problem with rank-dependent rejection probability when the number of offers is limited. In Proc. 1st Int. Conf. on Operations and Quantitative Management, pages 135–142, Jaipur, January 1997.
[33] M. Yang. Recognizing the maximum of a random sequence based on relative rank with backward solicitation. Journal of Applied Probability, 11:504–512, 1974.
[34] G. Yeo. Duration of a secretary problem. Journal of Applied Probability, 34:556–558, 1997.

      E{J}   std{J}   E{n}    std{n}   E{T}   std{T}
S1    2.02    0.55    0.291    0.45    0.07    0.025
S2    1.73    0.90    0.374    0.48    0.05    0.02
S3    2.29    0.51    0.530    0.50    0.33    0
S4    1.36    0.50    0.042    0.20    0.09    0.03
S5    2.03    0.54    0.267    0.44    1.40    0.15
S6    2.04    0.60    0.324    0.47    7.42    0.9

Table 1: Empirical means and standard deviations of J, n and computing time T (in s) for strategies S1 to S6 (N = 50, n = 1, 1000 repetitions).

        S1       S2       S3       S4       S5       S6
S1       .     12.79   -14.78    30.50    -1.16    -1.63
S2   -12.79      .     -18.51    11.73   -12.67   -13.09
S3    14.78    18.51      .      45.35    14.12    13.54
S4   -30.50   -11.73   -45.35      .     -31.46   -28.69
S5     1.16    12.67   -14.12    31.46      .      -0.02
S6     1.63    13.09   -13.54    28.69     0.02      .

Table 2: Values of $\delta_{j,k}(J)$ (method of paired comparisons) for strategies S1 to S6 (N = 50, n = 1, 1000 repetitions).

        S1       S2       S3       S4       S5       S6
S1       .     -9.51   -11.95    18.20     3.64    -2.99
S2     9.51      .      -7.32    22.28    10.94     4.07
S3    11.95     7.32      .      29.26    13.31    10.74
S4   -18.20   -22.28   -29.26      .     -17.03   -18.73
S5    -3.64   -10.94   -13.31    17.03      .      -5.12
S6     2.99    -4.07   -10.74    18.73     5.12      .

Table 3: Values of $\delta_{j,k}(n)$ (method of paired comparisons) for strategies S1 to S6 (N = 50, n = 1, 1000 repetitions).


      E{J}   std{J}   E{n}   std{n}   E{T}   std{T}
S1   15.61    1.90    7.24    1.18    0.09    0.03
S2    9.09    2.24    2.58    1.30    0.05    0.02
S3   16.63    1.90    8.51    0.94    2.31    0
S4   13.27    2.04    4.62    1.47    0.09    0.03
S5   15.78    1.91    7.42    1.15    6.4     0.5
S6   15.83    1.92    7.44    1.11   43       2.8

Table 4: Empirical means and standard deviations of J, n and computing time T (in s) for strategies S1 to S6 (N = 50, n = 10, 1000 repetitions).

        S1       S2       S3       S4       S5       S6
S1       .     82.97   -29.23    47.60    -8.97    -9.35
S2   -82.97      .     -98.28   -41.73   -89.08   -91.51
S3    29.23    98.28      .      62.94    26.10    25.71
S4   -47.60    41.73   -62.94      .     -45.62   -45.10
S5     8.97    89.08   -26.10    45.62      .      -0.12
S6     9.35    91.51   -25.71    45.10     0.12      .

Table 5: Values of $\delta_{j,k}(J)$ (method of paired comparisons) for strategies S1 to S6 (N = 50, n = 10, 1000 repetitions).

        S1       S2       S3       S4       S5       S6
S1       .     77.01   -30.40    56.07    -7.91    -6.67
S2   -77.01      .    -112.93   -26.85   -84.44   -88.80
S3    30.40   112.93      .      72.79    27.91    28.93
S4   -56.07    26.85   -72.79      .     -53.91   -52.60
S5     7.91    84.44   -27.91    53.91      .      -1.04
S6     6.67    88.80   -28.93    52.60     1.04      .

Table 6: Values of $\delta_{j,k}(n)$ (method of paired comparisons) for strategies S1 to S6 (N = 50, n = 10, 1000 repetitions).
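For completeness, a hedged sketch of how paired-comparison values like those of Tables 2, 3, 5 and 6 can be computed, assuming $\delta_{j,k}$ is the usual paired statistic (the mean gain difference over the 1000 common simulated sequences divided by its standard error; the paper's exact normalization is not restated here):

import numpy as np

def paired_stat(g_j, g_k):
    # Paired comparison of two strategies run on the same simulated sequences:
    # mean difference divided by its standard error.
    d = np.asarray(g_j) - np.asarray(g_k)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))

# Stand-in data (independent normals; real runs would share the same X
# sequences, which is what makes the comparison "paired"):
rng = np.random.default_rng(1)
g1 = rng.normal(2.02, 0.55, size=1000)
g2 = rng.normal(1.73, 0.90, size=1000)
print(f"delta_(1,2)(J) ~ {paired_stat(g1, g2):.2f}")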

FIGURE CAPTIONS:

Fig. 1: Evolution of the expected gain as a function of time with three items to be selected: optimal sequential decisions (full line), OLFO decisions (dashed line), optimal non-sequential decision (dotted line). The density of X is $\varphi_1(\cdot)$.

Fig. 2: Evolution of the expected gain as a function of time with three items to be selected: optimal sequential decisions (full line), OLFO decisions (dashed line), optimal non-sequential decision (dotted line). The density of X is $\varphi_2(\cdot)$.

Fig. 3: Evolution of the expected gain as a function of time with three items to be selected: the true density is $\varphi_1(\cdot)$ and optimal decisions are for $\varphi_2(\cdot)$ (full line); the true density is $\varphi_2(\cdot)$ and optimal decisions are for $\varphi_1(\cdot)$ (dashed line).

Fig. 4: Evolution of the expected gain as a function of time with one item to be selected: optimal sequential decisions (full line), decisions with a constant but optimized threshold (dash-dotted line). The density of X is $\varphi_1(\cdot)$.

Fig. 5: Evolution of the expected gain as a function of time with ten items to be selected (Example 5): optimal sequential decisions (full line), OLFO decisions (dashed line), suboptimal decisions (38) (dash-dotted line).

Fig. 6: Evolution of $l_j$ (42) as a function of $j$ for N = 50, A = 1, = 0.01: n = 1 (), n = 5 (+) and n = 10 ().

Fig. 7: Evolution of the threshold as a function of time: S5 (dashed line), S3 (full line). The values of $X_k$ are indicated by stars.
