
A Near-Optimal Dynamic Learning Algorithm for Online Matching Problems with Concave Returns

Zizhuo Wang∗

Xiao Chen

May 11, 2014

Abstract

We consider the online matching problem with concave returns. This problem is a significant generalization of the Adwords allocation problem and has vast applications in online advertising. In this problem, items arrive sequentially and each must be allocated to one of the bidders, who each bid a certain value for it. At each time, the decision maker has to allocate the current item to one of the bidders without knowing the future bids, and the objective is to maximize the sum of concave functions of each bidder's aggregate value. In this work, we propose an algorithm that achieves near-optimal performance for this problem when the bids arrive in a random order and the input data satisfies certain conditions. The key idea of our algorithm is to learn the input data pattern dynamically: we solve a sequence of carefully chosen partial allocation problems and use their optimal solutions to assist with future decisions. Our analysis belongs to the primal-dual paradigm; however, the absence of linearity in the objective function and the dynamic feature of the algorithm make our analysis quite different from existing work.

1 Introduction

Selling online advertisements has been the main revenue driver for many internet companies such as Google, Yahoo, and Facebook. For example, out of the $46 billion revenue earned by Google in 2012, $43.6 billion (95%) came from online advertisement [2]. In the same year, these two figures were $5.1 billion and $4.3 billion (84%) for Facebook [1]. Because advertising revenue carries such enormous weight in many internet businesses, improving the performance of ads allocation systems is extremely important for those companies and has thus attracted great interest in the research community over the past decade.

To study the online advertisement allocation problem (the Adwords problem), the majority of the research adopts an online matching model, see, e.g., [15, 12, 7, 13, 4, 14, 9, 10, 8, 11, 6]. In the online matching model, there are $m$ bidders.¹ A sequence of $n$ keywords arrive at the search engine during a fixed time horizon. Based on the relevance of each keyword, the $i$th bidder bids a certain amount $b_{ij}$ to show his advertisement on the result page of the $j$th keyword. The search engine's decision is to allocate each keyword to one bidder (we only consider a single allocation in this paper). Note that each allocation decision can only depend on the information earlier in the sequence, not on any future data.

∗ Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, MN, 55455, USA. Email: [email protected], [email protected].
¹ Throughout this paper, we use search engine terminology when describing the problem. Other applications may use different terminology, but the problem structures are the same.


In a classical online matching problem, each bidder $i$ has a known daily budget $B_i$ and the search engine maximizes its daily revenue collected from all the bidders. The offline optimization problem can be written as follows:
$$
\begin{array}{lll}
\mbox{maximize}_x & \sum_{i=1}^m \min\left\{\sum_{j=1}^n b_{ij}x_{ij},\, B_i\right\} & \\
\mbox{s.t.} & \sum_{i=1}^m x_{ij} \le 1, & \forall j\\
& x_{ij} \ge 0, & \forall i, j.
\end{array}
\qquad (1)
$$
Here $x_{ij}$ denotes the fraction of item $j$ allocated to bidder $i$.² In the online version of this problem, at time $j$, the coefficients $b_j = \{b_{ij}\}_{i=1}^m$ are revealed and an irrevocable decision $x_j = \{x_{ij}\}_{i=1}^m$ has to be made before observing the next data. This problem is considered to be one of the fundamental problems in the theory of online optimization.

In this paper, we consider a generalization of problem (1). Specifically, instead of the budgeted linear objective function, we allow more general concave return functions. That is, we are interested in the following online matching problem with concave returns:
$$
\begin{array}{lll}
\mbox{maximize}_x & \sum_{i=1}^m M_i\left(\sum_{j=1}^n b_{ij}x_{ij}\right) & \\
\mbox{s.t.} & \sum_{i=1}^m x_{ij} \le 1, & \forall j\\
& x_{ij} \ge 0, & \forall i, j,
\end{array}
\qquad (2)
$$
where for all $i$, $M_i$ is a nondecreasing concave function with $M_i(0) = 0$. In this paper, we also assume that the $M_i(\cdot)$'s are continuously differentiable and denote their derivatives by $M_i'(\cdot)$. This assumption is mainly for ease of analysis; our main results still hold if it is not satisfied (subdifferential arguments have to be used in that case).

As pointed out in [8], there are several practical motivations for considering concave returns. Among them are convex penalty costs for under-delivery in search engine–advertiser contracts, the concavity of the click-through rate³ in the number of allocated bids observed in empirical data, and fairness considerations. In each of these problems, one can write the objective as a concave function. For the sake of space, we refer the readers to [8] for a more thorough review of the motivations for this problem.

One important question when studying online algorithms is the assumption on the input data. In this work, we adopt a random permutation model. More precisely, we assume that

1. the total number of arrivals $n$ is known a priori;
2. the set of $\{b_j\}$'s can be adversarially chosen; however, its arrival order is uniformly distributed over all permutations.

The random permutation model has been adopted in much of the recent literature on online matching problems [7, 9, 3]. It is an intermediate path between using a worst-case analysis and assuming each input data point is drawn i.i.d. from a certain distribution. Compared to the worst-case analysis [15, 10, 5, 8], the random permutation model is practically reasonable yet much less conservative. On the other hand, the random permutation model is much less restrictive than assuming i.i.d. input, and the performance difference between these two models is often small [6]. In particular, the assumption of the knowledge of $n$ is necessary for any online algorithm to achieve near-optimal performance [7] (but it can be relaxed to $1 \pm \epsilon$ knowledge). Therefore, for large problems with relatively stationary input, the random permutation model is a good approximation and the study of such models is of practical interest. Next we define the performance measure of an algorithm under the random permutation model.

² We allow fractional allocations in our model. However, our proposed algorithms output integer solutions. Thus they are also near-optimal if one confines to integer solutions.
³ The click-through rate is the fraction of ads that converts from impression to click and thus generates revenue.
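To make the benchmark concrete, the following is a minimal sketch of the offline problem (2), assuming the cvxpy package and taking $M_i(x) = \sqrt{x}$ as a placeholder concave return; the function name and the choice of $M_i$ are ours for illustration, not part of the paper.

```python
import cvxpy as cp
import numpy as np

def solve_offline(B):
    """Offline benchmark (2) with M_i(x) = sqrt(x) as a placeholder concave return.

    B is the m x n matrix of bids b_ij; returns the optimal value and allocation.
    """
    m, n = B.shape
    X = cp.Variable((m, n), nonneg=True)         # x_ij: fraction of item j given to bidder i
    u = cp.sum(cp.multiply(B, X), axis=1)        # u_i = sum_j b_ij * x_ij
    objective = cp.Maximize(cp.sum(cp.sqrt(u)))  # sum_i M_i(u_i), here M_i = sqrt
    constraints = [cp.sum(X, axis=0) <= 1]       # each item allocated at most once
    problem = cp.Problem(objective, constraints)
    problem.solve()
    return problem.value, X.value

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    OPT, _ = solve_offline(rng.uniform(0.1, 1.0, size=(3, 50)))
    print("offline OPT:", OPT)
```

Any other nondecreasing concave $M_i$ with $M_i(0) = 0$ that cvxpy can express (e.g., $\log(1+x)$ or $x^p$ with $0 < p < 1$) can be substituted in the objective.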


Definition 1. Let OPT be the optimal objective value of the offline problem (2). An online algorithm $A$ is called $c$-competitive in the random permutation model if the expected objective value of the online solutions obtained by using $A$ is at least $c$ times the optimal offline value, that is,
$$
\mathbb{E}_\sigma\left[\sum_{i=1}^m M_i\left(\sum_{j=1}^n b_{ij}\,x_{ij}(\sigma, A)\right)\right] \ge c\,\mathrm{OPT},
$$
where the expectation is taken over uniformly random permutations $\sigma$ of $1, \dots, n$, and $x_{ij}(\sigma, A)$ is the $ij$th decision made by algorithm $A$ when the inputs arrive in order $\sigma$.
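Definition 1 can be checked empirically by averaging over random arrival orders. Below is a small illustrative harness for doing so; `online_alg` stands for any online policy (for example the algorithms developed later), `OPT` for the offline optimum computed as in the sketch above, and `M` for the vectorized return function. All names are ours and hypothetical.

```python
import numpy as np

def estimate_competitiveness(B, online_alg, OPT, M, trials=200, seed=0):
    """Monte Carlo estimate of E_sigma[ sum_i M_i(sum_j b_ij x_ij(sigma, A)) ] / OPT.

    online_alg(B_perm) must return an m x n allocation matrix for bids presented in
    the permuted column order; M maps a vector of aggregate values u to (M_i(u_i))_i.
    """
    m, n = B.shape
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(trials):
        sigma = rng.permutation(n)           # uniformly random arrival order
        X = online_alg(B[:, sigma])          # online decisions under this order
        u = (B[:, sigma] * X).sum(axis=1)    # aggregate value collected by each bidder
        values.append(M(u).sum())
    return np.mean(values) / OPT
```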

In [8], the authors propose an algorithm for the online matching problem with concave returns that has a constant competitive ratio under the worst-case model. They also show that a constant competitive ratio is the best possible result under that model. In this paper, we propose an algorithm under the random permutation model which achieves near-optimal performance under some conditions on the input. Our main result is stated as follows:

Theorem 1. Choose any $\epsilon \in (0, 1/2)$ and let $C = \frac{m\log(m^2n/\epsilon)}{\epsilon^2}$. If the following conditions hold:

1. $\max_{i,j} b_{ij} \le 1$;
2. $\min_{i,j}\{b_{ij} \mid b_{ij} > 0\} \ge \eta > 0$;
3. $\sum_{j=1}^n b_{ij} \ge 2n\gamma$ for all $i$, for some $\gamma > 0$;
4. $n \ge \max\left\{\frac{3\log(m/\epsilon)}{\epsilon\gamma^2},\ \frac{2mC}{\epsilon\eta\gamma}\right\}$;
5. $K = \eta\,\frac{\epsilon n\gamma - \epsilon C}{mC} > 1$, where $\gamma$ is defined in condition 3, and
$$
M_i'(KC) < \eta\, M_{i'}'(C), \qquad \forall i, i',
$$

then there exists an online algorithm (Algorithm DLA) that is $1 - O(\epsilon)$-competitive for the online matching problem with concave returns $M_i(\cdot)$.

Now we explain the meaning of the conditions in Theorem 1. The first condition is without loss of generality since we can always rescale the inputs. The second condition says that there is a minimum bid requirement. This is very common in practice and is often enforced by having a reserve price. Condition 3 means that each bidder has to submit enough bids throughout the entire horizon. Since our algorithm is learning based, we need enough accepted data from each bidder in our learning process to obtain an accurate understanding of the input pattern. In reality, each bidder is usually interested in a class of keywords and this condition is not hard to satisfy. The fourth condition requires $n$ to be large enough compared to the other input parameters. Note that in many practical problems, $n$ is typically very large (e.g., Google receives more than 5 billion searches per day; even if we focus on a specific category, the number can still be in the millions). Thus this condition is justified. The last condition appears to be complex; however, given concave functions $M_i(\cdot)$ such that $M_i'(x) \to 0$ as $x \to \infty$, this condition also just requires $n$ to be large enough. For example, if we choose $M_i(x) = x^p$ ($0 < p < 1$) for all $i$, then in order to satisfy this condition one only needs to make sure that $n \ge \frac{2mC}{\epsilon\gamma\eta^{2/(1-p)}}$. In the analysis, this condition is used to make sure that in the optimal allocation each bidder is allocated at least a certain amount, which is necessary in proving the performance of our algorithm. Again, for a large problem, this condition is not hard to satisfy.

To propose an algorithm that achieves near-optimal performance, the main idea is to utilize the observed data in the allocation process. In particular, since the input data arrives in a random order, using the past input data and projecting it into the future should give a good approximation of the problem. To mathematically capture this idea, we use a primal-dual approach. We obtain the dual optimal solutions to suitably constructed partial programs and use them to assist with future allocations. The key question is then which partial program to solve. We first propose a one-time learning algorithm (OLA, see Section 2) that solves a partial program only once, at time $\epsilon n$. By carefully examining this algorithm, we prove that it achieves near-optimal performance when the inputs satisfy certain conditions. However, those conditions are stronger than the ones stated in Theorem 1. To improve our algorithm, we further propose a dynamic learning algorithm (DLA, see Section 3). The dynamic learning algorithm makes better use of the observed data and updates the dual solution at a geometric pace, that is, at times $\epsilon n, 2\epsilon n, 4\epsilon n$, and so on. We show that these re-solvings lift the performance of the algorithm by an order of magnitude, and thus prove Theorem 1. As one will see in the proof of DLA, the choice of the re-solving points balances the tradeoff between exploration and exploitation, which is the main tension in this type of learning algorithm.

It is worth mentioning that a similar kind of dynamic learning algorithm has been proposed in [3] and further studied in [16] and [19]. However, that literature focuses only on linear objectives. In our analysis, the nonlinearity of the objective function presents a non-trivial hurdle, since one can no longer simply analyze the revenue generated in each part and add the parts together. In this paper, we work around this hurdle by a convex duality argument. We believe that our analysis is a non-trivial extension of the previous work, and the problem solved has important applications.
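Before moving on to the algorithms, here is a small sanity-check sketch of the conditions in Theorem 1 for the example $M_i(x) = x^p$ discussed above. The function and parameter names are ours, and the inequalities simply transcribe conditions 1–5 as stated above; it is only meant as a quick way to see whether a given instance falls into the regime covered by the theorem.

```python
import math

def check_theorem1_conditions(B, eps, eta, gamma, p):
    """Check conditions 1-5 of Theorem 1 for M_i(x) = x^p (so M_i'(x) = p x^(p-1)).

    B is the m x n bid matrix; eta is the minimum positive bid, gamma the
    per-bidder bid-mass parameter, eps the accuracy parameter.
    """
    m, n = len(B), len(B[0])
    C = m * math.log(m * m * n / eps) / eps**2
    cond1 = max(max(row) for row in B) <= 1
    cond2 = min(b for row in B for b in row if b > 0) >= eta > 0
    cond3 = all(sum(row) >= 2 * n * gamma for row in B)
    cond4 = n >= max(3 * math.log(m / eps) / (eps * gamma**2),
                     2 * m * C / (eps * eta * gamma))
    K = eta * (eps * n * gamma - eps * C) / (m * C)
    dM = lambda x: p * x ** (p - 1)            # M'(x) for M(x) = x^p
    cond5 = K > 1 and dM(K * C) < eta * dM(C)
    return all([cond1, cond2, cond3, cond4, cond5])
```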

2 The one-time learning algorithm

We first rewrite the offline problem (2) as follows:
$$
\begin{array}{lll}
\mbox{maximize}_{x,u} & \sum_{i=1}^m M_i(u_i) & \\
\mbox{s.t.} & \sum_{j=1}^n b_{ij}x_{ij} = u_i, & \forall i\\
& \sum_{i=1}^m x_{ij} \le 1, & \forall j\\
& x_{ij} \ge 0, & \forall i, j.
\end{array}
\qquad (3)
$$
We define the following "dual problem":
$$
\begin{array}{lll}
\mbox{inf}_{v,y} & \sum_{j=1}^n y_j + \sum_{i=1}^m \left(M_i(v_i) - M_i'(v_i)v_i\right) & \\
\mbox{s.t.} & y_j \ge b_{ij}M_i'(v_i), & \forall i, j\\
& v_i \ge 0, & \forall i\\
& y_j \ge 0, & \forall j.
\end{array}
\qquad (4)
$$
Let the optimal value of (3) be $P^*$ and the optimal value of (4) be $D^*$. We first prove the following lemma, whose proof is relegated to Appendix B.

Lemma 1. $P^* = D^*$. Furthermore, the objective value of any feasible solution to (4) is an upper bound on the optimal value of (3).

Before we describe our algorithm, we define the following partial optimization problem:
$$
(P_\epsilon)\quad
\begin{array}{lll}
\mbox{maximize}_{x,u} & \sum_{i=1}^m M_i(u_i) & \\
\mbox{s.t.} & \sum_{j=1}^{\epsilon n} \frac{b_{ij}}{\epsilon}x_{ij} = u_i, & \forall i\\
& \sum_{i=1}^m x_{ij} \le 1, & \forall j\\
& x_{ij} \ge 0, & \forall i, j.
\end{array}
\qquad (5)
$$
We define our algorithm as follows. The idea is to use the first $\epsilon n$ bids to learn an approximate $\hat u$ and then use it to make all the future allocations. Here $\hat u$ is the solution of $(P_\epsilon)$, which projects the allocation in the first $\epsilon n$ bids onto the entire time horizon.
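A minimal sketch of solving $(P_\epsilon)$ is given below, again assuming cvxpy and the placeholder $M_i(x) = \sqrt{x}$ (so $M_i'(u) = 1/(2\sqrt{u})$). Besides $\hat u$, it also returns the quantities $v_i = M_i'(\hat u_i)$ and $y_j = \max_i b_{ij}M_i'(\hat u_i)$ that appear in the dual-based analysis. Function and variable names are ours.

```python
import cvxpy as cp
import numpy as np

def solve_partial(B_prefix, n):
    """Solve (P_eps) on the first eps*n bids (the columns of B_prefix), with M_i(x) = sqrt(x)."""
    m, s = B_prefix.shape                                    # s = eps * n observed bids
    X = cp.Variable((m, s), nonneg=True)
    u = (n / s) * cp.sum(cp.multiply(B_prefix, X), axis=1)   # projected aggregate values u_i
    problem = cp.Problem(cp.Maximize(cp.sum(cp.sqrt(u))),
                         [cp.sum(X, axis=0) <= 1])
    problem.solve()
    u_hat = np.maximum(np.asarray(u.value).ravel(), 1e-12)
    # Quantities used in the dual-based analysis: v_i = M_i'(u_i), y_j = max_i b_ij M_i'(u_i)
    v_hat = 1.0 / (2.0 * np.sqrt(u_hat))
    y_hat = (B_prefix * v_hat[:, None]).max(axis=0)
    return u_hat, v_hat, y_hat
```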


Algorithm 1 One-Time Learning Algorithm (OLA)

1. During the first $\epsilon n$ bids, we do not make any allocations.
2. After observing the first $\epsilon n$ bids, we solve $(P_\epsilon)$ and denote the optimal solutions by $\hat x$ and $\hat u$.
3. Define
$$
x_i(u, b) = \begin{cases} 1 & \mbox{if } i = \mathrm{argmax}_k\, \{b_k M_k'(u_k)\},\\ 0 & \mbox{otherwise.}\end{cases}
\qquad (6)
$$
Here we break ties among the $b_k M_k'(u_k)$ arbitrarily. For the $(\epsilon n + 1)$th to the $n$th bid, we use the allocation rule $x_j = x(\hat u, b_j)$.
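A sketch of rule (6) and the OLA loop is given below; it reuses the hypothetical `solve_partial` from the previous sketch and takes the derivative $M_i'$ as an argument (for the $\sqrt{x}$ example, `dM = lambda u: 1.0 / (2.0 * np.sqrt(u))`). This is an illustration of the stated procedure, not the authors' code.

```python
import numpy as np

def allocation_rule(u_hat, b_col, dM):
    """Rule (6): give the whole item to argmax_k b_kj * M_k'(u_hat_k) (ties broken by argmax)."""
    scores = b_col * dM(u_hat)
    winner = int(np.argmax(scores))
    x = np.zeros_like(b_col)
    if scores[winner] > 0:          # ignore items nobody bids on
        x[winner] = 1.0
    return x

def one_time_learning(B, eps, dM, solve_partial):
    """OLA: skip the first eps*n bids, learn u_hat once from (P_eps), then allocate by rule (6)."""
    m, n = B.shape
    s = int(eps * n)
    u_hat, _, _ = solve_partial(B[:, :s], n)     # learning phase on the first eps*n bids
    X = np.zeros((m, n))
    for j in range(s, n):                        # exploitation phase
        X[:, j] = allocation_rule(u_hat, B[:, j], dM)
    return X
```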

Note that a similar idea has been used to construct algorithms for online matching problems with linear objective functions (see, e.g., [3, 16, 7]). However, the analyses of those algorithms all depend on the linearity of the objective function, which we do not have in this model. Instead, a more careful analysis that uses concavity is required, making our analysis quite different from those in the prior literature.

In the following, we assume without loss of generality that $\max_{i,j} b_{ij} \le 1$. We also make a technical assumption as follows:

Assumption 1. The inputs of the problem are in general position. That is, for any vector $p = (p_1, \dots, p_m) \ne 0$, there are at most $m$ terms among $\mathrm{argmax}_i\{b_{ij}p_i\}$, $j = 1, \dots, n$, that are not singleton sets.

The assumption says that we only need to break ties in (6) no more than $m$ times. This assumption is not necessarily true for all inputs. However, as pointed out by [7] and [3], one can always perturb the $b_{ij}$'s by an arbitrarily small amount such that the assumption holds, and the effect on the solution can be made arbitrarily small. Given this assumption, we have the following relationship between $\hat x$ and $\hat u$. Its proof follows immediately from the fact that $\hat x$ and $x(\hat u, b)$ differ in no more than $m$ terms.

Lemma 2.
$$
\epsilon \hat u_i - m \le \sum_{j=1}^{\epsilon n} b_{ij}\, x_{ij}(\hat u, b_j) \le \epsilon \hat u_i + m.
$$

We first prove the following theorem about the performance of OLA, which relies on a condition on the solution to $(P_\epsilon)$.

Theorem 2. Choose any $\epsilon \in (0, 1/2)$. Given $\min_i \hat u_i \ge \frac{12m\log(m^2n/\epsilon)}{\epsilon^3}$, OLA is a $1 - O(\epsilon)$-competitive algorithm.

Before we prove Theorem 2, we define some notations.

• We denote the optimal offline solution to (3) by $(x^*, u^*)$ with optimal value OPT.
• Define $\bar u_i = \sum_{j=1}^n b_{ij}\, x_{ij}(\hat u, b_j)$; note that $\bar u_i$ does not in general equal $\hat u_i$.

We show the following lemma:

Lemma 3. Choose any $\epsilon \in (0, 1/2)$. Given $\min_i \hat u_i \ge \frac{12m\log(m^2n/\epsilon)}{\epsilon^3}$, with probability $1 - \epsilon$,
$$
(1-\epsilon)\hat u_i \le \bar u_i \le (1+\epsilon)\hat u_i, \qquad \mbox{for all } i. \qquad (7)
$$

Proof. The proof proceeds as follows. For any fixed $\hat u$, we say that a random sample $S$ (the first $\epsilon n$ arrivals) is bad for this $\hat u$ if and only if $\hat u$ is the optimal solution to (5) for this $S$, but $\bar u_i < (1-\epsilon)\hat u_i$ or $\bar u_i > (1+\epsilon)\hat u_i$ for some $i$. First, we show that the probability of a bad sample is small for every fixed $\hat u$ (satisfying $\min_i \hat u_i \ge \frac{12m\log(m^2n/\epsilon)}{\epsilon^3}$) and $i$. Then we take a union bound over all distinct $\hat u$'s and $i$'s to prove the lemma.

To start with, we fix $\hat u$ and $i$. Define $Y_j = b_{ij}x_{ij}(\hat u, b_j)$. By Lemma 2 and the condition on $\hat u_i$, we have
$$
(1-\epsilon^2)\epsilon\hat u_i \le \epsilon\hat u_i - m \le \sum_{j\in S} Y_j \le \epsilon\hat u_i + m \le (1+\epsilon^2)\epsilon\hat u_i.
$$
Therefore, the probability of a bad $S$ is bounded by the sum of the following two terms ($N$ denotes all the arrivals):
$$
P\left(\sum_{j\in S} Y_j \le \epsilon(1+\epsilon^2)\hat u_i,\ \sum_{j\in N} Y_j > (1+\epsilon)\hat u_i\right) + P\left(\sum_{j\in S} Y_j \ge \epsilon(1-\epsilon^2)\hat u_i,\ \sum_{j\in N} Y_j < (1-\epsilon)\hat u_i\right). \qquad (8)
$$
For the first term, we have
$$
\begin{aligned}
P\left(\sum_{j\in S} Y_j \le \epsilon(1+\epsilon^2)\hat u_i,\ \sum_{j\in N} Y_j > (1+\epsilon)\hat u_i\right)
&\le P\left(\left|\sum_{j\in S} Y_j - \epsilon\sum_{j\in N} Y_j\right| > \frac{\epsilon^2}{2}\hat u_i,\ \sum_{j\in N} Y_j > (1+\epsilon)\hat u_i\right)\\
&\le 2\exp\left(-\frac{\epsilon^3\hat u_i}{4(2+\epsilon)}\right) \le \frac{\epsilon}{2m(m^2n)^m} = \delta.
\end{aligned}
$$
Here the second inequality follows from the Hoeffding-Bernstein inequality (Lemma 6 in Appendix A). Similarly, the second term in (8) is also bounded by $\delta$. Therefore, the probability of a bad sample is bounded by $2\delta$ for fixed $\hat u$ and $i$.

Next, we take a union bound over all distinct $\hat u$'s. We call $\hat u$ and $\hat u'$ distinct if and only if they result in different allocations, i.e., $x_j(\hat u, b_j) \ne x_j(\hat u', b_j)$ for some $j$. Denote $M_i'(u_i) = v_i$. For each $j$, the allocation is uniquely determined by the signs of the terms
$$
b_{ij}v_i - b_{i'j}v_{i'}, \qquad \forall\, 1 \le i < i' \le m.
$$
There are $m(m-1)/2$ such terms for each $j$. Therefore, the entire allocation profile for all $n$ arrivals is determined by the signs of no more than $m^2n$ differences. Now we simply need to count how many different sign profiles can arise by choosing different $v$'s. By a result in computational geometry [17], the total number of distinct profiles for the $m^2n$ differences cannot exceed $(m^2n)^m$. Therefore, the number of distinct $\hat u$'s is no more than $(m^2n)^m$. Taking a union bound over all distinct $\hat u$'s and $i = 1, \dots, m$, Lemma 3 follows. □

Next we show that OLA achieves a near-optimal solution under the condition in Theorem 2. We first construct a feasible solution to (4):
$$
\hat v_i = \hat u_i, \qquad \hat y_j = \max_i\{b_{ij}M_i'(\hat u_i)\}.
$$
By Lemma 1, $\sum_{j=1}^n \hat y_j + \sum_{i=1}^m (M_i(\hat u_i) - M_i'(\hat u_i)\hat u_i)$ is an upper bound on OPT. Thus, we have
$$
\begin{aligned}
\mathrm{OPT} - \sum_{i=1}^m M_i(\bar u_i)
&\le \sum_{j=1}^n \hat y_j + \sum_{i=1}^m\left(M_i(\hat u_i) - M_i'(\hat u_i)\hat u_i\right) - \sum_{i=1}^m M_i(\bar u_i)\\
&= \sum_{i=1}^m\left(M_i(\hat u_i) - \hat u_iM_i'(\hat u_i)\right) - \sum_{i=1}^m M_i(\bar u_i) + \sum_{i=1}^m \bar u_iM_i'(\hat u_i)\\
&= \sum_{i=1}^m\left(M_i(\hat u_i) - M_i(\bar u_i) + (\bar u_i - \hat u_i)M_i'(\hat u_i)\right),
\end{aligned}
$$
where the first equality holds because, by the allocation rule (6),
$$
\sum_{j=1}^n \hat y_j = \sum_{j=1}^n\sum_{i=1}^m x_{ij}(\hat u, b_j)\hat y_j = \sum_{i=1}^m\sum_{j=1}^n x_{ij}(\hat u, b_j)b_{ij}M_i'(\hat u_i) = \sum_{i=1}^m \bar u_iM_i'(\hat u_i).
$$
Now, we claim that if condition (7) holds,
$$
\sum_{i=1}^m\left(M_i(\hat u_i) - M_i(\bar u_i) + (\bar u_i - \hat u_i)M_i'(\hat u_i)\right) \le 2\epsilon\sum_{i=1}^m M_i(\bar u_i).
$$
We consider the following two cases:

• Case 1: $\bar u_i \le \hat u_i$. In this case,
$$
M_i(\hat u_i) - M_i(\bar u_i) + (\bar u_i - \hat u_i)M_i'(\hat u_i) \le M_i(\hat u_i) - M_i(\bar u_i) \le \frac{\hat u_i - \bar u_i}{\bar u_i}M_i(\bar u_i) \le 2\epsilon M_i(\bar u_i),
$$
where the second inequality holds because of the concavity of $M_i(\cdot)$.

• Case 2: $\bar u_i > \hat u_i$. In this case,
$$
M_i(\hat u_i) - M_i(\bar u_i) + (\bar u_i - \hat u_i)M_i'(\hat u_i) \le (\bar u_i - \hat u_i)M_i'(\hat u_i) \le \frac{\bar u_i - \hat u_i}{\hat u_i}M_i(\hat u_i) \le \epsilon M_i(\bar u_i).
$$
Again, the second inequality is because of the concavity of $M_i(\cdot)$.

Thus, under the condition that $\min_i \hat u_i \ge \frac{12m\log(m^2n/\epsilon)}{\epsilon^3}$, with probability $1 - \epsilon$,
$$
\mathrm{OPT} - \sum_{i=1}^m M_i(\bar u_i) \le 2\epsilon\sum_{i=1}^m M_i(\bar u_i) \le 2\epsilon\,\mathrm{OPT},
$$
i.e., $\sum_{i=1}^m M_i(\bar u_i) \ge (1-2\epsilon)\mathrm{OPT}$.

Lastly, we note that the actual allocation of our algorithm to bidder $i$ is $\tilde u_i = \sum_{j=\epsilon n+1}^n b_{ij}x_{ij}(\hat u, b_j)$ (since we ignore the first $\epsilon n$ arrivals). By Lemma 2, we have
$$
\tilde u_i = \bar u_i - \sum_{j=1}^{\epsilon n} b_{ij}x_{ij}(\hat u, b_j) \ge \bar u_i - \epsilon(1+\epsilon^2)\hat u_i.
$$
Thus when condition (7) holds, $\tilde u_i \ge (1-3\epsilon)\bar u_i$. Therefore,
$$
\sum_{i=1}^m M_i(\tilde u_i) \ge \sum_{i=1}^m M_i\left((1-3\epsilon)\bar u_i\right) \ge (1-3\epsilon)\sum_{i=1}^m M_i(\bar u_i) \ge (1-5\epsilon)\mathrm{OPT},
$$
where the second inequality is because of the concavity of the $M_i(\cdot)$'s and the fact that $M_i(0) = 0$. Therefore, given $\min_i \hat u_i \ge \frac{12m\log(m^2n/\epsilon)}{\epsilon^3}$, with probability $1 - \epsilon$, OLA collects at least $(1-5\epsilon)\mathrm{OPT}$, and Theorem 2 is proved. □

Theorem 2 shows that OLA is near-optimal under some conditions on $\hat u$. However, $\hat u$ is essentially an output of the algorithm. Although such types of conditions are not uncommon in the study of online algorithms (e.g., in the results of [7] and [9]), they are quite undesirable. In this particular case, it is not even clear whether the condition will ever be satisfied. In the following, we address this problem by providing a set of sufficient conditions that only depend on the input parameters (i.e., $m$, $n$, the $b$'s and the $M(\cdot)$'s). We show that our algorithm achieves near-optimal performance under these conditions. We start with the following theorem.


Theorem 3. For any $C > 0$, suppose the following conditions hold:

1. $\max_{i,j} b_{ij} \le 1$;
2. $\min_{i,j}\{b_{ij} \mid b_{ij} > 0\} \ge \eta > 0$;
3. $\sum_{j=1}^n b_{ij} \ge 2n\gamma$ for all $i$, for some $\gamma > 0$;
4. $n \ge \max\left\{\frac{3\log(m/\epsilon)}{\epsilon\gamma^2},\ \frac{2mC}{\epsilon\eta\gamma}\right\}$;
5. $K = \eta\,\frac{\epsilon n\gamma - \epsilon C}{mC} > 1$, where $\gamma$ is defined in condition 3, and
$$
M_i'(KC) < \eta\, M_{i'}'(C), \qquad \forall i, i'. \qquad (9)
$$

Then with probability $1 - \epsilon$, $\hat u_i \ge C$ for all $i$.

Some explanations of these conditions were given after Theorem 1. Here we give some additional comments on conditions 3 and 5 that relate to the analysis. Condition 3 means that each bidder submits enough bids throughout the entire horizon. It is intuitive that in order to guarantee that every bidder receives at least a certain amount of allocation ($\hat u_i \ge C$), they must have submitted enough bids to start with. The last condition can be explained as follows: in order to prove that in the solution to $(P_\epsilon)$ each bidder gets a certain amount of allocation, we need to rule out the possibility that one bidder receives nearly all the allocation. Precluding this scenario requires the decreasing marginal effect to be strong enough to compensate for the potential differences in the bid values. We will see in the proof (see Appendix C) that condition (9) is used exactly for this purpose.

Now, combining Theorems 2 and 3, we have the following result for OLA:

Corollary 1. Choose any $\epsilon \in (0, 1/2)$. Assume the inputs satisfy the following conditions for $C = \frac{m\log(m^2n/\epsilon)}{\epsilon^3}$:

1. $\max_{i,j} b_{ij} \le 1$;
2. $\min_{i,j}\{b_{ij} \mid b_{ij} > 0\} \ge \eta > 0$;
3. $\sum_{j=1}^n b_{ij} \ge 2n\gamma$ for all $i$, for some $\gamma > 0$;
4. $n \ge \max\left\{\frac{3\log(m/\epsilon)}{\epsilon\gamma^2},\ \frac{2mC}{\epsilon\eta\gamma}\right\}$;
5. $K = \eta\,\frac{\epsilon n\gamma - \epsilon C}{mC} > 1$, where $\gamma$ is defined in condition 3, and
$$
M_i'(KC) < \eta\, M_{i'}'(C), \qquad \forall i, i'.
$$

Then OLA is $1 - O(\epsilon)$-competitive under the random permutation model.

3 Dynamic Learning Algorithm

In the previous section, we introduced a one-time learning algorithm that can achieve near-optimal performance. While OLA illustrates the ideas of our approach and requires solving a convex optimization problem only once, the conditions it requires to reach near-optimality are stricter than what we claim in Theorem 1. In this section, we propose an enhanced algorithm that weakens these conditions.

The main idea of the enhancement is the following: in the one-time learning algorithm, we only solve a partial program once, and that solution may contain some error. If we modify the solution as we get more data, we may be able to improve the performance of the algorithm. In the following, we introduce a dynamic learning algorithm


based on this idea, which updates the allocation policy every time the history doubles; that is, it computes a new $\hat u$ at times $t = \epsilon n, 2\epsilon n, 4\epsilon n, \dots$ and uses it to allocate the bids in the next time period. We define the following problem $(P_\ell)$:
$$
(P_\ell)\quad
\begin{array}{lll}
\mbox{maximize}_{x,u} & \sum_{i=1}^m M_i(u_i) & \\
\mbox{s.t.} & \sum_{j=1}^{\ell} \frac{n}{\ell} b_{ij}x_{ij} = u_i, & \forall i\\
& \sum_{i=1}^m x_{ij} \le 1, & \forall j\\
& x_{ij} \ge 0, & \forall i, j,
\end{array}
$$
and define $(x^\ell, u^\ell)$ to be the optimal solution to $(P_\ell)$. The algorithm is given as follows:

Algorithm 2 Dynamic Learning Algorithm (DLA)

1. During the first $\epsilon n$ orders, we do not make any allocations.
2. After $t = \epsilon n$, set each $x_j = x(u^\ell, b_j)$, where $\ell = \lceil 2^r \epsilon n\rceil$ and $r$ is the largest integer such that $\ell < j$.

In the following, without loss of generality, we assume that $\epsilon = 2^{-L}$. Define $\ell_k = 2^{k-1}\epsilon n$, $k = 1, \dots, L$, and define $\hat u^k = u^{\ell_k}$. We first prove the following theorem:

Theorem 4. If for all $k$, $\min_i \hat u^k_i \ge \frac{16m\log(m^2n/\epsilon)}{\epsilon^2}$, then DLA is $1 - O(\epsilon)$-competitive under the random permutation model.
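To make the geometric update schedule concrete, here is a sketch of the DLA loop; it reuses the hypothetical `solve_partial` and `allocation_rule` from the OLA sketches in Section 2 and is only meant to illustrate the re-solving points $\epsilon n, 2\epsilon n, 4\epsilon n, \dots$, not the authors' implementation.

```python
import numpy as np

def dynamic_learning(B, eps, dM, solve_partial, allocation_rule):
    """DLA sketch: no allocation during the first eps*n bids; afterwards, re-solve the
    partial program every time the history doubles (at eps*n, 2*eps*n, 4*eps*n, ...)
    and use the latest u_hat with rule (6) until the next re-solving point."""
    m, n = B.shape
    X = np.zeros((m, n))
    ell = int(eps * n)                                   # next re-solving point
    u_hat = None
    for j in range(n):                                   # j = 0, ..., n-1 in arrival order
        if j >= ell:                                     # history has doubled: re-learn
            u_hat, _, _ = solve_partial(B[:, :ell], n)   # solve (P_ell) on the first ell bids
            ell *= 2
        if u_hat is not None:                            # skip the initial learning phase
            X[:, j] = allocation_rule(u_hat, B[:, j], dM)
    return X
```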

Before we proceed to the proof, we define some more notation:
$$
\bar u^k_i = \sum_{j=\ell_k+1}^{\ell_{k+1}} b_{ij}\,x_{ij}(\hat u^k, b_j), \qquad
\tilde u^k_i = \sum_{j=1}^{n} b_{ij}\,x_{ij}(\hat u^k, b_j), \qquad \mbox{and} \qquad
\bar u_i = \sum_{k=1}^{L} \bar u^k_i.
$$
In these definitions, $\bar u^k_i$ is the value allocated to bidder $i$ in the period $\ell_k + 1$ to $\ell_{k+1}$ using $\hat u^k$, which is the actual allocation in that period; $\tilde u^k_i$ is the allocation to bidder $i$ over all periods if $\hat u^k$ were used throughout; and $\bar u_i$ is the actual allocation to bidder $i$ during the entire algorithm. We first prove the following lemma bounding the differences between $\bar u^k_i$, $\tilde u^k_i$ and $\hat u^k_i$.

Lemma 4. If $\min_i \hat u^k_i \ge \frac{16m\log(m^2n/\epsilon)}{\epsilon^2}$, then with probability $1 - \epsilon$, for all $i$,
$$
\left(1 - \epsilon\sqrt{\tfrac{n}{\ell_k}}\right)\hat u^k_i \le \frac{n}{\ell_k}\bar u^k_i \le \left(1 + \epsilon\sqrt{\tfrac{n}{\ell_k}}\right)\hat u^k_i \qquad (10)
$$
and
$$
\left(1 - \epsilon\sqrt{\tfrac{n}{\ell_k}}\right)\hat u^k_i \le \tilde u^k_i \le \left(1 + \epsilon\sqrt{\tfrac{n}{\ell_k}}\right)\hat u^k_i. \qquad (11)
$$

Lemma 4 shows that with high probability, $\frac{n}{\ell_k}\bar u^k_i$, $\tilde u^k_i$ and $\hat u^k_i$ are close to each other. In particular, when $k$ is small, the factor $(1 \pm \epsilon\sqrt{n/\ell_k})$ is relatively loose, while as $k$ increases, the factor becomes tight. The proof of Lemma 4 is similar to that of Lemma 3 and is relegated to Appendix D. The next lemma gives a bound on the revenue obtained by DLA.

Lemma 5. If $\hat u^k_i \ge \frac{16m\log(m^2n/\epsilon)}{\epsilon^2}$ for all $i$ and $k$, then with probability $1 - O(\epsilon)$,
$$
\sum_{i=1}^m M_i\left(\frac{n}{\ell_k}\bar u^k_i\right) \ge \left(1 - 6\epsilon\sqrt{\tfrac{n}{\ell_k}}\right)\mathrm{OPT}.
$$

The proof of Lemma 5 can be found in Appendix E. Finally, we prove Theorem 4 by bounding the objective value of the actual allocation. Note that the actual allocation for each $i$ can be written as
$$
\bar u_i = \sum_{k=1}^{L} \bar u^k_i = \sum_{k=1}^{L} \alpha_k\,\frac{n}{\ell_k}\bar u^k_i, \qquad \mbox{where } \alpha_k = \frac{\ell_k}{n}.
$$
By the concavity of the $M_i$'s and $M_i(0) = 0$ (note that $\sum_k \alpha_k \le 1$), we have
$$
\sum_{i=1}^m M_i(\bar u_i) = \sum_{i=1}^m M_i\left(\sum_{k=1}^L \alpha_k\,\frac{n}{\ell_k}\bar u^k_i + \Big(1 - \sum_{k=1}^L\alpha_k\Big)\cdot 0\right) \ge \sum_{i=1}^m\sum_{k=1}^L \alpha_k M_i\left(\frac{n}{\ell_k}\bar u^k_i\right).
$$
By Lemma 5, with probability $1 - O(\epsilon)$,
$$
\sum_{i=1}^m\sum_{k=1}^L \alpha_k M_i\left(\frac{n}{\ell_k}\bar u^k_i\right) = \sum_{k=1}^L \frac{\ell_k}{n}\sum_{i=1}^m M_i\left(\frac{n}{\ell_k}\bar u^k_i\right) \ge \sum_{k=1}^L \frac{\ell_k}{n}\left(1 - 6\epsilon\sqrt{\tfrac{n}{\ell_k}}\right)\mathrm{OPT} = (1-\epsilon)\mathrm{OPT} - 6\epsilon\sum_{k=1}^L\sqrt{\tfrac{\ell_k}{n}}\,\mathrm{OPT} \ge (1-16\epsilon)\mathrm{OPT},
$$
where the last inequality holds because
$$
\sum_{k=1}^L\sqrt{\frac{\ell_k}{n}} = \sqrt{\frac{1}{2}} + \sqrt{\frac{1}{4}} + \cdots \le \frac{1}{\sqrt{2}-1} = 1 + \sqrt{2} \le \frac{5}{2}.
$$
Therefore, DLA is $1 - O(\epsilon)$-competitive and Theorem 4 is proved. □

Similar to Theorem 3, we have the following conditions on the input parameters under which, with high probability, the condition in Theorem 4 holds.

Theorem 5. For any $C > 0$, suppose the following conditions hold:

1. $\max_{i,j} b_{ij} \le 1$;
2. $\min_{i,j}\{b_{ij} \mid b_{ij} > 0\} \ge \eta > 0$;
3. $\sum_{j=1}^n b_{ij} \ge 2n\gamma$ for all $i$, for some $\gamma > 0$;
4. $n \ge \max\left\{\frac{6\log(m/\epsilon)}{\epsilon\gamma^2},\ \frac{2mC}{\epsilon\eta\gamma}\right\}$;
5. $K = \eta\,\frac{\epsilon n\gamma - \epsilon C}{mC} > 1$, where $\gamma$ is defined in condition 3, and
$$
M_i'(KC) < \eta\, M_{i'}'(C), \qquad \forall i, i'.
$$

Then with probability $1 - \epsilon$, $\hat u^k_i \ge C$ for all $i$ and $k$.

The proof of Theorem 5 is very similar to that of Theorem 3 and is given in Appendix F. Finally, combining Theorems 4 and 5, Theorem 1 follows.

4 Conclusions and Future Work

In this paper, we propose a dynamic learning algorithm for the online matching problem with concave returns. We show that our algorithm achieves near-optimal performance when the data arrives in a random order and satisfies certain conditions. The analysis is primal-dual based; however, the nonlinear objective requires us to work around nontrivial hurdles that do not exist in previous work. One important direction for future work is the practical performance of such algorithms, especially how such learning-based algorithms compare with algorithms that focus on worst-case performance on problems of practical size. Such comparisons are not easy to evaluate, and this is part of our ongoing work.


References

[1] Facebook 2012 fiscal year finance report.
[2] Google 2012 fiscal year finance report.
[3] S. Agrawal, Z. Wang, and Y. Ye. A dynamic near-optimal algorithm for online linear programming. Working paper, 2011.
[4] B. Bahmani and M. Kapralov. Improved bounds for online stochastic matching. In Proceedings of the 18th Annual European Conference on Algorithms: Part I, ESA'10, 2010.
[5] N. Buchbinder, K. Jain, and J. Naor. Online primal-dual algorithms for maximizing ad-auctions revenue. In Algorithms–ESA 2007, pages 253–264, 2007.
[6] N. Devanur. Online algorithms with stochastic input. SIGecom Exchanges, 10(2):40–49, 2011.
[7] N. Devanur and T. Hayes. The adwords problem: online keyword matching with budgeted bidders under random permutations. In EC'09: Proceedings of the Tenth ACM Conference on Electronic Commerce, pages 71–78, 2009.
[8] N. Devanur and K. Jain. Online matching with concave returns. In Proceedings of the 44th Symposium on Theory of Computing, STOC '12, pages 137–144, 2012.
[9] J. Feldman, M. Henzinger, N. Korula, V. Mirrokni, and C. Stein. Online stochastic packing applied to display ad allocation. In Algorithms–ESA 2010, pages 182–194, 2010.
[10] J. Feldman, N. Korula, V. Mirrokni, S. Muthukrishnan, and M. Pal. Online ad assignment with free disposal. In WINE'09: Proceedings of the Fifth Workshop on Internet and Network Economics, pages 374–385, 2009.
[11] J. Feldman, A. Mehta, V. Mirrokni, and S. Muthukrishnan. Online stochastic matching: beating 1 − 1/e. In FOCS'09: Proceedings of the 50th Annual IEEE Symposium on Foundations of Computer Science, pages 117–126, 2009.
[12] G. Goel and A. Mehta. Online budgeted matching in random input models with applications to adwords. In SODA'08: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 982–991, 2008.
[13] C. Karande, A. Mehta, and P. Tripathi. Online bipartite matching with unknown distributions. In Proceedings of the 43rd Annual ACM Symposium on Theory of Computing, STOC '11, 2011.
[14] M. Mahdian and Q. Yan. Online bipartite matching with random arrivals: an approach based on strongly factor-revealing LPs. In STOC'11: Proceedings of the 43rd Annual ACM Symposium on Theory of Computing, 2011.
[15] A. Mehta, A. Saberi, U. Vazirani, and V. Vazirani. Adwords and generalized on-line matching. In FOCS'05: Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, pages 264–273, 2005.
[16] M. Molinaro and R. Ravi. Geometry of online packing linear programs. In Proceedings of the 39th International Colloquium on Automata, Languages, and Programming – Volume Part I, ICALP'12, pages 701–713, 2012.
[17] P. Orlik and H. Terao. Arrangements of Hyperplanes. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], Springer-Verlag, Berlin, 1992.
[18] A. van der Vaart and J. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics. Springer, November 1996.
[19] Z. Wang. Dynamic Learning Mechanism in Revenue Management Problems. PhD thesis, Stanford University, Palo Alto, 2012.


A Hoeffding-Bernstein's Inequality

By Theorem 2.14.19 in [18]:

Lemma 6 (Hoeffding-Bernstein's Inequality). Let $u_1, u_2, \dots, u_r$ be random samples without replacement from the real numbers $\{c_1, c_2, \dots, c_R\}$. Then for every $t > 0$,
$$
P\left(\left|\sum_{i=1}^r u_i - r\bar c\right| \ge t\right) \le 2\exp\left(-\frac{t^2}{2r\sigma_R^2 + t\Delta_R}\right),
$$
where $\Delta_R = \max_i c_i - \min_i c_i$, $\bar c = \frac{1}{R}\sum_{i=1}^R c_i$, and $\sigma_R^2 = \frac{1}{R}\sum_{i=1}^R (c_i - \bar c)^2$.
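As a quick numerical illustration (our own, not from [18]), the following sketch compares the empirical tail probability for sampling without replacement with the bound of Lemma 6.

```python
import numpy as np

def hb_bound_check(c, r, t, trials=20_000, seed=0):
    """Compare the empirical tail P(|sum of r draws without replacement - r*mean| >= t)
    with the Hoeffding-Bernstein bound 2*exp(-t^2 / (2*r*sigma_R^2 + t*Delta_R))."""
    c = np.asarray(c, dtype=float)
    R, cbar = len(c), c.mean()
    sigma2, delta = c.var(), c.max() - c.min()
    rng = np.random.default_rng(seed)
    hits = sum(abs(rng.choice(c, size=r, replace=False).sum() - r * cbar) >= t
               for _ in range(trials))
    bound = 2 * np.exp(-t**2 / (2 * r * sigma2 + t * delta))
    return hits / trials, bound

# Example: the empirical tail should sit below the bound.
print(hb_bound_check(np.random.default_rng(1).uniform(0, 1, 200), r=50, t=5.0))
```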

B Proof of Lemma 1

We first write down the Lagrangian dual of (3). By associating $p_i$ with the first set of constraints and $y_j$ with the second set of constraints, the Lagrangian dual of (3) is:
$$
\begin{array}{lll}
\mbox{inf}_{p,y} & \sum_{j=1}^n y_j + \sum_{i=1}^m \sup_{u_i \ge 0}\left(M_i(u_i) - p_iu_i\right) & \\
\mbox{s.t.} & y_j \ge b_{ij}p_i, & \forall i, j\\
& y_j \ge 0, & \forall j.
\end{array}
\qquad (12)
$$
Since the primal problem is convex and only has linear constraints, Slater's condition holds; thus strong duality holds and (3) and (12) have the same optimal value.

Next we show that (4) and (12) are equivalent. To show this, assume the range of $M_i'(\cdot)$ on $[0, \infty)$ is $(a_i, b_i]$ or $[a_i, b_i]$ (by the assumption that the $M_i(\cdot)$'s are continuously differentiable, it must be one of these two forms). Now we argue that the optimal $p_i$ must lie in $[a_i, b_i]$ in (12). First, we must have $p_i \ge a_i$; otherwise the term $\sup_{u_i \ge 0}\{M_i(u_i) - p_iu_i\}$ goes to infinity as $u_i$ increases and $p_i$ cannot be optimal for (12). On the other hand, if $p_i > b_i$, the optimal $u_i$ must be $0$, and one can always set $p_i = b_i$ and achieve a smaller objective value. Therefore, $p_i \in [a_i, b_i]$ at the optimum. Now if $p_i \in (a_i, b_i]$ at the optimum, one can always find $v_i$ such that $M_i'(v_i) = p_i$, and that $v_i$ must be the optimal solution to $\sup_{u_i}\{M_i(u_i) - p_iu_i\}$ (the supremum is attained in this case). Therefore, each feasible solution of (12) corresponds to a feasible solution of (4) and vice versa. The only case left is when $p_i = a_i$ at the optimum. In this case, $\sup_{u_i}\{M_i(u_i) - a_iu_i\} = \lim_{x\to\infty}\{M_i(x) - a_ix\}$. Since $\lim_{x\to\infty}M_i'(x) = a_i$, there exists a sequence of feasible solutions of (4) whose objective values converge to the objective value obtained with $p_i = a_i$ in (12). Therefore, the lemma is proved. □
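The second claim of Lemma 1 (weak duality: any feasible solution of (4) upper-bounds (3)) is easy to check numerically. The sketch below does so for the placeholder $M_i(x) = \sqrt{x}$, picking the smallest feasible $y_j$ for a given $v$; all names and the choice of $M_i$ are ours.

```python
import numpy as np

M  = np.sqrt                                                  # placeholder concave return M_i
dM = lambda x: 1.0 / (2.0 * np.sqrt(np.maximum(x, 1e-12)))    # its derivative M_i'

def primal_value(B, X):
    """Objective of (3) at a feasible allocation X (columns sum to at most 1)."""
    return M((B * X).sum(axis=1)).sum()

def dual_value(B, v):
    """Objective of (4) at v >= 0, with the smallest feasible y_j = max_i b_ij M_i'(v_i)."""
    y = (B * dM(v)[:, None]).max(axis=0)
    return y.sum() + (M(v) - dM(v) * v).sum()

rng = np.random.default_rng(2)
B = rng.uniform(0.1, 1.0, size=(3, 40))
X = np.zeros_like(B)
X[rng.integers(0, 3, 40), np.arange(40)] = 1.0                # an arbitrary feasible allocation
for v in (np.ones(3), 5.0 * np.ones(3), 25.0 * np.ones(3)):
    assert primal_value(B, X) <= dual_value(B, v) + 1e-9      # weak duality (Lemma 1)
```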

C Proof of Theorem 3

First we show that under condition 3, with probability $1 - \epsilon$,
$$
\sum_{j=1}^{\epsilon n} b_{ij} \ge \epsilon n\gamma \qquad \mbox{for all } i. \qquad (13)
$$
To see this, we use the Hoeffding-Bernstein inequality: for any $i$,
$$
P\left(\left|\sum_{j=1}^{\epsilon n} b_{ij} - \epsilon\sum_{j=1}^{n} b_{ij}\right| \ge \epsilon n\gamma\right) \le 2\exp\left(-\frac{\epsilon^2n^2\gamma^2}{2\epsilon n + \epsilon n\gamma}\right) \le 2\exp\left(-\epsilon n\gamma^2/3\right) < \epsilon/m,
$$
where the last inequality is because of condition 4. Taking a union bound across all $i$'s, we get that (13) holds with probability $1 - \epsilon$.

Now we argue that given that (13) happens and the conditions of the theorem hold, there cannot exist an $i$ such that $\hat u_i < C$ in the optimal solution to the partial program $(P_\epsilon)$. We prove this by contradiction. Suppose there exists $i$ such that $\hat u_i < C$ in the optimal solution to $(P_\epsilon)$. We argue that there must then exist $1 \le j \le \epsilon n$ such that:

1. $j \in S = \{j : x_{ij} < 1,\ b_{ij} \ge \eta\}$;
2. there exists $i'$ such that $x_{i'j} > 0$ and $\hat u_{i'} \ge KC$.

These two conditions mean that there must exist a bid $j$ that is allocated (at least partially) to a bidder whose total allocation is at least $KC$, while we could have allocated more of that bid to bidder $i$, whose final allocation is less than $C$.

To see why this is true, first note that given (13) we have $\sum_{j=1}^{\epsilon n} b_{ij} \ge \epsilon n\gamma$. However, by the definition of $i$, we also have $\sum_{j=1}^{\epsilon n} b_{ij}x_{ij} \le \epsilon C$. Therefore, combined with the fact that $\max_{i,j} b_{ij} \le 1$, there must exist at least $\epsilon n\gamma - \epsilon C$ indices $j$ between $1$ and $\epsilon n$ such that $x_{ij} < 1$ but $b_{ij} \ge \eta$, i.e., $|S| \ge \epsilon n\gamma - \epsilon C$.

Next we show that among $j \in S$ there exists at least one $j$ such that $x_{i'j} > 0$ while $\hat u_{i'} \ge KC$ for some $i'$. To see this, define $T = \{i : \hat u_i < KC\}$. We first have
$$
\sum_{i\in T,\,j} x_{ij} \le \frac{1}{\eta}\sum_{i\in T,\,j} b_{ij}x_{ij} < \frac{mKC}{\eta}. \qquad (14)
$$
Here the second inequality is because $|T| \le m$ and $\sum_j b_{ij}x_{ij} = \epsilon\hat u_i < KC$ for each $i \in T$. However, we also have
$$
\sum_{i,\,j\in S} x_{ij} \ge \epsilon n\gamma - \epsilon C. \qquad (15)
$$
This is because, since the $M_i(\cdot)$'s are increasing, at the optimum $\sum_i x_{ij}$ must equal $1$ for every $j$. Therefore, by taking the difference between (15) and (14), we have
$$
\sum_{i\notin T,\,j\in S} x_{ij} > \epsilon n\gamma - \epsilon C - \frac{mKC}{\eta} = 0.
$$
Here the equality is because of the definition of $K$. Therefore, there exists $j \in S$ such that the bid is allocated (at least partially) to some $i'$ with $\hat u_{i'} \ge KC$. We denote such a $j$ by $j^*$.

Finally, we consider another allocation that increases the allocation of $j^*$ to $i$ while decreasing the allocation to $i'$ (by the definition of $j^*$, such a change is feasible at least for a small amount). The local change (derivative) of the objective function under this perturbation is
$$
M_i'(\hat u_i)b_{ij^*} - M_{i'}'(\hat u_{i'})b_{i'j^*} \ge M_i'(C)\eta - M_{i'}'(KC) > 0,
$$
where the first inequality is because of the concavity of the $M_i(\cdot)$'s and the last inequality is because of condition 5. However, this contradicts the assumption that the solution is optimal. Thus, the theorem is proved. □

D Proof of Lemma 4

We first prove (10). The idea is similar to the proof for the one-time learning algorithm. For any fixed $\hat u^k$, we say that a random sample $S$ (a sequence of arrivals) is bad if and only if $\hat u^k$ is the optimal solution to $(P_{\ell_k})$ but $\bar u^k$ does not satisfy (10) for some $i$. First, we show that the probability of a bad sample is small for any fixed $\hat u^k$ and fixed $i$. Then we take a union bound over all distinct $\hat u^k$'s and $i$'s to show the result.

Fix $\hat u^k$ and $i$, and define $Y_j = b_{ij}x_{ij}(\hat u^k, b_j)$. By Lemma 2 and the assumption on $\hat u^k_i$, we have
$$
\frac{\ell_k}{n}\hat u^k_i - \epsilon^2\hat u^k_i \le \sum_{j=1}^{\ell_k} Y_j \le \frac{\ell_k}{n}\hat u^k_i + \epsilon^2\hat u^k_i.
$$
Therefore, the probability of a bad sample is bounded by the sum of the following two terms:
$$
P\left(\sum_{j=1}^{\ell_k} Y_j \le \frac{\ell_k}{n}\hat u^k_i + \epsilon^2\hat u^k_i,\ \sum_{j=\ell_k+1}^{\ell_{k+1}} Y_j > \frac{\ell_k}{n}\Big(1 + \epsilon\sqrt{\tfrac{n}{\ell_k}}\Big)\hat u^k_i\right)
+ P\left(\sum_{j=1}^{\ell_k} Y_j \ge \frac{\ell_k}{n}\hat u^k_i - \epsilon^2\hat u^k_i,\ \sum_{j=\ell_k+1}^{\ell_{k+1}} Y_j < \frac{\ell_k}{n}\Big(1 - \epsilon\sqrt{\tfrac{n}{\ell_k}}\Big)\hat u^k_i\right). \qquad (16)
$$
For the first term, we have
$$
\begin{aligned}
&P\left(\sum_{j=1}^{\ell_k} Y_j \le \frac{\ell_k}{n}\hat u^k_i + \epsilon^2\hat u^k_i,\ \sum_{j=\ell_k+1}^{\ell_{k+1}} Y_j > \frac{\ell_k}{n}\Big(1 + \epsilon\sqrt{\tfrac{n}{\ell_k}}\Big)\hat u^k_i\right)\\
&\quad= P\left(\sum_{j=1}^{\ell_k} Y_j \le \frac{\ell_k}{n}\hat u^k_i + \epsilon^2\hat u^k_i,\ \sum_{j=1}^{\ell_{k+1}} Y_j > \frac{\ell_k}{n}\Big(2 + \epsilon\sqrt{\tfrac{n}{\ell_k}}\Big)\hat u^k_i\right)\\
&\quad\le P\left(\left|\sum_{j=1}^{\ell_k} Y_j - \frac{1}{2}\sum_{j=1}^{\ell_{k+1}} Y_j\right| > \frac{\epsilon}{4}\sqrt{\frac{\ell_k}{n}}\,\hat u^k_i,\ \sum_{j=1}^{\ell_{k+1}} Y_j > \frac{\ell_k}{n}\Big(2 + \epsilon\sqrt{\tfrac{n}{\ell_k}}\Big)\hat u^k_i\right)\\
&\quad\le 2\exp\left(-\frac{\epsilon^2\hat u^k_i}{16}\right) \le \frac{\epsilon}{2m(m^2n)^m} = \delta.
\end{aligned}
$$
Here the second inequality follows from the Hoeffding-Bernstein inequality, and the third inequality is because of the condition on $\hat u^k_i$. Similarly, we can bound the second term in (16), and therefore the probability of a bad sample is bounded by $2\delta$.

Now we take a union bound over all distinct $\hat u^k$'s and $i$'s. As in the proof of Lemma 3, we call two $u$'s distinct if they result in different allocations. As argued earlier, there are no more than $(m^2n)^m$ distinct $u$'s. Therefore, with probability $1 - O(\epsilon)$, (10) holds.

Next we prove (11). The idea is similar. Fix $\hat u^k$ and $i$, and define $Y_j = b_{ij}x_{ij}(\hat u^k, b_j)$. Applying the Hoeffding-Bernstein inequality, we get
$$
\begin{aligned}
&P\left(\sum_{j=1}^{\ell_k} Y_j \le \frac{\ell_k}{n}\hat u^k_i + \epsilon^2\hat u^k_i,\ \sum_{j=1}^{n} Y_j > \Big(1 + \epsilon\sqrt{\tfrac{n}{\ell_k}}\Big)\hat u^k_i\right)\\
&\quad\le P\left(\left|\sum_{j=1}^{\ell_k} Y_j - \frac{\ell_k}{n}\sum_{j=1}^{n} Y_j\right| > \frac{\epsilon}{4}\sqrt{\frac{\ell_k}{n}}\,\hat u^k_i,\ \sum_{j=1}^{n} Y_j > \Big(1 + \epsilon\sqrt{\tfrac{n}{\ell_k}}\Big)\hat u^k_i\right)\\
&\quad\le \exp\left(-\frac{\epsilon^2\hat u^k_i}{16}\right) \le \delta.
\end{aligned}
$$
Using the same argument as above, Lemma 4 holds. □

E Proof of Lemma 5

The proof consists of two main steps. First we show that with probability $1 - O(\epsilon)$, the following holds for all $k$:
$$
\sum_{i=1}^m M_i(\tilde u^k_i) \ge \left(1 - 2\epsilon\sqrt{\tfrac{n}{\ell_k}}\right)\mathrm{OPT}. \qquad (17)
$$
To show this, we follow steps similar to those used to prove the near-optimality of the one-time learning algorithm. Define
$$
\hat v^k_i = \hat u^k_i, \qquad \hat y^k_j = \max_i\{b_{ij}M_i'(\hat u^k_i)\}.
$$
Since $(\hat v^k, \hat y^k)$ is a feasible solution to (4), we know that
$$
\sum_{j=1}^n \hat y^k_j + \sum_{i=1}^m\left(M_i(\hat u^k_i) - M_i'(\hat u^k_i)\hat u^k_i\right)
$$
is an upper bound on OPT. Therefore, by the same argument as in the proof of Theorem 2, we know that
$$
\mathrm{OPT} - \sum_{i=1}^m M_i(\tilde u^k_i) \le \sum_{i=1}^m\left(M_i(\hat u^k_i) - M_i(\tilde u^k_i) + (\tilde u^k_i - \hat u^k_i)M_i'(\hat u^k_i)\right). \qquad (18)
$$
Now for each term in (18) we consider two cases. If $\hat u^k_i \ge \tilde u^k_i$, then
$$
M_i(\hat u^k_i) - M_i(\tilde u^k_i) + (\tilde u^k_i - \hat u^k_i)M_i'(\hat u^k_i) \le M_i(\hat u^k_i) - M_i(\tilde u^k_i) \le \frac{M_i(\tilde u^k_i)}{\tilde u^k_i}(\hat u^k_i - \tilde u^k_i),
$$
and with probability $1 - \epsilon$, this is less than $2\epsilon\sqrt{n/\ell_k}\,M_i(\tilde u^k_i)$. If $\hat u^k_i < \tilde u^k_i$, then
$$
M_i(\hat u^k_i) - M_i(\tilde u^k_i) + (\tilde u^k_i - \hat u^k_i)M_i'(\hat u^k_i) \le (\tilde u^k_i - \hat u^k_i)M_i'(\hat u^k_i) \le \frac{M_i(\hat u^k_i)}{\hat u^k_i}(\tilde u^k_i - \hat u^k_i).
$$
Again, with probability $1 - \epsilon$, this is less than $2\epsilon\sqrt{n/\ell_k}\,M_i(\tilde u^k_i)$. Therefore, with probability $1 - \epsilon$,
$$
\sum_{i=1}^m\left(M_i(\hat u^k_i) - M_i(\tilde u^k_i) + (\tilde u^k_i - \hat u^k_i)M_i'(\hat u^k_i)\right) \le 2\epsilon\sqrt{\tfrac{n}{\ell_k}}\,\mathrm{OPT},
$$
and (17) is proved.

Next we show that
$$
\sum_{i=1}^m M_i(\tilde u^k_i) - \sum_{i=1}^m M_i\left(\frac{n}{\ell_k}\bar u^k_i\right) \le 4\epsilon\sqrt{\tfrac{n}{\ell_k}}\,\mathrm{OPT}.
$$
To see this, by Lemma 4 we know that with probability $1 - O(\epsilon)$,
$$
\tilde u^k_i - \frac{n}{\ell_k}\bar u^k_i \le 2\epsilon\sqrt{\tfrac{n}{\ell_k}}\,\hat u^k_i.
$$
Therefore, for each $i$ we have
$$
\frac{\tilde u^k_i - \frac{n}{\ell_k}\bar u^k_i}{\tilde u^k_i} \le \frac{2\epsilon\sqrt{\tfrac{n}{\ell_k}}\,\hat u^k_i}{\big(1 - \epsilon\sqrt{\tfrac{n}{\ell_k}}\big)\hat u^k_i} \le 4\epsilon\sqrt{\tfrac{n}{\ell_k}}.
$$
Now we analyze $M_i(\tilde u^k_i) - M_i\big(\frac{n}{\ell_k}\bar u^k_i\big)$ for each $i$. We only need to consider the case $\tilde u^k_i > \frac{n}{\ell_k}\bar u^k_i$ (otherwise the difference is at most $0$). In this case, by the concavity of $M_i(\cdot)$, we have
$$
M_i(\tilde u^k_i) - M_i\left(\frac{n}{\ell_k}\bar u^k_i\right) \le \frac{M_i(\tilde u^k_i)}{\tilde u^k_i}\left(\tilde u^k_i - \frac{n}{\ell_k}\bar u^k_i\right) \le 4\epsilon\sqrt{\tfrac{n}{\ell_k}}\,M_i(\tilde u^k_i).
$$
Therefore,
$$
\sum_{i=1}^m M_i(\tilde u^k_i) - \sum_{i=1}^m M_i\left(\frac{n}{\ell_k}\bar u^k_i\right) \le 4\epsilon\sqrt{\tfrac{n}{\ell_k}}\sum_{i=1}^m M_i(\tilde u^k_i) \le 4\epsilon\sqrt{\tfrac{n}{\ell_k}}\,\mathrm{OPT}.
$$
Together with (17), Lemma 5 holds. □

F Proof of Theorem 5

We first prove that for each $k$, with probability $1 - \frac{\epsilon}{\log(1/\epsilon)}$, $\min_i \hat u^k_i > C$; we then take a union bound over $k$ to prove Theorem 5.

To show that for each $k$, with probability $1 - \frac{\epsilon}{\log(1/\epsilon)}$, $\min_i \hat u^k_i > C$, we use a very similar approach to the proof of Theorem 3. First we show that with probability $1 - \frac{\epsilon}{\log(1/\epsilon)}$, $\sum_{j=1}^{\ell_k} b_{ij} \ge \ell_k\gamma$ for all $i$. To see this, we use the Hoeffding-Bernstein inequality: for any $i$,
$$
P\left(\left|\sum_{j=1}^{\ell_k} b_{ij} - \frac{\ell_k}{n}\sum_{j=1}^{n} b_{ij}\right| \ge \ell_k\gamma\right) \le 2\exp\left(-\ell_k\gamma^2/3\right) < \frac{\epsilon}{m\log(1/\epsilon)},
$$
where the last inequality holds because $\ell_k \ge \epsilon n$ and $n \ge \frac{6\log(m/\epsilon)}{\epsilon\gamma^2}$.

Next we show that, given $\sum_{j=1}^{\ell_k} b_{ij} \ge \ell_k\gamma$, there cannot exist an $i$ such that $\hat u^k_i < C$ in the optimal solution to the partial program $(P_{\ell_k})$. We prove this by contradiction. Suppose there exists $i$ such that $\hat u^k_i < C$ in the optimal solution. Then there must exist $1 \le j \le \ell_k$ such that:

1. $j \in S_k = \{j : x_{ij} < 1,\ b_{ij} \ge \eta\}$;
2. there exists $i'$ such that $x_{i'j} > 0$ and $\hat u^k_{i'} \ge KC$.

To see why this is true, note that we have shown that with probability $1 - \frac{\epsilon}{\log(1/\epsilon)}$, $\sum_{j=1}^{\ell_k} b_{ij} \ge \ell_k\gamma$. However, by the definition of $i$, we have $\sum_{j=1}^{\ell_k} b_{ij}x_{ij} \le \frac{\ell_k}{n}C$. Therefore, we must have $|S_k| \ge \ell_k\gamma - \frac{\ell_k}{n}C$.

Next we show that among $j \in S_k$ there exists at least one $j$ such that $x_{i'j} > 0$ while $\hat u^k_{i'} \ge KC$ for some $i'$. Define $T_k = \{i : \hat u^k_i < KC\}$. We have
$$
\sum_{i\in T_k,\,j} x_{ij} \le \frac{1}{\eta}\sum_{i\in T_k,\,j} b_{ij}x_{ij} < \frac{mKC}{\eta}. \qquad (19)
$$
However, we also have
$$
\sum_{i,\,j\in S_k} x_{ij} \ge \ell_k\gamma - \frac{\ell_k}{n}C. \qquad (20)
$$
Therefore, by taking the difference between (20) and (19), we have
$$
\sum_{i\notin T_k,\,j\in S_k} x_{ij} > \ell_k\gamma - \frac{\ell_k}{n}C - \frac{mKC}{\eta} \ge 0.
$$
The last inequality is by the definition of $K$ and the fact that $\ell_k \ge \epsilon n$ for all $k$. Therefore, there exists $j \in S_k$ such that the bid is allocated (at least partially) to some bidder $i'$ with $\hat u^k_{i'} \ge KC$. We denote such a $j$ by $j^*$.

Finally, we consider another allocation that increases the allocation of $j^*$ to $i$ while decreasing the allocation to $i'$. The local change of the objective function under this perturbation is
$$
M_i'(\hat u^k_i)b_{ij^*} - M_{i'}'(\hat u^k_{i'})b_{i'j^*} \ge M_i'(C)\eta - M_{i'}'(KC) > 0,
$$
where the first inequality is because of the concavity of the $M_i(\cdot)$'s and the last inequality is because of condition 5. However, this contradicts the assumption that the solution is optimal. Thus Theorem 5 is proved. □
