On Greedy Algorithms for Decision Trees

Ferdinando Cicalese (University of Salerno, Italy), Tobias Jacobs (National Institute of Informatics, Japan), Eduardo Laber (PUC – Rio de Janeiro, Brazil), and Marco Molinaro (Carnegie Mellon University, USA)
Abstract. In the general search problem we want to identify a specific element using a set of allowed tests. The general goal is to minimize the number of tests performed, although different measures are used to capture this goal. In this work we introduce a novel greedy approach that achieves the best known approximation ratios simultaneously for many different variations of this identification problem. In addition to this flexibility, our algorithm admits much shorter and simpler analyses than previous greedy strategies. As a second contribution, we investigate the potential of greedy algorithms for the more restricted problem of identifying elements of partially ordered sets by comparison with other elements. We prove that the latter problem is as hard to approximate as the general identification problem. As a positive result, we show that a natural greedy strategy achieves an approximation ratio of 2 for tree-like posets, improving upon the previously best known 14-approximation for this problem.
1 Introduction
The problem of efficiently searching in a discrete set is a fundamental one in computer science [20] and as such it appears in many diverse variants and in a surprisingly wide range of areas, e.g., databases [5, 9], learning [7], parallel assembly of multipart products [10], image processing [3], data compression [14], and, more generally, the theory of algorithms [1, 2, 4, 8, 19–26]. In this paper we contribute to the large literature on searching by considering very general search problems and analyzing the performance of a simple novel greedy approach. We show that this approach matches most of the best bounds known to date and, remarkably, allows for a very direct and less involved analysis compared to the state of the art.

Problem Definition. Let U be a finite set of objects and n = |U|. An initially unknown marked object u* ∈ U has to be identified. A weight function w : U → N is given, which indicates for each object u ∈ U the likelihood that u is the marked object to be identified. The identification is done by adaptively performing tests from a given finite set T. We denote by m the cardinality of T.
This work was supported by a fellowship within the Postdoc-Programme of the German Academic Exchange Service (DAAD).
For each t ∈ T there is an associated partition G_t = {G_t^1, ..., G_t^k} of the universe U. The output of the test is the index j of the set in the partition G_t containing the object u* to be identified, i.e., the j such that u* ∈ G_t^j. In this case, we say that the objects in G_t^j satisfy (agree with) the test performed. For each test t there is a cost c_t that has to be paid in order to perform the test.

An identification strategy is typically represented by a decision tree D where each internal node ν of D maps to a test t_ν and has |G_{t_ν}| children, indexed according to the elements of the partition G_{t_ν}. The tree has exactly n leaves, which are in one-to-one correspondence with the elements of U: each leaf ℓ is associated with the unique object satisfying the tests on the path from the root of D to ℓ. Given such a decision tree, the corresponding strategy is to start by performing the test associated with the root of D; if j is the outcome of the test, then the test associated with the j-th child is performed next, and so on, until a leaf is reached, indicating the marked object for the given instance. Implicit in this definition of decision trees is that we only allow instances of the problem where each object u ∈ U has a set of tests which uniquely identifies u.

Given a decision tree D, the cost cost_D(u) of identifying an object u following the strategy defined by D is the sum of the costs of the tests associated with the nodes on the unique path from the root of D to the leaf associated with u. We consider two different measures of performance for a decision tree. The average identification cost of D is the expected cost of identifying an object chosen in accordance with the likelihood w(·) when we use the strategy associated with D, i.e., in formulae:

avgcost(D) = \sum_{u \in U} w(u) \, cost_D(u).
We remark that w need not be a probability distribution. We also consider the worst identification cost of D, defined as the maximum over all u ∈ U of the cost of identifying u using the decision tree D, i.e., in formulae:

worstcost(D) = \max_{u \in U} cost_D(u).
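The two cost measures can be stated operationally. The following sketch (the tree representation and all names are ours, chosen for illustration, not prescribed by the paper) computes cost_D(u), avgcost and worstcost for a small decision tree:

```python
from typing import Dict, Union

# A decision tree is either a leaf (an object name) or an internal node
# (test_cost, {outcome: subtree}).  This representation is illustrative.
Tree = Union[str, tuple]

def identification_costs(tree: Tree, acc: float = 0.0) -> Dict[str, float]:
    """Map each object to the total test cost paid before it is identified."""
    if isinstance(tree, str):              # leaf: a single object remains
        return {tree: acc}
    cost, children = tree
    out: Dict[str, float] = {}
    for sub in children.values():
        out.update(identification_costs(sub, acc + cost))
    return out

def avgcost(tree: Tree, w: Dict[str, float]) -> float:
    """avgcost(D) = sum over u of w(u) * cost_D(u)."""
    return sum(w[u] * c for u, c in identification_costs(tree).items())

def worstcost(tree: Tree) -> float:
    """worstcost(D) = max over u of cost_D(u)."""
    return max(identification_costs(tree).values())

# Example: a root test of cost 1 separates a from {b, c}; a second test
# of cost 2 then separates b from c.
D = (1, {1: "a", 2: (2, {1: "b", 2: "c"})})
w = {"a": 3, "b": 1, "c": 1}
```

Here object a is identified after one test of cost 1, while b and c each pay 1 + 2 = 3.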
Existing Results. The problem of finding the decision tree with minimum average cost is a direct generalization of the Binary Identification Problem [12], which coincides with the particular case where each test defines a bipartition. Since there has been a substantial amount of work on variations of the above identification problem, we will specify the type of instance we refer to by means of a three-field notation inspired by scheduling problems. More precisely, we will use the notation aCbW|cr|Obj, where the first field indicates numerical restrictions on the input, the second field indicates combinatorial restrictions, and the third field specifies the objective function. In the first field, aCbW with a, b ∈ {u, n} indicates whether the costs and weights are uniform or not (e.g., uCnW indicates a problem with uniform costs and non-uniform weights). As for the second field, the combinatorial restrictions we consider are cr ∈ {B, M, P, T}, respectively for Binary tests, Multiway (k-ary) tests, Posets, and Tree-like posets. Finally, we consider the objective functions described above, i.e., Obj ∈ {A, W}, for average-case and worst-case, respectively.

The problem of finding a decision tree with minimum worst-case cost does not admit an o(log n)-approximation algorithm unless P = NP, even when costs are uniform [24]. The same lower bound holds for the uCnW|B|A version [5]. On the algorithmic side, it is known that a simple greedy algorithm achieves an O(log n)-approximation for minimizing the worst-case cost with binary tests and uniform costs [3].

The problem of finding a decision tree with minimum average cost has received much more attention. For the uCnW|B|A version, Kosaraju et al. [21] and Dasgupta [7] independently proposed greedy algorithms with an O(log(W/w_min)) approximation factor, where W is the sum of the weights of all objects and w_min is the weight of the object with minimum weight. In addition, Kosaraju et al. showed how to modify their greedy approach to attain an O(log n)-approximation. Still in the case of binary tests, Adler and Heeringa [2] present a (ln n + 1)-approximation for the nCuW|B|A version. In [5], Chakaravarthy et al. started the study of multiway tests. They present an O(log² n)-approximation for the uCnW|M|A problem. In a later paper, Chakaravarthy et al. [6] managed to shave off a log n factor for uniform-weight instances, showing that the uCuW|M|A problem admits an O(log n)-approximation. Recently, Guillory and Bilmes [15] simultaneously extended many of the above results by showing that a natural extension of the algorithm proposed in [7] achieves an approximation ratio of 12 ln(W/w_min) for the nCnW|M|A problem. Finally, Gupta et al. [16] proved that this problem admits an O(log n)-approximation, matching the Ω(log n) lower bound up to constant factors.

Our Results.
One of the aims with which we embarked on our investigation was to better understand the potential of the greedy approach by pursuing a more direct analysis and trying to disclose the problem's crucial structure. In fact, in all the works mentioned in the long historical excursus above [2, 7, 21, 15, 6, 16], the analyses are quite lengthy and involved, requiring several pages to prove the approximation bounds. This might be a result of the two specific greedy criteria considered in those papers, namely the shrinkage-cost ratio and the minimization of the heaviest group. In the papers considering the former criterion [2, 7, 15], at each step the algorithm selects the test i which maximizes the shrinkage-cost ratio

\frac{\Delta_i(S, w)}{c_i} = \frac{1}{c_i} \left( w(S) - \sum_{j=1}^{k} \frac{w(G_i^j \cap S)^2}{w(S)} \right),

where S is the set of objects consistent with the tests performed so far. In the case when only uniform weights are considered, the weights are replaced by the cardinalities of the corresponding sets. In the papers considering the latter criterion [6, 21], greedy means selecting the test that generates a partition of S whose heaviest group is as light as possible. We remark that the approach taken in [16] is different, but somehow less transparent with respect to the actual structure of the identification problem, as it relies on much heavier machinery such as submodular optimization and variants of the TSP problem.
(Note that in [2] the authors use the term weight to refer to the cost of a test.)
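The two classical greedy criteria can be sketched side by side. In the snippet below (function names are ours; a test's effect on the live set S is given as the list of its groups G_i^j ∩ S), the first function computes the shrinkage-cost ratio and the second the quantity minimized by the heaviest-group rule of [6, 21]:

```python
from typing import Dict, List

def shrinkage_cost_ratio(groups: List[List[str]], w: Dict[str, float],
                         cost: float) -> float:
    """Delta_i(S, w)/c_i = (w(S) - sum_j w(G_i^j ∩ S)^2 / w(S)) / c_i."""
    wS = sum(w[u] for g in groups for u in g)
    return (wS - sum(sum(w[u] for u in g) ** 2 for g in groups) / wS) / cost

def heaviest_group(groups: List[List[str]], w: Dict[str, float]) -> float:
    """Weight of the heaviest part of the partition induced on S."""
    return max(sum(w[u] for u in g) for g in groups)
```

For uniform weights, replacing w by the constant function 1 recovers the cardinality-based versions mentioned in the text.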
Our first contribution is an alternative greedy strategy which, in order to select the next test to perform, simultaneously takes into account the cardinality and the weight of the sets in the partitions induced by the available tests. This novel greedy strategy allows us to achieve (or improve) the best known results for the different types of instances considered in the literature and admits a short and, in our opinion, neater analysis. More specifically, we give a short proof that our algorithm attains a 4 ln(W/w_min)-approximation for nCnW|M|A. This improves by a constant factor on the 12 ln(W/w_min) analysis of [15]. For uCnW|M|A our algorithm achieves an O(log n)-approximation. Finally, for instances with non-uniform costs and uniform weights it simultaneously achieves an O(log n)-approximation for both the average-case and worst-case cost objective functions.

In the second part of the paper we investigate the potential of greedy approaches for restricted classes of the average-case multiway identification problem that have been studied in the literature. We consider the class where the set U is a poset and for each element u ∈ U there exists a binary test which reports whether or not the marked object is smaller than or equal to u [25]. It turns out that there is little hope for better approximations for this class, since we show an Ω(log n) hardness of approximation. Thus, we consider the class of instances where the poset is a rooted tree, the tests have uniform costs and the objects may have non-uniform weights. This corresponds to the NP-complete problem of searching in trees [18, 23]. For this class, we show that the greedy strategy which always selects the test inducing the most balanced partition attains a 2-approximation, improving over the 14-approximation given in [23].
2 A novel greedy procedure
Recall the problem definition and the notation given in the introduction. We say that test t splits the objects into groups G_t^1, ..., G_t^k. Let n_t^j = |G_t^j| and let W_t^j be the sum of the weights of the objects in G_t^j. Let j_t^* = argmax_j {n_t^j W_t^j}. In order to simplify the notation, we use the shorthands n_t = n_t^{j_t^*} and W_t = W_t^{j_t^*}.

Now we describe the greedy approach we propose, dubbed Greedy. Let I = (U, T, w) be an instance of an identification problem, where |U| = n and W = \sum_{u \in U} w(u). Let t ∈ T be the test which minimizes c_t/(nW − n_t W_t). Then the root of the decision tree of Greedy is associated with the test t, and its children are the decision trees obtained recursively by applying Greedy to the instances I_1, ..., I_k, where I_i = (G_t^i, T, w). The intuition behind Greedy's criterion is that it penalizes a test if it has a high cost or if it induces a partition of U containing a set that is either large or heavy.

We will use the following subadditivity property of the cost of an optimal decision tree for the identification problem. This property was observed in [6] for the average-case model; similar arguments show that it also holds for the worst-case model.

Proposition 1 ([6]). Let U be the set of objects of an instance I of the identification problem and let D* and E* be optimal decision trees with respect to the average-case model and the worst-case model, respectively. Let U_1, ..., U_k be disjoint subsets of U and, for i = 1, ..., k, let D_i^* (E_i^*) be an optimal decision tree for the sub-instance of I defined on the set of objects U_i for the average-case (worst-case) model. Then

\sum_{i=1}^{k} avgcost(D_i^*) \le avgcost(D^*)

and

\max_{i=1}^{k} worstcost(E_i^*) \le worstcost(E^*).
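The recursive rule defining Greedy can be sketched directly. In the snippet below, a test is modeled as a pair (cost, outcome), where outcome maps an object to the index of its group; this interface, like all names here, is our modeling assumption and not the paper's notation:

```python
from typing import Callable, Dict, List, Tuple

Test = Tuple[float, Callable[[str], int]]

def greedy_tree(objects: List[str], tests: Dict[str, Test],
                w: Dict[str, float]):
    """Return a leaf (object) or a node (test_name, cost, {outcome: subtree})."""
    if len(objects) == 1:
        return objects[0]
    n, W = len(objects), sum(w[u] for u in objects)
    best = None
    for name, (cost, outcome) in tests.items():
        groups: Dict[int, List[str]] = {}
        for u in objects:
            groups.setdefault(outcome(u), []).append(u)
        if len(groups) < 2:
            continue              # test is uninformative on this sub-instance
        # n_t * W_t: the group maximizing cardinality times weight
        nt_Wt = max(len(g) * sum(w[u] for u in g) for g in groups.values())
        ratio = cost / (n * W - nt_Wt)
        if best is None or ratio < best[0]:
            best = (ratio, name, cost, groups)
    _, name, cost, groups = best
    return (name, cost,
            {j: greedy_tree(g, tests, w) for j, g in groups.items()})
```

For instance, with uniform weights a ternary test that fully splits three objects has n_t W_t = 1 and is preferred over an equally priced binary test, matching the intuition that large or heavy parts are penalized.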
2.1 Multiway tests, non-uniform weights and non-uniform costs
We start our analysis of the approximation provided by Greedy by considering nCnW|M|A instances. We are going to show that in this case Greedy provides a solution whose cost is at most 2 ln(nW/w_min) times the cost of an optimal decision tree, where w_min is the minimum weight assigned to an object. Without loss of generality, in the following analysis we assume w_min = 1.

Let cost(I) denote the cost of the decision tree produced by Greedy on instance I and let OPT(I) be the cost of the optimal decision tree for the same instance. Let τ denote the first test performed by Greedy. Also, for each ℓ = 1, ..., k, let I_ℓ be the instance associated with the set of objects G_τ^ℓ. Then, the cost incurred by Greedy is given by cost(I) = c_τ W + \sum_ℓ cost(I_ℓ). Moreover, using Proposition 1, we have OPT(I) ≥ \sum_{ℓ=1}^{k} OPT(I_ℓ). Given any lower bound LB on OPT(I), we can bound the approximation ratio attained by Greedy as follows:

\frac{cost(I)}{OPT(I)} \le \frac{c_τ W + \sum_ℓ cost(I_ℓ)}{\max\{LB, \sum_ℓ OPT(I_ℓ)\}} \le \frac{c_τ W}{LB} + \max_ℓ \frac{cost(I_ℓ)}{OPT(I_ℓ)}   (1)

We now focus on devising a suitable lower bound LB on the cost of an optimal decision tree D* for the instance I. For each test t ∈ T we define α_t as the sum of the weights of the objects associated with leaves that are descendants of some node associated with test t in D*. Clearly we have OPT(I) = \sum_t c_t α_t. The greedy rule implies that c_τ/(nW − n_τ W_τ) ≤ c_t/(nW − n_t W_t) for each t ∈ T. It follows that

OPT(I) \ge \frac{c_τ}{nW − n_τ W_τ} \sum_t (nW − n_t W_t) α_t   (2)

We now note that we can interpret \sum_t (nW − n_t W_t) α_t as the cost of a decision tree for a modified version of instance I where the cost of test t is changed from c_t to nW − n_t W_t. We can then use the following result.

Claim. Let Ĩ be an instance obtained from I by changing the costs of the tests so that for each t it holds that c_t ≥ nW − n_t W_t. Then OPT(Ĩ) ≥ nW²/2.
Note that the leaves contributing to αt do not necessarily induce a subtree of D∗ , as t can appear in more than one node of D∗ .
Proof of Claim. We use induction on n. If n = 2, then any test that splits the two objects has cost at least 2W − W′, where W′ is the weight of the heaviest object. Thus, the cost of the optimal tree is at least W(2W − W′) ≥ W², which establishes the base case.

Suppose now that the instance Ĩ comprises n > 2 objects. Let t̃ be the test at the root of an optimal tree for Ĩ and notice that n_t̃ < n. If n_t̃ = 1 then OPT(Ĩ) ≥ W(nW − W_t̃) ≥ nW²/2. If n_t̃ > 1 then OPT(Ĩ) ≥ W(nW − n_t̃ W_t̃) + n_t̃ W_t̃²/2 because: (i) W(nW − n_t̃ W_t̃) is a lower bound on the contribution of the test t̃ to OPT(Ĩ), and (ii) by induction, n_t̃ W_t̃²/2 is a lower bound for the instance associated with the objects of the group induced by test t̃, which has n_t̃ objects with total weight W_t̃. Since W²/2 ≥ W_t̃(W − W_t̃/2), collecting the terms with n_t̃ gives W(nW − n_t̃ W_t̃) + n_t̃ W_t̃²/2 ≥ nW²/2, which establishes the result. □

From the above claim and equation (2) we get that

OPT(I) \ge \frac{c_τ \, nW²}{2(nW − n_τ W_τ)}.
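The Claim above can be spot-checked numerically on a toy instance (the objects, weights and tests below are made up for illustration): with the modified costs c_t = nW − n_t W_t, the exact optimum, computed by exhaustive recursion over object subsets, should be at least nW²/2.

```python
import math
from functools import lru_cache

W = {"a": 1, "b": 2, "c": 3, "d": 4}       # object weights (made up)
OUTCOMES = {                                # test name -> group label per object
    "t1": {"a": 0, "b": 0, "c": 1, "d": 1},
    "t2": {"a": 0, "b": 1, "c": 1, "d": 1},
    "t3": {"a": 0, "b": 1, "c": 2, "d": 2},
    "t4": {"a": 0, "b": 0, "c": 0, "d": 1},
    "t5": {"a": 0, "b": 1, "c": 0, "d": 1},
}
N, TOTAL = len(W), sum(W.values())

def groups_of(S, outcome):
    parts = {}
    for u in S:
        parts.setdefault(outcome[u], []).append(u)
    return [tuple(sorted(g)) for g in parts.values()]

def modified_cost(outcome):
    """c_t = nW - n_t*W_t, with n_t*W_t maximized over the groups of t."""
    groups = groups_of(W, outcome)
    return N * TOTAL - max(len(g) * sum(W[u] for u in g) for g in groups)

COST = {t: modified_cost(o) for t, o in OUTCOMES.items()}

@lru_cache(maxsize=None)
def opt(S):
    """Exact minimum average identification cost on the object set S."""
    if len(S) == 1:
        return 0
    wS = sum(W[u] for u in S)
    best = math.inf
    for t, outcome in OUTCOMES.items():
        groups = groups_of(S, outcome)
        if len(groups) > 1:
            best = min(best, COST[t] * wS + sum(opt(g) for g in groups))
    return best
```

This is only a sanity check on one instance, not a proof; the proof is the induction above.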
Then replacing LB by this lower bound in equation (1) gives

\frac{cost(I)}{OPT(I)} \le \frac{2(nW − n_τ W_τ)}{nW} + \max_ℓ \frac{cost(I_ℓ)}{OPT(I_ℓ)}.

Thus, assuming by induction on the number of objects that for each instance I_ℓ the approximation ratio attained by Greedy is at most 2 ln(n_τ^ℓ W_τ^ℓ), which by the definition of n_τ and W_τ is at most 2 ln(n_τ W_τ), we have

\frac{cost(I)}{OPT(I)} \le 2\left(1 − \frac{n_τ W_τ}{nW} + \ln(n_τ W_τ)\right) \le 2 \ln(nW) \le 4 \ln W,   (3)

where the second inequality uses the fact that ln x ≤ x − 1 for all x > 0 and the last inequality uses the assumption w_min = 1 (so that n ≤ W). In general, dropping the assumption w_min = 1, the above analysis yields the approximation

\frac{cost(I)}{OPT(I)} \le 2 \ln\left(\frac{nW}{w_{min}}\right).   (4)

2.2 Uniform costs and non-uniform weights
For uCnW|M|A instances, we can strengthen the above result and show that Greedy attains a (1 + 4 ln n)-approximation. Note that, in the particular case when the costs are uniform, Greedy selects a test t with smallest n_t W_t.

Let D be the tree constructed by Greedy; for each node v ∈ D let D_v denote the subtree of D rooted at v and w(D_v) denote the sum of the weights of the leaves in D_v. Define the set V = {v : w(D_v) ≤ W/n}, that is, the set of nodes in Greedy's tree such that the weight of the subtree below the node is at most W/n. It follows that these nodes contribute at most W to the cost of D. Then the approximation ratio can be bounded by

\frac{cost(D)}{OPT} = \frac{\sum_{v \in D − V} w(D_v)}{OPT} + \frac{\sum_{v \in V} w(D_v)}{OPT} \le \frac{\sum_{v \in D − V} w(D_v)}{OPT} + 1,   (5)

where the inequality follows from the fact that OPT ≥ W.

We can now estimate the ratio \sum_{v \in D − V} w(D_v)/OPT in terms of an instance where each object has weight at least W/n: we coalesce objects that appear in Greedy's decision tree in subtrees whose leaves have total weight at most W/n. Thus, by using the analysis of the previous section, from (4) we have

\frac{\sum_{v \in D − V} w(D_v)}{OPT} \le 2 \ln\left(\frac{nW}{W/n}\right) = 4 \ln n,

which, together with (5), gives the desired result.

2.3 Greedy is bi-criteria for non-uniform costs and uniform weights
We now show that for instances with uniform weights Greedy attains an O(log n)-approximation simultaneously for both the average-case and worst-case objectives. The first part of the claim follows directly from the general result of Section 2.1, hence we now prove the approximation for the worst-case objective.

Consider an instance I with arbitrary costs and uniform weights and let D be the decision tree produced by Greedy on I. Let t be the first test performed by Greedy and D_ℓ be the decision tree produced by Greedy on the instance I_ℓ induced by G_t^ℓ. From the recursive nature of the algorithm we have

worstcost(D) = c_t + \max_ℓ \{worstcost(D_ℓ)\}.   (6)

Now, let D* denote an optimal decision tree for the instance I and D_ℓ* be an optimal decision tree for the instance I_ℓ. Using Proposition 1 we can again bound worstcost(D*) as

worstcost(D*) \ge \max\{LB, \max_ℓ\{worstcost(D_ℓ*)\}\},   (7)

where LB is any lower bound on worstcost(D*).

We now turn to the issue of defining a suitable lower bound LB on the cost of D*. In the following we identify nodes of D* with the tests they map to. We choose a root-to-leaf path v_1, v_2, ..., v_p in D* as follows. First, v_1 is the root of D*. Then, assuming we have already chosen v_1, ..., v_i, we choose v_{i+1} as the child of v_i that is used to split the objects lying in the largest set of the partition of G_{v_i}. If the largest set has only one object, it corresponds to a leaf ℓ, and we set v_{i+1} = ℓ. The process stops when we reach the leaf v_p. We set LB = \sum_{i=1}^{p−1} c_{v_i}. Our goal is now to bound this quantity.

Let u_i be the number of objects that are in the subtree of D* rooted at v_i but not in the subtree rooted at v_{i+1}. By the definition of the v_i's, we have that u_i + n_{v_i} ≤ n holds for i = 1, ..., p − 1. Then, the greedy criterion allows us to write

\frac{c_t}{n² − n_t²} \le \frac{c_{v_i}}{n² − (n − u_i)²} = \frac{c_{v_i}}{2nu_i − u_i²}   for i = 1, ..., p − 1.

Finally, adding up the costs of the nodes v_1, ..., v_{p−1} and using the fact that \sum_{j=1}^{p−1} u_j = n − 1, we get

LB \ge \frac{c_t}{n² − n_t²} \sum_{i=1}^{p−1} (2nu_i − u_i²) \ge \frac{c_t}{n² − n_t²} \left( 2n(n−1) − \sum_{i=1}^{p−1} u_i² \right) \ge \frac{c_t(n² − 1)}{n² − n_t²}.

Now we can bound the approximation ratio of Greedy as the ratio of (6) and (7), which gives

\frac{worstcost(D)}{worstcost(D*)} \le \frac{n² − n_t²}{n² − 1} + \max_ℓ \frac{worstcost(D_ℓ)}{worstcost(D_ℓ*)}.

Assuming, by induction on the size of the instances, that on I_ℓ Greedy provides a solution at most ln((n_t^ℓ)² − 1) away from the optimal one, we have that the approximation ratio becomes

\frac{n² − n_t²}{n² − 1} + \ln(n_t² − 1) = 1 − \frac{n_t² − 1}{n² − 1} + \ln(n_t² − 1) \le \ln(n² − 1).
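The final inequality is the substitution x = (n_t² − 1)/(n² − 1) in ln x ≤ x − 1. A quick numeric spot-check over small parameter values (purely illustrative, not part of the proof):

```python
import math

# For 2 <= n_t < n, verify
#   1 - (n_t^2 - 1)/(n^2 - 1) + ln(n_t^2 - 1) <= ln(n^2 - 1),
# i.e. ln x <= x - 1 with x = (n_t^2 - 1)/(n^2 - 1).
def induction_step_holds(n: int, nt: int) -> bool:
    lhs = 1 - (nt**2 - 1) / (n**2 - 1) + math.log(nt**2 - 1)
    return lhs <= math.log(n**2 - 1) + 1e-12   # tolerance for float rounding
```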
3 Partially Ordered Sets
In this section we consider the binary identification problem where the elements of U form a partially ordered set and there is one test t(u) for each u ∈ U which indicates whether u* ≤ u or not. That is, G_{t(u)} = {G_{t(u)}^1, G_{t(u)}^2} with G_{t(u)}^1 = {u′ ∈ U : u′ ≤ u} and G_{t(u)}^2 = U \ G_{t(u)}^1. We represent the poset by its Hasse diagram, that is, the digraph F with node set U that contains the arc (u_i, u_j) whenever u_j covers u_i (i.e., u_i < u_j and there is no u_ℓ in U such that u_i < u_ℓ < u_j). Given a decision tree for this problem, it is useful to order its nodes such that, for a node associated with the test t, its right child corresponds to the part G_t^1 and its left child corresponds to the part G_t^2.

We show next that, for every ε > 0, the problem uCnW|P|A cannot be approximated in polynomial time within a ratio of (0.25 − ε) log n unless NP ⊆ TIME(n^{O(log log n)}). This is accomplished via a reduction from Set Cover, which is hard to approximate within a factor of less than log n under this complexity assumption [11]. We remark that this reduction is similar to the one used in [5].

Consider a Set Cover instance (X, S), where X = {x_1, ..., x_n} is the nonempty ground set and S ⊆ 2^X with \bigcup_{S ∈ S} S = X is the family of covering sets. Moreover, we assume that |S| = O(n²); this is without loss of generality since such instances are still hard to approximate [11]. Given a Set Cover instance (X, S), we construct the following instance (F, w) of our identification problem. The digraph F has a node v_i for each x_i ∈ X, a node s_i for each set S_i ∈ S, and one extra node r. The arcs of this graph are (v_i, r) for each v_i and the arcs (v_i, s_j) for x_i ∈ S_j. The weight function w assigns weight 1 to r and weight 0 to every other node.

Now we relate the solutions of these instances. Consider a cover C = {S_{i_1}, S_{i_2}, ..., S_{i_k}} for the Set Cover instance (X, S). We construct a decision tree D for (F, w) with cost |C| + 1 as follows. First, make the leftmost path of D contain the tests t(s_{i_1}), t(s_{i_2}), ..., t(s_{i_k}), t(r) in this order; then complete D to form a valid decision tree. To analyze the cost of D, notice that the fact that {S_{i_1}, S_{i_2}, ..., S_{i_k}} is a cover implies that the node of D associated with t(r) has no descendant leaf associated with a v_i. Then the right child of the node associated with t(r) is a leaf associated with r, and hence cost(D) = |C| + 1.

Now let D be a solution for the instance (F, w). We claim that D gives a cover C for the Set Cover instance with |C| ≤ cost(D). For this, let P be the path of D from its root to its leaf associated with r. The crucial property is that for each element x_i ∈ X there is a node in P which either: (i) corresponds to the test t(v_i), or (ii) corresponds to a test t(s_j) such that (v_i, s_j) ∈ F. To construct the cover C, we consider each x_i ∈ X; if it falls in case (i) we add to C any set in S which contains x_i, and if it falls in case (ii) we add to C the set S_j. By construction, C is a cover which satisfies |C| ≤ cost(D).

To conclude our reduction, let OPT_SC and OPT_ID be, respectively, the optimal values for the Set Cover and the identification problem instances. Let n(F) = n + O(n²) + 1 denote the number of nodes in F. Using the previous properties, we get that an α log(n(F))-approximate solution for the identification problem gives a solution for Set Cover of size at most

α log(n(F)) · OPT_ID ≤ α log(n(F)) (OPT_SC + 1) ≤ 2α(2 log n + O(1)) OPT_SC,

where in the last inequality we used the fact that OPT_SC ≥ 1. If α ≤ 0.25 − ε for some ε > 0, then for large enough n the right-hand side is at most (1 − ε)(log n) OPT_SC, hence we obtain an approximation for Set Cover with a factor better than log n. Given the aforementioned hardness of Set Cover, there cannot exist a (0.25 − ε) log(n(F))-approximation with ε > 0 for the identification problem uCnW|P|A unless NP ⊆ TIME(n^{O(log log n)}).
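The construction of (F, w) from (X, S) is mechanical and can be sketched as follows (node names v{i}, s{j} and "r" mirror the text; the list-based encoding is our choice):

```python
from typing import Dict, List, Set, Tuple

def build_identification_instance(
    X: List[str], S: List[Set[str]]
) -> Tuple[List[Tuple[str, str]], Dict[str, int]]:
    """Return the arcs of the Hasse diagram F and the weight function w."""
    arcs = [(f"v{i}", "r") for i in range(len(X))]        # arcs (v_i, r)
    for j, Sj in enumerate(S):                            # arcs (v_i, s_j)
        arcs += [(f"v{i}", f"s{j}") for i, x in enumerate(X) if x in Sj]
    weights = {"r": 1}                                    # only r has weight 1
    weights.update({f"v{i}": 0 for i in range(len(X))})
    weights.update({f"s{j}": 0 for j in range(len(S))})
    return arcs, weights

X = ["x1", "x2", "x3"]
S = [{"x1", "x2"}, {"x2", "x3"}, {"x3"}]
arcs, weights = build_identification_instance(X, S)
```

Here the cover {S_0, S_1} of size 2 corresponds to a decision tree of cost 3, matching the |C| + 1 relation in the text.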
4 Tree-like posets
Having shown that finding a good search strategy for partial orders is essentially as hard as the general identification problem, we now turn our attention towards a special class of partial orders. Namely, we consider posets whose Hasse diagram F is a tree with arcs directed towards the root r, i.e., r is the unique maximum element. The test asking whether or not u* ≤ r will never be applied by a reasonable search strategy and is therefore assumed not to exist in the following. Any other test "is u* ≤ u?" splits the tree into the two connected components F_1, F_2 induced by the removal of the unique outgoing arc of u. We can therefore associate tests from T with edges and objects u ∈ U with nodes of F. For simplicity we do not distinguish between tree nodes (edges) and the elements of U (of T) associated with them.

After test t has revealed that the searched element u* is in F_1, no reasonable search strategy will perform a test corresponding to an edge in F_2, and vice versa. Therefore, in any reasonable search tree, each edge t ∈ T appears at most once. It also always holds that |T| = |U| − 1. As any search tree D for F has exactly |U| − 1 internal nodes, it follows that each test t ∈ T appears exactly once in D.

We consider here the case uCnW|T|A of non-uniform weights and uniform costs. This problem version is known to be NP-hard [18] and admits a linear-time algorithm achieving an approximation ratio of 14 [23]. In this section we give a simple analysis showing that a natural greedy algorithm attains a 2-approximation. We assume here that there are no elements of weight zero; in the full paper we will show that this assumption can be removed by using a more careful tie-breaking strategy.

Our greedy algorithm always selects a test edge t such that the two subtrees F_1, F_2 obtained by the removal of t are as even as possible in terms of weight, i.e., the algorithm always maximizes min{w(F_1), w(F_2)}, breaking ties arbitrarily. In order to prove that this algorithm yields a 2-approximation, we describe a procedure for turning any search tree D*, including the optimal one, into the greedy search tree D computed by our algorithm, and we show that during the transformation the cost increases by no more than cost(D)/2.

Let t be the test associated with the root of the greedy search tree D, and let F_1, F_2 be the two subtrees obtained after removing t from F. Furthermore, let D_1* be the search tree for F_1 which is obtained from D* by simply skipping all the tests associated with edges not in F_1, and let D_2* be defined analogously. The transformation from D* to D proceeds as follows: (a) construct a search tree D′ for F with test t at the root, and with the left and right subtrees under the root being equal to D_1* and D_2*, respectively; (b) recursively turn D_1* and D_2* into greedy search trees D_1 and D_2 for F_1 and F_2, respectively.

Lemma 1. Step (a) increases the cost by at most W/2.

Using this lemma, we can show by simple induction on the number of nodes in F that the transformation increases the cost by at most cost(D)/2. The base case, |T| = 1, is trivial.
From the induction hypothesis we know that step (b) of the transformation increases the cost by at most cost(D_1)/2 + cost(D_2)/2, so the claim follows from the fact that cost(D) = W + cost(D_1) + cost(D_2).

Proof (of Lemma 1). Assume w.l.o.g. that the root t_1 of D* is associated with an edge in the subtree F_1. Consider in D* the path t_1, t_2, ... that leads from t_1 to test t. Let t_k be the first test on that path which is not located in F_1, so either t_k = t, or t_k is associated with an edge in F_2. Furthermore, for i = 1, ..., k − 1, among the two subtrees obtained by removing t_i from F, let G_i be the one not containing t.

As t_1, ..., t_{k−1} are in F_1 and thus are skipped in D_2*, the search path to any node of F_2 is shorter by at least k − 1 in D_2* than it is in D*. Search tree D_1* is obtained from D* by skipping certain tests, so the search path in D_1* to any node of F_1 is at most as long as in D*. For the set of nodes in the subtree G_0 := F_1 − (G_1 ∪ ... ∪ G_{k−1}) we make a stronger statement: as in D* the search path to those nodes contains the test t_k ∉ F_1, the search path to them in D_1* is shorter by at least one. Summarizing the findings of this paragraph, the difference d := cost(D*) − (cost(D_1*) + cost(D_2*)) satisfies

d \ge (k − 1)\, w(F_2) + w\!\left(F_1 − \bigcup_{i=1}^{k−1} G_i\right) \ge (k − 1)\, w(F_2) + w(F_1) − \sum_{i=1}^{k−1} w(G_i).

The greedy criterion ensures that for i = 1, ..., k − 1 the total weight w(G_i) of all nodes in G_i is at most min{w(F_2), W/2}. Indeed, w(G_i) ≤ W/2 holds because otherwise, as t is in F − G_i, greedy would rather choose t_i than t (note that here the assumption of non-zero weights is essential). But this means that w(G_i) ≤ w(F) − w(G_i), and therefore w(G_i) > w(F_2) would again mean that t_i is a better greedy choice than t. Charging G_2, ..., G_{k−1} against (k − 2) times w(F_2), we obtain

d \ge w(F_2) + w(F_1) − w(G_1) = w(F − G_1) \ge W/2.

Now the lemma is established by

cost(D′) = W + cost(D_1*) + cost(D_2*) = W + cost(D*) − d \le cost(D*) + W/2. □
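The greedy rule itself is easy to state in code. The sketch below (names and the arc-list encoding are ours) selects, among the arcs (child, parent) of the tree F, the one whose removal maximizes min{w(F_1), w(F_2)}:

```python
from typing import Dict, List, Tuple

def best_greedy_edge(edges: List[Tuple[str, str]],
                     w: Dict[str, float]) -> Tuple[str, str]:
    """Return the arc of F whose removal yields the most balanced split."""
    nodes = {u for e in edges for u in e}
    total = sum(w[u] for u in nodes)

    children: Dict[str, List[str]] = {}
    for c, p in edges:
        children.setdefault(p, []).append(c)

    def subtree_weight(v: str) -> float:
        return w[v] + sum(subtree_weight(c) for c in children.get(v, []))

    # removing arc (c, p) separates the subtree rooted at c from the rest
    return max(edges,
               key=lambda e: min(subtree_weight(e[0]),
                                 total - subtree_weight(e[0])))
```

On a path r ← a ← b with w(r) = w(a) = 1 and w(b) = 4, removing (b, a) gives the split (4, 2) with min 2, beating the split (5, 1) of the other arc.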
5 Conclusion
We presented a new greedy approach which can be employed for different variants of the multiway identification problem (a.k.a. the active learning problem). We demonstrated that this novel algorithm can be easily analyzed in several problem settings considered in the literature. We believe that our greedy approach deserves further investigation in order to fully understand its potential and, possibly, its applicability in other search contexts where costs of the tests and weights are simultaneously considered. With respect to our analysis, the following questions are worth considering: (1) can the analysis in Section 2.1 be improved, or can its tightness be shown? (2) An interesting generalization of the result in Section 2.3 would be to prove that the cost of the costliest path is at most log(nW) times the cost of the costliest path in the tree that minimizes the worst case.

We also investigated the restriction of the decision tree minimization problem to poset instances, showing that for the average-case objective general poset instances are as hard to approximate as unrestricted instances. For the case of tree-like posets, however, we proved that a greedy strategy attains a 2-approximation. In this direction, a further open problem concerns the limits of approximability of tree-like instances, e.g., the existence of a PTAS vs. APX-hardness.
References

1. J. Abrahams. Code and parse trees for lossless source encoding. In Compression and Complexity of Sequences, pages 145–171, 1997.
2. M. Adler and B. Heeringa. Approximating optimal binary decision trees. In APPROX-RANDOM, pages 1–9, 2008.
3. E. Arkin, H. Meijer, J. Mitchell, D. Rappaport, and S. Skiena. Decision trees for geometric models. International Journal of Computational Geometry and Applications, 8(3):343–364, 1998.
4. R. Carmo, J. Donadelli, Y. Kohayakawa, and E. Laber. Searching in random partially ordered sets. Theoretical Computer Science, 321(1):41–57, 2004.
5. V. Chakaravarthy, V. Pandit, S. Roy, P. Awasthi, and M. Mohania. Decision trees for entity identification: Approximation algorithms and hardness results. In PODS, pages 53–62, 2007.
6. V. Chakaravarthy, V. Pandit, S. Roy, and P. Sabharwal. Approximating decision trees with multiway branches. In ICALP, LNCS 5555, pages 210–221, 2009.
7. S. Dasgupta. Analysis of a greedy active learning strategy. In NIPS, 2007.
8. C. Daskalakis, R. Karp, E. Mossel, S. Riesenfeld, and E. Verbin. Sorting and selection in posets. In SODA, pages 392–401, 2009.
9. D. Dereniowski and M. Kubale. Efficient parallel query processing by graph ranking. Fundamenta Informaticae, 69, 2008.
10. D. Dereniowski. Edge ranking and searching in partial orders. Discrete Applied Mathematics, 156(13):2493–2500, 2008.
11. U. Feige. A threshold of ln n for approximating set cover. Journal of the ACM, 45(4):634–652, 1998.
12. M. Garey. Optimal binary identification procedures. SIAM Journal on Applied Mathematics, 23(2):173–186, 1972.
13. M. Garey and R. Graham. Performance bounds on the splitting algorithm for binary testing. Acta Informatica, 3:347–355, 1974.
14. M. Golin, C. Kenyon, and N. Young. Huffman coding with unequal letter costs. In STOC, pages 785–791, 2002.
15. A. Guillory and J. Bilmes. Average-case active learning with costs. In The 20th International Conference on Algorithmic Learning Theory, 2009.
16. A. Gupta, R. Krishnaswamy, V. Nagarajan, and R. Ravi. Approximation algorithms for optimal decision trees and adaptive TSP problems. In ICALP, 2010.
17. L. Hyafil and R. Rivest. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5:15–17, 1976.
18. T. Jacobs, F. Cicalese, E. Laber, and M. Molinaro. On the complexity of searching in trees: Average-case minimization. In ICALP, 2010.
19. W. Knight. Search in an ordered array having variable probe cost. SIAM Journal on Computing, 17(6):1203–1214, 1988.
20. D. Knuth. Optimum binary search trees. Acta Informatica, 1:14–25, 1971.
21. R. Kosaraju, T. Przytycka, and R. Borgstrom. On an optimal split tree problem. In WADS, pages 157–168, 1999.
22. E. Laber, R. Milidiú, and A. Pessoa. On binary searching with non-uniform costs. In SODA, pages 855–864, 2001.
23. E. Laber and M. Molinaro. An approximation algorithm for binary searching in trees. Algorithmica, DOI: 10.1007/s00453-009-9325-0.
24. E. Laber and L. Nogueira. On the hardness of the minimum height decision tree problem. Discrete Applied Mathematics, 144(1-2):209–212, 2004.
25. M. Lipman and J. Abrahams. Minimum average cost testing for partially ordered components. IEEE Transactions on Information Theory, 41:287–291, 1995.
26. S. Mozes, K. Onak, and O. Weimann. Finding an optimal tree searching strategy in linear time. In SODA, pages 1096–1105, 2008.