
Indexability of 2D Range Search Revisited: Constant Redundancy and Weak Indivisibility ∗

Yufei Tao

ABSTRACT

In the 2D orthogonal range search problem, we want to preprocess a set of 2D points so that, given any axis-parallel query rectangle, we can report all the data points in the rectangle efficiently. This paper presents a lower bound on the query time that can be achieved by any external memory structure that stores a point at most r times, where r is a constant integer. Previous research has resolved the bound at two extremes: r = 1, and r being arbitrarily large. We, on the other hand, derive the explicit tradeoff at every specific r. A premise that lingers in existing studies is the so-called indivisibility assumption: all the information bits of a point are treated as an atom, i.e., they are always stored together in the same block. We partially remove this assumption by allowing a data structure to freely divide a point into individual bits stored in different blocks. The only assumption is that those bits must be retrieved for reporting, as opposed to being computed; we refer to this requirement as the weak indivisibility assumption. We also describe structures to show that our lower bound is tight up to only a small factor.

Keywords

Indexability, range search, lower bound, indivisibility

Categories and Subject Descriptors

F.2.3 [Analysis of Algorithms and Problem Complexity]: Tradeoffs among Complexity Measures; H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing—indexing methods

General Terms

Theory

∗ Affiliation 1: Department of Computer Science and Engineering, Chinese University of Hong Kong, Sha Tin, Hong Kong ([email protected]); Affiliation 2: Division of Web Science and Technology, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea ([email protected]).

1. INTRODUCTION

This paper revisits orthogonal range search (henceforth, range search) in 2D space, addressing an aspect termed r-redundant indexing. Let us start by defining this notion, and explaining the reasons motivating its study.

1.1 r-redundant indexing

Let P be a set of N points in the data space R^2, where R represents the domain of real values. We will refer to each point p ∈ P interchangeably as an object. Object p is associated with an information field, denoted as info(p). Given an axis-parallel rectangle q, a range query returns the information fields of all the points in P that are covered by q, namely: {info(p) | p ∈ P ∩ q}. See Figure 1. The objective is to index P with a structure so that range queries can be answered efficiently in the worst case.
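As a point of reference only (this illustrates the problem statement, not any structure from this paper), here is a brute-force Python sketch of the query semantics; the names Point and range_query are our own:

    from typing import NamedTuple, Any, List

    class Point(NamedTuple):
        x: float
        y: float
        info: Any  # the information field info(p)

    def range_query(P: List[Point], x1: float, x2: float,
                    y1: float, y2: float) -> List[Any]:
        # Report {info(p) | p in P, p covered by q = [x1,x2] x [y1,y2]}.
        return [p.info for p in P
                if x1 <= p.x <= x2 and y1 <= p.y <= y2]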




Figure 1: A range query. The information fields of the points inside the rectangle are reported.

The computation model we assume is the standard external memory (EM) model [1] that prevails in research of I/O-oriented algorithms. In this model, a computer has M words of memory, and a disk of unbounded capacity that has been formatted into blocks of size B words. The value of M is at least 2B, i.e., the memory can accommodate at least two blocks. An I/O operation reads a block of data from the disk into memory, or conversely, writes a block of data from the memory to the disk. The time of an algorithm is measured as the number of I/Os performed, whereas the space of a structure is measured as the number of disk blocks occupied.

We consider that an information field is L = o(B) words in length (as will be explained shortly, L = Ω(B) is not interesting). The dataset therefore needs at least NL/B blocks to store. Hence, a linear complexity refers to O(NL/B). Likewise, if a query retrieves K objects, Ω(1 + KL/B) I/Os are compulsory to read their information fields.

We now clarify the definition of an r-redundant structure.

Definition 1. A structure on P is r-redundant if it stores the information field of each object in P at most r times.

Apparently, every object must have its information field stored at least once in order to answer all range queries correctly. A 1-redundant index is known as a non-replicating structure [12] because no information field can be duplicated. Such structures have played an important role in the database field, where most well-known indexes are non-replicating: the kd-tree [5, 15], the R-tree [4, 8], the O-tree [12], to mention just a few. The main reason (see [12]) is that the dataset cardinality N in practice can be so huge that each information field should be stored only once in order to meet a tight space budget.

Today, while linear-size structures are still mandatory to cope with the vast data scale in many applications, constraining each information field to be stored exactly once appears excessively stringent. After all, the dollar-per-byte price of hard disks has been dropping continuously, making it realistic to replicate an information field a small number of times. Motivated by this, we are interested in r-redundant structures where r is a constant.

But does duplication of information fields bring any benefit to query efficiency? The answer is yes, and as a matter of fact, very much so. Previous findings have revealed that query time can be reduced by polynomial factors, when space is increased only constant times. To be specific, when L = O(1), any 1-redundant structure must entail Ω(√(N/B) + K/B) I/Os in answering a query [9, 12] (under certain assumptions, as will be detailed in the next subsection). However, when constant r is sufficiently large, there is a linear-size r-redundant structure that guarantees query cost O((N/B)^ε + K/B) for any, arbitrarily small, constant ε > 0 [18] (again, for L = O(1)).

What the previous research has not shown is precisely how much benefit can be gained by investing additional space. This leaves open several intriguing questions. For example, if we can store every object twice (i.e., r = 2), what is the best query time possible? Conversely, if we aim at achieving query cost, say, O((N/B)^{1/10} + K/B), how many times must each object be stored? At a higher level, if one looks at the query complexity as a function of r, the previous research has resolved the function at two extremes: r = 1, and r being arbitrarily big, respectively. In other words, the tradeoff between redundancy and query time within the intermediate range (i.e., from r = 2 onwards) still remains elusive.
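For orientation, the tradeoff pinned down in this paper (Theorem 2 in Section 1.4, matched by a lower bound up to a small factor) interpolates between the two extremes; writing Q(r) for the non-output term of the query cost:

    Q(r) = O\left( (NL/B)^{1/(r+1)} \right): \quad Q(1) = O\left(\sqrt{NL/B}\right), \quad Q(2) = O\left((NL/B)^{1/3}\right), \quad Q(3) = O\left((NL/B)^{1/4}\right), \;\ldots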

This work aims to understand the entire tradeoff at every integer r. On the lower-bound side, we will prove the best query time achievable by any r-redundant structure occupying linear space (for each specific r). On the upper-bound side, we will present r-redundant structures whose query time matches our lower bounds, up to only a small factor.

Why information field anyway? No previous lower-bound study of range search explicitly mentioned any information field. Storing an object was implicitly understood as storing its coordinates. However, as each coordinate takes up only constant words, demanding each object to be stored constant times appears somewhat unjustified when a hidden constant is permitted in the linear space complexity. This is most obvious for non-replicating structures: what point does it make to force r = 1 while allowing the structure to use up to cN/B space for some constant c potentially far greater than 1? One can harvest a polynomial improvement in query time by setting r = 2 and, in the meantime, perhaps with some clever tricks, still maintain the space to be at most cN/B.

The introduction of information fields makes certain r-redundant structures especially appealing: those consuming O(N/B) + rNL/B blocks – note that the second term is outside the big-O. Intuitively, they are the most space-economical r-redundant structures because, other than the storage of information fields, they are allowed to use only O(N/B) extra blocks, thus preventing tricks that "cheat" by keeping O(NL/B) blocks of search-guiding data that have nothing to do with information fields. Even though structures with such tricks still consume linear space O(NL/B), one has every right to wonder: why not use the space of those guiding data instead for storing information fields more often? It should be clear that the separation of good r-redundant structures from cheating ones owes to the fact that L can be ω(1). Conventionally, with L = O(1), O(N/B) + rNL/B is hardly any more meaningful than simply O(N/B). Indeed, the structures in our contributions all consume O(N/B) + rNL/B space. On the other hand, our lower bounds are proved for structures with O(NL/B) blocks, namely, covering also the cheating structures.

Indulgence in theoretical discussion should not allow us to forget that information fields do make sense in practice. In many applications, a user is seldom interested in the coordinates of a point p retrieved; usually, it is the details of the entity represented by p that have triggered the query in the first place. Let the entity be a hotel, for example; then info(p) can include its concrete description (e.g., rating, prices, amenities, etc.). The size of info(p) may not be treated as a constant.

Why L = o(B)? It turns out that for L = B, there is a trivial 1-redundant structure that occupies NL/B + O(N/B) space, and answers a range query in O((N/B)^ε + KL/B) I/Os for any constant ε > 0. We will prove later that such efficiency is impossible for many L = o(B). The peculiarity at L = B arises from the fact that the cost of reporting K ≥ 1 objects is Ω(KL/B) = Ω(K), namely, the query algorithm can afford to spend one I/O on each qualifying object. Imagine that we store the information fields of all objects in an array, indexed by object ids. The array uses NL/B + O(1) blocks. Then, create on P an O(N/B)-space structure designed for range search with L = 1, treating each id as an information field (which is why L = 1 suffices). This structure allows us to report the ids of

the objects satisfying a query in O((N/B)^ε + K/B) I/Os, after which their information fields can be extracted from the array in O(K + KL/B) = O(KL/B) I/Os. We thus have obtained a 1-redundant index with space NL/B + O(N/B) and query time O((N/B)^ε + KL/B). The space and query complexities of the above solution remain the same as long as L = Ω(B) (see the sketch after Definition 2 below). The problem, nonetheless, becomes much more interesting for L = o(B), as we will see shortly.

Average redundancy. In Definition 1, the redundancy of a structure is measured as the maximum number of times an object is stored. In general, an index may choose to store more copies of certain objects than others. Thus, even though the structure may duplicate some objects many times, the total number of information fields stored can be small. The redundancy of such a structure is low in an average sense:

Definition 2. The average redundancy of a structure is the total number of information fields stored, divided by the size N of the dataset.

Clearly, an r-redundant structure has average redundancy at most r, whereas all structures must have average redundancy at least 1. We will discuss lower bounds with respect to average redundancy.
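The sketch promised above: a minimal in-memory Python rendering of the trivial structure for L = Ω(B). The brute-force id search stands in for the O(N/B)-space range structure on ids, and I/O costs are of course not modeled; all names are our own.

    # Sketch of the trivial 1-redundant structure for L = Omega(B).
    class TrivialStructure:
        def __init__(self, points):
            # points: list of (x, y, info); ids are array positions.
            self.infos = [info for (_, _, info) in points]   # the array of fields
            self.coords = [(x, y) for (x, y, _) in points]   # search-guiding data

        def query(self, x1, x2, y1, y2):
            # Step 1: retrieve qualifying ids (brute-force stand-in for the
            # O(N/B)-space range structure with L = 1).
            ids = [i for i, (x, y) in enumerate(self.coords)
                   if x1 <= x <= x2 and y1 <= y <= y2]
            # Step 2: one array lookup per id fetches info(p); with L = Omega(B)
            # this costs O(K) = O(KL/B) I/Os, which is within budget.
            return [self.infos[i] for i in ids]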

1.2 Previous results

We now proceed with a review of existing results on range search in the EM model. Focus will be placed on worst-case efficient structures using linear space O(NL/B), as they are the main subject of this paper.

Query upper bounds with linear space. Linear-size 1-redundant indexes have been well studied. The best query time possible is O(√(NL/B) + KL/B) (as will be clear in Section 1.4). This can be achieved by a slightly modified version [15, 16] of Bentley's kd-tree [5], by ensuring that each leaf node contains the information fields of Θ(B/L) objects. The O-tree of Kanth and Singh [12] also guarantees the optimal query time, and has the advantage of being fully dynamic: each insertion and deletion can be supported in O(log_B N) I/Os amortized (as would be difficult for the kd-tree). Similar performance can also be achieved by the cross-tree of Grossi and Italiano [7].

We are not aware of any specific study on r-redundant indexes with r > 1. Somewhat related is a result due to Hellerstein et al. [9]. For L = O(1), they showed that when the data points are aligned as a B × B grid (i.e., N = B^2), there is an r-redundant structure that solves a query in O((N/B)^{1/(2r)} + K/B) I/Os. No structure was given in [9] for general datasets.

Query lower bounds and the indivisibility assumption. Establishing a lower bound on query time is usually carried out on a certain model of data structures (as well as their accompanying query algorithms), namely, the bound is guaranteed to hold on all the indexes captured by the model, but no guarantee exists for the other structures. For example, under the comparison-based model, it is easy to show that the query cost must be Ω(log_B N + KL/B). This bound is excessively loose because all linear-size structures incur

polynomial query time in the worst case. Progress has been made in the past decade towards proving polynomial lower bounds. Our discussion below concentrates on L = O(1), which was assumed in the derivation of those bounds.

The model of Kanth and Singh [12] can be thought of as the EM-equivalent of a pointer machine in internal memory. It represents a data structure as a tree such that, to visit a node of the tree, a query algorithm must first access all its ancestors, and follow their pointers leading to the node. Kanth and Singh showed that any 1-redundant structure in this model must incur Ω(√(N/B) + K/B) I/Os solving a query in the worst case. Due to the model's limitations, their proof does not work for structures that cannot be viewed as a tree, or query algorithms that can jump directly to a node without fetching its ancestors.

To date, the most general modeling of EM structures (for analyzing lower bounds on range search) is due to Hellerstein et al. [9]. Their model imposes no constraint on how a query algorithm may choose the next disk block to access – it can be any block regardless of the I/Os that have already been performed. They developed the redundancy theorem, which is a powerful tool for analyzing the tradeoff between space and query efficiency for external memory structures, and generalizes earlier results [10, 13, 17] under the same model. For 2D range search, they utilized the theorem to prove the following fact. When N = B^2, the average redundancy r (see Definition 2) of an index must satisfy

    r = \Omega\left( \frac{\log B}{\log A} \right)    (1)

if the time of processing a query with K = B has to be O(A), where A can be any positive value at most √B / 4. The fact implies that any linear-size structure must incur query cost Ω((N/B)^c + K/B) in the worst case for some constant c > 0. To see this, notice that when N = B^2, K = B, and r is at most a constant, (1) indicates (log B)/(log A) = O(1). This, in turn, means that when B is large enough, log A ≥ c · log B, leading to A ≥ B^c, where c is some positive constant. Given the choice of N, this translates to A ≥ (N/B)^c.

Underlying the analysis of [9] is the so-called indivisibility assumption, that is, every information field, say info(p) for some object p, must be stored as a whole (i.e., with all its bits) in a block. In other words, a structure is not allowed to, for instance, cut info(p) into individual bits, and store each bit in a different block (at query time, these bits are assembled back into info(p) for reporting). More specifically, the indivisibility assumption lingers in the notion of flake (see Definition 5.2 in [9]), which is a subset of objects whose information fields are stored in a common block. This notion is central to proving the redundancy theorem.

Others. There is a rich literature on designing heuristic structures (like the R-tree [4, 8]) that do not have attractive worst-case bounds but are empirically shown to work well on many real datasets. They are not relevant to our work, which aims at good worst-case performance.

The preceding survey focused on linear-size structures. Better query time is possible if super-linear space can be afforded. For L = O(1), the external range search tree of Arge, Samoladas and Vitter [2] uses O((N/B) log_B N / log_B log_B N) space, and answers a range

query in O(log_B N + K/B) I/Os. The lower bounds of [2, 9, 19] show that this is already optimal under some weak assumptions. It is worth mentioning that, if the query is a 3-sided rectangle, namely, having the form (−∞, x] × [y_1, y_2], it can be solved in O(log_B N + K/B) I/Os (for constant L) using the external priority search tree of [2], which consumes only linear space.

1.3 Weak indivisibility

We mentioned earlier that previous lower bounds were established under the indivisibility assumption that the information field info(p) of an object p is always stored together in the same block (but multiple copies can exist in different blocks). Although this is true in most of the existing indexes (especially the practical ones), it is quite unnecessary from the perspective of designing data structures. This fact already arises even in the traditional scenario where info(p) contains just the coordinates of p. For instance, why can't a structure store the x- and y-coordinates of p in different blocks, as long as the query algorithm can put them together before they are output? This "mild" violation of the indivisibility assumption already escapes the scope of the existing lower bounds.

In this paper, we allow a structure to violate the assumption in a more aggressive manner. First, the bits in the information field info(p) of an object p can be freely divided, and stored across different blocks. Furthermore, the structure is even allowed to store some bits of info(p) more often than others. The only constraint is that, when info(p) is reported, each of its bits must have been retrieved from a block, rather than being computed. We refer to this requirement as the weak indivisibility assumption. Accordingly, we call the original atomic treatment of information fields the strong indivisibility assumption.

An immediate remark concerns the definition of average redundancy – Definition 2 is no longer well defined if some bits of info(p) exist more frequently in the structure than others. However, a straightforward extension to the bit level eliminates this issue. Denote by w the number of bits in a word. We now redefine the average redundancy of a structure as the ratio between

• the total number of stored bits from objects' information fields, and

• NwL, i.e., the total number of bits in the information fields of all objects.

Another remark concerns why our requirement is a weak form of the indivisibility assumption. Subtle as it may be, our requirement still falls within the grand indivisibility assumption, whose removal would even allow the bits of information fields (of the objects qualifying a query) to be computed! As mentioned earlier, we do not allow a data structure to be this powerful. Nonetheless, weak indivisibility appears to be a meaningful concept to coin, given that it enables us to capture a wider class of structures than was previously possible under the strong indivisibility assumption. It is worth mentioning that efforts towards removing the ultimate indivisibility assumption have been reported in [11, 20], but those efforts do not concern range search.

Finally, let us point out that the notion of redundancy in our model applies only to bits taken directly from information fields. A pointer to an information field, or a hash value computed from any portion of the information field, is not counted as redundancy. A structure is permitted to store as much such information as needed, as long as the total space is O(NL/B).

1.4 Our results

The first main theorem of this paper is:

Theorem 1. Let τ be an integer at least 2, and L = B^µ, where µ can be any constant satisfying 0 ≤ µ < 1. For any constant ε satisfying 0 < ε ≤ 1/τ, to achieve query cost O((NL/B)^{(1/τ) − ε} + KL/B) under the weak indivisibility assumption, in the worst case a structure using O(NL/B) space must store at least τNwL(1 − o(1)) bits from the information fields of the underlying objects (recall that w is the word length).

The theorem implies:

Corollary 1. For L = B^µ where 0 ≤ µ < 1, under the weak indivisibility assumption, if a linear-space index has average redundancy no more than an integer r ≥ 1, it is impossible to guarantee query cost O((NL/B)^{1/(r+1) − ε} + KL/B), no matter how small the positive constant ε is.

To understand the corollary, notice that, as shown in Theorem 1 (with τ = r + 1), a structure must store (r + 1)NwL(1 − o(1)) bits from objects' information fields to achieve query time O((NL/B)^{1/(r+1) − ε} + KL/B). However, a structure with average redundancy at most r stores no more than rNwL such bits, i.e., NwL(1 − o(1)) bits fewer than needed.

As mentioned in Section 1.1, when L = B, there is a linear-size 1-redundant structure with query cost O((N/B)^ε + KL/B) for any constant ε > 0. Hence, Theorem 1 shows that the problem becomes much harder as soon as L drops by a polynomial factor B^δ for an arbitrarily small δ > 0.

To obtain our result, we applied the analytical framework of the redundancy theorem [9] with, however, several new ideas. The necessity of those ideas is due to two reasons. First, as the original framework makes the strong indivisibility assumption, it needs to be extended in a non-conventional way so that the assumption can be relaxed. Second, while the framework is good for establishing asymptotic lower bounds of average redundancy, it is not powerful enough to argue for actual, constant-revealed, lower bounds. More specifically, the hidden constant in (1) is 1/12 according to the proof in [9]. Intuitively, in our context, this says that a structure must consume at least (r/12) · NL/B blocks which, however, is far smaller than the target rNL/B as required to prove Theorem 1.

As mentioned in Section 1.2, when L = O(1) and the data points form a B × B grid, there is an r-redundant structure [9] with query complexity O((N/B)^{1/(2r)} + K/B). Theorem 1 reveals that this tradeoff is impossible for general data (even if L remains a constant). Instead, as our second main result, we prove:

Theorem 2. For any integer r ≥ 1, there is an r-redundant structure that consumes rNL/B + O(N/B) space, and answers a query in KL/B + O((NL/B)^{1/(r+1)}) I/Os.

The query time of our structure is optimal up to only a small factor (according to Corollary 1). We remind the reader that the term rNL/B in the space cost is outside the big-O.
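Returning briefly to the bit-level redefinition of average redundancy in Section 1.3, a tiny numerical example of our own (all numbers made up): suppose N = 4, w = 64 and L = 2, so each information field has wL = 128 bits and NwL = 512. If a structure stores every bit of three objects twice and every bit of the fourth object once, then

    \text{average redundancy} = \frac{(3 \cdot 2 + 1) \cdot 128}{NwL} = \frac{896}{512} = 1.75,

which is consistent with the structure being 2-redundant (average redundancy at most r = 2).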

2. INDEXABILITY THEOREM FOR 2-REDUNDANCY

In this section, we will prove Theorem 1 in the special case of τ = 3. This allows us to explain the core of our technique without being distracted by the extra mathematical subtleties in the proof for arbitrary τ (as will be presented in the next section). Derivation of our lower bound theorems considers that the word length is w = Θ(log N) bits.

Let p be an object in the dataset P. We refer to each bit of info(p) as an information bit; that is, info(p) has wL information bits. In general, an information bit is uniquely characterized by two factors: (i) the object p that it belongs to, and (ii) the position in info(p) that it is at. With this notion, we slightly rephrase Theorem 1 under τ = 3 for easy referencing.

Theorem 3. Let L = B^µ, where µ can be any constant satisfying 0 ≤ µ < 1. For any constant ε satisfying 0 < ε ≤ 1/3, to achieve query cost O((NL/B)^{(1/3) − ε} + KL/B) under the weak indivisibility assumption, in the worst case a structure using O(NL/B) space must store at least 3NwL(1 − o(1)) information bits.

The theorem indicates that no 2-redundant structure can ensure query time O((NL/B)^{(1/3) − ε} + KL/B); see Corollary 1.

Our discussion concentrates on a set P of points forming an n × n grid, namely, N = n^2. We choose n such that the query cost is O(nL/B) when K = n points are reported. Towards this purpose, we solve n from:

    (n^2 L/B)^{\frac{1}{3} - \epsilon} = \frac{nL}{B} \;\Longleftrightarrow\; n = (B/L)^{\frac{\epsilon + 2/3}{2\epsilon + 1/3}} = B^{\frac{(\epsilon + 2/3)(1 - \mu)}{2\epsilon + 1/3}}    (2)
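For completeness, the routine algebra behind (2), which the text leaves implicit (our own spelling-out, using L = B^µ):

    (n^2 L/B)^{\frac{1}{3} - \epsilon} = \frac{nL}{B}
    \;\Longleftrightarrow\; n^{\frac{2}{3} - 2\epsilon} \, (L/B)^{\frac{1}{3} - \epsilon} = n \, (L/B)
    \;\Longleftrightarrow\; n^{-(2\epsilon + \frac{1}{3})} = (L/B)^{\epsilon + \frac{2}{3}}
    \;\Longleftrightarrow\; n = (B/L)^{\frac{\epsilon + 2/3}{2\epsilon + 1/3}} = B^{\frac{(\epsilon + 2/3)(1 - \mu)}{2\epsilon + 1/3}}.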

Since µ < 1, n is always greater than 1. Furthermore, as ε ≤ 1/3, nL/B = (n^2 L/B)^{(1/3) − ε} ≥ 1. For simplicity, let us assume that √n is an integer. This assumption will be removed at the end of the section.

From now on, we will consider only queries with output size K = n. Note that each such query must report nwL information bits. Given the choice of n in (2), the query is answered in O((n^2 L/B)^{(1/3) − ε} + nL/B) = O(nL/B) I/Os (remember nL/B ≥ 1, that is, the query must perform at least one I/O). Hence, we can assume that its cost is at most αnL/B for some constant α > 0.

We construct 3n queries as follows. Recall that the points of P form an n × n grid. Each row or column of the grid is taken as a query, referred to as a row query or column query, respectively. This has defined 2n queries. Each of the remaining n queries is a √n × √n square, and is therefore called a square query. Specifically, each square touches √n consecutive rows and columns of the grid, respectively. All the n square queries are mutually disjoint, and together cover the entire P. Figure 2 illustrates these queries for n = 16.

Figure 2: Hard dataset and queries on a 16 × 16 grid. (a) Row queries; (b) Column queries; (c) Square queries.
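A small Python sketch of this hard instance (for illustration only; it simply enumerates the 3n queries as grid-aligned rectangles, assuming √n is an integer as in the text):

    import math

    def hard_queries(n):
        """Return the 3n queries of Section 2 as grid-aligned rectangles
        (col_lo, col_hi, row_lo, row_hi), inclusive, on an n x n grid."""
        rn = math.isqrt(n)
        assert rn * rn == n, "the construction assumes sqrt(n) is an integer"
        rows = [(0, n - 1, i, i) for i in range(n)]          # n row queries
        cols = [(j, j, 0, n - 1) for j in range(n)]          # n column queries
        squares = [(a * rn, (a + 1) * rn - 1, b * rn, (b + 1) * rn - 1)
                   for a in range(rn) for b in range(rn)]    # n disjoint squares
        return rows + cols + squares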

Next, we define a notion called bit-flake to replace the concept of flake in [9].

Definition 3. Let q be a query with output size n. A bit-flake of q is a non-empty set f of bits satisfying two conditions:

• All the bits in f are stored in an identical block that is accessed by q.

• Each bit in f is an information bit of an object qualifying q.

To illustrate the definition in an alternative manner, consider a block b accessed by q. Let X be the set of all information bits in b that belong to qualifying objects. Then, any non-empty subset of X is a bit-flake.

Lemma 1. A query with output size n has at least

    \frac{nwL}{s} - \frac{\alpha nL}{B}

pair-wise disjoint bit-flakes of size s > 0.

Proof. We use the following algorithm to collect a number of bit-flakes needed to prove the lemma. Assume that the query accessed z ≤ αnL/B blocks, denoted as b_1, ..., b_z, respectively (ordering does not matter). For each i ∈ [1, z], let X_i be the set of information bits in b_i that (i) belong to objects qualifying q, and (ii) are absent from the preceding blocks b_1, ..., b_{i−1}. Clearly, X_1, ..., X_z are pair-wise disjoint. Furthermore, \sum_{i=1}^{z} |X_i| = nwL because each information bit of every qualifying object is in exactly one X_i.

From X_i, we form ⌊|X_i|/s⌋ pair-wise disjoint bit-flakes, by dividing arbitrarily the bits of X_i into groups of size s, leaving out at most s − 1 bits. The bit-flakes thus created from X_1, ..., X_z are mutually disjoint. The number of those bit-flakes is at least

    \sum_{i=1}^{z} \left\lfloor \frac{|X_i|}{s} \right\rfloor \;\ge\; \sum_{i=1}^{z} \left( \frac{|X_i|}{s} - 1 \right) \;=\; \frac{nwL}{s} - z \;\ge\; \frac{nwL}{s} - \frac{\alpha nL}{B}

as claimed.

By Lemma 1, the 3n queries define in total at least

    3n \left( \frac{nwL}{s} - \frac{\alpha nL}{B} \right)

bit-flakes of size s (we will decide the value of s later), such that the bit-flakes from the same query are pair-wise disjoint. Refer to all these bit-flakes as canonical bit-flakes. Recall that the bits of a bit-flake f are in a common block b. We say that b contains f.

Lemma 2. Let b be a block, and t the number of information bits in b that appear in at least one canonical bit-flake. Then, b can contain at most

    \frac{t}{s} + \frac{3L\sqrt{n} \cdot B^2 w^3}{s^3}

canonical bit-flakes.

Proof. Let F_row be the set of canonical bit-flakes that are contained in b, and are defined from row queries. Define F_col and F_sqr similarly with respect to column and square queries. The total number of information bits covered by at least one bit-flake in F_row ∪ F_col ∪ F_sqr is t. Note that the bit-flakes in F_row are mutually disjoint because no two row queries retrieve a common object. The same is true for F_col and F_sqr, respectively. Hence, it holds from the set inclusion-exclusion principle that:

    \sum_{f \in F_{row} \cup F_{col} \cup F_{sqr}} |f| \;-\; \sum_{f_{row} \in F_{row},\, f_{col} \in F_{col}} |f_{row} \cap f_{col}| \;-\; \sum_{f_{row} \in F_{row},\, f_{sqr} \in F_{sqr}} |f_{row} \cap f_{sqr}| \;-\; \sum_{f_{col} \in F_{col},\, f_{sqr} \in F_{sqr}} |f_{col} \cap f_{sqr}| \;\le\; \Bigl| \bigcup_{f \in F_{row} \cup F_{col} \cup F_{sqr}} f \Bigr| \;=\; t.    (3)

Consider any bit-flakes f_row, f_col and f_sqr that are from F_row, F_col and F_sqr, respectively. There are at most wL bits in f_row ∩ f_col, since a row query and a column query share exactly 1 object in their results (notice that the bits in f_row ∩ f_col must belong to the information field of that object). On the other hand, a row query and a square query share at most √n objects in their results. It follows that |f_row ∩ f_sqr| ≤ wL√n. Similarly, it holds that |f_col ∩ f_sqr| ≤ wL√n.

As the bit-flakes in F_row are pair-wise disjoint, we have |F_row| ≤ t/s, which is at most Bw/s because a block has Bw bits. Likewise, the sizes of F_col and F_sqr are both at most Bw/s. Hence:

    \sum_{f_{row} \in F_{row},\, f_{col} \in F_{col}} |f_{row} \cap f_{col}| \;\le\; wL \left( \frac{Bw}{s} \right)^2

    \sum_{f_{row} \in F_{row},\, f_{sqr} \in F_{sqr}} |f_{row} \cap f_{sqr}| \;\le\; wL\sqrt{n} \left( \frac{Bw}{s} \right)^2

    \sum_{f_{col} \in F_{col},\, f_{sqr} \in F_{sqr}} |f_{col} \cap f_{sqr}| \;\le\; wL\sqrt{n} \left( \frac{Bw}{s} \right)^2.

Plugging the above inequalities into (3) gives:

    \sum_{f \in F_{row} \cup F_{col} \cup F_{sqr}} |f| \;\le\; t + wL \left( \frac{Bw}{s} \right)^2 + 2wL\sqrt{n} \left( \frac{Bw}{s} \right)^2 \;\le\; t + \frac{3L\sqrt{n} \cdot B^2 w^3}{s^2}

As |f| = s for each f on the left hand side, we obtain:

    |F_{row} \cup F_{col} \cup F_{sqr}| \;\le\; \frac{t}{s} + \frac{3L\sqrt{n} \cdot B^2 w^3}{s^3}

thus completing the proof.

We are now ready to prove a lower bound on the total number of information bits that must be stored. Let λ be the number of blocks occupied by the underlying structure. As the structure consumes O(NL/B) space, we know: λ ≤ βNL/B = βn^2 L/B for some constant β > 0. Denote by t_i (1 ≤ i ≤ λ) the number of information bits in the i-th block that appear in at least one canonical bit-flake. Combining Lemma 2 and the fact that there are at least 3n(nwL/s − αnL/B) canonical bit-flakes, it holds that:

    \sum_{i=1}^{\lambda} \left( \frac{t_i}{s} + \frac{3L\sqrt{n} \cdot B^2 w^3}{s^3} \right) \;\ge\; 3n \left( \frac{nwL}{s} - \frac{\alpha nL}{B} \right)

Hence:

    \sum_{i=1}^{\lambda} t_i \;\ge\; 3ns \left( \frac{nwL}{s} - \frac{\alpha nL}{B} \right) - \lambda \cdot \frac{3L\sqrt{n} \cdot B^2 w^3}{s^2}
    \;\ge\; 3n^2 wL - \frac{3\alpha n^2 sL}{B} - \frac{\beta n^2 L}{B} \cdot \frac{3L\sqrt{n} \cdot B^2 w^3}{s^2}
    \;=\; 3n^2 wL \left( 1 - \frac{\alpha s}{Bw} - \frac{\beta B L w^2 \sqrt{n}}{s^2} \right)    (4)

Lemma 3. We can set s = B^c for some c > 0 such that both αs/(Bw) and βBLw^2 √n / s^2 are o(1) when B is large enough.

Proof. First, note that w = Θ(log N) = Θ(log n) = Θ(log B) because the exponent (ε + 2/3)(1 − µ)/(2ε + 1/3) in (2) is a constant. Hence

    \frac{\alpha s}{Bw} = O\left( \frac{B^c}{B \log B} \right)

which is o(1) when:

    c \le 1.    (5)

Recall that L = B^µ where 0 ≤ µ < 1. This together with (2) gives:

    \frac{\beta B L w^2 \sqrt{n}}{s^2} \;=\; O\left( \frac{B \cdot B^{\mu} \cdot B^{\frac{(\epsilon + 2/3)(1 - \mu)}{4\epsilon + 2/3}} \cdot \log^2 B}{B^{2c}} \right)

which is o(1) when:

    1 + \mu + \frac{(\epsilon + 2/3)(1 - \mu)}{4\epsilon + 2/3} \;<\; 2c.    (6)

A value of c satisfying (5) and (6) exists when:

    1 + \mu + \frac{(\epsilon + 2/3)(1 - \mu)}{4\epsilon + 2/3} \;<\; 2
    \;\Longleftrightarrow\; \frac{(\epsilon + 2/3)(1 - \mu)}{4\epsilon + 2/3} \;<\; 1 - \mu
    \;\Longleftrightarrow\; \epsilon + 2/3 \;<\; 4\epsilon + 2/3

which is always true for ε > 0.

Therefore, by fixing s as stated in the lemma, we can rewrite (4) into

    \sum_{i=1}^{\lambda} t_i \;\ge\; 3n^2 wL (1 - o(1)) \;=\; 3NwL(1 - o(1)).

We thus conclude the proof of Theorem 3.

It is worth pointing out that the proof does not work for µ = 1, because in this case (6) requires c > 1, which conflicts with (5). This is consistent with the discussion in Section 1.1 that better query time is possible for L = B.

Recall that on a B × B grid (i.e., n = B), a 2-redundant structure can achieve query time O((N/B)^{1/4} + K/B) when L = O(1) (see Section 1.2). The query complexity in Theorem 3 is exactly O((N/B)^{1/4} + K/B) by setting ε = 1/12 and µ = 0. To show that no structure can guarantee such cost, our proof used a B^{1.5} × B^{1.5} grid (the exponent in (2) equals 1.5). In other words, the problem in fact becomes harder when n increases from B to B^{1.5}.
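The two numerical claims above can be checked by direct substitution into Theorem 3 and (2) (our own arithmetic):

    \frac{1}{3} - \epsilon = \frac{1}{3} - \frac{1}{12} = \frac{1}{4}, \qquad \frac{(\epsilon + 2/3)(1 - \mu)}{2\epsilon + 1/3} = \frac{1/12 + 8/12}{2/12 + 4/12} = \frac{9/12}{6/12} = \frac{3}{2},

so the query bound instantiates to O((N/B)^{1/4} + K/B) (recall L = O(1)) and the grid has side n = B^{3/2}.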

How is our technique different from [9]? Our proof was inspired by the analysis in [9]. In particular, we owe the method of flake counting to [9], which is the central ingredient for obtaining a tradeoff between space and query time. Nevertheless, some new ideas were deployed to obtain Theorem 3, as summarized below.

The first one is to construct flakes at the bit level, which led to the introduction of bit-flakes (Definition 3). This proved to be a crucial step towards eliminating the strong indivisibility assumption. Naturally, it also demands redesigning several components in flake counting, most notably (i) the approach described in the proof of Lemma 1 for collecting sufficiently many disjoint bit-flakes from a query, and (ii) the way Lemma 2 bounds the number of canonical bit-flakes per block with respect to the number t of bits participating in at least one canonical bit-flake.

The second idea is to decide n in such a way that every query with output size n incurs O(nL/B) I/Os (see Equation 2). In retrospect, the idea sounds fairly reasonable. It forces the n objects retrieved by each query to be stored in a compact manner. That is, they must be covered by asymptotically the minimum number of blocks, noticing that nL/B blocks are compulsory for their storage. As a result, these blocks do not contain much information useful for answering other queries. Intuitively, the effect is that the data structure must pack all the N objects in O(NL/B) blocks just to answer row queries, pack them again in another O(NL/B) blocks for column queries, and yet again for square queries. Hence, the redundancy needs to be roughly 3. Of course, for the above idea to work, queries should have small overlaps in their results. It turned out that an overlap of no more than √n objects suffices.

The third idea was applied in Lemma 2, which replaced Johnson's bound in [9] (see Theorem 5.3 there). In fact, applying Johnson's bound in Lemma 2 would tell us that the number of canonical bit-flakes in b is at most s/(wL√n). This is quite different from what we have in Lemma 2, and does not seem to be tight enough for establishing our final result. At a higher level, the cause of the ineffectiveness of Johnson's bound here is that, in general, the bound can be loose when there are only a small number of canonical bit-flakes. This indeed happens in our proof, because the size s of a canonical bit-flake can be large (the value of c in (12) is close to 1 for small ε and µ).

Finally, Lemma 3 is what really turns the flake-counting method into a working argument. The way that s is decided is specific to our context, and does not have a counterpart in [9].

Non-integer √n. In this case, set n′ = (⌈√n⌉)^2. Clearly, n′ = Θ(n). It is easy to adapt our proof to work instead with N = (n′)^2 points forming an n′ × n′ grid. We will do so explicitly in the next section.

3. GENERAL INDEXABILITY THEOREM

This section serves as the proof of Theorem 1. Our argument is analogous to the one in the previous section, but includes extra details for handling general τ. Remember that τ is a constant integer at least 2.

As before, we consider a set P of points forming an n × n grid (i.e., N = n^2), where the value of n makes the query cost bounded by O(nL/B) when the output size is K = n. Recall that the structure under our analysis has query cost O((NL/B)^{(1/τ) − ε} + KL/B). We choose:

    n = \left\lceil (n_0)^{1/(\tau - 1)} \right\rceil^{\tau - 1}    (7)

where

    n_0 = B^{\frac{\epsilon + (\tau - 1)/\tau}{2\epsilon + (\tau - 2)/\tau} (1 - \mu)}.

Our choice ensures that n^{1/(τ−1)} is an integer. The next lemma is rudimentary:

Lemma 4. The following are true:

    n = \Theta(n_0)
    (n^2 L/B)^{(1/\tau) - \epsilon} = \Theta(nL/B)
    nL/B \ge 1

Proof. Set x = ⌈(n_0)^{1/(τ−1)}⌉. Hence, n_0 > (x − 1)^{τ−1}, and n = x^{τ−1}. Consider sufficiently large B so that x ≥ 2. In this case: n ≤ (2(x − 1))^{τ−1} < 2^{τ−1} n_0 = O(n_0). Then, n = Θ(n_0) follows from the obvious fact that n ≥ n_0.

It can be easily verified that n_0 satisfies ((n_0)^2 L/B)^{(1/τ) − ε} = n_0 L/B. Hence:

    (n^2 L/B)^{(1/\tau) - \epsilon} = \Theta\left( ((n_0)^2 L/B)^{(1/\tau) - \epsilon} \right) = \Theta(n_0 L/B) = \Theta(nL/B).

Finally, nL/B ≥ n_0 L/B = ((n_0)^2 L/B)^{(1/τ) − ε} ≥ 1 because ε ≤ 1/τ.
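The "easily verified" identity in the proof above, spelled out (our own check, mirroring the derivation of (2)):

    ((n_0)^2 L/B)^{\frac{1}{\tau} - \epsilon} = \frac{n_0 L}{B}
    \;\Longleftrightarrow\; n_0^{\frac{2}{\tau} - 2\epsilon - 1} = (L/B)^{1 - \frac{1}{\tau} + \epsilon}
    \;\Longleftrightarrow\; n_0^{-\left(2\epsilon + \frac{\tau - 2}{\tau}\right)} = (L/B)^{\epsilon + \frac{\tau - 1}{\tau}}
    \;\Longleftrightarrow\; n_0 = (B/L)^{\frac{\epsilon + (\tau-1)/\tau}{2\epsilon + (\tau-2)/\tau}} = B^{\frac{\epsilon + (\tau-1)/\tau}{2\epsilon + (\tau-2)/\tau} (1 - \mu)},

which is exactly the choice made after (7).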

It follows that log n = Θ(log n_0) = Θ(log B) (recall that ε, τ and µ are constants).

The subsequent analysis concentrates on queries with output size n. Every such query can be answered in time O((n^2 L/B)^{(1/τ) − ε} + nL/B), which is O(nL/B) by Lemma 4. Hence, we can assume that its cost is no more than αnL/B for some constant α > 0.

An axis-parallel rectangle is said to have size l_x × l_y if it covers exactly l_x · l_y points of P which come from l_x columns and l_y rows in the grid underlying P. Equivalently, this rectangle intersects l_x consecutive columns and l_y consecutive rows of the grid. We consider τ query sets, referred to as set 0, ..., set τ − 1 respectively, each of which consists of n queries. Specifically, the queries in set i (0 ≤ i ≤ τ − 1) have the same size n^{i/(τ−1)} × n^{(τ−1−i)/(τ−1)}, are pair-wise disjoint, and together cover the entire grid. The total number of queries from all τ sets is τn.

We still use Definition 3 to define bit-flake. Lemma 1 still holds in our current context. Furthermore, define a canonical bit-flake in the same way as in Section 2. Hence, the τn queries constructed earlier give rise to at least

    \tau n \left( \frac{nwL}{s} - \frac{\alpha nL}{B} \right)

canonical bit-flakes. Lemma 2, however, no longer holds, so we provide its counterpart:

Lemma 5. Let b be a block, and t the number of information bits in b that appear in at least one canonical bit-flake. Then, b can contain at most

    \frac{t}{s} + n^{(\tau - 2)/(\tau - 1)} \cdot \frac{\tau(\tau - 1) \cdot L B^2 w^3}{2 s^3}

canonical bit-flakes.

Proof. Recall that our queries are divided into set 0, ..., set τ − 1, each of which contains queries with the same size. Let F_i (i ∈ [0, τ − 1]) be the set of canonical bit-flakes that are contained in b, and defined from queries of set i. The bit-flakes in each F_i are mutually disjoint. The total number of information bits that appear in at least one bit-flake in F_0 ∪ ... ∪ F_{τ−1} is t. By the set inclusion-exclusion principle, we have:

    \sum_{f \in F_0 \cup \ldots \cup F_{\tau - 1}} |f| \;-\; \sum_{0 \le i < j \le \tau - 1} \; \sum_{f_1 \in F_i,\, f_2 \in F_j} |f_1 \cap f_2| \;\le\; \Bigl| \bigcup_{f \in F_0 \cup \ldots \cup F_{\tau - 1}} f \Bigr| \;=\; t.    (8)

We now show that

    |f_1 \cap f_2| \;\le\; wL \cdot n^{(\tau - 2)/(\tau - 1)}    (9)

for any f_1 ∈ F_i and f_2 ∈ F_j with i ≠ j. Without loss of generality, suppose i < j. Let q_1 (q_2) be the query from which f_1 (f_2) was defined. In other words, q_1 and q_2 have sizes n^{i/(τ−1)} × n^{(τ−1−i)/(τ−1)} and n^{j/(τ−1)} × n^{(τ−1−j)/(τ−1)}, respectively. Let l_x × l_y be the size of q_1 ∩ q_2. It holds that

    l_x \le n^{i/(\tau - 1)}, \qquad l_y \le n^{(\tau - 1 - j)/(\tau - 1)}.

Therefore, q_1 ∩ q_2 covers at most l_x · l_y ≤ n^{(τ−1−j+i)/(τ−1)} ≤ n^{(τ−2)/(τ−1)} points of P (the last step uses j ≥ i + 1). As f_1 ∩ f_2 is a subset of the information bits belonging to those points, (9) follows from the fact that a point's information field has wL bits.

Hence, (8) leads to:

    \sum_{f \in F_0 \cup \ldots \cup F_{\tau - 1}} |f| \;-\; \sum_{0 \le i < j \le \tau - 1} \; \sum_{f_1 \in F_i,\, f_2 \in F_j} wL \cdot n^{(\tau - 2)/(\tau - 1)} \;\le\; t

The disjointness of the bit-flakes in each F_i implies that |F_i| ≤ t/s ≤ Bw/s, with which the above inequality gives

    \sum_{f \in F_0 \cup \ldots \cup F_{\tau - 1}} |f| \;-\; \frac{\tau(\tau - 1)}{2} \cdot wL \cdot n^{(\tau - 2)/(\tau - 1)} \cdot \frac{B^2 w^2}{s^2} \;\le\; t

As |f| = s for every f ∈ F_0 ∪ ... ∪ F_{τ−1}, we arrive at:

    |F_0 \cup \ldots \cup F_{\tau - 1}| \;\le\; \frac{t}{s} + n^{(\tau - 2)/(\tau - 1)} \cdot \frac{\tau(\tau - 1)}{2} \cdot \frac{L B^2 w^3}{s^3}

completing the proof.

As the underlying structure uses O(n^2 L/B) space, we can assume that it occupies λ ≤ βn^2 L/B blocks for some constant β > 0. Define t_i (1 ≤ i ≤ λ) as the number of bits that are stored in the i-th block, and appear in at least one canonical bit-flake. The previous lemma, combined with the fact that there are at least τn(nwL/s − αnL/B) canonical bit-flakes, shows:

    \sum_{i=1}^{\lambda} \left( \frac{t_i}{s} + n^{(\tau - 2)/(\tau - 1)} \cdot \frac{\tau(\tau - 1) \cdot L B^2 w^3}{2 s^3} \right) \;\ge\; \tau n \left( \frac{nwL}{s} - \frac{\alpha nL}{B} \right)

Hence:

    \sum_{i=1}^{\lambda} t_i \;\ge\; \tau n s \left( \frac{nwL}{s} - \frac{\alpha nL}{B} \right) - \lambda \cdot n^{(\tau - 2)/(\tau - 1)} \cdot \frac{\tau(\tau - 1) \cdot L B^2 w^3}{2 s^2}
    \;\ge\; \tau n^2 wL - \frac{\tau \alpha n^2 sL}{B} - \frac{\beta n^2 L}{B} \cdot n^{(\tau - 2)/(\tau - 1)} \cdot \frac{\tau(\tau - 1) \cdot L B^2 w^3}{2 s^2}
    \;=\; \tau n^2 wL \left( 1 - \frac{\alpha s}{Bw} - \frac{(\tau - 1)\beta \cdot B L w^2 \cdot n^{(\tau - 2)/(\tau - 1)}}{2 s^2} \right)    (10)

Lemma 6. We can set s = B^c for some c > 0 such that both αs/(Bw) and (τ − 1)β · BLw^2 · n^{(τ−2)/(τ−1)} / (2s^2) are o(1) when B is large enough.

Proof. First, note that w = Θ(log N) = Θ(log n) = Θ(log B). With s = B^c, we have

    \frac{\alpha s}{Bw} = O\left( \frac{B^c}{B \log B} \right)

which is o(1) when:

    c \le 1.    (11)

Applying L = B^µ, (7) and Lemma 4, we know:

    \frac{(\tau - 1)\beta \cdot B L w^2 \cdot n^{(\tau - 2)/(\tau - 1)}}{2 s^2} \;=\; O\left( \frac{B \cdot B^{\mu} \cdot B^{\frac{\epsilon + (\tau - 1)/\tau}{2\epsilon + (\tau - 2)/\tau} \cdot \frac{\tau - 2}{\tau - 1} \cdot (1 - \mu)} \cdot \log^2 B}{B^{2c}} \right)

which is o(1) when:

    1 + \mu + \frac{\epsilon + (\tau - 1)/\tau}{2\epsilon + (\tau - 2)/\tau} \cdot \frac{\tau - 2}{\tau - 1} \cdot (1 - \mu) \;<\; 2c.    (12)

A value of c satisfying (11) and (12) exists when:

    1 + \mu + \frac{\epsilon + (\tau - 1)/\tau}{2\epsilon + (\tau - 2)/\tau} \cdot \frac{\tau - 2}{\tau - 1} \cdot (1 - \mu) \;<\; 2
    \;\Longleftrightarrow\; \frac{\epsilon + (\tau - 1)/\tau}{2\epsilon + (\tau - 2)/\tau} \cdot \frac{\tau - 2}{\tau - 1} \cdot (1 - \mu) \;<\; 1 - \mu
    \;\Longleftrightarrow\; \left( \epsilon + \frac{\tau - 1}{\tau} \right) \cdot \frac{\tau - 2}{\tau - 1} \;<\; 2\epsilon + \frac{\tau - 2}{\tau}

which is always true for ε > 0, because the left hand side equals \frac{\tau - 2}{\tau - 1} \epsilon + \frac{\tau - 2}{\tau} and \frac{\tau - 2}{\tau - 1} < 2, completing the proof.

Therefore, by fixing s as stated in the lemma, we can rewrite (10) into

    \sum_{i=1}^{\lambda} t_i \;\ge\; \tau n^2 wL (1 - o(1)) \;=\; \tau N wL (1 - o(1))

concluding the proof of Theorem 1.

4. r-REDUNDANT STRUCTURES

Preliminary: external interval tree. This structure, due to Arge and Vitter [3], settles the following stabbing problem. The dataset consists of N intervals in the real domain. Given a real value q, a stabbing query reports all the data intervals enclosing q. The external interval tree consumes O(N/B) space, and answers a stabbing query in O(log_B N + K/B) I/Os, where K is the number of reported intervals.

The first 1-redundant structure. Let us start by giving a 1-redundant structure occupying NL/B + √(NL/B) + O(N/B) space, and solving a query in KL/B + O(√(NL/B)) I/Os. Note that the space cost is worse than required in Theorem 2. The structure in fact has been used as a component in the range search tree of Chazelle [6] and its external counterpart [2]. Our version here differs only in parameterization (as will be pointed out shortly). Nevertheless, we describe it in full anyway because the details are useful in clarifying general r-redundant indexes later.

We introduce a parameter ρ = √(NL/B). Also, define a slab as the part of data space R^2 between and including two vertical lines x = c_1 and x = c_2. To explain our structure, let us sort the points of P in ascending order of their x-coordinates. Partition the sorted list into ρ segments with the same number of points, except possibly the last segment.

Figure 3: Answering a range query (ρ = 8). The figure depicts slabs σ_1, ..., σ_8 (with the boundary and middle slabs marked), the query rectangle q, the horizontal line y = y_1, and the points p_1, ..., p_8.

Denote by P_i (1 ≤ i ≤ ρ) the set of points in the i-th segment. Hence, for any i_1, i_2 such that 1 ≤ i_1 < i_2 ≤ ρ, each point in P_{i_1} has an x-coordinate smaller than the x-coordinates of all the points in P_{i_2}. Denote by σ_i the slab determined by the smallest and largest x-coordinates in P_i.

For each P_i, sort the points there in ascending order of their y-coordinates. From now on, we will treat P_i as a sorted list, abusing the notation slightly. Naturally, the j-th point of P_i refers to the j-th point in the sorted list. If 2 ≤ j ≤ |P_i|, the predecessor of the j-th point is the (j−1)-st point. As a special case, the predecessor of the first point in P_i is defined as a dummy point whose y-coordinate is −∞. The x-coordinate of the dummy point is unimportant.

We store the information fields of the points of P_i in an array A_i (respecting the ordering of those points). The array occupies at most 1 + L|P_i|/B blocks, i.e., all but the last block contain B words of data. Hence, arrays A_1, ..., A_ρ require in total at most ρ + NL/B = NL/B + √(NL/B) blocks.

Since every object's information field is already in an array, no more information field can be stored in the rest of the index. However, we are still allowed O(N/B) extra space, which is sufficient to store the coordinates of each object a constant number of times. As discussed below, we use that space to create an external interval tree T.

T is built on N one-dimensional intervals obtained as follows. From each P_i, we generate a set I_i of |P_i| intervals. Each point p ∈ P_i determines an interval in I_i having the form (y_pred, y_p], where y_p is the y-coordinate of p, and y_pred is that of its predecessor. We associate this interval with a pointer to the block of array A_i where info(p) is stored, so that when the interval is fetched, we can jump to that block in one I/O. T indexes the union of I_1, ..., I_ρ. As T uses O(N/B) blocks, the overall space of our structure is NL/B + √(NL/B) + O(N/B).

To answer a range query with search region q = [x_1, x_2] × [y_1, y_2], we first identify the (at most) two boundary slabs that contain the left and right edges of q, respectively (this takes O(log_B N) I/Os with another B-tree on the slabs'

x-ranges; we omit this B-tree in the sequel because it is straightforward). Denote them respectively as σ_{i_1} and σ_{i_2} with i_1 ≤ i_2. Scan arrays A_{i_1} and A_{i_2} completely to report the qualifying objects there. Since each array has at most N/ρ objects, the cost of the scan is

    O\left( \frac{NL}{\rho B} \right) = O\left( \sqrt{NL/B} \right).

In the example of Figure 3 (where ρ = 8), the boundary slabs are σ_2 and σ_6, in which all the objects are examined.

The other qualifying objects can lie only in slabs σ_i where i ranges from i_1 + 1 to i_2 − 1. Refer to those slabs as the middle slabs. For each such slab σ_i, we find the lowest point p_i in σ_i on or above the horizontal line y = y_1. After this, jump to the block in A_i where info(p_i) is stored, and start scanning A_i from p_i to retrieve the other points above line y = y_1. We do so in ascending order of those points' y-coordinates, so that the scan can terminate as soon as it encounters a point falling out of q. All the points already scanned prior to this moment are the only objects in σ_i satisfying q. If the number of them is K_i, the scan performs at most 1 + K_i L/B I/Os. Hence, carrying out the scan in all slabs σ_i (i ∈ [i_1 + 1, i_2 − 1]) takes at most ρ + K_mid L/B I/Os, where K_mid is the number of qualifying objects from the middle slabs. In Figure 3, the middle slabs are σ_3, σ_4 and σ_5. The scan in σ_3, for instance, starts from p_3, and ends at the lowest point in σ_3 above the query rectangle.

It remains to explain how to find p_i for each i ∈ [i_1 + 1, i_2 − 1]. This can be settled with the external interval tree T. Recall that p_i determines an interval in I_i. The definition of p_i makes that interval the only one from I_i that contains the value y_1. Hence, it can be found by a stabbing query on T with y_1 as the search value. Note that this stabbing query may retrieve an interval for every slab, including those that are not a middle slab. In Figure 3, for example, the stabbing query returns (the intervals determined by) 8 points: p_1, ..., p_8. Nevertheless, as there are only ρ slabs, the stabbing query finishes in O(log_B N + ρ/B) I/Os, after which we can keep only the points in the middle slabs and discard the others. Therefore, overall the time of reporting the qualifying points in the middle slabs is bounded by

    K_{mid} L/B + \rho + O(\log_B N + \rho/B) \;=\; K_{mid} L/B + O\left( \sqrt{NL/B} \right).    (13)

As analyzed earlier, the qualifying objects from the boundary slabs can be found in O(√(NL/B)) I/Os. Hence, the total query time is O(KL/B + √(NL/B)). As mentioned, this structure has been used in [2, 6]. The only nuance in our scenario is the choice of ρ (which was logarithmic in [2, 6]). The techniques in the rest of the section, on the other hand, are newly developed in this paper.
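A compact in-memory Python sketch of this 1-redundant layout (illustrative only: bisection on sorted lists stands in for the external interval tree T and for the B-tree on slab x-ranges, and I/O costs are not modeled; all names are our own):

    from bisect import bisect_left

    class SlabStructure:
        # Sketch of the 1-redundant structure. Each slab keeps the array A_i
        # (info fields sorted by y); bisect stands in for the interval tree T.
        def __init__(self, points, rho):
            pts = sorted(points, key=lambda p: (p[0], p[1]))  # sort by x
            size = max(1, -(-len(pts) // rho))                # ceil(N / rho)
            self.slabs = []
            for i in range(0, len(pts), size):
                seg = sorted(pts[i:i + size], key=lambda p: p[1])  # sort by y
                self.slabs.append((pts[i][0],                      # slab x-range
                                   pts[min(i + size, len(pts)) - 1][0],
                                   [p[1] for p in seg],            # y-coordinates
                                   seg))                           # points + info

        def query(self, x1, x2, y1, y2):
            out = []
            for (lo, hi, ys, seg) in self.slabs:
                if hi < x1 or lo > x2:
                    continue                            # slab disjoint from q
                if x1 <= lo and hi <= x2:               # middle slab: find p_i,
                    j = bisect_left(ys, y1)             # then scan upward in y
                    while j < len(seg) and seg[j][1] <= y2:
                        out.append(seg[j][2])
                        j += 1
                else:                                   # boundary slab: full scan
                    out.extend(p[2] for p in seg
                               if x1 <= p[0] <= x2 and y1 <= p[1] <= y2)
            return out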

Reducing the space. The previous index incurs more space than our target in Theorem 2 by an additive term of √(NL/B) (i.e., ρ). This term can be eliminated with a trick we call tail collection, as explained next.

A close look at our earlier description reveals that the extra term ρ exists because each A_i (1 ≤ i ≤ ρ) may have an under-full block, which does not have B words of data, and thus wastes space. This under-full block, if present,

must be the last block in A_i. We remove it from A_i, after which all the blocks in A_i are fully utilized. We concatenate the data of all the non-full blocks (from different arrays) into a separate tail file G. All blocks in G store B words of data, except possibly one. Therefore, the total space used by G and the arrays is now at most 1 + NL/B, i.e., ρ blocks less than before.

Note that G itself has no more than ρ blocks. Therefore, we can afford to scan it completely in answering a query, which adds only ρ I/Os to the query cost, and hence, does not change the query complexity KL/B + O(√(NL/B)). The query algorithm is still the same as before, except that in scanning an array, if we have come to its end and see that some information has been moved to G, we should continue scanning the relevant portion in G.

r-redundant structure. Assuming that there is an (r−1)-redundant structure with space (r−1)NL/B + O(N/B) and query time KL/B + O((NL/B)^{1/r}), we can obtain an r-redundant index with space rNL/B + O(N/B) and query time KL/B + O((NL/B)^{1/(r+1)}). This can be done by modifying the earlier 1-redundant structure, as elaborated below.

The first change is the value of ρ, which is now set to (NL/B)^{1/(r+1)}. Then, in the same manner as in the 1-redundant case, we divide P into P_1, ..., P_ρ, and create arrays A_1, ..., A_ρ, the tail file G, and the external interval tree T. Currently, the information field of each object has been stored once, such that the space consumption is NL/B + O(N/B).

On each P_i (1 ≤ i ≤ ρ), we build an (r−1)-redundant structure T_i, which occupies (r−1)|P_i|L/B + O(|P_i|/B) space. Hence, T_1, ..., T_ρ together use

    \sum_{i=1}^{\rho} \left( \frac{(r-1)|P_i|L}{B} + O\left( \frac{|P_i|}{B} \right) \right) \;=\; \frac{(r-1)NL}{B} + O(N/B)

space. This explains why the overall space is rNL/B + O(N/B). Note that the final structure is r-redundant.

To answer a range query q = [x_1, x_2] × [y_1, y_2], as in the 1-redundant case, we start by identifying the boundary slabs σ_{i_1} and σ_{i_2} (i_1 ≤ i_2). Recall that they define middle slabs σ_i for each i ∈ [i_1 + 1, i_2 − 1]. The qualifying objects in the middle slabs are retrieved in the same way as in the 1-redundant case, i.e., utilizing T, the arrays of the middle slabs, and perhaps also the tail file. If K_mid objects from the middle slabs satisfy the query, as shown in (13), they can be extracted in K_mid L/B + O(ρ) = K_mid L/B + O((NL/B)^{1/(r+1)}) I/Os.

Finally, to report the qualifying objects in the boundary slabs, we query the (r−1)-redundant structures T_{i_1} and T_{i_2}. Notice that each of these structures indexes at most N/ρ objects. Hence, if K_1 and K_2 objects are found from T_{i_1} and T_{i_2} respectively, searching the two structures entails

    (K_1 + K_2)L/B + O\left( \left( \frac{NL}{\rho B} \right)^{1/r} \right)
    \;=\; (K_1 + K_2)L/B + O\left( \left( (NL/B)^{\frac{r}{r+1}} \right)^{1/r} \right)
    \;=\; (K_1 + K_2)L/B + O\left( (NL/B)^{1/(r+1)} \right)

I/Os. As K_mid + K_1 + K_2 = K, the overall query time is KL/B + O((NL/B)^{1/(r+1)}).

Combining the above inductive construction with our earlier 1-redundant structure, we have established the correctness of Theorem 2.
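As an illustration of the induction only, a minimal Python skeleton of the recursive partitioning (the arrays, tail file and interval tree are omitted, the base case is a simplified stand-in for the 1-redundant structure, and all names are our own):

    def build(P, r, L, B):
        """Sketch of the inductive construction: partition P into
        rho = (NL/B)^(1/(r+1)) x-slabs, recursing with redundancy r-1
        inside each slab. Purely illustrative bookkeeping."""
        N = len(P)
        if r == 1 or N <= B // max(L, 1):
            # Simplified base case standing in for the 1-redundant structure.
            return {"slab": sorted(P, key=lambda p: (p[1], p[0]))}
        rho = max(2, round((N * L / B) ** (1.0 / (r + 1))))
        P = sorted(P, key=lambda p: p[0])        # x-sorted, then cut into
        size = max(1, -(-N // rho))              # rho slabs of ceil(N/rho)
        return {"rho": rho,
                "children": [build(P[i:i + size], r - 1, L, B)
                             for i in range(0, N, size)]}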

5. CONCLUDING REMARKS

After explaining why (linear-size) r-redundant structures matter for range search in practice, this paper presented new lower bound results revealing the tradeoff between query efficiency and the number r of times an object can be stored, for every constant integer r. In particular, we showed that those results hold under a weaker form of the indivisibility assumption, and thereby demonstrated (for the first time to our knowledge) how the strong indivisibility assumption can be eliminated from the analytical framework underlying the indexability theorem. We also proved the tightness of our lower bounds by describing indexes realizing the optimal tradeoff up to only a small factor.

Closing the gap between our lower and upper bounds would be an interesting problem for future work. Our current results do not rule out, for example, a 2-redundant structure with query time O((NL/B)^{1/3} / polylog_B N) plus the linear output cost. It is unknown whether such a structure exists.

The proposed r-redundant structure can be made fully dynamic with standard global-rebuilding techniques [14], however, at the cost of increasing the space and query time. The resulting index answers a query in O((NL/B)^{1/(r+1)} + KL/B) I/Os, supports an insertion and a deletion in O(log_B N) I/Os amortized, and occupies (1 + ε)rNL/B + O(N/B) space, where ε > 0 can be an arbitrarily small constant. How to remove the (1 + ε) factor, or prove its impossibility, remains open.

Acknowledgements This work was supported in part by (i) projects GRF 4169/09, 4166/10, 4165/11 from HKRGC, and (ii) the WCU (World Class University) program under the National Research Foundation of Korea, and funded by the Ministry of Education, Science and Technology of Korea (Project No: R31-30007).

References

[1] A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM (CACM), 31(9):1116–1127, 1988.

[2] L. Arge, V. Samoladas, and J. S. Vitter. On two-dimensional indexability and optimal range search indexing. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 346–357, 1999.

[3] L. Arge and J. S. Vitter. Optimal external memory interval management. SIAM Journal of Computing, 32(6):1488–1508, 2003.

[4] N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger. The R*-tree: An efficient and robust access method for points and rectangles. In Proceedings of ACM Management of Data (SIGMOD), pages 322–331, 1990.

[5] J. L. Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM (CACM), 18(9):509–517, 1975.

[6] B. Chazelle. Filtering search: A new approach to query answering. SIAM Journal of Computing, 15(3):703–724, 1986.

[7] R. Grossi and G. F. Italiano. Efficient splitting and merging algorithms for order decomposable problems. Information and Computation, 154(1):1–33, 1999.

[8] A. Guttman. R-trees: a dynamic index structure for spatial searching. In Proceedings of ACM Management of Data (SIGMOD), pages 47–57, 1984.

[9] J. M. Hellerstein, E. Koutsoupias, D. P. Miranker, C. H. Papadimitriou, and V. Samoladas. On a model of indexability and its bounds for range queries. Journal of the ACM (JACM), 49(1):35–55, 2002.

[10] J. M. Hellerstein, E. Koutsoupias, and C. H. Papadimitriou. On the analysis of indexing schemes. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 249–256, 1997.

[11] J. Iacono and M. Patrascu. Using hashing to solve the dictionary problem (in external memory). To appear in Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2012.

[12] K. V. R. Kanth and A. K. Singh. Optimal dynamic range searching in non-replicating index structures. In Proceedings of International Conference on Database Theory (ICDT), pages 257–276, 1999.

[13] E. Koutsoupias and D. S. Taylor. Tight bounds for 2-dimensional indexing schemes. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 52–58, 1998.

[14] M. H. Overmars. The Design of Dynamic Data Structures. Springer-Verlag, 1987.

[15] O. Procopiuc, P. K. Agarwal, L. Arge, and J. S. Vitter. Bkd-tree: A dynamic scalable kd-tree. In Proceedings of Symposium on Advances in Spatial and Temporal Databases (SSTD), pages 46–65, 2003.

[16] J. T. Robinson. The K-D-B-tree: A search structure for large multidimensional dynamic indexes. In Proceedings of ACM Management of Data (SIGMOD), pages 10–18, 1981.

[17] V. Samoladas and D. P. Miranker. A lower bound theorem for indexing schemes and its application to multidimensional range queries. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 44–51, 1998.

[18] M. Streppel and K. Yi. Approximate range searching in external memory. Algorithmica, 59(2):115–128, 2011.

[19] S. Subramanian and S. Ramaswamy. The p-range tree: A new data structure for range searching in secondary memory. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 378–387, 1995.

[20] E. Verbin and Q. Zhang. The limits of buffering: a tight lower bound for dynamic membership in the external memory model. In Proceedings of ACM Symposium on Theory of Computing (STOC), pages 447–456, 2010.