Interactive Volume Rendering of Large Sparse Data Sets Using Adaptive Mesh Refinement Hierarchies

Ralf Kähler, Mark Simon, and Hans-Christian Hege

Abstract—In this paper, we present an algorithm that accelerates 3D texture-based volume rendering of large, sparse data sets, i.e., data sets where only a fraction of the voxels contain relevant information. In texture-based approaches, the rendering performance is affected by the fill-rate, the size of texture memory, and the texture I/O bandwidth. For sparse data, these limitations can be circumvented by restricting most of the rendering work to the relevant parts of the volume. In order to efficiently enclose the corresponding regions with axis-aligned boxes, we employ a hierarchical data structure, known as an AMR (Adaptive Mesh Refinement) tree. The hierarchy is generated utilizing a clustering algorithm. A good balance is thereby achieved between the size of the enclosed volume, i.e., the amount to render in graphics hardware, and the number of axis-aligned regions, i.e., the number of texture coordinates to compute in software. The waste of texture memory caused by the power-of-two restriction is minimized by a 3D packing algorithm that arranges texture bricks economically in memory. Compared to an octree approach, the rendering performance is significantly increased and less parameter tuning is necessary.

Index Terms—3D texture mapping, hierarchical space partitioning, AMR tree, octree, sparse volume data.
1 INTRODUCTION
Three-dimensional imaging and computational science produce increasingly large volumetric data sets. In 3D imaging, where the number of voxels is proportional to the third power of the spatial resolution, the sizes of the data sets grow especially fast. While data volumes consisting of O(10^9) voxels are not unusual today, future imaging devices and large-scale simulations will create tera-scale data sets. Such huge data sets have to be handled and explored interactively—which remains a challenge despite the enormous development of computer hardware.
Due to the advent of powerful texturing hardware, slice-based volume rendering algorithms, either using 2D texture mapping [11] or 3D texture mapping [8], [29], became very popular. They obtain a moderate to good image quality, sufficient for many applications, and allow interactive frame rates on standard graphics hardware. The textures result from transfer functions that map the original data to color and opacity values. In the case of 3D texture mapping, the texture coordinates at the polygon's vertices are interpolated and determine a slice through the 3D texture. This approach leverages the trilinear interpolation hardware and has the additional advantage that textures need to be loaded only once, independent of the viewpoint. If the data set is too large to fit in texture memory, the full volume may be rendered in multiple passes, placing only portions of the volume data, so-called bricks, into the texture memory on each pass.
R. Kähler and H.-C. Hege are with the Konrad-Zuse-Institut für Informationstechnik Berlin, Division Scientific Computing, Department Visualization, Takustraße 7, D-14195 Berlin-Dahlem, Germany. E-mail: {kaehler, hege}@zib.de.
M. Simon is with Freie Universität Berlin, Department of Mathematics and Computer Science, Arnimallee 2-6, D-14195 Berlin, Germany. E-mail: [email protected].
Manuscript received 20 July 2001; revised 11 Dec. 2001; accepted 19 Dec. 2001. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number 114576.
For large data sets, the rendering performance of texture-based approaches is mainly limited by
- the fill-rate of the texture hardware, since this determines the time for displaying the data using an appropriate number of texture slices,
- the amount of texture memory, since this defines the number of passes necessary to render the whole volume, and
- the texture I/O bandwidth, since this (mainly) determines the time for placing new texture chunks in texture memory during each pass.
Often, just a small portion of the highly resolved data set is of interest to the user. This holds, for example, if the data set contains large transparent or homogeneous regions that do not contribute any information to the final image. For segmented image data, the user often needs to visualize just a subset of the various segmented regions. For instance, in biomedical visualization, only line-like structures such as vessel trees, neuron trees, trabeculae in bones, or filament structures in muscles have to be visualized. Thin structures, occupying only a tiny fraction of a volume, occur on all length scales in nature—ranging from chain molecules in chemistry to filaments of galaxies and galaxy clusters. Additionally, in numerical simulations, large computational volumes often have to be considered to take care of boundary conditions, though the interesting phenomena happen in very small spatial regions.
Sparsity of data can be exploited to increase the rendering performance by assigning individual 3D textures—usually also called "texture bricks"—only to regions that have been classified as relevant. The optimal coverage of these subvolumes would be achieved by assigning a texture to each
relevant elementary cell of the dual voxel grid. This, of course, would result in an enormous number of texture and polygon coordinates to be computed and specified, since every brick has to be intersected with the proxy geometries (mostly slices) to be rendered. Since these intersection computations are carried out in software, the rendering performance would decrease severely. A good balance between the volume enclosed by texture bricks and the number of created bricks is, therefore, crucial.
In this paper, we present an algorithm that achieves this balance by utilizing a hierarchical data structure, the so-called adaptive mesh refinement (AMR) tree, which was introduced by Berger and Colella in the 1980s for numerical gas dynamics simulations [5]. Our algorithm consists of the following steps:
- Recursively subsample the data and build up a pyramidal structure.
- Employ a signature-based clustering algorithm to create an AMR hierarchy that efficiently encloses the relevant regions by axis-aligned boxes.
- Assign a separate 3D texture brick to each leaf of the hierarchy.
- Minimize the waste of texture memory due to power-of-two restrictions (of the graphics API or graphics hardware) by utilizing a 3D packing algorithm.
For evaluation, we apply our algorithm to 3D image and simulation data sets with different characteristics and compare it with an octree-based algorithm. We show that significant gains in rendering performance are achieved for sparse data sets. The AMR-based algorithm has the additional advantage that it requires less parameter adjustment, i.e., less user interaction, since standard parameter settings already yield good rendering performance—almost independent of the topology and spatial distribution of the interesting subregions.
The paper is organized as follows: Section 2 surveys related work. Our approach is described in Section 3.1; the alternative octree-based approach is presented in Section 3.2. Sections 4.1 and 4.2 address the 3D texture-based rendering of AMR and octree hierarchies, respectively. In Section 5, we list performance results of the methods on different data sets. The results are discussed in Section 6 and future work is presented in Section 7.
2 RELATED WORK
The idea of accelerating volume rendering algorithms by omitting subregions that do not contribute to the final image has been pursued by many researchers. Levoy [17] presented a ray traversal algorithm that skips empty space by utilizing an octree data structure to encode the presence of nontransparent material. Danskin and Hanrahan [10] extended this approach by exploiting data homogeneity and accumulated opacity as well.
Laur and Hanrahan [15] proposed a splatting algorithm that works on a pyramidal representation of the volume and chooses the number of splats adaptively, according to user-supplied error criteria. Storing the data mean and root mean square at each node permits rendering by progressive refinement. Nodes within the user-specified tolerance are
rendered as single splats by utilizing texture mapping capabilities.
The octree data structure has also been employed in combination with 3D texture-based volume rendering approaches. LaMar et al. [14] introduced a multiresolution technique for interactive volume visualization of large data sets. They employ an octree representation and a node selection scheme that adapts the distance of texture slices for regions depending on their distance to the viewpoint. Further, they introduced the use of spherical shells as proxy geometries. Weiler et al. [28] expanded the octree approach by methods that avoid rendering artifacts due to discontinuous texture interpolation and varying sample distances at boundaries between regions with different resolutions. Boada et al. [7] presented an error- and importance-driven strategy for selecting a set of octree nodes from the full pyramidal structure. Fang et al. [12] introduced an approach for rendering deformable volume data utilizing an octree-encoded target volume.
Octree-based data structures have further been applied to the efficient handling of time-dependent data sets by Shen et al. [21]. They proposed a combination of a "spatial" octree and a binary "time" partitioning tree, called a time-space partitioning (TSP) tree. It effectively captures both the spatial and the temporal coherence in a time-varying field. First applied to accelerate raycasting algorithms, this work has been extended to hardware volume rendering using 3D texture mapping [22].
Though the octree data structure is easy to implement and has the nice feature of allowing a level-of-detail representation of the data volume, the spatial locations of the partitioning boundaries are not adapted to the underlying data contents. For space leaping approaches applied to large, sparse data sets, this often results in an unnecessarily high number of boxes and/or boxes that capture only very few relevant voxels.
In the following, we briefly sketch approaches for accelerating raycasting and raytracing that are not based upon octrees. The idea of exploiting the distance transform to speed up the background traversal by Zuiderveld et al. [30] has been extended by Cohen and Sheffer [9], who introduced so-called "proximity clouds" that store "uniformity information" (typically encoded in a space partitioning tree) directly in the voxel raster: Voxel data either contain a data value or information indicating how far incident rays may leap without missing important features. This technique does not apply to volumetric rendering via 3D textures since the volume is not subdivided into axis-aligned regions, as required by the texture hardware. Lee and Park [16] reduced ray-casting overheads by an adaptive block subdivision. Their algorithm applies a uniform space subdivision and then merges coherent uniform blocks in order to generate adaptively sized blocks that are efficient for space leaping. This approach is not efficient for volume rendering via 3D textures since it usually generates many boxes of voxel size and does not guarantee that the resulting blocks can be traversed in a view-consistent order.
Subramanian and Fussel [24] designed a raytracer that works efficiently when the data of interest is sparsely distributed through the volume. A kD-tree is generated from the data volume by a median-cut subdivision.
Fig. 2. Creating the pyramidal structure. The bottom line depicts the original data samples (on a 2D grid). The rows above indicate the coarser layers.
Fig. 1. Two-dimensional example of an AMR grid hierarchy. The parent-child relations are defined by spatial inclusion. Root grid A has one subgrid B, which again has two children (C, D).
Empty spaces are pruned off and the subdivision is repeated until each region contains exactly one voxel. Combining this approach with texture-based volume rendering would suffer from the simple median-cut subdivision strategy, which produces regions containing too many irrelevant voxels, or too many regions if the subdivision is carried out until voxel size is reached.
An approach for accelerating volume rendering via 3D textures by skipping empty regions has been proposed by Tong et al. [25]. The data volume is partitioned into equal-sized blocks that are pruned in one direction. The resulting blocks are merged to reduce texture I/O and rendered in parallel projection. Similar to the octree approach, this algorithm introduces partitioning axes independent of the underlying spatial data distribution and it is not clear which brick size to choose for optimal rendering performance.
Methods for multiresolution rendering of data given on AMR grids, e.g., resulting from AMR simulations of hyperbolic PDEs by finite difference algorithms, have been developed by Weber et al. [26], [27] and Ma [19]. In contrast, we describe an algorithm that adaptively creates an AMR hierarchy for scalar data given on uniform grids by evaluating the 3D contents based on relevance criteria, with the aim of fast texture-based volume rendering.
3 GENERATING THE HIERARCHIES
3.1 Generating the AMR Hierarchy
3.1.1 The AMR Data Structure
In numerical analysis, local refinement techniques and hierarchical schemes have become popular in recent years because they lead to more reliable results and allow the simulation of larger volumes as well as more complex phenomena. A special hierarchical scheme is Adaptive Mesh Refinement (AMR), introduced by Berger and Colella in the 1980s [5]. In this approach, the whole computational domain is covered by a coarse grid, representing the root node of the hierarchical data structure. In regions where higher resolution is required, finer subgrids are inserted in the root grid and represented as child nodes of the root node. Together, they define the second level of the hierarchy, increasing the resolution of their parent grid by some factor, usually referred to as the refinement factor. Fig. 1 shows a 2D example. This process repeats recursively until all grid
cells on the finest resolution level satisfy certain error criteria, dependent on the particular numerical problem. For simplification purposes, AMR schemes usually fulfill the following restrictions:
- The refinement factor is an integer, possibly different for the various spatial directions.
- The subgrids are axis-aligned, structured rectilinear meshes, consisting of hexahedral cells with constant edge lengths.
- Subgrids are completely contained within their parent grids.
- Subgrids begin and end on parent cell boundaries, which implies that parent grid cells are either completely refined or completely unrefined.
For volume rendering of large, sparse data sets via 3D textures, this hierarchical data structure is well suited because subgrids can be placed freely and can thereby exactly match subregions that contain relevant cells. Imagine, for example, the simple case of a rectangular region of opaque voxels in the center of a larger volume. The AMR structure can enclose this with just one child node and without covering many transparent cells. An octree would need at least eight subnodes, thereby introducing a higher number of interior boundaries, resulting in a larger number of texture coordinates to be computed. Of course, the question is how to enclose complex-shaped regions of relevant cells with a small number of axis-aligned bounding boxes without capturing too many nonrelevant cells. This is described in the next two sections.
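For illustration, the following Python sketch shows one possible representation of such an AMR node together with a check of the nesting restrictions listed above. It is illustrative only; all names and index conventions (origins given in cells of the node's own level, measured from the domain origin) are hypothetical simplifications, not the data structure of our implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple
import numpy as np

@dataclass
class AMRGrid:
    """One node of the AMR hierarchy: an axis-aligned, uniform subgrid."""
    origin: Tuple[int, int, int]            # lower corner, in cells of this node's level
    shape: Tuple[int, int, int]             # number of cells along x, y, z
    level: int                              # 0 = coarse root grid
    refine_factor: int = 2                  # integer refinement between levels
    children: List["AMRGrid"] = field(default_factory=list)
    data: Optional[np.ndarray] = None       # leaves keep a copy of the covered samples

    def respects_nesting(self, child: "AMRGrid") -> bool:
        """Child grids must start and end on parent cell boundaries and must be
        completely contained within the parent (see the restrictions above)."""
        r = self.refine_factor
        for p_o, p_n, c_o, c_n in zip(self.origin, self.shape, child.origin, child.shape):
            if c_o % r != 0 or c_n % r != 0:              # alignment with parent cells
                return False
            if c_o // r < p_o or (c_o + c_n) // r > p_o + p_n:
                return False
        return True
```

A hierarchy in this form maps directly onto the brick generation described next: each leaf later becomes one 3D texture brick.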
3.1.2 Detecting and Clustering Relevant Regions
In a preliminary step, we generate a pyramidal representation of the original data by subsampling with an integer factor. This coarsening factor is usually chosen as 2, but larger values (e.g., to reduce the amount of additional memory necessary for storing the hierarchy), possibly different for the three coordinate directions, are also admissible. Each cell on the coarser level stores the minimal and maximal values of the data samples in the subregion it covers, as shown in Fig. 2. We will refer to these different levels of resolution as "min-max" layers in the following. This procedure is repeated recursively until the number of data samples in one direction drops below some specified lower bound. If the number of cells of the original data set does not match a multiple of a power of the refinement factor, the remainder cells are covered by one coarser cell on the next layer. This might involve some cropping of the subgrids' bounding boxes when descending from a coarser
to a finer layer (see below) to make sure that no parts of the boxes exceed the original extent of the data volume.
After this preparation step, we build the hierarchy of axis-aligned boxes that tightly cover the relevant subvolumes. The coarsest resolution of the pyramid defines the root grid of the hierarchy. The cells of the root grid are inspected and the relevant ones are marked for refinement. In order to classify the voxels into relevant and irrelevant ones, an importance criterion is necessary. This might, for example, be based on opaqueness or on a voxel classification performed in a preceding segmentation step. In the first case, the colormap's alpha values in the interval associated with the minimal and maximal values stored in the coarse cells are examined and the cell is flagged if the threshold is exceeded. This is sufficient since, during rendering, trilinear interpolation is used in each cell and, thus, no values outside the min-max interval will be generated.
In the next step, subgrids are created by invoking a clustering algorithm. It splits grid cells that need refinement into axis-aligned rectilinear regions, subject to the following conditions: minimize the number of created boxes as well as the number of cells covered unnecessarily (since they do not need refinement). The algorithm will be presented in Section 3.1.3. The clustering procedure is applied recursively until the finest level of the min-max layers is reached.
As an additional optimization step, we perform a reclustering: The original data cells that are covered by the grids on the finest layer are inspected and clustered again using the algorithm presented in Section 3.1.3. The memory overhead of storing two additional values for each cell of the original data volume is thereby avoided without limiting the clustering granularity to the cell size of the first min-max layer. The data samples covered by these grids are copied and a reference is stored in the leaves of the tree. The pseudocode for these steps is given in Fig. 3 (reclustering is omitted for simplification purposes). Some care is necessary at the boundary faces of the grids; to avoid artifacts caused by discontinuities between adjacent grids during the rendering via 3D textures, one has to ensure that they share one row of data samples at their common boundary faces [14].
Notice that one could alternatively cluster directly on the original data volume, without first creating the pyramid, but this would increase the preprocessing time because the clusterer works faster on smaller grids. As discussed in [6], the running time is O(k(P + M)), where k is the total number of grids upon termination of the algorithm, P is the number of flagged cells, and M is related to the determination of the inflection points, see Section 3.1.3. Dividing the preprocessing step into pyramid creation and clustering is advantageous since only the clustering step has to be repeated if the colormap is changed, which usually happens several times during a visualization session. Notice also that this hierarchical approach offers the possibility of obtaining a multiresolution representation of the interesting regions by applying appropriate averaging methods in order to compute the data samples on the coarser grids.
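To make the preprocessing step concrete, the following Python/NumPy sketch builds the min-max layers and flags coarse cells with an opacity-based criterion. It is illustrative only: the coarsening factor is a scalar 2 per axis, remainder cells are simply cropped instead of being covered by an extra coarse cell, and the colormap is represented by a hypothetical vectorized function alpha_of.

```python
import numpy as np

def build_min_max_pyramid(volume, factor=2, min_dim=16):
    """Build the coarser 'min-max' layers: each coarse cell stores the minimum
    and maximum of the data samples it covers on the next finer layer."""
    layers = []                                   # coarse layers only, finest first
    lo, hi = volume, volume
    while min(lo.shape) // factor >= min_dim:
        n0, n1, n2 = (s // factor for s in lo.shape)
        crop = (slice(0, n0 * factor), slice(0, n1 * factor), slice(0, n2 * factor))
        lo = lo[crop].reshape(n0, factor, n1, factor, n2, factor).min(axis=(1, 3, 5))
        hi = hi[crop].reshape(n0, factor, n1, factor, n2, factor).max(axis=(1, 3, 5))
        layers.append((lo, hi))
    return layers                                 # layers[-1] is the coarsest (root) layer

def flag_relevant(lo, hi, alpha_of, threshold=0.03, samples=8):
    """Flag a coarse cell if the colormap opacity exceeds 'threshold' somewhere
    in the cell's [min, max] value interval (sampled here for simplicity; the
    paper examines the colormap's alpha values over that whole interval)."""
    flags = np.zeros(lo.shape, dtype=bool)
    for t in np.linspace(0.0, 1.0, samples):
        flags |= np.asarray(alpha_of(lo + t * (hi - lo))) > threshold
    return flags
```

Applied to one layer of the pyramid, flag_relevant yields the boolean array of cells that the clustering step of Section 3.1.3 operates on.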
3.1.3 The Clustering Algorithm
An efficient and fast algorithm for clustering cells into axis-aligned regions was suggested by Berger and Rigoutsos [6], adopting signature-based methods used in computer vision and pattern recognition.
Fig. 3. Pseudocode for creating the AMR hierarchy via clustering.
We will describe the basic ideas briefly in this section. For more details, the reader might refer to [6].
Assume that the information about the flagged cells is encoded by the function

F : [1, \ldots, l] \times [1, \ldots, m] \times [1, \ldots, n] \to \{0, 1\},

with F(i, j, k) = 1 if the cell with index (i, j, k) is flagged. For each slice perpendicular to the i-j, i-k, and j-k planes, the number of cells that need refinement is computed and stored in so-called signature lists S. For example, the entry for slice number i parallel to the j-k plane is given by

S_{jk}(i) = \sum_{j=1}^{m} \sum_{k=1}^{n} F(i, j, k)
and, similarly, for the two other orientations. A 2D example is shown in Fig. 4a. In the next step, exterior zero entries in these lists are detected and pruned off in order to place a minimal bounding box around the flagged cells (Fig. 4b). Any interior zero entry in these lists indicates a potential splitting index, i.e., a position at which the given volume is subdivided into two smaller subregions.
Fig. 4. Two-dimensional example of clustering: First signature lists are computed (a). Exterior rows and columns with zero entries are pruned off (b). Interior zero entries and inflection points indicate splitting edges (c).
If all signatures are nonzero, the Laplacian second derivative

\Delta_{jk}(i) = S_{jk}(i+1) - 2 S_{jk}(i) + S_{jk}(i-1),

and similarly \Delta_{ik}(j) and \Delta_{ij}(k), of each signature list is computed and the most pronounced inflection point is taken as the splitting plane (Fig. 4c). This procedure is repeated recursively on the newly created subregions until one of the following halting criteria is satisfied:
- The subregion exceeds some efficiency ratio, i.e., the ratio of the number of its cells needing refinement to its total number of cells is greater than a preselected value between 0 percent and 100 percent.
- The further subdivision of the region would result in grid dimensions smaller than some minimal extension.
As a result, the data volume is partitioned in a kD-tree manner, which will be relevant for the back-to-front traversal of the generated hierarchy in the rendering step (see Section 4.1.2). We found that efficiency values between 0.8 and 1.0 and a minimal extension bound of 15 gave good results for the rendering performance, see Section 5.3. Our implementation of the clustering algorithm was inspired by Walker's version [1] that is used in the AMR software package DAGH [2].
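The following Python sketch condenses the clustering loop for the flag array of one min-max layer. It is a simplified rendition of the scheme of Berger and Rigoutsos [6], not the DAGH-derived implementation used in our system: it computes the signature lists, prefers interior zero entries as split positions, otherwise splits at the strongest sign change of the second derivative, and stops on the efficiency and minimal-extension criteria.

```python
import numpy as np

def signatures(box):
    """Signature lists: number of flagged cells in each slice along axis 0, 1, 2."""
    return [box.sum(axis=tuple(a for a in range(3) if a != ax)) for ax in range(3)]

def choose_cut(box, min_extent):
    """Pick (axis, index) for splitting: an interior zero of a signature list if
    one exists, otherwise the strongest inflection of S(i+1) - 2 S(i) + S(i-1)."""
    best = (None, None, -1)
    for ax, sig in enumerate(signatures(box)):
        n = len(sig)
        if n < 2 * min_extent:
            continue                                     # both halves must stay large enough
        zeros = np.nonzero(sig[min_extent:n - min_extent] == 0)[0]
        if zeros.size:
            return ax, int(zeros[0]) + min_extent
        lap = sig[2:] - 2 * sig[1:-1] + sig[:-2]         # discrete second derivative
        for i in range(len(lap) - 1):
            cut = i + 2                                  # boundary between lap[i] and lap[i+1]
            if cut < min_extent or n - cut < min_extent:
                continue
            if lap[i] * lap[i + 1] <= 0:                 # sign change = inflection point
                strength = abs(int(lap[i + 1]) - int(lap[i]))
                if strength > best[2]:
                    best = (ax, cut, strength)
    return best[0], best[1]

def cluster(flags, origin=(0, 0, 0), efficiency=0.85, min_extent=15):
    """Cluster a 3D boolean array of flagged cells into axis-aligned boxes.
    Returns a list of (lower_corner, upper_corner) index pairs."""
    if not flags.any():
        return []
    idx = np.nonzero(flags)                              # shrink to minimal bounding box
    lo = np.array([i.min() for i in idx])
    hi = np.array([i.max() + 1 for i in idx])
    box = flags[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    origin = np.asarray(origin) + lo
    if box.mean() >= efficiency or max(box.shape) <= min_extent:
        return [(tuple(origin), tuple(origin + np.array(box.shape)))]
    ax, cut = choose_cut(box, min_extent)
    if ax is None:                                       # no admissible split found
        return [(tuple(origin), tuple(origin + np.array(box.shape)))]
    left, right = [slice(None)] * 3, [slice(None)] * 3
    left[ax], right[ax] = slice(0, cut), slice(cut, None)
    shift = np.zeros(3, dtype=int)
    shift[ax] = cut
    return (cluster(box[tuple(left)], origin, efficiency, min_extent) +
            cluster(box[tuple(right)], origin + shift, efficiency, min_extent))
```

In the full algorithm, this routine is applied level by level, so each resulting box is inspected and clustered again on the next finer min-max layer, as outlined in Fig. 3.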
3.2 Generating the Octree
We compare the AMR approach with an octree-based renderer, similar to the one described in [12]. First, the data volume is partitioned into bricks of preselectable dimensions. These should equal a power of two in order to minimize the amount of texture memory needed in the rendering step (see Section 4). Note that, just like in the AMR case, adjacent nodes need to share data samples at their common faces. These bricks define the leaf nodes of the full octree hierarchy. Only subbranches with leaf nodes that cover relevant regions of the data volume are inserted into the hierarchy.
As mentioned in the introduction, the rendering performance is affected by the size of the subvolumes to render and by the number of polygon and texture coordinates to be generated and, thus, by the number of bricks. Since the subvolumes of the leaf nodes are later defined as texture bricks (see Section 4), the rendering performance can be increased by reducing their number without
enlarging the volume to render. To do so, one checks whether all eight children of an octree node are leaves and, thus, have to be rendered anyway. In that case, they are cut off and the node becomes a new leaf. This reduces the number of bricks needed by seven and also decreases the amount of texture memory, since the overlapping boundaries are eliminated.
Another reduction of bricks is achieved by taking into account the ratio between the sum of the volumes covered by the leaves of a node and the node's volume itself. If this ratio exceeds a preselected threshold, the branch is cut and the node becomes a leaf node (see Fig. 5), which usually increases the volume to render. This threshold can be different for each level of the hierarchy. To find a good balance between the performance-limiting factors mentioned above, these parameters have to be adjusted carefully, as also noticed in [23]. Finally, the data volumes covered by the resulting leaves of the octree are copied and stored in the hierarchy's nodes.
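The two reduction rules can be summarized in the following Python sketch; it is illustrative only, the node fields are hypothetical, and, for brevity, a single ratio threshold is used instead of one per hierarchy level.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class OctreeNode:
    origin: Tuple[int, int, int]
    size: Tuple[int, int, int]
    children: List["OctreeNode"] = field(default_factory=list)
    covered: float = 0.0                     # relevant volume covered by leaves below

    @property
    def volume(self) -> float:
        return float(self.size[0] * self.size[1] * self.size[2])

def prune(node: OctreeNode, ratio_threshold: float = 1.0) -> float:
    """Collapse branches: if all eight children are leaves, or if the volume
    covered by the leaves below exceeds ratio_threshold times the node's
    volume, cut the branch and let the node itself become a leaf (one texture
    brick). Returns the volume covered below this node."""
    if not node.children:
        return node.covered
    covered = sum(prune(child, ratio_threshold) for child in node.children)
    all_eight_leaves = len(node.children) == 8 and all(not c.children for c in node.children)
    if all_eight_leaves or covered / node.volume > ratio_threshold:
        node.children = []                   # the whole node is rendered as one brick
        covered = node.volume
    node.covered = covered
    return covered
```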
4 RENDERING
We render the created subvolumes utilizing the 3D texture mapping approach. Individual 3D textures are assigned to the separate subvolumes and are clipped against equidistant slices parallel to the viewplane. The slices are then blended back-to-front into the frame buffer. The volume rendering routines for both the AMR and the octree hierarchies are implemented in AMIRA [3], a 3D visualization system developed at ZIB. One-channel textures and the OpenGL color table extension were used [20]. In the following, we describe the differences during rendering for both types of hierarchies, AMR tree and octree.
Fig. 5. Subbranch of the octree: The ratio of the volume of leaf nodes B and D to the volume of the root A is 5/16.
Fig. 6. Two-dimensional example of the next-fit-decreasing-height packing heuristic: Subvolumes are inserted from left to right, starting at the lower left corner.
4.1 Rendering the AMR Hierarchy
The graphics hardware requires the dimensions of 3D textures to be powers of two. This could be achieved by extending the data subvolume of each leaf grid of the AMR hierarchy to the next larger power of two, for example, by clamping the boundary texels and restricting the generated texture coordinates to the unextended area. Given the potentially large number of textures to deal with, this would typically result in a large overhead of unused texture memory. We decided to reduce this overhead by utilizing a packing algorithm that inserts the texture bricks into one bigger texture.

4.1.1 Packing the Leaf Bricks
For our purposes, the following variant of the three-dimensional packing problem is appropriate: Pack a given number of rectilinear boxes into one container with fixed width and depth such that its height is minimized. For a more formal definition, see [18]. This problem belongs to the class of NP-hard problems, but a couple of useful heuristics have been suggested. We adopt a simple level-by-level, layer-by-layer packing scheme, a three-dimensional version of the next-fit-decreasing-height (NFDH) algorithm [13]. Notice that more effective packing algorithms are known, see, for example, [18]. We just want to illustrate that the amount of texture memory can be reduced by three-dimensional packing. The outline of our AMR approach does not depend on the particular heuristic chosen and better ones could be employed without affecting the rest of the presented algorithm.
First, the boxes are inserted into a list in order of decreasing height. The packing algorithm starts at the lower left-hand corner of the container and inserts the boxes from left to right until the right border is reached. Then, a new row is opened, with a depth coordinate given by the largest depth of the already inserted boxes. This procedure is repeated until the lowest layer of the container is filled. Then, a new layer is opened and this process continues until all boxes are inserted. See Fig. 6 for a 2D example.
We iterate this procedure with different values for the base layer extensions of the container, chosen as powers of two. For the resulting containers, the height is extended to the next power of two and the one with the smallest volume is taken.
Fig. 7. A 3D example of axis-aligned bounding boxes that form a cycle from the considered viewpoint.
Then, a 3D texture of this size is defined with the subtextures inserted at their computed positions. For each brick, its offset position in the merged texture is stored.
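A condensed Python sketch of this packing heuristic is given below; it is illustrative only (not the AMIRA implementation), bricks are assumed to be (width, depth, height) triples, and each brick is assumed to fit into the chosen base area.

```python
def next_pow2(n: int) -> int:
    p = 1
    while p < n:
        p *= 2
    return p

def pack_nfdh(bricks, base_w, base_d):
    """Pack bricks (w, d, h) into a container of fixed width and depth.
    Bricks are placed in rows from left to right; a full row opens a new row
    behind it, a full layer opens a new layer on top. Returns the offsets
    (in input order) and the used container height."""
    order = sorted(range(len(bricks)), key=lambda i: bricks[i][2], reverse=True)
    offsets = [None] * len(bricks)
    x = y = z = 0                          # x: width cursor, y: depth cursor, z: layer base
    row_depth = layer_height = 0
    for i in order:
        w, d, h = bricks[i]
        if x + w > base_w:                 # current row is full
            x, y = 0, y + row_depth
            row_depth = 0
        if y + d > base_d:                 # current layer is full
            x, y, z = 0, 0, z + layer_height
            row_depth = layer_height = 0
        offsets[i] = (x, y, z)
        x += w
        row_depth = max(row_depth, d)
        layer_height = max(layer_height, h)  # fixed by the first (tallest) brick of a layer
    return offsets, z + layer_height

# As described above, one would repeat this for several power-of-two base sizes,
# round the resulting height up with next_pow2(), and keep the container with
# the smallest total volume.
```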
4.1.2 Back-to-Front Rendering
Even for axis-aligned subvolumes, it is in general not possible to render them back-to-front brick-by-brick, because occlusion cycles might occur when looking from certain viewpoints, see Fig. 7. However, since the clustering algorithm described in Section 3.1.3 generates a kD-tree partitioning of the data volume, no cycles occur in this AMR approach. Thus, we use this kD-tree in order to traverse the generated bricks in the correct order. At each node, the actual viewpoint is compared to the node's split position and the two child nodes are visited in a back-to-front manner. If the node is a leaf, the associated subvolume is rendered slice by slice.
Computing the intersection points of the slices and the bricks' bounding boxes is done in software. We speed up this procedure by first determining, for each brick, the interval of slices that intersect it. This is accomplished by projecting the bounding box corners of the brick onto the plane's normal direction, as shown in Fig. 8. The intersections need to be computed only for this subset of slices. For each such slice, one first determines which bounding box corners lie above and which below the oriented plane by comparing the projections of the plane center and of the box corners onto the plane's normal vector. A lookup table stores, for each configuration, which edges are intersected and their correct order (needed for the definition of the associated polygon). The intersection coordinates are computed by linear interpolation between the endpoints of these edges.
Fig. 8. Computing the interval of slices intersecting a brick by projecting its corners onto the slices’ normal direction.
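The slice-interval determination and the back-to-front kD-tree traversal can be summarized in the following Python sketch. It is illustrative only: the KDNode fields are hypothetical, the slicing planes are assumed to lie at signed distances d0 + k * spacing along the unit view direction, and the left child is assumed to cover coordinates below the split value.

```python
from dataclasses import dataclass
from typing import Optional
import itertools
import numpy as np

def slice_interval(box_min, box_max, view_dir, d0, spacing):
    """Range of slice indices that can intersect an axis-aligned brick, found
    by projecting its eight corners onto the slicing planes' normal (Fig. 8)."""
    corners = np.array(list(itertools.product(*zip(box_min, box_max))))
    proj = corners @ np.asarray(view_dir, dtype=float)
    k_min = int(np.ceil((proj.min() - d0) / spacing))
    k_max = int(np.floor((proj.max() - d0) / spacing))
    return k_min, k_max        # intersections only need to be computed for these slices

@dataclass
class KDNode:
    axis: int = 0              # split axis of inner nodes
    split: float = 0.0         # split coordinate; 'left' covers coordinates below it
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None
    brick: object = None       # leaf payload: the texture brick to render

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

def render_back_to_front(node, eye, draw_brick):
    """Traverse the kD-tree from the clustering step: at each inner node the
    child on the far side of the splitting plane is visited first, so bricks
    are blended in a view-consistent back-to-front order."""
    if node.is_leaf:
        draw_brick(node.brick)
        return
    if eye[node.axis] < node.split:
        near, far = node.left, node.right
    else:
        near, far = node.right, node.left
    render_back_to_front(far, eye, draw_brick)
    render_back_to_front(near, eye, draw_brick)
```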
4.2 Rendering the Octree
Since most leaf nodes of the octree already have dimensions equal to a power of two, each of them is converted into a separate texture brick. Merging them into one big texture would generally increase the amount of texture memory due to the nonoptimal packing scheme.
Each octree brick is rendered separately in a back-to-front traversal. The view-consistent order can easily be determined by a table lookup at each node, which returns the correct order in which to visit the children with respect to the actual viewpoint [4]. As in the AMR case, the interval of intersecting slices is precomputed for each brick and the intersection points are determined using the fast table-lookup approach mentioned above.
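Such a table lookup can be illustrated by the following small sketch, which derives a valid back-to-front child order directly from the viewpoint's octant; the bitwise numbering of the octants is an assumed convention, not necessarily the one used in [4].

```python
def octant_visit_order(eye, center):
    """Back-to-front order of the eight children of an octree node.
    Child index c encodes its octant bitwise: bit 0 set = upper x half,
    bit 1 = upper y half, bit 2 = upper z half (an assumed convention).
    A child can only occlude children that differ from the viewpoint's octant
    in fewer bits, so sorting by that bit distance (descending) yields a
    valid back-to-front sequence; the child containing the eye is drawn last."""
    front = sum(1 << a for a in range(3) if eye[a] >= center[a])
    return sorted(range(8), key=lambda c: bin(c ^ front).count("1"), reverse=True)

# Example: a viewpoint in the upper x/y/z octant of the node.
# octant_visit_order((5.0, 4.0, 3.0), (0.0, 0.0, 0.0)) starts with child 0
# (the opposite corner) and ends with child 7 (the child containing the eye).
```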
5 RESULTS
5.1 Setting
We applied our algorithm to several data sets with decreasing degrees of sparseness. The performance was tested on an SGI Onyx2 InfiniteReality2 with two RM7 raster managers with 64 MB texture memory. The runs were performed on a single 195 MHz MIPS R10k processor.
Since texture-based volume rendering is fill-rate limited, the frame rates depend on the size of the viewer window, the number of slices, and the area in screen space covered by the data volume (and, thus, on the actual position of the viewpoint). We averaged the frame rates over several positions inside and outside the data volume by choosing viewpoints located on different circles with varying radii and orientations, as indicated in Fig. 9. For all examples, the size of the rendered images was 764 × 793 pixels; the numbers of slices are listed below.

Fig. 9. Some of the camera positions used for determination of the average rendering speed.
5.2 Input Data
In the examples, we used the opacity value as the importance criterion. Cells with an associated opacity value greater than thres = 0.03 were marked as relevant.
Data sets I and II are confocal microscopy images of neurons inside a honey bee's brain. About 0.1 percent, respectively 0.2 percent, of the cells were marked as relevant and the volumes were rendered with 1,200 slices. For data set II, a greater threshold of thres = 0.1 was used in order to eliminate noise contained in the microscopy image. Data set III represents a part of a human vascular tree. Here, 1.2 percent of the cells were tagged as relevant. This data set was also rendered with 1,200 slices. Example IV contains data from a molecular dynamics simulation and conformational analysis. About 14 percent of the cells were marked as relevant. The volume was rendered with 320 slices. This is the only data set that fit into texture memory without bricking, i.e., in the standard volume rendering approach, the frame rate was dominated by the fill rate. The last example is a nonsparse data set containing 23 percent relevant cells, which was rendered with 900 slices.
Images of the different data sets are shown in Figs. 10, 11, 12, 13, and 14. The left image of each figure shows an example volume rendering, the images in the middle depict the bounding boxes of the octree, and the right images display the AMR bounding boxes.
Fig. 10. Data set I: Confocal image of a bee’s neuron. (a) Standard, (b) octree, and (c) AMR tree.
Fig. 11. Data set II: Confocal image of a bee’s brain. (a) Standard, (b) octree, and (c) AMR tree.
Fig. 12. Data set III: Ultrasound image of a vascular tree of a human liver. (a) Standard, (b) octree, and (c) AMR tree.
Fig. 13. Data set IV: Configuration density of a metastable molecule conformation. (a) Standard, (b) octree, and (c) AMR tree.
5.3 Measured Results and Discussion
The statistics are displayed in Tables 1, 2, 3, 4, and 5. For each data set, we list the results for standard volume rendering, for the octree, and the AMR approach. The first rows ("Standard") show the results for the standard volume rendering approach.
Fig. 14. Data set V: Confocal image of a bee brain. (a) Standard, (b) octree, and (c) AMR tree.
Rows labeled "Octree I" contain the octree results with leaf dimensions that yield optimal frame rates; the ratio threshold was set to 1. Rows labeled "Octree II" show the best octree results we achieved by adjusting the ratio threshold parameters. For the definition of these parameters, see Section 3.2. The fourth rows ("AMR I") display AMR results with the clusterer's efficiency parameter (as defined in Section 3.1.3) set to 0.85 and a minimal extension bound of 15. The last rows, labeled "AMR II," report the optimal results achieved by adjusting these parameters for each data set.
The tables' columns list the depth of the hierarchies, the number of created texture bricks, the percentage of the data volume covered by them, the amount of texture memory (after extension to the next power of two and, in the AMR case, packing into one texture), the preprocessing times, and, finally, the frame rates. On the InfiniteReality2, the texel size is 2 bytes, so the internal size of the textures is twice the number given in the tables. Note that, for data sets I, II, III, and IV, the volume entries for the standard case give values greater than 100 percent because the bricks share a common row of data samples at their boundaries. For the standard approach, the preprocessing times are just the times to allocate and define the texture or textures (in cases where bricking is necessary). The preprocessing times for the AMR hierarchies are split into the part needed for the resampling and a second part for clustering and packing. Note that only the
clustering and packing step has to be updated when the colormap is changed.
The number of texture bricks created in the AMR case is much smaller than the number of bricks created by the corresponding octree hierarchy, see Tables 1, 2, 3, 4, and 5; also, fewer nonrelevant cells are enclosed, especially for the sparse data sets I, II, and III. Thus, the number of intersection computations as well as the rendered area in screen space are drastically reduced, resulting in significant performance gains for the AMR approach for data sets I, II, and III. For these examples, the best AMR results needed less texture memory than the octree results with optimal parameters. This is due to the efficient clustering, which captures only a few nonrelevant cells, and the smaller number of bricks, resulting in fewer duplicated texels on common boundaries of bricks, see Section 3.1.2. For data set IV, the amount of texture memory needed was almost equal. For the nonsparse data set V, the amount was twice as high as for the octree case. Here, the sizes of the covered volume are comparable for octree and AMR (choosing a better packing heuristic could improve this for AMR). The frame rates for these two data sets are also comparable, since the performance is dominated by the fill-rate limitation.
TABLE 1 Data Set I: Neurons Inside a Bee Brain Containing 654 × 993 × 200 Voxels and Rendered with 1,200 Slices
TABLE 2 Data Set II: Neurons Inside a Bee Brain Containing 566 × 990 × 200 Voxels and Rendered with 1,200 Slices
TABLE 3 Data Set III: Vascular Tree Containing 528 × 574 × 700 Voxels and Rendered with 1,200 Slices
6 CONCLUSIONS
We presented an algorithm based on AMR trees that accelerates the 3D texture-based volume rendering of large, sparse data sets. We compared this approach with an octree-based algorithm. These hierarchical techniques achieve acceleration rates of up to two orders of magnitude compared to the standard method.
The AMR approach obtains a good balance between the number of boxes needed for tightly capturing the relevant voxels of the data volume and the number of covered nonrelevant ones. The number of created texture bricks is much smaller than the number of bricks created by the corresponding octree hierarchy. Thus, the number of intersection computations, as well as the rendered area in screen space, are drastically reduced, resulting in significant performance gains for the AMR approach on the sparse data sets. Furthermore, the frame rates for the standard and optimal parameter settings in the AMR cases are similar for all data sets. This shows that the standard setting usually yields good performance and, hence, almost no user interaction is necessary for finding good rendering parameters. In contrast, the optimal parameter settings for the octree-based algorithm vary strongly, even for similarly structured data sets, and are therefore difficult to predict. Adjusting the parameters typically requires a higher amount of user interaction. We reduced the additional texture memory necessary due to the power-of-two restrictions by applying a 3D packing heuristic.
To summarize: The AMR-based algorithm yields the best results for sparse and large data sets. For this kind of data, it benefits from the capability of covering complex-shaped domains of relevant voxels with fewer boxes. This results in significant performance gains since the costs for intersection computation and raster operations are comparable in these cases.

TABLE 4 Data Set IV: Molecular Conformation Data Set Containing 131 × 130 × 101 Voxels and Rendered with 320 Slices
TABLE 5 Data Set V: Bee Brain Data Set Containing 749 × 495 × 100 Voxels and Rendered with 900 Slices
For the nonsparse data sets IV and V, the AMR-based algorithm achieves frame rates comparable to those of the octree-based algorithm since, here, the performance is limited mainly by the fill-rate of the texture hardware.
7 FUTURE WORK
We are planning to enhance the presented algorithm to also take accumulated opacities into account. In order to be prepared for the large-scale simulations and huge image data sets expected in the near future, we will examine the use of better packing algorithms, like the one presented in [18]. It would also be interesting to test other clustering algorithms and compare the preprocessing times and the qualities of the resulting hierarchies.
Currently, we are extending our algorithm for rendering data sets from AMR simulations, i.e., where an AMR hierarchy has already been established by the numerical solver. Here, other requirements have to be considered, e.g., unrefined areas have to be partitioned without introducing cycles. Furthermore, one has to deal with the problem of opacity corrections and artifacts on the boundaries of grids with different levels of resolution.
ACKNOWLEDGMENTS
The authors would like to thank Malte Zöckler for the idea and implementation of the lookup-table-based intersection algorithm sketched in Section 4.1.2, and Johannes Schmidt-Ehrenberg, Detlev Stalling, Tom Goodale, and Werner Benger for proofreading and fruitful discussions. The single neuron and bee brain data sets were provided by Robert Brandt (research group Randolf Menzel, Freie Universität Berlin), the vascular tree data set by Thomas Lange (research group Peter Schlag, Robert-Rössle-Klinik and Universitätsklinikum Charité, Berlin), and the molecular data set by Johannes Schmidt-Ehrenberg, Daniel Baum, and Frank Cordes (Zuse Institute Berlin). This work was supported in part by the Max-Planck-Institut für Gravitationsphysik (Albert-Einstein-Institut) in Potsdam, Germany.
REFERENCES
[1] http://jean-luc.ncsa.uiuc.edu/Codes/3DCluster, 2001.
[2] http://www.caip.rutgers.edu/parashar/DAGH, 1998.
[3] Amira User's Guide and Reference Manual, Amira Programmer's Guide, Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB) and Indeed—Visual Concepts GmbH, Berlin, http://www.amiravis.com, 2001.
[4] W.G. Aref and H. Samet, "An Algorithm for Perspective Viewing of Objects Represented by Octrees," Computer Graphics Forum, vol. 14, no. 1, pp. 59-66, 1995.
[5] M.J. Berger and P. Colella, "Local Adaptive Mesh Refinement for Shock Hydrodynamics," J. Computational Physics, vol. 82, no. 1, pp. 64-84, 1989.
[6] M.J. Berger and I. Rigoutsos, "An Algorithm for Point Clustering and Grid Generation," IEEE Trans. Systems, Man, and Cybernetics, vol. 21, no. 5, 1991.
[7] I. Boada, I. Navazo, and R. Scopigno, "Multiresolution Volume Visualization with a Texture-Based Octree," The Visual Computer, vol. 17, no. 5, pp. 185-197, 2001.
[8] B. Cabral, N. Cam, and J. Foran, "Accelerated Volume Rendering and Tomographic Reconstruction Using Texture Mapping Hardware," Proc. 1994 Symp. Volume Visualization, A. Kaufman and W. Krueger, eds., pp. 91-98, 1994.
[9] D. Cohen and Z. Sheffer, "Proximity Clouds: An Acceleration Technique for 3D Grid Traversal," The Visual Computer, vol. 11, pp. 27-38, 1994.
[10] J.M. Danskin and P. Hanrahan, "Fast Algorithms for Volume Ray Tracing," Proc. Workshop Volume Visualization, pp. 91-98, Oct. 1992.
[11] R.A. Drebin, L. Carpenter, and P. Hanrahan, "Volume Rendering," Computer Graphics (Proc. SIGGRAPH 88), vol. 22, no. 4, pp. 65-74, Aug. 1988.
[12] S. Fang, R. Srinivasan, S. Huang, and R. Raghavan, "Deformable Volume Rendering by 3D Texture Mapping and Octree Encoding," Proc. IEEE Visualization, R. Yagel and G.M. Nielson, eds., pp. 73-80, 1996.
[13] D.S. Johnson, E.G. Coffman Jr., M.R. Garey, and R.E. Tarjan, "Performance Bounds for Level-Oriented Two-Dimensional Packing Algorithms," SIAM J. Computing, vol. 9, pp. 808-826, 1980.
[14] E.C. LaMar, B. Hamann, and K.I. Joy, "Multiresolution Techniques for Interactive Texture-Based Volume Visualization," Proc. IEEE Visualization, D. Ebert, M. Gross, and B. Hamann, eds., pp. 355-362, 1999.
[15] D. Laur and P. Hanrahan, "Hierarchical Splatting: A Progressive Refinement Algorithm for Volume Rendering," Computer Graphics (Proc. SIGGRAPH 91), vol. 25, no. 4, pp. 285-288, July 1991.
[16] C.H. Lee and K.H. Park, "Fast Volume Rendering Using Adaptive Block Subdivision," Pacific Graphics, Oct. 1997.
[17] M. Levoy, "Efficient Ray Tracing of Volume Data," ACM Trans. Graphics, vol. 9, no. 3, pp. 245-261, July 1990.
[18] K. Li and K.-H. Cheng, "On Three-Dimensional Packing," SIAM J. Computing, vol. 19, no. 5, pp. 847-867, 1990.
[19] K.-L. Ma, "Parallel Rendering of 3D AMR Data on the SGI/Cray T3E," Proc. Seventh Symp. Frontiers of Massively Parallel Computation, pp. 138-145, 1999.
[20] M. Segal and K. Akeley, The OpenGL Graphics System (Version 1.3), http://www.opengl.org/developers/documentation, 2001.
[21] H.-W. Shen, L.-J. Chiang, and K.-L. Ma, "A Fast Volume Rendering Algorithm for Time-Varying Fields Using a Time-Space Partitioning (TSP) Tree," Proc. IEEE Visualization, D. Ebert, M. Gross, and B. Hamann, eds., pp. 371-378, 1999.
[22] H.-W. Shen, D. Ellsworth, and L.-J. Chiang, "Accelerating Time-Varying Hardware Volume Rendering Using TSP Trees and Color-Based Error Metrics," Proc. Visualization and Graphics Symp., pp. 119-128, 2000.
[23] R. Srinivasan, S. Fang, and S. Huang, "Multi-Object Volume Rendering Algorithm by Octree Projection," Proc. SPIE Conf. Image Display, vol. 3335, pp. 462-468, 1998.
[24] K.R. Subramanian and D.S. Fussel, "Applying Space Subdivision Techniques to Volume Rendering," Proc. IEEE Visualization, pp. 150-159, 1990.
[25] X. Tong, W. Wang, W. Tsang, and Z. Tang, "Efficiently Rendering Large Volume Data Using Texture Mapping Hardware," Proc. Joint EUROGRAPHICS—IEEE TVCG Symp. Visualization, May 1999.
[26] G.H. Weber, H. Hagen, B. Hamann, K.I. Joy, T.J. Ligocki, K.-L. Ma, and J. Shalf, "Visualization of Adaptive Mesh Refinement Data," Proc. IS&T/SPIE Electronic Imaging, vol. 4302, 2001.
[27] G.H. Weber, O. Kreylos, T.J. Ligocki, J.M. Shalf, H. Hagen, B. Hamann, and K.I. Joy, "High-Quality Volume Rendering of Adaptive Mesh Refinement Data," Proc. Vision, Modeling, and Visualization, pp. 121-128, 2001.
[28] M. Weiler, R. Westermann, C. Hansen, K. Zimmerman, and T. Ertl, "Level-of-Detail Volume Rendering via 3D Textures," Proc. IEEE Volume Visualization and Graphics Symp., pp. 7-13, 1994.
[29] O. Wilson, A. VanGelder, and J. Wilhelms, "Direct Volume Rendering via 3D Textures," Technical Report UCSC-CRL-94-19, Univ. of California, Santa Cruz, 1994.
[30] K. Zuiderveld, A. Koning, and M. Viergever, "Acceleration of Ray-Casting Using 3D Distance Transforms," Proc. Visualization in Biomedical Computing, pp. 324-335, 1992.

Ralf Kähler studied physics at the Free University of Berlin, where he received the MS degree in 1999. Afterward, he joined the Konrad-Zuse-Zentrum Berlin (ZIB) as a research scientist in the Scientific Visualization Department. His research interests include computer graphics and data visualization. Currently, he is working on his PhD thesis about hierarchical methods in volume rendering.
Mark Simon was a member of the Scientific Visualization Research Group at the Zuse Institute Berlin from 1998 to 2001. He studies computer science at the Free University of Berlin and is currently working on his diploma thesis in the field of computer vision. His current research interests include computer graphics, artificial intelligence, and robotics.
Hans-Christian Hege is head of the Scientific Visualization Department at Zuse Institute Berlin (ZIB, www.zib.de). He studied physics at the Free University of Berlin. From 1984 to 1989, he was a research assistant in the Physics Department, working in quantum field theory and numerical physics. In 1986, he cofounded Mental Images and GDS. From 1986 to 1989, he worked as researcher at Mental Images and as managing director at GDS. Since 1989, he has been with ZIB, a nonuniversity research institute of the State of Berlin operating in the field of information technology. At ZIB, he built up and now directs the Scientific Visualization Department. In parallel, he acts as managing director of Indeed—Visual Concepts, a company specializing in data visualization, which he cofounded in 1999. His current research interests are in computer graphics, data visualization, image analysis, and biomedical computing.