International Journal of Computer Vision 65(3), 163–174, 2005
© 2005 Springer Science + Business Media, Inc. Manufactured in The Netherlands.
Measuring the Information Content of Fracture Lines

HELENA C.G. LEITÃO
Institute of Computing, Fluminense Federal University, Rua Passo da Pátria 156, 24210-240, Niterói, RJ, Brazil
[email protected]

JORGE STOLFI
Institute of Computing, State University of Campinas, Caixa Postal 6176, 13083-970 Campinas, SP, Brazil
[email protected]

Received April 1, 2003; Revised February 28, 2005; Accepted April 11, 2005
Abstract. Reassembling unknown broken objects from a large collection of fragments is a common problem in archaeology and other fields. Computer tools have recently been developed, by the authors and by others, which try to help by identifying pairs of fragments with matching outline shapes. Those tools have been successfully tested on small collections of fragments; here we address the question of whether they can be expected to work also for practical instances of the problem ($10^3$ to $10^5$ fragments). To that end, we describe here a method to measure the average amount of information contained in the shape of a fracture line of given length. This parameter tells us how many false matches we can expect to find for it among a given set of fragments. In particular, the numbers we obtained for ceramic fragments indicate that fragment outline comparison should give useful results even for large instances of the problem.

Keywords: curve matching, jigsaw puzzles, information content, fractals, shape recognition

1. Introduction
Reassembling unknown broken objects from a large collection of irregular fragments is a problem that arises in several contexts, such as archaeology, failure analysis, paleontology, and art conservation. Instances of this problem often have tens of thousands of randomly shaped and featureless fragments, whose reassembly would require years of tedious work.

The most difficult part of the problem is finding the pairs of matching fragments, that is, those that were adjacent in the original object. Note that a thousand fragments make approximately half a million pairs, and each pair could fit together in many ways.

Recently there have been several attempts to automate this task, using a variety of curve- and surface-matching techniques (Kampel and Sablatnig, 2003; McBride and Kimia, 2003; Papaioannou and Karabassi, 2003; Papaioannou et al., 2002; Taylor and Lewis, 1994; Üçoluk and Toroslu, 1999; Wolfson, 1990; Wolfson and Rigoutsos, 1997). In particular, we have developed a multiscale curve matching technique that, given a set of fragment outlines like the ones shown in Fig. 1, will efficiently identify a substantial fraction of the matching pairs, like those shown in Fig. 2 (Leitão and Stolfi, 2002).
1.1. Scaling up to Large Problems
While computer matching was shown to be effective for small instances of the problem (about a hundred fragments), it is not obvious that it will work for realistic instances, with $10^3$ to $10^5$ fragments. One may worry that, among such large collections, there will be far too many “false positives”: pairs that were not adjacent in the original object, but whose outlines have the same shape, just by chance. A program that produced thousands of false matches for each fragment would not be of much help.

Figure 1. Digitized outlines of ceramic fragments.

Figure 2. Some matching fragment pairs. The matching segments are highlighted.

For a rough analysis of this issue, suppose we have N fragment outlines, with average perimeter L. For a given point p on the boundary of one fragment, there are NL/δ points on other fragments that could be matched to p in the reconstructed object, where δ is a specified positional precision. Thus, in order to identify the correct match q, we need to extract $\log_2(NL) + O(1)$ bits of useful information from the shape of the outline in the neighborhood of p; “useful” in the sense that the same bits can be extracted, with high probability, from the outline around q.

This observation is encouraging, in that it says that the amount of information required grows very slowly (logarithmically) with the size of the problem.

In fact, experience suggests that the shape of a ceramic fragment contains a lot of information about its matching partner. Anyone who has tried to put back together a broken vase knows that a correct pair of fragments, even relatively small ones, will “fit” together vastly better than an incorrect pair, so that the latter is hardly ever mistaken for the former. The reason is that, for suitable materials, the two sides of a fracture will remain congruent to within a fraction of a millimeter, for most of their length. Given the irregular random shape of most fractures, the probability of obtaining such a precise fit between two unrelated pieces is practically nil (see Figs. 3 and 4).

In this article, we try to turn the above intuition into a quantitative statement. Specifically, we describe a method for determining the average amount of useful information contained in a piece of fragment outline of given length, given a sample of correctly matched fragment outlines.
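The rough count of Section 1.1 can be made concrete with a few lines of arithmetic. The sketch below evaluates $\log_2(NL/\delta)$ for several collection sizes; the function name and the sample values (a perimeter of 170 mm and a precision of 0.021 mm, borrowed from the experiment reported later in the paper) are illustrative assumptions, not part of the original analysis.

```python
import math


def bits_needed(num_fragments: int, mean_perimeter_mm: float, precision_mm: float) -> float:
    """Bits required to single out one boundary position among N*L/delta candidates."""
    candidates = num_fragments * mean_perimeter_mm / precision_mm
    return math.log2(candidates)


if __name__ == "__main__":
    # Illustrative values only: L = 170 mm, delta = 0.021 mm.
    for n in (10**3, 10**4, 10**5):
        print(n, round(bits_needed(n, 170.0, 0.021), 2))
```

Multiplying the number of fragments by 100 adds only $\log_2 100 \approx 6.6$ bits to the requirement, which is the logarithmic growth mentioned in the text.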
Figure 3. Two matching fragments.

Figure 4. Two matching outline segments, magnified. Grid lines are 1 mm apart. The outlines were digitized at 300 dpi (0.085 mm/pixel) and smoothed with a Gaussian filter of characteristic width σ = 0.085 mm.
We must stress that the topic of this paper is not our matching algorithm, which uses quite different techniques and has been described elsewhere (Leitão, 1999; Leitão and Stolfi, 2002). What we describe here is an analysis technique whose goal is to enable us to decide a priori whether outline matching (by any algorithm) is a viable approach for a particular instance of the fragment reassembly problem. The results of this analysis can also be used to estimate such parameters as the number of false matches that one can expect to find among a large collection of segments, the minimum length of common boundary that is needed for reliable matching, and the minimum accuracy needed in the digitization of the outlines.

There have been many statistical studies of fracture lines and surfaces, e.g. by Brown et al. (1993) and Brown and Hoch (1997). However, to the best of our knowledge, none of those studies have looked at fractures as information-carrying signals, or attempted to measure the discrepancy between the two sides of a fracture.

In Section 2 we describe the fracture model used in this paper. Section 3 explains how we convert the fracture outline (a curve on the plane) into a one-dimensional signal. The analysis of the information content is given in Section 4. In Section 5 we illustrate the method with an artificial but fairly realistic sample of ceramic fragments. Section 6 summarizes the conclusions. In particular, we observe that, for our sample fragments, the probability of two random fragment outlines 10.8 mm long being indistinguishable from a true match will be about $1/2^{22.35}$, that is, less than 1 in 4,000,000.
2. Fracture Model

Our fragment matching algorithms are specialized for objects with a smooth and locally flat surface, such as tiles, tablets, large vases, frescoes, etc. The algorithm’s input consists of the digitized fragment contours or outlines, modeled as a set of closed plane curves.

We assume that two fragments which were adjacent in the original object were separated by an ideal fracture line of zero thickness. The concrete manifestation of that line is a pair of matching segments on the digitized contours of those two fragments (see Fig. 5). (Note that the segment endpoints cannot be reliably located on the fragment outlines before identifying the matching fragment pairs; this is one major difference between assembling ceramic fragments and solving traditional jigsaw puzzles (Burdea and Wolfson, 1989).)

Two matching segments will never be precisely congruent: there will be some differences, either real (e.g., due to loss of small fragments) or artificial (due to errors in the contour extraction process, such as parallax, shadowing, quantization, etc.). The useful information content of a piece of contour is determined by the magnitude of all these errors, relative to the size of the characteristic details that can be used to identify the matching piece.
Figure 5. Original object with ideal fracture lines (a) and the observed fragment contours (b).
Figure 6. The shape function a(t) (bottom) of a plane curve (top). Grid lines are 1 mm apart.

3. Interpreting Curves as Signals
Before we can apply the tools of information theory to this problem, we must turn each curve into a signal, that is, a real function of some real parameter t. The transformation must turn matching contour segments into similar signals, even if the fragments were digitized in random orientations.

For this purpose, we use a shape function z(t) that is the double integral of the outline’s curvature κ(t), expressed as a function of its arc-length t measured from an arbitrary reference point. The curvature graph κ(t) is a well-known rotation-invariant representation of a curve, and this property is retained by its integrals (apart from arbitrary integration constants).

More precisely, we assume that the curve has length L and is given by n + 1 equally spaced sample points $c_0, c_1, \ldots, c_n$ on the plane, where n is a power of two. The shape function z is conceptually defined on the interval [0, L], and is computed as n + 1 real sample values $z_0, z_1, \ldots, z_n$. The curvature $\kappa_i$ at each point $c_i$ ($1 \le i \le n-1$) is estimated as the angle between the vectors $c_i - c_{i-1}$ and $c_{i+1} - c_i$, divided by the step size δ = L/n. The curvature values are then integrated twice by simple summation,

$$z_i = A + \sum_{j=2}^{i} \Bigl( B + \sum_{k=2}^{j} \kappa_{k-1}\,\delta \Bigr)\,\delta, \qquad i \in \{0, 1, \ldots, n\} \tag{1}$$
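A minimal sketch of equation (1), together with the boundary condition $z_0 = z_n = 0$ used in this section, is given below. It is not the authors' code: the function name is hypothetical, the turning angle is taken as a signed angle (an assumption needed to distinguish left from right turns), and L is approximated by the polygonal length of the sample points.

```python
import math


def shape_function(points):
    """Return z_0..z_n for a polyline of n+1 (x, y) samples, following equation (1)."""
    n = len(points) - 1
    if n < 2:
        raise ValueError("need at least three sample points")
    # Step size delta = L / n, with L taken as the polygonal length (assumption).
    length = sum(math.dist(points[i], points[i + 1]) for i in range(n))
    delta = length / n
    # kappa[i]: signed turning angle at c_i divided by delta, for 1 <= i <= n-1.
    kappa = [0.0] * (n + 1)
    for i in range(1, n):
        ux, uy = points[i][0] - points[i - 1][0], points[i][1] - points[i - 1][1]
        vx, vy = points[i + 1][0] - points[i][0], points[i + 1][1] - points[i][1]
        kappa[i] = math.atan2(ux * vy - uy * vx, ux * vx + uy * vy) / delta
    # Double summation of equation (1) with A = B = 0.
    raw = [0.0] * (n + 1)
    theta = 0.0  # inner sum: sum_{k=2}^{j} kappa_{k-1} * delta
    total = 0.0  # outer sum: sum_{j=2}^{i} theta_j * delta
    for j in range(2, n + 1):
        theta += kappa[j - 1] * delta
        total += theta * delta
        raw[j] = total
    # A = 0 gives z_0 = 0; B is then chosen so that z_n = 0.
    b = -raw[n] / ((n - 1) * delta)
    return [raw[i] + b * max(i - 1, 0) * delta for i in range(n + 1)]
```

Because only turning angles and step lengths enter the computation, rotating or translating the input points leaves the result unchanged up to floating-point error, which is the invariance the text relies on.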
The integration constants A and B are then adjusted so that $z_0 = z_n = 0$. This transformation is fully invertible; unlike the curvature κ itself, it does not magnify the small-scale noise, and moreover it preserves qualitatively the shape of the curve (see Fig. 6). As a matter of fact, the Fourier-based analysis below could have been carried out on κ instead of z, since their spectra are closely related. The only advantage of z is that its graph looks quite similar to the curve itself, a major convenience for plotting and debugging.

One drawback of this transformation is that a local disturbance in the curve may change its length, and therefore cause a global shift of the shape function from that point on. Nevertheless, one verifies experimentally that the shape functions of corresponding contour pieces, like the ones shown in Fig. 4, generally remain in sync over the lengths considered in this analysis (about a centimeter), provided that the outlines are extracted with due care to avoid orientation-dependent quantization artifacts (“pixel jaggies”) (see Fig. 7).

4. Information Content of Outlines
We can view a digitized fragment contour abstractly as a signal (the ideal fracture line) corrupted by noise (the material losses and data acquisition errors). Specifically, the shapes of the two corresponding segments of a pair can be written as $a(t) = s(t) + n'(t)$ and $b(t) = s(t) + n''(t)$, where s is the shape of the ideal fracture line, and $n'$, $n''$ are “noise” functions that represent loss of material, acquisition errors, etc.

The presence of noise means that a finite segment of the shape of one outline, say a(t), carries only a limited amount of information about the shape s(t) of the original fracture, and even less about the shape b(t) of the matching outline segment. Our goal is to quantify this information.

Figure 7. The shape functions a(t) and b(t) of the two corresponding contour segments shown in Fig. 4, and their difference d(t).

Because of the obvious correlations between adjacent samples, this information cannot be directly evaluated on a sample-per-sample basis. As in classical signal analysis, we can get around this problem by working with the Fourier transforms of the signals, rather than the signals themselves. More precisely, since the shape functions are discrete signals, we work with their discrete Fourier transforms (DFTs).

Under many realistic assumptions, both experiments and theory show that the Fourier coefficients of a signal are random variables with symmetric and zero-mean Gaussian distributions. Intuitively, this result is an expected consequence of the central limit theorem, and the fact that each DFT coefficient is a linear combination of a large number of sample values, whose weights are unit-norm complex numbers that sum to zero. Moreover, the orthogonality of Fourier components with different frequencies implies that their coefficients are pairwise independent, even when there are strong correlations between nearby sample values (Lathi, 1998).

Turning shape functions into periodic signals. One technical difficulty of the Fourier approach is that the DFT is defined for periodic signals only. The trivial solution is to consider the shape function $z_0, z_1, \ldots, z_n$ to be periodic with period n. However, this extension
introduces significant first-order discontinuities (sharp corners) at the joining samples $z_{kn}$ (see Fig. 8(a)). Those corners introduce spurious high-frequency components in the signals. Since the locations and magnitudes of the discontinuities are related to the overall curvature of the outline segment, the coefficients of those spurious high-frequency components would be strongly correlated, inflating the estimates of mutual information content.

The standard way of handling the boundary discontinuities would be to multiply the signal by a window function that decays smoothly to 0 at both ends of the time-domain interval. That approach is not helpful here, however: while it removes the discontinuity at period boundaries, it introduces low-frequency artifacts that are highly correlated between the a and b signals, and cannot be separated from the measured values.

Figure 8. From shape function to periodic signal: trivially (a) and by flipping (b).

Therefore, instead of windowing the signal, we extend each shape function $z_0, z_1, \ldots, z_n$ (with $z_0 = z_n = 0$) to a periodic signal with period 2n, where the second half of each period consists of the same signal, time-reversed and negated: that is, $z_{n+i} = -z_{n-i}$ for $1 \le i \le n$ (see Fig. 8(b)). With this convention, the joining samples $z_{kn}$ become discontinuities of second order only, whose impact on the Fourier spectrum is much smaller.

One consequence of this convention is that the signal z becomes an odd function of time, which in turn implies that its DFT has only n − 1 (rather than 2n) nonzero Fourier coefficients, associated with the sinusoidal components only:

$$z_j = \sum_{k=1}^{n-1} Z_k \sin\Bigl(2\pi \frac{kj}{2n}\Bigr) = \sum_{k=1}^{n-1} Z_k \sin\Bigl(\pi \frac{kj}{n}\Bigr) \tag{2}$$
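The odd extension and the sine expansion of equation (2) can be sketched as follows. The function names are hypothetical, the coefficients are computed by direct $O(n^2)$ summation rather than by a fast transform, and the normalization factor 2/n is an assumption chosen so that equation (2) reproduces the samples exactly; the paper does not state which DFT normalization was used.

```python
import math


def odd_extension(z):
    """One period (2n samples) of the extension z_{n+i} = -z_{n-i}, given z_0 = z_n = 0."""
    n = len(z) - 1
    return list(z[:n]) + [-z[n - i] for i in range(n)]


def sine_coefficients(z):
    """Coefficients Z_1..Z_{n-1} of equation (2), by direct summation."""
    n = len(z) - 1
    return [
        (2.0 / n) * sum(z[j] * math.sin(math.pi * k * j / n) for j in range(1, n))
        for k in range(1, n)
    ]


def sine_synthesis(coeffs, n):
    """Evaluate equation (2): z_j = sum_k Z_k sin(pi k j / n), for j = 0..n."""
    return [
        sum(zk * math.sin(math.pi * k * j / n) for k, zk in enumerate(coeffs, start=1))
        for j in range(n + 1)
    ]
```

Applying `sine_synthesis` to the output of `sine_coefficients` should return the original samples up to rounding, which confirms that the n − 1 sine coefficients carry the whole signal.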
In particular, the amplitude $Z_1$ of the lowest component, whose period is 2L, is a measure of the overall curvature of the outline segment.

Information content of DFT coefficients. Let $A_k$, $B_k$, $S_k$, $N'_k$, and $N''_k$ be the Fourier coefficients of $a$, $b$, $s$, $n'$, and $n''$, respectively. The information given by each coefficient $A_k$ about the corresponding coefficient $B_k$ (Lathi, 1998) is

$$I_k = H(B_k) - H(B_k \mid A_k) \tag{3}$$
where H(X) denotes the entropy of a random variable X (Lathi, 1998). The total information about b that is carried by a is then simply $I_{tot} = \sum_{k=1}^{n} I_k$.

The entropy of a Gaussian variable X is $H(X) = \frac{1}{2}\log(2\pi e \hat{X})$, where $\hat{X}$ denotes X’s variance. (All logarithms here are in base 2, and thus entropies are expressed in bits.) From linearity of the DFT, we have $A_k = S_k + N'_k$ and $B_k = S_k + N''_k$. We can assume that $N'_k$ and $N''_k$ have the same variance $\hat{N}_k$; from the independence of the signals, we have $\hat{A}_k = \hat{B}_k = \hat{S}_k + \hat{N}_k$. Moreover, for any value y, the conditional probability distribution $\Pr(B_k \mid A_k = y)$ turns out to be a Gaussian with mean $y\,\hat{S}_k/\hat{A}_k$ and variance $(\hat{S}_k/\hat{A}_k + 1)\hat{N}_k$. Then formula (3) becomes

\begin{align}
I_k &= \frac{1}{2}\log(2\pi e \hat{B}_k) - \frac{1}{2}\log\bigl(2\pi e (\hat{S}_k/\hat{A}_k + 1)\hat{N}_k\bigr) \notag \\
    &= \frac{1}{2}\log\frac{\hat{B}_k}{(\hat{S}_k/\hat{A}_k + 1)\hat{N}_k} \notag \\
    &= \frac{1}{2}\log\frac{\hat{A}_k \hat{B}_k}{\hat{S}_k\hat{N}_k + \hat{A}_k\hat{N}_k} \notag \\
    &= \frac{1}{2}\log\frac{(\hat{S}_k + \hat{N}_k)^2}{(2\hat{S}_k + \hat{N}_k)\hat{N}_k} \tag{4}
\end{align}
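As a numerical check of the derivation, the sketch below evaluates formula (3) from the two Gaussian entropies and compares it with the closed form on the last line of formula (4). The function names and the sample variances are illustrative assumptions.

```python
import math


def gaussian_entropy_bits(variance: float) -> float:
    """Entropy, in bits, of a real Gaussian variable with the given variance."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * variance)


def info_from_entropies(s_var: float, n_var: float) -> float:
    """Formula (3): H(B_k) - H(B_k | A_k) for A = S + N', B = S + N''."""
    a_var = b_var = s_var + n_var
    cond_var = (s_var / a_var + 1.0) * n_var  # variance of B_k given A_k = y
    return gaussian_entropy_bits(b_var) - gaussian_entropy_bits(cond_var)


def info_closed_form(s_var: float, n_var: float) -> float:
    """Last line of formula (4)."""
    return 0.5 * math.log2((s_var + n_var) ** 2 / ((2.0 * s_var + n_var) * n_var))


if __name__ == "__main__":
    # Hypothetical variances, chosen only to exercise the two expressions.
    for s_var, n_var in [(4.0, 1.0), (1.0, 1.0), (0.0, 1.0)]:
        print(s_var, n_var, info_from_entropies(s_var, n_var), info_closed_form(s_var, n_var))
```

The two expressions should agree to rounding error, and both vanish when the signal variance is zero, that is, when the two contours share nothing but noise.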
Determining $\hat{S}_k$ and $\hat{N}_k$. Unfortunately, we have no direct information about the variance of the original signal $\hat{S}_k$ (the shape function of the ideal fracture line) or of the noise $\hat{N}_k$ (the difference between the fracture lines and the observed contours). However, we can estimate these parameters by comparing sections of fragment contours that are known to correspond to the same fracture line in the original object, such as the highlighted line in Fig. 3.

Let us then denote by a(t) and b(t), for t ∈ [0, L], the shape functions of two corresponding pieces of contours, as in Fig. 7, selected so that the midpoints a(L/2), b(L/2) of the two graphs correspond to the same point of the ideal fracture line. Let m(t) = [a(t) + b(t)]/2 be the average of the two signals, and d(t) = a(t) − b(t) their difference (see Fig. 9). The Fourier coefficients $M_k$ and $D_k$ of the signals m and d then have variances

$$\hat{M}_k = \operatorname{var}\Bigl(\frac{S_k + N'_k + S_k + N''_k}{2}\Bigr) = \hat{S}_k + \frac{1}{2}\hat{N}_k$$

$$\hat{D}_k = \operatorname{var}\bigl((S_k + N'_k) - (S_k + N''_k)\bigr) = 2\hat{N}_k$$

Figure 9. The average m(t) = [a(t) + b(t)]/2 and difference d(t) = a(t) − b(t) of the shape functions of Fig. 7.

Thus, we can estimate the variances $\hat{M}_k$ and $\hat{D}_k$ by extracting the signals m(t) and d(t) from a reference set of matching segment pairs, computing their DFTs, and taking the variances of each coefficient $M_k$ and $D_k$ over these pairs. We then compute $\hat{S}_k$ and $\hat{N}_k$ by the formulas

$$\hat{S}_k = \hat{M}_k - \frac{1}{4}\hat{D}_k, \qquad \hat{N}_k = \frac{1}{2}\hat{D}_k \tag{5}$$
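The estimation step leading to formula (5) can be sketched for a single frequency k as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the coefficients are treated as zero-mean (as the Gaussian model of this section suggests), and `synthetic_pairs` generates artificial test data rather than measurements.

```python
import random


def variance_zero_mean(samples):
    """Variance estimate for a coefficient assumed to have zero mean."""
    return sum(x * x for x in samples) / len(samples)


def estimate_signal_noise(a_coeffs, b_coeffs):
    """Estimate (S_hat, N_hat) for one frequency from matched pairs, via formula (5)."""
    m = [(a + b) / 2.0 for a, b in zip(a_coeffs, b_coeffs)]
    d = [a - b for a, b in zip(a_coeffs, b_coeffs)]
    m_var = variance_zero_mean(m)
    d_var = variance_zero_mean(d)
    return m_var - d_var / 4.0, d_var / 2.0


def synthetic_pairs(num_pairs, s_var, n_var, seed=0):
    """Artificial data: A = S + N', B = S + N'' with independent Gaussian terms."""
    rng = random.Random(seed)
    a, b = [], []
    for _ in range(num_pairs):
        s = rng.gauss(0.0, s_var ** 0.5)
        a.append(s + rng.gauss(0.0, n_var ** 0.5))
        b.append(s + rng.gauss(0.0, n_var ** 0.5))
    return a, b


if __name__ == "__main__":
    a, b = synthetic_pairs(20000, s_var=4.0, n_var=1.0)
    print(estimate_signal_noise(a, b))
```

With many synthetic pairs the estimates should approach the variances used to generate the data; with only a few dozen pairs, as in a real sample, the scatter of the estimates is correspondingly larger.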
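Substituting these values into formula (4), we find that the amount of information provided by the frequency-k component of curve a about the same component of its partner b is

\begin{align}
I_k &= \frac{1}{2}\log\frac{(\hat{A}_k)^2}{\bigl[2\bigl(\hat{M}_k - \frac{1}{4}\hat{D}_k\bigr) + \frac{1}{2}\hat{D}_k\bigr]\,\frac{1}{2}\hat{D}_k} \notag \\
    &= \frac{1}{2}\log\frac{(\hat{A}_k)^2}{\hat{M}_k \hat{D}_k} \tag{6} \\
    &= \log\hat{A}_k - \frac{1}{2}\bigl(\log\hat{M}_k + \log\hat{D}_k\bigr) \tag{7}
\end{align}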
Accounting for roundoff errors. When estimating the variances $\hat{A}_k$, $\hat{M}_k$, and $\hat{D}_k$ from the set of matched segment pairs, one must account for roundoff errors in the corresponding DFT coefficients, which may lead to inconsistent variances and negative information contents. An effective remedy for this problem is to add a small bias to each computed variance, corresponding to a random perturbation of the input samples commensurate with the contour digitization error. Namely, to each shape function value $a_i$ or $b_i$ we implicitly add a Gaussian perturbation with standard deviation ε, which in our tests was set to 0.02 pixel (0.0017 mm). Concretely, we added $\varepsilon^2$ to each variance $\hat{A}_k$ and $\hat{B}_k$, $\varepsilon^2/2$ to each $\hat{M}_k$, and $2\varepsilon^2$ to each $\hat{D}_k$.

Consistency check. As a consistency check, let us consider what would happen if a(t) and b(t) were the shape functions of two unrelated contour segments with the same length. In this case, we would have $a = s' + n'$ and $b = s'' + n''$, where $s'$ and $s''$ are independent signals. The variances of the coefficients $M_k$ and $D_k$ would be

$$\hat{M}_k = \frac{1}{2}(\hat{S}_k + \hat{N}_k) = \frac{1}{2}\hat{A}_k, \qquad \hat{D}_k = 2(\hat{S}_k + \hat{N}_k) = 2\hat{A}_k$$

Formula (7) would then evaluate to

$$I_k = \log\hat{A}_k - \frac{1}{2}\Bigl(\log\frac{\hat{A}_k}{2} + \log(2\hat{A}_k)\Bigr) = 0$$

as expected.
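The consistency check and the roundoff bias can be verified together in a few lines. The sketch below uses hypothetical function names; it evaluates formula (7) for the variances expected from unrelated segments, with and without the bias terms described above, using ε = 0.0017 mm as in the text.

```python
import math


def info_bits(a_var, m_var, d_var):
    """Formula (7): I_k = log2(A) - (log2(M) + log2(D)) / 2."""
    return math.log2(a_var) - 0.5 * (math.log2(m_var) + math.log2(d_var))


def add_roundoff_bias(a_var, m_var, d_var, eps):
    """Add eps^2 to A_hat, eps^2 / 2 to M_hat and 2 * eps^2 to D_hat."""
    e2 = eps * eps
    return a_var + e2, m_var + e2 / 2.0, d_var + 2.0 * e2


def unrelated_pair_variances(a_var):
    """Variances expected for two unrelated segments: M_hat = A_hat / 2, D_hat = 2 * A_hat."""
    return a_var, a_var / 2.0, 2.0 * a_var


if __name__ == "__main__":
    for a_var in (10.0, 0.1, 1e-6):  # arbitrary variances in mm^2
        plain = unrelated_pair_variances(a_var)
        biased = add_roundoff_bias(*plain, eps=0.0017)
        print(a_var, info_bits(*plain), info_bits(*biased))
```

Note that the three bias terms are in the same proportions 1 : 1/2 : 2 as the variances of unrelated segments, so the biased variances still yield zero information in that case.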
5. Experimental Results

5.1. Main Experiment
To test this theory, we shattered five unglazed ceramic tiles into about one hundred fragments, ranging from 10 to 50 mm in diameter. We scanned the flat sides of those fragments by placing them directly on a 300 dpi flatbed scanner, in random positions and orientations. The flat sides were lightly rubbed with chalk to improve contrast (see Fig. 14).

We extracted the fragment outlines from the resulting images with a simple contour-following algorithm, using bilinear gray value interpolation to locate the outline with sub-pixel precision. To further reduce the quantization artifacts, we smoothed the outlines with a geometric Gaussian filter (Leitão, 1999), with characteristic width σ = 1 pixel (= 0.085 mm), and resampled each outline with uniform step size σ/4 (= 0.25 pixel = 0.021 mm). Some of those contours are shown in Figs. 1 and 3.

From these contours, we selected 50 pairs of fragments that were adjacent in the original tiles, and we extracted from the outlines of each pair the best-matching sections 128 pixels long (10.8 mm, 512 samples). We converted these curve segments to shape functions a(t) and b(t), as explained in Section 3, and computed the mean and difference signals m(t) and d(t) for each pair. Figures 4, 7, and 9 show one of these outline pairs, and the corresponding signals a(t), b(t), d(t), and m(t).

Figures 10 and 11 and Table 1 show the estimated variances $\hat{A}_k$, $\hat{M}_k$, and $\hat{D}_k$, and the useful information content $I_k$ for each component frequency k, as computed by formula (7). Table 2 shows the information content condensed by logarithmically-spaced frequency bands (scales of detail), and accumulated up to each scale.
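The sampling figures quoted in this subsection follow from the scanner resolution by simple arithmetic, sketched below. The function names are hypothetical; the wavelength formula 2L/k follows from equation (2), where component k has period 2L/k.

```python
MM_PER_INCH = 25.4


def pixel_size_mm(dpi: float) -> float:
    """Side of one pixel, in millimeters, at the given scanner resolution."""
    return MM_PER_INCH / dpi


def segment_geometry(dpi: float, segment_pixels: int, samples_per_pixel: int):
    """Return (pixel size, sampling step, segment length) in mm and the sample count."""
    pixel = pixel_size_mm(dpi)
    step = pixel / samples_per_pixel
    length = segment_pixels * pixel
    samples = segment_pixels * samples_per_pixel
    return pixel, step, length, samples


def wavelength_mm(k: int, segment_length_mm: float) -> float:
    """Wavelength of sine component k of equation (2): 2L / k."""
    return 2.0 * segment_length_mm / k


if __name__ == "__main__":
    pixel, step, length, samples = segment_geometry(300, 128, 4)
    print(pixel, step, length, samples)
    print(wavelength_mm(1, length), wavelength_mm(31, length))
```

At 300 dpi with four samples per pixel, this gives a pixel of about 0.085 mm, a step of about 0.021 mm, and a 128-pixel segment of about 10.8 mm with 512 samples, in agreement with the values stated above.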
5.2. Control Experiment
As a control experiment, we repeated the process with 50 pairs of non-matching contour segments, randomly picked from the same collection of fragments. Figures 12 and 13 show the average power spectra and useful information content Ik (rather, the lack thereof) for that sample.
Figure 10. Variances of Fourier coefficients (averaged power spectra) of the mean ($\hat{M}_k$) and difference ($\hat{D}_k$) signals for the 50 pairs of matching segments.
6. Conclusions and Future Work
The main experiment (Section 5.1) tells us that the shape of an outline segment $a$ of length $\ell$ = 10.8 mm contains at least $I_{tot}$ = 22 bits of useful information about the shape of the matching segment $a'$. This information lies almost entirely in components 1–31, with wavelengths from 256 to 8.3 pixels (21.7 to 0.7 mm).

The parameter $I_{tot}$ allows us to estimate the number of “false positives” that would be reported by an ideal shape-matching algorithm, i.e. the probability that an outline segment b, randomly selected from the collection, will resemble a specific segment a as closely as the 50 pairs used in the main test. Namely, this will happen with probability $p_f$ smaller than $1/2^{22}$, that is, less than 1 in 4,000,000.

Figure 11. Useful information content $I_k$ per frequency k.

For the user, of course, a more relevant parameter is the probability of two fragments (rather than two outline segments) being falsely reported as partners. Since the average perimeter L of the fragments in our sample was about 2000 pixels (170 mm), and the sampling step at the finest scale was δ = 0.25 pixel, each fragment in principle contains about $m = L/\delta \approx 2^{13}$ potential candidates of length $\ell$, which should be compared against m segments from the other fragment. If any of these segment pairs happens to be a false match, the two fragments will be falsely reported as neighbors. However, simply multiplying $p_f$ by $m^2$ would be quite incorrect, because those m segments are not completely independent. In particular, two overlapping segments that are shifted by a single step will have nearly the same Fourier coefficients in the relevant bands, especially at the low-frequency end. Deriving the per-fragment false match rate from $I_{tot}$ remains an open theoretical problem.

A related question, which is also quite important for users, is how the information content $I_{tot}$ scales with the segment length $\ell$. Consider a fracture segment $a$ of length $2\ell$: since we expect its two halves $a'$, $a''$ to be nearly independent, the information content of $a$ should be about $2I_{tot}$. However, on one hand, there will be some correlation between the Fourier components
Table 1. Results for a set of 50 pairs of matching contour segments: variances of Fourier coefficients (averaged power spectra) of the contour ($\hat{A}_k$), mean ($\hat{M}_k$), and difference ($\hat{D}_k$) signals, and estimated information content ($I_k$) and density ($I_k/L$), per frequency k.

k       $\hat{A}_k$ (mm²)   $\hat{M}_k$ (mm²)   $\hat{D}_k$ (mm²)   $I_k$ (bits)   $I_k/L$ (bits/mm)
1       45.79    45.21    2.311    2.16     0.200
2       10.38    10.25    0.531    2.15     0.199
3       3.973    3.887    0.34     1.78     0.164
4       3.085    3.031    0.216    1.93     0.178
5       0.884    0.849    0.141    1.35     0.125
6       1.094    1.054    0.160    1.41     0.130
7       0.668    0.638    0.120    1.27     0.117
8       0.409    0.381    0.112    0.98     0.091
9       0.383    0.358    0.101    1.01     0.093
10      0.230    0.205    0.097    0.70     0.065
11      0.137    0.118    0.078    0.52     0.048
12      0.121    0.106    0.061    0.59     0.054
13      0.113    0.100    0.054    0.62     0.058
14      0.068    0.057    0.044    0.44     0.040
15      0.066    0.055    0.041    0.46     0.043
···     ···      ···      ···      ···      ···
Total                              22.35    2.062
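The $I_k$ column of Table 1 can be recomputed from the three variance columns with formula (7). The sketch below does so for the fifteen printed rows; the function names are hypothetical, and the segment length of 10.84 mm (128 pixels at 300 dpi, quoted as 10.8 mm in the text) is an assumption used only for the density column.

```python
import math

# Rows of Table 1: (k, A_hat, M_hat, D_hat, I_k as printed); variances in mm^2.
TABLE1 = [
    (1, 45.79, 45.21, 2.311, 2.16), (2, 10.38, 10.25, 0.531, 2.15),
    (3, 3.973, 3.887, 0.34, 1.78), (4, 3.085, 3.031, 0.216, 1.93),
    (5, 0.884, 0.849, 0.141, 1.35), (6, 1.094, 1.054, 0.160, 1.41),
    (7, 0.668, 0.638, 0.120, 1.27), (8, 0.409, 0.381, 0.112, 0.98),
    (9, 0.383, 0.358, 0.101, 1.01), (10, 0.230, 0.205, 0.097, 0.70),
    (11, 0.137, 0.118, 0.078, 0.52), (12, 0.121, 0.106, 0.061, 0.59),
    (13, 0.113, 0.100, 0.054, 0.62), (14, 0.068, 0.057, 0.044, 0.44),
    (15, 0.066, 0.055, 0.041, 0.46),
]


def info_bits(a_var, m_var, d_var):
    """Formula (7): I_k = log2(A) - (log2(M) + log2(D)) / 2."""
    return math.log2(a_var) - 0.5 * (math.log2(m_var) + math.log2(d_var))


def recompute(rows, segment_length_mm=10.84):
    """Return (k, I_k recomputed, I_k printed, density in bits/mm) for each row."""
    out = []
    for k, a_var, m_var, d_var, printed in rows:
        bits = info_bits(a_var, m_var, d_var)
        out.append((k, bits, printed, bits / segment_length_mm))
    return out


if __name__ == "__main__":
    for k, bits, printed, density in recompute(TABLE1):
        print(f"{k:2d}  {bits:5.2f}  {printed:5.2f}  {density:5.3f}")
```

Small differences between the recomputed and printed values are to be expected, presumably because the printed variances are rounded to three or four significant digits.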
of $a'$ and those of $a''$, especially at the low-frequency end, which implies that some of the information they contain about the true partner of $a$ will be duplicated. On the other hand, the angle α between the mean axes of $a'$ and $a''$ is an additional piece of shape information, not present in $a'$ or $a''$ separately. It is not clear which of these two effects will dominate.

The measured value of $I_{tot}$ is of course specific to the unglazed ceramic fragments used in our test.
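The per-segment figures quoted in this section can be reproduced numerically. The sketch below, with hypothetical function names, computes $p_f = 2^{-I_{tot}}$, the number of candidate positions m = L/δ, and the naive product $m^2 p_f$ that the text warns against.

```python
import math


def false_match_probability(info_bits: float) -> float:
    """Estimate p_f = 2^(-I_tot) for a random segment resembling a given one."""
    return 2.0 ** (-info_bits)


def candidates_per_fragment(perimeter_px: float, step_px: float) -> float:
    """m = L / delta: number of candidate segment positions on one fragment."""
    return perimeter_px / step_px


def naive_pair_bound(info_bits: float, perimeter_px: float, step_px: float) -> float:
    """m^2 * p_f, treating all m^2 segment comparisons as independent (not valid)."""
    m = candidates_per_fragment(perimeter_px, step_px)
    return m * m * false_match_probability(info_bits)


if __name__ == "__main__":
    print(false_match_probability(22.0))                # below 1 / 4,000,000
    print(math.log2(candidates_per_fragment(2000, 0.25)))  # close to 13
    print(naive_pair_bound(22.35, 2000, 0.25))
```

With the values of this section the naive product comes out larger than 1, so it cannot be read as a probability. This is consistent with the remark that heavily overlapping segments are far from independent, although it does not by itself indicate what the correct per-fragment rate is.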
Figure 12. Power spectra of m(t) and d(t) for non-matching segments.

Table 2. Information content ($I_k$) and information density ($I_k/L$), accumulated by scale of detail (frequency band).

k             Wavelength (mm)    $I_{bd}$ (bits)    $I_{bd}/L$ (bits/mm)
1             21.68              2.16     0.200
2 .. 3        7.22 .. 10.84      3.93     0.363
4 .. 7        3.10 .. 5.42       5.96     0.550
8 .. 15       1.45 .. 2.71       5.32     0.491
16 .. 31      0.70 .. 1.35       3.75     0.346
32 .. 63      0.34 .. 0.68       1.00     0.092
64 .. 127     0.17 .. 0.34       0.23     0.021
128 .. 255    0.09 .. 0.17       0.00     0.000
256 .. 511    0.04 .. 0.09       0.00     0.000
Total                            22.35    2.062
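The band totals of Table 2 are sums of the per-frequency values $I_k$ over octave-wide ranges of k. The sketch below, with hypothetical function names, performs this accumulation for the fifteen values of $I_k$ that are printed in Table 1, which is enough to reproduce the first four rows of Table 2.

```python
def octave_bands(max_k):
    """Yield (lo, hi) frequency bands 1, 2..3, 4..7, ... up to max_k."""
    lo = 1
    while lo <= max_k:
        hi = min(2 * lo - 1, max_k)
        yield lo, hi
        lo *= 2


def accumulate_by_band(info_per_k):
    """Sum I_k over each band; info_per_k[0] holds I_1. Also returns a running total."""
    rows, running = [], 0.0
    for lo, hi in octave_bands(len(info_per_k)):
        band = sum(info_per_k[lo - 1:hi])
        running += band
        rows.append((lo, hi, band, running))
    return rows


# I_k for k = 1..15, copied from Table 1 (higher frequencies are elided there).
TABLE1_INFO = [2.16, 2.15, 1.78, 1.93, 1.35, 1.41, 1.27, 0.98,
               1.01, 0.70, 0.52, 0.59, 0.62, 0.44, 0.46]

if __name__ == "__main__":
    for lo, hi, band, running in accumulate_by_band(TABLE1_INFO):
        print(f"{lo:3d} .. {hi:3d}  {band:5.2f}  {running:6.2f}")
```

The last column is the information accumulated up to each scale; according to Table 2, the bands above k = 31 add only about 1.2 bits to the total.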
Unglazed ceramic is particularly appropriate for shape-based reconstruction, because of its highly irregular fracture lines. Still, we conjecture that our basic result holds also for many other materials; namely, there is enough information in typical fragment outlines to solve even very large instances of the reassembly problem. For instance, glass fragments scanned at the same resolution will have relatively smooth outlines, which means small $\hat{S}_k$’s. On the other hand, glass fragment edges are much sharper, and material losses are smaller, so the outlines can be digitized with greater accuracy, meaning small $\hat{N}_k$. This intuition ought to be checked experimentally.

Figure 13. Information $I_k$ per frequency k for non-matching segments.

Figure 14. The ceramic fragments used for the test, manually reassembled.

Needless to say, most real-world instances of the fragment reconstruction problem involve three-dimensional objects with curved surfaces, such as vessels and statuary. For such instances, one would probably acquire the fragment outlines with a 3D laser scanner or through stereo vision techniques, and encode them in some invariant representation (e.g., local curvature in the plane tangent to the object’s surface) such that adjacent fragments will have matching outline segments. In that case, one could still use the techniques of this paper to measure the information content of the encoded outlines. We believe that the result will be roughly the same as for flat fragments of the same material digitized to the same accuracy.

We expect that the main source of “noise” in real-world instances of the problem will be the erosion of fragment edges, not only from natural causes but mainly from rough handling of the fragments. (Archaeologists often use sieving to separate ceramic fragments from soil, a process which may destroy most of the edge details at sub-millimeter scale.) One could reduce the impact of edge erosion by tracing the outline of each fragment at a fixed depth relative to the object’s surface, rather than at the surface itself. Alternatively, one could use the mean inclination of the fracture surface relative to the object’s surface as an additional component of the “signal.”

Taking this idea to its natural limit, one should consider fractures as surfaces rather than curves, and use surface-matching techniques (as proposed by Barequet and Sharir (1997, 1999) and Levoy (1999)) to find the adjacent fragments. This approach will surely supersede contour-based methods, once ways are found to reduce its formidable computational cost. In any case, it seems likely that the Fourier-based techniques of this paper can be extended to two-dimensional signals, and used to measure the information content of fracture surfaces.

We found that the main practical obstacle to the use of our information-estimation method lies in obtaining a large enough sample of matching fragments. The control experiment suggests that our use of variances estimated from only 50 pairs may have inflated our estimate of $I_k$ by one or two bits. Therefore, it would be desirable to have a provably unbiased estimator of $I_k$, in the sense that its expected value for small samples (as well as the limiting value for large samples) coincides with the true value.
Acknowledgments

A shorter version of this article was presented at WSCG 2000 in Plzen (Czech Republic) (Leitão and Stolfi, 2000). We are indebted to Marc Levoy and to the WSCG referees for useful comments and suggestions. This work was supported in part by the Brazilian agencies CAPES, CNPq (301016/92-5 NV, 552181/02-1), FAPESP, and FAPERJ.
References

Barequet, G. and Sharir, M. 1997. Partial surface and volume matching in three dimensions. IEEE Trans. on Pattern Analysis and Machine Intelligence, 19(9):929–948.
Barequet, G. and Sharir, M. 1999. Partial surface matching by using directed footprints. Computational Geometry: Theory and Appls., 12:45–62.
Brown, C.A., Charles, P.D., Johnsen, W.A., and Chesters, S. 1993. Fractal analysis of topographic data by the patchwork method. Wear, 161:61–67.
Brown, C.A. and Hoch, P. 1997. Information content in surface metrology for functional correlations. In Proc. of the 12th Annual Meeting of the American Society for Precision Engineering, American Society for Precision Engineering, pp. 118–121.
Burdea, G.C. and Wolfson, H.J. 1989. Solving jigsaw puzzles by a robot. IEEE Trans. on Robotics and Automation, 5(6):752–764.
Kampel, M. and Sablatnig, R. 2003. Profile-based pottery reconstruction. In Proc. of the IEEE/CVPR Workshop on Appls. of Computer Vision in Archaeology (ACVA’03).
Lathi, B.P. 1998. Modern Digital and Analog Communications Systems. Oxford Univ. Press.
Leitão, H.C.G. 1999. Reconstrução Automática de Objetos Fragmentados. PhD thesis, Institute of Computing, University of Campinas (in Portuguese).
Leitão, H.C.G. and Stolfi, J. 2000. Information contents of fracture lines. In Proc. WSCG ’2000: The 8th International Conference in Central Europe on Computer Graphics, Visualization, and Interactive Digital Media, Univ. of West Bohemia Press, vol. 2, pp. 389–395.
Leitão, H.C.G. and Stolfi, J. 2002. A multiscale method for the reassembly of two-dimensional fragmented objects. IEEE Trans. on Pattern Analysis and Machine Intelligence, 24(9):1239–1251.
Levoy, M. 1999. Scanning the fragments of the Forma Urbis Romae. WWW document at http://www.graphics.Stanford.edu/projects/mich/forma-urbis/forma-urbis.html/.
McBride, J.C. and Kimia, B.B. 2003. Archaeological fragment reassembly using curve-matching. In Proc. of the IEEE/CVPR Workshop on Appls. of Computer Vision in Archaeology (ACVA’03).
Papaioannou, G. and Karabassi, E.A. 2003. On the automatic assemblage of arbitrary broken solid artefacts. Image & Vision Computing, 21(5):401–412.
Papaioannou, G., Karabassi, E.A., and Theoharis, T. 2002. Reconstruction of three-dimensional objects through matching of their parts. IEEE Trans. on Pattern Analysis and Machine Intelligence, 24(1):114–124.
Taylor, R.I. and Lewis, P.H. 1994. 2D shape signature based on fractal measurements. IEE Proc. Vision, Image & Signal Processing, 141(6):422–430.
Üçoluk, G. and Toroslu, I.H. 1999. Automatic reconstruction of broken 3-D surface objects. Computers & Graphics, 23(4):573–582.
Wolfson, H.J. 1990. On curve matching. IEEE Trans. on Pattern Analysis and Machine Intelligence, 12(5):483–489.
Wolfson, H.J. and Rigoutsos, I. 1997. Geometric hashing: An overview. IEEE Computational Science and Engineering, 4(4):10–21.