A Point-Set Compression Heuristic for Fiber-Based Certificates of Authenticity

Darko Kirovski
Microsoft Research, One Microsoft Way, Redmond, WA 98052
[email protected]

Abstract

A certificate of authenticity (COA) is an inexpensive physical object that has a random unique structure with a high cost of near-exact reproduction. An additional requirement is that the uniqueness of a COA's random structure can be verified using an inexpensive device. Bauder was the first to propose COAs created by randomly embedding a set of fixed-length fibers into a transparent gluing material that permanently fixes the positions of the fibers within. Recently, Kirovski showed that a linear improvement in the compression ratio of the point-set compression algorithm used to store the fibers' locations yields an exponential increase in the cost of forging a fiber-based COA instance. To address this issue, in this paper we introduce a novel, generalized heuristic that compresses M points in an N-dimensional grid with computational complexity proportional to O(M²). We compare its performance with an expected lower bound. The heuristic can be used for numerous other applications, such as the storage of biometric patterns.
1 Introduction
A certificate of authenticity (COA) is a digitally signed physical object that has a random unique structure which satisfies three requirements:

• the cost of creating and signing original COAs is small, relative to a desired level of security;
• the cost of manufacturing a COA instance is several orders of magnitude lower than the cost of exact or near-exact replication of the unique and random physical structure of this instance; and
• the cost of verifying the authenticity of a signed COA is small, again relative to a desired level of security.

To the best of our knowledge, COAs were first introduced for weapons-control verification purposes during the Cold War by Bauder and Simmons at the Sandia National Labs [3, 2]. Bauder was the first to propose COAs created as a collection of fibers randomly positioned in an object using a transparent gluing material which permanently fixes the fibers' positions [3, 2]. He also proposed fiber-based COAs for banknote protection; the fibers in that proposal were fixed using a semi-transparent material such as paper [3]. Readout of the random structure of a fiber-based COA can be performed in numerous ways using the following fact: if one end-point of a fiber is exposed to light, the other one will
illuminate, as illustrated in Figure 1a. In a sense, this is the "third dimension" of the COA, which cannot be replicated using an inexpensive scan-print device. An exemplary detector implementation, with a light bar and an array of photo-detectors passing over the fibers, has been developed in an effort to provide a strong counterfeit deterrent for banknotes [5, 10].

A COA instance is issued in the following way. First, a certain hard-to-replicate statistic of the COA's unique structure (e.g., the random positions of the COA's fibers) is digitized and compressed; we denote this message as f. Next, f is concatenated with the associated textual information t (e.g., product ID, expiration date). The resulting message f||t is hashed using a cryptographically secure hash algorithm such as SHA1 [8]; we denote this hash as h. Message h is then signed using the private key of the issuer, following a readily available public-key cryptography standard such as IEEE 1363 [6, 8]. Finally, the resulting signature s is concatenated with f||t and imprinted on the COA as a barcode to validate that the produced instance is authentic. Each COA instance is associated with an object whose authenticity the issuer wants to vouch for. Due to significantly shorter signatures at a comparable level of security, cryptographic routines based on elliptic curves, such as EC-DSA [1], are preferred over RSA [13].

COA verification involves the following tasks. The verifier initially scans the printed components: the text t and the barcode. The barcode is decoded into the original signature s and the signed layout of fibers f. Next, the signature s is verified against the hash h of f||t using the issuer's public key. Finally, the verifier scans the statistical properties of the associated COA instance, creates their representation f', and compares f' to the extracted f. If the level of their similarity surpasses a certain threshold, the verifier declares the COA authentic; otherwise, the instance is rejected.
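The issuing and verification flow above can be summarized in a short sketch. This is a minimal illustration, not the author's implementation: it assumes Python's `cryptography` package and signs with EC-DSA over SHA-256 (the paper specifies SHA1 hashing and an IEEE 1363-style signature; here the library hashes f||t internally, playing the role of h). The `scan_fibers` and `similarity` callables stand in for the application-specific scanner readout and matching rule.

```python
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import hashes
from cryptography.exceptions import InvalidSignature

def issue_coa(f: bytes, t: bytes, private_key) -> tuple[bytes, bytes]:
    """Sign the compressed fiber layout f concatenated with the text t;
    the signature s together with f||t is printed on the COA as a barcode."""
    message = f + t
    s = private_key.sign(message, ec.ECDSA(hashes.SHA256()))  # hash computed internally
    return s, message

def verify_coa(s: bytes, message: bytes, t: bytes, public_key,
               scan_fibers, similarity, threshold: float) -> bool:
    """Verify the signature over f||t, then compare the signed layout f
    against a fresh scan f' of the physical instance."""
    try:
        public_key.verify(s, message, ec.ECDSA(hashes.SHA256()))
    except InvalidSignature:
        return False
    f = message[:len(message) - len(t)]        # recover f from f||t
    f_prime = scan_fibers()                    # re-scan the physical COA
    return similarity(f, f_prime) >= threshold

# Illustrative usage with a freshly generated key pair and trivial placeholders.
issuer_key = ec.generate_private_key(ec.SECP256R1())
s, msg = issue_coa(b"compressed-fiber-layout", b"product-id:42", issuer_key)
ok = verify_coa(s, msg, b"product-id:42", issuer_key.public_key(),
                scan_fibers=lambda: b"compressed-fiber-layout",
                similarity=lambda a, b: 1.0 if a == b else 0.0,
                threshold=0.9)
```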
Figure 1: a) An example of a COA with K randomly thrown fibers scanned at a resolution of L × L pixels. Each fiber is of fixed length R. Directional light in the denoted circle lights up the end-points of the fibers that have one end in the circle. b) An exemplary sliding COA scanner with a single 128 × 128 CCD matrix (scanning chamber) and an LED diode (light source).
In order to counterfeit protected objects, the adversary needs to either:

(i) compute the private key of the issuer – a task which can be made arbitrarily difficult by adjusting the key length of the used public-key cryptosystem [13, 1, 6]; or

(ii) devise a manufacturing process that can exactly replicate an already signed COA instance – a task which is not infeasible but requires a certain expense by the malicious party; the forging cost dictates the value that a single COA instance can protect [7]; or

(iii) misappropriate signed COA instances – a responsibility of the organization that issues COA instances.

From that perspective, a COA can be used to protect objects whose value roughly does not exceed the cost of forging a single COA instance, including the accumulated cost of developing a successful adversarial manufacturing process (ii).
1.1 Related Work
To the best of our knowledge, only a few efforts have followed Bauder's work. Pappu has created a class of physical one-way functions via speckle scattering [11]. He has focused on Gabor wavelets to produce short digests of the natural randomness collected from an optical phenomenon. His Ph.D. thesis also contains a solid survey of the related but scarce work [11]. Recently, Kirovski used a system for automatic verification of fiber-based COAs to emphasize the impact of point-(sub)set compression on a COA's forging costs [7]. However, the algorithm presented in [7] is geared towards point-subset compression via a solver for the traveling salesman problem. In addition, the reader considered in [7] forces the adversary to place only one tip of each recorded fiber at an exact location during forgery. This paper proposes a compression algorithm that addresses the trade-off between compression speed and compression ratio more efficiently, while using a scanning device that rectifies the inefficiency of the reader presented in [7]. The scanning device was invented by Chen of Microsoft Research [4].
1.2 Capturing the 3D Statistics of a COA
There are numerous ways in which the three-dimensional structure of a fiber-based COA can be captured. The capturing process should be such that its implementation is inexpensive and that the recorded structure is hard to replicate using an inexpensive manufacturing process. For brevity and simplicity, we assume that a particular type of capturing hardware is used, although numerous similar variants with different performance can be trivially derived. A generic version of the adopted capturing hardware is illustrated in Figure 1b. It consists of an array of bright light sources (e.g., LEDs) and a dark chamber that contains an image scanning device (e.g., a 2-D CCD matrix). As illustrated in Figure 1b, a fiber with one tip located directly underneath the light source conducts light and, as a result, the other tip glows. If this tip is located underneath the scanning chamber, the image capturing device will record it as an illuminated pixel. By sliding the reader back and forth across the COA, the positions of all individual fibers can be identified with high probability. Hence, the read-out of the COA's random structure consists of two different point-sets. For each test instance, the wavelength of the reader's light source can be randomly selected.
2 Point-Set Compression in an N-Dimensional Grid
Problem 1. Point-Set Compression. Given a set of distinct points P = {p_1, ..., p_M} in an N-dimensional finite cubic grid G = {[1, L]}^N, find the shortest binary form that encodes their coordinates.

Let us denote the coordinates of a point p_i in the grid as p_i = {x_i^1, ..., x_i^N}, where each coordinate x_i^j is an integer bounded within 1 ≤ x_i^j ≤ L. We assume that the points from P are located across all points of G randomly and equiprobably. If the points in P are selected randomly,
the length C of the shortest binary form that describes P is lower bounded by the following expectation [9]:

E[C] \geq \log_2 \binom{L^N}{M}.    (1)

We stress that this is an expectation, as points which are not randomly placed in the grid may have an altered lower bound on C.
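For concreteness, the bound of Eqn. (1) is straightforward to evaluate; the sketch below does so in Python. The parameters are illustrative, chosen to resemble the 2D setup of Section 4, with the 2D grid flattened into a string of X·Y cells as discussed in Section 4.1.

```python
from math import comb, log2

def coa_lower_bound(L: int, N: int, M: int) -> float:
    """Expected lower bound (Eqn. 1) on the number of bits needed to encode
    M distinct points drawn uniformly from an N-dimensional grid of side L."""
    return log2(comb(L ** N, M))

# Example: a 256 x 96 scan treated as a 1D string of 256*96 cells holding M = 180 points.
print(round(coa_lower_bound(L=256 * 96, N=1, M=180), 1))
```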
Just like in [7], we construct a point-set compression scheme by encoding a sequence of point-to-point vectors, where the destination of one vector is used as the source of the subsequent one. The encoded path visits all points in P, thus making this a full point-set, not point-subset, compression mechanism. Initially, we assume that the first point π_1 in the path is a given point p_i ∈ P. We are interested in encoding the first vector from π_1 to π_2, where π_2 ∈ P − p_i. If π_2 is randomly selected from P − p_i, then the alphabet that describes all possible symbols for π_2 has L^N − 1 equiprobable symbols. Here, we consider a particularly fast and greedy heuristic for choosing the next point on the path, the nearest point heuristic, i.e., we choose

\pi_i = \arg\min_{p_j \in P - \Pi_i} \|\pi_{i-1} - p_j\|, \qquad \Pi_i = \bigcup_{k=1}^{i-1} \pi_k,    (2)
where the operator ||a − b|| denotes the distance between two points a and b. For now, we consider only the Euclidean distance; however, other distance metrics enable significantly faster implementations at equivalent expected compression rates. If there is more than one point at the minimal Euclidean distance, we choose one of them at random. By definition, this heuristic imposes a straightforward corollary.

Corollary 1. Point-free Zone. For a given current point π_{i−1} and a set of already traversed points Π_i from P, all grid points in the set Γ_i ⊆ G that satisfy the following property:

g \in \Gamma_i \iff g \in \Pi_i \ \vee\ \big(\exists\, \pi_j \in \Pi_i - \pi_{i-1}\big)\ \|g - \pi_j\| < \|\pi_{j+1} - \pi_j\|,    (3)
cannot be part of the alphabet used to encode π_i.

Proof. Straightforward from Eqn. (2).

Hence, the alphabet for a point π_i consists of all points in G − Γ_i. Due to the deployment of the nearest point heuristic, the symbols in this alphabet are not equiprobable. First, we sort the elements of G − Γ_i by their distance from π_{i−1}. Equidistant points are sorted using a simple rule, such as clockwise browsing for N = 2. Let us denote this sorted list as \bar{\Gamma}_i.

Corollary 2. Symbol Encoding. The probability that a certain symbol g ∈ \bar{\Gamma}_i occurs as the destination π_i of the encoding vector from π_{i−1} to π_i equals

p(g, i) \equiv \Pr\Big[ g = \arg\min_{p_j \in P - \Pi_i} \|p_j - \pi_{i-1}\| \Big] = \frac{M - i}{\gamma_i} \Big(1 - \frac{1}{\gamma_i}\Big)^{c-1},    (4)

where c equals the index of g in \bar{\Gamma}_i and γ_i equals the cardinality of \bar{\Gamma}_i.
By assuming the point O = {0}^N to be the starting point π_0 of the resulting path Π_{M+1} through all M points of P, we derive the following conclusion.

Theorem 1. Resulting Entropy. By selecting the points of the resulting path Π_{M+1} using the nearest point heuristic and by using a specific point-sorting rule for symbol encoding, the entropy E of Π_{M+1} equals

E = -\sum_{i=1}^{M} \log_2 \big[ p(\pi_i, i) \big].    (5)
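The encoder-side computation can be sketched compactly, as below. This is a simplified illustration, not the full algorithm: the alphabet at each step is taken to be every untraversed grid cell, i.e., the point-free zone of Corollary 1 is not exploited here (exploiting it would shrink the alphabet and thus the code length), and all points are assumed to lie inside the L × L grid.

```python
import math
from itertools import product

def greedy_path_entropy(points, L):
    """Order `points` with the nearest-point heuristic of Eqn. (2) and
    accumulate the code length of Eqns. (4)-(5) under a uniform-occupancy model."""
    def sq(a, b):                        # squared Euclidean distance
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    grid = [(x, y) for x, y in product(range(1, L + 1), repeat=2)]
    remaining, visited = set(points), set()
    current = (0, 0)                     # starting point O = {0}^N
    path, entropy = [], 0.0
    for _ in range(len(points)):
        nxt = min(remaining, key=lambda p: sq(current, p))        # Eqn. (2)
        alphabet = sorted((g for g in grid if g not in visited),
                          key=lambda g: sq(current, g))
        gamma = len(alphabet)            # cardinality of the sorted alphabet
        c = alphabet.index(nxt) + 1      # 1-based rank of the chosen symbol
        # Eqn. (4), with the (M - i) factor taken as the number of points
        # still to be encoded at this step.
        p = len(remaining) / gamma * (1.0 - 1.0 / gamma) ** (c - 1)
        entropy -= math.log2(p)          # Eqn. (5)
        path.append(nxt)
        visited.add(nxt)
        remaining.remove(nxt)
        current = nxt
    return path, entropy
```

For example, `greedy_path_entropy([(3, 7), (12, 2), (5, 5)], L=16)` returns the visiting order of the three points and the number of bits an ideal entropy coder would need for them under this model.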
By using an efficient encoding mechanism such as arithmetic coding [12], one can near-optimally encode the desired information [14]. In the general case, such an algorithm is not optimal for the overall goal presented in Problem 1, which is NP-hard [7]. We illustrate the compression process using Figure 2. The left subfigure depicts the compression path obtained for a 2D fiber-based COA with a scanning area equal to 96 × 64 pixels and with 100 fibers that are 20 pixels long. The right subfigure depicts the first several point-free zones for the created path. Finally, the algorithm can be adjusted to search for improved encodings by allowing a different point-selection heuristic with a possibility to enter the "point-free zone" using a special symbol (in the general case, a different heuristic may imply that the point-free zone as defined in Corollary 1 can potentially contain a point).
Figure 2: Example of a 2D fiber-based COA with a scanning area of 96 × 64 pixels and 100 fibers of length 20 pixels. The figure on the left depicts the compression path obtained using the algorithm from Section 2. The figure on the right depicts the first several point-free zones for the created path.
3 COA Model
In this section, we present an analytical model of a fiber-based COA instance for the considered sliding COA scanner. We model an important feature of a COA instance S: the probability density function that a particular point in S is illuminated. A COA(X, Y, R, K) instance is defined as a rectangle with dimensions X and Y pixels. We assume that K fibers of fixed
length R are randomly thrown over the COA's area. The COA scanner slides along the x dimension while performing the read-out of the random structure, as illustrated in Figure 1b. We denote a fiber as a tuple f = {A, B} of points A, B ∈ S such that the Euclidean distance between them equals ||A − B|| = R. Finally, we assume that the scanning chamber of the reader is as wide as the COA, i.e., Y pixels, and as long as the fiber length, i.e., R pixels.

Definition 1. Distribution of Glowing Fiber End-Points. Given that the COA scanner is slid along the COA's x dimension, we define the probability density function (pdf) ϕ(x, y) for any point Q(x, y) ∈ S via the probability ξ(P) that a certain area P ⊂ S contains an illuminated end-point A of a fiber f = {A, B}, conditioned on the fact that the end-points' x coordinates satisfy x_B ≤ x_A. More formally, for any P ⊂ S:

\xi(P) = \Pr\big[A \in P \mid f = \{A, B\} \in S,\ x_B \leq x_A\big] = \iint_{Q(x,y) \in P} \varphi(x, y)\, dx\, dy.    (6)
Solving for ϕ(x, y) analytically is a tedious task. We compute ϕ(x, y) approximately using a simple numerical procedure; for brevity and simplicity of presentation in this manuscript, we omit the analytical details of this computation. Figure 3 illustrates the pdf of fiber end-point occurrence in a rectangular COA with dimensions X = 96 and Y = 64 and fiber length R = 20 pixels, sampled at unit points.
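One simple numerical approximation (stated here as an assumption; the exact procedure behind Figure 3 is not detailed in the text) is a Monte Carlo estimate: throw many random fibers of length R into the X × Y rectangle, keep the end-point with the larger x coordinate of each fiber (the glowing end A under the convention x_B ≤ x_A), and histogram its position over unit cells.

```python
import math
import random

def estimate_phi(X, Y, R, trials=200_000, seed=0):
    """Monte Carlo estimate of phi(x, y) over unit cells: a fiber is modeled as a
    uniformly placed end-point plus a uniformly random direction, rejected if its
    other end falls outside the X x Y rectangle (an assumed placement model)."""
    rng = random.Random(seed)
    hist = [[0] * Y for _ in range(X)]
    accepted = 0
    while accepted < trials:
        x0, y0 = rng.uniform(0, X), rng.uniform(0, Y)
        theta = rng.uniform(0, 2 * math.pi)
        x1, y1 = x0 + R * math.cos(theta), y0 + R * math.sin(theta)
        if not (0 <= x1 <= X and 0 <= y1 <= Y):
            continue                     # the whole fiber must lie on the COA
        ax, ay = (x0, y0) if x0 >= x1 else (x1, y1)   # glowing end A: larger x
        hist[int(min(ax, X - 1e-9))][int(min(ay, Y - 1e-9))] += 1
        accepted += 1
    return [[hist[i][j] / accepted for j in range(Y)] for i in range(X)]
```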
Figure 3: An example of the function ϕ(x, y) for a rectangular fiber COA with parameters X = 96, Y = 64, and R = 20, sampled at unit points. The scanner is slid along the x-axis.

It is important to notice that, for relatively large R, the likelihood that an end-point of a fiber lands in a certain small area P ⊂ S varies significantly depending on the particular position of P within S. By using the information about the variance of ϕ(x, y) throughout S, we can improve the performance of the point-set compression algorithm presented in Section 2. The irregular distribution of ϕ(x, y) affects the symbol occurrence probability from Eqn. (4) as follows:
p(g_c, i) \equiv \Pr\Big[ g_c = \arg\min_{p_j \in P - \Pi_i} \|p_j - \pi_{i-1}\| \Big] = (M - i)\, \xi(g_c) \prod_{j=1}^{c-1} \big(1 - \xi(g_j)\big).    (7)
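A direct transcription of Eqn. (7) is given below; ξ(g) is assumed to be obtained by summing the (estimated) ϕ over the grid cell of g, and `sorted_alphabet` is the list of admissible cells ordered by distance from the current point (the sorted alphabet of Corollary 2).

```python
def symbol_probability(sorted_alphabet, c, remaining, xi):
    """Eqn. (7): probability that the c-th closest admissible cell (1-based rank)
    receives the next path point, given `remaining` not-yet-encoded points and a
    per-cell occupancy probability xi(g) derived from phi(x, y)."""
    p = remaining * xi(sorted_alphabet[c - 1])
    for g_j in sorted_alphabet[:c - 1]:      # cells closer to the current point
        p *= 1.0 - xi(g_j)
    return p
```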
4 Empirical Evaluation
In this section, we evaluate the efficacy of the compression mechanism as well as different design points for the overall fiber COA system. First, we look into the compression performance of the algorithm presented in Section 2. In this case, we analyze several COA configurations of fixed dimensions, i.e., X = 256 by Y = 96 cells, and a range of fiber counts, 20 ≤ M ≤ 180. Note that the fiber length affects the compression only when the distribution of fiber points is taken into account, as presented in Eqn. (7). We present the results in Figures 4 and 5.

Figure 4 depicts the performance of the point-set compression algorithm compared to the expected lower bound from Eqn. (1). The presented metric equals the difference between the number of bits C used by our point-set compression algorithm and the expected lower bound on the point-selection entropy quantified in Eqn. (1). Two different datasets are used. The first one is marked with • and depicts the variant of the compression scheme that does not take into account the fiber distribution function ϕ(), while the second one is marked with ◦ and depicts the variant that takes ϕ() into account. In the second case, we considered fixed fiber lengths from the set R ∈ {10, 20, 30, 40} pixels. Note that the first dataset results in solutions exceptionally close to the expected bound, in certain cases even outperforming it. This happens rarely and is due to the imperfection of the random number generator used in the experiments. By considering the distribution of fibers on the COA, certain minimal improvements can be obtained at a significant cost in compression and decompression speed. These improvements are small, within 1–3%, and larger when fibers are longer. In summary, due to the exceptional proximity of its performance to the expected lower bound, we conclude that the algorithm is near-optimal.
Figure 4: Difference between the number of bits used by the presented point-set compression algorithm and the expected lower bound quantified in Eqn. (1). Two different datasets are used: one is marked with • symbols and depicts the variant of the compression scheme that does not take into account the fiber distribution function ϕ(), while the second one is marked with ◦ symbols and depicts the variant that takes ϕ() into account.
Figure 5 depicts the compression ratio achieved by our algorithm in the case when ϕ() is not considered during compression. The abscissa of the plot quantifies the number of fibers contained within the COA, which ranges within M ∈ [20, 180]. The left ordinate illustrates the total number of bits used to compress all points in the COA, whereas the right ordinate quantifies the average number of bits used to represent a point in the compressed form. The figure illustrates results that stem from ten randomly generated instances for each set of design parameters.
Figure 5: Achieved compression ratio when ϕ() is not considered during compression. The abscissa quantifies the number of fibers contained within the COA. The left ordinate illustrates the number of bits used to compress all points in the COA, whereas the right ordinate quantifies the average number of bits used to represent a point in the compressed form.
4.1 Algorithm Implementation
The proposed algorithm can have several variants with different run-times, depending on whether the implementation is in hardware or software. First, note that an N-dimensional cubic grid can be reduced to an (N − 1)-dimensional grid by making the grid at one of the remaining dimensions L times longer. For example, an L × L grid can be represented as a string (1D grid) with L² points. Hardware implementations should typically solve the N-dimensional problem using a 2D representation of the original grid, because of the planar computation parallelism that is easily implemented on chip. On the other hand, software implementations should typically perform the compression in 1D or 2D.

Although we presented the algorithm using the Euclidean norm, other distance metrics can be employed. Since the Euclidean distance is relatively costly to compute, we propose to use the Manhattan or the max1D distance measure (i.e., the sup norm), defined as ||(x_i, y_i) − (x_j, y_j)|| ≡ max(|x_i − x_j|, |y_i − y_j|). Consequently, the computational complexity of the proposed method is significantly reduced due to the speedup in computing the distance between two grid points and in testing whether a given point is in a certain point-free zone. Hence, we prefer the max1D distance in our implementation. Note that the expected compression rate does not depend upon the selected distance measure.
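A minimal illustration of the metric swap: the max1D (Chebyshev) distance needs only integer comparisons, and testing membership in a max1D ball reduces to an axis-aligned box test.

```python
def max1d(p, q):
    """max1D (sup-norm) distance between two 2D grid points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def in_max1d_ball(g, center, radius):
    """Whether grid point g lies strictly inside the max1D ball of the given radius
    around `center` -- the shape taken by point-free zones under this metric."""
    return max1d(g, center) < radius
```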
The overall computational complexity of the proposed method stems from two sources. The first one is the computation of mutual distances for all points within P, which can be done in O(M²) operations. The second one is counting the points in G − Γ_i from the source to the closest destination, whose complexity is O(XY), where X and Y denote the horizontal and vertical dimensions of a 2D COA. The counting process is iterated for each point. Thus, the algorithm requires O(M² + M·XY) operations, with a memory requirement of XY bits for marking the point-free zone. Furthermore, it can be shown that by using more efficient counting algorithms, the overall complexity can be as low as O(M² + M·√(XY)) in 2D, or even O(M²) when the grid is represented as a point string.

Assuming that XY can be prohibitively large, it is important to consider the following variant of the compression algorithm. When radially (in the max1D sense) counting the points in the G − Γ_i zone in order to compute p(g, i) as in Eqn. (4), one can constrain the maximal distance to the next point to T units, where T is smaller than the maximal distance D between all points in P. Both D and T are encoded in the header of the compressed stream. For a given point x_i, in case its nearest unprocessed neighbor x_j is such that ||x_i − x_j|| < T, the compressor emits a special symbol and encodes x_j only within the grid points that belong to G − Γ_i and to the circle of radius T centered at x_i. More formally, in this case the probability p(g, i) is computed over

\bar{\Gamma}_i = (G - \Gamma_i) \cap \{ g \in G : \|g - x_i\| < T \}.    (8)

If ||x_i − x_j|| ≥ T, we first mark the zone ||x_i − x_j|| < T as point-free, i.e., Γ_i = Γ_i ∪ {g ∈ G : ||g − x_i|| < T}, and compute p(g, i) over

\bar{\Gamma}_i = (G - \Gamma_i) \cap \{ g \in G : \|g - x_i\| < D \}.    (9)
The value of T balances the (de)compression speed. The probability of occurrence of the special symbol is encoded in the header of the compressed file. This approach can be generalized to an array of distance bounds T_1, ..., T_J, D. Then, one of the bounds T_i is encoded in the compressed stream prior to encoding each point, to signal the ring (all points g within T_{i−1} < ||x_i − g|| ≤ T_i) considered for encoding of the next symbol.
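The per-point decision and the restricted alphabet of this variant can be sketched as follows, assuming the `max1d` distance from the earlier sketch; the entropy coder itself, the header fields for T, D, and the special-symbol probability, and the multi-ring generalization are omitted.

```python
def nearest_within(current, remaining, T, dist):
    """Return (within_T, nearest): the nearest unprocessed point and whether it lies
    within distance T of `current`. When the flag is True, the compressor emits the
    special symbol and encodes `nearest` against the radius-T alphabet of Eqn. (8)."""
    nearest = min(remaining, key=lambda p: dist(current, p))
    return dist(current, nearest) < T, nearest

def bounded_alphabet(current, point_free, grid, T, D, within_T, dist):
    """Candidate cells for the next symbol: Eqn. (8) when within_T is True, Eqn. (9)
    otherwise (the radius-T ball around `current` is then first marked point-free)."""
    if not within_T:
        point_free.update(g for g in grid if dist(current, g) < T)
    bound = T if within_T else D
    cells = [g for g in grid if g not in point_free and dist(current, g) < bound]
    return sorted(cells, key=lambda g: dist(current, g))
```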
5 Acknowledgements
The author would like to thank Yuqun Chen, M. Kivanç Mihçak, Gideon Yuval, Yacov Yacobi, and Gary Starkweather, all from Microsoft Research, for useful discussions that have impacted the technical content of this paper.
6 Conclusion
A certificate of authenticity (COA) is an inexpensive physical object that has a random unique structure with a high cost of near-exact reproduction. Bauder was the first to propose COAs created by randomly embedding a set of fixed-length fibers into a transparent gluing material that permanently fixes the positions of the fibers within.
This paper introduces a novel, simple, and generalized algorithm that compresses M points in an N-dimensional grid. We compare its performance with an expected lower bound. As the compression algorithm performed in all instances at a minimal offset with respect to the expected lower bound, we conclude that it is near-optimal. With computational complexity proportional to O(M²), the algorithm can be used in the intended application to produce compressed point-sets in a split second. The algorithm can also be used for numerous other applications, such as the storage of biometric patterns or of answers to queries into database relations.
References

[1] ANSI X9.62-1998. Public Key Cryptography for the Financial Services Industry: The Elliptic Curve Digital Signature Algorithm (ECDSA), 1998. On-line at: http://www.x9.org.
[2] D. W. Bauder. Personal communication.
[3] D. W. Bauder. An Anti-Counterfeiting Concept for Currency Systems. Research report PTK-11990, Sandia National Labs, Albuquerque, NM, 1983.
[4] Y. Chen. Personal communication, 2002.
[5] S. Church and D. Littman. Machine Reading of Visual Counterfeit Deterrent Features and Summary of US Research, 1980-90. Four Nation Group on Advanced Counterfeit Deterrence, Canada, 1991.
[6] IEEE 1363-2000: Standard Specifications for Public Key Cryptography, 2000. On-line at: http://grouper.ieee.org/groups/1363.
[7] D. Kirovski. Toward an Automated Verification of Certificates of Authenticity. ACM Electronic Commerce, pp. 160-169, 2004.
[8] A. J. Menezes, et al. Handbook of Applied Cryptography. CRC Press, 1996.
[9] M. K. Mihçak. Personal communication, 2004.
[10] Commission on Engineering and Technical Systems (CETS). Counterfeit Deterrent Features for the Next-Generation Currency Design. The National Academy Press, 1993.
[11] R. Pappu. Physical One-Way Functions. Ph.D. thesis, MIT, 2001.
[12] J. Rissanen. Modeling by Shortest Data Description. Automatica, vol. 14, pp. 465-471, 1978.
[13] R. L. Rivest, et al. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, vol. 21, no. 2, pp. 120-126, 1978.
[14] C. E. Shannon. Prediction and Entropy of Printed English. Bell System Technical Journal, pp. 50-64, 1951.