A Point-Set Compression Heuristic for Fiber-Based Certificates of Authenticity

Darko Kirovski
Microsoft Research, One Microsoft Way, Redmond, WA 98052
[email protected]

Abstract

A certificate of authenticity (COA) is an inexpensive physical object that has a random unique structure with a high cost of near-exact reproduction. An additional requirement is that the uniqueness of a COA's random structure can be verified using an inexpensive device. Bauder was the first to propose COAs created by randomly embedding a set of fixed-length fibers into a transparent gluing material that permanently fixes the positions of the fibers within. Recently, Kirovski showed that a linear improvement in the compression ratio of the point-set compression algorithm used to store the fibers' locations yields an exponential increase in the cost of forging a fiber-based COA instance. To address this issue, in this paper we introduce a novel, generalized heuristic that compresses M points in an N-dimensional grid with computational complexity proportional to O(M²). We compare its performance with an expected lower bound. The heuristic can be used for numerous other applications, such as the storage of biometric patterns.
1 Introduction
A certificate of authenticity (COA) is a digitally signed physical object that has a random unique structure which satisfies three requirements:

• the cost of creating and signing original COAs is small, relative to a desired level of security;
• the cost of manufacturing a COA instance is several orders of magnitude lower than the cost of exact or near-exact replication of the unique and random physical structure of this instance; and
• the cost of verifying the authenticity of a signed COA is small, again relative to a desired level of security.

To the best of our knowledge, COAs were first introduced for weapons-control verification purposes during the Cold War by Bauder and Simmons at the Sandia National Labs [3, 2]. Bauder was the first to propose COAs created as a collection of fibers randomly positioned in an object using a transparent gluing material which permanently fixes the fibers' positions [3, 2]. He also proposed fiber-based COAs for banknote protection; the fibers in that proposal were fixed using a semi-transparent material such as paper [3]. Readout of the random structure of a fiber-based COA can be performed in numerous ways using the following fact: if one end-point of a fiber is exposed to light, the other one will
illuminate, as illustrated in Figure 1a. In a sense, this is the "third dimension" of the COA, which cannot be replicated using an inexpensive scan-print device. An exemplary detector implementation, with a light bar and an array of photo-detectors passing over the fibers, has been developed in an effort to provide a strong counterfeit deterrent for banknotes [5, 10].

A COA instance is issued in the following way. First, a certain hard-to-replicate statistic of the COA's unique structure (e.g., the random positions of the COA's fibers) is digitized and compressed; we denote this message as f. Next, f is concatenated with the associated textual information t (e.g., product ID, expiration date). The resulting message f||t is hashed using a cryptographically secure hash algorithm such as SHA1 [8]; we denote this hash as h. Message h is then signed using the private key of the issuer, following a readily available public-key cryptography standard such as IEEE 1363 [6, 8]. Finally, the resulting signature s is concatenated with f||t and imprinted on the COA as a barcode to validate that the produced instance is authentic. Each COA instance is associated with an object whose authenticity the issuer wants to vouch for. Due to significantly shorter signatures at a comparable level of security, cryptographic routines based on elliptic curves, such as EC-DSA [1], are preferred over RSA [13].

COA verification involves the following tasks. The verifier initially scans the printed components: the text t and the barcode. The barcode is decoded into the original signature s and the signed layout of fibers f. Next, the signature s is verified against the hash h of f||t using the issuer's public key. Finally, the verifier scans the statistical properties of the associated COA instance, creates their representation f', and compares f' to the extracted f. If the level of their similarity surpasses a certain threshold, the verifier declares the COA authentic; otherwise, the instance is rejected.
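The issuing and verification flow above can be summarized in a short sketch. This is a minimal illustration, not the author's implementation: it assumes Python's `cryptography` package and signs with EC-DSA over SHA-256 (the paper specifies SHA1 hashing and an IEEE 1363-style signature; here the library hashes f||t internally, playing the role of h). The `scan_fibers` and `similarity` callables stand in for the application-specific scanner readout and matching rule.

```python
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives import hashes
from cryptography.exceptions import InvalidSignature

def issue_coa(f: bytes, t: bytes, private_key) -> tuple[bytes, bytes]:
    """Sign the compressed fiber layout f concatenated with the text t;
    the signature s together with f||t is printed on the COA as a barcode."""
    message = f + t
    s = private_key.sign(message, ec.ECDSA(hashes.SHA256()))  # hash computed internally
    return s, message

def verify_coa(s: bytes, message: bytes, t: bytes, public_key,
               scan_fibers, similarity, threshold: float) -> bool:
    """Verify the signature over f||t, then compare the signed layout f
    against a fresh scan f' of the physical instance."""
    try:
        public_key.verify(s, message, ec.ECDSA(hashes.SHA256()))
    except InvalidSignature:
        return False
    f = message[:len(message) - len(t)]        # recover f from f||t
    f_prime = scan_fibers()                    # re-scan the physical COA
    return similarity(f, f_prime) >= threshold

# Illustrative usage with a freshly generated key pair and trivial placeholders.
issuer_key = ec.generate_private_key(ec.SECP256R1())
s, msg = issue_coa(b"compressed-fiber-layout", b"product-id:42", issuer_key)
ok = verify_coa(s, msg, b"product-id:42", issuer_key.public_key(),
                scan_fibers=lambda: b"compressed-fiber-layout",
                similarity=lambda a, b: 1.0 if a == b else 0.0,
                threshold=0.9)
```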
Figure 1: a) An example of a COA with K randomly thrown fibers scanned at a resolution of L × L pixels. Each fiber is of fixed length R. Directional light in the denoted circle lights up the end-points of the fibers that have one end in the circle. b) An exemplary sliding COA scanner with a single 128 × 128 CCD matrix (scanning chamber) and an LED diode (light source).
In order to counterfeit protected objects, the adversary needs to either:

(i) compute the private key of the issuer – a task which can be made arbitrarily difficult by adjusting the key length of the used public-key cryptosystem [13, 1, 6]; or

(ii) devise a manufacturing process that can exactly replicate an already signed COA instance – a task which is not infeasible but requires a certain expense by the malicious party; the forging cost dictates the value that a single COA instance can protect [7]; or

(iii) misappropriate signed COA instances – a responsibility of the organization that issues COA instances.

From that perspective, a COA can be used to protect objects whose value roughly does not exceed the cost of forging a single COA instance, including the accumulated cost of developing a successful adversarial manufacturing process (ii).
1.1 Related Work
To the best of our knowledge, only a few efforts have followed Bauder's work. Pappu has created a class of physical one-way functions via speckle scattering [11]. He has focused on Gabor wavelets to produce short digests of the natural randomness collected from an optical phenomenon. His Ph.D. thesis also contains a solid survey of the related but scarce work [11]. Recently, Kirovski used a system for automatic verification of fiber-based COAs to emphasize the impact of point-(sub)set compression on a COA's forging costs [7]. However, the algorithm presented in [7] is geared towards point-subset compression via a solver for the traveling salesman problem. In addition, the reader considered in [7] forces the adversary to place only one tip of each recorded fiber at an exact location during forgery. This paper proposes a compression algorithm that addresses the trade-off between compression speed and compression ratio more efficiently, while using a scanning device that rectifies the inefficiency of the reader presented in [7]. The scanning device was invented by Chen of Microsoft Research [4].
1.2 Capturing the 3D Statistics of a COA
There are numerous ways in which the three-dimensional structure of a fiber-based COA can be captured. The capturing process should be such that its implementation is inexpensive and that the recorded structure is hard to replicate using an inexpensive manufacturing process. For brevity and simplicity, we assume that a particular type of capturing hardware is used, although numerous similar variants with different performance can be trivially derived. A generic version of the adopted capturing hardware is illustrated in Figure 1b. It consists of an array of bright light sources (e.g., LEDs) and a dark chamber that contains an image scanning device (e.g., a 2-D CCD matrix). As illustrated in Figure 1b, a fiber with one tip located directly underneath the light source conducts light and, as a result, the other tip glows. If this tip is located underneath the scanning chamber, the image capturing device will record it as an illuminated pixel. By sliding the reader back and forth across the COA, the positions of all individual fibers can be identified with high probability. Hence, the read-out of the COA's random structure consists of two different point-sets. For each test instance, the wavelength of the reader's light source can be randomly selected.
2 Point-Set Compression in an N-Dimensional Grid
Problem 1. Point-Set Compression. Given a set of distinct points P = {p_1, ..., p_M} in an N-dimensional finite cubic grid G = {[1, L]}^N, find the shortest binary form that encodes their coordinates.

Let us denote the coordinates of a point p_i in the grid as p_i = {x_i^1, ..., x_i^N}, where each coordinate x_i^j is an integer bounded within 1 ≤ x_i^j ≤ L. We assume that the points from P are located across all points of G randomly and equiprobably. If the points in P are selected randomly,
the length C of the shortest binary form that describes P is lower bounded by the following expectation [9]:

E[C] \geq \log_2 \binom{L^N}{M}.    (1)

We stress that this is an expectation, as points which are not randomly placed in the grid may have an altered lower bound on C.
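For concreteness, the bound of Eqn. (1) is straightforward to evaluate; the sketch below does so in Python. The parameters are illustrative, chosen to resemble the 2D setup of Section 4, with the 2D grid flattened into a string of X·Y cells as discussed in Section 4.1.

```python
from math import comb, log2

def coa_lower_bound(L: int, N: int, M: int) -> float:
    """Expected lower bound (Eqn. 1) on the number of bits needed to encode
    M distinct points drawn uniformly from an N-dimensional grid of side L."""
    return log2(comb(L ** N, M))

# Example: a 256 x 96 scan treated as a 1D string of 256*96 cells holding M = 180 points.
print(round(coa_lower_bound(L=256 * 96, N=1, M=180), 1))
```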
Just like in [7], we construct a point-set compression scheme by encoding a sequence of point-to-point vectors, where the destination of one vector is used as the source of the subsequent one. The encoded path visits all points in P, thus making this a full point-set, not point-subset, compression mechanism. Initially, we assume that the first point π_1 in the path is a given point p_i ∈ P. We are interested in encoding the first vector from π_1 to π_2, where π_2 ∈ P − p_i. If π_2 is randomly selected from P − p_i, then the alphabet that describes all possible symbols for π_2 has L^N − 1 equiprobable symbols. Here, we consider a particularly fast and greedy heuristic for choosing the next point on the path, the nearest point heuristic, i.e., we choose

\pi_i = \arg\min_{p_j \in P - \Pi_i} \|\pi_{i-1} - p_j\|, \qquad \Pi_i = \bigcup_{k=1}^{i-1} \pi_k,    (2)
where the operator ||a − b|| denotes the distance between two points a and b. For now, we consider only the Euclidean distance; however, other distance metrics enable significantly faster implementations at equivalent expected compression rates. If there is more than one point at the minimal Euclidean distance, we choose one of them at random. By definition, this heuristic imposes a straightforward corollary.

Corollary 1. Point-free Zone. For a given current point π_{i−1} and a set of already traversed points Π_i from P, all grid points in the set Γ_i ⊆ G that satisfy the following property:

g \in \Gamma_i \iff g \in \Pi_i \ \vee\ \big(\exists\, \pi_j \in \Pi_i - \pi_{i-1}\big)\ \|g - \pi_j\| < \|\pi_{j+1} - \pi_j\|,    (3)
cannot be part of the alphabet used to encode π_i.

Proof. Straightforward from Eqn. (2).

Hence, the alphabet for a point π_i consists of all points in G − Γ_i. Due to the deployment of the nearest point heuristic, the symbols in this alphabet are not equiprobable. First, we sort the elements of G − Γ_i by their distance from π_{i−1}. Equidistant points are sorted using a simple rule, such as clockwise browsing for N = 2. Let us denote this sorted list as \bar{\Gamma}_i.

Corollary 2. Symbol Encoding. The probability that a certain symbol g ∈ \bar{\Gamma}_i occurs as the destination π_i of the encoding vector from π_{i−1} to π_i equals

p(g, i) \equiv \Pr\Big[ g = \arg\min_{p_j \in P - \Pi_i} \|p_j - \pi_{i-1}\| \Big] = \frac{M - i}{\gamma_i} \Big(1 - \frac{1}{\gamma_i}\Big)^{c-1},    (4)

where c equals the index of g in \bar{\Gamma}_i and γ_i equals the cardinality of \bar{\Gamma}_i.
By assuming the point O = {0}^N to be the starting point π_0 of the resulting path Π_{M+1} through all M points of P, we derive the following conclusion.

Theorem 1. Resulting Entropy. By selecting the points of the resulting path Π_{M+1} using the nearest point heuristic and by using a specific point-sorting rule for symbol encoding, the entropy E of Π_{M+1} equals

E = -\sum_{i=1}^{M} \log_2 \big[ p(\pi_i, i) \big].    (5)
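The encoder-side computation can be sketched compactly, as below. This is a simplified illustration, not the full algorithm: the alphabet at each step is taken to be every untraversed grid cell, i.e., the point-free zone of Corollary 1 is not exploited here (exploiting it would shrink the alphabet and thus the code length), and all points are assumed to lie inside the L × L grid.

```python
import math
from itertools import product

def greedy_path_entropy(points, L):
    """Order `points` with the nearest-point heuristic of Eqn. (2) and
    accumulate the code length of Eqns. (4)-(5) under a uniform-occupancy model."""
    def sq(a, b):                        # squared Euclidean distance
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    grid = [(x, y) for x, y in product(range(1, L + 1), repeat=2)]
    remaining, visited = set(points), set()
    current = (0, 0)                     # starting point O = {0}^N
    path, entropy = [], 0.0
    for _ in range(len(points)):
        nxt = min(remaining, key=lambda p: sq(current, p))        # Eqn. (2)
        alphabet = sorted((g for g in grid if g not in visited),
                          key=lambda g: sq(current, g))
        gamma = len(alphabet)            # cardinality of the sorted alphabet
        c = alphabet.index(nxt) + 1      # 1-based rank of the chosen symbol
        # Eqn. (4), with the (M - i) factor taken as the number of points
        # still to be encoded at this step.
        p = len(remaining) / gamma * (1.0 - 1.0 / gamma) ** (c - 1)
        entropy -= math.log2(p)          # Eqn. (5)
        path.append(nxt)
        visited.add(nxt)
        remaining.remove(nxt)
        current = nxt
    return path, entropy
```

For example, `greedy_path_entropy([(3, 7), (12, 2), (5, 5)], L=16)` returns the visiting order of the three points and the number of bits an ideal entropy coder would need for them under this model.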
By using an efficient encoding mechanism such as arithmetic coding [12], one can near-optimally encode the desired information [14]. In the general case, such an algorithm is not optimal for the overall goal presented in Problem 1, which is NP-hard [7]. We illustrate the compression process using Figure 2. The left subfigure depicts the compression path obtained for a 2D fiber-based COA with a scanning area equal to 96 × 64 pixels and with 100 fibers that are 20 pixels long. The right subfigure depicts the first several point-free zones for the created path. Finally, the algorithm can be adjusted to search for improved encodings by allowing a different point-selection heuristic with a possibility to enter the "point-free zone" using a special symbol (in the general case, a different heuristic may imply that the point-free zone as defined in Corollary 1 can potentially contain a point).
Figure 2: Example of a 2D fiber-based COA with a scanning area of 96 × 64 pixels and 100 fibers of length 20 pixels. The figure on the left depicts the compression path obtained using the algorithm from Section 2. The figure on the right depicts the first several point-free zones for the created path.
3 COA Model
In this section, we present an analytical model of a fiber-based COA instance for the considered sliding COA scanner. We model an important feature of a COA instance S: the probability density function that a particular point in S is illuminated. A COA(X, Y, R, K) instance is defined as a rectangle with dimensions X and Y pixels. We assume that K fibers of fixed
length R are randomly thrown over the COA's area. The COA scanner slides along the x dimension while performing the read-out of the random structure, as illustrated in Figure 1b. We denote a fiber as a tuple f = {A, B} of points A, B ∈ S such that the Euclidean distance between them equals ||A − B|| = R. Finally, we assume that the scanning chamber of the reader is as wide as the COA, i.e., Y pixels, and as long as the fiber length, i.e., R pixels.

Definition 1. Distribution of Glowing Fiber End-Points. Given that the COA scanner is slid along the COA's x dimension, we define the probability density function (pdf) ϕ(x, y) for any point Q(x, y) ∈ S via the probability ξ(P) that a certain area P ⊂ S contains an illuminated end-point A of a fiber f = {A, B}, conditioned on the fact that the end-points' x coordinates satisfy x_B ≤ x_A. More formally, for any P ⊂ S:

\xi(P) = \Pr\big[A \in P \mid f = \{A, B\} \in S,\ x_B \leq x_A\big] = \iint_{Q(x,y) \in P} \varphi(x, y)\, dx\, dy.    (6)
Solving for ϕ(x, y) analytically is a tedious task. We compute ϕ(x, y) approximately using a simple numerical procedure; for brevity and simplicity of presentation in this manuscript, we omit the analytical details of this computation. Figure 3 illustrates the pdf of fiber end-point occurrence in a rectangular COA with dimensions X = 96 and Y = 64 and fiber length R = 20 pixels, sampled at unit points.
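One simple numerical approximation (stated here as an assumption; the exact procedure behind Figure 3 is not detailed in the text) is a Monte Carlo estimate: throw many random fibers of length R into the X × Y rectangle, keep the end-point with the larger x coordinate of each fiber (the glowing end A under the convention x_B ≤ x_A), and histogram its position over unit cells.

```python
import math
import random

def estimate_phi(X, Y, R, trials=200_000, seed=0):
    """Monte Carlo estimate of phi(x, y) over unit cells: a fiber is modeled as a
    uniformly placed end-point plus a uniformly random direction, rejected if its
    other end falls outside the X x Y rectangle (an assumed placement model)."""
    rng = random.Random(seed)
    hist = [[0] * Y for _ in range(X)]
    accepted = 0
    while accepted < trials:
        x0, y0 = rng.uniform(0, X), rng.uniform(0, Y)
        theta = rng.uniform(0, 2 * math.pi)
        x1, y1 = x0 + R * math.cos(theta), y0 + R * math.sin(theta)
        if not (0 <= x1 <= X and 0 <= y1 <= Y):
            continue                     # the whole fiber must lie on the COA
        ax, ay = (x0, y0) if x0 >= x1 else (x1, y1)   # glowing end A: larger x
        hist[int(min(ax, X - 1e-9))][int(min(ay, Y - 1e-9))] += 1
        accepted += 1
    return [[hist[i][j] / accepted for j in range(Y)] for i in range(X)]
```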
Figure 3: An example of the function ϕ(x, y) for a rectangular fiber COA with parameters X = 96, Y = 64, and R = 20, sampled at unit points. The scanner is slid along the x-axis.

It is important to notice that, for relatively large R, the likelihood that an end-point of a fiber lands in a certain small area P ⊂ S varies significantly depending on the particular position of P within S. By using the information about the variance of ϕ(x, y) throughout S, we can improve the performance of the point-set compression algorithm presented in Section 2. The irregular distribution of ϕ(x, y) affects the symbol occurrence probability from Eqn. (4) as follows:
p(g_c, i) \equiv \Pr\Big[ g_c = \arg\min_{p_j \in P - \Pi_i} \|p_j - \pi_{i-1}\| \Big] = (M - i)\, \xi(g_c) \prod_{j=1}^{c-1} \big(1 - \xi(g_j)\big).    (7)
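A direct transcription of Eqn. (7) is given below; ξ(g) is assumed to be obtained by summing the (estimated) ϕ over the grid cell of g, and `sorted_alphabet` is the list of admissible cells ordered by distance from the current point (the sorted alphabet of Corollary 2).

```python
def symbol_probability(sorted_alphabet, c, remaining, xi):
    """Eqn. (7): probability that the c-th closest admissible cell (1-based rank)
    receives the next path point, given `remaining` not-yet-encoded points and a
    per-cell occupancy probability xi(g) derived from phi(x, y)."""
    p = remaining * xi(sorted_alphabet[c - 1])
    for g_j in sorted_alphabet[:c - 1]:      # cells closer to the current point
        p *= 1.0 - xi(g_j)
    return p
```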
4 Empirical Evaluation
In this section, we evaluate the efficacy of the compression mechanism as well as different design points for the overall fiber COA system. First, we look into the compression performance of the algorithm presented in Section 2. In this case, we analyze several COA configurations of fixed dimensions, i.e., X = 256 by Y = 96 cells, and a range of fiber counts, 20 ≤ M ≤ 180. Note that the fiber length affects the compression only when the distribution of fiber points is taken into account, as presented in Eqn. (7). We present the results in Figures 4 and 5.

Figure 4 depicts the performance of the point-set compression algorithm compared to the expected lower bound from Eqn. (1). The presented metric equals the difference between the number of bits C used by our point-set compression algorithm and the expected lower bound on the point-selection entropy quantified in Eqn. (1). Two different datasets are used. The first one is marked with • and depicts the variant of the compression scheme that does not take into account the fiber distribution function ϕ(), while the second one is marked with ◦ and depicts the variant that takes ϕ() into account. In the second case, we considered fixed fiber lengths from the set R ∈ {10, 20, 30, 40} pixels. Note that the first dataset results in solutions exceptionally close to the expected bound, in certain cases even outperforming it. This happens rarely and is due to the imperfection of the random number generator used in the experiments. By considering the distribution of fibers on the COA, certain minimal improvements can be obtained at a significant cost in compression and decompression speed. These improvements are small, within 1–3%, and larger when fibers are longer. In summary, due to the exceptional proximity of its performance to the expected lower bound, we conclude that the algorithm is near-optimal.
Figure 4: Difference between the number of bits used by the presented point-set compression algorithm and the expected lower bound quantified in Eqn. (1). Two different datasets are used: one is marked with • symbols and depicts the variant of the compression scheme that does not take into account the fiber distribution function ϕ(), while the second one is marked with ◦ symbols and depicts the variant that takes ϕ() into account.
Figure 5 depicts the compression ratio achieved by our algorithm in the case when ϕ() is not considered during compression. The abscissa of the plot quantifies the number of fibers contained within the COA, which ranges within M ∈ [20, 180]. The left ordinate illustrates the total number of bits used to compress all points in the COA, whereas the right ordinate quantifies the average number of bits used to represent a point in the compressed form. The figure illustrates results that stem from ten randomly generated instances for each set of design parameters.
Figure 5: Achieved compression ratio when ϕ() is not considered during compression. The abscissa quantifies the number of fibers contained within the COA. The left ordinate illustrates the number of bits used to compress all points in the COA, whereas the right ordinate quantifies the average number of bits used to represent a point in the compressed form.
4.1 Algorithm Implementation
The proposed algorithm can have several variants with different run-times, depending on whether the implementation is in hardware or software. First, note that an N-dimensional cubic grid can be reduced to an (N − 1)-dimensional grid by making the grid at one of the remaining dimensions L times longer. For example, an L × L grid can be represented as a string (1D grid) with L² points. Hardware implementations should typically solve the N-dimensional problem using a 2D representation of the original grid, because of the planar computation parallelism that is easily implemented on chip. On the other hand, software implementations should typically perform the compression in 1D or 2D.

Although we presented the algorithm using the Euclidean norm, other distance metrics can be employed. Since the Euclidean distance is relatively costly to compute, we propose to use the Manhattan or the max1D distance measure (i.e., the sup norm), defined as ||(x_i, y_i) − (x_j, y_j)|| ≡ max(|x_i − x_j|, |y_i − y_j|). Consequently, the computational complexity of the proposed method is significantly reduced due to the speedup in computing the distance between two grid points and in testing whether a given point is in a certain point-free zone. Hence, we prefer the max1D distance in our implementation. Note that the expected compression rate does not depend upon the selected distance measure.
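A minimal illustration of the metric swap: the max1D (Chebyshev) distance needs only integer comparisons, and testing membership in a max1D ball reduces to an axis-aligned box test.

```python
def max1d(p, q):
    """max1D (sup-norm) distance between two 2D grid points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def in_max1d_ball(g, center, radius):
    """Whether grid point g lies strictly inside the max1D ball of the given radius
    around `center` -- the shape taken by point-free zones under this metric."""
    return max1d(g, center) < radius
```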
The overall computational complexity of the proposed method stems from two sources. The first one is the computation of mutual distances for all points within P, which can be done in O(M²) operations. The second one is counting the points in G − Γ_i from the source to the closest destination, whose complexity is O(XY), where X and Y denote the horizontal and vertical dimensions of a 2D COA. The counting process is iterated for each point. Thus, the algorithm requires O(M² + M·XY) operations, with a memory requirement of XY bits for marking the point-free zone. Furthermore, it can be shown that by using more efficient counting algorithms, the overall complexity can be as low as O(M² + M·√(XY)) in 2D, or even O(M²) when the grid is represented as a point string.

Assuming that XY can be prohibitively large, it is important to consider the following variant of the compression algorithm. When radially (in the max1D sense) counting the points in the G − Γ_i zone in order to compute p(g, i) as in Eqn. (4), one can constrain the maximal distance to the next point to T units, where T is smaller than the maximal distance D between all points in P. Both D and T are encoded in the header of the compressed stream. For a given point x_i, in case its nearest unprocessed neighbor x_j is such that ||x_i − x_j|| < T, the compressor emits a special symbol and encodes x_j only within the grid points that belong to G − Γ_i and to the circle of radius T centered at x_i. More formally, in this case the probability p(g, i) is computed over

\bar{\Gamma}_i = (G - \Gamma_i) \cap \{ g \in G : \|g - x_i\| < T \}.    (8)

If ||x_i − x_j|| ≥ T, we first mark the zone ||x_i − x_j|| < T as point-free, i.e., Γ_i = Γ_i ∪ {g ∈ G : ||g − x_i|| < T}, and compute p(g, i) over

\bar{\Gamma}_i = (G - \Gamma_i) \cap \{ g \in G : \|g - x_i\| < D \}.    (9)
The value of T balances the (de)compression speed. The probability of occurrence of the special symbol is encoded in the header of the compressed file. This approach can be generalized to an array of distance bounds T_1, ..., T_J, D. Then, one of the bounds T_i is encoded in the compressed stream prior to encoding each point, to signal the ring (all points g within T_{i−1} < ||x_i − g|| ≤ T_i) considered for encoding of the next symbol.
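The per-point decision and the restricted alphabet of this variant can be sketched as follows, assuming the `max1d` distance from the earlier sketch; the entropy coder itself, the header fields for T, D, and the special-symbol probability, and the multi-ring generalization are omitted.

```python
def nearest_within(current, remaining, T, dist):
    """Return (within_T, nearest): the nearest unprocessed point and whether it lies
    within distance T of `current`. When the flag is True, the compressor emits the
    special symbol and encodes `nearest` against the radius-T alphabet of Eqn. (8)."""
    nearest = min(remaining, key=lambda p: dist(current, p))
    return dist(current, nearest) < T, nearest

def bounded_alphabet(current, point_free, grid, T, D, within_T, dist):
    """Candidate cells for the next symbol: Eqn. (8) when within_T is True, Eqn. (9)
    otherwise (the radius-T ball around `current` is then first marked point-free)."""
    if not within_T:
        point_free.update(g for g in grid if dist(current, g) < T)
    bound = T if within_T else D
    cells = [g for g in grid if g not in point_free and dist(current, g) < bound]
    return sorted(cells, key=lambda g: dist(current, g))
```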
5 Acknowledgements
The author would like to thank Yuqun Chen, M. Kivanç Mihçak, Gideon Yuval, Yacov Yacobi, and Gary Starkweather, all from Microsoft Research, for useful discussions that have impacted the technical content of this paper.
6 Conclusion
A certificate of authenticity (COA) is an inexpensive physical object that has a random unique structure with a high cost of near-exact reproduction. Bauder was the first to propose COAs created by randomly embedding a set of fixed-length fibers into a transparent gluing material that permanently fixes the positions of the fibers within.
This paper introduces a novel, simple, and generalized algorithm that compresses M points in an N-dimensional grid. We compare its performance with an expected lower bound. As the compression algorithm performed in all instances at a minimal offset with respect to the expected lower bound, we conclude that it is near-optimal. With computational complexity proportional to O(M²), the algorithm can be used in the intended application to produce compressed point-sets in a split second. The algorithm can also be used for numerous other applications, such as the storage of biometric patterns or of answers to queries into database relations.
References

[1] ANSI X9.62-1998. Public Key Cryptography for the Financial Services Industry: The Elliptic Curve Digital Signature Algorithm (ECDSA), 1998. On-line at: http://www.x9.org.
[2] D. W. Bauder. Personal communication.
[3] D. W. Bauder. An Anti-Counterfeiting Concept for Currency Systems. Research report PTK-11990, Sandia National Labs, Albuquerque, NM, 1983.
[4] Y. Chen. Personal communication, 2002.
[5] S. Church and D. Littman. Machine Reading of Visual Counterfeit Deterrent Features and Summary of US Research, 1980-90. Four Nation Group on Advanced Counterfeit Deterrence, Canada, 1991.
[6] IEEE 1363-2000: Standard Specifications for Public Key Cryptography, 2000. On-line at: http://grouper.ieee.org/groups/1363.
[7] D. Kirovski. Toward an Automated Verification of Certificates of Authenticity. ACM Electronic Commerce, pp. 160-169, 2004.
[8] A. J. Menezes, et al. Handbook of Applied Cryptography. CRC Press, 1996.
[9] M. K. Mihçak. Personal communication, 2004.
[10] Commission on Engineering and Technical Systems (CETS). Counterfeit Deterrent Features for the Next-Generation Currency Design. The National Academy Press, 1993.
[11] R. Pappu. Physical One-Way Functions. Ph.D. thesis, MIT, 2001.
[12] J. Rissanen. Modeling by Shortest Data Description. Automatica, vol. 14, pp. 465-471, 1978.
[13] R. L. Rivest, et al. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, vol. 21, no. 2, pp. 120-126, 1978.
[14] C. E. Shannon. Prediction and Entropy of Printed English. Bell System Technical Journal, pp. 50-64, 1951.