Parallel Computing 35 (2009) 72–82


A novel O(1) time algorithm for 3D block-based medial axis transform by peeling corner shells

Yuh-Rau Wang *
Department of Computer Science and Information Engineering, St. John's University, 499, Section 4, Tam-King Road, Tamsui, Taipei, Taiwan, ROC

Article info

Article history: Received 21 February 2007; Received in revised form 27 April 2008; Accepted 25 October 2008; Available online 7 November 2008.

Keywords: Medial axis transform; Chessboard distance transform; Image processing; Parallel algorithm; LARPBS

Abstract

The block-based medial axis transform (BB_MAT, for short) of a 3D (2D) binary image, denoted 3D_BB_MAT (2D_BB_MAT), is the problem of finding a minimal set of upright 1-cubes (1-squares) whose union corresponds exactly to the 1-voxels (1-pixels) in the binary image. Although many parallel algorithms have been proposed for computing the 2D_BB_MAT, almost all of them cannot be extended to a solution of the 3D_BB_MAT problem. In this paper, we first propose a novel O(1) time algorithm for solving the 3D_BB_MAT (2D_BB_MAT) problem of a binary image of size N × N × N (N × N) on a linear array with a reconfigurable pipelined bus system (LARPBS, for short) by peeling the 3D (2D) corner shell of each 1-voxel (1-pixel) of the image and revealing some properties of the 3D_BB_MAT (2D_BB_MAT). The 3D (2D) chessboard distance transform problem, denoted 3D_CDT (2D_CDT), is the problem of finding, for each 0-voxel (0-pixel) of a 3D (2D) binary image, its nearest 1-voxel (1-pixel) in the binary image under the chessboard distance metric. In this paper, we also solve the 3D_CDT (2D_CDT) based on the computed 3D_BB_MAT (2D_BB_MAT). All the proposed algorithms are innovative even though they may look "straightforward". To the best of our knowledge, the proposed 2D_BB_MAT (2D_CDT) algorithm is the first that extends to a parallel solution of the 3D_BB_MAT (3D_CDT) problem, and the 3D_BB_MAT (3D_CDT) algorithm is the first parallel algorithm for solving the 3D_BB_MAT (3D_CDT) problem.

© 2008 Elsevier B.V. All rights reserved.

* Tel.: +886 2 28013131x6774; fax: +886 2 28013143. E-mail address: [email protected]. doi:10.1016/j.parco.2008.10.003

1. Introduction

The medial axis transform (MAT, for short) is an image representation scheme first introduced by Blum [1]. It is a powerful shape descriptor with good properties for data reduction, especially in pattern recognition and computer vision applications, and it allows a full reconstruction of the original shape. The book by Rosenfeld and Kak [23] gives an excellent introduction to the MAT and states that the MAT plays an important role in image understanding. There are two types of MATs [5]. One is the distance-based MAT (DB_MAT, for short); see [4] for detailed references. The 3D_DB_MAT (2D_DB_MAT) is a recovering of the object by maximal 1-balls (1-disks) included in the object in a 3D (2D) space. A ball (disk) is a 1-ball (1-disk) if all the voxels (pixels) in it are 1-voxels (1-pixels). A voxel (pixel) represents an image point in 3D (2D) space; for readability, we will also term a voxel (pixel) a point. A 1/0-point means a black/white point. A maximal 1-ball (1-disk) is a 1-ball (1-disk) that is contained in no other 1-ball (1-disk). If every voxel (pixel) of the medial axis of an object is labeled with the radius of its corresponding maximal 1-ball (1-disk), then the object can be exactly rebuilt from the information on its axis [26]. Very few parallel algorithms have been developed for the 3D_DB_MAT. In [29], Wang et al. devised two parallel algorithms for computing the 3D_DB_MAT of a binary image of size


N × N × N: one in O(log log N) time on a common concurrent-read concurrent-write parallel random access machine (common CRCW PRAM, for short) with N³/log log N processors, and the other in O(log N) time on an exclusive-read exclusive-write parallel random access machine (EREW PRAM, for short) with N³/log N processors. In [30], Wang and Horng devised a parallel algorithm for computing the 3D_DB_MAT of a binary image of size N × N × N in O(1) time on an array with reconfigurable optical buses (AROB, for short) of N^{3+δ} processors, where δ = 1/k and k is a constant and a positive integer. To the best of our knowledge, the algorithms proposed in [29,30] are the best previous results for solving the 3D_DB_MAT problem in parallel.

The other type is the BB_MAT (i.e., block-based MAT). The 3D_BB_MAT (2D_BB_MAT) is a recovering of the object by maximal 1-cubes (1-squares) in a 3D (2D) space. A cube (square) is a 1-cube (1-square) if all the points in it are 1-points. A maximal 1-cube (1-square) is a 1-cube (1-square) that is contained in no other 1-cube (1-square). For an N × N × N (N × N) binary image, the 3D_BB_MAT (2D_BB_MAT) of the binary image is the problem of finding a minimal set of upright 1-cubes (upright 1-squares) whose union corresponds exactly to the 1-points in the 3D (2D) space. Although the computation of the 2D_BB_MAT problem has been well studied in both the sequential and parallel domains [2,3,5,7,10,11,27,28], almost all proposed approaches are dedicated to solving the 2D_BB_MAT problem only and are very difficult to extend to a solution of the 3D_BB_MAT problem. Up to now, only the preliminary version of this paper [32] has developed a parallel algorithm for the 3D_BB_MAT problem. In this paper, by analyzing the corner shells of each 1-point of the 3D (2D) binary image and revealing some properties of the 3D_BB_MAT (2D_BB_MAT), we propose a novel O(1) time 3D_BB_MAT (2D_BB_MAT) algorithm on the LARPBS model.
The 3D (2D) chessboard distance transform problem, denoted 3D_CDT (2D_CDT), is the problem of finding, for each 0-voxel (0-pixel) of a 3D (2D) binary image, its nearest 1-voxel (1-pixel) in the binary image under the chessboard distance metric. In this paper, we apply the proposed 3D_BB_MAT (2D_BB_MAT) algorithm to solve the 3D_CDT (2D_CDT) problem. All the proposed algorithms are innovative even though they may look "straightforward". To the best of our knowledge, the proposed 2D_BB_MAT (2D_CDT) algorithm is the first that extends to a parallel solution of the 3D_BB_MAT (3D_CDT) problem, and the 3D_BB_MAT (3D_CDT) algorithm is the first parallel algorithm for solving the 3D_BB_MAT (3D_CDT) problem.

The rest of this paper is organized as follows. In Section 2, we introduce the LARPBS model. In Section 3, we introduce the notations and properties of the 3D_BB_MAT (2D_BB_MAT) problem. In Section 4, we present an O(1) time 2D_BB_MAT algorithm in detail. In Section 5, we present an O(1) time 3D_BB_MAT algorithm in detail. In Section 6, we present an O(1) time 3D_CDT (2D_CDT) algorithm in detail. Finally, some concluding remarks are given in the last section.

2. The LARPBS model

Two related models based on reconfigurable optical buses, i.e., the array with reconfigurable optical buses (AROB, for short) proposed by Pavel and Akl [21] and the linear array with a reconfigurable pipelined bus system (LARPBS, for short) proposed by Pan and Hamdi [18,19], were presented in 1996 and have drawn much attention from researchers since then. An LARPBS of N processors is an array of processors P_0, P_1, ..., P_{N-1} connected by optical buses. Each processor, with a local memory, is identified by a unique index and denoted P_i, where 0 ≤ i < N. In both the LARPBS and AROB systems, messages can be transmitted concurrently on an optical bus in a pipelined fashion, and the optical bus can be reconfigured dynamically under program control.
Due to this reconfigurability, an LARPBS can be partitioned into j (where j ≥ 2) independent subarrays. All these subarrays can operate independently to solve different subproblems without interference. For example, if we set the reconfigurable switches of processor P_i to cross, the LARPBS of N processors is split into two separate systems, one consisting of processors P_0, P_1, ..., P_i, whose leader processor is P_0, and the other consisting of processors P_{i+1}, P_{i+2}, ..., P_{N-1}, whose leader processor is P_{i+1}. The reader is referred to [14] for more details on the LARPBS model. Several approaches, such as the time waiting function [8], the time-division multiplexing scheme [22], and the coincident pulse technique [13,22], can be used on an optical bus system to route messages from one processor to another. However, only the coincident pulse technique is used on the LARPBS computation model for addressing [16].

2.1. Some primitive operations on the LARPBS

In parallel processing systems, the communication efficiency often determines the effectiveness of processor utilization, which, in turn, determines performance. Limited bandwidth, capacitive loading, and cross-talk caused by mutual inductance are the fundamental constraints in electronic systems. A pipelined optical bus system uses optical signals instead of electrical signals to transfer messages among processors. Optical buses offer numerous advantages, including high bandwidth and low interference, but their two most relevant properties are unidirectional propagation and predictable propagation delay. These two properties enable synchronized concurrent access to an optical bus in a pipelined fashion [8,17,22] and make architectures with optically interconnected bus structures, such as the LARPBS and AROB models, suitable for a wide range of applications, especially communication-intensive ones. Lemma 1 summarizes some primitive operations of the LARPBS model.
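Before the lemma is stated, a tiny sequential sketch may help fix what the binary prefix-sum primitive computes; the function below is our illustration only, not the LARPBS implementation, which performs this in O(1) bus cycles via the coincident pulse technique.

```python
def binary_prefix_sum(bits):
    """Sequential sketch of the binary prefix-sum primitive of Lemma 1:
    processor i holds bit bits[i]; afterwards it holds the number of 1s
    in bits[0..i].  Here we simply scan left to right."""
    out, running = [], 0
    for b in bits:
        running += b
        out.append(running)
    return out

print(binary_prefix_sum([1, 0, 1, 1, 0, 1]))  # [1, 1, 2, 3, 3, 4]
```

This inclusive prefix sum is the building block used later (Corollary 1) to obtain the run lengths LC, LR, and LD along columns, rows, and diagonals.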
See [15,19] for the definitions and detailed implementations of these primitive operations. All these primitive operations take O(1) bus cycles using the coincident pulse addressing technique, where the bus cycle length is the end-to-end message transmission time over a bus [14]. These primitive operations, plus the reconfigurability of the LARPBS model, make the LARPBS very attractive for solving problems that are both computation and communication intensive, such as the 2D_BB_MAT and 3D_BB_MAT problems. To avoid controversy, in this paper, O(1) time means O(1) bus cycles for communication plus O(1) time for local computation [14]. This time complexity measure has been adopted widely in the literature [8,9,15,17,19,21,24,30,31]; in this paper, we adopt the same measure.

Lemma 1 ([15,19]). One-to-one communication, broadcasting, multicasting, binary prefix sum, binary summation, and compression can be done in O(1) bus cycles on an LARPBS.

2.2. An O(1) time minimum finding algorithm

Lemma 2 ([20]). The minimum value of N data items can be computed in O(1) time on an LARPBS of size N².

Based on Lemma 2, we devise an O(1) time minimum finding algorithm, named Algorithm MFA, as follows. Note that throughout this paper, 1/(2^{c+1} − 1) is denoted as ε, where c is a constant.

Algorithm MFA
Input: N data items. Suppose that each data item with index i, 0 ≤ i < N, is initially assigned to the processor with index iN^ε (i.e., P_{iN^ε}) of an LARPBS of size N^{1+ε}.
Output: The minimum value of these N input data items.
Step 1: First, we partition the processors of this LARPBS, in ascending order of processor index, into N^{1−ε} subsets, each of size N^{2ε}. The corresponding N data items with index i (which have been given to processor P_{iN^ε}, 0 ≤ i < N) are automatically partitioned into N^{1−ε} subsets, each of size N^ε. Each subset of the contiguous N^{2ε} processors is responsible for finding the minimum of its subset of N^ε data items in O(1) time by invoking Lemma 2. In each subset, only the minimum survives. We then allocate this surviving data item to P_{iN^{2ε}}. At the end of this step, we are left with N^{1−ε} surviving data items, each stored in processor P_{iN^{2ε}}, where 0 ≤ i < N^{1−ε}.
Step 2 to Step c: In each Step t, 2 ≤ t ≤ c, the N^{1+ε} processors are partitioned into N^{1−(2^t−1)ε} subsets, each of size N^{2^t ε}. The surviving N^{1−(2^{t−1}−1)ε} data items are also automatically partitioned into N^{1−(2^t−1)ε} subsets, each of size N^{2^{t−1}ε}. Each subset of N^{2^t ε} processors is responsible for finding the minimum of its subset of N^{2^{t−1}ε} data items in O(1) time. Only the minimum in each subset survives. We then allocate this surviving data item to P_{iN^{2^t ε}}. At the end of Step t, we are left with N^{1−(2^t−1)ε} surviving data items, each stored in processor P_{iN^{2^t ε}}, where 0 ≤ i < N^{1−(2^t−1)ε}.
Step c + 1: Finally, in Step c + 1, the N^{1+ε} processors are partitioned into N^{1−(2^{c+1}−1)ε} = N^0 = 1 subset of size N^{2^{c+1} ε} = N^{1+ε} (since ε = 1/(2^{c+1} − 1), we have 2^{c+1} ε = 1 + ε and 1 − (2^{c+1} − 1)ε = 0). The surviving N^{1−(2^c−1)ε} data items are also automatically partitioned into N^{1−(2^{c+1}−1)ε} = N^0 = 1 subset of size N^{2^c ε}. All the N^{2^{c+1} ε} processors are responsible for finding the minimum of the subset of N^{2^c ε} surviving data items in O(1) time. So, the minimum of the N data items is computed at the end of Step c + 1. We allocate it to processor P_0.

See Table 1 for the data and processor partition in each step of Algorithm MFA. Since each step can be computed in O(1) time using N^{1+ε} processors, and the minimum is computed after c + 1 steps, where c is a constant, this algorithm runs in (c + 1) · O(1) = O(1) time on an LARPBS of size N^{1+ε}. This concludes Lemma 3. Clearly, Lemma 2 is a special case of Lemma 3 (when c = 0). Note that in each step of Algorithm MFA, all processors are always fully utilized, and the number of processors needed reduces dramatically even if c increases only slowly.
For example, if we invoke Lemma 2 (i.e., c = 0 in Lemma 3), it takes N² processors to find the minimum of the N data items in O(1) time; however, if we invoke Lemma 3 and choose c = 3, it takes only N^{16/15} processors. Clearly, N^{16/15} · (N^{14/15} − 1) processors are saved.

Lemma 3 ([31,33]). The minimum value of N data items can be computed in O(1) time on an LARPBS of size N^{1+ε}, where 0 < ε = 1/(2^{c+1} − 1) ≤ 1 and c is a constant.
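To make the round structure of Algorithm MFA concrete, the following sequential Python sketch mimics its grouping schedule; the function name `mfa_rounds` and the use of `min` in place of Lemma 2's O(1) hardware reduction are our illustrative assumptions, and the `max(2, ...)` guard only handles small inputs where the real exponents round to block size 1.

```python
import math

def mfa_rounds(data, c):
    """Sequential sketch of Algorithm MFA with eps = 1/(2^(c+1) - 1):
    in round t the survivors are grouped into blocks of roughly
    N^(2^(t-1) * eps) items, and each block is reduced to its minimum
    (which Lemma 2 performs in O(1) time using the square of that many
    processors).  After at most c + 1 rounds one minimum remains."""
    n = len(data)
    eps = 1.0 / (2 ** (c + 1) - 1)
    survivors = data[:]
    for t in range(1, c + 2):                        # Steps 1 .. c+1
        block = max(2, math.ceil(n ** (2 ** (t - 1) * eps)))
        survivors = [min(survivors[i:i + block])
                     for i in range(0, len(survivors), block)]
        if len(survivors) == 1:
            break
    return min(survivors)                            # ends at processor P_0

vals = [7, 3, 9, 1, 4, 8, 2, 6]
print(mfa_rounds(vals, 3))  # 1
```

The point of the schedule is that the total processor budget stays at N^{1+ε} per round while the number of survivors shrinks geometrically, so a constant number of rounds suffices.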

Table 1
The data and processors partition in each step of Algorithm MFA.

Step    Subset #             Data/subset       Processors/subset
1       N^{1−ε}              N^{ε}             N^{2ε}
2       N^{1−3ε}             N^{2ε}            N^{4ε}
...     ...                  ...               ...
c       N^{1−(2^c−1)ε}       N^{2^{c−1} ε}     N^{2^c ε}
c+1     N^0 = 1              N^{2^c ε}         N^{2^{c+1} ε} = N^{1+ε}


3. Notations and properties of BB_MAT

Some notations and properties of the 3D_BB_MAT (2D_BB_MAT) problem are introduced in this section. A point is represented by its Cartesian coordinates. In a 3D (2D) space, let the origin of the 3D (2D) coordinate system be at the top-left-far (top-left) corner of the image and be represented as (0, 0, 0) or p_{0,0,0} ((0, 0) or p_{0,0}). Let a 3D (2D) binary image of size N × N × N (N × N) be represented as V (P). The X-coordinate increases downwards. The Y-coordinate increases towards the right. In a 3D (2D) space, let the plane of the paper be the xy-plane with Z-coordinate = 0 (i.e., the Z_0-plane). The origin (0, 0, 0) is on the Z_0-plane, and the Z-coordinate is perpendicular to every Z_k-plane, where 0 ≤ k < N. A voxel (i, j, k) is located at X-coordinate i, Y-coordinate j, and Z-coordinate k of V. If we omit the Z-coordinate, then it turns out to be a 2D plane.

Definition 1. In a 2D plane, given two points p_{i,j} = (i, j) and p_{x,y} = (x, y), we define that p_{i,j} dominates p_{x,y} if i ≤ x and j ≤ y. Let P_{i,j} = {p_{x,y} | i ≤ x < N, j ≤ y < N} denote the rectangle dominated by point p_{i,j}. For fixed i and j, let C_{i,j} denote {p_{x,j} | i ≤ x < N} ∪ {p_{i,y} | j ≤ y < N}, defined as the 2D corner shell dominated by point p_{i,j}. Clearly, C_{0,0} denotes (0, ∗) ∪ (∗, 0), and P_{0,0} = ∪_{i=0}^{N−1} C_{i,i}. For example, a binary image P_{0,0} of size 4 × 4, as shown in Fig. 1e, is the union of C_{i,i}, where 0 ≤ i ≤ 3, as shown in Fig. 1a–d. In a 3D space, given two points p_{i,j,k} = (i, j, k) and p_{x,y,z} = (x, y, z), we define that p_{i,j,k} dominates p_{x,y,z} if i ≤ x, j ≤ y, and k ≤ z. Let V_{i,j,k} = {p_{x,y,z} | i ≤ x < N, j ≤ y < N, k ≤ z < N} denote the box dominated by point p_{i,j,k}. For fixed i, j, and k, let C_{i,j,k} denote {p_{x,y,k} | i ≤ x < N, j ≤ y < N} ∪ {p_{x,j,z} | i ≤ x < N, k ≤ z < N} ∪ {p_{i,y,z} | j ≤ y < N, k ≤ z < N}, defined as the 3D corner shell dominated by point p_{i,j,k}. Clearly, C_{0,0,0}, as shown in Fig. 2d, denotes (0, ∗, ∗) ∪ (∗, 0, ∗) ∪ (∗, ∗, 0), where (0, ∗, ∗), (∗, 0, ∗), and (∗, ∗, 0) are shown in Fig. 2a–c, respectively. A binary image V_{0,0,0} of size 4 × 4 × 4 is the union of C_{i,i,i}, where 0 ≤ i ≤ 3, as shown in Fig. 3.

Definition 2. For any pixel (i₀, j₀) ∈ C_{0,0}, let the set of pixels (i₀ + h, j₀ + h), where i₀ + h < N, j₀ + h < N, and h = 0, 1, 2, ..., be defined as the diagonal of (i₀, j₀), denoted Diag[i₀, j₀]. Let the length of Diag[i₀, j₀] be denoted |Diag[i₀, j₀]|. For a binary image of size N × N, the diagonal of length N (i.e., Diag[0, 0]) is called the main diagonal. For any voxel (i₀, j₀, k₀) ∈ C_{0,0,0}, the set of voxels (i₀ + h, j₀ + h, k₀ + h), where i₀ + h < N, j₀ + h < N, k₀ + h < N, and h = 0, 1, 2, ..., is defined as the diagonal of (i₀, j₀, k₀), denoted Diag[i₀, j₀, k₀]. Let the length of Diag[i₀, j₀, k₀] be denoted |Diag[i₀, j₀, k₀]|. For a binary image of size N × N × N, the diagonal of length N (i.e., Diag[0, 0, 0]) is called the main diagonal.

Lemma 4. For a binary image P of size N × N, there are 2N − 1 diagonals. For a binary image V of size N × N × N, there are 3N² − 3N + 1 diagonals.

Proof. For a binary image P of size N × N, C_{0,0} denotes (0, ∗) ∪ (∗, 0) (Definition 1). According to Definition 2, the diagonal of a point (i₀, j₀) ∈ C_{0,0} is the set of points (i₀ + h, j₀ + h), where i₀ + h < N, j₀ + h < N, and h = 0, 1, 2, ... Since |(0, ∗)| = |(∗, 0)| = N and |(0, ∗) ∩ (∗, 0)| = 1, we have |(0, ∗) ∪ (∗, 0)| = |(0, ∗)| + |(∗, 0)| − |(0, ∗) ∩ (∗, 0)| = 2N − 1. That is, there are 2N − 1 points in C_{0,0}, and hence in total 2N − 1 diagonals. In more detail, there are one diagonal of length N, two diagonals of length N − 1, two diagonals of length N − 2, ..., and two diagonals of length 1, for a total of 2N − 1 diagonals.

Similarly, for a binary image V of size N × N × N, C_{0,0,0} denotes (0, ∗, ∗) ∪ (∗, 0, ∗) ∪ (∗, ∗, 0) (Definition 1). According to Definition 2, the diagonal of a point (i₀, j₀, k₀) ∈ C_{0,0,0} is the set of points (i₀ + h, j₀ + h, k₀ + h), where i₀ + h < N, j₀ + h < N, k₀ + h < N, and h = 0, 1, 2, ... Since |(0, ∗, ∗)| = |(∗, 0, ∗)| = |(∗, ∗, 0)| = N², |(0, ∗, ∗) ∩ (∗, 0, ∗)| = |(0, ∗, ∗) ∩ (∗, ∗, 0)| = |(∗, 0, ∗) ∩ (∗, ∗, 0)| = N, and |(0, ∗, ∗) ∩ (∗, 0, ∗) ∩ (∗, ∗, 0)| = 1, we have |(0, ∗, ∗) ∪ (∗, 0, ∗) ∪ (∗, ∗, 0)| = |(0, ∗, ∗)| + |(∗, 0, ∗)| + |(∗, ∗, 0)| − |(0, ∗, ∗) ∩ (∗, 0, ∗)| − |(0, ∗, ∗) ∩ (∗, ∗, 0)| − |(∗, 0, ∗) ∩ (∗, ∗, 0)| + |(0, ∗, ∗) ∩ (∗, 0, ∗) ∩ (∗, ∗, 0)| = 3N² − 3N + 1. That is, there are 3N² − 3N + 1 points in C_{0,0,0}, and hence in total 3N² − 3N + 1 diagonals. In more detail, there are one diagonal of length N (the main diagonal), 6 diagonals of length N − 1, 12 diagonals of length N − 2, 18 diagonals of length N − 3, ..., and 6(N − 1) diagonals of length 1, and hence in total 1 + 6(1 + 2 + 3 + ... + (N − 1)) = 3N² − 3N + 1 diagonals. □

Definition 3. In a 2D plane, let the number of contiguous 1-pixels in column j starting at pixel (i, j) be denoted LC[i, j]. Let the number of contiguous 1-pixels in row i starting at (i, j) be denoted LR[i, j]. Let the number of contiguous 1-pixels in Diag[i₀, j₀] starting at (i, j) be denoted LD[i, j]. Let min{LD[i, j], LR[i, j], LC[i, j]} be denoted M_1D[i, j]. In a 3D space, for a 1-


Fig. 1. A binary image P_{0,0} of size 4 × 4 represented by the union of the 2D corner shells C_{i,i}, where 0 ≤ i ≤ 3.



Fig. 2. A 3D corner shell C_{i,j,k} of a binary image of size 4 × 4 × 4 is composed of the union of {p_{x,y,k} | i ≤ x < N, j ≤ y < N}, {p_{x,j,z} | i ≤ x < N, k ≤ z < N}, and {p_{i,y,z} | j ≤ y < N, k ≤ z < N}.


Fig. 3. A binary image V_{0,0,0} of size 4 × 4 × 4 represented by the union of the 3D corner shells C_{i,i,i}, where 0 ≤ i ≤ 3.

voxel (i, j, k), the side-length of the 2D (upright) maximal 1-square with (i, j, k) at its top-left corner on the (∗, ∗, k) plane (i.e., the Z_k-plane), as shown in Fig. 2c, is denoted MZ[i, j, k]. The side-length of the 2D maximal 1-square with (i, j, k) at its top-far corner on the (∗, j, ∗) plane (i.e., the Y_j-plane), as shown in Fig. 2b, is denoted MY[i, j, k]. The side-length of the 2D maximal 1-square with (i, j, k) at its left-far corner on the (i, ∗, ∗) plane (i.e., the X_i-plane), as shown in Fig. 2a, is denoted MX[i, j, k]. Let the number of contiguous 1-voxels in Diag[i₀, j₀, k₀] starting at (i, j, k) be denoted PD[i, j, k]. Let min{MX[i, j, k], MY[i, j, k], MZ[i, j, k], PD[i, j, k]} be denoted M_2D[i, j, k].

The fact that the binary prefix sum operation can be done in O(1) bus cycles on an LARPBS (Lemma 1) results in Corollary 1.

Corollary 1. For all the 1-pixels (i, j) of a fixed column j of P, where 0 ≤ i < N, their corresponding LC[i, j] can be computed in O(1) time using an LARPBS of size N. Similarly, for all the 1-pixels (i, j) of a fixed row i of P, where 0 ≤ j < N, their corresponding LR[i, j] can be computed in O(1) time using an LARPBS of size N. For all the 1-pixels (i, j) in a fixed diagonal Diag[i₀, j₀], their corresponding LD[i, j] can be computed in O(1) time using an LARPBS of size |Diag[i₀, j₀]|.

Definition 4. In a 2D plane, let M_{i,j} be the maximal 1-square with 1-pixel (i, j) as its top-left corner and M[i, j] be the side-length of M_{i,j}. Clearly, M[i, j] = 0 if (i, j) is a 0-pixel. In a 3D space, let M_{i,j,k} be the maximal 1-cube with 1-voxel (i, j, k) at its top-left-far corner and M[i, j, k] be the side-length of M_{i,j,k}. Clearly, M[i, j, k] = 0 if (i, j, k) is a 0-voxel.

Lemma 5. In a 2D plane, assume that M_1D[i, j] = m and min{M_1D[i, j], M_1D[i+1, j+1] + 1, M_1D[i+2, j+2] + 2, ..., M_1D[i+m−1, j+m−1] + (m−1)} = h, where m ≥ h. Then M[i, j] = h.
In a 3D space, assume that M_2D[i, j, k] = m and min{M_2D[i, j, k], M_2D[i+1, j+1, k+1] + 1, M_2D[i+2, j+2, k+2] + 2, ..., M_2D[i+m−1, j+m−1, k+m−1] + (m−1)} = h, where m ≥ h. Then M[i, j, k] = h.

Proof. In the 2D case, M_1D[i, j] denotes min{LC[i, j], LR[i, j], LD[i, j]} (Definition 3), and hence min{M_1D[i, j], M_1D[i+1, j+1] + 1, M_1D[i+2, j+2] + 2, ..., M_1D[i+m−1, j+m−1] + (m−1)} = h means that for the set A = {LC[i, j], LC[i+1, j+1] + 1, LC[i+2, j+2] + 2, ..., LC[i+m−1, j+m−1] + (m−1), LR[i, j], LR[i+1, j+1] + 1, LR[i+2, j+2] + 2, ..., LR[i+m−1, j+m−1] + (m−1), LD[i, j], LD[i+1, j+1] + 1, LD[i+2, j+2] + 2, ..., LD[i+m−1, j+m−1] + (m−1)}, the following two conditions hold: (1) every member of A is greater than or equal to h, and (2) at least one member of A is equal to h. Clearly, condition (1) implies that M[i, j] ≥ h, and condition (2) implies that at least one 0-point is in the point set {(i+h, j), (i+h, j+1), (i+h, j+2), ..., (i+h, j+h)} ∪ {(i, j+h), (i+1, j+h), (i+2, j+h), ..., (i+h, j+h)}, and hence M[i, j] ≤ h. In summary, we conclude that M[i, j] = h.

Similarly, in the 3D case, M_2D[i, j, k] = min{MX[i, j, k], MY[i, j, k], MZ[i, j, k], PD[i, j, k]} (Definition 3), and hence min{M_2D[i, j, k], M_2D[i+1, j+1, k+1] + 1, M_2D[i+2, j+2, k+2] + 2, ..., M_2D[i+m−1, j+m−1, k+m−1] + (m−1)} = h means that for the set B = {MX[i, j, k], MX[i+1, j+1, k+1] + 1, MX[i+2, j+2, k+2] + 2, ..., MX[i+m−1, j+m−1, k+m−1] + (m−1), MY[i, j, k], MY[i+1, j+1, k+1] + 1, MY[i+2, j+2, k+2] + 2, ..., MY[i+m−1, j+m−1, k+m−1] + (m−1), MZ[i, j, k], MZ[i+1, j+1, k+1] + 1, MZ[i+2, j+2, k+2] + 2, ..., MZ[i+m−1, j+m−1, k+m−1] + (m−1), PD[i, j, k], PD[i+1,


j+1, k+1] + 1, PD[i+2, j+2, k+2] + 2, ..., PD[i+m−1, j+m−1, k+m−1] + (m−1)}, the following two conditions hold: (1) every member of B is greater than or equal to h, and (2) at least one member of B is equal to h. Clearly, condition (1) implies that M[i, j, k] ≥ h, and condition (2) implies that at least one 0-point is in S_X ∪ S_Y ∪ S_Z, where S_X denotes the square on the X_{i+h}-plane with its four corners at (i+h, j, k), (i+h, j, k+h), (i+h, j+h, k), and (i+h, j+h, k+h); S_Y denotes the square on the Y_{j+h}-plane with its four corners at (i, j+h, k), (i, j+h, k+h), (i+h, j+h, k), and (i+h, j+h, k+h); and S_Z denotes the square on the Z_{k+h}-plane with its four corners at (i, j, k+h), (i+h, j, k+h), (i, j+h, k+h), and (i+h, j+h, k+h). Clearly, condition (2) also implies M[i, j, k] ≤ h. In summary, we conclude that M[i, j, k] = h. □

4. Algorithm 2D_BB_MAT

It is well known that the 2D_BB_MAT problem can be solved by first computing M[i, j] based on Lemma 5, and then checking whether M_{i,j} is included in any other M_{i′,j′}, where i ≥ i′ and j ≥ j′. If M_{i,j} is not included in any other M_{i′,j′} (i.e., max{M[i−1, j−1], M[i−1, j], M[i, j−1]} ≤ M[i, j]), set T[i, j] = 1; otherwise, set T[i, j] = 0 [2,10,28]. Then the 2D_BB_MAT can be represented by the union of all the M_{i,j} with T[i, j] = 1. An example of the 2D_BB_MAT computation for a binary image of size 8 × 8 is shown in Fig. 4. For the binary image P of size 8 × 8 shown in Fig. 4a, the M[i, j] of each 1-pixel (i, j) in P is shown in Fig. 4b. The computed 2D_BB_MAT is shown in Fig. 4c. Although the computation of the BB_MAT has been well studied in the parallel domain, almost all proposed approaches are dedicated to solving the 2D_BB_MAT problem only and cannot extend to a solution of the 3D_BB_MAT problem in parallel.
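As a concrete illustration of this method, the following sequential Python sketch computes M via the Lemma 5 route (LC/LR/LD → M_1D → M) and then applies the inclusion test to mark T. The helper names are ours, and the O(1)-time parallel prefix-sum and minimum-finding steps are replaced by plain loops.

```python
def bb_mat_2d(P):
    """Sequential sketch: M[i][j] per Lemma 5, T[i][j] per the inclusion
    test max{M[i-1][j-1], M[i-1][j], M[i][j-1]} <= M[i][j]."""
    N = len(P)
    LC = [[0] * N for _ in range(N)]; LR = [[0] * N for _ in range(N)]
    LD = [[0] * N for _ in range(N)]; M1D = [[0] * N for _ in range(N)]
    for i in range(N - 1, -1, -1):       # runs of 1s down, right, diagonal
        for j in range(N - 1, -1, -1):
            if P[i][j]:
                LC[i][j] = 1 + (LC[i + 1][j] if i + 1 < N else 0)
                LR[i][j] = 1 + (LR[i][j + 1] if j + 1 < N else 0)
                LD[i][j] = 1 + (LD[i + 1][j + 1] if i + 1 < N and j + 1 < N else 0)
                M1D[i][j] = min(LC[i][j], LR[i][j], LD[i][j])
    M = [[0] * N for _ in range(N)]
    for i in range(N):                   # Lemma 5: scan down the diagonal
        for j in range(N):
            m = M1D[i][j]
            M[i][j] = min((M1D[i + t][j + t] + t for t in range(m)), default=0)
    T = [[0] * N for _ in range(N)]
    for i in range(N):                   # Step 3: keep only squares not
        for j in range(N):               # included in a dominating square
            if P[i][j]:
                nbr = max(M[i - 1][j - 1] if i and j else 0,
                          M[i - 1][j] if i else 0,
                          M[i][j - 1] if j else 0)
                if nbr <= M[i][j]:
                    T[i][j] = 1
    return M, T

P = [[0, 1, 1],
     [1, 1, 1],
     [1, 1, 1]]
M, T = bb_mat_2d(P)
print(M[1][1], T[1][1])  # 2 1
```

The union of the squares M_{i,j} with T[i, j] = 1 reconstructs the 1-pixels exactly, which is the defining property of the 2D_BB_MAT.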
In the following, we devise a novel 2D_BB_MAT algorithm by revealing some properties of the 2D_BB_MAT problem and analyzing the 2D corner shells of each 1-point of the 2D binary image, as described in Section 3. To the best of our knowledge, this is the first parallel 2D_BB_MAT algorithm that extends easily and efficiently to a solution of the 3D_BB_MAT problem in parallel.

Algorithm 2D_BB_MAT
Input: A 2D N × N binary image, each pixel represented by (i, j), where 0 ≤ i, j < N.
Output: The 2D_BB_MAT.
Step 1: For each 0-pixel (i, j) in P, set M[i, j] = 0. If all the pixels in P are 0-pixels, then the 2D_BB_MAT computation is complete. If all the pixels in P are 1-pixels, then the 2D_BB_MAT computation is also finished, by representing P by M[0, 0] = N with T[0, 0] = 1. Otherwise, for each 1-pixel (i, j) in P, compute LC[i, j], LR[i, j], and LD[i, j] by performing binary prefix sums on column j, row i, and the diagonal that covers LD[i, j], all in parallel. Then, for each 1-pixel (i, j), compute M_1D[i, j] (i.e., min{LD[i, j], LR[i, j], LC[i, j]}).
Step 2: This step is for all the 1-pixels (i, j) to compute their corresponding M[i, j] in parallel. Assume that M_1D[i, j] = m_{i,j} and min{M_1D[i, j], M_1D[i+1, j+1] + 1, M_1D[i+2, j+2] + 2, ..., M_1D[i+m_{i,j}−1, j+m_{i,j}−1] + (m_{i,j}−1)} = h_{i,j}. Then, based on Lemma 5, we get M[i, j] = h_{i,j}. Clearly, based on Lemma 3, it takes at most m_{i,j}^{1+ε} processors for any single 1-pixel (i, j) to compute its corresponding M[i, j].
Step 3: This step checks whether each M_{i,j} is included in some other M_{i′,j′}, where i ≥ i′ and j ≥ j′. If max{M[i, j−1], M[i−1, j], M[i−1, j−1]} ≤ M[i, j], then we assign 1 to T[i, j] to represent that M_{i,j} is not included in any other maximal 1-square. Then the 2D_BB_MAT can be represented by all the M_{i,j} of the 1-pixels (i, j) in P with T[i, j] = 1.

Let us choose a 1-pixel of Fig. 4, say (2, 3), to explain this algorithm more precisely.
In Step 1, since LC[2, 3] = 3, LR[2, 3] = 3, and LD[2, 3] = 5, we get M_1D[2, 3] = min{LD[2, 3], LR[2, 3], LC[2, 3]} = 3. (Similarly, we get M_1D[3, 4] = min{LD[3, 4], LR[3, 4], LC[3, 4]} = min{4, 1, 3} = 1 and M_1D[4, 5] = min{LD[4, 5], LR[4, 5], LC[4, 5]} = min{3, 1, 3} = 1.) In Step 2, since M_1D[2, 3] = 3 and h_{2,3} = min{M_1D[2, 3], M_1D[3, 4] + 1, M_1D[4, 5] + 2} = min{3, 1+1, 1+2} = 2, based on Lemma 5 we get M[2, 3] = h_{2,3} = 2. (Similarly, we get M[2, 2] = 3.) In Step 3, for 1-pixel (2, 3), since M[2, 3] = 2,


Fig. 4. An example of the 2D_BB_MAT computation. (a) A binary image of size 8 × 8. (b) The M[i, j] of each 1-pixel. (c) The computed 2D_BB_MAT.


M[1, 2] = 0, M[1, 3] = 0, and M[2, 2] = 3, we have max{M[1, 2], M[1, 3], M[2, 2]} = 3 > M[2, 3] = 2, and hence we assign 0 to T[2, 3] to represent that M_{2,3} is included in at least one of the other maximal 1-squares. Now the 1-pixel (2, 3) has gone through all three steps of Algorithm 2D_BB_MAT. Let us choose another 1-pixel, say (2, 2), for example. After we go through this algorithm, we get M[2, 2] = 3, M[1, 1] = 1, M[1, 2] = 0, and M[2, 1] = 0. So max{M[1, 1], M[1, 2], M[2, 1]} = 1 ≤ M[2, 2] = 3, and hence we assign 1 to T[2, 2] to represent that M_{2,2} is not included in any other maximal 1-square.

4.1. Time complexity analysis

Obviously, in Step 1, if the pixels in P are either all 0-pixels or all 1-pixels, it takes an LARPBS of size N² to complete the 2D_BB_MAT computation in O(1) time. Otherwise, based on Corollary 1, all the N columns of LC[i, j] can be computed in O(1) time using an LARPBS of size N². Similarly, all the N rows of LR[i, j] can be computed in O(1) time using an LARPBS of size N². All the LD[i, j] can be computed in O(1) time using an LARPBS of size N², based on the computed binary prefix sums on all the 2N − 1 diagonals (which cover all the LD[i, j]), as stated in Lemma 4. Then it takes an LARPBS of size N² for all the 1-pixels (i, j) ∈ P to compute their corresponding M_1D[i, j] (i.e., min{LD[i, j], LR[i, j], LC[i, j]}) in O(1) time. In Step 2, assume that M_1D[i, j] = m_{i,j}. Then, based on Lemma 3, it takes an LARPBS of size m_{i,j}^{1+ε} for each 1-pixel (i, j) to compute its corresponding M[i, j]. For all the 1-pixels (i, j), assume that Σ_{i,j=0}^{N−1} m_{i,j}^{1+ε} = S2; then it takes an LARPBS of size S2 for all the 1-pixels (i, j) to compute their corresponding M[i, j]. In the worst case, there are (N² − 1) 1-pixels in P, and hence S2 < 1·N^{1+ε} + 3(N−1)^{1+ε} + 5(N−2)^{1+ε} + ... + (2N−1)(N−(N−1))^{1+ε} = Σ_{i=1}^{N} (i² − (i−1)²)(N−i+1)^{1+ε} = Σ_{i=1}^{N} (2i−1)(N−i+1)^{1+ε}. Since the upper bound of the number of 1-pixels is less than N² and m_{i,j} < N, we have S2 < N^{3+ε}.
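This worst-case counting can be sanity-checked numerically (a check on sampled values, not a proof); the helper name `s2_worst_case` is ours, and ε is instantiated through c as in Lemma 3.

```python
# The coefficients (2i - 1) are the corner-shell point counts and telescope
# to N^2, so S2 < sum_i (2i - 1) * N^(1+eps) = N^(3+eps).
def s2_worst_case(N, eps):
    return sum((2 * i - 1) * (N - i + 1) ** (1 + eps) for i in range(1, N + 1))

for N in (4, 16, 64):
    assert sum(2 * i - 1 for i in range(1, N + 1)) == N * N
    for c in (0, 1, 3):
        eps = 1.0 / (2 ** (c + 1) - 1)
        assert N ** 2 <= s2_worst_case(N, eps) < N ** (3 + eps)
print("worst-case S2 bound verified for sampled N and c")
```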
However, in the best case, it is possible that only one pixel, say p = (i, j), is a 1-pixel. Clearly, it takes only one processor to set M[i, j] = 1 for this 1-pixel, and no computations are needed for all the other 0-pixels. In summary, N² ≤ S2 < N^{3+ε}. In Step 3, it takes at most an LARPBS of size N² for all the M_{i,j} of the 1-pixels (i, j) ∈ P to compute their corresponding T[i, j]. From the above analysis, it takes an LARPBS of size a = S2 for P to compute its corresponding 2D_BB_MAT in O(1) time. This concludes the following result.

Theorem 1. The 2D_BB_MAT of a binary image of size N × N can be computed in O(1) time on an LARPBS of size a, where N² ≤ a < N^{3+ε} and 0 < ε = 1/(2^{c+1} − 1) ≤ 1.

5. Algorithm 3D_BB_MAT

The 3D_BB_MAT can be implemented as follows. First compute M[i, j, k] based on Lemma 5, then check whether M_{i,j,k} is included in any other M_{i′,j′,k′}, where i ≥ i′, j ≥ j′, and k ≥ k′. If M_{i,j,k} is not included in any other M_{i′,j′,k′} (i.e., max{M[i−1, j−1, k−1], M[i−1, j−1, k], M[i−1, j, k−1], M[i, j−1, k−1], M[i−1, j, k], M[i, j−1, k], M[i, j, k−1]} ≤ M[i, j, k]), set T[i, j, k] = 1; otherwise, set T[i, j, k] = 0. Then the 3D_BB_MAT can be represented by the union of all the M_{i,j,k} with T[i, j, k] = 1. By revealing some properties of the 3D_BB_MAT problem, analyzing the 3D corner shells of each black point of the 3D binary image, and building on the computed 2D_BB_MAT, we devise a novel 3D_BB_MAT algorithm as follows. To the best of our knowledge, this is the first parallel algorithm for solving the 3D_BB_MAT problem.

Algorithm 3D_BB_MAT
Input: A 3D N × N × N binary image, each voxel represented by (i, j, k), where 0 ≤ i, j, k < N.
Output: The 3D_BB_MAT.
Step 1: For each 0-voxel (i, j, k) in V, set M[i, j, k] = 0. If all the voxels in V are 0-voxels, then the 3D_BB_MAT computation is complete. If all the voxels in V are 1-voxels, then the 3D_BB_MAT computation is also finished, by representing V by M[0, 0, 0] = N with T[0, 0, 0] = 1.
Otherwise, for each 1-voxel (i,j,k) in V, compute MZ[i,j,k], MY[i,j,k], and MX[i,j,k] by invoking Steps 1 and 2 of Algorithm 2D_BB_MAT, and compute PD[i,j,k] by performing binary prefix sums, all in parallel. Then compute M_{2D}[i,j,k] = min{MZ[i,j,k], MY[i,j,k], MX[i,j,k], PD[i,j,k]} for each 1-voxel (i,j,k).
Step 2: This step computes M[i,j,k] for all the 1-voxels (i,j,k) in parallel. Assume that M_{2D}[i,j,k] = m_{i,j,k} and min{M_{2D}[i,j,k], M_{2D}[i+1,j+1,k+1] + 1, M_{2D}[i+2,j+2,k+2] + 2, ..., M_{2D}[i+m_{i,j,k}-1, j+m_{i,j,k}-1, k+m_{i,j,k}-1] + (m_{i,j,k}-1)} = h_{i,j,k}. Based on Lemma 5, we get M[i,j,k] = h_{i,j,k}. Clearly, based on Lemma 3, an LARPBS of size m_{i,j,k}^{1+ε} suffices for any single 1-voxel (i,j,k) to compute its corresponding M[i,j,k].
Step 3: This step checks whether each M_{i,j,k} is included by some other M_{i',j',k'}, where i ≥ i', j ≥ j', and k ≥ k'. If max{M[i-1,j-1,k-1], M[i-1,j-1,k], M[i-1,j,k-1], M[i,j-1,k-1], M[i-1,j,k], M[i,j-1,k], M[i,j,k-1]} ≤ M[i,j,k], we assign 1 to T[i,j,k] to indicate that M_{i,j,k} is not included by any other maximal 1-cube; otherwise, we assign 0. The 3D_BB_MAT is then represented by all the M_{i,j,k} of 1-voxels (i,j,k) in V with T[i,j,k] = 1.

5.1. Time complexity analysis

Obviously, if the voxels in V are either all 1-voxels or all 0-voxels, an LARPBS of size N^3 completes the 3D_BB_MAT computation in O(1) time. Otherwise, in Step 1, an LARPBS of size max{N^2, S_{2k}} = S_{2k} suffices for all the 1-voxels

Y.-R. Wang / Parallel Computing 35 (2009) 72–82


(i,j,k) in the Z_k-plane, where 0 ≤ k < N, to compute Steps 1 and 2 of their corresponding 2D_BB_MAT in O(1) time. Clearly, V is composed of N Z_k-planes. Hence, an LARPBS of size max{N^3, S_{3k}} = S_{3k}, where S_{3k} = Σ_{k=0}^{N-1} S_{2k}, suffices for all the 1-voxels (i,j,k) in V to compute their corresponding MZ[i,j,k]. Similarly, an LARPBS of size S_{3j}, where S_{3j} = Σ_{j=0}^{N-1} S_{2j}, suffices for all the 1-voxels (i,j,k) in V to compute their corresponding MY[i,j,k], and an LARPBS of size S_{3i}, where S_{3i} = Σ_{i=0}^{N-1} S_{2i}, for all the 1-voxels (i,j,k) in V to compute their corresponding MX[i,j,k]. All the PD[i,j,k] can be computed in O(1) time on an LARPBS of size N^3 from the binary prefix sums on all the 3N^2 - 3N + 1 diagonals (which cover all the PD[i,j,k]), as stated in Lemma 4. Then M_{2D}[i,j,k] can be computed in O(1) time on an LARPBS of size N^3 by finding the minimum of the set {MZ[i,j,k], MY[i,j,k], MX[i,j,k], PD[i,j,k]}. In summary, Step 1 takes an LARPBS of max{S_{3k}, S_{3j}, S_{3i}} processors, where N^3 ≤ S_{3i}, S_{3j}, S_{3k} < N^{4+ε}. In Step 2, assume that M_{2D}[i,j,k] = m_{i,j,k}. Then, based on Lemma 3, an LARPBS of size m_{i,j,k}^{1+ε} suffices for each 1-voxel (i,j,k) to compute its corresponding M[i,j,k]. For all the 1-voxels (i,j,k) in V, let Σ_{i,j,k=0}^{N-1} m_{i,j,k}^{1+ε} = S_3; then an LARPBS of size S_3 suffices for all the 1-voxels (i,j,k) in V to compute their corresponding M[i,j,k]. In the worst case, there are N^3 - 1 1-voxels in V, and hence S_3 < 1·N^{1+ε} + 7(N-1)^{1+ε} + 19(N-2)^{1+ε} + ... + (3N^2 - 3N + 1)(N-(N-1))^{1+ε} = Σ_{i=1}^{N} (i^3 - (i-1)^3)(N-i+1)^{1+ε} = Σ_{i=1}^{N} (3i^2 - 3i + 1)(N-i+1)^{1+ε}. Since the number of 1-voxels is less than N^3 and m_{i,j,k} < N, we have S_3 < N^{4+ε}. However, in the best case it is possible that only one voxel, say p = (i,j,k), is a 1-voxel. Clearly, a single processor suffices to set M[i,j,k] = 1 for this 1-voxel, and no computation is needed for the 0-voxels. In summary, N^3 ≤ S_3 < N^{4+ε}.
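The coefficients 1, 7, 19, ..., 3N^2 - 3N + 1 in the worst-case bound are consecutive cube differences: the i-th corner shell of an N-cube contains i^3 - (i-1)^3 = 3i^2 - 3i + 1 voxels, so the N shells tile the whole N^3 cube (and, in 2D, the shells of sizes 2i - 1 tile the N^2 square). A quick numerical check of these identities (our own illustration, with hypothetical helper names):

```python
def shell_size_3d(i):
    # voxels added when a corner cube grows from side i-1 to side i
    return 3 * i * i - 3 * i + 1

def shell_size_2d(i):
    # pixels added when a corner square grows from side i-1 to side i
    return 2 * i - 1

for n in (1, 2, 8, 100):
    # shell sizes are exactly the cube differences ...
    assert [shell_size_3d(i) for i in range(1, n + 1)] == \
           [i ** 3 - (i - 1) ** 3 for i in range(1, n + 1)]
    # ... and the shells partition the whole cube / square
    assert sum(shell_size_3d(i) for i in range(1, n + 1)) == n ** 3
    assert sum(shell_size_2d(i) for i in range(1, n + 1)) == n ** 2
```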
In Step 3, an LARPBS of size at most N^3 suffices for all the M_{i,j,k} of 1-voxels (i,j,k) ∈ V to compute their corresponding T[i,j,k]. From the above analysis, an LARPBS of size b = max{S_3, S_{3i}, S_{3j}, S_{3k}} computes the 3D_BB_MAT of V in O(1) time. This yields the following result.

Theorem 2. The 3D_BB_MAT of a binary image of size N × N × N can be computed in O(1) time on an LARPBS of size b, where N^3 ≤ b < N^{4+ε} and 0 < ε = 1/(2^{c+1} - 1) ≤ 1.
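For intuition, the whole computation of Algorithm 3D_BB_MAT (Steps 1-3 above) can be mimicked sequentially. The sketch below is not the paper's LARPBS procedure: it replaces Lemma 5 and the corner-shell peeling with an equivalent dynamic-programming recurrence (an assumption on our part) in which M[i,j,k] is the side of the maximal 1-cube whose lowest-index corner is (i,j,k), and then applies the Step 3 inclusion test:

```python
from itertools import product

def bb_mat_3d(V):
    """Sequential sketch of 3D_BB_MAT. The DP recurrence (1 + min of the
    seven higher-index neighbors) is our stand-in for the paper's Lemma 5."""
    n = len(V)
    M = [[[0] * n for _ in range(n)] for _ in range(n)]

    def m(i, j, k):  # out-of-range neighbors contribute 0
        return M[i][j][k] if i < n and j < n and k < n else 0

    for i, j, k in product(range(n - 1, -1, -1), repeat=3):
        if V[i][j][k]:
            M[i][j][k] = 1 + min(m(i+1, j, k), m(i, j+1, k), m(i, j, k+1),
                                 m(i+1, j+1, k), m(i+1, j, k+1),
                                 m(i, j+1, k+1), m(i+1, j+1, k+1))

    # Step 3: keep only cubes not included by a lower-index neighbor's cube.
    T = [[[0] * n for _ in range(n)] for _ in range(n)]

    def mm(i, j, k):
        return M[i][j][k] if i >= 0 and j >= 0 and k >= 0 else 0

    for i, j, k in product(range(n), repeat=3):
        if V[i][j][k]:
            prev = max(mm(i-1, j-1, k-1), mm(i-1, j-1, k), mm(i-1, j, k-1),
                       mm(i, j-1, k-1), mm(i-1, j, k), mm(i, j-1, k),
                       mm(i, j, k-1))
            T[i][j][k] = 1 if prev <= M[i][j][k] else 0
    return M, T
```

For an all-ones 2 × 2 × 2 volume, the sketch retains a single cube of side 2 at the origin, as the degenerate case in Step 1 prescribes.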
In this paper, we also extend the proposed 3D_BB_MAT (2D_BB_MAT) algorithm to solve the 3D_CDT (2D_CDT) problem. All the proposed algorithms are innovative even though they may look straightforward. To the best of our knowledge, the proposed 2D_BB_MAT (2D_CDT) algorithm is the first that extends to a parallel solution of the 3D_BB_MAT (3D_CDT) problem, and the 3D_BB_MAT (3D_CDT) algorithm is the first parallel algorithm proposed for solving the 3D_BB_MAT (3D_CDT) problem.

6. Applications

The chessboard distance transform (CDT, for short) is a distance transform based on the chessboard distance metric. The 3D_CDT (2D_CDT) problem is, for each 0-voxel (0-pixel) of a 3D (2D) binary image, to find its nearest 1-voxel (1-pixel) in the binary image under the chessboard distance metric. The 2D_CDT and 3D_CDT are formally defined as follows.

Definition 5. Let p = (i,j) denote a pixel of a 2D binary image. Let P = {p | p = (i,j), p is either a 0-pixel or a 1-pixel, 0 ≤ i,j ≤ N-1} denote a 2D binary image of size N × N. Let B_{2D} = {q | q = (x,y), q is a 1-pixel, 0 ≤ x,y ≤ N-1} denote the set of 1-pixels of the image. Then the 2D chessboard distance transform (2D_CDT) of pixel p = (i,j) with respect to its nearest 1-pixel is 2d_cdt_{i,j} = min_{q ∈ B_{2D}} {max(|i-x|, |j-y|)}, and the 2D_CDT of P is to compute {2d_cdt_{i,j} | for all p = (i,j) ∈ P}.

Definition 6. Let p = (i,j,k) denote a voxel of a 3D binary image. Let V = {p | p = (i,j,k), p is either a 0-voxel or a 1-voxel, 0 ≤ i,j,k ≤ N-1} denote a 3D binary image of size N × N × N. Let B_{3D} = {q | q = (x,y,z), q is a 1-voxel, 0 ≤ x,y,z ≤ N-1} denote the set of 1-voxels of the image. Then the 3D chessboard distance transform (3D_CDT) of voxel p with respect to its nearest 1-voxel is 3d_cdt_{i,j,k} = min_{q ∈ B_{3D}} {max(|i-x|, |j-y|, |k-z|)}, and the 3D_CDT of V is to compute {3d_cdt_{i,j,k} | for all p = (i,j,k) ∈ V}.
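Definition 5 translates directly into a brute-force reference computation, useful for validating faster algorithms. The helper below is our own illustration (O(N^4) in 2D), not part of the paper's method:

```python
def cdt_2d_bruteforce(P):
    """2D_CDT straight from Definition 5: for each pixel (i, j), the
    minimum over 1-pixels q = (x, y) of max(|i-x|, |j-y|).
    Returns None if the image contains no 1-pixel."""
    n = len(P)
    ones = [(x, y) for x in range(n) for y in range(n) if P[x][y]]
    if not ones:
        return None  # chessboard distance is undefined (infinite)
    return [[min(max(abs(i - x), abs(j - y)) for x, y in ones)
             for j in range(n)] for i in range(n)]
```

The 3D case of Definition 6 is identical apart from a third coordinate inside the max.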
Similar to the BB_MAT problem, although the computation of the CDT problem has been studied in both the sequential and parallel domains [6,12,25], almost all proposed approaches are dedicated to solving the 2D_CDT problem only and are very difficult to extend to a solution of the 3D_CDT problem. In this section, we extend the proposed 3D_BB_MAT (2D_BB_MAT) algorithm to solve the 3D_CDT (2D_CDT) problem.

6.1. Algorithm 2D_CDT

Traditionally, the 2D_BB_MAT and the 2D_CDT are viewed as two completely different image computation problems. Lee and Horng [12] first pointed out that the computation of the 2D_CDT is interchangeable with the computation of the 2D_BB_MAT. For clarity, in the following we propose Algorithm 2D_CDT, which solves the 2D_CDT problem on an LARPBS model by invoking some steps of the proposed Algorithm 2D_BB_MAT.

Algorithm 2D_CDT
Input: A 2D N × N binary image P, each pixel represented by (i,j), where 0 ≤ i,j < N.
Output: 2D_CDT.


Step 1: Complement the input binary image P; denote the complemented 2D binary image by P^C.
Step 2: Perform Step 1 of Algorithm 2D_BB_MAT from the upper-left corner (i.e., pixel (0,0)) toward the lower-right corner of P^C to compute min{LD[i,j], LR[i,j], LC[i,j]} (denoted min{LD^{(0,0)}[i,j], LR^{(0,0)}[i,j], LC^{(0,0)}[i,j]} here for clarity). Note that we assume that all the points outside P are 0-pixels, and hence all the points outside P^C are 1-pixels. So, for Step 1 of Algorithm 2D_BB_MAT to work correctly here, the following adjustment must be made: if LC^{(0,0)}[i,j] + i = N, LR^{(0,0)}[i,j] + j = N, or LD^{(0,0)}[i,j] + max{i,j} = N, then use N to represent LC^{(0,0)}[i,j], LR^{(0,0)}[i,j], or LD^{(0,0)}[i,j], respectively.
Step 3: Perform Step 2 of Algorithm 2D_BB_MAT. After this step, each 1-pixel (i,j) of P^C gets its corresponding M[i,j], denoted here M^{(0,0)}[i,j].
Step 4: So far, in Steps 2 and 3 of this algorithm, we have computed M^{(0,0)}[i,j] (i.e., the 2D_BB_MAT) from the upper-left corner (pixel (0,0)) toward the lower-right corner. Similarly, in this step, we repeat Steps 2 and 3 to compute M^{(0,N-1)}[i,j], M^{(N-1,0)}[i,j], and M^{(N-1,N-1)}[i,j] from the upper-right corner (pixel (0,N-1)) toward the lower-left corner, from the lower-left corner (pixel (N-1,0)) toward the upper-right corner, and from the lower-right corner (pixel (N-1,N-1)) toward the upper-left corner, respectively. Of course, for pixel (0,N-1) to dominate any other pixel in P^C, its coordinates must be changed from (0,N-1) to (0,0); re-coordinating every other pixel from (i,j) to (i,N-1-j) is also needed for computing M^{(0,N-1)}[i,j]. Similarly, re-coordinating each pixel from (i,j) to (N-1-i,j) and to (N-1-i,N-1-j) is needed for computing M^{(N-1,0)}[i,j] and M^{(N-1,N-1)}[i,j], respectively. For brevity, we omit the details.
After this step, each pixel (i,j) (in original coordinates) of P^C has its corresponding M^{(0,0)}[i,j], M^{(0,N-1)}[i,j], M^{(N-1,0)}[i,j], and M^{(N-1,N-1)}[i,j].
Step 5: Compute 2d_cdt_{i,j} = min{M^{(0,0)}[i,j], M^{(0,N-1)}[i,j], M^{(N-1,0)}[i,j], M^{(N-1,N-1)}[i,j]} for each pixel (i,j) of P^C. The computed 2d_cdt_{i,j} is exactly the 2D_CDT of pixel (i,j) of P.
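Steps 1-5 above can be exercised end to end with a sequential stand-in for the LARPBS parts. In the sketch below, the corner-oriented M values are obtained by a largest-1-square dynamic program on the complemented image (our substitute for Steps 2-4, an assumption rather than the paper's Lemma-based computation); pixels outside P^C count as 1-pixels, and N stands in for the infinite distance:

```python
def cdt_2d_by_corners(P):
    """Algorithm 2D_CDT sketch: complement P, compute a 2D_BB_MAT from each
    of the four corners, and take the pixelwise minimum (Step 5)."""
    n = len(P)
    comp = [[1 - P[i][j] for j in range(n)] for i in range(n)]

    def mat_from_upper_left(img):
        # M[i][j]: side of the largest all-1 square of img whose upper-left
        # corner is (i, j); the region outside img counts as 1-pixels, so
        # values are capped at n (the stand-in for infinity).
        m = [[0] * n for _ in range(n)]

        def get(i, j):
            return m[i][j] if i < n and j < n else n

        for i in range(n - 1, -1, -1):
            for j in range(n - 1, -1, -1):
                if img[i][j]:
                    m[i][j] = min(n, 1 + min(get(i + 1, j),
                                             get(i, j + 1),
                                             get(i + 1, j + 1)))
        return m

    def flipped(img, fi, fj):
        return [[img[n - 1 - i if fi else i][n - 1 - j if fj else j]
                 for j in range(n)] for i in range(n)]

    cdt = [[n] * n for _ in range(n)]
    for fi in (0, 1):          # one orientation per corner of the image
        for fj in (0, 1):
            m = mat_from_upper_left(flipped(comp, fi, fj))
            m = flipped(m, fi, fj)  # back to original coordinates
            for i in range(n):
                for j in range(n):
                    cdt[i][j] = min(cdt[i][j], m[i][j])
    return cdt
```

Cross-checking the result against Definition 5 on small random images confirms the four-corner reduction numerically.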

Y X

(0,0)

(0,7)

(7,0)

(7,7)

(0,0) 2 1 0 0 0 2 1 1

2 1 0 0 1 2 1 0

0 0 1 1 2 1 0 0

0 0 2 2 1 0 0 0

0 0 3 2 1 1 0 0

b

a 2 1 0 0 0 8 8 8

1 1 0 0 2 1 1 0

0 1 3 2 1 1 0 0

1 2 1 0 2 1 1 0

2 1 0 0 0 2 1 1

1 (0,7) 0 0 0 0 0 2 2

2 2 0 0 0 1 2 (7,0) 1

1 1 0 0 1 2 1 0

0 0 1 2 3 1 0 0

0 0 1 2 2 0 0 0

(0,0) 2 1 0 0 0 1 1 (7,0) 1

1 1 0 0 1 1 1 0

0 0 1 1 1 1 0 0

0 0 1 2 1 0 0 0

0 2 1 1 2 1 0 0

2 1 1 0 1 2 1 0

1 1 0 0 0 1 8 8

1 0 0 0 0 0 8 8

0 1 1 2 3 1 0 0

1 1 2 0 1 2 1 0

1 2 0 0 0 1 2 1

1 0 0 0 0 0 1 2 (7,7)

c 0 1 2 1 1 2 0 0

8 2 1 0 1 1 2 0

8 1 0 0 0 1 1 2

8 0 0 0 0 0 1 2

e

d

0 0 2 2 1 0 0 0

8 8 0 0 0 1 1 1

8 8 0 0 1 1 2 0

0 0 1 1 1 2 0 0

0 0 1 2 2 0 0 0

f 0 1 1 1 1 1 0 0

1 1 1 0 1 1 1 0

1 1 0 0 0 1 1 1

1 (0,7) 0 0 0 0 0 1 2 (7,7)

g Fig. 5. An example of 2D_CDT computation based on the computed 2D_BB_MAT. (a) A binary image of size 8  8, denoted P. (b) The complemented image of (a), denoted PC . (c)–(f) The corresponding Mð0;0Þ ½i; j, M ð0;N1Þ ½i; j, M ðN1;0Þ ½i; j, and M ðN1;N1Þ ½i; j of PC . (g) The computed 2D_CDT of P.


An example is given in Fig. 5. Fig. 5a shows a binary image of size 8 × 8. Fig. 5b is the complemented image of Fig. 5a after we perform Step 1. Fig. 5c is the computed M^{(0,0)}[i,j] of Fig. 5b after we perform Steps 2 and 3 from the upper-left corner (i.e., point (0,0)) toward the lower-right corner, as indicated by the direction of the arrow. Similarly, in Step 4, the corresponding M^{(0,7)}[i,j], M^{(7,0)}[i,j], and M^{(7,7)}[i,j] of Fig. 5b are computed as shown in Fig. 5d-f, respectively. Note that, as stated in Step 2, we assume that all the pixels outside P (i.e., Fig. 5a) are 0-pixels, and hence all the pixels outside P^C (i.e., Fig. 5b) are 1-pixels. This results in M^{(0,0)}[6,6] = M^{(0,0)}[6,7] = M^{(0,0)}[7,6] = M^{(0,0)}[7,7] = ∞ as shown in Fig. 5c, because the maximum chessboard distance from any 0-pixel to its nearest 1-pixel within a binary image of size 8 × 8 is 7 whenever the image contains at least one 1-pixel. Without loss of generality, we use 8 (i.e., N) to represent ∞ for convenience. All the entries marked 8 in Fig. 5d-f are defined and computed similarly. Finally, in Step 5, the computed 2d_cdt_{i,j} shown in Fig. 5g is exactly the 2D_CDT of each pixel (i,j) of Fig. 5a. From the above analysis, it is clear that a 2D_CDT problem of a binary image can be reduced to four 2D_BB_MAT problems, as stated in Lemma 6. Clearly, based on Lemma 6 and by invoking Theorem 1, we conclude Corollary 2.

Lemma 6. The 2D_CDT problem can be reduced to four 2D_BB_MAT problems.

Corollary 2. The 2D_CDT of a binary image of size N × N can be computed in O(1) time on an LARPBS of size a, where N^2 ≤ a < N^{3+ε} and 0 < ε = 1/(2^{c+1} - 1) ≤ 1.

6.2. Algorithm 3D_CDT

Algorithm 2D_CDT extends to Algorithm 3D_CDT straightforwardly.
When we extend Algorithm 2D_CDT to Algorithm 3D_CDT, the computed 3d_cdt_{i,j,k} is exactly the 3D_CDT of each voxel (i,j,k) of V, where 3d_cdt_{i,j,k} is the minimum among M^{(0,0,0)}[i,j,k], M^{(0,0,N-1)}[i,j,k], M^{(0,N-1,0)}[i,j,k], M^{(0,N-1,N-1)}[i,j,k], M^{(N-1,0,0)}[i,j,k], M^{(N-1,0,N-1)}[i,j,k], M^{(N-1,N-1,0)}[i,j,k], and M^{(N-1,N-1,N-1)}[i,j,k]. The 3D notations used here are defined exactly as their counterparts in Algorithm 2D_CDT; we omit the details. Clearly, Lemma 6 extends to Lemma 7 straightforwardly.

Lemma 7. The 3D_CDT problem can be reduced to eight 3D_BB_MAT problems.

As stated at the beginning of this section, all the parallel CDT algorithms proposed by others are dedicated to solving the 2D_CDT problem only and are very difficult to extend to a solution of the 3D_CDT problem. Here we solve the 3D_CDT problem by reducing it to eight 3D_BB_MAT problems, each of which can be easily solved by peeling corner shells. From the time complexity perspective, the 3D_BB_MAT and the 3D_CDT have the same time complexity and are logically equivalent; for brevity, we omit the proof. Based on Lemma 7 and by invoking Theorem 2, we solve the 3D_CDT problem as stated in Corollary 3. To date, this is the first parallel 3D_CDT algorithm proposed.

Corollary 3. The 3D_CDT of a binary image of size N × N × N can be computed in O(1) time on an LARPBS of size b, where N^3 ≤ b < N^{4+ε} and 0 < ε = 1/(2^{c+1} - 1) ≤ 1.
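Lemma 7 can likewise be checked in miniature. The sketch below computes the 3D_CDT of a tiny volume as the voxelwise minimum of eight corner-oriented 3D_BB_MATs of the complemented volume; as in the 2D case, a largest-1-cube dynamic program (our assumption) stands in for the paper's LARPBS computation, voxels outside the complement count as 1-voxels, and N represents ∞:

```python
from itertools import product

def cdt_3d_by_corners(V):
    """Lemma 7 in miniature: 3D_CDT as the voxelwise minimum of eight
    3D_BB_MATs of the complement, one per corner orientation."""
    n = len(V)
    comp = [[[1 - V[i][j][k] for k in range(n)]
             for j in range(n)] for i in range(n)]

    def mat(img):
        # m[i][j][k]: side of the largest all-1 cube of img whose
        # lowest-index corner is (i, j, k), capped at n (stand-in for ∞).
        m = [[[0] * n for _ in range(n)] for _ in range(n)]

        def g(i, j, k):
            return m[i][j][k] if i < n and j < n and k < n else n

        for i, j, k in product(range(n - 1, -1, -1), repeat=3):
            if img[i][j][k]:
                m[i][j][k] = min(n, 1 + min(
                    g(i+1, j, k), g(i, j+1, k), g(i, j, k+1),
                    g(i+1, j+1, k), g(i+1, j, k+1), g(i, j+1, k+1),
                    g(i+1, j+1, k+1)))
        return m

    def flip(img, fi, fj, fk):
        return [[[img[n-1-i if fi else i][n-1-j if fj else j][n-1-k if fk else k]
                  for k in range(n)] for j in range(n)] for i in range(n)]

    cdt = [[[n] * n for _ in range(n)] for _ in range(n)]
    for fi, fj, fk in product((0, 1), repeat=3):   # the eight corners
        m = flip(mat(flip(comp, fi, fj, fk)), fi, fj, fk)
        for i, j, k in product(range(n), repeat=3):
            cdt[i][j][k] = min(cdt[i][j][k], m[i][j][k])
    return cdt
```

Comparing the result against Definition 6 on small volumes confirms the eight-corner reduction numerically.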
7. Concluding remarks

In this paper, we develop an O(1) time algorithm for computing the 2D_BB_MAT of a binary image of size N × N on an LARPBS with a processors, where N^2 ≤ a < N^{3+ε}, and an O(1) time algorithm for computing the 3D_BB_MAT of a binary image of size N × N × N on an LARPBS with b processors, where N^3 ≤ b < N^{4+ε}, with 0 < ε = 1/(2^{c+1} - 1) ≤ 1 in both cases. The worst case happens only when most of the points in V (P) are 1-points. For an image with sparse 1-points, the proposed algorithms may be very efficient, taking N^2 processors to solve the 2D_BB_MAT problem or N^3 processors to solve the 3D_BB_MAT problem in O(1) time. Besides, the running time of the 2D_BB_MAT algorithm has a smaller constant factor than all previously proposed O(1) time 2D_BB_MAT algorithms. We then solve the 3D_CDT (2D_CDT) problem by invoking the 3D_BB_MAT (2D_BB_MAT) algorithm; the O(1) time 3D_CDT (2D_CDT) algorithm uses exactly the same number of processors as the 3D_BB_MAT (2D_BB_MAT) algorithm. Although the proposed algorithms may look straightforward, all of them are in fact innovative. To the best of our knowledge, the proposed 2D_BB_MAT (2D_CDT) algorithm is the first that extends to a parallel solution of the 3D_BB_MAT (3D_CDT) problem, and the 3D_BB_MAT (3D_CDT) algorithm is the first parallel algorithm proposed.

Acknowledgement

This work was partially supported by the National Science Council under Contract Number NSC 97-2221-E-129-007.

References

[1] H. Blum, A transformation for extracting new descriptors of shape, in: W. Wathen-Dunn (Ed.), Models for the Perception of Speech and Visual Form, MIT Press, Cambridge, MA, 1967, pp. 362–380.
[2] S. Chandran, D. Mount, Shared memory algorithms and the medial axis transform, in: IEEE Workshop on Computer Architecture for PAMI, 1987, pp. 44–50.
[3] S. Chandran, S.K. Kim, D. Mount, Parallel computational geometry of rectangles, Algorithmica 7 (1992) 25–49.
[4] H. Choi, S. Choi, H. Moon, N. Wee, New algorithms for medial axis transform of plane domain, Graphical Models and Image Processing 59 (6) (1997) 463–483.
[5] A. Ferreira, S. Ubéda, Computing the medial axis transform in parallel with eight scan operations, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (1999) 277–282.
[6] A. Fujiwara, M. Inoue, T. Masuzawa, H. Fujiwara, A cost optimal parallel algorithm for weighted distance transforms, Parallel Computing 25 (4) (1999) 405–416.
[7] A. Fujiwara, M. Inoue, T. Masuzawa, H. Fujiwara, A simple parallel algorithm for the medial axis transform, IEICE Transactions on Information and Systems E79-D (8) (1996) 1038–1045.
[8] Z. Guo, R.G. Melhem, R.W. Hall, D.M. Chiarulli, S.P. Levitan, Pipelined communications in optically interconnected arrays, Journal of Parallel and Distributed Computing 12 (3) (1991) 269–282.
[9] Z. Guo, Optically interconnected processor arrays with switching capability, Journal of Parallel and Distributed Computing 23 (1994) 314–329.
[10] J.F. Jeng, S. Sahni, Serial and parallel algorithms for the medial axis transform, IEEE Transactions on Pattern Analysis and Machine Intelligence 14 (1992) 1218–1224.
[11] S.K. Kim, Optimal parallel algorithms for region labeling and medial axis transform of binary images, SIAM Journal on Discrete Mathematics 4 (3) (1991) 385–396.
[12] Y.H. Lee, S.J. Horng, Equivalence of the chessboard distance transform and the medial axis transform, International Journal of Computer Mathematics 65 (1997) 165–177.
[13] S.P. Levitan, D.M. Chiarulli, R.G. Melhem, Coincident pulse technique for multiprocessor interconnection structures, Applied Optics 29 (1990) 2024–2033.
[14] K. Li, Y. Pan, S.-Q. Zheng (Eds.), Parallel Computing Using Optical Interconnections, Kluwer Academic Publishers, Boston, 1998.
[15] K. Li, Y. Pan, S.-Q. Zheng, Fast and processor efficient parallel matrix multiplication algorithms on a linear array with a reconfigurable pipelined bus system, IEEE Transactions on Parallel and Distributed Systems 9 (8) (1998) 705–720.
[16] K. Li, Y. Pan, S.-Q. Zheng, Efficient deterministic and probabilistic simulations of PRAMs on linear arrays with reconfigurable pipelined bus systems, Journal of Supercomputing 15 (2000) 163–181.
[17] R.G. Melhem, D.M. Chiarulli, S.P. Levitan, Space multiplexing of waveguides in optically interconnected multiprocessor systems, Computer Journal 32 (4) (1989) 362–369.
[18] Y. Pan, M. Hamdi, Quicksort on a linear array with a reconfigurable pipelined bus, in: Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks, 1996, pp. 313–319.
[19] Y. Pan, K. Li, Linear array with a reconfigurable pipelined bus system – concepts and applications, Journal of Information Sciences (1998) 237–258.
[20] Y. Pan, K. Li, S.-Q. Zheng, Fast nearest neighbor algorithms on a linear array with a reconfigurable pipelined bus system, Parallel Algorithms and Applications 13 (1998) 1–25.
[21] S. Pavel, S.G. Akl, On the power of arrays with reconfigurable optical bus, in: Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1996, pp. 1443–1454.
[22] C. Qiao, R.G. Melhem, Time-division communications in multiprocessor arrays, IEEE Transactions on Computers 42 (1993) 577–590.
[23] A. Rosenfeld, A.C. Kak, Digital Picture Processing, Academic Press, 1982.
[24] S. Sahni, Models and algorithms for optical and optoelectronic parallel computers, in: Proceedings of the Fourth International Symposium on Parallel Architectures, Algorithms and Networks (I-SPAN'99), June 23–25, 1999, pp. 2–7.
[25] O. Schwarzkopf, Parallel computation of distance transforms, Algorithmica 6 (1991) 685–697.
[26] J. Serra, Image Analysis and Mathematical Morphology, Academic Press, 1982.
[27] D.F. Stout, Prob 80-4: improved solution, Journal of Algorithms 5 (1983) 176–177.
[28] K. Vo, Prob 80-4, Journal of Algorithms 4 (3) (1982) 366–368.
[29] Y.R. Wang, S.J. Horng, Y.H. Lee, P.Z. Lee, Parallel algorithms for higher-dimensional Euclidean distance transforms with applications, IEICE Transactions on Information and Systems E86-D (9) (2003) 1586–1593.
[30] Y.R. Wang, S.J. Horng, Parallel algorithms for arbitrary dimensional Euclidean distance transforms with applications on arrays with reconfigurable optical buses, IEEE Transactions on Systems, Man and Cybernetics – Part B: Cybernetics 34 (1) (2004) 517–532.
[31] Y.R. Wang, S.J. Horng, C.H. Wu, Efficient algorithms for all nearest neighbor and closest pair problems on the linear array with a reconfigurable pipelined bus system, IEEE Transactions on Parallel and Distributed Systems 16 (3) (2005) 193–206.
[32] Y.R. Wang, Fast algorithms for block-based medial axis transform on the LARPBS, in: Proceedings of the 2005 IEEE International Conference on Systems, Man and Cybernetics (IEEE SMC 2005), October 10–12, 2005, pp. 3616–3621.
[33] Y.R. Wang, An efficient O(1) time 3D all nearest neighbor algorithm from image processing perspective, Journal of Parallel and Distributed Computing 67 (2007) 1082–1091.