

IEEE TRANSACTIONS ON NANOTECHNOLOGY, VOL. 13, NO. 2, MARCH 2014

Design of Efficient Binary Comparators in Quantum-Dot Cellular Automata

Stefania Perri, Senior Member, IEEE, Pasquale Corsonello, Member, IEEE, and Giuseppe Cocorullo, Member, IEEE

Abstract—Quantum-dot cellular automata (QCA) are an attractive emerging technology for the development of ultradense, low-power, high-performance digital circuits. Efficient solutions have recently been proposed for several arithmetic circuits, such as adders, multipliers, and comparators. Nevertheless, since the design of digital circuits in QCA still poses several challenges, novel implementation strategies and methodologies are highly desirable. This paper proposes a new design approach oriented to the implementation of binary comparators in QCA. New formulations of the basic logic equations required to perform the comparison function are proposed. The new strategy has been exploited in the design of two different comparator architectures and for several operand word lengths. With respect to existing counterparts, the comparators proposed here exhibit significantly higher speed and reduced overall area.

Index Terms—Binary comparators, majority gates, quantum-dot cellular automata (QCA).

Manuscript received June 11, 2013; revised September 26, 2013; accepted December 11, 2013. Date of publication December 20, 2013; date of current version March 6, 2014. The review of this paper was arranged by Associate Editor C. A. Moritz. The authors are with the Department of Electronics, Computer Sciences and Systems, University of Calabria, Rende 87036, Italy. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNANO.2013.2295711

I. INTRODUCTION

Quantum-dot cellular automata (QCA) technology provides a promising opportunity to overcome the approaching limits of conventional CMOS technology [1]–[6]. For this reason, in recent years the design of logic circuits based on QCA has received a great deal of attention, and special efforts have been directed towards arithmetic circuits, such as adders [7]–[14], multipliers [15]–[21], and comparators [22]–[28].

Even though comparators are key elements for a wide range of applications [29], [30], most QCA implementations in the literature compare only two single bits. Only a few examples of comparators able to process n-bit operands, with n > 2, are available [24], [26], [27]. The comparator described in [22] simply computes the XNOR function to establish whether two input bits a and b match each other. The structures proposed in [23]–[28] provide higher computational capabilities, and circuits able to separately recognize all three possible conditions a = b, a > b, and a < b (here named full comparators) are described in [23], [24], and [27]. The 1-bit implementation proposed in [23] and then improved in [25] has been exploited in [27] to design a parallel n-bit full comparator. An example of a serial structure is provided in [24], whereas the n-bit comparator described in [26] can recognize only the case A ≥ B, A and B being the n-bit inputs. Alternative QCA implementations of 1-bit full comparators were recently proposed in [28]; with respect to other QCA designs, they exhibit reduced delay, area occupancy, and cell count.

This paper focuses on the design of efficient parallel QCA-based n-bit full comparators. Its main contribution is a novel design methodology that achieves low computational time and very compact layouts. In particular, original theorems and corollaries are stated and proven that directly affect the QCA realization of some basic Boolean functions used within the comparator architectures. The new theorems were applied to obtain innovative QCA-based n-bit full comparators that were laid out and simulated using the QCADesigner tool [31] for n ranging from 2 to 32. As an example, one of the 32-bit comparators designed with the proposed theory uses fewer than 2800 cells within an overall area of about 2.66 μm² and completes the operation in only 15 clock phases (i.e., 3.75 clock cycles).

The rest of the paper is organized as follows: Section II gives a brief background on the QCA design approach and on existing QCA implementations of binary comparators; Section III states the new theorems and corollaries, whose proofs are given in the Appendix; Section IV presents the comparators designed with the new theorems, together with comparisons against existing designs; finally, Section V draws conclusions.

II. BACKGROUND AND RELATED WORKS

The basic element of a QCA nanostructure is a square cell with four quantum dots and two free electrons.
The electrons can tunnel between the dots within the cell but, owing to Coulombic repulsion, they always settle in diagonally opposite corners [1]. This leads to only two stable states, also named polarizations, which are associated with the binary values 1 and 0. Adjacent cells interact through electrostatic forces and tend to align their polarizations. However, QCA cells have no intrinsic data-flow directionality. Therefore, to achieve controllable data directions, the cells within a QCA design are partitioned into so-called clock zones that are progressively associated with four clock signals, each phase-shifted by 90°. This clock scheme, named the zone clocking scheme, makes QCA designs intrinsically pipelined, since each clock zone behaves like a D-latch [20]. QCA cells are used both for logic structures and for interconnections, and wire crossings can exploit either the coplanar cross or the bridge technique [1], [2], [6], [31], [32].

1536-125X © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

The fundamental logic gates inherently available in QCA technology are the inverter and the majority gate (MG). Given three inputs a, b, and c, the MG performs the logic function in (1), provided that all input cells are associated with the same clock signal $clk_x$ (with x ranging from 0 to 3), whereas the remaining cells of the MG are associated with the clock signal $clk_{x+1}$:

$$M(a, b, c) = a \cdot b + a \cdot c + b \cdot c \tag{1}$$
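To make (1) concrete, the following Python sketch models the MG purely at the logic level (no cell layout, clocking, or polarization physics is modeled). It also shows the usual trick, used by several designs discussed below, of fixing one MG input to 0 or 1 to obtain a 2-input AND or OR. The function names are illustrative only and do not belong to any QCA tool.

```python
from itertools import product


def maj(a: int, b: int, c: int) -> int:
    """Logic-level model of the three-input majority gate of Eq. (1)."""
    return (a & b) | (a & c) | (b & c)


def and2(a: int, b: int) -> int:
    """2-input AND obtained from an MG whose third input is fixed to 0."""
    return maj(a, b, 0)


def or2(a: int, b: int) -> int:
    """2-input OR obtained from an MG whose third input is fixed to 1."""
    return maj(a, b, 1)


# Print the truth table of M(a, b, c).
for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, "->", maj(a, b, c))
```

In hardware each `maj` call costs one MG and one clock phase, which is why the designs below are compared by counting cascaded MGs.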

Several QCA designs of comparators are available in the literature [22]–[28]. A 1-bit binary comparator receives two bits a and b and establishes whether they are equal or which one is greater. The possible outcomes are represented through three output signals, here named $A_{\mathrm{eq}}B$, $A_{\mathrm{big}}B$, and $B_{\mathrm{big}}A$, which are asserted when a = b, a > b, and a < b, respectively. Full comparators can separately identify all of the above cases, whereas non-full comparators recognize just one or two of them. For example, the comparator designed in [22] and depicted in Fig. 1(a) can only verify whether a = b. Conversely, the circuits shown in Fig. 1(b) and (c), proposed in [23] and [24], are full comparators. The latter also exploits two 1-bit registers D to process n-bit operands serially, from the least significant bit to the most significant one. With the main objective of reducing the number of wire crossings, which remain a major challenge in QCA design [33]–[35], the universal logic gate (ULG) f(y1, y2, y3) = M(M(y1, y2, 0), M(y1, y3, 1), 1) was proposed in [25] and then used to implement the comparator illustrated in Fig. 1(d). It is worth noting that two n-bit numbers $A_{(n-1:0)} = a_{n-1} \dots a_0$ and $B_{(n-1:0)} = b_{n-1} \dots b_0$ can be processed by cascading n instances of this 1-bit comparator. Each instance receives as inputs the ith bits $a_i$ and $b_i$ (with i = n − 1, …, 0) of the operands and the signals $A_{\mathrm{big}}B_{(i-1:0)}$ and $B_{\mathrm{big}}A_{(i-1:0)}$. The former is asserted when the subword $A_{(i-1:0)} = a_{i-1} \dots a_0$ represents a binary number greater than $B_{(i-1:0)} = b_{i-1} \dots b_0$; similarly, $B_{\mathrm{big}}A_{(i-1:0)}$ is set to 1 when $A_{(i-1:0)} < B_{(i-1:0)}$. The outputs $A_{\mathrm{big}}B_{(i:0)}$ and $B_{\mathrm{big}}A_{(i:0)}$ directly feed the next stage. Since this circuit does not identify the case A = B, it cannot be classified as a full comparator.
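The cascading scheme just described can be sketched in Python as follows. This is only a logic-level illustration under stated assumptions: each stage is written with ordinary AND/OR/XNOR equations rather than with the ULG of [25], and, as in the description of that cascade, $B_{\mathrm{big}}A$ is active-high here (1 when A < B). The function name is hypothetical. The point it shows is structural: the n stages are in series, so the number of gate levels grows linearly with n.

```python
def ripple_compare(a: int, b: int, n: int) -> tuple[int, int]:
    """Cascade of n 1-bit comparator stages, least significant bit first.

    Returns (AbigB, BbigA) for the n-bit operands a and b, where
    AbigB = 1 if a > b and BbigA = 1 if a < b (both 0 when a == b).
    """
    a_big = b_big = 0  # results for the empty subword below bit 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        same = 1 - (ai ^ bi)  # XNOR: bit i does not decide the comparison
        # Bit i decides if it differs; otherwise the lower-order result propagates.
        a_big, b_big = (ai & (1 - bi)) | (same & a_big), ((1 - ai) & bi) | (same & b_big)
    return a_big, b_big


print(ripple_compare(0b1010, 0b1001, 4))  # (1, 0)
```

As with the circuit of Fig. 1(d), equality is not reported explicitly; it can only be inferred when both outputs are 0.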
The design described in [26] exploits a tree-based (TB) architecture and exhibits a delay that, in theory, increases logarithmically with n. The 2-bit version of this comparator is illustrated in Fig. 1(e). The full comparator proposed in [27] also exploits a TB architecture to achieve high speed. As shown in Fig. 1(f), where 4-bit operands are assumed, one instance of the 1-bit comparator presented in [23] is used for each bit position. The intermediate results obtained in this way are then further processed through an appropriate number of cascaded 2-input OR and AND gates, implemented by means of MGs with one input permanently set to 1 and 0, respectively. Analyzing existing QCA implementations of binary comparators, it can be observed that they were designed by directly mapping the basic Boolean functions consolidated for CMOS logic designs onto MGs and inverters, or onto ULGs. Unfortunately, in this way the computational capability offered by each MG can be underutilized [13], [36], [37]. As a consequence, both the complexity and the overall delay of the resulting QCA designs may be needlessly increased.

Fig. 1. QCA-based comparators presented in: (a) [22]; (b) [23]; (c) [24]; (d) [25]; (e) [26]; (f) [27].

III. NEW FORMULATIONS FOR QCA IMPLEMENTATIONS OF n-BIT FULL COMPARATORS

In this section, four original theorems and two corollaries are stated that can significantly increase the speed of QCA-based full comparators and significantly reduce the number of MGs and inverters with
respect to existing comparators, thus also reducing the number of cells and the overall active area. The Appendix at the end of the paper provides the proofs of the new theorems and corollaries. The novel formulations can be exploited in the design of n-bit full comparators by splitting the operands $A_{(n-1:0)} = a_{n-1} \dots a_0$ and $B_{(n-1:0)} = b_{n-1} \dots b_0$ into an appropriate number of 2-bit and 3-bit subwords that are compared by applying Theorems 1 and 2. The intermediate results obtained in this way can then be further processed by applying Theorems 3 and 4 together with Corollaries 1 and 2. Throughout, $A_{\mathrm{big}}B$ is active-high (1 when A > B), whereas $B_{\mathrm{big}}A$ is active-low (0 when A < B), and $\overline{x}$ denotes the complement of x.

Theorem 1: If $A_{(k-1:k-2)} = a_{k-1}a_{k-2}$ and $B_{(k-1:k-2)} = b_{k-1}b_{k-2}$, with k = 2, 4, …, n − 2, n, are two 2-bit subwords of the n-bit numbers $A_{(n-1:0)}$ and $B_{(n-1:0)}$, respectively, then $A_{\mathrm{big}}B_{(k-1:k-2)}$ as defined in (2) is equal to 1 if and only if $A_{(k-1:k-2)} > B_{(k-1:k-2)}$, and $B_{\mathrm{big}}A_{(k-1:k-2)}$ as defined in (3) is equal to 0 if and only if $A_{(k-1:k-2)} < B_{(k-1:k-2)}$:

$$A_{\mathrm{big}}B_{(k-1:k-2)} = M(a_{k-1}, \overline{b_{k-1}}, a_{k-2}) \cdot M(a_{k-1}, \overline{b_{k-1}}, \overline{b_{k-2}}) \tag{2}$$

$$B_{\mathrm{big}}A_{(k-1:k-2)} = M(a_{k-1}, \overline{b_{k-1}}, a_{k-2}) + M(a_{k-1}, \overline{b_{k-1}}, \overline{b_{k-2}}) \tag{3}$$

Theorem 2: If $A_{(k-1:k-3)} = a_{k-1}a_{k-2}a_{k-3}$ and $B_{(k-1:k-3)} = b_{k-1}b_{k-2}b_{k-3}$, with k = 3, 6, …, n − 3, n, are 3-bit subwords of the n-bit numbers $A_{(n-1:0)}$ and $B_{(n-1:0)}$, respectively, then $A_{\mathrm{big}}B_{(k-1:k-3)}$ as defined in (4) is equal to 1 if and only if $A_{(k-1:k-3)} > B_{(k-1:k-3)}$, and $B_{\mathrm{big}}A_{(k-1:k-3)}$ as given in (5) is equal to 0 if and only if $A_{(k-1:k-3)} < B_{(k-1:k-3)}$:

$$A_{\mathrm{big}}B_{(k-1:k-3)} = M\big(M(a_{k-1}, \overline{b_{k-1}}, a_{k-2}),\ M(a_{k-1}, \overline{b_{k-1}}, \overline{b_{k-2}}),\ a_{k-3} \cdot \overline{b_{k-3}}\big) \tag{4}$$

$$B_{\mathrm{big}}A_{(k-1:k-3)} = M\big(M(a_{k-1}, \overline{b_{k-1}}, a_{k-2}),\ M(a_{k-1}, \overline{b_{k-1}}, \overline{b_{k-2}}),\ a_{k-3} + \overline{b_{k-3}}\big) \tag{5}$$

Theorem 3: Given two n-bit numbers $A_{(n-1:0)}$ and $B_{(n-1:0)}$, (6) gives $A_{\mathrm{big}}B_{(n-1:0)} = 1$ if and only if $A_{(n-1:0)} > B_{(n-1:0)}$, whereas (7) gives $B_{\mathrm{big}}A_{(n-1:0)} = 0$ if and only if $A_{(n-1:0)} < B_{(n-1:0)}$:
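$$A_{\mathrm{big}}B_{(n-1:0)} = M\big(M(a_{n-1}, \overline{b_{n-1}}, a_{n-2}),\ M(a_{n-1}, \overline{b_{n-1}}, \overline{b_{n-2}}),\ A_{\mathrm{big}}B_{(n-3:0)}\big) \tag{6}$$

$$B_{\mathrm{big}}A_{(n-1:0)} = M\big(M(a_{n-1}, \overline{b_{n-1}}, a_{n-2}),\ M(a_{n-1}, \overline{b_{n-1}}, \overline{b_{n-2}}),\ B_{\mathrm{big}}A_{(n-3:0)}\big) \tag{7}$$

Theorem 4: If $A_{(n-1:0)}$ and $B_{(n-1:0)}$ are two n-bit numbers, and $A_{\mathrm{big}}B_{(n-1:n-3)}$ and $B_{\mathrm{big}}A_{(n-1:n-3)}$ are computed by (4) and (5), respectively, then $A_{\mathrm{big}}B_{(n-1:0)}$ as defined in (8) is equal to 1 if and only if $A_{(n-1:0)} > B_{(n-1:0)}$, whereas $B_{\mathrm{big}}A_{(n-1:0)}$ as defined in (9) is equal to 0 if and only if $A_{(n-1:0)} < B_{(n-1:0)}$:

$$A_{\mathrm{big}}B_{(n-1:0)} = M\big(A_{\mathrm{big}}B_{(n-1:n-3)},\ B_{\mathrm{big}}A_{(n-1:n-3)},\ A_{\mathrm{big}}B_{(n-4:0)}\big) \tag{8}$$

$$B_{\mathrm{big}}A_{(n-1:0)} = M\big(A_{\mathrm{big}}B_{(n-1:n-3)},\ B_{\mathrm{big}}A_{(n-1:n-3)},\ B_{\mathrm{big}}A_{(n-4:0)}\big) \tag{9}$$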
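Theorems 1–4 can be checked exhaustively for small word lengths with a logic-level model. The Python sketch below uses hypothetical helpers `t1`…`t4` that loosely mirror the module names of Section IV, but they only evaluate the Boolean equations (2)–(9) and say nothing about QCA layout. Operand bits are passed most significant bit first, and the complement $\overline{b}$ is modeled as `1 - b`. Because $B_{\mathrm{big}}A$ is active-low, its expected value is 1 exactly when A ≥ B.

```python
from itertools import product


def maj(a: int, b: int, c: int) -> int:
    return (a & b) | (a & c) | (b & c)


def t1(a1, a0, b1, b0):
    """Eqs. (2)-(3): compare the 2-bit subwords a1a0 and b1b0."""
    x = maj(a1, 1 - b1, a0)
    y = maj(a1, 1 - b1, 1 - b0)
    return x & y, x | y  # (AbigB, BbigA); BbigA is active-low


def t2(a2, a1, a0, b2, b1, b0):
    """Eqs. (4)-(5): compare the 3-bit subwords a2a1a0 and b2b1b0."""
    x = maj(a2, 1 - b2, a1)
    y = maj(a2, 1 - b2, 1 - b1)
    return maj(x, y, a0 & (1 - b0)), maj(x, y, a0 | (1 - b0))


def t3(a1, a0, b1, b0, abig_low, bbig_low):
    """Eqs. (6)-(7): prepend a 2-bit slice to an already compared lower part."""
    x = maj(a1, 1 - b1, a0)
    y = maj(a1, 1 - b1, 1 - b0)
    return maj(x, y, abig_low), maj(x, y, bbig_low)


def t4(abig_hi, bbig_hi, abig_low, bbig_low):
    """Eqs. (8)-(9): merge an upper 3-bit result with a lower result."""
    return maj(abig_hi, bbig_hi, abig_low), maj(abig_hi, bbig_hi, bbig_low)


def bits(v: int, n: int):
    """Bits of v, most significant first."""
    return [(v >> i) & 1 for i in reversed(range(n))]


def expected(a: int, b: int):
    return int(a > b), int(a >= b)


# Theorem 1 (2-bit) and Theorem 2 (3-bit): all operand pairs.
for a, b in product(range(4), repeat=2):
    assert t1(*bits(a, 2), *bits(b, 2)) == expected(a, b)
for a, b in product(range(8), repeat=2):
    assert t2(*bits(a, 3), *bits(b, 3)) == expected(a, b)

# Theorem 3: 4-bit operands = 2-bit slice on top of a 2-bit lower part.
for a, b in product(range(16), repeat=2):
    A, B = bits(a, 4), bits(b, 4)
    low = t1(A[2], A[3], B[2], B[3])
    assert t3(A[0], A[1], B[0], B[1], *low) == expected(a, b)

# Theorem 4: 5-bit operands = 3-bit subword on top of a 2-bit lower part.
for a, b in product(range(32), repeat=2):
    A, B = bits(a, 5), bits(b, 5)
    low = t1(A[3], A[4], B[3], B[4])
    assert t4(*t2(*A[:3], *B[:3]), *low) == expected(a, b)

print("Theorems 1-4 hold for all tested operand pairs")
```

The check reflects the mechanism behind all four theorems: when the upper bits are equal, the two majority terms disagree, so the outer MG simply passes the lower-order result through.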

Corollary 1: Consider two n-bit numbers $A_{(n-1:0)}$ and $B_{(n-1:0)}$, and suppose that they are split into the subwords $A_{(n-1:h)}$, $A_{(h-1:0)}$, $B_{(n-1:h)}$, and $B_{(h-1:0)}$. If $A_{\mathrm{big}}B_{(n-1:h)}$, $A_{\mathrm{big}}B_{(h-1:0)}$, $B_{\mathrm{big}}A_{(n-1:h)}$, and $B_{\mathrm{big}}A_{(h-1:0)}$ are computed by applying Theorems 3 and 4, then $A_{\mathrm{big}}B_{(n-1:0)}$ as defined in (10) is equal to 1 if and only if $A_{(n-1:0)} > B_{(n-1:0)}$, whereas $B_{\mathrm{big}}A_{(n-1:0)}$ as defined in (11) is equal to 0 if and only if $A_{(n-1:0)} < B_{(n-1:0)}$:

$$A_{\mathrm{big}}B_{(n-1:0)} = M\big(A_{\mathrm{big}}B_{(n-1:h)},\ B_{\mathrm{big}}A_{(n-1:h)},\ A_{\mathrm{big}}B_{(h-1:0)}\big) \tag{10}$$

$$B_{\mathrm{big}}A_{(n-1:0)} = M\big(A_{\mathrm{big}}B_{(n-1:h)},\ B_{\mathrm{big}}A_{(n-1:h)},\ B_{\mathrm{big}}A_{(h-1:0)}\big) \tag{11}$$

Corollary 2: Given two n-bit numbers $A_{(n-1:0)}$ and $B_{(n-1:0)}$, if $A_{\mathrm{big}}B_{(n-1:0)}$ and $B_{\mathrm{big}}A_{(n-1:0)}$ are computed by applying Theorems 1, 2, 3, and 4 and/or Corollary 1, then $A_{\mathrm{eq}}B_{(n-1:0)}$ as defined in (12) is equal to 1 if and only if $A_{(n-1:0)} = B_{(n-1:0)}$:

$$A_{\mathrm{eq}}B_{(n-1:0)} = M\big(\overline{A_{\mathrm{big}}B_{(n-1:0)}},\ B_{\mathrm{big}}A_{(n-1:0)},\ 0\big) \tag{12}$$

In the following, it is shown that several strategies can be adopted at the circuit level to apply the above formulations and, consequently, that different architectures and QCA implementations can be obtained for an n-bit full comparator. In all cases, the operands $A_{(n-1:0)}$ and $B_{(n-1:0)}$ are split into 2- and 3-bit subwords that are compared by applying Theorems 1 and 2, and the partial results are then combined by applying Theorems 3 and 4 together with Corollaries 1 and 2.

IV. DESIGNING BINARY COMPARATORS EXPLOITING THE NEW THEOREMS

The circuits illustrated in Fig. 2 were designed to implement the new equations of the previous section in QCA. The generic module Ti, with i ranging from 1 to 4, implements the equations stated in the ith theorem, whereas C1 and C2 compute the signals $A_{\mathrm{big}}B_{(k-1:0)}$, $B_{\mathrm{big}}A_{(k-1:0)}$, and $A_{\mathrm{eq}}B_{(k-1:0)}$ as shown in Corollaries 1 and 2, respectively. As application examples, these QCA modules have been used to design two different full-comparator structures, here named the cascade-based (CB) and the TB architectures. However, many other structures can be designed by combining the basic modules in different ways.

TABLE I SPLITTING CRITERION ADOPTED IN THE CB COMPARATORS

Fig. 2. QCA modules: (a) T1; (b) T2; (c) T3; (d) T4; (e) C1; and (f) C2.

Fig. 3. Novel 32-bit CB full comparator.

A. Novel QCA Comparators

The first proposed comparator exploits the CB architecture. To better explain how the overall computation is performed, Fig. 3 shows the schematic diagram of a possible 32-bit comparator based on the proposed theory. Following the criterion illustrated in Fig. 3, an n-bit CB full comparator designed as proposed here uses n/3 instances of T1 and/or T2; n/3 cascaded instances of T4, through which the signals $A_{\mathrm{big}}B_{(n-1:0)}$ and $B_{\mathrm{big}}A_{(n-1:0)}$ are computed; and one instance of C2, needed to also compute $A_{\mathrm{eq}}B_{(n-1:0)}$. The circles visible in Fig. 3 indicate the additional clock phases that have to be inserted on wires to guarantee the correct synchronization of the overall design. The CB full comparator was designed for operand word lengths ranging from 2 to 32 using, for n > 2, the splitting criterion summarized in Table I. Obviously, alternative splits could be used.
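A logic-level sketch of the CB organization may help. It assumes the 32-bit split implied by the latency breakdown given later in this section (a 2-bit subword on bits 1:0 followed by ten 3-bit subwords); the exact contents of Table I are not reproduced, and all names are hypothetical. Each signal carries, besides its value, the number of cascaded MGs behind it (inverters and primary inputs count as zero), so the sketch also reports the MG depth of the $A_{\mathrm{eq}}B$ path.

```python
import random
from typing import List, Tuple

Sig = Tuple[int, int]  # (logic value, number of cascaded MGs behind it)


def inp(v: int) -> Sig:
    return (v, 0)  # primary inputs and constants sit behind no MG


ZERO, ONE = inp(0), inp(1)


def maj(x: Sig, y: Sig, z: Sig) -> Sig:
    """Majority gate of Eq. (1); adds one MG level to the deepest input."""
    (a, da), (b, db), (c, dc) = x, y, z
    return (a & b) | (a & c) | (b & c), max(da, db, dc) + 1


def inv(x: Sig) -> Sig:
    return 1 - x[0], x[1]  # inverters are not counted as MG levels


def subword(a: int, b: int, lo: int, width: int) -> Tuple[Sig, Sig]:
    """T1 (width 2, Eqs. (2)-(3)) or T2 (width 3, Eqs. (4)-(5)) on bits lo+width-1..lo."""
    def bit(v: int, i: int) -> Sig:
        return inp((v >> i) & 1)
    top = lo + width - 1
    x = maj(bit(a, top), inv(bit(b, top)), bit(a, top - 1))
    y = maj(bit(a, top), inv(bit(b, top)), inv(bit(b, top - 1)))
    if width == 2:
        return maj(x, y, ZERO), maj(x, y, ONE)  # AND and OR as MGs with a constant input
    a0, nb0 = bit(a, lo), inv(bit(b, lo))
    return maj(x, y, maj(a0, nb0, ZERO)), maj(x, y, maj(a0, nb0, ONE))


def cb_compare(a: int, b: int, widths: List[int]) -> Tuple[Sig, Sig, Sig]:
    """CB comparator; widths lists the subword sizes from the least significant end."""
    big, small = subword(a, b, 0, widths[0])
    lo = widths[0]
    for w in widths[1:]:
        hb, hs = subword(a, b, lo, w)  # computed in parallel with the chain
        big, small = maj(hb, hs, big), maj(hb, hs, small)  # T4, Eqs. (8)-(9)
        lo += w
    eq = maj(inv(big), small, ZERO)  # C2, Eq. (12)
    return big, small, eq


SPLIT32 = [2] + [3] * 10  # bits 1:0, then 4:2, 7:5, ..., 31:29

a, b = random.getrandbits(32), random.getrandbits(32)
big, small, eq = cb_compare(a, b, SPLIT32)
print((big[0], small[0], eq[0]) == (int(a > b), int(a >= b), int(a == b)))  # True
print("MGs on the AeqB path:", eq[1])  # 13
```

The depth of 13 matches the worst-case path of 13 MGs stated for the 32-bit CB design; the 2 inverters on that path are not counted by this model.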

As is well known, the number of cascaded MGs within the worst computational path of a QCA design directly affects the achieved delay, since each MG introduces one clock phase. From Fig. 2, it can be seen that the modules T1 and T2 contribute one inverter and two MGs to the computational path. Each instance of T4 introduces one more MG, whereas C2 is responsible for one MG and one inverter. As a consequence, the critical computational path of the novel n-bit CB full comparator consists of n/3 + 3 MGs and 2 inverters. For example, the 32-bit implementation depicted in Fig. 3 has a worst-case path made up of 13 MGs and 2 inverters.

As always happens in CB architectures, the number of MGs within the computational path of the comparator described above increases linearly with n. An alternative solution presented here adopts a TB architecture to achieve shorter computational paths. With this approach, several implementations of an n-bit full comparator can be designed by combining the new theorems and corollaries, as well as their QCA implementations depicted in Fig. 2, in different ways. The TB comparators implement the comparison function recursively. The operands A and B are first partitioned as $A = A_{\mathrm{MSB}}A_{\mathrm{LSB}}$ and $B = B_{\mathrm{MSB}}B_{\mathrm{LSB}}$. The portions $A_{\mathrm{MSB}}$ and $B_{\mathrm{MSB}}$ are compared independently of the portions $A_{\mathrm{LSB}}$ and $B_{\mathrm{LSB}}$. The depth of the recursion directly impacts the whole architecture. Examples of TB structures designed for 16- and 32-bit comparators are illustrated in Fig. 4. In Fig. 4(b) and (d), the recursion has its minimum depth: the portions $A_{\mathrm{MSB}}$ and $B_{\mathrm{MSB}}$, as well as $A_{\mathrm{LSB}}$ and $B_{\mathrm{LSB}}$, are separately compared through two independent CB architectures, and the overall result is finally built with the modules C1 and C2. Fig. 4(a) and (c) shows comparators designed with deeper recursions. In the rest of the paper, the 16- and 32-bit TB implementations illustrated in Fig. 4(b) and (d) are analyzed in detail. Referring to the QCA modules depicted in Fig. 2, it can be easily verified that the former uses 35 MGs and 17 inverters and has a critical computational path of 7 MGs and 2 inverters, whereas the latter uses 83 MGs and 33 inverters and has a worst-case path of 9 MGs and 2 inverters.

Fig. 4. Examples of novel TB comparators with: (a) and (b) 16-bit operands; (c) and (d) 32-bit inputs.

TABLE II PREIMPLEMENTATION RESULTS

TABLE III SIMULATION PARAMETERS

Fig. 5. Novel 16-bit comparators: (a) the CB; (b) the TB.

B. Results

Preliminary results obtained for the novel comparators at several operand word lengths are reported in Table II and compared with the pre-implementation characteristics given in the original papers for the parallel comparators described in [26] and [27] for operands wider than 1 bit. The design complexity of the examined comparators is reported in terms of the number of MGs and inverters required in the overall designs and the number of MGs within the worst computational paths. The computational capability of each architecture is also specified.
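The minimum-depth TB organization of Fig. 4(b) and (d) can be sketched at the logic level as two independent CB chains, one for $A_{\mathrm{LSB}}$/$B_{\mathrm{LSB}}$ and one for $A_{\mathrm{MSB}}$/$B_{\mathrm{MSB}}$, whose results are merged by C1 [(10) and (11)] and C2 [(12)]. The subword splits used below are illustrative assumptions rather than the ones drawn in Fig. 4, and the helper names are hypothetical. MG depth is not tracked here, so this sketch says nothing about the 7- and 9-MG critical paths quoted above; it only checks the logic.

```python
import random
from typing import List, Tuple


def maj(a: int, b: int, c: int) -> int:
    return (a & b) | (a & c) | (b & c)


def subword(a: int, b: int, lo: int, width: int) -> Tuple[int, int]:
    """T1 (width 2) or T2 (width 3) applied to bits lo+width-1..lo of a and b."""
    def bit(v: int, i: int) -> int:
        return (v >> i) & 1
    top = lo + width - 1
    x = maj(bit(a, top), 1 - bit(b, top), bit(a, top - 1))
    y = maj(bit(a, top), 1 - bit(b, top), 1 - bit(b, top - 1))
    if width == 2:
        return x & y, x | y
    a0, nb0 = bit(a, lo), 1 - bit(b, lo)
    return maj(x, y, a0 & nb0), maj(x, y, a0 | nb0)


def cb_chain(a: int, b: int, lo: int, widths: List[int]) -> Tuple[int, int]:
    """CB chain over the bits starting at position lo (widths listed LSB first)."""
    big, small = subword(a, b, lo, widths[0])
    lo += widths[0]
    for w in widths[1:]:
        hb, hs = subword(a, b, lo, w)
        big, small = maj(hb, hs, big), maj(hb, hs, small)
        lo += w
    return big, small


def tb_compare(a: int, b: int, lsb_widths: List[int], msb_widths: List[int]) -> Tuple[int, int, int]:
    h = sum(lsb_widths)                              # A = A_MSB A_LSB, split at bit h
    lo_big, lo_small = cb_chain(a, b, 0, lsb_widths)  # A_LSB vs B_LSB
    hi_big, hi_small = cb_chain(a, b, h, msb_widths)  # A_MSB vs B_MSB, independent of the LSB part
    big = maj(hi_big, hi_small, lo_big)              # C1, Eq. (10)
    small = maj(hi_big, hi_small, lo_small)          # C1, Eq. (11)
    eq = maj(1 - big, small, 0)                      # C2, Eq. (12)
    return big, small, eq


# Hypothetical 16-bit split: each 8-bit half as a 2-bit subword plus two 3-bit subwords.
HALF = [2, 3, 3]
a, b = random.getrandbits(16), random.getrandbits(16)
print(tb_compare(a, b, HALF, HALF) == (int(a > b), int(a >= b), int(a == b)))  # True
```

Because the two chains do not depend on each other, their delays overlap; only C1 and C2 are added after the longer of the two, which is the source of the shorter paths of the TB designs.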

Fig. 6. QCA implementation of the novel 32-bit comparators: (a) the CB; (b) the TB.

Fig. 7. Simulation results obtained for the novel 16-bit comparators: (a) the CB; (b) the TB.

It can be seen from Table II that, by exploiting the formulations introduced in this paper, the novel comparators achieve lower complexity than their competitors, especially when wider operands are processed. Among the compared architectures, the one described in [26] theoretically has a shorter critical path for all the considered n. However, it should be noted that it is not a full comparator. Moreover, the QCA implementations of the TB architectures adopted in [26] and [27] require overlong wires. As discussed in depth in [1], [2], and [6], robust QCA designs should use at most 15 or 16 cascaded cells per clock zone. As a consequence, overlong wires introduce additional clock phases that, depending on the operand word length, can significantly exceed the number of cascaded MGs reported in Table II, thus compromising the actually achievable speed. As shown in the following, the novel comparators have been implemented in QCA taking this aspect into account. Appropriate layout strategies have been adopted that limit the number of additional clock phases due to overlong wires to one, independently of the operand word length, in both the CB and the TB architectures proposed here.

Fig. 8. Simulation results obtained for the novel 32-bit comparators: (a) the CB; (b) the TB.

TABLE IV POSTIMPLEMENTATION COMPARISON RESULTS

The novel comparators were implemented using the QCADesigner tool [31] with the following rules: the QCA cells are 18 nm wide and 18 nm high; the cells are placed on a grid with a cell center-to-center distance of 20 nm; there is at least one cell spacing between adjacent wires; the quantum-dot diameter is 5 nm; the multilayer wire crossing structure is exploited; and a maximum of 16 and a minimum of 2 cascaded cells per clock zone are assumed. The coherence vector engine was used for simulations, with the options summarized in Table III. The layouts implemented for the 16- and 32-bit versions of the novel comparators are illustrated in Figs. 5 and 6, respectively. It is worth pointing out that the layouts of the CB comparators exploit only forward paths: the intermediate results obtained for the least significant bits of the operands are routed towards the most significant ones (i.e., from right to left). Conversely, in order to limit the number of overlong wires, the layouts of the TB comparators also use backward paths. From Figs. 5(b) and 6(b), it can be seen that forward paths are routed for the least significant bits, whereas backward paths are exploited for the most significant bits.
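Under the layout rules just listed, the latency cost of an overlong wire can be estimated with a deliberately simplified model: a straight wire of N cascaded cells must be split into at least ⌈N/16⌉ consecutive clock zones, and every zone beyond the first adds one clock phase. The Python sketch below encodes only that assumption; it ignores corners, crossings, and fan-out, and its function names are hypothetical.

```python
import math

MAX_CELLS_PER_ZONE = 16   # upper bound per clock zone assumed for the novel layouts
MIN_CELLS_PER_ZONE = 2    # lower bound per clock zone assumed for the novel layouts
CELL_PITCH_NM = 20        # cell center-to-center distance on the layout grid


def zones_for_wire(cells: int) -> int:
    """Minimum number of consecutive clock zones for a straight wire of `cells` cells."""
    if cells < MIN_CELLS_PER_ZONE:
        raise ValueError("a clock zone must contain at least 2 cascaded cells")
    return math.ceil(cells / MAX_CELLS_PER_ZONE)


def extra_phases(cells: int) -> int:
    """Clock phases added with respect to a wire that fits in a single zone."""
    return zones_for_wire(cells) - 1


def cells_for_length(length_nm: float) -> int:
    """Approximate cell count of a straight wire of the given length on the 20 nm grid."""
    return max(MIN_CELLS_PER_ZONE, round(length_nm / CELL_PITCH_NM))


for length_nm in (300, 320, 340, 1000):
    n = cells_for_length(length_nm)
    print(f"{length_nm} nm -> {n} cells -> {extra_phases(n)} extra phase(s)")
```

Under this model a straight 1 μm wire (50 cells) already costs three extra phases, which illustrates why limiting overlong wires matters as much as reducing the number of cascaded MGs.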

Some simulation results obtained for the 16- and 32-bit comparators are depicted in Figs. 7 and 8, which also show the time at which the first valid results are output. It is worth noting that the CB circuits require a latency of 10 and 15 clock phases to produce the first valid 16- and 32-bit outputs, respectively. For example, the 15 clock phases of the 32-bit comparator are spent as follows: 1 clock phase is needed for input acquisition; the signals $A_{\mathrm{big}}B_{(1:0)}$ and $B_{\mathrm{big}}A_{(1:0)}$ related to the least significant bit positions are then computed within the 2 subsequent clock phases; one additional phase is due to the overlong wire highlighted in Fig. 6(a); ten phases are required for computing the signals $A_{\mathrm{big}}B_{(j:0)}$ and $B_{\mathrm{big}}A_{(j:0)}$, with j = 4, 7, …, 28, 31; finally, one more phase is needed to compute the output $A_{\mathrm{eq}}B_{(31:0)}$. The 16- and 32-bit TB comparators have a latency of 8 and 11 clock phases, respectively, and in this case too there is at most one overlong wire [highlighted in Fig. 6(b)] in the layout. Post-layout characteristics, such as cell count, overall size, delay, and number of clock phases, are reported in Table IV for all the examined comparators. The results demonstrate that, for the same operand word length, the novel CB implementation is over 60% faster, occupies up to 81% less area, and uses over 56% fewer cells than [27], which is the only parallel comparator in the literature whose QCA implementations have also been characterized for n > 2. The results in Table IV also show that speed can be further increased by using the novel TB structure, which also exhibits limited area and cell requirements. As expected, for the TB comparators presented in [26] and [27], the overlong wires in the layout introduce several clock phases beyond the number of cascaded MGs within the computational path.
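The 15-phase budget just listed for the 32-bit CB comparator, and the clock-cycle and area-delay figures reported in the Conclusion, follow from simple arithmetic, assuming the four-phase zone clocking described in Section II (four clock phases per clock cycle). The Python sketch below tallies the breakdown and converts it; the areas are the values reported in this paper, and the dictionary keys are descriptive labels only.

```python
from fractions import Fraction

PHASES_PER_CYCLE = 4  # four clock signals, each shifted by 90 degrees

# Latency breakdown of the 32-bit CB comparator, as listed in the text.
cb32 = {
    "input acquisition": 1,
    "AbigB(1:0), BbigA(1:0) via T1": 2,
    "overlong wire": 1,
    "T4 chain, j = 4, 7, ..., 31": len(range(4, 32, 3)),  # ten 3-bit subwords
    "AeqB(31:0) via C2": 1,
}


def to_cycles(phases: int) -> Fraction:
    return Fraction(phases, PHASES_PER_CYCLE)


cb_phases = sum(cb32.values())
tb_phases = 11  # reported latency of the 32-bit TB comparator

cb_cycles, tb_cycles = to_cycles(cb_phases), to_cycles(tb_phases)
cb_adp = Fraction("2.66") * cb_cycles  # um^2 x clock cycles
tb_adp = Fraction("2.9") * tb_cycles   # TB area is reported as about 2.9 um^2

print(cb_phases, float(cb_cycles), float(cb_adp))  # 15 3.75 9.975
print(tb_phases, float(tb_cycles), float(tb_adp))  # 11 2.75 7.975
```

Exact fractions are used so that the comparison of the area-delay products against the bounds of 10 and 8 is not affected by floating-point rounding.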
As an example of the overhead caused by overlong wires in the TB designs of [26] and [27], for n = 8 the comparator in [27] has 9 cascaded MGs within its critical path, but the operation is actually completed in 18 clock phases. In contrast, as a further merit of the new comparators, the number of clock phases exceeding the number of MGs within the worst computational path is at most two, independently of n. Only one of these additional clock phases is due to overlong wires, whereas the other one is needed for input acquisition.

V. CONCLUSION

A new methodology for designing binary comparators in QCA has been presented. It is based on innovative formulations that achieve higher speed and smaller overall size than existing competitors. The novel comparators split the received n-bit inputs into an appropriate number of 2- and 3-bit subwords that are processed in parallel through 2- and 3-bit comparators designed by applying the theorems proven here. Thanks to the logic and layout strategies adopted, a 32-bit CB full comparator designed as described in this paper exhibits a delay of only 3 + (3/4) clock cycles, occupies an active area of 2.66 μm², and achieves an area-delay product lower than 10. When the alternative TB architecture presented here is exploited, the delay is further reduced to 2 + (3/4) clock cycles, with an active area of about 2.9 μm² and an area-delay product lower than 8.

APPENDIX

This Appendix provides proofs of the theorems and corollaries stated in Section III.

Fig. 9. Possible cases occurring when two 2-bit subwords are compared.

Proof of Theorem 1: Equations (2) and (3) can be easily proven by referring to the truth table reported in Fig. 9, which shows that the terms $M(a_{k-1}, \overline{b_{k-1}}, a_{k-2})$ and $M(a_{k-1}, \overline{b_{k-1}}, \overline{b_{k-2}})$ are both equal to 1 only when $A_{(k-1:k-2)} > B_{(k-1:k-2)}$, whereas they are both equal to 0 only when $A_{(k-1:k-2)} < B_{(k-1:k-2)}$. Hence their AND in (2) is equal to 1 only if $A_{(k-1:k-2)} > B_{(k-1:k-2)}$, and their OR in (3) is equal to 0 only if $A_{(k-1:k-2)} < B_{(k-1:k-2)}$.

Proof of Theorem 2: Fig. 9 shows that, if $A_{(k-1:k-2)} > B_{(k-1:k-2)}$, then $M(a_{k-1}, \overline{b_{k-1}}, a_{k-2}) = M(a_{k-1}, \overline{b_{k-1}}, \overline{b_{k-2}}) = 1$, and the majority functions used in (4) and (5) provide $A_{\mathrm{big}}B_{(k-1:k-3)} = 1$ and $B_{\mathrm{big}}A_{(k-1:k-3)} = 1$, independently of $a_{k-3} \cdot \overline{b_{k-3}}$ and $a_{k-3} + \overline{b_{k-3}}$, respectively. Analogously, when $A_{(k-1:k-2)} < B_{(k-1:k-2)}$, both $M(a_{k-1}, \overline{b_{k-1}}, a_{k-2})$ and $M(a_{k-1}, \overline{b_{k-1}}, \overline{b_{k-2}})$ are equal to 0; therefore $A_{\mathrm{big}}B_{(k-1:k-3)} = 0$
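and $B_{\mathrm{big}}A_{(k-1:k-3)} = 0$, independently of $a_{k-3} \cdot \overline{b_{k-3}}$ and $a_{k-3} + \overline{b_{k-3}}$. Conversely, when $A_{(k-1:k-2)} = B_{(k-1:k-2)}$, $M(a_{k-1}, \overline{b_{k-1}}, a_{k-2})$ and $M(a_{k-1}, \overline{b_{k-1}}, \overline{b_{k-2}})$ assume opposite values; therefore, the results provided by (4) and (5) depend on $a_{k-3} \cdot \overline{b_{k-3}}$ and $a_{k-3} + \overline{b_{k-3}}$. In this case, $A_{(k-1:k-3)} > B_{(k-1:k-3)}$ holds only if $a_{k-3} = 1$ and $b_{k-3} = 0$ (i.e., $\overline{b_{k-3}} = 1$). When this happens, $A_{\mathrm{big}}B_{(k-1:k-3)} = 1$; otherwise, it is equal to 0. It can also be observed that, $A_{(k-1:k-2)}$ being equal to $B_{(k-1:k-2)}$, the condition $A_{(k-1:k-3)} < B_{(k-1:k-3)}$ is satisfied only if $a_{k-3} = 0$ and $b_{k-3} = 1$ (i.e., $\overline{b_{k-3}} = 0$). When this happens, $B_{\mathrm{big}}A_{(k-1:k-3)} = 0$; otherwise, it is equal to 1.

Proof of Theorem 3: Recursively applying (6) to the term $A_{\mathrm{big}}B_{(n-3:0)}$, we obtain

$$A_{\mathrm{big}}B_{(n-3:0)} = M\big(M(a_{n-3}, \overline{b_{n-3}}, a_{n-4}),\ M(a_{n-3}, \overline{b_{n-3}}, \overline{b_{n-4}}),\ A_{\mathrm{big}}B_{(n-5:0)}\big)$$

and in turn

$$A_{\mathrm{big}}B_{(n-5:0)} = M\big(M(a_{n-5}, \overline{b_{n-5}}, a_{n-6}),\ M(a_{n-5}, \overline{b_{n-5}}, \overline{b_{n-6}}),\ A_{\mathrm{big}}B_{(n-7:0)}\big)$$

and so on, until (6) is rewritten as follows, with h equal to 1 or 2:

$$
\begin{aligned}
A_{\mathrm{big}}B_{(n-1:0)} &= M\big(M(a_{n-1}, \overline{b_{n-1}}, a_{n-2}),\ M(a_{n-1}, \overline{b_{n-1}}, \overline{b_{n-2}}),\ A_{\mathrm{big}}B_{(n-3:0)}\big)\\
&= M\Big(M(a_{n-1}, \overline{b_{n-1}}, a_{n-2}),\ M(a_{n-1}, \overline{b_{n-1}}, \overline{b_{n-2}}),\\
&\qquad M\big(M(a_{n-3}, \overline{b_{n-3}}, a_{n-4}),\ M(a_{n-3}, \overline{b_{n-3}}, \overline{b_{n-4}}),\ A_{\mathrm{big}}B_{(n-5:0)}\big)\Big)\\
&= M\Big(M(a_{n-1}, \overline{b_{n-1}}, a_{n-2}),\ M(a_{n-1}, \overline{b_{n-1}}, \overline{b_{n-2}}),\\
&\qquad M\big(M(a_{n-3}, \overline{b_{n-3}}, a_{n-4}),\ M(a_{n-3}, \overline{b_{n-3}}, \overline{b_{n-4}}),\ \dots,\\
&\qquad\quad M\big(M(a_{h+2}, \overline{b_{h+2}}, a_{h+1}),\ M(a_{h+2}, \overline{b_{h+2}}, \overline{b_{h+1}}),\ A_{\mathrm{big}}B_{(h:0)}\big)\big)\Big).
\end{aligned}
$$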


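Analogously, (7) can be rewritten as follows:

$$
\begin{aligned}
B_{\mathrm{big}}A_{(n-1:0)} &= M\big(M(a_{n-1}, \overline{b_{n-1}}, a_{n-2}),\ M(a_{n-1}, \overline{b_{n-1}}, \overline{b_{n-2}}),\ B_{\mathrm{big}}A_{(n-3:0)}\big)\\
&= M\Big(M(a_{n-1}, \overline{b_{n-1}}, a_{n-2}),\ M(a_{n-1}, \overline{b_{n-1}}, \overline{b_{n-2}}),\\
&\qquad M\big(M(a_{n-3}, \overline{b_{n-3}}, a_{n-4}),\ M(a_{n-3}, \overline{b_{n-3}}, \overline{b_{n-4}}),\ B_{\mathrm{big}}A_{(n-5:0)}\big)\Big)\\
&= M\Big(M(a_{n-1}, \overline{b_{n-1}}, a_{n-2}),\ M(a_{n-1}, \overline{b_{n-1}}, \overline{b_{n-2}}),\\
&\qquad M\big(M(a_{n-3}, \overline{b_{n-3}}, a_{n-4}),\ M(a_{n-3}, \overline{b_{n-3}}, \overline{b_{n-4}}),\ \dots,\\
&\qquad\quad M\big(M(a_{h+2}, \overline{b_{h+2}}, a_{h+1}),\ M(a_{h+2}, \overline{b_{h+2}}, \overline{b_{h+1}}),\ B_{\mathrm{big}}A_{(h:0)}\big)\big)\Big).
\end{aligned}
$$

By applying Theorem 1 to the 2-bit subwords $A_{(h+2:h+1)} = a_{h+2}a_{h+1}$ and $B_{(h+2:h+1)} = b_{h+2}b_{h+1}$, and Theorem 1 or Theorem 2 (for h = 1 or h = 2, respectively) to the subwords $A_{(h:0)}$ and $B_{(h:0)}$, we obtain that

$$A_{\mathrm{big}}B_{(h+2:0)} = M\big(M(a_{h+2}, \overline{b_{h+2}}, a_{h+1}),\ M(a_{h+2}, \overline{b_{h+2}}, \overline{b_{h+1}}),\ A_{\mathrm{big}}B_{(h:0)}\big)$$

is equal to 1 if and only if $A_{(h+2:0)} > B_{(h+2:0)}$, and in turn that

$$A_{\mathrm{big}}B_{(h+4:0)} = M\big(M(a_{h+4}, \overline{b_{h+4}}, a_{h+3}),\ M(a_{h+4}, \overline{b_{h+4}}, \overline{b_{h+3}}),\ A_{\mathrm{big}}B_{(h+2:0)}\big)$$

is equal to 1 only when $A_{(h+4:0)} > B_{(h+4:0)}$, and so on, until

$$A_{\mathrm{big}}B_{(n-1:0)} = M\big(M(a_{n-1}, \overline{b_{n-1}}, a_{n-2}),\ M(a_{n-1}, \overline{b_{n-1}}, \overline{b_{n-2}}),\ A_{\mathrm{big}}B_{(n-3:0)}\big)$$

is obtained, which is equal to 1 if and only if $A_{(n-1:0)} > B_{(n-1:0)}$. Similarly,

$$B_{\mathrm{big}}A_{(h+2:0)} = M\big(M(a_{h+2}, \overline{b_{h+2}}, a_{h+1}),\ M(a_{h+2}, \overline{b_{h+2}}, \overline{b_{h+1}}),\ B_{\mathrm{big}}A_{(h:0)}\big)$$

is equal to 0 if and only if $A_{(h+2:0)} < B_{(h+2:0)}$, and in turn

$$B_{\mathrm{big}}A_{(h+4:0)} = M\big(M(a_{h+4}, \overline{b_{h+4}}, a_{h+3}),\ M(a_{h+4}, \overline{b_{h+4}}, \overline{b_{h+3}}),\ B_{\mathrm{big}}A_{(h+2:0)}\big)$$

is equal to 0 only when $A_{(h+4:0)} < B_{(h+4:0)}$, and so on, until

$$B_{\mathrm{big}}A_{(n-1:0)} = M\big(M(a_{n-1}, \overline{b_{n-1}}, a_{n-2}),\ M(a_{n-1}, \overline{b_{n-1}}, \overline{b_{n-2}}),\ B_{\mathrm{big}}A_{(n-3:0)}\big)$$

is obtained, which is equal to 0 if and only if $A_{(n-1:0)} < B_{(n-1:0)}$.

Proof of Theorem 4: Recursively applying (8) to the term $A_{\mathrm{big}}B_{(n-4:0)}$, we obtain

$$A_{\mathrm{big}}B_{(n-4:0)} = M\big(A_{\mathrm{big}}B_{(n-4:n-6)},\ B_{\mathrm{big}}A_{(n-4:n-6)},\ A_{\mathrm{big}}B_{(n-7:0)}\big)$$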

and in turn

$$A_{\mathrm{big}}B_{(n-7:0)} = M\big(A_{\mathrm{big}}B_{(n-7:n-9)},\ B_{\mathrm{big}}A_{(n-7:n-9)},\ A_{\mathrm{big}}B_{(n-10:0)}\big)$$



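and so on, until (8) is rewritten as follows, with h equal to 1 or 2:

$$
\begin{aligned}
A_{\mathrm{big}}B_{(n-1:0)} &= M\big(A_{\mathrm{big}}B_{(n-1:n-3)},\ B_{\mathrm{big}}A_{(n-1:n-3)},\ A_{\mathrm{big}}B_{(n-4:0)}\big)\\
&= M\Big(A_{\mathrm{big}}B_{(n-1:n-3)},\ B_{\mathrm{big}}A_{(n-1:n-3)},\ M\big(A_{\mathrm{big}}B_{(n-4:n-6)},\ B_{\mathrm{big}}A_{(n-4:n-6)},\ A_{\mathrm{big}}B_{(n-7:0)}\big)\Big)\\
&= M\Big(A_{\mathrm{big}}B_{(n-1:n-3)},\ B_{\mathrm{big}}A_{(n-1:n-3)},\ M\big(A_{\mathrm{big}}B_{(n-4:n-6)},\ B_{\mathrm{big}}A_{(n-4:n-6)},\ \dots,\ M\big(A_{\mathrm{big}}B_{(h+3:h+1)},\ B_{\mathrm{big}}A_{(h+3:h+1)},\ A_{\mathrm{big}}B_{(h:0)}\big)\big)\Big).
\end{aligned}
$$

In a similar way, (9) can be rewritten as follows:

$$
\begin{aligned}
B_{\mathrm{big}}A_{(n-1:0)} &= M\big(A_{\mathrm{big}}B_{(n-1:n-3)},\ B_{\mathrm{big}}A_{(n-1:n-3)},\ B_{\mathrm{big}}A_{(n-4:0)}\big)\\
&= M\Big(A_{\mathrm{big}}B_{(n-1:n-3)},\ B_{\mathrm{big}}A_{(n-1:n-3)},\ M\big(A_{\mathrm{big}}B_{(n-4:n-6)},\ B_{\mathrm{big}}A_{(n-4:n-6)},\ B_{\mathrm{big}}A_{(n-7:0)}\big)\Big)\\
&= M\Big(A_{\mathrm{big}}B_{(n-1:n-3)},\ B_{\mathrm{big}}A_{(n-1:n-3)},\ M\big(A_{\mathrm{big}}B_{(n-4:n-6)},\ B_{\mathrm{big}}A_{(n-4:n-6)},\ \dots,\ M\big(A_{\mathrm{big}}B_{(h+3:h+1)},\ B_{\mathrm{big}}A_{(h+3:h+1)},\ B_{\mathrm{big}}A_{(h:0)}\big)\big)\Big).
\end{aligned}
$$

By applying Theorem 2 to the 3-bit subwords $A_{(h+3:h+1)}$ and $B_{(h+3:h+1)}$, and Theorem 1 or Theorem 2 (for h = 1 or h = 2, respectively) to the subwords $A_{(h:0)}$ and $B_{(h:0)}$, we can derive that

$$A_{\mathrm{big}}B_{(h+3:0)} = M\big(A_{\mathrm{big}}B_{(h+3:h+1)},\ B_{\mathrm{big}}A_{(h+3:h+1)},\ A_{\mathrm{big}}B_{(h:0)}\big)$$

is equal to 1 if and only if $A_{(h+3:0)} > B_{(h+3:0)}$, and that

$$B_{\mathrm{big}}A_{(h+3:0)} = M\big(A_{\mathrm{big}}B_{(h+3:h+1)},\ B_{\mathrm{big}}A_{(h+3:h+1)},\ B_{\mathrm{big}}A_{(h:0)}\big)$$

is equal to 0 if and only if $A_{(h+3:0)} < B_{(h+3:0)}$. In turn, it also follows that

$$A_{\mathrm{big}}B_{(h+6:0)} = M\big(A_{\mathrm{big}}B_{(h+6:h+4)},\ B_{\mathrm{big}}A_{(h+6:h+4)},\ A_{\mathrm{big}}B_{(h+3:0)}\big)$$

is equal to 1 if and only if $A_{(h+6:0)} > B_{(h+6:0)}$, and that

$$B_{\mathrm{big}}A_{(h+6:0)} = M\big(A_{\mathrm{big}}B_{(h+6:h+4)},\ B_{\mathrm{big}}A_{(h+6:h+4)},\ B_{\mathrm{big}}A_{(h+3:0)}\big)$$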


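is equal to 0 if and only if $A_{(h+6:0)} < B_{(h+6:0)}$, and so on, until

$$A_{\mathrm{big}}B_{(n-1:0)} = M\big(A_{\mathrm{big}}B_{(n-1:n-3)},\ B_{\mathrm{big}}A_{(n-1:n-3)},\ A_{\mathrm{big}}B_{(n-4:0)}\big)$$

and

$$B_{\mathrm{big}}A_{(n-1:0)} = M\big(A_{\mathrm{big}}B_{(n-1:n-3)},\ B_{\mathrm{big}}A_{(n-1:n-3)},\ B_{\mathrm{big}}A_{(n-4:0)}\big)$$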



are obtained. The former is equal to 1 if and only if $A_{(n-1:0)} > B_{(n-1:0)}$, whereas the latter is equal to 0 if and only if $A_{(n-1:0)} < B_{(n-1:0)}$.

Proof of Corollary 1: From Theorems 3 and 4, we know that $\mathit{AbigB}_{(n-1:h)} = \mathit{BbigA}_{(n-1:h)} = 1$ if and only if $A_{(n-1:h)} > B_{(n-1:h)}$. In this case, $\mathit{AbigB}_{(n-1:0)} = \mathit{BbigA}_{(n-1:0)} = 1$ independently of $\mathit{AbigB}_{(h-1:0)}$ and $\mathit{BbigA}_{(h-1:0)}$. Analogously, $\mathit{AbigB}_{(n-1:h)} = \mathit{BbigA}_{(n-1:h)} = 0$ if and only if $A_{(n-1:h)} < B_{(n-1:h)}$. This case leads to $\mathit{AbigB}_{(n-1:0)} = \mathit{BbigA}_{(n-1:0)} = 0$ independently of $\mathit{AbigB}_{(h-1:0)}$ and $\mathit{BbigA}_{(h-1:0)}$. Theorems 3 and 4 also demonstrate that, when $A_{(n-1:h)}$ is equal to $B_{(n-1:h)}$, $\mathit{AbigB}_{(n-1:h)} = 0$ and $\mathit{BbigA}_{(n-1:h)} = 1$. In this case, the results provided by (10) and (11) depend on $\mathit{AbigB}_{(h-1:0)}$ and $\mathit{BbigA}_{(h-1:0)}$. The former is equal to 1 only when $A_{(h-1:0)} > B_{(h-1:0)}$, which implies $A_{(n-1:0)} > B_{(n-1:0)}$. The latter is equal to 0 only if $A_{(h-1:0)} < B_{(h-1:0)}$, that is, if $A_{(n-1:0)} < B_{(n-1:0)}$. Consequently, when the first case occurs, (10) and (11) provide $\mathit{AbigB}_{(n-1:0)} = \mathit{BbigA}_{(n-1:0)} = 1$. Conversely, if the second case takes place, both $\mathit{AbigB}_{(n-1:0)}$ and $\mathit{BbigA}_{(n-1:0)}$ are equal to 0. Finally, if $A_{(h-1:0)} = B_{(h-1:0)}$, then $\mathit{AbigB}_{(h-1:0)} = 0$ and $\mathit{BbigA}_{(h-1:0)} = 1$, so (10) and (11) output 0 and 1, respectively, as expected for $A_{(n-1:0)} = B_{(n-1:0)}$.

Proof of Corollary 2: From Theorems 1, 2, 3, and 4 and Corollary 1, we know that $\mathit{AbigB}_{(n-1:0)} = 1$ if and only if $A_{(n-1:0)} > B_{(n-1:0)}$, and that $\mathit{BbigA}_{(n-1:0)} = 0$ if and only if $A_{(n-1:0)} < B_{(n-1:0)}$. This implies that, when $A_{(n-1:0)} = B_{(n-1:0)}$, only one combination is possible: $\mathit{AbigB}_{(n-1:0)} = 0$ and $\mathit{BbigA}_{(n-1:0)} = 1$. Consequently, the majority function used in (12) provides $\mathit{AeqB}_{(n-1:0)} = 1$ in this case and outputs 0 otherwise.

REFERENCES
[1] C. S. Lent, P. D. Tougaw, W. Porod, and G. H. Bernstein, “Quantum cellular automata,” Nanotechnology, vol. 4, no. 1, pp. 49–57, 1993.
[2] M. T. Niemier and P. M. Kogge, “Problems in designing with QCAs: Layout = timing,” Int. J. Circuit Theory Appl., vol. 29, pp. 49–62, 2001.
[3] G. H. Bernstein, A. Imre, V. Metlushko, A. Orlov, L. Zhou, L. Ji, G. Csaba, and W. Porod, “Magnetic QCA systems,” Microelectron. J., vol. 36, pp. 619–624, 2005.
[4] J. Huang and F. Lombardi, Design and Test of Digital Circuits by Quantum-Dot Cellular Automata. Norwood, MA, USA: Artech House, 2007.
[5] W. Liu, L. Lu, M. O’Neill, and E. E. Swartzlander Jr., “Design rules for quantum-dot cellular automata,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Rio de Janeiro, Brazil, May 2011, pp. 2361–2364.
[6] K. Kim, K. Wu, and R. Karri, “Towards designing robust QCA architectures in the presence of sneak noise paths,” in Proc. IEEE Design, Automation Test Eur. Conf. Exhib. (DATE), Munich, Germany, Mar. 2005, pp. 1214–1219.
[7] K. Navi, M. H. Moaiyeri, R. F. Mirzaee, O. Hashemipour, and B. M. Nezhad, “Two new low-power full adders based on majority-not gates,” Microelectron. J., vol. 40, pp. 126–130, 2009.
[8] H. Cho and E. E. Swartzlander Jr., “Adder design and analyses for quantum-dot cellular automata,” IEEE Trans. Nanotechnol., vol. 6, no. 3, pp. 374–383, May 2007.


[9] H. Cho and E. E. Swartzlander Jr., “Adder and multiplier design in quantum-dot cellular automata,” IEEE Trans. Comput., vol. 58, no. 6, pp. 721–727, Apr. 2009.
[10] V. Pudi and K. Sridharan, “Efficient design of a hybrid adder in quantum-dot cellular automata,” IEEE Trans. VLSI Syst., vol. 19, no. 9, pp. 1535–1548, Jul. 2011.
[11] M. Gladshtein, “Quantum-dot cellular automata serial decimal adder,” IEEE Trans. Nanotechnol., vol. 10, no. 6, pp. 1377–1382, Nov. 2011.
[12] V. Pudi and K. Sridharan, “Low complexity design of ripple carry and Brent-Kung adders in QCA,” IEEE Trans. Nanotechnol., vol. 11, no. 1, pp. 105–119, Jan. 2012.
[13] S. Perri and P. Corsonello, “New methodology for the design of efficient binary addition circuits in QCA,” IEEE Trans. Nanotechnol., vol. 11, no. 6, pp. 1192–1200, Nov. 2012.
[14] V. Pudi and K. Sridharan, “New decomposition theorems on majority logic for low-delay adder designs in quantum dot cellular automata,” IEEE Trans. Circuits Syst. II: Exp. Briefs, vol. 59, no. 10, pp. 678–682, Oct. 2012.
[15] H. Cho and E. E. Swartzlander Jr., “Serial parallel multiplier design in quantum-dot cellular automata,” in Proc. IEEE Symp. Comput. Arithmetic, 2007, pp. 7–15.
[16] S. W. Kim and E. E. Swartzlander Jr., “Parallel multipliers for quantum-dot cellular automata,” in Proc. IEEE Nanotechnol. Mater. Devices Conf., 2009, pp. 68–72.
[17] S. W. Kim and E. E. Swartzlander Jr., “Multipliers with coplanar crossings for quantum-dot cellular automata,” in Proc. IEEE Int. Conf. Nanotechnol., 2010, pp. 953–957.
[18] W. Liu, L. Lu, M. O’Neill, and E. E. Swartzlander Jr., “Montgomery modular multiplier design in quantum-dot cellular automata using cut-set retiming,” in Proc. IEEE Int. Conf. Nanotechnol., 2010, pp. 205–210.
[19] L. Lu, W. Liu, M. O’Neill, and E. E. Swartzlander Jr., “QCA systolic matrix multiplier,” in Proc. IEEE Annu. Symp. VLSI, 2010, pp. 149–154.
[20] J. D. Wood and D. Tougaw, “Matrix multiplication using quantum-dot cellular automata to implement conventional microelectronics,” IEEE Trans. Nanotechnol., vol. 10, no. 5, pp. 1036–1042, Sep. 2011.
[21] L. Lu, W. Liu, M. O’Neill, and E. E. Swartzlander Jr., “QCA systolic array design,” IEEE Trans. Comput., vol. 62, no. 3, pp. 548–560, Mar. 2013.
[22] J. R. Janulis, P. D. Tougaw, S. C. Henderson, and E. W. Johnson, “Serial bit-stream analysis using quantum-dot cellular automata,” IEEE Trans. Nanotechnol., vol. 3, no. 1, pp. 158–164, Mar. 2004.
[23] K. Qiu and Y. Xia, “Quantum-dots cellular automata comparator,” in Proc. Int. Conf. ASIC, 2007, pp. 1297–1300.
[24] B. Lampreht, L. Stepancic, I. Vizec, and B. Zankar, “Quantum-dot cellular automata serial comparator,” in Proc. EUROMICRO Conf. Digital Syst. Design Architectures, Methods Tools, 2008, pp. 447–452.
[25] Y. Xia and K. Qiu, “Design and application of universal logic gate based on quantum-dot cellular automata,” in Proc. IEEE Int. Conf. Commun. Technol., 2008, pp. 335–338.
[26] M. D. Wagh, Y. Sun, and V. Annampedu, “Implementation of comparison function using quantum-dot cellular automata,” in Proc. Nanotechnol. Conf. Trade Show, 2008, pp. 76–79.
[27] Y. Xia and K. Qiu, “Comparator design based on quantum-dot cellular automata,” J. Electron. Inf. Technol., vol. 31, no. 6, pp. 1517–1520, 2009.
[28] S. Ying, T. Pei, and L. Xiao, “Efficient design of QCA optimal universal logic gate ULG.2 and its application,” in Proc. Int. Conf. Comput. Appl. Syst. Modeling (ICCASM), 2010, pp. 392–396.
[29] S. Perri and P. Corsonello, “Fast low-cost implementation of single-clock-cycle binary comparator,” IEEE Trans. Circuits Syst. II: Exp. Briefs, vol. 55, no. 12, pp. 1239–1243, Dec. 2008.
[30] P. Chuang, D. Li, and M. Sachdev, “A low-power high-performance single-cycle tree-based 64-bit binary comparator,” IEEE Trans. Circuits Syst. II: Exp. Briefs, vol. 59, no. 2, pp. 108–112, Feb. 2012.
[31] K. Walus and G. A. Jullien, “Design tools for an emerging SoC technology: Quantum-dot cellular automata,” Proc. IEEE, vol. 94, no. 6, pp. 1225–1244, Jun. 2006.
[32] S. Bhanja, M. Ottavi, S. Pontarelli, and F. Lombardi, “QCA circuits for robust coplanar crossing,” J. Electron. Testing: Theory Appl., vol. 23, no. 2, pp. 193–210, 2007.
[33] A. Gin, P. D. Tougaw, and S. Williams, “An alternative geometry for quantum dot cellular automata,” J. Appl. Phys., vol. 85, no. 12, pp. 8281–8286, Jun. 1999.
[34] A. Chaudhary, D. Z. Chen, X. S. Hu, and M. T. Niemier, “Fabricatable interconnect and molecular QCA circuits,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 11, pp. 1977–1991, Nov. 2007.


[35] M. Janez, P. Pecar, and M. Mraz, “Layout design of manufacturable quantum-dot cellular automata,” Microelectron. J., vol. 43, pp. 501–513, 2012.
[36] R. Zhang, K. Walus, W. Wang, and G. A. Jullien, “A method of majority logic reduction for quantum cellular automata,” IEEE Trans. Nanotechnol., vol. 3, no. 4, pp. 443–450, Dec. 2004.
[37] K. Kong, Y. Shang, and R. Lu, “An optimized majority logic synthesis methodology for quantum-dot cellular automata,” IEEE Trans. Nanotechnol., vol. 9, no. 2, pp. 170–183, Mar. 2010.
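The majority-chain recursions proved in the Appendix can be checked exhaustively in software for small word lengths. The Python sketch below is a hypothetical verification aid, not the authors' design flow, and it rests on several assumptions:

- The inner gates of the 2-bit chain are taken as $M(a_{i+1}, \overline{b}_{i+1}, a_i)$ and $M(a_{i+1}, \overline{b}_{i+1}, \overline{b}_i)$.
- An odd lowest bit uses $M(a_0, \overline{b}_0, s)$, with seed $s = 0$ for AbigB and $s = 1$ for BbigA.
- The equality flag is formed as $M(\overline{\mathit{AbigB}}, \mathit{BbigA}, 0)$. This is one majority form consistent with the proof of Corollary 2, not necessarily the exact form of (12).

All function names are illustrative. The sketch tests a 2-bit chain and a chain of 3-bit chunks combined as $M(\mathit{AbigB}_{\text{chunk}}, \mathit{BbigA}_{\text{chunk}}, \cdot)$. For each, it checks three properties:

- AbigB is 1 exactly when A > B.
- BbigA is 0 exactly when A < B.
- The equality flag is 1 exactly when A = B.

```python
from itertools import product


def maj(x: int, y: int, z: int) -> int:
    """Three-input majority gate M(x, y, z) on single bits."""
    return (x & y) | (x & z) | (y & z)


def bit(value: int, i: int) -> int:
    """Bit i of value (bit 0 is the LSB)."""
    return (value >> i) & 1


def pair_chain(a: int, b: int, n: int, seed: int) -> int:
    """2-bit majority chain in the style of Theorem 3 (assumed gate form).

    seed = 0 gives AbigB (1 iff A > B); seed = 1 gives BbigA (0 iff A < B).
    Bits are consumed in pairs from the LSB upward; for odd n the lowest bit
    uses the assumed 1-bit form M(a0, ~b0, seed).
    """
    acc, i = seed, 0
    if n % 2:
        acc = maj(bit(a, 0), 1 - bit(b, 0), seed)
        i = 1
    while i < n:
        hi, lo = i + 1, i
        x = maj(bit(a, hi), 1 - bit(b, hi), bit(a, lo))      # M(a_hi, ~b_hi, a_lo)
        y = maj(bit(a, hi), 1 - bit(b, hi), 1 - bit(b, lo))  # M(a_hi, ~b_hi, ~b_lo)
        acc = maj(x, y, acc)  # group decides; acc passes only if the group is equal
        i += 2
    return acc


def chunked(a: int, b: int, n: int, seed: int, width: int = 3) -> int:
    """Chunk-level chain in the style of Theorem 4 and Corollary 1.

    Each chunk contributes M(AbigB_chunk, BbigA_chunk, acc); full-width
    chunks sit at the top and any narrower remainder at the bottom.
    """
    first = n % width or width
    starts = [0] + list(range(first, n, width))
    acc = seed
    for lo in starts:
        w = first if lo == 0 else width
        mask = (1 << w) - 1
        ca, cb = (a >> lo) & mask, (b >> lo) & mask
        acc = maj(pair_chain(ca, cb, w, 0), pair_chain(ca, cb, w, 1), acc)
    return acc


def a_eq_b(abig: int, bbig: int) -> int:
    """Assumed equality form M(~AbigB, BbigA, 0): 1 only for (0, 1)."""
    return maj(1 - abig, bbig, 0)


def check(max_n: int = 6) -> bool:
    """Exhaustively compare both chains against integer comparison."""
    for n in range(1, max_n + 1):
        for a, b in product(range(1 << n), repeat=2):
            for chain in (pair_chain, chunked):
                abig, bbig = chain(a, b, n, 0), chain(a, b, n, 1)
                expected = (int(a > b), int(a >= b), int(a == b))
                if (abig, bbig, a_eq_b(abig, bbig)) != expected:
                    return False
    return True


if __name__ == "__main__":
    print(check())  # prints True
```

Running the script prints `True` for word lengths 1 to 6. The check only compares output flags with integer comparison. It therefore exercises the Boolean behavior claimed in Theorems 3 and 4 and Corollaries 1 and 2 under the assumptions above, and says nothing about QCA layout, clocking zones, or delay.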

Stefania Perri (M’00–SM’09) received the M.S. degree in computer science engineering from the University of Calabria, Rende, Italy, in 1996, and the Ph.D. degree in electronics engineering from the University Mediterranea of Reggio Calabria, Reggio Calabria, Italy, in 2000. In 1996, she joined the Department of Electronics, Computer Science and Systems, University of Calabria, as a Research Associate; she is currently an Associate Professor of Electronics there. In 2002, she was appointed as an Assistant Professor of Electronics with the Department of Electronics, Computer Science and Systems, University of Calabria. In the summer of 2004, she was a Visiting Researcher in the Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY, USA, where in 2005 she was appointed as an Adjunct Assistant Professor for four years. Her current research interests include QCA-based circuits, high-performance arithmetic circuits, low-power design, VLSI circuits for image processing and multimedia, reconfigurable computing, and VLSI design. She has coauthored more than 100 technical papers and holds two patents in these fields.

Pasquale Corsonello (M’97) was born in Cosenza, Italy, on May 4, 1964. He received the M.S. degree in electronics engineering from the University of Naples “Federico II,” Naples, Italy, in 1988. He joined the Institute of Research on Parallel Computers, National Council of Research of Italy, Naples, Italy, where he was involved in the design and modeling of electronic transducers for high-precision measurement, receiving a two-year postgraduate grant. In 1992, he joined the Department of Electronics, Computer Science and Systems, University of Calabria, Rende, Italy, as a Research Associate. In 1997, he was appointed as an Assistant Professor of Electronics with the Department of Electronics Engineering and Applied Mathematics, University of Reggio Calabria, Reggio Calabria, Italy, where he was also the Director of the Microelectronics Laboratory. In 2001, he was appointed as an Associate Professor of Electronics and as the Chair of the Ph.D. Program in Electronics Engineering at the University of Reggio Calabria. In the summer of 2004, he was a Visiting Researcher with the Department of Electrical and Computer Engineering of the University of Rochester, Rochester, NY, USA. In 2005, he was appointed as an Adjunct Associate Professor with the same department. He is currently an Associate Professor of Electronics in the Department of Electronics, Computer Science and Systems, University of Calabria. He is an Associate Editor of the Journal of Low Power Electronics and Applications. His current research interests include high-performance arithmetic circuits, low-power design, VLSI architecture for image processing, QCA-based circuits, and reconfigurable systems. He has authored or coauthored over 120 technical papers and holds two patents in these fields. Dr. Corsonello is a member of the technical committees of several VLSI conferences and a peer reviewer for several VLSI journals. He is an Associate Editor of the IEEE TRANSACTIONS ON VLSI SYSTEMS.


Giuseppe Cocorullo (M’93) was born in Italy in 1952. He received the Dr.Eng. degree in electronic engineering from the University of Naples, Naples, Italy, in 1978. From 1983 to 1992, he was with the National Research Council (IRECE Institute-Naples), where he was in charge of the microelectronics lab. After winning a national competition in 1992, he was appointed as an Associate Professor of Electronics at the University of Calabria, Rende, Italy, where he became a Full Professor of Electronics in 2001 and is currently in charge of the Nanoelectronics and Microsystems Lab. His current research interests include silicon optoelectronics, solar cells, bioelectronics, and embedded systems.