Slew-Aware Clock Tree Design for Reliable Subthreshold Circuits
Jeremy R. Tolbert, Xin Zhao, Sung Kyu Lim, and Saibal Mukhopadhyay
Georgia Institute of Technology, Atlanta, GA, 30332
[email protected], {xinzhao, limsk, saibal}@ece.gatech.edu
designer to optimize secondary parameters such as robustness and performance. As a result of these efforts, prior work has optimized the energy-delay product while performing computations with minimal error [1].
ABSTRACT
In this paper, we analyze the effect of clock slew in subthreshold circuits. Specifically, we address the issue that variations in clock slew at the register control can cause serious timing violations. We show that clock slew variations can cause latch timing metrics such as setup, hold, and clock-to-q times to deviate by 90% from the design goals. Based on these observations, we recognize the importance of clock slew control in subthreshold circuits. We propose a systematic approach to designing the clock tree for subthreshold circuits that reduces the clock slew variations while minimizing the power dissipation in the tree. We show that tighter nodal capacitance control is necessary to control the slew in a subthreshold clock tree, which can increase the power dissipation. Recognizing that the wire resistances have a negligible effect in subthreshold circuits, we show that proper wire sizing is necessary to reduce the clock power. Finally, we propose a dynamic nodal capacitance control technique that allows larger slew at the earlier nets of the tree while controlling it more aggressively near the sink nodes. The combined approach, including the wire sizing and dynamic nodal capacitance control, can achieve better slew control (and better timing control) at lower power in subthreshold circuits.
In addressing the design of an optimal energy-delay subthreshold system, the clock network plays a significant role. Delivering robust clock signals to hundreds or even thousands of flip-flops requires the clock tree to be optimally designed to handle issues of delay, skew, and jitter. In subthreshold, the signal slew can also affect system performance. Additionally, because the clock network is the workhorse of sequential logic, with the highest switching activity, it can contribute up to 40% of the total dynamic power [2]. The same trend is expected when an above-threshold system is scaled to subthreshold voltages. Thus, designing a low-power yet robust clock tree is a critical challenge in implementing large-scale subthreshold systems. This challenge is increasingly difficult because subthreshold designs are always constrained by the requirement of robustness. The inherent variations in process parameters and other sources of random circuit noise severely degrade the robustness of subthreshold circuits. Hence, the deterministic variation sources (i.e., clock slew) that can be modeled at clock tree design time need to be accurately considered, and methods need to be developed to reduce them.
Categories and Subject Descriptors
The purpose of this work is to analyze the impact of clock slew in subthreshold design and to propose a technique for low-power, slew-controlled clock tree design. We examine the inherent slew variations in a clock tree. We show that the slew variations can contribute as much as 90% variation in parameters such as setup time, hold time, and clock-to-q. As the slew increases, these timing metrics deviate further from the best case, increasing the risk of violating cycle and hold time margins. Designing a subthreshold clock tree that causes less slew variation at the registers reduces the timing violations and creates a more robust clock design. Experiments in a predictive 65nm technology show that, by employing the proposed techniques, it is possible to reduce the clock-slew-variation-induced errors in latch timing while maintaining or reducing the power dissipation of the clock tree.
B.8.2 [Performance and Reliability]: General
General Terms Performance, Design, Reliability
Keywords Subthreshold, Slew, Clocks
1. INTRODUCTION
Transistors operating in the subthreshold region constitute an attractive technology for ultra-low power mobile applications such as micro-sensors and biomedical devices. When the primary goal is to save energy, subthreshold logic can allow for maximum power reduction by simply reducing the supply voltage. Even though low power is the main focus, it is still natural for the
2. THE IMPORTANCE OF SUBTHRESHOLD SLEW CONTROL
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISLPED’09, August 19-21, 2009, San Francisco, CA, USA. Copyright 2009 ACM 978-1-60558-684-7/09/08…$10.00.
As the supply voltage of digital circuits is scaled below the device threshold, the characteristics of the transistor change. An immediate observation is that the current in the subthreshold regime has an exponential dependence on gate voltage, threshold voltage, and additional parameters that are functions of the process. This is in contrast to above-threshold design, where the dependence has been noted to be linear or quadratic [3]. As a result
[Figure 1 plot residue. Left: histogram of slew variations (ns) at the sink nodes, annotated "Sub VTH σ/µ = 0.2309". Right: contour plot with load capacitance (fF) on one axis.]
In the subthreshold clock tree the CV is 23 percent, showing there is a wide distribution of slew. Well controlled CVs are in the range of 10-15 percent.
The distribution of slew shown in Figure 1 (left) corresponds to the slew variation across different sink nodes of the tree. This slew variation is caused by variation in the output slew of the inverter stages of the clock tree. The right plot in Figure 1 shows the effect of input slew and load capacitance on the output slew of an inverter. The contour lines show that the output slew of an inverter depends strongly on the input slew and load capacitance [10]. This is important for clock tree paths, because they are composed of numerous inverters, and the slew and capacitance vary from node to node. For a smaller output slew, a smaller input slew and load capacitance are required. The recovery of the slew (i.e., an output slew smaller than the input slew) through an inverter stage is an important consideration for the clock tree path. If not controlled, the slew can become progressively higher (as much as 3X) as signals progress down a clock path. The slew recovery is a strong function of the load capacitance. Figure 1 shows that if the load capacitance is reasonable, the inverter can recover slew even for a large input slew. Hence, controlling the capacitance driven by each node in the clock tree is very important for controlling slew propagation through the tree and reducing the slew variation at the sink nodes.
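The slew recovery argument above can be illustrated with a small sketch. The linear stage model and both coefficients (`gain`, `ns_per_ff`) are hypothetical placeholders chosen only to show the qualitative behavior; they are not the inverter model of [10] and are not fitted to the 65nm PTM data.

```python
def propagate_slew(input_slew_ns, load_caps_ff, gain=0.5, ns_per_ff=0.02):
    """Return the slew (ns) after each inverter stage of a clock path.

    Toy model, assumed for illustration only:
        out_slew = gain * in_slew + ns_per_ff * load_cap
    A stage "recovers" slew when out_slew < in_slew.
    """
    slews = []
    slew = input_slew_ns
    for cap in load_caps_ff:
        slew = gain * slew + ns_per_ff * cap
        slews.append(slew)
    return slews


# Same 6 ns input slew driven through four stages with light and heavy loads.
light = propagate_slew(6.0, [100.0] * 4)  # small nodal capacitance
heavy = propagate_slew(6.0, [300.0] * 4)  # large nodal capacitance

print("light load:", [round(s, 3) for s in light])
print("heavy load:", [round(s, 3) for s in heavy])
```

Under these assumed coefficients the lightly loaded path reduces the slew stage by stage, while the heavily loaded path lets it grow, which is the qualitative reason for bounding the capacitance driven by each node.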
Figure 1: (Left) Slew variations at the sink nodes of a subthreshold clock tree designed using above-threshold concepts. (Right) Normalized output slew contours and their dependence on normalized input slew and total load capacitance for an inverter.
of these effects, small variations in the subthreshold regime have been known to follow this exponential dependence, and care has been taken to control these effects [4-8]. In above-threshold circuits, proper modeling has given insight into physical device parameters and how they interact with each other. It has been shown that the output slew of a logic gate has a strong dependence on the device dimensions as well as the load capacitance. The input slew of a logic gate can cause the output delay to change in the range of 50-100% [9]. More recently, this effect has been identified as a concern for flip-flop design in the subthreshold region [8].
2.2 Slew Impact on Timing Violations
The purpose of this section is to demonstrate the effects of clock slew and how severely it can impact important timing metrics. When designed with above-threshold methods, subthreshold clock trees exhibit significant slew variations at the sink nodes (i.e., the nodes directly connected to the latches) that increase the probability of timing violations. As an example, this experimental section focuses on a clock network designed using an above-threshold, zero-skew clock tree design algorithm, with a slew control method that limits the maximum capacitance driven by each internal and external clock node (hereafter referred to as the nodal capacitance). The power supply of this design was then scaled to below the device threshold. In this clock tree, inverting buffers were used to reduce the number of devices and thus save power. The small-scale tree was designed using a 65nm Predictive Technology Model (PTM) with VTP = -378mV and VTN = 429mV. The power supply was 300mV and the clock tree has 267 sink nodes, each driving multiple flip-flops [11].
The primary concern that can arise from clock slew variations is the impact on sequential circuit timing. Robust latches and flip-flops are required to minimize functionality errors, but their design alone cannot account for the impact of clock slew. In this section we characterize the clock slew impact on timing violations and show its direct influence on a commonly used flip-flop. The operating frequency of a pipelined processor is limited by the logic that can be performed between latches and is defined as the inverse of the cycle time, T:
$$T + \delta \geq t_{c-q} + t_{logic} + t_{su} \qquad (1)$$
where tc-q denotes the maximum propagation delay of the register (clock-to-q), tlogic is the maximum delay of the combinational logic, tsu is the setup time of the registers, and δ is the clock skew. When the cycle time is not met, functionality errors arise because the result of a computation has not been completed before the next cycle begins.
2.1 Clock Tree Impact on Slew Variations
Variations can come from sources of random noise, imperfect device processing, and design choices. While it is possible to corral variations, it is impossible to eliminate them in general. Slew variations are no different, yet little attention has been given to this problem. Figure 1 (left) depicts the slew distribution at the sink nodes for the subthreshold clock tree described above. The slew at the clock sink nodes is important because these nodes control the pipeline registers, which account for a significant share of the latches and flip-flops in a chip design. The coefficient of variation, CV, is a normalized measure of the dispersion of a probability distribution and is defined as the ratio of standard deviation (σ) to mean (µ):
$$CV = \frac{\sigma}{\mu} \qquad (2)$$
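As a minimal sketch of how (2) might be evaluated on measured sink slews, the snippet below computes the CV of a list of values. The sample slews are hypothetical and are not the data behind Figure 1; the use of the population standard deviation is also an assumption, since the text does not say which estimator was used.

```python
import statistics


def coefficient_of_variation(samples):
    """Return CV = sigma / mu for a sequence of positive samples.

    Assumption: sigma is the population standard deviation (pstdev).
    """
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return statistics.pstdev(samples) / mean


# Hypothetical sink-node slews in ns (illustrative only).
sink_slews_ns = [4.0, 5.0, 6.0, 7.0, 8.0]
cv = coefficient_of_variation(sink_slews_ns)

print(f"CV = {cv:.1%}")
```

A result can then be compared against the 10-15 percent range that the text describes as well controlled.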
While maximum delay is important, the hold time of the destination register must be shorter than the minimum propagation delay through the network. It is defined as,
$$\delta + t_{hold} < t_{c-q,cd} + t_{logic,cd} \qquad (3)$$
where the subscript cd refers to the contamination delay, or minimum delay. Essentially, if the hold time of a logic path is violated, the data presented at the destination register is overwritten by new data before it has had an opportunity to be written to the latch. Initially we ignore skew to understand how slew independently impacts the timing metrics. To understand the effect of clock slew, we have analyzed how these variations impact the most important timing metrics: setup
[Figure 2 plot: t_SETUP, t_HOLD, t_CQ, and t_CQ CD, each normalized to its best-case value, versus clock slew (ns).]
$$T + \delta \geq \Delta t_{c-q} + t_{logic} + \Delta t_{su} \qquad (4)$$
The delta denotes that there is a variation in the setup and clock-to-q times that ultimately contributes to the cycle time. A smaller slew variation will translate into smaller setup and clock-to-q variations. By reducing the clock slew variation, these timing variations shrink and the cycle time is better controlled.
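The effect of slew-induced variation on the cycle time bound in (4) can be sketched numerically. The best-case delays below are hypothetical; only the 25% clock-to-q and 90% setup degradations are taken from the text. Treating each varied term as the best-case value scaled by (1 + variation) is an assumed reading of the Δ notation.

```python
def min_cycle_time(t_cq, t_logic, t_su, skew=0.0,
                   cq_variation=0.0, su_variation=0.0):
    """Smallest T satisfying T + skew >= t_cq' + t_logic + t_su'.

    t_cq' and t_su' are the best-case values degraded by the given
    fractional variations (assumed interpretation of the delta terms).
    """
    varied_cq = t_cq * (1.0 + cq_variation)
    varied_su = t_su * (1.0 + su_variation)
    return varied_cq + t_logic + varied_su - skew


# Hypothetical best-case delays in ns.
nominal = min_cycle_time(t_cq=10.0, t_logic=100.0, t_su=5.0)
slewed = min_cycle_time(t_cq=10.0, t_logic=100.0, t_su=5.0,
                        cq_variation=0.25, su_variation=0.90)

print(f"nominal T >= {nominal:.1f} ns, with slew variation T >= {slewed:.1f} ns")
```

With these assumed numbers the required cycle time grows by the sum of the two degraded terms, so a path designed to the nominal bound would miss timing at the sinks with the worst slew.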
2.2.2 Slew Variations affecting hold margins
The hold time curve of Figure 2 has a negative slope, which at first suggests that increasing the clock slew reduces the hold time. However, all hold times here are negative; thus a smaller ratio of hold time to best-case hold time means that the hold time at a larger slew is less negative (i.e., the hold time is increased). The smaller negative number means the hold requirement has been pulled to a time closer to the clock edge. The hold time can vary by as much as 50% of the best case. Much like the worst-case clock-to-q, the contamination-delay clock-to-q (tCQ CD) is directly dependent on the slew distribution. For an increasing clock slew, the variation of the contamination delay can be as much as 25%. The combination of hold and contamination times comes into play when the hold time margin is considered.
In our flip-flop configuration, the hold time and contamination delay increased with increasing clock slew distribution. Using this information and (3), we can rearrange the equation to get a better understanding of the hold requirement. To ensure that a race condition does not occur, the following expression should hold true, even with variations:
Figure 2: Timing variations for a clock tree designed in above threshold lowered to subthreshold. Timing metrics are normalized by the best case value.
$$\Delta t_{hold} - \Delta t_{c-q,cd} < -\delta + t_{logic,cd} \qquad (5)$$
time, hold time and clock-to-q. The clock slew values obtained from Figure 1 were used to simulate slew variations on a flip-flop (Figure 2). The authors realize that understanding the slew impact is a complex problem [8], so to simplify, the data slew is assumed to be zero to isolate the effects of clock slew.
Since the hold time and contamination delay both increase with slew, it is possible to violate the requirement of (5). If the hold time plus its variation is greater than the contamination delay plus its variation, a violation occurs and there is a logic failure. Additionally, if the clock tree has been designed with a recognized clock skew, any change in the left-hand expression could cause a race condition. If the slew variation is minimal, this problem is reduced or even eliminated. This point further justifies the need for clock slew control in subthreshold design.
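A hold check based on (3) and (5) can be sketched as a margin computation. All delay and skew values below are hypothetical and chosen only to show how a path that passes at the best-case slew can fail once the hold time becomes less negative; the 50% and 25% changes mirror the ranges quoted in the text.

```python
def hold_margin(t_hold, t_cq_cd, t_logic_cd, skew=0.0):
    """Return the hold margin implied by: skew + t_hold < t_cq_cd + t_logic_cd.

    A positive margin means the constraint is met; zero or negative
    means a race condition is possible.
    """
    return (t_cq_cd + t_logic_cd) - (skew + t_hold)


# Hypothetical values in ns. Hold times are negative, as in the text.
best_case = hold_margin(t_hold=-2.0, t_cq_cd=2.0, t_logic_cd=0.0, skew=3.7)

# Larger slew: hold time is less negative (50% of best case) and the
# contamination clock-to-q is 25% larger.
large_slew = hold_margin(t_hold=-1.0, t_cq_cd=2.5, t_logic_cd=0.0, skew=3.7)

print(f"best-case margin: {best_case:+.2f} ns")
print(f"large-slew margin: {large_slew:+.2f} ns")
```

In this assumed case the increase in hold time outweighs the increase in contamination delay, so the margin changes sign even though the skew and logic delay are unchanged.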
Figure 2 depicts a commonly used flip-flop configuration that can be used in subthreshold operation [12]. The plot above in Figure 2 shows how the slew distribution directly affects each of the four timing metrics described above. They have been normalized since the focus of this study is on the general behavior. In the next subsections we will explain how the overall timing margins are changed by a degrading slew.
2.3 Clock Tree Impact on Timing Violations
In subsections 2.1 and 2.2 we have discussed that (1) a clock tree design will cause slew variations and (2) slew variations have the potential to cause severe timing violations in subthreshold. In this section, we make the connection that the design of the clock tree contributes to these timing variations. Figures 3 (a) and (b) show the distribution of the timing metrics that impact cycle time. Figure 3 (a) reiterates that the setup time is worsened by increasing clock slew, with a variation of up to 90%. Recall that the clock-to-q trends were similar (Figure 3 (b)): as the clock slew increases, the cycle time increases as well. Figures 3 (c) and (d) summarize the trends in the hold time requirements. In subthreshold, the hold time is increasing (becoming less negative) while the contamination delay is also increasing. The subthreshold flip-flop is therefore more sensitive to clock slew, because a larger slew increases the chance of a hold time violation.
2.2.1 Slew Variations affecting cycle time margins
When slew variations are applied to the clock, the setup time is directly proportional to the slew. In severe cases the setup time can be 90% worse than the best case achieved. This variation in setup reduces the time available to compute logic, and in some cases causes logic errors by violating the setup requirements. When the worst-case clock-to-q (tCQ) is considered, a similar trend exists in relation to the clock slew. For the given slew distributions, the clock-to-q is up to about 25% worse than the best-case clock-to-q. Overall, the clock slew distribution implies a distribution of cycle times as well, based on each logic path. Referring now to (4), designers should consider the setup and clock-to-q variations when designing for a target frequency.
[Figure 3 panels: (a) Setup Time Variations, (b) C-Q Variations, (c) Hold Time Variations, (d) C-Q CD Variations.]
Figure 5: Distributed RC line used to model wires.
possible to better control the slew in subthreshold, reducing the average rise slew from 6ns to 3ns. At the same time that we reduce the slew, we increase the power by nearly 20%. This behavior occurs because, as we reduce the CMAX each node can drive, the total interconnect capacitance remains constant and we need to insert more buffers to compensate for the reduced CMAX. Figure 4 (b) shows this trend of power and number of inverters as a function of CMAX. Additionally, the total wire length of the design changes only subtly, because a small wire takes the place of removed inverters. Note that going from a CMAX of 100fF to 300fF, the number of inverters decreases by 60%, but the power decreases by only nearly 20%. A large portion of this unaffected power comes from the large interconnect capacitance. In summary, to design a subthreshold clock tree, we require a smaller CMAX than in above threshold.
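The 60% and 20% figures above allow a rough back-of-the-envelope split of the clock power, sketched below. It assumes that power is proportional to switched capacitance, that the interconnect contribution is unchanged between the two CMAX settings, and that the inverter contribution scales with the inverter count. These are simplifying assumptions for illustration, not a model stated in the text.

```python
def inverter_power_share(inverter_reduction, power_reduction):
    """Estimate the fraction x of clock power due to inverters.

    Solves (1 - x) + (1 - inverter_reduction) * x = 1 - power_reduction,
    where (1 - x) is the (assumed constant) interconnect share.
    """
    if not 0.0 < inverter_reduction <= 1.0:
        raise ValueError("inverter_reduction must be in (0, 1]")
    return power_reduction / inverter_reduction


# CMAX 100fF -> 300fF: 60% fewer inverters, roughly 20% less power.
share = inverter_power_share(inverter_reduction=0.60, power_reduction=0.20)

print(f"inverter share of power at CMAX = 100fF: about {share:.0%}")
print(f"interconnect share: about {1.0 - share:.0%}")
```

Under these assumptions roughly one third of the power at the smaller CMAX is attributable to the inverters and the rest to interconnect, which is consistent with the observation that most of the power is unaffected by removing buffers.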
Figure 3: Timing metric distributions for a clock tree designed in above threshold whose power supply is then scaled to subthreshold voltages. The variations are normalized by the best-case scenario. Hold time distributions are for negative hold times.
3. TECHNIQUES FOR LOW POWER SLEW CONTROL IN SUBTHRESHOLD
The findings from Section 2 show that it is necessary to control and reduce the variations associated with clock slew for robust subthreshold operation. This section investigates techniques for designing a subthreshold clock tree that provides a smaller slew variation.
3.1 Conventional Methods that limit CMAX
[Figure 4 plot residue: legend "1X Wire" and "4X Wire"; curves labeled Wire Length, Power, and # INVs.]
3.2 Minimum Wire Width in Subthreshold
The wire interconnect contributes a large portion of the power in a clock network. Reducing the interconnect capacitance without sacrificing delay in the clock path can help to address this power component. Figure 5 shows the distributed RC line that is often used to model wires. The line of length L is broken into smaller networks, each of length ∆L, with multiple resistive and capacitive paths. Based on [12], the design rule of thumb is that RC wire delays should only be considered when the line being modeled has reached a critical length, Lcrit:
$$L_{crit} \gg \sqrt{\frac{t_d}{0.38\,r\,c}} \qquad (6)$$
where td is the gate delay, r the resistance per unit length, and c the capacitance per unit length. Based on the above equation, the Lcrit for a 1ns gate in 65nm (using the PTM model) should be much greater than 4mm. This value also assumes a minimum wire width to reduce the interconnect capacitance. A recent work showed a 65nm subthreshold chip with a 2.29mm x 1.86mm area [17]. Using this as a benchmark, we could expect a worst-case wire length of just over 4mm. Since we expect the length of the wire to be much less than the critical length, it may be possible to neglect the propagation delay across the wire. When L