Harmonious Internal Clock Synchronization

Horst F. Wedde, Wolfgang Freund
Informatik III, University of Dortmund, 44221 Dortmund / Germany
ABSTRACT Internal clock synchronization has been investigated and employed for quite a number of years, under the requirement of good upper bounds for the deviation, or accuracy, between a predefined master node M and slaves S1, S2, etc. It has been tacitly assumed that the accuracy range between slaves S1 and S2 is the same as between master and slave. In extensive experimental studies of unicast-based protocols, however, we found a significantly worse accuracy between the slave nodes. Apart from the asymmetry as such, which results in different communication conditions in a logically peer communication structure (as in our MELODY system), this is particularly undesirable, if not harmful, if one wants to dynamically shift the master role between nodes for the purpose of achieving fault tolerance. Nearly all previous internal clock synchronization algorithms with high accuracy are based on unicast messages (including our own efforts). In this paper we present two new broadcast protocols, the Real-Time Duplex Protocol (RTDP) and the Real-Time Burst Protocol (RTBP). With these we achieve a homogeneous clock deviation structure, in that the accuracy between master and slave is the same as between two slaves. Also, although the broadcast delay is slightly higher than the unicast transmission time, the accuracy is even better than for master/slave synchronization under unicast methods. We explain how both algorithms serve real-time as well as fault tolerance needs, and we report on our experimental findings.
Keywords: Distributed Real-Time Systems, Internal Clock Synchronization, Fault Tolerant Distributed Systems, Broadcast Protocols.

0. Introduction and Motivation

The Hardware Environment. Recently we have been installing a LAN of PCs, each equipped with a Pentium III processor (500 MHz), 128 Mbyte of main memory, and 13 Gbyte of disk space. They are interconnected through three parallel Ethernet channels (100 Mb/s capacity each). The local operating system is Suse Linux 6.2. This will be our new target environment for the MELODY project, in which a distributed operating system for supporting real-time and safety-critical applications has been incrementally developed [WeL97]. While porting the MELODY system from a Token Ring of IBM RS/6000 machines running AIX to the Linux environment (with our own function for direct access to the hardware clock), we revisited the clock synchronization problem, which had been solved in the IBM network through internal synchronization [WLS99].

Problem and Purpose of Work. We had constructed and evaluated a novel and inexpensive synchronization method called Real-Time Network Protocol (RTNP). It relies on unicast messages and roundtrip-based measurements. Between a predefined master and the remaining (slave) nodes, a maximum clock deviation of 30 µs could be guaranteed [WLS99]. At that time this was a good result compared to related work, as explained in [WLS99].

The discussion in related projects, whether the synchronization is based on unicast or multicast messages, focuses solely on deviations between the master (server) and the slave (client) clocks. It has been tacitly assumed that the deviation between such slave clocks (which are synchronized against their common master) is the same as between the master and any slave. On closer inspection this assumption is not obviously reasonable: if one slave runs ahead of the master and another lags behind (each within the guaranteed bounds), then the two slave clocks may in the worst case differ by the sum of the absolute master-slave deviation values. While tuning our RTNP protocol to the new target environment, we found evidence for this conjecture. This started the present research. We were strongly motivated by the need for fair peer-to-peer cooperation support in our MELODY system, which calls for homogeneous, or harmonious, clock accuracy relationships.

At the same time we were hopeful of finding suitable broadcast-based protocols that would produce a harmonious behavior. This stemmed from the fact that a unicast protocol sends two messages sequentially for synchronizing two slave clocks (thus allowing for the mentioned divergence), while a broadcast-based protocol sends one broadcast message for the same purpose. In the sequel we will define and investigate two novel broadcast protocols, the Real-Time Duplex Protocol (RTDP) and the Real-Time Burst Protocol (RTBP), which have been particularly designed for typical LAN environments in which narrow global bounds for one-way messages cannot be guaranteed. We will demonstrate how they achieve a harmonious synchronization.

Previous and Related Work. The work reported here is a direct extension of our previous research described in [WLS99]. Clock synchronization has been extensively studied in the past. Earlier work [Mil91] was not really concerned with real-time constraints. Other papers like [CrF94, SSH97, Sch97] are conceptual contributions, with no experimental evaluation. [SSH97] presents valuable work on the design of hardware devices that are in particular meant for supporting broadcast-based synchronization in MANs, but no algorithm is mentioned. In [GeS94] an interesting protocol is presented which is based on broadcast messages on a CAN bus. It takes advantage of the fact that in such environments good narrow bounds for broadcast one-way messages can be determined. Unfortunately, this does not hold for typical LAN architectures like Ethernet, Token Ring etc. (which, on the other hand, are considerably faster). Our protocols, in turn, are meant for just those popular environments. To our knowledge, the problem of harmonious synchronization has so far not been investigated, or even mentioned, despite its relevance for peer-to-peer cooperation support.

Organization of the Paper. In section 1 we will briefly describe and discuss our approach to measuring clock deviations and drifts, including the achieved accuracies of the measurements. Section 2 contains a thorough evaluation of the deviation of the slave clocks against a common master in our protocol RTNP. The disharmonious behavior of RTNP (as a representative of unicast-based protocols in general) is demonstrated in some detail.
In section 3 our novel broadcast protocols, RTDP and RTBP, are defined. They will be compared to RTNP with respect to master-slave deviation. Despite the slightly higher overhead of broadcast roundtrip messages (compared to unicast messages) they prove clearly superior to RTNP. The nearly perfect harmony of the clock deviations will also be reported. In a concluding section the results are put into perspective, and future work, in particular in fault tolerance, will be outlined.
1. Measuring Clock Deviations and Drifts
The timer function in Suse Linux 6.2 (the operating system we use in our PC network) has a granularity of only 1 µs, since it may be suspended by several kernel processes. For our investigations, however, we needed to measure delay differences even well below 200 nsec. Therefore we established a timer counter process with direct access to the hardware clock, where every cycle corresponds to 2 nsec. Our customized solution has been inspired by the work of the real-time Linux group in New Mexico [Bar97] and, even more, by the group at the University of Kansas [HSP98]. Due to inaccuracies in the crystal manufacturing process there is a drift between the different PC clocks. As we found out with our equipment, it may be assumed to be linear over a period of a few minutes (up to 12 min). Over a longer time span, however, various factors like temperature change the picture. This has already been discussed constructively in [Sch97].
1.1 Assumption: For this paper, we consider the drift between two clocks as a linear function over time while pursuing experiments through 100 synchronization steps.
For determining the clock drift we used the same method as described in [WLS99]. We start with roundtrip messages between nodes P and Q according to the scheme in fig.1, without placing a time stamp on the return message. For 3 slave nodes S3, S5, S6 (out of 6 in our environment) and a master node M, the resulting communication delays (as measured in M's time) are depicted in fig.2.
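To make the 2 nsec figure concrete: at the 500 MHz clock rate of the Pentium III nodes, one cycle of a free-running cycle counter lasts 1/500 MHz = 2 nsec. The sketch below converts the difference of two counter readings into nanoseconds with integer arithmetic. It assumes a nominal, uncalibrated 500 MHz frequency, and `cycles_to_ns` is a hypothetical helper; how the counter itself is read (on Pentium-class CPUs, for instance, via the time-stamp counter) is not shown.

```python
NOMINAL_HZ = 500_000_000  # nominal 500 MHz Pentium III clock (assumed exact, not calibrated)


def cycles_to_ns(start_cycles: int, end_cycles: int, hz: int = NOMINAL_HZ) -> int:
    """Convert the difference of two cycle-counter readings into whole nanoseconds.

    Integer arithmetic avoids floating-point rounding; any remainder is truncated.
    """
    delta = end_cycles - start_cycles
    if delta < 0:
        raise ValueError("counter readings are out of order")
    return delta * 1_000_000_000 // hz


# At 500 MHz one cycle is 2 nsec, so 100 cycles span 200 nsec.
print(cycles_to_ns(1_000, 1_100))  # 200
```

Keeping the conversion in integers matters here because the quantities of interest (well below 200 nsec) are only a few dozen cycles.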
Figure 1: Message Roundtrip Delay with Remote Timestamp (message m1 from node P to node Q and return message m2, passing through user space, kernel space and the physical network layer; the standard communication delay and the standard arrival window are marked on P's time axis)

We considered unicast and broadcast messages from M to the slaves and back, e.g. UC-Node-S5 (unicast roundtrip between M and S5) and BC-Node-S5 (broadcast between M and S5). The experiments comprised 10,000 measurements. We also studied the patterns resulting from 1000 and 100,000 measurements without noticing any difference. Our observations and conclusions are summarized as follows:
1.2 Properties:
a) There is a minimum delay that is independent of the number of measurements. There is also a delay value that is assumed in a peak number of measurements. This peak value is termed the standard communication delay, since more than 95% of all messages (again independent of the number of measurements) experience a delay that differs by at most 2 µsec from the peak value.
b) The difference between the minimal and the standard communication delay is again at most 2 µsec, both in unicast and in broadcast mode.
c) The minimal and standard delay values between the master M and any slave are a little higher for broadcast messages than for unicast messages (by less than 2 µsec in either case).
d) (Symmetry) The properties above hold as well if the roundtrip messages originate at the slave sites.
e) If a time stamp is placed on the return message (see fig.1), an additional overhead of at most 200 nsec is encountered (in the local time of the sending node).

As a symmetry property beyond 1.2d, we will assume that the minimal delay values for the messages m1, m2 involved in the roundtrip are equal. (This makes sense at least for homogeneous environments.)

Figure 2: Communication Delays (10000 measurements) (number of arrivals over communication delay [usec], roughly 100 to 115 usec, for UC-Node-S3, UC-Node-S5, UC-Node-S6, BC-Node-S3, BC-Node-S5, BC-Node-S6)

We want to measure clock deviations in the subsequent sections through roundtrip messaging. In the framework of fig.1, we denote the message from P to Q by m1 and the message from Q to P by m2. The standard communication delay between P and Q will be denoted by T'. Also, the size of a window around the standard arrival time of m2 (the standard arrival window in fig.1) is set to ± x µsec. Then the following holds:

1.3 Proposition (Monitoring the Clock Deviation): Let m2 arrive within the standard arrival window, and let a time stamp be placed on m2 (fig.1). If 1/2 T' is added to the time stamp when m2 arrives at P, then the difference between this value and the current local time at P is the value of the deviation between the clocks at P and Q, with a worst-case accuracy y where $-1 \le y \le x + 1$ [µsec].

For the proof we make use of the assumption above that the delays for m1 and m2 have equal lower bounds. It should also be noted that the clock drift during the roundtrip can be neglected (well below 2 nsec), and this holds as well for the overhead of putting the time stamp on m2 (see 1.2e). For more details see [WLS99].
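Proposition 1.3 amounts to a few lines of arithmetic. The sketch below is our illustration, not the authors' code; the function names and the representation of times as floats in µsec are assumptions. It estimates T' as the centre of the fullest 1 µsec bin of the roundtrip histogram (the peak of 1.2a), accepts a reply only if its roundtrip lies within ±x of T', and returns (time stamp + T'/2) − local arrival time as the deviation of Q's clock relative to P's. The docstring of `error_bounds` traces how a worst-case range for y follows from the equal minimal one-way delays and property 1.2b.

```python
from collections import Counter
from typing import List, Optional, Tuple


def standard_delay(roundtrips_us: List[float], bin_us: float = 1.0) -> float:
    """Standard communication delay T': centre of the fullest histogram bin (1.2a)."""
    bins = Counter(int(d // bin_us) for d in roundtrips_us)
    # Highest count wins; on a tie the smaller delay bin is preferred.
    peak_bin, _ = max(bins.items(), key=lambda kv: (kv[1], -kv[0]))
    return (peak_bin + 0.5) * bin_us


def estimate_deviation(send_p: float, stamp_q: float, arrive_p: float,
                       t_std: float, x: float) -> Optional[float]:
    """Deviation (Q's clock minus P's clock) per Proposition 1.3, in usec.

    send_p, arrive_p: P's local times for sending m1 and receiving m2
    stamp_q:          Q's time stamp on m2
    t_std:            standard roundtrip delay T'
    x:                half-width of the standard arrival window
    Returns None if m2 misses the window; such a reply is ignored.
    """
    if abs((arrive_p - send_p) - t_std) > x:
        return None
    return stamp_q + t_std / 2 - arrive_p


def error_bounds(t_std: float, t_min: float, x: float) -> Tuple[float, float]:
    """Range of y = true deviation - estimate = d2 - T'/2 (d2: one-way delay of m2).

    Equal minimal one-way delays t_min/2 give d2 >= t_min/2 and
    d2 = roundtrip - d1 <= (T' + x) - t_min/2, hence
    -(T' - t_min)/2 <= y <= (T' - t_min)/2 + x.
    With T' - t_min <= 2 usec (property 1.2b) this lies within [-1, x + 1].
    """
    half_spread = (t_std - t_min) / 2
    return -half_spread, half_spread + x
```

For example, with T' = 101 µsec, x = 2 µsec and a true offset of 5 µsec, a reply whose outbound and return legs take 50 and 53 µsec yields an estimate of 2.5 µsec, i.e. y = 2.5 µsec, which is the upper end of the range returned by `error_bounds(101.0, 100.0, 2.0)`.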
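Figure 3: Clock Drift Against Master Clock (clock deviation [usec] of slaves S1 to S6 relative to master M over time [sec], 0 to 120 sec; legend values: S3: -3,881; S6: -9,840; S1: -15,565; S2: -16,092; S5: -19,945; S4: -34,462)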
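Under assumption 1.1 the curves in fig.3 are straight lines, so a slave's drift against the master can be estimated off-line as the slope of a least-squares line through (time, deviation) samples, and a local update can then remove the drift expected to accumulate between two updates. The sketch below shows both steps. It assumes deviations are taken as slave clock minus master clock in µsec, with time in seconds; `fit_drift` and `local_update` are hypothetical helpers, and a pure rate correction applied every 100 msec is one simple reading of the local update procedure, not necessarily the authors' implementation.

```python
from typing import List, Tuple


def fit_drift(samples: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares line through (time_s, deviation_us) samples of one slave against M.

    Returns (offset_us, rate_us_per_s) with deviation ~= offset + rate * time.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_d = sum(d for _, d in samples) / n
    s_tt = sum((t - mean_t) ** 2 for t, _ in samples)
    if s_tt == 0:
        raise ValueError("samples must span more than one point in time")
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in samples)
    rate = s_td / s_tt
    return mean_d - rate * mean_t, rate


def local_update(slave_clock_us: float, rate_us_per_s: float, period_s: float = 0.1) -> float:
    """One local update: remove the drift expected to build up over period_s.

    rate_us_per_s is the slope of the (slave - master) deviation; a positive rate
    means the slave runs fast, so rate * period_s is subtracted from its clock.
    """
    return slave_clock_us - rate_us_per_s * period_s
```

For samples generated from a deviation of −3 − 4t µsec, `fit_drift` returns an offset of about −3 µsec and a rate of about −4 µsec/s; each subsequent local update then advances the slow slave clock by 0.4 µsec.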
Measuring the Clock Drift. Between a master M and slaves S1, S2, S3, S4, S5, S6 (see fig.3), unicast messages were sent from each slave to M. M returned a message with a time stamp, so that the clock drift could be determined according to 1.3. The half-width x of the standard arrival window was set to 2 µsec. 12,000 roundtrip messages were sent and evaluated, one every 10 msec. The result, depicted in fig.3, shows a very clear linear dependency between master and slaves within the experimental duration (2 min). (Recall that at least 95% of all messages fit into the standard arrival window (1.2a).)

To make the following protocol discussions easier to follow, we first clarify our terminology. Fig.4 depicts how the different protocols are organized during their evaluation. There is a sequence of synchronization rounds at each site. Each round consists of a "normal" synchronization phase (according to the protocol under discussion), followed by a monitoring phase during which the clock deviations both between master and slaves and between slaves are measured using unicast messages according to 1.3. Depending on the protocol, a synchronization phase may consist of 1 to s synchronization cycles. Each synchronization cycle consists of a global synchronization step and a local update procedure. The former is specific to each protocol, while the local update automatically adjusts the local clock according to its drift relative to the master (which has been determined off-line (fig.3)). In our experiments a local update was performed every 100 msec. (Please note that a master is not involved in local clock adjustment. Also, the synchronization cycles are so short that the potential clock drift in the meantime is less than 0.9 nsec.)

Figure 4: Scheme of Synchronization/ Evaluation Mechanisms (synchronization round i consists of synchronization phase i, made up of synchronization cycles i,0 to i,s-1, followed by monitoring phase i, in which the master-slave and slave-slave clock deviations are measured)

2. Evaluation of Unicast Deviation (RTNP) Between Slaves
The RTNP protocol has one master. It starts with a broadcast message from the master to all slaves. This prompts the slaves to start an initial synchronization procedure that is essentially the same as the one described in 1.3, except that the local slave time is not compared to (master time + 1/2 T') but replaced by it. The window size is set to ±100 nsec. If m2 fails to fit into the standard arrival window, it is ignored, and the involved slave has to make further attempts until the synchronization has been successful. After the initial synchronization has been completed, each slave updates its clock every 100 msec. The update value is the drift value with respect to the master clock (see section 1, around fig.4). Monitoring is not part of the protocol but of its evaluation.
Since with unicast messaging the synchronization of S5 and S6 takes place sequentially, we expected that after a global synchronization step the deviations of the two master-slave pairs might have opposite signs, one positive and one negative. In such cases the deviation between the corresponding slaves might be as bad as the sum of the absolute master-slave values. (Recall that no monitoring takes place in RTNP.) In order to evaluate the harmonious behavior of RTNP, the monitoring phase was executed after every successful synchronization phase. The results are depicted in fig.5 for slave nodes S5 and S6 and a master M, through 100 synchronization rounds (cf. fig.4). (The results were similar or identical for the other node pairs, and they are representative of a long series of experiments.) The 3 curves describe the deviations between M and S5 (M-S5), M and S6 (M-S6), and S5 and S6 (S6-S5), denoted by a, b and c, respectively. We examined the following relationships between a, b and c for every synchronization round (cf. fig.4):
(1) … ε; |c| …
(2a) … ε; |c| …
(2b) … ε; |c| − max{|a|,|b|} … a · b < 0; |c| …
ε was set to 700, 800 and 900 nsec in order to focus on significant cases. The first result was that condition (1) (the worst-case expectation explained above) held in 27.9%, 27.0% and 26.1% of all rounds, respectively. This is a remarkably high proportion. Furthermore, (2a) and (2b) reflect the disharmony between a, b and c in a more general form. The additional constraint in (2b) ensures that only significant results are taken into account. For the same values of ε, (2a) held in 34.2%, 33.3% and 32.4% of the rounds, and (2b) in 32.4%, 31.5% and 30.6% of the synchronization cases. This gives a clear picture of the strongly disharmonious behavior of the unicast-based protocol.
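The worst-case expectation stated in section 2 follows from the triangle inequality. Taking each deviation X-Y as clock X minus clock Y (a sign convention we assume, since the text does not fix it), c = S6 − S5 = (M − S5) − (M − S6) = a − b, hence |c| ≤ |a| + |b|, with equality exactly when a and b do not share a sign. The sketch below, with hypothetical helper names, computes c and the fraction of rounds in which a · b < 0 and |c| ≥ ε. This simplified criterion is meant only as an illustration, not as a restatement of conditions (1), (2a), (2b).

```python
from typing import Iterable, Tuple


def slave_slave(a_ns: float, b_ns: float) -> float:
    """c = S6 - S5, given a = M - S5 and b = M - S6 (deviations as clock differences)."""
    return a_ns - b_ns


def share_opposite_and_large(rounds: Iterable[Tuple[float, float]], eps_ns: float) -> float:
    """Fraction of rounds with a * b < 0 and |c| >= eps_ns (illustrative criterion only).

    In such rounds |c| = |a| + |b|, i.e. the slave-slave deviation hits its worst case.
    """
    data = list(rounds)
    if not data:
        return 0.0
    hits = sum(1 for a, b in data
               if a * b < 0 and abs(slave_slave(a, b)) >= eps_ns)
    return hits / len(data)
```

With a = 400 nsec and b = −500 nsec, for instance, the two master-slave deviations stay within ±500 nsec while the slaves differ by 900 nsec, which is exactly the kind of round the disharmony analysis is looking for.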
3. Broadcast Protocols
Unlike a sequence of unicast messages, a broadcast message logically reaches each node at the same time, and ideally is even received into memory by each node at the same time. Thus the discrepancy between master-slave and slave-slave clock deviations could hopefully be avoided in suitable broadcast protocols. (In the previous section we identified the divergent synchronization effects of sequential unicast messages as a major (potential) source of the disharmonious picture.) We will now describe two new protocols: the Real-Time Duplex Protocol (RTDP) and the Real-Time Burst Protocol (RTBP).
Figure 5: Clock Deviation Between Slaves with Unicast (clock deviation [nsec] over synchronization rounds 0 to 100 for M-S5, M-S6 and S6-S5)

3.1 RTDP. This protocol is somewhat similar in spirit to RTNP. It is operated through two master nodes, Ma and Mb. Each synchronization round (see fig.4) is started by Mb sending a broadcast message (see fig.6). Immediately after receiving it, Ma also emits a broadcast message. Upon receiving this latter message, every other node (including Mb) checks whether its arrival time fits into a predefined standard arrival window. For Mb this window is set to ±100 nsec around the standard roundtrip delay time T', based on the broadcast information depicted in fig.2. For every other node S, a standard one-way broadcast delay of 1/2 T' is assumed as the center of the standard arrival window. Ma's broadcast message carries its emission time stamp, which is used for adjusting the other clocks as in the case of RTNP.
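The acceptance test of RTDP can be sketched from the point of view of a single node. This is a sketch under assumptions: all times are local clock readings in µsec, T' is the standard roundtrip delay taken from the off-line measurements, and `rtdp_new_clock` is a hypothetical name. In our reading of the text, Mb measures from its own emission of the round-opening broadcast and expects Ma's broadcast after T', while any other node S measures from its receipt of Mb's broadcast and expects Ma's broadcast after T'/2. On a hit, the node's new clock value at that moment is Ma's time stamp plus T'/2; otherwise the message is ignored.

```python
from typing import Optional

WINDOW_US = 0.1  # +-100 nsec standard arrival window


def rtdp_new_clock(is_mb: bool, ref_local_us: float, arrival_ma_local_us: float,
                   stamp_ma_us: float, t_std_us: float,
                   window_us: float = WINDOW_US) -> Optional[float]:
    """Adjusted local time at the arrival of Ma's broadcast, or None if it is ignored.

    ref_local_us: for Mb, its own emission time of the round-opening broadcast;
                  for any other node S, its local receipt time of Mb's broadcast.
    Mb expects Ma's message after the roundtrip T'; S expects it after T'/2.
    """
    expected_gap = t_std_us if is_mb else t_std_us / 2
    gap = arrival_ma_local_us - ref_local_us
    if abs(gap - expected_gap) > window_us:
        return None                      # outside the standard arrival window
    return stamp_ma_us + t_std_us / 2    # Ma's emission stamp plus standard one-way delay
```

Because a missed window simply returns None, a node that misses Ma's message in one cycle keeps running on its local updates until a later cycle of the same phase succeeds.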
Figure 6: Real-Time Duplex Protocol (RTDP) (message sequence between nodes Ma, Mb and S over times t1 to t12; 50.1 usec delay and ±100 nsec arrival window marked)

If Ma's message does not fit into the standard arrival window at node S, it is ignored. In any case the local update procedure is part of the protocol (see the discussion around fig.4). Every synchronization phase consists of 12 synchronization cycles (see fig.4). (Please note that the clock drift during a synchronization cycle is at most 0.9 nsec.) Each synchronization phase is followed by a monitoring phase. There is a synchronization round every 12 sec. In case of a timely receipt of Ma's message, RTDP resembles the unicast procedures, with Mb taking over the solicitation role for the slaves. If standard arrival windows are not met, however, there is no solicitation by the involved slave (or master Mb). Instead, a kind of burst of synchronization cycles might then bridge the gap for the unsuccessful node. This idea is carried further in the second broadcast protocol. The basic idea stems from the following observation.

3.2 Observation: In a burst of (broadcast) messages sent at high frequency (every 30-80 µsec) to some target node, the first few may suffer from a considerably long delay (a kind of cold start effect). However, after at most six messages, the next few are very close to the standard one-way delay (to be measured off-line (see section 1)). Due to page limitations we refrain from deriving, or further substantiating, this insight here.
3.3 RTBP. In the Real-Time Burst Protocol there is only one master M (see fig.7). M sends a burst of 12 broadcast messages at high frequency (one every 30 µsec in the experiments reported here). Each one carries the time stamp of its emission time. Based on predetermined standard arrival windows (±100 nsec in our experiments), each receiving node checks, after receiving a message, whether it fits into the arrival window around the expected arrival time. The latter is taken to be 30 µsec after the arrival of the previous burst message. (A slave has to wait for the second burst message before checking, since it has no information on the emission time of the first message relative to its own time.) Once a time window is met, the time stamp is used to adjust the local clock (adding a standard one-way delay (see 3.1)). All remaining burst messages of this phase are ignored.
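The receiving side of RTBP can be sketched as a scan over one burst. Assumptions: each burst message is given as a pair (local arrival time, emission time stamp) in µsec, the nominal spacing is 30 µsec, the window is ±100 nsec, and `one_way_us` is the standard one-way delay determined off-line; `rtbp_adjust` is a hypothetical name. Checking starts with the second message, since the first has no reference; the first message that arrives within the window around 30 µsec after its predecessor fixes the new clock value, and the rest of the burst is ignored.

```python
from typing import Optional, Sequence, Tuple


def rtbp_adjust(burst: Sequence[Tuple[float, float]], one_way_us: float,
                spacing_us: float = 30.0,
                window_us: float = 0.1) -> Optional[Tuple[int, float]]:
    """Scan one burst of (local_arrival_us, emission_stamp_us) pairs in arrival order.

    Returns (index, new_local_time_at_that_arrival) for the first message whose
    arrival lies within +-window_us of spacing_us after the previous arrival,
    or None if no message of the burst qualifies.
    """
    for i in range(1, len(burst)):
        gap = burst[i][0] - burst[i - 1][0]
        if abs(gap - spacing_us) <= window_us:
            return i, burst[i][1] + one_way_us   # remaining burst messages are ignored
    return None
```

In a burst whose first messages are delayed by the cold start effect of 3.2 (arrival gaps of, say, 45 and 35 µsec), the scan simply skips them and synchronizes on the first message that arrives on schedule.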
Figure 7: Real-Time Burst Protocol (RTBP) (master M and slaves Sa, Sb; burst messages 30 usec apart; 43 usec marked on the slave side; ±100 nsec arrival window)

Each synchronization phase consists of 12 synchronization cycles followed by a monitoring phase, thus making up a synchronization round (fig.4). (The monitoring phase is part of the protocol, since the clocks might be off considerably at the beginning.) There is a synchronization round every 12 seconds (as for RTDP).

3.4 Evaluation of RTDP and RTBP. The monitoring results were studied and evaluated in two directions:
• quality of master-slave synchronization, node by node (RTDP and RTBP were compared with RTNP);
• harmonious behavior of the broadcast protocols.

As representative of the whole picture, fig.8 depicts the master-slave deviation results for node S6. While the curves as such already show the comparably high deviation divergence of RTNP, this becomes even more striking if one compares the tendencies (given through polynomial approximation of degree 6). These exhibit a near-ideal and stable behavior of the broadcast protocols (less than 200 nsec deviation from the master throughout, in the case of RTBP), as opposed to the varying picture for RTNP, with much higher absolute clock deviations.

Next, the master-slave deviations M-S5 and M-S6 are compared to the slave-slave deviation S6-S5, for RTDP (fig.9) and RTBP (fig.10). The nearly identical tendency curves immediately explain the consistently harmonious relationships. Using the conditions (1), (2a), (2b) from section 2, including the notations a, b, c, we found that (1) was met in 7.6%, 5.7% and 1.0% of the synchronization rounds, for ε = 700, 800 and 900 nsec. (2a) held in 7.6%, 5.7% and 1.0%, and (2b) in 6.7%, 5.7% and 1.0%. This demonstrates convincingly that the behavior under RTBP is nearly ideally harmonious, even more so since the mostly identical results for (1), (2a), (2b) suggest that the percentages are altogether negligible. (Conceptually there should be a remarkable difference under the different conditions.)
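The tendency curves of figs.8-10 are polynomial approximations of degree 6 over the synchronization rounds. A dependency-free least-squares fit of that kind can be sketched as below. `poly_trend` is a hypothetical name, and mapping the rounds onto [−1, 1] and solving the normal equations by Gaussian elimination with partial pivoting are our own choices, not details given in the text; in practice a numerical library's polynomial fitting routine would normally be preferred.

```python
from typing import Callable, List, Sequence


def poly_trend(xs: Sequence[float], ys: Sequence[float],
               degree: int = 6) -> Callable[[float], float]:
    """Least-squares polynomial of the given degree; returns a function evaluating it."""
    if len(set(xs)) <= degree:
        raise ValueError("need more distinct x values than the polynomial degree")
    lo, hi = min(xs), max(xs)

    def scale(x: float) -> float:
        # Map [lo, hi] onto [-1, 1] to keep the normal equations well conditioned.
        return (2 * x - lo - hi) / (hi - lo)

    m = degree + 1
    a: List[List[float]] = [[0.0] * m for _ in range(m)]
    r: List[float] = [0.0] * m
    for x, y in zip(xs, ys):
        u = scale(x)
        powers = [u ** k for k in range(2 * m - 1)]
        for j in range(m):
            r[j] += y * powers[j]
            for k in range(m):
                a[j][k] += powers[j + k]
    # Gaussian elimination with partial pivoting on the normal equations a @ coef = r.
    for col in range(m):
        piv = max(range(col, m), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        r[col], r[piv] = r[piv], r[col]
        for i in range(col + 1, m):
            f = a[i][col] / a[col][col]
            for k in range(col, m):
                a[i][k] -= f * a[col][k]
            r[i] -= f * r[col]
    # Back substitution.
    coef = [0.0] * m
    for i in reversed(range(m)):
        coef[i] = (r[i] - sum(a[i][k] * coef[k] for k in range(i + 1, m))) / a[i][i]
    return lambda x: sum(c * scale(x) ** k for k, c in enumerate(coef))
```

Calling `poly_trend(range(101), deviations)` on the per-round deviations of one curve yields a smooth function that can be evaluated at each round to draw the corresponding Trend(...) curve.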
Figure 8: Quality of Master-Slave Synchronization (clock deviation [nsec] of node S6 over synchronization rounds 0 to 100 for RTNP, RTDP and RTBP, with trend curves Trend(RTNP), Trend(RTDP), Trend(RTBP))
Figure 9: Harmonious Synchronization Through RTDP (clock deviation [nsec] over synchronization rounds 0 to 100 for M-S5, M-S6 and S6-S5, with trend curves Trend(M-S5), Trend(M-S6), Trend(S6-S5))
Figure 10: Harmonious Synchronization Through RTBP (clock deviation [nsec] over synchronization rounds 0 to 100 for M-S5, M-S6 and S6-S5, with trend curves Trend(M-S5), Trend(M-S6), Trend(S6-S5))

Conclusion and Future Work
When revisiting the clock synchronization protocol RTNP for our new Linux-based experimental network of 7 fast PCs, we found that this protocol was disharmonious, in that slave-slave deviations were considerably different from the master-slave deviations. To our knowledge, this problem has not yet been noticed or studied, although it may cause imbalancing effects for peer-to-peer cooperation support. After extensive studies of the phenomena we defined two novel protocols, RTDP and RTBP. While they are quite different approaches to the problem, both of them turned out to be better and more stable than RTNP (and probably every other unicast-based protocol) regarding master-slave clock deviations. Also, both are near-ideally harmonious.

In RTDP as well as in RTBP, activities are started and driven by a master. This makes them good candidates for fault tolerance studies. If a master crashes, this is not easily perceived under RTNP, since each synchronization round is asynchronously started by the slaves, but a trigger is to be provided each time by the master. Also, after switching to another master, the slaves have to be notified, whereas in the broadcast protocols there is no such delay. Moreover, the clock synchrony is probably much better preserved in RTDP and RTBP, since after a master crash all slaves are still in good mutual clock synchrony (in harmony). Any of them could then take over the master role without major time adjustments for the other slaves.

In order to drop the assumption of a linear clock drift, we will now consider a dynamically determined (non-linear) clock drift. There have been good conceptual preparations for computing the drift function [Sch97], which we will make use of. So far, the results obtained have allowed us to prepare for adequately porting MELODY to the PC environment. Our studies were made possible by gaining direct access to the hardware clock under Linux. Along the same line, we are preparing, for the burst protocol RTBP, a network driver function that would guarantee that the burst of 12 messages will not be preempted while the MELODY system is running. (We are in the process of porting it to our target environment.) This would enable us to establish that no interference arises between the timing services and MELODY. (Time synchronization is done on a separate network channel.) This will be the subject of upcoming publications.
References
[Bar97] M. Barabanov: A Linux-based Real-Time Operating System; New Mexico Institute of Mining and Technology, Socorro, New Mexico, June 1997.
[CrF94] F. Cristian, C. Fetzer: Probabilistic Internal Clock Synchronization; Proceedings of the Thirteenth Symposium on Reliable Distributed Systems, Dana Point, Ca., Oct. 1994.
[GeS94] M. Gergeleit, H. Streich: Implementing a Distributed High-Resolution Real-Time Clock using the CAN-Bus; Proceedings of the 1st International CAN-Conference 94, Mainz, Sep. 1994.
[HSP98] R. Hill, B. Srinivasan, S. Pather, D. Niehaus: Temporal Resolution and Real-Time Extensions to Linux; Technical Report ITIC-FY98-TR-11510-03, University of Kansas, June 1998.
[Mil91] D.L. Mills: Internet Time Synchronization: The Network Time Protocol; IEEE Trans. Communications COM-39, 10 (October 1991).
[SSH97] K. Schossmaier, U. Schmid, M. Horauer, D. Loy: Specification and Implementation of the Universal Time Coordinated Synchronization Unit (UTCSU); Journal of Real-Time Systems 12:3, pages 295-327, May 1997.
[Sch97] K. Schossmaier: An Interval-based Framework for Clock Rate Synchronization; Proceedings of the 16th ACM Symposium on Principles of Distributed Computing (PODC), pages 169-178, Santa Barbara, USA, August 21-24, 1997.
[WeL97] H.F. Wedde, J.A. Lind: Building Large, Complex, Distributed Safety-Critical Operating Systems; Real-Time Systems Vol. 13 No. 3 (1997).
[WLS99] H.F. Wedde, J.A. Lind, G. Segbert: Achieving Internal Synchronization Accuracy of 30 µs Under Message Delays Varying More Than 3 msec; Proc. of WRTP'99, 24th IFAC/IFIP Workshop on Real Time Programming (June 1999).