2 NTP WG J. Burbank, Ed. 3 Internet-Draft W. Kasch, Ed. 4 Obsoletes: 4330, 1305 JHU/APL 5 (if approved) J. Martin, Ed. 6 Intended status: Standards Track Daedelus 7 Expires: September 24, 2007 D. Mills 8 U. Delaware 9 March 23, 2007 11 Network Time Protocol Version 4 Protocol And Algorithms Specification 12 draft-ietf-ntp-ntpv4-proto-05 14 Status of this Memo 16 By submitting this Internet-Draft, each author represents that any 17 applicable patent or other IPR claims of which he or she is aware 18 have been or will be disclosed, and any of which he or she becomes 19 aware will be disclosed, in accordance with Section 6 of BCP 79. 21 Internet-Drafts are working documents of the Internet Engineering 22 Task Force (IETF), its areas, and its working groups. Note that 23 other groups may also distribute working documents as Internet- 24 Drafts. 26 Internet-Drafts are draft documents valid for a maximum of six months 27 and may be updated, replaced, or obsoleted by other documents at any 28 time. It is inappropriate to use Internet-Drafts as reference 29 material or to cite them other than as "work in progress." 31 The list of current Internet-Drafts can be accessed at 32 http://www.ietf.org/ietf/1id-abstracts.txt. 34 The list of Internet-Draft Shadow Directories can be accessed at 35 http://www.ietf.org/shadow.html. 37 This Internet-Draft will expire on September 24, 2007. 39 Copyright Notice 41 Copyright (C) The IETF Trust (2007). 43 Abstract 45 The Network Time Protocol (NTP) is widely used to synchronize 46 computer clocks in the Internet. This document describes NTP Version 47 4 (NTPv4), which is backwards compatible with NTP Version 3 (NTPv3) 48 described in RFC 1305, as well as previous versions of the protocol. 50 It includes a modified protocol header to accommodate the Internet 51 Protocol Version 6 address family. NTPv4 includes fundamental 52 improvements in the mitigation and discipline algorithms which extend 53 the potential accuracy to the tens of microseconds with modern 54 workstations and fast LANs. It includes a dynamic server discovery 55 scheme, so that in many cases specific server configuration is not 56 required. It corrects certain errors in the NTPv3 design and 57 implementation and includes an optional extension mechanism. This document obsoletes RFC 1305 and RFC 4330. 59 Table of Contents 61 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 62 1.1. Requirements Notation . . . . . . . . . . . . . . . . . . 5 63 2. Modes of Operation . . . . . . . . . . . . . . . . . . . . . 5 64 3.
Protocol Modes . . . . . . . . . . . . . . . . . . . . . . . 6 65 3.1. Simple Network Time Protocol (SNTP) . . . . . . . . . . . 7 66 3.2. Dynamic Server Discovery . . . . . . . . . . . . . . . . 8 67 4. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 9 68 5. Implementation Model . . . . . . . . . . . . . . . . . . . . 11 69 6. Data Types . . . . . . . . . . . . . . . . . . . . . . . . . 13 70 7. Data Structures . . . . . . . . . . . . . . . . . . . . . . . 17 71 7.1. Structure Conventions . . . . . . . . . . . . . . . . . . 17 72 7.2. Global Parameters . . . . . . . . . . . . . . . . . . . . 17 73 7.3. Packet Header Variables . . . . . . . . . . . . . . . . . 18 74 7.4. The Kiss-o'-Death Packet . . . . . . . . . . . . . . . . 24 75 7.5. NTP Extension Field Format . . . . . . . . . . . . . . . 25 76 8. On Wire Protocol . . . . . . . . . . . . . . . . . . . . . . 27 77 9. Peer Process . . . . . . . . . . . . . . . . . . . . . . . . 31 78 9.1. Peer Process Variables . . . . . . . . . . . . . . . . . 31 79 9.2. Peer Process Operations . . . . . . . . . . . . . . . . . 34 80 10. Clock Filter Algorithm . . . . . . . . . . . . . . . . . . . 38 81 11. System Process . . . . . . . . . . . . . . . . . . . . . . . 40 82 11.1. System Process Variables . . . . . . . . . . . . . . . . 40 83 11.2. System Process Operations . . . . . . . . . . . . . . . . 42 84 11.2.1. Selection Algorithm . . . . . . . . . . . . . . . . 44 85 11.2.2. Cluster Algorithm . . . . . . . . . . . . . . . . . 45 86 11.2.3. Combine Algorithm . . . . . . . . . . . . . . . . . 46 87 11.3. Clock Discipline Algorithm . . . . . . . . . . . . . . . 48 88 12. Clock Adjust Process . . . . . . . . . . . . . . . . . . . . 52 89 13. Poll Process . . . . . . . . . . . . . . . . . . . . . . . . 52 90 13.1. Poll Process Variables . . . . . . . . . . . . . . . . . 52 91 13.2. Poll Process Operations . . . . . . . . . . . . . . . . . 53 92 14. Security Considerations . . . . . . . . . . . . . . . . . . . 55 93 15. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 56 94 16. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 56 95 17. Informative References . . . . . . . . . . . . . . . . . . . 56 96 Appendix A. Code Skeleton . . . . . . . . . . . . . . . . . . . 57 97 A.1. Global Definitions . . . . . . . . . . . . . . . . . . . 58 98 A.1.1. Definitions, Constants, Parameters . . . . . . . . . 58 99 A.1.2. Packet Data Structures . . . . . . . . . . . . . . . 61 100 A.1.3. Association Data Structures . . . . . . . . . . . . 62 101 A.1.4. System Data Structures . . . . . . . . . . . . . . . 64 102 A.1.5. Local Clock Data Structures . . . . . . . . . . . . 65 103 A.1.6. Function Prototypes . . . . . . . . . . . . . . . . 65 104 A.2. Main Program and Utility Routines . . . . . . . . . . . . 66 105 A.3. Kernel Input/Output Interface . . . . . . . . . . . . . . 69 106 A.4. Kernel System Clock Interface . . . . . . . . . . . . . . 69 107 A.5. Peer Process . . . . . . . . . . . . . . . . . . . . . . 71 108 A.5.1. receive() . . . . . . . . . . . . . . . . . . . . . 72 109 A.5.2. clock_filter() . . . . . . . . . . . . . . . . . . . 79 110 A.5.3. fast_xmit() . . . . . . . . . . . . . . . . . . . . 83 111 A.5.4. access() . . . . . . . . . . . . . . . . . . . . . . 85 112 A.5.5. System Process . . . . . . . . . . . . . . . . . . . 85 113 A.5.6. Clock Adjust Process . . . . . . . . . . . . . . . . 99 114 A.5.7. Poll Process . . . . . . . . . . . . . . . . . . . . 100 115 Authors' Addresses . . . . . . . . . . . . . . . . . . 
. . . . . 107 116 Intellectual Property and Copyright Statements . . . . . . . . . 108 118 1. Introduction 120 This document defines the Network Time Protocol Version 4 (NTPv4), 121 which is widely used to synchronize the system clocks among a set of 122 distributed time servers and clients. It describes the core 123 architecture, protocol, state machines, data structures and 124 algorithms. NTPv4 introduces new functionality to NTPv3, as 125 described in [1], and functionality expanded from SNTPv4 as described 126 in [2] (SNTPv4 is a subset of NTPv4). This document obsoletes [1], 127 and [2]. While certain minor changes have been made in some protocol 128 header fields, these do not affect the interoperability between NTPv4 129 and previous versions of NTP and SNTP. 131 The NTP subnet model includes a number of widely accessible primary 132 time servers synchronized by wire or radio to national standards. 133 The purpose of the NTP protocol is to convey timekeeping information 134 from these primary servers to secondary time servers and clients via 135 both private networks and the public Internet. Crafted algorithms 136 mitigate errors that may result from network disruptions, server 137 failures and possible hostile action. Servers and clients are 138 configured such that values flow from the primary servers at the root 139 via branching secondary servers toward clients. 141 The NTPv4 design overcomes significant shortcomings in the NTPv3 142 design, corrects certain bugs and incorporates new features. In 143 particular, expanded NTP timestamp definitions encourage the use of 144 floating double data types throughout the implementation. The time 145 resolution is better than one nanosecond and frequency resolution 146 better than one nanosecond per second. Additional improvements 147 include a new clock discipline algorithm which is more responsive to 148 system clock hardware frequency fluctuations. Typical primary 149 servers using modern machines are precise within a few tens of 150 microseconds. Typical secondary servers and clients on fast LANs are 151 within a few hundred microseconds with poll intervals up to 1024 152 seconds, which was the maximum with NTPv3. With NTPv4, servers and 153 clients are within a few tens of milliseconds with poll intervals up 154 to 36 hours. 156 The main body of this document describes the core protocol and data 157 structures necessary to interoperate between conforming 158 implementations. Appendix A contains additional detail in the form 159 of a skeleton program including data structures and code segments for 160 the core algorithms and in addition the mitigation algorithms used to 161 enhance reliability and accuracy. While the skeleton and other 162 descriptions in this document apply to a particular implementation, 163 they are not intended as the only way the required functions can be 164 implemented. While the NTPv3 symmetric key authentication scheme 165 described in this document carries over from NTPv3, the Autokey 166 public key authentication scheme new to NTPv4 is described in [3]. 168 The NTP protocol includes the modes of operation described in 169 Section 2 using the data types described in Section 6 and the data 170 structures in Section 7. The implementation model described in 171 Section 5 is based on a multiple-process, threaded architecture, 172 although other architectures could be used as well. 
The on-wire 173 protocol described in Section 8 is based on a returnable-time design 174 which depends only on measured clock offsets, but does not require 175 reliable message delivery. The synchronization subnet is a self- 176 organizing, hierarchical, master-slave network with synchronization 177 paths determined by a shortest-path spanning tree and defined metric. 178 While multiple masters (primary servers) may exist, there is no 179 requirement for an election protocol. 181 This document includes material from [4], which contains flow charts 182 and equations unsuitable for RFC format. There is much additional 183 information in [5], including an extensive technical analysis and 184 performance assessment of the protocol and algorithms in this 185 document. The reference implementation itself is available at 186 www.ntp.org. 188 The remainder of this document contains numerous variables and 189 mathematical expressions. Some variables take the form of Greek 190 characters, which are spelled out by their full case-sensitive name. 191 For example DELTA refers to the uppercase Greek character, while 192 delta refers to the lowercase character. Furthermore, subscripts are 193 denoted with '_', for example theta_i refers to the lowercase Greek 194 character theta with subscript i, or phonetically theta sub i. 196 1.1. Requirements Notation 198 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 199 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 200 document are to be interpreted as described in [6]. 202 2. Modes of Operation 204 An NTP implementation operates as a primary server, secondary server 205 or client. A primary server is synchronized directly to a reference 206 clock, such as a GPS receiver or telephone modem service. A client 207 is synchronized to one or more upstream servers, but does not provide 208 synchronization to dependent clients. A secondary server has one or 209 more upstream servers and one or more downstream servers or clients. 210 All servers and clients claiming full NTPv4 compliance must implement 211 the entire suite of algorithms described in this document. In order 212 to maintain stability in large NTP subnets, secondary servers must be 213 fully NTPv4 compliant. 215 3. Protocol Modes 217 There are three NTP protocol variants, symmetric, client/server and 218 broadcast. Each is associated with an association mode as shown in 219 Figure 1. Persistent associations are mobilized upon startup and are 220 never demobilized. Ephemeral associations are mobilized upon arrival 221 of a packet and are demobilized upon error or timeout. 223 +-------------------+--------------+-------------+ 224 | Association Mode | Assoc. Mode | Packet Mode | 225 +-------------------+--------------+-------------+ 226 | Symmetric Active | 1 | 1 or 2 | 227 | Symmetric Passive | 2 | 1 | 228 | Client | 3 | 4 | 229 | Server | 4 | 3 | 230 | Broadcast Server | 5 | 5 | 231 | Broadcast Client | 6 | N/A | 232 +-------------------+--------------+-------------+ 234 Figure 1: Association and Packet Modes 236 In the client/server variant a persistent client association sends 237 client (mode 3) packets to a server, which returns server (mode 4) 238 packets. Servers provide synchronization to one or more clients, but 239 do not accept synchronization from them. A server can also be a 240 reference clock driver which obtains time directly from a standard 241 source such as a GPS receiver or telephone modem service. 
We say 242 that clients pull synchronization from servers. 244 In the symmetric variant a peer operates as both a server and client 245 using either a symmetric active or symmetric passive association. A 246 persistent symmetric active association sends symmetric active (mode 247 1) packets to a symmetric active peer association. Alternatively, an 248 ephemeral symmetric passive association can be mobilized upon arrival 249 of a symmetric active packet matching no association. That 250 association sends symmetric passive (mode 2) packets and persists 251 until error or timeout. Peers both push and pull synchronization to 252 and from each other. For the purposes of this document, a peer 253 operates like a client, so a reference to client implies peer as 254 well. 256 In the broadcast variant a persistent broadcast server association 257 sends periodic broadcast server (mode 5) packets which can be 258 received by multiple clients. Upon reception of a broadcast server 259 packet matching no association, an ephemeral broadcast client (mode 260 6) association is mobilized and persists until error or timeout. It 261 is useful to provide an initial volley where the client operating in 262 client mode exchanges several packets with the server in order to 263 calibrate the propagation delay and to run the Autokey security 264 protocol, after which the client reverts to broadcast client mode. 265 We say that broadcast servers push synchronization to willing 266 consumers. 268 Following conventions established by the telephone industry, the 269 level of each server in the hierarchy is defined by a number called 270 the stratum, with the primary servers assigned stratum one and the 271 secondary servers at each level assigned one greater than the 272 preceding level. As the stratum increases from one, the accuracies 273 achievable degrade somewhat depending on the particular network path 274 and system clock stability. It is useful to assume that mean errors, 275 and thus a metric called the synchronization distance, increase 276 approximately in proportion to the stratum and measured round trip 277 delay. It is important to note that NTP stratum is only loosely 278 modeled after the telecommunications stratum, which is defined by 279 international agreement. 281 Drawing from the experience of the telecommunications industry, which 282 learned such lessons at considerable cost, the subnet topology should 283 be organized to produce the lowest synchronization distances, but 284 must never be allowed to form a loop. In NTP the subnet topology is 285 determined using a variant of the Bellman-Ford distributed routing 286 algorithm, which computes the shortest-distance spanning tree rooted 287 on the primary servers. As a result of this design, the algorithm 288 automatically reorganizes the subnet to produce the most accurate and 289 reliable time, even when one or more primary or secondary servers or 290 the network paths fail. 292 3.1. Simple Network Time Protocol (SNTP) 294 Primary servers and clients complying with a subset of NTP, called 295 the Simple Network Time Protocol (SNTPv4) [2], do not need to 296 implement the mitigation algorithms described in Section 9 and 297 following sections. SNTP is intended for primary servers equipped 298 with a single reference clock, as well as for clients with a single 299 upstream server and no dependent clients. 
The fully developed NTPv4 300 implementation is intended for secondary servers with multiple 301 upstream servers and multiple downstream servers or clients. Other 302 than these considerations, NTP and SNTP servers and clients are 303 completely interoperable and can be mixed and matched in NTP subnets. 305 An SNTP primary server implementing the on-wire protocol described in 306 Section 8 has no upstream servers except a single reference clock. 307 In principle, it is indistinguishable from an NTP primary server 308 which has the mitigation algorithms, presumably to mitigate between 309 multiple reference clocks. 311 Upon receiving a client request, an SNTP primary server constructs 312 and sends the reply packet as described in Figure 2 of Section 9.2. 313 Note that the dispersion field in the packet header must be updated 314 as described in Section 4. 316 +-----------------------------------+ 317 | Packet Variable <-- Variable | 318 +-----------------------------------+ 319 | x.leap <-- s.leap | 320 | x.version <-- r.version | 321 | x.mode <-- 4 | 322 | x.stratum <-- s.stratum | 323 | x.poll <-- r.poll | 324 | x.precision <-- s.precision | 325 | x.rootdelay <-- s.rootdelay | 326 | x.rootdisp <-- s.rootdisp | 327 | x.refid <-- s.refid | 328 | x.reftime <-- s.reftime | 329 | x.org <-- r.xmt | 330 | x.rec <-- r.dst | 331 | x.xmt <-- clock | 332 | x.keyid <-- r.keyid | 333 | x.digest <-- md5 digest | 334 +-----------------------------------+ 336 Figure 2: fast_xmit Packet Header 338 A SNTP client implementing the on-wire protocol has a single server 339 and no dependent clients. It can operate with any subset of the NTP 340 on-wire protocol, the simplest using only the transmit timestamp of 341 the server packet and ignoring all other fields. However, the 342 additional complexity to implement the full on-wire protocol is 343 minimal and is encouraged. 345 3.2. Dynamic Server Discovery 347 There are two special associations, manycast client and manycast 348 server, which provide a dynamic server discovery function. There are 349 two types of manycast client associations, persistent and ephemeral. 350 The persistent manycast client sends client (mode 3) packets to a 351 designated IPv4 or IPv6 broadcast or multicast group address. 352 Designated manycast servers in range of the time-to-live (TTL) field 353 in the packet listen for packets with that address. If suitable for 354 synchronization, the server returns an ordinary server (mode 4) 355 packet, but using its unicast address rather than its broadcast 356 address. Upon receipt an ephemeral client (mode 3) association is 357 mobilized using the addresses and other data in the persistent 358 manycast client association and server packet header. The ephemeral 359 client association persists until error or timeout. 361 The manycast client continues to send packets until a specified 362 minimum number of client associations have been mobilized. If fewer 363 than this number have been found, the client sends packets starting 364 with a TTL of one and increasing by one for each subsequent packet 365 until reaching a designated maximum. Upon reaching the maximum, 366 packets are not sent until after a designated timeout, after which 367 the cycle repeats. If at least the minimum number of associations 368 have been found, the client sends one packet at each timeout. 370 It is the intent that ephemeral associations compete with other 371 associations and newly discovered associations. 
As each crop of 372 ephemeral associations is mobilized, the mitigation algorithms 373 described in Section 10 and Section 11.2 sift the best candidates 374 from the population and the remaining ephemeral associations time out 375 and are demobilized. In this way the population includes only the 376 best and freshest candidates to discipline the system clock. The 377 reference implementation includes intricate means to do this, but 378 these are beyond the scope of this document. 380 4. Definitions 382 A number of terms used throughout this document have a precise 383 technical definition. A timescale is a frame of reference where time 384 is expressed as the value of a monotonic-increasing binary counter 385 with an indefinite number of bits. It counts in seconds and fraction 386 with the decimal point somewhere in the middle. The Coordinated 387 Universal Time (UTC) timescale represents mean solar time as 388 disseminated by national standards laboratories. The system time is 389 represented by the system clock maintained by the hardware and 390 operating system. The goal of the NTP algorithms is to minimize both 391 the time difference and frequency difference between UTC and the 392 system clock. When these differences have been reduced below nominal 393 tolerances, the system clock is said to be synchronized to UTC. 395 The date of an event is the UTC time at which it takes place. Dates 396 are ephemeral values which always increase in step with reality and 397 are designated with upper case T in this document. It is convenient 398 to define another timescale coincident with the running time of the 399 NTP program that provides the synchronization function. This is 400 convenient in order to determine intervals for the various repetitive 401 functions like poll events. Running time is designated with lower 402 case t. 404 A timestamp T(t) represents either the UTC date or time offset from 405 UTC at running time t. Which meaning is intended should be clear 406 from the context. Let T(t) be the time offset, R(t) the frequency 407 offset, D(t) the ageing rate (first derivative of R(t) with respect 408 to t). Then, if T(t_0) is the UTC time offset determined at t = t_0, 409 the UTC time offset after an interval t is 411 T(t+t_0) = T(t_0) + R(t_0) * t + 1/2 * D(t_0) * t^2 + e, 413 where e is a stochastic error term discussed later in this document. 414 While the D(t) term is important when characterizing precision 415 oscillators, it is ordinarily neglected for computer oscillators. In 416 this document all time values are in seconds (s) and all frequency 417 values are in seconds-per-second (s/s). It is sometimes convenient 418 to express frequency offsets in parts-per-million (PPM), where 1 PPM 419 is equal to 10^(-6) s/s. 421 It is important in computer timekeeping applications to assess the 422 performance of the timekeeping function. The NTP performance model 423 includes four statistics which are updated each time a client makes a 424 measurement with a server. The offset (theta) represents the 425 maximum-likelihood time offset of the server clock relative to the 426 system clock. The delay (delta) represents the round trip delay 427 between the client and server. The dispersion (epsilon) represents 428 the maximum error inherent in the measurement. It increases at a 429 rate equal to the maximum disciplined system clock frequency 430 tolerance (PHI), typically 15 PPM.
The jitter (psi) is defined as 431 the root-mean-square (RMS) average of the most recent offset 432 differences, and represents the nominal error in estimating the offset. 434 While the theta, delta, epsilon, and psi statistics represent 435 measurements of the system clock relative to each server clock 436 separately, the NTP protocol includes mechanisms to combine the 437 statistics of several servers to more accurately discipline and 438 calibrate the system clock. The system offset (THETA) represents the 439 maximum-likelihood offset estimate for the server population. The 440 system jitter (PSI) represents the nominal error in estimating the 441 system offset. The delta and epsilon statistics are accumulated at 442 each stratum level from the reference clock to produce the root delay 443 (DELTA) and root dispersion (EPSILON) statistics. The 444 synchronization distance (LAMBDA), equal to EPSILON + DELTA / 2, 445 represents the maximum error due to all causes. The detailed 446 formulations of these statistics are given in Section 11.2. They are 447 available to the dependent applications in order to assess the 448 performance of the synchronization function. 450 5. Implementation Model 452 Figure 3 shows the architecture of a typical, multiple-thread 453 implementation. It includes two processes dedicated to each server, 454 a peer process to receive messages from the server or reference clock 455 and a poll process to transmit messages to the server or reference 456 clock. 458 ..................................................................... 459 . Remote . Peer/Poll . System . Clock . 460 . Servers . Processes . Process .Discipline. 461 . . . . Process . 462 .+--------+. +-----------+. +------------+ . . 463 .| |->| |. | | . . 464 .|Server 1| |Peer/Poll 1|->| | . . 465 .| |<-| |. | | . . 466 .+--------+. +-----------+. | | . . 467 . . ^ . | | . . 468 . . | . | | . . 469 .+--------+. +-----------+. | | +-----------+. . 470 .| |->| |. | Selection |->| |. +------+ . 471 .|Server 2| |Peer/Poll 2|->| and | | Combine |->| Loop | . 472 .| |<-| |. | Cluster | | Algorithm |. |Filter| . 473 .+--------+. +-----------+. | Algorithms |->| |. +------+ . 474 . . ^ . | | +-----------+. | . 475 . . | . | | . | . 476 .+--------+. +-----------+. | | . | . 477 .| |->| |. | | . | . 478 .|Server 3| |Peer/Poll 3|->| | . | . 479 .| |<-| |. | | . | . 480 .+--------+. +-----------+. +------------+ . | . 481 ....................^.........................................|...... 482 | . V . 483 | . +-----+ . 484 +--------------------------------------| VFO | . 485 . +-----+ . 486 . Clock . 487 . Adjust . 488 . Process . 489 ............ 491 Figure 3: Implementation Model 493 These processes operate on a common data structure, called an 494 association, which contains the statistics described above along with 495 various other data described in Section 9. A client sends packets to 496 one or more servers and processes the replies as received. The 497 server interchanges addresses and ports, overwrites certain fields in 498 the packet and returns it immediately (client/server mode) or at some 499 time later (symmetric modes). As each NTP message is received, the 500 offset theta between the peer clock and the system clock is computed 501 along with the associated statistics delta, epsilon and psi.
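The following sketch illustrates two of the statistics defined in Section 4. It is written in the style of the skeleton in Appendix A but is not part of it, and the function and variable names are illustrative only. It shows the dispersion growing at rate PHI after a measurement and the synchronization distance formed as LAMBDA = EPSILON + DELTA / 2.

   #include <stdio.h>

   #define PHI 15e-6 /* frequency tolerance (s/s), Figure 7 */

   /*
    * Dispersion grows at rate PHI since the time the measurement was
    * made (Section 4).
    */
   static double dispersion(double epsilon0, double t_now, double t_meas)
   {
           return (epsilon0 + PHI * (t_now - t_meas));
   }

   /*
    * Synchronization distance LAMBDA = EPSILON + DELTA / 2, where
    * EPSILON is the root dispersion and DELTA the root delay
    * accumulated from the reference clock (Section 4).
    */
   static double sync_distance(double rootdisp, double rootdelay)
   {
           return (rootdisp + rootdelay / 2);
   }

   int main(void)
   {
           double epsilon = dispersion(.001, 64, 0);  /* 1 ms aged 64 s */
           double lambda = sync_distance(epsilon, .010); /* 10 ms delay */

           printf("dispersion %.6f s, distance %.6f s\n", epsilon, lambda);
           return (0);
   }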
503 The system process includes the selection, cluster and combine 504 algorithms which mitigate among the various servers and reference 505 clocks to determine the most accurate and reliable candidates to 506 synchronize the system clock. The selection algorithm uses Byzantine 507 principles to discard the falsetickers from the incident population, 508 leaving only truechimers. A truechimer is a clock that maintains 509 timekeeping accuracy to a previously published (and trusted) 510 standard, while a falseticker is a clock that shows misleading or 511 inconsistent time. The cluster algorithm uses statistical principles 512 to sift the most accurate truechimers, leaving the survivors as the 513 result. The combine algorithm develops the final clock offset as a 514 statistical average of the survivors. 516 The clock discipline process, which is actually part of the system 517 process, includes engineered algorithms to control the time and 518 frequency of the system clock, here represented as a variable 519 frequency oscillator (VFO). Timestamps struck from the VFO close the 520 feedback loop which maintains the system clock time. Associated with 521 the clock discipline process is the clock adjust process, which runs 522 once each second to inject a computed time offset and maintain 523 constant frequency. The RMS average of past time offset differences 524 represents the nominal error or system clock jitter. The RMS average 525 of past frequency offset differences represents the oscillator 526 frequency stability or frequency wander. These terms are given 527 precise interpretation in Section 11.2. 529 A client sends messages to each server with a poll interval of 2^tau 530 seconds, as determined by the poll exponent tau. In NTPv4, tau 531 ranges from 4 (16 s) through 17 (36 h). The value of tau is 532 determined by the clock discipline algorithm to match the loop time 533 constant T_c = 2^tau. In client/server mode the server responds 534 immediately; however, in symmetric modes each of two peers manages 535 tau as a function of current system offset and system jitter, so the 536 two peers may not agree on the same value. It is important that the dynamic 537 behavior of the clock discipline algorithm be carefully controlled in 538 order to maintain stability in the NTP subnet at large. This 539 requires that the peers agree on a common tau equal to the minimum 540 poll exponent of both peers. The NTP protocol includes provisions to 541 properly negotiate this value. 543 The implementation model includes some means to set and adjust the 544 system clock. The operating system is assumed to provide two 545 functions, one to set the time directly, for example the Unix 546 settimeofday() function, and another to adjust the time in small 547 increments advancing or retarding the time by a designated amount, 548 for example the Unix adjtime() function. In this and following 549 references, parentheses following a name indicate reference to a 550 function rather than a simple variable. In the intended design the 551 clock discipline process uses the adjtime() function if the 552 adjustment is less than a designated threshold, and the 553 settimeofday() function if above the threshold. The manner in which 554 this is done and the value of the threshold are described in 555 Section 10. 557 6. Data Types 559 All NTP time values are represented in twos-complement format, with 560 bits numbered in big-endian (as described in Appendix A of [7]) 561 fashion from zero starting at the left, or high-order, position.
562 There are three NTP time formats, a 128-bit date format, a 64-bit 563 timestamp format and a 32-bit short format, as shown in Figure 4. 564 The 128-bit date format is used where sufficient storage and word 565 size are available. It includes a 64-bit signed seconds field 566 spanning 584 billion years and a 64-bit fraction field resolving .05 567 attosecond (i.e., 0.5e-18). For convenience in mapping between 568 formats, the seconds field is divided into a 32-bit Era Number field 569 and a 32-bit Era Offset field. Eras cannot be produced by NTP 570 directly, nor is there need to do so. When necessary, they can be 571 derived from external means, such as the filesystem or dedicated 572 hardware. 574 0 1 2 3 575 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 576 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 577 | Seconds | Fraction | 578 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 580 NTP Short Format 582 0 1 2 3 583 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 584 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 585 | Seconds | 586 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 587 | Fraction | 588 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 590 NTP Timestamp Format 592 0 1 2 3 593 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 594 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 595 | Era Number | 596 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 597 | Era Offset | 598 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 599 | | 600 | Fraction | 601 | | 602 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 604 NTP Date Format 606 Figure 4: NTP Time Formats 608 The 64-bit timestamp format is used in packet headers and other 609 places with limited word size. It includes a 32-bit unsigned seconds 610 field spanning 136 years and a 32-bit fraction field resolving 232 611 picoseconds. The 32-bit short format is used in delay and dispersion 612 header fields where the full resolution and range of the other 613 formats are not justified. It includes a 16-bit unsigned seconds 614 field and a 16-bit fraction field. 616 In the date and timestamp formats the prime epoch, or base date of 617 era 0, is 0 h 1 January 1900 UTC, when all bits are zero. It should 618 be noted that strictly speaking, UTC did not exist prior to 1 January 619 1972, but it is convenient to assume it has existed for all eternity, 620 even if all knowledge of historic leap seconds has been lost. Dates 621 are relative to the prime epoch; values greater than zero represent 622 times after that date; values less than zero represent times before 623 it. Note that the Era Offset field of the date format and the 624 Seconds field of the timestamp format have the same interpretation. 626 Timestamps are unsigned values and operations on them produce a 627 result in the same or adjacent eras. Era 0 includes dates from the 628 prime epoch to some time in 2036, when the timestamp field wraps 629 around and the base date for era 1 is established. In either format 630 a value of zero is a special case representing unknown or 631 unsynchronized time. Figure 5 shows a number of historic NTP dates 632 together with their corresponding Modified Julian Day (MJD), NTP era 633 and NTP timestamp. 
635 +-------------+------------+-----+---------------+------------------+ 636 | Year | MJD | NTP | NTP Timestamp | Epoch | 637 | | | Era | Era Offset | | 638 +-------------+------------+-----+---------------+------------------+ 639 | 1 Jan -4712 | -2,400,001 | -49 | 1,795,583,104 | 1st day Julian | 640 | 1 Jan -1 | -679,306 | -14 | 139,775,744 | 2 BCE | 641 | 1 Jan 0 | -678,941 | -14 | 171,311,744 | 1 BCE | 642 | 1 Jan 1 | -678,575 | -14 | 202,934,144 | 1 CE | 643 | 4 Oct 1582 | -100,851 | -3 | 2,873,647,488 | Last day Julian | 644 | 15 Oct 1582 | -100,840 | -3 | 2,874,597,888 | First day | 645 | | | | | Gregorian | 646 | 31 Dec 1899 | 15019 | -1 | 4,294,880,896 | Last day NTP Era | 647 | | | | | -1 | 648 | 1 Jan 1900 | 15020 | 0 | 0 | First day NTP | 649 | | | | | Era 0 | 650 | 1 Jan 1970 | 40,587 | 0 | 2,208,988,800 | First day UNIX | 651 | 1 Jan 1972 | 41,317 | 0 | 2,272,060,800 | First day UTC | 652 | 31 Dec 1999 | 51,543 | 0 | 3,155,587,200 | Last day 20th | 653 | | | | | Century | 654 | 8 Feb 2036 | 64,731 | 1 | 63,104 | First day NTP | 655 | | | | | Era 1 | 656 +-------------+------------+-----+---------------+------------------+ 658 Figure 5: Interesting Historic NTP Dates 660 Let p be the number of significant bits in the second fraction. The 661 clock resolution is defined as 2^(-p), in seconds. In order to minimize 662 bias and help make timestamps unpredictable to an intruder, the non- 663 significant bits should be set to an unbiased random bit string. The 664 clock precision is defined as the running time to read the system 665 clock, in seconds. Note that the precision defined in this way can 666 be larger or smaller than the resolution. The term rho, representing 667 the precision used in the protocol, is the larger of the two. 669 The only arithmetic operation permitted on dates and timestamps is 670 twos-complement subtraction, yielding a 127-bit or 63-bit signed 671 result. It is critical that the first-order differences between two 672 dates preserve the full 128-bit precision and the first-order 673 differences between two timestamps preserve the full 64-bit 674 precision. However, the differences are ordinarily small compared to 675 the seconds span, so they can be converted to floating double format 676 for further processing and without compromising the precision. 678 It is important to note that twos-complement arithmetic does not know 679 the difference between signed and unsigned values; only the 680 conditional branch instructions do. Thus, although the distinction 681 is made between signed dates and unsigned timestamps, they are 682 processed the same way. A perceived hazard with 64-bit timestamp 683 calculations spanning an era, such as is possible in 2036, might result 684 in incorrect values. In point of fact, if the client is set within 685 68 years of the server before the protocol is started, correct values 686 are obtained even if the client and server are in adjacent eras. 688 Some time values are represented in exponent format, including the 689 precision, time constant and poll interval. These are in 8-bit 690 signed integer format in log2 (log to the base 2) seconds. The only 691 arithmetic operations permitted on them are increment and decrement. 692 For the purpose of this document and to simplify the presentation, a 693 reference to one of these variables by name means the exponentiated 694 value, e.g., the poll interval is 1024 s, while reference by name and 695 exponent means the actual value, e.g., the poll exponent is 10.
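The era-spanning behavior described above can be demonstrated with a short sketch, which is illustrative only and not part of the skeleton in Appendix A. Two 64-bit timestamps in adjacent eras are subtracted as unsigned twos-complement values and the result is interpreted as a signed difference, which remains correct as long as the two clocks are set within 68 years of each other.

   #include <stdint.h>
   #include <stdio.h>

   #define FRAC 4294967296.0 /* 2^32, one second of 64-bit timestamp */

   /*
    * Twos-complement subtraction of two 64-bit timestamps.  The signed
    * result is correct even when a and b lie in adjacent eras, provided
    * the true difference is less than 68 years (Section 6).
    */
   static double ts_difference(uint64_t a, uint64_t b)
   {
           return ((double)(int64_t)(a - b) / FRAC);
   }

   int main(void)
   {
           uint64_t b = (uint64_t)0xfffffff6 << 32; /* 10 s before era 1 */
           uint64_t a = (uint64_t)20 << 32;         /* 20 s into era 1 */

           printf("a - b = %.1f s\n", ts_difference(a, b)); /* prints 30.0 */
           return (0);
   }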
697 To convert system time in any format to NTP date and timestamp 698 formats requires that the number of seconds s from the prime epoch to 699 the system time be determined. To determine the integer era and 700 timestamp given s, 702 era = s / 2^(32) and timestamp = s - era * 2^(32), 704 which works for positive and negative dates. To determine s given 705 the era and timestamp, 707 s = era * 2^(32) + timestamp. 709 Converting between NTP and system time can be a little messy, but 710 beyond the scope of this document. Note that the number of days in 711 era 0 is one more than the number of days in most other eras and this 712 won't happen again until the year 2400 in era 3. 714 In the description of state variables to follow, explicit reference 715 to integer type implies a 32-bit unsigned integer. This simplifies 716 bounds checks, since only the upper limit needs to be defined. 717 Without explicit reference, the default type is 64-bit floating 718 double. Exceptions will be noted as necessary. 720 7. Data Structures 722 The NTP protocol state machines described in following sections are 723 defined using state variables and code fragments defined in 724 Appendix A. State variables are separated into classes according to 725 their function in packet headers, peer and poll processes, the system 726 process and the clock discipline process. Packet variables represent 727 the NTP header values in transmitted and received packets. Peer and 728 poll variables represent the contents of the association for each 729 server separately. System variables represent the state of the 730 server as seen by its dependent clients. Clock discipline variables 731 represent the internal workings of the clock discipline algorithm. 732 Additional parameters and variable classes are defined in Appendix A. 734 7.1. Structure Conventions 736 In order to distinguish between different variables of the same name 737 but used in different processes, the naming convention summarized in 738 Figure 6 is adopted. A receive packet variable v is a member of the 739 packet structure r with fully qualified name r.v. In a similar 740 manner x.v is a transmit packet variable, p.v is a peer variable, s.v 741 is a system variable and c.v is a clock discipline variable. There 742 is a set of peer variables for each association; there is only one 743 set of system and clock variables. 745 +------+---------------------------------+ 746 | Name | Description | 747 +------+---------------------------------+ 748 | r. | receive packet header variable | 749 | x. | transmit packet header variable | 750 | p. | peer/poll variable | 751 | s. | system variable | 752 | c. | clock discipline variable | 753 +------+---------------------------------+ 755 Figure 6: Prefix Conventions 757 7.2. Global Parameters 759 In addition to the variable classes a number of global parameters are 760 defined in this document, including those shown with values in 761 Figure 7. 
763 +-----------+-------+----------------------------------+ 764 | Name | Value | Description | 765 +-----------+-------+----------------------------------+ 766 | PORT | 123 | NTP port number | 767 | VERSION | 4 | version number | 768 | TOLERANCE | 15e-6 | frequency tolerance PHI (s/s) | 769 | MINPOLL | 4 | minimum poll exponent (16 s) | 770 | MAXPOLL | 17 | maximum poll exponent (36 h) | 771 | MAXDISP | 16 | maximum dispersion (16 s) | 772 | MINDISP | .005 | minimum dispersion increment (s) | 773 | MAXDIST | 1 | distance threshold (1 s) | 774 | MAXSTRAT | 16 | maximum stratum number | 775 +-----------+-------+----------------------------------+ 777 Figure 7: Global Parameters 779 While these are the only global parameters needed in this document, a 780 larger collection is necessary in the skeleton and larger still for 781 any implementation. Appendix A.1.1 contains those used by the 782 skeleton for the mitigation algorithms, clock discipline algorithm 783 and related implementation-dependent functions. Some of these 784 parameter values are cast in stone, like the NTP port number assigned 785 by the IANA and the version number assigned to NTPv4 itself. Others, 786 like the frequency tolerance (also called PHI), involve an assumption 787 about the worst case behavior of a system clock once synchronized and 788 then allowed to drift when its sources have become unreachable. The 789 minimum and maximum parameters define the limits of state variables 790 as described in later sections of this document. 792 While shown with fixed values in this document, some implementations 793 may make them variables adjustable by configuration commands. For 794 instance, the reference implementation computes the value of 795 PRECISION as log2 of the minimum time in several iterations to read 796 the system clock. 798 7.3. Packet Header Variables 800 The most important state variables from an external point of view are 801 the packet header variables described in Figure 8 and below. The NTP 802 packet header consists of an integral number of 32-bit (4 octet) 803 words in network byte order. The packet format consists of three 804 components, the header itself, one or more optional extension fields 805 and an optional message authentication code (MAC). The header 806 component is identical to the NTPv3 header and previous versions. 807 The optional extension fields are used by the Autokey public key 808 cryptographic algorithms described in [3]. The optional MAC is used 809 by both Autokey and the symmetric key cryptographic algorithm 810 described in Appendix A.
812 +-----------+------------+-----------------------+ 813 | Name | Formula | Description | 814 +-----------+------------+-----------------------+ 815 | leap | leap | leap indicator (LI) | 816 | version | version | version number (VN) | 817 | mode | mode | mode | 818 | stratum | stratum | stratum | 819 | poll | poll | poll exponent | 820 | precision | rho | precision exponent | 821 | rootdelay | delta_r | root delay | 822 | rootdisp | epsilon_r | root dispersion | 823 | refid | refid | reference ID | 824 | reftime | reftime | reference timestamp | 825 | org | T1 | origin timestamp | 826 | rec | T2 | receive timestamp | 827 | xmt | T3 | transmit timestamp | 828 | dst | T4 | destination timestamp | 829 | keyid | keyid | key ID | 830 | digest | digest | message digest | 831 +-----------+------------+-----------------------+ 833 Figure 8: Packet Header Variables 835 The NTP packet header follows the UDP and IP headers and the physical 836 header specific to the underlying transport network. Some fields use 837 multiple words and others are packed in smaller fields within a word. 838 The NTP packet header shown in Figure 9 has 12 words followed by 839 optional extension fields and finally an optional message 840 authentication code (MAC) consisting of the key identifier field and 841 message digest field. 843 0 1 2 3 844 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 845 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 846 |LI | VN |Mode | Stratum | Poll | Precision | 847 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 848 | Root Delay | 849 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 850 | Root Dispersion | 851 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 852 | Reference ID | 853 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 854 | | 855 + Reference Timestamp (64) + 856 | | 857 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 858 | | 859 + Origin Timestamp (64) + 860 | | 861 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 862 | | 863 + Receive Timestamp (64) + 864 | | 865 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 866 | | 867 + Transmit Timestamp (64) + 868 | | 869 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 870 | | 871 + Extension Field 1 (variable) + 872 | | 873 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 874 | | 875 + Extension Field 2 (variable) + 876 | | 877 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 878 | Key Identifier | 879 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 880 | | 881 | Message Digest (128) | 882 | | 883 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 885 Figure 9: Packet Header Format 887 The extension fields are used to add optional capabilities, for 888 example, the Autokey security protocol [3]. The extension field 889 format is presented in order that the packet can be parsed without 890 knowledge of the extension field functions. The MAC is used by both 891 Autokey and the symmetric key authentication scheme described in 892 Appendix A. 894 A list of the packet header variables is shown in Figure 8 and 895 described in detail below. Except for a minor variation when using 896 the IPv6 address family, these fields are backwards compatible with 897 NTPv3. The packet header fields apply to both transmitted packets (x 898 prefix) and received packets (r prefix). 
In Figure 9 the size of 899 some multiple-word fields is shown in bits if not the default 32 900 bits. The header extends from the beginning of the packet to the end 901 of the Transmit Timestamp field. 903 The fields and associated packet variables (in parentheses) are 904 interpreted as follows: 906 LI Leap Indicator (leap): 2-bit integer warning of an impending leap 907 second to be inserted or deleted in the last minute of the current 908 month with values defined in Figure 10. 910 +-------+-------------------------------------------------+ 911 | Value | Meaning | 912 +-------+-------------------------------------------------+ 913 | 0 | no warning | 914 | 1 | last minute of the day has 61 seconds | 915 | 2 | last minute of the day has 59 seconds | 916 | 3 | alarm condition (the clock is not synchronized) | 917 +-------+-------------------------------------------------+ 919 Figure 10: Leap Indicator 921 VN Version Number (version): 3-bit integer representing the NTP 922 version number, currently 4. 924 Mode (mode): 3-bit integer representing the mode, with values defined 925 in Figure 11. 927 +-------+--------------------------+ 928 | Value | Meaning | 929 +-------+--------------------------+ 930 | 0 | reserved | 931 | 1 | symmetric active | 932 | 2 | symmetric passive | 933 | 3 | client | 934 | 4 | server | 935 | 5 | broadcast | 936 | 6 | NTP control message | 937 | 7 | reserved for private use | 938 +-------+--------------------------+ 940 Figure 11: Association Modes 942 Stratum (stratum): 8-bit integer representing the stratum, with 943 values defined in Figure 12. 945 +--------+-----------------------------------------------------+ 946 | Value | Meaning | 947 +--------+-----------------------------------------------------+ 948 | 0 | unspecified or invalid | 949 | 1 | primary server (e.g., equipped with a GPS receiver) | 950 | 2-15 | secondary server (via NTP) | 951 | 16-255 | undefined | 952 +--------+-----------------------------------------------------+ 954 Figure 12: Packet Stratum 956 It is customary to map the stratum value 0 in received packets to 957 MAXSTRAT (16) in the peer variable p.stratum and to map p.stratum 958 values of MAXSTRAT or greater to 0 in transmitted packets. This 959 allows reference clocks, which normally appear at stratum 0, to be 960 conveniently mitigated using the same algorithms used for external 961 sources. 963 Poll: 8-bit signed integer representing the maximum interval between 964 successive messages, in log2 seconds. Suggested default limits for 965 minimum and maximum poll intervals are 6 and 10, respectively. 967 Precision: 8-bit signed integer representing the precision of the 968 system clock, in log2 seconds. For instance a value of -18 969 corresponds to a precision of about one microsecond. The precision 970 can be determined when the service first starts up as the minimum 971 time of several iterations to read the system clock. 973 Root Delay (rootdelay): Total round trip delay to the reference 974 clock, in NTP short format. 976 Root Dispersion (rootdisp): Total dispersion to the reference clock, 977 in NTP short format. 979 Reference ID (refid): 32-bit code identifying the particular server 980 or reference clock. The interpretation depends on the value in the 981 stratum field. For packet stratum 0 (unspecified or invalid) this is 982 a four-character ASCII string, called the kiss code, used for 983 debugging and monitoring purposes. 
For stratum 1 (reference clock) 984 this is a four-octet, left-justified, zero-padded ASCII string 985 assigned to the reference clock. While not specifically enumerated 986 in this document, the identifiers in Figure 13 have been used as 987 ASCII identifiers: 989 +------+----------------------------------------------------------+ 990 | ID | Clock Source | 991 +------+----------------------------------------------------------+ 992 | GOES | Geosynchronous Orbit Environment Satellite | 993 | GPS | Global Position System | 994 | GAL | Galileo Positioning System | 995 | PPS | Generic pulse-per-second | 996 | IRIG | Inter-Range Instrumentation Group | 997 | WWVB | LF Radio WWVB Ft. Collins, CO 60 kHz | 998 | DCF | LF Radio DCF77 Mainflingen, DE 77.5 kHz | 999 | HBG | LF Radio HBG Prangins, HB 75 kHz | 1000 | MSF | LF Radio MSF Anthorn, UK 60 kHz (Rugby until April 2007) | 1001 | JJY | LF Radio JJY Fukushima, JP 40 kHz, Saga, JP 60 kHz | 1002 | LORC | MF Radio LORAN C 100 kHz | 1003 | TDF | MF Radio Allouis, FR 162 kHz | 1004 | CHU | HF Radio CHU Ottawa, Ontario | 1005 | WWV | HF Radio WWV Ft. Collins, CO | 1006 | WWVH | HF Radio WWVH Kauai, HI | 1007 | NIST | NIST telephone modem | 1008 | ACTS | NIST telephone modem | 1009 | USNO | USNO telephone modem | 1010 | PTB | European telephone modem | 1011 +------+----------------------------------------------------------+ 1013 Figure 13: Reference Identifiers 1015 Above stratum 1 (secondary servers and clients) this is the reference 1016 identifier of the server and is intended to detect timing loops. If 1017 using the IPv4 address family, the identifier is the four-octet IPv4 1018 address. If using the IPv6 address family, it is the first four 1019 octets of the MD5 hash of the IPv6 address. Note that, when using 1020 the IPv6 address family on an NTPv4 server with a NTPv3 client, the 1021 Reference Identifier field appears to be a random value and a timing 1022 loop might not be detected. 1024 Reference Timestamp: Time when the system clock was last set or 1025 corrected, in NTP timestamp format. 1027 Origin Timestamp (org): Time at the client when the request departed 1028 for the server, in NTP timestamp format. 1030 Receive Timestamp (rec): Time at the server when the request arrived 1031 from the client, in NTP timestamp format. 1033 Transmit Timestamp (xmt): Time at the server when the response left 1034 for the client, in NTP timestamp format. 1036 Destination Timestamp (dst): Time at the client when the reply 1037 arrived from the server, in NTP timestamp format. 1039 Note: Destination Timestamp field is not included as a header field; 1040 it is determined upon arrival of the packet and made available in the 1041 packet buffer data structure. 1043 The MAC consists of the Key Identifier followed by the Message 1044 Digest. The message digest, or cryptosum, is calculated as in [8] 1045 over all header and optional extension fields, but not the MAC 1046 itself. 1048 Key Identifier (keyid): 32-bit unsigned integer used by the client 1049 and server to designate a secret 128-bit MD5 key. 1051 Message Digest (digest): 128-bit bitstring computed by the keyed MD5 1052 message digest computed over all the words in the header and 1053 extension fields, but not the MAC itself. 1055 7.4. The Kiss-o'-Death Packet 1057 If the Stratum field is 0, which implies unspecified or invalid, the 1058 Reference Identifier field can be used to convey messages useful for 1059 status reporting and access control. 
These are called Kiss-o'-Death 1060 (KoD) packets and the ASCII messages they convey are called kiss 1061 codes. The KoD packets got their name because an early use was to 1062 tell clients to stop sending packets that violate server access 1063 controls. The kiss codes can provide useful information for an 1064 intelligent client, either NTPv4 or SNTPv4. Kiss codes are encoded 1065 in four-character ASCII strings left justified and zero filled. The 1066 strings are designed for character displays and log files. A list of 1067 the currently-defined kiss codes is given in Figure 14. Other than 1068 displaying the kiss code, KoD packets have no protocol significance 1069 and are discarded after inspection. 1071 +------+------------------------------------------------------------+ 1072 | Code | Meaning | 1073 +------+------------------------------------------------------------+ 1074 | ACST | The association belongs to a unicast server | 1075 | AUTH | Server authentication failed | 1076 | AUTO | Autokey sequence failed | 1077 | BCST | The association belongs to a broadcast server | 1078 | CRYP | Cryptographic authentication or identification failed | 1079 | DENY | Access denied by remote server | 1080 | DROP | Lost peer in symmetric mode | 1081 | RSTR | Access denied due to local policy | 1082 | INIT | The association has not yet synchronized for the first | 1083 | | time | 1084 | MCST | The association belongs to a dynamically discovered server | 1085 | NKEY | No key found. Either the key was never installed or is | 1086 | | not trusted | 1087 | RATE | Rate exceeded. The server has temporarily denied access | 1088 | | because the client exceeded the rate threshold | 1089 | RMOT | Alteration of association from a remote host running | 1090 | | ntpdc. | 1091 | STEP | A step change in system time has occurred, but the | 1092 | | association has not yet resynchronized | 1093 +------+------------------------------------------------------------+ 1095 Figure 14: Kiss Codes 1097 7.5. NTP Extension Field Format 1099 In NTPv4 one or more extension fields can be inserted after the 1100 header and before the MAC, which is always present when an extension 1101 field is present. Other than defining the field format, this 1102 document makes no use of the field contents. An extension field 1103 contains a request or response message in the format shown in 1104 Figure 15. 1106 0 1 2 3 1107 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1108 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1109 | Field Type | Length | 1110 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1111 | Association ID | 1112 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1113 | Timestamp | 1114 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1115 | Filestamp | 1116 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1117 | Value Length | 1118 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1119 . . 1120 . Value . 1121 . . 1122 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1123 | Signature Length | 1124 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1125 . . 1126 . Signature . 1127 . . 
1128 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1129 | Padding (as needed) | 1130 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1132 Figure 15: Extension Field Format 1134 All extension fields are zero-padded to a word (4 octets) boundary 1135 and the last is padded to a 64-bit (8 octet) boundary. The Field 1136 Type field is specific to the defined function and is not elaborated 1137 here. While the minimum field length containing required fields is 4 1138 words (16 octets), a maximum field length remains to be established. 1140 The Length field is a 16-bit integer which indicates the length of 1141 the entire extension field in octets, including the Padding field. 1143 The 32-bit Association ID field is set by clients to the value 1144 previously received from the server or 0 otherwise. The server sets 1145 the Association ID field when sending a response as a handle for 1146 subsequent exchanges. 1148 The Timestamp and Filestamp 32-bit fields carry the seconds field of 1149 an NTP timestamp. The Timestamp field establishes the signature 1150 epoch of the data in the extension field, while the filestamp 1151 establishes the generation epoch of the file that ultimately produced 1152 the data. 1154 The 32-bit Value Length field indicates the length of the Value field 1155 in octets. The minimum length of this field is 0, in which case the 1156 Value field itself is omitted. 1158 The 32-bit Signature Length field indicates the length of the 1159 Signature field in octets. The minimum length of this field is 0. 1160 In which case the Signature field itself is omitted. 1162 If both the Value Length and Signature Length fields are 0, both of 1163 these words can be omitted, in which case the extension field has 1164 length 4 words. 1166 The presence of the MAC and extension fields in the packet is 1167 determined from the length of the remaining area after the header to 1168 the end of the packet. The parser initializes a pointer just after 1169 the header. If the Length field is not a multiple of 4, a format 1170 error has occurred and the packet is discarded. The following cases 1171 are possible based on the remaining length in words. 1172 0 The packet contains no MAC and is not authenticated. 1173 1 The packet is an error report or crypto-NAK. 1174 2, 3, 4 The packet is discarded with a format error. 1175 5 The remainder of the packet is the 160-bit MAC. 1176 >5 One or more extension fields are present. 1178 If an extension field is present, the parser examines the Length 1179 field. If the length is less than 4 or not a multiple of 4, a format 1180 error has occurred and the packet is discarded; otherwise, the parser 1181 increments the pointer by this value. The parser now uses the same 1182 rules as above to determine whether a MAC is present and/or another 1183 extension field. An additional implementation dependent test is 1184 necessary to ensure the pointer does not stray outside the buffer 1185 space occupied by the packet. 1187 8. On Wire Protocol 1189 The heart of the NTP on-wire protocol is the core mechanism which 1190 exchanges time values between servers, peers and clients. It is 1191 inherently resistant to lost or duplicate packets. Data integrity is 1192 provided by the IP and UDP checksums. No flow control or 1193 retransmission facilities are provided or necessary. The protocol 1194 uses timestamps, either extracted from packet headers or struck from 1195 the system clock upon the arrival or departure of a packet. 
1196 Timestamps are precision data and should be restruck in case of link 1197 level retransmission and corrected for the time to compute a MAC on 1198 transmit. 1200 NTP messages make use of two different communication modes, one-to- 1201 one and one-to-many, commonly referred to as unicast and broadcast. 1202 For the purposes of this document, the term broadcast is interpreted 1203 to mean any available one-to-many mechanism. For IPv4 this equates 1204 to either IPv4 broadcast or IPv4 multicast. For IPv6 this equates to 1205 IPv6 multicast. For this purpose, IANA has allocated the IPv4 1206 multicast address 224.0.1.1 and the IPv6 multicast address ending 1207 :101, with prefix determined by scoping rules. 1209 The on-wire protocol uses four timestamps numbered T1 through T4 and 1210 three state variables org, rec and xmt, as shown in Figure 17. This 1211 figure shows the most general case where each of two peers, A and B, 1212 independently measure the offset and delay relative to the other. 1213 For purposes of illustration the individual timestamp values are 1214 shown in lower case with a number indicating the order of 1215 transmission and reception. 1217 t2 t3 t6 t7 1218 +---------+ +---------+ +---------+ +---------+ 1219 T1 | 0 | | t1 | | t3 | | t5 | 1220 +---------+ +---------+ +---------+ +---------+ 1221 T2 | 0 | | t2 | | t4 | | t6 | Packet 1222 +---------+ +---------+ +---------+ +---------+ Variables 1223 T3 | t1 | |t3=clock | | t5 | |t7=clock | 1224 +---------+ +---------+ +---------+ +---------+ 1225 T4 |t2=clock | |t6=clock | 1226 +---------+ +---------+ 1227 Peer B 1228 +---------+ +---------+ +---------+ +---------+ 1229 org | t1 | | t1 | | T3<>t1? | | t5 | 1230 +---------+ +---------+ +---------+ +---------+ State 1231 rec | t2 | | t2 | | t6 | | t6 | Variables 1232 +---------+ +---------+ +---------+ +---------+ 1233 xmt | 0 | | t3 | | T1<>t3? | | t7 | 1234 +---------+ +---------+ +---------+ +---------+ 1236 t2 t3 t6 t7 1237 --------------------------------------------------------- 1238 /\ \ /\ \ 1239 / \ / \ 1240 / \ / \ 1241 / \/ / \/ 1242 --------------------------------------------------------- 1243 t1 t4 t5 t8 1245 t1 t4 t5 t8 1246 +---------+ +---------+ +---------+ +---------+ 1247 T1 | 0 | | t1 | | t3 | | t5 | 1248 +---------+ +---------+ +---------+ +---------+ 1249 T2 | 0 | | t2 | | t4 | | t6 | Packet 1250 +---------+ +---------+ +---------+ +---------+ Variables 1251 T3 |t1=clock | | t3 | |t5=clock | | t7 | 1252 +---------+ +---------+ +---------+ +---------+ 1253 T4 |t4=clock | |t8=clock | 1254 +---------+ +---------+ 1255 Peer A 1256 +---------+ +---------+ +---------+ +---------+ 1257 org | 0 | | T3<>0? | | t3 | | T3<>t3? | 1258 +---------+ +---------+ +---------+ +---------+ State 1259 rec | 0 | | t4 | | t4 | | t8 | Variables 1260 +---------+ +---------+ +---------+ +---------+ 1261 xmt | t1 | | T1=t1? | | t5 | | T1<>t5? | 1262 +---------+ +---------+ +---------+ +---------+ 1264 Figure 17: On-Wire Protocol 1266 In the figure the first packet transmitted by A contains only the 1267 transmit timestamp T3 with value t1. B receives the packet at t2 and 1268 saves the origin timestamp T1 with value t1 in state variable org and 1269 the destination timestamp T4 with value t2 in state variable rec. At 1270 this time or some time later B sends a packet to A containing the org 1271 and rec state variables in T1 and T2, respectively and in addition 1272 the transmit timestamp T3 with value t3, which is saved in the xmt 1273 state variable. 
When this packet arrives at A the packet header 1274 variables T1, T2, T3 and destination timestamp T4 represent the four 1275 timestamps necessary to compute the offset and delay of B relative to 1276 A, as described below. 1278 Before the A state variables are updated, two sanity checks are 1279 performed in order to protect against duplicate or bogus packets. A 1280 packet is a duplicate if the transmit timestamp T3 in the packet 1281 matches the xmt state variable. A packet is bogus if the origin 1282 timestamp T1 in the packet does not match the org state variable. In 1283 either of these cases the state variables are updated, then the 1284 packet is discarded. 1286 The four most recent timestamps, T1 through T4, are used to compute 1287 the offset of B relative to A 1289 theta = T(B) - T(A) = 1/2 * [(T2-T1) + (T4-T3)] 1291 and the round trip delay 1293 delta = T(ABA) = (T4-T1) - (T3-T2). 1295 Note that the quantities within parentheses are computed from 64-bit 1296 unsigned timestamps and result in signed values with 63 significant 1297 bits plus sign. These values can represent dates from 68 years in 1298 the past to 68 years in the future. However, the offset and delay 1299 are computed as sums and differences of these values, which contain 1300 62 significant bits and two sign bits, so can represent unambiguous 1301 values from 34 years in the past to 34 years in the future. In other 1302 words, the time of the client must be set within 34 years of the 1303 server before the service is started. This is a fundamental 1304 limitation with 64-bit integer arithmetic. 1306 In implementations where floating double arithmetic is available, the 1307 first-order differences can be converted to floating double and the 1308 second-order sums and differences computed in that arithmetic. Since 1309 the second-order terms are typically very small relative to the 1310 timestamp magnitudes, there is no loss in significance, yet the 1311 unambiguous range is restored from 34 years to 68 years. 1313 In some scenarios where the initial frequency offset of the client is 1314 relatively large and the actual propagation time small, it is 1315 possible that the delay computation becomes negative. For instance, 1316 if the frequency difference is 100 PPM and the interval T4-T1 is 64 1317 s, the apparent delay is -6.4 ms. Since negative values are 1318 misleading in subsequent computations, the value of delta should be 1319 clamped not less than s.rho, where s.rho is the system precision 1320 described in Section 11.1, expressed in seconds. 1322 The discussion above assumes the most general case where two 1323 symmetric peers independently measure the offsets and delays between 1324 them. In the case of a stateless server, the protocol can be 1325 simplified. A stateless server copies T3 and T4 from the client 1326 packet to T1 and T2 of the server packet and tacks on the transmit 1327 timestamp T3 before sending it to the client. Additional details for 1328 filling in the remaining protocol fields are given in a Section 9 and 1329 following sections and in the appendix. 1331 9. Peer Process 1333 The process descriptions to follow include a listing of the important 1334 state variables followed by an overview of the process operations 1335 implemented as routines. Frequent reference is made to the skeleton 1336 in the appendix. The skeleton includes C-language fragments that 1337 describe the functions in more detail. 
It includes the parameters, 1338 variables and declarations necessary for a conforming NTPv4 1339 implementation. However, many additional variables and routines may 1340 be necessary in a working implementation. 1342 The peer process is called upon arrival of a server or peer packet. 1343 It runs the on-wire protocol to determine the clock offset and round 1344 trip delay and in addition computes statistics used by the system and 1345 poll processes. Peer variables are instantiated in the association 1346 data structure when the structure is initialized and updated by 1347 arriving packets. There is a peer process, poll process and 1348 association for each server. 1350 9.1. Peer Process Variables 1352 Figure 18, Figure 19, Figure 20 and Figure 21 summarize the common 1353 names, formula names and a short description of the peer variables. 1354 The common names and formula names are interchangeable; formula names 1355 are intended for equations where space is important. Unless noted 1356 otherwise, all peer variables have assumed prefix p. 1358 +---------+----------+-----------------------+ 1359 | Name | Formula | Description | 1360 +---------+----------+-----------------------+ 1361 | srcaddr | srcaddr | source address | 1362 | srcport | srcport | source port | 1363 | dstaddr | dstaddr | destination address | 1364 | dstport | destport | destination port | 1365 | keyid | keyid | key identifier key ID | 1366 +---------+----------+-----------------------+ 1368 Figure 18: Peer Process Configuration Variables 1370 +-----------+------------+---------------------+ 1371 | Name | Formula | Description | 1372 +-----------+------------+---------------------+ 1373 | leap | leap | leap indicator | 1374 | version | version | version number | 1375 | mode | mode | mode | 1376 | stratum | stratum | stratum | 1377 | ppoll | ppoll | peer poll exponent | 1378 | rootdelay | delta_r | root delay | 1379 | rootdisp | epsilon_r | root dispersion | 1380 | refid | refid | reference ID | 1381 | reftime | reftime | reference timestamp | 1382 +-----------+------------+---------------------+ 1384 Figure 19: Peer Process Packet Variables 1386 +------+---------+--------------------+ 1387 | Name | Formula | Description | 1388 +------+---------+--------------------+ 1389 | org | T1 | origin timestamp | 1390 | rec | T2 | receive timestamp | 1391 | xmt | T3 | transmit timestamp | 1392 | t | t | packet time | 1393 +------+---------+--------------------+ 1395 Figure 20: Peer Process Timestamp Variables 1396 +--------+---------+-----------------+ 1397 | Name | Formula | Description | 1398 +--------+---------+-----------------+ 1399 | offset | theta | clock offset | 1400 | delay | delta | roundtrip delay | 1401 | disp | epsilon | dispersion | 1402 | jitter | psi | jitter | 1403 | filter | filter | clock filter | 1404 | tp | t_p | filter time | 1405 +--------+---------+-----------------+ 1407 Figure 21: Peer Process Statistics Variables 1409 The following configuration variables are normally initialized when 1410 the association is mobilized, either from a configuration file or 1411 upon arrival of the first packet for an unknown association. 1413 srcaddr: IP address of the remote server or reference clock. This 1414 becomes the destination IP address in packets sent from this 1415 association. 1417 srcport: UDP port number of the server or reference clock. This 1418 becomes the destination port number in packets sent from this 1419 association. 
When operating in symmetric modes (1 and 2) this field 1420 must contain the NTP port number PORT (123) assigned by the IANA. In 1421 other modes it can contain any number consistent with local policy. 1423 dstaddr: IP address of the client. This becomes the source IP 1424 address in packets sent from this association. 1426 dstport: UDP port number of the client, ordinarily the NTP port 1427 number PORT (123) assigned by the IANA. This becomes the source port 1428 number in packets sent from this association. 1430 keyid: Symmetric key ID for the 128-bit MD5 key used to generate and 1431 verify the MAC. The client and server or peer can use different 1432 values, but they must map to the same key. 1434 The variables defined in Figure 19 are updated from the packet header 1435 as each packet arrives. They are interpreted in the same way as the 1436 packet variables of the same names. Note however, unlike the NTPv3 1437 design, the leap and stratum variables are never reset unless the 1438 association is reset, which happens only if the system time is 1439 stepped. It is convenient for later processing to convert the NTP 1440 short format packet values r.rootdelay and r.rootdisp to floating 1441 doubles as peer variables. 1443 The variables defined in Figure 20 include the timestamps exchanged 1444 by the on-wire protocol in Section 8. The t variable is the seconds 1445 counter c.t associated with these values. The c.t variable is 1446 maintained by the clock adjust process described in Section 12. It 1447 counts the seconds since the service was started. The variables 1448 defined in Figure 21 include the statistics computed by the 1449 clock_filter() routine described in Section 10. The tp variable is 1450 the seconds counter associated with these values. 1452 9.2. Peer Process Operations 1454 The receive() routine in Appendix A.5.1 shows the peer process code 1455 flow upon the arrival of a packet. The access() routine in 1456 Appendix A.5.4 implements access restrictions using an access control 1457 list (ACL). There is no specific method required for access control, 1458 although it is recommended that implementations include such a 1459 scheme, which is similar to many others now in widespread use. 1460 Format checks require correct field length and alignment, acceptable 1461 version number (1-4) and correct extension field syntax, if present. 1463 There is no specific requirement for authentication; however, if 1464 authentication is implemented, the symmetric key scheme described in 1465 Appendix A.2 must be among the supported schemes. This scheme uses 1466 the MD5 keyed hash algorithm described in [8]. 1468 Next, the association table is searched for matching source address 1469 and source port using the find_assoc() routine in Appendix A.5.1. 1470 Figure 22 is a dispatch table where the columns correspond to the 1471 packet mode and rows correspond to the association mode. The 1472 intersection of the association and packet modes dispatches 1473 processing to one of the following steps. 1475 +------------------+---------------------------------------+ 1476 | | Packet Mode | 1477 +------------------+-------+-------+-------+-------+-------+ 1478 | Association Mode | 1 | 2 | 3 | 4 | 5 | 1479 +------------------+-------+-------+-------+-------+-------+ 1480 | No Association 0 | NEWPS | DSCRD | FXMIT | MANY | NEWBC | 1481 | Symm. Active 1 | PROC | PROC | DSCRD | DSCRD | DSCRD | 1482 | Symm. 
Passive 2 | PROC | ERR | DSCRD | DSCRD | DSCRD | 1483 | Client 3 | DSCRD | DSCRD | DSCRD | PROC | DSCRD | 1484 | Server 4 | DSCRD | DSCRD | DSCRD | DSCRD | DSCRD | 1485 | Broadcast 5 | DSCRD | DSCRD | DSCRD | DSCRD | DSCRD | 1486 | Bcast Client 6 | DSCRD | DSCRD | DSCRD | DSCRD | PROC | 1487 +------------------+-------+-------+-------+-------+-------+ 1489 Figure 22: Peer Dispatch Table 1491 DSCRD. This is a nonfatal violation of protocol as the result of a 1492 programming error, long delayed packet or replayed packet. The peer 1493 process discards the packet and exits. 1495 ERR. This is a fatal violation of protocol as the result of a 1496 programming error, long delayed packet or replayed packet. The peer 1497 process discards the packet, demobilizes the symmetric passive 1498 association and exits. 1500 FXMIT. This is a client (mode 3) packet matching no association 1501 (mode 0). If the destination address is not a broadcast address, the 1502 server constructs a server (mode 4) packet and returns it to the 1503 client without retaining state. The server packet header is 1504 constructed by the fast_xmit() routine in Appendix A.5.3. The packet 1505 header is assembled from the receive packet and system variables as 1506 shown in Figure 23. If the s.rootdelay and s.rootdisp system 1507 variables are stored in floating double, they must be converted to 1508 NTP short format first. 1510 +-----------------------------------+ 1511 | Packet Variable --> Variable | 1512 +-----------------------------------+ 1513 | r.leap --> p.leap | 1514 | r.mode --> p.mode | 1515 | r.stratum --> p.stratum | 1516 | r.poll --> p.ppoll | 1517 | r.rootdelay --> p.rootdelay | 1518 | r.rootdisp --> p.rootdisp | 1519 | r.refid --> p.refid | 1520 | r.reftime --> p.reftime | 1521 | r.keyid --> p.keyid | 1522 +-----------------------------------+ 1524 Figure 23: Receive Packet Header 1526 Note that, if authentication fails, the server returns a special 1527 message called a crypto-NAK. This message includes the normal NTP 1528 header data shown in Figure 9, but with a MAC consisting of four 1529 octets of zeros. The client MAY accept or reject the data in the 1530 message. After these actions the peer process exits. 1532 If the destination address is a multicast address, the sender is 1533 operating in manycast client mode. If the packet is valid and the 1534 server stratum is less than the client stratum, the server sends an 1535 ordinary server (mode 4) packet, but using its unicast destination 1536 address. A crypto-NAK is not sent if authentication fails. After 1537 these actions the peer process exits. 1539 MANY: This is a server (mode 4) packet matching no association. 1541 Ordinarily, this can happen only as the result of a manycast server 1542 reply to a previously sent multicast client packet. If the packet is 1543 valid, an ordinary client (mode 3) association is mobilized and 1544 operation continues as if the association was mobilized by the 1545 configuration file. 1547 NEWBC. This is a broadcast (mode 5) packet matching no association. 1548 The client mobilizes either a client (mode 3) or broadcast client 1549 (mode 6) association as shown in the mobilize() and clear() routines 1550 in Appendix A.2. Then the packet() routine in Appendix A.5.1.1 1551 validates the packet and initializes the peer variables. 1553 If the implementation supports no additional security or calibration 1554 functions, the association mode is set to broadcast client (mode 6) 1555 and the peer process exits. 
Implementations supporting public key authentication MAY run the Autokey or equivalent security protocol.  Implementations SHOULD set the association mode to 3 and run a short client/server exchange to determine the propagation delay.  Following the exchange the association mode is set to 6 and the peer process continues in listen-only mode.  Note the distinction between a mode-6 packet, which is reserved for the NTP monitor and control functions, and a mode-6 association.

NEWPS.  This is a symmetric active (mode 1) packet matching no association.  The client mobilizes a symmetric passive (mode 2) association as shown in the mobilize() and clear() routines in Appendix A.2.  Processing continues in the PROC section below.

PROC.  The packet matches an existing association.  The packet timestamps are carefully checked to avoid invalid, duplicate or bogus packets.  Additional checks are summarized in Figure 24.  Note that all packets, including a crypto-NAK, are considered valid only if they survive these tests.

   +--------------------------+----------------------------------------+
   | Packet Type              | Description                            |
   +--------------------------+----------------------------------------+
   | 1 duplicate packet       | The packet is at best an old duplicate |
   |                          | or at worst a replay by a hacker.      |
   |                          | This can happen in symmetric modes if  |
   |                          | the poll intervals are uneven.         |
   | 2 bogus packet           | The packet is not a reply to the most  |
   |                          | recent packet sent; the origin         |
   |                          | timestamp does not match the org       |
   |                          | state variable (see Section 8).        |
   | 3 invalid                | One or more timestamp fields are       |
   |                          | invalid.  This normally happens in     |
   |                          | symmetric modes when one peer sends    |
   |                          | the first packet to the other and      |
   |                          | before the other has received its      |
   |                          | first reply.                           |
   | 4 access denied          | The access controls have blacklisted   |
   |                          | the source address.                    |
   | 5 authentication failure | The cryptographic message digest does  |
   |                          | not match the MAC.                     |
   | 6 unsynchronized         | The server is not synchronized to a    |
   |                          | valid source.                          |
   | 7 bad header data        | One or more header fields are invalid. |
   +--------------------------+----------------------------------------+

   Figure 24: Packet Error Checks

Processing continues in the packet() routine in Appendix A.5.1.1.  It copies the packet variables to the peer variables as shown in Figure 23 and the packet() routine in Appendix A.5.2.  The receive() routine implements tests 1-5 in Figure 24; the packet() routine implements tests 6-7.  If errors are found the packet is discarded and the peer process exits.

The on-wire protocol calculates the clock offset theta and round trip delay delta from the four most recent timestamps as described in Section 8.  While it is in principle possible to do all calculations except the first-order timestamp differences in fixed-point arithmetic, it is much easier to convert the first-order differences to floating doubles and do the remaining calculations in that arithmetic, and this will be assumed in the following description.

Next, the 8-bit p.reach shift register in the poll process described in Section 13 is used to determine whether the server is reachable and the data are fresh.  The register is shifted left by one bit when a packet is sent and the rightmost bit is set to zero.  As valid packets arrive, the packet() routine sets the rightmost bit to one.  If the register contains any nonzero bits, the server is considered reachable; otherwise, it is unreachable.
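The reach register logic amounts to a one-octet shift register and three trivial operations.  The fragment below is a minimal sketch in the style of the C-language skeleton, not the code in the appendix; the structure and routine names (struct ntp_peer, poll_shift(), packet_ok() and is_reachable()) are invented for this illustration.

   #include <stdint.h>

   /* Illustrative peer structure holding only the reach register. */
   struct ntp_peer {
           uint8_t reach;          /* 8-bit reachability shift register */
   };

   /* Poll process, each time a packet is sent to the server: shift the
    * register left one bit, with zero replacing the rightmost bit. */
   static void
   poll_shift(struct ntp_peer *p)
   {
           p->reach <<= 1;
   }

   /* Peer process, each time a valid packet arrives: set the rightmost
    * bit to one. */
   static void
   packet_ok(struct ntp_peer *p)
   {
           p->reach |= 1;
   }

   /* The server is considered reachable if any of the eight bits is lit. */
   static int
   is_reachable(const struct ntp_peer *p)
   {
           return p->reach != 0;
   }

In the skeleton the reach register is kept in the association data structure along with the other poll process variables.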
Since the peer poll interval might have changed since the last packet, the poll_update() routine in Appendix A.5.7.2 is called to redetermine the host poll interval.

The dispersion statistic epsilon(t) represents the maximum error due to the frequency tolerance and the time since the last packet was sent.  It is initialized

   epsilon(t_0) = r.rho + s.rho + PHI * (T4-T1)

when the measurement is made at t_0 according to the seconds counter.  Here r.rho is the packet precision described in Section 7.3 and s.rho is the system precision described in Section 11.1, both expressed in seconds.  These terms are necessary to account for the uncertainty in reading the system clock in both the server and the client.

The dispersion then grows at constant rate PHI; in other words, at time t, epsilon(t) = epsilon(t_0) + PHI * (t-t_0).  With the default value PHI = 15 PPM, this amounts to about 1.3 s per day.  With this understanding, the argument t will be dropped and the dispersion represented simply as epsilon.  The remaining statistics are computed by the clock filter algorithm described in the next section.

10.  Clock Filter Algorithm

The clock filter algorithm, part of the peer process, is implemented in the clock_filter() routine in Appendix A.5.2.  It grooms the stream of on-wire data to select the samples most likely to represent accurate time.  The algorithm produces the variables shown in Figure 21, including the offset (theta), delay (delta), dispersion (epsilon), jitter (psi) and time of arrival (t).  These data are used by the mitigation algorithms to determine the best and final offset used to discipline the system clock.  They are also used to determine the server health and whether it is suitable for synchronization.

The clock filter algorithm saves the most recent sample tuples (theta, delta, epsilon, t) in the filter structure, which functions as an 8-stage shift register.  The tuples are saved in the order that packets arrive.  Here t is the packet time of arrival according to the seconds counter and should not be confused with the peer variable tp.

The following scheme is used to insure sufficient samples are in the filter and that old stale data are discarded.  Initially, the tuples of all stages are set to the dummy tuple (0, MAXDISP, MAXDISP, 0).  As valid packets arrive, tuples are shifted into the filter causing old tuples to be discarded, so eventually only valid tuples remain.  If the three low order bits of the reach register are zero, indicating three poll intervals have expired with no valid packets received, the poll process calls the clock filter algorithm with a dummy tuple just as if the tuple had arrived from the network.  If this persists for eight poll intervals, the register returns to the initial condition.

In the next step the shift register stages are copied to a temporary list and the list sorted by increasing delta.  Let i index the stages starting with the lowest delta.  If the first tuple epoch t_0 is not later than the last valid sample epoch p.t, the routine exits without affecting the current peer variables.  Otherwise, let epsilon_i be the dispersion of the ith entry; then

   capepsilon = SUM (i = 0 ... n-1) [ epsilon_i / 2^(i+1) ]

is the peer dispersion p.disp.  Note the overloading of the symbol epsilon here; whether it refers to the clock filter input or the output should be clear from context.
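The summation can be computed directly from the sorted temporary list.  The fragment below is an informal sketch, not the clock_filter() code in Appendix A.5.2; the structure and routine names (struct ntp_sample, peer_dispersion()) are invented for this illustration.  Because the dummy stages carry dispersion MAXDISP, they are simply included in the sum, which reproduces the behavior noted in the next paragraph.

   #include <stddef.h>

   #define NSTAGE 8                /* number of clock filter stages */

   /* Illustrative filter stage tuple (theta, delta, epsilon, t). */
   struct ntp_sample {
           double offset;          /* theta */
           double delay;           /* delta */
           double disp;            /* epsilon */
           double t;               /* arrival time (seconds counter) */
   };

   /* Peer dispersion over a list already sorted by increasing delay:
    * epsilon = SUM (i = 0 ... NSTAGE-1) [ disp[i] / 2^(i+1) ]. */
   static double
   peer_dispersion(const struct ntp_sample sorted[NSTAGE])
   {
           double epsilon = 0.0;
           double weight = 1.0;
           size_t i;

           for (i = 0; i < NSTAGE; i++) {
                   weight /= 2.0;          /* 1/2, 1/4, 1/8, ... */
                   epsilon += weight * sorted[i].disp;
           }
           return epsilon;
   }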
The observer should note (a) if all stages contain the dummy tuple with dispersion MAXDISP, the computed dispersion is a little less than 16 s, (b) each time a valid tuple is shifted into the register, the dispersion drops by a little less than half, depending on the valid tuple's dispersion, and (c) after the fourth valid packet the dispersion is usually a little less than 1 s, which is the assumed value of the MAXDIST parameter used by the selection algorithm to determine whether the peer variables are acceptable or not.

Let the first stage offset in the sorted list be theta_0; then, for the other stages in any order, the jitter is the RMS average

   psi = [ (1 / (n-1)) * SUM (j = 1 ... n-1) [ (theta_0 - theta_j)^2 ] ]^(1/2)

where n is the number of valid tuples in the filter (n > 1).  In order to insure consistency and avoid divide exceptions in other computations, psi is bounded from below by the system precision s.rho expressed in seconds.  While not in general considered a major factor in ranking server quality, jitter is a valuable indicator of fundamental timekeeping performance and network congestion state.  Of particular importance to the mitigation algorithms is the peer synchronization distance, which is computed from the delay and dispersion:

   lambda = (delta / 2) + epsilon.

Note that epsilon and therefore lambda increase at rate PHI.  Lambda is not a state variable, since it is recalculated at each use.  It is a component of the root synchronization distance used by the mitigation algorithms as a metric to evaluate the quality of time available from each server.

It is important to note that, unlike NTPv3, NTPv4 associations do not show a timeout condition by setting the stratum peer variable to 16.  In NTPv4 lambda increases with time, so eventually the synchronization distance exceeds the distance threshold MAXDIST, in which case the association is considered unfit for synchronization.

11.  System Process

As each new sample (theta, delta, epsilon, jitter, t) is produced by the clock filter algorithm, all peer processes are scanned by the mitigation algorithms, consisting of the selection, cluster, combine and clock discipline algorithms in the system process.  The selection algorithm scans all associations and casts off the falsetickers, which have demonstrably incorrect time, leaving the truechimers as the result.  In a series of rounds the cluster algorithm discards the association statistically furthest from the centroid until a specified minimum number of survivors remain.  The combine algorithm produces the best and final statistics on a weighted average basis.  The final offset is passed to the clock discipline algorithm to steer the system clock to the correct time.

The cluster algorithm selects one of the survivors as the system peer.
The associated statistics (theta, delta, epsilon, jitter, t) 1757 are used to construct the system variables inherited by dependent 1758 servers and clients and made available to other applications running 1759 on the same machine. 1761 11.1. System Process Variables 1763 Figure 27 summarizes the common names, formula names and a short 1764 description of each system variable. Unless noted otherwise, all 1765 variables have assumed prefix s. 1767 +-----------+------------+------------------------+ 1768 | Name | Formula | Description | 1769 +-----------+------------+------------------------+ 1770 | t | t | update time | 1771 | p | p | system peer identifier | 1772 | leap | leap | leap indicator | 1773 | stratum | stratum | stratum | 1774 | precision | rho | precision | 1775 | offset | THETA | combined offset | 1776 | jitter | PSI | combined jitter | 1777 | rootdelay | DELTA | root delay | 1778 | rootdisp | EPSILON | root dispersion | 1779 | refid | refid | reference ID | 1780 | reftime | reftime | reference time | 1781 | NMIN | 3 | minimum survivors | 1782 | CMIN | 1 | minimum candidates | 1783 +-----------+------------+------------------------+ 1785 Figure 27: System Process Variables 1787 Except for the t, p, offset and jitter variables and the NMIN and 1788 CMIN constants, the variables have the same format and interpretation 1789 as the peer variables of the same name. The NMIN and CMIN parameters 1790 are used by the selection and cluster algorithms described in the 1791 next section. 1793 The t variable is the seconds counter at the last update determined 1794 by the clock_update() routine in Appendix A.5.5.4. The p variable is 1795 the system peer identifier determined by the cluster() routine in 1796 Section 11.2.2. The precision variable has the same format as the 1797 packet variable of the same name. The precision is defined as the 1798 larger of the resolution and time to read the clock, in log2 units. 1799 For instance, the precision of a mains-frequency clock incrementing 1800 at 60 Hz is 16 ms, even when the system clock hardware representation 1801 is to the nanosecond. 1803 The offset and jitter variables are determined by the combine() 1804 routine in Section 11.2.3. These values represent the best and final 1805 offset and jitter used to discipline the system clock. Initially, 1806 all variables are cleared to zero, then the leap is set to 3 1807 (unsynchronized) and stratum is set to MAXSTRAT (16). Remember that 1808 MAXSTRAT is mapped to zero in the transmitted packet. 1810 11.2. System Process Operations 1812 Figure 28 summarizes the system process operations performed by the 1813 clock_select() routine. The selection algorithm described in 1814 Section 11.2.1 produces a majority clique of presumed correct 1815 candidates (truechimers) based on agreement principles. The cluster 1816 algorithm described in Section 11.2.2 discards outlyers to produce 1817 the most accurate survivors. The combine algorithm described in 1818 Section 11.2.3 provides the best and final offset for the clock 1819 discipline algorithm described in Appendix A.5.5.6. If the selection 1820 algorithm cannot produce a majority clique, or if it cannot produce 1821 at least CMIN survivors, the system process exits without 1822 disciplining the system clock. If successful, the cluster algorithm 1823 selects the statistically best candidate as the system peer and its 1824 variables are inherited as the system variables. 
1826 +-----------------+ 1827 | clock_select() | 1828 +-----------------+ 1829 ................................|........... 1830 . V . 1831 . yes +---------+ +-----------------+ . 1832 . +--| accept? | | scan candidates | . 1833 . | +---------+ | | . 1834 . V no | | | . 1835 . +---------+ | | | . 1836 . | add peer| | | | . 1837 . +---------- | | | . 1838 . | V | | . 1839 . +---------->-->| | . 1840 . | | . 1841 . Selection Algorithm +-----------------+ . 1842 .................................|.......... 1843 V 1844 no +-------------------+ 1845 +-------------| survivors? | 1846 | +-------------------+ 1847 | | yes 1848 | V 1849 | +-------------------+ 1850 | | Cluster Algorithm | 1851 | +-------------------+ 1852 | | 1853 | V 1854 V yes +-------------------+ 1855 |<------------| n < CMIN? | 1856 | +-------------------+ 1857 V | 1858 +-----------------+ V no 1859 | s.p = NULL | +-------------------+ 1860 +-----------------+ | s.p = vo.p | 1861 | +-------------------+ 1862 V | 1863 +-----------------+ V 1864 | return (UNSYNC) | +-------------------+ 1865 +-----------------+ | return (SYNC) | 1866 +-------------------+ 1868 Figure 28: clock_select() Routine 1870 11.2.1. Selection Algorithm 1872 Note that the selection and cluster algorithms are described 1873 separately, but combined in the code skeleton. The selection 1874 algorithm operates to find an intersection interval containing a 1875 majority clique of truechimers using Byzantine agreement principles 1876 originally proposed by Marzullo [9], but modified to improve 1877 accuracy. An overview of the algorithm is given below and in the 1878 first half of the clock_select() routine in Appendix A.5.5.1. 1880 First, those servers which are unusable according to the rules of the 1881 protocol are detected and discarded by the accept() routine in 1882 Appendix A.5.5.3. Next, a set of tuples (p, type, edge) is generated 1883 for the remaining candidates. Here, p is the association identifier 1884 and type identifies the upper (+1), middle (0) and lower (-1) 1885 endpoints of a correctness interval centered on theta for that 1886 candidate. This results in three tuples, lowpoint (p, -1, theta - 1887 lambda), midpoint (p, 0, theta) and highpoint (p, +1, theta + 1888 lambda), where lambda is the root synchronization distance calculated 1889 on each use by the rootdist() routine in Appendix A.5.1.1. The steps 1890 of the algorithm are: 1892 1. For each of m associations, place three tuples as defined above 1893 on the candidate list. 1895 2. Sort the tuples on the list by the edge component. Order the 1896 lowpoint, midpoint and highpoint of these intervals from lowest to 1897 highest. Set the number of falsetickers f = 0. 1899 3. Set the number of midpoints d = 0. Set c = 0. Scan from lowest 1900 endpoint to highest. Add one to c for every lowpoint, subtract one 1901 for every highpoint, add one to d for every midpoint. If c >= m - f, 1902 stop; set l = current lowpoint. 1904 4. Set c = 0. Scan from highest endpoint to lowest. Add one to c 1905 for every highpoint, subtract one for every lowpoint, add one to d 1906 for every midpoint. If c >= m - f, stop; set u = current highpoint. 1908 5. Is d = f and l < u? If yes, then follow step 5A; else, follow 1909 step 5B. 1911 5A. Success: the intersection interval is [l, u]. 1913 5B. Add one to f. Is f < (m / 2)? If yes, then go to step 3 again. 1914 If no, then go to step 6. 1916 6. Failure; a majority clique could not be found. There are no 1917 suitable candidates to discipline the system clock. 
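For concreteness, the steps above can be expressed as a short fragment in the style of the C-language skeleton.  This is an informal sketch, not the clock_select() code in Appendix A.5.5.1; the structure and routine names (struct endpoint, by_edge(), majority_clique()) are invented for this illustration, and the caller is assumed to have filled list[] with the three endpoint tuples of each of the m candidates.

   #include <stdlib.h>

   /* One endpoint tuple: type is -1 (lowpoint), 0 (midpoint),
    * +1 (highpoint). */
   struct endpoint {
           double edge;
           int    type;
   };

   static int
   by_edge(const void *a, const void *b)
   {
           const struct endpoint *x = a, *y = b;

           return (x->edge > y->edge) - (x->edge < y->edge);
   }

   /* Steps 1-6: list[] holds 3 * m endpoints.  On success the
    * intersection interval is stored in *l and *u and the number of
    * falsetickers f is returned; -1 means no majority clique found. */
   static int
   majority_clique(struct endpoint list[], int m, double *l, double *u)
   {
           int n = 3 * m, i, c, d, f;

           qsort(list, n, sizeof(list[0]), by_edge);       /* step 2 */
           for (f = 0; 2 * f < m; f++) {
                   double lo = 0, hi = 0;
                   int have_lo = 0, have_hi = 0;

                   c = 0;
                   d = 0;
                   for (i = 0; i < n; i++) {               /* step 3 */
                           if (list[i].type < 0) {
                                   if (++c >= m - f) {
                                           lo = list[i].edge;
                                           have_lo = 1;
                                           break;
                                   }
                           } else if (list[i].type > 0) {
                                   c--;
                           } else {
                                   d++;
                           }
                   }
                   c = 0;
                   for (i = n - 1; i >= 0; i--) {          /* step 4 */
                           if (list[i].type > 0) {
                                   if (++c >= m - f) {
                                           hi = list[i].edge;
                                           have_hi = 1;
                                           break;
                                   }
                           } else if (list[i].type < 0) {
                                   c--;
                           } else {
                                   d++;
                           }
                   }
                   if (have_lo && have_hi && d == f && lo < hi) {
                           *l = lo;                        /* step 5A */
                           *u = hi;
                           return f;
                   }                                       /* else step 5B */
           }
           return -1;                                      /* step 6 */
   }

The interval [l, u] is then used to identify the truechimers described in the following paragraphs.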
The algorithm is described in detail in Appendix A.5.5.1.  Note that it starts with the assumption that there are no falsetickers (f = 0) and attempts to find a nonempty intersection interval containing the midpoints of all correct servers, i.e., truechimers.  If a nonempty interval cannot be found, it increases the number of assumed falsetickers by one and tries again.  If a nonempty interval is found and the number of falsetickers is less than the number of truechimers, a majority clique has been found and the midpoint of each truechimer (theta) represents the candidates available to the cluster algorithm.

If a majority clique is not found, or if the number of truechimers is less than CMIN, there are insufficient candidates to discipline the system clock.  CMIN defines the minimum number of servers consistent with the correctness requirements.  Suspicious operators would set CMIN to insure multiple redundant servers are available for the algorithms to mitigate properly.  However, for historic reasons the default value for CMIN is one.

11.2.2.  Cluster Algorithm

The candidates of the majority clique are placed on the survivor list in the form of tuples (p, theta_p, psi_p, lambda_p), where p is an association identifier, theta_p, psi_p, and stratum_p are the current offset, jitter and stratum of association p, respectively, and lambda_p is a merit factor equal to stratum_p * MAXDIST + lambda, where lambda is the root synchronization distance for association p.  The list is processed by the cluster algorithm below and the second half of the clock_select() algorithm in Appendix A.5.5.1.

1.  Let (p, theta_p, psi_p, lambda_p) represent a survivor candidate.

2.  Sort the candidates by increasing lambda_p.  Let n be the number of candidates and NMIN the minimum required number of survivors.

3.  For each candidate compute the selection jitter psi_s:

   psi_s = [ (1 / (n-1)) * SUM (j = 1 ... n-1) [ (theta_s - theta_j)^2 ] ]^(1/2)

4.  Select psi_max as the candidate with maximum psi_s.

5.  Select psi_min as the candidate with minimum psi_p.

6.  Is psi_max < psi_min or n <= NMIN?  If yes, follow step 6A; otherwise, follow step 6B.

6A.  Done.  The remaining candidates on the survivor list are ranked in the order of preference.  The first entry on the list represents the system peer; its variables are used later to update the system variables.

6B.  Delete the outlyer candidate with psi_max; reduce n by one and go back to step 3.

The algorithm operates in a series of rounds where each round discards the statistical outlyer with maximum selection jitter psi_s.  However, if psi_s is less than the minimum peer jitter psi_p, no improvement is possible by discarding outlyers.  This and the minimum number of survivors represent the terminating conditions of the algorithm.  Upon termination, the final value of psi_max is saved as the system selection jitter PSI_s for use later.

11.2.3.  Combine Algorithm

The remaining survivors are processed by the clock_combine() routine in Appendix A.5.5.5 to produce the best and final data for the clock discipline algorithm.  The clock_combine() routine processes peer offset and jitter statistics to produce the combined system offset THETA and system peer jitter PSI_p, where each server statistic is weighted by the reciprocal of the root synchronization distance and the result normalized.
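As an informal illustration, the weighting can be written as follows.  This fragment is not the clock_combine() code in Appendix A.5.5.5; the structure and routine names (struct survivor, combine()) are invented for this illustration, and computing the jitter term relative to the first survivor (the prospective system peer) is one plausible reading of the description above.

   #include <math.h>

   /* Illustrative survivor entry: combine needs only these statistics. */
   struct survivor {
           double offset;          /* theta for this association */
           double rootdist;        /* root synchronization distance */
   };

   /* Weighted combine: each survivor is weighted by the reciprocal of
    * its root synchronization distance and the result normalized.  The
    * first survivor (index 0) is the prospective system peer. */
   static void
   combine(const struct survivor v[], int n, double *theta, double *psi_p)
   {
           double wsum = 0, osum = 0, jsum = 0;
           int i;

           for (i = 0; i < n; i++) {
                   double w = 1.0 / v[i].rootdist;
                   double d = v[i].offset - v[0].offset;

                   wsum += w;
                   osum += w * v[i].offset;
                   jsum += w * d * d;
           }
           *theta = osum / wsum;           /* combined system offset THETA */
           *psi_p = sqrt(jsum / wsum);     /* system peer jitter PSI_p */
   }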
The combined THETA is passed to the clock_update() routine in Appendix A.5.5.4.  The first candidate on the survivor list is nominated as the system peer with identifier p.  The system peer jitter PSI_p is a component of the system jitter PSI.  It is used along with the selection jitter PSI_s to produce the system jitter:

   PSI = [ (PSI_s)^2 + (PSI_p)^2 ]^(1/2)

Each time an update is received from the system peer, the clock_update() routine in Appendix A.5.5.4 is called.  By rule, an update is discarded if its time of arrival p.t is not strictly later than the last update used s.t.  The labels IGNORE, PANIC, ADJ and STEP refer to return codes from the local_clock() routine described in the next section.

IGNORE means the update has been ignored as an outlyer.  PANIC means the offset is greater than the panic threshold PANICT (1000 s) and SHOULD cause the program to exit with a diagnostic message to the system log.  STEP means the offset is less than the panic threshold, but greater than the step threshold STEPT (125 ms).  Since this means all peer data have been invalidated, all associations MUST be reset and the client begins as at initial start.

ADJ means the offset is less than the step threshold and thus a valid update.  In this case the system variables are updated from the peer variables as shown in Figure 30.

   +-------------------------------------------+
   | System Variable <-- System Peer Variable  |
   +-------------------------------------------+
   | s.leap      <-- p.leap                    |
   | s.stratum   <-- p.stratum + 1             |
   | s.offset    <-- THETA                     |
   | s.jitter    <-- PSI                       |
   | s.rootdelay <-- p.delta_r + delta         |
   | s.rootdisp  <-- p.epsilon_r + p.epsilon + |
   |                 p.psi + PHI * (s.t - p.t) |
   |                 + |THETA|                 |
   | s.refid     <-- p.refid                   |
   | s.reftime   <-- p.reftime                 |
   | s.t         <-- p.t                       |
   +-------------------------------------------+

   Figure 30: System Variables Update

There is an important detail not shown.  The dispersion increment (p.epsilon + p.psi + PHI * (s.t - p.t) + |THETA|) is bounded from below by MINDISP.  In subnets with very fast processors and networks and very small delay and dispersion this forces a monotone-definite increase in s.rootdisp (EPSILON), which avoids loops between peers operating at the same stratum.

The system variables are available to dependent application programs as nominal performance statistics.  The system offset THETA is the clock offset relative to the available synchronization sources.  The system jitter PSI is an estimate of the error in determining this value, elsewhere called the expected error.  The root delay DELTA is the total round trip delay relative to the primary server.  The root dispersion EPSILON is the dispersion accumulated over the network from the primary server.  Finally, the root synchronization distance is defined as

   LAMBDA = EPSILON + DELTA / 2,

which represents the maximum error due to all causes and is designated the root synchronization distance.

11.3.
Clock Discipline Algorithm 2067 The NTPv4 clock discipline algorithm, shortened to discipline in the 2068 following, functions as a combination of two philosophically quite 2069 different feedback control systems. In a phase-locked loop (PLL) 2070 design, periodic phase updates at update intervals mu seconds are 2071 used directly to minimize the time error and indirectly the frequency 2072 error. In a frequency-locked loop (FLL) design, periodic frequency 2073 updates at intervals mu are used directly to minimize the frequency 2074 error and indirectly the time error. As shown in [5], a PLL usually 2075 works better when network jitter dominates, while a FLL works better 2076 when oscillator wander dominates. This section contains an outline 2077 of how the NTPv4 design works. An in-depth discussion of the design 2078 principles is provided in [5], which also includes a performance 2079 analysis. 2081 The discipline is implemented as the feedback control system shown in 2082 Figure 31. The variable theta_r represents the combine algorithm 2083 offset (reference phase) and theta_c the VFO offset (control phase). 2084 Each update produces a signal V_d representing the instantaneous 2085 phase difference theta_r - theta_c. The clock filter for each server 2086 functions as a tapped delay line, with the output taken at the tap 2087 selected by the clock filter algorithm. The selection, cluster and 2088 combine algorithms combine the data from multiple filters to produce 2089 the signal V_s. The loop filter, with impulse response F(t), 2090 produces the signal V_c which controls the VFO frequency omega_c and 2091 thus the integral of the phase theta_c which closes the loop. The 2092 V_c signal is generated by the clock adjust process in Section 12. 2093 The detailed equations that implement these functions are best 2094 presented in the routines of Appendix A.5.5.6 and Appendix A.5.6.1. 2096 theta_r + +---------\ +----------------+ 2097 NTP --------->| Phase \ V_d | | V_s 2098 theta_c - | Detector ------>| Clock Filter |----+ 2099 +-------->| / | | | 2100 | +---------/ +----------------+ | 2101 | | 2102 ----- | 2103 / \ | 2104 | VFO | | 2105 \ / | 2106 ----- ....................................... | 2107 ^ . Loop Filter . | 2108 | . +---------+ x +-------------+ . | 2109 | V_c . | |<-----| | . | 2110 +------.-| Clock | y | Phase/Freq |<---------+ 2111 . | Adjust |<-----| Prediction | . 2112 . | | | | . 2113 . +---------+ +-------------+ . 2114 ....................................... 2116 Figure 31: Clock Discipline Feedback Loop 2118 Ordinarily, the pseudo-linear feedback loop described above operates 2119 to discipline the system clock. However, there are cases where a 2120 nonlinear algorithm offers considerable improvement. One case is 2121 when the discipline starts without knowledge of the intrinsic clock 2122 frequency. The pseudo-linear loop takes several hours to develop an 2123 accurate measurement and during most of that time the poll interval 2124 cannot be increased. The nonlinear loop described below does this in 2125 15 minutes. Another case is when occasional bursts of large jitter 2126 are present due to congested network links. The state machine 2127 described below resists error bursts lasting less than 15 minutes. 2129 Figure 32 contains a summary of the variables and parameters 2130 including the variables (lower case) or parameters (upper case) name, 2131 formula name and short description. Unless noted otherwise, all 2132 variables have assumed prefix c. 
The variables t, tc, state, hyster 2133 and count are integers; the remaining variables are floating doubles. 2134 The function of each will be explained in the algorithm descriptions 2135 below. 2137 +--------+------------+--------------------------+ 2138 | Name | Formula | Description | 2139 +--------+------------+--------------------------+ 2140 | t | timer | seconds counter | 2141 | offset | theta | combined offset | 2142 | resid | theta_r | residual offset | 2143 | freq | phi | clock frequency | 2144 | jitter | psi | clock offset jitter | 2145 | wander | omega | clock frequency wander | 2146 | tc | tau | time constant (log2) | 2147 | state | state | state | 2148 | adj | adj | frequency adjustment | 2149 | hyster | hyster | hysteresis counter | 2150 | STEPT | 125 | step threshold (.125 s) | 2151 | WATCH | 900 | stepout thresh(s) | 2152 | PANICT | 1000 | panic threshold (1000 s) | 2153 | LIMIT | 30 | hysteresis limit | 2154 | PGATE | 4 | hysteresis gate | 2155 | TC | 16 | time constant scale | 2156 | AVG | 8 | averaging constant | 2157 +--------+------------+--------------------------+ 2159 Figure 32: Clock Discipline Variables and Parameters 2161 The discipline is implemented by the local_clock() routine, which is 2162 called from the clock_update() routine. The local_clock() routine in 2163 Appendix A.5.5.6 has two parts; the first implements the clock state 2164 machine and the second determines the time constant and thus the poll 2165 interval. 2167 The local_clock() routine exits immediately if the offset is greater 2168 than the panic threshold PANICT (1000 s). The state transition 2169 function is implemented by the rstclock() function in 2170 Appendix A.5.5.7. Figure 33 shows the state transition function used 2171 bu this routine. It has four columns showing respectively the state 2172 name, predicate and action if the offset theta is less than the step 2173 threshold, the predicate and actions otherwise, and finally some 2174 comments. 2176 +-------+---------------------+-------------------+--------------+ 2177 | State | theta < STEP | theta > STEP | Comments | 2178 +-------+---------------------+-------------------+--------------+ 2179 | NSET | ->FREQ | ->FREQ | no frequency | 2180 | | adjust time | step time | file | 2181 +-------+---------------------+-------------------+--------------+ 2182 | FSET | ->SYNC | ->SYNC | frequency | 2183 | | adjust time | step time | file | 2184 +-------+---------------------+-------------------+--------------+ 2185 | SPIK | ->SYNC | if < 900 s ->SPIK | outlyer | 2186 | | adjust freq | else ->SYNC | detected | 2187 | | adjust time | step freq | | 2188 | | | step time | | 2189 +-------+---------------------+-------------------+--------------+ 2190 | FREQ | if < 900 s ->FREQ | if < 900 s ->FREQ | initial | 2191 | | else ->SYNC | else ->SYNC | frequency | 2192 | | step freq | step freq | | 2193 | | adjust time | adjust time | | 2194 +-------+---------------------+-------------------+--------------+ 2195 | SYNC | ->SYNC | if < 900 s ->SPIK | normal | 2196 | | adjust freq | else ->SYNC | operation | 2197 | | adjust time | step freq | | 2198 | | | step time | | 2199 +-------+---------------------+-------------------+--------------+ 2201 Figure 33: State Transition Function 2203 In the table entries the next state is identified by the arrow -> 2204 with the actions listed below. Actions such as adjust time and 2205 adjust frequency are implemented by the PLL/FLL feedback loop in the 2206 local_clock() routine. 
A step clock action is implemented by setting 2207 the clock directly, but this is done only after the stepout threshold 2208 WATCH (900 s) when the offset is more than the step threshold STEPT 2209 (.125 s). This resists clock steps under conditions of extreme 2210 network congestion. 2212 The jitter (psi) and wander (omega) statistics are computed using an 2213 exponential average with weight factor AVG. The time constant 2214 exponent (tau) is determined by comparing psi with the magnitude of 2215 the current offset theta. If the offset is greater than PGATE (4) 2216 times the clock jitter, the hysteresis counter hyster is reduced by 2217 two; otherwise, it is increased by one. If hyster increases to the 2218 upper limit LIMIT (30), tau is increased by one; if it decreases to 2219 the lower limit -LIMIT (-30), tau is decreased by one. Normally, tau 2220 hovers near MAXPOLL, but quickly decreases if a temperature spike 2221 causes a frequency surge. 2223 12. Clock Adjust Process 2225 The actual clock adjustment is performed by the clock_adjust() 2226 routine in Appendix Appendix A.5.6.1. It runs at one-second 2227 intervals to add the frequency correction and a fixed percentage of 2228 the residual offset theta_r. The theta_r is in effect the 2229 exponential decay of the theta value produced by the loop filter at 2230 each update. The TC parameter scales the time constant to match the 2231 poll interval for convenience. Note that the dispersion EPSILON 2232 increases by PHI at each second. 2234 The clock adjust process includes a timer interrupt facility driving 2235 the seconds counter c.t. It begins at zero when the service starts 2236 and increments once each second. At each interrupt the 2237 clock_adjust() routine is called to incorporate the clock discipline 2238 time and frequency adjustments, then the associations are scanned to 2239 determine if the seconds counter equals or exceeds the p.next state 2240 variable defined in the next section. If so, the poll process is 2241 called to send a packet and compute the next p.next value. 2243 13. Poll Process 2245 Each association supports a poll process that runs at regular 2246 intervals to construct and send packets in symmetric, client and 2247 broadcast server associations. It runs continuously, whether or not 2248 servers are reachable in order to manage the clock filter and reach 2249 register. 2251 13.1. Poll Process Variables 2253 Figure 34 summarizes the common names, formula names and a short 2254 description of the poll process variables(lower case) and parameters 2255 (upper case). Unless noted otherwise, all variables have assumed 2256 prefix p. 2258 +---------+---------+--------------------+ 2259 | Name | Formula | Description | 2260 +---------+---------+--------------------+ 2261 | hpoll | hpoll | host poll exponent | 2262 | last | last | last poll time | 2263 | next | next | next poll time | 2264 | reach | reach | reach register | 2265 | unreach | unreach | unreach counter | 2266 | UNREACH | 24 | unreach limit | 2267 | BCOUNT | 8 | burst count | 2268 | BURST | flag | burst enable | 2269 | IBURST | flag | iburst enable | 2270 +---------+---------+--------------------+ 2272 Figure 34: Poll Process Variables 2274 The poll process variables are allocated in the association data 2275 structure along with the peer process variables. Following is a 2276 detailed description of the variables. The parameters will be called 2277 out in the following text. 
2243 13. Poll Process

2245 Each association supports a poll process that runs at regular
2246 intervals to construct and send packets in symmetric, client and
2247 broadcast server associations. It runs continuously, whether or not
2248 servers are reachable, in order to manage the clock filter and reach
2249 register.

2251 13.1. Poll Process Variables

2253 Figure 34 summarizes the common names, formula names and a short
2254 description of the poll process variables (lower case) and parameters
2255 (upper case). Unless noted otherwise, all variables have assumed
2256 prefix p.

2258 +---------+---------+--------------------+
2259 | Name | Formula | Description |
2260 +---------+---------+--------------------+
2261 | hpoll | hpoll | host poll exponent |
2262 | last | last | last poll time |
2263 | next | next | next poll time |
2264 | reach | reach | reach register |
2265 | unreach | unreach | unreach counter |
2266 | UNREACH | 24 | unreach limit |
2267 | BCOUNT | 8 | burst count |
2268 | BURST | flag | burst enable |
2269 | IBURST | flag | iburst enable |
2270 +---------+---------+--------------------+

2272 Figure 34: Poll Process Variables

2274 The poll process variables are allocated in the association data
2275 structure along with the peer process variables. Following is a
2276 detailed description of the variables. The parameters will be called
2277 out in the following text.

2279 hpoll: signed integer representing the poll exponent, in log2 seconds

2281 last: integer representing the seconds counter when the most recent
2282 packet was sent

2284 next: integer representing the seconds counter when the next packet
2285 is to be sent

2287 reach: 8-bit integer shift register shared by the peer and poll
2288 processes

2290 unreach: integer representing the number of seconds the server has
2291 been unreachable

2293 13.2. Poll Process Operations

2295 As described previously, once each second the clock_adjust() routine
2296 in the clock adjust process is called. This routine calls the poll()
2297 routine in Appendix A.5.7.1 for each association in turn. If the
2298 time for the next poll message is greater than the seconds counter,
2299 the routine returns immediately. Symmetric (modes 1, 2), client
2300 (mode 3) and broadcast server (mode 5) associations routinely send
2301 packets. A broadcast client (mode 6) association runs the routine to
2302 update the reach and unreach variables, but does not send packets.
2303 The poll() routine calls the peer_xmit() routine in Appendix A.5.7.3
2304 to send a packet. If in a burst (burst > 0), nothing further is done
2305 except call the poll_update() routine to set the next poll interval.

2307 If not in a burst, the reach variable is shifted left by one bit,
2308 with zero replacing the rightmost bit. If the server has not been
2309 heard for the last three poll intervals, the clock_filter() routine
2310 is called to increase the dispersion as described in
2311 Appendix A.5.7.3.

2313 If the BURST flag is lit and the server is reachable and a valid
2314 source of synchronization is available, the client sends a burst of
2315 BCOUNT (8) packets at each poll interval. The interval between
2316 packets in the burst is two seconds. This is useful to accurately
2317 measure jitter with long poll intervals. If the IBURST flag is lit
2318 and this is the first packet sent when the server has been
2319 unreachable, the client sends a burst. This is useful to quickly
2320 reduce the synchronization distance below the distance threshold and
2321 synchronize the clock.

2323 If the P_MANY flag is lit in the p.flags word of the association,
2324 this is a manycast client association. Manycast client associations
2325 send client mode packets to designated multicast group addresses at
2326 MINPOLL intervals. The association starts out with a TTL of 1. If,
2327 by the time of the next poll, fewer than MINCLOCK servers
2328 have been mobilized, the ttl is increased by one. If the ttl reaches
2329 the limit TTLMAX without finding MINCLOCK servers, the poll interval
2330 increases until reaching BEACON, when the process starts over from the
2331 beginning.

2333 The poll() routine includes a feature that backs off the poll
2334 interval if the server becomes unreachable. If reach is nonzero, the
2335 server is reachable and unreach is set to zero; otherwise, unreach is
2336 incremented by one for each poll to the maximum UNREACH. Thereafter,
2337 for each poll hpoll is increased by one, which doubles the poll
2338 interval up to the maximum MAXPOLL determined by the poll_update()
2339 routine. When the server again becomes reachable, unreach is set to
2340 zero, hpoll is reset to the tc system variable, and operation resumes
2341 normally.
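The reach/unreach backoff described in the preceding paragraph can be
summarized by the sketch below. It uses the parameter names of Figure 34,
but the routine name and argument list are conveniences of this sketch and do
not appear in the skeleton of Appendix A.

   #define UNREACH 24                   /* unreach limit */
   #define MAXPOLL 17                   /* maximum poll exponent */

   /*
    * Sketch of the poll backoff.  reach is the 8-bit reach register,
    * unreach the unreach counter, hpoll the host poll exponent and tc
    * the system time constant.  The new hpoll value is returned.
    */
   static int
   poll_backoff_sketch(
           int reach,                   /* reach register */
           int *unreach,                /* unreach counter */
           int hpoll,                   /* host poll exponent */
           int tc                       /* system time constant */
           )
   {
           if (reach != 0) {
                   *unreach = 0;        /* server heard; normal operation */
                   return (tc);         /* hpoll resets to the time constant */
           }
           if (*unreach < UNREACH)
                   (*unreach)++;        /* hold the poll interval for a while */
           else if (hpoll < MAXPOLL)
                   hpoll++;             /* then double the poll interval */
           return (hpoll);
   }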
2343 A packet is sent by the xmit_packet() routine in Appendix A.3. Some
2344 header values are copied from the peer variables left by a previous
2345 packet and others from the system variables. Figure 35 shows which
2346 values are copied to each header field. In those implementations
2347 using floating double data types for root delay and root dispersion,
2348 these must be converted to NTP short format. All other fields are
2349 either copied intact from peer and system variables or struck as a
2350 timestamp from the system clock.

2352 +-----------------------------------+
2353 | Packet Variable <-- Variable |
2354 +-----------------------------------+
2355 | x.leap <-- s.leap |
2356 | x.version <-- s.version |
2357 | x.mode <-- s.mode |
2358 | x.stratum <-- s.stratum |
2359 | x.poll <-- s.poll |
2360 | x.precision <-- s.precision |
2361 | x.rootdelay <-- s.rootdelay |
2362 | x.rootdisp <-- s.rootdisp |
2363 | x.refid <-- s.refid |
2364 | x.reftime <-- s.reftime |
2365 | x.org <-- p.xmt |
2366 | x.rec <-- p.dst |
2367 | x.xmt <-- clock |
2368 | x.keyid <-- p.keyid |
2369 | x.digest <-- md5 digest |
2370 +-----------------------------------+

2372 Figure 35: xmit_packet Packet Header

2374 The poll_update() routine shown in Appendix A.5.7.2 is called when a
2375 valid packet is received and immediately after a poll message has
2376 been sent. If in a burst, the poll interval is fixed at 2 s;
2377 otherwise, the host poll exponent hpoll is set to the minimum of
2378 ppoll from the last packet received and hpoll from the poll()
2379 routine, but not less than MINPOLL nor greater than MAXPOLL. Thus
2380 the clock discipline can be oversampled, but not undersampled. This
2381 is necessary to preserve subnet dynamic behavior and protect against
2382 protocol errors.

2384 The poll exponent is converted to an interval which, when added to the
2385 last variable, determines the next variable and thus the time for the
2386 next poll. Finally, the last variable is set to the current seconds
2387 counter.
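The clamping rule in poll_update() amounts to the one-line computation
sketched below. The helper names are local to this sketch, and the burst
case (fixed 2-s interval) as well as the next/last bookkeeping are omitted.

   #define MINPOLL 6                    /* minimum poll exponent (64 s) */
   #define MAXPOLL 17                   /* maximum poll exponent (36.4 h) */

   static int min_int(int a, int b) { return (a < b ? a : b); }
   static int max_int(int a, int b) { return (a > b ? a : b); }

   /*
    * Sketch of the poll exponent clamp: take the lesser of the peer
    * ppoll and the host hpoll, confined to [MINPOLL, MAXPOLL], so the
    * clock discipline can be oversampled but never undersampled.
    */
   static int
   poll_clamp_sketch(int hpoll, int ppoll)
   {
           return (max_int(MINPOLL, min_int(MAXPOLL, min_int(hpoll, ppoll))));
   }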
2389 14. Security Considerations

2391 NTPv4 provides an optional authentication field that utilizes the MD5
2392 algorithm. MD5, like SHA-1, is derived from MD4, which
2393 has long been known to be weak. In 2004, techniques for efficiently
2394 finding collisions in MD5 were announced. A summary of the
2395 appropriateness of MD5 can be found in [10].

2397 In the case of NTP as specified herein, NTP broadcast clients are
2398 vulnerable to disruption by misbehaving or hostile SNTP or NTP
2399 broadcast servers elsewhere in the Internet. Access controls and/or
2400 cryptographic authentication means should be provided for additional
2401 security in such cases.

2403 15. IANA Considerations

2405 UDP/TCP Port 123 was previously assigned by IANA for this protocol.
2406 The IANA has assigned the IPv4 multicast group address 224.0.1.1 and
2407 the IPv6 multicast address ending :101 for NTP. This document
2408 introduces NTP extension fields allowing for the development of
2409 future extensions to the protocol, where a particular extension is to
2410 be identified by the Field Type sub-field within the extension field.
2411 IANA is requested to establish and maintain a registry for Extension
2412 Field Types associated with this protocol, populating this registry
2413 with no initial entries. As future needs arise, new Extension Field
2414 Types may be defined. Following the policies outlined in [11], new
2415 values are to be assigned through IETF Consensus.

2417 16. Acknowledgements

2419 The editors would like to thank Karen O'Donoghue, Brian Haberman,
2420 Greg Dowd, Mark Elliot, and Harlan Stenn for technical reviews of
2421 this document.

2423 17. Informative References

2425 [1] Mills, D., "Network Time Protocol (Version 3) Specification,
2426 Implementation and Analysis", RFC 1305, March 1992.

2429 [2] Mills, D., "Simple Network Time Protocol (SNTP) Version 4 for
2430 IPv4, IPv6 and OSI", RFC 4330, January 2006.

2433 [3] Mills, D.L., "The Autokey security architecture, protocol and
2434 algorithms", Electrical and Computer Engineering Technical
2435 Report 06-1-1, University of Delaware, January 2006.

2437 [4] Mills, D.L., "Network Time Protocol Version 4 Reference and
2438 Implementation Guide", Electrical and Computer Engineering
2439 Technical Report 06-6-1, University of Delaware, June 2006.

2441 [5] Mills, D.L., "Computer Network Time Synchronization - the
2442 Network Time Protocol", CRC Press, 304 pp., 2006.

2444 [6] Bradner, S., "Key words for use in RFCs to Indicate Requirement
2445 Levels", BCP 14, RFC 2119, March 1997.

2448 [7] Postel, J., "Internet Protocol", STD 5, RFC 791, September 1981.

2451 [8] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321,
2452 April 1992.

2454 [9] Marzullo, K. and S. Owicki, "Maintaining the time in a distributed
2455 system", ACM Operating Systems Review 19, July 1985.

2457 [10] Bellovin, S. and E. Rescorla, "Deploying a New Hash Algorithm",
2458 Proceedings of the 13th Annual ISOC Network and Distributed
2459 System Security Symposium (NDSS), February 2006.

2461 [11] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
2462 Considerations Section in RFCs", BCP 26, RFC 2434, October 1998.

2465 Appendix A. Code Skeleton

2467 This appendix is intended to describe the protocol and algorithms of
2468 an implementation in a general way using what is called a code
2469 skeleton program. This consists of a set of definitions, structures
2470 and code fragments which illustrate the protocol operations without
2471 the complexities of an actual implementation of the protocol. This
2472 program is not an executable and is not designed to run in the
2473 ordinary sense. It is designed to be compiled only in order to
2474 verify consistent variable and type usage.

2476 Most of the features of the reference implementation are included
2477 here, with the following exceptions: There are no provisions for
2478 reference clocks or public key (Autokey) cryptography. There is no
2479 huff-n'-puff filter, no anti-clockhop hysteresis and no monitoring
2480 provisions. Many of the values that can be tinkered in the reference
2481 implementation are assumed constants here. There are only minimal
2482 provisions for the kiss-o'-death packet and no responding code.

2484 The program is not intended to be fast or compact, just to
2485 demonstrate the algorithms with sufficient fidelity to understand how
2486 they work. The code skeleton consists of eight segments, a header
2487 segment included by each of the other segments, plus a code segment
2488 for the main program, kernel I/O and system clock interfaces, and
2489 peer, system, clock_adjust and poll processes. These are presented
2490 in order below along with definitions and variables specific to each
2491 process.

2493 A.1. Global Definitions
2495 A.1.1. Definitions, Constants, Parameters

2497 #include <math.h> /* avoids complaints about sqrt() */
2498 #include <sys/time.h> /* for gettimeofday() and friends */
2499 #include <stdlib.h> /* for malloc() and friends */

2501 /*
2502 * Data types
2503 *
2504 * This program assumes the int data type is 32 bits and the long data
2505 * type is 64 bits. The native data type used in most calculations is
2506 * floating double. The data types used in some packet header fields
2507 * require conversion to and from this representation. Some header
2508 * fields involve partitioning an octet, here represented by individual
2509 * octets.
2510 *
2511 * The 64-bit NTP timestamp format used in timestamp calculations is
2512 * unsigned seconds and fraction with the decimal point to the left of
2513 * bit 32. The only operation permitted with these values is
2514 * subtraction, yielding a signed 31-bit difference. The 32-bit NTP
2515 * short format used in delay and dispersion calculations is seconds and
2516 * fraction with the decimal point to the left of bit 16. The only
2517 * operations permitted with these values are addition and
2518 * multiplication by a constant.
2519 *
2520 * The IPv4 address is 32 bits, while the IPv6 address is 128 bits. The
2521 * message digest field is 128 bits as constructed by the MD5 algorithm.
2522 * The precision and poll interval fields are signed log2 seconds.
2523 */
2524 typedef unsigned long tstamp; /* NTP timestamp format */
2525 typedef unsigned int tdist; /* NTP short format */
2526 typedef unsigned long ipaddr; /* IPv4 or IPv6 address */
2527 typedef unsigned long digest; /* md5 digest */
2528 typedef signed char s_char; /* precision and poll interval (log2) */

2530 /*
2531 * Timestamp conversion macroni
2532 */
2533 #define FRIC 65536. /* 2^16 as a double */
2534 #define D2FP(r) ((tdist)((r) * FRIC)) /* NTP short */
2535 #define FP2D(r) ((double)(r) / FRIC)

2537 #define FRAC 4294967296. /* 2^32 as a double */
2538 #define D2LFP(a) ((tstamp)((a) * FRAC)) /* NTP timestamp */
2539 #define LFP2D(a) ((double)(a) / FRAC)
2540 #define U2LFP(a) (((unsigned long)((a).tv_sec + JAN_1970) << 32) + \
2541 (unsigned long)((a).tv_usec / 1e6 * FRAC))

2543 /*
2544 * Arithmetic conversions
2545 */
2546 #define LOG2D(a) ((a) < 0 ? 1. / (1L << -(a)) : \
2547 1L << (a)) /* poll, etc. */
2548 #define SQUARE(x) ((x) * (x))
2549 #define SQRT(x) (sqrt(x))

2551 /*
2552 * Global constants. Some of these might be converted to variables
2553 * which can be tinkered by configuration or computed on the fly. For
2554 * instance, the reference implementation computes PRECISION on the fly and
2555 * provides performance tuning for the defines marked with % below.
2556 */ 2557 #define VERSION 4 /* version number */ 2558 #define MINDISP .01 /* % minimum dispersion (s) */ 2559 #define MAXDISP 16 /* maximum dispersion (s) */ 2560 #define MAXDIST 1 /* % distance threshold (s) */ 2561 #define NOSYNC 0x3 /* leap unsync */ 2562 #define MAXSTRAT 16 /* maximum stratum (infinity metric) */ 2563 #define MINPOLL 6 /* % minimum poll interval (64 s)*/ 2564 #define MAXPOLL 17 /* % maximum poll interval (36.4 h) */ 2565 #define MINCLOCK 3 /* minimum manycast survivors */ 2566 #define MAXCLOCK 10 /* maximum manycast candidates */ 2567 #define TTLMAX 8 /* max ttl manycast */ 2568 #define BEACON 15 /* max interval between beacons */ 2570 #define PHI 15e-6 /* % frequency tolerance (15 PPM) */ 2571 #define NSTAGE 8 /* clock register stages */ 2572 #define NMAX 50 /* maximum number of peers */ 2573 #define NSANE 1 /* % minimum intersection survivors */ 2574 #define NMIN 3 /* % minimum cluster survivors */ 2576 /* 2577 * Global return values 2578 */ 2579 #define TRUE 1 /* boolean true */ 2580 #define FALSE 0 /* boolean false */ 2581 #define NULL 0 /* empty pointer */ 2583 /* 2584 * Local clock process return codes 2585 */ 2586 #define IGNORE 0 /* ignore */ 2587 #define SLEW 1 /* slew adjustment */ 2588 #define STEP 2 /* step adjustment */ 2589 #define PANIC 3 /* panic - no adjustment */ 2591 /* 2592 * System flags 2593 */ 2594 #define S_FLAGS 0 /* any system flags */ 2595 #define S_BCSTENAB 0x1 /* enable broadcast client */ 2597 /* 2598 * Peer flags 2599 */ 2600 #define P_FLAGS 0 /* any peer flags */ 2601 #define P_EPHEM 0x01 /* association is ephemeral */ 2602 #define P_BURST 0x02 /* burst enable */ 2603 #define P_IBURST 0x04 /* intial burst enable */ 2604 #define P_NOTRUST 0x08 /* authenticated access */ 2605 #define P_NOPEER 0x10 /* authenticated mobilization */ 2606 #define P_MANY 0x20 /* manycast client */ 2608 /* 2609 * Authentication codes 2610 */ 2611 #define A_NONE 0 /* no authentication */ 2612 #define A_OK 1 /* authentication OK */ 2613 #define A_ERROR 2 /* authentication error */ 2614 #define A_CRYPTO 3 /* crypto-NAK */ 2616 /* 2617 * Association state codes 2618 */ 2619 #define X_INIT 0 /* initialization */ 2620 #define X_STALE 1 /* timeout */ 2621 #define X_STEP 2 /* time step */ 2622 #define X_ERROR 3 /* authentication error */ 2623 #define X_CRYPTO 4 /* crypto-NAK received */ 2624 #define X_NKEY 5 /* untrusted key */ 2626 /* 2627 * Protocol mode definitionss 2628 */ 2629 #define M_RSVD 0 /* reserved */ 2630 #define M_SACT 1 /* symmetric active */ 2631 #define M_PASV 2 /* symmetric passive */ 2632 #define M_CLNT 3 /* client */ 2633 #define M_SERV 4 /* server */ 2634 #define M_BCST 5 /* broadcast server */ 2635 #define M_BCLN 6 /* broadcast client */ 2637 /* 2638 * Clock state definitions 2639 */ 2640 #define NSET 0 /* clock never set */ 2641 #define FSET 1 /* frequency set from file */ 2642 #define SPIK 2 /* spike detected */ 2643 #define FREQ 3 /* frequency mode */ 2644 #define SYNC 4 /* clock synchronized */ 2646 A.1.2. Packet Data Structures 2648 /* 2649 * The receive and transmit packets may contain an optional message 2650 * authentication code (MAC) consisting of a key identifier (keyid) and 2651 * message digest (mac). NTPv4 supports optional extension fields which 2652 * are inserted after the the header and before the MAC, but these are 2653 * not described here. 2654 * 2655 * Receive packet 2656 * 2657 * Note the dst timestamp is not part of the packet itself. 
It is 2658 * captured upon arrival and returned in the receive buffer along with 2659 * the buffer length and data. Note that some of the char fields are 2660 * packed in the actual header, but the details are omited here. 2661 */ 2662 struct r { 2663 ipaddr srcaddr; /* source (remote) address */ 2664 ipaddr dstaddr; /* destination (local) address */ 2665 char version; /* version number */ 2666 char leap; /* leap indicator */ 2667 char mode; /* mode */ 2668 char stratum; /* stratum */ 2669 char poll; /* poll interval */ 2670 s_char precision; /* precision */ 2671 tdist rootdelay; /* root delay */ 2672 tdist rootdisp; /* root dispersion */ 2673 char refid; /* reference ID */ 2674 tstamp reftime; /* reference time */ 2675 tstamp org; /* origin timestamp */ 2676 tstamp rec; /* receive timestamp */ 2677 tstamp xmt; /* transmit timestamp */ 2678 int keyid; /* key ID */ 2679 digest mac; /* message digest */ 2680 tstamp dst; /* destination timestamp */ 2681 } r; 2682 /* 2683 * Transmit packet 2684 */ 2685 struct x { 2686 ipaddr dstaddr; /* source (local) address */ 2687 ipaddr srcaddr; /* destination (remote) address */ 2688 char version; /* version number */ 2689 char leap; /* leap indicator */ 2690 char mode; /* mode */ 2691 char stratum; /* stratum */ 2692 char poll; /* poll interval */ 2693 s_char precision; /* precision */ 2694 tdist rootdelay; /* root delay */ 2695 tdist rootdisp; /* root dispersion */ 2696 char refid; /* reference ID */ 2697 tstamp reftime; /* reference time */ 2698 tstamp org; /* origin timestamp */ 2699 tstamp rec; /* receive timestamp */ 2700 tstamp xmt; /* transmit timestamp */ 2701 int keyid; /* key ID */ 2702 digest mac; /* message digest */ 2703 } x; 2705 A.1.3. Association Data Structures 2707 /* 2708 * Filter stage structure. Note the t member in this and other 2709 * structures refers to process time, not real time. Process time 2710 * increments by one second for every elapsed second of real time. 2711 */ 2712 struct f { 2713 tstamp t; /* update time */ 2714 double offset; /* clock ofset */ 2715 double delay; /* roundtrip delay */ 2716 double disp; /* dispersion */ 2717 } f; 2719 /* 2720 * Association structure. This is shared between the peer process and 2721 * poll process. 
2722 */ 2723 struct p { 2725 /* 2726 * Variables set by configuration 2727 */ 2728 ipaddr srcaddr; /* source (remote) address */ 2729 ipaddr dstaddr; /* destination (local) address */ 2730 char version; /* version number */ 2731 char hmode; /* host mode */ 2732 int keyid; /* key identifier */ 2733 int flags; /* option flags */ 2735 /* 2736 * Variables set by received packet 2737 */ 2738 char leap; /* leap indicator */ 2739 char pmode; /* peer mode */ 2740 char stratum; /* stratum */ 2741 char ppoll; /* peer poll interval */ 2742 double rootdelay; /* root delay */ 2743 double rootdisp; /* root dispersion */ 2744 char refid; /* reference ID */ 2745 tstamp reftime; /* reference time */ 2746 #define begin_clear org /* beginning of clear area */ 2747 tstamp org; /* originate timestamp */ 2748 tstamp rec; /* receive timestamp */ 2749 tstamp xmt; /* transmit timestamp */ 2751 /* 2752 * Computed data 2753 */ 2754 double t; /* update time */ 2755 struct f f[NSTAGE]; /* clock filter */ 2756 double offset; /* peer offset */ 2757 double delay; /* peer delay */ 2758 double disp; /* peer dispersion */ 2759 double jitter; /* RMS jitter */ 2761 /* 2762 * Poll process variables 2763 */ 2764 char hpoll; /* host poll interval */ 2765 int burst; /* burst counter */ 2766 int reach; /* reach register */ 2767 int ttl; /* ttl (manycast) */ 2768 #define end_clear unreach /* end of clear area */ 2769 int unreach; /* unreach counter */ 2770 int outdate; /* last poll time */ 2771 int nextdate; /* next poll time */ 2772 } p; 2774 A.1.4. System Data Structures 2776 /* 2777 * Chime list. This is used by the intersection algorithm. 2778 */ 2779 struct m { /* m is for Marzullo */ 2780 struct p *p; /* peer structure pointer */ 2781 int type; /* high +1, mid 0, low -1 */ 2782 double edge; /* correctness interval edge */ 2783 } m; 2785 /* 2786 * Survivor list. This is used by the clustering algorithm. 2787 */ 2788 struct v { 2789 struct p *p; /* peer structure pointer */ 2790 double metric; /* sort metric */ 2791 } v; 2793 /* 2794 * System structure 2795 */ 2796 struct s { 2797 tstamp t; /* update time */ 2798 char leap; /* leap indicator */ 2799 char stratum; /* stratum */ 2800 char poll; /* poll interval */ 2801 char precision; /* precision */ 2802 double rootdelay; /* root delay */ 2803 double rootdisp; /* root dispersion */ 2804 char refid; /* reference ID */ 2805 tstamp reftime; /* reference time */ 2806 struct m m[NMAX]; /* chime list */ 2807 struct v v[NMAX]; /* survivor list */ 2808 struct p *p; /* association ID */ 2809 double offset; /* combined offset */ 2810 double jitter; /* combined jitter */ 2811 int flags; /* option flags */ 2812 int n; /* number of survivors */ 2813 } s; 2815 A.1.5. Local Clock Data Structures 2817 /* 2818 * Local clock structure 2819 */ 2820 struct c { 2821 tstamp t; /* update time */ 2822 int state; /* current state */ 2823 double offset; /* current offset */ 2824 double last; /* previous offset */ 2825 int count; /* jiggle counter */ 2826 double freq; /* frequency */ 2827 double jitter; /* RMS jitter */ 2828 double wander; /* RMS wander */ 2829 } c; 2831 A.1.6. 
Function Prototypes 2833 /* 2834 * Peer process 2835 */ 2836 void receive(struct r *); /* receive packet */ 2837 void packet(struct p *, struct r *); /* process packet */ 2838 void clock_filter(struct p *, double, double, double); /* filter */ 2839 double root_dist(struct p *); /* calculate root distance */ 2840 int fit(struct p *); /* determine fitness of server */ 2841 void clear(struct p *, int); /* clear association */ 2842 int access(struct r *); /* determine access restrictions */ 2844 /* 2845 * System process 2846 */ 2847 int main(); /* main program */ 2848 void clock_select(); /* find the best clocks */ 2849 void clock_update(struct p *); /* update the system clock */ 2850 void clock_combine(); /* combine the offsets */ 2852 /* 2853 * Local clock process 2854 */ 2855 int local_clock(struct p *, double); /* clock discipline */ 2856 void rstclock(int, double, double); /* clock state transition */ 2858 /* 2859 * Clock adjust process 2860 */ 2861 void clock_adjust(); /* one-second timer process */ 2862 /* 2863 * Poll process 2864 */ 2865 void poll(struct p *); /* poll process */ 2866 void poll_update(struct p *, int); /* update the poll interval */ 2867 void peer_xmit(struct p *); /* transmit a packet */ 2868 void fast_xmit(struct r *, int, int); /* transmit a reply packet */ 2870 /* 2871 * Utility routines 2872 */ 2873 digest md5(int); /* generate a message digest */ 2874 struct p *mobilize(ipaddr, ipaddr, int, int, int, int); /* mobilize */ 2875 struct p *find_assoc(struct r *); /* search the association table */ 2877 /* 2878 * Kernel interface 2879 */ 2880 struct r *recv_packet(); /* wait for packet */ 2881 void xmit_packet(struct x *); /* send packet */ 2882 void step_time(double); /* step time */ 2883 void adjust_time(double); /* adjust (slew) time */ 2884 tstamp get_time(); /* read time */ 2886 A.2. Main Program and Utility Routines 2888 /* 2889 * Definitions 2890 */ 2891 #define PRECISION -18 /* precision (log2 s) */ 2892 #define IPADDR 0 /* any IP address */ 2893 #define MODE 0 /* any NTP mode */ 2894 #define KEYID 0 /* any key identifier */ 2896 /* 2897 * main() - main program 2898 */ 2899 int 2900 main() 2901 { 2902 struct p *p; /* peer structure pointer */ 2903 struct r *r; /* receive packet pointer */ 2905 /* 2906 * Read command line options and initialize system variables. 2907 * The reference implementation measures the precision specific 2908 * to each machine by measuring the clock increments to read the 2909 * system clock. 2911 */ 2912 memset(&s, sizeof(s), 0); 2913 s.leap = NOSYNC; 2914 s.stratum = MAXSTRAT; 2915 s.poll = MINPOLL; 2916 s.precision = PRECISION; 2917 s.p = NULL; 2919 /* 2920 * Initialize local clock variables 2921 */ 2922 memset(&c, sizeof(c), 0); 2923 if (/* frequency file */ 0) { 2924 c.freq = /* freq */ 0; 2925 rstclock(FSET, 0, 0); 2926 } else { 2927 rstclock(NSET, 0, 0); 2928 } 2929 c.jitter = LOG2D(s.precision); 2931 /* 2932 * Read the configuration file and mobilize persistent 2933 * associations with spcified addresses, version, mode, key ID 2934 * and flags. 2935 */ 2936 while (/* mobilize configurated associations */ 0) { 2937 p = mobilize(IPADDR, IPADDR, VERSION, MODE, KEYID, 2938 P_FLAGS); 2939 } 2941 /* 2942 * Start the system timer, which ticks once per second. Then 2943 * read packets as they arrive, strike receive timestamp and 2944 * call the receive() routine. 
2945 */ 2946 while (0) { 2947 r = recv_packet(); 2948 r->dst = get_time(); 2949 receive(r); 2950 } 2951 } 2953 /* 2954 * mobilize() - mobilize and initialize an association 2955 */ 2956 struct p 2957 *mobilize( 2958 ipaddr srcaddr, /* IP source address */ 2959 ipaddr dstaddr, /* IP destination address */ 2960 int version, /* version */ 2961 int mode, /* host mode */ 2962 int keyid, /* key identifier */ 2963 int flags /* peer flags */ 2964 ) 2965 { 2966 struct p *p; /* peer process pointer */ 2968 /* 2969 * Allocate and initialize association memory 2970 */ 2971 p = malloc(sizeof(struct p)); 2972 p->srcaddr = srcaddr; 2973 p->dstaddr = dstaddr; 2974 p->version = version; 2975 p->hmode = mode; 2976 p->keyid = keyid; 2977 p->hpoll = MINPOLL; 2978 clear(p, X_INIT); 2979 p->flags == flags; 2980 return (p); 2981 } 2983 /* 2984 * find_assoc() - find a matching association 2985 */ 2986 struct p /* peer structure pointer or NULL */ 2987 *find_assoc( 2988 struct r *r /* receive packet pointer */ 2989 ) 2990 { 2991 struct p *p; /* dummy peer structure pointer */ 2993 /* 2994 * Search association table for matching source 2995 * address, source port and mode. 2996 */ 2997 while (/* all associations */ 0) { 2998 if (r->srcaddr == p->srcaddr && r->mode == p->hmode) 2999 return(p); 3000 } 3001 return (NULL); 3002 } 3004 /* 3005 * md5() - compute message digest 3006 */ 3008 digest 3009 md5( 3010 int keyid /* key identifier */ 3011 ) 3012 { 3013 /* 3014 * Compute a keyed cryptographic message digest. The key 3015 * identifier is associated with a key in the local key cache. 3016 * The key is prepended to the packet header and extension fieds 3017 * and the result hashed by the MD5 algorithm as described in 3018 * RFC-1321. Return a MAC consisting of the 32-bit key ID 3019 * concatenated with the 128-bit digest. 3020 */ 3021 return (/* MD5 digest */ 0); 3022 } 3024 A.3. Kernel Input/Output Interface 3026 /* 3027 * Kernel interface to transmit and receive packets. Details are 3028 * deliberately vague and depend on the operating system. 3029 * 3030 * recv_packet - receive packet from network 3031 */ 3032 struct r /* receive packet pointer*/ 3033 *recv_packet() { 3034 return (/* receive packet r */ 0); 3035 } 3037 /* 3038 * xmit_packet - transmit packet to network 3039 */ 3040 void 3041 xmit_packet( 3042 struct x *x /* transmit packet pointer */ 3043 ) 3044 { 3045 /* send packet x */ 3046 } 3048 A.4. Kernel System Clock Interface 3050 /* 3051 * System clock utility functions 3052 * 3053 * There are three time formats: native (Unix), NTP and floating double. 3054 * The get_time() routine returns the time in NTP long format. The Unix 3055 * routines expect arguments as a structure of two signed 32-bit words 3056 * in seconds and microseconds (timeval) or nanoseconds (timespec). The 3057 * step_time() and adjust_time() routines expect signed arguments in 3058 * floating double. The simplified code shown here is for illustration 3059 * only and has not been verified. 3060 */ 3061 #define JAN_1970 2208988800UL /* 1970 - 1900 in seconds */ 3063 /* 3064 * get_time - read system time and convert to NTP format 3065 */ 3066 tstamp 3067 get_time() 3068 { 3069 struct timeval unix_time; 3071 /* 3072 * There are only two calls on this routine in the program. One 3073 * when a packet arrives from the network and the other when a 3074 * packet is placed on the send queue. Call the kernel time of 3075 * day routine (such as gettimeofday()) and convert to NTP 3076 * format. 
3077 */ 3078 gettimeofday(&unix_time, NULL); 3079 return (U2LFP(unix_time)); 3080 } 3082 /* 3083 * step_time() - step system time to given offset valuet 3084 */ 3085 void 3086 step_time( 3087 double offset /* clock offset */ 3088 ) 3089 { 3090 struct timeval unix_time; 3091 tstamp ntp_time; 3093 /* 3094 * Convert from double to native format (signed) and add to the 3095 * current time. Note the addition is done in native format to 3096 * avoid overflow or loss of precision. 3097 */ 3098 gettimeofday(&unix_time, NULL); 3099 ntp_time = D2LFP(offset) + U2LFP(unix_time);; 3100 unix_time.tv_sec = ntp_time >> 32; 3101 unix_time.tv_usec = (long)((ntp_time - unix_time.tv_sec << 3102 32) / FRAC * 1e6); 3103 settimeofday(&unix_time, NULL); 3104 } 3106 /* 3107 * adjust_time() - slew system clock to given offset value 3108 */ 3109 void 3110 adjust_time( 3111 double offset /* clock offset */ 3112 ) 3113 { 3114 struct timeval unix_time; 3115 tstamp ntp_time; 3117 /* 3118 * Convert from double to native format (signed) and add to the 3119 * current time. 3120 */ 3121 ntp_time = D2LFP(offset); 3122 unix_time.tv_sec = ntp_time >> 32; 3123 unix_time.tv_usec = (long)((ntp_time - unix_time.tv_sec << 3124 32) / FRAC * 1e6); 3125 adjtime(&unix_time, NULL); 3126 } 3128 A.5. Peer Process 3130 /* 3131 * A crypto-NAK packet includes the NTP header followed by a MAC 3132 * consisting only of the key identifier with value zero. It tells the 3133 * receiver that a prior request could not be properly authenticated, 3134 * but the NTP header fields are correct. 3135 * 3136 * A kiss-o'-death packet is an NTP header with leap 0x3 (NOSYNC) and 3137 * stratum 16 (MAXSTRAT. It tells the receiver that something drastic 3138 * has happened, as revealled by the kiss code in the refid field. The 3139 * NTP header fields may or may not be correct. 3140 */ 3141 /* 3142 * Peer process parameters and constants 3143 */ 3144 #define SGATE 3 /* spike gate (clock filter */ 3145 #define BDELAY .004 /* broadcast delay (s) */ 3147 /* 3148 * Dispatch codes 3149 */ 3150 #define ERR -1 /* error */ 3151 #define DSCRD 0 /* discard packet */ 3152 #define PROC 1 /* process packet */ 3153 #define BCST 2 /* broadcast packet */ 3154 #define FXMIT 3 /* client packet */ 3155 #define NEWPS 4 /* new symmetric passive client */ 3156 #define NEWBC 5 /* new broadcast client */ 3158 /* 3159 * Dispatch matrix 3160 * active passv client server bcast */ 3161 int table[7][5] = { 3162 /* nopeer */ { NEWPS, DSCRD, FXMIT, MANY, NEWBC }, 3163 /* active */ { PROC, PROC, DSCRD, DSCRD, DSCRD }, 3164 /* passv */ { PROC, ERR, DSCRD, DSCRD, DSCRD }, 3165 /* client */ { DSCRD, DSCRD, DSCRD, PROC, DSCRD }, 3166 /* server */ { DSCRD, DSCRD, DSCRD, DSCRD, DSCRD }, 3167 /* bcast */ { DSCRD, DSCRD, DSCRD, DSCRD, DSCRD }, 3168 /* bclient */ { DSCRD, DSCRD, DSCRD, DSCRD, PROC} 3169 }; 3171 /* 3172 * Miscellaneous macroni 3173 * 3174 * This macro defines the authentication state. If x is 0, 3175 * authentication is optional, othewise it is required. 3176 */ 3177 #define AUTH(x, y) ((x) ? (y) == A_OK : (y) == A_OK || \ 3178 (y) == A_NONE) 3180 /* 3181 * These are used by the clear() routine 3182 */ 3183 #define BEGIN_CLEAR(p) ((char *)&((p)->begin_clear)) 3184 #define END_CLEAR(p) ((char *)&((p)->end_clear)) 3185 #define LEN_CLEAR (END_CLEAR((struct p *)0) - \ 3186 BEGIN_CLEAR((struct p *)0)) 3188 A.5.1. 
receive() 3190 /* 3191 * receive() - receive packet and decode modes 3192 */ 3193 void 3194 receive( 3195 struct r *r /* receive packet pointer */ 3196 ) 3198 { 3199 struct p *p; /* peer structure pointer 3200 int auth; /* authentication code */ 3201 int has_mac; /* size of MAC */ 3202 int synch; /* synchronized switch */ 3203 int auth; /* authentication code */ 3205 /* 3206 * Check access control lists. The intent here is to implement a 3207 * whitelist of those IP addresses specifically accepted and/or 3208 * a blacklist of those IP addresses specifically rejected. 3209 * There could be different lists for authenticated clients and 3210 * unauthenticated clients. 3211 */ 3212 if (!access(r)) 3213 return; /* access denied */ 3215 /* 3216 * The version must not be in the future. Format checks include 3217 * packet length, MAC length and extension field lengths, if 3218 * present. 3219 */ 3220 if (r->version > VERSION /* or format error */) 3221 return; /* format error */ 3223 /* 3224 * Authentication is conditioned by two switches which can be 3225 * specified on a per-client basis. 3226 * 3227 * P_NOPEER do not mobilize an association unless 3228 * authenticated 3229 * P_NOTRUST do not allow access unless authenticated 3230 * (implies P_NOPEER) 3231 * 3232 * There are four outcomes: 3233 * 3234 * A_NONE the packet has no MAC 3235 * A_OK the packet has a MAC and authentication 3236 * succeeds 3237 * A_ERROR the packet has a MAC and authentication fails 3238 * A_CRYPTO crypto-NAK. The MAC has four octets only. 3239 * 3240 * Note: The AUTH(x, y) macro is used to filter outcomes. If x 3241 * is zero, acceptable outcomes of y are NONE and OK. If x is 3242 * one, the only acceptable outcome of y is OK. 3243 */ 3244 has_mac = /* length of MAC field */ 0; 3245 if (has_mac == 0) { 3246 auth = A_NONE; /* not required */ 3247 } else if (has_mac == 4) { 3248 auth == A_CRYPTO; /* crypto-NAK */ 3249 } else { 3250 if (r->mac != md5(r->keyid)) 3251 auth = A_ERROR; /* auth error */ 3252 else 3253 auth = A_OK; /* auth OK */ 3254 } 3256 /* 3257 * Find association and dispatch code. If there is no 3258 * association to match, the value of p->hmode is assumed NULL. 3259 */ 3260 p = find_assoc(r); 3261 switch(table[p->hmode][r->mode]) { 3263 /* 3264 * Client packet and no association. Send server reply without 3265 * saving state. 3266 */ 3267 case FXMIT: 3269 /* 3270 * If unicast destination address, send server packet. 3271 * If authentication fails, send a crypto-NAK packet. 3272 /* 3273 if (/* not multicast dstaddr */0) { 3274 if (AUTH(p->flags & P_NOTRUST, auth)) 3275 fast_xmit(r, M_SERV, auth); 3276 else if (auth == A_ERROR) 3277 fast_xmit(r, M_SERV, A_CRYPTO); 3278 return; /* M_SERV packet sent */ 3279 } 3281 /* 3282 * This must be manycast. Do not repspond if we are not 3283 * synchronized or if our stratum is above the 3284 * manycaster. 3285 */ 3286 if (s.leap == NOSYNC || s.stratum > r->stratum) 3287 return; 3289 /* 3290 * Respond only if authentication is ok. Note that the 3291 * unicast addreess is used, not the multicast. 3292 */ 3293 if (AUTH(p->flags & P_NOTRUST, auth)) 3294 fast_xmit(r, M_SERV, auth); 3295 return; 3297 /* 3298 * New manycase client ephemeral association. It is mobilized in 3299 * the same version as in the packet. If authentication fails, 3300 * ignore the packet. Verify the server packet by comparing the 3301 * r->org timestamp in the packet with the p->xmt timestamp in 3302 * the multicast client association. If they match, the server 3303 * packet is authentic. 
Details omitted. 3304 */ 3305 case MANY: 3306 if (!AUTH(p->flags & (P_NOTRUST | P_NOPEER), auth)) 3307 return; /* authentication error */ 3309 p = mobilize(r->srcaddr, r->dstaddr, r->version, M_CLNT, 3310 r->keyid, P_EPHEM); 3311 break; 3313 /* 3314 * New symmetric passive ssociation. It is mobilized in the same 3315 * version as in the packet. If authentication fails, send a 3316 * crypto-NAK packet. If restrict no-moblize, send a symmetric 3317 * active packet instead. 3318 */ 3319 case NEWPS: 3320 if (!AUTH(p->flags & P_NOTRUST, auth)) { 3321 if (auth == A_ERROR) 3322 fast_xmit(r, M_SACT, A_CRYPTO); 3323 return; /* crypto-NAK packet sent */ 3324 } 3325 if (!AUTH(p->flags & P_NOPEER, auth)) { 3326 fast_xmit(r, M_SACT, auth); 3327 return; /* M_SACT packet sent */ 3328 } 3329 p = mobilize(r->srcaddr, r->dstaddr, r->version, M_PASV, 3330 r->keyid, P_EPHEM); 3331 break; 3333 /* 3334 * New broadcast client association. It is mobilized in the same 3335 * version as in the packet. If authentication fails, ignore the 3336 * packet. Note this code does not support the initial volley 3337 * feature in the reference implementation. 3338 */ 3339 case NEWBC: 3340 if (!AUTH(p->flags & (P_NOTRUST | P_NOPEER), auth)) 3341 return; /* authentication error */ 3343 if (!(s.flags & S_BCSTENAB)) 3344 return; /* broadcast not enabled */ 3346 p = mobilize(r->srcaddr, r->dstaddr, r->version, M_BCLN, 3347 r->keyid, P_EPHEM); 3348 break; /* processing continues */ 3350 /* 3351 * Process packet. Placeholdler only. 3352 */ 3353 case PROC: 3354 break; /* processing continues */ 3356 /* 3357 * Invalid mode combination. We get here only in case of 3358 * ephemeral associations, so the correct action is simply to 3359 * toss it. 3360 */ 3361 case ERR: 3362 clear(p, X_ERROR); 3363 return; /* invalid mode combination */ 3365 /* 3366 * No match; just discard the packet. 3367 */ 3368 case DSCRD: 3369 return; /* orphan abandoned */ 3370 } 3372 /* 3373 * Next comes a rigorous schedule of timestamp checking. If the 3374 * transmit timestamp is zero, the server is horribly broken. 3375 */ 3376 if (r->xmt == 0) 3377 return; /* invalid timestamp */ 3379 /* 3380 * If the transmit timestamp duplicates a previous one, the 3381 * packet is a replay. 3382 */ 3383 if (r->xmt == p->xmt) 3384 return; /* duplicate packet */ 3386 /* 3387 * If this is a broadcast mode packet, skip further checking. 3388 * If the origin timestamp is zero, the sender has not yet heard 3389 * from us. Otherwise, if the origin timestamp does not match 3390 * the transmit timestamp, the packet is bogus. 3392 */ 3393 synch = TRUE; 3394 if (r->mode != M_BCST) { 3395 if (r->org == 0) 3396 synch = FALSE; /* unsynchronized */ 3398 else if (r->org != p->xmt) 3399 synch = FALSE; /* bogus packet */ 3400 } 3402 /* 3403 * Update the origin and destination timestamps. If 3404 * unsynchronized or bogus, abandon ship. 3405 */ 3406 p->org = r->xmt; 3407 p->rec = r->dst; 3408 if (!synch) 3409 return; /* unsynch */ 3411 /* 3412 * The timestamps are valid and the receive packet matches the 3413 * last one sent. If the packet is a crypto-NAK, the server 3414 * might have just changed keys. We demobilize the association 3415 * and wait for better times. 3416 */ 3417 if (auth == A_CRYPTO) { 3418 clear(p, X_CRYPTO); 3419 return; /* crypto-NAK */ 3420 } 3422 /* 3423 * If the association is authenticated, the key ID is nonzero 3424 * and received packets must be authenticated. This is designed 3425 * to avoid a bait-and-switch attack, which was possible in past 3426 * versions. 
3427 */ 3428 if (!AUTH(p->keyid || (p->flags & P_NOTRUST), auth)) 3429 return; /* bad auth */ 3431 /* 3432 * Everything possible has been done to validate the timestamps 3433 * and prevent bad guys from disrupting the protocol or 3434 * injecting bogus data. Earn some revenue. 3435 */ 3436 packet(p, r); 3437 } 3439 A.5.1.1. packet() 3440 /* 3441 * packet() - process packet and compute offset, delay and 3442 * dispersion. 3443 */ 3444 void 3445 packet( 3446 struct p *p, /* peer structure pointer */ 3447 struct r *r /* receive packet pointer */ 3448 ) 3449 { 3450 double offset; /* sample offsset */ 3451 double delay; /* sample delay */ 3452 double disp; /* sample dispersion */ 3454 /* 3455 * By golly the packet is valid. Light up the remaining header 3456 * fields. Note that we map stratum 0 (unspecified) to MAXSTRAT 3457 * to make stratum comparisons simpler and to provide a natural 3458 * interface for radio clock drivers that operate for 3459 * convenience at stratum 0. 3460 */ 3461 p->leap = r->leap; 3462 if (r->stratum == 0) 3463 p->stratum = MAXSTRAT; 3464 else 3465 p->stratum = r->stratum; 3466 p->pmode = r->mode; 3467 p->ppoll = r->poll; 3468 p->rootdelay = FP2D(r->rootdelay); 3469 p->rootdisp = FP2D(r->rootdisp); 3470 p->refid = r->refid; 3471 p->reftime = r->reftime; 3473 /* 3474 * Verify the server is synchronized with valid stratum and 3475 * reference time not later than the transmit time. 3476 */ 3477 if (p->leap == NOSYNC || p->stratum >= MAXSTRAT) 3478 return; /* unsynchronized */ 3480 /* 3481 * Verify valid root distance. 3482 */ 3483 if (r->rootdelay / 2 + r->rootdisp >= MAXDISP || p->reftime > 3484 r->xmt) 3485 return; /* invalid header values */ 3487 poll_update(p, p->hpoll); 3488 p->reach |= 1; 3490 /* 3491 * Calculate offset, delay and dispersion, then pass to the 3492 * clock filter. Note carefully the implied processing. The 3493 * first-order difference is done directly in 64-bit arithmetic, 3494 * then the result is converted to floating double. All further 3495 * processing is in floating double arithmetic with rounding 3496 * done by the hardware. This is necessary in order to avoid 3497 * overflow and preseve precision. 3498 * 3499 * The delay calculation is a special case. In cases where the 3500 * server and client clocks are running at different rates and 3501 * with very fast networks, the delay can appear negative. In 3502 * order to avoid violating the Principle of Least Astonishment, 3503 * the delay is clamped not less than the system precision. 3504 */ 3505 if (p->pmode == M_BCST) { 3506 offset = LFP2D(r->xmt - r->dst); 3507 delay = BDELAY; 3508 disp = LOG2D(r->precision) + LOG2D(s.precision) + PHI * 3509 2 * BDELAY; 3510 } else { 3511 offset = (LFP2D(r->rec - r->org) + LFP2D(r->dst - 3512 r->xmt)) / 2; 3513 delay = max(LFP2D(r->dst - r->org) - LFP2D(r->rec - 3514 r->xmt), LOG2D(s.precision)); 3515 disp = LOG2D(r->precision) + LOG2D(s.precision) + PHI * 3516 LFP2D(r->dst - r->org); 3517 } 3518 clock_filter(p, offset, delay, disp); 3519 } 3521 A.5.2. clock_filter() 3523 /* 3524 * clock_filter(p, offset, delay, dispersion) - select the best from the 3525 * latest eight delay/offset samples. 
3526 */ 3527 void 3528 clock_filter( 3529 struct p *p, /* peer structure pointer */ 3530 double offset, /* clock offset */ 3531 double delay, /* roundtrip delay */ 3532 double disp /* dispersion */ 3533 ) 3534 { 3535 struct f f[NSTAGE]; /* sorted list */ 3536 double dtemp; 3537 int i; 3539 /* 3540 * The clock filter contents consist of eight tuples (offset, 3541 * delay, dispersion, time). Shift each tuple to the left, 3542 * discarding the leftmost one. As each tuple is shifted, 3543 * increase the dispersion since the last filter update. At the 3544 * same time, copy each tuple to a temporary list. After this, 3545 * place the (offset, delay, disp, time) in the vacated 3546 * rightmost tuple. 3547 */ 3548 for (i = 1; i < NSTAGE; i++) { 3549 p->f[i] = p->f[i - 1]; 3550 p->f[i].disp += PHI * (c.t - p->t); 3551 f[i] = p->f[i]; 3552 } 3553 p->f[0].t = c.t; 3554 p->f[0].offset = offset; 3555 p->f[0].delay = delay; 3556 p->f[0].disp = disp; 3557 f[0] = p->f[0]; 3559 /* 3560 * Sort the temporary list of tuples by increasing f[].delay. 3561 * The first entry on the sorted list represents the best 3562 * sample, but it might be old. 3563 */ 3564 dtemp = p->offset; 3565 p->offset = f[0].offset; 3566 p->delay = f[0].delay; 3567 for (i = 0; i < NSTAGE; i++) { 3568 p->disp += f[i].disp / (2 ^ (i + 1)); 3569 p->jitter += SQUARE(f[i].offset - f[0].offset); 3570 } 3571 p->jitter = max(SQRT(p->jitter), LOG2D(s.precision)); 3573 /* 3574 * Prime directive: use a sample only once and never a sample 3575 * older than the latest one, but anything goes before first 3576 * synchronized. 3577 */ 3578 if (f[0].t - p->t <= 0 && s.leap != NOSYNC) 3579 return; 3581 /* 3582 * Popcorn spike suppressor. Compare the difference between the 3583 * last and current offsets to the current jitter. If greater 3584 * than SGATE (3) and if the interval since the last offset is 3585 * less than twice the system poll interval, dump the spike. 3586 * Otherwise, and if not in a burst, shake out the truechimers. 3587 */ 3588 if (fabs(p->offset - dtemp) > SGATE * p->jitter && (f[0].t - 3589 p->t) < 2 * s.poll) 3590 return; 3592 p->t = f[0].t; 3593 if (p->burst == 0) 3594 clock_select(); 3595 return; 3596 } 3598 /* 3599 * root_dist() - calculate root distance 3600 */ 3601 double 3602 root_dist( 3603 struct p *p /* peer structure pointer */ 3604 ) 3605 { 3606 /* 3607 * The root synchronization distance is the maximum error due to 3608 * all causes of the local clock relative to the primary server. 3609 * It is defined as half the total delay plus total dispersion 3610 * plus peer jitter. 3611 */ 3612 return (max(MINDISP, p->rootdelay + p->delay) / 2 + 3613 p->rootdisp + p->disp + PHI * (c.t - p->t) + p->jitter); 3614 } 3616 /* 3617 * fit() - test if association p is acceptable for synchronization 3618 */ 3619 int 3620 fit( 3621 struct p *p /* peer structure pointer */ 3622 ) 3623 { 3624 /* 3625 * A stratum error occurs if (1) the server has never been 3626 * synchronized, (2) the server stratum is invalid. 3627 */ 3628 if (p->leap == NOSYNC || p->stratum >= MAXSTRAT) 3629 return (FALSE); 3631 /* 3632 * A distance error occurs if the root distance exceeds the 3633 * distance threshold plus an increment equal to one poll 3634 * interval. 3635 */ 3636 if (root_dist(p) > MAXDIST + PHI * LOG2D(s.poll)) 3637 return (FALSE); 3639 /* 3640 * A loop error occurs if the remote peer is synchronized to the 3641 * local peer or the remote peer is synchronized to the current 3642 * system peer. 
Note this is the behavior for IPv4; for IPv6 the 3643 * MD5 hash is used instead. 3644 */ 3645 if (p->refid == p->dstaddr || p->refid == s.refid) 3646 return (FALSE); 3648 /* 3649 * An unreachable error occurs if the server is unreachable. 3650 */ 3651 if (p->reach == 0) 3652 return (FALSE); 3654 return (TRUE); 3655 } 3657 /* 3658 * clear() - reinitialize for persistent association, demobilize 3659 * for ephemeral association. 3660 */ 3661 void 3662 clear( 3663 struct p *p, /* peer structure pointer */ 3664 int kiss /* kiss code */ 3665 ) 3666 { 3667 int i; 3669 /* 3670 * The first thing to do is return all resources to the bank. 3671 * Typical resources are not detailed here, but they include 3672 * dynamically allocated structures for keys, certificates, etc. 3673 * If an ephemeral association and not initialization, return 3674 * the association memory as well. 3675 */ 3676 /* return resources */ 3677 if (s.p == p) 3678 s.p = NULL; 3679 if (kiss != X_INIT && (p->flags & P_EPHEM)) { 3680 free(p); 3681 return; 3682 } 3684 /* 3685 * Initialize the association fields for general reset. 3686 */ 3687 memset(BEGIN_CLEAR(p), LEN_CLEAR, 0); 3688 p->leap = NOSYNC; 3689 p->stratum = MAXSTRAT; 3690 p->ppoll = MAXPOLL; 3691 p->hpoll = MINPOLL; 3692 p->disp = MAXDISP; 3693 p->jitter = LOG2D(s.precision); 3694 p->refid = kiss; 3695 for (i = 0; i < NSTAGE; i++) 3696 p->f[i].disp = MAXDISP; 3698 /* 3699 * Randomize the first poll just in case thousands of broadcast 3700 * clients have just been stirred up after a long absence of the 3701 * broadcast server. 3702 */ 3703 p->outdate = p->t = c.t; 3704 p->nextdate = p->outdate + (random() & ((1 << MINPOLL) - 1)); 3705 } 3707 A.5.3. fast_xmit() 3709 /* 3710 * fast_xmit() - transmit a reply packet for receive packet r 3711 */ 3712 void 3713 fast_xmit( 3714 struct r *r, /* receive packet pointer */ 3715 int mode, /* association mode */ 3716 int auth /* authentication code */ 3717 ) 3718 { 3719 struct x x; 3721 /* 3722 * Initialize header and transmit timestamp. Note that the 3723 * transmit version is copied from the receive version. This is 3724 * for backward compatibility. 3725 */ 3726 x.version = r->version; 3727 x.srcaddr = r->dstaddr; 3728 x.dstaddr = r->srcaddr; 3729 x.leap = s.leap; 3730 x.mode = mode; 3731 if (s.stratum == MAXSTRAT) 3732 x.stratum = 0; 3733 else 3734 x.stratum = s.stratum; 3735 x.poll = r->poll; 3736 x.precision = s.precision; 3737 x.rootdelay = D2FP(s.rootdelay); 3738 x.rootdisp = D2FP(s.rootdisp); 3739 x.refid = s.refid; 3740 x.reftime = s.reftime; 3741 x.org = r->xmt; 3742 x.rec = r->dst; 3743 x.xmt = get_time(); 3745 /* 3746 * If the authentication code is A.NONE, include only the 3747 * header; if A.CRYPTO, send a crypto-NAK; if A.OK, send a valid 3748 * MAC. Use the key ID in the received packet and the key in the 3749 * local key cache. 3750 */ 3751 if (auth != A_NONE) { 3752 if (auth == A_CRYPTO) { 3753 x.keyid = 0; 3754 } else { 3755 x.keyid = r->keyid; 3756 x.mac = md5(x.keyid); 3757 } 3758 } 3759 xmit_packet(&x); 3760 } 3761 A.5.4. access() 3763 /* 3764 * access() - determine access restrictions 3765 */ 3766 int 3767 access( 3768 struct r *r /* receive packet pointer */ 3769 ) 3770 { 3771 /* 3772 * The access control list is an ordered set of tuples 3773 * consisting of an address, mask and restrict word containing 3774 * defined bits. The list is searched for the first match on the 3775 * source address (r->srcaddr) and the associated restrict word 3776 * is returned. 
3777 */ 3778 return (/* access bits */ 0); 3779 } 3781 A.5.5. System Process 3783 A.5.5.1. clock_select() 3785 /* 3786 * clock_select() - find the best clocks 3787 */ 3788 void 3789 clock_select() { 3790 struct p *p, *osys; /* peer structure pointers */ 3791 double low, high; /* correctness interval extents */ 3792 int allow, found, chime; /* used by intersecion algorithm */ 3793 int n, i, j; 3795 /* 3796 * We first cull the falsetickers from the server population, 3797 * leaving only the truechimers. The correctness interval for 3798 * association p is the interval from offset - root_dist() to 3799 * offset + root_dist(). The object of the game is to find a 3800 * majority clique; that is, an intersection of correctness 3801 * intervals numbering more than half the server population. 3802 * 3803 * First construct the chime list of tuples (p, type, edge) as 3804 * shown below, then sort the list by edge from lowest to 3805 * highest. 3806 */ 3807 osys = s.p; 3808 s.p = NULL; 3809 n = 0; 3810 while (fit(p)) { 3811 s.m[n].p = p; 3812 s.m[n].type = +1; 3813 s.m[n].edge = p->offset + root_dist(p); 3814 n++; 3815 s.m[n].p = p; 3816 s.m[n].type = 0; 3817 s.m[n].edge = p->offset; 3818 n++; 3819 s.m[n].p = p; 3820 s.m[n].type = -1; 3821 s.m[n].edge = p->offset - root_dist(p); 3822 n++; 3823 } 3825 /* 3826 * Find the largest contiguous intersection of correctness 3827 * intervals. Allow is the number of allowed falsetickers; found 3828 * is the number of midpoints. Note that the edge values are 3829 * limited to the range +-(2 ^ 30) < +-2e9 by the timestamp 3830 * calculations. 3831 */ 3832 low = 2e9; high = -2e9; 3833 for (allow = 0; 2 * allow < n; allow++) { 3835 /* 3836 * Scan the chime list from lowest to highest to find 3837 * the lower endpoint. 3838 */ 3839 found = 0; 3840 chime = 0; 3841 for (i = 0; i < n; i++) { 3842 chime -= s.m[i].type; 3843 if (chime >= n - found) { 3844 low = s.m[i].edge; 3845 break; 3846 } 3847 if (s.m[i].type == 0) 3848 found++; 3849 } 3851 /* 3852 * Scan the chime list from highest to lowest to find 3853 * the upper endpoint. 3854 */ 3855 chime = 0; 3856 for (i = n - 1; i >= 0; i--) { 3857 chime += s.m[i].type; 3858 if (chime >= n - found) { 3859 high = s.m[i].edge; 3860 break; 3861 } 3862 if (s.m[i].type == 0) 3863 found++; 3864 } 3866 /* 3867 * If the number of midpoints is greater than the number 3868 * of allowed falsetickers, the intersection contains at 3869 * least one truechimer with no midpoint. If so, 3870 * increment the number of allowed falsetickers and go 3871 * around again. If not and the intersection is 3872 * nonempty, declare success. 3873 */ 3874 if (found > allow) 3875 continue; 3877 if (high > low) 3878 break; 3879 } 3881 /* 3882 * Clustering algorithm. Construct a list of survivors (p, 3883 * metric) from the chime list, where metric is dominated first 3884 * by stratum and then by root distance. All other things being 3885 * equal, this is the order of preference. 3886 */ 3887 s.n = 0; 3888 for (i = 0; i < n; i++) { 3889 if (s.m[i].edge < low || s.m[i].edge > high) 3890 continue; 3892 p = s.m[i].p; 3893 s.v[n].p = p; 3894 s.v[n].metric = MAXDIST * p->stratum + root_dist(p); 3895 s.n++; 3896 } 3898 /* 3899 * There must be at least NSANE survivors to satisfy the 3900 * correctness assertions. Ordinarily, the Byzantine criteria 3901 * require four, survivors, but for the demonstration here, one 3902 * is acceptable. 
3903 */ 3904 if (s.n < NSANE) 3905 return; 3907 /* 3908 * For each association p in turn, calculate the selection 3909 * jitter p->sjitter as the square root of the sum of squares 3910 * (p->offset - q->offset) over all q associations. The idea is 3911 * to repeatedly discard the survivor with maximum selection 3912 * jitter until a termination condition is met. 3913 */ 3914 while (1) { 3915 struct p *p, *q, *qmax; /* peer structure pointers */ 3916 double max, min, dtemp; 3918 max = -2e9; min = 2e9; 3919 for (i = 0; i < s.n; i++) { 3920 p = s.v[i].p; 3921 if (p->jitter < min) 3922 min = p->jitter; 3923 dtemp = 0; 3924 for (j = 0; j < n; j++) { 3925 q = s.v[j].p; 3926 dtemp += SQUARE(p->offset - q->offset); 3927 } 3928 dtemp = SQRT(dtemp); 3929 if (dtemp > max) { 3930 max = dtemp; 3931 qmax = q; 3932 } 3933 } 3935 /* 3936 * If the maximum selection jitter is less than the 3937 * minimum peer jitter, then tossing out more survivors 3938 * will not lower the minimum peer jitter, so we might 3939 * as well stop. To make sure a few survivors are left 3940 * for the clustering algorithm to chew on, we also stop 3941 * if the number of survivors is less than or equal to 3942 * NMIN (3). 3943 */ 3944 if (max < min || n <= NMIN) 3945 break; 3947 /* 3948 * Delete survivor qmax from the list and go around 3949 * again. 3950 */ 3951 s.n--; 3952 } 3953 /* 3954 * Pick the best clock. If the old system peer is on the list 3955 * and at the same stratum as the first survivor on the list, 3956 * then don't do a clock hop. Otherwise, select the first 3957 * survivor on the list as the new system peer. 3958 */ 3959 if (osys->stratum == s.v[0].p->stratum) 3960 s.p = osys; 3961 else 3962 s.p = s.v[0].p; 3963 clock_update(s.p); 3964 } 3966 A.5.5.2. root_dist() 3968 /* 3969 * root_dist() - calculate root distance 3970 */ 3971 double 3972 root_dist( 3973 struct p *p /* peer structure pointer */ 3974 ) 3975 { 3977 /* 3978 * The root synchronization distance is the maximum error due to 3979 * all causes of the local clock relative to the primary server. 3980 * It is defined as half the total delay plus total dispersion 3981 * plus peer jitter. 3982 */ 3983 return (max(MINDISP, p->rootdelay + p->delay) / 2 + 3984 p->rootdisp + p->disp + PHI * (c.t - p->t) + p->jitter); 3985 } 3986 A.5.5.3. accept() 3988 /* 3989 * accept() - test if association p is acceptable for synchronization 3990 */ 3991 int 3992 accept( 3993 struct p *p /* peer structure pointer */ 3994 ) 3995 { 3996 /* 3997 * A stratum error occurs if (1) the server has never been 3998 * synchronized, (2) the server stratum is invalid. 3999 */ 4000 if (p->leap == NOSYNC || p->stratum >= MAXSTRAT) 4001 return (FALSE); 4003 /* 4004 * A distance error occurs if the root distance exceeds the 4005 * distance threshold plus an increment equal to one poll 4006 * interval. 4007 */ 4008 if (root_dist(p) > MAXDIST + PHI * LOG2D(s.poll)) 4009 return (FALSE); 4011 /* 4012 * A loop error occurs if the remote peer is synchronized to the 4013 * local peer or the remote peer is synchronized to the current 4014 * system peer. Note this is the behavior for IPv4; for IPv6 the 4015 * MD5 hash is used instead. 4016 */ 4017 if (p->refid == p->dstaddr || p->refid == s.refid) 4018 return (FALSE); 4020 /* 4021 * An unreachable error occurs if the server is unreachable. 4022 */ 4023 if (p->reach == 0) 4024 return (FALSE); 4026 return (TRUE); 4027 } 4029 A.5.5.4. 
clock_update() 4031 /* 4032 * clock_update() - update the system clock 4033 */ 4035 void 4036 clock_update( 4037 struct p *p /* peer structure pointer */ 4038 ) 4039 { 4040 double dtemp; 4042 /* 4043 * If this is an old update, for instance as the result of a 4044 * system peer change, avoid it. We never use an old sample or 4045 * the same sample twice. 4046 * 4047 if (s.t >= p->t) 4048 return; 4050 /* 4051 * Combine the survivor offsets and update the system clock; the 4052 * local_clock() routine will tell us the good or bad news. 4053 */ 4054 s.t = p->t; 4055 clock_combine(); 4056 switch (local_clock(p, s.offset)) { 4058 /* 4059 * The offset is too large and probably bogus. Complain to the 4060 * system log and order the operator to set the clock manually 4061 * within PANIC range. The reference implementation includes a 4062 * command line option to disable this check and to change the 4063 * panic threshold from the default 1000 s as required. 4064 */ 4065 case PANIC: 4066 exit (0); 4068 /* 4069 * The offset is more than the step threshold (0.125 s by 4070 * default). After a step, all associations now have 4071 * inconsistent time valurs, so they are reset and started 4072 * fresh. The step threshold can be changed in the reference 4073 * implementation in order to lessen the chance the clock might 4074 * be stepped backwards. However, there may be serious 4075 * consequences, as noted in the white papers at the NTP project 4076 * site. 4077 */ 4078 case STEP: 4079 while (/* all associations */ 0) 4080 clear(p, X_STEP); 4081 s.stratum = MAXSTRAT; 4082 s.poll = MINPOLL; 4083 break; 4085 /* 4086 * The offset was less than the step threshold, which is the 4087 * normal case. Update the system variables from the peer 4088 * variables. The lower clamp on the dispersion increase is to 4089 * avoid timing loops and clockhopping when highly precise 4090 * sources are in play. The clamp can be changed from the 4091 * default .01 s in the reference implementation. 4092 */ 4093 case SLEW: 4094 s.leap = p->leap; 4095 s.stratum = p->stratum + 1; 4096 s.refid = p->refid; 4097 s.reftime = p->reftime; 4098 s.rootdelay = p->rootdelay + p->delay; 4099 dtemp = SQRT(SQUARE(p->jitter) + SQUARE(s.jitter)); 4100 dtemp += max(p->disp + PHI * (c.t - p->t) + 4101 fabs(p->offset), MINDISP); 4102 s.rootdisp = p->rootdisp + dtemp; 4103 break; 4105 /* 4106 * Some samples are discarded while, for instance, a direct 4107 * frequency measurement is being made. 4108 */ 4109 case IGNORE: 4110 break; 4111 } 4112 } 4113 A.5.5.5. clock_combine() 4115 /* 4116 * clock_combine() - combine offsets 4117 */ 4118 void 4119 clock_combine() 4120 { 4121 struct p *p; /* peer structure pointer */ 4122 double x, y, z, w; 4123 int i; 4125 /* 4126 * Combine the offsets of the clustering algorithm survivors 4127 * using a weighted average with weight determined by the root 4128 * distance. Compute the selection jitter as the weighted RMS 4129 * difference between the first survivor and the remaining 4130 * survivors. In some cases the inherent clock jitter can be 4131 * reduced by not using this algorithm, especially when frequent 4132 * clockhopping is involved. The reference implementation can be 4133 * configured to avoid this algorithm by designating a preferred 4134 * peer. 
4148 A.5.5.6. local_clock() 4150 /* 4151 * Clock discipline parameters and constants 4152 */ 4153 #define STEPT .128 /* step threshold (s) */ 4154 #define WATCH 900 /* stepout threshold (s) */ 4155 #define PANICT 1000 /* panic threshold (s) */ 4156 #define PLL 65536 /* PLL loop gain */ 4157 #define FLL (MAXPOLL + 1) /* FLL loop gain */ 4158 #define AVG 4 /* parameter averaging constant */ 4159 #define ALLAN 1500 /* compromise Allan intercept (s) */ 4160 #define LIMIT 30 /* poll-adjust threshold */ 4161 #define MAXFREQ 500e-6 /* frequency tolerance (500 PPM) */ 4162 #define PGATE 4 /* poll-adjust gate */ 4164 /* 4165 * local_clock() - discipline the local clock 4166 */ 4167 int /* return code */ 4168 local_clock( 4169 struct p *p, /* peer structure pointer */ 4170 double offset /* clock offset from combine() */ 4171 ) 4172 { 4174 double freq; /* frequency */ 4175 double mu; /* interval since last update */ 4176 int rval; 4177 double etemp, dtemp; 4179 /* 4180 * If the offset is too large, give up and go home. 4181 */ 4182 if (fabs(offset) > PANICT) 4183 return (PANIC); 4185 /* 4186 * Clock state machine transition function. This is where the 4187 * action is and defines how the system reacts to large time 4188 * and frequency errors. There are two main regimes: when the 4189 * offset exceeds the step threshold and when it does not. 4190 */ 4191 rval = SLEW; 4192 mu = p->t - s.t; 4193 freq = 0; 4194 if (fabs(offset) > STEPT) { 4195 switch (c.state) { 4197 /* 4198 * In S_SYNC state we ignore the first outlier and 4199 * switch to S_SPIK state. 4200 */ 4201 case SYNC: 4202 c.state = SPIK; 4203 return (rval); 4205 /* 4206 * In S_FREQ state we ignore outliers and inliers. At 4207 * the first outlier after the stepout threshold, 4208 * compute the apparent frequency correction and step 4209 * the time. 4210 */ 4211 case FREQ: 4212 if (mu < WATCH) 4213 return (IGNORE); 4215 freq = (offset - c.offset) / mu; 4216 /* fall through to S_SPIK */ 4218 /* 4219 * In S_SPIK state we ignore succeeding outliers until 4220 * either an inlier is found or the stepout threshold is 4221 * exceeded. 4222 */ 4223 case SPIK: 4224 if (mu < WATCH) 4225 return (IGNORE); 4227 /* fall through to default */ 4229 /* 4230 * We get here by default in S_NSET and S_FSET states 4231 * and from above in S_FREQ state. Step the time and 4232 * clamp down the poll interval. 4233 * 4234 * In S_NSET state an initial frequency correction is 4235 * not available, usually because the frequency file has 4236 * not yet been written. Since the time is outside the 4237 * capture range, the clock is stepped. The frequency 4238 * will be set directly following the stepout interval. 4239 * 4240 * In S_FSET state the initial frequency has been set 4241 * from the frequency file. Since the time is outside 4242 * the capture range, the clock is stepped immediately, 4243 * rather than after the stepout interval. Operators get 4244 * nervous if it takes 17 minutes to set the clock for 4245 * the first time. 4246 * 4247 * In S_SPIK state the stepout threshold has expired and 4248 * the phase is still above the step threshold.
Note 4249 * that a single spike greater than the step threshold 4250 * is always suppressed, even at the longer poll 4251 * intervals. 4252 */ 4253 default: 4255 /* 4256 * This is the kernel set time function, usually 4257 * implemented by the Unix settimeofday() system 4258 * call. 4259 */ 4260 step_time(offset); 4261 c.count = 0; 4262 s.poll = MINPOLL; 4263 rval = STEP; 4264 if (c.state == NSET) { 4265 rstclock(FREQ, p->t, 0); 4266 return (rval); 4267 } 4268 break; 4269 } 4270 rstclock(SYNC, p->t, 0); 4271 } else { 4273 /* 4274 * Compute the clock jitter as the RMS of exponentially 4275 * weighted offset differences. This is used by the 4276 * poll-adjust code. 4277 */ 4278 etemp = SQUARE(c.jitter); 4279 dtemp = SQUARE(max(fabs(offset - c.last), 4280 LOG2D(s.precision))); 4281 c.jitter = SQRT(etemp + (dtemp - etemp) / AVG); 4282 switch (c.state) { 4284 /* 4285 * In S_NSET state this is the first update received and 4286 * the frequency has not been initialized. The first 4287 * thing to do is directly measure the oscillator 4288 * frequency. 4289 */ 4290 case NSET: 4291 rstclock(FREQ, p->t, offset); 4292 return (IGNORE); 4294 /* 4295 * In S_FSET state this is the first update and the 4296 * frequency has been initialized. Adjust the phase, but 4297 * don't adjust the frequency until the next update. 4298 */ 4299 case FSET: 4300 rstclock(SYNC, p->t, offset); 4301 break; 4303 /* 4304 * In S_FREQ state ignore updates until the stepout 4305 * threshold. After that, correct the phase and 4306 * frequency and switch to S_SYNC state. 4307 */ 4308 case FREQ: 4309 if (c.t - s.t < WATCH) 4310 return (IGNORE); 4312 freq = (offset - c.offset) / mu; 4313 break; 4315 /* 4316 * We get here by default in S_SYNC and S_SPIK states. 4317 * Here we compute the frequency update due to PLL and 4318 * FLL contributions. 4319 */ 4320 default: 4322 /* 4323 * The FLL and PLL frequency gain constants 4324 * depend on the poll interval and Allan 4325 * intercept. The FLL is not used below one-half 4326 * the Allan intercept. Above that the loop gain 4327 * increases in steps to 1 / AVG. 4328 */ 4329 if (LOG2D(s.poll) > ALLAN / 2) { 4330 etemp = FLL - s.poll; 4331 if (etemp < AVG) 4332 etemp = AVG; 4333 freq += (offset - c.offset) / (max(mu, 4334 ALLAN) * etemp); 4335 } 4337 /* 4338 * For the PLL the integration interval 4339 * (numerator) is the minimum of the update 4340 * interval and poll interval. This allows 4341 * oversampling, but not undersampling. 4342 */ 4343 etemp = min(mu, LOG2D(s.poll)); 4344 dtemp = 4 * PLL * LOG2D(s.poll); 4345 freq += offset * etemp / (dtemp * dtemp); 4346 rstclock(SYNC, p->t, offset); 4347 break; 4348 } 4349 } 4351 /* 4352 * Calculate the new frequency and frequency stability (wander). 4354 * Compute the clock wander as the RMS of exponentially weighted 4355 * frequency differences. This is not used directly, but can, 4356 * along with the jitter, be a highly useful monitoring and 4357 * debugging tool. 4358 */ 4359 freq += c.freq; 4360 c.freq = max(min(MAXFREQ, freq), -MAXFREQ); 4361 etemp = SQUARE(c.wander); 4362 dtemp = SQUARE(freq); 4363 c.wander = SQRT(etemp + (dtemp - etemp) / AVG); 4365 /* 4366 * Here we adjust the poll interval by comparing the current 4367 * offset with the clock jitter. If the offset is less than the 4368 * clock jitter times a constant, then the averaging interval is 4369 * increased, otherwise it is decreased. A bit of hysteresis 4370 * helps calm the dance. Works best using burst mode.
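* For example, with the constants defined above (PGATE 4, LIMIT 30) and a poll exponent of 6, roughly six consecutive updates with offset below four times the clock jitter are needed before the poll exponent is raised, while a single update above that gate pulls the counter back by twice the poll exponent.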
4371 */ 4372 if (fabs(c.offset) < PGATE * c.jitter) { 4373 c.count += s.poll; 4374 if (c.count > LIMIT) { 4375 c.count = LIMIT; 4376 if (s.poll < MAXPOLL) { 4377 c.count = 0; 4378 s.poll++; 4379 } 4380 } 4381 } else { 4382 c.count -= s.poll << 1; 4383 if (c.count < -LIMIT) { 4384 c.count = -LIMIT; 4385 if (s.poll > MINPOLL) { 4386 c.count = 0; 4387 s.poll--; 4388 } 4389 } 4390 } 4391 return (rval); 4392 } 4393 A.5.5.7. rstclock() 4395 /* 4396 * rstclock() - clock state machine 4397 */ 4398 void 4399 rstclock( 4400 int state, /* new state */ 4401 double t, /* new update time */ 4402 double offset /* new offset */ 4403 ) 4404 { 4405 /* 4406 * Enter new state and set state variables. Note that we use the 4407 * time of the last clock filter sample, which must be earlier 4408 * than the current time. 4409 */ 4410 c.state = state; 4411 c.last = c.offset = offset; 4412 s.t = t; 4413 } 4415 A.5.6. Clock Adjust Process 4417 A.5.6.1. clock_adjust() 4419 /* 4420 * clock_adjust() - runs at one-second intervals 4421 */ 4422 void 4423 clock_adjust() { 4424 double dtemp; 4426 /* 4427 * Update the process time c.t. Also increase the dispersion 4428 * since the last update. In contrast to NTPv3, NTPv4 does not 4429 * declare unsynchronized after one day, since the dispersion 4430 * threshold serves this function. When the dispersion exceeds 4431 * MAXDIST (1 s), the server is considered unfit for 4432 * synchronization. 4433 */ 4434 c.t++; 4435 s.rootdisp += PHI; 4437 /* 4438 * Implement the phase and frequency adjustments. The gain 4439 * factor (denominator) is not allowed to increase beyond the 4440 * Allan intercept. It doesn't make sense to average phase noise 4441 * beyond this point and it helps to damp residual offset at the 4442 * longer poll intervals. 4443 */ 4444 dtemp = c.offset / (PLL * min(LOG2D(s.poll), ALLAN)); 4445 c.offset -= dtemp; 4447 /* 4448 * This is the kernel adjust time function, usually implemented 4449 * by the Unix adjtime() system call. 4450 */ 4451 adjust_time(c.freq + dtemp); 4453 /* 4454 * Peer timer. Call the poll() routine when the poll timer 4455 * expires. 4456 */ 4457 while (/* all associations */ 0) { 4458 struct p *p; /* dummy peer structure pointer */ 4460 if (c.t >= p->nextdate) 4461 poll(p); 4462 } 4464 /* 4465 * Once per hour write the clock frequency to a file. 4466 */ 4467 if (c.t % 3600 == 3599) 4468 /* write c.freq to file */ 0; 4469 } 4471 A.5.7. Poll Process 4473 /* 4474 * Poll process parameters and constants 4475 */ 4476 #define UNREACH 12 /* unreach counter threshold */ 4477 #define BCOUNT 8 /* packets in a burst */ 4478 #define BTIME 2 /* burst interval (s) */ 4480 A.5.7.1. poll() 4482 /* 4483 * poll() - determine when to send a packet for association p 4484 */ 4485 void 4486 poll( 4487 struct p *p /* peer structure pointer */ 4488 ) 4490 { 4491 int hpoll; 4492 int oreach; 4494 /* 4495 * This routine is called when the current time c.t catches up 4496 * to the next poll time p->nextdate. The value p->outdate is 4497 * the last time this routine was executed. The poll_update() 4498 * routine determines the next execution time p->nextdate. 4499 * 4500 * If broadcasting, just do it, but only if we are synchronized. 4501 */ 4502 hpoll = p->hpoll; 4503 if (p->hmode == M_BCST) { 4504 p->outdate = c.t; 4505 if (s.p != NULL) 4506 peer_xmit(p); 4507 poll_update(p, hpoll); 4508 return; 4509 } 4511 /* 4512 * If manycasting, start with ttl = 1. The ttl is increased by 4513 * one for each poll until MAXCLOCK servers have been found or 4514 * ttl reaches TTLMAX.
If reaching MAXCLOCK, stop polling until 4515 * the number of servers falls below MINCLOCK, then start all 4516 * over. 4517 */ 4518 if (p->hmode == M_CLNT && p->flags & P_MANY) { 4519 p->outdate = c.t; 4520 if (p->unreach > BEACON) { 4521 p->unreach = 0; 4522 p->ttl = 1; 4523 peer_xmit(p); 4524 } else if (s.n < MINCLOCK) { 4525 if (p->ttl < TTLMAX) 4526 p->ttl++; 4527 peer_xmit(p); 4528 } 4529 p->unreach++; 4530 poll_update(p, hpoll); 4531 return; 4532 } 4533 if (p->burst == 0) { 4535 /* 4536 * We are not in a burst. Shift the reachability 4537 * register to the left. Hopefully, some time before the 4538 * next poll a packet will arrive and set the rightmost 4539 * bit. 4540 */ 4541 oreach = p->reach; 4542 p->outdate = c.t; 4543 p->reach <<= 1; 4544 if (!(p->reach & 0x7)) 4545 clock_filter(p, 0, 0, MAXDISP); 4546 if (!p->reach) { 4548 /* 4549 * The server is unreachable, so bump the 4550 * unreach counter. If the unreach threshold has 4551 * been reached, double the poll interval to 4552 * minimize wasted network traffic. Send a burst 4553 * only if enabled and the unreach threshold has 4554 * not been reached. 4555 */ 4556 if (p->flags & P_IBURST && p->unreach == 0) { 4557 p->burst = BCOUNT; 4558 } else if (p->unreach < UNREACH) 4559 p->unreach++; 4560 else 4561 hpoll++; 4562 p->unreach++; 4563 } else { 4565 /* 4566 * The server is reachable. Set the poll 4567 * interval to the system poll interval. Send a 4568 * burst only if enabled and the peer is fit. 4569 */ 4570 p->unreach = 0; 4571 hpoll = s.poll; 4572 if (p->flags & P_BURST && fit(p)) 4573 p->burst = BCOUNT; 4574 } 4575 } else { 4577 /* 4578 * If in a burst, count it down. When the reply comes 4579 * back the clock_filter() routine will call 4580 * clock_select() to process the results of the burst. 4581 */ 4582 p->burst--; 4583 } 4585 /* 4586 * Do not transmit if in broadcast client mode. 4587 */ 4588 if (p->hmode != M_BCLN) 4589 peer_xmit(p); 4590 poll_update(p, hpoll); 4591 } 4593 A.5.7.2. poll_update() 4594 /* 4595 * poll_update() - update the poll interval for association p 4596 * 4597 * Note: This routine is called by both the packet() and poll() routines. 4598 * Since the packet() routine is executed when a network packet arrives 4599 * and the poll() routine is executed as the result of timeout, a 4600 * potential race can occur, possibly causing an incorrect interval for 4601 * the next poll. This is considered so unlikely as to be negligible. 4602 */ 4603 void 4604 poll_update( 4605 struct p *p, /* peer structure pointer */ 4606 int poll /* poll interval (log2 s) */ 4607 ) 4608 { 4609 /* 4610 * This routine is called by both the poll() and packet() 4611 * routines to determine the next poll time. If within a burst 4612 * the poll interval is two seconds. Otherwise, it is the 4613 * minimum of the host poll interval and peer poll interval, but 4614 * not greater than MAXPOLL and not less than MINPOLL. The 4615 * design ensures that a longer interval can be preempted by a 4616 * shorter one if required for rapid response. 4617 */ 4618 p->hpoll = max(min(MAXPOLL, poll), MINPOLL); 4619 if (p->burst > 0) { 4620 if (p->nextdate != c.t) 4621 return; 4622 else 4623 p->nextdate += BTIME; 4624 } else { 4626 /* 4627 * While not shown here, the reference implementation 4628 * randomizes the poll interval by a small factor. 4629 */ 4630 p->nextdate = p->outdate + (1 << max(min(p->ppoll, 4631 p->hpoll), MINPOLL)); 4632 } 4634 /* 4635 * It might happen that the due time has already passed. If so, 4636 * make it one second in the future. 4637 */ 4638 if (p->nextdate <= c.t) 4639 p->nextdate = c.t + 1; 4640 }
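To make the interval arithmetic above concrete, the following minimal, standalone sketch (not part of the skeleton) shows how the log2 poll exponents are clamped and converted to the next poll time in the manner of poll_update(); the constants assume the values defined earlier in this appendix (MINPOLL 4, MAXPOLL 17) and the sample values are hypothetical.

   #include <stdio.h>

   #define MINPOLL 4                      /* minimum poll exponent (16 s) */
   #define MAXPOLL 17                     /* maximum poll exponent (36 h) */

   int
   main(void)
   {
           int hpoll = 10, ppoll = 6;     /* host and peer poll exponents */
           unsigned long outdate = 1000;  /* time of last poll (s) */
           int poll;

           /* clamp the host exponent to [MINPOLL, MAXPOLL], take the
              smaller of the host and peer exponents, then clamp again
              at MINPOLL, mirroring the expressions in poll_update() */
           poll = hpoll;
           if (poll > MAXPOLL)
                   poll = MAXPOLL;
           if (poll < MINPOLL)
                   poll = MINPOLL;
           if (ppoll < poll)
                   poll = ppoll;
           if (poll < MINPOLL)
                   poll = MINPOLL;
           printf("next poll %lu s after %lu, at %lu\n",
               1UL << poll, outdate, outdate + (1UL << poll));
           return (0);
   }

For these values the effective exponent is 6, so the next poll is scheduled 64 s after the previous one.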
4641 A.5.7.3. peer_xmit() 4642 /* 4643 * peer_xmit() - transmit a packet for association p 4644 */ 4645 void 4646 peer_xmit( 4647 struct p *p /* peer structure pointer */ 4648 ) 4649 { 4650 struct x x; /* transmit packet */ 4652 /* 4653 * Initialize header and transmit timestamp 4654 */ 4655 x.srcaddr = p->dstaddr; 4656 x.dstaddr = p->srcaddr; 4657 x.leap = s.leap; 4658 x.version = p->version; 4659 x.mode = p->hmode; 4660 if (s.stratum == MAXSTRAT) 4661 x.stratum = 0; 4662 else 4663 x.stratum = s.stratum; 4664 x.poll = p->hpoll; 4665 x.precision = s.precision; 4666 x.rootdelay = D2FP(s.rootdelay); 4667 x.rootdisp = D2FP(s.rootdisp); 4668 x.refid = s.refid; 4669 x.reftime = s.reftime; 4670 x.org = p->org; 4671 x.rec = p->rec; 4672 x.xmt = get_time(); 4673 p->xmt = x.xmt; 4675 /* 4676 * If the key ID is nonzero, send a valid MAC using the key ID 4677 * of the association and the key in the local key cache. If 4678 * something breaks, like a missing trusted key, don't send the 4679 * packet; just reset the association and stop until the problem 4680 * is fixed. 4681 */ 4682 if (p->keyid) 4683 if (/* p->keyid invalid */ 0) { 4684 clear(p, X_NKEY); 4685 return; 4686 } 4687 x.mac = md5(p->keyid); 4688 xmit_packet(&x); 4689 } 4690 Authors' Addresses 4692 Jack Burbank (editor) 4693 The Johns Hopkins University Applied Physics Laboratory 4694 11100 Johns Hopkins Road 4695 Laurel, MD 20723-6099 4696 US 4698 Phone: +1 443 778 7127 4699 Email: jack.burbank@jhuapl.edu 4701 William Kasch (editor) 4702 The Johns Hopkins University Applied Physics Laboratory 4703 11100 Johns Hopkins Road 4704 Laurel, MD 20723-6099 4705 US 4707 Phone: +1 443 778 7463 4708 Email: william.kasch@jhuapl.edu 4710 Jim Martin (editor) 4711 The Daedelus Group 4712 721 Alameda de las Pulgas 4713 Belmont, CA 94002 4714 US 4716 Phone: +1 408 799-2788 4717 Email: jim@daedelus.com 4719 Dr. David L. Mills 4720 University of Delaware 4721 Newark, DE 19716 4722 US 4724 Phone: +1 302 831 8247 4725 Email: mills@udel.edu 4727 Full Copyright Statement 4729 Copyright (C) The IETF Trust (2007). 4731 This document is subject to the rights, licenses and restrictions 4732 contained in BCP 78, and except as set forth therein, the authors 4733 retain all their rights. 4735 This document and the information contained herein are provided on an 4736 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 4737 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 4738 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 4739 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 4740 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 4741 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 4743 Intellectual Property 4745 The IETF takes no position regarding the validity or scope of any 4746 Intellectual Property Rights or other rights that might be claimed to 4747 pertain to the implementation or use of the technology described in 4748 this document or the extent to which any license under such rights 4749 might or might not be available; nor does it represent that it has 4750 made any independent effort to identify any such rights. Information 4751 on the procedures with respect to rights in RFC documents can be 4752 found in BCP 78 and BCP 79.
4754 Copies of IPR disclosures made to the IETF Secretariat and any 4755 assurances of licenses to be made available, or the result of an 4756 attempt made to obtain a general license or permission for the use of 4757 such proprietary rights by implementers or users of this 4758 specification can be obtained from the IETF on-line IPR repository at 4759 http://www.ietf.org/ipr. 4761 The IETF invites any interested party to bring to its attention any 4762 copyrights, patents or patent applications, or other proprietary 4763 rights that may cover technology that may be required to implement 4764 this standard. Please address the information to the IETF at 4765 ietf-ipr@ietf.org. 4767 Acknowledgment 4769 Funding for the RFC Editor function is provided by the IETF 4770 Administrative Support Activity (IASA).