idnits 2.17.1 draft-ietf-ntp-ntpv4-proto-06.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 19. -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on line 4734. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 4745. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 4752. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 4758. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 119 instances of weird spacing in the document. Is it really formatted ragged-right, rather than justified? == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the document. == There are 2 instances of lines with multicast IPv4 addresses in the document. If these are generic example addresses, they should be changed to use the 233.252.0.x range defined in RFC 5771 == The 'Obsoletes: ' line in the draft header should list only the _numbers_ of the RFCs which will be obsoleted by this document (if approved); it should not include the word 'RFC' in the list. -- The draft header indicates that this document obsoletes RFC1305, but the abstract doesn't seem to directly say this. It does mention RFC1305 though, so this could be OK. -- The draft header indicates that this document obsoletes RFC4330, but the abstract doesn't seem to mention this, which it should. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust Copyright Line does not match the current year == Line 2655 has weird spacing: '... ipaddr srcad...' == Line 2656 has weird spacing: '... ipaddr dstad...' == Line 2657 has weird spacing: '... char ver...' == Line 2658 has weird spacing: '... char lea...' == Line 2659 has weird spacing: '... char mod...' == (114 more instances...) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (May 24, 2007) is 6176 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: 'NSTAGE' on line 3528 -- Looks like a reference, but probably isn't: 'NMAX' on line 2800 == Missing Reference: '0' is mentioned on line 4135, but not defined -- Obsolete informational reference (is this intentional?): RFC 1305 (ref. '1') (Obsoleted by RFC 5905) -- Obsolete informational reference (is this intentional?): RFC 4330 (ref. '2') (Obsoleted by RFC 5905) -- Obsolete informational reference (is this intentional?): RFC 2434 (ref. '11') (Obsoleted by RFC 5226) Summary: 2 errors (**), 0 flaws (~~), 11 warnings (==), 15 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 NTP WG J. Burbank, Ed. 3 Internet-Draft W. Kasch, Ed. 4 Obsoletes: RFC 4330, RFC 1305 JHU/APL 5 (if approved) J. Martin, Ed. 6 Intended status: Standards Track Daedelus 7 Expires: November 25, 2007 D. Mills 8 U. Delaware 9 May 24, 2007 11 Network Time Protocol Version 4 Protocol And Algorithms Specification 12 draft-ietf-ntp-ntpv4-proto-06 14 Status of this Memo 16 By submitting this Internet-Draft, each author represents that any 17 applicable patent or other IPR claims of which he or she is aware 18 have been or will be disclosed, and any of which he or she becomes 19 aware will be disclosed, in accordance with Section 6 of BCP 79. 21 Internet-Drafts are working documents of the Internet Engineering 22 Task Force (IETF), its areas, and its working groups. Note that 23 other groups may also distribute working documents as Internet- 24 Drafts. 26 Internet-Drafts are draft documents valid for a maximum of six months 27 and may be updated, replaced, or obsoleted by other documents at any 28 time. It is inappropriate to use Internet-Drafts as reference 29 material or to cite them other than as "work in progress." 31 The list of current Internet-Drafts can be accessed at 32 http://www.ietf.org/ietf/1id-abstracts.txt. 34 The list of Internet-Draft Shadow Directories can be accessed at 35 http://www.ietf.org/shadow.html. 37 This Internet-Draft will expire on November 25, 2007. 39 Copyright Notice 41 Copyright (C) The IETF Trust (2007). 43 Abstract 45 The Network Time Protocol (NTP) is widely used to synchronize 46 computer clocks in the Internet. This document describes NTP Version 47 4 (NTPv4), which is backwards compatible with NTP Version 3 (NTPv3) 48 described in RFC 1305, as well as previous versions of the protocol. 50 It includes a modified protocol header to accommodate the Internet 51 Protocol Version 6 address family. NTPv4 includes fundamental 52 improvements in the mitigation and discipline algorithms which extend 53 the potential accuracy to the tens of microseconds with modern 54 workstations and fast LANs. It includes a dynamic server discovery 55 scheme, so that in many cases specific server configuration is not 56 required. It corrects certain errors in the NTPv3 design and 57 implementation and includes an optional extension mechanism. 59 Table of Contents 61 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 62 1.1. Requirements Notation . . . . . . . . . . . . . . . . . . 5 63 2. Modes of Operation . . . . . . . . . . . . . . . . . . . . . 5 64 3. Protocol Modes . . . . . . . . . . . . . . . . . . . . . . . 6 65 3.1. Simple Network Time Protocol (SNTP) . . . . . . . . . . . 7 66 3.2. Dynamic Server Discovery . . . . . . . . . . . . . . . . 8 67 4. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 9 68 5. Implementation Model . . . . . . . . . . . . . . . . . . . . 10 69 6. Data Types . . . . . . . . . . . . . . . . . . . . . . . . . 13 70 7. Data Structures . . . . . . . . . . . . . . . . . . . . . . . 17 71 7.1. Structure Conventions . . . . . . . . . . . . . . . . . . 17 72 7.2. Global Parameters . . . . . . . . . . . . . . . . . . . . 17 73 7.3. Packet Header Variables . . . . . . . . . . . . . . . . . 18 74 7.4. The Kiss-o'-Death Packet . . . . . . . . . . . . . . . . 24 75 7.5. NTP Extension Field Format . . . . . . . . . . . . . . . 25 76 8. On Wire Protocol . . . . . . . . . . . . . . . . . . . . . . 27 77 9. Peer Process . . . . . . . . . . . . . . . . . . . . . . . . 31 78 9.1. Peer Process Variables . . . . . . . . . . . . . . . . . 31 79 9.2. Peer Process Operations . . . . . . . . . . . . . . . . . 34 80 10. Clock Filter Algorithm . . . . . . . . . . . . . . . . . . . 38 81 11. System Process . . . . . . . . . . . . . . . . . . . . . . . 40 82 11.1. System Process Variables . . . . . . . . . . . . . . . . 40 83 11.2. System Process Operations . . . . . . . . . . . . . . . . 41 84 11.2.1. Selection Algorithm . . . . . . . . . . . . . . . . 44 85 11.2.2. Cluster Algorithm . . . . . . . . . . . . . . . . . 45 86 11.2.3. Combine Algorithm . . . . . . . . . . . . . . . . . 46 87 11.3. Clock Discipline Algorithm . . . . . . . . . . . . . . . 48 88 12. Clock Adjust Process . . . . . . . . . . . . . . . . . . . . 52 89 13. Poll Process . . . . . . . . . . . . . . . . . . . . . . . . 52 90 13.1. Poll Process Variables . . . . . . . . . . . . . . . . . 52 91 13.2. Poll Process Operations . . . . . . . . . . . . . . . . . 53 92 14. Security Considerations . . . . . . . . . . . . . . . . . . . 55 93 15. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 56 94 16. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 56 95 17. Informative References . . . . . . . . . . . . . . . . . . . 56 96 Appendix A. Code Skeleton . . . . . . . . . . . . . . . . . . . 57 97 A.1. Global Definitions . . . . . . . . . . . . . . . . . . . 58 98 A.1.1. Definitions, Constants, Parameters . . . . . . . . . 58 99 A.1.2. Packet Data Structures . . . . . . . . . . . . . . . 61 100 A.1.3. Association Data Structures . . . . . . . . . . . . 62 101 A.1.4. System Data Structures . . . . . . . . . . . . . . . 64 102 A.1.5. Local Clock Data Structures . . . . . . . . . . . . 65 103 A.1.6. Function Prototypes . . . . . . . . . . . . . . . . 65 104 A.2. Main Program and Utility Routines . . . . . . . . . . . . 66 105 A.3. Kernel Input/Output Interface . . . . . . . . . . . . . . 69 106 A.4. Kernel System Clock Interface . . . . . . . . . . . . . . 69 107 A.5. Peer Process . . . . . . . . . . . . . . . . . . . . . . 71 108 A.5.1. receive() . . . . . . . . . . . . . . . . . . . . . 72 109 A.5.2. clock_filter() . . . . . . . . . . . . . . . . . . . 79 110 A.5.3. fast_xmit() . . . . . . . . . . . . . . . . . . . . 83 111 A.5.4. access() . . . . . . . . . . . . . . . . . . . . . . 85 112 A.5.5. System Process . . . . . . . . . . . . . . . . . . . 85 113 A.5.6. Clock Adjust Process . . . . . . . . . . . . . . . . 99 114 A.5.7. Poll Process . . . . . . . . . . . . . . . . . . . . 100 115 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 107 116 Intellectual Property and Copyright Statements . . . . . . . . . 108 118 1. Introduction 120 This document defines the Network Time Protocol Version 4 (NTPv4), 121 which is widely used to synchronize system clocks among a set of 122 distributed time servers and clients. It describes the core 123 architecture, protocol, state machines, data structures and 124 algorithms. NTPv4 introduces new functionality to NTPv3, as 125 described in [1], and functionality expanded from SNTPv4 as described 126 in [2] (SNTPv4 is a subset of NTPv4). This document obsoletes [1], 127 and [2]. While certain minor changes have been made in some protocol 128 header fields, these do not affect the interoperability between NTPv4 129 and previous versions of NTP and SNTP. 131 The NTP subnet model includes a number of widely accessible primary 132 time servers synchronized by wire or radio to national standards. 133 The purpose of the NTP protocol is to convey timekeeping information 134 from these primary servers to secondary time servers and clients via 135 both private networks and the public Internet. Crafted algorithms 136 mitigate errors that may result from network disruptions, server 137 failures and possible hostile actions. Servers and clients are 138 configured such that values flow towards clients from the primary 139 servers at the root via branching secondary servers. 141 The NTPv4 design overcomes significant shortcomings in the NTPv3 142 design, corrects certain bugs and incorporates new features. In 143 particular, expanded NTP timestamp definitions encourage the use of 144 the floating double data type throughout the implementation. As a 145 result, the time resolution is better than one nanosecond and 146 frequency resolution is less than one nanosecond per second. 147 Additional improvements include a new clock discipline algorithm 148 which is more responsive to system clock hardware frequency 149 fluctuations. Typical primary servers using modern machines are 150 precise within a few tens of microseconds. Typical secondary servers 151 and clients on fast LANs are within a few hundred microseconds with 152 poll intervals up to 1024 seconds, which was the maximum with NTPv3. 153 With NTPv4, servers and clients are precise within a few tens of 154 milliseconds with poll intervals up to 36 hours. 156 The main body of this document describes the core protocol and data 157 structures necessary to interoperate between conforming 158 implementations. Appendix A contains additional detail in the form 159 of a skeleton program, including data structures and code segments 160 for the core algorithms as well as the mitigation algorithms used to 161 enhance reliability and accuracy. While the skeleton program and 162 other descriptions in this document apply to a particular 163 implementation, they are not intended as the only way the required 164 functions can be implemented. While the NTPv3 symmetric key 165 authentication scheme described in this document has been carried 166 over from NTPv3, the Autokey public key authentication scheme new to 167 NTPv4 is described in [3]. 169 The NTP protocol includes modes of operation described in Section 2 170 using data types described in Section 6 and data structures described 171 in Section 7. The implementation model described in Section 5 is 172 based on a threaded, multi-process architecture, although other 173 architectures could be used as well. The on-wire protocol described 174 in Section 8 is based on a returnable-time design which depends only 175 on measured clock offsets, but does not require reliable message 176 delivery. The synchronization subnet is a self-organizing, 177 hierarchical, master-slave network with synchronization paths 178 determined by a shortest-path spanning tree and defined metric. 179 While multiple masters (primary servers) may exist, there is no 180 requirement for an election protocol. 182 This document includes material from [4], which contains flow charts 183 and equations unsuited for RFC format. There is much additional 184 information in [5], including an extensive technical analysis and 185 performance assessment of the protocol and algorithms in this 186 document. The reference implementation is available at www.ntp.org. 188 The remainder of this document contains numerous variables and 189 mathematical expressions. Some variables take the form of Greek 190 characters, which are spelled out by their full case-sensitive name. 191 For example DELTA refers to the uppercase Greek character, while 192 delta refers to the lowercase character. Furthermore, subscripts are 193 denoted with '_', for example theta_i refers to the lowercase Greek 194 character theta with subscript i, or phonetically theta sub i. 196 1.1. Requirements Notation 198 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 199 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 200 document are to be interpreted as described in [6]. 202 2. Modes of Operation 204 An NTP implementation operates as a primary server, secondary server 205 or client. A primary server is synchronized directly to a reference 206 clock, such as a GPS receiver or telephone modem service. A client 207 synchronizes to one or more upstream servers, but does not provide 208 synchronization to dependent clients. A secondary server has one or 209 more upstream servers and one or more downstream servers or clients. 210 All servers and clients who are fully NTPv4 compliance MUST implement 211 the entire suite of algorithms described in this document. In order 212 to maintain stability in large NTP subnets, secondary servers MUST be 213 fully NTPv4 compliant. 215 3. Protocol Modes 217 There are three NTP protocol variants, symmetric, client/server and 218 broadcast. Each is associated with an association mode as shown in 219 Figure 1. In addition, persistent associations are mobilized upon 220 startup and are never demobilized. Ephemeral associations are 221 mobilized upon the arrival of a packet and are demobilized upon error 222 or timeout. 224 +-------------------+--------------+-------------+ 225 | Association Mode | Assoc. Mode | Packet Mode | 226 +-------------------+--------------+-------------+ 227 | Symmetric Active | 1 | 1 or 2 | 228 | Symmetric Passive | 2 | 1 | 229 | Client | 3 | 4 | 230 | Server | 4 | 3 | 231 | Broadcast Server | 5 | 5 | 232 | Broadcast Client | 6 | N/A | 233 +-------------------+--------------+-------------+ 235 Figure 1: Association and Packet Modes 237 In the client/server variant a persistent client sends client (mode 238 3) packets to a server, which returns server (mode 4) packets. 239 Servers provide synchronization to one or more clients, but do not 240 accept synchronization from them. A server can also be a reference 241 clock driver which obtains time directly from a standard source such 242 as a GPS receiver or telephone modem service. In this varient, 243 clients pull synchronization from servers. 245 In the symmetric variant a peer operates as both a server and client 246 using either a symmetric active or symmetric passive association. A 247 persistent symmetric active association sends symmetric active (mode 248 1) packets to a symmetric active peer association. Alternatively, an 249 ephemeral symmetric passive association can be mobilized upon arrival 250 of a symmetric active packet with matching no association. That 251 association sends symmetric passive (mode 2) packets and persists 252 until error or timeout. Peers both push and pull synchronization to 253 and from each other. For the purposes of this document, a peer 254 operates like a client, so references to client imply peer as well. 256 In the broadcast variant a persistent broadcast server association 257 sends periodic broadcast server (mode 5) packets which can be 258 received by multiple clients. Upon reception of a broadcast server 259 packet without a matching association, an ephemeral broadcast client 260 (mode 6) association is mobilized and persists until error or 261 timeout. It is useful to provide an initial volley where the client 262 operating in client mode exchanges several packets with the server, 263 so as to calibrate the propagation delay and to run the Autokey 264 security protocol, after which the client reverts to broadcast client 265 mode. A broadcast server pushes synchronization to clients and other 266 servers. 268 Following loosely the conventions established by the telephone 269 industry, the level of each server in the hierarchy is defined by a 270 stratum number. Primary servers are assigned stratum one; secondary 271 servers at each lower level are assigned stratum numbers one greater 272 than the preceding level. As the stratum number increases, its 273 accuracy degrades depending on the particular network path and system 274 clock stability. Mean errors, measured by synchronization distances, 275 increase approximately in proportion to stratum numbers and measured 276 roundtrip delay. 278 As a standard practice, timing network topology should be organized 279 to avoid timing loops and minimize the synchronization distance. In 280 NTP the subnet topology is determined using a variant of the Bellman- 281 Ford distributed routing algorithm, which computes the shortest- 282 distance spanning tree rooted on the primary servers. As a result of 283 this design, the algorithm automatically reorganizes the subnet, so 284 as to produce the most accurate and reliable time, even when there 285 are failures in the timing network. 287 3.1. Simple Network Time Protocol (SNTP) 289 Primary servers and clients complying with a subset of NTP, called 290 the Simple Network Time Protocol (SNTPv4) [2], do not need to 291 implement the mitigation algorithms described in Section 9 and 292 following sections. SNTP is intended for primary servers equipped 293 with a single reference clock, as well as for clients with a single 294 upstream server and no dependent clients. The fully developed NTPv4 295 implementation is intended for secondary servers with multiple 296 upstream servers and multiple downstream servers or clients. Other 297 than these considerations, NTP and SNTP servers and clients are 298 completely interoperable and can be mixed and matched in NTP subnets. 300 An SNTP primary server implementing the on-wire protocol described in 301 Section 8 has no upstream servers except a single reference clock. 302 In principle, it is indistinguishable from an NTP primary server that 303 has the mitigation algorithms and therefore capable of mitigating 304 between multiple reference clocks. 306 Upon receiving a client request, an SNTP primary server constructs 307 and sends the reply packet as described in Figure 2 of Section 9.2. 309 Note that the dispersion field in the packet header must be updated 310 as described in Section 5. 312 +-----------------------------------+ 313 | Packet Variable <-- Variable | 314 +-----------------------------------+ 315 | x.leap <-- s.leap | 316 | x.version <-- r.version | 317 | x.mode <-- 4 | 318 | x.stratum <-- s.stratum | 319 | x.poll <-- r.poll | 320 | x.precision <-- s.precision | 321 | x.rootdelay <-- s.rootdelay | 322 | x.rootdisp <-- s.rootdisp | 323 | x.refid <-- s.refid | 324 | x.reftime <-- s.reftime | 325 | x.org <-- r.xmt | 326 | x.rec <-- r.dst | 327 | x.xmt <-- clock | 328 | x.keyid <-- r.keyid | 329 | x.digest <-- md5 digest | 330 +-----------------------------------+ 332 Figure 2: fast_xmit Packet Header 334 A SNTP client implementing the on-wire protocol has a single server 335 and no dependent clients. It can operate with any subset of the NTP 336 on-wire protocol, the simplest approach using only the transmit 337 timestamp of the server packet and ignoring all other fields. 338 However, the additional complexity to implement the full on-wire 339 protocol is minimal so that a full implementation is encouraged. 341 3.2. Dynamic Server Discovery 343 There are two special associations, manycast client and manycast 344 server, which provide a dynamic server discovery function. There are 345 two types of manycast client associations, persistent and ephemeral. 346 The persistent manycast client sends client (mode 3) packets to a 347 designated IPv4 or IPv6 broadcast or multicast group address. 348 Designated manycast servers within range of the time-to-live (TTL) 349 field in the packet header listen for packets with that address. If 350 a server is suitable for synchronization, it returns an ordinary 351 server (mode 4) packet using the client's unicast address. Upon 352 receiving this packet, the client mobilizes an ephemeral client (mode 353 3) association. The ephemeral client association persists until 354 error or timeout. 356 A manycast client continues sending packets to search for a minimum 357 number of associations. It starts with a TTL equal to one and 358 continuously adding one to it until the minimum number of 359 associations is made or when the TTL reaches a maximum value. If the 360 TTL reaches its maximum value and yet not enough associations are 361 mobilized, the client stops transmission for a time-out period to 362 clear all associations, and then repeats the search cycle. If a 363 minimum number of associations has been mobilized, then the client 364 starts transmitting one packet per time-out period to maintain the 365 associations. 367 The ephemeral associations compete among themselves. As new 368 ephemeral associations are mobilized, the client runs the mitigation 369 algorithms described in Section 10 and Section 11.2 for the best 370 candidates out of the population, the remaining ephemeral 371 associations are timed out and demobilized. In this way the 372 population includes only the best and freshest candidates to 373 discipline the system clock. The reference implementation includes 374 intricate means to do this, but these are beyond the scope of this 375 document. 377 4. Definitions 379 A number of technical terms are defined in this section. A timescale 380 is a frame of reference where time is expressed as the value of a 381 monotonically increasing binary counter with an indefinite number of 382 bits. It counts in seconds and fractions of a second, when a decimal 383 point is employed. The Coordinated Universal Time (UTC) timescale 384 represents mean solar time as disseminated by national standards 385 laboratories. The system time is represented by the system clock 386 maintained by the hardware and operating system. The goal of the NTP 387 algorithms is to minimize both the time difference and frequency 388 difference between UTC and the system clock. When these differences 389 have been reduced below nominal tolerances, the system clock is said 390 to be synchronized to UTC. 392 The date of an event is the UTC time at which the event takes place. 393 Dates are ephemeral values designated with upper case T. Running time 394 is another timescale that is coincident to the synchronization 395 function of the NTP program. 397 A timestamp T(t) represents either the UTC date or time offset from 398 UTC at running time t. Which meaning is intended should be clear 399 from the context. Let T(t) be the time offset, R(t) the frequency 400 offset, D(t) the aging rate (first derivative of R(t) with respect to 401 t). Then, if T(t_0) is the UTC time offset determined at t = t_0, 402 the UTC time offset at time t is 403 T(t) = T(t_0) + R(t_0)(t-t_0) + 1/2 * D(t_0)(t-t_0)^2 + e, 405 where e is a stochastic error term discussed later in this document. 406 While the D(t) term is important when characterizing precision 407 oscillators, it is ordinarily neglected for computer oscillators. In 408 this document all time values are in seconds (s) and all frequency 409 values are in seconds-per-second (s/s). It is sometimes convenient 410 to express frequency offsets in parts-per-million (PPM), where 1 PPM 411 is equal to 10^(-6) seconds/second. 413 It is important in computer timekeeping applications to assess the 414 performance of the timekeeping function. The NTP performance model 415 includes four statistics which are updated each time a client makes a 416 measurement with a server. The offset (theta) represents the 417 maximum-likelihood time offset of the server clock relative to the 418 system clock. The delay (delta) represents the round trip delay 419 between the client and server. The dispersion (epsilon) represents 420 the maximum error inherent in the measurement. It increases at a 421 rate equal to the maximum disciplined system clock frequency 422 tolerance (PHI), typically 15 PPM. The jitter (psi) is defined as 423 the root-mean-square (RMS) average of the most recent offset 424 differences, represents the nominal error in estimating the offset. 426 While the theta, delta, epsilon, and psi statistics represent 427 measurements of the system clock relative to the each server clock 428 separately, the NTP protocol includes mechanisms to combine the 429 statistics of several servers to more accurately discipline and 430 calibrate the system clock. The system offset (THETA) represents the 431 maximum-likelihood offset estimate for the server population. The 432 system jitter (PSI) represents the nominal error in estimating the 433 system offset. The delta and epsilon statistics are accumulated at 434 each stratum level from the reference clock to produce the root delay 435 (DELTA) and root dispersion (EPSILON) statistics. The 436 synchronization distance (LAMBDA) equal to EPSILON + DELTA / 2 437 represents the maximum error due all causes. The detailed 438 formulations of these statistics are given in Section 11.2. They are 439 available to the dependent applications in order to assess the 440 performance of the synchronization function. 442 5. Implementation Model 444 Figure 3 shows the architecture of a typical, multi-threaded 445 implementation. It includes two processes dedicated to each server, 446 a peer process to receive messages from the server or reference clock 447 and a poll process to transmit messages to the server or reference 448 clock. 450 ..................................................................... 451 . Remote . Peer/Poll . System . Clock . 452 . Servers . Processes . Process .Discipline. 453 . . . . Process . 454 .+--------+. +-----------+. +------------+ . . 455 .| |->| |. | | . . 456 .|Server 1| |Peer/Poll 1|->| | . . 457 .| |<-| |. | | . . 458 .+--------+. +-----------+. | | . . 459 . . ^ . | | . . 460 . . | . | | . . 461 .+--------+. +-----------+. | | +-----------+. . 462 .| |->| |. | Selection |->| |. +------+ . 463 .|Server 2| |Peer/Poll 2|->| and | | Combine |->| Loop | . 464 .| |<-| |. | Cluster | | Algorithm |. |Filter| . 465 .+--------+. +-----------+. | Algorithms |->| |. +------+ . 466 . . ^ . | | +-----------+. | . 467 . . | . | | . | . 468 .+--------+. +-----------+. | | . | . 469 .| |->| |. | | . | . 470 .|Server 3| |Peer/Poll 3|->| | . | . 471 .| |<-| |. | | . | . 472 .+--------+. +-----------+. +------------+ . | . 473 ....................^.........................................|...... 474 | . V . 475 | . +-----+ . 476 +--------------------------------------| VFO | . 477 . +-----+ . 478 . Clock . 479 . Adjust . 480 . Process . 481 ............ 483 Figure 3: Implementation Model 485 These processes operate on a common data structure, called an 486 association, which contains the statistics described above along with 487 various other data described in Section 9. A client sends packets to 488 one or more servers and then processes returned packets when they are 489 received. The server interchanges source and destination addresses 490 and ports, overwrites certain fields in the packet and returns it 491 immediately (in the client/server mode) or at some time later (in the 492 symmetric modes). As each NTP message is received, the offset theta 493 between the peer clock and the system clock is computed along with 494 the associated statistics delta, epsilon and psi. 496 The system process includes the selection, cluster and combine 497 algorithms that mitigate among the various servers and reference 498 clocks to determine the most accurate and reliable candidates to 499 synchronize the system clock. The selection algorithm uses Byzantine 500 principles to discard the presumably incorrect candidates called 501 "falsetickers" from the incident population, leaving only good 502 candidates called "truechimers". A truechimer is a clock that 503 maintains timekeeping accuracy to a previously published and trusted 504 standard, while a falseticker is a clock that shows misleading or 505 inconsistent time. The cluster algorithm uses statistical principles 506 to find the most accurate set of truechimers. The combine algorithm 507 computes the final clock offset by statistically averaging the 508 surviving truechimers. 510 The clock discipline process is a system process that controls the 511 time and frequency of the system clock, here represented as a 512 variable frequency oscillator (VFO). Timestamps struck from the VFO 513 close the feedback loop which maintains the system clock time. 514 Associated with the clock discipline process is the clock adjust 515 process, which runs once each second to inject a computed time offset 516 and maintain constant frequency. The RMS average of past time offset 517 differences represents the nominal error or system clock jitter. The 518 RMS average of past frequency offset differences represents the 519 oscillator frequency stability or frequency wander. These terms are 520 given precise interpretation in Section 11.2. 522 A client sends messages to each server with a poll interval of 2^tau 523 seconds, as determined by the poll exponent tau. In NTPv4, tau 524 ranges from 4 (16 s) through 17 (36 h). The value of tau is 525 determined by the clock discipline algorithm to match the loop time 526 constant T_c = 2^tau. In client/server mode the server responds 527 immediately; however, in symmetric modes each of two peers manages 528 tau as a function of current system offset and system jitter, so may 529 not agree with the same value. It is important that the dynamic 530 behavior of the clock discipline algorithm be carefully controlled in 531 order to maintain stability in the NTP subnet at large. This 532 requires that the peers agree on a common tau equal to the minimum 533 poll exponent of both peers. The NTP protocol includes provisions to 534 properly negotiate this value. 536 The implementation model includes some means to set and adjust the 537 system clock. The operating system is assumed to provide two 538 functions, one to set the time directly, for example the Unix 539 settimeofday() function, and another to adjust the time in small 540 increments advancing or retarding the time by a designated amount, 541 for example the Unix adjtime() function. In this and following 542 references, parentheses following a name indicate reference to a 543 function rather than a simple variable. In the intended design the 544 clock discipline process uses the adjtime() function if the 545 adjustment is less than a designated threshold, and the 546 settimeofday() function if above the threshold. The manner in which 547 this is done and the value of the threshold as described in 548 Section 10. 550 6. Data Types 552 All NTP time values are represented in twos-complement format, with 553 bits numbered in big-endian (as described in Appendix A of [7]) 554 fashion from zero starting at the left, or high-order, position. 555 There are three NTP time formats, a 128-bit date format, a 64-bit 556 timestamp format and a 32-bit short format, as shown in Figure 4. 557 The 128-bit date format is used where sufficient storage and word 558 size are available. It includes a 64-bit signed seconds field 559 spanning 584 billion years and a 64-bit fraction field resolving .05 560 attosecond (i.e., 0.5e-18). For convenience in mapping between 561 formats, the seconds field is divided into a 32-bit Era Number field 562 and a 32-bit Era Offset field. Eras cannot be produced by NTP 563 directly, nor is there need to do so. When necessary, they can be 564 derived from external means, such as the filesystem or dedicated 565 hardware. 567 0 1 2 3 568 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 569 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 570 | Seconds | Fraction | 571 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 573 NTP Short Format 575 0 1 2 3 576 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 577 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 578 | Seconds | 579 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 580 | Fraction | 581 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 583 NTP Timestamp Format 585 0 1 2 3 586 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 587 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 588 | Era Number | 589 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 590 | Era Offset | 591 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 592 | | 593 | Fraction | 594 | | 595 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 597 NTP Date Format 599 Figure 4: NTP Time Formats 601 The 64-bit timestamp format is used in packet headers and other 602 places with limited word size. It includes a 32-bit unsigned seconds 603 field spanning 136 years and a 32-bit fraction field resolving 232 604 picoseconds. The 32-bit short format is used in delay and dispersion 605 header fields where the full resolution and range of the other 606 formats are not justified. It includes a 16-bit unsigned seconds 607 field and a 16-bit fraction field. 609 In the date and timestamp formats the prime epoch, or base date of 610 era 0, is 0 h 1 January 1900 UTC, when all bits are zero. It should 611 be noted that strictly speaking, UTC did not exist prior to 1 January 612 1972, but it is convenient to assume it has existed for all eternity, 613 even if all knowledge of historic leap seconds has been lost. Dates 614 are relative to the prime epoch; values greater than zero represent 615 times after that date; values less than zero represent times before 616 it. Note that the Era Offset field of the date format and the 617 Seconds field of the timestamp format have the same interpretation. 619 Timestamps are unsigned values and operations on them produce a 620 result in the same or adjacent eras. Era 0 includes dates from the 621 prime epoch to some time in 2036, when the timestamp field wraps 622 around and the base date for era 1 is established. In either format 623 a value of zero is a special case representing unknown or 624 unsynchronized time. Figure 5 shows a number of historic NTP dates 625 together with their corresponding Modified Julian Day (MJD), NTP era 626 and NTP timestamp. 628 +-------------+------------+-----+---------------+------------------+ 629 | Year | MJD | NTP | NTP Timestamp | Epoch | 630 | | | Era | Era Offset | | 631 +-------------+------------+-----+---------------+------------------+ 632 | 1 Jan -4712 | -2,400,001 | -49 | 1,795,583,104 | 1st day Julian | 633 | 1 Jan -1 | -679,306 | -14 | 139,775,744 | 2 BCE | 634 | 1 Jan 0 | -678,491 | -14 | 171,311,744 | 1 BCE | 635 | 1 Jan 1 | -678,575 | -14 | 202,939,144 | 1 CE | 636 | 4 Oct 1582 | -100,851 | -3 | 2,873,647,488 | Last day Julian | 637 | 15 Oct 1582 | -100,840 | -3 | 2,874,597,888 | First day | 638 | | | | | Gregorian | 639 | 31 Dec 1899 | 15019 | -1 | 4,294,880,896 | Last day NTP Era | 640 | | | | | -1 | 641 | 1 Jan 1900 | 15020 | 0 | 0 | First day NTP | 642 | | | | | Era 0 | 643 | 1 Jan 1970 | 40,587 | 0 | 2,208,988,800 | First day UNIX | 644 | 1 Jan 1972 | 41,317 | 0 | 2,272,060,800 | First day UTC | 645 | 31 Dec 1999 | 51,543 | 0 | 3,155,587,200 | Last day 20th | 646 | | | | | Century | 647 | 8 Feb 2036 | 64,731 | 1 | 63,104 | First day NTP | 648 | | | | | Era 1 | 649 +-------------+------------+-----+---------------+------------------+ 651 Figure 5: Interesting Historic NTP Dates 653 Let p be the number of significant bits in the second fraction. The 654 clock resolution is defined 2^(-p), in seconds. In order to minimize 655 bias and help make timestamps unpredictable to an intruder, the non- 656 significant bits should be set to an unbiased random bit string. The 657 clock precision is defined as the running time to read the system 658 clock, in seconds. Note that the precision defined in this way can 659 be larger or smaller than the resolution. The term rho, representing 660 the precision used in the protocol, is the larger of the two. 662 The only arithmetic operation permitted on dates and timestamps is 663 twos-complement subtraction, yielding a 127-bit or 63-bit signed 664 result. It is critical that the first-order differences between two 665 dates preserve the full 128-bit precision and the first-order 666 differences between two timestamps preserve the full 64-bit 667 precision. However, the differences are ordinarily small compared to 668 the seconds span, so they can be converted to floating double format 669 for further processing and without compromising the precision. 671 It is important to note that twos-complement arithmetic does not know 672 the difference between signed and unsigned values; only the 673 conditional branch instructions do. Thus, although the distinction 674 is made between signed dates and unsigned timestamps, they are 675 processed the same way. A perceived hazard with 64-bit timestamp 676 calculations spanning an era, such as possible in 2036, might result 677 in incorrect values. In point of fact, if the client is set within 678 68 years of the server before the protocol is started, correct values 679 are obtained even if the client and server are in adjacent eras. 681 Some time values are represented in exponent format, including the 682 precision, time constant and poll interval. These are in 8-bit 683 signed integer format in log2 (log to the base 2) seconds. The only 684 arithmetic operations permitted on them are increment and decrement. 685 For the purpose of this document and to simplify the presentation, a 686 reference to one of these variables by name means the exponentiated 687 value, e.g., the poll interval is 1024 s, while reference by name and 688 exponent means the actual value, e.g., the poll exponent is 10. 690 To convert system time in any format to NTP date and timestamp 691 formats requires that the number of seconds s from the prime epoch to 692 the system time be determined. To determine the integer era and 693 timestamp given s, 695 era = s / 2^(32) and timestamp = s - era * 2^(32), 697 which works for positive and negative dates. To determine s given 698 the era and timestamp, 700 s = era * 2^(32) + timestamp. 702 Converting between NTP and system time can be a little messy, but 703 beyond the scope of this document. Note that the number of days in 704 era 0 is one more than the number of days in most other eras and this 705 won't happen again until the year 2400 in era 3. 707 In the description of state variables to follow, explicit reference 708 to integer type implies a 32-bit unsigned integer. This simplifies 709 bounds checks, since only the upper limit needs to be defined. 710 Without explicit reference, the default type is 64-bit floating 711 double. Exceptions will be noted as necessary. 713 7. Data Structures 715 The NTP protocol state machines described in following sections are 716 defined using state variables and code fragments defined in 717 Appendix A. State variables are separated into classes according to 718 their function in packet headers, peer and poll processes, the system 719 process and the clock discipline process. Packet variables represent 720 the NTP header values in transmitted and received packets. Peer and 721 poll variables represent the contents of the association for each 722 server separately. System variables represent the state of the 723 server as seen by its dependent clients. Clock discipline variables 724 represent the internal workings of the clock discipline algorithm. 725 Additional parameters and variable classes are defined in Appendix A. 727 7.1. Structure Conventions 729 In order to distinguish between different variables of the same name 730 but used in different processes, the naming convention summarized in 731 Figure 6 is adopted. A receive packet variable v is a member of the 732 packet structure r with fully qualified name r.v. In a similar 733 manner x.v is a transmit packet variable, p.v is a peer variable, s.v 734 is a system variable and c.v is a clock discipline variable. There 735 is a set of peer variables for each association; there is only one 736 set of system and clock variables. 738 +------+---------------------------------+ 739 | Name | Description | 740 +------+---------------------------------+ 741 | r. | receive packet header variable | 742 | x. | transmit packet header variable | 743 | p. | peer/poll variable | 744 | s. | system variable | 745 | c. | clock discipline variable | 746 +------+---------------------------------+ 748 Figure 6: Prefix Conventions 750 7.2. Global Parameters 752 In addition to the variable classes a number of global parameters are 753 defined in this document, including those shown with values in 754 Figure 7. 756 +-----------+-------+----------------------------------+ 757 | Name | Value | Description | 758 +-----------+-------+----------------------------------+ 759 | PORT | 123 | NTP port number | 760 | VERSION | 4 | version number | 761 | TOLERANCE | 15e-6 | frequency tolerance PHI (s/s) | 762 | MINPOLL | 4 | minimum poll exponent (16 s) | 763 | MAXPOLL | 17 | maximum poll exponent (36 h) | 764 | MAXDISP | 16 | maximum dispersion (16 s) | 765 | MINDISP | .005 | minimum dispersion increment (s) | 766 | MAXDIST | 1 | distance threshold (1 s) | 767 | MAXSTRAT | 16 | maximum stratum number | 768 +-----------+-------+----------------------------------+ 770 Figure 7: Global Parameters 772 While these are the only global parameters needed in this document, a 773 larger collection is necessary in the skeleton and larger still for 774 any implementation. Appendix A.1.1 contains those used by the 775 skeleton for the mitigation algorithms, clock discipline algorithm 776 and related implementation-dependent functions. Some of these 777 parameter values are cast in stone, like the NTP port number assigned 778 by the IANA and the version number assigned NTPv4 itself. Others 779 like the frequency tolerance (also called PHI), involve an assumption 780 about the worst case behavior of a system clock once synchronized and 781 then allowed to drift when its sources have become unreachable. The 782 minimum and maximum parameters define the limits of state variables 783 as described in later sections of this document. 785 While shown with fixed values in this document, some implementations 786 may make them variables adjustable by configuration commands. For 787 instance, the reference implementation computes the value of 788 PRECISION as log2 of the minimum time in several iterations to read 789 the system clock. 791 7.3. Packet Header Variables 793 The most important state variables from an external point of view are 794 the packet header variables described in Figure 8 and below. The NTP 795 packet header consists of an integral number of 32-bit (4 octet) 796 words in network byte order. The packet format consists of three 797 components, the header itself, one or more optional extension fields 798 and an optional message authentication code (MAC). The header 799 component is identical to the NTPv3 header and previous versions. 800 The optional extension fields are used by the Autokey public key 801 cryptographic algorithms described in [3]. The optional MAC is used 802 by both Autokey and the symmetric key cryptographic algorithm 803 described in this report. 805 +-----------+------------+-----------------------+ 806 | Name | Formula | Description | 807 +-----------+------------+-----------------------+ 808 | leap | leap | leap indicator (LI) | 809 | version | version | version number (VN) | 810 | mode | mode | mode | 811 | stratum | stratum | stratum | 812 | poll | poll | poll exponent | 813 | precision | rho | precision exponent | 814 | rootdelay | delta_r | root delay | 815 | rootdisp | epsilon_r | root dispersion | 816 | refid | refid | reference ID | 817 | reftime | reftime | reference timestamp | 818 | org | T1 | origin timestamp | 819 | rec | T2 | receive timestamp | 820 | xmt | T3 | transmit timestamp | 821 | dst | T4 | destination timestamp | 822 | keyid | keyid | key ID | 823 | digest | digest | message digest | 824 +-----------+------------+-----------------------+ 826 Figure 8: Packet Header Variables 828 The NTP packet header follows the UDP and IP headers and the physical 829 header specific to the underlying transport network. Some fields use 830 multiple words and others are packed in smaller fields within a word. 831 The NTP packet header shown in Figure 9 has 12 words followed by 832 optional extension fields and finally an optional message 833 authentication code (MAC) consisting of the key identifier field and 834 message digest field. 836 0 1 2 3 837 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 838 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 839 |LI | VN |Mode | Stratum | Poll | Precision | 840 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 841 | Root Delay | 842 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 843 | Root Dispersion | 844 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 845 | Reference ID | 846 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 847 | | 848 + Reference Timestamp (64) + 849 | | 850 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 851 | | 852 + Origin Timestamp (64) + 853 | | 854 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 855 | | 856 + Receive Timestamp (64) + 857 | | 858 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 859 | | 860 + Transmit Timestamp (64) + 861 | | 862 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 863 | | 864 + Extension Field 1 (variable) + 865 | | 866 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 867 | | 868 + Extension Field 2 (variable) + 869 | | 870 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 871 | Key Identifier | 872 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 873 | | 874 | Message Digest (128) | 875 | | 876 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 878 Figure 9: Packet Header Format 880 The extension fields are used to add optional capabilities, for 881 example, the Autokey security protocol [3]. The extension field 882 format is presented in order that the packet can be parsed without 883 knowledge of the extension field functions. The MAC is used by both 884 Autokey and the symmetric key authentication scheme described in 885 Appendix A. 887 A list of the packet header variables is shown in Figure 8 and 888 described in detail below. Except for a minor variation when using 889 the IPv6 address family, these fields are backwards compatible with 890 NTPv3. The packet header fields apply to both transmitted packets (x 891 prefix) and received packets (r prefix). In Figure 9 the size of 892 some multiple-word fields is shown in bits if not the default 32 893 bits. The header extends from the beginning of the packet to the end 894 of the Transmit Timestamp field. 896 The fields and associated packet variables (in parentheses) are 897 interpreted as follows: 899 LI Leap Indicator (leap): 2-bit integer warning of an impending leap 900 second to be inserted or deleted in the last minute of the current 901 month with values defined in Figure 10. 903 +-------+-------------------------------------------------+ 904 | Value | Meaning | 905 +-------+-------------------------------------------------+ 906 | 0 | no warning | 907 | 1 | last minute of the day has 61 seconds | 908 | 2 | last minute of the day has 59 seconds | 909 | 3 | alarm condition (the clock is not synchronized) | 910 +-------+-------------------------------------------------+ 912 Figure 10: Leap Indicator 914 VN Version Number (version): 3-bit integer representing the NTP 915 version number, currently 4. 917 Mode (mode): 3-bit integer representing the mode, with values defined 918 in Figure 11. 920 +-------+--------------------------+ 921 | Value | Meaning | 922 +-------+--------------------------+ 923 | 0 | reserved | 924 | 1 | symmetric active | 925 | 2 | symmetric passive | 926 | 3 | client | 927 | 4 | server | 928 | 5 | broadcast | 929 | 6 | NTP control message | 930 | 7 | reserved for private use | 931 +-------+--------------------------+ 933 Figure 11: Association Modes 935 Stratum (stratum): 8-bit integer representing the stratum, with 936 values defined in Figure 12. 938 +--------+-----------------------------------------------------+ 939 | Value | Meaning | 940 +--------+-----------------------------------------------------+ 941 | 0 | unspecified or invalid | 942 | 1 | primary server (e.g., equipped with a GPS receiver) | 943 | 2-15 | secondary server (via NTP) | 944 | 16-255 | undefined | 945 +--------+-----------------------------------------------------+ 947 Figure 12: Packet Stratum 949 It is customary to map the stratum value 0 in received packets to 950 MAXSTRAT (16) in the peer variable p.stratum and to map p.stratum 951 values of MAXSTRAT or greater to 0 in transmitted packets. This 952 allows reference clocks, which normally appear at stratum 0, to be 953 conveniently mitigated using the same algorithms used for external 954 sources. 956 Poll: 8-bit signed integer representing the maximum interval between 957 successive messages, in log2 seconds. Suggested default limits for 958 minimum and maximum poll intervals are 6 and 10, respectively. 960 Precision: 8-bit signed integer representing the precision of the 961 system clock, in log2 seconds. For instance a value of -18 962 corresponds to a precision of about one microsecond. The precision 963 can be determined when the service first starts up as the minimum 964 time of several iterations to read the system clock. 966 Root Delay (rootdelay): Total round trip delay to the reference 967 clock, in NTP short format. 969 Root Dispersion (rootdisp): Total dispersion to the reference clock, 970 in NTP short format. 972 Reference ID (refid): 32-bit code identifying the particular server 973 or reference clock. The interpretation depends on the value in the 974 stratum field. For packet stratum 0 (unspecified or invalid) this is 975 a four-character ASCII string, called the kiss code, used for 976 debugging and monitoring purposes. For stratum 1 (reference clock) 977 this is a four-octet, left-justified, zero-padded ASCII string 978 assigned to the reference clock. While not specifically enumerated 979 in this document, the identifiers in Figure 13 have been used as 980 ASCII identifiers: 982 +------+----------------------------------------------------------+ 983 | ID | Clock Source | 984 +------+----------------------------------------------------------+ 985 | GOES | Geosynchronous Orbit Environment Satellite | 986 | GPS | Global Position System | 987 | GAL | Galileo Positioning System | 988 | PPS | Generic pulse-per-second | 989 | IRIG | Inter-Range Instrumentation Group | 990 | WWVB | LF Radio WWVB Ft. Collins, CO 60 kHz | 991 | DCF | LF Radio DCF77 Mainflingen, DE 77.5 kHz | 992 | HBG | LF Radio HBG Prangins, HB 75 kHz | 993 | MSF | LF Radio MSF Anthorn, UK 60 kHz (Rugby until April 2007) | 994 | JJY | LF Radio JJY Fukushima, JP 40 kHz, Saga, JP 60 kHz | 995 | LORC | MF Radio LORAN C 100 kHz | 996 | TDF | MF Radio Allouis, FR 162 kHz | 997 | CHU | HF Radio CHU Ottawa, Ontario | 998 | WWV | HF Radio WWV Ft. Collins, CO | 999 | WWVH | HF Radio WWVH Kauai, HI | 1000 | NIST | NIST telephone modem | 1001 | ACTS | NIST telephone modem | 1002 | USNO | USNO telephone modem | 1003 | PTB | European telephone modem | 1004 +------+----------------------------------------------------------+ 1006 Figure 13: Reference Identifiers 1008 Above stratum 1 (secondary servers and clients) this is the reference 1009 identifier of the server and is intended to detect timing loops. If 1010 using the IPv4 address family, the identifier is the four-octet IPv4 1011 address. If using the IPv6 address family, it is the first four 1012 octets of the MD5 hash of the IPv6 address. Note that, when using 1013 the IPv6 address family on an NTPv4 server with a NTPv3 client, the 1014 Reference Identifier field appears to be a random value and a timing 1015 loop might not be detected. 1017 Reference Timestamp: Time when the system clock was last set or 1018 corrected, in NTP timestamp format. 1020 Origin Timestamp (org): Time at the client when the request departed 1021 for the server, in NTP timestamp format. 1023 Receive Timestamp (rec): Time at the server when the request arrived 1024 from the client, in NTP timestamp format. 1026 Transmit Timestamp (xmt): Time at the server when the response left 1027 for the client, in NTP timestamp format. 1029 Destination Timestamp (dst): Time at the client when the reply 1030 arrived from the server, in NTP timestamp format. 1032 Note: Destination Timestamp field is not included as a header field; 1033 it is determined upon arrival of the packet and made available in the 1034 packet buffer data structure. 1036 The MAC consists of the Key Identifier followed by the Message 1037 Digest. The message digest, or cryptosum, is calculated as in [8] 1038 over all header and optional extension fields, but not the MAC 1039 itself. 1041 Key Identifier (keyid): 32-bit unsigned integer used by the client 1042 and server to designate a secret 128-bit MD5 key. 1044 Message Digest (digest): 128-bit bitstring computed by the keyed MD5 1045 message digest computed over all the words in the header and 1046 extension fields, but not the MAC itself. 1048 7.4. The Kiss-o'-Death Packet 1050 If the Stratum field is 0, which implies unspecified or invalid, the 1051 Reference Identifier field can be used to convey messages useful for 1052 status reporting and access control. These are called Kiss-o'-Death 1053 (KoD) packets and the ASCII messages they convey are called kiss 1054 codes. The KoD packets got their name because an early use was to 1055 tell clients to stop sending packets that violate server access 1056 controls. The kiss codes can provide useful information for an 1057 intelligent client, either NTPv4 or SNTPv4. Kiss codes are encoded 1058 in four-character ASCII strings left justified and zero filled. The 1059 strings are designed for character displays and log files. A list of 1060 the currently-defined kiss codes is given in Figure 14. Other than 1061 displaying the kiss code, KoD packets have no protocol significance 1062 and are discarded after inspection. 1064 +------+------------------------------------------------------------+ 1065 | Code | Meaning | 1066 +------+------------------------------------------------------------+ 1067 | ACST | The association belongs to a unicast server | 1068 | AUTH | Server authentication failed | 1069 | AUTO | Autokey sequence failed | 1070 | BCST | The association belongs to a broadcast server | 1071 | CRYP | Cryptographic authentication or identification failed | 1072 | DENY | Access denied by remote server | 1073 | DROP | Lost peer in symmetric mode | 1074 | RSTR | Access denied due to local policy | 1075 | INIT | The association has not yet synchronized for the first | 1076 | | time | 1077 | MCST | The association belongs to a dynamically discovered server | 1078 | NKEY | No key found. Either the key was never installed or is | 1079 | | not trusted | 1080 | RATE | Rate exceeded. The server has temporarily denied access | 1081 | | because the client exceeded the rate threshold | 1082 | RMOT | Alteration of association from a remote host running | 1083 | | ntpdc. | 1084 | STEP | A step change in system time has occurred, but the | 1085 | | association has not yet resynchronized | 1086 +------+------------------------------------------------------------+ 1088 Figure 14: Kiss Codes 1090 7.5. NTP Extension Field Format 1092 In NTPv4 one or more extension fields can be inserted after the 1093 header and before the MAC, which is always present when an extension 1094 field is present. Other than defining the field format, this 1095 document makes no use of the field contents. An extension field 1096 contains a request or response message in the format shown in 1097 Figure 15. 1099 0 1 2 3 1100 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1101 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1102 | Field Type | Length | 1103 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1104 | Association ID | 1105 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1106 | Timestamp | 1107 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1108 | Filestamp | 1109 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1110 | Value Length | 1111 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1112 . . 1113 . Value . 1114 . . 1115 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1116 | Signature Length | 1117 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1118 . . 1119 . Signature . 1120 . . 1121 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1122 | Padding (as needed) | 1123 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1125 Figure 15: Extension Field Format 1127 All extension fields are zero-padded to a word (4 octets) boundary. 1128 The Field Type field is specific to the defined function and is not 1129 elaborated here. While the minimum field length containing required 1130 fields is 4 words (16 octets), a maximum field length remains to be 1131 established. 1133 The Length field is a 16-bit integer which indicates the length of 1134 the entire extension field in octets, including the Padding field. 1136 The 32-bit Association ID field is set by clients to the value 1137 previously received from the server or 0 otherwise. The server sets 1138 the Association ID field when sending a response as a handle for 1139 subsequent exchanges. 1141 The Timestamp and Filestamp 32-bit fields carry the seconds field of 1142 an NTP timestamp. The Timestamp field establishes the signature 1143 epoch of the data in the extension field, while the filestamp 1144 establishes the generation epoch of the file that ultimately produced 1145 the data. 1147 The 32-bit Value Length field indicates the length of the Value field 1148 in octets. The minimum length of this field is 0, in which case the 1149 Value field itself is omitted. 1151 The 32-bit Signature Length field indicates the length of the 1152 Signature field in octets. The minimum length of this field is 0, in 1153 which case the Signature field itself is omitted. 1155 If both the Value Length and Signature Length fields are 0, both of 1156 these words can be omitted, in which case the extension field has 1157 length 4 words. 1159 The presence of the MAC and extension fields in the packet is 1160 determined from the length of the remaining area after the header to 1161 the end of the packet. The parser initializes a pointer just after 1162 the header. If the Length field is not a multiple of 4, a format 1163 error has occurred and the packet is discarded. The following cases 1164 are possible based on the remaining length in words. 1165 0 The packet contains no MAC and is not authenticated. 1166 1 The packet is an error report or crypto-NAK. 1167 2, 3, 4 The packet is discarded with a format error. 1168 5 The remainder of the packet is the 160-bit MAC. 1169 >5 One or more extension fields are present. 1171 If an extension field is present, the parser examines the Length 1172 field. If the length is less than 4 or not a multiple of 4, a format 1173 error has occurred and the packet is discarded; otherwise, the parser 1174 increments the pointer by this value. The parser now uses the same 1175 rules as above to determine whether a MAC is present and/or another 1176 extension field. An additional implementation dependent test is 1177 necessary to ensure the pointer does not stray outside the buffer 1178 space occupied by the packet. 1180 8. On Wire Protocol 1182 The heart of the NTP on-wire protocol is the core mechanism which 1183 exchanges time values between servers, peers and clients. It is 1184 inherently resistant to lost or duplicate packets. Data integrity is 1185 provided by the IP and UDP checksums. No flow control or 1186 retransmission facilities are provided or necessary. The protocol 1187 uses timestamps, either extracted from packet headers or struck from 1188 the system clock upon the arrival or departure of a packet. 1189 Timestamps are precision data and should be restruck in case of link 1190 level retransmission and corrected for the time to compute a MAC on 1191 transmit. 1193 NTP messages make use of two different communication modes, one-to- 1194 one and one-to-many, commonly referred to as unicast and broadcast. 1195 For the purposes of this document, the term broadcast is interpreted 1196 to mean any available one-to-many mechanism. For IPv4 this equates 1197 to either IPv4 broadcast or IPv4 multicast. For IPv6 this equates to 1198 IPv6 multicast. For this purpose, IANA has allocated the IPv4 1199 multicast address 224.0.1.1 and the IPv6 multicast address ending 1200 :101, with prefix determined by scoping rules. 1202 The on-wire protocol uses four timestamps numbered T1 through T4 and 1203 three state variables org, rec and xmt, as shown in Figure 17. This 1204 figure shows the most general case where each of two peers, A and B, 1205 independently measure the offset and delay relative to the other. 1206 For purposes of illustration the individual timestamp values are 1207 shown in lower case with a number indicating the order of 1208 transmission and reception. 1210 t2 t3 t6 t7 1211 +---------+ +---------+ +---------+ +---------+ 1212 T1 | 0 | | t1 | | t3 | | t5 | 1213 +---------+ +---------+ +---------+ +---------+ 1214 T2 | 0 | | t2 | | t4 | | t6 | Packet 1215 +---------+ +---------+ +---------+ +---------+ Variables 1216 T3 | t1 | |t3=clock | | t5 | |t7=clock | 1217 +---------+ +---------+ +---------+ +---------+ 1218 T4 |t2=clock | |t6=clock | 1219 +---------+ +---------+ 1220 Peer B 1221 +---------+ +---------+ +---------+ +---------+ 1222 org | t1 | | t1 | | T3<>t1? | | t5 | 1223 +---------+ +---------+ +---------+ +---------+ State 1224 rec | t2 | | t2 | | t6 | | t6 | Variables 1225 +---------+ +---------+ +---------+ +---------+ 1226 xmt | 0 | | t3 | | T1<>t3? | | t7 | 1227 +---------+ +---------+ +---------+ +---------+ 1229 t2 t3 t6 t7 1230 --------------------------------------------------------- 1231 /\ \ /\ \ 1232 / \ / \ 1233 / \ / \ 1234 / \/ / \/ 1235 --------------------------------------------------------- 1236 t1 t4 t5 t8 1238 t1 t4 t5 t8 1239 +---------+ +---------+ +---------+ +---------+ 1240 T1 | 0 | | t1 | | t3 | | t5 | 1241 +---------+ +---------+ +---------+ +---------+ 1242 T2 | 0 | | t2 | | t4 | | t6 | Packet 1243 +---------+ +---------+ +---------+ +---------+ Variables 1244 T3 |t1=clock | | t3 | |t5=clock | | t7 | 1245 +---------+ +---------+ +---------+ +---------+ 1246 T4 |t4=clock | |t8=clock | 1247 +---------+ +---------+ 1248 Peer A 1249 +---------+ +---------+ +---------+ +---------+ 1250 org | 0 | | T3<>0? | | t3 | | T3<>t3? | 1251 +---------+ +---------+ +---------+ +---------+ State 1252 rec | 0 | | t4 | | t4 | | t8 | Variables 1253 +---------+ +---------+ +---------+ +---------+ 1254 xmt | t1 | | T1=t1? | | t5 | | T1<>t5? | 1255 +---------+ +---------+ +---------+ +---------+ 1257 Figure 17: On-Wire Protocol 1259 In the figure the first packet transmitted by A contains only the 1260 transmit timestamp T3 with value t1. B receives the packet at t2 and 1261 saves the origin timestamp T1 with value t1 in state variable org and 1262 the destination timestamp T4 with value t2 in state variable rec. At 1263 this time or some time later B sends a packet to A containing the org 1264 and rec state variables in T1 and T2, respectively and in addition 1265 the transmit timestamp T3 with value t3, which is saved in the xmt 1266 state variable. When this packet arrives at A the packet header 1267 variables T1, T2, T3 and destination timestamp T4 represent the four 1268 timestamps necessary to compute the offset and delay of B relative to 1269 A, as described below. 1271 Before the A state variables are updated, two sanity checks are 1272 performed in order to protect against duplicate or bogus packets. A 1273 packet is a duplicate if the transmit timestamp T3 in the packet 1274 matches the xmt state variable. A packet is bogus if the origin 1275 timestamp T1 in the packet does not match the org state variable. In 1276 either of these cases the state variables are updated, then the 1277 packet is discarded. 1279 The four most recent timestamps, T1 through T4, are used to compute 1280 the offset of B relative to A 1282 theta = T(B) - T(A) = 1/2 * [(T2-T1) + (T3-T4)] 1284 and the round trip delay 1286 delta = T(ABA) = (T4-T1) - (T3-T2). 1288 Note that the quantities within parentheses are computed from 64-bit 1289 unsigned timestamps and result in signed values with 63 significant 1290 bits plus sign. These values can represent dates from 68 years in 1291 the past to 68 years in the future. However, the offset and delay 1292 are computed as sums and differences of these values, which contain 1293 62 significant bits and two sign bits, so can represent unambiguous 1294 values from 34 years in the past to 34 years in the future. In other 1295 words, the time of the client must be set within 34 years of the 1296 server before the service is started. This is a fundamental 1297 limitation with 64-bit integer arithmetic. 1299 In implementations where floating double arithmetic is available, the 1300 first-order differences can be converted to floating double and the 1301 second-order sums and differences computed in that arithmetic. Since 1302 the second-order terms are typically very small relative to the 1303 timestamp magnitudes, there is no loss in significance, yet the 1304 unambiguous range is restored from 34 years to 68 years. 1306 In some scenarios where the initial frequency offset of the client is 1307 relatively large and the actual propagation time small, it is 1308 possible that the delay computation becomes negative. For instance, 1309 if the frequency difference is 100 PPM and the interval T4-T1 is 64 1310 s, the apparent delay is -6.4 ms. Since negative values are 1311 misleading in subsequent computations, the value of delta should be 1312 clamped not less than s.rho, where s.rho is the system precision 1313 described in Section 11.1, expressed in seconds. 1315 The discussion above assumes the most general case where two 1316 symmetric peers independently measure the offsets and delays between 1317 them. In the case of a stateless server, the protocol can be 1318 simplified. A stateless server copies T3 and T4 from the client 1319 packet to T1 and T2 of the server packet and tacks on the transmit 1320 timestamp T3 before sending it to the client. Additional details for 1321 filling in the remaining protocol fields are given in a Section 9 and 1322 following sections and in the appendix. 1324 Note that the on-wire protocol as described resists replay of a 1325 server response packet. However, it does not resist replay of the 1326 client request packet, which would result in a server reply packet 1327 with new values of T2 and T3 and result in incorrect offset and 1328 delay. This vulnerability can be avoided by setting the xmt state 1329 variable to zero after computing the offset and delay. 1331 9. Peer Process 1333 The process descriptions to follow include a listing of the important 1334 state variables followed by an overview of the process operations 1335 implemented as routines. Frequent reference is made to the skeleton 1336 in the appendix. The skeleton includes C-language fragments that 1337 describe the functions in more detail. It includes the parameters, 1338 variables and declarations necessary for a conforming NTPv4 1339 implementation. However, many additional variables and routines may 1340 be necessary in a working implementation. 1342 The peer process is called upon arrival of a server or peer packet. 1343 It runs the on-wire protocol to determine the clock offset and round 1344 trip delay and in addition computes statistics used by the system and 1345 poll processes. Peer variables are instantiated in the association 1346 data structure when the structure is initialized and updated by 1347 arriving packets. There is a peer process, poll process and 1348 association for each server. 1350 9.1. Peer Process Variables 1352 Figure 18, Figure 19, Figure 20 and Figure 21 summarize the common 1353 names, formula names and a short description of the peer variables. 1355 The common names and formula names are interchangeable; formula names 1356 are intended for equations where space is important. Unless noted 1357 otherwise, all peer variables have assumed prefix p. 1359 +---------+----------+-----------------------+ 1360 | Name | Formula | Description | 1361 +---------+----------+-----------------------+ 1362 | srcaddr | srcaddr | source address | 1363 | srcport | srcport | source port | 1364 | dstaddr | dstaddr | destination address | 1365 | dstport | destport | destination port | 1366 | keyid | keyid | key identifier key ID | 1367 +---------+----------+-----------------------+ 1369 Figure 18: Peer Process Configuration Variables 1371 +-----------+------------+---------------------+ 1372 | Name | Formula | Description | 1373 +-----------+------------+---------------------+ 1374 | leap | leap | leap indicator | 1375 | version | version | version number | 1376 | mode | mode | mode | 1377 | stratum | stratum | stratum | 1378 | ppoll | ppoll | peer poll exponent | 1379 | rootdelay | delta_r | root delay | 1380 | rootdisp | epsilon_r | root dispersion | 1381 | refid | refid | reference ID | 1382 | reftime | reftime | reference timestamp | 1383 +-----------+------------+---------------------+ 1385 Figure 19: Peer Process Packet Variables 1387 +------+---------+--------------------+ 1388 | Name | Formula | Description | 1389 +------+---------+--------------------+ 1390 | org | T1 | origin timestamp | 1391 | rec | T2 | receive timestamp | 1392 | xmt | T3 | transmit timestamp | 1393 | t | t | packet time | 1394 +------+---------+--------------------+ 1396 Figure 20: Peer Process Timestamp Variables 1397 +--------+---------+-----------------+ 1398 | Name | Formula | Description | 1399 +--------+---------+-----------------+ 1400 | offset | theta | clock offset | 1401 | delay | delta | roundtrip delay | 1402 | disp | epsilon | dispersion | 1403 | jitter | psi | jitter | 1404 | filter | filter | clock filter | 1405 | tp | t_p | filter time | 1406 +--------+---------+-----------------+ 1408 Figure 21: Peer Process Statistics Variables 1410 The following configuration variables are normally initialized when 1411 the association is mobilized, either from a configuration file or 1412 upon the arrival of the first packet for an unknown association. 1414 srcaddr: IP address of the remote server or reference clock. This 1415 becomes the destination IP address in packets sent from this 1416 association. 1418 srcport: UDP port number of the server or reference clock. This 1419 becomes the destination port number in packets sent from this 1420 association. When operating in symmetric modes (1 and 2) this field 1421 must contain the NTP port number PORT (123) assigned by the IANA. In 1422 other modes it can contain any number consistent with local policy. 1424 dstaddr: IP address of the client. This becomes the source IP 1425 address in packets sent from this association. 1427 dstport: UDP port number of the client, ordinarily the NTP port 1428 number PORT (123) assigned by the IANA. This becomes the source port 1429 number in packets sent from this association. 1431 keyid: Symmetric key ID for the 128-bit MD5 key used to generate and 1432 verify the MAC. The client and server or peer can use different 1433 values, but they must map to the same key. 1435 The variables defined in Figure 19 are updated from the packet header 1436 as each packet arrives. They are interpreted in the same way as the 1437 packet variables of the same names. It is convenient for later 1438 processing to convert the NTP short format packet values r.rootdelay 1439 and r.rootdisp to floating doubles as peer variables. 1441 The variables defined in Figure 20 include the timestamps exchanged 1442 by the on-wire protocol in Section 8. The t variable is the seconds 1443 counter c.t associated with these values. The c.t variable is 1444 maintained by the clock adjust process described in Section 12. It 1445 counts the seconds since the service was started. The variables 1446 defined in Figure 21 include the statistics computed by the 1447 clock_filter() routine described in Section 10. The tp variable is 1448 the seconds counter associated with these values. 1450 9.2. Peer Process Operations 1452 The receive() routine in Appendix A.5.1 shows the peer process code 1453 flow upon the arrival of a packet. The access() routine in 1454 Appendix A.5.4 implements access restrictions using an access control 1455 list (ACL). There is no specific method required for access control, 1456 although it is recommended that implementations include such a 1457 scheme, which is similar to many others now in widespread use. 1458 Format checks require correct field length and alignment, acceptable 1459 version number (1-4) and correct extension field syntax, if present. 1461 There is no specific requirement for authentication; however, if 1462 authentication is implemented, the symmetric key scheme described in 1463 Appendix A.2 must be among the supported schemes. This scheme uses 1464 the MD5 keyed hash algorithm described in [8]. 1466 Next, the association table is searched for matching source address 1467 and source port using the find_assoc() routine in Appendix A.5.1. 1468 Figure 22 is a dispatch table where the columns correspond to the 1469 packet mode and rows correspond to the association mode. The 1470 intersection of the association and packet modes dispatches 1471 processing to one of the following steps. 1473 +------------------+---------------------------------------+ 1474 | | Packet Mode | 1475 +------------------+-------+-------+-------+-------+-------+ 1476 | Association Mode | 1 | 2 | 3 | 4 | 5 | 1477 +------------------+-------+-------+-------+-------+-------+ 1478 | No Association 0 | NEWPS | DSCRD | FXMIT | MANY | NEWBC | 1479 | Symm. Active 1 | PROC | PROC | DSCRD | DSCRD | DSCRD | 1480 | Symm. Passive 2 | PROC | ERR | DSCRD | DSCRD | DSCRD | 1481 | Client 3 | DSCRD | DSCRD | DSCRD | PROC | DSCRD | 1482 | Server 4 | DSCRD | DSCRD | DSCRD | DSCRD | DSCRD | 1483 | Broadcast 5 | DSCRD | DSCRD | DSCRD | DSCRD | DSCRD | 1484 | Bcast Client 6 | DSCRD | DSCRD | DSCRD | DSCRD | PROC | 1485 +------------------+-------+-------+-------+-------+-------+ 1487 Figure 22: Peer Dispatch Table 1489 DSCRD. This is a nonfatal violation of protocol as the result of a 1490 programming error, long delayed packet or replayed packet. The peer 1491 process discards the packet and exits. 1493 ERR. This is a fatal violation of protocol as the result of a 1494 programming error, long delayed packet or replayed packet. The peer 1495 process discards the packet, demobilizes the symmetric passive 1496 association and exits. 1498 FXMIT. This is a client (mode 3) packet matching no association 1499 (mode 0). If the destination address is not a broadcast address, the 1500 server constructs a server (mode 4) packet and returns it to the 1501 client without retaining state. The server packet header is 1502 constructed by the fast_xmit() routine in Appendix A.5.3. The packet 1503 header is assembled from the receive packet and system variables as 1504 shown in Figure 23. If the s.rootdelay and s.rootdisp system 1505 variables are stored in floating double, they must be converted to 1506 NTP short format first. 1508 +-----------------------------------+ 1509 | Packet Variable --> Variable | 1510 +-----------------------------------+ 1511 | r.leap --> p.leap | 1512 | r.mode --> p.mode | 1513 | r.stratum --> p.stratum | 1514 | r.poll --> p.ppoll | 1515 | r.rootdelay --> p.rootdelay | 1516 | r.rootdisp --> p.rootdisp | 1517 | r.refid --> p.refid | 1518 | r.reftime --> p.reftime | 1519 | r.keyid --> p.keyid | 1520 +-----------------------------------+ 1522 Figure 23: Receive Packet Header 1524 Note that, if authentication fails, the server returns a special 1525 message called a crypto-NAK. This message includes the normal NTP 1526 header data shown in Figure 9, but with a MAC consisting of four 1527 octets of zeros. The client MAY accept or reject the data in the 1528 message. After these actions the peer process exits. 1530 If the destination address is a multicast address, the sender is 1531 operating in manycast client mode. If the packet is valid and the 1532 server stratum is less than the client stratum, the server sends an 1533 ordinary server (mode 4) packet, but using its unicast destination 1534 address. A crypto-NAK is not sent if authentication fails. After 1535 these actions the peer process exits. 1537 MANY: This is a server (mode 4) packet matching no association. 1538 Ordinarily, this can happen only as the result of a manycast server 1539 reply to a previously sent multicast client packet. If the packet is 1540 valid, an ordinary client (mode 3) association is mobilized and 1541 operation continues as if the association was mobilized by the 1542 configuration file. 1544 NEWBC. This is a broadcast (mode 5) packet matching no association. 1545 The client mobilizes either a client (mode 3) or broadcast client 1546 (mode 6) association as shown in the mobilize() and clear() routines 1547 in Appendix A.2. Then the packet() routine in Appendix A.5.1.1 1548 validates the packet and initializes the peer variables. 1550 If the implementation supports no additional security or calibration 1551 functions, the association mode is set to broadcast client (mode 6) 1552 and the peer process exits. Implementations supporting public key 1553 authentication MAY run the Autokey or equivalent security protocol. 1554 Implementations SHOULD set the association mode to 3 and run a short 1555 client/server exchange to determine the propagation delay. Following 1556 the exchange the association mode is set to 6 and the peer process 1557 continues in listen-only mode. Note the distinction between a mode-6 1558 packet, which is reserved for the NTP monitor and control functions, 1559 and a mode-6 association. 1561 NEWPS. This is a symmetric active (mode 1) packet matching no 1562 association. The client mobilizes a symmetric passive (mode 2) 1563 association as shown in the mobilize() routine and clear() routines 1564 in Appendix A.2. Processing continues in the PROC section below. 1566 PROC. The packet matches an existing association. The packet 1567 timestamps are carefully checked to avoid invalid, duplicate or bogus 1568 packets. Additional checks are summarized in Figure 24. Note that 1569 all packets, including a crypto-NAK, are considered valid only if 1570 they survive these tests. 1572 +--------------------------+----------------------------------------+ 1573 | Packet Type | Description | 1574 +--------------------------+----------------------------------------+ 1575 | 1 duplicate packet | The packet is at best an old duplicate | 1576 | | or at worst a replay by a hacker. | 1577 | | This can happen in symmetric modes if | 1578 | | the poll intervals are uneven. | 1579 | 2 bogus packet | | 1580 | 3 invalid | One or more timestamp fields are | 1581 | | invalid. This normally happens in | 1582 | | symmetric modes when one peer sends | 1583 | | the first packet to the other and | 1584 | | before the other has received its | 1585 | | first reply. | 1586 | 4 access denied | The access controls have black | 1587 | 5 authentication failure | The cryptographic message digest does | 1588 | | not match the MAC. | 1589 | 6 unsynchronized | The server is not synchronized to a | 1590 | | valid source. | 1591 | 7 bad header data | One or more header fields are invalid. | 1592 +--------------------------+----------------------------------------+ 1594 Figure 24: Packet Error Checks 1596 Processing continues in the packet() routine in Appendix A.5.1.1. It 1597 copies the packet variables to the peer variables as shown in 1598 Figure 23 and the packet() routine in Appendix A.5.2". The receive() 1599 routine implements tests 1-5 in Figure 24; the packet() routine 1600 implements tests 6-7. If errors are found the packet is discarded 1601 and the peer process exits. 1603 The on-wire protocol calculates the clock offset theta and round trip 1604 delay delta from the four most recent timestamps as described in 1605 Section 8. While it is in principle possible to do all calculations 1606 except the first-order timestamp differences in fixed-point 1607 arithmetic, it is much easier to convert the first-order differences 1608 to floating doubles and do the remaining calculations in that 1609 arithmetic, and this will be assumed in the following description. 1611 Next, the 8-bit p.reach shift register in the poll process described 1612 in Section 13 is used to determine whether the server is reachable 1613 and the data are fresh. The register is shifted left by one bit when 1614 a packet is sent and the rightmost bit is set to zero. As valid 1615 packets arrive, the packet() routine sets the rightmost bit to one. 1616 If the register contains any nonzero bits, the server is considered 1617 reachable; otherwise, it is unreachable. Since the peer poll 1618 interval might have changed since the last packet, the poll_update() 1619 routine in Appendix A.5.7.2 is called to redetermine the host poll 1620 interval. 1622 The dispersion statistic epsilon(t) represents the maximum error due 1623 to the frequency tolerance and time since the last packet was sent It 1624 is initialized 1626 epsilon(t_0) = r.rho + s.rho + PHI * (T4-T1) 1628 when the measurement is made at t_0 according to the seconds counter. 1629 Here r.rho is the packet precision described in Section 7.3 and s.rho 1630 is the system precision described in Section 11.1, both expressed in 1631 seconds. These terms are necessary to account for the uncertainty in 1632 reading the system clock in both the server and the client. 1634 The dispersion then grows at constant rate PHI; in other words, at 1635 time t, epsilon(t) = epsilon(t_0) + PHI * (t-t_0). With the default 1636 value PHI = 15 PPM, this amounts to about 1.3 s per day. With this 1637 understanding, the argument t will be dropped and the dispersion 1638 represented simply as epsilon. The remaining statistics are computed 1639 by the clock filter algorithm described in the next section. 1641 10. Clock Filter Algorithm 1643 The clock filter algorithm, part of the peer process, is implemented 1644 in the clock_filter() routine in Appendix A.5.2. It grooms the 1645 stream of on-wire data to select the samples most likely to represent 1646 accurate time. The algorithm produces the variables shown in 1647 Figure 21, including the offset (theta), delay (delta), dispersion 1648 (epsilon), jitter (psi) and time of arrival (t). These data are used 1649 by the mitigation algorithms to determine the best and final offset 1650 used to discipline the system clock. They are also used to determine 1651 the server health and whether it is suitable for synchronization. 1653 The clock filter algorithm saves the most recent sample tuples 1654 (theta, delta, epsilon, t) in the filter structure, which functions 1655 as an 8-stage shift register. The tuples are saved in the order that 1656 packets arrive. Here t is the packet time of arrival according to 1657 the seconds counter and should not be confused with the peer variable 1658 tp. 1660 The following scheme is used to insure sufficient samples are in the 1661 filter and that old stale data are discarded. Initially, the tuples 1662 of all stages are set to the dummy tuple (0, MAXDISP, MAXDISP, 0). 1663 As valid packets arrive, tuples are shifted into the filter causing 1664 old tuples to be discarded, so eventually only valid tuples remain. 1665 If the three low order bits of the reach register are zero, 1666 indicating three poll intervals have expired with no valid packets 1667 received, the poll process calls the clock filter algorithm with a 1668 dummy tuple just as if the tuple had arrived from the network. If 1669 this persists for eight poll intervals, the register returns to the 1670 initial condition. 1672 In the next step the shift register stages are copied to a temporary 1673 list and the list sorted by increasing delta. Let i index the stages 1674 starting with the lowest delta. If the first tuple epoch t_0 is not 1675 later than the last valid sample epoch tp, the routine exits without 1676 affecting the current peer variables. Otherwise, let epsilon_i be 1677 the dispersion of the ith entry, then 1678 i=n-1 1679 --- epsilon_i 1680 epsilon = \ ---------- 1681 / (i+1) 1682 --- 2 1683 i=0 1685 is the peer dispersion p.disp. Note the overload of epsilon, whether 1686 input to the clock filter or output, the meaning should be clear from 1687 context. 1689 The observer should note (a) if all stages contain the dummy tuple 1690 with dispersion MAXDISP, the computed dispersion is a little less 1691 than 16 s, (b) each time a valid tuple is shifted into the register, 1692 the dispersion drops by a little less than half, depending on the 1693 valid tuples dispersion, (c) after the fourth valid packet the 1694 dispersion is usually a little less than 1 s, which is the assumed 1695 value of the MAXDIST parameter used by the selection algorithm to 1696 determine whether the peer variables are acceptable or not. 1698 Let the first stage offset in the sorted list be theta_0; then, for 1699 the other stages in any order, the jitter is the RMS average 1700 +----- -----+^1/2 1701 | n-1 | 1702 | --- | 1703 1 | \ 2 | 1704 psi = -------- * | / (theta_0-theta_j) | 1705 (n-1) | --- | 1706 | j=1 | 1707 +----- -----+ 1709 where n is the number of valid tuples in the filter (n > 1). In 1710 order to insure consistency and avoid divide exceptions in other 1711 computations, the psi is bounded from below by the system precision 1712 s.rho expressed in seconds. While not in general considered a major 1713 factor in ranking server quality, jitter is a valuable indicator of 1714 fundamental timekeeping performance and network congestion state. Of 1715 particular importance to the mitigation algorithms is the peer 1716 synchronization distance, which is computed from the delay and 1717 dispersion. 1719 lambda = (delta / 2) + epsilon. 1721 Note that epsilon and therefore lambda increase at rate PHI. The 1722 lambda is not a state variable, since lambda is recalculated at each 1723 use. It is a component of the root synchronization distance used by 1724 the mitigation algorithms as a metric to evaluate the quality of time 1725 available from each server. 1727 It is important to note that, unlike NTPv3, NTPv4 associations do not 1728 show a timeout condition by setting the stratum to 16 and leap 1729 indicator to 3. The association variables retain the values 1730 determined upon arrival of the last packet. In NTPv4 lambda 1731 increases with time, so eventually the synchronization distance 1732 exceeds the distance threshold MAXDIST, in which case the association 1733 is considered unfit for synchronization. 1735 11. System Process 1737 As each new sample (theta, delta, epsilon, jitter, t) is produced by 1738 the clock filter algorithm, all peer processes are scanned by the 1739 mitigation algorithms consisting of the selection, cluster, combine 1740 and clock discipline algorithms in the system process. The selection 1741 algorithm scans all associations and casts off the falsetickers, 1742 which have demonstrably incorrect time, leaving the truechimers as 1743 result. In a series of rounds the cluster algorithm discards the 1744 association statistically furthest from the centroid until a 1745 specified minimum number of survivors remain. The combine algorithm 1746 produces the best and final statistics on a weighted average basis. 1747 The final offset is passed to the clock discipline algorithm to steer 1748 the system clock to the correct time. 1750 The cluster algorithm selects one of the survivors as the system 1751 peer. The associated statistics (theta, delta, epsilon, jitter, t) 1752 are used to construct the system variables inherited by dependent 1753 servers and clients and made available to other applications running 1754 on the same machine. 1756 11.1. System Process Variables 1758 Figure 27 summarizes the common names, formula names and a short 1759 description of each system variable. Unless noted otherwise, all 1760 variables have assumed prefix s. 1762 +-----------+------------+------------------------+ 1763 | Name | Formula | Description | 1764 +-----------+------------+------------------------+ 1765 | t | t | update time | 1766 | p | p | system peer identifier | 1767 | leap | leap | leap indicator | 1768 | stratum | stratum | stratum | 1769 | precision | rho | precision | 1770 | offset | THETA | combined offset | 1771 | jitter | PSI | combined jitter | 1772 | rootdelay | DELTA | root delay | 1773 | rootdisp | EPSILON | root dispersion | 1774 | v | v | survivor list | 1775 | refid | refid | reference ID | 1776 | reftime | reftime | reference time | 1777 | NMIN | 3 | minimum survivors | 1778 | CMIN | 1 | minimum candidates | 1779 +-----------+------------+------------------------+ 1781 Figure 27: System Process Variables 1783 Except for the t, p, offset and jitter variables and the NMIN and 1784 CMIN constants, the variables have the same format and interpretation 1785 as the peer variables of the same name. The NMIN and CMIN parameters 1786 are used by the selection and cluster algorithms described in the 1787 next section. 1789 The t variable is the seconds counter at the last update determined 1790 by the clock_update() routine in Appendix A.5.5.4. The p variable is 1791 the system peer identifier determined by the cluster() routine in 1792 Section 11.2.2. The precision variable has the same format as the 1793 packet variable of the same name. The precision is defined as the 1794 larger of the resolution and time to read the clock, in log2 units. 1795 For instance, the precision of a mains-frequency clock incrementing 1796 at 60 Hz is 16 ms, even when the system clock hardware representation 1797 is to the nanosecond. 1799 The offset and jitter variables are determined by the combine() 1800 routine in Section 11.2.3. These values represent the best and final 1801 offset and jitter used to discipline the system clock. Initially, 1802 all variables are cleared to zero, then the leap is set to 3 1803 (unsynchronized) and stratum is set to MAXSTRAT (16). Remember that 1804 MAXSTRAT is mapped to zero in the transmitted packet. 1806 11.2. System Process Operations 1808 Figure 28 summarizes the system process operations performed by the 1809 clock_select() routine. The selection algorithm described in 1810 Section 11.2.1 produces a majority clique of presumed correct 1811 candidates (truechimers) based on agreement principles. The cluster 1812 algorithm described in Section 11.2.2 discards outlyers to produce 1813 the most accurate survivors. The combine algorithm described in 1814 Section 11.2.3 provides the best and final offset for the clock 1815 discipline algorithm described in Appendix A.5.5.6. If the selection 1816 algorithm cannot produce a majority clique, or if it cannot produce 1817 at least CMIN survivors, the system process exits without 1818 disciplining the system clock. If successful, the cluster algorithm 1819 selects the statistically best candidate as the system peer and its 1820 variables are inherited as the system variables. 1822 +-----------------+ 1823 | clock_select() | 1824 +-----------------+ 1825 ................................|........... 1826 . V . 1827 . yes +---------+ +-----------------+ . 1828 . +--| accept? | | scan candidates | . 1829 . | +---------+ | | . 1830 . V no | | | . 1831 . +---------+ | | | . 1832 . | add peer| | | | . 1833 . +---------- | | | . 1834 . | V | | . 1835 . +---------->-->| | . 1836 . | | . 1837 . Selection Algorithm +-----------------+ . 1838 .................................|.......... 1839 V 1840 no +-------------------+ 1841 +-------------| survivors? | 1842 | +-------------------+ 1843 | | yes 1844 | V 1845 | +-------------------+ 1846 | | Cluster Algorithm | 1847 | +-------------------+ 1848 | | 1849 | V 1850 V yes +-------------------+ 1851 |<------------| n < CMIN? | 1852 | +-------------------+ 1853 V | 1854 +-----------------+ V no 1855 | s.p = NULL | +-------------------+ 1856 +-----------------+ | s.p = v_0.p | 1857 | +-------------------+ 1858 V | 1859 +-----------------+ V 1860 | return (UNSYNC) | +-------------------+ 1861 +-----------------+ | return (SYNC) | 1862 +-------------------+ 1864 Figure 28: clock_select() Routine 1866 11.2.1. Selection Algorithm 1868 Note that the selection and cluster algorithms are described 1869 separately, but combined in the code skeleton. The selection 1870 algorithm operates to find an intersection interval containing a 1871 majority clique of truechimers using Byzantine agreement principles 1872 originally proposed by Marzullo [9], but modified to improve 1873 accuracy. An overview of the algorithm is given below and in the 1874 first half of the clock_select() routine in Appendix A.5.5.1. 1876 First, those servers which are unusable according to the rules of the 1877 protocol are detected and discarded by the accept() routine in 1878 Appendix A.5.5.3. Next, a set of tuples (p, type, edge) is generated 1879 for the remaining candidates. Here, p is the association identifier 1880 and type identifies the upper (+1), middle (0) and lower (-1) 1881 endpoints of a correctness interval centered on theta for that 1882 candidate. This results in three tuples, lowpoint (p, -1, theta - 1883 lambda), midpoint (p, 0, theta) and highpoint (p, +1, theta + 1884 lambda), where lambda is the root synchronization distance calculated 1885 on each use by the rootdist() routine in Appendix A.5.1.1. The steps 1886 of the algorithm are: 1888 1. For each of m associations, place three tuples as defined above 1889 on the candidate list. 1891 2. Sort the tuples on the list by the edge component. Order the 1892 lowpoint, midpoint and highpoint of these intervals from lowest to 1893 highest. Set the number of falsetickers f = 0. 1895 3. Set the number of midpoints d = 0. Set c = 0. Scan from lowest 1896 endpoint to highest. Add one to c for every lowpoint, subtract one 1897 for every highpoint, add one to d for every midpoint. If c >= m - f, 1898 stop; set l = current lowpoint. 1900 4. Set c = 0. Scan from highest endpoint to lowest. Add one to c 1901 for every highpoint, subtract one for every lowpoint, add one to d 1902 for every midpoint. If c >= m - f, stop; set u = current highpoint. 1904 5. Is d = f and l < u? If yes, then follow step 5A; else, follow 1905 step 5B. 1907 5A. Success: the intersection interval is [l, u]. 1909 5B. Add one to f. Is f < (m / 2)? If yes, then go to step 3 again. 1910 If no, then go to step 6. 1912 6. Failure; a majority clique could not be found. There are no 1913 suitable candidates to discipline the system clock. 1915 The algorithm is described in detail in Appendix A.5.5.1. Note that 1916 it starts with the assumption that there are no falsetickers (f = 0) 1917 and attempts to find a nonempty intersection interval containing the 1918 midpoints of all correct servers, i.e., truechimers. If a nonempty 1919 interval cannot be found, it increases the number of assumed 1920 falsetickers by one and tries again. If a nonempty interval is found 1921 and the number of falsetickers is less than the number of 1922 truechimers, a majority clique has been found and the midpoint of 1923 each truechimer (theta) represents the candidates available to the 1924 cluster algorithm. 1926 If a majority clique is not found, or if the number of truechimers is 1927 less than CMIN, there are insufficient candidates to discipline the 1928 system clock. CMIN defines the minimum number of servers consistent 1929 with the correctness requirements. Suspicious operators would set 1930 CMIN to insure multiple redundant servers are available for the 1931 algorithms to mitigate properly. However, for historic reasons the 1932 default value for CMIN is one. 1934 11.2.2. Cluster Algorithm 1936 The candidates of the majority clique are placed on the survivor list 1937 v in the form of tuples (p, theta_p, psi_p, lambda_p), where p is an 1938 association identifier, theta_p, psi_p, and stratum_p the current 1939 offset, jitter and stratum of association p, respectively, and 1940 lambda_p is a merit factor equal to stratum_p * MAXDIST + lambda, 1941 where lambda is the root synchronization distance for association p. 1942 The list is processed by the cluster algorithm below and the second 1943 half of the clock_select() algorithm in Appendix A.5.5.1. 1945 1. Let (p, theta_p, psi_p, lambda_p) represent a survivor candidate. 1947 2. Sort the candidates by increasing lambda_p. Let n be the number 1948 of candidates and NMIN the minimum required number of survivors. 1950 3. For each candidate compute the selection jitter psi_s: 1952 +----- -----+^1/2 1953 | n-1 | 1954 | --- | 1955 | 1 \ 2 | 1956 psi_s = | ---- * / (theta_s - theta_j) | 1957 | n-1 --- | 1958 | j=1 | 1959 +----- -----+ 1961 4. Select psi_max as the candidate with maximum psi_s. 1963 5. Select psi_min as the candidate with minimum psi_p. 1965 6. Is psi_max < psi_min or n <= NMIN? If yes, follow step 6A; 1966 otherwise, follow step 6B. 1968 6A. Done. The remaining candidates on the survivor list are ranked 1969 in the order of preference. The first entry on the list represents 1970 the system peer; its variables are used later to update the system 1971 variables. 1973 6B. Delete the outlyer candidate with psi_max; reduce n by one and go 1974 back to step 3. 1976 The algorithm operates in a series of rounds where each round 1977 discards the statistical outlyer with maximum selection jitter psi_s. 1978 However, if psi_s is less than the minimum peer jitter psi_p, no 1979 improvement is possible by discarding outlyers. This and the minimum 1980 number of survivors represent the terminating conditions of the 1981 algorithm. Upon termination, the final value of psi_max is saved as 1982 the system selection jitter PSI_s for use later. 1984 11.2.3. Combine Algorithm 1986 The remaining survivors are processed by the clock_combine() routine 1987 in Appendix A.5.5.5 to produce the best and final data for the clock 1988 discipline algorithm. The clock_combine() routine processes peer 1989 offset and jitter statistics to produce the combined system offset 1990 THETA and system peer jitter PSI_p, where each server statistic is 1991 weighted by the reciprocal of the root synchronization distance and 1992 the result normalized. 1994 The combined THETA is passed to the clock_update() routine in 1995 Appendix A.5.5.4. The first candidate on the survivor list is 1996 nominated as the system peer with identifier p. The system peer 1997 jitter PSI_p is a component of the system jitter PSI. It is used 1998 along with the selection jitter PSI_s to produce the system jitter: 2000 PSI = [(PSI_s)^2 + (PSI_p)^2]^1/2 2002 Each time an update is received from the system peer, the 2003 clock_update() routine in Appendix A.5.5.4 is called. By rule, an 2004 update is discarded if its time of arrival p.t is not strictly later 2005 than the last update used s.t. The labels IGNOR, PANIC, ADJ and STEP 2006 refer to return codes from the local_clock() routine described in the 2007 next section. 2009 IGNORE means the update has been ignored as an outlyer. PANIC means 2010 the offset is greater than the panic threshold PANICT (1000 s) and 2011 SHOULD cause the program to exit with a diagnostic message to the 2012 system log. STEP means the offset is less than the panic threshold, 2013 but greater than the step threshold STEPT (125 ms). In this case the 2014 clock is stepped to the correct offset, but since this means all peer 2015 data have been invalidated, all associations MUST be reset and the 2016 client begins as at initial start. 2018 ADJ means the offset is less than the step threshold and thus a valid 2019 update. In this case the system variables are updated from the peer 2020 variables as shown in Figure 30. 2022 +-------------------------------------------+ 2023 | System Variable <-- System Peer Variable | | 2024 +-------------------------------------------+ 2025 | s.leap <-- p.leap | 2026 | s.stratum <-- p.stratum + 1 | 2027 | s.offset <-- THETA | 2028 | s.jitter <-- PSI | 2029 | s.rootdelay <-- p.delta_r + delta | 2030 | s.rootdisp <-- p.epsilon_r + p.epsilon + | 2031 | p.psi + PHI * (s.t - p.t) | 2032 | + |THETA| | 2033 | s.refid <-- p.refid | 2034 | s.reftime <-- p.reftime | 2035 | s.t <-- p.t | 2036 +-------------------------------------------+ 2038 Figure 30: System Variables Update 2040 There is an important detail not shown. The dispersion increment 2041 (p.epsilon + p.psi + PHI * (s.t - p.t) + |THETA|) is bounded from 2042 below by MINDISP. In subnets with very fast processors and networks 2043 and very small delay and dispersion this forces a monotone-definite 2044 increase in s.rootdisp (EPSILON), which avoids loops between peers 2045 operating at the same stratum. 2047 The system variables are available to dependent application programs 2048 as nominal performance statistics. The system offset THETA is the 2049 clock offset relative to the available synchronization sources. The 2050 system jitter PSI is an estimate of the error in determining this 2051 value, elsewhere called the expected error. The root delay DELTA is 2052 the total round trip delay relative to the primary server. The root 2053 dispersion EPSILON is the dispersion accumulated over the network 2054 from the primary server. Finally, the root synchronization distance 2055 is defined 2057 LAMBDA = EPSILON + DELTA / 2, 2058 which represents the maximum error due all causes and is designated 2059 the root synchronization distance. 2061 11.3. Clock Discipline Algorithm 2063 The NTPv4 clock discipline algorithm, shortened to discipline in the 2064 following, functions as a combination of two philosophically quite 2065 different feedback control systems. In a phase-locked loop (PLL) 2066 design, periodic phase updates at update intervals mu seconds are 2067 used directly to minimize the time error and indirectly the frequency 2068 error. In a frequency-locked loop (FLL) design, periodic frequency 2069 updates at intervals mu are used directly to minimize the frequency 2070 error and indirectly the time error. As shown in [5], a PLL usually 2071 works better when network jitter dominates, while a FLL works better 2072 when oscillator wander dominates. This section contains an outline 2073 of how the NTPv4 design works. An in-depth discussion of the design 2074 principles is provided in [5], which also includes a performance 2075 analysis. 2077 The discipline is implemented as the feedback control system shown in 2078 Figure 31. The variable theta_r represents the combine algorithm 2079 offset (reference phase) and theta_c the VFO offset (control phase). 2080 Each update produces a signal V_d representing the instantaneous 2081 phase difference theta_r - theta_c. The clock filter for each server 2082 functions as a tapped delay line, with the output taken at the tap 2083 selected by the clock filter algorithm. The selection, cluster and 2084 combine algorithms combine the data from multiple filters to produce 2085 the signal V_s. The loop filter, with impulse response F(t), 2086 produces the signal V_c which controls the VFO frequency omega_c and 2087 thus the integral of the phase theta_c which closes the loop. The 2088 V_c signal is generated by the clock adjust process in Section 12. 2089 The detailed equations that implement these functions are best 2090 presented in the routines of Appendix A.5.5.6 and Appendix A.5.6.1. 2092 theta_r + +---------\ +----------------+ 2093 NTP --------->| Phase \ V_d | | V_s 2094 theta_c - | Detector ------>| Clock Filter |----+ 2095 +-------->| / | | | 2096 | +---------/ +----------------+ | 2097 | | 2098 ----- | 2099 / \ | 2100 | VFO | | 2101 \ / | 2102 ----- ....................................... | 2103 ^ . Loop Filter . | 2104 | . +---------+ x +-------------+ . | 2105 | V_c . | |<-----| | . | 2106 +------.-| Clock | y | Phase/Freq |<---------+ 2107 . | Adjust |<-----| Prediction | . 2108 . | | | | . 2109 . +---------+ +-------------+ . 2110 ....................................... 2112 Figure 31: Clock Discipline Feedback Loop 2114 Ordinarily, the pseudo-linear feedback loop described above operates 2115 to discipline the system clock. However, there are cases where a 2116 nonlinear algorithm offers considerable improvement. One case is 2117 when the discipline starts without knowledge of the intrinsic clock 2118 frequency. The pseudo-linear loop takes several hours to develop an 2119 accurate measurement and during most of that time the poll interval 2120 cannot be increased. The nonlinear loop described below does this in 2121 15 minutes. Another case is when occasional bursts of large jitter 2122 are present due to congested network links. The state machine 2123 described below resists error bursts lasting less than 15 minutes. 2125 Figure 32 contains a summary of the variables and parameters 2126 including the variables (lower case) or parameters (upper case) name, 2127 formula name and short description. Unless noted otherwise, all 2128 variables have assumed prefix c. The variables t, tc, state, hyster 2129 and count are integers; the remaining variables are floating doubles. 2130 The function of each will be explained in the algorithm descriptions 2131 below. 2133 +--------+------------+--------------------------+ 2134 | Name | Formula | Description | 2135 +--------+------------+--------------------------+ 2136 | t | timer | seconds counter | 2137 | offset | theta | combined offset | 2138 | resid | theta_r | residual offset | 2139 | freq | phi | clock frequency | 2140 | jitter | psi | clock offset jitter | 2141 | wander | omega | clock frequency wander | 2142 | tc | tau | time constant (log2) | 2143 | state | state | state | 2144 | adj | adj | frequency adjustment | 2145 | hyster | hyster | hysteresis counter | 2146 | STEPT | 125 | step threshold (.125 s) | 2147 | WATCH | 900 | stepout thresh(s) | 2148 | PANICT | 1000 | panic threshold (1000 s) | 2149 | LIMIT | 30 | hysteresis limit | 2150 | PGATE | 4 | hysteresis gate | 2151 | TC | 16 | time constant scale | 2152 | AVG | 8 | averaging constant | 2153 +--------+------------+--------------------------+ 2155 Figure 32: Clock Discipline Variables and Parameters 2157 The discipline is implemented by the local_clock() routine, which is 2158 called from the clock_update() routine. The local_clock() routine in 2159 Appendix A.5.5.6 has two parts; the first implements the clock state 2160 machine and the second determines the time constant and thus the poll 2161 interval. 2163 The local_clock() routine exits immediately if the offset is greater 2164 than the panic threshold PANICT (1000 s). The state transition 2165 function is implemented by the rstclock() function in 2166 Appendix A.5.5.7. Figure 33 shows the state transition function used 2167 by this routine. It has four columns showing respectively the state 2168 name, predicate and action if the offset theta is less than the step 2169 threshold, the predicate and actions otherwise, and finally some 2170 comments. 2172 +-------+---------------------+-------------------+--------------+ 2173 | State | theta < STEP | theta > STEP | Comments | 2174 +-------+---------------------+-------------------+--------------+ 2175 | NSET | ->FREQ | ->FREQ | no frequency | 2176 | | adjust time | step time | file | 2177 +-------+---------------------+-------------------+--------------+ 2178 | FSET | ->SYNC | ->SYNC | frequency | 2179 | | adjust time | step time | file | 2180 +-------+---------------------+-------------------+--------------+ 2181 | SPIK | ->SYNC | if < 900 s ->SPIK | outlyer | 2182 | | adjust freq | else ->SYNC | detected | 2183 | | adjust time | step freq | | 2184 | | | step time | | 2185 +-------+---------------------+-------------------+--------------+ 2186 | FREQ | if < 900 s ->FREQ | if < 900 s ->FREQ | initial | 2187 | | else ->SYNC | else ->SYNC | frequency | 2188 | | step freq | step freq | | 2189 | | adjust time | adjust time | | 2190 +-------+---------------------+-------------------+--------------+ 2191 | SYNC | ->SYNC | if < 900 s ->SPIK | normal | 2192 | | adjust freq | else ->SYNC | operation | 2193 | | adjust time | step freq | | 2194 | | | step time | | 2195 +-------+---------------------+-------------------+--------------+ 2197 Figure 33: State Transition Function 2199 In the table entries the next state is identified by the arrow -> 2200 with the actions listed below. Actions such as adjust time and 2201 adjust frequency are implemented by the PLL/FLL feedback loop in the 2202 local_clock() routine. A step clock action is implemented by setting 2203 the clock directly, but this is done only after the stepout threshold 2204 WATCH (900 s) when the offset is more than the step threshold STEPT 2205 (.125 s). This resists clock steps under conditions of extreme 2206 network congestion. 2208 The jitter (psi) and wander (omega) statistics are computed using an 2209 exponential average with weight factor AVG. The time constant 2210 exponent (tau) is determined by comparing psi with the magnitude of 2211 the current offset theta. If the offset is greater than PGATE (4) 2212 times the clock jitter, the hysteresis counter hyster is reduced by 2213 two; otherwise, it is increased by one. If hyster increases to the 2214 upper limit LIMIT (30), tau is increased by one; if it decreases to 2215 the lower limit -LIMIT (-30), tau is decreased by one. Normally, tau 2216 hovers near MAXPOLL, but quickly decreases if a temperature spike 2217 causes a frequency surge. 2219 12. Clock Adjust Process 2221 The actual clock adjustment is performed by the clock_adjust() 2222 routine in Appendix A.5.6.1. It runs at one-second intervals to add 2223 the frequency correction and a fixed percentage of the residual 2224 offset theta_r. The theta_r is in effect the exponential decay of 2225 the theta value produced by the loop filter at each update. The TC 2226 parameter scales the time constant to match the poll interval for 2227 convenience. Note that the dispersion EPSILON increases by PHI at 2228 each second. 2230 The clock adjust process includes a timer interrupt facility driving 2231 the seconds counter c.t. It begins at zero when the service starts 2232 and increments once each second. At each interrupt the 2233 clock_adjust() routine is called to incorporate the clock discipline 2234 time and frequency adjustments, then the associations are scanned to 2235 determine if the seconds counter equals or exceeds the p.next state 2236 variable defined in the next section. If so, the poll process is 2237 called to send a packet and compute the next p.next value. 2239 13. Poll Process 2241 Each association supports a poll process that runs at regular 2242 intervals to construct and send packets in symmetric, client and 2243 broadcast server associations. It runs continuously, whether or not 2244 servers are reachable in order to manage the clock filter and reach 2245 register. 2247 13.1. Poll Process Variables 2249 Figure 34 summarizes the common names, formula names and a short 2250 description of the poll process variables(lower case) and parameters 2251 (upper case). Unless noted otherwise, all variables have assumed 2252 prefix p. 2254 +---------+---------+--------------------+ 2255 | Name | Formula | Description | 2256 +---------+---------+--------------------+ 2257 | hpoll | hpoll | host poll exponent | 2258 | last | last | last poll time | 2259 | next | next | next poll time | 2260 | reach | reach | reach register | 2261 | unreach | unreach | unreach counter | 2262 | UNREACH | 24 | unreach limit | 2263 | BCOUNT | 8 | burst count | 2264 | BURST | flag | burst enable | 2265 | IBURST | flag | iburst enable | 2266 +---------+---------+--------------------+ 2268 Figure 34: Poll Process Variables and Parameters 2270 The poll process variables are allocated in the association data 2271 structure along with the peer process variables. Following is a 2272 detailed description of the variables. The parameters will be called 2273 out in the following text. 2275 hpoll: signed integer representing the poll exponent, in log2 seconds 2277 last: integer representing the seconds counter when the most recent 2278 packet was sent 2280 next: integer representing the seconds counter when the next packet 2281 is to be sent 2283 reach: 8-bit integer shift register shared by the peer and poll 2284 processes 2286 unreach: integer representing the number of seconds the server has 2287 been unreachable 2289 13.2. Poll Process Operations 2291 As described previously, once each second the clock_adjust() routine 2292 in the clock adjust process is called. This routine calls the poll() 2293 routine in Appendix A.5.7.1 for each association in turn. If the 2294 time for the next poll message is greater than the seconds counter, 2295 the routine returns immediately. Symmetric (modes 1, 2), client 2296 (mode 3) and broadcast server (mode 5) associations routinely send 2297 packets. A broadcast client (mode 6) association runs the routine to 2298 update the reach and unreach variables, but does not send packets. 2299 The poll() routine calls the peer_xmit() routine in Appendix A.5.7.3 2300 to send a packet. If in a burst (burst > 0), nothing further is done 2301 except call the poll_update() routine to set the next poll interval. 2303 If not in a burst, the reach variable is shifted left by one bit, 2304 with zero replacing the rightmost bit. If the server has not been 2305 heard for the last three poll intervals, the clock_filter() routine 2306 is called to increase the dispersion as described in 2307 Appendix A.5.7.3. 2309 If the BURST flag is lit and the server is reachable and a valid 2310 source of synchronization is available, the client sends a burst of 2311 BCOUNT (8) packets at each poll interval. The interval between 2312 packets in the burst is two seconds. This is useful to accurately 2313 measure jitter with long poll intervals. If the IBURST flag is lit 2314 and this is the first packet sent when the server has been 2315 unreachable, the client sends a burst. This is useful to quickly 2316 reduce the synchronization distance below the distance threshold and 2317 synchronize the clock. 2319 If the P_MANY flag is lit in the p.flags word of the association, 2320 this is a manycast client association. Manycast client associations 2321 send client mode packets to designated multicast group addresses at 2322 MINPOLL intervals. The association starts out with a TTL of 1. If 2323 by the time of the next poll there are fewer than MINCLOCK servers 2324 have been mobilized, the ttl is increased by one. If the ttl reaches 2325 the limit TTLMAX, without finding MINCLOCK servers, the poll interval 2326 increases until reaching BEACON, when it starts over from the 2327 beginning. 2329 The poll() routine includes a feature that backs off the poll 2330 interval if the server becomes unreachable. If reach is nonzero, the 2331 server is reachable and unreach is set to zero; otherwise, unreach is 2332 incremented by one for each poll to the maximum UNREACH. Thereafter 2333 for each poll hpoll is increased by one, which doubles the poll 2334 interval up to the maximum MAXPOLL determined by the poll_update() 2335 routine. When the server again becomes reachable, unreach is set to 2336 zero, hpoll is reset to the tc system variable and operation resumes 2337 normally. 2339 A packet is sent by the xmit_packet() routine in Appendix A.3. Some 2340 header values are copied from the peer variables left by a previous 2341 packet and others from the system variables. Figure 35 shows which 2342 values are copied to each header field. In those implementations 2343 using floating double data types for root delay and root dispersion, 2344 these must be converted to NTP short format. All other fields are 2345 either copied intact from peer and system variables or struck as a 2346 timestamp from the system clock. 2348 +-----------------------------------+ 2349 | Packet Variable <-- Variable | 2350 +-----------------------------------+ 2351 | x.leap <-- s.leap | 2352 | x.version <-- s.version | 2353 | x.mode <-- s.mode | 2354 | x.stratum <-- s.stratum | 2355 | x.poll <-- s.poll | 2356 | x.precision <-- s.precision | 2357 | x.rootdelay <-- s.rootdelay | 2358 | x.rootdisp <-- s.rootdisp | 2359 | x.refid <-- s.refid | 2360 | x.reftime <-- s.reftime | 2361 | x.org <-- p.xmt | 2362 | x.rec <-- p.dst | 2363 | x.xmt <-- clock | 2364 | x.keyid <-- p.keyid | 2365 | x.digest <-- md5 digest | 2366 +-----------------------------------+ 2368 Figure 35: xmit_packet Packet Header 2370 The poll_update() routine shown in Appendix A.5.7.2 is called when a 2371 valid packet is received and immediately after a poll message has 2372 been sent. If in a burst, the poll interval is fixed at 2 s; 2373 otherwise, the host poll exponent hpoll is set to the minimum of 2374 ppoll from the last packet received and hpoll from the poll() 2375 routine, but not less than MINPOLL nor greater than MAXPOLL. Thus 2376 the clock discipline can be oversampled, but not undersampled. This 2377 is necessary to preserve subnet dynamic behavior and protect against 2378 protocol errors. 2380 The poll exponent is converted to an interval which when added to the 2381 last variable determines the next variable and thus the time for the 2382 next poll. Finally, the last variable is set to the current seconds 2383 counter. 2385 14. Security Considerations 2387 NTPv4 provides an optional authentication field that utilizes the MD5 2388 algorithm. MD5, as the case for SHA-1, is derived from MD4, which 2389 has long been known to be weak. In 2004, techniques for efficiently 2390 finding collisions in MD5 were announced. A summary of the 2391 apropriateness of MD5 can be found in [10]. 2393 In the case of NTP as specified herein, NTP broadcast clients are 2394 vulnerable to disruption by misbehaving or hostile SNTP or NTP 2395 broadcast servers elsewhere in the Internet. Access controls and/or 2396 cryptographic authentication means should be provided for additional 2397 security in such cases. 2399 15. IANA Considerations 2401 UDP/TCP Port 123 was previously assigned by IANA for this protocol. 2402 The IANA has assigned the IPv4 multicast group address 224.0.1.1 and 2403 the IPv6 multicast address ending :101 for NTP. This document 2404 introduces NTP extension fields allowing for the development of 2405 future extensions to the protocol, where a particular extension is to 2406 be identified by the Field Type sub-field within the extension field. 2407 IANA is requested to establish and maintain a registry for Extension 2408 Field Types associated with this protocol, populating this registry 2409 with no initial entries. As future needs arise, new Extension Field 2410 Types may be defined. Following the policies outlined in [11], new 2411 consensus. 2413 16. Acknowledgements 2415 The editors would like to thank Karen O'Donoghue, Brian Haberman, 2416 Greg Dowd, Mark Elliot, and Harlan Stenn for technical reviews of 2417 this document. 2419 17. Informative References 2421 [1] Mills, D., "Network Time Protocol (Version 3) Specification, 2422 Implementation", RFC 1305, March 1992. 2424 [2] Mills, D., "Simple Network Time Protocol (SNTP) Version 4 for 2425 IPv4, IPv6 and OSI", RFC 4330, January 2006. 2427 [3] Mills, D.L., "The Autokey security architecture, protocol and 2428 algorithms. Electrical and Computer Engineering Technical 2429 Report 06-1-1", NDSS , January 2006. 2431 [4] Mills, D.L., Electrical and Computer Engineering Technical 2432 Report 06-6-1, NDSS, June 2006., "Network Time Protocol Version 2433 4 Reference and Implementation Guide.", 2006. 2435 [5] Mills, D.L., "Computer Network Time Synchronization - the 2436 Network Time Protocol. CRC Press, 304pp.", 2006. 2438 [6] Bradner, S., "Key words for use in RFCs to Indicate Requirement 2439 Levels", BCP 14, RFC 2119, March 1997. 2441 [7] Postel, J., "Internet Protocol", STD 5, RFC 791, 2442 September 1981. 2444 [8] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, 2445 April 1992. 2447 [9] Marzullo and S. Owicki, "Maintaining the time in a distributed 2448 system.", ACM Operating Systems Review 19 , July 1985. 2450 [10] Bellovin, S. and E. Rescorla, Proceedings of the 13th annual 2451 ISOC Network and Distributed System Security Symposium, 2452 "Deploying a new Hash Algorithm", February 2006. 2454 [11] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA 2455 Considerations Section in RFCs", BCP 26, RFC 2434, 2456 October 1998. 2458 Appendix A. Code Skeleton 2460 This appendix is intended to describe the protocol and algorithms of 2461 an implementation in a general way using what is called a code 2462 skeleton program. This consists of a set of definitions, structures 2463 and code fragments which illustrate the protocol operations without 2464 the complexities of an actual implementation of the protocol. This 2465 program is not an executable and is not designed to run in the 2466 ordinary sense. It is designed to be compiled only in order to 2467 verify consistent variable and type usage. 2469 Most of the features of the reference implementation are included 2470 here, with the following exceptions: There are no provisions for 2471 reference clocks or public key (Autokey) cryptography. There is no 2472 huff-n'-puff filter, anti-clockhop hysteresis or monitoring 2473 provisions. Many of the values that can be tinkered in the reference 2474 implementation are assumed constants here. There are only minimal 2475 provisions for the kiss-o'death packet and no responding code. 2477 The program is not intended to be fast or compact, just to 2478 demonstrate the algorithms with sufficient fidelity to understand how 2479 they work. The code skeleton consists of eight segments, a header 2480 segment included by each of the other segments, plus a code segment 2481 for the main program, kernel I/O and system clock interfaces, and 2482 peer, system, clock_adjust and poll processes. These are presented 2483 in order below along with definitions and variables specific to each 2484 process. 2486 A.1. Global Definitions 2488 A.1.1. Definitions, Constants, Parameters 2490 #include /* avoids complaints about sqrt() */ 2491 #include /* for gettimeofday() and friends */ 2492 #include /* for malloc() and friends */ 2494 /* 2495 * Data types 2496 * 2497 * This program assumes the int data type is 32 bitsand the long data 2498 * type is 64 bits. The native data type used in most calculations is 2499 * floating double. The data types used in some packet header fields 2500 * require conversion to and from this representation. Some header 2501 * fields involve partitioning an octet, here represeted by individual 2502 * octets. 2503 * 2504 * The 64-bit NTP timestamp format used in timestamp calculations is 2505 * unsigned seconds and fraction with the decimal point to the left of 2506 * bit 32. The only operation permitted with these values is 2507 * subtraction, yielding a signed 31-bit difference. The 32-bit NTP 2508 * short format used in delay and dispersion calculations is seconds and 2509 * fraction with the decimal point to the left of bit 16. The only 2510 * operations permitted with these values are addition and 2511 * multiplication by a constant. 2512 * 2513 * The IPv4 address is 32 bits, while the IPv6 address is 128 bits. The 2514 * message digest field is 128 bits as constructed by the MD5 algorithm. 2515 * The precision and poll interval fields are signed log2 seconds. 2516 */ 2517 typedef unsigned long tstamp; /* NTP timestamp format */ 2518 typedef unsigned int tdist; /* NTP short format */ 2519 typedef unsigned long ipaddr; /* IPv4 or IPv6 address */ 2520 typedef unsigned long digest; /* md5 digest */ 2521 typedef signed char s_char; /* precision and poll interval (log2) */ 2523 /* 2524 * Timestamp conversion macroni 2525 */ 2526 #define FRIC 65536. /* 2^16 as a double */ 2527 #define D2FP(r) ((tdist)((r) * FRIC)) /* NTP short */ 2528 #define FP2D(r) ((double)(r) / FRIC) 2530 #define FRAC 4294967296. /* 2^32 as a double */ 2531 #define D2LFP(a) ((tstamp)((a) * FRAC)) /* NTP timestamp */ 2532 #define LFP2D(a) ((double)(a) / FRAC) 2533 #define U2LFP(a) ((a).tv_sec + JAN_1970 << 32 + (unsigned long) \ 2534 ((a).tv_usec / 1e6 * FRAC)) 2536 /* 2537 * Arithmetic conversions 2538 */ 2539 #define LOG2D(a) ((a) < 0 ? 1. / (1L << -(a)) : \ 2540 1L << (a)) /* poll, etc. */ 2541 #define SQUARE(x) (x * x) 2542 #define SQRT(x) (sqrt(x)) 2544 /* 2545 * Global constants. Some of these might be converted to variables 2546 * which can be tinkered by configuration or computed on-fly. For 2547 * instance, the reference implementation computes PRECISION on-fly and 2548 * provides performance tuning for the defines marked with % below. 2549 */ 2550 #define VERSION 4 /* version number */ 2551 #define MINDISP .01 /* % minimum dispersion (s) */ 2552 #define MAXDISP 16 /* maximum dispersion (s) */ 2553 #define MAXDIST 1 /* % distance threshold (s) */ 2554 #define NOSYNC 0x3 /* leap unsync */ 2555 #define MAXSTRAT 16 /* maximum stratum (infinity metric) */ 2556 #define MINPOLL 6 /* % minimum poll interval (64 s)*/ 2557 #define MAXPOLL 17 /* % maximum poll interval (36.4 h) */ 2558 #define MINCLOCK 3 /* minimum manycast survivors */ 2559 #define MAXCLOCK 10 /* maximum manycast candidates */ 2560 #define TTLMAX 8 /* max ttl manycast */ 2561 #define BEACON 15 /* max interval between beacons */ 2563 #define PHI 15e-6 /* % frequency tolerance (15 PPM) */ 2564 #define NSTAGE 8 /* clock register stages */ 2565 #define NMAX 50 /* maximum number of peers */ 2566 #define NSANE 1 /* % minimum intersection survivors */ 2567 #define NMIN 3 /* % minimum cluster survivors */ 2569 /* 2570 * Global return values 2571 */ 2572 #define TRUE 1 /* boolean true */ 2573 #define FALSE 0 /* boolean false */ 2574 #define NULL 0 /* empty pointer */ 2576 /* 2577 * Local clock process return codes 2578 */ 2579 #define IGNORE 0 /* ignore */ 2580 #define SLEW 1 /* slew adjustment */ 2581 #define STEP 2 /* step adjustment */ 2582 #define PANIC 3 /* panic - no adjustment */ 2584 /* 2585 * System flags 2586 */ 2587 #define S_FLAGS 0 /* any system flags */ 2588 #define S_BCSTENAB 0x1 /* enable broadcast client */ 2590 /* 2591 * Peer flags 2592 */ 2593 #define P_FLAGS 0 /* any peer flags */ 2594 #define P_EPHEM 0x01 /* association is ephemeral */ 2595 #define P_BURST 0x02 /* burst enable */ 2596 #define P_IBURST 0x04 /* intial burst enable */ 2597 #define P_NOTRUST 0x08 /* authenticated access */ 2598 #define P_NOPEER 0x10 /* authenticated mobilization */ 2599 #define P_MANY 0x20 /* manycast client */ 2601 /* 2602 * Authentication codes 2603 */ 2604 #define A_NONE 0 /* no authentication */ 2605 #define A_OK 1 /* authentication OK */ 2606 #define A_ERROR 2 /* authentication error */ 2607 #define A_CRYPTO 3 /* crypto-NAK */ 2609 /* 2610 * Association state codes 2611 */ 2612 #define X_INIT 0 /* initialization */ 2613 #define X_STALE 1 /* timeout */ 2614 #define X_STEP 2 /* time step */ 2615 #define X_ERROR 3 /* authentication error */ 2616 #define X_CRYPTO 4 /* crypto-NAK received */ 2617 #define X_NKEY 5 /* untrusted key */ 2619 /* 2620 * Protocol mode definitionss 2621 */ 2622 #define M_RSVD 0 /* reserved */ 2623 #define M_SACT 1 /* symmetric active */ 2624 #define M_PASV 2 /* symmetric passive */ 2625 #define M_CLNT 3 /* client */ 2626 #define M_SERV 4 /* server */ 2627 #define M_BCST 5 /* broadcast server */ 2628 #define M_BCLN 6 /* broadcast client */ 2629 /* 2630 * Clock state definitions 2631 */ 2632 #define NSET 0 /* clock never set */ 2633 #define FSET 1 /* frequency set from file */ 2634 #define SPIK 2 /* spike detected */ 2635 #define FREQ 3 /* frequency mode */ 2636 #define SYNC 4 /* clock synchronized */ 2638 A.1.2. Packet Data Structures 2640 /* 2641 * The receive and transmit packets may contain an optional message 2642 * authentication code (MAC) consisting of a key identifier (keyid) and 2643 * message digest (mac). NTPv4 supports optional extension fields which 2644 * are inserted after the the header and before the MAC, but these are 2645 * not described here. 2646 * 2647 * Receive packet 2648 * 2649 * Note the dst timestamp is not part of the packet itself. It is 2650 * captured upon arrival and returned in the receive buffer along with 2651 * the buffer length and data. Note that some of the char fields are 2652 * packed in the actual header, but the details are omited here. 2653 */ 2654 struct r { 2655 ipaddr srcaddr; /* source (remote) address */ 2656 ipaddr dstaddr; /* destination (local) address */ 2657 char version; /* version number */ 2658 char leap; /* leap indicator */ 2659 char mode; /* mode */ 2660 char stratum; /* stratum */ 2661 char poll; /* poll interval */ 2662 s_char precision; /* precision */ 2663 tdist rootdelay; /* root delay */ 2664 tdist rootdisp; /* root dispersion */ 2665 char refid; /* reference ID */ 2666 tstamp reftime; /* reference time */ 2667 tstamp org; /* origin timestamp */ 2668 tstamp rec; /* receive timestamp */ 2669 tstamp xmt; /* transmit timestamp */ 2670 int keyid; /* key ID */ 2671 digest mac; /* message digest */ 2672 tstamp dst; /* destination timestamp */ 2673 } r; 2675 /* 2676 * Transmit packet 2677 */ 2678 struct x { 2679 ipaddr dstaddr; /* source (local) address */ 2680 ipaddr srcaddr; /* destination (remote) address */ 2681 char version; /* version number */ 2682 char leap; /* leap indicator */ 2683 char mode; /* mode */ 2684 char stratum; /* stratum */ 2685 char poll; /* poll interval */ 2686 s_char precision; /* precision */ 2687 tdist rootdelay; /* root delay */ 2688 tdist rootdisp; /* root dispersion */ 2689 char refid; /* reference ID */ 2690 tstamp reftime; /* reference time */ 2691 tstamp org; /* origin timestamp */ 2692 tstamp rec; /* receive timestamp */ 2693 tstamp xmt; /* transmit timestamp */ 2694 int keyid; /* key ID */ 2695 digest mac; /* message digest */ 2696 } x; 2698 A.1.3. Association Data Structures 2700 /* 2701 * Filter stage structure. Note the t member in this and other 2702 * structures refers to process time, not real time. Process time 2703 * increments by one second for every elapsed second of real time. 2704 */ 2705 struct f { 2706 tstamp t; /* update time */ 2707 double offset; /* clock ofset */ 2708 double delay; /* roundtrip delay */ 2709 double disp; /* dispersion */ 2710 } f; 2712 /* 2713 * Association structure. This is shared between the peer process and 2714 * poll process. 2715 */ 2716 struct p { 2718 /* 2719 * Variables set by configuration 2720 */ 2721 ipaddr srcaddr; /* source (remote) address */ 2722 ipaddr dstaddr; /* destination (local) address */ 2723 char version; /* version number */ 2724 char hmode; /* host mode */ 2725 int keyid; /* key identifier */ 2726 int flags; /* option flags */ 2728 /* 2729 * Variables set by received packet 2730 */ 2731 char leap; /* leap indicator */ 2732 char pmode; /* peer mode */ 2733 char stratum; /* stratum */ 2734 char ppoll; /* peer poll interval */ 2735 double rootdelay; /* root delay */ 2736 double rootdisp; /* root dispersion */ 2737 char refid; /* reference ID */ 2738 tstamp reftime; /* reference time */ 2739 #define begin_clear org /* beginning of clear area */ 2740 tstamp org; /* originate timestamp */ 2741 tstamp rec; /* receive timestamp */ 2742 tstamp xmt; /* transmit timestamp */ 2744 /* 2745 * Computed data 2746 */ 2747 double t; /* update time */ 2748 struct f f[NSTAGE]; /* clock filter */ 2749 double offset; /* peer offset */ 2750 double delay; /* peer delay */ 2751 double disp; /* peer dispersion */ 2752 double jitter; /* RMS jitter */ 2754 /* 2755 * Poll process variables 2756 */ 2757 char hpoll; /* host poll interval */ 2758 int burst; /* burst counter */ 2759 int reach; /* reach register */ 2760 int ttl; /* ttl (manycast) */ 2761 #define end_clear unreach /* end of clear area */ 2762 int unreach; /* unreach counter */ 2763 int outdate; /* last poll time */ 2764 int nextdate; /* next poll time */ 2765 } p; 2767 A.1.4. System Data Structures 2769 /* 2770 * Chime list. This is used by the intersection algorithm. 2771 */ 2772 struct m { /* m is for Marzullo */ 2773 struct p *p; /* peer structure pointer */ 2774 int type; /* high +1, mid 0, low -1 */ 2775 double edge; /* correctness interval edge */ 2776 } m; 2778 /* 2779 * Survivor list. This is used by the clustering algorithm. 2780 */ 2781 struct v { 2782 struct p *p; /* peer structure pointer */ 2783 double metric; /* sort metric */ 2784 } v; 2786 /* 2787 * System structure 2788 */ 2789 struct s { 2790 tstamp t; /* update time */ 2791 char leap; /* leap indicator */ 2792 char stratum; /* stratum */ 2793 char poll; /* poll interval */ 2794 char precision; /* precision */ 2795 double rootdelay; /* root delay */ 2796 double rootdisp; /* root dispersion */ 2797 char refid; /* reference ID */ 2798 tstamp reftime; /* reference time */ 2799 struct m m[NMAX]; /* chime list */ 2800 struct v v[NMAX]; /* survivor list */ 2801 struct p *p; /* association ID */ 2802 double offset; /* combined offset */ 2803 double jitter; /* combined jitter */ 2804 int flags; /* option flags */ 2805 int n; /* number of survivors */ 2806 } s; 2808 A.1.5. Local Clock Data Structures 2810 /* 2811 * Local clock structure 2812 */ 2813 struct c { 2814 tstamp t; /* update time */ 2815 int state; /* current state */ 2816 double offset; /* current offset */ 2817 double last; /* previous offset */ 2818 int count; /* jiggle counter */ 2819 double freq; /* frequency */ 2820 double jitter; /* RMS jitter */ 2821 double wander; /* RMS wander */ 2822 } c; 2824 A.1.6. Function Prototypes 2826 /* 2827 * Peer process 2828 */ 2829 void receive(struct r *); /* receive packet */ 2830 void packet(struct p *, struct r *); /* process packet */ 2831 void clock_filter(struct p *, double, double, double); /* filter */ 2832 double root_dist(struct p *); /* calculate root distance */ 2833 int fit(struct p *); /* determine fitness of server */ 2834 void clear(struct p *, int); /* clear association */ 2835 int access(struct r *); /* determine access restrictions */ 2837 /* 2838 * System process 2839 */ 2840 int main(); /* main program */ 2841 void clock_select(); /* find the best clocks */ 2842 void clock_update(struct p *); /* update the system clock */ 2843 void clock_combine(); /* combine the offsets */ 2845 /* 2846 * Local clock process 2847 */ 2848 int local_clock(struct p *, double); /* clock discipline */ 2849 void rstclock(int, double, double); /* clock state transition */ 2851 /* 2852 * Clock adjust process 2853 */ 2854 void clock_adjust(); /* one-second timer process */ 2855 /* 2856 * Poll process 2857 */ 2858 void poll(struct p *); /* poll process */ 2859 void poll_update(struct p *, int); /* update the poll interval */ 2860 void peer_xmit(struct p *); /* transmit a packet */ 2861 void fast_xmit(struct r *, int, int); /* transmit a reply packet */ 2863 /* 2864 * Utility routines 2865 */ 2866 digest md5(int); /* generate a message digest */ 2867 struct p *mobilize(ipaddr, ipaddr, int, int, int, int); /* mobilize */ 2868 struct p *find_assoc(struct r *); /* search the association table */ 2870 /* 2871 * Kernel interface 2872 */ 2873 struct r *recv_packet(); /* wait for packet */ 2874 void xmit_packet(struct x *); /* send packet */ 2875 void step_time(double); /* step time */ 2876 void adjust_time(double); /* adjust (slew) time */ 2877 tstamp get_time(); /* read time */ 2879 A.2. Main Program and Utility Routines 2881 /* 2882 * Definitions 2883 */ 2884 #define PRECISION -18 /* precision (log2 s) */ 2885 #define IPADDR 0 /* any IP address */ 2886 #define MODE 0 /* any NTP mode */ 2887 #define KEYID 0 /* any key identifier */ 2889 /* 2890 * main() - main program 2891 */ 2892 int 2893 main() 2894 { 2895 struct p *p; /* peer structure pointer */ 2896 struct r *r; /* receive packet pointer */ 2898 /* 2899 * Read command line options and initialize system variables. 2900 * The reference implementation measures the precision specific 2901 * to each machine by measuring the clock increments to read the 2902 * system clock. 2904 */ 2905 memset(&s, sizeof(s), 0); 2906 s.leap = NOSYNC; 2907 s.stratum = MAXSTRAT; 2908 s.poll = MINPOLL; 2909 s.precision = PRECISION; 2910 s.p = NULL; 2912 /* 2913 * Initialize local clock variables 2914 */ 2915 memset(&c, sizeof(c), 0); 2916 if (/* frequency file */ 0) { 2917 c.freq = /* freq */ 0; 2918 rstclock(FSET, 0, 0); 2919 } else { 2920 rstclock(NSET, 0, 0); 2921 } 2922 c.jitter = LOG2D(s.precision); 2924 /* 2925 * Read the configuration file and mobilize persistent 2926 * associations with spcified addresses, version, mode, key ID 2927 * and flags. 2928 */ 2929 while (/* mobilize configurated associations */ 0) { 2930 p = mobilize(IPADDR, IPADDR, VERSION, MODE, KEYID, 2931 P_FLAGS); 2932 } 2934 /* 2935 * Start the system timer, which ticks once per second. Then 2936 * read packets as they arrive, strike receive timestamp and 2937 * call the receive() routine. 2938 */ 2939 while (0) { 2940 r = recv_packet(); 2941 r->dst = get_time(); 2942 receive(r); 2943 } 2944 } 2946 /* 2947 * mobilize() - mobilize and initialize an association 2948 */ 2949 struct p 2950 *mobilize( 2951 ipaddr srcaddr, /* IP source address */ 2952 ipaddr dstaddr, /* IP destination address */ 2953 int version, /* version */ 2954 int mode, /* host mode */ 2955 int keyid, /* key identifier */ 2956 int flags /* peer flags */ 2957 ) 2958 { 2959 struct p *p; /* peer process pointer */ 2961 /* 2962 * Allocate and initialize association memory 2963 */ 2964 p = malloc(sizeof(struct p)); 2965 p->srcaddr = srcaddr; 2966 p->dstaddr = dstaddr; 2967 p->version = version; 2968 p->hmode = mode; 2969 p->keyid = keyid; 2970 p->hpoll = MINPOLL; 2971 clear(p, X_INIT); 2972 p->flags == flags; 2973 return (p); 2974 } 2976 /* 2977 * find_assoc() - find a matching association 2978 */ 2979 struct p /* peer structure pointer or NULL */ 2980 *find_assoc( 2981 struct r *r /* receive packet pointer */ 2982 ) 2983 { 2984 struct p *p; /* dummy peer structure pointer */ 2986 /* 2987 * Search association table for matching source 2988 * address, source port and mode. 2989 */ 2990 while (/* all associations */ 0) { 2991 if (r->srcaddr == p->srcaddr && r->mode == p->hmode) 2992 return(p); 2993 } 2994 return (NULL); 2995 } 2997 /* 2998 * md5() - compute message digest 2999 */ 3001 digest 3002 md5( 3003 int keyid /* key identifier */ 3004 ) 3005 { 3006 /* 3007 * Compute a keyed cryptographic message digest. The key 3008 * identifier is associated with a key in the local key cache. 3009 * The key is prepended to the packet header and extension fieds 3010 * and the result hashed by the MD5 algorithm as described in 3011 * RFC-1321. Return a MAC consisting of the 32-bit key ID 3012 * concatenated with the 128-bit digest. 3013 */ 3014 return (/* MD5 digest */ 0); 3015 } 3017 A.3. Kernel Input/Output Interface 3019 /* 3020 * Kernel interface to transmit and receive packets. Details are 3021 * deliberately vague and depend on the operating system. 3022 * 3023 * recv_packet - receive packet from network 3024 */ 3025 struct r /* receive packet pointer*/ 3026 *recv_packet() { 3027 return (/* receive packet r */ 0); 3028 } 3030 /* 3031 * xmit_packet - transmit packet to network 3032 */ 3033 void 3034 xmit_packet( 3035 struct x *x /* transmit packet pointer */ 3036 ) 3037 { 3038 /* send packet x */ 3039 } 3041 A.4. Kernel System Clock Interface 3043 /* 3044 * System clock utility functions 3045 * 3046 * There are three time formats: native (Unix), NTP and floating double. 3047 * The get_time() routine returns the time in NTP long format. The Unix 3048 * routines expect arguments as a structure of two signed 32-bit words 3049 * in seconds and microseconds (timeval) or nanoseconds (timespec). The 3050 * step_time() and adjust_time() routines expect signed arguments in 3051 * floating double. The simplified code shown here is for illustration 3052 * only and has not been verified. 3053 */ 3054 #define JAN_1970 2208988800UL /* 1970 - 1900 in seconds */ 3056 /* 3057 * get_time - read system time and convert to NTP format 3058 */ 3059 tstamp 3060 get_time() 3061 { 3062 struct timeval unix_time; 3064 /* 3065 * There are only two calls on this routine in the program. One 3066 * when a packet arrives from the network and the other when a 3067 * packet is placed on the send queue. Call the kernel time of 3068 * day routine (such as gettimeofday()) and convert to NTP 3069 * format. 3070 */ 3071 gettimeofday(&unix_time, NULL); 3072 return (U2LFP(unix_time)); 3073 } 3075 /* 3076 * step_time() - step system time to given offset valuet 3077 */ 3078 void 3079 step_time( 3080 double offset /* clock offset */ 3081 ) 3082 { 3083 struct timeval unix_time; 3084 tstamp ntp_time; 3086 /* 3087 * Convert from double to native format (signed) and add to the 3088 * current time. Note the addition is done in native format to 3089 * avoid overflow or loss of precision. 3090 */ 3091 gettimeofday(&unix_time, NULL); 3092 ntp_time = D2LFP(offset) + U2LFP(unix_time);; 3093 unix_time.tv_sec = ntp_time >> 32; 3094 unix_time.tv_usec = (long)((ntp_time - unix_time.tv_sec << 3095 32) / FRAC * 1e6); 3096 settimeofday(&unix_time, NULL); 3097 } 3099 /* 3100 * adjust_time() - slew system clock to given offset value 3101 */ 3102 void 3103 adjust_time( 3104 double offset /* clock offset */ 3105 ) 3106 { 3107 struct timeval unix_time; 3108 tstamp ntp_time; 3110 /* 3111 * Convert from double to native format (signed) and add to the 3112 * current time. 3113 */ 3114 ntp_time = D2LFP(offset); 3115 unix_time.tv_sec = ntp_time >> 32; 3116 unix_time.tv_usec = (long)((ntp_time - unix_time.tv_sec << 3117 32) / FRAC * 1e6); 3118 adjtime(&unix_time, NULL); 3119 } 3121 A.5. Peer Process 3123 /* 3124 * A crypto-NAK packet includes the NTP header followed by a MAC 3125 * consisting only of the key identifier with value zero. It tells the 3126 * receiver that a prior request could not be properly authenticated, 3127 * but the NTP header fields are correct. 3128 * 3129 * A kiss-o'-death packet is an NTP header with leap 0x3 (NOSYNC) and 3130 * stratum 16 (MAXSTRAT. It tells the receiver that something drastic 3131 * has happened, as revealled by the kiss code in the refid field. The 3132 * NTP header fields may or may not be correct. 3133 */ 3134 /* 3135 * Peer process parameters and constants 3136 */ 3137 #define SGATE 3 /* spike gate (clock filter */ 3138 #define BDELAY .004 /* broadcast delay (s) */ 3140 /* 3141 * Dispatch codes 3142 */ 3143 #define ERR -1 /* error */ 3144 #define DSCRD 0 /* discard packet */ 3145 #define PROC 1 /* process packet */ 3146 #define BCST 2 /* broadcast packet */ 3147 #define FXMIT 3 /* client packet */ 3148 #define NEWPS 4 /* new symmetric passive client */ 3149 #define NEWBC 5 /* new broadcast client */ 3151 /* 3152 * Dispatch matrix 3153 * active passv client server bcast */ 3154 int table[7][5] = { 3155 /* nopeer */ { NEWPS, DSCRD, FXMIT, MANY, NEWBC }, 3156 /* active */ { PROC, PROC, DSCRD, DSCRD, DSCRD }, 3157 /* passv */ { PROC, ERR, DSCRD, DSCRD, DSCRD }, 3158 /* client */ { DSCRD, DSCRD, DSCRD, PROC, DSCRD }, 3159 /* server */ { DSCRD, DSCRD, DSCRD, DSCRD, DSCRD }, 3160 /* bcast */ { DSCRD, DSCRD, DSCRD, DSCRD, DSCRD }, 3161 /* bclient */ { DSCRD, DSCRD, DSCRD, DSCRD, PROC} 3162 }; 3164 /* 3165 * Miscellaneous macroni 3166 * 3167 * This macro defines the authentication state. If x is 0, 3168 * authentication is optional, othewise it is required. 3169 */ 3170 #define AUTH(x, y) ((x) ? (y) == A_OK : (y) == A_OK || \ 3171 (y) == A_NONE) 3173 /* 3174 * These are used by the clear() routine 3175 */ 3176 #define BEGIN_CLEAR(p) ((char *)&((p)->begin_clear)) 3177 #define END_CLEAR(p) ((char *)&((p)->end_clear)) 3178 #define LEN_CLEAR (END_CLEAR((struct p *)0) - \ 3179 BEGIN_CLEAR((struct p *)0)) 3181 A.5.1. receive() 3183 /* 3184 * receive() - receive packet and decode modes 3185 */ 3186 void 3187 receive( 3188 struct r *r /* receive packet pointer */ 3189 ) 3191 { 3192 struct p *p; /* peer structure pointer 3193 int auth; /* authentication code */ 3194 int has_mac; /* size of MAC */ 3195 int synch; /* synchronized switch */ 3196 int auth; /* authentication code */ 3198 /* 3199 * Check access control lists. The intent here is to implement a 3200 * whitelist of those IP addresses specifically accepted and/or 3201 * a blacklist of those IP addresses specifically rejected. 3202 * There could be different lists for authenticated clients and 3203 * unauthenticated clients. 3204 */ 3205 if (!access(r)) 3206 return; /* access denied */ 3208 /* 3209 * The version must not be in the future. Format checks include 3210 * packet length, MAC length and extension field lengths, if 3211 * present. 3212 */ 3213 if (r->version > VERSION /* or format error */) 3214 return; /* format error */ 3216 /* 3217 * Authentication is conditioned by two switches which can be 3218 * specified on a per-client basis. 3219 * 3220 * P_NOPEER do not mobilize an association unless 3221 * authenticated 3222 * P_NOTRUST do not allow access unless authenticated 3223 * (implies P_NOPEER) 3224 * 3225 * There are four outcomes: 3226 * 3227 * A_NONE the packet has no MAC 3228 * A_OK the packet has a MAC and authentication 3229 * succeeds 3230 * A_ERROR the packet has a MAC and authentication fails 3231 * A_CRYPTO crypto-NAK. The MAC has four octets only. 3232 * 3233 * Note: The AUTH(x, y) macro is used to filter outcomes. If x 3234 * is zero, acceptable outcomes of y are NONE and OK. If x is 3235 * one, the only acceptable outcome of y is OK. 3236 */ 3237 has_mac = /* length of MAC field */ 0; 3238 if (has_mac == 0) { 3239 auth = A_NONE; /* not required */ 3240 } else if (has_mac == 4) { 3241 auth == A_CRYPTO; /* crypto-NAK */ 3242 } else { 3243 if (r->mac != md5(r->keyid)) 3244 auth = A_ERROR; /* auth error */ 3245 else 3246 auth = A_OK; /* auth OK */ 3247 } 3249 /* 3250 * Find association and dispatch code. If there is no 3251 * association to match, the value of p->hmode is assumed NULL. 3252 */ 3253 p = find_assoc(r); 3254 switch(table[p->hmode][r->mode]) { 3256 /* 3257 * Client packet and no association. Send server reply without 3258 * saving state. 3259 */ 3260 case FXMIT: 3262 /* 3263 * If unicast destination address, send server packet. 3264 * If authentication fails, send a crypto-NAK packet. 3265 /* 3266 if (/* not multicast dstaddr */0) { 3267 if (AUTH(p->flags & P_NOTRUST, auth)) 3268 fast_xmit(r, M_SERV, auth); 3269 else if (auth == A_ERROR) 3270 fast_xmit(r, M_SERV, A_CRYPTO); 3271 return; /* M_SERV packet sent */ 3272 } 3274 /* 3275 * This must be manycast. Do not repspond if we are not 3276 * synchronized or if our stratum is above the 3277 * manycaster. 3278 */ 3279 if (s.leap == NOSYNC || s.stratum > r->stratum) 3280 return; 3282 /* 3283 * Respond only if authentication is ok. Note that the 3284 * unicast addreess is used, not the multicast. 3285 */ 3286 if (AUTH(p->flags & P_NOTRUST, auth)) 3287 fast_xmit(r, M_SERV, auth); 3288 return; 3290 /* 3291 * New manycase client ephemeral association. It is mobilized in 3292 * the same version as in the packet. If authentication fails, 3293 * ignore the packet. Verify the server packet by comparing the 3294 * r->org timestamp in the packet with the p->xmt timestamp in 3295 * the multicast client association. If they match, the server 3296 * packet is authentic. Details omitted. 3297 */ 3298 case MANY: 3299 if (!AUTH(p->flags & (P_NOTRUST | P_NOPEER), auth)) 3300 return; /* authentication error */ 3302 p = mobilize(r->srcaddr, r->dstaddr, r->version, M_CLNT, 3303 r->keyid, P_EPHEM); 3304 break; 3306 /* 3307 * New symmetric passive ssociation. It is mobilized in the same 3308 * version as in the packet. If authentication fails, send a 3309 * crypto-NAK packet. If restrict no-moblize, send a symmetric 3310 * active packet instead. 3311 */ 3312 case NEWPS: 3313 if (!AUTH(p->flags & P_NOTRUST, auth)) { 3314 if (auth == A_ERROR) 3315 fast_xmit(r, M_SACT, A_CRYPTO); 3316 return; /* crypto-NAK packet sent */ 3317 } 3318 if (!AUTH(p->flags & P_NOPEER, auth)) { 3319 fast_xmit(r, M_SACT, auth); 3320 return; /* M_SACT packet sent */ 3321 } 3322 p = mobilize(r->srcaddr, r->dstaddr, r->version, M_PASV, 3323 r->keyid, P_EPHEM); 3324 break; 3326 /* 3327 * New broadcast client association. It is mobilized in the same 3328 * version as in the packet. If authentication fails, ignore the 3329 * packet. Note this code does not support the initial volley 3330 * feature in the reference implementation. 3331 */ 3332 case NEWBC: 3333 if (!AUTH(p->flags & (P_NOTRUST | P_NOPEER), auth)) 3334 return; /* authentication error */ 3336 if (!(s.flags & S_BCSTENAB)) 3337 return; /* broadcast not enabled */ 3339 p = mobilize(r->srcaddr, r->dstaddr, r->version, M_BCLN, 3340 r->keyid, P_EPHEM); 3341 break; /* processing continues */ 3343 /* 3344 * Process packet. Placeholdler only. 3345 */ 3346 case PROC: 3347 break; /* processing continues */ 3349 /* 3350 * Invalid mode combination. We get here only in case of 3351 * ephemeral associations, so the correct action is simply to 3352 * toss it. 3353 */ 3354 case ERR: 3355 clear(p, X_ERROR); 3356 return; /* invalid mode combination */ 3358 /* 3359 * No match; just discard the packet. 3360 */ 3361 case DSCRD: 3362 return; /* orphan abandoned */ 3363 } 3365 /* 3366 * Next comes a rigorous schedule of timestamp checking. If the 3367 * transmit timestamp is zero, the server is horribly broken. 3368 */ 3369 if (r->xmt == 0) 3370 return; /* invalid timestamp */ 3372 /* 3373 * If the transmit timestamp duplicates a previous one, the 3374 * packet is a replay. 3375 */ 3376 if (r->xmt == p->xmt) 3377 return; /* duplicate packet */ 3379 /* 3380 * If this is a broadcast mode packet, skip further checking. 3381 * If the origin timestamp is zero, the sender has not yet heard 3382 * from us. Otherwise, if the origin timestamp does not match 3383 * the transmit timestamp, the packet is bogus. 3385 */ 3386 synch = TRUE; 3387 if (r->mode != M_BCST) { 3388 if (r->org == 0) 3389 synch = FALSE; /* unsynchronized */ 3391 else if (r->org != p->xmt) 3392 synch = FALSE; /* bogus packet */ 3393 } 3395 /* 3396 * Update the origin and destination timestamps. If 3397 * unsynchronized or bogus, abandon ship. 3398 */ 3399 p->org = r->xmt; 3400 p->rec = r->dst; 3401 if (!synch) 3402 return; /* unsynch */ 3404 /* 3405 * The timestamps are valid and the receive packet matches the 3406 * last one sent. If the packet is a crypto-NAK, the server 3407 * might have just changed keys. We demobilize the association 3408 * and wait for better times. 3409 */ 3410 if (auth == A_CRYPTO) { 3411 clear(p, X_CRYPTO); 3412 return; /* crypto-NAK */ 3413 } 3415 /* 3416 * If the association is authenticated, the key ID is nonzero 3417 * and received packets must be authenticated. This is designed 3418 * to avoid a bait-and-switch attack, which was possible in past 3419 * versions. 3420 */ 3421 if (!AUTH(p->keyid || (p->flags & P_NOTRUST), auth)) 3422 return; /* bad auth */ 3424 /* 3425 * Everything possible has been done to validate the timestamps 3426 * and prevent bad guys from disrupting the protocol or 3427 * injecting bogus data. Earn some revenue. 3428 */ 3429 packet(p, r); 3430 } 3432 A.5.1.1. packet() 3433 /* 3434 * packet() - process packet and compute offset, delay and 3435 * dispersion. 3436 */ 3437 void 3438 packet( 3439 struct p *p, /* peer structure pointer */ 3440 struct r *r /* receive packet pointer */ 3441 ) 3442 { 3443 double offset; /* sample offsset */ 3444 double delay; /* sample delay */ 3445 double disp; /* sample dispersion */ 3447 /* 3448 * By golly the packet is valid. Light up the remaining header 3449 * fields. Note that we map stratum 0 (unspecified) to MAXSTRAT 3450 * to make stratum comparisons simpler and to provide a natural 3451 * interface for radio clock drivers that operate for 3452 * convenience at stratum 0. 3453 */ 3454 p->leap = r->leap; 3455 if (r->stratum == 0) 3456 p->stratum = MAXSTRAT; 3457 else 3458 p->stratum = r->stratum; 3459 p->pmode = r->mode; 3460 p->ppoll = r->poll; 3461 p->rootdelay = FP2D(r->rootdelay); 3462 p->rootdisp = FP2D(r->rootdisp); 3463 p->refid = r->refid; 3464 p->reftime = r->reftime; 3466 /* 3467 * Verify the server is synchronized with valid stratum and 3468 * reference time not later than the transmit time. 3469 */ 3470 if (p->leap == NOSYNC || p->stratum >= MAXSTRAT) 3471 return; /* unsynchronized */ 3473 /* 3474 * Verify valid root distance. 3475 */ 3476 if (r->rootdelay / 2 + r->rootdisp >= MAXDISP || p->reftime > 3477 r->xmt) 3478 return; /* invalid header values */ 3480 poll_update(p, p->hpoll); 3481 p->reach |= 1; 3483 /* 3484 * Calculate offset, delay and dispersion, then pass to the 3485 * clock filter. Note carefully the implied processing. The 3486 * first-order difference is done directly in 64-bit arithmetic, 3487 * then the result is converted to floating double. All further 3488 * processing is in floating double arithmetic with rounding 3489 * done by the hardware. This is necessary in order to avoid 3490 * overflow and preseve precision. 3491 * 3492 * The delay calculation is a special case. In cases where the 3493 * server and client clocks are running at different rates and 3494 * with very fast networks, the delay can appear negative. In 3495 * order to avoid violating the Principle of Least Astonishment, 3496 * the delay is clamped not less than the system precision. 3497 */ 3498 if (p->pmode == M_BCST) { 3499 offset = LFP2D(r->xmt - r->dst); 3500 delay = BDELAY; 3501 disp = LOG2D(r->precision) + LOG2D(s.precision) + PHI * 3502 2 * BDELAY; 3503 } else { 3504 offset = (LFP2D(r->rec - r->org) + LFP2D(r->dst - 3505 r->xmt)) / 2; 3506 delay = max(LFP2D(r->dst - r->org) - LFP2D(r->rec - 3507 r->xmt), LOG2D(s.precision)); 3508 disp = LOG2D(r->precision) + LOG2D(s.precision) + PHI * 3509 LFP2D(r->dst - r->org); 3510 } 3511 clock_filter(p, offset, delay, disp); 3512 } 3514 A.5.2. clock_filter() 3516 /* 3517 * clock_filter(p, offset, delay, dispersion) - select the best from the 3518 * latest eight delay/offset samples. 3519 */ 3520 void 3521 clock_filter( 3522 struct p *p, /* peer structure pointer */ 3523 double offset, /* clock offset */ 3524 double delay, /* roundtrip delay */ 3525 double disp /* dispersion */ 3526 ) 3527 { 3528 struct f f[NSTAGE]; /* sorted list */ 3529 double dtemp; 3530 int i; 3532 /* 3533 * The clock filter contents consist of eight tuples (offset, 3534 * delay, dispersion, time). Shift each tuple to the left, 3535 * discarding the leftmost one. As each tuple is shifted, 3536 * increase the dispersion since the last filter update. At the 3537 * same time, copy each tuple to a temporary list. After this, 3538 * place the (offset, delay, disp, time) in the vacated 3539 * rightmost tuple. 3540 */ 3541 for (i = 1; i < NSTAGE; i++) { 3542 p->f[i] = p->f[i - 1]; 3543 p->f[i].disp += PHI * (c.t - p->t); 3544 f[i] = p->f[i]; 3545 } 3546 p->f[0].t = c.t; 3547 p->f[0].offset = offset; 3548 p->f[0].delay = delay; 3549 p->f[0].disp = disp; 3550 f[0] = p->f[0]; 3552 /* 3553 * Sort the temporary list of tuples by increasing f[].delay. 3554 * The first entry on the sorted list represents the best 3555 * sample, but it might be old. 3556 */ 3557 dtemp = p->offset; 3558 p->offset = f[0].offset; 3559 p->delay = f[0].delay; 3560 for (i = 0; i < NSTAGE; i++) { 3561 p->disp += f[i].disp / (2 ^ (i + 1)); 3562 p->jitter += SQUARE(f[i].offset - f[0].offset); 3563 } 3564 p->jitter = max(SQRT(p->jitter), LOG2D(s.precision)); 3566 /* 3567 * Prime directive: use a sample only once and never a sample 3568 * older than the latest one, but anything goes before first 3569 * synchronized. 3570 */ 3571 if (f[0].t - p->t <= 0 && s.leap != NOSYNC) 3572 return; 3574 /* 3575 * Popcorn spike suppressor. Compare the difference between the 3576 * last and current offsets to the current jitter. If greater 3577 * than SGATE (3) and if the interval since the last offset is 3578 * less than twice the system poll interval, dump the spike. 3579 * Otherwise, and if not in a burst, shake out the truechimers. 3580 */ 3581 if (fabs(p->offset - dtemp) > SGATE * p->jitter && (f[0].t - 3582 p->t) < 2 * s.poll) 3583 return; 3585 p->t = f[0].t; 3586 if (p->burst == 0) 3587 clock_select(); 3588 return; 3589 } 3591 /* 3592 * root_dist() - calculate root distance 3593 */ 3594 double 3595 root_dist( 3596 struct p *p /* peer structure pointer */ 3597 ) 3598 { 3599 /* 3600 * The root synchronization distance is the maximum error due to 3601 * all causes of the local clock relative to the primary server. 3602 * It is defined as half the total delay plus total dispersion 3603 * plus peer jitter. 3604 */ 3605 return (max(MINDISP, p->rootdelay + p->delay) / 2 + 3606 p->rootdisp + p->disp + PHI * (c.t - p->t) + p->jitter); 3607 } 3609 /* 3610 * fit() - test if association p is acceptable for synchronization 3611 */ 3612 int 3613 fit( 3614 struct p *p /* peer structure pointer */ 3615 ) 3616 { 3617 /* 3618 * A stratum error occurs if (1) the server has never been 3619 * synchronized, (2) the server stratum is invalid. 3620 */ 3621 if (p->leap == NOSYNC || p->stratum >= MAXSTRAT) 3622 return (FALSE); 3624 /* 3625 * A distance error occurs if the root distance exceeds the 3626 * distance threshold plus an increment equal to one poll 3627 * interval. 3628 */ 3629 if (root_dist(p) > MAXDIST + PHI * LOG2D(s.poll)) 3630 return (FALSE); 3632 /* 3633 * A loop error occurs if the remote peer is synchronized to the 3634 * local peer or the remote peer is synchronized to the current 3635 * system peer. Note this is the behavior for IPv4; for IPv6 the 3636 * MD5 hash is used instead. 3637 */ 3638 if (p->refid == p->dstaddr || p->refid == s.refid) 3639 return (FALSE); 3641 /* 3642 * An unreachable error occurs if the server is unreachable. 3643 */ 3644 if (p->reach == 0) 3645 return (FALSE); 3647 return (TRUE); 3648 } 3650 /* 3651 * clear() - reinitialize for persistent association, demobilize 3652 * for ephemeral association. 3653 */ 3654 void 3655 clear( 3656 struct p *p, /* peer structure pointer */ 3657 int kiss /* kiss code */ 3658 ) 3659 { 3660 int i; 3662 /* 3663 * The first thing to do is return all resources to the bank. 3664 * Typical resources are not detailed here, but they include 3665 * dynamically allocated structures for keys, certificates, etc. 3666 * If an ephemeral association and not initialization, return 3667 * the association memory as well. 3668 */ 3669 /* return resources */ 3670 if (s.p == p) 3671 s.p = NULL; 3672 if (kiss != X_INIT && (p->flags & P_EPHEM)) { 3673 free(p); 3674 return; 3675 } 3677 /* 3678 * Initialize the association fields for general reset. 3679 */ 3680 memset(BEGIN_CLEAR(p), LEN_CLEAR, 0); 3681 p->leap = NOSYNC; 3682 p->stratum = MAXSTRAT; 3683 p->ppoll = MAXPOLL; 3684 p->hpoll = MINPOLL; 3685 p->disp = MAXDISP; 3686 p->jitter = LOG2D(s.precision); 3687 p->refid = kiss; 3688 for (i = 0; i < NSTAGE; i++) 3689 p->f[i].disp = MAXDISP; 3691 /* 3692 * Randomize the first poll just in case thousands of broadcast 3693 * clients have just been stirred up after a long absence of the 3694 * broadcast server. 3695 */ 3696 p->outdate = p->t = c.t; 3697 p->nextdate = p->outdate + (random() & ((1 << MINPOLL) - 1)); 3698 } 3700 A.5.3. fast_xmit() 3702 /* 3703 * fast_xmit() - transmit a reply packet for receive packet r 3704 */ 3705 void 3706 fast_xmit( 3707 struct r *r, /* receive packet pointer */ 3708 int mode, /* association mode */ 3709 int auth /* authentication code */ 3710 ) 3711 { 3712 struct x x; 3714 /* 3715 * Initialize header and transmit timestamp. Note that the 3716 * transmit version is copied from the receive version. This is 3717 * for backward compatibility. 3718 */ 3719 x.version = r->version; 3720 x.srcaddr = r->dstaddr; 3721 x.dstaddr = r->srcaddr; 3722 x.leap = s.leap; 3723 x.mode = mode; 3724 if (s.stratum == MAXSTRAT) 3725 x.stratum = 0; 3726 else 3727 x.stratum = s.stratum; 3728 x.poll = r->poll; 3729 x.precision = s.precision; 3730 x.rootdelay = D2FP(s.rootdelay); 3731 x.rootdisp = D2FP(s.rootdisp); 3732 x.refid = s.refid; 3733 x.reftime = s.reftime; 3734 x.org = r->xmt; 3735 x.rec = r->dst; 3736 x.xmt = get_time(); 3738 /* 3739 * If the authentication code is A.NONE, include only the 3740 * header; if A.CRYPTO, send a crypto-NAK; if A.OK, send a valid 3741 * MAC. Use the key ID in the received packet and the key in the 3742 * local key cache. 3743 */ 3744 if (auth != A_NONE) { 3745 if (auth == A_CRYPTO) { 3746 x.keyid = 0; 3747 } else { 3748 x.keyid = r->keyid; 3749 x.mac = md5(x.keyid); 3750 } 3751 } 3752 xmit_packet(&x); 3753 } 3754 A.5.4. access() 3756 /* 3757 * access() - determine access restrictions 3758 */ 3759 int 3760 access( 3761 struct r *r /* receive packet pointer */ 3762 ) 3763 { 3764 /* 3765 * The access control list is an ordered set of tuples 3766 * consisting of an address, mask and restrict word containing 3767 * defined bits. The list is searched for the first match on the 3768 * source address (r->srcaddr) and the associated restrict word 3769 * is returned. 3770 */ 3771 return (/* access bits */ 0); 3772 } 3774 A.5.5. System Process 3776 A.5.5.1. clock_select() 3778 /* 3779 * clock_select() - find the best clocks 3780 */ 3781 void 3782 clock_select() { 3783 struct p *p, *osys; /* peer structure pointers */ 3784 double low, high; /* correctness interval extents */ 3785 int allow, found, chime; /* used by intersecion algorithm */ 3786 int n, i, j; 3788 /* 3789 * We first cull the falsetickers from the server population, 3790 * leaving only the truechimers. The correctness interval for 3791 * association p is the interval from offset - root_dist() to 3792 * offset + root_dist(). The object of the game is to find a 3793 * majority clique; that is, an intersection of correctness 3794 * intervals numbering more than half the server population. 3795 * 3796 * First construct the chime list of tuples (p, type, edge) as 3797 * shown below, then sort the list by edge from lowest to 3798 * highest. 3799 */ 3800 osys = s.p; 3801 s.p = NULL; 3802 n = 0; 3803 while (fit(p)) { 3804 s.m[n].p = p; 3805 s.m[n].type = +1; 3806 s.m[n].edge = p->offset + root_dist(p); 3807 n++; 3808 s.m[n].p = p; 3809 s.m[n].type = 0; 3810 s.m[n].edge = p->offset; 3811 n++; 3812 s.m[n].p = p; 3813 s.m[n].type = -1; 3814 s.m[n].edge = p->offset - root_dist(p); 3815 n++; 3816 } 3818 /* 3819 * Find the largest contiguous intersection of correctness 3820 * intervals. Allow is the number of allowed falsetickers; found 3821 * is the number of midpoints. Note that the edge values are 3822 * limited to the range +-(2 ^ 30) < +-2e9 by the timestamp 3823 * calculations. 3824 */ 3825 low = 2e9; high = -2e9; 3826 for (allow = 0; 2 * allow < n; allow++) { 3828 /* 3829 * Scan the chime list from lowest to highest to find 3830 * the lower endpoint. 3831 */ 3832 found = 0; 3833 chime = 0; 3834 for (i = 0; i < n; i++) { 3835 chime -= s.m[i].type; 3836 if (chime >= n - found) { 3837 low = s.m[i].edge; 3838 break; 3839 } 3840 if (s.m[i].type == 0) 3841 found++; 3842 } 3844 /* 3845 * Scan the chime list from highest to lowest to find 3846 * the upper endpoint. 3847 */ 3848 chime = 0; 3849 for (i = n - 1; i >= 0; i--) { 3850 chime += s.m[i].type; 3851 if (chime >= n - found) { 3852 high = s.m[i].edge; 3853 break; 3854 } 3855 if (s.m[i].type == 0) 3856 found++; 3857 } 3859 /* 3860 * If the number of midpoints is greater than the number 3861 * of allowed falsetickers, the intersection contains at 3862 * least one truechimer with no midpoint. If so, 3863 * increment the number of allowed falsetickers and go 3864 * around again. If not and the intersection is 3865 * nonempty, declare success. 3866 */ 3867 if (found > allow) 3868 continue; 3870 if (high > low) 3871 break; 3872 } 3874 /* 3875 * Clustering algorithm. Construct a list of survivors (p, 3876 * metric) from the chime list, where metric is dominated first 3877 * by stratum and then by root distance. All other things being 3878 * equal, this is the order of preference. 3879 */ 3880 s.n = 0; 3881 for (i = 0; i < n; i++) { 3882 if (s.m[i].edge < low || s.m[i].edge > high) 3883 continue; 3885 p = s.m[i].p; 3886 s.v[n].p = p; 3887 s.v[n].metric = MAXDIST * p->stratum + root_dist(p); 3888 s.n++; 3889 } 3891 /* 3892 * There must be at least NSANE survivors to satisfy the 3893 * correctness assertions. Ordinarily, the Byzantine criteria 3894 * require four, survivors, but for the demonstration here, one 3895 * is acceptable. 3896 */ 3897 if (s.n < NSANE) 3898 return; 3900 /* 3901 * For each association p in turn, calculate the selection 3902 * jitter p->sjitter as the square root of the sum of squares 3903 * (p->offset - q->offset) over all q associations. The idea is 3904 * to repeatedly discard the survivor with maximum selection 3905 * jitter until a termination condition is met. 3906 */ 3907 while (1) { 3908 struct p *p, *q, *qmax; /* peer structure pointers */ 3909 double max, min, dtemp; 3911 max = -2e9; min = 2e9; 3912 for (i = 0; i < s.n; i++) { 3913 p = s.v[i].p; 3914 if (p->jitter < min) 3915 min = p->jitter; 3916 dtemp = 0; 3917 for (j = 0; j < n; j++) { 3918 q = s.v[j].p; 3919 dtemp += SQUARE(p->offset - q->offset); 3920 } 3921 dtemp = SQRT(dtemp); 3922 if (dtemp > max) { 3923 max = dtemp; 3924 qmax = q; 3925 } 3926 } 3928 /* 3929 * If the maximum selection jitter is less than the 3930 * minimum peer jitter, then tossing out more survivors 3931 * will not lower the minimum peer jitter, so we might 3932 * as well stop. To make sure a few survivors are left 3933 * for the clustering algorithm to chew on, we also stop 3934 * if the number of survivors is less than or equal to 3935 * NMIN (3). 3936 */ 3937 if (max < min || n <= NMIN) 3938 break; 3940 /* 3941 * Delete survivor qmax from the list and go around 3942 * again. 3943 */ 3944 s.n--; 3945 } 3946 /* 3947 * Pick the best clock. If the old system peer is on the list 3948 * and at the same stratum as the first survivor on the list, 3949 * then don't do a clock hop. Otherwise, select the first 3950 * survivor on the list as the new system peer. 3951 */ 3952 if (osys->stratum == s.v[0].p->stratum) 3953 s.p = osys; 3954 else 3955 s.p = s.v[0].p; 3956 clock_update(s.p); 3957 } 3959 A.5.5.2. root_dist() 3961 /* 3962 * root_dist() - calculate root distance 3963 */ 3964 double 3965 root_dist( 3966 struct p *p /* peer structure pointer */ 3967 ) 3968 { 3970 /* 3971 * The root synchronization distance is the maximum error due to 3972 * all causes of the local clock relative to the primary server. 3973 * It is defined as half the total delay plus total dispersion 3974 * plus peer jitter. 3975 */ 3976 return (max(MINDISP, p->rootdelay + p->delay) / 2 + 3977 p->rootdisp + p->disp + PHI * (c.t - p->t) + p->jitter); 3978 } 3979 A.5.5.3. accept() 3981 /* 3982 * accept() - test if association p is acceptable for synchronization 3983 */ 3984 int 3985 accept( 3986 struct p *p /* peer structure pointer */ 3987 ) 3988 { 3989 /* 3990 * A stratum error occurs if (1) the server has never been 3991 * synchronized, (2) the server stratum is invalid. 3992 */ 3993 if (p->leap == NOSYNC || p->stratum >= MAXSTRAT) 3994 return (FALSE); 3996 /* 3997 * A distance error occurs if the root distance exceeds the 3998 * distance threshold plus an increment equal to one poll 3999 * interval. 4000 */ 4001 if (root_dist(p) > MAXDIST + PHI * LOG2D(s.poll)) 4002 return (FALSE); 4004 /* 4005 * A loop error occurs if the remote peer is synchronized to the 4006 * local peer or the remote peer is synchronized to the current 4007 * system peer. Note this is the behavior for IPv4; for IPv6 the 4008 * MD5 hash is used instead. 4009 */ 4010 if (p->refid == p->dstaddr || p->refid == s.refid) 4011 return (FALSE); 4013 /* 4014 * An unreachable error occurs if the server is unreachable. 4015 */ 4016 if (p->reach == 0) 4017 return (FALSE); 4019 return (TRUE); 4020 } 4022 A.5.5.4. clock_update() 4024 /* 4025 * clock_update() - update the system clock 4026 */ 4028 void 4029 clock_update( 4030 struct p *p /* peer structure pointer */ 4031 ) 4032 { 4033 double dtemp; 4035 /* 4036 * If this is an old update, for instance as the result of a 4037 * system peer change, avoid it. We never use an old sample or 4038 * the same sample twice. 4039 * 4040 if (s.t >= p->t) 4041 return; 4043 /* 4044 * Combine the survivor offsets and update the system clock; the 4045 * local_clock() routine will tell us the good or bad news. 4046 */ 4047 s.t = p->t; 4048 clock_combine(); 4049 switch (local_clock(p, s.offset)) { 4051 /* 4052 * The offset is too large and probably bogus. Complain to the 4053 * system log and order the operator to set the clock manually 4054 * within PANIC range. The reference implementation includes a 4055 * command line option to disable this check and to change the 4056 * panic threshold from the default 1000 s as required. 4057 */ 4058 case PANIC: 4059 exit (0); 4061 /* 4062 * The offset is more than the step threshold (0.125 s by 4063 * default). After a step, all associations now have 4064 * inconsistent time valurs, so they are reset and started 4065 * fresh. The step threshold can be changed in the reference 4066 * implementation in order to lessen the chance the clock might 4067 * be stepped backwards. However, there may be serious 4068 * consequences, as noted in the white papers at the NTP project 4069 * site. 4070 */ 4071 case STEP: 4072 while (/* all associations */ 0) 4073 clear(p, X_STEP); 4074 s.stratum = MAXSTRAT; 4075 s.poll = MINPOLL; 4076 break; 4078 /* 4079 * The offset was less than the step threshold, which is the 4080 * normal case. Update the system variables from the peer 4081 * variables. The lower clamp on the dispersion increase is to 4082 * avoid timing loops and clockhopping when highly precise 4083 * sources are in play. The clamp can be changed from the 4084 * default .01 s in the reference implementation. 4085 */ 4086 case SLEW: 4087 s.leap = p->leap; 4088 s.stratum = p->stratum + 1; 4089 s.refid = p->refid; 4090 s.reftime = p->reftime; 4091 s.rootdelay = p->rootdelay + p->delay; 4092 dtemp = SQRT(SQUARE(p->jitter) + SQUARE(s.jitter)); 4093 dtemp += max(p->disp + PHI * (c.t - p->t) + 4094 fabs(p->offset), MINDISP); 4095 s.rootdisp = p->rootdisp + dtemp; 4096 break; 4098 /* 4099 * Some samples are discarded while, for instance, a direct 4100 * frequency measurement is being made. 4101 */ 4102 case IGNORE: 4103 break; 4104 } 4105 } 4106 A.5.5.5. clock_combine() 4108 /* 4109 * clock_combine() - combine offsets 4110 */ 4111 void 4112 clock_combine() 4113 { 4114 struct p *p; /* peer structure pointer */ 4115 double x, y, z, w; 4116 int i; 4118 /* 4119 * Combine the offsets of the clustering algorithm survivors 4120 * using a weighted average with weight determined by the root 4121 * distance. Compute the selection jitter as the weighted RMS 4122 * difference between the first survivor and the remaining 4123 * survivors. In some cases the inherent clock jitter can be 4124 * reduced by not using this algorithm, especially when frequent 4125 * clockhopping is involved. The reference implementation can be 4126 * configured to avoid this algorithm by designating a preferred 4127 * peer. 4128 */ 4129 y = z = w = 0; 4130 for (i = 0; s.v[i].p != NULL; i++) { 4131 p = s.v[i].p; 4132 x = root_dist(p); 4133 y += 1 / x; 4134 z += p->offset / x; 4135 w += SQUARE(p->offset - s.v[0].p->offset) / x; 4136 } 4137 s.offset = z / y; 4138 s.jitter = SQRT(w / y); 4139 } 4141 A.5.5.6. local_clock() 4143 /* 4144 * Clock discipoine parameters and constants 4145 */ 4146 #define STEPT .128 /* step threshold (s) */ 4147 #define WATCH 900 /* stepout threshold (s) */ 4148 #define PANICT 1000 /* panic threshold (s) */ 4149 #define PLL 65536 /* PLL loop gain */ 4150 #define FLL MAXPOLL + 1 /* FLL loop gain */ 4151 #define AVG 4 /* parameter averaging constant */ 4152 #define ALLAN 1500 /* compromise Allan intercept (s) */ 4153 #define LIMIT 30 /* poll-adjust threshold */ 4154 #define MAXFREQ 500e-6 /* frequency tolerance (500 PPM) */ 4155 #define PGATE 4 /* poll-adjust gate */ 4157 /* 4158 * local_clock() - discipline the local clock 4159 */ 4160 int /* return code */ 4161 local_clock( 4162 struct p *p, /* peer structure pointer */ 4163 double offset /* clock offset from combine() */ 4164 ) 4165 { 4166 int state; /* clock discipline state */ 4167 double freq; /* frequency */ 4168 double mu; /* interval since last update */ 4169 int rval; 4170 double etemp, dtemp; 4172 /* 4173 * If the offset is too large, give up and go home. 4174 */ 4175 if (fabs(offset) > PANICT) 4176 return (PANIC); 4178 /* 4179 * Clock state machine transition function. This is where the 4180 * action is and defines how the system reacts to large time 4181 * and frequency errors. There are two main regimes: when the 4182 * offset exceeds the step threshold and when it does not. 4183 */ 4184 rval = SLEW; 4185 mu = p->t - s.t; 4186 freq = 0; 4187 if (fabs(offset) > STEPT) { 4188 switch (c.state) { 4190 /* 4191 * In S_SYNC state we ignore the first outlyer amd 4192 * switch to S_SPIK state. 4193 */ 4194 case SYNC: 4195 state = SPIK; 4196 return (rval); 4198 /* 4199 * In S_FREQ state we ignore outlyers and inlyers. At 4200 * the first outlyer after the stepout threshold, 4201 * compute the apparent frequency correction and step 4202 * the time. 4203 */ 4204 case FREQ: 4205 if (mu < WATCH) 4206 return (IGNORE); 4208 freq = (offset - c.offset) / mu; 4209 /* fall through to S_SPIK */ 4211 /* 4212 * In S_SPIK state we ignore succeeding outlyers until 4213 * either an inlyer is found or the stepout threshold is 4214 * exceeded. 4215 */ 4216 case SPIK: 4217 if (mu < WATCH) 4218 return (IGNORE); 4220 /* fall through to default */ 4222 /* 4223 * We get here by default in S_NSET and S_FSET states 4224 * and from above in S_FREQ state. Step the time and 4225 * clamp down the poll interval. 4226 * 4227 * In S_NSET state an initial frequency correction is 4228 * not available, usually because the frequency file has 4229 * not yet been written. Since the time is outside the 4230 * capture range, the clock is stepped. The frequency 4231 * will be set directly following the stepout interval. 4232 * 4233 * In S_FSET state the initial frequency has been set 4234 * from the frequency file. Since the time is outside 4235 * the capture range, the clock is stepped immediately, 4236 * rather than after the stepout interval. Guys get 4237 * nervous if it takes 17 minutes to set the clock for 4238 * the first time. 4239 * 4240 * In S_SPIK state the stepout threshold has expired and 4241 * the phase is still above the step threshold. Note 4242 * that a single spike greater than the step threshold 4243 * is always suppressed, even at the longer poll 4244 * intervals. 4245 */ 4246 default: 4248 /* 4249 * This is the kernel set time function, usually 4250 * implemented by the Unix settimeofday() system 4251 * call. 4252 */ 4253 step_time(offset); 4254 c.count = 0; 4255 s.poll = MINPOLL; 4256 rval = STEP; 4257 if (state == NSET) { 4258 rstclock(FREQ, p->t, 0); 4259 return (rval); 4260 } 4261 break; 4262 } 4263 rstclock(SYNC, p->t, 0); 4264 } else { 4266 /* 4267 * Compute the clock jitter as the RMS of exponentially 4268 * weighted offset differences. This is used by the 4269 * poll-adjust code. 4270 */ 4271 etemp = SQUARE(c.jitter); 4272 dtemp = SQUARE(max(fabs(offset - c.last), 4273 LOG2D(s.precision))); 4274 c.jitter = SQRT(etemp + (dtemp - etemp) / AVG); 4275 switch (c.state) { 4277 /* 4278 * In S_NSET state this is the first update received and 4279 * the frequency has not been initialized. The first 4280 * thing to do is directly measure the oscillator 4281 * frequency. 4282 */ 4283 case NSET: 4284 rstclock(FREQ, p->t, offset); 4285 return (IGNORE); 4287 /* 4288 * In S_FSET state this is the first update and the 4289 * frequency has been initialized. Adjust the phase, but 4290 * don't adjust the frequency until the next update. 4291 */ 4292 case FSET: 4293 rstclock(SYNC, p->t, offset); 4294 break; 4296 /* 4297 * In S_FREQ state ignore updates until the stepout 4298 * threshold. After that, correct the phase and 4299 * frequency and switch to S_SYNC state. 4300 */ 4301 case FREQ: 4302 if (c.t - s.t < WATCH) 4303 return (IGNORE); 4305 freq = (offset - c.offset) / mu; 4306 break; 4308 /* 4309 * We get here by default in S_SYNC and S_SPIK states. 4310 * Here we compute the frequency update due to PLL and 4311 * FLL contributions. 4312 */ 4313 default: 4315 /* 4316 * The FLL and PLL frequency gain constants 4317 * depend on the poll interval and Allan 4318 * intercept. The FLL is not used below one-half 4319 * the Allan intercept. Above that the loop gain 4320 * increases in steps to 1 / AVG. 4321 */ 4322 if (LOG2D(s.poll) > ALLAN / 2) { 4323 etemp = FLL - s.poll; 4324 if (etemp < AVG) 4325 etemp = AVG; 4326 freq += (offset - c.offset) / (max(mu, 4327 ALLAN) * etemp); 4328 } 4330 /* 4331 * For the PLL the integration interval 4332 * (numerator) is the minimum of the update 4333 * interval and poll interval. This allows 4334 * oversampling, but not undersampling. 4335 */ 4336 etemp = min(mu, LOG2D(s.poll)); 4337 dtemp = 4 * PLL * LOG2D(s.poll); 4338 freq += offset * etemp / (dtemp * dtemp); 4339 rstclock(SYNC, p->t, offset); 4340 break; 4341 } 4342 } 4344 /* 4345 * Calculate the new frequency and frequency stability (wander). 4347 * Compute the clock wander as the RMS of exponentially weighted 4348 * frequency differences. This is not used directly, but can, 4349 * along withthe jitter, be a highly useful monitoring and 4350 * debugging tool 4351 */ 4352 freq += c.freq; 4353 c.freq = max(min(MAXFREQ, freq), -MAXFREQ); 4354 etemp = SQUARE(c.wander); 4355 dtemp = SQUARE(freq); 4356 c.wander = SQRT(etemp + (dtemp - etemp) / AVG); 4358 /* 4359 * Here we adjust the poll interval by comparing the current 4360 * offset with the clock jitter. If the offset is less than the 4361 * clock jitter times a constant, then the averaging interval is 4362 * increased, otherwise it is decreased. A bit of hysteresis 4363 * helps calm the dance. Works best using burst mode. 4364 */ 4365 if (fabs(c.offset) < PGATE * c.jitter) { 4366 c.count += s.poll; 4367 if (c.count > LIMIT) { 4368 c.count = LIMIT; 4369 if (s.poll < MAXPOLL) { 4370 c.count = 0; 4371 s.poll++; 4372 } 4373 } 4374 } else { 4375 c.count -= s.poll << 1; 4376 if (c.count < -LIMIT) { 4377 c.count = -LIMIT; 4378 if (s.poll > MINPOLL) { 4379 c.count = 0; 4380 s.poll--; 4381 } 4382 } 4383 } 4384 return (rval); 4385 } 4386 A.5.5.7. rstclock() 4388 /* 4389 * rstclock() - clock state machine 4390 */ 4391 void 4392 rstclock( 4393 int state, /* new state */ 4394 double offset, /* new offset */ 4395 double t /* new update time */ 4396 ) 4397 { 4398 /* 4399 * Enter new state and set state variables. Note we use the time 4400 * of the last clock filter sample, which must be earlier than 4401 * the current time. 4402 */ 4403 c.state = state; 4404 c.last = c.offset = offset; 4405 s.t = t; 4406 } 4408 A.5.6. Clock Adjust Process 4410 A.5.6.1. clock_adjust() 4412 /* 4413 * clock_adjust() - runs at one-second intervals 4414 */ 4415 void 4416 clock_adjust() { 4417 double dtemp; 4419 /* 4420 * Update the process time c.t. Also increase the dispersion 4421 * since the last update. In contrast to NTPv3, NTPv4 does not 4422 * declare unsynchronized after one day, since the dispersion 4423 * threshold serves this function. When the dispersion exceeds 4424 * MAXDIST (1 s), the server is considered unfit for 4425 * synchroniztion. 4426 */ 4427 c.t++; 4428 s.rootdisp += PHI; 4430 /* 4431 * Implement the phase and frequency adjustments. The gain 4432 * factor (denominator) is not allowed to increase beyond the 4433 * Allan intercept. It doesn't make sense to average phase noise 4434 * beyond this point and it helps to damp residual offset at the 4435 * longer poll intervals. 4436 */ 4437 dtemp = c.offset / (PLL * min(LOG2D(s.poll), ALLAN)); 4438 c.offset -= dtemp; 4440 /* 4441 * This is the kernel adjust time function, usually implemented 4442 * by the Unix adjtime() system call. 4443 */ 4444 adjust_time(c.freq + dtemp); 4446 /* 4447 * Peer timer. Call the poll() routine when the poll timer 4448 * expires. 4449 */ 4450 while (/* all associations */ 0) { 4451 struct p *p; /* dummy peer structure pointer */ 4453 if (c.t >= p->nextdate) 4454 poll(p); 4455 } 4457 /* 4458 * Once per hour write the clock frequency to a file 4459 */ 4460 if (c.t % 3600 == 3599) 4461 /* write c.freq to file */ 0; 4462 } 4464 A.5.7. Poll Process 4466 /* 4467 * Poll process parameters and constants 4468 */ 4469 #define UNREACH 12 /* unreach counter threshold */ 4470 #define BCOUNT 8 /* packets in a burst */ 4471 #define BTIME 2 /* burst interval (s) */ 4473 A.5.7.1. poll() 4475 /* 4476 * poll() - determine when to send a packet for association p-> 4477 */ 4478 void 4479 poll( 4480 struct p *p /* peer structure pointer */ 4481 ) 4483 { 4484 int hpoll; 4485 int oreach; 4487 /* 4488 * This routine is called when the current time c.t catches up 4489 * to the next poll time p->nextdate. The value p->outdate is 4490 * the last time this routine was executed. The poll_update() 4491 * routine determines the next execution time p->nextdate. 4492 * 4493 * If broadcasting, just do it, but only if we are synchronized. 4494 */ 4495 hpoll = p->hpoll; 4496 if (p->hmode == M_BCST) { 4497 p->outdate = c.t; 4498 if (s.p != NULL) 4499 peer_xmit(p); 4500 poll_update(p, hpoll); 4501 return; 4502 } 4504 /* 4505 * If manycasting, start with ttl = 1. The ttl is increased by 4506 * one for each poll until MAXCLOCK servers have been found or 4507 * ttl reaches TTLMAX. If reaching MAXCLOCK, stop polling until 4508 * the number of servers falls below MINCLOCK, then start all 4509 * over. 4510 */ 4511 if (p->hmode == M_CLNT && p->flags & P_MANY) { 4512 p->outdate = c.t; 4513 if (p->unreach > BEACON) { 4514 p->unreach = 0; 4515 p->ttl = 1; 4516 peer_xmit(p); 4517 } else if (s.n < MINCLOCK) { 4518 if (p->ttl < TTLMAX) 4519 p->ttl++; 4520 peer_xmit(p); 4521 } 4522 p->unreach++; 4523 poll_update(p, hpoll); 4524 return; 4525 } 4526 if (p->burst == 0) { 4528 /* 4529 * We are not in a burst. Shift the reachability 4530 * register to the left. Hopefully, some time before the 4531 * next poll a packet will arrive and set the rightmost 4532 * bit. 4533 */ 4534 oreach = p->reach; 4535 p->outdate = c.t; 4536 p->reach << 1; 4537 if (!(p->reach & 0x7)) 4538 clock_filter(p, 0, 0, MAXDISP); 4539 if (!p->reach) { 4541 /* 4542 * The server is unreachable, so bump the 4543 * unreach counter. If the unreach threshold has 4544 * been reached, double the poll interval to 4545 * minimize wasted network traffic. Send a burst 4546 * only if enabled and the unreach threshold has 4547 * not been reached. 4548 */ 4549 if (p->flags & P_IBURST && p->unreach == 0) { 4550 p->burst = BCOUNT; 4551 } else if (p->unreach < UNREACH) 4552 p->unreach++; 4553 else 4554 hpoll++; 4555 p->unreach++; 4556 } else { 4558 /* 4559 * The server is reachable. Set the poll 4560 * interval to the system poll interval. Send a 4561 * burst only if enabled and the peer is fit. 4562 */ 4563 p->unreach = 0; 4564 hpoll = s.poll; 4565 if (p->flags & P_BURST && fit(p)) 4566 p->burst = BCOUNT; 4567 } 4568 } else { 4570 /* 4571 * If in a burst, count it down. When the reply comes 4572 * back the clock_filter() routine will call 4573 * clock_select() to process the results of the burst. 4574 */ 4575 p->burst--; 4576 } 4578 /* 4579 * Do not transmit if in broadcast client mode. 4580 */ 4581 if (p->hmode != M_BCLN) 4582 peer_xmit(p); 4583 poll_update(p, hpoll); 4584 } 4586 A.5.7.2. poll_update() 4587 /* 4588 * poll_update() - update the poll interval for association p 4589 * 4590 * Note: This routine is called by both the packet() and poll() routine. 4591 * Since the packet() routine is executed when a network packet arrives 4592 * and the poll() routine is executed as the result of timeout, a 4593 * potential race can occur, possibly causing an incorrect interval for 4594 * the next poll. This is considered so unlikely as to be negligible. 4595 */ 4596 void 4597 poll_update( 4598 struct p *p, /* peer structure pointer */ 4599 int poll /* poll interval (log2 s) */ 4600 ) 4601 { 4602 /* 4603 * This routine is called by both the poll() and packet() 4604 * routines to determine the next poll time. If within a burst 4605 * the poll interval is two seconds. Otherwise, it is the 4606 * minimum of the host poll interval and peer poll interval, but 4607 * not greater than MAXPOLL and not less than MINPOLL. The 4608 * design insures that a longer interval can be preempted by a 4609 * shorter one if required for rapid response. 4610 */ 4611 p->hpoll = max(min(MAXPOLL, poll), MINPOLL); 4612 if (p->burst > 0) { 4613 if (p->nextdate != c.t) 4614 return; 4615 else 4616 p->nextdate += BTIME; 4617 } else { 4619 /* 4620 * While not shown here, the reference implementation 4621 * randonizes the poll interval by a small factor. 4622 */ 4623 p->nextdate = p->outdate + (1 << max(min(p->ppoll, 4624 p->hpoll), MINPOLL)); 4625 } 4627 /* 4628 * It might happen that the due time has already passed. If so, 4629 * make it one second in the future. 4630 */ 4631 if (p->nextdate <= c.t) 4632 p->nextdate = c.t + 1; 4633 } 4634 A.5.7.3. peer_xmit() 4635 /* 4636 * transmit() - transmit a packet for association p 4637 */ 4638 void 4639 peer_xmit( 4640 struct p *p /* peer structure pointer */ 4641 ) 4642 { 4643 struct x x; /* transmit packet */ 4645 /* 4646 * Initialize header and transmit timestamp 4647 */ 4648 x.srcaddr = p->dstaddr; 4649 x.dstaddr = p->srcaddr; 4650 x.leap = s.leap; 4651 x.version = p->version; 4652 x.mode = p->hmode; 4653 if (s.stratum == MAXSTRAT) 4654 x.stratum = 0; 4655 else 4656 x.stratum = s.stratum; 4657 x.poll = p->hpoll; 4658 x.precision = s.precision; 4659 x.rootdelay = D2FP(s.rootdelay); 4660 x.rootdisp = D2FP(s.rootdisp); 4661 x.refid = s.refid; 4662 x.reftime = s.reftime; 4663 x.org = p->org; 4664 x.rec = p->rec; 4665 x.xmt = get_time(); 4666 p->xmt = x.xmt; 4668 /* 4669 * If the key ID is nonzero, send a valid MAC using the key ID 4670 * of the association and the key in the local key cache. If 4671 * something breaks, like a missing trusted key, don't send the 4672 * packet; just reset the association and stop until the problem 4673 * is fixed. 4674 */ 4675 if (p->keyid) 4676 if (/* p->keyid invalid */ 0) { 4677 clear(p, X_NKEY); 4678 return; 4679 } 4680 x.mac = md5(p->keyid); 4681 xmit_packet(&x); 4682 } 4683 Authors' Addresses 4685 Jack Burbank (editor) 4686 The Johns Hopkins University Applied Physics Laboratory 4687 11100 Johns Hopkins Road 4688 Laurel, MD 20723-6099 4689 US 4691 Phone: +1 443 778 7127 4692 Email: jack.burbank@jhuapl.edu 4694 William Kasch (editor) 4695 The Johns Hopkins University Applied Physics Laboratory 4696 11100 Johns Hopkins Road 4697 Laurel, MD 20723-6099 4698 US 4700 Phone: +1 443 778 7463 4701 Email: william.kasch@jhuapl.edu 4703 Jim Martin (editor) 4704 The Daedelus Group 4705 721 Alameda de las Pulgas 4706 Belmont, CA 94002 4707 US 4709 Phone: +1 408 799-2788 4710 Email: jim@daedelus.com 4712 Dr. David L. Mills 4713 University of Delaware 4714 Newark, DE 19716 4715 US 4717 Phone: +1 302 831 8247 4718 Email: mills@udel.edu 4720 Full Copyright Statement 4722 Copyright (C) The IETF Trust (2007). 4724 This document is subject to the rights, licenses and restrictions 4725 contained in BCP 78, and except as set forth therein, the authors 4726 retain all their rights. 4728 This document and the information contained herein are provided on an 4729 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 4730 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 4731 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 4732 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 4733 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 4734 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 4736 Intellectual Property 4738 The IETF takes no position regarding the validity or scope of any 4739 Intellectual Property Rights or other rights that might be claimed to 4740 pertain to the implementation or use of the technology described in 4741 this document or the extent to which any license under such rights 4742 might or might not be available; nor does it represent that it has 4743 made any independent effort to identify any such rights. Information 4744 on the procedures with respect to rights in RFC documents can be 4745 found in BCP 78 and BCP 79. 4747 Copies of IPR disclosures made to the IETF Secretariat and any 4748 assurances of licenses to be made available, or the result of an 4749 attempt made to obtain a general license or permission for the use of 4750 such proprietary rights by implementers or users of this 4751 specification can be obtained from the IETF on-line IPR repository at 4752 http://www.ietf.org/ipr. 4754 The IETF invites any interested party to bring to its attention any 4755 copyrights, patents or patent applications, or other proprietary 4756 rights that may cover technology that may be required to implement 4757 this standard. Please address the information to the IETF at 4758 ietf-ipr@ietf.org. 4760 Acknowledgment 4762 Funding for the RFC Editor function is provided by the IETF 4763 Administrative Support Activity (IASA).