idnits 2.17.1 draft-ietf-ntp-ntpv4-proto-07.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 19. -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on line 4753. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 4764. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 4771. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 4777. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 119 instances of weird spacing in the document. Is it really formatted ragged-right, rather than justified? == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the document. == There are 2 instances of lines with multicast IPv4 addresses in the document. If these are generic example addresses, they should be changed to use the 233.252.0.x range defined in RFC 5771 == The 'Obsoletes: ' line in the draft header should list only the _numbers_ of the RFCs which will be obsoleted by this document (if approved); it should not include the word 'RFC' in the list. -- The draft header indicates that this document obsoletes RFC1305, but the abstract doesn't seem to directly say this. It does mention RFC1305 though, so this could be OK. -- The draft header indicates that this document obsoletes RFC4330, but the abstract doesn't seem to mention this, which it should. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust Copyright Line does not match the current year == Line 2674 has weird spacing: '... ipaddr srcad...' == Line 2675 has weird spacing: '... ipaddr dstad...' == Line 2676 has weird spacing: '... char ver...' == Line 2677 has weird spacing: '... char lea...' == Line 2678 has weird spacing: '... char mod...' == (114 more instances...) -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (May 2007) is 6192 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: 'NSTAGE' on line 3547 -- Looks like a reference, but probably isn't: 'NMAX' on line 2819 == Missing Reference: '0' is mentioned on line 4154, but not defined -- Obsolete informational reference (is this intentional?): RFC 1305 (ref. '1') (Obsoleted by RFC 5905) -- Obsolete informational reference (is this intentional?): RFC 4330 (ref. '2') (Obsoleted by RFC 5905) -- Obsolete informational reference (is this intentional?): RFC 2434 (ref. '10') (Obsoleted by RFC 5226) ** Obsolete normative reference: RFC 793 (ref. '11') (Obsoleted by RFC 9293) ** Downref: Normative reference to an Informational RFC: RFC 1345 (ref. '15') ** Downref: Normative reference to an Informational RFC: RFC 1321 (ref. '16') Summary: 5 errors (**), 0 flaws (~~), 11 warnings (==), 15 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 NTP WG J. Burbank, Ed. 3 Internet-Draft W. Kasch, Ed. 4 Obsoletes: RFC 4330, RFC 1305 JHU/APL 5 (if approved) J. Martin, Ed. 6 Intended status: Standards Track Daedelus 7 Expires: November 2, 2007 D. Mills 8 U. Delaware 9 May 2007 11 Network Time Protocol Version 4 Protocol And Algorithms Specification 12 draft-ietf-ntp-ntpv4-proto-07 14 Status of this Memo 16 By submitting this Internet-Draft, each author represents that any 17 applicable patent or other IPR claims of which he or she is aware 18 have been or will be disclosed, and any of which he or she becomes 19 aware will be disclosed, in accordance with Section 6 of BCP 79. 21 Internet-Drafts are working documents of the Internet Engineering 22 Task Force (IETF), its areas, and its working groups. Note that 23 other groups may also distribute working documents as Internet- 24 Drafts. 26 Internet-Drafts are draft documents valid for a maximum of six months 27 and may be updated, replaced, or obsoleted by other documents at any 28 time. It is inappropriate to use Internet-Drafts as reference 29 material or to cite them other than as "work in progress." 31 The list of current Internet-Drafts can be accessed at 32 http://www.ietf.org/ietf/1id-abstracts.txt. 34 The list of Internet-Draft Shadow Directories can be accessed at 35 http://www.ietf.org/shadow.html. 37 This Internet-Draft will expire on November 2, 2007. 39 Copyright Notice 41 Copyright (C) The IETF Trust (2007). 43 Abstract 45 The Network Time Protocol (NTP) is widely used to synchronize 46 computer clocks in the Internet. This document describes NTP Version 47 4 (NTPv4), which is backwards compatible with NTP Version 3 (NTPv3) 48 described in RFC 1305, as well as previous versions of the protocol. 50 NTPv4 includes a modified protocol header to accommodate the Internet 51 Protocol Version 6 address family. NTPv4 includes fundamental 52 improvements in the mitigation and discipline algorithms which extend 53 the potential accuracy to the tens of microseconds with modern 54 workstations and fast LANs. It includes a dynamic server discovery 55 scheme, so that in many cases specific server configuration is not 56 required. It corrects certain errors in the NTPv3 design and 57 implementation and includes an optional extension mechanism. 59 Table of Contents 61 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 62 1.1. Requirements Notation . . . . . . . . . . . . . . . . . . 5 63 2. Modes of Operation . . . . . . . . . . . . . . . . . . . . . 5 64 3. Protocol Modes . . . . . . . . . . . . . . . . . . . . . . . 6 65 3.1. Dynamic Server Discovery . . . . . . . . . . . . . . . . 7 66 4. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 8 67 5. Implementation Model . . . . . . . . . . . . . . . . . . . . 10 68 6. Data Types . . . . . . . . . . . . . . . . . . . . . . . . . 12 69 7. Data Structures . . . . . . . . . . . . . . . . . . . . . . . 16 70 7.1. Structure Conventions . . . . . . . . . . . . . . . . . . 16 71 7.2. Global Parameters . . . . . . . . . . . . . . . . . . . . 16 72 7.3. Packet Header Variables . . . . . . . . . . . . . . . . . 17 73 7.4. The Kiss-o'-Death Packet . . . . . . . . . . . . . . . . 23 74 7.5. NTP Extension Field Format . . . . . . . . . . . . . . . 24 75 8. On Wire Protocol . . . . . . . . . . . . . . . . . . . . . . 25 76 9. Peer Process . . . . . . . . . . . . . . . . . . . . . . . . 29 77 9.1. Peer Process Variables . . . . . . . . . . . . . . . . . 30 78 9.2. Peer Process Operations . . . . . . . . . . . . . . . . . 32 79 10. Clock Filter Algorithm . . . . . . . . . . . . . . . . . . . 36 80 11. System Process . . . . . . . . . . . . . . . . . . . . . . . 38 81 11.1. System Process Variables . . . . . . . . . . . . . . . . 38 82 11.2. System Process Operations . . . . . . . . . . . . . . . . 39 83 11.2.1. Selection Algorithm . . . . . . . . . . . . . . . . 42 84 11.2.2. Cluster Algorithm . . . . . . . . . . . . . . . . . 43 85 11.2.3. Combine Algorithm . . . . . . . . . . . . . . . . . 44 86 11.3. Clock Discipline Algorithm . . . . . . . . . . . . . . . 46 87 12. Clock Adjust Process . . . . . . . . . . . . . . . . . . . . 50 88 13. Poll Process . . . . . . . . . . . . . . . . . . . . . . . . 50 89 13.1. Poll Process Variables . . . . . . . . . . . . . . . . . 50 90 13.2. Poll Process Operations . . . . . . . . . . . . . . . . . 51 91 14. Simple Network Time Protocol (SNTP) . . . . . . . . . . . . . 53 92 15. Security Considerations . . . . . . . . . . . . . . . . . . . 54 93 16. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 55 94 17. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 56 95 18. References . . . . . . . . . . . . . . . . . . . . . . . . . 56 96 18.1. Informative References . . . . . . . . . . . . . . . . . 56 97 18.2. Normative References . . . . . . . . . . . . . . . . . . 57 98 Appendix A. Code Skeleton . . . . . . . . . . . . . . . . . . . 57 99 A.1. Global Definitions . . . . . . . . . . . . . . . . . . . 58 100 A.1.1. Definitions, Constants, Parameters . . . . . . . . . 58 101 A.1.2. Packet Data Structures . . . . . . . . . . . . . . . 61 102 A.1.3. Association Data Structures . . . . . . . . . . . . 62 103 A.1.4. System Data Structures . . . . . . . . . . . . . . . 64 104 A.1.5. Local Clock Data Structures . . . . . . . . . . . . 65 105 A.1.6. Function Prototypes . . . . . . . . . . . . . . . . 65 106 A.2. Main Program and Utility Routines . . . . . . . . . . . . 66 107 A.3. Kernel Input/Output Interface . . . . . . . . . . . . . . 69 108 A.4. Kernel System Clock Interface . . . . . . . . . . . . . . 69 109 A.5. Peer Process . . . . . . . . . . . . . . . . . . . . . . 71 110 A.5.1. receive() . . . . . . . . . . . . . . . . . . . . . 72 111 A.5.2. clock_filter() . . . . . . . . . . . . . . . . . . . 79 112 A.5.3. fast_xmit() . . . . . . . . . . . . . . . . . . . . 83 113 A.5.4. access() . . . . . . . . . . . . . . . . . . . . . . 85 114 A.5.5. System Process . . . . . . . . . . . . . . . . . . . 85 115 A.5.6. Clock Adjust Process . . . . . . . . . . . . . . . . 99 116 A.5.7. Poll Process . . . . . . . . . . . . . . . . . . . . 100 117 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 107 118 Intellectual Property and Copyright Statements . . . . . . . . . 108 120 1. Introduction 122 This document defines the Network Time Protocol Version 4 (NTPv4), 123 which is widely used to synchronize system clocks among a set of 124 distributed time servers and clients. It describes the core 125 architecture, protocol, state machines, data structures and 126 algorithms. NTPv4 introduces new functionality to NTPv3, as 127 described in [1], and functionality expanded from SNTPv4 as described 128 in [2] (SNTPv4 is a subset of NTPv4). This document obsoletes [1], 129 and [2]. While certain minor changes have been made in some protocol 130 header fields, these do not affect the interoperability between NTPv4 131 and previous versions of NTP and SNTP. 133 The NTP subnet model includes a number of widely accessible primary 134 time servers synchronized by wire or radio to national standards. 135 The purpose of the NTP protocol is to convey timekeeping information 136 from these primary servers to secondary time servers and clients via 137 both private networks and the public Internet. Precisely tuned 138 algorithms mitigate errors that may result from network disruptions, 139 server failures and possible hostile actions. Servers and clients 140 are configured such that values flow towards clients from the primary 141 servers at the root via branching secondary servers. 143 The NTPv4 design overcomes significant shortcomings in the NTPv3 144 design, corrects certain bugs and incorporates new features. In 145 particular, expanded NTP timestamp definitions encourage the use of 146 the floating double data type throughout the implementation. As a 147 result, the time resolution is better than one nanosecond and 148 frequency resolution is less than one nanosecond per second. 149 Additional improvements include a new clock discipline algorithm 150 which is more responsive to system clock hardware frequency 151 fluctuations. Typical primary servers using modern machines are 152 precise within a few tens of microseconds. Typical secondary servers 153 and clients on fast LANs are within a few hundred microseconds with 154 poll intervals up to 1024 seconds, which was the maximum with NTPv3. 155 With NTPv4, servers and clients are precise within a few tens of 156 milliseconds with poll intervals up to 36 hours. 158 The main body of this document describes the core protocol and data 159 structures necessary to interoperate between conforming 160 implementations. Appendix A contains additional detail in the form 161 of a skeleton program, including data structures and code segments 162 for the core algorithms as well as the mitigation algorithms used to 163 enhance reliability and accuracy. While the skeleton program and 164 other descriptions in this document apply to a particular 165 implementation, they are not intended as the only way the required 166 functions can be implemented. While the NTPv3 symmetric key 167 authentication scheme described in this document has been carried 168 over from NTPv3, the Autokey public key authentication scheme new to 169 NTPv4 is described in [3]. 171 The NTP protocol includes modes of operation described in Section 2 172 using data types described in Section 6 and data structures described 173 in Section 7. The implementation model described in Section 5 is 174 based on a threaded, multi-process architecture, although other 175 architectures could be used as well. The on-wire protocol described 176 in Section 8 is based on a returnable-time design which depends only 177 on measured clock offsets, but does not require reliable message 178 delivery. Reliable message delivery such as TCP[11] can actually 179 make the delivered NTP packet less reliable since retries would 180 increase the delay value and other errors. The synchronization 181 subnet is a self-organizing, hierarchical, master-slave network with 182 synchronization paths determined by a shortest-path spanning tree and 183 defined metric. While multiple masters (primary servers) may exist, 184 there is no requirement for an election protocol. 186 This document includes material from [4], which contains flow charts 187 and equations unsuited for RFC format. There is much additional 188 information in [5], including an extensive technical analysis and 189 performance assessment of the protocol and algorithms in this 190 document. The reference implementation is available at www.ntp.org. 192 The remainder of this document contains numerous variables and 193 mathematical expressions. Some variables take the form of Greek 194 characters, which are spelled out by their full case-sensitive name. 195 For example DELTA refers to the uppercase Greek character, while 196 delta refers to the lowercase character. Furthermore, subscripts are 197 denoted with '_', for example theta_i refers to the lowercase Greek 198 character theta with subscript i, or phonetically theta sub i. In 199 this document all time values are in seconds (s), and all frequencies 200 will be specified as fractional frequency offsets (FFO) (pure 201 number). It is often convenient to express these FFOs in parts per 202 million (ppm). 204 1.1. Requirements Notation 206 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 207 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 208 document are to be interpreted as described in [12]. 210 2. Modes of Operation 212 An NTP implementation operates as a primary server, secondary server 213 or client. A primary server is synchronized to a reference clock 214 directly traceable to UTC (eg, GPS, Galileo, etc). A client 215 synchronizes to one or more upstream servers, but does not provide 216 synchronization to dependent clients. A secondary server has one or 217 more upstream servers and one or more downstream servers or clients. 218 All servers and clients who are fully NTPv4 compliant MUST implement 219 the entire suite of algorithms described in this document. In order 220 to maintain stability in large NTP subnets, secondary servers SHOULD 221 be fully NTPv4 compliant. Alternative algorithms MAY be used, but 222 their output MUST be identical to the algorithms described in this 223 specification, or they MUST be used exclusively in a private network. 225 3. Protocol Modes 227 There are three NTP protocol variants, symmetric, client/server and 228 broadcast. Each is associated with an association mode as shown in 229 Figure 1. In addition, persistent associations are mobilized upon 230 startup and are never demobilized. Ephemeral associations are 231 mobilized upon the arrival of a packet and are demobilized upon error 232 or timeout. 234 +-------------------+--------------+-------------+ 235 | Association Mode | Assoc. Mode | Packet Mode | 236 +-------------------+--------------+-------------+ 237 | Symmetric Active | 1 | 1 or 2 | 238 | Symmetric Passive | 2 | 1 | 239 | Client | 3 | 4 | 240 | Server | 4 | 3 | 241 | Broadcast Server | 5 | 5 | 242 | Broadcast Client | 6 | N/A | 243 +-------------------+--------------+-------------+ 245 Figure 1: Association and Packet Modes 247 In the client/server variant a persistent client sends client (mode 248 3) packets to a server, which returns server (mode 4) packets. 249 Servers provide synchronization to one or more clients, but do not 250 accept synchronization from them. A server can also be a reference 251 clock driver which obtains time directly from a standard source such 252 as a GPS receiver or telephone modem service. In this varient, 253 clients pull synchronization from servers. 255 In the symmetric variant a peer operates as both a server and client 256 using either a symmetric active or symmetric passive association. A 257 persistent symmetric active association sends symmetric active (mode 258 1) packets to a symmetric active peer association. Alternatively, an 259 ephemeral symmetric passive association can be mobilized upon arrival 260 of a symmetric active packet with no matching association. That 261 association sends symmetric passive (mode 2) packets and persists 262 until error or timeout. Peers both push and pull synchronization to 263 and from each other. For the purposes of this document, a peer 264 operates like a client, so references to client imply peer as well. 266 In the broadcast variant a persistent broadcast server association 267 sends periodic broadcast server (mode 5) packets which can be 268 received by multiple clients. Upon reception of a broadcast server 269 packet without a matching association, an ephemeral broadcast client 270 (mode 6) association is mobilized and persists until error or 271 timeout. It is useful to provide an initial volley where the client 272 operating in client mode exchanges several packets with the server, 273 so as to calibrate the propagation delay and to run the Autokey 274 security protocol, after which the client reverts to broadcast client 275 mode. A broadcast server pushes synchronization to clients and other 276 servers. 278 Following loosely the conventions established by the telephone 279 industry, the level of each server in the hierarchy is defined by a 280 stratum number. Primary servers are assigned stratum one; secondary 281 servers at each lower level are assigned stratum numbers one greater 282 than the preceding level. As the stratum number increases, its 283 accuracy degrades depending on the particular network path and system 284 clock stability. Mean errors, measured by synchronization distances, 285 increase approximately in proportion to stratum numbers and measured 286 roundtrip delay. 288 As a standard practice, timing network topology should be organized 289 to avoid timing loops and minimize the synchronization distance. In 290 NTP the subnet topology is determined using a variant of the Bellman- 291 Ford distributed routing algorithm, which computes the shortest-path 292 spanning tree rooted on the primary servers. As a result of this 293 design, the algorithm automatically reorganizes the subnet, so as to 294 produce the most accurate and reliable time, even when there are 295 failures in the timing network. 297 3.1. Dynamic Server Discovery 299 There are two special associations, manycast client and manycast 300 server, which provide a dynamic server discovery function. There are 301 two types of manycast client associations, persistent and ephemeral. 302 The persistent manycast client sends client (mode 3) packets to a 303 designated IPv4 or IPv6 broadcast or multicast group address. 304 Designated manycast servers within range of the time-to-live (TTL) 305 field in the packet header listen for packets with that address. If 306 a server is suitable for synchronization, it returns an ordinary 307 server (mode 4) packet using the client's unicast address. Upon 308 receiving this packet, the client mobilizes an ephemeral client (mode 309 3) association. The ephemeral client association persists until 310 error or timeout. 312 A manycast client continues sending packets to search for a minimum 313 number of associations. It starts with a TTL equal to one and 314 continuously adding one to it until the minimum number of 315 associations is made or when the TTL reaches a maximum value. If the 316 TTL reaches its maximum value and yet not enough associations are 317 mobilized, the client stops transmission for a time-out period to 318 clear all associations, and then repeats the search cycle. If a 319 minimum number of associations has been mobilized, then the client 320 starts transmitting one packet per time-out period to maintain the 321 associations. Field constraints limit the minimum value to 1 and the 322 maximum to 255. These limits may be tuned for individual application 323 needs. 325 The ephemeral associations compete among themselves. As new 326 ephemeral associations are mobilized, the client runs the mitigation 327 algorithms described in Section 10 and Section 11.2 for the best 328 candidates out of the population, the remaining ephemeral 329 associations are timed out and demobilized. In this way the 330 population includes only the best candidates that have most recently 331 responded with an NTP packet to discipline the system clock. 333 4. Definitions 335 A number of technical terms are defined in this section. A timescale 336 is a frame of reference where time is expressed as the value of a 337 monotonically increasing binary counter with an indefinite number of 338 bits. It counts in seconds and fractions of a second, when a decimal 339 point is employed. The Coordinated Universal Time (UTC) timescale is 340 defined by ITU-R TF.460[6]. Under the auspices of the Metre 341 Convention of 1865, in 1975 the CGPM[7] strongly endorsed the use of 342 UTC as the basis for civil time. 344 The Coordinated Universal Time (UTC) timescale represents mean solar 345 time as disseminated by national standards laboratories. The system 346 time is represented by the system clock maintained by the hardware 347 and operating system. The goal of the NTP algorithms is to minimize 348 both the time difference and frequency difference between UTC and the 349 system clock. When these differences have been reduced below nominal 350 tolerances, the system clock is said to be synchronized to UTC. 352 The date of an event is the UTC time at which the event takes place. 353 Dates are ephemeral values designated with upper case T. Running time 354 is another timescale that is coincident to the synchronization 355 function of the NTP program. 357 A timestamp T(t) represents either the UTC date or time offset from 358 UTC at running time t. Which meaning is intended should be clear 359 from the context. Let T(t) be the time offset, R(t) the frequency 360 offset, D(t) the aging rate (first derivative of R(t) with respect to 361 t). Then, if T(t_0) is the UTC time offset determined at t = t_0, 362 the UTC time offset at time t is 364 T(t) = T(t_0) + R(t_0)(t-t_0) + 1/2 * D(t_0)(t-t_0)^2 + e, 366 where e is a stochastic error term discussed later in this document. 367 While the D(t) term is important when characterizing precision 368 oscillators, it is ordinarily neglected for computer oscillators. In 369 this document all time values are in seconds (s) and all frequency 370 values are in seconds-per-second (s/s). It is sometimes convenient 371 to express frequency offsets in parts-per-million (PPM), where 1 PPM 372 is equal to 10^(-6) seconds/second. 374 It is important in computer timekeeping applications to assess the 375 performance of the timekeeping function. The NTP performance model 376 includes four statistics which are updated each time a client makes a 377 measurement with a server. The offset (theta) represents the 378 maximum-likelihood time offset of the server clock relative to the 379 system clock. The delay (delta) represents the round trip delay 380 between the client and server. The dispersion (epsilon) represents 381 the maximum error inherent in the measurement. It increases at a 382 rate equal to the maximum disciplined system clock frequency 383 tolerance (PHI), typically 15 PPM. The jitter (psi) is defined as 384 the root-mean-square (RMS) average of the most recent offset 385 differences, represents the nominal error in estimating the offset. 387 While the theta, delta, epsilon, and psi statistics represent 388 measurements of the system clock relative to each server clock 389 separately, the NTP protocol includes mechanisms to combine the 390 statistics of several servers to more accurately discipline and 391 calibrate the system clock. The system offset (THETA) represents the 392 maximum-likelihood offset estimate for the server population. The 393 system jitter (PSI) represents the nominal error in estimating the 394 system offset. The delta and epsilon statistics are accumulated at 395 each stratum level from the reference clock to produce the root delay 396 (DELTA) and root dispersion (EPSILON) statistics. The 397 synchronization distance (LAMBDA) equal to EPSILON + DELTA / 2 398 represents the maximum error due all causes. The detailed 399 formulations of these statistics are given in Section 11.2. They are 400 available to the dependent applications in order to assess the 401 performance of the synchronization function. 403 5. Implementation Model 405 Figure 2 shows the architecture of a typical, multi-threaded 406 implementation. It includes two processes dedicated to each server, 407 a peer process to receive messages from the server or reference clock 408 and a poll process to transmit messages to the server or reference 409 clock. 411 ..................................................................... 412 . Remote . Peer/Poll . System . Clock . 413 . Servers . Processes . Process .Discipline. 414 . . . . Process . 415 .+--------+. +-----------+. +------------+ . . 416 .| |->| |. | | . . 417 .|Server 1| |Peer/Poll 1|->| | . . 418 .| |<-| |. | | . . 419 .+--------+. +-----------+. | | . . 420 . . ^ . | | . . 421 . . | . | | . . 422 .+--------+. +-----------+. | | +-----------+. . 423 .| |->| |. | Selection |->| |. +------+ . 424 .|Server 2| |Peer/Poll 2|->| and | | Combine |->| Loop | . 425 .| |<-| |. | Cluster | | Algorithm |. |Filter| . 426 .+--------+. +-----------+. | Algorithms |->| |. +------+ . 427 . . ^ . | | +-----------+. | . 428 . . | . | | . | . 429 .+--------+. +-----------+. | | . | . 430 .| |->| |. | | . | . 431 .|Server 3| |Peer/Poll 3|->| | . | . 432 .| |<-| |. | | . | . 433 .+--------+. +-----------+. +------------+ . | . 434 ....................^.........................................|...... 435 | . V . 436 | . +-----+ . 437 +--------------------------------------| VFO | . 438 . +-----+ . 439 . Clock . 440 . Adjust . 441 . Process . 442 ............ 444 Figure 2: Implementation Model 446 These processes operate on a common data structure, called an 447 association, which contains the statistics described above along with 448 various other data described in Section 9. A client sends packets to 449 one or more servers and then processes returned packets when they are 450 received. The server interchanges source and destination addresses 451 and ports, overwrites certain fields in the packet and returns it 452 immediately (in the client/server mode) or at some time later (in the 453 symmetric modes). As each NTP message is received, the offset theta 454 between the peer clock and the system clock is computed along with 455 the associated statistics delta, epsilon and psi. 457 The system process includes the selection, cluster and combine 458 algorithms that mitigate among the various servers and reference 459 clocks to determine the most accurate and reliable candidates to 460 synchronize the system clock. The selection algorithm uses Byzantine 461 fault detection principles to discard the presumably incorrect 462 candidates called "falsetickers" from the incident population, 463 leaving only good candidates called "truechimers". A truechimer is a 464 clock that maintains timekeeping accuracy to a previously published 465 and trusted standard, while a falseticker is a clock that shows 466 misleading or inconsistent time. The cluster algorithm uses 467 statistical principles to find the most accurate set of truechimers. 468 The combine algorithm computes the final clock offset by 469 statistically averaging the surviving truechimers. 471 The clock discipline process is a system process that controls the 472 time and frequency of the system clock, here represented as a 473 variable frequency oscillator (VFO). Timestamps struck from the VFO 474 close the feedback loop which maintains the system clock time. 475 Associated with the clock discipline process is the clock adjust 476 process, which runs once each second to inject a computed time offset 477 and maintain constant frequency. The RMS average of past time offset 478 differences represents the nominal error or system clock jitter. The 479 RMS average of past frequency offset differences represents the 480 oscillator frequency stability or frequency wander. These terms are 481 given precise interpretation in Section 11.3. 483 A client sends messages to each server with a poll interval of 2^tau 484 seconds, as determined by the poll exponent tau. In NTPv4, tau 485 ranges from 4 (16 s) through 17 (36 h). The value of tau is 486 determined by the clock discipline algorithm to match the loop time 487 constant T_c = 2^tau. In client/server mode the server responds 488 immediately; however, in symmetric modes each of two peers manages 489 tau as a function of current system offset and system jitter, so may 490 not agree with the same value. It is important that the dynamic 491 behavior of the clock discipline algorithm be carefully controlled in 492 order to maintain stability in the NTP subnet at large. This 493 requires that the peers agree on a common tau equal to the minimum 494 poll exponent of both peers. The NTP protocol includes provisions to 495 properly negotiate this value. 497 The implementation model includes some means to set and adjust the 498 system clock. The operating system is assumed to provide two 499 functions, one to set the time directly, for example the Unix 500 settimeofday() function, and another to adjust the time in small 501 increments advancing or retarding the time by a designated amount, 502 for example the Unix adjtime() function. In this and following 503 references, parentheses following a name indicate reference to a 504 function rather than a simple variable. In the intended design the 505 clock discipline process uses the adjtime() function if the 506 adjustment is less than a designated threshold, and the 507 settimeofday() function if above the threshold. The manner in which 508 this is done and the value of the threshold as described in 509 Section 10. 511 6. Data Types 513 All NTP time values are represented in twos-complement format, with 514 bits numbered in big-endian (as described in Appendix A of [13]) 515 fashion from zero starting at the left, or high-order, position. 516 There are three NTP time formats, a 128-bit date format, a 64-bit 517 timestamp format and a 32-bit short format, as shown in Figure 3. 518 The 128-bit date format is used where sufficient storage and word 519 size are available. It includes a 64-bit signed seconds field 520 spanning 584 billion years and a 64-bit fraction field resolving .05 521 attosecond (i.e., 0.5e-18). For convenience in mapping between 522 formats, the seconds field is divided into a 32-bit Era Number field 523 and a 32-bit Era Offset field. Eras cannot be produced by NTP 524 directly, nor is there need to do so. When necessary, they can be 525 derived from external means, such as the filesystem or dedicated 526 hardware. 528 0 1 2 3 529 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 530 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 531 | Seconds | Fraction | 532 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 534 NTP Short Format 536 0 1 2 3 537 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 538 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 539 | Seconds | 540 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 541 | Fraction | 542 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 544 NTP Timestamp Format 546 0 1 2 3 547 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 548 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 549 | Era Number | 550 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 551 | Era Offset | 552 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 553 | | 554 | Fraction | 555 | | 556 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 558 NTP Date Format 560 Figure 3: NTP Time Formats 562 The 64-bit timestamp format is used in packet headers and other 563 places with limited word size. It includes a 32-bit unsigned seconds 564 field spanning 136 years and a 32-bit fraction field resolving 232 565 picoseconds. The 32-bit short format is used in delay and dispersion 566 header fields where the full resolution and range of the other 567 formats are not justified. It includes a 16-bit unsigned seconds 568 field and a 16-bit fraction field. 570 In the date and timestamp formats the prime epoch, or base date of 571 era 0, is 0 h 1 January 1900 UTC, when all bits are zero. It should 572 be noted that strictly speaking, UTC did not exist prior to 1 January 573 1972, but it is convenient to assume it has existed for all eternity, 574 even if all knowledge of historic leap seconds has been lost. Dates 575 are relative to the prime epoch; values greater than zero represent 576 times after that date; values less than zero represent times before 577 it. Note that the Era Offset field of the date format and the 578 Seconds field of the timestamp format have the same interpretation. 580 Timestamps are unsigned values and operations on them produce a 581 result in the same or adjacent eras. Era 0 includes dates from the 582 prime epoch to some time in 2036, when the timestamp field wraps 583 around and the base date for era 1 is established. In either format 584 a value of zero is a special case representing unknown or 585 unsynchronized time. Figure 4 shows a number of historic NTP dates 586 together with their corresponding Modified Julian Day (MJD), NTP era 587 and NTP timestamp. 589 +-------------+------------+-----+---------------+------------------+ 590 | Date | MJD | NTP | NTP Timestamp | Epoch | 591 | | | Era | Era Offset | | 592 +-------------+------------+-----+---------------+------------------+ 593 | 1 Jan -4712 | -2,400,001 | -49 | 1,795,583,104 | 1st day Julian | 594 | 1 Jan -1 | -679,306 | -14 | 139,775,744 | 2 BCE | 595 | 1 Jan 0 | -678,491 | -14 | 171,311,744 | 1 BCE | 596 | 1 Jan 1 | -678,575 | -14 | 202,939,144 | 1 CE | 597 | 4 Oct 1582 | -100,851 | -3 | 2,873,647,488 | Last day Julian | 598 | 15 Oct 1582 | -100,840 | -3 | 2,874,597,888 | First day | 599 | | | | | Gregorian | 600 | 31 Dec 1899 | 15019 | -1 | 4,294,880,896 | Last day NTP Era | 601 | | | | | -1 | 602 | 1 Jan 1900 | 15020 | 0 | 0 | First day NTP | 603 | | | | | Era 0 | 604 | 1 Jan 1970 | 40,587 | 0 | 2,208,988,800 | First day UNIX | 605 | 1 Jan 1972 | 41,317 | 0 | 2,272,060,800 | First day UTC | 606 | 31 Dec 1999 | 51,543 | 0 | 3,155,587,200 | Last day 20th | 607 | | | | | Century | 608 | 8 Feb 2036 | 64,731 | 1 | 63,104 | First day NTP | 609 | | | | | Era 1 | 610 +-------------+------------+-----+---------------+------------------+ 612 Figure 4: Interesting Historic NTP Dates 614 Let p be the number of significant bits in the second fraction. The 615 clock resolution is defined 2^(-p), in seconds. In order to minimize 616 bias and help make timestamps unpredictable to an intruder, the non- 617 significant bits should be set to an unbiased random bit string. The 618 clock precision is defined as the running time to read the system 619 clock, in seconds. Note that the precision defined in this way can 620 be larger or smaller than the resolution. The term rho, representing 621 the precision used in the protocol, is the larger of the two. 623 The only arithmetic operation permitted on dates and timestamps is 624 twos-complement subtraction, yielding a 127-bit or 63-bit signed 625 result. It is critical that the first-order differences between two 626 dates preserve the full 128-bit precision and the first-order 627 differences between two timestamps preserve the full 64-bit 628 precision. However, the differences are ordinarily small compared to 629 the seconds span, so they can be converted to floating double format 630 for further processing and without compromising the precision. 632 It is important to note that twos-complement arithmetic does not 633 distinguish between signed and unsigned values (although comparisons 634 can take sign into account); only the conditional branch instructions 635 do. Thus, although the distinction is made between signed dates and 636 unsigned timestamps, they are processed the same way. A perceived 637 hazard with 64-bit timestamp calculations spanning an era, such as 638 possible in 2036, might result in over-run. In point of fact, if the 639 client is set within 68 years of the server before the protocol is 640 started, correct values are obtained even if the client and server 641 are in adjacent eras. 643 Some time values are represented in exponent format, including the 644 precision, time constant and poll interval. These are in 8-bit 645 signed integer format in log2 (log base 2) seconds. The only 646 arithmetic operations permitted on them are increment and decrement. 647 For the purpose of this document and to simplify the presentation, a 648 reference to one of these variables by name means the exponentiated 649 value, e.g., the poll interval is 1024 s, while reference by name and 650 exponent means the actual value, e.g., the poll exponent is 10. 652 To convert system time in any format to NTP date and timestamp 653 formats requires that the number of seconds s from the prime epoch to 654 the system time be determined. To determine the integer era and 655 timestamp given s, 657 era = s / 2^(32) and timestamp = s - era * 2^(32), 659 which works for positive and negative dates. To determine s given 660 the era and timestamp, 662 s = era * 2^(32) + timestamp. 664 Converting between NTP and system time can be a little messy, and 665 beyond the scope of this document. Note that the number of days in 666 era 0 is one more than the number of days in most other eras and this 667 won't happen again until the year 2400 in era 3. 669 In the description of state variables to follow, explicit reference 670 to integer type implies a 32-bit unsigned integer. This simplifies 671 bounds checks, since only the upper limit needs to be defined. 673 Without explicit reference, the default type is 64-bit floating 674 double. Exceptions will be noted as necessary. 676 7. Data Structures 678 The NTP protocol state machines described in the following sections 679 are defined using state variables and code fragments defined in 680 Appendix A. State variables are separated into classes according to 681 their function in packet headers, peer and poll processes, the system 682 process and the clock discipline process. Packet variables represent 683 the NTP header values in transmitted and received packets. Peer and 684 poll variables represent the contents of the association for each 685 server separately. System variables represent the state of the 686 server as seen by its dependent clients. Clock discipline variables 687 represent the internal workings of the clock discipline algorithm. 688 Additional parameters and variable classes are defined in Appendix A. 690 7.1. Structure Conventions 692 In order to distinguish between different variables of the same name 693 but used in different processes, the naming convention summarized in 694 Figure 5 is adopted. A receive packet variable v is a member of the 695 packet structure r with fully qualified name r.v. In a similar 696 manner x.v is a transmit packet variable, p.v is a peer variable, s.v 697 is a system variable and c.v is a clock discipline variable. There 698 is a set of peer variables for each association; there is only one 699 set of system and clock variables. 701 +------+---------------------------------+ 702 | Name | Description | 703 +------+---------------------------------+ 704 | r. | receive packet header variable | 705 | x. | transmit packet header variable | 706 | p. | peer/poll variable | 707 | s. | system variable | 708 | c. | clock discipline variable | 709 +------+---------------------------------+ 711 Figure 5: Prefix Conventions 713 7.2. Global Parameters 715 In addition to the variable classes a number of global parameters are 716 defined in this document, including those shown with values in 717 Figure 6. 719 +-----------+-------+----------------------------------+ 720 | Name | Value | Description | 721 +-----------+-------+----------------------------------+ 722 | PORT | 123 | NTP port number | 723 | VERSION | 4 | version number | 724 | TOLERANCE | 15e-6 | frequency tolerance PHI (s/s) | 725 | MINPOLL | 4 | minimum poll exponent (16 s) | 726 | MAXPOLL | 17 | maximum poll exponent (36 h) | 727 | MAXDISP | 16 | maximum dispersion (16 s) | 728 | MINDISP | .005 | minimum dispersion increment (s) | 729 | MAXDIST | 1 | distance threshold (1 s) | 730 | MAXSTRAT | 16 | maximum stratum number | 731 +-----------+-------+----------------------------------+ 733 Figure 6: Global Parameters 735 While these are the only global parameters needed in this document, a 736 larger collection is necessary in the skeleton and larger still for 737 any implementation. Appendix A.1.1 contains those used by the 738 skeleton for the mitigation algorithms, clock discipline algorithm 739 and related implementation-dependent functions. Some of these 740 parameter values are cast in stone, like the NTP port number assigned 741 by the IANA and the version number assigned NTPv4 itself. Others 742 like the frequency tolerance (also called PHI), involve an assumption 743 about the worst case behavior of a system clock once synchronized and 744 then allowed to drift when its sources have become unreachable. The 745 minimum and maximum parameters define the limits of state variables 746 as described in later sections of this document. 748 While shown with fixed values in this document, some implementations 749 may make them variables adjustable by configuration commands. For 750 instance, the reference implementation computes the value of 751 PRECISION as log2 of the minimum time in several iterations to read 752 the system clock. 754 7.3. Packet Header Variables 756 The most important state variables from an external point of view are 757 the packet header variables described in Figure 7 and below. The NTP 758 packet header consists of an integral number of 32-bit (4 octet) 759 words in network byte order. The packet format consists of three 760 components, the header itself, one or more optional extension fields 761 and an optional message authentication code (MAC). The header 762 component is identical to the NTPv3 header and previous versions. 763 The optional extension fields are used by the Autokey public key 764 cryptographic algorithms described in [3]. The optional MAC is used 765 by both Autokey and the symmetric key cryptographic algorithm 766 described in this report. 768 +-----------+------------+-----------------------+ 769 | Name | Formula | Description | 770 +-----------+------------+-----------------------+ 771 | leap | leap | leap indicator (LI) | 772 | version | version | version number (VN) | 773 | mode | mode | mode | 774 | stratum | stratum | stratum | 775 | poll | poll | poll exponent | 776 | precision | rho | precision exponent | 777 | rootdelay | delta_r | root delay | 778 | rootdisp | epsilon_r | root dispersion | 779 | refid | refid | reference ID | 780 | reftime | reftime | reference timestamp | 781 | org | T1 | origin timestamp | 782 | rec | T2 | receive timestamp | 783 | xmt | T3 | transmit timestamp | 784 | dst | T4 | destination timestamp | 785 | keyid | keyid | key ID | 786 | MAC | MAC | message digest | 787 +-----------+------------+-----------------------+ 789 Figure 7: Packet Header Variables 791 The NTP packet is a UDP datagram[14]. Some fields use multiple words 792 and others are packed in smaller fields within a word. The NTP 793 packet header shown in Figure 8 has 12 words followed by optional 794 extension fields and finally an optional message authentication code 795 (MAC) consisting of the key identifier field and message digest 796 field. 798 0 1 2 3 799 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 800 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 801 |LI | VN |Mode | Stratum | Poll | Precision | 802 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 803 | Root Delay | 804 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 805 | Root Dispersion | 806 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 807 | Reference ID | 808 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 809 | | 810 + Reference Timestamp (64) + 811 | | 812 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 813 | | 814 + Origin Timestamp (64) + 815 | | 816 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 817 | | 818 + Receive Timestamp (64) + 819 | | 820 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 821 | | 822 + Transmit Timestamp (64) + 823 | | 824 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 825 | | 826 . . 827 . Extension Field 1 (variable) . 828 . . 829 | | 830 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 831 | | 832 . . 833 . Extension Field 2 (variable) . 834 . . 835 | | 836 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 837 | Key Identifier | 838 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 839 | | 840 | MAC (128) | 841 | | 842 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 844 Figure 8: Packet Header Format 846 The extension fields are used to add optional capabilities, for 847 example, the Autokey security protocol [3]. The extension field 848 format is presented in order that the packet can be parsed without 849 knowledge of the extension field functions. The MAC is used by both 850 Autokey and the symmetric key authentication scheme described in 851 Appendix A. 853 A list of the packet header variables is shown in Figure 7 and 854 described in detail below. Except for a minor variation when using 855 the IPv6 address family, these fields are backwards compatible with 856 NTPv3. The packet header fields apply to both transmitted packets (x 857 prefix) and received packets (r prefix). In Figure 8 the size of 858 some multiple-word fields is shown in bits if not the default 32 859 bits. The basic header extends from the beginning of the packet to 860 the end of the Transmit Timestamp field. 862 The fields and associated packet variables (in parentheses) are 863 interpreted as follows: 865 LI Leap Indicator (leap): 2-bit integer warning of an impending leap 866 second to be inserted or deleted in the last minute of the current 867 month with values defined in Figure 9. 869 +-------+-------------------------------------------------+ 870 | Value | Meaning | 871 +-------+-------------------------------------------------+ 872 | 0 | no warning | 873 | 1 | last minute of the day has 61 seconds | 874 | 2 | last minute of the day has 59 seconds | 875 | 3 | alarm condition (the clock is not synchronized) | 876 +-------+-------------------------------------------------+ 878 Figure 9: Leap Indicator 880 VN Version Number (version): 3-bit integer representing the NTP 881 version number, currently 4. 883 Mode (mode): 3-bit integer representing the mode, with values defined 884 in Figure 10. 886 +-------+--------------------------+ 887 | Value | Meaning | 888 +-------+--------------------------+ 889 | 0 | reserved | 890 | 1 | symmetric active | 891 | 2 | symmetric passive | 892 | 3 | client | 893 | 4 | server | 894 | 5 | broadcast | 895 | 6 | NTP control message | 896 | 7 | reserved for private use | 897 +-------+--------------------------+ 899 Figure 10: Association Modes 901 Stratum (stratum): 8-bit integer representing the stratum, with 902 values defined in Figure 11. 904 +--------+-----------------------------------------------------+ 905 | Value | Meaning | 906 +--------+-----------------------------------------------------+ 907 | 0 | unspecified or invalid | 908 | 1 | primary server (e.g., equipped with a GPS receiver) | 909 | 2-15 | secondary server (via NTP) | 910 | 16 | unsynchronized | 911 | 17-255 | reserved | 912 +--------+-----------------------------------------------------+ 914 Figure 11: Packet Stratum 916 It is customary to map the stratum value 0 in received packets to 917 MAXSTRAT (16) in the peer variable p.stratum and to map p.stratum 918 values of MAXSTRAT or greater to 0 in transmitted packets. This 919 allows reference clocks, which normally appear at stratum 0, to be 920 conveniently mitigated using the same algorithms used for external 921 sources (See Appendix A.5.5.1). 923 Poll: 8-bit signed integer representing the maximum interval between 924 successive messages, in log2 seconds. Suggested default limits for 925 minimum and maximum poll intervals are 6 and 10, respectively. 927 Precision: 8-bit signed integer representing the precision of the 928 system clock, in log2 seconds. For instance a value of -18 929 corresponds to a precision of about one microsecond. The precision 930 can be determined when the service first starts up as the minimum 931 time of several iterations to read the system clock. 933 Root Delay (rootdelay): Total round trip delay to the reference 934 clock, in NTP short format. 936 Root Dispersion (rootdisp): Total dispersion to the reference clock, 937 in NTP short format. 939 Reference ID (refid): 32-bit code identifying the particular server 940 or reference clock. The interpretation depends on the value in the 941 stratum field. For packet stratum 0 (unspecified or invalid) this is 942 a four-character ASCII[15] string, called the kiss code, used for 943 debugging and monitoring purposes. For stratum 1 (reference clock) 944 this is a four-octet, left-justified, zero-padded ASCII string 945 assigned to the reference clock. While not specifically enumerated 946 in this document, the identifiers in Figure 12 have been used as 947 ASCII identifiers: 949 +------+----------------------------------------------------------+ 950 | ID | Clock Source | 951 +------+----------------------------------------------------------+ 952 | GOES | Geosynchronous Orbit Environment Satellite | 953 | GPS | Global Position System | 954 | GAL | Galileo Positioning System | 955 | PPS | Generic pulse-per-second | 956 | IRIG | Inter-Range Instrumentation Group | 957 | WWVB | LF Radio WWVB Ft. Collins, CO 60 kHz | 958 | DCF | LF Radio DCF77 Mainflingen, DE 77.5 kHz | 959 | HBG | LF Radio HBG Prangins, HB 75 kHz | 960 | MSF | LF Radio MSF Anthorn, UK 60 kHz | 961 | JJY | LF Radio JJY Fukushima, JP 40 kHz, Saga, JP 60 kHz | 962 | LORC | MF Radio LORAN C station, 100 kHz | 963 | TDF | MF Radio Allouis, FR 162 kHz | 964 | CHU | HF Radio CHU Ottawa, Ontario | 965 | WWV | HF Radio WWV Ft. Collins, CO | 966 | WWVH | HF Radio WWVH Kauai, HI | 967 | NIST | NIST telephone modem | 968 | ACTS | NIST telephone modem | 969 | USNO | USNO telephone modem | 970 | PTB | European telephone modem | 971 +------+----------------------------------------------------------+ 973 Figure 12: Reference Identifiers 975 Above stratum 1 (secondary servers and clients) this is the reference 976 identifier of the server and can be used to detect timing loops. If 977 using the IPv4 address family, the identifier is the four-octet IPv4 978 address. If using the IPv6 address family, it is the first four 979 octets of the MD5 hash of the IPv6 address. Note that, when using 980 the IPv6 address family on an NTPv4 server with a NTPv3 client, the 981 Reference Identifier field appears to be a random value and a timing 982 loop might not be detected. 984 Reference Timestamp: Time when the system clock was last set or 985 corrected, in NTP timestamp format. 987 Origin Timestamp (org): Time at the client when the request departed 988 for the server, in NTP timestamp format. 990 Receive Timestamp (rec): Time at the server when the request arrived 991 from the client, in NTP timestamp format. 993 Transmit Timestamp (xmt): Time at the server when the response left 994 for the client, in NTP timestamp format. 996 Destination Timestamp (dst): Time at the client when the reply 997 arrived from the server, in NTP timestamp format. 999 Note: Destination Timestamp field is not included as a header field; 1000 it is determined upon arrival of the packet and made available in the 1001 packet buffer data structure. 1003 The MAC consists of the Key Identifier followed by the Message 1004 Digest. The message digest, or cryptosum, is calculated as in [16] 1005 over all NTP header and optional extension fields, but not the MAC 1006 itself. 1008 Extension Field n: See Section 7.5 for a description of the format of 1009 this field. 1011 Key Identifier (keyid): 32-bit unsigned integer used by the client 1012 and server to designate a secret 128-bit MD5 key. 1014 Message Digest (digest): 128-bit bitstring computed by the keyed MD5 1015 message digest computed over all the words in the header and 1016 extension fields, but not the MAC itself. 1018 7.4. The Kiss-o'-Death Packet 1020 If the Stratum field is 0, which implies unspecified or invalid, the 1021 Reference Identifier field can be used to convey messages useful for 1022 status reporting and access control. These are called Kiss-o'-Death 1023 (KoD) packets and the ASCII messages they convey are called kiss 1024 codes. The KoD packets got their name because an early use was to 1025 tell clients to stop sending packets that violate server access 1026 controls. The kiss codes can provide useful information for an 1027 intelligent client, either NTPv4 or SNTPv4. Kiss codes are encoded 1028 in four-character ASCII strings left justified and zero filled. The 1029 strings are designed for character displays and log files. A list of 1030 the currently-defined kiss codes is given in Figure 13. Recipients 1031 of kiss codes MUST inspect them and in the following cases take these 1032 actions: 1034 a. For kiss codes: DENY, RSTR the client MUST demobilize any 1035 associations to that server and stop sending packets to that 1036 server; 1038 b. For kiss code: RATE the client MUST immediately reduce its 1039 polling interval to that server and continue to reduce it each 1040 time it receives a RATE kiss code. 1042 c. Other than the above conditions, KoD packets have no protocol 1043 significance and are discarded after inspection. 1045 +------+------------------------------------------------------------+ 1046 | Code | Meaning | 1047 +------+------------------------------------------------------------+ 1048 | ACST | The association belongs to a unicast server | 1049 | AUTH | Server authentication failed | 1050 | AUTO | Autokey sequence failed | 1051 | BCST | The association belongs to a broadcast server | 1052 | CRYP | Cryptographic authentication or identification failed | 1053 | DENY | Access denied by remote server | 1054 | DROP | Lost peer in symmetric mode | 1055 | RSTR | Access denied due to local policy | 1056 | INIT | The association has not yet synchronized for the first | 1057 | | time | 1058 | MCST | The association belongs to a dynamically discovered server | 1059 | NKEY | No key found. Either the key was never installed or is | 1060 | | not trusted | 1061 | RATE | Rate exceeded. The server has temporarily denied access | 1062 | | because the client exceeded the rate threshold | 1063 | RMOT | Alteration of association from a remote host running | 1064 | | ntpdc. | 1065 | STEP | A step change in system time has occurred, but the | 1066 | | association has not yet resynchronized | 1067 +------+------------------------------------------------------------+ 1069 Figure 13: Kiss Codes 1071 7.5. NTP Extension Field Format 1073 In NTPv4 one or more extension fields can be inserted after the 1074 header and before the MAC, which is always present when an extension 1075 field is present. Other than defining the field format, this 1076 document makes no use of the field contents. An extension field 1077 contains a request or response message in the format shown in 1078 Figure 14. 1080 0 1 2 3 1081 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1082 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1083 | Field Type | Length | 1084 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1085 . . 1086 . Value . 1087 . . 1088 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1089 | Padding (as needed) | 1090 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1092 Figure 14: Extension Field Format 1094 All extension fields are zero-padded to a word (4 octets) boundary. 1095 The Field Type field is specific to the defined function and is not 1096 elaborated here. While the minimum field length containing required 1097 fields is 4 words (16 octets), a maximum field length remains to be 1098 established. 1100 The Length field is a 16-bit unsigned integer which indicates the 1101 length of the entire extension field in octets, including the Padding 1102 field. 1104 8. On Wire Protocol 1106 The heart of the NTP on-wire protocol is the core mechanism which 1107 exchanges time values between servers, peers and clients. It is 1108 inherently resistant to lost or duplicate packets. Data integrity is 1109 provided by the IP and UDP checksums. No flow control or 1110 retransmission facilities are provided or necessary. The protocol 1111 uses timestamps, either extracted from packet headers or struck from 1112 the system clock upon the arrival or departure of a packet. 1113 Timestamps are precision data and should be restruck in case of link 1114 level retransmission and corrected for the time to compute a MAC on 1115 transmit. 1117 NTP messages make use of two different communication modes, one-to- 1118 one and one-to-many, commonly referred to as unicast and broadcast. 1119 For the purposes of this document, the term broadcast is interpreted 1120 as any available one-to-many mechanism. For IPv4 this equates to 1121 either IPv4 broadcast or IPv4 multicast. For IPv6 this equates to 1122 IPv6 multicast. For this purpose, IANA has allocated the IPv4 1123 multicast address 224.0.1.1 and the IPv6 multicast address ending 1124 :101, with prefix determined by scoping rules. 1126 The on-wire protocol uses four timestamps numbered t1 through t4 and 1127 three state variables org, rec and xmt, as shown in Figure 15. This 1128 figure shows the most general case where each of two peers, A and B, 1129 independently measure the offset and delay relative to the other. 1130 For purposes of illustration the packet timestamps are shown in lower 1131 case, while the state variables are shown in upper case. The state 1132 variables are copied from the packet timestamps upon arrival or 1133 departure of a packet. 1135 t2 t3 t6 t7 1136 +---------+ +---------+ +---------+ +---------+ 1137 | 0 | | t1 | | t3 | | t5 | 1138 +---------+ +---------+ +---------+ +---------+ 1139 | 0 | | t2 | | t4 | | t6 | Packet 1140 +---------+ +---------+ +---------+ +---------+ Timestamps 1141 | t1 | |t3=clock | | t5 | |t7=clock | 1142 +---------+ +---------+ +---------+ +---------+ 1143 |t2=clock | |t6=clock | 1144 +---------+ +---------+ 1145 Peer B 1146 +---------+ +---------+ +---------+ +---------+ 1147 org | T1 | | T1 | | t5<>T1? | | T5 | 1148 +---------+ +---------+ +---------+ +---------+ State 1149 rec | T2 | | T2 | | T6 | | T6 | Variables 1150 +---------+ +---------+ +---------+ +---------+ 1151 xmt | 0 | | T3 | | t3=T3? | | T7 | 1152 +---------+ +---------+ +---------+ +---------+ 1154 t2 t3 t6 t7 1155 --------------------------------------------------------- 1156 /\ \ /\ \ 1157 / \ / \ 1158 / \ / \ 1159 / \/ / \/ 1160 --------------------------------------------------------- 1161 t1 t4 t5 t8 1163 t1 t4 t5 t8 1164 +---------+ +---------+ +---------+ +---------+ 1165 | 0 | | t1 | | t3 | | t5 | 1166 +---------+ +---------+ +---------+ +---------+ 1167 | 0 | | t2 | | t4 | | t6 | Packet 1168 +---------+ +---------+ +---------+ +---------+ Timestamps 1169 |t1=clock | | t3 | |t5=clock | | t7 | 1170 +---------+ +---------+ +---------+ +---------+ 1171 |t4=clock | |t8=clock | 1172 +---------+ +---------+ 1173 Peer A 1174 +---------+ +---------+ +---------+ +---------+ 1175 org | 0 | | t3<>0? | | T3 | | t7<>T3? | 1176 +---------+ +---------+ +---------+ +---------+ State 1177 rec | 0 | | T4 | | T4 | | T8 | Variables 1178 +---------+ +---------+ +---------+ +---------+ 1179 xmt | T1 | | t1=T1? | | T5 | | t5=T5? | 1180 +---------+ +---------+ +---------+ +---------+ 1181 Figure 15: On-Wire Protocol 1183 In the figure the first packet transmitted by A contains only the 1184 origin timestamp t1, which is then copied to T1. B receives the 1185 packet at t2 and copies t1 to T1 and the receive timestamp t2 to T2. 1186 At this time or some time later at t3, B sends a packet to A 1187 containing t1 and t2 and in addition the transmit timestamp t3. All 1188 three timestamps are copied to the corresponding state variables. A 1189 receives the packet at t4 containing the three timestamps t1, t2 and 1190 t3 and in addition the destination timestamp t4. These four 1191 timestamps are used to compute the offset and delay of B relative to 1192 A, as described below. 1194 Before the xmt and org state variables are updated, two sanity checks 1195 are performed in order to protect against duplicate, bogus or 1196 replayed packets. In the exchange above, a packet is duplicate or 1197 replay if the transmit timestamp t3 in the packet matches the org 1198 state variable T3. A packet is bogus if the origin timestamp t1 in 1199 the packet does not match the xmt state variable T1. In either of 1200 these cases the state variables are updated, then the packet is 1201 discarded. To protect against replay of the last transmitted packet, 1202 the xmt state variable is set to zero immediately after a successful 1203 bogus check. 1205 The four most recent timestamps, T1 through T4, are used to compute 1206 the offset of B relative to A 1208 theta = T(B) - T(A) = 1/2 * [(T2-T1) + (T3-T4)] 1210 and the round trip delay 1212 delta = T(ABA) = (T4-T1) - (T3-T2). 1214 Note that the quantities within parentheses are computed from 64-bit 1215 unsigned timestamps and result in signed values with 63 significant 1216 bits plus sign. These values can represent dates from 68 years in 1217 the past to 68 years in the future. However, the offset and delay 1218 are computed as sums and differences of these values, which contain 1219 62 significant bits and two sign bits, so can represent unambiguous 1220 values from 34 years in the past to 34 years in the future. In other 1221 words, the time of the client must be set within 34 years of the 1222 server before the service is started. This is a fundamental 1223 limitation with 64-bit integer arithmetic. 1225 In implementations where floating double arithmetic is available, the 1226 first-order differences can be converted to floating double and the 1227 second-order sums and differences computed in that arithmetic. Since 1228 the second-order terms are typically very small relative to the 1229 timestamp magnitudes, there is no loss in significance, yet the 1230 unambiguous range is restored from 34 years to 68 years. 1232 In some scenarios where the initial frequency offset of the client is 1233 relatively large and the actual propagation time small, it is 1234 possible for the delay computation to becomes negative. For 1235 instance, if the frequency difference is 100 PPM and the interval 1236 T4-T1 is 64 s, the apparent delay is -6.4 ms. Since negative values 1237 are misleading in subsequent computations, the value of delta should 1238 be clamped not less than s.rho, where s.rho is the system precision 1239 described in Section 11.1, expressed in seconds. 1241 The discussion above assumes the most general case where two 1242 symmetric peers independently measure the offsets and delays between 1243 them. In the case of a stateless server, the protocol can be 1244 simplified. A stateless server copies T3 and T4 from the client 1245 packet to T1 and T2 of the server packet and tacks on the transmit 1246 timestamp T3 before sending it to the client. Additional details for 1247 filling in the remaining protocol fields are given in a Section 9 and 1248 following sections and in the appendix. 1250 Note that the on-wire protocol as described resists replay of a 1251 server response packet. However, it does not resist replay of the 1252 client request packet, which would result in a server reply packet 1253 with new values of T2 and T3 and result in incorrect offset and 1254 delay. This vulnerability can be avoided by setting the xmt state 1255 variable to zero after computing the offset and delay. 1257 9. Peer Process 1259 The process descriptions to follow include a listing of the important 1260 state variables followed by an overview of the process operations 1261 implemented as routines. Frequent reference is made to the skeleton 1262 in the appendix. The skeleton includes C-language fragments that 1263 describe the functions in more detail. It includes the parameters, 1264 variables and declarations necessary for a conforming NTPv4 1265 implementation. However, many additional variables and routines may 1266 be necessary in a working implementation. 1268 The peer process is called upon arrival of a server or peer packet. 1269 It runs the on-wire protocol to determine the clock offset and round 1270 trip delay and in addition computes statistics used by the system and 1271 poll processes. Peer variables are instantiated in the association 1272 data structure when the structure is initialized and updated by 1273 arriving packets. There is a peer process, poll process and 1274 association for each server. 1276 9.1. Peer Process Variables 1278 Figure 16, Figure 17, Figure 18 and Figure 19 summarize the common 1279 names, formula names and a short description of the peer variables. 1280 The common names and formula names are interchangeable; formula names 1281 are intended to increase readability of equations in this 1282 specification. Unless noted otherwise, all peer variables have 1283 assumed prefix p. 1285 +---------+----------+-----------------------+ 1286 | Name | Formula | Description | 1287 +---------+----------+-----------------------+ 1288 | srcaddr | srcaddr | source address | 1289 | srcport | srcport | source port | 1290 | dstaddr | dstaddr | destination address | 1291 | dstport | destport | destination port | 1292 | keyid | keyid | key identifier key ID | 1293 +---------+----------+-----------------------+ 1295 Figure 16: Peer Process Configuration Variables 1297 +-----------+------------+---------------------+ 1298 | Name | Formula | Description | 1299 +-----------+------------+---------------------+ 1300 | leap | leap | leap indicator | 1301 | version | version | version number | 1302 | mode | mode | mode | 1303 | stratum | stratum | stratum | 1304 | ppoll | ppoll | peer poll exponent | 1305 | rootdelay | delta_r | root delay | 1306 | rootdisp | epsilon_r | root dispersion | 1307 | refid | refid | reference ID | 1308 | reftime | reftime | reference timestamp | 1309 +-----------+------------+---------------------+ 1311 Figure 17: Peer Process Packet Variables 1313 +------+---------+--------------------+ 1314 | Name | Formula | Description | 1315 +------+---------+--------------------+ 1316 | org | T1 | origin timestamp | 1317 | rec | T2 | receive timestamp | 1318 | xmt | T3 | transmit timestamp | 1319 | t | t | packet time | 1320 +------+---------+--------------------+ 1322 Figure 18: Peer Process Timestamp Variables 1324 +--------+---------+-----------------+ 1325 | Name | Formula | Description | 1326 +--------+---------+-----------------+ 1327 | offset | theta | clock offset | 1328 | delay | delta | roundtrip delay | 1329 | disp | epsilon | dispersion | 1330 | jitter | psi | jitter | 1331 | filter | filter | clock filter | 1332 | tp | t_p | filter time | 1333 +--------+---------+-----------------+ 1335 Figure 19: Peer Process Statistics Variables 1337 The following configuration variables are normally initialized when 1338 the association is mobilized, either from a configuration file or 1339 upon the arrival of the first packet for an unknown association. 1341 srcaddr: IP address of the remote server or reference clock. This 1342 becomes the destination IP address in packets sent from this 1343 association. 1345 srcport: UDP port number of the server or reference clock. This 1346 becomes the destination port number in packets sent from this 1347 association. When operating in symmetric modes (1 and 2) this field 1348 must contain the NTP port number PORT (123) assigned by the IANA. In 1349 other modes it can contain any number consistent with local policy. 1351 dstaddr: IP address of the client. This becomes the source IP 1352 address in packets sent from this association. 1354 dstport: UDP port number of the client, ordinarily the NTP port 1355 number PORT (123) assigned by the IANA. This becomes the source port 1356 number in packets sent from this association. 1358 keyid: Symmetric key ID for the 128-bit MD5 key used to generate and 1359 verify the MAC. The client and server or peer can use different 1360 values, but they must map to the same key. 1362 The variables defined in Figure 17 are updated from the packet header 1363 as each packet arrives. They are interpreted in the same way as the 1364 packet variables of the same names. It is convenient for later 1365 processing to convert the NTP short format packet values r.rootdelay 1366 and r.rootdisp to floating doubles as peer variables. 1368 The variables defined in Figure 18 include the timestamps exchanged 1369 by the on-wire protocol in Section 8. The t variable is the seconds 1370 counter c.t associated with these values. The c.t variable is 1371 maintained by the clock adjust process described in Section 12. It 1372 counts the seconds since the service was started. The variables 1373 defined in Figure 19 include the statistics computed by the 1374 clock_filter() routine described in Section 10. The tp variable is 1375 the seconds counter associated with these values. 1377 9.2. Peer Process Operations 1379 The receive() routine in Appendix A.5.1 shows the peer process code 1380 flow upon the arrival of a packet. The access() routine in 1381 Appendix A.5.4 implements access restrictions using an access control 1382 list (ACL). There is no specific method required for access control, 1383 although it is recommended that implementations include such a 1384 scheme, which is similar to many others now in widespread use. 1385 Format checks require correct field length and alignment, acceptable 1386 version number (1-4) and correct extension field syntax, if present. 1388 There is no specific requirement for authentication; however, if 1389 authentication is implemented, the symmetric key scheme described in 1390 Appendix A.2 must be among the supported schemes. This scheme uses 1391 the MD5 keyed hash algorithm described in [16]. 1393 Next, the association table is searched for matching source address 1394 and source port using the find_assoc() routine in Appendix A.5.1. 1395 Figure 20 is a dispatch table where the columns correspond to the 1396 packet mode and rows correspond to the association mode. The 1397 intersection of the association and packet modes dispatches 1398 processing to one of the following steps. 1400 +------------------+---------------------------------------+ 1401 | | Packet Mode | 1402 +------------------+-------+-------+-------+-------+-------+ 1403 | Association Mode | 1 | 2 | 3 | 4 | 5 | 1404 +------------------+-------+-------+-------+-------+-------+ 1405 | No Association 0 | NEWPS | DSCRD | FXMIT | MANY | NEWBC | 1406 | Symm. Active 1 | PROC | PROC | DSCRD | DSCRD | DSCRD | 1407 | Symm. Passive 2 | PROC | ERR | DSCRD | DSCRD | DSCRD | 1408 | Client 3 | DSCRD | DSCRD | DSCRD | PROC | DSCRD | 1409 | Server 4 | DSCRD | DSCRD | DSCRD | DSCRD | DSCRD | 1410 | Broadcast 5 | DSCRD | DSCRD | DSCRD | DSCRD | DSCRD | 1411 | Bcast Client 6 | DSCRD | DSCRD | DSCRD | DSCRD | PROC | 1412 +------------------+-------+-------+-------+-------+-------+ 1414 Figure 20: Peer Dispatch Table 1416 DSCRD. This indicates a nonfatal violation of protocol as the result 1417 of a programming error, long delayed packet or replayed packet. The 1418 peer process discards the packet and exits. 1420 ERR. This indicates a fatal violation of protocol as the result of a 1421 programming error, long delayed packet or replayed packet. The peer 1422 process discards the packet, demobilizes the symmetric passive 1423 association and exits. 1425 FXMIT. This indicates a client (mode 3) packet matching no 1426 association (mode 0). If the destination address is not a broadcast 1427 address, the server constructs a server (mode 4) packet and returns 1428 it to the client without retaining state. The server packet header 1429 is constructed by the fast_xmit() routine in Appendix A.5.3. The 1430 packet header is assembled from the receive packet and system 1431 variables as shown in Figure 21. If the s.rootdelay and s.rootdisp 1432 system variables are stored in floating double, they must be 1433 converted to NTP short format first. 1435 +-----------------------------------+ 1436 | Packet Variable --> Variable | 1437 +-----------------------------------+ 1438 | r.leap --> p.leap | 1439 | r.mode --> p.mode | 1440 | r.stratum --> p.stratum | 1441 | r.poll --> p.ppoll | 1442 | r.rootdelay --> p.rootdelay | 1443 | r.rootdisp --> p.rootdisp | 1444 | r.refid --> p.refid | 1445 | r.reftime --> p.reftime | 1446 | r.keyid --> p.keyid | 1447 +-----------------------------------+ 1449 Figure 21: Receive Packet Header 1451 Note that, if authentication fails, the server returns a special 1452 message called a crypto-NAK. This message includes the normal NTP 1453 header data shown in Figure 8, but with a MAC consisting of four 1454 octets of zeros. The client MAY accept or reject the data in the 1455 message. After these actions the peer process exits. 1457 If the destination address is a multicast address, the sender is 1458 operating in manycast client mode. If the packet is valid and the 1459 server stratum is less than the client stratum, the server sends an 1460 ordinary server (mode 4) packet, but using its unicast destination 1461 address. A crypto-NAK is not sent if authentication fails. After 1462 these actions the peer process exits. 1464 MANY: This indicates a server (mode 4) packet matching no 1465 association. Ordinarily, this can happen only as the result of a 1466 manycast server reply to a previously sent multicast client packet. 1467 If the packet is valid, an ordinary client (mode 3) association is 1468 mobilized and operation continues as if the association was mobilized 1469 by the configuration file. 1471 NEWBC. This indicates a broadcast (mode 5) packet matching no 1472 association. The client mobilizes either a client (mode 3) or 1473 broadcast client (mode 6) association as shown in the mobilize() and 1474 clear() routines in Appendix A.2. Then the packet() routine in 1475 Appendix A.5.1.1 validates the packet and initializes the peer 1476 variables. 1478 If the implementation supports no additional security or calibration 1479 functions, the association mode is set to broadcast client (mode 6) 1480 and the peer process exits. Implementations supporting public key 1481 authentication MAY run the Autokey or equivalent security protocol. 1482 Implementations SHOULD set the association mode to 3 and run a short 1483 client/server exchange to determine the propagation delay. Following 1484 the exchange the association mode is set to 6 and the peer process 1485 continues in listen-only mode. Note the distinction between a mode-6 1486 packet, which is reserved for the NTP monitor and control functions, 1487 and a mode-6 association. 1489 NEWPS. This indicates a symmetric active (mode 1) packet matching no 1490 association. The client mobilizes a symmetric passive (mode 2) 1491 association as shown in the mobilize() routine and clear() routines 1492 in Appendix A.2. Processing continues in the PROC section below. 1494 PROC. This indicates a packet matching an existing association. The 1495 packet timestamps are carefully checked to avoid invalid, duplicate 1496 or bogus packets. Additional checks are summarized in Figure 22. 1497 Note that all packets, including a crypto-NAK, are considered valid 1498 only if they survive these tests. 1500 +--------------------------+----------------------------------------+ 1501 | Packet Type | Description | 1502 +--------------------------+----------------------------------------+ 1503 | 1 duplicate packet | The packet is at best an old duplicate | 1504 | | or at worst a replay by a hacker. | 1505 | | This can happen in symmetric modes if | 1506 | | the poll intervals are uneven. | 1507 | 2 bogus packet | | 1508 | 3 invalid | One or more timestamp fields are | 1509 | | invalid. This normally happens in | 1510 | | symmetric modes when one peer sends | 1511 | | the first packet to the other and | 1512 | | before the other has received its | 1513 | | first reply. | 1514 | 4 access denied | The access controls have blacklisted | 1515 | | the source. | 1516 | 5 authentication failure | The cryptographic message digest does | 1517 | | not match the MAC. | 1518 | 6 unsynchronized | The server is not synchronized to a | 1519 | | valid source. | 1520 | 7 bad header data | One or more header fields are invalid. | 1521 +--------------------------+----------------------------------------+ 1523 Figure 22: Packet Error Checks 1525 Processing continues in the packet() routine in Appendix A.5.1.1. It 1526 copies the packet variables to the peer variables as shown in 1527 Figure 21 and the packet() routine in Appendix A.5.2. The receive() 1528 routine implements tests 1-5 in Figure 22; the packet() routine 1529 implements tests 6-7. If errors are found the packet is discarded 1530 and the peer process exits. 1532 The on-wire protocol calculates the clock offset theta and round trip 1533 delay delta from the four most recent timestamps as described in 1534 Section 8. While it is in principle possible to do all calculations 1535 except the first-order timestamp differences in fixed-point 1536 arithmetic, it is much easier to convert the first-order differences 1537 to floating doubles and do the remaining calculations in that 1538 arithmetic, and this will be assumed in the following description. 1540 Next, the 8-bit p.reach shift register in the poll process described 1541 in Section 13 is used to determine whether the server is reachable 1542 and the data are fresh. The register is shifted left by one bit when 1543 a packet is sent and the rightmost bit is set to zero. As valid 1544 packets arrive, the packet() routine sets the rightmost bit to one. 1545 If the register contains any nonzero bits, the server is considered 1546 reachable; otherwise, it is unreachable. Since the peer poll 1547 interval might have changed since the last packet, the poll_update() 1548 routine in Appendix A.5.7.2 is called to redetermine the host poll 1549 interval. 1551 The dispersion statistic epsilon(t) represents the maximum error due 1552 to the frequency tolerance and time since the last packet was sent. 1553 It is initialized 1555 epsilon(t_0) = r.rho + s.rho + PHI * (T4-T1) 1557 when the measurement is made at t_0 according to the seconds counter. 1558 Here r.rho is the packet precision described in Section 7.3 and s.rho 1559 is the system precision described in Section 11.1, both expressed in 1560 seconds. These terms are necessary to account for the uncertainty in 1561 reading the system clock in both the server and the client. 1563 The dispersion then grows at constant rate PHI; in other words, at 1564 time t, epsilon(t) = epsilon(t_0) + PHI * (t-t_0). With the default 1565 value PHI = 15 PPM, this amounts to about 1.3 s per day. With this 1566 understanding, the argument t will be dropped and the dispersion 1567 represented simply as epsilon. The remaining statistics are computed 1568 by the clock filter algorithm described in the next section. 1570 10. Clock Filter Algorithm 1572 The clock filter algorithm, part of the peer process, is implemented 1573 in the clock_filter() routine of Appendix A.5.2. It grooms the 1574 stream of on-wire data to select the samples most likely to represent 1575 accurate time. The algorithm produces the variables shown in 1576 Figure 19, including the offset (theta), delay (delta), dispersion 1577 (epsilon), jitter (psi) and time of arrival (t). These data are used 1578 by the mitigation algorithms to determine the best and final offset 1579 used to discipline the system clock. They are also used to determine 1580 the server health and whether it is suitable for synchronization. 1582 The clock filter algorithm saves the most recent sample tuples 1583 (theta, delta, epsilon, t) in the filter structure, which functions 1584 as an 8-stage shift register. The tuples are saved in the order that 1585 packets arrive. Here t is the packet time of arrival according to 1586 the seconds counter and should not be confused with the peer variable 1587 tp. 1589 The following scheme is used to insure sufficient samples are in the 1590 filter and that old stale data are discarded. Initially, the tuples 1591 of all stages are set to the dummy tuple (0, MAXDISP, MAXDISP, 0). 1592 As valid packets arrive, tuples are shifted into the filter causing 1593 old tuples to be discarded, so eventually only valid tuples remain. 1594 If the three low order bits of the reach register are zero, 1595 indicating three poll intervals have expired with no valid packets 1596 received, the poll process calls the clock filter algorithm with a 1597 dummy tuple just as if the tuple had arrived from the network. If 1598 this persists for eight poll intervals, the register returns to the 1599 initial condition. 1601 In the next step the shift register stages are copied to a temporary 1602 list and the list sorted by increasing delta. Let i index the stages 1603 starting with the lowest delta. If the first tuple epoch t_0 is not 1604 later than the last valid sample epoch tp, the routine exits without 1605 affecting the current peer variables. Otherwise, let epsilon_i be 1606 the dispersion of the ith entry, then 1607 i=n-1 1608 --- epsilon_i 1609 epsilon = \ ---------- 1610 / (i+1) 1611 --- 2 1612 i=0 1614 is the peer dispersion p.disp. Note the overload of epsilon, whether 1615 input to the clock filter or output, the meaning should be clear from 1616 context. 1618 The observer should note (a) if all stages contain the dummy tuple 1619 with dispersion MAXDISP, the computed dispersion is a little less 1620 than 16 s, (b) each time a valid tuple is shifted into the register, 1621 the dispersion drops by a little less than half, depending on the 1622 valid tuples dispersion, (c) after the fourth valid packet the 1623 dispersion is usually a little less than 1 s, which is the assumed 1624 value of the MAXDIST parameter used by the selection algorithm to 1625 determine whether the peer variables are acceptable or not. 1627 Let the first stage offset in the sorted list be theta_0; then, for 1628 the other stages in any order, the jitter is the RMS average 1629 +----- -----+^1/2 1630 | n-1 | 1631 | --- | 1632 1 | \ 2 | 1633 psi = -------- * | / (theta_0-theta_j) | 1634 (n-1) | --- | 1635 | j=1 | 1636 +----- -----+ 1638 where n is the number of valid tuples in the filter (n > 1). In 1639 order to insure consistency and avoid divide exceptions in other 1640 computations, the psi is bounded from below by the system precision 1641 s.rho expressed in seconds. While not in general considered a major 1642 factor in ranking server quality, jitter is a valuable indicator of 1643 fundamental timekeeping performance and network congestion state. Of 1644 particular importance to the mitigation algorithms is the peer 1645 synchronization distance, which is computed from the delay and 1646 dispersion. 1648 lambda = (delta / 2) + epsilon. 1650 Note that epsilon and therefore lambda increase at rate PHI. The 1651 lambda is not a state variable, since lambda is recalculated at each 1652 use. It is a component of the root synchronization distance used by 1653 the mitigation algorithms as a metric to evaluate the quality of time 1654 available from each server. 1656 It is important to note that, unlike NTPv3, NTPv4 associations do not 1657 show a timeout condition by setting the stratum to 16 and leap 1658 indicator to 3. The association variables retain the values 1659 determined upon arrival of the last packet. In NTPv4 lambda 1660 increases with time, so eventually the synchronization distance 1661 exceeds the distance threshold MAXDIST, in which case the association 1662 is considered unfit for synchronization. 1664 11. System Process 1666 As each new sample (theta, delta, epsilon, jitter, t) is produced by 1667 the clock filter algorithm, all peer processes are scanned by the 1668 mitigation algorithms consisting of the selection, cluster, combine 1669 and clock discipline algorithms in the system process. The selection 1670 algorithm scans all associations and casts off the falsetickers, 1671 which have demonstrably incorrect time, leaving the truechimers as 1672 result. In a series of rounds the cluster algorithm discards the 1673 association statistically furthest from the centroid until a 1674 specified minimum number of survivors remain. The combine algorithm 1675 produces the best and final statistics on a weighted average basis. 1676 The final offset is passed to the clock discipline algorithm to steer 1677 the system clock to the correct time. 1679 The cluster algorithm selects one of the survivors as the system 1680 peer. The associated statistics (theta, delta, epsilon, jitter, t) 1681 are used to construct the system variables inherited by dependent 1682 servers and clients and made available to other applications running 1683 on the same machine. 1685 11.1. System Process Variables 1687 Figure 25 summarizes the common names, formula names and a short 1688 description of each system variable. Unless noted otherwise, all 1689 variables have assumed prefix s. 1691 +-----------+------------+------------------------+ 1692 | Name | Formula | Description | 1693 +-----------+------------+------------------------+ 1694 | t | t | update time | 1695 | p | p | system peer identifier | 1696 | leap | leap | leap indicator | 1697 | stratum | stratum | stratum | 1698 | precision | rho | precision | 1699 | offset | THETA | combined offset | 1700 | jitter | PSI | combined jitter | 1701 | rootdelay | DELTA | root delay | 1702 | rootdisp | EPSILON | root dispersion | 1703 | v | v | survivor list | 1704 | refid | refid | reference ID | 1705 | reftime | reftime | reference time | 1706 | NMIN | 3 | minimum survivors | 1707 | CMIN | 1 | minimum candidates | 1708 +-----------+------------+------------------------+ 1710 Figure 25: System Process Variables 1712 Except for the t, p, offset and jitter variables and the NMIN and 1713 CMIN constants, the variables have the same format and interpretation 1714 as the peer variables of the same name. The NMIN and CMIN parameters 1715 are used by the selection and cluster algorithms described in the 1716 next section. 1718 The t variable is the seconds counter at the last update determined 1719 by the clock_update() routine in Appendix A.5.5.4. The p variable is 1720 the system peer identifier determined by the cluster() routine in 1721 Section 11.2.2. The precision variable has the same format as the 1722 packet variable of the same name. The precision is defined as the 1723 larger of the resolution and time to read the clock, in log2 units. 1724 For instance, the precision of a mains-frequency clock incrementing 1725 at 60 Hz is 16 ms, even when the system clock hardware representation 1726 is to the nanosecond. 1728 The offset and jitter variables are determined by the combine() 1729 routine in Section 11.2.3. These values represent the best and final 1730 offset and jitter used to discipline the system clock. Initially, 1731 all variables are cleared to zero, then the leap is set to 3 1732 (unsynchronized) and stratum is set to MAXSTRAT (16). Remember that 1733 MAXSTRAT is mapped to zero in the transmitted packet. 1735 11.2. System Process Operations 1737 Figure 26 summarizes the system process operations performed by the 1738 clock_select() routine. The selection algorithm described in 1739 Section 11.2.1 produces a majority clique of presumed correct 1740 candidates (truechimers) based on agreement principles. The cluster 1741 algorithm described in Section 11.2.2 discards outlyers to produce 1742 the most accurate survivors. The combine algorithm described in 1743 Section 11.2.3 provides the best and final offset for the clock 1744 discipline algorithm described in Appendix A.5.5.6. If the selection 1745 algorithm cannot produce a majority clique, or if it cannot produce 1746 at least CMIN survivors, the system process exits without 1747 disciplining the system clock. If successful, the cluster algorithm 1748 selects the statistically best candidate as the system peer and its 1749 variables are inherited as the system variables. 1751 +-----------------+ 1752 | clock_select() | 1753 +-----------------+ 1754 ................................|........... 1755 . V . 1756 . yes +---------+ +-----------------+ . 1757 . +--| accept? | | scan candidates | . 1758 . | +---------+ | | . 1759 . V no | | | . 1760 . +---------+ | | | . 1761 . | add peer| | | | . 1762 . +---------- | | | . 1763 . | V | | . 1764 . +---------->-->| | . 1765 . | | . 1766 . Selection Algorithm +-----------------+ . 1767 .................................|.......... 1768 V 1769 no +-------------------+ 1770 +-------------| survivors? | 1771 | +-------------------+ 1772 | | yes 1773 | V 1774 | +-------------------+ 1775 | | Cluster Algorithm | 1776 | +-------------------+ 1777 | | 1778 | V 1779 V yes +-------------------+ 1780 |<------------| n < CMIN? | 1781 | +-------------------+ 1782 V | 1783 +-----------------+ V no 1784 | s.p = NULL | +-------------------+ 1785 +-----------------+ | s.p = v_0.p | 1786 | +-------------------+ 1787 V | 1788 +-----------------+ V 1789 | return (UNSYNC) | +-------------------+ 1790 +-----------------+ | return (SYNC) | 1791 +-------------------+ 1793 Figure 26: clock_select() Routine 1795 11.2.1. Selection Algorithm 1797 Note that the selection and cluster algorithms are described 1798 separately, but combined in the code skeleton. The selection 1799 algorithm operates to find an intersection interval containing a 1800 majority clique of truechimers using Byzantine agreement principles 1801 originally proposed by Marzullo [8], but modified to improve 1802 accuracy. An overview of the algorithm is given below and in the 1803 first half of the clock_select() routine in Appendix A.5.5.1. 1805 First, those servers which are unusable according to the rules of the 1806 protocol are detected and discarded by the accept() routine in 1807 Appendix A.5.5.3. Next, a set of tuples (p, type, edge) is generated 1808 for the remaining candidates. Here, p is the association identifier 1809 and type identifies the upper (+1), middle (0) and lower (-1) 1810 endpoints of a correctness interval centered on theta for that 1811 candidate. This results in three tuples, lowpoint (p, -1, theta - 1812 lambda), midpoint (p, 0, theta) and highpoint (p, +1, theta + 1813 lambda), where lambda is the root synchronization distance calculated 1814 on each use by the rootdist() routine in Appendix A.5.1.1. The steps 1815 of the algorithm are: 1817 1. For each of m associations, place three tuples as defined above 1818 on the candidate list. 1820 2. Sort the tuples on the list by the edge component. Order the 1821 lowpoint, midpoint and highpoint of these intervals from lowest to 1822 highest. Set the number of falsetickers f = 0. 1824 3. Set the number of midpoints d = 0. Set c = 0. Scan from lowest 1825 endpoint to highest. Add one to c for every lowpoint, subtract one 1826 for every highpoint, add one to d for every midpoint. If c >= m - f, 1827 stop; set l = current lowpoint. 1829 4. Set c = 0. Scan from highest endpoint to lowest. Add one to c 1830 for every highpoint, subtract one for every lowpoint, add one to d 1831 for every midpoint. If c >= m - f, stop; set u = current highpoint. 1833 5. Is d = f and l < u? If yes, then follow step 5A; else, follow 1834 step 5B. 1836 5A. Success: the intersection interval is [l, u]. 1838 5B. Add one to f. Is f < (m / 2)? If yes, then go to step 3 again. 1839 If no, then go to step 6. 1841 6. Failure; a majority clique could not be found. There are no 1842 suitable candidates to discipline the system clock. 1844 The algorithm is described in detail in Appendix A.5.5.1. Note that 1845 it starts with the assumption that there are no falsetickers (f = 0) 1846 and attempts to find a nonempty intersection interval containing the 1847 midpoints of all correct servers, i.e., truechimers. If a nonempty 1848 interval cannot be found, it increases the number of assumed 1849 falsetickers by one and tries again. If a nonempty interval is found 1850 and the number of falsetickers is less than the number of 1851 truechimers, a majority clique has been found and the midpoint of 1852 each truechimer (theta) represents the candidates available to the 1853 cluster algorithm. 1855 If a majority clique is not found, or if the number of truechimers is 1856 less than CMIN, there are insufficient candidates to discipline the 1857 system clock. CMIN defines the minimum number of servers consistent 1858 with the correctness requirements. Suspicious operators would set 1859 CMIN to insure multiple redundant servers are available for the 1860 algorithms to mitigate properly. However, for historic reasons the 1861 default value for CMIN is one. 1863 11.2.2. Cluster Algorithm 1865 The candidates of the majority clique are placed on the survivor list 1866 v in the form of tuples (p, theta_p, psi_p, lambda_p), where p is an 1867 association identifier, theta_p, psi_p, and stratum_p the current 1868 offset, jitter and stratum of association p, respectively, and 1869 lambda_p is a merit factor equal to stratum_p * MAXDIST + lambda, 1870 where lambda is the root synchronization distance for association p. 1871 The list is processed by the cluster algorithm below and the second 1872 half of the clock_select() algorithm in Appendix A.5.5.1. 1874 1. Let (p, theta_p, psi_p, lambda_p) represent a survivor candidate. 1876 2. Sort the candidates by increasing lambda_p. Let n be the number 1877 of candidates and NMIN the minimum required number of survivors. 1879 3. For each candidate compute the selection jitter psi_s: 1881 +----- -----+^1/2 1882 | n-1 | 1883 | --- | 1884 | 1 \ 2 | 1885 psi_s = | ---- * / (theta_s - theta_j) | 1886 | n-1 --- | 1887 | j=1 | 1888 +----- -----+ 1890 4. Select psi_max as the candidate with maximum psi_s. 1892 5. Select psi_min as the candidate with minimum psi_p. 1894 6. Is psi_max < psi_min or n <= NMIN? If yes, follow step 6A; 1895 otherwise, follow step 6B. 1897 6A. Done. The remaining candidates on the survivor list are ranked 1898 in the order of preference. The first entry on the list represents 1899 the system peer; its variables are used later to update the system 1900 variables. 1902 6B. Delete the outlyer candidate with psi_max; reduce n by one and go 1903 back to step 3. 1905 The algorithm operates in a series of rounds where each round 1906 discards the statistical outlyer with maximum selection jitter psi_s. 1907 However, if psi_s is less than the minimum peer jitter psi_p, no 1908 improvement is possible by discarding outlyers. This and the minimum 1909 number of survivors represent the terminating conditions of the 1910 algorithm. Upon termination, the final value of psi_max is saved as 1911 the system selection jitter PSI_s for use later. 1913 11.2.3. Combine Algorithm 1915 The remaining survivors are processed by the clock_combine() routine 1916 in Appendix A.5.5.5 to produce the best and final data for the clock 1917 discipline algorithm. The clock_combine() routine processes peer 1918 offset and jitter statistics to produce the combined system offset 1919 THETA and system peer jitter PSI_p, where each server statistic is 1920 weighted by the reciprocal of the root synchronization distance and 1921 the result normalized. 1923 The combined THETA is passed to the clock_update() routine in 1924 Appendix A.5.5.4. The first candidate on the survivor list is 1925 nominated as the system peer with identifier p. The system peer 1926 jitter PSI_p is a component of the system jitter PSI. It is used 1927 along with the selection jitter PSI_s to produce the system jitter: 1929 PSI = [(PSI_s)^2 + (PSI_p)^2]^1/2 1931 Each time an update is received from the system peer, the 1932 clock_update() routine in Appendix A.5.5.4 is called. By rule, an 1933 update is discarded if its time of arrival p.t is not strictly later 1934 than the last update used s.t. The labels IGNOR, PANIC, ADJ and STEP 1935 refer to return codes from the local_clock() routine described in the 1936 next section. 1938 IGNORE means the update has been ignored as an outlyer. PANIC means 1939 the offset is greater than the panic threshold PANICT (1000 s) and 1940 SHOULD cause the program to exit with a diagnostic message to the 1941 system log. STEP means the offset is less than the panic threshold, 1942 but greater than the step threshold STEPT (125 ms). In this case the 1943 clock is stepped to the correct offset, but since this means all peer 1944 data have been invalidated, all associations MUST be reset and the 1945 client begins as at initial start. 1947 ADJ means the offset is less than the step threshold and thus a valid 1948 update. In this case the system variables are updated from the peer 1949 variables as shown in Figure 28. 1951 +-------------------------------------------+ 1952 | System Variable <-- System Peer Variable | | 1953 +-------------------------------------------+ 1954 | s.leap <-- p.leap | 1955 | s.stratum <-- p.stratum + 1 | 1956 | s.offset <-- THETA | 1957 | s.jitter <-- PSI | 1958 | s.rootdelay <-- p.delta_r + delta | 1959 | s.rootdisp <-- p.epsilon_r + p.epsilon + | 1960 | p.psi + PHI * (s.t - p.t) | 1961 | + |THETA| | 1962 | s.refid <-- p.refid | 1963 | s.reftime <-- p.reftime | 1964 | s.t <-- p.t | 1965 +-------------------------------------------+ 1967 Figure 28: System Variables Update 1969 There is an important detail not shown. The dispersion increment 1970 (p.epsilon + p.psi + PHI * (s.t - p.t) + |THETA|) is bounded from 1971 below by MINDISP. In subnets with very fast processors and networks 1972 and very small delay and dispersion this forces a monotone-definite 1973 increase in s.rootdisp (EPSILON), which avoids loops between peers 1974 operating at the same stratum. 1976 The system variables are available to dependent application programs 1977 as nominal performance statistics. The system offset THETA is the 1978 clock offset relative to the available synchronization sources. The 1979 system jitter PSI is an estimate of the error in determining this 1980 value, elsewhere called the expected error. The root delay DELTA is 1981 the total round trip delay relative to the primary server. The root 1982 dispersion EPSILON is the dispersion accumulated over the network 1983 from the primary server. Finally, the root synchronization distance 1984 is defined 1986 LAMBDA = EPSILON + DELTA / 2, 1987 which represents the maximum error due all causes and is designated 1988 the root synchronization distance. 1990 11.3. Clock Discipline Algorithm 1992 The NTPv4 clock discipline algorithm, shortened to discipline in the 1993 following, functions as a combination of two philosophically quite 1994 different feedback control systems. In a phase-locked loop (PLL) 1995 design, periodic phase updates at update intervals mu seconds are 1996 used directly to minimize the time error and indirectly the frequency 1997 error. In a frequency-locked loop (FLL) design, periodic frequency 1998 updates at intervals mu are used directly to minimize the frequency 1999 error and indirectly the time error. As shown in [5], a PLL usually 2000 works better when network jitter dominates, while a FLL works better 2001 when oscillator wander dominates. This section contains an outline 2002 of how the NTPv4 design works. An in-depth discussion of the design 2003 principles is provided in [5], which also includes a performance 2004 analysis. 2006 The discipline is implemented as the feedback control system shown in 2007 Figure 29. The variable theta_r represents the combine algorithm 2008 offset (reference phase) and theta_c the VFO offset (control phase). 2009 Each update produces a signal V_d representing the instantaneous 2010 phase difference theta_r - theta_c. The clock filter for each server 2011 functions as a tapped delay line, with the output taken at the tap 2012 selected by the clock filter algorithm. The selection, cluster and 2013 combine algorithms combine the data from multiple filters to produce 2014 the signal V_s. The loop filter, with impulse response F(t), 2015 produces the signal V_c which controls the VFO frequency omega_c and 2016 thus the integral of the phase theta_c which closes the loop. The 2017 V_c signal is generated by the clock adjust process in Section 12. 2018 The detailed equations that implement these functions are best 2019 presented in the routines of Appendix A.5.5.6 and Appendix A.5.6.1. 2021 theta_r + +---------\ +----------------+ 2022 NTP --------->| Phase \ V_d | | V_s 2023 theta_c - | Detector ------>| Clock Filter |----+ 2024 +-------->| / | | | 2025 | +---------/ +----------------+ | 2026 | | 2027 ----- | 2028 / \ | 2029 | VFO | | 2030 \ / | 2031 ----- ....................................... | 2032 ^ . Loop Filter . | 2033 | . +---------+ x +-------------+ . | 2034 | V_c . | |<-----| | . | 2035 +------.-| Clock | y | Phase/Freq |<---------+ 2036 . | Adjust |<-----| Prediction | . 2037 . | | | | . 2038 . +---------+ +-------------+ . 2039 ....................................... 2041 Figure 29: Clock Discipline Feedback Loop 2043 Ordinarily, the pseudo-linear feedback loop described above operates 2044 to discipline the system clock. However, there are cases where a 2045 nonlinear algorithm offers considerable improvement. One case is 2046 when the discipline starts without knowledge of the intrinsic clock 2047 frequency. The pseudo-linear loop takes several hours to develop an 2048 accurate measurement and during most of that time the poll interval 2049 cannot be increased. The nonlinear loop described below does this in 2050 15 minutes. Another case is when occasional bursts of large jitter 2051 are present due to congested network links. The state machine 2052 described below resists error bursts lasting less than 15 minutes. 2054 Figure 30 contains a summary of the variables and parameters 2055 including the variables (lower case) or parameters (upper case) name, 2056 formula name and short description. Unless noted otherwise, all 2057 variables have assumed prefix c. The variables t, tc, state, hyster 2058 and count are integers; the remaining variables are floating doubles. 2059 The function of each will be explained in the algorithm descriptions 2060 below. 2062 +--------+------------+--------------------------+ 2063 | Name | Formula | Description | 2064 +--------+------------+--------------------------+ 2065 | t | timer | seconds counter | 2066 | offset | theta | combined offset | 2067 | resid | theta_r | residual offset | 2068 | freq | phi | clock frequency | 2069 | jitter | psi | clock offset jitter | 2070 | wander | omega | clock frequency wander | 2071 | tc | tau | time constant (log2) | 2072 | state | state | state | 2073 | adj | adj | frequency adjustment | 2074 | hyster | hyster | hysteresis counter | 2075 | STEPT | 125 | step threshold (.125 s) | 2076 | WATCH | 900 | stepout thresh(s) | 2077 | PANICT | 1000 | panic threshold (1000 s) | 2078 | LIMIT | 30 | hysteresis limit | 2079 | PGATE | 4 | hysteresis gate | 2080 | TC | 16 | time constant scale | 2081 | AVG | 8 | averaging constant | 2082 +--------+------------+--------------------------+ 2084 Figure 30: Clock Discipline Variables and Parameters 2086 The discipline is implemented by the local_clock() routine, which is 2087 called from the clock_update() routine. The local_clock() routine in 2088 Appendix A.5.5.6 has two parts; the first implements the clock state 2089 machine and the second determines the time constant and thus the poll 2090 interval. 2092 The local_clock() routine exits immediately if the offset is greater 2093 than the panic threshold PANICT (1000 s). The state transition 2094 function is implemented by the rstclock() function in 2095 Appendix A.5.5.7. Figure 31 shows the state transition function used 2096 by this routine. It has four columns showing respectively the state 2097 name, predicate and action if the offset theta is less than the step 2098 threshold, the predicate and actions otherwise, and finally some 2099 comments. 2101 +-------+---------------------+-------------------+--------------+ 2102 | State | theta < STEP | theta > STEP | Comments | 2103 +-------+---------------------+-------------------+--------------+ 2104 | NSET | ->FREQ | ->FREQ | no frequency | 2105 | | adjust time | step time | file | 2106 +-------+---------------------+-------------------+--------------+ 2107 | FSET | ->SYNC | ->SYNC | frequency | 2108 | | adjust time | step time | file | 2109 +-------+---------------------+-------------------+--------------+ 2110 | SPIK | ->SYNC | if < 900 s ->SPIK | outlyer | 2111 | | adjust freq | else ->SYNC | detected | 2112 | | adjust time | step freq | | 2113 | | | step time | | 2114 +-------+---------------------+-------------------+--------------+ 2115 | FREQ | if < 900 s ->FREQ | if < 900 s ->FREQ | initial | 2116 | | else ->SYNC | else ->SYNC | frequency | 2117 | | step freq | step freq | | 2118 | | adjust time | adjust time | | 2119 +-------+---------------------+-------------------+--------------+ 2120 | SYNC | ->SYNC | if < 900 s ->SPIK | normal | 2121 | | adjust freq | else ->SYNC | operation | 2122 | | adjust time | step freq | | 2123 | | | step time | | 2124 +-------+---------------------+-------------------+--------------+ 2126 Figure 31: State Transition Function 2128 In the table entries the next state is identified by the arrow -> 2129 with the actions listed below. Actions such as adjust time and 2130 adjust frequency are implemented by the PLL/FLL feedback loop in the 2131 local_clock() routine. A step clock action is implemented by setting 2132 the clock directly, but this is done only after the stepout threshold 2133 WATCH (900 s) when the offset is more than the step threshold STEPT 2134 (.125 s). This resists clock steps under conditions of extreme 2135 network congestion. 2137 The jitter (psi) and wander (omega) statistics are computed using an 2138 exponential average with weight factor AVG. The time constant 2139 exponent (tau) is determined by comparing psi with the magnitude of 2140 the current offset theta. If the offset is greater than PGATE (4) 2141 times the clock jitter, the hysteresis counter hyster is reduced by 2142 two; otherwise, it is increased by one. If hyster increases to the 2143 upper limit LIMIT (30), tau is increased by one; if it decreases to 2144 the lower limit -LIMIT (-30), tau is decreased by one. Normally, tau 2145 hovers near MAXPOLL, but quickly decreases if a temperature spike 2146 causes a frequency surge. 2148 12. Clock Adjust Process 2150 The actual clock adjustment is performed by the clock_adjust() 2151 routine in Appendix A.5.6.1. It runs at one-second intervals to add 2152 the frequency correction and a fixed percentage of the residual 2153 offset theta_r. The theta_r is in effect the exponential decay of 2154 the theta value produced by the loop filter at each update. The TC 2155 parameter scales the time constant to match the poll interval for 2156 convenience. Note that the dispersion EPSILON increases by PHI at 2157 each second. 2159 The clock adjust process includes a timer interrupt facility driving 2160 the seconds counter c.t. It begins at zero when the service starts 2161 and increments once each second. At each interrupt the 2162 clock_adjust() routine is called to incorporate the clock discipline 2163 time and frequency adjustments, then the associations are scanned to 2164 determine if the seconds counter equals or exceeds the p.next state 2165 variable defined in the next section. If so, the poll process is 2166 called to send a packet and compute the next p.next value. 2168 13. Poll Process 2170 Each association supports a poll process that runs at regular 2171 intervals to construct and send packets in symmetric, client and 2172 broadcast server associations. It runs continuously, whether or not 2173 servers are reachable in order to manage the clock filter and reach 2174 register. 2176 13.1. Poll Process Variables 2178 Figure 32 summarizes the common names, formula names and a short 2179 description of the poll process variables(lower case) and parameters 2180 (upper case). Unless noted otherwise, all variables have assumed 2181 prefix p. 2183 +---------+---------+--------------------+ 2184 | Name | Formula | Description | 2185 +---------+---------+--------------------+ 2186 | hpoll | hpoll | host poll exponent | 2187 | last | last | last poll time | 2188 | next | next | next poll time | 2189 | reach | reach | reach register | 2190 | unreach | unreach | unreach counter | 2191 | UNREACH | 24 | unreach limit | 2192 | BCOUNT | 8 | burst count | 2193 | BURST | flag | burst enable | 2194 | IBURST | flag | iburst enable | 2195 +---------+---------+--------------------+ 2197 Figure 32: Poll Process Variables and Parameters 2199 The poll process variables are allocated in the association data 2200 structure along with the peer process variables. Following is a 2201 detailed description of the variables. The parameters will be called 2202 out in the following text. 2204 hpoll: signed integer representing the poll exponent, in log2 seconds 2206 last: integer representing the seconds counter when the most recent 2207 packet was sent 2209 next: integer representing the seconds counter when the next packet 2210 is to be sent 2212 reach: 8-bit integer shift register shared by the peer and poll 2213 processes 2215 unreach: integer representing the number of seconds the server has 2216 been unreachable 2218 13.2. Poll Process Operations 2220 As described previously, once each second the clock_adjust() routine 2221 in the clock adjust process is called. This routine calls the poll() 2222 routine in Appendix A.5.7.1 for each association in turn. If the 2223 time for the next poll message is greater than the seconds counter, 2224 the routine returns immediately. Symmetric (modes 1, 2), client 2225 (mode 3) and broadcast server (mode 5) associations routinely send 2226 packets. A broadcast client (mode 6) association runs the routine to 2227 update the reach and unreach variables, but does not send packets. 2228 The poll() routine calls the peer_xmit() routine in Appendix A.5.7.3 2229 to send a packet. If in a burst (burst > 0), nothing further is done 2230 except call the poll_update() routine to set the next poll interval. 2232 If not in a burst, the reach variable is shifted left by one bit, 2233 with zero replacing the rightmost bit. If the server has not been 2234 heard for the last three poll intervals, the clock_filter() routine 2235 is called to increase the dispersion as described in 2236 Appendix A.5.7.3. 2238 If the BURST flag is lit and the server is reachable and a valid 2239 source of synchronization is available, the client sends a burst of 2240 BCOUNT (8) packets at each poll interval. The interval between 2241 packets in the burst is two seconds. This is useful to accurately 2242 measure jitter with long poll intervals. If the IBURST flag is lit 2243 and this is the first packet sent when the server has been 2244 unreachable, the client sends a burst. This is useful to quickly 2245 reduce the synchronization distance below the distance threshold and 2246 synchronize the clock. 2248 If the P_MANY flag is lit in the p.flags word of the association, 2249 this is a manycast client association. Manycast client associations 2250 send client mode packets to designated multicast group addresses at 2251 MINPOLL intervals. The association starts out with a TTL of 1. If 2252 by the time of the next poll there are fewer than MINCLOCK servers 2253 have been mobilized, the ttl is increased by one. If the ttl reaches 2254 the limit TTLMAX, without finding MINCLOCK servers, the poll interval 2255 increases until reaching BEACON, when it starts over from the 2256 beginning. 2258 The poll() routine includes a feature that backs off the poll 2259 interval if the server becomes unreachable. If reach is nonzero, the 2260 server is reachable and unreach is set to zero; otherwise, unreach is 2261 incremented by one for each poll to the maximum UNREACH. Thereafter 2262 for each poll hpoll is increased by one, which doubles the poll 2263 interval up to the maximum MAXPOLL determined by the poll_update() 2264 routine. When the server again becomes reachable, unreach is set to 2265 zero, hpoll is reset to the tc system variable and operation resumes 2266 normally. 2268 A packet is sent by the xmit_packet() routine in Appendix A.3. Some 2269 header values are copied from the peer variables left by a previous 2270 packet and others from the system variables. Figure 33 shows which 2271 values are copied to each header field. In those implementations 2272 using floating double data types for root delay and root dispersion, 2273 these must be converted to NTP short format. All other fields are 2274 either copied intact from peer and system variables or struck as a 2275 timestamp from the system clock. 2277 +-----------------------------------+ 2278 | Packet Variable <-- Variable | 2279 +-----------------------------------+ 2280 | x.leap <-- s.leap | 2281 | x.version <-- s.version | 2282 | x.mode <-- s.mode | 2283 | x.stratum <-- s.stratum | 2284 | x.poll <-- s.poll | 2285 | x.precision <-- s.precision | 2286 | x.rootdelay <-- s.rootdelay | 2287 | x.rootdisp <-- s.rootdisp | 2288 | x.refid <-- s.refid | 2289 | x.reftime <-- s.reftime | 2290 | x.org <-- p.xmt | 2291 | x.rec <-- p.dst | 2292 | x.xmt <-- clock | 2293 | x.keyid <-- p.keyid | 2294 | x.digest <-- md5 digest | 2295 +-----------------------------------+ 2297 Figure 33: xmit_packet Packet Header 2299 The poll_update() routine shown in Appendix A.5.7.2 is called when a 2300 valid packet is received and immediately after a poll message has 2301 been sent. If in a burst, the poll interval is fixed at 2 s; 2302 otherwise, the host poll exponent hpoll is set to the minimum of 2303 ppoll from the last packet received and hpoll from the poll() 2304 routine, but not less than MINPOLL nor greater than MAXPOLL. Thus 2305 the clock discipline can be oversampled, but not undersampled. This 2306 is necessary to preserve subnet dynamic behavior and protect against 2307 protocol errors. 2309 The poll exponent is converted to an interval which when added to the 2310 last variable determines the next variable and thus the time for the 2311 next poll. Finally, the last variable is set to the current seconds 2312 counter. 2314 14. Simple Network Time Protocol (SNTP) 2316 Primary servers and clients complying with a subset of NTP, called 2317 the Simple Network Time Protocol (SNTPv4) [2], do not need to 2318 implement the mitigation algorithms described in Section 9 and 2319 following sections. SNTP is intended for primary servers equipped 2320 with a single reference clock, as well as for clients with a single 2321 upstream server and no dependent clients. The fully developed NTPv4 2322 implementation is intended for secondary servers with multiple 2323 upstream servers and multiple downstream servers or clients. Other 2324 than these considerations, NTP and SNTP servers and clients are 2325 completely interoperable and can be intermixed in NTP subnets. 2327 An SNTP primary server implementing the on-wire protocol described in 2328 Section 8 has no upstream servers except a single reference clock. 2329 In principle, it is indistinguishable from an NTP primary server that 2330 has the mitigation algorithms and therefore capable of mitigating 2331 between multiple reference clocks. 2333 Upon receiving a client request, an SNTP primary server constructs 2334 and sends the reply packet as described in Figure 34. Note that the 2335 dispersion field in the packet header must be updated as described in 2336 Section 5. 2338 +-----------------------------------+ 2339 | Packet Variable <-- Variable | 2340 +-----------------------------------+ 2341 | x.leap <-- s.leap | 2342 | x.version <-- r.version | 2343 | x.mode <-- 4 | 2344 | x.stratum <-- s.stratum | 2345 | x.poll <-- r.poll | 2346 | x.precision <-- s.precision | 2347 | x.rootdelay <-- s.rootdelay | 2348 | x.rootdisp <-- s.rootdisp | 2349 | x.refid <-- s.refid | 2350 | x.reftime <-- s.reftime | 2351 | x.org <-- r.xmt | 2352 | x.rec <-- r.dst | 2353 | x.xmt <-- clock | 2354 | x.keyid <-- r.keyid | 2355 | x.digest <-- md5 digest | 2356 +-----------------------------------+ 2358 Figure 34: fast_xmit Packet Header 2360 A SNTP client implementing the on-wire protocol has a single server 2361 and no dependent clients. It can operate with any subset of the NTP 2362 on-wire protocol, the simplest approach using only the transmit 2363 timestamp of the server packet and ignoring all other fields. 2364 However, the additional complexity to implement the full on-wire 2365 protocol is minimal so that a full implementation is encouraged. 2367 15. Security Considerations 2369 NTPv4 provides an optional authentication field that utilizes the MD5 2370 algorithm. MD5, as the case for SHA-1, is derived from MD4, which 2371 has long been known to be weak. In 2004, techniques for efficiently 2372 finding collisions in MD5 were announced. A summary of the 2373 apropriateness of MD5 can be found in [9]. 2375 In the case of NTP as specified herein, NTP broadcast clients are 2376 vulnerable to disruption by misbehaving or hostile SNTP or NTP 2377 broadcast servers elsewhere in the Internet. Access controls and/or 2378 cryptographic authentication means should be provided for additional 2379 security in such cases. 2381 Section 8 describes a potential security concern with the replay of 2382 client requests. Following the recommendations in that section 2383 provides protection against such attacks. 2385 16. IANA Considerations 2387 UDP/TCP Port 123 was previously assigned by IANA for this protocol. 2388 The IANA has assigned the IPv4 multicast group address 224.0.1.1 and 2389 the IPv6 multicast address ending :101 for NTP. This document 2390 introduces NTP extension fields allowing for the development of 2391 future extensions to the protocol, where a particular extension is to 2392 be identified by the Field Type sub-field within the extension field. 2393 IANA is requested to establish and maintain a registry for Extension 2394 Field Types associated with this protocol, populating this registry 2395 with no initial entries. As future needs arise, new Extension Field 2396 Types may be defined. Following the policies outlined in [10], new 2397 values are to be defined by IETF consensus. 2399 The IANA is requested to create a new registry for NTP Kiss-o'-Death 2400 codes. This should include the current codes defined in Section 7.4, 2401 and may be extended on a FCFS basis. The format of the registry is: 2403 +------+------------------------------------------------------------+ 2404 | Code | Meaning | 2405 +------+------------------------------------------------------------+ 2406 | ACST | The association belongs to a unicast server | 2407 | AUTH | Server authentication failed | 2408 | ... | ... | 2409 +------+------------------------------------------------------------+ 2411 Figure 35: Kiss Codes 2413 17. Acknowledgements 2415 The editors would like to thank Karen O'Donoghue, Brian Haberman, 2416 Greg Dowd, Mark Elliot, and Harlan Stenn for technical reviews of 2417 this document. 2419 18. References 2421 18.1. Informative References 2423 [1] Mills, D., "Network Time Protocol (Version 3) Specification, 2424 Implementation", RFC 1305, March 1992. 2426 [2] Mills, D., "Simple Network Time Protocol (SNTP) Version 4 for 2427 IPv4, IPv6 and OSI", RFC 4330, January 2006. 2429 [3] Mills, D.L., "The Autokey security architecture, protocol and 2430 algorithms. Electrical and Computer Engineering Technical 2431 Report 06-1-1", NDSS , January 2006. 2433 [4] Mills, D.L., Electrical and Computer Engineering Technical 2434 Report 06-6-1, NDSS, June 2006., "Network Time Protocol Version 2435 4 Reference and Implementation Guide.", 2006. 2437 [5] Mills, D.L., "Computer Network Time Synchronization - the 2438 Network Time Protocol. CRC Press, 304pp.", 2006. 2440 [6] International Telecommunications Union, "ITU-R TF.460 Standard- 2441 frequency and time-signal emissions", February 2002. 2443 [7] Bureau International des Poids et Mesures, "Comptes Rendus de 2444 la 15e CGPM", 1976. 2446 [8] Marzullo and S. Owicki, "Maintaining the time in a distributed 2447 system.", ACM Operating Systems Review 19 , July 1985. 2449 [9] Bellovin, S. and E. Rescorla, Proceedings of the 13th annual 2450 ISOC Network and Distributed System Security Symposium, 2451 "Deploying a new Hash Algorithm", February 2006. 2453 [10] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA 2454 Considerations Section in RFCs", BCP 26, RFC 2434, 2455 October 1998. 2457 18.2. Normative References 2459 [11] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, 2460 September 1981. 2462 [12] Bradner, S., "Key words for use in RFCs to Indicate Requirement 2463 Levels", BCP 14, RFC 2119, March 1997. 2465 [13] Postel, J., "Internet Protocol", STD 5, RFC 791, 2466 September 1981. 2468 [14] Postel, J., "User Datagram Protocol", STD 6, RFC 768, 2469 August 1980. 2471 [15] Simonsen, K., "Character Mnemonics and Character Sets", 2472 RFC 1345, June 1992. 2474 [16] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, 2475 April 1992. 2477 Appendix A. Code Skeleton 2479 This appendix is intended to describe the protocol and algorithms of 2480 an implementation in a general way using what is called a code 2481 skeleton program. This consists of a set of definitions, structures 2482 and code fragments which illustrate the protocol operations without 2483 the complexities of an actual implementation of the protocol. This 2484 program is not an executable and is not designed to run in the 2485 ordinary sense. It is designed to be compiled only in order to 2486 verify consistent variable and type usage. 2488 Most of the features of the reference implementation are included 2489 here, with the following exceptions: There are no provisions for 2490 reference clocks or public key (Autokey) cryptography. There is no 2491 huff-n'-puff filter, anti-clockhop hysteresis or monitoring 2492 provisions. Many of the values that can be tinkered in the reference 2493 implementation are assumed constants here. There are only minimal 2494 provisions for the kiss-o'death packet and no responding code. 2496 The program is not intended to be fast or compact, just to 2497 demonstrate the algorithms with sufficient fidelity to understand how 2498 they work. The code skeleton consists of eight segments, a header 2499 segment included by each of the other segments, plus a code segment 2500 for the main program, kernel I/O and system clock interfaces, and 2501 peer, system, clock_adjust and poll processes. These are presented 2502 in order below along with definitions and variables specific to each 2503 process. 2505 A.1. Global Definitions 2507 A.1.1. Definitions, Constants, Parameters 2509 #include /* avoids complaints about sqrt() */ 2510 #include /* for gettimeofday() and friends */ 2511 #include /* for malloc() and friends */ 2513 /* 2514 * Data types 2515 * 2516 * This program assumes the int data type is 32 bits and the long data 2517 * type is 64 bits. The native data type used in most calculations is 2518 * floating double. The data types used in some packet header fields 2519 * require conversion to and from this representation. Some header 2520 * fields involve partitioning an octet, here represeted by individual 2521 * octets. 2522 * 2523 * The 64-bit NTP timestamp format used in timestamp calculations is 2524 * unsigned seconds and fraction with the decimal point to the left of 2525 * bit 32. The only operation permitted with these values is 2526 * subtraction, yielding a signed 31-bit difference. The 32-bit NTP 2527 * short format used in delay and dispersion calculations is seconds and 2528 * fraction with the decimal point to the left of bit 16. The only 2529 * operations permitted with these values are addition and 2530 * multiplication by a constant. 2531 * 2532 * The IPv4 address is 32 bits, while the IPv6 address is 128 bits. The 2533 * message digest field is 128 bits as constructed by the MD5 algorithm. 2534 * The precision and poll interval fields are signed log2 seconds. 2535 */ 2536 typedef unsigned long tstamp; /* NTP timestamp format */ 2537 typedef unsigned int tdist; /* NTP short format */ 2538 typedef unsigned long ipaddr; /* IPv4 or IPv6 address */ 2539 typedef unsigned long digest; /* md5 digest */ 2540 typedef signed char s_char; /* precision and poll interval (log2) */ 2542 /* 2543 * Timestamp conversion macroni 2544 */ 2545 #define FRIC 65536. /* 2^16 as a double */ 2546 #define D2FP(r) ((tdist)((r) * FRIC)) /* NTP short */ 2547 #define FP2D(r) ((double)(r) / FRIC) 2549 #define FRAC 4294967296. /* 2^32 as a double */ 2550 #define D2LFP(a) ((tstamp)((a) * FRAC)) /* NTP timestamp */ 2551 #define LFP2D(a) ((double)(a) / FRAC) 2552 #define U2LFP(a) ((a).tv_sec + JAN_1970 << 32 + (unsigned long) \ 2553 ((a).tv_usec / 1e6 * FRAC)) 2555 /* 2556 * Arithmetic conversions 2557 */ 2558 #define LOG2D(a) ((a) < 0 ? 1. / (1L << -(a)) : \ 2559 1L << (a)) /* poll, etc. */ 2560 #define SQUARE(x) (x * x) 2561 #define SQRT(x) (sqrt(x)) 2563 /* 2564 * Global constants. Some of these might be converted to variables 2565 * which can be tinkered by configuration or computed on-fly. For 2566 * instance, the reference implementation computes PRECISION on-fly and 2567 * provides performance tuning for the defines marked with % below. 2568 */ 2569 #define VERSION 4 /* version number */ 2570 #define MINDISP .01 /* % minimum dispersion (s) */ 2571 #define MAXDISP 16 /* maximum dispersion (s) */ 2572 #define MAXDIST 1 /* % distance threshold (s) */ 2573 #define NOSYNC 0x3 /* leap unsync */ 2574 #define MAXSTRAT 16 /* maximum stratum (infinity metric) */ 2575 #define MINPOLL 6 /* % minimum poll interval (64 s)*/ 2576 #define MAXPOLL 17 /* % maximum poll interval (36.4 h) */ 2577 #define MINCLOCK 3 /* minimum manycast survivors */ 2578 #define MAXCLOCK 10 /* maximum manycast candidates */ 2579 #define TTLMAX 8 /* max ttl manycast */ 2580 #define BEACON 15 /* max interval between beacons */ 2582 #define PHI 15e-6 /* % frequency tolerance (15 PPM) */ 2583 #define NSTAGE 8 /* clock register stages */ 2584 #define NMAX 50 /* maximum number of peers */ 2585 #define NSANE 1 /* % minimum intersection survivors */ 2586 #define NMIN 3 /* % minimum cluster survivors */ 2588 /* 2589 * Global return values 2590 */ 2591 #define TRUE 1 /* boolean true */ 2592 #define FALSE 0 /* boolean false */ 2593 #define NULL 0 /* empty pointer */ 2595 /* 2596 * Local clock process return codes 2597 */ 2598 #define IGNORE 0 /* ignore */ 2599 #define SLEW 1 /* slew adjustment */ 2600 #define STEP 2 /* step adjustment */ 2601 #define PANIC 3 /* panic - no adjustment */ 2603 /* 2604 * System flags 2605 */ 2606 #define S_FLAGS 0 /* any system flags */ 2607 #define S_BCSTENAB 0x1 /* enable broadcast client */ 2609 /* 2610 * Peer flags 2611 */ 2612 #define P_FLAGS 0 /* any peer flags */ 2613 #define P_EPHEM 0x01 /* association is ephemeral */ 2614 #define P_BURST 0x02 /* burst enable */ 2615 #define P_IBURST 0x04 /* intial burst enable */ 2616 #define P_NOTRUST 0x08 /* authenticated access */ 2617 #define P_NOPEER 0x10 /* authenticated mobilization */ 2618 #define P_MANY 0x20 /* manycast client */ 2620 /* 2621 * Authentication codes 2622 */ 2623 #define A_NONE 0 /* no authentication */ 2624 #define A_OK 1 /* authentication OK */ 2625 #define A_ERROR 2 /* authentication error */ 2626 #define A_CRYPTO 3 /* crypto-NAK */ 2628 /* 2629 * Association state codes 2630 */ 2631 #define X_INIT 0 /* initialization */ 2632 #define X_STALE 1 /* timeout */ 2633 #define X_STEP 2 /* time step */ 2634 #define X_ERROR 3 /* authentication error */ 2635 #define X_CRYPTO 4 /* crypto-NAK received */ 2636 #define X_NKEY 5 /* untrusted key */ 2638 /* 2639 * Protocol mode definitionss 2640 */ 2641 #define M_RSVD 0 /* reserved */ 2642 #define M_SACT 1 /* symmetric active */ 2643 #define M_PASV 2 /* symmetric passive */ 2644 #define M_CLNT 3 /* client */ 2645 #define M_SERV 4 /* server */ 2646 #define M_BCST 5 /* broadcast server */ 2647 #define M_BCLN 6 /* broadcast client */ 2648 /* 2649 * Clock state definitions 2650 */ 2651 #define NSET 0 /* clock never set */ 2652 #define FSET 1 /* frequency set from file */ 2653 #define SPIK 2 /* spike detected */ 2654 #define FREQ 3 /* frequency mode */ 2655 #define SYNC 4 /* clock synchronized */ 2657 A.1.2. Packet Data Structures 2659 /* 2660 * The receive and transmit packets may contain an optional message 2661 * authentication code (MAC) consisting of a key identifier (keyid) and 2662 * message digest (mac). NTPv4 supports optional extension fields which 2663 * are inserted after the the header and before the MAC, but these are 2664 * not described here. 2665 * 2666 * Receive packet 2667 * 2668 * Note the dst timestamp is not part of the packet itself. It is 2669 * captured upon arrival and returned in the receive buffer along with 2670 * the buffer length and data. Note that some of the char fields are 2671 * packed in the actual header, but the details are omited here. 2672 */ 2673 struct r { 2674 ipaddr srcaddr; /* source (remote) address */ 2675 ipaddr dstaddr; /* destination (local) address */ 2676 char version; /* version number */ 2677 char leap; /* leap indicator */ 2678 char mode; /* mode */ 2679 char stratum; /* stratum */ 2680 char poll; /* poll interval */ 2681 s_char precision; /* precision */ 2682 tdist rootdelay; /* root delay */ 2683 tdist rootdisp; /* root dispersion */ 2684 char refid; /* reference ID */ 2685 tstamp reftime; /* reference time */ 2686 tstamp org; /* origin timestamp */ 2687 tstamp rec; /* receive timestamp */ 2688 tstamp xmt; /* transmit timestamp */ 2689 int keyid; /* key ID */ 2690 digest mac; /* message digest */ 2691 tstamp dst; /* destination timestamp */ 2692 } r; 2694 /* 2695 * Transmit packet 2696 */ 2697 struct x { 2698 ipaddr dstaddr; /* source (local) address */ 2699 ipaddr srcaddr; /* destination (remote) address */ 2700 char version; /* version number */ 2701 char leap; /* leap indicator */ 2702 char mode; /* mode */ 2703 char stratum; /* stratum */ 2704 char poll; /* poll interval */ 2705 s_char precision; /* precision */ 2706 tdist rootdelay; /* root delay */ 2707 tdist rootdisp; /* root dispersion */ 2708 char refid; /* reference ID */ 2709 tstamp reftime; /* reference time */ 2710 tstamp org; /* origin timestamp */ 2711 tstamp rec; /* receive timestamp */ 2712 tstamp xmt; /* transmit timestamp */ 2713 int keyid; /* key ID */ 2714 digest mac; /* message digest */ 2715 } x; 2717 A.1.3. Association Data Structures 2719 /* 2720 * Filter stage structure. Note the t member in this and other 2721 * structures refers to process time, not real time. Process time 2722 * increments by one second for every elapsed second of real time. 2723 */ 2724 struct f { 2725 tstamp t; /* update time */ 2726 double offset; /* clock ofset */ 2727 double delay; /* roundtrip delay */ 2728 double disp; /* dispersion */ 2729 } f; 2731 /* 2732 * Association structure. This is shared between the peer process and 2733 * poll process. 2734 */ 2735 struct p { 2737 /* 2738 * Variables set by configuration 2739 */ 2740 ipaddr srcaddr; /* source (remote) address */ 2741 ipaddr dstaddr; /* destination (local) address */ 2742 char version; /* version number */ 2743 char hmode; /* host mode */ 2744 int keyid; /* key identifier */ 2745 int flags; /* option flags */ 2747 /* 2748 * Variables set by received packet 2749 */ 2750 char leap; /* leap indicator */ 2751 char pmode; /* peer mode */ 2752 char stratum; /* stratum */ 2753 char ppoll; /* peer poll interval */ 2754 double rootdelay; /* root delay */ 2755 double rootdisp; /* root dispersion */ 2756 char refid; /* reference ID */ 2757 tstamp reftime; /* reference time */ 2758 #define begin_clear org /* beginning of clear area */ 2759 tstamp org; /* originate timestamp */ 2760 tstamp rec; /* receive timestamp */ 2761 tstamp xmt; /* transmit timestamp */ 2763 /* 2764 * Computed data 2765 */ 2766 double t; /* update time */ 2767 struct f f[NSTAGE]; /* clock filter */ 2768 double offset; /* peer offset */ 2769 double delay; /* peer delay */ 2770 double disp; /* peer dispersion */ 2771 double jitter; /* RMS jitter */ 2773 /* 2774 * Poll process variables 2775 */ 2776 char hpoll; /* host poll interval */ 2777 int burst; /* burst counter */ 2778 int reach; /* reach register */ 2779 int ttl; /* ttl (manycast) */ 2780 #define end_clear unreach /* end of clear area */ 2781 int unreach; /* unreach counter */ 2782 int outdate; /* last poll time */ 2783 int nextdate; /* next poll time */ 2784 } p; 2786 A.1.4. System Data Structures 2788 /* 2789 * Chime list. This is used by the intersection algorithm. 2790 */ 2791 struct m { /* m is for Marzullo */ 2792 struct p *p; /* peer structure pointer */ 2793 int type; /* high +1, mid 0, low -1 */ 2794 double edge; /* correctness interval edge */ 2795 } m; 2797 /* 2798 * Survivor list. This is used by the clustering algorithm. 2799 */ 2800 struct v { 2801 struct p *p; /* peer structure pointer */ 2802 double metric; /* sort metric */ 2803 } v; 2805 /* 2806 * System structure 2807 */ 2808 struct s { 2809 tstamp t; /* update time */ 2810 char leap; /* leap indicator */ 2811 char stratum; /* stratum */ 2812 char poll; /* poll interval */ 2813 char precision; /* precision */ 2814 double rootdelay; /* root delay */ 2815 double rootdisp; /* root dispersion */ 2816 char refid; /* reference ID */ 2817 tstamp reftime; /* reference time */ 2818 struct m m[NMAX]; /* chime list */ 2819 struct v v[NMAX]; /* survivor list */ 2820 struct p *p; /* association ID */ 2821 double offset; /* combined offset */ 2822 double jitter; /* combined jitter */ 2823 int flags; /* option flags */ 2824 int n; /* number of survivors */ 2825 } s; 2827 A.1.5. Local Clock Data Structures 2829 /* 2830 * Local clock structure 2831 */ 2832 struct c { 2833 tstamp t; /* update time */ 2834 int state; /* current state */ 2835 double offset; /* current offset */ 2836 double last; /* previous offset */ 2837 int count; /* jiggle counter */ 2838 double freq; /* frequency */ 2839 double jitter; /* RMS jitter */ 2840 double wander; /* RMS wander */ 2841 } c; 2843 A.1.6. Function Prototypes 2845 /* 2846 * Peer process 2847 */ 2848 void receive(struct r *); /* receive packet */ 2849 void packet(struct p *, struct r *); /* process packet */ 2850 void clock_filter(struct p *, double, double, double); /* filter */ 2851 double root_dist(struct p *); /* calculate root distance */ 2852 int fit(struct p *); /* determine fitness of server */ 2853 void clear(struct p *, int); /* clear association */ 2854 int access(struct r *); /* determine access restrictions */ 2856 /* 2857 * System process 2858 */ 2859 int main(); /* main program */ 2860 void clock_select(); /* find the best clocks */ 2861 void clock_update(struct p *); /* update the system clock */ 2862 void clock_combine(); /* combine the offsets */ 2864 /* 2865 * Local clock process 2866 */ 2867 int local_clock(struct p *, double); /* clock discipline */ 2868 void rstclock(int, double, double); /* clock state transition */ 2870 /* 2871 * Clock adjust process 2872 */ 2873 void clock_adjust(); /* one-second timer process */ 2874 /* 2875 * Poll process 2876 */ 2877 void poll(struct p *); /* poll process */ 2878 void poll_update(struct p *, int); /* update the poll interval */ 2879 void peer_xmit(struct p *); /* transmit a packet */ 2880 void fast_xmit(struct r *, int, int); /* transmit a reply packet */ 2882 /* 2883 * Utility routines 2884 */ 2885 digest md5(int); /* generate a message digest */ 2886 struct p *mobilize(ipaddr, ipaddr, int, int, int, int); /* mobilize */ 2887 struct p *find_assoc(struct r *); /* search the association table */ 2889 /* 2890 * Kernel interface 2891 */ 2892 struct r *recv_packet(); /* wait for packet */ 2893 void xmit_packet(struct x *); /* send packet */ 2894 void step_time(double); /* step time */ 2895 void adjust_time(double); /* adjust (slew) time */ 2896 tstamp get_time(); /* read time */ 2898 A.2. Main Program and Utility Routines 2900 /* 2901 * Definitions 2902 */ 2903 #define PRECISION -18 /* precision (log2 s) */ 2904 #define IPADDR 0 /* any IP address */ 2905 #define MODE 0 /* any NTP mode */ 2906 #define KEYID 0 /* any key identifier */ 2908 /* 2909 * main() - main program 2910 */ 2911 int 2912 main() 2913 { 2914 struct p *p; /* peer structure pointer */ 2915 struct r *r; /* receive packet pointer */ 2917 /* 2918 * Read command line options and initialize system variables. 2919 * The reference implementation measures the precision specific 2920 * to each machine by measuring the clock increments to read the 2921 * system clock. 2923 */ 2924 memset(&s, sizeof(s), 0); 2925 s.leap = NOSYNC; 2926 s.stratum = MAXSTRAT; 2927 s.poll = MINPOLL; 2928 s.precision = PRECISION; 2929 s.p = NULL; 2931 /* 2932 * Initialize local clock variables 2933 */ 2934 memset(&c, sizeof(c), 0); 2935 if (/* frequency file */ 0) { 2936 c.freq = /* freq */ 0; 2937 rstclock(FSET, 0, 0); 2938 } else { 2939 rstclock(NSET, 0, 0); 2940 } 2941 c.jitter = LOG2D(s.precision); 2943 /* 2944 * Read the configuration file and mobilize persistent 2945 * associations with spcified addresses, version, mode, key ID 2946 * and flags. 2947 */ 2948 while (/* mobilize configurated associations */ 0) { 2949 p = mobilize(IPADDR, IPADDR, VERSION, MODE, KEYID, 2950 P_FLAGS); 2951 } 2953 /* 2954 * Start the system timer, which ticks once per second. Then 2955 * read packets as they arrive, strike receive timestamp and 2956 * call the receive() routine. 2957 */ 2958 while (0) { 2959 r = recv_packet(); 2960 r->dst = get_time(); 2961 receive(r); 2962 } 2963 } 2965 /* 2966 * mobilize() - mobilize and initialize an association 2967 */ 2968 struct p 2969 *mobilize( 2970 ipaddr srcaddr, /* IP source address */ 2971 ipaddr dstaddr, /* IP destination address */ 2972 int version, /* version */ 2973 int mode, /* host mode */ 2974 int keyid, /* key identifier */ 2975 int flags /* peer flags */ 2976 ) 2977 { 2978 struct p *p; /* peer process pointer */ 2980 /* 2981 * Allocate and initialize association memory 2982 */ 2983 p = malloc(sizeof(struct p)); 2984 p->srcaddr = srcaddr; 2985 p->dstaddr = dstaddr; 2986 p->version = version; 2987 p->hmode = mode; 2988 p->keyid = keyid; 2989 p->hpoll = MINPOLL; 2990 clear(p, X_INIT); 2991 p->flags == flags; 2992 return (p); 2993 } 2995 /* 2996 * find_assoc() - find a matching association 2997 */ 2998 struct p /* peer structure pointer or NULL */ 2999 *find_assoc( 3000 struct r *r /* receive packet pointer */ 3001 ) 3002 { 3003 struct p *p; /* dummy peer structure pointer */ 3005 /* 3006 * Search association table for matching source 3007 * address, source port and mode. 3008 */ 3009 while (/* all associations */ 0) { 3010 if (r->srcaddr == p->srcaddr && r->mode == p->hmode) 3011 return(p); 3012 } 3013 return (NULL); 3014 } 3016 /* 3017 * md5() - compute message digest 3018 */ 3020 digest 3021 md5( 3022 int keyid /* key identifier */ 3023 ) 3024 { 3025 /* 3026 * Compute a keyed cryptographic message digest. The key 3027 * identifier is associated with a key in the local key cache. 3028 * The key is prepended to the packet header and extension fieds 3029 * and the result hashed by the MD5 algorithm as described in 3030 * RFC-1321. Return a MAC consisting of the 32-bit key ID 3031 * concatenated with the 128-bit digest. 3032 */ 3033 return (/* MD5 digest */ 0); 3034 } 3036 A.3. Kernel Input/Output Interface 3038 /* 3039 * Kernel interface to transmit and receive packets. Details are 3040 * deliberately vague and depend on the operating system. 3041 * 3042 * recv_packet - receive packet from network 3043 */ 3044 struct r /* receive packet pointer*/ 3045 *recv_packet() { 3046 return (/* receive packet r */ 0); 3047 } 3049 /* 3050 * xmit_packet - transmit packet to network 3051 */ 3052 void 3053 xmit_packet( 3054 struct x *x /* transmit packet pointer */ 3055 ) 3056 { 3057 /* send packet x */ 3058 } 3060 A.4. Kernel System Clock Interface 3062 /* 3063 * System clock utility functions 3064 * 3065 * There are three time formats: native (Unix), NTP and floating double. 3066 * The get_time() routine returns the time in NTP long format. The Unix 3067 * routines expect arguments as a structure of two signed 32-bit words 3068 * in seconds and microseconds (timeval) or nanoseconds (timespec). The 3069 * step_time() and adjust_time() routines expect signed arguments in 3070 * floating double. The simplified code shown here is for illustration 3071 * only and has not been verified. 3072 */ 3073 #define JAN_1970 2208988800UL /* 1970 - 1900 in seconds */ 3075 /* 3076 * get_time - read system time and convert to NTP format 3077 */ 3078 tstamp 3079 get_time() 3080 { 3081 struct timeval unix_time; 3083 /* 3084 * There are only two calls on this routine in the program. One 3085 * when a packet arrives from the network and the other when a 3086 * packet is placed on the send queue. Call the kernel time of 3087 * day routine (such as gettimeofday()) and convert to NTP 3088 * format. 3089 */ 3090 gettimeofday(&unix_time, NULL); 3091 return (U2LFP(unix_time)); 3092 } 3094 /* 3095 * step_time() - step system time to given offset valuet 3096 */ 3097 void 3098 step_time( 3099 double offset /* clock offset */ 3100 ) 3101 { 3102 struct timeval unix_time; 3103 tstamp ntp_time; 3105 /* 3106 * Convert from double to native format (signed) and add to the 3107 * current time. Note the addition is done in native format to 3108 * avoid overflow or loss of precision. 3109 */ 3110 gettimeofday(&unix_time, NULL); 3111 ntp_time = D2LFP(offset) + U2LFP(unix_time);; 3112 unix_time.tv_sec = ntp_time >> 32; 3113 unix_time.tv_usec = (long)((ntp_time - unix_time.tv_sec << 3114 32) / FRAC * 1e6); 3115 settimeofday(&unix_time, NULL); 3116 } 3118 /* 3119 * adjust_time() - slew system clock to given offset value 3120 */ 3121 void 3122 adjust_time( 3123 double offset /* clock offset */ 3124 ) 3125 { 3126 struct timeval unix_time; 3127 tstamp ntp_time; 3129 /* 3130 * Convert from double to native format (signed) and add to the 3131 * current time. 3132 */ 3133 ntp_time = D2LFP(offset); 3134 unix_time.tv_sec = ntp_time >> 32; 3135 unix_time.tv_usec = (long)((ntp_time - unix_time.tv_sec << 3136 32) / FRAC * 1e6); 3137 adjtime(&unix_time, NULL); 3138 } 3140 A.5. Peer Process 3142 /* 3143 * A crypto-NAK packet includes the NTP header followed by a MAC 3144 * consisting only of the key identifier with value zero. It tells the 3145 * receiver that a prior request could not be properly authenticated, 3146 * but the NTP header fields are correct. 3147 * 3148 * A kiss-o'-death packet is an NTP header with leap 0x3 (NOSYNC) and 3149 * stratum 16 (MAXSTRAT. It tells the receiver that something drastic 3150 * has happened, as revealled by the kiss code in the refid field. The 3151 * NTP header fields may or may not be correct. 3152 */ 3153 /* 3154 * Peer process parameters and constants 3155 */ 3156 #define SGATE 3 /* spike gate (clock filter */ 3157 #define BDELAY .004 /* broadcast delay (s) */ 3159 /* 3160 * Dispatch codes 3161 */ 3162 #define ERR -1 /* error */ 3163 #define DSCRD 0 /* discard packet */ 3164 #define PROC 1 /* process packet */ 3165 #define BCST 2 /* broadcast packet */ 3166 #define FXMIT 3 /* client packet */ 3167 #define NEWPS 4 /* new symmetric passive client */ 3168 #define NEWBC 5 /* new broadcast client */ 3170 /* 3171 * Dispatch matrix 3172 * active passv client server bcast */ 3173 int table[7][5] = { 3174 /* nopeer */ { NEWPS, DSCRD, FXMIT, MANY, NEWBC }, 3175 /* active */ { PROC, PROC, DSCRD, DSCRD, DSCRD }, 3176 /* passv */ { PROC, ERR, DSCRD, DSCRD, DSCRD }, 3177 /* client */ { DSCRD, DSCRD, DSCRD, PROC, DSCRD }, 3178 /* server */ { DSCRD, DSCRD, DSCRD, DSCRD, DSCRD }, 3179 /* bcast */ { DSCRD, DSCRD, DSCRD, DSCRD, DSCRD }, 3180 /* bclient */ { DSCRD, DSCRD, DSCRD, DSCRD, PROC} 3181 }; 3183 /* 3184 * Miscellaneous macroni 3185 * 3186 * This macro defines the authentication state. If x is 0, 3187 * authentication is optional, othewise it is required. 3188 */ 3189 #define AUTH(x, y) ((x) ? (y) == A_OK : (y) == A_OK || \ 3190 (y) == A_NONE) 3192 /* 3193 * These are used by the clear() routine 3194 */ 3195 #define BEGIN_CLEAR(p) ((char *)&((p)->begin_clear)) 3196 #define END_CLEAR(p) ((char *)&((p)->end_clear)) 3197 #define LEN_CLEAR (END_CLEAR((struct p *)0) - \ 3198 BEGIN_CLEAR((struct p *)0)) 3200 A.5.1. receive() 3202 /* 3203 * receive() - receive packet and decode modes 3204 */ 3205 void 3206 receive( 3207 struct r *r /* receive packet pointer */ 3208 ) 3210 { 3211 struct p *p; /* peer structure pointer 3212 int auth; /* authentication code */ 3213 int has_mac; /* size of MAC */ 3214 int synch; /* synchronized switch */ 3215 int auth; /* authentication code */ 3217 /* 3218 * Check access control lists. The intent here is to implement a 3219 * whitelist of those IP addresses specifically accepted and/or 3220 * a blacklist of those IP addresses specifically rejected. 3221 * There could be different lists for authenticated clients and 3222 * unauthenticated clients. 3223 */ 3224 if (!access(r)) 3225 return; /* access denied */ 3227 /* 3228 * The version must not be in the future. Format checks include 3229 * packet length, MAC length and extension field lengths, if 3230 * present. 3231 */ 3232 if (r->version > VERSION /* or format error */) 3233 return; /* format error */ 3235 /* 3236 * Authentication is conditioned by two switches which can be 3237 * specified on a per-client basis. 3238 * 3239 * P_NOPEER do not mobilize an association unless 3240 * authenticated 3241 * P_NOTRUST do not allow access unless authenticated 3242 * (implies P_NOPEER) 3243 * 3244 * There are four outcomes: 3245 * 3246 * A_NONE the packet has no MAC 3247 * A_OK the packet has a MAC and authentication 3248 * succeeds 3249 * A_ERROR the packet has a MAC and authentication fails 3250 * A_CRYPTO crypto-NAK. The MAC has four octets only. 3251 * 3252 * Note: The AUTH(x, y) macro is used to filter outcomes. If x 3253 * is zero, acceptable outcomes of y are NONE and OK. If x is 3254 * one, the only acceptable outcome of y is OK. 3255 */ 3256 has_mac = /* length of MAC field */ 0; 3257 if (has_mac == 0) { 3258 auth = A_NONE; /* not required */ 3259 } else if (has_mac == 4) { 3260 auth == A_CRYPTO; /* crypto-NAK */ 3261 } else { 3262 if (r->mac != md5(r->keyid)) 3263 auth = A_ERROR; /* auth error */ 3264 else 3265 auth = A_OK; /* auth OK */ 3266 } 3268 /* 3269 * Find association and dispatch code. If there is no 3270 * association to match, the value of p->hmode is assumed NULL. 3271 */ 3272 p = find_assoc(r); 3273 switch(table[p->hmode][r->mode]) { 3275 /* 3276 * Client packet and no association. Send server reply without 3277 * saving state. 3278 */ 3279 case FXMIT: 3281 /* 3282 * If unicast destination address, send server packet. 3283 * If authentication fails, send a crypto-NAK packet. 3284 /* 3285 if (/* not multicast dstaddr */0) { 3286 if (AUTH(p->flags & P_NOTRUST, auth)) 3287 fast_xmit(r, M_SERV, auth); 3288 else if (auth == A_ERROR) 3289 fast_xmit(r, M_SERV, A_CRYPTO); 3290 return; /* M_SERV packet sent */ 3291 } 3293 /* 3294 * This must be manycast. Do not repspond if we are not 3295 * synchronized or if our stratum is above the 3296 * manycaster. 3297 */ 3298 if (s.leap == NOSYNC || s.stratum > r->stratum) 3299 return; 3301 /* 3302 * Respond only if authentication is ok. Note that the 3303 * unicast addreess is used, not the multicast. 3304 */ 3305 if (AUTH(p->flags & P_NOTRUST, auth)) 3306 fast_xmit(r, M_SERV, auth); 3307 return; 3309 /* 3310 * New manycase client ephemeral association. It is mobilized in 3311 * the same version as in the packet. If authentication fails, 3312 * ignore the packet. Verify the server packet by comparing the 3313 * r->org timestamp in the packet with the p->xmt timestamp in 3314 * the multicast client association. If they match, the server 3315 * packet is authentic. Details omitted. 3316 */ 3317 case MANY: 3318 if (!AUTH(p->flags & (P_NOTRUST | P_NOPEER), auth)) 3319 return; /* authentication error */ 3321 p = mobilize(r->srcaddr, r->dstaddr, r->version, M_CLNT, 3322 r->keyid, P_EPHEM); 3323 break; 3325 /* 3326 * New symmetric passive ssociation. It is mobilized in the same 3327 * version as in the packet. If authentication fails, send a 3328 * crypto-NAK packet. If restrict no-moblize, send a symmetric 3329 * active packet instead. 3330 */ 3331 case NEWPS: 3332 if (!AUTH(p->flags & P_NOTRUST, auth)) { 3333 if (auth == A_ERROR) 3334 fast_xmit(r, M_SACT, A_CRYPTO); 3335 return; /* crypto-NAK packet sent */ 3336 } 3337 if (!AUTH(p->flags & P_NOPEER, auth)) { 3338 fast_xmit(r, M_SACT, auth); 3339 return; /* M_SACT packet sent */ 3340 } 3341 p = mobilize(r->srcaddr, r->dstaddr, r->version, M_PASV, 3342 r->keyid, P_EPHEM); 3343 break; 3345 /* 3346 * New broadcast client association. It is mobilized in the same 3347 * version as in the packet. If authentication fails, ignore the 3348 * packet. Note this code does not support the initial volley 3349 * feature in the reference implementation. 3350 */ 3351 case NEWBC: 3352 if (!AUTH(p->flags & (P_NOTRUST | P_NOPEER), auth)) 3353 return; /* authentication error */ 3355 if (!(s.flags & S_BCSTENAB)) 3356 return; /* broadcast not enabled */ 3358 p = mobilize(r->srcaddr, r->dstaddr, r->version, M_BCLN, 3359 r->keyid, P_EPHEM); 3360 break; /* processing continues */ 3362 /* 3363 * Process packet. Placeholdler only. 3364 */ 3365 case PROC: 3366 break; /* processing continues */ 3368 /* 3369 * Invalid mode combination. We get here only in case of 3370 * ephemeral associations, so the correct action is simply to 3371 * toss it. 3372 */ 3373 case ERR: 3374 clear(p, X_ERROR); 3375 return; /* invalid mode combination */ 3377 /* 3378 * No match; just discard the packet. 3379 */ 3380 case DSCRD: 3381 return; /* orphan abandoned */ 3382 } 3384 /* 3385 * Next comes a rigorous schedule of timestamp checking. If the 3386 * transmit timestamp is zero, the server is horribly broken. 3387 */ 3388 if (r->xmt == 0) 3389 return; /* invalid timestamp */ 3391 /* 3392 * If the transmit timestamp duplicates a previous one, the 3393 * packet is a replay. 3394 */ 3395 if (r->xmt == p->xmt) 3396 return; /* duplicate packet */ 3398 /* 3399 * If this is a broadcast mode packet, skip further checking. 3400 * If the origin timestamp is zero, the sender has not yet heard 3401 * from us. Otherwise, if the origin timestamp does not match 3402 * the transmit timestamp, the packet is bogus. 3404 */ 3405 synch = TRUE; 3406 if (r->mode != M_BCST) { 3407 if (r->org == 0) 3408 synch = FALSE; /* unsynchronized */ 3410 else if (r->org != p->xmt) 3411 synch = FALSE; /* bogus packet */ 3412 } 3414 /* 3415 * Update the origin and destination timestamps. If 3416 * unsynchronized or bogus, abandon ship. 3417 */ 3418 p->org = r->xmt; 3419 p->rec = r->dst; 3420 if (!synch) 3421 return; /* unsynch */ 3423 /* 3424 * The timestamps are valid and the receive packet matches the 3425 * last one sent. If the packet is a crypto-NAK, the server 3426 * might have just changed keys. We demobilize the association 3427 * and wait for better times. 3428 */ 3429 if (auth == A_CRYPTO) { 3430 clear(p, X_CRYPTO); 3431 return; /* crypto-NAK */ 3432 } 3434 /* 3435 * If the association is authenticated, the key ID is nonzero 3436 * and received packets must be authenticated. This is designed 3437 * to avoid a bait-and-switch attack, which was possible in past 3438 * versions. 3439 */ 3440 if (!AUTH(p->keyid || (p->flags & P_NOTRUST), auth)) 3441 return; /* bad auth */ 3443 /* 3444 * Everything possible has been done to validate the timestamps 3445 * and prevent bad guys from disrupting the protocol or 3446 * injecting bogus data. Earn some revenue. 3447 */ 3448 packet(p, r); 3449 } 3451 A.5.1.1. packet() 3452 /* 3453 * packet() - process packet and compute offset, delay and 3454 * dispersion. 3455 */ 3456 void 3457 packet( 3458 struct p *p, /* peer structure pointer */ 3459 struct r *r /* receive packet pointer */ 3460 ) 3461 { 3462 double offset; /* sample offsset */ 3463 double delay; /* sample delay */ 3464 double disp; /* sample dispersion */ 3466 /* 3467 * By golly the packet is valid. Light up the remaining header 3468 * fields. Note that we map stratum 0 (unspecified) to MAXSTRAT 3469 * to make stratum comparisons simpler and to provide a natural 3470 * interface for radio clock drivers that operate for 3471 * convenience at stratum 0. 3472 */ 3473 p->leap = r->leap; 3474 if (r->stratum == 0) 3475 p->stratum = MAXSTRAT; 3476 else 3477 p->stratum = r->stratum; 3478 p->pmode = r->mode; 3479 p->ppoll = r->poll; 3480 p->rootdelay = FP2D(r->rootdelay); 3481 p->rootdisp = FP2D(r->rootdisp); 3482 p->refid = r->refid; 3483 p->reftime = r->reftime; 3485 /* 3486 * Verify the server is synchronized with valid stratum and 3487 * reference time not later than the transmit time. 3488 */ 3489 if (p->leap == NOSYNC || p->stratum >= MAXSTRAT) 3490 return; /* unsynchronized */ 3492 /* 3493 * Verify valid root distance. 3494 */ 3495 if (r->rootdelay / 2 + r->rootdisp >= MAXDISP || p->reftime > 3496 r->xmt) 3497 return; /* invalid header values */ 3499 poll_update(p, p->hpoll); 3500 p->reach |= 1; 3502 /* 3503 * Calculate offset, delay and dispersion, then pass to the 3504 * clock filter. Note carefully the implied processing. The 3505 * first-order difference is done directly in 64-bit arithmetic, 3506 * then the result is converted to floating double. All further 3507 * processing is in floating double arithmetic with rounding 3508 * done by the hardware. This is necessary in order to avoid 3509 * overflow and preseve precision. 3510 * 3511 * The delay calculation is a special case. In cases where the 3512 * server and client clocks are running at different rates and 3513 * with very fast networks, the delay can appear negative. In 3514 * order to avoid violating the Principle of Least Astonishment, 3515 * the delay is clamped not less than the system precision. 3516 */ 3517 if (p->pmode == M_BCST) { 3518 offset = LFP2D(r->xmt - r->dst); 3519 delay = BDELAY; 3520 disp = LOG2D(r->precision) + LOG2D(s.precision) + PHI * 3521 2 * BDELAY; 3522 } else { 3523 offset = (LFP2D(r->rec - r->org) + LFP2D(r->dst - 3524 r->xmt)) / 2; 3525 delay = max(LFP2D(r->dst - r->org) - LFP2D(r->rec - 3526 r->xmt), LOG2D(s.precision)); 3527 disp = LOG2D(r->precision) + LOG2D(s.precision) + PHI * 3528 LFP2D(r->dst - r->org); 3529 } 3530 clock_filter(p, offset, delay, disp); 3531 } 3533 A.5.2. clock_filter() 3535 /* 3536 * clock_filter(p, offset, delay, dispersion) - select the best from the 3537 * latest eight delay/offset samples. 3538 */ 3539 void 3540 clock_filter( 3541 struct p *p, /* peer structure pointer */ 3542 double offset, /* clock offset */ 3543 double delay, /* roundtrip delay */ 3544 double disp /* dispersion */ 3545 ) 3546 { 3547 struct f f[NSTAGE]; /* sorted list */ 3548 double dtemp; 3549 int i; 3551 /* 3552 * The clock filter contents consist of eight tuples (offset, 3553 * delay, dispersion, time). Shift each tuple to the left, 3554 * discarding the leftmost one. As each tuple is shifted, 3555 * increase the dispersion since the last filter update. At the 3556 * same time, copy each tuple to a temporary list. After this, 3557 * place the (offset, delay, disp, time) in the vacated 3558 * rightmost tuple. 3559 */ 3560 for (i = 1; i < NSTAGE; i++) { 3561 p->f[i] = p->f[i - 1]; 3562 p->f[i].disp += PHI * (c.t - p->t); 3563 f[i] = p->f[i]; 3564 } 3565 p->f[0].t = c.t; 3566 p->f[0].offset = offset; 3567 p->f[0].delay = delay; 3568 p->f[0].disp = disp; 3569 f[0] = p->f[0]; 3571 /* 3572 * Sort the temporary list of tuples by increasing f[].delay. 3573 * The first entry on the sorted list represents the best 3574 * sample, but it might be old. 3575 */ 3576 dtemp = p->offset; 3577 p->offset = f[0].offset; 3578 p->delay = f[0].delay; 3579 for (i = 0; i < NSTAGE; i++) { 3580 p->disp += f[i].disp / (2 ^ (i + 1)); 3581 p->jitter += SQUARE(f[i].offset - f[0].offset); 3582 } 3583 p->jitter = max(SQRT(p->jitter), LOG2D(s.precision)); 3585 /* 3586 * Prime directive: use a sample only once and never a sample 3587 * older than the latest one, but anything goes before first 3588 * synchronized. 3589 */ 3590 if (f[0].t - p->t <= 0 && s.leap != NOSYNC) 3591 return; 3593 /* 3594 * Popcorn spike suppressor. Compare the difference between the 3595 * last and current offsets to the current jitter. If greater 3596 * than SGATE (3) and if the interval since the last offset is 3597 * less than twice the system poll interval, dump the spike. 3598 * Otherwise, and if not in a burst, shake out the truechimers. 3599 */ 3600 if (fabs(p->offset - dtemp) > SGATE * p->jitter && (f[0].t - 3601 p->t) < 2 * s.poll) 3602 return; 3604 p->t = f[0].t; 3605 if (p->burst == 0) 3606 clock_select(); 3607 return; 3608 } 3610 /* 3611 * root_dist() - calculate root distance 3612 */ 3613 double 3614 root_dist( 3615 struct p *p /* peer structure pointer */ 3616 ) 3617 { 3618 /* 3619 * The root synchronization distance is the maximum error due to 3620 * all causes of the local clock relative to the primary server. 3621 * It is defined as half the total delay plus total dispersion 3622 * plus peer jitter. 3623 */ 3624 return (max(MINDISP, p->rootdelay + p->delay) / 2 + 3625 p->rootdisp + p->disp + PHI * (c.t - p->t) + p->jitter); 3626 } 3628 /* 3629 * fit() - test if association p is acceptable for synchronization 3630 */ 3631 int 3632 fit( 3633 struct p *p /* peer structure pointer */ 3634 ) 3635 { 3636 /* 3637 * A stratum error occurs if (1) the server has never been 3638 * synchronized, (2) the server stratum is invalid. 3639 */ 3640 if (p->leap == NOSYNC || p->stratum >= MAXSTRAT) 3641 return (FALSE); 3643 /* 3644 * A distance error occurs if the root distance exceeds the 3645 * distance threshold plus an increment equal to one poll 3646 * interval. 3647 */ 3648 if (root_dist(p) > MAXDIST + PHI * LOG2D(s.poll)) 3649 return (FALSE); 3651 /* 3652 * A loop error occurs if the remote peer is synchronized to the 3653 * local peer or the remote peer is synchronized to the current 3654 * system peer. Note this is the behavior for IPv4; for IPv6 the 3655 * MD5 hash is used instead. 3656 */ 3657 if (p->refid == p->dstaddr || p->refid == s.refid) 3658 return (FALSE); 3660 /* 3661 * An unreachable error occurs if the server is unreachable. 3662 */ 3663 if (p->reach == 0) 3664 return (FALSE); 3666 return (TRUE); 3667 } 3669 /* 3670 * clear() - reinitialize for persistent association, demobilize 3671 * for ephemeral association. 3672 */ 3673 void 3674 clear( 3675 struct p *p, /* peer structure pointer */ 3676 int kiss /* kiss code */ 3677 ) 3678 { 3679 int i; 3681 /* 3682 * The first thing to do is return all resources to the bank. 3683 * Typical resources are not detailed here, but they include 3684 * dynamically allocated structures for keys, certificates, etc. 3685 * If an ephemeral association and not initialization, return 3686 * the association memory as well. 3687 */ 3688 /* return resources */ 3689 if (s.p == p) 3690 s.p = NULL; 3691 if (kiss != X_INIT && (p->flags & P_EPHEM)) { 3692 free(p); 3693 return; 3694 } 3696 /* 3697 * Initialize the association fields for general reset. 3698 */ 3699 memset(BEGIN_CLEAR(p), LEN_CLEAR, 0); 3700 p->leap = NOSYNC; 3701 p->stratum = MAXSTRAT; 3702 p->ppoll = MAXPOLL; 3703 p->hpoll = MINPOLL; 3704 p->disp = MAXDISP; 3705 p->jitter = LOG2D(s.precision); 3706 p->refid = kiss; 3707 for (i = 0; i < NSTAGE; i++) 3708 p->f[i].disp = MAXDISP; 3710 /* 3711 * Randomize the first poll just in case thousands of broadcast 3712 * clients have just been stirred up after a long absence of the 3713 * broadcast server. 3714 */ 3715 p->outdate = p->t = c.t; 3716 p->nextdate = p->outdate + (random() & ((1 << MINPOLL) - 1)); 3717 } 3719 A.5.3. fast_xmit() 3721 /* 3722 * fast_xmit() - transmit a reply packet for receive packet r 3723 */ 3724 void 3725 fast_xmit( 3726 struct r *r, /* receive packet pointer */ 3727 int mode, /* association mode */ 3728 int auth /* authentication code */ 3729 ) 3730 { 3731 struct x x; 3733 /* 3734 * Initialize header and transmit timestamp. Note that the 3735 * transmit version is copied from the receive version. This is 3736 * for backward compatibility. 3737 */ 3738 x.version = r->version; 3739 x.srcaddr = r->dstaddr; 3740 x.dstaddr = r->srcaddr; 3741 x.leap = s.leap; 3742 x.mode = mode; 3743 if (s.stratum == MAXSTRAT) 3744 x.stratum = 0; 3745 else 3746 x.stratum = s.stratum; 3747 x.poll = r->poll; 3748 x.precision = s.precision; 3749 x.rootdelay = D2FP(s.rootdelay); 3750 x.rootdisp = D2FP(s.rootdisp); 3751 x.refid = s.refid; 3752 x.reftime = s.reftime; 3753 x.org = r->xmt; 3754 x.rec = r->dst; 3755 x.xmt = get_time(); 3757 /* 3758 * If the authentication code is A.NONE, include only the 3759 * header; if A.CRYPTO, send a crypto-NAK; if A.OK, send a valid 3760 * MAC. Use the key ID in the received packet and the key in the 3761 * local key cache. 3762 */ 3763 if (auth != A_NONE) { 3764 if (auth == A_CRYPTO) { 3765 x.keyid = 0; 3766 } else { 3767 x.keyid = r->keyid; 3768 x.mac = md5(x.keyid); 3769 } 3770 } 3771 xmit_packet(&x); 3772 } 3773 A.5.4. access() 3775 /* 3776 * access() - determine access restrictions 3777 */ 3778 int 3779 access( 3780 struct r *r /* receive packet pointer */ 3781 ) 3782 { 3783 /* 3784 * The access control list is an ordered set of tuples 3785 * consisting of an address, mask and restrict word containing 3786 * defined bits. The list is searched for the first match on the 3787 * source address (r->srcaddr) and the associated restrict word 3788 * is returned. 3789 */ 3790 return (/* access bits */ 0); 3791 } 3793 A.5.5. System Process 3795 A.5.5.1. clock_select() 3797 /* 3798 * clock_select() - find the best clocks 3799 */ 3800 void 3801 clock_select() { 3802 struct p *p, *osys; /* peer structure pointers */ 3803 double low, high; /* correctness interval extents */ 3804 int allow, found, chime; /* used by intersecion algorithm */ 3805 int n, i, j; 3807 /* 3808 * We first cull the falsetickers from the server population, 3809 * leaving only the truechimers. The correctness interval for 3810 * association p is the interval from offset - root_dist() to 3811 * offset + root_dist(). The object of the game is to find a 3812 * majority clique; that is, an intersection of correctness 3813 * intervals numbering more than half the server population. 3814 * 3815 * First construct the chime list of tuples (p, type, edge) as 3816 * shown below, then sort the list by edge from lowest to 3817 * highest. 3818 */ 3819 osys = s.p; 3820 s.p = NULL; 3821 n = 0; 3822 while (fit(p)) { 3823 s.m[n].p = p; 3824 s.m[n].type = +1; 3825 s.m[n].edge = p->offset + root_dist(p); 3826 n++; 3827 s.m[n].p = p; 3828 s.m[n].type = 0; 3829 s.m[n].edge = p->offset; 3830 n++; 3831 s.m[n].p = p; 3832 s.m[n].type = -1; 3833 s.m[n].edge = p->offset - root_dist(p); 3834 n++; 3835 } 3837 /* 3838 * Find the largest contiguous intersection of correctness 3839 * intervals. Allow is the number of allowed falsetickers; found 3840 * is the number of midpoints. Note that the edge values are 3841 * limited to the range +-(2 ^ 30) < +-2e9 by the timestamp 3842 * calculations. 3843 */ 3844 low = 2e9; high = -2e9; 3845 for (allow = 0; 2 * allow < n; allow++) { 3847 /* 3848 * Scan the chime list from lowest to highest to find 3849 * the lower endpoint. 3850 */ 3851 found = 0; 3852 chime = 0; 3853 for (i = 0; i < n; i++) { 3854 chime -= s.m[i].type; 3855 if (chime >= n - found) { 3856 low = s.m[i].edge; 3857 break; 3858 } 3859 if (s.m[i].type == 0) 3860 found++; 3861 } 3863 /* 3864 * Scan the chime list from highest to lowest to find 3865 * the upper endpoint. 3866 */ 3867 chime = 0; 3868 for (i = n - 1; i >= 0; i--) { 3869 chime += s.m[i].type; 3870 if (chime >= n - found) { 3871 high = s.m[i].edge; 3872 break; 3873 } 3874 if (s.m[i].type == 0) 3875 found++; 3876 } 3878 /* 3879 * If the number of midpoints is greater than the number 3880 * of allowed falsetickers, the intersection contains at 3881 * least one truechimer with no midpoint. If so, 3882 * increment the number of allowed falsetickers and go 3883 * around again. If not and the intersection is 3884 * nonempty, declare success. 3885 */ 3886 if (found > allow) 3887 continue; 3889 if (high > low) 3890 break; 3891 } 3893 /* 3894 * Clustering algorithm. Construct a list of survivors (p, 3895 * metric) from the chime list, where metric is dominated first 3896 * by stratum and then by root distance. All other things being 3897 * equal, this is the order of preference. 3898 */ 3899 s.n = 0; 3900 for (i = 0; i < n; i++) { 3901 if (s.m[i].edge < low || s.m[i].edge > high) 3902 continue; 3904 p = s.m[i].p; 3905 s.v[n].p = p; 3906 s.v[n].metric = MAXDIST * p->stratum + root_dist(p); 3907 s.n++; 3908 } 3910 /* 3911 * There must be at least NSANE survivors to satisfy the 3912 * correctness assertions. Ordinarily, the Byzantine criteria 3913 * require four, survivors, but for the demonstration here, one 3914 * is acceptable. 3915 */ 3916 if (s.n < NSANE) 3917 return; 3919 /* 3920 * For each association p in turn, calculate the selection 3921 * jitter p->sjitter as the square root of the sum of squares 3922 * (p->offset - q->offset) over all q associations. The idea is 3923 * to repeatedly discard the survivor with maximum selection 3924 * jitter until a termination condition is met. 3925 */ 3926 while (1) { 3927 struct p *p, *q, *qmax; /* peer structure pointers */ 3928 double max, min, dtemp; 3930 max = -2e9; min = 2e9; 3931 for (i = 0; i < s.n; i++) { 3932 p = s.v[i].p; 3933 if (p->jitter < min) 3934 min = p->jitter; 3935 dtemp = 0; 3936 for (j = 0; j < n; j++) { 3937 q = s.v[j].p; 3938 dtemp += SQUARE(p->offset - q->offset); 3939 } 3940 dtemp = SQRT(dtemp); 3941 if (dtemp > max) { 3942 max = dtemp; 3943 qmax = q; 3944 } 3945 } 3947 /* 3948 * If the maximum selection jitter is less than the 3949 * minimum peer jitter, then tossing out more survivors 3950 * will not lower the minimum peer jitter, so we might 3951 * as well stop. To make sure a few survivors are left 3952 * for the clustering algorithm to chew on, we also stop 3953 * if the number of survivors is less than or equal to 3954 * NMIN (3). 3955 */ 3956 if (max < min || n <= NMIN) 3957 break; 3959 /* 3960 * Delete survivor qmax from the list and go around 3961 * again. 3962 */ 3963 s.n--; 3964 } 3965 /* 3966 * Pick the best clock. If the old system peer is on the list 3967 * and at the same stratum as the first survivor on the list, 3968 * then don't do a clock hop. Otherwise, select the first 3969 * survivor on the list as the new system peer. 3970 */ 3971 if (osys->stratum == s.v[0].p->stratum) 3972 s.p = osys; 3973 else 3974 s.p = s.v[0].p; 3975 clock_update(s.p); 3976 } 3978 A.5.5.2. root_dist() 3980 /* 3981 * root_dist() - calculate root distance 3982 */ 3983 double 3984 root_dist( 3985 struct p *p /* peer structure pointer */ 3986 ) 3987 { 3989 /* 3990 * The root synchronization distance is the maximum error due to 3991 * all causes of the local clock relative to the primary server. 3992 * It is defined as half the total delay plus total dispersion 3993 * plus peer jitter. 3994 */ 3995 return (max(MINDISP, p->rootdelay + p->delay) / 2 + 3996 p->rootdisp + p->disp + PHI * (c.t - p->t) + p->jitter); 3997 } 3998 A.5.5.3. accept() 4000 /* 4001 * accept() - test if association p is acceptable for synchronization 4002 */ 4003 int 4004 accept( 4005 struct p *p /* peer structure pointer */ 4006 ) 4007 { 4008 /* 4009 * A stratum error occurs if (1) the server has never been 4010 * synchronized, (2) the server stratum is invalid. 4011 */ 4012 if (p->leap == NOSYNC || p->stratum >= MAXSTRAT) 4013 return (FALSE); 4015 /* 4016 * A distance error occurs if the root distance exceeds the 4017 * distance threshold plus an increment equal to one poll 4018 * interval. 4019 */ 4020 if (root_dist(p) > MAXDIST + PHI * LOG2D(s.poll)) 4021 return (FALSE); 4023 /* 4024 * A loop error occurs if the remote peer is synchronized to the 4025 * local peer or the remote peer is synchronized to the current 4026 * system peer. Note this is the behavior for IPv4; for IPv6 the 4027 * MD5 hash is used instead. 4028 */ 4029 if (p->refid == p->dstaddr || p->refid == s.refid) 4030 return (FALSE); 4032 /* 4033 * An unreachable error occurs if the server is unreachable. 4034 */ 4035 if (p->reach == 0) 4036 return (FALSE); 4038 return (TRUE); 4039 } 4041 A.5.5.4. clock_update() 4043 /* 4044 * clock_update() - update the system clock 4045 */ 4047 void 4048 clock_update( 4049 struct p *p /* peer structure pointer */ 4050 ) 4051 { 4052 double dtemp; 4054 /* 4055 * If this is an old update, for instance as the result of a 4056 * system peer change, avoid it. We never use an old sample or 4057 * the same sample twice. 4058 * 4059 if (s.t >= p->t) 4060 return; 4062 /* 4063 * Combine the survivor offsets and update the system clock; the 4064 * local_clock() routine will tell us the good or bad news. 4065 */ 4066 s.t = p->t; 4067 clock_combine(); 4068 switch (local_clock(p, s.offset)) { 4070 /* 4071 * The offset is too large and probably bogus. Complain to the 4072 * system log and order the operator to set the clock manually 4073 * within PANIC range. The reference implementation includes a 4074 * command line option to disable this check and to change the 4075 * panic threshold from the default 1000 s as required. 4076 */ 4077 case PANIC: 4078 exit (0); 4080 /* 4081 * The offset is more than the step threshold (0.125 s by 4082 * default). After a step, all associations now have 4083 * inconsistent time valurs, so they are reset and started 4084 * fresh. The step threshold can be changed in the reference 4085 * implementation in order to lessen the chance the clock might 4086 * be stepped backwards. However, there may be serious 4087 * consequences, as noted in the white papers at the NTP project 4088 * site. 4089 */ 4090 case STEP: 4091 while (/* all associations */ 0) 4092 clear(p, X_STEP); 4093 s.stratum = MAXSTRAT; 4094 s.poll = MINPOLL; 4095 break; 4097 /* 4098 * The offset was less than the step threshold, which is the 4099 * normal case. Update the system variables from the peer 4100 * variables. The lower clamp on the dispersion increase is to 4101 * avoid timing loops and clockhopping when highly precise 4102 * sources are in play. The clamp can be changed from the 4103 * default .01 s in the reference implementation. 4104 */ 4105 case SLEW: 4106 s.leap = p->leap; 4107 s.stratum = p->stratum + 1; 4108 s.refid = p->refid; 4109 s.reftime = p->reftime; 4110 s.rootdelay = p->rootdelay + p->delay; 4111 dtemp = SQRT(SQUARE(p->jitter) + SQUARE(s.jitter)); 4112 dtemp += max(p->disp + PHI * (c.t - p->t) + 4113 fabs(p->offset), MINDISP); 4114 s.rootdisp = p->rootdisp + dtemp; 4115 break; 4117 /* 4118 * Some samples are discarded while, for instance, a direct 4119 * frequency measurement is being made. 4120 */ 4121 case IGNORE: 4122 break; 4123 } 4124 } 4125 A.5.5.5. clock_combine() 4127 /* 4128 * clock_combine() - combine offsets 4129 */ 4130 void 4131 clock_combine() 4132 { 4133 struct p *p; /* peer structure pointer */ 4134 double x, y, z, w; 4135 int i; 4137 /* 4138 * Combine the offsets of the clustering algorithm survivors 4139 * using a weighted average with weight determined by the root 4140 * distance. Compute the selection jitter as the weighted RMS 4141 * difference between the first survivor and the remaining 4142 * survivors. In some cases the inherent clock jitter can be 4143 * reduced by not using this algorithm, especially when frequent 4144 * clockhopping is involved. The reference implementation can be 4145 * configured to avoid this algorithm by designating a preferred 4146 * peer. 4147 */ 4148 y = z = w = 0; 4149 for (i = 0; s.v[i].p != NULL; i++) { 4150 p = s.v[i].p; 4151 x = root_dist(p); 4152 y += 1 / x; 4153 z += p->offset / x; 4154 w += SQUARE(p->offset - s.v[0].p->offset) / x; 4155 } 4156 s.offset = z / y; 4157 s.jitter = SQRT(w / y); 4158 } 4160 A.5.5.6. local_clock() 4162 /* 4163 * Clock discipline parameters and constants 4164 */ 4165 #define STEPT .128 /* step threshold (s) */ 4166 #define WATCH 900 /* stepout threshold (s) */ 4167 #define PANICT 1000 /* panic threshold (s) */ 4168 #define PLL 65536 /* PLL loop gain */ 4169 #define FLL MAXPOLL + 1 /* FLL loop gain */ 4170 #define AVG 4 /* parameter averaging constant */ 4171 #define ALLAN 1500 /* compromise Allan intercept (s) */ 4172 #define LIMIT 30 /* poll-adjust threshold */ 4173 #define MAXFREQ 500e-6 /* frequency tolerance (500 PPM) */ 4174 #define PGATE 4 /* poll-adjust gate */ 4176 /* 4177 * local_clock() - discipline the local clock 4178 */ 4179 int /* return code */ 4180 local_clock( 4181 struct p *p, /* peer structure pointer */ 4182 double offset /* clock offset from combine() */ 4183 ) 4184 { 4185 int state; /* clock discipline state */ 4186 double freq; /* frequency */ 4187 double mu; /* interval since last update */ 4188 int rval; 4189 double etemp, dtemp; 4191 /* 4192 * If the offset is too large, give up and go home. 4193 */ 4194 if (fabs(offset) > PANICT) 4195 return (PANIC); 4197 /* 4198 * Clock state machine transition function. This is where the 4199 * action is and defines how the system reacts to large time 4200 * and frequency errors. There are two main regimes: when the 4201 * offset exceeds the step threshold and when it does not. 4202 */ 4203 rval = SLEW; 4204 mu = p->t - s.t; 4205 freq = 0; 4206 if (fabs(offset) > STEPT) { 4207 switch (c.state) { 4209 /* 4210 * In S_SYNC state we ignore the first outlyer amd 4211 * switch to S_SPIK state. 4212 */ 4213 case SYNC: 4214 state = SPIK; 4215 return (rval); 4217 /* 4218 * In S_FREQ state we ignore outlyers and inlyers. At 4219 * the first outlyer after the stepout threshold, 4220 * compute the apparent frequency correction and step 4221 * the time. 4222 */ 4223 case FREQ: 4224 if (mu < WATCH) 4225 return (IGNORE); 4227 freq = (offset - c.offset) / mu; 4228 /* fall through to S_SPIK */ 4230 /* 4231 * In S_SPIK state we ignore succeeding outlyers until 4232 * either an inlyer is found or the stepout threshold is 4233 * exceeded. 4234 */ 4235 case SPIK: 4236 if (mu < WATCH) 4237 return (IGNORE); 4239 /* fall through to default */ 4241 /* 4242 * We get here by default in S_NSET and S_FSET states 4243 * and from above in S_FREQ state. Step the time and 4244 * clamp down the poll interval. 4245 * 4246 * In S_NSET state an initial frequency correction is 4247 * not available, usually because the frequency file has 4248 * not yet been written. Since the time is outside the 4249 * capture range, the clock is stepped. The frequency 4250 * will be set directly following the stepout interval. 4251 * 4252 * In S_FSET state the initial frequency has been set 4253 * from the frequency file. Since the time is outside 4254 * the capture range, the clock is stepped immediately, 4255 * rather than after the stepout interval. Guys get 4256 * nervous if it takes 17 minutes to set the clock for 4257 * the first time. 4258 * 4259 * In S_SPIK state the stepout threshold has expired and 4260 * the phase is still above the step threshold. Note 4261 * that a single spike greater than the step threshold 4262 * is always suppressed, even at the longer poll 4263 * intervals. 4264 */ 4265 default: 4267 /* 4268 * This is the kernel set time function, usually 4269 * implemented by the Unix settimeofday() system 4270 * call. 4271 */ 4272 step_time(offset); 4273 c.count = 0; 4274 s.poll = MINPOLL; 4275 rval = STEP; 4276 if (state == NSET) { 4277 rstclock(FREQ, p->t, 0); 4278 return (rval); 4279 } 4280 break; 4281 } 4282 rstclock(SYNC, p->t, 0); 4283 } else { 4285 /* 4286 * Compute the clock jitter as the RMS of exponentially 4287 * weighted offset differences. This is used by the 4288 * poll-adjust code. 4289 */ 4290 etemp = SQUARE(c.jitter); 4291 dtemp = SQUARE(max(fabs(offset - c.last), 4292 LOG2D(s.precision))); 4293 c.jitter = SQRT(etemp + (dtemp - etemp) / AVG); 4294 switch (c.state) { 4296 /* 4297 * In S_NSET state this is the first update received and 4298 * the frequency has not been initialized. The first 4299 * thing to do is directly measure the oscillator 4300 * frequency. 4301 */ 4302 case NSET: 4303 rstclock(FREQ, p->t, offset); 4304 return (IGNORE); 4306 /* 4307 * In S_FSET state this is the first update and the 4308 * frequency has been initialized. Adjust the phase, but 4309 * don't adjust the frequency until the next update. 4310 */ 4311 case FSET: 4312 rstclock(SYNC, p->t, offset); 4313 break; 4315 /* 4316 * In S_FREQ state ignore updates until the stepout 4317 * threshold. After that, correct the phase and 4318 * frequency and switch to S_SYNC state. 4319 */ 4320 case FREQ: 4321 if (c.t - s.t < WATCH) 4322 return (IGNORE); 4324 freq = (offset - c.offset) / mu; 4325 break; 4327 /* 4328 * We get here by default in S_SYNC and S_SPIK states. 4329 * Here we compute the frequency update due to PLL and 4330 * FLL contributions. 4331 */ 4332 default: 4334 /* 4335 * The FLL and PLL frequency gain constants 4336 * depend on the poll interval and Allan 4337 * intercept. The FLL is not used below one-half 4338 * the Allan intercept. Above that the loop gain 4339 * increases in steps to 1 / AVG. 4340 */ 4341 if (LOG2D(s.poll) > ALLAN / 2) { 4342 etemp = FLL - s.poll; 4343 if (etemp < AVG) 4344 etemp = AVG; 4345 freq += (offset - c.offset) / (max(mu, 4346 ALLAN) * etemp); 4347 } 4349 /* 4350 * For the PLL the integration interval 4351 * (numerator) is the minimum of the update 4352 * interval and poll interval. This allows 4353 * oversampling, but not undersampling. 4354 */ 4355 etemp = min(mu, LOG2D(s.poll)); 4356 dtemp = 4 * PLL * LOG2D(s.poll); 4357 freq += offset * etemp / (dtemp * dtemp); 4358 rstclock(SYNC, p->t, offset); 4359 break; 4360 } 4361 } 4363 /* 4364 * Calculate the new frequency and frequency stability (wander). 4366 * Compute the clock wander as the RMS of exponentially weighted 4367 * frequency differences. This is not used directly, but can, 4368 * along withthe jitter, be a highly useful monitoring and 4369 * debugging tool 4370 */ 4371 freq += c.freq; 4372 c.freq = max(min(MAXFREQ, freq), -MAXFREQ); 4373 etemp = SQUARE(c.wander); 4374 dtemp = SQUARE(freq); 4375 c.wander = SQRT(etemp + (dtemp - etemp) / AVG); 4377 /* 4378 * Here we adjust the poll interval by comparing the current 4379 * offset with the clock jitter. If the offset is less than the 4380 * clock jitter times a constant, then the averaging interval is 4381 * increased, otherwise it is decreased. A bit of hysteresis 4382 * helps calm the dance. Works best using burst mode. 4383 */ 4384 if (fabs(c.offset) < PGATE * c.jitter) { 4385 c.count += s.poll; 4386 if (c.count > LIMIT) { 4387 c.count = LIMIT; 4388 if (s.poll < MAXPOLL) { 4389 c.count = 0; 4390 s.poll++; 4391 } 4392 } 4393 } else { 4394 c.count -= s.poll << 1; 4395 if (c.count < -LIMIT) { 4396 c.count = -LIMIT; 4397 if (s.poll > MINPOLL) { 4398 c.count = 0; 4399 s.poll--; 4400 } 4401 } 4402 } 4403 return (rval); 4404 } 4405 A.5.5.7. rstclock() 4407 /* 4408 * rstclock() - clock state machine 4409 */ 4410 void 4411 rstclock( 4412 int state, /* new state */ 4413 double offset, /* new offset */ 4414 double t /* new update time */ 4415 ) 4416 { 4417 /* 4418 * Enter new state and set state variables. Note we use the time 4419 * of the last clock filter sample, which must be earlier than 4420 * the current time. 4421 */ 4422 c.state = state; 4423 c.last = c.offset = offset; 4424 s.t = t; 4425 } 4427 A.5.6. Clock Adjust Process 4429 A.5.6.1. clock_adjust() 4431 /* 4432 * clock_adjust() - runs at one-second intervals 4433 */ 4434 void 4435 clock_adjust() { 4436 double dtemp; 4438 /* 4439 * Update the process time c.t. Also increase the dispersion 4440 * since the last update. In contrast to NTPv3, NTPv4 does not 4441 * declare unsynchronized after one day, since the dispersion 4442 * threshold serves this function. When the dispersion exceeds 4443 * MAXDIST (1 s), the server is considered unfit for 4444 * synchroniztion. 4445 */ 4446 c.t++; 4447 s.rootdisp += PHI; 4449 /* 4450 * Implement the phase and frequency adjustments. The gain 4451 * factor (denominator) is not allowed to increase beyond the 4452 * Allan intercept. It doesn't make sense to average phase noise 4453 * beyond this point and it helps to damp residual offset at the 4454 * longer poll intervals. 4455 */ 4456 dtemp = c.offset / (PLL * min(LOG2D(s.poll), ALLAN)); 4457 c.offset -= dtemp; 4459 /* 4460 * This is the kernel adjust time function, usually implemented 4461 * by the Unix adjtime() system call. 4462 */ 4463 adjust_time(c.freq + dtemp); 4465 /* 4466 * Peer timer. Call the poll() routine when the poll timer 4467 * expires. 4468 */ 4469 while (/* all associations */ 0) { 4470 struct p *p; /* dummy peer structure pointer */ 4472 if (c.t >= p->nextdate) 4473 poll(p); 4474 } 4476 /* 4477 * Once per hour write the clock frequency to a file 4478 */ 4479 if (c.t % 3600 == 3599) 4480 /* write c.freq to file */ 0; 4481 } 4483 A.5.7. Poll Process 4485 /* 4486 * Poll process parameters and constants 4487 */ 4488 #define UNREACH 12 /* unreach counter threshold */ 4489 #define BCOUNT 8 /* packets in a burst */ 4490 #define BTIME 2 /* burst interval (s) */ 4492 A.5.7.1. poll() 4494 /* 4495 * poll() - determine when to send a packet for association p-> 4496 */ 4497 void 4498 poll( 4499 struct p *p /* peer structure pointer */ 4500 ) 4502 { 4503 int hpoll; 4504 int oreach; 4506 /* 4507 * This routine is called when the current time c.t catches up 4508 * to the next poll time p->nextdate. The value p->outdate is 4509 * the last time this routine was executed. The poll_update() 4510 * routine determines the next execution time p->nextdate. 4511 * 4512 * If broadcasting, just do it, but only if we are synchronized. 4513 */ 4514 hpoll = p->hpoll; 4515 if (p->hmode == M_BCST) { 4516 p->outdate = c.t; 4517 if (s.p != NULL) 4518 peer_xmit(p); 4519 poll_update(p, hpoll); 4520 return; 4521 } 4523 /* 4524 * If manycasting, start with ttl = 1. The ttl is increased by 4525 * one for each poll until MAXCLOCK servers have been found or 4526 * ttl reaches TTLMAX. If reaching MAXCLOCK, stop polling until 4527 * the number of servers falls below MINCLOCK, then start all 4528 * over. 4529 */ 4530 if (p->hmode == M_CLNT && p->flags & P_MANY) { 4531 p->outdate = c.t; 4532 if (p->unreach > BEACON) { 4533 p->unreach = 0; 4534 p->ttl = 1; 4535 peer_xmit(p); 4536 } else if (s.n < MINCLOCK) { 4537 if (p->ttl < TTLMAX) 4538 p->ttl++; 4539 peer_xmit(p); 4540 } 4541 p->unreach++; 4542 poll_update(p, hpoll); 4543 return; 4544 } 4545 if (p->burst == 0) { 4547 /* 4548 * We are not in a burst. Shift the reachability 4549 * register to the left. Hopefully, some time before the 4550 * next poll a packet will arrive and set the rightmost 4551 * bit. 4552 */ 4553 oreach = p->reach; 4554 p->outdate = c.t; 4555 p->reach << 1; 4556 if (!(p->reach & 0x7)) 4557 clock_filter(p, 0, 0, MAXDISP); 4558 if (!p->reach) { 4560 /* 4561 * The server is unreachable, so bump the 4562 * unreach counter. If the unreach threshold has 4563 * been reached, double the poll interval to 4564 * minimize wasted network traffic. Send a burst 4565 * only if enabled and the unreach threshold has 4566 * not been reached. 4567 */ 4568 if (p->flags & P_IBURST && p->unreach == 0) { 4569 p->burst = BCOUNT; 4570 } else if (p->unreach < UNREACH) 4571 p->unreach++; 4572 else 4573 hpoll++; 4574 p->unreach++; 4575 } else { 4577 /* 4578 * The server is reachable. Set the poll 4579 * interval to the system poll interval. Send a 4580 * burst only if enabled and the peer is fit. 4581 */ 4582 p->unreach = 0; 4583 hpoll = s.poll; 4584 if (p->flags & P_BURST && fit(p)) 4585 p->burst = BCOUNT; 4586 } 4587 } else { 4589 /* 4590 * If in a burst, count it down. When the reply comes 4591 * back the clock_filter() routine will call 4592 * clock_select() to process the results of the burst. 4593 */ 4594 p->burst--; 4595 } 4597 /* 4598 * Do not transmit if in broadcast client mode. 4599 */ 4600 if (p->hmode != M_BCLN) 4601 peer_xmit(p); 4602 poll_update(p, hpoll); 4603 } 4605 A.5.7.2. poll_update() 4606 /* 4607 * poll_update() - update the poll interval for association p 4608 * 4609 * Note: This routine is called by both the packet() and poll() routine. 4610 * Since the packet() routine is executed when a network packet arrives 4611 * and the poll() routine is executed as the result of timeout, a 4612 * potential race can occur, possibly causing an incorrect interval for 4613 * the next poll. This is considered so unlikely as to be negligible. 4614 */ 4615 void 4616 poll_update( 4617 struct p *p, /* peer structure pointer */ 4618 int poll /* poll interval (log2 s) */ 4619 ) 4620 { 4621 /* 4622 * This routine is called by both the poll() and packet() 4623 * routines to determine the next poll time. If within a burst 4624 * the poll interval is two seconds. Otherwise, it is the 4625 * minimum of the host poll interval and peer poll interval, but 4626 * not greater than MAXPOLL and not less than MINPOLL. The 4627 * design insures that a longer interval can be preempted by a 4628 * shorter one if required for rapid response. 4629 */ 4630 p->hpoll = max(min(MAXPOLL, poll), MINPOLL); 4631 if (p->burst > 0) { 4632 if (p->nextdate != c.t) 4633 return; 4634 else 4635 p->nextdate += BTIME; 4636 } else { 4638 /* 4639 * While not shown here, the reference implementation 4640 * randonizes the poll interval by a small factor. 4641 */ 4642 p->nextdate = p->outdate + (1 << max(min(p->ppoll, 4643 p->hpoll), MINPOLL)); 4644 } 4646 /* 4647 * It might happen that the due time has already passed. If so, 4648 * make it one second in the future. 4649 */ 4650 if (p->nextdate <= c.t) 4651 p->nextdate = c.t + 1; 4652 } 4653 A.5.7.3. peer_xmit() 4654 /* 4655 * transmit() - transmit a packet for association p 4656 */ 4657 void 4658 peer_xmit( 4659 struct p *p /* peer structure pointer */ 4660 ) 4661 { 4662 struct x x; /* transmit packet */ 4664 /* 4665 * Initialize header and transmit timestamp 4666 */ 4667 x.srcaddr = p->dstaddr; 4668 x.dstaddr = p->srcaddr; 4669 x.leap = s.leap; 4670 x.version = p->version; 4671 x.mode = p->hmode; 4672 if (s.stratum == MAXSTRAT) 4673 x.stratum = 0; 4674 else 4675 x.stratum = s.stratum; 4676 x.poll = p->hpoll; 4677 x.precision = s.precision; 4678 x.rootdelay = D2FP(s.rootdelay); 4679 x.rootdisp = D2FP(s.rootdisp); 4680 x.refid = s.refid; 4681 x.reftime = s.reftime; 4682 x.org = p->org; 4683 x.rec = p->rec; 4684 x.xmt = get_time(); 4685 p->xmt = x.xmt; 4687 /* 4688 * If the key ID is nonzero, send a valid MAC using the key ID 4689 * of the association and the key in the local key cache. If 4690 * something breaks, like a missing trusted key, don't send the 4691 * packet; just reset the association and stop until the problem 4692 * is fixed. 4693 */ 4694 if (p->keyid) 4695 if (/* p->keyid invalid */ 0) { 4696 clear(p, X_NKEY); 4697 return; 4698 } 4699 x.mac = md5(p->keyid); 4700 xmit_packet(&x); 4701 } 4702 Authors' Addresses 4704 Jack Burbank (editor) 4705 The Johns Hopkins University Applied Physics Laboratory 4706 11100 Johns Hopkins Road 4707 Laurel, MD 20723-6099 4708 US 4710 Phone: +1 443 778 7127 4711 Email: jack.burbank@jhuapl.edu 4713 William Kasch (editor) 4714 The Johns Hopkins University Applied Physics Laboratory 4715 11100 Johns Hopkins Road 4716 Laurel, MD 20723-6099 4717 US 4719 Phone: +1 443 778 7463 4720 Email: william.kasch@jhuapl.edu 4722 Jim Martin (editor) 4723 The Daedelus Group 4724 721 Alameda de las Pulgas 4725 Belmont, CA 94002 4726 US 4728 Phone: +1 408 799-2788 4729 Email: jim@daedelus.com 4731 Dr. David L. Mills 4732 University of Delaware 4733 Newark, DE 19716 4734 US 4736 Phone: +1 302 831 8247 4737 Email: mills@udel.edu 4739 Full Copyright Statement 4741 Copyright (C) The IETF Trust (2007). 4743 This document is subject to the rights, licenses and restrictions 4744 contained in BCP 78, and except as set forth therein, the authors 4745 retain all their rights. 4747 This document and the information contained herein are provided on an 4748 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 4749 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 4750 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 4751 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 4752 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 4753 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 4755 Intellectual Property 4757 The IETF takes no position regarding the validity or scope of any 4758 Intellectual Property Rights or other rights that might be claimed to 4759 pertain to the implementation or use of the technology described in 4760 this document or the extent to which any license under such rights 4761 might or might not be available; nor does it represent that it has 4762 made any independent effort to identify any such rights. Information 4763 on the procedures with respect to rights in RFC documents can be 4764 found in BCP 78 and BCP 79. 4766 Copies of IPR disclosures made to the IETF Secretariat and any 4767 assurances of licenses to be made available, or the result of an 4768 attempt made to obtain a general license or permission for the use of 4769 such proprietary rights by implementers or users of this 4770 specification can be obtained from the IETF on-line IPR repository at 4771 http://www.ietf.org/ipr. 4773 The IETF invites any interested party to bring to its attention any 4774 copyrights, patents or patent applications, or other proprietary 4775 rights that may cover technology that may be required to implement 4776 this standard. Please address the information to the IETF at 4777 ietf-ipr@ietf.org. 4779 Acknowledgment 4781 Funding for the RFC Editor function is provided by the IETF 4782 Administrative Support Activity (IASA).